We’re equally sad and proud to report that Rachael Lammey is moving on in her career to the very lucky team at 67Bricks. Her last day at Crossref is today, Friday 16th February. Which is too soon for us, but very exciting for her!
It’s hard to overstate Rachael’s impact on Crossref’s growth and success in her 12 years here. She started as a Product Manager where she developed that role into a broad and central function, and soon moved into the newly-formed community team as International Outreach Manager where she grew important programs such as Sponsors, Ambassadors, a series of ‘LIVE’ events around the world, and she went on to manage her own team and establish some of the most important strategic relationships that Crossref now feels fortunate to have.
Great news to share: our Executive Director, Ed Pentz, has been selected as the 2024 recipient of the Miles Conrad Award from the USA’s National Information Standards Organization (NISO). The award is testament to an individual’s lifetime contribution to the information community, and we couldn’t be more delighted that Ed was voted to be this year’s well-deserved recipient.
During the NISO Plus conference this week in Baltimore, USA, Ed accepted his award and delivered the 2024 Miles Conrad lecture, reflecting on how far open scholarly infrastructure has come, and the part he has played in this at Crossref and through numerous other collaborative initiatives.
Metadata about research objects and the relationships between them form the basis of the scholarly record: rich metadata has the potential to provide a richer context for scholarly output, and in particular, can provide trust signals to indicate integrity. Information on who authored a research work, who funded it, which other research works it cites, and whether it was updated, can act as signals of trustworthiness. Crossref provides foundational infrastructure to connect and preserve these records, but the creation of these records is an ongoing and complex community effort.
A few months ago we announced our plan to deprecate our support for the Open Funder Registry in favour of using the ROR Registry to support both affiliation and funder use cases. The feedback we’ve had from the community has been positive and supports our members, service providers and metadata users who are already starting to move in this direction.
We wanted to provide an update on work that’s underway to make this transition happen, and how you can get involved in working together with us on this.
We missed an error that led to resource resolution URLs of some 500,000+ records to be incorrectly updated. We have reverted the incorrect resolution URLs affected by this problem. And, we’re putting in place checks and changes in our processes to ensure this does not happen again.
How we got here
Our technical support team was contacted in late June by Wiley about updating resolution URLs for their content. It’s a common request of our technical support team, one meant to make the URL update process more efficient, but this was a particularly large request. Shortly thereafter, we were provided with nearly 1,200 separate files by Atypon on behalf of Wiley in order to update the resolution URLs of ~9 million records. We manually spot checked over 50 of these files, because, prior to this issue, our technical support team did not have a mechanism to automatically check for errors. That labor intensive review did not turn up any problems. That is, those 50 samples had no errors with the headers, like were found later.
Among the files we didn’t check, there were headers included in the files with different owning fromPrefix and acquiring toPrefix members’ DOI prefixes. In a URL update request, the prefixes should always be the same.
And still other files included requests to update records with DOIs that had never even been registered. Here are some examples:
In the example above, these fictional DOIs are both under prefix 10.5555. Thus, the result of this request will ONLY be that the resolution URLs of DOI 10.5555/doi1 and 10.5555/doi2 are updated in the metadata.
In this second example, these fictional DOIs are both under prefix 10.5555, but because the toPrefix in the header differs from the fromPrefix, the result of this request will be that the resolution URLs of 10.5555/doi1 and 10.5555/doi2 are updated in the metadata AND the owning prefix of both records will be transferred from prefix 10.5555 to prefix 10.9876.
We kicked off the URL update request on 30 June and all legitimate DOIs whose files were free of errors were updated by 7 July (yes, it takes about a week to update the resolution URLs for ~9 million records).
On 9 July, Peter Strickland of the International Union of Crystallography, one of 22 members affected by this mistake, contacted us to enquire how/why much of their content was resolving to incorrect URLs and why ownership of their content appeared within our search interface to be Wiley. Peter was rightly concerned. We were, too. Our technical support team quickly elevated this issue, because, frankly, this is not the first time our finicky URL update process has caused unwanted metadata updates, albeit not quite at this volume.
How we investigated the problem
We rallied our internal team. We investigated and discovered that we believed that some ~600,000 DOIs were erroneously included and updated in the requested 1,200 files. We later extended that estimate to include other conditions, in order to be as cautious as we could, to over 1 million DOIs. In the end, we determined that the incorrect files attempted updates of 1,228,041 DOIs. Due to the errors in the files (i.e., erroneous headers and non-registered DOIs), we only actually updated and then reverted 520,512 DOIs. The other 700,000+ DOIs were never updated (because of errors in the original files provided to us) or simply had never been registered with us.
Prior to this mistake, Crossref had never reverted a member’s metadata update before. To be clear, and as I said above, we have had other URL update mistakes over the years, like this one; they were just smaller in scale. We knew there were holes in our process that needed to be plugged. And we knew we needed a better solution for members to manage these updates themselves without our manual intervention. So, while there were mistakes made in the files supplied to us, this was our error and we’re fixing it; more on that below.
For this situation, we quickly realized that reversion of the metadata update was the best option for us, albeit we did not have an existing process in place to execute that reversion. That’s because we only keep the current version of each metadata record. We couldn’t back out of the change; we couldn’t simply restore these records to the metadata registered with us as of late June, because we no longer had an easily accessible, central record of those previous resolution URLs. What we did have was a record of all the previous submissions made against each DOI, so our technical team, focused their efforts there.
How we fixed all those records
We had two errors to correct: the ownership transfers (those records that had inadvertent and mismatched from/to prefixes) and the incorrect resolution URLs. We reverted all of the ownership transfers on 9 July and then double and triple checked that ownership during the week of 12 July to ensure we didn’t miss anything.
The resolution reversion was more complicated. We invested in creating a patch to identify the records that had been updated by our team, and then extract the last legitimate resolution URL registered with us by the owning member in order to revert the metadata for each record. In order to provide confidence that this mistake was contained, we also built a check into the patch to ensure that those DOIs that did have their ownership temporarily transferred were not updated during the few days that ownership was incorrect. That check helped us determine that none of the 520,512 DOIs were incorrectly updated beyond this mistaken URL update request.
The technical team built and tested this patch. The tests turned up gaps in the patch, so we refined it during the week of 2021 July 12. We kicked off the reversion of these records on Monday, 19 July at 20:05 UTC and the patch completed all reversions at 20:14 UTC, Thursday, 22 July.
In the end, we successfully reverted all of the resolution URLs for those 520,512 DOIs we identified; provided daily updates and apologies to the 22 affected members; together we worked some longer hours; and persevered.
We don’t want this to ever happen again. Like, never. We clearly need to make changes to our internal processes to prevent this in the future.
Here’s what’s ahead:
We are building a checker that we can run URL update files through to automate and our checks. This means we will be able to check every single file in a large batch, rather than relying on manual and labor intensive spot-checking;
As said above, one compounding issue in this mistake was the mismatched from/to prefixes in the file headers. Our technical support team uses the same file headers to transfer ownership/stewardship of a record or set of records between members AND to update resolution URLs. These two tasks are almost never legitimately completed in the same file. That is, there is usually a lag between ownership transfers and resolution URL updates (most members will request an ownership transfer and then a month or two later update their URLs). Because of this, simply decoupling these two tasks (feel free to follow our work at this link) would help eliminate a glaring risk, so we’re working on that too;
Lastly, we’re researching ways we can streamline resource resolution URL updates. You can also monitor our progress on this one. No promises or specifics yet, but we’re eager to reduce toil on our technical support team, avoid problems like this one, and provide members safe and straightforward ways to better update your metadata.
Thanks for the support of the whole Crossref team and our community - and for reading this far! Never a dull moment…