With a little help from your Crossref friends: Better metadata
We talk so much about more and better metadata that a reasonable question might be: what is Crossref doing to help?
Members and their service partners do the heavy lifting of providing Crossref with metadata, and we don't change what they supply. One reason is that members can, and often do, update their records (important note: updating records doesn't incur fees!). However, we do a fair amount of behind-the-scenes work to check and report on the metadata, as well as to add context and relationships. As a result, some of what you see in the metadata (and some of what you don't) is facilitated, added, or updated by Crossref.
Much of the work is automated but some of it still requires manual intervention (sound familiar?). Here’s an overview:
Before registration
Our open APIs allow Crossref metadata to be used throughout research and scholarly communication systems and services, both before and after records are registered with us. Anyone who has used a search function in, say, a manuscript submission system, rather than keying in or copying and pasting the information by hand, will appreciate how these integrations reduce the time, effort, and likelihood of errors involved in collecting metadata well before it reaches Crossref.
For example, it's very common for members to use the metadata to add DOIs to reference lists when preparing deposits. Of course, new members first need a prefix (and a memberID and name, but more on that later) in order to register content. We also provide a suffix generator to help with constructing DOIs. If you're not sure how best to make use of existing metadata in deposits, we've got a few options for you. Questions are welcome.
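As a rough illustration of the reference-lookup step, here is a minimal Python sketch that builds a search URL for the public REST API's `works` route using the `query.bibliographic` parameter, then picks the top-ranked DOI from a response. It assumes you fetch the JSON yourself; the `min_score` cutoff is a hypothetical, arbitrary value (relevance scores aren't normalized), and the function names are ours, not part of any Crossref tool.

```python
from typing import Optional
from urllib.parse import urlencode

WORKS_URL = "https://api.crossref.org/works"


def build_reference_query(citation: str, mailto: Optional[str] = None, rows: int = 1) -> str:
    """Build a REST API search URL for one free-text reference string."""
    params = {"query.bibliographic": citation, "rows": rows}
    if mailto:
        # Including a contact address routes requests to the API's "polite" pool.
        params["mailto"] = mailto
    return f"{WORKS_URL}?{urlencode(params)}"


def best_match_doi(response: dict, min_score: float = 60.0) -> Optional[str]:
    """Return the top-ranked DOI, or None if nothing clears the (arbitrary) cutoff."""
    items = response.get("message", {}).get("items", [])
    if not items:
        return None
    top = items[0]
    if top.get("score", 0.0) < min_score:
        return None
    return top.get("DOI")


# Illustrative response shape only; not real API output.
sample = {"message": {"items": [{"DOI": "10.5555/12345678", "score": 87.2}]}}
print(build_reference_query("Smith 2020 Better metadata"))
print(best_match_doi(sample))  # → 10.5555/12345678
```

In practice you'd fetch the URL (for example with `urllib.request.urlopen`), pass the decoded JSON to `best_match_doi`, and review any low-confidence matches by hand before adding DOIs to a deposit.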
We don't often put it this way, but we should: Crossref members rely on the metadata as much as, if not more than, the rest of the community. More and better metadata directly benefits our members.
Upon registration
There are a number of ways we work with the metadata when deposits are received.
- Checking for uniqueness: To avoid duplicate records, we check that a title or work hasn't been registered before. Depending on what we find, the result may be a conflict report or a failed registration.
- Adding DOIs to references: When references come to us without DOIs, we try to match them and add the DOIs.
- ORCID auto-update: We automatically update authors' ORCID records (with their permission, of course) whenever deposits include their ORCID iDs.
- Preprint to VoR reports: We compare title information and notify members who deposit preprints of matching records, to help them fulfill their obligation to link to Versions of Record (VoRs) where they exist.
- Relationships: Like preprint-to-VoR links, components are another kind of relationship. These might be supplementary materials, such as figures, that we can link to the 'parent' record.
- Funding data: When members register only a funder name as part of the information on who funded the work, we try to match it to its identifier in the Funder Registry, to support better linking between funders and works.
- Timestamps: In addition to member-supplied timestamps, we add date-times for when a record was first created and last updated.
- Count of references: That's right, we count all the references in each record that includes them and add the total to the metadata.
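This post doesn't describe how the uniqueness checks and preprint-to-VoR title comparisons work internally. For readers curious about the general idea, here is a hypothetical Python sketch of one common approach: normalize two titles, then score their similarity with `difflib.SequenceMatcher`. The 0.9 threshold and the function names are assumptions, and real matching would also weigh other fields, such as authors and dates.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize_title(title: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", title)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    no_punct = re.sub(r"[^\w\s]", " ", no_accents.lower())
    return " ".join(no_punct.split())


def title_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized titles."""
    return SequenceMatcher(None, normalize_title(a), normalize_title(b)).ratio()


def likely_same_work(a: str, b: str, threshold: float = 0.9) -> bool:
    """Flag a pair as a candidate match; the 0.9 threshold is an assumption."""
    return title_similarity(a, b) >= threshold


preprint = "Better metadata: a Crossref perspective"
vor = "Better Metadata - A Crossref Perspective."
print(likely_same_work(preprint, vor))  # → True
```

Normalizing first means differences in case, accents, and punctuation don't count against a match. A pair that clears the threshold would be a candidate for a report to the member, not an automatic link.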
After registration
Once records are registered, we check, report on, and update the metadata in a few ways.
- Link checking: We email each member a monthly Resolution Report detailing the number of failed and successful resolutions for their DOIs. If someone in the community reports a DOI that isn't registered, we email the member a DOI Error Report.
- Citation counts and matches: Citation counts for records of members participating in our Cited-by service are openly available in our REST API. The matching citations themselves are available to members, for their own records only.
- Title transfers: Title, prefix, and DOI transfers are common and require assistance from our team.
- MemberID: It's not uncommon for members to have more than one prefix. The memberID lets users of the REST API query for records associated with all of a member's prefixes.
- Digital preservation: We handle the infrequent but critical URL updates needed when titles are triggered for digital preservation. We also preserve the metadata itself with both CLOCKSS and Portico.
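To show what the memberID offers REST API users, here is a minimal Python sketch that builds URLs for the `/members/{id}/works` route (covering all of a member's prefixes) and the `/prefixes/{prefix}/works` route (one prefix only), then reads a few fields from a returned work item. We assume that the `is-referenced-by-count` field carries the openly available Cited-by count and that `created` holds a Crossref-added timestamp; the helper names are hypothetical.

```python
from typing import Optional
from urllib.parse import quote, urlencode

API_ROOT = "https://api.crossref.org"


def member_works_url(member_id: int, rows: int = 20, mailto: Optional[str] = None) -> str:
    """Works across every prefix belonging to one member."""
    params = {"rows": rows}
    if mailto:
        params["mailto"] = mailto
    return f"{API_ROOT}/members/{member_id}/works?{urlencode(params)}"


def prefix_works_url(prefix: str, rows: int = 20) -> str:
    """Works registered under a single prefix only."""
    return f"{API_ROOT}/prefixes/{quote(prefix)}/works?{urlencode({'rows': rows})}"


def summarize_work(item: dict) -> dict:
    """Pull the reference count, citation count, and creation date from one work item."""
    return {
        "DOI": item.get("DOI"),
        "references": item.get("reference-count", 0),
        "cited_by": item.get("is-referenced-by-count", 0),
        "created": item.get("created", {}).get("date-time"),
    }


print(prefix_works_url("10.5555"))  # → https://api.crossref.org/prefixes/10.5555/works?rows=20
```

Querying by member rather than by prefix saves a caller from having to know, and separately page through, every prefix a member holds.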
Of course, since records are often redeposited with updates (note: deposit fees are charged only once per record), some of these processes on our side are repeated as necessary.
This list isn't exhaustive, and other needs and opportunities will emerge. For example, we are looking at matching to add ROR IDs, as we do for funder IDs, and are researching how we might determine and assert subject classifications at the work level. If you're interested in this kind of work, you'll want to read this recent post by my Labs colleague Dominika on matching grants to outputs.
Get in touch if you have questions or would like more information.
Further reading
- Dec 5, 2016 – Included, registered, available: let the preprint linking commence
- Jan 8, 2025 – Metadata matching: beyond correctness
- Nov 6, 2024 – How good is your matching?
- Aug 28, 2024 – The myth of perfect metadata matching
- Jun 27, 2024 – The anatomy of metadata matching
- May 16, 2024 – Metadata matching 101: what is it and why do we need it?
- May 14, 2024 – 2024 public data file now available, featuring new experimental formats
- Mar 13, 2024 – Subject codes, incomplete and unreliable, have got to go