A healthy infrastructure needs healthy funding data
We’ve been talking a lot about infrastructure here at Crossref, and about how the metadata we gather and organize is the foundation for so many services: those we provide directly, and those that use our APIs to access that metadata, such as Kudos and CHORUS, which in turn provide the wider world of researchers, administrators, and funders with tailored information and tools.
The initiative formerly known as FundRef
Together, Crossref’s funding data (previously known as FundRef; we simplified the name) and the Open Funder Registry, our taxonomy of grant-giving organizations, form a hub for gathering and querying metadata related to the questions:
“Who funded this research?” and “Where has the research we funded been published?”
To support the funding data initiative, three key pieces of metadata are needed from publishers:
Unfortunately, only around half of the 950,000 Crossref DOIs with funding data contain funder IDs, the unique funder identifiers from the Open Funder Registry that are needed to link up all of the data. So, only half of the data is useful. (And 950,000 DOIs is only a fraction of the 77 million DOIs in our database, but more on that later.)
When we looked at the funding data that was coming in without funder IDs we were a little surprised. We had expected that most of these would be names that simply aren’t in the Open Funder Registry yet, and we thought there would be a certain amount of incorrect information that had been entered into the “funder_name” field.
Instead, what we found was that many of the names were correct, and the funder IDs were just missing.
Tidying the data
To help correct this, we decided to match incoming names to funder IDs where we could do so with the highest level of confidence. After much testing to minimize false positives, we switched this on at the end of August 2015.
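To give a flavour of what conservative matching can look like, here is a minimal sketch that accepts only an exact match after normalizing the name. It is an illustration under our own assumptions, not a description of the matching Crossref actually runs: the registry entries, the placeholder IDs, and the function names `normalize` and `match_funder_id` are all hypothetical.

```python
import re
import unicodedata

# Hypothetical registry extract: funder name -> funder ID.
# These names and IDs are placeholders, not real Open Funder Registry entries.
REGISTRY = {
    "Example Science Foundation": "placeholder-id-0001",
    "Sample Health Research Council": "placeholder-id-0002",
}


def normalize(name):
    """Lower-case, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


# Index the registry by normalized name so lookups are exact and cheap.
INDEX = {normalize(name): funder_id for name, funder_id in REGISTRY.items()}


def match_funder_id(deposited_name):
    """Return a funder ID only on an exact normalized match, else None."""
    return INDEX.get(normalize(deposited_name))


print(match_funder_id("  Example Science Foundation. "))
print(match_funder_id("Example Science Fdn"))
```

Refusing anything short of an exact normalized match trades recall for precision, which is the trade-off you want when the goal is to keep false positives to a minimum.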
Throughout September and October, we inserted funder IDs for about 25% of the names that have been deposited without IDs. For October, the real numbers were 68,000 funder names with no IDs deposited, and 18,000 funder IDs inserted by Crossref.
In the same period 42,000 funder IDs were deposited by publishers. With our matching on top of this, we are achieving a little over a 50% overall success rate of “good” funding data (funder names and funder IDs together).
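The arithmetic behind those percentages can be checked in a few lines. This sketch assumes that the 42,000 publisher-deposited IDs and the 68,000 names without IDs cover the same period and together make up all funder entries deposited in it; the variable names are ours.

```python
# Figures quoted above (October).
names_without_ids = 68_000    # funder names deposited with no funder ID
ids_inserted = 18_000         # funder IDs added by Crossref's matching
ids_from_publishers = 42_000  # funder IDs deposited by publishers

# Share of ID-less names that the matching was able to fix.
match_rate = ids_inserted / names_without_ids

# "Good" funding data: a funder name together with a funder ID.
total_entries = ids_from_publishers + names_without_ids
good_entries = ids_from_publishers + ids_inserted
overall_rate = good_entries / total_entries

print(f"matched: {match_rate:.1%}")
print(f"overall: {overall_rate:.1%}")
```

Under those assumptions the matching fixes roughly a quarter of the ID-less names and lifts the overall share of good funding data from about 38% (42,000 of 110,000) to a little over 50%.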
We have been very careful to distinguish the funder IDs that we have added from those deposited by publishers - provenance of data is an extremely important part of what we do. All funder IDs are tagged as provided either by the publisher or Crossref. Every time we insert an ID into a deposit, the publisher is notified in the deposit report.
We have also now added these tags to our REST API so that publishers can query to find out exactly which DOIs we have amended*. The ideal scenario at this point is that the publisher checks that they are happy with the matching and then redeposits the funding data for those DOIs, overwriting the doi-asserted-by: “crossref” tag and claiming the metadata as their own.
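As a rough sketch of how a publisher might use this, the snippet below builds a query URL and picks out the Crossref-asserted funder entries from a work record. The filter name `funder-doi-asserted-by`, the `funder` list, and its `DOI` and `name` keys are assumptions to check against the REST API documentation; only the `doi-asserted-by` tag comes from this post. The sample record is made up and no request is actually sent.

```python
from urllib.parse import urlencode

API_BASE = "https://api.crossref.org/works"


def build_query(asserted_by, rows=20):
    """Build a works query URL; the filter name is an assumption."""
    params = {"filter": f"funder-doi-asserted-by:{asserted_by}", "rows": rows}
    return f"{API_BASE}?{urlencode(params)}"


def crossref_asserted_funders(work):
    """Return the funder entries whose ID was inserted by Crossref."""
    return [
        funder
        for funder in work.get("funder", [])
        if funder.get("doi-asserted-by") == "crossref"
    ]


# Made-up record in the assumed response shape (placeholder values only).
sample_work = {
    "DOI": "placeholder-work-doi",
    "funder": [
        {"name": "Example Science Foundation",
         "DOI": "placeholder-id-0001",
         "doi-asserted-by": "crossref"},
        {"name": "Sample Health Research Council",
         "DOI": "placeholder-id-0002",
         "doi-asserted-by": "publisher"},
    ],
}

print(build_query("crossref"))
for funder in crossref_asserted_funders(sample_work):
    print(funder["name"], funder["DOI"])
```

Entries returned by `crossref_asserted_funders` are the ones worth reviewing before redepositing the funding data.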
Setting some limits
The second-largest problem with funding data was incorrectly entered funder names, e.g. several names concatenated into one, or authors entering overly long or vague program names instead of the official funder name.
To help weed this out, we have made a couple of changes to the funding data deposit system:
- The funder_name field can no longer contain a numerical string longer than 4 digits
- The funder_name field can no longer contain a text string longer than 200 characters
- Funder names that do not adhere to these two rules will now cause the funding data section of the metadata deposit (not the whole deposit) to fail and return an error message.
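Publishers who want to catch these problems before depositing could run a pre-flight check along the following lines. This is a sketch of the two limits as we read them here (a run of five or more consecutive digits, and a name longer than 200 characters), not the deposit system's actual code; the function name `funder_name_errors` and the error wording are ours.

```python
import re

MAX_NAME_LENGTH = 200
# Assumed reading of "a numerical string over 4 digits":
# five or more consecutive digits anywhere in the name.
LONG_DIGIT_RUN = re.compile(r"\d{5,}")


def funder_name_errors(name):
    """Return a list of rule violations; an empty list means the name passes."""
    errors = []
    if LONG_DIGIT_RUN.search(name):
        errors.append("contains a numerical string over 4 digits")
    if len(name) > MAX_NAME_LENGTH:
        errors.append("is longer than 200 characters")
    return errors


for candidate in ["Example Fund 2015", "Example Fund 123456", "x" * 201]:
    problems = funder_name_errors(candidate)
    print(candidate[:30], "->", problems or "ok")
```

A four-digit year in a name still passes, while a long grant number pasted into the name field, or several funder names run together past 200 characters, is flagged.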
Getting the growth we need
As of today, 198 publishers deposit funding data with Crossref. This amounts to about 3.5% of Crossref’s membership (although it’s a larger proportion of our total deposits). We need more publishers to deposit funding data so that funding data search can become a truly useful tool for the community. There’s no sign-up process or additional fee - read about how to get started, and take a look at our best practices for depositing funding data.
Finally, we ask you: how can we get more and better funder metadata in 2016?
This is not a rhetorical question. Please tweet your thoughts @CrossrefOrg or email your replies to info@crossref.org. You will receive something special via snail mail if you reply to us – just Crossref’s way of saying thank you.
*At the time of posting our database is re-indexing and the “asserted-by” tags are still filtering through to the API. Check back in a day or two for the full picture.