Blog

Crossref metadata for bibliometrics

Our paper, Crossref: the sustainable source of community-owned scholarly metadata, was recently published in Quantitative Science Studies (MIT Press). The paper describes the scholarly metadata collected and made available by Crossref, as well as its importance in the scholarly research ecosystem. Containing over 106 million records and expanding at an average rate of 11% a year, Crossref’s metadata has become one of the major sources of scholarly data for publishers, authors, librarians, funders, and researchers.

Using the Crossref REST API (with Open Ukrainian Citation Index)

Over the past few years, I’ve been really interested in seeing the breadth of uses that the research community is finding for the Crossref REST API. When we ran Crossref LIVE Kyiv in March 2019, Serhii Nazarovets joined us to present his plans for the Open Ukrainian Citation Index, an initiative he explains below. But first, an introduction to Serhii and his colleague Tetiana Borysova. Serhii Nazarovets is a Deputy Director for Research at the State Scientific and Technical Library of Ukraine.

Proposed schema changes - have your say

The first version of our metadata input schema (a DTD, to be specific) was created in 1999 to capture basic bibliographic information and facilitate matching DOIs to citations. Over the past 20 years the bibliographic metadata we collect has deepened, and we’ve expanded our schema to include funding information, license, updates, relations, and other metadata. Our schema isn’t as venerable as a MARC record or as comprehensive as JATS, but it’s served us well.

Request for feedback: Conference ID implementation

We’ve all been subject to floods of conference invitations, and it can be difficult to sort the relevant from the irrelevant or (even worse) the sketchy conferences competing for our attention. In 2017, DataCite and Crossref started a working group to investigate creating identifiers for conferences and projects. Identifiers describe and disambiguate, and applying identifiers to conference events will help build clear, durable connections between scholarly events and scholarly literature. Chaired by Aliaksandr Birukou, the Executive Editor for Computer Science at Springer Nature, the group has met regularly over the past two years, collaborating to create use cases and define metadata to identify and describe conference series and events.

Building better metadata with schema releases

This month we officially released a new version of our input metadata schema. As well as walking through the latest additions, I’ll describe how we’re starting to develop a new, streamlined, and open approach to schema development using GitLab, and outline some of the ideas under discussion for the future.

Funders and infrastructure: let’s get building

Human intelligence and curiosity are the lifeblood of the scholarly world, but not many people can afford to pursue research out of their own pocket. We all have bills to pay. Also, compute time, buildings, lab equipment, administration, and giant underground thingumatrons do not come cheap. In 2017, according to statistics from UNESCO, $1.7 trillion was invested globally in Research and Development. A lot of this money comes from the public - 22 cents in every dollar spent on R&D in the USA comes from government funds, for example.

Big things have small beginnings: the growth of the Open Funder Registry

The Open Funder Registry plays a critical role in making sure that our members correctly identify the funding sources behind the research that they are publishing. It addresses a similar problem to the one that led to the creation of ORCID: researchers’ names are hard to disambiguate and are rarely unique; they get abbreviated, have spelling variations and change over time. The same is true of organizations. You don’t have to read all that many papers to see authors acknowledge funding from the US National Institutes of Health as NIH, National Institutes for Health, National Institute of Health, etc.

License metadata FTW

More and better license information is at the top of a lot of Christmas lists from a lot of research institutions and others who regularly use Crossref metadata. I know, I normally just ask for socks too. To help explain what we mean by this, we’ve collaborated with Jisc to set out some guidance for publishers on registering this license metadata with us.

Putting content in context

You can’t go far on this blog without reading about the importance of registering rich metadata. Over the past year we’ve been encouraging all of our members to review the metadata they are sending us and find out which gaps need filling by looking at their Participation Report.

The metadata elements that are tracked in Participation Reports are mostly beyond the standard bibliographic information that is used to identify a work. They are important because they provide context: they tell the reader how the research was funded, what license it’s published under, and more about its authors via links to their ORCID profiles. And while this metadata is all available through our APIs, we also display much of it to readers through our Crossmark service.
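To make the idea of context metadata concrete, here is a minimal sketch of pulling those elements out of a single work record of the kind the REST API returns as JSON. The field names (funder, license, author, ORCID) follow the REST API's JSON output as we understand it; the function name context_metadata and the sample record are hypothetical and made up purely for illustration.

```python
from __future__ import annotations

from typing import Any


def context_metadata(work: dict[str, Any]) -> dict[str, list[str]]:
    """Collect funder names, license URLs, and author ORCID iDs from one work record.

    Any element the member has not registered is simply absent from the
    record, so each lookup falls back to an empty list.
    """
    funders = [f["name"] for f in work.get("funder", []) if "name" in f]
    licenses = [lic["URL"] for lic in work.get("license", []) if "URL" in lic]
    orcids = [a["ORCID"] for a in work.get("author", []) if "ORCID" in a]
    return {"funders": funders, "licenses": licenses, "orcids": orcids}


# Illustrative record only: the DOI, funder, and ORCID iD are placeholders,
# not real registered metadata.
sample_work = {
    "DOI": "10.5555/12345678",
    "funder": [{"name": "Example Funder", "award": ["ABC-123"]}],
    "license": [{"URL": "https://creativecommons.org/licenses/by/4.0/"}],
    "author": [
        {"given": "Josiah", "family": "Carberry",
         "ORCID": "https://orcid.org/0000-0002-1825-0097"},
        {"given": "Ann", "family": "Author"},  # no ORCID iD registered
    ],
}

summary = context_metadata(sample_work)
```

An empty list in the result marks a gap of the kind a Participation Report is meant to surface: the second author above, for instance, contributes nothing to the orcids list.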

A simpler text query form

The Simple Text Query form (STQ) allows users to retrieve existing DOIs for journal articles, books, and chapters by cutting and pasting a reference or reference list into a simple query box. For years the service has been heavily used by students, editors, researchers, and publishers eager to match and link references.

We had changes to the service planned for the first half of this year - an upgraded reference matching algorithm, a more modern interface, etc. In the spirit of openness and transparency, part of our project plan was to communicate these pending changes to STQ users well in advance of our 30 April completion date. What would users think? Could they help us improve upon our plans?
