2025 public data file now available

Every year we release metadata for the full corpus of records registered with us, which can be downloaded for free in a single compressed file. This is one way in which we fulfil our mission to make metadata freely and widely available. By including the metadata of over 165 million research outputs from over 20,000 members worldwide and making them available in a standard format, we streamline access to metadata about scholarly objects such as journal articles, books, conference papers, preprints, research grants, standards, datasets, reports, blogs, and more.

Our metadata is used by thousands of services, researchers, and other organisations. We make it openly available through our APIs, which can be used to obtain a subset of records. If you want to work with our full corpus, the best way is to get a copy of the public data file and update it via the REST API with any new records created or changed since its release.
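As a rough illustration of the update step, the sketch below pages through records indexed since a given date. It assumes the REST API `/works` endpoint at `https://api.crossref.org`, the `from-index-date` filter, and cursor-based paging through the `cursor` parameter and the `next-cursor` response field. The function names, cutoff date, and email address are placeholders chosen for this example.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://api.crossref.org/works"  # assumed REST API endpoint


def build_url(since, cursor="*", rows=1000, mailto=None):
    """Build a /works query for records indexed on or after `since` (YYYY-MM-DD)."""
    params = {
        "filter": f"from-index-date:{since}",
        "rows": rows,
        "cursor": cursor,
    }
    if mailto:
        params["mailto"] = mailto  # optional contact address (placeholder)
    return f"{API_URL}?{urllib.parse.urlencode(params)}"


def fetch_json(url):
    """Fetch a URL and decode the JSON response body."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def iter_updated_records(since, fetch=fetch_json, rows=1000, mailto=None):
    """Yield records indexed since `since`, following cursors until a page is empty."""
    cursor = "*"
    while True:
        message = fetch(build_url(since, cursor, rows, mailto))["message"]
        items = message.get("items", [])
        if not items:
            break
        yield from items
        cursor = message.get("next-cursor")
        if not cursor:
            break
```

Each yielded record could then replace the entry with the same DOI in a local copy, for example `for record in iter_updated_records("2025-03-01", mailto="you@example.org")`. The right cutoff date depends on when the data file was generated.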

By providing an annual copy of the full corpus, we also expand the ways in which the metadata can be used and interrogated. It is ideal for groups using large samples of the scholarly record, such as metaresearchers or research integrity experts. You can find examples of the public data file used in research on journal editorial practices and in projects investigating gaps in the scholarly record.

How to access the public data file

Before downloading the full dataset, you may wish to download the sample dataset containing 100 files (with 100 records in each, around 24 MB). This is a randomly sampled subset of metadata records and can be used for prototyping and development.

To get a copy of the annual data file, you can access it directly via https://doi.org/10.13003/87bfgcee6g, or get the sample dataset and previous public data files from Academic Torrents. We make a donation to Academic Torrents to support their work, which allows the data to be accessible in this way. Some organisations have reported policies that prevent access to torrents, so we also provide a copy that can be downloaded from AWS, which requires an AWS account and a small payment to cover the data transfer costs. You can find the details about access [here](/documentation/retrieve-metadata/rest-api/tips-for-using-public-data-files-and-plus).

We have some tips for working with the public data file. If you would like to have access to monthly snapshots of the whole corpus, along with higher API rate limits and other benefits, you can subscribe to Metadata Plus.

What’s different this year?

This year’s public data file contains an additional 9 million records, and many updates to previously deposited records. The formats and method of access are the same as last year, except that it uses JSON lines, meaning that each metadata record is on a single line and the file suffix is jsonl instead of json. The records have been sorted by DOI, meaning it should be easier to navigate.
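Because each line is a complete JSON record, the files can be processed as a stream without loading everything into memory. The following is a minimal sketch, assuming files named `*.jsonl` (or gzip-compressed as `*.jsonl.gz`, which is a guess about the packaging) and records that carry a `type` field as in REST API output. The directory layout and function names are hypothetical.

```python
import gzip
import json
from pathlib import Path


def iter_records(path):
    """Yield one parsed record per non-empty line of a .jsonl or .jsonl.gz file."""
    path = Path(path)
    # Assumption: compressed files, if any, use a .gz suffix.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def count_by_type(directory):
    """Count records per `type` across every JSON lines file in a directory."""
    counts = {}
    for path in sorted(Path(directory).glob("*.jsonl*")):
        for record in iter_records(path):
            record_type = record.get("type", "unknown")
            counts[record_type] = counts.get(record_type, 0) + 1
    return counts
```

Running `count_by_type` on the sample dataset first is a cheap way to check a pipeline before pointing it at the full file.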

A change this year is that the file does not contain aliased DOIs, which are DOIs that redirect to another DOI. Aliasing is necessary on rare occasions, for example when two DOIs are registered for the same content. Previously, aliasing was not indicated in the REST API or the public data files; this year, only the prime DOIs (the ones to which aliased DOIs are redirected) are included. This makes statistical analysis of the metadata more accurate, but be aware that results may differ from earlier analyses in which many aliased DOIs were counted. See this community forum post for more details.

The file also contains retractions from the Retraction Watch database, which was acquired by Crossref in September 2023 and recently integrated into the REST API.

If you have questions, want to let us know how you will use the metadata, or want to discuss anything on the topic of retrieving Crossref metadata, head to our community forum. From there, you can also keep updated about changes to our schema and APIs.

Page owner: Martyn Rittman   |   Last updated 2025-March-12