Are you interested in using data to understand the scholarly landscape better and help the scholarly community? Would you like to help Crossref make better-informed decisions? Join us as a Data Scientist.
- Location: Remote and global (to partially overlap with working hours in European timezones)
- Type: Full-time
- Remuneration: 120k USD or local equivalent. Note this is a general guide (as there is no universal currency) and local currency analysis will take place before the final offer.
- Reports to: Director of Data Science, Dominika Tkaczyk
- Timeline: Advertise in January/February and offer by February/March
About the role
Crossref operates an open infrastructure that connects thousands of scholarly publishers, millions of research articles, and research objects to serve an increasingly diverse set of communities within scholarly publishing, research, funding, and beyond. Our system acts as the backbone for preserving and sharing the scholarly record. We offer a wide array of services to ensure that scholarly research metadata is registered, linked, and distributed. When members register their content with us, we collect both bibliographic and non-bibliographic metadata. We process it so that connections can be made between publications, people, organizations, and other associated outputs. We preserve the metadata we receive as a critical part of the scholarly record. We also make it openly available across a range of interfaces and formats so that the community can use it and build tools with it.
Over the last few years, we have witnessed substantial growth in the scholarly community, which has been reflected in the increase in the volume and variety of the data we deal with. On the one hand, this growth opens new possibilities for using the data to understand the scholarly landscape better, better serve the community, and make more informed decisions in a data-driven way. On the other hand, we are facing challenges related to the scale and complexity of the data. To fulfil our data-related ambitions and better address the challenges, Crossref has introduced a new Data Science team. The Data Science team will use scientific research and data science to deliver, assess, improve, and enrich scholarly metadata.
The Data Science team will provide in-house data expertise to the Programs and Technology teams relating to system improvements, community impact, metadata enrichment, and other key initiatives. We work in matrix program groups across three areas of focus: Co-create and reflect community trends; Contribute to the research nexus; and Open & sustainable operations. The Data Scientists will be embedded in program steering groups.
We are looking for two Data Scientists to join our Data Science team. The roles will have different focuses:
- Data analysis & insights: The first role will be responsible for processing and analyzing the scholarly and operational data to help the scholarly community and inform Crossref’s strategy and decisions, as well as proposing new ideas for how we can use the data to fulfil our mission. This role will closely collaborate with all other Crossref teams to co-create ideas and transform them into new knowledge and working solutions.
- Data availability & engineering: The second role will be responsible for detecting and assessing issues and gaps in the scholarly metadata, as well as researching strategies to increase the completeness and accuracy of the metadata and relationships, using internal and external data sources. This role will collaborate with the Technical and Program teams on transforming the research results into production-level services and workflows.
Key responsibilities
Data Analysis & Insights
- Working with scholarly metadata and Crossref operational data to answer questions and gather evidence supporting or disproving hypotheses
- Detecting, diagnosing and assessing problems and gaps in the scholarly metadata using automated and semi-automated techniques
- Gathering insights from available data to help Crossref make well-informed strategic decisions
- Analyzing trends and monitoring the results of various decisions and policies
- Researching and proposing new data sources and research opportunities that help to support Crossref’s strategy
- Evaluating and adopting appropriate data analysis tool(s) for the organisation to use for insights and reporting
- Presenting the insights and new knowledge learned through data science activities internally and externally
- Collaborating with all Crossref teams to understand their needs, co-create ideas and research questions, and propose data-driven approaches to address them
- Collaborating with the data science and academic research community in the fields of bibliometrics, scientometrics, digital libraries, and similar
- Engaging with members, users, and partner organisations to understand trends and needs, and contribute to others’ community initiatives and awareness
- Implementing and promoting good practices around research, data management, data governance, and transparency
Data Availability & Engineering
- Detecting, diagnosing and assessing problems and gaps in the scholarly metadata using automated and semi-automated techniques
- Researching automatic and semi-automatic strategies to increase the completeness and accuracy of the metadata and relationships, for example, through data cleaning, metadata matching, and metadata extraction from unstructured sources
- Using evaluation techniques to estimate the quality of automated strategies
- Proposing additional metadata sources, assessing the overlap between different databases and researching strategies for metadata merging
- Collaborating with the Metadata team on modelling of the metadata gathered from multiple sources and inferred automatically, considering provenance information
- Collaborating with the Technology and Program teams on transforming the research results into production-level services
- Communicating the insights and new knowledge learned through data science activities internally and externally
- Collaborating with the data science and academic research community in the fields of bibliometrics, scientometrics, digital libraries, and similar
- Engaging with members, users, and partner organisations to understand trends and needs, and contribute to others’ community initiatives and awareness
- Implementing and promoting good practices around research, data management, data governance, and transparency
About you
Essential experience and skills:
- Minimum 3 years of hands-on experience in data science, data engineering, applied research, or similar
- Proven track record of designing, running, and communicating data science experiments
- Experience with using and developing data science-based tools and services
- Experience with software and data engineering
- Strong analytical and problem-solving skills
- Expertise in Python programming language
- Familiarity with machine learning concepts and methods
- Familiarity with relational databases and REST APIs
- Willingness to learn new skills and work with a variety of technologies
- Ability to work independently in a self-directed way while consulting with others and collaborating openly
- Ability to plan and manage projects, i.e. think ahead, outline goals, and organize steps to achieve the desired outcomes
- Good communication skills with the ability to explain technical concepts to non-technical audiences
- Awareness of the limitations of data, e.g. cultural or geographic biases
Nice-to-have skills:
- Experience with scholarly metadata
- Experience with metadata modelling
- Knowledge of the dynamics of research communications and relevant communities
- Experience with integrating data from multiple sources
- Familiarity with JSON and mixed-content model XML
- Experience with natural language processing techniques
- Experience with statistical inference and sampling
- Experience with large-scale data processing frameworks such as Spark
- Experience with AWS services
- Experience with search engines such as Elasticsearch
- Experience with deploying and maintaining machine learning solutions in production
- Experience with data visualization tools
About Crossref & the team
We’re a nonprofit membership organization that exists to make scholarly communications better. We rally the community; tag and share metadata; run an open infrastructure; play with technology; and make tools and services, all to help put research in context.
We envision a rich and reusable open network of relationships connecting research organizations, people, things, and actions; a scholarly record that the global community can build on forever, for the benefit of society. We are working towards this vision of a “Research Nexus” by demonstrating the value of richer and connected open metadata, incentivising people to meet best practices, while making it easier to do so. “We” means 20,000+ members from 160 countries, 160+ million records, and nearly 2 billion monthly metadata queries from thousands of tools across the research ecosystem. We want to be a sustainable source of complete, open, and global scholarly metadata and relationships.
Take a look at our strategic agenda to see the planned work that aims to achieve the vision. The sustainability area aims to make transparent all the processes and procedures we follow to run the operation long-term, including our financials and our ongoing commitment to the Principles of Open Scholarly Infrastructure (POSI). The governance area describes our board and its role in community oversight.
It also takes a strong team, because reliable infrastructure needs committed people who contribute to and realise the vision, and thrive doing it. We are a distributed group of 46 dedicated people who like to play quizzes, talk about celery (sometimes cucumber), measure coffee intake, and create 100s of custom Slack emojis. We enthusiastically support the Oxford comma but waver between use of American or British English. Occasionally we do some work to improve knowledge sharing worldwide, which we take a bit more seriously than ourselves. We do this through fair policies and working practices, a balanced approach to resourcing, and accountability to each other.
We can offer the successful candidate a challenging and fun environment to work in. Together we are dedicated to our global mission and we are constantly adapting to ensure we get there. Take a look at our organisation chart, the latest Annual Meeting recordings, and our financial information here.
Thinking of applying?
We especially encourage applications from people with backgrounds historically under-represented in research and scholarly communications. You can be based anywhere in the world where we can employ staff, either directly or through an employer of record.
Click here to apply!
Please strive to submit your application by February 5, 2025.
Anticipated salary for this role is approximately 120k USD-equivalent, paid in local currency. Crossref offers competitive compensation, benefits, flexible work arrangements, professional development opportunities, and a supportive work environment. As a nonprofit organization, we prioritize mission over profit.
Equal opportunities commitment
Crossref is committed to a policy of non-discrimination and equal opportunity for all employees and qualified applicants for employment without regard to race, colour, religion, sex, pregnancy or a condition related to pregnancy, sexual orientation, gender identity or expression, national origin, ancestry, age, physical or mental disability, genetic information, veteran status, uniform service member status, or any other protected class under applicable law. Crossref will make reasonable accommodations for qualified individuals with known disabilities in accordance with applicable law.
Thanks for your interest in joining Crossref. We are excited to hear from you!