Understanding your Similarity Report

How is the Similarity Score calculated?

The information below will help you interpret your Similarity Report, whether you’re using iThenticate v1 or v2.

To calculate the Similarity Score, iThenticate scans your submitted document’s text, and checks it against each of the repositories you’ve chosen. The system takes the number of matching words found within the document and divides it by the document’s total word count to produce the Similarity Score percentage for the report.
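As a rough illustration of the calculation described above, the following Python sketch divides the number of matching words by the total word count. The function name and the example numbers are assumptions made for illustration only; how iThenticate counts words or rounds the result is not covered here.

```python
def similarity_score(matching_words: int, total_words: int) -> float:
    """Return the Similarity Score as a percentage (hypothetical helper)."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    if not 0 <= matching_words <= total_words:
        raise ValueError("matching_words must be between 0 and total_words")
    # Matching words divided by the document's total word count, as a percentage.
    return 100.0 * matching_words / total_words


# Example with made-up numbers: 450 matching words in a 3,000-word document.
score = similarity_score(matching_words=450, total_words=3000)
print(f"{score:.0f}%")  # prints "15%"
```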

If you apply exclusion options to the document, the system removes all matches for the exclusion option logic and recalculates the Similarity Score percentage.
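The recalculation can be sketched as follows, assuming each match is stored with a word count and a category label. The `Match` class, the category names, and the numbers are hypothetical and do not reflect iThenticate's internal data model. The sketch also assumes the total word count stays the same after exclusions, which the text above does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    words: int     # number of matching words in this match
    category: str  # hypothetical label, e.g. "body", "quote", "bibliography"


def recalculated_score(matches, total_words, excluded=frozenset()):
    """Recompute the score after dropping matches in excluded categories."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    # Keep only the matches that no exclusion option applies to.
    kept_words = sum(m.words for m in matches if m.category not in excluded)
    # Assumption: the denominator is unchanged by exclusions.
    return 100.0 * kept_words / total_words


matches = [Match(300, "body"), Match(100, "quote"), Match(200, "bibliography")]
print(recalculated_score(matches, 4000))                                      # 15.0
print(recalculated_score(matches, 4000, excluded={"quote", "bibliography"}))  # 7.5
```

In this made-up example, excluding quoted and bibliographic matches halves the score, even though the document itself has not changed.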

Learn more about exclusion settings when setting up a new folder (v1 only), editing filters and exclusions in existing folders (v1 only), filters and exclusions within the Similarity Report (v1 or v2), and URL filters (v1 or v2) for account administrators.

How to interpret the Similarity Report

iThenticate does not check for plagiarism; it checks for similarity. Where a section of the submission’s content is similar or identical to one or more sources, it will be flagged for review. A flag does not automatically mean plagiarism, only similarity.

It’s perfectly natural for a submission to match against some sources in the database. A high degree of overlap may indicate a well-researched document with many references to existing work, and as long as these sources are quoted and referenced correctly, this is perfectly acceptable. A high degree of overlap may also be present where an author has already shared their work on a preprint repository. If the author(s) are the same, this is not a problem.

It’s important that you don’t set a Similarity Score threshold above which manuscripts are automatically rejected. Where there’s a high degree of overlap, your editors and reviewers should decide whether the match is acceptable as part of their general review process.

Similarity Reports and preprints

It is entirely possible (and acceptable) for an author to submit an article to a journal even though they’ve previously made the article available as a preprint. In this case, we expect a high degree of similarity between the preprint and the author’s submitted manuscript.

Therefore, if you find a high degree of similarity between a manuscript you’re checking in iThenticate and a preprint by the same author(s), this is likely to be because the manuscript is a match with its own preprint. However, if the manuscript and preprint do not have the same author(s), this may indicate a problem, and you should investigate further.

Some preprints can be found in iThenticate’s Crossref Posted Content repository, so take this into account if you are checking against this repository. But even if you have excluded the Crossref Posted Content repository in your settings (v1 or v2), it is still possible for preprints to appear as matches to a submission, because iThenticate also crawls preprint repositories on the web.

We recommend including preprints in your results so that you can check that a preprint hasn’t been plagiarised by a different author. If you see a preprint match for the same author(s), this isn’t plagiarism.

Page owner: Kathleen Luschek | Last updated 2020-May-19
