The Crossref Curriculum

Direct deposit of XML

If you’re sending us XML directly, it’s important that you understand our schema, and that your XML files follow its rules.

To deposit your XML files with Crossref, you have a choice of three methods:

If you’re making your deposits via the admin tool or HTTPS POST, you can use our test system.

Special characters in your XML

All XML submitted to our system must be UTF-8 encoded. There are two ways to include a special Unicode character in a Crossref deposit XML file:

  1. Encode the special character using a numerical representation. This is the preferred approach. Construct an entity reference in the XML using the numerical value of the character. For example, <surname>&#352;umbera</surname> includes the special character S with a háček (Š).
  2. Use a UTF-8 editor or tool when creating the XML and insert characters directly into the file, which results in a sequence of one or more bytes per character in the file.
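
The first approach can be automated. The following is a minimal sketch in Python, not part of Crossref's tooling: the function name to_numeric_refs is hypothetical, and it assumes the text has already been escaped for XML (it does not touch &, < or >). It relies on the standard xmlcharrefreplace error handler, which replaces every non-ASCII character with a decimal numerical reference.

```python
def to_numeric_refs(text: str) -> str:
    """Replace each non-ASCII character with a decimal reference such as &#352;.

    Hypothetical helper for illustration. ASCII characters, including the
    XML-significant &, < and >, pass through unchanged.
    """
    # Encoding to ASCII with "xmlcharrefreplace" turns every character that
    # ASCII cannot represent into &#NNN;, where NNN is its decimal code point.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    print("<surname>" + to_numeric_refs("Šumbera") + "</surname>")
```

The result is pure ASCII, so it survives tools and editors that do not handle UTF-8 correctly.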

For example, an S with a háček (Š) has a decimal value of 352, which is 160 in hex. In UTF-8, this value is encoded as the two-byte sequence C5,A0 (in hex). You can create a small XML file in which you insert this two-byte sequence (shown here between the <UTF_encoded> tags).

<?xml version="1.0" encoding="utf-8" ?>
<start>
<UTF_encoded>Š</UTF_encoded>
</start>

The character displays properly in a browser, but if you save the XML source and view it in an editor that does not read the file as UTF-8, it will not display correctly.
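
The values quoted for Š can be checked with a few lines of Python. This is only an illustrative sketch, and the function name describe is hypothetical: it returns the decimal value of a character, the same value in hex, and the bytes of its UTF-8 encoding.

```python
def describe(ch: str):
    """Return (decimal code point, hex code point, UTF-8 bytes in hex)."""
    code_point = ord(ch)                    # decimal value of the character
    hex_value = format(code_point, "X")     # the same value written in hex
    # Encode the character as UTF-8 and show each byte as two hex digits.
    utf8_bytes = ",".join(f"{b:02X}" for b in ch.encode("utf-8"))
    return code_point, hex_value, utf8_bytes


if __name__ == "__main__":
    print(describe("\u0160"))  # U+0160 is S with a háček
```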

Character entities

Schema-based XML does not support named character entities (sometimes referred to as HTML-encoded characters). For example, &eacute; and &ndash; are not allowed. To include these characters you must use their numerical representations, &#x0E9; and &#x2013; respectively. These are called numerical entities, indicated by the # (hash or pound sign). The x following the # indicates that the value is in hex (if the x is omitted, the value is decimal). All entities must end with the ; character.
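
If existing content already contains named entities, they can be converted to numerical entities before deposit. The following Python sketch is one possible approach, not a Crossref tool: the function name named_to_numeric is hypothetical, and it assumes the names are among the HTML 4 entities listed in the standard library's html.entities.name2codepoint table. The five entities predefined by XML itself and any unrecognised names are left untouched.

```python
import re
from html.entities import name2codepoint

# Entities that XML itself predefines; these are valid and must stay as they are.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches a named entity such as &eacute; (numeric references start with # and are skipped).
NAMED_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def named_to_numeric(text: str) -> str:
    """Rewrite named entities as hex numerical entities, e.g. &ndash; to &#x2013;."""

    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave valid XML entities and unknown names alone
        return "&#x{:X};".format(name2codepoint[name])

    return NAMED_ENTITY.sub(replace, text)


if __name__ == "__main__":
    print(named_to_numeric("Caf&eacute; 10&ndash;20 &amp; more"))
```

Unknown names are deliberately passed through rather than dropped, so that they still fail validation and can be corrected by hand.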

Character numerical values may be found in the Unicode Character Code Charts. Learn more about UTF-8 and Unicode, and the ISO 8859 series of standardized multilingual graphic character sets for writing in alphabetic languages.

Using face markup

Some style/face markup is supported by our schema but we recommend using it only when it is essential to the meaning of the text. Learn more about face markup.

Page owner: Laura J. Wilkinson   |   Last updated 2020-April-08