Machine Readable: Are We There Yet?

Tony Hammond

Tony Hammond – 2008 November 19

In Metadata

The guidelines for Crossref publishers (“DOI Name Information and Guidelines” - [PDF, 210K][1]) have this to say about the response page for a DOI, in “Sect. 6.3 The response page”:

“A minimal response page must contain a full bibliographic citation displayed to the user. A response page without bibliographic information should never be presented to a user.”

which would seem to be all fine and dandy. But if that user is a machine (or an agent acting for a user) they’ll likely be out of luck as the metadata in the bibliographic citation is generally targeted at human users.

So here’s a quick and dirty implementation of what a machine readable page could look like using RDFa. (The demo uses Jeni Tennison’s wonderful [rdfQuery][2] plugin which I [blogged][3] about earlier.)

Clicking the DOI link below will bring up, in a sub-window, a bibliographic citation of the kind found in a typical DOI response page. If you now click the “Read Me” link you should see an alert message which presents the bibliographic metadata as a complete RDF document (in a simple N3, or Notation3, format). This document is assembled on the fly by rdfQuery using the RDFa markup embedded in the page.
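
For readers who have not seen N3 before, here is a minimal Python sketch of how a handful of bibliographic properties could be written out as a small N3 document. It is not what rdfQuery does internally (the demo is JavaScript), and the DOI, the choice of Dublin Core and PRISM prefixes, and the property values are all assumptions made up for illustration.

```python
# Prefixes assumed for this sketch: Dublin Core elements and PRISM 1.2 basic.
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "prism": "http://prismstandard.org/namespaces/1.2/basic/",
}


def escape(value: str) -> str:
    """Escape backslashes and double quotes for an N3 string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def to_n3(subject: str, properties: dict[str, str]) -> str:
    """Serialize one subject and its literal-valued properties as N3."""
    if not properties:
        raise ValueError("need at least one property")
    lines = [f"@prefix {p}: <{ns}> ." for p, ns in PREFIXES.items()]
    lines.append("")
    lines.append(f"<{subject}>")
    items = list(properties.items())
    for i, (prop, value) in enumerate(items):
        # ';' continues the same subject, '.' ends the statement.
        end = " ." if i == len(items) - 1 else " ;"
        lines.append(f'    {prop} "{escape(value)}"{end}')
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical record; 10.5555 is used here purely as an example prefix.
    print(to_n3(
        "http://dx.doi.org/10.5555/12345678",
        {
            "dc:title": "An Example Article",
            "prism:publicationName": "Journal of Examples",
            "prism:publicationDate": "2008-11-19",
        },
    ))
```

The point is only that the whole citation ends up as one self-describing set of statements about the DOI, which is what the alert box in the demo presents.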

See the “View Source” link to list the actual XHTML markup and the RDFa properties which have been added. And note also that some of the properties are partially “hidden” to the human reader, e.g. a publication date is given in year form only whereas the machine record has the date in full, and some of the properties are fully “hidden”: print and electronic ISSNs, issue number, ending page, etc.
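
To make the “hidden” properties concrete, here is a rough Python sketch that pulls literal values out of a hypothetical RDFa-annotated citation. The XHTML fragment, the DOI and the property names are assumptions for illustration, not the markup used in the demo, and the parser is a deliberate simplification: it only understands `about`, `property` and `content`, and assumes a property element contains plain text only. A real RDFa processor does considerably more.

```python
from html.parser import HTMLParser

# Hypothetical response-page fragment (not the demo's actual markup).
# The reader sees "2008"; the machine gets the full date, ISSN and end page.
SNIPPET = """
<div xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/"
     about="http://dx.doi.org/10.5555/12345678">
  <span property="dc:title">An Example Article</span>,
  <span property="prism:publicationName">Journal of Examples</span>
  <span property="prism:volume">12</span>,
  <span property="prism:startingPage">345</span>
  (<span property="prism:publicationDate" content="2008-11-19">2008</span>)
  <span property="prism:issn" content="1234-5678"></span>
  <span property="prism:endingPage" content="350"></span>
</div>
"""


class RDFaLiteralCollector(HTMLParser):
    """Collect (subject, property, literal) triples from a simple fragment."""

    def __init__(self):
        super().__init__()
        self.subject = None
        self.triples = []
        self._open = None  # property whose element text is being collected
        self._text = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "about" in a:
            self.subject = a["about"]
        if "property" in a:
            if "content" in a:
                # @content overrides the visible text: the "hidden" value.
                self.triples.append((self.subject, a["property"], a["content"]))
            else:
                self._open = a["property"]
                self._text = []

    def handle_data(self, data):
        if self._open is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Simplification: the first end tag closes the open property.
        if self._open is not None:
            literal = "".join(self._text).strip()
            self.triples.append((self.subject, self._open, literal))
            self._open = None


def extract(xhtml: str):
    parser = RDFaLiteralCollector()
    parser.feed(xhtml)
    parser.close()
    return parser.triples


if __name__ == "__main__":
    for subject, prop, value in extract(SNIPPET):
        print(subject, prop, repr(value))
```

Note how the publication date comes back in full from the `content` attribute even though only the year is displayed, and how the ISSN and ending page are recovered from elements that show nothing to the human reader at all.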

So, what’s new about this? There are already various means of adding metadata to pages using e.g. metadata tags (see [here][4] for an earlier post on this), or COinS objects, or even RDF/XML in comment sections. All of these have their various utilities but are still just early attempts at automation. What makes this new and compelling is that RDFa lets publishers embed machine readable metadata unobtrusively in the content, in its proper context, and that pretty much off-the-shelf tools can then read that markup back as a complete machine description in RDF.

Note that there are some similarities here between embedding an XMP packet (which includes metadata) into an arbitrary binary object, e.g. a PDF file, and embedding RDF into a section of a web page – or perhaps “draping” the RDF over the document markup would be a better term – so that the metadata travels along with the actual content.

By the way, the RDFa can be processed to yield valid RDF, as is shown in the demo. This can also be seen by running the web page through the [RDFa Distiller][5]. (You just need to cut and paste the link of the demo page given above into the Distiller form box.) This will produce RDF in various serializations (N3, XML, Triples) from the RDFa.

So, is there really any reason left not to have machine readable metadata at the end of the DOI? Are we there yet?

[1]: Crossref DOI display guidelines
[2]: http://code.google.com/p/rdfquery/wiki/RdfPlugin
[3]: /blog/rdfquery
[4]: /blog/natures-metadata-for-web-pages
[5]: http://www.w3.org/2007/08/pyRdfa/

Page owner: Tony Hammond   |   Last updated 2008-November-19