Blog

Tony Hammond

Tony worked alongside Crossref at nature.com between 2006 and 2010.

Nature’s Metadata for Web Pages

Tony Hammond – 2008 May 19

In Metadata

Well, we may not be the first, but we wanted to report anyway that Nature has now embedded metadata (HTML meta tags) into all its newly published pages, including full text, abstracts and landing pages (all bar four titles, which are currently being worked on). Metadata coverage extends back through the Nature archives, though depth of coverage varies by title. This conforms to Checkpoint 13.2 of the W3C’s Web Content Accessibility Guidelines 1.0, which exhorts content publishers to “provide metadata to add semantic information to pages and sites”.

Metadata is provided in DC and PRISM formats, as well as in Google’s own bespoke metadata format. This generally follows the DCMI recommendation “Expressing Dublin Core metadata using HTML/XHTML meta and link elements” and the earlier RFC 2731, “Encoding Dublin Core Metadata in HTML”. (Note that the schema name is normalized to lowercase.) Some notes:

  • The DOI is included in the “dc.identifier” term in URI form, which is the Crossref recommendation for citing DOIs.
  • We could also consider adding “prism.doi” to disclose the native DOI form. This would require bumping the PRISM namespace declaration to v2.0. We might synchronize this change with our RSS feeds, which are currently pegged at v1.2, although note that the RSS module mod_prism currently applies only to PRISM v1.2.
  • We could then also add a “prism.url” term to link back (through the DOI proxy server) to the content site. The namespace issue noted above still applies.
  • The “citation_” terms are not anchored in any published namespace, which makes this term set problematic for reuse in other applications. It would be useful to be able to reference a namespace for these terms (e.g. rel="schema.gs" href="...") and to cite them as, e.g., “gs.citation_title”.

The HTML metadata sets from an example landing page are presented below.
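As a rough, hypothetical illustration of the pattern described above (not the actual tags from Nature’s pages), the following Python sketch renders a record as schema link elements plus prefixed meta elements. The record fields, the example DOI, the https://doi.org/ resolver prefix and the particular PRISM and citation_ terms chosen are all assumptions made for illustration; the PRISM namespace URI is assumed to be the v1.2 basic namespace.

```python
from html import escape

# Namespaces declared with <link rel="schema.PREFIX" href="...">.
# Assumption: PRISM terms use the v1.2 "basic" namespace URI.
SCHEMAS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "prism": "http://prismstandard.org/namespaces/1.2/basic/",
}

# Assumption: DOIs are rendered in URI form via this resolver prefix.
DOI_RESOLVER = "https://doi.org/"


def meta_tags(record: dict) -> str:
    """Render a hypothetical article record as HTML link/meta elements.

    `record` is assumed to carry "title", "doi" and "journal" keys.
    """
    terms = [
        ("dc.title", record["title"]),
        ("dc.identifier", DOI_RESOLVER + record["doi"]),  # DOI in URI form
        ("prism.publicationName", record["journal"]),
        # Google-style terms have no published namespace, so no schema link.
        ("citation_title", record["title"]),
        ("citation_journal_title", record["journal"]),
    ]
    # Schema declarations first, then the prefixed terms.
    lines = [f'<link rel="schema.{p}" href="{uri}" />' for p, uri in SCHEMAS.items()]
    lines += [
        f'<meta name="{name}" content="{escape(value, quote=True)}" />'
        for name, value in terms
    ]
    return "\n".join(lines)


# Hypothetical record; the DOI is a placeholder, not a real article.
print(meta_tags({"title": "An example article", "doi": "10.1038/example", "journal": "Nature"}))
```

The citation_ terms are emitted without a schema link because, as noted above, there is no published namespace to declare for them; attribute values are HTML-escaped so titles containing quotes or ampersands stay well-formed.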

OpenHandle: Languages Support

Tony Hammond – 2008 April 21

In Handle

Following up on the earlier post about OpenHandle, a number of language examples have now been contributed to the project. The diagram below shows the OpenHandle service in schematic form, together with the various languages supported. Briefly, OpenHandle aims to provide a web services interface to the Handle System to simplify access to the data stored for a given Handle. (Note that the diagram is an HTML imagemap and all elements are “clickable”.)

OpenHandle: Google Code Project

Tony Hammond – 2008 March 07

In Handle

Just announced on the handle-info and semantic-web mailing lists is the OpenHandle project on Google Code. This may be of some interest to the DOI community, as it allows the handle record underpinning a DOI to be exposed in various common text-based serializations, making the data stored within the records more accessible to Web applications. Initial serializations include RDF/XML, RDF/N3, and JSON. We’d be very interested in receiving feedback on this project, either on this blog or over on the project wiki.
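To make the idea concrete, here is a minimal Python sketch of exposing a handle record as JSON. A handle record is a set of typed, indexed values; the field names, the JSON shape and the example handle and values below are hypothetical and do not reproduce OpenHandle’s actual output format.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class HandleValue:
    """One typed value in a handle record (simplified: no TTL or permissions)."""
    index: int
    type: str
    data: str


def record_to_json(handle: str, values: list) -> str:
    """Serialize a handle record as JSON, ordered by value index.

    The {"handle": ..., "values": [...]} shape is hypothetical.
    """
    ordered = sorted(values, key=lambda v: v.index)
    return json.dumps({"handle": handle, "values": [asdict(v) for v in ordered]}, indent=2)


# Hypothetical DOI handle: a URL value pointing at a landing page, plus a
# contact value, deliberately listed out of index order.
record = [
    HandleValue(index=2, type="EMAIL", data="doi-admin@example.org"),
    HandleValue(index=1, type="URL", data="https://www.example.org/article"),
]
print(record_to_json("10.1000/example", record))
```

Sorting by index gives a stable, predictable document for Web clients; an RDF/XML or N3 serialization would carry the same values but would need agreed predicate URIs, which this sketch does not attempt.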

ISO/CD 26324 (DOI)

Tony Hammond – 2008 February 22

In Identifiers

In my previous post about prism:doi I didn’t mention, or reference, the ongoing ISO work on DOI. Indeed, I hadn’t realized that the DOI site now has a status update on the ISO work: “The DOI® System is currently being standardised through ISO. It is expected that the process will be finalised during 2008. In December 2007 the Working Group for this project approved a final draft as a Committee Draft (standard for voting) which is now being processed by ISO.”

prism:doi

Tony Hammond – 2008 February 22

In Metadata

The new PRISM spec (v2.0) was published this week (see the press release; downloads are available here). This is a significant development, as it adds support for XMP profiles to complement the existing XML and RDF/XML profiles. And, as PRISM is one of the major vocabularies used by publishers, I would urge you all to take a look at it and to consider upgrading your applications to use it.

Search Web Services Document

Tony Hammond – 2007 November 09

In Search

The OASIS Search Web Services TC has just put out the following document for public review (Nov 7 to Dec 7, 2007):

Search Web Services v1.0 Discussion Document
Editable Source: http://docs.oasis-open.org/search-ws/v1.0/DiscussionDocument.doc
PDF: http://docs.oasis-open.org/search-ws/v1.0/DiscussionDocument.pdf
HTML: http://docs.oasis-open.org/search-ws/v1.0/DiscussionDocument.html

From the OASIS announcement: “This document, ‘Search Web Services Version 1.0 - Discussion Document - 2 November 2007’, was prepared by the OASIS Search Web Services TC as a strawman proposal, for public review, intended to generate discussion and interest.”

DC in (X)HTML Meta/Links

Tony Hammond – 2007 November 06

In Metadata

This message, posted yesterday to the dc-general list, may be of interest (extract follows):

“Public Comment on encoding specifications for Dublin Core metadata in HTML and XHTML
2007-11-05, Public Comment is being held from 5 November through 3 December 2007 on the DCMI Proposed Recommendation, ‘Expressing Dublin Core metadata using HTML/XHTML meta and link elements’ «http://dublincore.org/documents/2007/11/05/dc-html/» by Pete Johnston and Andy Powell. Interested members of the public are invited to post comments to the DC-ARCHITECTURE mailing list «http://www.
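The link/meta convention in that recommendation can be read back with a few lines of Python. This sketch uses the standard library’s html.parser: it treats each link rel="schema.PREFIX" as declaring a namespace and expands a meta name of the form PREFIX.term to the namespace URI followed by the term. It is a simplified reading of the convention (prefixes are compared exactly as written, and other details of the recommendation are ignored), not a conforming implementation; the class name is hypothetical.

```python
from html.parser import HTMLParser


class DCMetaParser(HTMLParser):
    """Collect (property URI, value) pairs from meta elements whose name
    uses a prefix declared by a <link rel="schema.PREFIX" href="..."> element."""

    def __init__(self):
        super().__init__()
        self.schemas = {}  # prefix -> namespace URI
        self.metas = []    # (name, content) pairs in document order

    def handle_starttag(self, tag, attrs):
        # Self-closing tags (<meta ... />) also arrive here via the default
        # handle_startendtag. Attribute names are lowercased by the parser.
        a = dict(attrs)
        rel = a.get("rel") or ""
        if tag == "link" and rel.startswith("schema."):
            self.schemas[rel[len("schema."):]] = a.get("href") or ""
        elif tag == "meta" and a.get("name") and a.get("content") is not None:
            self.metas.append((a["name"], a["content"]))

    def properties(self):
        """Expand PREFIX.term names; names without a declared prefix are skipped."""
        out = []
        for name, content in self.metas:
            prefix, dot, term = name.partition(".")
            if dot and prefix in self.schemas:
                out.append((self.schemas[prefix] + term, content))
        return out


parser = DCMetaParser()
parser.feed('<link rel="schema.dc" href="http://purl.org/dc/elements/1.1/" />'
            '<meta name="dc.title" content="Example" />')
parser.close()
print(parser.properties())
```

Prefixes are resolved only when properties() is called, so a schema link that appears after the meta elements it governs is still honoured; unprefixed names such as citation_title are simply ignored.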

STIX Fonts in Beta

Tony Hammond – 2007 November 06

In Standards

Well, Howard already blogged on Nascent last week about the STIX fonts (Scientific and Technical Information Exchange) being launched and now freely available in beta. And today the STM Association has also blogged this milestone. So, just for the record, I’m noting those links here on CrossTech for easy retrieval. As Howard says: “I recommend all publishers download the fonts from the STIX web site at www.stixfonts.org today.” (And for those who want to see more of Howard, he can be found in an interview here on the SIIA Executive FaceTime Webcast Series.)

DCMI Identifiers Community

Tony Hammond – 2007 October 17

In Identifiers

Another DCMI invitation. And a list. Lovely.

See this message (copied below) from Douglas Campbell, National Library of New Zealand, to the dc-general mailing list.

(Continues)

Hybrid

Tony Hammond – 2007 October 17

In XMP

So, back on the old XMP tack. The simple vision of the XMP spec is that XMP packets are embedded in media files and transported along with them, and as such are relatively self-contained units (see Fig. 1).

Fig. 1 - Media files with fully encapsulated descriptions.

But this is too simple. A few preliminary considerations show why we might want to reference additional (i.e. external) sources of metadata from the original packet:

PDFs
PDFs are tightly structured, so it can be difficult to write a new packet or to update an existing one. One solution proposed earlier is to embed a minimal packet which could then reference a more complete description in a standalone packet. (And in turn this standalone packet could reference additional sources of metadata.)
Images
While packets are considerably simpler to write into web-delivery image formats (e.g. JPEG, GIF, PNG), the embedded metadata is likely to pertain only to the image itself. Also of interest is the work from which the image is derived, which is most likely to be presented externally to the image as a standalone document. (And in turn this standalone packet could reference additional sources of metadata.)
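Before a reader can follow any reference out of an embedded packet, it first has to find the packet. Because XMP packets are delimited by xpacket begin and end processing instructions, a crude format-agnostic byte scan is possible. The Python sketch below assumes UTF-8 packets and ignores format-specific placement rules, so a format-aware reader would be more reliable in practice; the function name is hypothetical.

```python
import re

# XMP packet wrapper: <?xpacket begin=...?> body <?xpacket end="w"?> (or end="r").
# Non-greedy matching keeps adjacent packets separate. Assumes UTF-8 packets.
_PACKET = re.compile(
    rb"<\?xpacket begin=.*?\?>(.*?)<\?xpacket end=['\"][rw]['\"]\s*\?>",
    re.DOTALL,
)


def find_xmp_packets(data: bytes) -> list:
    """Return the XML body of each XMP packet found by scanning raw bytes."""
    return [
        m.group(1).decode("utf-8", errors="replace").strip()
        for m in _PACKET.finditer(data)
    ]


# A minimal packet surrounded by arbitrary bytes, as it might sit in a file.
sample = (
    b'JUNK<?xpacket begin="\xef\xbb\xbf" id="W5M0MpCehiHzreSzNTczkc9d"?>\n'
    b'<x:xmpmeta xmlns:x="adobe:ns:meta/"/>\n'
    b'<?xpacket end="w"?>MORE'
)
print(find_xmp_packets(sample))
```

Once the packet body is recovered it can be handed to an XML/RDF parser, and any property pointing at an external, standalone description could then be followed.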

(Continues)