Blog

Poorboy Metadata Hack

Tony Hammond

Tony Hammond – 2009 January 06

In Metadata

I was playing around recently and ran across this little metadata hack. At first, I thought somebody was doing something new. But no, nothing so forward apparently. (Heh! 🙂) I was attempting to grab the response headers from an HTTP request on an article page and was using the Perl LWP library by default. For some reason I was getting metadata elements spewed out as response headers, at least from some of the sites I tested.

And the DOI is …

Tony Hammond

Tony Hammond – 2008 December 22

In Metadata

Once structured metadata is added to a file then retrieving a given metadata element is usually a doddle. For example, for PDFs with embedded XMP one can use Phil Harvey’s excellent Exiftool utility. Exiftool is a Perl library and application, which I’ve blogged about here earlier, and is available as a ‘.zip’ file for Windows (no Perl required) or a ‘.dmg’ for MacOS. Note that Phil maintains this actively and has done so over the last five years.

Xmas XMP

Tony Hammond

Tony Hammond – 2008 December 19

In XMP

Well, as I blogged on our web publishing blog Nascent we just went live with XMP labelling on Nature in yesterday’s double issue. We will be adding XMP to all new issues of Nature as well as rolling out across all our other titles in the next few weeks and months. The screenshots below from Acrobat (File > Properties, CMD-D / CTL-D) show what the user might see both with (bottom-left) and without (top-right) semantic markup.

ORE/POWDER: Remarks on Ratings

Tony Hammond

Tony Hammond – 2008 December 06

In Linking

I wanted to make some remarks about the “Ease of use” and “Learn curve” ratings which I gave in the ORE/POWDER comparison table that I blogged about here the other day. It may seem that I came out a little harsh on ORE and a little easy on POWDER. I just wanted to explain my reasons for calling it that way. (By the way, the revised comparison table includes a qualification to those ratings.)

My primary interest was from the perspective of a data provider rather than a data consumer. What does it take to get a resource description document (“resource map”, “description resource” or “sitemap”) ready for publication?

(Continues)

Resource Maps Encoded in POWDER

Tony Hammond

Tony Hammond – 2008 December 05

In Linking

Following right on from yesterday’s post on ORE and POWDER, I’ve attempted to map the worked examples in the ORE User Guide for RDF/XML (specifically Sect. 3) to POWDER to show that POWDER can be used to model ORE; see Resource Maps Encoded in POWDER. (A full explanation for each example is given in the RDF/XML Guide, Sect. 3, which should be consulted.) This could just all be sheer doolally or might possibly turn out to have a modicum of instructional value – I don’t know.

Describing Resource Sets: ORE vs POWDER

Tony Hammond

Tony Hammond – 2008 December 04

In Linking

I’ve been reading up on POWDER recently (the W3C Protocol for Web Description Resources) which is currently in last call status (with comments due in tomorrow). This is an effort to describe groups of Web resources and as such has clear similarities to the Open Archives Initiative ORE data model, which has been blogged about here before. In an attempt to better understand the similarities (and differences) between the two data models, I’ve put up a table which directly compares the two heavyweight contenders OAI-ORE and POWDER and also (unfairly) places them alongside the featherweight Sitemaps Protocol for reference.

CURIEs - A Cure for URIs

Tony Hammond

Tony Hammond – 2008 December 03

In Identifiers

A quick straw poll of a few folks at London Online yesterday revealed that they had not heard of CURIEs. And there was I thinking that most everybody must have heard of them by now. 🙂 So anyway here’s something brief by way of explanation.

CURIE stands for Compact URI and does the signal job of rendering long and difficult to read URI strings into something more manageable. (URIs do have the particular gift of being “human transcribable” but in practice their length and the actual characters used in the URI strings tend to muddy things for the reader.) So given that the Web is built upon a bedrock of URIs, anything that makes URIs easier to handle is going to be an important contributor to our overall ease of interaction with the Web.
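By way of illustration, here is a minimal Python sketch of the idea: a CURIE is a short prefix plus a reference, and expanding it just means gluing the reference onto the namespace URI the prefix stands for. The prefix table, the function names and the sample DOI are assumptions made for this example only; they do not come from the CURIE specification or from any particular library.

```python
# Hypothetical prefix table: the mappings below are illustrative choices,
# not something mandated by the CURIE syntax itself.
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "doi": "http://dx.doi.org/",
}


def expand_curie(curie, prefixes=PREFIXES):
    """Expand 'prefix:reference' into a full URI by simple concatenation."""
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {curie!r}")
    return prefixes[prefix] + reference


def compact_uri(uri, prefixes=PREFIXES):
    """Compact a URI into a CURIE using the longest matching namespace."""
    matches = [(ns, p) for p, ns in prefixes.items() if uri.startswith(ns)]
    if not matches:
        return uri  # nothing to compact against, so leave the URI alone
    ns, prefix = max(matches, key=lambda m: len(m[0]))
    return f"{prefix}:{uri[len(ns):]}"


if __name__ == "__main__":
    print(expand_curie("dc:title"))
    print(compact_uri("http://dx.doi.org/10.1000/182"))
```

The round trip is the whole point: the compact form is what a person reads and types, while the expanded form is what actually goes over the wire.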

(Continues)

Ubiquity commands for Crossref services

So the other day Noel O’Boyle made me feel guilty when he pinged me and asked about the possibility of using one of the Crossref APIs for creating a Ubiquity extension. You see, I had played with the idea myself and had not gotten around to doing much about it. This seemed inexcusable, particularly given how easy it is to build such extensions using the API we developed for the WordPress and Movable Type plugins that we announced earlier in the year.

RSS Good Practice Guidelines

Tony Hammond

Tony Hammond – 2008 November 24

In RSS

I just wanted to flag up here Lisa Rogers’ recent review article on RSS in FUMSI (the online magazine for information professionals published by Free Pint Ltd): RSS and Scholarly Journal Tables of Contents: the ticTOCs Project, and Good Practice Guidelines for Publishers. Especially of interest is the diagram in Fig. 2, which breaks out the metadata elements that might be encountered in a rich web feed. It is worth pointing out that this reflects current practice and that under the item elements one would soon hope to see publishers routinely adding in prism:doi (with the bare DOI as value) and prism:url (with DOI target URL as value) from the PRISM 2.

Machine Readable: Are We There Yet?

Tony Hammond

Tony Hammond – 2008 November 19

In Metadata

The guidelines for Crossref publishers (“DOI Name Information and Guidelines” - [PDF, 210K][1]) have this to say in “Sect. 6.3 The response page” regarding the response page for a DOI:

“A minimal response page must contain a full bibliographic citation displayed to the user. A response page without bibliographic information should never be presented to a user.”

which would seem to be all fine and dandy. But if that user is a machine (or an agent acting for a user) they’ll likely be out of luck as the metadata in the bibliographic citation is generally targeted at human users.

So here’s a quick and dirty implementation of what a machine readable page could look like using RDFa. (The demo uses Jeni Tennison’s wonderful [rdfQuery][2] plugin which I [blogged][3] about earlier.)

Clicking the DOI link below will bring up in a sub-window a bibliographic citation which might be found in a typical DOI response page. If you now click the “Read Me” link you should see an alert message which presents the bibliographic metadata as a complete RDF document (in a simple N3 – or Notation3 – format). This document is assembled on the fly by rdfQuery using the RDFa markup embedded in the page.

See the “View Source” link to list the actual XHTML markup and the RDFa properties which have been added. And note also that some of the properties are partially “hidden” to the human reader, e.g. a publication date is given in year form only whereas the machine record has the date in full, and some of the properties are fully “hidden”: print and electronic ISSNs, issue number, ending page, etc.
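To give a rough feel for what that assembly involves, here is a small Python sketch that collects RDFa-style `property` attributes from a fragment of markup and prints them as N3-like statements. It is not how rdfQuery works internally and is nowhere near a full RDFa processor: it assumes property-bearing elements contain plain text only (no nested elements), the sample markup and its values are invented for the example, and the `@prefix` declarations a real N3 document would need are left out.

```python
from html.parser import HTMLParser


class PropertyCollector(HTMLParser):
    """Collect (property, value) pairs from RDFa-style markup (simplified)."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._open = None   # property whose visible text is being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("property")
        if prop is None:
            return
        if "content" in attrs:
            # The "hidden" machine value wins over whatever the reader sees.
            self.pairs.append((prop, attrs["content"]))
        else:
            self._open, self._text = prop, []

    def handle_data(self, data):
        if self._open is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._open is not None:
            self.pairs.append((self._open, "".join(self._text).strip()))
            self._open = None


def to_n3(subject, pairs):
    """Render the pairs as one N3-like statement about `subject`."""
    body = " ;\n".join(f'    {prop} "{value}"' for prop, value in pairs)
    return f"<{subject}>\n{body} ."


# Invented sample markup: the title, date and page number are placeholders.
SAMPLE = (
    '<p about="http://dx.doi.org/10.1000/182">'
    '<span property="dc:title">An Example Article</span> '
    '(<span property="dc:date" content="2008-11-19">2008</span>)'
    '<span property="prism:endingPage" content="42"></span>'
    '</p>'
)

if __name__ == "__main__":
    collector = PropertyCollector()
    collector.feed(SAMPLE)
    print(to_n3("http://dx.doi.org/10.1000/182", collector.pairs))
```

Note how the date comes out in full even though the reader only sees the year, and how the ending page is picked up although nothing is displayed for it at all.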

(Continues below.)