Blog

OTMI Applied - Means More Search Hits

Tony Hammond – 2007 October 09

In OTMI

Following up on previous posts on OTMI (the proposal from NPG for scholarly publishers to syndicate their full text to drive text-mining applications), Fabien Campagne from Cornell, a long-time OTMI supporter, has created an OTMI-driven search engine (based on his Twease work). This may be the first publicly accessible OTMI-based service. It currently contains only NPG content from the online OTMI archive: some two years' worth of Nature and four other titles.

Mars Bar

Tony Hammond – 2007 October 08

In PDF

Just noticed that there is now (as of last month) a blog for Mars (“Mars: Comments on PDF, Acrobat, XML, and the Mars file format”). See this from the initial post: “The Mars Project at Adobe is aimed at creating an XML representation for PDF documents. We use a component-based model for representing different aspects of the document and we use the Universal Container Format (a Zip-based packaging format) to hold the pieces.

Scholarly DC

Tony Hammond – 2007 October 05

In Metadata

This was just sent out to the DC-GENERAL mailing list about the new DCMI Community for Scholarly Communications. As Julie Allinson says: “The aim of the group is to provide a central place for individuals and organisations to exchange information, knowledge and general discussion on issues relating to using Dublin Core for describing items of ‘scholarly communications’, be they research papers, conference presentations, images, data objects. With digital repositories of scholarly materials increasingly being established across the world, this group would like to offer a home for exploring the metadata issues faced.

The Names Project

Tony Hammond – 2007 October 05

In ORCID

Was reminded to blog about this after reading Lorcan’s post on the Names Project being run by JISC. From the blurb: “The project is going to scope the requirements of UK institutional and subject repositories for a service that will reliably and uniquely identify names of individuals and institutions. It will then go on to develop a prototype service which will test the various processes involved. This will include determining the data format, setting up an appropriate database, mapping data from different sources, populating the database with records and testing the use of the data.

InChIKey

Tony Hammond – 2007 October 02

In Identifiers, InChI

The InChI (International Chemical Identifier from IUPAC) has been blogged earlier here. RSC have especially taken this on board in their Project Prospect and now routinely syndicate InChI identifiers in their RSS feeds as blogged here. As reported variously last month (see here for one such review), IUPAC have now released a new (1.02beta) version of their software which can generate hashed, fixed-length (25-character) versions of the InChI, so-called InChIKeys, which are much more search-engine friendly.
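To see why a fixed-length hashed form is friendlier to search engines than a full InChI string (which is of arbitrary length and full of slashes, commas and parentheses), here is a toy sketch. It is emphatically not the IUPAC InChIKey algorithm: the function name, the use of SHA-256 and the byte-to-letter mapping are all assumptions made purely for illustration.

```python
import hashlib
import string


def toy_fixed_length_key(identifier: str, length: int = 25) -> str:
    """Toy illustration only; NOT the real InChIKey algorithm.

    Hashes an identifier string and maps the digest bytes onto the
    letters A-Z, giving a token of fixed length with no punctuation.
    """
    digest = hashlib.sha256(identifier.encode("utf-8")).digest()
    if length > len(digest):
        raise ValueError("length must not exceed the 32-byte digest size")
    letters = string.ascii_uppercase
    # One letter per digest byte (assumed mapping, chosen for simplicity).
    return "".join(letters[b % 26] for b in digest[:length])


# A short and a longer identifier both yield a 25-letter token.
print(toy_fixed_length_key("InChI=1/CH4/h1H4"))
print(toy_fixed_length_key("InChI=1/C2H6O/c1-2-3/h3H,2H2,1H3"))
```

Whatever the input, the result is a single run of letters that a search engine can index as one word, which is the property the hashed form is after.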

Oh No, Not You Again!

Tony Hammond – 2007 October 02

In Identifiers

Oh dear. Yesterday’s post “Using ISO URNs” was way off the mark. I don’t know. I thought that walk after lunch had cleared my mind. But apparently not. I guess I was fixated on eyeballing the result in RDF/N3 rather than on the logic needed to arrive at that result.

(Continues.)

Using ISO URNs

Tony Hammond – 2007 October 01

In Identifiers

(Update - 2007.10.02: Just realized that there were some serious flaws in the post below regarding publication and form of namespace URIs which I’ve now addressed in a subsequent post here.)

By way of experimenting with a use case for ISO URNs, below is a listing of the document metadata for an arbitrary PDF. (You can judge for yourselves whether the metadata disclosed here is sufficient to describe the document.) Here, the metadata is taken from the information dictionary and from the document metadata stream (XMP packet).

The metadata is expressed in RDF/N3. That may not be a surprise for the XMP packet, which is serialized in RDF/XML, as it’s just a hop, skip and a jump to render it as RDF/N3 with properties taken from schemas whose namespaces are identified by URI. What may be more unusual is to see the document information dictionary metadata (the “normal” metadata in a PDF) rendered as RDF/N3, since the information dictionary is not modelled on RDF, not expressed in XML, and not namespaced. Here, in addition to the trusty HTTP URI scheme, I’ve made use of two particular URI schemes: “iso:” URN namespaces, and “data:” URIs.
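For readers who haven’t met “data:” URIs: they carry a value inline in the URI itself rather than pointing at a resource elsewhere. The sketch below only shows the general shape of the scheme (percent-encoded text after the comma). The helper names and the sample value are hypothetical, and nothing here is meant to reproduce how the listing in the full post actually uses them.

```python
from urllib.parse import quote, unquote


def to_data_uri(text: str, mediatype: str = "") -> str:
    """Wrap a plain-text value in a data: URI using percent-encoding.

    An empty mediatype is allowed by the scheme, in which case the
    default media type applies.
    """
    return f"data:{mediatype},{quote(text, safe='')}"


def from_data_uri(uri: str) -> str:
    """Recover the text from a percent-encoded data: URI.

    Base64 payloads are deliberately not handled in this sketch.
    """
    if not uri.startswith("data:"):
        raise ValueError("not a data: URI")
    header, comma, payload = uri[len("data:"):].partition(",")
    if not comma:
        raise ValueError("data: URI has no comma separator")
    if header.endswith(";base64"):
        raise ValueError("base64 payloads are not handled here")
    return unquote(payload)


# Hypothetical information-dictionary value, for illustration only.
uri = to_data_uri("Acrobat Distiller 8.0")
print(uri)
print(from_data_uri(uri))
```

A literal value wrapped this way can sit anywhere a URI is expected, which is presumably what makes the scheme handy for metadata that was never namespaced in the first place.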

(Continues.)

Whole Lotta ID

Tony Hammond – 2007 October 01

In Identifiers

ISO has registered with the IANA a URN namespace identifier (“iso:”) for ISO persistent resources. From the Internet-Draft: “This URN NID is intended for use for the identification of persistent resources published by the ISO standards body (including documents, document metadata, extracted resources such as standard schemata and standard value sets, and other resources).” The top-level grammar rules (ABNF) give some indication of scope:

    NSS = std-nss
    std-nss = "std:" docidentifier *supplement *docelement [addition]
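As a rough illustration of that top-level rule, the sketch below peels a URN apart into its NID and NSS and checks only that the NSS begins with “std:”. It makes no attempt to parse docidentifier, supplement, docelement or addition, whose rules are not quoted here. The function name is hypothetical, and the sample URN is made up for illustration and has not been checked against the full grammar.

```python
def split_iso_urn(urn: str) -> dict:
    """Split an ISO URN into NID, NSS and the part after "std:".

    Only the top-level shape is checked: urn:iso:std:<rest>.
    The "urn" prefix and the NID are compared case-insensitively;
    requiring a lowercase "std" is an assumption of this sketch.
    """
    parts = urn.split(":", 3)
    if len(parts) != 4:
        raise ValueError("expected urn:iso:std:<rest>")
    scheme, nid, kind, rest = parts
    if scheme.lower() != "urn" or nid.lower() != "iso":
        raise ValueError("not a URN in the iso namespace")
    if kind != "std" or not rest:
        raise ValueError("NSS does not match the std-nss rule")
    return {"nid": nid.lower(), "nss": f"{kind}:{rest}", "std_body": rest}


# Made-up example URN, for illustration only.
print(split_iso_urn("urn:iso:std:iso:9999:-1:ed-1:en"))
```

Everything after “std:” is returned untouched in `std_body`, so a fuller parser for the remaining rules could be layered on top.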

Authors in Context?

Tony Hammond – 2007 September 30

In ORCID

On the subject of author IDs (a subject Crossref is interested in and on which it held a meeting earlier this year, as blogged about here), this post by Karen Coyle “Name authority control, aka name identification” may be worth a read. She starts off with this: “Libraries do something they call “name authority control”. For most people in IT, this would be called “assigning unique identifiers to names.” Identifying authors is considered one of the essential aspects of library cataloging, and it isn’t done in any other bibliographic environment, as far as I know.

XMP-Ville

Tony Hammond – 2007 September 25

In XMP

Been so busy looking into the technical details of XMP that I almost forgot to check out the current landscape. Luckily I chanced on these articles by Ron Roszkiewicz for The Seybold Report (and apologies for lifting the title of this post from his last). The articles about XMP are well worth reading and chart the painful progress made to date:

  • The Brief Tortured Life of XMP (July ’05)
  • Thought Leaders Hammer out Metadata Standard (April ’07)
  • Metadata Persistence and “Save for Web…” (July ’07)

From the earlier characterization of XMP as “underachieving teenager” Roszkiewicz is cautiously optimistic that IDEAlliance’s XMP Open initiative (an initiative to advance XMP as an open industry specification) will help outreach and foster adoption of this fledgling technology.

(Continues.)