URN inside XMP

Introduction

The XMP (Extensible Metadata Platform) specification, developed by Adobe, combines camera/machine-created EXIF metadata and human/manually created IPTC metadata for media such as images (but also PDFs) into a common container.

What are the advantages of XMP?

  1. All metadata are stored inside the files themselves, so they should never ever get lost or separated from the media they’re describing.
  2. Using a standardized metadata format makes the user independent of specific systems and applications (no vendor lock-in), allows easy exchange with partners, and preserves long-term access to the metadata.
  3. For use in databases/repositories, we can export just the metadata into separate files, so that the bulk of the data, the images themselves, can be stored on the file system, while the tiny XMP files are easily processed by the database.
  4. Best of all, it allows an easy integration for use in an XML database/repository.

In order to make this work, a slight adaptation is necessary so that the metadata include the URNs used in the ICRIM framework.

Using URNs inside XMP

In order to use XMP inside the ICRIM framework, we must ensure that the architectural building blocks of our framework, URNs which point to resources, can be used here, too. There are two tags we’re interested in:

Title
The Object URN, i.e. the name of the original digital file (or of the file resulting from a scan) without extension, encoded as a URN. For example, urn:icrim:0010:img:dig:dcim0001717.
SubjectCode
The Subject URN(s), i.e. the URN of the subject represented; multiple URNs are separated by spaces. For example, urn:icrim:0010:scu:hm:264 for inv. Hm 264 of the sculpture collection.
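To make the two conventions concrete, here is a minimal Python sketch. It assumes, from the examples above, that an Object URN is a fixed prefix plus the file name without extension, and that several Subject URNs are joined by single spaces. The function names (object_urn, join_subject_urns, split_subject_urns) and the default prefix are hypothetical, and the syntax check is a deliberately loose approximation of the generic urn:<NID>:<NSS> form, not a full RFC 8141 validator.

```python
import re
from pathlib import PurePath

# Loose URN check: "urn:" + namespace ID + ":" + namespace-specific string.
# Only an approximation of RFC 8141, good enough to catch obvious typos.
URN_RE = re.compile(r"^urn:[a-z0-9][a-z0-9-]{0,31}:\S+$", re.IGNORECASE)


def object_urn(filename: str,
               prefix: str = "urn:icrim:0010:img:dig",  # hypothetical default
               lower: bool = True) -> str:
    """Build an Object URN for the Title tag from an image file name.

    The extension is dropped, mirroring urn:icrim:0010:img:dig:dcim0001717.
    """
    stem = PurePath(filename).stem
    return f"{prefix}:{stem.lower() if lower else stem}"


def join_subject_urns(urns: list[str]) -> str:
    """Join Subject URNs into one space-separated SubjectCode value."""
    for urn in urns:
        if not URN_RE.match(urn):
            raise ValueError(f"not a URN: {urn!r}")
    return " ".join(urns)


def split_subject_urns(subject_code: str) -> list[str]:
    """Split a space-separated SubjectCode value back into single URNs."""
    return subject_code.split()


if __name__ == "__main__":
    print(object_urn("DCIM0001717.JPG"))
    code = join_subject_urns(["urn:icrim:0010:scu:hm:264",
                              "urn:collectio:0001:scu:00299"])
    print(code)
    print(split_subject_urns(code))
```

For a scan from a positive you would only swap the prefix, e.g. object_urn("brogi1929.tif", prefix="urn:icrim:0001:img:pos"). Lower-casing is a choice taken from the examples, not a rule: the namespace-specific part of a URN may be case-sensitive, so pass lower=False if your scheme keeps capitals (as urn:collectio:0001:foto:A:001518 does).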

Example

Musei Capitolini Neg.A1518
The example image at the right incorporates the following IPTC information: Caption, Author, Headline, Title, Date of Creation, Place, etc. You can inspect it directly in your browser using the Firefox extension Exif Viewer, or simply download it and open it in a metadata-aware application like Photoshop, IrfanView or similar. To give you an idea of what the metadata look like when encoded as XML, we’ve exported the image’s IPTC information into an external XMP file, which should open in your browser as a raw XML file.

The following is an overview of the IPTC tags used in the above image (photo A1518, showing the sculpture inv. no. S 299). Your own images might have a few more tags (contact info or the like), but at least the tags Headline/Caption/Creator/Title/SubjectCode should always be present. Please remember: this is about IPTC, not EXIF tags, which are far more numerous and record camera metadata only (and are therefore not relevant to this demonstration).

Group | Element | Value | Description | IPTC-NAA (IIM) field
Content | Denomination | Vecchia ebbra | Short description (1-2 words) | Headline
Content | Description | Statua di vecchia donna ubriaca inv. S 299 | Description (1-2 sentences) + inv. no. | Caption/Description
Content | Inventory | urn:collectio:0001:scu:00299 | Inv. no. using the URN system | Subject Code
Location | Location | Musei Capitolini | Name of location | Content Location Name
Location | City | Roma | Name of city | City
Location | Province | Lazio | Name of province or state | Province/State
Location | Country | Italy | Name of country | Country
Rights | Provider | Musei Capitolini / Comune di Roma | Name of provider retaining the image’s rights | Provider / Credit
Rights | Copyright | Copyright 2003 Musei Capitolini, all rights reserved | Copyright declaration | Copyright Notice
Image | File Name | urn:collectio:0001:foto:A:001518 | If digital: name of file; if analogue: name of negative, positive or print | Title
Image | Job | Campagna fotografica BNL | Notes about the photographic campaign | Job Identifier / Original Transmission Reference
Image | Photographer | De Masi | Photographer’s name | Creator
Image | Creation Date | 1983-04-03 | Date of the image’s original creation | Date Created
Image | Digitalisation Date | 2008-04-22 | Date of digitalisation (may differ from creation date) | Date Time
Image | Source | Collezione Castellani | The image’s origin (collection, donation etc.) | Source
Image | Instructions | Scan from positive, Agfascan 4000, 18x24cm at 600ppi, 8bit, Adobe RGB | Technical notes, e.g. regarding the scanning process, the original material etc. | Instructions
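For readers who want to see how the key fields of the table end up in an XMP file, here is a minimal sketch using only Python’s standard library. It assumes the usual IPTC Core mapping of these fields to XMP properties (Title to dc:title, Caption to dc:description, Creator to dc:creator, Headline to photoshop:Headline, Subject Code to Iptc4xmpCore:SubjectCode); the function build_xmp is hypothetical, and a real tool such as ExifTool writes a much fuller packet.

```python
import xml.etree.ElementTree as ET

# Namespace URIs of the XMP wrapper, RDF, Dublin Core, Photoshop and IPTC Core.
NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "photoshop": "http://ns.adobe.com/photoshop/1.0/",
    "Iptc4xmpCore": "http://iptc.org/std/Iptc4xmpCore/1.0/xmlns/",
}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Make ElementTree emit the familiar prefixes instead of ns0, ns1, ...
for _prefix, _uri in NS.items():
    ET.register_namespace(_prefix, _uri)


def q(prefix: str, local: str) -> str:
    """Qualified name in ElementTree's '{uri}local' notation."""
    return f"{{{NS[prefix]}}}{local}"


def add_container(parent, prop, kind, items, lang=False):
    """Add property `prop` holding an rdf:Alt, rdf:Seq or rdf:Bag of items."""
    container = ET.SubElement(ET.SubElement(parent, prop), q("rdf", kind))
    for item in items:
        li = ET.SubElement(container, q("rdf", "li"))
        if lang:  # language alternatives need an xml:lang on each item
            li.set(XML_LANG, "x-default")
        li.text = item


def build_xmp(title, caption, creator, headline, subject_urns):
    """Return a minimal XMP packet (as a string) for one image."""
    root = ET.Element(q("x", "xmpmeta"))
    rdf = ET.SubElement(root, q("rdf", "RDF"))
    desc = ET.SubElement(rdf, q("rdf", "Description"), {q("rdf", "about"): ""})
    add_container(desc, q("dc", "title"), "Alt", [title], lang=True)
    add_container(desc, q("dc", "description"), "Alt", [caption], lang=True)
    add_container(desc, q("dc", "creator"), "Seq", [creator])
    ET.SubElement(desc, q("photoshop", "Headline")).text = headline
    add_container(desc, q("Iptc4xmpCore", "SubjectCode"), "Bag", subject_urns)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_xmp(
        title="urn:collectio:0001:foto:A:001518",
        caption="Statua di vecchia donna ubriaca inv. S 299",
        creator="De Masi",
        headline="Vecchia ebbra",
        subject_urns=["urn:collectio:0001:scu:00299"],
    ))
```

The sketch stores each Subject URN as a separate item of the SubjectCode bag, which is how XMP represents repeatable fields; to keep a single space-separated string instead, pass it as a one-element list.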

Applications which support XMP

You might ask: How can I generate XMP metadata? Are there any free or at least accessible tools? In fact, there are quite a few, with many more projects turning up every month or so.

For a start, lists of applications which support XMP are available, e.g., from the Wikipedia entry or from the IPTC website. These lists are far from exhaustive: for example, the small but excellent PhotoME, a pure FLOSS application, is under very active development. In the GNU/Linux world, Mapivi is still the frontrunner, but Sagittarius is coming on strong, and with a convincing GUI. Gwenview, making use of the Kipi-Plugins, is also catching up. Rumour has it that the Gnome project will support XMP, too, the way Microsoft does natively in its desktop; in the meantime, there’s an XMPManager add-on for Nautilus. You might keep an eye on the Photobuntu blog for more news.

The above apps are mostly for the casual user, though. Most pro users will fall into one of two categories, each with its corresponding software setup: digital photographers will use apps like Adobe’s Lightroom, Apple’s Aperture, Bibble Labs’ Bibble, LightZone by Light Crafts etc., all of which support full XMP editing. Digital asset managers, instead, will use Canto’s Cumulus, Portfolio by Extensis etc., all of which permit management of metadata according to the XMP standard, too.

So, either way, all the applications needed for creating/editing metadata according to XMP are already in place!

Proposed Workflow for Images

So now you know nearly everything you need to know about XMP, but how do you actually do the trick? Here’s our preferred workflow as of now, to be taken cum grano salis, as everything might change with new releases or changing circumstances. And if you expect us to plug a particular application: no, we don’t have a preferred app, we just hack away with whatever does the job best.

  1. Capturing images: Depending on the originals, there are two ways:
    • scanning the originals, in the case of film negatives/positives, or
    • digitally capturing the scene or objects using DigiCams.

    Either way, you end up with a digital data stream.

  2. Saving the data stream: The data stream delivered by the scanner or digital image sensor has to be saved:

    • In the case of scanned film negatives/positives, you might want to save it as a 16bit or 8bit TIFF. Unfortunately, PNG, which otherwise is a very nice standard format for digital images, is poorly supported when it comes to XMP: although the XMP specification defines how to embed metadata in a PNG iTXt chunk, support in applications is still patchy. This might change in the future, so keep an eye on it.
    • With original digital images delivered by DigiCams, you might want to save the original RAW file, which preserves the original data in a way nothing else does. If you have second thoughts about archiving your camera’s proprietary RAW format long-term, or have a particularly demanding post-processing workflow, you might want to convert it into DNG or 16bit TIFF, too. Please: never ever throw away the RAW files; they are what the original slides/negatives were in the analogue/film era!
  3. (Re-)Naming conventions I: At the same time, the data stream has to be given a name according to some predefined naming scheme:

    • In the case of scanned film negatives/positives, you might want to give it a name reflecting the original material, say, brogi1929 for a scan of an image from the Brogi collection, no. 1929. Please note: the scan itself is not a new artwork, so it does not need a separate, distinctive name!
    • With original digital images delivered by DigiCams, you might well go with the standard naming conventions of your camera, such as DCIM00001, DCIM00002 or whatever.
  4. Naming conventions II: For our purposes, the naming scheme has to be extended to incorporate the URN schema. We prefer something like urn:icrim:0001:img:dig|pos|neg:[resource] for original digital captures, scans from positives, or scans from negatives, respectively. So brogi1929 will become urn:icrim:0001:img:pos:brogi1929 (for a scan from an 8×10″ positive), and dcim00001 will become urn:icrim:0001:img:dig:dcim00001.

  5. Creating the IPTC metadata: Either use your existing image management tools (Aperture, Capture NX, Lightroom, LightZone, &c.) or your favorite metadata manager (Sagittarius, PhotoME, Mapivi, Gwenview, &c.; see above for a discussion of available applications) to enrich your metadata with at least the following tags: Headline/Caption/Creator/Title/SubjectCode. Again, see above for an example.

    If you want, you can add more tags as you like – for copyright information, or place/time including GPS info. If you’re interested in the latter, GPS Correlate will do the job for you on Linux platforms.

  6. Creating derivative files for web view: Most likely you won’t want to burden your website, database, repository or LAN environment with the heavy original 12/14/16bit files. Instead, you’ll create thumbnail, preview, medium and large versions (most probably as JPEGs) of the original files, all the while preserving the metadata stored inside.

    One likely friend of yours will be ImageMagick’s mogrify, which lets you transform your images on the command line; useful if you’re dealing with a remote server. For example, the command

    mogrify -resize 400x400 -quality 80 -font FreeSerif -pointsize 35 -verbose -draw "gravity south fill black text 0,33 'Musei Capitolini' fill white text 1,32 'Musei Capitolini'" *.jpg

    will resize all JPEGs to fit within 400×400 pixels and superimpose the copyright text as white lettering with a black shadow. Note that mogrify overwrites the files in place, so run it on copies of your originals. With screen you can even run the session and detach yourself from it (CTRL-A CTRL-D) while it’s running.

  7. Creating sidecar XMP files for the database: You now have to export the metadata into separate XMP files, something most image management applications can do for whole collections at a time.

    ExifTool will also do the job: exiftool -o %d%f.xmp [dir|files] exports the IPTC information of each image into an XMP sidecar file with the same base name, in the same directory.

    These sidecar files (plain, simple XML files with metadata encoded according to the XMP schema) can now be imported into your database or repository, and then automatically linked by the application to the different image files, like thumbnails, previews, and so on.

  8. Setting up your web interface: The final step is setting up your web interface so that the search results of your database/repository point to the different image files, like thumbnails, previews, and so on. Normally, you’d do this at the application level, for example using Apache Cocoon (which in turn is used by eXist; see the discussion about repositories), so you can let the application decide at which point to show the thumbnail, a preview, or the whole JPEG.
  9. p.s. Exporting from an existing database into XMP sidecar files: You might already have an existing database, most probably some ephemeral Microsoft “technology”, which every day acts more and more like a black hole. Don’t despair, there are ways out. First, try to export your db either directly into an XML flavour or into a simple CSV file; in the latter case, you’d have to build a new XML file using, e.g., Oxygen. Then you can build the single XMP files through XSLT 2.0, the new specification, which allows multiple output documents. Admittedly, some coding is involved; mail me if you’d like a specimen. The single XMPs can then be written into the image files using the ExifTool command exiftool -tagsFromFile %d%f.xmp [dir|files]. For this to work, the sidecar XMP files must have the same name (apart from the extension) as the image files.
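Instead of XSLT 2.0, the CSV-to-sidecar step can also be sketched in a few lines of Python; this is a swapped-in technique, not the method described above. The sketch assumes a CSV export with hypothetical columns filename, title, subject_code (space-separated URNs) and headline, and writes one minimal XMP file per row, named after the image with an .xmp extension so that exiftool -tagsFromFile %d%f.xmp can pick it up. The property mapping assumes the IPTC Core conventions; extend the template for the other fields you need.

```python
import csv
import io
from pathlib import Path
from xml.sax.saxutils import escape

# Minimal XMP template; placeholders are filled per CSV row.
TEMPLATE = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:photoshop="http://ns.adobe.com/photoshop/1.0/"
    xmlns:Iptc4xmpCore="http://iptc.org/std/Iptc4xmpCore/1.0/xmlns/">
   <dc:title><rdf:Alt><rdf:li xml:lang="x-default">{title}</rdf:li></rdf:Alt></dc:title>
   <photoshop:Headline>{headline}</photoshop:Headline>
   <Iptc4xmpCore:SubjectCode><rdf:Bag>{subjects}</rdf:Bag></Iptc4xmpCore:SubjectCode>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>
"""


def row_to_xmp(row: dict) -> str:
    """Render one CSV row (hypothetical column names) as an XMP packet."""
    # Each space-separated Subject URN becomes its own rdf:li in the bag.
    subjects = "".join(f"<rdf:li>{escape(urn)}</rdf:li>"
                       for urn in row["subject_code"].split())
    return TEMPLATE.format(title=escape(row["title"]),
                           headline=escape(row["headline"]),
                           subjects=subjects)


def write_sidecars(csv_text: str, image_dir: Path) -> list[Path]:
    """Write one <image base name>.xmp per CSV row into image_dir."""
    written = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Same base name as the image: A001518.tif -> A001518.xmp
        target = image_dir / (Path(row["filename"]).stem + ".xmp")
        target.write_text(row_to_xmp(row), encoding="utf-8")
        written.append(target)
    return written
```

A typical call would be write_sidecars(Path("export.csv").read_text(encoding="utf-8"), Path("images")), followed by the exiftool -tagsFromFile command above. Escaping the values matters because captions and titles exported from an old database may contain characters such as & or <.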

That’s it. Managing metadata for media is not a bother anymore.

This entry was posted in Annotation.
