Digital archives team drops its warez on GitHub
August 10, 2012

When State Records started building a whole of government digital archive for New South Wales, we committed to publishing new software developed for the project as free and open source software. We rely heavily on software and web services shared freely by members of the digital preservation community, such as the PRONOM technical registry from the National Archives (UK) and Xena from the National Archives of Australia. We hope, in turn, that the software we publish will prove of use, or of interest, to others.

Why develop new digital preservation tools? We have been conscious of the importance of not ‘re-inventing the wheel’ and, wherever possible, are adopting or adapting existing tools. This avoids waste, pools resources, and means we can take advantage of the great software that is already available. However, we do require a software solution that supports our general approach to digital preservation (see /digital-archive/), and this demands:

  • a flexible workflow solution that can be customised for each digital archives migration project
  • a flexible approach to file format conversion that can be adapted according to the needs of each digital archives migration project
  • and a flexible approach to managing metadata.

Accordingly, we are developing three key tools: the Digital Archives Workflow Controller, the Digital Archives Preservation Pathways Registry, and the Digital Archives Metadata Registry. All three are available (in beta) under the GNU General Public License (version 3) on State Records’ GitHub repository.

Digital Archives Workflow Controller

https://github.com/srnsw/Workflow

A flexible platform for orchestrating digital preservation workflows. Custom workflows are defined for specific digital archives migration projects. These workflows are submitted to the workflow tool in a custom XML format along with digital records. The workflow tool then calls out to different applications and web services as defined in that XML file. The workflow tool has both command line and web service interfaces.
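
To give a flavour of the idea, here is a minimal sketch in Python. It is not code from the Workflow repository: the XML element and attribute names (`workflow`, `step`, `action`, `target`), the handler functions, and the record structure are all invented for illustration, and the real XML format may look quite different.

```python
import xml.etree.ElementTree as ET

# Hypothetical workflow definition. The element and attribute names are
# assumptions made for this sketch, not the controller's actual schema.
WORKFLOW_XML = """
<workflow name="example-migration">
  <step action="identify"/>
  <step action="convert" target="fmt/18"/>
  <step action="checksum"/>
</workflow>
"""

# Stand-ins for the external applications and web services a real
# workflow would call out to. Each one just notes that it ran.
def identify(record, params):
    record["log"].append("identified")
    return record

def convert(record, params):
    record["log"].append("converted to " + params.get("target", "unknown"))
    return record

def checksum(record, params):
    record["log"].append("checksummed")
    return record

# Maps the action named in the XML to the code that performs it.
HANDLERS = {"identify": identify, "convert": convert, "checksum": checksum}

def run_workflow(xml_text, record):
    """Run each <step> in document order against one record."""
    root = ET.fromstring(xml_text)
    for step in root.findall("step"):
        params = dict(step.attrib)
        action = params.pop("action")
        record = HANDLERS[action](record, params)  # KeyError if unknown
    return record

result = run_workflow(WORKFLOW_XML, {"name": "report.doc", "log": []})
print(result["log"])
# → ['identified', 'converted to fmt/18', 'checksummed']
```

The point of the sketch is the separation it shows: the workflow itself lives in data (the XML), so a new migration project means a new XML file rather than new code.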

Digital Archives Preservation Pathways Registry

https://github.com/srnsw/Preservation-Pathway

This application records preferences for file format conversion operations. Each preference is, in essence, a recommendation to turn an input file identified by one PUID (X) into the format identified by another PUID (Y), using identifiers from the UK National Archives’ PRONOM registry. Preferences can be registered for different purposes: for “access” purposes we might recommend DOC->PDF, but for “preservation” purposes we might suggest DOC->ODF (just an example, not an actual policy in the registry).

The registry is publicly available as a handy reference for the NSW jurisdiction, so that an agency that encounters records in a certain format can quickly find State Records’ recommended pathway for that format. It also produces machine-readable output (JSON) that the Digital Archives Workflow Controller can consume to automate format conversions where appropriate.
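
As a rough illustration of how a consumer might use that JSON, here is a small Python sketch. The JSON layout (a `pathways` list with `input`, `purpose` and `output` keys) is an assumption made for this example and is not the registry's documented output, and the PUID values are placeholders in PRONOM style that should be checked against PRONOM itself. Like the DOC->PDF example above, the pathways shown are not actual policy.

```python
import json

# Hypothetical registry output. Key names and PUID values are
# illustrative assumptions, not the registry's real schema or policy.
REGISTRY_JSON = """
{
  "pathways": [
    {"input": "fmt/40", "purpose": "access", "output": "fmt/18"},
    {"input": "fmt/40", "purpose": "preservation", "output": "fmt/136"}
  ]
}
"""

def load_pathways(text):
    """Index pathways by (input PUID, purpose) for quick lookup."""
    data = json.loads(text)
    return {(p["input"], p["purpose"]): p["output"] for p in data["pathways"]}

def recommend(pathways, puid, purpose):
    """Return the recommended target PUID, or None if none is registered."""
    return pathways.get((puid, purpose))

pathways = load_pathways(REGISTRY_JSON)
print(recommend(pathways, "fmt/40", "access"))        # → fmt/18
print(recommend(pathways, "fmt/40", "preservation"))  # → fmt/136
print(recommend(pathways, "fmt/999", "access"))       # → None
```

Keying the lookup on both the input format and the purpose is what lets the same source format follow different pathways for access and for preservation.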

Digital Archives Metadata Registry

https://github.com/srnsw/Metadata-Registry

The Digital Archives Metadata Registry is a publicly accessible web service capable of:

  • allowing Digital Archives staff to progressively register preferences for published metadata terms (e.g. Dublin Core) to represent common metadata elements in the digital archives
  • allowing Digital Archives staff to progressively coin new terms (by providing a URI and description) to represent metadata elements in the digital archives for which no suitable published term can be identified
  • informing NSW government agencies wishing to transfer digital archives of State Records’ metadata preferences
  • informing users accessing the digital archives of the full set of searchable metadata fields in the system
  • providing a ‘best practice’ reference for NSW government agencies wishing to standardise metadata used in agency recordkeeping systems
  • providing a useful resource for the digital preservation and recordkeeping communities.
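
The first two capabilities in the list above (preferring a published term, or coining a new one with a URI and description) can be sketched in a few lines of Python. This is only a toy model and not the Metadata Registry's actual design: the class, method and field names are invented, and the `example.org` URI is a made-up placeholder. The Dublin Core URI is the published one for `title`.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Term:
    element: str      # the metadata element as used in the archive
    uri: str          # URI that identifies the term
    description: str
    coined: bool      # True if no suitable published term was found

class MetadataRegistry:
    """Toy in-memory registry; names here are hypothetical."""

    def __init__(self):
        self._terms = {}

    def prefer_published(self, element, uri, description):
        # Record a preference for an existing published term.
        self._terms[element] = Term(element, uri, description, coined=False)

    def coin(self, element, uri, description):
        # Coin a new term only where nothing is registered already.
        if element in self._terms:
            raise ValueError(f"{element!r} already has a registered term")
        self._terms[element] = Term(element, uri, description, coined=True)

    def lookup(self, element):
        return self._terms.get(element)

    def searchable_fields(self):
        # The full set of registered elements, in alphabetical order.
        return sorted(self._terms)

registry = MetadataRegistry()
registry.prefer_published(
    "title", "http://purl.org/dc/terms/title", "Name of the record")
registry.coin(
    "transfer_batch", "http://example.org/terms/transferBatch",
    "Hypothetical identifier for the batch a record arrived in")

print(registry.searchable_fields())   # → ['title', 'transfer_batch']
print(registry.lookup("title").coined)  # → False
```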

Special thanks

A big thanks to our development team, Nott and Ken, for their hard work on these and other digital archives projects.

[Image: Morpheus and Neo. Caption: "Morpheus: the red pill or the blue pill, Neo?"]