Digital archives update February 25, 2014
We are two and a half years into the three-year Digital Archives project, and we have been working hard on our pilot projects, in particular the records of the Special Commission of Inquiry into Electricity Transactions and the records of former State Premiers Carr, Keneally and Rees. We have developed two key applications. The first supports the migration of records into the repository, including the creation of additional copies of certain formats for preservation and access purposes. The second is a search tool that enables powerful full-text search across the records, presents record metadata and links to our archival context information.
Both applications are still under development. They operate alongside other tools we have adopted for specific purposes, such as metadata extraction, file format identification and format conversion, and alongside resources for recording and reusing metadata and preservation decisions, in the form of metadata and ‘preservation pathways’ registries. As with our past software development, we will make the latest versions of our tools available as open source on GitHub when they are ready.
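The post does not describe how the ‘preservation pathways’ registry is built. Purely as an illustration, assuming a pathway is a decision recorded once per source format (a preservation target, an access target and a rationale), the idea of recording a decision and reusing it across later migrations might be sketched in Python as below. The class names, fields and format labels are all hypothetical; they are not the archives' actual design or actual format decisions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pathway:
    """A recorded preservation decision for one source format (hypothetical model)."""
    source_format: str
    preservation_format: str
    access_format: str
    rationale: str


class PathwayRegistry:
    """Stores each decision once so that later migrations can reuse it."""

    def __init__(self) -> None:
        self._pathways: dict[str, Pathway] = {}

    def record(self, pathway: Pathway) -> None:
        # Refuse to silently overwrite an earlier, accountable decision.
        if pathway.source_format in self._pathways:
            raise ValueError(f"pathway already recorded for {pathway.source_format}")
        self._pathways[pathway.source_format] = pathway

    def plan(self, identified_formats: list[str]) -> tuple[dict[str, Pathway], list[str]]:
        """Split the formats identified in a transfer into those with a recorded
        pathway and those that still need a decision (each listed once)."""
        known: dict[str, Pathway] = {}
        undecided: list[str] = []
        for fmt in identified_formats:
            if fmt in self._pathways:
                known[fmt] = self._pathways[fmt]
            elif fmt not in undecided:
                undecided.append(fmt)
        return known, undecided


# Example with placeholder format labels.
registry = PathwayRegistry()
registry.record(Pathway("format-A", "format-A", "format-B", "illustrative entry only"))
known, undecided = registry.plan(["format-A", "format-C", "format-A", "format-C"])
print(sorted(known), undecided)  # ['format-A'] ['format-C']
```

Keying the registry on the identified format means format identification output feeds directly into migration planning, and any format without a recorded decision is surfaced for review rather than converted by default.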
Context wrangling
Work on the pilot projects has involved a lot of metadata analysis, in particular working out how to make the best use of the metadata that reflects the business context of the records (who? what? when?) and that enables relationships between records. Well-managed metadata gives records meaning and authenticity, supports powerful (and meaningful) searching and, of course, helps with managing preservation issues such as format obsolescence. So it is really important that the choices made during a migration project are based on an understanding not only of the business context in which the records were generated, but also of the needs for their continued management in their new setting: from likely access requirements by the agency to use of the records by the public online once they enter the open access period. Our metadata analysis and mapping approach is a critical part of making those choices in an accountable way.
Archival description for records that are generated in digital systems is ‘same, same but different’. Same in the sense that the records are, of course, still the products of business functions and activities, created by a government agency, and remain the responsibility of a government agency (more likely several agencies over time). However, there are interesting new considerations affecting how we, the archives, describe these records when they come into our domain.
For example, the records of the Special Commission of Inquiry into Electricity Transactions comprise over 40,000 digital objects that sit in a web of 3.5 GB of metadata. The records were created and kept in a highly configurable database application (‘Relativity’, set up and run as a service to government by a third-party provider), which enabled the Commission to carry out its inquiry into the sales and proposed sales of a number of electricity generation assets in 2010/11. The scale of this type of system, and the configurability of the way it structures records, make the listing methods used for transfers of paper files and other physical records, and reliance on the concept of original order, unsuitable for our purposes. Our aim, therefore, is to link the records to higher-level entities of business context, such as the creating agency, function and activity, and to rely on our metadata analysis and management to search for, and make connections between, the records that make up the system.
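To make the contrast with listing in original order more concrete, here is a minimal Python sketch of a record model in which each digital object carries a small set of persistently linked metadata (the kinds of fields mentioned in the comments on this post: date created, participants, relationships to other records and format) and links upward to context entities such as agency, function and activity. Connections between records are then derived from that metadata rather than from each record's position in a listing. All class and function names, and the sample objects, are hypothetical; this is not the archives' actual data model.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class ContextEntity:
    """A higher-level entity of business context (hypothetical model)."""
    kind: str  # e.g. "agency", "function" or "activity"
    name: str


@dataclass
class DigitalObject:
    """One migrated object with a small set of persistently linked metadata."""
    object_id: str
    date_created: date
    file_format: str
    participants: set[str] = field(default_factory=set)
    related_ids: set[str] = field(default_factory=set)
    context: set[ContextEntity] = field(default_factory=set)


def related_objects(target: DigitalObject, objects: list[DigitalObject]) -> list[str]:
    """Return IDs of objects linked to `target` by an explicit relationship
    (in either direction) or by sharing at least one participant."""
    result = []
    for obj in objects:
        if obj.object_id == target.object_id:
            continue
        explicit = (obj.object_id in target.related_ids
                    or target.object_id in obj.related_ids)
        shared_participant = bool(obj.participants & target.participants)
        if explicit or shared_participant:
            result.append(obj.object_id)
    return sorted(result)


# Invented sample data: four objects linked to one creating agency.
commission = ContextEntity("agency", "Special Commission of Inquiry into Electricity Transactions")
a = DigitalObject("obj-1", date(2010, 11, 2), "PDF", {"Person A"}, {"obj-3"}, {commission})
b = DigitalObject("obj-2", date(2010, 11, 5), "DOCX", {"Person A"}, set(), {commission})
c = DigitalObject("obj-3", date(2011, 1, 10), "TIFF", {"Person B"}, set(), {commission})
d = DigitalObject("obj-4", date(2011, 1, 12), "PDF", {"Person C"}, set(), {commission})
objects = [a, b, c, d]
print(related_objects(a, objects))  # ['obj-2', 'obj-3']
```

Keeping context entities separate from the objects means the higher-level description can be maintained once, while the finer-grained connections come from whatever metadata the analysis decides to keep.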
There is also the question of how the migration process should be documented and shared with users of the records, so that they have important information about the choices made during the migration regarding the format, presentation and structuring of the records. Even where records are not due to be made available to the public for a long period, it is vital that this information is captured and persistently linked to the records, along with other contextual information and the higher-level description mentioned earlier. It may not be what many archivists would regard as traditional description: it could consist of screenshots, user manuals and other system documentation.
Access
Throughout the development of our approach to digital archives we have been keen to keep access at the front of our minds, even though many of the records in our pilot projects have access directions that close them until they are 30 years old. The exceptions are largely records that can be made available under Early Access provisions because they are already in the public domain (such as published media releases or final reports of Inquiries). However, the agency responsible for the records has an ongoing right of access, and may from time to time need to search them to respond to requests for access under GIPA (FOI), or for its own business purposes. Naturally we want to provide this access in a way that is easy for the agency, while also ensuring a very high degree of security and protection against inadvertent release of sensitive information. Meeting this challenge requires talking with our agency stakeholders to understand their needs, and looking at access delivery solutions with a focus on risk management.
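The access rules described above can be sketched as a single decision function. This is a simplified Python sketch that assumes a single 30-year closure period counted from the record's creation date, an Early Access flag, and an ongoing right of access for the responsible agency. Real access directions are more nuanced, and the function and parameter names are hypothetical rather than part of any actual system.

```python
from datetime import date

CLOSURE_PERIOD_YEARS = 30  # closure period mentioned in the post


def add_years(d: date, years: int) -> date:
    """Add whole years, moving 29 February to 28 February in non-leap years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def may_access(requester: str, created: date, early_access: bool,
               responsible_agency: str, today: date) -> bool:
    """Decide whether `requester` may see a record (hypothetical, simplified rules)."""
    if requester == responsible_agency:
        return True  # the responsible agency keeps an ongoing right of access
    if early_access:
        return True  # e.g. material already in the public domain
    # Otherwise the record stays closed until it is 30 years old.
    return today >= add_years(created, CLOSURE_PERIOD_YEARS)


today = date(2014, 2, 25)
created = date(2011, 3, 1)
print(may_access("public", created, False, "Agency X", today))    # False: still closed
print(may_access("Agency X", created, False, "Agency X", today))  # True: responsible agency
print(may_access("public", created, True, "Agency X", today))     # True: Early Access
```

In practice the responsible-agency branch is where most of the security effort would sit, since it grants access to closed material; this sketch deliberately leaves authentication and release controls out of scope.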
What’s next?
Late last year we issued an exposure draft of the Digital Archives Migration Methodology. It will provide a framework within which we will work with NSW Government agencies to set up, plan and carry out migration projects. We are now reviewing the comments we received and preparing the final version for release mid-year, along with other tools and guidance that will enable agencies to commence projects with us. We will be talking a lot more about it in presentations and online. Stay tuned to our usual communications for more on this move to being ‘open for business’.
Hi Cassie
A very interesting post. Did the migration process from the ‘Relativity’ application give you 40,000 separate objects to ingest into your archive?
I’d also be interested to hear what you did with all that metadata. If my sums are correct, 35 GB over 40,000 digital objects works out to about 1 MB per object. I guess a lot of it might be structural metadata, or repeated over a number of objects. Did some of the metadata become new digital objects?
Hi Neal, sorry, that’s a typo. It’s actually about 3.5 GB of metadata (thanks for bringing it to our attention, now corrected :)). During the analysis process we identify a relatively small subset that will be persistently linked to digital objects in standard formats (date created, participants in the business the record documents, relationships to other records, format/s), and we add our own metadata (access direction, disposal rule etc.). I think there is other, non-object-related metadata there that can be used to show system structures and processes; we’re still playing with this.
Cheers
Cassie