Future Perfect: Digital preservation by design April 5, 2012
A couple of weeks ago I went to Future Perfect: Digital preservation by design 2012, organised by Archives New Zealand and held at the national museum, Te Papa, in Wellington. The conference is now in its second year, and it’s fantastic to have a digital preservation-themed event like this happening in our region – no such event exists, to my knowledge, in Australia. It attracted people working in digital preservation from libraries, archives, museums and government organisations around the world, including speakers and attendees from the UK, USA, Denmark, Germany and the Czech Republic.
I thought I would share some of the bits of the two days that I found most interesting (bearing in mind that the sessions were split into streams, which made for some very difficult choices).
Jeff Rothenberg, formerly of the RAND Corporation, is a computer scientist who has written and spoken extensively on the subject of digital preservation for the past 20 years. Jeff is well known as a fan of emulation-based approaches to digital preservation, so it was interesting to hear him describe the migration of digital records over time as being like playing Chinese whispers and ‘accumulating corruption’. I’m not sure that corruption is the right word. All business records go through migrations and incur some change during their existence. Whether the change introduced by migration is tolerable depends on the recordkeeping requirement. I also have difficulty with concern for preservation of the ‘original artifact’ as a measure of authenticity. In digital recordkeeping, it is the tracking and documenting of processes carried out on records, in an ever-expanding corpus of recordkeeping metadata, that can attest to authenticity, not appearance. It was a pleasure the day after the conference to be able to participate in a meeting with Jeff and other delegates to talk through some of these issues. I must admit I am still not entirely clear on the benefits of emulation as a digital preservation strategy – to learn more I would recommend reading a recent blog post on the subject by Euan Cochrane: Incorporating Emulation into a ‘business as usual’ digital preservation workflow.
Ross Spencer works in the Digital Preservation Department of The National Archives (UK) as a Digital Preservation Researcher. I was very interested in what Ross had to say because we have been testing the TNA tool DROID and making use of their file format registry PRONOM – and plan to use both as part of our swag of digital preservation tools in NSW. TNA have built a community for format identification and analysis around DROID and PRONOM, including people from Archives NZ and the PLANETS Project. Such a collaborative approach is necessary for libraries and archives to properly track and maintain up-to-date file format information – none of us should have to do it on our own. Excitingly, Ross spoke about a new initiative to make PRONOM available as linked open data. He also mentioned a new DROID suite of tools including a signature development utility. One of the points Ross made that really resonated with me was that TNA are less concerned about format obsolescence these days than they used to be. There are other more pressing problems for them – namely the diversity, variability and quantity of records they have responsibility for. Above all, they need to be flexible and scalable in their approach, something we have recognised in shaping our strategy for digital archives here in NSW.
Jay Gattuso from the National Library of New Zealand spoke about how the Library is using PRONOM and DROID, and also talked a bit about Rosetta, their repository system. Jay explained that, from their point of view, file format type, once identified and registered, drives all other functions, including metadata extraction, rendering and risk management over time.
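As a rough illustration of that idea (and not a description of how DROID, PRONOM or Rosetta are actually implemented), here is a minimal Python sketch in which a file is identified from its leading bytes and the identified format then selects the downstream steps. The signature table, the format labels and the plan names are all made up for the example; real PRONOM identifiers and signatures are considerably richer.

```python
from typing import Dict, List, Optional

# Hypothetical signature table: leading "magic" bytes -> made-up format label.
# These labels are placeholders, NOT real PRONOM identifiers.
SIGNATURES: Dict[bytes, str] = {
    b"%PDF-": "example/pdf",
    b"\x89PNG\r\n\x1a\n": "example/png",
    b"PK\x03\x04": "example/zip",
}

# Hypothetical downstream steps chosen by format label.
PLANS: Dict[str, List[str]] = {
    "example/pdf": ["extract document metadata", "render with PDF viewer"],
    "example/png": ["extract image dimensions", "render with image viewer"],
    "example/zip": ["list container contents", "identify each member"],
}


def identify(data: bytes) -> Optional[str]:
    """Return the label of the first signature the data starts with, or None."""
    for magic, label in SIGNATURES.items():
        if data.startswith(magic):
            return label
    return None


def plan_for(data: bytes) -> Dict[str, object]:
    """Identify the format, then let the result decide what happens next."""
    label = identify(data)
    if label is None:
        # Unidentified content is treated as a risk to be reviewed by a person.
        return {"format": None, "steps": ["flag for manual review"]}
    return {"format": label, "steps": PLANS[label]}


if __name__ == "__main__":
    print(plan_for(b"%PDF-1.4 sample content"))
    print(plan_for(b"plain text with no known signature"))
```

The point of the sketch is only the shape of the workflow: identification comes first, and everything else is looked up from its result.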
Kris Carpenter Negulescu from the Internet Archive talked about their ‘unorthodox digital repository strategy’. Their Web archive has 175+ billion (!) unique publicly accessible instances dating from 1996 to the present. Interestingly, in addition to 7 petabytes of publicly available information, the Internet Archive holds 5 petabytes of material under embargo by donors. It was interesting to hear in Kris’ talk about the rate of technology change and refresh at the IA. They need to migrate to next-generation hardware every 2 years, which is essential given their rate of growth and corresponding electricity and space needs.
It was great at Future Perfect to hear about what some of the data repositories are doing. The Australian Data Archive have an open source tool called ADAPT, available on GitHub at https://github.com/ANUSF/adapt, which allows contributors to upload data files to the ADA and associate rich metadata in Data Documentation Initiative (DDI) format. The team from Statistics NZ gave a great presentation on the data use issues of authenticity and meaning, which sounded very much like our concerns in recordkeeping. They also use DDI, along with the Dublin Core and PREMIS standards, to give their data meaning and trustworthiness. I particularly enjoyed Sally Vermaaten’s Metadata top tips, which I tweeted:
“Metadata top tip (MTT) #1: Create structures that will allow you to re-use metadata tools #fp2012”
“MTT #2: Use fit for purpose standards eg NZ Stats use DDI for statistical data, SIARD for databases #fp2012”
“MTT #3: Consider overlap of standards & choose best standard to use eg format in PREMIS #fp2012”
“MTT #4: Give experts tools to capture metadata eg statisticians using a tool to generate metadata behind the scenes while they work #fp2012”
Alison Fleming and Michael Upton from Archives New Zealand reminded us not to overlook the appraisal and descriptive standards issues that come with digital archives. Michael Upton struck a chord when he spoke about the hugely varied levels of capability and inclination to work on digital continuity projects in government agencies. He also noted some work they are doing with defunct agencies like Royal Commissions. So many archives find that these are the only government entities with real and pressing motivation to transfer – something I suspect we have been in denial about, and need to acknowledge in our approaches. Opening up the postcustodial conversation is one aspect of this, I think. Otherwise we could well, I suspect, have archival researchers of the future under the impression that turn of the millennium society was entirely composed of Commissions of Inquiry. I was also interested to hear that Archives NZ have a public reference group and search pilot trial as part of their digital archives project. So often all testing and reference groups are for the transfer end of things. One of my favourite quotes from the conference came from Michael Upton: ‘We’re not just making a magic box to dump things into, we’re changing how the NZ Archives operates.’ My paper at Future Perfect picked up on this theme. I gave an overview of where we are at with the State Records NSW Digital Archives project, and talked about some of the changes to the way we think about our processes that we are working through: Digital transformation at State Records NSW (PDF, 1.25MB)