System migrations to archives (a research paper from the digital archives team)
January 18, 2012
Executive Summary
This research paper discusses three problems with using the term transfer to describe the processes that precede ingest into the digital archives and recommends instead the adoption of a project-based, system migration approach.
Introduction
On a theoretical and conceptual level, the loss of connection between current recordkeeping and archival systems undermines the capabilities of a connected and seamless archive…
Barbara Reed, 2010, p.352
The goal of State Records NSW’s digital archives project is to create a digital recordkeeping system that can encompass other digital recordkeeping systems – a system of systems – rather than simply a digital repository storing digital objects. Instead of focusing on the challenges of preserving digital objects as clusters of discrete files, State Records’ digital archives will aim to preserve agency records in context, as integral parts of agency recordkeeping systems. This goal will affect how we describe, preserve, store, and provide access to digital archives. It has particular consequences for the processes we develop and implement to enable agencies to transfer digital records as State archives. The purpose of this paper is to argue that, in fact, transfer is not the best term to describe this set of processes and that a system migration approach is preferable.
System migration, not Transfer
The terminology we use to describe the processes involved in preserving digital records is important because it shapes the parameters of our thinking and informs the tools and systems we build and adopt. Of course, State Records’ digital archives will involve transfer: the formal exchange of custody and control of digital records between an agency and State Records. The point is not that the term is never appropriate. Rather, it is that transfer should not be used more broadly to describe the larger set of processes (for example, negotiation, planning, and the exchange of records) that will occur between an agency and State Records prior to the ingest of digital records into the digital archives.
Three problems with using the term transfer to describe this larger set of processes are that it implies a focus on files, it leads to a linear workflow, and it contains implicit assumptions about the importance of custody.
Problem: a file-level focus
… there is a tendency to view the document (object) as the focus – the document in the storage repository – to the detriment of managing the document in context with the metadata residing in the specific application software – managing objects as records.
Barbara Reed, 2010, p.353
Most existing digital preservation tools are designed to manage tasks relating to the preservation of individual digital files: tasks such as format characterisation, format conversion, metadata extraction, virus scanning, checksumming, and the creation of file manifests. This file-level focus is helpful when tackling file-level preservation problems like format obsolescence and integrity, which tend to have file-level solutions such as format migration and checksumming, but when applied to the larger processes of the exchange of digital records, description, and access it becomes problematic.
The term transfer relates to this problem because it suggests that the migration of digital records from an agency recordkeeping system to the digital archives is simply a matter of identifying, packaging, and exchanging digital files. Existing tools that manage the transfer of digital records suffer from this ‘file fixation’. Tools like Manifest Maker and Bagger create manifests of files with checksums and package them in zipped files or on portable storage media. The VERS approach is to create individually encapsulated objects that bundle record content with record metadata. These tools are useful for ensuring the successful transfer of digital material but they fail, as Barbara Reed points out, to adequately capture the recordkeeping structures within which those digital records were created and stored:
Control records for registry-style recordkeeping systems were an early site for the application of automation in the workplace. However, it is still the case that most archival organisations have ignored these automated systems, taking the documents/ records (objects) without their accompanying metadata which resided in these automated systems.
Barbara Reed, 2010, p.355
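To make the file-level focus concrete, the following is a minimal Python sketch of the kind of checksum manifest discussed above. It is not the implementation used by Manifest Maker, Bagger, or VERS; the function names `build_manifest` and `verify_manifest` and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the manifest entries whose file is missing or has changed."""
    current = build_manifest(root)
    return [name for name, digest in manifest.items() if current.get(name) != digest]
```

A manifest of this kind can show that a set of files arrived intact, but it records nothing about the recordkeeping system those files came from, which is the limitation described above.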
The file-level focus of these tools also means that they are poorly suited to processing consignments of digital records that don’t exist as neat sets of files: for example, digital records within digital recordkeeping systems, database-backed business systems, or digital records stored in cloud-based services. Like drunks searching for their keys in the lamplight, digital archives that conceive of transfer as the simple process of ingesting runs of individual files risk creating brittle transfer and preservation procedures that depend on that type of transfer (i.e. on objects to checksum, migrate, and encapsulate) and finding themselves unable to manage any other kind of digital record.
A file-level approach to transfer may also ultimately impair users of digital archives. From an access perspective, the object of the digital archives is not to create a library of digital objects that readers can peruse but a live recordkeeping system that the public can access for authoritative and meaningful records and that agencies can continue to rely on for evidence of their business activities.
With electronic records, this safety net will be removed in two ways –
- It will no longer be possible, if it ever was, to view “users” of archival data simply as researchers accessing other peoples’ records. Records-makers (system and data managers) and, arguably, organisations at large will also be “users” of archival data. They will require a quality data product.
- Researchers will access electronic records directly, via a network without going through archivists. What use they make of archival data will depend upon its being available, reliable and relevant. These are qualities conferred by adherence to standards when the data is generated.
Chris Hurley, 1994
To ensure a ‘quality data product’ that is as useful to records-makers as it is to the general public, the digital archives will need to migrate complete recordkeeping structures, and not just digital files, so that agency staff can continue to rely on their digital records and have access to the same contextual information that was available to them when those records were stored in their agency’s own recordkeeping systems.
Problem: a linear model
This essential feature of recordkeeping – its contextual and contingent nature – makes it futile to establish hard and fast rules… Recordkeeping solutions do not come out of a box. They are, and must be, tailored to the particular requirements of individual organisations. Analysis and thought is needed to determine the recordkeeping responsibilities of organisations. Recordkeeping must be tailored to the requirements of specific business functions and activities, linked to related social and legal requirements, incorporated into particular business processes, and maintained through each change to those processes.
Barbara Reed, 1997
A second problem with the use of the term transfer is that it leads to a linear workflow in which other processes of the digital archive (such as preservation planning and preservation actions like format migration) necessarily occur after transfer and ingest.
In a system migration approach, the formal transfer of digital records becomes just one element of a larger project, sitting in parallel with other processes such as preservation planning, metadata mapping, data extraction, quarantine, file characterisation, format migration, etc. This approach allows for flexible workflows that stage preservation processes at the most appropriate moment for any given consignment of digital records.
Often it will be desirable for preservation processes to happen following the formal transfer of records. For example, the conversion of Microsoft Office files to the OpenDocument Format (ODF) might be more consistently and efficiently handled by State Records staff.
In other cases, however, preservation processes will necessarily happen before formal transfer. For example, the extraction of digital records and metadata from live database-backed business systems will often be performed by agency staff. State Records will, however, want to be involved in decisions about the format of those exports and the type of metadata extracted, and in these cases the preservation planning process must therefore happen before formal transfer.
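As an illustration of the kind of export decision that needs to be planned before transfer, the sketch below extracts the rows of one table together with its column definitions, so that the records travel with a description of their structure. It uses SQLite from the Python standard library purely as a stand-in for a business system; the function names and the JSON layout are assumptions, not an agreed export format.

```python
import json
import sqlite3


def export_table(conn: sqlite3.Connection, table: str) -> dict:
    """Export one table's rows together with its column definitions."""
    # Table names cannot be bound as SQL parameters, so `table` must come
    # from a trusted source (for example, the agreed preservation plan).
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    columns = conn.execute(f"PRAGMA table_info({table})").fetchall()
    names = [col[1] for col in columns]
    rows = conn.execute(f"SELECT * FROM {table}").fetchall()
    return {
        "table": table,
        "columns": [{"name": col[1], "type": col[2]} for col in columns],
        "records": [dict(zip(names, row)) for row in rows],
    }


def export_table_json(conn: sqlite3.Connection, table: str) -> str:
    """Serialise the export as JSON text (an illustrative choice of format)."""
    return json.dumps(export_table(conn, table), indent=2)
```

Which tables to export, which metadata to carry with them, and which serialisation to use are exactly the decisions that would be settled jointly by agency and State Records staff during preservation planning.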
In both scenarios, a project-based approach that involves the cooperation of State Records and agency staff throughout the system migration is desirable because it ensures that agencies are kept in the loop with respect to any preservation actions taken (and will therefore be more comfortable in continuing to rely on their digital records held in the digital archives) and become active partners able to perform key steps such as the extraction of data from business systems.
It should be noted that system migration projects may also be initiated well after digital records have been taken into custody. State Records’ digital archives will have an active monitoring function and it may happen that successive system migrations are necessary to maintain accessibility to certain types of digital records (e.g. ongoing migration as file formats become obsolete over time).
A project-based approach will allow State Records to construct the most appropriate workflow for any given system migration. This does not, however, mean that we will attempt to re-invent the wheel for every separate project. By developing templates for common types of system migration and by automating workflows wherever possible, State Records will seek to balance the contending needs for efficiency and flexibility of approach.
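One way to picture templates that remain adjustable per project is sketched below. The step names are taken from this paper, but the template names, the ordering of steps, and the `MigrationProject` class are hypothetical and do not describe an existing State Records system.

```python
from dataclasses import dataclass, field

TRANSFER = "formal transfer"

# Hypothetical templates; the ordering of steps is illustrative only.
TEMPLATES = {
    "office_documents": [
        TRANSFER, "quarantine", "file characterisation", "format migration",
    ],
    "business_system": [
        "preservation planning", "metadata mapping", "data extraction",
        TRANSFER, "quarantine",
    ],
}


@dataclass
class MigrationProject:
    agency: str
    steps: list[str] = field(default_factory=list)

    @classmethod
    def from_template(cls, agency: str, template: str) -> "MigrationProject":
        # Copy the template so per-project changes do not alter it.
        return cls(agency, list(TEMPLATES[template]))

    def stage_before_transfer(self, step: str) -> None:
        """Place a step immediately before formal transfer, moving it if present."""
        if step == TRANSFER:
            raise ValueError("formal transfer cannot be staged relative to itself")
        if step in self.steps:
            self.steps.remove(step)
        self.steps.insert(self.steps.index(TRANSFER), step)

    def steps_before_transfer(self) -> list[str]:
        """Return the steps scheduled ahead of formal transfer."""
        return self.steps[: self.steps.index(TRANSFER)]
```

A project starts from the template for its type of migration and then moves individual steps where a particular consignment requires it, which is the balance between efficiency and flexibility described above.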
Problem: assumptions about custody
Users accessing electronic records via the network will not need archivists to hold, locate, or interpret the data. We will be needed, if at all, to help construct systems in which archival data (knowledge of context and record-keeping) is available to users when needed.
Chris Hurley, 1994
A final problem with the use of the term transfer is that it implies that the exchange of custody and control over digital records is an essential aspect of the digital archives.
Under the State Records Act 1998, State archives can remain in the custody of agencies or other controlling organisations by agreement with State Records. There is no requirement for State Records to take custody as well as control of records. These provisions of the Act have so far only been used in NSW for physical records, such as those held by the regional repositories, but their inclusion had as much to do with the discussions in the Australian recordkeeping community in the mid-1990s on post-custodial strategies for electronic records. While the first generation of digital archives in Australia necessarily took a more cautious custodial approach, it is time to look again to distributed custody models for digital State archives.
In many cases the exchange of custody will be desirable because agencies often lack strong incentives to ensure the continuing accessibility of digital records required as State archives. However, we should also expect that there will be times when agencies do decide to maintain custody of such records for long periods of time. In these cases the digital archives should still have a role to play: both in assisting agencies to plan for the preservation of these records and in ensuring that archival data about them is available to users.
Because a system migration approach considers transfer to be just one element of a larger process that also involves preservation planning, preservation actions, and metadata mapping, it is applicable to both custodial and non-custodial projects. The digital archives might still initiate a system migration project with an agency that decides to retain custody of its digital records in order to ensure that a preservation plan is in place and in order to perform a metadata mapping to document the digital records in State Records’ archival control system.
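A metadata mapping of the kind mentioned above can be thought of, in its simplest form, as a table that relates agency field names to fields in the archival control system. The sketch below assumes hypothetical field names on both sides; real mappings would be negotiated for each system and would usually involve more than renaming.

```python
# Hypothetical mapping from agency system fields to archival control fields.
FIELD_MAP = {
    "doc_title": "title",
    "created_on": "date_created",
    "business_unit": "creating_unit",
}


def map_metadata(record: dict, field_map: dict[str, str]) -> tuple[dict, dict]:
    """Split a record into mapped fields and fields with no agreed mapping."""
    mapped = {
        target: record[source]
        for source, target in field_map.items()
        if source in record
    }
    # Unmapped fields are kept for review rather than silently dropped.
    unmapped = {key: value for key, value in record.items() if key not in field_map}
    return mapped, unmapped
```

Because nothing in this step depends on where the records are stored, it applies equally to records that an agency keeps in its own custody.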
Further benefits of a system migration approach
A system migration approach has additional advantages.
Firstly, it will help State Records connect with technical communities outside the (relatively small) digital preservation community. ‘System migration’ is a term that has meaning not just for archivists and records managers but also for the IT community and related professions. These other disciplines have much to offer. For example, the field of ‘Enterprise Integration’, a specialisation of enterprise architecture, focuses on challenges around the interchange of electronic data. This problem is akin to the transfer of digital records to archives and this field can provide knowledge, case studies, and software applications for migrating business systems (such as Pentaho Kettle http://kettle.pentaho.com/).
Secondly, by taking a system migration approach, State Records will also ensure that the processes and tools it develops and adopts have relevance not just to other digital preservation projects but also to NSW government agencies performing in-house system migrations of digital records. In other words, if we can borrow from disciplines like enterprise architecture when tackling digital preservation challenges, then we can also apply our digital preservation expertise to the problems faced by IT and records staff working to maintain access to digital records over the long term.
Finally, a system migration approach is desirable because it brings the rigour of formal project management (planning, accountability, execution, and review) to the process of digital transfer. It ensures that the migration of digital records from agency recordkeeping systems to the digital archives is carried out appropriately, with accountability, and with key decisions (such as how to extract digital records from business systems) being planned and documented.
Sources
Chris Hurley, Strength below and grace above: the structuration of records, June 2011, online at http://infotech.monash.edu/research/groups/rcrg/publications/strength-below.pdf
Chris Hurley, Data, Systems, Management, and Standardisation, 1994, online at http://infotech.monash.edu/research/groups/rcrg/publications/datasystems.html
Barbara Reed, ‘The Australian Context Relationship (CRS or Series) System: An Appreciation’, The Arrangement and Description of Archives Amid Administrative and Technological Change: Essays and Reflections by and about Peter Scott, Australian Society of Archivists, Brisbane, 2010
Barbara Reed, Metadata: Core record or core business?, 1997, online at http://infotech.monash.edu/research/groups/rcrg/publications/recordscontinuum-brep1.html
I think you have been somewhat harsh on VERS. VERS is not a ‘file level’ only metadata solution. Context metadata of various kinds is also captured including at the level of aggregation (directories) as well as other metadata that is more familiar in the context of recordkeeping metadata standards.
VERS continues to be referred to by Gartner as a lead standard in migration and it puzzles me that it is routinely sold down by Australians in discussion. OK, it is not the latest kid on the block. Looking at the track record of success and failure with projects, business models and scalable processes seem to be the Achilles heel of digital archiving and it is on these rocks that the ship of digital archiving routinely runs aground. Apologies for nautical analogy.
Thanks for picking that up Mark. You’re right, to say that VEOs only capture “file level” metadata is misleading, and I’ve amended the statement to more closely match PROV’s own description of VEOs (http://210.8.122.120/vers/standard/advice_09/5-2.htm).
Our point of difference with VERS isn’t so much on the metadata kept as on the focus on records as objects rather than as integral parts of recordkeeping systems. In other words, we wish VERS had VERKs rather than VEOs!
Hallelujah! I could never get any traction discussing these issues at my organisation, I hope this leads to a better understanding and better solutions.
I think it’s great that conceptualisation (what you are trying to do) is preceding development (this is how you are trying to do it). You’re redefining “transfer” in the context of a digital archiving project and that leads me to ask what you mean by digital archiving. I would say that the cited “preservation processes” (describe, preserve, store, provide access, preservation planning, preservation actions like format migration, metadata mapping, data extraction, quarantine, file characterisation, etc.) not only need to occur before transfer/extraction/ingestion but that they can occur without transfer etc. ever occurring. In other words, the digital archives (defined as a place to which transferred data goes) is itself a tactic – one that may need to be employed, for the reasons you state, in some or even many cases but not the only way of reaching the preservation goal. That understanding is implicit in what you say, but I would make it more explicit.
Does the problem lie in the term “digital archives”? In the opening paragraphs, I took it to be a state of being, but by the time I reached the diagrams, it was looking like a place. Might it be a place defined by SRNSW “control” (partial or entire) rather than by geography or hardware? Maybe you need two terms: preserved records (the state of being) and digital archives (the place). I would want to say that “transfer” to SRNSW can be accomplished, provided other preservation requirements are met by one means or another, by description of preserved records alone and not only by extraction to a place. This view is not necessarily shared by all and it may be outside the project agenda (which presumably has a time-frame for completion) but it enhances the point I think you are trying to make here and it may prove to be the case, depending on how you tackle descriptive issues, that you end up closer to a purely descriptive solution than was first thought likely.
Yep, I’m certainly hoping that by disentangling physical transfer and transfer of control as Richard has done in his paper we are embarking on a path towards preservation / contextualisation / access provision partnerships with agencies that elect to hang on to digital records that happen also to be State archives. We know that they have few incentives or motivations to hand them over, and indeed with the exciting ways many are using data now many more reasons to keep all their records. However we also have orphaned recordkeeping systems of Commissions of Inquiry and State functions transferred to the C’wealth that we will need to bring into our ‘place’. I guess I am hoping by the end of the 3 years of the project we’ll be well positioned to cope with both scenarios. My 2 cents.