NARA’s summary of records management automation approaches

June 20, 2014

I have previously blogged about the US National Archives and Records Administration’s (NARA) approach to cloud email. NARA has issued a paper that surveys the available strategies for improving records capture through automation. It sets out the advantages, the maturity, and the risks and drawbacks of each approach.

Continuing the theme of drawing on good work being done elsewhere, this post is a quick summary of the five approaches to automation discussed in the paper.

A.      EDRMS-only

The paper tacitly acknowledges a shift away from EDRMS and DoD 5015.2-compliant approaches. It notes that while such systems automate records management tasks, capturing the information in the first place relies on some sort of manual process. “Each of those repositories provides different degrees of automation of records management tasks after capture and categorization, but on their own, none automate capture and categorization” (p 9). The paper also notes that “since appropriate management for the rest of the records lifecycle depends on initial capture, inconsistency here puts the effectiveness of the entire electronic records management program at risk” (p 7).

It notes the critical risk that “it is very difficult to get consistent compliance using this approach because of the reliance on end user action. The approach does not scale up to large volumes of records or staff, risking failure to effectively manage both permanent and temporary electronic records” (p 9).

B.      Rule-based automation

The paper states that “effective and consistent electronic records management is achievable for many agencies for at least some of their records using automated business rules that act on metadata, user roles, or another feature of records” (p 9). It gives the example of NARA’s Capstone email approach, where email is retained based on user roles.

However, the paper acknowledges that “approaches simple enough for easy implementation may lead to over-retention of low value records, leading to higher storage costs and increased litigation risk, or failure to capture permanent records that occur in unexpected places” (p 10).
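To make the idea of a role-based rule concrete, here is a minimal sketch in Python. It is not NARA's implementation: the role names, the seven-year figure and the `retention_for` function are all assumptions invented for illustration. It only shows the general shape of a rule that decides retention from a user role rather than from the content of the message.

```python
from dataclasses import dataclass

# Hypothetical role names and retention period, for illustration only.
PERMANENT_ROLES = {"agency_head", "deputy_head", "general_counsel"}
DEFAULT_TEMPORARY_YEARS = 7


@dataclass(frozen=True)
class Email:
    sender_role: str
    subject: str


def retention_for(email: Email) -> str:
    """Return a retention decision based only on the account holder's role."""
    if email.sender_role in PERMANENT_ROLES:
        return "permanent"
    return f"temporary:{DEFAULT_TEMPORARY_YEARS}y"


print(retention_for(Email("agency_head", "Lunch on Friday?")))
print(retention_for(Email("analyst", "Final policy recommendation")))
```

Because the rule never inspects content, a trivial message from a senior account is kept permanently while a significant record in a junior account is treated as temporary. That is the over-retention and missed-capture trade-off the paper describes.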

C.       Business Process and Workflow Automation

Many important agency business processes have information systems or workflow systems designed specifically to support the flow of information through that process. This automated approach relies on integrating workflow steps to capture necessary metadata, to associate resulting records with a retention schedule, and to destroy or transfer the records to the archives at the end of a retention period within that system.

The paper observes: “While there are challenges in implementing this approach, because of its inherent consistency, the risk of mismanaging records when it is applied well is very low”. It cautions, however: “Relying on this approach alone may leave many electronic records unmanaged if the agency cannot integrate appropriate records management capabilities into all agency records-creation workflows, which will usually be the case” [emphasis added] (p 11).
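The workflow approach can be sketched as three steps built into a business system: capture metadata when a case closes, tie the record to a retention schedule entry, and trigger destruction or transfer when the period ends. The Python below is a rough illustration under stated assumptions. The schedule entries, category names and function names are hypothetical and do not come from the paper or from any real retention schedule.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical retention schedule: category -> (years to keep, final action).
SCHEDULE = {
    "grant_application": (6, "destroy"),
    "policy_decision": (15, "transfer"),
}


@dataclass
class Record:
    title: str
    category: str
    closed_on: date


def add_years(d: date, years: int) -> date:
    """Add whole years, mapping 29 February to 28 February when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:  # 29 February in a non-leap target year
        return d.replace(year=d.year + years, day=28)


def close_case(title: str, category: str, closed_on: date) -> Record:
    """Workflow step: capture metadata and refuse unscheduled categories."""
    if category not in SCHEDULE:
        raise ValueError(f"no retention schedule entry for {category!r}")
    return Record(title, category, closed_on)


def disposition_due(record: Record, today: date) -> Optional[str]:
    """Return the scheduled action once the retention period has elapsed."""
    years, action = SCHEDULE[record.category]
    return action if today >= add_years(record.closed_on, years) else None
```

Rejecting a case whose category has no schedule entry is one way of getting the consistency the paper credits to this approach. It also shows the limit: records created outside such a workflow never reach `close_case` at all.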

D.      Modular Re-usable Records Management Tools

The paper notes that architecture work has been done to define modular records management services. However, it does not reference any examples of successful implementation. It suggests that “relying on a flexible, modular approach runs the risk of leaving some electronic records unmanaged since not all existing systems may interoperate with modular tools and services” (p 12).

E.       Autocategorization

The paper notes that promising software which uses machine learning to categorize content is being developed. The paper observes:

Because autocategorization works with so many unstructured record types, this approach has great potential to address the records in an agency that cannot be managed automatically any other way. However, the technology is relatively new and is still improving, and records managers are still learning best practices for working with it effectively. (p 12)

Separately, NARA has pointed us to some case studies utilising this approach. These case studies are provided by vendors, so they should be read with that in mind. However, it is worth noting that one major agency saw 87% accurate categorisation of email using this approach, which it argues is better than the 70% accuracy expected, and better even than manual filing.

The paper notes that, because autocategorization is not 100% accurate, there is some risk of incorrect disposal or over-retention of temporary records. It adds that agency stakeholders may not trust automated algorithms, regardless of actual accuracy rates (p 13).

Conclusion

The paper takes a pragmatic view and gives a direct assessment of the drawbacks and opportunities of each approach to automation. We are looking forward to the finalised paper being published, and intend to provide an update when that happens.
