Our machine learning road trip to Canberra
July 3, 2017
Earlier this month a small contingent from State Archives and Records NSW enjoyed a day travelling to Canberra to discuss appraisal, sentencing and all things machine learning with our colleagues from the National Archives of Australia (NAA). Glenn Humphries from our Digital State Archives team is undertaking research into machine learning, which will be published later this month on Future Proof.
NAA are trialling automated disposal using their functional disposal authority with data from their EDRMS. They need to make sentencing digital records as easy as possible, because their annual survey, Check-up, has shown that agencies are not routinely implementing disposal authorities to destroy digital records or to transfer digital national archives. They have also occasionally encountered the unfortunate perception that digital records need not be sentenced because, in the digital space, you can “keep it all”.
Their high-level, free-text disposal authorities work well for humans but not so well for machines: machines can’t tell what is ‘significant’ and need structure as well as free text. Another difficulty is that the tools are often expensive. Vendors have offered to undertake research in this area (for substantial amounts of money), but in NAA’s view they cannot yet guarantee the results that NAA, as archivist, need.
The two-level structure of NAA’s disposal authorities (functions and disposal classes) suits machines, but the detail from the functions has to be repeated in the disposal classes for the machine to make sense of them. The words in authorities can also be indexed easily, but words on their own are not useful: they need to be weighted and analysed by appropriate algorithms.
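One common way to weight words is TF-IDF (term frequency multiplied by inverse document frequency), which gives little or no weight to words that appear in every disposal class. This is not NAA’s method; it is a minimal Python sketch assuming three invented, heavily simplified disposal class descriptions. The `DISPOSAL_CLASSES` dict, the stopword list and all helper names are hypothetical, and cosine similarity is just one simple way to suggest the closest class for a record.

```python
import math
import re
from collections import Counter

# Hypothetical, heavily simplified disposal class descriptions. These are NOT the
# wording of any real NAA authority; they exist only to illustrate term weighting.
DISPOSAL_CLASSES = {
    "retain_international_visit": "records of international visits agenda briefings outcomes",
    "destroy_school_visit": "records of school visits bookings arrangements",
    "destroy_catering": "records of catering for visits and events",
}

# A tiny stopword list (assumption); a real system would use a fuller list.
STOPWORDS = {"a", "and", "by", "for", "of", "the", "to"}


def tokenize(text):
    """Lower-case the text, split it into words and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def fit_idf(docs):
    """Inverse document frequency log(N / df): words found in every class get 0."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log(n / count) for term, count in df.items()}


def vectorize(text, idf):
    """TF-IDF vector as a dict; words not used in the authority are ignored."""
    counts = Counter(tokenize(text))
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either has no weight."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def suggest_class(record_text, classes=DISPOSAL_CLASSES):
    """Return (best-matching class name, score), or (None, 0.0) if nothing matches."""
    idf = fit_idf(list(classes.values()))
    class_vectors = {name: vectorize(text, idf) for name, text in classes.items()}
    record_vector = vectorize(record_text, idf)
    best_name, best_score = None, 0.0
    for name, vec in class_vectors.items():
        score = cosine(record_vector, vec)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


if __name__ == "__main__":
    idf = fit_idf(list(DISPOSAL_CLASSES.values()))
    print(idf["records"], idf["international"])
    print(suggest_class("Agenda and briefings for the international visits"))
    print(suggest_class("records of visits"))
```

Because “records” and “visits” appear in every invented class, their weight is zero, so a record described only as “records of visits” gets no suggestion at all; distinctive words such as “international” or “catering” drive the match. In a toy like this the suggestions depend entirely on the wording of the class descriptions, and a human would still need to review them.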
NAA are trialling two versions of their authorities: the original streamlined, high-level, human-readable version, and a much larger, granular, machine-friendly version that incorporates extensive guidance to help the machine recognise what is significant. This entails a different approach to analysis for each function.
Experiments with sentencing records about visits (key records of international visits are national archives, while no records of school visits are) produced interesting results. The machine easily distinguished international visits from school visits, but it classified every record relating to an international visit as a national archive to be retained, even the catering records. (I’m not sure what a machine would make of an international school visit.) Experience so far is that the results of machine sentencing still need human review: the machine is helpful, but the result is not yet good enough, especially given the volume of records to be sentenced.
The other interesting point they raised concerned using vendors of proprietary products as developers. For sentencing to be accountable, archives need information about the algorithms used, but if those algorithms are part of a commercial product, vendors can’t release the code. We need to find ways of getting the best out of industry while making sure that we remain accountable for our decisions about disposal.
So my take on this is that, with the tools currently available, it seems it would take just as long to teach the machine what is important as it would for a human to do the sentencing (machines are still a little dumb in some respects). Hopefully, with cleverer machines, the balance will tip the other way. And should appraisal and sentencing be easy anyway? It needs to be a carefully considered, thought-through process.
In the meantime, the NAA team is continuing its research into archival uses of machine learning. They have built a conceptual model of a machine-executable retention and disposal authority, which they plan to test with more semantic analysis tools to see how the results of selection can be improved.
Thank you to Tatiana, Karuna, Jane, Marian, Kathleen and Paul from NAA for being so generous with their time and experiences.