Mythbusting: That storage is cheap
May 24, 2012

Image credit: http://www.flickr.com/photos/salihan/3590753580/

When talking to people across government about the importance of records management, recordkeeping controls and records disposal, we are regularly challenged with the argument, ‘But storage is cheap. I can just keep it all. I can run Google across the top of all my data. I don’t need to do all these costly and challenging things you are asking me to do.’ We make counter-arguments about the costs of the long-term management of these vast data stores, about the fact that this strategy possibly creates more risks than it mitigates, about the expense of the required migration and preservation actions, and about the fact that this isn’t really a management strategy at all, but a strategy of management avoidance. But not a lot of people listen! That, however, could start to change, because we are starting to get some hard data about the cost of this approach.

In this post we want to promote three fantastic blog posts that really challenge the notion that storage is cheap, and that provide some staggering statistics to help quantify exactly how wrong the ‘storage is cheap’ argument is.

The first two posts come from Barclay T Blair, who runs an excellent blog focussed on issues around Information Governance. These posts were written a year and a half ago and were posted to the RIMPA list at the time, so many readers may have already seen them, but they are well worth revisiting.

Way back in October 2010 Blair wrote a post called The Origins of Information Governance by the Numbers. Using data from the IDC Quarterly Storage Software Tracker, the Worldwide Quarterly Disk Storage Tracker and Costs of Hard Drives 1956 – 2010 (see http://www.idc.com/ for more information), Blair shows that storage is indeed cheap. In fact, it is amazingly cheap. In 2000 the disk cost per GB was $9.14. In 2010 the price was a mere $0.08 (about 1% of the 2000 cost, Blair points out). The IDC statistics also show that worldwide spending on storage hardware has remained static over the last 10 years: in 2000 it was $25 billion, and in 2010 it was still $25 billion. Those who claim storage is cheap are therefore absolutely right. But…

It turns out that storage, effectively the cupboard to put everything in, is only one part of the issue. Yes, you can keep putting things in cupboards and then buying more cupboards to put even more things in, but for any business that actually needs to use the information occupying all this cupboard space, this is when the costs really start to hit home.

Blair shows that over the same ten-year period, 2000 – 2010, worldwide spending on storage software more than doubled, from $5.3 billion to $11.7 billion. Viewed in totality, Blair says these statistics show that ‘we are spending as much on storage 10 years later, when the price of the raw materials – disk drives – has dropped to 1% of what it was’. This is because information does have to be managed, and this is where the true cost of storage lies. To quote Blair again, ‘Managing all this information is no longer a storage problem – it’s about how well we can manage, harness, and govern that information’. At current volumes of data, the capacity to do this costs an awful lot of money.
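To see just how stark these numbers are, here is a quick back-of-envelope check of Blair’s figures in Python (the figures are his; the sums are ours):

# A quick check of Blair's figures (the figures are his, the sums are ours)
cost_per_gb_2000 = 9.14       # USD per GB of disk, 2000 (IDC, via Blair)
cost_per_gb_2010 = 0.08       # USD per GB of disk, 2010

software_spend_2000 = 5.3e9   # worldwide storage software spend, 2000 (USD)
software_spend_2010 = 11.7e9  # worldwide storage software spend, 2010 (USD)

print(cost_per_gb_2010 / cost_per_gb_2000)        # ~0.009: about 1% of the 2000 price
print(software_spend_2010 / software_spend_2000)  # ~2.2: spend more than doubled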

This, then, is the crux of the storage debate, and unfortunately it means that the solutions are not as cheap and easy as proponents of the ‘storage is cheap’ argument would hope. The solutions are complex: they require cross-profession collaboration, planning and staffing, and they need to be tightly aligned to risk and business outcome requirements. As difficult and costly as they are, it is critical, however, that we start developing these complex solutions now. To keep creating vast data stores without any real management structure behind them is to keep creating immense legacy data problems that will need to be dealt with in the not too distant future.

Blair has consolidated all his statistics into an excellent graph within a PowerPoint slide. This is freely available under Creative Commons licensing via the link above.

Continuing the theme, in November 2010 Blair wrote another post, The Information Governance Problem is Growing Faster Than the Data Problem. Blair’s key point in this post is that information across the world is growing at an enormous rate, but the information requiring management is growing at an even faster rate.

Blair reports that the IDC 2010 Digital Universe Study projects that data will grow 44 times over the next decade (from 0.8 zettabytes to 35 zettabytes; that’s a real lot). When, as Blair does, you translate this volume of data into individual files or containers of data to be managed, these will grow at a significantly greater rate than the overall volume of raw data. According to Blair, ‘In fact, it will grow by 67 times in the same period, or almost 50% more than the overall volume.’
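Again, the arithmetic is easy to check for yourself (our quick sketch in Python, using the figures Blair quotes):

# Checking the growth factors in Blair's post
volume_2010 = 0.8    # zettabytes (IDC Digital Universe Study)
volume_2020 = 35.0   # zettabytes, projected
file_growth = 67     # projected growth factor for files/containers

volume_growth = volume_2020 / volume_2010
print(volume_growth)                    # 43.75: raw data volume grows ~44 times
print(file_growth / volume_growth - 1)  # ~0.53: files grow ~50% faster than volume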

When you then approach this from a business or management perspective, the impacts of this data growth become even more disturbing. As Blair reports, ‘According to the study, the amount of data requiring some type of information governance (i.e., for “privacy, compliance, custodial protection, confidentiality, or absolute lock down” purposes) by 2020 will nearly double. Moreover, the portion requiring the highest levels of information governance control will grow 100 times. Furthermore, when viewed from a files – rather than an absolute volume – perspective, the number of files requiring some kind of information governance will be over 90%.’

Blair concludes: ‘This is the heart of the information governance problem: not only is overall data volume growing at an astonishing rate, but the number of individual pieces of data we have to manage is growing at a faster rate, and the amount of data that we have to manage and control in a special way is growing even faster.’ So the cost and ease of storage do not really help us with the fundamental information governance, information management and records management challenges we have to deal with. Instead, they simply magnify those fundamental problems.

Again, he has consolidated these statistics into another excellent graph within a PowerPoint slide, also available under Creative Commons licensing.

The third post on storage issues was written by David Rosenthal, who writes DSHR’s Blog, where he posts about his work in digital preservation. Last week he wrote a post called Let’s Just Keep Everything Forever in the Cloud.

Rosenthal was responding to the trend of data hoarding, where people and organisations keep more and more data, increasingly in the cloud, with the justification that ‘one day it might be useful’. Rosenthal uses IDC data volume statistics, cost statistics based on Amazon’s Simple Storage Service (S3) and the 2011 Gross World Product (GWP) to blast the feasibility and sustainability of this trend out of the water.

After working through the maths, Rosenthal concludes that ‘keeping 2011’s data would consume 14% of 2011’s GWP. The world would be writing S3 a check each month of the first year for almost $100 billion, unless the world got a volume discount.’
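Rosenthal’s post works through the maths in full. As a rough illustration only, assuming IDC’s widely reported figure of about 1.8 zettabytes of data created in 2011 and a bulk S3 rate of around $0.055 per GB per month (these assumptions are ours, not Rosenthal’s exact inputs), the monthly bill comes out like this in Python:

# Illustrative reconstruction of the monthly S3 bill
# Assumptions are ours: ~1.8 zettabytes created in 2011 (IDC) and a bulk
# S3 rate of about $0.055 per GB per month (roughly the 2012 high-volume tier).
data_created_2011_gb = 1.8e12  # 1.8 zettabytes expressed in gigabytes
s3_rate_per_gb_month = 0.055   # USD per GB per month (assumed bulk rate)

monthly_bill = data_created_2011_gb * s3_rate_per_gb_month
print(monthly_bill)            # ~9.9e10: almost $100 billion a month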

And this is the cost here and now. Remember that Blair demonstrated the impact of annual growth rates, and Rosenthal does the same. He reports that IDC estimates the annual growth rate of data will average 57% year on year. Allowing for average increases in storage costs and GWP, Rosenthal calculates that ‘endowing 2012’s data will consume 19% of GWP. On these trends, endowing 2018’s data will consume more than the entire GWP for the year.’
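A crude extrapolation from Rosenthal’s two data points alone (our sketch; his post models storage cost and GWP trends properly) shows how quickly this becomes unsustainable:

# Extrapolating the GWP share from Rosenthal's two data points
share_2011 = 0.14  # endowing 2011's data: 14% of GWP (Rosenthal)
share_2012 = 0.19  # endowing 2012's data: 19% of GWP (Rosenthal)

annual_growth = share_2012 / share_2011  # ~1.36x per year
year, share = 2011, share_2011
while share <= 1.0:
    year += 1
    share *= annual_growth
print(year, share)  # 2018, ~1.19: the endowment passes 100% of GWP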

He concludes that ‘We are going to have to throw stuff away’. He briefly outlines some of the genuine challenges inhibiting our ability to destroy data, and then offers a disturbing proposition: ‘We may be in the bad situation of being unable to afford either to keep or to throw away the data we generate’.

Last year State Records ran an informal survey of the NSW public sector, asking about the disposal of digital records. The results confirm many of the conclusions outlined in the posts above: there is still a worrying belief that we can keep all data, that digital data is cheap to manage, and that we don’t need to plan for how we are going to throw digital information away. The full report on this survey is available in our post The problems of identifying which digital records to keep and which to throw away: survey shows digital disposal is hard.

Each of the posts discussed today shows the importance of thinking strategically about our digital information. We need to plan up front how we are going to keep what we need to keep, and how we are going to discard all the records we can legally destroy. Failing to foreground and consider these issues now will create substantial and unsustainable legacy problems in the future. Every NSW government body has authorised disposal authorities available to it, and we encourage you to treat these as active information management tools: use them in system design, raise them in discussion with ICT colleagues, and consider them in planning exercises for migration and other ICT initiatives. Recordkeepers have fantastic skills they can bring to the table to help resolve some of the massive data management challenges we are going to face in the very near future, so please do have the confidence to get involved.

As usual, we’d love to encourage any discussion or debate! So please do get in touch if you have something to say on these issues.

Finally, we are getting very close to our 100th post on Future Proof. Over the next week or so we will be slowly and slightly redesigning Future Proof to improve navigation and to increase the accessibility of all the content available on the site. If you have any views on what you do and don’t like about Future Proof, please let us know!

5 Comments
Andrew Wilson June 8th, 2012

A bit late reading this, but it’s a great post, Kate. Very thought-provoking. I wish every CIO and CTO would read it.

Kate Cumming June 20th, 2012

Thanks Andrew! Yes, these are very big and concerning issues. I worry about David Rosenthal’s proposition: ‘We may be in the bad situation of being unable to afford either to keep or to throw away the data we generate’. IM, ICT and RM need to work together to ensure this disturbing possibility does not eventuate.

Peter Cowan June 27th, 2012

Thought-provoking post, but I can’t help feeling that there is a conceptual flaw in the argument. Here it is: the growing costs of software and governance to keep increasing volumes of information are costs equally tied to throwing information out. EDMS solutions and digital asset management tools (which I assume are what is being referred to as software) are at their core information lifecycle tools that allow organizations to manage document creation, classification, storage AND disposition.

On top of this, an effective program for disposing of redundant, out-of-date or trivial content (ROT) would still require upfront management at the creation phase to classify all the information assets and then assess their value in order to make the decision to throw them out at a future date. So, as far as I can tell, unless we can make individuals and organizations create less information, we are still stuck with costly software and business processes for managing what they create, and this cost, at least conceptually, doesn’t go down by actively disposing of ROT. (It probably goes up.)

Kate Cumming June 28th, 2012

Hi Peter – thanks for sharing your really interesting viewpoints. I like your ROT acronym and I take the point you are making, but I’m afraid I can’t entirely agree with your conclusions. The posts I referenced were not recordkeeping posts, so I don’t think the software they talk about is recordkeeping-specific; rather, it is the broad and ever-growing category of business software now required to manage and use the vast volumes of business data being created every day.

One of the major challenges all organisations are dealing with is simply trying to control, use and then manage the mass of their corporate information. From experience, I think many organisations are finding it challenging to identify what information is important and what is no longer relevant, they are finding it hard to identify a ‘single source of truth’ among these information stores, and they are already finding it too costly to continue to migrate and maintain the accessibility of everything. This last point is possibly leading to an increasing trend of ‘orphaned’ information. Proprietary software constraints, system architecture issues and configuration challenges are creating environments where it can be very difficult to carry business information forward into new systems. As a consequence, still active and useful business information is being ‘orphaned’ and left behind in systems that are no longer supported by the business. As data volumes grow, all these issues and their associated business risks will grow with them.

So we do need effective disposal, but I don’t think its implementation needs to be as complex as you envisage. One of the trends in business is towards more transactionally focussed systems, or systems that relate quite directly to specific business processes. There can be a direct mapping from specific retention requirements in a disposal authority (most NSW government organisations have these) to the transactions occurring in a system. We therefore do have values and existing business information to plug into these assessments and to flag what needs to be carried forward and what can be thrown away.

But I do take the point that we need to be more strategic and not build a new problem in trying to solve an existing one. In NSW government, organisations are focussing on the management and protection of high-risk, high-value digital information and prioritising their resources on the systems that make and keep this information. We still need to do more to encourage the consideration of these kinds of system rules at system design. At system design we also need to flag the need for better export and purge functionality, the need to partition data so that some can be kept and some thrown away, and the need for effective, well-designed metadata to document and control it all.

So I don’t think keeping everything is the answer, and I agree that traditional approaches are not going to solve the problem either; but if we can be strategic, connect to business, proactively manage risks and focus on what’s important, we might one day reach a happy middle ground!

[…] by Barclay T Blair and David Rosenthal, drawing on a 2010 IDC study and reported by Stephen Clarke in a LinkedIn group post, highlight in particular that the cost of a gigabyte […]
