Climate data, angry scientists and metadata December 17, 2009

The Sydney Morning Herald, on page 1 of its December 5-6 2009 edition, ran a story called ‘Climate email mess hits Australia’. The story provides further detail from climate change records hacked from the University of East Anglia in the UK.

The story focusses on the hacked records of a programmer at the university who was trying to use a range of historical Australian climate data from meteorological stations. In trying to use this data the programmer encountered a range of problems which led him to complain about the information he was working with: ‘Getting seriously fed up with the state of the Australian data…so many false references…so many changes…bewildering.’ In another email he says there is ‘no information integrity’ and concludes at one point, ‘What a bloody mess’.

If you’re very honest, would an external researcher, or even a client from your own organisation, make similar complaints if they had to try to find meaningful information in some of your business systems? In the East Anglia story, the key criticisms according to the Herald article relate to the poor quality of database construction and not to the integrity or validity of the actual information. The programmer knows that there is good, viable information in the system. He just can’t work out exactly what it is, nor how to extract it in a useable and meaningful way.

The problems in this example come down to metadata: metadata or database fields that seem not to have been designed appropriately, and metadata fields that seem not to have been used appropriately. In particular, it seems that a range of encoding schemes that are necessary to correctly identify the type and location of climate data (World Meteorological Organisation codes, station names and geographic coordinates) were not applied consistently and correctly.

It’s important to state that the Bureau of Meteorology emphasised in the article that it does not know the source of the climate data being used. Its representative said that the Bureau had invested a lot of time in creating and maintaining high-quality, reliable Australian climate data. Raw Australian climate data is available across the world in real time for forecasting purposes, and this data could have come from countries other than Australia. Wherever it came from, the problems that arose through the use of the raw data offer some illuminating lessons:

  • realise the long-term implications of system design and of the metadata fields you implement
  • recognise that data quality, data reuse and data integrity all rely on good metadata application
  • where possible, use encoding schemes to give your data consistency and meaning
  • make sure people use your encoding schemes well. Poor use of these tools, and poor metadata in general, can make your information virtually unusable, as it did in this case.
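
To make the last two lessons concrete, here is a minimal sketch of checking station metadata against an encoding scheme at the point of entry. It is an illustration only, not a description of how any of the systems in the Herald story work. The `StationRecord` fields, the `validate_station` function and the sample values are hypothetical, and the rule that a station code is exactly five digits is an assumption modelled on World Meteorological Organisation station index numbers.

```python
import re
from dataclasses import dataclass
from typing import List

# Assumption: station codes are exactly five ASCII digits.
WMO_ID_PATTERN = re.compile(r"[0-9]{5}")


@dataclass
class StationRecord:
    """Hypothetical metadata record for one meteorological station."""
    wmo_id: str
    name: str
    latitude: float   # decimal degrees, negative values are south
    longitude: float  # decimal degrees, negative values are west


def validate_station(record: StationRecord) -> List[str]:
    """Return a list of metadata problems; an empty list means the record passed."""
    problems = []
    if not WMO_ID_PATTERN.fullmatch(record.wmo_id):
        problems.append(f"wmo_id {record.wmo_id!r} is not a five-digit code")
    if not record.name.strip():
        problems.append("station name is empty")
    if not -90.0 <= record.latitude <= 90.0:
        problems.append(f"latitude {record.latitude} is outside -90..90")
    if not -180.0 <= record.longitude <= 180.0:
        problems.append(f"longitude {record.longitude} is outside -180..180")
    return problems


# Hypothetical sample records, not real station data.
good = StationRecord("12345", "Example Station", -33.86, 151.21)
bad = StationRecord("1234", "", -133.86, 151.21)

for record in (good, bad):
    print(record.wmo_id, validate_station(record))
```

Rejecting or flagging a record when the list is not empty is the design choice that matters here: a problem caught when the metadata is created is far cheaper than one discovered years later by someone trying to reuse the data.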