When I first saw reports of the Guardian’s new Data Store ‘open platform’, my heart sank. In a former life, I ran the web operation at the Office for National Statistics; I resigned in June 2004, when frustration started to turn to anger. I’ve still got a copy of my resignation letter, in which I wrote:
I have always maintained that the agenda of openness which I espoused is not a choice; it is a reality forced upon us by the modern communication environment. The general public’s expectations have moved on dramatically in the last decade [1995-2004]. Sadly, this [realisation] has not been shared by other parts of the Office on whom my work or resourcing have been dependent.
I warned them that someone would come along, do a better job than they were doing, and supplant them as the ‘primary source’. Once that happened, the statistical sanctity so jealously guarded by the priesthood of statisticians could very easily be compromised. In effect, to preserve the status quo, things had to change. (The message went unheeded, by the way: the six-month ‘stopgap’ site I introduced is soon to celebrate its seventh birthday.)
So today, the Guardian unveiled their Data Store. Editor-in-chief Alan Rusbridger is absolutely clear about the service’s purpose:
Publishing data has got easier [since 1821] but it brings with it confusion and inaccessibility. How do you know where to look, what is credible or up to date? Official documents are often published as uneditable pdf files – useless for analysis except in ways already done by the organisation itself.
Just to be clear, ONS: that’s you he’s talking about. It’s expressed even more starkly in an accompanying blog post by Simon Rogers, subtitled: ‘Looking for stats and facts? This is now the place to come.’ A quick look down the data on offer reveals a high proportion, a majority perhaps, to be ONS or other HMG data. Their tanks are on your lawn, guys.
Now I’m not for one minute suggesting the Guardian would do anything malicious. I’m simply warning of the uncomfortable position where an outside entity – indeed, in this case, one with an explicit political slant – becomes the gatekeeper to (supposedly) pure statistical data. Can we rely on them to be as comprehensive, as conscientious, as religious in their devotion to updates, corrections and revisions? No, admits Simon Rogers: ‘it is not comprehensive… this is selective’.
So is this the Doomsday Scenario I predicted? Not quite, not yet. How exactly are the Guardian serving up the data?
We’ve chosen Google Spreadsheets to host these data sets as the service offers some nice features for people who want to take the data and use it elsewhere [in] a selection of output formats including Excel, HTML, Acrobat PDF, text and csv. A key reason for choosing Google Spreadsheets to publish our data is not just the user-friendly sharing functionality but also the programmatic access it offers directly into the data. There is an API that will enable developers to build applications using the data, too.
You read that right: the actual mechanics are as basic as: uploading/copying existing Excel spreadsheets, converting/pasting them into Google Docs spreadsheets (price: £0 for 5000 reasonably-sized files), and letting the Google functionality do the rest. By way of example, data on England’s population by sex and race. The Guardian offers this Google Spreadsheet. Now download this Excel file from the ONS website, and look at the sheet labelled ‘Datasheet’. Actually, let me save you the bother: they’re identical.
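For anyone who wants to check a claim like that rather than take it on trust, here is a minimal sketch in Python, assuming both sheets have been exported as CSV text. The sample figures are placeholders rather than real ONS data, and the function names (load_rows, first_difference) are invented for illustration.

```python
import csv
import io


def load_rows(csv_text):
    """Parse CSV text into rows, trimming whitespace and skipping blank rows."""
    reader = csv.reader(io.StringIO(csv_text))
    rows = []
    for row in reader:
        cells = [cell.strip() for cell in row]
        if any(cells):
            rows.append(cells)
    return rows


def first_difference(rows_a, rows_b):
    """Return (row, column) of the first differing cell, or None if identical."""
    for r, (row_a, row_b) in enumerate(zip(rows_a, rows_b)):
        if row_a == row_b:
            continue
        for c in range(max(len(row_a), len(row_b))):
            cell_a = row_a[c] if c < len(row_a) else None
            cell_b = row_b[c] if c < len(row_b) else None
            if cell_a != cell_b:
                return (r, c)
    if len(rows_a) != len(rows_b):
        # One table simply has extra rows at the end.
        return (min(len(rows_a), len(rows_b)), 0)
    return None


# Placeholder exports: same values, different line endings and spacing.
ons_csv = "Group,Males,Females\nA,10,12\nB,7,9\n"
guardian_csv = "Group,Males,Females\r\nA, 10,12\r\nB,7,9\r\n"

identical = first_difference(load_rows(ons_csv), load_rows(guardian_csv)) is None
print("identical" if identical else "different")
```

Trimming whitespace before comparing is a design choice: it treats cosmetic differences introduced by the copy-and-paste as noise, so only a changed value counts as a difference.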
Cabinet Office minister Tom Watson writes on his blog: ‘Governments should be doing this. Governments will be doing it. The question is how long will it take us to catch up.’ The answer is, the few seconds it takes to sign up for a Google account, and maybe an hour of copy-and-paste. So, tomorrow lunchtime, then.
This afternoon, I thought this was a disaster for ONS’s future. I’ve changed my mind. The Guardian’s move sets a precedent, and lays down a direct, unavoidable challenge. It could actually be ONS’s salvation.
9 thoughts on “Guardian Data Store: threat to ONS or its saviour?”
To be honest, I see this as a challenge to all Departments which publish useful stats (school league tables I suspect were top of the Guardian’s mind, but also New Year’s Honours, politicians’ finances etc).
ONS information is certainly a massively useful collection of datasets, but public sector organisations are sitting on much simpler but potentially enormously useful spreadsheet content – lists of branches including postcodes, historical spend figures on X, satisfaction figures for Y – which could live quite happily in a Google spreadsheet. One of the barriers to greater reuse of public sector information is simply having a way to publish it, and I think this demonstrates a simple, low-cost way which doesn’t necessitate special web publishing or a fancy API.
As we both know, ONS struggled for years with information architecture and user interface decisions about how to present its datasets in usable ways. As the Guardian Data Store grows, it will be interesting to see how they address that thorny question.
The more open information we have in this country, the better. This is a great first step, but ultimately we’re going to have to revisit the whole Crown Copyright mess, which in many ways is holding UK government services back. Not only do we have to pay for the data, but government-funded organisations have to pay when they share the data between themselves …
We’ve got an opportunity to lead the world in this, but it looks like we’re going to be falling behind the US, as usual.
I think you’re a little harsh on the “priesthood of statisticians”. Many of them working on the ground with the data were (and are) desperate to improve their web offering to the public but have found the top of the Office, and those working in the web area of the Office, unwilling and/or unable to help them do it.
I’m not criticising the statisticians’ devotion to their data, far from it. I’m in awe of it. No, when I think of the people I blamed for ONS’s failure to deliver on the vision I was asked to develop, I think it’s fair to say none of them were (still) front-line statisticians.
“the actual mechanics are as basic as: uploading/copying existing Excel spreadsheets, converting/pasting them into Google Docs spreadsheets”
Google spreadsheets will actually import an XLS document that is linked to on the web if you give it a URL.
However, this import is a one-shot affair (I think) – once uploaded, that’s the data you’ve got.
However, if the data structure is simple enough to be published online as CSV, then you can use the =importData("URL") formula ( http://googlesystem.blogspot.com/2007/09/google-spreadsheets-lets-you-import.html ) to keep the spreadsheet data synched (within an hour or two) with the original data. Okay – so you need to get the ONS data published (where possible) as CSV, but then the Google spreadsheet will offer a much richer API to the data on your behalf…
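The idea of pulling a published CSV and re-pulling it to stay in sync can be sketched outside the spreadsheet too. This is a rough Python illustration of the pattern, not a description of how Google implements importData. The URL is hypothetical, and the names fetch_text, import_data and refresh are made up for the example.

```python
import csv
import io
import urllib.request


def fetch_text(url):
    """Download a URL and decode the body as UTF-8 text."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def import_data(url, fetch=fetch_text):
    """Fetch a CSV file and return it as a list of rows (lists of strings)."""
    return list(csv.reader(io.StringIO(fetch(url))))


def refresh(cached_rows, url, fetch=fetch_text):
    """Re-fetch the source; return (rows, changed) so callers can react to updates."""
    fresh_rows = import_data(url, fetch)
    return fresh_rows, fresh_rows != cached_rows


# Hypothetical location of a published CSV; replace with a real one to try it.
SOURCE_URL = "https://example.org/population.csv"


def fake_fetch(url):
    """Stand-in for the network call so the sketch runs offline."""
    return "Group,Total\nA,22\nB,16\n"


rows = import_data(SOURCE_URL, fetch=fake_fetch)
rows, changed = refresh(rows, SOURCE_URL, fetch=fake_fetch)
```

Passing the fetch function in as a parameter keeps the sketch testable without a network connection; calling refresh on a timer would approximate the periodic re-sync described above.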
Someone has to ‘produce’ the figures first. It is the collection and production of accurate data that is the more important role of ONS. I’m not bothered if someone views the figures via the Guardian website or the ONS one. When the Guardian starts collecting and compiling the figures then ONS will really have something to worry about.