When I first saw reports of the Guardian’s new Data Store ‘open platform’, my heart sank. In a former life, I ran the web operation at the Office for National Statistics; I resigned in June 2004, when frustration started to turn to anger. I’ve still got a copy of my resignation letter, in which I wrote:
I have always maintained that the agenda of openness which I espoused is not a choice; it is a reality forced upon us by the modern communication environment. The general public’s expectations have moved on dramatically in the last decade [1995-2004]. Sadly, this [realisation] has not been shared by other parts of the Office on whom my work or resourcing have been dependent.
I warned them that someone would come along, do a better job than they were doing, and supplant them as the ‘primary source’. Once that happened, the statistical sanctity so jealously guarded by the priesthood of statisticians could very easily be compromised. In effect, to preserve the status quo, things had to change. (The message went unheeded, by the way: the six-month ‘stopgap’ site I introduced is soon to celebrate its seventh birthday.)
So today, the Guardian unveiled their Data Store. Editor-in-chief Alan Rusbridger is absolutely clear about the service’s purpose:
Publishing data has got easier [since 1821] but it brings with it confusion and inaccessibility. How do you know where to look, what is credible or up to date? Official documents are often published as uneditable pdf files – useless for analysis except in ways already done by the organisation itself.
Just to be clear, ONS: that’s you he’s talking about. It’s expressed even more starkly in an accompanying blog post by Simon Rogers, subtitled: ‘Looking for stats and facts? This is now the place to come.’ A quick look down the data on offer reveals a high proportion, a majority perhaps, to be ONS or other HMG data. Their tanks are on your lawn, guys.
Now I’m not for one minute suggesting the Guardian would do anything malicious. I’m simply warning of the uncomfortable position where an outside entity – indeed, in this case, one with an explicit political slant – becomes the gatekeeper to (supposedly) pure statistical data. Can we rely on them to be as comprehensive, as conscientious, as religious in their devotion to updates, corrections and revisions? No, admits Simon Rogers: ‘it is not comprehensive… this is selective’.
So is this the Doomsday Scenario I predicted? Not quite, not yet. How exactly are the Guardian serving up the data?
We’ve chosen Google Spreadsheets to host these data sets as the service offers some nice features for people who want to take the data and use it elsewhere [in] a selection of output formats including Excel, HTML, Acrobat PDF, text and csv. A key reason for choosing Google Spreadsheets to publish our data is not just the user-friendly sharing functionality but also the programmatic access it offers directly into the data. There is an API that will enable developers to build applications using the data, too.
You read that right: the actual mechanics are as basic as uploading or copying existing Excel spreadsheets, converting or pasting them into Google Docs spreadsheets (price: £0 for 5000 reasonably-sized files), and letting the Google functionality do the rest. Take, by way of example, the data on England’s population by sex and race. The Guardian offers this Google Spreadsheet. Now download this Excel file from the ONS website, and look at the sheet labelled ‘Datasheet’. Actually, let me save you the bother: they’re identical.
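If you would rather not take my word for it, the comparison is easy to script. What follows is only a rough sketch of one way to do it, assuming you have saved both the Guardian's sheet and the ONS 'Datasheet' tab as CSV text. The function names and the sample figures are my own inventions for illustration (they are not real ONS or Guardian data), and the check ignores stray whitespace and blank rows, which tend to creep in during copy-and-paste.

```python
import csv
import io


def load_rows(csv_text):
    """Parse CSV text into rows, trimming whitespace and dropping blank rows."""
    rows = []
    for row in csv.reader(io.StringIO(csv_text)):
        cells = [cell.strip() for cell in row]
        if any(cells):
            rows.append(cells)
    return rows


def differing_cells(csv_a, csv_b):
    """Return (row, column, value_a, value_b) for every cell that differs."""
    rows_a, rows_b = load_rows(csv_a), load_rows(csv_b)
    diffs = []
    for r in range(max(len(rows_a), len(rows_b))):
        row_a = rows_a[r] if r < len(rows_a) else []
        row_b = rows_b[r] if r < len(rows_b) else []
        for c in range(max(len(row_a), len(row_b))):
            a = row_a[c] if c < len(row_a) else ""
            b = row_b[c] if c < len(row_b) else ""
            if a != b:
                diffs.append((r, c, a, b))
    return diffs


# Made-up placeholder figures standing in for the two exports.
ONS_EXPORT = "Group,Males,Females\nA,100,110\nB,200,210\n"
GUARDIAN_EXPORT = "Group,Males,Females\r\nA,100,110\r\nB, 200 ,210\r\n\r\n"

# An empty list means the two tables hold the same values.
print(differing_cells(ONS_EXPORT, GUARDIAN_EXPORT))
```

An empty result means the two tables match cell for cell; anything else tells you exactly where a copy has drifted from the original, which is precisely the kind of check a secondary source would need to keep running after every revision.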
Cabinet Office minister Tom Watson writes on his blog: ‘Governments should be doing this. Governments will be doing it. The question is how long will it take us to catch up.’ The answer is, the few seconds it takes to sign up for a Google account, and maybe an hour of copy-and-paste. So, tomorrow lunchtime, then.
This afternoon, I thought this was a disaster for ONS’s future. I’ve changed my mind. The Guardian’s move sets a precedent, and lays down a direct, unavoidable challenge. It could actually be ONS’s salvation.