ONS drops jobs data early

I’ve actually got a lot of sympathy for the team at the Office for National Statistics today. This morning should have seen the release of the monthly unemployment numbers; but due (apparently) to ‘a computer error on automated systems’, they leaked out yesterday – and ONS took the decision to bring forward the official publication. Bearing in mind the market sensitivity of the data, I can imagine the scenes.

As I’ve mentioned before, I was in charge of the web team(s) at ONS for a couple of years, from 2002. It was one of the most frustrating periods of my career: for all my best efforts, my vision of web-friendly database publication went unrealised. Instead, the current National Statistics website is still fundamentally the same 6-month stop-gap site I pushed through in 2002. I don’t know about the underlying data-crunching systems, but I see no evidence of there having been any improvement since I left. They were inadequate then, and they look even more inadequate this morning.

Instead, improvements to government statistics online now seem to be centred on something called the Publication Hub. In effect, it’s a big catalogue of government statistical releases – most of which are still located on the originating department’s website, and are still being delivered as PDF or Excel files. User-friendly it ain’t, placing the priority on ‘metadata’ (which, in statistical terms, means lengthy written explanations of methodology) rather than the actual data. Most people will struggle to find any numbers whatsoever.

There are some appalling quirks: for example, if you press the homepage button to see the ‘next 30 days’ of scheduled statistical releases, you see day 30 first, and have to click two or three times to get to day 1 (ie tomorrow). And whilst it’s good to see RSS has been taken into account, it’s impossible to work out what’s meant to be included in the feed each time you see the orange icon.

I left ONS five years ago because I didn’t believe senior management recognised that the world had changed. In my letter of resignation, I suggested the Office was ‘five years behind the times’. Another five years on, if this Publication Hub is the answer, they still haven’t understood the question… and we’ll have to rely on third parties.

Guardian Data Store: threat to ONS or its saviour?

When I first saw reports of the Guardian’s new Data Store ‘open platform’, my heart sank. In a former life, I ran the web operation at the Office for National Statistics; I resigned in June 2004, when frustration started to turn to anger. I’ve still got a copy of my resignation letter, in which I wrote:

I have always maintained that the agenda of openness which I espoused is not a choice; it is a reality forced upon us by the modern communication environment. The general public’s expectations have moved on dramatically in the last decade [1995-2004]. Sadly, this [realisation] has not been shared by other parts of the Office on whom my work or resourcing have been dependent.

I warned them that someone would come along, do a better job than they were doing, and supplant them as the ‘primary source’. Once that happened, the statistical sanctity so jealously guarded by the priesthood of statisticians could very easily be compromised. In effect, to preserve the status quo, things had to change. (The message went unheeded, by the way: the six-month ‘stopgap’ site I introduced is soon to celebrate its seventh birthday.)

So today, the Guardian unveiled their Data Store. Editor-in-chief Alan Rusbridger is absolutely clear about the service’s purpose:

Publishing data has got easier [since 1821] but it brings with it confusion and inaccessibility. How do you know where to look, what is credible or up to date? Official documents are often published as uneditable pdf files – useless for analysis except in ways already done by the organisation itself.

Just to be clear, ONS: that’s you he’s talking about. It’s expressed even more starkly in an accompanying blog post by Simon Rogers, subtitled: ‘Looking for stats and facts? This is now the place to come.’ A quick look down the data on offer reveals a high proportion, a majority perhaps, to be ONS or other HMG data. Their tanks are on your lawn, guys.

Now I’m not for one minute suggesting the Guardian would do anything malicious. I’m simply warning of the uncomfortable position where an outside entity – indeed, in this case, one with an explicit political slant – becomes the gatekeeper to (supposedly) pure statistical data. Can we rely on them to be as comprehensive, as conscientious, as religious in their devotion to updates, corrections and revisions? No, admits Simon Rogers: ‘it is not comprehensive… this is selective’.

So is this the Doomsday Scenario I predicted? Not quite, not yet. How exactly are the Guardian serving up the data?

We’ve chosen Google Spreadsheets to host these data sets as the service offers some nice features for people who want to take the data and use it elsewhere [in] a selection of output formats including Excel, HTML, Acrobat PDF, text and csv. A key reason for choosing Google Spreadsheets to publish our data is not just the user-friendly sharing functionality but also the programmatic access it offers directly into the data. There is an API that will enable developers to build applications using the data, too.

You read that right: the actual mechanics are as basic as uploading or copying existing Excel spreadsheets, converting or pasting them into Google Docs spreadsheets (price: £0 for 5000 reasonably-sized files), and letting the Google functionality do the rest. By way of example, take the data on England’s population by sex and race: the Guardian offers this Google Spreadsheet. Now download this Excel file from the ONS website, and look at the sheet labelled ‘Datasheet’. Actually, let me save you the bother: they’re identical.
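If you would rather not take my word for ‘identical’, the check is easy enough to script. What follows is only a rough sketch, assuming you have exported both sheets to CSV text yourself; the function names (`load_rows`, `differing_cells`) and the sample figures are made up for illustration, and are neither real ONS numbers nor anything the Guardian or Google provides.

```python
import csv
import io


def load_rows(csv_text):
    """Parse CSV text into rows, trimming whitespace and dropping
    trailing empty cells and completely empty rows."""
    rows = []
    for raw in csv.reader(io.StringIO(csv_text)):
        cells = [cell.strip() for cell in raw]
        while cells and cells[-1] == "":
            cells.pop()
        if cells:
            rows.append(cells)
    return rows


def differing_cells(csv_a, csv_b):
    """Return (row, column, value_a, value_b) for every cell that differs."""
    a, b = load_rows(csv_a), load_rows(csv_b)
    diffs = []
    for r in range(max(len(a), len(b))):
        row_a = a[r] if r < len(a) else []
        row_b = b[r] if r < len(b) else []
        for c in range(max(len(row_a), len(row_b))):
            va = row_a[c] if c < len(row_a) else ""
            vb = row_b[c] if c < len(row_b) else ""
            if va != vb:
                diffs.append((r, c, va, vb))
    return diffs


# Placeholder data standing in for the two exports (NOT real statistics).
ons_export = "Group,Males,Females\nA,10,12\nB,7,9\n"
guardian_export = "Group,Males,Females,\nA, 10,12\nB,7,9\n\n"

# An empty list means the two sheets carry the same cells.
print(differing_cells(ons_export, guardian_export))
```

The trimming is deliberate: a copy-and-paste between spreadsheet tools tends to pick up stray spaces and blank trailing columns, and those should not count as differences in the data.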

Cabinet Office minister Tom Watson writes on his blog: ‘Governments should be doing this. Governments will be doing it. The question is how long will it take us to catch up.’ The answer is, the few seconds it takes to sign up for a Google account, and maybe an hour of copy-and-paste. So, tomorrow lunchtime, then.

This afternoon, I thought this was a disaster for ONS’s future. I’ve changed my mind. The Guardian’s move sets a precedent, and lays down a direct, unavoidable challenge. It could actually be ONS’s salvation.