Creative Commons coming to data.gov.uk

There’s something almost unnerving about the launch of a government website getting so much positive coverage. But today’s been data.gov.uk’s big day, and everyone seems to agree it’s a jolly good thing. For now.

James Crabtree’s piece for Prospect magazine hails it as ‘a tale of star power, serendipity, vision, persistence and an almost unprecedented convergence of all levels of government’. The New Statesman says it’s ‘a far more radical project than it first appears… a clear break with the closed, data-hugging state of the past.’ We’re all getting quite excitable, aren’t we?

Me? I’m just looking back over posts on this blog last year: this one about the need to make moves on data release (including an excerpt from my resignation letter from ONS), and this one on Tim Berners-Lee’s appointment. I’ll confess, I got something wrong in that latter post; I wrote that it was ‘probably’ a cult-of-celebrity, hands-off appointment. Looks like that wasn’t entirely accurate. Sorry.

This has been a long time coming. Too long. Shamefully long. But there is still good reason to be excited. Amid all the talk about bicycle accidents, you may have missed the news that OPSI is working on simplified T&Cs for reuse of the site’s data:

These terms and conditions have been aligned so that they are interoperable with any Creative Commons Attribution 3.0 Licence. The terms and conditions are also machine readable meaning that the licence is presented and coded in such a way that applications and programs can access and understand the terms and conditions too.

This is the first major step towards the adoption of a non-transactional, Creative Commons style approach to licensing the re-use of government information. The new model will replace the existing Click-Use Licence. We are working towards the launch of the new licence model by the end of May 2010.

Don’t overlook the significance of this move. This is government adopting someone else’s standard, for something they have historically claimed as their own. The Click-Use Licence is actually pretty liberal… but it’s scary.

This simple shift will take us from this:

Unless otherwise specified the information on this site is covered by either Crown Copyright, Crown Database Right or has been licensed to the Crown. It is your responsibility to clear any other rights. You are encouraged to use and re-use the information that is available on and through this site freely and flexibly, with only a few conditions…

to this (or something very like it). We, the citizens of the web, know what Creative Commons means: we don’t need to look it up, we won’t need a dictionary, and we won’t need a lawyer. Good things will happen as a direct result.

ONS drops jobs data early

I’ve actually got a lot of sympathy for the team at the Office for National Statistics today. This morning should have seen the release of the monthly unemployment numbers; but due (apparently) to ‘a computer error on automated systems’, they leaked out yesterday – and ONS took the decision to bring forward the official publication. Bearing in mind the market sensitivity of the data, I can imagine the scenes.

As I’ve mentioned before, I was in charge of the web team(s) at ONS for a couple of years, from 2002. It was one of the most frustrating periods of my career: for all my best efforts, my vision of web-friendly database publication went unrealised. Instead, the current National Statistics website is still fundamentally the same 6-month stop-gap site I pushed through in 2002. I don’t know about the underlying data-crunching systems, but I see no evidence of there having been any improvement since I left. They were inadequate then, and they look even more inadequate this morning.

Instead, improvements to government statistics online now seem to be centred on something called the Publication Hub. In effect, it’s a big catalogue of government statistical releases – most of which are still located on the originating department’s website, and are still being delivered as PDF or Excel files. User-friendly it ain’t, placing the priority on ‘metadata’ (which, in statistical terms, means lengthy written explanations of methodology) rather than the actual data. Most people will struggle to find any numbers whatsoever.

There are some appalling quirks: for example, if you press the homepage button to see the ‘next 30 days’ of scheduled statistical releases, you see day 30 first, and have to click two or three times to get to day 1 (ie tomorrow). And whilst it’s good to see RSS has been taken into account, it’s impossible to work out what’s meant to be included in the feed each time you see the orange icon.

I left ONS five years ago because I didn’t believe senior management recognised that the world had changed. In my letter of resignation, I suggested the Office was ‘five years behind the times’. Another five years on, if this Publication Hub is the answer, they still haven’t understood the question… and we’ll have to rely on third parties.

Guardian Data Store: threat to ONS or its saviour?

When I first saw reports of the Guardian’s new Data Store ‘open platform’, my heart sank. In a former life, I ran the web operation at the Office for National Statistics; I resigned in June 2004, when frustration started to turn to anger. I’ve still got a copy of my resignation letter, in which I wrote:

I have always maintained that the agenda of openness which I espoused is not a choice; it is a reality forced upon us by the modern communication environment. The general public’s expectations have moved on dramatically in the last decade [1995-2004]. Sadly, this [realisation] has not been shared by other parts of the Office on whom my work or resourcing have been dependent.

I warned them that someone would come along, do a better job than they were doing, and supplant them as the ‘primary source’. Once that happened, the statistical sanctity so jealously guarded by the priesthood of statisticians could very easily be compromised. In effect, to preserve the status quo, things had to change. (The message went unheeded, by the way: the six-month ‘stopgap’ site I introduced is soon to celebrate its seventh birthday.)

So today, the Guardian unveiled their Data Store. Editor-in-chief Alan Rusbridger is absolutely clear about the service’s purpose:

Publishing data has got easier [since 1821] but it brings with it confusion and inaccessibility. How do you know where to look, what is credible or up to date? Official documents are often published as uneditable pdf files – useless for analysis except in ways already done by the organisation itself.

Just to be clear, ONS: that’s you he’s talking about. It’s expressed even more starkly in an accompanying blog post by Simon Rogers, subtitled: ‘Looking for stats and facts? This is now the place to come.’ A quick look down the data on offer reveals a high proportion, a majority perhaps, to be ONS or other HMG data. Their tanks are on your lawn, guys.

Now I’m not for one minute suggesting the Guardian would do anything malicious. I’m simply warning of the uncomfortable position where an outside entity – indeed, in this case, one with an explicit political slant – becomes the gatekeeper to (supposedly) pure statistical data. Can we rely on them to be as comprehensive, as conscientious, as religious in their devotion to updates, corrections and revisions? No, admits Simon Rogers: ‘it is not comprehensive… this is selective’.

So is this the Doomsday Scenario I predicted? Not quite, not yet. How exactly are the Guardian serving up the data?

We’ve chosen Google Spreadsheets to host these data sets as the service offers some nice features for people who want to take the data and use it elsewhere [in] a selection of output formats including Excel, HTML, Acrobat PDF, text and csv. A key reason for choosing Google Spreadsheets to publish our data is not just the user-friendly sharing functionality but also the programmatic access it offers directly into the data. There is an API that will enable developers to build applications using the data, too.

You read that right: the actual mechanics are as basic as: uploading/copying existing Excel spreadsheets, converting/pasting them into Google Docs spreadsheets (price: £0 for 5000 reasonably-sized files), and letting the Google functionality do the rest. By way of example, data on England’s population by sex and race. The Guardian offers this Google Spreadsheet. Now download this Excel file from the ONS website, and look at the sheet labelled ‘Datasheet’. Actually, let me save you the bother: they’re identical.
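If you'd rather not take my word for 'identical', the check is easy to script. What follows is only a minimal sketch, assuming you have saved both sheets as CSV text: the function names (load_rows, diff_rows) are my own invention, and the two sample tables are made-up placeholder figures, not the real ONS or Guardian numbers.

```python
import csv
import io


def load_rows(csv_text):
    """Parse CSV text into rows, trimming cell whitespace and dropping blank rows."""
    rows = []
    for row in csv.reader(io.StringIO(csv_text)):
        cells = [cell.strip() for cell in row]
        if any(cells):
            rows.append(cells)
    return rows


def diff_rows(a, b):
    """Return (row_number, row_a, row_b) for every row where the two tables differ."""
    diffs = []
    for i in range(max(len(a), len(b))):
        row_a = a[i] if i < len(a) else None
        row_b = b[i] if i < len(b) else None
        if row_a != row_b:
            diffs.append((i + 1, row_a, row_b))
    return diffs


# Placeholder tables (hypothetical figures, NOT real ONS or Guardian data).
# The second copy differs only in line endings, stray spaces and a trailing
# blank row: the sort of noise a copy-and-paste job tends to introduce.
ONS_EXPORT = "Group,Males,Females\nA,100,110\nB,200,190\n"
GUARDIAN_EXPORT = "Group,Males,Females\r\nA, 100,110\r\nB,200,190\r\n\r\n"

if __name__ == "__main__":
    differences = diff_rows(load_rows(ONS_EXPORT), load_rows(GUARDIAN_EXPORT))
    print("identical" if not differences else differences)
```

Comparing cell by cell, rather than byte by byte, is deliberate: two files can hold exactly the same numbers and still differ in whitespace or line endings.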

Cabinet Office minister Tom Watson writes on his blog: ‘Governments should be doing this. Governments will be doing it. The question is how long will it take us to catch up.’ The answer is, the few seconds it takes to sign up for a Google account, and maybe an hour of copy-and-paste. So, tomorrow lunchtime, then.

This afternoon, I thought this was a disaster for ONS’s future. I’ve changed my mind. The Guardian’s move sets a precedent, and lays down a direct, unavoidable challenge. It could actually be ONS’s salvation.

API promised for 2011 Census data

Chances are, you missed last month’s publication of the Cabinet Office’s white paper on the 2011 Census. ‘Modern times demand modern approaches,’ declares Sir Michael Scholar, chair of the UK Statistics Authority: you’ll be able to complete your census form online, and ‘all standard outputs will be publicly accessible online, and free of charge, from the National Statistics website’ (whatever that is – as I understand it, the name disappeared in the UKSA rebranding).

The Census represents a marvellous opportunity. We’re now many years into the post-web world, and online is now the main distribution channel for data. We’ve got several years to learn from the best practice of others, be they fellow statistical organisations around the world, or heavy-duty data disseminators like the financial markets. There’s no issue as regards a business model: the commitment to free availability has already been made. It’s an open goal.

Unfortunately, I probably wrote something almost identical to the preceding paragraph seven years ago, when I started working for ONS as Web Editor in Chief, full of optimism at what magic we could weave with the 2001 census data. It didn’t last; there was virtually zero consideration of public usage in the output plans, and I couldn’t persuade the key people of the cultural shift happening outside. There were some blazing rows. I left ONS in 2004; it says something that the website I built as a six-month stopgap in 2002 is still their main web presence – reskin aside, almost exactly as I left it.

A quick skim through the white paper provides little reason to restore my optimism. It has more to say about printed books of preformatted tables than it does about electronic methods – there’s no fleshing-out of what ‘online dissemination’ might mean. Instead, there’s a commitment to produce CDs and DVDs… seriously? in 2012?

But there may yet be hope. Back in December, ONS quietly launched a 2011 UK Census Output consultation – based, remarkably, on a Wiki platform. They’ve published initial survey findings from 500+ respondents, half of whom were in government; it’s a bit disappointing to see so little input from potential new customers (only 2%), as opposed to the ‘usual suspects’. Yet a clear majority of this normally conservative (small ‘c’) audience said they would be happy with electronic output alone.

And hallelujah! – elsewhere on the wiki there’s even a statement that the ‘intention is to support a variety of electronic dissemination options through the use of an internet-based API [said on another page to be ‘publicly-available’] that can access the full range of aggregated Census statistics.’ There’s even a link to a list of the API calls to be offered – but it ‘does not (yet) exist’. Many a slip twixt cup and lip, as they say… but they’re undoubtedly talking the right talk here, and perhaps that’s all we can ask at this stage.

My only plea is that they remember the huge potential value for new users. Things have moved on dramatically since 2001; I can think of countless websites which would adore a system they could hook into, with fantastic potential benefits to ordinary web users. The wiki’s list of planned response formats betrays the ‘insiders first’ instinct again: nothing your average masher will be familiar with. Consult your community by all means, guys; but recognise there’s an even wider potential community these days.

  • PS: It’s not the Census group’s first venture into social media: two years ago, they took part in the Hansard Society’s Digital Dialogues initiative, with a blog centred on consultation on small area geography policy. Ten blog posts in three months (over Christmas) isn’t great, and the Hansard Soc was politely critical of the blogger’s failure to engage with the readership, and the organisation’s failure to take the initiative forward. Interestingly, the site has been wiped from the record books: the Hansard Soc’s graphics have been replaced by Flickr errors, and the onsgeography.net domain name appears to have lapsed. There’s always web.archive.org though… 🙂

Power Taskforce’s ideas on crime maps

The Home Office is confirming that it’ll press ahead with online crime mapping, as recommended by today’s Casey Report on Engaging Communities in Fighting Crime.

Even better, the Power Of Information taskforce – specifically Will Perrin and Tom Loosemore, in apparent association with designers Schulze and Webb – have posted a few concepts showing not only the mapping of crime data, aggregated to postcode sector; not only an overlaid layer of data showing public facilities such as schools, pubs and cash machines; but also the ability to actually do something as a follow-on. I’m especially intrigued by the RSS icon: blogging bobbies, perhaps?

Judging by the mockups anyway, we’re looking at some serious interaction potential: polling on local priorities, emailing the local policing team or your local elected representatives. (Never mind the possibility of interacting with the data.)
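For anyone wondering what 'aggregated to postcode sector' involves, here is a rough sketch of the idea, and emphatically not the taskforce's or the Home Office's actual method. A postcode sector is the outward code plus the first digit of the inward code, so a postcode of the form 'ZZ1 1AA' belongs to sector 'ZZ1 1'. The function names and the incident records below are hypothetical, and 'ZZ' is a deliberately made-up postcode area.

```python
from collections import Counter


def postcode_sector(postcode):
    """Return the sector: the outward code plus the first digit of the inward code."""
    compact = "".join(postcode.split()).upper()
    # The inward code is always the last three characters (a digit, then two letters).
    if len(compact) < 5 or not compact[-3].isdigit():
        raise ValueError(f"not a full UK postcode: {postcode!r}")
    return f"{compact[:-3]} {compact[-3]}"


def count_by_sector(incidents):
    """Count incidents per postcode sector from (postcode, category) pairs."""
    return Counter(postcode_sector(postcode) for postcode, _category in incidents)


# Hypothetical incident records, purely for illustration.
INCIDENTS = [
    ("ZZ1 1AA", "burglary"),
    ("zz1 1ab", "vehicle crime"),
    ("ZZ11BC", "burglary"),
    ("ZZ1 2AA", "criminal damage"),
]

if __name__ == "__main__":
    for sector, total in sorted(count_by_sector(INCIDENTS).items()):
        print(sector, total)
```

Counting at sector level keeps individual addresses off the map, which is presumably the reason for aggregating in the first place.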

It’s not the first time some/most of this has been proposed: whilst working at National Statistics, I was involved in the concept work which ultimately led to the disappointing Neighbourhood Statistics. It’s not as if we didn’t have some of these same ideas… but mashing-up has come a long way since then, thanks particularly to Google Maps. I note the ‘presumption’ that Google’s technology would underpin these maps… another nail in Ordnance Survey’s coffin?