Cameron pledges to free our data

David Cameron has taken the Conservatives’ promises on availability of public data a few steps further, in principle at least, in a speech at Imperial College on taking ‘broken politics’ into the ‘post-bureaucratic age’.

‘In Britain today, there are over 100,000 public bodies producing a huge amount of information,’ he said; ‘Most of this information is kept locked up by the state. And what is published is mostly released in formats that mean the information can’t be searched or used with other applications… This stands in the way of accountability.’ Now I’m still not convinced that there’s that much deliberate, conscious locking-up of data; but certainly, the formats in which that data is eventually made available often have the same end result.

OK, so we’re broadly agreed on the problem… what’s the solution, Dave?

We’re going to set this data free. In the first year of the next Conservative Government, we will find the most useful information in twenty different areas ranging from information about the NHS to information about schools and road traffic and publish it so people can use it. This information will be published proactively and regularly – and in a standardised format so that it can be ‘mashed up’ and interacted with.

What’s more, because there is no complete list that can tell us exactly what data the government collects, we will create a new ‘right to data’ so that further datasets can be requested by the public. By harnessing the wisdom of the crowd, we can find out what information individuals think will be important in holding the state to account. And to avoid bureaucrats blocking these requests, we will introduce a rule that any request will be successful unless it can be proved that it would lead to overwhelming costs or demonstrable personal privacy or national security concerns.

If we are serious about helping people exert more power over the state, we need to give them the information to do it. And as part of that process, we will review the role of the Information Commissioner to make sure that it is designed to maximise political accountability in our country.

Now don’t get me wrong here, it’s great to have Cameron’s explicit sign-up to the principle of data freedom, standardised formats, the presumed right of availability, and a 12-month timeframe. But it’s not really anything that the other major parties aren’t already talking about – and in the case of the current government, bringing in the Big Guns to actually do something about. OPSI’s data unlocking service, for example, is nearly a year old, and effectively answers the ‘wisdom of the crowd’ idea. Now it hasn’t been a huge hit… but the principle is already established.

And then there’s his unfortunate choice of public sector jobs as an example of what they might do:

Today, many central government and quango job adverts are placed in a select few newspapers. Some national, some regional. Some daily, some weekly. But all of them in a variety of different publications – meaning it’s almost impossible to find out how many vacancies there are across the public sector, what kind of salaries are being offered, how these vary from public sector body to public sector body and whether functions are being duplicated. Remember this is your money being put forward to give someone a job – and you have little way of finding out why, what for and for how much. Now imagine if they were all published online and in a standardised way. Not only could you find out about vacancies for yourself, you could cross-reference what jobs are on offer and make sure your money is being put to proper use.

Er, isn’t Mr C aware of the recently-upgraded Civil Service Jobs website – with its API, allowing individuals and commercial companies to access the data in a standardised format (XML plus a bit of RDF), and republish it freely? The Tories have talked about online job ads since December 2006; maybe it’s time they updated their spiel.

So what does today’s pledge boil down to? On one level it’s just headline-grabbing, bandwagon-jumping, government-bashing, policy-reannouncing rhetoric. But that’s not necessarily a bad thing. If all the work is going on already, but it isn’t well enough known, or isn’t proving as effective as it could/should be, maybe we should be welcoming any headlines the subject manages to grab. And if Cameron’s Conservatives do take power at the next election, and truly believe in what was said today, it would be the easy fulfilment of a campaign promise to yank these initiatives out of their quiet beta periods and into the limelight.

Tim Berners-Lee: the celebrity we need?

When Andrew Stott was appointed Director of Digital Engagement, I commented that it wasn’t the ‘rock star’ appointment many of us had been led to expect. Well, the ‘rock star’ appointment came through yesterday, with the news that Sir Tim Berners-Lee has been appointed as the government’s ‘expert advisor on public information delivery’. The Director position required evidence of having ‘run a public facing web site of significant size’: well, I guess TBL qualifies, having run the entire web at one point. 🙂

This is meant to send a loud and clear signal to the civil service: raw data now. And I couldn’t agree more; see this post, for example, from 2008 about ‘API-first publication’, in the context of the 2011 Census. But I think it’s more about how that signal gets sent.

The Cabinet Office press release says:

He will head a panel of experts who will advise the Minister for the Cabinet Office on how government can best use the internet to make non-personal public data as widely available as possible. He will oversee the work to create a single online point of access for government held public data and develop proposals to extend access to data from the wider public sector, including selecting and implementing common standards. He will also help drive the use of the internet to improve government consultation processes.

It reads like a rather hands-off, committee-based kind of role. And whilst that wouldn’t be a bad thing in itself, I wonder if it’s what The Machine really needs from him. What’s the question, to which ‘Sir Tim Berners-Lee’ is the answer?

I don’t think we particularly need the advice on standards; and I don’t know that TBL will be able to tell (checks the post-reshuffle situation) Tessa Jowell how to organise data publication processes inside the typical Whitehall department. But what he will be able to do is intimidate (sorry, persuade) those people who always seem to block the initiatives which have already gone before. He may have more success saying the exact same things many of us have already been saying for some time, because of who he is.

Stuart Bruce, who knows a thing or two about PR / technology / the Labour Party, responded thus on Twitter: ‘Opening access to government data YES! Well done. But Tim Berners-Lee? Isn’t that just like Sugar, yet more cult of celebrity.’ Maybe so. Probably so, in fact. But it may be exactly what we need.

BBC anger at DCSF data formats

BBC News website editor Steve Herrmann is not happy. In previous years, the Beeb site has carried full school league table data as soon as the embargo is lifted at 09:30. But not this year.

‘This is because the government has tightened up on the media’s pre-release access to official statistics,’ he explains. ‘In the past, we have generally got the official results a week in advance, under embargo, to compile and check tables. This time, we will have had sight of the data for just 24 hours.’

But, in fact, it’s not specifically the reduced lead time that’s the problem here. More accurately, it’s the reduced lead time to deal with what DCSF chucks at them.

The school results that are supplied to the news media are not in a readily accessible form. In the case of the secondary schools, there are two large spreadsheets, each with a number of pages… Each sheet has dozens of columns, and a row for each school and college. Formatting the essential benchmarks from all this for publication, using computer scripts to interrogate the data, compiling and then proofreading them, takes hours of work.

In other words, DCSF are unable – let’s give them the benefit of the doubt for a moment – to supply the data in a format which assists the end users (in this case, the entire national media) to do their jobs.

And it’s led to an official letter of complaint – signed jointly by the BBC, Press Association and national newspapers – to the DCSF’s chief statistician. ‘With less than 24 hours’ preparation time, it will be much more difficult to produce any meaningful analysis of the information and to ensure there are no errors,’ they write. ‘The result is that the main aim of the government and of our organisations – to provide an essential service to parents choosing a secondary school for their sons and daughters – will be thwarted.’

Statistical release procedures are a touchy subject; and school league tables are even touchier. Statisticians don’t like issuing them, because people insist on doing nasty things like – imagine! – ranking the schools in order. Many schools don’t like them either. But I can tell you for a fact, parents absolutely lap them up.

If DCSF, at whatever level, believes in the publication of this data, they need to make it easy for the major communication channels – the newspapers, and the BBC website – to republish it to their huge readerships. That clearly isn’t happening.
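To give a sense of how little scripting is needed once the data arrives in a clean, machine-readable form, here is a minimal Python sketch. It assumes, hypothetically, that one sheet has been exported as CSV with a single header row; the column names and sample rows are invented for illustration and are not DCSF’s real layout.

```python
import csv
import io

# Hypothetical column names: the real spreadsheets had dozens of columns
# per school, and their actual headings are not reproduced here.
NAME_COL = "school_name"
BENCHMARK_COL = "pct_5_a_star_to_c"


def extract_benchmarks(csv_text):
    """Keep only the name and benchmark columns from a wide, one-row-per-school table."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Skip schools with no figure (e.g. suppressed or not applicable).
        if not row.get(BENCHMARK_COL, "").strip():
            continue
        rows.append({NAME_COL: row[NAME_COL], BENCHMARK_COL: float(row[BENCHMARK_COL])})
    return rows


def league_table(rows):
    """Rank schools by the benchmark, highest first."""
    return sorted(rows, key=lambda r: r[BENCHMARK_COL], reverse=True)


# Made-up sample input standing in for one exported sheet.
SAMPLE = """school_name,la_code,pct_5_a_star_to_c,pupils
School A,101,62.5,180
School B,102,,95
School C,101,71.0,210
"""

if __name__ == "__main__":
    for rank, row in enumerate(league_table(extract_benchmarks(SAMPLE)), start=1):
        print(rank, row[NAME_COL], row[BENCHMARK_COL])
```

The hard part in practice is not this code but getting a consistent, documented layout in the first place; with multi-page spreadsheets and shifting headings, every year’s script has to be rewritten and rechecked.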

Stop what you’re doing and sign up

I’m not sure I need to waste my time explaining why you need to go to TheyWorkForYou and sign up to MySociety’s campaign to Free Our Bills – or rather, to have Parliamentary data marked up in mashup-friendly XML. Just compare ‘proper’ Hansard to TheyWorkForYou, and imagine the same process being done on all Parliamentary paperwork.

You may or may not be interested in the intricacies of XML parsing, or even in the uglier workings of the Houses of Parliament. But the fact is, TheyWorkForYou has become a living case study for what we want from e-government. It’s the best-practice example everyone quotes. And if they can persuade/force Parliament to work with them, it sets a valuable precedent for everyone else.

Quick update: Tom Steinberg has been in touch to say it’s not a petition, it’s ‘an action list, proper online campaign style’. Duly noted.

And when you’re done there… log into Facebook (come on, you remember) and join the campaign to allow clips of Parliament on YouTube. Useful in itself, and also helpful to MPs who want to show their constituents what they’re up to. My thanks to Lynne Featherstone for the tipoff.

Set the Census data free

One particularly difficult phase of my career was my time with National Statistics, in the aftermath of the 2001 Census. I tried, and ultimately failed, to persuade the organisation to recognise the tremendous asset they held in Census data, and to make wide public access a priority. I’m proud of some of the (relatively modest) things we managed to put out, but overall I’m disappointed at the many opportunities that were missed.

I remember my frustration at how everything was driven by very narrow ‘stakeholder consultation’, which ultimately resulted in the same old people asking for the same old things. The potential for civic engagement ranked well down the list of organisational priorities; the possibilities for data mashing didn’t even register. Despite the huge sums of money spent on countless consultancies, the end product was – ahem – somewhat underwhelming.

So when I discover that the 2011 Census outputs are the subject of the latest blog-based consultation, part of the Hansard Society’s Digital Dialogues programme, of course I’m interested. And I think we all should be.

Two dates to bear in mind here. It’s nearly a year since the publication of the Mayo-Steinberg Power Of Information report, which called for ‘a strategy in which government … supplies innovators that are re-using government-held information with the information they need, when they need it, in a way that maximises the long-term benefits for all citizens.’ And just as importantly, we’re probably five years away from the first publication of census data.

This must be the first Census to take a truly web-first, and arguably even an API-first, approach to publication. Several reasons:

  • Because it’s a one-off event, for which we have several years to prepare.
  • Because if you think the world is web-first in 2008, just you wait and see what 2013 looks like.
  • Because outsiders – from Experian to MySociety – will almost certainly do a better job than the Civil Service (sorry).
  • Because it doesn’t actually prevent government doing the ‘old school’ thing itself, if it wants. In fact, if you think ‘API first’, it’ll probably result in the ‘old school’ outputs coming together more easily and quickly too. Be your own client.
  • Because to have any validity, the Census requires the goodwill and engagement of every person in the country. It’s one of the rare occasions where every resident puts something into a national kitty. Even if it’s only symbolic, this should be the prime example of the state giving something back to them in return.

This is one government consultation where the geek community (by which I mean us, sadly) should bring its influence to bear. We all know it’s the right thing to do; but they won’t do it unless there’s a sizeable, quantifiable demand. This would be a huge symbolic victory for openness and democratisation. This is our chance.

‘Gov 2.0’ in US presidential campaigning

I’m grateful to Jeff Jarvis for a detailed post on ‘government 2.0’ (although it isn’t a term he used, nor should he have). He points to two recent proposals from the Democratic candidates for the US presidency.

I hadn’t heard Hillary Clinton’s suggestion, back in January, that government should actually be required to blog:

I want to have as much information about the way our government operates on the Internet so the people who pay for it, the taxpayers of America, can see that. I want to be sure that, you know, we actually have like agency blogs. I want people in all the government agencies to be communicating with people, you know, because for me, we’re now in an era–which didn’t exist before–where you can have instant access to information, and I want to see my government be more transparent.

Meanwhile, Barack Obama told an audience at Google:

I’ll put government data online in universally accessible formats. I’ll let citizens track federal grants, contracts, earmarks, and lobbyist contacts. I’ll let you participate in government forums, ask questions in real time, offer suggestions that will be reviewed before decisions are made, and let you comment on legislation before it is signed. And to ensure that every government agency is meeting 21st century standards, I’ll appoint the nation’s first Chief Technology Officer.

The concept of universally accessible data formats will/would be music to some people’s ears, of course.

Cameron calls for data standards

Of all the topics I might have expected David Cameron to speak in favour of, standardised data formats was not top of the list. So I’m grateful to Nick Booth for pointing to Cameron’s speech last Friday to the Conservative Councillors’ Association.

At the moment, local government bodies must provide the public with information about the services they provide, what goes on in council meetings and how councillors have voted on specific issues. But the information isn’t published in a standardised way. It’s impossible for the public, charities or private companies to effectively collate this data, compare and contrast your performance and hold you to account. That’s why the Government relies on expensive and bureaucratic schemes to try and hold local government to account.

We will turn that approach on its head. We will require local authorities to publish this information – about the services they provide, council meetings and how councillors vote – online and in a standardised format. That way, it can be collected and used by the public and third party groups.

A very timely call, especially in the light of discussions last week about Harry’s consultations site (some of which, I’m told, he’s updating manually?!). I particularly like the way Cameron ties this into the tangible benefits for councillors themselves, removing the need for so many performance measurement exercises.

I’ve had some dealings in this field, particularly during my time with National Statistics. And I’m afraid it’s going to be much, much more difficult than Cameron makes it sound. Too many legacy systems, a chaotic approach to statistical geography, and (frankly) too much opposition from statisticians. Very few statisticians appreciate why they do what they do; they just do it. The work takes on an almost monastic purity. They don’t trust mere mortals – media included – to represent it properly. I was one of a management team hired to drive a culture change in that regard. Our success was limited.

Cameron’s speech focuses on TheyWorkForYou as a role model for data reprocessing; but I’m not sure the comparison holds up too well. A database of numbers would be much more difficult and more sensitive than the absolutes of Hansard: who said what (subject to correction, of course), and who voted how. Realistically we need to get to greater standardisation of process first, starting with defining consistent geographic regions that allow sensible comparisons over time.

And besides, we haven’t done a great job of producing standardised data in RSS format, one of the simplest and most straightforward data standards out there.
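For a sense of just how simple RSS is, a minimal RSS 2.0 feed can be built with nothing more than Python’s standard library. The council name, URLs and item below are hypothetical placeholders, standing in for a council publishing its meeting records.

```python
import xml.etree.ElementTree as ET


def build_rss(title, link, description, items):
    """Build a minimal RSS 2.0 document as a string.

    items: iterable of dicts with 'title', 'link' and 'description' keys.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the three required channel elements.
    for tag, text in (("title", title), ("link", link), ("description", description)):
        ET.SubElement(channel, tag).text = text
    for item in items:
        entry = ET.SubElement(channel, "item")
        for tag in ("title", "link", "description"):
            ET.SubElement(entry, tag).text = item[tag]
    return ET.tostring(rss, encoding="unicode")


# Hypothetical council feed; the URLs are placeholders, not real addresses.
feed = build_rss(
    "Example Council: meetings",
    "https://council.example/meetings",
    "Agendas, minutes and recorded votes",
    [{
        "title": "Planning committee minutes",
        "link": "https://council.example/meetings/1",
        "description": "Minutes and recorded votes",
    }],
)
print(feed)
```

A real feed would add dates and unique identifiers per item, but the point stands: if every council emitted even this much in the same shape, collating it would be trivial.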