MySociety completes crowd-sourced video markup

Congratulations (hardly for the first time, of course) to the MySociety crew: in less than two months, it looks like their community of volunteers has completed the work to timestamp the 42,019 video clips supplied to They Work For You by BBC Parliament, covering the entire 2007-8 parliamentary session. Hero status is rightly accorded to Abi Broom, responsible herself for more than 20% of the effort (!)… but it’s interesting to see a few familiar names in the list of ‘top timestampers’.

Of course, the clock is ticking down to the start of the new Parliamentary Session, when the work starts all over again. Tom Steinberg tells me they get ‘only’ 300-400 new clips per day, so keeping up to date shouldn’t be too hard. Unless Abi gets sick, obviously. 🙂

Quite seriously, this is a fantastic achievement. The goodwill of a community of people, coupled with a trivially simple tagging tool, achieved something which – realistically – neither speech recognition technology, nor the IT budgets of Hansard and BBC Parliament (combined?) ever could. And it goes without saying, it’s largely down to the MySociety ‘brand’ of charitable activism: if Parliament had asked people to do this, do you think many would? (Not that there should be much anguish in Millbank about this invasion of ‘their’ territory; I bet Parliamentary people will be the ones most grateful for the service.)

About 13% of the video clips were tagged anonymously; my guess is that, like me, many of those were people who were searching for something on TheyWorkForYou, came across an as-yet untagged video clip, and decided to ‘leave a tip’. For me, the magic of the tool was the fact that it made this bit so easy. But that means 87% were tagged by people who went to the trouble of registering – much more than I would have guessed, although admittedly, 7 people were responsible for over 50% of the tagging.

Cameron calls for data standards

Of all the topics I might have expected David Cameron to speak in favour of, standardised data formats were not top of the list. So I’m grateful to Nick Booth for pointing to Cameron’s speech last Friday to the Conservative Councillors’ Association, in which he said:

‘At the moment, local government bodies must provide the public with information about the services they provide, what goes on in council meetings and how councillors have voted on specific issues. But the information isn’t published in a standardised way. It’s impossible for the public, charities or private companies to effectively collate this data, compare and contrast your performance and hold you to account. That’s why the Government relies on expensive and bureaucratic schemes to try and hold local government to account.

‘We will turn that approach on its head. We will require local authorities to publish this information – about the services they provide, council meetings and how councillors vote – online and in a standardised format. That way, it can be collected and used by the public and third party groups.’

A very timely call, especially in the light of discussions last week about Harry’s consultations site (some of which, I’m told, he’s updating manually?!). I particularly like the way Cameron ties this into the tangible benefits for councillors themselves, removing the need for so many performance measurement exercises.

I’ve had some dealings in this field, particularly during my time with National Statistics. And I’m afraid it’s going to be much, much more difficult than Cameron makes it sound. Too many legacy systems, a chaotic approach to statistical geography, and (frankly) too much opposition from statisticians. Very few statisticians appreciate why they do what they do; they just do it. The work takes on an almost monastic purity. They don’t trust mere mortals – media included – to represent it properly. I was one of a management team hired to drive a culture change in that regard. Our success was limited.

Cameron’s speech focuses on TheyWorkForYou as a role model for data reprocessing; but I’m not sure the comparison holds up too well. A database of numbers would be much more difficult and more sensitive than the absolutes of Hansard: who said what (subject to correction, of course), and who voted how. Realistically we need to get to greater standardisation of process first: starting with defining consistent geographic regions, allowing sensible comparisons over time.

And besides, we haven’t done a great job of producing standardised data in RSS format, one of the simplest and most straightforward data standards out there.
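
To give a sense of just how simple RSS is, here is a minimal sketch in Python that reads a tiny RSS 2.0 feed using only the standard library. The feed itself is entirely hypothetical: the council name, the URLs and the item are invented for illustration, and `read_items` is just a helper name of my own, not part of any real system.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed: the council, URLs and item below are invented
# purely for illustration and do not refer to any real service.
FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Council: meeting minutes</title>
    <link>http://council.example/minutes</link>
    <description>Hypothetical feed of council meeting minutes</description>
    <item>
      <title>Planning committee, 3 March</title>
      <link>http://council.example/minutes/1</link>
      <pubDate>Mon, 03 Mar 2008 19:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def read_items(feed_xml):
    """Return each <item> in the feed as a plain dictionary."""
    root = ET.fromstring(feed_xml)
    return [
        {
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "published": item.findtext("pubDate"),
        }
        for item in root.iter("item")
    ]


for entry in read_items(FEED):
    print(entry["title"], entry["link"], entry["published"])
```

Because every feed that follows the standard uses the same handful of element names, the same few lines would work against any publisher's feed, which is rather the point of asking for a standardised format in the first place.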