Deep and shallow RSS import

One of my objectives for the major government website project I’m currently working on was wide-ranging support for RSS. I’m fairly sure I implemented Whitehall’s first ever RSS feed (2002? 2003? can’t remember), so I have a reputation to maintain here. 🙂
I thought it might be worth recording a couple of our ideas here on the blog. This time – bringing content into a website via RSS.
RSS feeds come in two distinct types: ‘full text’ and ‘summary’. The latter tend to be more popular with commercial publishers, encouraging you to click through to a ‘proper’ page impression on the source website. ‘Full text’, on the other hand, gives you the whole article without the extra hassle of the clickthrough. I’m tending to favour doing both within the same feed, unless there’s a strong commercial imperative not to.
Just as there are two types of feed, we’re planning two types of RSS consumption (i.e. where we take content into our site from someone else’s feed).
‘Deep’ integration will take the items from a given feed, and turn them into self-standing items within our CMS. Having gone through the normal workflow process, they will appear on our site, under our masthead, with page addresses in our domain. They will have become ‘our’ pages (albeit with a sourcing credit).
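To make the ‘deep’ idea a bit more concrete, here is a minimal sketch in Python of the first step: turning feed items into draft records that a CMS workflow could then pick up. Everything specific in it is an assumption for illustration only: the sample feed, the `items_to_drafts` function, the draft field names and the `pending_review` status are all hypothetical, not how our CMS actually works. It prefers the full text from the widely used `content:encoded` extension and falls back to the summary when that is missing.

```python
import xml.etree.ElementTree as ET

# Namespace of the RSS "content" module, which provides content:encoded.
CONTENT_NS = {"content": "http://purl.org/rss/1.0/modules/content/"}

# Hypothetical sample feed: one item with full text, one with a summary only.
SAMPLE_FEED = """<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Example partner</title>
    <link>https://example.org/</link>
    <description>Sample feed</description>
    <item>
      <title>Full text story</title>
      <link>https://example.org/news/1</link>
      <description>Summary only.</description>
      <content:encoded><![CDATA[<p>The whole article.</p>]]></content:encoded>
    </item>
    <item>
      <title>Summary-only story</title>
      <link>https://example.org/news/2</link>
      <description>Just a short summary.</description>
    </item>
  </channel>
</rss>"""


def items_to_drafts(feed_xml):
    """Turn each feed item into a draft record (a plain dict) awaiting review."""
    channel = ET.fromstring(feed_xml).find("channel")
    source_name = channel.findtext("title", default="")
    drafts = []
    for item in channel.findall("item"):
        # Prefer the full text; fall back to the summary if there is none.
        body = item.findtext("content:encoded", namespaces=CONTENT_NS)
        if not body:
            body = item.findtext("description", default="")
        drafts.append({
            "title": item.findtext("title", default="(untitled)"),
            "body": body,
            "source_name": source_name,                    # for the sourcing credit
            "source_url": item.findtext("link", default=""),
            "status": "pending_review",                    # hypothetical workflow state
        })
    return drafts


for draft in items_to_drafts(SAMPLE_FEED):
    print(draft["status"], "|", draft["title"], "|", draft["source_name"])
```

The point of the `status` field is that nothing goes live straight from the feed: each draft still has to pass through the normal workflow before it appears under our masthead.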
‘Shallow’ integration will turn a feed into a simple list of links back to the source website. The necessary coding should be pretty minimal: just taking the titles, descriptions and links, and turning them into a list of bullet-points (or however you choose to present them). It’s the sort of thing you might expect to see in a page margin or sidebar.
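Just to show how little code ‘shallow’ needs, here is a rough sketch in Python using only the standard library. The sample feed and the `feed_to_link_list` function are made up for illustration; a real version would fetch the feed over HTTP and cache the result, which I have left out.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical sample feed standing in for a partner site's RSS 2.0 output.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example partner news</title>
    <link>https://example.org/</link>
    <description>Sample feed</description>
    <item>
      <title>First headline</title>
      <link>https://example.org/news/1</link>
      <description>Short summary of the first story.</description>
    </item>
    <item>
      <title>Second &amp; final headline</title>
      <link>https://example.org/news/2</link>
      <description>Short summary of the second story.</description>
    </item>
  </channel>
</rss>"""


def feed_to_link_list(feed_xml, limit=5):
    """Return an HTML <ul> of links back to the source site, one per feed item."""
    channel = ET.fromstring(feed_xml).find("channel")
    rows = []
    for item in channel.findall("item")[:limit]:
        # Escape everything: the feed comes from someone else's site.
        title = html.escape(item.findtext("title", default="(untitled)"))
        link = html.escape(item.findtext("link", default="#"), quote=True)
        summary = html.escape(item.findtext("description", default=""))
        rows.append(f'  <li><a href="{link}">{title}</a> {summary}</li>')
    return "<ul>\n" + "\n".join(rows) + "\n</ul>"


print(feed_to_link_list(SAMPLE_FEED))
```

The `limit` argument is there because a sidebar usually only has room for the latest handful of headlines.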
Deep integration will probably work best with full text feeds, coming from specific partner websites within the Department’s area of responsibility or influence. (That isn’t to say we won’t consume summary feeds deeply; but they will make for rather short pages.) I suspect we will make more use of shallow integration: as well as being less effort, it maintains a certain distance which might be seen as editorially beneficial.
I’m hoping that a combination of the two approaches will help bridge the inevitable divide between the new site and a couple of specific applications which won’t be migrated in Phase One. As long as we can get the legacy systems to produce RSS, we can bring their content – even if it’s just the latest ‘headlines’ – into the new site context. It’ll be a huge improvement on a single static link to the legacy area’s homepage.
Why RSS? Why not XML more generally? Quite simply, because RSS is a rigidly defined format, which has reached critical mass. We can point people to the RSS specification, and tell them to generate feeds which comply with the spec. Otherwise, we would need to come up with some kind of GUI-led XSLT routine – and XSLT is tricky enough at the best of times.
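Having a spec to point at also means a feed can be checked mechanically before we agree to consume it. The sketch below, again in Python, covers only a few of the basics from the RSS 2.0 specification: an `rss` root element, a `channel` with `title`, `link` and `description`, and items that carry at least a title or a description. The `check_feed` function is hypothetical and is nowhere near a full validator.

```python
import xml.etree.ElementTree as ET

# Elements that RSS 2.0 requires inside <channel>.
REQUIRED_CHANNEL_ELEMENTS = ("title", "link", "description")


def check_feed(feed_xml):
    """Return a list of problems found; an empty list means these basic checks passed."""
    try:
        root = ET.fromstring(feed_xml)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    if root.tag != "rss":
        return [f"root element is <{root.tag}>, expected <rss>"]
    channel = root.find("channel")
    if channel is None:
        return ["missing <channel> element"]
    problems = []
    for name in REQUIRED_CHANNEL_ELEMENTS:
        if not channel.findtext(name):
            problems.append(f"channel is missing <{name}>")
    for position, item in enumerate(channel.findall("item"), start=1):
        # RSS 2.0 asks for at least one of title or description on each item.
        if not item.findtext("title") and not item.findtext("description"):
            problems.append(f"item {position} has neither <title> nor <description>")
    return problems


# Hypothetical feed from a legacy system that forgot the channel link.
LEGACY_FEED = """<rss version="2.0">
  <channel>
    <title>Legacy application</title>
    <description>Latest headlines</description>
    <item><title>Only a headline</title></item>
  </channel>
</rss>"""

for problem in check_feed(LEGACY_FEED):
    print(problem)
```

Returning a list of plain-English problems, instead of a simple pass or fail, makes it easier to send something useful back to whoever looks after the legacy system.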
The implications for ‘joined up government’ are potentially huge. Here’s hoping it delivers on the promise. 😉