BBC gets it completely wrong about Whitehouse.gov robots.txt

A number of stories have been hurtling around the web about Whitehouse.gov’s new robots.txt file. On the first day of Obama’s presidency it changed from a massive file to simply this:

User-agent: *
Disallow: /includes/

The robots.txt file tells search engine crawlers which parts of a site they may and may not fetch, and the original was blocking a whole load of pages. Those with a vague knowledge of search marketing leapt on the change as an indication of Obama’s new policy of open government, and it seems that the BBC has eagerly followed suit:

The robots.txt file the Bush administration set up for Whitehouse.gov ran to almost 2377 lines and thereby stopped search engines logging a lot of the data found on the site. On the first day of the Barack administration the robots.txt file shrunk to two lines allowing, for the moment, search sites to index everything it contains.

However, a closer look at the old robots.txt shows that the Bush administration’s version was just blocking text-only versions of pages, leaving the normal version of each page to be indexed. Bush wasn’t hiding anything from the search engines; he was just being a more sensible web administrator and avoiding duplicate content.
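To make the difference concrete, here is a rough sketch using Python's standard `urllib.robotparser` module. The first rule set is the new two-line file quoted above. The second is only an illustration of the old style: the `/news/example/text` path and the `ExampleBot` user agent are made up for this example and are not taken from the real file.

```python
from urllib.robotparser import RobotFileParser

# The new two-line file quoted in this post.
NEW_RULES = """\
User-agent: *
Disallow: /includes/
"""

# Hypothetical old-style rule: the path is invented for illustration,
# it is NOT copied from the real Bush-era robots.txt.
OLD_STYLE_RULES = """\
User-agent: *
Disallow: /news/example/text
"""


def crawl_allowed(robots_txt: str, url: str, agent: str = "ExampleBot") -> bool:
    """Return True if a crawler obeying robots_txt may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    site = "http://www.whitehouse.gov"
    # Only the text-only copy is blocked; the normal page stays crawlable.
    print(crawl_allowed(OLD_STYLE_RULES, site + "/news/example/text/index.html"))
    print(crawl_allowed(OLD_STYLE_RULES, site + "/news/example/index.html"))
    # Under the new file, only /includes/ is off limits.
    print(crawl_allowed(NEW_RULES, site + "/includes/style.css"))
```

Rules are matched as path prefixes, so a rule like the hypothetical one above hides the text-only copy of a page without touching the normal version.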

In fact, Obama’s team have removed a whole load of pages from whitehouse.gov without bothering to redirect them to newer versions of the pages. This is extremely bad practice, as those pages will just disappear from the search engines entirely, and the new versions will probably never reach the search rankings that the old pages held under Bush. People with bookmarks or links to those old pages will now just be presented with an error page.

This is making data less accessible, not more accessible.
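For anyone wondering what "getting the redirects right" looks like, here is a minimal sketch, assuming a simple lookup table of old paths to new ones. Every path in it is hypothetical, and I have no idea how the real site is served; the point is only that a moved page should answer with a permanent (301) redirect rather than an error.

```python
from http.server import BaseHTTPRequestHandler
from typing import Optional, Tuple

# Hypothetical old-to-new mapping; these paths are invented for illustration.
REDIRECTS = {
    "/old/briefing.html": "/briefing-room/",
    "/old/history.html": "/about/history/",
}


def resolve(path: str) -> Tuple[int, Optional[str]]:
    """Return (status, location): 301 plus the new path if known, else 404."""
    target = REDIRECTS.get(path)
    if target is not None:
        return 301, target
    return 404, None


class RedirectingHandler(BaseHTTPRequestHandler):
    """Answers requests for moved pages with a permanent redirect."""

    def do_GET(self) -> None:
        status, location = resolve(self.path)
        if location is not None:
            # 301 tells browsers and crawlers that the page has moved for good.
            self.send_response(status)
            self.send_header("Location", location)
            self.end_headers()
        else:
            self.send_error(status)
```

With a table like this in place at launch, old bookmarks and links keep working, and search engines can pass the old page's standing on to its replacement.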

Now I’m not on a pro-Bush, anti-Obama rant here – I’m the opposite of a Bush fan – but at least the BBC should get its facts right before publishing something, right?


12 Comments

Carl Morris, January 22nd, 2009 at 9:29 pm

Agreed. The old robots.txt is better for search optimisation.

This is such a specialised decision that it probably has nothing to do with who the President is.

Sure, we can talk about the online strategy of the new administration. (If those reading in USA don’t mind a few points from someone in the UK.)

For a start, if it’s so “open” why can’t people leave public-readable comments?

Ditto leave trackback links?

Jamie, January 22nd, 2009 at 9:46 pm

Hi Carl. I 100% agree. I can’t imagine President Bush or Obama signing off the robots.txt! However I find it amusing that the BBC seems to be insinuating they might! :-)

t, January 22nd, 2009 at 10:04 pm

the bbc should get things right? good luck with that.

Jamie, January 22nd, 2009 at 10:08 pm

“the bbc should get things right? good luck with that.”

True. It’s not the first time I’ve had my gripes: Credit Card Fraud Melodrama and Technobabble from the BBC.

Online Internet Faxing, January 23rd, 2009 at 12:34 am

Great post, the BBC definitely needs a little bit more SEO understanding before they start making statements. And considering it’s a brand new website, give it some time and I imagine we’ll see additional lines added to the robots.txt, as well as missing redirects appearing. Cheers.

Jamie, January 23rd, 2009 at 1:41 am

“And considering it’s a brand new website, give it some time and I imagine we’ll see additional lines added to the robots.txt, as well as missing redirects appearing.”

Not to labour the point, but just as a “best practice” recommendation, if you’re going to spend loads of money on a new version of a site, definitely get the redirects right on launch. The redirects may be added later, but it’s likely that it will be too late, and at least for the short term those pages are going to bomb. It’s clear they haven’t put the site together overnight; it’s obviously been planned for a long time.

But it’s unfair to just pick on the Whitehouse.gov site for this, as even the biggest companies (or maybe especially the biggest companies) have frequently made the same mistake. And to most people it is completely insignificant compared to the other big changes that have been implemented. My beef is primarily with BBC Tech’s unfounded and inaccurate conclusions.

reg4c, January 23rd, 2009 at 3:03 am

It says “a lot of the data found”. Data may or may not be useful. The BBC did not say anything about blocking the content, or articles, or useful information; they just said DATA.

B, January 24th, 2009 at 3:25 am

This is an example of how the press, even the BBC, is fawning over Obama. Since the media is generally pretty tech-stupid, it just follows its knee-jerk reaction (that was Clinton’s phrase) of praising liberals and putting down Bush at every opportunity.

[...] this is a very nitpicky complaint and I’ve done enough serious BBC bashing lately. All I’ll say is that it is an honour and a privilege to correct the consistently [...]

mjc, February 9th, 2009 at 1:04 am

The file now has more lines… and viewing one of the disallows lets you see a site still being built… http://www.whitehouse.gov/omb/

User-agent: *
Disallow: /includes/
Disallow: /search/
Disallow: /omb/search/

Jamie, February 9th, 2009 at 11:59 am

Well spotted! :-)

