Tuesday, March 16, 2010

Unblogging

Google Reader is more than just a feed reader. It is also a cache and an archive: http://www.google.com/reader/atom/feed/FEED_URL?r=n&n=X displays the last X posts from FEED_URL, even if FEED_URL itself no longer contains all of them. If the author of FEED_URL deletes an old post, it is not removed from the cache. There is nothing strange about this; caching and archiving are part and parcel of the Internet.
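
As a small illustration, the cache URL above could be built in Ruby roughly like this. The helper name reader_cache_url is made up for this post, and percent-encoding FEED_URL before placing it in the path is an assumption rather than something Google documents here:

```ruby
require 'uri'

# Base of the Google Reader Atom endpoint quoted in this post.
READER_BASE = 'http://www.google.com/reader/atom/feed/'

# Hypothetical helper: builds the URL that shows the last `count`
# cached posts for `feed_url`.
# Assumption: the feed URL is percent-encoded before being appended.
def reader_cache_url(feed_url, count)
  unless count.is_a?(Integer) && count > 0
    raise ArgumentError, 'count must be a positive integer'
  end
  "#{READER_BASE}#{URI.encode_www_form_component(feed_url)}?r=n&n=#{count}"
end
```

Calling reader_cache_url with your own feed address and, say, 50 gives a URL you can open in a browser while signed in to your Google Account.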

However, this type of caching raises a number of issues. Firstly, many authors publish a few "test" posts after creating a blog. Once the authors check those posts in Google Reader, the posts are cached and become difficult to delete. Secondly, there is no indication when browsing Google Reader that a post may have been deleted and that the author no longer wishes the post to be public. Thirdly, there is no robots.txt-style mechanism for restricting caching. Fourthly, Google does not delete posts from the cache on request.

In Google's own words: "Reader caches all entries in your feed as your feed most likely only contains your most recent entries. Unfortunately, there isn't a way for Reader to tell which items have been deliberately removed from your site as opposed to having just fallen off the end of your feed. You can create a blank item with the same GUID tag as the original item to at least remove the content from Reader and other feed readers. Contact your blogging software provider for more help with this issue."

So, how do you find the GUIDs of your deleted posts? Assuming you have a Google Account and an account with Blogger, the following Ruby script will output the GUIDs of your deleted posts:

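A minimal sketch of the core idea, not a complete tool: compare the entry IDs in the cached Reader feed with those in the blog's live feed and keep the ones that appear only in the cache. It assumes you have already downloaded both Atom documents as strings (fetching the Reader feed needs a signed-in session, which is left out), that the live feed lists all of your current posts rather than just the most recent ones, and that Reader carries the original GUID in a gr:original-id attribute on each id element. The function names are made up for this example.

```ruby
require 'rexml/document'
require 'set'

# Collects the GUIDs of all entries in an Atom document.
# Assumption: when Reader rewrites an entry's <id>, it keeps the
# original GUID in a gr:original-id attribute; otherwise the text
# of the <id> element is used.
def entry_guids(atom_xml)
  doc = REXML::Document.new(atom_xml)
  guids = Set.new
  doc.root.each_element do |entry|
    next unless entry.name == 'entry'
    entry.each_element do |child|
      next unless child.name == 'id'
      original = child.attributes['gr:original-id']
      text = child.text.to_s.strip
      guids << ((original.nil? || original.empty?) ? text : original)
    end
  end
  guids
end

# GUIDs present in the cached feed but missing from the live feed.
def deleted_guids(cached_xml, live_xml)
  (entry_guids(cached_xml) - entry_guids(live_xml)).to_a.sort
end
```

With the two feeds saved to disk, puts deleted_guids(File.read('cached.xml'), File.read('live.xml')) would print one GUID per line. Any post that has merely fallen off the end of the live feed will show up as well, so check the list before acting on it.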


Next, edit the contents of each post at http://www.blogger.com/post-edit.g?blogID=12345&postID=67890, substituting your own blog and post IDs. You may also have to bring the post dates forward so that Google Reader notices the changes.
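
To save some typing, the edit URL can be derived from each GUID. The sketch below assumes Blogger GUIDs have the form tag:blogger.com,1999:blog-BLOGID.post-POSTID; if yours look different, the pattern needs adjusting. The helper name edit_url_for is made up for this example.

```ruby
# Assumed shape of a Blogger GUID:
#   tag:blogger.com,1999:blog-<blogID>.post-<postID>
BLOGGER_GUID = /\Atag:blogger\.com,1999:blog-(\d+)\.post-(\d+)\z/

# Hypothetical helper: turns a Blogger GUID into the post-edit URL
# mentioned above. Returns nil when the GUID does not match the
# assumed pattern.
def edit_url_for(guid)
  match = BLOGGER_GUID.match(guid)
  return nil unless match
  "http://www.blogger.com/post-edit.g?blogID=#{match[1]}&postID=#{match[2]}"
end
```

For example, edit_url_for('tag:blogger.com,1999:blog-12345.post-67890') yields the URL shown above.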