Tuesday, March 16, 2010

Unblogging

Google Reader is more than just a feed reader. It's a cache and an archive: http://www.google.com/reader/atom/feed/FEED_URL?r=n&n=X displays the last X posts from FEED_URL even if FEED_URL itself no longer lists all X of them. If the author of FEED_URL deletes an old post, it is not removed from the cache. This is nothing unusual; caching and archiving are part and parcel of the Internet.

However, this type of caching raises a number of issues. Firstly, many authors publish a few "test" posts after creating a blog; if they view those posts in Google Reader, the posts are cached and become difficult to delete. Secondly, nothing in Google Reader indicates that a post may have been deleted and that its author no longer wishes it to be public. Thirdly, there is no robots.txt mechanism to restrict caching. Fourthly, Google does not delete posts from the cache on request.

In Google's own words: "Reader caches all entries in your feed as your feed most likely only contains your most recent entries. Unfortunately, there isn't a way for Reader to tell which items have been deliberately removed from your site as opposed to having just fallen off the end of your feed. You can create a blank item with the same GUID tag as the original item to at least remove the content from Reader and other feed readers. Contact your blogging software provider for more help with this issue."
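Outside Blogger, the workaround Google describes amounts to republishing an entry whose identifier matches the deleted one. In Atom that identifier is the entry's `<id>` element (RSS 2.0 uses `<guid>`). Here is a minimal Ruby sketch, assuming you can edit the Atom feed XML yourself. The helper `blank_entry` and the sample GUID are hypothetical, and there is no guarantee that any particular reader will honour the replacement:

```ruby
require "rexml/document"
require "time"

# Build a blank Atom <entry> that reuses an existing GUID (the Atom <id>),
# so a feed reader that keys its cache on the id can overwrite the old copy.
# blank_entry is a hypothetical helper, not part of any library.
def blank_entry(guid, updated = Time.now.utc)
  doc = REXML::Document.new
  entry = doc.add_element("entry", "xmlns" => "http://www.w3.org/2005/Atom")
  entry.add_element("id").text = guid                  # must match the original exactly
  entry.add_element("title").text = ""                 # empty title
  entry.add_element("updated").text = updated.iso8601  # newer timestamp than the original
  entry.add_element("content", "type" => "text").text = ""  # empty body
  doc
end

out = +""
blank_entry("tag:example.com,2010:post-1").write(out)
puts out
```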

So, how do you find the GUIDs of your deleted posts? Assuming you have a Google Account and an account with Blogger, the following Ruby script will output the GUIDs of your deleted posts:
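The following is a minimal Ruby sketch of the comparison at the heart of such a script. It is not the original script. It takes two Atom documents that you have already saved: the cached copy from http://www.google.com/reader/atom/feed/FEED_URL?r=n&n=X and a copy of your live feed. It then reports every GUID that appears only in the cache.

The sketch makes two assumptions. First, it assumes that each entry's `<id>` element in both documents holds the post's original GUID. If the cached copy stores the GUID elsewhere, `atom_ids` is the place to adapt. Second, it assumes that the live copy lists every remaining post. Otherwise, as Google's note warns, posts that merely fell off the end of the feed would be reported as deleted. The names `atom_ids` and `deleted_guids` are hypothetical.

```ruby
require "rexml/document"
require "set"

ATOM_NS = "http://www.w3.org/2005/Atom"

# Collect the entry ids (GUIDs) from an Atom document given as a string.
def atom_ids(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, "//atom:entry/atom:id", "atom" => ATOM_NS)
              .map { |e| e.text.to_s.strip }
              .to_set
end

# GUIDs present in the cached copy but missing from the live feed,
# i.e. candidates for deleted posts (assumes the live copy is complete).
def deleted_guids(cached_xml, live_xml)
  (atom_ids(cached_xml) - atom_ids(live_xml)).to_a.sort
end

# Usage with two tiny hand-written feeds:
cached = <<~XML
  <feed xmlns="http://www.w3.org/2005/Atom">
    <entry><id>tag:blogger.com,1999:blog-12345.post-111</id></entry>
    <entry><id>tag:blogger.com,1999:blog-12345.post-222</id></entry>
  </feed>
XML
live = <<~XML
  <feed xmlns="http://www.w3.org/2005/Atom">
    <entry><id>tag:blogger.com,1999:blog-12345.post-222</id></entry>
  </feed>
XML
puts deleted_guids(cached, live)
```

Using a set difference keeps the check independent of entry order, which differs between Reader's newest-first listing and a blog's own feed.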



Next, edit each of those posts at http://www.blogger.com/post-edit.g?blogID=12345&postID=67890 (substituting your own blogID and postID) and clear its contents. You may also have to move the post dates forward so that Google Reader notices the changes.
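Blogger's Atom GUIDs appear to take the form `tag:blogger.com,1999:blog-BLOGID.post-POSTID`, which carries both numbers that the edit URL needs. Treating that format as an assumption, here is a small Ruby sketch that turns a GUID from the previous step into the edit link. The helper `edit_url` is hypothetical:

```ruby
# Assumed Blogger GUID format: tag:blogger.com,1999:blog-BLOGID.post-POSTID
BLOGGER_GUID = /\Atag:blogger\.com,1999:blog-(\d+)\.post-(\d+)\z/

# Build the post-edit URL for a Blogger GUID;
# returns nil if the GUID does not match the assumed format.
def edit_url(guid)
  m = BLOGGER_GUID.match(guid.strip)
  return nil unless m
  "http://www.blogger.com/post-edit.g?blogID=#{m[1]}&postID=#{m[2]}"
end

puts edit_url("tag:blogger.com,1999:blog-12345.post-67890")
# prints http://www.blogger.com/post-edit.g?blogID=12345&postID=67890
```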

8 comments:

Stephanie said...

Hi,
Thank you for your post. I have used this method for some of my RSS feeds but have found it only works for the more recent ones. Older ones won't change for me. Is this normal or am I doing something wrong?

Martin Harrigan said...

Hi Stephanie, if you send me a link to one of the RSS feeds that it did not work for I will take a look.

Martin Harrigan said...

Hi Erin, I have no idea. You would need to post an entry to LJ with the same GUID as the post you wish to overwrite, but I don't know if that is possible. Sorry.

Anonymous said...

Hi Martin,

This is very interesting. Is there a way of locating GUIDs of blog posts written (and deleted) by someone else? Might be handy for accessing sites that have been deleted but not archived elsewhere online.

Thanks!

Martin Harrigan said...

Hi The Other Greek Chorus,

I'm not sure, but I think that if you have the URL of the deleted feed and it was ever cached by Google Reader, you can see the deleted posts by subscribing to the feed and then retrieve their GUIDs using the method above.

Martin.

Holly said...

Hello! Would it be possible to give more detailed, step-by-step instructions on how you successfully cleared the content of an old, deleted post in Reader? THANK YOU!
-Holly

Martin Harrigan said...

Hi Holly,

The steps at http://crsouza.blogspot.com/2010/03/partial-least-squares-analysis-and.html might help.

Regards,
Martin.

Maria said...

I've tried the guide at crsouza but have a problem with just one postID: the most important one, which does not work when I follow the guide. All the others work. I can't understand why. Does anyone know?