However, there are a number of issues with this type of caching. Firstly, many authors publish a number of "test" posts after creating a blog. If they view those posts in Google Reader, the posts are cached and become difficult to delete. Secondly, there is no indication when browsing Google Reader that a post may have been deleted and that the author no longer wishes it to be public. Thirdly, there is no robots.txt mechanism to restrict caching. Fourthly, Google does not delete posts from the cache on request.
In Google's own words: "Reader caches all entries in your feed as your feed most likely only contains your most recent entries. Unfortunately, there isn't a way for Reader to tell which items have been deliberately removed from your site as opposed to having just fallen off the end of your feed. You can create a blank item with the same GUID tag as the original item to at least remove the content from Reader and other feed readers. Contact your blogging software provider for more help with this issue."
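To make Google's suggestion concrete, here is a minimal sketch of what a "blank item with the same GUID" might look like. It assumes an RSS 2.0 style feed, where the identifier lives in a `<guid>` element (an Atom feed would carry it in `<id>` instead). The helper name `blank_item` and the sample GUID are hypothetical, chosen only for illustration.

```ruby
# Characters that must be escaped inside XML text content.
XML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;' }.freeze

# Build an RSS <item> that keeps the original GUID but has an empty title and
# description. The element layout is an assumption based on RSS 2.0.
def blank_item(guid)
  escaped = guid.gsub(/[&<>]/, XML_ESCAPES)
  <<~XML
    <item>
      <guid isPermaLink="false">#{escaped}</guid>
      <title></title>
      <description></description>
    </item>
  XML
end

# Hypothetical GUID, for illustration only.
puts blank_item('tag:blogger.com,1999:blog-12345.post-67890')
```

Because the GUID is unchanged, a feed reader that keys its cache on the GUID should treat this as an update to the existing entry rather than as a new post.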
So, how do you find the GUIDs of your deleted posts? Assuming you have a Google Account and an account with Blogger, the following Ruby script will output the GUIDs of your deleted posts:
#!/usr/bin/env ruby

require 'rubygems'
require 'blogger'
require 'mechanize'
require 'set'

STDOUT.sync = true

gmail_url = 'http://www.gmail.com'
gmail_username = 'username'
gmail_password = 'password'
blogger_blog_id = '12345'
blogger_blog_name = 'blog_name'
blogger_id = '1234567890'
blogger_username = gmail_username
blogger_password = gmail_password

# log in to your gmail account
agent = Mechanize.new
page = agent.get gmail_url
form = page.forms.first
form.Email = gmail_username + '@gmail.com'
form.Passwd = gmail_password
page = agent.submit form

# get the list of ids as per google reader
reader_ids = Set.new
page = agent.get("http://www.google.com/reader/atom/feed/" +
                 "http://#{blogger_blog_name}.blogspot.com/feeds/posts/default?r=n&n=100")
# scan returns one array of captures per match, so flatten to get plain strings
page.body.scan(/https:\/\/blogger\.googleusercontent\.com\/tracker\/\d*-(\d*)\?/).flatten.each do |id|
  reader_ids << id
end

# get the list of ids as per blogger
account = Blogger::Account.new(blogger_id, blogger_username, blogger_password)
if account.authenticated?
  blog = account.blog_for_id(blogger_blog_id)
  blog.posts.each do |post|
    # compare as strings, since the ids scraped from reader are strings
    reader_ids.delete(post.id.to_s)
  end
else
  # without the blogger list, every cached post would be reported as deleted
  abort 'Blogger authentication failed'
end

# the following posts have been deleted
reader_ids.each do |id|
  puts id
end
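The core of the script is a set difference: the IDs Reader has cached, minus the IDs Blogger still reports. The sketch below shows that step on its own, without any network access. The sample feed body and the list of live IDs are made up for illustration, and the helper names `cached_post_ids` and `deleted_post_ids` are hypothetical. The tracker URL shape is taken from the regular expression in the script above.

```ruby
require 'set'

# Hypothetical sample of what the Reader feed body might contain.
sample_body = <<~BODY
  <img src="https://blogger.googleusercontent.com/tracker/12345-111?l=example"/>
  <img src="https://blogger.googleusercontent.com/tracker/12345-222?l=example"/>
  <img src="https://blogger.googleusercontent.com/tracker/12345-333?l=example"/>
BODY

# Extract the post IDs found in tracker URLs within the feed body.
def cached_post_ids(body)
  pattern = %r{https://blogger\.googleusercontent\.com/tracker/\d+-(\d+)\?}
  # scan returns one array of captures per match; take the single capture
  Set.new(body.scan(pattern).map(&:first))
end

# IDs that are cached but no longer live belong to deleted posts.
def deleted_post_ids(cached_ids, live_ids)
  cached_ids - live_ids.map(&:to_s).to_set
end

live_ids = [111, 333] # stand-in for the IDs Blogger still reports
deleted = deleted_post_ids(cached_post_ids(sample_body), live_ids)
puts deleted.to_a
```

Converting both sides to strings before subtracting avoids a silent mismatch when one source yields integers and the other yields strings.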
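Next, you need to edit the contents of each deleted post (for example, blanking it, as Google suggests) at http://www.blogger.com/post-edit.g?blogID=12345&postID=67890, replacing 12345 with your blog ID and 67890 with one of the IDs printed by the script. You may also have to bring the post dates forward so that Google Reader will notice the changes.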
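If the script prints more than a handful of IDs, building the edit links by hand gets tedious. The following is a small sketch that prints one edit URL per ID, using the URL pattern given above. The helper name `edit_url` and the sample IDs are hypothetical placeholders.

```ruby
# Build the Blogger edit URL for one post, following the pattern in the text.
def edit_url(blog_id, post_id)
  "http://www.blogger.com/post-edit.g?blogID=#{blog_id}&postID=#{post_id}"
end

blog_id = '12345'             # placeholder blog ID, as in the script
deleted_ids = %w[67890 67891] # hypothetical IDs printed by the script

deleted_ids.each { |id| puts edit_url(blog_id, id) }
```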