I recently changed one of my robots.txt files, pruning duplicate content pages so that more of the site's internal PageRank would flow to the higher quality and better earning pages. In the process I forgot that one of the most well linked to pages on the site had a URL similar to the noisy pages I was pruning. About a week ago the site's search traffic halved, right after Google became unable to crawl and index that powerful URL. I fixed the error pretty quickly, but the site now has hundreds of pages stuck in Google's supplemental index, and I am out about $10,000 in profit for that one line of code!
Both Google and Yahoo! support wildcards, but you really have to be careful when changing a robots.txt file, because a single wildcard directive can also block files you never intended to block from being indexed in Google. Unless you are thinking about that in advance, it is easy to make a mistake.
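The mistake above comes down to how robots.txt patterns match. Here is a minimal sketch of Google-style matching in Python; the patterns and paths are made-up examples, and real crawlers apply additional rules (Allow directives, longest-match precedence), so treat this as an illustration rather than a full implementation:

```python
import re

def robots_pattern_to_regex(pattern):
    """Translate a Google-style robots.txt path pattern into a regex string.
    '*' matches any run of characters; a trailing '$' anchors the end of the URL."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return body + ("$" if anchored else "")

def is_blocked(disallow_pattern, url_path):
    """Robots.txt rules match from the start of the path, so a plain
    Disallow prefix catches every URL that begins with it."""
    return re.match(robots_pattern_to_regex(disallow_pattern), url_path) is not None

# A rule aimed at pruning noisy pages can catch a valuable page
# that merely shares the same URL prefix (all paths here are hypothetical):
print(is_blocked("/page", "/pagerank-tips"))          # True: prefix match
print(is_blocked("/*?sort=", "/widgets?sort=price"))  # True: wildcard match
print(is_blocked("/*.html$", "/guide.html"))          # True: '$' anchors the end
print(is_blocked("/*.html$", "/guide.html?x=1"))      # False: no longer ends in .html
```

The first case is exactly the trap I fell into: a short Disallow prefix silently swallows every URL that starts with it.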
If you are trying to prune duplicate content for Google but are fine with those pages ranking in other search engines, you may want to make those directives specific to Googlebot. When you create a directive group for a specific robot, that bot ignores your general robots directives in favor of the more specific directives you created for it.
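A hypothetical robots.txt along these lines (the /archives/ and ?sort= paths are made-up examples) would look like this. Note that because Googlebot finds a group addressed to it by name, it follows only that group, so any general directive that should still apply to Googlebot has to be repeated there:

```
# All other crawlers follow this group
User-agent: *
Disallow: /archives/

# Googlebot ignores the group above and follows only this one
User-agent: Googlebot
Disallow: /archives/
Disallow: /*?sort=
```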
Google also offers a free robots.txt testing tool, which shows you how its robots will respond to your robots.txt file and notifies you of any URLs that are blocked.
You can use Xenu Link Sleuth to crawl your site and generate a list of its URLs. Paste that URL list into the Google robots.txt testing tool (currently in 5,000 character chunks, an arbitrary limit I am sure they will eventually lift).
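If your URL list is long, splitting it at that limit by hand is tedious. Here is a rough sketch of batching a list of URLs into chunks that stay under the 5,000 character ceiling (the limit value and helper name are my own, not part of any tool):

```python
def chunk_urls(urls, max_chars=5000):
    """Group URLs into newline-joined batches, each at most max_chars long,
    so every batch can be pasted into the robots.txt testing tool in one go."""
    batches, current, current_len = [], [], 0
    for url in urls:
        # +1 accounts for the newline separator before this URL
        added = len(url) + (1 if current else 0)
        if current and current_len + added > max_chars:
            batches.append("\n".join(current))
            current, current_len = [], 0
            added = len(url)
        current.append(url)
        current_len += added
    if current:
        batches.append("\n".join(current))
    return batches

urls = ["http://example.com/page-%d" % i for i in range(1000)]
batches = chunk_urls(urls)
print(all(len(b) <= 5000 for b in batches))  # True
```

Joining the batches back together with newlines reproduces the original list, so nothing is lost in the split.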
Inside the webmaster console Google will also show you which pages are currently blocked by your robots.txt file, and let you see when Google tried to crawl a page and noticed it was blocked. Google also shows you which pages return 404 errors, which is a good way to spot internal broken links or external links pointing at pages that no longer exist.