Google Robots.txt Wildcard
I'm not sure if I have seen this mentioned before, but Dan Thies noticed that Googlebot supports wildcards in robots.txt:
Google's URL removal page contains a little bit of handy information that's not found on their webmaster info pages where it should be. Google supports the use of 'wildcards' in robots.txt files. This isn't part of the original 1994 robots.txt protocol, and as far as I know, is not supported by other search engines. To make it work, you need to add a separate section for Googlebot in your robots.txt file. An example:
User-agent: Googlebot
Disallow: /*sort=

This would stop Googlebot from reading any URL that included the string &sort= no matter where that string occurs in the URL.
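If you want to check which URLs a wildcard rule like the one above would catch before deploying it, the matching can be approximated in a few lines. This is a minimal Python sketch, not Google's code. It assumes Googlebot's interpretation that `*` stands for any run of characters (including none) and that a Disallow rule is matched against the beginning of the URL path plus query string. The function names `rule_to_regex` and `is_disallowed` are hypothetical. Crawlers that follow only the original 1994 protocol would treat `*` as a literal character instead.

```python
import re
from typing import Iterable


def rule_to_regex(rule_path: str):
    """Compile a Disallow path containing '*' wildcards into a regex.

    Assumption: '*' matches any run of characters; everything else is literal.
    """
    # Escape each literal piece, then rejoin the pieces with ".*" for each '*'.
    parts = (re.escape(part) for part in rule_path.split("*"))
    return re.compile(".*".join(parts))


def is_disallowed(url_path: str, disallow_rules: Iterable[str]) -> bool:
    """Return True if any Disallow rule matches the start of url_path."""
    for rule in disallow_rules:
        if not rule:
            # An empty "Disallow:" line blocks nothing.
            continue
        # re.match anchors at the start, giving robots.txt-style prefix matching.
        if rule_to_regex(rule).match(url_path):
            return True
    return False
```

For example, `is_disallowed("/products?page=2&sort=price", ["/*sort="])` is true, while `is_disallowed("/products?page=2", ["/*sort="])` is false, because the string `sort=` never appears in the second URL.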
Good information to know if your site has recently suffered in Google due to duplicate content issues.
Dan also recently started an SEO coaching blog on his SEO Research Labs site.

Comments
It is supported by Yahoo! Slurp too!
Tried to submit this for a client through the removal site:
User-agent: Googlebot
Disallow: /*PHPSESSID=
But I get the following message: URLs cannot have wild cards in them (e.g. "*"). The following line contains a wild card: DISALLOW /*PHPSESSID=
So?
I am also getting this message from Google when I try to remove:
URLs cannot have wild cards in them (e.g. "*"). The following line contains a wild card:
DISALLOW /thread*-0.html
It was around for years. See http://www.webmasterworld.com/forum93/106.htm
I think that made its rounds before I got on the scene.
MSNbot supports wildcards in robots.txt too.
In 2005 (when I wrote this post) I don't think Microsoft did.