How to Handle Duplicate Content
Here is a fun webmaster help video from March 10th of 2010, answering the following question:
"If Google crawls 1,000 pages/day, Googlebot crawling many dupe content pages may slow down indexing of a large site. In that scenario, do you recommend blocking dupes using robots.txt or is using META ROBOTS NOINDEX,NOFOLLOW a better alternative?"
The answer kinda jumps around a bit, but here is a quote:
I believe if you were to talk to our crawl and index team, they would normally say "look, let us crawl all the content, we'll figure out what parts of the site are dupe (so which sub-tree are dupes) and we'll combine that together."
Whereas if you block something with robots.txt we can't ever crawl it, so we can't ever see that it's a dupe. And then you can have the full page coming up, and then sometimes you'll see these uncrawled URLs where we saw the URL but we weren't able to crawl them and see that it's a dupe.
I would really try to let Google crawl the pages & see if we can figure out the dupes on our own.
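To make the difference between the two options in the question concrete, here is a minimal Python sketch. It is only an illustration, not how Googlebot actually works: the robots.txt rules, the /print/ path, the example.com URLs & the helper names (can_crawl, has_noindex) are all made up for this example. The point is that a robots.txt block stops the fetch entirely, while a META ROBOTS tag can only be seen after the page has been fetched.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks a sub-tree of duplicate pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
"""


def can_crawl(url, user_agent="Googlebot"):
    """Return True if the hypothetical robots.txt allows fetching the URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            for directive in content.split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def has_noindex(html):
    """Return True if the page carries a robots noindex directive.

    This can only be answered once the page body has been downloaded.
    """
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    # Blocked by robots.txt: the crawler never gets to read the page,
    # so it cannot tell whether the content is a duplicate.
    print(can_crawl("https://example.com/print/widget"))
    # Allowed: the page is fetched, and only then is the meta tag visible.
    page = '<html><head><meta name="robots" content="noindex,nofollow"></head></html>'
    print(can_crawl("https://example.com/articles/widget"), has_noindex(page))
```

In other words, the robots.txt route hides the page before anyone can compare it to anything, which is exactly the "uncrawled URLs" problem described in the quote.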
Trust in GoogleBot
The key point here is that before you consider discarding any of your waste you should give GoogleBot a chance to see if they can just figure it out on their end. Then, without updating said advice, Google rolled out the Panda update & torched tens of thousands of webmasters for following what was up to then a Google best practice. Only after months of significant pain did Google formally suggest on their blog that you should now block them from indexing such low value pages.
Matt's video also suggested some of the other workaround options available to webmasters (like re-architecting their site or using parameter handling in Webmaster Tools), but made it sound like Google getting it right by default was anything but an anomaly. What such advice didn't take into account was the future.
What Does a Search Engineer Do?
The problem with Google is that no matter what they trust, it gets abused. Which is why they keep trying to fold more signals into search & why they are willing to make drastic changes that often seem both arbitrary & unjust.
Search engineers are well skilled at public relations. A big part of what search engineers do is managing the market through FUD. If you can get someone else to do your work for you for free then that is way more profitable than trying to sort everything out on your end.
Search engineers are great at writing code. A lot of what the search engineers do is reactionary. Some things get out of control and are so obvious that FUD won't work, so they need to stomp on them with new algorithms. Most search engine signals are created through tracking people, so they usually follow people. Even when it seems like they are trying to change the game drastically, a lot of that data still comes from following people.
What to Do as an SEO?
The ignorant SEO waits until they are told by Google to do something & starts following "best practices" after most of the potential profits have been commoditized, both by algorithmic changes & a market that has become less receptive to a marketing approach which has since lost its novelty.
The *really* ignorant SEO only listens to official Google advice & trusts some of the older advice even after it has become both stale & inaccurate. As recently as 2 years ago I saw a published author in the SEO space handing out a tip on Twitter to use the Google toolbar as your primary backlink checking tool. Sad!
The search guidelines are very much a living, breathing document. If search engines are to remain relevant they must change with the web. Those blazing new paths & changing the landscape of internet marketing often operate in ways that are not yet commonplace & thus not yet covered by guidelines that are based on last year's ecosystem. Their individual campaigns often fail because they are trying something new or different; for any single campaign the expected outcome is failure. However, they generally win the war. Those who follow behind remain in their footprints (unless they operate in less competitive markets).
The savvy SEO is a trailblazer who is pushing & probing to test some of the boundaries. They are equally a person who watches the evolution of the web through the lens of history, attempting to predict where search may lead. If you can predict where search is going you are not as likely to get caught with your pants down as the person who waits around for Google to tell them what to do next. It may still happen in some cases, but it is less common & you are more likely to be able to adjust quickly if you are looking at the web through Google's perspective (rather than through the perspective they suggest you use).
Google's Noble Respect for Copyright
Google has a history of challenging the law & building a business through wildcatting in a gray hat/black hat manner.
- They repeatedly broke the law with their ebook scanning project. Their ebook store is already open in spite of a judge requiring them to rework their agreements.
- They bought YouTube, a den of video piracy & then spent $100 million on legal bills after the fact. When they were competing with YouTube they suggested that they could force copyright holders to pay Google for lost ad revenues if they didn't give Google access to the premium content. :D
- They sold ads against trademarks where it was generally viewed as illegal and awaited the court's decisions after the fact.
- They tried doing an illegal search tie-up with Yahoo & only withdrew after they were warned that it would be challenged. They later slid through a similar deal with Yahoo Japan that was approved.
- They "accidentally" collected personally identifiable information while getting router information & scanning streets (and we later learn via internal emails in court documents how important some of this "accidental" data collection was to them).
- They pushed Buzz onto Gmail users and paid the fine.
- Google torched UK finance comparison sites for buying links. Then Google bought one of the few they didn't torch (in spite of its spammy links). After getting flamed on an SEO blog they penalized that site, but then it was ranking again 2 weeks later *without* cleaning up any of the spammy links.
- When the Panda update torched one of your sites Google AdSense was probably already paying someone else to steal it & outrank you. Google itself scrapes user reviews & then replaces the original source with Google Places pages. The only way to opt out of that Google scrape is to opt out of Google search traffic.
- Google promotes open in others, but then with their own products it is all or nothing bundling: "we are using compatibility as a club to make them do things we want." - Google's Dan Morrill
- For years Google recommended warez and keygens and serials to searchers, all while building up a stable of over 50,000 advertisers peddling counterfeit goods. That only stopped when the US government applied pressure, and then Google painted themselves as the good guys for fighting piracy.
- Google is reportedly about to launch their music service, once again without permission of the copyright holders they are abusing.
Those were examples of how Google interpreted "the guidelines" in modern societies.
Google doesn't wait for permission.
What are you doing right now?
Are you sitting around hoping that GoogleBot sorts everything out?
If so, grab a newspaper & pull out the "help wanted" section. You're going to need it!
If you want to win in Google's ecosystem you must behave like Google does, rather than behaving how they claim to & tell you to.