You can learn a lot about how search has improved over the years by reading Matt Cutts. Recently he highlighted how irrelevant search results were in the past due to a lack of diversity:
Seven of the top 10 results all came from one domain, and the urls look a little… well, let’s say fishy. In 1999 and early 2000, search engines would often return 50 results from the same domain in the search results. One nice change that Google introduced in February 2000 was “host crowding,” which only showed two results from each hostname. ... Suddenly, Google’s search results were much cleaner and more diverse! It was a really nice win–we even got email fan letters.
Thanks to those kinds of improvements, in 2011 we never have to look at search results like this.*
* And by never, I mean, unless the results are linking to fraternal Google pages, in which case, game on!
Why should Google result crowding not apply to Google.com? Sure, they can say those books are from different authors, but many websites are run by organizations with multiple authors. Some websites are even built through partnerships between multiple business organizations. Who knows, maybe some searchers are uncomfortable with every other listing being an out-of-context book highlight.
In the past I have been called cynical for highlighting stuff like the following image:
I saw it as part of a trend toward home cooking promotions. And I still view it that way. The above books promotion is simply further proof of concept.
other Google owned and operated sites
a branded website ranking for its own brand
Can you show me *any* occurrence of a site being listed 5 times in the search results? Bonus points if you can find one where the 5 listings are not grouped into one bunch via result crowding.
As a thought experiment, ask yourself if that Google ranking accident would happen if the content archive being served up was promoting media hosted on Microsoft servers.
A friend of mine summed it up nicely with:
well, it's not everyday you see that kind of power and the fact that other sites aren't afforded the same opportunity makes me think that they are being anti-competitive. Google literally wrote the book (ok scraped it) on anti-competitive practices.
If you live outside the United States and were unscathed by the Panda Update, a world of hurt may await you soon. Or you may be in for a pleasant surprise. It is hard to say where the chips may fall for you without looking.
Because Google has multiple algorithms running right now, you can get a peek at the types of sites that were hit. If your site is in English, you can see whether it would have been hit by comparing your Google.com rankings in the United States against your rankings in foreign markets using the Google AdWords ad preview tool.
In most foreign markets Google is not likely to be as aggressive with this type of algorithm as they are in the United States (because foreign ad markets are less liquid and there is less of a critical mass of content in some foreign markets), but I would be willing to bet that Google will be pretty aggressive with it in the UK when it rolls out.
The keywords where you will see the most significant ranking changes will be those with a lot of competition, as keywords with less competition generally do not have as many sites waiting to replace the ones that get whacked (since fewer people were competing for the keyword). Another way to get a glimpse of the aggregate data is to look at your Google Analytics search traffic from the US and see how it has changed relative to seasonal norms. Here is a "look out below" example, highlighting how Google traffic dropped. ;)
What is worse is that on most impacted sites revenue declined faster than traffic, because search traffic monetizes so well & the US ad market is so much deeper than most foreign markets. Thus a site that had 50% profit margins might have gone to breaking even or losing money after this update. :D
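If you would rather quantify that US traffic shift than eyeball a chart, here is a minimal sketch, assuming you have exported daily US organic visits from Google Analytics to a CSV. The file name, column names, and rollout date below are assumptions for illustration only:

```python
# Rough sketch: compare recent US organic search traffic against the period
# before the update and against the same window a year earlier, to spot a
# Panda-sized drop. Assumes a CSV export from Google Analytics with "date"
# and "us_organic_visits" columns (names invented for this example).
import pandas as pd

df = pd.read_csv("us_organic_visits.csv", parse_dates=["date"])
df = df.set_index("date").sort_index()

update_date = pd.Timestamp("2011-02-24")   # approximate US rollout date
window = pd.Timedelta(days=28)

after = df.loc[update_date : update_date + window, "us_organic_visits"].sum()
before = df.loc[update_date - window : update_date, "us_organic_visits"].sum()
year_ago = df.loc[update_date - pd.Timedelta(days=365) :
                  update_date - pd.Timedelta(days=365) + window,
                  "us_organic_visits"].sum()

print(f"4 weeks after vs 4 weeks before the update: {after / before:.0%}")
print(f"4 weeks after vs the same 4 weeks last year: {after / year_ago:.0%}")
```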
When Google updates the US content farmer algorithm again (likely soon, since it has already been over a month since the update happened) it will likely roll out to other large global markets as well, because Google does not like running (and maintaining) 2 sets of ranking algorithms for an extended period of time: it is more cost intensive and it makes it easier for people to reverse engineer the algorithm.
And the spam clean up? Google did NOTHING of the sort.
Every single example (of Google spamming Google) that was highlighted is still live.
Now Google can claim they handled the spam on their end / discounted it behind the scenes, but such claims fall short when compared to the standards Google holds other companies to.
Most sites that get manually whacked for link-based penalties are penalized for much longer than 2 weeks.
Remember the brand damage Google did to companies like JC Penney & Overstock.com by talking to the press about those penalties? In spite of THOUSANDS of media outlets writing about Google's BTQ acquisition, The Register was the most mainstream publication discussing Google's penalization of BeatThatQuote, and there were no quotes from Google in it.
When asking for forgiveness for such moral violations, you are supposed to grovel before Google, admitting all past sins & acknowledging their omniscient ability to know everything. This can lead one to over-react and actually make things even worse than the penalty itself!
In an attempt to clean up their spam penalties (or at least to show they were making an effort) JC Penney sent a bulk email to sites linking to them, stating that the links were unauthorized and asking that they be removed. So JC Penney not only had to spend effort dropping any ill-gotten link equity, but also lost tons of organic links in the process.
Time to coin a new SEO phrase: token penalty.
token penalty: an arbitrary short-term editorial action by Google, meant to deflect public relations blowback that could otherwise lead to review of anti-competitive monopolistic behavior by a search engine that holds monopoly marketshare yet doesn't bother to follow its own guidelines.
Your faith in your favorite politician should be challenged after you see him out on the town snorting coke and renting hookers. The same is true for Googlers preaching their guidelines as though they are law while Google is out buying links (and the sites that buy them).
You won't read about this in the mainstream press because they are scared of Google's monopolistic business practices. Luckily there are blogs. And Cyndi Lauper. ;)
Update: after reading this blog post, Google engineers once again penalized BeatThatQuote!
On-demand indexing was a great value added feature for Google site search, but now it carries more risks than ever. Why? Google decides how many documents make their primary index. And if too many of your documents are arbitrarily considered "low quality" then you get hit with a sitewide penalty. You did nothing but decide to trust Google & use Google products. In response Google goes out of its way to destroy your business. Awesome!
Part of the prescribed solution to the Panda Update is to noindex content that Google deems to be of low quality. But if you are telling GoogleBot to noindex some of your content and you are also using Google for site search, you destroy the usability of their site search feature by making that content effectively invisible to your customers. For Google Site Search customers this algorithmic change is even more value destructive than the arbitrary price jack Google Site Search recently did.
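To make the conflict concrete, here is a toy sketch of what that "noindex your thin content" prescription looks like in practice. The word-count threshold and helper function are invented for illustration; the comment marks the exact point where Google's own site search product gets broken:

```python
# Toy sketch of the "noindex the thin stuff" advice: flag pages whose main
# body content falls under an arbitrary word count. The threshold and page
# structure are assumptions, not anything Google documents.
THIN_PAGE_WORD_COUNT = 200  # arbitrary cutoff, tune for your own site

def robots_meta_tag(body_text: str) -> str:
    """Return the robots meta tag to emit in a page's <head>."""
    if len(body_text.split()) < THIN_PAGE_WORD_COUNT:
        # Hides the page from Google's main index *and*, as far as anyone can
        # tell, from Google-powered site search as well.
        return '<meta name="robots" content="noindex,follow">'
    return '<meta name="robots" content="index,follow">'

print(robots_meta_tag("a handful of words wrapped in navigation and ads"))
```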
Cloaking vs rel=noindex, rel=canonical, etc. etc. etc.
Google tells us that cloaking is bad & that we should build our sites for users instead of search engines, but now Google's algorithms are so complex that you literally have to break some of Google's products to be able to work with other Google products. How stupid! But a healthy reminder for those considering deeply integrating Google into your on-site customer experience. Who knows when their model will arbitrarily change again? But we do know that when it does they won't warn partners in advance. ;)
I could be wrong in the above, but if I am, it is not easy to find any helpful Google documentation. There is no site-search bot on their list of crawlers, questions about whether they share the same user agent have gone unanswered, and even a blog post like this probably won't get a response.
That is just one more layer of hypocrisy: Google states that if you don't provide great customer service then your business is awful, yet going to the dentist is more fun than trying to get any customer service from Google. :D
I was talking to a friend about this stuff and I think he summed it up perfectly: "The layers of complexity make everyone a spammer since they ultimately conflict, giving them the ability to boot anyone at will."
I was looking for information about the nuclear reactor issue in Japan and am glad it did not turn out as bad as it first looked!
But in that process of searching for information I kept stumbling into garbage hollow websites. I was cautious not to click on the malware results, but of the mainstream sites covering the issue, one of the most flagrant efforts was from the Huffington Post.
AOL recently announced that they were firing 15% to 20% of their staff. No need for original stories or even staff writers when you can literally grab a third party tweet, wrap it in your site design, and rank it in Google. In line with that spirit, I took a screenshot. Rather than calling it the Huffington Post I decided a more fitting title would be plundering host. :D
We were told that the content farm update was to get rid of low quality web pages & yet that information-less page was ranking at the top of the search results, when it was nothing but a 3rd party tweet wrapped in brand and ads.
You can imagine in a hyperspace a bunch of points, some points are red, some points are green, and in others there’s some mixture. Your job is to find a plane which says that most things on this side of the plane are red, and most of the things on that side of the plane are the opposite of red. - Google's Amit Singhal
If you make it past Google's arbitrary line in the sand there is no limit to how much spamming and jamming you can do.
we actually came up with a classifier to say, okay, IRS or Wikipedia or New York Times is over on this side, and the low-quality sites are over on this side. - Matt Cutts
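Stripped of scale and secrecy, what those two quotes describe is ordinary supervised classification. Here is a minimal sketch with invented features and labels, using scikit-learn's LinearSVC as a stand-in for whatever Google actually runs:

```python
# Minimal sketch of the "find a separating plane" idea from the quotes above.
# The features and labels are invented: imagine each site described by a
# couple of crude quality signals and hand-labeled by raters as high or low
# quality; a linear classifier then draws the plane between the two groups.
from sklearn.svm import LinearSVC

# [avg words per page, fraction of pages with original content]
X = [
    [1200, 0.95],  # "IRS / Wikipedia / New York Times" side
    [900,  0.90],
    [1500, 0.99],
    [250,  0.10],  # "low-quality sites" side
    [180,  0.05],
    [300,  0.20],
]
y = [1, 1, 1, 0, 0, 0]  # 1 = high quality, 0 = low quality

clf = LinearSVC()
clf.fit(X, y)

# Any new site simply lands on one side of the plane or the other.
print(clf.predict([[1100, 0.85], [220, 0.15]]))  # expected: [1 0]
```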
(G)arbitrage never really goes away, it just becomes more corporate.
As bad as that sounds, it is actually even worse than that. Today Google Alerts showed our brand being mentioned on a group-piracy website built around a subscription model of selling 3rd party content without permission! As annoying as that feels, of course there are going to be some dirtbags on the way that you have to deal with from time to time. But now that the content farm update has gone through, some of the original content producers are no longer ranking for their own titles, whereas piracy sites that stole their content are now the canonical top ranked sources!
Google never used to put piracy sites on the first page of results for my books, this is a new feature on their part, and I think it goes a long way to show that their problem is cultural rather than technical. Google seems to have reached the conclusion that since many of their users are looking for pirated eBooks, quality search results means providing them with the best directory of copyright infringements available. And since Google streamlined their DMCA process with online forms, I couldn’t discover a method of telling them to remove a result like this from their search results, though I tried anyway.
... I feel like the guy who was walking across the street when Google dropped a 1000 pound bomb to take out a cockroach - MorrisRosenthal
Take a look at what Matt Cutts shares in the following video, where he tries to compare brand domain names vs keyword domain names. He highlights brand over and over again, and then when he talks about exact match domains getting a bonus or benefit, he highlights that Google may well dial that down soon.
Now if you are still on the fence, let me just give you a bit of color: we have looked at the rankings and the weights that we give to keyword domains, & some people have complained that we are giving a little too much weight for keywords in domains. So we have been thinking about adjusting that mix a bit and sort of turning the knob down within the algorithm, so that given 2 different domains it wouldn't necessarily help you as much to have a domain name with a bunch of keywords in it. - Matt Cutts
It is believed that Google requires participating hotels to provide Google Maps with the lowest publicly available rates, for stays of one to seven nights, double occupancy, with arrival days up to 90 days ahead.
In a world where Google has business volume data, clientele demographics, pricing data, and customer satisfaction data for most offline businesses they don't really need to place too much weight on links or domain names. Businesses can be seen as being great simply by being great.*
(*and encouraging people to stuff the ballot box for them with discounts :D)
Classical SEO signals (on-page optimization, link anchor text, domain names, etc.) have value up until a point, but if Google is going to keep mixing in more and more signals from other data sources then the value of any single signal drops. I haven't bought any great domain names in a while, and with Google's continued brand push and Google coming over the top with more ad units (in markets like credit cards and mortgage) I am seeing more and more reason to think harder about brand. It seems that is where Google is headed. The link graph is rotted out by nepotism & paid links. Domain names are seen as a tool for speculation & a short cut. It is not surprising Google is looking for more signals.
How have you adjusted your strategies of late? What happens to the value of domain names if EMD bonus goes away & Google keeps adding other data sources?
Google and Bing admitted publicly to having ‘exception lists’ for sites that were hit by algorithms that should not have been hit. Matt Cutts explained that there is no global whitelist but for some algorithms that have a negative impact on a site in Google’s search results, Google may make an exception for individual sites.
The idea that "sites rank where they deserve, with the exception of spammers" has long been pushed to help indemnify Google from potential anti-competitive behavior. Google's marketing has further leveraged the phrase "unique democratic nature of the web" to highlight how PageRank originally worked.
But why don't we conduct a thought experiment to think through the difference between how Google actually behaves and how Google wants to be perceived as behaving?
Let's cover the negative view first. The negative view is that either Google has a competing product, or a Google engineer dislikes you and goes out of his way to torch your stuff simply because he is holding onto a grudge. Given Google's current monopoly-level marketshare in most countries, such behavior would be seen as unacceptable, since Google would just be picking winners and losers based on its own business interests.
The positive view is that "the algorithm handles almost everything, except some edge cases of spam." Let's break down that positive view a bit.
Off the start, consider that Google engineers write the algorithms with set goals and objectives in mind.
Google only launched universal search after Google bought YouTube. Coincidence? Not likely. If Google had rolled out universal search before buying YouTube, it likely would have driven up YouTube's acquisition price by 30% to 50%.
Likewise, Google trains some of their algorithms with human raters. Google seeds certain questions & desired goals in the minds of raters & then uses their input to help craft an algorithm that matches their goals. (This is like me telling you I can't say the number 3, but I can ask you to add 1 and 2 then repeat whatever you say :D)
At some point Google rolls out a brand-filter (or other arbitrary algorithm) which allows certain favored sites to rank based on criteria that other sites simply can not match. It allows some sites to rank with junk doorway pages while demoting other websites.
To try to compete with that, some sites are forced to either live in obscurity & consistently shed marketshare in their market, or be aggressive and operate outside the guidelines (at least in spirit, if not on a technical basis).
If the site operates outside the guidelines there is potential that they can go unpenalized, get a short-term slap on the wrist, or get a long-term hand issued penalty that can literally last for up to 3 years!
Now here is where it gets interesting...
Google can roll out an automated algorithm that is overly punitive and has a significant number of false positives.
Then Google can follow up by allowing nepotistic businesses & those that fit certain criteria to quickly rank again via whitelisting.
Sites doing the exact same things as the whitelisted sites might still be crushed for it & get the cold shoulder upon review.
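Put as a toy sketch, the combination of an overly punitive algorithm plus an exception list looks roughly like this. Every domain, score, and threshold below is invented:

```python
# Toy sketch of how a whitelist layered on top of an automated demotion works.
# All names and numbers are made up for illustration.
PANDA_THRESHOLD = 0.5
EXCEPTION_LIST = {"favoredbrand.com"}  # the hand-maintained "exception list"

def gets_demoted(domain: str, algo_quality_score: float) -> bool:
    """Return True if the site gets demoted in the rankings."""
    if domain in EXCEPTION_LIST:
        return False  # whitelisted: the algorithm's verdict is overridden
    return algo_quality_score < PANDA_THRESHOLD

# Two sites with the identical algorithmic score, two different outcomes.
print(gets_demoted("favoredbrand.com", 0.3))    # False -- no demotion
print(gets_demoted("smallpublisher.com", 0.3))  # True  -- demoted
```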
You can see that even though it is claimed "TheAlgorithm" handles almost everything, they can easily inject their personal biases to decide who ranks and who does not. "TheAlgorithm" is first and foremost a legal shield. Beyond that it is a marketing tool. Relevancy is likely third in line in terms of importance (how else could one explain the content farm issue getting so out of hand for so many years before Google did something about it?).
When Google did the Panda update they highlighted that not only did some "low quality" sites get hammered, but that some "high quality" sites got a boost. Matt Cutts said: "we actually came up with a classifier to say, okay, IRS or Wikipedia or New York Times is over on this side, and the low-quality sites are over on this side."
Here is the problem with that sort of classification system: doorway pages.
The following Ikea page was ranking page 1 in the search results for a fairly competitive keyword.
Once you strip away the site's navigation there are literally only 20 words on that page. And the main body area "content" for that page is a link to a bizarre, confusing, and poorly-functioning Flash tour which takes a while to load.
If you were trying to design the worst possible user experience & wanted to push a "minimum viable product" page into the search results, you really couldn't do much worse than that Ikea page (at least not without delivering malware and such).
I am not accusing Ikea of doing anything spammy. They just have terrible usability on that page. Their backlinks to that page are few in number & look just about as organic as they could possibly come. But not that long ago companies like JC Penny and Overstock were demoted by Google for building targeted deep links (that they needed in order to rank, but were allegedly harming search relevancy & Google user experience). Less than a month later Google arbitrarily changed their algorithm to where other branded sites simply didn't need many (or in some cases any) deep links to get in the game, even if their pages were pure crap.
We are told the recent "content farm" update was to demote low quality content. If that is the case, then how does a skeleton of a page like that rank so high? How did that Ikea page go from ranking on the third page of Google's results to the first one? I think Google's classifier is flashing a new set of exploits for those who know what to look for.
A basic tip? If you see Google ranking an information-less page like that on a site you own, that might be a green light to see how far you can run with it. Give GoogleBot the "quality content" it seeks. Opportunity abound!
You can't learn great SEO from an e-book. Or from buying software tools.
Great SEO is built on an understanding.
Reducing SEO To Prescription
One of the problems with reductive, prescribed SEO approaches (i.e. step one: research keywords, step two: put keyword in title, etc.) can be seen in the recent "Content Farm" update.
When Google decide sites are affecting their search quality, they look for a definable, repeated footprint made by the sites they deem to be undesirable. They then design algorithms that flag and punish the sites that use such a footprint.
This is why a lot of legitimate sites get taken out in updates. A collection of sites may not look, to a human, like problem sites, but the algo sees them as being the same thing, because their technical footprint is the same. For instance, a website with a high number of 250-word pages is an example of a footprint. Not necessarily an undesirable one, but a footprint nevertheless. Similar footprints exist amongst ecommerce sites heavy in sitewide templating but light on content unique to the page.
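As a crude sketch of what such a footprint check might look like, consider the 250-word-page example above. The thresholds here are invented, and Google's real signals are obviously far richer:

```python
# Rough sketch of "footprint" detection: flag a site when too large a share
# of its pages cluster around a suspiciously small word count. The 250-word
# figure comes from the example above; the 60% share is an invented cutoff.
def looks_like_a_footprint(page_word_counts: list[int],
                           thin_limit: int = 250,
                           max_thin_share: float = 0.6) -> bool:
    thin_pages = sum(1 for words in page_word_counts if words <= thin_limit)
    return thin_pages / len(page_word_counts) > max_thin_share

article_site = [1400, 900, 2100, 650, 1200]
template_heavy_store = [240, 230, 250, 245, 1800, 235]

print(looks_like_a_footprint(article_site))          # False
print(looks_like_a_footprint(template_heavy_store))  # True
```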
Copying successful sites is a great way to learn, but can also be a trap. If you share a similar footprint, having followed the same SEO prescription, you may go down with them if Google decides their approach is no longer flavor of the month.
The Myth Of White Hat
A lot of sites that get taken out are white hat, i.e. sites that follow Google's webmaster guidelines.
It's a reasonably safe approach, but if you understand SEO, you'll soon realize that following a white hat prescription offers no guarantees of ranking, nor does it offer any guarantees you won't be taken out.
The primary reason there aren't any guarantees comes down to numbers. Google knows that when it makes a change, many sites will lose. They also know that many sites will win i.e. replace the sites that lost. If your site drops out, Google aren't bothered. There will be plenty of other sites to take your place. Google are only concerned that their users perceive the search results to be of sufficient quality.
The exception is if your site really is a one-of-a-kind. The kind of site that would embarrass Google if users couldn't find it. BMW, for example, in response to the query "BMW".
It's not fair, but we understand that's just how life is.
For those readers new to SEO: in order to really grasp SEO, you need to see things from the search engine's point of view.
Firstly, understand the search engine's business case. The search engine can only make money if advertisers pay for search traffic. If it were too easy for sites that are likely to use PPC to rank highly in the natural results, the search engine's business model would be undermined. Therefore, it is in the search engine's interest to "encourage" purely commercial entities to use PPC, not SEO. One way they do this is to make the natural results volatile and unpredictable. There are exceptions, covered in my second point.
Secondly, search engines must provide sufficient information quality to their users. This is an SEO opportunity, because without webmasters producing free-to-crawl, quality content, there can be no search engine business model. The search engines must nurture this ecosystem.
If you provide genuine utility to end users, the search engines have a vested interest in your survival, perhaps not as an individual, but certainly as a group, i.e. "quality web publishers". Traffic is the lifeblood of the web, and if quality web publishers aren't fed traffic, they die. The problem, for webmasters, is that the search engines don't care about any one "quality publisher", as there are plenty of quality publishers. The exception is if you're the type of quality publisher with a well-recognized brand, whose absence would give users the impression that Google was useless.
Thirdly, for all their cryptic black box genius, search engines aren't all that sophisticated. Yes, the people who run them are brilliant. The problems they solve are very difficult. They have built what, only decades ago, would have been considered magic. But, at the end of the day, it's just a bit of maths trying to figure out a set of signals. If you can work out what that set of signals are, the maths will - unblinkingly - reward you. It is often said that in the search engine wars, the black hats will be the last SEOs standing.
Fourthly, the search engines don't really like you. They identified you as a business risk in their statement to investors. You can, potentially, make them look bad. You can undermine their business case. You may compete with their own channels for traffic. They tolerate you because they need publishers making their stuff easy to crawl, and not locking their content away behind paywalls. Just don't expect a Christmas card.
SEO Strategy Built On Understanding
Develop strategies based on how a search engine sees the world.
For example, if you're a known brand, your approach will be different to a little known, generic publisher. There isn't really much risk you won't appear, as you could embarrass Google if users can't find you. This is the reason BMW were reinstated so quickly after falling foul of Google's guidelines, but the same doesn't necessarily apply to lesser known publishers.
If you like puzzles, then testing the algorithms can give you an unfair advantage. It's a lot harder than it used to be, but where there is difficulty, there is a barrier to entry to those who come later. Avoid listening to SEO echo chambers where advice may be well-meaning, but isn't based on rigorous testing.
If you're a publisher, not much into SEO wizardry, and you create content that is very similar to content created by others, you should focus on differentiation. If there are 100's of publishers just like you, then Google doesn't care if you disappear. Google do need to find a way to reward quality, especially in niches that aren't well covered. Be better than the rest, but if you're not, slice your niche finer and finer, until you're the top dog in your niche. You should focus on building brand, so you can own a search stream. For example, this site owns the search stream "SEO Book", a stream Aaron created and built up.
Remember, search engines don't care about you, unless there's something in it for them.