A Thought Experiment on Google Whitelisting Websites

Google has long maintained that "the algorithm" is what controls rankings, except for sites which are manually demoted for spamming, getting hacked, delivering spyware, and so on.

At the SMX conference it was revealed that Google uses whitelisting:

Google and Bing admitted publicly to having ‘exception lists’ for sites that were hit by algorithms that should not have been hit. Matt Cutts explained that there is no global whitelist but for some algorithms that have a negative impact on a site in Google’s search results, Google may make an exception for individual sites.

The idea that "sites rank where they deserve, with the exception of spammers" has long been pushed to help shield Google from accusations of anti-competitive behavior. Google's marketing has further leveraged the phrase "unique democratic nature of the web" to highlight how PageRank originally worked.

But let's conduct a thought experiment to think through the difference between how Google behaves and how Google doesn't want to be perceived as behaving.

Let's cover the negative view first. The negative view is that either Google has a competing product, or a Google engineer dislikes you and goes out of his way to torch your stuff over a personal grudge. Given Google's current monopoly-level marketshare in most countries, such behavior would be seen as unacceptable if Google were just picking winners and losers based on its own business interests.

The positive view is that "the algorithm handles almost everything, except some edge cases of spam." Let's break down that positive view a bit.

  • Off the start, consider that Google engineers write the algorithms with set goals and objectives in mind.
    • Google only launched universal search after it bought YouTube. Coincidence? Not likely. If Google had rolled out universal search before buying YouTube, it likely would have increased the price of YouTube by 30% to 50%.
    • Likewise, Google trains some of their algorithms with human raters. Google seeds certain questions & desired goals in the minds of raters & then uses their input to help craft an algorithm that matches their goals. (This is like me telling you I can't say the number 3, but I can ask you to add 1 and 2 then repeat whatever you say :D)
  • At some point Google rolls out a brand filter (or other arbitrary algorithm) which allows certain favored sites to rank based on criteria that other sites simply cannot match. It allows some sites to rank with junk doorway pages while demoting other websites.
  • To try to compete with that, some sites are forced to either live in obscurity & consistently shed marketshare, or be aggressive and operate outside the guidelines (at least in spirit, if not on a technical basis).
  • If a site operates outside the guidelines there is potential that it can go unpenalized, get a short-term slap on the wrist, or get a long-term hand-issued penalty that can literally last for up to 3 years!
  • Now here is where it gets interesting...
    • Google can roll out an automated algorithm that is overly punitive and has a significant number of false positives.
    • Then Google can follow up by allowing nepotistic businesses & those that fit certain criteria to quickly rank again via whitelisting.
    • Sites doing the exact same things as the whitelisted sites might be crushed for it & get the cold shoulder upon review.

You can see that even though it is claimed "TheAlgorithm" handles almost everything, Google can easily inject its own biases to decide who ranks and who does not. "TheAlgorithm" is first and foremost a legal shield. Beyond that it is a marketing tool. Relevancy is likely third in line in terms of importance (how else could one explain the content farm issue getting so out of hand for so many years before Google did something about it?).

Published: March 11, 2011 by Aaron Wall in google

Comments

March 11, 2011 - 2:37pm

Sometimes we just don't know what to think... but I think they're still the most trusted search engine... but their policy is all over the place...

"Black Seo Guy "Signing Off"

Dimwit
March 11, 2011 - 3:57pm

To be perfectly clear, Google is going to dominate search indefinitely, but...

By devaluing the foundation of their greatness (an algorithm built on relevance and 'online' reputation) in favor of increasingly subjective signals like 'brand recognition', 'trust', and 'quality'...Google has opened wide an opportunity for smaller players to provide access to the scraps left over, to help consumers find the diamonds in the rough...assuming that the small players can survive without the near-critical exposure that Google offers.

Going out on a long limb here, but I posit that 2/24 will be remembered as the day that Google jumped the Panda.

Dictina
March 11, 2011 - 4:20pm

I wonder if this whitelisting is being applied to lyrics sites, where Google can't say who is the original source for the content. Otherwise, I can't explain why some illegal sites rank ahead of the ones that have agreements with the labels and publishers and have kept steadily adding value to that content.

March 12, 2011 - 3:30am

If you read Google's remote quality rater documents they have a quote in them that lyrics sites are not to be dinged for duplicate content because there is no official source for lyrics. My guess is that the policy stays in place until if/when Google licenses lyrics data directly, at which point a few licensed sites might get a chance to pop up while many of the long-established unlicensed sites get flushed down the toilet.

Google doing that would have numerous benefits...

  • they would keep the other official sources fairly hidden for now
  • when Google launches, all the currently established strong brands would become a bit harder to find, helping Google win marketshare faster
  • it would give Google an economy of scale benefit that allowed them to pay more than other websites can afford to

James_
March 11, 2011 - 4:58pm

Yeah, it seems awfully fishy to me that CultofMac started magically ranking again after they got a few mentions in the press... they know someone who knows someone, period! If CultofMac wasn't whitelisted, then where are all the other sites that got their rankings back?

CureDream
March 11, 2011 - 5:24pm

I've been working on a project lately where having a good internal search engine is really important. Fortunately, my system collects a vast number of signals that can be used to rank things, but I'm starting to feel the problems that Google and other large-scale search engines have, especially the tension between "relevant results" (keyword match) and "high quality results" (quality scoring as in PageRank).
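
To make that tension concrete, here is a toy sketch of blending a query-dependent relevance score with a query-independent quality score. The score functions, the 50/50 weighting, and the example documents are invented for illustration, not anyone's actual ranking formula.

```python
# Toy blend of keyword relevance with a PageRank-style quality score.
# All functions, weights, and documents here are illustrative assumptions.

def relevance(query_terms, doc_terms):
    """Crude keyword-match relevance: fraction of query terms found in the doc."""
    if not query_terms:
        return 0.0
    return sum(1 for t in query_terms if t in doc_terms) / len(query_terms)

def combined_score(query_terms, doc, quality_weight=0.5):
    """Blend relevance with a precomputed, query-independent quality score."""
    rel = relevance(query_terms, doc["terms"])
    return (1 - quality_weight) * rel + quality_weight * doc["quality"]

docs = [
    {"url": "exact-match-thin-page", "terms": {"blue", "widgets"}, "quality": 0.1},
    {"url": "trusted-broad-page",    "terms": {"widgets"},         "quality": 0.9},
]

query = ["blue", "widgets"]
ranked = sorted(docs, key=lambda d: combined_score(query, d), reverse=True)
print([d["url"] for d in ranked])  # the higher-quality page wins despite a weaker keyword match
```

Turn quality_weight up and "high quality" pages start beating "more relevant" pages, which is exactly the knob that is hard to get right.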

Early on, I was floundering, and a major driver of my search engine development was putting out fires. I'd find that people did a search for "X" and got bad results and I'd make some change that would fix this, but then I'd break the search for "Y".

After a while I started thinking about this in terms of test-driven development. Every time I found a search that got the wrong results, I added it to a test suite that I can run every time I change the search algorithm and I can tweak it until it works the way I want.
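
In case it helps anyone doing something similar, that "test suite" can be as simple as a list of (query, expected top result) pairs that gets re-checked on every ranking change. The search() interface and the example queries and URLs below are hypothetical:

```python
# Minimal ranking regression suite: every time a query returns the wrong
# result, record the expectation here and re-run the whole list after each
# change to the ranking code. Queries and URLs are made-up examples.

RANKING_TESTS = [
    # (query, URL that should rank #1)
    ("blue widgets installation", "support.example.com/install-widgets"),
    ("widget return policy",      "www.example.com/returns"),
]

def run_ranking_tests(search):
    """search(query) must return an ordered list of URLs; returns the failing cases."""
    failures = []
    for query, expected_top in RANKING_TESTS:
        results = search(query)
        if not results or results[0] != expected_top:
            failures.append((query, expected_top, results[:3]))
    return failures

# Usage after every tweak to the ranking function:
#   failures = run_ranking_tests(my_engine.search)
#   assert not failures, failures
```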

Then one day I finally understood some of the academic papers that I'd seen (and even contributed data for) about applying machine learning to search. Once I've got this list of searches that have particular constraints attached as to "X should rank #1 for topic Y" or "Page X should rank better than Page Y for topic Z" or "X should rank in the top 5 for topic Y" I can feed this to a machine learning algorithm and let it turn the knobs until the search works the way I want.

It's almost certain that Google uses this kind of thinking to some extent, and that they could punch a rule into their training system that says something like "eFreedom.com should never rank better than stackoverflow.com" but still keep all the other training rules so this doesn't have a destructive effect on other search terms. (Except for the webmaster who gives the signals that eFreedom.com does... On the other hand it's good if you give off the signals that StackOverflow does)
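
A rough sketch of how such a rule could sit alongside the other training constraints: express each rule as a (query, should-rank-higher, should-rank-lower) triple, then search the ranking knobs for the setting that satisfies the most rules. The features, numbers, and the single quality_weight knob below are invented for illustration and obviously not how Google actually does it:

```python
# Pairwise ranking constraints plus brute-force "knob turning".
# Features, numbers, and the single knob are illustrative assumptions only.

# Per (query, doc) features: keyword relevance and a query-independent
# quality/brand signal, both made up.
FEATURES = {
    ("convert string to int", "efreedom.example"):      {"rel": 0.9, "quality": 0.2},
    ("convert string to int", "stackoverflow.example"): {"rel": 0.8, "quality": 0.9},
}

# Each constraint: for this query, the first doc should outrank the second.
CONSTRAINTS = [
    ("convert string to int", "stackoverflow.example", "efreedom.example"),
]

def score(query, doc, quality_weight):
    f = FEATURES[(query, doc)]
    return (1 - quality_weight) * f["rel"] + quality_weight * f["quality"]

def satisfied(quality_weight):
    """Count how many pairwise constraints hold at this knob setting."""
    return sum(
        score(q, better, quality_weight) > score(q, worse, quality_weight)
        for q, better, worse in CONSTRAINTS
    )

# Sweep the knob instead of running real machine learning.
best = max((w / 100 for w in range(101)), key=satisfied)
print("quality_weight:", best, "constraints satisfied:", satisfied(best))
```

The point is just that a single added rule gets folded into the same objective as everything else, so nothing has to be hand-penalized at the URL level.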

Probably the "Farmer" update involved adding a ruleset that targeted specific "content farms" but not others.

This gives Google a mechanism they can use to penalize a site or a kind of site without having to explicitly put a penalty on the URLs involved. What's neat about it is that they can do this in a way that satisfies the business demands and pressures they face on a pretty quick basis... It's a reasonable guess, I think, that it takes Google a few weeks to do a few processing runs on the whole data set that they use to train the search engine. That is roughly the timescale between when they acknowledged the panic about content farms and when "Farmer" hit the streets. Also it seems like Google does a minor reranking on a timescale of once a month or so, and they probably spend the time between rerankings doing this kind of parameter tweaking.

March 12, 2011 - 3:12am

...about the farmer update is that I think Google business development folks have been far more aggressive at trying to sign up new partners the past couple weeks than before. I got an unsolicited email trying to sign seobook.com up to AdSense & that hasn't happened before.

As bad as the content farm update must feel for some publishers, imagine how it must feel for the internal AdSense team at Google, especially for any of them who have revenue goals tied to their performance & just saw everything shift around so much after Google chose a set of winners and losers on the update. ;)
