Tracking the Evolution of Search Spam

As part of their 10th birthday celebrations, Google recently released a 2001 index, to show us how much things have changed.

It is fascinating to look into the past, especially from an SEO point of view. Has the nature of spam changed since 2001? How has Google changed in order to nullify the effects of spam?

When Google filed their registration statement prior to their IPO, they identified a number of risk factors.

One of these risks was:

"We are susceptible to index spammers who could harm the integrity of our web search results

There is an ongoing and increasing effort by “index spammers” to develop ways to manipulate our web search results. For example, because our web search technology ranks a web page’s relevance based in part on the importance of the web sites that link to it, people have attempted to link a group of web sites together to manipulate web search results. We take this problem very seriously because providing relevant information to users is critical to our success. If our efforts to combat these and other types of index spamming are unsuccessful, our reputation for delivering relevant information could be diminished. This could result in a decline in user traffic, which would damage our business."

Curious how Google conflates spamming with relevance, eh. While it could be true that manipulating rank could lead to lower relevance, that isn't a given. The manipulation could, after all, produce relevant results. "Relevant" being a subjective judgement made by the user.

I digress...

What Google are really getting at is the type of manipulation that leads to less relevant results, commonly referred to as search engine spam. In this respect, what has changed since 2001?

Has Search Spam Been Defeated?

Or, to put it another way, what changes have Google made to reduce the business risk of non-relevant search results?

Compare the following examples with the results we see today:

Buy Viagra
Viagra

Now try searching on those two phrases in today's index. How many differences can you spot? How have the result sets changed? Are they less "spammy"?

Here are a few aspects I noticed:

  1. The search results are much tighter and much better policed. You wouldn't find the penis-envy.com site's link exchange page ranking in Google's 2008 search results for Paxil search queries.
  2. Google used to match keyword strings a lot more than it does today. This is the reason why a lot of on-page optimization techniques have become redundant, and the reason why effective on-page optimization in 2008 is more about diversity than repeating words.
  3. Blogs have gone from an obscure force to category leaders in many markets.
  4. If you happen to be searching outside the US, Google now incorporates, and boosts, regional results.
  5. Google now incorporates YouTube, news, and other related informational sources, thus forcing results from smaller sites further down the page.
  6. There used to be a lot more hyphenated domain names showing up in the top ten. Not so much these days.
  7. Wikipedia, which grew out of Nupedia, had only just started in 2001, so wasn't yet appearing in every single search result ;)

When Google first emerged, algorithmic search was in real danger of becoming unusable. Engines like AltaVista were losing the war against spammers, and result sets were becoming increasingly irrelevant. Then Google came along and, it seemed, had defeated spam forever using a clever link analysis algorithm. Sergey Brin once declared that it wasn't possible to spam Google. No more spam!

Well, not really.

Spam hasn't gone away. But it is fair to say that Google is doing a pretty good job of maintaining relevance, and in many cases, eliminating the worst forms of spam. For example, it is now uncommon to see the type of deceptive redirects that were common in 1997, whereby if you clicked on a link, you were led to a site that was unrelated to the link text.

We've seen the rise of the authoritative domain, and the relegation of the influence of many smaller sites. Pages hosted on authoritative domains are more likely to rank higher than pages on sites that haven't established authority. This has, in turn, led to a different type of spam. People hack into authoritative sites in order to place their links, or entire pages, on these domains. Wikipedia has an ongoing battle to keep its pages free from "commercial imperatives".

The target has, in many ways, shifted down a level.

Big Changes

Since 2001, Google has incorporated verticals.

In this article, Danny Sullivan outlined the use of "invisible tabs" in the delivery of search results.

"The solution I see coming is something I call "invisible tabs." Quietly, behind the scenes, search engines will automatically push the correct tab for your query and retrieve specialized search results. This should ultimately prove an improvement over the situation now, where you're handed 10 or 20 matching web pages."

Result sets have increasingly become query dependent, as if you'd pre-selected a topic tab. For example, if your query is determined to have an informational intent, you're unlikely to receive a commercially oriented result set. It has become a lot more difficult to get off-topic listings - which in this specific case would be commercial pages - into such result sets.

We've also seen the structure of search results pages change markedly. We see images, videos, news, related searches, sub pages, onebox results boxes, personalized results, desktop results, and AdWords. This leaves less and less room for other types of pages, as the search results orient more heavily around a wider variety of data types.

However, in the end, the SERP is still just a list that looks much like the old list. What will search, and search spam, look like in another ten years?

The Future

Over $10 billion is chasing paid search each year, and that figure will surely grow as media spend increasingly shifts online. There is still a strong incentive to use all means necessary to get to the top of the list.

Google will, of course, continue to try and counter this threat to their business model. The PageRank algorithm has likely changed considerably since it was first published. Google is likely to continue to incorporate usage metrics, making it more and more difficult for less relevant pages to gain a foothold.
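
For readers who want to see why linking a group of sites together can work at all, here is a minimal sketch of the PageRank iteration in roughly its originally published form. It is an illustration only, not Google's current ranking system. The page names, the tiny link graph, the damping factor of 0.85, and the pagerank helper are all hypothetical choices made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Return a rank per page. links maps each page to the pages it links to."""
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with rank spread evenly across all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank equally among its outlinks.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical mini web: nobody links to "spam".
honest_web = {
    "news": ["shop", "blog"],
    "blog": ["news"],
    "shop": ["news"],
    "spam": ["news"],
}

# The same web, plus five made-up "farm" pages that exist only to link to "spam".
farmed_web = dict(honest_web)
for i in range(5):
    farmed_web[f"farm{i}"] = ["spam"]

before = pagerank(honest_web)["spam"]
after = pagerank(farmed_web)["spam"]
print(f"spam rank without farm: {before:.4f}")
print(f"spam rank with farm:    {after:.4f}")
```

In this toy graph the spam page's score rises once the farm pages point at it, even though nothing about the page itself has changed. That is the kind of manipulation the registration statement describes, and presumably one reason the algorithm has had to move on.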

On the flip side, will search be as important as it is now? There appears to be a trend for more information to be pushed our way, rather than going out and finding it ourselves. RSS, recommendation engines (Amazon, YouTube, et al), community models (Facebook), and more. Will our surfing habits be (voluntarily) monitored, and answers provided before we're even aware of the question? We're already seeing the early stages of this with contextual AdWords in Gmail. These changes will, in turn, give rise to a new breed of spam. While the commercial incentive remains, there will always be a level of spam.

The game of cat and mouse continues...

The Google 2001 Search Index is a Great SEO Tool

Having a glimpse of the past reminds us of how things change, which might help us think about why they changed and how they may change going forward.

The 2001 index is a great tool for showing past popular SEO techniques that have become irrelevant, which is useful when the boss uncovers an old spammy strategy that they feel you must follow to succeed. It not only helps us inform employers, but also allows us to talk about and highlight overt forms of spam without the worry of "outing" a page that is currently ranking.

Published: October 16, 2008 by A Reader in google

Comments

jaseemumer
October 16, 2008 - 9:51am

Yup, I too believe spamming has evolved. But I have to admit that Google has won in most circumstances. I compared the search from 2001 index and the new index.

If the new index is not less spammy, it is not because Google hasn't grown. But because the benefit of spamming has increased with the growth of Google. So spammers are giving more effort.

Google will try to stop spamming again and again, but due to its growth, effort taken for spamming seems to be quite worth it.

But the new search result seemed more relevant to me(i haven't opened any pages). If you see a spam, just report it and you may see a better serp in the next week. Discount the spam from the serp, due to the relevancy of other results(ok, they too are SEOed), you will get a completely relevant result. Still, Google wins!

I am not a professional SEO, so i can be wrong.

Google can be manipulated, sure. Man hasn't created anything without a flaw.

palconit
October 16, 2008 - 10:08am

lol,reminds me of my early years online. :)

namezero had a free domain for a year when i first registered palconit.com but never renewed because i dont know how to pay online :P

well,

seo.com was Schwartz Electro-Optics Inc

and the top of the spot was

Sponsors for Educational Opportunity Web Site

there wasnt any seobook even last 2001

renesisx
October 16, 2008 - 11:20am

Google are still the engine that returns the best quality results for most queries.

There are a few queries though where I sometimes wonder if companies involved are being protected from inside Google. Ones where they aggressively, openly and blatantly use techniques such as Digital Point Ad Network to spam their way to #1 for large numbers of incredibly profitable keywords. It's either that, or Google are just totally "dropping the ball" on some queries.

Martypants
October 16, 2008 - 8:52pm

To me, spammers are no less frustrating than they were back then...I still get juiced everytime I see someone pushing more crap onto the Internet. But I never see it going away either - like jaseemumer said, there is too much money at stake, and too many lazy people trying to get rich quick. The fact that you can set up a live blog in under 10 minutes and host it for free changes the whole playing field too...there isn't much risk involved anymore, just time.
If I want a good taste of spam, I'll stick to Monty Python reruns!
