Matt Cutts on Using Search Usage Data to Fight Spam

A couple of weeks back we mentioned that Google's Peter Norvig stated that Google does not use search usage data directly in its relevancy algorithms. Yesterday Matt Cutts published a post on the official Google blog stating that Google does look at search logs and usage data to determine how large spam attacks are and how well new anti-spam measures are working:

Data from search logs is one tool we use to fight webspam and return cleaner and more relevant results. Logs data such as IP address and cookie information make it possible to create and use metrics that measure the different aspects of our search quality (such as index size and coverage, results "freshness," and spam).

Whenever we create a new metric, it's essential to be able to go over our logs data and compute new spam metrics using previous queries or results. We use our search logs to go "back in time" and see how well Google did on queries from months before. When we create a metric that measures a new type of spam more accurately, we not only start tracking our spam success going forward, but we also use logs data to see how we were doing on that type of spam in previous months and years.
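
To make the "back in time" idea concrete, here is a minimal Python sketch of backfilling a new spam metric over old logged results. It is only an illustration: the log records, the `is_doorway_spam` classifier, and the `backfill_spam_rate` helper are all made up for this example, and nothing here reflects how Google actually stores logs or detects spam.

```python
from collections import defaultdict
from datetime import date

# Hypothetical logged results: (day, query, URL shown in the results).
LOGGED_RESULTS = [
    (date(2008, 1, 15), "cheap flights", "example.com/flights"),
    (date(2008, 1, 15), "cheap flights", "spam-doorway.example/1"),
    (date(2008, 2, 10), "cheap flights", "example.com/flights"),
    (date(2008, 2, 10), "cheap flights", "example.org/deals"),
]


def is_doorway_spam(url):
    """Hypothetical new spam classifier; a stand-in for a real detector."""
    return "spam-doorway" in url


def backfill_spam_rate(logged_results, classifier):
    """Return {(year, month): fraction of logged results the classifier flags}."""
    shown = defaultdict(int)
    flagged = defaultdict(int)
    for day, _query, url in logged_results:
        key = (day.year, day.month)
        shown[key] += 1
        if classifier(url):
            flagged[key] += 1
    return {key: flagged[key] / shown[key] for key in shown}


# Apply the new metric to months that were logged before the metric existed.
monthly = backfill_spam_rate(LOGGED_RESULTS, is_doorway_spam)
for (year, month), rate in sorted(monthly.items()):
    print(f"{year}-{month:02d}: {rate:.0%} of logged results flagged")
```

The point of the sketch is that the classifier can be written long after the results were logged: because the old results are still on record, a metric created today can be charted for past months as well as future ones.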

Published: June 28, 2008 by Aaron Wall in google

Comments

webuildpages
June 28, 2008 - 6:32pm

In particular....I'd love to hear some ideas on this sentence:

"...go over our logs data and compute new spam metrics using previous queries or results."

I wonder what are some specific "spam metrics" flags?

Anyone have thoughts?

June 29, 2008 - 10:18am

Maybe things like the percent of doorway pages and the percent of pages with affiliate links on them.

Plus they use human editors to review pages, so presumably they can look at their old search logs, see how they manually classified pages (spam or not), and then when they roll out a new algorithm they can see whether pages they manually classified as spam in the past still tend to show up, or whether they get flagged or buried by the new algorithm.

wilreynolds
June 30, 2008 - 8:17am

What about # of AdSense blocks? Maybe that, like affiliate links, could be a way to check for spam. Bill Slawski talked about this a while back on a Yahoo patent he saw.

June 30, 2008 - 1:23pm

Hi Will
For larger sites, Google will have no part in considering # of AdSense blocks or paid links as a signal of spam. Just look at Business.com or About.com as examples of where telling the difference between ads and editorial is nearly impossible.
