SEO Tips & Search Engine Tips

SEO Old Timer Tips:
An Old Timers Perspective...from SEGuru

Search Engine Old Timer Tips:
Recently a friend of mine bought me a copy of A Theory of Indexing by Gerard Salton. It is a 50-page book from 1975 with lots of charts and math, but in those few pages it covers many of the ideas that current search technologies are built upon.

I will probably have to read it again, because it is dense with information and some of the math was a wee bit above me the first time around. Still, for anyone interested in learning about search technology it is a great book...much like Mike Grehan's.

A Theory of Indexing talks about a ton of interesting things like:

  • signal to noise
  • inverse document frequency
  • discrimination value
  • and lots of other stuff

Here is a small bit I learned from the last few pages...

If words exist in a high % of the total documents in a document collection then they are not usually going to be good at discriminating which documents are relevant for a particular query (since they appear in too many documents).

If words exist in a low % of the total documents then they are also not usually going to be good at discriminating which documents are relevant for a particular query (since they appear in so few documents).

Words with a mid range document frequency are better discriminators.
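To make the idea of document frequency concrete, here is a rough Python sketch. It is not Salton's discrimination value calculation, only a simple count of what share of documents each word appears in. The toy documents, the function names, and the 25% / 75% cutoffs are all assumptions made up for illustration.

```python
from collections import Counter

def document_frequency(docs):
    """Return the fraction of documents that each word appears in."""
    counts = Counter()
    for doc in docs:
        # Use a set so a word is counted once per document.
        counts.update(set(doc.lower().split()))
    return {word: n / len(docs) for word, n in counts.items()}

def mid_range_terms(docs, low=0.25, high=0.75):
    """Words whose document frequency falls between two cutoffs.

    The cutoffs are arbitrary illustrative values, not from the book.
    """
    df = document_frequency(docs)
    return sorted(w for w, f in df.items() if low < f < high)

# Toy collection, invented for this example.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
    "the bird sang",
    "the zebra ran",
]

print(mid_range_terms(docs))
```

In this toy collection "the" shows up in every document and "zebra" in only one, so both fall outside the cutoffs, while words like "cat" and "dog" land in the middle.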

To make better use of words that appear in a high % of the total documents you can combine the words into word pairs or triples. The resulting phrases will have a lower document frequency and may be better at discriminating document relevancy.
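A small Python sketch of the word pair idea, assuming a made-up set of four documents and a hypothetical helper named phrase_doc_freq. It only shows that a phrase appears in fewer documents than the common words it is built from. It is not how any particular search engine builds phrases.

```python
from collections import Counter

def phrase_doc_freq(docs, n=1):
    """Fraction of documents containing each run of n adjacent words."""
    counts = Counter()
    for doc in docs:
        words = doc.lower().split()
        # Collect each distinct n-word phrase once per document.
        phrases = {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}
        counts.update(phrases)
    return {phrase: c / len(docs) for phrase, c in counts.items()}

# Toy collection, invented for this example.
docs = [
    "search engine ranking tips",
    "search engine marketing news",
    "engine repair and search tips",
    "job search advice",
]

single = phrase_doc_freq(docs, n=1)
pairs = phrase_doc_freq(docs, n=2)

print(single["search"], single["engine"], pairs["search engine"])
```

Here "search" is in all four documents and "engine" is in three of them, but the pair "search engine" is in only two, which moves it toward the mid range.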

To make better use of words that appear in a low % of the total documents you can cluster the words into groups via the use of a thesaurus. This has the net effect of creating higher frequency word classes / clusters, which may be better at discriminating document relevancy.
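The thesaurus idea can be sketched the same way. The tiny THESAURUS mapping, the sample documents, and the function name below are all hypothetical. A real thesaurus would be far larger and would have to deal with words that have more than one meaning.

```python
from collections import Counter

# Hypothetical thesaurus: maps each rare word to a shared class label.
THESAURUS = {
    "automobile": "car",
    "auto": "car",
    "sedan": "car",
}

def class_doc_freq(docs, thesaurus):
    """Fraction of documents containing each word class.

    Words missing from the thesaurus are treated as their own class.
    """
    counts = Counter()
    for doc in docs:
        classes = {thesaurus.get(word, word) for word in doc.lower().split()}
        counts.update(classes)
    return {cls: n / len(docs) for cls, n in counts.items()}

# Toy collection, invented for this example.
docs = [
    "cheap automobile insurance",
    "used sedan prices",
    "auto loan rates",
    "garden tools sale",
]

raw = class_doc_freq(docs, {})            # no grouping
grouped = class_doc_freq(docs, THESAURUS) # rare words merged into "car"

print(raw["automobile"], grouped["car"])
```

On their own "automobile", "sedan" and "auto" each appear in one document out of four, but once they are grouped the "car" class appears in three out of four.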

Published: December 8, 2004 by Aaron Wall in technology

