Fallacies of Relevance

Orion posted an interesting thread at SEW, citing this fallacies of relevance page. The SEW thread also has some good posts by other members & looks to be shaping up nicely.

Orion said he does not think current search systems can yet detect fallacies of relevance.

If you are trying to win an argument and plan to use logical fallacies, make sure you use these 38 surefire techniques. <-- amazing resource!

Google Search History Tagging Folksonomies...Please Tag My Site :)

Oh so quietly, Google added a tagging feature to its My Search History product.

I believe Google will eventually find ways to trust Google accounts more as they age, the same way it trusts older domains more. The tags can surely be abused, but so can links. Just like link anchor text, tagging could help Google understand the aboutness of a page or site.

It would take a good bit of knowledge to create a variety of random Google accounts with regular and unique search habits over time. Google does not need to stop all search spammers; it only needs to make search spamming so complex or expensive that most people would rather put the effort into creating something of high quality.
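To make the account-age idea concrete, here is a minimal Python sketch. It assumes a hypothetical stream of tag events recorded as (account age in days, URL, tag) and a made-up weighting curve; nothing here reflects how Google actually scores tags. It aggregates tags per URL the way anchor text might be aggregated, but lets older accounts count for more.

```python
from collections import defaultdict

def tag_profile(events, half_trust_days=365):
    """Aggregate tags per URL, weighting each tag by the tagging account's age.

    events: iterable of (account_age_days, url, tag) tuples (hypothetical format).
    weight = age / (age + half_trust_days): a brand-new account counts for
    almost nothing, and an account exactly half_trust_days old counts for 0.5.
    """
    profile = defaultdict(lambda: defaultdict(float))
    for age_days, url, tag in events:
        weight = age_days / (age_days + half_trust_days)
        profile[url][tag.lower()] += weight
    # Convert nested defaultdicts to plain dicts for readable output.
    return {url: dict(tags) for url, tags in profile.items()}

events = [
    (730, "example.com", "SEO"),    # two-year-old account
    (1, "example.com", "casino"),   # freshly created accounts
    (1, "example.com", "casino"),
]
print(tag_profile(events))
```

With these weights it takes roughly 244 one-day-old accounts to match a single two-year-old account. The curve and the 365-day constant are arbitrary choices for illustration.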

Yahoo! added a rich-get-richer factor to its algorithms when it added blogs to its news search. In an interview with Forbes.com, Joff Redfern, a director in Yahoo! Search, said blog rankings may be based in part on the number of My Yahoo! subscribers:

"If we've got more people subscribed to a blog, there is presumably more credibility to its reputation," says Redfern.

You gotta wonder how many fake accounts are getting set up as I type this.
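Redfern's comment suggests subscriber counts act as a reputation signal. The Python sketch below shows why that creates a rich-get-richer effect. It assumes a hypothetical scoring function that multiplies text relevance by a logarithmic subscriber boost; Yahoo! has not published its formula, and the `alpha` weight is made up.

```python
import math

def blog_score(relevance, subscribers, alpha=0.1):
    """Hypothetical blend: relevance boosted by log(1 + subscriber count).

    The log dampens the boost, so going from 10 to 100 subscribers helps
    about as much as going from 10,000 to 100,000.
    """
    return relevance * (1 + alpha * math.log1p(subscribers))

def rank_blogs(blogs, alpha=0.1):
    """Sort (name, relevance, subscribers) tuples by blended score, best first."""
    return sorted(blogs, key=lambda b: blog_score(b[1], b[2], alpha), reverse=True)

blogs = [("new-blog", 0.9, 10), ("popular-blog", 0.8, 50_000)]
print([name for name, _, _ in rank_blogs(blogs)])           # ['popular-blog', 'new-blog']
print([name for name, _, _ in rank_blogs(blogs, alpha=0)])  # ['new-blog', 'popular-blog']
```

The slightly less relevant but heavily subscribed blog wins once subscribers count at all, which is exactly what makes fake subscriber accounts tempting.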

Do any SEO websites sell search behavior or established user accounts yet? If not, I wonder how long until they hit the market, and how long until many sites claim to offer those services :)

Sergey Brin + Others on Video Talking Search

Berkeley has been recording lectures from some of the best minds in search. So far the speakers include Norvig, Battelle, & Brin. Gary posted a bit about Sergey here.

I am not sure what the problem was, but my connection kept breaking in the middle of the shows, which was annoying. They have a wide variety of podcasts available here.

Calvin Mooers' Laws

Mooers' Law says people will avoid certain types of information they need:

An information retrieval system will tend not to be used whenever it is more painful and troublesome for a customer to have information than for him not to have it.

How do you get people to find information they do not want to find?

Mooers' Second Law of Documentation:

In the same manner that color samples provide a test for the detection of color blindness in a person, the descriptor technique provides a means for the detection of the "word-bound" or "idea-blind" person. Such detection is important because a word-bound person may not be able to provide idea-based (word-independent) retrieval service of the kind which is most congenial and most desired by the non-word-bound part of the population. - source

The concept highlights the need to write for your audience the way they speak and think.

Calvin Northrup Mooers' background: learn more about the man who coined the term "information retrieval"

Advertising & Search Related Patents & Patent Applications

If you like patents, try here, here, and here.

Mobile Search Wars

Yahoo! launches its SMS service

The new Google Toolbar added a Send to Phone feature

Not too long ago, Google became the default home page for T-Mobile

Business 2.0 recently posted an article about the looming mobile search wars:

According to the Pierz Group, Americans spent nearly $2 billion on directory assistance from their mobile phones last year -- at an average of $1.25 a call -- which suggests a healthy demand for information on the go. And that's just a fraction of the overall mobile search market. Providing instantaneous answers to a wide range of queries is what will make mobile search invaluable. And whoever figures that out is golden.
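The Pierz Group figures imply a call volume. A quick Python check, treating "nearly $2 billion" as exactly $2 billion so the result is an upper bound:

```python
annual_spend_usd = 2_000_000_000   # "nearly $2 billion", rounded up
average_cost_per_call_usd = 1.25

# Total spend divided by average price per call gives the number of calls.
implied_calls = annual_spend_usd / average_cost_per_call_usd
print(f"{implied_calls:,.0f} calls")  # 1,600,000,000 calls
```

So U.S. mobile users placed just under 1.6 billion paid directory assistance calls in a year, and each of those is a query a mobile search service could try to answer.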

Google Search Result Quality Evaluators

Google's search quality evaluation process may have been around for years.

SearchBistro recently posted a 22-page PDF titled General Guidelines on Random-Query Evaluation, last revised on December 31, 2003. In addition to the Random-Query Evaluation PDF, Henk van Ess has recently posted:

  • examples of offensive (or low quality) sites

  • some whitelisted sites:
    Here is a non-exhaustive "white list" of the sites whose pages are not to be rated as Offensive (nor as Erroneous):
    Kelkoo, Shopping.com, dealtime.com, bizrate.com, bizrate.lycos.com, dooyoo.com;

  • tips for rating sites:
    If it's a machine-generated, no added value affiliates, it's Spam. If it provides some unique values, for example, customer feedback, local information, it should be rated on the merit scale even if it has some affiliates. Similarly, if the game site allows you to download a game, without being intrusive (i.e. install a spyware without notice), it should be rated on the merit scale, instead of Spam.

  • how reviewers communicate to resolve cases where their quality scores are far apart
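The rating tip quoted above reads almost like a decision rule. Here is a minimal Python sketch of it, assuming a hypothetical `Page` record with hand-set boolean flags; real raters judge these properties by reading the page, and the fallback for pages the tip does not cover is my own assumption.

```python
from dataclasses import dataclass

@dataclass
class Page:
    machine_generated: bool
    affiliate_links: bool
    unique_value: bool                       # e.g. customer feedback, local information
    installs_spyware_silently: bool = False  # "intrusive" in the guideline's wording

def scale_for(page: Page) -> str:
    """Decide whether a page is Spam or belongs on the merit scale."""
    if page.installs_spyware_silently:
        return "Spam"          # the guideline implies intrusive downloads are Spam
    if page.unique_value:
        return "merit scale"   # some affiliate links are fine if value is added
    if page.machine_generated and page.affiliate_links:
        return "Spam"          # machine-generated, no-added-value affiliate
    return "merit scale"       # assumption: everything else is rated normally

print(scale_for(Page(machine_generated=True, affiliate_links=True, unique_value=False)))  # Spam
print(scale_for(Page(machine_generated=False, affiliate_links=True, unique_value=True)))  # merit scale
```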

Search Classification Types:
The Google review guide classifies searches as:

  • navigational (example: a search for United Airlines)
  • informational (example: how do I...)
  • transactional (example: buy 18K White Gold Omega Watch)
  • any mixture of the above categories
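A toy Python sketch of these categories follows. It assumes hand-picked cue words and a hypothetical set of known brand names; the guide describes the categories for human raters and says nothing about how Google classifies queries automatically. A query can land in more than one category, matching the "mixture" case.

```python
INFORMATIONAL_CUES = ("how do i", "how to", "what is", "why")
TRANSACTIONAL_CUES = {"buy", "price", "order", "cheap", "download"}

def classify_query(query, known_brands=frozenset()):
    """Return the set of intent labels that match the query (may be several)."""
    q = query.lower().strip()
    labels = set()
    if q in known_brands:                     # user wants one specific site
        labels.add("navigational")
    if q.startswith(INFORMATIONAL_CUES):      # str.startswith accepts a tuple
        labels.add("informational")
    if TRANSACTIONAL_CUES & set(q.split()):   # user wants to act: buy, order, download
        labels.add("transactional")
    return labels

print(classify_query("United Airlines", {"united airlines"}))
print(classify_query("buy 18K White Gold Omega Watch"))
print(classify_query("how to buy an omega watch"))
```

An empty set means none of the toy cues fired, which would be most real queries; that gap is why the guide leaves the judgment to people.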

Resource Quality Rating:
Google then asks raters to classify sites listed in random queries using the following categories:

  • Vital
    • Most queries, especially generic queries, do not have a Vital result.
    • Vital result example: a search for Ask Jeeves returns www.ask.com.
  • Useful
    • These results should have some of the following characteristics, although a single result will likely not exhibit all of them: comprehensive, high quality, answers the query precisely, timely, and authoritative.
    • This is the highest rating attainable for most pages on most search queries.
    • Useful result example: a search for USA Patriot Act returns the ACLU page covering the USA Patriot Act.
    • For some plural queries, such as Newspapers in Scotland, the best results may be lists of related sites. Reviewers must also check some links on such a page to ensure it is functional.
  • Relevant
    • One step down from Useful. Relevant results may satisfy only one important facet of a query, whereas Useful results are expected to be broader and more thorough.
    • Results that would have been Vital if a more common interpretation of the query did not overshadow them are considered Relevant.
  • Not Relevant
    • Not Relevant results are related to the topic but do not help users.
    • If a person searching for real estate finds a San Diego real estate website, that result would probably be Not Relevant, since most people searching for that term do not live in, or want to move to, San Diego.
    • Just as the San Diego example is too narrow geographically, other sites can be too narrow in ways that have nothing to do with location, such as being outdated or too focused on one subset of the query.
  • Off Topic
    • Not a useful page; irrelevant to the query.
    • Usually occurs when text matching algorithms do not account for terms that can have multiple meanings.
  • Offensive
    • Pages or sites that often do not hold merit for any query.
    • Example Offensive sites: spyware, unrequested porn, AdSense scrapers, and other keyword net type sites.
  • Erroneous
  • Didn't Load
  • Foreign Language
  • Unrated

Vital through Offensive are listed in order of quality, from best to worst. Erroneous through Unrated are counted as non-votes. When in doubt between two ratings, raters are expected to choose the lower one.
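The ordering and the two rules in that paragraph fit in a few lines of Python. This is a sketch, not Google's tooling: the scale and the non-vote list come from the guidelines summarized above, while the lower-median summary and the `max_spread` disagreement threshold (a stand-in for the "scores far apart" discussion) are my own assumptions.

```python
# Ordered worst -> best, so list position doubles as a numeric score.
SCALE = ["Offensive", "Off Topic", "Not Relevant", "Relevant", "Useful", "Vital"]
NON_VOTES = {"Erroneous", "Didn't Load", "Foreign Language", "Unrated"}

def pick_rating(option_a, option_b):
    """When torn between two ratings, take the lower one."""
    return min(option_a, option_b, key=SCALE.index)

def summarize(ratings, max_spread=2):
    """Return (lower median of real votes, whether raters disagree too much)."""
    votes = sorted((r for r in ratings if r not in NON_VOTES), key=SCALE.index)
    if not votes:
        return None, False   # nothing but non-votes
    lower_median = votes[(len(votes) - 1) // 2]
    spread = SCALE.index(votes[-1]) - SCALE.index(votes[0])
    return lower_median, spread > max_spread

print(pick_rating("Useful", "Relevant"))                          # Relevant
print(summarize(["Useful", "Relevant", "Didn't Load", "Vital"]))  # ('Useful', False)
print(summarize(["Vital", "Off Topic"]))                          # ('Off Topic', True)
```

Taking the lower middle vote on ties mirrors the "rate lower when in doubt" rule, and a Vital vs. Off Topic split gets flagged for the reviewers to discuss.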

Why this is Important:
Learning what Google wants its evaluators to look for, and how, makes it easier to understand how to deliver what the search engines want.

This post was a quick review of General Guidelines on Random-Query Evaluation. If you are seriously interested in SEO, it is well worth your time to read the original document, which includes many more examples and goes into far greater detail than this post.

Random Thoughts:
Given how low the wages are for these positions ($10-$20 an hour), you have to wonder:

  • why it took so long for this information to come out
  • if some of these people are using the information they gained from participating in other ways
  • if these people know anything about Google's business model, and how much THEY could be making on a per-click basis if they created well-cited content that fit Google's guidelines
  • and, on a far-off tangent, what would happen if Google's business model made self-employment so profitable that Google could not afford to pay workers

More on TrustRank

A while ago I wrote a bit about TrustRank after reading the PDF about it.

It is fairly easy to understand many of its concepts (like attenuating a positive trust score, or offsetting the effects of link spam with a negative trust score), but they are even easier to understand if you visualize trust attenuation.

Most sites are not exceptionally compelling, so most industries have few legitimate hubs, while many sites are glorified link farms that will not pass any positive trust.
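To visualize trust attenuation, here is a minimal Python sketch of TrustRank-style propagation as I understand it from the paper: trust starts at a hand-picked seed set, each page splits its trust evenly among its outbound links, and only a fraction `alpha` survives each hop. In other words it is PageRank biased toward the seed set. The toy graph is hypothetical, and `alpha = 0.85` with 20 iterations are common defaults, not anything a search engine has confirmed using.

```python
def trustrank(links, seeds, alpha=0.85, iterations=20):
    """TrustRank-style propagation: biased PageRank that restarts at trusted seeds.

    links: dict mapping each page to the list of pages it links to
    seeds: set of hand-reviewed trusted pages
    """
    pages = set(links) | {p for outs in links.values() for p in outs}
    # Static distribution: all restart mass sits on the seed pages.
    d = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    t = dict(d)
    for _ in range(iterations):
        # (1 - alpha) of the seed trust is re-injected every round.
        nxt = {p: (1 - alpha) * d[p] for p in pages}
        for page, outs in links.items():
            if outs:
                # Trust splitting (equal share per outlink) and
                # trust dampening (only alpha of it is passed on).
                share = alpha * t[page] / len(outs)
                for target in outs:
                    nxt[target] += share
        t = nxt
    return t

web = {
    "seed": ["hub"],             # hand-reviewed trusted site
    "hub": ["good1", "good2"],   # linked from the seed
    "farm": ["spam"],            # two sites trading links
    "spam": ["farm"],
}
for page, score in sorted(trustrank(web, {"seed"}).items(), key=lambda kv: -kv[1]):
    print(f"{page:6} {score:.4f}")
```

Trust shrinks with every hop away from the seed, and the two-site link farm never receives any, no matter how many links its members trade, because no trusted page links into it.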

For a while I helped promote many directories, but many of the new ones on the market have little to no legitimate value, and some of the links from them may even have negative value.

I just wrote an article called TrustRank & the Company You Keep, in which I made this graphic explaining the concept of AntiTrust (yet another SEO phrase I made up hehehe).

The red X's represent things that should be there but are not.
[Image: a bad directory, showing its inbound & outbound link profile.]
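One way to model AntiTrust in code is to run the same kind of propagation backwards: start from pages already known to be spam and push distrust to the pages that link to them. The Python sketch below assumes a hypothetical toy graph and that simple reverse-propagation rule; it illustrates the idea in the graphic rather than any engine's actual penalty system.

```python
def antitrust(links, bad_seeds, alpha=0.85, iterations=20):
    """Push distrust backwards along links, starting from known-bad pages.

    links: dict mapping each page to the list of pages it links to
    bad_seeds: set of pages already known to be spam
    """
    # Reverse every edge: distrust flows from a bad page to the pages linking to it.
    linked_from = {}
    for src, outs in links.items():
        linked_from.setdefault(src, [])
        for dst in outs:
            linked_from.setdefault(dst, []).append(src)
    pages = set(linked_from)
    d = {p: (1.0 / len(bad_seeds) if p in bad_seeds else 0.0) for p in pages}
    s = dict(d)
    for _ in range(iterations):
        nxt = {p: (1 - alpha) * d[p] for p in pages}
        for page, linkers in linked_from.items():
            if linkers:
                # Split this page's distrust among everyone who links to it.
                share = alpha * s[page] / len(linkers)
                for linker in linkers:
                    nxt[linker] += share
        s = nxt
    return s

web = {
    "bad-directory": ["casino-spam", "pill-spam", "ok-site"],
    "good-site": ["ok-site"],
}
distrust = antitrust(web, {"casino-spam", "pill-spam"})
for page in sorted(distrust):
    print(f"{page:13} {distrust[page]:.4f}")
```

The bad directory picks up distrust from its outbound links to spam. The site it links to, and the good site that also links there, pick up none, because distrust only flows from a page to the pages linking to it.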

Yes, I know, the drop shadow is too dark; my web designer friend already yelled at me for that. Other than that, I hope the image clearly demonstrates the concept I was trying to get across.

Other than drop shadow remarks, please leave comments on the article and image below.

Google Portal, Stemming, DMOZ Submission Review

Google now offers a portal version of Google.com. Danny Sullivan has an in-depth review. It launched with a number of features, and Google intends to add many more, such as RSS feed support.

Rand points out a post by Xan on stemming and a free online stemming tool.
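Stemming reduces word variants to a common root so that a search for one form can match the others. I have not seen the internals of the tool Rand mentions, so here is a deliberately crude suffix-stripping sketch in Python. The suffix list is made up, and real stemmers such as the Porter stemmer apply far more careful rules.

```python
# Made-up suffix list for illustration; real stemmers use many more rules.
SUFFIXES = ("ational", "ization", "fulness", "edly", "ing", "ed", "ly", "es", "s")

def crude_stem(word, min_stem=3):
    """Strip the longest suffix that still leaves at least min_stem letters."""
    w = word.lower()
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if w.endswith(suffix) and len(w) - len(suffix) >= min_stem:
            return w[: -len(suffix)]
    return w

print([crude_stem(w) for w in ["link", "links", "linked", "linking", "searches"]])
# ['link', 'link', 'link', 'link', 'search']
```

It conflates link, links, linked, and linking, but a rule this blunt also over-stems words such as relational, which is the trade-off real stemmers work hard to manage.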

DMOZ kills the submission status review. Now it's even easier to be corrupt ;)

New York Times:
Is beginning to charge for some of its content, though most of its content remains free. The Times is also replacing the CEO of About.com.

When Not to Submit to Directories:
When a person creates half a dozen general directories and promotes them all together. That is not building value; that is trying to cash out and milk the web.

Many directory owners have become exceedingly greedy recently. Meanwhile, search algorithms continue to advance, and few directory owners are actually trying to build legitimate value.

The Search:
You can preorder John Battelle's new book. He said that if you use this link, he may be able to autograph it for you, assuming he can work out the shipping details.

The Size of Google's Index:
The reported size might have been a bit frothy.

Google Factory Tour:
Video presentations should be up soon; Philipp Lenssen has highlights.

Mirago AdSense:
Apparently Mirago has a product similar to AdSense, which might be useful for companies like HotNacho.

Another Article by Orion

If you are a search geek, you may like Fractals, L-Systems and Semantics.

Xan questions the paper a bit at the SEW forums.

You guys as you say find inspiration in Orion's theories, even if they have not been proved, and it gives you the motivation to improve your content. This is sufficient enough to see the use of them.

The problem with the ideas as a whole is that they do not take into account the big picture but focus on a very specific area, which is the content on the page, when what you should be looking at is the content you share with your peers, and how this all links together. Starting to look at the various dimensions your content has in relation to the rest of the world around it may tell you more. Demos I've seen do include the use of clustering, but in the sense of topic classification. Each site, or even each part of a site, will belong to one or many different spheres of belonging, if you like. I've seen demos that spit out the "topic sphere" and enable the user to manipulate it visually or textually to get the results they want.

Never forget the big picture!
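Xan's point that a site can belong to one or many topic spheres can be shown with a small Python sketch. This is not the clustering Xan describes; it is simpler overlap scoring (Jaccard similarity) against hand-made topic term sets, all of which are hypothetical, used only to show multiple membership.

```python
def topic_spheres(page_terms, topics, threshold=0.1):
    """Return each topic whose term set overlaps the page enough (Jaccard similarity).

    A page can clear the threshold for several topics at once.
    """
    page = {t.lower() for t in page_terms}
    memberships = {}
    for name, terms in topics.items():
        union = page | terms
        if not union:
            continue
        overlap = len(page & terms) / len(union)
        if overlap >= threshold:
            memberships[name] = round(overlap, 3)
    return memberships

topics = {
    "search": {"search", "index", "query", "ranking"},
    "astronomy": {"orion", "constellation", "star", "cosmos"},
    "cooking": {"recipe", "oven", "flour"},
}
print(topic_spheres(["Orion", "search", "query", "constellation"], topics))
# {'search': 0.333, 'astronomy': 0.333}
```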

I think Xan's point is valid: by following rules or focusing on specific things, we sometimes miss the big picture or create artificial, machine-identifiable patterns. That said, I find a lot of what Orion posts interesting.

Off topic, but Orion the Hunter is my favorite constellation. I have been exploring the universe a bit recently, watching some Cosmos :)