Yahoo! Search Engine Optimization Tips

Recently Mike Grehan interviewed Jon Glick, Yahoo!'s Senior Manager for Web Search. You can read all the good Yahoo! Search stuff (note to self: stuff is a generic word to use in anchor text) in the interview, or look at my synopsis below.
How Was Yahoo! Search Made?

The goal in creating Yahoo! search technology was not, you know, let's take a piece here a piece there...What AllTheWeb was very, very good at was rebuilding indexes and keeping very high levels of freshness...Alta Vista had a really good core technology called MLR (machine-learned ranking).

The best-of-breed parts of those engines were combined with some of the best parts of Inktomi to make the new Yahoo! Search.

Meta Keywords
How is Yahoo! Search using the meta keywords tag?

Each keyword is an individual token separated by commas...For best practice you just need to remember it's for matching - not ranking...‘laptop computers’ will count for ‘laptop computers’ and not ‘laptop’ or ‘computers’ separately.

Essentially this is a good place for synonyms & misspellings (I tend to have some of them in my copy anyway)...Each keyword phrase is unique and separated by commas, so you will not set off a keyword-stuffing flag for laptops if you use a meta tag such as
<meta name="keywords" content="Yahoo! laptops, computer laptops, compaqt laptops, compak laptops">

but you will also want to get words such as "lap tops" and "compaqt lap tops" in your meta tags. Each version helps to get your file included in that specific subset of search results, but has no influence on rankings.
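
As a minimal illustration of that phrase-token behavior, here is a toy Python sketch (my own model of what Glick describes, not Yahoo!'s actual code):

# Each comma-separated entry in the meta keywords tag is treated as one
# whole token, used for matching a query to a page - never for ranking.
def keyword_tokens(meta_content):
    return {phrase.strip().lower() for phrase in meta_content.split(",")}

tokens = keyword_tokens("Yahoo! laptops, computer laptops, compaqt laptops, compak laptops")

print("compaqt laptops" in tokens)  # True: the whole phrase is a token
print("laptop" in tokens)           # False: tokens are never broken apart

Repeating "laptops" inside several different phrases does not flag the page, because each comma-separated phrase is its own token.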

Meta Description
How does Yahoo! Search use the meta description tag?

Yes we do use meta keywords. So let me touch on meta tags real fast. We index the meta description tag. It counts similarly to body text.

Page Title Tag
How should I write my page title tag?

The title tag? My biggest recommendation is to write that for users! Because that's the one real piece of editorial in the search listing that you can absolutely control...We typically show roughly 60 characters. That's the maximum we'd show. I'm not a professional copywriter, so I can't tell you "is short and punchy better than lots of information..."

Affiliate Marketing
Why does Yahoo! hate affiliate marketers?

Well let me just say first that, in that sense, spam has gotten a lot better over the years. You don't really have people trying to appear for off-topic terms as much as they used to. You now have people who are trying to be very relevant. They're trying to offer a service, but the issue with affiliate spam is that they're trying to offer the same service as three hundred other people.

They also touch on Site Match, search personalization, spam, linkage data, and other hot search topics, and link to this visual thesaurus tool at the end of their eMarketing News newsletter.

Google's Competitive Advantages

Recently I was over at Topix.net, glanced at their blog, and found a great post about Google by their founder Rich Skrenta which highlights Google's competitive advantages.

...the story is about seemingly incremental features that are actually massively expensive for others to match, and the platform that Google is building which makes it cheaper and easier for them to develop and run web-scale applications than anyone else...While competitors are targeting the individual applications Google has deployed, Google is building a massive, general purpose computing platform for web-scale programming.

I also just finished reading Emergence by Steven Berlin Johnson where he reviews some emergent software and social networks such as Slashdot.

While at Slashdot I noticed they too were eating up Rich's post, as is the blog community at large (last I checked MIT's Blogdex, it was the most rapidly spreading idea on the web).

After reading Emergence it became more apparent how smart the Topix news idea is. After reading his post it became even more apparent how smart Rich is. I am just wondering how they will be able to take news market share from the big search monsters. Obviously doing news way better works, but even Rich's own post states how amazingly cheap CPU cycles are at Google. How will Topix overcome Google's competitive advantages?

Future of Search & Future of Search Engine Marketing

Some good info about SEM, websites, and the future of search.

Nick Scevak of Jupiter Media recently showed a yummy pie graph which indicated that:

  • 16% of businesses surveyed outsourced search engine marketing
  • 15% do not do search engine marketing
  • 69% do search engine marketing in house

This shows some amazing room for growth within the industry.
Nick also stated that his biggest fear with paid search is that we may have unrealistic expectations based on the amazing performance and returns of early adopters.

Cheryle Pingle of Range Online Media stated that a large portion of the current growth in search is due to the growth of the economy.

Michael Sack of Inceptor stated that within the term space, the bulk of commerce comes from a few hundred thousand terms. He believes that this year large companies will begin to buy out markets to place them out of the reach of smaller businesses.

Geoff Ramsey of emarketer.com also had many yummy pie graphs. The graphs he showed at the SEMPO meeting indicated that:

  • from 2000 - 2003 the search marketing industry increased about tenfold
  • from 2002 - 2003 search engine marketing had a 145% year-over-year growth rate
  • 22% of US households have broadband
  • Yellow Pages currently make $14.3 billion annually, whereas paid search is currently only a $2.2 billion industry

Also at SEMPO, Google announced that it now supports search engine marketing and is sponsoring SEMPO, since the rising complexity and competition in the industry are preventing many business owners from being able to use the marketing systems effectively.

Fredrick Marckini of iProspect cited a StatMarket statistic that the average retail website conversion rate is 1.8 - 2.0%.

Greg Boser of WebGuerrilla also provided a few good link tips on the day. When buying links, 501(c) organizations are a good place to look. He also stated that he has seen unlinked URLs in TXT files count as backlinks. Some other good link ideas offered by others include trade organizations, tools, and specialty directories.

New Google Algorithm Update

It appears Google has gone far beyond stemming with their current algorithm update. They seem to be looking for the semantic intent of the query as well as the page, and then returning results based upon it. The resulting pages frequently may not even contain the query terms.

(original discussion in HighRankings Forums) Many local sites from Florida to Austin to NYC have taken a beating from the recent Google algorithm updates (Florida and Austin). Is it any wonder they named these recent Google algorithm updates after locations?

Google still has some things to work out with the new algorithm though, as many search results are still a wee bit funky. If the relevancy only drops on commercial searches, that is not so bad in the eyes of Google, since other products such as Froogle will help in this area. Perhaps semantics are the way to separate the white pages from the yellow pages!

Latent Semantic Indexing

(GEEK STUFF) One of the largest problems many search engines run into is that after they get to a few hundred million documents their algorithms and hardware hit a wall.

Companies that can afford the investment to get past this point still run into the problem that each additional resource makes their job a bit harder.

One of the major ways around this problem is to take advantage of the natural patterns in human language. Latent Semantic Indexing allows a search engine to index documents based on the pairing of like words within them.

Many complex searches may lack exact matches in the results as well. Being able to find near matches will allow search engines to provide more comprehensive results.

It's hard to get computers to understand anything human, but the process of latent semantic indexing delivers conceptual results while being entirely mathematically driven.

There are two main ways to do this: singular value decomposition (SVD) and multidimensional scaling (MDS).

Some of the steps of the singular value decomposition process are to (a rough code sketch follows the list):

  • create a database of all words in relevant documents
  • remove common stop words
  • stem the remaining words
  • remove words appearing in all results
  • remove words only appearing in one result
  • create a database of relevant keywords
  • weight the pages based on the frequency of keyword distribution
  • increase the relevance of terms which appear in a small number of pages (as they are more likely to be on topic than words that appear in almost all documents)
  • normalize the page to remove page length as a factor
  • create relevancy vectors for the keywords
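
Here is a rough Python sketch of those steps on a toy corpus; the documents, stop list, and choice of two concept dimensions are invented for illustration, stemming is skipped for brevity, and numpy is assumed to be available.

import numpy as np

docs = [
    "cheap laptop computers for sale",
    "laptop and notebook computers reviewed",
    "fresh salsa recipe with tomatoes",
    "tomato salsa and guacamole recipes",
]
stop_words = {"for", "and", "with", "the", "a"}

# Build the vocabulary: drop stop words, then drop words that appear
# in every document or in only one document. (A real system would also
# stem, so that "recipe" and "recipes" merge into one term.)
tokenized = [[w for w in d.split() if w not in stop_words] for d in docs]
vocab = sorted({w for doc in tokenized for w in doc})
doc_freq = {w: sum(w in doc for doc in tokenized) for w in vocab}
vocab = [w for w in vocab if 1 < doc_freq[w] < len(docs)]

# Term-document matrix, weighted so terms appearing in few pages count
# more (inverse document frequency) ...
A = np.array([[doc.count(w) for doc in tokenized] for w in vocab], float)
idf = np.log(len(docs) / np.array([doc_freq[w] for w in vocab]))
A *= idf[:, None]

# ... and normalized per document so page length is not a factor.
A /= np.linalg.norm(A, axis=0)

# Singular value decomposition; keep only the top k "concept" dimensions
# to get a relevancy vector for each document.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
doc_vectors = (np.diag(s[:k]) @ Vt[:k]).T

# Documents about the same topic land near each other in concept space
# even when they share few exact words.
print(np.round(doc_vectors @ doc_vectors.T, 2))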

The singular value decomposition process is not scalable enough to work on large scale search engines though, as it requires too much processor time. Multidimensional scaling allows us to take snapshots of the topology of different documents. "Instead of deriving the best possible projection through matrix decomposition, the MDS algorithm starts with a random arrangement of data, and then incrementally moves it around, calculating a stress function after each perturbation to see if the projection has grown more or less accurate. The algorithm keeps nudging the data points until it can no longer find lower values for the stress function."

This does not provide exact results, but only a rough approximation. When combined with other factors this approximation improves scalability and quality of search.
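
The quote above describes an iterative nudge-and-check loop. A from-scratch Python toy of that loop (the distance matrix is invented, and this is not any engine's production algorithm; real systems use tuned variants such as SMACOF) might look like this:

import numpy as np

rng = np.random.default_rng(0)

# Pretend these are pairwise "topic distances" between five documents.
target = np.array([
    [0.0, 0.2, 0.9, 1.0, 0.8],
    [0.2, 0.0, 0.8, 0.9, 0.9],
    [0.9, 0.8, 0.0, 0.3, 0.4],
    [1.0, 0.9, 0.3, 0.0, 0.2],
    [0.8, 0.9, 0.4, 0.2, 0.0],
])

def stress(points):
    # Squared mismatch between the current layout and the target distances.
    d = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
    return ((d - target) ** 2).sum()

points = rng.random((5, 2))  # start from a random 2-D arrangement
best = stress(points)
for _ in range(20000):
    candidate = points.copy()
    candidate[rng.integers(5)] += rng.normal(scale=0.05, size=2)  # nudge one point
    s = stress(candidate)
    if s < best:  # keep the nudge only if the projection improved
        points, best = candidate, s

print("final stress:", round(best, 4))

The loop keeps perturbing points until it can no longer find lower stress values, which is exactly the rough-approximation behavior described above.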

Good Reading on latent semantic indexing

This technology is so amazing that it may eventually help lead to a cure for cancer. Already the technology is being refined for cognitive improvements and test grading!

How to Make Dynamic URLs Static

Many of these tips originate from members of the I-Search discussion list (which is an amazing resource well worth the money).

This guy has a database-driven ASP website and makes his dynamic content look static to the search engines using a custom 404 error page build.

Additional ideas include server-side filter software (http://www.smalig.com/url_rewrite-en.htm) and URL rewriting software (http://www.opcode.co.uk/components/rewrite.asp).

Here is the Apache Mod Rewrite page for you Apache people...
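
As a hypothetical example of what a rewrite looks like (the file and parameter names here are made up; the Apache documentation linked above is the authority), a couple of lines in an .htaccess file can map a static-looking URL onto the real dynamic script:

# Serve the static-looking /products/42.html from /product.php?id=42
RewriteEngine On
RewriteRule ^products/([0-9]+)\.html$ /product.php?id=$1 [L]

The spider only ever sees the clean /products/42.html address, which sidesteps the query string issues covered in the tips below.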

General tips to make a dynamic site get spidered
1.) Do not force-feed the spider a cookie.
2.) Use 3 or fewer variables.
3.) Keep each query string to 10 or fewer digits.
4.) Create a sitemap which links to many of the main database locations.
5.) Build up link popularity from a few quality inbound links. The PageRank (or link popularity in search engines other than Google) will make the spider more inclined to spider deep through your site.

ChriSEO's 'Glass Ceiling'

In any medium there will be free rides as early adopters take advantage of knowledge not shared by their competitors. While there is always a new technology which creates new markets, this quick read does a good job of explaining why off-the-page optimization is more effective than on-the-page optimization. Chris Ridings explains "The Glass Ceiling."

Update: the above link to chriseo.com/modules.php?op=modload&name=News&file=article&sid=62&mode=thread&order=0&thold=0 has been delinked, as the site is now owned by a domainer and is a page full of PPC ads.

Update 2: Linking to Archive.org version of the page here, along with a quote:

Consider each keyword phrase as being a little market economy, an interpretation we can intuitively justify by seeing that keywords can have monetary values attached to them in advertising systems. The optimizer who is working on on-the-page factors alone is looking for an economy with extremely low competition (less than 10 competitors). This economy must also provide a profitable return. The market must have practically no barriers to entry. In short, this optimizer would be looking for a newly emerging market or a niche (a forgotten keyword). What we begin to see is that our solely on-the-page optimizer is less of an optimizer and more of a researcher and opportunist. That is no negative statement, being such is a skill in and of itself. However, we can also see that should they prove successful then their very success is an indicator to the competition that this opportunity exists. i.e. people will wonder why they do so well and begin to analyse it more in depth. Thus, given time, competition will form and they will become an attractor to competition. Sooner or later they will have more than 10 competitors and the opportunity no longer exists.

Given increasing usage and competition on the internet we can say two things: that the quantity of such opportunities is likely to decrease and that the period of time for which such opportunities persist will decrease.

So it is, wisely, arguable that on-the-page optimization is a short term, unsustainable option without off-the-page optimization. It also follows that such optimization will often, but not always, be the least profitable in terms of results.

Local Community Link Structure

I try to write at least one or two articles every month. This one is about the reranking of results based on local interconnectivity: Local Community Link Structure.
