Recently Mike Grehan interviewed Jon Glick, who is Yahoo!'s Senior Manager for Web Search. You can read all the good Yahoo! Search stuff (note to self: stuff is a generic word to use in anchor text) in the interview, or look at my synopsis below.
Yahoo! Search
How was Yahoo! Search Made?
The goal in creating Yahoo! search technology was not, you know, let's take a piece here, a piece there...[What] AllTheWeb was very, very good at was rebuilding indexes and keeping very high levels of freshness...AltaVista had a really good core technology called MLR (machine learned ranking).
The best-of-breed parts of those engines were combined with some of the best parts of Inktomi to make the new Yahoo! Search.
How is Yahoo! Search using the meta keywords tag?
Each keyword is an individual token separated by commas...For best practice you just need to remember it's for matching - not ranking...‘laptop computers’ will count for ‘laptop computers’ and not ‘laptop’ or ‘computers’ separately.
Essentially this is a good place for synonyms & misspellings (I tend to have some of them in my copy anyway)...Each keyword phrase is unique and separated by commas, so you will not set off a flag for "laptops" if you use a meta tag such as
<meta name="keywords" content="Yahoo! laptops, computer laptops, compaqt laptops, compak laptops">
but you will also want to get words such as "lap tops" and "compaqt lap tops" in your meta tags. Each version helps to get your file included in that specific subset of search results, but has no influence on rankings.
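To make the matching-not-ranking point concrete, here is a minimal Python sketch of how a comma-separated keywords value could be treated as whole phrases. The function names are hypothetical, and the lowercasing and whitespace trimming are my assumptions, not documented Yahoo! behavior.

```python
def keyword_tokens(content):
    """Split a meta keywords value into its comma-separated phrases.

    Lowercasing and trimming whitespace are assumptions for illustration.
    """
    return {phrase.strip().lower() for phrase in content.split(",") if phrase.strip()}


def matches_meta_keywords(query, content):
    """True only if the whole query equals one of the comma-separated phrases.

    This decides whether a page is eligible for a query (matching);
    it contributes nothing to ranking.
    """
    return query.strip().lower() in keyword_tokens(content)
```

With this logic, a page tagged "laptop computers, lap tops" would be eligible for those two exact queries but not for "laptop" on its own, and nothing here affects where it ranks.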
How does Yahoo! Search use the meta description tag?
Yes, we do use meta keywords. So let me touch on meta tags real fast. We index the meta description tag. It counts similar to body text.
Page Title Tag
How should I write my page title tag?
The title tag? My biggest recommendation is write that for users! Because that's the one real piece of editorial in the search listing that you can absolutely control...We typically show roughly 60 characters. That's the maximum we'd show. I'm not a professional copywriter, so I can't tell you "is short and punchy better than lots of information..."
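Since Glick says roughly 60 characters is the most Yahoo! shows, a quick preview helper can flag titles that will get cut. This Python sketch is hypothetical: breaking at the last space and adding "..." is an assumption about how a truncated title might look, not Yahoo!'s actual display logic.

```python
def preview_title(title, limit=60):
    """Approximate how a title might look when cut to about `limit` characters.

    Truncating at a word boundary and appending "..." is an assumption for
    illustration, not documented Yahoo! behavior.
    """
    if len(title) <= limit:
        return title
    cut = title[:limit]
    # Back up to the last space so a word is not chopped in half.
    if " " in cut:
        cut = cut[: cut.rfind(" ")]
    return cut.rstrip() + "..."
```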
Why does Yahoo! hate affiliate marketers?
Well, let me just say first that, in that sense, spam has gotten a lot better over the years. You don't really have people trying to appear for off-topic terms as much as they tended to. You now have people who are trying to be very relevant. They're trying to offer a service, but the issue with affiliate spam is that they're trying to offer the same service as three hundred other people.
...the story is about seemingly incremental features that are actually massively expensive for others to match, and the platform that Google is building which makes it cheaper and easier for them to develop and run web-scale applications than anyone else...While competitors are targeting the individual applications Google has deployed, Google is building a massive, general purpose computing platform for web-scale programming.
After reading Emergence, it became more apparent how smart the Topix news idea is. After reading his post, it became even more apparent how smart Rich is. I am just wondering how they will be able to take news market share from the big search monsters. Obviously doing news way better works, but even Rich's own post states how amazingly cheap CPU cycles are at Google. How will Topix overcome Google's competitive advantages?
Some good info about SEM, websites, and the future of search.
Nick Scevak of Jupiter Media recently showed a yummy pie graph showing that:
16% of businesses surveyed outsourced search engine marketing
15% do not do search engine marketing
69% do search engine marketing in house
This shows amazing room for growth within the industry.
Nick also stated that his biggest fear with paid search is that we may have unrealistic expectations based on the amazing performance and returns seen by early adopters.
Cheryle Pingle of Range Online Media stated that a large portion of the current growth in search is due to the growth of the economy.
Michael Sack of Inceptor stated that, of the whole term space, the bulk of commerce comes from a few hundred thousand terms. He believes that this year large companies will begin to buy out markets, placing them out of the reach of smaller businesses.
Geoff Ramsey of emarketer.com also had many yummy pie graphs. The graphs he showed at the SEMPO meeting showed that:
from 2000 to 2003 the search marketing industry grew about tenfold
from 2002 to 2003 search engine marketing had a 145% year-over-year growth rate
22% of US households have broadband
Yellow Pages currently make $14.3 billion annually, whereas paid search is currently only a $2.2 billion industry.
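For a quick sanity check on those numbers, the arithmetic is easy to run. This Python snippet only reuses the figures quoted above; the variable names are mine.

```python
# Figures quoted from the SEMPO presentation above.
tenfold_growth = 10.0   # growth multiple, 2000 -> 2003
years = 3
yoy_growth_2003 = 1.45  # 145% year-over-year growth, 2002 -> 2003
yellow_pages = 14.3     # $ billions per year
paid_search = 2.2       # $ billions per year

# Compound annual growth rate implied by a tenfold increase over three years.
cagr = tenfold_growth ** (1 / years) - 1
# A 145% increase means the 2003 figure is 2.45 times the 2002 figure.
multiplier_2003 = 1 + yoy_growth_2003
# How many times larger Yellow Pages revenue is than paid search revenue.
ratio = yellow_pages / paid_search

print(f"implied annual growth 2000-2003: {cagr:.0%}")
print(f"2003 size relative to 2002: {multiplier_2003:.2f}x")
print(f"Yellow Pages vs. paid search: {ratio:.1f}x")
```

A tenfold rise over three years works out to an average of roughly 115% growth per year, so if both figures describe the same market, the 145% jump from 2002 to 2003 sits above that average. Yellow Pages revenue is about 6.5 times the size of paid search.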
Also at SEMPO, Google announced that it now supports search engine marketing and is sponsoring SEMPO, as rising complexity and competition in the industry are preventing many business owners from using the marketing systems effectively.
Fredrick Marckini of iProspect quoted a stat from StatMarket stating that the average retail web site conversion rate is 1.8% to 2.0%.
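To see what that conversion range means in practice, here is a tiny Python sketch; the 10,000-visitor figure is an arbitrary example, not from StatMarket, and the function name is hypothetical.

```python
def expected_orders(visitors, conversion_rate):
    """Orders expected from `visitors` at a conversion rate given as a fraction (0.018 = 1.8%)."""
    return visitors * conversion_rate


# StatMarket's quoted range for the average retail site: 1.8% to 2.0%.
for rate in (0.018, 0.020):
    print(f"{rate:.1%}: {expected_orders(10_000, rate):.0f} orders per 10,000 visitors")
```

At 1.8% to 2.0%, that is roughly 180 to 200 orders per 10,000 visitors, or about one order for every 50 to 56 visitors.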
Greg Boser of WebGuerrilla also provided a few good link tips on the day. When buying links, 501(c) organizations are a good place to look. He also stated that he has seen unlinked URLs in plain text (.txt) files count as backlinks. Some other good link ideas offered by others include trade organizations, tools, and specialty directories.
It appears Google has gone far beyond stemming with its current algorithm update. It seems to be looking at the semantic intent of the query as well as the page, and then returning results based on that. The resulting pages frequently may not even contain the query terms.
(original discussion in HighRankings Forums) Many local sites from Florida to Austin to NYC have taken a beating from the recent Google algorithm updates (Florida and Austin). Is it any wonder they named these recent Google algorithm updates after locations?
Google still has some things to work out with the new algorithm, though, as many search results are still a wee bit funky. If relevancy only drops on commercial searches, that is not so bad in the eyes of Google, since other products such as Froogle will help in this area. Perhaps semantics are the way to separate the white pages from the yellow pages!
(GEEK STUFF) One of the largest problems many search engines run into is that after they get to a few hundred million documents, their algorithms and hardware hit a wall.
Companies that can afford the investment to get past this point still run into the problem that each additional resource makes their job a bit harder.
One of the major ways around this problem is to take advantage of the natural patterns in human language. Latent Semantic Indexing (LSI) indexes documents based on how related words pair up (appear together) within them.
Many complex searches may lack exact matches in the results as well. Being able to find near matches will allow search engines to provide more comprehensive results.
It's hard to get computers to understand anything human, but latent semantic indexing delivers conceptual results while being entirely mathematically driven.
Some of the steps of the singular value decomposition (SVD) process are to:
create a database of all words in relevant documents
remove common stop words
remove words appearing in every document
remove words appearing in only one document
create a database of relevant keywords
weight the pages based on the frequency of keyword distribution
increase the relevance of terms which appear in a small number of pages (as they are more likely to be on topic than words that appear in almost all documents)
normalize each page to remove page length as a factor
create relevancy vectors for the keywords
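Here is a rough Python sketch of the cleanup and weighting steps above, the part that happens before the actual decomposition. The stop-word list, the tf-idf weighting (term count times the log of inverse document frequency), and the function names are my assumptions for illustration. The last step, building relevancy vectors, would hand the resulting matrix to an SVD routine such as numpy.linalg.svd, which is not shown here.

```python
import math
import re

# Hypothetical stop-word list; a real system would use a much larger one.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "for", "on", "with"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def weighted_term_vectors(docs):
    """Return (vocab, vectors): one unit-length tf-idf vector per document."""
    # Build the word lists and remove common stop words.
    token_lists = [[w for w in tokenize(d) if w not in STOP_WORDS] for d in docs]
    n_docs = len(docs)

    # Document frequency: how many documents contain each word.
    doc_freq = {}
    for tokens in token_lists:
        for word in set(tokens):
            doc_freq[word] = doc_freq.get(word, 0) + 1

    # Keep only words that appear in more than one document but not in all of them.
    vocab = sorted(w for w, df in doc_freq.items() if 1 < df < n_docs)

    vectors = []
    for tokens in token_lists:
        # Term count weighted by inverse document frequency, so words found
        # in few documents count for more than words found almost everywhere.
        vec = [tokens.count(w) * math.log(n_docs / doc_freq[w]) for w in vocab]
        # Scale to unit length so page length does not act as a factor.
        norm = math.sqrt(sum(x * x for x in vec))
        vectors.append([x / norm for x in vec] if norm else vec)
    return vocab, vectors
```

Dropping words found in every document, and words found in only one, keeps the vocabulary to terms that can actually connect documents to each other.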
The singular value decomposition process is not scalable enough to work on large-scale search engines, though, as it requires too much processor time. Multidimensional scaling (MDS) allows us to take snapshots of the topology of different documents. "Instead of deriving the best possible projection through matrix decomposition, the MDS algorithm starts with a random arrangement of data, and then incrementally moves it around, calculating a stress function after each perturbation to see if the projection has grown more or less accurate. The algorithm keeps nudging the data points until it can no longer find lower values for the stress function."
This does not provide exact results, but only a rough approximation. When combined with other factors this approximation improves scalability and quality of search.
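The perturbation approach in that quote can be sketched in a few lines of Python. This is a toy version under my own assumptions: the stress function is the sum of squared differences between the target distances and the distances in the projection, points start at random positions in 2D, and the loop stops after 500 nudges in a row fail to lower the stress. The function names and parameters are hypothetical.

```python
import math
import random


def stress(points, target):
    """Sum of squared differences between target and projected distances."""
    total = 0.0
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            total += (d - target[i][j]) ** 2
    return total


def mds_by_perturbation(target, dims=2, step=0.1, max_failures=500, seed=0):
    """Start from a random layout and keep any small nudge that lowers stress."""
    rng = random.Random(seed)
    n = len(target)
    points = [[rng.random() for _ in range(dims)] for _ in range(n)]
    current = stress(points, target)
    failures = 0
    # Stop once many consecutive nudges have failed to lower the stress.
    while failures < max_failures:
        i = rng.randrange(n)
        old = points[i][:]
        points[i] = [x + rng.uniform(-step, step) for x in old]
        new = stress(points, target)
        if new < current:
            current, failures = new, 0
        else:
            points[i] = old  # undo the nudge
            failures += 1
    return points, current
```

As the quote says, the result is an approximation: a random-nudge search like this can settle into a layout that is good but not the best possible one.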
General tips to make a dynamic site get spidered
1.) Do not force-feed the spider a cookie
2.) Use 3 or fewer variables
3.) Keep each query string to 10 or fewer digits
4.) Create a sitemap which links to many of the main database locations.
5.) Build up link popularity from a few quality inbound links. Higher PageRank (or link popularity, in search engines other than Google) will make the spider more inclined to crawl deep into your site.
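Tips 2 and 3 are easy to check automatically. Here is a minimal Python sketch using the standard library URL parser. I am reading tip 3 as "each variable's value is 10 characters or fewer", and the function name and thresholds are assumptions taken from the list above, not rules published by any search engine.

```python
from urllib.parse import parse_qsl, urlsplit

MAX_PARAMS = 3         # tip 2: three or fewer variables
MAX_VALUE_LENGTH = 10  # tip 3, read here as: each value is 10 characters or fewer


def spider_friendly(url):
    """Return a list of problems with a dynamic URL (an empty list means it looks fine)."""
    params = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    problems = []
    if len(params) > MAX_PARAMS:
        problems.append(f"{len(params)} variables (more than {MAX_PARAMS})")
    for name, value in params:
        if len(value) > MAX_VALUE_LENGTH:
            problems.append(f"value of '{name}' is {len(value)} characters long")
    return problems
```

For example, a URL like http://example.com/item.php?id=42&cat=7 comes back clean, while one carrying a long session ID in a variable gets flagged.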
In any medium there will be free rides as new adopters take advantage of knowledge not shared by their competitors. While new technologies keep creating new markets, this quick read does a good job of explaining why off-the-page optimization is more effective than on-the-page optimization. Chris Ridings explains "The Glass Ceiling."
Update: above link to chriseo.com/modules.php?op=modload&name=News&file=article&sid=62&mode=thread&order=0&thold=0 delinked, as the site is owned by a domainer and is a page full of PPC ads.