Google Semantically Related Words & Latent Semantic Indexing Technology

Many people have noticed a wide shuffle in search relevancy scores recently. Some of those in the know attribute this to latent semantic indexing (LSI). Even if Google is not using LSI, it has likely been using other word-relationship technologies for a while and has recently increased their weighting.

How Does Latent Semantic Indexing Work?
Latent semantic indexing allows a search engine to determine what a page is about outside of specifically matching search query text.

A page about Apple computers will likely naturally have terms such as iMac or iPod on it.

Latent semantic indexing adds an important step to the document indexing process. In addition to recording which keywords a document contains, the method examines the document collection as a whole, to see which other documents contain some of those same words. LSI considers documents that have many words in common to be semantically close, and ones with few words in common to be semantically distant. This simple method correlates surprisingly well with how a human being, looking at content, might classify a document collection. Although the LSI algorithm doesn't understand anything about what the words mean, the patterns it notices can make it seem astonishingly intelligent. (source)
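To make the "words in common" idea concrete, here is a minimal Python sketch that scores how close two documents are by the cosine of their word-count vectors. It is a toy under stated assumptions: the tokenizer and the sample documents are made up for illustration, and it covers only the shared-word step. Full LSI goes further and factors the whole term-document matrix with singular value decomposition (SVD) before comparing documents.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lowercase the text and count each alphanumeric word."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors.

    1.0 means identical word proportions; 0.0 means no words in common.
    """
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sample documents, for illustration only.
docs = {
    "mac": "Apple iMac iPod Mac OS laptop",
    "ipod": "Apple iPod iTunes music Mac",
    "pie": "Apple pie recipe baking cinnamon",
}
vectors = {name: term_counts(text) for name, text in docs.items()}

print(round(cosine_similarity(vectors["mac"], vectors["ipod"]), 3))  # 0.548
print(round(cosine_similarity(vectors["mac"], vectors["pie"]), 3))   # 0.183
```

The two computer pages share three words and score well above the pie page, which shares only "apple". Because this measure sees only exact words, a page saying "Macintosh" instead of "Mac" gets no credit. That is the kind of gap the SVD step in LSI is meant to narrow.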

By placing additional weight on related words in the content, or on words in similar positions in other related documents, LSI has the net effect of lowering the value of pages that match only the specific term and do not back it up with related terms.
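A toy scoring function can illustrate that net effect. The sketch below is hypothetical: the function name, the 2.0 weight, and the related-term list are all made up, and no search engine is known to score pages this way. A page earns one point for containing the target word at all, plus a bonus for how many related words it also uses, so repeating the target word adds nothing.

```python
import re


def words(text):
    """Lowercase the text and return its alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def toy_relevance(text, target, related, related_weight=2.0):
    """Hypothetical relevance score, for illustration only.

    One point for containing the target word at all, plus a bonus scaled by
    the fraction of `related` words that also appear on the page. Repeating
    the target word adds nothing beyond its first occurrence.
    """
    tokens = set(words(text))
    has_target = 1.0 if target in tokens else 0.0
    coverage = len(tokens & set(related)) / len(related) if related else 0.0
    return has_target + related_weight * coverage


related = ["search", "engine", "optimization", "marketing", "ranking", "links"]
stuffed = "SEO SEO SEO services. Cheap SEO. Best SEO."
natural = "SEO tips: search engine optimization, ranking, and earning links."

print(toy_relevance(stuffed, "seo", related))            # 1.0
print(round(toy_relevance(natural, "seo", related), 2))  # 2.67
```

The keyword-stuffed page mentions "SEO" five times yet scores lower than the page that uses the term once and backs it up with five related words.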

LSI vs Semantically Related Words:
After being roasted by a few IR students and scientists, I realized that many SEOs (myself included) had blended the concepts of semantically related words and latent semantic indexing. Given the constraints of the web, it is highly unlikely that large-scale search engines are using LSI on their main search indexes.

Nonetheless, it is obvious to anyone who studies search relevancy algorithms by watching search results and ranking pages that the following are true of Google:

  • search engines such as Google do try to figure out phrase relationships when processing queries, improving the rankings of pages with related phrases even if those pages are not focused on the target term

  • pages that are too focused on one phrase tend to rank worse than one would expect (sometimes even being filtered out for what some SEOs call being over-optimized)
  • pages that are focused on a wider net of related keywords tend to have more stable rankings for the core keyword and rank for a wider net of keywords
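One way to check whether a page leans too hard on a single phrase is to measure how much of its text that phrase occupies. This Python sketch is an illustrative assumption, not a known ranking factor: `phrase_density` is a hypothetical helper, and no particular density is known to trigger the over-optimization filter some SEOs describe.

```python
import re


def phrase_density(text, phrase):
    """Fraction of the page's words that belong to occurrences of `phrase`.

    Counts non-overlapping occurrences of the phrase's word sequence, then
    multiplies by the phrase length to get the number of words it occupies.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    target = re.findall(r"[a-z0-9]+", phrase.lower())
    if not words or not target:
        return 0.0
    n = len(target)
    hits, i = 0, 0
    while i <= len(words) - n:
        if words[i:i + n] == target:
            hits += 1
            i += n  # skip past this occurrence so matches do not overlap
        else:
            i += 1
    return hits * n / len(words)


stuffed = "Baby clothes for babies: baby clothes, baby clothes!"
varied = "Baby clothes, infant outfits, toddler wear and newborn onesies"

print(phrase_density(stuffed, "baby clothes"))           # 0.75
print(round(phrase_density(varied, "baby clothes"), 3))  # 0.222
```

The varied page still uses the core phrase but spreads the rest of its copy across related terms, which is the "wider net" pattern described in the list above.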

Given the above, here are tips to help increase your page relevancy scores and make your rankings far more stable...

Mix Your Anchor Text!
Latent semantic indexing (or similar technologies) can also be used to look at the link profile of your website. If your links are heavy in a few particular phrases and light on other similar phrases, then your site may not rank as well.
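As a rough self-audit, you could measure how concentrated your anchor text is. This hypothetical Python sketch reports the share of links that use the single most common anchor phrase (compared case-insensitively). The 50% threshold is an arbitrary assumption for illustration, not a known Google limit, and the sample anchors are made up.

```python
from collections import Counter


def anchor_profile(anchors):
    """Return the most common anchor text (case-insensitive) and its share of all links."""
    if not anchors:
        raise ValueError("need at least one anchor")
    counts = Counter(a.strip().lower() for a in anchors)
    top_anchor, top_count = counts.most_common(1)[0]
    return top_anchor, top_count / len(anchors)


def looks_concentrated(anchors, max_share=0.5):
    """Flag a profile where one phrase exceeds `max_share` of links.

    The 0.5 default is an arbitrary, hypothetical threshold.
    """
    _, share = anchor_profile(anchors)
    return share > max_share


heavy = ["SEO Book"] * 8 + ["search engine optimization", "search engine marketing"]
mixed = [
    "SEO Book",
    "SEO Book",
    "search engine optimization",
    "search engine marketing",
    "search engine placement",
    "search engine ranking",
]
print(anchor_profile(heavy))      # ('seo book', 0.8)
print(looks_concentrated(heavy))  # True
print(looks_concentrated(mixed))  # False
```

A single top-share number is deliberately crude. It ignores near-duplicates such as "seo books", which a more careful audit might group together.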

Example Related Terms:
Many of my links to this site say "SEO Book", but I also used various other anchor text combinations to make the linkage data appear less manipulative.

Instead of using "SEO" in all the links, some of them may use phrases like:
  • search engine optimization
  • search engine marketing
  • search engine placement
  • search engine positioning
  • search engine promotion
  • search engine ranking

Instead of using "book" in all the links, some other good common words might be:

How do I Know What Words are Related?
There are a variety of options to know what words are related to one another.

  • Search Google for related terms using the tilde (~) operator. For example, a Google search for ~seo will return pages with terms matching or related to seo and will highlight some of the related words in the search results.

  • Use a lexical database.
  • Look at variations of keywords suggested by various keyword suggestion tools.
  • Write a page and use the Google AdSense sandbox to see what types of ads Google would try to deliver to that page.
  • Read the page copy and analyze the backlinks of high ranking pages.
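The last tip (reading the copy of high ranking pages) can be partly automated. This minimal sketch assumes you have already saved the text of some top-ranking pages as strings; for each word, it counts how many of the pages that mention your target term also contain that word. The tiny stopword list and the sample pages are illustrative assumptions, and raw co-occurrence like this is far cruder than anything a search engine would use.

```python
import re
from collections import Counter

# A tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "is", "on", "with"}


def related_terms(pages, target, top_n=5):
    """Rank words by how many target-mentioning pages also contain them."""
    doc_freq = Counter()
    for page in pages:
        tokens = set(re.findall(r"[a-z0-9]+", page.lower())) - STOPWORDS
        if target in tokens:
            doc_freq.update(tokens - {target})
    return doc_freq.most_common(top_n)


# Hypothetical page copy standing in for saved top-ranking pages.
pages = [
    "SEO tips for search engine ranking and links",
    "Learn SEO: search engine optimization and link building",
    "SEO book covering search engine marketing",
    "Apple pie recipe",
]
# 'search' and 'engine' each appear on all three SEO pages.
print(related_terms(pages, "seo", top_n=2))
```

Counting single words splits "search engine" into two entries; extending the tokenizer to count two-word phrases would surface candidates like that directly.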

Google Sandbox and Semantic Relationships:
The concept of the "Google Sandbox" has become synonymous with "the damn thing won't rank" or whatever. The Sandbox idea is that sites with inadequate perceived trust take longer to rank well.

Understanding the semantic relationships of words is just another piece of the relevancy algorithms, though many sites will shift significantly in rankings because of it. The Google Sandbox theory typically has more to do with people getting the wrong kinds of links, or not getting enough links, than with semantic relationships. Some sites and pages, though, are hurt by being too focused on a particular keyword or phrase.

Where do I learn more about Latent Semantic Indexing?
A while ago I read Patterns in Unstructured Data and found it was written in plain, easy-to-understand English.

Brian Turner also listed a good number of research papers in this thread.

Forum Coverage:

Selected Forum Quotes:

I'm not about to go post my research and examples on a public forum. But, I'll warn you now - if you're not varying your anchor text, and you're not writing pages synonymous with your term that don't contain the term you're targeting, you're going to be in a world of hurt within the next 90 days.

We've been tracking this update for the last 6 months. I was surprised to see it happen now - I honestly didn't expect it until next month or March, but it's here.


I have a page about "baby clothes". I link to my site 100 times with the anchor text "baby clothes"

I now pull out the words "baby clothes" and all the links pointing to my site with the words "baby clothes"

Do I still have footing to rank for that term "baby clothes" after you've run some sort of semantic analysis on it?

That's my simplistic explanation. I think they're doing something very similar, but taking links into account like that and maybe even devaluing some links on the "main" term...


Well, if it hasn't changed by Monday I'm going out to buy a black hat.

If irrelevant junk is what Google wants then irrelevant junk is what it's gonna get. :-(


Man, I'm glad I diversified my sites. I think I will work on diversifying some more...


Google Inc. is all about money. And IMHO ... so are Yahoo Inc. and Microsoft Corp. As webmasters we are the people who build sites and depend on these money hungry companies, who at the heels of the hunt, put their interests miles ahead of ours.


My main concern with this new update is that if you search for my brand name (and quite a few people do, based on referrals), then right now my site does not even rank. Our brand name is perhaps the best in my industry, and Google are, in my opinion, diluting my brand name and costing my company money. The first result for my brand name is a spammy "scraper site" page that is actually a SERP page copied from somewhere - so that's basically useless.

The Hidden or Not so Hidden Messages:

  • If you are entirely dependent on any single network and a single site for the bulk of your income, then you are taking a big risk. Most webmasters would be best off having at least a couple of income streams to shield themselves from algorithm changes.

  • If you are new to SEO, you are best off optimizing your site for MSN and Yahoo! from the start and then hoping to later rank well in Google.
  • Make sure you mix your anchor text to minimize your risk profile. Even if you are generally just using your site name as your anchor text eventually that too can hurt you.
  • Search algorithms and SEO will continue to get more complicated. But that makes for many fun posts ;)

Update: a few additional tools recommended in our comments and the comments at ThreadWatch

Published: February 4, 2005 by Aaron Wall in google technology


February 5, 2007 - 5:33pm

Great article but you don't spell necessarily like Cheers and have a great week.

September 15, 2006 - 7:45pm

Suggest you get an editor to check your entries. "It was wrote..."?

liam anthony
December 22, 2005 - 1:02pm

we're starting a new seo company as we found our previous employer's formula doesn't work anymore. can u be a bit more comprehensive about the content writing in comparison with the link building campaigns.

p.s. do u hav any idea which one should have a higher priority in the light of the new google algorithm changes?

February 26, 2005 - 2:37pm

Hi Kid Mercury

they may have also done quite a large amount with trying to scrub links for how natural they look.

September 15, 2006 - 10:25pm

I thought I just did. They were nice enough to point out the error of my ways, but left me stranded as to how I should fix them.

June 14, 2007 - 2:45pm

good content and explanation of content is good. Learnt about LSI, google ranking techniques.


February 7, 2005 - 10:13pm

Nice post. Do you think they are using it now and if it has anything to do with last week?

Personally, I think it's something much simpler than this, but who knows?

March 27, 2006 - 8:43am

I think the priorities depend on the competitive nature of the marketplace.

Formulas miss the mark because most of them don't consider the social aspects of the web.

It is all about layering technologies with search... so it will be hard to write copy that sounds great and converts well if it has a formula in mind. It will be hard to write link worthy content if you have a formula in mind.

February 8, 2005 - 8:13am

>Do you think they are using it now and if it has anything to do with last week?

I think they are, but I also think it might have been part of a roll out of a few changes.

July 10, 2007 - 12:48pm

This is useful information educating the world of Webmasters / SEOs. Also, Andy_boyd's quote is an amazing one in this context.

October 24, 2006 - 3:30am

Thanks a lot for all these information Aaron.

I will use them to get higher ranking to my websites.

October 6, 2006 - 9:14pm

Hey dude... nice one there...

Well I am applying them to success... Well my new site which is based on career, jobs and other related stuffs. I am just linking out couple of links a month and then leaving the rest on google and guess what it is loving it... and I'm lovin it too...

space tv
August 15, 2007 - 12:48pm

I am a little bit late to respond, but still. Aaron, I would associate all this with duplicate content a little bit. Although using related keywords will definitely spice up your rankings, you will have to let that flow naturally, not force it into the text of your website. I think that having unique text is the best thing you should opt for though. That's the major thing that's gonna get you out of the supplementals and put you on the map.

April 11, 2006 - 11:14pm

How about other context issues? It would be nice if Google could tell when a site belongs to a company so that you don't have to write a bunch of keyphrases with company in them. Or if Google could tell the geographic location from an address so you don't have to be specific about these types of phrases.
This whole LSI thing is a welcome change if you ask me. I think it will help find sites that are actually producing useful discussion. It helps close the gap a little between what a Human thinks a page is about and what a search engine thinks it's about.

February 4, 2005 - 9:01am

Great post Aaron. I always enjoy reading your blog.

February 4, 2005 - 2:18pm

This tool scrapes related terms from Google's tilde search, so should be useful:

February 4, 2005 - 4:44pm

Aaron - that was one of the best blog posts I've ever read on an SEO blog.

February 22, 2005 - 6:05pm

best summary of LSI/LSA and the february google update that i've seen yet -- and by a wide margin. enlightening work as always, aaron! thanks!!

September 3, 2007 - 6:57am

Yes it's really good information about LSI and this information remove some of my misconception.


January 5, 2007 - 3:03am

Most of the few users sent by Google to my site used a whole phrase in their search. Does that mean we can also avoid the sandbox if our keywords are phrases?


September 3, 2007 - 4:22pm

Good tips.
Varying words not only in links and texts, but for example also in a hierarchical website structure, makes your site look more natural. I guess this is a principle that mimics the way 'human people' (not SEO's) would write texts, links, and structure their websites :p

November 3, 2007 - 12:09am

do you want to mix keyterms because Google detects relevancy, but also because you want to cover many related keyterms to what your site has to offer as well.

November 3, 2007 - 1:03am

Yup. Mixing keywords allows you to appear semantically related and makes you relevant for a wider basket of keywords.

November 22, 2007 - 4:27pm

Aaron, this has got to be the best post on Latent Semantic Indexing having the best advice. Thanks a lot.

December 1, 2008 - 4:27am

I think I'm starting to understand this now. Not that I do it but the warnings are very helpful.

March 24, 2009 - 10:27pm

Well explained post Aaron! We have been experimenting with Latent Semantic Indexing at my office to expand website's audience and reach. I also found another well written article on this subject at:
Check it out!

May 20, 2009 - 6:20pm

Have you taken a look at Stomperblog's video on this topic and if so, what do you think?

May 20, 2009 - 7:27pm

So much free information that is created by info-marketers is built around disproving stuff, trying to make things look black and white, and selling to newbies, that (while being factually correct) it leaves the person who consumes it with a less complete view of the SEO game.

They debunked LSI, but after doing that, did they actually teach you anything?

Google clearly ***does*** look at word relationships and look for word variation in some of their anchor text analysis and on page analysis.

They may not use LSI by the classic definition, but they clearly ***are*** trying to understand word relationships and reflect those in their search relevancy algorithms, where and when possible.

So if you understand the concepts behind how LSI works you can apply such learning toward improving your optimization process and SEO skills - by using logical word variations and modifiers.

Amit Singhal, Google's head of search quality, wrote:

It is critical that we understand what our users are looking for (beyond just the few words in their query). We have made several notable advances in this area including a best-in-class spelling suggestion system, an advanced synonyms system, and a very strong concept analysis system.
Synonyms are the foundation of our query understanding work. This is one of the hardest problems we are solving at Google. Though sometimes obvious to humans, it is an unsolved problem in automatic language processing. As a user, I don't want to think too much about what words I should use in my queries. Often I don't even know what the right words are. This is where our synonyms system comes into action. Our synonyms system can do sophisticated query modifications, e.g., it knows that the word 'Dr' in the query [Dr Zhivago] stands for Doctor whereas in [Rodeo Dr] it means Drive. A user looking for [back bumper repair] gets results about rear bumper repair. For [Ramstein ab], we automatically look for Ramstein Air Base; for the query [b&b ab] we search for Bed and Breakfasts in Alberta, Canada. We have developed this level of query understanding for almost one hundred different languages, which is what I am truly proud of.

And Google is looking to extend in that direction in the future. Eric Schmidt, Google's CEO, said:

“Wouldn’t it be nice if Google understood the meaning of your phrase rather than just the words that are in that phrase? We have a lot of discoveries in that area that are going to roll out in the next little while.”

Those quotes are publicly available, but most salesmen will not quote them.

It is a lot easier to sell information if you make complex issues appear black and white. Unfortunately that often means the marketers debunk something and then don't explain the important and useful related information, particularly if it disagrees with the thesis pushed to look smart to newbies and sell to them.
