What Can LSI Do For Me Today?

Throughout this document, we have been presenting LSI in its role as a search tool for unstructured data. Given the shortcomings in current search technologies, this is undoubtedly a critical application of semantic indexing, and one with very promising results. However, there are many applications of LSI that go beyond traditional information retrieval, and many more that extend the notion of what a search engine is, and how we can best use it. To illustrate this, here are just a few examples of the areas where exciting work is happening (or should be happening) with LSI:

  • Relevance Feedback

    Most conventional search engines work best with a small set of keywords, and their recall declines quickly as the number of search terms grows. Because LSI shows the reverse behavior (the more it knows about a document, the better it is at finding similar ones), a latent semantic search engine can let a user build a 'shopping cart' of useful results and then search for further results that most closely match the stored ones. This lets the user search iteratively, providing feedback that guides the search engine toward a useful result.

  • Archivist's Assistant

    In introducing LSI we contrasted it with more traditional approaches to structuring data, including human-generated taxonomies. Given LSI's strength at partially structuring unstructured data, the two techniques can be used in tandem. This is potentially a very powerful combination - it would allow archivists to use their time much more efficiently, enhancing, labeling and correcting LSI-generated categories rather than having to index every document from scratch. In the next section, we will look at a data visualization approach that could be used in conjunction with LSI to create a sophisticated, interactive application for archivist use.

  • Automated Writing Assessment

    By comparing student writing against a large data set of stored essays on a given topic, LSI tools can analyze submitted assignments and highlight content areas that the student essay didn't cover. This can be used as a kind of automated grading system, where the assignment is compared to a pool of essays of known quality and given the grade of the closest match. We believe a more appropriate use of the technology is as a feedback tool that guides students in revising their essays and suggests directions for further study.
    { More info and demo: }

  • Textual Coherence

    LSI can look at the semantic relationships within a text to calculate the degree of topical coherence between its constituent parts. This kind of coherence correlates well with readability and comprehension, which suggests that LSI might be a useful feedback tool in writing instruction (along the lines of existing readability metrics).
    { source: }

  • Information Filtering

    LSI is potentially a powerful, customizable technology for filtering spam (unsolicited electronic mail). By training a latent semantic algorithm on your mailbox and on known spam messages, and adjusting a user-determined threshold, it might be possible to flag junk mail much more efficiently than with current keyword-based approaches. The same may apply to common Microsoft Outlook computer viruses, which tend to share a basic structure.
    LSI could also be used to filter newsgroup and bulletin board messages. { source: }
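
To make two of the ideas above concrete, here is a minimal sketch of relevance feedback and textual coherence on a toy corpus. It assumes raw term counts, a plain truncated SVD from NumPy, and a two-dimensional latent space. The corpus and the function names (`build_lsi`, `fold_in`, `more_like_these`, `coherence`) are hypothetical and chosen only for illustration; this is not a description of any particular LSI system.

```python
import numpy as np

def tokenize(text):
    # Naive whitespace tokenizer; a real system would also stem and drop stop words.
    return text.lower().split()

def build_lsi(docs, k=2):
    """Build a k-dimensional latent space from raw term counts (illustrative only)."""
    vocab = sorted({w for d in docs for w in tokenize(d)})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))      # term-document matrix
    for j, d in enumerate(docs):
        for w in tokenize(d):
            A[index[w], j] += 1.0
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    Uk = U[:, :k]                              # term-to-latent-space projection
    doc_vecs = Vt[:k, :].T * s[:k]             # one row per document
    return index, Uk, doc_vecs

def fold_in(text, index, Uk):
    """Project new text into the latent space; unknown words are ignored."""
    q = np.zeros(Uk.shape[0])
    for w in tokenize(text):
        if w in index:
            q[index[w]] += 1.0
    return q @ Uk

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def more_like_these(cart, doc_vecs):
    """Relevance feedback: rank the other documents against the centroid of the 'cart'."""
    centroid = doc_vecs[cart].mean(axis=0)
    scores = [(j, cosine(centroid, v))
              for j, v in enumerate(doc_vecs) if j not in cart]
    return sorted(scores, key=lambda t: t[1], reverse=True)

def coherence(sentences, index, Uk):
    """Textual coherence: mean cosine similarity between adjacent sentences."""
    vecs = [fold_in(s, index, Uk) for s in sentences]
    sims = [cosine(a, b) for a, b in zip(vecs, vecs[1:])]
    return sum(sims) / len(sims) if sims else 0.0

# Hypothetical toy corpus with two topics (pets and markets).
docs = [
    "cat dog pet", "dog pet animal", "cat pet animal",
    "stock market price", "market price trade", "stock trade price",
]
index, Uk, doc_vecs = build_lsi(docs, k=2)

print(more_like_these([0], doc_vecs))                      # pet documents rank first
print(coherence(["cat dog", "pet animal"], index, Uk))     # on-topic pair scores high
print(coherence(["cat dog", "stock market"], index, Uk))   # topic jump scores low
```

Note that "cat dog" and "pet animal" share no words, yet they land close together in the latent space because their terms co-occur in the corpus. The same cosine score, compared against a user-chosen threshold, is one plausible starting point for the filtering idea as well.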


This work is licensed under a Creative Commons License. 2002 National Institute for Technology in Liberal Education. For more info, contact the author.
