What Can LSI Do For Me Today?

Throughout this document, we have been presenting LSI in its role as a search tool for unstructured data. Given the shortcomings in current search technologies, this is undoubtedly a critical application of semantic indexing, and one with very promising results. However, there are many applications of LSI that go beyond traditional information retrieval, and many more that extend the notion of what a search engine is, and how we can best use it. To illustrate this, here are just a few examples of the areas where exciting work is happening (or should be happening) with LSI:

  • Relevance Feedback

    Most conventional search engines work best with a small set of keywords, and their recall declines quickly as the number of search terms grows. LSI shows the reverse behavior: the more it knows about a document, the better it is at finding similar ones. A latent semantic search engine can therefore let a user collect useful results in a 'shopping cart', and then search for further results that most closely match the stored ones. This makes the search iterative, with the user's feedback guiding the search engine towards a useful result.

  • Archivist's Assistant

    In introducing LSI we contrasted it with more traditional approaches to structuring data, including human-generated taxonomies. Given LSI's strength at partially structuring unstructured data, the two techniques can be used in tandem. This is potentially a very powerful combination: it would allow archivists to use their time much more efficiently, enhancing, labeling and correcting LSI-generated categories rather than having to index every document from scratch. In the next section, we will look at a data visualization approach that could be used in conjunction with LSI to create a sophisticated, interactive application for archivist use.

  • Automated Writing Assessment

    By comparing student writing against a large data set of stored essays on a given topic, LSI tools can analyze submitted assignments and highlight content areas that the student essay did not cover. This can be used as a kind of automated grading system, where the assignment is compared to a pool of essays of known quality and given the grade of the closest matches. We believe a more appropriate use of the technology is as a feedback tool that guides the student in revising the essay and suggests directions for further study.
    { More info and demo: }

  • Textual Coherence

    LSI can look at the semantic relationships within a text to calculate the degree of topical coherence between its constituent parts. This kind of coherence correlates well with readability and comprehension, which suggests that LSI might be a useful feedback tool in writing instruction (along the lines of existing readability metrics).
    { source: }

  • Information Filtering

    LSI is potentially a powerful, customizable technology for filtering spam (unsolicited electronic mail). By training a latent semantic algorithm on your mailbox and on known spam messages, and adjusting a user-determined threshold, it might be possible to flag junk mail much more efficiently than with current keyword-based approaches. The same may apply to common computer viruses that spread through Microsoft Outlook, which tend to share a basic structure.
    LSI could also be used to filter newsgroup and bulletin board messages. { source: }
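
To make the relevance feedback, textual coherence and information filtering ideas above more concrete, here is a minimal Python sketch, not a production implementation. It assumes NumPy, a six-document toy corpus invented for illustration, raw term counts with no weighting, and only two latent dimensions. The function names (build_lsi, fold_in, more_like, coherence, is_flagged) and the 0.8 threshold are hypothetical choices of ours, not part of any existing LSI tool.

```python
import numpy as np

# Toy corpus (invented for illustration): two topics with disjoint vocabularies.
DOCS = [
    "bake bread flour oven",        # 0 cooking
    "bread flour yeast oven",       # 1 cooking
    "bake cake flour sugar",        # 2 cooking
    "star galaxy telescope orbit",  # 3 astronomy
    "planet orbit star telescope",  # 4 astronomy
    "galaxy star planet light",     # 5 astronomy
]

def build_lsi(docs, k=2):
    """Build a rank-k latent space from raw term counts (no weighting)."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))  # terms x documents
    for j, d in enumerate(docs):
        for w in d.lower().split():
            counts[index[w], j] += 1.0
    u, s, vt = np.linalg.svd(counts, full_matrices=False)
    u_k = u[:, :k]                   # term axes of the latent space
    doc_vecs = vt[:k, :].T * s[:k]   # one row of latent coordinates per document
    return index, u_k, doc_vecs

def fold_in(text, index, u_k):
    """Project new text into the latent space; unknown words are ignored."""
    q = np.zeros(len(index))
    for w in text.lower().split():
        if w in index:
            q[index[w]] += 1.0
    return u_k.T @ q

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def more_like(cart, doc_vecs):
    """Relevance feedback: rank documents outside the cart by similarity
    to the centroid of the documents the user has saved."""
    centroid = doc_vecs[cart].mean(axis=0)
    scores = [(j, cosine(centroid, v))
              for j, v in enumerate(doc_vecs) if j not in cart]
    return sorted(scores, key=lambda t: t[1], reverse=True)

def coherence(parts, index, u_k):
    """Textual coherence: mean similarity between adjacent parts of a text."""
    vecs = [fold_in(p, index, u_k) for p in parts]
    sims = [cosine(vecs[i], vecs[i + 1]) for i in range(len(vecs) - 1)]
    return sum(sims) / len(sims)

def is_flagged(text, examples, index, u_k, doc_vecs, threshold=0.8):
    """Filtering: flag text that is close enough to the known examples."""
    centroid = doc_vecs[examples].mean(axis=0)
    return cosine(fold_in(text, index, u_k), centroid) >= threshold

index, u_k, doc_vecs = build_lsi(DOCS)
ranked = more_like([0, 1], doc_vecs)
print(ranked)
print(coherence(["bake bread", "flour oven"], index, u_k))
print(is_flagged("flour sugar oven", [0, 1, 2], index, u_k, doc_vecs))
print(is_flagged("telescope light", [0, 1, 2], index, u_k, doc_vecs))
```

With this toy corpus, a cart holding documents 0 and 1 ranks document 2 (the remaining cooking text) first, and the threshold test flags the cooking-like message but not the astronomy-like one. A real collection would need term weighting, many more latent dimensions, and a threshold tuned on held-out messages.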

This work is licensed under a Creative Commons License. 2002 National Institute for Technology in Liberal Education. For more info, contact the author.
