APPLICATIONS
What Can LSI Do For Me Today?
Throughout this document, we have been presenting LSI in its role as
a search tool for unstructured data. Given the shortcomings in current
search technologies, this is undoubtedly a critical application of
semantic indexing, and one with very promising results. However, there
are many applications of LSI that go beyond traditional information
retrieval, and many more that extend the notion of what a search engine
is, and how we can best use it. To illustrate this, here are just a few
examples of the areas where exciting work is happening (or should be
happening) with LSI:
Relevance Feedback
Most regular search engines work best with a small set of keywords, and
their recall declines quickly as the number of search terms grows.
LSI shows the reverse behavior: the more it knows about a document, the
better it is at finding similar ones. A latent semantic search engine
can therefore let a user collect useful results in a 'shopping cart',
and then go out and search for further results that most closely match
the stored ones. This lets the user search iteratively, providing
feedback that guides the search engine towards a useful result.
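One way to picture the 'shopping cart' search is as a nearest-neighbor query around the average of the stored documents. The sketch below is an illustration under assumptions, not a description of any particular LSI engine: it assumes every document has already been projected into LSI space, and the two-dimensional coordinates in `DOCS`, along with the names `cosine`, `centroid` and `more_like_cart`, are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def more_like_cart(doc_vectors, cart_ids, top_n=3):
    """Rank documents outside the cart by similarity to the cart's centroid."""
    query = centroid([doc_vectors[doc_id] for doc_id in cart_ids])
    scored = [
        (cosine(query, vector), doc_id)
        for doc_id, vector in doc_vectors.items()
        if doc_id not in cart_ids
    ]
    scored.sort(reverse=True)  # most similar first
    return [doc_id for _, doc_id in scored[:top_n]]


# Hypothetical LSI coordinates, invented for this illustration only.
DOCS = {
    "a": [0.9, 0.1],
    "b": [0.8, 0.2],
    "c": [0.1, 0.9],
    "d": [0.7, 0.3],
    "e": [0.2, 0.8],
}

# The user has put documents "a" and "b" in the cart.
suggestions = more_like_cart(DOCS, {"a", "b"}, top_n=2)
```

Each time the user adds a result to the cart, the centroid moves and the ranking can be recomputed, which is what makes the search iterative.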
Archivist's Assistant
In introducing LSI we contrasted it with more traditional approaches to
structuring data, including human-generated taxonomies. Given LSI's
strength at partially structuring unstructured data, the two techniques
can be used in tandem. This is potentially a very powerful combination:
it would allow archivists to use their time much more efficiently,
enhancing, labeling and correcting LSI-generated categories rather than
having to index every document from scratch. In the next section, we
will look at a data visualization approach that could be used in
conjunction with LSI to create a sophisticated, interactive application
for archivist use.
Automated Writing Assessment
By comparing student writing against a large data set of stored essays
on a given topic, LSI tools can analyze submitted assignments and
highlight content areas that the student essay did not cover. This can
be used as a kind of automated grading system, where the assignment is
compared to a pool of essays of known quality and given the grade of
the closest match. We believe a more appropriate use of the technology
is as a feedback tool that guides the student in revising the essay and
suggests directions for further study.
{ More info and demo: http://www-psych.nmsu.edu/essay/ }
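The 'closest matching grade' idea can be sketched as a nearest-neighbor lookup. This is only an illustration: it assumes the submitted essay and the graded essays have already been mapped into the same LSI space, and the vectors, grades and the function name `closest_grade` are invented for the example rather than taken from the demo linked above.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def closest_grade(essay_vector, graded_essays):
    """Return the grade of the stored essay most similar to the submission.

    graded_essays is a list of (lsi_vector, grade) pairs.
    """
    _, grade = max(
        graded_essays,
        key=lambda pair: cosine(essay_vector, pair[0]),
    )
    return grade


# Hypothetical pool of essays of known quality (invented coordinates).
GRADED = [
    ([0.9, 0.1, 0.0], "A"),
    ([0.5, 0.5, 0.1], "B"),
    ([0.1, 0.2, 0.9], "C"),
]

grade = closest_grade([0.6, 0.4, 0.1], GRADED)
```

Used as a feedback tool rather than a grader, the same comparison could report which stored essays the submission resembles least, as a hint about content it has not yet covered.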
Textual Coherence
LSI can look at the semantic relationships within a text to calculate
the degree of topical coherence between its constituent parts. This
kind of coherence correlates well with readability and comprehension,
which suggests that LSI might be a useful feedback tool in writing
instruction (along the lines of existing readability metrics).
{ source: http://www.knowledge-technologies.com/papers/abs-dp2.foltz.html }
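One simple way to turn this into a number is to average the similarity between each part of the text and the part that follows it. The sketch below makes that assumption explicit: it takes one LSI vector per sentence or paragraph as given, and the function name `coherence` and the toy vectors are hypothetical, not drawn from the cited paper.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def coherence(segment_vectors):
    """Mean cosine similarity between each segment and the next one."""
    pairs = list(zip(segment_vectors, segment_vectors[1:]))
    if not pairs:
        raise ValueError("need at least two segments to measure coherence")
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


# Hypothetical segment vectors: one text stays on topic, one jumps around.
ON_TOPIC = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
CHOPPY = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]

on_topic_score = coherence(ON_TOPIC)
choppy_score = coherence(CHOPPY)
```

A text whose neighboring parts stay close together in LSI space scores higher than one that jumps between unrelated topics.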
Information Filtering
LSI is potentially a powerful, customizable technology for filtering
spam (unsolicited electronic mail). By training a latent semantic
algorithm on your mailbox and on known spam messages, and adjusting a
user-determined threshold, it might be possible to flag junk mail much
more efficiently than with current keyword-based approaches. The same
may apply to common computer viruses that spread through Microsoft
Outlook, which tend to share a basic structure. LSI could also be used
to filter newsgroup and bulletin board messages.
{ source: http://www-psych.nmsu.edu/~pfoltz/cois/filtering-cois.html }
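As a rough sketch of how a user-determined threshold might work, the example below flags a message when it sits closer to known spam than to the user's own mail by more than a chosen margin. Everything here is an assumption made for illustration: messages are taken to be LSI vectors already, and `is_spam`, the centroid comparison and the sample coordinates are hypothetical rather than a description of an existing filter.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def is_spam(message_vector, spam_vectors, mailbox_vectors, threshold=0.1):
    """Flag a message that resembles known spam more than wanted mail.

    threshold is the user-adjustable margin: raising it flags fewer messages.
    """
    spam_score = cosine(message_vector, centroid(spam_vectors))
    mailbox_score = cosine(message_vector, centroid(mailbox_vectors))
    return spam_score - mailbox_score > threshold


# Hypothetical training data (invented LSI coordinates).
KNOWN_SPAM = [[0.9, 0.1], [0.8, 0.2]]
MAILBOX = [[0.1, 0.9], [0.2, 0.8]]

flag_junk = is_spam([0.85, 0.2], KNOWN_SPAM, MAILBOX)
flag_real = is_spam([0.15, 0.9], KNOWN_SPAM, MAILBOX)
```

The margin is the knob the user turns: a cautious user can raise it so that only messages very close to known spam are flagged.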