"LatentSemanticAnalysis" (Machine Learning Method)

Details & Suboptions

  • "LatentSemanticAnalysis" is a linear dimensionality reduction method. It projects the input data into a lower-dimensional space in a way that attempts to preserve the semantic associations between data points.
  • "LatentSemanticAnalysis" works for datasets that have a large number of features or a large number of examples, and it works particularly well for sparse datasets (in which most values are zeros). "LatentSemanticAnalysis" is typically used to reduce the dimension of a term-document matrix (a matrix of term counts in documents).
  • The following plots show the results of the "LatentSemanticAnalysis" method applied to benchmark datasets, including Fisher's irises, MNIST and FashionMNIST:
  • "LatentSemanticAnalysis" is equivalent to the "Linear" and "PrincipalComponentsAnalysis" methods, except that the data is not centered.
  • The method is widely used in information retrieval to build search engines. It is also used in natural language processing for tasks such as text classification and topic modeling.

Examples


Basic Examples  (1)

Train a linear dimensionality reducer on a list of vectors using the "LatentSemanticAnalysis" method:
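A minimal sketch of what this input might look like; the vectors here are illustrative, and the target dimension of 2 is an assumption:

```wl
(* illustrative sparse-ish data; DimensionReduction returns a DimensionReducerFunction *)
vectors = {{1, 0, 2, 0}, {0, 1, 0, 3}, {2, 0, 1, 0}, {0, 2, 0, 1}};
reducer = DimensionReduction[vectors, 2, Method -> "LatentSemanticAnalysis"]
```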

Use the trained reducer on new vectors:
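For example, assuming `reducer` is the trained DimensionReducerFunction from the previous step, it can be applied to a single vector or to a list of new vectors:

```wl
(* reduce new vectors of the same dimension as the training data *)
reducer[{{1, 1, 0, 0}, {0, 0, 1, 1}}]
```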

Scope  (1)

Dataset Visualization  (1)

Load the Fisher Iris dataset from ExampleData:
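A sketch of this step, assuming the "FisherIris" entry of the "MachineLearning" collection in ExampleData, which yields a list of rules of the form features -> species:

```wl
(* each element is a rule: {sepal/petal measurements} -> species name *)
iris = ExampleData[{"MachineLearning", "FisherIris"}, "Data"];
```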

Generate a reducer function using "LatentSemanticAnalysis" with the features of each example:
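Continuing under the same assumptions, with `iris` holding the rules loaded in the previous step, the features are the keys of those rules; reducing to 2 dimensions is an illustrative choice:

```wl
reducer = DimensionReduction[Keys[iris], 2, Method -> "LatentSemanticAnalysis"]
```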

Group the examples by their species:
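One possible way to do this, assuming `iris` is the list of feature -> species rules from above; GroupBy with Last -> First gathers the feature vectors under their species:

```wl
(* association: species name -> list of feature vectors *)
grouped = GroupBy[iris, Last -> First];
```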

Reduce the dimension of the features:
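A sketch of this step, assuming `reducer` and `grouped` from the previous steps; mapping the reducer over the association reduces each group of feature vectors:

```wl
(* association: species name -> list of 2D reduced vectors *)
reduced = Map[reducer, grouped];
```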

Visualize the reduced dataset:
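For instance, assuming `reduced` is the association of reduced vectors per species, the groups can be plotted with a legend:

```wl
ListPlot[Values[reduced], PlotLegends -> Keys[reduced]]
```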

Applications  (1)

Text Analysis  (1)

Load the text of Don Quixote:
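A sketch of this step; the specific ExampleData entry used here (the Spanish text of Don Quixote) is an assumption:

```wl
text = ExampleData[{"Text", "DonQuixoteSpanish"}];
```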

Split the text into sentences:
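Assuming `text` holds the string loaded in the previous step, TextSentences performs the split:

```wl
sentences = TextSentences[text];
```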

Train a reducer on these sentences:
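A possible form of this step, assuming `sentences` from above; DimensionReduction can take strings directly and extract features automatically, and the target dimension of 50 is an illustrative choice:

```wl
reducer = DimensionReduction[sentences, 50, Method -> "LatentSemanticAnalysis"]
```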

Train a nearest function on the embedding of sentences:
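One way to build this, assuming `reducer` and `sentences` from the previous steps: reduce every sentence, then create a NearestFunction that maps embeddings back to the original sentences:

```wl
embeddings = reducer[sentences];
nearest = Nearest[embeddings -> sentences];
```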

Use the function to find the nearest sentence to a query in the reduced semantic space:
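For example, assuming `reducer` and `nearest` from above; the query phrase is illustrative. The query is first embedded with the reducer, then looked up in the reduced space:

```wl
(* find the 3 sentences closest to the query in the reduced semantic space *)
nearest[reducer["los molinos de viento"], 3]
```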