CreateSemanticSearchIndex

CreateSemanticSearchIndex[source]

creates a search index from the data in source.

CreateSemanticSearchIndex[{source1, …}]

creates a search index from a collection of sources sourcei.

CreateSemanticSearchIndex[{source1 -> val1, …}]

associates each source sourcei with the value vali.

CreateSemanticSearchIndex[data,"name"]

gives the search index the specified name.

Details and Options

  • CreateSemanticSearchIndex is used to extract features from text that can be used to search the content semantically.
  • Possible values for source are:
    "string"                a plain string
    File["path"]            individual file
    URL["url"]              the text representation of "url"
    CloudObject[…]          a cloud object
    LocalObject[…]          a local object
    ContentObject[…]        a content object
    {source1,source2,…}     list of sources
  • Sources can be annotated. Every chunk from a given source will have the same annotation.
  • Possible ways to specify annotations include:
    {source1 -> val1, …}            a list of sources and associated values
    {source1, …} -> {val1, …}       a rule between sources and values
  • Accepted forms of vali include:
    "string"                        string labels
    <|"tag1" -> v1, …|>             an association of tags and metadata values
  • CreateSemanticSearchIndex supports the following options:
    DistanceFunction          EuclideanDistance          the distance function to use
    FeatureExtractor          "SentenceBERT"             how to extract features from text chunks
    GeneratedAssetLocation    $GeneratedAssetLocation    the location of the index
    Method                    Automatic                  method details
    OverwriteTarget           Automatic                  whether to overwrite an existing location
    ProgressReporting         $ProgressReporting         whether to report the progress of the computation
    WorkingPrecision          "Real32"                   precision of floating-point calculations
  • Possible values for DistanceFunction include EuclideanDistance, SquaredEuclideanDistance, CosineDistance, JaccardDissimilarity and HammingDistance.
  • Possible values for FeatureExtractor include:
    "SentenceBERT"        a local model based on SentenceBERT
    LLMConfiguration      an LLM-based sentence embedding
    f                     a custom extractor function
  • Custom extractors f must operate on a list of strings and produce a list of vectors of the same length.
  • Detailed options can be given using Method -> <|opt1 -> val1|>. Possible values for opti are:
    "ContextPadding"                     minimal overlap between chunks
    "MaximumItemLength"                  maximum length of a text chunk
    "MinimumItemLength"                  minimum length of a text chunk
    "SplitPattern"         Automatic     where to split long strings
  • The automatic "SplitPattern" tries to split a source text at paragraph, newline and word boundaries to create chunks whose length lies between "MinimumItemLength" and "MaximumItemLength".
  • Possible settings for WorkingPrecision include:
    "Integer8"      signed 8-bit integers from -128 through 127
    "Real32"        single-precision real (32-bit)
    "Real64"        double-precision real (64-bit)

Examples


Basic Examples  (2)

Create a new SemanticSearchIndex:

Search in the text by semantic similarity:
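As a minimal sketch of these two steps, the snippet below builds an index from a short string and then queries it. The sample text and the index name "PlanetNotes" are made up for illustration. The two-argument form of the companion function SemanticSearch is an assumption here rather than something stated on this page, and creating an index with the default "SentenceBERT" extractor may require the model to be available locally.

```wolfram
(* Illustrative text; any string can serve as a source *)
text = "Mercury is the closest planet to the Sun. Jupiter is the largest planet. Saturn is known for its rings.";

(* Build an index with default options and a hypothetical name *)
index = CreateSemanticSearchIndex[text, "PlanetNotes"]

(* Assumed usage: find the indexed items most similar to the query *)
SemanticSearch[index, "Which planet has rings?"]
```

With a text this short, the whole string is likely stored as a single chunk, so longer sources give more meaningful search results.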

Create an index with multiple labeled sources:

Recover the label for the most similar items:

Scope  (6)

Data Sources  (4)

Create an index from a string:

Create an index from a file:

Create an index from a URL:

Create an index with a specific name:

Annotations  (2)

Annotate sources with a label:

Each chunk inherits the corresponding source label:

The label is returned when performing search:

Annotate sources with tagged metadata:

Specify the annotations in a separate Association:
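The annotation forms listed under Details can be sketched as follows. The sentences, labels and tag names ("animal", "year") are invented for illustration; only the overall shapes of the arguments come from this page.

```wolfram
(* String labels attached directly to each source *)
labeled = CreateSemanticSearchIndex[{
   "Cats sleep for most of the day." -> "cats",
   "Dogs need regular walks." -> "dogs"}]

(* Tagged metadata: one association of tags and values per source *)
tagged = CreateSemanticSearchIndex[{
   "Cats sleep for most of the day." -> <|"animal" -> "cat", "year" -> 2024|>,
   "Dogs need regular walks." -> <|"animal" -> "dog", "year" -> 2025|>}]

(* Sources and values supplied separately as a rule between two lists *)
separate = CreateSemanticSearchIndex[
  {"Cats sleep for most of the day.", "Dogs need regular walks."} -> {"cats", "dogs"}]
```

Because every chunk of a source carries that source's annotation, the first and third forms should produce equivalent labels.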

Options  (10)

DistanceFunction  (1)

Specify a custom distance function for the index:

By default, EuclideanDistance is used:
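A minimal sketch of both cases, assuming a made-up sample text. CosineDistance is one of the values listed under Details; the variable names are illustrative only.

```wolfram
(* Illustrative source text *)
notes = "Sourdough needs a long, slow rise. Espresso is brewed under pressure. Green tea is steeped below boiling.";

(* Index that compares embedding vectors by cosine distance *)
cosineIndex = CreateSemanticSearchIndex[notes, DistanceFunction -> CosineDistance]

(* Index that relies on the default, EuclideanDistance *)
defaultIndex = CreateSemanticSearchIndex[notes]
```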

FeatureExtractor  (1)

Train a custom feature extractor:

Use it to extract features from another text:
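Independently of how an extractor is obtained, Details states that a custom extractor must take a list of strings and return a list of vectors. The sketch below uses a deliberately crude, hand-written extractor instead of a trained one: letterCounts is a hypothetical helper that maps each string to a 26-dimensional vector of letter counts. It only illustrates the input and output shapes and is not a useful semantic embedding; whether plain lists of reals are accepted without further conversion is an assumption.

```wolfram
(* Hypothetical extractor: one 26-dimensional letter-count vector per input string *)
letterCounts[texts_List] := Table[
  N[StringCount[ToLowerCase[text], letter]],
  {text, texts}, {letter, CharacterRange["a", "z"]}]

(* The extractor returns one vector per string *)
Dimensions[letterCounts[{"abc", "hello"}]]

(* Assumed usage: pass the function as the FeatureExtractor option *)
customIndex = CreateSemanticSearchIndex[
  {"alpha beta", "gamma delta"}, FeatureExtractor -> letterCounts]
```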

GeneratedAssetLocation  (3)

Specify a custom location to store the index:

Retrieve the location:

By default, the index is stored in a local object:

Store the vector index in a file:

Retrieve the location:

Recreate the database from the file reference:

Method  (2)

Create a text with multiple very short entries:

The entire text will be embedded as a single chunk:

Adjust the minimum and maximum item length to chunk the text into more relevant sections:
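A minimal sketch of this adjustment, assuming a made-up list of short entries. The numeric limits are illustrative only, since this page does not state the unit in which item length is measured.

```wolfram
(* Several very short entries joined into one text, one per line *)
entries = StringRiffle[{
    "Paris: capital of France",
    "Rome: capital of Italy",
    "Madrid: capital of Spain",
    "Lisbon: capital of Portugal"}, "\n"];

(* Lower both limits so that individual entries can become separate chunks *)
chunked = CreateSemanticSearchIndex[entries,
  Method -> <|"MinimumItemLength" -> 10, "MaximumItemLength" -> 40|>]
```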

Create several paragraphs of text:

Join them with nonstandard separators:

The default automatic paragraph and sentence chunking will give poor results:

Use a custom split pattern to create chunks at the custom dividers:
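The steps above can be sketched as follows. The paragraphs and the " ### " divider are invented for illustration, and passing a literal string as the "SplitPattern" value is an assumption; the page only says that the setting controls where long strings are split.

```wolfram
(* Illustrative paragraphs *)
paragraphs = {
   "Tides are caused mainly by the gravitational pull of the Moon.",
   "Volcanoes form where magma reaches the surface of the Earth.",
   "Glaciers are slow-moving masses of ice formed from compacted snow."};

(* Join them with a nonstandard separator *)
joined = StringRiffle[paragraphs, " ### "];

(* Assumed usage: split at the custom divider instead of automatic boundaries *)
splitIndex = CreateSemanticSearchIndex[joined,
  Method -> <|"SplitPattern" -> "###"|>]
```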

OverwriteTarget  (2)

The index's automatic location is determined by its name:

With the default OverwriteTarget -> Automatic, a new index name is generated to avoid collisions:

To force overwriting, use OverwriteTarget -> True:

Use OverwriteTarget -> False to perform a strict check:

OverwriteTarget -> False will also prevent reusing the same index name in a different location:
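The three settings can be sketched as follows, using a hypothetical index name "TideNotes" and made-up text. The comments restate the behavior described above; the exact messages or return values are not shown on this page and are not assumed here.

```wolfram
(* First index stored under the hypothetical name "TideNotes" *)
first = CreateSemanticSearchIndex["Initial notes about tides.", "TideNotes"]

(* Default Automatic: a new name is generated instead of replacing the first index *)
second = CreateSemanticSearchIndex["Revised notes about tides.", "TideNotes"]

(* True: the existing index at that location is overwritten *)
replaced = CreateSemanticSearchIndex["Final notes about tides.", "TideNotes",
  OverwriteTarget -> True]

(* False: strict check, so an existing index is not replaced *)
strict = CreateSemanticSearchIndex["More notes about tides.", "TideNotes",
  OverwriteTarget -> False]
```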

Create a file:

By default, existing files are not overwritten:

Use OverwriteTarget -> True to overwrite the existing file:

WorkingPrecision  (1)

Specify a custom working precision for the embedding vectors:

The working precision is stored in the index's vector database:

Retrieve the precision value:
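A minimal sketch of setting the option, assuming a made-up source text. "Real64" is one of the settings listed under Details; how the stored precision is read back from the index is not shown on this page and is left out.

```wolfram
(* Illustrative source text *)
recipe = "Whisk the eggs, fold in the flour, then bake for twenty minutes.";

(* Store embedding vectors in double precision instead of the default "Real32" *)
preciseIndex = CreateSemanticSearchIndex[recipe, WorkingPrecision -> "Real64"]
```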

Applications  (2)

Create a reverse mapping between a word and its definitions:

Build an index using the map:

Perform reverse lookup in a dictionary by matching the query against the definitions:

Retrieve quotes from a book:

Search for the quote without knowing its exact wording:

Possible Issues  (2)

An input string is always interpreted as text:

To follow the link, use the URL wrapper:

Use File to import file content:
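The distinction can be sketched with three calls. The URL and the file path are placeholders chosen for illustration; the second call needs network access and the third needs a file that actually exists at that path.

```wolfram
(* A bare string is indexed as literal text, even if it looks like a link *)
asText = CreateSemanticSearchIndex["https://www.wolfram.com"]

(* The URL wrapper indexes the text representation of the page *)
asPage = CreateSemanticSearchIndex[URL["https://www.wolfram.com"]]

(* The File wrapper imports the content of a file (hypothetical path) *)
asFile = CreateSemanticSearchIndex[File["path/to/notes.txt"]]
```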

Text with multiple small entries:

Chunking will not occur, as the length of the string is less than the default maximum:

Lower the maximum item length to ensure chunking takes place:

Wolfram Research (2024), CreateSemanticSearchIndex, Wolfram Language function, https://reference.wolfram.com/language/ref/CreateSemanticSearchIndex.html (updated 2025).
