Wolfram Language & System Documentation Center

CreateSemanticSearchIndex

CreateSemanticSearchIndex[source]

creates a search index from the data in source.

CreateSemanticSearchIndex[{source₁,…}]

creates a search index with a collection of sources source_i.

CreateSemanticSearchIndex[{source₁→val₁,…}]

associates each source source_i with the value val_i.

CreateSemanticSearchIndex[data,"name"]

gives the search index the specified name.

Details and Options

CreateSemanticSearchIndex extracts features from text so that the content can be searched semantically.
Possible values for source are:

	"string"	a plain string
	File["path"]	an individual file
	URL["url"]	the text representation of "url"
	CloudObject[…]	a cloud object
	LocalObject[…]	a local object
	ContentObject[…]	a content object
	{source₁,source₂,…}	a list of sources

Sources can be annotated. Every chunk from a given source will have the same annotation.
Possible ways to specify annotations include:

	{source₁→val₁,…}	a list of sources and associated values
	{source₁,…}→{val₁,…}	a rule from a list of sources to a list of values

Accepted forms of val_i include:

	"string"	a string label
	<|"tag₁"→v₁,…|>	an association of tags and metadata values

CreateSemanticSearchIndex supports the following options:

DistanceFunction	EuclideanDistance	the distance function to use
FeatureExtractor	"SentenceVector"	how to extract features from text chunks
GeneratedAssetLocation	$GeneratedAssetLocation	the location of the index
Method	Automatic	method details
OverwriteTarget	Automatic	whether to overwrite an existing location
ProgressReporting	$ProgressReporting	whether to report the progress of the computation
WorkingPrecision	"Real32"	precision of floating-point calculations

Possible values for DistanceFunction include EuclideanDistance, SquaredEuclideanDistance, CosineDistance, JaccardDissimilarity and HammingDistance.
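
The choice of distance function changes which chunks count as close. A small sketch on two hand-picked vectors (illustrative values, not real embeddings) shows the difference; the final call assumes only the option syntax listed in the table above, with invented sample sources:

```wolfram
(* two vectors pointing in the same direction with different magnitudes *)
u = {1, 2};
v = {2, 4};

EuclideanDistance[u, v]          (* Sqrt[5] *)
SquaredEuclideanDistance[u, v]   (* 5 *)
CosineDistance[u, v]             (* 0, since only the direction is compared *)

(* pass the chosen distance when creating the index *)
index = CreateSemanticSearchIndex[
  {"Cats sleep for most of the day.", "The stock market fell on Monday."},
  DistanceFunction -> CosineDistance]
```

CosineDistance ignores vector magnitude, whereas the Euclidean variants do not, which may matter when the embedding vectors are not normalized.
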
Possible values for FeatureExtractor include:

	"SentenceVector"	a local model based on SentenceBERT
	LLMConfiguration	an LLM-based sentence embedding
	f	a custom extractor function

Custom extractors f must operate on a list of strings and produce a list of vectors of the same length.
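
As a sketch of the contract for a custom extractor, the hypothetical function toyExtractor below maps each string to a three-element numeric vector. It is a toy for illustrating the required input and output shapes, not a semantic embedding, and the sample strings are invented:

```wolfram
(* hypothetical toy extractor: list of strings -> list of equal-length vectors
   (character count, word count, digit count) *)
toyExtractor = Function[texts,
   Map[
    N[{StringLength[#], Length[StringSplit[#]],
       StringCount[#, DigitCharacter]}] &,
    texts]];

(* check the shape on a small batch *)
toyExtractor[{"one two", "abc 123"}]
(* {{7., 2., 0.}, {7., 2., 3.}} *)

(* use it in place of the default extractor *)
index = CreateSemanticSearchIndex[{"one two", "abc 123"},
  FeatureExtractor -> toyExtractor]
```

Every vector has the same length regardless of the input string, which is the property the index relies on.
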
Detailed options can be given using Method→<|opt₁→val₁,…|>. Possible values for opt_i are:

"ContextPadding"		minimal overlap between chunks
"MaximumItemLength"		maximum length of a text chunk
"MinimumItemLength"		minimum length of a text chunk
"SplitPattern"	Automatic	where to split long strings

The automatic "SplitPattern" tries to split a source text into paragraphs, lines and words to create chunks whose lengths fall between "MinimumItemLength" and "MaximumItemLength".
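
A minimal sketch of passing chunking suboptions through Method. The text, the divider and the numeric limits are invented for illustration, and this page does not state the units of the length settings, so the values are placeholders to tune:

```wolfram
(* invented text whose sections are separated by a nonstandard divider *)
text = StringRiffle[
   {"First section, about cats.",
    "Second section, about markets.",
    "Third section, about volcanoes."},
   "\n=====\n"];

index = CreateSemanticSearchIndex[text,
  Method -> <|
    "SplitPattern" -> "\n=====\n",  (* split at the custom divider *)
    "MinimumItemLength" -> 10,      (* placeholder value *)
    "MaximumItemLength" -> 100      (* placeholder value *)
    |>]
```
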
Possible settings for WorkingPrecision include:

	"Integer8"	signed 8-bit integers from -128 through 127
	"Real32"	single-precision real (32-bit)
	"Real64"	double-precision real (64-bit)

Examples

Basic Examples (2)

Create a new SemanticSearchIndex:

Search in the text by semantic similarity:

Create an index with multiple labeled sources:

Recover the label for the most similar items:
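
A minimal sketch of the first basic example above, creating an index and then searching it. It assumes the companion function SemanticSearch[index, query], which is not described on this page, and uses invented sample sentences:

```wolfram
(* each string is treated as a separate source *)
index = CreateSemanticSearchIndex[{
    "Cats sleep for most of the day.",
    "The stock market fell sharply on Monday.",
    "Volcanoes erupt when magma reaches the surface."}]

(* query by meaning rather than by exact wording *)
SemanticSearch[index, "financial news"]
```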

Scope (6)

Data Sources (4)

Create an index from a string:

Create an index from a file:

Create an index from a URL:

Create an index with a specific name:

Annotations (2)

Annotate sources with a label:

Each chunk inherits the corresponding source label:

The label is returned when performing search:

Annotate sources with tagged metadata:

Specify the annotations in a separate Association:
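
The annotation forms in this subsection can be sketched as follows. The sample sentences, labels and tag names are invented, and the calls use only the argument forms listed under Details and Options:

```wolfram
(* string labels: one rule per source *)
labeled = CreateSemanticSearchIndex[{
    "Cats sleep for most of the day." -> "animals",
    "The stock market fell sharply on Monday." -> "finance"}]

(* the same annotations written as a single rule between two lists *)
labeled2 = CreateSemanticSearchIndex[
  {"Cats sleep for most of the day.",
   "The stock market fell sharply on Monday."} -> {"animals", "finance"}]

(* tagged metadata: an association of tags and values per source *)
tagged = CreateSemanticSearchIndex[{
    "Cats sleep for most of the day." ->
      <|"topic" -> "animals", "year" -> 2020|>,
    "The stock market fell sharply on Monday." ->
      <|"topic" -> "finance", "year" -> 2021|>}]
```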

Options (10)

DistanceFunction (1)

Specify a custom distance function for the index:

By default, EuclideanDistance is used:

FeatureExtractor (1)

Train a custom feature extractor:

Use it to extract features from another text:

GeneratedAssetLocation (3)

Specify a custom location to store the index:

Retrieve the location:

By default, the index is stored in a local object:

Store the vector index in a file:

Retrieve the location:

Recreate the database from the file reference:

Method (2)

Create a text with multiple very short entries:

The entire text will be embedded as a single chunk:

Adjust the minimum and maximum item lengths to chunk the text into more relevant sections:

Create several paragraphs of text:

Join them with nonstandard separators:

The default automatic paragraph and sentence chunking will give poor results:

Use a custom split pattern to create chunks at the custom dividers:

OverwriteTarget (2)

The index's automatic location is determined by its name:

With the default OverwriteTarget→Automatic, a new index name is generated to avoid collisions:

To force overwriting, use OverwriteTarget→True:

Use OverwriteTarget→False to perform a strict check:

OverwriteTarget→False will also prevent reusing the same index name in a different location:

Create a file:

By default, existing files are not overwritten:

Use OverwriteTargetTrue to overwrite the existing file:

WorkingPrecision (1)

Specify a custom working precision for the embedding vectors:

The working precision is stored in the index's vector database:

Retrieve the precision value:

Applications (2)

Create a reverse mapping between a word and its definitions:

Build an index using the map:

Perform reverse lookup in a dictionary by matching the query against the definitions:

Retrieve quotes from a book:

Search for the quote without knowing its exact wording:

Properties & Relations (1)

Create an index and retrieve the embeddings:

FeatureExtract can use the specified "SentenceVector" extractor to create similar embeddings:

Possible Issues (2)

An input string is always interpreted as text:

To follow the link, use the URL wrapper:

Use File to import file content:

Text with multiple small entries:

Chunking will not occur, as the length of the string is less than the default maximum:

Lower the maximum item length to ensure chunking takes place:
