CreateSemanticSearchIndex
CreateSemanticSearchIndex[source]
creates a search index from the data in source.
CreateSemanticSearchIndex[source,"name"]
gives the search index the specified name.
Details and Options
- CreateSemanticSearchIndex extracts features from text so that the content can be searched semantically.
- Possible values for source are:
    "string"          a plain string
    File["path"]      an individual file
    URL["url"]        the text representation of "url"
    CloudObject[…]    a cloud object
    LocalObject[…]    a local object
    {obj1,obj2,…}     a list of objects
- Sources can be tagged; possible values include:
    {obj1->val1,…}        a list of sources and associated values
    {obj1,…}->{val1,…}    a rule between sources and values
- Accepted forms of vali include:
    "string"              string labels
    <|"tag1"->v1,…|>      an association of tags and metadata values
- CreateSemanticSearchIndex supports the following options:
    DistanceFunction          EuclideanDistance          the distance function to use
    FeatureExtractor          "SentenceBERT"             how to extract features from text segments
    GeneratedAssetLocation    $GeneratedAssetLocation    the location of the index
    Method                    Automatic                  method details
    OverwriteTarget           Automatic                  whether to overwrite an existing location
    ProgressReporting         $ProgressReporting         whether to report the progress of the computation
    WorkingPrecision          "Real32"                   precision of floating-point calculations
- Possible values for DistanceFunction include EuclideanDistance, SquaredEuclideanDistance, CosineDistance, JaccardDissimilarity and HammingDistance.
- Possible values for FeatureExtractor include:
    "SentenceBERT"    a local model based on SentenceBERT
    f                 a custom extractor function
- Custom extractors f must operate on a list of strings and produce a list of vectors of the same length.
- Detailed options can be given using Method-><|opt1->val1|>. Possible values for opti are:
    "ContextPadding"                    minimal overlap between items
    "MaximumItemLength"                 maximum length of a text segment
    "MinimumItemLength"                 minimum length of a text segment
    "SplitPattern"         Automatic    where to split long strings
- The automatic "SplitPattern" tries to split text at paragraphs, newlines and words to create chunks with lengths between "MinimumItemLength" and "MaximumItemLength".
- Possible settings for WorkingPrecision include:
    "Integer8"    signed 8-bit integers from -128 through 127
    "Real32"      single-precision real (32-bit)
    "Real64"      double-precision real (64-bit)
Examples
Basic Examples (1)
Create a new SemanticSearchIndex:
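A minimal sketch of such a call, using the source forms described under Details and Options. The sample sentences, the tag values and the index name "demo-notes" are made up for illustration. The final query assumes the companion function SemanticSearch, which this page does not describe.

```wolfram
(* Hypothetical sample text used as the source *)
texts = {
  "Cats are small domesticated carnivores.",
  "The Moon orbits Earth about once every 27 days.",
  "Sourdough bread is leavened with wild yeast."
};

(* Build an index from a plain list of strings *)
index = CreateSemanticSearchIndex[texts]

(* Build an index under an explicit name; "demo-notes" is a made-up name *)
namedIndex = CreateSemanticSearchIndex[texts, "demo-notes"]

(* Tag each source with a string label, using a list of rules *)
labeledIndex = CreateSemanticSearchIndex[{
  texts[[1]] -> "animals",
  texts[[2]] -> "astronomy",
  texts[[3]] -> "baking"
}]

(* Tag sources with associations of metadata, using a rule between two lists *)
taggedIndex = CreateSemanticSearchIndex[
  texts -> {
    <|"topic" -> "animals"|>,
    <|"topic" -> "astronomy"|>,
    <|"topic" -> "baking"|>
  }
]

(* Assumed companion call: look up the items nearest to a query *)
SemanticSearch[index, "Which pets eat meat?"]
```

The two tagged forms carry the same kind of information; the list-of-rules form keeps each source next to its tag, while the rule-between-lists form is convenient when sources and tags are already held in separate lists.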
Options (4)
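As a hedged illustration of passing options, the sketch below sets two of the options from the table in Details and Options to values that the page lists as possible settings. The sample sentences are made up, and the particular combination of settings is an assumption chosen for illustration, not a recommendation.

```wolfram
(* Hypothetical sample text used as the source *)
texts = {
  "Cats are small domesticated carnivores.",
  "The Moon orbits Earth about once every 27 days."
};

(* Compare features by cosine distance and store them in double precision *)
index = CreateSemanticSearchIndex[texts,
  DistanceFunction -> CosineDistance,
  WorkingPrecision -> "Real64"
]
```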
FeatureExtractor (1)
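A minimal sketch of a custom extractor, assuming a toy feature of letter counts. The helper name letterCounts and the sample sentences are hypothetical. The function takes a list of strings and returns one 26-dimensional numeric vector per string, which matches the requirement that custom extractors operate on a list of strings and produce a list of vectors.

```wolfram
(* Hypothetical toy extractor: one vector of counts for the letters a to z per string *)
letterCounts[strings_List] := Map[
  Function[s,
    N[Lookup[Counts[Characters[ToLowerCase[s]]], CharacterRange["a", "z"], 0]]
  ],
  strings
]

(* Sanity check: two input strings give two vectors of length 26 *)
Dimensions[letterCounts[{"abc", "hello world"}]]

(* Hypothetical sample text used as the source *)
texts = {
  "Cats are small domesticated carnivores.",
  "The Moon orbits Earth about once every 27 days."
};

(* Use the custom function in place of the default "SentenceBERT" extractor *)
index = CreateSemanticSearchIndex[texts, FeatureExtractor -> letterCounts]
```

Letter counts carry no real semantic information, so this extractor only demonstrates the calling convention; a practical extractor would wrap an embedding model.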
Wolfram Research (2024), CreateSemanticSearchIndex, Wolfram Language function, https://reference.wolfram.com/language/ref/CreateSemanticSearchIndex.html.