Wolfram Language & System Documentation Center

FeatureExtraction

FeatureExtraction[examples]

generates a FeatureExtractorFunction[…] trained from the given examples.

FeatureExtraction[examples,spec]

uses the specified feature extractor method spec.

FeatureExtraction[examples,spec,props]

gives the feature extraction property specified by props.

Examples

Basic Examples (3)

Train a FeatureExtractorFunction on a simple dataset:

Wolfram Language code: fe = FeatureExtraction[{{1.4, "A"}, {1.5, "A"}, {2.3, "B"}, {5.4, "B"}}]

Extract features from a new example:

Wolfram Language code: fe[{2.4, "A"}]

Extract features from a list of examples:

Wolfram Language code: fe[{{2.4, "A"}, {3.7, "B"}}]

Train a feature extractor on a dataset of images:

Wolfram Language code: fe = FeatureExtraction[{[image], [image], [image], [image], [image], [image]}]

Use the feature extractor on the training set:

Wolfram Language code: fe[{[image], [image], [image], [image], [image], [image]}]

Specify a particular extractor:

Wolfram Language code: fe = FeatureExtraction[{[image], [image], [image], [image], [image], [image]}, "ImageFeatures"]

Scope (32)

Input Shape (9)

Train a feature extractor on a list of examples with a single feature:

Wolfram Language code:

fe = FeatureExtraction[{"It was the best of times.", "A journey of a thousand miles begins with a single step.", "To be or not to be, that is the question."}]

Extract features from a new example:

Wolfram Language code: fe["A rose by any other name"]

Extract features from multiple new examples:

Wolfram Language code: fe[{"A rose by any other name", "It was the worst of times"}]

Train a feature extractor on a list of examples with multiple features:

Wolfram Language code:

fe = FeatureExtraction[{{"It was the best of times.", "Charles Dickens"}, {"A journey of a thousand miles begins with a single step.", "Laozi (attrib.)"}, {"To be or not to be, that is the question.", "William Shakespeare"}}]

Extract features from multiple new examples:

Wolfram Language code:

fe[{{"All the world’s a stage, and all the men and women merely players", "William Shakespeare"}, {"Knowing others is intelligence; knowing yourself is true wisdom", "Laozi (attrib.)"}}]

Train a feature extractor on a dataset of mixed types:

Wolfram Language code:

fe = FeatureExtraction[{{"the cat is grey", [image]}, {"my cat is fast", [image]}, {"this dog is scary", [image]}, {"the big dog", [image]}}]

Extract features from a new example:

Wolfram Language code: fe[{"the nice cat", [image]}]

Train a feature extractor from a list of associations:

Wolfram Language code:

fe = FeatureExtraction[{<|"age" -> 32, "height" -> 160, "gender" -> "female"|>, 
	<|"height" -> 183, "age" -> 41, "gender" -> "female"|>, 
	<|"height" -> 123, "age" -> 30, "gender" -> "female"|>, 
	<|"height" -> 175, "age" -> 21, "gender" -> "male"|>, 
	<|"height" -> 150, "age" -> 11, "gender" -> "male"|>, 
	<|"age" -> 52, "height" -> 164, "gender" -> "female"|>}]

Extract features from a new example:

Wolfram Language code: fe[<|"age" -> 19, "height" -> 176, "gender" -> "male"|>]

Extract features from multiple new examples:

Wolfram Language code: fe[{<|"age" -> 19, "height" -> 176, "gender" -> "male"|>, <|"age" -> 45, "height" -> 164, "gender" -> "female"|>}]

Train a feature extractor from data given as lists of feature values:

Wolfram Language code:

FeatureExtraction[<|"age" -> {32, 41, 30, 21, 11, 52}, "height" -> {160, 183, 123, 175, 150, 164}, "gender" -> {"female", "female", "female", "male", "male", "female"}|>]

Train a feature extractor from a Tabular object:

Wolfram Language code:

FeatureExtraction[Tabular[Association["RawSchema" -> Association["ColumnProperties" -> 
     Association["age" -> Association["ElementType" -> "Integer64"], 
      "height" -> Association["ElementType" -> "Integer64"], 
      "gender" -> Association["ElementType" -> "String"]], "KeyColumns" -> None, 
    "Backend" -> "WolframKernel"], "Options" -> {}, 
  "BackendData" -> Association["ColumnData" -> DataStructure["ColumnTable", 
      {{TabularColumn[Association["Data" -> {{32, 41, 30, 21, 11, 52}, {}, None}, 
          "ElementType" -> "Integer64"]], TabularColumn[Association[
          "Data" -> {{160, 183, 123, 175, 150, 164}, {}, None}, "ElementType" -> "Integer64"]], 
        TabularColumn[Association["Data" -> {{3, {0, 6, 12, 18, 22, 26, 32}, 
             "femalefemalefemalemalemalefemale"}, {}, None}, "ElementType" -> "String"]]}}]]]]]

Train a feature extractor from a Dataset:

Wolfram Language code:

FeatureExtraction[Dataset[{Association["age" -> 32, "height" -> 160, "gender" -> "female"], 
  Association["age" -> 41, "height" -> 183, "gender" -> "female"], 
  Association["age" -> 30, "height" -> 123, "gender" -> "female"], 
  Association["age" -> 21, "height" -> 175, "gender" -> "male"], 
  Association["age" -> 11, "height" -> 150, "gender" -> "male"], 
  Association["age" -> 52, "height" -> 164, "gender" -> "female"]}]]

Train a feature extractor from a dataset that contains missing values:

Wolfram Language code: FeatureExtraction[{{1.4, Missing[], "A"}, {1.5, 50.2, "A"}, {Missing[], 42.3, "B"}, {5.4, 61.7, "B"}}]

Define a feature extractor that does not require training:

Wolfram Language code: fe = FeatureExtraction[None, "WordVectors"]

Apply it to some text:

Wolfram Language code: fe["hi there"]//Shallow

Extractor Specification (10)

Specify the feature extractor "SentenceVector" on a single text feature:

Wolfram Language code: fe = FeatureExtraction[{"the lizard is green"}, "SentenceVector"]

Apply it to some text:

Wolfram Language code: fe[{"there is a large cat"}]//Short

Train a feature extractor with the "StandardizedVector" method:

Wolfram Language code: fe = FeatureExtraction[{{1.4, 42.1}, {1.5, 50.2}, {4.2, 42.3}, {5.4, 61.7}}, "StandardizedVector"]

Extract features from a new example:

Wolfram Language code: features = fe[{6.4, 32.1}]

Since this feature extractor is invertible, the FeatureExtractorFunction property "OriginalData" can be used to perform the inverse extraction:

Wolfram Language code: fe[features, "OriginalData"]
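
To illustrate why such an extractor can be inverted, the following sketch standardizes the columns by hand and then undoes the transformation. It assumes that "StandardizedVector" amounts to a per-column shift by the mean and scaling by the standard deviation; the names pts, mu, sigma, standardize and restore are hypothetical helpers and are not part of FeatureExtraction.

```wolfram
(* training points, same values as in the "StandardizedVector" example *)
pts = {{1.4, 42.1}, {1.5, 50.2}, {4.2, 42.3}, {5.4, 61.7}};

(* per-column mean and standard deviation *)
mu = Mean[pts];
sigma = StandardDeviation[pts];

(* forward map and its exact inverse (assumed form of the extractor) *)
standardize[x_] := (x - mu)/sigma;
restore[z_] := z*sigma + mu;

(* round trip on a new example: returns the input up to rounding error *)
restore[standardize[{6.4, 32.1}]]
```

Because the forward map is affine with nonzero scale factors, no information is lost, which is not the case for extractors that reduce dimension or discretize values.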

Train a feature extractor on texts using the "TFIDF" method followed by the "DimensionReducedVector" method:

Wolfram Language code:

fe = FeatureExtraction[{"the cat is grey", "my cat is fast", "this dog is scary", "the big dog"}, {"TFIDF", "DimensionReducedVector"}]

Extract features from the training set:

Wolfram Language code: fe[{"the cat is grey", "my cat is fast", "this dog is scary", "the big dog"}]

Train a feature extractor on texts and images using the text-only "TFIDF" method:

Wolfram Language code:

fe = FeatureExtraction[{{"the cat is grey", [image]}, {"my cat is fast", [image]}, {"this dog is scary", [image]}, {"the big dog", [image]}}, "TFIDF"]

Features are extracted from the text part only:

Wolfram Language code: fe[{"the nice cat", [image]}]

Specify feature extraction for multiple features by position:

Wolfram Language code:

fe = FeatureExtraction[{{"Glucose", Molecule["Glucose"]}, {"Water", Molecule["Water"]}, {"Acetic Acid", Molecule["Acetic Acid"]}}, {1  -> { "SentenceVector", "DimensionReducedVector"}, 2  -> "MoleculeFeatures"}]

Use the feature extractor on new data:

Wolfram Language code: fe[{{"Sucrose", Molecule["Sucrose"]}}]

A list of two elements is treated as a single input with two features:

Wolfram Language code: fe[{"Hydrochloric Acid", Molecule["Hydrochloric Acid"]}]

Train a feature extractor that applies the "IndicatorVector" method to the second nominal variable only:

Wolfram Language code: fe = FeatureExtraction[{{"Yes", "A"}, {"No", "A"}, {"No", "B"}, {"Maybe", "B"}, {"No", "C"}}, 2 -> "IndicatorVector"]

The first nominal variable is dropped:

Wolfram Language code: Normal@fe[{"Yes", "A"}]

Use the Identity extractor method to copy the first variable:

Wolfram Language code:

fe = FeatureExtraction[{{"Yes", "A"}, {"No", "A"}, {"No", "B"}, {"Maybe", "B"}, {"No", "C"}}, {2 -> "IndicatorVector", 1 -> Identity}]

The first variable is copied unchanged:

Wolfram Language code: fe[{"Yes", "A"}]

A variable can be copied multiple times:

Wolfram Language code:

fe = FeatureExtraction[{{"Yes", "A"}, {"No", "A"}, {"No", "B"}, {"Maybe", "B"}, {"No", "C"}}, {2 -> "IndicatorVector", 1 -> "IndicatorVector", 1 -> Identity}]

Wolfram Language code: fe[{"Yes", "A"}]

Specify feature extraction for multiple features by key:

Wolfram Language code:

fe = FeatureExtraction[Tabular[Association["RawSchema" -> Association["ColumnProperties" -> 
     Association["Name" -> Association["ElementType" -> "String"], 
      "Molecule" -> Association["ElementType" -> "InertExpression"]], "KeyColumns" -> None, 
    "Backend" -> "WolframKernel"], "Options" -> {}, 
  "BackendData" -> Association["ColumnData" -> DataStructure["ColumnTable", 
      {{TabularColumn[Association["Data" -> {{3, {0, 7, 12, 23}, "GlucoseWaterAcetic Acid"}, {}, 
            None}, "ElementType" -> "String"]], TabularColumn[
         Association["Data" -> {{Molecule[{"O", "C", "C", "O", "C", "O", "C", "O", "C", "O", "C", 
               "O", "H", "H", "H", "H", "H", "H", "H", "H", "H", "H", "H", "H"}, 
              {Bond[{1, 2}, "Double"], Bond[{2, 3}, "Single"], Bond[{3, 4}, "Single"], Bond[{3, 5}, 
                "Single"], Bond[{5, 6}, "Single"], Bond[{5, 7}, "Single"], Bond[{7, 8}, "Single"], 
               Bond[{7, 9}, "Single"], Bond[{9, 10}, "Single"], Bond[{9, 11}, "Single"], Bond[
                {11, 12}, "Single"], Bond[{2, 13}, "Single"], Bond[{3, 14}, "Single"], Bond[
                {4, 15}, "Single"], Bond[{5, 16}, "Single"], Bond[{6, 17}, "Single"], Bond[{7, 18}, 
                "Single"], Bond[{8, 19}, "Single"], Bond[{9, 20}, "Single"], Bond[{10, 21}, 
                "Single"], Bond[{11, 22}, "Single"], Bond[{11, 23}, "Single"], Bond[{12, 24}, 
                "Single"]}, {StereochemistryElements -> {Association["StereoType" -> "Tetrahedral", 
                  "ChiralCenter" -> 3, "Direction" -> "Counterclockwise"], Association[
                  "StereoType" -> "Tetrahedral", "ChiralCenter" -> 5, "Direction" -> "Clockwise"], 
                 Association["StereoType" -> "Tetrahedral", "ChiralCenter" -> 7, "Direction" -> 
                   "Counterclockwise"], Association["StereoType" -> "Tetrahedral", 
                  "ChiralCenter" -> 9, "Direction" -> "Counterclockwise"]}}], 
             Molecule[{"O", "H", "H"}, {Bond[{1, 2}, "Single"], Bond[{1, 3}, "Single"]}, {}], 
             Molecule[{"C", "C", "O", "O", "H", "H", "H", "H"}, {Bond[{1, 2}, "Single"], Bond[
                {2, 3}, "Double"], Bond[{2, 4}, "Single"], Bond[{1, 5}, "Single"], Bond[{1, 6}, 
                "Single"], Bond[{1, 7}, "Single"], Bond[{4, 8}, "Single"]}, {}]}, {}, None}, 
          "ElementType" -> "InertExpression", "CachedOriginalExpression" -> 
           {Molecule[{"O", "C", "C", "O", "C", "O", "C", "O", "C", "O", "C", "O", "H", "H", "H", 
              "H", "H", "H", "H", "H", "H", "H", "H", "H"}, {Bond[{1, 2}, "Double"], 
              Bond[{2, 3}, "Single"], Bond[{3, 4}, "Single"], Bond[{3, 5}, "Single"], 
              Bond[{5, 6}, "Single"], Bond[{5, 7}, "Single"], Bond[{7, 8}, "Single"], 
              Bond[{7, 9}, "Single"], Bond[{9, 10}, "Single"], Bond[{9, 11}, "Single"], 
              Bond[{11, 12}, "Single"], Bond[{2, 13}, "Single"], Bond[{3, 14}, "Single"], 
              Bond[{4, 15}, "Single"], Bond[{5, 16}, "Single"], Bond[{6, 17}, "Single"], 
              Bond[{7, 18}, "Single"], Bond[{8, 19}, "Single"], Bond[{9, 20}, "Single"], 
              Bond[{10, 21}, "Single"], Bond[{11, 22}, "Single"], Bond[{11, 23}, "Single"], 
              Bond[{12, 24}, "Single"]}, {StereochemistryElements -> {Association["StereoType" -> 
                  "Tetrahedral", "ChiralCenter" -> 3, "Direction" -> "Counterclockwise"], 
                Association["StereoType" -> "Tetrahedral", "ChiralCenter" -> 5, "Direction" -> 
                  "Clockwise"], Association["StereoType" -> "Tetrahedral", "ChiralCenter" -> 7, 
                 "Direction" -> "Counterclockwise"], Association["StereoType" -> "Tetrahedral", 
                 "ChiralCenter" -> 9, "Direction" -> "Counterclockwise"]}}], 
            Molecule[{"O", "H", "H"}, {Bond[{1, 2}, "Single"], Bond[{1, 3}, "Single"]}, {}], 
            Molecule[{"C", "C", "O", "O", "H", "H", "H", "H"}, {Bond[{1, 2}, "Single"], 
              Bond[{2, 3}, "Double"], Bond[{2, 4}, "Single"], Bond[{1, 5}, "Single"], 
              Bond[{1, 6}, "Single"], Bond[{1, 7}, "Single"], Bond[{4, 8}, "Single"]}, {}]}]]}}]]]],   {"Name"  -> { "SentenceVector", "DimensionReducedVector"}, "Molecule"  -> "MoleculeFeatures"}]

Use the feature extractor on new data:

Wolfram Language code: fe[<|"Name" -> "Hydrochloric Acid", "Molecule" -> Molecule["Hydrochloric Acid"]|>]

Using the feature extractor on a list assumes that the features are in the same order as originally specified:

Wolfram Language code: fe[{"Hydrochloric Acid", Molecule["Hydrochloric Acid"]}]

Generate a feature extractor using a custom function:

Wolfram Language code:

data = {DateObject[{2014, 5, 5}, TimeObject[{9, 53, 6.30158}, TimeZone -> -5.], TimeZone -> -5.], DateObject[{2000, 1, 1}, TimeObject[{0, 0, 0.}, TimeZone -> -5.], TimeZone -> -5.], DateObject[{2006, 12}], DateObject[{2007, 8, 23}], DateObject[{2016, 4, 4}, TimeObject[{15, 59, 18.2738}, TimeZone -> -4.], TimeZone -> -4.]};

Wolfram Language code: fe = FeatureExtraction[data, {AbsoluteTime[#], #["Year"]}&]

Apply the extractor to the training set:

Wolfram Language code: fe[data]

Chain the custom extractor with the "StandardizedVector" method:

Wolfram Language code: fe2 = FeatureExtraction[data, {{AbsoluteTime[#], #["Year"]}&, "StandardizedVector"}]

Wolfram Language code: fe2[data]

Conform the data before processing:

Wolfram Language code: FeatureExtraction[{[image], [image], [image], [image]}, {"ConformedData", "ImageFeatures"}]

Reduce the dimension of the output:

Wolfram Language code: FeatureExtraction[{[image], [image], [image], [image]}, {"ImageFeatures", "DimensionReducedVector"}]

Feature Types (10)

Create a feature extractor for text data using the "SentenceVector" extractor, which requires no training:

Wolfram Language code: fe = FeatureExtraction[None, "SentenceVector"]

The input type is inferred from the specified extractor. Use the feature extractor on the following examples:

Wolfram Language code: fe[{"it is not a cat", "what a nice dog", "here is a dog again"}]//Short

Create a feature extractor for examples with implicit text and image features:

Wolfram Language code: fe = FeatureExtraction[None, {1 -> "SentenceVector", 2 -> "ImageFeatures"}]

Features are extracted from both parts:

Wolfram Language code: fe[{"the nice cat", [image]}]//Short

Train a feature extractor on text data:

Wolfram Language code: FeatureExtraction[{"the cat is grey", "my cat is fast", "this dog is scary", "the big dog"}]

Train a feature extractor on nominal variables using the "IndicatorVector" method:

Wolfram Language code: FeatureExtraction[{{"Yes", "A"}, {"No", "A"}, {"No", "B"}, {"Maybe", "B"}, {"No", "C"}}, "IndicatorVector"]

Train a feature extractor that computes the term frequency-inverse document frequency (tf-idf) vectors of texts:

Wolfram Language code: fe = FeatureExtraction[{"the cat is grey", "my cat is fast", "this dog is scary", "the big dog"}, "TFIDF"]

The tf-idf matrix of the training set is computed as a SparseArray:

Wolfram Language code: matrix = fe[{"the cat is grey", "my cat is fast", "this dog is scary", "the big dog"}]

Visualize the matrix:

Wolfram Language code: matrix//MatrixPlot

The "TFIDF" method can also be used on tokenized data (bags of nominal tokens):

Wolfram Language code:

FeatureExtraction[{{"the", "cat", "is", "grey"}, {"my" , "cat", "is", "fast"}, {"this", "dog", "is", "scary"}, {"the", "big", "dog"}}, "TFIDF", "ExtractedFeatures"]
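
As a rough illustration of what a tf-idf matrix contains, the sketch below computes one common variant by hand for the four training sentences used in the "TFIDF" examples: the raw term count multiplied by Log[n/df], where n is the number of documents and df is the number of documents containing the term. The exact weighting, normalization and tokenization used by the "TFIDF" extractor may differ, and all names here (docs, tokens, vocab, df, idf, tfidf) are hypothetical.

```wolfram
docs = {"the cat is grey", "my cat is fast", "this dog is scary", "the big dog"};

(* split each document into words on whitespace *)
tokens = StringSplit /@ docs;

(* sorted list of distinct words *)
vocab = Union @@ tokens;
n = Length[docs];

(* document frequency: number of documents containing each word *)
df = AssociationMap[Function[w, Count[MemberQ[#, w] & /@ tokens, True]], vocab];

(* inverse document frequency (assumed form: Log[n/df]) *)
idf = N[Log[n/#]] & /@ df;

(* one row per document, one column per vocabulary word *)
tfidf = Table[Count[t, w]*idf[w], {t, tokens}, {w, vocab}];

Dimensions[tfidf]   (* {4, 10}: 4 documents, 10 distinct words *)
```

Words that occur in many documents, such as "is", get a small weight, while words unique to one document, such as "grey", get the largest weight.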

Train a feature extractor on a list of DateObject instances:

Wolfram Language code:

fe = FeatureExtraction[{DateObject[{2014, 5, 5, 9, 53, 6.30158}, "Instant", "Gregorian", -6.], DateObject[{2000, 1, 1, 0, 0, 0.}, "Instant", "Gregorian", -6.], DateObject[{2006, 12}, "Month", "Gregorian", -6.], DateObject[{2007, 8, 23}, "Day", "Gregorian", -6.], DateObject[{2016, 4, 4, 15, 59, 18.2738}, "Instant", "Gregorian", -4.]}]

Extract features from a new DateObject:

Wolfram Language code: fe[DateObject[{2003, 1, 2}, "Day", "Gregorian", -6.]]

Dates can also be given as strings:

Wolfram Language code: fe["2nd of January 2003"]

Train a feature extractor on a list of Graph instances:

Wolfram Language code:

fe = FeatureExtraction[{[image], [image], [image], [image], [image], [image], [image], [image], [image], [image], [image], [image], [image], [image], [image]}]

Extract features from a new graph:

Wolfram Language code: fe[[image]]

Train a feature extractor on a list of TimeSeries instances:

Wolfram Language code:

FeatureExtraction[{TemporalData[TimeSeries, {{{0, 1, 0, 3, 0, 0, 0, 0, 2, 1, 0, 3, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 
    0, 0, 0, 0, 0, 2, 0, 0, 36, 0, 6, 1, 2, 8, 6, 24, 20, 31, 68, 45, 140, 116, 65, 376, 322, 382, 
    516, 544, 767, 1133, 1788, 1360, 5886, 5412 ... tion[{2020, 1, 23, 0, 0, 0.}, {2020, 10, 5, 0, 0, 0.}, {1, "Day"}]}, 
  1, {"Continuous", 1}, {"Discrete", 1}, 1, {DateFunction -> Automatic, 
   ResamplingMethod -> {"Interpolation", InterpolationOrder -> 1}, ValueDimensions -> 1}}, True, 
 12.2], TemporalData[TimeSeries, {{{0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 23, 2, 1, 3, 5, 4, 13, 6, 11, 9, 20, 11, 6, 
    23, 14, 38, 50, 86, 66, 103, 37, 121, 70, 1 ...  {TemporalData`DateSpecification[{2020, 1, 23, 0, 0, 0.}, {2020, 10, 5, 0, 0, 0.}, {1, "Day"}]}, 
  1, {"Continuous", 1}, {"Discrete", 1}, 1, 
  {ResamplingMethod -> {"Interpolation", InterpolationOrder -> 1}, ValueDimensions -> 1}}, True, 
 12.2], TemporalData[TimeSeries, {{{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 2, 0, 9, 0, 7, 5, 6, 7, 14, 99, 0, 11, 38, 
    121, 51, 249, 172, 228, 572, 331, 323, 307,  ... TemporalData`DateSpecification[{2020, 1, 23, 0, 0, 0.}, 
    {2020, 10, 5, 0, 0, 0.}, {1, "Day"}]}, 1, {"Continuous", 1}, {"Discrete", 1}, 1, 
  {ResamplingMethod -> {"Interpolation", InterpolationOrder -> 1}, ValueDimensions -> 1}}, True, 
 12.2], TemporalData[TimeSeries, {{{0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 9, 0, 4, 0, 3, 0, 8, 17, 14, 4, 27, 
    24, 33, 52, 54, 53, 61, 71, 57, 163, 182, 196 ...  {TemporalData`DateSpecification[{2020, 1, 23, 0, 0, 0.}, {2020, 10, 5, 0, 0, 0.}, {1, "Day"}]}, 
  1, {"Continuous", 1}, {"Discrete", 1}, 1, 
  {ResamplingMethod -> {"Interpolation", InterpolationOrder -> 1}, ValueDimensions -> 1}}, True, 
 12.2], TemporalData[TimeSeries, {{{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 2, 6, 0, 4, 9, 12, 20, 11, 
    28, 9, 26, 68, 35, 46, 101, 92, 21, 48, 69 ...  {TemporalData`DateSpecification[{2020, 1, 23, 0, 0, 0.}, {2020, 10, 5, 0, 0, 0.}, {1, "Day"}]}, 
  1, {"Continuous", 1}, {"Discrete", 1}, 1, 
  {ResamplingMethod -> {"Interpolation", InterpolationOrder -> 1}, ValueDimensions -> 1}}, True, 
 12.2]}]

Train a feature extractor on Molecule data:

Wolfram Language code: FeatureExtraction[{Molecule["Sucrose"], Molecule["Pentane"]}]

Train a feature extractor on a list of Audio instances:

Wolfram Language code:

FeatureExtraction[{Audio[Sound[Table[SoundNote[i, If[i == 12, 0.5, 0.1], "Violin"], {i, 0, 12}]]], Audio[Sound[Table[SoundNote[i, If[i == 12, 0.5, 0.1], "Trumpet"], {i, 0, 12}]]]}]

Information (3)

Obtain Information from a trained FeatureExtractorFunction:

Wolfram Language code:

Information[FeatureExtractorFunction[Association["ExampleNumber" -> 4, 
  "Preprocessor" -> MachineLearning`MLProcessor["ToMLDataset", 
    Association["Input" -> Association["age" -> Association["Type" -> "Numerical"], 
       "gender" -> Association["Type" - ... ate" -> DateObject[{2025, 5, 2, 13, 0, 
       39.042795`8.344115878952366}, "Instant", "Gregorian", 2.], "ProcessorCount" -> 10, 
    "ProcessorType" -> "ARM64", "OperatingSystem" -> "MacOSX", "SystemWordLength" -> 64, 
    "Evaluations" -> {}]]]]

Find the available properties:

Wolfram Language code:

Information[FeatureExtractorFunction[Association["ExampleNumber" -> 4, 
  "Preprocessor" -> MachineLearning`MLProcessor["ToMLDataset", 
    Association["Input" -> Association["age" -> Association["Type" -> "Numerical"], 
       "gender" -> Association["Type" - ... ate" -> DateObject[{2025, 5, 2, 13, 0, 
       39.042795`8.344115878952366}, "Instant", "Gregorian", 2.], "ProcessorCount" -> 10, 
    "ProcessorType" -> "ARM64", "OperatingSystem" -> "MacOSX", "SystemWordLength" -> 64, 
    "Evaluations" -> {}]]], "Properties"]

Get information about the input and output types:

Wolfram Language code:

Information[FeatureExtractorFunction[Association["ExampleNumber" -> 4, 
  "Preprocessor" -> MachineLearning`MLProcessor["ToMLDataset", 
    Association["Input" -> Association["age" -> Association["Type" -> "Numerical"], 
       "gender" -> Association["Type" - ... ate" -> DateObject[{2025, 5, 2, 13, 0, 
       39.042795`8.344115878952366}, "Instant", "Gregorian", 2.], "ProcessorCount" -> 10, 
    "ProcessorType" -> "ARM64", "OperatingSystem" -> "MacOSX", "SystemWordLength" -> 64, 
    "Evaluations" -> {}]]], {"InputTypes", "OutputTypes"}]

Options (4)

FeatureNames (2)

Train a feature extractor, giving a name to each feature:

Wolfram Language code:

fe = FeatureExtraction[{{2.3, "male"}, {4.8, Missing[]}, {Missing[], "female"}, {5.2, "female"}}, FeatureNames -> {"age", "gender"}]

Use the association format to extract features from a new example:

Wolfram Language code: fe[<|"age" -> 3.3, "gender" -> "male"|>]

The list format can still be used:

Wolfram Language code: fe[{3.3, "male"}]

Use FeatureNames to set names and refer to them in FeatureExtraction[examples,{spec₁ -> ext₁,…}]:

Wolfram Language code:

fe = FeatureExtraction[{{"A", "female"}, {"B", "female"}, {"C", "male"}, {"B", "male"}}, {"class" -> Identity, "gender" -> "IndicatorVector"}, FeatureNames -> {"class", "gender"}]

Extract features from a new example, specifying the features by name:

Wolfram Language code: fe[<|"gender" -> "female", "class" -> "B"|>]

FeatureTypes (2)

Train a feature extractor on a simple dataset using the "IndicatorVector" method:

Wolfram Language code: fe = FeatureExtraction[{{1, "A"}, {2, "A"}, {2, "B"}, {1, "B"}}, "IndicatorVector"]

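The first feature is interpreted as numerical. Since the "IndicatorVector" method acts only on nominal features, the first feature is left unchanged.

Use FeatureTypes to enforce the interpretation of the first feature as nominal:

Wolfram Language code:

fe = FeatureExtraction[{{1, "A"}, {2, "A"}, {2, "B"}, {1, "B"}}, "IndicatorVector", FeatureTypes -> <|1 -> "Nominal"|>]

The first feature is now interpreted as nominal, so the "IndicatorVector" method encodes it as well:

Wolfram Language code: fe[{{1, "A"}, {2, "A"}, {2, "B"}, {1, "B"}}]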

When a feature extractor is created without training, the expected data type can be inferred from the specified extractor:

Wolfram Language code: fe = FeatureExtraction[None, "SentenceVector"]

Specifying the feature types overrides this assumption:

Wolfram Language code: fe2 = FeatureExtraction[None, "SentenceVector", FeatureTypes -> {"first" -> "Numerical", "second" -> "Text"}]

Apply it to named features:

Wolfram Language code: fe2[<|"first" -> 1, "second" -> "Good morning!"|>]//Short

Applications (3)

Image Search (1)

Construct a dataset of dog images:

Wolfram Language code:

dataset = {[image], [image], [image], [image], [image], [image], [image], [image], [image], [image], [image], [image], [image], [image], [image], [image], [image], [image], [image], [image], [image]};

Train an extractor function from this dataset:

Wolfram Language code: fe = FeatureExtraction[dataset]

Generate a NearestFunction from the extracted features of the dataset:

Wolfram Language code: nf = Nearest[fe[dataset] -> Automatic]

Using the NearestFunction, construct a function that displays the nearest image in the dataset:

Wolfram Language code: nearestdog = dataset[[First@nf[fe[#]]]]&
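
The Automatic target makes the NearestFunction return positions rather than the feature vectors themselves, which is why its result can be used directly as an index into the dataset. A minimal sketch, with hypothetical 2D vectors (vecs, toyNF) standing in for the extracted image features:

```wolfram
(* toy feature vectors standing in for the extracted features *)
vecs = {{0., 0.}, {1., 0.}, {5., 5.}};

(* Automatic: return the positions of the nearest elements *)
toyNF = Nearest[vecs -> Automatic];

toyNF[{0.9, 0.1}]      (* {2}: position of the single nearest vector *)
toyNF[{0.9, 0.1}, 2]   (* {2, 1}: positions of the two nearest vectors *)
```

Taking First of the single-neighbor result gives one integer index, so Part can then select the matching element of the original list.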

Use this function on an image that is not in the dataset:

Wolfram Language code: nearestdog[[image]]

This feature extractor function can also be used to delete image pairs that are too similar:

Wolfram Language code: features = # -> fe[#]& /@ dataset;

Wolfram Language code: First /@ DeleteDuplicates[features, CosineDistance[#1[[2]], #2[[2]]] < 1 &]
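
DeleteDuplicates with a two-argument test treats any pair for which the test gives True as duplicates and keeps only the first element of each such pair. A small numeric sketch of the same idea, with a hypothetical tolerance of 0.1 in place of the cosine-distance threshold:

```wolfram
(* values closer than 0.1 to an earlier kept value are dropped *)
DeleteDuplicates[{1, 1.05, 2, 2.02, 3}, Abs[#1 - #2] < 0.1 &]
(* {1, 2, 3} *)
```

The threshold controls how aggressive the pruning is: a larger value removes more items as near-duplicates.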

Text Search (1)

Load the text of Alice in Wonderland:

Wolfram Language code: alice = ExampleData[{"Text", "AliceInWonderland"}];

Split the text into sentences:

Wolfram Language code: sentences = TextSentences[alice];

Train a feature extractor on these sentences:

Wolfram Language code: fe = FeatureExtraction[sentences]

Generate a NearestFunction from the features of the sentences:

Wolfram Language code: nf = Nearest[fe[sentences] -> Automatic]

Using the NearestFunction, construct a function that displays the nearest sentence in Alice in Wonderland:

Wolfram Language code: nearestalice = sentences[[First@nf[fe[#]]]]&

Use this function with a few queries:

Wolfram Language code: nearestalice["Alice and the Rabbit"]

Wolfram Language code: nearestalice["The cat and the Queen"]

Wolfram Language code: nearestalice["Off her head"]

Imputation (1)

Load the "MNIST" dataset from ExampleData and keep the images:

Wolfram Language code: digits = First /@ ExampleData[{"MachineLearning", "MNIST"}, "TestData"];

Wolfram Language code: RandomSample[digits, 10]

Convert the images to numerical data and separate the dataset into a training set and a test set:

Wolfram Language code: digits = Flatten[ImageData[#]]& /@ RandomSample[digits];

Wolfram Language code:

trainingset = digits[[ ;; 9000]];
testset = digits[[9001 ;; ]];

Each example in the dataset has dimension 784:

Wolfram Language code: Dimensions[trainingset]

Create a feature extractor with the "MissingImputed" method:

Wolfram Language code: fe = FeatureExtraction[trainingset, "MissingImputed"]

Replace some values of a test set vector with Missing[] and visualize it:

Wolfram Language code: vector = RandomChoice[testset];

Wolfram Language code: toimage = Image[Partition[#, 28], ImageSize -> Tiny] &;

Wolfram Language code:

vectormissing = vector;
vectormissing[[309 ;; 364]] = Missing[];
imagemissing = toimage[Replace[vectormissing, _Missing -> .5, {1}]]

Impute the missing values with the FeatureExtractorFunction[…]:

Wolfram Language code: imputedimage = toimage[fe[vectormissing]]

Visualize the original image, the image with missing values and the imputed image:

Wolfram Language code: {toimage[vector], imagemissing, imputedimage}

Properties & Relations (4)

Train a feature extractor from data with named features:

Wolfram Language code:

fe = FeatureExtraction[<|"age" -> {32, 41, 30, 21, 11, 52}, "height" -> {160, 183, 123, 175, 150, 164}, "gender" -> {"female", "female", "female", "male", "male", "female"}|>]

Unrecognized keys are ignored:

Wolfram Language code: fe[<|"age" -> 19, "height" -> 176, "gender" -> "male"|>]

Wolfram Language code: fe[<|"age" -> 19, "height" -> 176, "gender" -> "male", "weight" -> 32|>]

FeatureExtraction[…,"ExtractedFeatures"] is equivalent to FeatureExtract[…]:

Wolfram Language code: data = {"the cat is grey", "my cat is fast", "this dog is scary", "the big dog"};

Wolfram Language code: FeatureExtraction[data, "TFIDF", "ExtractedFeatures"] == FeatureExtract[data, "TFIDF"]

The "FeatureDistance" property is equivalent to using FeatureDistance on the extractor:

Wolfram Language code:

fd1 = FeatureExtraction[{"the cat is grey", "my cat is fast", "this dog is scary", "the big dog"}, "TFIDF", "FeatureDistance"]

First compute the FeatureExtractorFunction:

Wolfram Language code: fe = FeatureExtraction[{"the cat is grey", "my cat is fast", "this dog is scary", "the big dog"}, "TFIDF"]

Construct the feature distance for this feature extractor:

Wolfram Language code: fd2 = FeatureDistance[fe]

The two distance functions give the same result:

Wolfram Language code: fd1["the cat is grey", "the big dog"]

Wolfram Language code: fd2["the cat is grey", "the big dog"]

Creating a FeatureExtractorFunction on some training data creates a feature space that represents those features:

Wolfram Language code: fe = FeatureExtraction[{Molecule["Glucose"], Molecule["Sucrose"]}]

Using different training data can produce a feature space of a different size:

Wolfram Language code:

molecules = Molecule /@ EntityList[EntityClass["Chemical", "AminoAcids"]];
fe2 = FeatureExtraction[molecules]

Wolfram Language code: fe[Molecule["Water"]] === fe2[Molecule["Water"]]

Creating the same function without data results in an untrained function that always gives the same results in the same feature space:

Wolfram Language code: fe3 = FeatureExtraction[None, "MoleculeFeatures"]

Possible Issues (7)

Training an extractor on anonymous data uses automatic feature names:

Wolfram Language code: feAutomatic = FeatureExtraction[{"C", "B", "C", "B", "A", "B", "B", "A", "B", "B", "A", "A"}, "IndicatorVector"]

Wolfram Language code: Information[feAutomatic, "FeatureNames"]

Using a custom name when applying the function results in a missing-feature error:

Wolfram Language code: feAutomatic[<|"Letter" -> "S"|>]

Feature names can be specified at training time:

Wolfram Language code:

feNamed = FeatureExtraction[{"C", "B", "C", "B", "A", "B", "B", "A", "B", "B", "A", "A"}, "IndicatorVector", FeatureNames -> "Letter"]

Check the feature names of the FeatureExtractorFunction:

Wolfram Language code: Information[feNamed, "FeatureNames"]

The custom name can now be used:

Wolfram Language code: feNamed[<|"Letter" -> "S"|>]

The FeatureExtraction property "ReconstructedData" can be used to obtain the data after extraction and reconstruction:

Wolfram Language code:

FeatureExtraction[{{1.4, 1.4, 5.4, 5.2}, {1.5, 1.5, 6.4, 5.2}, {1.2, 1.2, 6.2, 5.2}, {1.6, 1.6, 4.3, 5.2}}, "DimensionReducedVector", "ReconstructedData"]

Some feature extractors can only perform an approximate inverse extraction:

Wolfram Language code:

fe = FeatureExtraction[{{1.4, 1.4, 5.4, 5.2}, {1.5, 1.5, 6.4, 5.2}, {1.2, 1.2, 6.2, 5.2}, {1.6, 1.6, 4.3, 5.2}}, "DiscretizedVector", "ReconstructedData"]

Some feature extractors cannot perform the inverse extraction:

Wolfram Language code: FeatureExtraction[{[image], [image], [image], [image]}, "ImageFeatures", "ReconstructedData"]

The "ReconstructedData" property cannot be used without training data:

Wolfram Language code: FeatureExtraction[None, "DimensionReducedVector", "ReconstructedData"]

Some extractors can be created without data:

Wolfram Language code: FeatureExtraction[None, "LowerCasedText"]

Other extractors need examples for their initialization:

Wolfram Language code: FeatureExtraction[None, "StandardizedVector"]

Similarly, not all properties are supported:

Wolfram Language code: FeatureExtraction[None, "LowerCasedText", "FeatureDistance"]

Extractors that do not match the data type are ignored:

Wolfram Language code:

fe = FeatureExtraction[{"No", "No", "no", "no", "no", "no", "yes", "no", "Yes", "Yes"}, {"LowerCasedText", "IndicatorVector"}]

The input type is "Nominal", so the "LowerCasedText" extractor is ignored:

Wolfram Language code: fe["Yes"] == fe["yes"]

Similarly, forcing the input type to be "Text" causes "IndicatorVector" to be ignored:

Wolfram Language code:

fe = FeatureExtraction[{"No", "No", "no", "no", "no", "no", "yes", "no", "Yes", "Yes"}, {"LowerCasedText", "IndicatorVector"}, FeatureTypes -> "Text"]

Wolfram Language code: fe["Yes"]

The "ConformedData" extractor needs additional information to operate in a dataless setting:

Wolfram Language code: FeatureExtraction[None, "ConformedData"]

Specify FeatureTypes explicitly:

Wolfram Language code: FeatureExtraction[None, "ConformedData", FeatureTypes -> "Image"]

The feature type can also be inferred implicitly from a subsequent extractor:

Wolfram Language code: FeatureExtraction[None, {"ConformedData", "ImageFeatures"}]

Automatic feature extraction typically includes a dimensionality reduction step:

Wolfram Language code: fe = FeatureExtraction[{"rhinos have horns", "deer have antlers", "fish have scales"}]

An explicit feature extractor does not include dimensionality reduction, which typically results in longer vectors:

Wolfram Language code: fe = FeatureExtraction[{"rhinos have horns", "deer have antlers", "fish have scales"}, "SentenceVector"]

Add a dimensionality reduction step with "DimensionReducedVector":

Wolfram Language code:

fe = FeatureExtraction[{"rhinos have horns", "deer have antlers", "fish have scales"}, {"SentenceVector", "DimensionReducedVector"}]

Dimensionality reduction must be trained on the available features, so it cannot be applied when no data is provided:

Wolfram Language code: fe = FeatureExtraction[None, {"SentenceVector", "DimensionReducedVector"}]

Create a FeatureExtractorFunction with named features:

Wolfram Language code: fe = FeatureExtraction[None, {"Name" -> "LowerCasedText", "Molecule" -> "MoleculeFeatures"}]

Rules are interpreted as atomic expressions, not as feature names:

Wolfram Language code: fe[{"Name" -> "Amoxicillin", "Molecule" -> Molecule["Amoxicillin"]}]

Use an association to specify named features:

Wolfram Language code: fe[<|"Name" -> "Amoxicillin", "Molecule" -> Molecule["Amoxicillin"]|>]//Short

The output type is set by the processor:

Wolfram Language code:

response = {"No", "No", "no", "no", "no", "no", "yes", "no", "Yes", "Yes"};
fe = FeatureExtraction[response, "LowerCasedText"]

A later processor that requires a different input type is ignored:

Wolfram Language code: fe = FeatureExtraction[response, {"LowerCasedText", "IndicatorVector"}]

Use a two-stage extraction to reinterpret the type:

Wolfram Language code:

fe1 = FeatureExtraction[response, "LowerCasedText"]
fe2 = FeatureExtraction[fe1[response], "IndicatorVector"]

Apply the two extractors in sequence to get the result:

Wolfram Language code: fe2[fe1["No"]]
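
The two stages can be pictured with a hand-written equivalent. The sketch below assumes that "LowerCasedText" behaves like ToLowerCase and that "IndicatorVector" behaves like one-hot encoding over the distinct values seen in training; the actual output format and class ordering of the extractors may differ, and the names answers, lowered, classes and oneHot are hypothetical.

```wolfram
answers = {"No", "No", "no", "no", "no", "no", "yes", "no", "Yes", "Yes"};

(* stage 1: normalize case *)
lowered = ToLowerCase[answers];

(* stage 2: one-hot encode over the distinct lowercased classes *)
classes = Union[lowered];   (* {"no", "yes"} *)
oneHot[s_String] := Boole[# === ToLowerCase[s]] & /@ classes;

oneHot["No"]    (* {1, 0} *)
oneHot["YES"]   (* {0, 1} *)
```

Running the stages separately lets the second stage see the lowercased strings as nominal classes, which is the reinterpretation that a single chained specification does not perform.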

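	{example₁,…}	a list of training examples
	Dataset[…]	a Dataset object
	Tabular[…]	a Tabular object
	None	no training examples

	extractor	use the specified extractor method
	part -> extractor	apply the extractor to the specified part of each example
	{part₁ -> extractor₁,…}	specify extractors for particular parts

	Automatic	automatic extraction
	Identity	give the data unchanged
	"ConformedData"	conformed images, colors, dates, etc.
	"NumericVector"	numeric vector from any data
	"name"	a named extractor method
	f	apply the function f to each example
	{extractor₁,extractor₂,…}	use a sequence of extractors in turn

	All	all parts of each example
	i	the ith part of each example
	{i₁,i₂,…}	parts i₁, i₂, … of each example
	"key"	the part with the specified key in each example
	{"key₁","key₂",…}	the parts with names "key_i" in each example

	"IndicatorVector"	nominal data "one-hot encoded" with indicator vectors
	"IntegerVector"	nominal data encoded with integers

	"DiscretizedVector"	discretized numerical data
	"DimensionReducedVector"	numeric vectors of reduced dimension
	"MissingImputed"	data with missing values imputed
	"StandardizedVector"	numeric data processed with Standardize

	"LowerCasedText"	text with each character lowercased
	"SegmentedCharacters"	text segmented into characters
	"SegmentedWords"	text segmented into words
	"SentenceVector"	semantic vector from text
	"TFIDF"	term frequency-inverse document frequency vector
	"WordVectors"	sequence of semantic vectors from text (English only)

	"FaceFeatures"	semantic vector from an image of a human face
	"ImageFeatures"	semantic vector from an image
	"PixelVector"	vector of pixel values from an image

	"AudioFeatures"	sequence of semantic vectors from an audio object
	"AudioFeatureVector"	semantic vector from an audio object
	"LPC"	audio linear prediction coefficients
	"MelSpectrogram"	audio spectrogram with logarithmic frequency bins
	"MFCC"	sequence of audio mel-frequency cepstral coefficient vectors
	"SpeakerFeatures"	sequence of semantic vectors for a speaker
	"SpeakerFeatureVector"	semantic vector for a speaker
	"Spectrogram"	audio spectrogram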
