Text Content Types

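Natural language processing functions such as TextCases, TextPosition, and TextContents can identify many different types of content in text. Some of these content types are structural or grammatical; others relate to semantic interpretation.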

Containing define a container (e.g. a sentence) for matches

Alternatives match any of several types

Verbatim a string to match verbatim

StringExpression  ▪  RegularExpression
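
The constructs above can be combined with the content type names listed on this page. The following is a minimal sketch, assuming a Wolfram Language session in which the text analysis functions are available. The sample string is hypothetical, and no outputs are shown because the results depend on the underlying language models.

```wolfram
(* Hypothetical sample text, chosen only for illustration *)
text = "Alice flew from Paris to New York in May. She paid $450 for the ticket.";

(* A single content type, given by its string name *)
TextCases[text, "Noun"]

(* Alternatives: match either of two content types *)
TextCases[text, "City" | "CurrencyAmount"]

(* Containing: sentences that contain a currency amount *)
TextCases[text, Containing["Sentence", "CurrencyAmount"]]

(* Verbatim: positions of a literal string rather than a content type *)
TextPosition[text, Verbatim["ticket"]]
```

TextCases returns the matching substrings, while TextPosition returns their character positions, so the same content type specification can be reused with either function.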

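Structural Elements

"Word" word-like unit (typically delimited by spaces or punctuation)

"Sentence" sentence-like unit (typically delimited by punctuation)

"Paragraph" paragraph-like unit (delimited by multiple newlines)

"Quotation" quotation delimited by quotation marks

"Line" substring delimited by a single newline

"NonText" characters that are not ordinary letter-like text

"Punctuation" punctuation marks

"Whitespace" sequence of whitespace characters

"Emoticon" emoticons (e.g. smileys)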

Parts of Speech

"Noun"  ▪  "Verb"  ▪  "Adjective"  ▪  "Adverb"  ▪  "Pronoun"  ▪  "Preposition"  ▪  "Conjunction"  ▪  "Determiner"  ▪  "Interjection"

"ProperNoun" proper noun, typically capitalized

"WhPronoun"  ▪  "WhAdverb"  ▪  "WhDeterminer"

"Punctuation"  ▪  "PossessiveModifier"  ▪  "ListItemMarker"  ▪  "Symbol"  ▪  "ForeignWord"

Phrase Types

"NounPhrase"  ▪  "VerbPhrase"  ▪  "AdjectivePhrase"  ▪  "AdverbPhrase"  ▪  "PrepositionalPhrase"  ▪  "ConjunctionPhrase"

"WhNounPhrase"  ▪  "WhAdjectivePhrase"  ▪  "WhAdverbPhrase"  ▪  "WhPrepositionalPhrase"

"NounPhraseHead"  ▪  "QuantifierPhrase"  ▪  "UnlikeCoordinatedPhrase"

"Clause"  ▪  "ReducedRelativeClause"

"Sentence"  ▪  "Fragment"  ▪  "Parenthetical"  ▪  "ListMarker"

Quantity Elements

"Number" numbers (e.g. "67", "6.78", "6.78e+10", "two thousand")

"Quantity" quantities with units (e.g. "4.5 km", "10 ft. 6 in.", "30C", "7 m/s", "three kilometers")

"Unit" units (e.g. "km", "ft.", "m/s", "kilometers")

"CurrencyAmount" currency amounts (e.g. "$5", "45 pesos", "10.25 GBP", "seven euros")

"Color" colors described in words (e.g. "light blue")

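Time and Location Elements

"Date" dates or date elements (e.g. day, month, year, century)

"Location" named geographic locations (e.g. "New York", "France")

"LocationEntity" named geographic locations with entity interpretations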

Identifier Elements

"EmailAddress"  ▪  "IPAddress"  ▪  "PhoneNumber"  ▪  "URL"  ▪  "ZIPCode"

"TwitterHandle" Twitter handles (e.g. "@Wolfram")

Entities

Entity match specific entities of any of the listed types, for example:

Geographic Entities

"Country"  ▪  "AdministrativeDivision"  ▪  "City"  ▪  "Neighborhood"  ▪  "MetropolitanArea"  ▪  "GeographicRegion"

"Ocean"  ▪  "Island"  ▪  "UnderseaFeature"  ▪  "Reef"  ▪  "Beach"  ▪  "Lake"  ▪  "Mountain"  ▪  "Volcano"  ▪  "River"  ▪  "Waterfall"  ▪  "EarthImpact"  ▪  "Desert"  ▪  "Forest"

"Airport"  ▪  "Park"  ▪  "AmusementPark"  ▪  "AmusementParkRide"  ▪  "Stadium"

"Bridge"  ▪  "Canal"  ▪  "Tunnel"  ▪  "Dam"  ▪  "Mine"  ▪  "Cave"  ▪  "OilField"  ▪  "Building"  ▪  "Castle"  ▪  "Cemetery"  ▪  "HistoricalSite"  ▪  "ReserveLand"  ▪  "Shipwreck"

"University"  ▪  "SchoolDistrict"  ▪  "PublicSchool"  ▪  "PrivateSchool"  ▪  "Museum"  ▪  "LibraryBranch"  ▪  "LibrarySystem"

"WeatherStation"  ▪  "AstronomicalObservatory"  ▪  "ParticleAccelerator"  ▪  "NuclearReactor"  ▪  "NuclearTestSite"  ▪  "NuclearExplosion"

"TimeZone"

Astronomical Entities

"Planet"  ▪  "PlanetaryMoon"  ▪  "MinorPlanet"  ▪  "Comet"  ▪  "SolarSystemFeature"  ▪  "MeteorShower"  ▪  "Exoplanet"

"Star"  ▪  "Galaxy"  ▪  "StarCluster"  ▪  "Nebula"  ▪  "Supernova"  ▪  "Pulsar"  ▪  "AstronomicalRadioSource"  ▪  "Constellation"

Space-Related

"Satellite"  ▪  "Rocket"  ▪  "DeepSpaceProbe"  ▪  "MannedSpaceMission"

Weather and Earth Science

"WeatherStation"  ▪  "TropicalStorm"  ▪  "Cloud"  ▪  "AtmosphericLayer"

"GeologicalLayer"  ▪  "GeologicalPeriod"  ▪  "Mineral"  ▪  "FamousGem"

Transportation-Related

"Aircraft"  ▪  "Airline"  ▪  "Airport"  ▪  "Ship"

Engineering and Structures

"BroadcastStation"  ▪  "MeasurementDevice"

"Building"  ▪  "Bridge"  ▪  "Tunnel"  ▪  "Dam"  ▪  "Mine"

Culture and Entertainment

"Language"  ▪  "Religion"  ▪  "Mythology"

"Movie"  ▪  "MusicAct"  ▪  "MusicAlbum"  ▪  "MusicWork"  ▪  "BroadcastStation"

"Book"  ▪  "Artwork"  ▪  "Periodical"  ▪  "FictionalCharacter"

"Museum"  ▪  "LibraryBranch"  ▪  "LibrarySystem"

Activities and Hobbies

"MusicalInstrument"  ▪  "SportObject"  ▪  "BoardGame"

Food and Nutrition

"Food"  ▪  "FoodBrandName"  ▪  "FoodManufacturer"  ▪  "FoodSubBrandName"

Finance

"Company"  ▪  "Financial"

People and Human Attributes

"Person"  ▪  "GivenName"  ▪  "Surname"  ▪  "PersonTitle"  ▪  "Occupation"

History-Related

"HistoricalCountry"  ▪  "HistoricalSite"

Linguistic Entities

"Language"  ▪  "Alphabet"  ▪  "WritingScript"

Physical Sciences

"Chemical"  ▪  "Element"  ▪  "Particle"  ▪  "Mineral"

"FamousPhysicsProblem"  ▪  "FamousChemistryProblem"

Life Sciences

"Gene"  ▪  "Protein"

Medical Entities

"AnatomicalStructure"  ▪  "Disease"  ▪  "MedicalTest"  ▪  "Protein"

Organism Types

"Plant"  ▪  "Species"  ▪  "DogBreed"  ▪  "CatBreed"  ▪  "Dinosaur"

Mathematical Entities

"Polyhedron"  ▪  "Surface"  ▪  "SpaceCurve"  ▪  "Graph"  ▪  "FiniteGroup"  ▪  "IntegerSequence"

"FamousMathProblem"  ▪  "FamousMathGame"

Computer-Related

"NotableComputer"  ▪  "ProgrammingLanguage"

Language Style and Sentiment

"PositiveSentiment"  ▪  "NegativeSentiment"  ▪  "NeutralSentiment"

"Profanity" text containing profanity

Content Topics

"BooksTopic"  ▪  "CareerAndMoneyTopic"  ▪  "FamilyAndFriendsTopic"  ▪  "FashionTopic"  ▪  "FitnessTopic"  ▪  "FoodAndDrinkTopic"  ▪  "HealthTopic"  ▪  "LeisureTopic"  ▪  "MoviesTopic"  ▪  "MusicTopic"  ▪  "PersonalMoodTopic"  ▪  "PetsAndAnimalsTopic"  ▪  "PoliticsTopic"  ▪  "QuotesAndLifePhilosophyTopic"  ▪  "RelationshipsTopic"  ▪  "SchoolAndUniversityTopic"  ▪  "SocialMediaTopic"  ▪  "SpecialOccasionsTopic"  ▪  "SportsTopic"  ▪  "TechnologyTopic"  ▪  "TelevisionTopic"  ▪  "TransportTopic"  ▪  "TravelTopic"  ▪  "VideoGamesTopic"  ▪  "WeatherTopic"

Human Languages

"Afrikaans"  ▪  "Albanian"  ▪  "Amharic"  ▪  "Arabic"  ▪  "Armenian"  ▪  "Azerbaijani"  ▪  "Basque"  ▪  "Bengali"  ▪  "Bosnian"  ▪  "Bulgarian"  ▪  "Catalan"  ▪  "Chinese"  ▪  "Croatian"  ▪  "Czech"  ▪  "Danish"  ▪  "Dutch"  ▪  "English"  ▪  "Esperanto"  ▪  "Estonian"  ▪  "Finnish"  ▪  "French"  ▪  "Georgian"  ▪  "German"  ▪  "Greek"  ▪  "Gujarati"  ▪  "Hebrew"  ▪  "Hindi"  ▪  "Hungarian"  ▪  "Icelandic"  ▪  "InuktitutGreenlandic"  ▪  "Italian"  ▪  "Japanese"  ▪  "Kannada"  ▪  "Kazakh"  ▪  "Khmer"  ▪  "Korean"  ▪  "Latvian"  ▪  "Lithuanian"  ▪  "Macedonian"  ▪  "Majhi"  ▪  "Malay"  ▪  "Malayalam"  ▪  "Mongolian"  ▪  "Nepali"  ▪  "NorwegianBokmal"  ▪  "Persian"  ▪  "Polish"  ▪  "Portuguese"  ▪  "Romanian"  ▪  "Russian"  ▪  "Serbian"  ▪  "Sinhala"  ▪  "Slovak"  ▪  "Slovenian"  ▪  "Spanish"  ▪  "Swahili"  ▪  "Swedish"  ▪  "Tagalog"  ▪  "Tamil"  ▪  "Telugu"  ▪  "Thai"  ▪  "Turkish"  ▪  "Ukrainian"  ▪  "Urdu"  ▪  "UzbekNorthern"  ▪  "Vietnamese"  ▪  "Welsh"

Programming Languages

"ABAP"  ▪  "Ada"  ▪  "AWK"  ▪  "BourneShell"  ▪  "C"  ▪  "CPlusPlus"  ▪  "CSharp"  ▪  "COBOL"  ▪  "CommonLisp"  ▪  "D"  ▪  "Dart"  ▪  "Delphi"  ▪  "Erlang"  ▪  "FSharp"  ▪  "Fortran"  ▪  "Groovy"  ▪  "Haskell"  ▪  "Java"  ▪  "JavaScript"  ▪  "Logo"  ▪  "Lua"  ▪  "MATLAB"  ▪  "ObjectiveC"  ▪  "Perl"  ▪  "PHP"  ▪  "Prolog"  ▪  "Python"  ▪  "R"  ▪  "Ruby"  ▪  "Rust"  ▪  "SAS"  ▪  "Scala"  ▪  "Scheme"  ▪  "SQL"  ▪  "Swift"  ▪  "Tcl"  ▪  "VBSCript"  ▪  "VisualBasicNET"  ▪  "WindowsPowerShell"  ▪  "WolframLanguage"