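Text Content Types
Natural language processing functions such as TextCases, TextPosition, and TextContents can identify many different types of content in text. Some of these content types are structural or grammatical; others depend on semantic interpretation.
Containing — define a container for matches (e.g., a sentence)
Alternatives — match any of several types
Verbatim — match a string verbatim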
StringExpression ▪ RegularExpression
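As a rough illustration of how these constructs combine with the content types listed on this page, the following Wolfram Language sketch applies TextCases, TextPosition, and TextContents to a sample sentence. The sample string and the variable name text are assumptions made up for this example. Results depend on the language models installed, so no output is shown.

```wolfram
(* Hypothetical sample text, chosen only for illustration *)
text = "Alice flew from Paris to New York on Monday. She paid $450 for the ticket.";

(* Extract all substrings identified as nouns *)
TextCases[text, "Noun"]

(* Alternatives: match either of two content types *)
TextCases[text, "Location" | "CurrencyAmount"]

(* Containing: return sentences that contain a location *)
TextCases[text, Containing["Sentence", "Location"]]

(* Verbatim: match a literal string rather than a content type *)
TextCases[text, Verbatim["New York"]]

(* Character positions of dates in the text *)
TextPosition[text, "Date"]

(* Overview of the content found in the text *)
TextContents[text]
```

Any of the quoted type names on this page can stand in for "Noun", "Location", "CurrencyAmount", or "Date" in the calls above.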
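Structural Elements
"Word" — word-like unit (typically delimited by whitespace or punctuation)
"Sentence" — sentence-like unit (typically delimited by punctuation)
"Paragraph" — paragraph-like unit (delimited by multiple newlines)
"Quotation" — quotation delimited by quotation marks
"Line" — substring delimited by single newlines
"NonText" — characters that are not ordinary letter-like text
"Punctuation" — punctuation characters
"Whitespace" — sequence of whitespace characters
"Emoticon" — emoticon (e.g., a smiley face)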
Parts of Speech
"Noun" ▪ "Verb" ▪ "Adjective" ▪ "Adverb" ▪ "Pronoun" ▪ "Preposition" ▪ "Conjunction" ▪ "Determiner" ▪ "Interjection"
"ProperNoun" — proper noun, typically starting with a capital letter
"WhPronoun" ▪ "WhAdverb" ▪ "WhDeterminer"
"Punctuation" ▪ "PossessiveModifier" ▪ "ListItemMarker" ▪ "Symbol" ▪ "ForeignWord"
Phrase Types
"NounPhrase" ▪ "VerbPhrase" ▪ "AdjectivePhrase" ▪ "AdverbPhrase" ▪ "PrepositionalPhrase" ▪ "ConjunctionPhrase"
"WhNounPhrase" ▪ "WhAdjectivePhrase" ▪ "WhAdverbPhrase" ▪ "WhPrepositionalPhrase"
"NounPhraseHead" ▪ "QuantifierPhrase" ▪ "UnlikeCoordinatedPhrase"
"Clause" ▪ "ReducedRelativeClause"
"Sentence" ▪ "Fragment" ▪ "Parenthetical" ▪ "ListMarker"
Quantity Elements
"Number" — number (e.g., "67", "6.78", "6.78e+10", "two thousand")
"Quantity" — quantity with units (e.g., "4.5 km", "10 ft. 6 in.", "30C", "7 m/s", "three kilometers")
"Unit" — unit (e.g., "km", "ft.", "m/s", "kilometers")
"CurrencyAmount" — amount of currency (e.g., "$5", "45 pesos", "10.25 GBP", "seven euros")
"Color" — color described in words (e.g., "light blue")
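Time and Location Elements
"Date" — date or date element (e.g., day, month, year, century)
"Location" — named geographic location (e.g., "New York", "France")
"LocationEntity" — named geographic location with an entity interpretation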
Identifier Elements
"EmailAddress" ▪ "IPAddress" ▪ "PhoneNumber" ▪ "URL" ▪ "ZIPCode"
"TwitterHandle" — Twitter handle (e.g., "@Wolfram")
Entities
Entity — match a specific entity of any of the listed types, for example:
Geographic Entities
"Country" ▪ "AdministrativeDivision" ▪ "City" ▪ "Neighborhood" ▪ "MetropolitanArea" ▪ "GeographicRegion"
"Ocean" ▪ "Island" ▪ "UnderseaFeature" ▪ "Reef" ▪ "Beach" ▪ "Lake" ▪ "Mountain" ▪ "Volcano" ▪ "River" ▪ "Waterfall" ▪ "EarthImpact" ▪ "Desert" ▪ "Forest"
"Airport" ▪ "Park" ▪ "AmusementPark" ▪ "AmusementParkRide" ▪ "Stadium"
"Bridge" ▪ "Canal" ▪ "Tunnel" ▪ "Dam" ▪ "Mine" ▪ "Cave" ▪ "OilField" ▪ "Building" ▪ "Castle" ▪ "Cemetery" ▪ "HistoricalSite" ▪ "ReserveLand" ▪ "Shipwreck"
"University" ▪ "SchoolDistrict" ▪ "PublicSchool" ▪ "PrivateSchool" ▪ "Museum" ▪ "LibraryBranch" ▪ "LibrarySystem"
"WeatherStation" ▪ "AstronomicalObservatory" ▪ "ParticleAccelerator" ▪ "NuclearReactor" ▪ "NuclearTestSite" ▪ "NuclearExplosion"
Astronomical Entities
"Planet" ▪ "PlanetaryMoon" ▪ "MinorPlanet" ▪ "Comet" ▪ "SolarSystemFeature" ▪ "MeteorShower" ▪ "Exoplanet"
"Star" ▪ "Galaxy" ▪ "StarCluster" ▪ "Nebula" ▪ "Supernova" ▪ "Pulsar" ▪ "AstronomicalRadioSource" ▪ "Constellation"
Space-Related
"Satellite" ▪ "Rocket" ▪ "DeepSpaceProbe" ▪ "MannedSpaceMission"
Weather and Earth Sciences
"WeatherStation" ▪ "TropicalStorm" ▪ "Cloud" ▪ "AtmosphericLayer"
"GeologicalLayer" ▪ "GeologicalPeriod" ▪ "Mineral" ▪ "FamousGem"
Transportation-Related
"Aircraft" ▪ "Airline" ▪ "Airport" ▪ "Ship"
Engineering and Structures
"BroadcastStation" ▪ "MeasurementDevice"
"Building" ▪ "Bridge" ▪ "Tunnel" ▪ "Dam" ▪ "Mine"
Culture and Entertainment
"Language" ▪ "Religion" ▪ "Mythology"
"Movie" ▪ "MusicAct" ▪ "MusicAlbum" ▪ "MusicWork" ▪ "BroadcastStation"
"Book" ▪ "Artwork" ▪ "Periodical" ▪ "FictionalCharacter"
"Museum" ▪ "LibraryBranch" ▪ "LibrarySystem"
Activities and Hobbies
"MusicalInstrument" ▪ "SportObject" ▪ "BoardGame"
Food and Nutrition
"Food" ▪ "FoodBrandName" ▪ "FoodManufacturer" ▪ "FoodSubBrandName"
Finance
People and Personal Attributes
"Person" ▪ "GivenName" ▪ "Surname" ▪ "PersonTitle" ▪ "Occupation"
History-Related
"HistoricalCountry" ▪ "HistoricalSite"
Linguistic Entities
"Language" ▪ "Alphabet" ▪ "WritingScript"
Physical Sciences
"Chemical" ▪ "Element" ▪ "Particle" ▪ "Mineral"
"FamousPhysicsProblem" ▪ "FamousChemistryProblem"
Life Sciences
Medical Entities
"AnatomicalStructure" ▪ "Disease" ▪ "MedicalTest" ▪ "Protein"
Organism Types
"Plant" ▪ "Species" ▪ "DogBreed" ▪ "CatBreed" ▪ "Dinosaur"
Mathematical Entities
"Polyhedron" ▪ "Surface" ▪ "SpaceCurve" ▪ "Graph" ▪ "FiniteGroup" ▪ "IntegerSequence"
"FamousMathProblem" ▪ "FamousMathGame"
Computer-Related
"NotableComputer" ▪ "ProgrammingLanguage"
Language Style and Sentiment
"PositiveSentiment" ▪ "NegativeSentiment" ▪ "NeutralSentiment"
"Profanity" — text containing profanity
Content Topics
"BooksTopic" ▪ "CareerAndMoneyTopic" ▪ "FamilyAndFriendsTopic" ▪ "FashionTopic" ▪ "FitnessTopic" ▪ "FoodAndDrinkTopic" ▪ "HealthTopic" ▪ "LeisureTopic" ▪ "MoviesTopic" ▪ "MusicTopic" ▪ "PersonalMoodTopic" ▪ "PetsAndAnimalsTopic" ▪ "PoliticsTopic" ▪ "QuotesAndLifePhilosophyTopic" ▪ "RelationshipsTopic" ▪ "SchoolAndUniversityTopic" ▪ "SocialMediaTopic" ▪ "SpecialOccasionsTopic" ▪ "SportsTopic" ▪ "TechnologyTopic" ▪ "TelevisionTopic" ▪ "TransportTopic" ▪ "TravelTopic" ▪ "VideoGamesTopic" ▪ "WeatherTopic"
Human Languages
"Afrikaans" ▪ "Albanian" ▪ "Amharic" ▪ "Arabic" ▪ "Armenian" ▪ "Azerbaijani" ▪ "Basque" ▪ "Bengali" ▪ "Bosnian" ▪ "Bulgarian" ▪ "Catalan" ▪ "Chinese" ▪ "Croatian" ▪ "Czech" ▪ "Danish" ▪ "Dutch" ▪ "English" ▪ "Esperanto" ▪ "Estonian" ▪ "Finnish" ▪ "French" ▪ "Georgian" ▪ "German" ▪ "Greek" ▪ "Gujarati" ▪ "Hebrew" ▪ "Hindi" ▪ "Hungarian" ▪ "Icelandic" ▪ "InuktitutGreenlandic" ▪ "Italian" ▪ "Japanese" ▪ "Kannada" ▪ "Kazakh" ▪ "Khmer" ▪ "Korean" ▪ "Latvian" ▪ "Lithuanian" ▪ "Macedonian" ▪ "Majhi" ▪ "Malay" ▪ "Malayalam" ▪ "Mongolian" ▪ "Nepali" ▪ "NorwegianBokmal" ▪ "Persian" ▪ "Polish" ▪ "Portuguese" ▪ "Romanian" ▪ "Russian" ▪ "Serbian" ▪ "Sinhala" ▪ "Slovak" ▪ "Slovenian" ▪ "Spanish" ▪ "Swahili" ▪ "Swedish" ▪ "Tagalog" ▪ "Tamil" ▪ "Telugu" ▪ "Thai" ▪ "Turkish" ▪ "Ukrainian" ▪ "Urdu" ▪ "UzbekNorthern" ▪ "Vietnamese" ▪ "Welsh"
Programming Languages
"ABAP" ▪ "Ada" ▪ "AWK" ▪ "BourneShell" ▪ "C" ▪ "CPlusPlus" ▪ "CSharp" ▪ "COBOL" ▪ "CommonLisp" ▪ "D" ▪ "Dart" ▪ "Delphi" ▪ "Erlang" ▪ "FSharp" ▪ "Fortran" ▪ "Groovy" ▪ "Haskell" ▪ "Java" ▪ "JavaScript" ▪ "Logo" ▪ "Lua" ▪ "MATLAB" ▪ "ObjectiveC" ▪ "Perl" ▪ "PHP" ▪ "Prolog" ▪ "Python" ▪ "R" ▪ "Ruby" ▪ "Rust" ▪ "SAS" ▪ "Scala" ▪ "Scheme" ▪ "SQL" ▪ "Swift" ▪ "Tcl" ▪ "VBSCript" ▪ "VisualBasicNET" ▪ "WindowsPowerShell" ▪ "WolframLanguage"