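Text Content Types
Natural language processing functions such as TextCases, TextPosition, and TextContents can recognize many types of content in text. These include structural and grammatical types as well as types that involve semantic interpretation.
Containing — define a container (such as a sentence) to match
Alternatives — match any of several types
Verbatim — match a string literally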
StringExpression ▪ RegularExpression
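As a minimal sketch of how these constructs combine with the functions named above: the sample string text is made up for illustration, and the results depend on the Wolfram Language version and on the recognizers it uses, so no output is shown.

```wolfram
(* Hypothetical sample text, used only for illustration *)
text = "The sky was light blue. Alice drove 4.5 km to New York.";

(* All substrings identified as nouns *)
TextCases[text, "Noun"]

(* Alternatives: match either of two content types *)
TextCases[text, "Noun" | "Verb"]

(* Containing: sentences that contain a color *)
TextCases[text, Containing["Sentence", "Color"]]

(* Verbatim: character positions of a literal string *)
TextPosition[text, Verbatim["New York"]]

(* Overview of the content recognized in the text *)
TextContents[text]
```

Any of the type names listed on this page can be used in place of "Noun", "Verb", "Sentence", or "Color".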
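Structural Elements
"Word" — word-like part (typically delimited by whitespace or punctuation)
"Sentence" — sentence-like part (typically delimited by punctuation characters)
"Paragraph" — paragraph-like part (typically delimited by multiple newlines)
"Quotation" — quotation delimited by quotation marks
"Line" — substring delimited by newlines
"NonText" — characters that are not ordinary textual characters
"Punctuation" — punctuation characters
"Whitespace" — sequence of whitespace characters
"Emoticon" — emoticon (smiley face, etc.)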
Parts of Speech
"Noun" ▪ "Verb" ▪ "Adjective" ▪ "Adverb" ▪ "Pronoun" ▪ "Preposition" ▪ "Conjunction" ▪ "Determiner" ▪ "Interjection"
"ProperNoun" — proper noun, typically capitalized
"WhPronoun" ▪ "WhAdverb" ▪ "WhDeterminer"
"Punctuation" ▪ "PossessiveModifier" ▪ "ListItemMarker" ▪ "Symbol" ▪ "ForeignWord"
Phrase Types
"NounPhrase" ▪ "VerbPhrase" ▪ "AdjectivePhrase" ▪ "AdverbPhrase" ▪ "PrepositionalPhrase" ▪ "ConjunctionPhrase"
"WhNounPhrase" ▪ "WhAdjectivePhrase" ▪ "WhAdverbPhrase" ▪ "WhPrepositionalPhrase"
"NounPhraseHead" ▪ "QuantifierPhrase" ▪ "UnlikeCoordinatedPhrase"
"Clause" ▪ "ReducedRelativeClause"
"Sentence" ▪ "Fragment" ▪ "Parenthetical" ▪ "ListMarker"
Quantity Elements
"Number" — number ("67", "6.78", "6.78e+10", "two thousand", etc.)
"Quantity" — quantity with units ("4.5 km", "10 ft. 6 in.", "30C", "7 m/s", "three kilometers", etc.)
"Unit" — unit ("km", "ft.", "m/s", "kilometers", etc.)
"CurrencyAmount" — amount of currency ("$5", "45 pesos", "10.25 GBP", "seven euros", etc.)
"Color" — color described in text ("light blue", etc.)
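A short sketch of extracting quantity elements follows. The sample string is hypothetical, and the rule form -> "Interpretation" comes from the TextCases reference rather than from this page, so treat it as an assumption about the installed version.

```wolfram
(* Hypothetical sample text, used only for illustration *)
text = "The trail is 4.5 km long, and the permit costs $5.";

(* Substrings recognized as quantities with units *)
TextCases[text, "Quantity"]

(* The same matches, returned as interpreted expressions instead of strings *)
TextCases[text, "Quantity" -> "Interpretation"]

(* A list of types gives an association keyed by type *)
TextCases[text, {"Number", "CurrencyAmount"}]
```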
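Time and Location Elements
"Date" — date or date element (day, month, year, century, etc.)
"Location" — named geographic location ("New York", "France", etc.)
"LocationEntity" — named geographic location, together with its interpretation as an entity
Identifier Elements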
"EmailAddress" ▪ "IPAddress" ▪ "PhoneNumber" ▪ "URL" ▪ "ZIPCode"
"TwitterHandle" — Twitter handle ("@Wolfram", etc.)
Entities
Entity — match a specific entity of any type, for example:
Geographic Entities
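Identifier elements can be pulled out of free text in the same way. The sketch below uses a made-up string with placeholder addresses on example.com; outputs are omitted because they may vary by version.

```wolfram
(* Hypothetical sample text with placeholder addresses *)
text = "Write to info@example.com or visit https://www.example.com for details.";

(* Association of matches for each identifier type *)
TextCases[text, {"EmailAddress", "URL"}]

(* Start and end character positions of each URL *)
TextPosition[text, "URL"]
```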
"Country" ▪ "AdministrativeDivision" ▪ "City" ▪ "Neighborhood" ▪ "MetropolitanArea" ▪ "GeographicRegion"
"Ocean" ▪ "Island" ▪ "UnderseaFeature" ▪ "Reef" ▪ "Beach" ▪ "Lake" ▪ "Mountain" ▪ "Volcano" ▪ "River" ▪ "Waterfall" ▪ "EarthImpact" ▪ "Desert" ▪ "Forest"
"Airport" ▪ "Park" ▪ "AmusementPark" ▪ "AmusementParkRide" ▪ "Stadium"
"Bridge" ▪ "Canal" ▪ "Tunnel" ▪ "Dam" ▪ "Mine" ▪ "Cave" ▪ "OilField" ▪ "Building" ▪ "Castle" ▪ "Cemetery" ▪ "HistoricalSite" ▪ "ReserveLand" ▪ "Shipwreck"
"University" ▪ "SchoolDistrict" ▪ "PublicSchool" ▪ "PrivateSchool" ▪ "Museum" ▪ "LibraryBranch" ▪ "LibrarySystem"
"WeatherStation" ▪ "AstronomicalObservatory" ▪ "ParticleAccelerator" ▪ "NuclearReactor" ▪ "NuclearTestSite" ▪ "NuclearExplosion"
Astronomical Entities
"Planet" ▪ "PlanetaryMoon" ▪ "MinorPlanet" ▪ "Comet" ▪ "SolarSystemFeature" ▪ "MeteorShower" ▪ "Exoplanet"
"Star" ▪ "Galaxy" ▪ "StarCluster" ▪ "Nebula" ▪ "Supernova" ▪ "Pulsar" ▪ "AstronomicalRadioSource" ▪ "Constellation"
Space-Related
"Satellite" ▪ "Rocket" ▪ "DeepSpaceProbe" ▪ "MannedSpaceMission"
Weather and Earth Science
"WeatherStation" ▪ "TropicalStorm" ▪ "Cloud" ▪ "AtmosphericLayer"
"GeologicalLayer" ▪ "GeologicalPeriod" ▪ "Mineral" ▪ "FamousGem"
Transportation-Related
"Aircraft" ▪ "Airline" ▪ "Airport" ▪ "Ship"
Engineering and Structures
"BroadcastStation" ▪ "MeasurementDevice"
"Building" ▪ "Bridge" ▪ "Tunnel" ▪ "Dam" ▪ "Mine"
Culture and Entertainment
"Language" ▪ "Religion" ▪ "Mythology"
"Movie" ▪ "MusicAct" ▪ "MusicAlbum" ▪ "MusicWork" ▪ "BroadcastStation"
"Book" ▪ "Artwork" ▪ "Periodical" ▪ "FictionalCharacter"
"Museum" ▪ "LibraryBranch" ▪ "LibrarySystem"
Activities and Hobbies
"MusicalInstrument" ▪ "SportObject" ▪ "BoardGame"
Food and Nutrition
"Food" ▪ "FoodBrandName" ▪ "FoodManufacturer" ▪ "FoodSubBrandName"
Finance
People and Personal Attributes
"Person" ▪ "GivenName" ▪ "Surname" ▪ "PersonTitle" ▪ "Occupation"
History-Related
"HistoricalCountry" ▪ "HistoricalSite"
Linguistic Entities
"Language" ▪ "Alphabet" ▪ "WritingScript"
Physics and Chemistry
"Chemical" ▪ "Element" ▪ "Particle" ▪ "Mineral"
"FamousPhysicsProblem" ▪ "FamousChemistryProblem"
Life Sciences
Medical Entities
"AnatomicalStructure" ▪ "Disease" ▪ "MedicalTest" ▪ "Protein"
Types of Organisms
"Plant" ▪ "Species" ▪ "DogBreed" ▪ "CatBreed" ▪ "Dinosaur"
Mathematical Entities
"Polyhedron" ▪ "Surface" ▪ "SpaceCurve" ▪ "Graph" ▪ "FiniteGroup" ▪ "IntegerSequence"
"FamousMathProblem" ▪ "FamousMathGame"
Computation-Related
"NotableComputer" ▪ "ProgrammingLanguage"
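The entity types above can be passed to TextCases like any other content type. This is a sketch under stated assumptions: the sample string is invented, the "Interpretation" property is taken from the TextCases reference, and entity recognition may require access to the Wolfram Knowledgebase, so no output is shown.

```wolfram
(* Hypothetical sample text, used only for illustration *)
text = "Paris is the capital of France, and Jupiter is the largest planet.";

(* Substrings recognized as countries *)
TextCases[text, "Country"]

(* The same matches, returned as Entity expressions *)
TextCases[text, "Country" -> "Interpretation"]

(* Mentions of one specific entity *)
TextCases[text, Entity["Country", "France"]]

(* Sentences that mention any planet *)
TextCases[text, Containing["Sentence", "Planet"]]
```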
Language Style and Sentiment
"PositiveSentiment" ▪ "NegativeSentiment" ▪ "NeutralSentiment"
"Profanity" — text containing profanity
Content Topics
"BooksTopic" ▪ "CareerAndMoneyTopic" ▪ "FamilyAndFriendsTopic" ▪ "FashionTopic" ▪ "FitnessTopic" ▪ "FoodAndDrinkTopic" ▪ "HealthTopic" ▪ "LeisureTopic" ▪ "MoviesTopic" ▪ "MusicTopic" ▪ "PersonalMoodTopic" ▪ "PetsAndAnimalsTopic" ▪ "PoliticsTopic" ▪ "QuotesAndLifePhilosophyTopic" ▪ "RelationshipsTopic" ▪ "SchoolAndUniversityTopic" ▪ "SocialMediaTopic" ▪ "SpecialOccasionsTopic" ▪ "SportsTopic" ▪ "TechnologyTopic" ▪ "TelevisionTopic" ▪ "TransportTopic" ▪ "TravelTopic" ▪ "VideoGamesTopic" ▪ "WeatherTopic"
Human Languages
"Afrikaans" ▪ "Albanian" ▪ "Amharic" ▪ "Arabic" ▪ "Armenian" ▪ "Azerbaijani" ▪ "Basque" ▪ "Bengali" ▪ "Bosnian" ▪ "Bulgarian" ▪ "Catalan" ▪ "Chinese" ▪ "Croatian" ▪ "Czech" ▪ "Danish" ▪ "Dutch" ▪ "English" ▪ "Esperanto" ▪ "Estonian" ▪ "Finnish" ▪ "French" ▪ "Georgian" ▪ "German" ▪ "Greek" ▪ "Gujarati" ▪ "Hebrew" ▪ "Hindi" ▪ "Hungarian" ▪ "Icelandic" ▪ "InuktitutGreenlandic" ▪ "Italian" ▪ "Japanese" ▪ "Kannada" ▪ "Kazakh" ▪ "Khmer" ▪ "Korean" ▪ "Latvian" ▪ "Lithuanian" ▪ "Macedonian" ▪ "Majhi" ▪ "Malay" ▪ "Malayalam" ▪ "Mongolian" ▪ "Nepali" ▪ "NorwegianBokmal" ▪ "Persian" ▪ "Polish" ▪ "Portuguese" ▪ "Romanian" ▪ "Russian" ▪ "Serbian" ▪ "Sinhala" ▪ "Slovak" ▪ "Slovenian" ▪ "Spanish" ▪ "Swahili" ▪ "Swedish" ▪ "Tagalog" ▪ "Tamil" ▪ "Telugu" ▪ "Thai" ▪ "Turkish" ▪ "Ukrainian" ▪ "Urdu" ▪ "UzbekNorthern" ▪ "Vietnamese" ▪ "Welsh"
Programming Languages
"ABAP" ▪ "Ada" ▪ "AWK" ▪ "BourneShell" ▪ "C" ▪ "CPlusPlus" ▪ "CSharp" ▪ "COBOL" ▪ "CommonLisp" ▪ "D" ▪ "Dart" ▪ "Delphi" ▪ "Erlang" ▪ "FSharp" ▪ "Fortran" ▪ "Groovy" ▪ "Haskell" ▪ "Java" ▪ "JavaScript" ▪ "Logo" ▪ "Lua" ▪ "MATLAB" ▪ "ObjectiveC" ▪ "Perl" ▪ "PHP" ▪ "Prolog" ▪ "Python" ▪ "R" ▪ "Ruby" ▪ "Rust" ▪ "SAS" ▪ "Scala" ▪ "Scheme" ▪ "SQL" ▪ "Swift" ▪ "Tcl" ▪ "VBSCript" ▪ "VisualBasicNET" ▪ "WindowsPowerShell" ▪ "WolframLanguage"