SemanticImport
SemanticImport[file]
attempts to import a file semantically to give a Dataset object.
SemanticImport[file,type]
attempts to interpret all elements in the file as being of the specified type.
SemanticImport[file,{type1,type2,…}]
attempts to interpret elements in successive columns as being of the specified types.
SemanticImport[file,<|col1->type1,col2->type2,…|>]
keeps only the columns coli specified by their positions or names.
SemanticImport[file,typespec,form]
puts the result in the specified form.
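Below is a minimal sketch of several of the call forms above. It runs against a small CSV file that the sketch writes itself. The file name and its contents are hypothetical. Only the "String", "Integer" and "Real" types are used, so no Wolfram Knowledgebase lookups should be needed.

```wolfram
(* Hypothetical sample file: a header row and two data rows *)
file = FileNameJoin[{$TemporaryDirectory, "semanticimport-forms.csv"}];
Export[file, {{"item", "count", "price"}, {"apple", 3, 1.25}, {"pear", 5, 0.8}}, "CSV"];

(* SemanticImport[file]: types chosen automatically, result is a Dataset *)
ds = SemanticImport[file];

(* SemanticImport[file, type]: interpret every element as a string *)
allStrings = SemanticImport[file, "String"];

(* SemanticImport[file, {type1, type2, ...}]: one type per column, in order *)
typed = SemanticImport[file, {"String", "Integer", "Real"}];

(* SemanticImport[file, typespec, form]: same types, returned as a list of rows *)
rows = SemanticImport[file, {"String", "Integer", "Real"}, "Rows"];
```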
Details and Options
- In SemanticImport[file], file can be specified as File["path"] or simply "path".
- SemanticImport is primarily intended for one- and two-dimensional arrays of elements.
- SemanticImport can use free-form linguistics to interpret elements in the structure it is given.
- Types of objects returned include numbers, Quantity objects, Entity objects, DateObject, GeoPosition, etc.
- SemanticImport makes detailed assumptions, for example about date formats, by looking at all elements in particular rows or columns of the input.
- Possible values for type include:
    Automatic           choose type automatically
    "String"            Unicode string
    "Number"            number in any standard format
    "Integer"           integer in decimal notation
    "Real"              real in decimal notation
    "Quantity"          quantity with units
    "Currency"          currency amount
    "Date"              date in any standard format
    "DateTime"          date and time
    "Time"              time of day
    "GeoCoordinates"    geo position specified as latitude, longitude
    "URL"               correctly formatted URL
    "EmailAddress"      correctly formatted email address
    "Country"           country given in natural language
    "City"              city given in natural language
    None                skip a column
    ispec               any basic form used by Interpreter
- The following options can be given to indicate features of the input:
    Option              Default      Description
    CharacterEncoding   Automatic    assumed encoding of the input file
    Delimiters          Automatic    delimiters between elements
    HeaderLines         Automatic    line numbers to treat as headers
    ExcludedLines       {}           lines to exclude from the result
    MissingDataRules    {}           rules for replacing data to be considered "missing"
- Possible values for form include:
    "Dataset"        a row-oriented dataset
    "List"           a single column as a list
    "Columns"        a list of columns, each given as a list
    "NamedColumns"   an association from each column name to a list of that column's contents
    "Rows"           a list of rows, each given as a list
    "NamedRows"      a list of rows, each given as an association from column name to content
- When elements cannot be interpreted, forms returned in their place include:
    Missing["Empty"]                   an empty or whitespace-only element
    Missing["Invalid","string"]        data with invalid or meaningless fields
    Missing["Unrecognized","string"]   an element that could not be parsed
    Missing["ByDesignation",value]     an element matching MissingDataRules
    Missing[custom]                    a Missing[…] provided through MissingDataRules
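Here is a sketch of the input-feature options together with a custom Missing value. It assumes that MissingDataRules accepts a list of rules, each mapping a string to the Missing[…] expression that should replace it. That reading is consistent with the Missing[custom] entry above, but the exact rule syntax is an assumption. The semicolon-delimited file and its "Unknown" placeholder are hypothetical.

```wolfram
(* Hypothetical semicolon-delimited file with a header and an "Unknown" placeholder *)
file = FileNameJoin[{$TemporaryDirectory, "semanticimport-options.txt"}];
Export[file, "name;score\nada;90\nbob;Unknown\n", "Text"];

rows = SemanticImport[file, {"String", "Integer"}, "Rows",
  Delimiters -> ";",  (* split each line on semicolons *)
  HeaderLines -> 1,   (* treat the first line as the header *)
  (* assumed rule syntax: string -> Missing[...] replacement *)
  MissingDataRules -> {"Unknown" -> Missing["UnknownData"]}
];
```

Without the MissingDataRules setting, "Unknown" cannot be read as an integer. It would then presumably come back as one of the other Missing forms listed above.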
Examples
Basic Examples (7)
Import a file, automatically detecting and interpreting dates and cities:
Columns shown in bold correspond to semantic objects in the Wolfram Language:
Import a file with the specified column types:
Import only some columns of a file, in the specified format, using column numbers:
Import only some columns of a file, in the specified format, using column names:
Import only some columns, specifying None for columns that should be dropped:
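Below is a sketch of three ways to keep only some columns: by position, by header name, and with None in a positional type list. It assumes that the position/name specification is written as an Association of col -> type rules. The file name and data are hypothetical, and the "Columns" form is used so that each result is a plain list of columns.

```wolfram
(* Hypothetical three-column CSV written to a temporary file *)
file = FileNameJoin[{$TemporaryDirectory, "semanticimport-columns.csv"}];
Export[file, {{"item", "count", "price"}, {"apple", 3, 1.25}, {"pear", 5, 0.8}}, "CSV"];

(* Keep columns 1 and 3, selected by position *)
byPosition = SemanticImport[file, <|1 -> "String", 3 -> "Real"|>, "Columns"];

(* Keep the same columns, selected by header name *)
byName = SemanticImport[file, <|"item" -> "String", "price" -> "Real"|>, "Columns"];

(* Give a type for every column, using None to drop the middle one *)
withNone = SemanticImport[file, {"String", None, "Real"}, "Columns"];
```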
Scope (3)
Import a file using a given character encoding:
Import a file using the given delimiter:
Specify that the first line of the file to import is a header:
Specify that the first and fifth lines of a file should be skipped:
Return elements given as "Unknown" in the special form Missing["UnknownData"]:
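The following sketch shows CharacterEncoding and ExcludedLines on a hypothetical file that starts with a comment line. ExcludedLines is assumed to take a list of 1-based line numbers in the file, as the "first and fifth lines" caption suggests. Because the data here is plain ASCII, the CharacterEncoding setting only shows where the option goes.

```wolfram
(* Hypothetical file: a comment line followed by two data rows, no header *)
file = FileNameJoin[{$TemporaryDirectory, "semanticimport-scope.csv"}];
Export[file, "# comment\nada,90\nbob,85\n", "Text"];

rows = SemanticImport[file, {"String", "Integer"}, "Rows",
  CharacterEncoding -> "UTF8", (* decode the file as UTF-8 *)
  ExcludedLines -> {1}         (* skip the comment line *)
];
```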
Options (7)
SemanticImport uses many of the same options as SemanticImportString. See SemanticImportString for more examples.
CharacterEncoding (1)
Delimiters (1)
ExcludedLines (1)
Applications (6)
Import a table containing the flight costs from London to many countries as a Dataset object:
Get the geographic position of London:
Get the maximum price of a flight:
Make a map showing the least expensive flight routes in blue and the most expensive ones in orange:
Import the data for a timeline of personal emails:
Get the values that are in the "family" category:
Import the first and third columns from a table of salaries for college faculty members:
Import a dataset consisting of dates and numeric values as a Dataset object:
Obtain the data as a list of rows:
Specify that dates should be interpreted as strings:
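A sketch of the same idea on a hypothetical two-column file of ISO dates and values follows. The first column is imported once as "Date" and once as "String", both in the "Rows" form. The header is declared explicitly with HeaderLines -> 1.

```wolfram
(* Hypothetical file of ISO dates and numeric values with a header row *)
file = FileNameJoin[{$TemporaryDirectory, "semanticimport-dates.csv"}];
Export[file, "date,value\n2014-01-05,12\n2014-02-05,15\n", "Text"];

(* First column interpreted as dates: elements should become DateObject expressions *)
asDates = SemanticImport[file, {"Date", "Number"}, "Rows", HeaderLines -> 1];

(* Same column kept as plain strings *)
asStrings = SemanticImport[file, {"String", "Number"}, "Rows", HeaderLines -> 1];
```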
Import a dataset containing a list of famous buildings and their properties as a Dataset object. Cities and countries are automatically detected as Entity objects:
Import only the Name, Country, and Height columns of the famous buildings dataset:
Possible Issues (3)
Automatic selection chooses from a smaller set of types than Interpreter supports:
Specify explicit types to import Entity objects rather than strings:
An Automatic type specifies an automatically selected number of columns:
An {Automatic} type specifies a single column of automatically selected type:
Automatic in a type list applies to the corresponding column sequentially:
The default Automatic selection of header lines can be incorrect, depending on whether data is organized in rows or columns:
Specify the number of header lines explicitly to import the data correctly:
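Below is a sketch contrasting Automatic, {Automatic}, and Automatic inside a type list. It uses a hypothetical three-column file and sets HeaderLines -> 1 explicitly so that header detection is not left to chance. The expected column counts follow the descriptions above: all columns, a single column, and one column per list entry.

```wolfram
(* Hypothetical three-column CSV written to a temporary file *)
file = FileNameJoin[{$TemporaryDirectory, "semanticimport-automatic.csv"}];
Export[file, {{"item", "count", "price"}, {"apple", 3, 1.25}, {"pear", 5, 0.8}}, "CSV"];

(* Automatic: the number of columns and their types are both chosen automatically *)
allCols = SemanticImport[file, Automatic, "Columns", HeaderLines -> 1];

(* {Automatic}: a single column whose type is chosen automatically *)
firstCol = SemanticImport[file, {Automatic}, "Columns", HeaderLines -> 1];

(* Automatic inside a list applies only to the column at its position *)
mixed = SemanticImport[file, {"String", Automatic, "Real"}, "Columns", HeaderLines -> 1];
```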
Text
Wolfram Research (2014), SemanticImport, Wolfram Language function, https://reference.wolfram.com/language/ref/SemanticImport.html (updated 2016).
CMS
Wolfram Language. 2014. "SemanticImport." Wolfram Language & System Documentation Center. Wolfram Research. Last Modified 2016. https://reference.wolfram.com/language/ref/SemanticImport.html.
APA
Wolfram Language. (2014). SemanticImport. Wolfram Language & System Documentation Center. Retrieved from https://reference.wolfram.com/language/ref/SemanticImport.html