TSV (.tsv)

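Background

    • MIME type: text/tab-separated-values
    • TSV tabular data format.
    • Stores numerical and textual information in lines, with fields separated by tab characters.
    • TSV is an acronym for Tab-Separated Values.
    • Plain-text format.
    • Similar to CSV.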
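The following Python sketch illustrates the format itself, not the Wolfram Language Import/Export implementation. It uses the standard `csv` module to write a small made-up table as TSV and read it back. It shows that TSV is plain text with tab-separated fields and one row per line. It also shows that a plain TSV reader returns every field as a string.

```python
import csv
import io

# Made-up sample table: one header row and two data rows.
rows = [["item", "count"], ["apples", 3], ["pears", 5]]

# Write the rows as TSV: fields separated by tab characters, one row per line.
buf = io.StringIO()
writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
writer.writerows(rows)
text = buf.getvalue()
print(repr(text))

# Read the text back; a plain TSV reader returns every field as a string.
parsed = list(csv.reader(io.StringIO(text), delimiter="\t"))
print(parsed)
```

Turning "3" back into a number is a separate interpretation step. This is the kind of behavior the Import options further down control.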

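Import and Export

  • Import["file.tsv"] returns a two-dimensional array of strings and numbers, representing the rows and columns stored in the file.
  • Import["file.tsv",elem] imports the specified element from a TSV file.
  • Import["file.tsv",{elem,sub1,…}] imports subelements sub1, …, which is especially useful for importing partial data.
  • The import format can be specified with Import["file","TSV"] or Import["file",{"TSV",elem,…}].
  • Export["file.tsv",expr] creates a TSV file from expr.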
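  • Supported expressions expr include:
  • {v1,v2,…}                       a single column of data
    {{v11,v12,…},{v21,v22,…},…}     a list of rows of data
    array                           an array, including SparseArray and QuantityArray
    tseries                         a TimeSeries, EventSeries or TemporalData object
    Dataset[…]                      a dataset
    Tabular[…]                      a tabular object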
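  • See the following reference pages for full general information:
  • Import, Export                      import from or export to a file
    CloudImport, CloudExport            import from or export to a cloud object
    ImportString, ExportString          import from or export to a string
    ImportByteArray, ExportByteArray    import from or export to a byte array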

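Import Elements

  • General Import elements:
  • "Elements"     list of elements and options available in this file
    "Rules"        list of rules for all available elements
    "Summary"      summary of the file
  • Elements representing data:
  • "Data"         two-dimensional array
    "Grid"         table data as a Grid object
    "RawData"      two-dimensional array of strings
    "Dataset"      table data as a Dataset
    "Tabular"      table data as a Tabular object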
  • Data descriptor elements:
  • "ColumnLabels"    names of columns
    "ColumnTypes"     association of column names and types
    "Schema"          TabularSchema object
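  • By default, Import and Export use the "Data" element.
  • For partial data import, any data representation element elem can be given row and column specifications in the form {elem,rows,cols}, where rows and cols can be any of the following:
  • n            nth row or column
    -n           counts from the end
    n;;m         from n through m
    n;;m;;s      from n through m with steps of s
    {n1,n2,…}    specific rows or columns ni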
  • Metadata elements:
  • "ColumnCount"    number of columns
    "Dimensions"     a list of the number of rows and the maximum number of columns
    "RowCount"       number of rows

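The row and column specifications above are 1-based, and n;;m includes both ends. Python slicing, by contrast, is 0-based and excludes the end. The hypothetical helper `take_rows` below (not part of any library) sketches how the two conventions line up. Spans are only handled for positive n and m.

```python
def take_rows(rows, spec):
    """Hypothetical helper: select rows with a 1-based, Wolfram-style spec.

    spec can be:
      n (positive int)     -> the nth row
      -n (negative int)    -> the nth row counted from the end
      (n, m) or (n, m, s)  -> rows n through m inclusive, step s (positive n, m only)
      [n1, n2, ...]        -> the listed rows
    """
    if isinstance(spec, int):
        if spec == 0:
            raise IndexError("row indices are 1-based; 0 is not a valid row")
        # Positive indices shift to 0-based; negative ones already count from the end.
        return rows[spec - 1] if spec > 0 else rows[spec]
    if isinstance(spec, tuple):
        n, m, *rest = spec
        step = rest[0] if rest else 1
        # n;;m is inclusive at both ends, while a Python slice excludes its end.
        return rows[n - 1:m:step]
    if isinstance(spec, list):
        return [take_rows(rows, i) for i in spec]
    raise TypeError(f"unsupported spec: {spec!r}")


rows = ["r1", "r2", "r3", "r4", "r5", "r6"]
print(take_rows(rows, 2))          # r2
print(take_rows(rows, -1))         # r6
print(take_rows(rows, (2, 4)))     # ['r2', 'r3', 'r4']
print(take_rows(rows, (1, 6, 2)))  # ['r1', 'r3', 'r5']
print(take_rows(rows, [1, 3]))     # ['r1', 'r3']
```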
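Options

  • Import and Export options:
  • "EmptyField"          ""      how to represent empty fields
    "QuotingCharacter"    "\""    character used to delimit non-numeric fields
  • Data fields containing commas or line separators are typically enclosed in double-quote characters. By default, Export uses the double-quote character as the delimiter for such fields. Use "QuotingCharacter" to specify a different character.
  • By default, the double-quote characters that delimit text fields are not imported.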
  • Import options:
  • CharacterEncoding             "UTF8ISOLatin1"   raw character encoding used in the file
    "ColumnTypeDetectionDepth"    Automatic         number of rows used for header detection
    "CurrencyTokens"              None              currency units to be skipped when importing numerical values
    "DateStringFormat"            None              date format, given as a DateString specification
    "FieldSeparator"              "\t"              string token taken to separate columns
    "FillRows"                    Automatic         whether to fill rows to the maximum column length
    "HeaderLines"                 Automatic         number of lines to assume as headers
    "IgnoreEmptyLines"            False             whether to ignore empty lines
    MissingValuePattern           Automatic         patterns used to specify missing elements
    "NumberPoint"                 "."               decimal point string
    "Numeric"                     Automatic         whether to import data fields as numbers if possible
    "Schema"                      Automatic         schema used to construct the Tabular object
    "SkipInvalidLines"            False             whether to skip invalid lines
    "SkipLines"                   Automatic         number of lines to skip at the beginning of the file
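  • By default, Import attempts to interpret the data as "UTF8"-encoded text. If any sequence of bytes stored in the file cannot be represented in "UTF8", Import uses "ISOLatin1" instead.
  • With CharacterEncoding -> Automatic, Import attempts to infer the character encoding of the file.
  • Possible settings for "HeaderLines" and "SkipLines" are:
  • Automatic      try to automatically determine the number of rows to skip or use as header
    n              skip n rows or use them as Dataset headers
    {rows,cols}    skip or use as headers the given numbers of rows and columns
  • Import converts table entries formatted as specified by the "DateStringFormat" option to DateObject.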
  • Export options:
  • Alignment                   None        how data is aligned within table columns
    CharacterEncoding           "UTF8"      raw character encoding used in the file
    "FillRows"                  False       whether to fill rows to the maximum column length
    "IncludeQuotingCharacter"   Automatic   whether to add quotations around exported values
    "TableHeadings"             Automatic   headings for table columns and rows
  • Possible settings for Alignment are None, Left, Center and Right.
  • "IncludeQuotingCharacter" can be set to the following values:
  • None        do not enclose any values in quotes
    Automatic   only enclose values in quotes when needed
    All         enclose all valid values in quotes
  • "TableHeadings" can be set to the following values:
  • None                 skip column labels
    Automatic            export column labels
    {"col1","col2",…}    list of column labels
    {rhead,chead}        specifies separate labels for the rows and columns
  • Export encodes line separator characters using the convention of the computer system on which the Wolfram Language is being run.

Examples


Basic Examples  (3)

Import a TSV file:

Import summary of a TSV file:

Export an array of expressions to TSV:

Scope  (8)

Import  (4)

Import metadata from a TSV file:

Import a TSV file as a Tabular object with automatic header detection:

Import without headers, while skipping the first line:

Import a sample row of a TSV:

Analyze a single column of a file; start by looking at column labels and their types:

Get all values for one column:

Compute the mean:

Export  (4)

Export a Tabular object:

Use "TableHeadings" option to remove header from a Tabular object:

Export a TimeSeries:

Export an EventSeries:

Export a QuantityArray:

Import Elements  (26)

"ColumnCount"  (1)

Get the number of columns from a TSV file:

"ColumnLabels"  (1)

Get the inferred column labels from a TSV file:

"ColumnTypes"  (1)

Get the inferred column types from a TSV file:

"Data"  (6)

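Import a TSV file as a two-dimensional list of values:

This is the default element:

Import a single row of a TSV file:

Import specific rows of a TSV file:

Import the first 3 rows of a TSV file:

Import a single row and column from a TSV file:

Import a single column from a TSV file: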

"Dataset"  (3)

Import a TSV file as a Dataset:

Use "HeaderLines" to treat the first row as column headings:

Use "SkipLines" to import only the data of interest:

"Dimensions"  (1)

Import the dimensions of a TSV file:

If not all rows in the file have the same number of columns, the maximum number of columns is used:
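A rough Python analog of the "Dimensions" element for ragged data counts the parsed rows and takes the widest row as the column count. The helper `tsv_dimensions` is a hypothetical name. It assumes no header handling and no skipped lines, so it is not a description of how Import computes this element.

```python
import csv
import io

def tsv_dimensions(text):
    """Hypothetical helper: return [row count, maximum column count] for TSV text."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    # Ragged rows are allowed; the widest row determines the column count.
    max_cols = max((len(r) for r in rows), default=0)
    return [len(rows), max_cols]

print(tsv_dimensions("a\tb\tc\n1\t2\n"))  # [2, 3]
```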

"Grid"  (1)

Import TSV data as a Grid:

"RawData"  (3)

Import TSV data as raw strings:

Compare with "Data":

By default, "Numeric"->False is used for "RawData":

Use "Numeric"->True:

By default, "FillRows"->True is used for "RawData":

Use "FillRows"->False:

"RowCount"  (1)

Get the number of rows from a TSV file:

"Schema"  (1)

Get the TabularSchema object:

"Summary"  (1)

Summary of a TSV file:

"Tabular"  (6)

Import a TSV file as a Tabular object:

Use "HeaderLines" and "SkipLines" options to only import the data of interest:

Import a single row:

Import multiple rows:

Import the first 5 rows:

Import a single element at a given row and column:

Import a single column:

Import Options  (15)

CharacterEncoding  (1)

The character encoding can be set to any value from $CharacterEncodings:
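The default decoding behavior described in the options notes is to try "UTF8" and fall back to "ISOLatin1". The Python sketch below illustrates that idea only, not the Wolfram Language implementation. The function name `decode_tsv_bytes` is hypothetical.

```python
def decode_tsv_bytes(data: bytes) -> str:
    """Hypothetical helper: decode as UTF-8, falling back to ISO Latin-1."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte 0-255 to a character, so this cannot fail.
        return data.decode("latin-1")

print(decode_tsv_bytes("café\t1\n".encode("utf-8")))  # valid UTF-8
print(decode_tsv_bytes(b"caf\xe9\t1\n"))               # invalid UTF-8, decoded as Latin-1
```

Both calls yield the same text. The second byte string is Latin-1 data that is not valid UTF-8.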

"ColumnTypeDetectionDepth"  (1)

By default, only several dozen rows from the beginning of the file are used to detect column types:

Use more rows to detect column types:

"CurrencyTokens"  (1)

Currency symbols are automatically ignored:

Use "CurrencyTokens"->None to keep currency symbols:
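Skipping currency tokens can be pictured as removing a known currency symbol from a field before trying to read it as a number. The token list and the helper `parse_amount` below are illustrative assumptions, not the tokens or logic Import uses.

```python
def parse_amount(field, tokens=("$", "€", "£")):
    """Hypothetical helper: drop a leading or trailing currency token, then parse a number."""
    s = field.strip()
    for token in tokens:
        if s.startswith(token):
            s = s[len(token):].strip()
        elif s.endswith(token):
            s = s[:-len(token)].strip()
    try:
        return float(s)
    except ValueError:
        # Not a number even without the token: keep the original text.
        return field

print(parse_amount("$12.50"))  # 12.5
print(parse_amount("7 €"))     # 7.0
print(parse_amount("n/a"))     # n/a
```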

"DateStringFormat"  (1)

Convert dates to DateObject using the specified date format:

By default, no conversion takes place:

"EmptyField"  (1)

Specify a default value for empty fields in TSV data:

"FieldSeparator"  (1)

By default, "\t" is used as a field separator:

Use "," as a field separator:

"FillRows"  (1)

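For the "Data" element, row lengths are preserved by default:

Fill rows to the same length:

For the "RawData" element, a full array is imported by default: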
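Filling rows means padding each shorter row up to the length of the longest one. Here is a minimal Python sketch. It assumes the empty string as the fill value, which matches the default "EmptyField" setting of "" listed above. `fill_rows` is a hypothetical name.

```python
def fill_rows(rows, empty=""):
    """Hypothetical helper: pad every row to the length of the longest row."""
    width = max((len(r) for r in rows), default=0)
    return [r + [empty] * (width - len(r)) for r in rows]

ragged = [["a", "b", "c"], ["1"], ["2", "3"]]
print(fill_rows(ragged))  # [['a', 'b', 'c'], ['1', '', ''], ['2', '3', '']]
```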

"HeaderLines"  (1)

The header line is automatically detected by default:

Use "HeaderLines" option when automatic header detection is incorrect:

Specify row headers:

Specify row and column headers:

"IgnoreEmptyLines"  (1)

Use "IgnoreEmptyLines" to remove lines without data from the imported data:
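In Python's `csv` module, a blank line is returned as an empty list. Dropping those lists is a simple analog of ignoring empty lines. This illustrates the concept only; it does not reproduce Import's implementation.

```python
import csv
import io

text = "a\tb\n\n1\t2\n"

# Python's csv reader returns an empty list for a blank line.
all_rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
print(all_rows)   # [['a', 'b'], [], ['1', '2']]

# Dropping those empty lists corresponds to ignoring empty lines.
non_empty = [row for row in all_rows if row]
print(non_empty)  # [['a', 'b'], ['1', '2']]
```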

MissingValuePattern  (1)

By default, an automatic set of values is considered missing:

Use MissingValuePattern->None to disable missing element detection:

Use string patterns to find missing elements:
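The idea behind MissingValuePattern can be sketched in Python with regular expressions. Any field that fully matches one of the patterns becomes a missing marker, `None` here. The pattern list is a made-up example, not the automatic set Import uses. An empty list plays the role of MissingValuePattern->None. `mark_missing` is a hypothetical name.

```python
import re

def mark_missing(row, patterns):
    """Hypothetical helper: replace fields that fully match any pattern with None."""
    compiled = [re.compile(p) for p in patterns]
    return [None if any(p.fullmatch(f) for p in compiled) else f for f in row]

example_patterns = [r"", r"(?i)n/?a", r"-"]  # illustrative only
row = ["1", "NA", "", "x", "-"]
print(mark_missing(row, example_patterns))  # ['1', None, None, 'x', None]
print(mark_missing(row, []))                # ['1', 'NA', '', 'x', '-']
```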

"Numeric"  (1)

Use "Numeric"->True to interpret numbers:

By default, everything is imported as strings:

"NumberPoint"  (1)

By default, "." is used to specify decimal point character for floating-point data:

Use "NumberPoint" option to specify decimal point character for floating-point data:
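A decimal point other than "." can be handled by normalizing it before numeric parsing. The hypothetical Python helper `parse_number` sketches this and keeps non-numeric fields as strings. It is loosely analogous to "NumberPoint" combined with numeric interpretation, not a description of Import's parser. It also does not reject a "." when a different decimal point is given.

```python
def parse_number(field, number_point="."):
    """Hypothetical helper: parse a field as int or float using a given decimal point."""
    text = field.replace(number_point, ".") if number_point != "." else field
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            pass
    return field  # not numeric: keep the string

print(parse_number("3.25"))       # 3.25
print(parse_number("3,25", ","))  # 3.25
print(parse_number("42"))         # 42
print(parse_number("abc"))        # abc
```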

"QuotingCharacter"  (1)

The default quoting character is a double quote:

A different quoting character can be specified:
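The Python sketch below shows quoting on the reading side. With a quoting character of ", a field may contain a tab, and the surrounding quotes are dropped when the data is read. This matches the earlier note that delimiting double quotes are not imported. A different character, here a single quote, is passed the same way. The `quotechar` argument mirrors, but is not, the "QuotingCharacter" option.

```python
import csv
import io

text = 'name\tnote\n"Smith, J."\t"tab\there"\n'
rows = list(csv.reader(io.StringIO(text), delimiter="\t", quotechar='"'))
print(rows)  # [['name', 'note'], ['Smith, J.', 'tab\there']]

# The same data quoted with single quotes instead.
text_single = "name\tnote\n'Smith, J.'\t'tab\there'\n"
rows_single = list(csv.reader(io.StringIO(text_single), delimiter="\t", quotechar="'"))
print(rows_single == rows)  # True
```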

"Schema"  (1)

Import automatically infers column labels and types from data stored in a TSV file:

Use "Schema" option to specify column types:

"SkipLines"  (1)

TSV files may contain comment lines:

Skip the comment lines:

Skip the comment lines and use the next line as the Tabular header:
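Combining "SkipLines" with "HeaderLines" amounts to discarding leading rows and then splitting off header rows. In this hypothetical Python helper, skipping counts parsed rows rather than physical lines. The two can differ when a quoted field spans several lines. The sample text is made up.

```python
import csv
import io

def read_tsv(text, skip_lines=0, header_lines=1):
    """Hypothetical helper: skip leading rows, then split off header rows."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))[skip_lines:]
    return rows[:header_lines], rows[header_lines:]

text = "# exported by some tool\nx\ty\n1\t2\n3\t4\n"
header, body = read_tsv(text, skip_lines=1)
print(header)  # [['x', 'y']]
print(body)    # [['1', '2'], ['3', '4']]
```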

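Export Options  (7)

Alignment  (1)

By default, no additional characters are added for alignment:

Left-align the column values:

Center-align the column values: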

CharacterEncoding  (1)

The character encoding can be set to any value from $CharacterEncodings:

"EmptyField"  (1)

By default, empty elements are exported as empty strings:

Specify a different value for empty elements:

"FillRows"  (1)

Row lengths are preserved by default:

Use "FillRows"->False to preserve the row lengths:

"IncludeQuotingCharacter"  (1)

By default, Export only exports quotation characters for values that need them:

Use "IncludeQuotingCharacter"->All to enclose all values in quotes:

Use "IncludeQuotingCharacter"->None to export all values without quotes. Note that headers are always enclosed in quotes:
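Python's `csv` module has quoting modes that roughly parallel the None, Automatic and All settings of "IncludeQuotingCharacter". This is an analogy, not a description of Export. In particular, `csv.QUOTE_ALL` also quotes numbers, and the Python writer treats header rows like any other row. `write_row` is a hypothetical helper.

```python
import csv
import io

def write_row(row, quoting):
    """Hypothetical helper: write one TSV row with the given csv quoting mode."""
    buf = io.StringIO()
    csv.writer(buf, delimiter="\t", lineterminator="\n", quoting=quoting).writerow(row)
    return buf.getvalue()

plain = ["a", "b c", 1]
needs_quotes = ["a", "b\tc", 1]

print(repr(write_row(plain, csv.QUOTE_NONE)))            # like None
print(repr(write_row(needs_quotes, csv.QUOTE_MINIMAL)))  # like Automatic: quote only when needed
print(repr(write_row(plain, csv.QUOTE_ALL)))             # like All
```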

"QuotingCharacter"  (1)

The default quoting character used for non-numeric elements is a double quote:

Specify a different quoting character:

Use "QuotingCharacter"->"" to export all values without quotes. Note that headers are always enclosed in quotes:

"TableHeadings"  (1)

By default, column headers are exported:

Use "TableHeadings"->None to skip column headers:

Export data with custom column headers:

Export data with custom column and row headers:

Applications  (1)

Export a list of European countries and their populations to a TSV file:

Import the data back and convert it to expressions:

Possible Issues  (12)

If all rows in the file do not have the same number of columns, some rows may be considered as invalid:

Entries of the format "nnnDnnn" or "nnnEnnn" are interpreted as numbers with scientific notation:

Use the "Numeric" option to override this interpretation:
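The same pitfall can be reproduced outside the Wolfram Language. A field such as 12D3 or 12E3, perhaps a product code, becomes 12000 once it is read in scientific notation. The hypothetical Python helper below assumes D is treated like E, as in Fortran-style exponents. Returning the original text unchanged corresponds to keeping fields as strings.

```python
def interpret_scientific(field):
    """Hypothetical helper: read 'nnnDnnn' or 'nnnEnnn' as a float, else keep the text."""
    candidate = field.replace("D", "E").replace("d", "e")
    try:
        return float(candidate)
    except ValueError:
        return field

print(interpret_scientific("12E3"))  # 12000.0
print(interpret_scientific("12D3"))  # 12000.0 (the code "12D3" is lost)
print(interpret_scientific("ABC"))   # ABC
```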

Numeric interpretation may result in a loss of precision:

Use the "Numeric" option to override this interpretation:
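The following Python example illustrates why numeric interpretation can lose precision. A long integer read into an IEEE 754 double keeps only about 15 to 17 significant digits. This is not the Wolfram Language number parser; it only shows the underlying floating-point limit. The same limit is relevant to the later note about integers above Developer`$MaxMachineInteger being imported as real numbers.

```python
s = "12345678901234567890"  # more digits than a 64-bit float can hold exactly

as_float = float(s)             # numeric interpretation as a machine float
print(int(as_float))            # 12345678901234567168
print(int(as_float) == int(s))  # False: the last digits changed

# Keeping the field as text (or parsing it with int) preserves every digit.
print(int(s))                   # 12345678901234567890
```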

Starting from Version 14.2, currency tokens are not automatically skipped:

Use the "CurrencyTokens" option to skip such tokens:

Starting from Version 14.2, quoting characters are added when the column of integer values contains numbers greater than Developer`$MaxMachineInteger:

Use "IncludeQuotingCharacter"->None to get the previous result:

Starting from Version 14.2, some strings are automatically considered missing:

Use MissingValuePattern->None to override this interpretation:

Starting from Version 14.2, real numbers with 0 fractional part are exported as integers:

Use "Backend"->"Table" to get the previous result:

Starting from Version 14.2, integers greater than Developer`$MaxMachineInteger are imported as real numbers:

Use "Backend"->"Table" to get the previous result:

Starting from Version 14.2, date and time columns of Tabular objects are exported using DateString:

Use "Backend"->"Table" to get the previous result:

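Some TSV data generated by earlier versions of the Wolfram Language may contain incorrectly delimited text fields and may not import as expected in Version 11.2:

Use "QuotingCharacter"->"" to get the previously expected result: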

The top-left corner of data is lost when importing a Dataset with row and column headers:

Dataset may look different depending on the dimensions of the data: