Parquet (.parquet)

Background

    • Registered MIME types: application/vnd.apache.parquet
    • Efficient, general-purpose, column-oriented data format.
    • Developed by the Apache Software Foundation.
    • Binary file format.
    • Supports multiple compression methods.

Import & Export

  • Import["file.parquet"] imports a Parquet file as a Tabular object.
  • Import["file.parquet",elem] imports the specified elements.
  • Import["file.parquet",{elem,subelem1,…}] imports the subelements subelem_i, useful for partial data import.
  • The import format can be specified with Import["file","Parquet"] or Import["file",{"Parquet",elem,…}].
  • Export["file.parquet",expr] creates a Parquet file from expr.
  • Supported expressions expr include:
      {v1,v2,…}                        a single column of data
      {{v11,v12,…},{v21,v22,…},…}      lists of rows of data
      array                            an array such as SparseArray, QuantityArray, etc.
      dataset                          a Dataset or a Tabular object
  • See the following reference pages for full general information:
      Import, Export                    import from or export to a file
      CloudImport, CloudExport          import from or export to a cloud object
      ImportString, ExportString        import from or export to a string
      ImportByteArray, ExportByteArray  import from or export to a byte array
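
The round trip described above can be sketched as follows. This is a minimal illustration, assuming a Wolfram Language version with Tabular and Parquet support; the column names, values, and temporary file name are invented for the example and do not come from this page.

```wolfram
(* Hypothetical sample data; column names and values are illustrative only *)
data = Tabular[{
    <|"id" -> 1, "name" -> "alpha", "score" -> 0.5|>,
    <|"id" -> 2, "name" -> "beta", "score" -> 1.5|>,
    <|"id" -> 3, "name" -> "gamma", "score" -> 2.5|>}];

(* Write to a temporary location; the file name is an arbitrary choice *)
file = FileNameJoin[{$TemporaryDirectory, "example.parquet"}];
Export[file, data];

(* Default import, which this page says returns a Tabular object *)
Import[file]

(* Name the format explicitly, e.g. when the file extension is missing *)
Import[file, "Parquet"]
```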

Import Elements

  • General Import elements:
      "Elements"    list of elements and options available in this file
      "Summary"     summary of the file
      "Rules"       list of rules for all available elements
  • Data representation elements:
      "Data"        two-dimensional array
      "Dataset"     table data as a Dataset
      "Tabular"     a Tabular object
  • Import by default uses the "Tabular" element.
  • Subelements for partial data import for the "Tabular" element can take row and column specifications in the form {"Tabular",rows,cols}, where rows and cols can be any of the following:
      n             nth row or column
      -n            counts from the end
      n;;m          from n through m
      n;;m;;s       from n through m with steps of s
      {n1,n2,…}     specific rows or columns ni
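
The row and column specifications listed above can be tried on a small file. The sketch below assumes a Wolfram Language version with Tabular and Parquet support; the data, column names, and file name are hypothetical. The rows-only form {"Tabular",rows} is an assumption suggested by the "Import only selected rows" examples on this page, not a form spelled out in the text.

```wolfram
(* Hypothetical 10-row table with three columns, written to a temporary file *)
data = Tabular[Table[<|"n" -> i, "square" -> i^2, "cube" -> i^3|>, {i, 10}]];
file = FileNameJoin[{$TemporaryDirectory, "partial.parquet"}];
Export[file, data];

(* Rows 1 through 3 (assumed rows-only form) *)
Import[file, {"Tabular", 1 ;; 3}]

(* Every second row from 1 through 9, restricted to columns 1 and 3 *)
Import[file, {"Tabular", 1 ;; 9 ;; 2, {1, 3}}]

(* All rows, written as 1 through the last row, with only the second column *)
Import[file, {"Tabular", 1 ;; -1, 2}]
```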
  • Data descriptor elements:
      "ColumnLabels"     names of columns
      "ColumnTypes"      association with data type for each column
      "Schema"           TabularSchema object
  • Metadata elements:
      "ColumnCount"      number of columns stored in file
      "Dimensions"       data dimensions
      "RowCount"         number of rows stored in file
      "MetaInformation"  metadata

Options

  • General Import options:
      IncludeMetaInformation  All        metadata types to import
      "Schema"                Automatic  schema used to construct Tabular object
  • General Export options:
      "Compression"           None       compression method
      CompressionLevel        Automatic  compression level
  • The following settings for "Compression" are supported:
      None         no compression
      "Brotli"     Brotli compression
      "GZIP"       GZIP compression
      "LZ4"        LZ4 compression
      "LZ4Hadoop"  LZ4 Hadoop compression
      "Snappy"     Snappy compression
      "ZSTD"       ZSTD compression

Examples

Basic Examples  (3)

Import a Tabular object from a Parquet file:

Import the file summary:

Export a Tabular object to Parquet:

Scope  (3)

Import  (3)

Show all elements available in the file:

By default, a Tabular object is returned:

Import column types:
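
The example inputs for these captions did not survive on this page. As a hedged sketch, the element names documented in the Import Elements section can be queried as shown below; the sample data and file name are hypothetical, and no outputs are shown because they depend on the file.

```wolfram
(* Hypothetical sample file so that the queries below have something to read *)
data = Tabular[{
    <|"city" -> "Oslo", "population" -> 700000|>,
    <|"city" -> "Bergen", "population" -> 290000|>}];
file = FileNameJoin[{$TemporaryDirectory, "elements.parquet"}];
Export[file, data];

(* General elements *)
Import[file, "Elements"]
Import[file, "Summary"]

(* Data descriptor elements *)
Import[file, "ColumnLabels"]
Import[file, "ColumnTypes"]
Import[file, "Schema"]

(* Metadata elements *)
Import[file, "RowCount"]
Import[file, "ColumnCount"]
Import[file, "Dimensions"]
```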

Import Elements  (14)

"ColumnCount"  (1)

Get the number of columns:

"ColumnLabels"  (1)

Read column names:

"ColumnTypes"  (1)

Import column types:

"Data"  (2)

Get the data from a file:

Import only selected rows:

Import only selected columns:

"Dataset"  (2)

Get the data as a Dataset:

Import only selected rows:

Import only selected columns:

"Dimensions"  (1)

Import data dimensions:

"MetaInformation"  (1)

Import metadata:

"RowCount"  (1)

Get the number of rows:

"Schema"  (1)

Get the TabularSchema object:

"Summary"  (1)

Get the file summary:

"Tabular"  (2)

Get the data from a file as a Tabular object:

Import only selected rows:

Import only selected columns:

Import Options  (2)

IncludeMetaInformation  (1)

By default, all metadata stored in a file is imported and embedded in the Tabular object:

Do not import metadata:
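
A possible form of these two imports is sketched below. The setting None is an assumption: this page documents only the default All for IncludeMetaInformation, and None is the conventional Wolfram Language value for switching such an option off. The data and file name are hypothetical.

```wolfram
(* Hypothetical sample file *)
data = Tabular[{<|"x" -> 1, "y" -> 2|>, <|"x" -> 3, "y" -> 4|>}];
file = FileNameJoin[{$TemporaryDirectory, "meta.parquet"}];
Export[file, data];

(* Default behavior: IncludeMetaInformation -> All *)
withMeta = Import[file];

(* Assumed setting for skipping metadata on import *)
withoutMeta = Import[file, IncludeMetaInformation -> None];
```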

"Schema"  (1)

Export a Tabular object to a Parquet file:

By default, column labels and their types stored in a file are used when Tabular or Dataset objects are imported:

Use the "Schema" option to specify column labels and types:

Export Options  (4)

"Compression"  (2)

Compression is disabled by default:

Compare supported compression methods:
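
One way to make such a comparison is to export the same data once per documented "Compression" setting and record the resulting file sizes. This is a sketch under assumptions: the data is synthetic, the file name is arbitrary, and the sizes depend on the data and on the Wolfram Language version, so none are quoted here.

```wolfram
(* Synthetic, fairly repetitive data so that compression has something to do *)
data = Tabular[Table[<|"n" -> i, "group" -> Mod[i, 5], "square" -> i^2|>, {i, 10000}]];
file = FileNameJoin[{$TemporaryDirectory, "compression.parquet"}];

(* The "Compression" settings listed in the Options section of this page *)
methods = {None, "Brotli", "GZIP", "LZ4", "LZ4Hadoop", "Snappy", "ZSTD"};

(* Export once per method, overwriting the file, and map each method to its file size in bytes *)
sizes = AssociationMap[
   FileByteCount[Export[file, data, "Compression" -> #]] &,
   methods]
```

Overwriting a single temporary file keeps the sketch short; writing one file per method would allow the outputs to be inspected afterwards.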

CompressionLevel  (2)

By default, CompressionLevel is set to Automatic, which corresponds to a different default level for each compression method:

Use maximal compression for each method: