Tabular Data Cleaning

Data cleaning is the process of preparing data and removing obstacles to further processing. It tends to consume a large share of the resources in a data science project, so dedicated tools for the different cleaning tasks help make those tasks routine and more automatic. The Wolfram Language provides a rich collection of data cleaning tools. Structural cleaning tools change the structure of data, from splitting and combining columns to pivoting between column values and column names. Value cleaning tools handle missing values and outliers that would otherwise obstruct further processing.

Column Keys

ColumnKeys get column keys

RenameColumns set column keys
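A minimal sketch of inspecting and renaming column keys. The sample data and key names are invented for illustration, and the old -> new rule list passed to RenameColumns is an assumed argument form, not one stated on this page.

```wolfram
(* Hypothetical sample data: each association becomes one row *)
tab = Tabular[{
    <|"nm" -> "Ada", "yr" -> 1815|>,
    <|"nm" -> "Alan", "yr" -> 1912|>
  }];

(* Get the current column keys *)
ColumnKeys[tab]

(* Set clearer keys; the old -> new rule form is an assumption *)
renamed = RenameColumns[tab, {"nm" -> "Name", "yr" -> "BirthYear"}];
ColumnKeys[renamed]
```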

Column Types

ColumnTypes get column types

CastColumns set column types

Reorganize Columns

TransformColumns separate or combine columns

DeleteColumns  ▪  InsertColumns

Restructure Tabular Values

PivotToColumns spread values from a single column into several columns

PivotFromColumns gather values from several columns into one column
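The two pivot directions can be sketched on a small long-form table. The data is hypothetical, and both argument forms shown (a key column -> value column rule for PivotToColumns, and a list of column keys for PivotFromColumns) are assumptions to check against each function's reference page.

```wolfram
(* Hypothetical long-form data: one row per city and year *)
long = Tabular[{
    <|"City" -> "Oslo", "Year" -> "2023", "Rain" -> 800|>,
    <|"City" -> "Oslo", "Year" -> "2024", "Rain" -> 760|>,
    <|"City" -> "Rome", "Year" -> "2023", "Rain" -> 590|>,
    <|"City" -> "Rome", "Year" -> "2024", "Rain" -> 640|>
  }];

(* Long to wide: values of "Year" become column keys,
   filled from "Rain" (assumed argument form) *)
wide = PivotToColumns[long, "Year" -> "Rain"]

(* Wide to long: gather the year columns back into a single
   column of values (assumed argument form) *)
PivotFromColumns[wide, {"2023", "2024"}]
```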

Handle Missing Values

TransformMissing specify how to handle missing values, such as by imputing them

MissingFallback  ▪  MissingValuePattern  ▪  Missing
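The argument forms of TransformMissing are not given on this page, so the following sketch illustrates the idea of mean imputation on a plain list, using only long-standing functions (Missing, DeleteMissing, Mean and ReplaceAll). The sample values are invented.

```wolfram
(* Hypothetical measurements with two missing entries *)
values = {4, Missing["NotAvailable"], 10, Missing["NotAvailable"], 7};

(* Mean of the values that are present: (4 + 10 + 7)/3 = 7 *)
fill = Mean[DeleteMissing[values]];

(* Replace every Missing[...] entry with that mean *)
imputed = values /. _Missing -> fill
(* {4, 7, 10, 7, 7} *)
```

Mean imputation is only one possible policy. Dropping the affected rows or filling with a fixed default are common alternatives.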

Handle Extreme Values

TransformAnomalies specify how to handle extreme values, such as by clipping them

FindAnomalies  ▪  DeleteAnomalies  ▪  Clip  ▪  ...
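As a small illustration of clipping versus deleting extreme values, the sketch below applies Clip and Select to a plain list. The readings and the accepted range of 0 to 100 are hypothetical. The anomaly functions listed above detect extremes rather than taking fixed bounds, and their argument forms are not shown here.

```wolfram
(* Hypothetical sensor readings with two implausible values *)
readings = {12, 15, -40, 14, 250, 13};

(* Clip: pull out-of-range values back to the nearest bound *)
Clip[readings, {0, 100}]
(* {12, 15, 0, 14, 100, 13} *)

(* Delete: keep only the readings inside the accepted range *)
Select[readings, 0 <= # <= 100 &]
(* {12, 15, 14, 13} *)
```

Clipping keeps the row count unchanged, which matters when other columns must stay aligned. Deleting avoids piling up artificial values at the bounds.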