Tabular Data Cleaning
Data cleaning is the process of preparing data for further processing by removing the obstacles that stand in its way. Cleaning often consumes a large share of the effort in a data science project, and a dedicated tool for each cleaning task helps make that work routine and easier to automate. The Wolfram Language provides a rich collection of data cleaning tools. Structural cleaning tools change the shape of the data, from splitting and combining columns to pivoting between column values and column names. Value cleaning tools handle missing or outlier values that would otherwise obstruct further processing.
Column Keys
ColumnKeys — get column keys
RenameColumns — set column keys
Column Types
ColumnTypes — get column types
CastColumns — set column types
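Casting a column means converting every value to a target type and deciding what to do with values that do not convert. The sketch below illustrates that idea in plain Python; it is not the Wolfram Language CastColumns or ColumnTypes API. It assumes a table modeled as a list of dicts, uses None as a stand-in for a Missing value, and the helpers cast_column and column_types are hypothetical.

```python
# Illustrative sketch only: a table is a list of dicts (one dict per row),
# and None stands in for a missing value. Not the Wolfram Language API.

def cast_column(rows, key, convert):
    """Return new rows with column `key` converted by `convert`.

    Values that fail to convert become None (treated as missing).
    """
    result = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not modified
        value = row.get(key)
        try:
            new_row[key] = None if value is None else convert(value)
        except (TypeError, ValueError):
            new_row[key] = None
        result.append(new_row)
    return result


def column_types(rows):
    """Collect the set of Python type names seen in each column."""
    types = {}
    for row in rows:
        for key, value in row.items():
            types.setdefault(key, set()).add(type(value).__name__)
    return types


# Made-up sample rows whose numeric values arrived as strings.
rows = [{"id": "1", "price": "9.50"}, {"id": "2", "price": "n/a"}]
cleaned = cast_column(cast_column(rows, "id", int), "price", float)
print(cleaned)  # [{'id': 1, 'price': 9.5}, {'id': 2, 'price': None}]
print(column_types(cleaned))
```

Turning unconvertible values into missing values, rather than raising an error, keeps the cast step separate from the decision about how to fill them.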
Reorganize Columns
TransformColumns — separate or combine columns
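Separating one column into several, or combining several into one, is a per-row transformation that replaces old columns with new ones. The plain-Python sketch below shows the idea only; it does not reproduce the Wolfram Language TransformColumns syntax. The list-of-dicts table model, the None-for-missing convention, and the helpers split_column and combine_columns are assumptions made for illustration.

```python
# Illustrative sketch only: rows are dicts, None marks a missing value.

def split_column(rows, key, new_keys, sep):
    """Split string column `key` on `sep` into the columns `new_keys`."""
    result = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != key}
        # Split at most len(new_keys) - 1 times so extra separators stay in the last part.
        parts = row[key].split(sep, len(new_keys) - 1)
        # Pad short values with None so every new column is present.
        parts += [None] * (len(new_keys) - len(parts))
        new_row.update(zip(new_keys, parts))
        result.append(new_row)
    return result


def combine_columns(rows, keys, new_key, sep):
    """Join the columns `keys` into one string column `new_key`."""
    result = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k not in keys}
        new_row[new_key] = sep.join(str(row[k]) for k in keys)
        result.append(new_row)
    return result


rows = [{"name": "Ada Lovelace", "born": 1815}]
split = split_column(rows, "name", ["first", "last"], " ")
joined = combine_columns(split, ["first", "last"], "name", " ")
print(split)   # [{'born': 1815, 'first': 'Ada', 'last': 'Lovelace'}]
print(joined)  # [{'born': 1815, 'name': 'Ada Lovelace'}]
```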
Restructure Tabular Values
PivotToColumns — spread values from a single column into several columns
PivotFromColumns — gather values from several columns into a single column
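Pivoting moves information between column names and column values: gathering turns a "wide" table into a "long" one with a name column and a value column, and spreading reverses it. The plain-Python sketch below illustrates both directions; it is not the Wolfram Language PivotToColumns or PivotFromColumns API. The list-of-dicts table model and the helper names are hypothetical, and the spreading helper assumes the remaining column values are hashable.

```python
# Illustrative sketch only: rows are dicts; helper names are hypothetical.

def pivot_from_columns(rows, value_keys, name_key, value_key):
    """Wide -> long: gather `value_keys` into (name_key, value_key) pairs."""
    long_rows = []
    for row in rows:
        base = {k: v for k, v in row.items() if k not in value_keys}
        for key in value_keys:
            long_rows.append({**base, name_key: key, value_key: row.get(key)})
    return long_rows


def pivot_to_columns(rows, name_key, value_key):
    """Long -> wide: spread `value_key` into one column per distinct `name_key`."""
    grouped = {}
    for row in rows:
        # The remaining columns identify which output row a value belongs to.
        base = {k: v for k, v in row.items() if k not in (name_key, value_key)}
        ident = tuple(sorted(base.items()))
        grouped.setdefault(ident, dict(base))[row[name_key]] = row[value_key]
    return list(grouped.values())


# Made-up sample values.
wide = [{"store": "A", "q1": 10, "q2": 12}, {"store": "B", "q1": 7, "q2": 9}]
long = pivot_from_columns(wide, ["q1", "q2"], "quarter", "sales")
back = pivot_to_columns(long, "quarter", "sales")
print(long[0])       # {'store': 'A', 'quarter': 'q1', 'sales': 10}
print(back == wide)  # True
```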
Handle Missing Values
TransformMissing — handle missing values, for example by imputing them
MissingFallback ▪ MissingValuePattern ▪ Missing
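Imputation replaces missing entries with a value derived from the data (such as the column mean) or with a fixed fallback. The plain-Python sketch below shows those two strategies; it is not the Wolfram Language TransformMissing API, whose method names and options may differ. None stands in for Missing, and the helper impute_missing with its method and fallback parameters is hypothetical.

```python
# Illustrative sketch only: None marks a missing value.
from statistics import mean


def impute_missing(rows, key, method="mean", fallback=None):
    """Fill None entries in column `key`.

    method="mean": use the mean of the present values (or `fallback` if none).
    method="constant": use `fallback` directly.
    """
    present = [row[key] for row in rows if row.get(key) is not None]
    if method == "mean":
        fill = mean(present) if present else fallback
    elif method == "constant":
        fill = fallback
    else:
        raise ValueError(f"unknown method: {method}")
    # Build new rows so the input is left unchanged.
    return [{**row, key: fill if row.get(key) is None else row[key]} for row in rows]


rows = [{"x": 1.0}, {"x": None}, {"x": 3.0}]
print(impute_missing(rows, "x"))                                   # [{'x': 1.0}, {'x': 2.0}, {'x': 3.0}]
print(impute_missing(rows, "x", method="constant", fallback=0.0))  # [{'x': 1.0}, {'x': 0.0}, {'x': 3.0}]
```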
Handle Extreme Values
TransformAnomalies — handle extreme values, for example by clipping them
FindAnomalies ▪ DeleteAnomalies ▪ Clip ▪ ...
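Handling extreme values usually takes two steps: decide which values are anomalous, then delete them or clip them into an acceptable range. The plain-Python sketch below illustrates those steps using Tukey's interquartile-range fences as the detector. That rule is a swapped-in, well-known heuristic; FindAnomalies and TransformAnomalies are not claimed to use it. The helpers tukey_fences, find_anomalies and clip_values are hypothetical.

```python
# Illustrative sketch only: Tukey's IQR fences stand in for a real anomaly detector.
from statistics import quantiles


def tukey_fences(values, k=1.5):
    """Return (low, high) bounds: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4)  # three quartile cut points
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_anomalies(values, k=1.5):
    """List the values that fall outside the fences."""
    low, high = tukey_fences(values, k)
    return [v for v in values if v < low or v > high]


def clip_values(values, low, high):
    """Clamp each value into the closed interval [low, high]."""
    return [min(max(v, low), high) for v in values]


values = [10, 12, 11, 13, 12, 95]  # made-up sample with one extreme value
low, high = tukey_fences(values)
print(find_anomalies(values))         # [95]
print(clip_values(values, low, high))  # [10, 12, 11, 13, 12, 67.625]
```

Clipping keeps the row count unchanged, which matters when other columns still carry useful data. Deleting anomalies is the alternative when the extreme value makes the whole record untrustworthy.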