Wolfram Language & System Documentation Center

"KMeans" (Machine Learning Method)

See Also
- FindClusters
- ClusterClassify
- ClusteringComponents
- ClusteringTree
- Dendrogram
- DimensionReduction
- Methods:
  - Agglomerate
  - GaussianMixture
  - JarvisPatrick
  - KMedoids
  - MeanShift
  - NeighborhoodContraction
  - SpanningTree
  - Spectral
Tech Notes
- Partitioning Data into Clusters

"KMeans" (Machine Learning Method)

Method for FindClusters, ClusterClassify and ClusteringComponents.
Partitions data into a specified number of clusters of similar elements using a k-means clustering algorithm.

Details & Suboptions

"KMeans" is a classic, simple, centroid-based clustering method. It works well when clusters have similar sizes and are compact and isotropically distributed around their centroids. When clusters have very different sizes, are anisotropic or intertwined, or when outliers are present, "KMeans" is likely to give poor results.
The following plots show the results of the "KMeans" method applied to toy datasets:
The "KMeans" method aims to find k centroids defining k clusters. Each data point is assigned to its nearest centroid, and all points assigned to a given centroid form a cluster.
The procedure to find the best k centroids is iterative. The search starts with random centroids, and each point is assigned to its nearest centroid:
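The assignment rule can be sketched in Python. The sketch assumes that each data point is a tuple of floats and that distances are Euclidean. The names `nearest` and `assign` are hypothetical helpers for illustration and are not part of the Wolfram Language:

```python
import math

def nearest(p, centroids):
    """Index of the centroid closest to p (Euclidean); ties go to the lowest index."""
    return min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))

def assign(points, centroids):
    """Assign every point to its nearest centroid."""
    return [nearest(p, centroids) for p in points]

points = [(0.0, 0.0), (0.5, 0.2), (5.0, 5.0), (5.5, 4.8)]
centroids = [(0.0, 0.0), (5.0, 5.0)]
print(assign(points, centroids))  # [0, 0, 1, 1]
```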

Once all clusters are defined, the mean of each cluster becomes a new centroid:
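A minimal Python sketch of this update step follows, under the same assumption that points are tuples of floats. The name `update` is a hypothetical helper. The page does not say how an empty cluster is handled, so the sketch simply returns `None` for one:

```python
def update(points, labels, k):
    """Return the mean of each of the k clusters (None for an empty cluster)."""
    new_centroids = []
    for j in range(k):
        members = [p for p, label in zip(points, labels) if label == j]
        if members:
            # zip(*members) groups the coordinates; each coordinate is averaged
            new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            new_centroids.append(None)  # assumption: empty clusters get no centroid
    return new_centroids

points = [(0.0, 0.0), (2.0, 0.0), (10.0, 10.0), (10.0, 12.0)]
print(update(points, [0, 0, 1, 1], 2))  # [(1.0, 0.0), (10.0, 11.0)]
```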

This procedure is repeated until the clusters remain unchanged. This iterative procedure is sometimes called "hard EM" (hard Expectation Maximization).
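Combining the two steps gives the following Python sketch of the hard-EM loop. It illustrates the technique and is not the Wolfram Language implementation. Several details are assumptions:

- the function name `kmeans`;
- the `max_iter` safeguard;
- the `seed` parameter;
- starting from k randomly sampled data points;
- keeping the old centroid when a cluster becomes empty.

```python
import math
import random

def kmeans(points, k, initial_centroids=None, max_iter=100, seed=None):
    """Hard-EM k-means: alternate nearest-centroid assignment and mean update."""
    rng = random.Random(seed)
    # Default start (an assumption): k distinct data points drawn at random
    centroids = list(initial_centroids) if initial_centroids else rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # clusters unchanged: the procedure has converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its cluster
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = kmeans(points, 2, seed=0)
print(labels, centroids)
```

With these two well-separated groups, the first three points end up sharing one label and the last three the other. Only the numbering of the two clusters depends on which starting points are drawn.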
The "KMeans" method is similar to the "GaussianMixture" method with a spherical covariance shared by all clusters (that is, all clusters are isotropic and have the same size).
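The link to a Gaussian mixture can be checked numerically. The Python sketch below assumes equal mixing weights and one shared covariance σ²I for every component. Both are assumptions made for illustration, and the sketch does not call the "GaussianMixture" method.

Under these assumptions, the log density decreases monotonically with squared distance. The most likely component for a point is therefore its nearest mean, which is exactly the "KMeans" assignment rule:

```python
import math

def gaussian_log_density(x, mean, sigma):
    """Log density of an isotropic Gaussian N(mean, sigma^2 I) evaluated at x."""
    d = len(x)
    sq = sum((a - b) ** 2 for a, b in zip(x, mean))
    return -0.5 * d * math.log(2 * math.pi * sigma ** 2) - sq / (2 * sigma ** 2)

def most_likely_component(x, means, sigma):
    # With equal mixing weights the weights cancel, so only the densities matter
    return max(range(len(means)), key=lambda j: gaussian_log_density(x, means[j], sigma))

def nearest_centroid(x, means):
    return min(range(len(means)), key=lambda j: math.dist(x, means[j]))

means = [(0.0, 0.0), (4.0, 1.0), (-2.0, 3.0)]
points = [(1.0, 1.0), (3.0, 0.0), (-1.5, 2.0), (0.5, -0.5)]
for sigma in (0.1, 1.0, 10.0):
    print(sigma, [most_likely_component(p, means, sigma) for p in points],
          [nearest_centroid(p, means) for p in points])
```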
Since the initial centroids are chosen randomly, results may differ from one evaluation to the next.
The suboption "InitialCentroids" can be used to specify the initial centroids as a list of data points.
The following suboption can be given:
"InitialCentroids" Automatic a list of initial centroids

Examples


Basic Examples (3)

Find exactly four clusters of nearby values using the "KMeans" clustering method:

Create random 2D vectors:

Plot computed clusters using the "KMeans" method:

Train a ClassifierFunction on a list of strings:

Find the cluster assignments and gather the elements by their cluster:

Options (3)

DistanceFunction (1)

Cluster data using Manhattan distance:
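Changing the distance function changes which centroid counts as nearest. The Python sketch below uses hypothetical helpers `manhattan`, `euclidean` and `assign` to show a point whose nearest centroid differs under the two distances.

This page does not describe how the centroid update is adapted for non-Euclidean distances. The sketch therefore covers only the assignment step:

```python
import math

def euclidean(p, q):
    return math.dist(p, q)

def manhattan(p, q):
    """Sum of absolute coordinate differences (L1 distance)."""
    return sum(abs(a - b) for a, b in zip(p, q))

def assign(points, centroids, distance):
    """Assign each point to the centroid that is nearest under `distance`."""
    return [min(range(len(centroids)), key=lambda j: distance(p, centroids[j])) for p in points]

centroids = [(0.0, 0.0), (2.0, 3.0)]
points = [(4.0, 0.0)]
print(assign(points, centroids, euclidean))  # [1]  (4 vs. about 3.61)
print(assign(points, centroids, manhattan))  # [0]  (4 vs. 5)
```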

"InitialCentroids" (2)

Generate a list of 100 random colors:

Cluster the colors without specifying the initial configuration of centroids using the "KMeans" method:

Specify the initial colors to be used as centroids using the "KMeans" method:

Create random 2D vectors:

Find different clusterings of data using the "KMeans" method by varying the "InitialCentroids":
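A tiny dataset shows why the initial centroids matter: four points at the corners of a 4 × 1 rectangle. In this Python sketch, the helpers `kmeans` and `within_cluster_sse` are hypothetical. The starting centroids are passed explicitly, in the spirit of "InitialCentroids".

One start converges to the natural left/right split. Another gets stuck in a top/bottom split with a much larger within-cluster sum of squares:

```python
import math

def kmeans(points, initial_centroids, max_iter=100):
    """Hard-EM k-means started from the given centroids (no randomness)."""
    centroids = list(initial_centroids)
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid, ties to the lowest index
        new_labels = [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each non-empty cluster's centroid to its mean
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids

def within_cluster_sse(points, labels, centroids):
    """Sum of squared distances from each point to its cluster centroid."""
    return sum(sum((a - b) ** 2 for a, b in zip(p, centroids[lab]))
               for p, lab in zip(points, labels))

points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]
for init in ([(0.0, 0.0), (4.0, 0.0)], [(0.0, 0.0), (0.0, 1.0)]):
    labels, centroids = kmeans(points, init)
    print(init, labels, within_cluster_sse(points, labels, centroids))
# [(0.0, 0.0), (4.0, 0.0)] [0, 0, 1, 1] 1.0
# [(0.0, 0.0), (0.0, 1.0)] [0, 1, 0, 1] 16.0
```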

Possible Issues (1)

Create and visualize noisy 2D moon-shaped training and test datasets:

Train a ClassifierFunction using "KMeans" for two clusters and find clusters in the test set:

Visualizing clusters indicates that "KMeans" performs poorly on intertwined clusters:
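The failure has a simple geometric cause. With two centroids, assigning each point to the nearer one splits the plane along a straight line, and these two interlocking half-circles cannot be separated by a line.

The Python sketch below reproduces this on noise-free moons generated by a hypothetical `make_moons` helper. The documentation's example uses noisy data and Wolfram Language functions, which are not reproduced here.

```python
import math

def make_moons(n=21):
    """Two interleaved half-circles without noise (a simplified, hypothetical generator)."""
    ts = [math.pi * i / (n - 1) for i in range(n)]
    upper = [(math.cos(t), math.sin(t)) for t in ts]
    lower = [(1 - math.cos(t), 0.5 - math.sin(t)) for t in ts]
    return upper + lower, [0] * n + [1] * n

def kmeans(points, initial_centroids, max_iter=100):
    """Hard-EM k-means started from the given centroids."""
    centroids = list(initial_centroids)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids

points, truth = make_moons()
labels, _ = kmeans(points, [points[0], points[-1]])
# Count errors under the better of the two possible label matchings
errors = min(sum(a != b for a, b in zip(labels, truth)),
             sum(a == b for a, b in zip(labels, truth)))
print("misassigned points:", errors)
```

The count cannot be zero. The point (1, 0) on the upper moon lies inside the triangle formed by (0, 0.5), (2, 0.5) and (1, -0.5) on the lower moon, so no straight line separates the two moons.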
