FindClusters
Details and Options




- FindClusters partitions a list into sublists (clusters) of similar elements. The number and composition of the clusters are influenced by the input data, the method and the evaluation criterion used. The elements can belong to a variety of data types, including numerical, textual and image data, as well as dates and times.
- Clustering is typically used to find classes of elements such as customer types, animal taxonomies, document topics, etc. in an unsupervised way. For supervised classification, see Classify.
- Labels for the input examples ei can be given in the following formats:
{e1,e2,…}    use the ei themselves
{e1→v1,e2→v2,…}    a list of rules between the element ei and the label vi
{e1,e2,…}→{v1,v2,…}    a rule between all the elements and all the labels
label1→e1,label2→e2,…    the labels as Association keys
- The number of clusters can be specified in the following ways:
Automatic    find the number of clusters automatically
n    find exactly n clusters
UpTo[n]    find at most n clusters
- The following options can be given:
CriterionFunction    Automatic    criterion for selecting a method
DistanceFunction    Automatic    the distance function to use
FeatureExtractor    Identity    how to extract features from which to learn
FeatureNames    Automatic    feature names to assign for input data
FeatureTypes    Automatic    feature types to assume for input data
Method    Automatic    what method to use
MissingValueSynthesis    Automatic    how to synthesize missing values
PerformanceGoal    Automatic    aspect of performance to optimize
RandomSeeding    1234    what seeding of pseudorandom generators should be done internally
Weights    Automatic    what weight to give to each example
- By default, FindClusters will preprocess the data automatically unless a DistanceFunction is specified.
- The setting for DistanceFunction can be any distance or dissimilarity function, or a function f defining a distance between two values.
- Possible settings for PerformanceGoal include:
Automatic    automatic tradeoff among speed, accuracy, and memory
"Quality"    maximize the accuracy of the clustering
"Speed"    maximize the speed of the clustering
- Possible settings for Method include:
Automatic    automatically select a method
"Agglomerate"    single-linkage clustering algorithm
"DBSCAN"    density-based spatial clustering of applications with noise
"GaussianMixture"    variational Gaussian mixture algorithm
"JarvisPatrick"    Jarvis–Patrick clustering algorithm
"KMeans"    k-means clustering algorithm
"KMedoids"    partitioning around medoids
"MeanShift"    mean-shift clustering algorithm
"NeighborhoodContraction"    shift data points toward high-density regions
"SpanningTree"    minimum spanning tree-based clustering algorithm
"Spectral"    spectral clustering algorithm
- The methods "KMeans" and "KMedoids" can only be used when the number of clusters is specified.
- The methods "DBSCAN", "GaussianMixture", "JarvisPatrick", "MeanShift" and "NeighborhoodContraction" can only be used when the number of clusters is Automatic.
- The following plots show results of common methods on toy datasets:
- Possible settings for CriterionFunction include:
"StandardDeviation"    root-mean-square standard deviation
"RSquared"    R-squared
"Dunn"    Dunn index
"CalinskiHarabasz"    Calinski–Harabasz index
"DaviesBouldin"    Davies–Bouldin index
"Silhouette"    Silhouette score
Automatic    internal index
- Possible settings for RandomSeeding include:
Automatic    automatically reseed every time the function is called
Inherited    use externally seeded random numbers
seed    use an explicit integer or string as a seed

Examples
Basic Examples (4): Summary of the most common use cases
Find clusters of nearby values:

https://wolfram.com/xid/0b0kmgigs1w-jpdaco


https://wolfram.com/xid/0b0kmgigs1w-dzpf7o
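A minimal sketch of this example (the original input cell is only linked above; the values below are illustrative):
FindClusters[{1, 2, 3, 8, 9, 10, 25, 26, 27}]
(* nearby values are grouped together, e.g. {{1, 2, 3}, {8, 9, 10}, {25, 26, 27}} *)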

Represent clustered elements with the right-hand sides of each rule:

https://wolfram.com/xid/0b0kmgigs1w-h9jjdw
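A hedged sketch with illustrative elements and labels (the linked cell's data is not preserved here):
FindClusters[{1 -> "a", 2 -> "b", 3 -> "c", 8 -> "x", 9 -> "y", 10 -> "z"}]
(* clusters are returned as the right-hand sides, e.g. {{"a", "b", "c"}, {"x", "y", "z"}} *)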

Represent clustered elements with the keys of the association:

https://wolfram.com/xid/0b0kmgigs1w-gabdcd
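A sketch with an illustrative association (not the linked cell's data):
FindClusters[<|"a" -> 1, "b" -> 2, "c" -> 3, "x" -> 8, "y" -> 9, "z" -> 10|>]
(* clusters are returned as the association keys, e.g. {{"a", "b", "c"}, {"x", "y", "z"}} *)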

Scope (6): Survey of the scope of standard use cases
Cluster vectors of real values:

https://wolfram.com/xid/0b0kmgigs1w-icwhxs
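For instance (illustrative vectors, not the linked cell's data):
FindClusters[{{1., 2.}, {1.2, 2.1}, {5., 6.}, {5.1, 6.2}, {9., 1.}, {9.2, 1.1}}]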

Cluster data of any precision:

https://wolfram.com/xid/0b0kmgigs1w-ndtrw

https://wolfram.com/xid/0b0kmgigs1w-iaczwq

Cluster Boolean True, False data:

https://wolfram.com/xid/0b0kmgigs1w-grm7nn
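An illustrative sketch of clustering Boolean vectors (the actual data is in the linked cell):
FindClusters[{{True, True, False}, {True, True, True}, {False, False, True}, {False, False, False}}, 2]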


https://wolfram.com/xid/0b0kmgigs1w-bwu8u


https://wolfram.com/xid/0b0kmgigs1w-wdy3pa

https://wolfram.com/xid/0b0kmgigs1w-01rtjz


https://wolfram.com/xid/0b0kmgigs1w-s75f1y

https://wolfram.com/xid/0b0kmgigs1w-lsioch


https://wolfram.com/xid/0b0kmgigs1w-4q05d0

https://wolfram.com/xid/0b0kmgigs1w-b5l05p

Options (15): Common values & functionality for each option
CriterionFunction (1)
Generate some separated data and visualize it:

https://wolfram.com/xid/0b0kmgigs1w-00c176

Cluster the data using different settings for CriterionFunction:

https://wolfram.com/xid/0b0kmgigs1w-ys32xz

https://wolfram.com/xid/0b0kmgigs1w-33iqa5
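A sketch under assumed data and criteria (the criteria "Silhouette" and "CalinskiHarabasz" are picked from the table above; the linked cells may use others):
data = Join[RandomVariate[NormalDistribution[0, 0.3], {100, 2}], RandomVariate[NormalDistribution[3, 0.3], {100, 2}]];
c1 = FindClusters[data, CriterionFunction -> "Silhouette"];
c2 = FindClusters[data, CriterionFunction -> "CalinskiHarabasz"];
{Length[c1], Length[c2]}  (* the chosen criterion can change the number of clusters found *)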
Compare the two clusterings of the data:

https://wolfram.com/xid/0b0kmgigs1w-0u8j8j


https://wolfram.com/xid/0b0kmgigs1w-60ogcx

DistanceFunction (4)
Use CanberraDistance as the measure of distance for continuous data:

https://wolfram.com/xid/0b0kmgigs1w-c1lu32

Clusters obtained with the default SquaredEuclideanDistance:

https://wolfram.com/xid/0b0kmgigs1w-m1mdnd
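A sketch with illustrative continuous data, contrasting the two distance settings:
data = {{1.1, 2.0}, {1.2, 2.1}, {5.3, 7.0}, {5.4, 7.2}, {9.9, 0.5}, {10.1, 0.4}};
FindClusters[data, DistanceFunction -> CanberraDistance]
FindClusters[data]  (* default SquaredEuclideanDistance *)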

Use DiceDissimilarity as the measure of distance for Boolean data:

https://wolfram.com/xid/0b0kmgigs1w-rza4n

https://wolfram.com/xid/0b0kmgigs1w-w1nomf

Use MatchingDissimilarity as the measure of distance for Boolean data:

https://wolfram.com/xid/0b0kmgigs1w-2srsmp
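A hedged sketch for both Boolean dissimilarity measures, with made-up Boolean vectors:
bools = {{True, True, False, False}, {True, True, True, False}, {False, False, True, True}, {False, False, False, True}};
FindClusters[bools, 2, DistanceFunction -> DiceDissimilarity]
FindClusters[bools, 2, DistanceFunction -> MatchingDissimilarity]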

Use HammingDistance as the measure of distance for string data:

https://wolfram.com/xid/0b0kmgigs1w-xjylmz

https://wolfram.com/xid/0b0kmgigs1w-frdmbi
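An illustrative sketch (equal-length strings are assumed, since HammingDistance compares strings character by character):
strings = {"ATCG", "ATCC", "ATTC", "GCCG", "GCCA", "GCTA"};
FindClusters[strings, DistanceFunction -> HammingDistance]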

Define a distance function as a pure function:

https://wolfram.com/xid/0b0kmgigs1w-gdyy3i

https://wolfram.com/xid/0b0kmgigs1w-bl8xio
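For example, an absolute-difference distance given as a pure function (illustrative data):
FindClusters[{-2.5, -2.4, 1.3, 1.5, 7.0, 7.2}, DistanceFunction -> (Abs[#1 - #2] &)]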

FeatureExtractor (1)
Find clusters for a list of images:

https://wolfram.com/xid/0b0kmgigs1w-dfdw4g

https://wolfram.com/xid/0b0kmgigs1w-zzmk1z

Create a custom FeatureExtractor to extract features:

https://wolfram.com/xid/0b0kmgigs1w-ip44vf


https://wolfram.com/xid/0b0kmgigs1w-s0sgf5
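A hedged sketch covering both image examples; the synthetic images and the pixel-flattening extractor function are stand-ins for whatever the linked cells use:
imgs = Join[Table[Image[RandomReal[{0, 0.2}, {16, 16}]], 4], Table[Image[RandomReal[{0.8, 1}, {16, 16}]], 4]];
FindClusters[imgs, 2]  (* automatic image feature extraction *)
FindClusters[imgs, 2, FeatureExtractor -> (Flatten[ImageData[#]] &)]  (* custom extractor: raw pixel vector *)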

FeatureNames (1)
Use FeatureNames to name features, and refer to their names in further specifications:

https://wolfram.com/xid/0b0kmgigs1w-48k90i
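A sketch under assumptions: the feature names "value" and "tag" and the data are invented here, purely to show names being reused in a later option:
data = {{1, "A"}, {2, "A"}, {10, "B"}, {11, "B"}};
FindClusters[data, 2, FeatureNames -> {"value", "tag"}, FeatureTypes -> <|"tag" -> "Nominal"|>]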

FeatureTypes (1)
Use FeatureTypes to enforce the interpretation of the features:

https://wolfram.com/xid/0b0kmgigs1w-1cc459

Compare it to the result obtained by assuming nominal features:

https://wolfram.com/xid/0b0kmgigs1w-jbdp7r
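An illustrative comparison (not the linked cells' data): the same values clustered under an enforced nominal interpretation versus the default numerical one:
FindClusters[{1, 2, 3, 10, 11, 12}, 2, FeatureTypes -> "Nominal"]
FindClusters[{1, 2, 3, 10, 11, 12}, 2]  (* default: numerical interpretation *)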

Method (4)
Cluster the data hierarchically:

https://wolfram.com/xid/0b0kmgigs1w-be97pa

Clusters obtained with the default method:

https://wolfram.com/xid/0b0kmgigs1w-cjvdme
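A sketch with illustrative values, contrasting hierarchical (single-linkage) clustering with the automatic method:
data = {1, 2, 10, 12, 3, 1, 13, 25};
FindClusters[data, Method -> "Agglomerate"]
FindClusters[data]  (* default automatic method *)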

Generate normally distributed data and visualize it:

https://wolfram.com/xid/0b0kmgigs1w-7zq5le

Cluster the data into 4 clusters using the k-means method:

https://wolfram.com/xid/0b0kmgigs1w-zxidkh

Cluster the data using the "GaussianMixture" method without specifying the number of clusters:

https://wolfram.com/xid/0b0kmgigs1w-i8o6wm
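A sketch with assumed normally distributed data covering the two preceding cells:
data = RandomVariate[NormalDistribution[], {200, 2}];
FindClusters[data, 4, Method -> "KMeans"]
FindClusters[data, Method -> "GaussianMixture"]  (* number of clusters chosen automatically *)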

Generate some uniformly distributed data:

https://wolfram.com/xid/0b0kmgigs1w-r9oe37

Cluster the data into 2 clusters using the k-means method:

https://wolfram.com/xid/0b0kmgigs1w-mugba9

Cluster the data using the "DBSCAN" method without specifying the number of clusters:

https://wolfram.com/xid/0b0kmgigs1w-j7we1l
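A sketch with assumed uniformly distributed data covering this sub-example:
data = RandomReal[1, {200, 2}];
FindClusters[data, 2, Method -> "KMeans"]
FindClusters[data, Method -> "DBSCAN"]  (* number of clusters chosen automatically *)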


https://wolfram.com/xid/0b0kmgigs1w-pys8nc
Cluster the colors into 5 clusters using the k-medoids method:

https://wolfram.com/xid/0b0kmgigs1w-jg6ym

Cluster the colors using the "MeanShift" method without specifying the number of clusters:

https://wolfram.com/xid/0b0kmgigs1w-9dkbzy

Cluster the colors using the "NeighborhoodContraction" method without specifying the number of clusters:

https://wolfram.com/xid/0b0kmgigs1w-517ims
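A hedged sketch of the color-clustering cells above; random colors stand in for the linked data, and the "NeighborhoodContraction" suboptions shown in the next cell are left to the linked code:
colors = RandomColor[100];
FindClusters[colors, 5, Method -> "KMedoids"]
FindClusters[colors, Method -> "MeanShift"]
FindClusters[colors, Method -> "NeighborhoodContraction"]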

Cluster the colors using the "NeighborhoodContraction" method and its suboptions:

https://wolfram.com/xid/0b0kmgigs1w-iekjlh

PerformanceGoal (1)
Generate 500 random numerical vectors of length 1000:

https://wolfram.com/xid/0b0kmgigs1w-0wf5h1
Compute their clustering and benchmark the operation:

https://wolfram.com/xid/0b0kmgigs1w-2bg62

Perform the same operation with PerformanceGoal set to "Speed":

https://wolfram.com/xid/0b0kmgigs1w-h39zsa
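A sketch of the benchmark, assuming the stated data shape; actual timings and speedups will vary:
vecs = RandomReal[1, {500, 1000}];
First[AbsoluteTiming[FindClusters[vecs];]]                              (* default PerformanceGoal *)
First[AbsoluteTiming[FindClusters[vecs, PerformanceGoal -> "Speed"];]]  (* typically faster *)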

RandomSeeding (1)
Generate 500 random numerical vectors in two dimensions:

https://wolfram.com/xid/0b0kmgigs1w-sos7z
Compute their clustering several times and compare the results:

https://wolfram.com/xid/0b0kmgigs1w-8soype

https://wolfram.com/xid/0b0kmgigs1w-rd8neb

Compute their clustering several times by changing the RandomSeeding option, and compare the results:

https://wolfram.com/xid/0b0kmgigs1w-l9zh33

https://wolfram.com/xid/0b0kmgigs1w-3rsarn
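A sketch with assumed two-dimensional data; cluster sizes are compared across runs:
data = RandomReal[1, {500, 2}];
Table[Length /@ FindClusters[data], {3}]                         (* default RandomSeeding -> 1234: repeated runs agree *)
Table[Length /@ FindClusters[data, RandomSeeding -> s], {s, 3}]  (* different seeds: results may differ *)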

Weights (1)
Obtain cluster assignment for some numerical data:

https://wolfram.com/xid/0b0kmgigs1w-0o8jn4

https://wolfram.com/xid/0b0kmgigs1w-cypgbu

Look at the cluster assignment when changing the weight given to each number:

https://wolfram.com/xid/0b0kmgigs1w-5ywa78
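An illustrative sketch: giving one value a much larger weight can pull the cluster boundaries around it (the data and weights are invented here):
data = {1, 2, 3, 4, 5, 6};
FindClusters[data, 2]
FindClusters[data, 2, Weights -> {1, 1, 1, 1, 1, 100}]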

Applications (3): Sample problems that can be solved with this function
Find and visualize clusters in bivariate data:

https://wolfram.com/xid/0b0kmgigs1w-c7x5qf

https://wolfram.com/xid/0b0kmgigs1w-bhb9o7
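A sketch with assumed bivariate data (two Gaussian blobs), clustered and plotted by cluster:
data = Join[RandomVariate[NormalDistribution[0, 1], {150, 2}], RandomVariate[NormalDistribution[4, 1], {150, 2}]];
clusters = FindClusters[data];
ListPlot[clusters]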

Find clusters in five‐dimensional vectors:

https://wolfram.com/xid/0b0kmgigs1w-bl1803

https://wolfram.com/xid/0b0kmgigs1w-eu3bg
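For instance, with assumed random five-dimensional vectors and an assumed cluster count of 3:
vecs = RandomReal[10, {50, 5}];
FindClusters[vecs, 3]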

Cluster genomic sequences based on the number of element‐wise differences:

https://wolfram.com/xid/0b0kmgigs1w-cs39oi

https://wolfram.com/xid/0b0kmgigs1w-c43z2h
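A hedged sketch with made-up sequences; HammingDistance counts element-wise differences between equal-length strings:
seqs = {"ATTGCA", "ATTGCT", "ATTCCA", "GGTCCA", "GGTCCT", "GGACCT"};
FindClusters[seqs, DistanceFunction -> HammingDistance]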

Properties & Relations (2): Properties of the function, and connections to other functions
FindClusters returns the list of clusters, while ClusteringComponents gives an array of cluster indices:

https://wolfram.com/xid/0b0kmgigs1w-ms4t02


https://wolfram.com/xid/0b0kmgigs1w-8qq15k
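For example (illustrative values):
data = {1, 2, 3, 10, 11, 12};
FindClusters[data]            (* e.g. {{1, 2, 3}, {10, 11, 12}} *)
ClusteringComponents[data]    (* e.g. {1, 1, 1, 2, 2, 2} *)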

FindClusters groups data, while Nearest gives the elements closest to a given value:

https://wolfram.com/xid/0b0kmgigs1w-c7sf87


https://wolfram.com/xid/0b0kmgigs1w-l2hip


https://wolfram.com/xid/0b0kmgigs1w-d07drv
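For example (illustrative values, with an arbitrary query point):
data = {1, 2, 3, 10, 11, 12};
FindClusters[data]
Nearest[data, 10.4, 3]    (* the 3 elements closest to 10.4 *)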

Neat Examples (2): Surprising or curious use cases
Divide a square into n segments by clustering uniformly distributed random points:

https://wolfram.com/xid/0b0kmgigs1w-eu1jod

https://wolfram.com/xid/0b0kmgigs1w-xce5j

https://wolfram.com/xid/0b0kmgigs1w-dg5vim
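A sketch under assumptions: n = 6 segments, k-means clustering and a plain ListPlot stand in for whatever the linked cells use:
pts = RandomReal[1, {2000, 2}];
segments = FindClusters[pts, 6, Method -> "KMeans"];
ListPlot[segments, AspectRatio -> 1]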

Cluster words beginning with "agg" in the English dictionary:

https://wolfram.com/xid/0b0kmgigs1w-ed67pg
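A hedged sketch; clustering by EditDistance is an assumption here, not necessarily the distance used in the linked cell:
words = DictionaryLookup["agg" ~~ ___];
FindClusters[words, DistanceFunction -> EditDistance]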

Text
Wolfram Research (2007), FindClusters, Wolfram Language function, https://reference.wolfram.com/language/ref/FindClusters.html (updated 2020).
CMS
Wolfram Language. 2007. "FindClusters." Wolfram Language & System Documentation Center. Wolfram Research. Last Modified 2020. https://reference.wolfram.com/language/ref/FindClusters.html.
APA
Wolfram Language. (2007). FindClusters. Wolfram Language & System Documentation Center. Retrieved from https://reference.wolfram.com/language/ref/FindClusters.html
BibTeX
@misc{reference.wolfram_2025_findclusters, author="Wolfram Research", title="{FindClusters}", year="2020", howpublished="\url{https://reference.wolfram.com/language/ref/FindClusters.html}", note="[Accessed: 29-March-2025]"}
BibLaTeX
@online{reference.wolfram_2025_findclusters, organization={Wolfram Research}, title={FindClusters}, year={2020}, url={https://reference.wolfram.com/language/ref/FindClusters.html}, note={[Accessed: 29-March-2025]}}