FindClusters
FindClusters[{e1,e2,…}]
partitions the ei into clusters of similar elements.
FindClusters[{e1→v1,e2→v2,…}]
returns the vi corresponding to the ei in each cluster.
FindClusters[{e1,e2,…}→{v1,v2,…}]
gives the same result.
FindClusters[<|label1→e1,label2→e2,…|>]
returns the labeli corresponding to the ei in each cluster.
FindClusters[data,n]
partitions data into at most n clusters.
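A minimal sketch of these forms (the numeric data and labels here are invented for illustration):

```wl
(* cluster a list of numbers into groups of similar elements *)
FindClusters[{1, 2, 3, 10, 11, 12}]

(* return labels instead of the elements themselves *)
FindClusters[<|"a" -> 1, "b" -> 2, "c" -> 10, "d" -> 11|>]

(* cap the number of clusters at n *)
FindClusters[{1, 2, 3, 10, 11, 12}, 2]
```

With well-separated data like this, the first call should return two clusters of nearby numbers.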
Details and Options
- FindClusters works for a variety of data types, including numerical, textual, and image, as well as dates and times.
- The following options can be given:
-
  CriterionFunction      Automatic  criterion for selecting a method
  DistanceFunction       Automatic  the distance function to use
  FeatureExtractor       Identity   how to extract features from which to learn
  FeatureNames           Automatic  feature names to assign for input data
  FeatureTypes           Automatic  feature types to assume for input data
  Method                 Automatic  what method to use
  MissingValueSynthesis  Automatic  how to synthesize missing values
  PerformanceGoal        Automatic  aspect of performance to optimize
  RandomSeeding          1234       what seeding of pseudorandom generators should be done internally
  Weights                Automatic  what weight to give to each example
- By default, FindClusters will preprocess the data automatically unless a DistanceFunction is specified.
- The setting for DistanceFunction can be any distance or dissimilarity function, or a function f defining a distance between two values.
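For example (data invented), both a built-in distance and a custom pure function can be supplied:

```wl
data = {{1., 1.}, {1.2, 0.9}, {5., 5.}, {5.1, 4.8}};

(* a built-in distance function *)
FindClusters[data, DistanceFunction -> ManhattanDistance]

(* an equivalent custom function f defining the distance between two values *)
FindClusters[data, DistanceFunction -> (Total[Abs[#1 - #2]] &)]
```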
- Possible settings for PerformanceGoal include:
-
  Automatic  automatic tradeoff among speed, accuracy, and memory
  "Quality"  maximize the accuracy of the classifier
  "Speed"    maximize the speed of the classifier
- Possible settings for Method include:
-
  Automatic                  automatically select a method
  "Agglomerate"              single-linkage clustering algorithm
  "DBSCAN"                   density-based spatial clustering of applications with noise
  "NeighborhoodContraction"  shift data points toward high-density regions
  "JarvisPatrick"            Jarvis–Patrick clustering algorithm
  "KMeans"                   k-means clustering algorithm
  "MeanShift"                mean-shift clustering algorithm
  "KMedoids"                 partitioning around medoids
  "SpanningTree"             minimum spanning tree-based clustering algorithm
  "Spectral"                 spectral clustering algorithm
  "GaussianMixture"          variational Gaussian mixture algorithm
- The methods "KMeans" and "KMedoids" can only be used when the number of clusters is specified.
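As a sketch (random data, cluster counts chosen arbitrarily): "KMeans" requires an explicit cluster count, while a density-based method such as "DBSCAN" infers it:

```wl
SeedRandom[1];
data = RandomReal[{0, 10}, {100, 2}];

(* k-means needs the number of clusters specified *)
FindClusters[data, 3, Method -> "KMeans"]

(* DBSCAN determines the number of clusters from density *)
FindClusters[data, Method -> "DBSCAN"]
```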
- The following plots show results of common methods on toy datasets:
- Possible settings for CriterionFunction include:
-
  "StandardDeviation"  root-mean-square standard deviation
  "RSquared"           R-squared
  "Dunn"               Dunn index
  "CalinskiHarabasz"   Calinski–Harabasz index
  "DaviesBouldin"      Davies–Bouldin index
  "Silhouette"         Silhouette score
  Automatic            internal index
- Possible settings for RandomSeeding include:
-
  Automatic  automatically reseed every time the function is called
  Inherited  use externally seeded random numbers
  seed       use an explicit integer or string as a seed
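A sketch combining the two options above (the data and seed values are arbitrary): CriterionFunction guides the automatic selection among candidate clusterings, and RandomSeeding makes stochastic methods reproducible:

```wl
SeedRandom[2];
data = RandomReal[1, {50, 2}];

(* pick among candidate clusterings by silhouette score *)
FindClusters[data, CriterionFunction -> "Silhouette"]

(* fixed seed: repeated calls give the same clustering *)
FindClusters[data, 3, Method -> "KMeans", RandomSeeding -> 42]

(* reseed on every call, so results may vary between runs *)
FindClusters[data, 3, Method -> "KMeans", RandomSeeding -> Automatic]
```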
Examples
Basic Examples (4)
Scope (6)
Options (15)
CriterionFunction (1)
Generate some separated data and visualize it:
Cluster the data using different settings for CriterionFunction:
DistanceFunction (4)
Use CanberraDistance as the measure of distance for continuous data:
Clusters obtained with the default SquaredEuclideanDistance:
Use DiceDissimilarity as the measure of distance for Boolean data:
Use MatchingDissimilarity as the measure of distance for Boolean data:
Use HammingDistance as the measure of distance for string data:
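For instance (strings invented; they are equal length, since Hamming distance compares strings position by position):

```wl
FindClusters[{"bat", "cat", "dog", "dot"}, 2,
  DistanceFunction -> HammingDistance]
```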
FeatureExtractor (1)
Find clusters for a list of images:
Create a custom FeatureExtractor to extract features:
FeatureNames (1)
Use FeatureNames to name features, and refer to their names in further specifications:
FeatureTypes (1)
Use FeatureTypes to enforce the interpretation of the features:
Compare it to the result obtained by assuming nominal features:
Method (4)
Cluster the data hierarchically:
Clusters obtained with the default method:
Generate normally distributed data and visualize it:
Cluster the data in 4 clusters by using the k-means method:
Cluster the data using the "GaussianMixture" method without specifying the number of clusters:
Generate some uniformly distributed data:
Cluster the data in 2 clusters by using the k-means method:
Cluster the data using the "DBSCAN" method without specifying the number of clusters:
Cluster the colors in 5 clusters using the k-medoids method:
Cluster the colors without specifying the number of clusters using the "MeanShift" method:
Cluster the colors without specifying the number of clusters using the "NeighborhoodContraction" method:
Cluster the colors using the "NeighborhoodContraction" method and its suboptions:
PerformanceGoal (1)
Generate 500 random numerical vectors of length 1000:
Compute their clustering and benchmark the operation:
Perform the same operation with PerformanceGoal set to "Speed":
RandomSeeding (1)
Generate 500 random numerical vectors in two dimensions:
Compute their clustering several times and compare the results:
Compute their clustering several times by changing the RandomSeeding option, and compare the results:
Applications (3)
Properties & Relations (2)
FindClusters returns the list of clusters, while ClusteringComponents gives an array of cluster indices:
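A small sketch of the contrast (input invented):

```wl
data = {1, 2, 10, 11};

FindClusters[data]           (* clusters of elements, e.g. {{1, 2}, {10, 11}} *)
ClusteringComponents[data]   (* cluster index per element, e.g. {1, 1, 2, 2} *)
```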
FindClusters groups data, while Nearest gives the elements closest to a given value:
Text
Wolfram Research (2007), FindClusters, Wolfram Language function, https://reference.wolfram.com/language/ref/FindClusters.html (updated 2020).
CMS
Wolfram Language. 2007. "FindClusters." Wolfram Language & System Documentation Center. Wolfram Research. Last Modified 2020. https://reference.wolfram.com/language/ref/FindClusters.html.
APA
Wolfram Language. (2007). FindClusters. Wolfram Language & System Documentation Center. Retrieved from https://reference.wolfram.com/language/ref/FindClusters.html