
Details & Suboptions

  • "DBSCAN" (density-based spatial clustering of applications with noise) is a density-based clustering method where the density is estimated using a neighbor-based approach. "DBSCAN" works for arbitrary cluster shapes and sizes but requires clusters to have similar densities.
  • The following plots show the results of the "DBSCAN" method applied to toy datasets (black points indicate outliers):
  • "DBSCAN" defines "core points" as data points that have more than k neighbors within a ball of radius ϵ (i.e. data points in high-density regions). Core points that are at a distance of less than ϵ from each other belong to the same cluster. Furthermore, any point that is at a distance of less than ϵ from a core point belongs to the cluster of that core point. Any point that is not within ϵ of a core point is considered noise.
  • This results in each cluster containing one or more core points at its core and some non-core points at its "edge". Overall, "DBSCAN" defines clusters as connected high-density regions. In the following figure, core points are red, edge points are yellow and noise points are blue:
  • In ClusteringComponents and ClusterClassify, noise points are labeled Missing["Anomalous"].
  • In FindClusters, noise points are returned as a cluster.
  • The option DistanceFunction can be used to define which distance to use.
  • The following suboptions can be given:
  • "NeighborhoodRadius"    Automatic    radius ϵ
    "NeighborsNumber"       Automatic    number of neighbors k
    "DropAnomalousValues"   False        whether to drop outliers
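
The core, edge and noise definitions above can be sketched directly with Nearest. The helper dbscanRoles below is a hypothetical name, not a built-in. It assumes the default distance of Nearest, counts neighbors excluding the point itself, and treats points at a distance of exactly ϵ as neighbors because Nearest uses a closed ball. The built-in "DBSCAN" method may differ in these details.

```wl
(* Hypothetical helper, not a built-in: label each point as "Core", "Edge" or "Noise" *)
dbscanRoles[pts_List, eps_?Positive, k_Integer] := Module[{nf, nbrs, coreQ},
  nf = Nearest[pts -> "Index"];
  (* indices of all points within distance eps of point i, excluding i itself *)
  nbrs = Table[DeleteCases[nf[pts[[i]], {All, eps}], i], {i, Length[pts]}];
  (* a core point has more than k neighbors *)
  coreQ = Length[#] > k & /@ nbrs;
  Table[
    Which[
      coreQ[[i]], "Core",
      (* a non-core point near at least one core point sits on the edge of a cluster *)
      AnyTrue[nbrs[[i]], coreQ[[#]] &], "Edge",
      True, "Noise"],
    {i, Length[pts]}]
]

dbscanRoles[{1, 2, 3, 4, 10, 20, 21}, 1.5, 1]
(* {"Edge", "Core", "Core", "Edge", "Noise", "Noise", "Noise"} *)
```

With ϵ = 1.5 and k = 1, only the points 2 and 3 have more than one neighbor, so they are the core points; 1 and 4 are edge points, and the remaining points are noise. Grouping mutually reachable core points into clusters is left out of this sketch.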

Examples


Basic Examples  (3)

Find clusters of nearby values using the "DBSCAN" method:


Train the ClassifierFunction on a list of colors using the "DBSCAN" method:


Gather the elements by their class number:


Create random 2D vectors:


Plot clusters in data found using the "DBSCAN" method:


Scope  (2)

Obtain a random list of times:


Train the ClassifierFunction using the "DBSCAN" method:


Obtain the cluster assignment and cluster the data:


Train the ClassifierFunction using the "DBSCAN" method:


Noise points are labeled as Missing["Anomalous"]:

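
As a minimal sketch with made-up data (the values and the suboption settings are assumptions chosen for illustration), a classifier trained with "DBSCAN" can be applied back to its training data to see which points it treats as noise:

```wl
data = {1., 1.1, 1.2, 1.3, 5., 5.1, 5.2, 5.3, 20.};

(* train a ClassifierFunction with explicit "DBSCAN" suboptions *)
c = ClusterClassify[data,
   Method -> {"DBSCAN", "NeighborhoodRadius" -> 0.5, "NeighborsNumber" -> 2}];

(* cluster label for each point; noise points get Missing["Anomalous"] *)
labels = c[data];

(* keep only the points labeled as noise *)
Pick[data, MissingQ /@ labels]
```

The isolated value 20. is the natural candidate for a noise point here, although the exact result depends on how the method counts neighbors.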

Options  (7)

DistanceFunction  (1)

Cluster string data using edit distance:


Cluster data using Manhattan distance:

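
A short sketch of both uses of DistanceFunction, with invented inputs and illustrative suboption values (neither is taken from this page):

```wl
(* strings compared by edit distance *)
words = {"cat", "bat", "hat", "rat", "house", "mouse", "louse"};
FindClusters[words,
 DistanceFunction -> EditDistance,
 Method -> {"DBSCAN", "NeighborhoodRadius" -> 1.5, "NeighborsNumber" -> 1}]

(* 2D points compared by Manhattan distance *)
pts = {{0, 0}, {0, 1}, {1, 0}, {5, 5}, {5, 6}, {6, 5}};
FindClusters[pts,
 DistanceFunction -> ManhattanDistance,
 Method -> "DBSCAN"]
```

Because ϵ is measured in units of the chosen distance, a "NeighborhoodRadius" that suits one distance function generally needs to be revisited when switching to another.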

"NeighborhoodRadius"  (2)

Find clusters by specifying the "NeighborhoodRadius" suboption:


Define a set of two-dimensional data points, characterized by four somewhat nebulous clusters:


Plot clusters in data found using the "DBSCAN" method:


Plot different clusterings of data using the "DBSCAN" method by varying the "NeighborhoodRadius":

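
The effect of the radius can also be seen on a small one-dimensional list. This is only a sketch: the data and the two radii are assumptions picked so that the gaps between groups (3 and 11) fall on either side of the larger radius.

```wl
data = {1, 2, 3, 4, 7, 8, 9, 20};

(* compare the clustering obtained with a small and a larger radius *)
Table[
 r -> FindClusters[data, Method -> {"DBSCAN", "NeighborhoodRadius" -> r}],
 {r, {1.5, 3.5}}]
```

A larger radius tends to merge neighboring groups, while a smaller one tends to split them and to mark more points as noise.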

"NeighborsNumber"  (3)

Find clusters by specifying the "NeighborsNumber" suboption:


Create random 2D vectors:


Plot clusters in data found using the "DBSCAN" method:


Plot different clusterings of data using the "DBSCAN" method by varying the "NeighborsNumber":


Define a set of two-dimensional data points, characterized by four somewhat nebulous clusters:


Plot clusters in data using the "DBSCAN" method:


Plot different clusterings of data using the "DBSCAN" method by varying the "NeighborsNumber":

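
As a rough illustration with assumed data and an assumed fixed radius, the number of groups returned by FindClusters can be tabulated against k:

```wl
data = {1, 2, 3, 4, 5, 10, 11, 20};

(* number of groups returned for each value of k, at a fixed radius *)
Table[
 k -> Length[
   FindClusters[data,
    Method -> {"DBSCAN", "NeighborhoodRadius" -> 1.5, "NeighborsNumber" -> k}]],
 {k, {1, 2, 3}}]
```

Since FindClusters returns noise points as a cluster, the counts include that group whenever any point is treated as noise.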

"DropAnomalousValues"  (1)

Train the ClassifierFunction, which labels outliers as Missing["Anomalous"]:


Use the trained ClassifierFunction to identify the outliers:


Train the ClassifierFunction by dropping outliers and finding new cluster assignments:


Similarly, find clusters of nearby values with outliers:


Remove outliers using the "DropAnomalousValues" suboption:

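
A minimal side-by-side sketch, using made-up data with one far-away value and illustrative suboption settings:

```wl
data = {1, 2, 3, 4, 10, 11, 12, 13, 100};

(* default: noise points are kept and returned as a cluster *)
FindClusters[data,
 Method -> {"DBSCAN", "NeighborhoodRadius" -> 1.5, "NeighborsNumber" -> 2}]

(* same settings, but points treated as noise are dropped from the result *)
FindClusters[data,
 Method -> {"DBSCAN", "NeighborhoodRadius" -> 1.5, "NeighborsNumber" -> 2,
   "DropAnomalousValues" -> True}]
```

Comparing the two results shows which values the method considered outliers under these particular settings.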