"DBSCAN" (Machine Learning Method)
- Method for FindClusters, ClusterClassify and ClusteringComponents.
- Partitions data into clusters of similar elements using density-based spatial clustering of applications with noise (DBSCAN).
Details & Suboptions
- "DBSCAN" (density-based spatial clustering of applications with noise) is a density-based clustering method where the density is estimated using a neighbor-based approach. "DBSCAN" works for arbitrary cluster shapes and sizes but requires clusters to have similar densities.
- The following plots show the results of the "DBSCAN" method applied to toy datasets (black points indicate outliers):
- "DBSCAN" defines "core points" as data points that have more than k neighbors within a ball of radius ϵ (i.e., data points in high-density regions). Core points that lie at a distance of less than ϵ from each other are linked into the same cluster. Furthermore, any point that is at a distance of less than ϵ from a core point belongs to the cluster of that core point. Any point that is not within ϵ of a core point is considered noise.
- This results in each cluster containing one or more core points at its core and some non-core points at its "edge". Overall, "DBSCAN" defines clusters as connected high-density regions. In the following figure, core points are red, edge points are yellow and noise points are blue:
- In ClusteringComponents and ClusterClassify, noise points are labeled Missing["Anomalous"].
- In FindClusters, noise points are returned as a cluster.
- The option DistanceFunction specifies the distance measure used to find neighbors.
- The following suboptions can be given:
  "NeighborhoodRadius"    Automatic    radius ϵ
  "NeighborsNumber"    Automatic    number of neighbors k
  "DropAnomalousValues"    False    whether to drop outliers
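
The core/edge/noise rules described above can be illustrated with a minimal sketch. It is written in Python rather than Wolfram Language and is not the implementation behind the "DBSCAN" method. The names `dbscan`, `NOISE`, `eps` and `k` are hypothetical, and the choices to exclude a point from its own neighbor count and to use strict inequalities are assumptions based on the wording above.

```python
from math import dist

NOISE = -1  # illustrative label for noise points (an assumption of this sketch)

def dbscan(points, eps, k):
    """Return (labels, is_core); labels hold a cluster index or NOISE."""
    n = len(points)
    # Neighbors of i: all other points strictly closer than eps.
    neighbors = [
        [j for j in range(n) if j != i and dist(points[i], points[j]) < eps]
        for i in range(n)
    ]
    # Core points have more than k neighbors (self excluded: an assumption).
    is_core = [len(nb) > k for nb in neighbors]

    labels = [NOISE] * n
    cluster = 0
    for start in range(n):
        if not is_core[start] or labels[start] != NOISE:
            continue
        # Grow a new cluster outward from an unlabeled core point.
        labels[start] = cluster
        stack = [start]
        while stack:
            i = stack.pop()
            for j in neighbors[i]:
                if labels[j] == NOISE:
                    labels[j] = cluster
                    # Only core points keep expanding the cluster;
                    # non-core points become edge points.
                    if is_core[j]:
                        stack.append(j)
        cluster += 1
    return labels, is_core

# Two dense squares, one edge point (2, 0) and one isolated point (5, 5).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0),
          (10, 10), (10, 11), (11, 10), (11, 11), (5, 5)]
labels, is_core = dbscan(points, eps=1.5, k=2)
print(labels)
print(is_core)
```

In this toy dataset, the point (2, 0) has too few neighbors to be a core point but lies within ϵ of a core point, so it joins that cluster as an edge point, while (5, 5) is left as noise.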


Examples
Basic Examples (3)
Find clusters of nearby values using the "DBSCAN" method:

https://wolfram.com/xid/0bfevoay07hcec-wxc5bs

Train the ClassifierFunction on a list of colors using the "DBSCAN" method:

https://wolfram.com/xid/0bfevoay07hcec-509fvf

Gather the elements by their class number:

https://wolfram.com/xid/0bfevoay07hcec-izazxv


https://wolfram.com/xid/0bfevoay07hcec-q8s2rm

Plot clusters in data found using the "DBSCAN" method:

https://wolfram.com/xid/0bfevoay07hcec-fj7crm

Scope (2)
Obtain a random list of times:

https://wolfram.com/xid/0bfevoay07hcec-3sq4zb

Train the ClassifierFunction using the "DBSCAN" method:

https://wolfram.com/xid/0bfevoay07hcec-0wodlh

Obtain the cluster assignment and cluster the data:

https://wolfram.com/xid/0bfevoay07hcec-s8b4j2


Train the ClassifierFunction using the "DBSCAN" method:

https://wolfram.com/xid/0bfevoay07hcec-g11jal

Noise points are labeled as Missing["Anomalous"]:

https://wolfram.com/xid/0bfevoay07hcec-5s1s23

Options (7)
DistanceFunction (1)
"NeighborhoodRadius" (2)
Find clusters by specifying the "NeighborhoodRadius" suboption:

https://wolfram.com/xid/0bfevoay07hcec-3656v8

Define a set of two-dimensional data points, characterized by four somewhat nebulous clusters:

https://wolfram.com/xid/0bfevoay07hcec-0l2e30

Plot clusters in data found using the "DBSCAN" method:

https://wolfram.com/xid/0bfevoay07hcec-9b0lj5

Plot different clusterings of data using the "DBSCAN" method by varying the "NeighborhoodRadius":

https://wolfram.com/xid/0bfevoay07hcec-z3ek06
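
To see why the radius matters, the following sketch counts which values qualify as core points for several radii. It is a Python illustration under stated assumptions, not the Wolfram Language implementation: the helper `core_points`, the sample `values` and the chosen radii are all hypothetical, and a value is treated as core when more than k other values lie strictly within ϵ of it.

```python
def core_points(values, eps, k):
    """Return the values that have more than k other values strictly within eps."""
    return [
        v for i, v in enumerate(values)
        if sum(1 for j, w in enumerate(values)
               if j != i and abs(v - w) < eps) > k
    ]

# One tight group, one looser group and one isolated value.
values = [1.0, 1.2, 1.4, 3.0, 3.5, 4.0, 9.0]
for eps in (0.3, 0.6, 1.1):
    print(eps, core_points(values, eps, k=1))
```

With a small radius only the densest part of the tight group qualifies; as the radius grows, the looser group also gains core points, while the isolated value never does. This is the same trade-off the "NeighborhoodRadius" suboption controls: too small a radius marks most points as noise, and too large a radius merges distinct groups.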

"NeighborsNumber" (3)
Find clusters by specifying the "NeighborsNumber" suboption:

https://wolfram.com/xid/0bfevoay07hcec-fip5bm


https://wolfram.com/xid/0bfevoay07hcec-4onycb

Plot clusters in data found using the "DBSCAN" method:

https://wolfram.com/xid/0bfevoay07hcec-khjy3n

Plot different clusterings of data using the "DBSCAN" method by varying the "NeighborsNumber":

https://wolfram.com/xid/0bfevoay07hcec-iauvn5

Define a set of two-dimensional data points, characterized by four somewhat nebulous clusters:

https://wolfram.com/xid/0bfevoay07hcec-tox9eg

Plot clusters in data using the "DBSCAN" method:

https://wolfram.com/xid/0bfevoay07hcec-hxic61

Plot different clusterings of data using the "DBSCAN" method by varying the "NeighborsNumber":

https://wolfram.com/xid/0bfevoay07hcec-ru8gw9

"DropAnomalousValues" (1)
Train the ClassifierFunction, which labels outliers as Missing["Anomalous"]:

https://wolfram.com/xid/0bfevoay07hcec-jhla3c

Use the trained ClassifierFunction to identify the outliers:

https://wolfram.com/xid/0bfevoay07hcec-w4txzt

Train the ClassifierFunction by dropping outliers and finding new cluster assignments:

https://wolfram.com/xid/0bfevoay07hcec-mn7l7q


Similarly, find clusters of nearby values with outliers:

https://wolfram.com/xid/0bfevoay07hcec-5at6h4

Remove outliers using the "DropAnomalousValues" suboption:

https://wolfram.com/xid/0bfevoay07hcec-ozd2vx
