Wolfram Language & System Documentation Center

"DBSCAN" (Machine Learning Method)

See Also
- FindClusters
- ClusterClassify
- ClusteringComponents
- ClusteringTree
- Dendrogram
- DimensionReduction
- Methods
  - Agglomerate
  - GaussianMixture
  - JarvisPatrick
  - KMeans
  - KMedoids
  - MeanShift
  - NeighborhoodContraction
  - SpanningTree
  - Spectral
Tech Notes
- Partitioning Data into Clusters

Method for FindClusters, ClusterClassify and ClusteringComponents.
Partitions data into clusters of similar elements using density-based spatial clustering of applications with noise (DBSCAN).

Details & Suboptions

"DBSCAN" (density-based spatial clustering of applications with noise) is a density-based clustering method where the density is estimated using a neighbor-based approach. "DBSCAN" works for arbitrary cluster shapes and sizes but requires clusters to have similar densities.
The following plots show the results of the "DBSCAN" method applied to toy datasets (black points indicate outliers):

"DBSCAN" defines "core points" as data points that have more than k neighbors within a ball of radius ϵ (i.e. data points in high-density regions). Core points that lie at a distance of less than ϵ from each other belong to the same cluster. Furthermore, any point that lies at a distance of less than ϵ from a core point belongs to the cluster of that core point. Any point that is not within ϵ of a core point is considered noise.
This results in each cluster containing one or more core points at its core and some non-core points at its "edge". Overall, "DBSCAN" defines clusters as connected high-density regions. In the following figure, core points are red, edge points are yellow and noise points are blue:
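
The core-point rule described above can be sketched directly with Nearest. This is only an illustration of the definition, not the internal implementation of the method. The sample data, the hand-picked values of eps and k, and the choice not to count a point as its own neighbor are all assumptions:

```wolfram
(* Toy data: four nearby points and one isolated point (illustrative values) *)
data = {{0, 0}, {0.1, 0}, {0, 0.1}, {0.1, 0.1}, {5, 5}};
eps = 0.3;  (* neighborhood radius, chosen by hand *)
k = 2;      (* neighbor threshold, chosen by hand *)

(* For each point, count the other points found within radius eps *)
nf = Nearest[data];
neighborCounts = (Length[nf[#, {All, eps}]] - 1) & /@ data;

(* Core points are those with more than k neighbors *)
corePoints = Pick[data, Thread[neighborCounts > k]]
```

Points that are not core but lie within eps of a core point would then be edge points, and the remaining points would be noise.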

In ClusteringComponents and ClusterClassify, noise points are labeled Missing["Anomalous"].
In FindClusters, noise points are returned as a cluster.
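
A small comparison of the two conventions, as a sketch. The data values are illustrative, and whether any particular element is treated as noise depends on the automatically chosen parameters:

```wolfram
(* Illustrative data with one value far from the others *)
data = {1, 2, 3, 10, 11, 12, 50};

(* One label per element; noise, if any, appears as Missing["Anomalous"] *)
labels = ClusteringComponents[data, Method -> "DBSCAN"]

(* Positions of the elements labeled as noise *)
Position[labels, Missing["Anomalous"]]

(* A list of clusters; noise points, if any, are returned as a cluster *)
FindClusters[data, Method -> "DBSCAN"]
```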
The option DistanceFunction can be used to define which distance to use.
The following suboptions can be given:

"NeighborhoodRadius"	Automatic	radius ϵ
"NeighborsNumber"	Automatic	number of neighbors k
"DropAnomalousValues"	False	whether to drop outliers

Examples

Basic Examples (3)

Find clusters of nearby values using the "DBSCAN" method:
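
A minimal input for this example might look as follows. The list of values is illustrative and not necessarily the one used on the original page:

```wolfram
(* Cluster a short list of numbers with the "DBSCAN" method *)
FindClusters[{1, 2, 10, 12, 3, 1, 13, 25}, Method -> "DBSCAN"]
```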

Train the ClassifierFunction on a list of colors using the "DBSCAN" method:

Gather the elements by their class number:
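
The training and gathering steps can be sketched together as follows. The color list is a hypothetical stand-in built from two families of shades, and GatherBy is one possible way to group elements by predicted class:

```wolfram
(* Two families of similar colors (illustrative data) *)
colors = Join[
   Table[Blend[{Red, Orange}, t], {t, 0, 1, 0.1}],
   Table[Blend[{Blue, Cyan}, t], {t, 0, 1, 0.1}]];

(* Train a ClassifierFunction on the colors *)
c = ClusterClassify[colors, Method -> "DBSCAN"];

(* Group the colors by the class that the classifier assigns to each one *)
GatherBy[colors, c]
```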

Create random 2D vectors:

Plot clusters in data found using the "DBSCAN" method:
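
One way to carry out these two steps, assuming two Gaussian blobs as the random data (the seed, centers and spreads are arbitrary choices):

```wolfram
SeedRandom[1234];  (* fixed seed so the sketch is reproducible *)

(* Two groups of 100 random 2D vectors around different centers *)
data = Join[
   RandomVariate[NormalDistribution[0, 0.3], {100, 2}],
   RandomVariate[NormalDistribution[3, 0.3], {100, 2}]];

(* Each cluster found is plotted as a separate dataset *)
ListPlot[FindClusters[data, Method -> "DBSCAN"]]
```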

Scope (2)

Obtain a random list of times:

Train the ClassifierFunction using the "DBSCAN" method:

Obtain the cluster assignment and cluster the data:

Train the ClassifierFunction using the "DBSCAN" method:

Noise points are labeled as Missing["Anomalous"]:

Options (7)

DistanceFunction (1)

Cluster string data using edit distance:
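
For instance, with a hypothetical list of short words:

```wolfram
(* Strings are compared by the number of edits needed to turn one into another *)
FindClusters[{"cat", "bat", "hat", "dog", "dot", "fog"},
 DistanceFunction -> EditDistance, Method -> "DBSCAN"]
```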

Cluster data using Manhattan distance:
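
A possible input, using illustrative 2D points:

```wolfram
(* Two small groups of 2D points (illustrative values) *)
data = {{1, 1}, {1.2, 0.9}, {0.9, 1.1}, {5, 5}, {5.1, 4.8}, {4.9, 5.2}};

(* Distances are measured as sums of absolute coordinate differences *)
FindClusters[data, DistanceFunction -> ManhattanDistance, Method -> "DBSCAN"]
```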

"NeighborhoodRadius" (2)

Find clusters by specifying the "NeighborhoodRadius" suboption:
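
Suboptions are given as rules after the method name inside a list. In this sketch the data and the radius value of 3 are arbitrary choices:

```wolfram
(* Set the neighborhood radius explicitly instead of using Automatic *)
FindClusters[{1, 2, 10, 12, 3, 1, 13, 25},
 Method -> {"DBSCAN", "NeighborhoodRadius" -> 3}]
```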

Define a set of two-dimensional data points, characterized by four somewhat nebulous clusters:

Plot clusters in data found using the "DBSCAN" method:

Plot different clusterings of data using the "DBSCAN" method by varying the "NeighborhoodRadius":
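
The steps above might be reproduced as follows. The four cluster centers, the unit covariance and the three radius values are assumptions made for illustration:

```wolfram
SeedRandom[1];  (* fixed seed so the sketch is reproducible *)

(* Four overlapping Gaussian blobs of 50 points each *)
centers = {{0, 0}, {4, 0}, {0, 4}, {4, 4}};
data = Join @@ (RandomVariate[
       MultinormalDistribution[#, IdentityMatrix[2]], 50] & /@ centers);

(* One plot per radius; a larger radius tends to merge neighboring clusters *)
Table[
 ListPlot[
  FindClusters[data, Method -> {"DBSCAN", "NeighborhoodRadius" -> r}],
  PlotLabel -> Row[{"radius ", r}]],
 {r, {0.3, 0.6, 1.2}}]
```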

"NeighborsNumber" (3)

Find clusters by specifying the "NeighborsNumber" suboption:
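
For example, with illustrative data and an arbitrary neighbor count of 3:

```wolfram
(* Set the neighbor threshold k explicitly instead of using Automatic *)
FindClusters[{1, 2, 10, 12, 3, 1, 13, 25},
 Method -> {"DBSCAN", "NeighborsNumber" -> 3}]
```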

Create random 2D vectors:

Plot clusters in data found using the "DBSCAN" method:

Plot different clusterings of data using the "DBSCAN" method by varying the "NeighborsNumber":

Define a set of two-dimensional data points, characterized by four somewhat nebulous clusters:

Plot clusters in data using the "DBSCAN" method:

Plot different clusterings of data using the "DBSCAN" method by varying the "NeighborsNumber":
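
As a numeric complement to such plots, the number of groups returned can be tabulated against the "NeighborsNumber" setting. The data generation and the values of k tried here are assumptions, and the count includes the noise cluster when one is returned:

```wolfram
SeedRandom[1];  (* fixed seed so the sketch is reproducible *)

(* Four overlapping Gaussian blobs of 50 points each *)
centers = {{0, 0}, {4, 0}, {0, 4}, {4, 4}};
data = Join @@ (RandomVariate[
       MultinormalDistribution[#, IdentityMatrix[2]], 50] & /@ centers);

(* Rules of the form k -> number of groups returned by FindClusters *)
Table[
 k -> Length[
   FindClusters[data, Method -> {"DBSCAN", "NeighborsNumber" -> k}]],
 {k, {2, 5, 10}}]
```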

"DropAnomalousValues" (1)

Train the ClassifierFunction, which labels outliers as Missing["Anomalous"]:

Use the trained ClassifierFunction to identify the outliers:

Train the ClassifierFunction by dropping outliers and finding new cluster assignments:
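
The two training modes can be compared side by side in a short sketch. The data is illustrative, and whether any value is flagged as an outlier depends on the automatically chosen parameters:

```wolfram
data = {1, 2, 10, 12, 3, 1, 13, 25};

(* Default behavior: outliers are labeled Missing["Anomalous"] *)
c1 = ClusterClassify[data, Method -> "DBSCAN"];

(* Outliers are dropped during training *)
c2 = ClusterClassify[data,
   Method -> {"DBSCAN", "DropAnomalousValues" -> True}];

(* Class assignments for the same data under both classifiers *)
{c1[data], c2[data]}
```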

Similarly, find clusters of nearby values with outliers:

Remove outliers using the "DropAnomalousValues" suboption:
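
A possible input, shown next to the default call for comparison (the data is illustrative):

```wolfram
data = {1, 2, 10, 12, 3, 1, 13, 25};

(* Default: noise points, if any, are returned as a cluster *)
FindClusters[data, Method -> "DBSCAN"]

(* With the suboption set to True, outliers are dropped from the result *)
FindClusters[data, Method -> {"DBSCAN", "DropAnomalousValues" -> True}]
```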
