"UMAP" (Machine Learning Method)
- Method for DimensionReduction, DimensionReduce, FeatureSpacePlot and FeatureSpacePlot3D.
- Reduce the dimension of data using uniform manifold approximation and projection.
Details & Suboptions
- "UMAP", which stands for uniform manifold approximation and projection, is a nonlinear nonparametric dimensionality reduction method. The method attempts to learn a low-dimensional representation of the data that preserves its local structure while balancing it against the global structure.
- "UMAP" works well for data lying on nonlinear manifolds and is particularly suited to visualizing high-dimensional datasets.
- The following shows two-dimensional embeddings learned by the "UMAP" method applied to the benchmark datasets Fisher's Irises, MNIST and FashionMNIST:
[Figure: two-dimensional "UMAP" embeddings of Fisher's Irises, MNIST and FashionMNIST]
- UMAP constructs a high-dimensional graph representation of the data, then optimizes a low-dimensional graph to be as structurally similar to it as possible.
- To construct the initial high-dimensional graph, UMAP builds a weighted graph whose edge weights represent the likelihood that two points are connected. To do so, UMAP chooses a radius locally for each point, based on the distance to that point's nearest neighbors. The likelihood of two points being connected then decreases exponentially with the ratio of the distance between the points to this radius.
- Once the high-dimensional graph is constructed, UMAP optimizes the layout of a low-dimensional analog of the graph so that it is as similar as possible to the high-dimensional one.
- By stipulating that each point must be connected to at least its closest neighbor, UMAP ensures that local structure is preserved in balance with global structure.
- The following suboptions can be given:
"MinDistance"        0.1    minimum distance between points in low-dimensional space
"NeighborsNumber"    15     number of nearest neighbors to construct the high-dimensional graph
- "MinDistance" controls how tightly UMAP clumps points together, with low values leading to more tightly packed embeddings. Larger values will make UMAP pack points together more loosely, focusing instead on the preservation of the broad topological structure.
- "NeighborsNumber" effectively controls how UMAP balances local versus global structure. Low values push UMAP to focus on local structure, while high values push it toward representing the big-picture structure at the expense of fine detail.
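The graph construction described above can be sketched in a few lines of Python. This is a simplified, brute-force illustration of the published UMAP algorithm, not the Wolfram Language implementation. It makes the following assumptions:

- Distances are Euclidean.
- The local offset rho is the distance to the closest neighbor.
- The local scale sigma is found by binary search so that a point's neighbor weights sum to log2(k).
- Each directed edge gets weight exp(-(d - rho)/sigma), which is the "distance relative to a local radius" idea made concrete.
- The two directed weights are combined with the fuzzy union a + b - ab.

The function names are hypothetical.

```python
import math


def nearest_neighbors(points, k):
    """Brute-force k nearest neighbors: for each point, a list of
    (index, distance) pairs sorted by Euclidean distance, excluding the point itself."""
    result = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append([(j, d) for d, j in dists[:k]])
    return result


def local_sigma(dists, rho, k, iters=64):
    """Binary-search the local scale sigma so that
    sum(exp(-max(d - rho, 0) / sigma)) is approximately log2(k)."""
    target = math.log2(k)
    lo, hi = 1e-8, 1e8
    for _ in range(iters):
        mid = (lo + hi) / 2
        total = sum(math.exp(-max(d - rho, 0.0) / mid) for d in dists)
        if total > target:
            hi = mid  # weights too large: shrink the local radius
        else:
            lo = mid  # weights too small: grow the local radius
    return (lo + hi) / 2


def fuzzy_graph(points, k):
    """Symmetric weighted k-NN graph as a dict {(i, j): weight} with i < j."""
    directed = {}
    for i, nbrs in enumerate(nearest_neighbors(points, k)):
        dists = [d for _, d in nbrs]
        rho = dists[0]  # distance to the closest neighbor
        sigma = local_sigma(dists, rho, k)
        for j, d in nbrs:
            # The closest neighbor always gets weight exp(0) = 1,
            # so every point is connected to at least one other point.
            directed[(i, j)] = math.exp(-max(d - rho, 0.0) / sigma)
    graph = {}
    for (i, j), w_ij in directed.items():
        w_ji = directed.get((j, i), 0.0)
        # Fuzzy union (probabilistic t-conorm) of the two directed weights.
        graph[(min(i, j), max(i, j))] = w_ij + w_ji - w_ij * w_ji
    return graph


# Two small, well-separated clusters in the plane.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
graph = fuzzy_graph(points, k=3)
for edge, w in sorted(graph.items()):
    print(edge, round(w, 4))
```

Edges inside each cluster receive large weights. The few edges that cross between the clusters receive weights that are numerically close to zero.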
Examples
Basic Examples (1)
Options (2)
"MinDistance" (1)
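In the published UMAP algorithm, the minimum distance enters through the low-dimensional similarity curve 1/(1 + a d^(2b)). The authors' Python package, umap-learn, chooses a and b by fitting this curve to a target that equals 1 up to the minimum distance and then decays exponentially. The sketch below reproduces that idea, with a crude grid search standing in for a proper least-squares fit. It is an assumption that Wolfram's "MinDistance" works the same way, and the helper names are hypothetical.

```python
import math


def target_similarity(d, min_dist, spread=1.0):
    """Target curve: 1 up to min_dist, then exponential decay with scale `spread`."""
    if d <= min_dist:
        return 1.0
    return math.exp(-(d - min_dist) / spread)


def low_dim_similarity(d, a, b):
    """Smooth low-dimensional similarity used during layout optimization."""
    return 1.0 / (1.0 + a * d ** (2 * b))


def fit_ab(min_dist, spread=1.0):
    """Grid-search (a, b) so that low_dim_similarity approximates target_similarity
    on (0, 3 * spread]; a simplified stand-in for a least-squares fit."""
    ds = [3.0 * spread * i / 60 for i in range(1, 61)]
    targets = [target_similarity(d, min_dist, spread) for d in ds]
    best_err, best_a, best_b = float("inf"), None, None
    for ai in range(1, 101):
        a = 0.05 * ai  # a in [0.05, 5.0]
        for bi in range(1, 41):
            b = 0.05 * bi  # b in [0.05, 2.0]
            err = sum((low_dim_similarity(d, a, b) - t) ** 2
                      for d, t in zip(ds, targets))
            if err < best_err:
                best_err, best_a, best_b = err, a, b
    return best_a, best_b


for min_dist in (0.01, 0.5):
    a, b = fit_ab(min_dist)
    print(f"min_dist={min_dist}: a={a:.2f}, b={b:.2f}, "
          f"similarity at d=0.5 -> {low_dim_similarity(0.5, a, b):.3f}")
```

With a small minimum distance, the similarity starts dropping almost immediately. Neighbors therefore have to sit very close together to count as similar, which produces tightly packed clusters. With a larger minimum distance, the similarity stays near 1 over a wider range, so points can spread out more loosely.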
Load a sample from the "MNIST" dataset:
Reduce the dimension of images using "UMAP":
Find features by performing a linear reduction before running the UMAP method using the "MinDistance" suboption:
Visualize the obtained features and compare the results:
"NeighborsNumber" (1)
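A simple way to see how the number of neighbors trades local structure for global structure is to build a k-nearest-neighbor graph and count its connected components. The sketch below uses a plain, unweighted graph, which simplifies UMAP's weighted graph, on three well-separated clusters. It is illustrative only, and the function names and toy data are hypothetical.

```python
import math


def knn_edges(points, k):
    """Undirected edges (i, j), i < j, linking each point to its k nearest neighbors."""
    edges = set()
    for i, p in enumerate(points):
        order = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for _, j in order[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


def count_components(n, edges):
    """Number of connected components, via union-find with path halving."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)
    return len({find(x) for x in range(n)})


# Three clusters of five points on a line: x = 0..4, 100..104, 200..204.
points = [(float(c + i),) for c in (0, 100, 200) for i in range(5)]
for k in (2, 4, 5):
    print(k, count_components(len(points), knn_edges(points, k)))
# prints:
# 2 3
# 4 3
# 5 1
```

With k = 4, each point's neighborhood stays inside its own cluster, so the graph encodes only local structure. With k = 5, the neighborhoods must reach across clusters. The graph then becomes connected and records how the clusters relate to one another.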
Load the Fisher Iris dataset from ExampleData:
Generate a reducer function using the "UMAP" method:
Group the examples by their species:
Reduce the dimension of the features:
Visualize the reduced dataset:
Perform the same operation using a different number of nearest neighbors to construct the high-dimensional graph: