"KDTree" (Data Structure)
"KDTree"
represents k-d tree binary spatial subdivision for a set of real number coordinates.
Details
- A k-d tree is useful for subdividing a set of n coordinates in dimension so that searches, such as nearest neighbor searches, can be done quickly. Each node has two data elements, indexed by 0 and 1. The data element can be either another k-d tree node or a list {i1,i2,…} of point indexes with 0<ik<=n. Each data element has a unique machine integer ID.
-
CreateDataStructure["KDTree",coordinates] create a new "KDTree" from coordinates that can all be represented by real numbers. CreateDataStructure["KDTree",coordinates, n] create a new "KDTree" from coordinates with at most n entries in each subdivistion. Typed[x,"KDTree"] give x the type "KDTree". - For a data structure ds of type "KDTree", the following operations can be used:
-
ds["DataBounds"] return the overall bounds for the coordinates time: O(1) ds["DataCoordinates"] return the coordinates time: O(1) ds["Graphics"] show the spatial subdivision for coordinates in 2D time: O(n) ds["ID"] return the ID for ds time: O(1) ds["ID", b] return the ID for the b (0 or 1) data element of ds time: O(1) ds["Indexes",b] return the indexes for the coordinates enclosed in the box represented by the b (0 or 1) leaf data element of ds time: O(1) ds["LeafQ", b] return True if the b (0 or 1) data element of ds is a leaf time: O(1) ds["Node",b] return the "KDTree" for the the b (0 or 1) node data element of ds time: O(1) ds["SplitDimension"] return the dimension for the spatial split of ds time: O(1) ds["SplitPosition"] return the position for the spatial split of ds time: O(1) ds["SubtreeIndexes",sd] return all the indexes for the coordinates in the subtree represented by ds time: O(n) ds["SubtreeLength"] find the number of coordinates bound by the subtree represented by ds time: O(n) ds["TreeSize"] return the number of data elements time: O(1) ds["Visualization"] return a visualization of ds time: O(n)
Examples
open allclose allBasic Examples (1)
A new "KDTree" object can be created with CreateDataStructure:
Show graphics for the 2D data:
Test if either of the 0 and 1 data elements are leaves:
Get the points included in the 1-data element leaf and highlight them with red in the graphics:
Get the k-d tree represented by the 0-data element:
The graphics for this colors pink the part of the coordinates not included in the subtree:
Visualize the tree structure of the k-d tree:
Get the split dimension and position for the k-d tree:
Get the coordinate indexes for the 1-element data of the k-d tree.
Test that the coordinates are all above the split position in the split dimension:
Get all coordinate indexes for the 1-element subtree k-d tree and check that they are all below the split position:
Scope (3)
Information (1)
A new "KDTree" can be created with CreateDataStructure:
Leaf Size (1)
The subdivision continues until there are at most a given number of coordinates in each box that can be specified as an additional argument in CreateDataStructure.
Limit the number of coordinates to at most 5:
The default is chosen to give a good balance between construction and traversal cost and operation cost on the leaves:
Arbitrary Precision (1)
If you give the coordinates with more than MachinePrecision digits, then the computations will be done in the appropriate precision:
Applications (2)
Nearest Neighbor Search (1)
Set up a program to use a "KDTree" data structure to find the element in a set of coordinates nearest to a given point:
At each node, process the side that the point is in or closest to first:
For a leaf data element, test all the points, and for a node data element, recurse:
Test the function on a set of one million coordinates in three dimensions:
Compare to the result returned by Nearest:
Compare the timing to just computing the Norm of all of the points:
The construction of the k-d tree takes more time than a single scan, but is worth the extra time if several searches are desired:
Range Search (1)
Set up a program to use a "KDTree" object to find all elements in a set of coordinates within given ranges:
At each node, process each side that fits in the range for the node split dimension:
For a leaf data element, test all the points for inclusion, and for a node data element, recurse:
Here is a dataset of selected properties for all countries for which those properties are known:
Get corresponding numerical data and create a "KDTree" data structure from it:
Find the countries with a population under a million people and GDP greater than 10 billion $:
Find the countries with population over 10 million people and area under 100000 km2: