"KDTree" (Data Structure)

"KDTree"

represents k-d tree binary spatial subdivision for a set of real number coordinates.

Details

  • A k-d tree is useful for subdividing a set of n coordinates in dimension so that searches, such as nearest neighbor searches, can be done quickly. Each node has two data elements, indexed by 0 and 1. The data element can be either another k-d tree node or a list {i1,i2,} of point indexes with 0<ik<=n. Each data element has a unique machine integer ID.
  • CreateDataStructure["KDTree",coordinates]create a new "KDTree" from coordinates that can all be represented by real numbers.
    CreateDataStructure["KDTree",coordinates, n]create a new "KDTree" from coordinates with at most n entries in each subdivistion.
    Typed[x,"KDTree"]give x the type "KDTree".
  • For a data structure ds of type "KDTree", the following operations can be used:
  • ds["DataBounds"]return the overall bounds for the coordinatestime: O(1)
    ds["DataCoordinates"]return the coordinates time: O(1)
    ds["Graphics"]show the spatial subdivision for coordinates in 2Dtime: O(n)
    ds["ID"]return the ID for dstime: O(1)
    ds["ID", b]return the ID for the b (0 or 1) data element of dstime: O(1)
    ds["Indexes",b]return the indexes for the coordinates enclosed in the box represented by the b (0 or 1) leaf data element of dstime: O(1)
    ds["LeafQ", b]return True if the b (0 or 1) data element of ds is a leaftime: O(1)
    ds["Node",b]return the "KDTree" for the the b (0 or 1) node data element of dstime: O(1)
    ds["SplitDimension"]return the dimension for the spatial split of dstime: O(1)
    ds["SplitPosition"]return the position for the spatial split of dstime: O(1)
    ds["SubtreeIndexes",sd]return all the indexes for the coordinates in the subtree represented by dstime: O(n)
    ds["SubtreeLength"]find the number of coordinates bound by the subtree represented by dstime: O(n)
    ds["TreeSize"]return the number of data elementstime: O(1)
    ds["Visualization"]return a visualization of dstime: O(n)

Examples

open allclose all

Basic Examples  (1)

A new "KDTree" object can be created with CreateDataStructure:

Show graphics for the 2D data:

Test if either of the 0 and 1 data elements are leaves:

Get the points included in the 1-data element leaf and highlight them with red in the graphics:

Get the k-d tree represented by the 0-data element:

The graphics for this colors pink the part of the coordinates not included in the subtree:

Visualize the tree structure of the k-d tree:

Get the split dimension and position for the k-d tree:

Get the coordinate indexes for the 1-element data of the k-d tree.

Test that the coordinates are all above the split position in the split dimension:

Get all coordinate indexes for the 1-element subtree k-d tree and check that they are all below the split position:

Scope  (3)

Information  (1)

A new "KDTree" can be created with CreateDataStructure:

Information about the data structure ds:

Leaf Size  (1)

The subdivision continues until there are at most a given number of coordinates in each box that can be specified as an additional argument in CreateDataStructure.

Limit the number of coordinates to at most 5:

The default is chosen to give a good balance between construction and traversal cost and operation cost on the leaves:

Arbitrary Precision  (1)

If you give the coordinates with more than MachinePrecision digits, then the computations will be done in the appropriate precision:

Applications  (2)

Nearest Neighbor Search  (1)

Set up a program to use a "KDTree" data structure to find the element in a set of coordinates nearest to a given point:

At each node, process the side that the point is in or closest to first:

For a leaf data element, test all the points, and for a node data element, recurse:

Test the function on a set of one million coordinates in three dimensions:

Compare to the result returned by Nearest:

Compare the timing to just computing the Norm of all of the points:

The construction of the k-d tree takes more time than a single scan, but is worth the extra time if several searches are desired:

Range Search  (1)

Set up a program to use a "KDTree" object to find all elements in a set of coordinates within given ranges:

At each node, process each side that fits in the range for the node split dimension:

For a leaf data element, test all the points for inclusion, and for a node data element, recurse:

Here is a dataset of selected properties for all countries for which those properties are known:

Get corresponding numerical data and create a "KDTree" data structure from it:

Find the countries with a population under a million people and GDP greater than 10 billion $:

Find the countries with population over 10 million people and area under 100000 km2: