"BloomFilter" (Data Structure)

"BloomFilter"

represents a set that tests whether elements are definitely not members.

Details

  • A Bloom filter is typically used as a probabilistic set that can test if elements are definitely not members.
  • Testing an element for membership of a Bloom filter will always return False if the element is not a member of the set.
  • Testing an element for membership of a Bloom filter will not always return True if the element is a member of the set:
  • CreateDataStructure[ "BloomFilter",capacity]create a new "BloomFilter" of specified capacity
    CreateDataStructure["BloomFilter", capacity,p]create a new "BloomFilter" with specified false positive probability p.
    Typed[x,"BloomFilter"]give x the type "BloomFilter"
  • For a data structure of type "BloomFilter", the following operations can be used:
  • ds["Capacity"]the storage capacity of dstime: O(1)
    ds["Copy"]return a copy of dstime: O(n)
    ds["CouldContain",x]return True if ds could contain x and False otherwisetime: O(1)
    ds["Insert",x]insert x into dstime: O(1)
    ds["LoadFactor"]the load factor of dstime: O(1)
    ds["FalsePositiveProbability"]return the false positive probability for dstime: O(1)
    ds["Visualization"]return a visualization of dstime: O(n)
  • The following functions are also supported:
  • dsi===dsjTrue, if dsi equals dsj
    FullForm[ds]full form of ds
    Information[ds]information about ds
    InputForm[ds]input form of ds
    Normal[ds]convert ds to a normal expression

Examples

open allclose all

Basic Examples  (1)

A new "BloomFilter" can be created with CreateDataStructure:

Any expression can be inserted into a "BloomFilter" data structure:

It is possible to test if an expression is not present. A result of False means that f[2] is definitely not in the set:

A result of True means that f[1] is possibly in the set:

Return an expression version of ds:

A visualization of the data structure can be generated:

Scope  (2)

Information  (1)

A new "BloomFilter" can be created with CreateDataStructure:

Information about the data structure ds:

FalsePositiveProbability  (1)

Create a "BloomFilter" with a high false positive probability:

Create a "BloomFilter" with the default false positive probability:

Insert an element:

The data structure with the higher false positive probability returns a higher count of false positives:

Possible Issues  (1)

A Bloom filter that is very full is more likely to return false positives on testing for inclusion.

Create a Bloom filter and insert many elements. The visualization shows it appears quite full:

The load factor shows that the filter is full:

The test for inclusion returns True even though the element was never inserted. This is because the filter is full: