Bloom Filters

So, basically, a Bloom Filter is a quick little check to see whether an item is in a set. Scratch that — it is a quick little check to see whether an item is definitely not in a set. Imagine you have a bit vectorA bit vector is a very simple array or list data structure. Each cell of the array contains a bit (either 0 or 1).... of zeroes. Treat that bit vector like a hash tableA hash table (sometimes also called a hash map) is a data structure that is useful in situations when: you need very quick insertion and retrieval, you very rarely need to iterate over all your items, you don't care what order the items are stored (they don't need to be sorted, etc.), and the data you want to store has.... For each item in the set, use a hash functionA hash function exists within the context of hashing and hash tables. Data to be stored in hash tables includes a unique attribute (id number, username, etc.). Hash functions take the unique attribute and translate it into a hash index. Most algorithms take a numerical value as input, so if your unique attribute is text-based, translate your text into a... to find the hash indexA hash index is an index into the underlying array that supports a hash table. A hash index is determined by a hash function...., and set the bit at the hash index to 1.

Imagine you have a bit vector of zeroes. Treat that bit vector like a hash table. For each item in the set, use a hash function to find the hash index, and set the bit at the hash index to 1.

Now let’s say it’s really expensive (for whatever reason) to determine whether a given item exists in the set. Before you complete the long computation, you can quickly check if a full search is a waste of time. Use the hash function on your item, find the hash index, and determine if the bit at that index is set to 1. If you find the bit at the hash index is 0, great! The item is definitely not in the set. If you find that the index is set to 1, you know that the item *might* be in the set. It isn’t definitive because (almost) no hashing functionA hash function exists within the context of hashing and hash tables. Data to be stored in hash tables includes a unique attribute (id number, username, etc.). Hash functions take the unique attribute and translate it into a hash index. Most algorithms take a numerical value as input, so if your unique attribute is text-based, translate your text into a... is perfect and two items might hash to the same index.

Posted in