Bloom Filters
Bloom filters are a mechanism for “indexing” a set and quickly determining whether or not a given element is part of the set that the Bloom Filter represents. You do not actually store items in a bloom filter, the bloom filter itself is simply an array of bits. The (very) basic idea is that for every element i in the set, you generate a series of array indexes (usually via some hashing method), and then set those positions in your bit array to 1.
For example, for the input “foo” you could generate the indices 1, 4 and 6. Then in your (fixed size) array, you set positions 1, 4 and 6 to “1″. For the input “bar” you generate the indices 1, 4, and 8, and set those positions to “1″. When you want to find out if an input is in the set, you generate the index values (via your hashing method) and check if those positions are set to 1 in the array. If all of them are, the input was likely in the set. If one or more of the indices were not set to 1, then the input was definitely not in the set. You can vary the number of index positions your hashing function provides and the size of your bit-array in order to reduce the number of false positives.
After your bit-array is built, representing your set, it becomes a constant time operation to check if something is in the set. Insertion of new items into the Bloom Filter is also a constant time operation. One major downside to Bloom Filters that I can see is that there isn’t a way to remove something from the bloom filter without causing false negatives.
The two articles I found useful in learning about Bloom Filters were:
- Bloom Filter – Wikipedia Bloom Filter Article
- Article at Cool/Snap? blog
This entry was posted on Sunday, May 3rd, 2009 at 8:48 pm and is filed under Learning, Programming Problems. You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.

