<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>StaticMethod &#187; data structures</title>
	<atom:link href="http://www.staticmethod.net/tag/data-structures/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.staticmethod.net</link>
	<description>Things programmers might find interesting</description>
	<lastBuildDate>Thu, 04 Mar 2010 20:05:58 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Bloom Filters</title>
		<link>http://www.staticmethod.net/2009/05/03/bloom-filters/</link>
		<comments>http://www.staticmethod.net/2009/05/03/bloom-filters/#comments</comments>
		<pubDate>Mon, 04 May 2009 01:48:20 +0000</pubDate>
		<dc:creator>Chris</dc:creator>
				<category><![CDATA[Learning]]></category>
		<category><![CDATA[Programming Problems]]></category>
		<category><![CDATA[bloom filter]]></category>
		<category><![CDATA[data structures]]></category>

		<guid isPermaLink="false">http://www.staticmethod.net/?p=34</guid>
		<description><![CDATA[Bloom filters are a mechanism for &#8220;indexing&#8221; a set and quickly determining whether or not a given element is part of the set that the Bloom filter represents.  You do not actually store items in a Bloom filter; the filter itself is simply an array of bits.  The (very) basic idea is that for [...]]]></description>
			<content:encoded><![CDATA[<p>Bloom filters are a mechanism for &#8220;indexing&#8221; a set and quickly determining whether or not a given element is part of the set that the Bloom filter represents.  You do not actually store items in a Bloom filter; the filter itself is simply an array of bits.  The (very) basic idea is that for every element <em>i</em> in the set, you generate a series of array indexes (usually via some hashing method), and then set those positions in your bit array to 1.</p>
<p>For example, for the input <em>&#8220;foo&#8221;</em> you could generate the indices 1, 4 and 6.  Then, in your (fixed-size) array, you set positions 1, 4 and 6 to &#8220;1&#8221;.  For the input <em>&#8220;bar&#8221;</em> you generate the indices 1, 4 and 8, and set those positions to &#8220;1&#8221;.  When you want to find out if an input is in the set, you generate its index values (via the same hashing method) and check whether those positions are set to 1 in the array.  If all of them are, the input was likely in the set, though not certainly: an input that was never added but hashes to 1, 6 and 8 would find every position already set by &#8220;foo&#8221; and &#8220;bar&#8221;, which is a false positive.  If one or more of the indices are not set to 1, then the input was definitely not in the set.  You can vary the number of index positions your hashing function provides and the size of your bit array to reduce the number of false positives.</p>
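<p>The insert-and-check procedure described above can be sketched in a few lines of Python.  This is only an illustration under some assumptions of mine: the class name <em>BloomFilter</em>, the default sizes, and the trick of salting SHA-256 with a counter to get several index values are all hypothetical choices, not something the articles below prescribe.</p>

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: a fixed-size array of flags plus derived hash indices."""

    def __init__(self, size=64, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        # One byte per position keeps the sketch simple; a real filter would pack bits.
        self.bits = bytearray(size)

    def _indices(self, item):
        # Assumption: salt SHA-256 with a counter to derive num_hashes index values.
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        # Set every position the item hashes to.
        for index in self._indices(item):
            self.bits[index] = 1

    def might_contain(self, item):
        # Any unset position proves the item was never added;
        # all positions set only means "probably present".
        return all(self.bits[index] for index in self._indices(item))


bloom = BloomFilter()
bloom.add("foo")
bloom.add("bar")
print(bloom.might_contain("foo"))  # True: added items are always reported
```

<p>On the size trade-off: for <em>n</em> inserted items, <em>m</em> array positions and <em>k</em> index values per item, the false positive probability is commonly approximated as (1 - e^(-kn/m))^k, so growing the array lowers it for a fixed number of items.</p>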
<p>After your bit array is built, representing your set, checking whether something is in the set takes constant time with respect to the number of items in the set: you compute a fixed number of hashes and look up that many positions.  Inserting a new item into the Bloom filter is constant time for the same reason.  One major downside to Bloom filters that I can see is that there isn&#8217;t a way to remove something without risking false negatives, because clearing an item&#8217;s positions may also clear positions that other items rely on.</p>
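<p>To see why removal is a problem, here is a small Python sketch that reuses the index values from the &#8220;foo&#8221; and &#8220;bar&#8221; example.  The lookup table stands in for a real hashing method, and <em>naive_remove</em> is a hypothetical helper written only to show the failure.</p>

```python
# Index values taken from the "foo"/"bar" example; a real filter would hash instead.
INDICES = {"foo": (1, 4, 6), "bar": (1, 4, 8)}

bits = [0] * 10


def add(item):
    for index in INDICES[item]:
        bits[index] = 1


def might_contain(item):
    return all(bits[index] for index in INDICES[item])


def naive_remove(item):
    # Clearing positions looks like the inverse of add, but positions can be shared.
    for index in INDICES[item]:
        bits[index] = 0


add("foo")
add("bar")
naive_remove("foo")  # clears positions 1 and 4, which "bar" also depends on
print(might_contain("bar"))  # False: a false negative, "bar" was never removed
```

<p>Positions 1 and 4 belong to both inputs, so clearing them for &#8220;foo&#8221; makes the filter wrongly report that &#8220;bar&#8221; is absent.</p>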
<p>The two articles I found useful in learning about Bloom Filters were:</p>
<ul>
<li><a href="http://en.wikipedia.org/wiki/Bloom_filter">Bloom Filter</a> &#8211; Wikipedia Bloom Filter Article</li>
<li><a href="http://www.coolsnap.net/kevin/?p=13">Article</a> at Cool/Snap? blog</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.staticmethod.net/2009/05/03/bloom-filters/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
