<?xml version="1.0" encoding="UTF-8"?> <rss
version="2.0"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:wfw="http://wellformedweb.org/CommentAPI/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
> <channel><title>Autarchy of the Private Cave &#187; set</title> <atom:link href="https://bogdan.org.ua/tags/set/feed" rel="self" type="application/rss+xml" /><link>https://bogdan.org.ua</link> <description>Tiny bits of bioinformatics, [web-]programming etc</description> <lastBuildDate>Wed, 28 Dec 2022 16:09:04 +0000</lastBuildDate> <language>en-US</language> <sy:updatePeriod>hourly</sy:updatePeriod> <sy:updateFrequency>1</sy:updateFrequency> <generator>https://wordpress.org/?v=3.8.27</generator> <item><title>Python performance: set vs list</title><link>https://bogdan.org.ua/2011/08/15/python-performance-set-vs-list.html</link> <comments>https://bogdan.org.ua/2011/08/15/python-performance-set-vs-list.html#comments</comments> <pubDate>Mon, 15 Aug 2011 09:29:04 +0000</pubDate> <dc:creator><![CDATA[Bogdan]]></dc:creator> <category><![CDATA[Notepad]]></category> <category><![CDATA[Programming]]></category> <category><![CDATA[Python]]></category> <category><![CDATA[list]]></category> <category><![CDATA[membership]]></category> <category><![CDATA[performance]]></category> <category><![CDATA[set]]></category> <guid
isPermaLink="false">http://bogdan.org.ua/?p=1673</guid> <description><![CDATA[Sometimes you need to be sure that no identifier is processed twice &#8211; for example, when parsing a file into a database, with the file potentially containing duplicate records. An obvious solution is to wrap the DB insertion code in a try&#8230;except block and handle duplicate primary ID exceptions. Another, sometimes more desirable solution [&#8230;]]]></description> <content:encoded><![CDATA[<p>Sometimes you need to be sure that no identifier is processed twice &#8211; for example, when parsing a file into a database, with the file potentially containing duplicate records. An obvious solution is to wrap the DB insertion code in a try&#8230;except block and handle <em>duplicate primary ID</em> exceptions. Another, sometimes more desirable solution is to maintain a set or list of processed IDs internally, and check against that collection before attempting any insertion. So should it be a set or a list?</p><p>There are already quite a few internet resources discussing &#8220;python set vs list&#8221;, but probably the simplest yet elegant way to test that is below.<br
/> <span
id="more-1673"></span><br
/> First, test the speed of adding/appending to a set or a list (here, I&#8217;m mimicking the real-life application, thus the test case has an optional loop):</p><div
id="ig-sh-1" class="syntax_hilite"><div
class="code"><ol
class="code" style="font-family:monospace;"><li
style="font-weight: normal; vertical-align:top;"><div
style="font: normal normal 1em/1.2em monospace; margin:0; padding:0; background:none; vertical-align:top;">$ python -m timeit -s 'myset = set()' 'for x in xrange(1000): myset.add(x)'</div></li><li
style="font-weight: normal; vertical-align:top;"><div
style="font: normal normal 1em/1.2em monospace; margin:0; padding:0; background:none; vertical-align:top;">10000 loops, best of 3: 133 usec per loop</div></li><li
style="font-weight: normal; vertical-align:top;"><div
style="font: normal normal 1em/1.2em monospace; margin:0; padding:0; background:none; vertical-align:top;">$ python -m timeit -s 'tmp = list()' 'for x in xrange(1000): tmp.append(x)'</div></li><li
style="font-weight: normal; vertical-align:top;"><div
style="font: normal normal 1em/1.2em monospace; margin:0; padding:0; background:none; vertical-align:top;">10000 loops, best of 3: 116 usec per loop</div></li></ol></div></div><p>As we can see, set and list are comparable in the speed of adding new items, with the list taking slightly (~13%) less time than the set (116 vs. 133 usec per loop).</p><p>Now, the speed of membership testing: &#8216;x in tmp&#8217;. I&#8217;ve deliberately chosen an imbalance of True (1%) and False (99%) results for this test &#8211; again, mimicking the real problem I have at hand:</p><div
id="ig-sh-2" class="syntax_hilite"><div
class="code"><ol
class="code" style="font-family:monospace;"><li
style="font-weight: normal; vertical-align:top;"><div
style="font: normal normal 1em/1.2em monospace; margin:0; padding:0; background:none; vertical-align:top;">$ python -m timeit -s 'tmp = set()' -s 'for x in xrange(1000): tmp.add(x)' 'for x in xrange(100000): x in tmp'</div></li><li
style="font-weight: normal; vertical-align:top;"><div
style="font: normal normal 1em/1.2em monospace; margin:0; padding:0; background:none; vertical-align:top;">100 loops, best of 3: 7.27 msec per loop</div></li><li
style="font-weight: normal; vertical-align:top;"><div
style="font: normal normal 1em/1.2em monospace; margin:0; padding:0; background:none; vertical-align:top;">$ python -m timeit -s 'tmp = list()' -s 'for x in xrange(1000): tmp.append(x)' 'for x in xrange(100000): x in tmp'</div></li><li
style="font-weight: normal; vertical-align:top;"><div
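style="font: normal normal 1em/1.2em monospace; margin:0; padding:0; background:none; vertical-align:top;">10 loops, best of 3: 2.12 sec per loop</div></li></ol></div></div><p>The list is much slower for membership testing (2.12 sec vs. 7.27 msec per loop, roughly 290 times slower), because each lookup scans the list element by element, while <a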
href="http://en.wikipedia.org/wiki/Collection_(computing)#Sets">sets were designed to be fast for doing just that</a>.</p>]]></content:encoded> <wfw:commentRss>https://bogdan.org.ua/2011/08/15/python-performance-set-vs-list.html/feed</wfw:commentRss> <slash:comments>1</slash:comments> </item> </channel> </rss>