Python: cPickle vs ConfigParser vs Shelve Performance

March 26, 2009 § 4 Comments

I need to store a large number of key-value pairs to map my Python objects. These key-value pairs DO NOT have to be replicated across multiple servers, and the project DOES NOT require an external storage system such as an RDBMS or Berkeley DB. The fewer external dependencies, the better.

That leads me to cPickle vs ConfigParser vs Shelve. cPickle is the obvious contender: it is fast and easy to use.

ConfigParser is an interface for reading and writing config files, but its format is very key-value-ish, so it counts.

Shelve is an obvious candidate too, because of its dictionary-like interface.
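For readers on Python 3, where cPickle has been folded into pickle and ConfigParser is spelled configparser, here is a minimal sketch of what a save-and-load round trip looks like with each of the three contenders. This is not the benchmark code from this post; the helper names (make_data, pickle_roundtrip, configparser_roundtrip, shelve_roundtrip) and the "store" section name are made up for illustration.

```python
import configparser
import os
import pickle
import shelve
import tempfile


def make_data(how_many):
    # Hypothetical test data: string keys mapped to string values.
    return {"key_%s" % i: "value_%s" % i for i in range(how_many)}


def pickle_roundtrip(data, path):
    # Pickle writes the whole dict in one call and reads it back in one call.
    with open(path, "wb") as f:
        pickle.dump(data, f, pickle.HIGHEST_PROTOCOL)
    with open(path, "rb") as f:
        return pickle.load(f)


def configparser_roundtrip(data, path, section="store"):
    # ConfigParser stores option/value strings inside a named section.
    parser = configparser.ConfigParser()
    parser.add_section(section)
    for key, value in data.items():
        parser.set(section, key, value)
    with open(path, "w") as f:
        parser.write(f)
    loaded = configparser.ConfigParser()
    loaded.read(path)
    return dict(loaded.items(section))


def shelve_roundtrip(data, path):
    # Shelve behaves like a dict whose entries are written one key at a time.
    with shelve.open(path) as db:
        for key, value in data.items():
            db[key] = value
    with shelve.open(path) as db:
        return {key: db[key] for key in db}


if __name__ == "__main__":
    tmp_dir = tempfile.mkdtemp()
    sample = make_data(1000)
    print(pickle_roundtrip(sample, os.path.join(tmp_dir, "data.pkl")) == sample)
    print(configparser_roundtrip(sample, os.path.join(tmp_dir, "data.ini")) == sample)
    print(shelve_roundtrip(sample, os.path.join(tmp_dir, "data.shelf")) == sample)
```

Note that ConfigParser only handles string values and lowercases option names by default, so it fits this use case only when the keys and values are plain lowercase-friendly strings.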

So I ran a profiler test using hotshot, and here are the results:


Profile: Saving 100000 key-value to pickle file
700001 function calls in 2.330 CPU seconds


Profile: Extracting 100000 key-value from pickle file
4 function calls in 0.258 CPU seconds


Profile: Saving 100000 key-value in ConfigParser file
900004 function calls in 2.502 CPU seconds


Profile: Extracting 100000 key-value from ConfigParser file
300007 function calls in 1.936 CPU seconds


Profile: Saving 100000 key-value to shelve file
1300047 function calls (1300045 primitive calls) in 10.091 CPU seconds


Profile: Extracting 100000 key-value from shelve file
500027 function calls in 6.527 CPU seconds
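The hotshot module was removed in Python 3, so anyone wanting to reproduce summaries of this shape today would probably reach for cProfile instead. Below is a minimal sketch, assuming a hypothetical helper profile_call and a stand-in workload build_and_pickle; neither comes from the original benchmark code.

```python
import cProfile
import io
import pickle
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile; return (result, total_calls, report_text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # The report header has the form "N function calls in X seconds".
    stats.sort_stats("cumulative").print_stats(5)
    return result, stats.total_calls, buffer.getvalue()


def build_and_pickle(how_many):
    # Stand-in workload: build a dict of key-values and serialize it.
    data = {}
    for i in range(how_many):
        data["key_%s" % i] = "value_%s" % i
    return pickle.dumps(data)


if __name__ == "__main__":
    _, calls, report = profile_call(build_and_pickle, 100000)
    print("total function calls:", calls)
    print(report)
```

The absolute numbers will differ from the ones above, since the profiler, the interpreter, and the hardware are all different.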

From the results:

  • Shelve is disappointingly slow. It executes 1,300,047 function calls just to save the data.
  • cPickle is not bad at all. As expected, it performs really quickly.
  • ConfigParser is the biggest surprise here; I was expecting it to be much slower.

Side Notes:

  • I use threading.Lock before setting each key-value to prevent resource contention (which is a real-life case).
  • Any improvements are greatly appreciated, especially different data storage options that I’m not aware of.
  • Code can be found here.
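As a rough illustration of the threading.Lock note above, here is a minimal sketch of guarding each write with a lock. The LockedStore class and the fill and run_workers helpers are hypothetical names chosen for this example, not the code linked from this post.

```python
import threading


class LockedStore:
    """Hypothetical in-memory key-value store guarded by a single lock."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def set(self, key, value):
        # Acquire the lock before every write, as described in the side note.
        with self._lock:
            self._data[key] = value

    def snapshot(self):
        # Copy under the lock so callers never see a half-updated dict.
        with self._lock:
            return dict(self._data)


def fill(store, worker_id, how_many):
    for i in range(how_many):
        store.set("key_%s_%s" % (worker_id, i), i)


def run_workers(num_workers=4, how_many=250):
    store = LockedStore()
    threads = [
        threading.Thread(target=fill, args=(store, worker, how_many))
        for worker in range(num_workers)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return store.snapshot()


if __name__ == "__main__":
    print(len(run_workers()))
```

Acquiring and releasing the lock on every write adds work per key, so it likely contributes to the call counts a profiler reports for the save step.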

§ 4 Responses to Python: cPickle vs ConfigParser vs Shelve Performance

  • First, why did you include dictionary creation in time measurements? Second, why are you recreating this dictionary when you load values with cPickle?

    data = pickle.load(output)
    output.close()
    for i in xrange(how_many):
        result = data["key_%s" % i]

    You can just write result = data and be done.

    That said, your conclusion is still correct.

    Thanks!

  • Oops, my second point is not correct: you’re not recreating it. But still, why this “for” loop?

    P.S. Sorry for writing my name in Russian.

  • didip says:

    @Dmitry
    Your Russian name is awesome. =)

    The for loop is just a convenience to generate a lot of items.

  • Jonathan says:

    Use Shelve when you have a large amount of data that you only need small parts of at a time.

    Use cPickle when you have data that you want to access all at once.

    Basically between Shelve and cPickle, you are trading disk-access speed for in-memory-access speed.
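The trade-off described in the last comment can be sketched as follows: reading one key from a pickle file means deserializing the whole dictionary first, while shelve looks up only the requested entry on disk. The helper names below are hypothetical, and this Python 3 sketch uses pickle in place of cPickle.

```python
import os
import pickle
import shelve
import tempfile


def write_pickle(data, path):
    with open(path, "wb") as f:
        pickle.dump(data, f, pickle.HIGHEST_PROTOCOL)


def write_shelve(data, path):
    with shelve.open(path) as db:
        for key, value in data.items():
            db[key] = value


def read_one_from_pickle(path, key):
    # The entire dict is loaded into memory before the lookup happens.
    with open(path, "rb") as f:
        data = pickle.load(f)
    return data[key]


def read_one_from_shelve(path, key):
    # Only the requested entry is fetched from disk and unpickled.
    with shelve.open(path) as db:
        return db[key]


if __name__ == "__main__":
    tmp_dir = tempfile.mkdtemp()
    sample = {"key_%s" % i: "value_%s" % i for i in range(1000)}
    pickle_path = os.path.join(tmp_dir, "data.pkl")
    shelve_path = os.path.join(tmp_dir, "data.shelf")
    write_pickle(sample, pickle_path)
    write_shelve(sample, shelve_path)
    print(read_one_from_pickle(pickle_path, "key_42"))
    print(read_one_from_shelve(shelve_path, "key_42"))
```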

You are currently reading Python: cPickle vs ConfigParser vs Shelve Performance at RAPD.
