
Re: py-lmdb



Luke Kenneth Casson Leighton wrote:
We fell for the fantasy of parallel writes with BerkeleyDB, but after a
dozen-plus years of poking, profiling, and benchmarking it has all become
clear - all of that locking overhead and deadlock detection/recovery is just
a waste of resources.

  ... which is why tdb went to the other extreme, to show it could be done.

But even tdb only allows one write transaction at a time. I looked into writing a back-tdb for OpenLDAP back in 2009, before I started writing LMDB. I know pretty well how tdb works...
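For anyone following along with py-lmdb itself, here is a minimal sketch of what that single-writer model looks like from Python: readers run concurrently against an MVCC snapshot, while write transactions are serialized by the environment. The path and sizes below are placeholders, not anything from this thread.

    import lmdb

    # Open an environment; map_size is the maximum size of the data file in bytes.
    env = lmdb.open('/tmp/example-db', map_size=1 << 30)

    # Only one write transaction can be active per environment at a time;
    # a second writer simply waits until this one commits or aborts.
    with env.begin(write=True) as txn:
        txn.put(b'key1', b'value1')
        txn.put(b'key2', b'value2')

    # Read transactions never block the writer or each other; each one sees
    # the consistent snapshot that existed when it began.
    with env.begin() as txn:
        print(txn.get(b'key1'))

    env.close()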

https://twitter.com/hyc_symas/status/451763166985613312

quote:

"The new code is faster at indexing and searching, but not so much
faster it would blow you away, even using
LMDB. Turns out the slowness of Python looping trumps the speed of a
fast datastore :(. The difference
might be bigger on a big index; I'm going to run experiments on the
Enron dataset and see."

interesting.  so why are reads up at 5,000,000 per second under python
(in a python loop, obviously) but writes aren't?  something odd there.

Good question. I'd guess there's some memory allocation overhead involved in writes. The Whoosh guys have some more perf stats here:

https://bitbucket.org/mchaput/whoosh/wiki/Whoosh3

(Their test.Tokyo / All Keys result is highly suspect, though: the timing is the same for 100,000 keys as for 1M keys. Probably a bug in their test code.)
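To make the read-vs-write asymmetry concrete, here is a rough micro-benchmark sketch using py-lmdb; the operation count, key format, and value size are made up for illustration, and the absolute numbers will vary a lot by machine and build.

    import time
    import lmdb

    N = 100000  # operation count; placeholder, not taken from the Whoosh tests
    env = lmdb.open('/tmp/bench-db', map_size=1 << 30)

    # Write loop: one write transaction, one put per iteration. Each iteration
    # pays Python bytecode overhead plus allocation of a fresh key object.
    start = time.time()
    with env.begin(write=True) as txn:
        for i in range(N):
            txn.put(b'%010d' % i, b'x' * 16)
    print('writes/sec: %.0f' % (N / (time.time() - start)))

    # Read loop: each get returns the stored value out of the memory map; the
    # Python loop itself is still a large share of the per-operation cost.
    start = time.time()
    with env.begin() as txn:
        for i in range(N):
            txn.get(b'%010d' % i)
    print('reads/sec: %.0f' % (N / (time.time() - start)))

    env.close()

If I remember the binding right, py-lmdb can also hand back buffers instead of copied bytes on the read side (env.begin(buffers=True)), which trims per-get allocation; I don't know whether the Whoosh tests used that.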

--
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/