
Re: LMDB: comparison contexts




Quoting Howard Chu <hyc@symas.com>:
This is explicitly documented as the correct list.
https://gitorious.org/mdb/mdb/source/3368d1f5e243225cba4d730fba19ff600798ebe3:

Ah, great. The page at symas.com/mdb was a bit more ambiguous.

It would be foolish to implement the comparator in a non-native compiled language.

Certainly when performance is paramount. But that is not the only reason someone might prefer to use LMDB.

Whatever the language, sometimes a comparison operation might depend on run-time data: for instance, case-insensitive string comparison is locale-dependent, so if you don't want to hard-code a fixed list of supported locales into the program, you need some way of passing the locale to the comparator function.
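To make the request concrete, here is a minimal sketch of what a comparator with a context argument could look like. Everything in it is hypothetical: val_t only stands in for MDB_val, ctx_cmp_func is not an LMDB type (the real MDB_cmp_func takes just the two values), and the byte-folding table is a simplified stand-in for real locale data.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in for MDB_val: a size plus a pointer to the bytes. */
typedef struct {
    size_t      size;
    const void *data;
} val_t;

/* Hypothetical three-argument comparator type. LMDB's actual MDB_cmp_func
 * has no ctx parameter; this is the shape being asked for. */
typedef int (ctx_cmp_func)(const val_t *a, const val_t *b, void *ctx);

/* Run-time comparison state: a byte-folding table built when the program
 * starts, instead of a list of locales fixed at compile time. */
typedef struct {
    unsigned char fold[256];
} fold_ctx;

/* Fill the table from tolower() as it behaves in the current C locale. */
static void fold_ctx_init(fold_ctx *c)
{
    for (int i = 0; i < 256; i++)
        c->fold[i] = (unsigned char)tolower(i);
}

/* Case-insensitive byte comparison; a shorter key sorts first when one
 * key is a prefix of the other. */
static int cmp_folded(const val_t *a, const val_t *b, void *ctx)
{
    const fold_ctx *c = ctx;
    const unsigned char *pa = a->data;
    const unsigned char *pb = b->data;
    size_t n = a->size < b->size ? a->size : b->size;

    for (size_t i = 0; i < n; i++) {
        int d = (int)c->fold[pa[i]] - (int)c->fold[pb[i]];
        if (d != 0)
            return d;
    }
    return (a->size > b->size) - (a->size < b->size);
}
```

Without the ctx argument, the table would have to live in a global or thread-local variable, which is exactly the kind of workaround I would like to avoid.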

Anything is feasible but this will never be done. The comparator is called a huge number of times in any DB operation. Adding an additional parameter to the calling sequence causes a significant, measurable slowdown.

Seriously? That goes against both my intuition and some quick benchmarks.

The intuition: if the binary search loop in mdb_node_search is so tight that function call overhead for the comparison is significant, then I would expect the indirect branch to be a bigger part of that overhead than argument passing. After all, in the absence of register pressure, passing an argument is pretty much just a single move instruction. So if this were performance-critical, I would expect to see a fast-path specialization of the search loop for mdb_cmp_memn. That would allow the most common comparison to be inlined. But if that hasn't been worthwhile, then skimping on the parameters (at a great cost to flexibility) seems even less so.
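By fast-path specialization I mean roughly the following. This is only a sketch under my own assumptions, not the actual mdb_node_search: sval, cmp_fn, cmp_memn and search are hypothetical names, and the keys sit in a plain sorted array rather than in a page.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for MDB_val. */
typedef struct {
    size_t      size;
    const void *data;
} sval;

/* Two-argument comparator, the same shape as LMDB's MDB_cmp_func. */
typedef int (cmp_fn)(const sval *a, const sval *b);

/* Default comparison: memcmp over the common prefix; on a tie the
 * shorter key sorts first. */
static inline int cmp_memn(const sval *a, const sval *b)
{
    size_t n = a->size < b->size ? a->size : b->size;
    int d = memcmp(a->data, b->data, n);
    if (d != 0)
        return d;
    return (a->size > b->size) - (a->size < b->size);
}

/* Binary search over a sorted array of keys. Returns the index of the
 * first key >= target and sets *exact to 1 on an exact match. */
static size_t search(const sval *keys, size_t nkeys, const sval *target,
                     cmp_fn *cmp, int *exact)
{
    size_t lo = 0, hi = nkeys;
    *exact = 0;

    if (cmp == cmp_memn) {
        /* Fast path: a direct call that the compiler is free to inline. */
        while (lo < hi) {
            size_t mid = lo + (hi - lo) / 2;
            int rc = cmp_memn(target, &keys[mid]);
            if (rc == 0) { *exact = 1; return mid; }
            if (rc > 0) lo = mid + 1; else hi = mid;
        }
    } else {
        /* Generic path: an indirect call on every iteration. */
        while (lo < hi) {
            size_t mid = lo + (hi - lo) / 2;
            int rc = cmp(target, &keys[mid]);
            if (rc == 0) { *exact = 1; return mid; }
            if (rc > 0) lo = mid + 1; else hi = mid;
        }
    }
    return lo;
}
```

The two loops differ only in how the comparison is called, so whatever the indirect call costs is confined to databases that actually install a custom comparator.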

The benchmark: I tried out both modifications (a fast-path memcmp and a third parameter to the comparison function) with db_bench_mdb. I admittedly know nothing about benchmarking databases, but I was unable to see any meaningful difference with either modification. So if there is some evidence that the third parameter would make a difference, I'd be interested in learning of it.

Although the documentation in lmdb.h is generally good, it's a bit
wishy-washy when it comes to lifetimes of data.

Sorry, I take this back. This was a pet peeve of mine when I first learned LMDB, but after that a note was added to mdb_get:

	 * @note Values returned from the database are valid only until a
	 * subsequent update operation, or the end of the transaction.

This is perfectly sufficient (although it should probably be attached to mdb_cursor_get as well), but somehow I managed to miss it ever since. My bad.


Lauri