
Re: back-mdb - futures...

Howard Chu wrote:
The basic idea is to construct a database that is always mmap'd to a fixed
virtual address, and which returns its mmap'd data pages directly to the
caller (instead of copying them to a newly allocated buffer). Given a fixed
address, it becomes feasible to make the on-disk record format identical to
the in-memory format. Today we have to convert from a BER-like encoding into
our in-memory format, and while that conversion is fast it still takes up a
measurable amount of time. (Which is one reason our slapd entry cache is still
so much faster than just using BDB's cache.) So instead of storing offsets
into a flattened data record, we store actual pointers (since they all simply
reside in the mmap'd space).
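
To make the pointer idea concrete, here is a rough sketch, not the actual back-mdb code. The names `struct record`, `map_region` and `store_record` are invented for illustration, and it uses an anonymous mapping so it stays self-contained. A real database would map a file, and would have to verify that the mapping landed at the same address as when the pointers were written, since the stored pointers are only meaningful at that address.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical record layout: the pointer is stored as-is, not as an offset. */
struct record {
    const char *name;     /* points into the same mapped region */
    size_t      name_len;
};

/* Map a read/write region, passing the desired address as a hint.
 * Returns NULL on failure. The caller must compare the result against
 * the hint before trusting any pointers previously stored in the region. */
static void *map_region(void *hint, size_t len)
{
    void *p = mmap(hint, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Place a record at the start of the region, with its string data
 * immediately after it, and store a real pointer to that data. */
static struct record *store_record(void *base, const char *name)
{
    assert(base != NULL && name != NULL);
    struct record *r = base;
    char *data = (char *)base + sizeof *r;
    size_t n = strlen(name);

    memcpy(data, name, n + 1);
    r->name = data;       /* an actual pointer, usable without decoding */
    r->name_len = n;
    return r;
}
```

A reader handed `r` can use `r->name` directly, with no copy and no decode step, which is the whole point of the scheme.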

One stumbling block: on Little-Endian machines, of which we seem to be cursed with an overabundance these days, the in-memory format for integers makes a terrible format for database keys, because a plain bytewise comparison sees the least significant byte first. Byte-swapping them between on-disk and in-memory would completely defeat the mmap'ing scheme. So there are two choices. The first is to store them Little-Endian on disk, and use a reverse-order key comparison function (which we did back in OpenLDAP 2.1). This would break portability of the database files to other machines using Big-Endian format.
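
A small illustration of the problem and of the reverse-order comparison, as a sketch only. The helper names `put_le32`, `cmp_forward` and `cmp_reverse` are made up here and are not the OpenLDAP 2.1 functions. The keys are written byte by byte so the example behaves the same on any host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write a 32-bit ID into buf in Little-Endian order, whatever the host is. */
static void put_le32(unsigned char *buf, uint32_t v)
{
    buf[0] = (unsigned char)(v & 0xff);
    buf[1] = (unsigned char)((v >> 8) & 0xff);
    buf[2] = (unsigned char)((v >> 16) & 0xff);
    buf[3] = (unsigned char)((v >> 24) & 0xff);
}

/* Forward bytewise comparison, as a default key comparator would do.
 * On Little-Endian keys this looks at the least significant byte first. */
static int cmp_forward(const unsigned char *a, const unsigned char *b, size_t n)
{
    return memcmp(a, b, n);
}

/* Reverse-order comparison: start at the last byte, which holds the most
 * significant bits of a Little-Endian integer, and walk backwards. */
static int cmp_reverse(const unsigned char *a, const unsigned char *b, size_t n)
{
    assert(a != NULL && b != NULL);
    while (n-- > 0) {
        if (a[n] != b[n])
            return a[n] < b[n] ? -1 : 1;
    }
    return 0;
}
```

With keys for 1 and 256, the forward comparison sorts 256 before 1, while the reverse-order one gets the numeric order right.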

The other alternative is to store them in Big-Endian format, and just use them in their reversed order in memory. That would allow the database files to remain portable and eliminate the need for alternate key comparison functions. But it would require a custom iterator to do in-order traversals and entryID sorting comparisons.
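
For comparison, a sketch of what the Big-Endian alternative implies. The names `put_be32`, `get_be32` and `struct be32_iter` are hypothetical, and the 32-bit ID width is just an assumption for the example. The stored keys sort correctly with plain `memcmp`, but every in-memory use of an ID has to go through a decode step, which is the runtime cost in question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write a 32-bit ID into buf in Big-Endian order, whatever the host is. */
static void put_be32(unsigned char *buf, uint32_t v)
{
    buf[0] = (unsigned char)((v >> 24) & 0xff);
    buf[1] = (unsigned char)((v >> 16) & 0xff);
    buf[2] = (unsigned char)((v >> 8) & 0xff);
    buf[3] = (unsigned char)(v & 0xff);
}

/* Decode a stored Big-Endian ID into a native integer. */
static uint32_t get_be32(const unsigned char *buf)
{
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
}

/* Hypothetical iterator over a packed list of stored IDs. */
struct be32_iter {
    const unsigned char *pos;
    const unsigned char *end;
};

/* Fetch the next ID in native form; returns 0 when the list is exhausted. */
static int be32_iter_next(struct be32_iter *it, uint32_t *out)
{
    assert(it != NULL && out != NULL);
    if (it->end - it->pos < 4)
        return 0;
    *out = get_be32(it->pos);
    it->pos += 4;
    return 1;
}
```

The stored bytes can never be handed to the caller as a plain array of integers on a Little-Endian host, so code that walks or sorts ID lists has to use something like the iterator above.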

At this point I'm leaning toward the former choice: store in native byte order and sacrifice portability. The alternative will have too big an impact on runtime performance. With the native byte order choice, if you ever want to cluster a bunch of servers on the same database, they will all need to use the same byte order. (And of course, the same word size, which is the same requirement we have today.)

(Too bad C doesn't give us a "byteswapped" data attribute; some CPU architectures have instructions that can load a word from memory in a byte order that you choose. That would make life easier here, but if your CPU was that smart, it probably wouldn't be using brain-damaged byte order in the first place. Oh well...)

(And yes, by the way, we have planning for LDAPCon2009 this September in the
works; I imagine the Call For Papers will go out in a week or two. So now's a
good time to pull up whatever other ideas you've had in the back of your mind
for a while...)

Reminder: LDAPCon2009 is just a couple weeks away!

  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/