[Date Prev][Date Next] [Chronological] [Thread] [Top]

Re: Please comment



At 07:32 PM 6/4/00 -0400, Mark Valence wrote:
 >
 >I have a few fairly minor modifications I'd like to commit, but first
 >I'd like to get some comments.  I've done some profiling on Unix and
 >Windows NT, using the LDBM backend, and have found three changes that
 >significantly improve performance:
 >
 >1) the ldbm backend is flushing caches too often.  Adding a simple
 >"dirty" flag and flushing only when the DB is dirty noticeably
 >improves performance.

If the dirty flag is a true (that is, we sync after each
and very cache modify), I have no objection.  Safety over
speed, please.

Yup, the dirty flag just allows the backend to bypass flushes when no modifications occur -- mods will always cause a cache flush. In other words, this change is really just something that should not have been necessary in the first place.


 >2) currently, a new thread is created for each operation, which
 >introduces huge overhead.  I've implemented simple thread-pooling for
 >the NT thread code, and have seen 5-fold performance improvement.
 >The changes I've made are limited to thr_nt.c, which has the benefit
 >that testing (and integration) is easier, but the drawback that other
 >thread modules still use the same 1 thread per operation model.  It's
 >easy to port the thread pooling code to the other thread models, and
 >I haven't looked at how easy it would be to push the thread-pooling
 >code "outside" of the platform-specific thread code.  Any comments
 >here?  Preferences?

I rather we implement the ldap_pvt_thread_pool* routines built
upon ldap_pvt_thread* routines and use this in slapd as needed.

OK. I will see how much this would change in the code above the thread modules. My guess is that would not alter the "higher" levels of code, but I want to be sure before making such broad changes. Plus, I can only test the changes with pthreads, NT threads, and mach threads.


 >3) Here's a neat test:  run a bunch of clients against four LDAP
 >servers that are identical except for one thing.  In one, use the ACL:
 >
 >    access to * by * write
 >
 >in another, use:
 >
 >    access to * by dn="cn=YourName, ou=People, dc=..." write

regex won't match any dn (due to extra spaces)

 >and in another use:
 >
 >    access to * by dn=".*ou=People, dc=..." write

regex won't match any dn (due to extra spaces)

 >and in the last use:
 >
 >    access to * by dn=".*ou=.*, dc=.*, ..." write

regex won't match any dn (due to extra spaces)

 >Obviously, the DNs above should make sense for the setup.  The idea
 >in the last case is to have a "complex" regex.

But that complex regex is inefficient.  .* or .+ should
be avoid except at the end of a regex.

OK, forget the contrived (and incorrect) examples. The point is that regex matching is expensive even for trivial patterns, and there are a few cases where regex's can be factored out.


 >What you'll see is that performance on the last three cases is about
 >the same, while the first case is *much* faster, relatively speaking.

The first shortcut regex processing, as do "self", "anonymous",
"users", dn="", dn=".+", dn=*, and many varients.  A simple
DN could be shortcutted as well.

 >I've tested some simple changes that special case the two middle
 >examples above to do a simple DN match and a DN suffix match,
 >respectively.  In this case, I've seen a factor of 10 improvement in
 >performance (obviously, the improvement depends on what your ACLs
 >look like).

I wouldn't mind special casing a simple DN match (ie: a
regex that is text only).... but not beyond this.

 >My implementation is sort of a hack, to make it easier to integrate
 >the changes into the official source.  If nobody protests, I will add
 >this special case code to the official source, but will clean it up a
 >bit for better maintainability.

I don't think your 3 or 4 cases should be shortcutted.

Definitely not 4, but I think suffix matching (3) is an easy (and desirable) target. It is a useful pattern, unless there is another way to specify all objects within a certain subtree of the DIT.


A note: we should be a bit careful of over optimizing DN
processing as we will likely have to rework DNs later to
fully implement RFC 2253.

I'll take a look at 2253 and see if that's something I can do at the same time (even though it's sort of tangential).


Thanks for the comments.

Mark.