Re: entry_free() etc. bottlenecks

Howard Chu writes:
> The obvious fix is to adopt the same strategies that tcmalloc uses. (And 
> unfortunately we can't simply rely on tcmalloc always being available, or 
> always being stable in a given environment.)

Good, though I'd like to see these slapd re-implementations of system
features (like malloc) #ifdeffed with a fallback to the system feature.
Then one can compile with -D<revert to system feature> either when the
system one is as good as or better than slapd's, or to simplify debugging.
Configure can guess about it too, e.g. it can detect tcmalloc.
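
A minimal sketch of that kind of switch, to make the idea concrete.  All
names here are made up for illustration (USE_SYSTEM_MALLOC, obj_alloc,
obj_free, OBJ_SIZE); none of them are slapd identifiers, and the one-slot
cache merely stands in for whatever slapd-specific allocator is used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical switch: build with -DUSE_SYSTEM_MALLOC to bypass the cache. */
#ifdef USE_SYSTEM_MALLOC

static void *obj_alloc(size_t size) { return malloc(size); }
static void  obj_free(void *p)      { free(p); }

#else

/* Stand-in for a private allocator: a one-slot cache in front of malloc.
 * Deliberately not thread-safe; it only illustrates the #ifdef layout. */
#define OBJ_SIZE 64             /* assumed fixed object size */
static void *cached_obj;

static void *obj_alloc(size_t size)
{
    assert(size <= OBJ_SIZE);
    if (cached_obj != NULL) {   /* reuse the cached object if there is one */
        void *p = cached_obj;
        cached_obj = NULL;
        return p;
    }
    return malloc(OBJ_SIZE);    /* may return NULL; caller must check */
}

static void obj_free(void *p)
{
    if (p == NULL)
        return;
    if (cached_obj == NULL)
        cached_obj = p;         /* keep one object around for reuse */
    else
        free(p);
}

#endif
```

Callers only ever see obj_alloc()/obj_free(), so switching between the
two builds needs no source changes elsewhere.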

The new entry_free() plus tcmalloc may be better than plain tcmalloc,
I don't know.  It retains the global mutex though, which presumably is
or someday will be a pessimization compared to _some_ malloc out there.

> I.e., use per-thread cached free 
> lists. We maintain some small number of free objects per thread; this 
> per-thread free list can be used without locking. When the number of free 
> objects on a given thread exceeds a particular threshold

...or there is no thread key for the mutex (e.g. when the current
thread is not from the thread pool)...

Might be convenient to let slapd register init-thread and cleanup-thread
functions in the thread pool.  These could create/destroy these mutexes,
and maybe some other per-thread slapd variables too.

(Preferably the init function would be able to fail and cause the pool
thread to die, but that'd mess up the pool logic which assumes once a
thread has been created it will be able to handle submitted tasks.
Except slapd often doesn't check for malloc/mutex_init success anyway,
so demanding success would be no worse than what slapd does now.)
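
Roughly what I have in mind, as a sketch only.  The pool_hooks struct and
every function name below are hypothetical, not the existing thread pool
API, and run_tasks is just a stand-in for the pool's task loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-thread hooks a pool could let slapd register. */
typedef struct pool_hooks {
    int  (*init)(void **ctx);       /* returns 0 on success */
    void (*cleanup)(void *ctx);
    void (*run_tasks)(void *ctx);   /* stand-in for the pool's task loop */
} pool_hooks;

/* Same shape as a pthread start routine: void *(*)(void *). */
static void *pool_thread_main(void *arg)
{
    pool_hooks *h = arg;
    void *ctx = NULL;

    assert(h != NULL && h->run_tasks != NULL);
    if (h->init != NULL && h->init(&ctx) != 0)
        return NULL;                /* init failed: exit before taking tasks */
    h->run_tasks(ctx);
    if (h->cleanup != NULL)
        h->cleanup(ctx);
    return NULL;
}

/* Demo hooks, only here to exercise the wrapper. */
static int live_contexts;
static int tasks_run;

static int demo_init(void **ctx)
{
    static int slot;                /* pretend per-thread state */
    *ctx = &slot;
    live_contexts++;
    return 0;
}

static void demo_cleanup(void *ctx) { (void)ctx; live_contexts--; }
static void demo_run(void *ctx)     { if (ctx != NULL) tasks_run++; }
static int  failing_init(void **ctx) { (void)ctx; return -1; }
```

With failing_init registered, the thread returns without ever entering
the task loop, which is exactly the case the pool logic would have to
learn to cope with.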

> then we obtain the 
> global lock to return some number of objects to the global list.
> In practice this threshold can be very small - any given thread typically 
> needs no more than 4 entries at a time. (ModDN is the worst case at 3 entries 
> locked at once. LDAP TXNs would distort this figure but not in any critical 
> fashion.) For attributes the typical usage is much more variable, but any 
> number we pick will be an improvement over the current code.

Add a few more for overlays, in particular syncrepl.  Otherwise even a
single overlay doing entry_dup() can push a thread over the threshold
and back onto the global lock.
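
For reference, the per-thread scheme described in the quoted text could
look something like the sketch below.  It is not the actual slapd code:
the obj type, obj_get/obj_put, LOCAL_MAX and RETURN_BATCH are invented,
the numbers are placeholders, and it uses C11 _Thread_local rather than
thread pool keys to keep the example short.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

#define LOCAL_MAX    8  /* hypothetical per-thread threshold */
#define RETURN_BATCH 4  /* objects handed back when over the threshold */

typedef struct obj {
    struct obj *next;
    char payload[56];
} obj;

/* Global free list, protected by global_lock. */
static obj *global_free;
static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;

/* Per-thread free list, used without locking. */
static _Thread_local obj *local_free;
static _Thread_local int local_count;

static obj *obj_get(void)
{
    obj *o = local_free;
    if (o != NULL) {                    /* fast path: no lock taken */
        local_free = o->next;
        local_count--;
        return o;
    }
    pthread_mutex_lock(&global_lock);   /* slow path: try the global list */
    o = global_free;
    if (o != NULL)
        global_free = o->next;
    pthread_mutex_unlock(&global_lock);
    if (o == NULL)
        o = malloc(sizeof *o);          /* may be NULL; caller must check */
    return o;
}

static void obj_put(obj *o)
{
    assert(o != NULL);
    o->next = local_free;
    local_free = o;
    if (++local_count <= LOCAL_MAX)
        return;                         /* still under the threshold */

    /* Over the threshold: detach a batch from the local list first,
     * then splice it onto the global list under the lock. */
    obj *head = local_free, *tail = head;
    for (int i = 1; i < RETURN_BATCH; i++)
        tail = tail->next;
    local_free = tail->next;
    local_count -= RETURN_BATCH;

    pthread_mutex_lock(&global_lock);
    tail->next = global_free;
    global_free = head;
    pthread_mutex_unlock(&global_lock);
}
```

The lock is held only for the splice, and only once per RETURN_BATCH
frees past the threshold.  A thread with no usable per-thread state
would have to fall back to the global list for every call.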