
RE: Thread-local malloc discussion summary



> -----Original Message-----
> From: owner-openldap-devel@OpenLDAP.org
> [mailto:owner-openldap-devel@OpenLDAP.org]On Behalf Of Matthew Backes

> On Wednesday, April 9, 2003, at 04:01 PM, Howard Chu wrote:
> > I had hoped that search could be adapted without impacting a lot of
> > other code, but it dragged out to more than I expected. The current
> > arrangement is troublesome due to the need to keep global heap allocs
> > separate from thread-local allocs. We can allow some sloppiness here
> > by changing the sl_realloc and sl_free functions to simply pass
> > through to ch_realloc and ch_free if the passed-in pointer does not
> > reside in the thread-local space.
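
[Editorial sketch] The pass-through idea quoted above could look roughly like
the following. This is a minimal illustration, not the slapd source: the
tl_heap type, the tl_* names, and the fixed 1 KB buffer are all hypothetical,
and plain realloc()/free() stand in for ch_realloc/ch_free. The address-range
test is the only part taken from the message.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TL_SIZE 1024  /* hypothetical size of the thread-local heap */

typedef struct tl_heap {
    alignas(max_align_t) unsigned char buf[TL_SIZE];
    size_t used;      /* offset of the first unused byte */
} tl_heap;

/* True if p points inside this heap's buffer. */
static int tl_owns(const tl_heap *h, const void *p) {
    uintptr_t addr = (uintptr_t)p, base = (uintptr_t)h->buf;
    return addr >= base && addr - base < TL_SIZE;
}

/* Bump allocation; zero-size requests take one unit so that every
 * returned pointer lies strictly inside the buffer. */
static void *tl_alloc(tl_heap *h, size_t n) {
    size_t align = alignof(max_align_t);
    if (n == 0) n = 1;
    if (n > TL_SIZE) return NULL;
    n = (n + align - 1) & ~(align - 1);
    if (n > TL_SIZE - h->used) return NULL;
    void *p = h->buf + h->used;
    h->used += n;
    return p;
}

/* Heap pointers are never reclaimed; anything else goes to free(). */
static void tl_free(tl_heap *h, void *p) {
    if (p == NULL || tl_owns(h, p)) return;
    free(p);  /* stand-in for ch_free */
}

static void *tl_realloc(tl_heap *h, void *p, size_t n) {
    assert(h != NULL);
    if (p == NULL) return tl_alloc(h, n);
    if (!tl_owns(h, p)) return realloc(p, n);  /* stand-in for ch_realloc */
    /* The old size is not recorded, so copy at most the bytes between
     * p and the current top of the heap; that range covers the old block. */
    size_t avail = (size_t)(h->buf + h->used - (unsigned char *)p);
    void *q = tl_alloc(h, n);
    if (q != NULL) memcpy(q, p, n < avail ? n : avail);
    return q;
}
```

With this arrangement a caller does not need to know which allocator produced
a pointer before resizing or freeing it, which is the "sloppiness" being
allowed.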

> Does it make sense to start separating allocations into separate
> types so that they might be (eventually) implemented differently
> on different platforms? Some allocations need to be global, some
> can be within a thread,

Yes, that is the current approach.

> and many could probably benefit from stack allocation, implemented
> using #defines, for example. Environments with a working alloca()
> could then use it (with a no-op for the free). Environments without
> it would fall back to malloc() or a thread-local system and still
> have the appropriate free() call when needed.

> I think you've mentioned that getting autoconf to reliably state
> whether alloca is usable is tricky.  This shouldn't be a problem
> as we can just enable it for specific architectures that are known
> to support it.
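
[Editorial sketch] The #define approach suggested above might be written as
below. The USE_ALLOCA switch and the TMP_ALLOC/TMP_FREE macro names are
hypothetical, as is the is_palindrome() example function; <alloca.h> is a
common but non-standard header, so it is only included when the switch is on.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#ifdef USE_ALLOCA              /* hypothetical per-platform switch */
#include <alloca.h>
#define TMP_ALLOC(n) alloca(n)
#define TMP_FREE(p)  ((void)(p))   /* stack memory: free is a no-op */
#else
#define TMP_ALLOC(n) malloc(n)
#define TMP_FREE(p)  free(p)
#endif

/* Example caller: needs a scratch copy only for the duration of the call. */
static int is_palindrome(const char *s) {
    assert(s != NULL);
    size_t len = strlen(s);
    char *rev = TMP_ALLOC(len + 1);
    if (rev == NULL) return 0;     /* only reachable on the malloc path */
    for (size_t i = 0; i < len; i++)
        rev[i] = s[len - 1 - i];
    rev[len] = '\0';
    int same = strcmp(s, rev) == 0;
    TMP_FREE(rev);                 /* must still be written at every exit */
    return same;
}
```

Note that the caller keeps its TMP_FREE call either way, so the same source
works on platforms with and without alloca().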

I'm not fond of embedding OS-specific details into the code... The current
approach has taken some effort to merge into the source, but it is both
brain-dead simple and extremely fast, requiring no OS-specific support. It
can still stand a bit of streamlining, but for search operations my last
checkin completely eliminates the malloc bottleneck. I'll adapt the other
operations later, as time permits.

alloca can be very fast, but is not necessarily free. The compiler has to
insert code to track the bottom of the stack frame if the function is not a
leaf function, and this imposes some overhead...

The sl_malloc/sl_free scheme is, as I said, totally brain-dead: sl_malloc
hands out memory from a thread-specific heap, and sl_free is a no-op. When
the running operation finishes, the heap is reset and ready for the next op.
For most operations, this will be sufficient. One problem with this scheme
is that a single operation that does a lot of mallocs and frees will quickly
consume the thread-specific heap, since freed memory is not reclaimed (e.g.,
a search operation that returns multiple entries). For this situation, there
is also sl_mark/sl_release: sl_mark checkpoints the heap and sl_release
returns it to that checkpoint later. This is used inside
send_search_entry() to prevent its temporary mallocs from accumulating.
Again, incredibly simple/stupid, but very, very fast.
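
[Editorial sketch] The scheme described above can be illustrated with a small
bump allocator. This is not the slapd implementation: the arena type, the
arena_* names, and the fixed 4 KB buffer are assumptions made for the example,
and a real version would keep one such heap per thread.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define ARENA_SIZE 4096  /* hypothetical heap size */

typedef struct arena {
    alignas(max_align_t) unsigned char buf[ARENA_SIZE];
    size_t used;         /* offset of the first unused byte */
} arena;

/* Called when an operation finishes: the whole heap becomes free again. */
static void arena_reset(arena *a) { a->used = 0; }

/* Hand out the next aligned chunk, or NULL if the heap is exhausted. */
static void *arena_alloc(arena *a, size_t n) {
    size_t align = alignof(max_align_t);
    if (n > ARENA_SIZE) return NULL;          /* also rules out overflow below */
    n = (n + align - 1) & ~(align - 1);
    if (n > ARENA_SIZE - a->used) return NULL;
    void *p = a->buf + a->used;
    a->used += n;
    return p;
}

/* Freeing is deliberately a no-op. */
static void arena_free(arena *a, void *p) { (void)a; (void)p; }

/* Checkpoint the heap ... */
static size_t arena_mark(const arena *a) { return a->used; }

/* ... and later drop everything allocated since the checkpoint. */
static void arena_release(arena *a, size_t mark) {
    assert(mark <= a->used);
    a->used = mark;
}
```

A loop that sends many entries would take a mark before building each entry
and release it afterwards, so the temporary allocations for one entry are
reused by the next instead of accumulating.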

Another possible refinement that wouldn't cost too much is to have freed
memory reclaimed if and only if the freed region is the last region
allocated. It's not crucial most of the time, but it might give an added
measure of flexibility without hindering performance. Many of the functions
are already coded with cleanups that free in reverse order of allocation,
so they would benefit from this directly.
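
[Editorial sketch] One way this refinement might work is to store each
block's length in a small header, so that free can tell whether the block
ends exactly at the top of the heap. The lifo_heap type, the lifo_* names,
and the one-alignment-unit header are assumptions for illustration only; the
message does not say how the region size would be tracked.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <string.h>

#define LIFO_SIZE  4096                    /* hypothetical heap size */
#define LIFO_ALIGN alignof(max_align_t)    /* header occupies one unit */

_Static_assert(alignof(max_align_t) >= sizeof(size_t),
               "header unit must be able to hold a size_t");

typedef struct lifo_heap {
    alignas(max_align_t) unsigned char buf[LIFO_SIZE];
    size_t used;                           /* offset of the first unused byte */
} lifo_heap;

static void *lifo_alloc(lifo_heap *h, size_t n) {
    if (n > LIFO_SIZE) return NULL;
    size_t body = (n + LIFO_ALIGN - 1) & ~(LIFO_ALIGN - 1);
    size_t total = LIFO_ALIGN + body;
    if (total > LIFO_SIZE - h->used) return NULL;
    unsigned char *hdr = h->buf + h->used;
    memcpy(hdr, &body, sizeof body);       /* remember the block length */
    h->used += total;
    return hdr + LIFO_ALIGN;
}

/* Reclaim only when p is the most recently allocated live region;
 * any other free remains a no-op and leaves a hole until reset. */
static void lifo_free(lifo_heap *h, void *p) {
    if (p == NULL) return;
    unsigned char *body = p;
    size_t len;
    assert(body >= h->buf + LIFO_ALIGN && body <= h->buf + h->used);
    memcpy(&len, body - LIFO_ALIGN, sizeof len);
    if (body + len == h->buf + h->used)
        h->used = (size_t)(body - LIFO_ALIGN - h->buf);
}
```

Cleanup code that frees in exact reverse order of allocation gets all of its
memory back under this rule, while out-of-order frees cost only the one
comparison.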

  -- Howard Chu
  Chief Architect, Symas Corp.       Director, Highland Sun
  http://www.symas.com               http://highlandsun.com/hyc
  Symas: Premier OpenSource Development and Support