Re: better malloc strategies?
Howard Chu wrote:
This approach helped a fair amount. Pre-allocating large chunks of
memory to divvy up into the Entry and Attribute lists eliminates the
per-alloc malloc library overhead for these structures. Since glibc's
malloc performance decreases as the number of allocated objects
increases, this turns out to be an important win. But over the course of
hundreds of runs, the slapd process size continues to grow. Of course
things as innocuous as syslog() also contribute to the problem, as they
malloc stdio buffers for formatting their messages.
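The chunked pre-allocation described above can be sketched roughly like this. This is a minimal illustration, not the actual slapd code: the "Entry" struct and all function names here are stand-ins. The idea is simply to pay for one large malloc and carve fixed-size objects out of it, so individual allocations skip the malloc library entirely:

```c
#include <stdlib.h>

/* Stand-in for slapd's Entry structure (hypothetical layout). */
typedef struct Entry { int id; char pad[60]; } Entry;

/* One big allocation, carved into fixed-size slots on demand. */
typedef struct Chunk {
    Entry *slots;   /* the single large malloc'd block */
    size_t next;    /* index of the next unused slot */
    size_t count;   /* total slots in this chunk */
} Chunk;

static int chunk_init(Chunk *c, size_t count)
{
    c->slots = malloc(count * sizeof(Entry));
    if (c->slots == NULL)
        return -1;
    c->next = 0;
    c->count = count;
    return 0;
}

/* Hand out the next slot without touching malloc; NULL when exhausted. */
static Entry *chunk_alloc(Chunk *c)
{
    if (c->next == c->count)
        return NULL;
    return &c->slots[c->next++];
}

/* Release the whole chunk at once. */
static void chunk_fini(Chunk *c)
{
    free(c->slots);
}
```

The per-object cost collapses to a bounds check and an index increment, and the whole batch is freed in one call.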
Running some other benchmarks today I saw something rather odd -
after slapd had been running for a while, and the entry cache had
been churned around, slapd would sit at 100% CPU on a new incoming
connection for several minutes, inside the malloc() function, before
eventually progressing. At a guess, memory has become badly
fragmented due to so many entries being added and freed from the
entry cache, and allocating small blocks (only 18 bytes, for the
connection peername in this case) gets to be a real problem.

As a first cut, I plan to recycle Entry and Attribute structures on
our own free lists. That ought to reduce some of the general malloc
contention, as well as some degree of the churn. Will be testing this
in the next few days.
I've played with libhoard in the past and gotten mixed results. I
wonder if this is just a particularly bad version of glibc, or
something we really have to worry about. (RHEL4 based system, kernel
2.6.9-22, glibc 2.3.4, AMD64 machine, 6GB of RAM free out of 32GB at
One downside is that right now it's a very simple-minded list with a
single mutex protecting the list head. So while malloc may have some
measure of thread scalability, this approach doesn't really. I guess the
saving grace here is that allocs and frees are extremely simple, so the
locks won't be held for long.
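The single-mutex free list amounts to something like the following sketch. Again this is illustrative, not the actual patch; names are hypothetical, and a real version would want one list per object type (Entry, Attribute):

```c
#include <pthread.h>
#include <stdlib.h>

/* Freed objects are overlaid with a link pointer and pushed onto a
 * singly linked list; one mutex protects the list head. */
typedef struct FreeNode { struct FreeNode *next; } FreeNode;

static FreeNode *free_head = NULL;
static pthread_mutex_t free_lock = PTHREAD_MUTEX_INITIALIZER;

#define OBJ_SIZE 64     /* must be >= sizeof(FreeNode) */

void *obj_alloc(void)
{
    FreeNode *n;
    pthread_mutex_lock(&free_lock);
    n = free_head;
    if (n != NULL)
        free_head = n->next;    /* pop a recycled object */
    pthread_mutex_unlock(&free_lock);
    if (n == NULL)
        n = malloc(OBJ_SIZE);   /* list empty: fall back to malloc */
    return n;
}

void obj_free(void *p)
{
    FreeNode *n = p;
    pthread_mutex_lock(&free_lock);
    n->next = free_head;        /* push onto the free list */
    free_head = n;
    pthread_mutex_unlock(&free_lock);
}
```

The critical section is just a pointer swap, which is why the single lock is tolerable for now; per-thread lists or magazines would be the obvious next step if it ever shows up in profiles.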
The simplicity of the code has helped boost performance a few percent.
It remains to be seen whether this will scale beyond a few CPUs.
Another alternative that looks very promising is to use Sun's libumem,
which has been ported to Linux and Windows here
http://sourceforge.net/projects/umem/ . Unfortunately the code there is
not packaged and ready-to-use. It has some autoconf machinery but none
of it bootstraps cleanly; it takes a lot of manual intervention to even
get automake thru it. But the fair amount of hacking that's required
appears to be worth it; the library seems to suffer no degradation thru
continuous querying over long periods of time. The downside is that it
relies on so many deep-system and CPU-dependent features that porting
to anything non-x86 will be a pain.
Comparing what the authors have accomplished here with the goals Jong
had for zone-malloc, it's very tempting to think about adopting the
library and using the umem-specific APIs for managing our object caches.
But given the porting issues I guess it's not realistic to consider that
any time soon.
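For the record, the umem-specific object-cache API in question looks roughly like this. The sketch below assumes the SourceForge libumem port is built and linked (it is not a stock system library), and "Entry" is again a stand-in struct rather than slapd's real one:

```c
#include <umem.h>   /* from the libumem port, not a standard header */

/* Hypothetical stand-in for slapd's Entry structure. */
typedef struct Entry { int id; char pad[60]; } Entry;

int main(void)
{
    umem_cache_t *entry_cache;

    /* One cache per object type; umem manages slabs and per-CPU
     * magazines internally, sidestepping a single global lock. */
    entry_cache = umem_cache_create("entry_cache", sizeof(Entry),
        0,                  /* default alignment */
        NULL, NULL, NULL,   /* no constructor/destructor/reclaim */
        NULL, NULL, 0);
    if (entry_cache == NULL)
        return 1;

    Entry *e = umem_cache_alloc(entry_cache, UMEM_DEFAULT);
    if (e != NULL)
        umem_cache_free(entry_cache, e);

    umem_cache_destroy(entry_cache);
    return 0;
}
```

This maps fairly directly onto what zone-malloc was after: typed caches with object reuse, without hand-rolling the slab machinery ourselves.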
-- Howard Chu
Chief Architect, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc
OpenLDAP Core Team http://www.openldap.org/project/