Re: better malloc strategies
David Boreham wrote:
Howard Chu wrote:
Kinda interesting - with hoard this shows us processing 23000 entries
per second in the single-threaded case, vs only 3521 per second with glibc.
It is possible that you are seeing the syndrome that I wrote about here:
AFAIK the poorly behaving code is still present in today's glibc malloc.
(the problem affects UMich-derived LDAP servers because entries in
the cache comprise a significant proportion of the heap traffic, and they
tend to get allocated by a different thread than frees them when the
cache fills up). malloc exhibits lock contention when blocks are freed
by a different thread than the one that allocated them (more precisely,
when the two threads hash to different arenas).
Yes, that's a factor I had considered. One of the approaches I had
tested was tagging each entry with the threadID that allocated it, and
deferring the frees until the owning thread comes back to dispose of
them. I wasn't really happy with this approach because it meant the
entry cache size would not be strictly regulated, though on a busy
server it's likely that all of the frees would happen in reasonably
short order.
Still, free() contention alone doesn't explain the huge performance gap
in the single-threaded case. However, judging from the fact that the
very first glibc search often runs faster than the second and subsequent
ones, I'd say that there are other significant costs in glibc free().
The most likely cause is that glibc returns memory back to the OS too
frequently. Hoard and Umem also return memory to the OS, but they keep
more extensive free lists. Tcmalloc never returns memory back to the
OS.
What's also interesting is that for Hoard, Umem, and Tcmalloc, the
multi-threaded query times are consistently about 2x slower than the
single-threaded case. The 2x slowdown makes sense: it's only a
dual-core CPU doing 4x as much work, i.e. 2x as much per core. This
kinda says that the cost of malloc is overshadowed by the overhead of
thread scheduling.
But for glibc the multi-threaded times are actually faster than the
single-threaded case. Since glibc's costs are being partially hidden it
seems that the multiple threads are actually getting some benefit from
the entry cache here. Still, the difference between glibc and the other
allocators is so dramatic that this small effect is just academic.
-- Howard Chu
Chief Architect, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc
OpenLDAP Core Team http://www.openldap.org/project/