
re: long running threads (was ITS#3404)



Actually the syncrepl provider does not occupy any thread for a long period of time. The biggest problem was its use of sl_realloc in building syncUUID sets for large numbers of UUIDs, which overran the thread-specific malloc pool for a particular thread due to this bug. But restructuring that code to allocate a single reusable buffer, instead of reallocating and freeing it over and over, avoids the issue. Even without the stack-based sl_malloc design constraints, it's always better to allocate a buffer once rather than realloc/free it repeatedly.
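To illustrate the "allocate once, reuse" pattern, here is a minimal sketch in plain C. It is not the actual syncrepl code: the uuid_buf type and its functions are hypothetical names invented for this example, it uses the standard realloc rather than sl_realloc, and the 16-byte UUID size is an assumption. The point is only that the allocator is called once up front and each batch resets a counter instead of freeing and reallocating.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define UUID_LEN 16   /* assumed size of a binary UUID */

/* Hypothetical scratch buffer: allocated once, reused for every batch. */
typedef struct uuid_buf {
    unsigned char *data;    /* capacity * UUID_LEN bytes */
    size_t capacity;        /* number of UUIDs the buffer can hold */
    size_t count;           /* number of UUIDs currently stored */
    size_t allocations;     /* how many times the allocator was called */
} uuid_buf;

/* Ensure room for at least `need` UUIDs; grows only when required. */
int uuid_buf_reserve(uuid_buf *b, size_t need)
{
    unsigned char *p;

    if (need <= b->capacity)
        return 0;
    p = realloc(b->data, need * UUID_LEN);
    if (p == NULL)
        return -1;
    b->data = p;
    b->capacity = need;
    b->allocations++;
    return 0;
}

/* Start a new batch: drop the old contents but keep the memory. */
void uuid_buf_reset(uuid_buf *b)
{
    b->count = 0;
}

/* Append one UUID; fails if the reserved capacity is exhausted. */
int uuid_buf_add(uuid_buf *b, const unsigned char *uuid)
{
    if (b->count == b->capacity)
        return -1;
    memcpy(b->data + b->count * UUID_LEN, uuid, UUID_LEN);
    b->count++;
    return 0;
}

void uuid_buf_free(uuid_buf *b)
{
    free(b->data);
    memset(b, 0, sizeof(*b));
}

/* Fill `batches` batches of `per_batch` UUIDs each and report how many
 * times the allocator was actually called. */
size_t demo_reuse(size_t batches, size_t per_batch)
{
    uuid_buf b = {0};
    unsigned char uuid[UUID_LEN] = {0};
    size_t i, j, allocs;

    if (uuid_buf_reserve(&b, per_batch) != 0)
        return 0;
    for (i = 0; i < batches; i++) {
        uuid_buf_reset(&b);
        for (j = 0; j < per_batch; j++) {
            int rc;
            uuid[0] = (unsigned char)j;   /* dummy payload */
            rc = uuid_buf_add(&b, uuid);
            assert(rc == 0);
            (void)rc;
        }
    }
    allocs = b.allocations;
    uuid_buf_free(&b);
    return allocs;
}
```

In this sketch, demo_reuse(100, 1000) touches the allocator once for all 100 batches. With a stack-style allocator that matters even more, since a block that is repeatedly reallocated may not be reclaimable until the whole slab is reset.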

Likewise the syncrepl consumer isn't really a long running thread; it always gets a completely reset slab when it gets control. Of course, when it receives a lot of updates its individual operations aren't getting a fresh slab as normal operations would. In this case just wrapping the syncrepl operations in sl_mark/sl_release to reset the slab for each operation would avoid most problems there. We removed the mark/release functions a long time ago, but I think there are still situations (like the consumer) where they're useful.
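As a rough picture of what mark/release buys in that situation, here is a toy stack-style arena in C. It is only a sketch: the arena type, arena_mark, arena_release and the 1024-byte per-operation scratch size are all made up for illustration and are not the OpenLDAP sl_mark/sl_release interface. It shows a loop of operations exhausting a fixed slab when nothing is released, and the same loop succeeding when each operation is wrapped in a mark/release pair.

```c
#include <assert.h>
#include <stddef.h>

#define ARENA_SIZE 4096   /* arbitrary slab size for the example */

/* Toy stack allocator: memory is handed out by bumping `top`. */
typedef struct arena {
    size_t top;                                  /* offset of first free byte */
    _Alignas(16) unsigned char mem[ARENA_SIZE];  /* the slab itself (C11) */
} arena;

/* Allocate n bytes, rounded up to 16; NULL when the slab is exhausted. */
void *arena_alloc(arena *a, size_t n)
{
    size_t aligned = (n + 15) & ~(size_t)15;
    void *p;

    if (aligned < n || aligned > ARENA_SIZE - a->top)
        return NULL;
    p = a->mem + a->top;
    a->top += aligned;
    return p;
}

/* Remember the current top of the stack. */
size_t arena_mark(const arena *a)
{
    return a->top;
}

/* Discard everything allocated since the matching arena_mark(). */
void arena_release(arena *a, size_t mark)
{
    assert(mark <= a->top);
    a->top = mark;
}

/* Stand-in for one consumer operation that needs 1 KiB of scratch space. */
int handle_update(arena *a)
{
    return arena_alloc(a, 1024) != NULL ? 0 : -1;
}

/* Run n operations on the same slab and return how many failed for lack
 * of memory. With use_mark set, each operation is wrapped in mark/release. */
int run_updates(arena *a, int n, int use_mark)
{
    int i, failures = 0;

    for (i = 0; i < n; i++) {
        size_t m = arena_mark(a);
        if (handle_update(a) != 0)
            failures++;
        if (use_mark)
            arena_release(a, m);
    }
    return failures;
}
```

With a 4096-byte slab, ten 1 KiB operations without release leave six of them with no memory, while the wrapped version completes all ten and leaves the slab empty afterwards.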

Jong-Hyuk wrote:

The syncrepl provider can be considered the most common trigger of the
slab memory allocator bug simply because it is a long-running thread. Any
such thread could become unstable because of it. With the new buddy slab
allocator, it is quite certain that the problem no longer exists. If
you're interested, it would be very helpful to try the new buddy allocator
together with the new syncrepl provider, since it has not yet gone through
serious tests.
- Jong-Hyuk

----- Original Message ----- From: <hyc@symas.com>
To: <openldap-its@OpenLDAP.org>
Sent: Wednesday, December 08, 2004 12:04 PM
Subject: Re: (ITS#3404) sockber stack SEGVs





Also I should note that I've rewritten the syncrepl provider in CVS
HEAD. The new implementation will be part of OpenLDAP 2.3. The
implementation for OpenLDAP 2.2 is being deleted from CVS HEAD. This is
relevant because the syncrepl provider seemed to be the most common
cause of triggering this bug. If you're interested, it would be good to
give the new syncrepl provider code in HEAD some in-depth testing.





--
 -- Howard Chu
 Chief Architect, Symas Corp.       Director, Highland Sun
 http://www.symas.com               http://highlandsun.com/hyc
 Symas: Premier OpenSource Development and Support