
Re: (ITS#6056) Samba4 breaks OpenLDAP over ldapi

Howard Chu wrote:
> Andrew Bartlett wrote:

>> I agree, we might be jumping at different shadows here, but your patches
>> did fix something...
> I see what you're describing now, with the kvm set with 2 CPUs. It appears to
> be a bug caused by the recent patch for connection_hangup() processing.
> Running slapd with -d15 in your test shows that a connection is closed shortly
> after being established and becoming readable. The bug is (probably) that we
> queued the reader but processed the hangup immediately, thus closing the
> connection before the reader executes. I'm not exactly sure why this is
> causing the problem on your test, since it looks like your client is closing
> the socket before waiting for the reply. But certainly this is the right area.

With the connection problem out of the way, the remaining slapd crash appeared
to be due to some kind of heap corruption, but the usual suite of tools
(valgrind, efence, LBER_MEMORY_DEBUG, etc.) never reproduced the problem.
Suspecting a stack overwrite, I recompiled libldap_r with a 12MB thread stack
size instead of the default 8MB, but the outcome didn't change. Assuming instead
that some other race condition was involved, I limited slapd to 2 threads (down
from the default of 16) to narrow the possibilities. This made the crash occur
much more quickly, and with libumem it became obvious that refint was accessing
already-freed memory.

It turns out that the patch for rev 1.41 (ITS#5428), which made refint a
global overlay, was incorrect. It moved the loop that processed refint's work
queue into a new function so that it could be called multiple times, but left
that code essentially as-is, so it still freed the queue as it walked it. That
was only safe in the original code, where the queue was walked exactly once.
Part of the reason the bug was so hard to reproduce is that it tended to show
up only when the queue got fairly long, which happened only when slapd was
heavily loaded. (That is why forcing only 2 threads made it occur sooner.)

   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/