
Re: slurpd lockups, serialization of updates (ITS#3123)




--On Thursday, April 29, 2004 7:22 PM +0000 openldap-its@OpenLDAP.org wrote:

I've done more tests on this problem.

1) Our 2.2.6 production servers use heimdal-0.6 and cyrus-sasl 2.1.17.  Our 
test servers were using heimdal-0.6.1 and cyrus-sasl 2.1.18.  So I rebuilt 
2.2.11 against the older versions of Heimdal and Cyrus SASL, and redeployed 
them onto the test servers.  I got the same results: about half the time 
I start slurpd, one replica is not updated when changes come in.

2) I rebuilt 2.2.11 slapd/slurpd linked against the 2.2.6 libraries:

ldd slurpd
        libldap_r.so.199 =>      /usr/local/lib/libldap_r.so.199
        libsasl2.so.2 =>         /usr/local/lib/libsasl2.so.2
        libssl.so.0.9.7 =>       /usr/local/lib/libssl.so.0.9.7
        libcrypto.so.0.9.7 =>    /usr/local/lib/libcrypto.so.0.9.7
        libresolv.so.2 =>        /usr/lib/libresolv.so.2
        libgen.so.1 =>   /usr/lib/libgen.so.1
        libnsl.so.1 =>   /usr/lib/libnsl.so.1
        libsocket.so.1 =>        /usr/lib/libsocket.so.1
        libpthread.so.1 =>       /usr/lib/libpthread.so.1
        libc.so.1 =>     /usr/lib/libc.so.1
        liblber.so.199 =>        /usr/local/lib/liblber.so.199
        libgcc_s.so.1 =>         /usr/local/lib/libgcc_s.so.1
        libdl.so.1 =>    /usr/lib/libdl.so.1
        libmp.so.2 =>    /usr/lib/libmp.so.2
        libthread.so.1 =>        /usr/lib/libthread.so.1
        /usr/platform/SUNW,Ultra-80/lib/libc_psr.so.1

I still saw the same behavior.

One other new behavior I noticed tonight is that once all the replicas 
are updated, slurpd exits.  Sometimes it exits cleanly (no slurpd.pid 
file left behind), sometimes uncleanly.  In neither case was a stop 
signal ever sent to slurpd.



On the replica that is not replicated to, I see:

May 15 23:03:32 ldap-test2.Stanford.EDU slapd[13846]: [ID 848112 local4.debug] conn=1159 fd=10 ACCEPT from IP=171.67.16.99:36308 (IP=0.0.0.0:389)
May 15 23:03:32 ldap-test2.Stanford.EDU slapd[13846]: [ID 952275 local4.debug] conn=1159 fd=10 closed

If the problem were a timeout (i.e., the available mechanisms were not 
sent back fast enough), I would not expect the timeout to be hit in under 
1 second.  Here the ACCEPT and the close are logged within the same 
second, so it looks like the master is closing the connection almost 
immediately.
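
For anyone who wants to look for the same pattern in a larger log, here is 
a minimal sketch that lists connections which were accepted and then closed 
without any operation being logged in between.  It assumes each syslog 
entry sits on a single line and that operations are logged as 
"conn=N op=M ..." (the usual slapd stats format).  The function name and 
the sample list are hypothetical, not part of OpenLDAP.

```python
import re

# Patterns for the three kinds of slapd stats lines this sketch cares about.
ACCEPT = re.compile(r"conn=(\d+) fd=\d+ ACCEPT")
OP = re.compile(r"conn=(\d+) op=\d+")
CLOSED = re.compile(r"conn=(\d+) fd=\d+ closed")


def find_silent_connections(lines):
    """Return conn ids that were accepted and closed with no operation logged."""
    accepted = set()   # connections seen in an ACCEPT line and not yet closed
    had_op = set()     # connections that logged at least one operation
    silent = []
    for line in lines:
        m = ACCEPT.search(line)
        if m:
            accepted.add(m.group(1))
            continue
        m = OP.search(line)
        if m:
            had_op.add(m.group(1))
            continue
        m = CLOSED.search(line)
        if m:
            conn = m.group(1)
            if conn in accepted and conn not in had_op:
                silent.append(conn)
            # Forget the id so a reused connection number starts fresh.
            accepted.discard(conn)
            had_op.discard(conn)
    return silent


# The two log entries from the replica, joined onto single lines.
sample = [
    "May 15 23:03:32 ldap-test2.Stanford.EDU slapd[13846]: [ID 848112 "
    "local4.debug] conn=1159 fd=10 ACCEPT from IP=171.67.16.99:36308 "
    "(IP=0.0.0.0:389)",
    "May 15 23:03:32 ldap-test2.Stanford.EDU slapd[13846]: [ID 952275 "
    "local4.debug] conn=1159 fd=10 closed",
]

print(find_silent_connections(sample))
```

On the two entries above this reports conn 1159, i.e. the connection was 
dropped before any bind was logged.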

Note that this connection was made 3 seconds after slurpd started:

    root 24835     1  0 23:03:29 ?        0:00 /usr/local/lib/slurpd -t /var/tmp


Here is how the replicas are defined in slapd.conf:

replica         host=ldap-test3.stanford.edu:389
                tls=yes bindmethod=sasl
                binddn=cn=replicator,cn=service,cn=applications,dc=stanford,dc=edu
                saslmech=gssapi

replica         host=ldap-test2.stanford.edu:389
                tls=yes bindmethod=sasl
                binddn=cn=replicator,cn=service,cn=applications,dc=stanford,dc=edu
                saslmech=gssapi

replica         host=ldap-test1.stanford.edu:389
                tls=yes bindmethod=sasl
                binddn=cn=replicator,cn=service,cn=applications,dc=stanford,dc=edu
                saslmech=gssapi

replogfile      /var/log/replog
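
To rule out a difference between the three definitions, a rough sketch 
like the following can compare them.  It assumes the usual slapd.conf rule 
that a line starting with whitespace continues the previous line, and it 
does not handle quoted values containing spaces.  The function name and 
the embedded sample text are hypothetical.

```python
def parse_replicas(conf_text):
    """Return one dict of key=value settings per 'replica' directive."""
    logical = []
    for raw in conf_text.splitlines():
        # Skip blank lines and comment lines.
        if not raw.strip() or raw.startswith("#"):
            continue
        if raw[0].isspace() and logical:
            # Leading whitespace: continuation of the previous directive.
            logical[-1] += " " + raw.strip()
        else:
            logical.append(raw.strip())

    replicas = []
    for line in logical:
        parts = line.split()
        if parts[0].lower() == "replica":
            # Split on the first '=' only, so DNs keep their own '=' signs.
            replicas.append(dict(p.split("=", 1) for p in parts[1:] if "=" in p))
    return replicas


SAMPLE_CONF = """\
replica         host=ldap-test3.stanford.edu:389
                tls=yes bindmethod=sasl
                binddn=cn=replicator,cn=service,cn=applications,dc=stanford,dc=edu
                saslmech=gssapi

replica         host=ldap-test2.stanford.edu:389
                tls=yes bindmethod=sasl
                binddn=cn=replicator,cn=service,cn=applications,dc=stanford,dc=edu
                saslmech=gssapi

replica         host=ldap-test1.stanford.edu:389
                tls=yes bindmethod=sasl
                binddn=cn=replicator,cn=service,cn=applications,dc=stanford,dc=edu
                saslmech=gssapi

replogfile      /var/log/replog
"""

replicas = parse_replicas(SAMPLE_CONF)
# Collect every setting except the host; identical replicas yield one entry.
distinct = {tuple(sorted((k, v) for k, v in r.items() if k != "host"))
            for r in replicas}
print(len(replicas), "replicas,", len(distinct), "distinct non-host setting set(s)")
```

If the count of distinct setting sets is 1, the replicas differ only in 
host, which is the case for the configuration above.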


--Quanah

--
Quanah Gibson-Mount
Principal Software Developer
ITSS/TSS/Computing Systems
ITSS/TSS/Infrastructure Operations
Stanford University
GnuPG Public Key: http://www.stanford.edu/~quanah/pgp.html