(ITS#5486) openldap with syncprov intermittent core dump



Full_Name: Mark Cave-Ayland
Version: 2.4.8cvs-RE24-2008-04-15
OS: RHEL4, x86
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (217.207.197.142)


Hi there,

To resolve issues experienced with syncrepl/glue on an existing openldap-2.4.8
deployment (ITS#5430), we have been running a CVS checkout of the openldap RE24
branch, taken on 2008-04-15, on one of our test systems.

Unfortunately, we are still seeing random segfaults occurring roughly once a day
which appear to point towards the syncprov overlay once again. At the moment, we
are having difficulty reproducing the fault under test conditions, but if
openldap is left running long enough then it is possible to obtain a core dump.

The issue occurs on a server, pelican, which is configured with the syncprov
overlay and a number of subordinates for different parts of the tree. The
relevant log snippet follows:

Apr 28 12:18:32 pelican slapd[7688]: do_syncrep2: cookie=rid=142,csn=20080428111855.697316Z#000000#000#000000
Apr 28 12:18:32 pelican slapd[7688]: syncrepl_entry: rid=142 LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_MODIFY)
Apr 28 12:18:32 pelican slapd[7688]: syncrepl_entry: rid=142 be_search (0)
Apr 28 12:18:32 pelican slapd[7688]: syncrepl_entry: rid=142 uid=richf,ou=V,ou=W,ou=X,dc=Y,dc=Z
Apr 28 12:18:32 pelican slapd[7688]: slap_queue_csn: queing 0x9ff7560 20080428111855.697316Z#000000#000#000000
Apr 28 12:18:32 pelican slapd[7688]: syncprov_sendresp: cookie=rid=146,csn=20080428111855.697316Z#000000#000#000000
Apr 28 12:18:32 pelican slapd[7688]: syncprov_sendresp: cookie=rid=134,csn=20080428111855.697316Z#000000#000#000000
Apr 28 12:18:32 pelican slapd[7688]: slap_graduate_commit_csn: removing 0xa12ee70 20080428111855.697316Z#000000#000#000000
Apr 28 12:18:32 pelican slapd[7688]: syncrepl_entry: rid=142 be_modify (0)
Apr 28 12:18:32 pelican slapd[7688]: slap_queue_csn: queing 0x9ff7560 20080428111855.697316Z#000000#000#000000

The backtrace obtained from the core file looks like this:

Loaded symbols for /usr/lib/sasl2/libdigestmd5.so.2
Reading symbols from /usr/lib/openldap/syncprov-2.4.so.2...Reading symbols from
/usr/lib/debug/usr/lib/openldap/syncprov-2.4.so.2.0.4.debug...done.
done.
Loaded symbols for /usr/lib/openldap/syncprov-2.4.so.2
#0  0x080e6638 in overlay_entry_get_ov (op=0x7e3eefd0, dn=0x7e3eeeb0, oc=0x0,
ad=0x0, rw=0, e=0x7e3eedfc, on=0x808bdf8) at
../../../servers/slapd/backover.c:355
355                             rc = on->on_bi.bi_entry_get_rw( op, dn,
(gdb) bt
#0  0x080e6638 in overlay_entry_get_ov (op=0x7e3eefd0, dn=0x7e3eeeb0, oc=0x0,
ad=0x0, rw=0, e=0x7e3eedfc, on=0x808bdf8) at
../../../servers/slapd/backover.c:355
#1  0x00b187ac in syncprov_qtask (ctx=0x7e3ef2a0, arg=0xa02f708) at
../../../../servers/slapd/overlays/syncprov.c:871
#2  0x0817a277 in ldap_int_thread_pool_wrapper (xpool=0x9db94d0) at
../../../libraries/libldap_r/tpool.c:663
#3  0x00acb371 in start_thread () from /lib/tls/libpthread.so.0
#4  0x00944ffe in clone () from /lib/tls/libc.so.6
(gdb)

The server pelican is configured using both the syncprov & glue overlays, while
the subordinate for ou=V,ou=W,ou=X,dc=Y,dc=Z is a simple syncrepl declaration of
type refreshAndPersist.

Looking at the log snippet above, I can see in the "syncprov_sendresp" lines
that the cookie appears to be empty. This looks similar to ITS#5432, although
that issue is reported as fixed by a commit on 21st March (so the fix should
already be included in our CVS checkout). Further information can be provided
on request.
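
One way to check mechanically whether the cookies logged by syncprov carry a
CSN is sketched below. It is only a rough helper, not part of OpenLDAP: the
names COOKIE_RE, parse_cookie and scan are hypothetical, and the pattern
assumes the "cookie=rid=...,csn=..." layout seen in the log excerpt in this
report. Whitespace, including a line break, is allowed between the function
name and the cookie.

```python
import re

# Hypothetical helper: matches the "cookie=..." payload that follows
# do_syncrep2 or syncprov_sendresp in a slapd log. The layout is assumed
# from the excerpt in this report, not taken from the slapd source.
COOKIE_RE = re.compile(
    r"(?P<func>do_syncrep2|syncprov_sendresp):\s*cookie=(?P<cookie>\S*)"
)


def parse_cookie(cookie):
    """Split 'rid=146,csn=...' into a dict; missing keys are simply absent."""
    fields = {}
    for part in cookie.split(","):
        key, sep, value = part.partition("=")
        if sep:
            fields[key] = value
    return fields


def scan(log_text):
    """Return (func, rid, csn) per cookie line; csn is None if absent or empty."""
    results = []
    for match in COOKIE_RE.finditer(log_text):
        fields = parse_cookie(match.group("cookie"))
        results.append(
            (match.group("func"), fields.get("rid"), fields.get("csn") or None)
        )
    return results


if __name__ == "__main__":
    sample = (
        "Apr 28 12:18:32 pelican slapd[7688]: syncprov_sendresp:\n"
        "cookie=rid=146,csn=20080428111855.697316Z#000000#000#000000\n"
        "Apr 28 12:18:32 pelican slapd[7688]: syncprov_sendresp:\n"
        "cookie=rid=134,csn=20080428111855.697316Z#000000#000#000000\n"
    )
    for func, rid, csn in scan(sample):
        print(func, rid, csn if csn else "<no csn>")
```

Running scan() over a full slapd log and filtering for entries whose csn is
None would list any responses sent without a CSN, which may help narrow down
when the fault is about to occur.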


Many thanks,

Mark.