
Re: (ITS#3684) ldapadd and ldapsearch cause slapd segfault



Here is a copy of the first message I wrote after I discovered this 
issue. It was sent to openldap-devel *before* the above ITS posting, 
but was bounced back because it was deemed off-topic.

---

Hi All,

I am seeing a reproducible crash running OpenLDAP 2.x CVS HEAD
(checked out last week and updated yesterday and today).

Fedora 3, x86-64, Linux 2.6.11, DB-4.2.52 with patches, OpenLDAP 2.x
CVS HEAD 2005-04-26 10:45 GMT (.

Highlights: Testing rootless tree modifications. Consider:

o=foo
o=baz
o=bar

Recursively delete o=foo and its contents, with the intention of
using slapadd to load a new o=foo only.

Results:

The delete appeared successful (from gq); however,

$ ldapsearch -w foo -x -b 'o=foo' -D 'cn=root'

segfaults slapd :

=> ldap_bv2dn(cn=root,0)
ldap_err2string
<= ldap_bv2dn(cn=root)=0 Success
=> ldap_dn2bv(272)
ldap_err2string
<= ldap_dn2bv(cn=root)=0 Success
=> ldap_dn2bv(272)
ldap_err2string
<= ldap_dn2bv(cn=root)=0 Success
<<< dnPrettyNormal: <cn=root>, <cn=root>
do_bind: version=3 dn="cn=root" method=128
conn=0 op=0 BIND dn="cn=root" method=128
==> bdb_bind: dn: cn=root
bdb_dn2entry("cn=root")
=> bdb_dn2id("cn=root")
<= bdb_dn2id: get failed: DB_NOTFOUND: No matching key/data pair found
(-30990)
Segmentation fault

Remember o=foo is not supposed to exist.

On restart, slapd detects

slapd[8084]: bdb_db_open: unclean shutdown detected; attempting
recovery.
slapd[8084]: bdb_db_recover: Database cannot be recovered. Restore from
backup!
slapd[8084]: bdb_db_open: DB recovery failed.
slapd[8084]: backend_startup_one: bi_db_open failed! (-1)
slapd[8084]: slapd stopped.

After a while of fumbling I determined this to be an effective course of
action:

$ db_recover -v
$ db_verify *bdb
$ rm -fr alock

and restart.
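
For anyone wanting to script the sequence above, here is a minimal sketch
rather than a tested procedure. It assumes the Berkeley DB recovery utility
is installed as db_recover, that slapd is stopped, and that the database
directory is passed as an argument. The names recover_bdb, run and DRY_RUN
are invented for illustration; with DRY_RUN=1 the commands are only printed.

```shell
# Sketch of the recovery steps: recover, verify each *.bdb file, remove alock.
# recover_bdb, run and DRY_RUN are illustrative names, not OpenLDAP tools.

run() {
    # Print the command; execute it only when DRY_RUN is not set to 1.
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@"
    fi
}

# The function body is a subshell so the cd does not affect the caller.
recover_bdb() (
    cd "$1" || exit 1
    run db_recover -v || exit 1
    for f in *.bdb; do
        [ -e "$f" ] || continue      # no *.bdb files matched
        run db_verify "$f" || exit 1
    done
    run rm -rf alock
)
```

A dry run against a hypothetical database directory would look like
`DRY_RUN=1 recover_bdb /path/to/openldap-data`; slapd is then restarted by
hand as before.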

Without the explicit call to db_recover -v, a call to db_verify on
id2entry.bdb would hang in futex(). slapd would hang in the same way
and simply sit there forever (or at least for more than 10 minutes,
stuck on the futex call). Only SIGTERM can kill it.

Dumping and reloading the db files with db_dump and db_load similarly 
did not solve the problem *without* also removing the alock file. A 
hexdump of this file shows a slight change on the line at 0x410, in the 
5th word (if it is a word, and you count it that way: 6655), between the 
'ok' state and the crashed state:

0000410 0f45 426d 0000 0000 6655 0000 0000 0000
                              ^^^^
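
To pin down exactly which bytes change, one option is to keep a copy of the
alock file from a known-good state and compare it byte by byte with the
post-crash copy. The sketch below uses cmp -l; diff_bytes is an illustrative
helper name, and the two file names in the usage note are hypothetical. Note
that default hexdump output shows two-byte units in host byte order, so on
x86-64 the word "6655" above corresponds to bytes 55 66 at offsets 0x418 and
0x419.

```shell
# Print each differing byte as: hex offset (0-based), old value -> new value.
# diff_bytes is an illustrative helper, not part of any OpenLDAP or BDB tool.
diff_bytes() {
    # cmp -l prints a 1-based decimal offset followed by the two
    # differing byte values in octal.
    cmp -l "$1" "$2" | while read -r off a b; do
        printf '0x%x: %02x -> %02x\n' "$((off - 1))" "$((0$a))" "$((0$b))"
    done
}
```

Usage would be along the lines of `diff_bytes alock.ok alock.crash`, where
both files are copies saved by hand before and after the crash.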

Anyway, once recovered, one can perform the exact same search with the
exact same consequences.

UPDATE #1: I've just discovered that performing an ldapsearch as given
above kills slapd whether o=foo *exists or not*....

UPDATE #2: 2005-04-27 After performing a CVS update, I repeated the 
above search with a freshly imported LDIF. Same slapd segfault. I then 
repeated the search, but specifying "-b 'o=foo' ". This time slapd does 
NOT segfault, but consumes CPU as if performing a search. The search 
never completes.

Despite the CPU consumption, strace on the slapd process shows:

# strace -f -p 31856
Process 31863 attached with 3 threads - interrupt to quit
[pid 31856] futex(0x408009f0, FUTEX_WAIT, 31857, NULL <unfinished ...>
[pid 31857] select(13, [4 6 7 12], NULL, NULL, NULL



Apologies for the now overly verbose email, but I spent so long on
it that I felt I should keep it, as the recovery process may inspire
some comments.

Best regards,