[Date Prev][Date Next] [Chronological] [Thread] [Top]

Re: Bug in slapd (ch_calloc) (ITS#1244)



yannick.lecaillez@altiva.fr wrote:

> Full_Name: Yannick Lecaillez
> Version: 2.0.11
> OS: Linux 2.4.3-20mdk with Mandrake 8
> URL:
> Submission from: (NULL) (193.251.69.246)
>
> I've a master slapd with two slave using replication on three different
> computer.
> I do a massive add operation (adding 1000 posixAccount, posixGroup and an
> another
> proprietary entry, 4 entries of this proprietary class is added when a
> posixAccount,
> posixGroup couple is added) from another host directly to the master.
>
> Each computer have a daemon which ask database every five seconds about
> proprietary
> entry and sometimes modifiy it. Each daemon can modify the same entry but on
> different values of the same attribute (since the massive ADD is done ...).
>
> About a random number of posixAccount created (from 1(!!) to 160) i've this
> following error :
>
> master slapd[10947]: ch_calloc of 468516866 elems of 4 bytes failed
>
> It's seems to do about 1787Mb ! I don't understand why slapd need this amount
> of
> memory. After that, slapd goes down. This is the complete log :
>
> Jul 19 08:40:09 master slapd[10927]: daemon: read activity on 9
> Jul 19 08:40:09 master slapd[10927]: connection_get(9)
> Jul 19 08:40:09 master slapd[10927]: connection_get(9): got connid=4
> Jul 19 08:40:09 master slapd[10927]: connection_read(9): checking for input on
> id=4
> Jul 19 08:40:09 master slapd[10927]: ber_get_next on fd 9 failed errno=11
> (Resource temporarily unavailable)
> Jul 19 08:40:09 master slapd[10947]: do_add
> Jul 19 08:40:09 master slapd[10947]: do_add: ndn
> (UNIQUEIDENTIFIER=1804289383,OU=SYNC,DC=ROCHEPLANE,DC=LAN)
> Jul 19 08:40:09 master slapd[10947]: conn=4 op=6 ADD
> dn="UNIQUEIDENTIFIER=1804289383,OU=SYNC,DC=ROCHEPLANE,DC=LAN"
> Jul 19 08:40:09 master slapd[10947]: dn2entry_r: dn:
> "UNIQUEIDENTIFIER=1804289383,OU=SYNC,DC=ROCHEPLANE,DC=LAN"
> Jul 19 08:40:09 master slapd[10947]: => dn2id(
> "UNIQUEIDENTIFIER=1804289383,OU=SYNC,DC=ROCHEPLANE,DC=LAN" )
> Jul 19 08:40:09 master slapd[10947]: => ldbm_cache_open(
> "/var/lib/base-Intranet/dn2id.dbb", 7, 600 )
> Jul 19 08:40:09 master slapd[10947]: <= ldbm_cache_open (cache 0)
> Jul 19 08:40:09 master slapd[10947]: <= dn2id NOID
> Jul 19 08:40:09 master slapd[10947]: dn2entry_r: dn:
> "OU=SYNC,DC=ROCHEPLANE,DC=LAN"
> Jul 19 08:40:09 master slapd[10947]: => dn2id( "OU=SYNC,DC=ROCHEPLANE,DC=LAN" )
>
> Jul 19 08:40:09 master slapd[10947]: ====>
> cache_find_entry_dn2id("OU=SYNC,DC=ROCHEPLANE,DC=LAN"): 7162 (1 tries)
> Jul 19 08:40:09 master slapd[10947]: <= dn2id 7162 (in cache)
> Jul 19 08:40:09 master slapd[10947]: => id2entry_r( 7162 )
> Jul 19 08:40:09 master slapd[10947]: entry_rdwr_rtrylock: ID: 7162
> Jul 19 08:40:09 master slapd[10947]: ====> cache_find_entry_id( 7162 )
> "ou=Sync,dc=rocheplane,dc=lan" (found) (1 tries)
> Jul 19 08:40:09 master slapd[10947]: <= id2entry_r( 7162 ) 0x80ec378 (cache)
> Jul 19 08:40:09 master slapd[10947]: ldbm_referrals: op=104
> target="uniqueIdentifier=1804289383,ou=Sync,dc=rocheplane,dc=lan"
> matched="ou=Sync,dc=rocheplane,dc=lan"
> Jul 19 08:40:09 master slapd[10947]: entry_rdwr_runlock: ID: 7162
> Jul 19 08:40:09 master slapd[10947]: ====> cache_return_entry_r( 7162 ):
> returned (0)
> Jul 19 08:40:09 master slapd[10947]: ==> ldbm_back_add:
> uniqueIdentifier=1804289383,ou=Sync,dc=rocheplane,dc=lan
> Jul 19 08:40:09 master slapd[10947]: => dn2id(
> "UNIQUEIDENTIFIER=1804289383,OU=SYNC,DC=ROCHEPLANE,DC=LAN" )
> Jul 19 08:40:09 master slapd[10947]: => ldbm_cache_open(
> "/var/lib/base-Intranet/dn2id.dbb", 7, 600 )
> Jul 19 08:40:09 master slapd[10947]: <= ldbm_cache_open (cache 0)
> Jul 19 08:40:09 master slapd[10947]: <= dn2id NOID
> Jul 19 08:40:09 master slapd[10947]: oc_check_required entry
> (uniqueIdentifier=1804289383,ou=Sync,dc=rocheplane,dc=lan), objectClass "top"
> Jul 19 08:40:09 master slapd[10947]: oc_check_required entry
> (uniqueIdentifier=1804289383,ou=Sync,dc=rocheplane,dc=lan), objectClass
> "AltivaSyncOperation"
> Jul 19 08:40:09 master slapd[10947]: oc_check_allowed type "objectClass"
> Jul 19 08:40:09 master slapd[10947]: oc_check_allowed type "uniqueIdentifier"
> Jul 19 08:40:09 master slapd[10947]: oc_check_allowed type
> "AltivaSyncCommandhosttype"
> Jul 19 08:40:09 master slapd[10947]: oc_check_allowed type
> "AltivaSyncCommandname"
> Jul 19 08:40:09 master slapd[10947]: oc_check_allowed type
> "AltivaSyncCommandparameters"
> Jul 19 08:40:09 master slapd[10947]: oc_check_allowed type "creatorsName"
> Jul 19 08:40:09 master slapd[10947]: oc_check_allowed type "createTimestamp"
> Jul 19 08:40:09 master slapd[10947]: oc_check_allowed type "modifiersName"
> Jul 19 08:40:09 master slapd[10947]: oc_check_allowed type "modifyTimestamp"
> Jul 19 08:40:09 master slapd[10947]: dn2entry_w: dn:
> "OU=SYNC,DC=ROCHEPLANE,DC=LAN"
> Jul 19 08:40:09 master slapd[10947]: => dn2id( "OU=SYNC,DC=ROCHEPLANE,DC=LAN" )
>
> Jul 19 08:40:09 master slapd[10947]: ====>
> cache_find_entry_dn2id("OU=SYNC,DC=ROCHEPLANE,DC=LAN"): 7162 (1 tries)
> Jul 19 08:40:09 master slapd[10947]: <= dn2id 7162 (in cache)
> Jul 19 08:40:09 master slapd[10947]: => id2entry_w( 7162 )
> Jul 19 08:40:09 master slapd[10947]: entry_rdwr_wtrylock: ID: 7162
> Jul 19 08:40:09 master slapd[10947]: ====> cache_find_entry_id( 7162 )
> "ou=Sync,dc=rocheplane,dc=lan" (found) (1 tries)
> Jul 19 08:40:09 master slapd[10947]: <= id2entry_w( 7162 ) 0x80ec378 (cache)
> Jul 19 08:40:09 master slapd[10947]: => access_allowed: write access to
> "ou=Sync,dc=rocheplane,dc=lan" "children" requested
> Jul 19 08:40:09 master slapd[10947]: <= root access granted
> Jul 19 08:40:09 master slapd[10947]: => ldbm_cache_open(
> "/var/lib/base-Intranet/nextid.dbb", 7, 600 )
> Jul 19 08:40:09 master slapd[10947]: <= ldbm_cache_open (cache 2)
> Jul 19 08:40:09 master slapd[10927]: daemon: select: listen=6 active_threads=1
> tvp=NULL
> Jul 19 08:40:09 master slapd[10947]: entry_rdwr_wlock: ID: 7164
> Jul 19 08:40:09 master slapd[10947]: => index_entry_add( 7164,
> "uniqueIdentifier=1804289383,ou=Sync,dc=rocheplane,dc=lan" )
> Jul 19 08:40:09 master slapd[10947]: => ldbm_cache_open(
> "/var/lib/base-Intranet/objectClass.dbb", 7, 600 )
> Jul 19 08:40:09 master slapd[10947]: <= ldbm_cache_open (cache 3)
> Jul 19 08:40:09 master slapd[10947]: => key_change(ADD,1bfc)
> Jul 19 08:40:09 master slapd[10947]: <= key_change 0
> Jul 19 08:40:09 master slapd[10947]: => key_change(ADD,1bfc)
> Jul 19 08:40:09 master slapd[10947]: ch_calloc of 468516866 elems of 4 bytes
> failed
>
> I've seen several report of this same type of bug but i don't see ANY answer

It is VERY difficult to reproduce such occurrences of GARBAGE getting
into ch_calloc. That's the reason :).  As you can see from the TODO list,
all the memory allocation in slapd should be rewritten and these problems
might be handled as operations errors that do not cause slapd to abort.

Note that a bug in SleepyCat caching handling was fixed very recently
(ITS#1176). You may try the REL_ENG_2 from the cvs server, which
will be released soon as 2.0.12; this may fix your problem.

Pierangelo.

--
Dr. Pierangelo Masarati               | voice: +39 02 2399 8309
Dip. Ing. Aerospaziale                | fax:   +39 02 2399 8334
Politecnico di Milano                 | mailto:masarati@aero.polimi.it
via La Masa 34, 20156 Milano, Italy   | http://www.aero.polimi.it/~masarati