
Re: (ITS#3571) Syncrepl Provider Crash



OK. The crash I see is due to the extra connection_abandon processing, 
the same cause as in ITS#3534 and ITS#3546. When the provider finds a 
matching sessionlog record, the current search operation is added to the 
persistent search list. The new abandon code was intended to make sure 
that persistent searches are aborted / closed correctly, and to free 
everything they allocated. In this case, however, it is acting on a 
regular operation, so the operation gets freed twice, the first time 
while it is still in use.

You might try the patch in ITS#3546, which backs out some of the abandon 
invocations. This exposes you to a resource leak in persistent searches, 
of course. The entire design here is so flawed it's hard to say where to 
begin. You can try to add a flag to the psearch list, which is only set 
once an operation actually gets detached, and make sure that 
bdb_abandon only tries to free detached operations. However, due to the 
API layer violations between the frontend and back-bdb, this approach is 
unreliable; there is no good place to set the flag when needed. As I 
said before, IMO the 2.2 provider is not fixable.
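
To make the flag idea concrete, here is a minimal sketch. It is not slapd 
code: sketch_op, ps_entry, ps_add, ps_mark_detached and ps_abandon are all 
hypothetical names, and the "freed" counter stands in for actually freeing 
the operation. It only shows the intended rule (abandon releases detached 
entries and leaves everything else alone). It does not address the real 
problem, which is finding a reliable place to set the flag.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an operation; NOT the real slapd Operation. */
typedef struct sketch_op {
    int id;
    int freed;              /* counts releases, in place of a real free() */
} sketch_op;

/* Hypothetical psearch list entry carrying the proposed flag. */
typedef struct ps_entry {
    sketch_op *op;
    int detached;           /* set only once the op is really detached */
    struct ps_entry *next;
} ps_entry;

/* Push an operation onto the list; it starts out NOT detached. */
static ps_entry *ps_add(ps_entry **head, sketch_op *op)
{
    ps_entry *e = malloc(sizeof(*e));
    assert(e != NULL);
    e->op = op;
    e->detached = 0;
    e->next = *head;
    *head = e;
    return e;
}

/* Assumed to be called at the point where the op leaves its connection. */
static void ps_mark_detached(ps_entry *e)
{
    e->detached = 1;
}

/* Abandon processing: release detached entries only.
 * Returns the number of operations released. */
static int ps_abandon(ps_entry **head)
{
    int released = 0;
    ps_entry **pp = head;

    while (*pp != NULL) {
        ps_entry *e = *pp;
        if (e->detached) {
            e->op->freed++;     /* the real code would free the op here */
            *pp = e->next;      /* unlink the entry from the list */
            free(e);
            released++;
        } else {
            pp = &e->next;      /* regular op: still owned by its caller */
        }
    }
    return released;
}
```

With this rule, a regular search that was put on the list because of a 
sessionlog match is skipped by the abandon pass, so it is freed once, by 
the code that is still using it.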

rhafer@suse.de wrote:

>On Tuesday 01 March 2005 11:05, Howard Chu wrote:
>  
>
>>I am unable to cause a crash in this scenario. Please send your
>>slapd.conf (ACLs and syncrepl config are most relevant) as well as a
>>backtrace of the crash.
>>    
>>
>
>It seems only to crash if a session log is configured. My provider has
>this slapd.conf:
>-----------------------------
>include         /etc/openldap/schema/core.schema
>include         /etc/openldap/schema/cosine.schema
>include         /etc/openldap/schema/inetorgperson.schema
>include         /etc/openldap/schema/rfc2307bis.schema
>
>pidfile         /var/run/slapd/slapd.pid
>argsfile        /var/run/slapd/slapd.args
>
># Global ACLs
>access to dn.base=""
>        by * read
>
>access to dn.base="cn=Subschema"
>        by * read
>
>access to attr=userPassword,userPKCS12
>        by dn="cn=replicator,dc=suse,dc=de" read
>        by self write
>        by * auth
>
>access to attr=shadowLastChange
>        by dn="cn=replicator,dc=suse,dc=de" read
>        by self write
>        by * read
>
>access to *
>        by dn="cn=replicator,dc=suse,dc=de" read
>        by * read
>
>loglevel 0
>sizelimit 5000
>database bdb
>sessionlog 543 64
>suffix "dc=suse,dc=de"
>rootdn "cn=Administrator,dc=suse,dc=de"
>rootpw secret
>directory /var/lib/ldap
>checkpoint 1024 5
>cachesize 10000
>index objectClass,uidNumber,gidNumber eq
>index member,mail eq,pres
>index cn,displayname,uid,sn,givenname sub,eq,pres
>-----------------------------------
>
>The consumer has the same config, but instead of the sessionlog 
>directive I have: 
>-------------
>syncrepl rid=543
>  provider=ldap://192.168.1.3
>  type=refreshOnly
>  interval=00:00:00:05
>  searchbase="dc=suse,dc=de"
>  filter="(objectClass=*)"
>  scope=sub
>  schemachecking=off
>  updatedn="cn=Replicator,dc=suse,dc=de"
>  bindmethod=simple
>  binddn="cn=Replicator,dc=suse,dc=de"
>  credentials=secret
>-----------------
>During the test I start it from the command line with 
>"-d 256 -c rid=543,sid=543"
>
>To crash it I loaded these entries:
>-----------------------
>dn: dc=suse,dc=de
>objectclass: organization
>objectclass: dcobject
>o: suse
>dc: suse
>
>dn: cn=replicator,dc=suse,dc=de
>objectclass: person
>cn: replicator
>sn: replicator
>userpassword: secret
>
>dn: ou=people,dc=suse,dc=de
>objectclass: organizationalUnit
>ou: people
>
>dn: ou=group,dc=suse,dc=de
>objectclass: organizationalUnit
>ou: group
>----------------------
>
>The initial sync operation worked fine. After that I restarted first 
>the provider and then the consumer. The provider then crashes after the
>consumer issues the next sync operation. Here's the backtrace:
>----------
>(gdb) bt
>#0  0x404bbec9 in free () from /lib/tls/libc.so.6
>#1  0x40055180 in ber_memfree_x (p=0x41f561d4, ctx=0x41f561d4) at memory.c:153
>#2  0x080ae4dc in bdb_do_search (op=0x8198f70, rs=0x41f54870, sop=0x8198f70, ps_e=0x0, ps_type=0) at search.c:1452
>#3  0x080aee48 in bdb_search (op=0x41f561cc, rs=0x41f54870) at search.c:384
>#4  0x08066de9 in do_search (op=0x8198f70, rs=0x41f54870) at search.c:412
>#5  0x08065be5 in connection_operation (ctx=0x41f54900, arg_v=0x8198f70) at connection.c:1086
>#6  0x40024034 in ldap_int_thread_pool_wrapper (xpool=0x812f058) at tpool.c:467
>#7  0x402ae7f3 in start_thread () from /lib/tls/libpthread.so.0
>#8  0x4051262a in clone () from /lib/tls/libc.so.6
>---------- 
>
>"bt full" gives this:
>----------
>(gdb) bt full
>#0  0x404bbec9 in free () from /lib/tls/libc.so.6
>No symbol table info available.
>#1  0x40055180 in ber_memfree_x (p=0x41f561d4, ctx=0x41f561d4) at memory.c:153
>No locals.
>#2  0x080ae4dc in bdb_do_search (op=0x8198f70, rs=0x41f54870, sop=0x8198f70, ps_e=0x0, ps_type=0) at search.c:1452
>        cookie = {bv_len = 52, bv_val = 0x8199c90 "csn=20050301105734Z#000004#00#000000,sid=543,rid=543"}
>        bdb = (struct bdb_info *) 0x81862f0
>        stoptime = 1109678268
>        id = 0
>        cursor = 0
>        candidates = {0 <repeats 128200 times>, 1078638801, 0, 1078539377, 1079356960, 0, 135207520, 1106589728, 1106579640, 1078548671, 
>  1106579692, 135207520, 1, 0, 0, 0, 0, 0, 0, 0, 1078545219, 1106579620, 0 <repeats 45 times>, 544407552, 0, 4294967295, 4294967293, 0, 0, 0, 0, 
>  0, 0, 0, 0, 0, 10, 1, 0, 0, 1106578548, 0, 3, 1106589640, 1106579620, 0, 135207497, 26, 4294967295, 0 <repeats 23 times>, 135207521, 
>  0 <repeats 238 times>, 1106579552, 26, 1106579868, 1106579552, 1079002468, 26, 1078684390, 2, 1106579868, 26, 0, 0, 1079432288, 1106579868, 
>  1106579596, 1078683621, 1079432288, 1106579868, 26, 1079432359, 26, 1106579868, 1079431156, 26, 1106579868, 1106579640, 1078684259, 1079432288, 
>  4294967295, 0, 26, 0, 0, 1079431156, 26, 26, 1106588072, 1078542597, 1106579676, 0, 26, 1, 26, 1106579692, 135207492, 1076577184, 1079432288, 0, 
>  0, 4222451716, 0, 0, 0, 1106579868, 1106579894, 1106588060, 0 <repeats 16 times>, 4294967295, 0 <repeats 13 times>, 1079428736, 1079432288, 0, 
>  0, 0, 0, 0, 1852731235, 1864380477, 540097904, 1212371539, 1953784096, 539639154, 1970473515, 1680631155, 1701068131, 1668489250, 1030058095, 
>  1701060658, 1030120818, 1768300592, 1919251564, 1864901181, 1667590754, 1634485108, 708670323, 664105, 0 <repeats 525 times>...}
>        scopes = {0 <repeats 65536 times>}
>        e = (Entry *) 0x0
>        base = {e_id = 1, e_name = {bv_len = 0, bv_val = 0x0}, e_nname = {bv_len = 13, bv_val = 0x81925a8 "dc=suse,dc=de"}, e_attrs = 0x0, 
>  e_ocflags = 0, e_bv = {bv_len = 0, bv_val = 0x0}, e_private = 0x8191348}
>        e_root = {e_id = 0, e_name = {bv_len = 0, bv_val = 0x0}, e_nname = {bv_len = 0, bv_val = 0x0}, e_attrs = 0x0, e_ocflags = 0, e_bv = {
>    bv_len = 0, bv_val = 0x0}, e_private = 0x0}
>        matched = Variable "matched" is not available.
>----------
>
>  
>
>>>>You should switch to the 2.3 provider as soon as practical.
>>>>        
>>>>
>>>2.3 is not an option at the moment. But is it feasible to backport
>>>the syncprov overlay from 2.3 to 2.2? If yes, I might go that
>>>way.
>>>      
>>>
>>Extremely difficult, I think. And you must #if out all of the 2.2
>>provider from back-bdb to ensure stability.
>>
>>    
>>
>>>>I will apply your patch,  but realize that no more development
>>>>effort is going into the 2.2 provider.
>>>>        
>>>>
>>>Thanks for the clarification.
>>>      
>>>
>
>  
>


-- 
  -- Howard Chu
  Chief Architect, Symas Corp.       Director, Highland Sun
  http://www.symas.com               http://highlandsun.com/hyc
  Symas: Premier OpenSource Development and Support