Re: (ITS#3571) Syncrepl Provider Crash
OK. The crash I see is due to the extra connection_abandon processing,
same as ITS#3534 and ITS#3546. When the provider finds a matching sessionlog
record, the current search operation is added to the persistent search
list. The new abandon code was intended to make sure that persistent
searches were aborted / closed correctly, and to free everything they
allocated. However, in this case it is acting on a regular operation,
and the operation gets freed twice, once while it is still being used.

You might try the patch in ITS#3546, backing out some of the abandon
invocation. Of course, this exposes you to a resource leak in persistent
searches.

The entire design here is so flawed it's hard to say where to
begin. You can try to add a flag to the psearch list, which is only set
once an operation actually gets detached, and make sure that
bdb_abandon only tries to free detached operations. However, due to the
API layer violations between the frontend and back-bdb, this approach is
unreliable; there is no good place to set the flag when needed. As I
said before, IMO the 2.2 provider is not fixable.
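
For illustration only, the "detached flag" idea can be sketched as a toy in C. This is not slapd code: sketch_op, psearch_node, and the psearch_* functions are hypothetical names invented for the example, and "freeing" an operation is modeled as incrementing a counter. It only shows the intended invariant (abandon releases detached entries and leaves attached ones alone); it does not address the real difficulty, which is finding a reliable place to set the flag.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an operation; not the real slapd Operation. */
typedef struct sketch_op {
    int msgid;
    int freed;              /* counts release attempts, so a double free is visible */
} sketch_op;

/* Hypothetical persistent-search list entry carrying the proposed flag. */
typedef struct psearch_node {
    sketch_op *op;
    int detached;           /* set only once the op has actually been detached */
    struct psearch_node *next;
} psearch_node;

/* Add an operation to the list. It starts attached: the normal request
 * path still owns it and will release it on its own. */
static psearch_node *psearch_add(psearch_node *head, sketch_op *op)
{
    psearch_node *n = malloc(sizeof(*n));
    assert(n != NULL);
    n->op = op;
    n->detached = 0;
    n->next = head;
    return n;
}

/* Mark an entry as detached, i.e. now owned by the psearch list. */
static void psearch_detach(psearch_node *n)
{
    n->detached = 1;
}

/* Abandon processing: release and unlink only detached entries.
 * Attached entries are skipped. Returns the number released. */
static int psearch_abandon(psearch_node **head)
{
    int released = 0;
    psearch_node **pp = head;

    while (*pp != NULL) {
        psearch_node *n = *pp;
        if (n->detached) {
            n->op->freed++;     /* stands in for freeing the operation */
            *pp = n->next;      /* unlink before freeing the list node */
            free(n);
            released++;
        } else {
            pp = &n->next;
        }
    }
    return released;
}
```

With two operations on the list and only one of them detached, psearch_abandon releases that one and leaves the other on the list for the normal request path to clean up, so neither is released twice.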
rhafer@suse.de wrote:
>On Tuesday 01 March 2005 11:05, Howard Chu wrote:
>
>
>>I am unable to cause a crash in this scenario. Please send your
>>slapd.conf (ACLs and syncrepl config are most relevant) as well as a
>>backtrace of the crash.
>>
>>
>
>It seems only to crash if a session log is configured. My provider has
>this slapd.conf:
>-----------------------------
>include /etc/openldap/schema/core.schema
>include /etc/openldap/schema/cosine.schema
>include /etc/openldap/schema/inetorgperson.schema
>include /etc/openldap/schema/rfc2307bis.schema
>
>pidfile /var/run/slapd/slapd.pid
>argsfile /var/run/slapd/slapd.args
>
># Global ACLs
>access to dn.base=""
> by * read
>
>access to dn.base="cn=Subschema"
> by * read
>
>access to attr=userPassword,userPKCS12
> by dn="cn=replicator,dc=suse,dc=de" read
> by self write
> by * auth
>
>access to attr=shadowLastChange
> by dn="cn=replicator,dc=suse,dc=de" read
> by self write
> by * read
>
>access to *
> by dn="cn=replicator,dc=suse,dc=de" read
> by * read
>
>loglevel 0
>sizelimit 5000
>database bdb
>sessionlog 543 64
>suffix "dc=suse,dc=de"
>rootdn "cn=Administrator,dc=suse,dc=de"
>rootpw secret
>directory /var/lib/ldap
>checkpoint 1024 5
>cachesize 10000
>index objectClass,uidNumber,gidNumber eq
>index member,mail eq,pres
>index cn,displayname,uid,sn,givenname sub,eq,pres
>-----------------------------------
>
>The consumer has the same config, but instead of the sessionlog
>directive I have:
>-------------
>syncrepl rid=543
> provider=ldap://192.168.1.3
> type=refreshOnly
> interval=00:00:00:05
> searchbase="dc=suse,dc=de"
> filter="(objectClass=*)"
> scope=sub
> schemachecking=off
> updatedn="cn=Replicator,dc=suse,dc=de"
> bindmethod=simple
> binddn="cn=Replicator,dc=suse,dc=de"
> credentials=secret
>-----------------
>During the test I start it from the command line with
>"-d 256 -c rid=543,sid=543"
>
>To crash it I loaded these entries:
>-----------------------
>dn: dc=suse,dc=de
>objectclass: organization
>objectclass: dcobject
>o: suse
>dc: suse
>
>dn: cn=replicator,dc=suse,dc=de
>objectclass: person
>cn: replicator
>sn: replicator
>userpassword: secret
>
>dn: ou=people,dc=suse,dc=de
>objectclass: organizationalUnit
>ou: people
>
>dn: ou=group,dc=suse,dc=de
>objectclass: organizationalUnit
>ou: group
>----------------------
>
>The initial sync operation worked fine. After that I restarted first
>the provider and then the consumer. The provider then crashes after the
>consumer issues the next sync operation. Here's the backtrace:
>----------
>(gdb) bt
>#0 0x404bbec9 in free () from /lib/tls/libc.so.6
>#1 0x40055180 in ber_memfree_x (p=0x41f561d4, ctx=0x41f561d4) at memory.c:153
>#2 0x080ae4dc in bdb_do_search (op=0x8198f70, rs=0x41f54870, sop=0x8198f70, ps_e=0x0, ps_type=0) at search.c:1452
>#3 0x080aee48 in bdb_search (op=0x41f561cc, rs=0x41f54870) at search.c:384
>#4 0x08066de9 in do_search (op=0x8198f70, rs=0x41f54870) at search.c:412
>#5 0x08065be5 in connection_operation (ctx=0x41f54900, arg_v=0x8198f70) at connection.c:1086
>#6 0x40024034 in ldap_int_thread_pool_wrapper (xpool=0x812f058) at tpool.c:467
>#7 0x402ae7f3 in start_thread () from /lib/tls/libpthread.so.0
>#8 0x4051262a in clone () from /lib/tls/libc.so.6
>----------
>
>"bt full" gives this:
>----------
>(gdb) bt full
>#0 0x404bbec9 in free () from /lib/tls/libc.so.6
>No symbol table info available.
>#1 0x40055180 in ber_memfree_x (p=0x41f561d4, ctx=0x41f561d4) at memory.c:153
>No locals.
>#2 0x080ae4dc in bdb_do_search (op=0x8198f70, rs=0x41f54870, sop=0x8198f70, ps_e=0x0, ps_type=0) at search.c:1452
> cookie = {bv_len = 52, bv_val = 0x8199c90 "csn=20050301105734Z#000004#00#000000,sid=543,rid=543"}
> bdb = (struct bdb_info *) 0x81862f0
> stoptime = 1109678268
> id = 0
> cursor = 0
> candidates = {0 <repeats 128200 times>, 1078638801, 0, 1078539377, 1079356960, 0, 135207520, 1106589728, 1106579640, 1078548671,
> 1106579692, 135207520, 1, 0, 0, 0, 0, 0, 0, 0, 1078545219, 1106579620, 0 <repeats 45 times>, 544407552, 0, 4294967295, 4294967293, 0, 0, 0, 0,
> 0, 0, 0, 0, 0, 10, 1, 0, 0, 1106578548, 0, 3, 1106589640, 1106579620, 0, 135207497, 26, 4294967295, 0 <repeats 23 times>, 135207521,
> 0 <repeats 238 times>, 1106579552, 26, 1106579868, 1106579552, 1079002468, 26, 1078684390, 2, 1106579868, 26, 0, 0, 1079432288, 1106579868,
> 1106579596, 1078683621, 1079432288, 1106579868, 26, 1079432359, 26, 1106579868, 1079431156, 26, 1106579868, 1106579640, 1078684259, 1079432288,
> 4294967295, 0, 26, 0, 0, 1079431156, 26, 26, 1106588072, 1078542597, 1106579676, 0, 26, 1, 26, 1106579692, 135207492, 1076577184, 1079432288, 0,
> 0, 4222451716, 0, 0, 0, 1106579868, 1106579894, 1106588060, 0 <repeats 16 times>, 4294967295, 0 <repeats 13 times>, 1079428736, 1079432288, 0,
> 0, 0, 0, 0, 1852731235, 1864380477, 540097904, 1212371539, 1953784096, 539639154, 1970473515, 1680631155, 1701068131, 1668489250, 1030058095,
> 1701060658, 1030120818, 1768300592, 1919251564, 1864901181, 1667590754, 1634485108, 708670323, 664105, 0 <repeats 525 times>...}
> scopes = {0 <repeats 65536 times>}
> e = (Entry *) 0x0
> base = {e_id = 1, e_name = {bv_len = 0, bv_val = 0x0}, e_nname = {bv_len = 13, bv_val = 0x81925a8 "dc=suse,dc=de"}, e_attrs = 0x0,
> e_ocflags = 0, e_bv = {bv_len = 0, bv_val = 0x0}, e_private = 0x8191348}
> e_root = {e_id = 0, e_name = {bv_len = 0, bv_val = 0x0}, e_nname = {bv_len = 0, bv_val = 0x0}, e_attrs = 0x0, e_ocflags = 0, e_bv = {
> bv_len = 0, bv_val = 0x0}, e_private = 0x0}
> matched = Variable "matched" is not available.
>----------
>
>
>
>>>>You should switch to the 2.3 provider as soon as practical.
>>>>
>>>>
>>>2.3 is not an option at the moment. But is it feasible to backport
>>>the syncprov overlay from 2.3 to 2.2? If yes, I might go that
>>>way.
>>>
>>>
>>Extremely difficult, I think. And you must #if out all of the 2.2
>>provider from back-bdb to ensure stability.
>>
>>
>>
>>>>I will apply your patch, but realize that no more development
>>>>effort is going into the 2.2 provider.
>>>>
>>>>
>>>Thanks for the clarification.
>>>
>>>
>
>
>
--
-- Howard Chu
Chief Architect, Symas Corp. Director, Highland Sun
http://www.symas.com http://highlandsun.com/hyc
Symas: Premier OpenSource Development and Support