Issue 6472 - Syncrepl : loop problem with moddn on a new node
Status: VERIFIED FIXED
Alias: None
Product: OpenLDAP
Classification: Unclassified
Component: slapd
Version: 2.4.21
Hardware: All All
Importance: --- normal
Assignee: OpenLDAP project
Reported: 2010-02-12 15:28 UTC by COMBES Julien - SG/SPSSI/CPII/DOSE/ET/PNE MESSAGERIE
Modified: 2014-08-01 21:04 UTC

Description COMBES Julien - SG/SPSSI/CPII/DOSE/ET/PNE MESSAGERIE 2010-02-12 15:28:56 UTC
Full_Name: Julien COMBES
Version: 2.4.21
OS: Debian 5.0.4
URL: ftp://ftp.openldap.org/incoming/its-syncrepl-loop-moddn.tar.bz2
Submission from: (NULL) (212.23.175.185)


Hello,

I think I have found a loop problem in syncrepl replication with OpenLDAP
2.4.21, BDB 4.7.25 (all patches applied) and an hdb database. The problem
sometimes appears when an entry is moved with "ldapmodrdn -s" into a node that
has just been created. I reproduced it by creating a node and doing a moddn
while the consumer was stopped, then restarting the consumer.

The problem unfolds as follows:
 - When it starts, the consumer sends an (objectClass=*) search to the provider:
Feb 12 09:09:19 ldapma24-ida01 slapd[30445]: conn=1007 op=1 SRCH
base="dc=my,dc=domain" scope=2 deref=0 filter="(objectClass=*)"

 - The consumer receives the modrdn and tries to apply it:
Feb 12 09:09:19 ldapra24-ida01 slapd[12156]:
==>hdb_modrdn(cn=user1,ou=A,dc=my,dc=domain,cn=user1,ou=X,dc=my,dc=domain)

 - The consumer fails with these errors:
Feb 12 09:09:19 ldapra24-ida01 slapd[12156]: =>
hdb_dn2id("ou=x,dc=my,dc=domain")
Feb 12 09:09:19 ldapra24-ida01 slapd[12156]: <= hdb_dn2id: get failed:
DB_NOTFOUND: No matching key/data pair found (-30988)
Feb 12 09:09:19 ldapra24-ida01 slapd[12156]: hdb_modrdn:
newSup(ndn=ou=x,dc=my,dc=domain) not here!
Feb 12 09:09:19 ldapra24-ida01 slapd[12156]: send_ldap_result: conn=-1 op=0 p=0
Feb 12 09:09:19 ldapra24-ida01 slapd[12156]: send_ldap_result: err=32 matched=""
text="new superior not found"

 - The consumer retries the (objectClass=*) search on the provider and loops on
the same failure. Replication no longer works.
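
The loop described above can be illustrated with a toy model. This is only a sketch and not slapd code: it assumes the rename reaches the consumer before the add of the new node (an ordering inferred from the "new superior not found" error), and all function names are hypothetical.

```python
class NoSuchObject(Exception):
    """Stands in for LDAP result code 32 (noSuchObject)."""


def apply_modrdn(tree, old_dn, new_rdn, new_superior):
    # Mirrors the check behind the "newSup(...) not here!" log line:
    # the rename is refused when the new superior is absent locally.
    if new_superior not in tree:
        raise NoSuchObject("new superior not found")
    tree.discard(old_dn)
    tree.add(f"{new_rdn},{new_superior}")


def apply_add(tree, dn):
    tree.add(dn)


def refresh(tree, updates, max_attempts=3):
    """Replay the same update stream until it applies or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        try:
            for op, args in updates:
                op(tree, *args)
            return attempt
        except NoSuchObject:
            # Nothing changed on the consumer, so the retry fails identically.
            continue
    return None


# Consumer state after the first, successful replication (normalized DNs).
consumer = {
    "dc=my,dc=domain",
    "ou=a,dc=my,dc=domain",
    "cn=user1,ou=a,dc=my,dc=domain",
}

# Assumed ordering: the rename arrives before the add of ou=X.
updates = [
    (apply_modrdn, ("cn=user1,ou=a,dc=my,dc=domain", "cn=user1", "ou=x,dc=my,dc=domain")),
    (apply_add, ("ou=x,dc=my,dc=domain",)),
]

result = refresh(consumer, updates)
print(result)  # prints None: every attempt hit "new superior not found"
```

In this model the add of ou=X is never reached because the rename ahead of it always fails, which matches the symptom that replication stops making progress.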

To reproduce the problem, I used these steps:
 - start an empty provider
 - ldapadd the entries in mydomain.ldif
ldapadd -x  -h 127.0.0.1 -D "dc=my,dc=domain" -W  -f mydomain.ldif
 - start the consumer.
 - stop the consumer when replication is finished
 - ldapadd the new node 
ldapadd -x  -h 127.0.0.1 -D "dc=my,dc=domain" -W -f add.ldif
 - move the entry under the new node with ldapmodrdn -s
ldapmodrdn -x -h 127.0.0.1 -D "dc=my,dc=domain" -W -r -s "ou=X,dc=my,dc=domain"
"cn=user1,ou=A,dc=my,dc=domain" "cn=user1"
 - start the consumer
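
The client-side commands in the steps above can be collected in a small script. This is a sketch that only builds and prints the command lines; it does not start or stop slapd, and the helper names are hypothetical. The host, bind DN, file names and flags are the ones used in this report.

```python
import shlex

HOST = "127.0.0.1"
BIND_DN = "dc=my,dc=domain"


def ldap_common():
    # Simple bind (-x) to the provider, prompting for the password (-W).
    return ["-x", "-h", HOST, "-D", BIND_DN, "-W"]


def ldapadd(ldif):
    return ["ldapadd", *ldap_common(), "-f", ldif]


def ldapmodrdn(dn, new_rdn, new_superior):
    # -r removes the old RDN value, -s names the new superior entry.
    return ["ldapmodrdn", *ldap_common(), "-r", "-s", new_superior, dn, new_rdn]


STEPS = [
    ("populate the provider, then start the consumer and let it sync",
     ldapadd("mydomain.ldif")),
    ("with the consumer stopped, add the new node",
     ldapadd("add.ldif")),
    ("still with the consumer stopped, move the entry under the new node",
     ldapmodrdn("cn=user1,ou=A,dc=my,dc=domain", "cn=user1", "ou=X,dc=my,dc=domain")),
]

for label, argv in STEPS:
    print(f"# {label}")
    print(shlex.join(argv))
```

Starting the consumer after the last command is what triggers the failing refresh.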

The archive its-syncrepl-loop-moddn.tar.bz2 contains:
 - slapd.conf of provider and consumer
 - log files of provider and consumer
 - mydomain.ldif and add.ldif

Regards,
Comment 1 ando@openldap.org 2010-04-17 09:37:27 UTC
changed notes
moved from Incoming to Software Bugs
Comment 2 ando@openldap.org 2010-04-17 09:43:50 UTC
changed notes
Comment 3 ando@openldap.org 2010-04-17 13:01:27 UTC
changed notes
changed state Open to Test
Comment 4 ando@openldap.org 2010-04-17 16:46:44 UTC

Thanks for the detailed report.  The bug is confirmed, and it's not
related to back-hdb, but seems to be syncrepl-related in general.

p.

Comment 5 ando@openldap.org 2010-04-17 19:07:47 UTC

It's not clear to me where the issue is.  What is the "right" sequence in
which the add of the new superior and the modrdn should be transmitted?
Should the provider operate differently, or should the consumer check all
syncrepl messages and try to rebuild the final state, instead of giving up
when the internal lookup for the new superior fails?  Probably, a workaround
could be to perform the modrdn by creating the new superior as a glue object,
which will eventually be replaced by the actual add.
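
The glue-object idea can be sketched with a toy in-memory tree. This is an illustration under stated assumptions, not the code in slapd/syncrepl.c: the dictionary layout and both function names are hypothetical, and a real glue entry carries more than a flag.

```python
def rename_with_glue(tree, old_dn, new_rdn, new_superior):
    """Move an entry; tree maps a normalized DN to {"glue": bool}."""
    if new_superior not in tree:
        # Instead of failing with "new superior not found", create a
        # placeholder so the rename has somewhere to land.
        tree[new_superior] = {"glue": True}
    entry = tree.pop(old_dn)
    tree[f"{new_rdn},{new_superior}"] = entry


def add_entry(tree, dn):
    """Add a real entry, replacing a glue placeholder if one is present."""
    existing = tree.get(dn)
    if existing is not None and not existing["glue"]:
        raise ValueError(f"entry already exists: {dn}")
    tree[dn] = {"glue": False}


tree = {
    "dc=my,dc=domain": {"glue": False},
    "ou=a,dc=my,dc=domain": {"glue": False},
    "cn=user1,ou=a,dc=my,dc=domain": {"glue": False},
}

# The rename is applied first, even though ou=X has not been added yet.
rename_with_glue(tree, "cn=user1,ou=a,dc=my,dc=domain", "cn=user1", "ou=x,dc=my,dc=domain")
glue_created = tree["ou=x,dc=my,dc=domain"]["glue"]

# The later add of ou=X turns the placeholder into a real entry.
add_entry(tree, "ou=x,dc=my,dc=domain")
```

With this approach the order in which the add and the rename arrive no longer matters for reaching the final state.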

p.

Comment 6 ando@openldap.org 2010-04-17 19:28:39 UTC
> It's not clear to me where the issue is.  What is the "right" sequence in
> which the add of the new superior and the modrdn should be transmitted?
> Should the provider operate differently, or should the consumer check all
> syncrepl messages and try to rebuild the final state, instead of giving up
> when the internal lookup for the new superior fails?  Probably, a workaround
> could be to perform the modrdn by creating the new superior as a glue object,
> which will eventually be replaced by the actual add.

I've quickly hacked things this way, and it seems to work fine.

<ftp://ftp.openldap.org/incoming/pierangelo-masarati-2010-04-17-sync-rename.1.patch>

Please let me know if this approach is sound enough, I might have
overlooked some implications.

p.

Comment 7 Howard Chu 2010-04-17 19:41:56 UTC
masarati@aero.polimi.it wrote:
>
>> It's not clear to me where the issue is.  What is the "right" sequence in
>> which the add of the new superior and the modrdn should be transmitted?
>> Should the provider operate differently, or should the consumer check all
>> syncrepl messages and try to rebuild the final state, instead of giving up
>> when the internal lookup for the new superior fails?  Probably, a workaround
>> could be to perform the modrdn by creating the new superior as a glue object,
>> which will eventually be replaced by the actual add.
>
> I've quickly hacked things this way, and it seems to work fine.
>
> <ftp://ftp.openldap.org/incoming/pierangelo-masarati-2010-04-17-sync-rename.1.patch>
>
> Please let me know if this approach is sound enough, I might have
> overlooked some implications.

Patch looks good, solution makes sense. This is one of the reasons we would 
expect glue entries to be used.

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 8 ando@openldap.org 2010-04-17 20:00:59 UTC
A fix is in HEAD, please test

slapd/syncrepl.c 1.502 -> 1.503

Thanks for reporting, p.

Comment 9 Quanah Gibson-Mount 2010-04-19 12:22:48 UTC
changed notes
changed state Test to Release
Comment 10 COMBES Julien - SG/SPSSI/CPII/DOSE/ET/PNE MESSAGERIE 2010-04-21 08:08:40 UTC
Hello,

On 17/04/2010 22:00, masarati@aero.polimi.it (via Internet) wrote:
> A fix is in HEAD, please test

I compiled the 2.4.21 source with your patch applied, but "make test"
failed on "test020-proxycache for bdb" with this error:

 >>>>> Starting test020-proxycache for bdb...
Starting master slapd on TCP/IP port 9011...
Using ldapsearch to check that master slapd is running...
Using ldapadd to populate the master directory...
Starting proxy cache on TCP/IP port 9012...
Using ldapsearch to check that proxy slapd is running...
Making queries on the proxy cache...
Query 1: filter:(sn=Jon) attrs:all (expect nothing)
Query 2: filter:(|(cn=*Jon*)(sn=Jon*)) attrs:cn sn title uid
Query 3: filter:(sn=Smith*) attrs:cn sn uid
./scripts/test020-proxycache: line 173: 16714 Segmentation fault
$SLAPD -f $CONF2 -h $URI2 -d $LVL -d pcache > $LOG2 2>&1
ldapsearch failed (255)!
./scripts/test020-proxycache: line 177: kill: (16714) - No such process
 >>>>> ./scripts/test020-proxycache failed for bdb (exit 255)
make[2]: *** [bdb-mod] Error 255
make[2]: Leaving directory
'/root/openldap/2.4.21-its6472/compil-upstream/openldap-2.4.21/tests'
make[1]: *** [test] Error 2
make[1]: Leaving directory
'/root/openldap/2.4.21-its6472/compil-upstream/openldap-2.4.21/tests'
make: *** [test] Error 2

I used these compilation steps:
tar -zxvf openldap-2.4.21.tgz

cd openldap-2.4.21

patch -p0 < ../pierangelo-masarati-2010-04-17-sync-rename.1.patch
patching file servers/slapd/syncrepl.c
Hunk #2 succeeded at 2554 (offset -10 lines).
Hunk #3 succeeded at 2891 (offset -10 lines).
Hunk #4 succeeded at 3026 (offset -10 lines).

./configure --enable-debug --enable-dynamic --enable-syslog 
--enable-proctitle --enable-ipv6 --enable-local --enable-slapd 
--enable-aci --enable-cleartext --enable-crypt --disable-lmpasswd 
--enable-spasswd --enable-modules --enable-rewrite --enable-rlookups 
--enable-slapi --enable-slp --enable-wrappers --enable-backends=mod 
--enable-ldbm=no --disable-ndb --enable-overlays=mod --with-subdir=ldap 
--with-cyrus-sasl --with-threads --with-tls=openssl --with-odbc=unixodbc

make depend

make

make test

regards,
Julien

Comment 11 ando@openldap.org 2010-04-21 13:38:10 UTC
This is a known issue.  That patch is now obsoleted by the code that has
been committed to HEAD and ported to re24 for release.  Please test re24
out of the CVS, or apply the corresponding modifications to
slapd/syncrepl.c to 2.4.21, and test.  Please note that test018 has been
modified to reproduce and test the problem you highlighted.  If you run
the new test018 with 2.4.21 it should consistently fail, while it passes
with the new code.

p.

Comment 12 COMBES Julien - SG/SPSSI/CPII/DOSE/ET/PNE MESSAGERIE 2010-04-23 13:37:38 UTC
Hello,

I have tested with re24. It's ok.

Thank you.

Regards,
Julien


Comment 13 Quanah Gibson-Mount 2010-04-29 08:34:21 UTC
changed notes
changed state Release to Closed
Comment 14 OpenLDAP project 2014-08-01 21:04:29 UTC
confirmed (also with bdb; syncrepl issue)
fixed in HEAD
fixed in RE24