Full_Name: Julien COMBES
Version: 2.4.21
OS: Debian 5.0.4
URL: ftp://ftp.openldap.org/incoming/its-syncrepl-loop-moddn.tar.bz2
Submission from: (NULL) (212.23.175.185)

Hello,

I think I have found a loop problem in syncrepl replication with OpenLDAP 2.4.21, BDB 4.7.25 with all patches, and an hdb database. The problem sometimes appears when an entry is moved with "modrdn -s" into a node that has just been created. I reproduced it by creating a node and doing a moddn while the consumer was stopped, then restarting the consumer.

The problem follows these steps:

- When it starts, the consumer sends an objectClass=* search to the provider:
    Feb 12 09:09:19 ldapma24-ida01 slapd[30445]: conn=1007 op=1 SRCH base="dc=my,dc=domain" scope=2 deref=0 filter="(objectClass=*)"

- The consumer finds the modrdn and tries to apply it:
    Feb 12 09:09:19 ldapra24-ida01 slapd[12156]: ==>hdb_modrdn(cn=user1,ou=A,dc=my,dc=domain,cn=user1,ou=X,dc=my,dc=domain)

- The consumer fails with these errors:
    Feb 12 09:09:19 ldapra24-ida01 slapd[12156]: => hdb_dn2id("ou=x,dc=my,dc=domain")
    Feb 12 09:09:19 ldapra24-ida01 slapd[12156]: <= hdb_dn2id: get failed: DB_NOTFOUND: No matching key/data pair found (-30988)
    Feb 12 09:09:19 ldapra24-ida01 slapd[12156]: hdb_modrdn: newSup(ndn=ou=x,dc=my,dc=domain) not here!
    Feb 12 09:09:19 ldapra24-ida01 slapd[12156]: send_ldap_result: conn=-1 op=0 p=0
    Feb 12 09:09:19 ldapra24-ida01 slapd[12156]: send_ldap_result: err=32 matched="" text="new superior not found"

- The consumer retries the objectClass=* search on the provider and loops on the same failure. Replication no longer works.

To reproduce the problem, I used these steps:

- start an empty provider
- ldapadd the entries in mydomain.ldif
    ldapadd -x -h 127.0.0.1 -D "dc=my,dc=domain" -W -f mydomain.ldif
- start the consumer
- stop the consumer when replication is finished
- ldapadd the new node
    ldapadd -x -h 127.0.0.1 -D "dc=my,dc=domain" -W -f add.ldif
- modrdn -s
    ldapmodrdn -x -h 127.0.0.1 -D "dc=my,dc=domain" -W -r -s "ou=X,dc=my,dc=domain" "cn=user1,ou=A,dc=my,dc=domain" "cn=user1"
- start the consumer

The archive its-syncrepl-loop-moddn.tar.bz2 contains:

- slapd.conf of the provider and the consumer
- log files of the provider and the consumer
- mydomain.ldif and add.ldif

regards,
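The failure mode in the report can be illustrated with a toy model. The sketch below is not OpenLDAP code: the dictionary-based tree, the `modrdn` and `refresh` helpers, and the fixed retry count are all assumptions made for illustration. It only shows why replaying the same rename against a consumer that lacks the new superior returns err=32 on every attempt and never converges.

```python
# Toy model of the reported loop; NOT OpenLDAP code. All names are hypothetical.
NO_SUCH_OBJECT = 32  # LDAP result code noSuchObject (err=32 in the consumer log)
SUCCESS = 0


def modrdn(tree, old_dn, new_rdn, new_superior):
    """Move old_dn under new_superior; fail if the new superior is absent."""
    if new_superior not in tree:
        return NO_SUCH_OBJECT  # corresponds to "new superior not found"
    tree[f"{new_rdn},{new_superior}"] = tree.pop(old_dn)
    return SUCCESS


def refresh(tree, max_attempts=3):
    """Replay the same rename on each attempt, as the looping consumer does."""
    results = []
    for _ in range(max_attempts):
        rc = modrdn(tree, "cn=user1,ou=A,dc=my,dc=domain",
                    "cn=user1", "ou=X,dc=my,dc=domain")
        results.append(rc)
        if rc == SUCCESS:
            break
    return results


# Consumer state before restart: ou=X was added on the provider only.
consumer = {
    "dc=my,dc=domain": {},
    "ou=A,dc=my,dc=domain": {},
    "cn=user1,ou=A,dc=my,dc=domain": {"cn": "user1"},
}
print(refresh(consumer))  # [32, 32, 32]
```

Nothing in the consumer's state changes between attempts, so each retry fails in the same way; the real consumer has no attempt limit, which is what makes the loop permanent.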
changed notes moved from Incoming to Software Bugs
changed notes
changed notes changed state Open to Test
Thanks for the detailed report. The bug is confirmed, and it's not related to back-hdb, but seems to be syncrepl-related in general.

p.
> Thanks for the detailed report. The bug is confirmed, and it's not
> related to back-hdb, but seems to be syncrepl-related in general.

It's not clear to me where the issue is. In what "right" sequence should the add of the new superior and the modrdn be transmitted? Should the provider operate differently, or should the consumer check all syncrepl messages and try to rebuild the final state, instead of giving up when the internal lookup for the new superior fails? A possible workaround would be to perform the modrdn by creating the new superior as a glue object, which will eventually be replaced by the actual add.

p.
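The glue-object workaround proposed above can be sketched in the same toy style. This is only an illustration of the idea, not the patch itself: the dictionary tree, the `glue` flag (standing in for however the server marks a glue entry), and the helper names `rename_with_glue` and `add_entry` are assumptions.

```python
# Toy model of the glue-entry workaround; NOT the actual syncrepl.c change.
# The "glue" flag and both helper names are hypothetical.
def rename_with_glue(tree, old_dn, new_rdn, new_superior):
    """Move old_dn under new_superior, creating a glue placeholder if needed."""
    if new_superior not in tree:
        tree[new_superior] = {"glue": True}  # placeholder with no real attributes
    tree[f"{new_rdn},{new_superior}"] = tree.pop(old_dn)


def add_entry(tree, dn, attrs):
    """Add an entry; a glue placeholder at the same DN is replaced by it."""
    existing = tree.get(dn)
    if existing is not None and not existing.get("glue"):
        raise ValueError(f"entry already exists: {dn}")
    tree[dn] = dict(attrs, glue=False)


consumer = {
    "dc=my,dc=domain": {"glue": False},
    "ou=A,dc=my,dc=domain": {"glue": False},
    "cn=user1,ou=A,dc=my,dc=domain": {"cn": "user1", "glue": False},
}

# The rename arrives before the add of ou=X: it now succeeds via a glue entry.
rename_with_glue(consumer, "cn=user1,ou=A,dc=my,dc=domain",
                 "cn=user1", "ou=X,dc=my,dc=domain")

# The real add of ou=X arrives later and replaces the placeholder.
add_entry(consumer, "ou=X,dc=my,dc=domain", {"ou": "X"})
```

In this model the final state is the same whichever of the two operations is applied first, which is the property the consumer was missing when it gave up on the failed lookup.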
> It's not clear to me where the issue is. In what "right" sequence should
> the add of the new superior and the modrdn be transmitted? Should the
> provider operate differently, or should the consumer check all syncrepl
> messages and try to rebuild the final state, instead of giving up when the
> internal lookup for the new superior fails? A possible workaround would
> be to perform the modrdn by creating the new superior as a glue object,
> which will eventually be replaced by the actual add.

I've quickly hacked things this way, and it seems to work fine.

<ftp://ftp.openldap.org/incoming/pierangelo-masarati-2010-04-17-sync-rename.1.patch>

Please let me know if this approach is sound enough; I might have overlooked some implications.

p.
masarati@aero.polimi.it wrote:
> I've quickly hacked things this way, and it seems to work fine.
>
> <ftp://ftp.openldap.org/incoming/pierangelo-masarati-2010-04-17-sync-rename.1.patch>
>
> Please let me know if this approach is sound enough, I might have
> overlooked some implications.

Patch looks good, solution makes sense. This is one of the reasons we would expect glue entries to be used.

--
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
A fix is in HEAD, please test:

    slapd/syncrepl.c 1.502 -> 1.503

Thanks for reporting,

p.
changed notes changed state Test to Release
Hello,

On 17/04/2010 22:00, masarati@aero.polimi.it (via Internet) wrote:
> A fix is in HEAD, please test

I tried compiling the 2.4.21 source with your patch. "make test" failed on "test020-proxycache for bdb" with this error:

    >>>>> Starting test020-proxycache for bdb...
    Starting master slapd on TCP/IP port 9011...
    Using ldapsearch to check that master slapd is running...
    Using ldapadd to populate the master directory...
    Starting proxy cache on TCP/IP port 9012...
    Using ldapsearch to check that proxy slapd is running...
    Making queries on the proxy cache...
    Query 1: filter:(sn=Jon) attrs:all (expect nothing)
    Query 2: filter:(|(cn=*Jon*)(sn=Jon*)) attrs:cn sn title uid
    Query 3: filter:(sn=Smith*) attrs:cn sn uid
    ./scripts/test020-proxycache: line 173: 16714 Segmentation fault $SLAPD -f $CONF2 -h $URI2 -d $LVL -d pcache > $LOG2 2>&1
    ldapsearch failed (255)!
    ./scripts/test020-proxycache: line 177: kill: (16714) - No such process
    >>>>> ./scripts/test020-proxycache failed for bdb (exit 255)
    make[2]: *** [bdb-mod] Error 255
    make[2]: Leaving directory '/root/openldap/2.4.21-its6472/compil-upstream/openldap-2.4.21/tests'
    make[1]: *** [test] Error 2
    make[1]: Leaving directory '/root/openldap/2.4.21-its6472/compil-upstream/openldap-2.4.21/tests'
    make: *** [test] Error 2

I used these compilation steps:

    tar -zxvf openldap-2.4.21.tgz

    cd openldap-2.4.21

    patch -p0 < ../pierangelo-masarati-2010-04-17-sync-rename.1.patch
    patching file servers/slapd/syncrepl.c
    Hunk #2 succeeded at 2554 (offset -10 lines).
    Hunk #3 succeeded at 2891 (offset -10 lines).
    Hunk #4 succeeded at 3026 (offset -10 lines).

    ./configure --enable-debug --enable-dynamic --enable-syslog \
        --enable-proctitle --enable-ipv6 --enable-local --enable-slapd \
        --enable-aci --enable-cleartext --enable-crypt --disable-lmpasswd \
        --enable-spasswd --enable-modules --enable-rewrite --enable-rlookups \
        --enable-slapi --enable-slp --enable-wrappers --enable-backends=mod \
        --enable-ldbm=no --disable-ndb --enable-overlays=mod --with-subdir=ldap \
        --with-cyrus-sasl --with-threads --with-tls=openssl --with-odbc=unixodbc

    make depend

    make

    make test

regards,
Julien
This is a known issue. That patch is now obsoleted by the code that has been committed to HEAD and ported to re24 for release. Please test re24 out of the CVS, or apply the corresponding modifications to slapd/syncrepl.c to 2.4.21, and test. Please note that test018 has been modified to reproduce and test the problem you highlighted. If you run the new test018 with 2.4.21 it should consistently fail, while it passes with the new code.

p.
Hello,

I have tested with re24. It's ok. Thank you.

Regards,
Julien
changed notes changed state Release to Closed
confirmed (also with bdb; syncrepl issue)
fixed in HEAD
fixed in RE24