Full_Name: Peter Mogensen
Version: 2.4.19
OS: Debian Lenny
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (95.166.36.16)


Using openldap 2.4.17 and 2.4.19 linked with libdb4.6 and libdb4.8 in a
mirrormode setup:

* Load the database with slapadd on server-1, start server-1
  The LDIF being loaded is generated with slapcat from a slapd 2.3.30-5+etch2
  running on Debian Etch. I have no reason to suspect that it is not loaded
  correctly into server1

* Start server-2 and monitor the progress of replication with slapcat, for
  example:

  # for ((I=1;I<=20;I++)); do slapcat > out-$I; done

* Look at the output:

  # for ((I=1;I<=20;I++)); do wc -l out-$I; done

I would expect the generated files to be strictly increasing in size.
However, sometimes there's a file which is smaller than the previous.
In it I see LDIF entries like this:

dn:
objectClass: top
objectClass: NamedObject
objectClass: simpleSecurityObject
uid: rieke
userPassword:: e1NBU0x.....
structuralObjectClass: NamedObject
entryUUID: e46b680e-e5f5-102b-93c9-79162adc1d46
creatorsName: dc=admin,dc=example,dc=com
createTimestamp: 20070823185333Z
entryCSN: 20070823185333.000000Z#000002#000#000000
modifiersName: dc=admin,dc=example,dc=com
modifyTimestamp: 20070823185333Z

... with an empty DN line.

My config is as follows.
It has been converted to LDIF and the server is running with a cn=config
database:

============================================
#gentlehup on
pidfile         /var/run/slapd/slapd.pid
argsfile        /var/run/slapd/slapd.args
loglevel        none
tool-threads    4

# Modules
modulepath      /usr/lib/ldap
moduleload      back_hdb
moduleload      syncprov

# Schemas
include         /etc/ldap/schema/core.schema
include         /etc/ldap/schema/cosine.schema
include         /etc/ldap/schema/inetorgperson.schema

# Limits
disallow bind_anon
#idletimeout 120
sizelimit 2000

# TLS/Auth
TLSCACertificateFile    /etc/ldap/ssl/ca.crt
TLSCertificateFile      /etc/ldap/ssl/server.crt
TLSCertificateKeyFile   /etc/ldap/ssl/server.nopass.key
TLSCipherSuite          "NULL-SHA"
# Allow root to configure slapd via ldapi:///
TLSVerifyClient         demand
authz-regexp "gidNumber=0\\+uidNumber=0,cn=peercred,cn=external,cn=auth" "cn=config"
authz-regexp "email=root@example.com,cn=config,ou=dev,o=example.com,st=Denmark,c=DK" "cn=config"

##### Mirror mode ####
serverID 2

database config
limits dn.exact="cn=config" time.soft=unlimited time.hard=unlimited size.soft=unlimited size.hard=unlimited

syncrepl rid=1
    provider=ldaps://server1.example.com:636/
    searchbase="cn=config"
    type=refreshAndPersist
    retry="60 +"
    scope=sub
    schemachecking=on
    bindmethod=sasl
    binddn="cn=config"
    saslmech="EXTERNAL"
    tls_cert=/etc/ldap/ssl/config.crt
    tls_key=/etc/ldap/ssl/config.nopass.key
    tls_cacert=/etc/ldap/ssl/ca.crt
    tls_cipher_suite="NULL-SHA"

syncrepl rid=2
    provider=ldaps://server2.example.com:636/
    searchbase="cn=config"
    type=refreshAndPersist
    retry="60 +"
    scope=sub
    schemachecking=on
    bindmethod=sasl
    binddn="cn=config"
    saslmech="EXTERNAL"
    tls_cert=/etc/ldap/ssl/config.crt
    tls_key=/etc/ldap/ssl/config.nopass.key
    tls_cacert=/etc/ldap/ssl/ca.crt
    tls_cipher_suite="NULL-SHA"

overlay syncprov
syncprov-checkpoint 100 10
syncprov-sessionlog 100
syncprov-reloadhint TRUE

mirrormode on
=================================================

The database which I slapcat and which is being replicated has been loaded
with "ldapadd -YEXTERNAL -H ldapi:/// -f ..." from this LDIF:

dn: olcDatabase={1}hdb,cn=config
objectClass: olcHdbConfig
objectClass: olcDatabaseConfig
olcDatabase: hdb
olcSuffix: cn=data,dc=example,dc=com
olcRootDN: cn=config
olcDbDirectory: /var/lib/ldap/cn=data,dc=example,dc=com
olcDbMode: 0660
olcDbConfig: set_cachesize 2 0 0
olcDbConfig: set_lg_bsize 2097512
olcDbConfig: set_lg_dir /var/lib/ldap/cn=data,dc=example,dc=com-log
olcDbConfig: set_flags DB_LOG_AUTOREMOVE
olcDbConfig: set_lk_max_objects 5000
olcDbConfig: set_lk_max_locks 5000
olcDbConfig: set_lk_max_lockers 5000
olcDbCheckpoint: 1024 10
olcDbCachefree: 16
olcDbCachesize: 100000
olcDbIDLcacheSize: 300000
olcDbLinearIndex: TRUE
olcDbIndex: objectClass eq
olcDbIndex: entryUUID eq
olcDbIndex: entryCSN eq
olcDbIndex: cn eq,sub
olcDbIndex: uid eq
olcDbIndex: ou eq
olcDbIndex: o eq
olcDbIndex: givenName eq,sub
olcDbIndex: sn eq,sub
olcDbIndex: mail eq,sub
olcDbIndex: member eq
olcDbIndex: reader eq
olcDbIndex: writer eq
olcDbIndex: admin eq
olcAccess: to dn.base="cn=data,dc=example,dc=com" attrs=userPassword by * auth
olcAccess: to dn.base="cn=data,dc=example,dc=com" by dn.base="cn=data,dc=example,dc=com" search
olcAccess: to dn.children="cn=data,dc=example,dc=com" by dn.base="cn=data,dc=example,dc=com" write
olcSyncRepl: rid=3 provider=ldaps://server1.example.com:636/ searchbase="cn=data,dc=example,dc=com" type=refreshAndPersist retry="60 +" scope=sub schemachecking=on bindmethod=sasl binddn="cn=config" saslmech="EXTERNAL" tls_cert=/etc/ldap/ssl/config.crt tls_key=/etc/ldap/ssl/config.nopass.key tls_cacert=/etc/ldap/ssl/ca.crt tls_cipher_suite="NULL-SHA"
olcSyncRepl: rid=4 provider=ldaps://server2.example.com:636/ searchbase="cn=data,dc=example,dc=com" type=refreshAndPersist retry="60 +" scope=sub schemachecking=on bindmethod=sasl binddn="cn=config" saslmech="EXTERNAL" tls_cert=/etc/ldap/ssl/config.crt tls_key=/etc/ldap/ssl/config.nopass.key tls_cacert=/etc/ldap/ssl/ca.crt tls_cipher_suite="NULL-SHA"
olcMirrorMode: TRUE
olcLimits: dn.base="cn=config" size.soft=unlimited size.hard=unlimited time.soft=unlimited time.hard=unlimited

dn: olcOverlay=syncprov,olcDatabase={1}hdb,cn=config
objectClass: olcOverlayConfig
objectClass: olcSyncProvConfig
olcOverlay: syncprov
olcSpCheckpoint: 100 600
olcSpSessionlog: 100
olcSpReloadHint: TRUE

dn: olcOverlay=refint,olcDatabase={1}hdb,cn=config
objectClass: olcOverlayConfig
objectClass: olcRefintConfig
olcOverlay: refint
olcRefintAttribute: member
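
The symptom in this report (entries written with an empty "dn:" line) can be checked mechanically instead of comparing file sizes by eye. The following is only a sketch under stated assumptions: the function name ldif_summary is made up for illustration, it reads LDIF on stdin, and it treats any line that is exactly "dn:" (optionally followed by whitespace) as a broken entry.

```shell
#!/bin/bash
# Hypothetical helper, not part of OpenLDAP.
# Reads LDIF on stdin, prints "entries=<N> empty_dn=<M>", and returns 1
# if at least one entry has an empty DN line.
ldif_summary() {
    awk '
        /^dn:/        { entries++ }   # every entry starts with a dn: (or dn::) line
        /^dn:[ \t]*$/ { empty++ }     # "dn:" with nothing after it
        END {
            printf "entries=%d empty_dn=%d\n", entries, empty
            exit (empty > 0)
        }
    '
}

# Example (assumes slapcat is installed and configured):
#   for ((I=1;I<=20;I++)); do slapcat | ldif_summary; done
```

Run in the same loop as the reproduction steps, this prints one summary line per slapcat run, so a run with a dropping entry count or a non-zero empty_dn value stands out immediately.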
apm@mutex.dk wrote:
> Full_Name: Peter Mogensen
> Version: 2.4.19
> OS: Debian Lenny
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (95.166.36.16)
>
>
> Using openldap 2.4.17 and 2.4.19 linked with libdb4.6 and libdb4.8 in a
> mirrormode setup:
>
> * Load the database with slapadd on server-1, start server-1
>   The LDIF being loaded is generated with slapcat from a slapd 2.3.30-5+etch2
>   running on Debian Etch. I have no reason to suspect that it is not loaded
>   correctly into server1
>
> * Start server-2 and monitor the progress of replication with slapcat, for
>   example:
>
>   # for ((I=1;I<=20;I++)); do slapcat > out-$I; done
>
> * Look at the output:
>
>   # for ((I=1;I<=20;I++)); do wc -l out-$I; done
>
> I would expect the generated files to be strictly increasing in size.
> However, sometimes there's a file which is smaller than the previous.
> In it I see LDIF entries like this:
>
> dn:
> objectClass: top
> objectClass: NamedObject
> objectClass: simpleSecurityObject
> uid: rieke
> userPassword:: e1NBU0x.....
> structuralObjectClass: NamedObject
> entryUUID: e46b680e-e5f5-102b-93c9-79162adc1d46
> creatorsName: dc=admin,dc=example,dc=com
> createTimestamp: 20070823185333Z
> entryCSN: 20070823185333.000000Z#000002#000#000000
> modifiersName: dc=admin,dc=example,dc=com
> modifyTimestamp: 20070823185333Z
>
> ... with an empty DN line.

You appear to be using back-hdb. I note that in bdb_tool_entry_get() there
is code specific to back-hdb that tries to look up the parent of the current
entry and, if found, "fixes" its DN.

My guess is that if this can fail, e.g. because entries are being sync'ed
out of order, the DN does not get fixed. If this is the case (I couldn't
inspect code deep enough to make sure), I'd expect that the DN get fixed
anyway, though, because missing entries should exist as "glue" objects.

I apologize for the rather incomplete analysis, I can't dig further right
now. I hope this provides some hint to others, unless completely wrong.

p.
Pierangelo Masarati wrote:
> You appear to be using back-hdb.

Yes.

> My guess is that if this can fail, e.g. because entries are being
> sync'ed out of order, the DN does not get fixed. If this is the case (I
> couldn't inspect code deep enough to make sure), I'd expect that the DN
> get fixed anyway, though, because missing entries should exist as "glue"
> objects.

yes. But if you plan to use slapcat as a backup mechanism, then it's
still a problem.

/Peter
apm@mutex.dk wrote:
> Pierangelo Masarati wrote:
>> You appear to be using back-hdb.
>
> Yes.
>
>> My guess is that if this can fail, e.g. because entries are being
>> sync'ed out of order, the DN does not get fixed. If this is the case (I
>> couldn't inspect code deep enough to make sure), I'd expect that the DN
>> get fixed anyway, though, because missing entries should exist as "glue"
>> objects.
>
> yes. But if you plan to use slapcat as a backup mechanism, then it's
> still a problem.

Sounds like a low priority issue at best. Taking backups of a replica while it
is initializing is pointless, just take a backup of the provider instead.

--
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/
changed notes
changed state Open to Suspended
Howard Chu wrote:
> apm@mutex.dk wrote:
>> Pierangelo Masarati wrote:
>>> You appear to be using back-hdb.
>> Yes.
>>
>>> My guess is that if this can fail, e.g. because entries are being
>>> sync'ed out of order, the DN does not get fixed. If this is the case (I
>>> couldn't inspect code deep enough to make sure), I'd expect that the DN
>>> get fixed anyway, though, because missing entries should exist as "glue"
>>> objects.
>> yes. But if you plan to use slapcat as a backup mechanism, then it's
>> still a problem.
>
> Sounds like a low priority issue at best. Taking backups of a replica while it
> is initializing is pointless, just take a backup of the provider instead.

This is mirrormode.
There's no "provider" as such. However, there's one server which is used
for application access and to minimize disk load on that server, the
plan was to take most backups from the other.
I can't see any difference between what you call "initializing" and
normal running state, except that the difference between server-1 and
server-2 is (somewhat) larger. If I can't trust slapcat during this
phase, how can I trust slapcat for backups?

/Peter
Peter Mogensen wrote:
> Howard Chu wrote:
>> apm@mutex.dk wrote:
>>> Pierangelo Masarati wrote:
>>>> You appear to be using back-hdb.
>>> Yes.
>>>
>>>> My guess is that if this can fail, e.g. because entries are being
>>>> sync'ed out of order, the DN does not get fixed. If this is the case (I
>>>> couldn't inspect code deep enough to make sure), I'd expect that the DN
>>>> get fixed anyway, though, because missing entries should exist as "glue"
>>>> objects.
>>> yes. But if you plan to use slapcat as a backup mechanism, then it's
>>> still a problem.
>>
>> Sounds like a low priority issue at best. Taking backups of a replica while it
>> is initializing is pointless, just take a backup of the provider instead.
>
> This is mirrormode.
> There's no "provider" as such. However, there's one server which is used
> for application access and to minimize disk load on that server, the
> plan was to take most backups from the other.
> I can't see any difference between what you call "initializing" and
> normal running state, except that the difference between server-1 and
> server-2 is (somewhat) larger.

If the problem is as Ando suggests, then it's because in the syncrepl Refresh
phase it's receiving entries out-of-order from the provider. Ando is
suggesting that the problem is caused when a child entry is replicated before
its parent. Once the Refresh phase ends and it transitions to the Persist
phase, all entries' parents will exist and so this particular condition will
no longer occur.

Of course, no one is saying for certain that this is the explanation, yet.

> If I can't trust slapcat during this
> phase, how can I trust slapcat for backups?

Does slapcat behave this way on the active server?

--
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/
Howard Chu wrote:
> Peter Mogensen wrote:
>> This is mirrormode.
>> There's no "provider" as such. However, there's one server which is used
>> for application access and to minimize disk load on that server, the
>> plan was to take most backups from the other.
>> I can't see any difference between what you call "initializing" and
>> normal running state, except that the difference between server-1 and
>> server-2 is (somewhat) larger.
>
> If the problem is as Ando suggests, then it's because in the syncrepl Refresh
> phase it's receiving entries out-of-order from the provider. Ando is
> suggesting that the problem is caused when a child entry is replicated before
> its parent. Once the Refresh phase ends and it transitions to the Persist
> phase, all entries' parents will exist and so this particular condition will
> no longer occur.
>
> Of course, no one is saying for certain that this is the explanation, yet.

It sounds reasonable to me :)
But unless you are not in any way allowed to - ever - make writes to
more than one server in a mirrormode setup, this could (as I hear it)
potentially happen at any time.
The only reason I have to only make writes to one server is that I
currently (this will change) have an application which is dependent on
making writes and immediately reading back the entry.

As I hear it, what you're saying is that any write to a server in a
mirrormode setup could invalidate a slapcat running on the other.
This would mean that you can never write to more than one server at all
and that's the only server you can slapcat while running.
That takes a lot of the "mirror" out of "mirrormode". Doesn't it?

>> If I can't trust slapcat during this
>> phase, how can I trust slapcat for backups?
>
> Does slapcat behave this way on the active server?

I've taken that test setup down now to test other stuff, so I can't say 100%.

/Peter
> Howard Chu wrote:
>> Peter Mogensen wrote:
>>> This is mirrormode.
>>> There's no "provider" as such. However, there's one server which is used
>>> for application access and to minimize disk load on that server, the
>>> plan was to take most backups from the other.
>>> I can't see any difference between what you call "initializing" and
>>> normal running state, except that the difference between server-1 and
>>> server-2 is (somewhat) larger.
>>
>> If the problem is as Ando suggests, then it's because in the syncrepl Refresh
>> phase it's receiving entries out-of-order from the provider. Ando is
>> suggesting that the problem is caused when a child entry is replicated before
>> its parent. Once the Refresh phase ends and it transitions to the Persist
>> phase, all entries' parents will exist and so this particular condition will
>> no longer occur.
>>
>> Of course, no one is saying for certain that this is the explanation, yet.
>
> It sounds reasonable to me :)
> But unless you are not in any way allowed to - ever - make writes to
> more than one server in a mirrormode setup, this could (as I hear it)
> potentially happen at any time.
> The only reason I have to only make writes to one server is that I
> currently (this will change) have an application which is dependent on
> making writes and immediately reading back the entry.
>
> As I hear it, what you're saying is that any write to a server in a
> mirrormode setup could invalidate a slapcat running on the other.
> This would mean that you can never write to more than one server at all
> and that's the only server you can slapcat while running.
> That takes a lot of the "mirror" out of "mirrormode". Doesn't it?

Based on my *very incomplete and possibly wrong* analysis, the problem
would be automatically cured by using back-bdb. Also, fixing back-hdb *if
it's broken at all* should be possible.

p.
masarati@aero.polimi.it wrote:
>> Howard Chu wrote:
>>> Of course, no one is saying for certain that this is the explanation,
>>> yet.
>>
>> It sounds reasonable to me :)
>
> Based on my *very incomplete and possibly wrong* analysis, the problem
> would be automatically cured by using back-bdb. Also, fixing back-hdb *if
> it's broken at all* should be possible.

Quite certain this is not the explanation, since syncrepl will call
syncrepl_add_glue() whenever it needs to store an entry that doesn't yet have
its parent.

--
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/
Howard Chu wrote:
>> If I can't trust slapcat during this
>> phase, how can I trust slapcat for backups?
>
> Does slapcat behave this way on the active server?

Yes.
Sorry for the delay, but I've only just confirmed it now.

The passive server needed to be moved, so I took it completely offline.
Now there's only the active server and a backup shows the same result.

Running slapcat while slapd is running produced invalid LDIF.
Running slapcat after slapd is stopped produced OK LDIF - at least once.

slapd has since been upgraded to 2.4.21.

/Peter
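
For anyone relying on slapcat for backups of a running slapd, one defensive pattern is to dump into a temporary file and only keep it if a sanity check passes. The sketch below is an illustration under assumptions, not an OpenLDAP feature: guarded_dump is a hypothetical helper name, the backup path in the example is made up, and the only check is for the empty "dn:" symptom from this report. It cannot detect output that is silently cut short.

```shell
#!/bin/bash
# Hypothetical helper, not part of OpenLDAP.
# Usage: guarded_dump <outfile> <command> [args...]
# Runs the dump command into a temporary file next to <outfile> and moves it
# into place only if the command exited 0 and no entry has an empty DN line.
guarded_dump() {
    local out=$1 tmp
    shift
    tmp=$(mktemp "${out}.XXXXXX") || return 1
    if "$@" > "$tmp" && ! grep -q -E '^dn:[[:space:]]*$' "$tmp"; then
        mv "$tmp" "$out"
    else
        rm -f "$tmp"      # never leave a suspect dump where a backup is expected
        return 1
    fi
}

# Example (path is illustrative only):
#   guarded_dump /var/backups/data.ldif slapcat -b "cn=data,dc=example,dc=com"
```

Writing to a temporary file first means a previous good backup is not overwritten by a bad one.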
changed state Suspended to Open
changed notes
changed state Open to Test
moved from Incoming to Software Bugs
Fix verified.

-------- Original Message --------
Subject: RE: truncated slapcat output
Date: Thu, 11 Apr 2013 07:15:17 +0000
From: Hummel, Wolfgang <wolfgang.hummel@hp.com>
To: Howard Chu <hyc@symas.com>

Hello Howard,

we tested your fix for truncated slapcat and it works fine.
Please see test report below.

As I am not familiar with your processes in commenting on bugfixes,
could you use the attached test report and commit the fix so it will
become part of the next release?

Thanks a lot for your support

Regards
Wolfgang Hummel

Test execution timeframe: April 9th and 10th 2013

Test platform:
  Hardware  HP DL380 G5, 32 GB Memory, 2 Quadcore CPUs
  OS        RedHat 5.1
  OpenLDAP  2.4.32 without / with slapcat truncate bugfix
            bdb backend

Test setup:
- Subscriber profile DB with 1 million entries, each with 10 attributes

Test execution:
- ldapbenchmark.pl executing ~ 5 ldapmodifies and ~ 2 ldapsearches / second
- slapcat_vfhu_vm_profiles.sh running slapcat in a cycle and counting entries

Test result without slapcat truncate bugfix:
- 1826 measurements of entry number
- 23 events with truncated slapcat output
- slapcat return code was always 0 !

Test result with slapcat truncate bugfix:
- 1647 measurements of entry number
- 0 events with truncated slapcat output

script to measure it:

#!/bin/bash
# run slapcat forever and always count number of entries
# they should always be the same
while :
do
  slapcat -b ou=vm,ou=profiles,ou=xxxx,c=hu,o=yyyy|grep "^dn:"|wc -l >>vfhu_vm_entries.txt
  echo "Slapcat Return Code: $?; `date`">>vfhu_vm_entries.txt
  sleep 1
done

log snippet with bug fix:

Slapcat Return Code: 0; Wed Apr 10 22:12:42 CEST 2013
1000001
Slapcat Return Code: 0; Wed Apr 10 22:13:05 CEST 2013
1000001
Slapcat Return Code: 0; Wed Apr 10 22:13:28 CEST 2013
1000001
Slapcat Return Code: 0; Wed Apr 10 22:13:52 CEST 2013
1000001
Slapcat Return Code: 0; Wed Apr 10 22:14:16 CEST 2013
1000001

log snippet without bugfix:

1000001
Slapcat Return Code: 0; Wed Apr 10 01:01:45 CEST 2013
319284
Slapcat Return Code: 0; Wed Apr 10 01:02:07 CEST 2013
1000001
Slapcat Return Code: 0; Wed Apr 10 01:03:15 CEST 2013
1000001
Slapcat Return Code: 0; Wed Apr 10 01:03:42 CEST 2013
1000001
Slapcat Return Code: 0; Wed Apr 10 01:04:09 CEST 2013
1000001
Slapcat Return Code: 0; Wed Apr 10 01:04:37 CEST 2013
1000001
Slapcat Return Code: 0; Wed Apr 10 01:05:04 CEST 2013
437990
Slapcat Return Code: 0; Wed Apr 10 01:05:16 CEST 2013
146790
Slapcat Return Code: 0; Wed Apr 10 01:05:21 CEST 2013
72688
Slapcat Return Code: 0; Wed Apr 10 01:05:25 CEST 2013
1000001
Slapcat Return Code: 0; Wed Apr 10 01:05:50 CEST 2013

-----Original Message-----
From: Howard Chu [mailto:hyc@symas.com]
Sent: Tuesday, 26 March 2013 18:09
To: Hummel, Wolfgang
Subject: Re: truncated slapcat output

Hummel, Wolfgang wrote:
> Hello Howard,
>
> we are facing in production of our major customer VF the same problems as
> described in
>
> http://www.openldap.org/lists/openldap-technical/201301/msg00232.html
>
> We are using back-bdb and slapcat to create nightly ldif backups of all DBs
> where 2 of them each contains ~ 40 million entries.
>
> Even though write traffic is low when the slapcat job runs, every 1 - 2 weeks
> output is truncated without error msg or error return code which creates
> problems for subsequent batch jobs.
>
> Therefore a slapcat that retries like slapd if a page is locked would be
> really important for us.
>
> 2 Questions:
>
> - Is the slapcat retry mechanism the fix you described in
>   http://www.openldap.org/lists/openldap-technical/201301/msg00232.html ?

The patch for ITS#6365 was committed to master

commit 853b9d1335d27e280751e9cfb8ca6b5356ffec73
Author: Howard Chu <hyc@openldap.org>
Date:   Thu Feb 7 18:23:25 2013 +0000

    ITS#6365 wait for read locks in tool mode

> - When will it be available in OpenLDAP ?

When somebody follows up to the ITS and confirms that the patch fixes the
issue. So far nobody has followed up. You're welcome to test and follow up.

> Regards
>
> Wolfgang Hummel
>
> ----------------------------------------------------------
> Postal Address: Hewlett-Packard GmbH
>                 Wolfgang Hummel
>                 Enterprise Services
>                 Communication & Media Solutions
>                 Herrenberger Str. 140
>                 71034 Böblingen
> Phone:  +49 7031 14-7375
> Fax:    +49 711 18562024
> mobile: +49 151 14751791
> E-Mail: wolfgang.hummel@hp.com <mailto:wolfgang.hummel@hp.com>
> ----------------------------------------------------------
> http://www.hp.com/de
>
> Hewlett-Packard GmbH, Herrenberger Str. 140, 71034 Böblingen
> Managing directors: Volker Smid (chairman), Martin Kinne, Heiko Meyer, Ernst
> Reichart, Rainer Sterk. Chairman of the supervisory board: Jörg Menno Harms.
> Registered office: Böblingen, Amtsgericht Stuttgart HRB 244081, WEEE-Reg.-Nr.
> DE 30409072

--
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/
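
A note on the measurement script in the test report above: in bash, `$?` after a pipeline is the exit status of the last command in that pipeline (here `wc -l`), unless `pipefail` is set. The logged "Slapcat Return Code" therefore does not by itself show what slapcat returned. This does not contradict the report, since the original problem description also says there was no error return code, but a script that wants slapcat's own status can read it from `PIPESTATUS`. A minimal sketch, assuming bash and using a hypothetical function name:

```shell
#!/bin/bash
# Hypothetical helper, not part of OpenLDAP.
# Usage: count_dns_with_status <command> [args...]
# Runs the command, counts "dn:" lines in its output, and prints
# "<count> <exit status of the command itself>".
count_dns_with_status() {
    local count status
    # PIPESTATUS[0] is the status of the first command in the pipeline,
    # i.e. the dump command rather than grep.
    count=$("$@" | grep -c '^dn:'; exit "${PIPESTATUS[0]}") && status=0 || status=$?
    echo "${count} ${status}"
}

# Example (assumes slapcat is installed and configured):
#   count_dns_with_status slapcat -b ou=vm,ou=profiles,ou=xxxx,c=hu,o=yyyy
```

Setting `set -o pipefail` at the top of the script is an alternative when only success or failure of the whole pipeline matters.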
changed notes
changed state Test to Release
changed notes
changed state Release to Closed
fixed in master
fixed in RE24