Full_Name: Aaron Richton Version: 2.3.24 OS: Solaris 9 URL: Submission from: (NULL) (128.6.31.135) Something not quite right with session log or nonpresent list or....... Pretty much the same as the last time I played the syncrepl game, except this time instead of entries missing from the slave, there are excessive entries on the slave. Here's an example, although it's hardly illuminating: [...] Jun 8 15:18:09 master.rutgers.edu slapd[29239]: [ID 379477 local4.debug] conn=361133 op=2 DEL dn="uid=VICTIM,ou=People,dc=rci,dc=rutgers,dc=edu" [...] Jun 8 15:22:44 master.rutgers.edu slapd[29239]: [ID 848112 local4.debug] conn=361315 fd=100 ACCEPT from IP=192.168.4.67:53492 (IP=0.0.0.0:389) Jun 8 15:22:44 master.rutgers.edu slapd[29239]: [ID 560212 local4.debug] conn=361315 op=0 STARTTLS Jun 8 15:22:44 master.rutgers.edu slapd[29239]: [ID 875301 local4.debug] conn=361315 op=0 RESULT oid= err=0 text= Jun 8 15:22:44 master.rutgers.edu slapd[29239]: [ID 105384 local4.debug] conn=361315 fd=100 TLS established tls_ssf=256 ssf=256 Jun 8 15:22:44 master.rutgers.edu slapd[29239]: [ID 215403 local4.debug] conn=361315 op=1 BIND dn="cn=readEverything,dc=rci,dc=rutgers,dc=edu" method=128 Jun 8 15:22:44 master.rutgers.edu slapd[29239]: [ID 600343 local4.debug] conn=361315 op=1 BIND dn="cn=readEverything,dc=rci,dc=rutgers,dc=edu" mech=SIMPLE ssf=0 Jun 8 15:22:44 master.rutgers.edu slapd[29239]: [ID 588225 local4.debug] conn=361315 op=1 RESULT tag=97 err=0 text= Jun 8 15:22:44 master.rutgers.edu slapd[29239]: [ID 469902 local4.debug] conn=361315 op=2 SRCH base="dc=rci,dc=rutgers,dc=edu" scope=2 deref=0 filter="(objectClass=*)" Jun 8 15:22:44 master.rutgers.edu slapd[29239]: [ID 744844 local4.debug] conn=361315 op=2 SRCH attr=* + Jun 8 15:22:44 master.rutgers.edu slapd[29239]: [ID 545199 local4.debug] Entry uid=BYSTANDER,ou=People,dc=rci,dc=rutgers,dc=edu CSN 20060608190309Z#000000#00#000000 matches ctx 20060608190309Z#000000#00#000000 Jun 8 15:22:44 master.rutgers.edu slapd[29239]: [ID 167594 local4.debug] conn=361315 op=2 SEARCH RESULT tag=101 err=0 nentries=0 text= Jun 8 15:22:44 slave5.rutgers.edu slapd[12966]: [ID 496601 local4.debug] do_syncrep2: LDAP_RES_SEARCH_RESULT Jun 8 15:22:44 master.rutgers.edu slapd[29239]: [ID 218904 local4.debug] conn=361315 op=3 UNBIND Jun 8 15:22:44 master.rutgers.edu slapd[29239]: [ID 952275 local4.debug] conn=361315 fd=100 closed and at the end of this, slave5 still has uid=VICTIM, which is undesirable. slave5 is on "stats sync" debug. I think I should see a del_nonpresent in there, no? So I'd assume the master is telling it something wrong. Not sure what else I can do. I have: -d -1 from slave5 (HUGE, running for weeks. I can try to slim it down.) "stats sync" syslogs from all slaves "stats sync" syslogs from the master Everything's still running as it was/is, so if you want to dbx some syncprov struct or something like that, I can do that. Of course it might not be in scope still, but I can always delete more! (Although I'm fairly certain this isn't 100% of the time. I'll look harder. Might be like last time, need a certain number of ops per sync or similar.)
richton@nbcs.rutgers.edu wrote: > Something not quite right with session log or nonpresent list or....... > > Pretty much the same as the last time I played the syncrepl game, except this > time instead of entries missing from the slave, there are excessive entries on > the slave. Yes, I can reproduce this problem as well. At a cursory glance it appears to be a sessionlog issue as the problem disappears if I disable the sessionlog. Can provide additional info if needed.
richton@nbcs.rutgers.edu wrote: > Full_Name: Aaron Richton > Version: 2.3.24 > OS: Solaris 9 > URL: > Submission from: (NULL) (128.6.31.135) > > > Something not quite right with session log or nonpresent list or....... > > Pretty much the same as the last time I played the syncrepl game, except this > time instead of entries missing from the slave, there are excessive entries on > the slave. Here's an example, although it's hardly illuminating: > > [...] > Jun 8 15:18:09 master.rutgers.edu slapd[29239]: [ID 379477 local4.debug] > conn=361133 op=2 DEL dn="uid=VICTIM,ou=People,dc=rci,dc=rutgers,dc=edu" > [...] > Jun 8 15:22:44 master.rutgers.edu slapd[29239]: [ID 848112 local4.debug] > conn=361315 fd=100 ACCEPT from IP=192.168.4.67:53492 (IP=0.0.0.0:389) > Jun 8 15:22:44 master.rutgers.edu slapd[29239]: [ID 560212 local4.debug] > conn=361315 op=0 STARTTLS > Jun 8 15:22:44 master.rutgers.edu slapd[29239]: [ID 875301 local4.debug] > conn=361315 op=0 RESULT oid= err=0 text= > Jun 8 15:22:44 master.rutgers.edu slapd[29239]: [ID 105384 local4.debug] > conn=361315 fd=100 TLS established tls_ssf=256 ssf=256 > Jun 8 15:22:44 master.rutgers.edu slapd[29239]: [ID 215403 local4.debug] > conn=361315 op=1 BIND dn="cn=readEverything,dc=rci,dc=rutgers,dc=edu" > method=128 > Jun 8 15:22:44 master.rutgers.edu slapd[29239]: [ID 600343 local4.debug] > conn=361315 op=1 BIND dn="cn=readEverything,dc=rci,dc=rutgers,dc=edu" > mech=SIMPLE ssf=0 > Jun 8 15:22:44 master.rutgers.edu slapd[29239]: [ID 588225 local4.debug] > conn=361315 op=1 RESULT tag=97 err=0 text= > Jun 8 15:22:44 master.rutgers.edu slapd[29239]: [ID 469902 local4.debug] > conn=361315 op=2 SRCH base="dc=rci,dc=rutgers,dc=edu" scope=2 deref=0 > filter="(objectClass=*)" > Jun 8 15:22:44 master.rutgers.edu slapd[29239]: [ID 744844 local4.debug] > conn=361315 op=2 SRCH attr=* + > Jun 8 15:22:44 master.rutgers.edu slapd[29239]: [ID 545199 local4.debug] Entry > uid=BYSTANDER,ou=People,dc=rci,dc=rutgers,dc=edu CSN > 20060608190309Z#000000#00#000000 matches ctx 20060608190309Z#000000#00#000000 > Jun 8 15:22:44 master.rutgers.edu slapd[29239]: [ID 167594 local4.debug] > conn=361315 op=2 SEARCH RESULT tag=101 err=0 nentries=0 text= > Jun 8 15:22:44 slave5.rutgers.edu slapd[12966]: [ID 496601 local4.debug] > do_syncrep2: LDAP_RES_SEARCH_RESULT > Jun 8 15:22:44 master.rutgers.edu slapd[29239]: [ID 218904 local4.debug] > conn=361315 op=3 UNBIND > Jun 8 15:22:44 master.rutgers.edu slapd[29239]: [ID 952275 local4.debug] > conn=361315 fd=100 closed > > and at the end of this, slave5 still has uid=VICTIM, which is undesirable. > slave5 is on "stats sync" debug. I think I should see a del_nonpresent in there, > no? So I'd assume the master is telling it something wrong. Not sure what else I > can do. I have: > > -d -1 from slave5 (HUGE, running for weeks. I can try to slim it down.) > "stats sync" syslogs from all slaves > "stats sync" syslogs from the master > > Everything's still running as it was/is, so if you want to dbx some syncprov > struct or something like that, I can do that. Of course it might not be in scope > still, but I can always delete more! (Although I'm fairly certain this isn't > 100% of the time. I'll look harder. Might be like last time, need a certain > number of ops per sync or similar.) > > > > > It would probably help to just look at the logs for all of the write operations that occurred since the last refresh, in the refresh pass that is missing the Delete. If re-issuing the same sequence of operations always reproduces the bug, then we'll have something to go on. -- -- Howard Chu Chief Architect, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc OpenLDAP Core Team http://www.openldap.org/project/
pratt@cs.utk.edu wrote: > richton@nbcs.rutgers.edu wrote: > > >> Something not quite right with session log or nonpresent list or....... >> >> Pretty much the same as the last time I played the syncrepl game, except this >> time instead of entries missing from the slave, there are excessive entries on >> the slave. >> > > Yes, I can reproduce this problem as well. At a cursory glance it > appears to be a sessionlog issue as the problem disappears if I disable > the sessionlog. Can provide additional info if needed. > > I've been able to reproduce this problem with an optimized build, setting syncprov-sessionlog 2. In this case, single deletes after the first don't get sent by syncprov. However, with a non-optimized debug build, the problem doesn't occur. I may have a buggy optimizer here (gcc 4.0.2). What compiler versions are you using, and can you try with a non-optimized syncprov overlay? -- -- Howard Chu Chief Architect, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc OpenLDAP Core Team http://www.openldap.org/project/
hyc@symas.com wrote: > pratt@cs.utk.edu wrote: > >> richton@nbcs.rutgers.edu wrote: >> >> >> >>> Something not quite right with session log or nonpresent list or....... >>> >>> Pretty much the same as the last time I played the syncrepl game, except this >>> time instead of entries missing from the slave, there are excessive entries on >>> the slave. >>> >>> >> Yes, I can reproduce this problem as well. At a cursory glance it >> appears to be a sessionlog issue as the problem disappears if I disable >> the sessionlog. Can provide additional info if needed. >> >> >> > I've been able to reproduce this problem with an optimized build, > setting syncprov-sessionlog 2. In this case, single deletes after the > first don't get sent by syncprov. However, with a non-optimized debug > build, the problem doesn't occur. I may have a buggy optimizer here (gcc > 4.0.2). What compiler versions are you using, and can you try with a > non-optimized syncprov overlay? > > This is now fixed in HEAD, slapd/ldapsync.c rev 1.34 -- -- Howard Chu Chief Architect, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc OpenLDAP Core Team http://www.openldap.org/project/
changed notes changed state Open to Test moved from Incoming to Software Bugs
I'm still seeing this. I just noticed the delcsn.bv_val[delcsn.bv_len] = '\0'; in syncprov.c 1.152. Any chance that might be manifested with these symptoms? (I'll put it in production anyway, looks like a good idea.)
richton@nbcs.rutgers.edu wrote: > I'm still seeing this. I just noticed the > > delcsn.bv_val[delcsn.bv_len] = '\0'; > > in syncprov.c 1.152. Any chance that might be manifested with these > symptoms? (I'll put it in production anyway, looks like a good idea.) > > It's a good idea anyway, but the patch to ldapsync.c already addressed any problem this may have caused. -- -- Howard Chu Chief Architect, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc OpenLDAP Core Team http://www.openldap.org/project/
> It's a good idea anyway, but the patch to ldapsync.c already addressed any > problem this may have caused. Hmm, that's not good. Things are very similar to the way MODs were before: an individual delete won't go through; I see a batch of 15 DELs, and ops 2-15 go through. So the first one is either getting chopped off or undone.
OK, I'm still seeing this problem as well with the updated ldapsync.c from HEAD. On Jul 25, 2006, at 6:02 PM, richton@nbcs.rutgers.edu wrote: > I'm still seeing this. I just noticed the > > delcsn.bv_val[delcsn.bv_len] = '\0'; > > in syncprov.c 1.152. Any chance that might be manifested with these > symptoms? (I'll put it in production anyway, looks like a good idea.) > >
changed notes changed state Test to Release
On Wednesday 26 July 2006 19:49, pratt@cs.utk.edu wrote: > OK, I'm still seeing this problem as well with the updated ldapsync.c > from HEAD. > > On Jul 25, 2006, at 6:02 PM, richton@nbcs.rutgers.edu wrote: > > I'm still seeing this. I just noticed the > > > > delcsn.bv_val[delcsn.bv_len] = '\0'; > > > > in syncprov.c 1.152. Any chance that might be manifested with these > > symptoms? (I'll put it in production anyway, looks like a good idea.) I have a master (using glue, with syncprov on the parent and subordinate) with 4 slaves which run with the patch to ldapsync.c, and another 2 slaves which run my previous set of packages (only difference is not having the patch). I sent through a set of approx 400 MODRDNs. The 2 slaves without the patch received the set of MODRDNs, the 4 slaves with the patch did not receive the MODRDNs. I won't have time to look into this further today, but will try and reproduce it tomorrow. -- Buchan Milne ISP Systems Specialist B.Eng,RHCE(803004789010797),LPIC-2(LPI000074592)
changed state Release to Closed
moved from Software Bugs to Archive.Software Bugs
fixed in HEAD/re23