[Date Prev][Date Next] [Chronological] [Thread] [Top]

Re: (ITS#3421) Replication problem with slurpd



Finally I can reproduce the bug (not always, but often). I am not sure
it is exactly the same bug, but it looks like.

What I have :
1 master (IP : 10.69.247.38) and 2 slaves (10.69.247.57 and
10.69.247.39). All are 2.2.19. I can join my slapd.conf files and my
DB_CONFIG if you want.

What I do :
10 threads add 50000 entries to the master :
dn: uid=userX-Y,o=example,c=fr
objectClass: top
objectClass: person
objectClass: organizationalPerson
objectClass: inetOrgPerson
sn: userX-Y
cn: userX-Y
uid: userX-Y

with X from 1 to 10 and Y from 1 to 50000

So I should have 50000 entries in the directories + 2 entries :
dn: o=example,c=fr
objectClass: top
objectClass: organization
o: example

dn: cn=repl,o=example,c=fr
cn: repl
uid: repl
sn: repl
objectClass: top
objectClass: person
objectClass: inetOrgPerson
userPassword: repl

But when I do an ldapsearch on the directories, I get :
master :
[...]
# numResponses: 500003
# numEntries: 500002

slave1 :
[...]
# numResponses: 500001
# numEntries: 500000
 
slave2 :
[...]
# numResponses: 500001
# numEntries: 500000

So it seems that 2 entries are missing. Looking at the .rej files, I have :

ERROR: Invalid syntax: objectClass: value #0 invalid per syntax
replica: 10.69.247.57:10389
time: 1102681517.6
dn: uid=user3-21130,o=example,c=fr
changetype: add
objectClass: top
objectClass: person
objectClass: o

(same thing for the second .rej file)

So it looks like that I have 2 entries missing, but only one in .rej file.
Moreother, this entry is incomplete. Of course it is complete in the
master :
ldapsearch -x  -H ldap://10.69.247.38:10389 -b
uid=user3-21130,o=example,c=fr
# user3-21130, example, fr
dn: uid=user3-21130,o=example,c=fr
objectClass: top
objectClass: person
objectClass: organizationalPerson
objectClass: inetOrgPerson
sn: user3-21130
cn: user3-21130
uid: user3-21130
 
# search result
search: 2
result: 0 Success

Finally when I look at the other entry missing, it is :
uid=user1-19854,o=example,c=fr

As you suggested, I enabled logs for the master and for one slave (level
256). To enable slurpd logs, I needed to edit servers/slurpd/globals.c
and modify the line :
int ldap_syslog = 0;
into :
int ldap_syslog = 1;

If you want I can attach my logs, but it is a 190M file (logs are
centralized on a single server)...
I think these results are pertinent :
#grep slurpd ldap.log
Dec 10 12:37:09 10.69.247.38 slurpd[7881]: Replica 10.69.247.57:10389,
skip repl record for uid=user1-25962,o=example,c=fr (old)
Dec 10 12:37:09 10.69.247.38 slurpd[7882]: Replica 10.69.247.39:10389,
skip repl record for uid=user1-25962,o=example,c=fr (old)
Dec 10 13:25:17 10.69.247.38 slurpd[7882]: Error: ldap_add_s failed
adding "objectClass: value #0 invalid per syntax":
uid=user3-21130,o=example,c=fr
Dec 10 13:25:17 10.69.247.38 slurpd[7882]: Error: ldap operation failed,
data written to
"/ldap/test-bug-replication/var/openldap-slurp/replica/10.69.247.39:10389.rej"
Dec 10 13:25:17 10.69.247.38 slurpd[7881]: Error: ldap_add_s failed
adding "objectClass: value #0 invalid per syntax":
uid=user3-21130,o=example,c=fr
Dec 10 13:25:17 10.69.247.38 slurpd[7881]: Error: ldap operation failed,
data written to
"/ldap/test-bug-replication/var/openldap-slurp/replica/10.69.247.57:10389.rej"

#grep uid=user3-21130 ldap.log
Dec 10 13:25:16 10.69.247.38 slapd[8047]: conn=2 op=21130 ADD
dn="uid=user3-21130,o=example,c=fr"
Dec 10 13:25:17 10.69.247.38 slurpd[7882]: Error: ldap_add_s failed
adding "objectClass: value #0 invalid per syntax":
uid=user3-21130,o=example,c=fr
Dec 10 13:25:17 10.69.247.57 slapd[29521]: conn=0 op=224452 ADD
dn="uid=user3-21130,o=example,c=fr"
Dec 10 13:25:17 10.69.247.38 slurpd[7881]: Error: ldap_add_s failed
adding "objectClass: value #0 invalid per syntax":
uid=user3-21130,o=example,c=fr


#grep 'slapd\[8047\]' ldap.log | grep -A3 uid=user3-21130
Dec 10 13:25:16 10.69.247.38 slapd[8047]: conn=2 op=21130 ADD
dn="uid=user3-21130,o=example,c=fr"
Dec 10 13:25:17 10.69.247.38 slapd[8047]: conn=2 op=21130 RESULT tag=105
err=0 text=
Dec 10 13:25:17 10.69.247.38 slapd[8047]: conn=6 op=24360 ADD
dn="uid=user8-24360,o=example,c=fr"
Dec 10 13:25:17 10.69.247.38 slapd[8047]: conn=6 op=24360 RESULT tag=105
err=0 text=

#grep 'slapd\[29521\]' ldap.log | grep -A3 uid=user3-21130
Dec 10 13:25:17 10.69.247.57 slapd[29521]: conn=0 op=224452 ADD
dn="uid=user3-21130,o=example,c=fr"
Dec 10 13:25:17 10.69.247.57 slapd[29521]: conn=0 op=224452 RESULT
tag=105 err=21 text=objectClass: value #0 invalid per syntax
Dec 10 13:25:20 10.69.247.57 slapd[29521]: conn=0 op=224454 ADD
dn="uid=user4-25121,o=example,c=fr"
Dec 10 13:25:20 10.69.247.57 slapd[29521]: conn=0 op=224454 RESULT
tag=105 err=0 text=


#grep  user1-19854 ldap.log
Dec 10 13:24:34 10.69.247.38 slapd[8042]: conn=9 op=19854 ADD
dn="uid=user1-19854,o=example,c=fr"

#grep 'slapd\[8042\]' ldap.log | grep -A3 uid=user1-19854
Dec 10 13:24:34 10.69.247.38 slapd[8042]: conn=9 op=19854 ADD
dn="uid=user1-19854,o=example,c=fr"
Dec 10 13:24:34 10.69.247.38 slapd[8042]: conn=9 op=19854 RESULT tag=105
err=0 text=
Dec 10 13:24:34 10.69.247.38 slapd[8042]: conn=4 op=25237 ADD
dn="uid=user6-25237,o=example,c=fr"
Dec 10 13:24:34 10.69.247.38 slapd[8042]: conn=4 op=25237 RESULT tag=105
err=0 text=

So what I think :
- the problem is on the master.
- slurpd or slapd have a problem writing replog or slurpd.replog
- maybe it is a lock problem or a flush problem

Hope it helps...

Raphael Ouazana