Re: replication factored out of slapd
I've narrowed down the issues; see below.
> On Thu, 2005-12-15 at 10:38 +0100, Pierangelo Masarati wrote:
>> I think it's a great idea; it would also solve the issue with syncrepl
>> that we can't use it when the master is behind a firewall that doesn't
>> allow LDAP connections inwards.
>> I'm setting up a 3 slapd test (test045) that checks this. I note that
>> it works just fine with the "consumer" overlay on the back-ldap and the
>> "slurprov" overlay on the slave...
> Now I think I was a bit too enthusiastic. As a baseline, it works, but
> it's not resilient:
> 1) internal operations do not set o_version, which is 0 and results in
> back-ldap using LDAPv2+
Now back-ldap sets the version to LDAPv3 when o_protocol is 0, assuming
it's an internal operation (should this be configurable?)
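As a rough illustration of that fallback (a sketch in Python, not the actual back-ldap C code; the function name and the `default_version` parameter are hypothetical, the latter standing in for the "configurable" option asked about above):

```python
LDAP_VERSION3 = 3


def effective_protocol(o_protocol, default_version=LDAP_VERSION3):
    """Pick the protocol version for a proxied operation.

    A value of 0 is treated as "unset", which is what internal
    operations carry; in that case fall back to default_version.
    """
    if o_protocol == 0:
        return default_version
    return o_protocol


# An internal operation (version unset) is sent as LDAPv3;
# an operation that already carries a version keeps it.
print(effective_protocol(0))
print(effective_protocol(2))
```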
> 2) when the replica is down, syncrepl operations via back-ldap fail;
> these failures seem to be handled incorrectly, since no replication
> occurs even after the replica comes up; only when the proxy is restarted
> replication starts again, but involving only new modifications; those in
> between are "lost"; I guess some full refresh should occur in case
> syncrepl recovers from a transient internal malfunction.
It had nothing to do with server down/up.
> 3) at startup, I see scary logs like
> conn=1 fd=11 ACCEPT from IP=127.0.0.1:35943 (IP=127.0.0.1:9012)
> conn=1 op=0 BIND dn="" method=128
> conn=1 op=0 RESULT tag=97 err=0 text=
> conn=1 op=1 SRCH base="dc=example,dc=com" scope=0 deref=0
> conn=1 op=1 SRCH attr=contextCSN
> conn=1 op=1 SEARCH RESULT tag=101 err=32 nentries=0 text=
> conn=2 fd=12 ACCEPT from IP=127.0.0.1:35945 (IP=127.0.0.1:9012)
> conn=2 op=0 BIND dn="" method=128
> conn=2 op=0 RESULT tag=97 err=0 text=
> conn=2 op=1 SRCH base="dc=example,dc=com" scope=0 deref=0
> conn=2 op=1 SRCH attr=contextCSN
> conn=2 op=1 SEARCH RESULT tag=101 err=32 nentries=0 text=
> conn=2 op=2 DISCONNECT tag=120 err=2 text=unexpected data in PDU
> do_search: get_ctrls failed
> conn=2 fd=12 closed (operations error)
> conn=3 fd=12 ACCEPT from IP=127.0.0.1:35946 (IP=127.0.0.1:9012)
> conn=3 op=0 BIND dn="cn=Replica,dc=example,dc=com" method=128
> conn=3 op=0 BIND dn="cn=Replica,dc=example,dc=com" mech=SIMPLE ssf=0
> conn=3 op=0 RESULT tag=97 err=0 text=
> conn=3 op=1 DISCONNECT tag=120 err=2 text=unexpected data in PDU
> do_search: get_ctrls failed
> conn=3 fd=12 closed (operations error)
> on the replica; the first conn is just fine: the replica is empty; I
> don't quite understand the second: why re-run that search if the former
> gave noSuchObject? And the error is even more puzzling.
This was related to the erroneous value of ors_filterstr in
syncrepl_entry(), which is now fixed.
Then I had another issue: the internal search asks for "*,+" regardless of
the attrlist that's given in the syncrepl config. I'm not sure this is
correct; in fact, using back-ldap, this was collecting "hasSubordinates",
while the provider was not sending it, essentially because it's generated.
To work around this, I've used the "attrs" requested in the syncrepl
line for the internal search as well. In this case, however, if the
syncrepl configuration is modified and some attributes are removed
from the list, they would no longer be deleted when syncing. Perhaps a better
solution would be to force syncprov to generate the run-time attrs by
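
The trade-off between the two internal search strategies can be modelled with a toy sync routine. This is only a sketch under simplifying assumptions: entries are plain dicts, `sync_entry` is a hypothetical helper and not the logic of syncrepl_entry(), and the attribute values are made up.

```python
def sync_entry(local, provider, attrs=None):
    """Bring a copy of `local` in line with `provider`.

    attrs=None models the "*,+" internal search: every attribute found
    locally is compared, so anything the provider did not send is dropped.
    Otherwise only the attributes listed in `attrs` are compared.
    """
    result = dict(local)
    if attrs is None:
        considered = set(local) | set(provider)
    else:
        considered = set(attrs)
    for name in considered:
        if name in provider:
            result[name] = provider[name]
        else:
            result.pop(name, None)
    return result


# The consumer holds "description" from an earlier sync; the attribute list
# has since been reduced to "cn", so the provider only sends "cn".
local = {"cn": ["All Staff"], "description": ["old value"]}
provider = {"cn": ["All Staff"]}

print(sync_entry(local, provider))                # "description" is dropped
print(sync_entry(local, provider, attrs=["cn"]))  # "description" lingers
```

In this model the "*,+" variant also drops anything the local backend adds on its own (as "hasSubordinates" was here), while the restricted variant leaves stale attributes behind once they fall out of the configured list.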
Finally, I have a remaining issue: it appears that modifications consisting
of the full delete of an attr don't get replicated. For example, I have
# Full delete of "description"
dn: cn=All Staff,ou=Groups,dc=example,dc=com
# Partial delete of "cn": the distinguished value remains
dn: cn=Bjorn Jensen,ou=Information Technology Division,ou=People,
cn: Biiff Jensen
The second one is replicated; the first isn't. I'm investigating... For
now I'll remove this type of change and only issue a warning, so that
the rest of the test can be run for regression.
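
For reference, the difference between the two kinds of delete can be sketched on a dict-based entry. This is a minimal model of the LDAP modify "delete" semantics, not slapd code: `apply_delete` is a hypothetical helper, the "description" value is invented for the example, and unlike a real server it silently ignores values that are not present.

```python
def apply_delete(entry, attr, values=None):
    """Return a copy of `entry` with a 'delete' modification applied.

    With no values, the whole attribute is removed (full delete).
    With values, only those values are removed (partial delete); the
    attribute disappears only if nothing is left.
    """
    result = {name: list(vals) for name, vals in entry.items()}
    if not values:
        result.pop(attr, None)
        return result
    remaining = [v for v in result.get(attr, []) if v not in values]
    if remaining:
        result[attr] = remaining
    else:
        result.pop(attr, None)
    return result


group = {"cn": ["All Staff"], "description": ["example value"]}
person = {"cn": ["Bjorn Jensen", "Biiff Jensen"]}

print(apply_delete(group, "description"))           # full delete
print(apply_delete(person, "cn", ["Biiff Jensen"])) # partial delete
```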
Ing. Pierangelo Masarati
Responsabile Open Solution
Via Dossi, 8 - 27100 Pavia - ITALIA