Re: sync replication questions
--On Wednesday, November 04, 2009 3:03 PM -0500 Edward Capriolo
An event occurred here, and I would like to understand how openldap handled
it and why.
We have two openldap nodes doing sync replication (configuration is
below). One node locked up this weekend: no response to ping, etc. This
in itself is a problem (since this server is not under very high load);
however, I do not have any diagnostics on this, so I will move on.
ldap1 crashed. It stayed off for a good part of the weekend. When we
returned to work, ldap1 was rebooted and openldap was started.
At this point changes made to ldap2 were propagating to ldap1. However
changes to ldap1 were not replicating to ldap2.
I restarted ldap1 with only replication debug on, and I saw this (don't
mind the header up front; we are using daemontools):
[root@nyldap1 ~]# tail -f /service/openldap/log/main/current
@400000004af1d1582b3c9c74 bdb_db_open: warning - no DB_CONFIG file found in directory /usr/local/openldap/var/openldap-data: (2).
@400000004af1d1582b3caffc Expect poor performance for suffix "o=ec,c=US".
@400000004af1d1582b88090c bdb_monitor_db_open: monitoring disabled; configure monitor database to enable
@400000004af1d1582b8bfcc4 slapd starting
URI=ldap://nyldap1.ops.ec.com DN="cn=root,o=ec,c=us" ldap_sasl_bind_s
@400000004af1d15a00006f54 do_syncrepl: rid=003 rc -5 retrying (4 retries left)
@400000004af1d15a00022ca4 do_syncrep2: rid=004 LDAP_RES_INTERMEDIATE - REFRESH_DELETE
@400000004af1d15a2aac52f4 TLS: can't accept: (null).
@400000004af1d15f2ab0f674 TLS: can't accept: (null).
@400000004af1d15f2ab8f16c do_syncrep2: rid=003 LDAP_RES_INTERMEDIATE -
@400000004af1d1642aafae54 TLS: can't accept: (null).
@400000004af1d1692ab2fa14 TLS: can't accept: (null).
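
For anyone reading along, the "@4000..." header on each log line is a TAI64N label written by multilog. A minimal Python sketch for turning it into a readable UTC time follows. The function names are made up for illustration, and the fixed 10-second offset is an assumption about how daemontools stamps its labels, not something stated in this thread.

```python
from datetime import datetime, timedelta, timezone

TAI64_EPOCH = 2 ** 62      # TAI64 label value corresponding to the 1970 epoch
DAEMONTOOLS_OFFSET = 10    # assumption: multilog adds a fixed 10 s to the system clock


def decode_tai64n(label):
    """Convert a TAI64N label such as '@400000004af1d1582b8bfcc4' to a UTC datetime."""
    hexpart = label.lstrip("@")
    if len(hexpart) != 24:
        raise ValueError("expected 24 hex digits after '@'")
    # First 16 hex digits: seconds; last 8 hex digits: nanoseconds.
    seconds = int(hexpart[:16], 16) - TAI64_EPOCH - DAEMONTOOLS_OFFSET
    nanos = int(hexpart[16:], 16)
    stamp = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return stamp + timedelta(microseconds=nanos // 1000)


def split_log_line(line):
    """Split one multilog line into (UTC datetime, message text)."""
    label, _, message = line.partition(" ")
    return decode_tai64n(label), message


if __name__ == "__main__":
    when, message = split_log_line("@400000004af1d1582b8bfcc4 slapd starting")
    print(when.isoformat(), message)
```

The tai64nlocal program shipped with daemontools does the same job from the command line, converting to local time instead of UTC.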
Changes from nyldap1 were still not propagating to nyldap2.
Then I restarted nyldap2. Replication was again working in both directions.
I based my setup on these notes:
retry="5 5 300 5"
syncprov-checkpoint 100 10
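
As a side note on the retry value above: per slapd.conf(5), retry is a list of "interval count" pairs, and a count of "+" means retry indefinitely. The sketch below (helper names are hypothetical) works out how long a consumer keeps retrying before it stops. Whether this is what actually happened on ldap2 is only a guess; nothing in this thread confirms it.

```python
def parse_retry(spec):
    """Parse a syncrepl retry value into (interval_seconds, count) pairs.

    A count of None stands for '+', i.e. retry forever.
    """
    fields = spec.split()
    if not fields or len(fields) % 2:
        raise ValueError("retry needs one or more '<interval> <count>' pairs")
    pairs = []
    for interval, count in zip(fields[0::2], fields[1::2]):
        pairs.append((int(interval), None if count == "+" else int(count)))
    return pairs


def total_retry_window(pairs):
    """Return the seconds until the consumer stops retrying, or None if it never stops."""
    total = 0
    for interval, count in pairs:
        if count is None:
            return None
        total += interval * count
    return total


if __name__ == "__main__":
    for spec in ('5 5 300 5', '5 5 300 +'):
        print(spec, "->", total_retry_window(parse_retry(spec)))
```

With "5 5 300 5" the window works out to 5*5 + 300*5 seconds, a little over 25 minutes, which is much shorter than a weekend outage. Ending the list with "+" keeps the consumer retrying until the provider comes back.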
Our database is very low on write/update activity. I think I
understand that syncprov-checkpoint timed out. I am going to change it to:
syncprov-checkpoint 10000 9000
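
For reference, slapo-syncprov(5) describes syncprov-checkpoint <ops> <minutes> as a rule for when the contextCSN is written to the database, and it is only evaluated after a successful write. It is not described as a replication timeout. A rough Python sketch of that rule follows; the function is hypothetical, and the exact boundary comparison (>= versus >) is an assumption.

```python
def should_checkpoint(ops_since_last, minutes_since_last, ops_limit, minutes_limit):
    """Decide, after a successful write, whether to persist the contextCSN.

    Models 'syncprov-checkpoint <ops> <minutes>': checkpoint once either
    threshold has been reached since the previous checkpoint.
    """
    return ops_since_last >= ops_limit or minutes_since_last >= minutes_limit


if __name__ == "__main__":
    # One write after a quiet hour, under both settings discussed in this thread.
    print(should_checkpoint(1, 60, 100, 10))       # syncprov-checkpoint 100 10
    print(should_checkpoint(1, 60, 10000, 9000))   # syncprov-checkpoint 10000 9000
```

On a low-write directory, the larger values mostly mean the contextCSN is written to disk less often.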
As I said, our database is very low write: we add users periodically
and people's passwords expire; that is all the writes/updates that
My openldap is 2.4.16, built from source.
Latest stable is 2.4.19. There have been replication fixes since 2.4.16.
So, the important questions:
1) Why did two-way replication not restart? Is this the correct
behavior for my configuration, or a bug? Can I configure openldap to
always start refresh and persist?
Sounds like a bug. But again, there have been fixes since the release you
are running.
2) Could some entries be out of sync now? If yes
2a) Can I use a tool to confirm these systems are in sync? "slapcat &
diff" (or is there a better option)?
I'd suggest doing such a diff, yes.
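
A plain diff of two slapcat dumps can be noisy, because entries and attributes may come out in a different order on each server. One way to do the comparison is sketched below in Python, assuming each server's slapcat output has been saved to a file. The function names and the ignore parameter are made up for illustration, and base64-encoded values are compared as raw text rather than decoded.

```python
def parse_ldif(text, ignore=()):
    """Return {dn: sorted list of 'attr: value' lines} for each entry in LDIF text."""
    skip = {name.lower() for name in ignore}
    # Unfold continuation lines: a line starting with one space continues the previous line.
    lines = []
    for raw in text.splitlines():
        if raw.startswith(" ") and lines:
            lines[-1] += raw[1:]
        else:
            lines.append(raw)
    entries = {}
    current = None
    for line in lines:
        if not line.strip():
            current = None              # a blank line ends the current entry
        elif line.startswith("#"):
            continue                    # LDIF comment
        elif current is None:
            if line.lower().startswith("dn:"):
                current = line[3:].strip()
                entries[current] = []
        else:
            attr = line.split(":", 1)[0].lower()
            if attr not in skip:
                entries[current].append(line)
    return {dn: sorted(values) for dn, values in entries.items()}


def diff_dumps(left_text, right_text, ignore=()):
    """Return (DNs only on the left, DNs only on the right, DNs whose attributes differ)."""
    left = parse_ldif(left_text, ignore)
    right = parse_ldif(right_text, ignore)
    only_left = sorted(set(left) - set(right))
    only_right = sorted(set(right) - set(left))
    changed = sorted(dn for dn in set(left) & set(right) if left[dn] != right[dn])
    return only_left, only_right, changed
```

To use it, run slapcat with -l on each server to write a dump file, read both files, and pass their contents to diff_dumps. Attributes that are expected to differ between the two servers can be listed in ignore; which ones those are depends on the setup.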
2b) If one side is out of sync, can I force one side to replicate over the
other?
man slapd, look at the -c option.
Principal Software Engineer
Zimbra :: the leader in open source messaging and collaboration