[Date Prev][Date Next] [Chronological] [Thread] [Top]

(ITS#7007) sync: inconsistent directory when starting with emtpy node



Full_Name: Stephan
Version: 2.4.26
OS: Debian 5.0.8
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (134.100.101.1)


Hi everybody,

I have an odd problem with synchronisation. I have 5 nodes running multi master
synchronisation. I recently updated one of them to 2.4.26 (they were all running
2.4.23 before) and started it with an empty database (the database directory
contained nothing more than the DB_CONFIG file). Afterwards I noted that some
attributes were missing, especially in the root of the directory tree, which
looks like this:

dn: o=test
objectClass: organization
o: test

After the sync was complete, I could search for the o=test entry but couldn't
see it:

ldapsearch  -H "ldaps://filled.test.xyz" -D cn=superman,o=test -W -x -b o=test
o=test

# extended LDIF
#
# LDAPv3
# base <o=test> with scope subtree
# filter: o=test
# requesting: ALL
#

# search result
search: 2
result: 0 Success

# numResponses: 1

I played around a bit and am able to reproduce the problem. I tried combinations
of 2.4.23 and 2.4.26, bdb and hdb backends, bdb versions 4.5.20-13 and 4.6.21
but the problem persists. If I take the output of slapcat on one of the nodes
and feed it to the empty node before its startup everything seems to be ok.

I increased the loglevel and can see the following (rid 1 being the node with
the filled database and rid 2 the one with the empty database):

On the empty node (rid2) I get lots of 'syncrepl_entry: rid=001 inserted UUID
xxx'
'syncprov_matchops: skipping original sid 001' and 'syncrepl_entry: rid=001
be_add'
Once the sync is complete the log on the formerly empty node says:

Aug  1 17:24:55 empty slapd[7796]: do_syncrep2: rid=001 LDAP_RES_INTERMEDIATE -
REFRESH_DELETE
Aug  1 17:24:55 empty slapd[7796]: do_syncrep2: rid=001
cookie=rid=001,sid=001,csn=20110310162449.396663Z#000000#000#000000;20110801135804.526931Z#000000#001#000000;20110614092017.772141Z#000000#003#000000;20110722131433.793594Z#000000#004#000000;20110225131904.168266Z#000000#005#000000
Aug  1 17:24:55 empty slapd[7796]: slap_queue_csn: queing 0x9f901da8
20110801135804.526931Z#000000#001#000000
Aug  1 17:24:55 empty slapd[7796]: syncprov_matchops: skipping original sid 001
Aug  1 17:24:55 empty slapd[7796]: slap_graduate_commit_csn: removing 0x9f91f5f8
20110801135804.526931Z#000000#001#000000
Aug  1 17:24:55 empty slapd[7796]: do_syncrep2: rid=001 cookie=
Aug  1 17:24:55 empty slapd[7796]: syncrepl_entry: rid=001
LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_MODIFY)
Aug  1 17:24:55 empty slapd[7796]: syncrepl_entry: rid=001 be_search (0)
Aug  1 17:24:55 empty slapd[7796]: syncrepl_entry: rid=001 o=test
Aug  1 17:24:55 empty slapd[7796]: syncrepl_entry: rid=001 entry unchanged,
ignored (o=test)
Aug  1 17:24:55 empty slapd[7796]: do_syncrep2: rid=001 LDAP_RES_INTERMEDIATE -
NEW_COOKIE
Aug  1 17:24:55 empty slapd[7796]: do_syncrep2: rid=001 NEW_COOKIE:
rid=001,sid=001,csn=20110801152321.168164Z#000000#000#000000;20110801135804.526931Z#000000#001#000000;20110614092017.772141Z#000000#003#000000;20110722131433.793594Z#000000#004#000000;20110225131904.168266Z#000000#005#000000


On the node from which it took the database (rid 1):

Aug  1 17:24:54 filled slapd[17862]: do_syncrep2: rid=002 LDAP_RES_INTERMEDIATE
- SYNC_ID_SET 
Aug  1 17:24:54 filled last message repeated 7 times
Aug  1 17:24:54 filled slapd[17862]: syncrepl_entry: rid=002
LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_ADD) 
Aug  1 17:24:54 filled slapd[17862]: syncrepl_entry: rid=002 be_search (0) 
Aug  1 17:24:54 filled slapd[17862]: syncrepl_entry: rid=002 o=test 
Aug  1 17:24:54 filled slapd[17862]: syncprov_sendresp: cookie= 
Aug  1 17:24:54 filled slapd[17862]: syncrepl_entry: rid=002 be_modify o=test
(0) 
Aug  1 17:24:54 filled slapd[17862]: syncrepl_entry: rid=002
LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_ADD) 
Aug  1 17:24:54 filled slapd[17862]: dn_callback : entries have identical CSN
sambaDomainName=XYZ,ou=abc,o=test 20110722131010.624524Z#000000#004#000000      
 
Aug  1 17:24:54 filled slapd[17862]: syncrepl_entry: rid=002 be_search (0) 
Aug  1 17:24:54 filled slapd[17862]: syncrepl_entry: rid=002
sambaDomainName=XYZ,ou=abc,o=test          
Aug  1 17:24:54 filled slapd[17862]: syncrepl_entry: rid=002 entry unchanged,
ignored (sambaDomainName=XYZ,ou=abc,o=test)        
Aug  1 17:24:54 filled slapd[17862]: do_syncrep2: rid=002 LDAP_RES_INTERMEDIATE
- REFRESH_PRESENT 
Aug  1 17:24:54 filled slapd[17862]: do_syncrep2: rid=002
cookie=rid=002,csn=20110801152321.168164Z#000000#000#000000 
Aug  1 17:24:54 filled slapd[17862]: slap_queue_csn: queing 0x7eff600022c0
20110801152321.168164Z#000000#000#000000 
Aug  1 17:24:54 filled slapd[17862]: slap_graduate_commit_csn: removing
0x7eff60007040 20110801152321.168164Z#000000#000#000000

It seems to me that the formerly empty node overwrites the entry on the node
from which it took all the entries.

This is what my config looks like:

overlay syncprov
syncprov-checkpoint 100 10
syncprov-sessionlog 100
syncprov-reloadhint TRUE

serverID        1       "ldaps://filled.test.xyz"
serverID        2       "ldaps://empty.test.xyz"

#filled
syncrepl rid=2
        provider=ldaps://filled.test.xyz:636
        type=refreshAndPersist
        searchbase="o=test"
        retry="120 20 300 5 1800 5 3600 +"
        scope=sub
        bindmethod=simple
        binddn="cn=superman,o=test"
        credentials="hello"
        tls_reqcert=never
mirrormode on

And the other node:
overlay syncprov
syncprov-checkpoint 100 10
syncprov-sessionlog 100
syncprov-reloadhint TRUE

serverID        1       "ldaps://filled.test.xyz"
serverID        2       "ldaps://empty.test.xyz"

#filled
syncrepl rid=2
        provider=ldaps://empty.test.xyz:636
        type=refreshAndPersist
        searchbase="o=test"
        retry="120 20 300 5 1800 5 3600 +"
        scope=sub
        bindmethod=simple
        binddn="cn=superman,o=test"
        credentials="hello"
        tls_reqcert=never
mirrormode on

It seems to me that the formerly empty node overwrites the other one after the
sync.

In some cases the entries don't go missing on the source node, they just don't
get replicated to the formerly emtpy node.

Unless I made some terrible mistake in the config this is seems to be a bug
since in the administrator's guide
http://www.openldap.org/doc/admin24/replication.html#N-Way%20Multi-Master%20replication
it says: "Syncrepl is self-synchronizing; you can start with a consumer database
in any state from totally empty to fully synced and it will automatically do the
right thing to achieve and maintain synchronization"

All nodes are configured to use the same ntp source.