
Re: null_callbacks after initial sync



Chris,

Well, it looks like I should have read that doc a little closer. ;)
Still, though I now have two syncrepl blocks copied directly from that
doc, I'm seeing odd behavior that I hope this list can clear up.

The gist of it is, replication only works when I write to one of the
two servers (server 1 in the tests below).  I hope I'm understanding
things correctly when I assume that in a multi-master mirror, each
system should be able to take writes, though OpenLDAP cannot reconcile
concurrent writes to both for the same entry (hence the warning about
writing to one at a time in the admin doc).  If I'm wrong here, well, I
have an obvious flaw in my logic.

Just for reference, I'm trying to set up a mirror behind a pair of load
balancers.  Syncrepl should take place on the back-end though.  Based on
my interpretation, I want to weight one real server so it is in effect
the 'master' in a hot standby configuration.  I wanted a mirror to
mitigate fail-back issues.

S1 ----\          *** 'Primary' server via weighted LB interface.
 |      LB(s)---pub
S2 ----/

A simple rundown of one of my tests goes as follows:

start server 1
start server 2 and verify in logs attempts to do_syncrepl
modify entry on one system and verify synchronization to the other.

This is what I see:

start server 1 then 2 and modify 1 - this works
start server 2 then 1 and modify 1 - this works
start server 1 then 2 and modify 2 - this fails*
start server 2 then 1 and modify 2 - this fails

* By fails, I mean that attributes on the modified entry are not in
sync, though there are no 'retry' attempts as if both believe they have
the most current contextCSN.   Again, only the first (of three)
modifications is propagated.

Another thing to note, I only see this behavior if I do not use the
accesslog overlay.   Ergo:

syncrepl      rid=001
                     provider=ldap://server01-ldap.mgt.example.com
                     bindmethod=simple
                     binddn="uid=syncrepl,ou=ldap,dc=example,dc=com"
                     credentials=secret
                     searchbase="dc=example,dc=com"
                     schemachecking=on
                     type=refreshAndPersist
                     retry="10 +"

syncrepl      rid=002
                     provider=ldap://server02-ldap.mgt.example.com
                     bindmethod=simple
                     binddn="uid=syncrepl,ou=ldap,dc=example,dc=com"
                     credentials=secret
                     searchbase="dc=example,dc=com"
                     schemachecking=on
                     type=refreshAndPersist
                     retry="10 +"

mirrormode on
serverID 1
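
For comparison, a sketch of how the tail of the same fragment should
look on server 2.  This assumes both syncrepl blocks are carried over
unchanged and that serverID is the only line that differs; 2 is the SID
I have been using for that host.

```
# server 2 (assumed): rid=001 and rid=002 syncrepl blocks identical to the above
mirrormode on
serverID 2
```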


If I add logbase and syncfilter, I see the behavior where any serverID
other than 1 (on either system) results in the looping mentioned
previously when one server is down.  In that setup with both servers up,
I still see the null_callback issues.

At this point I'm pretty frayed.  I don't see how copying something
directly out of the admin doc would fail unless I've tickled a new bug.

Perhaps someone could suggest a known good, working config for me to
try?  I would prefer delta-syncrepl in a mirror, but I'd settle for just
syncrepl with a mirror at this point.  I started this project with the
intention of updating our older slurpd-based ldap server 'cluster.'

Again, thanks for any help.

-Nick

Chris G. Sellers wrote:
> Nick,
>
> My hunch was since you are in N-Way MultiMaster mode, it should work
> both ways, and I think that nailed where the problem is
>
> You need two sections with syncrepl.  One for the replication from A
> to B, and one for the replication from B to A.  Both servers should
> have both RIDs
>
> See the MirrorMode and N-Way documentation on the Admin24
> documentation.  You will see you need both servers to have both RID
> lines, just be sure the serverID is unique.
>
>
> http://www.openldap.org/doc/admin24/replication.html#N-Way%20Multi-Master
>
> 16.5.3.1 is a good reference.
>
> Enjoy
>
> Sellers
> (p.s.  I'm not sure why the first change replicated though - if I
> understand correctly, it would have all failed in the one direction)
>
>
> On Mar 5, 2008, at 11:34 AM, Nick Geron wrote:
>> Thanks for the reply, Chris.  That hadn't occurred to me.  It seems that
>> yes, when starting server 2 first and writing to it, replication appears
>> to work fine.  That got me thinking perhaps there was an issue with the
>> build or environment on the first server, but I can't find any
>> meaningful discrepancy between the systems.
>>
>> Something I'm curious about though, is the behavior when the serverID is
>> set incorrectly.  It's my understanding that in mirror mode, the pair
>> are configured to use the same RID with different SIDs.  I have been
>> working with RID 1 and SIDs 1 and 2 for server 1 and 2 respectively.  An
>> earlier test where I had neglected to change the SID, (rid 1 with two
>> sid 1) deluged my logs with what looked to be a loop; Gigs of
>> do_syncrep2 Content Sync Refresh Required and subsequent attempts to
>> sync.
>>
>> I understand that the config is the source of the problem there.
>> However, thinking that the success of this morning's test with server 2
>> as the 'primary,' I wondered if there was some confusion when using rid
>> 1 and sid 1, so I changed server 1 to sid 3.  What I don't understand is
>> that I see the same log entries/behavior with rid 1, sid 3 and sid 2 as
>> I did if both systems were set with sid 1.  Can anyone explain that
>> behavior?  And of course, Chris, what's your hunch with suspecting that
>> things might work from the second server?
>>
>> Any input is very much appreciated as my project cannot continue without
>> a functional mirror.  Thanks!
>>
>> -Nick
>>
>> Chris G. Sellers wrote:
>>> Do you have different behavior if you make the update to ldap server 2
>>> and it tries to replicate to ldap server 1?
>>>
>>> Sellers
>>> On Mar 4, 2008, at 12:20 PM, Nick Geron wrote:
>>>> I'm working with a test mirror mode setup on 2.4.7 with db 4.5.20 and
>>>> seeing issues with SyncRepl.  Specifically, do_syncrepl fails with an
>>>> initial error 0x10 and subsequent 0x14, though at least one modification
>>>> is propagated.  To put it another way:
>>>>
>>>> *Systems using same ldif to populate and running nearly identical
>>>> slapd.conf files (serverID is the only variance).
>>>>
>>>> 1) start server 1
>>>> 2) start server 2
>>>> 3) add host attribute to posixAccount entry on s-1
>>>> 4) attribute seen on s-2 but results in the following log
>>>> 5) no other updates successful until server process restarted.
>>>>
>>>> Mar  4 10:46:14 slapd[22999]: do_syncrep2: rid=001 LDAP_RES_INTERMEDIATE
>>>
>>> ______________________________________________
>>> Chris G. Sellers | NITLE  - Technology Team
>>> 734.661.2318 | chris.sellers@nitle.org
>>> <mailto:chris.sellers@nitle.org>
>>> AIM: imthewherd | GoogleTalk: cgseller@gmail.com
>>> <mailto:cgseller@gmail.com>
>>>
>
> ______________________________________________
> Chris G. Sellers | NITLE  - Technology Team
> 734.661.2318 | chris.sellers@nitle.org <mailto:chris.sellers@nitle.org>
> AIM: imthewherd | GoogleTalk: cgseller@gmail.com
> <mailto:cgseller@gmail.com>
>