Re: syncRepl master loses syncrepl entry (ITS#2928)
--On Thursday, January 22, 2004 11:34 AM -0500 Jong <jongchoi@OpenLDAP.org>
wrote:
>
>
>> I found that after repeated use and lockups of the slaves (which
>> required them
>
> Is this the same one as reported in ITS#2910 ?
> Is the refreshAndPersist mode used ?
> and the size of the replication content ?
This locking issue is the same as ITS#2910, yes. I see it is noted as
fixed; if someone can point me to the files changed, I would be happy to
pull down the patch and test that it works.
I am using refreshAndPersist.
The replication content size was fairly small (about 5 new entries between
the master and the replicas).
>
>> to be killed, and db_recover run on their systems), that the master
>> eventually lost its syncrepl entry (????), making it so that it was
>> impossible for them to
>
> From the above context, isn't it the slave (consumer) who is supposed to
> lose its syncrepl entry ?
Yeah, I think I searched on the wrong thing -- I looked for *repl in the
database dump (not *sync).
>
>> query the master for further updates, and resulting in the following
>> errors in the ldap log on the replica:
>>
>> Jan 21 21:36:19 ldap-dev1.Stanford.EDU slapd[13222]: [ID 100111
>> local4.debug] slapd starting
>> Jan 21 21:36:20 ldap-dev1.Stanford.EDU slapd[13222]: [ID 166296
>> local4.debug] null_callback : error code 0x42
>> Jan 21 21:36:20 ldap-dev1.Stanford.EDU slapd[13222]: [ID 749340
>> local4.debug] syncrepl_entry : be_search failed (66)
>> Jan 21 21:36:23 ldap-dev1.Stanford.EDU slapd[13222]: [ID 166296
>> local4.debug] null_callback : error code 0x42
>> Jan 21 21:36:23 ldap-dev1.Stanford.EDU slapd[13222]: [ID 749340
>> local4.debug] syncrepl_entry : be_search failed (66)
>> Jan 21 21:36:23 ldap-dev1.Stanford.EDU slapd[13222]: [ID 166296
>> local4.debug] null_callback : error code 0x42
>> Jan 21 21:36:23 ldap-dev1.Stanford.EDU slapd[13222]: [ID 749340
>> local4.debug] syncrepl_entry : be_search failed (66)
>> Jan 21 21:36:23 ldap-dev1.Stanford.EDU slapd[13222]: [ID 166296
>> local4.debug] null_callback : error code 0x42
>> Jan 21 21:36:23 ldap-dev1.Stanford.EDU slapd[13222]: [ID 749340
>> local4.debug] syncrepl_entry : be_search failed (66)
>
> This sequence of error revealed a need to update the present mode
> processing routine to make it delete the non-present entries from the
> leaf entries first. A direct approach would be to remove the leaf entries
> first by examining the hasSubordinates operational attributes. The
> disadvantage of this approach is that it is required to repeat the
> non-present entry deletion process when all to-be-deleted entries become
> leaf entries.
> Another approach would be to sort the entries in the createTimestamp
> order. Because server side sorting is not supported, the syncrepl engine
> needs to make si_nonpresentlist be sorted in the createTimestamp order.
> I'm more inclined to the latter. Any comments or other possibilities ?
Okay, I think this is the real problem then -- Basically, after the lockup
of syncRepl on the slaves, I've had to do a db_recover. Whatever changes
db_recover dropped from the DB probably caused this scenario. I think the
second solution sounds a little better, since it avoids repeating the
non-present entry deletion pass.
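
To make the second approach concrete, here is a rough sketch in Python
(not the slapd code). ToyDirectory and delete_nonpresent are hypothetical
names invented for illustration. The sketch assumes every child has a
strictly later createTimestamp than its parent, which may not hold after
a rename or when entries are created within the same second. The 0x42 in
the log is decimal 66, the LDAP notAllowedOnNonLeaf result code.

```python
from datetime import datetime

# LDAP result code notAllowedOnNonLeaf: 0x42 hex == 66 decimal.
LDAP_NOT_ALLOWED_ON_NONLEAF = 0x42


class ToyDirectory:
    """Hypothetical in-memory stand-in for the consumer database."""

    def __init__(self, entries):
        # entries: dict mapping DN -> createTimestamp (datetime)
        self.entries = dict(entries)

    def has_subordinates(self, dn):
        # An entry has subordinates if any other DN ends with ",<dn>".
        suffix = "," + dn
        return any(other.endswith(suffix) for other in self.entries)

    def delete(self, dn):
        """Return 0 on success, 66 if the entry still has children."""
        if self.has_subordinates(dn):
            return LDAP_NOT_ALLOWED_ON_NONLEAF
        del self.entries[dn]
        return 0


def delete_nonpresent(directory, nonpresent):
    """Delete newest entries first, so children go before their parents.

    Assumption: a child's createTimestamp is later than its parent's.
    """
    ordered = sorted(
        nonpresent,
        key=lambda dn: directory.entries[dn],
        reverse=True,
    )
    return [(dn, directory.delete(dn)) for dn in ordered]
```

With a parent, child, and grandchild in the non-present list, deleting
the parent first returns 66, while the sorted order removes all three in
a single pass with no retries.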
--Quanah
--
Quanah Gibson-Mount
Principal Software Developer
ITSS/TSS/Computing Systems
ITSS/TSS/Infrastructure Operations
Stanford University
GnuPG Public Key: http://www.stanford.edu/~quanah/pgp.html