
Re: (ITS#3939) min/max function extension to LDAP protocol



At 12:09 AM 8/20/2005, quanah@stanford.edu wrote:


>--On Friday, August 19, 2005 11:41 PM -0700 "Kurt D. Zeilenga" 
><Kurt@OpenLDAP.org> wrote:
>
>> At 11:22 PM 8/19/2005, Quanah Gibson-Mount wrote:
>>
>>
>>> --On Friday, August 19, 2005 11:11 PM -0700 "Kurt D. Zeilenga"
>>> <Kurt@OpenLDAP.org> wrote:
>>>
>>>> Well, what I noted was that there was an existing protocol
>>>> mechanism to request the return of the entry with the
>>>> lowest/highest CSN.
>>>>
>>>> I pretty much ignored most of the rest of your post as it didn't
>>>> make much sense to me at the time (and still doesn't).
>>>
>>> Okay, maybe this will explain it more:
>>>
>>> When the consumer connects to the provider, it tries to determine if it
>>> needs to synch any data or not.
>>
>> It seems here that you intend 'it' to refer to the consumer.
>> That's incorrect.  When using LDAP sync, it is the provider
>> which determines what data needs to be sent to sync the consumer.
>> The provider does so by providing the cookie.
>
>
>No, it is the provider.

The provider doesn't need a protocol extension to talk to
itself.  As I noted, an ordering index will do.

>The provider is trying to determine if the cookie 
>sent to it is still valid or if changes need to be sent to the consumer. 
>When the consumer's cookie is different from the provider's, it does an 
>internal search to determine if that cookie is still valid for its 
>database.  This is extremely expensive.
>
>
>For example, if the provider has a cookie of:
>
>20050818185717Z#000001#00#000000
>
>and the consumer has a cookie of:
>
>20050818185716Z#000001#00#000000
>
>The provider is going to run the following search:
>
>entrycsn <= 20050818185716Z#000001#00#000000
>
>to determine if there is one or more entries with a matching CSN.  If there 
>isn't, the provider refreshes the consumer with all data.  If there is, it 
>then only does an update of changes from the time the consumer last 
>received data.

The provider will likely be forced to use updates+present mode
here, as the fact that it found matching entries says nothing
about how many deleted entries it didn't find.
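
To make that concrete, here is a minimal Python sketch.  It is a
toy model, not slapd code.  It assumes CSNs are fixed-width
strings that order correctly under plain string comparison, and
the names `entries` and `has_entry_at_or_before` are made up for
illustration.

```python
# Toy model of a provider database: DN -> entryCSN.
# Assumption: CSNs are fixed-width strings, so string comparison orders them.
entries = {
    "cn=a,dc=example,dc=com": "20050818185710Z#000001#00#000000",
    "cn=b,dc=example,dc=com": "20050818185716Z#000001#00#000000",
    "cn=c,dc=example,dc=com": "20050818185717Z#000001#00#000000",
}


def has_entry_at_or_before(db, cookie):
    """Stand-in for the (entryCSN<=cookie) check: does any entry match?"""
    return any(csn <= cookie for csn in db.values())


consumer_cookie = "20050818185716Z#000001#00#000000"

# An entry is deleted after the consumer last synced.
# In this model nothing of it remains in the database.
deleted_dn = "cn=a,dc=example,dc=com"
del entries[deleted_dn]

# The check still succeeds, because cn=b matches the cookie ...
found = has_entry_at_or_before(entries, consumer_cookie)

# ... but no surviving entry carries any record of the delete.
delete_visible = deleted_dn in entries

print(found, delete_visible)
```

The check passes, yet the consumer still holds cn=a and nothing
in the surviving entries tells the provider to remove it.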

>The problem is the <= search done by the provider.  That needs to be 
>avoided.  If it was simply able to look at the lowest value CSN in its 
>database, it would immediately be able to tell whether or not the consumer 
>needs only a partial update, or a full update, and skip the entire <= 
>search that can generate an enormous number of results altogether.

This seems a bit of an oversimplification.  CSNs of existing
entries say nothing about CSNs of deleted entries.  Those,
as well as entries that have moved through scope of the
search, are the nasty ones.   (I note that subtree rename
introduces other complications.)

I note that in most cases, a full update is not needed.
But often an updates+present update is needed.   That is,
an updates+deletes update cannot be generated.  I think we
need to avoid terms like "partial update" as that's ambiguous
as to which refresh mode is actually needed and/or used.

Long ago, I talked about tracking the CSN of the last
delete or rename operation so that the provider can
more easily determine which mode to use.  We might want
to revisit this.
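
One way such tracking might be used, sketched in Python.  This
is only an illustration under assumptions, not a description of
how slapd behaves: the function `choose_refresh_mode` and its
parameter `last_delete_or_rename_csn` are hypothetical, CSNs are
assumed to be fixed-width strings that order under string
comparison, and the provider is assumed to keep no log of which
entries were deleted.

```python
# Assumption: CSNs are fixed-width strings, so string comparison orders them.
def choose_refresh_mode(consumer_cookie, last_delete_or_rename_csn):
    """Pick a refresh mode from the consumer's cookie and one tracked CSN.

    last_delete_or_rename_csn is the CSN of the most recent delete or
    rename seen by the provider, or None if none has been recorded.
    """
    if (last_delete_or_rename_csn is None
            or last_delete_or_rename_csn <= consumer_cookie):
        # Nothing was deleted or renamed after the consumer's cookie, so
        # the set of deletes to report is empty and changed entries suffice.
        return "updates+deletes"
    # Something was deleted or renamed after the cookie, and this model
    # cannot say what, so the consumer must be told which entries remain.
    return "updates+present"


last_removal = "20050818185717Z#000001#00#000000"
stale = choose_refresh_mode("20050818185716Z#000001#00#000000", last_removal)
current = choose_refresh_mode("20050818185717Z#000001#00#000000", last_removal)
print(stale, current)
```

The point of the design is that a single stored CSN replaces the
search: the provider compares two values instead of scanning
entries.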


>--Quanah
>
>
>--
>Quanah Gibson-Mount
>Principal Software Developer
>ITSS/Shared Services
>Stanford University
>GnuPG Public Key: http://www.stanford.edu/~quanah/pgp.html
>
>"These censorship operations against schools and libraries are stronger
>than ever in the present religio-political climate. They often focus on
>fantasy and sf books, which foster that deadly enemy to bigotry and blind
>faith, the imagination." -- Ursula K. Le Guin