Full_Name: Quanah Gibson-Mount
Version: HEAD
OS: Solaris 8
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (171.66.182.82)

It would be particularly useful to have a way to query for the max or min value of a given attribute from a server. In particular, this would help sync replication by allowing a client to quickly determine whether or not its context CSN is still valid. This is important, because if a provider shuts down after setting its context CSN to an entry that has been deleted, or has been modified twice, since the consumer's contextCSN, the consumer has to run a very exhaustive search to figure out whether or not its contextCSN is still valid. Having the min function would immediately let it know whether or not its contextCSN was still valid.

The max one could be used for things like max uid in db, etc.
LDAP already provides a mechanism for performing this function, by use of the paging (1 page of 1 entry) [RFC2696] and sorting [RFC2891] controls. Of course, slapd(8) does not support the sorting control.

Given this is (or can be assumed to be) a request for enhancement to OpenLDAP Software and not a request to enhance LDAP (if the latter, the OpenLDAP ITS is not the right place to request enhancements to LDAP be made), I suggest this ITS be regarded as a request to implement the sorting control.

Kurt

At 10:58 PM 8/14/2005, quanah@stanford.edu wrote:
>Full_Name: Quanah Gibson-Mount
>Version: HEAD
>OS: Solaris 8
>URL: ftp://ftp.openldap.org/incoming/
>Submission from: (NULL) (171.66.182.82)
>
>
>It would be particularly useful to have a way to query for the max or min value
>of a given attribute from a server. In particular, this would help sync
>replication by allowing a client to quickly determine whether or not its context
>CSN is still valid. This is important, because if a provider shuts down after
>setting its context CSN to an entry that has been deleted, or has been modified
>twice, since the consumer's contextCSN, the consumer has to run a very exhaustive
>search to figure out whether or not its contextCSN is still valid. Having the
>min function would immediately let it know whether or not its contextCSN was
>still valid.
>
>The max one could be used for things like max uid in db, etc.
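As a rough illustration of what the paging-plus-sorting combination asks a server to do, the sketch below simulates it over an in-memory list. It is not an LDAP client and does not use any real control API; the entry dicts, the `uidNumber` values, and the helper name `first_sorted_entry` are all made up for the example, and entries lacking the attribute are simply skipped here.

```python
from typing import Any, Dict, Iterable, Optional

Entry = Dict[str, Any]  # hypothetical in-memory stand-in for an LDAP entry


def first_sorted_entry(entries: Iterable[Entry], attr: str,
                       reverse: bool = False) -> Optional[Entry]:
    """Mimic a sorted search truncated to one page of one entry.

    reverse=False yields the entry with the minimum value of attr,
    reverse=True the entry with the maximum value.
    """
    # Simplification: entries without the attribute are left out entirely.
    candidates = [e for e in entries if attr in e]
    if not candidates:
        return None
    # The whole candidate set is still sorted before the first entry is taken.
    ordered = sorted(candidates, key=lambda e: e[attr], reverse=reverse)
    return ordered[0]


entries = [
    {"dn": "uid=a,dc=example,dc=com", "uidNumber": 1003},
    {"dn": "uid=b,dc=example,dc=com", "uidNumber": 1010},
    {"dn": "uid=c,dc=example,dc=com", "uidNumber": 1001},
    {"dn": "cn=group,dc=example,dc=com"},
]
lowest = first_sorted_entry(entries, "uidNumber")
highest = first_sorted_entry(entries, "uidNumber", reverse=True)
```

Note that even in this toy version every candidate is collected and sorted before a single entry is returned.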
--On Tuesday, August 16, 2005 11:53 AM -0700 "Kurt D. Zeilenga" <Kurt@OpenLDAP.org> wrote:

> LDAP already provides a mechanism for performing this function,
> by use of the paging (1 page of 1 entry) [RFC2696] and sorting
> [RFC2891] controls. Of course, slapd(8) does not support the
> sorting control.
>
> Given this is (or can be assumed to be) a request for enhancement
> to OpenLDAP Software and not a request to enhance LDAP (if the
> latter, the OpenLDAP ITS is not the right place to request
> enhancements to LDAP be made), I suggest this ITS be regarded
> as a request to implement the sorting control.

Yes, this is a request to enhance OpenLDAP appropriately. :)

--Quanah

--
Quanah Gibson-Mount
Principal Software Developer
ITSS/Shared Services
Stanford University
GnuPG Public Key: http://www.stanford.edu/~quanah/pgp.html
moved from Incoming to Software Enhancements
--On Tuesday, August 16, 2005 11:58 AM -0700 Quanah Gibson-Mount <quanah@stanford.edu> wrote:

> --On Tuesday, August 16, 2005 11:53 AM -0700 "Kurt D. Zeilenga"
> <Kurt@OpenLDAP.org> wrote:
>
>> LDAP already provides a mechanism for performing this function,
>> by use of the paging (1 page of 1 entry) [RFC2696] and sorting
>> [RFC2891] controls. Of course, slapd(8) does not support the
>> sorting control.
>>
>> Given this is (or can be assumed to be) a request for enhancement
>> to OpenLDAP Software and not a request to enhance LDAP (if the
>> latter, the OpenLDAP ITS is not the right place to request
>> enhancements to LDAP be made), I suggest this ITS be regarded
>> as a request to implement the sorting control.
>
> Yes, this is a request to enhance OpenLDAP appropriately. :)

Actually, after talking to Howard, I believe the above controls aren't sufficient. The whole problem is candidate generation.

Now, with BDB, it should be possible to get the min and max values from the first and last marker in the entryCSN index database, since it can only be indexed with equality. So for syncrepl to ever really be efficient for servers that are stopping/starting after deletes or multiple modifies to the same entry, it needs a way to get those values. This completely avoids any candidate generation, and allows the syncprovider to quickly let the replica know if its CSN is out of date.

How one would implement that inside the LDAP specs is a different issue. ;)

--Quanah

--
Quanah Gibson-Mount
Principal Software Developer
ITSS/Shared Services
Stanford University
GnuPG Public Key: http://www.stanford.edu/~quanah/pgp.html

"These censorship operations against schools and libraries are stronger
than ever in the present religio-political climate. They often focus on
fantasy and sf books, which foster that deadly enemy to bigotry and blind
faith, the imagination." -- Ursula K. Le Guin
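The idea of reading the minimum and maximum straight off the ends of an ordered index can be sketched as follows. This is only a toy model: `OrderedIndex` is a hypothetical class backed by a sorted Python list, not the BDB API and not slapd's index code, and the CSN strings are sample values.

```python
import bisect
from typing import List, Optional


class OrderedIndex:
    """Toy ordered key index (stand-in for a B-tree; not the BDB API)."""

    def __init__(self) -> None:
        self._keys: List[str] = []

    def add(self, key: str) -> None:
        # Keep the key list sorted on every insert.
        bisect.insort(self._keys, key)

    def remove(self, key: str) -> None:
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            del self._keys[i]

    def min_key(self) -> Optional[str]:
        # First key in order: no scan over the other keys is needed.
        return self._keys[0] if self._keys else None

    def max_key(self) -> Optional[str]:
        # Last key in order.
        return self._keys[-1] if self._keys else None


index = OrderedIndex()
for csn in ("20050818185717Z#000001#00#000000",
            "20050818185710Z#000001#00#000000",
            "20050818185716Z#000001#00#000000"):
    index.add(csn)
```

Looking up either end touches one key regardless of how many keys the index holds, which is the property the message above is after.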
Well, what I noted was that there was an existing protocol mechanism to request the return of the entry with the lowest/highest CSN.

I pretty much ignored most of the rest of your post as it didn't make much sense to me at the time (and still doesn't).

The consumer needs to be very careful in how it treats the sync cookie, and likely we're linking it too closely with CSNs. Recall that the sync protocol itself has no concept of CSNs; use of CSNs is an implementation choice, and the details are implementation specific. While our provider places a CSN in its cookie, the consumer really shouldn't be parsing the CSN out of the cookie, and, if it does, it shouldn't use it for anything. The consumer should regard the cookie as an opaque value. (While we should allow consumer construction of a cookie, we should avoid consumer extraction of the CSN from the cookie.) The consumer's context CSN should be independently managed, and then only if the consumer is configured as a provider.

Kurt

At 10:19 PM 8/19/2005, Quanah Gibson-Mount wrote:
>Actually, after talking to Howard, I believe the above controls aren't sufficient. The whole problem is candidate generation. Now, with BDB, it should be possible to get the min and max values from the first and last marker in the entryCSN index database, since it can only be indexed with equality. So for syncrepl to ever really be efficient for servers that are stopping/starting after deletes or multiple modifies to the same entry, it needs a way to get those values. This completely avoids any candidate generation, and allows the syncprovider to quickly let the replica know if its CSN is out of date. How one would implement that inside the LDAP specs is a different issue. ;)
--On Friday, August 19, 2005 11:11 PM -0700 "Kurt D. Zeilenga" <Kurt@OpenLDAP.org> wrote:

> Well, what I noted was that there was an existing protocol
> mechanism to request the return of the entry with the lowest/highest
> CSN.
>
> I pretty much ignored most of the rest of your post as it didn't
> make much sense to me at the time (and still doesn't).

Okay, maybe this will explain it more:

When the consumer connects to the provider, it tries to determine if it needs to synch any data or not. To do that, it uses the value of its cookie, and compares that value to the value of the cookie on the master. If its cookie value is not equivalent to what is on the master, it then looks for *all* values <= its value. This works fine in a very small database (say 10k entries). It takes around 20-30 minutes in my 400k database. It would take even longer on a very large database (say 50 million entries). The more consumers you have, the worse it gets, as well.

By giving the consumer a way to immediately get the smallest cookie value that the provider has, the consumer can immediately know whether or not its cookie is valid, thereby skipping the <= search. If its cookie isn't valid, it does a full resynch of data. If its cookie is valid, then it only looks for entries that have been modified since the value of its cookie.

That was the point of the recent CSN checking change done between 2.3.5 and 2.3.6, which took care of some cases to allow an immediate equality check (before it defaulted to always doing <=). But there are still times when the master and replica can differ in value, and for syncrepl to be worthwhile, there needs to be a way to get an immediate yes/no answer as to whether or not it needs to do a full resync.

--Quanah
At 11:11 PM 8/19/2005, Kurt@OpenLDAP.org wrote:
>I pretty much ignored most of the rest of your post as it didn't
>make much sense to me at the time (and still doesn't).

That is, I focused on this aspect of your post:
> The max one could be used for things like max uid in db, etc.

Paging+Sorting is a sufficient protocol mechanism for this use case.

Kurt
At 11:22 PM 8/19/2005, Quanah Gibson-Mount wrote:
>--On Friday, August 19, 2005 11:11 PM -0700 "Kurt D. Zeilenga" <Kurt@OpenLDAP.org> wrote:
>
>>Well, what I noted was that there was an existing protocol
>>mechanism to request the return of the entry with the lowest/highest
>>CSN.
>>
>>I pretty much ignored most of the rest of your post as it didn't
>>make much sense to me at the time (and still doesn't).
>
>Okay, maybe this will explain it more:
>
>When the consumer connects to the provider, it tries to determine if it needs to synch any data or not.

It seems here that you intend 'it' to refer to the consumer. That's incorrect. When using LDAP sync, it is the provider which determines what data needs to be sent to sync the consumer. The provider does so by providing the cookie. So most of what follows makes little sense to me.

>To do that, it uses the value of its cookie, and compares that value to the value of the cookie on the master.

The cookie is supposed to be opaque. The consumer has no way to compare it to anything on the provider. The consumer should not attempt to parse the CSN. Doing so is simply counter to the design of LDAP sync.

>If its cookie value is not equivalent to what is on the master, it then looks for *all* values <= its value. This works fine in a very small database (say 10k entries). It takes around 20-30 minutes in my 400k database.

Well, an ordering index on the provider might speed that up. But this is a provider-side implementation detail. No need for a "min/max function extension to LDAP" to do that.

>It would take even longer on a very large database (say 50 million entries). The more consumers you have, the worse it gets, as well. By giving the consumer a way to immediately get the smallest cookie value that the provider has,

No need for the consumer to do anything special with the cookie.

>the consumer can immediately know whether or not its cookie is valid, thereby skipping the <= search. If its cookie isn't valid, it does a full resynch of data. If its cookie is valid, then it only looks for entries that have been modified since the value of its cookie. That was the point of the recent CSN checking change done between 2.3.5 and 2.3.6, which took care of some cases to allow an immediate equality check (before it defaulted to always doing <=).

I'll have to examine these changes more closely. I hope it was just a provider implementation optimization... Consumers MUST NOT parse the cookie. Doing so will cause problems when we run into other servers implementing LDAP sync.

>But there are still times when the master and replica can differ in value, and for syncrepl to be worthwhile, there needs to be a way to get an immediate yes/no answer as to whether or not it needs to do a full resync.
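A minimal sketch of what "treat the cookie as opaque" can look like on the consumer side, assuming nothing about the cookie's contents. `ConsumerState` and its method names are hypothetical and not taken from the syncrepl code; the sample cookie bytes are arbitrary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConsumerState:
    """Hypothetical consumer-side holder for the sync cookie."""
    cookie: Optional[bytes] = None  # stored verbatim, never parsed

    def cookie_for_request(self) -> Optional[bytes]:
        # Hand back exactly what the provider last gave us.
        return self.cookie

    def remember(self, new_cookie: bytes) -> None:
        # Replace the stored value wholesale; no field is extracted from it.
        self.cookie = new_cookie


state = ConsumerState()
first_request = state.cookie_for_request()   # nothing stored yet
state.remember(b"opaque-value-from-provider")
second_request = state.cookie_for_request()
```

Because the consumer only stores and echoes the bytes, it keeps working if a different provider implementation encodes its cookie differently.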
--On Friday, August 19, 2005 11:41 PM -0700 "Kurt D. Zeilenga" <Kurt@OpenLDAP.org> wrote:

> At 11:22 PM 8/19/2005, Quanah Gibson-Mount wrote:
>
>> When the consumer connects to the provider, it tries to determine if it
>> needs to synch any data or not.
>
> It seems here that you intend 'it' to refer to the consumer.
> That's incorrect. When using LDAP sync, it is the provider
> which determines what data needs to be sent to sync the consumer.
> The provider does so by providing the cookie.

No, it is the provider. The provider is trying to determine if the cookie sent to it is still valid or if changes need to be sent to the consumer. When the consumer's cookie is different from the provider's, the provider does an internal search to determine if that cookie is still valid for its database. This is extremely expensive.

For example, if the provider has a cookie of:

20050818185717Z#000001#00#000000

and the consumer has a cookie of:

20050818185716Z#000001#00#000000

The provider is going to run the following search:

entrycsn <= 20050818185716Z#000001#00#000000

to determine if there is one or more entries with a matching CSN. If there isn't, the provider refreshes the consumer with all data. If there is, it then only does an update of changes from the time the consumer last received data.

The problem is the <= search done by the provider. That needs to be avoided. If it was simply able to look at the lowest value CSN in its database, it would immediately be able to tell whether or not the consumer needs only a partial update, or a full update, and skip the entire <= search that can generate an enormous number of results altogether.

--Quanah
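The two ways of answering "is this cookie still valid?" can be compared in a short sketch. It assumes CSN strings of the fixed-width shape shown in the example above, so that plain string comparison orders them; the function names and the sample entryCSN lists are made up for illustration.

```python
from typing import Iterable, Optional


def cookie_valid_by_scan(entry_csns: Iterable[str], cookie_csn: str) -> bool:
    """Walk the entryCSN values looking for any that is <= the cookie."""
    return any(csn <= cookie_csn for csn in entry_csns)


def cookie_valid_by_min(min_csn: Optional[str], cookie_csn: str) -> bool:
    """Same answer from a single comparison against the lowest entryCSN."""
    return min_csn is not None and min_csn <= cookie_csn


cookie = "20050818185716Z#000001#00#000000"

# One entry is older than the cookie, so the cookie is still valid.
older_present = ["20050818185717Z#000001#00#000000",
                 "20050818185710Z#000001#00#000000"]

# Every entry is newer than the cookie, so the cookie is no longer valid.
all_newer = ["20050818185717Z#000001#00#000000",
             "20050818185718Z#000001#00#000000"]

scan_results = [cookie_valid_by_scan(older_present, cookie),
                cookie_valid_by_scan(all_newer, cookie)]
min_results = [cookie_valid_by_min(min(older_present), cookie),
               cookie_valid_by_min(min(all_newer), cookie)]
```

Both functions agree because some entryCSN is <= the cookie exactly when the smallest one is; the difference is that the scan may have to look at every value when none qualifies. Here `min()` stands in for whatever cheap lookup the backend could offer.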
At 12:09 AM 8/20/2005, quanah@stanford.edu wrote:
>--On Friday, August 19, 2005 11:41 PM -0700 "Kurt D. Zeilenga"
><Kurt@OpenLDAP.org> wrote:
>
>> It seems here that you intend 'it' to refer to the consumer.
>> That's incorrect. When using LDAP sync, it is the provider
>> which determines what data needs to be sent to sync the consumer.
>> The provider does so by providing the cookie.
>
>No, it is the provider.

The provider doesn't need a protocol extension to talk to itself. As I noted, an ordering index will do.

>The provider is trying to determine if the cookie
>sent to it is still valid or if changes need to be sent to the consumer.
>When the consumer's cookie is different from the provider's, the provider
>does an internal search to determine if that cookie is still valid for its
>database. This is extremely expensive.
>
>For example, if the provider has a cookie of:
>
>20050818185717Z#000001#00#000000
>
>and the consumer has a cookie of:
>
>20050818185716Z#000001#00#000000
>
>The provider is going to run the following search:
>
>entrycsn <= 20050818185716Z#000001#00#000000
>
>to determine if there is one or more entries with a matching CSN. If there
>isn't, the provider refreshes the consumer with all data. If there is, it
>then only does an update of changes from the time the consumer last
>received data.

The provider will likely be forced to use updates+present mode here, as the fact that it found matching entries says nothing about how many deleted entries it didn't find.

>The problem is the <= search done by the provider. That needs to be
>avoided. If it was simply able to look at the lowest value CSN in its
>database, it would immediately be able to tell whether or not the consumer
>needs only a partial update, or a full update, and skip the entire <=
>search that can generate an enormous number of results altogether.

This seems a bit of an oversimplification. CSNs of existing entries say nothing about CSNs of deleted entries. Those, as well as entries that have moved through the scope of the search, are the nasty ones. (I note that subtree rename introduces other complications.)

I note that in most cases, a full update is not needed. But often an updates+present update is needed. That is, an updates+deletes update cannot be generated. I think we need to avoid terms like "partial update", as that's ambiguous as to what refresh mode is actually needed and/or used.

Long ago, I talked about tracking the last CSN of delete and rename operations so that the provider can more easily determine which mode to use. We might want to revisit this.
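One possible reading of the "track the last CSN of delete and rename operations" idea is sketched below. This is an interpretation, not the provider's actual logic: the function name, the mode labels, and the rule itself are assumptions, CSNs are compared as fixed-width strings, and entries that leave the search scope through an ordinary modify are ignored.

```python
from typing import Optional


def choose_refresh_mode(cookie_csn: str, context_csn: str,
                        last_delete_or_rename_csn: Optional[str]) -> str:
    """Pick a refresh mode from the consumer cookie and one tracked CSN."""
    if cookie_csn == context_csn:
        # Consumer already matches the provider's current state.
        return "up-to-date"
    if (last_delete_or_rename_csn is None
            or last_delete_or_rename_csn <= cookie_csn):
        # Assumed rule: nothing was deleted or renamed after the consumer's
        # state, so there are no vanished entries the provider has to convey.
        return "updates+deletes"
    # A delete or rename happened after the consumer's state and the provider
    # cannot enumerate what vanished, so it falls back to present messages.
    return "updates+present"


context = "20050818185717Z#000001#00#000000"
cookie = "20050818185716Z#000001#00#000000"

mode_no_deletes = choose_refresh_mode(cookie, context, None)
mode_old_delete = choose_refresh_mode(
    cookie, context, "20050818185700Z#000001#00#000000")
mode_recent_delete = choose_refresh_mode(
    cookie, context, "20050818185717Z#000001#00#000000")
mode_current = choose_refresh_mode(context, context, None)
```

The point of the sketch is only that a single tracked value could replace a search when deciding the mode; whether that rule is sufficient in practice is exactly what the message above suggests revisiting.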
--On Saturday, August 20, 2005 8:33 AM -0700 "Kurt D. Zeilenga" <Kurt@OpenLDAP.org> wrote:

>> No, it is the provider.
>
> The provider doesn't need a protocol extension to talk to
> itself. As I noted, an ordering index will do.
>
>> The provider is trying to determine if the cookie
>> sent to it is still valid or if changes need to be sent to the consumer.
>> When the consumer's cookie is different from the provider's, the provider
>> does an internal search to determine if that cookie is still valid for its
>> database. This is extremely expensive.
>>
>> For example, if the provider has a cookie of:
>>
>> 20050818185717Z#000001#00#000000
>>
>> and the consumer has a cookie of:
>>
>> 20050818185716Z#000001#00#000000
>>
>> The provider is going to run the following search:
>>
>> entrycsn <= 20050818185716Z#000001#00#000000
>>
>> to determine if there is one or more entries with a matching CSN. If
>> there isn't, the provider refreshes the consumer with all data. If
>> there is, it then only does an update of changes from the time the
>> consumer last received data.
>
> The provider will likely be forced to use updates+present mode
> here, as the fact that it found matching entries says nothing
> about how many deleted entries it didn't find.

But this search isn't done to see what entries need to be found for update purposes. It is simply done to determine whether or not the cookie sent by the replica is still valid (i.e., there is at least one entry in the database that hasn't been updated since the replica last talked to the provider). Determining what updates need to be sent to the replica is done later. I already use the syncprov-sessionlog overlay on the provider side so that it has a log of changes (including deletes) to send to the replica *after* it is determined whether it needs a full sync or not, based on the validity of its cookie.

>> The problem is the <= search done by the provider. That needs to be
>> avoided. If it was simply able to look at the lowest value CSN in its
>> database, it would immediately be able to tell whether or not the
>> consumer needs only a partial update, or a full update, and skip the
>> entire <= search that can generate an enormous number of results
>> altogether.
>
> This seems a bit of an oversimplification. CSNs of existing
> entries say nothing about CSNs of deleted entries. Those,
> as well as entries that have moved through the scope of the
> search, are the nasty ones. (I note that subtree rename
> introduces other complications.)

As noted above, this ITS isn't about what needs to be updated. It is only about the expensive search being done by the provider to validate a replica's cookie. This is done before it determines what updates need to be done to the replica.

--Quanah
--On Saturday, August 20, 2005 8:33 AM -0700 "Kurt D. Zeilenga" <Kurt@OpenLDAP.org> wrote:

I realized much of what I wrote last night was muddled by tiredness. Hopefully this is clearer. :)

OpenLDAP 2.3.5 Scenario:

Consumer connects to Provider, sends its cookie.
Provider checks to see if the consumer cookie is current.
  a) If yes, no updates needed.
  b) If no, determine if the cookie is still valid by seeing if there are any entries with an entryCSN <= consumer cookie. Then determine what type of updates to do.

OpenLDAP 2.3.6 Scenario:

Consumer connects to Provider, sends its cookie.
Provider checks to see if the consumer cookie is current.
  a) If yes, no updates needed.
  b) If no, see if there is an entry where its entryCSN is equal to the provided consumer cookie.
     i) If yes, determine what updates are needed.
     ii) If no, determine if the cookie is still valid by seeing if there are any entries with an entryCSN <= consumer cookie. Then determine what type of updates to do.

The whole issue here is around (b). If the provider has received a delete of the entry that has the entryCSN value stored in the consumer's cookie, the provider is going to move to option (ii). If the provider has received a modification of the entry that had the entryCSN value stored in the consumer's cookie, it is going to move to option (ii).

A better check for (ii) to validate the consumer's cookie is simply: is the minimum value of entryCSN in the provider's database less than or equal to the consumer's cookie? This skips trolling through the provider's database looking at every entryCSN.

--Quanah
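The 2.3.6 steps above, with the proposed minimum check substituted for the <= search in (ii), can be written out as a small decision function. This is a sketch of the described flow only, not slapd's syncprov code: the function name and the returned step labels are hypothetical, the set of entryCSN values stands in for the provider's database, and CSNs are compared as fixed-width strings.

```python
from typing import Set


def classify_cookie(entry_csns: Set[str], context_csn: str,
                    cookie_csn: str) -> str:
    """Return which step of the described flow decides the outcome."""
    if cookie_csn == context_csn:
        return "a: current, no updates needed"
    if cookie_csn in entry_csns:
        # Equality check: an entry still carries exactly this CSN.
        return "b-i: cookie valid, determine updates"
    # Proposed replacement for the <= search: compare against the minimum only.
    if entry_csns and min(entry_csns) <= cookie_csn:
        return "b-ii: cookie valid, determine updates"
    return "b-ii: cookie not valid"


db = {"20050818185710Z#000001#00#000000",
      "20050818185715Z#000001#00#000000",
      "20050818185717Z#000001#00#000000"}
context = "20050818185717Z#000001#00#000000"

step_current = classify_cookie(db, context, context)
step_equal = classify_cookie(db, context, "20050818185715Z#000001#00#000000")
# The entry that carried this CSN was modified or deleted since.
step_min = classify_cookie(db, context, "20050818185716Z#000001#00#000000")
# The cookie predates everything the provider still holds.
step_stale = classify_cookie(db, context, "20050818185700Z#000001#00#000000")
```

In a real backend the `min(entry_csns)` call would be replaced by whatever cheap lowest-key lookup the index offers; here it is just the simplest way to express the comparison.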
changed notes changed state Open to Test
changed notes changed state Test to Closed
moved from Software Enhancements to Archive.Software Enhancements
Main concern fixed in HEAD/re23