
Re: Fwd: I-D ACTION:draft-rharrison-bulkldif-00.txt



David,

I appreciate your reply.  You've brought up some good questions, and I'd like to address your comments one by one, explaining the design decisions that led us to this approach.  Your questions have also spurred some additional thought on the subject, and I conclude with a rough idea for an alternate approach to solving our problem based on LDAP requests.  I've paraphrased your questions and comments.  In doing so, I hope that I've been true to their intent.  If I haven't, that will give us some more to discuss!

Comment: 
I'm surprised that the motivation was to avoid 'the overhead of responses for each operation.'  Is this really a large overhead?

Response:
The overhead I refer to is not network overhead.  It stems from the fact that a client making requests synchronously must wait for the response to each operation before issuing the next one. (I believe that this is the way ldapmodify currently works, and it is slow.)  If the client makes requests asynchronously, there is no way to ensure that the operations will be received and applied to the DIT in the same order in which they appear in the LDIF file. I discuss this at more length below.
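To give a feel for the cost of the synchronous approach, here is a back-of-the-envelope sketch. The record count and round-trip time are my own illustrative assumptions, not figures from the draft:

```python
# Illustrative only: with synchronous requests, each operation costs at
# least one full network round trip before the next can be issued, so the
# round-trip time becomes a hard lower bound on throughput.
def synchronous_transfer_time(num_records: int, round_trip_secs: float) -> float:
    """Lower bound on wall-clock time when the client must wait for
    each response before sending the next request."""
    return num_records * round_trip_secs

# Hypothetical load: 100,000 LDIF records over a 50 ms round-trip link.
total = synchronous_transfer_time(100_000, 0.050)
print(f"{total / 60:.1f} minutes spent waiting on round trips")
```

Even ignoring server processing time entirely, the waiting alone dominates; this is the latency cost that streaming the LDIF avoids.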

Comment: 
Sending LDIF on the wire seems messy. Why do this rather than send the entries in the same wire format they'd have in an LDAP add operation?

Response: 
Sending LDIF on the wire pushes most of the work of implementing this extension onto the server.  This is in keeping with RFC 2251, section 3.1, which states, "it is an objective of this protocol to minimize the complexity of clients." Under this draft, the client needs no knowledge of the LDIF file format.  The client just reads data from a file--which it assumes is in LDIF format--and sends chunks to the server; the chunks do not even have to correspond to LDIF record boundaries.
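The client side of this is almost trivial. The following is my own sketch of the idea, not code from the draft; the chunk size is arbitrary:

```python
# Sketch of the client behavior described above: stream raw bytes in
# fixed-size chunks with no understanding of LDIF record boundaries.
import io

def ldif_chunks(stream, chunk_size=4096):
    """Yield raw chunks of the (assumed-LDIF) byte stream; a chunk may
    split a record, or even a single line, at any point."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk

data = b"dn: cn=Ann,o=Example\nchangetype: add\n...\n\n" * 1000
sent = b"".join(ldif_chunks(io.BytesIO(data), chunk_size=1000))
assert sent == data  # the server receives the same byte stream, re-chunked
```

All parsing and interpretation of the records happens on the server, which is exactly where the complexity of this extension lives.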


Sending LDIF across the wire has other benefits.  First, it maintains the ordering of operations coming from the LDIF file. The latest LDIF draft states, "An LDIF file consists of a series of records separated by line separators."  The term "series" implies that the records are sequenced in order from the beginning to the end of the file. While it is possible to have the client send many asynchronous LDAP requests, one for each record in the LDIF file, there is no way to enforce the ordering of those asynchronous LDAP requests.

The client's only recourse is to make these requests in synchronous fashion, which, as noted above, is slow.  This may not be a problem for some LDIF files, but it could be for others.  For example, consider an LDIF file that deletes the current instantiation of an object and then adds a new instantiation of it.  If these two operations are reordered, the final result will not be as expected.
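The delete-then-add hazard can be made concrete with a toy model. This is purely my own illustration; the "DIT" here is just a Python dict keyed by DN:

```python
# Toy model of the reordering hazard: the LDIF file intends to delete the
# old entry and then add its replacement.
def apply(dit, op):
    kind, dn, attrs = op
    if kind == "delete":
        dit.pop(dn, None)
    elif kind == "add":
        dit[dn] = attrs

ops = [
    ("delete", "cn=app,o=Example", None),
    ("add", "cn=app,o=Example", {"version": "2.0"}),
]

dit = {"cn=app,o=Example": {"version": "1.0"}}
for op in ops:                      # applied in the intended order
    apply(dit, op)
assert dit["cn=app,o=Example"] == {"version": "2.0"}

dit = {"cn=app,o=Example": {"version": "1.0"}}
for op in reversed(ops):            # reordered in flight
    apply(dit, op)
assert "cn=app,o=Example" not in dit  # the replacement entry was deleted!
```

With asynchronous requests the server gives no ordering guarantee, so either outcome is possible.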

Second, it makes it easy for the client to deal with critical errors.  Under this extension, the server merely stops processing the LDIF stream and notifies the client of the error.  If the server were processing asynchronous requests, the client would have to abandon all remaining requests and then wait for responses from any requests that could not be abandoned.  Finally, depending on the application, the client might have to back out changes applied to the DIT after the offending record in order to leave the DIT in a known state.  This would be difficult to do.

Comment: 
How does this relate to the LDUP work?  It seems to me that the task of initializing a replica server over-the-wire is quite close to the intended purpose of this proposal.

Response: 
This extension is not intended specifically for the task of initializing a replica server over the wire.  It is intended to facilitate the remote application of directory data to a DIT in a widely available format. While administrators may frequently use this extension to initialize a server's data, they may use it even more frequently to import data for a batch of new users or new customers or to apply other batches of changes.  The number of records for this type of scenario might range into the tens or hundreds of thousands per use based on what our customers tell us they'll do.

There may be some overlap in functionality between LDUP and this extension, but this extension is not intended to do replication.

Comment: 
Does this proposal have roots in existing NDS functionality?

Response: 
No.  It is merely an attempt to provide functionality that our customers have requested. As you point out, using a network file system or manual ftp is "somewhat less than ideal."  Our experience is that system administrators and customer support personnel demand the convenience of a remote access protocol for this type of operation.  They do not want to be bothered with manually creating a temporary copy of the file on a locally accessible file system--especially if it is extremely large (millions of objects)--and then gaining console or other access to the server to complete the task.

Even if they did use ftp, they would have to wait for the entire file to be copied before beginning to apply it, or they would run the risk of reading end-of-file before the data was completely copied.  This extension allows the server to process data as quickly as the client sends it, without the need for large amounts of intermediate storage space.  Of course, the client may actually reside on the same physical machine as the LDAP server; the functionality is still available in that case, though the point is then moot.

In thinking about your idea of using standard LDAP requests to do this same sort of thing, Jim and I came up with a different approach.  We probably haven't thought through all of the issues, but essentially, this approach would work as follows:

1. The client would parse the LDIF file into individual records and submit each record as an asynchronous LDAP request.

2. To preserve ordering, the client would add a control to each request identifying the record number.  This would allow the server to delay processing a reordered request until the records that preceded it in the LDIF file have been processed.

3. The server would simply reply to each request as it would any other LDAP request; the client would be completely responsible for dealing appropriately with error conditions.
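The server-side reordering in step 2 could be quite simple. The following sketch is my own; the class and method names are hypothetical, and real request processing is reduced to appending to a list:

```python
# Rough sketch of how a server might honor the record-number control:
# buffer requests that arrive out of order and apply each one only after
# every lower-numbered record has been applied.
class InOrderProcessor:
    def __init__(self):
        self.next_record = 1
        self.pending = {}      # record number -> buffered request
        self.applied = []      # the order in which requests actually ran

    def receive(self, record_number, request):
        self.pending[record_number] = request
        # Drain every request that is now contiguous with what we've done.
        while self.next_record in self.pending:
            self.applied.append(self.pending.pop(self.next_record))
            self.next_record += 1

p = InOrderProcessor()
p.receive(1, "delete cn=app")
p.receive(3, "modify cn=other")   # arrives early; held back by the server
p.receive(2, "add cn=app")
assert p.applied == ["delete cn=app", "add cn=app", "modify cn=other"]
```

Note that the buffering burden this places on the server grows with how badly requests are reordered, which is one cost of pushing ordering into a control.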

Advantages of this approach are that the server wouldn't have to provide any new functionality beyond support for the control.  Existing LDAP protocol operations would be used to transmit the requests, and the data would be sent in BER-encoded form.  The client would receive a notification for every completed request, which would make it easy to keep track of completion status.

Disadvantages of this approach are that the client would have to fully parse LDIF files in order to encode the LDAP requests.  The client would be completely responsible for dealing with critical errors and possibly for backing out changes made during the abandon phase of the transmission after receiving a critical error. The approach would also require server support for a new control.

For performance reasons, we anticipate aggregating LDIF records into groups, with each group applied to the DIT as a single transaction.  Implementing this functionality as an extension makes it simple for us to capture this data stream and handle it relatively independently of other LDAP server operations. An extension also makes it easy to set transaction sizes once at the beginning of the transfer. If this functionality were implemented as a control, several pieces of additional control information would have to be sent with the control on each LDAP request. Implementing it as a control would also require more invasive coding in our LDAP server, since we wouldn't use the standard LDAP operations to process individual requests. I don't know whether this applies equally to other LDAP server implementations, but it is something I've considered in my own implementation work.
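To illustrate the grouping strategy, here is a minimal sketch, with the actual transaction machinery stubbed out; the group size of four is arbitrary, and in practice it would be negotiated once at the start of the transfer:

```python
# Hypothetical sketch of aggregating LDIF records into groups, each group
# meant to be applied to the DIT as one transaction.
def transaction_batches(records, group_size):
    """Yield lists of records; each list is one atomic transaction."""
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) == group_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly short, transaction

records = [f"record-{i}" for i in range(10)]
sizes = [len(b) for b in transaction_batches(records, group_size=4)]
assert sizes == [4, 4, 2]
```

Setting the group size once per transfer, rather than per request, is exactly the simplification the extension buys over a per-request control.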

A final disadvantage is that there would be no way for a client to asynchronously apply two LDIF files to the same server over a single connection with such a simplistic control.  A slightly more sophisticated control would be needed for this.  I don't see this as a big problem since this sort of usage seems unlikely, anyway, but the extension provides for this.

I look forward to your feedback on this.

Sincerely,

Roger


>>> David Boreham <dboreham@netscape.com> 07/06/99 12:58PM >>>
Roger Harrison wrote:
> 
> Jim Sermersheim and I have published a draft describing an LDAP v3 extension to efficiently deliver and import LDIF data into an LDAP server.
> 
> Our goal in writing this draft was to define a mechanism that allows large numbers of LDAP operations to be specified by a remote client without the overhead of responses for each operation.  To do this, we've combined the LDIF data format with a wire protocol that allows a client to initiate and send LDIF data to the server and get periodic updates on the server's progress in processing the data.
> 
> We'd like to get input and feedback from the LDAP community both from a design standpoint and to see if this is something that should be added to the charter of the ldap-ext working group.

I'm surprised that the motivation was to avoid 
"the overhead of responses for each operation".
Is this really a large overhead ? 
Surely TCP ACK segments need to be sent back
to the client in response to each received data segment. 
Any LDAP response PDU
would surely piggyback on those packets wouldn't it ?
Considering that ldif is a less efficient
storage format than the equivalent BER, 
is the network traffic really lower ?

Some other thoughts:

1. Sending LDIF on the wire seems messy. Why do this rather
than send the entries in the same wire format they'd have
in an LDAP add operation ?

2. If the intent is to send a file from one 
machine to another, why can't an existing protocol
be used for that purpose (e.g. ftp, http) ?

3. How does this relate to the LDUP work ?
It seems to me that the task of initializing
a replica server over-the-wire is quite close
to the intended purpose of this proposal.
I believe that LDUP defines, or will define,
a protocol for replication update propagation.
There may be some overlap between this 
proposal and that protocol.

4. Does this proposal have roots in existing
NDS functionality ? If so, I'd be interested to hear
how deployment experience with NDS affected
the decision to specify this protocol rather
than to use one of the alternatives discussed
above. Netscape products currently use the
network file system or manual ftp here,
which has proven workable in the field,
if somewhat less than ideal.

Put another way: I've had many discussions
with engineers working on the server's
implementation where we talked about 
new and better ways to send imported
ldif data to a server, but I've never 
ever heard a customer or a marketing
person ask us to change the way this feature
currently works.