Re: ITS#1679 gentle SIGHUP handling

Pierangelo Masarati writes:
> but as Howard noted, this feature would be strikingly interesting
> if the *configuration* is changed while using the *same* database.

There is another way to restart slapd almost seamlessly, at
least on many Unix variants.  Maybe I should implement this too?

1. Start a new slapd daemon which sends a control message to the
   old slapd but does not open any listener connections.
2. Old slapd quits reading new requests from clients that already
   have outstanding requests.  Instead it reads and processes one
   request at a time per client.  Or is that how it works already?
3. When there are no more outstanding requests except the ones being
   processed, old slapd stops reading new requests.  It send()s its
   listener file descriptors to new slapd, along with the associated
4. If both slapds are in readonly mode or if they use different
   databases, new slapd can begin to accept new clients.
5. When a client's final outstanding request is processed, old slapd
   send()s the client's file descriptor to new slapd, along with the
   Connection struct describing the client.
6. When all clients are done, old slapd terminates.
7. New slapd starts up properly, if it didn't in step 4.
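
The descriptor hand-off in steps 3 and 5 would, on the Unix variants
that support it, most likely be done with sendmsg() and SCM_RIGHTS
ancillary data over a Unix-domain socket between the two daemons.
Below is a minimal sketch under that assumption.  send_fd(), recv_fd()
and fd_passing_demo() are hypothetical names for illustration, not
existing slapd functions, and shipping the Connection struct is not
shown.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized for one descriptor; the union forces alignment. */
union fd_ctl {
    struct cmsghdr hdr;
    char buf[CMSG_SPACE(sizeof(int))];
};

/* Hypothetical helper: pass open descriptor `fd` over Unix socket `chan`.
 * Returns 0 on success, -1 on failure. */
int send_fd(int chan, int fd)
{
    char tag = 'F';               /* one byte of ordinary data must go along */
    struct iovec iov = { .iov_base = &tag, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg;
    struct cmsghdr *cm;

    assert(fd >= 0);
    memset(&msg, 0, sizeof msg);
    memset(&ctl, 0, sizeof ctl);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    cm = CMSG_FIRSTHDR(&msg);
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type = SCM_RIGHTS;   /* "this message carries descriptors" */
    cm->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cm), &fd, sizeof(int));

    return sendmsg(chan, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one descriptor from `chan`.
 * Returns the new descriptor, or -1 on failure. */
int recv_fd(int chan)
{
    char tag;
    struct iovec iov = { .iov_base = &tag, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg;
    struct cmsghdr *cm;
    int fd;

    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    if (recvmsg(chan, &msg, 0) != 1)
        return -1;
    cm = CMSG_FIRSTHDR(&msg);
    if (cm == NULL || cm->cmsg_level != SOL_SOCKET ||
        cm->cmsg_type != SCM_RIGHTS || cm->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    memcpy(&fd, CMSG_DATA(cm), sizeof(int));
    return fd;
}

/* Self-check in one process: pass the write end of a pipe across a
 * socketpair, write through the received copy, read it back.
 * Returns 0 if the byte arrives, -1 otherwise. */
int fd_passing_demo(void)
{
    int chan[2], pp[2], got;
    char c = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, chan) < 0) return -1;
    if (pipe(pp) < 0) return -1;
    if (send_fd(chan[0], pp[1]) < 0) return -1;
    if ((got = recv_fd(chan[1])) < 0) return -1;
    if (write(got, "x", 1) != 1) return -1;
    if (read(pp[0], &c, 1) != 1) return -1;

    close(got); close(pp[0]); close(pp[1]);
    close(chan[0]); close(chan[1]);
    return c == 'x' ? 0 : -1;
}
```

In the real scheme the two ends of the socket would belong to old and
new slapd rather than to one process, and the received descriptor
refers to the same open socket that old slapd was using.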

In theory, old slapd can close the databases and new slapd can start
up as soon as all requests' database accesses are finished; they do
not need to wait until the results have been sent.  I'm not sure if
that can be fitted into the current design of the backends, though.

LDAP service will slow down a bit in step 2, and hang in steps 3-6
for as long as it takes old slapd to process a request and send the
results.  I don't know how much time that would be.
Is it so much that the 'gentle SIGHUP' approach is better?

There are a few problems:

- I don't know what to do about paged results.  The simplest way I
  can think of is for old slapd to wait to start step 1 until there
  are no paged requests, and if that doesn't happen in X seconds,
  abort the connections to clients with paged requests.

- A socket can get full, can't it?  So if a client doesn't read the
  results at once, slapd can get stuck trying to send the results.
  The simplest solution is to abort the connection after Y seconds,
  but if we can close the databases first, that's not necessary.

- I think a client can hang due to congestion after step 2 if it
  pours a lot of requests into slapd quickly without reading the
  results at once.  Then slapd can be stuck trying to send the old
  results into a full socket.  Again, the simplest solution is to
  abort the connection if the client doesn't wise up after Y seconds.

- If there is a deadlock in slapd somewhere so a client is
  deadlocked but the rest of slapd is not, old slapd won't
  terminate.  Again, abort the client connection after Y seconds.
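
The "abort after Y seconds" rule for a full socket could be sketched
with poll() and a non-blocking descriptor, roughly as below.  This is
only an illustration: write_with_deadline(), set_nonblocking() and
deadline_demo() are hypothetical names, not slapd code, and the sketch
assumes SIGPIPE is ignored by the server.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: switch `fd` to non-blocking mode. */
int set_nonblocking(int fd)
{
    int fl = fcntl(fd, F_GETFL, 0);
    return fl < 0 ? -1 : fcntl(fd, F_SETFL, fl | O_NONBLOCK);
}

/* Hypothetical helper: write `len` bytes to non-blocking `fd`, giving up
 * once `y_seconds` have passed.  Returns 0 when everything was written,
 * -1 on error; on timeout errno is ETIMEDOUT and the caller would then
 * abort the client connection. */
int write_with_deadline(int fd, const char *buf, size_t len, int y_seconds)
{
    time_t deadline;

    assert(y_seconds > 0);
    deadline = time(NULL) + y_seconds;

    while (len > 0) {
        struct pollfd p = { .fd = fd, .events = POLLOUT };
        time_t now = time(NULL);
        ssize_t n;
        int rc;

        if (now >= deadline) { errno = ETIMEDOUT; return -1; }

        /* Wait until the socket has room again, or the deadline passes. */
        rc = poll(&p, 1, (int)(deadline - now) * 1000);
        if (rc == 0) { errno = ETIMEDOUT; return -1; }
        if (rc < 0) {
            if (errno == EINTR) continue;
            return -1;
        }

        n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Self-check: nobody reads the peer end, so a large write must fill the
 * socket buffer and time out.  Returns 0 if that is what happens. */
int deadline_demo(void)
{
    static char big[4 * 1024 * 1024];
    int chan[2], rc, timed_out;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, chan) < 0) return -1;
    if (set_nonblocking(chan[0]) < 0) return -1;

    rc = write_with_deadline(chan[0], big, sizeof big, 1);
    timed_out = (rc == -1 && errno == ETIMEDOUT);

    close(chan[0]);
    close(chan[1]);
    return timed_out ? 0 : -1;
}
```

The same deadline could cover the deadlocked-client case only if the
check runs outside the stuck thread, which this sketch does not
attempt.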

Have I missed anything?
Do you know which parts of this are likely to be harder than they look?
Is there any reason why it would be a good idea for old slapd to
send its outstanding _requests_ to new slapd as well?  I'd rather
avoid that before it gets even more complicated.