Issue 8747 - LDAP load balancer daemon (lloadd)
Summary: LDAP load balancer daemon (lloadd)
Status: VERIFIED FIXED
Alias: None
Product: OpenLDAP
Classification: Unclassified
Component: slapd
Version: unspecified
Hardware: All All
Importance: --- blocker
Target Milestone: 2.5.5
Assignee: Ondřej Kuzník
URL:
Keywords:
Duplicates: 9550
Depends on:
Blocks:
 
Reported: 2017-09-28 16:13 UTC by Ondřej Kuzník
Modified: 2021-06-21 22:04 UTC
Description Ondřej Kuzník 2017-09-28 16:13:46 UTC
Full_Name: Ondrej Kuznik
Version: master
OS: 
URL: https://github.com/mistotebe/openldap/tree/lloadd
Submission from: (NULL) (82.10.24.68)


The 'lloadd' branch linked above contains the load balancer code that is now
ready for review.

This adds a new server to the OpenLDAP project, a load balancing proxy
(prototype). Also, the tls branch contains the work in progress toward
StartTLS/ldaps support (which works apart from certificate checking) and can be
merged once ITS#8746 has been closed.

To test, make sure you have libevent >= 2.0 installed and regenerate the
configure script, which now accepts the --enable-balancer option. The code
has only been compiled and tested on Linux so far.
Comment 1 Ondřej Kuzník 2017-10-02 12:04:02 UTC
On Thu, Sep 28, 2017 at 04:13:46PM +0000, ondra@openldap.org wrote:
> The 'lloadd' branch linked above contains the load balancer code that
> is now ready for review.
> 
> This adds a new server to the OpenLDAP project, a load balancing proxy
> (prototype).

To summarise why this project exists and highlight its features and
limitations:
- most LDAP load balancers pin a client connection to one server and
  then just ship data; lloadd can distribute operations from a single
  client connection across LDAP servers
- to make the above possible, lloadd sets up connections to the backend
  servers on startup and manages them as per configuration
  (independently of the clients)
- bind operations are forwarded over dedicated bind connections, or
  using the VC exop if feature 'vc' is enabled in its config; the
  designated identity is then passed on with the operations using the
  proxyauth control (if feature 'proxyauthz' is enabled)
- it is expected that all backends are indistinguishable (same features,
  suffixes, data)
- no SASL bind support yet
- if an operation cannot be processed or forwarded for any reason
  (overload, connection loss, ...), it is never re-sent; the client will
  still be sent an appropriate result in that case
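
As a rough illustration of the last point, the sketch below forwards each
operation to one of several upstream connections opened ahead of time and,
if the chosen upstream is gone, answers the client instead of retrying
elsewhere. It is not lloadd code: the Upstream and Balancer classes, the
"forwarded"/"rejected" tuples and the ERROR_RESULT placeholder are all
assumptions made for the example.

```python
from dataclasses import dataclass, field
from itertools import count

ERROR_RESULT = "error result"  # placeholder, not a real LDAP result code


@dataclass
class Upstream:
    """One connection to a backend, opened at startup (hypothetical model)."""
    name: str
    alive: bool = True
    forwarded: list = field(default_factory=list)

    def send(self, op):
        if not self.alive:
            raise ConnectionError(self.name)
        self.forwarded.append(op)


class Balancer:
    def __init__(self, upstreams):
        self.upstreams = list(upstreams)
        self._turn = count()

    def forward(self, client_id, msgid):
        """Forward one operation; on failure answer the client, never re-send."""
        upstream = self.upstreams[next(self._turn) % len(self.upstreams)]
        try:
            upstream.send((client_id, msgid))
        except ConnectionError:
            # No retry on another upstream: the client just gets a result.
            return ("rejected", ERROR_RESULT)
        return ("forwarded", upstream.name)
```

Operations from the same client_id can land on different upstreams, which
is the difference from a balancer that pins whole connections.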

The lloadd.8 and lloadd.conf.5 manpages are provided; the Admin Guide and
further documentation will come as the implementation matures.

-- 
Ondřej Kuzník
Senior Software Engineer
Symas Corporation                       http://www.symas.com
Packaged, certified, and supported LDAP solutions powered by OpenLDAP

Comment 2 Quanah Gibson-Mount 2020-11-17 18:58:01 UTC
Commits: 
  • 46ddb403 
by Ondřej Kuzník at 2020-11-17T17:15:40+00:00 
lloadd ahoy


  • c596b797 
by Ondřej Kuzník at 2020-11-17T17:15:40+00:00 
Backend configuration


  • 8e0a6119 
by Ondřej Kuzník at 2020-11-17T17:15:40+00:00 
Startup adjustment


  • 1a452490 
by Ondřej Kuzník at 2020-11-17T17:15:40+00:00 
Update connection init


  • bf66b48f 
by Ondřej Kuzník at 2020-11-17T17:15:40+00:00 
Upstream connection setup


  • 79f7e79f 
by Ondřej Kuzník at 2020-11-17T17:15:40+00:00 
Set up connections in the worker threads


  • b49932d6 
by Ondřej Kuzník at 2020-11-17T17:42:43+00:00 
Connection write support


  • 93fe1d2b 
by Ondřej Kuzník at 2020-11-17T17:42:44+00:00 
Operation parsing


  • fd5b9cdb 
by Ondřej Kuzník at 2020-11-17T17:42:44+00:00 
This is a proxy now


  • 5bdb4e15 
by Ondřej Kuzník at 2020-11-17T17:42:44+00:00 
Update maximum number of parameters for backend


  • 3d1ea469 
by Ondřej Kuzník at 2020-11-17T17:42:44+00:00 
Authenticate the upstream connection if configured


  • 2fbc8ca4 
by Ondřej Kuzník at 2020-11-17T17:42:44+00:00 
Rename backend mutex


  • f37e7757 
by Ondřej Kuzník at 2020-11-17T17:55:45+00:00 
Response handling, exploit optional bervals


  • 4ad8ecd4 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Logging improvements


  • e5f68bcf 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Option for response handling


  • 639c5912 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Client authentication


  • 9309bc94 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Make features global


  • 59291ba4 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Proxyauthz support


  • 94ee62a4 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Switch bindkey to use Backend instead of bindconf


  • 798e215e 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Add connection number config


  • 673513a0 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Maintain the configured amount of connections per backend


  • dc5e2538 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Configuration part for retry timeouts


  • 463bcdd2 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Update backend progress tracking


  • 8b1703d2 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Implement backend retry timeouts


  • b6b3f35a 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Fix proxyauthz handling


  • 2e2c8666 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
There might be errors before we save the operation in c_ops


  • 50f5c4be 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Report initial bind errors to client


  • 54cd3a27 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Reject operations when binding


  • e5fcf175 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Save connection ids on operation for logging purposes


  • 8f5bae92 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Pending operation tracking and limiting


  • 6c8b2acc 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Do not leak addrinfos


  • c0d254a4 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Do not leak BerElements


  • fba4bed6 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
connection reference counting


  • cddc9632 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Do not clear c_pendingber on short write


  • 028f2869 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
On a failed bind, stop the callback from firing again

Not a problem but causes a slew of calls to upstream_bind_cb that will
all fail in the same way.


  • 837a6068 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Rework client_read_cb along the lines of upstream


  • ea7e40b8 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Shutdown handling


  • 9d66c26b 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Operation reference counting


  • 7a29fabd 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Destroy the unbind operation when acted upon


  • c5584fd3 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Do not leak responses to abandoned ops


  • 07b5744c 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Retain a reference around for handle_responses


  • 77f2c571 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Reset c_*ber after freeing and check c_pendingber race


  • 6899d012 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Do not bother to write to a dying connection


  • 8eb7f3fb 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Stop the read callback on a dead connection.

The connection might be ready to read (close) but if we can't destroy it
yet, we don't want the callback to trigger all the time or process new
data.


  • 9ebe5acb 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Clean up events properly


  • 643194e7 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Revert connection/operation mutex order.

There was still a race where the connection could be freed as the
operation was still being used.


  • 58a880bc 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Convert backend and upstream management to use CIRCLEQ.

This alone doesn't make the server do a round robin.


  • e65cd387 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Round-robin for upstream connections


  • 53015aa4 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Round robin for backends.

Several threads calling backend_select might reset current_backend to a
different place. There are two options to deal with that:
- just let the last rotation win (the current approach)
- detect whether first == current_backend and only replace then

Not sure which one is more useful, so going with the simpler one.
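
A minimal sketch of the "last rotation wins" option, assuming a plain list
in place of the CIRCLEQ and a lock that is only held briefly. BackendRing,
select and the usable callback are names invented for the example; only
current_backend and backend_select come from the commit message.

```python
import threading


class BackendRing:
    def __init__(self, backends):
        self.backends = list(backends)
        self.current = 0              # plays the role of current_backend
        self.lock = threading.Lock()

    def select(self, usable=lambda backend: True):
        """Return the first usable backend starting at the shared cursor."""
        with self.lock:
            first = self.current
        total = len(self.backends)
        for step in range(total):
            index = (first + step) % total
            if usable(self.backends[index]):
                with self.lock:
                    # Last rotation wins: overwrite the cursor without
                    # checking whether another thread moved it meanwhile.
                    self.current = (index + 1) % total
                return self.backends[index]
        return None
```

The second option from the message would replace the unconditional
assignment with one guarded by `self.current == first`.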


  • ee288cfc 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Fix refcounting for all code paths


  • 37a474b5 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Fix error handling wrt. its callers


  • d020897f 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Initialise listeners after all workers have been


  • f4afc069 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Tweak connection error logging.

Do not log when receiving the last bytes on a connection. Log failed
writes.


  • cf05722b 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Lookup operations by saved connid.

We reset the connection pointer on a destruction attempt, avoid the
spurious asserts.


  • e0b8bd5f 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Free all pending operations on shutdown


  • 3f5dee0b 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Keep a list of active clients for shutdown purposes.

Potentially for timeout detection purposes in the future.


  • 26f72151 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Improve logging


  • 1dfeca35 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Another attempt at operation/connection destroy interaction.


  • 10824868 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Only enable verifycredentials if libldap does


  • 8d85912a 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
lloadd documentation


  • 015f8934 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
First test for load balancer


  • 0a075905 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Second test


  • 3fa8a0cd 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Rename listener-threads to reflect the option


  • 495dfa69 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Split client/upstream PDU size limits


  • a8a0fe26 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Documentation updates


  • c228bd11 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Be consistent with bind responses on no upstream


  • 5b1ad431 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Handle upstream connection shutdown properly


  • 7eeb5bb8 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Forward controls correctly in the face of proxyauth


  • 0e7792e8 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Borrow liblber code to get abandon processing to work


  • 6ee21f11 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Split bind configuration from backends


  • 961b600a 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Rework proxyauthz handling


  • 9d3b998a 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Document new bind configuration


  • 873d6fa3 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Handle backend unsolicited response properly


  • 05f2ac25 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Unify logging output


  • af7ce80c 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Remember and clear bind status correctly


  • 37cff373 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Manage connection refcnt better


  • 88390159 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
On connection shutdown, free op from the correct side


  • 545198c7 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Simplify abandon processing


  • 0ff462b6 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Fix issues in bind response handling


  • 46fe0143 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Make sure operation stays alive when we process it


  • 887c2661 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Update tests to match latest configuration layout


  • baf1feab 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Handle asynchronous connect properly


  • 95df8a1e 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Adjust backend operation counting


  • 33a99355 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Unblock the client when we can't find an upstream

If we can't find an upstream, we keep the client around, so it needs to
be unblocked.


  • 1dd0e513 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Only one bind at a time


  • 30e538e8 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Realign logging levels.

Stats now logs all operations, stats2 additionally intermediate messages
(search entries).


  • 1740f36b 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Fix emfile handling


  • 65def943 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
More logging improvements


  • 70464443 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Do not read on the last iteration.

When the pdu processing limit is hit, we still attempt to read another
PDU. If we succeed, the ber_get_next call in the read callback will
abort since a full PDU is already present.


  • 7b413f9e 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Update docs and defaults


  • 7b7f9724 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Avoid a deadlock with client


  • 16010e5e 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
More logging improvements


  • 622b87d5 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Make ready only when still alive


  • 31074213 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
TENTATIVE: communicate more for op destroy race


  • cda8411c 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Close up the race


  • 0ad91e05 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Do not back off until we get a failure


  • d4225924 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
CLOSING is another potential state we could be in


  • 6140cdf6 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Handle a client connection disconnected from op


  • f7cf34e6 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Reset connection state on abandon


  • e03c9e6f 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Stop processing if we freed the client


  • 532fc1bf 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Shorten time operation_mutex is locked


  • 362d5503 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Do not crash when closing both client and upstream


  • 96b7619a 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Do not unlock client unless we are destroying it


  • 5fcef01d 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Switch from a global mutex


  • cfeb4d82 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Set binding state after we have dropped all ops


  • 96f49393 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Add a load test


  • 8d93e0ba 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Unify connection locking and I/O


  • d22db36c 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
lload_libevent_init can fail and wants to log


  • 0b353106 
by Ondřej Kuzník at 2020-11-17T17:55:46+00:00 
Refactor operation_send_reject


  • c60ef739 
by Ondřej Kuzník at 2020-11-17T17:58:13+00:00 
Rework upstream conn setup


  • 7cd531c0 
by Ondřej Kuzník at 2020-11-17T17:58:14+00:00 
Improve spec conformance, logging


  • 11f47438 
by Ondřej Kuzník at 2020-11-17T17:58:14+00:00 
Exop support

At the moment, no exops are processed internally, all are passed on
unchanged.


  • b801ca17 
by Ondřej Kuzník at 2020-11-17T17:58:14+00:00 
Rename macros and symbols to lloadd


  • f27517af 
by Ondřej Kuzník at 2020-11-17T17:58:14+00:00 
Rename bind handlers


  • abab7e46 
by Ondřej Kuzník at 2020-11-17T17:58:14+00:00 
Move client related functions to client.c


  • 5ee4b676 
by Ondřej Kuzník at 2020-11-17T17:58:14+00:00 
Move bind handling to bind.c


  • ccf75c96 
by Ondřej Kuzník at 2020-11-17T17:58:14+00:00 
Update write timeout to timeval


  • 063981a0 
by Ondřej Kuzník at 2020-11-17T17:58:14+00:00 
Respond to timeout events properly


  • a0cd41ec 
by Ondřej Kuzník at 2020-11-17T17:58:14+00:00 
Upstream TLS support


  • 1b46f866 
by Ondřej Kuzník at 2020-11-17T17:58:14+00:00 
Client TLS support


  • f87127df 
by Ondřej Kuzník at 2020-11-17T17:58:14+00:00 
Set up TLS context for backends


  • b4d7e8af 
by Ondřej Kuzník at 2020-11-17T17:58:14+00:00 
We should just be able to call backend_retry


  • 0cfd4fca 
by Ondřej Kuzník at 2020-11-17T17:58:14+00:00 
Make timeouts common and redo connection read timeouts


  • a0ec50b3 
by Ondřej Kuzník at 2020-11-17T17:58:14+00:00 
Upstream queues ordered by c_connid

In preparation for operation timeout events.


  • 17900184 
by Ondřej Kuzník at 2020-11-17T17:58:14+00:00 
Record operation activity times


  • 8ba44630 
by Ondřej Kuzník at 2020-11-17T17:58:14+00:00 
Factor out abandon message preparation


  • aecc62c0 
by Ondřej Kuzník at 2020-11-17T17:58:14+00:00 
Introduce operation timeout machinery


  • c386d527 
by Ondřej Kuzník at 2020-11-17T17:58:14+00:00 
Protect currently impossible branch


  • 5cbd30de 
by Ondřej Kuzník at 2020-11-17T17:58:14+00:00 
Log timed out connections more clearly


  • ea836279 
by Ondřej Kuzník at 2020-11-17T17:58:14+00:00 
request_abandon RFC4511 conformance


  • c7e3437e 
by Ondřej Kuzník at 2020-11-17T17:58:14+00:00 
Update test suite


  • 8bc7650a 
by Nadezhda Ivanova at 2020-11-17T17:58:14+00:00 
Clean ups and renames to coexist with slapd


  • 37cd5f21 
by Nadezhda Ivanova at 2020-11-17T17:58:14+00:00 
Enable compilation of the load balancer as a module

To compile the balancer as a slapd module, pass --enable-balancer=mod to ./configure.
Use --enable-balancer(=yes) to compile as standalone server.


  • c91d61cf 
by Ondřej Kuzník at 2020-11-17T17:58:14+00:00 
Do not copy files from slapd, just link them


  • 66f06f3f 
by Ondřej Kuzník at 2020-11-17T17:58:14+00:00 
Initial extension to upstream selection


  • 1fd7249f 
by Ondřej Kuzník at 2020-11-17T17:58:14+00:00 
RFC4511 says Binds do not abandon, send a "reset" bind instead


  • ddd1acc3 
by Ondřej Kuzník at 2020-11-17T17:58:14+00:00 
Passing the client directly will allow clearing it from op


  • 21a22d1b 
by Ondřej Kuzník at 2020-11-17T17:58:14+00:00 
Refactor request parsing and sending.

We have to do most of our processing before we send the request over to
the upstream. If we don't, we might be too late and the response might
have arrived already.


  • 003a35c6 
by Ondřej Kuzník at 2020-11-17T17:58:14+00:00 
SASL bind support

Introduces pinned operations. When SASL bind finishes, we might still
have to maintain a link between the client and an upstream for future
bind operations if we got a SASL Bind in Progress result code. We zero
out the msgids and remember a server-unique identifier on the client and
the relevant operation that lets us retrieve that link again. This
operation is reclaimed just like anything else when connections drop.

Hopefully, this should work for LDAP TXN and VC Exop support with SASL
later as well since it allows for many-to-many links to exist.
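
The pinning idea can be sketched as a small lookup table, under the
assumption that a pin is just an integer stored on the client and that 0
means "not pinned". PinTable, the client dictionary and the upstream names
are hypothetical; the result code value 14 (saslBindInProgress) is taken
from RFC 4511.

```python
from itertools import count

LDAP_SUCCESS = 0
LDAP_SASL_BIND_IN_PROGRESS = 14


class PinTable:
    def __init__(self):
        self._ids = count(1)   # 0 is reserved for "not pinned"
        self.pins = {}         # pin id -> upstream handling the bind

    def on_bind_response(self, client, upstream, result_code):
        """Keep the client/upstream link only while the bind is in progress."""
        if result_code == LDAP_SASL_BIND_IN_PROGRESS:
            if not client["pin"]:
                client["pin"] = next(self._ids)
            self.pins[client["pin"]] = upstream
        else:
            # Bind finished (success or failure): drop the link.
            self.pins.pop(client["pin"], None)
            client["pin"] = 0

    def upstream_for(self, client):
        """Upstream the next bind step must go to, or None if unpinned."""
        return self.pins.get(client["pin"])
```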


  • ee893ae1 
by Ondřej Kuzník at 2020-11-17T17:58:14+00:00 
Handle EXTERNAL mechanism

Will only try to extract the TLS client certificate name if used during
the last handshake.


  • 72ca7112 
by Ondřej Kuzník at 2020-11-17T17:58:14+00:00 
Do not compare c_auth when NULL


  • c52328f6 
by Ondřej Kuzník at 2020-11-17T17:58:14+00:00 
Clear c_auth on every bind request

For a new bind request, this is obvious. For SASL bind requests, we do
not know the final identity until we have finished handling it, so make
sure it stays empty until then.


  • 5c1245de 
by Ondřej Kuzník at 2020-11-17T17:58:14+00:00 
Manage c_sasl_bind_mech on upstream


  • 2ba83368 
by Ondřej Kuzník at 2020-11-17T17:58:14+00:00 
Operation abandon related fixes


  • cbc0ec04 
by Ondřej Kuzník at 2020-11-17T17:58:14+00:00 
Fix pinned operation forwarding


  • 205db0bf 
by Ondřej Kuzník at 2020-11-17T17:58:14+00:00 
Reset pin on simple bind


  • c957bb91 
by Ondřej Kuzník at 2020-11-17T17:58:14+00:00 
Add documentation on SASL handling


  • 7a69017f 
by Ondřej Kuzník at 2020-11-17T17:58:14+00:00 
Resolve authzid after a successful auth


  • 9baa56ad 
by Ondřej Kuzník at 2020-11-17T17:58:14+00:00 
Update tests to support lloadd as a module


  • 2d330325 
by Nadezhda Ivanova at 2020-11-17T17:58:14+00:00 
Lload cn=monitor initial implementation


  • 77716069 
by Nadezhda Ivanova at 2020-11-17T17:58:14+00:00 
Use slapd's config.h


  • 678fa100 
by Nadezhda Ivanova at 2020-11-17T17:58:14+00:00 
Convert the load balancer into a backend


  • dab90547 
by Ondřej Kuzník at 2020-11-17T17:58:14+00:00 
Rework monitor startup

Takes care of dealing with monitor not present/not configured and fixes a
monitor startup issue.


  • 22818e85 
by Ondřej Kuzník at 2020-11-17T17:58:14+00:00 
Module shutdown


  • db5966f6 
by Ondřej Kuzník at 2020-11-17T17:58:14+00:00 
More meaningful connection type reporting


  • 485a1697 
by Ondřej Kuzník at 2020-11-17T17:58:14+00:00 
Implement pause handlers


  • 9bd90a74 
by Ondřej Kuzník at 2020-11-17T17:58:14+00:00 
Fix a race on bind response processing.

During response processing, an upstream connection could be marked ready
after a different bind had already been allocated to it, thus allowing
two binds to be in progress on the same connection.


  • 00116847 
by Ondřej Kuzník at 2020-11-17T17:58:14+00:00 
Cleanup sasl_bind_mech resets


  • bea9bfb3 
by Ondřej Kuzník at 2020-11-17T17:58:14+00:00 
Move op counting to operation_init


  • ca646cd0 
by Ondřej Kuzník at 2020-11-17T17:58:14+00:00 
Fix operation counts

Trying to abandon an operation does not automatically make it completed,
it might have failed already but we're just racing to reach the client
to record that.


  • 7f22bac4 
by Ondřej Kuzník at 2020-11-17T17:58:14+00:00 
Introduce a new connection status - gentle shutdown


  • bf9f99dd 
by Ondřej Kuzník at 2020-11-17T17:58:14+00:00 
Split backend destruction from resetting it


  • a7f8f58a 
by Ondřej Kuzník at 2020-11-17T17:58:14+00:00 
expose task functions for invalidation


  • cfe90658 
by Ondřej Kuzník at 2020-11-17T17:58:14+00:00 
Introduce infra to handle config changes


  • edfb3d73 
by Ondřej Kuzník at 2020-11-17T17:58:14+00:00 
Fix operation status tracking.

An operation is rejected iff it has to be dropped before we can find an
upstream for it (unless we handle it ourselves, that is). At that point
it is failed unless completed successfully.

This makes a difference for multi-stage binds which alternate between
'failed' (we are waiting on a server response) and 'completed' (server
did what we asked them to, waiting on client to continue).


  • d954216f 
by Ondřej Kuzník at 2020-11-17T17:58:14+00:00 
Change log level for unsolicited response


  • 70ae4af6 
by Ondřej Kuzník at 2020-11-17T17:58:14+00:00 
Fix interaction of graceful connection closing and SASL bind support


  • bace7959 
by Nadezhda Ivanova at 2020-11-17T17:58:14+00:00 
Enable dynamic configuration


  • 3a6b3995 
by Ondřej Kuzník at 2020-11-17T17:58:14+00:00 
Reflect backend URI change in cn=monitor


  • 4c355deb 
by Ondřej Kuzník at 2020-11-17T17:58:14+00:00 
Record the backend name


  • 362f1647 
by Ondřej Kuzník at 2020-11-17T17:58:14+00:00 
Deal with no backends being configured


  • 05d6aae4 
by Ondřej Kuzník at 2020-11-17T17:58:15+00:00 
Rework lloadd startup


  • b1c098ad 
by Ondřej Kuzník at 2020-11-17T17:58:15+00:00 
Module shutdown support


  • 1ea5ee1f 
by Ondřej Kuzník at 2020-11-17T17:58:15+00:00 
Do not unlock upstream without referencing its dying ops


  • 07401e58 
by Ondřej Kuzník at 2020-11-17T17:58:15+00:00 
Implement runtime monitor (un)registration

Unregistration is a hack and we should either make the subsystems into
an entry (if monitor allows subentry generation) or implement subsystem
unregistration in back-monitor.


  • db939eeb 
by Ondřej Kuzník at 2020-11-17T17:58:15+00:00 
Protect operation when abandoning


  • 0314f95d 
by Ondřej Kuzník at 2020-11-17T17:58:15+00:00 
Work around libevent base not waking up on shutdown


  • b039e7c1 
by Ondřej Kuzník at 2020-11-17T17:58:15+00:00 
Keep a reference around for the bind task


  • 6b10c298 
by Ondřej Kuzník at 2020-11-17T17:58:15+00:00 
Record pending DNS resolution to be able to cancel


  • db3961f4 
by Ondřej Kuzník at 2020-11-17T17:58:15+00:00 
Record connect task to allow canceling it


  • 93d20459 
by Ondřej Kuzník at 2020-11-17T17:58:15+00:00 
Make io-threads modification startup-only


  • 757c8bed 
by Ondřej Kuzník at 2020-11-17T17:58:15+00:00 
Switch to ldap_parse_url_ext

This simplifies port parsing in the end. Also pass the url to
ldap_open_listener in anticipation of incremental listener config.


  • bd7a6f67 
by Ondřej Kuzník at 2020-11-17T17:58:15+00:00 
Introduce lload_open_new_listener


  • f1ea9da3 
by Ondřej Kuzník at 2020-11-17T17:58:15+00:00 
Reorganise listener support in cn=config and module startup


  • 513659c6 
by Ondřej Kuzník at 2020-11-17T17:58:15+00:00 
Document config behaviour


  • 00806dd3 
by Ondřej Kuzník at 2020-11-17T17:58:15+00:00 
libevent 2.0 support


  • f4a2fdd4 
by Nadezhda Ivanova at 2020-11-17T17:58:15+00:00 
Fix a new backend not being operational if added via cn=config


  • 241f65b9 
by Ondřej Kuzník at 2020-11-17T17:58:15+00:00 
Fix a race in managing b_dns_req


  • 2a813cb0 
by Ondřej Kuzník at 2020-11-17T17:58:15+00:00 
Clean up backend_retry and its callers.


  • 638f8a2c 
by Ondřej Kuzník at 2020-11-17T17:58:15+00:00 
Tighten checks on retry management


  • b4f43ed8 
by Ondřej Kuzník at 2020-11-17T17:58:15+00:00 
Refactor backend reset

Reuse the connection walking facility in timeout management.


  • 3bd2d748 
by Ondřej Kuzník at 2020-11-17T17:58:15+00:00 
Reuse connection_walk for client matters


  • 63efcd63 
by Ondřej Kuzník at 2020-11-17T17:58:15+00:00 
Reuse connection walking in monitor for upstreams too


  • ef0028e5 
by Nadezhda Ivanova at 2020-11-17T17:58:15+00:00 
Initial implementation of cn=config testing script


  • 25a4d684 
by Ondřej Kuzník at 2020-11-17T17:58:15+00:00 
Permit lloadd to share slapd TLS context


  • 9444dfc9 
by Ondřej Kuzník at 2020-11-17T17:58:15+00:00 
Simplify pause handling

Gets rid of a race where unpause+pause fired in a quick succession would
miss the event_base_loopbreak() call.


  • 05e0906f 
by Ondřej Kuzník at 2020-11-17T17:58:15+00:00 
Fix backend starttls= setting being ignored


  • 50a021a3 
by Ondřej Kuzník at 2020-11-17T17:58:15+00:00 
Do not enforce a valid ld in lutil_sasl_interact


  • 4b3d2114 
by Ondřej Kuzník at 2020-11-17T17:58:15+00:00 
Introduce SASL support for upstream connections


  • 78f25a3c 
by Ondřej Kuzník at 2020-11-17T17:58:15+00:00 
A failed cn=config ADD needs to be handled


  • 34ddaa5f 
by Ondřej Kuzník at 2020-11-17T17:58:15+00:00 
Tests for monitoring support


  • bd3da732 
by Ondřej Kuzník at 2020-11-17T17:58:15+00:00 
Add TLS tests


  • c0872442 
by Ondřej Kuzník at 2020-11-17T17:58:15+00:00 
SASL and proxyauthz tests


  • 81ead4a5 
by Ondřej Kuzník at 2020-11-17T17:58:15+00:00 
Fix races with backend_retry


  • aab6af1c 
by Ondřej Kuzník at 2020-11-17T17:58:15+00:00 
Switch to LDAP_OTHER when handling a lost upstream.

LDAP_UNAVAILABLE signals "the server is shutting down or a subsystem
necessary to complete the operation is offline", so intelligent clients
tend to infer the connection will not be usable any more, which is not
the case here.
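
To make the reasoning concrete, here is a small sketch of how a client
with the behaviour described above would treat the two codes. The numeric
values are the RFC 4511 result codes; both function names and the client
policy itself are assumptions for illustration only.

```python
LDAP_UNAVAILABLE = 52
LDAP_OTHER = 80


def result_for_lost_upstream():
    """Code sent to the client when the upstream carrying its operation is lost."""
    return LDAP_OTHER


def client_keeps_connection(result_code):
    """Hypothetical client policy: abandon the connection only on 'unavailable'."""
    return result_code != LDAP_UNAVAILABLE
```

With LDAP_OTHER such a client keeps its connection to the balancer and can
issue the next operation, which may then reach a different upstream.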


  • dc1961cb 
by Ondřej Kuzník at 2020-11-17T17:58:15+00:00 
Epoch based memory reclamation

Similar to the algorithm presented in
https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-579.pdf

Not completely lock-free at the moment. Also the problems with epoch
based memory reclamation are still present - a thread actively observing
an epoch getting stuck will prevent LloadConnections and LloadOperations
being freed, potentially running out of memory.
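
The general technique can be sketched as follows. This is a deliberately
simplified, fully locked model and not the lloadd implementation: the
Epochs class, its method names and the rule of advancing the epoch on
every leave() are assumptions chosen to keep the example short.

```python
import threading
from collections import defaultdict


class Epochs:
    def __init__(self):
        self.lock = threading.Lock()
        self.current = 0
        self.observers = defaultdict(int)  # epoch -> threads inside it
        self.retired = defaultdict(list)   # epoch -> dispose callbacks

    def join(self):
        """Start observing the current epoch before touching shared objects."""
        with self.lock:
            epoch = self.current
            self.observers[epoch] += 1
            return epoch

    def retire(self, dispose):
        """Schedule an already unlinked object for disposal."""
        with self.lock:
            self.retired[self.current].append(dispose)

    def leave(self, epoch):
        """Stop observing; dispose of whatever no observer can still reach."""
        with self.lock:
            self.observers[epoch] -= 1
            if self.observers[epoch] == 0:
                del self.observers[epoch]
            self.current += 1
            oldest = min(self.observers, default=self.current)
            ready = sorted(e for e in self.retired if e < oldest)
            callbacks = [cb for e in ready for cb in self.retired.pop(e)]
        for cb in callbacks:   # run outside the lock
            cb()
```

A thread that joins and never leaves keeps `oldest` pinned at its epoch,
so nothing retired from then on is disposed of, which is the stuck-thread
problem mentioned above.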


  • f832024e 
by Ondřej Kuzník at 2020-11-17T17:58:15+00:00 
Straighten up client pending op tracking


  • b49f5187 
by Ondřej Kuzník at 2020-11-17T17:58:15+00:00 
Implement client pending operation limits


  • b2e57148 
by Ondřej Kuzník at 2020-11-17T17:58:15+00:00 
Shorten to one epoch per PDU

A full read cycle can take a very long time if the limits are set too
high.


  • 959ff079 
by Ondřej Kuzník at 2020-11-17T17:58:15+00:00 
Make sure read event is not enabled while upstream_bind is scheduled


  • 58d66a39 
by Ondřej Kuzník at 2020-11-17T17:58:15+00:00 
Fix race between unlinking a client and processing incoming data


  • 1328777a 
by Ondřej Kuzník at 2020-11-17T17:58:15+00:00 
Fix a SASL channel-binding leak


  • 62a806b2 
by Ondřej Kuzník at 2020-11-17T17:58:15+00:00 
Thread error checking


  • 68b163fc 
by Ondřej Kuzník at 2020-11-17T17:58:15+00:00 
Introduce mutex checks

Switched off unless thread debugging is on, but still useful for static
analysis.


  • 1f6d8611 
by Ondřej Kuzník at 2020-11-17T17:58:15+00:00 
Implement read throttling when writes backlog

Reject operations in such a case with LDAP_BUSY. If read_event feature
is on, just stop reading from the connection. However this could still
result in deadlocks in reasonable situations. Need to figure out better
ways to make it safe and still protect ourselves.
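
The decision described here might look roughly like the sketch below. The
function name, the string return values and the backlog_limit parameter
are hypothetical; LDAP_BUSY is result code 51 from RFC 4511 and read_event
is the feature named in the message.

```python
LDAP_BUSY = 51


def handle_request(pending_writes, backlog_limit, read_event):
    """Decide what to do with a new request given the current write backlog."""
    if pending_writes <= backlog_limit:
        return "forward"
    if read_event:
        # Stop reading from the connection until the backlog drains.
        return "pause_reads"
    # Otherwise reject the operation outright.
    return LDAP_BUSY
```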


  • 41a74b46 
by Ondřej Kuzník at 2020-11-17T17:58:15+00:00 
Introduce the notion of experimental features


  • 25fff30e 
by Ondřej Kuzník at 2020-11-17T17:58:15+00:00 
Let the last thread dispose of pending references

If we're idle, there might be objects pending cleanup for the last two
epochs. Unless another thread comes in and checks into a new epoch or we
shut down, they will linger forever.

If one of the objects was a connection, it wouldn't get closed and be
stuck in CLOSE_WAIT state, potentially refusing another legitimate
connection if its socket address were to match the one we're yet to
close.


  • dfbf25d5 
by Ondřej Kuzník at 2020-11-17T17:58:15+00:00 
Honour keepalive settings for upstreams


  • dfbed44b 
by Ondřej Kuzník at 2020-11-17T17:58:15+00:00 
Do not accept requests with msgid == 0

It is used internally to identify pinned operations and should not be
encountered over the wire.


  • 0abf3f5b 
by Ondřej Kuzník at 2020-11-17T17:58:15+00:00 
Flush cache before calling dispose()

This needs to be confirmed:
Location based atomics do not imply a full fence of the same level. So
to get the code in dispose() to read the actual data, it seems we need to
initiate a fence.


  • 323bb1d9 
by Ondřej Kuzník at 2020-11-17T17:58:15+00:00 
Handle upstream rejecting a StartTLS exop


  • 8557cc93 
by Ondřej Kuzník at 2020-11-17T17:58:15+00:00 
Add lloadd into our testing regime
Comment 3 Howard Chu 2021-05-08 15:14:53 UTC
*** Issue 9550 has been marked as a duplicate of this issue. ***
Comment 4 Quanah Gibson-Mount 2021-05-10 22:13:08 UTC
  • 2c1bb42f 
by Ondřej Kuzník at 2021-05-10T18:49:13+00:00 
ITS#8747 Do not observe an epoch while calling dispose_cb


  • 4f499755 
by Ondřej Kuzník at 2021-05-10T18:49:13+00:00 
ITS#8747 Avoid epoch recursion in connection_write_cb


  • a186fd70 
by Ondřej Kuzník at 2021-05-10T18:49:13+00:00 
ITS#8747 Do not continue reading if connection is dying


  • 3802fa92 
by Ondřej Kuzník at 2021-05-10T18:49:13+00:00 
ITS#8747 Fix lloadd builds --without-tls


  • 1cb65102 
by Ondřej Kuzník at 2021-05-10T18:49:13+00:00 
ITS#8747 Keep an explicit backend pointer


  • 8e4d7ffe 
by Ondřej Kuzník at 2021-05-10T18:49:13+00:00 
ITS#8747 Remove c_private from LloadConnection


  • cba03e49 
by Ondřej Kuzník at 2021-05-10T18:49:13+00:00 
ITS#8747 Protect shutdown code while workers are still alive
Comment 5 Quanah Gibson-Mount 2021-06-21 18:42:45 UTC
  • 2d78b627
by Ondřej Kuzník at 2021-06-21T16:36:06+00:00
ITS#8747 Allow olcBkLloadClientMaxPending in cn=config
Comment 6 Quanah Gibson-Mount 2021-06-21 22:04:58 UTC
RE25:

Commits: 
  • 89e962c9 
by Ondřej Kuzník at 2021-06-21T21:58:28+00:00 
ITS#8747 Allow olcBkLloadClientMaxPending in cn=config