Too many executing vs syncrepl




Hello list,

We are in the middle of a major upgrade of our OpenLDAP software, so it is a bit unfortunate that I have to track down issues at the same time.

os:  Solaris 10u8 x86
old: openldap-2.3.41 db-4.2.52.NC-PLUS_5_PATCHES
new: openldap-2.4.23 db-4.8.30.NC

We noticed that syncrepl stopped on pop01, pop03 and pop06 yesterday and fell behind. The only hints in the slapd log were:

Sep 28 11:23:09 pop06.unix slapd[29027]: [ID 968320 local4.debug] do_syncrep2: LDAP_RES_INTERMEDIATE - NEW_COOKIE

Sep 28 11:24:44 pop06.unix slapd[29027]: [ID 763815 local4.debug] connection_input: conn=123099 deferring operation: too many executing

Sep 28 11:24:44 pop06.unix slapd[29027]: [ID 763815 local4.debug] connection_input: conn=123099 deferring operation: pending operations

Sep 28 11:24:48 pop06.unix last message repeated 72 times

and there were no more syncrepl messages until we restarted slapd 2 hours later. I wonder whether the syncrepl connection itself received "too many executing". Is that possible? Can we give sync connections a higher priority, as it were? In this case, a new-ldap consumer runs syncrepl against an old-ldap provider, and serves loopback lookups (dovecot).

Now, I would guess that getting "too many executing" is undesirable. From googling around, it seems that this happens when one connection already has more than half of the connection pool's operations executing, so further operations on that connection get deferred.
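To make that reading concrete, here is a minimal model of the check as I understand it. It is a sketch, not slapd's actual code: the function name, the argument names and the pool size of 16 used in the example calls are all hypothetical, and the "half of the pool" threshold is an assumption based on the description above.

```python
# Simplified model of the per-connection deferral decision; NOT slapd's code.
# Assumption: the limit is half of the configured thread pool size, and the
# counters are kept per connection (per conn=ID in the log).

def classify_operation(executing_on_conn, pending_on_conn, pool_max_threads):
    """Return how a newly arrived operation on one connection would be treated."""
    if executing_on_conn >= pool_max_threads // 2:
        # This connection already occupies half of the pool.
        return "deferring operation: too many executing"
    if pending_on_conn > 0:
        # Earlier operations on this connection are still queued, so keep order.
        return "deferring operation: pending operations"
    return "execute"


if __name__ == "__main__":
    print(classify_operation(8, 0, 16))  # at the assumed limit
    print(classify_operation(0, 3, 16))  # queued behind earlier operations
    print(classify_operation(1, 0, 16))  # runs immediately
```

If this model is right, the two messages seen for conn=123099 would both refer to the same single connection rather than to a source IP.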

What does "one connection" mean? Everything from one IP (all connections are over loopback, except for syncrepl), or the operations from one TCP stream? Or is it some other kind of cookie, like the rid?

Can I get slapd to tell me which connection it actually means? Having looked at the sources, it does not seem to have that ability, but I could always add our own prints, at least to get the IP of the requester. (I tried "conns" in loglevel, but it prints all select() calls, which is unfortunately impractical on live servers. Currently I have 'stats' running.)
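Short of patching the sources, the conn= number in the deferral message can be matched against the ACCEPT lines that the 'stats' level logs when a connection is opened. The script below is a rough sketch of that correlation. It assumes the log lines are unwrapped (one message per line) and that ACCEPT lines look like "conn=N fd=M ACCEPT from IP=addr:port (...)"; the function name is made up, and syslog's "last message repeated" lines are not counted.

```python
import re
from collections import Counter

# Patterns for the two message types of interest (assumed formats, see above).
DEFER_RE = re.compile(r"conn=(\d+) deferring operation: (.+)$")
ACCEPT_RE = re.compile(r"conn=(\d+) fd=\d+ ACCEPT from (\S+)")


def summarize_deferrals(lines):
    """Count deferrals per (connection, reason) and attach the peer address
    taken from that connection's ACCEPT line, if one was seen."""
    peers = {}
    deferrals = Counter()
    for line in lines:
        line = line.rstrip("\n")
        match = ACCEPT_RE.search(line)
        if match:
            peers[match.group(1)] = match.group(2)
            continue
        match = DEFER_RE.search(line)
        if match:
            deferrals[(match.group(1), match.group(2))] += 1
    # Most frequently deferred connections first.
    return [
        (conn, reason, count, peers.get(conn, "unknown"))
        for (conn, reason), count in deferrals.most_common()
    ]
```

It can be fed an open log file, for example `summarize_deferrals(open(path))` with `path` pointing at wherever local4.debug ends up on the host. A peer of "unknown" means the ACCEPT line was rotated out or logged before the part of the file that was read.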

Or, rather than hacking at the sources, should I invest in getting the overlay "monitor" to run? Would it show why we receive "too many executing"?


I have also noticed a considerable performance drop when moving from the old version to the new version, and I am not entirely sure whether that is something we can do anything about.


Following this email are the juicy parts of the slapd configuration used on most of our slave/loopback slapd instances.


--
Jorgen Lundman       | <lundman@lundman.net>
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500          (cell)
Japan                | +81 (0)3 -3375-1767          (home)


loglevel        sync stats
access to *
        by dn.base="cn=replicator,dc=company,dc=jp" read
        by * break

access to attrs=userPassword
        by self write
        by anonymous auth
        by * none
access to *
        by self write
        by dn="cn=admin,dc=company,dc=jp" write
        by peername.ip=172.20.12.6    none
        by peername.ip=172.20.12.16   none
        by peername.ip=172.20.12.26   none
        by peername.ip=172.20.12.36   none
        by peername.ip=172.20.12.46   none
        by peername.ip=172.20.12.56   none
        by peername.ip=172.20.12.66   none
        by peername.ip=172.20.12.76   none
        by * read

password-hash {CRYPT}

database        hdb
suffix          "dc=company,dc=jp"
rootdn          "cn=admin,dc=company,dc=jp"
directory       /usr/local/var/openldap-data

# Indices to maintain
index   objectClass     eq
##index   uid                     pres,eq
index   uid                     eq
index   uidNumber               eq
index   mail                    eq
index   mailAlternateAddress    pres,eq
index   deliveryMode            eq
index   accountStatus           eq
index   gecos                   eq
index   radiusGroupName         eq
index   o                       pres,eq
index   entryCSN,entryUUID      eq
index   gidNumber               eq
index   DNSType                 eq
index   DNSIPAddr               eq
index   DNSData                 eq
index   DNSHostName             eq


checkpoint 128 15
cachesize 5000
idlcachesize 15000

overlay syncprov

syncprov-checkpoint 100 10
syncprov-sessionlog 100

dbconfig set_lk_detect DB_LOCK_DEFAULT
dbconfig set_lg_max 52428800
dbconfig set_cachesize 4 0 1
dbconfig set_flags db_log_autoremove
dbconfig set_lk_max_objects 1500
dbconfig set_lk_max_locks 1500
dbconfig set_lk_max_lockers 1500

# rid is last octet of IP, plus 256.
syncrepl   rid=279
                provider=ldap://172.20.12.163
                type=refreshAndPersist
                interval=00:00:00:30
                searchbase="dc=company,dc=jp"
                filter="(objectClass=*)"
                attrs="*,+"
                scope=sub
                schemachecking=off
                bindmethod=simple
                binddn="cn=admin,dc=company,dc=jp"
                credentials="<secret>"
                retry="60 10 300 +"

updateref       ldap://172.20.12.163