Re: Too many executing vs syncrepl




> Do you have different versions of OpenLDAP on the master vs the
> replicas? Or did you upgrade everything at once? How large is your
> database? How many entries does your database have? Are you using a disk
> cache or a memory cache for BDB?

Unfortunately, yes. In this case, ldapmaster and ldapslave01 are both 2.4.23, but pop06 runs a loopback LDAP at 2.3.41.

This is not supported and clearly undesirable, but with so many hosts it takes a long time to schedule maintenance windows and upgrade. Plus, we admins need to sleep occasionally.

The database is:

-rw-------   1 root     root        2.4G Sep 30 08:55 id2entry.bdb

for a total of about 7GB (including BDB environment files, but no transaction files).

We already upgraded the machines to 8GB RAM following a previous conversation. DB stats report (ldapmaster):

4GB     Total cache size
8       Number of caches
8       Maximum number of caches
512MB   Pool individual cache size
0       Maximum memory-mapped file size
0       Maximum open file descriptors
0       Maximum sequential buffer writes
0       Sleep after writing maximum sequential buffers
0       Requested pages mapped into the process' address space
102M    Requested pages found in the cache (99%)
204394  Requested pages not found in the cache
4282    Pages created in the cache
204394  Pages read into the cache
402689  Pages written from the cache to the backing file
0       Clean pages forced from the cache
0       Dirty pages forced from the cache
0       Dirty pages written by trickle-sync thread
208575  Current total page count
208559  Current clean page count
16      Current dirty page count
524296  Number of hash buckets used for page location
4096    Assumed page size used
102M    Total number of times hash chains searched for a page (102254300)
18      The longest hash chain searched for a page
125M    Total number of hash chain entries checked for page (125288450)
0       The number of hash bucket locks that required waiting (0%)
0       The maximum number of times any hash bucket lock was waited for (0%)
50      The number of region locks that required waiting (0%)
0       The number of buffers frozen
0       The number of buffers thawed
0       The number of frozen buffers freed
208752  The number of page allocations

and on pop loopback ldap:

80MB 2KB 912B   Total cache size.
1       Number of caches.
80MB 8KB        Pool individual cache size.
0       Requested pages mapped into the process' address space.
1439M   Requested pages found in the cache (99%).
18M     Requested pages not found in the cache.
1954    Pages created in the cache.
18M     Pages read into the cache.
255279  Pages written from the cache to the backing file.
18M     Clean pages forced from the cache.
40207   Dirty pages forced from the cache.
0       Dirty pages written by trickle-sync thread.
9069    Current total page count.
9055    Current clean page count.
14      Current dirty page count.
8191    Number of hash buckets used for page location.
1476M   Total number of times hash chains searched for a page.
9       The longest hash chain searched for a page.
3383M   Total number of hash buckets examined for page location.
2992M   The number of hash bucket locks granted without waiting.
44448   The number of hash bucket locks granted after waiting.
3233    The maximum number of times any hash bucket lock was waited for.
63M     The number of region locks granted without waiting.
76892   The number of region locks granted after waiting.
18M     The number of page allocations.
36M     The number of hash buckets examined during allocations
728     The max number of hash buckets examined for an allocation
18M     The number of pages examined during allocations
360     The max number of pages examined for an allocation
(Much smaller, as it shares resources. It also only syncrepls the mail tree from LDAP.)
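
As a rough sanity check on the two reports above, the cache hit rate can be recomputed from the hit and miss counters. This is only a sketch: hit_ratio is a made-up helper name, and the inputs are the rounded figures as printed (102M, 1439M, 18M), so the results are approximate.

```python
def hit_ratio(hits, misses):
    """Fraction of page requests that were served from the BDB cache."""
    total = hits + misses
    return hits / total if total else 0.0

# Counters copied from the reports above; the "M" values are rounded
# by db_stat, so treat these as approximations.
master = hit_ratio(102_000_000, 204_394)
pop06 = hit_ratio(1_439_000_000, 18_000_000)

print(f"ldapmaster: {master:.2%}")
print(f"pop06:      {pop06:.2%}")
```

Both round to the 99% shown in the reports, but pop06 has forced 18M clean pages out of its 80MB cache while ldapmaster has forced none, which suggests the smaller cache is turning over constantly.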


> Are you using a disk cache or a memory cache for BDB?

You can do that now? I'm afraid the only BDB-specific work I have done is the DB_CONFIG entries shown in the slapd.conf I posted previously. Repeated here for your convenience:

set_lk_detect DB_LOCK_DEFAULT
set_lg_max 52428800
set_cachesize 4 0 8
set_flags db_log_autoremove
set_lk_max_objects 1500
set_lk_max_locks 1500
set_lk_max_lockers 1500
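
For reference, set_cachesize takes gigabytes, additional bytes, and the number of cache regions, so "4 0 8" should work out to the 4GB total and 512MB per cache seen in the ldapmaster report. A minimal sketch of that arithmetic follows; parse_cachesize is a hypothetical helper, not part of any BDB or OpenLDAP tool.

```python
def parse_cachesize(line):
    """Parse a DB_CONFIG 'set_cachesize <gbytes> <bytes> <ncache>' line.

    Returns (total_bytes, regions, bytes_per_region).
    """
    name, gbytes, extra_bytes, ncache = line.split()
    if name != "set_cachesize":
        raise ValueError(f"not a set_cachesize line: {line!r}")
    total = int(gbytes) * 1024**3 + int(extra_bytes)
    # Assumption for this sketch: an ncache of 0 is treated as one region.
    regions = max(int(ncache), 1)
    return total, regions, total // regions

total, regions, per_region = parse_cachesize("set_cachesize 4 0 8")
print(total // 1024**2, "MB total,", regions, "regions,",
      per_region // 1024**2, "MB each")
```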



Now, looking at "too many executing", we do need to do something about it. It happens on both old and new versions, so it could just be that we are at capacity already.

If I grep for the message on ldapmaster, we get 0 for all days. Of course, the master only syncrepls to the 4 slaves. The 4 slaves, in turn, syncrepl to all the loopback LDAPs, but also get the occasional direct request.


# gzgrep "too many executing" /var/log/slaplog-201009$day.gz | wc -l

day   master slave01 slave02   pop06
29    0      16      3689      66
28    0      21      2916      65
27    0      0       582       27
26    0      0       839       2
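
The per-day counts above could also be collected with a small script instead of running gzgrep by hand for each day. This is a sketch only: count_matches and daily_counts are made-up names, and the default path template simply mirrors the log naming in the command above.

```python
import gzip

def count_matches(path, needle="too many executing"):
    """Count lines containing needle in a gzip-compressed log file."""
    with gzip.open(path, "rt", errors="replace") as log:
        return sum(1 for line in log if needle in line)

def daily_counts(days, template="/var/log/slaplog-201009{day}.gz"):
    """Map each day to its match count; template is an assumed naming scheme."""
    return {day: count_matches(template.format(day=day)) for day in days}
```

Calling daily_counts([26, 27, 28, 29]) on each host should give the same numbers as the gzgrep | wc -l pipeline, since both count matching lines.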

The difference between slave01 and slave02 here is the "idlcachesize 15000" line. We added it to slave01 last week, and I did the 6 am LiveUP to add it to slave02 today. Looks like that is a step in the right direction.




--
Jorgen Lundman       | <lundman@lundman.net>
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500          (cell)
Japan                | +81 (0)3 -3375-1767          (home)