[Date Prev][Date Next] [Chronological] [Thread] [Top]

Performance issues lately.




Hello list,

Solaris 10u8
db-4.8.30.NC
openldap-2.4.23

So in the last week, simple queries have been really slow. For example:

# time /usr/local/bin/ldapsearch -h 0 -D cn=admin,dc=company,dc=com -x -w pass -b ou=mail,dc=company,dc=com mail=testlookup@domain.com
1m21s

Of course "mail" in indexed. The server is dedicated to LDAP (test server) so there are no other connections. Not even syncrepl connection.

I have checked the usual things, memory usage, BDB cache, and disk IO. All seem to be within reasonable limits. Then I went deeper.

I started trussing it, and it seems to send the query reply nearly right away, then sit around 200 seconds before we send a few more bytes and exit.

I tried out db_stat on various db, and found it is also really slow on stat on "id2entry.bdb". For example:

# time /usr/local/BerkeleyDB.4.8/bin/db_stat -d id2entry.bdb
2m32s

So, trussing that call (including all calls in libdb-4.2):

19700/1@1:       0.0001 -> libdb-4.8:__db_stat_print_pp(0x419920, 0x0, 0x412c30,
 0x4146a0, 0xfffffd7fffdff880, 0xfffffd7fffdff824)
19700/1:         0.0009 time()                                          = 128972
5450
19700/1:         0.0001 brk(0x0041EC20)                                 = 0
19700/1:         0.0001 brk(0x00422C20)                                 = 0
19700/1:         0.0001 open("/usr/share/lib/zoneinfo/Japan", O_RDONLY) = 4
19700/1:         0.0000 fstat(4, 0xFFFFFD7FFFDFF880)                    = 0
19700/1:         0.0001 read(4, 0xFFFFFD7FFEEA0880, 125)                = 125
19700/1:           T Z i f\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\003\0\0\003\0\0\0
\0
19700/1:          \0\0\0\t\0\0\003\0\0\0\fC3CE85 pD7 >1E90D7EC1680D8F91690D9CBF8
80
19700/1:          DB071D10DBABDA80DCE6FF10DD8BBC80020102010201020102\0\0 ~90\0\0
\0
19700/1:          \08CA00104\0\0 ~90\0\b C J T\0 J D T\0 J S T\0\0\0\0\0\0\0
19700/1:         0.0000 close(4)                                        = 0
19700/1:         0.0002 ioctl(1, TCGETA, 0xFFFFFD7FFFDFE8B0)            = 0
19700/1:         0.0000 fstat(1, 0xFFFFFD7FFFDFE840)                    = 0
19700/1:         0.0001 write(1, 0xFFFFFD7FFF006764, 36)                = 36
19700/1:           S u n   N o v   1 4   1 8 : 0 4 : 1 0   2 0 1 0\t L o c a l
 t
19700/1:           i m e\n
19700/1:        589.8794        write(1, 0xFFFFFD7FFF006764, 25)
= 25
19700/1:           5 3 1 6 2\t B t r e e   m a g i c   n u m b e r\n
19700/1:         0.0046 write(1, 0xFFFFFD7FFF006764, 23)                = 23


Not entirely sure why a write (everything leading up) to stdout takes 589 seconds.

I have tried rebuilding the whole database with slapadd but made no difference. I put all DB files in /tmp on x4540 (has 32G RAM) and yet, still 2mins to do db_stat.


General db_stat -m output:


4GB     Total cache size
8       Number of caches
8       Maximum number of caches
512MB   Pool individual cache size
0       Maximum memory-mapped file size
0       Maximum open file descriptors
0       Maximum sequential buffer writes
0       Sleep after writing maximum sequential buffers
0       Requested pages mapped into the process' address space
407M    Requested pages found in the cache (99%)
38      Requested pages not found in the cache
425405  Pages created in the cache
38      Pages read into the cache
425427  Pages written from the cache to the backing file
0       Clean pages forced from the cache
0       Dirty pages forced from the cache
0       Dirty pages written by trickle-sync thread
425427  Current total page count
425427  Current clean page count
0       Current dirty page count
524296  Number of hash buckets used for page location
4096    Assumed page size used
407M    Total number of times hash chains searched for a page (407904893)
32      The longest hash chain searched for a page
466M    Total number of hash chain entries checked for page (466861223)
0       The number of hash bucket locks that required waiting (0%)
0       The maximum number of times any hash bucket lock was waited for (0%)
0       The number of region locks that required waiting (0%)
0       The number of buffers frozen
0       The number of buffers thawed
0       The number of frozen buffers freed
425538  The number of page allocations
0       The number of hash buckets examined during allocations
0       The maximum number of hash buckets examined for an allocation
0       The number of pages examined during allocations
0       The max number of pages examined for an allocation
0       Threads waited on page I/O
0       The number of times a sync is interrupted


Pool File: id2entry.bdb
16384   Page size
0       Requested pages mapped into the process' address space
13M     Requested pages found in the cache (99%)
2       Requested pages not found in the cache
163449  Pages created in the cache
2       Pages read into the cache
163451  Pages written from the cache to the backing file

Pool File: mail.bdb
4096    Page size
0       Requested pages mapped into the process' address space
2797594 Requested pages found in the cache (99%)
2       Requested pages not found in the cache
3335    Pages created in the cache
2       Pages read into the cache
3337    Pages written from the cache to the backing file



-rw-r--r--   1 root     root         168 Nov 14 12:54 DB_CONFIG
-rw-------   1 root     root        3.8M Nov 14 14:03 DNSData.bdb
-rw-------   1 root     root        5.4M Nov 14 14:03 DNSHostName.bdb
-rw-------   1 root     root        3.3M Nov 14 14:03 DNSIPAddr.bdb
-rw-------   1 root     root        2.7M Nov 14 14:03 DNSType.bdb
-rw-------   1 root     root         24K Nov 14 12:54 __db.001
-rw-------   1 root     root        180M Nov 14 18:28 __db.002
-rw-------   1 root     root        512M Nov 14 13:45 __db.003
-rw-------   1 root     root        512M Nov 14 14:35 __db.004
-rw-------   1 root     root        512M Nov 14 14:40 __db.005
-rw-------   1 root     root        512M Nov 14 13:45 __db.006
-rw-------   1 root     root        512M Nov 14 13:45 __db.007
-rw-------   1 root     root        512M Nov 14 13:45 __db.008
-rw-------   1 root     root        512M Nov 14 14:35 __db.009
-rw-------   1 root     root        512M Nov 14 14:27 __db.010
-rw-------   1 root     root        160K Nov 14 12:54 __db.011
-rw-------   1 root     root        1.2M Nov 14 18:25 __db.012
-rw-------   1 root     root         32K Nov 14 12:54 __db.013
-rw-------   1 root     root        2.1M Nov 14 14:03 accountStatus.bdb
-rw-r--r--   1 root     root        2.0K Nov 14 16:12 alock
-rw-------   1 root     root        536K Nov 14 14:03 deliveryMode.bdb
-rw-------   1 root     root        621M Nov 14 14:07 dn2id.bdb
-rw-------   1 root     root        172M Nov 14 14:03 entryCSN.bdb
-rw-------   1 root     root        125M Nov 14 14:03 entryUUID.bdb
-rw-------   1 root     root         17M Nov 14 14:03 gecos.bdb
-rw-------   1 root     root        1.8M Nov 14 14:03 gidNumber.bdb
-rw-------   1 root     root        2.5G Nov 14 14:40 id2entry.bdb
-rw-------   1 root     root         50M Nov 14 16:12 log.0000000266
-rw-------   1 root     root         13M Nov 14 14:03 mail.bdb
-rw-------   1 root     root        3.2M Nov 14 14:03 mailAlternateAddress.bdb
-rw-------   1 root     root        8.7M Nov 14 14:03 o.bdb
-rw-------   1 root     root        4.9M Nov 14 14:03 objectClass.bdb
-rw-------   1 root     root        1.3M Nov 14 14:03 radiusGroupName.bdb
-rw-------   1 root     root         20M Nov 14 14:03 uid.bdb
-rw-------   1 root     root         17M Nov 14 14:03 uidNumber.bdb


checkpoint 128 15
cachesize 5000
idlcachesize 15000

overlay syncprov

syncprov-checkpoint 100 10
syncprov-sessionlog 1000


# DB_CONFIG FILE. Warning, these entries are only written if the DB_CONFIG
# file DOES NOT EXIST. So if you change any of these values, you will need
# to remove the file, then restart slapd.
dbconfig set_lk_detect DB_LOCK_DEFAULT
dbconfig set_lg_max 52428800
dbconfig set_cachesize 4 0 8
dbconfig set_flags db_log_autoremove
dbconfig set_lk_max_objects 1500
dbconfig set_lk_max_locks 1500
dbconfig set_lk_max_lockers 1500
# DB_CONFIG FILE.

Any suggestions? Must have hit some limit somewhere I guess.

Lund