[Date Prev][Date Next] [Chronological] [Thread] [Top]

slow searches in hdb for non-root searchbase before results are cached



I have an OL 2.2.23 directory (assume root suffix dc=foo) with about 1.6
million entries in ou=people,dc=foo.  DB is configured to use 4 gigs of
cache (in a shared memory segment) and to mmap any file in the database
directory.  When slapd starts, it immediately reserves the 4 gigs and
loads about 50 Megs resident.

Assuming that there is an entry with a DN of uid=eric,ou=people,dc=foo,
then I can search search for the objectclass attribute (or any other) for
uid=eric with a search base of dc=foo, slapd reads a few more Megs into
memory and gives me my result almost instantly.

However, if I specify a search base of ou=people,dc=foo, the search takes
over a minute while slapd pegs three CPUs and loads 800 Megabytes of
additional data into resident process space (larger than any one of my
indexes).  During this time, truss shows the main system calls being
performed as the following.

syscall               seconds   calls  errors
lwp_park                9.363  458404
lwp_unpark              7.391  454183
lwp_mutex_wakeup        3.199  195399
lwp_mutex_lock          4.481  180412


I seem to remember seeing a thread about hdb and suffixes other than the
root suffix, but I can't find it.  I figure this has something to do with
the more concise database structure used by hdb (don't recall seeing this
problem with bdb), but I also figure there's got to be something that can
be done about it.  Ideas?  Howard?

The problem is not disk I/O.  I have enough RAM to get the whole database
into the filesystem cache so the only system resource that gets saturated
is CPU.  The load was up at about 3 (on a four-processor system).

In slapd.conf I have cachesize 100000 and idlcachesize of 500000.

loglevel 1 shows the following while running the search:

Mar 24 14:45:19 host slapd[3116]: [ID 292637 local3.debug] slapd startup: initiated.
Mar 24 14:45:19 host slapd[3116]: [ID 353753 local3.debug] backend_startup: starting "dc=foo"
Mar 24 14:45:19 host slapd[3116]: [ID 881709 local3.debug] bdb_db_open: dbenv_open(/var/ldap/entdir.utexas.edu)
Mar 24 14:45:27 host slapd[3116]: [ID 100111 local3.debug] slapd starting
Mar 24 14:45:35 host slapd[3116]: [ID 611214 local3.debug] connection_get(11): got connid=0
Mar 24 14:45:35 host slapd[3116]: [ID 138202 local3.debug] connection_read(11): checking for input on id=0
Mar 24 14:45:35 host slapd[3116]: [ID 611214 local3.debug] connection_get(11): got connid=0
Mar 24 14:45:35 host slapd[3116]: [ID 138202 local3.debug] connection_read(11): checking for input on id=0
Mar 24 14:45:35 host slapd[3116]: [ID 418971 local3.debug] connection_read(11): unable to get TLS client DN, error=49 id=0
Mar 24 14:45:35 host slapd[3116]: [ID 611214 local3.debug] connection_get(11): got connid=0
Mar 24 14:45:35 host slapd[3116]: [ID 138202 local3.debug] connection_read(11): checking for input on id=0
Mar 24 14:45:35 host slapd[3116]: [ID 812316 local3.debug] ber_get_next on fd 11 failed errno=11 (Resource temporarily unavailable)
Mar 24 14:45:35 host slapd[3116]: [ID 948228 local3.debug] do_bind
Mar 24 14:45:35 host slapd[3116]: [ID 198467 local3.debug] >>> dnPrettyNormal: <uid=erici,ou=people,dc=foo>
Mar 24 14:45:35 host slapd[3116]: [ID 147344 local3.debug] <<< dnPrettyNormal: <uid=erici,ou=people,dc=foo>, <uid=erici,ou=people,dc=foo>
Mar 24 14:45:35 host slapd[3116]: [ID 286280 local3.debug] do_bind: version=3 dn="uid=erici,ou=people,dc=foo" method=128
Mar 24 14:45:35 host slapd[3116]: [ID 449132 local3.debug] bdb_dn2entry("uid=erici,ou=people,dc=foo")
Mar 24 14:45:35 host slapd[3116]: [ID 548982 local3.debug] entry_decode: ""
Mar 24 14:45:35 host slapd[3116]: [ID 184541 local3.debug] <= entry_decode()
Mar 24 14:45:35 host slapd[3116]: [ID 469106 local3.debug] do_bind: v3 bind: "uid=erici,ou=people,dc=foo" to "uid=erici,ou=people,dc=foo"
Mar 24 14:45:35 host slapd[3116]: [ID 246281 local3.debug] send_ldap_result: conn=0 op=0 p=3
Mar 24 14:45:35 host slapd[3116]: [ID 324658 local3.debug] send_ldap_response: msgid=1 tag=97 err=0
Mar 24 14:45:35 host slapd[3116]: [ID 611214 local3.debug] connection_get(11): got connid=0
Mar 24 14:45:35 host slapd[3116]: [ID 138202 local3.debug] connection_read(11): checking for input on id=0
Mar 24 14:45:35 host slapd[3116]: [ID 812316 local3.debug] ber_get_next on fd 11 failed errno=11 (Resource temporarily unavailable)
Mar 24 14:45:35 host slapd[3116]: [ID 940369 local3.debug] do_search
Mar 24 14:45:35 host slapd[3116]: [ID 198467 local3.debug] >>> dnPrettyNormal: <ou=people,dc=foo>
Mar 24 14:45:35 host slapd[3116]: [ID 147344 local3.debug] <<< dnPrettyNormal: <ou=people,dc=foo>, <ou=people,dc=foo>
Mar 24 14:45:35 host slapd[3116]: [ID 948326 local3.debug] ==> limits_get: conn=0 op=1 dn="uid=erici,ou=people,dc=foo"
Mar 24 14:45:35 host slapd[3116]: [ID 771232 local3.debug] => hdb_search
Mar 24 14:45:35 host slapd[3116]: [ID 449132 local3.debug] bdb_dn2entry("ou=people,dc=foo")
Mar 24 14:45:35 host slapd[3116]: [ID 548982 local3.debug] entry_decode: ""
Mar 24 14:45:35 host slapd[3116]: [ID 184541 local3.debug] <= entry_decode()
Mar 24 14:45:35 host slapd[3116]: [ID 180482 local3.debug] search_candidates: base="ou=people,dc=foo" (0x00000003) scope=2
Mar 24 14:45:35 host slapd[3116]: [ID 583753 local3.debug] => hdb_dn2idl( "ou=people,dc=foo" )


Then there is no more log data for a few minutes.


-- 
Eric Irrgang - UT Austin ITS Unix Systems - (512)475-9342