
Re: slapd stability problems with add/change operations





--On Friday, August 12, 2005 4:03 PM +0200 Adrian Gschwend <adrian.gschwend@bfh.ch> wrote:

Quanah Gibson-Mount wrote:

[snip]

# Sven Hartge reported that he had to set this value incredibly high
# to get slapd running at all. See http://bugs.debian.org/303057
# for more information.

# Number of objects that can be locked at the same time.
set_lk_max_objects      5000
# Number of locks (both requested and granted)
set_lk_max_locks        5000
# Number of lockers
set_lk_max_lockers      5000
--

His bug report is interesting as well, because they write that one has
to *redo* (slapcat/slapadd) the complete database for the changes to
take effect. I didn't know that at first.

I think this is incorrect. Any time you make changes to DB_CONFIG, the BDB environment has to be updated. The quickest way to do that in 2.2 is to shut down slapd, run db_recover, and then restart slapd. OpenLDAP 2.3 takes care of DB_CONFIG changes for you.
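
For 2.2, that sequence could be sketched roughly as below. The database directory (/var/lib/ldap), the init script path (/etc/init.d/slapd) and the unversioned db_recover name are assumptions, not taken from this thread; some systems install the tool under a versioned name such as db_recover-4.2. With DRY_RUN=1 the function only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: make DB_CONFIG changes take effect under OpenLDAP 2.2.
# All three defaults below are assumptions; adjust them for your system.
DB_DIR="${DB_DIR:-/var/lib/ldap}"
SLAPD_INIT="${SLAPD_INIT:-/etc/init.d/slapd}"
DB_RECOVER="${DB_RECOVER:-db_recover}"

run() {
    # In dry-run mode print the command, otherwise execute it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

apply_db_config() {
    # Stop slapd, rebuild the BDB environment (which re-reads DB_CONFIG),
    # then start slapd again. Each step runs only if the previous one succeeded.
    run "$SLAPD_INIT" stop &&
    run "$DB_RECOVER" -h "$DB_DIR" &&
    run "$SLAPD_INIT" start
}
```

Running it first with DRY_RUN=1 set is a cheap way to check the paths before touching a live server.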



So I redid my DB as well with the following DB_CONFIG file:

--
# set cachesize to 2MB for now
set_cachesize   0       2097152         0

# Number of objects that can be locked at the same time.
set_lk_max_objects      10000
# Number of locks (both requested and granted)
set_lk_max_locks        10000
# Number of lockers
set_lk_max_lockers      10000
--

Note that I have no clue how to choose the number of lock objects. There
are some docs, but since I don't know how many locks a read operation
requires, it's hard for me to judge. I thought 10'000 should be fine,
but I have already found config files with 100'000 (probably their DB is
much bigger than mine).


I seriously doubt you are running out of locks. It took me 88 indices on a 400k entry DB to have to move from the default to 3k locks.
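
One way to check this rather than guess is to look at the lock statistics of the BDB environment. A minimal sketch, assuming the database directory is /var/lib/ldap (an assumption) and using the db_stat-4.2 name that appears elsewhere in this thread; the exact wording of the statistics lines varies between BDB versions, so the grep pattern is only a rough filter:

```shell
#!/bin/sh
# Sketch: show the configured lock limits next to the observed maximums.
# DB_DIR is an assumption; DB_STAT follows the versioned name used in this thread.
DB_STAT="${DB_STAT:-db_stat-4.2}"
DB_DIR="${DB_DIR:-/var/lib/ldap}"

lock_usage() {
    # -c: display locking subsystem statistics
    # -h: home directory of the BDB environment
    "$DB_STAT" -c -h "$DB_DIR" | grep -i 'maximum number'
}
```

Calling lock_usage after a stress run should show whether the observed maximums come anywhere near the set_lk_max_* values. If they stay well below, the hang is probably not caused by running out of locks.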


As a stress test we did this:

- launch a set of add operations from the metadb (about 110 adds). This
first does quite a few reads to see what is missing.
- launch two syncs of the two ldap-slaves we have


Is this with syncRepl?  If yes, see the bottom bit of this email.


Like this we can reproducibly hang slapd within seconds. Note that I
don't get any hints in the logfile about why it hangs. The last entry I
see with loglevel 256 is the add operation, then it hangs.

I do have the impression that it took a bit longer to hang it since I've
changed the locks to 10'000. But as I said, I don't know if this should
be sufficient now for my databases.

some stats:

--
# db_stat-4.2 -d id2entry.bdb:
53162   Btree magic number.
9       Btree version number.
Flags:  little-endian
2       Minimum keys per-page.
16384   Underlying database page size.
2       Number of levels in the tree.
2613    Number of unique keys in the tree.
2613    Number of data items in the tree.
1       Number of tree internal pages.
...
--

--
# db_stat-4.2 -d dn2id.bdb
53162   Btree magic number.
9       Btree version number.
Flags:  duplicates, little-endian
2       Minimum keys per-page.
4096    Underlying database page size.
3       Number of levels in the tree.
5300    Number of unique keys in the tree.
15537   Number of data items in the tree.
6       Number of tree internal pages.
...
--

-> not a size issue I hope ;)

Ah, and what I didn't mention so far: this machine is an SMP box
(2 CPUs). Not sure if this matters or not.

So, once again I'm running out of ideas here. Comments are welcome.


It might, although I don't have such a problem on my 4 CPU Solaris boxes. There was a recent ITS#3456 about FreeBSD's threading setup, but I don't know if it actually applies here. Are you using syncRepl?


In any case, it may be worthwhile to build with debugging symbols, and then when slapd locks up, attach gdb to the process and get a backtrace showing where all the threads are.
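
A rough sketch of the gdb step, assuming a gdb recent enough to support the -ex option and that you already know the PID of the hung slapd (the function name is just for illustration):

```shell
#!/bin/sh
# Sketch: dump a backtrace of every thread in a running process.
GDB="${GDB:-gdb}"

slapd_backtrace() {
    pid="$1"
    [ -n "$pid" ] || { echo "usage: slapd_backtrace PID" >&2; return 2; }
    # -p PID : attach to the running process
    # -batch : run the given commands, then detach and exit
    # -ex CMD: execute a gdb command; "thread apply all bt" backtraces every thread
    "$GDB" -p "$pid" -batch -ex "thread apply all bt"
}
```

Redirecting the output to a file (for example, slapd_backtrace PID > slapd-threads.txt) gives something that can be posted to the list. Without debugging symbols the trace will mostly show addresses instead of function names.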


--Quanah

--
Quanah Gibson-Mount
Principal Software Developer
ITSS/Shared Services
Stanford University
GnuPG Public Key: http://www.stanford.edu/~quanah/pgp.html