
Re: slapd stability problems with add/change operations



Quanah Gibson-Mount wrote:

> If you are running stock BDB 4.2.52 without the required patches from sleepycat, I am not surprised by your problems.


Thanks for all the feedback so far. My comments:

- All patches are included in FreeBSD's BDB port; I confirmed that by comparing the MD5 sums in the port with the files I found on the sleepycat.com homepage.

- Following the feedback from Howard Chu, I read some pages about DB_CONFIG, namely:

http://www.sleepycat.com/docs/ref/am_conf/cachesize.html
http://www.openldap.org/faq/data/cache/1074.html
http://www.openldap.org/faq/data/cache/1075.html
http://www.openldap.org/faq/data/cache/893.html

I ran the calculation from the sample for my DB. We have only around 3000 entries and DB files of around 5MB at most, so as expected I didn't get more than 256K (in fact I didn't even come close to it). Anyway, I set the cache to 2MB, because this machine has 1GB of RAM and doesn't do much more than run OpenLDAP.

Then I tried to find some docs about the locks:

http://www.sleepycat.com/docs/api_c/env_set_lk_max_objects.html
http://www.sleepycat.com/docs/ref/lock/max.html

I couldn't really find much about locks and OpenLDAP, except the config file in Debian:

--
[...]
# Sven Hartge reported that he had to set this value incredibly high
# to get slapd running at all. See http://bugs.debian.org/303057
# for more information.

# Number of objects that can be locked at the same time.
set_lk_max_objects      5000
# Number of locks (both requested and granted)
set_lk_max_locks        5000
# Number of lockers
set_lk_max_lockers    5000
--

His bug report is interesting as well, not least because it says that one has to *redo* (slapcat/slapadd) the complete database for the changes to become active. I didn't know that at first.

So I redid my DB as well with the following DB_CONFIG file:

--
# set cachesize to 2MB for now
set_cachesize   0       2097152         0

# Number of objects that can be locked at the same time.
set_lk_max_objects      10000
# Number of locks (both requested and granted)
set_lk_max_locks        10000
# Number of lockers
set_lk_max_lockers      10000
--
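
As a quick sanity check of the file above, here is a minimal Python sketch that reads DB_CONFIG-style lines and reports the total cache size in bytes and the lock limits. The helper names (parse_db_config, cache_bytes) are made up for illustration, and the parser only handles directives whose arguments are all integers, which is all this file contains.

```python
def parse_db_config(text):
    """Return {directive: [int args]} for the numeric lines of a DB_CONFIG file."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        name, *args = line.split()
        settings[name] = [int(a) for a in args]
    return settings


def cache_bytes(settings):
    """set_cachesize takes <gbytes> <bytes> <ncache>; return the total in bytes."""
    gbytes, nbytes, _ncache = settings["set_cachesize"]
    return gbytes * 1024**3 + nbytes


# The DB_CONFIG from this mail.
DB_CONFIG = """\
# set cachesize to 2MB for now
set_cachesize   0       2097152         0
set_lk_max_objects      10000
set_lk_max_locks        10000
set_lk_max_lockers      10000
"""

settings = parse_db_config(DB_CONFIG)
print("cache bytes:", cache_bytes(settings))
for key in ("set_lk_max_objects", "set_lk_max_locks", "set_lk_max_lockers"):
    print(key, settings[key][0])
```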

Note that I have no clue how to choose the lock limits. There are some docs, but as I don't know how many locks a read operation requires, it's hard for me to judge. I thought 10'000 should be fine, but I have already found config files with 100'000 (probably their DB is much bigger than mine).

As a stress test we did this:

- launch a set of add operations from the metadb (about 110 adds). This will do quite a few reads first to see what is missing.
- launch two syncs of the two ldap-slaves we have


This way we can reproducibly hang slapd within seconds. Note that I don't get any hints in the logfile about why it hangs. The last entry I see with loglevel 256 is the add operation; then it hangs.

I do have the impression that it took a bit longer to hang it since I've changed the locks to 10'000. But as I said, I don't know if this should be sufficient now for my databases.
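
One way to judge whether 10'000 is enough might be to look at the lock statistics of the environment after a stress run and compare the peak values with the configured limits. The following Python sketch assumes that db_stat's -c option (lock statistics) and -h option (environment home) are available in this build, that the interesting lines contain the word "maximum", and that the binary is called db_stat-4.2 as on this box. The database directory in the usage comment is only a placeholder.

```python
import subprocess


def lock_peaks(stat_output):
    """Keep only the lines of db_stat output that mention a maximum."""
    return [line.strip() for line in stat_output.splitlines()
            if "maximum" in line.lower()]


def read_lock_stats(db_home, db_stat="db_stat-4.2"):
    """Run 'db_stat -c -h <db_home>' and return its text output."""
    result = subprocess.run([db_stat, "-c", "-h", db_home],
                            capture_output=True, text=True, check=True)
    return result.stdout


# Usage (the path is a placeholder, adjust to the real database directory):
#   for line in lock_peaks(read_lock_stats("/path/to/openldap-data")):
#       print(line)
```

If the reported peaks stay far below the limits even when slapd hangs, the lock table size is probably not the cause.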

some stats:

--
# db_stat-4.2 -d id2entry.bdb:
53162   Btree magic number.
9       Btree version number.
Flags:  little-endian
2       Minimum keys per-page.
16384   Underlying database page size.
2       Number of levels in the tree.
2613    Number of unique keys in the tree.
2613    Number of data items in the tree.
1       Number of tree internal pages.
...
--

--
# db_stat-4.2 -d dn2id.bdb
53162   Btree magic number.
9       Btree version number.
Flags:  duplicates, little-endian
2       Minimum keys per-page.
4096    Underlying database page size.
3       Number of levels in the tree.
5300    Number of unique keys in the tree.
15537   Number of data items in the tree.
6       Number of tree internal pages.
...
--

-> not a size issue I hope ;)
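
For reference, the cache estimate can be redone from the two db_stat outputs above. This is a minimal Python sketch, assuming the FAQ's rule of thumb is to keep each B-tree's internal pages plus one extra page in cache, i.e. (internal pages + 1) * page size per database. It ignores the index files, and the function names are made up for illustration.

```python
def parse_db_stat(text):
    """Turn '<number> <label>.' lines of db_stat -d output into a dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0].isdigit():
            stats[parts[1].strip().rstrip(".")] = int(parts[0])
    return stats


def min_cache(stats):
    """Assumed rule of thumb: (internal pages + 1) * page size."""
    pages = stats["Number of tree internal pages"]
    page_size = stats["Underlying database page size"]
    return (pages + 1) * page_size


# Relevant lines copied from the outputs above.
ID2ENTRY = """\
16384   Underlying database page size.
1       Number of tree internal pages.
"""
DN2ID = """\
4096    Underlying database page size.
6       Number of tree internal pages.
"""

total = min_cache(parse_db_stat(ID2ENTRY)) + min_cache(parse_db_stat(DN2ID))
print(total, "bytes")  # 32768 + 28672 = 61440
```

Under that assumption the two files need about 60K, so the 2MB cache leaves plenty of headroom.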

Ah, and what I haven't mentioned so far: this machine is an SMP box (2 CPUs). Not sure whether this matters.

So, once again I'm running out of ideas here. Comments are welcome.

cu

Adrian
--
Adrian Gschwend
System Administrator
Berne University of Applied Sciences
Biel, Switzerland