(ITS#8387) online olcDbConfig change fails with syncprov



Full_Name: Ryan Tandy
Version: 2.4, master
OS: Debian
URL: 
Submission from: (NULL) (24.68.37.4)
Submitted by: ryan


Forwarding from a Debian bug report:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=816294

Configure a BDB or HDB database with syncprov:

dn: olcDatabase={1}hdb,cn=config
objectClass: olcHdbConfig
olcDbDirectory: data
olcSuffix: dc=example,dc=com

dn: olcOverlay={0}syncprov,olcDatabase={1}hdb,cn=config
objectClass: olcSyncProvConfig

Then modify the database in some way (so that a syncprov checkpoint is
pending), and make an online olcDbConfig change that reopens the database:

dn: olcDatabase={1}hdb,cn=config
changetype: modify
replace: olcDbConfig
olcDbConfig: set_cachesize 1 0 1

Reopening the database fails:

56e76e17 bdb(dc=example,dc=com): BDB4511 Error: closing the transaction region
with active transactions
56e76e17 bdb_db_close: database "dc=example,dc=com": close failed: Invalid
argument (22)

and slapd crashes shortly after, when it tries to syncprov_checkpoint while the
database is already gone.

What appears to be happening is that "ctx" is different between bdb_reader_get
and bdb_reader_flush in this case.

During a normal slapd startup and shutdown:

(gdb) thread apply all frame

Thread 1 (Thread 0x7ffff7fed700 (LWP 21570)):
#0  hdb_reader_get (op=0x7fffffffd8d0, env=0xa93fa0, txn=0x7fffffffd610) at
cache.c:1666
1666		if ( !ctx ) {
(gdb) p ctx
$1 = (void *) 0x8b34a0 <ldap_int_main_thrctx>

[ ... killall slapd ... ]

(gdb) thread apply all frame

Thread 1 (Thread 0x7ffff7fed700 (LWP 21570)):
#0  hdb_reader_flush (env=0xa93fa0) at cache.c:1643
1643		if ( !ldap_pvt_thread_pool_getkey( ctx, env, &data, NULL ) ) {
(gdb) p ctx
$2 = (void *) 0x8b34a0 <ldap_int_main_thrctx>

In this case, the readers are cleared correctly.

Another startup, this time the hdb_db_close is triggered by performing an
olcDbConfig change:

(gdb) thread apply all frame

Thread 1 (Thread 0x7ffff7fed700 (LWP 21624)):
#0  hdb_reader_get (op=0x7fffffffd8d0, env=0xa93fa0, txn=0x7fffffffd610) at
cache.c:1666
1666		if ( !ctx ) {
(gdb) p ctx
$1 = (void *) 0x8b34a0 <ldap_int_main_thrctx>

[ ... ldapmodify ... ]

(gdb) thread apply all frame

Thread 3 (Thread 0x7ffff362e700 (LWP 21633)):
#0  hdb_reader_flush (env=0xa93fa0) at cache.c:1643
1643		if ( !ldap_pvt_thread_pool_getkey( ctx, env, &data, NULL ) ) {

Thread 2 (Thread 0x7ffff3e2f700 (LWP 21631)):
#0  0x00007ffff732d4d3 in epoll_wait () at
../sysdeps/unix/syscall-template.S:84
84	../sysdeps/unix/syscall-template.S: No such file or directory.

Thread 1 (Thread 0x7ffff7fed700 (LWP 21624)):
#0  0x00007ffff75f06dd in pthread_join (threadid=140737285125888,
thread_return=0x0) at pthread_join.c:90
90	pthread_join.c: No such file or directory.
(gdb) p ctx
$2 = (void *) 0x7ffff362dbf0

This time we have a different ctx, so the readers are not cleared. Thus, when we
get to db->close there is still an active txn.

The comment "free up any keys used by the main thread" seems to assume
bdb_reader_flush will be called on the main thread only.
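
To illustrate the suspected mismatch, here is a minimal standalone C sketch. It
is not OpenLDAP code: thr_ctx, ctx_getkey, ctx_setkey, db_env, reader_get,
reader_flush and db_close are hypothetical stand-ins, and the model assumes only
what the traces above suggest, namely that the reader txn is cached under the
ctx of the thread that opened it and that flush looks it up under the ctx of
the thread that calls it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-thread context with a single keyed slot. */
typedef struct thr_ctx {
    const void *key;   /* e.g. the DB environment pointer */
    void *data;        /* e.g. the cached reader txn */
} thr_ctx;

/* Simplified environment: only counts reader txns that are still open. */
typedef struct db_env {
    int active_txns;
} db_env;

static int txn_token;  /* dummy object standing in for a real txn handle */

/* Return 0 and fill *data if key is stored in this ctx, else -1. */
static int ctx_getkey(const thr_ctx *ctx, const void *key, void **data)
{
    if (ctx->key == NULL || ctx->key != key)
        return -1;
    *data = ctx->data;
    return 0;
}

static void ctx_setkey(thr_ctx *ctx, const void *key, void *data)
{
    ctx->key = key;
    ctx->data = data;
}

/* Open a reader txn and cache it in the calling thread's ctx. */
static void reader_get(thr_ctx *ctx, db_env *env)
{
    void *txn;
    if (ctx_getkey(ctx, env, &txn) != 0) {
        env->active_txns++;
        ctx_setkey(ctx, env, &txn_token);
    }
}

/* Abort the reader txn cached in ctx, if any. Returns 1 if one was flushed. */
static int reader_flush(thr_ctx *ctx, db_env *env)
{
    void *txn;
    if (ctx_getkey(ctx, env, &txn) == 0) {
        assert(env->active_txns > 0);
        env->active_txns--;
        ctx_setkey(ctx, NULL, NULL);
        return 1;
    }
    return 0;  /* nothing cached under this ctx: the txn stays open */
}

/* Closing fails while txns are active, loosely modelled on the BDB error. */
static int db_close(const db_env *env)
{
    return env->active_txns == 0 ? 0 : -1;
}
```

In this model, calling reader_get with the main thread's ctx and then
reader_flush with a different (worker) ctx flushes nothing, so db_close fails.
Flushing with the same ctx that opened the reader lets the close succeed.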