[Date Prev][Date Next] [Chronological] [Thread] [Top]

Re: RE24 testing call (2.4.17) round 2



Quanah Gibson-Mount wrote:
Please test RE24 and report any issues.  All known regressions are now
believed fixed.  Thanks!

All tests passed on OpenSUSE 10.3 x86-64.

test056 failed on Ubuntu 9.04 x86-64 on the connection output. Running again succeeded. Comparing the slapd.1.log files from the bad run to the good run, something very odd shows up:

In the good run, the conn=1 op=0 RESULT is logged earlier - at the time it should be:

@@ -417,12 +417,14 @@
 send_ldap_result: err=0 matched="" text=""
 send_ldap_response: msgid=1 tag=97 err=0
 ber_flush2: 14 bytes to sd 13
+conn=1 op=0 RESULT tag=97 err=0 text=
+do_bind: v3 anonymous bind
 connection_get(13)
 connection_get(13): got connid=1
 connection_read(13): checking for input on id=1
 ber_get_next
 ber_get_next: tag 0x30 len 422 contents:
-op tag 99, time 1246147222
+op tag 99, time 1246147432
 ber_get_next
 conn=1 op=1 do_search
 ber_scanf fmt ({miiiib) ber:

In the failed run, the op=0 RESULT is logged much later, after op=2 was already handled:

@@ -470,15 +472,268 @@
 connection_read(13): checking for input on id=1
 ber_get_next
 ber_get_next: tag 0x30 len 5 contents:
-op tag 66, time 1246147222
+op tag 66, time 1246147432
 ber_get_next
 ber_get_next on fd 13 failed errno=0 (Success)
 conn=1 op=2 do_unbind
 conn=1 op=2 UNBIND
-conn=1 op=0 RESULT tag=97 err=0 text=
-do_bind: v3 anonymous bind
 connection_close: conn=1 sd=13
 conn=1 fd=13 closed
+slap_listener_activate(7):
+>>> slap_listener(ldap://localhost:9011/)
+conn=2 fd=13 ACCEPT from IP=127.0.0.1:38898 (IP=127.0.0.1:9011)
+connection_get(13)
+connection_get(13): got connid=2
+connection_read(13): checking for input on id=2
+ber_get_next
+ber_get_next: tag 0x30 len 12 contents:
+op tag 96, time 1246147432

So, the monitor output disagrees:
--- testrun.X/ldapsearch.flt	2009-06-27 17:00:22.000000000 -0700
+++ testdata/monitor1.out	2009-04-27 15:50:11.000000000 -0700
@@ -2,9 +2,9 @@
 structuralObjectClass: monitorConnection
 monitorConnectionProtocol: 3
 monitorConnectionOpsReceived: 2
-monitorConnectionOpsExecuting: 2
+monitorConnectionOpsExecuting: 1
 monitorConnectionOpsPending: 0
-monitorConnectionOpsCompleted: 0
+monitorConnectionOpsCompleted: 1
 monitorConnectionGet: 2
 monitorConnectionRead: 2
 monitorConnectionWrite: 0

because op=0 is still outstanding when the op=1 search runs. In both cases, the actual result was sent to the client at the same time (see the ber_flush2 message), but after it succeeded something prevented it from making further progress. Which is pretty odd, because after that it just does a cond_signal and a mutex_unlock. After ber_free_buf() it just gets a mutex on its counters so it can increment a couple items, but that should be a freebie now since each operation has its per-thread counter structure. I guess it's possible that the malloc library is locking us along the way, or even the stdio library.

Not sure if this is identical to ITS#5966, it doesn't seem like it. It also doesn't seem like something we should hold the release for; the monitor output is correct even if it's not what we expected to see.

Alternatively, we might consider moving the reset

    if ( op->o_conn->c_conn_state == SLAP_C_BINDING )
        op->o_conn->c_conn_state = SLAP_C_ACTIVE;

from connection_bind_cb() to connection_bind_cleanup_cb(), to prevent ops from getting dequeued until we're really free and clear.

--
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/