[Date Prev][Date Next] [Chronological] [Thread] [Top]

Re: errant SASL/GSSAPI setup?



As I mailed yesterday, I've been seeing my slaves not get updates once their
initial kerberos ticket period expires.

On the server, I see this, using -d-1 on the master:

"
SASL [conn=1] Failure: GSSAPI Error: The context has expired (No error)
sb_sasl_write: failed to encode packet: generic failure
ldap_write: want=11176 error=Input/output error
ber_flush failed errno=5 reason="Input/output error"
connection_closing: readying conn=1 sd=13 for close
"

This makes sense.  The context has expired.

Anyway, I noticed the "readying" condition of closure and thought about that.
I never did see an actual "close".

Looking for that phrase, I found it in openldap-2.3.27/servers/slapd/connection.c

I made a rude change to the programming:

utility5:/tools/src/openldap/RHEL4-i686# diff -c openldap-2.3.27/servers/slapd/connection.c{~,}
*** openldap-2.3.27/servers/slapd/connection.c~	Wed Apr  5 14:08:48 2006
--- openldap-2.3.27/servers/slapd/connection.c	Thu Aug 31 09:36:18 2006
***************
*** 961,966 ****
--- 961,969 ----
  
  		/* wake write blocked operations */
  		slapd_clr_write( sd, 1 );
+ 
+                 connection_close ( c );
+ 
  		if ( c->c_writewaiter ) {
  			ldap_pvt_thread_cond_signal( &c->c_write_cv );
  			ldap_pvt_thread_mutex_unlock( &c->c_mutex );



i.e. I just did a hard close of the connection in the "connection_closing"
     routine.

This isn't something I'd want to do in my production servers, but I tried it in
my test environment.

The server shows this now:

SASL [conn=11] Failure: GSSAPI Error: The context has expired (No error)
sb_sasl_write: failed to encode packet: generic failure
ldap_write: want=11219 error=Input/output error
ber_flush failed errno=5 reason="Input/output error"
connection_closing: readying conn=11 sd=14 for close
connection_close: conn=11 sd=14

since the closure is forced now, rather than "readied".

The slave, -d 16384, shows this:

do_syncrep2 : Can't contact LDAP server
sb_sasl_write: failed to encode packet: generic failure

*** some time delay ***

request done: ld 0x928fb0 msgid 1
request done: ld 0x928fb0 msgid 2
request done: ld 0x928fb0 msgid 3
request done: ld 0x928fb0 msgid 4
syncrepl_entry: LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_ADD)
syncrepl_entry: be_search (0)
syncrepl_entry: wpieduPersonUUID=2af586df6800b3389cbe7bcbf2a920df,ou=People,dc=WPI,dc=EDU
syncrepl_entry: be_modify (0)
do_syncrep2: LDAP_RES_INTERMEDIATE - REFRESH_DELETE


i.e. The client sees the closure and connects again, performing any needed
     updates.  This is the desired action.


I believe the problem is that the closure is scheduled, but the connection
doesn't get to a state where it is tested for that impending closure.

The call for the actual close (connection_close) is done in connection_read and
connection_resched.  It's probable that a master just doesn't get to these
states when dealing with a syncrepl slave.

The test "fix" I did was pretty brutal, and I was surprised that the servers
(slave and master) worked with it.  However, it did eliminate the hangup when
dealing with syncrepl slaves connected via GSSAPI.

As a wild guess, I think I may be seeing this problem since I may be using a
different version sasl (2.1.21) than others.  It could be that an older sasl
may not have been checking ticket age.