
Re: (ITS#6639) syncrepl failing using SASL/GSSAPI



whm@stanford.edu wrote:
> --On Thursday, September 09, 2010 10:43:06 PM -0700 Howard Chu <hyc@symas.com> wrote:
>
>> whm@stanford.edu wrote:
>>> --On Friday, September 03, 2010 01:23:17 AM -0700 Bill MacAllister <whm@stanford.edu> wrote:
>>>
>>> The problem with the database was only coincidental.  Restoring the database
>>> got the failing replica past the problem replication event.
>>>
>>> In the replica pool of 6 servers we have seen the problem on three of the
>>> servers.  Thinking about this more, it is unlikely that it is a slave
>>> problem since the slaves have been in use for about 6 weeks and we did
>>> not see the problem.  Only when we changed the master to 2.4.23 did we
>>> see the problem.  I have captured a master debug log of the problem
>>> event.  It is at http://www.stanford.edu/~whm/files/master-debug.txt.
>>>
>>> Bill
>>>
>> Please try with this patch:
>>
>> Index: sasl.c
>> ===================================================================
>> RCS file: /repo/OpenLDAP/pkg/ldap/libraries/libldap/sasl.c,v
>> retrieving revision 1.79
>> diff -u -r1.79 sasl.c
>> --- sasl.c	13 Apr 2010 20:17:56 -0000	1.79
>> +++ sasl.c	10 Sep 2010 05:42:22 -0000
>> @@ -733,8 +733,9 @@
>>    		return ret;
>>    	} else if ( p->buf_out.buf_ptr != p->buf_out.buf_end ) {
>>    		/* partial write? pretend nothing got written */
>> -		len2 = 0;
>>    		p->flags |= LDAP_PVT_SASL_PARTIAL_WRITE;
>> +		sock_errset(EAGAIN);
>> +		len2 = -1;
>>    	}
>>
>>    	/* return number of bytes encoded, not written, to ensure
>
> I have applied the patch and the Debian packages built fine.  I have
> installed the new packages on the master servers in our dev and test
> environments.  Initial tests show that basic replication is working.
> Of course, this problem did not exhibit itself in our test
> environments.  I will install it in our production environment Friday
> evening and let you know what happens.
>
> Am I correct in assuming that the root of the problem is on the providers?

Yes. Strictly speaking, the problem will occur whenever either a client or a
server writes a large buffer through a SASL-encrypted connection, but it is
more likely to happen when a server is sending responses.
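To illustrate the partial-write convention the patch adopts, here is a minimal, hypothetical sketch in C. None of its names (mock_sock, sec_layer, flush_out, sec_write) come from OpenLDAP. The "encoding" is just a 1-byte length prefix standing in for a real SASL encode step, and the transport is a mock that accepts only `room` bytes before failing with EAGAIN, the way a non-blocking socket does when a large buffer cannot go out in one call. When the encoded packet is only partly sent, sec_write() returns -1 with errno set to EAGAIN instead of 0 and sets a `partial` flag (loosely playing the role of LDAP_PVT_SASL_PARTIAL_WRITE in the patch). On the retry it finishes sending the leftover bytes and only then reports the input as consumed, without encoding it a second time.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical non-blocking transport: `room` is how many bytes can be
 * accepted right now. Once it is used up, sends fail with EAGAIN until
 * the caller raises `room` again (simulating the socket draining). */
struct mock_sock {
    char   sent[512];
    size_t sent_len;
    size_t room;
};

static long mock_send(struct mock_sock *s, const char *buf, size_t len)
{
    size_t n = len < s->room ? len : s->room;
    if (n > sizeof s->sent - s->sent_len)     /* keep the mock in bounds */
        n = sizeof s->sent - s->sent_len;
    if (n == 0) {
        errno = EAGAIN;
        return -1;
    }
    memcpy(s->sent + s->sent_len, buf, n);
    s->sent_len += n;
    s->room -= n;
    return (long)n;
}

/* Hypothetical security layer. Each call "encodes" up to 127 input bytes
 * as a 1-byte length prefix plus the data (a stand-in for a real SASL
 * encode step); the packet must be fully sent before new input is taken. */
struct sec_layer {
    char   out[128];
    size_t ptr, end;      /* unsent bytes are out[ptr..end) */
    size_t pending_len;   /* input bytes covered by the buffered packet */
    int    partial;       /* set after a write that could not finish */
};

/* Send buffered output: 0 once it is all gone, -1 (errno set) if blocked. */
static int flush_out(struct sec_layer *l, struct mock_sock *s)
{
    while (l->ptr < l->end) {
        long n = mock_send(s, l->out + l->ptr, l->end - l->ptr);
        if (n < 0)
            return -1;
        l->ptr += (size_t)n;
    }
    return 0;
}

/* Returns the number of input bytes consumed, or -1 with errno = EAGAIN
 * when the caller must retry later with the same buffer. */
static long sec_write(struct sec_layer *l, struct mock_sock *s,
                      const char *buf, size_t len)
{
    assert(l != NULL && s != NULL);

    if (flush_out(l, s) < 0)
        return -1;                 /* old packet still pending, EAGAIN */

    if (l->partial) {
        /* The retried input was encoded last time and is now fully sent:
         * report it as consumed instead of encoding it a second time. */
        l->partial = 0;
        return (long)l->pending_len;
    }

    if (len > sizeof l->out - 1)
        len = sizeof l->out - 1;   /* at most one packet per call */
    l->out[0] = (char)len;
    memcpy(l->out + 1, buf, len);
    l->ptr = 0;
    l->end = len + 1;
    l->pending_len = len;

    if (flush_out(l, s) < 0) {
        /* Partial write: report -1/EAGAIN, not 0, so the caller retries. */
        l->partial = 1;
        errno = EAGAIN;
        return -1;
    }
    return (long)len;
}
```

A caller that gets -1/EAGAIN would typically wait until the socket is writable (for example with poll()) and then call sec_write() again with the same buffer. Reporting "nothing written" as 0 instead gives such a caller no clear signal to wait and retry, which is the ambiguity the patch removes.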

> I am asking because I was not planning on installing the new version on the
> consumers right away since 'emergency' changes to production software
> require special approval.  I would like to upgrade the consumers in our
> next maintenance window, which is next Thursday.  If you think I should
> do it with the update of the master let me know.

It's probably not urgent for the consumers; just bear in mind that any clients
querying them through SASL could also run into similar issues.

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/