
Re: (ITS#8650) EAGAIN from gnutls_handshake not respected

Hi Ryan,

I'm running into a problem with slapd 2.4.46 hanging on Ubuntu 18.04, 
which seems to be a side effect of the ITS#8650 patch:


slapd will run fine for a while, but during some periods of high
traffic it hangs: it pegs the CPU at 100% and doesn't respond to any
new LDAP connections. After some time it resumes working, but overall
it's fairly unreliable.

strace on slapd during the hang shows that it's constantly making read()
calls that return EAGAIN. A gdb stack trace on slapd shows that these
read() calls come from the busy-wait for loop in tlsg_session_accept(),
which calls gnutls_handshake() again every time it gets EAGAIN. When
slapd recovers from this hung state, the first message it prints is a
TLS negotiation failure error on the culprit file descriptor.

If I back out the ITS#8650 patch, the problem goes away. If I insert 
sleep(1) in the for loop, slapd no longer pegs the CPU at 100%, but it 
still becomes unresponsive during these high-traffic periods.

I don't know what's happening during these high-traffic periods that 
causes the TLS negotiation to go astray. Unfortunately it's not easy to 
reproduce this problem outside of this production environment, given the 
diversity of clients running different OSes with various versions of SSL 

I'm wondering if there is a better way to handle EAGAIN returned from
gnutls_handshake() than the busy-wait in ITS#8650, or my simplistic
attempt at inserting a sleep() call, which doesn't really seem to help.
How do the GnuTLS developers intend for people to use
gnutls_handshake() properly, so as to gracefully handle sessions that
involve long packets on the one hand, without opening up a
vulnerability to chew up lots of system resources on the other hand?
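
To make the question concrete, here is a rough sketch of the kind of
alternative I have in mind: wait on the socket with poll() until it is
ready, and give up after a deadline, instead of spinning. It is not a
patch against slapd. The names hs_drive, hs_step_fn, enum hs_step and
demo_step are made up for illustration, and demo_step only stands in
for the handshake call. With GnuTLS the step would wrap
gnutls_handshake() and, on GNUTLS_E_AGAIN or GNUTLS_E_INTERRUPTED, use
gnutls_record_get_direction() to decide whether to wait for reading or
for writing.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical result of one non-blocking handshake attempt. */
enum hs_step { HS_DONE, HS_WANT_READ, HS_WANT_WRITE, HS_FATAL };

/* Hypothetical callback: makes one attempt and reports what it needs.
 * A GnuTLS version would call gnutls_handshake() here; when that
 * returns GNUTLS_E_AGAIN or GNUTLS_E_INTERRUPTED,
 * gnutls_record_get_direction() returns 0 if the session is waiting
 * to read and 1 if it is waiting to write. */
typedef enum hs_step (*hs_step_fn)(int fd, void *ctx);

static long long now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000LL + ts.tv_nsec / 1000000LL;
}

/* Drive step() until it completes, fails, or the deadline passes.
 * Returns 0 on success, -1 on fatal error, -2 on timeout. */
int hs_drive(int fd, hs_step_fn step, void *ctx, int timeout_ms)
{
    long long deadline = now_ms() + timeout_ms;

    for (;;) {
        enum hs_step r = step(fd, ctx);
        if (r == HS_DONE)
            return 0;
        if (r == HS_FATAL)
            return -1;

        long long left = deadline - now_ms();
        if (left <= 0)
            return -2;

        /* Sleep in the kernel until the fd is ready in the direction
         * the step asked for, rather than retrying immediately. */
        struct pollfd pfd = {
            .fd = fd,
            .events = (r == HS_WANT_WRITE) ? POLLOUT : POLLIN,
            .revents = 0
        };
        int n = poll(&pfd, 1, (int)left);
        if (n == 0)
            return -2;
        if (n < 0 && errno != EINTR)
            return -1;
        /* Ready, hung up, or interrupted: retry and let step() decide. */
    }
}

/* Stand-in step for demonstration only: "completes" once one byte
 * can be read from a non-blocking fd. */
enum hs_step demo_step(int fd, void *ctx)
{
    char c;
    ssize_t n = read(fd, &c, 1);
    (void)ctx;
    if (n == 1)
        return HS_DONE;
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR))
        return HS_WANT_READ;
    return HS_FATAL; /* EOF or a real error */
}
```

This still ties up the calling thread for up to timeout_ms per
connection, so it only bounds the damage a stalled client can do. In an
event-driven server the cleaner approach would presumably be to return
to the main loop on EAGAIN and retry the handshake when the descriptor
is reported ready.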

