
Re: Bugfix: wait4msg() hangs when using SSL/TLS (ITS#446)



"Kurt D. Zeilenga" wrote:

> Forwarded to devel for discussion....
>
> At 02:35 AM 3/20/00 +0000, Andrew Hacking wrote:
> >I guess this means the socket is in 'blocking' mode, which would always cause a
> >hang when reading a byte that is not yet available (as I think you have shown).
>
> Yes, but why did it attempt to read a PDU from the slave?  It appears to me
> that your change caused the library to read from the wrong connection.
>

It tries to read a message on each connection via try_read1msg(), and thus blocks on
the first connection that does not have any data, if and ONLY if blocking sockets are
used.
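
A minimal sketch, in plain POSIX C rather than libldap's internals, of the usual guard on blocking sockets: ask poll() which descriptors are readable and only read from those, so a loop over connections does not park in read() on an idle one. The helper `first_readable()` is a hypothetical name, and poll() stands in for the select() call libldap uses. Note that readability only guarantees at least one byte is waiting, not a whole PDU, so this alone does not fix the stream_read() hang discussed below.

```c
#include <poll.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: index of the first descriptor in fds[] with data
 * (or a hangup) pending, waiting at most timeout_ms; -1 on timeout/error. */
int first_readable(const int *fds, size_t n, int timeout_ms)
{
    struct pollfd *pfds = calloc(n, sizeof *pfds);
    int result = -1;

    if (pfds == NULL)
        return -1;
    for (size_t i = 0; i < n; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;
    }
    if (poll(pfds, (nfds_t)n, timeout_ms) > 0) {
        for (size_t i = 0; i < n; i++) {
            if (pfds[i].revents & (POLLIN | POLLHUP)) {
                result = (int)i;
                break;
            }
        }
    }
    free(pfds);
    return result;
}

/* Usage: two connections, only the second has data queued. */
int demo_first_readable(void)
{
    int a[2], b[2];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, a) != 0 ||
        socketpair(AF_UNIX, SOCK_STREAM, 0, b) != 0)
        return -2;
    (void)write(b[1], "x", 1);          /* data arrives on connection 1 only */
    int fds[2] = { a[0], b[0] };
    int idx = first_readable(fds, 2, 100);
    close(a[0]); close(a[1]); close(b[0]); close(b[1]);
    return idx;
}
```

A blocking read() issued against index 1 returns at once, whereas reading index 0 first would hang exactly as described above.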

The original code had a check for ber_pvt_sb_data_ready(lc->lconn_sb), which appears to
return true only if a previous read actually read too much and left some bytes in the
sockbuf.  This doesn't help the situation, and is related to the original SSL/TLS
hanging problem: ber_pvt_sb_data_ready() may return 0 for the connection (i.e. the
sockbuf doesn't have any data cached), and so select() is called.  This is not the
correct thing to do when using an SSL/TLS sockbuf; try_read1msg() must be called first
so that data already held in the OpenSSL buffers can be returned before attempting to
wait on the socket via select().
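
To illustrate why a bare select() is the wrong test for a layer that buffers internally, here is a minimal sketch of a toy user-space buffered reader. The struct `buf_reader` and its functions are hypothetical, standing in for OpenSSL's internal buffers or a sockbuf's cache. After one byte is consumed, the remaining byte sits in user space, so poll() on the descriptor reports nothing readable even though a read would succeed immediately; the waiter has to ask the layer first.

```c
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical buffered reader: pulls whatever is available from fd into
 * a private buffer and hands it out one byte at a time. */
struct buf_reader {
    int fd;
    unsigned char buf[256];
    size_t pos, len;
};

/* True if bytes are already buffered in user space (invisible to poll). */
int buf_data_ready(const struct buf_reader *r)
{
    return r->pos < r->len;
}

/* Return one byte, refilling the buffer from fd only when it is empty. */
int buf_getc(struct buf_reader *r)
{
    if (!buf_data_ready(r)) {
        ssize_t n = read(r->fd, r->buf, sizeof r->buf);
        if (n <= 0)
            return -1;
        r->pos = 0;
        r->len = (size_t)n;
    }
    return r->buf[r->pos++];
}

/* Correct wait: consult the layer's own buffer before sleeping in poll(). */
int buf_wait_readable(const struct buf_reader *r, int timeout_ms)
{
    struct pollfd p = { .fd = r->fd, .events = POLLIN };

    if (buf_data_ready(r))
        return 1;                   /* data is ready without touching fd */
    return poll(&p, 1, timeout_ms) > 0;
}
```

A loop built on buf_wait_readable() calls the read path whenever buf_data_ready() is true, which is the ordering argued for above for try_read1msg().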

The original code would then examine the fd_set for readability and then call
try_read1msg().  This, of course, cannot be done for custom Sockbuf_IO implementations
such as TLS/SSL, because they may actually need the socket to become writable when you
want to read from them.
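
OpenSSL reports this situation through SSL_get_error(), which can return SSL_ERROR_WANT_WRITE after an SSL_read() (for example during renegotiation). A minimal sketch of a wait step that honours whichever direction the I/O layer asks for, instead of always waiting for readability; the enum `io_want` and `wait_for_want()` are hypothetical names modelled on those OpenSSL codes, and poll() stands in for select().

```c
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical: the socket event the I/O layer needs before it can make
 * progress (modelled on SSL_ERROR_WANT_READ / SSL_ERROR_WANT_WRITE). */
enum io_want { IO_WANT_READ, IO_WANT_WRITE };

/* Wait up to timeout_ms for the requested event.
 * Returns 1 if ready, 0 on timeout, -1 on error. */
int wait_for_want(int fd, enum io_want want, int timeout_ms)
{
    struct pollfd p = { .fd = fd };

    p.events = (want == IO_WANT_WRITE) ? POLLOUT : POLLIN;
    int rc = poll(&p, 1, timeout_ms);
    if (rc < 0)
        return -1;
    return (rc > 0 && (p.revents & p.events)) ? 1 : 0;
}

/* Usage: an idle connection has nothing to read but is writable, so a
 * layer that wants to write would stall forever behind a read-only wait. */
int demo_want_write(void)
{
    int sv[2];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    int r = wait_for_want(sv[0], IO_WANT_READ, 0);   /* 0: nothing to read */
    int w = wait_for_want(sv[0], IO_WANT_WRITE, 0);  /* 1: send buffer has room */
    close(sv[0]);
    close(sv[1]);
    return r == 0 && w == 1;
}
```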

So here we have two opposing requirements:
* try_read1msg() must be called for Sockbuf_IO implementations that do their own
buffering, e.g. SSL/TLS.
* try_read1msg() should not be called on blocking sockets.

This seems to imply that custom Sockbuf_IO implementations (e.g. SSL/TLS) over blocking
sockets are a no-no.


> >This is quite serious, since even if a select() is performed to see if _some_ data
> >is available BEFORE calling try_read1msg(), the stream_read() will hang if not
> >enough bytes have been queued on the socket when attempting to parse the message.
>
> Yes.
>
> >I always thought/assumed (I know, assumption is never a good thing) that sockbufs
> >used non-blocking sockets for this very reason... to avoid hanging indefinitely
> >when performing a read/recv.
>
> No.  By default the library uses blocking sockets.
>
> >This means that performing a read/recv on a
> >non-blocking socket when data is unavailable returns failure with errno set to
> >EWOULDBLOCK/EAGAIN, then select() would be called to wait for the arrival of more
> >data, the timeout checked, and the read/recv tried again.
>
> Only if the caller marks the sockets non-blocking.
>
> >So the question is, why are blocking sockets being used, or has it always been this
> >way and no-one noticed ?
>
> Because the UNIX socket model is, by default, blocking.
>
> >I am a little concerned because libldap and liblber appear to be written around the
> >notion of non-blocking sockets.
>
> libldap and liblber support both blocking and non-blocking sockets.
>
> >If blocking sockets are used, the timeout that the caller provides to libldap will
> >never be honoured when, say, the server takes some time to respond or, worse,
> >crashes; it will even hang when network congestion/disruption occurs.
>
> The timeout applies only to the select, not to the read.  Once a PDU read is
> initiated, it must be completed (or the stream closed).  The API doesn't
> support restarting of PDU reads.
>
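
The non-blocking pattern described in the quoted message can be sketched as follows: mark the descriptor O_NONBLOCK with fcntl(), try read(), and on EAGAIN/EWOULDBLOCK wait in poll() for at most the caller's timeout before retrying. The helper names `set_nonblocking()` and `read_timeout()` are hypothetical. As the reply above points out, this bounds a single read only; a PDU parser built on it must still be able to resume a partially read message.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/* Put fd into non-blocking mode; returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Read up to len bytes from a non-blocking fd, waiting at most timeout_ms
 * for data. Returns bytes read (0 at EOF), or -1 with errno set
 * (ETIMEDOUT if nothing arrived in time). For brevity the full timeout is
 * re-armed after EINTR; a production version would track a deadline. */
ssize_t read_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    for (;;) {
        ssize_t n = read(fd, buf, len);
        if (n >= 0)
            return n;
        if (errno == EINTR)
            continue;
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;

        /* No data yet: sleep in poll(), where the timeout applies. */
        struct pollfd p = { .fd = fd, .events = POLLIN };
        int rc = poll(&p, 1, timeout_ms);
        if (rc == 0) {
            errno = ETIMEDOUT;
            return -1;
        }
        if (rc < 0 && errno != EINTR)
            return -1;
    }
}
```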

I take it you are referring to the top-level LDAP API here, not the internals, since the
latter would really mean non-blocking cannot be supported, and it certainly *appears*
that the code is written to handle non-blocking sockets.
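
Supporting non-blocking sockets in the internals requires a reader that can stop at any byte boundary and resume later. A minimal sketch of such a resumable reader, using a toy framing (a 4-byte big-endian length followed by the payload; deliberately not BER), where `struct pdu_reader` and `pdu_read_step()` are hypothetical names rather than liblber's API. Partial progress lives in the struct, so EAGAIN simply returns control to the caller's select()/poll() loop instead of blocking.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define PDU_MAX 4096

/* Hypothetical resumable state: everything needed to continue a
 * half-read message survives between calls. */
struct pdu_reader {
    unsigned char hdr[4];           /* 4-byte big-endian payload length */
    size_t hdr_got;
    uint32_t need;                  /* payload length once header is known */
    unsigned char body[PDU_MAX];
    size_t body_got;
};

/* Pull bytes into dst until *got == want. 1 = done, 0 = would block, -1 = EOF/error. */
static int fill(int fd, unsigned char *dst, size_t *got, size_t want)
{
    while (*got < want) {
        ssize_t n = read(fd, dst + *got, want - *got);
        if (n > 0) { *got += (size_t)n; continue; }
        if (n < 0 && errno == EINTR) continue;
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return 0;               /* resume later from the same state */
        return -1;
    }
    return 1;
}

/* Returns 1 when a full PDU is in r->body (length r->need), 0 if more
 * data is needed, -1 on EOF, error, or an oversized length. */
int pdu_read_step(struct pdu_reader *r, int fd)
{
    int rc = fill(fd, r->hdr, &r->hdr_got, sizeof r->hdr);
    if (rc != 1)
        return rc;
    r->need = ((uint32_t)r->hdr[0] << 24) | ((uint32_t)r->hdr[1] << 16) |
              ((uint32_t)r->hdr[2] << 8) | r->hdr[3];
    if (r->need > PDU_MAX)
        return -1;
    return fill(fd, r->body, &r->body_got, r->need);
}

/* Usage: a 3-byte PDU arrives in two pieces on a non-blocking pipe. */
int demo_split_pdu(void)
{
    static const unsigned char part1[] = { 0, 0, 0, 3, 'a' };
    static const unsigned char part2[] = { 'b', 'c' };
    static struct pdu_reader r;     /* zero-initialised state */
    int p[2], first, second;

    if (pipe(p) != 0)
        return -2;
    fcntl(p[0], F_SETFL, fcntl(p[0], F_GETFL, 0) | O_NONBLOCK);
    (void)write(p[1], part1, sizeof part1);
    first = pdu_read_step(&r, p[0]);    /* 0: payload incomplete */
    (void)write(p[1], part2, sizeof part2);
    second = pdu_read_step(&r, p[0]);   /* 1: "abc" complete */
    close(p[0]);
    close(p[1]);
    return first == 0 && second == 1 && r.need == 3 && r.body[2] == 'c';
}
```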

I am curious: if a timeout is specified on a call to an LDAP API function, by default
libldap will not honour the timeout, because the read can block indefinitely on the
blocking sockets it uses by default (as noted above).  If this is the case, libldap does
not seem to treat timeout arguments in accordance with the intentions/expectations of
the LDAP APIs described in both RFC 1823 and draft-ietf-ldapext-ldap-c-api-04.  As a
user of the API, I would expect any timeout I give to, say, ldap_search_st() to be
observed, and the call never to block indefinitely.  The RFC does not make any mention
of options like blocking/non-blocking sockets, and although I am not opposed to custom
options where they make sense (I am not aware of any value in using blocking sockets
within libldap), I would still expect the correct behavior with regard to timeouts
"out-of-the-box".  That is, I should not have to set OpenLDAP-specific session options
to get the behavior described in the RFC/drafts.

What is the rationale for using and defaulting to blocking sockets within libldap?
Anyone ?

If non-blocking sockets were always used, the problems would go away: SSL/TLS would
work, reads could never block indefinitely as they currently do, and the semantics of
timeouts (as per the RFC) would be preserved.


-Andrew.