[Date Prev][Date Next] [Chronological] [Thread] [Top]

Possible select() probs on Darwin/OSX? (using openldap)



slapd (openldap 2.1.8) running on Mac OSX Server 10.2.2

I've just been bitten by a horrible bug, running OpenLDAP's slapd as an 
authentication service for MacOSX clients. I don't know whether this is 
purely because of running slapd on OSX, with possible API differences, or 
whether the same issues might have happened to other people on other server 
platforms.

I have not delved too deeply into the code yet, since I wanted to elicit any 
knowledge from the community if there are known issues at work here.

I have come across the bug in 3 different configurations:

(1) openldap 2.1.8 using heavily customised back-ldap
(2) openldap 2.1.8 using back-perl to query external LDAP server
(3) openldap 2.1.3 using back-perl to query external LDAP server

The problem, briefly, is intermittant. When logging in using the LDAP server, 
the client machine often hangs at the login screen.

If back-perl/bind.c returns an error (ie not 0) there is no hang. It is only 
with confirmed authentication that slapd gets upset.

The login process on MacOSX queries the LDAP server several times - maybe 
this is significant. I don't know if it shares the same connection for all of 
them, or creates new ones each time. Altogether I count 4 queries for 
(cn=foo), then a bind (which I think always succeeds), then unbind, then 
another 4 (cn=foo) searches, then another bind, after which the server 
usually misbehaves. (if not, then another unbind occurs and the login process 
continues)

Behind the scenes, there seems to maybe be something going on with file 
descriptors. I have appended 3 excertps from the relevant parts of the slapd 
log, from the point at which I return success or failure in the BIND 
subroutine in perl-back.

Running slapd with -d 1023

...
try to bind..................the perl routine is here, and returns a "0" 
result to signify success......................

Perl BIND returned 0x0000
do_bind: v2 bind: "cn=MY_CN123,ou=STF,ou=MU,ou=DE,o=ku" to 
"cn=MY_CN123,ou=STF,ou=MU,ou=DE,o=ku"
send_ldap_result: conn=6 op=2 p=2
send_ldap_result: err=0 matched="" text=""
send_ldap_response: msgid=3 tag=97 err=0
ber_flush: 14 bytes to sd 12
  0000:  30 0c 02 01 03 61 07 0a  01 00 04 00 04 00         0....a........
ldap_write: want=14, written=14
  0000:  30 0c 02 01 03 61 07 0a  01 00 04 00 04 00         0....a........
conn=6 op=2 RESULT tag=97 err=0 text=
ber_get_next
ldap_read: want=9 error=Resource temporarily unavailable
ber_get_next on fd 12 failed errno=35 (Resource temporarily unavailable)
daemon: select: listen=6 active_threads=0 tvp=NULL
daemon: activity on 4 descriptors
daemon: activity on: 7r 9r 11r 12r
daemon: read activity on 7
connection_get(7)
connection_get(7): got connid=0
connection_read(7): checking for input on id=0
ber_get_next
ldap_read: want=9, got=0

ber_get_next on fd 7 failed errno=0 (Undefined error: 0)
connection_read(7): input error=-2 id=0, closing.
connection_closing: readying conn=0 sd=7 for close
connection_close: conn=0 sd=7
daemon: removing 7
conn=0 fd=7 closed
daemon: read activity on 9
connection_get(9)
connection_get(9): got connid=4
connection_read(9): checking for input on id=4
ber_get_next
ldap_read: want=9, got=0

ber_get_next on fd 9 failed errno=0 (Undefined error: 0)
connection_read(9): input error=-2 id=4, closing.
connection_closing: readying conn=4 sd=9 for close
connection_close: conn=4 sd=9
daemon: removing 9
conn=4 fd=9 closed
daemon: read activity on 11
connection_get(11)
connection_get(11): got connid=3
connection_read(11): checking for input on id=3
ber_get_next
ldap_read: want=9, got=0

ber_get_next on fd 11 failed errno=0 (Undefined error: 0)
connection_read(11): input error=-2 id=3, closing.
connection_closing: readying conn=3 sd=11 for close
connection_close: conn=3 sd=11
daemon: removing 11
conn=3 fd=11 closed
daemon: read activity on 12
connection_get(12)
connection_get(12): got connid=6
connection_read(12): checking for input on id=6
ber_get_next
ldap_read: want=9, got=0

ber_get_next on fd 12 failed errno=0 (Undefined error: 0)
connection_read(12): input error=-2 id=6, closing.
connection_closing: readying conn=6 sd=12 for close
connection_close: conn=6 sd=12
daemon: removing 12
conn=6 fd=12 closed
daemon: select: listen=6 active_threads=0 tvp=NULL
daemon: activity on 1 descriptors
daemon: select: listen=6 active_threads=0 tvp=NULL

(at this point the client is hanging, and I assume that the slapd server has 
not sent back a result)


=============================================================
And a different error, this time at the end of a search, and before the BIND 
which would authenticate the client: (client times out gracefully in this 
case)
=============================================================

ldap_write: want=14, written=14
  0000:  30 0c 02 01 04 65 07 0a  01 00 04 00 04 00         0....e........
conn=16 op=3 RESULT tag=101 err=0 text=
ber_get_next
ldap_read: want=9 error=Resource temporarily unavailable
ber_get_next on fd 16 failed errno=35 (Resource temporarily unavailable)
daemon: select: listen=6 active_threads=0 tvp=NULL
daemon: activity on 1 descriptors
daemon: activity on: 16r
daemon: read activity on 16
connection_get(16)
connection_get(16): got connid=16
connection_read(16): checking for input on id=16
ber_get_next
ldap_read: want=9, got=9
  0000:  30 06 02 01 05 50 01 04  30                        0....P..0
ber_get_next on fd 16 failed errno=34 (Result too large)
connection_read(16): input error=-2 id=16, closing.
connection_closing: readying conn=16 sd=16 for close
connection_close: conn=16 sd=16
daemon: removing 16
conn=16 fd=16 closed
daemon: select: listen=6 active_threads=0 tvp=NULL
daemon: activity on 1 descriptors
daemon: select: listen=6 active_threads=0 tvp=NULL
daemon: activity on 1 descriptors
daemon: activity on: 14r
daemon: read activity on 14
connection_get(14)
connection_get(14): got connid=15
connection_read(14): checking for input on id=15
ber_get_next
ldap_read: want=9, got=7
  0000:  30 05 02 01 03 42 00                               0....B.
ber_get_next: tag 0x30 len 5 contents:
ber_dump: buf=0x007eccd0 ptr=0x007eccd0 end=0x007eccd5 len=5
  0000:  02 01 03 42 00                                     ...B.
do_unbind
conn=15 op=2 UNBIND
connection_closing: readying conn=15 sd=14 for close
connection_resched: attempting closing conn=15 sd=14
connection_close: conn=15 sd=14
daemon: removing 14
conn=15 fd=14 closed
daemon: select: listen=6 active_threads=0 tvp=NULL
daemon: activity on 1 descriptors
daemon: select: listen=6 active_threads=0 tvp=NULL

=============================================================
Another one like the first one, with select() reporting not 4 but 6 fds with 
data:
========================================================

do_bind: v2 bind: "cn=KU16368,ou=STF,ou=MU,ou=DE,o=ku" to 
"cn=KU16368,ou=STF,ou=MU,ou=DE,o=ku"
send_ldap_result: conn=22 op=2 p=2
send_ldap_result: err=0 matched="" text=""
send_ldap_response: msgid=3 tag=97 err=0
ber_flush: 14 bytes to sd 17
  0000:  30 0c 02 01 03 61 07 0a  01 00 04 00 04 00         0....a........
ldap_write: want=14, written=14
  0000:  30 0c 02 01 03 61 07 0a  01 00 04 00 04 00         0....a........
conn=22 op=2 RESULT tag=97 err=0 text=
ber_get_next
ldap_read: want=9 error=Resource temporarily unavailable
ber_get_next on fd 17 failed errno=35 (Resource temporarily unavailable)
daemon: select: listen=6 active_threads=0 tvp=NULL
daemon: activity on 6 descriptors
daemon: activity on: 11r 12r 13r 14r 16r 17r
daemon: read activity on 11
connection_get(11)
connection_get(11): got connid=9
connection_read(11): checking for input on id=9
ber_get_next
ldap_read: want=9, got=0

ber_get_next on fd 11 failed errno=0 (Undefined error: 0)
connection_read(11): input error=-2 id=9, closing.
connection_closing: readying conn=9 sd=11 for close
connection_close: conn=9 sd=11
daemon: removing 11
conn=9 fd=11 closed
daemon: read activity on 12
connection_get(12)
connection_get(12): got connid=12
connection_read(12): checking for input on id=12
ber_get_next
ldap_read: want=9, got=0

ber_get_next on fd 12 failed errno=0 (Undefined error: 0)
connection_read(12): input error=-2 id=12, closing.
connection_closing: readying conn=12 sd=12 for close
connection_close: conn=12 sd=12
daemon: removing 12
conn=12 fd=12 closed
daemon: read activity on 13
connection_get(13)
connection_get(13): got connid=11
connection_read(13): checking for input on id=11
ber_get_next
ldap_read: want=9, got=0

ber_get_next on fd 13 failed errno=0 (Undefined error: 0)
connection_read(13): input error=-2 id=11, closing.
connection_closing: readying conn=11 sd=13 for close
connection_close: conn=11 sd=13
daemon: removing 13
conn=11 fd=13 closed
daemon: read activity on 14
connection_get(14)
connection_get(14): got connid=20
connection_read(14): checking for input on id=20
ber_get_next
ldap_read: want=9, got=0

ber_get_next on fd 14 failed errno=0 (Undefined error: 0)
connection_read(14): input error=-2 id=20, closing.
connection_closing: readying conn=20 sd=14 for close
connection_close: conn=20 sd=14
daemon: removing 14
conn=20 fd=14 closed
daemon: read activity on 16
connection_get(16)
connection_get(16): got connid=19
connection_read(16): checking for input on id=19
ber_get_next
ldap_read: want=9, got=0

ber_get_next on fd 16 failed errno=0 (Undefined error: 0)
connection_read(16): input error=-2 id=19, closing.
connection_closing: readying conn=19 sd=16 for close
connection_close: conn=19 sd=16
daemon: removing 16
conn=19 fd=16 closed
daemon: read activity on 17
connection_get(17)
connection_get(17): got connid=22
connection_read(17): checking for input on id=22
ber_get_next
ldap_read: want=9, got=0

ber_get_next on fd 17 failed errno=0 (Undefined error: 0)
connection_read(17): input error=-2 id=22, closing.
connection_closing: readying conn=22 sd=17 for close
connection_close: conn=22 sd=17
daemon: removing 17
conn=22 fd=17 closed
daemon: select: listen=6 active_threads=0 tvp=NULL
daemon: activity on 1 descriptors
daemon: select: listen=6 active_threads=0 tvp=NULL


=============================================================
And finally, one that actually logged me in:
=============================================================

Perl BIND returned 0x0000
do_bind: v2 bind: "cn=MY_CN123,ou=STF,ou=MU,ou=DE,o=ku" to 
"cn=MY_CN123,ou=STF,ou=MU,ou=DE,o=ku"
send_ldap_result: conn=49 op=2 p=2
send_ldap_result: err=0 matched="" text=""
send_ldap_response: msgid=3 tag=97 err=0
ber_flush: 14 bytes to sd 16
  0000:  30 0c 02 01 03 61 07 0a  01 00 04 00 04 00         0....a........
ldap_write: want=14, written=14
  0000:  30 0c 02 01 03 61 07 0a  01 00 04 00 04 00         0....a........
conn=49 op=2 RESULT tag=97 err=0 text=
ber_get_next
ldap_read: want=9 error=Resource temporarily unavailable
ber_get_next on fd 16 failed errno=35 (Resource temporarily unavailable)
daemon: select: listen=6 active_threads=0 tvp=NULL
daemon: activity on 1 descriptors
daemon: activity on: 16r
daemon: read activity on 16
connection_get(16)
connection_get(16): got connid=49
connection_read(16): checking for input on id=49
ber_get_next
ldap_read: want=9, got=7
  0000:  30 05 02 01 04 42 00                               0....B.
ber_get_next: tag 0x30 len 5 contents:
ber_dump: buf=0x00814b30 ptr=0x00814b30 end=0x00814b35 len=5
  0000:  02 01 04 42 00                                     ...B.
do_unbind
conn=49 op=3 UNBIND
connection_closing: readying conn=49 sd=16 for close
connection_resched: attempting closing conn=49 sd=16
connection_close: conn=49 sd=16
daemon: removing 16
conn=49 fd=16 closed
daemon: select: listen=6 active_threads=0 tvp=NULL
daemon: activity on 1 descriptors
daemon: select: listen=6 active_threads=0 tvp=NULL
daemon: activity on 1 descriptors
daemon: activity on: 11r
daemon: read activity on 11
connection_get(11)
connection_get(11): got connid=23
connection_read(11): checking for input on id=23
ber_get_next
ldap_read: want=9, got=9
  0000:  30 0c 02 01 03 60 07 02  01                        0....`...
ldap_read: want=5, got=5
  0000:  02 04 00 80 00                                     .....
ber_get_next: tag 0x30 len 12 contents:
ber_dump: buf=0x008038e0 ptr=0x008038e0 end=0x008038ec len=12
  0000:  02 01 03 60 07 02 01 02  04 00 80 00               ...`........
do_bind
ber_scanf fmt ({imt) ber:
ber_dump: buf=0x008038e0 ptr=0x008038e3 end=0x008038ec len=9
  0000:  60 07 02 01 02 04 00 80  00                        `........
ber_scanf fmt (m}) ber:
ber_dump: buf=0x008038e0 ptr=0x008038ea end=0x008038ec len=2
  0000:  00 00                                              ..
>>> dnPrettyNormal: <>
<<< dnPrettyNormal: <>, <>
do_bind: version=2 dn="" method=128
conn=23 op=2 BIND dn="" method=128
send_ldap_result: conn=23 op=2 p=2
send_ldap_result: err=0 matched="" text=""
send_ldap_response: msgid=3 tag=97 err=0
ber_flush: 14 bytes to sd 11
  0000:  30 0c 02 01 03 61 07 0a  01 00 04 00 04 00         0....a........
ldap_write: want=14, written=14
  0000:  30 0c 02 01 03 61 07 0a  01 00 04 00 04 00         0....a........
conn=23 op=2 RESULT tag=97 err=0 text=
do_bind: v2 anonymous bind
ber_get_next
ldap_read: want=9 error=Resource temporarily unavailable
ber_get_next on fd 11 failed errno=35 (Resource temporarily unavailable)
daemon: select: listen=6 active_threads=0 tvp=NULL
daemon: activity on 1 descriptors
daemon: activity on: 11r
daemon: read activity on 11
connection_get(11)
connection_get(11): got connid=23
connection_read(11): checking for input on id=23
ber_get_next
ldap_read: want=9, got=9
  0000:  30 32 02 01 04 63 2d 04  04                        02...c-..