
Re: ldap_simple_bind_s core dump



| Hi Folks,
| 
| I'm developing a driver for an LDAP server (OpenLDAP 1.2.10).
| The driver functions are being called from a multithreaded
| process with only 1 thread running. My LDAP functionality
| has no multi threading support as I have wrapped the
| functions with mutex locks. When I unit test the functions
| all is fine, but when a process is kicked off from the Web
| front end it causes a core dump with a SIGCHILD. I am
| using the Netscape SDK (4.0), by the way.  Below is a trace of
| the functionality being called before the crash courtesy of
| GDB.

There are a lot of good technical reasons, apart from
unnecessary context switches and protection domain
crossing overhead, why kernel threads implementations
are a bad idea.  You have just discovered one of them:
thread local storage (TLS) is allocated only in the
address space of the thread, not on the heap, and is
therefore not visible to all threads.

You are going to need to create a connection per thread;
alternately, you are going to have to abstract an LDAP
worker process, and connect _only_ from that worker
process, and use threads IPC to communicate LDAP requests
to the worker process for it to satisfy.
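
As a rough illustration of the connection-per-thread option,
here is a minimal sketch using POSIX thread-specific data.
Everything LDAP-related in it is a stand-in: conn_t,
conn_open() and conn_close() are hypothetical names, and a
real driver would open and bind a session in conn_open() and
unbind it in conn_close().  It only shows the shape of the
idea: each thread lazily gets its own handle, and no handle
is ever shared between threads.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-thread LDAP session handle. */
typedef struct conn { int id; } conn_t;

static pthread_key_t conn_key;
static pthread_once_t conn_once = PTHREAD_ONCE_INIT;
static pthread_mutex_t id_lock = PTHREAD_MUTEX_INITIALIZER;
static int next_id = 0;

/* Hypothetical: real code would open and bind a session here. */
static conn_t *conn_open(void)
{
    conn_t *c = malloc(sizeof *c);
    if (c == NULL)
        return NULL;
    pthread_mutex_lock(&id_lock);
    c->id = ++next_id;          /* unique id, just for the demo */
    pthread_mutex_unlock(&id_lock);
    return c;
}

/* Destructor run at thread exit; real code would unbind here. */
static void conn_close(void *p)
{
    free(p);
}

/* Create the key exactly once, whichever thread gets here first. */
static void make_key(void)
{
    int rc = pthread_key_create(&conn_key, conn_close);
    assert(rc == 0);
    (void)rc;
}

/* Return the calling thread's own connection, creating it on first use. */
conn_t *my_conn(void)
{
    conn_t *c;

    pthread_once(&conn_once, make_key);
    c = pthread_getspecific(conn_key);
    if (c == NULL) {
        c = conn_open();
        if (c != NULL && pthread_setspecific(conn_key, c) != 0) {
            free(c);
            c = NULL;
        }
    }
    return c;
}

/* Thread body for trying it out: stores this thread's connection id. */
void *report_conn_id(void *out)
{
    conn_t *c = my_conn();
    *(int *)out = (c != NULL) ? c->id : -1;
    return NULL;
}
```

Two threads started on report_conn_id() should each see a
different id, while repeated my_conn() calls within one
thread should return the same pointer.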

Linux and Windows have similar problems.  It's possible
to cheat on Windows, but this is really a bug with the
thread local storage of the parent thread being mapped
visibly in threads created by the parent thread, not a
feature.  Ironically, Windows programs will appear to
work until the wrong kernel thread backing the user
space thread from a thread group is used to run the user
space thread.  Also ironically, mutex enforcement is by
kernel thread, so you have to build your own counting
semaphores if you have things running off of timer
outcalls.  Also also ironically, this was a design
decision based on win32.dll compatibility with Windows
3.11.

The underlying problem is communicating the connection
state between what are essentially separate process
address spaces.

Since you are on Solaris, "man t_sync" would probably
be informative (it's part of the POSIX XTI interface for
networking functions).  It performs for Solaris sockets
the same function that a "Free Threading Data Marshaller"
performs for Windows, in that it copies the user portion
of the connection context from one process to another, so
that references to it don't dereference nonexistent data
(like your realloc() call is doing when it tries to access
the source address with the memcpy() subroutine call).

Note that just synchronizing the network socket is _not_
sufficient: the LDAP context is not there, either, and
the copy of that needs to be adjusted for the new address
space, so it's not a trivial task.

It would be easiest to have a single worker thread, unless
you want to rewrite the API to be threads reentrant, which
means changing its architecture to some extent.  This may
be addressed in a future revision which changes the API.
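
For what it's worth, the single-worker arrangement can be
sketched with a mutex, two condition variables, and a
request queue.  This is only an outline under assumptions:
do_ldap_op() is a hypothetical placeholder for whatever the
real LDAP call would be, and the request carries a bare int
rather than a real search or bind request.  The point is
that only the worker thread ever touches the connection;
every other thread hands it a request and sleeps until the
reply is filled in.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* One queued request; the submitting thread owns it and waits on `done`. */
typedef struct request {
    int arg;                  /* input for the hypothetical operation */
    int result;               /* filled in by the worker */
    int done;                 /* set to 1 by the worker when finished */
    struct request *next;
} request_t;

static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t q_nonempty = PTHREAD_COND_INITIALIZER;
static pthread_cond_t q_done = PTHREAD_COND_INITIALIZER;
static request_t *q_head = NULL, *q_tail = NULL;
static int q_stop = 0;

/* Hypothetical stand-in for the real call on the worker's connection. */
static int do_ldap_op(int arg)
{
    return arg * 2;
}

/* Worker: the only thread that ever uses the connection. */
void *ldap_worker(void *unused)
{
    (void)unused;
    pthread_mutex_lock(&q_lock);
    for (;;) {
        request_t *r;
        int res;

        while (q_head == NULL && !q_stop)
            pthread_cond_wait(&q_nonempty, &q_lock);
        if (q_head == NULL)   /* stop requested and queue drained */
            break;

        r = q_head;           /* dequeue the oldest request */
        q_head = r->next;
        if (q_head == NULL)
            q_tail = NULL;
        pthread_mutex_unlock(&q_lock);

        res = do_ldap_op(r->arg);   /* run the call outside the lock */

        pthread_mutex_lock(&q_lock);
        r->result = res;
        r->done = 1;
        pthread_cond_broadcast(&q_done);
    }
    pthread_mutex_unlock(&q_lock);
    return NULL;
}

/* Called from any thread: enqueue, then block until the worker replies. */
int ldap_submit(int arg)
{
    request_t r = { arg, 0, 0, NULL };

    pthread_mutex_lock(&q_lock);
    assert(!q_stop);          /* submitting after stop is a caller bug */
    if (q_tail != NULL)
        q_tail->next = &r;
    else
        q_head = &r;
    q_tail = &r;
    pthread_cond_signal(&q_nonempty);
    while (!r.done)
        pthread_cond_wait(&q_done, &q_lock);
    pthread_mutex_unlock(&q_lock);
    return r.result;
}

/* Ask the worker to exit once the queue is empty. */
void ldap_worker_stop(void)
{
    pthread_mutex_lock(&q_lock);
    q_stop = 1;
    pthread_cond_signal(&q_nonempty);
    pthread_mutex_unlock(&q_lock);
}
```

Start one thread on ldap_worker(), call ldap_submit() from
wherever the driver functions are invoked, and call
ldap_worker_stop() followed by pthread_join() at shutdown.
The request lives on the submitter's stack, which is safe
here only because the submitter does not return until the
worker has marked it done.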


-- Terry Lambert
-- Whistle Communications, Inc., an I.B.M. Company
-- terry@whistle.com
-------------------------------------------------------------------
This is formal notice under California Assembly Bill 1629, enacted
9/26/98 that any UCE sent to my email address will be billed $50
per incident to the legally allowed maximum of $25,000.