
ITS#51 (hang in test 2 on solaris)



I have located the problem, which by the way still exists in
LDAP_REL_ENG_1_2 as of about 4:30 yesterday afternoon (New York
time).

Here's a description:

One thread does do_add, enters ldbm_back_add,
locks the parent (o=...c=...) record,
adds the entry, does send_ldap_request.

In the meantime the next thread takes the next request and
enters do_add. To find the parent record it calls dn2id;
this calls cache_find_entry_dn2id, which locks
the cache's c_mutex.

Back to the first thread, which wants to finish. It now tries to
do cache_set_state (which wants the cache c_mutex lock)
*before* it releases the write lock on that
same parent record. It can't get the lock; it hangs.

The second thread is now trying to get a read lock on the parent record
(still in dn2id); it can't because the first thread has the write lock. It
hangs too, so each thread is holding the lock the other one needs.

And then I kill -9 the mess and test002 has failed.

I don't know enough about the innards to say whether moving the
cache_set_state call to just after the release of the write lock on the
parent record is ok or not. 

I am happy to provide logs or other information if you want it.
Just let me know.

Ariel Glenn
AcIS R&D
Columbia University