ITS#51 (hang in test 2 on solaris)
I have located the problem, which by the way still exists in
LDAP_REL_ENG_1_2 as of about 4:30 yesterday afternoon (New York
time).
Here's a description:
One thread does do_add, enters ldbm_back_add,
locks the parent (o=...c=...) record,
adds the entry, does send_ldap_request.
In the meantime the next thread is taking the next request,
enters do_add. It finds the parent record, calling dn2id;
this calls cache_find_entry_dn2id which mutex locks
the cache c_mutex.
Back to the first thread, which wants to finish. It now tries to
call cache_set_state (which wants the cache c_mutex lock)
*before* it releases the write lock on that
same parent record. It can't get the mutex; it hangs.
The second thread is now trying to get a read lock on the parent record
(still in dn2id); it can't because the first thread has the write lock. It
hangs.
And then I kill -9 the mess and test002 has failed.
I don't know enough about the innards to say whether moving the
cache_set_state call to just after the release of the write lock on the
parent record is ok or not.
I am happy to provide logs or other information if you want it.
Just let me know.
Ariel Glenn
AcIS R&D
Columbia University