
Re: ldap deadlock?



Howard Chu wrote:

Curt Blank wrote:

I'm looking for ideas here. LDAP seems to deadlock once in a while: it continues to accept connections, as noted in the log file, but it does not return anything to the query; the query just hangs.

It's OpenLDAP 2.2.28 using Berkeley DB 4.2.52 as the backend on a SuSE 9.3 platform. All patches are up to snuff on the OS side.

I'm hoping for pointers to help see what might be going on.

As of today I started running db_deadlock in the background with the -a y option to see if that helps.

This deadlocking is getting people up in arms here because it is disrupting authentication for the whole campus, and I guess I can't blame them.

There have been no deadlocks reported in OpenLDAP 2.2 after 2.2.20. More likely you had an unclean shutdown and restarted without running db_recover, so you have stale locks in the environment. You should upgrade to 2.3, which does recovery automatically.

No, I know that wasn't the case. I manually ran db_recover with the -v option about 16 hours before the last occurrence, and in between the server was not shut down, slapd did not die, and it was not stopped or restarted. This last time (last Friday) our backup started 12 minutes after slapd began accepting connections without responding with data, and that really compounded the problem. The backup runs db_checkpoint, which hung, and stopping the slapd daemon did not correct the problem. slapd stopped cleanly, but when restarted it just sat there and would not even accept connections. The db_checkpoint would not complete and was killed after about 10 minutes. I know, I know, not the best thing to do, but when you have people on campus pissed because they can't log in, time is one luxury that we do not have. And yes, db_recover was successfully run again before slapd was started. But I'm a bit leery of it right now....


One thing I failed to mention is that a slurpd replication to this slave server appears to have started at the time slapd began only accepting connections and not responding with data. That's a write, and that is what got me thinking about a deadlock situation.
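
For anyone wanting to catch this symptom as it happens (connections accepted, no data returned), here is a minimal Python sketch of a probe. It is only an illustration, not something from the setup described above: the function name `probe_ldap`, the status strings, and the timeout are all assumptions. It sends a raw LDAPv3 anonymous simple bind and waits a few seconds for any reply.

```python
import socket

# BER encoding of an LDAPv3 anonymous simple bind request (message ID 1).
ANON_BIND = bytes.fromhex("300c020101600702010304008000")


def probe_ldap(host, port=389, timeout=5.0):
    """Return 'ok', 'hung', or 'down' for the LDAP server at host:port.

    Hypothetical helper: the name and the three status strings are
    illustrative choices, not part of any OpenLDAP tool.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "down"  # could not connect at all
    with sock:
        sock.settimeout(timeout)
        try:
            sock.sendall(ANON_BIND)
            reply = sock.recv(1)
        except socket.timeout:
            return "hung"  # connection accepted, but no reply in time
        except OSError:
            return "down"
    # Every LDAP message starts with a BER SEQUENCE tag (0x30).
    return "ok" if reply == b"\x30" else "down"
```

Run from cron against the server (for example `probe_ldap("ldap.example.edu")`, a placeholder hostname), a "hung" result would give a timestamp to line up against replication and backup activity.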