Full_Name: JM Estrada Version: 2.4.39 OS: RHEL Linux URL: ftp://ftp.openldap.org/incoming/ Submission from: (NULL) (209.136.235.13) I am trying to determine if a problem we are having is a bug or some other issue with OpenLDAP 2.4.39. We have two servers configured as a Master/Slave using syncrepl. Both servers are running 2.4.39 and at random times, sometimes weeks apart, we are having issues where the slapd service becomes unresponsive for a period of about 10 to 15 minutes. When the problem occurs, we see numerous entries in the logs which show an UNBIND and then a close entry. We then find that the slapd is unresponsive and will not accept any requests, at the same time the CPU load for slapd skyrockets to about 100% or very close to it. This lasts for about 10-15 minutes and then the server recovers itself and again begins responding to requests. The problem is intermittent and doesn't seem to coincide with periods of heavy use versus lower usage. Is there a known bug with this version that could be causing this?
--On Tuesday, June 06, 2017 8:39 PM +0000 jmestrada69@gmail.com wrote: > Is there a known bug with this version that could be causing this? Hard to say. It is 3.5 years old and 6 releases behind. You don't state which backend you're using, which may be relevant as well. There were known fragmentation issues with back-mdb in that release, for example, that could cause extensive pauses. Without knowing significantly more about your system configuration, there's only a ton of speculation that can ensue. You may want to see about using the builds from the LTB project (<http://ltb-project.org/wiki/download#openldap>), or if you require support for your deployment, Symas (my employer) offers packaged builds and various support options. Regards, Quanah -- Quanah Gibson-Mount Product Architect Symas Corporation Packaged, certified, and supported LDAP solutions powered by OpenLDAP: <http://www.symas.com>
Quanah, We are running the Berkeley DB back-end, "back-bdb" in the slapd.conf file. Our server vendor did the upgrade to version 2.4.39 last year in April. In asking them about upgrading to a newer version, as a potential fix, I was told the last version in the RHEL repository that they can upgrade to is 2.4.40. I'm not certain that our vendor will support our choice to upgrade to a newer version than what RHEL provides them in the repository, but if it will fix our problem, I'll have to push the envelope on that matter. Were there any known fragmentation issues with back-bdb in the 2.4.39 version that could also be causing these pauses? Initially, when we started having problems with the pausing, the server would go offline for about 15-20 minutes and then recover itself. The developers had initially set the idletimeout to 8 minutes (480) and we also noted that rsyslogd was constantly logging entries about the slapd service, which stated that the PID of the slapd service was losing messages to the log due to rate-limiting. Rate limiting was enabled by default for rsyslog, so our vendor recommended turning it off. At the same time, when they did this we scaled back the idletimeout period to 5 minutes (300). This seemed to aggravate the problem. With the original settings, we would encounter this "pause" problem maybe once or twice in a 3 month period, and now after these changes were made we're seeing it more frequently, although when it does pause it seems to only be for about 10 minutes, where it was pausing for 15-20 before. We currently have the logging level set to the recommended "256", but we're considering lowering the logging level also. Is it possible we have the idletimeout set too high and it should be lowered? I'm wondering if there is some sweet-spot value for this particular setting. The reason our developers had it set so high was that in the past they used to run some really long reports. I'm pretty sure they do not run these any longer.
I appreciate your feedback. Thanks
jmestrada69@gmail.com wrote: > Our server vendor did the upgrade to version 2.4.39 last year in April. In > asking them about upgrading to a newer version, as a potential fix, I was > told the last version in the RHEL repository that they can upgrade to is > 2.4.40. They seem to just recommend what seems to be the easiest choice for them and not what would be the recommended choice for *you*. RHEL packages are heavily patched by Red Hat and generally not recommended. The upstream developers cannot oversee what's the current patch state of RHEL packages. => You should kick out your server vendor from doing the OpenLDAP support. Ciao, Michael.
Yes, I've reached out to our vendor about this. I am hoping we can sidestep the RHEL releases. Thanks for the info on this.
--On Wednesday, June 07, 2017 7:07 AM -0600 Joaquin Estrada <jmestrada69@gmail.com> wrote: > Quanah, > > We are running the Berkeley DB back-end, "back-bdb" in the slapd.conf > file. > > Our server vendor did the upgrade to version 2.4.39 last year in April. In > asking them about upgrading to a newer version, as a potential fix, I was > told the last version in the RHEL repository that they can upgrade to is > 2.4.40. I'm not certain that our vendor will support our choice to > upgrade to a newer version than what RHEL provides them in the > repository, but if it will fix our problem, I'll have to push the > envelope on that matter. > > Were there any known fragmentation issues with back-bdb in the 2.4.39 > version that could also be causing these pauses? No, back-bdb is not remotely the same as back-mdb. However, I've no idea what options Red Hat compiles their BDB library with, and there were specific options that had an effect on OpenLDAP. Generally, I would note that the back-bdb and back-hdb backends are deprecated at this point. > Is it possible we have the idletimeout set too high and it should be > lowered? I'm wondering if there is some sweet-spot value for this > particular setting. I generally leave it unset unless one is encountering an issue of running out of connections. Generally, it would be fairly strange for idletimeout to affect things this way at all. It simply drops idle connections based on the timer. Disabling rate throttling in rsyslogd is a good idea, but may be unrelated as well. We've also seen cases with RHEL7 where Red Hat has set things up so that journald also gets all the syslog messages, which causes severe performance degradation. You could spend some time seeing if you can isolate an exact cause. For example, set loglevel to 0 and see if you still encounter the issue. If you do, it is unrelated to syslog activity. Another test would be to set idletimeout to 0.
If you still encounter the issue, it is unrelated to idle connections being dropped, and so on. As Michael noted, Red Hat builds are somewhat questionable, as they make various changes to the code base that the OpenLDAP project has not reviewed. Your issues may or may not be related to such a change; it's generally impossible to know. Hope that helps. --Quanah -- Quanah Gibson-Mount Product Architect Symas Corporation Packaged, certified, and supported LDAP solutions powered by OpenLDAP: <http://www.symas.com>
quanah@symas.com wrote: >> Is it possible we have the idletimeout set too high and it should be >> lowered? I'm wondering if there is some sweet-spot value for this >> particular setting. > > I generally leave it unset unless one is encountering an issue of running > out of connections. Generally, it would be fairly strange for idletimeout > to affect things this way at all. I generally recommend setting idletimeout, even somewhat tight, if you don't have a strictly defined set of clients, because a client application which does not use its LDAP connection for ~5 min. is most of the time simply not closing connections. And running out of file handles can affect all file creation on your system (e.g. creating BDB's transaction log files). Only the original poster can find out with monitoring. One can find stale connections via back-monitor in the sub-tree cn=Connections,cn=Monitor. IIRC the attribute 'monitorConnectionActivityTime' contains the last client access time on this connection. (Ummh, I have to add this to my own monitoring script...) And of course normal system monitoring of file handles would also be helpful. Ciao, Michael. (Keep repeating this mantra: monitoring, monitoring, monitoring, monitoring…)
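A minimal sketch of the back-monitor check described above, assuming the monitor database is enabled and readable by the bind identity, and assuming 'monitorConnectionActivityTime' is returned as a UTC GeneralizedTime value such as `20170607184400Z`. The function names, the sample LDIF, and the 5-minute threshold are illustrative only; the LDIF text is expected to come from something like `ldapsearch -x -b "cn=Connections,cn=Monitor" -s one monitorConnectionActivityTime`, and folded (wrapped) LDIF lines are not handled here.

```python
from datetime import datetime, timedelta, timezone


def parse_generalized_time(value):
    """Parse a UTC GeneralizedTime with whole seconds, e.g. '20170607184400Z'."""
    return datetime.strptime(value, "%Y%m%d%H%M%SZ").replace(tzinfo=timezone.utc)


def stale_connections(ldif_text, now, max_idle):
    """Return (dn, idle_time) for every connection idle longer than max_idle.

    ldif_text is plain LDIF as printed by ldapsearch; entries that carry no
    monitorConnectionActivityTime value are skipped.
    """
    attr = "monitorConnectionActivityTime: "
    stale = []
    dn = None
    for line in ldif_text.splitlines():
        if line.startswith("dn: "):
            dn = line[len("dn: "):].strip()
        elif line.startswith(attr) and dn is not None:
            last_activity = parse_generalized_time(line[len(attr):].strip())
            idle = now - last_activity
            if idle > max_idle:
                stale.append((dn, idle))
    return stale


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the reporter's server.
    sample = (
        "dn: cn=Connection 1001,cn=Connections,cn=Monitor\n"
        "monitorConnectionActivityTime: 20170607120000Z\n"
        "\n"
        "dn: cn=Connection 1002,cn=Connections,cn=Monitor\n"
        "monitorConnectionActivityTime: 20170607125800Z\n"
    )
    now = datetime(2017, 6, 7, 13, 0, 0, tzinfo=timezone.utc)
    for dn, idle in stale_connections(sample, now, timedelta(minutes=5)):
        print(f"{dn} idle for {idle}")
```

Running this periodically and comparing the stale list against the client inventory would show whether a particular application is holding connections open without using them.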
--On Wednesday, June 07, 2017 8:44 PM +0200 Michael Ströder <michael@stroeder.com> wrote: > quanah@symas.com wrote: >>> Is it possible we have the idletimeout set too high and it should be >>> lowered? I'm wondering if there is some sweet-spot value for this >>> particular setting. >> >> I generally leave it unset unless one is encountering an issue of running >> out of connections. Generally, it would be fairly strange for >> idletimeout to affect things this way at all. > > I generally recommend to set idletimeout even somewhat tight in case you > don't have a strictly defined set of clients. Because a client > application which does not use its LDAP connection for ~5 min. is most > times simply not closing connections. And running out of file handles can > affect all file creation on your system (e.g. creating BDB's transaction > log files). Yep, there can be poorly written clients out there. I'd expect idletimeout to be completely unrelated, given its long-standing existence and use. ;) --Quanah -- Quanah Gibson-Mount Product Architect Symas Corporation Packaged, certified, and supported LDAP solutions powered by OpenLDAP: <http://www.symas.com>
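The file-handle concern raised in this exchange can be checked on Linux through /proc. The following is a rough sketch, assuming a Linux host, a caller with permission to read the target process's /proc entries (typically root or the slapd user), and the usual "Max open files" row layout of /proc/<pid>/limits. The function names are illustrative and not part of any OpenLDAP tooling.

```python
import os


def parse_max_open_files(limits_text):
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits.

    A limit of 'unlimited' is returned as None.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            fields = line.split()
            # fields: ['Max', 'open', 'files', <soft>, <hard>, 'files']
            soft, hard = fields[3], fields[4]
            return tuple(None if v == "unlimited" else int(v) for v in (soft, hard))
    raise ValueError("no 'Max open files' line found")


def open_fd_count(pid):
    """Count the file descriptors currently open in the given process."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def fd_headroom(open_fds, soft_limit):
    """Fraction of the soft limit still unused; None when the limit is unlimited."""
    if soft_limit is None:
        return None
    return 1.0 - open_fds / soft_limit


if __name__ == "__main__":
    # Inspect this script's own process as a stand-in for the slapd PID.
    pid = os.getpid()
    if os.path.isdir(f"/proc/{pid}/fd"):
        with open(f"/proc/{pid}/limits") as fh:
            soft, _hard = parse_max_open_files(fh.read())
        used = open_fd_count(pid)
        print(f"pid {pid}: {used} fds open, soft limit {soft}, "
              f"headroom {fd_headroom(used, soft)}")
```

Sampling the slapd PID this way at short intervals, and logging the result outside of syslog, would show whether descriptor use climbs toward the limit just before a pause.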
back-bdb deprecated Need further information to pursue