Issue 5796 - back-sql intermittently failing
Summary: back-sql intermittently failing
Status: VERIFIED SUSPENDED
Alias: None
Product: OpenLDAP
Classification: Unclassified
Component: backends (show other issues)
Version: 2.4.12
Hardware: All All
: --- normal
Target Milestone: ---
Assignee: OpenLDAP project
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-11-06 19:43 UTC by robb@webtechnologygroup.co.uk
Modified: 2020-06-25 23:25 UTC (History)
0 users

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this issue.
Description robb@webtechnologygroup.co.uk 2008-11-06 19:43:04 UTC
Full_Name: Robert Brooks
Version: 2
OS: 2.4.12
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (62.49.19.14)


Set up is a pair of centos 4 hosts, one acts as primary, the other is a syncrepl
consumer and mysql slave. Apart from syncrepl config/myqsl slave all config and
software versions are the same. Mysql queries on both hosts are to localhost via
socket.

After a period of operation (24-48 hours) the slave host drops the back-sql part
of directory for some queries (same query will suceed some times). Using strace
it can been seen that the ldap searches are not resulting in queries to mysql.
Back-sql is a subordinate db and results from hdb backed part of directory are
returned.

Restarting slapd fixes the problem, but start-up reports the shutdown was
unclean on hdb db. Operation of primary is reliable.

Comment 1 ando@openldap.org 2008-11-07 00:02:39 UTC
changed notes
Comment 2 robb@webtechnologygroup.co.uk 2008-11-10 15:26:24 UTC
strace from failing slapd process

13637 send(3, "<167>Nov  5 10:34:39 slapd[13628"..., 117, MSG_NOSIGNAL) 
= 117
13637 semop(22052874, 0xb53ad7d0, 2)    = 0
13637 semop(22052874, 0xb53ad7e0, 1)    = 0
13637 time(NULL)                        = 1225881279
13637 semop(22052874, 0xb53ad750, 2)    = 0
13637 semop(22052874, 0xb53ad760, 1)    = 0
13637 write(57, "\27\3\1\0 
\334@\'\353]R\251-Q\251Dd>0\376&\227\326\4\222dU\t\266/\356\177"..., 90) 
= 90
13637 time(NULL)                        = 1225881279
13637 stat64("/etc/localtime", {st_dev=makedev(3, 3), st_ino=1032239, 
st_mode=S_IFREG|0644, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, 
st_blocks=16, st_size=
1323, st_atime=2008/11/05-10:31:17, st_mtime=2007/08/09-19:54:26, 
st_ctime=2007/08/09-19:54:26}) = 0
13637 stat64("/etc/localtime", {st_dev=makedev(3, 3), st_ino=1032239, 
st_mode=S_IFREG|0644, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, 
st_blocks=16, st_size=
1323, st_atime=2008/11/05-10:31:17, st_mtime=2007/08/09-19:54:26, 
st_ctime=2007/08/09-19:54:26}) = 0
13637 stat64("/etc/localtime", {st_dev=makedev(3, 3), st_ino=1032239, 
st_mode=S_IFREG|0644, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, 
st_blocks=16, st_size=
1323, st_atime=2008/11/05-10:31:17, st_mtime=2007/08/09-19:54:26, 
st_ctime=2007/08/09-19:54:26}) = 0
13637 send(3, "<167>Nov  5 10:34:39 slapd[13628"..., 76, MSG_NOSIGNAL) = 76

looks like it connect to mysql but never queries it.

any way I can capture something useful with gdb?

Comment 3 robb@webtechnologygroup.co.uk 2008-11-11 09:07:50 UTC
can re-create this problem by restarting mysql. Following a restart 
about one in five queries fails on back-sql portion of directory.

Comment 4 ando@openldap.org 2008-11-11 15:44:44 UTC
robb@wtg.cw.com wrote:
> Full_Name: Robert Brooks
> Version: 2
> OS: 2.4.12
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (62.49.19.14)
> 
> 
> Set up is a pair of centos 4 hosts, one acts as primary, the other is a syncrepl
> consumer and mysql slave. Apart from syncrepl config/myqsl slave all config and
> software versions are the same. Mysql queries on both hosts are to localhost via
> socket.
> 
> After a period of operation (24-48 hours) the slave host drops the back-sql part
> of directory for some queries (same query will suceed some times). Using strace
> it can been seen that the ldap searches are not resulting in queries to mysql.
> Back-sql is a subordinate db and results from hdb backed part of directory are
> returned.
> 
> Restarting slapd fixes the problem, but start-up reports the shutdown was
> unclean on hdb db. Operation of primary is reliable.

I have no possibility to setup a back-sql with mysql right now.  Can you 
post your configuration?  Can you post some (verbose) logs of operations 
affecting back-sql (e.g. with BACKSQL_TRACE manually #defined in 
back-sql/back-sql.h)?  Can you step thru a search into backsql_search() 
and see if it gets called and why it doesn't get to contacting the rdbms?

p.


Ing. Pierangelo Masarati
OpenLDAP Core Team

SysNet s.r.l.
via Dossi, 8 - 27100 Pavia - ITALIA
http://www.sys-net.it
-----------------------------------
Office:  +39 02 23998309
Mobile:  +39 333 4963172
Fax:     +39 0382 476497
Email:   ando@sys-net.it
-----------------------------------

Comment 5 robb@webtechnologygroup.co.uk 2008-11-11 16:27:49 UTC
Pierangelo Masarati wrote:
> I have no possibility to setup a back-sql with mysql right now.  Can you 
> post your configuration?  Can you post some (verbose) logs of operations 
> affecting back-sql (e.g. with BACKSQL_TRACE manually #defined in 
> back-sql/back-sql.h)?  Can you step thru a search into backsql_search() 
> and see if it gets called and why it doesn't get to contacting the rdbms?

I'm compiling with BACKSQL_TRACE defined right now.

After further prodding I have discovered...

ejabberd (a jabber daemon) makes connections to ldap, which it keeps 
open indefinitely. It appears mysql's odbc connector will time out an 
idle mysql connection after 8 hours. Since this is the slave system it's 
entirely possible for it not touch mysql connections for at least 8 hours.

If we restart mysql some ldap queries fail on back-sql portion until we 
kill ejabberd (to close it's ldap connections).

I'll give you logs with tracing enabled asap, but this looks like a big 
clue as to what is happening.

Comment 6 ando@openldap.org 2008-11-11 16:29:58 UTC
Robert Brooks wrote:
> Pierangelo Masarati wrote:
>> I have no possibility to setup a back-sql with mysql right now.  Can 
>> you post your configuration?  Can you post some (verbose) logs of 
>> operations affecting back-sql (e.g. with BACKSQL_TRACE manually 
>> #defined in back-sql/back-sql.h)?  Can you step thru a search into 
>> backsql_search() and see if it gets called and why it doesn't get to 
>> contacting the rdbms?
> 
> I'm compiling with BACKSQL_TRACE defined right now.
> 
> After further prodding I have discovered...
> 
> ejabberd (a jabber daemon) makes connections to ldap, which it keeps 
> open indefinitely. It appears mysql's odbc connector will time out an 
> idle mysql connection after 8 hours. Since this is the slave system it's 
> entirely possible for it not touch mysql connections for at least 8 hours.
> 
> If we restart mysql some ldap queries fail on back-sql portion until we 
> kill ejabberd (to close it's ldap connections).
> 
> I'll give you logs with tracing enabled asap, but this looks like a big 
> clue as to what is happening.

OK, this makes sense.  In fact, back-sql does not seem to specifically 
handle broken connections to the odbc.  It appears it would need 
something similar to idle-timeout as in back-ldap and back-meta.

p.


Ing. Pierangelo Masarati
OpenLDAP Core Team

SysNet s.r.l.
via Dossi, 8 - 27100 Pavia - ITALIA
http://www.sys-net.it
-----------------------------------
Office:  +39 02 23998309
Mobile:  +39 333 4963172
Fax:     +39 0382 476497
Email:   ando@sys-net.it
-----------------------------------

Comment 7 robb@webtechnologygroup.co.uk 2008-11-11 17:12:54 UTC
Pierangelo Masarati wrote:

> OK, this makes sense.  In fact, back-sql does not seem to specifically 
> handle broken connections to the odbc.  It appears it would need 
> something similar to idle-timeout as in back-ldap and back-meta.

ok, this looks like it...

Nov 11 17:01:38 nops slapd[24733]: backsql_dn2id("dc=sql,dc=example,dc=com")
: error executing query ("SELECT id,keyval,oc_map_id,dn FROM 
ldap_entries WHERE dn=?", "dc=sql,dc=example,dc=com"):
Nov 11 17:01:38 nops slapd[24733]: Return code: -1
Nov 11 17:01:38 nops slapd[24733]:    nativeErrCode=2006 
SQLengineState=08S01 msg="[unixODBC][MySQL][ODBC 3.51 
Driver][mysqld-5.0.67-log]MySQL server has gone away"
Nov 11 17:01:38 nops slapd[24733]: 
<==backsql_dn2id("dc=sql,dc=example,dc=com")
: err=80

if you'd like a tarball of full logs + config just say.

Regards,

Rob

Comment 8 ando@openldap.org 2008-11-11 17:17:10 UTC
changed notes
moved from Incoming to Software Bugs
Comment 9 robb@webtechnologygroup.co.uk 2008-11-17 16:14:48 UTC
Pierangelo Masarati wrote:

> OK, this makes sense.  In fact, back-sql does not seem to specifically 
> handle broken connections to the odbc.  It appears it would need 
> something similar to idle-timeout as in back-ldap and back-meta.

current work-around is to set "idletimeout" in openldap shorter than 
mysql's "wait_timeout" (default 8hrs).

Comment 10 robb@webtechnologygroup.co.uk 2009-01-12 11:27:20 UTC
Pierangelo Masarati wrote:

[...]

> OK, this makes sense.  In fact, back-sql does not seem to specifically 
> handle broken connections to the odbc.  It appears it would need 
> something similar to idle-timeout as in back-ldap and back-meta.

any luck with this?

Regards,

Rob

Comment 11 ando@openldap.org 2009-01-24 14:51:18 UTC
Robert Brooks wrote:

> any luck with this?

Not even in my priority list right now, sorry.  A patch would be welcome.

p.


Ing. Pierangelo Masarati
OpenLDAP Core Team

SysNet s.r.l.
via Dossi, 8 - 27100 Pavia - ITALIA
http://www.sys-net.it
-----------------------------------
Office:  +39 02 23998309
Mobile:  +39 333 4963172
Fax:     +39 0382 476497
Email:   ando@sys-net.it
-----------------------------------

Comment 12 frederik.bosch@gmail.com 2010-09-07 13:34:40 UTC
http://www.openldap.org/lists/openldap-technical/201009/msg00070.html

While googling for reconnect in the ODBC API I found this:
http://dev.mysql.com/doc/refman/5.0/en/connector-odbc-configuration-connection-parameters.html

option 4194304: Enables auto-reconnection functionality. It seems like 
it fixes my problem.

Comment 13 OpenLDAP project 2014-08-01 21:04:18 UTC
back-sql needs retry
Comment 14 Quanah Gibson-Mount 2020-06-25 23:25:34 UTC
patches welcome