
Re: OpenLDAP keeps on dying sporadically



On 28.04.15 at 02:32, Howard Chu wrote:
Leander Schäfer wrote:
Ok, here is the first result running the debugging mode with gdb(1)

 >> Procedure overview:
(gdb) run
(gdb) bt full
(gdb) thread apply all bt
(gdb) generate-core-file

No need for a core file if you're just running slapd inside gdb.
I thought the core file would be needed to get a better picture of what is happening internally. But of course I'm happy if it is not required, because it doesn't seem to be created properly in the first place.


 >> This came up:
candidates = Error accessing memory address 0x7ffffeafb6f0: Bad address.

# ================================================== #

root@FreeBSD [~]$ gdb --args /usr/local/libexec/slapd -d -1 -f
/usr/local/etc/openldap/slapd.conf -u ldap -g ldap -h
"ldapi://%2fvar%2frun%2fopenldap%2fldapi/ ldap:/// ldaps:///"
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you
are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...
(gdb) run
Starting program: /usr/local/libexec/slapd -d -1 -f
/usr/local/etc/openldap/slapd.conf -u ldap -g ldap -h
ldapi://%2fvar%2frun%2fopenldap%2fldapi/\ ldap:///\ ldaps:///
[New LWP 101138]
[New Thread 802806400 (LWP 101138/slapd)]

[...]

553e8a87 conn=1006 op=2 SRCH attr=mailAlias
553e8a87 send_ldap_result: err=0 matched="" text=""
0010: 51 bd aa 7d 3f 1c 50 fb 25 f8 59 9e 9d 9a ba 0f Q..}?.P.%.Y.....
0020: d0 07 aa 95 ac 1c e7 3e 81 f6 e6 0b 6d 09 94 9b .......>....m...
0730: 1b 51 e3 08 4b 38 ec f1 ee 8c 0f 35 cd 55 eb 80 .Q..K8.....5.U..
553e8a87 ==> limits_get: conn=1006 op=2 self="[anonymous]"
this="ou=accounts,ou=mail,dc=mydomain,dc=local"
0740: 83 e2 3b b5 13 fd 08 51 13 25 d9 7d 57 9f 6b e9 ..;....Q.%.}W.k.
[New Thread 943c11800 (LWP 100198/slapd)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 943c11800 (LWP 100198/slapd)]
mdb_search (op=0x94581c400, rs=0x7ffffebfbb60) at search.c:404
404     search.c: No such file or directory.
         in search.c
Current language:  auto; currently minimal
(gdb) bt full
#0  mdb_search (op=0x94581c400, rs=0x7ffffebfbb60) at search.c:404
         mdb = (struct mdb_info *) 0x80290a000
         id = 0
         cursor = 0
         nsubs = 128
         ncand = 0
         cscope = 0
         lastid = 18446744073709551615
         candidates = Error accessing memory address 0x7ffffeafb6f0: Bad
address.
(gdb) thread apply all bt
[New Thread 943c15000 (LWP 101255/slapd)]
[New Thread 943c14c00 (LWP 101213/slapd)]
[New Thread 943c14800 (LWP 101202/slapd)]
[New Thread 943c14400 (LWP 100898/slapd)]
[New Thread 943c14000 (LWP 100884/slapd)]
[New Thread 943c13c00 (LWP 100647/slapd)]
[New Thread 943c13800 (LWP 100619/slapd)]
[New Thread 943c13400 (LWP 100577/slapd)]
[New Thread 943c13000 (LWP 100531/slapd)]
[New Thread 943c12c00 (LWP 100515/slapd)]
[New Thread 943c12800 (LWP 100347/slapd)]
[New Thread 943c12400 (LWP 100311/slapd)]
[New Thread 943c12000 (LWP 100296/slapd)]
[New Thread 943c11c00 (LWP 100268/slapd)]
[New Thread 943c11400 (LWP 100165/slapd)]
[New Thread 802807800 (LWP 100103/slapd)]

Thread 19 (Thread 802807800 (LWP 100103/slapd)):
#0  0x0000000801aa78cc in __error () from /lib/libthr.so.3
#1  0x0000000801aa27f4 in pthread_mutex_destroy () from /lib/libthr.so.3
#2  0x0000000801dfc237 in flockfile () from /lib/libc.so.7
#3  0x0000000801dd7e64 in fputs () from /lib/libc.so.7
#4  0x0000000800bfd48f in lutil_debug () from
/usr/local/lib/liblber-2.4.so.2
#5  0x000000000043b96f in slapd_daemon_task (ptr=0x8028afb08) at
daemon.c:2530
#6  0x0000000801a9c4f5 in pthread_create () from /lib/libthr.so.3

Seems like something went wrong here. Am I using gdb wrong?

Looks like your liblber was installed without debug symbols. Most of these stack traces look invalid.
What does this mean? How did you see this, and what indicated it to you? Does this liblber issue need to be fixed to get a better debug result, or is the current output OK for a first diagnosis?
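For what it's worth, one rough indicator in gdb output: a frame that gdb can map to source ends in "at file.c:line" (and shows argument values), while a frame from a binary without debug info only shows "from /path/to/library". The following is a minimal sketch that sorts the Thread 19 frames quoted above by that criterion. The helper name classify_frame is made up for illustration, and the check says nothing about whether a given frame is actually valid.

```python
import re

# Frames copied from the "Thread 19" backtrace above, joined onto single lines.
FRAMES = [
    "#0  0x0000000801aa78cc in __error () from /lib/libthr.so.3",
    "#1  0x0000000801aa27f4 in pthread_mutex_destroy () from /lib/libthr.so.3",
    "#2  0x0000000801dfc237 in flockfile () from /lib/libc.so.7",
    "#3  0x0000000801dd7e64 in fputs () from /lib/libc.so.7",
    "#4  0x0000000800bfd48f in lutil_debug () from /usr/local/lib/liblber-2.4.so.2",
    "#5  0x000000000043b96f in slapd_daemon_task (ptr=0x8028afb08) at daemon.c:2530",
    "#6  0x0000000801a9c4f5 in pthread_create () from /lib/libthr.so.3",
]

SOURCE_RE = re.compile(r" at (\S+):(\d+)$")  # frame resolved to file:line
LIB_RE = re.compile(r" from (\S+)$")         # frame resolved to a library only


def classify_frame(line):
    """Return ("source", file) or ("no-line-info", library) for one frame."""
    match = SOURCE_RE.search(line)
    if match:
        return ("source", match.group(1))
    match = LIB_RE.search(line)
    if match:
        return ("no-line-info", match.group(1))
    return ("unknown", "")


RESULTS = [classify_frame(frame) for frame in FRAMES]

if __name__ == "__main__":
    for frame, (kind, where) in zip(FRAMES, RESULTS):
        print(frame.split()[0], kind, where)
```

In this trace only frame #5 (slapd itself) carries file and line information; the liblber, libc and libthr frames do not.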

On 27.04.15 at 19:04, Michael Ströder wrote:
Leander Schäfer wrote:
Can you please provide me with a link? I wasn't able to find
"current RE24" on the official website or on the FTP mirror.

Use git or this link to check out a snapshot of the RE24 branch:

http://www.openldap.org/devel/gitweb.cgi?p=openldap.git;a=snapshot;h=refs/heads/OPENLDAP_REL_ENG_2_4;sf=tgz

Assuming you compiled the latest snapshot, the SEGV at back-mdb/search.c:404 does not make much sense; it's a return statement.

Also, as back-mdb didn't exist 5 years ago, this cannot be the same issue you've been running into all the time.
^ Quite possible. But again, the exit points are always different. I could do the same thing with BDB and it would still exit at different points.

Perhaps you've hit a stack overrun. Generally slapd uses 8MB stacks on 64bit machines. It seems from your ulimit output that 8MB should be fine, so that also seems unlikely.
Since this exceeds my current knowledge, I decided to ask the development experts on the FreeBSD Forums: https://forums.freebsd.org/threads/thread-stack-size-segmentation-fault.51419/
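As a small sanity check on the stack question, the two addresses gdb printed can be compared directly. This is only a sketch of the arithmetic, under the assumption that rs and candidates both live on the crashing thread's stack and that the stack grows downward; it does not prove or rule out a stack overrun by itself.

```python
# Addresses copied from the gdb output in this thread.
RS_ADDR = 0x7ffffebfbb60          # rs argument passed to mdb_search
CANDIDATES_ADDR = 0x7ffffeafb6f0  # address gdb reported as "Bad address"

# Stack size mentioned above for 64-bit machines.
STACK_8MB = 8 * 1024 * 1024

# How far below rs the unreadable address lies
# (assumption: same thread stack, growing downward).
distance = RS_ADDR - CANDIDATES_ADDR

if __name__ == "__main__":
    print("distance in bytes:", distance, hex(distance))
    print("distance in MiB:  ", distance / (1024 * 1024))
    print("within an 8 MiB stack span:", distance < STACK_8MB)
```

The unreadable address sits a little over 1 MiB below rs, so comparing that distance against the stack size the thread was really created with may be a useful next step.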


What was the full LDAP search request that was running at the moment of the crash? Mainly interested in seeing the search filter, and how complex it was, as well as the depth of the DIT.
The search filter looks quite simple to me:
==> 553e8a87 conn=1006 op=2 SRCH base="ou=accounts,ou=mail,dc=MyDomain,dc=Local" scope=2 deref=0 filter="(mailAddress=root@wm-01.mydomain.local)"
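To pull the same details out of a longer debug log, the SRCH line can be picked apart with a regular expression. This is a minimal sketch assuming the line layout shown above; the names parse_srch and base_rdns are made up for illustration, and the RDN count is a naive comma split that would miscount DNs containing escaped commas. The scope numbers follow the LDAP protocol values (0 = base, 1 = one level, 2 = subtree).

```python
import re

# SRCH line copied from the debug output above.
LINE = (
    '553e8a87 conn=1006 op=2 SRCH '
    'base="ou=accounts,ou=mail,dc=MyDomain,dc=Local" '
    'scope=2 deref=0 filter="(mailAddress=root@wm-01.mydomain.local)"'
)

SRCH_RE = re.compile(
    r'conn=(?P<conn>\d+) op=(?P<op>\d+) SRCH '
    r'base="(?P<base>[^"]*)" scope=(?P<scope>\d+) deref=(?P<deref>\d+) '
    r'filter="(?P<filter>[^"]*)"'
)

# LDAP search scope values as defined by the protocol.
SCOPE_NAMES = {0: "base", 1: "one", 2: "sub"}


def parse_srch(line):
    """Return the fields of a SRCH log line as a dict, or None if no match."""
    match = SRCH_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["scope_name"] = SCOPE_NAMES.get(int(fields["scope"]), "unknown")
    # Naive depth estimate: number of comma-separated RDNs in the base DN.
    fields["base_rdns"] = len(fields["base"].split(","))
    return fields


PARSED = parse_srch(LINE)

if __name__ == "__main__":
    for key, value in PARSED.items():
        print(key, "=", value)
```

For this request that gives a subtree search with a single equality filter, starting from a base DN four RDNs deep.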

Here is a bit more of the debug output.

[...]

553e8a87 daemon: waked
  0670:  63 04 aa 27 74 94 4c 86  1d d7 3f a1 2f af 22 0f c..'t.L...?./.".
553e8a87  mailAlias553e8a87
  0680:  41 88 ed e0 c6 d0 4e 92  ed 2f b1 40 51 ae a8 77 A.....N../.@Q..w
ber_get_next: tag 0x30 len 12 contents:
  0690:  fe b9 b7 18 ae 33 df a4  27 50 0e 3d 8d 17 31 ee .....3..'P.=..1.
  06a0:  21 e9 ed c7 a2 90 fa 6f  3e b3 87 ff d8 6b d4 e8 !......o>....k..
  0030:  c8 57 d6 0e 59 3c 7b ef  b9 db fe 64 f9 4d 02 10 .W..Y<{....d.M..
  0040:  af 62 1a 21 b3 68 08 50  31 1e 6b 09 8f da 88 84 .b.!.h.P1.k.....
553e8a87 conn=1006 op=2 SRCH base="ou=accounts,ou=mail,dc=MyDomain,dc=Local" scope=2 deref=0 filter="(mailAddress=root@wm-01.mydomain.local)"
  0010:  6d 61 69 6c 2c 64 63 3d  4e 65 74 4f 63 65 61 6e mail,dc=Mydomain
  0020:  2c 64 63 3d 4c 6f 63 61  6c 0a 01 02 0a 01 00 02 ,dc=Local.......
  0030:  01 00 02 01 0a 01 01 00  a3 29 04 0b 6d 61 69 6c .........)..mail
  0040:  41 64 64 72 65 73 73 04  1a 72 6f 6f 74 40 77 6d Address..root@wm
ldap_read: want=8 error=Resource temporarily unavailable
ber_dump: buf=0x947c19070 ptr=0x947c19070 end=0x947c1907c len=12
553e8a87 conn=1011 op=1 do_bind
ber_scanf fmt ({imt) ber:
553e8a87 <= mdb_list_candidates: id=0 first=0 last=0
553e8a87 daemon: select: listen=10 active_threads=0 tvp=NULL
  0440:  2d 30 31 2e 4e 65 74 4f  63 65 61 6e 2e 4c 6f 63 -01.Mydomain.Loc
  0450:  61 6c 2f 50 4b 49 2f 43  41 2f 53 69 67 6e 69 6e al/PKI/CA/Signin
  0000:  aa 58 54 9e 30 18 bb 75  df 8e 62 e4 62 b0 5e 39 .XT.0..u..b.b.^9
553e8a87 send_ldap_response: msgid=3 tag=101 err=0
ber_flush2: 14 bytes to sd 19
TLS trace: SSL_accept:SSLv3 write finished A
  06b0:  aa 09 24 e0 e3 6a c8 f2  51 06 ca 80 0c 3c c3 f8 ..$..j..Q....<..
  06c0:  32 df 78 c7 1d 90 62 d0  a3 34 cd 63 18 fa 96 1f 2.x...b..4.c....
tls_write: want=226, written=226
  06d0:  4d 3e 23 5a 7f 8f c0 59  13 58 e7 c0 ad 15 a0 53 M>#Z...Y.X.....S
  06e0:  7d 48 35 a8 6c 5f d8 f2  b3 cc e2 dd 0e b0 54 02 }H5.l_........T.
553e8a87 <= mdb_filter_candidates: id=0 first=0 last=0
553e8a87 mdb_search_candidates: id=0 first=0 last=0
553e8a87 mdb_search: no candidates
  0050:  35 53 8d a9 94 ee bc 81  ab 9e ca af 51 dc 86 18 5S..........Q...
  0060:  f2 83 8b 13 23 1c b3 18  6e 9e 90 ce 07 7b 31 c1 ....#...n....{1.
  0070:  95 f3 7c 1e 85 14 4b 54  4b 52 8b ..|...KTKR.
ldap_read: want=91, got=91
ldap_read: want=8, got=8
ber_dump: buf=0x947171180 ptr=0x947171180 end=0x9471711f6 len=118
  0000:  02 01 03 63 71 04 28 6f  75 3d 61 63 63 6f 75 6e ...cq.(ou=accoun
  0000:  16 03 03 00 aa 04 00 00  a6 00 00 01 2c 00 a0 2b ............,..+
ber_dump: buf=0x948179180 ptr=0x948179183 end=0x9481791f6 len=115
  0000:  63 71 04 28 6f 75 3d 61  63 63 6f 75 6e 74 73 2c cq.(ou=accounts,
  0010:  6f 75 3d 6d 61 69 6c 2c  64 63 3d 4e 65 74 4f 63 ou=mail,dc=NetOc
553e8a87 send_ldap_result: conn=1005 op=2 p=3
553e8a87 conn=1006 op=2 SRCH attr=mailAlias
553e8a87 send_ldap_result: err=0 matched="" text=""
  0010:  51 bd aa 7d 3f 1c 50 fb  25 f8 59 9e 9d 9a ba 0f Q..}?.P.%.Y.....
  0020:  d0 07 aa 95 ac 1c e7 3e  81 f6 e6 0b 6d 09 94 9b .......>....m...
  0730:  1b 51 e3 08 4b 38 ec f1  ee 8c 0f 35 cd 55 eb 80 .Q..K8.....5.U..
553e8a87 ==> limits_get: conn=1006 op=2 self="[anonymous]" this="ou=accounts,ou=mail,dc=mydomain,dc=local"
  0740:  83 e2 3b b5 13 fd 08 51  13 25 d9 7d 57 9f 6b e9 ..;....Q.%.}W.k.
[New Thread 943c11800 (LWP 100198/slapd)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 943c11800 (LWP 100198/slapd)]
mdb_search (op=0x94581c400, rs=0x7ffffebfbb60) at search.c:404
404     search.c: No such file or directory.
        in search.c
Current language:  auto; currently minimal
(gdb) bt full
#0  mdb_search (op=0x94581c400, rs=0x7ffffebfbb60) at search.c:404
        mdb = (struct mdb_info *) 0x80290a000
        id = 0
        cursor = 0
        nsubs = 128
        ncand = 0
        cscope = 0
        lastid = 18446744073709551615
candidates = Error accessing memory address 0x7ffffeafb6f0: Bad address.
(gdb)