Full_Name: Toby Blake Version: 2.4.15 OS: Scientific Linux 5.2 URL: ftp://ftp.openldap.org/incoming/ Submission from: (NULL) (129.215.197.95) Hi there, I have been testing slapo-pcache/back-ldap for some time now, typically in student lab situations (to provide a localhost proxycaching slapd with back-ldap connecting (via gssapi) to several remote servers), initially using 2.3.x but I have recently moved to using the latest 2.4.x versions, as it's clear that's where the development effort is going. I'm seeing quite a lot of crashes during my testing, with a few different backtraces. I don't know whether you would prefer I submitted these separately for different backtraces, so I'm opening this ITS to get the process started... I have seen no way of reproducing these crashes. I'm currently using: - openldap-2.4.15 on scientific linux 5.2 - bdb 4.7.25 with 3 patches - We build our own RPMs. I have openldap-2.4.15 with no optimisation (-O0) for the purposes of debugging. The most common backtrace I see is an assertion failure in memory.c:152. Here's an example: Program terminated with signal 6, Aborted. #0 0x00b8d402 in __kernel_vsyscall () (gdb) bt #0 0x00b8d402 in __kernel_vsyscall () #1 0x009e8d20 in raise () from /lib/libc.so.6 #2 0x009ea631 in abort () from /lib/libc.so.6 #3 0x00a20e6b in __libc_message () from /lib/libc.so.6 #4 0x00a28b16 in _int_free () from /lib/libc.so.6 #5 0x00a2c070 in free () from /lib/libc.so.6 #6 0x081d5746 in ber_memfree_x (p=0xa6809478, ctx=0x0) at memory.c:152 #7 0x081d630c in ber_bvarray_free_x (a=0x9049bb8, ctx=0x0) at memory.c:731 #8 0x081d6343 in ber_bvarray_free (a=0x9049bb8) at memory.c:741 #9 0x08079757 in attr_clean (a=0xb62d41bc) at attr.c:134 #10 0x0807983b in attrs_free (a=0xb62d41bc) at attr.c:196 #11 0x0807c059 in entry_clean (e=0xb5acfdbc) at entry.c:504 #12 0x0811ec17 in ldap_back_search (op=0xa680e5d8, rs=0xb5ad10e0) at search.c:354 #13 0x080f2b02 in overlay_op_walk (op=0xa680e5d8, rs=0xb5ad10e0, which=op_search, oi=0x8fa9208, on=0x0) at backover.c:669 #14 0x080f2d0a in over_op_func (op=0xa680e5d8, rs=0xb5ad10e0, which=op_search) at backover.c:721 #15 0x080f2dae in over_op_search (op=0xa680e5d8, rs=0xb5ad10e0) at backover.c:743 #16 0x080720d6 in fe_op_search (op=0xa680e5d8, rs=0xb5ad10e0) at search.c:366 #17 0x08071a2c in do_search (op=0xa680e5d8, rs=0xb5ad10e0) at search.c:217 #18 0x0806e705 in connection_operation (ctx=0xb5ad11d0, arg_v=0xa680e5d8) at connection.c:1097 #19 0x0806ebdf in connection_read_thread (ctx=0xb5ad11d0, argv=0x13) at connection.c:1223 #20 0x0819ecad in ldap_int_thread_pool_wrapper (xpool=0x8f86fa0) at tpool.c:663 #21 0x00b3346b in start_thread () from /lib/libpthread.so.0 #22 0x00a90dbe in clone () from /lib/libc.so.6 (gdb) I have seen quite a few of these backtraces, although sometimes coming through slap_sl_free (and a couple of other places) instead of ber_bvarray_free_x. Let me know what further information I can provide. Many thanks Toby Blake School of Informatics University of Edinburgh
toby@inf.ed.ac.uk wrote: > Full_Name: Toby Blake > Version: 2.4.15 > OS: Scientific Linux 5.2 > URL: ftp://ftp.openldap.org/incoming/ > Submission from: (NULL) (129.215.197.95) > > > Hi there, > > I have been testing slapo-pcache/back-ldap for some time now, > typically in student lab situations (to provide a localhost > proxycaching slapd with back-ldap connecting (via gssapi) to several > remote servers), initially using 2.3.x but I have recently moved to > using the latest 2.4.x versions, as it's clear that's where the > development effort is going. > > I'm seeing quite a lot of crashes during my testing, with a few > different backtraces. I don't know whether you would prefer I > submitted these separately for different backtraces, so I'm opening > this ITS to get the process started... > > I have seen no way of reproducing these crashes. > > I'm currently using: > - openldap-2.4.15 on scientific linux 5.2 > - bdb 4.7.25 with 3 patches > - We build our own RPMs. I have openldap-2.4.15 with no optimisation > (-O0) for the purposes of debugging. > > The most common backtrace I see is an assertion failure in > memory.c:152. Here's an example: > > Program terminated with signal 6, Aborted. > #0 0x00b8d402 in __kernel_vsyscall () > (gdb) bt > #0 0x00b8d402 in __kernel_vsyscall () > #1 0x009e8d20 in raise () from /lib/libc.so.6 > #2 0x009ea631 in abort () from /lib/libc.so.6 > #3 0x00a20e6b in __libc_message () from /lib/libc.so.6 > #4 0x00a28b16 in _int_free () from /lib/libc.so.6 > #5 0x00a2c070 in free () from /lib/libc.so.6 > #6 0x081d5746 in ber_memfree_x (p=0xa6809478, ctx=0x0) at memory.c:152 > #7 0x081d630c in ber_bvarray_free_x (a=0x9049bb8, ctx=0x0) at memory.c:731 > #8 0x081d6343 in ber_bvarray_free (a=0x9049bb8) at memory.c:741 > #9 0x08079757 in attr_clean (a=0xb62d41bc) at attr.c:134 The problem seems to be here; apparently, the a->a_nvals array is being incorrectly freed. If you have those core files around, it would be interesting to see the contents of those attributes, e.g. p *a p a->a_desc[0] p a->a_flags p a->a_nvals[0] from within frame 9. > #10 0x0807983b in attrs_free (a=0xb62d41bc) at attr.c:196 > #11 0x0807c059 in entry_clean (e=0xb5acfdbc) at entry.c:504 > #12 0x0811ec17 in ldap_back_search (op=0xa680e5d8, rs=0xb5ad10e0) > at search.c:354 Also, it would be interesting to see the parameters of the search; e.g. p op->o_req_ndn p op->o_request.oq_search p op->o_request.oq_search.rs_filterstr p op->o_request.oq_search.rs_attrs[0] (unless NULL) from within frame 12 > #13 0x080f2b02 in overlay_op_walk (op=0xa680e5d8, rs=0xb5ad10e0, > which=op_search, oi=0x8fa9208, on=0x0) at backover.c:669 > #14 0x080f2d0a in over_op_func (op=0xa680e5d8, rs=0xb5ad10e0, which=op_search) > at backover.c:721 > #15 0x080f2dae in over_op_search (op=0xa680e5d8, rs=0xb5ad10e0) > at backover.c:743 > #16 0x080720d6 in fe_op_search (op=0xa680e5d8, rs=0xb5ad10e0) at search.c:366 > #17 0x08071a2c in do_search (op=0xa680e5d8, rs=0xb5ad10e0) at search.c:217 > #18 0x0806e705 in connection_operation (ctx=0xb5ad11d0, arg_v=0xa680e5d8) > at connection.c:1097 > #19 0x0806ebdf in connection_read_thread (ctx=0xb5ad11d0, argv=0x13) > at connection.c:1223 > #20 0x0819ecad in ldap_int_thread_pool_wrapper (xpool=0x8f86fa0) at tpool.c:663 > #21 0x00b3346b in start_thread () from /lib/libpthread.so.0 > #22 0x00a90dbe in clone () from /lib/libc.so.6 > (gdb) > > I have seen quite a few of these backtraces, although sometimes coming > through slap_sl_free (and a couple of other places) instead of > ber_bvarray_free_x. > > Let me know what further information I can provide. Finally, it would be great to see your slapd.conf. p. Ing. Pierangelo Masarati OpenLDAP Core Team SysNet s.r.l. via Dossi, 8 - 27100 Pavia - ITALIA http://www.sys-net.it ----------------------------------- Office: +39 02 23998309 Mobile: +39 333 4963172 Fax: +39 0382 476497 Email: ando@sys-net.it -----------------------------------
On 14 Mar 2009, at 15:49, Pierangelo Masarati wrote: Hello, Thanks for your quick reply. >> (gdb) bt >> #0 0x00b8d402 in __kernel_vsyscall () >> #1 0x009e8d20 in raise () from /lib/libc.so.6 >> #2 0x009ea631 in abort () from /lib/libc.so.6 >> #3 0x00a20e6b in __libc_message () from /lib/libc.so.6 >> #4 0x00a28b16 in _int_free () from /lib/libc.so.6 >> #5 0x00a2c070 in free () from /lib/libc.so.6 >> #6 0x081d5746 in ber_memfree_x (p=0xa6809478, ctx=0x0) at memory.c: >> 152 >> #7 0x081d630c in ber_bvarray_free_x (a=0x9049bb8, ctx=0x0) at >> memory.c:731 >> #8 0x081d6343 in ber_bvarray_free (a=0x9049bb8) at memory.c:741 >> #9 0x08079757 in attr_clean (a=0xb62d41bc) at attr.c:134 > > The problem seems to be here; apparently, the a->a_nvals array is > being incorrectly freed. If you have those core files around, it > would be interesting to see the contents of those attributes, e.g. > > p *a > p a->a_desc[0] > p a->a_flags > p a->a_nvals[0] > > from within frame 9. (gdb) up #9 0x08079757 in attr_clean (a=0xb62d41bc) at attr.c:134 134 ber_bvarray_free( a->a_nvals ); (gdb) p *a $1 = {a_desc = 0x8faa150, a_vals = 0x90487f0, a_nvals = 0x9049bb8, a_numvals = 631, a_flags = 0, a_next = 0x0} (gdb) p a->a_desc[0] $2 = {ad_next = 0x0, ad_type = 0x8fbba20, ad_cname = {bv_len = 9, bv_val = 0x8fba5d8 "memberUid"}, ad_tags = {bv_len = 0, bv_val = 0x0}, ad_flags = 0} (gdb) p a->a_flags $3 = 0 (gdb) p a->a_nvals[0] $4 = {bv_len = 8, bv_val = 0x905aac0 "v1ggoodh"} (gdb) >> #10 0x0807983b in attrs_free (a=0xb62d41bc) at attr.c:196 >> #11 0x0807c059 in entry_clean (e=0xb5acfdbc) at entry.c:504 >> #12 0x0811ec17 in ldap_back_search (op=0xa680e5d8, rs=0xb5ad10e0) >> at search.c:354 > > Also, it would be interesting to see the parameters of the search; > e.g. > > p op->o_req_ndn > p op->o_request.oq_search > p op->o_request.oq_search.rs_filterstr > p op->o_request.oq_search.rs_attrs[0] (unless NULL) > > from within frame 12 (gdb) up #12 0x0811ec17 in ldap_back_search (op=0xa680e5d8, rs=0xb5ad10e0) at search.c:354 354 entry_clean( &ent ); (gdb) p op->o_req_ndn $5 = {bv_len = 24, bv_val = 0xb44ce1e4 "dc=inf,dc=ed,dc=ac,dc=uk"} (gdb) p op->o_request.oq_search $6 = {rs_scope = 2, rs_deref = 0, rs_slimit = 24576, rs_tlimit = 3600, rs_limit = 0x8fa88b8, rs_attrsonly = 0, rs_attrs = 0xb44ce904, rs_filter = 0xb44ce86c, rs_filterstr = {bv_len = 112, bv_val = 0xb44ce87c "(&(objectClass=posixGroup)(| (memberUid=vlavrenk) (uniqueMember=uid=vlavrenk,ou=people,dc=inf,dc=ed,dc=ac,dc=uk)))"}} (gdb) p op->o_request.oq_search.rs_filterstr $7 = {bv_len = 112, bv_val = 0xb44ce87c "(&(objectClass=posixGroup)(| (memberUid=vlavrenk) (uniqueMember=uid=vlavrenk,ou=people,dc=inf,dc=ed,dc=ac,dc=uk)))"} (gdb) p op->o_request.oq_search.rs_attrs[0] $8 = {an_name = {bv_len = 9, bv_val = 0x907b012 "gidNumber"}, an_desc = 0x8f85930, an_oc_exclude = 0, an_oc = 0x0} (gdb) > Finally, it would be great to see your slapd.conf. No problem, this is the database, back-ldap, pcache part of it: database ldap suffix dc=inf,dc=ed,dc=ac,dc=uk rootdn uid=ldaprep/ hostname.inf.ed.ac.uk,cn=inf.ed.ac.uk,cn=gssapi,cn=auth uri "ldap://server1.inf.ed.ac.uk/,ldap:// server2.inf.ed.ac.uk" idassert-bind mode=none bindmethod=sasl saslmech=GSSAPI idassert-authzFrom "dn:*" overlay pcache proxycache bdb 5000 4 500 300 proxycachequeries 10000 proxyattrset 0 cn userPassword memberUid uniqueMember gidNumber proxyattrset 1 uid userPassword uidNumber gidNumber cn homeDirectory loginShell gecos description objectClass afsHomeDirectory primaryRoles secondary Roles proxyattrset 2 cn nisNetgroupTriple memberNisNetgroup proxyattrset 3 "*" proxytemplate (uid=) 1 1800 1800 proxytemplate (&(objectClass=)(uid=)) 1 1800 1800 proxytemplate (&(objectClass=)(cn=)) 0 1800 1800 proxytemplate (&(objectClass=)(cn=)) 2 1800 1800 proxytemplate (&(objectClass=)(|(memberUid=) (uniqueMember=))) 0 1800 1800 proxytemplate (&(objectClass=)(uniqueMember=)) 0 1800 1800 proxytemplate (&(objectClass=)(memberUid=)) 0 1800 1800 proxytemplate (&(objectClass=)(uidNumber=)) 1 1800 1800 proxytemplate (&(objectClass=)(gidNumber=)) 0 1800 1800 proxytemplate (&(objectClass=)(amdmapName=)(amdmapKey=)) 3 1800 1800 proxytemplate (&(objectClass=)(uid=)) 3 1800 1800 directory /var/openldap-data checkpoint 128 60 cachesize 10000 idlcachesize 10000 dbconfig set_cachesize 0 16777216 1 dbconfig set_lg_regionmax 262144 dbconfig set_lg_bsize 2097152 If there's anything else that would be useful, let me know. As I mentioned, I have a number of other core files, including some that look the same as this one. Cheers Toby -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
toby@inf.ed.ac.uk wrote: > On 14 Mar 2009, at 15:49, Pierangelo Masarati wrote: > > Hello, > > Thanks for your quick reply. > >>> (gdb) bt >>> #0 0x00b8d402 in __kernel_vsyscall () >>> #1 0x009e8d20 in raise () from /lib/libc.so.6 >>> #2 0x009ea631 in abort () from /lib/libc.so.6 >>> #3 0x00a20e6b in __libc_message () from /lib/libc.so.6 >>> #4 0x00a28b16 in _int_free () from /lib/libc.so.6 >>> #5 0x00a2c070 in free () from /lib/libc.so.6 >>> #6 0x081d5746 in ber_memfree_x (p=0xa6809478, ctx=0x0) at memory.c: >>> 152 >>> #7 0x081d630c in ber_bvarray_free_x (a=0x9049bb8, ctx=0x0) at >>> memory.c:731 >>> #8 0x081d6343 in ber_bvarray_free (a=0x9049bb8) at memory.c:741 >>> #9 0x08079757 in attr_clean (a=0xb62d41bc) at attr.c:134 >> The problem seems to be here; apparently, the a->a_nvals array is >> being incorrectly freed. If you have those core files around, it >> would be interesting to see the contents of those attributes, e.g. >> >> p *a >> p a->a_desc[0] >> p a->a_flags >> p a->a_nvals[0] >> >> from within frame 9. > > > (gdb) up > #9 0x08079757 in attr_clean (a=0xb62d41bc) at attr.c:134 > 134 ber_bvarray_free( a->a_nvals ); > (gdb) p *a > $1 = {a_desc = 0x8faa150, a_vals = 0x90487f0, a_nvals = 0x9049bb8, > a_numvals = 631, a_flags = 0, a_next = 0x0} > (gdb) p a->a_desc[0] > $2 = {ad_next = 0x0, ad_type = 0x8fbba20, ad_cname = {bv_len = 9, > bv_val = 0x8fba5d8 "memberUid"}, ad_tags = {bv_len = 0, bv_val = > 0x0}, > ad_flags = 0} > (gdb) p a->a_flags > $3 = 0 > (gdb) p a->a_nvals[0] > $4 = {bv_len = 8, bv_val = 0x905aac0 "v1ggoodh"} > (gdb) > > >>> #10 0x0807983b in attrs_free (a=0xb62d41bc) at attr.c:196 >>> #11 0x0807c059 in entry_clean (e=0xb5acfdbc) at entry.c:504 >>> #12 0x0811ec17 in ldap_back_search (op=0xa680e5d8, rs=0xb5ad10e0) >>> at search.c:354 >> Also, it would be interesting to see the parameters of the search; >> e.g. >> >> p op->o_req_ndn >> p op->o_request.oq_search >> p op->o_request.oq_search.rs_filterstr >> p op->o_request.oq_search.rs_attrs[0] (unless NULL) >> >> from within frame 12 > > > (gdb) up > #12 0x0811ec17 in ldap_back_search (op=0xa680e5d8, rs=0xb5ad10e0) > at search.c:354 > 354 entry_clean( &ent ); > (gdb) p op->o_req_ndn > $5 = {bv_len = 24, bv_val = 0xb44ce1e4 "dc=inf,dc=ed,dc=ac,dc=uk"} > (gdb) p op->o_request.oq_search > $6 = {rs_scope = 2, rs_deref = 0, rs_slimit = 24576, rs_tlimit = 3600, > rs_limit = 0x8fa88b8, rs_attrsonly = 0, rs_attrs = 0xb44ce904, > rs_filter = 0xb44ce86c, rs_filterstr = {bv_len = 112, > bv_val = 0xb44ce87c "(&(objectClass=posixGroup)(| > (memberUid=vlavrenk) > (uniqueMember=uid=vlavrenk,ou=people,dc=inf,dc=ed,dc=ac,dc=uk)))"}} > (gdb) p op->o_request.oq_search.rs_filterstr > $7 = {bv_len = 112, > bv_val = 0xb44ce87c "(&(objectClass=posixGroup)(| > (memberUid=vlavrenk) > (uniqueMember=uid=vlavrenk,ou=people,dc=inf,dc=ed,dc=ac,dc=uk)))"} > (gdb) p op->o_request.oq_search.rs_attrs[0] > $8 = {an_name = {bv_len = 9, bv_val = 0x907b012 "gidNumber"}, > an_desc = 0x8f85930, an_oc_exclude = 0, an_oc = 0x0} > (gdb) > > >> Finally, it would be great to see your slapd.conf. > > > No problem, this is the database, back-ldap, pcache part of it: > > database ldap > suffix dc=inf,dc=ed,dc=ac,dc=uk > rootdn uid=ldaprep/ > hostname.inf.ed.ac.uk,cn=inf.ed.ac.uk,cn=gssapi,cn=auth > uri "ldap://server1.inf.ed.ac.uk/,ldap:// > server2.inf.ed.ac.uk" > idassert-bind mode=none > bindmethod=sasl > saslmech=GSSAPI > idassert-authzFrom "dn:*" > > overlay pcache > > proxycache bdb 5000 4 500 300 > proxycachequeries 10000 > > proxyattrset 0 cn userPassword memberUid uniqueMember > gidNumber > proxyattrset 1 uid userPassword uidNumber gidNumber cn > homeDirectory > loginShell gecos description objectClass afsHomeDirectory primaryRoles > secondary > Roles > proxyattrset 2 cn nisNetgroupTriple memberNisNetgroup > proxyattrset 3 "*" > > proxytemplate (uid=) 1 1800 1800 > proxytemplate (&(objectClass=)(uid=)) 1 1800 1800 > proxytemplate (&(objectClass=)(cn=)) 0 1800 1800 > proxytemplate (&(objectClass=)(cn=)) 2 1800 1800 > proxytemplate (&(objectClass=)(|(memberUid=) > (uniqueMember=))) 0 1800 1800 > proxytemplate (&(objectClass=)(uniqueMember=)) 0 1800 1800 > proxytemplate (&(objectClass=)(memberUid=)) 0 1800 1800 > proxytemplate (&(objectClass=)(uidNumber=)) 1 1800 1800 > proxytemplate (&(objectClass=)(gidNumber=)) 0 1800 1800 > proxytemplate (&(objectClass=)(amdmapName=)(amdmapKey=)) 3 > 1800 1800 > proxytemplate (&(objectClass=)(uid=)) 3 1800 1800 > > directory /var/openldap-data > > checkpoint 128 60 > cachesize 10000 > idlcachesize 10000 > > dbconfig set_cachesize 0 16777216 1 > dbconfig set_lg_regionmax 262144 > dbconfig set_lg_bsize 2097152 > > > If there's anything else that would be useful, let me know. As I > mentioned, I have a number of other core files, including some that > look the same as this one. I did my best to reproduce your setup, but I got no crashes. One thing I'd like to ask is: I see some non-standard attributes in the pcache configuration. Can you tell whether there might occur some schema inconsistencies between the remote server and the proxy? Also, can you tell how often crashes occur, or any detail about the type of operation, the workload and so? I also disabled slab allocation, so that true mallocs occur all times, and valgrind is not showing any memory issue. p. Ing. Pierangelo Masarati OpenLDAP Core Team SysNet s.r.l. via Dossi, 8 - 27100 Pavia - ITALIA http://www.sys-net.it ----------------------------------- Office: +39 02 23998309 Mobile: +39 333 4963172 Fax: +39 0382 476497 Email: ando@sys-net.it -----------------------------------
> I did my best to reproduce your setup, but I got no crashes. One > thing I'd like to ask is: I see some non-standard attributes in the > pcache configuration. Can you tell whether there might occur some > schema inconsistencies between the remote server and the proxy? No, they're the same - the schema files are distributed via RPM and configured in slapd.conf with the same bit of configuration code. > Also, can you tell how often crashes occur, or any detail about the > type of operation, the workload and so? The machines I'm currently testing with are all in a student lab, running condor jobs when quiet, so a bit difficult to predict - I haven't seen any discernible pattern so far - I've seen crashes on busy machines and quiet ones. For the core file we're looking at here, this machine had answered 9 queries in the second prior to the crash, but had been quiet for 3 minutes prior to that after it had removed stale entries from the cache. I can attach a portion of the slapd log if it would be useful. As for how often the crashes occur, these 24 lab machines have been running 2.4.15 built with no optimisation since 27th Feb and have produced 18 core files in that time. Of these, approximately half have been assertion failures for free(p) in memory.c:152. I've seen 3 cases of segfaults in pcache_filter_cmp, as mentioned in ITS#5756, but I'll submit non-optimised backtraces to that ticket, rather than confuse the issue here. > I also disabled slab allocation, so that true mallocs occur all > times, and valgrind is not showing any memory issue. The difficulty is in trying to reproduce this - my own desktop machine is running with the same setup and has never crashed, despite my best efforts. Cheers Toby -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
Hi there, I've been collecting a bit more information on these crashes... Since we started running 2.4.15: 26 core files, of these, 15 have crashed in memory.c:152 It looks like the majority of these were removing stale entries from the cache when slapd crashed, which wasn't the case with the original ticket I submitted. Also, the last thing logged by slapd in quite a few of these cases was along this lines of ... Mar 30 20:50:16 albany slapd[14239]: DELETING ENTRY TEMPLATE=db78c810-0c2a-49de-96af-f5796bbe8ca3 Mar 30 20:50:17 albany last message repeated 14 times ... I don't know whether this is useful? Here's the details of the latest crash, slapd log and backtrace: Mar 30 20:50:16 albany slapd[14239]: Lock CR index = 0x83a5140 Mar 30 20:50:16 albany slapd[14239]: TEMPLATE 0x83a5140 QUERIES-- 0 Mar 30 20:50:16 albany slapd[14239]: Unlock CR index = 0x83a5140 Mar 30 20:50:16 albany slapd[14239]: DELETING ENTRY TEMPLATE=e7ccb20c-d263-4d6c-ba23-62afe55f0158 Mar 30 20:50:16 albany slapd[14239]: STALE QUERY REMOVED, SIZE=1 Mar 30 20:50:16 albany slapd[14239]: STORED QUERIES = 30 Mar 30 20:50:16 albany slapd[14239]: STALE QUERY REMOVED, CACHE =22 entries Mar 30 20:50:16 albany slapd[14239]: Lock CR index = 0x83a4e90 Mar 30 20:50:16 albany slapd[14239]: TEMPLATE 0x83a4e90 QUERIES-- 15 Mar 30 20:50:16 albany slapd[14239]: Unlock CR index = 0x83a4e90 Mar 30 20:50:16 albany slapd[14239]: STALE QUERY REMOVED, SIZE=0 Mar 30 20:50:16 albany slapd[14239]: STORED QUERIES = 29 Mar 30 20:50:16 albany slapd[14239]: STALE QUERY REMOVED, CACHE =22 entries Mar 30 20:50:16 albany slapd[14239]: Lock CR index = 0x83a4e90 Mar 30 20:50:16 albany slapd[14239]: TEMPLATE 0x83a4e90 QUERIES-- 14 Mar 30 20:50:16 albany slapd[14239]: Unlock CR index = 0x83a4e90 Mar 30 20:50:16 albany slapd[14239]: STALE QUERY REMOVED, SIZE=0 Mar 30 20:50:16 albany slapd[14239]: STORED QUERIES = 28 Mar 30 20:50:16 albany slapd[14239]: STALE QUERY REMOVED, CACHE =22 entries Mar 30 20:50:16 albany slapd[14239]: Lock CR index = 0x83a4e90 Mar 30 20:50:16 albany slapd[14239]: TEMPLATE 0x83a4e90 QUERIES-- 13 Mar 30 20:50:16 albany slapd[14239]: Unlock CR index = 0x83a4e90 Mar 30 20:50:16 albany slapd[14239]: STALE QUERY REMOVED, SIZE=0 Mar 30 20:50:16 albany slapd[14239]: STORED QUERIES = 27 Mar 30 20:50:16 albany slapd[14239]: STALE QUERY REMOVED, CACHE =22 entries Mar 30 20:50:16 albany slapd[14239]: Lock CR index = 0x83a4e90 Mar 30 20:50:16 albany slapd[14239]: TEMPLATE 0x83a4e90 QUERIES-- 12 Mar 30 20:50:16 albany slapd[14239]: Unlock CR index = 0x83a4e90 Mar 30 20:50:16 albany slapd[14239]: STALE QUERY REMOVED, SIZE=0 Mar 30 20:50:16 albany slapd[14239]: STORED QUERIES = 26 Mar 30 20:50:16 albany slapd[14239]: STALE QUERY REMOVED, CACHE =22 entries Mar 30 20:50:16 albany slapd[14239]: Lock CR index = 0x83a4e90 Mar 30 20:50:16 albany slapd[14239]: TEMPLATE 0x83a4e90 QUERIES-- 11 Mar 30 20:50:16 albany slapd[14239]: Unlock CR index = 0x83a4e90 Mar 30 20:50:16 albany slapd[14239]: STALE QUERY REMOVED, SIZE=0 Mar 30 20:50:16 albany slapd[14239]: STORED QUERIES = 25 Mar 30 20:50:16 albany slapd[14239]: STALE QUERY REMOVED, CACHE =22 entries Mar 30 20:50:16 albany slapd[14239]: Lock CR index = 0x83a4e90 Mar 30 20:50:16 albany slapd[14239]: TEMPLATE 0x83a4e90 QUERIES-- 10 Mar 30 20:50:16 albany slapd[14239]: Unlock CR index = 0x83a4e90 Mar 30 20:50:16 albany slapd[14239]: STALE QUERY REMOVED, SIZE=0 Mar 30 20:50:16 albany slapd[14239]: STORED QUERIES = 24 Mar 30 20:50:16 albany slapd[14239]: STALE QUERY REMOVED, CACHE =22 entries Mar 30 20:50:16 albany slapd[14239]: Lock CR index = 0x83a4e90 Mar 30 20:50:16 albany slapd[14239]: TEMPLATE 0x83a4e90 QUERIES-- 9 Mar 30 20:50:16 albany slapd[14239]: Unlock CR index = 0x83a4e90 Mar 30 20:50:16 albany slapd[14239]: STALE QUERY REMOVED, SIZE=0 Mar 30 20:50:16 albany slapd[14239]: STORED QUERIES = 23 Mar 30 20:50:16 albany slapd[14239]: STALE QUERY REMOVED, CACHE =22 entries Mar 30 20:50:16 albany slapd[14239]: Lock CR index = 0x83a4e90 Mar 30 20:50:16 albany slapd[14239]: TEMPLATE 0x83a4e90 QUERIES-- 8 Mar 30 20:50:16 albany slapd[14239]: Unlock CR index = 0x83a4e90 Mar 30 20:50:16 albany slapd[14239]: STALE QUERY REMOVED, SIZE=0 Mar 30 20:50:16 albany slapd[14239]: STORED QUERIES = 22 Mar 30 20:50:16 albany slapd[14239]: STALE QUERY REMOVED, CACHE =22 entries Mar 30 20:50:16 albany slapd[14239]: Lock CR index = 0x83a4e90 Mar 30 20:50:16 albany slapd[14239]: TEMPLATE 0x83a4e90 QUERIES-- 7 Mar 30 20:50:16 albany slapd[14239]: Unlock CR index = 0x83a4e90 Mar 30 20:50:16 albany slapd[14239]: STALE QUERY REMOVED, SIZE=0 Mar 30 20:50:16 albany slapd[14239]: STORED QUERIES = 21 Mar 30 20:50:16 albany slapd[14239]: STALE QUERY REMOVED, CACHE =22 entries Mar 30 20:50:16 albany slapd[14239]: Lock CR index = 0x83a4e90 Mar 30 20:50:16 albany slapd[14239]: TEMPLATE 0x83a4e90 QUERIES-- 6 Mar 30 20:50:16 albany slapd[14239]: Unlock CR index = 0x83a4e90 Mar 30 20:50:16 albany slapd[14239]: STALE QUERY REMOVED, SIZE=0 Mar 30 20:50:16 albany slapd[14239]: STORED QUERIES = 20 Mar 30 20:50:16 albany slapd[14239]: STALE QUERY REMOVED, CACHE =22 entries Mar 30 20:50:16 albany slapd[14239]: Lock CR index = 0x83a4e90 Mar 30 20:50:16 albany slapd[14239]: TEMPLATE 0x83a4e90 QUERIES-- 5 Mar 30 20:50:16 albany slapd[14239]: Unlock CR index = 0x83a4e90 Mar 30 20:50:16 albany slapd[14239]: STALE QUERY REMOVED, SIZE=0 Mar 30 20:50:16 albany slapd[14239]: STORED QUERIES = 19 Mar 30 20:50:16 albany slapd[14239]: STALE QUERY REMOVED, CACHE =22 entries Mar 30 20:50:16 albany slapd[14239]: Lock CR index = 0x83a4e90 Mar 30 20:50:16 albany slapd[14239]: TEMPLATE 0x83a4e90 QUERIES-- 4 Mar 30 20:50:16 albany slapd[14239]: Unlock CR index = 0x83a4e90 Mar 30 20:50:16 albany slapd[14239]: STALE QUERY REMOVED, SIZE=0 Mar 30 20:50:16 albany slapd[14239]: STORED QUERIES = 18 Mar 30 20:50:16 albany slapd[14239]: STALE QUERY REMOVED, CACHE =22 entries Mar 30 20:50:16 albany slapd[14239]: Lock CR index = 0x83a4e90 Mar 30 20:50:16 albany slapd[14239]: TEMPLATE 0x83a4e90 QUERIES-- 3 Mar 30 20:50:16 albany slapd[14239]: Unlock CR index = 0x83a4e90 Mar 30 20:50:16 albany slapd[14239]: STALE QUERY REMOVED, SIZE=0 Mar 30 20:50:16 albany slapd[14239]: STORED QUERIES = 17 Mar 30 20:50:16 albany slapd[14239]: STALE QUERY REMOVED, CACHE =22 entries Mar 30 20:50:16 albany slapd[14239]: Lock CR index = 0x83a4e90 Mar 30 20:50:16 albany slapd[14239]: TEMPLATE 0x83a4e90 QUERIES-- 2 Mar 30 20:50:16 albany slapd[14239]: Unlock CR index = 0x83a4e90 Mar 30 20:50:16 albany slapd[14239]: STALE QUERY REMOVED, SIZE=0 Mar 30 20:50:16 albany slapd[14239]: STORED QUERIES = 16 Mar 30 20:50:16 albany slapd[14239]: STALE QUERY REMOVED, CACHE =22 entries Mar 30 20:50:16 albany slapd[14239]: Lock CR index = 0x83a4e90 Mar 30 20:50:16 albany slapd[14239]: TEMPLATE 0x83a4e90 QUERIES-- 1 Mar 30 20:50:16 albany slapd[14239]: Unlock CR index = 0x83a4e90 Mar 30 20:50:16 albany slapd[14239]: STALE QUERY REMOVED, SIZE=0 Mar 30 20:50:16 albany slapd[14239]: STORED QUERIES = 15 Mar 30 20:50:16 albany slapd[14239]: STALE QUERY REMOVED, CACHE =22 entries Mar 30 20:50:16 albany slapd[14239]: Lock CR index = 0x83a4e90 Mar 30 20:50:16 albany slapd[14239]: TEMPLATE 0x83a4e90 QUERIES-- 0 Mar 30 20:50:16 albany slapd[14239]: Unlock CR index = 0x83a4e90 Mar 30 20:50:16 albany slapd[14239]: STALE QUERY REMOVED, SIZE=0 Mar 30 20:50:16 albany slapd[14239]: STORED QUERIES = 14 Mar 30 20:50:16 albany slapd[14239]: STALE QUERY REMOVED, CACHE =22 entries Mar 30 20:50:16 albany slapd[14239]: Lock CR index = 0x83a4e28 Mar 30 20:50:16 albany slapd[14239]: TEMPLATE 0x83a4e28 QUERIES-- 0 Mar 30 20:50:16 albany slapd[14239]: Unlock CR index = 0x83a4e28 Mar 30 20:50:16 albany slapd[14239]: DELETING ENTRY TEMPLATE=db78c810-0c2a-49de-96af-f5796bbe8ca3 Mar 30 20:50:17 albany last message repeated 14 times Program terminated with signal 6, Aborted. #0 0x00185402 in __kernel_vsyscall () (gdb) bt #0 0x00185402 in __kernel_vsyscall () #1 0x00476d20 in raise () from /lib/libc.so.6 #2 0x00478631 in abort () from /lib/libc.so.6 #3 0x004aee6b in __libc_message () from /lib/libc.so.6 #4 0x004b6b16 in _int_free () from /lib/libc.so.6 #5 0x004ba070 in free () from /lib/libc.so.6 #6 0x081d5746 in ber_memfree_x (p=0x841e688, ctx=0x0) at memory.c:152 #7 0x081d630c in ber_bvarray_free_x (a=0x843af58, ctx=0x0) at memory.c:731 #8 0x081d6343 in ber_bvarray_free (a=0x843af58) at memory.c:741 #9 0x0807979c in attr_clean (a=0xb62934ec) at attr.c:146 #10 0x0807983b in attrs_free (a=0xb62934ec) at attr.c:196 #11 0x0807c059 in entry_clean (e=0xb6727584) at entry.c:504 #12 0x0807c080 in entry_free (e=0xb6727584) at entry.c:514 #13 0x08140030 in bdb_entry_return (e=0xb6727584) at id2entry.c:229 #14 0x08134c0c in bdb_cache_delete_cleanup (cache=0x838956c, ei=0xb332a060) at cache.c:1316 #15 0x0813a824 in bdb_delete (op=0xb32fed5c, rs=0xb32febec) at delete.c:575 #16 0x08177be1 in remove_query_data (op=0xb32fed5c, rs=0xb32fecf8, query_uuid=0x83ff288) at pcache.c:1460 #17 0x0817af8f in consistency_check (ctx=0xb32ff1d0, arg=0x83f57e8) at pcache.c:2611 #18 0x0819ecad in ldap_int_thread_pool_wrapper (xpool=0x8366f90) at tpool.c:663 #19 0x005c746b in start_thread () from /lib/libpthread.so.0 #20 0x0051edbe in clone () from /lib/libc.so.6 (gdb) Also, I'll include details of another type of crash that I've seen 3 times since running 2.4.15 in the lab, at the risk of overloading information into this ticket. I'm including it here as the slapd behaviour at time of crash was similar to that noted above, i.e. removing stale entries from the cache: Mar 13 04:37:48 pemberton slapd[6767]: Lock CR index = 0x9ac9188 Mar 13 04:37:48 pemberton slapd[6767]: TEMPLATE 0x9ac9188 QUERIES-- 2 Mar 13 04:37:48 pemberton slapd[6767]: Unlock CR index = 0x9ac9188 Mar 13 04:37:48 pemberton slapd[6767]: DELETING ENTRY TEMPLATE=febcaede-1a2f-42ef-af2e-9206f0a4f678 Mar 13 04:37:48 pemberton slapd[6767]: STALE QUERY REMOVED, SIZE=1 Mar 13 04:37:48 pemberton slapd[6767]: STORED QUERIES = 32 Mar 13 04:37:48 pemberton slapd[6767]: STALE QUERY REMOVED, CACHE =19 entries Mar 13 04:37:48 pemberton slapd[6767]: Lock CR index = 0x9ac8eb0 Mar 13 04:37:48 pemberton slapd[6767]: TEMPLATE 0x9ac8eb0 QUERIES-- 15 Mar 13 04:37:48 pemberton slapd[6767]: Unlock CR index = 0x9ac8eb0 Mar 13 04:37:48 pemberton slapd[6767]: STALE QUERY REMOVED, SIZE=0 Mar 13 04:37:48 pemberton slapd[6767]: STORED QUERIES = 31 Mar 13 04:37:48 pemberton slapd[6767]: STALE QUERY REMOVED, CACHE =19 entries Mar 13 04:37:48 pemberton slapd[6767]: Lock CR index = 0x9ac8eb0 Mar 13 04:37:48 pemberton slapd[6767]: TEMPLATE 0x9ac8eb0 QUERIES-- 14 Mar 13 04:37:48 pemberton slapd[6767]: Unlock CR index = 0x9ac8eb0 Mar 13 04:37:48 pemberton slapd[6767]: STALE QUERY REMOVED, SIZE=0 Mar 13 04:37:48 pemberton slapd[6767]: STORED QUERIES = 30 Mar 13 04:37:48 pemberton slapd[6767]: STALE QUERY REMOVED, CACHE =19 entries Mar 13 04:37:48 pemberton slapd[6767]: Lock CR index = 0x9ac8eb0 Mar 13 04:37:48 pemberton slapd[6767]: TEMPLATE 0x9ac8eb0 QUERIES-- 13 Mar 13 04:37:48 pemberton slapd[6767]: Unlock CR index = 0x9ac8eb0 Mar 13 04:37:48 pemberton slapd[6767]: STALE QUERY REMOVED, SIZE=0 Mar 13 04:37:48 pemberton slapd[6767]: STORED QUERIES = 29 Mar 13 04:37:48 pemberton slapd[6767]: STALE QUERY REMOVED, CACHE =19 entries Mar 13 04:37:48 pemberton slapd[6767]: Lock CR index = 0x9ac8eb0 Mar 13 04:37:48 pemberton slapd[6767]: TEMPLATE 0x9ac8eb0 QUERIES-- 12 Mar 13 04:37:48 pemberton slapd[6767]: Unlock CR index = 0x9ac8eb0 Mar 13 04:37:48 pemberton slapd[6767]: STALE QUERY REMOVED, SIZE=0 Mar 13 04:37:48 pemberton slapd[6767]: STORED QUERIES = 28 Mar 13 04:37:48 pemberton slapd[6767]: STALE QUERY REMOVED, CACHE =19 entries Mar 13 04:37:48 pemberton slapd[6767]: Lock CR index = 0x9ac8eb0 Mar 13 04:37:48 pemberton slapd[6767]: TEMPLATE 0x9ac8eb0 QUERIES-- 11 Mar 13 04:37:48 pemberton slapd[6767]: Unlock CR index = 0x9ac8eb0 Mar 13 04:37:48 pemberton slapd[6767]: STALE QUERY REMOVED, SIZE=0 Mar 13 04:37:48 pemberton slapd[6767]: STORED QUERIES = 27 Mar 13 04:37:48 pemberton slapd[6767]: STALE QUERY REMOVED, CACHE =19 entries Mar 13 04:37:48 pemberton slapd[6767]: Lock CR index = 0x9ac8eb0 Mar 13 04:37:48 pemberton slapd[6767]: TEMPLATE 0x9ac8eb0 QUERIES-- 10 Mar 13 04:37:48 pemberton slapd[6767]: Unlock CR index = 0x9ac8eb0 Mar 13 04:37:48 pemberton slapd[6767]: STALE QUERY REMOVED, SIZE=0 Mar 13 04:37:48 pemberton slapd[6767]: STORED QUERIES = 26 Mar 13 04:37:48 pemberton slapd[6767]: STALE QUERY REMOVED, CACHE =19 entries Mar 13 04:37:48 pemberton slapd[6767]: Lock CR index = 0x9ac8e48 Mar 13 04:37:48 pemberton slapd[6767]: TEMPLATE 0x9ac8e48 QUERIES-- 1 Mar 13 04:37:48 pemberton slapd[6767]: Unlock CR index = 0x9ac8e48 Mar 13 04:37:48 pemberton slapd[6767]: DELETING ENTRY TEMPLATE=3eff8b71-4ca6-4918-b760-c32023dbe215 Mar 13 04:37:48 pemberton last message repeated 12 times Mar 13 04:37:48 pemberton slapd[6767]: REMOVING TEMP ATTR : TEMPLATE=3eff8b71-4ca6-4918-b760-c32023dbe215 Mar 13 04:37:48 pemberton last message repeated 2 times Mar 13 04:37:48 pemberton slapd[6767]: STALE QUERY REMOVED, SIZE=13 Mar 13 04:37:48 pemberton slapd[6767]: STORED QUERIES = 25 Mar 13 04:37:48 pemberton slapd[6767]: STALE QUERY REMOVED, CACHE =6 entries Mar 13 04:37:48 pemberton slapd[6767]: Lock CR index = 0x9ac8da8 Program terminated with signal 11, Segmentation fault. #0 0x081772d2 in remove_from_template (qc=0xaef209a8, template=0x9ac8da8) at pcache.c:1297 1297 qc->prev->next = qc->next; (gdb) bt #0 0x081772d2 in remove_from_template (qc=0xaef209a8, template=0x9ac8da8) at pcache.c:1297 #1 0x0817adc8 in consistency_check (ctx=0xaeeff1d0, arg=0x9b19860) at pcache.c:2594 #2 0x0819ecad in ldap_int_thread_pool_wrapper (xpool=0x9a8afa0) at tpool.c:663 #3 0x00bb046b in start_thread () from /lib/libpthread.so.0 #4 0x00b07dbe in clone () from /lib/libc.so.6 (gdb) (gdb) p qc->prev->next Cannot access memory at address 0x54 (gdb) p qc->prev $2 = (struct cached_query_s *) 0x30 (gdb) p *qc->prev Cannot access memory at address 0x30 (gdb) As always, if there's any more information that would be useful, let me know... Cheers Toby -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
toby@inf.ed.ac.uk wrote: > Hi there, I've been collecting a bit more information on these > crashes... > > Since we started running 2.4.15: > > 26 core files, of these, 15 have crashed in memory.c:152 > > It looks like the majority of these were removing stale entries from > the cache when slapd crashed, which wasn't the case with the original > ticket I submitted. Also, the last thing logged by slapd in quite a > few of these cases was along this lines of ... > > Mar 30 20:50:16 albany slapd[14239]: DELETING ENTRY TEMPLATE=db78c810-0c2a-49de-96af-f5796bbe8ca3 > Mar 30 20:50:17 albany last message repeated 14 times > > ... I don't know whether this is useful? > > Here's the details of the latest crash, slapd log and backtrace: > Program terminated with signal 6, Aborted. > #0 0x00185402 in __kernel_vsyscall () > (gdb) bt When a program aborts it will print an error message on stderr. It would be useful to have that message. Can you please update to RE24, the 2.4.16 candidate, and see if the behavior has changed? Also, can you please be sure to compile without optimization (or with frame pointers intact; gcc -fno-omit-frame-pointer); there seem to be a few functions missing in these stack traces. Also grab the latest overlays/pcache.c (1.168). It probably won't fix things, but it should make the crashes a little more obvious. > #0 0x00185402 in __kernel_vsyscall () > #1 0x00476d20 in raise () from /lib/libc.so.6 > #2 0x00478631 in abort () from /lib/libc.so.6 > #3 0x004aee6b in __libc_message () from /lib/libc.so.6 > #4 0x004b6b16 in _int_free () from /lib/libc.so.6 > #5 0x004ba070 in free () from /lib/libc.so.6 > #6 0x081d5746 in ber_memfree_x (p=0x841e688, ctx=0x0) at memory.c:152 > #7 0x081d630c in ber_bvarray_free_x (a=0x843af58, ctx=0x0) at memory.c:731 > #8 0x081d6343 in ber_bvarray_free (a=0x843af58) at memory.c:741 > #9 0x0807979c in attr_clean (a=0xb62934ec) at attr.c:146 > #10 0x0807983b in attrs_free (a=0xb62934ec) at attr.c:196 > #11 0x0807c059 in entry_clean (e=0xb6727584) at entry.c:504 > #12 0x0807c080 in entry_free (e=0xb6727584) at entry.c:514 > #13 0x08140030 in bdb_entry_return (e=0xb6727584) at id2entry.c:229 > #14 0x08134c0c in bdb_cache_delete_cleanup (cache=0x838956c, ei=0xb332a060) > at cache.c:1316 > #15 0x0813a824 in bdb_delete (op=0xb32fed5c, rs=0xb32febec) at delete.c:575 > #16 0x08177be1 in remove_query_data (op=0xb32fed5c, rs=0xb32fecf8, > query_uuid=0x83ff288) at pcache.c:1460 > #17 0x0817af8f in consistency_check (ctx=0xb32ff1d0, arg=0x83f57e8) > at pcache.c:2611 > #18 0x0819ecad in ldap_int_thread_pool_wrapper (xpool=0x8366f90) at tpool.c:663 > #19 0x005c746b in start_thread () from /lib/libpthread.so.0 > #20 0x0051edbe in clone () from /lib/libc.so.6 > (gdb) > > > Also, I'll include details of another type of crash that I've seen 3 > times since running 2.4.15 in the lab, at the risk of overloading > information into this ticket. I'm including it here as the slapd > behaviour at time of crash was similar to that noted above, > i.e. removing stale entries from the cache: Yeah, based on the debug output it's probably related. -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
> When a program aborts it will print an error message on stderr. It > would be useful to have that message. Long since disappeared for this case, I'm afraid. I'll try and get it in the future. > Can you please update to RE24, the 2.4.16 candidate, and see if the > behavior has changed? Also, can you please be sure to compile > without optimization (or with frame pointers intact; gcc > -fno-omit-frame-pointer); there seem to be a few functions missing > in these stack traces. > > Also grab the latest overlays/pcache.c (1.168). It probably won't > fix things, but it should make the crashes a little more obvious. OK, I've built 2.4.16 (which I see has the new pcache.c) with no optimisation (-O0 and -fno-omit-frame-pointer) and will put in place in our test lab. Cheers Toby -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
OK, this is my first crash since moving to 2.4.16: Slightly different from the previous ones, but is coming through pcache.c:remove_from_template and, according to slapd log, was removing queries from cache when crash occurred: Apr 9 05:05:18 rockingham slapd[6662]: Lock CR index = 0x94e70f0 Apr 9 05:05:18 rockingham slapd[6662]: TEMPLATE 0x94e70f0 QUERIES-- 5 Apr 9 05:05:18 rockingham slapd[6662]: Unlock CR index = 0x94e70f0 Apr 9 05:05:18 rockingham slapd[6662]: STALE QUERY REMOVED, SIZE=0 Apr 9 05:05:18 rockingham slapd[6662]: STORED QUERIES = 26 Apr 9 05:05:18 rockingham slapd[6662]: STALE QUERY REMOVED, CACHE =17 entries Apr 9 05:05:18 rockingham slapd[6662]: Lock CR index = 0x94e70f0 Apr 9 05:05:18 rockingham slapd[6662]: TEMPLATE 0x94e70f0 QUERIES-- 4 Apr 9 05:05:18 rockingham slapd[6662]: Unlock CR index = 0x94e70f0 Apr 9 05:05:18 rockingham slapd[6662]: STALE QUERY REMOVED, SIZE=0 Apr 9 05:05:18 rockingham slapd[6662]: STORED QUERIES = 25 Apr 9 05:05:18 rockingham slapd[6662]: STALE QUERY REMOVED, CACHE =17 entries Apr 9 05:05:18 rockingham slapd[6662]: Lock CR index = 0x94e70f0 Apr 9 05:05:18 rockingham slapd[6662]: TEMPLATE 0x94e70f0 QUERIES-- 3 Apr 9 05:05:18 rockingham slapd[6662]: Unlock CR index = 0x94e70f0 Apr 9 05:05:18 rockingham slapd[6662]: STALE QUERY REMOVED, SIZE=0 Apr 9 05:05:18 rockingham slapd[6662]: STORED QUERIES = 24 Apr 9 05:05:18 rockingham slapd[6662]: STALE QUERY REMOVED, CACHE =17 entries Apr 9 05:05:18 rockingham slapd[6662]: Lock CR index = 0x94e70f0 Apr 9 05:05:18 rockingham slapd[6662]: TEMPLATE 0x94e70f0 QUERIES-- 2 Apr 9 05:05:18 rockingham slapd[6662]: Unlock CR index = 0x94e70f0 Apr 9 05:05:18 rockingham slapd[6662]: STALE QUERY REMOVED, SIZE=0 Apr 9 05:05:18 rockingham slapd[6662]: STORED QUERIES = 23 Apr 9 05:05:18 rockingham slapd[6662]: STALE QUERY REMOVED, CACHE =17 entries Apr 9 05:05:18 rockingham slapd[6662]: Lock CR index = 0x94e70f0 Apr 9 05:05:18 rockingham slapd[6662]: TEMPLATE 0x94e70f0 QUERIES-- 1 Apr 9 05:05:18 rockingham slapd[6662]: Unlock CR index = 0x94e70f0 Apr 9 05:05:18 rockingham slapd[6662]: STALE QUERY REMOVED, SIZE=0 Apr 9 05:05:18 rockingham slapd[6662]: STORED QUERIES = 22 Apr 9 05:05:18 rockingham slapd[6662]: STALE QUERY REMOVED, CACHE =17 entries Apr 9 05:05:18 rockingham slapd[6662]: Lock CR index = 0x94e70f0 Apr 9 05:05:18 rockingham slapd[6662]: TEMPLATE 0x94e70f0 QUERIES-- 0 Apr 9 05:05:18 rockingham slapd[6662]: Unlock CR index = 0x94e70f0 Apr 9 05:05:18 rockingham slapd[6662]: STALE QUERY REMOVED, SIZE=0 Apr 9 05:05:18 rockingham slapd[6662]: STORED QUERIES = 21 Apr 9 05:05:18 rockingham slapd[6662]: STALE QUERY REMOVED, CACHE =17 entries Apr 9 05:05:18 rockingham slapd[6662]: Lock CR index = 0x94e6eb0 Here's the backtrace: Program terminated with signal 11, Segmentation fault. #0 0x081764ed in pcache_query_cmp (v1=0xb1f0e900, v2=0x83e58955) at pcache.c:694 694 return pcache_filter_cmp( q1->first, q2->first ); (gdb) bt #0 0x081764ed in pcache_query_cmp (v1=0xb1f0e900, v2=0x83e58955) at pcache.c:694 #1 0x08199b6d in tavl_delete (root=0x825d7d0, data=0xb1f0e900, fcmp=0x81764d8 <pcache_query_cmp>) at tavl.c:202 #2 0x08177ab4 in remove_from_template (qc=0xb1f0e900, template=0x94e6eb0) at pcache.c:1302 #3 0x0817b57d in consistency_check (ctx=0xb44201d0, arg=0x9537c50) at pcache.c:2597 #4 0x0819fd01 in ldap_int_thread_pool_wrapper (xpool=0x94a8fa0) at tpool.c:663 #5 0x009c546b in start_thread () from /lib/libpthread.so.0 #6 0x0091cdbe in clone () from /lib/libc.so.6 (gdb) (gdb) p q1->first $3 = (Filter *) 0xb1f0d4c0 (gdb) p q2->first Cannot access memory at address 0x83e58959 Let me know what else would be useful from this backtrace. Cheers Toby -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
toby@inf.ed.ac.uk wrote: > OK, this is my first crash since moving to 2.4.16: > > Slightly different from the previous ones, but is coming through > pcache.c:remove_from_template and, according to slapd log, was > removing queries from cache when crash occurred: > > Apr 9 05:05:18 rockingham slapd[6662]: Lock CR index = 0x94e70f0 > Apr 9 05:05:18 rockingham slapd[6662]: TEMPLATE 0x94e70f0 QUERIES-- 5 > Apr 9 05:05:18 rockingham slapd[6662]: Unlock CR index = 0x94e70f0 > Apr 9 05:05:18 rockingham slapd[6662]: STALE QUERY REMOVED, SIZE=0 > Apr 9 05:05:18 rockingham slapd[6662]: STORED QUERIES = 26 > Apr 9 05:05:18 rockingham slapd[6662]: STALE QUERY REMOVED, CACHE =17 > entries > Apr 9 05:05:18 rockingham slapd[6662]: Lock CR index = 0x94e70f0 > Apr 9 05:05:18 rockingham slapd[6662]: TEMPLATE 0x94e70f0 QUERIES-- 4 > Apr 9 05:05:18 rockingham slapd[6662]: Unlock CR index = 0x94e70f0 > Apr 9 05:05:18 rockingham slapd[6662]: STALE QUERY REMOVED, SIZE=0 > Apr 9 05:05:18 rockingham slapd[6662]: STORED QUERIES = 25 > Apr 9 05:05:18 rockingham slapd[6662]: STALE QUERY REMOVED, CACHE =17 > entries > Apr 9 05:05:18 rockingham slapd[6662]: Lock CR index = 0x94e70f0 > Apr 9 05:05:18 rockingham slapd[6662]: TEMPLATE 0x94e70f0 QUERIES-- 3 > Apr 9 05:05:18 rockingham slapd[6662]: Unlock CR index = 0x94e70f0 > Apr 9 05:05:18 rockingham slapd[6662]: STALE QUERY REMOVED, SIZE=0 > Apr 9 05:05:18 rockingham slapd[6662]: STORED QUERIES = 24 > Apr 9 05:05:18 rockingham slapd[6662]: STALE QUERY REMOVED, CACHE =17 > entries > Apr 9 05:05:18 rockingham slapd[6662]: Lock CR index = 0x94e70f0 > Apr 9 05:05:18 rockingham slapd[6662]: TEMPLATE 0x94e70f0 QUERIES-- 2 > Apr 9 05:05:18 rockingham slapd[6662]: Unlock CR index = 0x94e70f0 > Apr 9 05:05:18 rockingham slapd[6662]: STALE QUERY REMOVED, SIZE=0 > Apr 9 05:05:18 rockingham slapd[6662]: STORED QUERIES = 23 > Apr 9 05:05:18 rockingham slapd[6662]: STALE QUERY REMOVED, CACHE =17 > entries > Apr 9 05:05:18 rockingham slapd[6662]: Lock CR index = 0x94e70f0 > Apr 9 05:05:18 rockingham slapd[6662]: TEMPLATE 0x94e70f0 QUERIES-- 1 > Apr 9 05:05:18 rockingham slapd[6662]: Unlock CR index = 0x94e70f0 > Apr 9 05:05:18 rockingham slapd[6662]: STALE QUERY REMOVED, SIZE=0 > Apr 9 05:05:18 rockingham slapd[6662]: STORED QUERIES = 22 > Apr 9 05:05:18 rockingham slapd[6662]: STALE QUERY REMOVED, CACHE =17 > entries > Apr 9 05:05:18 rockingham slapd[6662]: Lock CR index = 0x94e70f0 > Apr 9 05:05:18 rockingham slapd[6662]: TEMPLATE 0x94e70f0 QUERIES-- 0 > Apr 9 05:05:18 rockingham slapd[6662]: Unlock CR index = 0x94e70f0 > Apr 9 05:05:18 rockingham slapd[6662]: STALE QUERY REMOVED, SIZE=0 > Apr 9 05:05:18 rockingham slapd[6662]: STORED QUERIES = 21 > Apr 9 05:05:18 rockingham slapd[6662]: STALE QUERY REMOVED, CACHE =17 > entries > Apr 9 05:05:18 rockingham slapd[6662]: Lock CR index = 0x94e6eb0 > > > Here's the backtrace: > > Program terminated with signal 11, Segmentation fault. > #0 0x081764ed in pcache_query_cmp (v1=0xb1f0e900, v2=0x83e58955) > at pcache.c:694 > 694 return pcache_filter_cmp( q1->first, q2->first ); > (gdb) bt > #0 0x081764ed in pcache_query_cmp (v1=0xb1f0e900, v2=0x83e58955) > at pcache.c:694 > #1 0x08199b6d in tavl_delete (root=0x825d7d0, data=0xb1f0e900, > fcmp=0x81764d8<pcache_query_cmp>) at tavl.c:202 > #2 0x08177ab4 in remove_from_template (qc=0xb1f0e900, > template=0x94e6eb0) > at pcache.c:1302 > #3 0x0817b57d in consistency_check (ctx=0xb44201d0, arg=0x9537c50) > at pcache.c:2597 > #4 0x0819fd01 in ldap_int_thread_pool_wrapper (xpool=0x94a8fa0) at > tpool.c:663 > #5 0x009c546b in start_thread () from /lib/libpthread.so.0 > #6 0x0091cdbe in clone () from /lib/libc.so.6 > (gdb) > (gdb) p q1->first > $3 = (Filter *) 0xb1f0d4c0 > (gdb) p q2->first > Cannot access memory at address 0x83e58959 > > Let me know what else would be useful from this backtrace. Nothing obvious is jumping out here. Can you run a few of these slapds with libefence or valgrind? -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
On 10 Apr 2009, at 02:02, Howard Chu wrote: > Nothing obvious is jumping out here. Can you run a few of these > slapds with libefence or valgrind? I'll see what I can do. >> (gdb) p q1->first >> $3 = (Filter *) 0xb1f0d4c0 >> (gdb) p q2->first >> Cannot access memory at address 0x83e58959 > > print *q1 > > up > print *root (gdb) p *q1 $1 = {filter = 0xa, first = 0xb1f0d4c0, qbase = 0x825d7d0, scope = 0, q_uuid = {bv_len = 2985341416, bv_val = 0x1d <Address 0x1d out of bounds>}, q_sizelimit = 2147483647, qtemp = 0xb1f0d4c0, expiry_time = 136697880, next = 0xb1f10cd8, prev = 0xb1f031e0, lru_up = 0x1d, lru_down = 0x69647062, rwlock = {__data = {__lock = 1852386930, __nr_readers = 1684352614, __readers_wakeup = 778264878, __writer_wakeup = 27509, __nr_readers_queued = 21, __nr_writers_queued = 2985295944, __flags = 72 'H', __shared = 0 '\0', __pad1 = 240 '?', __pad2 = 177 '?', __writer = 0}, __size = "r.inf.ed.ac.uk\000\000\025\000\000\000H\000?H\000? \000\000\000", __align = 1852386930}} (gdb) up #1 0x08199b6d in tavl_delete (root=0x825d7d0, data=0xb1f0e900, fcmp=0x81764d8 <pcache_query_cmp>) at tavl.c:202 202 side = fcmp( data, p->avl_data ); (gdb) p *root $2 = (Avlnode *) 0x81d8ac4 (gdb) p **root $3 = {avl_data = 0x83e58955, avl_link = {0x7d8318ec, 0x24750008}, avl_bits = "?D", avl_bf = 36 '$'} (gdb) Cheers Toby -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
changed notes changed state Open to Closed
no feedback