Re: (ITS#3817) slapadd (glue?) dumps core under ridiculous situations
>
> Trying to add to a back-null database with the glue overlay, slapadd dumps
> core.
> I know that's utterly ridiculous, but at the same time, there's part of me
> that believes
> (a) software really shouldn't dump core ever; I'd expect more
> like "slapadd: database doesn't support necessary operations." and
"shouldn't" sounds reasonable. That's where users' help becomes invaluable:
I'd never test slapo-glue with back-null, nor add that check to the test
suite. I think the point is that the large number of backend types and
overlays is making a comprehensive test matrix simply too large to even be
considered.
> (b) I'm concerned that this might be a symptom of a more real bug.
It was: it appears that slapo-glue is not always testing the availability
of backend hooks before using them. This problem is now fixed in HEAD: do
cvs diff -u -r 1.19 -r 1.20 servers/slapd/overlays/glue.c
for a patch that should work for 2.3.4 as well. I'm auditing the code for
other potential occurrences.
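
To illustrate the kind of check involved, here is a minimal sketch. It is not
the actual change in glue.c: the struct, the hook name and the two sample
backends are hypothetical stand-ins, and the only assumption is that a
backend may leave an optional hook unset (NULL). The caller tests the hook
before using it and reports an error instead of jumping through a null
pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified table of backend hooks. The real slapd
 * structures are larger and use different names. */
typedef struct backend_hooks {
    const char *name;
    int (*tool_entry_put)(const char *dn);   /* optional: may be NULL */
} backend_hooks;

/* A sample hook for a backend that does support the operation. */
static int store_put(const char *dn)
{
    printf("stored %s\n", dn);
    return 0;
}

/* Two illustrative backends: one provides the hook, one does not. */
static const backend_hooks store_backend = { "store", store_put };
static const backend_hooks null_backend  = { "null",  NULL };

/* Call the hook only if the backend provides it.
 * Returns the hook's result, or -1 if the hook is missing. */
int checked_entry_put(const backend_hooks *be, const char *dn)
{
    assert(be != NULL);

    if (be->tool_entry_put == NULL) {
        fprintf(stderr,
                "%s: database doesn't support necessary operations.\n",
                be->name);
        return -1;
    }
    return be->tool_entry_put(dn);
}
```

With this shape, checked_entry_put(&null_backend, ...) returns -1 with a
diagnostic, while checked_entry_put(&store_backend, ...) calls the hook as
usual.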
>
>
> Quickly playing around with test011 -b null results in "slapadd: database
> doesn't support necessary operations." which is what I would expect.
Note that, on a more realistic note, I'm routinely running the test suite
with -b ldif, and roughly 70% of the tests pass. Most of those that do
not pass show a message like the above; a few still just dump core, and in
a few cases I also tracked down the cause, but I'm too busy (or too lazy?)
to fix them.
> However, my full-blown config file results in:
>
> slapadd: could not add entry dn="cn=facstaffView,dc=rutgers,dc=edu" (line=9):
> Assertion failed: info->al_slot > 0, file alock.c, line 207
> Abort (core dumped)
>
> with a backtrace of:
>
> #0 0xfed1f82c in _lwp_kill () from /usr/lib/libc.so.1
> #1 0xfecd0a24 in raise () from /usr/lib/libc.so.1
> #2 0xfecb6ce0 in abort () from /usr/lib/libc.so.1
> #3 0xfecb6f80 in _assert () from /usr/lib/libc.so.1
> #4 0x000fd7b8 in alock_read_slot (info=0x2b6ee0, slot_data=0xffbff584) at alock.c:207
> #5 0x000fe7e8 in alock_close (info=0x2b6ee0) at alock.c:512
> #6 0x00100df8 in bdb_db_close () at tools.c:314
> #7 0x0006fdb8 in backend_shutdown (be=0x2b6c90) at backend.c:360
> #8 0x001780bc in glue_close (bi=0x2b7528) at glue.c:539
> #9 0x0006fd64 in backend_shutdown (be=0x2b8230) at backend.c:351
> #10 0x0009f13c in slap_shutdown (be=0x2b8230) at init.c:203
> #11 0x000ea158 in slap_tool_destroy () at slapcommon.c:604
> #12 0x000e8380 in slapadd (argc=5, argv=0xffbffbbc) at slapadd.c:376
> #13 0x000406e8 in main (argc=5, argv=0xffbffbbc) at main.c:279
This should have very little to do with -b null...
>
>
> I started trying to minimize my config file to get test011 into a repro
> case, but the symptoms changed. I think the major difference is lack of a
> "syncrepl" directive; then I receive:
>
> slapadd: line 9: database (dc=rutgers,dc=edu) not configured to hold
> "cn=blah,dc=rutgers,dc=edu"
> slapadd: line 9: database (dc=rutgers,dc=edu) not configured to hold
> "cn=blah,dc=rutgers,dc=edu"
> followed by a SEGV. The stack trace from gdb:
>
> #0 0x00177d24 in glue_tool_inst (bi=0x2a78e0) at glue.c:464
> #1 0x00178900 in glue_tool_sync (b0=???) at glue.c:693
>
> Incidentally, pstack shows that glue_tool_sync in perpetuity, i.e.
>
> 00177d24 glue_tool_inst (2a78e0, 24684c, 0, 0, 0, 0) + c
> 001788f8 glue_tool_sync (2a7fe0, 24684c, 0, 0, 0, 0) + 10
> 0017897c glue_tool_sync (2a7fe0, 24684c, 0, 0, 0, 0) + 94
> 0017897c glue_tool_sync (2a7fe0, 24684c, 0, 0, 0, 0) + 94
> 0017897c glue_tool_sync (2a7fe0, 24684c, 0, 0, 0, 0) + 94
> [...infinitely...]
> 0017897c glue_tool_sync (2a7fe0, 24684c, 0, 0, 0, 0) + 94
> [...infinitely...]
>
> Because the symptoms changed, I got scared off from trying to make a repro
> case. I can pick that up again if the backtraces aren't helpful. On the
> bright side, this is mind-numbingly easy to reproduce, so I can give -d -1
> etc. with minimal delay.
> Coming up with a full test case for one (or, even worse, both) should be
> possible too, but will obviously take longer.
>
I'll give that a look.
Thanks for reporting. p.
--
Pierangelo Masarati
mailto:pierangelo.masarati@sys-net.it
SysNet - via Dossi,8 27100 Pavia Tel: +390382573859 Fax: +390382476497