
Re: (ITS#3817) slapadd (glue?) dumps core under ridiculous situations



>
> Trying to add to a back-null database with the glue overlay, slapadd dumps
> core.
> I know that's utterly ridiculous, but at the same time, there's part of me
> that
> believes
> (a) software really shouldn't dump core ever; I'd expect more
> like "slapadd: database doesn't support necessary operations." and

"shouldn't" sounds reasonable.  That's where users' help becomes invaluable:
I'd never test slapo-glue with back-null, nor add that check to the test
suite.  I think the point is that the large number of backend types and
overlays makes a comprehensive test matrix simply too large to even be
considered.

> (b) I'm concerned that this might be a symptom of a more real bug.

It was: it appears that slapo-glue is not always testing the availability
of backend hooks before using them.  This problem is now fixed in HEAD: do

    cvs diff -u -r 1.19 -r 1.20 servers/slapd/overlays/glue.c

for a patch that should work for 2.3.4 as well.  I'm auditing the code for
other potential occurrences.
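
To illustrate the kind of check involved, here is a minimal sketch of
testing a backend hook before calling it.  It is not the code from
glue.c: the type toy_backend, its tool_entry_put member, and the
functions toy_put and toy_glue_put are hypothetical names made up for
this example, and the hook signature is simplified.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for a backend's hook table.
 * A backend that does not implement an operation leaves the hook NULL. */
typedef struct toy_backend {
    const char *name;
    int (*tool_entry_put)(const char *dn);  /* NULL if unsupported */
} toy_backend;

/* Hypothetical hook for a backend that does support the operation. */
static int toy_put(const char *dn)
{
    (void)dn;   /* a real backend would store the entry here */
    return 0;
}

/* Overlay-style wrapper: refuse cleanly when the hook is missing
 * instead of calling through a NULL function pointer.
 * Returns -1 on failure, otherwise whatever the hook returns. */
static int toy_glue_put(const toy_backend *be, const char *dn)
{
    assert(dn != NULL);

    if (be == NULL || be->tool_entry_put == NULL) {
        fprintf(stderr,
                "%s: database doesn't support necessary operations.\n",
                be != NULL ? be->name : "(none)");
        return -1;
    }
    return be->tool_entry_put(dn);
}
```

With a table whose tool_entry_put is NULL (roughly the back-null
situation), toy_glue_put reports the error and returns -1 rather than
crashing; with toy_put installed it simply forwards the call.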

>
>
> Quickly playing around with test011 -b null results in "slapadd: database
> doesn't
> support necessary operations." which is what I would expect.

Note that, as a more realistic case, I routinely run the test suite
with -b ldif, and roughly 70% of the tests pass.  Most of those that do
not pass show a message like the above; a few still just dump core, and in
a few cases I also tracked down the cause, but I'm too busy (or too lazy?)
to fix them.


> However, my
> full-blown
> config file results in:
>
> slapadd: could not add entry dn="cn=facstaffView,dc=rutgers,dc=edu"
> (line=9):
> Assertion failed: info->al_slot > 0, file alock.c, line 207
> Abort (core dumped)
>
> with a backtrace of:
>
> #0  0xfed1f82c in _lwp_kill () from /usr/lib/libc.so.1
> #1  0xfecd0a24 in raise () from /usr/lib/libc.so.1
> #2  0xfecb6ce0 in abort () from /usr/lib/libc.so.1
> #3  0xfecb6f80 in _assert () from /usr/lib/libc.so.1
> #4  0x000fd7b8 in alock_read_slot (info=0x2b6ee0, slot_data=0xffbff584)
>     at alock.c:207
> #5  0x000fe7e8 in alock_close (info=0x2b6ee0) at alock.c:512
> #6  0x00100df8 in bdb_db_close () at tools.c:314
> #7  0x0006fdb8 in backend_shutdown (be=0x2b6c90) at backend.c:360
> #8  0x001780bc in glue_close (bi=0x2b7528) at glue.c:539
> #9  0x0006fd64 in backend_shutdown (be=0x2b8230) at backend.c:351
> #10 0x0009f13c in slap_shutdown (be=0x2b8230) at init.c:203
> #11 0x000ea158 in slap_tool_destroy () at slapcommon.c:604
> #12 0x000e8380 in slapadd (argc=5, argv=0xffbffbbc) at slapadd.c:376
> #13 0x000406e8 in main (argc=5, argv=0xffbffbbc) at main.c:279

This should have very little to do with -b null...

>
>
> I started trying to minimize my config file to get test011 into a repro
> case,
> but the
> symptoms changed. I think the major difference is lack of a "syncrepl"
> directive;
> then I receive:
>
> slapadd: line 9: database (dc=rutgers,dc=edu) not configured to hold
> "cn=blah,dc=rutgers,dc=edu"
> slapadd: line 9: database (dc=rutgers,dc=edu) not configured to hold
> "cn=blah,dc=rutgers,dc=edu"
> followed by a SEGV. The stack trace from gdb:
>
> #0  0x00177d24 in glue_tool_inst (bi=0x2a78e0) at glue.c:464
> #1  0x00178900 in glue_tool_sync (b0=???) at glue.c:693
>
> Incidentally, pstack shows glue_tool_sync recursing in perpetuity, i.e.
>
>  00177d24 glue_tool_inst (2a78e0, 24684c, 0, 0, 0, 0) + c
>  001788f8 glue_tool_sync (2a7fe0, 24684c, 0, 0, 0, 0) + 10
>  0017897c glue_tool_sync (2a7fe0, 24684c, 0, 0, 0, 0) + 94
>  0017897c glue_tool_sync (2a7fe0, 24684c, 0, 0, 0, 0) + 94
>  0017897c glue_tool_sync (2a7fe0, 24684c, 0, 0, 0, 0) + 94
> [...infinitely...]
>  0017897c glue_tool_sync (2a7fe0, 24684c, 0, 0, 0, 0) + 94
> [...infinitely...]
>
> Because the symptoms changed, I got scared off from trying to make a repro
> case.
> I can pick that up again if the backtraces aren't helpful. On the bright
> side,
> this
> is mind-numbingly easy to reproduce, so I can give -d -1 etc. with minimal
> delay.
> Coming up with a full test case for one (or, even worse, both) should be
> possible too, but will obviously take longer.
>

I'll give that a look.

Thanks for reporting.  p.

-- 
Pierangelo Masarati
mailto:pierangelo.masarati@sys-net.it


    SysNet - via Dossi,8 27100 Pavia Tel: +390382573859 Fax: +390382476497