
Re: ldapmodify crashes with long-ish input for an entry

On Sat, Jan 03 2015 at 01:31:10 +0000, Howard Chu scribbled
 in "Re: ldapmodify crashes with long-ish input for an entry":

Hi Howard,

> openldap-its is not a discussion list at all. Both lists require you
> to actually submit a ticket and reference the ticket number.

Thanks for the clarification.

> > We use OpenLDAP for a few directories, one of which is in the
> > process of being migrated to newer hardware, with OS upgrade
> > thrown in, and I've noticed an issue with ldapmodify that I
> > thought was worth reporting.  The directory in question has some
> > scripted tooling around it to manage updates from a number of
> > sources, which are staged in a Postgresql database before having
> > some LDIF generated to update the directory itself.
> >
> > During the course of my testing (we've not seen this in
> > production, thankfully) I've noticed that, with reasonably lengthy
> > updates for an entry, ldapmodify dies with an error like the
> > following:
> Define lengthy - how many bytes?

The complete LDIF for the entry update in question is 672808 bytes
(658K). I haven't done an exhaustive "trial and error" search for the
boundary between OK and too big, but at 68233 bytes (67K, or 1000
lines of updates for the entry) the update completes OK, while at
205105 bytes (201K, or 3000 lines) it fails as described.
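
Rather than trial and error, the boundary could be found by bisecting
on the number of member lines. Below is a minimal Python sketch of just
the search logic. `try_update` is a hypothetical callable, not part of
any OpenLDAP tool. It would build the LDIF for n members, run
ldapmodify against a test directory, and return True if the modify
succeeded. The sketch assumes failure is monotonic in size, meaning
every count above the boundary also fails.

```python
from typing import Callable


def find_largest_ok(try_update: Callable[[int], bool], lo: int, hi: int) -> int:
    """Return the largest n in [lo, hi) for which try_update(n) succeeds.

    Assumes try_update(lo) is known to succeed, try_update(hi) is known
    to fail, and that failures are monotonic in n. try_update is a
    hypothetical helper that would run ldapmodify with n member lines.
    """
    # Invariant: lo is a known-good count, hi is a known-bad count.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if try_update(mid):
            lo = mid  # mid worked, so the boundary is at or above it
        else:
            hi = mid  # mid failed, so the boundary is below it
    return lo
```

Starting from the data points above, `find_largest_ok(try_update, 1000,
3000)` would need at most 11 ldapmodify runs to find the exact member
count at which the failure starts.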

> > #---8<--- Command Output ------------------------------------------
> > modifying entry "<DN-FOR-FAILED-ENTRY>"
> > ldap_result: Can't contact LDAP server (-1)
> > #---8<-------------------------------------------------------------
> Define crash - that doesn't look like a crash.

Sorry, you're right.  I should rather have said that ldapmodify
behaves in an unexpected manner and fails to complete the update when
the list of changes for a DN is reasonably lengthy.

> > There are matching log-entries in the system's syslog (timestamp,
> > hostname and PID trimmed off to save some linewrapping) and slapd
> > logs (we run slapd under daemontools supervision, and capture its
> > stdout/stderr):
> >
> > #---8<---- SysLog Output ------------------------------------------
> > local4.debug slapd: conn=1002 op=3158 MOD
> >     dn="<DN-OF-LAST-GOOD-ENTRY>"
> > local4.debug slapd: conn=1002 op=3158 MOD attr=member
> > local4.debug slapd: conn=1002 op=3158 RESULT tag=103 err=0 text=
> > local4.debug slapd: conn=1002 fd=13 closed (connection lost)
> > #---8<-------------------------------------------------------------
> >
> > #---8<---- Slapd Output -------------------------------------------
> > 54a2e47f conn=1002 op=3158 MOD dn="<DN-OF-LAST-GOOD-ENTRY>"
> > 54a2e47f conn=1002 op=3158 MOD attr=member
> > 54a2e47f conn=1002 op=3158 RESULT tag=103 err=0 text=
> > sb_sasl_cyrus_decode: failed to decode packet: generic failure
> > sb_sasl_generic_read: failed to decode packet
> > 54a2e47f conn=1002 fd=13 closed (connection lost)
> > #---8<-------------------------------------------------------------
> That looks like a bug in Cyrus SASL. What SASL mechanism are you
> using?

GSSAPI -- we use Kerberos for nearly everything, and the task in
question uses a credential cache created from a keytab using k5start,
though as an independent process (we're not using k5start to run the
task's commands itself).

/etc/sasl/slapd.conf for the system in question contains only
"mech_list: gssapi".

While I must admit that C isn't my forte (I'm more on the admin side
of sysadmin than on the development side), I'm comfortable reading
code, so I looked through the codebase for those sb_sasl* messages,
but didn't see anything specific enough for my untrained (or
unfamiliar) eye to use as a clue for further sleuthing.

At the time I assumed, quite probably incorrectly, that the messages
were caused by the client side closing or dropping the connection
unexpectedly.

If it is of any interest, I see the same issue with both the
libsasl2-modules-gssapi-heimdal and libsasl2-modules-gssapi-mit Debian
packages.

> > The LDIF for the failed entry consists of:
> >
> > #---8<-------------------------------------------------------------
> > changetype: modify
> > replace: member
> > member: <DN-FOR-MEMBER>
> > ...
> > #---8<-------------------------------------------------------------
> >
> > where the list of members was, in this case, 9799 entries long.
> > The LDIF itself is 30097 lines long, and was happy for the first
> > ~15000 lines.
> >
> > If I prune out the modifications for the troublesome DN, the
> > remainder of the file also goes through happily.
> >
> > As a work-around I can manually split up the list into several
> > blocks (tested with roughly 1000 member updates per block) with
> > "replace: member" for the first, to match the current behaviour,
> > and "add: member" for the rest. In this format, ldapmodify is
> > happy to process the LDIF (all in one connection, but as discrete
> > operations).  (Note: the authenticated user has "unlimited" limits
> > in the config)
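
For anyone scripting the same workaround, here is a minimal Python
sketch that generates the split LDIF. The function name
`split_member_update` and the default chunk size of 1000 are my own
choices for illustration, not anything from the OpenLDAP tools. The
chunk size matches the block size I tested with. The sketch assumes
member DNs are plain ASCII strings that need neither base64 encoding
nor line folding. Each chunk is written as its own change record, so
ldapmodify sends each one as a separate modify operation.

```python
from typing import List


def split_member_update(dn: str, members: List[str], chunk_size: int = 1000) -> str:
    """Build LDIF that replaces a group's member list in chunks.

    The first change record uses "replace: member" and each later one
    uses "add: member". Each record is a separate modify operation, so
    no single request carries the whole list. Values are written as
    plain "attr: value" lines. DNs that would need base64 ("::")
    encoding are not handled, and long lines are not folded.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    chunks = [members[i:i + chunk_size] for i in range(0, len(members), chunk_size)]
    if not chunks:
        # An empty list still needs one value-less replace to clear the attribute.
        chunks = [[]]
    records = []
    for index, chunk in enumerate(chunks):
        op = "replace" if index == 0 else "add"
        lines = [f"dn: {dn}", "changetype: modify", f"{op}: member"]
        lines += [f"member: {m}" for m in chunk]
        lines.append("-")
        records.append("\n".join(lines))
    # Change records are separated by a blank line.
    return "\n\n".join(records) + "\n"
```

With the default chunk size, the 9799-member list above would become
one replace record followed by nine add records. One side effect of
this approach is that anything reading the group between the first
replace and the last add will see only part of the membership.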
> >
> > Given that, it sounds like it could be a bug in ldapmodify (I
> > don't think it's on the slapd end).  I've tested on both the
> > Debian stable package (version 2.4.31-1+nmu2), and with a locally
> > compiled build direct from the project's download pages (also
> > 2.4.31 to start with, to see if it was an introduced bug).
> >
> > Has anyone else on the list seen anything like this before?
> 2.4.31 is over 2 years old. 2.4.41 is about to be released. Debian
> is known to break their packages from time to time with ill-advised
> "security" patches. You should try with freshly built source.

Yeah, we've hit that sort of thing before, which is one reason we
have compiled our own packages in the past (not just for OpenLDAP).
It's something we try to avoid if possible, though, and we aim to
re-converge on the Debian packages once the patches and updates we
were applying have been adopted in Debian.

Yesterday I compiled 2.4.40 and switched the managed configuration to
use that package on the test system in question. Unfortunately, it
had the same issue.

Since writing originally, I've had a look (with my previously admitted
limited C skills) for differences in ldapmodify between the (working)
2.4.24, the (not working) 2.4.31 and the (also not working) 2.4.40.
There's little change to speak of from what I can see, and nothing
that looked to my eye like it might be the cause (though I've only
really looked through clients/tools/ldapmodify.c, and very briefly
through libraries/libldap/modify.c).

After that, I'm less sure that ldapmodify is where the problem lies
(rather it's just where I'm seeing it manifest).

Thanks for your help, Howard. If you're at FlossUK again this year I
should remember to buy you a pint by way of thanks in general (not
just as a bribe for helping me out now :), and do bring the violin
again; it's a great way to start a presentation! (I'm thinking of your
talk about mdb in Edinburgh in 2012.)

> -- 
>   -- Howard Chu
>   CTO, Symas Corp.           http://www.symas.com
>   Director, Highland Sun     http://highlandsun.com/hyc/
>   Chief Architect, OpenLDAP  http://www.openldap.org/project/



><> ><> ><> ><> ><> ><> ooOoo <>< <>< <>< <>< <>< <><
Dameon Wagner, Systems Development and Support Team
IT Services, University of Oxford
><> ><> ><> ><> ><> ><> ooOoo <>< <>< <>< <>< <>< <><