
(ITS#6845) sortvals oddities - with more information



Full_Name: Andrew Elble
Version: 2.4.24 / CVS Head
OS: Solaris / MacOS
URL: 
Submission from: (NULL) (129.21.6.207)


  We are using sortvals on both member and memberUid. We have been
seeing duplicate member/memberUid attributes on some objects that have
been modified (as well as a lack of sorting on those attributes). There
seemed to be a correlation between modifications to objects that
experienced deadlocks and the objects that had duplicate
member/memberUid attributes on them. We put the seqmod overlay in
place, which reduced the number of occurrences of the issue but
did not eliminate them.

Upon further investigation, I discovered that it was possible to
bypass the sorting behavior if the object was created without an
instance of the attribute that has sorting enabled.

It would seem that attr_merge() (in attr.c) should have something like this:

        if ( *a == NULL ) {
                *a = attr_alloc( desc );
                if ( desc->ad_type->sat_flags & SLAP_AT_SORTED_VAL ) {
                        (*a)->a_flags |= SLAP_ATTR_SORTED_VALS;
                }
        } else {

Further pursuing the issue, I started to focus on the index deletion
code that was changed as a part of ITS#5183. Specifically, the
portion of code within bdb_modify_internal() (in back-bdb/modify.c)
that is commented:

                  /* Move deleted values to end of array */

This code modifies save_attrs, which appears to be a pointer
to memory that resides within the cache. If a deadlock occurs, these
changes are not reverted, thereby corrupting the entry in the cache. I
replaced this code with the pre-ITS#5183 code and I am no longer able
to 'break' the object and insert duplicate member/memberUids.
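
To illustrate the kind of failure I mean, here is a minimal sketch in
plain C. It is not the slapd code: toy_entry, toy_delete_value() and the
deadlock argument are made-up stand-ins for the cached entry, the modify
path and a deadlock reported by the backend. It shows one way to avoid the
problem, assuming the values can simply be copied: do all reordering on a
private copy and swap it into the cache only on commit.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy model of a cached entry; not slapd's real Entry or Attribute types. */
typedef struct toy_entry {
    int    *vals;   /* attribute values held in the shared cache */
    size_t  nvals;
} toy_entry;

/*
 * Delete `victim` from the entry. All reordering happens on a private
 * copy; the cached array is replaced only if the (simulated) transaction
 * commits. `deadlock` is a hypothetical stand-in for the backend
 * reporting a deadlock and asking the caller to retry.
 * Returns 0 on commit, -1 on abort or allocation failure.
 */
int toy_delete_value(toy_entry *cached, int victim, int deadlock)
{
    size_t i, kept = 0;
    int *work = malloc((cached->nvals ? cached->nvals : 1) * sizeof *work);

    if (work == NULL)
        return -1;

    /* Build the post-delete array without touching the cache. */
    for (i = 0; i < cached->nvals; i++) {
        if (cached->vals[i] != victim)
            work[kept++] = cached->vals[i];
    }

    if (deadlock) {
        /* Abort: discard the scratch copy; the cache is unchanged. */
        free(work);
        return -1;
    }

    /* Commit: swap the new array into the cache. */
    free(cached->vals);
    cached->vals = work;
    cached->nvals = kept;
    return 0;
}
```

With this shape, a retry after a deadlock starts from the same cached
values as the first attempt. Editing the cached array in place, without
undoing the edit on abort, would leave the retry working from an already
reordered array.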

I also found it surprising that the call to bdb_idl_cache_del() in
bdb_idl_delete_key() in back-bdb/idl.c occurs prior to any calls to
the database. Is that intentional?

I can answer any questions about the specifics of the environment
in which we are seeing this. It is a somewhat difficult problem to
reproduce outside of our production environment. I'm not terribly
familiar with the code, so I'm looking to see if I have collected enough
data here to open an ITS to have this fixed (or if I'm just way off base).


Thanks,

Andy