[Date Prev][Date Next] [Chronological] [Thread] [Top]

Trouble found: substring searches very broken (ITS#402)



On Fri, Dec 17, 1999 at 12:45:53AM +0000, ahu@casema.net wrote:

> > You might try using 'ldbmcat -n' and then use ldapadd.
> 
> Ok, I'm nearly there. An excerpt from the trace log (which has been expanded
> a bit by additional logging). id 301 is the one we are looking for.

I'm there :-)

If I understand correctly, something very silly is going on. If a direct
idl block is full, we split it, whereby every id smaller then the one we are
trying to insert goes to the 'lower' block, and everything above to the
'upper' block. We then insert an indirect block containing pointers to the
separate blocks.

Additional keys are always inserted 'in order' in a block.

This way, when we later concatenate all blocks mentioned in the indirect
block, the keys are in order, and idl_intersection can take advantage of
this.

However, something else is also going on, see idl.c:

        /* insert the id */
        switch ( idl_insert( &tmp, id, db->dbc_maxids ) ) {
        case 0:         /* id inserted ok */
                if ( (rc = idl_store( be, db, k2, tmp )) != 0 ) {
                        Debug( LDAP_DEBUG_ANY,
                            "idl_store of (%s) returns %d\n", k2.dptr, rc, 0
);
                }
                break;

        case 1:         /* id inserted - first id in block has changed */
                /*
                 * key for this block has changed, so we have to
                 * write the block under the new key, delete the
                 * old key block + update and write the indirect
                 * header block.
                 */

                rc = idl_change_first( be, db, key, idl, i, k2, tmp );
                break;

        case 2:         /* id not inserted - already there, do nothing */
                rc = 0;
                break;

        case 3:         /* id not inserted - block is full */
                /*
                 * first, see if it will fit in the next block,
                 * without splitting, unless we're trying to insert
                 * into the beginning of the first block.
                 */

                /* is there a next block? */

This shouldn't happen! This means that keys get inserted in blocks where
they don't belong, causing the straight concatenation of all blocks to be
unprocessable by idl_intersection:

        for ( ni = 0, ai = 0, bi = 0; ai < ID_BLOCK_NIDS(a); ai++ ) {
                for ( ;
                        bi < ID_BLOCK_NIDS(b) && ID_BLOCK_ID(b, bi) < 
                        bi++ ) 
                {
                        ;       /* NULL */
                }

                if ( bi == ID_BLOCK_NIDS(b) ) {
                        break;
                }

                if ( ID_BLOCK_ID(b, bi) == ID_BLOCK_ID(a, ai) ) {
                        ID_BLOCK_ID(n, ni++) = ID_BLOCK_ID(a, ai);
                }
        }

	(been mangled a bit by cutting & pasting from the screen)

This function assumes that all ids are in order so it can stop searching
very rapidly.

This assumption is invalidated by the 'is there room in the next block'
optimization.

Now, this can be fixed by replacing the above snippet by:

        for ( ni = 0, ai = 0; ai < ID_BLOCK_NIDS(a); ai++ )
        for ( bi = 0; bi < ID_BLOCK_NIDS(b); bi++ )
        {
                if ( ID_BLOCK_ID(b, bi) == ID_BLOCK_ID(a, ai) ) {
                        ID_BLOCK_ID(n, ni++) = ID_BLOCK_ID(a, ai);
                }
        }


This however is lots slower and doesn't scale very well eventually. It is
however the only way to fix the problem on an already indexed database. This
works instantaneously.

If you also apply the since merged fix to stop searching once
idl_intersection has cut down the selection to a reasonable number (instead
of continuing to intersect with single valued idls), this might be
manageable.

The true fix however is to disable the 'is there room in the next block'
optimization, or make sure that it enters it in any hypothetical space
between the end of the current and the beginning of the beginning of the
next block.

This can be done by checking that the id that is going to be spliced in the
next block is larger than the largest id in the current block. We can also
be sure that the id will then be the first entry in the next block,
necessitating an idl_change_first() call.

I leave implementing this as an excercise for the reader. I've been
debugging for 16 hours straight and hope to get some sleep now :-)

With kind regards,


bert hubert.

-- 
    +---------------+  |              http://www.rent-a-nerd.nl
    | nerd for hire |  |                  
    +---------------+  |                     - U N I X -
            |          |          Inspice et cautus eris - D11T'95