
Re: (ITS#5221) cache? of parent failes for hdb



On Mon 2007-11-12 at 15:15 -0800, Quanah Gibson-Mount wrote:
> --On Monday, November 12, 2007 7:02 AM +0000 hyc@symas.com wrote:
> >
> > This isn't a lot of information to go on. If you can create a test
> > program  that shows the problem occurring, using dummy data, that would
> > help. --
> 
> Also, Just some general data on what it is you are doing that is a bit more 
> explanative. 

I have now done several days of testing and think I have tracked down
what is wrong. All my tests were done with 2.3.38.

What my program does is move one person from one branch to another.
It does the following (simplified):
 1) search for the person entry using base o=xxx and filter uid=yyyy
 2) do a modrdn from cn=qqq+uid=yyy,a=aaaa,b=bbbb,c=cccc,o=xxx
    to cn=qqq+uid=yyy,d=dddd,e=eeee,c=cccc,o=xxx

I now run this against a freshly started LDAP server (that is, the
cache has not been filled yet). This is a special case that I found
triggers what is probably the same bug I hit previously, although back
then the server may have been running for a long time.

My analysis of all my debug prints indicates the following.
During 1) above, the person entry is located and
hdb_cache_find_parent is called, which calls hdb_dn2id_parent to find
the way to the root. From what I can see, this constructs cache entries
with only one kid entry in bei_kids. It does not load all the kids of
each entry found along the path to the root.

Next, in 2), modrdn is called, and with it bdb_cache_modrdn. This
removes the person entry from the a=aaaa entry. Because the a=aaaa
entry was cached during 1) with just that one child set, instead of
all children loaded from disk, it now has no children (bei_kids is
NULL), so its state is set to CACHE_ENTRY_NO_KIDS.

If I then do a search with base c=cccc,o=xxx, the hdb_dn2idl routine
does not find all entries, because the cached entry for
a=aaaa,b=bbbb,c=cccc,o=xxx has state CACHE_ENTRY_NO_KIDS in
hdb_dn2idl_internal and is ignored.
If modrdn in step 2) had loaded all kids from disk before deleting the
entry from its parent's list of kids, it should have worked.
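
To make the suspected sequence concrete, here is a toy model in C. It is
not the slapd code: toy_entry, the "disk" array, the all_loaded flag and
every function name below are hypothetical stand-ins, and TOY_NO_KIDS
only plays the role of CACHE_ENTRY_NO_KIDS. It is a sketch of the
behaviour as I understand it from my debug prints, assuming a search
skips a cached parent marked as having no kids.

```c
#include <assert.h>
#include <string.h>

#define MAX_KIDS 8

/* Toy states; TOY_NO_KIDS plays the role of CACHE_ENTRY_NO_KIDS. */
enum { TOY_STATE_NONE = 0, TOY_NO_KIDS = 1 };

/* Hypothetical cached parent entry (stands in for a=aaaa). */
typedef struct {
    const char *kids[MAX_KIDS]; /* cached children, possibly partial */
    int nkids;
    int all_loaded;             /* 1 once every child was read from "disk" */
    int state;
} toy_entry;

/* Hypothetical "disk" index holding the real children of the parent. */
static const char *disk_kids[MAX_KIDS];
static int disk_nkids;

static void disk_reset(void) {
    disk_kids[0] = "uid=yyy";
    disk_kids[1] = "uid=zzz";
    disk_kids[2] = "uid=www";
    disk_nkids = 3;
}

static void disk_remove(const char *rdn) {
    for (int i = 0; i < disk_nkids; i++) {
        if (strcmp(disk_kids[i], rdn) == 0) {
            disk_kids[i] = disk_kids[--disk_nkids]; /* swap in last */
            return;
        }
    }
}

/* Add one child to the cached list unless it is already there. */
static void cache_add_kid(toy_entry *p, const char *rdn) {
    for (int i = 0; i < p->nkids; i++)
        if (strcmp(p->kids[i], rdn) == 0)
            return;
    assert(p->nkids < MAX_KIDS);
    p->kids[p->nkids++] = rdn;
}

/* Bring the cached list in line with everything on "disk". */
static void cache_load_all_kids(toy_entry *p) {
    for (int i = 0; i < disk_nkids; i++)
        cache_add_kid(p, disk_kids[i]);
    p->all_loaded = 1;
}

/* Remove a child; with load_first set, complete the list beforehand. */
static void cache_remove_kid(toy_entry *p, const char *rdn, int load_first) {
    if (load_first && !p->all_loaded)
        cache_load_all_kids(p);
    for (int i = 0; i < p->nkids; i++) {
        if (strcmp(p->kids[i], rdn) == 0) {
            p->kids[i] = p->kids[--p->nkids];
            break;
        }
    }
    if (p->nkids == 0)
        p->state = TOY_NO_KIDS; /* wrong if the list was only partial */
}

/* Children a subtree search would visit below this parent. */
static int search_children(const toy_entry *p) {
    if (p->state == TOY_NO_KIDS)
        return 0;               /* parent is skipped entirely */
    return disk_nkids;
}

int run_scenario(int load_first) {
    toy_entry parent;
    memset(&parent, 0, sizeof parent);
    disk_reset();
    cache_add_kid(&parent, "uid=yyy");  /* step 1: only the path kid */
    cache_remove_kid(&parent, "uid=yyy", load_first);   /* step 2 */
    disk_remove("uid=yyy");
    return search_children(&parent);
}
```

In this model run_scenario(0) returns 0 even though two siblings are
still on the toy disk, while run_scenario(1), which loads the full list
before removing the moved entry, returns 2.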

So the problem, if my analysis is correct, is that cache entries are
sometimes created without loading their children from disk, and then
another cache routine changes the number of children in the cache
without first loading the correct set of children from disk.

If this looks correct to you, what code should I add to fix it?
It would be better if one of you who knows the code better than me could
do that. I can test and see if it works.

I hope this is the only place in the cache handling of an entry's
children where this happens, though maybe someone with better knowledge
of the code can identify others.

I hope this is the bug, as I have spent many days tracking it down and
need to get back to my normal work for my company.

Regards,

   Dan
-- 
Dan Oscarsson
TietoEnator                   Email: Dan.Oscarsson@tietoenator.com
Box 85
201 20  Malmo, Sweden