Re: EntryInfo cache size....

> I understand your point. But I think I need to reiterate - the EntryInfo
> cache is a tree structure, you cannot delete it in arbitrary order the way
> you have done for the Entry cache. Regardless of the order in which you
> search it, the branch leading from the root to the desired Entry must be
> present. You cannot delete interior nodes of the tree; you can only delete
> leaf nodes. So if your access pattern visits interior nodes first, there is
> no advantage to un-caching them immediately after use, before hitting the
> leaf node; the nodes would be recreated as soon as you look for their
> children.
> I wonder what kind of memory constraints you envision. For a 1 million
> entry directory the EntryInfo would consume probably 80-100MB. I think it's
> fair to expect that a machine serving a million entries would have at least
> a few hundred MB of RAM available.
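
To make the leaf-only constraint in the quoted text concrete, here is a
minimal sketch. It is not back-hdb code: Node, lookup() and evict() are
hypothetical names, and the DNs are made up. It only shows that an interior
node cannot be dropped while it has children, and that dropping a branch
early just means recreating it on the next lookup.

```python
class Node:
    """Hypothetical stand-in for a cached EntryInfo: a name plus tree links."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = {}
        if parent is not None:
            parent.children[name] = self


def lookup(root, path, created):
    """Walk from root along `path`, creating any node missing from the cache.

    `created` collects the names of nodes that had to be (re)created.
    """
    node = root
    for name in path:
        child = node.children.get(name)
        if child is None:
            child = Node(name, node)
            created.append(name)
        node = child
    return node


def evict(node):
    """Drop a node from the cache; refuse if it still has children."""
    if node.children:
        return False
    if node.parent is not None:
        del node.parent.children[node.name]
        node.parent = None
    return True


root = Node("dc=example")
created = []
people = lookup(root, ["ou=people"], created)
alice = lookup(root, ["ou=people", "cn=alice"], created)

interior_evicted = evict(people)   # refused: ou=people still has a child
leaf_evicted = evict(alice)        # allowed: cn=alice is a leaf
evict(people)                      # now a leaf itself, so it can go

# Looking for the child again forces the whole branch to be rebuilt.
recreated = []
lookup(root, ["ou=people", "cn=alice"], recreated)
```

In this sketch the second lookup has to recreate both ou=people and
cn=alice, which is the wasted work the quoted text describes.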

Because the EntryInfo cache has less impact than the entry cache or the
dbcache, it seems fine for now to support only the entry cache bypass (there
is some more work to do, though ;) ).

A more important issue seems to be the dbcache. The (virtual) size of the
dbcache is determined at ENV open time and cannot be changed afterwards.
Also, it seems impossible to bypass the dbcache unless there is an API for
this that I'm not aware of (is there?).
The problem, to reiterate, is that the dbcache can be polluted by a large
syncrepl session, which makes its working set unnecessarily large.
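
The pollution effect can be illustrated with a toy LRU cache. This is only
a sketch under assumptions: LRUCache is a hypothetical class, not the
Berkeley DB cache, and the page numbers and capacity are arbitrary. It shows
one large sequential pass pushing out the pages that ordinary searches had
kept warm.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny LRU page cache; a hypothetical stand-in, not the real dbcache."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, page):
        if page in self.pages:
            self.pages.move_to_end(page)        # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            self.pages[page] = True
            if len(self.pages) > self.capacity:
                self.pages.popitem(last=False)  # evict least recently used


cache = LRUCache(capacity=100)
hot = range(50)                  # pages that ordinary searches keep touching

for _ in range(3):               # warm the cache with the hot pages
    for p in hot:
        cache.access(p)

for p in range(1000, 2000):      # one large sequential pass over other pages
    cache.access(p)

cache.hits = cache.misses = 0    # count only what happens after the pass
for p in hot:
    cache.access(p)
misses_after_scan = cache.misses
hits_after_scan = cache.hits
```

After the large pass, every one of the 50 hot pages misses in this toy
model, because the pass was larger than the cache.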

A solution that I'm currently considering is to add another database,
namely an id2UUID database, and to add a path that performs the internal
search on it instead of on the id2entry database.
The id2UUID database would be much smaller (less than 1/10 of the size of
the id2entry database).
Also, the working set coming from syncrepl's index db or dn2id db accesses
would generally be small.
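
As a rough sketch of the idea, assuming a side table keyed by entry ID that
holds only the UUID: the dictionaries, add_entry() and uuids_for() below are
hypothetical, and the entry payloads are made-up placeholders, not real
entry encodings or measured sizes.

```python
import uuid

id2entry = {}   # entry ID -> full serialized entry (placeholder bytes here)
id2uuid = {}    # entry ID -> 16-byte UUID only


def add_entry(entry_id, dn, payload):
    """Store the full entry and, alongside it, just its UUID."""
    entry_uuid = uuid.uuid5(uuid.NAMESPACE_DNS, dn).bytes  # deterministic for the demo
    id2entry[entry_id] = payload
    id2uuid[entry_id] = entry_uuid
    return entry_uuid


def uuids_for(candidate_ids):
    """Internal search path: resolve candidates without touching id2entry."""
    return [id2uuid[i] for i in candidate_ids if i in id2uuid]


expected = []
for i, cn in enumerate(["alice", "bob", "carol"], start=1):
    dn = "cn=%s,dc=example" % cn
    expected.append(add_entry(i, dn, dn.encode() + b"\0" * 200))

found = uuids_for([1, 3, 7])     # ID 7 does not exist and is skipped
bytes_read = sum(len(u) for u in found)
```

Each record read on this path is a fixed 16 bytes regardless of how large
the corresponding entry is.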

Comments?

> The previous EntryInfo code was LRU driven but because it is only allowed
> to delete leaf nodes, it really didn't help much. The EntryInfo replacement
> pattern differs from the Entry pattern, and you can't keep them in sync
> without breaking the hierarchy, which would defeat the purpose of the
> back-hdb design.
> An alternative would be to restructure the way Searches work: instead of
> walking a flat IDL of candidates, process candidates in tree order. I've
> thought about this quite a bit; it may make some other issues easier to
> handle but it makes indexing much harder to use.
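
For what processing candidates in tree order could look like, here is a
minimal sketch. tree_order() and the parent map are hypothetical, not how
back-hdb stores its hierarchy; the point is only that a flat ID list and a
depth-first ordering visit the same candidates in different sequences.

```python
def tree_order(candidates, parent):
    """Return the candidate IDs in depth-first (pre-order) tree order.

    `parent` maps every entry ID to its parent ID, or None for the root.
    """
    children = {}
    for eid, pid in parent.items():
        children.setdefault(pid, []).append(eid)

    wanted = set(candidates)
    ordered = []
    # Start from the roots; push in reverse so the lowest ID is visited first.
    stack = sorted(children.get(None, []), reverse=True)
    while stack:
        eid = stack.pop()
        if eid in wanted:
            ordered.append(eid)
        stack.extend(sorted(children.get(eid, []), reverse=True))
    return ordered


# Tree: 1 is the root; 2 and 3 are under 1; 4 and 6 under 2; 5 under 3.
parent = {1: None, 2: 1, 3: 1, 4: 2, 5: 3, 6: 2}
flat_idl = [3, 4, 5, 6]                   # candidates sorted by ID
in_tree_order = tree_order(flat_idl, parent)
```

Here the candidates come out as 4, 6, 3, 5, so the whole subtree under 2 is
finished before 3 is touched. Note that this sketch walks every node to
establish the order, which is one way to see why combining it with index
lookups is not straightforward.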