[Date Prev][Date Next] [Chronological] [Thread] [Top]

RE: EntryInfo cache size....



>>>
-----Original Message-----
From: Jonghyuk Choi [mailto:jongchoi@us.ibm.com]

>Given how much back-hdb depends on EntryInfo caching, I think it would cost
>more to delete and recreate these nodes than to just keep them around. Keep
>in mind that you have to delete them bottom-up, but you're using them
>top-down. It would be quite messy. As a speed vs space issue, keeping them
is
>a big win. In fact I'm about to add code to keep "deleted" EntryInfo's on a
>free list, since they are needed so often.

The reason I'm focusing on the caching issue is that the syncrepl engine
accesses entries in a sequential fashion so there's little locality.
Also, the syncrepl access pattern and the client access pattern are
generally independent. In fact, a large sequential access by the syncrepl
engine can make the EntryInfo cache virtually ineffective when the system
is memory constrained, if we can't control the size of EntryInfo cache.
<<<
I understand your point. But I think I need to reiterate - the EntryInfo
cache is a tree structure, you cannot delete it in arbitrary order the way
you have done for the Entry cache. Regardless of the order that you request
to search it, the branch leading from the root to the desired Entry must be
present. You cannot delete interior nodes of the tree, you can only delete
leaf nodes. So if your access pattern visits interior nodes first, there is
no advantage to un-cache them immediately after use, before hitting the next
node; the nodes would be recreated again as soon as you look for the
children.

I wonder what kind of memory constraints you envision. For a 1 million entry
directory the EntryInfo would consume probably 80-100MB. I think it's fair to
expect that a machine serving a million entries would have at least several
hundred MB of RAM available.

The previous EntryInfo code was LRU driven but because it is only allowed to
delete leaf nodes, it really didn't help much. The EntryInfo replacement
pattern differs from the Entry pattern, and you can't keep them in lockstep
without breaking the hierarchy, which would defeat the purpose of the
back-hdb design.

An alternative would be to restructure the way Searches work: instead of
walking a flat IDL of candidates, process candidates in tree order. I've
thought about this quite a bit; it may make some other issues easier to
handle but it makes indexing much harder to use.

  -- Howard Chu
  Chief Architect, Symas Corp.       Director, Highland Sun
  http://www.symas.com               http://highlandsun.com/hyc
  Symas: Premier OpenSource Development and Support