RE: EntryInfo cache size....

> This is probably a workable idea, but I'm not up on the whole details. What
> info does syncrepl need, the entryCSN, the DN, what else?

The backend operation in question is the be_search() call in syncrepl_del_nonpresent().
For a large replication (e.g. objectClass=*), this search will touch most db pages.
Because the only information this search requires is the entryUUID, I thought it would
be possible to use a separate database, id2UUID, in place of id2entry.
An Operation option could request this: bdb_search would then use id2UUID instead
of id2entry in the search candidate loop, bypassing the entry and EntryInfo caches
by accessing the database directly.
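
To make the first option concrete, here is a toy model in Python, not slapd code. Plain dicts stand in for the id2entry and id2UUID databases, and the Backend class, the uuid_only flag and fetch_entry() are all names I made up for illustration. It only shows that a candidate loop reading id2UUID never populates the entry cache.

```python
# Hypothetical model only: dicts stand in for the BDB databases,
# and none of these names exist in OpenLDAP.
from typing import Dict, Iterable, List


class Backend:
    def __init__(self, id2entry: Dict[int, dict], id2uuid: Dict[int, str]):
        self.id2entry = id2entry          # full entries, keyed by entry ID
        self.id2uuid = id2uuid            # entryUUID only, keyed by entry ID
        self.entry_cache: Dict[int, dict] = {}

    def fetch_entry(self, eid: int) -> dict:
        # Normal path: every entry read ends up in the entry cache.
        if eid not in self.entry_cache:
            self.entry_cache[eid] = self.id2entry[eid]
        return self.entry_cache[eid]

    def search(self, candidates: Iterable[int], uuid_only: bool = False) -> List[str]:
        # uuid_only plays the role of the proposed Operation option.
        uuids = []
        for eid in candidates:
            if uuid_only:
                # Read id2UUID directly; the entry cache is not touched.
                uuids.append(self.id2uuid[eid])
            else:
                uuids.append(self.fetch_entry(eid)["entryUUID"])
        return uuids
```

With uuid_only set, the loop returns the same UUIDs as the normal path while entry_cache stays empty, which is the whole point of the proposal.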

Another option is to maintain a duplicate db home directory whose id2entry database
contains the entryUUID attribute. Both directories would have to be updated at the same time.
The second db environment can be configured with a very small db cache and with
only the indexing that syncrepl requires. The be_search() in syncrepl_del_nonpresent()
can be made to use only the second db, so syncrepl does not pollute the first
environment's db cache or entry cache.
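
The second option could look roughly like the following, again as a toy Python model under my own assumptions rather than anything in the tree. DualEnv, put(), delete() and nonpresent() are hypothetical names, and the shadow dict stands in for the second db environment. Every write goes to both stores, and the non-present computation reads only the shadow.

```python
# Hypothetical model only: two dicts stand in for the two db environments.
from typing import Dict, Iterable, Set


class DualEnv:
    def __init__(self) -> None:
        self.primary: Dict[int, dict] = {}   # full entries
        self.shadow: Dict[int, dict] = {}    # entryUUID only

    def put(self, eid: int, entry: dict) -> None:
        # Both stores must be updated together to stay consistent.
        self.primary[eid] = dict(entry)
        self.shadow[eid] = {"entryUUID": entry["entryUUID"]}

    def delete(self, eid: int) -> None:
        self.primary.pop(eid, None)
        self.shadow.pop(eid, None)

    def nonpresent(self, provider_uuids: Iterable[str]) -> Set[str]:
        # UUIDs held locally that the provider did not report as present.
        # Only the shadow store is read.
        local = {e["entryUUID"] for e in self.shadow.values()}
        return local - set(provider_uuids)
```

The cost is the extra write on every update; a real implementation would also have to keep the two environments consistent across failures, which this sketch ignores.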

Anyway, the syncrepl engine currently runs quite stably for a mid-scale setup.
I experimented with a 130K-entry directory. Over a LAN connection, the overhead of
incremental synchronization was observed to be negligible, with a constrained amount of
memory for each cache.

- Jong

Jong Hyuk Choi
IBM Thomas J. Watson Research Center - Enterprise Linux Group
P. O. Box 218, Yorktown Heights, NY 10598
email: jongchoi@us.ibm.com
(phone) 914-945-3979    (fax) 914-945-4425   TL: 862-3979