
Re: (ITS#6275) syncrepl taking long(not sync) when consumer not connect for a moment



--On Thursday, August 27, 2009 6:39 AM -0700 Rodrigo Costa 
<rlvcosta@yahoo.com> wrote:

> Quanah,
>
> Please see answer in your previous e-mail below.
>
> I'm also sending the information I could collect as an attachment, since
> it is a small file (5 KB).
>
> The behavior that appears strange, and that could indicate a problem, is
> the fact that even when the consumer is stopped, the provider is still
> doing something for a long time. This doesn't appear to be correct.
>
> Another strange behavior is that when the system enters this state, one
> provider CPU stays at around 100% usage. I made a JMeter script to test
> individual bind/search operations (no "ldapsearch *"), and even with some
> load (like 200 simultaneous queries) I do not see the CPU at 100%.
> Something doesn't appear to be right, since I do not see why the CPU
> should stay at 100% permanently.

I explained to you previously why this would be.  Other comments inline.

>> Why are you stopping the provider to do a slapcat?
> [Rodrigo] Faster dump of data. And in any case, if another situation like
> a problem occurs, the secondary system could stay disconnected for other
> reasons.

Do you have any evidence that an offline slapcat is faster than one while 
slapd is running?  I don't understand what you mean in the rest of that 
sentence.

>>> Even when only a small number of entries are different, when the
>>> consumer on Provider 2 connects to Provider 1, syncrepl enters the
>>> full DB search, as expected.
>>
>>
>> What is your sessionlog setting on each provider for the syncprov
>> overlay?
> [Rodrigo]
> syncprov-checkpoint 10000 120
> syncprov-sessionlog 100000

Hm, I would probably checkpoint the cookie a lot more frequently than you 
currently have it set.  The sessionlog setting seems fine to me.
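For example, something along these lines (the exact values are illustrative 
only; tune them to your write rate):

```
# syncprov-checkpoint <ops> <minutes>: write the contextCSN cookie to the
# database after this many write operations or this many minutes have
# passed, whichever comes first.
syncprov-checkpoint 1000 5
syncprov-sessionlog 100000
```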

> Same configuration in both systems.
>>
>>> For definition purposes, I have some memory limitations that force me
>>> to limit dncachesize to around 80% of the DB entries.
>>
>> We already went through other things you could do to reduce your
>> memory footprint in other ways.  You've completely ignored that
>> advice.  As long as your dncachesize is in this state, I don't expect
>> things to behave normally.
> [Rodrigo] I implemented what was possible. This is the final cache config
> allowed by the memory constraints:
># Cache values
># cachesize       10000
> cachesize       20000
> dncachesize     3000000
># dncachesize    400000
># idlcachesize    10000
> idlcachesize    30000
># cachefree       10
> cachefree       100

You don't say anything in here about your DB_CONFIG settings, which is 
where you stand to gain the most memory back.  I do see you're definitely 
running a very restricted cachesize/idlcachesize. ;)
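For reference, a minimal DB_CONFIG sketch (placed in the BDB database 
directory; the cache size here is purely illustrative and should be sized 
to your actual working set):

```
# DB_CONFIG (Berkeley DB environment settings, values illustrative)
set_cachesize 0 268435456 1    # 256 MB BDB cache (gbytes, bytes, segments)
set_lg_regionmax 262144        # log region size
set_lg_bsize 2097152           # in-memory transaction log buffer
set_flags DB_LOG_AUTOREMOVE    # remove old transaction log files automatically
```

Shrinking or growing set_cachesize is usually the single biggest lever on 
slapd's memory footprint with back-bdb.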



>> What value did you set for "cachefree"?
> [Rodrigo] cachefree       100


This value is likely far too low for your system configuration.  cachefree 
controls how many entries get freed from any of the caches at a time. 
With your dncachesize at 3,000,000, removing 100 entries from it will do 
hardly anything, and that may be part of the issue.  If it weren't for the 
major imbalance between your entry, IDL, and DN cache sizes, I would 
suggest a fairly high value like 100,000.  But given that your entry cache 
is 20,000, you'll probably have to limit cachefree to 5000-10000.  It 
needs to be higher than 100 in any case.
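In other words, something along these lines (the exact cachefree number is 
a judgment call within the 5000-10000 range above):

```
# Cache values (cachefree raised from 100; other values as you have them)
cachesize       20000
dncachesize     3000000
idlcachesize    30000
cachefree       5000
```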

--Quanah

--

Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
--------------------
Zimbra ::  the leader in open source messaging and collaboration