Re: (ITS#3851) Berkeley DB Scalability Patch

> Hi Jong,
> I don't think you read my email very closely.
> 1) I noted I don't have systems with large amounts of RAM to test 
> large databases on.

Although I'm testing the BDB scalability patch on a machine with 12GB of 
RAM, the BDB cache size is set to only 1.6GB. It's a 32-bit platform. 
What the performance results show, reading between the lines, is that 
the patch makes it possible to add large databases on memory-constrained 
systems, which are typical of small to medium scale servers such as IBM 
BladeCenter and OpenPower. Hence, you should not consider the lack of a 
large amount of RAM an obstacle to large database experiments.

> 2) I didn't say this was a meaningful comparison.  I said this is 
> behavior I noticed when using a large set of indices.

What I suspect is that the performance concern you reported is related 
more to the memory pressure caused by heavy indexing than to the heavy 
indexing itself. If you run more experiments with varying degrees of 
indexing or varying DIT sizes, it will become clear where the 
performance difference comes from.

> I am quite aware I'm only looking at one small data point.  But what 
> is significant to me about that data point, and what you said in your 
> previous emails about this patch, is that you are not using a 
> significant number of indices.  You are only using two (one 
> objectclass eq, one cn eq,sub).  What I saw in my 100k test is when I 
> went from 3 indices to 21, is that the scalability patch begins to 
> suffer.  Which is then why I asked, at the end of my email:
>>> Have you done any testing of your patch on large scale DB's with a
>>> good number of indices?
> That is the question I'm interested in having an answer to.  Its great 
> if the patch holds up if you do 500 billion entry databases.  If it 
> can't function when you have more than some X number of indices, where 
> X is rather small, then its usefulness becomes suspect.

I don't agree with your last point. As I said earlier, with the data you 
gathered, you cannot tell whether the performance difference comes from 
indexing or not. Also, scalability means a lot more than just having 
high performance at every scaling point in the range. I would think it 
acceptable to spend 12 minutes instead of 9 adding 100K entries if the 
patch reduces the time for 4 million entries from 11 hours to just one.
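
To put that trade-off in numbers, here is a rough sketch using only the 
figures from the paragraph above. The names (timings, speedup) are made 
up for illustration; nothing here is measured by the script itself.

```python
# Timings in minutes, taken from the paragraph above.
# "patched" = with the BDB scalability patch applied (labels are illustrative).
timings = {
    "100K entries": {"unpatched": 9, "patched": 12},
    "4M entries": {"unpatched": 11 * 60, "patched": 1 * 60},
}


def speedup(unpatched, patched):
    """Return how many times faster the patched run is (below 1 means slower)."""
    return unpatched / patched


for label, t in timings.items():
    print(f"{label}: {speedup(t['unpatched'], t['patched']):.2f}x")
```

A ratio below 1 at the small end and well above 1 at the large end is 
the shape of trade-off I am describing.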

- Jong-Hyuk