Re: (ITS#3851) Berkeley DB Scalability Patch
> Hi Jong,
> I don't think you read my email very closely.
> 1) I noted I don't have systems with large amounts of RAM to test
> large databases on.
Although I'm testing the BDB scalability patch on a machine with 12GB of
RAM, the BDB cache size is set to only 1.6GB. It's a 32-bit platform.
What the performance results imply is that the patch makes it possible
to add large databases on memory-constrained systems, which are typical
of small- to medium-scale servers such as IBM BladeCenter and OpenPower.
Hence, you should not regard the lack of a large amount of RAM as an
obstacle to large-database experiments.
> 2) I didn't say this was a meaningful comparison. I said this is
> behavior I noticed when using a large set of indices.
What I suspect is that the performance concern you reported has more to
do with the memory pressure caused by heavy indexing than with the heavy
indexing itself. If you run more experiments with varying degrees of
indexing or varying DIT sizes, it will become clear where the
performance difference comes from.
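
One way to lay out such experiments is a small grid that varies the index
count and the DIT size independently, for both builds. The sketch below
only enumerates the runs. The values are illustrative picks from figures
mentioned in this thread (3 and 21 indices, 100K and 4M entries), and the
names `experiment_grid`, "unpatched" and "patched" are hypothetical labels,
not anything defined by the patch.

```python
from itertools import product

# Illustrative values only, taken from numbers mentioned in this thread.
index_counts = [3, 21]            # light vs. heavy indexing
dit_sizes = [100_000, 4_000_000]  # small vs. large DIT (entries)
builds = ["unpatched", "patched"] # hypothetical labels for the two builds


def experiment_grid(index_counts, dit_sizes, builds):
    """Return one run description per (index count, DIT size, build) combination."""
    return [
        {"indices": n, "entries": size, "build": build}
        for n, size, build in product(index_counts, dit_sizes, builds)
    ]


grid = experiment_grid(index_counts, dit_sizes, builds)
for run in grid:
    print(run)
```

Comparing runs that differ only in index count separates the cost of
indexing from the cost of database size, and comparing the same cell
across builds isolates the effect of the patch.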
> I am quite aware I'm only looking at one small data point. But what
> is significant to me about that data point, and what you said in your
> previous emails about this patch, is that you are not using a
> significant number of indices. You are only using two (one
> objectclass eq, one cn eq,sub). What I saw in my 100k test, when I
> went from 3 indices to 21, is that the scalability patch begins to
> suffer, which is why I asked, at the end of my email:
>>> Have you done any testing of your patch on large scale DB's with a
>>> good number of indices?
> That is the question I'm interested in having an answer to. It's great
> if the patch holds up if you do 500 billion entry databases. If it
> can't function when you have more than some X number of indices, where
> X is rather small, then its usefulness becomes suspect.
I don't agree with your last point. As I said earlier, the data you
gathered cannot tell you whether the performance difference comes from
indexing or not. Also, scalability means a lot more than just having
high performance at every point in the scaling range. I would think it
acceptable to spend 12 minutes instead of 9 adding 100K entries if the
patch reduces the time for 4 million entries from 11 hours to just one.
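
To put that trade-off in numbers, here is a minimal sketch that converts
the four timings quoted above into throughput and speedup. It assumes the
9-minute and 11-hour figures are without the patch and the 12-minute and
1-hour figures are with it. The names `runs`, `entries_per_second` and
`summarize` are made up for this illustration.

```python
# (entries, minutes without patch, minutes with patch), as quoted in this thread.
# Assumption: 9 min / 11 h are the unpatched times, 12 min / 1 h the patched ones.
runs = {
    "100K entries": (100_000, 9, 12),
    "4M entries": (4_000_000, 11 * 60, 1 * 60),
}


def entries_per_second(entries, minutes):
    """Average add rate over the whole load."""
    return entries / (minutes * 60)


def summarize(runs):
    """Compute unpatched rate, patched rate, and speedup for each run."""
    rows = []
    for label, (entries, base_min, patched_min) in runs.items():
        rows.append({
            "label": label,
            "base_rate": entries_per_second(entries, base_min),
            "patched_rate": entries_per_second(entries, patched_min),
            "speedup": base_min / patched_min,  # > 1 means the patch is faster
        })
    return rows


for row in summarize(runs):
    print(f"{row['label']}: {row['base_rate']:.0f} -> "
          f"{row['patched_rate']:.0f} entries/s, "
          f"speedup {row['speedup']:.2f}x")
```

Under these assumptions the patch costs about a quarter of the throughput
at 100K entries (speedup 0.75x) and gains a factor of 11 at 4 million.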