RE: benchmarks & real world

On Thu, 15 Apr 2004, Howard Chu wrote:

> > "Howard Chu" wrote:
> > >I've played with DirectoryMark a few times, after seeing it bandied about.
> > >In fact, the data set that it uses is randomly generated, so there is no
> > >canonical fixed data set. Nor is there a standardized job mix or load
> > >factor. In articles I've seen that report DirectoryMark figures, these
> > >critical input parameters are not included. As such, none of the published
> > >results I've ever seen could be compared to any concrete reference, which
> > >makes the usage of DirectoryMark rather meaningless.
> > Well, what other/better benchmark do you suggest then ?
> The DirectoryMark tools are basically OK. Improving it is a simple matter of
> applying the scientific method: make it consistent and repeatable.
> Using a single data set for all tests would be a good step there. Since the
> data set could be quite large, it would be acceptable to continue to
> "randomly" generate it, as long as
> 	A) the pseudo-random number generation algorithm is fixed
> and	B) the input seed is fixed
> for all invocations of the test. (This is just a cheat to be able to
> distribute a constant dataset without having to actually store it in expanded
> LDIF form.)
> Likewise, the hardware environment for the test must be fixed. The client
> machines must be fully specified - CPU, RAM, network interface, OS version,
> etc. as well as the servers. The operation mix must be fixed. Again, the job
> mix can be "randomly" generated as long as the sequence can be perfectly
> repeated - i.e., use the known pseudo-random number generator and input seed.
> All of these parameters must be either published with the DirectoryMark
> report, or everyone must agree to a canonical set of parameters so that they
> can be explicitly referenced/linked to when publishing a report.
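
To make the fixed-algorithm, fixed-seed idea concrete, here is a minimal
sketch in Python. Everything in it is an assumption for illustration and
not part of DirectoryMark: the Lcg64 class, the name lists, the entry
layout and the dc=example,dc=com suffix are all made up. The generator is
written out by hand (a 64-bit LCG using the multiplier and increment from
Knuth's MMIX) so that the algorithm itself is pinned down rather than
left to whatever a library version happens to ship.

```python
import hashlib


class Lcg64:
    """Tiny 64-bit linear congruential generator (MMIX constants).

    Illustrative only: chosen because the whole algorithm fits in a few
    lines and therefore cannot silently change between releases.
    """
    MULT = 6364136223846793005
    INC = 1442695040888963407
    MASK = (1 << 64) - 1

    def __init__(self, seed):
        self.state = seed & self.MASK

    def next_u32(self):
        self.state = (self.state * self.MULT + self.INC) & self.MASK
        return self.state >> 32  # use the high 32 bits of the state

    def below(self, n):
        # Slight modulo bias; acceptable for a sketch.
        return self.next_u32() % n


# Hypothetical name pools; a real generator would use much larger ones.
GIVEN = ["alice", "bob", "carol", "dave", "erin"]
SURNAMES = ["smith", "jones", "chu", "lee", "khan"]


def generate_ldif(seed, count, suffix="dc=example,dc=com"):
    """Return `count` LDIF entries that depend only on (seed, count, suffix)."""
    rng = Lcg64(seed)
    entries = []
    for i in range(count):
        given = GIVEN[rng.below(len(GIVEN))]
        sn = SURNAMES[rng.below(len(SURNAMES))]
        uid = f"{given}.{sn}.{i}"
        entries.append(
            f"dn: uid={uid},ou=people,{suffix}\n"
            "objectClass: inetOrgPerson\n"
            f"uid: {uid}\n"
            f"cn: {given} {sn}\n"
            f"sn: {sn}\n"
        )
    return "\n".join(entries)  # blank line between entries


def dataset_digest(seed, count):
    """SHA-256 of the generated LDIF, so a report can name its exact data set."""
    return hashlib.sha256(generate_ldif(seed, count).encode("utf-8")).hexdigest()
```

With something like this, a report only needs to state the seed and the
entry count (or the digest) for anyone else to regenerate the identical
data set without the expanded LDIF ever being shipped.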

The only published DirectoryMark results for OpenLDAP that I have seen did 
not specify how many operations were performed per bind. My own 
benchmarks (2 OSs, 2 OpenLDAP releases, 2 backend types, and 3 databases: 
2 generated by DirectoryMark, one a real-world db) showed that this is the 
parameter with the biggest effect on the results (besides indexing). Since 
the published benchmark used similar hardware, I must assume its tests 
were done with one bind for the whole benchmark.
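
To show what I mean by operations per bind as an explicit parameter, here
is a rough Python sketch of a repeatable job mix. The job_mix function,
the operation names and the 80/20 search/modify split are assumptions I
made up for illustration, not DirectoryMark's actual behaviour. It relies
on Python's documented guarantee that random.Random(seed).random() keeps
producing the same sequence for the same seed.

```python
import random


def job_mix(seed, total_ops, ops_per_bind, search_ratio=0.8):
    """Return a repeatable operation sequence.

    A new connection is bound every `ops_per_bind` data operations, so
    ops_per_bind=1 means bind/op/unbind for every operation, while
    ops_per_bind=total_ops means a single bind for the whole run.
    """
    if ops_per_bind < 1:
        raise ValueError("ops_per_bind must be at least 1")
    rng = random.Random(seed)
    ops = []
    for i in range(total_ops):
        if i % ops_per_bind == 0:
            if i:
                ops.append("unbind")  # close the previous connection
            ops.append("bind")
        # One PRNG draw per data operation, independent of ops_per_bind.
        ops.append("search" if rng.random() < search_ratio else "modify")
    if total_ops:
        ops.append("unbind")
    return ops


if __name__ == "__main__":
    for per_bind in (1, 10, 1000):
        mix = job_mix(seed=7, total_ops=1000, ops_per_bind=per_bind)
        print(per_bind, "ops/bind ->", mix.count("bind"), "binds")
```

Because the search/modify sequence is drawn once per data operation, it
stays identical for a given seed while only the number of binds changes,
which is what you want when isolating the effect of this one parameter.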

> The fact that certain vendors' servers are not available on certain machines
> or OS revisions complicates matters, because you can't get an
> apples-to-apples comparison then. In that case, whatever numbers you get are
> basically useless for product comparisons.
> The notion of benchmarking is surely a well-understood topic, it surprises me
> that no one working in this space has demonstrated any actual understanding
> of it thus far.

I think that's a bit harsh.  But a good start may be to work on some 
benchmark clients, an LDIF generator, a query generator, and a default 
set of parameters.
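
As a starting point for the default parameter set, something like the
following Python sketch could work: one record holding every input, plus
a short fingerprint that a published report can quote. All field names
and values here are placeholders I invented (including the PRNG name and
the seeds); the hardware fields are deliberately left unspecified.

```python
import hashlib
import json

# Hypothetical default parameter set; every value is a placeholder.
DEFAULT_PARAMS = {
    "dataset": {"prng": "lcg64-example", "seed": 42, "entries": 100000},
    "job_mix": {
        "seed": 7,
        "total_ops": 1000000,
        "ops_per_bind": 10,
        "search_ratio": 0.8,
    },
    "indexes": ["uid", "cn", "sn"],
    "server_hw": "UNSPECIFIED",  # CPU, RAM, network interface, OS version
    "client_hw": "UNSPECIFIED",
}


def fingerprint(params):
    """SHA-256 over a canonical JSON form of the parameter set.

    Keys are sorted and whitespace is fixed, so two reports using the
    same parameters get the same fingerprint regardless of key order.
    """
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print("parameter set:", fingerprint(DEFAULT_PARAMS)[:16])
```

Two published results would then be comparable only if they quote the
same fingerprint; any change, such as a different ops_per_bind, yields a
different one.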