RE: benchmarks & real world
On Thu, 15 Apr 2004, Howard Chu wrote:
> > -----Original Message-----
> > From: John Smith [mailto:firstname.lastname@example.org]
> > "Howard Chu" wrote:
> > >
> > >I've played with DirectoryMark a few times, after seeing it
> > >bandied about. In fact, the data set that it uses is randomly
> > >generated, so there is no canonical fixed data set. Nor is there
> > >a standardized job mix or load factor. In articles I've seen that
> > >report DirectoryMark figures, these critical input parameters are
> > >not included. As such, none of the published results I've ever
> > >seen could be compared to any concrete reference, which makes the
> > >usage of DirectoryMark rather meaningless.
> > Well, what other/better benchmark do you suggest then?
> The DirectoryMark tools are basically OK. Improving them is a simple matter
> of applying the scientific method: make the benchmark consistent and
> repeatable.
> Using a single data set for all tests would be a good step there. Since the
> data set could be quite large, it would be acceptable to continue to
> "randomly" generate it, as long as
> A) the pseudo-random number generation algorithm is fixed
> and B) the input seed is fixed
> for all invocations of the test. (This is just a cheat to be able to
> distribute a constant dataset without having to actually store it in expanded
> LDIF form.)
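The "cheat" described above is easy to see in a small sketch. This is not DirectoryMark's actual generator; the seed value, attribute names, and value pools here are my own assumptions. The point is only that a fixed algorithm plus a fixed seed regenerates an identical dataset on every run, so nobody has to ship the expanded LDIF:

```python
import random

# Assumed constants for illustration -- any agreed-upon values would do,
# as long as they are published alongside the benchmark results.
FIXED_SEED = 42

def generate_entries(count, seed=FIXED_SEED):
    """Regenerate the same pseudo-random dataset on every invocation."""
    rng = random.Random(seed)  # isolated generator: fixed algorithm + seed
    entries = []
    for i in range(count):
        uid = "user%06d" % i
        # Attribute values are drawn from the seeded stream, so they are
        # "random-looking" but perfectly reproducible.
        dept = rng.choice(["eng", "sales", "hr", "ops"])
        phone = "+1-555-%04d" % rng.randrange(10000)
        entries.append({"uid": uid, "ou": dept, "telephoneNumber": phone})
    return entries

# Two invocations with the same seed produce identical datasets.
assert generate_entries(1000) == generate_entries(1000)
```

Publishing just the tuple (generator algorithm, seed, entry count) is then equivalent to publishing the whole dataset.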
> Likewise, the hardware environment for the test must be fixed. The client
> machines must be fully specified - CPU, RAM, network interface, OS version,
> etc. as well as the servers. The operation mix must be fixed. Again, the job
> mix can be "randomly" generated as long as the sequence can be perfectly
> repeated - i.e., use the known pseudo-random number generator and input seed.
> All of these parameters must be either published with the DirectoryMark
> report, or everyone must agree to a canonical set of parameters so that they
> can be explicitly referenced/linked to when publishing a report.
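The same trick covers the operation mix. A hedged sketch (the weights and seed below are invented for illustration, not a proposed standard): seed a generator, draw weighted operations from it, and the entire job sequence becomes replayable by anyone who knows the published parameters:

```python
import random

# Illustrative weights only -- a real canonical mix would need to be agreed
# upon and published with every report.
OP_WEIGHTS = {"search": 70, "compare": 15, "modify": 10, "add": 5}
MIX_SEED = 1234

def operation_mix(n, seed=MIX_SEED, weights=OP_WEIGHTS):
    """Return a repeatable weighted sequence of n LDAP operation names."""
    rng = random.Random(seed)
    ops = list(weights)
    return rng.choices(ops, weights=[weights[op] for op in ops], k=n)

# The sequence can be perfectly repeated, as the paragraph above requires.
assert operation_mix(500) == operation_mix(500)
```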
The only published DirectoryMark results for OpenLDAP that I have seen did
not specify how many operations were performed per bind. My own benchmarks
(2 OSs, 2 OpenLDAP releases, 2 backend types, and 3 databases - 2 generated
by DirectoryMark, one a real-world db) showed that this parameter had the
biggest effect of all (besides indexing). Since the published benchmark used
similar hardware, I must assume those tests were done with a single bind for
the whole run ...
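Why operations-per-bind dominates is easy to illustrate with a toy harness. FakeConnection below is a stand-in, not a real LDAP client; against a real server each bind costs a network round trip and usually a password check, so the two configurations measure very different server workloads:

```python
class FakeConnection:
    """Stub standing in for an LDAP connection; counts operations only."""
    def __init__(self):
        self.binds = 0
        self.searches = 0
    def bind(self):
        self.binds += 1
    def search(self):
        self.searches += 1

def run(total_ops, ops_per_bind):
    """Issue total_ops searches, re-binding every ops_per_bind operations."""
    conn = FakeConnection()
    for i in range(total_ops):
        if i % ops_per_bind == 0:
            conn.bind()
        conn.search()
    return conn.binds, conn.searches

# 1000 searches with one bind for the whole run, vs. a bind every 10 searches:
assert run(1000, 1000) == (1, 1000)    # 1 bind total
assert run(1000, 10) == (100, 1000)    # 100 binds total
```

Two reports quoting "searches per second" under these two settings are not comparable at all, which is exactly why the parameter must be published.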
> The fact that certain vendors' servers are not available on certain machines
> or OS revisions complicates matters, because you can't get an
> apples-to-apples comparison then. In that case, whatever numbers you get are
> basically useless for product comparisons.
> The notion of benchmarking is surely a well-understood topic, it surprises me
> that no one working in this space has demonstrated any actual understanding
> of it thus far.
I think that's a bit harsh. But it may be a good start to work on some
benchmark clients, an LDIF generator, a query generator, and a default set
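The LDIF generator mentioned above could start as small as this sketch. The suffix, object class, and value pools are assumptions on my part; only the attribute layout follows standard inetOrgPerson LDIF:

```python
import random

def make_ldif(count, seed=7, suffix="dc=example,dc=com"):
    """Emit a deterministic block of inetOrgPerson entries in LDIF form."""
    rng = random.Random(seed)
    surnames = ["Alpha", "Bravo", "Charlie", "Delta"]  # illustrative pool
    lines = []
    for i in range(count):
        sn = rng.choice(surnames)
        lines += [
            "dn: uid=user%05d,ou=People,%s" % (i, suffix),
            "objectClass: inetOrgPerson",
            "uid: user%05d" % i,
            "sn: %s" % sn,
            "cn: User%05d %s" % (i, sn),
            "",  # blank line separates LDIF entries
        ]
    return "\n".join(lines)

# Same seed, same output: publishing (generator, seed, count) suffices.
assert make_ldif(10) == make_ldif(10)
```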