
Reply: RE: benchmarks & real world [Virus checked]

>The DirectoryMark tools are basically OK. Improving them is a simple matter of
>applying the scientific method: make the benchmark consistent and repeatable.

>Using a single data set for all tests would be a good step there. Since the
>data set could be quite large, it would be acceptable to continue to
>"randomly" generate it, as long as
>                 A) the pseudo-random number generation algorithm is fixed, and
>                 B) the input seed is fixed

>for all invocations of the test. (This is just a cheat to be able to
>distribute a constant dataset without having to actually store it in expanded
>LDIF form.)
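
(For completeness, a minimal Python sketch of what such a seeded generator might look like; the schema, attribute names, and entry count below are only placeholders, not DirectoryMark's actual format:)

import random

SEED = 42            # fixed seed: the same data set is produced on every invocation
NUM_ENTRIES = 1000   # placeholder size, not DirectoryMark's real default

def generate_entries(seed=SEED, count=NUM_ENTRIES):
    # Both the algorithm (Python's Mersenne Twister) and the seed are fixed,
    # so every run reproduces identical entries without ever having to
    # store the expanded LDIF.
    rng = random.Random(seed)
    for i in range(count):
        uid = "user%06d" % i
        cn = "Test User %d" % rng.randint(0, 999999)
        yield ("dn: uid=%s,ou=people,dc=example,dc=com\n"
               "objectClass: inetOrgPerson\n"
               "uid: %s\n"
               "cn: %s\n"
               "sn: %s\n" % (uid, uid, cn, cn.split()[-1]))

if __name__ == "__main__":
    for entry in generate_entries(count=3):
        print(entry)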

Random data is perfectly OK, unless one sees significant differences between runs. I would be very surprised to see more than a couple of percent difference from one run to the next.

In fact, fixing the data set may not only be overkill, but may in theory even deliver biased results, and thus be less useful than those obtained with random data.

Before attempting to fix the data set, I would strongly suggest running the test as it is several times and comparing the results. And yes, this is a perfectly legitimate scientific method. (Here is a link for folks interested in reviewing the basics of measurement error estimation: http://www.ae.gatech.edu/classes/ae3051/craig-errors.pdf)
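
For example, something as simple as the following (the throughput figures are made up, just to show the arithmetic) already gives a usable estimate of the run-to-run spread:

import statistics

# Hypothetical throughput figures (ops/sec) from several independent runs
# of the same benchmark -- substitute real DirectoryMark output here.
runs = [1520.0, 1498.0, 1534.0, 1511.0, 1505.0]

mean = statistics.mean(runs)
stdev = statistics.stdev(runs)            # sample standard deviation
rel_spread = 100.0 * stdev / mean         # run-to-run variation in percent
std_err = stdev / len(runs) ** 0.5        # standard error of the mean

print("mean      : %.1f ops/sec" % mean)
print("std dev   : %.1f ops/sec (%.2f%% of mean)" % (stdev, rel_spread))
print("std error : %.1f ops/sec" % std_err)

If the relative spread stays within a couple of percent, the random data set is clearly not the limiting factor in the measurement.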

regards
        Denis
--
T-Mobile Austria GmbH,
Information Technologies / Services
Knowledge Management & Process Automation

Dr. Denis Havlik                    eMail: denis.havlik@t-mobile.at
Rennweg 97-99, BT2E0304031          Phone: +43-1-79-585/6237
A-1030 Vienna                       Fax:   +43-1-79-585/6584