Re: (ITS#7667) performance degradation when using MDB_INTEGERKEY

romange@gmail.com wrote:
> Hi, I extracted a small dataset that shows the problem.
> you can download it from here:
> https://docs.google.com/file/d/0B6o29pwkWoERdnFSaUtMNDljemc/edit?usp=sharing
> I modified mdb_copy.c to demonstrate the difference. copy it to source dir
> from here
> https://docs.google.com/file/d/0B6o29pwkWoERd3VuUm1DN0FpcUU/edit?usp=sharing
> build and
> run "time ./mdb_copy foo foo2"
> after this change the flag at line 64 and run it again.
> at my computer the difference is 17s vs 1.7s for 3 million items.

This test doesn't prove the existence of a bug. You're running on a
Little-Endian machine, so data that is in sorted order as a string is
effectively in hashed order when the same bytes are compared as integers.
Your inserts therefore arrive in a worst-case order, causing the worst
possible random access strides through memory. Assuming the two orders are
equivalent is a pretty common mistake for DB programmers. Microsoft has done
the same thing in Active Directory; I mentioned it here a few years ago.

If you had run this test on a Big-Endian machine, like SPARC, the insert order
would be identical either way, and the INTEGERKEY result would have been faster.
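
To illustrate the ordering point, here is a minimal sketch in C. It is not
taken from mdb_copy.c or from the LMDB sources. The four sample keys and the
helper names (decode_le, decode_be, count_out_of_order) are made up for this
example. The keys are listed in byte-string (memcmp) order, and the helper
counts how many adjacent pairs are out of order once the same bytes are read
as 32-bit integers in each byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical 4-byte keys, already sorted as byte strings (memcmp order). */
static const unsigned char sample_keys[4][4] = {
    {0x00, 0x00, 0x00, 0x02},
    {0x00, 0x00, 0x01, 0x01},
    {0x00, 0x00, 0x02, 0x00},
    {0x00, 0x01, 0x00, 0x00},
};

/* Read 4 bytes as a Little-Endian integer: the first byte is least significant. */
static uint32_t decode_le(const unsigned char *b)
{
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Read 4 bytes as a Big-Endian integer: the first byte is most significant. */
static uint32_t decode_be(const unsigned char *b)
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* Count adjacent pairs whose integer values are in descending order. */
static int count_out_of_order(const unsigned char keys[][4], int n,
                              uint32_t (*decode)(const unsigned char *))
{
    int i, bad = 0;

    assert(keys != NULL && decode != NULL);
    for (i = 1; i < n; i++)
        if (decode(keys[i - 1]) > decode(keys[i]))
            bad++;
    return bad;
}

/* Print the comparison for the sample keys; call this from main(). */
static void report_sample_order(void)
{
    printf("out of order as Big-Endian integers:    %d\n",
           count_out_of_order(sample_keys, 4, decode_be));
    printf("out of order as Little-Endian integers: %d\n",
           count_out_of_order(sample_keys, 4, decode_le));
}
```

For these sample keys the Big-Endian reading keeps every pair in order, while
the Little-Endian reading puts every adjacent pair out of order, so an insert
stream that is sequential as strings is not sequential as native integers on
a Little-Endian host.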

Closing this ITS, no bug.
