[Date Prev][Date Next] [Chronological] [Thread] [Top]

back-mdb optimization



back-mdb was first feature-complete a week ago. Between then and now I've spent a bit of time profiling its behavior and eliminating hot spots. For this work I used a test database of 250,000 synthetic entries, running under valgrind's callgrind tool and my FunctionCheck profiler. (Occasionally the two tools would disagree on where hot spots were, so I wound up targeting both.) The callgrind output is available on http://highlandsun.com/hyc/mdb_search/ for reference.

With the initial back-mdb code, which was basically back-bdb with all of its caching logic removed, an ldapsearch that scanned the entire DB ran in 8,875,046,744 instructions. The biggest hot spot was memnrcmp(), which is the libmdb function for comparing two strings in reverse byte order. The top 10 functions were:

1,828,659,111	memnrcmp
1,151,097,812	strncasecmp
  911,117,166	mdb_search_node
  628,388,775	avl_find
  612,549,560	ad_keystring
  600,502,442	entry_decode
  494,045,070	slap_bv2ad
  376,177,636	mdb_search_page
  285,882,273	attr_index_name_cmp
  199,751,242	mdb_cursor_set

(That basically corresponded to commit e5b1dce6a7904e0eb31029959865730fc813ce57)

-=-=-=-

The next step was to eliminate slap_bv2ad() from the entry_decode() path, using numeric IDs for attributeDescriptions in the database. Rewriting slapd entry_decode as mdb_entry_decode() with this change brought the total search execution down to 5,410,551,759 instructions. The top 10 functions were:

1,823,974,563	memnrcmp
  796,099,511	mdb_search_node
  528,502,136	mdb_entry_decode
  376,251,165	mdb_search_page
  199,751,329	mdb_cursor_set
  167,097,364	strncasecmp
  151,524,069	attr_clean
  143,741,422	mdb_get_page
   87,494,764	cursor_push_page
   83,500,204	is_ad_subtype
(commit 1e32fcf099ba8c15333365fe68aefa5217ae3d8c)

-=-=-=-

Next was to eliminate some redundant navigation of the dn2id index. It was doing essentially the same traversal twice on each search candidate - once to determine if the candidate belonged to the search scope, and once to assemble the entryDN. With this change the total search came down to 3,767,565,531 instructions. The top functions were:

1,018,812,205	memnrcmp
  528,501,424	mdb_entry_decode
  463,442,142	mdb_search_node
  212,601,386	mdb_search_page
  167,097,240	strncasecmp
  151,523,868	attr_clean
   93,001,026	mdb_cursor_set
   83,500,204	is_ad_subtype
   80,889,277	avl_find
   80,496,022	mdb_get_page
(commit 6c8e4f2671b6aed41cd5098725048dbe2513612c)

-=-=-=-

The next step was a minor libmdb cleanup, restructuring it so that key/data pairs were always guaranteed to start on a 2-byte aligned address. (While x86 didn't seem to care, CPUs like SPARC would SIGBUS otherwise.) This restructuring brought execution down to 3,441,377,693 instructions - making code more portable is always a good thing, even if the impact is minor. The top functions were

1,018,873,702	memnrcmp
  537,251,494	mdb_entry_decode
  463,465,251	mdb_search_node
  212,597,818	mdb_search_page
  151,523,868	attr_clean
   93,001,026	mdb_cursor_set
   83,500,204	is_ad_subtype
   80,496,022	mdb_get_page
   63,750,282	attrs_alloc
   62,500,669	mdb_search
(commit 293df78b2be77d6d153fd7052cc62d3377dc5501)

-=-=-=-

Next, finally start doing something about memnrcmp. First is simply writing a more integer-oriented function cintcmp, which operates on unaligned integers a byte at a time. This had only a small impact as well, getting us down only a bit to 3,342,205,373 instructions.

  919,700,412	cintcmp
(commit f9c8796d0b3ed9bc0f51c76bb28609121b1e2eec)
The rest of the trace profile is basically identical to the previous one.

-=-=-=-

Next was a bit of libmdb code cleanup and restructuring. The performance change was minimal, bringing total execution down to 3,310,510,255 instructions. The trace profile is mostly the same as the previous.
commit dac3fae3b540841ae753bea16f3b353e2124c43d)

-=-=-=-

Next was a further speedup of cintcmp, changing it to operate on unsigned shorts instead of just chars, now that we had guaranteed 2-byte alignment. This brought execution down to 2,832,817,987 instructions, and shook up the profile a little bit:

  537,251,494	mdb_entry_decode
  475,922,450	cintcmp
  457,377,978	mdb_search_node
  176,828,204	mdb_search_page_root
  151,523,868	attr_clean
   83,500,204	is_ad_subtype
(commit 1b69295a48cca409ed0c2f3fe655325e00f55ce2)

-=-=-=-

Next was a further rewrite of mdb_entry_decode, using tmpmem allocs instead of the slapd central entry_alloc/attrs_alloc functions. This brought execution down to 2,483,077,294 instructions. The profile is much like the previous, but attr_clean and its associated functions disappear. The top functions were:

  535,751,482	mdb_entry_decode
  475,922,450	cintcmp
  457,377,978	mdb_search_node
  176,828,204	mdb_search_page_root
   83,500,204	is_ad_subtype
(commit f72d65b77aa6cd4439ee0ad80b498f4ed707a278)

-=-=-=-

Next was another tweak for mdb_search(), keeping the cursor on the id2entry database for the duration of the search. This eliminated a lot of mdb_search_page overhead since usually the cursor was already on the right page when the next entry was being fetched. This change brought execution down to 2,241,832,009 instructions. The top functions were:

  535,751,482	mdb_entry_decode
  391,064,166	cintcmp
  350,350,674	mdb_search_node
  139,905,148	mdb_search_page_root
   93,256,473	mdb_cursor_set
   83,500,204	is_ad_subtype
(commit 54ced52c047425b432075946dd2997c52f020de0)
-=-=-=-

Finally (as of this morning) a bit of cleanup and restructuring in libmdb, to eliminate a bunch of cruft in the previous data structure layout. This change was more for esthetic reasons than for performance, but it still offered a small gain, with execution at 2,232,560,312 instructions. The top functions:

  535,751,482	mdb_entry_decode
  391,064,166	cintcmp
  342,681,498	mdb_search_node
  129,900,173	mdb_search_page_root
   90,006,339	mdb_cursor_set
   83,500,204	is_ad_subtype
(commit 25529a4c36903d0456b1251712de32f665850029)

-=-=-=-

At the outset libmdb had its clumsy parts. Now the libmdb/mdb.o text+data is only 31255 bytes - it's tight and very efficient. The entire DB engine can execute entirely within a CPU's L1 instruction cache, with room to spare.

--
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/