
(ITS#3851) Berkeley DB Scalability Patch



Full_Name: Jong-Hyuk Choi
Version: HEAD
OS: Linux
URL: ftp://ftp.openldap.org/incoming/bdb-scalability-patch-jongchoi-050708.diff
Submission from: (NULL) (129.34.20.23)


This ITS issue provides a scalability patch for Berkeley DB which contains a
buddy memory allocator for the Berkeley DB memory pool.

While investigating the scalability of OpenLDAP and developing
scalability-enhancing techniques such as index clustering (ITS#3611), I found
that those techniques become almost ineffective as soon as the working set
grows larger than the DB cache size.

Profiling showed that Berkeley DB's shared memory allocator, __db_shalloc(),
consumes most of the CPU cycles while searching for a free memory chunk that
can accommodate the requested size. __db_shalloc() uses a simple first-fit
linear search, which causes serious performance degradation once there is a
high degree of external fragmentation.
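
To illustrate why a first-fit linear search degrades under fragmentation, here
is a minimal sketch. It is not the __db_shalloc() source; the struct chunk
layout and the first_fit() name are hypothetical and chosen only for this
example. The scan has to step over every free chunk that is too small before
it reaches one that fits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical free-chunk header; NOT Berkeley DB's actual layout. */
struct chunk {
    size_t size;          /* usable bytes in this free chunk */
    struct chunk *next;   /* next chunk on the free list */
};

/*
 * First-fit: walk the free list from the head and unlink the first chunk
 * that is large enough. *steps reports how many chunks were examined.
 * Returns NULL if no chunk fits.
 */
static struct chunk *
first_fit(struct chunk **head, size_t want, size_t *steps)
{
    struct chunk **pp;

    assert(head != NULL && steps != NULL);
    *steps = 0;
    for (pp = head; *pp != NULL; pp = &(*pp)->next) {
        ++*steps;
        if ((*pp)->size >= want) {
            struct chunk *hit = *pp;
            *pp = hit->next;      /* unlink from the free list */
            hit->next = NULL;
            return hit;
        }
    }
    return NULL;
}
```

With a free list made of many small fragments followed by a single large
chunk, every large request examines all of the fragments first, so the cost
per allocation grows with the number of free chunks.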

The new memory allocator contained in this patch, __db_smalloc(), implements a
buddy memory allocator which provides O(1) behavior with regard to the number of
objects in a region while preserving the semantics of the original
__db_shalloc(), such as persistence of allocations, the ability to remap to
arbitrary virtual addresses, and proper support for alignment and region sizes.
The new memory allocator can coexist with the original one. In the current
patch, only the memory pool subsystem is set to use the new buddy allocator.
The rest, including the logging and locking subsystems, still use the original
linear allocator. They can also be configured to use the new one by setting the
corresponding flag in their region information data structure.
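
The general buddy technique can be sketched as follows. This is not the
__db_smalloc() code from the patch; every name here (struct buddy,
buddy_alloc(), buddy_free(), MIN_ORDER, MAX_ORDER) and the 32-byte/4096-byte
sizes are assumptions made for illustration. The sketch keeps offsets rather
than pointers in its bookkeeping, which is one common way to keep a region
usable after it is remapped at a different address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MIN_ORDER 5                       /* smallest block: 32 bytes (assumed) */
#define MAX_ORDER 12                      /* whole region: 4096 bytes (assumed) */
#define NBLOCKS   (1u << (MAX_ORDER - MIN_ORDER))
#define NIL       UINT32_MAX
#define FREE      0x80u

struct link { uint32_t prev, next; };     /* lives inside each free block */

struct buddy {
    unsigned char *base;                  /* where the region is mapped now */
    uint32_t head[MAX_ORDER + 1];         /* free-list head offset per order */
    unsigned char tag[NBLOCKS];           /* order (| FREE) of block starting here */
};

static struct link *lk(struct buddy *b, uint32_t off)
{
    return (struct link *)(b->base + off);
}

static void push(struct buddy *b, uint32_t off, unsigned order)
{
    struct link *l = lk(b, off);
    l->prev = NIL;
    l->next = b->head[order];
    if (l->next != NIL) lk(b, l->next)->prev = off;
    b->head[order] = off;
    b->tag[off >> MIN_ORDER] = (unsigned char)(order | FREE);
}

static void unlink_block(struct buddy *b, uint32_t off, unsigned order)
{
    struct link *l = lk(b, off);
    if (l->prev != NIL) lk(b, l->prev)->next = l->next;
    else b->head[order] = l->next;
    if (l->next != NIL) lk(b, l->next)->prev = l->prev;
}

/* mem must be at least 4-byte aligned and (1 << MAX_ORDER) bytes long. */
static void buddy_init(struct buddy *b, void *mem)
{
    b->base = mem;
    for (unsigned o = 0; o <= MAX_ORDER; o++) b->head[o] = NIL;
    memset(b->tag, 0, sizeof b->tag);
    push(b, 0, MAX_ORDER);                /* one free block spans the region */
}

/* Returns the offset of the allocated block, or NIL on failure. */
static uint32_t buddy_alloc(struct buddy *b, size_t size)
{
    unsigned order = MIN_ORDER, o;
    if (size > ((size_t)1 << MAX_ORDER)) return NIL;
    while (((size_t)1 << order) < size) order++;   /* round up to power of 2 */
    for (o = order; o <= MAX_ORDER && b->head[o] == NIL; o++)
        ;                                          /* at most one probe per order */
    if (o > MAX_ORDER) return NIL;
    uint32_t off = b->head[o];
    unlink_block(b, off, o);
    while (o > order) {                   /* split; upper half goes back as free */
        o--;
        push(b, off + (1u << o), o);
    }
    b->tag[off >> MIN_ORDER] = (unsigned char)order;
    return off;
}

/* off must be an offset previously returned by buddy_alloc(). */
static void buddy_free(struct buddy *b, uint32_t off)
{
    unsigned order = b->tag[off >> MIN_ORDER];
    while (order < MAX_ORDER) {
        uint32_t bud = off ^ (1u << order);        /* buddy differs in one bit */
        if (b->tag[bud >> MIN_ORDER] != (order | FREE)) break;
        unlink_block(b, bud, order);
        b->tag[off >> MIN_ORDER] = 0;
        b->tag[bud >> MIN_ORDER] = 0;
        if (bud < off) off = bud;                  /* merged block starts lower */
        order++;
    }
    push(b, off, order);
}
```

In this sketch both allocation and free touch at most one free list per order,
so their cost is bounded by MAX_ORDER - MIN_ORDER and does not depend on how
many objects are live in the region. The price is internal fragmentation,
since every request is rounded up to a power of two.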

The target of this patch is the Berkeley DB version 4.2.52 without encryption
support. Applicability to other versions of Berkeley DB is not tested.

The following graph shows the performance of directory population (slapadd).
The x-axis is the number of directory entries added and the y-axis is the time
taken to perform directory population, in seconds.

System Under Test
Server: IBM eServer x335 with 8 2.8GHz Xeons and 12GB of memory
Disk:   3 IBM ServeRaid Disk Arrays (RAID 5, 15K RPM Disks)
        database environment, log files, and ldif file each sit on a
        separate disk array
OS:     SLES9
DB Cache Size: 1.68GB

DIT: DirectoryMark generated; 1 organization, 10 organizationalUnit entries,
and the rest organizationalPerson
Indexing: equality for objectClass, equality and substring for cn

(Similar performance results could be obtained with smaller scale servers
because slapadd is single threaded and the SUT is a 32-bit platform.)

ftp://ftp.openldap.org/incoming/slapadd-perf-choi-050708.png (full scale)
ftp://ftp.openldap.org/incoming/slapadd-perf-zi-choi-050708.png (zoomed in)

The first bar in each group represents the performance of the current HEAD and
the second bar represents that of the current HEAD + Index Clustering Patch
(ITS#3611), both without the Berkeley DB scalability patch. The third and
fourth bars represent the same two configurations with the Berkeley DB
scalability patch applied.

First compare the first and the second bars in each group. As shown in the
graph, the index clustering becomes less effective beyond 1 million directory
entries. The directory population time increases linearly up to 1 million
entries but increases more than quadratically from that point on. While the
index clustering is able to achieve close to 70% speedup with 1 million entries,
speedup is 16% at most with 4 million entries.

With the Berkeley DB scalability patch, the directory population time grows
almost linearly and the index clustering patch remains effective up to 8 million
entries. The speedups achieved by the Berkeley DB scalability patch with 4
million entries are 6.49 and 9.12 with and without the index clustering,
respectively. The overall speedup achieved by applying both the Berkeley DB
scalability patch and the index clustering is 10.52.

I'm currently gathering more performance data with different configurations and
with larger ldif files. I will post further results as follow-ups to this ITS
issue.

- Jong-Hyuk

------------------------
Jong Hyuk Choi
IBM Thomas J. Watson Research Center - Enterprise Linux Group
P.O. Box 218, Yorktown Heights, NY 10598
email: jongchoi@us.ibm.com
(phone) 914-945-3979    (fax) 914-945-4425   TL: 862-3979