Issue 7566 - ldapadd slower on Linux than BSD
Summary: ldapadd slower on Linux than BSD
Status: UNCONFIRMED
Alias: None
Product: OpenLDAP
Classification: Unclassified
Component: slapd
Version: 2.4.31
Hardware: All  OS: All
Importance: --- normal
Target Milestone: ---
Assignee: OpenLDAP project
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-04-05 23:36 UTC by lm5@ualberta.ca
Modified: 2020-09-21 22:38 UTC

See Also:


Description lm5@ualberta.ca 2013-04-05 23:36:05 UTC
Full_Name: Leo Mocofan
Version: 2.4.31
OS: Debian Testing
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (129.128.11.107)


We are currently testing Linux, OpenBSD and FreeBSD to run our LDAP service. Our
tests showed that Linux has great performance numbers in most of the areas but
we found one test that is very slow.

The database has 1 million entries and is 4.5 GB in size. Importing the whole
database using slapadd takes minutes on all platforms. When we tried to import
the database using ldapadd we noticed a huge difference: FreeBSD - 42 min,
OpenBSD - 59 min, Linux - 863 min. Changing file systems for the backend on
Linux had no effect. Using a memory-based file system for the backend on Linux,
the time for ldapadd dropped to 3 min.

Strace for ldapadd shows:
write(1, "adding new entry \"uid=noname,o"..., 60) = 60
time(NULL)                              = 1365202241
getrusage(RUSAGE_SELF, {ru_utime={1, 684105}, ru_stime={0, 212013}, ...}) = 0
time(NULL)                              = 1365202241
times({tms_utime=168, tms_stime=21, tms_cutime=0, tms_cstime=0}) = 1719044914
getrusage(RUSAGE_SELF, {ru_utime={1, 684105}, ru_stime={0, 212013}, ...}) = 0
time(NULL)                              = 1365202241
times({tms_utime=168, tms_stime=21, tms_cutime=0, tms_cstime=0}) = 1719044914
write(4, "\27\27\27\27\17uJ,\324\324\324\324"..., 1237) = 1237
poll([{fd=4, events=POLLIN|POLLPRI}], 1, 100) = 0 (Timeout)
poll([{fd=4, events=POLLIN|POLLPRI}], 1, 100) = 0 (Timeout)
poll([{fd=4, events=POLLIN|POLLPRI}], 1, 100) = 0 (Timeout)
poll([{fd=4, events=POLLIN|POLLPRI}], 1, 100) = 1 ([{fd=4, revents=POLLIN}])
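
The client-side pattern in this trace, several 100 ms poll timeouts before the reply finally becomes readable, can be reproduced with a small sketch. The pipe below is a hypothetical stand-in for the LDAP connection (fd 4 in the trace); this is not OpenLDAP's actual I/O code:

```python
import os
import select
import threading

def timed_out_poll_rounds(reply_delay_s: float, timeout_ms: int = 100) -> int:
    """Count poll() rounds that return 0 (Timeout) before a delayed
    reply becomes readable, mimicking the ldapadd strace above."""
    r, w = os.pipe()  # hypothetical stand-in for the LDAP socket (fd 4)
    threading.Timer(reply_delay_s, os.write, args=(w, b"reply")).start()

    poller = select.poll()
    poller.register(r, select.POLLIN | select.POLLPRI)

    timeouts = 0
    while True:
        # poll([{fd=4, events=POLLIN|POLLPRI}], 1, 100) from the trace
        if poller.poll(timeout_ms):
            os.read(r, 5)       # the reply finally arrived (POLLIN)
            os.close(r)
            os.close(w)
            return timeouts
        timeouts += 1           # one more "= 0 (Timeout)" line
```

With reply_delay_s around 0.35 this typically reports three timed-out rounds, matching the three consecutive timeouts in the trace. The point is that the client is merely waiting; the real stall is on the server side.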

Strace for one thread of slapd:
write(12, "\27\27\27\27\17uJ,\324\324\324\324"..., 117) = 117
time([1365202241])                      = 1365202241
sendto(4, "<167>Apr  5 16:50:41 slapd[4513]"..., 79, MSG_NOSIGNAL, NULL, 0) = 79
futex(0x7f36f8fd6724, FUTEX_WAIT_PRIVATE, 9166, NULL) = 0
futex(0x7f36f8fd66f8, FUTEX_WAKE_PRIVATE, 1) = 0
time(NULL)                              = 1365202242
read(12, "", 5)                         = 0
write(6, "0", 1)                        = 1
futex(0x7f36f8fd6724, FUTEX_WAIT_PRIVATE, 9169, NULL

Strace for another thread of slapd:
futex(0x7f36f8fd66f8, FUTEX_WAKE_PRIVATE, 1) = 1
time(NULL)                              = 1365202241
epoll_wait(7, {{EPOLLIN, {u32=4177193844, u64=139874082155380}}}, 1024, 150000) = 1
read(5, "0", 8192)                      = 1
time(NULL)                              = 1365202241
epoll_wait(7, {{EPOLLIN, {u32=4177193872, u64=139874082155408}}}, 1024, 150000) = 1
epoll_ctl(7, EPOLL_CTL_MOD, 12, {0, {u32=4177193872, u64=139874082155408}}) = 0
futex(0x7f36f8fd6724, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f36f8fd6720,
{FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
time(NULL)                              = 1365202242
epoll_wait(7, {{EPOLLIN, {u32=4177193844, u64=139874082155380}}}, 1024, 150000) = 1
read(5, "0", 8192)                      = 1
time(NULL)                              = 1365202242
epoll_wait(7, {}, 1024, 150000)         = 0
time(NULL)                              = 1365202392
epoll_wait(7, 

ldapadd is stuck in the poll system call, while one thread of slapd is waiting in
epoll_wait(7, 
After a couple of seconds during which apparently nothing happens, everything
starts working again.
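
The server-side half of this pattern, one thread parked in epoll_wait with a long timeout (150000 ms in the trace) until another thread makes a descriptor readable, can be sketched as follows. The pipe and timing are hypothetical, and select.epoll is Linux-only:

```python
import os
import select
import threading

def epoll_wakeup() -> bytes:
    """One thread blocks in epoll_wait until another thread writes,
    mirroring slapd's listener thread in the traces above."""
    r, w = os.pipe()
    ep = select.epoll()
    ep.register(r, select.EPOLLIN)

    # Another thread wakes the waiter, like write(6, "0", 1) in the trace.
    threading.Timer(0.05, os.write, args=(w, b"0")).start()

    events = ep.poll(timeout=150.0)   # seconds here; strace shows 150000 ms
    fd, _mask = events[0]
    data = os.read(fd, 8192)          # mirrors read(5, "0", 8192) = 1
    ep.close()
    os.close(r)
    os.close(w)
    return data
```

Nothing in this pattern is slow by itself; the long gap in the trace (epoll_wait returning only at 1365202392, 150 s later) indicates that no writer woke the listener, i.e. the worker threads were blocked elsewhere.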

Changing the database backend from bdb to hdb or mdb doesn't seem to have any
effect. We then tried forcing SLAP_EVENT_WAIT to use select instead of
epoll_wait on Linux, and ldapadd is still much slower on Linux.

We are running Debian testing as of Apr 5, 2013, with glibc-2.13, linux-3.2.41,
libdb-5.1.29, openldap-2.4.31.
Comment 1 Quanah Gibson-Mount 2013-04-05 23:45:30 UTC
--On Friday, April 05, 2013 11:36 PM +0000 lm5@ualberta.ca wrote:

> Full_Name: Leo Mocofan
> Version: 2.4.31
> OS: Debian Testing
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (129.128.11.107)

If you are going to test back-mdb, then you need to use a current OpenLDAP 
release (2.4.35).

In any case, I don't see a bug report here?  It sounds more like you're 
asking for tuning advice?  I would ask what filesystem you are using (ext2? 
ext4?) etc.  For ext4, I had to modify dirty_bytes to keep the 
Debian/Ubuntu flush process from killing load performance.

--Quanah


--

Quanah Gibson-Mount
Sr. Member of Technical Staff
Zimbra, Inc
A Division of VMware, Inc.
--------------------
Zimbra ::  the leader in open source messaging and collaboration

Comment 2 Quanah Gibson-Mount 2013-04-05 23:48:36 UTC
Oh, one other note -- I advise avoiding the debian/ubuntu build versions of 
BDB, if that is what you are using.  They do not build BDB with the correct 
configure options, seriously impacting performance.

--Quanah


Comment 3 lm5@ualberta.ca 2013-04-06 02:52:51 UTC
Hi Quanah,

I have started doing tests with Gentoo Linux running OpenLDAP 2.4.35, glibc
2.16 and Linux kernels 3.4, 3.8 and 3.9-rc5. I have tried ext2, ext4,
reiserfs and jfs. ldapadd is still very slow compared to BSD regardless of
the OpenLDAP backend.

Also, a correction to the initial report: using a memory-based file system,
ldapadd of the whole database took 22 min.

Leo


Comment 4 Quanah Gibson-Mount 2013-04-06 03:14:00 UTC
So what is happening on the slapd server while things are paused?

Also, if you use an ldapi:/// socket rather than a tcp/ip socket, do you 
see the same issue?

--Quanah


Comment 5 lm5@ualberta.ca 2013-04-06 19:07:58 UTC
> So what is happening on the slapd server while things are paused?
>
> Also, if you use an ldapi:/// socket rather than a tcp/ip socket, do you
> see the same issue?
>

I have changed to use the ldapi:/// socket and there is no improvement.

It seems that one thread of slapd is repeatedly waiting on epoll_wait.
Strace prints
epoll_wait(6,
and then pauses for a couple of seconds.

iotop shows two slapd threads writing to disk. Right after I start ldapadd,
they write to disk at approximately 60 MB/s each. Over time, they go down to
2-3 MB/s each.

TID    PRIO    USER    DISK READ    DISK WRITE    SWAPIN    IO    COMMAND
15264 be/4 ldap        0.00 B/s    2.81 M/s  0.00 % 99.99 % slapd -u ldap
-g ldap -f /etc/openldap/slapd.conf -h
ldapi://%2fvar%2frun%2fopenldap%2fslapd.sock
15255 be/4 ldap        0.00 B/s    3.00 M/s  0.00 % 19.48 % slapd -u ldap
-g ldap -f /etc/openldap/slapd.conf -h
ldapi://%2fvar%2frun%2fopenldap%2fslapd.sock

I took an strace of all the processes and threads involved over a 2-minute
period.

Process slapd - main thread:
futex(0x7f2efa38e9d0, FUTEX_WAIT, 15251, NULL

Process slapd - thread:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 57.08    1.639877         474      3457       268 futex
 42.54    1.222208        1076      1136           fdatasync
  0.17    0.004913           0    272650           writev
  0.11    0.003194           3      1137           pwrite
  0.06    0.001736           0    272649           lseek
  0.02    0.000530           0      4544      1136 read
  0.01    0.000282           0      2273           write
  0.00    0.000132           0      2273           sendto
  0.00    0.000065           0      1136           open
  0.00    0.000061           0      1136           stat
  0.00    0.000041           0      1136           epoll_ctl
  0.00    0.000011           0      1136           close
  0.00    0.000000           0       116           mprotect
  0.00    0.000000           0      2272           fcntl
  0.00    0.000000           0      1136           getuid
  0.00    0.000000           0      1136           getppid
  0.00    0.000000           0      1136           gettid
------ ----------- ----------- --------- --------- ----------------
100.00    2.873050                570459      1404 total

Process slapd - thread:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 50.67    1.128008         399      2828       213 futex
 48.96    1.089974        1161       939           fdatasync
  0.15    0.003311           0    226607           writev
  0.13    0.002928           3       939           pwrite
  0.07    0.001544           0    226607           lseek
  0.01    0.000214           0      1879           write
  0.01    0.000139           0      1879           sendto
  0.00    0.000065           0       940           open
  0.00    0.000046           0      3760       940 read
  0.00    0.000029           0        94           mprotect
  0.00    0.000021           0       940           close
  0.00    0.000021           0       940           getppid
  0.00    0.000019           0       940           getuid
  0.00    0.000015           0       940           epoll_ctl
  0.00    0.000014           0       940           stat
  0.00    0.000000           0      1880           fcntl
  0.00    0.000000           0       940           gettid
------ ----------- ----------- --------- --------- ----------------
100.00    2.226348                473992      1153 total

Process slapd - thread:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
  -nan    0.000000           0      1386           read
  -nan    0.000000           0     13861           sendto
  -nan    0.000000           0      3866       294 futex
  -nan    0.000000           0      2772           epoll_wait
  -nan    0.000000           0      1386           epoll_ctl
------ ----------- ----------- --------- --------- ----------------
100.00    0.000000                 23271       294 total

Process ldapadd:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.19    0.056533          18      3194           poll
  0.48    0.000276           0      4866           read
  0.32    0.000183           0      6846           write
  0.00    0.000000           0         1           restart_syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.056992                 14907           total
Comment 6 Quanah Gibson-Mount 2013-04-08 03:19:32 UTC

I noticed something similar on Ubuntu 10 as well.  I upgraded my server to 
Ubuntu 12, and I also set /proc/sys/vm/dirty_ratio to 90 and 
/proc/sys/vm/dirty_expire_centisecs to 60000, and that helped significantly 
with back-mdb using current OpenLDAP master.  An ldapadd of almost 3.5 
million users now takes approximately 4 hours for my dataset vs. 16, so it 
is 4x faster.
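
These two sysctls control Linux's dirty-page writeback thresholds: vm.dirty_ratio is the percentage of RAM beyond which a writing process is itself throttled into synchronous writeback, which matches the stalls observed earlier, and vm.dirty_expire_centisecs is how long dirty pages may sit before forced flushing. A rough sketch of the arithmetic (the helper name and RAM size are illustrative, not from this thread):

```python
def dirty_limit_bytes(total_ram_bytes: int, dirty_ratio_percent: int) -> int:
    """Approximate dirty-page ceiling implied by vm.dirty_ratio: once
    this many bytes of page cache are dirty, the kernel throttles
    writers and forces writeback."""
    return total_ram_bytes * dirty_ratio_percent // 100

# The settings described above, as they would be applied (root required):
#   sysctl -w vm.dirty_ratio=90
#   sysctl -w vm.dirty_expire_centisecs=60000   # 600 s before pages expire

GiB = 2 ** 30
# On an 8 GiB machine, dirty_ratio=90 allows roughly 7.2 GiB of dirty
# pages before throttling, versus roughly 1.6 GiB at a common default
# of 20, so a bulk ldapadd hits the synchronous-writeback wall far later.
```

Raising these limits trades durability (more unflushed data at crash time) for bulk-load throughput, which is why it helps an ldapadd import but is not a general-purpose setting.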

--Quanah


Comment 7 lm5@ualberta.ca 2013-04-09 02:31:11 UTC
>
> I noticed something similar on ubuntu10 as well.  I upgraded my server to
> ubuntu12, and I also set /proc/sys/vm/dirty_ratio to 90 and
> /proc/sys/vm/dirty_expire_centisecs to 60000, and that helped significantly
> with back-mdb using current OpenLDAP master.  An ldapadd of almost 3.5
> million users takes approximately 4 hours now for my dataset vs 16.  so 4x
> faster.
>

Thank you for your suggestions. I tried these two settings and ldapadd
wrote 3 GB in 8 hours (out of 4.3 GB). How big is your database?

Have you considered running the database from a ramdisk? Using ramfs on
Linux, I got the fastest times on all the tests I tried.

Leo Mocofan
Comment 8 Quanah Gibson-Mount 2013-04-09 02:44:46 UTC
--On Monday, April 08, 2013 8:31 PM -0600 Leo Mocofan <lm5@ualberta.ca> 
wrote:

> Thank you for your suggestions. I tried these two settings and ldapadd
> wrote 3 GB in 8 hours (out of 4.3 GB). How big is your database ?

My DB is 5.6GB in size after load is complete.

--Quanah
