
slapadd, 1 million entries, some numbers



Being one of the folks who said that openldap was slow, I decided
to run some tests.

Yesterday I tried to add about 1 million entries via slapadd. Results
are below. The machine is a P4 2.4GHz, 1GB RAM, cheap 20G UDMA IDE disk,
P4PE motherboard.

Configuration:
openldap-2.1.19
db-4.1.25
linux-kernel-2.4.21-pre7-ac1
no indexing, schemacheck off, dbnosync set, DB_TXN_NOSYNC set

# time slapadd -l saida.ldif
real    233m3.877s
user    7m51.880s
sys     1m11.710s

I noticed via top (and the time output above agrees: about 9 minutes of
CPU time in almost 4 hours of wall clock) that slapadd spends a considerable
amount of time in the D state, that is, uninterruptible sleep, usually waiting
for disk I/O. I suppose this is due to the heavy logging the BDB backend does.
I got about 2.7G worth of log files.

I'm about to repeat this test on another machine with two SCSI disks, 2 HT
CPUs and openldap-2.1.20, with the log dir on the second disk.

# grep dn: saida.ldif |wc -l
1081802
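
To put the time output and the entry count above into numbers, here is a
small Python sketch. It was not part of the test run, it only uses the
figures already shown, and the names in it are made up for illustration.

```python
# Figures copied from the `time slapadd` output and the grep/wc count above.
def to_seconds(minutes, seconds):
    """Convert the XmY.YYYs format printed by `time` into seconds."""
    return minutes * 60 + seconds

real = to_seconds(233, 3.877)      # wall clock
user = to_seconds(7, 51.880)       # CPU time in user space
sys_time = to_seconds(1, 11.710)   # CPU time in the kernel
entries = 1081802                  # grep dn: saida.ldif | wc -l

cpu_fraction = (user + sys_time) / real   # share of wall clock spent on CPU
rate = entries / real                     # average entries added per second

print(f"wall clock: {real:.0f} s")
print(f"CPU share:  {cpu_fraction:.1%}")
print(f"throughput: {rate:.1f} entries/s")
```

That works out to roughly 77 entries per second, with only about 4% of the
wall clock spent on the CPU, which fits a process that is mostly waiting
on the disk.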

Relevant parts of slapd.conf:
schemacheck off
database        bdb
dbnosync
checkpoint 100000 10

DB_CONFIG:
set_flags       DB_TXN_NOSYNC
#set_lg_dir     /storage/ldap
set_lg_max      104857600

(I will repeat the test with the log dir set to another disk)
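
A rough Python sketch of what the log settings above imply: how many log
files of set_lg_max bytes make up "about 2.7G", and how much log that is per
entry. The 2.7G figure is approximate, and whether it means 10^9 or 2^30
bytes is an assumption, so both readings are computed. The function name is
made up for illustration.

```python
LG_MAX = 104857600   # set_lg_max from DB_CONFIG above (100 * 2**20 bytes)
ENTRIES = 1081802    # from grep dn: saida.ldif | wc -l

def log_stats(total_bytes):
    """Return (number of full-size log files, log bytes per entry)."""
    return total_bytes / LG_MAX, total_bytes / ENTRIES

# "about 2.7G" is approximate and the unit is an assumption: try both readings.
for label, total in (("2.7e9 bytes", 2.7e9), ("2.7 * 2**30 bytes", 2.7 * 2**30)):
    files, per_entry = log_stats(total)
    print(f"{label}: ~{files:.0f} log files, ~{per_entry:.0f} bytes of log per entry")
```

Either way that is somewhere between 25 and 28 log files, and roughly 2.5K
to 2.7K of log written per entry added.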

Meanwhile, I'm running slapindex on that first machine, and it is also spending
a lot of time in the D state, as I imagine is expected. I will repeat all this on
a SCSI machine as well.

Meanwhile (again :), does anybody see any obvious mistake besides the ones I
already mentioned (IDE disk, log on the same disk as the database)? Is cache
relevant for bulk loading data?

A sample of the ldif file, 900 fictitious branches with 1200 test subjects each:
# head -50 saida.ldif 
dn: o=Company
o: SP
objectClass: top
objectClass: organization

dn: ou=Branches, o=Company
ou: Escolas
objectClass: top
objectClass: organizationalUnit

dn: ou=Branch-1, ou=Branches, o=Company
ou: Branch-1
objectClass: top
objectClass: organizationalUnit

dn: ou=People, ou=Branch-1, ou=Branches, o=Company
ou: People
objectClass: top
objectClass: organizationalUnit

dn: uid=Emp-1, ou=People, ou=Branch-1, ou=Branches, o=Company
uid: Emp-1
cn: Emp-1-cn
givenName: Emp-1-gn
sn: Emp-1-sn
mail: Emp-1@Branch-1.company.com
uidNumber: 1001
gidNumber: 1001
homeDirectory: /home/Emp-1
objectClass: person
objectClass: organizationalPerson
objectClass: inetOrgPerson
objectClass: top
objectClass: posixAccount
(...)
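
In case somebody wants to build a file of the same shape, here is a Python
sketch of a generator. It is not the script used to create saida.ldif; it is
reconstructed from the sample above. The layout (2 top entries, then for each
branch one ou=Branch-n, one ou=People and 1200 people) is inferred, but it
does add up: 2 + 2*900 + 900*1200 = 1081802, the count grep reported. How
Emp-n, uidNumber and gidNumber continue past the first person is a guess, and
the o: and ou: values of the two top entries are set to match their DNs,
unlike the sample.

```python
import sys

def entry(dn, attrs):
    """Format one LDIF entry; attrs is a list of (name, value) pairs."""
    lines = [f"dn: {dn}"] + [f"{name}: {value}" for name, value in attrs]
    return "\n".join(lines) + "\n"

def expected_entries(branches, people):
    """2 top entries, 2 OUs per branch, `people` persons per branch."""
    return 2 + 2 * branches + branches * people

def generate(branches=900, people=1200):
    """Yield LDIF entries shaped like the sample (layout is inferred)."""
    ou = [("objectClass", "top"), ("objectClass", "organizationalUnit")]
    yield entry("o=Company", [("o", "Company"), ("objectClass", "top"),
                              ("objectClass", "organization")])
    yield entry("ou=Branches, o=Company", [("ou", "Branches")] + ou)
    number = 1001  # assumption: uidNumber/gidNumber count up across branches
    for b in range(1, branches + 1):
        branch_dn = f"ou=Branch-{b}, ou=Branches, o=Company"
        people_dn = f"ou=People, {branch_dn}"
        yield entry(branch_dn, [("ou", f"Branch-{b}")] + ou)
        yield entry(people_dn, [("ou", "People")] + ou)
        for p in range(1, people + 1):
            uid = f"Emp-{p}"  # assumption: numbering restarts in each branch
            yield entry(f"uid={uid}, {people_dn}", [
                ("uid", uid),
                ("cn", f"{uid}-cn"),
                ("givenName", f"{uid}-gn"),
                ("sn", f"{uid}-sn"),
                ("mail", f"{uid}@Branch-{b}.company.com"),
                ("uidNumber", number),
                ("gidNumber", number),
                ("homeDirectory", f"/home/{uid}"),
                ("objectClass", "person"),
                ("objectClass", "organizationalPerson"),
                ("objectClass", "inetOrgPerson"),
                ("objectClass", "top"),
                ("objectClass", "posixAccount"),
            ])
            number += 1

# Print a tiny sample (1 branch, 1 person); entries are separated by a blank line.
sys.stdout.write("\n".join(generate(branches=1, people=1)))
```

Calling generate() with its defaults and writing the result to a file the
same way gives the full 1081802 entries.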