OpenLDAP Issue Tracking System - Archive.Incoming/3564

Date: Wed, 23 Feb 2005 14:15:04 GMT
From: daniel.armbrust@mayo.edu
To: openldap-its@OpenLDAP.org
Subject: Programatic Insert Scaleability Problem
Full_Name: Dan Armbrust
Version: 2.2.23
OS: Fedora Core 3
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (129.176.151.126)


With significant indexing enabled, OpenLDAP seems to be incapable of scaling
up to even moderate-sized databases when you are loading a running server
(rather than doing an offline bulk insert).

If I start with a blank, clean database - here is the behavior I see.

The first ~200,000 entries or so (times about 5 attributes per entry) take about
2 minutes per 10,000 entries.

Then it starts getting slower.  By the time I get to entry 300,000 - it is
taking ~45 minutes per 10,000 entries.

Then it starts jumping around - sometimes taking half an hour per 10,000,
sometimes 3 hours per 10,000.

Finally, usually somewhere around 600,000 concepts, it just kicks the client out
with some odd error message - it changes from run to run.  The most recent
example was:

javax.naming.NamingException: [LDAP: error code 80 - entry store failed]

After this happened - I could no longer connect to the ldap server with any
client - the server actually crashed when I tried to connect to it.  But then
when I restart the server, it seems to run fine, at least for browsing.  I
currently don't have the ability to resume my inserts - so I haven't been able
to tell if the insert speed would be good again after a restart, or if it would
stay as slow as it was at the end.

The database size was only 3.6 GB when it failed - I have loaded 16 GB databases
with slapadd before.  Why does this work in slapadd - but not when it is done
with slapd?

Config info:
Openldap 2.2.23
BerkeleyDB 4.2.52(.2)

bdb backend
DB_CONFIG file:
set_flags       DB_TXN_NOSYNC
set_flags       DB_TXN_NOT_DURABLE
set_cachesize   1       0       1


Possibly useful bits from the conf file:

schemacheck     on
idletimeout     14400
threads         150
sizelimit       6000

database        bdb
suffix          "service=test,dc=LexGrid,dc=org"
directory       /localwork/ldap/database/dbdanTest
checkpoint      512     30

index           objectClass eq
index           conceptCode eq
index           language pres,eq
index           dc eq
index           sourceConcept,targetConcept,association,presentationId eq
index           text,entityDescription pres,eq,sub,subany


You can see the schema here if you desire:
http://informatics.mayo.edu/index.php?page=102


It appears that the performance tanked when the database grew to about twice the
size of the cache - and the whole thing crashed when it got to 3 times the size
of the cache.

What is the point of a database, if it is limited to only holding twice the
amount of what you can put into RAM?


Followup 1

Date: Thu, 03 Mar 2005 13:28:34 -0800
From: Howard Chu <hyc@symas.com>
To: daniel.armbrust@mayo.edu
CC: openldap-its@OpenLDAP.org
Subject: Re: (ITS#3564) Programatic Insert Scaleability Problem
daniel.armbrust@mayo.edu wrote:

>With significant indexing enabled, OpenLDAP seems to be incapable of scaling
>up to even moderate-sized databases when you are loading a running server
>(rather than doing an offline bulk insert).
>
Bulk loading of a running server is not specifically addressed in any 
current releases.

>If I start with a blank, clean database - here is the behavior I see.
>
>The first ~200,000 entries or so (times about 5 attributes per entry) take
>about 2 minutes per 10,000 entries.
>
>Then it starts getting slower.  By the time I get to entry 300,000 - it is
>taking ~45 minutes per 10,000 entries.
>
>Then it starts jumping around - sometimes taking half an hour per 10,000,
>sometimes 3 hours per 10,000.
>
>Finally, usually somewhere around 600,000 concepts, it just kicks the client
>out with some odd error message - it changes from run to run.  The most
>recent example was:
>
>javax.naming.NamingException: [LDAP: error code 80 - entry store failed]
>
>After this happened - I could no longer connect to the ldap server with any
>client - the server actually crashed when I tried to connect to it.  But then
>when I restart the server, it seems to run fine, at least for browsing.  I
>currently don't have the ability to resume my inserts - so I haven't been
>able to tell if the insert speed would be good again after a restart, or if
>it would stay as slow as it was at the end.
>
Of course the server should never crash... We need to see a backtrace 
from the crash to understand what has failed. You should probably also 
run with debug level 0; there may be diagnostic messages from the 
BerkeleyDB library as well that you're not seeing at the moment.

>The database size was only 3.6 GB when it failed - I have loaded 16 GB
>databases with slapadd before.  Why does this work in slapadd - but not when
>it is done with slapd?
>
At a guess, there may be a memory leak somewhere. But there's too little 
information about the crash here to tell.

>Config info:
>Openldap 2.2.23
>BerkeleyDB 4.2.52(.2)
>
>bdb backend
>DB_CONFIG file:
>set_flags       DB_TXN_NOSYNC
>set_flags       DB_TXN_NOT_DURABLE
>set_cachesize   1       0       1
>  
>
If that's all that you have in your DB_CONFIG, then you're definitely 
limiting yourself. Remember that you MUST store the transaction logs on 
a separate physical disk from your database files to get reasonable 
write performance. This is stated several times in the Sleepycat 
documentation.
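
For illustration only, a DB_CONFIG along the lines described above might look
like the sketch below, assuming transaction logging is left enabled.  The log
directory path is a placeholder, the cache size still has to be tuned to the
host, and set_lg_dir/set_lg_bsize are standard BerkeleyDB DB_CONFIG keywords
rather than anything taken from this report:

set_flags       DB_TXN_NOSYNC
set_lg_dir      /disk2/ldap/txnlogs
set_lg_bsize    2097152
set_cachesize   1       0       1

Here set_lg_dir points the transaction logs at a directory on a separate
physical disk from the .bdb files, and set_lg_bsize enlarges the in-memory
log buffer so fewer log writes hit the disk between checkpoints.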

>It appears that the performance tanked when the database grew to about
>twice the size of the cache
>
Most likely because your logs are not configured properly.

>- and the whole thing crashed when it got to 3 times the size
>of the cache.
>  
>
This is a problem that needs investigation.

>What is the point of a database, if it is limited to only holding twice the
>amount of what you can put into RAM?
>
We've built much larger databases fairly often. Once you exceed the size 
of the RAM cache, speeds are obviously limited by your disk throughput. 
Using more, faster disks is the only solution when you don't have any 
RAM left to throw at the situation. Keeping the transaction logs 
separate from the database disks will make a big difference in these cases.

-- 
  -- Howard Chu
  Chief Architect, Symas Corp.       Director, Highland Sun
  http://www.symas.com               http://highlandsun.com/hyc
  Symas: Premier OpenSource Development and Support



Followup 2

From: "Armbrust, Daniel C." <Armbrust.Daniel@mayo.edu>
To: 
Cc: openldap-its@openldap.org
Subject: RE: (ITS#3564) Programatic Insert Scaleability Problem
Date: Wed, 16 Mar 2005 12:12:52 -0600
>Bulk loading of a running server is not specifically addressed in any 
>current releases.

So how exactly is one supposed to put things into an ldap database if you can't
load a running server?
I certainly shouldn't have to manually convert my data to ldif to be able to
load it.


>We need to see a backtrace 
>from the crash to understand what has failed. You should probably also 
>run with debug level 0; there may be diagnostic messages from the 
>BerkeleyDB library as well that you're not seeing at the moment.

I'll try to get more logs, but trying to load this much stuff with a debug level
of 0 completely tanks the performance.  I don't know if I even have enough disk
space to hold the file it would generate given how verbose it is....

>bdb backend
>DB_CONFIG file:
>set_flags       DB_TXN_NOSYNC
>set_flags       DB_TXN_NOT_DURABLE
>set_cachesize   1       0       1

>If that's all that you have in your DB_CONFIG, then you're definitely 
>limiting yourself. Remember that you MUST store the transaction logs on 
>a separate physical disk from your database files to get reasonable 
>write performance.

If my understanding of the DB_CONFIG file is correct, the configuration I have
above should be disabling logging entirely.  Performance with logging turned on
is too pathetic to mention for this type of operation....

I'll post additional information relating to the crash to the other bug
(http://www.openldap.org/its/index.cgi?findid=3565 ) when I am able to generate
it.

But your first comment that I responded to above has significantly cooled our
interest in using an LDAP backend in our project
(http://informatics.mayo.edu/LexGrid/index.php), so it's not as high a priority
for me anymore.



Followup 3

Date: Wed, 16 Mar 2005 10:50:54 -0800
From: Howard Chu <hyc@symas.com>
To: Armbrust.Daniel@mayo.edu
CC: openldap-its@OpenLDAP.org
Subject: Re: (ITS#3564) Programatic Insert Scaleability Problem
Armbrust.Daniel@mayo.edu wrote:

>>Bulk loading of a running server is not specifically addressed in any
>>current releases.
>
>So how exactly is one supposed to put things into an ldap database if you
>can't load a running server?  I certainly shouldn't have to manually convert
>my data to ldif to be able to load it.
>
You're forgetting that an LDAP server is optimized for many reads and 
few writes. If your application profile doesn't fit this description, 
then you should probably use something else. Most installations that 
load large databases do so offline.

>>We need to see a backtrace from the crash to understand what has failed.
>>You should probably also run with debug level 0; there may be diagnostic
>>messages from the BerkeleyDB library as well that you're not seeing at the
>>moment.
>
>I'll try to get more logs, but trying to load this much stuff with a debug
>level of 0 completely tanks the performance.  I don't know if I even have
>enough disk space to hold the file it would generate given how verbose it
>is....
>
debug level 0 should only be logging serious errors. Something doesn't 
sound right here.

>>bdb backend
>>DB_CONFIG file:
>>set_flags       DB_TXN_NOSYNC
>>set_flags       DB_TXN_NOT_DURABLE
>>set_cachesize   1       0       1
>
>>If that's all that you have in your DB_CONFIG, then you're definitely
>>limiting yourself. Remember that you MUST store the transaction logs on
>>a separate physical disk from your database files to get reasonable
>>write performance.
>
>If my understanding of the DB_CONFIG file is correct, the configuration I
>have above should be disabling logging entirely.  Performance with logging
>turned on is too pathetic to mention for this type of operation....
>
Sorry, I was thinking of BDB 4.3 there. You're right.

>I'll post additional information relating to the crash to the other bug
>(http://www.openldap.org/its/index.cgi?findid=3565) when I am able to
>generate it.
>
>But your first comment that I responded to above has significantly cooled
>our interest in using an LDAP backend in our project
>(http://informatics.mayo.edu/LexGrid/index.php), so it's not as high a
>priority for me anymore.
>
-- 
  -- Howard Chu
  Chief Architect, Symas Corp.       Director, Highland Sun
  http://www.symas.com               http://highlandsun.com/hyc
  Symas: Premier OpenSource Development and Support



Followup 4

Date: Wed, 16 Mar 2005 10:58:40 -0800
To: Armbrust.Daniel@mayo.edu
From: "Kurt D. Zeilenga" <Kurt@OpenLDAP.org>
Subject: RE: (ITS#3564) Programatic Insert Scaleability Problem
Cc: openldap-its@OpenLDAP.org
At 10:13 AM 3/16/2005, Armbrust.Daniel@mayo.edu wrote:
>>Bulk loading of a running server is not specifically addressed in any 
>>current releases.
>
>So how exactly is one supposed to put things into an ldap database if you
>can't load a running server?  I certainly shouldn't have to manually convert
>my data to ldif to be able to load it.

In the OpenLDAP developer community, "bulk loading" implies use
of slapadd(1).  slapadd(1) is not intended to be used while the
server is running; more precisely, slapadd(1) assumes
and requires exclusive access to the underlying DB.

If you want to load a bulk of entries while the server is
running, you must use an LDAP client such as ldapadd(1).
While ldapadd(1) expects input to be LDIF, there certainly
are clients which expect input in other formats.

Kurt 



Followup 5

From: "Armbrust, Daniel C." <Armbrust.Daniel@mayo.edu>
To: 
Cc: openldap-its@openldap.org
Subject: RE: (ITS#3564) Programatic Insert Scaleability Problem
Date: Wed, 16 Mar 2005 13:00:44 -0600
>You're forgetting that an LDAP server is optimized for many reads and 
>few writes. If your application profile doesn't fit this description, 
>then you should probably use something else. Most installations that 
>load large databases do so offline.

There is a difference between being optimized and just not working.  I expect
to pay a performance penalty for using LDAP when I am doing a load of the
data.  Our applications are almost always reads once things are up and
going... However, I still have to be able to actually put the data into the
LDAP server at the beginning.  And since the data I have is not in the LDIF
format, I can't bulk load it.

I even went so far as to load my data into the old Netscape DS server (which
worked just fine) and then do an LDIF dump from it, to try to load into
OpenLDAP - but something was off with the resulting LDIF that OpenLDAP didn't
like... I haven't investigated that any further yet.

>debug level 0 should only be logging serious errors. Something doesn't 
>sound right here.

My mistake, I was thinking level 1.  I'll rerun with level 0.

Dan 



Followup 6

From: "Armbrust, Daniel C." <Armbrust.Daniel@mayo.edu>
To: 
Cc: openldap-its@openldap.org
Subject: RE: (ITS#3564) Programatic Insert Scaleability Problem
Date: Wed, 16 Mar 2005 13:07:56 -0600
I've probably confused the issue by using the term bulk loading when I shouldn't
have.

I'm simply trying to add a large amount of content to the server while it is
running, through a connection to the running server.  My program that actually
processes the data is written in Java, and uses Sun's standard API for
accessing an LDAP server.

If I had (or could easily generate) my data as LDIF, I would do so (and then
use slapadd as we have in the past), but it's not a simple or trivial task.
Plus, the programmatic API is already implemented and works (with small data
sets, and with large data sets on other LDAP implementations).
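
For reference, a minimal sketch of this kind of loader using Sun's JNDI
provider (the standard Java LDAP API mentioned above).  The host, bind DN,
password, object class and DN layout are invented for illustration and are not
the actual LexGrid code or schema; conceptCode and entityDescription are
simply attribute names that appear in the index configuration earlier in this
report.

import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingException;
import javax.naming.directory.BasicAttribute;
import javax.naming.directory.BasicAttributes;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

public class ConceptLoader {
    public static void main(String[] args) throws NamingException {
        Hashtable<String, String> env = new Hashtable<String, String>();
        env.put(Context.INITIAL_CONTEXT_FACTORY,
                "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, "ldap://localhost:389");
        env.put(Context.SECURITY_AUTHENTICATION, "simple");
        env.put(Context.SECURITY_PRINCIPAL,
                "cn=Manager,service=test,dc=LexGrid,dc=org");
        env.put(Context.SECURITY_CREDENTIALS, "secret");

        DirContext ctx = new InitialDirContext(env);
        try {
            // One add per entry over the live connection; a real loader
            // would loop over the source data instead of hard-coding one.
            BasicAttributes attrs = new BasicAttributes(true);
            BasicAttribute oc = new BasicAttribute("objectClass");
            oc.add("top");
            oc.add("codedEntry");   // hypothetical object class
            attrs.put(oc);
            attrs.put("conceptCode", "C0001");
            attrs.put("entityDescription", "example concept");
            ctx.createSubcontext(
                "conceptCode=C0001,service=test,dc=LexGrid,dc=org", attrs);
        } finally {
            ctx.close();
        }
    }
}

Each createSubcontext() call is a single LDAP add over the running server's
connection, which is exactly the one-entry-at-a-time traffic this thread is
about.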


Dan 



Followup 7

Subject: RE: (ITS#3564) Programatic Insert Scaleability Problem
Date: Thu, 17 Mar 2005 08:22:20 -0500
From: "Bill Kuker" <wckits@rit.edu>
To: <openldap-its@OpenLDAP.org>
I had some problems with OpenLDAP's update speed.  Digging through the
source, it turned out that it was locking every 'row' (I am using the bdb
backend) of a multi-valued attribute just to add one value.  The code is
full of things like this, and I am not sure if they are terrible
mistakes or if they are necessary for correct operation.  Either way, the
problem was that one of our admins had turned off caching for bdb
because the startup and shutdown times were too long (with a 2 GB
in-memory cache).  When I increased the cache to be larger than the index
file where the attribute lived (about 11 MB), my time dropped from
120 seconds to under 1 second.

I easily send a few 100k of changes every day now.  The moral here is
that even small changes can be sped up massively by correctly sizing
your cache, and a small (1 MB) change in cache size can mean a 10x change
in some cases.
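
As a made-up illustration of that moral: with a hot index file of roughly
11 MB, a DB_CONFIG cache set comfortably above it, say 32 MB, would be
written as follows (set_cachesize takes gigabytes, bytes and number of cache
segments; the figure is only an example, the right size depends on the files
actually being touched):

set_cachesize   0       33554432        1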


-Bill Kuker







Followup 8

Date: Mon, 04 Apr 2005 14:29:59 +0200
From: Pierangelo Masarati <ando@sys-net.it>
To: Armbrust.Daniel@mayo.edu
CC: openldap-its@OpenLDAP.org
Subject: Re: (ITS#3564) Programatic Insert Scaleability Problem
Armbrust.Daniel@mayo.edu wrote:

>I've probably confused the issue by using the term bulk loading when I
>shouldn't have.
>
>I'm simply trying to add a large amount of content to the server while it is
>running, through a connection to the running server.  My program that
>actually processes the data is written in Java, and uses Sun's standard API
>for accessing an LDAP server.
>
>If I had (or could easily generate) my data as LDIF, I would do so (and then
>use slapadd as we have in the past), but it's not a simple or trivial task.
>Plus, the programmatic API is already implemented and works (with small data
>sets, and with large data sets on other LDAP implementations).
>
Sorry for jumping in so late; you might be able to get what you need by
using something like "back-null" with a (trivial) custom overlay that
simply dumps things to a file in LDIF format; then you could use slapadd
as you did in the past.  I don't see a big advantage between sequentially
writing to a file and writing to back-bdb with logs off, but at least you'd
have some control over what your application generates, and then you'd be
able to use standard tools.  The overlay that dumps stuff in LDIF is
definitely trivial: it basically needs to streamline operations and call
entry2str() from within the "add" hook; the rest doesn't need to be
implemented, but you need to ensure that your application is not leaving
holes in the tree.

p.



    SysNet - via Dossi,8 27100 Pavia Tel: +390382573859 Fax: +390382476497

