
Re: ldapsearch hangs




I just saw what is going on...



The search returns something like this:


<-------------------------------------snip---------------------------------------->

# 039010, 046010.100.8000.100, 99893, bestMatchPrefixList, sipDirektor, ot.hr
dn: originatorPrefixID=039010,carrierPrefixID=046010.100.8000.100,bestMatchPre
 fix=99893,ou=bestMatchPrefixList,ou=sipDirektor,dc=ot,dc=hr
originatorPrefix: 039010
priority: 100
originator: 039010
originatorPrefixID: 039010
objectClass: top
objectClass: originatorPrefixID

# 385, bestMatchPrefixList, sipDirektor, ot.hr
dn: bestMatchPrefix=385,ou=bestMatchPrefixList,ou=sipDirektor,dc=ot,dc=hr
destination: Croatia
bestMatchPrefix: 385
objectClass: top
objectClass: bestMatchPrefix

# 006800.100.10000.100, 385, bestMatchPrefixList, sipDirektor, ot.hr
dn: carrierPrefixID=006800.100.10000.100,bestMatchPrefix=385,ou=bestMatchPrefi
 xList,ou=sipDirektor,dc=ot,dc=hr
qos: 100
priority: 10000
carrierPrefixID: 006800.100.10000.100
carrierPrefix: 006800
weight: 100
carrier: Optima Telekom
objectClass: top
objectClass: carrierPrefixID

# 000010, 006800.100.10000.100, 385, bestMatchPrefixList, sipDirektor, ot.hr
dn: originatorPrefixID=000010,carrierPrefixID=006800.100.10000.100,bestMatchPr
 efix=385,ou=bestMatchPrefixList,ou=sipDirektor,dc=ot,dc=hr
originatorPrefix: 000010
priority: 100
originator: T-COM/HT
originatorPrefixID: 000010
objectClass: top
objectClass: originatorPrefixID



It stops here for a while. Below are the remaining entries, which I added with ldapadd after I recreated the database from the LDIF file. Something is wrong with these entries: either they are not indexed, or something else is off. Just to mention, I have run the same search several times with the same results. It always stops here, and the entries I added with ldapadd are returned after a while, if ever.






# 043010.100.10000.100, 385, bestMatchPrefixList, sipDirektor, ot.hr
dn: carrierPrefixID=043010.100.10000.100,bestMatchPrefix=385,ou=bestMatchPrefi
 xList,ou=sipDirektor,dc=ot,dc=hr
qos: 100
priority: 10000
carrierPrefixID: 043010.100.10000.100
carrierPrefix: 043010
weight: 100
carrier: Telekom Austria
objectClass: top
objectClass: carrierPrefixID

# 000010, 043010.100.10000.100, 385, bestMatchPrefixList, sipDirektor, ot.hr
dn: originatorPrefixID=000010,carrierPrefixID=043010.100.10000.100,bestMatchPr
 efix=385,ou=bestMatchPrefixList,ou=sipDirektor,dc=ot,dc=hr
originatorPrefix: 000010
priority: 100
originator: T-COM/HT
originatorPrefixID: 000010
objectClass: top
objectClass: originatorPrefixID

# 078120.100.10000.100, 385, bestMatchPrefixList, sipDirektor, ot.hr
dn: carrierPrefixID=078120.100.10000.100,bestMatchPrefix=385,ou=bestMatchPrefi
 xList,ou=sipDirektor,dc=ot,dc=hr
qos: 100
priority: 10000
carrierPrefixID: 078120.100.10000.100
carrierPrefix: 078120
weight: 100
carrier: Lanck Telekom
objectClass: top
objectClass: carrierPrefixID

# 000010, 078120.100.10000.100, 385, bestMatchPrefixList, sipDirektor, ot.hr
dn: originatorPrefixID=000010,carrierPrefixID=078120.100.10000.100,bestMatchPr
 efix=385,ou=bestMatchPrefixList,ou=sipDirektor,dc=ot,dc=hr
originatorPrefix: 000010
priority: 100
originator: T-COM/HT
originatorPrefixID: 000010
objectClass: top
objectClass: originatorPrefixID

# search result
search: 2
result: 0 Success

# numResponses: 101584
# numEntries: 101583


Tihomir.



On Fri, Sep 11, 2009 at 5:10 PM, Tihomir Culjaga <tculjaga@gmail.com> wrote:
Hi Quanah,


I moved to OpenLDAP 2.4.18 and patched BDB 4.7.25 with all 4 patches from Oracle.


I did NOT change slapd.conf at all.

I reduced the number of entries to a total of 3437278.

[root@l01lnp2 ~]# du -c -h /var/lib/ldap/*.bdb
200K    /var/lib/ldap/bestMatchPrefix.bdb
982M    /var/lib/ldap/dn2id.bdb
2.4G    /var/lib/ldap/id2entry.bdb
1.8M    /var/lib/ldap/objectClass.bdb
1.2M    /var/lib/ldap/originatorPrefixID.bdb
48M     /var/lib/ldap/uniqueID.bdb
3.4G    total <= interesting ... almost the same as number of entries :)


I changed DB_CONFIG to use a 7 GB cache:

set_cachesize 7 0 1
set_lg_regionmax 262144
set_lg_bsize 2097152



My system has 10 GB of RAM, and the situation now is:


[root@l01lnp2 ~]# free
             total       used       free     shared    buffers     cached
Mem:      10234924   10176544      58380          0       2144    3786596
-/+ buffers/cache:    6387804    3847120
Swap:      4096564     753572    3342992
[root@l01lnp2 ~]#



When I run ldapsearch (time ldapsearch -h localhost -x -b ou=bestMatchPrefixList,ou=sipDirektor,dc=ot,dc=hr -D cn=admin,dc=ot,dc=hr -w pero99) before I actually add anything with ldapadd, the search completes within 40 seconds. The slapd process takes 24 - 26% of memory.

After I add new entries (just 2 more) and perform the same search, it hangs after a while. When ldapsearch finishes returning entries, I see the slapd process memory start growing until it takes almost everything, reaching 97%?!
It is always like this: the search outputs all entries and then waits for some time, seemingly at random anywhere from 60 seconds to 6 minutes, before it actually exits.


Could you please take a look at the strace logs I attached to my previous e-mail? As soon as ldapsearch stops returning entries, I see a lot of gibberish there.



Here is slapd process memory growth:

top - 16:42:22 up 4 days,  1:02,  2 users,  load average: 2.13, 0.67, 0.23
Tasks: 119 total,   1 running, 118 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.8%us,  0.2%sy,  0.0%ni, 70.0%id, 28.8%wa,  0.0%hi,  0.2%si,  0.0%st
Mem:  10234924k total, 10177568k used,    57356k free,     6676k buffers
Swap:  4096564k total,    36516k used,  4060048k free,  3603688k cached


  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND   
 9404 ldap      25   0 13.3g 8.8g 2.8g S  4.0 89.7   1:13.49 slapd     
    1 root      15   0 10344  372  344 S  0.0  0.0   0:01.69 init      
    2 root      RT  -5     0    0    0 S  0.0  0.0   0:00.06 migration/0
   


Tasks: 117 total,   1 running, 116 sleeping,   0 stopped,   0 zombie
Cpu(s):  7.2%us,  0.7%sy,  0.0%ni, 67.5%id, 24.3%wa,  0.0%hi,  0.3%si,  0.0%st
Mem:  10234924k total, 10177968k used,    56956k free,     6656k buffers
Swap:  4096564k total,    36516k used,  4060048k free,  3580356k cached


  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 9404 ldap      25   0 13.3g 8.9g 2.9g S 30.3 90.9   1:16.76 slapd 
  325 root      10  -5     0    0    0 S  0.7  0.0   5:37.11 kswapd0
 8458 root      15   0     0    0    0 D  0.3  0.0   0:02.02 pdflush


Tasks: 117 total,   1 running, 116 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.0%us,  0.3%sy,  0.0%ni, 72.3%id, 26.1%wa,  0.0%hi,  0.3%si,  0.0%st
Mem:  10234924k total, 10180560k used,    54364k free,     6140k buffers
Swap:  4096564k total,    36516k used,  4060048k free,  3488164k cached


  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 9404 ldap      25   0 13.4g 9.3g 3.2g S  4.7 95.5   1:28.86 slapd 
 8458 root      15   0     0    0    0 D  0.7  0.0   0:02.20 pdflush


Tasks: 117 total,   1 running, 116 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.9%us,  0.4%sy,  0.0%ni, 70.5%id, 28.0%wa,  0.0%hi,  0.2%si,  0.0%st
Mem:  10234924k total, 10177812k used,    57112k free,     3492k buffers
Swap:  4096564k total,    36516k used,  4060048k free,  3481476k cached


  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 9404 ldap      25   0 13.4g 9.4g 3.2g S  4.3 95.9   1:30.39 slapd 
  325 root      10  -5     0    0    0 S  0.7  0.0   5:38.08 kswapd0



top - 16:45:01 up 4 days,  1:05,  2 users,  load average: 1.91, 1.40, 0.59
Tasks: 117 total,   1 running, 116 sleeping,   0 stopped,   0 zombie
Cpu(s):  3.2%us,  0.2%sy,  0.0%ni, 75.0%id, 21.4%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:  10234924k total, 10179744k used,    55180k free,      396k buffers
Swap:  4096564k total,    42328k used,  4054236k free,  3473624k cached


  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 9404 ldap      25   0 13.5g 9.4g 3.3g S 13.6 96.7   1:33.44 slapd 
 9490 root      15   0     0    0    0 S  0.3  0.0   0:00.31 pdflush




top - 16:45:33 up 4 days,  1:05,  2 users,  load average: 1.55, 1.36, 0.60
Tasks: 117 total,   1 running, 116 sleeping,   0 stopped,   0 zombie
Cpu(s):  2.7%us,  0.2%sy,  0.0%ni, 74.7%id, 22.3%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:  10234924k total, 10180100k used,    54824k free,      652k buffers
Swap:  4096564k total,   118616k used,  3977948k free,  3521232k cached


  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 9404 ldap      25   0 13.5g 9.4g 3.3g S 10.6 96.6   1:37.36 slapd
  325 root      10  -5     0    0    0 S  0.3  0.0   5:38.63 kswapd0




This looks like a memory leak bug to me.
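
A note for anyone trying to reproduce this: the top snapshots above show slapd's resident size (RES) climbing from 8.8g to 9.4g over about three minutes. A minimal sketch for logging that growth at a fixed interval is below. It assumes a Linux /proc filesystem; the function names `rss_kib` and `sample_rss` are made up for illustration and are not part of OpenLDAP or any tool mentioned in this thread.

```python
import time


def rss_kib(status_text):
    """Extract the VmRSS value (in kB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:   9227468 kB".
            return int(line.split()[1])
    return None  # no VmRSS line found (e.g. a kernel thread)


def sample_rss(pid, samples=10, interval=30.0):
    """Print a timestamped resident-set size for `pid`, `samples` times."""
    for _ in range(samples):
        with open(f"/proc/{pid}/status") as f:
            print(time.strftime("%H:%M:%S"), rss_kib(f.read()), "kB")
        time.sleep(interval)


# Example call (9404 is the slapd PID shown in the top output above):
# sample_rss(9404)
```

Growth that stops near the configured cache limits would point to cache sizing, while growth that never levels off would fit the leak theory better. The sketch only records the numbers and does not decide between the two.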

Tihomir.



 

On Thu, Sep 10, 2009 at 9:37 PM, Quanah Gibson-Mount <quanah@zimbra.com> wrote:
--On Thursday, September 10, 2009 8:56 PM +0200 Tihomir Culjaga <tculjaga@gmail.com> wrote:

So, the situation is that I have 2 LDIF files I'm recreating the database
from.

/usr/local/libexec/slapadd -l /home/tculjaga/file2.ldif -f
/usr/local/etc/openldap/slapd.conf
/usr/local/libexec/slapadd -l /home/tculjaga/file2.ldif -f
/usr/local/etc/openldap/slapd.conf

I would suggest you just make these a single file, so all the work can be done at one time.


I tried to re-index with /usr/local/libexec/slapindex -f
/usr/local/etc/openldap/slapd.conf -v,
restarting the slapd process, and restarting the machine ... it is always
the same issue.

Nothing here indicates a problem with your indices.  Running slapindex repeatedly is a waste of your time.


[root@l01lnp2 traces]# /usr/local/libexec/slapd -V
@(#) $OpenLDAP: slapd 2.4.16 (Sep  9 2009 14:39:44) $
    root@l01lnp2:/home/tculjaga/openldap-2.4.16/servers/slapd

I would strongly urge you to upgrade to 2.4.18 (for reasons I will note further down)



[root@l01lnp2 traces]# /usr/local/BerkeleyDB.4.7/bin/db_stat -V
Berkeley DB 4.7.25: (May 15, 2008) - unpatched!

You need to rebuild BDB 4.7.25 with the 4 patches from Oracle.  There are known issues when running BDB 4.7 without them.


[root@l01lnp2 traces]# du -c -h /var/lib/ldap/*.bdb
200K    /var/lib/ldap/bestMatchPrefix.bdb
3.8G    /var/lib/ldap/dn2id.bdb
6.2G    /var/lib/ldap/id2entry.bdb
1.8M    /var/lib/ldap/objectClass.bdb
1.2M    /var/lib/ldap/originatorPrefixID.bdb
48M    /var/lib/ldap/uniqueID.bdb
10G    total

Since your database is a total of 10 GB in size, for slapadd to work at optimum efficiency, you need at least 10GB of cache for your DB_CONFIG file.  Unfortunately, you only have 10GB of RAM.  Essentially, your system is under powered for your database size.




[tculjaga@l01lnp2 ~]$ cat ot.ldif | grep -c "dn: "
101588
[tculjaga@l01lnp2 ~]$ cat l01sipdir1.ldif | grep -c "dn: "
9994864
[tculjaga@l01lnp2 ~]$

So you have 10,096,452 entries total.
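
As a cross-check of that total, here is a small sketch that counts entries the way the grep commands above do, with one difference: it only counts lines that start with "dn:", whereas grep "dn: " matches that string anywhere in a line. The helper names `count_entries` and `count_entries_in_file` are hypothetical, and the sample lines are copied from the search output earlier in this thread.

```python
def count_entries(lines):
    """Count LDIF entries by counting lines that begin with 'dn:'.

    This also matches base64-encoded DNs, which are written as 'dn::'.
    Folded continuation lines start with a space, so they are not counted.
    """
    return sum(1 for line in lines if line.startswith("dn:"))


def count_entries_in_file(path):
    """Stream a (possibly very large) LDIF file without loading it all."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return count_entries(f)


# A tiny in-memory sample: two entries, one with a folded DN.
sample = [
    "dn: bestMatchPrefix=385,ou=bestMatchPrefixList,ou=sipDirektor,dc=ot,dc=hr",
    "destination: Croatia",
    "",
    "dn: carrierPrefixID=006800.100.10000.100,bestMatchPrefix=385,ou=bestMatchPrefi",
    " xList,ou=sipDirektor,dc=ot,dc=hr",
    "qos: 100",
]
print(count_entries(sample))

# The two per-file counts reported above add up to the quoted total.
print(101588 + 9994864)
```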


[root@l01lnp2 traces]# cat /var/lib/ldap/DB_CONFIG | grep -v "#"

set_cachesize 0 3221225472 1
set_lg_regionmax 262144
set_lg_bsize 2097152

You only have a 3GB DB cachesize configured here.  Expect things to perform sub optimally.  It would have been easier to set this by going

set_cachesize 3 0 1

Which would have the same effect, since the first number is the number of gigabytes to allocate.
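
The equivalence of the two lines can be checked with a few lines of arithmetic. This is only a sketch: the helper name `cachesize_bytes` is made up for illustration, and it assumes the Berkeley DB convention that the first field counts gigabytes (units of 2^30 bytes), the second adds plain bytes, and the third is the number of cache regions, which does not change the total.

```python
GIB = 1024 ** 3  # one gigabyte as Berkeley DB counts it: 2**30 bytes


def cachesize_bytes(line):
    """Return the total cache size in bytes for a DB_CONFIG line of the
    form 'set_cachesize <gbytes> <bytes> <ncache>'."""
    directive, gbytes, extra_bytes, _ncache = line.split()
    if directive != "set_cachesize":
        raise ValueError("not a set_cachesize line: %r" % line)
    return int(gbytes) * GIB + int(extra_bytes)


for line in ("set_cachesize 0 3221225472 1", "set_cachesize 3 0 1"):
    print(line, "->", cachesize_bytes(line), "bytes")
```

Both lines resolve to 3221225472 bytes, i.e. exactly 3 GB.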


Please find attached slapd.conf

Ok, so the relevant bits from here are:

cachesize 2500000
idlcachesize 7500000
cachefree 1000

Which means you have a cachesize of 2.5 million, an idlcachesize of 7.5 million, and (with OL 2.4.16) a dncachesize of 5 million.

I would highly advise you upgrade to OpenLDAP 2.4.18, and change the slapd.conf settings to:

dncachesize 0 (which means unlimited).

Also remove the cachesize and idlcachesize settings, and fix your DB_CONFIG.  But you also need to buy a substantial amount of RAM for a DB of this size. :P  I would advise you to upgrade to at least 32GB total.  Then you can tune the system more optimally.


--Quanah

--

Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
--------------------
Zimbra ::  the leader in open source messaging and collaboration