
RE: ch_malloc of 8388608 bytes failed (ITS#2270)



When ch_malloc fails it calls abort() to kill the process. Your stack
backtrace shows 232 threads, but none of them is in the abort() routine,
which I find very odd. Regardless, your problem is not due to any bug in
OpenLDAP. Even though you have a 64-bit machine, you have built a 32-bit
binary, so it is limited to a 32-bit address space. On Solaris, only about
half of that space (31 bits, 2GB) is available for user memory. The default
size of a thread stack has grown in OpenLDAP 2.1, but even in OpenLDAP 2.0
it was 2MB per thread. With the current 4MB per thread, times 232 threads,
you have used 928MB of address space. You are also using 1GB for your BDB
cache. Together (about 1.9GB) these leave practically nothing for slapd to
run with.
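
To make the arithmetic above concrete, here is a minimal sketch of the address
space budget. The figures (232 threads, 4MB stacks, 1GB cache, roughly 2GB of
usable space) come from this message; the function name and its parameters are
hypothetical and only illustrate the calculation.

```python
MB = 1024 * 1024
GB = 1024 * MB

def address_space_budget(threads, stack_bytes, cache_bytes, usable_bytes=2 * GB):
    """Return (bytes reserved for thread stacks, bytes left for everything else).

    usable_bytes defaults to the ~2GB figure quoted above for a 32-bit
    process; treat it as an approximation, not an exact platform limit.
    """
    stacks = threads * stack_bytes
    remaining = usable_bytes - stacks - cache_bytes
    return stacks, remaining

stacks, remaining = address_space_budget(threads=232, stack_bytes=4 * MB,
                                         cache_bytes=1 * GB)
print(f"thread stacks: {stacks // MB}MB")   # 928MB
print(f"left over:     {remaining // MB}MB")  # 96MB
```

Under these assumptions about 96MB remains for the program text, shared
libraries, and the heap, so an 8MB (8388608 byte) request can easily fail
once that space is used up or fragmented.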

You should decrease the maximum number of threads; creating more beyond a
certain limit does not enhance concurrency anyway. You can increase your
available address space by building a pure 64-bit executable, but that
doesn't change the fact that having too many threads will slow you down.
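
As a rough guide to choosing a lower thread limit (for example via the
"threads" directive in slapd.conf), the sketch below works backwards from the
same numbers. The 512MB reserve for heap, code, and libraries is an assumed
figure chosen only for illustration, and the function name is hypothetical;
this bounds address space only and says nothing about the point at which extra
threads stop helping concurrency.

```python
MB = 1024 * 1024
GB = 1024 * MB

def max_threads_that_fit(usable_bytes, cache_bytes, reserve_bytes, stack_bytes):
    """Largest thread count whose stacks fit after the cache and a reserve."""
    available = usable_bytes - cache_bytes - reserve_bytes
    return max(available // stack_bytes, 0)

# Assumed inputs: ~2GB usable, 1GB BDB cache, 512MB reserve, 4MB per stack.
limit = max_threads_that_fit(2 * GB, 1 * GB, 512 * MB, 4 * MB)
print(limit)  # 128
```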

  -- Howard Chu
  Chief Architect, Symas Corp.       Director, Highland Sun
  http://www.symas.com               http://highlandsun.com/hyc
  Symas: Premier OpenSource Development and Support

> -----Original Message-----
> From: owner-openldap-bugs@OpenLDAP.org
> [mailto:owner-openldap-bugs@OpenLDAP.org]On Behalf Of
> joseph.tingiris@cox.net
> Sent: Wednesday, January 15, 2003 9:27 AM
> To: openldap-its@OpenLDAP.org
> Subject: ch_malloc of 8388608 bytes failed (ITS#2270)
>
>
> Full_Name: Joseph Tingiris
> Version: 2.1.12
> OS: Solaris 8
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (206.157.224.254)
>
>
> I've read reports from other Solaris users having similar problems, and
> I've tried almost everything I could find short of actually modifying
> ch_malloc.c myself. It appears to be specific to multiprocessor (3+) Sun
> installations.  The binaries have been compiled with -lmtmalloc and the
> latest versions of all OpenLDAP dependent packages are used.  The primary
> authentication mechanism is cleartext.
>
> Some key points:
>
> * This server is a replica.
> * BDB-4.1 with 3.4 million DNs, 6 indexes (eq,sub)
> * process stack 32k (plimit -s), DB cache 1G (via DB_CONFIG)
> * this problem has persisted, on the same hardware, since openldap 2.0.12
> * slapd fails at least once a day with the same error every time,
>   "ch_malloc of 8388608 bytes failed"; it's always the same amount of bytes
> * it appears to happen during a wildcard search, although it may be during
>   some type of replication event
>
> Here is some info on the build environment:
>
> Application - OpenLdap and Dependencies:
>
> openldap-2.1.12
> openssl-0.9.7
> krb5-1.2.7
> cyrus-sasl-2.1.10
> db-4.1.25
>
> Compiler/Dev Tools:
>
> autoconf-2.57
> automake-1.7.2
> binutils-2.11.2
> bison-1.75
> fileutils-4.1
> gawk-3.1.0
> gcc-2.95.3
> gdb-5.0
> gdbm-1.8.0
> gettext-0.10.37
> glib-1.2.10
> gtk+-1.2.10
> libgcc-3.2
> libiconv-1.6.1
> libnet-1.0.2a
> libpcap-0.7.1
> libtool-1.4
> m4-1.4
> make-3.80
> ncurses-5.2
> slang-1.4.4
> tcl-8.4.1
> termcap-1.3
> textutils-2.0
> tk-8.4.1
> zlib-1.1.4
>
> Here's the system info:
>
> System Configuration:  Sun Microsystems  sun4u Sun Fire 3800
> System clock frequency: 150 MHz
> Memory size: 8192 Megabytes
>
> ========================= CPUs ===============================================
>
>             Port  Run    E$   CPU      CPU
> FRU Name     ID   MHz    MB   Impl.    Mask
> ----------  ----  ----  ----  -------  ----
> /N0/SB0/P0    0    750   8.0  US-III   3.4
> /N0/SB0/P1    1    750   8.0  US-III   3.4
> /N0/SB0/P2    2    750   8.0  US-III   3.4
> /N0/SB0/P3    3    750   8.0  US-III   3.4
> /N0/SB2/P0    8    750   8.0  US-III   3.4
> /N0/SB2/P1    9    750   8.0  US-III   3.4
> /N0/SB2/P2   10    750   8.0  US-III   3.4
> /N0/SB2/P3   11    750   8.0  US-III   3.4
>
> ========================= Memory Configuration ===============================
>
>                      Logical  Logical  Logical
>                Port  Bank     Bank     Bank         DIMM    Interleave  Interleave
> FRU Name        ID   Num      Size     Status       Size    Factor      Segment
> -------------  ----  ----     ------   -----------  ------  ----------  ----------
> /N0/SB0/P0/B0    0    0       512MB    pass          256MB   8-way       0
> /N0/SB0/P0/B0    0    2       512MB    pass          256MB   8-way       0
> /N0/SB0/P1/B0    1    0       512MB    pass          256MB   8-way       0
> /N0/SB0/P1/B0    1    2       512MB    pass          256MB   8-way       0
> /N0/SB0/P2/B0    2    0       512MB    pass          256MB   8-way       0
> /N0/SB0/P2/B0    2    2       512MB    pass          256MB   8-way       0
> /N0/SB0/P3/B0    3    0       512MB    pass          256MB   8-way       0
> /N0/SB0/P3/B0    3    2       512MB    pass          256MB   8-way       0
> /N0/SB2/P0/B0    8    0       512MB    pass          256MB   8-way       1
> /N0/SB2/P0/B0    8    2       512MB    pass          256MB   8-way       1
> /N0/SB2/P1/B0    9    0       512MB    pass          256MB   8-way       1
> /N0/SB2/P1/B0    9    2       512MB    pass          256MB   8-way       1
> /N0/SB2/P2/B0   10    0       512MB    pass          256MB   8-way       1
> /N0/SB2/P2/B0   10    2       512MB    pass          256MB   8-way       1
> /N0/SB2/P3/B0   11    0       512MB    pass          256MB   8-way       1
> /N0/SB2/P3/B0   11    2       512MB    pass          256MB   8-way       1
>
> ========================= IO Cards =========================
>
>                                 Bus  Max
>             IO   Port Bus       Freq Bus  Dev,
> FRU Name    Type  ID  Side Slot MHz  Freq Func State Name
>
>       Model
> ----------  ---- ---- ---- ---- ---- ---- ---- -----
> --------------------------------  ----------------------
> /N0/IB6/P0  cPCI  24   B    2    33   33  1,0  ok
> pci-pci1011,46.1/pci108e,1000     pci-bridge
> /N0/IB6/P0  cPCI  24   B    2    33   33  0,0  ok
> pci108e,1000-pci108e,1000.1
> /N0/IB6/P0  cPCI  24   B    2    33   33  0,1  ok
> SUNW,hme-pci108e,1001
>       SUNW,cheerio
> /N0/IB6/P0  cPCI  24   B    2    33   33  4,0  ok
> SUNW,isptwo-pci1077,1020/sd
> (blo+ QLGC,ISP1040B
> /N0/IB6/P0  cPCI  24   B    3    33   33  2,0  ok
> network-pci108e,abba.11
>       SUNW,cpci-ce
> /N0/IB6/P1  cPCI  25   B    4    33   33  1,0  ok
> pci-pci1011,46.1/pci108e,1000     pci-bridge
> /N0/IB6/P1  cPCI  25   B    4    33   33  0,0  ok
> pci108e,1000-pci108e,1000.1
> /N0/IB6/P1  cPCI  25   B    4    33   33  0,1  ok
> SUNW,qfe-pci108e,1001
>       SUNW,cpci-qfe
> /N0/IB6/P1  cPCI  25   B    4    33   33  1,0  ok
> pci108e,1000-pci108e,1000.1
> /N0/IB6/P1  cPCI  25   B    4    33   33  1,1  ok
> SUNW,qfe-pci108e,1001
>       SUNW,cpci-qfe
> /N0/IB6/P1  cPCI  25   B    4    33   33  2,0  ok
> pci108e,1000-pci108e,1000.1
> /N0/IB6/P1  cPCI  25   B    4    33   33  2,1  ok
> SUNW,qfe-pci108e,1001
>       SUNW,cpci-qfe
> /N0/IB6/P1  cPCI  25   B    4    33   33  3,0  ok
> pci108e,1000-pci108e,1000.1
> /N0/IB6/P1  cPCI  25   B    4    33   33  3,1  ok
> SUNW,qfe-pci108e,1001
>       SUNW,cpci-qfe
> /N0/IB6/P1  cPCI  25   A    1    66   66  1,0  ok
> fibre-channel-pci10df,f900.10df.+
> /N0/IB8/P0  cPCI  28   B    2    33   33  1,0  ok
> network-pci108e,abba.11
>       SUNW,cpci-ce
> /N0/IB8/P1  cPCI  29   B    4    33   33  1,0  ok
> pci-pci1011,46.1/pci108e,1000     pci-bridge
> /N0/IB8/P1  cPCI  29   B    4    33   33  0,0  ok
> pci108e,1000-pci108e,1000.1
> /N0/IB8/P1  cPCI  29   B    4    33   33  0,1  ok
> SUNW,qfe-pci108e,1001
>       SUNW,cpci-qfe
> /N0/IB8/P1  cPCI  29   B    4    33   33  1,0  ok
> pci108e,1000-pci108e,1000.1
> /N0/IB8/P1  cPCI  29   B    4    33   33  1,1  ok
> SUNW,qfe-pci108e,1001
>       SUNW,cpci-qfe
> /N0/IB8/P1  cPCI  29   B    4    33   33  2,0  ok
> pci108e,1000-pci108e,1000.1
> /N0/IB8/P1  cPCI  29   B    4    33   33  2,1  ok
> SUNW,qfe-pci108e,1001
>       SUNW,cpci-qfe
> /N0/IB8/P1  cPCI  29   B    4    33   33  3,0  ok
> pci108e,1000-pci108e,1000.1
> /N0/IB8/P1  cPCI  29   B    4    33   33  3,1  ok
> SUNW,qfe-pci108e,1001
>       SUNW,cpci-qfe
> /N0/IB8/P1  cPCI  29   A    1    66   66  1,0  ok
> fibre-channel-pci10df,f900.10df.+
>
> ========================= Active Boards for Domain ===========================
>
>           Power  Fault  HotPlug  Board
> FRU Name   LED    LED     LED    Cond.
> --------  -----  -----  -------  -------
> /N0/SB0   on     off    off      ok
> /N0/SB2   on     off    off      ok
> /N0/IB6   on     off    off      ok
> /N0/IB8   on     off    off      ok
>
> ========================= Available Boards/Slots for Domain ==================
>
>           Power  Fault  HotPlug  Board/Slot  Board/Slot
> FRU Name   LED    LED     LED    Condition   Assigned
> --------  -----  -----  -------  ----------  ----------
> There are currently no Boards/Slots available to this Domain
>
> ========================= Hardware Failures ==================================
> No Hardware failures found in System
>
> Need any more info?  I still have pmap, lsof, truss, cores, and additional
> debug data.  Anyone have any ideas?
>
> Any help would be greatly appreciated.
>
> Thanks!
>
>
>