Re: ldap query performance issue



Meike Stone wrote:
Hello,

given this, does it make sense to index the sex attribute in a directory
with more than 1,000,000 people?

Indexing is all about making rare data easy to find. If you have an attribute that occurs on 99% of your entries, indexing it won't save any search time, and it will needlessly slow down modify operations.

Asking about "1,000,000" entries is meaningless on its own. It's not the raw number of entries that matters, it's the percentage of the total directory. If you have 1,000,000,000 entries in your directory, then 1,000,000 is actually quite a small percentage of the data and it might be smart to index it. If you have only 2,000,000 entries total, it may not make enough difference to be worthwhile.

It's not the raw numbers that matter, it's the frequency of occurrences.
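
The two cases above can be put into numbers with a minimal sketch. The helper name match_fraction is hypothetical and is not part of OpenLDAP; it only computes the share of the directory that an indexed value would match.

```c
#include <assert.h>

/* Hypothetical helper, not an OpenLDAP API: returns the fraction of
 * all entries in the directory that match a given attribute value. */
double match_fraction(unsigned long long matching, unsigned long long total)
{
    /* The directory must be non-empty, and a value cannot match
     * more entries than exist. */
    assert(total > 0 && matching <= total);
    return (double)matching / (double)total;
}
```

For the figures in this message, match_fraction(1000000, 1000000000) gives 0.001 (0.1% of the directory, where an index is likely to help), while match_fraction(1000000, 2000000) gives 0.5 (half the directory, where an index saves little).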


thanks Meike

2013/5/23 Quanah Gibson-Mount <quanah@zimbra.com>:
--On Thursday, May 23, 2013 4:40 PM +0000 Chris Card <ctcard@hotmail.com>
wrote:

Hi all,

I have an openldap directory with about 7 million DNs, running openldap
2.4.31 with a BDB backend (4.6.21), running on CentOS 6.3.

The structure of the directory is like this, with suffix dc=x,dc=y

dc=x,dc=y
    account=a,dc=x,dc=y
       mail=m,account=a,dc=x,dc=y           // Users
       ....
       licenceId=l,account=a,dc=x,dc=y      // Licences, objectclass=licence
       ....
       group=g,account=a,dc=x,dc=y          // Groups
       ....
       // etc.

    account=b,dc=x,dc=y
       ....

Most of the DNs in the directory are users or groups, and the number of
licences is small (<10) for each account.

If I do a query with basedn account=a,dc=x,dc=y and filter
(objectclass=licence) I see wildly different performance, depending on
how many users are under account a. For an account with ~30000 users the
query takes 2 seconds at most, but for an account with ~60000 users the
query takes 1 minute.

It only appears to be when I filter on objectclass=licence that I see
that behaviour. If I filter on a different objectclass which matches a
similar number of objects to the objectclass=licence filter, the
performance doesn't seem to depend on the number of users.

There is an index on objectclass (of course), but the behaviour I'm
seeing seems to indicate that for this query, at some point slapd stops
using the index and just scans all the objects under the account.

Any ideas?


Increase the IDL range.  This is how I do it:

--- openldap-2.4.35/servers/slapd/back-bdb/idl.h.orig   2011-02-17 16:32:02.598593211 -0800
+++ openldap-2.4.35/servers/slapd/back-bdb/idl.h        2011-02-17 16:32:08.937757993 -0800
@@ -20,7 +20,7 @@
 /* IDL sizes - likely should be even bigger
  *   limiting factors: sizeof(ID), thread stack size
  */
-#define        BDB_IDL_LOGN    16      /* DB_SIZE is 2^16, UM_SIZE is 2^17 */
+#define        BDB_IDL_LOGN    17      /* DB_SIZE is 2^17, UM_SIZE is 2^18 */
 #define BDB_IDL_DB_SIZE                (1<<BDB_IDL_LOGN)
 #define BDB_IDL_UM_SIZE                (1<<(BDB_IDL_LOGN+1))
 #define BDB_IDL_UM_SIZEOF      (BDB_IDL_UM_SIZE * sizeof(ID))
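
To see what changing BDB_IDL_LOGN does to the derived sizes, here is a small sketch that recomputes the three macros as functions. The function names are hypothetical and exist only for illustration; the ID size is passed in as a parameter because sizeof(ID) depends on the platform (8 bytes is assumed in the example figures below).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of BDB_IDL_DB_SIZE: 1 << logn. */
size_t idl_db_size(unsigned logn)
{
    /* Keep the shift (and the logn + 1 shift below) within size_t. */
    assert(logn + 1 < 8 * sizeof(size_t));
    return (size_t)1 << logn;
}

/* Hypothetical mirror of BDB_IDL_UM_SIZE: 1 << (logn + 1). */
size_t idl_um_size(unsigned logn)
{
    assert(logn + 1 < 8 * sizeof(size_t));
    return (size_t)1 << (logn + 1);
}

/* Hypothetical mirror of BDB_IDL_UM_SIZEOF: bytes needed for an
 * ID list of UM_SIZE slots; id_bytes stands in for sizeof(ID). */
size_t idl_um_sizeof(unsigned logn, size_t id_bytes)
{
    return idl_um_size(logn) * id_bytes;
}
```

With an assumed 8-byte ID, idl_db_size(16) is 65536 and idl_um_sizeof(16, 8) is 1048576 bytes; after the patch, idl_db_size(17) is 131072 and idl_um_sizeof(17, 8) is 2097152 bytes. The doubling of that buffer is why the source comment lists thread stack size as a limiting factor.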


--Quanah

--

Quanah Gibson-Mount
Sr. Member of Technical Staff
Zimbra, Inc
A Division of VMware, Inc.
--------------------
Zimbra ::  the leader in open source messaging and collaboration





--
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/