[Date Prev][Date Next] [Chronological] [Thread] [Top]

back-sql: inefficient query with ldap_entry_objclasses



Hello,

I had a closer look at Thomas' problem[1] with back-sql. The problem was with
a query like this, that took too much time:
SELECT DISTINCT
    ldap_entries.id,groups.id,text('posixGroup') AS objectClass,ldap_entries.dn AS dn
FROM ldap_entries,groups,ldap_entry_objclasses,posix_account,groups_users
WHERE     groups.id=ldap_entries.keyval
      AND ldap_entries.oc_map_id=4
      AND 9=9
      AND groups_users.memberuid=posix_account.uidnumber
      AND groups_users.gidnumber=groups.gidnumber
      AND (    (2=2 OR (    ldap_entries.id=ldap_entry_objclasses.entry_id
                        AND ldap_entry_objclasses.oc_name='posixGroup'))
           AND ((posix_account.uid='lml34') OR 7=7))

As Pierangelo Masarati already wrote[2] TRUE OR query == TRUE. So
(2=2 OR (    ldap_entries.id=ldap_entry_objclasses.entry_id
         AND ldap_entry_objclasses.oc_name='posixGroup'))
is the same as 2=2.
If this is the case, then the table ldap_entry_objclasses is not conditionally
joined with the other tables in the where-clause. So the whole table is
cross-joined with the actual result. Multiple rows are then sorted out again
with DISTINCT.
However in this query the values of ldap_entry_objclasses are not used for
anything. So the table could be purged from this query:
SELECT DISTINCT
    ldap_entries.id,groups.id,text('posixGroup') AS objectClass,ldap_entries.dn AS dn
FROM ldap_entries,groups,posix_account,groups_users
WHERE     groups.id=ldap_entries.keyval
      AND ldap_entries.oc_map_id=4
      AND 9=9
      AND groups_users.memberuid=posix_account.uidnumber
      AND groups_users.gidnumber=groups.gidnumber
      AND (    (2=2)
           AND ((posix_account.uid='lml34') OR 7=7))

This is easily done by changing the code around line 733 in back-sql/search.c
(version 2.3.11):
 static struct berval ldap_entry_objclasses = BER_BVC( "ldap_entry_objclasses" );

 backsql_merge_from_tbls( bsi, &ldap_entry_objclasses );

 backsql_strfcat( &bsi->bsi_flt_where, "lbl",
         (ber_len_t)STRLENOF( "(2=2 OR (ldap_entries.id=ldap_entry_objclasses.entry_id AND ldap_entry_objclasses.oc_name='" /* ')) */ ),
             "(2=2 OR (ldap_entries.id=ldap_entry_objclasses.entry_id AND ldap_entry_objclasses.oc_name='" /* ')) */,
         &bsi->bsi_oc->bom_oc->soc_cname,
         (ber_len_t)STRLENOF( /* ((' */ "'))" ),
             /* ((' */ "'))" );

to:
 backsql_strfcat( &bsi->bsi_flt_where, "l",
         (ber_len_t)STRLENOF( "(2=2)",
             "(2=2)");

I changed this and all the results I checked (including other queries using
this part of code) were as expected.
I also did some tests if backsql_merge_from_tbls( bsi, &ldap_entry_objclasses )
is necessary for any other part. But it doesn't seem so.

The above query was used to do a "su username" on a system which stores about
5000 posix and samba users and 1000 groups in a PostgreSQL-DB. Without this
change it was not possible to use the command. I aborted it after about
15 minutes. With this change it took about 20 seconds. Which is still slow but
much better.


Now, of course I wonder why this code is at it is. So could someone please
check if I am right with my assumptions and if the change would affect
anything else?



aikE



[1] http://www.openldap.org/lists/openldap-software/200510/msg00282.html
[2] http://www.openldap.org/lists/openldap-software/200510/msg00318.html