Back-sql improvements

Hello, all!

I've just committed a big bunch of improvements to back-sql, contributed by
Sam Drake and Raj Damani from TimesTen Performance Software.

It is available through CVS (HEAD branch).

Summary of changes is cited below. In brief - this eliminates most of known
deficiencies of back-sql, and is expected to boost out-of-box performance by
SEVERAL TIMES !

The patch will undergo some cosmetic changes later, when I return from
vacation, but it should be ready to test. Rajen kindly agreed to answer all
questions asked about this patch on openldap-devel and -software lists, and
provide bugfixes (if any will be needed). All those will be committed when I
return.

Big thanks to Sam and Raj,

WBW, Dmitry

----- Original Message -----
From: "Rajen Damani" <rdamani@timesten.com>
To: <mitya@seismic.ru>
Sent: Saturday, July 28, 2001 2:52 AM
Subject: Back-sql improvements

> Hello Dmitry,
> My name is Raj Damani and I work with Sam Drake at TimesTen Performance
> Software. Sam had made bunch of improvements to OpenLDAP a while ago. I
> recently made the same changes to latest OpenLDAP 2.0.11 source. We would
> like to send those changes to you. (email below describes all those
> changes).
>
> I was wondering, how I should submit these changes. I can send you the
> entire upgraded (back-sql) files that include all improvements via email.
> If this does not work for you, please let me know how we can submit these
> changes.
>
> Thanks.
> -Raj Damani
> TimesTen Performance Software
>
> -----Original Message-----
> From: Sam Drake [mailto:drake@timesten.com]
> Sent: Saturday, April 07, 2001 10:40 PM
> To: 'mitya@seismic.ru'
> Cc: openldap-devel@OpenLDAP.org
> Subject: RE: Slapd frontend performance issues
>
>
> FYI, here is a short description of the changes I made. I'll package up
the
> changes asap, but it may take a couple of days.
>
> The performance numbers quoted in this report were seen at my location
with
> a 100,000 object database ... the slower numbers I mentioned earlier were
> reported by a customer with a 1,000,000 object database.
>
> I also can't explain the very poor performance I saw with OpenLDAP and
LDBM
> with a 100,000 object database.
>
> ...Sam Drake / TimesTen Performance Software
>
> ----------
>
> Work Performed
>
> OpenLDAP 2.0.9, including back-sql, was built successfully on Solaris
> 8 using gcc. The LDAP server itself, slapd, passed all tests bundled
> with OpenLDAP. OpenLDAP was built using Sleepycat LDBM release 3.1.17
> as the "native" storage manager.
>
> The experimental back-sql facility in slapd was also built
> successfully. It was built using Oracle release 8.1.7 and the Oracle
> ODBC driver and ODBC Driver Manager from Merant. Rudimentary testing
> was performed with the data and examples provided with back-sql, and
> back-sql was found to be functional.
>
> Slapd and back-sql were then tested with TimesTen, using TimesTen
> 4.1.1. Back-sql was not immediately functional with TimesTen due to a
> number of SQL limitations in the TimesTen product.
>
> Functional issues encountered were:
>
> 1. Back-sql issued SELECT statements including the construct,
>    "UPPER(?)". While TimesTen supports UPPER, it does not support the
>    use of parameters as input to builtin functions. Back-sql was
>    modified to convert the parameter to upper case prior to giving it
>    to the underlying database ... a change that is appropriate for all
>    databases.
>
> 2. Back-sql issued SELECT statements using the SQL CONCAT function.
>    TimesTen does not support this function. Back-sql was modified to
>    concatentate the necessary strings itself (in "C" code) prior to
>    passing the parameters to SQL. This change is also appropriate for
>    all databases, not just TimesTen.
>
> Once these two issues were resolved, back-sql could successfully
> process LDAP searches using the sample data and examples provided with
> back-sql.
>
> While performance was not measured at this point, numerous serious
> performance problems were observed with the back-sql code and the
> generated SQL. In particular:
>
> 1. In the process of implementing an LDAP search, back-sql will
>    generate and execute a SQL query for all object classes stored in
>    back-sql. During the source of generating each SQL query, it is
>    common for back-sql to determine that a particular object class can
>    not possibly have any members satisfying the search. For example,
>    this can occur if the query searches an attribute of the LDAP
>    object that does not exist in the SQL schema. In this case,
>    back-sql would generate and issue the SQL query anyway, including a
>    clause such as "WHERE 1=0" in the generated SELECT. The overhead
>    of parsing, optimizing and executing the query is non-trivial, and
>    the answer (the empty set) is known in advance. Solution: Back-sql
>    was modified to stop executing a SQL query when it can be
>    predetermined that the query will return no rows.
>
> 2. Searches in LDAP are fundamentally case-insensitive ("abc" is equal
>    to "aBc"). However, in SQL this is not normally the case.
>    Back-sql thus generated SQL SELECT statements including clauses of
>    the form, "WHERE UPPER(attribute) = 'JOE'". Even if an index is
>    defined on the attribute in the relational database, the index can
>    not be used to satisfy the query, as the index is case sensitive.
>    The relational database then is forced to scan all rows in the
>    table in order to satisfy the query ... an expensive and
>    non-scalable proposition. Solution: Back-sql was modified to allow
>    the schema designer to add additional "upper cased" columns to the
>    SQL schema. These columns, if present, contain an upper cased
>    version of the "standard" field, and will be used preferentially
>    for searching. Such columns can be provided for all searchable
>    columns, some columns, or no columns. An application using
>    database "triggers" or similar mechanisms can automatically
>    maintain these upper cased columns when the standard column is
>    changed.
>
> 3. In order to implement the hierarchical nature of LDAP object
>    hierarchies, OpenLDAP uses suffix searches in SQL. For example, to
>    find all objects in the subtree "o=TimesTen,c=us", a SQL SELECT
>    statement of the form, "WHERE UPPER(dn) LIKE '%O=TIMESTEN,C=US'"
>    would be employed. Aside from the UPPER issue discussed above, a
>    second performance problem in this query is the use of suffix
>    search. In TimesTen (and most relational databases), indexes can
>    be used to optimize exact-match searches and prefix searches.
>    However, suffix searches must be performed by scanning every row in
>    the table ... an expensive and non-scalable proposition. Solution:
>    Back-sql was modified to optionally add a new "dn_ru" column to the
>    ldap_entries table. This additional column, if present, contains a
>    byte-reversed and upper cased version of the DN. This allows
>    back-sql to generate indexable prefix searches. This column is
>    also easily maintained automatically through the use of triggers.
>
> Results
>
> A simple database schema was generated holding the LDAP objects and
> attributes specified by our customer. An application was written to
> generate test databases. Both TimesTen and Oracle 8.1.7 were
> populated with 100,000 entry databases.
>
> Load Times
>
> Using "slapadd" followed by "slapindex", loading and indexing 100,000
> entries in an LDBM database ran for 19 minutes 10 seconds.
>
> Using a C++ application that used ODBC, loading 100,000 entries into
> a disk based RDBMS took 17 minutes 53 seconds.
>
> Using a C++ application that used ODBC, loading 100,000 entries into
> TimesTen took 1 minute 40 seconds.
>
> Search Times
>
> The command, "timex timesearch.sh '(cn=fname210100*)'" was used to
> test search times. This command issues the same LDAP search 4000
> times over a single LDAP connection. Both the client and server
> (slapd) were run on the same machine.
>
> With TimesTen as the database, 4000 queries took 14.93 seconds, for a
> rate of 267.9 per second.
>
> With a disk based RDBMS as the database, 4000 queries took 77.79 seconds,
> for a
> rate of 51.42 per second.
>
> With LDBM as the database, 1 query takes 76 seconds, or 0.076 per
> second. Something is clearly broken.

Back-sql improvements - please test