[Date Prev][Date Next] [Chronological] [Thread] [Top]

OpenLDAP-specific API

> -----Original Message-----
> From: owner-openldap-software@OpenLDAP.org
> [mailto:owner-openldap-software@OpenLDAP.org]On Behalf Of Howard Chu

> A lot of these changes are undocumented and adhere to no
> published standards.

While I'm on the topic of standards, another digression. (Since we use it in
OpenLDAP I guess it still falls under the list charter...)

In 1986 I proposed that the new ANSI C string library should change the
definition of strcpy/strcat's return value, to point to the tail of the
resulting string instead of the head of it. I reasoned that you have already
passed in a pointer to the head of the string; you know its value so
returning it again is useless. On the other hand, returning the tail is
extremely useful, and allows you to cat complex strings together quickly and
efficiently. This was viewed as too radical a change, and the idea was
rejected. I then proposed that a new function be introduced, with the
above-described behavior, and that too was rejected. (0 for 2, sigh.) Anyway,
the upshot of this...

There are frequent examples in C of using the string library like so:
	char buf[MAXLEN]; int len;
	strcat(strcat(strcat(strcpy(buf, "This "),"is "),"a long "),"string.");
	len = strlen(buf);
to produce "This is a long string." and determine its length. Code-beauty
aside, this is a horrid example that many C programming students have adopted
as "normal". It executes in exponential time, relative to the lengths of the
strings involved.

I used a function "strcopy" which behaves as I proposed:
	len = strcopy(strcopy(strcopy(strcopy(buf, "x"),"y"),"z"),"phooey") - buf;
My version executes in linear time, and eliminates the second pass through
the string to calculate the length of the result.

Between 1986 and today probably millions or billions of lines of code have
been written by naive C programmers using strcpy/strcat/strlen as above,
wasting countless hours of CPU time. It may seem like a little thing, but its
effect can be huge: Easily 40% of UMich-ldap 3.3's execution time is spent in
the string library, and another 55% in malloc, with less than 5% spent on
actual LDAP protocol processing. We've rewritten most of what was done badly
in the OpenLDAP code base. You'll find "strcopy" as lutil_strcopy() in
OpenLDAP's liblutil.

The rule for the OpenLDAP project is correctness first, performance second.
This makes a great deal of sense; a fast tool that's broken doesn't help
anyone. But once you've formulated the correct approach to a solution, you
can't defer thinking about the optimal path for too long. The longer you
wait, the harder it becomes to change. This is painfully obvious as we
attempt to whip the legacy of UMich into shape. We've come a long way, but
there's still a ways to go.

For a server to scale, you must do everything as efficiently as possible.
Don't use an exponential algorithm when it's easy to make it linear instead.
Don't use a linear algorithm when it's easily made logarithmic. Simple,
obvious principles, that somehow seem to get ignored a lot of the time...

The trend these days is to deal with multiple abstraction layers piled on top
of each other. As long as you get a well-defined interface, you don't care
what goes on underneath. To me, accepting a black box, and not looking
inside, is foolish and leads to wasteful code. Again, one of the benefits of
Open Source is that you are free to examine the code you're working with. If
you care about using a tool effectively, you really owe it to yourself to
look inside, learn how it works, and how it works best. (But of course, I'm
the type who likes to open up black boxes and look inside. Hmm...)

Hopefully you, as users of the OpenLDAP APIs, will take an interest in
looking inside the code to see how it works, and where it can be made better.

  -- Howard Chu
  Chief Architect, Symas Corp.       Director, Highland Sun
  http://www.symas.com               http://highlandsun.com/hyc
  Symas: Premier OpenSource Development and Support