[Date Prev][Date Next] [Chronological] [Thread] [Top]

Re: Plea for Server Side Sorting



On Fri, Apr 01, 2005 at 11:49:38AM -0600, Armbrust, Daniel C. wrote:

> Unnecessary to one, necessary to another.  When we load 1 million
> concept codes from large terminologies into an ldap server, sorting
> them on the client side is not feasible.

True. There are certainly cases where server-side sorting would be
useful, but it has severe performance implications of its own...
Operations that work well for a handful of clients sometimes cause
complete breakdown when used by hundreds or thousands of clients.

> The sorting should be done on
> the server - because the server can do things the client can't - it can
> pre-compute the sort order and it typically has more memory/processing
> power for doing the sorting.

The sort *order* is not really the problem. OpenLDAP can probably
extract pre-sorted results from some backend types if the right
attributes are indexed. The problem comes from the fact that a search
must first be performed and then the results of that search must be
sorted. This tends to mean that the whole result set must be held in a
buffer somewhere on the server, even if the client only chooses to
download a small subset from it using the paged results service.
The buffer cannot be discarded until either the whole result set has
been downloaded by the client, or the client has abandoned the
operation.

Typical user-interface demands for this sort of service come from
implementing apparently-enormous scrolling lists. You see this sort of
thing in proprietary addressbook interfaces, which have unfortunately
taught people to scroll endlessly rather than typing a few characters
to form a good search. As an addressbook scrolling window could be
open for hours or even days, correct implementation requires that the
result set it is scrolling through must be stored somewhere. If you do
that on the server it chews up immense amounts of server resources,
and if you do it on the client it chews up immense amounts of network
resources (and time, to transfer the data). No win...

> I know that IBM's Tivoli supports sorted results on large data sets that
> behave as you expect they would.

True, but it also has a tuning parameter to limit the number of
sorted-results searches that can be outstanding at any one time. That
parameter is normally set to a very low value (default 3) because
the resource implications can be crippling. There are also controls
to restrict both sorted-results and paged-results operations to use by
administrators only. Similar controls are found on other commercial LDAP
servers that have SQL-based backends.

> making the point that sorting results isn't a limitation of LDAP design
> in general - its more a limitation of the underlying implementation
> of OpenLDAP.   I imagine if it were easy to implement, you probably
> wouldn't get such a strong pushback from your offer.

OK, here's an idea...

Assuming that the main demand for server-side sorting is to work with
paged results and cursors in order to implement those scrolling lists of
names in addressbook interfaces, we might relax some requirements. Maybe
we can relax enough to make the implementation feasible. I suggest that
*for this particular application* we might allow:

1)	The scrolling list might not be constant while you scroll.
	This means that you could view part of the list, scroll to
	another part, come back to the first page, and see different
	entries.

	Allowing this inconsistency permits the server to re-generate
	parts of the result set on demand, so it does not have to
	store vast sets of temporary results for each client.

2)	The form of the search and the sort might be constrained to
	one of a very small number of admin-defined profiles.

	This would allow the temporary result sets to be shared
	between clients.

3)	The data shown in the sorted lists might be permitted to be
	rather less up-to-date than that returned from lower-overhead
	directory operations.

	This would permit the server to re-generate the temporary
	result sets on an occasional basis. The frequency could be
	chosen by the site admin.

I suspect that applying some combination of those relaxations would
allow a more scalable server-side sorting implementation, perhaps as
an overlay.

Andrew
-- 
-----------------------------------------------------------------------
|                 From Andrew Findlay, Skills 1st Ltd                 |
| Consultant in large-scale systems, networks, and directory services |
|     http://www.skills-1st.co.uk/                +44 1628 782565     |
-----------------------------------------------------------------------