Full_Name: Andrew Gray
Version: 2.4.17
OS: Debian 5.0
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (131.216.14.1)


On receiving LDAP queries with a pagedResultsControl (in this case with a size
of 250), back-sql generates an extremely inefficient query for every iteration,
of the form:

SELECT DISTINCT ldap_entries.id,people.local_id,text('UNLVexpperson') AS
objectClass,ldap_entries.dn AS dn FROM ldap_entries,people,ldap_entry_objclasses
WHERE people.local_id=ldap_entries.keyval AND ldap_entries.oc_map_id=1 AND
upper(ldap_entries.dn) LIKE upper('%'||'%OU=PEOPLE,DC=UNLV,DC=EDU') AND
ldap_entries.id>250 AND (2=2 OR (ldap_entries.id=ldap_entry_objclasses.entry_id
AND ldap_entry_objclasses.oc_name='UNLVexpperson'))

(This repeats for id>250, id>500, id>750, etc.)

Ideally (IMO) there should be a SQL LIMIT applied here: in this case slapd
gets back a few tens of thousands of rows on every iteration, its memory usage
explodes, and the process eventually gets killed.
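To make the request concrete, here is a minimal sketch of the kind of bounded query the report asks for, run against an in-memory SQLite table. The single-table layout is a simplified stand-in for ldap_entries, and `make_db` and `fetch_page` are hypothetical helpers for illustration, not back-sql code.

```python
import sqlite3


def make_db(n_entries):
    """Build a toy ldap_entries table (simplified stand-in, not the real back-sql schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ldap_entries (id INTEGER PRIMARY KEY, dn TEXT)")
    conn.executemany(
        "INSERT INTO ldap_entries (id, dn) VALUES (?, ?)",
        [(i, f"uid=user{i},ou=People,dc=unlv,dc=edu") for i in range(1, n_entries + 1)],
    )
    return conn


def fetch_page(conn, last_id, page_size):
    """Return at most page_size rows with id > last_id, in id order."""
    cur = conn.execute(
        "SELECT id, dn FROM ldap_entries WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, page_size),
    )
    return cur.fetchall()


conn = make_db(1000)
page = fetch_page(conn, 250, 250)
print(len(page), page[0][0], page[-1][0])
```

With the LIMIT in place, each iteration transfers at most one page of rows instead of every row above the current id.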
Andrew.Gray@unlv.edu wrote:
> Ideally (IMO) there really should be a SQL LIMIT applied here, as in this case
> slapd gets back a few tens of thousands of rows on every iteration, and the
> memory usage explodes and eventually gets killed.

Using back-sql on large databases along with the pagedResults control is not
advisable. Limiting the number of entries returned by each query is not
viable either, since some entries might not match the LDAP filter, ACLs, or
similar checks, possibly leading to fewer than pageSize entries being returned
within one page. PagedResults could be removed from back-sql and dealt
with by an overlay that simply pages the results returned by back-sql in a
single internal search; this is probably the preferable approach, since
it would also reduce the complexity of back-sql.
However, I have little interest in improving back-sql, so patches are
welcome, as usual...

p.
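The objection to a plain LIMIT can be illustrated with a toy model. Rows come back from SQL in id order, and a second check (standing in for the LDAP filter and ACL evaluation mentioned above) discards some of them afterwards. All function names here are hypothetical, and the "only even ids are visible" rule is an arbitrary assumption for the demonstration.

```python
def sql_rows(all_ids, last_id, limit=None):
    """Stand-in for the SQL query: ids greater than last_id, optionally capped by limit."""
    rows = [i for i in sorted(all_ids) if i > last_id]
    return rows if limit is None else rows[:limit]


def page_with_limit(all_ids, last_id, page_size, visible):
    """Apply LIMIT page_size in SQL, then filter; the page may come up short."""
    return [i for i in sql_rows(all_ids, last_id, page_size) if visible(i)]


def page_without_limit(all_ids, last_id, page_size, visible):
    """Fetch every remaining row, filter, and stop once page_size entries are collected."""
    page = []
    for i in sql_rows(all_ids, last_id):
        if visible(i):
            page.append(i)
            if len(page) == page_size:
                break
    return page


def is_even(i):
    """Pretend only even ids pass the filter/ACL check."""
    return i % 2 == 0


ids = range(1, 21)
print(page_with_limit(ids, 0, 4, is_even))     # [2, 4]: only half a page
print(page_without_limit(ids, 0, 4, is_even))  # [2, 4, 6, 8]: a full page
```

The unbounded variant fills the page but pays for it by pulling every remaining row, which is the memory problem described in the report.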
masarati@aero.polimi.it wrote:
> PagedResults could be removed from back-sql, and dealt
> with by an overlay that simply pages results returned by back-sql in a
> single internal search; probably this is the preferable approach, since
> it would also result in a reduction of the complexity of back-sql.
> However, I have little interest in improving back-sql, so patches are
> welcome, as usual...

The sssvlv overlay already intercepts pagedResults requests if they occur in
combination with the Sort control. It would be trivial to extend it to always
intercept pagedResults, and then we can rip the paging support out of each of
the backends. (Of course, there's a marginal efficiency advantage to letting
back-bdb/hdb do its own paging. A configurable option might be best.)

-- 
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
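The idea of an overlay that pages the results of a single internal search can be modelled in a few lines. This is only a conceptual sketch in Python, not slapd overlay code (which is written in C against slapd's own API); the class name, the integer cookies, and the `backend_search` callable are all assumptions made for illustration.

```python
import itertools


class PagingOverlay:
    """Toy model of an overlay that pages the results of one backend search."""

    def __init__(self, backend_search):
        self._backend_search = backend_search  # callable returning an iterable of entries
        self._pending = {}                     # cookie -> iterator over remaining entries
        self._next_cookie = itertools.count(1)

    def search(self, page_size, cookie=None):
        """Return (entries, cookie); cookie is None once the result set is exhausted."""
        if cookie is None:
            results = iter(self._backend_search())  # the single internal search
        else:
            results = self._pending.pop(cookie)     # resume the earlier search
        page = list(itertools.islice(results, page_size))
        # Peek one entry ahead to learn whether another page exists.
        peeked = list(itertools.islice(results, 1))
        if not peeked:
            return page, None
        new_cookie = next(self._next_cookie)
        self._pending[new_cookie] = itertools.chain(peeked, results)
        return page, new_cookie


overlay = PagingOverlay(lambda: range(1, 8))
entries, cookie = overlay.search(3)
print(entries)
while cookie is not None:
    entries, cookie = overlay.search(3, cookie)
    print(entries)
```

In this model the backend never sees the paging request at all: it runs one search, and the overlay keeps the remaining results keyed by cookie between client requests.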
Howard Chu wrote:
> The sssvlv overlay already intercepts pagedResults requests if they
> occur in combination with the Sort control. It would be trivial to
> extend it to always intercept pagedResults, and then we can rip the
> paging support out of each of the backends. (Of course, there's a
> marginal efficiency advantage to letting back-bdb/hdb do its own paging.
> A configurable option might be best.)

That's more or less what I had in mind. I assume you merged the two
functionalities into one overlay because pagedResults needs special care when
combined with SSSVLV; this might be true for other functionalities as well,
though (e.g. making pagedResults efficient). Life would be much better without
pagedResults, since clients do not need it while it makes servers' lives harder.

With respect to conditionally exploiting the native pagedResults capabilities
of back-bdb/hdb, I only fear some issues related to glued databases. Those
could possibly be solved by disabling native back-bdb/hdb pagedResults handling
when used in glued databases or, more granularly, when a search spans more
than one database in a glued configuration, delegating the handling to the
overlay in those cases.

p.
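The more granular rule proposed for glued databases amounts to a small per-search decision. The sketch below is a hypothetical policy function, not anything in slapd; the parameter names and the "backend"/"overlay" return values are assumptions, and how slapd would actually count the databases a search spans is left out.

```python
def choose_pager(native_paging_enabled, databases_spanned):
    """Pick who handles pagedResults for one search (hypothetical policy).

    native_paging_enabled: the configurable option to let the backend page on its own.
    databases_spanned: number of databases the search touches in a glued configuration.
    """
    if native_paging_enabled and databases_spanned == 1:
        return "backend"
    return "overlay"


print(choose_pager(True, 1))   # backend
print(choose_pager(True, 3))   # overlay
print(choose_pager(False, 1))  # overlay
```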
See comments on how to fix this via enhancement, if desired.
changed notes
moved from Incoming to Software Enhancements