Full_Name: Daniel Armbrust
Version: 2.2.17
OS: Fedora Core 2
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (129.176.151.126)

I have hit a problem where OpenLDAP seems to stop using its indexes (or does something wrong with them): instead of quickly doing a one-level search, it ends up scanning the entire database, returning tons of "scope not okay" messages.

The interesting part is that this only happens on one particular node in my database. Renaming the node to something else has no effect. It also seems to be size related: when we split up the ldif that we load the database from, such that we load this node in pieces, we can load some parts of this node, but not all. For example, when I split this particular node into 3 parts, I can load part 1 and things work. If I then load part 3, things fail. If I back up and just load part 3, things still work. But when I then add part 1, things fail.

Configuration details: I have openldap 2.2.17 installed on a Fedora Core 2 machine, using a fully patched Berkeley DB 4.2.52. The problem originally surfaced on my 2.2.15 install. We have a custom schema, which I can provide if it would be useful. The rest of my slapd.conf file looks like this:

pidfile ndfrt.pid
schemacheck on
idletimeout 14400
threads 150
sizelimit 6000

access to * by * write

database bdb
suffix "service=NDF-RT,dc=LexGrid,dc=org"
rootdn "cn=yadayada,service=NDF-RT,dc=LexGrid,dc=org"
rootpw "something for me to know..."
directory /localwork/ldap/database/dbndfrt/
index objectClass eq
index conceptCode eq
index language pres,eq
index dc eq
index sourceConcept,targetConcept,association,presentationId eq
index text,entityDescription pres,eq,sub,subany

My DB_CONFIG file looks like this:

set_flags DB_TXN_NOSYNC
set_flags DB_TXN_NOT_DURABLE
set_cachesize 0 102400000 1

My full ldif file is 720 MB, so it's a little hard to post... But I load the entire database with slapadd.
After the database is loaded, if I connect up to it with my Softerra Ldap Administrator, and browse down to the problem node, everything works fine. I have Softerra configured to use paged results - currently set to 100 items. When I click to expand the problem node (get its immediate children), this is what the server does at log level 1 (the node I am expanding is "conceptCode=kc8"):

connection_get(9): got connid=0
connection_read(9): checking for input on id=0
ber_get_next
ber_get_next: tag 0x30 len 251 contents:
ber_get_next
ber_get_next on fd 9 failed errno=11 (Resource temporarily unavailable)
do_search
ber_scanf fmt ({miiiib) ber:
>>> dnPrettyNormal: <conceptCode=KC8,dc=concepts,codingScheme=NDF-RT,dc=codingSchemes,service=NDF-RT,dc=LexGrid,dc=org>
=> ldap_bv2dn(conceptCode=KC8,dc=concepts,codingScheme=NDF-RT,dc=codingSchemes,service=NDF-RT,dc=LexGrid,dc=org,0)
ldap_err2string
<= ldap_bv2dn(conceptCode=KC8,dc=concepts,codingScheme=NDF-RT,dc=codingSchemes,service=NDF-RT,dc=LexGrid,dc=org)=0 Success
=> ldap_dn2bv(272)
ldap_err2string
<= ldap_dn2bv(conceptCode=KC8,dc=concepts,codingScheme=NDF-RT,dc=codingSchemes,service=NDF-RT,dc=LexGrid,dc=org)=0 Success
=> ldap_dn2bv(272)
ldap_err2string
<= ldap_dn2bv(conceptCode=kc8,dc=concepts,codingScheme=ndf-rt,dc=codingschemes,service=ndf-rt,dc=lexgrid,dc=org)=0 Success
<<< dnPrettyNormal: <conceptCode=KC8,dc=concepts,codingScheme=NDF-RT,dc=codingSchemes,service=NDF-RT,dc=LexGrid,dc=org>, <conceptCode=kc8,dc=concepts,codingScheme=ndf-rt,dc=codingschemes,service=ndf-rt,dc=lexgrid,dc=org>
ber_scanf fmt (m) ber:
ber_scanf fmt ({M}}) ber:
=> get_ctrls
ber_scanf fmt ({m) ber:
ber_scanf fmt (b) ber:
ber_scanf fmt (m) ber:
=> get_ctrls: oid="1.2.840.113556.1.4.473" (noncritical)
ber_scanf fmt ({m) ber:
ber_scanf fmt (b) ber:
ber_scanf fmt (m) ber:
=> get_ctrls: oid="1.2.840.113556.1.4.319" (critical)
ber_scanf fmt ({im}) ber:
<= get_ctrls: n=2 rc=0 err=""
==> limits_get: conn=0 op=9 dn="[anonymous]"
=> bdb_search
bdb_dn2entry("conceptCode=kc8,dc=concepts,codingScheme=ndf-rt,dc=codingschemes,service=ndf-rt,dc=lexgrid,dc=org")
search_candidates: base="conceptCode=kc8,dc=concepts,codingScheme=ndf-rt,dc=codingschemes,service=ndf-rt,dc=lexgrid,dc=org" (0x0000000b) scope=1
=> bdb_dn2idl( "conceptCode=kc8,dc=concepts,codingScheme=ndf-rt,dc=codingschemes,service=ndf-rt,dc=lexgrid,dc=org" )
<= bdb_dn2idl: id=-1 first=12 last=2821415
=> bdb_presence_candidates (objectClass)
bdb_search_candidates: id=-1 first=12 last=2821415
=> send_search_entry: dn="propertyId=P-KC8-0,conceptCode=KC8,dc=concepts,codingScheme=NDF-RT,dc=codingSchemes,service=NDF-RT,dc=LexGrid,dc=org" (attrsOnly)
ber_flush: 127 bytes to sd 9
<= send_search_entry
bdb_search: 13 scope not okay
bdb_search: 14 scope not okay
bdb_search: 15 scope not okay
bdb_search: 16 scope not okay
bdb_search: 17 scope not okay
bdb_search: 18 scope not okay
bdb_search: 19 scope not okay
bdb_search: 20 scope not okay
bdb_search: 21 scope not okay
bdb_search: 22 scope not okay
bdb_search: 23 scope not okay
bdb_search: 24 scope not okay
bdb_search: 25 scope not okay
bdb_search: 26 scope not okay
bdb_search: 27 scope not okay
bdb_search: 28 scope not okay
bdb_search: 29 scope not okay
<SNIP>
bdb_search: 51 scope not okay
connection_get(9): got connid=0
connection_read(9): checking for input on id=0
ber_get_next
ber_get_next: tag 0x30 len 143 contents:
ber_get_next
ber_get_next on fd 9 failed errno=11 (Resource temporarily unavailable)
bdb_search: 52 scope not okay
bdb_search: 53 scope not okay
bdb_search: 54 scope not okay
do_search
ber_scanf fmt ({miiiib) ber:
>>> dnPrettyNormal: <conceptCode=KC8,dc=concepts,codingScheme=NDF-RT,dc=codingSchemes,service=NDF-RT,dc=LexGrid,dc=org>
=> ldap_bv2dn(conceptCode=KC8,dc=concepts,codingScheme=NDF-RT,dc=codingSchemes,service=NDF-RT,dc=LexGrid,dc=org,0)
ldap_err2string
<= ldap_bv2dn(conceptCode=KC8,dc=concepts,codingScheme=NDF-RT,dc=codingSchemes,service=NDF-RT,dc=LexGrid,dc=org)=0 Success
=> ldap_dn2bv(272)
ldap_err2string
<= ldap_dn2bv(conceptCode=KC8,dc=concepts,codingScheme=NDF-RT,dc=codingSchemes,service=NDF-RT,dc=LexGrid,dc=org)=0 Success
=> ldap_dn2bv(272)
ldap_err2string
<= ldap_dn2bv(conceptCode=kc8,dc=concepts,codingScheme=ndf-rt,dc=codingschemes,service=ndf-rt,dc=lexgrid,dc=org)=0 Success
<<< dnPrettyNormal: <conceptCode=KC8,dc=concepts,codingScheme=NDF-RT,dc=codingSchemes,service=NDF-RT,dc=LexGrid,dc=org>, <conceptCode=kc8,dc=concepts,codingScheme=ndf-rt,dc=codingschemes,service=ndf-rt,dc=lexgrid,dc=org>
ber_scanf fmt (m) ber:
ber_scanf fmt ({M}}) ber:
==> limits_get: conn=0 op=10 dn="[anonymous]"
=> bdb_search
bdb_dn2entry("conceptCode=kc8,dc=concepts,codingScheme=ndf-rt,dc=codingschemes,service=ndf-rt,dc=lexgrid,dc=org")
=> send_search_entry: dn="conceptCode=KC8,dc=concepts,codingScheme=NDF-RT,dc=codingSchemes,service=NDF-RT,dc=LexGrid,dc=org"
ber_flush: 195 bytes to sd 9
<= send_search_entry
bdb_search: 55 scope not okay
send_ldap_result: conn=0 op=10 p=3
send_ldap_response: msgid=31 tag=101 err=0
ber_flush: 14 bytes to sd 9
bdb_search: 56 scope not okay
bdb_search: 57 scope not okay
<SNIP>
bdb_search: 221 scope not okay
entry_decode: "propertyId=P-C190-0,conceptCode=C190,dc=concepts,codingScheme=NDF-RT,dc=codingSchemes,service=NDF-RT,dc=LexGrid,dc=org"
<= entry_decode(propertyId=P-C190-0,conceptCode=C190,dc=concepts,codingScheme=NDF-RT,dc=codingSchemes,service=NDF-RT,dc=LexGrid,dc=org)
=> bdb_dn2id( "propertyId=p-c190-0,conceptCode=c190,dc=concepts,codingScheme=ndf-rt,dc=codingschemes,service=ndf-rt,dc=lexgrid,dc=org" )
<= bdb_dn2id: got id=0x000000de
bdb_search: 222 scope not okay
entry_decode: "propertyId=SearchName-1,conceptCode=C190,dc=concepts,codingScheme=NDF-RT,dc=codingSchemes,service=NDF-RT,dc=LexGrid,dc=org"
<= entry_decode(propertyId=SearchName-1,conceptCode=C190,dc=concepts,codingScheme=NDF-RT,dc=codingSchemes,service=NDF-RT,dc=LexGrid,dc=org)
=> bdb_dn2id( "propertyId=searchname-1,conceptCode=c190,dc=concepts,codingScheme=ndf-rt,dc=codingschemes,service=ndf-rt,dc=lexgrid,dc=org" )
<= bdb_dn2id: got id=0x000000df
bdb_search: 223 scope not okay
entry_decode: "conceptCode=C192,dc=concepts,codingScheme=NDF-RT,dc=codingSchemes,service=NDF-RT,dc=LexGrid,dc=org"
<= entry_decode(conceptCode=C192,dc=concepts,codingScheme=NDF-RT,dc=codingSchemes,service=NDF-RT,dc=LexGrid,dc=org)
=> bdb_dn2id( "conceptCode=c192,dc=concepts,codingScheme=ndf-rt,dc=codingschemes,service=ndf-rt,dc=lexgrid,dc=org" )
<= bdb_dn2id: got id=0x000000e0
bdb_search: 224 scope not okay

And it continues like this until the timeout limit is reached, and an error is thrown back to the client.

The last time I saw this error was when there was a bug in the paged results code - but this occurs no matter what the paged results setting is. There are other large nodes in this database, and they all work correctly. I have also loaded over 2 GB of ldif into other openldap databases before without running into this error. What I don't know, however, is whether I have ever loaded this many entries under one node before. This node itself is about 190 MB worth of ldif, so I could be hitting a limitation (or bug) there that I have never tickled before.
daniel.armbrust@mayo.edu wrote:

>Full_Name: Daniel Armbrust
>Version: 2.2.17
>OS: Fedora Core 2
>URL: ftp://ftp.openldap.org/incoming/
>Submission from: (NULL) (129.176.151.126)
>
>
>I have hit a problem where Openldap seems to stop using its indexes (or just
>does something wrong with them) and instead of quickly do a one level search, it
>ends up scanning the entire database, returning tons of "scope not ok"
>messages.
>
>The interesting part is that this only happens on 1 particular node in my
>database. Renaming the node to something else has no affect. It also seems to
>be size related - as when we split up the ldif that we load the database from,
>such that we load this node in pieces, we can load some parts of this node, but
>not all.
>
>
Exactly how large is this "node"? You say it fails when you do a one-level search under it - how many immediate children does it have?

There is a known limitation in back-bdb's index design: when any index slot hits 65536 entries, it gets converted from an explicit list of entries into a "range". If the entries in this slot were not added in sorted order, then the range may span a large portion of the database. For example, assuming the slot size was 4, and you had an index slot with entry IDs 2, 6, 25, 57: if you added a new entry under this slot, entry ID 99, this index slot would be converted into a range 2-99, which would include quite a large number of entries that really have nothing to do with that slot.

You can tweak the slot sizes in back-bdb/idl.h (BDB_IDL_DB_SIZE and BDB_IDL_UM_SIZE) and recompile. I believe UM_SIZE must always be at least twice DB_SIZE. You will also need to dump the database to LDIF before making this change, and reload from scratch afterward.

Also, loading your database in sorted order will help minimize the impact of this problem. I.e., make sure that all of the children of a particular node are loaded contiguously, without other intervening entries.
This only helps when the DIT is relatively flat. Originally back-hdb did not have this problem, although it does now because it shares the same search/indexing mechanism.

--
  -- Howard Chu
  Chief Architect, Symas Corp.       Director, Highland Sun
  http://www.symas.com               http://highlandsun.com/hyc
  Symas: Premier OpenSource Development and Support
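The index-slot overflow described above can be sketched in a few lines of C. This is purely an illustration of the behavior, not the actual back-bdb code (which lives in servers/slapd/back-bdb/idl.c); the `idl_slot` type and `idl_insert` function are invented for this example, the slot capacity is shrunk from 65536 to 4 to match the numbers in the example, and IDs are assumed to be appended in ascending order to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>

#define SLOT_CAPACITY 4 /* stand-in for BDB_IDL_DB_SIZE (65536 by default) */

typedef unsigned long ID;

/* Hypothetical, simplified index slot: either an explicit list of
 * entry IDs, or - once capacity is exceeded - a [first, last] range. */
typedef struct {
    int is_range;
    size_t count;
    ID ids[SLOT_CAPACITY];  /* used while !is_range */
    ID first, last;         /* used once is_range */
} idl_slot;

static void idl_insert(idl_slot *s, ID id)
{
    if (s->is_range) {
        /* Already a range: just widen it. */
        if (id < s->first) s->first = id;
        if (id > s->last)  s->last  = id;
        return;
    }
    if (s->count == SLOT_CAPACITY) {
        /* Overflow: collapse the explicit list into a range spanning
         * min..max. If the IDs were not added in sorted order, this
         * range can cover a large part of the database, including
         * entries that have nothing to do with this slot. */
        s->is_range = 1;
        s->first = (s->ids[0] < id) ? s->ids[0] : id;
        s->last  = (s->ids[s->count - 1] > id) ? s->ids[s->count - 1] : id;
        return;
    }
    s->ids[s->count++] = id;  /* assumes ascending insertion order */
}
```

With the numbers from the example above: after inserting IDs 2, 6, 25 and 57, adding ID 99 overflows the slot and collapses it to the range 2-99, so a later candidate scan over this slot has to visit every entry ID in that span and reject the out-of-scope ones - which is what the flood of "scope not okay" messages looks like from the outside.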
Thanks for the info. I'll try changing the parameters and reloading it next week sometime.

We have 259,423 direct children and about 690,264 total children under the problem node.

Dan

-----Original Message-----
From: Howard Chu [mailto:hyc@symas.com]
Sent: Friday, October 01, 2004 4:38 AM
To: Armbrust, Daniel C.
Cc: openldap-its@OpenLDAP.org
Subject: Re: scope not ok errors on very large databases (ITS#3343)
changed notes
changed state Open to Closed
I changed the values you recommended to the following (code from openldap-2.2.17/servers/slapd/back-bdb/idl.h):

/* IDL sizes - likely should be even bigger
 * limiting factors: sizeof(ID), thread stack size
 */
#define BDB_IDL_DB_SIZE (1<<18) /* 64K IDL on disk - dan modified to 256K */
#define BDB_IDL_UM_SIZE (1<<19) /* 128K IDL in memory - dan modified to 512K */

And now I get a segmentation fault when I run "make test":

>>>>> Starting test003-search ...
running defines.sh
Running slapadd to build slapd database...
Running slapindex to index slapd database...
Starting slapd on TCP/IP port 9011...
Testing slapd searching...
Waiting 5 seconds for slapd to start...
Testing exact searching...
Testing approximate searching...
Testing OR searching...
Testing AND matching and ends-with searching...
./scripts/test003-search: line 100: 7856 Segmentation fault $SLAPD -f $CONF1 -h $URI1 -d $LVL $TIMING >$LOG1 2>&1
ldapsearch failed (255)!
./scripts/test003-search: line 104: kill: (7856) - No such process
>>>>> ./scripts/test003-search failed (exit 255)
make[2]: *** [bdb-yes] Error 255
make[2]: Leaving directory `/home/armbrust/temp/openldap-2.2.17/tests'
make[1]: *** [test] Error 2
make[1]: Leaving directory `/home/armbrust/temp/openldap-2.2.17/tests'
make: *** [test] Error 2

Did I mess up changing the params?

Dan
Further data point... If I only double (instead of quadruple) the values, so that I'm now using:

#define BDB_IDL_DB_SIZE (1<<17) /* 64K IDL on disk - dan modified to 128K */
#define BDB_IDL_UM_SIZE (1<<18) /* 128K IDL in memory - dan modified to 256K */

then all make tests pass. I don't think this will be enough extra size to fix my problem, however... I'm starting a new load right now to determine if the behavior has changed at all.

Dan
Hm... I would need a gdb stack trace to be sure, but most likely the new size is too large for the regular thread stack. You'll need to increase LDAP_PVT_THREAD_STACK_SIZE and recompile libldap_r to change that, and relink slapd.

--
  -- Howard Chu
  Chief Architect, Symas Corp.       Director, Highland Sun
  http://www.symas.com               http://highlandsun.com/hyc
  Symas: Premier OpenSource Development and Support
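The arithmetic behind this diagnosis can be written out (a back-of-the-envelope sketch only - the real stack layout of a slapd search is more involved, and the helper name below is invented for the illustration):

```c
#include <assert.h>
#include <stdio.h>

/* One in-memory IDL is BDB_IDL_UM_SIZE slots of 4 bytes each
 * ("every IDL slot is 4 bytes"). */
static unsigned long idl_bytes(unsigned long um_slots)
{
    return um_slots * 4UL;
}

int report(void)
{
    unsigned long stack = 4UL * 1024 * 1024;      /* default LDAP_PVT_THREAD_STACK_SIZE */
    unsigned long idl   = idl_bytes(1UL << 19);   /* BDB_IDL_UM_SIZE quadrupled to 1<<19 */

    /* With the quadrupled value, one IDL is 2 MB; a couple of those as
     * locals on a 4 MB thread stack leave no headroom for ordinary
     * stack frames, which is consistent with the make-test segfault. */
    printf("one IDL: %lu KB, thread stack: %lu KB\n", idl / 1024, stack / 1024);
    return 2 * idl >= stack;
}
```

The doubled values (1<<18, a 1 MB IDL) stay comfortably under the same 4 MB stack, which fits Dan's later observation that doubling passed make test while quadrupling did not.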
Followup to the last post - when I reloaded my problematic database on openldap 2.2.17 using these parameters:

#define BDB_IDL_DB_SIZE (1<<17) /* 64K IDL on disk - dan modified to 128K */
#define BDB_IDL_UM_SIZE (1<<18) /* 128K IDL in memory - dan modified to 256K */

the problem that I reported in the initial post went away. I am now able to view, search, etc. on this large node in my database. This surprised me, because I didn't think that I had increased the size of the key enough to fix the problem. It must be because my ~260,000 entries are being split across multiple keys (I'm not sure why; they are almost all aliases).

Am I likely to run into any other problems by using these larger values? If not, is there a reason not to update openldap itself to use these larger values?

Thanks,
Dan
Ps - Howard, you were right again about LDAP_PVT_THREAD_STACK_SIZE. I tried changing the multiple from 4 to 8, then changed the other variables back to bit shifts of 18 and 19, rebuilt all of openldap, and this time all of the make tests passed.

I suppose this issue is a matter of balancing the scalability of openldap against its ability to run on a machine with limited RAM. I'll add these changes to my notes, and hopefully remember to set them before all future builds.

Thanks for your expertise!
Dan

-----Original Message-----
From: Howard Chu [mailto:hyc@symas.com]
Sent: Wednesday, October 06, 2004 5:03 PM
To: Armbrust, Daniel C.
Cc: openldap-its@OpenLDAP.org
Subject: Re: [JunkMail] RE: scope not ok errors on very large databases (ITS#3343)
Armbrust, Daniel C. wrote:
>
>Am I likely to run into any other problems by using these larger values? If not, is there a reason not to update openldap itself to use these larger values?
>
>
The main issue is memory usage. Every IDL slot is 4 bytes, so 256K of them is 1024KB of memory. back-bdb preallocates a search stack for every thread; this stack is configurable in slapd.conf but defaults to 8 chunks, so that's 8*1024KB = 8MB. Also, one or two of them may need to fit on the regular thread stack, as you already saw. All of this adds up quickly, especially if you have a large number of threads configured.

These default values were chosen a long time ago, when slapd still defaulted to 32 threads (as opposed to 16 now), and were reasonable for a typical 32-bit machine. But obviously there's room here for tuning, and if you were to create a 64-bit build you'd probably want even larger sizes.

--
  -- Howard Chu
  Chief Architect, Symas Corp.       Director, Highland Sun
  http://www.symas.com               http://highlandsun.com/hyc
  Symas: Premier OpenSource Development and Support
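Howard's memory arithmetic, written out as illustrative helpers (the function names are invented; the 4-byte slot size, 8-chunk search-stack depth, and 16-thread default all come from his message):

```c
#include <assert.h>

/* Size of one in-memory IDL, in KB: um_slots slots of 4 bytes each. */
static unsigned long idl_kb(unsigned long um_slots)
{
    return um_slots * 4UL / 1024;
}

/* Search stack preallocated per thread, in KB: one IDL per chunk. */
static unsigned long search_stack_kb(unsigned long um_slots, unsigned long chunks)
{
    return idl_kb(um_slots) * chunks;
}

/* With BDB_IDL_UM_SIZE = 1<<18 (Dan's doubled value):
 *   idl_kb(1<<18)                ->  1024 KB per IDL
 *   search_stack_kb(1<<18, 8)    ->  8192 KB (8 MB) per thread
 *   x 16 threads                 ->  128 MB preallocated in total */
```

This makes concrete why the defaults were left small: scaling the slot count up by 4x scales this whole preallocation by 4x as well, which is painful on a RAM-limited 32-bit machine but much less of a concern on a 64-bit build.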
Possibly related information I forgot to put in the initial report: I have made the following modifications to my instance of 2.2.23 (because of this bug: http://www.openldap.org/its/index.cgi?findid=3343).

In the file 'servers/slapd/back-bdb/idl.h' I modify these two lines:

#define BDB_IDL_DB_SIZE (1<<16) /* 64K IDL on disk*/
#define BDB_IDL_UM_SIZE (1<<17) /* 128K IDL in memory*/

so that they read:

#define BDB_IDL_DB_SIZE (1<<18) /* 256K IDL on disk*/
#define BDB_IDL_UM_SIZE (1<<19) /* 512K IDL in memory*/

In the file 'include/ldap_pvt_thread.h', the line that says:

#define LDAP_PVT_THREAD_STACK_SIZE (4*1024*1024)

is changed to:

#define LDAP_PVT_THREAD_STACK_SIZE (8*1024*1024)
moved from Incoming to Archive.Incoming
expected behavior