Full_Name: Thomas Reifferscheid Version: 2.3.24 OS: Linux/2.6.16 URL: http://student.physik.uni-mainz.de/~reiffert/back_sql.txt Submission from: (NULL) (84.168.189.212) I'm using Thunderbird as slapd frontend. When searching for an recipient whose name contains special chars like "���" (german umlauts), the slapd's syslog correctly shows the utf-8 encoded characters. slapd then hands over the filter to back-sql which then returns "invalid filter". When searching for "M*ller", so replacing "�" by "*", the answers that get back to thunderbird, correctly show "M�ller". A loglevel 65535 output from a "m�l" search can be found at the provided URL. Kind regards Thomas Reifferscheid
Additional Information: The ITS-Webinterface replaces the mentioned special german chars by "." You can find examples here: http://de.wikipedia.org/wiki/Umlaut#Umlautbuchstaben (öäüÖÄÜß) -> (.......) So when reading the original issue, please keep in mind, that you have to replace "." by one of those chars. Hope that helps Thomas Reifferscheid
changed notes moved from Incoming to Software Enhancements
On Sat, 2006-07-01 at 08:07 +0000, reiffert@student.physik.uni-mainz.de wrote: > Full_Name: Thomas Reifferscheid > Version: 2.3.24 > OS: Linux/2.6.16 > URL: http://student.physik.uni-mainz.de/~reiffert/back_sql.txt > Submission from: (NULL) (84.168.189.212) > > > I'm using Thunderbird as slapd frontend. > When searching for an recipient whose name contains special chars > like "üöä" (german umlauts), the slapd's syslog correctly shows the utf-8 > encoded > characters. > > slapd then hands over the filter to back-sql which then returns > "invalid filter". > > > When searching for "M*ller", so replacing "ü" by "*", the answers > that get back to thunderbird, correctly show "Müller". > > > A loglevel 65535 output from a "mül" search can be found > at the provided URL. back-sql acts as a "dumb" backend to a generic RDBMS; LDAP requires UTF8 for directory strings, while the RDBMS uses whatever instructed to. You need to make sure appropriate conversion occurs in between. Back-sql does not provide any means to convert strings to/from another character set. A patch that consistently enables translitteration from UTF8 to whatever charset is used in the RDBMS for a given data type is welcome, provided it complies with the license and other contribution criteria of OpenLDAP software. p. Ing. Pierangelo Masarati Responsabile Open Solution OpenLDAP Core Team SysNet s.n.c. Via Dossi, 8 - 27100 Pavia - ITALIA http://www.sys-net.it ------------------------------------------ Office: +39.02.23998309 Mobile: +39.333.4963172 Email: pierangelo.masarati@sys-net.it ------------------------------------------
Dear Pieranelo, regarding http://student.physik.uni-mainz.de/~reiffert/back_sql.txt ... On Sat, 1 Jul 2006, Pierangelo Masarati wrote: > back-sql acts as a "dumb" backend to a generic RDBMS; LDAP requires UTF8 > for directory strings, Jul 1 09:51:10 hell slapd[25812]: filter: (|(cn=m\C3\BCl*)(?=undefined)(sn=m\C3\BCl*)) isn't that in the exact right way it has to be? If not, how has it to look like at this place so that the answer is not: Jul 1 09:51:10 hell slapd[25812]: backsql_process_filter(): invalid filter > while the RDBMS uses whatever instructed to. You > need to make sure appropriate conversion occurs in between. Back-sql > does not provide any means to convert strings to/from another character > set. A patch that consistently enables translitteration from UTF8 to > whatever charset is used in the RDBMS for a given data type is welcome, > provided it complies with the license and other contribution criteria of > OpenLDAP software. If I'd understand you better, I probably will start finding a workaround to enhance openldap getting around that nasty thing! Kind regards Thomas Reifferscheid
On Sat, 2006-07-01 at 16:47 +0200, Thomas Reifferscheid wrote: > Dear Pieranelo, > > regarding http://student.physik.uni-mainz.de/~reiffert/back_sql.txt > ... > > > On Sat, 1 Jul 2006, Pierangelo Masarati wrote: > > > back-sql acts as a "dumb" backend to a generic RDBMS; LDAP requires UTF8 > > for directory strings, > > > Jul 1 09:51:10 hell slapd[25812]: filter: > (|(cn=m\C3\BCl*)(?=undefined)(sn=m\C3\BCl*)) This seems to indicate that the second AVA in your filter is incorrect. I have no idea as per what Thunderbird sends as search filter, but likely either that attribute is not defined in your slapd's schema, or its value does not comply with that attribute's syntax. I suggest you find out (e.g. by looking at logs with level "packets") what filter is actually sent; then check if the attribute is defined in your schema and if the value complies with the syntax of that attribute. > > > isn't that in the exact right way it has to be? > > If not, how has it to look like at this place so that > > the answer is not: > > Jul 1 09:51:10 hell slapd[25812]: backsql_process_filter(): invalid > filter > > > > while the RDBMS uses whatever instructed to. You > > need to make sure appropriate conversion occurs in between. Back-sql > > does not provide any means to convert strings to/from another character > > set. A patch that consistently enables translitteration from UTF8 to > > whatever charset is used in the RDBMS for a given data type is welcome, > > provided it complies with the license and other contribution criteria of > > OpenLDAP software. > > If I'd understand you better, I probably will start finding a workaround > to enhance openldap getting around that nasty thing! This remains valid: LDAP strings are UTF8 encoded; data in your RBBMS is encoded whatever it was set as. back-sql should be smart enough to allow mapping, but right now it's not. p. Ing. Pierangelo Masarati Responsabile Open Solution OpenLDAP Core Team SysNet s.n.c. Via Dossi, 8 - 27100 Pavia - ITALIA http://www.sys-net.it ------------------------------------------ Office: +39.02.23998309 Mobile: +39.333.4963172 Email: pierangelo.masarati@sys-net.it ------------------------------------------
On Sat, 1 Jul 2006, Pierangelo Masarati wrote: >> Jul 1 09:51:10 hell slapd[25812]: filter: >> (|(cn=m\C3\BCl*)(?=undefined)(sn=m\C3\BCl*)) > > This seems to indicate that the second AVA in your filter is incorrect. > I have no idea as per what Thunderbird sends as search filter, but > likely either that attribute is not defined in your slapd's schema, or > its value does not comply with that attribute's syntax. Oh, how could I've been so blind. Thanks for the hint. Here is the result: working filter, for entering "kepper" into thunderbird: ==>backsql_search(): base="dc=foo,dc=de", filter="(|(cn=kepper*)(mail=kepper*)(sn=kepper*))" And here is what happens when I enter "Müller", the word with the german special char: ==>backsql_search(): base="dc=foo,dc=de", filter="(|(cn=m\C3\BCller*)(?=undefined)(sn=m\C3\BCller*))" Now let's the why. starting slapd with -d 255 showed up: ldap_read: want=101, got=101 0000: 11 64 63 3d 64 69 65 66 69 72 6d 61 2c 64 63 3d .dc=diefirma,dc= 0010: 64 65 0a 01 02 0a 01 00 02 02 00 c8 02 01 00 01 de.............. 0020: 01 00 a1 35 a4 0f 04 02 63 6e 30 09 80 07 6d c3 ...5....cn0...m. 0030: bc 6c 6c 65 72 a4 11 04 04 6d 61 69 6c 30 09 80 .ller....mail0.. 0040: 07 6d c3 bc 6c 6c 65 72 a4 0f 04 02 73 6e 30 09 .m..ller....sn0. 0050: 80 07 6d c3 bc 6c 6c 65 72 30 0a 04 02 63 6e 04 ..m..ller0...cn. 0060: 04 6d 61 69 6c .mail ber_get_next: tag 0x30 len 107 contents: ber_dump: buf=0x08256e08 ptr=0x08256e08 end=0x08256e73 len=107 0000: 02 01 02 63 66 04 11 64 63 3d 64 69 65 66 69 72 ...cf..dc=diefir 0010: 6d 61 2c 64 63 3d 64 65 0a 01 02 0a 01 00 02 02 ma,dc=de........ 0020: 00 c8 02 01 00 01 01 00 a1 35 a4 0f 04 02 63 6e .........5....cn 0030: 30 09 80 07 6d c3 bc 6c 6c 65 72 a4 11 04 04 6d 0...m..ller....m 0040: 61 69 6c 30 09 80 07 6d c3 bc 6c 6c 65 72 a4 0f ail0...m..ller.. 0050: 04 02 73 6e 30 09 80 07 6d c3 bc 6c 6c 65 72 30 ..sn0...m..ller0 0060: 0a 04 02 63 6e 04 04 6d 61 69 6c ...cn..mail ber_get_next So it still looks like everything is fine. What came to my mind here was, "is the attribute mail allowed to contain utf-8 chars by the schema?" but I could not solve that question yet. Let's see what happens next: begin get_filter_list begin get_filter SUBSTRINGS begin get_ssa ber_scanf fmt ({m) ber: ber_dump: buf=0x08256e08 ptr=0x08256e32 end=0x08256e73 len=65 0000:a4 0f 04 02 63 6e 30 0980 07 6d c3 bc 6c 6c 65 ....cn0...m..lle 0010:72 a4 11 04 04 6d 61 696c 30 09 80 07 6d c3 bc r....mail0...m.. 0020:6c 6c 65 72 a4 0f 04 0273 6e 30 09 80 07 6d c3 ller....sn0...m. 0030:bc 6c 6c 65 72 30 0a 0402 63 6e 04 04 6d 61 69 .ller0...cn..mai 0040:6c l ber_scanf fmt (m) ber: ber_dump: buf=0x08256e08 ptr=0x08256e3a end=0x08256e73 len=57 0000:80 07 6d c3 bc 6c 6c 6572 a4 11 04 04 6d 61 69 ..m..ller....mai 0010:6c 30 09 80 07 6d c3 bc6c 6c 65 72 a4 0f 04 02 l0...m..ller.... 0020:73 6e 30 09 80 07 6d c3bc 6c 6c 65 72 30 0a 04 sn0...m..ller0.. 0030:02 63 6e 04 04 6d 61 696c .cn..mail INITIAL end get_ssa Now it comes to scanf: begin get_ssa ber_scanf fmt ({m) ber: ber_dump: buf=0x08256e08 ptr=0x08256e43 end=0x08256e73 len=48 0000:00 11 04 04 6d 61 69 6c30 09 80 07 6d c3 bc 6c ....mail0...m..l 0010:6c 65 72 a4 0f 04 02 736e 30 09 80 07 6d c3 bc ler....sn0...m.. 0020:6c 6c 65 72 30 0a 04 0263 6e 04 04 6d 61 69 6c ller0...cn..mail ber_scanf fmt (m) ber: ber_dump: buf=0x08256e08 ptr=0x08256e4d end=0x08256e73 len=38 0000:80 07 6d c3 bc 6c 6c 6572 a4 0f 04 02 73 6e 30 ..m..ller....sn0 0010:09 80 07 6d c3 bc 6c 6c65 72 30 0a 04 02 63 6e ...m..ller0...cn 0020:04 04 6d 61 69 6c ..mail error=21 end get_filter 0 error=21 shows up. Finally let's the rest: begin get_filter SUBSTRINGS begin get_ssa ber_scanf fmt ({m) ber: ber_dump: buf=0x08256e08 ptr=0x08256e56 end=0x08256e73 len=29 0000:00 0f 04 02 73 6e 30 0980 07 6d c3 bc 6c 6c 65 ....sn0...m..lle 0010:72 30 0a 04 02 63 6e 0404 6d 61 69 6c r0...cn..mail ber_scanf fmt (m) ber: ber_dump: buf=0x08256e08 ptr=0x08256e5e end=0x08256e73 len=21 0000:80 07 6d c3 bc 6c 6c 6572 30 0a 04 02 63 6e 04 ..m..ller0...cn. 0010:04 6d 61 69 6c .mail INITIAL end get_ssa end get_filter 0 end get_filter_list end get_filter 0 filter: (|(cn=m\C3\BCller*)(?=undefined)(sn=m\C3\BCller*)) I hope the output will help here. > > I suggest you find out (e.g. by looking at logs with level "packets") > what filter is actually sent; then check if the attribute is defined in > your schema and if the value complies with the syntax of that attribute. So let's have a closer look on the syntax of mail in the schema files and how I've solved it: I changed in core.schema attributetype ( 0.9.2342.19200300.100.1.3 NAME ( 'mail' 'rfc822Mailbox' ) DESC 'RFC1274: RFC822 Mailbox' EQUALITY caseIgnoreIA5Match SUBSTR caseIgnoreIA5SubstringsMatch SYNTAX 1.3.6.1.4.1.1466.115.121.1.26{256} ) to attributetype ( 0.9.2342.19200300.100.1.3 NAME ( 'mail' 'rfc822Mailbox' ) DESC 'RFC1274: RFC822 Mailbox' EQUALITY caseIgnoreMatch SUBSTR caseIgnoreSubstringsMatch SYNTAX 1.3.6.1.4.1.1466.115.121.1.15{32768} ) and now it works! Thanks a lot for the right hint on the subject. The issue can be closed. Kind regards Thomas Reifferscheid
On Sat, 1 Jul 2006, Pierangelo Masarati wrote: > You shouldn't do that: changing standard track schema is not good > practice. As far as I remember, RFC 2822 does not allow arbitrary chars > in the local part of an address; they might be allowed in the display > name, but I fear it's restricted to plain ASCII as well. I suspect the > bug here, if any, is that back-sql should simply ignore that part of the > filter while converting it into an SQL query. This is what back-sql is > supposed to do; I'll check if it works as expected. Additional information: when the (mail=müller) arrives and get's translated to (?=undefined), there is no mysql-query performed against the mysql-db. Hope that helps checking. Kind regards Thomas Reifferscheid
On Sat, 2006-07-01 at 17:41 +0200, Thomas Reifferscheid wrote: > I changed in core.schema > > attributetype ( 0.9.2342.19200300.100.1.3 > NAME ( 'mail' 'rfc822Mailbox' ) > DESC 'RFC1274: RFC822 Mailbox' > EQUALITY caseIgnoreIA5Match > SUBSTR caseIgnoreIA5SubstringsMatch > SYNTAX 1.3.6.1.4.1.1466.115.121.1.26{256} ) > > > to > > > attributetype ( 0.9.2342.19200300.100.1.3 > NAME ( 'mail' 'rfc822Mailbox' ) > DESC 'RFC1274: RFC822 Mailbox' > EQUALITY caseIgnoreMatch > SUBSTR caseIgnoreSubstringsMatch > SYNTAX 1.3.6.1.4.1.1466.115.121.1.15{32768} ) > > > and now it works! > > > Thanks a lot for the right hint on the subject. > The issue can be closed. You shouldn't do that: changing standard track schema is not good practice. As far as I remember, RFC 2822 does not allow arbitrary chars in the local part of an address; they might be allowed in the display name, but I fear it's restricted to plain ASCII as well. I suspect the bug here, if any, is that back-sql should simply ignore that part of the filter while converting it into an SQL query. This is what back-sql is supposed to do; I'll check if it works as expected. p. p. Ing. Pierangelo Masarati Responsabile Open Solution OpenLDAP Core Team SysNet s.n.c. Via Dossi, 8 - 27100 Pavia - ITALIA http://www.sys-net.it ------------------------------------------ Office: +39.02.23998309 Mobile: +39.333.4963172 Email: pierangelo.masarati@sys-net.it ------------------------------------------
changed notes
On Sat, 2006-07-01 at 18:25 +0200, Thomas Reifferscheid wrote: > when the (mail=müller) arrives and get's > translated to (?=undefined), there is no mysql-query performed against > the mysql-db. Thanks; I've spotted this separate but related issue in filter translation; it should now be fixed in HEAD; you may pull it out of the CVS by getting a diff of servers/slapd/back-sql/search.c 1.118 -> 1.119; it should apply smoothly to 2.3.24. Please test at your convenience and report thru the ITS. p. Ing. Pierangelo Masarati Responsabile Open Solution OpenLDAP Core Team SysNet s.n.c. Via Dossi, 8 - 27100 Pavia - ITALIA http://www.sys-net.it ------------------------------------------ Office: +39.02.23998309 Mobile: +39.333.4963172 Email: pierangelo.masarati@sys-net.it ------------------------------------------
On Sat, 1 Jul 2006, Pierangelo Masarati wrote: > it should now be fixed in HEAD; you may pull it out of the > CVS by getting a diff of servers/slapd/back-sql/search.c 1.118 -> > 1.119; it should apply smoothly to 2.3.24. Please test at your > convenience and report thru the ITS. Using HEAD 1.119, search.c it works now: conn=1 op=1 SRCH base="dc=foo,dc=de" scope=2 deref=0 filter="(|(cn=m\C3\BCller*)(?=undefined)(sn=m\C3\BCller*))" An additional mysql query is sent afterwards: ... backsql_process_filter(): filter computed (UNDEFINED) ... <==backsql_srch_query() returns SELECT DISTINCT ldap_entries.id,ldap_entries.id,'inetOrgPerson' AS objectClass,ldap_entries.dn AS dn FROM ldap_entries WHER E ldap_entries.id=ldap_entries.keyval AND ldap_entries.oc_map_id=? AND 9=9 AND 1=1 AND ((ldap_entries.cn LIKE 'mÃŒller%') OR 12=0 OR (ldap_entries.sn LIKE 'mÃŒller%')) Many thanks for fixing at the speed of of light. Kind regards Thomas Reifferscheid
changed notes changed state Open to Suspended
back-sql add support for charset conversion patch welcome related issue: handle "computed" filters; a patch is in HEAD/2.3.26