Issue 4604 - backs-sql: backsql_process_filter(): invalid filter for UTF-8 querys
Summary: backs-sql: backsql_process_filter(): invalid filter for UTF-8 querys
Status: VERIFIED SUSPENDED
Alias: None
Product: OpenLDAP
Classification: Unclassified
Component: slapd (show other issues)
Version: unspecified
Hardware: All All
: --- normal
Target Milestone: ---
Assignee: OpenLDAP project
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-07-01 08:07 UTC by reiffert@student.physik.uni-mainz.de
Modified: 2020-06-26 15:05 UTC (History)
0 users

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this issue.
Description reiffert@student.physik.uni-mainz.de 2006-07-01 08:07:01 UTC
Full_Name: Thomas Reifferscheid
Version: 2.3.24
OS: Linux/2.6.16
URL: http://student.physik.uni-mainz.de/~reiffert/back_sql.txt
Submission from: (NULL) (84.168.189.212)


I'm using Thunderbird as slapd frontend.
When searching for an recipient whose name contains special chars
like "���" (german umlauts), the slapd's syslog correctly shows the utf-8
encoded
characters.

slapd then hands over the filter to back-sql which then returns
"invalid filter".


When searching for "M*ller", so replacing "�" by "*", the answers
that get back to thunderbird, correctly show "M�ller".


A loglevel 65535 output from a "m�l" search can be found
at the provided URL.

Kind regards
Thomas Reifferscheid

Comment 1 reiffert@student.physik.uni-mainz.de 2006-07-01 08:17:39 UTC
Additional Information:

The ITS-Webinterface replaces the mentioned special german chars by
"."

You can find examples here:
http://de.wikipedia.org/wiki/Umlaut#Umlautbuchstaben


(öäüÖÄÜß) -> (.......)


So when reading the original issue, please keep in mind,
that you have to replace "." by one of those chars.



Hope that helps
Thomas Reifferscheid
Comment 2 ando@openldap.org 2006-07-01 13:59:21 UTC
changed notes
moved from Incoming to Software Enhancements
Comment 3 ando@openldap.org 2006-07-01 14:02:13 UTC
On Sat, 2006-07-01 at 08:07 +0000, reiffert@student.physik.uni-mainz.de
wrote:
> Full_Name: Thomas Reifferscheid
> Version: 2.3.24
> OS: Linux/2.6.16
> URL: http://student.physik.uni-mainz.de/~reiffert/back_sql.txt
> Submission from: (NULL) (84.168.189.212)
> 
> 
> I'm using Thunderbird as slapd frontend.
> When searching for an recipient whose name contains special chars
> like "üöä" (german umlauts), the slapd's syslog correctly shows the utf-8
> encoded
> characters.
> 
> slapd then hands over the filter to back-sql which then returns
> "invalid filter".
> 
> 
> When searching for "M*ller", so replacing "ü" by "*", the answers
> that get back to thunderbird, correctly show "Müller".
> 
> 
> A loglevel 65535 output from a "mül" search can be found
> at the provided URL.

back-sql acts as a "dumb" backend to a generic RDBMS; LDAP requires UTF8
for directory strings, while the RDBMS uses whatever instructed to.  You
need to make sure appropriate conversion occurs in between.  Back-sql
does not provide any means to convert strings to/from another character
set.  A patch that consistently enables translitteration from UTF8 to
whatever charset is used in the RDBMS for a given data type is welcome,
provided it complies with the license and other contribution criteria of
OpenLDAP software.

p.




Ing. Pierangelo Masarati
Responsabile Open Solution
OpenLDAP Core Team

SysNet s.n.c.
Via Dossi, 8 - 27100 Pavia - ITALIA
http://www.sys-net.it
------------------------------------------
Office:   +39.02.23998309          
Mobile:   +39.333.4963172
Email:    pierangelo.masarati@sys-net.it
------------------------------------------

Comment 4 reiffert@student.physik.uni-mainz.de 2006-07-01 14:47:07 UTC
Dear Pieranelo,

regarding http://student.physik.uni-mainz.de/~reiffert/back_sql.txt
...


On Sat, 1 Jul 2006, Pierangelo Masarati wrote:

> back-sql acts as a "dumb" backend to a generic RDBMS; LDAP requires UTF8
> for directory strings,


Jul  1 09:51:10 hell slapd[25812]:     filter: 
(|(cn=m\C3\BCl*)(?=undefined)(sn=m\C3\BCl*))


isn't that in the exact right way it has to be?

If not, how has it to look like at this place so that

the answer is not:

Jul  1 09:51:10 hell slapd[25812]: backsql_process_filter(): invalid 
filter


> while the RDBMS uses whatever instructed to.  You
> need to make sure appropriate conversion occurs in between.  Back-sql
> does not provide any means to convert strings to/from another character
> set.  A patch that consistently enables translitteration from UTF8 to
> whatever charset is used in the RDBMS for a given data type is welcome,
> provided it complies with the license and other contribution criteria of
> OpenLDAP software.

If I'd understand you better, I probably will start finding a workaround
to enhance openldap getting around that nasty thing!

Kind regards
Thomas Reifferscheid

Comment 5 ando@openldap.org 2006-07-01 15:06:32 UTC
On Sat, 2006-07-01 at 16:47 +0200, Thomas Reifferscheid wrote:
> Dear Pieranelo,
> 
> regarding http://student.physik.uni-mainz.de/~reiffert/back_sql.txt
> ...
> 
> 
> On Sat, 1 Jul 2006, Pierangelo Masarati wrote:
> 
> > back-sql acts as a "dumb" backend to a generic RDBMS; LDAP requires UTF8
> > for directory strings,
> 
> 
> Jul  1 09:51:10 hell slapd[25812]:     filter: 
> (|(cn=m\C3\BCl*)(?=undefined)(sn=m\C3\BCl*))

This seems to indicate that the second AVA in your filter is incorrect.
I have no idea as per what Thunderbird sends as search filter, but
likely either that attribute is not defined in your slapd's schema, or
its value does not comply with that attribute's syntax.

I suggest you find out (e.g. by looking at logs with level "packets")
what filter is actually sent; then check if the attribute is defined in
your schema and if the value complies with the syntax of that attribute.

> 
> 
> isn't that in the exact right way it has to be?
> 
> If not, how has it to look like at this place so that
> 
> the answer is not:
> 
> Jul  1 09:51:10 hell slapd[25812]: backsql_process_filter(): invalid 
> filter
> 
> 
> > while the RDBMS uses whatever instructed to.  You
> > need to make sure appropriate conversion occurs in between.  Back-sql
> > does not provide any means to convert strings to/from another character
> > set.  A patch that consistently enables translitteration from UTF8 to
> > whatever charset is used in the RDBMS for a given data type is welcome,
> > provided it complies with the license and other contribution criteria of
> > OpenLDAP software.
> 
> If I'd understand you better, I probably will start finding a workaround
> to enhance openldap getting around that nasty thing!

This remains valid: LDAP strings are UTF8 encoded; data in your RBBMS is
encoded whatever it was set as.  back-sql should be smart enough to
allow mapping, but right now it's not.

p.




Ing. Pierangelo Masarati
Responsabile Open Solution
OpenLDAP Core Team

SysNet s.n.c.
Via Dossi, 8 - 27100 Pavia - ITALIA
http://www.sys-net.it
------------------------------------------
Office:   +39.02.23998309          
Mobile:   +39.333.4963172
Email:    pierangelo.masarati@sys-net.it
------------------------------------------

Comment 6 reiffert@student.physik.uni-mainz.de 2006-07-01 15:41:14 UTC

On Sat, 1 Jul 2006, Pierangelo Masarati wrote:

>> Jul  1 09:51:10 hell slapd[25812]:     filter:
>> (|(cn=m\C3\BCl*)(?=undefined)(sn=m\C3\BCl*))
>
> This seems to indicate that the second AVA in your filter is incorrect.
> I have no idea as per what Thunderbird sends as search filter, but
> likely either that attribute is not defined in your slapd's schema, or
> its value does not comply with that attribute's syntax.


Oh, how could I've been so blind. Thanks for the hint. Here is the
result:

working filter, for entering "kepper" into thunderbird:

==>backsql_search(): 
base="dc=foo,dc=de", filter="(|(cn=kepper*)(mail=kepper*)(sn=kepper*))"


And here is what happens when I enter "Müller", the word
with the german special char:

==>backsql_search():
base="dc=foo,dc=de", filter="(|(cn=m\C3\BCller*)(?=undefined)(sn=m\C3\BCller*))"


Now let's the why.


starting slapd with -d 255 showed up:

ldap_read: want=101, got=101
  0000: 11 64 63 3d 64 69 65 66 69 72 6d 61 2c 64 63 3d .dc=diefirma,dc=
  0010: 64 65 0a 01 02 0a 01 00 02 02 00 c8 02 01 00 01 de..............
  0020: 01 00 a1 35 a4 0f 04 02 63 6e 30 09 80 07 6d c3 ...5....cn0...m.
  0030: bc 6c 6c 65 72 a4 11 04 04 6d 61 69 6c 30 09 80 .ller....mail0..
  0040: 07 6d c3 bc 6c 6c 65 72 a4 0f 04 02 73 6e 30 09 .m..ller....sn0.
  0050: 80 07 6d c3 bc 6c 6c 65 72 30 0a 04 02 63 6e 04 ..m..ller0...cn.
  0060: 04 6d 61 69 6c                                  .mail
ber_get_next: tag 0x30 len 107 contents:
ber_dump: buf=0x08256e08 ptr=0x08256e08 end=0x08256e73 len=107
  0000: 02 01 02 63 66 04 11 64 63 3d 64 69 65 66 69 72 ...cf..dc=diefir
  0010: 6d 61 2c 64 63 3d 64 65 0a 01 02 0a 01 00 02 02 ma,dc=de........
  0020: 00 c8 02 01 00 01 01 00 a1 35 a4 0f 04 02 63 6e .........5....cn
  0030: 30 09 80 07 6d c3 bc 6c 6c 65 72 a4 11 04 04 6d 0...m..ller....m
  0040: 61 69 6c 30 09 80 07 6d c3 bc 6c 6c 65 72 a4 0f ail0...m..ller..
  0050: 04 02 73 6e 30 09 80 07 6d c3 bc 6c 6c 65 72 30 ..sn0...m..ller0
  0060: 0a 04 02 63 6e 04 04 6d 61 69 6c                ...cn..mail
ber_get_next

So it still looks like everything is fine.
What came to my mind here was, "is the attribute mail allowed
to contain utf-8 chars by the schema?" but I could not solve that
question yet.

Let's see what happens next:

begin get_filter_list
begin get_filter
SUBSTRINGS
begin get_ssa
ber_scanf fmt ({m) ber:
ber_dump: buf=0x08256e08 ptr=0x08256e32 end=0x08256e73 len=65
0000:a4 0f 04 02 63 6e 30 0980 07 6d c3 bc 6c 6c 65 ....cn0...m..lle
0010:72 a4 11 04 04 6d 61 696c 30 09 80 07 6d c3 bc r....mail0...m..
0020:6c 6c 65 72 a4 0f 04 0273 6e 30 09 80 07 6d c3 ller....sn0...m.
0030:bc 6c 6c 65 72 30 0a 0402 63 6e 04 04 6d 61 69 .ller0...cn..mai
0040:6c                                             l
ber_scanf fmt (m) ber:
ber_dump: buf=0x08256e08 ptr=0x08256e3a end=0x08256e73 len=57
0000:80 07 6d c3 bc 6c 6c 6572 a4 11 04 04 6d 61 69 ..m..ller....mai
0010:6c 30 09 80 07 6d c3 bc6c 6c 65 72 a4 0f 04 02 l0...m..ller....
0020:73 6e 30 09 80 07 6d c3bc 6c 6c 65 72 30 0a 04 sn0...m..ller0..
0030:02 63 6e 04 04 6d 61 696c                      .cn..mail
INITIAL
end get_ssa

Now it comes to scanf:


begin get_ssa
ber_scanf fmt ({m) ber:
ber_dump: buf=0x08256e08 ptr=0x08256e43 end=0x08256e73 len=48
0000:00 11 04 04 6d 61 69 6c30 09 80 07 6d c3 bc 6c ....mail0...m..l
0010:6c 65 72 a4 0f 04 02 736e 30 09 80 07 6d c3 bc ler....sn0...m..
0020:6c 6c 65 72 30 0a 04 0263 6e 04 04 6d 61 69 6c ller0...cn..mail
ber_scanf fmt (m) ber:
ber_dump: buf=0x08256e08 ptr=0x08256e4d end=0x08256e73 len=38
0000:80 07 6d c3 bc 6c 6c 6572 a4 0f 04 02 73 6e 30 ..m..ller....sn0
0010:09 80 07 6d c3 bc 6c 6c65 72 30 0a 04 02 63 6e ...m..ller0...cn
0020:04 04 6d 61 69 6c                              ..mail
error=21
end get_filter 0


error=21 shows up.

Finally let's the rest:

begin get_filter
SUBSTRINGS
begin get_ssa
ber_scanf fmt ({m) ber:
ber_dump: buf=0x08256e08 ptr=0x08256e56 end=0x08256e73 len=29
0000:00 0f 04 02 73 6e 30 0980 07 6d c3 bc 6c 6c 65 ....sn0...m..lle
0010:72 30 0a 04 02 63 6e 0404 6d 61 69 6c          r0...cn..mail
ber_scanf fmt (m) ber:
ber_dump: buf=0x08256e08 ptr=0x08256e5e end=0x08256e73 len=21
0000:80 07 6d c3 bc 6c 6c 6572 30 0a 04 02 63 6e 04 ..m..ller0...cn.
0010:04 6d 61 69 6c                                 .mail
INITIAL
end get_ssa
end get_filter 0
end get_filter_list
end get_filter 0
filter: (|(cn=m\C3\BCller*)(?=undefined)(sn=m\C3\BCller*))




I hope the output will help here.



>
> I suggest you find out (e.g. by looking at logs with level "packets")
> what filter is actually sent; then check if the attribute is defined in
> your schema and if the value complies with the syntax of that attribute.



So let's have a closer look on the syntax of mail in the schema files
and how I've solved it:


I changed in core.schema

attributetype ( 0.9.2342.19200300.100.1.3
         NAME ( 'mail' 'rfc822Mailbox' )
         DESC 'RFC1274: RFC822 Mailbox'
     EQUALITY caseIgnoreIA5Match
     SUBSTR caseIgnoreIA5SubstringsMatch
     SYNTAX 1.3.6.1.4.1.1466.115.121.1.26{256} )


to


attributetype ( 0.9.2342.19200300.100.1.3
         NAME ( 'mail' 'rfc822Mailbox' )
         DESC 'RFC1274: RFC822 Mailbox'
     EQUALITY caseIgnoreMatch
     SUBSTR caseIgnoreSubstringsMatch
     SYNTAX 1.3.6.1.4.1.1466.115.121.1.15{32768} )


and now it works!


Thanks a lot for the right hint on the subject.
The issue can be closed.


Kind regards
Thomas Reifferscheid
Comment 7 reiffert@student.physik.uni-mainz.de 2006-07-01 16:25:19 UTC

On Sat, 1 Jul 2006, Pierangelo Masarati wrote:

> You shouldn't do that: changing standard track schema is not good
> practice.  As far as I remember, RFC 2822 does not allow arbitrary chars
> in the local part of an address; they might be allowed in the display
> name, but I fear it's restricted to plain ASCII as well.  I suspect the
> bug here, if any, is that back-sql should simply ignore that part of the
> filter while converting it into an SQL query.  This is what back-sql is
> supposed to do; I'll check if it works as expected.

Additional information:

when the (mail=müller) arrives and get's
translated to (?=undefined), there is no mysql-query performed against
the mysql-db.

Hope that helps checking.

Kind regards
Thomas Reifferscheid
Comment 8 ando@openldap.org 2006-07-01 16:27:28 UTC
On Sat, 2006-07-01 at 17:41 +0200, Thomas Reifferscheid wrote:

> I changed in core.schema
> 
> attributetype ( 0.9.2342.19200300.100.1.3
>          NAME ( 'mail' 'rfc822Mailbox' )
>          DESC 'RFC1274: RFC822 Mailbox'
>      EQUALITY caseIgnoreIA5Match
>      SUBSTR caseIgnoreIA5SubstringsMatch
>      SYNTAX 1.3.6.1.4.1.1466.115.121.1.26{256} )
> 
> 
> to
> 
> 
> attributetype ( 0.9.2342.19200300.100.1.3
>          NAME ( 'mail' 'rfc822Mailbox' )
>          DESC 'RFC1274: RFC822 Mailbox'
>      EQUALITY caseIgnoreMatch
>      SUBSTR caseIgnoreSubstringsMatch
>      SYNTAX 1.3.6.1.4.1.1466.115.121.1.15{32768} )
> 
> 
> and now it works!
> 
> 
> Thanks a lot for the right hint on the subject.
> The issue can be closed.

You shouldn't do that: changing standard track schema is not good
practice.  As far as I remember, RFC 2822 does not allow arbitrary chars
in the local part of an address; they might be allowed in the display
name, but I fear it's restricted to plain ASCII as well.  I suspect the
bug here, if any, is that back-sql should simply ignore that part of the
filter while converting it into an SQL query.  This is what back-sql is
supposed to do; I'll check if it works as expected.

p.

p.




Ing. Pierangelo Masarati
Responsabile Open Solution
OpenLDAP Core Team

SysNet s.n.c.
Via Dossi, 8 - 27100 Pavia - ITALIA
http://www.sys-net.it
------------------------------------------
Office:   +39.02.23998309          
Mobile:   +39.333.4963172
Email:    pierangelo.masarati@sys-net.it
------------------------------------------

Comment 9 ando@openldap.org 2006-07-01 17:01:35 UTC
changed notes
Comment 10 ando@openldap.org 2006-07-01 17:16:53 UTC
On Sat, 2006-07-01 at 18:25 +0200, Thomas Reifferscheid wrote:

> when the (mail=müller) arrives and get's
> translated to (?=undefined), there is no mysql-query performed against
> the mysql-db.

Thanks; I've spotted this separate but related issue in filter
translation; it should now be fixed in HEAD; you may pull it out of the
CVS by getting a diff of servers/slapd/back-sql/search.c  1.118 ->
1.119; it should apply smoothly to 2.3.24.  Please test at your
convenience and report thru the ITS.

p.




Ing. Pierangelo Masarati
Responsabile Open Solution
OpenLDAP Core Team

SysNet s.n.c.
Via Dossi, 8 - 27100 Pavia - ITALIA
http://www.sys-net.it
------------------------------------------
Office:   +39.02.23998309          
Mobile:   +39.333.4963172
Email:    pierangelo.masarati@sys-net.it
------------------------------------------

Comment 11 reiffert@student.physik.uni-mainz.de 2006-07-02 15:51:32 UTC

On Sat, 1 Jul 2006, Pierangelo Masarati wrote:

> it should now be fixed in HEAD; you may pull it out of the
> CVS by getting a diff of servers/slapd/back-sql/search.c  1.118 ->
> 1.119; it should apply smoothly to 2.3.24.  Please test at your
> convenience and report thru the ITS.

Using HEAD 1.119, search.c it works now:

conn=1 op=1 SRCH base="dc=foo,dc=de" scope=2 deref=0 
filter="(|(cn=m\C3\BCller*)(?=undefined)(sn=m\C3\BCller*))"

An additional mysql query is sent afterwards:

...
backsql_process_filter(): filter computed (UNDEFINED)
...

<==backsql_srch_query() returns SELECT DISTINCT 
ldap_entries.id,ldap_entries.id,'inetOrgPerson' AS 
objectClass,ldap_entries.dn AS dn FROM ldap_entries WHER
E ldap_entries.id=ldap_entries.keyval AND ldap_entries.oc_map_id=? AND 9=9 
AND 1=1 AND ((ldap_entries.cn LIKE 'mÃŒller%') OR 12=0 OR (ldap_entries.sn 
LIKE
'mÃŒller%'))


Many thanks for fixing at the speed of of light.

Kind regards
Thomas Reifferscheid
Comment 12 ando@openldap.org 2006-08-17 18:04:46 UTC
changed notes
Comment 13 ando@openldap.org 2006-08-18 07:05:10 UTC
changed notes
changed state Open to Suspended
Comment 14 ando@openldap.org 2007-01-12 10:24:19 UTC
changed notes
Comment 15 OpenLDAP project 2014-08-01 21:04:52 UTC
back-sql
add support for charset conversion
patch welcome
related issue: handle "computed" filters; a patch is in HEAD/2.3.26