Issue 7447 - back-sql and german umlaute
Summary: back-sql and german umlaute
Status: VERIFIED SUSPENDED
Alias: None
Product: OpenLDAP
Classification: Unclassified
Component: backends
Version: 2.4.30
Hardware: All All
Importance: --- normal
Target Milestone: ---
Assignee: OpenLDAP project
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-11-20 13:11 UTC by metzdorf@geograt.de
Modified: 2020-06-25 23:27 UTC

Description metzdorf@geograt.de 2012-11-20 13:11:02 UTC
Full_Name: Herbert Metzdorf
Version: 2.4.30
OS: Windows 7 / 2008
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (87.138.76.52)


I am using Thunderbird to query OpenLDAP with the SQL backend (back-sql).
Queries containing German umlauts return no results because the lower/uppercase
conversion is wrong. See the following log snippets:

50ab7cdb ==> limits_get: conn=1008 op=1 self="[anonymous]"
this="ou=adressen,dc=geograt,dc=de"
50ab7cdb ==>backsql_search(): base="ou=adressen,dc=geograt,dc=de",
filter="(|(?mail=*search�o�u�*)(cn=*search�o�u�*)(givenName=*search�o�u�*)(sn=*search�o�u�*))",
scope=2,50ab7cdb  deref=0, attrsonly=0, attributes to load: custom list
50ab7cdb ==>backsql_get_db_conn()
...
50ab7cdb <==backsql_srch_query() returns SELECT DISTINCT
ldap_entries.id,ldap_persons.sysid,'inetOrgPerson' AS
objectClass,ldap_entries.dn AS dn FROM ldap_entries,ldap_persons WHERE
ldap_persons.sysid=ldap_entries.keyval AND ldap_entries.oc_map_id=? AND
UPPER(ldap_entries.dn) LIKE CONCAT('%',UPPER(?)) AND (1=0 OR (UPPER(vorname||'
'||nachname) LIKE '%SEARCH�O�U�%') OR (UPPER(vorname) LIKE '%SEARCH�O�U�%') OR
(UPPER(nachname) LIKE '%SEARCH�O�U�%'))

The search expression is converted to uppercase, but this fails for the
umlauts.
Is there a parameter to configure this behavior, or do you know a workaround?

Thanks for your help
Herbert
Comment 1 metzdorf@geograt.de 2012-11-20 13:23:58 UTC
The umlauts were lost while posting,
so I will try to describe the problem:

The search expression is "search<lower umlaut a>o<lower umlaut o>u...".
This is converted to "SEARCH<lower umlaut a>O<lower umlaut o>U" in the
SELECT statement.
The expected value is "SEARCH<upper umlaut a>O<upper umlaut o>U".

-- 
Herbert Metzdorf
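
The behavior described in this comment can be reproduced outside of OpenLDAP. The sketch below is only an illustration: `ascii_upper` is a hypothetical helper that imitates an ASCII-only conversion, which is an assumption about the cause and not code taken from back-sql. It is compared with Python's Unicode-aware `str.upper()`.

```python
def ascii_upper(text: str) -> str:
    """Uppercase only a-z and leave every other character untouched.

    Hypothetical helper: it imitates an ASCII-only conversion and is an
    assumption about the cause, not code taken from back-sql.
    """
    return "".join(
        chr(ord(ch) - 32) if "a" <= ch <= "z" else ch
        for ch in text
    )


# "search<lower umlaut a>o<lower umlaut o>u", written with escapes
needle = "search\u00e4o\u00f6u"

ascii_result = ascii_upper(needle)   # umlauts stay lowercase
unicode_result = needle.upper()      # umlauts are uppercased as well

print(ascii_result)
print(unicode_result)
```

The first result matches the pattern seen in the logged SELECT statement (ASCII letters uppercased, umlauts unchanged); the second is the form the reporter expected.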

Comment 2 Tomas Novosad 2013-04-04 06:13:10 UTC
Hello,

I have exactly the same problem.

Only the character in question is different ;-)).
When Thunderbird (or ldapsearch, it doesn't matter) sends a search query containing a UTF-8 character to LDAP,
the resulting query to the DB (PostgreSQL in this case) looks like
(upper(last_name) LIKE '%šEV%')

where the search parameter is:
%<lower case utf8 character>EV%

Apparently back-sql does not handle UTF-8 characters correctly.

I can't find any way to avoid this.
If only back-sql would leave the uppercase conversion to the DB, like
this:
(upper(last_name) LIKE upper('%šEV%'))

or use ILIKE.

Does anyone have a suggestion for how to work around this?

Thanks in advance

-- 
Tomáš Novosad
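
The idea of leaving the uppercase conversion to the database can be tried out in isolation. The following is a minimal sketch using SQLite from Python, not PostgreSQL and not back-sql itself. `UNICODE_UPPER` is a hypothetical function registered only for this example, because SQLite's built-in `UPPER()` folds ASCII only; it stands in for a database-side `upper()` that handles non-ASCII text. The table and the sample names are invented for illustration.

```python
import sqlite3


def unicode_upper(value):
    """Unicode-aware uppercase used as a stand-in for a DB-side upper()."""
    return value.upper() if value is not None else None


conn = sqlite3.connect(":memory:")
# UNICODE_UPPER is a hypothetical name chosen for this sketch.
conn.create_function("UNICODE_UPPER", 1, unicode_upper)

conn.execute("CREATE TABLE persons (last_name TEXT)")
conn.executemany(
    "INSERT INTO persons (last_name) VALUES (?)",
    [("\u0160ev\u010d\u00edk",), ("Nov\u00e1k",)],  # invented sample names
)

# Pattern as it appears in the generated query: ASCII letters uppercased,
# the non-ASCII lowercase letter left unchanged.
mangled = "%\u0161EV%"

# Pattern compared as-is: the non-ASCII letter differs in case, no match.
broken = conn.execute(
    "SELECT last_name FROM persons "
    "WHERE UNICODE_UPPER(last_name) LIKE ?",
    (mangled,),
).fetchall()

# Pattern uppercased by the same database function as the column: match.
fixed = conn.execute(
    "SELECT last_name FROM persons "
    "WHERE UNICODE_UPPER(last_name) LIKE UNICODE_UPPER(?)",
    (mangled,),
).fetchall()

print(broken)
print(fixed)
```

Because both sides of the comparison go through the same function, it no longer matters how the client-side code treated the non-ASCII characters. This relies on SQLite's default LIKE being case-sensitive for characters outside the ASCII range.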

Comment 3 ando@openldap.org 2013-04-04 07:49:59 UTC
On 04/04/2013 08:13 AM, tomas.novosad@linuxbox.cz wrote:
> Hello,
>
> i got exactly same problem.
>
> Only the discussed character is different ;-)).
> When ThunderBird (or ldapsearch, it doesnt matter) send search query to LDAP with some UTF-8 character,
> the result query to DB (PGSQL in this case) is like
> (upper(last_name) LIKE '%šEV%')
>
> where the search parameter is:
> %<lower case utf8 character>EV%
>
> obviously backsql does not correctly handle UTF8 characters.
>
> I can't find any way how to avoid this.
> If only back-sql would leave the upper case conversion on DB - like
> this:
> (upper(last_name) LIKE upper('%šEV%'))

The solution is to augment table ldap_attr_mappings (with non-trivial 
implications on DN searching and matching) with a field that specifies 
the encoding for a particular attribute, and convert back and forth any 
time an operation affects those attributes.  Not trivial, but 
contributions are welcome.

p.

-- 
Pierangelo Masarati
Associate Professor
Dipartimento di Scienze e Tecnologie Aerospaziali
Politecnico di Milano
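
The per-attribute encoding field proposed in this comment could look roughly like the sketch below. Everything in it is hypothetical: the `encoding` field does not exist in `ldap_attr_mappings`, the `AttrMapping` class and the three functions are invented for illustration, and the `latin-1` column encoding is an assumption. The column names `nachname` and `vorname` are taken from the logged query in the description. The sketch does not address the implications for DN searching and matching mentioned in the comment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttrMapping:
    """One row of a hypothetically extended ldap_attr_mappings table."""
    name: str       # LDAP attribute name
    sel_expr: str   # SQL expression that yields the value
    encoding: str   # proposed new field: encoding of the stored column


# Assumed example rows; the latin-1 encoding is an illustration only.
MAPPINGS = {
    "sn": AttrMapping("sn", "nachname", "latin-1"),
    "givenName": AttrMapping("givenName", "vorname", "latin-1"),
}


def ldap_to_db(attr: str, value: bytes) -> bytes:
    """Convert a UTF-8 LDAP value to the encoding of the mapped column."""
    return value.decode("utf-8").encode(MAPPINGS[attr].encoding)


def db_to_ldap(attr: str, value: bytes) -> bytes:
    """Convert a stored column value back to UTF-8 for the LDAP client."""
    return value.decode(MAPPINGS[attr].encoding).encode("utf-8")


def search_key(attr: str, value: bytes) -> bytes:
    """Uppercase a search value as text, then encode it for the column."""
    text = value.decode("utf-8").upper()
    return text.encode(MAPPINGS[attr].encoding)


ldap_value = "M\u00fcller".encode("utf-8")   # invented sample value
stored = ldap_to_db("sn", ldap_value)
round_trip = db_to_ldap("sn", stored)
key = search_key("sn", "m\u00fcller".encode("utf-8"))
```

The point of the design is that case conversion happens on decoded text rather than on raw bytes, and that every value crossing the LDAP/SQL boundary is converted according to the mapping of its attribute.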

Comment 4 OpenLDAP project 2017-04-13 15:23:38 UTC
back-sql
Comment 5 Quanah Gibson-Mount 2017-04-13 15:23:38 UTC
changed notes
moved from Incoming to Software Bugs
Comment 6 Quanah Gibson-Mount 2020-06-25 23:27:49 UTC
patches welcome