
Re: Problems with case folding of UTF-8



> Hi
> 
> I've dug more into this now, and you're right, something was wrong.
> My apologies for not realizing this at once. Unicode has a composition
> exclusion list. Due to a bug in the exclusion code in ucgendat.c we
> excluded quite a lot of compositions that we shouldn't have. I've now
> replaced 15 with 5 at two places in ucgendat.c, and everything should
> be okay. So please check latest CVS and let me know how it works.
> 
> Thanks a lot for reporting this! I'm quite happy you found it before
> 2.1 was released.

Well, I'm quite happy you found the fix, because I started gathering
documentation on UTF-8 from the official website, and it really looks
like a nightmare :) I found something about these compositions and was
trying to follow the code, but I'm really running out of time.

> 
> BTW, with latest HEAD I'm not allowed to have non-ascii DNs it seems.
> I haven't tried to see why yet. I suppose you are aware of it.

Kind of; the last time I committed changes to DN normalization
I ran into that strange problem with case folding, so I wasn't able
to test all the non-ascii stuff I was working on.  One problem I found
is that when the "escaped" DN is longer than the input string, the
legacy wrappers we're still using (dn_normalize and so on) must fail
because the normalization cannot occur in place.  Actually, only
"escaped" DNs or OID=#{ber encoded octets} are allowed.
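
To make the in-place problem concrete, here is a minimal sketch.  It is
not the actual OpenLDAP code: escape_dn and escape_dn_in_place are
hypothetical names, and escaping every byte >= 0x80 as a backslash plus
two hex digits is an assumption made only for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not the OpenLDAP API): copy `in` to `out`,
 * writing every byte >= 0x80 as "\XX".  Returns the escaped length,
 * or -1 if the result plus its terminating NUL does not fit. */
static int escape_dn(const char *in, char *out, size_t outsz)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t o = 0;

    if (outsz == 0)
        return -1;
    for (const unsigned char *p = (const unsigned char *)in; *p; p++) {
        size_t need = (*p >= 0x80) ? 3 : 1;
        if (o + need >= outsz)          /* keep one byte for the NUL */
            return -1;
        if (*p >= 0x80) {
            out[o++] = '\\';
            out[o++] = hex[*p >> 4];
            out[o++] = hex[*p & 0x0F];
        } else {
            out[o++] = (char)*p;
        }
    }
    out[o] = '\0';
    return (int)o;
}

/* Hypothetical legacy-style wrapper: the caller only owns
 * strlen(dn) + 1 bytes, so an escaped form that grows must fail.
 * On failure `dn` is left untouched. */
static int escape_dn_in_place(char *dn)
{
    size_t cap = strlen(dn) + 1;
    char *tmp = malloc(cap);
    int rc;

    if (tmp == NULL)
        return -1;
    rc = escape_dn(dn, tmp, cap);
    if (rc >= 0)
        memcpy(dn, tmp, (size_t)rc + 1);
    free(tmp);
    return rc < 0 ? -1 : 0;
}
```

With this sketch a pure-ascii DN passes through the in-place wrapper
unchanged, while any DN containing a non-ascii byte is rejected by it
and only succeeds when the caller supplies a larger output buffer to
escape_dn.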

In many cases you can add spurious spaces nearly everywhere; the
DN parsing is quite liberal, so you can write

	[space]<attr>[space]=[space]<value>[space][+[space]<ava>][,<rdn>[...]]

and so on. Of course, if required, we can deny this freedom
by setting the appropriate pedantic flags :)
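
As a rough illustration of the liberal versus pedantic behaviour, here
is a sketch that handles a single <attr>=<value> pair only.  parse_ava
and its `pedantic` argument are hypothetical names, not the real parser
or its flags, and the sketch ignores quoting, escaped spaces, '+' and
',' separators.

```c
#include <string.h>

/* Copy the span [b, e) into dst as a NUL-terminated string.
 * Returns -1 if the span is empty or does not fit. */
static int copy_span(const char *b, const char *e, char *dst, size_t dstsz)
{
    size_t n = (size_t)(e - b);

    if (n == 0 || n >= dstsz)
        return -1;
    memcpy(dst, b, n);
    dst[n] = '\0';
    return 0;
}

/* Hypothetical sketch: split "[space]<attr>[space]=[space]<value>[space]"
 * into attr and value, dropping the optional spaces.  When `pedantic`
 * is nonzero, any such space makes the parse fail instead. */
static int parse_ava(const char *s, int pedantic,
                     char *attr, size_t attrsz,
                     char *val, size_t valsz)
{
    const char *eq = strchr(s, '=');
    const char *end = s + strlen(s);
    const char *ab, *ae, *vb, *ve;

    if (eq == NULL)
        return -1;
    ab = s;      ae = eq;
    vb = eq + 1; ve = end;

    /* Trim spaces on both sides of the attribute and of the value. */
    while (ab < ae && *ab == ' ') ab++;
    while (ae > ab && ae[-1] == ' ') ae--;
    while (vb < ve && *vb == ' ') vb++;
    while (ve > vb && ve[-1] == ' ') ve--;

    /* Pedantic mode: reject the input if anything was trimmed. */
    if (pedantic && (ab != s || ae != eq || vb != eq + 1 || ve != end))
        return -1;

    if (copy_span(ab, ae, attr, attrsz) != 0)
        return -1;
    return copy_span(vb, ve, val, valsz);
}
```

Called in liberal mode on "  cn = John Doe  " this yields the attribute
"cn" and the value "John Doe"; the same input is rejected when
`pedantic` is set, while "cn=John Doe" is accepted in both modes.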

If you have any suggestions on how to better handle non-ascii
stuff, please chime in.  My experience with non-ascii is quite limited.

Pierangelo.