Re: ITS#1998, zero-length attr vals



On Tue, Aug 06, 2002 at 01:26:02AM -0700, Howard Chu wrote:
> > -----Original Message-----
> > From: Kurt D. Zeilenga [mailto:Kurt@OpenLDAP.org]
> 
> > BTW, we need to do some work in the string preparation
> > area.  One approach is to do away with UTF8StringNormalize
> > and put all the string preparation (excepting transcoding)
> > into UTF8bvnormalize.  Presently we're removing extra
> > spaces (U+0020) before we've done mapping... which means
> > we might still have extra spaces.
> 
> I don't think that's an issue. As the comment in UTF8StringNormalize
> states, all whitespace is ASCII; there are no 8-bit or larger
> characters that fall into the "Space" class, and longer encodings
> of ASCII characters are illegal. So, the result of the decomposition
> and mapping should not result in any more spaces than the original
> string contained.

We should do mapping first anyway. Even if there are no spaces
outside ASCII today, someone might at some point change the mapping
tables so that there are. And by doing mapping first, we can simply
check for U+0020 and nothing else later.
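
To illustrate why the order matters, here is a rough sketch in Python
(not the OpenLDAP code). It assumes NFKC normalization as a stand-in
for the mapping step; the helper names are made up for this example,
and the real mapping tables may behave differently.

```python
import unicodedata

SPACE = "\u0020"

def map_chars(s: str) -> str:
    # Stand-in for the mapping step (assumption): NFKC folds compatibility
    # characters such as U+2003 EM SPACE to U+0020.
    return unicodedata.normalize("NFKC", s)

def collapse_spaces(s: str) -> str:
    # Drop leading/trailing U+0020 and squeeze runs of U+0020 to one.
    # Deliberately looks at U+0020 only, nothing else.
    return SPACE.join(part for part in s.split(SPACE) if part)

def prepare_map_first(s: str) -> str:
    # Map, then remove extra spaces.
    return collapse_spaces(map_chars(s))

def prepare_collapse_first(s: str) -> str:
    # Remove extra spaces, then map (the order questioned above).
    return map_chars(collapse_spaces(s))

sample = "foo \u2003bar"  # U+0020 followed by U+2003 EM SPACE
print(repr(prepare_map_first(sample)))       # 'foo bar'
print(repr(prepare_collapse_first(sample)))  # 'foo  bar' (two spaces survive)
```

With mapping done first, the space check only ever has to know about
U+0020, whatever the mapping tables end up containing.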

Regarding spaces, what about these? Should none of these be mapped
to space?

1680;OGHAM SPACE MARK;Zs;0;WS;;;;;N;;;;;
2000;EN QUAD;Zs;0;WS;2002;;;;N;;;;;
2001;EM QUAD;Zs;0;WS;2003;;;;N;;;;;
2002;EN SPACE;Zs;0;WS;<compat> 0020;;;;N;;;;;
2003;EM SPACE;Zs;0;WS;<compat> 0020;;;;N;;;;;
2004;THREE-PER-EM SPACE;Zs;0;WS;<compat> 0020;;;;N;;;;;
2005;FOUR-PER-EM SPACE;Zs;0;WS;<compat> 0020;;;;N;;;;;
2006;SIX-PER-EM SPACE;Zs;0;WS;<compat> 0020;;;;N;;;;;
2007;FIGURE SPACE;Zs;0;WS;<noBreak> 0020;;;;N;;;;;
2008;PUNCTUATION SPACE;Zs;0;WS;<compat> 0020;;;;N;;;;;
2009;THIN SPACE;Zs;0;WS;<compat> 0020;;;;N;;;;;
200A;HAIR SPACE;Zs;0;WS;<compat> 0020;;;;N;;;;;
2028;LINE SEPARATOR;Zl;0;WS;;;;;N;;;;;
202F;NARROW NO-BREAK SPACE;Zs;0;WS;<noBreak> 0020;;;;N;;;;;
21C7;LEFTWARDS PAIRED ARROWS;So;0;ON;;;;;N;LEFT PAIRED ARROWS;;;;
21C8;UPWARDS PAIRED ARROWS;So;0;ON;;;;;N;UP PAIRED ARROWS;;;;
21C9;RIGHTWARDS PAIRED ARROWS;So;0;ON;;;;;N;RIGHT PAIRED ARROWS;;;;
21CA;DOWNWARDS PAIRED ARROWS;So;0;ON;;;;;N;DOWN PAIRED ARROWS;;;;
3000;IDEOGRAPHIC SPACE;Zs;0;WS;<wide> 0020;;;;N;;;;;

I don't know. I guess some of them are mapped to nothing, but all of
them?

Someone will probably write a generic stringprep library that we could
try to use; I think a lot of applications will need stringprep.

Stig