
Normalised UTF-8: should it be "decomposed" or "composed"?



In ldap/libraries/liblunicode/ucstr.c we have, around line 203:

                /* normalize ucs of length p - ucs */
                uccanondecomp( ucs, p - ucs, &ucsout, &ucsoutlen );
                ucsoutlen = uccanoncomp( ucsout, ucsoutlen );

Why convert to decomposed form and then back to composed form?  Wouldn't
it be better to use the decomposed form as the "normalised" form?
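
For reference, canonical decomposition followed by canonical composition
is how the Unicode composed normal form (NFC) is defined, so the two calls
above look like an NFC normalisation. Here is a minimal sketch of the two
steps, assuming a toy two-entry mapping table. toy_decompose and
toy_compose are hypothetical stand-ins, not the liblunicode functions, and
the sketch skips the reordering of combining marks that a real
implementation needs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy table: precomposed code point <-> base + combining mark.
 * Hypothetical and tiny; real tables come from the Unicode data files. */
struct toy_pair { uint32_t composed, base, mark; };

static const struct toy_pair toy_table[] = {
    { 0x00E9, 0x0065, 0x0301 },  /* e with acute */
    { 0x00F1, 0x006E, 0x0303 },  /* n with tilde */
};
#define TOY_LEN (sizeof toy_table / sizeof toy_table[0])

/* Canonical decomposition (toy): expand each precomposed code point.
 * out must have room for 2 * n code points. Returns the output length. */
static size_t toy_decompose(const uint32_t *in, size_t n, uint32_t *out)
{
    size_t i, j, k = 0;

    assert(in != NULL && out != NULL);
    for (i = 0; i < n; i++) {
        for (j = 0; j < TOY_LEN; j++)
            if (in[i] == toy_table[j].composed)
                break;
        if (j < TOY_LEN) {
            out[k++] = toy_table[j].base;
            out[k++] = toy_table[j].mark;
        } else {
            out[k++] = in[i];   /* no mapping: copy unchanged */
        }
    }
    return k;
}

/* Canonical composition (toy): recombine base + mark pairs in place.
 * Returns the new length, which is never larger than n. */
static size_t toy_compose(uint32_t *s, size_t n)
{
    size_t i = 0, j, k = 0;

    assert(s != NULL);
    while (i < n) {
        for (j = 0; j < TOY_LEN; j++)
            if (i + 1 < n && s[i] == toy_table[j].base
                          && s[i + 1] == toy_table[j].mark)
                break;
        if (j < TOY_LEN) {
            s[k++] = toy_table[j].composed;
            i += 2;             /* consumed base and mark */
        } else {
            s[k++] = s[i++];    /* no pair here: copy unchanged */
        }
    }
    return k;
}
```

With this table, "caf" + U+00E9 and "caf" + U+0065 U+0301 both decompose
to the same five code points and both compose back to the same four.
Either form gives a unique representation for comparison; the composed
form is the shorter of the two.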