
Re: (c.harding 44401) RE: (c.harding 44382) RE: (a.josey 14931) Re: (c.harding 44333) Re: regexMatch (Was: substring filters using DN attributes ?)



Hi, Ron -

>Thank you, Chris.
>
>Actually, I was trying to say that I thought that this might be more than a
>server would wish to do. I think you are saying (are you?) that, where an
>attribute is marked with a language tag, the RE should be applied in a
>locale appropriate to *that* language. This implies that filter processing
>is attribute ;option dependent?
>

I'm not exactly sure how it would work - hadn't thought that far. What I
was saying was that someone ought to think about it. 

I'm a bit rusty on both locales and language tags, but I think that it
could be possible to deduce a locale from a language tag. Trouble is,
locales are a UNIX construct and other operating systems (WIN 95/98 for a
start - I don't know about NT) probably don't understand them. 
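
To make the idea concrete, here is a rough sketch of deducing a locale name
from an RFC 2596 language option such as "cn;lang-fr-CA". The function name
and the "language_COUNTRY" output convention are assumptions made for
illustration only; nothing in RFC 2596 or the UNIX standard defines this
mapping, and whether the resulting locale exists on a given system is a
separate question.

```python
def locale_name_from_description(attr_description):
    """Guess a UNIX-style locale name from an attribute description.

    Hypothetical helper: looks for a 'lang-' option (RFC 2596 style) and
    turns e.g. 'lang-fr-CA' into 'fr_CA'. Returns None if no language
    option is present.
    """
    # Options follow the attribute type, separated by semicolons.
    for option in attr_description.split(";")[1:]:
        if option.lower().startswith("lang-"):
            subtags = option[len("lang-"):].split("-")
            language = subtags[0].lower()
            # Treat a two-letter second subtag as a country code.
            if len(subtags) > 1 and len(subtags[1]) == 2:
                return f"{language}_{subtags[1].upper()}"
            return language
    return None


print(locale_name_from_description("cn;lang-fr-CA"))
print(locale_name_from_description("cn;lang-ja"))
print(locale_name_from_description("cn"))
```

A server would still have to decide what to do when the deduced locale is
not installed, or when the platform has no notion of locales at all.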

It may be possible to do something by keeping the existing RE definition
and saying that your character encoding is UNICODE. But this might have
undesirable characteristics - for example I suspect that [a-eacute] would
include some non-alphabetic characters which probably isn't what you want. 
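
A quick way to check that suspicion is to list what a range from "a" to
e-acute covers when ranges are ordered by code point. The sketch below uses
Python's re module, which is not the POSIX RE definition; it only stands in
for "an RE engine whose ranges follow UNICODE code point order".

```python
import re

E_ACUTE = "\u00e9"  # precomposed e-acute

# Every code point from 'a' through e-acute, i.e. what a
# code-point-ordered bracket range [a-eacute] would cover.
covered = [chr(cp) for cp in range(ord("a"), ord(E_ACUTE) + 1)]
non_alphabetic = [ch for ch in covered if not ch.isalpha()]

range_re = re.compile("[a-" + E_ACUTE + "]")

# '{' and the multiplication sign both fall inside the range.
print("{" in non_alphabetic, range_re.fullmatch("{") is not None)
print("\u00d7" in non_alphabetic, range_re.fullmatch("\u00d7") is not None)
# Uppercase 'A' sorts before 'a' by code point, so it is excluded.
print(range_re.fullmatch("A") is not None)
```

So the range picks up punctuation, control characters and symbols, and it
misses the uppercase letters, which is the undesirable behaviour I mean.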

>Another problem I see is that re.html defines the syntax but not the
>semantics of REs. How an RE is constructed is committed to BNF, but how it
>is interpreted is ambiguous. I mention the handling of ^ in subexpressions
>and the meaning (not) given to "**".
>
I would hope that the semantics aren't ambiguous - but you may be right. I
think we have a test suite that covers RE processing - if so, it is likely
that ambiguities will have been corrected and resolved.
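
The two cases Ron mentions are easy to probe in any one implementation.
The sketch below asks Python's re module, purely as an example of a single
engine's answers; it says nothing about what re.html requires, and the
describe() helper is a name made up for this illustration.

```python
import re


def describe(pattern):
    """Report whether this engine accepts the pattern at all."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return f"rejected: {exc}"
    return "accepted"


# Doubled repetition: this engine refuses to compile it.
print(describe("a**"))
print(describe("a*"))

# '^' as the first character of a subexpression: this engine treats it
# as an anchor, so it cannot match in the middle of the string.
print(re.search(r"b(^a)", "ba"))
print(re.search(r"(^b)a", "ba") is not None)
```

Another engine could legitimately give different answers to both questions,
which is why the matching rule needs to pin the semantics down.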

>Ron.
>-----Original Message-----
>From: Chris Harding [mailto:c.harding@opengroup.org]
>Sent: Thursday, 27 July 2000 18:23
>To: Ramsay, Ron
>Cc: ldapext
>Subject: Re: (c.harding 44382) RE: (a.josey 14931) Re: (c.harding 44333)
>Re: regexMatch (Was: substring filters using DN attributes ?)
>
>
>Hi, Ron -
>
>There are several variants on the definition of regular expression, for
>historical reasons. There is one UNIX(TM) standard, though, which says that
>all of the versions must be supported by a UNIX system, each being applied
>in the appropriate contexts as defined by the standard.
>
>Internationalization is a very tricky - but extremely important - area. The
>standards (even the UNIX ones) should not be followed blindly; you need to
>look carefully at what implications they would have before referencing them
>in a matching rule RFC or draft.
>
>When dealing with multiple character sets and languages, collating
>sequences and regular expressions clearly become more difficult. A great
>deal of very good work went into the definition of internationalized
>regular expressions. However, that work pre-dates the deployment of
>UNICODE, which removes some but by no means all of the problems. So far as
>I know, it has not been re-evaluated in the light of UNICODE, but it
>certainly should be.
>
>Should there be locale attributes and, if so, where should they go? People
>must be able to put entries using different languages and character sets in
>the same directory. This is in fact supported by LDAP language tagging (RFC
>2596), and the first question has to be whether any mechanism beyond RFC
>2596 is needed. For the sake of simplicity, I would hope not. 
>
>>Chris,
>>
>>Interesting. On reading re.html below I found no fewer than three
>>'standards' in the first paragraph. Mention of locales later completed
>>the picture for me.
>>
>>I guess in the conformance statement for the directory we say which RE
>>'standard' we are following. We also need an attribute in the root DSE
>>which specifies what our locale is.
>>
>>Ron.
>>
>>-----Original Message-----
>>From: Chris Harding [mailto:c.harding@opengroup.org]
>>Sent: Wednesday, 26 July 2000 21:37
>>To: Rob Byrne - Sun Microsystems; Kurt D. Zeilenga
>>Cc: ldapext; a.josey@opengroup.org
>>Subject: (a.josey 14931) Re: (c.harding 44333) Re: regexMatch (Was:
>>substring filters using DN attributes ?)
>>
>>
>>>Is there a standard definition of what a regular expression actually is?
>>>
>>There certainly is.
>>
>>It is part of the standard definition of the UNIX(TM) operating system,
>>which is available free of charge from The Open Group; see
>>http://www.opengroup.org/publications/catalog/t912.htm#medium2
>>
>>The definition of regular expressions is at
>>http://www.opengroup.org/onlinepubs/007908799/xbd/re.html
>>
>>>I ask this because if you work on Solaris for example, there are n
>>>different libraries and functions for doing regular expression matching
>>>so the meaning of "regular expression" is not so obvious.
>>>
>>>Rob.
>>>
>>>"Kurt D. Zeilenga" wrote:
>>>
>>>> At 09:35 AM 7/25/00 -0700, Mark C Smith wrote:
>>>> >"Kurt D. Zeilenga" wrote:
>>>> >>
>>>> >> I've been meaning to publish a regexMatch rule I-D which would allow
>>>> >> matching of an asserted regular expression against the string
>>>> >> representation of attribute values.  Of course, to be useful with
>>>> >> DNs, we'd have to define a canonical string representation
>>>> >> of DNs.  Given such, you would be able to do DN matching like:
>>>> >>
>>>> >>         (member:regexMatch:=.*,dc=example,dc=com$)
>>>> >>
>>>> >> Such a matching rule, I believe, would be generally useful in
>>>> >> a number of applications.  Of course, user applications may
>>>> >> not want to expose regular expressions to average Joe.
>>>> >>
>>>> >> If others concur that this would be generally useful, I'll put
>>>> >> up a straw man proposal after IETF#48.
>>>> >
>>>> >It would be interesting to see examples of the kinds of LDAP
>>>> >application problems that would be more easily addressed if such a
>>>> >matching rule was available.
>>>>
>>>> I agree.  In fact, I wouldn't attempt to write such an I-D
>>>> without decent examples.  In general, such a rule would be useful
>>>> to applications which required very specific, complex matching
>>>> which cannot easily be decomposed into a substrings assertion.
>>>> I'll try to come up with some examples, hopefully ones which
>>>> are not too contrived.
>>>>
>>>> >If all we really need is a way to anchor the start and end
>>>> >of strings (i.e., ^ and $ from regex), I'd rather see a more narrow
>>>> >proposal.  Why?  Because general regular expression matching will be
>>>> >quite difficult to support using indexes, etc.
>>>>
>>>> I concur that general regular expressions are quite difficult
>>>> to support using indexing.  I also concur that applications wanting
>>>> to make an assertion should use an appropriate matching rule.  I
>>>> fully agree that applications wanting to simply assert start/end
>>>> text should use a substrings matching rule.
>>>>
>>>> Kurt
>>>
>>>
>>>
>>
>>Regards,
>>
>>Chris
>>+++++
>>
>>========================================================================
>>           Chris Harding
>>  T H E    Directory Program Manager
>> O P E N   Apex Plaza, Forbury Road, Reading RG1 1AX, UK
>>G R O U P  Mailto:c.harding@opengroup.org  Phone:  +44 118 950 8311 x2262
>>           WWW: http://www.opengroup.org   Mobile: +44 771 8588820  
>>========================================================================
>>
>>
>
>Regards,
>
>Chris

Regards,

Chris
+++++

========================================================================
           Chris Harding
  T H E    Directory Program Manager
 O P E N   Apex Plaza, Forbury Road, Reading RG1 1AX, UK
G R O U P  Mailto:c.harding@opengroup.org  Phone:  +44 118 950 8311 x2262
           WWW: http://www.opengroup.org   Mobile: +44 771 8588820  
========================================================================