Issue 5942 - URI matching of "self" in add_syncrepl is incomplete
Summary: URI matching of "self" in add_syncrepl is incomplete
Status: VERIFIED FIXED
Alias: None
Product: OpenLDAP
Classification: Unclassified
Component: slapd
Version: unspecified
Hardware: All All
Importance: --- normal
Target Milestone: ---
Assignee: OpenLDAP project
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-02-12 19:08 UTC by Jonathan
Modified: 2014-08-01 21:04 UTC

Description Jonathan 2009-02-12 19:08:27 UTC
Full_Name: Jonathan Clarke
Version: RE24
OS: irrelevant
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (213.41.243.192)


Hi,

When adding a syncrepl config, the function add_syncrepl performs a "check if
URL points to current server". This check is based on an exact match between the
provider parameter from the syncrepl config line, and the URIs given to slapd on
startup.

If this doesn't match when it should, the database is marked as a shadow, and
all following updates fail with "shadow context; no update refs". This is quite
a pain when it happens on cn=config :)

There are multiple cases when this happens:
1) If no specific URI was specified on launch (no -h option)
2) Port numbers are explicitly specified or not (":389")
3) Trailing slash (for example "ldap://1.2.3.4" != "ldap://1.2.3.4/")
4) IP is specified rather than DNS name ("ldap://localhost" !=
"ldap://127.0.0.1")
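
To make cases 2 and 3 concrete, here is a rough Python sketch. It is not slapd's code: exact_match, normalized_match and normalize are names made up for illustration, and the default-port table is an assumption of the sketch. It only contrasts a plain string comparison with a comparison of parsed scheme/host/port. Case 4 would additionally need name resolution, which the sketch deliberately leaves out, and case 1 (no -h) is not modelled.

```python
from urllib.parse import urlsplit

# Assumed default ports for the sketch (standard LDAP/LDAPS ports).
DEFAULT_PORTS = {"ldap": 389, "ldaps": 636}


def normalize(uri):
    """Reduce a URI to (scheme, host, port); the trailing slash is ignored."""
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()
    host = parts.hostname or ""          # hostname is already lower-cased
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


def exact_match(provider, listeners):
    """Plain string comparison, as described in the report above."""
    return provider in listeners


def normalized_match(provider, listeners):
    """Compare parsed URIs instead of raw strings (no DNS lookups)."""
    target = normalize(provider)
    return any(normalize(listener) == target for listener in listeners)
```

With a listener list of ["ldap://1.2.3.4/"], exact_match("ldap://1.2.3.4", ...) is false while normalized_match is true, and the same holds for "ldap://1.2.3.4:389". "ldap://localhost" against "ldap://127.0.0.1" stays a mismatch in both, since nothing is resolved.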

I saw the comment in the code that clarifies this behaviour. However, it's a
surprising behaviour, and I think there is code to parse this kind of thing in
the serverID detection now. Maybe it could be reused?

Otherwise, we should probably document this behaviour, to avoid headaches :)

Comment 1 Howard Chu 2009-02-12 19:41:44 UTC
jclarke@linagora.com wrote:
> Full_Name: Jonathan Clarke
> Version: RE24
> OS: irrelevant
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (213.41.243.192)
>
>
> Hi,
>
> When adding a syncrepl config, the function add_syncrepl performs a "check if
> URL points to current server". This check is based on an exact match between the
> provider parameter from the syncrepl config line, and the URIs given to slapd on
> startup.
>
> If this doesn't match when it should, the database is marked as a shadow, and
> all following updates fail with "shadow context; no update refs". This is quite
> a pain when it happens on cn=config :)
>
> There are multiple cases when this happens:
> 1) If no specific URI was specified on launch (no -h option)
> 2) Port numbers are explicitly specified or not (":389")
> 3) Trailing slash (for example "ldap://1.2.3.4" != "ldap://1.2.3.4/")
> 4) IP is specified rather than DNS name ("ldap://localhost" !=
> "ldap://127.0.0.1")
>
> I saw the comment in the code that clarifies this behaviour. However, it's a
> surprising behaviour, and I think there is code to parse this kind of thing in
> the serverID detection now. Maybe it could be reused?
>
> Otherwise, we should probably document this behaviour, to avoid headaches :)

The manpage says the serverID URL must use an FQDN. We already do a number of 
guesses in the code, I don't see any reason to extend this further.

1) with no URI the listener will default to localhost. The serverID URL should 
therefore refer to localhost, or omit the hostname.

2) Port numbers shouldn't be an issue, since they're always matched in the 
parsed URLs.

3) Trailing slashes don't matter in the parsed URLs.

4) The doc is quite explicit about using FQDN. I have no sympathy for people 
who trip over this.

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 2 Howard Chu 2009-02-12 19:52:06 UTC
hyc@symas.com wrote:
> jclarke@linagora.com wrote:
>> Full_Name: Jonathan Clarke
>> Version: RE24
>> OS: irrelevant
>> URL: ftp://ftp.openldap.org/incoming/
>> Submission from: (NULL) (213.41.243.192)
>>
>>
>> Hi,
>>
>> When adding a syncrepl config, the function add_syncrepl performs a "check if
>> URL points to current server". This check is based on an exact match between the
>> provider parameter from the syncrepl config line, and the URIs given to slapd on
>> startup.
>>
>> If this doesn't match when it should, the database is marked as a shadow, and
>> all following updates fail with "shadow context; no update refs". This is quite
>> a pain when it happens on cn=config :)
>>
>> There are multiple cases when this happens:
>> 1) If no specific URI was specified on launch (no -h option)
>> 2) Port numbers are explicitly specified or not (":389")
>> 3) Trailing slash (for example "ldap://1.2.3.4" != "ldap://1.2.3.4/")
>> 4) IP is specified rather than DNS name ("ldap://localhost" !=
>> "ldap://127.0.0.1")
>>
>> I saw the comment in the code that clarifies this behaviour. However, it's a
>> surprising behaviour, and I think there is code to parse this kind of thing in
>> the serverID detection now. Maybe it could be reused?

Oops, never mind. I was thinking of the serverID code, and you're talking 
about the syncrepl code.

I'll have to think a bit about why the two aren't behaving identically; 
probably they should both use the same code.

Comment 3 Howard Chu 2009-02-12 19:56:49 UTC
Howard Chu wrote:
> hyc@symas.com wrote:
>> jclarke@linagora.com wrote:
>>> I saw the comment in the code that clarifies this behaviour. However, it's a
>>> surprising behaviour, and I think there is code to parse this kind of thing in
>>> the serverID detection now. Maybe it could be reused?
>
> Oops, never mind. I was thinking of the serverID code, and you're talking
> about the syncrepl code.
>
> I'll have to think a bit about why the two aren't behaving identically;
> probably they should both use the same code.
>
Ah right. For the serverID code, we're doing a liberal match because anything 
that matches the current name:port is obviously the current server, and we 
want it to be identified as such.

For syncrepl, we know full well that it may be the current server, but for 
whatever reason you may want to replicate against yourself anyway (e.g., 
against a different baseDN, etc...).

Comment 4 Jonathan 2009-02-12 20:41:23 UTC
On Thu, 12 February 2009 20:41, Howard Chu wrote:
> jclarke@linagora.com wrote:
>> Full_Name: Jonathan Clarke
>> Version: RE24
>> OS: irrelevant
>> URL: ftp://ftp.openldap.org/incoming/
>> Submission from: (NULL) (213.41.243.192)
>>
>>
>> Hi,
>>
>> When adding a syncrepl config, the function add_syncrepl performs a
>> "check if
>> URL points to current server". This check is based on an exact match
>> between the
>> provider parameter from the syncrepl config line, and the URIs given to
>> slapd on
>> startup.
>>
>> If this doesn't match when it should, the database is marked as a
>> shadow, and
>> all following updates fail with "shadow context; no update refs". This
>> is quite
>> a pain when it happens on cn=config :)
>>
>> There are multiple cases when this happens:
>> 1) If no specific URI was specified on launch (no -h option)
>> 2) Port numbers are explicitly specified or not (":389")
>> 3) Trailing slash (for example "ldap://1.2.3.4" != "ldap://1.2.3.4/")
>> 4) IP is specified rather than DNS name ("ldap://localhost" !=
>> "ldap://127.0.0.1")
>>
>> I saw the comment in the code that clarifies this behaviour. However,
>> it's a
>> surprising behaviour, and I think there is code to parse this kind of
>> thing in
>> the serverID detection now. Maybe it could be reused?
>>
>> Otherwise, we should probably document this behaviour, to avoid
>> headaches :)
>
> The manpage says the serverID URL must use an FQDN. We already do a number
> of
> guesses in the code, I don't see any reason to extend this further.

I'm sorry, but I was not referring to the serverID URI matching. I'm
referring to syncrepl provider matching to listeners.

I mentioned the serverID matching since it seems to work like syncrepl
provider matching should.

Regards,
Jonathan

Comment 5 Jonathan 2009-02-12 22:09:22 UTC
hyc@symas.com wrote:
> Howard Chu wrote:
>   
>> hyc@symas.com wrote:
>>     
>>> jclarke@linagora.com wrote:
>>>       
>>>> I saw the comment in the code that clarifies this behaviour. However, it's a
>>>> surprising behaviour, and I think there is code to parse this kind of thing in
>>>> the serverID detection now. Maybe it could be reused?
>>>>         
>> Oops, never mind. I was thinking of the serverID code, and you're talking
>> about the syncrepl code.
>>
>> I'll have to think a bit about why the two aren't behaving identically;
>> probably they should both use the same code.
>>
>>     
> Ah right. For the serverID code, we're doing a liberal match because anything 
> that matches the current name:port is obviously the current server, and we 
> want it to be identified as such.
>
> For syncrepl, we know full well that it may be the current server, but for 
> whatever reason you may want to replicate against yourself anyway (e.g., 
> against a different baseDN, etc...).
>   

Actually, this comes straight from test049 (config replication). With 
multimaster config replication, it starts by replicating itself 
(useless, but necessary for other masters). This problem doesn't appear 
in the test case, since a variable ($URI1) is used for both the slapd -h 
$URI1 option, and in the syncrepl provider=$URI1 LDIF. Otherwise it 
would produce weird results.
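
A small Python sketch of that "one variable for both" idea, for illustration only. The helper names slapd_args and syncrepl_line are made up; the URI value and the olcSyncRepl line are the ones used by the test script.

```python
# Single source of truth for the listener URI, like $URI1 in the test suite.
URI1 = "ldap://localhost:9011/"


def slapd_args(uri):
    """Command line for slapd with an explicit listener (-h)."""
    return ["slapd", "-h", uri]


def syncrepl_line(uri):
    """olcSyncRepl value whose provider is the very same string."""
    return (
        f"olcSyncRepl: rid=001 provider={uri} "
        'binddn="cn=config" bindmethod=simple'
    )


args = slapd_args(URI1)
line = syncrepl_line(URI1)
```

Because both strings are built from URI1, the provider can never drift from the listener by a port number, a trailing slash or a hostname spelling.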

Jon

Comment 6 Howard Chu 2009-02-13 00:23:10 UTC
Jonathan Clarke wrote:
> hyc@symas.com wrote:
>> Ah right. For the serverID code, we're doing a liberal match because anything
>> that matches the current name:port is obviously the current server, and we
>> want it to be identified as such.
>>
>> For syncrepl, we know full well that it may be the current server, but for
>> whatever reason you may want to replicate against yourself anyway (e.g.,
>> against a different baseDN, etc...).

> Actually, this comes straight from test049 (config replication). With
> multimaster config replication, it starts by replicating itself
> (useless, but necessary for other masters). This problem doesn't appear
> in the test case, since a variable ($URI1) is used for both the slapd -h
> $URI1 option, and in the syncrepl provider=$URI1 LDIF. Otherwise it
> would produce weird results.

If you want a setup similar to test049, follow what it does. If you want to do 
something else, do otherwise. There are valid reasons to point a syncrepl 
consumer at the same slapd (e.g., proxy syncrepl, automatic local 
backup/mirroring, rewriting a subtree of a main database, etc.). Using the 
exhaustive matching that serverID uses would preclude those cases from working.

As usual, when you're configuring a server, you have to pay attention to what 
you're doing and what effect you want to accomplish.

Comment 7 Jonathan 2009-02-13 13:28:56 UTC
On 13.02.2009 01:23, Howard Chu wrote:
> Jonathan Clarke wrote:
>> hyc@symas.com wrote:
>>> Ah right. For the serverID code, we're doing a liberal match because 
>>> anything
>>> that matches the current name:port is obviously the current server, 
>>> and we
>>> want it to be identified as such.
>>>
>>> For syncrepl, we know full well that it may be the current server, 
>>> but for
>>> whatever reason you may want to replicate against yourself anyway 
>>> (e.g.,
>>> against a different baseDN, etc...).
>> Actually, this comes straight from test049 (config replication). With
>> multimaster config replication, it starts by replicating itself
>> (useless, but necessary for other masters). This problem doesn't appear
>> in the test case, since a variable ($URI1) is used for both the slapd -h
>> $URI1 option, and in the syncrepl provider=$URI1 LDIF. Otherwise it
>> would produce weird results.
>
> If you want a setup similar to test049, follow what it does. If you 
> want to do something else, do otherwise.
I am following what test049 does. Only difference is test049 starts 
slapd with "-h $URI1" (with URI1="ldap://localhost:9011/") and I start 
slapd with no -h option or a slightly different URI. This is quite 
common, I believe.

If you patch the test as follows, it fails:
8<----------------------------------------------------
--- tests/scripts/test049-sync-config    10 Feb 2009 12:29:01 -0000    1.4.2.9
+++ tests/scripts/test049-sync-config    13 Feb 2009 10:11:22 -0000
@@ -112,7 +112,7 @@ $LDAPMODIFY -D cn=config -H $URI1 -y $CO
-olcSyncRepl: rid=001 provider=$URI1 binddn="cn=config" bindmethod=simple
+olcSyncRepl: rid=001 provider=`echo $URI1 | sed "s/localhost/127.0.0.1/"` binddn="cn=config" bindmethod=simple
8<----------------------------------------------------

Error:
8<----------------------------------------------------
Running ./scripts/test049-sync-config...
running defines.sh
Starting producer slapd on TCP/IP port 9011...
Using ldapsearch to check that producer slapd is running...
Inserting syncprov overlay on producer...
ldapmodify failed for syncrepl config (10)!
8<----------------------------------------------------

> There are valid reasons to point a syncrepl consumer at the same slapd 
> (e.g., proxy syncrepl, automatic local backup/mirroring, rewriting a 
> subtree of a main database, etc.). Using the exhaustive matching that 
> serverID uses would preclude those cases from working.
I see. I understand you don't want to break existing functionality. 
However, I still feel this is a bug, or at least that you need to "hack" 
the setup to get things working as expected.

Would I be right in assuming that in all the "valid reasons to point a 
syncrepl consumer at the same slapd" you mention, the syncrepl consumer 
uses a different base DN than the base DN of the database that the 
syncrepl consumer is configured on? If so, maybe the exhaustive matching 
that serverID uses could be used, if and only if those two base DNs are 
different?

Jonathan

Comment 8 Jonathan 2009-02-13 16:23:51 UTC
On 13.02.2009 14:29, jclarke@linagora.com wrote:
> On 13.02.2009 01:23, Howard Chu wrote:
>    
>> Jonathan Clarke wrote:
>>      
>>> hyc@symas.com wrote:
>>>        
>>>> Ah right. For the serverID code, we're doing a liberal match because
>>>> anything
>>>> that matches the current name:port is obviously the current server,
>>>> and we
>>>> want it to be identified as such.
>>>>
>>>> For syncrepl, we know full well that it may be the current server,
>>>> but for
>>>> whatever reason you may want to replicate against yourself anyway
>>>> (e.g.,
>>>> against a different baseDN, etc...).
>>>>          
>>> Actually, this comes straight from test049 (config replication). With
>>> multimaster config replication, it starts by replicating itself
>>> (useless, but necessary for other masters). This problem doesn't appear
>>> in the test case, since a variable ($URI1) is used for both the slapd -h
>>> $URI1 option, and in the syncrepl provider=$URI1 LDIF. Otherwise it
>>> would produce weird results.
>>>        
>> If you want a setup similar to test049, follow what it does. If you
>> want to do something else, do otherwise.
>>      
> I am following what test049 does.
Oops, sorry, I've been mixing 049 and 050 in my head. My bad, this isn't 
a problem for test049 indeed.

Although I am curious as to why test049 creates a syncrepl consumer 
targeted at itself? Can't quite get my head around it yet, but will keep 
thinking, with regard to multimaster.

Comment 9 Howard Chu 2009-02-14 00:51:27 UTC
Jonathan Clarke wrote:
> On 13.02.2009 01:23, Howard Chu wrote:
>> If you want a setup similar to test049, follow what it does. If you
>> want to do something else, do otherwise.
> I am following what test049 does. Only difference is test049 starts
> slapd with "-h $URI1" (with URI1="ldap://localhost:9011/") and I start
> slapd with no -h option or a slightly different URI. This is quite
> common, I believe.

And the slapd(8) manpage clearly states that with no -h option, it defaults to 
"ldap:///". If the default suits your purpose, use it. If not, then specify 
the URL you want explicitly. Yes, it's quite common to start slapd with no -h 
option, because it suits the general case. This discussion is not about the 
general case.

> If you patch the test as follows, it fails:

Obviously you should not do that.

>> There are valid reasons to point a syncrepl consumer at the same slapd
>> (e.g., proxy syncrepl, automatic local backup/mirroring, rewriting a
>> subtree of a main database, etc.). Using the exhaustive matching that
>> serverID uses would preclude those cases from working.
> I see. I understand you don't want to break existing functionality.
> However, I still feel this is a bug, or at least that you need to "hack"
> the setup to get things working as expected.

The docs state that "syncrepl provider" is the LDAP URI of the master server. 
If you're not putting in the same URI as the master is using, that's a config 
error, not a bug.

> Would I be right in assuming that in all the "valid reasons to point a
> syncrepl consumer at the same slapd" you mention, the syncrepl consumer
> uses a different base DN than the base DN of the database that the
> syncrepl consumer is configured on?  If so, maybe the exhaustive matching
> that serverID uses could be used, if and only if those two base DNs are
> different?

That sounds reasonable. Will think about it a bit more. Certainly if the two 
base DNs are the same, it will cause a loop...
Comment 10 Howard Chu 2009-02-14 00:57:32 UTC
jclarke@linagora.com wrote:
> This is a multi-part message in MIME format.

Please do not send HTML/MIME emails here.

> On 13.02.2009 14:29, jclarke@linagora.com wrote:
> I am following what test049 does.

> Oops, sorry, I've been mixing 049 and 050 in my head. My bad, this isn't
> a problem for test049 indeed.
>
> Although I am curious as to why test049 creates a syncrepl consumer
> targeted at itself? Can't quite get my head around it yet, but will keep
> thinking, with regard to multimaster.

Very simple. Once the real consumer connects to the master and replicates, it 
will acquire whatever config was on the master. If the master doesn't have a 
consumer pointed at itself, then once the consumer acquires the master's 
configuration, the consumer's syncrepl config will disappear, and replication 
will stop.
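
As a toy model of that sequence (plain Python dicts standing in for cn=config, nothing like the real syncrepl machinery; replicate_config and still_replicating are made-up names, and olcExample is a placeholder key for "some other setting"):

```python
def replicate_config(provider_config):
    """The consumer ends up with a copy of whatever config the provider has."""
    return dict(provider_config)


def still_replicating(config):
    """A server only keeps replicating while it has a syncrepl entry."""
    return "olcSyncRepl" in config


SYNCREPL = 'rid=001 provider=ldap://localhost:9011/ binddn="cn=config"'

# Master WITHOUT a consumer pointed at itself.
master_plain = {"olcExample": "value"}
# Master WITH a consumer pointed at itself.
master_self = {"olcExample": "value", "olcSyncRepl": SYNCREPL}

# The consumer starts out with its own syncrepl entry in both cases.
consumer_before = {"olcSyncRepl": SYNCREPL}

consumer_after_plain = replicate_config(master_plain)
consumer_after_self = replicate_config(master_self)
```

After the first sync, consumer_after_plain has lost its olcSyncRepl entry and still_replicating() is false for it, while consumer_after_self keeps the entry because the master carried it too.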

Comment 11 Howard Chu 2009-02-14 01:07:40 UTC
changed state Open to Feedback

Comment 12 Jonathan 2009-02-16 08:28:36 UTC
On 14.02.2009 01:51, Howard Chu wrote:
> Jonathan Clarke wrote:
>> On 13.02.2009 01:23, Howard Chu wrote:
>>> If you want a setup similar to test049, follow what it does. If you
>>> want to do something else, do otherwise.
>> I am following what test049 does. Only difference is test049 starts
>> slapd with "-h $URI1" (with URI1="ldap://localhost:9011/") and I start
>> slapd with no -h option or a slightly different URI. This is quite
>> common, I believe.
>
> And the slapd(8) manpage clearly states that with no -h option, it
> defaults to "ldap:///". If the default suits your purpose, use it. If
> not, then specify the URL you want explicitly. Yes, it's quite common to
> start slapd with no -h option, because it suits the general case. This
> discussion is not about the general case.
>
>> If you patch the test as follows, it fails:
>
> Obviously you should not do that.
>
>>> There are valid reasons to point a syncrepl consumer at the same slapd
>>> (e.g., proxy syncrepl, automatic local backup/mirroring, rewriting a
>>> subtree of a main database, etc.). Using the exhaustive matching that
>>> serverID uses would preclude those cases from working.
>> I see. I understand you don't want to break existing functionality.
>> However, I still feel this is a bug, or at least that you need to "hack"
>> the setup to get things working as expected.
>
> The docs state that "syncrepl provider" is the LDAP URI of the master
> server. If you're not putting in the same URI as the master is using,
> that's a config error, not a bug.

Understood. Nothing is broken, and we have worked around this issue 
using explicit listeners to slapd -h.

This behaviour did surprise me, considering the behaviour of serverID 
matching. May I suggest that a note of warning is included in the admin 
guide? For example:
http://milopita.phillipoux.net/jonathan-clarke-20090216.patch

>> Would I be right in assuming that in all the "valid reasons to point a
>> syncrepl consumer at the same slapd" you mention, the syncrepl consumer
>> uses a different base DN than the base DN of the database that the
>> syncrepl consumer is configured on? If so, maybe the exhaustive matching
>> that serverID uses could be used, if and only if those two base DNs are
>> different?
>
> That sounds reasonable. Will think about it a bit more. Certainly if the
> two base DNs are the same, it will cause a loop...

Indeed. For what it's worth, while testing on cn=config, I noticed this 
having various consequences including:
1) cn=config becoming read-only (shadow) and updates returning a referral.
2) in a MMR setup, some updates not being replicated to all N servers.

Regards,
Jonathan

Comment 13 Howard Chu 2009-03-03 17:36:26 UTC
changed notes

Comment 14 Howard Chu 2009-07-29 07:07:25 UTC
jclarke@linagora.com wrote:
> On 14.02.2009 01:51, Howard Chu wrote:
>> Jonathan Clarke wrote:
>>> On 13.02.2009 01:23, Howard Chu wrote:
>>>> There are valid reasons to point a syncrepl consumer at the same slapd
>>>> (e.g., proxy syncrepl, automatic local backup/mirroring, rewriting a
>>>> subtree of a main database, etc.). Using the exhaustive matching that
>>>> serverID uses would preclude those cases from working.
>>> I see. I understand you don't want to break existing functionality.
>>> However, I still feel this is a bug, or at least that you need to "hack"
>>> the setup to get things working as expected.
>>
>> The docs state that "syncrepl provider" is the LDAP URI of the master
>> server. If you're not putting in the same URI as the master is using,
>> that's a config error, not a bug.
>
> Understood. Nothing is broken, and we have worked around this issue
> using explicit listeners to slapd -h.
>
> This behaviour did surprise me, considering the behaviour of serverID
> matching. May I suggest that a note of warning is included in the admin
> guide? For example:
> http://milopita.phillipoux.net/jonathan-clarke-20090216.patch
>
>>> Would I be right in assuming that in all the "valid reasons to point a
>>> syncrepl consumer at the same slapd" you mention, the syncrepl consumer
>>> uses a different base DN than the base DN of the database that the
>>> syncrepl consumer is configured on? If so, maybe the exhaustive matching
>>> that serverID uses could be used, if and only if those two base DNs are
>>> different?

Actually, the exhaustive matching is only needed if the two base DNs are the 
same. This is now patched in HEAD, please test.
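
The rule can be modelled roughly as follows. This is a Python sketch of the idea only, not the code committed to HEAD; provider_is_self and same_endpoint are made-up names, DNs are compared as case-insensitive strings for simplicity, and falling back to the exact string comparison when the base DNs differ is an assumption of the sketch, since the thread does not spell that branch out.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"ldap": 389, "ldaps": 636}   # assumed defaults


def same_endpoint(a, b):
    """Liberal comparison: same scheme, host and (defaulted) port."""
    def key(uri):
        p = urlsplit(uri)
        scheme = p.scheme.lower()
        return (scheme, p.hostname or "", p.port or DEFAULT_PORTS.get(scheme))
    return key(a) == key(b)


def provider_is_self(provider, listeners, search_base, db_suffix):
    """Decide whether a syncrepl provider refers to this very server."""
    if search_base.lower() == db_suffix.lower():
        # Same base DN: replicating from ourselves would loop, so match
        # liberally to be sure we recognise our own listeners.
        return any(same_endpoint(provider, l) for l in listeners)
    # Different base DN (assumption): keep the old exact comparison.
    return provider in listeners
```

For example, with a listener "ldap://localhost:9011/" and provider "ldap://localhost:9011", the function returns true when both DNs are cn=config, and false when the search base is some other suffix.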

>> That sounds reasonable. Will think about it a bit more. Certainly if the
>> two base DNs are the same, it will cause a loop...
>
> Indeed. For what it's worth, while testing on cn=config, I noticed this
> having various consequences including:
> 1) cn=config becoming read-only (shadow) and updates returning a referral.
> 2) in a MMR setup, some updates not being replicated to all N servers.

Comment 15 Howard Chu 2009-07-29 07:19:34 UTC
changed notes
changed state Feedback to Test
moved from Incoming to Software Bugs

Comment 16 Quanah Gibson-Mount 2009-08-02 21:26:54 UTC
changed notes
changed state Test to Release

Comment 17 Quanah Gibson-Mount 2009-09-08 16:43:02 UTC
changed notes
changed state Release to Closed

Comment 18 OpenLDAP project 2014-08-01 21:04:23 UTC
doc added in HEAD
fixed in HEAD
fixed in RE24