Issue 7400 - Memberof and Syncrepl incompatibility
Summary: Memberof and Syncrepl incompatibility
Status: RESOLVED FIXED
Alias: None
Product: OpenLDAP
Classification: Unclassified
Component: slapd
Version: 2.4.29
Hardware: All All
Importance: --- normal
Target Milestone: 2.6.8
Assignee: OpenLDAP project
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-09-25 11:45 UTC by arunkumar_1123@yahoo.com
Modified: 2024-03-26 16:56 UTC

Description arunkumar_1123@yahoo.com 2012-09-25 11:45:21 UTC
Full_Name: Arunkumar shanmugam
Version: 2.4.29
OS: rhel5
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (203.83.248.32)


Hi,

I'm currently using OpenLDAP 2.4.29 to model an authorization platform. I
noticed some inconsistent behavior with the syncrepl and memberof overlays.

The issue happens as follows:

If I create groups with a large number of members and delete them in quick
succession on the writemaster, the data replicated to the readslave is
incorrect, in particular the memberof fields of the User objects.

This seems to happen because the memberof field is getting replicated to the
slave nodes, although the documentation states that it shouldn't. While
replicating, the User object is replicated inclusive of the memberof fields, but
by the time the syncrepl search comes to the group object, it has already been
deleted, and hence not replicated. This leaves a dangling memberof field in the
read slave instance.

I was wondering if anyone has faced this issue (I did not see any ITS related
to this) and has a workaround.

Thanks,
Arunkumar
Comment 1 Howard Chu 2012-09-30 15:21:54 UTC
arunkumar_1123@yahoo.com wrote:
> Full_Name: Arunkumar shanmugam
> Version: 2.4.29
> OS: rhel5
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (203.83.248.32)
>
>
> Hi,
>
> I'm currently using Openldap 2.4.29 to model an Authorization platform. I
> noticed some inconsistent behavior with syncrepl and memberof overlays.

Does this issue occur with the current release, 2.4.32?
>
> The issue happens as follows:
>
> If I Create groups with a large number of members and delete them in quick
> succession on the writemaster, the data replicated to the readslave is
> incorrect, in particular, the memberof fields of the User objects.
>
> This seems to happen because the memberof field is getting replicated to the
> slave nodes, although the documentation states that it shouldn't.

Indeed. Do you have debug logs showing the replication traffic, and showing 
that the memberof attribute got replicated?

> While
> replicating, the User object is replicated inclusive of the memberof fields, but
> by the time the syncrepl search comes to the group object, it has already been
> deleted, and hence not replicated. This leaves a dangling memberof field in the
> read slave instance.
>
> I was wondering if anyone has faced this issues (I did not see any ITS related
> to this), and has a workaround.
>
> Thanks,
> Arunkumar
>
>


-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 2 arunkumar_1123@yahoo.com 2012-10-01 11:51:44 UTC
Hi,
Yes, I am able to reproduce the issue with 2.4.32.


Making sense of the logs for the exact reproduction is hard since it needs a lot of operations in a short time. But this will probably help:

1. At the start of the test, the group temp_group existed.

2. I created a user temp_user and added temp_user to temp_group.

3. During replication, the group was replicated first, and I got an error 32 (NO_SUCH_OBJECT) when the readslave tried to modify the memberOf of the temp_user object (which did not exist on the readslave yet).

4. The temp_user object was replicated next, and as shown below, querying it does show a memberOf attribute, proving that this field was replicated. (Note that I ran OpenLDAP with debug level -1 and verified that the replicated data has the memberOf field in it. I can provide this too if needed.)

5. The more serious problem occurs when the sequence is reversed and the group is deleted as the last operation: the user is replicated first, but since the group has been deleted, it is never replicated and a stale memberOf value stays with the user.


LOGS:

5069157c syncrepl_message_to_entry: rid=179 DN: ou=temp_group,ou=group,dc=example,dc=com, UUID: 31090ab1-f8a7-4363-83c2-c0ac0d3918d4
5069157c syncrepl_entry: rid=179 LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_ADD)
5069157c syncrepl_entry: rid=179 inserted UUID 31090ab1-f8a7-4363-83c2-c0ac0d3918d4
5069157c syncrepl_entry: rid=179 be_search (0)
5069157c syncrepl_entry: rid=179 ou=temp_group,ou=group,dc=example,dc=com
5069157c slap_queue_csn: queing 0x4270c730 20121001040100.779862Z#000000#000#000000
5069157c slap_graduate_commit_csn: removing 0xfa035c0 20121001040100.779862Z#000000#000#000000
5069157c conn=-1 op=0: memberof_value_modify DN="uid=temp_user,dc=example,dc=com" add memberOf="ou=temp_group,ou=group,dc=example,dc=com" failed err=32
5069157c syncrepl_entry: rid=179 be_modify ou=temp_group,ou=group,dc=example,dc=com (0)
5069157c syncrepl_message_to_entry: rid=179 DN: uid=temp_user,dc=example,dc=com, UUID: 748bd1a9-6be3-450b-809c-5ea692aa073c
5069157c syncrepl_entry: rid=179 LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_ADD)
5069157c syncrepl_entry: rid=179 inserted UUID 748bd1a9-6be3-450b-809c-5ea692aa073c
5069157c syncrepl_entry: rid=179 be_search (0)
5069157c syncrepl_entry: rid=179 uid=temp_user,dc=example,dc=com
5069157c syncrepl_entry: rid=179 be_add uid=temp_user,dc=example,dc=com (0)

The object temp_user:

dn: uid=temp_user,dc=example,dc=com
memberOf: ou=temp_group,ou=group,dc=example,dc=com


What is interesting is that in this case, the replicated memberOf field actually protects the slave from incorrect data, because the temp_user entry was not present when the group was replicated (the user entry was second in the replication order). On the other hand, the reversed order during replication causes the bug described in step 5.
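
The reversed sequence from step 5 can be sketched as a toy simulation. This is only a model under stated assumptions, not slapd code: directories are plain dicts, the helper names (make_provider, provider_delete_group, refresh, dangling_memberof) are hypothetical, and the provider is assumed to strip memberOf from members on group deletion without updating their entryCSN, so the user entry is not sent again.

```python
import copy

GROUP_DN = "ou=temp_group,ou=group,dc=example,dc=com"
USER_DN = "uid=temp_user,dc=example,dc=com"


def make_provider():
    """Toy provider state: one group and one user that is a member of it."""
    return {
        GROUP_DN: {"member": [USER_DN]},
        USER_DN: {"memberOf": [GROUP_DN]},
    }


def provider_delete_group(provider, group_dn):
    """Delete the group and strip memberOf from its members.

    Assumption: the members' entryCSN is not touched, so the changed
    user entries are not sent to the consumer again.
    """
    group = provider.pop(group_dn)
    for member_dn in group["member"]:
        values = provider[member_dn].get("memberOf", [])
        if group_dn in values:
            values.remove(group_dn)


def refresh(provider, consumer, order, delete_after=None):
    """Copy entries to the consumer in the given order.

    If delete_after names a DN, the group is deleted on the provider
    right after that entry has been sent (the race from step 5).
    """
    for dn in order:
        if dn in provider:
            consumer[dn] = copy.deepcopy(provider[dn])
        if dn == delete_after:
            provider_delete_group(provider, GROUP_DN)
    return consumer


def dangling_memberof(directory):
    """Return (entry DN, group DN) pairs whose group does not exist."""
    return [
        (dn, group_dn)
        for dn, entry in directory.items()
        for group_dn in entry.get("memberOf", [])
        if group_dn not in directory
    ]


if __name__ == "__main__":
    provider = make_provider()
    consumer = refresh(provider, {}, [USER_DN, GROUP_DN], delete_after=USER_DN)
    print("provider dangling:", dangling_memberof(provider))
    print("consumer dangling:", dangling_memberof(consumer))
```

In this model the provider ends up consistent, while the consumer keeps a memberOf value that points at a group it never received.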

Thanks



Comment 3 Howard Chu 2012-10-01 13:29:33 UTC
arun s wrote:
> Hi,
> Yes, I am able to reproduce the issue with 2.4.32
>
> Making sense of the logs for the exact reproduction is hard since it needs a
> lot of operations in a short time. But this will probably help:
>
> 1. At the start of the test, the group temp_group existed.
>
> 2. I created a user temp_user and added temp_user to temp_group.
>
> 3. During replication, the group was replicated first and I got an error 32
> (NO_SUCH_OBJECT) when it tried to modify the memberOf of the temp_user object
> (This does not exist in the readslave yet).
>
> 4. The temp_user object was replicated next, and as you see, querying it does
> show a memberOf attribute, proving that this field was replicated. (Note that
> I have run OpenLDAP with debug as -1 and verified that the replicated data has
> the memberOf field in it. I can provide this too if needed.)

I see. The current code drops the memberOf attribute if it was not explicitly 
requested in the search. However, by default the consumer requests "+" which 
means "all operational attributes" and so slapd considers memberOf to have 
been requested. We need to reconsider this aspect of the design.
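
The attribute selection described here can be illustrated with a simplified model. It is a sketch only and not slapd's implementation: the function name select_attributes and the small set of operational attributes are assumptions made for the example, and it folds the provider's selection and a consumer-side exclusion list (the role the syncrepl exattrs parameter plays, which the later ITS#7400 commits tell consumers to use) into one function.

```python
# Attributes treated as operational in this toy model (an assumption;
# a real server decides this from the schema).
OPERATIONAL = {"entryuuid", "entrycsn", "memberof"}


def select_attributes(entry, requested, exattrs=()):
    """Return the attributes of `entry` that would be kept.

    requested: attribute names, where "*" means all user attributes
               and "+" means all operational attributes.
    exattrs:   attribute names to drop regardless of what was requested.
    """
    wanted_names = {name.lower() for name in requested}
    excluded = {name.lower() for name in exattrs}
    kept = {}
    for name, values in entry.items():
        key = name.lower()
        if key in excluded:
            continue
        wildcard = "+" if key in OPERATIONAL else "*"
        if key in wanted_names or wildcard in wanted_names:
            kept[name] = values
    return kept


if __name__ == "__main__":
    user = {
        "uid": ["temp_user"],
        "entryCSN": ["20121001040100.779862Z#000000#000#000000"],
        "memberOf": ["ou=temp_group,ou=group,dc=example,dc=com"],
    }
    print(sorted(select_attributes(user, ["*"])))
    print(sorted(select_attributes(user, ["*", "+"])))
    print(sorted(select_attributes(user, ["*", "+"], exattrs=["memberOf"])))
```

With "+" in the request, memberOf counts as requested and is kept; listing it in the exclusion list removes it again while the other operational attributes still pass through.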
>
> 5. The more serious problem occurs when the sequence is reversed and the group
> has been deleted as a last operation - The user is replicated first, but since
> the group is deleted, it is never replicated and a stale memberOf entry stays
> with the user.

Comment 4 arunkumar_1123@yahoo.com 2012-10-09 11:48:32 UTC
Hi,
To try and overcome this issue, we tried two fixes:

1. Every time a user was deleted from a group, we force-updated the user 
object manually to make sure its entryCSN got updated and it got 
replicated properly. This is an expensive operation and did not scale 
well for big group sizes (10-20k), and did not work out.

2. We then tried to do the same thing in OpenLDAP. We noticed in the 
memberof.c commits that there were a couple of patches to force the 
entryCSN of the user object to get updated (http://tinyurl.com/8k4qrdj 
and http://tinyurl.com/9akqgfl). These have since been reverted because of 
access log and some replication issues, but for us, speed was a higher 
priority. I reapplied these patches to the code. This solved the 
memberOf replication issue, but we noticed that occasionally, under 
heavy load, OpenLDAP's memory usage suddenly surged until it consumed 
all available memory, and slapd finally crashed.

We have gone back to option (1), though (2) would be the preferred option.

Any help on figuring out why (2) caused the memory bloat would be really 
great. I can provide more details/memory traces if needed.

We will be glad to contribute any fixes once we are able to nail down 
the issue.

Thanks,
Arunkumar

Comment 5 Howard Chu 2012-10-09 14:32:02 UTC
arun s wrote:
> Hi,
> To try and overcome this issue, we tried two fixes:
>
> 1. Every time a user was deleted from a group, we force-updated the user
> object manually to make sure its entryCSN got updated and it got
> replicated properly. This is an expensive operation and did not scale
> well for big group sizes (10-20k), and did not work out.

This is also one of the reasons the code in memberof.c was reverted. Working 
as you suggest *cannot* scale. It makes no difference whether you do it in 
your external client or inside slapd, the amount of actual work required to 
replicate everything quickly grows out of control. This is why the 
documentation states that memberof must be configured on each replica - the 
only way to successfully execute the amount of work is to distribute the work 
evenly to each replica.
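
A rough count makes the scaling argument concrete. This is back-of-the-envelope arithmetic, not a measurement: the function name entries_sent is hypothetical, it counts only whole entries sent for a single group change, and it assumes that forcing an entryCSN update makes every member entry replicate in full to every replica.

```python
def entries_sent(group_size, replicas, force_member_update):
    """Entries shipped to all replicas for one group change.

    With memberof configured on each replica, only the group entry is
    replicated and each replica updates its own members locally.
    With forced entryCSN updates, every member entry is replicated too.
    """
    per_replica = 1 + (group_size if force_member_update else 0)
    return per_replica * replicas


if __name__ == "__main__":
    # Group size taken from the 10-20k range mentioned in Comment 4;
    # the replica count of 3 is an arbitrary illustration.
    for forced in (False, True):
        print(forced, entries_sent(20000, 3, forced))
```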

> 2. We then tried to do the same thing in OpenLDAP. We noticed in the
> memberof.c commits that there were a couple of patches to force the
> entryCSN of the user object to get updated. (http://tinyurl.com/8k4qrdj
> and http://tinyurl.com/9akqgfl)These have since been reverted because of
> access log and some replication issues, but for us, speed was a higher
> priority. I reapplied these patches back to the code. This solved the
> member-of replication issue, but we noticed that occasionally under a
> heavy load, there was a sudden surge in OpenLDAP's memory usage going up
> to whatever memory was available and finally crashing.
>
> We have gone back to option (1) though (2) would be the preferred option.
>
> Any help on figuring out why (2) caused the memory bloat would be really
> great. I can provide more details/memory traces if needed.
>
> We will be glad to contribute any fixes once we are able to nail down
> the issue.

Probably this conversation should continue on the openldap-devel mailing list. 
It needs some new design work; it is not a simple bugfix.

>
> Thanks,
> Arunkumar

Comment 6 Quanah Gibson-Mount 2017-04-03 18:15:49 UTC
changed notes
moved from Incoming to Software Bugs
Comment 7 Quanah Gibson-Mount 2017-08-30 20:51:16 UTC
changed notes
Comment 8 OpenLDAP project 2017-08-30 21:06:13 UTC
memberOf needs rewrite to be syncrepl compatible
Comment 9 Quanah Gibson-Mount 2017-08-30 21:06:13 UTC
changed notes
Comment 10 Quanah Gibson-Mount 2020-06-26 18:50:38 UTC
The fix in issue#9227 should partially help.
Comment 11 Quanah Gibson-Mount 2020-09-01 22:57:31 UTC
For 2.5, memberof is deprecated and the recommendation is to use slapo-dynlist as a replacement.  The 2.5 dynlist allows memberOf population on objects via static and/or dynamic groups.
Comment 12 Quanah Gibson-Mount 2024-02-15 18:20:25 UTC
head:

  • ab55c7fd 
by Howard Chu at 2024-02-06T01:22:58+00:00 
ITS#7400 memberof: note consumers must use exattr


RE26:

  • 6b81fca5 
by Howard Chu at 2024-02-15T17:56:24+00:00 
ITS#7400 memberof: note consumers must use exattr
Comment 13 Quanah Gibson-Mount 2024-03-26 16:56:12 UTC
RE26:

  • f30def77 
by Howard Chu at 2024-03-26T16:38:10+00:00 
ITS#7400 slapo-memberof: delete note about deprecation
Comment 14 Quanah Gibson-Mount 2024-03-26 16:56:33 UTC
head:

  • ae1c8f18 
by Howard Chu at 2024-02-20T15:55:37+00:00 
ITS#7400 slapo-memberof: delete note about deprecation