Issue 6268 - multi-master sync replication ldap_add error code 68 bug
Summary: multi-master sync replication ldap_add error code 68 bug
Status: VERIFIED FIXED
Alias: None
Product: OpenLDAP
Classification: Unclassified
Component: slapd
Version: 2.4.17
Hardware: All All
Importance: --- normal
Target Milestone: ---
Assignee: OpenLDAP project
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-08-20 14:45 UTC by bcolston@xtec.com
Modified: 2021-01-15 21:07 UTC

Description bcolston@xtec.com 2009-08-20 14:45:01 UTC
Full_Name: Barry Colston
Version: 2.4.17
OS: Fedora 10
URL: 
Submission from: (NULL) (209.255.208.219)


While testing sync replication, I encountered a situation in which a previously
deleted DN cannot be added again because the ldapadd command receives a 68 error
code.  If I perform an ldapsearch command for the DN, the DN is not found.  If I
try to add the DN, the add fails with a 68 error code.  If I perform a slapcat
command for that DN, slapcat displays the record.

I have 3 multi-master servers in my configuration; all 3 run on the same
physical server, listening on different ports and with separate copies of the
BDB databases. Each server replicates to the other 2 servers (e.g., server 1
replicates to server 2 and server 3, server 2 replicates to server 1 and server
3, and server 3 replicates to server 1 and server 2) using the refreshAndPersist
mode. I execute 3 shell scripts simultaneously, each of which adds a set of
parent/child records using the ldapadd command and ldif files, then deletes the
records using the ldapdelete command (each shell script adds and deletes about
267 records, and each shell script operates on a separate set of DNs). The 3
shell scripts issue the ldapadd and ldapdelete commands against server 1 and
repeat the add/delete cycle 10 times before exiting. After all 3 shell scripts
finish, I compare the server 1 records against the server 2 records and the
server 1 records against the server 3 records, listing any differences.  The way
I normally run the shell scripts results in about 800 records being added and
then deleted 10 times, for a total of about 8000 adds/deletes.
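
The load scripts themselves are not attached to this report. The following is
only a minimal sketch of what one such add/delete loop might look like. The
function name run_cycles, the argument layout, and the use of a password file
are assumptions made for illustration, not the reporter's actual scripts. It
assumes the LDIF file lists parents before children and the DN file lists
children before parents.

```shell
#!/bin/sh
# Hypothetical sketch of one load script (not the reporter's original).
# run_cycles <ldap-uri> <bind-dn> <password-file> <add.ldif> <dns.txt> [cycles]
run_cycles() {
    uri=$1; binddn=$2; pwfile=$3; ldif=$4; dnfile=$5; cycles=${6:-10}
    i=1
    while [ "$i" -le "$cycles" ]; do
        # Add the parent/child set; the LDIF must list each parent first.
        ldapadd -x -H "$uri" -D "$binddn" -y "$pwfile" -f "$ldif" || {
            rc=$?
            echo "cycle $i: ldapadd failed with exit code $rc" >&2
            return "$rc"
        }
        # Delete the same set; dnfile holds one DN per line, leaf entries first.
        ldapdelete -x -H "$uri" -D "$binddn" -y "$pwfile" -f "$dnfile" || {
            rc=$?
            echo "cycle $i: ldapdelete failed with exit code $rc" >&2
            return "$rc"
        }
        i=$((i + 1))
    done
}
```

Three copies of such a loop, each given its own LDIF and DN file and all pointed
at server 1, would be started in the background and waited on. Stopping at the
first failure keeps the cycle number in which the 68 error first appears.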

After executing the above 3 shell scripts multiple times (without bringing down
slapd between runs), some records fail to be added because the ldapadd command
returns error code 68 (entryAlreadyExists).  After this occurs, I execute an
ldapsearch command for the DN of the record that received the 68 error; the
ldapsearch command fails to find the record (which is correct because the
record was deleted).  If I perform a slapcat command for the DN against server
1's BDB files, slapcat finds the record, but it is listed with an objectClass
and structuralObjectClass of "glue" (which differ from the values it had when
the record was added).  Slapcat performed against the 2 other master servers
(server 2 and server 3) does not display a record.  When this error occurs,
there are usually multiple records that fail to add with an error code of 68.
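
The check described above (invisible to ldapsearch, but present as a "glue"
entry in slapcat output) could be scripted roughly as follows. This is a sketch
under assumptions: the function name diagnose_dn and its output labels are
hypothetical, anonymous simple binds are assumed to be permitted for the search,
and the installed slapcat is assumed to support the -s (subtree) option. Because
-s dumps the whole subtree, a glue entry below the DN would also match.

```shell
#!/bin/sh
# Hypothetical helper: compare the live LDAP view of a DN with the on-disk view.
# diagnose_dn <ldap-uri> <slapd.conf> <dn>
# Prints one of: present, absent, glue-orphan, unknown
diagnose_dn() {
    uri=$1; conf=$2; dn=$3
    # Live view: base-scope search requesting no attributes ("1.1").
    # ldapsearch normally exits with the LDAP result code, so 32
    # (noSuchObject) means the server does not see the entry.
    live=0
    ldapsearch -x -LLL -H "$uri" -b "$dn" -s base '(objectClass=*)' 1.1 \
        >/dev/null 2>&1 || live=$?
    # On-disk view: dump the subtree at the DN straight from the database
    # and look for an entry whose structural class is "glue".
    if slapcat -f "$conf" -s "$dn" 2>/dev/null |
        grep -qi '^structuralObjectClass: glue$'; then
        glue=yes
    else
        glue=no
    fi
    case "$live:$glue" in
        0:*)    echo present ;;
        32:yes) echo glue-orphan ;;
        32:no)  echo absent ;;
        *)      echo unknown ;;
    esac
}
```

A result of glue-orphan on server 1 together with absent on servers 2 and 3
would match the state described in this report.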

I have removed all BDB index files and rerun the slapindex command, but the DN
is still not found with the ldapsearch command and fails to be added because of
a 68 error code.

This condition is not repeatable on demand, but if I run my scripts doing
ldapadd/ldapdelete multiple times, it will eventually occur.

This error appears to be related to the value specified in my slapd.conf file
for "syncprov-sessionlog".  I have changed the value of "syncprov-sessionlog",
with the following results:

syncprov-sessionlog 5000 - usually execution 3 or 4 results in the ldapadd 68
error (i.e., the first 2 executions of 8000 adds/deletes work OK)

syncprov-sessionlog 50000 - usually execution 6 or 7 results in the ldapadd 68
error (i.e., the first 5 executions of 8000 adds/deletes work OK)

syncprov-sessionlog 200 - usually execution 2 or 3 results in the ldapadd 68
error (i.e., the first execution of 8000 adds/deletes works OK)

syncprov-sessionlog not specified - usually execution 2 or 3 results in 1 of the
slapd servers crashing with a segmentation fault
    (usually server 1, but sometimes the other servers)
    (example of crash output:
    *** glibc detected *** /tmp/reptest/openldap/openldap-2.4.17/libexec/slapd: malloc(): memory corruption (fast): 0x9e61f860 ***)

I am using BDB 4.6.21 and have tested with the 4 BDB patches applied and not
applied (the 68 error occurs using BDB without the patches and BDB with the
patches).
Comment 1 Howard Chu 2009-08-20 16:44:05 UTC
bcolston@xtec.com wrote:
> Full_Name: Barry Colston
> Version: 2.4.17
> OS: Fedora 10
> URL:
> Submission from: (NULL) (209.255.208.219)
>
>
> While testing sync replication, I encountered a situation in which a previously
> deleted DN cannot be added again because the ldapadd command receives a 68 error
> code.  If I perform an ldapsearch command for the DN, the DN is not found.  If I
> try to add the DN, the add fails with a 68 error code.  If I perform a slapcat
> command for that DN, slapcat displays the record.

This sounds like it's related to ITS#6097.

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 2 Hallvard Furuseth 2009-11-23 21:11:26 UTC
moved from Incoming to Software Bugs
Comment 3 Quanah Gibson-Mount 2010-04-21 13:12:07 UTC
changed notes
Comment 4 OpenLDAP project 2014-08-01 21:04:29 UTC
See ITS#6097 as well
Comment 5 Quanah Gibson-Mount 2018-02-10 00:37:41 UTC
Hi Barry,

I had a few questions to ask in relation to this ITS:

a) Can you reproduce this problem with current RE24?

and

b) Could you provide copies of your shell scripts so I can create a 
generalized regression test case for this? (And, if this issue is not 
resolved as of yet, it will help with getting it fixed).

Thanks!

Regards,
Quanah

--

Quanah Gibson-Mount
Product Architect
Symas Corporation
Packaged, certified, and supported LDAP solutions powered by OpenLDAP:
<http://www.symas.com>


Comment 6 Quanah Gibson-Mount 2021-01-14 18:03:35 UTC
Likely fixed by recent work; may also have been tied to a BDB-specific bug.