
Re: segfaults with syncrepl

On Friday 27 June 2008 19:26:22 Liutauras Adomaitis wrote:
> hello everybody,
> I'm quite new to OpenLDAP. Actually I've been using it for a few years, but
> I have no deep knowledge.
> The problem I'm facing is that my consumer replicas are segfaulting.

There were a number of fixes to syncrepl in 2.4.9. 

> My design:
> I have one master with several o=BranchX,dc=example,dc=com branches. This is
> the provider. I have several (the number is X-1) replicas, the consumers.
> Each consumer replicates its own branch o=BranchX,dc=example,dc=com and
> one common branch o=BranchMain,dc=example,dc=com.
> The picture is like this:
> Provider
> o=BranchMain,dc=example,dc=com
> o=Branch1,dc=example,dc=com
> o=Branch2,dc=example,dc=com
> .....
> o=BranchX,dc=example,dc=com
> Consumer 1:
> o=BranchMain,dc=example,dc=com
> o=Branch1,dc=example,dc=com
> Consumer 2:
> o=BranchMain,dc=example,dc=com
> o=Branch2,dc=example,dc=com

But it seems you have implemented this by using a single database at 
dc=example,dc=com, with multiple syncrepl statements (one for each subtree 
that you replicate). As far as I know, this is not supported. Instead, you 
should consider using a separate database for each syncrepl statement, and 
glue the databases together by using the 'subordinate' statement in each 
sub-tree database.

This would look something like this:

database bdb
suffix o=BranchMain,dc=example,dc=com
syncrepl ...

database bdb
suffix o=Branch1,dc=example,dc=com
syncrepl ...

database bdb
suffix dc=example,dc=com
syncrepl ...
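
Filled out a little for Consumer 1, the layout could look like the sketch 
below. It is untested and rests on several assumptions: the per-database 
directories under /var/lib/ldap are hypothetical (each bdb database needs its 
own directory), rid=3 with scope=base for the top-level entry is only a guess 
at what the last syncrepl statement would carry, and the bind settings are 
copied from your posted configuration. The sub-tree databases come first, 
because slapd wants a database whose suffix lies inside another's to be 
defined before it.

```
# Sketch for Consumer 1 (untested). Directory paths are hypothetical.
# Databases glued into one naming context should share the same rootdn.
# Index, limits and updateref lines are omitted for brevity.

database        bdb
suffix          "o=BranchMain,dc=example,dc=com"
subordinate
rootdn          "cn=Manager,dc=example,dc=com"
directory       /var/lib/ldap/branchmain
syncrepl rid=1
    provider=ldap://master.server:389
    type=refreshAndPersist
    retry="60 +"
    searchbase="o=BranchMain,dc=example,dc=com"
    scope=sub
    bindmethod=simple
    binddn="cn=Manager,dc=example,dc=com"
    credentials=secret
    starttls=yes

database        bdb
suffix          "o=Branch1,dc=example,dc=com"
subordinate
rootdn          "cn=Manager,dc=example,dc=com"
directory       /var/lib/ldap/branch1
syncrepl rid=2
    provider=ldap://master.server:389
    type=refreshAndPersist
    retry="60 +"
    searchbase="o=Branch1,dc=example,dc=com"
    scope=sub
    bindmethod=simple
    binddn="cn=Manager,dc=example,dc=com"
    credentials=secret
    starttls=yes

# Superior database, defined last. scope=base is an assumption: it
# would pull only the dc=example,dc=com entry itself.
database        bdb
suffix          "dc=example,dc=com"
rootdn          "cn=Manager,dc=example,dc=com"
rootpw          secret
directory       /var/lib/ldap/top
syncrepl rid=3
    provider=ldap://master.server:389
    type=refreshAndPersist
    retry="60 +"
    searchbase="dc=example,dc=com"
    scope=base
    bindmethod=simple
    binddn="cn=Manager,dc=example,dc=com"
    credentials=secret
    starttls=yes
```

Each syncrepl statement keeps its own rid, and each database now has exactly 
one of them.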

> At the beginning I had one consumer, which was segfaulting randomly once
> or twice a day. I decided to comment out the syncrepl directives in the conf
> file, and it has now been running for a day and a half.

I'm running 2.4.9 or 2.4.10 on my own systems at present, but I didn't see any 
problems like this on small databases with 2.4.8.

> I should mention that after the
> consumer segfaults I cannot start slapd any more. The only solution I have
> is to delete all of the /var/lib/ldap (database) directory contents and then
> restart slapd.

Did running database recovery (/etc/init.d/ldap recover) make any difference?

> If I restart slapd on the old database, the segfault
> happens again.
> Since this was a small branch and only one branch, I thought I would debug the
> problem later. Today I faced the same situation on a bigger consumer:
> slapd just crashed, and only deleting the database helped me to
> start it again.

For a larger database, you should have some database cache specified in the 
DB_CONFIG file in the database directory.
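
As a rough illustration, and assuming your slapd build accepts the back-bdb 
'dbconfig' directive, the cache can also be set from the database section of 
slapd.conf. slapd only writes these settings into DB_CONFIG when that file 
does not exist yet, so an existing DB_CONFIG has to be edited directly. The 
256 MB figure is a placeholder, not a recommendation for your data size.

```
database        bdb
suffix          "dc=example,dc=com"
directory       /var/lib/ldap
# set_cachesize <gbytes> <bytes> <ncache>: 0 GB + 268435456 bytes
# (256 MB) in a single cache segment. The size is an assumed value.
dbconfig set_cachesize 0 268435456 1
```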

> My systems are Mandriva 2008.1 with slapd version:
> @(#) $OpenLDAP: slapd 2.4.8 (Mar 23 2008 16:49:39) $
>         mandrake@klodia.mandriva.com:
> /home/mandrake/rpm/BUILD/openldap-2.4.8/servers/slapd

I am considering shipping an official update, most likely to 2.4.10. In the 
meantime, I have released a 2.4.10 to backports for 2008.1. If fixing your 
configuration doesn't address all your stability problems, you may want to 
consider upgrading to that package.

> I have one branch running old slapd versions (the ones coming with Mandriva
> 2007.0), and they seem to work, except that I can replicate only one
> branch (one rid).

See above: the multiple-database approach (one syncrepl statement per 
database) would work in 2.3 as well.

> Seems old slapd doesn't support several rids. 

And the new slapd mainly supports multiple syncrepl statements in the same 
database for multi-master replication, not for the design you've chosen.

> Can anybody help me to debug this situation? This configuration is rather
> new, but I was thinking of building all infrastructure on such a configuration,
> so segfaulting is a very big issue.
> Provider (master) configuration is:
> include /usr/share/openldap/schema/core.schema
> include /usr/share/openldap/schema/cosine.schema
> include /usr/share/openldap/schema/corba.schema
> include /usr/share/openldap/schema/inetorgperson.schema
> include /usr/share/openldap/schema/nis.schema
> include /usr/share/openldap/schema/openldap.schema
> include /usr/share/openldap/schema/samba.schema
> include /usr/share/openldap/schema/qmail.schema
> include /etc/openldap/schema/local.schema
> include         /etc/openldap/slapd.access.conf
> access to dn.subtree="dc=example,dc=com"
>         by group="cn=Replicator,ou=Group,dc=example,dc=com"
>         by users read
>         by anonymous read
> pidfile         /var/run/ldap/slapd.pid
> argsfile        /var/run/ldap/slapd.args
> modulepath      /usr/lib64/openldap
> moduleload     syncprov.la
> TLSRandFile             /dev/random
> TLSCipherSuite          HIGH:MEDIUM:+SSLv2+SSLv3
> TLSCertificateFile      /etc/pki/tls/certs/slapd.pem
> TLSCertificateKeyFile   /etc/pki/tls/certs/slapd.pem
> TLSCACertificatePath    /etc/pki/tls/certs/
> TLSCACertificateFile    /etc/pki/tls/certs/ca-bundle.crt
> TLSVerifyClient never # ([never]|allow|try|demand)
> database        bdb
> suffix          "dc=example,dc=com"
> rootdn          "cn=Manager,dc=example,dc=com"
> rootpw          secret
> directory       /var/lib/ldap
> checkpoint 256 5
> index   mailAlternateAddress                    eq,sub
> index   accountStatus,mailHost,deliveryMode     eq
> index   default                                 sub
> index   objectClass                                             eq
> index   cn,mail,surname,givenname                               eq,subinitial
> index   uidNumber,gidNumber,memberuid,member,uniqueMember       eq
> index   uid                                                     eq,subinitial
> index   sambaSID,sambaDomainName,displayName                    eq
> index  entryCSN,entryUUID                                      eq
> limits group="cn=Replicator,dc=infosaitas,dc=lt"
>      size=unlimited
>      time=unlimited
> access to *
>         by group="cn=Replicator,dc=infosaitas,dc=lt" write
>         by * read
> overlay syncprov
> syncprov-checkpoint 100 10
> syncprov-sessionlog 10
> Consumers configuration (all the same):
> include /usr/share/openldap/schema/core.schema
> include /usr/share/openldap/schema/cosine.schema
> include /usr/share/openldap/schema/corba.schema
> include /usr/share/openldap/schema/inetorgperson.schema
> include /usr/share/openldap/schema/nis.schema
> include /usr/share/openldap/schema/openldap.schema
> include /usr/share/openldap/schema/samba.schema
> include /usr/share/openldap/schema/qmail.schema
> include /etc/openldap/schema/local.schema
> include         /etc/openldap/slapd.access.conf
> include         /etc/openldap/slapd.access.ldapauth.conf
> access to dn.subtree="dc=example,dc=com"
>         by group="cn=Replicator,ou=Group,dc=example,dc=com"
>         by users read
>         by anonymous read
> pidfile         /var/run/ldap/slapd.pid
> argsfile        /var/run/ldap/slapd.args
> modulepath      /usr/lib64/openldap
> moduleload      back_ldap.la
> TLSCertificateFile      /etc/ssl/openldap/ldap.pem
> TLSCertificateKeyFile   /etc/ssl/openldap/ldap.pem
> TLSCACertificateFile    /etc/ssl/openldap/ldap.pem
> overlay                 chain
> chain-uri               "ldap://master.server"
> chain-idassert-bind     bindmethod="simple"
>                         binddn="cn=Manager,dc=example,dc=com"
>                         credentials=secret
>                         mode="none"
> chain-tls               start
> chain-return-error      TRUE
> database        bdb
> suffix          "dc=example,dc=com"
> rootdn          "cn=Manager,dc=example,dc=com"
> rootpw          secret
> directory       /var/lib/ldap
> checkpoint 256 5
> index   objectClass                                             eq
> index   mailAlternateAddress                    eq,sub
> index   accountStatus,mailHost,deliveryMode     eq
> index   default                                 sub
> index   cn,mail,surname,givenname                               eq,subinitial
> index   uidNumber,gidNumber,memberuid,member,uniqueMember       eq
> index   uid                                                     eq,subinitial
> index   sambaSID,sambaDomainName,displayName                    eq
> limits group="cn=Replicator,ou=Group,dc=example,dc=com"
>  size=unlimited
>  time=unlimited
> syncrepl rid=1
>     provider=ldap://master.server:389
>     type=refreshAndPersist
>     retry="60 +"
>     searchbase="o=BranchMain,dc=example,dc=com"
>     filter="(objectClass=*)"
>     scope=sub
>     attrs=*

Using attrs=* means you don't replicate operational attributes. Either leave 
attrs unspecified, or spell out the default ('attrs=*,+').
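
For instance, your rid=1 statement with only the attrs line changed might 
look like the sketch below. Every other parameter is copied unchanged from 
your configuration, and I have not tested it.

```
syncrepl rid=1
    provider=ldap://master.server:389
    type=refreshAndPersist
    retry="60 +"
    searchbase="o=BranchMain,dc=example,dc=com"
    filter="(objectClass=*)"
    scope=sub
    attrs="*,+"
    schemachecking=off
    bindmethod=simple
    binddn="cn=Manager,dc=example,dc=com"
    credentials=secret
    starttls=yes
```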

>     schemachecking=off
>     bindmethod=simple
>     binddn="cn=Manager,dc=example,dc=com"
>     credentials=secret
>     starttls=yes
> syncrepl rid=2
>     provider=ldap://master.server:389
>     type=refreshAndPersist
>     retry="60 +"
>     searchbase="o=Branch1,dc=example,dc=com"
>     filter="(objectClass=*)"
>     scope=sub
>     attrs=*
>     schemachecking=off
>     bindmethod=simple
>     binddn="cn=Manager,dc=example,dc=com"
>     credentials=secret
>     starttls=yes
> updateref ldap://master.server
