
Replication appears to vanish, sometimes.




Hello list,

SunOS ldapmaster01.unix 5.10 Generic_118844-26 i86pc i386 i86pc
openldap-2.3.19
db-4.2.52.NC-sol10-intel-local
./configure LDFLAGS=-L/usr/local/lib --enable-crypt


Generally, OpenLDAP is working well, and so is replication. I get no .rej files, and the replog file itself is frequently drained.


However, ldapmaster and ldapslave frequently get out of sync. I initially thought it only affected modrdn operations, but we have since had "simpler" operations vanish as well.

I realise this isn't much to go on, but perhaps I can get suggestions which may lead to more clues.


The important-looking sections of slapd.conf:

loglevel none
database        bdb
suffix          "dc=company,dc=jp"
rootdn          "cn=admin,dc=company,dc=jp"

access to attr=userPassword
        by self write
        by anonymous auth
        by * none

access to *
        by self write
        by dn="cn=admin,dc=company,dc=jp" write
        by * read

password-hash {CRYPT}


checkpoint 128 15

directory       /usr/local/var/openldap-data
index   objectClass     eq
index   uid                     eq
index   uidNumber               eq
index   mail                    eq
index   mailAlternateAddress    pres,eq
index   deliveryMode            eq
index   accountStatus           eq
index   gecos                   eq
index   radiusGroupName         eq
index   o                       pres,eq


replogfile /usr/local/var/openldap-slurp/replica/slurpd.replog

replica host=172.20.12.23:389
        binddn="cn=admin,dc=company,dc=jp"
        bindmethod=simple
        credentials="<secret>"

The slapd.conf on ldapslave is identical (it really is scp'ed), but with the last two directives (replogfile, replica) removed.





# /usr/local/bin/ldapsearch -h ldapmaster01 -D cn=admin,dc=company,dc=jp -x -w secret -b ou=mail,dc=company,dc=jp mail=test01@cs-sd01.com
# extended LDIF
#
# LDAPv3
# base <ou=mail,dc=company,dc=jp> with scope subtree
# filter: mail=test01@cs-sd01.com
# requesting: ALL
#

# test01, cs-sd01.com, mail, company.jp
dn: uid=test01,o=cs-sd01.com,ou=mail,dc=company,dc=jp
objectClass: top
objectClass: account
objectClass: posixAccount
objectClass: shadowAccount
objectClass: qmailUser
objectClass: amavisAccount
uid: test01
cn: test01
[snip]

# search result
search: 2
result: 0 Success

# numResponses: 2
# numEntries: 1



# /usr/local/bin/ldapsearch -h ldapslave01 -D cn=admin,dc=company,dc=jp -x -w <secret> -b ou=mail,dc=company,dc=jp mail=test01@cs-sd01.com
# extended LDIF
#
# LDAPv3
# base <ou=mail,dc=company,dc=jp> with scope subtree
# filter: mail=test01@cs-sd01.com
# requesting: ALL
#


# search result
search: 2
result: 0 Success

# numResponses: 1
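
Rather than spot-checking one entry at a time as above, the two ldapsearch outputs could be compared by DN to list everything that is missing on the slave. This is only a minimal sketch in Python, assuming both servers have been dumped to text with ldapsearch as shown; the function names entry_dns and missing_on_slave are made up for illustration, and base64-encoded DNs ("dn::") are not handled.

```python
def entry_dns(ldif_text):
    """Collect the DNs that appear in ldapsearch LDIF output (lowercased)."""
    dns = set()
    current = None
    for line in ldif_text.splitlines():
        if current is not None and line.startswith(" "):
            # LDIF continuation line: glue it on without the leading space.
            current += line[1:]
            continue
        if current is not None:
            dns.add(current.strip().lower())
            current = None
        if line.lower().startswith("dn: "):
            current = line[4:]
    if current is not None:
        dns.add(current.strip().lower())
    return dns


def missing_on_slave(master_ldif, slave_ldif):
    """Return the DNs present in the master dump but absent from the slave dump."""
    return sorted(entry_dns(master_ldif) - entry_dns(slave_ldif))


if __name__ == "__main__":
    master = (
        "dn: uid=test01,o=cs-sd01.com,ou=mail,dc=company,dc=jp\n"
        "objectClass: top\n"
        "uid: test01\n"
        "\n"
        "# search result\n"
        "search: 2\n"
        "result: 0 Success\n"
    )
    slave = "# search result\nsearch: 2\nresult: 0 Success\n"
    for dn in missing_on_slave(master, slave):
        print("missing on slave:", dn)
```

Run against full subtree dumps from both hosts, this would show whether the lost changes cluster around particular operations or branches.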



The various slapd logs appear to contain little of interest, but perhaps "loglevel none" is the wrong setting?

ldapmaster usually gives lines like:

Mar 25 09:51:23 ldapmaster01.unix slapd[1319]: [ID 458966 local4.debug] do_search: invalid dn (uid=a.user, o=, ou=mail, dc=company, dc=jp)
Mar 25 09:51:25 ldapmaster01.unix slapd[1319]: [ID 458966 local4.debug] do_search: invalid dn (uid=user4447, o=, ou=mail, dc=company, dc=jp)
Mar 25 09:51:29 ldapmaster01.unix slapd[1319]: [ID 458966 local4.debug] do_search: invalid dn (uid=6201535, o=, ou=mail, dc=company, dc=jp)


These are just users entering their usernames incorrectly.
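
All three rejected DNs above have an empty "o=" component. To confirm that every "invalid dn" line in the log follows that pattern, and is not hiding some other problem, something like the following could be run over the syslog file. This is a rough sketch in Python; the function name empty_rdns is made up, and the DN is split naively on commas, so escaped commas inside values are not handled.

```python
import re

# Matches the slapd message shown in the log excerpt above.
INVALID_DN = re.compile(r"do_search: invalid dn \((.*)\)\s*$")


def empty_rdns(log_line):
    """Return the attribute names whose value is empty in an 'invalid dn' log line."""
    match = INVALID_DN.search(log_line)
    if not match:
        return []
    empty = []
    # Naive split: assumes no escaped commas inside RDN values.
    for rdn in match.group(1).split(","):
        attr, _, value = rdn.strip().partition("=")
        if value == "":
            empty.append(attr)
    return empty


if __name__ == "__main__":
    line = (
        "Mar 25 09:51:23 ldapmaster01.unix slapd[1319]: [ID 458966 local4.debug] "
        "do_search: invalid dn (uid=a.user, o=, ou=mail, dc=company, dc=jp)"
    )
    print(empty_rdns(line))
```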

ldapslave01 has no real log entries:

Mar 24 22:42:41 ldapslave01.unix slapd[25495]: [ID 854361 local4.debug] bdb_db_open: unclean shutdown detected; attempting recovery.
Mar 24 22:42:41 ldapslave01.unix slapd[25495]: [ID 100111 local4.debug] slapd starting



When things are out of sync, I stop ldapslave01, stop provisioning (the only thing writing to LDAP), rsync everything from ldapmaster, and then start it again.



We are considering running without ldapslave until this can be resolved; it will just be a bit slower. As I said, not all replication fails, just some of it.



There was a period when ldapslave was experiencing "too many files open" errors, but we have since increased the file descriptor limit (6 weeks ago).


I do also have a DB_CONFIG file.


Lund

--
Jorgen Lundman       | <lundman@lundman.net>
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500          (cell)
Japan                | +81 (0)3 -3375-1767          (home)