
Re: Corrupt LDAP DB ... and 2.3.11 syncrepl



Hiya,

Thanks for the input, Buchan. Here is the slapd.conf for the 2.3.11 syncrepl provider:

#
# See slapd.conf(5) for details on configuration options.
# This file should NOT be world readable.
#
include         /usr/local/etc/openldap/schema/core.schema
include         /usr/local/etc/openldap/schema/cosine.schema
include         /usr/local/etc/openldap/schema/nis.schema
include         /usr/local/etc/openldap/schema/inetorgperson.schema
include         /usr/local/etc/openldap/schema/qmail.schema
include         /usr/local/etc/openldap/schema/openldap.schema

pidfile         /usr/local/var/run/slapd.pid
argsfile        /usr/local/var/run/slapd.args

access to * by * write

threads 10
loglevel 5
conn_max_pending 100

#######################################################################
# BDB database definitions
#######################################################################
database        bdb
suffix          "dc=ukbboss,dc=co,dc=uk"
rootdn          "cn=Manager,dc=ukbboss,dc=co,dc=uk"
rootpw  cheesetesting
directory       /database/openldap/openldap-data

index   objectClass     eq
index mailMessageStore sub,eq,pres
index mailServices sub,eq,pres
index mailUsername sub,eq,pres
index mail sub,eq,pres
index cn,uid eq
index uidNumber eq
index gidNumber eq
index entryCSN eq
index entryUUID eq

sizelimit unlimited
cachesize   10000000

overlay glue

#
# We are a sync provider
#
       overlay syncprov
       syncprov-checkpoint 1 1
       syncprov-sessionlog 5000

database monitor
#
# THE END
# ver 4.5.6 27/10/2005 18:45 edited by leigh@ark.cheese.local


When I start the slave the debugging looks like it connects OK but does not see anything to sync:


=>do_syncrep2
ldap_result ld 0x8222948 msgid -1
ldap_chkResponseList ld 0x8222948 msgid -1 all 0
ldap_chkResponseList returns ld 0x8222948 NULL
wait4msg ld 0x8222948 msgid -1 (infinite timeout)
wait4msg continue ld 0x8222948 msgid -1 all 0
** ld 0x8222948 Connections:
* host: 10.100.100.30  port: 389  (default)
  refcnt: 2  status: Connected
  last used: Thu Oct 27 18:09:43 2005

** ld 0x8222948 Outstanding Requests:
* msgid 2,  origid 2, status InProgress
  outstanding referrals 0, parent count 0
** ld 0x8222948 Response Queue:
  Empty
ldap_chkResponseList ld 0x8222948 msgid -1 all 0
ldap_chkResponseList returns ld 0x8222948 NULL
ldap_int_select
read1msg: ld 0x8222948 msgid -1 all 0
ber_get_next
ber_get_next: tag 0x30 len 591 contents:
read1msg: ld 0x8222948 msgid 2 message type search-entry
ber_scanf fmt ({xx) ber:
do_syncrep2: got search entry without control
ldap_msgfree
ldap_free_request (origid 2, msgid 2)
ldap_free_connection 1 1
ldap_send_unbind
ber_flush: 7 bytes to sd 11
ldap_free_connection: actually freed

I posted the slave slapd.conf earlier, and it all looks OK. There is nothing odd or error-like in the master debug output.
I'll try 2.3.9 and see what happens.


As for my 2.2 problems, I really cannot believe that OpenLDAP itself is breaking anything, as it's obvious that other
people here use the same code with installs much larger than mine without problems.


--
Leigh



Buchan Milne wrote:

On Thursday 27 October 2005 18:59, Leigh Porter wrote:


Hiya All,

I also have seen this problem. I have a master and syncrepl OpenLDAP
pair on various versions from 2.2.18



I don't think the OP was using syncrepl. Note that there are known issues with syncrepl providers on 2.2 ...




to 2.2.2x and get seemingly random database corruption problems on the
master. To fix this, I slapcat the slave,
delete the master db and slapadd it to the master, delete the slave
database and restart both - this of course fixes
it all..



And db_recover ?
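
The db_recover suggestion and the slapcat/slapadd reload described above can be sketched as a few shell functions. This is only a sketch under assumptions: the data directory is the one from the provider slapd.conf earlier in this message, the dump file name /root/slave-dump.ldif and the DRY_RUN switch are invented for illustration, and slapd is assumed to be stopped on the host concerned before db_recover or slapadd is run. With DRY_RUN left at 1 the commands are only printed.

```shell
#!/bin/sh
# Sketch only: prints the commands unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"
DB_DIR="${DB_DIR:-/database/openldap/openldap-data}"  # from the slapd.conf above
DUMP="${DUMP:-/root/slave-dump.ldif}"                 # hypothetical file name

# Echo a command, and execute it only when dry-run mode is switched off.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# On the master, with slapd stopped: try Berkeley DB recovery first.
try_recover() {
    run db_recover -v -h "$DB_DIR"
}

# On the slave: dump the known-good copy of the directory to LDIF.
dump_slave() {
    run slapcat -l "$DUMP"
}

# On the master, with slapd stopped and the old database files moved
# out of the data directory: load the LDIF taken from the slave.
reload_master() {
    run slapadd -v -l "$DUMP"
}
```

Calling try_recover, dump_slave and reload_master in turn prints the three commands; moving the old database files aside and copying the dump between hosts are left as manual steps on purpose.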



When the problem presents itself, the master refuses to answer queries
and usually hangs the connection; often
there are quite a few active connections to slapd, all hung. If I
slapcat the master's database, the slapcat will
get to a certain point and then hang.



Sounds like slapd is running out of db environment/locks etc etc.



I have no idea what could be causing this; we never caught it in a log :(
It's odd, as we are not doing much that is clever: not many updates but
quite a few reads (ISP auth system). I did have a DB_CONFIG
but deleted it, as performance was fine without it and I wanted to
rule it out.



But, maybe the database environment wasn't big enough to handle the few writes when under read load ...
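
One way to test the "environment too small" theory is to look at the Berkeley DB lock and cache statistics and to put a DB_CONFIG back into the data directory. The sketch below is an assumption-laden illustration, not a recommendation: the sizes are placeholders rather than tuned values, the data directory is the one from the slapd.conf above, and the DRY_RUN switch is invented for this example.

```shell
#!/bin/sh
# Sketch only: prints the commands unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"
DB_DIR="${DB_DIR:-/database/openldap/openldap-data}"  # from the slapd.conf above

# Echo a command, and execute it only when dry-run mode is switched off.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Print a candidate DB_CONFIG. All numbers are illustrative placeholders.
db_config() {
    cat <<'EOF'
# 256 MB cache in a single segment (gbytes bytes ncache)
set_cachesize 0 268435456 1
# lock table sizes
set_lk_max_locks 3000
set_lk_max_objects 1500
set_lk_max_lockers 1500
EOF
}

# Show lock (-c) and cache (-m) statistics for the environment.
show_stats() {
    run db_stat -c -h "$DB_DIR"
    run db_stat -m -h "$DB_DIR"
}
```

If the lock statistics show the maximum number of locks, lockers or objects being reached, that would support the theory. The output of db_config would go into a file named DB_CONFIG in the data directory; as far as I know the new sizes only apply once the environment is recreated, for example by running db_recover with slapd stopped.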




At the moment I am testing 2.3.11 on some lab boxes, but as per the last
posts syncrepl does not seem
to work at the moment.



I'm seeing one issue on 2.3.11, the contextCSN on the suffix of a syncprov'd db doesn't seem to update as it should (even with a suitable syncprov-checkpoint and indexed entryUUIDs). Restarting the provider fixes that, and immediately synchronisation starts up again..


I may consider downgrading to 2.3.9 with patches ...
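
The contextCSN behaviour described above can be checked by reading that attribute from the suffix entry on both servers and comparing the values. A minimal sketch, assuming anonymous reads are allowed: the suffix and the provider address come from the slapd.conf and debug output earlier in this message, while the consumer URI ldap://localhost and the DRY_RUN switch are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: prints the commands unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"
BASE="${BASE:-dc=ukbboss,dc=co,dc=uk}"        # suffix from the slapd.conf above
PROVIDER="${PROVIDER:-ldap://10.100.100.30}"  # host seen in the debug output
CONSUMER="${CONSUMER:-ldap://localhost}"      # assumption: run on the slave

# Echo a command, and execute it only when dry-run mode is switched off.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Read the contextCSN of the suffix entry from the server given as $1.
show_csn() {
    run ldapsearch -x -LLL -H "$1" -s base -b "$BASE" contextCSN
}

# Query both servers so the two values can be compared by eye.
compare_csn() {
    show_csn "$PROVIDER"
    show_csn "$CONSUMER"
}
```

If the provider's value does not advance after a write, or the consumer's value stays behind the provider's, that matches the symptom described here.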



On the subject of which, does anybody know why this would happen with
OpenLDAP 2.3.11 and BDB 4.3.29?


root@server1:/database/openldap/openldap-data# slapadd -vwl /root/test.ldif
slapadd: database doesn't support necessary operations.



No idea without seeing the slapd.conf.