Re: (ITS#3267) more modrdn problems



--T4sUOijqQbZv57TR
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Thu, Mar 31, 2005 at 07:13:16PM -0800, Howard Chu wrote:
> entryIDs are not "recycled" unless they have fully wrapped around. Since 
> entryIDs are an unsigned long, which would generally be 32 bits (or 64 
> if you're on a 64bit architecture) that means you cycled through over 4 
> billion add/delete operations. It doesn't sound to me like that's what 
> you're doing.
No, that is not happening. The errors are about the id already being in use.
There has been no wrapping at all (around 140f entries had been entered).
However, I can see ``holes'' if I scroll the output down, as in the
output you have seen in the mail.

> The other case for holes in the entryID sequence arises because your 
> input LDIF file is not well-formed. 
The input LDIF was generated by a previous slapcat. So, there's
definitely something broken, either in the slapcat generation or
in the adding code.

  Ok, so, from the data we have:
	- no holes should have occurred at all. If they do, it
	  means there's something wrong either:
	    - in the id generation (unlikely, since that code has been
	      tested a lot)
	    - in the input LDIF (unlikely, since it has been generated
	      by slapcat)
	- if a hole is created (unlikely, since 4 billion entries would
	  have to have been added), the recycling code does not work correctly,
	  giving me some ids which are already in use (32 bit arch).

Note also that, after loading the database, with the ``wrong entries''
supposedly discarded by slapadd, I'd expect to have a ``clean''
database. But if I slapcat and slapadd again, the same kind of errors occur.
After the slapadd of last night, I slapcatted the database and slapadded
it again, and this time got errors at:

[...]
added: xxxx (00002db8)
added: xxxx (00002db9)
added: xxxx (00002dba)
added: xxxx (00002dbb)
=> bdb_tool_next_id: dn2id_add failed: DB_KEYEXIST: Key/data pair already exists (-30996)
=> hdb_tool_entry_put: txn_aborted! DB_KEYEXIST: Key/data pair already exists (-30996)
slapadd: could not add entry dn=xxxx (line=195428): txn_aborted! DB_KEYEXIST: Key/data pair already exists (-30996) (ffffffff)
added: xxxx (00002dbe)
added: xxxx (00002dbf)
added: xxxx (00002dc0)
[...]
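
To make the ``holes'' easier to spot in a long slapadd run, the output can
be scanned mechanically. What follows is only a sketch in Python, assuming
the hex number in parentheses on each ``added:'' line is the entryID, as the
excerpt above suggests. The names find_holes and ADDED_RE are made up for
this example and are not part of OpenLDAP.

```python
import re

# Matches lines such as "added: xxxx (00002db8)". Assumption: the 8 hex
# digits in parentheses are the entryID assigned by slapadd.
ADDED_RE = re.compile(r"^added: .* \(([0-9a-fA-F]{8})\)\s*$")


def find_holes(lines):
    """Return (previous_id, next_id) pairs wherever ids were skipped."""
    holes = []
    prev = None
    for line in lines:
        m = ADDED_RE.match(line)
        if not m:
            continue  # error lines and anything else are ignored
        cur = int(m.group(1), 16)
        if prev is not None and cur != prev + 1:
            holes.append((prev, cur))
        prev = cur
    return holes


# A shortened copy of the excerpt above.
sample = """\
added: xxxx (00002dba)
added: xxxx (00002dbb)
=> bdb_tool_next_id: dn2id_add failed: DB_KEYEXIST: Key/data pair already exists (-30996)
slapadd: could not add entry dn=xxxx (line=195428): txn_aborted! DB_KEYEXIST: Key/data pair already exists (-30996) (ffffffff)
added: xxxx (00002dbe)
""".splitlines()

for prev, cur in find_holes(sample):
    print(f"hole: {prev:08x} -> {cur:08x} ({cur - prev - 1} ids skipped)")
```

On the excerpt this reports one hole, between 00002dbb and 00002dbe, that is
two ids skipped around the single failed entry.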

  So:
    slapadd < file.ldiff 
[errors sent to you in the previous mail]
[slapd is down, on my local machine,
 trying to get some understanding of the problem,
 no queries have been performed at all]
    slapcat > new.ldiff
    rm /var/lib/ldap/!(DB_CONFIG)
    slapadd < new.ldiff
[errors above. Brand new entries failing, 
 brand new holes]
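
The sequence above can be scripted so that it is repeatable. This is a rough
sketch, assuming slapadd and slapcat are on the PATH and pick up the default
slapd.conf and database, and that the database directory holds only regular
files. wipe_db_dir and round_trip are hypothetical helper names. Note that
the `!(DB_CONFIG)` pattern in the rm line needs bash's extglob option; the
helper below does the same thing in plain Python.

```python
import subprocess
from pathlib import Path


def wipe_db_dir(db_dir, keep=("DB_CONFIG",)):
    """Delete every regular file in db_dir except the names in keep."""
    removed = []
    for path in sorted(Path(db_dir).iterdir()):
        if path.is_file() and path.name not in keep:
            path.unlink()
            removed.append(path.name)
    return removed


def round_trip(db_dir, first_ldif, second_ldif):
    """slapadd, slapcat, wipe, slapadd again. slapd must not be running."""
    # check=False: keep going whatever exit status slapadd returns, since
    # failing adds are exactly what is being investigated.
    with open(first_ldif) as src:
        subprocess.run(["slapadd"], stdin=src, check=False)
    with open(second_ldif, "w") as dst:
        subprocess.run(["slapcat"], stdout=dst, check=True)
    wipe_db_dir(db_dir)
    with open(second_ldif) as src:
        return subprocess.run(["slapadd"], stdin=src, check=False).returncode
```

A call such as round_trip("/var/lib/ldap", "file.ldiff", "new.ldiff") would
mirror the steps listed above; it is destructive, so it only belongs on a
test machine.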

  Note that slapd has not been started at all, and no
operations have been performed on the database. I tried both
with a ``well suited'' DB_CONFIG file and without. The only
thing the old failing entries have in common with the new ones
is that they have all been moved with modrdn2 from one parent
to a different parent. The application we use moves entries
``out of the tree'' (out of an ou it uses under the rootdn)
into a sort of ``trash'' (another ou under the same rootdn),
which is periodically cleaned up by a cron job. Only entries
in this ``trash'' seem to have this kind of problem. That is
why the bug report was titled ``modrdn problems'' in the
first place.
  To me, the above probably means the LDIF is sane, and
there's something wrong with slapcat and slapadd.
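
One way to test the ``LDIF is sane'' claim independently of slapadd is to
check that every entry's parent appears earlier in the file than the entry
itself. The following is a minimal sketch, not a full LDIF parser: it
assumes ``dn: '' or ``dn:: '' lines as slapcat writes them, compares DNs
after a naive normalization (lowercasing, trimming spaces around commas),
and does not handle every DN escaping rule. The function names and the
``ou=trash'' entry in the sample are made up for illustration.

```python
import base64
import re


def iter_dns(ldif_text):
    """Yield the DN of each entry in an LDIF text, in file order."""
    # Unfold continuation lines (a newline followed by a single space).
    unfolded = re.sub(r"\r?\n ", "", ldif_text)
    for line in unfolded.splitlines():
        if line.startswith("dn:: "):
            yield base64.b64decode(line[5:]).decode("utf-8")
        elif line.startswith("dn: "):
            yield line[4:]


def normalize(dn):
    # Naive: split on commas not preceded by a backslash, trim, lowercase.
    parts = re.split(r"(?<!\\),", dn)
    return ",".join(p.strip().lower() for p in parts)


def parent_of(dn):
    parts = re.split(r"(?<!\\),", dn, maxsplit=1)
    return parts[1] if len(parts) == 2 else ""


def orphans(ldif_text, suffix):
    """Return DNs whose parent had not appeared earlier in the file."""
    seen = set()
    suffix = normalize(suffix)
    missing = []
    for dn in iter_dns(ldif_text):
        ndn = normalize(dn)
        if ndn != suffix and parent_of(ndn) not in seen:
            missing.append(dn)
        seen.add(ndn)
    return missing


sample = """\
dn: dc=mydomain,dc=net
objectClass: domain

dn: cn=a,ou=trash,dc=mydomain,dc=net
objectClass: person

dn: ou=trash,dc=mydomain,dc=net
objectClass: organizationalUnit
"""

print(orphans(sample, "dc=mydomain,dc=net"))
```

In the sample, the cn=a entry is listed before its parent ou, so it is the
one reported. An empty result on a real slapcat dump would support the idea
that the ordering of the LDIF is not the problem.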

> An entry's parent must already exist 
> before the entry itself can be added. It is certainly possible that hole 
> management is broken, but as you suggested, this code has been heavily 
> used in back-bdb for over two years. back-hdb probably has not had the 
> same volume of testing as back-bdb, but much of the code is identical 
> anyway.
I don't know how much testing hdb has had under high load.
  We are experiencing a lot of trouble on many different installations
(before 2.2.15, we experienced almost a crash a week, from index corruption
to complete database corruption -- almost anything you can see in the ITS,
changelog and mailing list from 2.2.5 up to now). It is my fault that we
didn't file bug reports every time, but most of the time I had no real data
about what was going wrong (slapd ``just crashed'' or hung, and I had no idea
what operations were being performed at the time), and I couldn't reproduce
the problem on a regular basis or stop the systems (or attach gdb) to
investigate. I'm now working on a testing setup able to reproduce the above
problems that would allow me to attach a debugger.

> If you can supply an LDIF file (and slapd.conf) that reproduces the 
> problem, that will be most useful.
You can find the slapd.conf attached. Note this is the slapd.conf I used
on my testing system to reproduce the problems. Don't get scared by the
ACLs :) I'm working on an LDIF file. If I can find something, I'll let
you know. If you have any suspicions, I can easily check them out with gdb.
I'm not a novice with that kind of tool, I just didn't want to start poking
around in yet another code base, and the openldap sources don't seem the
easiest I have had to deal with. So, if I can provide any additional
info you may need with gdb, please let me know.

Cheers,
Carlo

-- 
  GPG Fingerprint: 2383 7B14 4D08 53A4 2C1A CA29 9E98 5431 1A68 6975
                        -------------
Real computer scientists like having a computer on their desk, else how
could they read their mail?

--T4sUOijqQbZv57TR
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="slapd.conf"

# Allow LDAPv2 binds
allow bind_v2 
allow update_anon
allow bind_anon_dn
allow bind_anon_cred

# This is the main slapd configuration file. See slapd.conf(5) for more
# info on the configuration options.

#######################################################################
# Global Directives:

# Features to permit
#allow bind_v2

# Schema and objectClass definitions
include         /etc/ldap/schema/core.schema
include         /etc/ldap/schema/cosine.schema
include         /etc/ldap/schema/nis.schema
include         /etc/ldap/schema/inetorgperson.schema
include		/etc/ldap/schema/mail-mb.schema

# Schema check allows for forcing entries to
# match schemas for their objectClasses's
schemacheck     on

# Where the pid file is put. The init.d script
# will not stop the server if you change this.
pidfile         /var/run/slapd/slapd.pid

# List of arguments that were passed to the server
argsfile        /var/run/slapd.args

# Read slapd.conf(5) for possible values
loglevel        0

# Where the dynamically loaded modules are stored
modulepath	/usr/lib/ldap
moduleload	back_hdb

sizelimit size.soft=1000
sizelimit size.hard=-1

#######################################################################
# Specific Backend Directives for bdb:
# Backend specific directives apply to this backend until another
# 'backend' directive occurs
backend		hdb

#######################################################################
# Specific Backend Directives for 'other':
# Backend specific directives apply to this backend until another
# 'backend' directive occurs
#backend		<other>

#######################################################################
# Specific Directives for database #1, of type bdb:
# Database specific directives apply to this database until another
# 'database' directive occurs
database        hdb
checkpoint 4096 1

# The base of your directory in database #1
suffix          "dc=nodomain"

# Where the database file are physically stored for database #1
directory       "/var/lib/ldap"

# Indexing options for database #1
index           objectClass eq

# Save the time that the entry gets modified, for database #1
lastmod         on

# Where to store the replica logs for database #1
# replogfile	/var/lib/ldap/replog

# The userPassword by default can be changed
# by the entry owning it if they are authenticated.
# Others should not be able to see it, except the
# admin entry below
# These access lines apply to database #1 only
access to * by anonymous write
access to * by * write

database        hdb
checkpoint 4096 1
#
# The base of your directory in database #1
suffix          "dc=mydomain,dc=net"

# Where the database file are physically stored for database #1
directory       "/var/lib/ldap2"

# Indexing options for database #1
index           objectClass eq
index		account eq
index		domain eq

# Save the time that the entry gets modified, for database #1
lastmod         on

# Where to store the replica logs for database #1
# replogfile	/var/lib/ldap/replog

# The userPassword by default can be changed
# by the entry owning it if they are authenticated.
# Others should not be able to see it, except the
# admin entry below
# These access lines apply to database #1 only
access to * by anonymous write
access to * by * write

#
# Ensure read access to the base for things like
# supportedSASLMechanisms.  Without this you may
# have problems with SASL not knowing what
# mechanisms are available and the like.
# Note that this is covered by the 'access to *'
# ACL below too but if you change that as people
# are wont to do you'll still need this if you
# want SASL (and possible other things) to work 
# happily.

# The admin dn has full write access, everyone else
# can read everything.

# For Netscape Roaming support, each user gets a roaming
# profile for which they have write access to
#access to dn=".*,ou=Roaming,o=morsnet"
#        by dn="cn=admin,dc=nodomain" write
#        by dnattr=owner write

#######################################################################
# Specific Directives for database #2, of type 'other' (can be bdb too):
# Database specific directives apply to this database until another
# 'database' directive occurs
#database        <other>

# The base of your directory for database #2
#suffix		"dc=debian,dc=org"

--T4sUOijqQbZv57TR--