
Re: OpenLDAP syncrepl woes



On Thu, Nov 17, 2011 at 5:50 PM, Howard Chu <hyc@symas.com> wrote:
>
> Jeffrey Crawford wrote:
>>
>> On Wed, Nov 16, 2011 at 1:27 PM, Howard Chu <hyc@symas.com> wrote:
>>    Jeffrey Crawford wrote:
>>        On Wed, Nov 16, 2011 at 7:40 AM, Jeffrey Crawford <jeffreyc@ucsc.edu> wrote:
>>            On Wed, Nov 16, 2011 at 12:09 AM, Howard Chu <hyc@symas.com> wrote:
>>                Jeffrey Crawford wrote:
>>                    I'm trying to stabilize our openldap server farm before
>>                    going live and
>>                    am finding that despite the contextCSN matching between
>>                    providers and
>>                    replicas, the actual content of the server is getting out
>>                    of sync.
>>                    This is most prominent when we are testing our population
>>                    routine and
>>                    we need to remove all accounts before starting. Right now
>>                    it's only
>>                    about 22000 entries (it will get much larger).
>
>>                    During the mass delete we got the following sprinkled
>>                    throughout the
>>                    logs on all machines:
>>                    ====
>>                    Nov 15 15:47:16 idm-prod-ldap-2 slapd[33070]:
>>                    bdb(dc=domain,dc=name):
>>                    previous transaction deadlock return not resolved
>>
>>
>>                Wow. I've never seen this error message before. What version
>>                of OpenLDAP and
>>                BerkeleyDB are you using?
>>
>>
>>            FreeBSD 8.2 with openldap 2.4.26. However, like I mentioned before,
>>            I think we are squeezing RAM right now. Part of this
>>            deployment was to discover how much RAM we needed on the virtual
>>            machine, and it was started pretty low.
>>
>>
>>        Oh and we are using bdb 4.6 right now (forgot to answer that)
>>
>>
>>    Running out of memory would cause an obvious error message ("no memory")
>>    so that's not likely to be the problem here. Might be worth upgrading to
>>    at least BDB 4.8, but again, never having seen BDB spit out that error
>>    before, that's just a guess.
>
>> Not sure if this is significant, but I've been noticing that this error only
>> shows up on deletes. However, it also shows up on deletes on the machine I'm
>> running the ldapdelete against. So perhaps this is more of a software issue.
>> I'll go ahead and run this with more RAM, and I'll check with the sysadmin if
>> they can compile it against bdb 4.8 and see if that changes anything. But I
>> don't think ITS#7052 applies here, because the machine I'm doing this against
>> does not use syncrepl; it's the provider to others.
>>
>> This is a machine on a VM. Are there any known issues with that?
>
> Way back in the dawn of time, there were some VMware implementations that didn't support mutexes correctly. I don't think that's been an issue for many years. There ought to be other error messages in your log, immediately preceding the one you quoted. Post those too.
>
> --
>  -- Howard Chu
>  CTO, Symas Corp.           http://www.symas.com
>  Director, Highland Sun     http://highlandsun.com/hyc/
>  Chief Architect, OpenLDAP  http://www.openldap.org/project/

There really isn't much there, but here is an example with what little
surrounds it (I've modified only the usernames):

Nov 17 21:11:55 localhost slapd[1912]: conn=1478 op=10706 DEL
dn="uid=user1,ou=people,dc=ucsc,dc=edu"
Nov 17 21:11:55 localhost slapd[1912]: conn=1478 op=10706 RESULT
tag=107 err=0 text=
Nov 17 21:11:55 localhost slapd[1912]: conn=1478 op=10707 DEL
dn="uid=user2,ou=people,dc=ucsc,dc=edu"
Nov 17 21:11:55 localhost slapd[1912]: conn=1478 op=10707 RESULT
tag=107 err=0 text=
Nov 17 21:11:55 localhost slapd[1912]: conn=1478 op=10708 DEL
dn="uid=user3,ou=people,dc=ucsc,dc=edu"
Nov 17 21:11:55 localhost slapd[1912]: conn=1478 op=10708 RESULT
tag=107 err=0 text=
Nov 17 21:11:55 localhost slapd[1912]: conn=1478 op=10709 DEL
dn="uid=user4,ou=people,dc=ucsc,dc=edu"
Nov 17 21:11:55 localhost slapd[1912]: bdb(dc=ucsc,dc=edu): previous
transaction deadlock return not resolved
Nov 17 21:11:55 localhost slapd[1912]: => bdb_idl_delete_key: cursor
failed: Invalid argument (22)
Nov 17 21:11:55 localhost slapd[1912]: conn=1478 op=10709 RESULT
tag=107 err=80 text=entry index delete failed
Nov 17 21:11:55 localhost slapd[1912]: conn=1478 op=10710 DEL
dn="uid=user5,ou=people,dc=ucsc,dc=edu"
Nov 17 21:11:55 localhost slapd[1912]: conn=1478 op=10710 RESULT
tag=107 err=0 text=
Nov 17 21:11:55 localhost slapd[1912]: conn=1478 op=10711 DEL
dn="uid=user6,ou=people,dc=ucsc,dc=edu"
Nov 17 21:11:56 localhost slapd[1912]: conn=1478 op=10711 RESULT
tag=107 err=0 text=
Nov 17 21:11:56 localhost slapd[1912]: conn=1478 op=10712 DEL
dn="uid=user7,ou=people,dc=ucsc,dc=edu"
Nov 17 21:11:56 localhost slapd[1912]: conn=1478 op=10712 RESULT
tag=107 err=0 text=