
Re: syncrepl broken, and I can't update anything on the server



All,

In the interests of sharing my stupidity: I finally fixed this one.

The clue was in this line.

> Aug 19 04:23:11 rtp-ldap-server-dev-1 slapd[27609]: syncrepl_entry: rid=005 be_add cn={1}DUAConfigProfile,cn=schema,cn=config failed (53) 
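
For anyone finding this in the archives: the "0x35" in the null_callback line and the "(53)" in the be_add line are the same LDAP result code, unwillingToPerform. Below is a rough sketch of how such lines could be picked out of a log and decoded. The function name, the regular expression and the small code table are my own for illustration, not anything slapd provides.

```python
import re

# Standard LDAP result codes (RFC 4511) that appear in this thread's logs.
LDAP_RESULT_NAMES = {
    0: "success",
    49: "invalidCredentials",
    53: "unwillingToPerform",
}

# Matches the "be_add ... failed (NN)" lines that slapd logged here.
SYNCREPL_FAIL = re.compile(
    r"syncrepl_entry: rid=(?P<rid>\d+) be_add (?P<dn>\S+) failed \((?P<code>\d+)\)"
)


def explain_syncrepl_failure(line):
    """Return rid, DN, numeric code and code name for a failed be_add line.

    Returns None when the line is not a syncrepl be_add failure.
    """
    match = SYNCREPL_FAIL.search(line)
    if match is None:
        return None
    code = int(match.group("code"))
    return {
        "rid": match.group("rid"),
        "dn": match.group("dn"),
        "code": code,
        "meaning": LDAP_RESULT_NAMES.get(code, "unknown"),
    }


sample = (
    "Aug 19 04:23:11 rtp-ldap-server-dev-1 slapd[27609]: syncrepl_entry: "
    "rid=005 be_add cn={1}DUAConfigProfile,cn=schema,cn=config failed (53)"
)
print(explain_syncrepl_failure(sample))
```

The DN in the decoded result is the useful part: it names the exact entry the consumer refused to add.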

When I actually looked on the server, I found it had previously been connected to another test machine, and there was already a schema entry called cn={1}nis.

What I hadn't realised, and what seems to be the behaviour, is that syncrepl won't renumber the cn={N} entries. It couldn't simply move on and create cn={2}DUAConfigProfile,cn=schema,cn=config instead; the add failed because cn={1} was already taken.
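
A minimal sketch of the check I wish I had done first: compare the schema DNs on the two servers and flag any index that is used for a different schema on each side. The helper names are hypothetical, and the DN lists are illustrative; only cn={1}nis and cn={1}DUAConfigProfile are taken from my servers. In practice the lists would come from an ldapsearch of cn=schema,cn=config on each machine.

```python
import re

# An ordered schema RDN looks like "cn={1}nis": an index in braces, then a name.
ORDERED_RDN = re.compile(r"^cn=\{(\d+)\}([^,]+)")


def parse_schema_dn(dn):
    """Split 'cn={1}nis,cn=schema,cn=config' into (1, 'nis')."""
    match = ORDERED_RDN.match(dn)
    if match is None:
        raise ValueError("not an ordered schema DN: %r" % dn)
    return int(match.group(1)), match.group(2)


def find_index_conflicts(local_dns, provider_dns):
    """List (index, local_name, provider_name) where the names disagree."""
    local_by_index = dict(parse_schema_dn(dn) for dn in local_dns)
    conflicts = []
    for dn in provider_dns:
        index, name = parse_schema_dn(dn)
        local_name = local_by_index.get(index)
        if local_name is not None and local_name.lower() != name.lower():
            conflicts.append((index, local_name, name))
    return conflicts


# Illustrative data: the {0}core entries are assumed, the {1} entries are real.
consumer = ["cn={0}core,cn=schema,cn=config", "cn={1}nis,cn=schema,cn=config"]
provider = [
    "cn={0}core,cn=schema,cn=config",
    "cn={1}DUAConfigProfile,cn=schema,cn=config",
]
print(find_index_conflicts(consumer, provider))
```

Any tuple in the result is an entry that syncrepl would try to add on top of an existing one, which is what I was seeing.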

Solution: I stopped slapd on rtp-1, manually deleted the ldif files from etc/openldap/slapd.d/cn=config/cn=schema, and restarted slapd with 'slapd -c rid=001 -c rid=005'.
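
With hindsight, moving those ldif files aside would have been safer than deleting them outright. Here is a rough sketch of that variation, to be run only while slapd is stopped. The function name and the backup location are my own assumptions, and the demo deliberately works on a throwaway directory rather than a real slapd.d tree.

```python
import shutil
import tempfile
from pathlib import Path


def set_aside_schema_ldifs(schema_dir, backup_dir):
    """Move every *.ldif file from schema_dir into backup_dir.

    Returns the sorted list of file names that were moved.
    """
    schema_dir = Path(schema_dir)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for ldif in sorted(schema_dir.glob("*.ldif")):
        shutil.move(str(ldif), str(backup_dir / ldif.name))
        moved.append(ldif.name)
    return moved


# Demo against a temporary directory that mimics the cn=schema layout.
with tempfile.TemporaryDirectory() as tmp:
    schema = Path(tmp) / "cn=config" / "cn=schema"
    schema.mkdir(parents=True)
    (schema / "cn={1}nis.ldif").write_text("dn: cn={1}nis\n")
    print(set_aside_schema_ldifs(schema, Path(tmp) / "backup"))
```

If the resync then goes wrong, the old files are still there to be moved back.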

Alister

On 19 Aug 2011, at 13:52, Alister Forbes wrote:

> Dmitriy,
> Thanks very much. I made the changes you suggested, then restarted slapd on both machines with -c, and finally things are up and sort of running.
> 
> (which is a lot further on than I was)
> 
> Now, if I change something in cn=config on one server, it's replicated to the other.
> 
> This only seems to affect new changes, though.  On rtp-1 I have only 3 schemas declared, whereas on bru-1 there are 12.
> 
> In the rtp-1 logs, I see this repeated over and over:
> 
> Aug 19 04:23:10 rtp-ldap-server-dev-1 slapd[27609]: conn=1058 fd=9 ACCEPT from IP=144.254.10.212:58653 (IP=0.0.0.0:389) 
> Aug 19 04:23:10 rtp-ldap-server-dev-1 slapd[27609]: conn=1058 op=0 BIND dn="cn=admin,cn=config" method=128 
> Aug 19 04:23:10 rtp-ldap-server-dev-1 slapd[27609]: conn=1058 op=0 BIND dn="cn=admin,cn=config" mech=SIMPLE ssf=0 
> Aug 19 04:23:10 rtp-ldap-server-dev-1 slapd[27609]: conn=1058 op=0 RESULT tag=97 err=0 text= 
> Aug 19 04:23:10 rtp-ldap-server-dev-1 slapd[27609]: conn=1058 op=1 SRCH base="cn=config" scope=2 deref=0 filter="(objectClass=*)" 
> Aug 19 04:23:10 rtp-ldap-server-dev-1 slapd[27609]: conn=1058 op=1 SRCH attr=* + 
> Aug 19 04:23:10 rtp-ldap-server-dev-1 slapd[27609]: conn=1058 op=1 SEARCH RESULT tag=101 err=0 nentries=0 text= 
> Aug 19 04:23:10 rtp-ldap-server-dev-1 slapd[27609]: conn=1058 op=2 UNBIND 
> Aug 19 04:23:10 rtp-ldap-server-dev-1 slapd[27609]: conn=1058 fd=9 closed 
> Aug 19 04:23:11 rtp-ldap-server-dev-1 slapd[27609]: null_callback : error code 0x35 
> Aug 19 04:23:11 rtp-ldap-server-dev-1 slapd[27609]: syncrepl_entry: rid=005 be_add cn={1}DUAConfigProfile,cn=schema,cn=config failed (53) 
> Aug 19 04:23:11 rtp-ldap-server-dev-1 slapd[27609]: do_syncrepl: rid=005 rc 53 retrying (4 retries left) 
> 
> 
> bru-1 is a Solaris 10 box, and I'm wondering if that might have something to do with it.  If I run an ldapsearch from rtp-1, everything works fine...
> 
> ldapsearch -x -H ldap://bru-1.cisco.com  -b 'cn=config' -D 'cn=admin,cn=config' -W -s base olcServerID
> 
> but if I try from the Solaris box, I need to change the search parameters...
> 
> ldapsearch -b "cn=config" -h rtp-ldap-server-dev-1.cisco.com  -D 'cn=admin,cn=config' -W -s base "olcServerID=*"
> 
> 
> Thinking about it logically, I don't see how that can be the problem, as when I make a change on either machine, it is replicated across properly.  But at the moment, I'm at my wits' end.  Could anyone point me in the right direction please?  (I'm perfectly willing to accept, "you idiot, look at this bit of the manual")
> 
> Thanks very much
> Alister
> 
> 
> 
> On 16 Aug 2011, at 20:39, Dmitriy Kirhlarov wrote:
> 
>> 
>> 
>> 16.08.2011 16:18, Alister Forbes wrote:
>>> All,
>>> 
>>> I have two servers, bru-1 and rtp-1
>>> 
>>> At one point I had cn=config working properly, and somehow managed to mess that up.
>>> 
>>> The situation I'm in now is that syncing between the two machines doesn't work, and I can't make any changes to the configs.
>>> 
>>> There are no special configurations, no SASL, or Kerberos, just plain passwords.  I've been through the Server guide, and hopefully I'm just missing something, but I can't seem to find any indication of a way to solve my problem.
>>> 
>>> bru-1 is running Solaris 10, and rtp-1 is running RHEL 5, both with hand-compiled OpenLDAP 2.4.23
>>> 
>>> bru-1:
>>> 
>>> dn: cn=config
>>> olcServerID: 1 ldap://rtp-1.cisco.com
>>> olcServerID: 5 ldap://bru-1.cisco.com
>>> 
>>> # {0}config, config
>>> dn: olcDatabase={0}config,cn=config
>>> olcSyncrepl: {0}rid=005 provider=ldap://bru-1.cisco.com binddn
>>> ="cn=admin,cn=config" bindmethod=simple credentials="testpass" searchbase="cn
>>> =config" type=refreshAndPersist retry="5 5 300 5" timeout=3
>> olcSyncrepl: {1}rid=001 provider=ldap://rtp-1.cisco.com binddn="cn=admin,cn=config" bindmethod=simple credentials="testpass" searchbase="cn=config" type=refreshAndPersist retry="5 5 300 5" timeout=3
>> olcMirrorMode: TRUE
>> 
>>> rtp-1
>>> 
>>> dn: cn=config
>>> olcServerID: 1 ldap://rtp-1.cisco.com
>>> olcServerID: 5 ldap://bru-1.cisco.com
>>> 
>>> # {0}config, config
>>> dn: olcDatabase={0}config,cn=config
>>> olcSyncrepl: {0}rid=005 provider=ldap://bru-1.cisco.com binddn
>>> ="cn=admin,cn=config" bindmethod=simple credentials="testpass" searchbase="cn
>>> =config" type=refreshAndPersist retry="5 5 300 5" timeout=3
>> olcSyncrepl: {1}rid=001 provider=ldap://rtp-1.cisco.com binddn="cn=admin,cn=config" bindmethod=simple credentials="testpass" searchbase="cn=config" type=refreshAndPersist retry="5 5 300 5" timeout=3
>> olcMirrorMode: TRUE
>> 
>> don't forget olcOverlay={0}syncprov,olcDatabase={0}config,cn=config
>> 
>> and correct A and PTR DNS-records
>> 
>> WBR
>> 
>>> 
>>> I'm having to run them with olcMirrorMode set to FALSE at the moment, because if I try to start up bru-1 with mirror mode enabled, it crashes out.
>>> 
>>> Aug 16 13:35:32 bru-1.cisco.com slapd[14591]: [ID 600618 local4.debug] olcServerID: value #2: SID=0x005 (listener=ldap:///)
>>> Aug 16 13:35:33 bru-1.cisco.com slapd[14591]: [ID 309573 local4.debug] olcSyncrepl: value #0: syncrepl will eventually stop retrying; the "retry" parameter should end with a '+'.
>>> Aug 16 13:35:33 bru-1.cisco.com slapd[14591]: [ID 942748 local4.debug] Config: ** successfully added syncrepl rid=005 "ldap://bru-1.cisco.com";
>>> Aug 16 13:35:33 bru-1.cisco.com slapd[14591]: [ID 801593 local4.debug] olcMirrorMode: value #0:<olcMirrorMode>  database is not a shadow
>>> Aug 16 13:35:33 bru-1.cisco.com slapd[14591]: [ID 183426 local4.debug] config error processing olcDatabase={0}config,cn=config:<olcMirrorMode>  database is not a shadow
>>> Aug 16 13:35:33 bru-1.cisco.com slapd[14591]: [ID 486161 local4.debug] slapd stopped.
>>> Aug 16 13:35:33 bru-1.cisco.com slapd[14591]: [ID 432338 local4.debug] connections_destroy: nothing to destroy.
>>> 
>>> Can anyone give me a pointer in the right direction please?  Even just how to get my database back to being a shadow so I can work on the replication problem later.  I realise it could all be linked, so this is the sort of log error I'm seeing when I try running.  It looks almost like a password problem, but that doesn't make any sense, as I can do searches on both machines with ldapsearch, or even phpldapadmin.
>>> 
>>> Bru-1:
>>> 
>>> Aug 16 13:39:51 bru-1.cisco.com slapd[14662]: [ID 445809 local4.debug] do_syncrepl: rid=008 rc -1 retrying (4 retries left)
>>> Aug 16 13:39:51 bru-1.cisco.com slapd[14662]: [ID 445809 local4.debug] do_syncrepl: rid=007 rc -1 retrying (4 retries left)
>>> Aug 16 13:39:51 bru-1.cisco.com slapd[14662]: [ID 319573 local4.debug] slap_client_connect: URI=ldap://rtp-1.cisco.com DN="cn=root,dc=ca" ldap_sasl_bind_s failed (49)
>>> Aug 16 13:39:51 bru-1.cisco.com slapd[14662]: [ID 445809 local4.debug] do_syncrepl: rid=006 rc 49 retrying (4 retries left)
>>> 
>>> rtp-1:
>>> 
>>> Aug 16 05:16:11 rtp-1 slapd[19928]: syncrepl_entry: rid=005 LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_ADD)
>>> Aug 16 05:16:11 rtp-1 slapd[19928]: syncrepl_entry: rid=005 be_search (0)
>>> Aug 16 05:16:11 rtp-1 slapd[19928]: syncrepl_entry: rid=005 cn={1}DUAConfigProfile,cn=schema,cn=config
>>> Aug 16 05:16:11 rtp-1 slapd[19928]: null_callback : error code 0x35
>>> Aug 16 05:16:11 rtp-1 slapd[19928]: syncrepl_entry: rid=005 be_add cn={1}DUAConfigProfile,cn=schema,cn=config (53)
>>> Aug 16 05:16:11 rtp-1 slapd[19928]: syncrepl_entry: rid=005 be_add cn={1}DUAConfigProfile,cn=schema,cn=config failed (53)
>>> Aug 16 05:16:11 rtp-1 slapd[19928]: do_syncrepl: rid=005 rc 53 retrying (4 retries left)
>>> 
>>> Any clues or hints, would be greatly appreciated
>>> --
>>> Alister Forbes      TACSUNS             _.|._.|._ Cisco Systems
>>> 
>>> Please avoid sending me Word or PowerPoint attachments. See -
>>> http://www.gnu.org/philosophy/no-word-attachments.html
>>> 
>>> 
>> 
> 
> 
> 

--
Alister Forbes      Work:   +32 2 704 5762    Internal: 322 5762
a@cisco.com    TACSUNS             _.|._.|._ Cisco Systems

Please avoid sending me Word or PowerPoint attachments. See -
http://www.gnu.org/philosophy/no-word-attachments.html