Dear OpenLDAP guys.
First of all, I apologize for my English. I would like your opinions
and suggestions about my recent work on OpenLDAP, which we call
syncbackup. Its goal is to provide high availability of the master
server while keeping the database consistent (an experimental
implementation is almost done).
To achieve this kind of HA, syncbackup propagates LDAP update
operations such as ldap_modify, ldap_add, ldap_delete and ldap_rename
to other LDAP servers before committing the changes locally.
Client -- modify(0) --> | |
| | --- modify(0) ---> |
| | |
| | <--- SUCCESS ----- |
| <---- SUCCESS ----- |
Only if the master slapd server gets success results from the other
backup servers, as well as a successful local commit, does it return
success to the
client. So, if the master returns success, at least two servers have the
If the master server hangs, a backup server can become the new master
server because all changes to the original master server have already
been propagated to it.
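
The basic flow described above can be sketched as follows. This is only
an illustration, not the syncbackup code: the Server class, its apply()
method and master_modify() are hypothetical in-memory stand-ins for
slapd instances and LDAP operations.

```python
class Server:
    """Hypothetical in-memory stand-in for a slapd instance."""

    def __init__(self, name, alive=True):
        self.name = name
        self.alive = alive
        self.entries = {}

    def apply(self, dn, attrs):
        """Apply a modification; return True on success."""
        if not self.alive:
            # A real hung server would not answer at all; the sketch
            # reports failure so that the example terminates.
            return False
        self.entries[dn] = dict(attrs)
        return True


def master_modify(master, backup, dn, attrs):
    """Propagate to the backup first, then commit locally.

    Success is returned only if both steps succeed.
    """
    if not backup.apply(dn, attrs):
        return False
    return master.apply(dn, attrs)


master, backup = Server("master"), Server("backup")
ok = master_modify(master, backup, "cn=Test,o=Example", {"cn": "Test"})
print(ok, master.entries == backup.entries)
```

Because the backup is updated before the local commit, a success seen by
the client implies that the backup already holds the change.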
This is a simple example and has many problems.
First, in this example, if the backup server hangs, the master server
can't get a response from it. So if either the master or the backup
hangs, the client can't get a result.
Second, when the original master server is restarted, how does it
catch up with the latest directory contents?
To solve these problems, we came up with the following ideas:
*) At least backup
*) Boot-time synchronization using syncrepl()
* At least backup
This is a way to propagate an operation to several backup servers
at the same time and wait for at least one success result.
|Master| <--- SUCCESS ------------+
| | |
| +------------------+ |
| | |
propagation propagation |
XXXXX | |
XhangupX \ / |
XXXXX +------+ |
+------+ |Backup| +---+
So even if one backup server hangs or is delayed for some reason,
such as network trouble, the master server can return success to
the client as soon as possible.
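
A minimal sketch of the "at least backup" idea, assuming each
propagation can run in its own thread. The propagate() function and the
(delay, succeeds) description of a backup are hypothetical and only
simulate network behaviour; they are not part of the implementation.

```python
import concurrent.futures
import time


def propagate(backup_name, delay, ok=True):
    """Hypothetical stand-in for sending one operation to one backup."""
    time.sleep(delay)
    return backup_name, ok


def wait_for_first_success(backups, timeout=5.0):
    """Send to all backups at once; return the first one that succeeds.

    `backups` maps a backup name to (delay_seconds, succeeds).
    Returns None if no backup succeeds before the timeout.
    """
    pool = concurrent.futures.ThreadPoolExecutor(
        max_workers=max(1, len(backups)))
    futures = [pool.submit(propagate, name, delay, ok)
               for name, (delay, ok) in backups.items()]
    winner = None
    try:
        for fut in concurrent.futures.as_completed(futures, timeout=timeout):
            name, ok = fut.result()
            if ok:
                winner = name
                break
    except concurrent.futures.TimeoutError:
        pass
    finally:
        pool.shutdown(wait=False)  # do not block on the slower backups
    return winner


winner = wait_for_first_success({"hoge": (0.01, True), "hige": (0.5, True)})
print(winner)
```

The master does not wait for the slower backup here; it only needs one
success before it answers the client.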
* Boot-time synchronization using syncrepl()
When a backup server takes over the master service, it fetches
the changes from the other servers by using one-time
+------+ +-----> |Backup|
|Backup| ---+ +------+
The new master should then be as up to date as any of the available
servers.
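
As a rough illustration of the catch-up step, the sketch below merges
the entries of the reachable servers and keeps the newest version of
each entry. It does not use syncrepl itself. The data layout (a dict
from DN to a (csn, attrs) pair), the string comparison of CSN values
and the catch_up() name are all assumptions of this sketch, and
deletions are ignored.

```python
def catch_up(local, peers):
    """Merge peer entries into a copy of `local`, newest CSN per DN wins.

    Each replica is a dict: dn -> (csn, attrs). CSNs are compared as
    plain strings, which is an assumption of this sketch.
    """
    merged = dict(local)
    for peer in peers:
        for dn, (csn, attrs) in peer.items():
            if dn not in merged or csn > merged[dn][0]:
                merged[dn] = (csn, attrs)
    return merged


new_master = {"cn=a": ("001", {"cn": "a"})}
peer1 = {"cn=a": ("002", {"cn": "a", "sn": "x"})}
peer2 = {"cn=b": ("003", {"cn": "b"})}
merged = catch_up(new_master, [peer1, peer2])
print(sorted(merged))
```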
These are the other extensions used to construct syncbackup:
* startSYNC extended operation
This is an LDAPv3 extended operation to start syncbackup.
It sends a syncCookie and a syncid; the master server then
attaches it to the appropriate list and propagates
LDAP operations to the backup server.
|Master| ---- propagation +
/ \ |
startsync operation |
| | Backup server
| | +------+ |
| +---------+ | Core | |
| |startSYNC| +------+ |
| +---------+ |
If the backup server returns an error, the master sends the
SYNC_ERROR response to the startSYNC operation and detaches the
If the master server exits, it sends the SYNC_DONE
response to the startSYNC operation.
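
The lifecycle of a startSYNC session might be modelled roughly as
below. SYNC_ERROR and SYNC_DONE are the response names from this mail;
the SyncMaster class, its methods, the send callback and the choice to
detach the failing backup are hypothetical and only illustrate the
attach/detach behaviour.

```python
class SyncMaster:
    """Hypothetical model of the master's list of attached backups."""

    def __init__(self):
        self.attached = {}    # backup name -> (syncCookie, syncid)
        self.responses = []   # (backup name, response) pairs sent so far

    def start_sync(self, backup, sync_cookie, syncid):
        """Handle startSYNC: attach the backup so it receives operations."""
        self.attached[backup] = (sync_cookie, syncid)

    def propagate(self, op, send):
        """Send `op` to every attached backup via `send(backup, op)`.

        A backup whose send fails gets SYNC_ERROR and is detached.
        Returns the names of the backups that succeeded.
        """
        succeeded = []
        for backup in list(self.attached):
            if send(backup, op):
                succeeded.append(backup)
            else:
                self.responses.append((backup, "SYNC_ERROR"))
                del self.attached[backup]
        return succeeded

    def shutdown(self):
        """On master exit, answer every open startSYNC with SYNC_DONE."""
        for backup in self.attached:
            self.responses.append((backup, "SYNC_DONE"))
        self.attached.clear()


master = SyncMaster()
master.start_sync("hoge", "cookie-1", 1)
master.start_sync("hige", "cookie-2", 2)
ok = master.propagate("add cn=Test", lambda backup, op: backup != "hige")
master.shutdown()
print(ok, master.responses)
```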
* NOOP extended operation
This is an LDAPv3 extended operation to check whether the server
is working. If the server gets the operation, it returns just
* contextCSN LDAP control
This is an LDAPv3 control. When the master propagates an
operation, it attaches the appropriate contextCSN value to the
operation. If the backup server gets the contextCSN control,
it uses that value rather than calling lutil_csnstr().
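
The backup-side choice can be sketched as below. The dict of controls,
choose_csn() and generate_local_csn() are hypothetical; the latter only
stands in for local CSN generation (lutil_csnstr() in slapd) and does
not produce a real CSN string.

```python
import itertools

_counter = itertools.count()


def generate_local_csn():
    """Placeholder for local CSN generation; not a real CSN format."""
    return "local-%06d" % next(_counter)


def choose_csn(controls):
    """Return the CSN to record for an incoming operation.

    If the master attached a contextCSN control, reuse its value so the
    backup records the same CSN as the master; otherwise generate one.
    """
    csn = controls.get("contextCSN")
    return csn if csn is not None else generate_local_csn()


print(choose_csn({"contextCSN": "csn-from-master"}))
```

Reusing the master's value keeps the CSN of a change identical on every
server that applied it.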
* Execution example from log message
These are from the log of my experimental implementation:
The following log shows that the operation is propagated to the two
backup servers "hoge" and "hige" and these servers become
conn=6 op=1 ADD dn="o=University of Michigan,c=US"
conn=6 op=1 BACKUP sync=1 backup=hige
conn=6 op=1 BACKUP sync=1 backup=hoge
conn=6 op=1 RESULT tag=105 err=0 text=
The next log shows that the operation conn=8, op=142 is first
propagated to backup "hoge" at (1), and success is returned to the
client at (2). The backup "hige" is delayed and becomes up to date
at (4).
conn=8 op=142 ADD dn="cn=Test141,ou=People,o=University of M..
conn=8 op=74 BACKUP sync=0 backup=hige
conn=8 op=75 BACKUP sync=0 backup=hige
(1) conn=8 op=142 BACKUP sync=1 backup=hoge
conn=8 op=76 BACKUP sync=0 backup=hige
conn=8 op=143 ADD dn="cn=Test142,ou=People,o=University of M..
(2) conn=8 op=142 RESULT tag=105 err=0 text=
(3) < Suspending "hoge" .... >
conn=8 op=77 BACKUP sync=0 backup=hige
conn=8 op=78 BACKUP sync=0 backup=hige
conn=8 op=79 BACKUP sync=0 backup=hige
conn=8 op=141 BACKUP sync=0 backup=hige
(4) conn=8 op=142 BACKUP sync=1 backup=hige
(5) conn=8 op=143 BACKUP sync=1 backup=hige
(6) conn=8 op=143 RESULT tag=105 err=0 text=
Because "hoge" is suspended at (3), the master server can't get
a result from either "hoge" or "hige". So the master waits for
"hige" to become up to date, which happens at (4). After that, the
master sends operation 143 to "hige" and gets the result at
(5). Because the master has then received one success result, it
sends RESULT err=0 to the client at (6).
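
To follow such a log mechanically, a small parser like the one below
could list which backups reported sync=1 for a given operation. It only
assumes the BACKUP line layout shown in the excerpts above; the
synced_backups() helper is hypothetical and not part of the patch.

```python
import re

# Matches lines such as: conn=8 op=142 BACKUP sync=1 backup=hoge
BACKUP_LINE = re.compile(
    r"conn=(\d+) op=(\d+) BACKUP sync=(\d+) backup=(\S+)")


def synced_backups(log_text, conn, op):
    """Return backups logged with sync=1 for (conn, op), in log order."""
    found = []
    for line in log_text.splitlines():
        m = BACKUP_LINE.search(line)
        if m is None:
            continue
        key = (int(m.group(1)), int(m.group(2)), m.group(3))
        if key == (conn, op, "1"):
            found.append(m.group(4))
    return found


SAMPLE = """\
conn=8 op=74 BACKUP sync=0 backup=hige
(1) conn=8 op=142 BACKUP sync=1 backup=hoge
(4) conn=8 op=142 BACKUP sync=1 backup=hige
(5) conn=8 op=143 BACKUP sync=1 backup=hige
"""
print(synced_backups(SAMPLE, 8, 142))
```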
Although the implementation still has some bugs, I can use this
replication engine in my OpenLDAP environment. I still need to think
carefully about the following topics:
* considerations about rfc3389.
* multi-master environment.
* any others?
I would welcome any opinions, suggestions or comments. If there is no
major problem, I'd like to merge this feature into OpenLDAP itself. We
will release this little big patch under the same terms as the
OpenLDAP Public
Thanks for your excellent efforts and also for reading this mail.
Masato Taruishi <email@example.com>