
Re: syncrepl catchup (ITS#2713)



>Hello,

>The syncrepl process implemented in 2.2, makes from what I can tell, a bad
>assumption.

>It assumes the DB on the replica is empty, meaning that it wants to send
>everything it has in its DB to the replica.  Since we pre-load the replicas with
>a dump of the master's DB, there is no need for this to be done.  What would be
>better, is for the master to generate a tag of some sort, marking a starting
>point (probably 0?) that gets incremented as it receives updates, from the time
>its DB was created.  That way, if I dump the DB after it has gotten some updates

Syncrepl uses cookies for that purpose. The contextCSN at the sync provider
is maintained so that it represents the current state of the provider replica
(this is far from trivial in a transactional environment).
The syncreplCookie at the sync consumer is the largest cookie value it has
received from the provider. When the consumer connects to the provider, it always
includes its syncreplCookie in the request so that it receives only the entries
changed after the corresponding state.
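
The cookie exchange described above can be sketched as a toy model. This is
only an illustration, not slapd's implementation: the ToyProvider class, its
method names, and the simplified CSN strings (chosen so that plain string
comparison matches chronological order) are all assumptions made for this
example, and deletions are not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class ToyProvider:
    # Maps DN -> entryCSN. CSNs are simplified strings (an assumption of this
    # sketch) that sort in chronological order.
    entries: dict = field(default_factory=dict)

    @property
    def context_csn(self):
        # Largest entryCSN committed so far ("" for an empty database).
        return max(self.entries.values(), default="")

    def write(self, dn, csn):
        self.entries[dn] = csn

    def search_since(self, cookie):
        # Return the DNs changed after the state named by the cookie,
        # plus the new cookie the consumer should store.
        changed = sorted(dn for dn, csn in self.entries.items() if csn > cookie)
        return changed, self.context_csn


provider = ToyProvider()
provider.write("cn=a,dc=example,dc=com", "20030915100000Z#01")
provider.write("cn=b,dc=example,dc=com", "20030915100500Z#01")

# A consumer with no cookie receives everything (a full reload).
full, cookie = provider.search_since("")

# Once the consumer presents its stored cookie, only later changes are sent.
provider.write("cn=c,dc=example,dc=com", "20030915110000Z#01")
delta, cookie2 = provider.search_since(cookie)
```

In this model a consumer that was pre-loaded together with a valid cookie
behaves like the second call: it is sent only the entries written after the
state the cookie names.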

>(so its tag is now 500, lets say), and I load that into a new replica, and bring
>it up, the master will only catch it up from change #500 onward.  I have a
>feeling this has to be significantly faster than sending our entire 450MB DB
>across the network to the replica, as well.

If you populate the consumer with a dump of the consumer's DB
(not of the provider's), incremental synchronization will take place.
The syncreplCookie can be included in the consumer's dump by using slapcat -k.
Loading this dump with slapadd creates the consumerSyncSubentry,
which contains the syncreplCookie.

When you start from the provider's dump, the contextCSN can be included
in the dump by retrieving the providerSyncSubentry (use slapcat -m).
If you want to load this dump directly into the sync consumer, the current
workaround is to change the providerSyncSubentry into the corresponding
consumerSyncSubentry with the same cookie value.
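
A rough sketch of that workaround, applied to one already-parsed LDIF record,
might look like the following. Everything named here is an assumption for
illustration: the helper to_consumer_subentry, the RDNs cn=ldapsync and
cn=syncrepl1, and the use of providerSyncSubentry / consumerSyncSubentry as
object class values are placeholders, so check the names that actually appear
in your slapcat output before adapting it.

```python
def to_consumer_subentry(record, names):
    """Rewrite one parsed LDIF record, given as (attribute, value) pairs.

    `names` holds the placeholder identifiers to swap; none of them are
    taken from slapd itself.
    """
    out = []
    for attr, value in record:
        key = attr.lower()
        if key == "dn":
            # Swap only the leading RDN and keep the suffix. This naive
            # split assumes the first RDN contains no escaped commas.
            _, _, suffix = value.partition(",")
            value = names["consumer_rdn"] + "," + suffix
        elif key == "cn":
            value = names["consumer_rdn"].split("=", 1)[1]
        elif key == "objectclass" and value.lower() == names["provider_class"].lower():
            value = names["consumer_class"]
        elif key == names["provider_attr"].lower():
            # Same cookie value, stored under the consumer's attribute name.
            attr = names["consumer_attr"]
        out.append((attr, value))
    return out


# Hypothetical names; verify them against a real dump.
names = {
    "consumer_rdn": "cn=syncrepl1",
    "provider_class": "providerSyncSubentry",
    "consumer_class": "consumerSyncSubentry",
    "provider_attr": "contextCSN",
    "consumer_attr": "syncreplCookie",
}

provider_record = [
    ("dn", "cn=ldapsync,dc=example,dc=com"),
    ("objectClass", "top"),
    ("objectClass", "providerSyncSubentry"),
    ("cn", "ldapsync"),
    ("contextCSN", "20030915110000Z#01"),
]

consumer_record = to_consumer_subentry(provider_record, names)
```

The value carried by contextCSN is copied unchanged; only the entry's name,
object class, and attribute name are rewritten.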

Currently, the providerSyncSubentry can be added automatically
when using slapadd (-w option).
Similar functionality could be supported for the consumerSyncSubentry,
under the assumption that each such dump covers only one syncrepl search
specification.
It would then be possible to populate the directory from any (provider or
consumer) DB dump in which every entry includes its entryCSN attribute.

------------------------
Jong Hyuk Choi
IBM Thomas J. Watson Research Center - Enterprise Linux Group
P. O. Box 218, Yorktown Heights, NY 10598
email: jongchoi@us.ibm.com
(phone) 914-945-3979    (fax) 914-945-4425   TL: 862-3979