
Re: OL 2.3.18 syncrepl vs slurpd





Howard Chu wrote:
Francis Swasey wrote:
Folks,
I'm attempting to convert from slurpd to syncrepl. However, my testing is leading me to the firm belief that syncrepl is hopelessly unable to keep up.


In my test setup I loaded a 48,819-entry LDIF using slapadd -q -w on the master and slapadd -q on the replica. I then perform 12,654 modrdns, 56 modifies, and 961 delete/add operations in rapid succession.

Did you verify that the syncrepl consumer was actually idle before you started your tests? syncrepl requires a contextCSN attribute to be present on both the provider and on the consumer. The "-w" option to slapadd causes the contextCSN attribute to be written, so that means your provider's database was immediately usable. But then you need to copy that value over to the consumer. If the LDIF file that you slapadd'd on the consumer came from slapcat'ing the provider, then you're all set, because it contains all the operational attributes, including the contextCSN attribute. But if you slapadd'd a plain input LDIF file on the consumer, then it had no contextCSN attribute, and so it would have to suck the entire database down from the provider before it considered itself sync'd up.
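One way to check this prerequisite before starting a test is to compare the contextCSN values in the LDIF loaded on each side. The following is only a minimal sketch: the function names and the sample LDIF (including the CSN value) are made up for illustration, and the line scan assumes the contextCSN value is neither base64-encoded nor folded across lines.

```python
def extract_context_csn(ldif_text):
    """Return all contextCSN values found in an LDIF dump.

    Simplifying assumption: values are plain text on a single line
    (no "contextCSN::" base64 form, no LDIF line folding).
    """
    values = []
    for line in ldif_text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "contextcsn":
            values.append(value.strip())
    return values


def same_starting_point(provider_ldif, consumer_ldif):
    """True if both dumps carry the same, non-empty set of contextCSN values."""
    provider = extract_context_csn(provider_ldif)
    consumer = extract_context_csn(consumer_ldif)
    return bool(provider) and sorted(provider) == sorted(consumer)


# Hypothetical sample data; the CSN value below is invented.
SLAPCAT_LDIF = """dn: dc=example,dc=com
objectClass: dcObject
objectClass: organization
dc: example
o: Example
contextCSN: 20070101120000Z#000000#00#000000
"""

PLAIN_LDIF = """dn: dc=example,dc=com
objectClass: dcObject
objectClass: organization
dc: example
o: Example
"""

if __name__ == "__main__":
    print(same_starting_point(SLAPCAT_LDIF, SLAPCAT_LDIF))
    print(same_starting_point(SLAPCAT_LDIF, PLAIN_LDIF))
```

If the second comparison is the one that matches your setup, the consumer has no contextCSN and will have to pull the whole database from the provider before it is in sync.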

I believe that I verified the syncrepl consumer was idle. I set the loglevel on the consumer to 16640 (256 stats + 16384 LDAPSync); the syslog was quiet for several minutes, and no timestamps in the database directory changed, before I started the test.
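The loglevel arithmetic above is a bitmask sum, and it can be sanity-checked with a few lines. This is just an illustrative sketch: the two numeric values are the ones quoted in this message, while the dictionary and function names are my own.

```python
# Numeric loglevel bits as quoted above; names here are illustrative labels.
LOG_LEVELS = {
    "stats": 256,
    "sync": 16384,
}


def combine(*names):
    """OR together the bits for the named log levels."""
    mask = 0
    for name in names:
        mask |= LOG_LEVELS[name]
    return mask


def decode(mask):
    """Return the known level names whose bit is set in mask, sorted by bit value."""
    return [name for name, bit in sorted(LOG_LEVELS.items(), key=lambda kv: kv[1])
            if mask & bit]


if __name__ == "__main__":
    print(combine("stats", "sync"))
    print(decode(16640))
```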


The LDIF that I loaded on both the syncrepl provider and the syncrepl consumer was generated by a slapcat on the provider, after the original plain LDIF file had been loaded with the -w flag to generate the contextCSN attribute. I have verified that the contextCSN is in the LDIF that was loaded. However, since I used the -w flag on the slapadd, would that have regenerated a (possibly) different contextCSN value in the syncrepl provider's database?


With that prerequisite aside, it's well understood that syncrepl is slower than slurpd for a number of reasons. Since syncrepl sends whole

Ah, it wasn't well understood by me that it was designed to be slower.

entries rather than just modifications, it uses a lot more network bandwidth than slurpd. It also causes a lot more database update activity on the consumers. We can take steps to make some of the database activity more efficient, but the network load is still an issue. That's why Symas developed the delta-syncrepl mode of operation, which uses the accesslog data format to propagate modifications instead of whole entries. Of course, delta-syncrepl has its own performance cost since it serializes write operations. (The serialization is two-phase, so you can have two writes in progress at a time.) There's an upside and a downside to this: the downside is that serialization limits the number of simultaneous write operations; the upside is that you generally get zero database deadlocks this way, so every modification completes much faster.
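To get a rough feel for the bandwidth difference between sending a whole entry and sending only a modification, one can compare the sizes of the two as LDIF text. This is a sketch under loud assumptions: the entry and the change record are invented, and LDIF byte counts are only a proxy for what actually goes over the wire (the real protocol messages are encoded differently, and accesslog records carry additional attributes).

```python
def ldif_bytes(lines):
    """Size in bytes of an LDIF record given as a list of lines,
    counting one newline per line."""
    return sum(len(line.encode("utf-8")) + 1 for line in lines)


# Hypothetical entry, invented for this illustration.
ENTRY = [
    "dn: uid=jdoe,ou=People,dc=example,dc=com",
    "objectClass: inetOrgPerson",
    "uid: jdoe",
    "cn: John Doe",
    "sn: Doe",
    "mail: jdoe@example.com",
    "telephoneNumber: +1 555 0100",
    "description: " + "x" * 200,
]

# Hypothetical change record replacing a single attribute of that entry.
MODIFICATION = [
    "dn: uid=jdoe,ou=People,dc=example,dc=com",
    "changetype: modify",
    "replace: mail",
    "mail: john.doe@example.com",
    "-",
]


def compare(entry, modification):
    """Return (whole_entry_bytes, modification_bytes) for the two records."""
    return ldif_bytes(entry), ldif_bytes(modification)


if __name__ == "__main__":
    whole, delta = compare(ENTRY, MODIFICATION)
    print("whole entry:", whole, "bytes; modification only:", delta, "bytes")
```

The gap grows with entry size: a one-attribute change costs roughly the same as a modification no matter how large the entry is, whereas resending the whole entry scales with everything stored in it.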

I guess I'd better investigate the delta-syncrepl mode.

Frank