
Strange issue with contextCSN



I'm running concurrency tests of MMR, and I see some strange issues:

1) loss of sync after multi-concurrent load (multiple concurrent ops on each server, with modifications to the same data subset on all servers, in order to trigger conflicts). I'm still looking for a pattern or a clue about what failed (for example, some explanation in the logs). This happens once in a while, after many operations. I don't expect this to be necessarily a bug; it might be a consequence of the conflicts. Of course, it would be nice if slapd made it possible to clearly identify where the conflict occurred, to support manual resolution.

2) loss of sync after single-concurrent load (multiple concurrent ops on a single server). This is really inexplicable (to me), as there should be no conflict. The only possible explanation I see (but I need to investigate further) is that an entry is added on one server, sync'd to another one and, in the meantime, deleted on the first one before its own sync gets back. This happens very seldom.

3) what puzzled me a bit is that when I load a single server, I'd expect to end up with a single contextCSN containing the SID of that server. This is what happens on the server I load, but the others, even when they get correctly sync'd, contain a contextCSN for each server in the MMR pool, and the contextCSNs with the other SIDs don't get propagated to the server that was loaded. It's not clear why those CSNs are generated, or how they get into the loop and propagate between servers that do not receive direct modifications.

4) another thing that puzzled me a bit is that in some cases, when all servers are loaded, and there is one contextCSN per SID with the same values on all of the servers, the values are returned in a seemingly arbitrary order that differs from server to server; for example:

bash-3.2$ diff -u testrun/server2.out testrun/server3.out
--- testrun/server2.out	2008-11-22 17:17:28.000000000 +0100
+++ testrun/server3.out	2008-11-22 17:17:28.000000000 +0100
@@ -2497,8 +2497,8 @@
 associatedDomain: example.com
 entryCSN: 20081122161630.753152Z#000000#001#000000
 contextCSN: 20081122161708.935242Z#000000#001#000000
-contextCSN: 20081122161658.195350Z#000000#002#000000
 contextCSN: 20081122161658.193983Z#000000#003#000000
+contextCSN: 20081122161658.195350Z#000000#002#000000

Not a big deal (except for the need to sort values to compare them), but I'd expect them to be exactly in the same order...
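
For what it's worth, the "sort values to compare them" step could look roughly like the sketch below. It is only an illustration, not part of my test suite: the names CSN, parse_csn and context_csns are made up here, and it assumes the CSN layout visible in the diff above (timestamp#count#SID#mod), with the last three fields read as hexadecimal.

```python
from typing import List, NamedTuple


class CSN(NamedTuple):
    """One CSN split into its four '#'-separated fields (hypothetical helper type)."""
    timestamp: str
    count: int
    sid: int
    mod: int


def parse_csn(value: str) -> CSN:
    # Assumed layout: <timestamp>#<count>#<sid>#<mod>, numeric fields in hex.
    timestamp, count, sid, mod = value.strip().split("#")
    return CSN(timestamp, int(count, 16), int(sid, 16), int(mod, 16))


def context_csns(ldif_text: str) -> List[CSN]:
    """Collect all contextCSN values from LDIF text, sorted by SID.

    Sorting by SID removes the dependency on the order in which a
    server happens to return the values.
    """
    csns = [
        parse_csn(line.split(":", 1)[1])
        for line in ldif_text.splitlines()
        if line.lower().startswith("contextcsn:")
    ]
    return sorted(csns, key=lambda csn: csn.sid)
```

With this, two outputs such as server2.out and server3.out compare equal when context_csns() returns the same list for both, regardless of the order in which each server listed the values.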

I'm going to pack my suite of tests and put them on ftp.openldap.org (and eventually add them to OpenLDAP's test suite, as tests specifically meant for MMR), but first I need to polish them a little bit and isolate those that present issues, in order to open specific ITSes.

p.


Ing. Pierangelo Masarati OpenLDAP Core Team

SysNet s.r.l.
via Dossi, 8 - 27100 Pavia - ITALIA
http://www.sys-net.it
-----------------------------------
Office:  +39 02 23998309
Mobile:  +39 333 4963172
Fax:     +39 0382 476497
Email:   ando@sys-net.it
-----------------------------------