
Re: (ITS#5488) syncrepl received contextCSN not passed on to syncprov consumers



On Sun, 4 May 2008, hyc@symas.com wrote:

> Rein Tollevik wrote:
>> On Wed, 30 Apr 2008, Howard Chu wrote:
>>> rein@OpenLDAP.org wrote:
>>>> My first attempt at fixing this was to change syncprov to fetch the
>>>> queued CSN values from the glue backend where it was used.  But that
>>>> failed, as other modules queue the CSN values in their own backend when
>>>> they change things.
>>> What other modules? Generally there cannot be any other sources of changes.
>>
>> Sorry, I should have written other configurations.  The CSNs get queued
>> in the subordinate database when syncrepl is used there, or not at all
>> (i.e. in regular updates that come in through the frontend).
>
> OK, but that's again quite a special case. I.e., that's multi-master; in the
> default (single-master) case there cannot be regular updates arriving through
> the frontend. When a single-master syncrepl consumer is configured, that is
> the only possible source of updates. Let's be sure we've solved this question
> for the single-master case first, before addressing the multi-master case.

No, I'm thinking about single-master glued configurations where either:
1) The server is the single master for the subordinate backend or
2) The server is a syncrepl consumer for the subordinate backend, and
syncrepl is configured on the subordinate db.

In both cases the CSN values are queued in the subordinate database,
where syncprov looks for them.

The case that doesn't work is when syncrepl and syncprov are both used on
the glue database, but still in single-master mode (although I don't
think that matters).  I.e., this server acts as a kind of forwarding
server: it replicates the changes it receives from its producer to its
own consumers.  In this case syncrepl queues the CSN values in the glue
database, while syncprov still looks for them in the subordinate
database where the actual changes are made.
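To make the mismatch concrete, here is a toy model in C. It is only a sketch under stated assumptions: `struct toy_db`, `syncrepl_queue_csn()` and `syncprov_lookup_csn()` are hypothetical names, not slapd APIs, and each database holds at most one pending CSN instead of slapd's real queue.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model (not slapd code): each database
 * holds at most one pending CSN, and a subordinate knows its glue db. */
struct toy_db {
    const char *pending_csn;   /* NULL when nothing is queued */
    struct toy_db *superior;   /* glue db for a subordinate, else NULL */
};

/* syncrepl queues the CSN in the db it is configured on. */
static void syncrepl_queue_csn(struct toy_db *configured_on, const char *csn)
{
    assert(configured_on != NULL);
    configured_on->pending_csn = csn;
}

/* syncprov looks only in the db where the change is actually written. */
static const char *syncprov_lookup_csn(const struct toy_db *written_to)
{
    assert(written_to != NULL);
    return written_to->pending_csn;
}
```

In the forwarding setup, syncrepl is configured on the glue db while the change is written to the subordinate, so a lookup on the subordinate comes back empty even though a CSN was queued.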

> While it's expected that the software will be able to handle multiple glued
> DBs and multi-master across them, I seriously doubt that anyone out there
> actually knows how to configure and maintain such a setup yet.

I haven't looked at multi-master yet, although I have multiple master
servers that replicate with each other.  But each backend database
has a clearly defined single master, so this is not what I think of
as a multi-master configuration.

>>>> Instead I changed ctxcsn.c so that it always
>>>> queues them in the glue backend where syncprov is used.  But I don't
>>>> feel that my understanding of this stuff is good enough to be sure that
>>>> this is the optimal solution.
>>> I definitely don't like references to the syncprov overlay appearing in main
>>> slapd code like that. We need a different solution.
>
>> To me it makes sense to have a single queue of CSN values in a glued
>> configuration, no matter if or where syncprov is used.
>
> Yes, I can probably go along with that. The downside is that it may reduce
> write concurrency a bit, compared to a glued configuration where each glued DB
> is otherwise independent.

Doesn't that again suggest that the best fix is probably to change
syncrepl so that it queues the CSN values in the backends where the
changes are made?
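A toy sketch of that syncrepl alternative, assuming already-normalized DNs and a hypothetical `struct toy_db` holding one pending CSN per database (none of these names are slapd APIs): the CSN is queued in the database that holds the changed entry, which is where syncprov already looks. The toy picks the longest matching suffix so that a subordinate wins over the glue database above it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model; DNs are assumed to be already normalized. */
struct toy_db {
    const char *suffix;
    const char *pending_csn;   /* NULL when nothing is queued */
};

/* Nonzero if dn equals suffix or lies below it (RDN boundary at a comma). */
static int dn_is_under(const char *dn, const char *suffix)
{
    size_t dl = strlen(dn), sl = strlen(suffix);

    if (sl > dl || strcmp(dn + dl - sl, suffix) != 0)
        return 0;
    return dl == sl || dn[dl - sl - 1] == ',';
}

/* Pick the db holding the entry: the longest matching suffix wins. */
static struct toy_db *db_for_dn(struct toy_db *dbs[], size_t n, const char *dn)
{
    struct toy_db *best = NULL;
    size_t i;

    assert(dn != NULL && (dbs != NULL || n == 0));
    for (i = 0; i < n; i++) {
        if (!dn_is_under(dn, dbs[i]->suffix))
            continue;
        if (best == NULL || strlen(dbs[i]->suffix) > strlen(best->suffix))
            best = dbs[i];
    }
    return best;
}

/* The alternative: queue the CSN in the db that receives the change. */
static void syncrepl_queue_csn_at_target(struct toy_db *dbs[], size_t n,
                                         const char *dn, const char *csn)
{
    struct toy_db *target = db_for_dn(dbs, n, dn);

    if (target != NULL)
        target->pending_csn = csn;
}
```

The trade-off is that syncrepl itself must resolve the target database per change, instead of relying on the one it is configured on.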

>> Another approach could be to have syncprov look in the glue database if
>> it fails to find any queued CSN in a subordinate db.  I haven't tested
>> it, but that should work in both configurations.  It should also remove
>> the need to always look for the glue db which my patch requires.  Would
>> that be better?
>
> That sounds like a decent alternative.

A new patch that implements this is at the end. Is this OK or should we
go for the syncrepl alternative instead?
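The fallback idea can be modeled in the same toy terms. This is a simplified sketch, not the patch itself: `struct toy_db` and `commit_csn_with_fallback()` are hypothetical stand-ins for what the real code does with slap_get_commit_csn() and select_backend().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model of a glued configuration (not slapd code). */
struct toy_db {
    const char *pending_csn;   /* NULL when nothing is queued */
    struct toy_db *superior;   /* glue db for a subordinate, else NULL */
};

/* Look in the db where the change was made first; only if nothing is
 * queued there and the db is a glue subordinate, try its glue db. */
static const char *commit_csn_with_fallback(const struct toy_db *db)
{
    assert(db != NULL);
    if (db->pending_csn == NULL && db->superior != NULL)
        return db->superior->pending_csn;
    return db->pending_csn;
}
```

A CSN queued in the subordinate itself still takes precedence, so configurations that already work are unaffected.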

>>>> Btw, in syncprov_checkpoint() there is a similar SLAP_GLUE_SUBORDINATE
>>>> test, should that have included an overlay_is_inst() clause as well?
>>> Perhaps. You would have to use op->o_bd->bd_self instead of op->o_bd on
>>> that call.
>
>> The current test (introduced to fix ITS#5433) causes the contextCSN to
>> be written to the glue database when syncprov is used on a subordinate
>> db, which appears wrong to me.
>
> Understood.
>
> Again, the question is whether the admin intended to configure a single
> syncprov over an entire glued DB, or individual syncprovs over each component
> of the glued tree. The distinction is vital, and it's detected based on
> whether the syncprov overlay is above the glue overlay in the overlay stack,
> or below it, on the topmost DB.

Yes, the first case is what I'm using, and it works with the current
code.  The second is what I'm afraid got broken by this patch, although
I haven't tried the second type, so I'm not sure.
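The distinction, syncprov above or below glue in the overlay stack, can be sketched as a small classifier. Assumptions: the stack is given top-to-bottom as overlay names, and `syncprov_scope()` and its enum are hypothetical, not slapd code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of what a syncprov instance covers. */
enum syncprov_scope {
    SCOPE_NONE,               /* no syncprov in this stack */
    SCOPE_THIS_DB_ONLY,       /* syncprov below glue, or no glue at all */
    SCOPE_WHOLE_GLUED_TREE    /* syncprov above glue: one provider for all */
};

/* stack[0] is the topmost overlay, i.e. the first one to see an operation. */
static enum syncprov_scope
syncprov_scope(const char *const stack[], size_t n)
{
    size_t i, j;

    assert(stack != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (strcmp(stack[i], "syncprov") != 0)
            continue;
        /* syncprov found: is glue somewhere beneath it? */
        for (j = i + 1; j < n; j++)
            if (strcmp(stack[j], "glue") == 0)
                return SCOPE_WHOLE_GLUED_TREE;
        return SCOPE_THIS_DB_ONLY;
    }
    return SCOPE_NONE;
}
```

In this model the first case corresponds to `{ "syncprov", "glue" }`, while per-component providers put syncprov below glue on the topmost DB and on each subordinate.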

>> Could you elaborate on when op->o_bd->bd_self must be used instead of
>> op->o_bd?  I understand that op->o_bd may be a copy of the original
>> structure that op->o_bd->bd_self refers to, but I'm not sure when it
>> must be used.  Btw, could op->o_bd->bd_self->bd_info also be used in
>> overlays to fetch the BackendInfo needed to call the top-most bd_search
>> (and similar)?
>
> If you read the code for overlay_is_inst() it should be obvious - that
> function only works when used with a real BackendDB structure. The local copy
> structure has had its bd_info replaced with whatever on_inst structure
> corresponds to the current overlay.

OK, I understand that one.  I had hoped for a general rule, but I'm
afraid none can be given.  If we continue this discussion, I guess it's
time to move it to openldap-devel.
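For the archive, a simplified model of why overlay_is_inst() needs the real BackendDB. All types here are hypothetical stand-ins for BackendInfo, slap_overinst, slap_overinfo and BackendDB, and the "is this an overlay stack" check is modeled as a type-string comparison; the real slapd check differs in detail. The per-overlay copy has its bd_info pointing at an overlay instance rather than at the stack head, so the check fails unless it is given bd_self.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; field names are illustrative only. */
struct toy_bi { const char *bi_type; void *bi_private; };

struct toy_overinst {
    struct toy_bi on_bi;           /* first member, like slap_overinst */
    struct toy_overinst *on_next;
};

struct toy_overinfo {
    struct toy_bi oi_bi;           /* bi_type "over" marks a stacked db */
    struct toy_overinst *oi_list;  /* overlays, topmost first */
};

struct toy_db {
    struct toy_bi *bd_info;
    struct toy_db *bd_self;        /* points at the real (topmost) db */
};

/* Only a db whose bd_info is the overlay-stack head can be searched. */
static int toy_overlay_is_inst(const struct toy_db *be, const char *type)
{
    const struct toy_overinfo *oi;
    const struct toy_overinst *on;

    assert(be != NULL && be->bd_info != NULL && type != NULL);
    if (strcmp(be->bd_info->bi_type, "over") != 0)
        return 0;   /* a per-overlay copy: bd_info is an overlay instance */
    oi = be->bd_info->bi_private;
    for (on = oi->oi_list; on != NULL; on = on->on_next)
        if (strcmp(on->on_bi.bi_type, type) == 0)
            return 1;
    return 0;
}
```

In the model, the copy plays the role of op->o_bd as seen inside an overlay, which is why passing op->o_bd->bd_self makes the check work.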

> Yes, the bd_self points to the topmost structure, so you can use it for
> be_search. Much of what's happening in these overlays was intended to avoid
> starting over at the top though, because the code is already running in the
> desired overlay context.

Ah, good.  Using that is much clearer than casting op->o_bd->bd_info
into an overinst pointer and fetching it from there :-)

Rein
Index: OpenLDAP/servers/slapd/overlays/syncprov.c
===================================================================
RCS file: /f/CVSROOT/drift/OpenLDAP/servers/slapd/overlays/syncprov.c,v
retrieving revision 1.20
diff -u -u -r1.20 syncprov.c
--- OpenLDAP/servers/slapd/overlays/syncprov.c	5 May 2008 17:45:38 -0000	1.20
+++ OpenLDAP/servers/slapd/overlays/syncprov.c	5 May 2008 20:58:35 -0000
@@ -1589,6 +1589,17 @@
  		cbuf[0] = '\0';
  		ldap_pvt_thread_rdwr_wlock( &si->si_csn_rwlock );
  		slap_get_commit_csn( op, &maxcsn );
+		if ( BER_BVISNULL( &maxcsn ) && SLAP_GLUE_SUBORDINATE( op->o_bd )) {
+			/* syncrepl queues the CSN values in the db where
+			 * it is configured, not where the changes are made.
+			 * So look for a value in the glue db if we didn't
+			 * find any in this db.
+			 */
+			BackendDB *be = op->o_bd;
+			op->o_bd = select_backend( &be->be_nsuffix[0], 1);
+			slap_get_commit_csn( op, &maxcsn );
+			op->o_bd = be;
+		}
  		if ( !BER_BVISNULL( &maxcsn ) ) {
  			int i, sid;
  			strcpy( cbuf, maxcsn.bv_val );