
Re: Overlays: freeing resources on a hijacked Operation



David Hawes wrote:

I have recently finished an overlay that I call 'addpartial' that intercepts an add operation, determines if the entry to be added exists, and if so, determines if anything has changed and modifies the appropriate attributes. If the entry does not exist, the add proceeds as expected. If the entry does exist yet is identical to the existing entry, the add will silently return LDAP_SUCCESS. The intent of this overlay is to do partial record replication from a master to its slaves, while upstream replication data (from our Registry, in this case) can be full records. Before this overlay, we were doing deletes and adds for each entry changed in our Registry (ugh!), so we were effectively slowing down our directories. The overlay can do a steady 400-500 records compared per second on entries with approximately 20-40 attributes (very rough numbers). I know other people faced with this problem effectively diff entries at another level--this overlay simply does that at the LDAP level. I provide this information in the hope that others will find this useful. I'd love to polish it up and contribute it, if so.

My apologies for that long-winded description.  Now on to my question.

After the overlay has finished doing what it needs to do, it sends LDAP_SUCCESS to the client and then returns. If I return LDAP_SUCCESS, my server runs out of memory at around 1 million entries (starts eating up swap and then segfaults). However, if I return SLAP_CB_CONTINUE, the server does not run out of memory (tested up to 20 million entries). I assume that something in the original add Operation needs to be freed, but I am not sure what. Could anyone point me in the right direction for this case? The overlay works fine when I return SLAP_CB_CONTINUE, but there are some misleading errors left in the logs (68: entry already exists) as it tries to complete the original operation.

From my perspective, the operation code is quite well isolated. In your case,
one approach is to look at servers/slapd/add.c and note every piece of
malloc'ed memory that ends up assigned to a member of "op" (the Operation
structure), paying particular attention to what is assigned to op->ora_*
(macros for the op->o_request.oq_add.* members), but also to other malloc'ed
members of "op". Then, whenever you modify one of them, remember to free the
original value with the appropriate routine (in some cases op->o_tmpfree(),
passing op->o_tmpmemctx as the context; in others simply ch_free()).
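
To illustrate the pattern only, here is a minimal, self-contained sketch. It
does not use the real slapd headers: demo_op, demo_tmpfree() and the
demo_replace_*() helpers are hypothetical stand-ins I made up for this
message, and plain free() plays the role that ch_free() would play in slapd.
The point is simply that each member is released with the routine that
matches how it was allocated before the new value is stored.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the Operation structure; NOT the slapd type. */
typedef struct demo_op {
    void *memctx;                           /* role of op->o_tmpmemctx */
    void (*tmpfree)(void *ptr, void *ctx);  /* role of op->o_tmpfree() */
    char *tmp_value;   /* member owned by the per-operation context */
    char *heap_value;  /* member owned by the general heap */
} demo_op;

/* Counts context frees so a caller can check the old value was released. */
static int demo_tmpfree_calls = 0;

/* Hypothetical context-aware free; here it simply forwards to free(). */
static void demo_tmpfree(void *ptr, void *ctx)
{
    (void)ctx;
    demo_tmpfree_calls++;
    free(ptr);
}

/* Portable string duplicate built on malloc(). */
static char *demo_strdup(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/* Replace the context-owned member: free the old value with the
 * context routine, then store the new copy. */
static int demo_replace_tmp(demo_op *op, const char *s)
{
    char *copy;
    assert(op != NULL && s != NULL);
    copy = demo_strdup(s);
    if (copy == NULL)
        return -1;
    if (op->tmp_value != NULL)
        op->tmpfree(op->tmp_value, op->memctx);
    op->tmp_value = copy;
    return 0;
}

/* Replace the heap-owned member: free() stands in for ch_free(). */
static int demo_replace_heap(demo_op *op, const char *s)
{
    char *copy;
    assert(op != NULL && s != NULL);
    copy = demo_strdup(s);
    if (copy == NULL)
        return -1;
    free(op->heap_value);
    op->heap_value = copy;
    return 0;
}

/* Release whatever the structure still owns when the operation ends. */
static void demo_op_destroy(demo_op *op)
{
    assert(op != NULL);
    if (op->tmp_value != NULL)
        op->tmpfree(op->tmp_value, op->memctx);
    free(op->heap_value);
    op->tmp_value = NULL;
    op->heap_value = NULL;
}
```

In a real overlay the members and their allocators are whatever add.c
actually uses, so check each one there rather than trusting this sketch.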


This should point you in the right direction. Then, following Howard's suggestion,
make sure you don't free too much by running under fnccheck, valgrind, efence,
or any other memory debugging tool (I usually start by setting the MALLOC_CHECK_
environment variable, if your glibc supports it; see malloc(3) for details).


There are very few examples of overlays that actually deal with the add operation;
I recall only rwm.c and unique.c. Other overlays simply proxy the operation to
another backend, or do trivial stuff.


p.




SysNet - via Dossi,8 27100 Pavia Tel: +390382573859 Fax: +390382476497