Full_Name: Hallvard B Furuseth Version: HEAD OS: Linux x86_64 URL: Submission from: (NULL) (129.240.6.233) Submitted by: hallvard Here are a number of bugs and other issues with slapd/sl_malloc.c, and one for ber_memcalloc. Some for RE24, likely some for RE25, but I don't know where to cut off for RE24? (Waiting for someone to say...) * It claims to align on 2*sizeof(int) boundaries, but it looks like stack:realloc and pool:mem_create can un-align to (2n+1)*sizeof(int) boundaries on 32-bit hosts, where sizeof(ber_len_t) == sizeof(int). A slap_sl_realloc() fix for the stack implementation is to replace size += pad + sizeof( ber_len_t ); size &= ~pad; with size = ((size + pad) & ~pad) + sizeof( ber_len_t ); however see the optimizations and tweaks below. For the pool implementation, maybe slap_sl_mem_create() can just set so->so_ptr = (char *) sh->sh_base + sizeof(ber_len_t) % (pad+1); Unless that messes up its expectations of how much space is left, I haven't checked. Should mem_create size be increased accordingly? Unfortunately sl_malloc.c exports its internals, so I don't know what kind of fix would affect it: * Why is there a slap_sl_mem_detach() when there is no way to reattach it? As far as I can tell all the caller can do is to inspect/clean up the returned slab_heap itself. * slap_sl_mem_create() switching implementation stack<->pool on an old memctx slab can crash or leak, it runs the wrong cleanup code. Fix: Remove code duplication, rearrange. The pool cleanup duplicates slap_sl_mem_destroy(), except the final frees. Let destroy(key=NULL,) mean don't free and call that. Also the create/destroy pool branches duplicate the stack branches. * However, will the pool (i.e. non-stack) implementation ever be used? If not, a simpler fix to some of this is to delete it. OTOH if it someday may be used, hopefully some of these cleanups will be leaving it a bit more correct, easier to read, and faster. Though I haven't gone to the drastic step of studying it to see how it works. Or we could #ifdef it out by default. It's unclear to me why this is a run-time argument to the create function instead of instead of a -D<compiler flag>, why would the caller care? #define SLAP_SLAB_MODE <undef or 1:stack, 0:pool: -1:support both>. #define SLAP_SLAB_ISSTACK(stack) <stack if both, else SLAP_SLAB_MODE> * Regarding code duplication, the #ifdef SLAP_NO_SL_MALLOC code duplicates the !ctx code. Make that if(<enum No_sl_malloc> && !ctx) and the #ifdefs can be killed. * Also we can throw the #ifdef NO_THREADS/#else/#endif code out to two macros #define GET_MEMCTX()/SET_MEMCTX(). * The "enough room?" test (ptr + size < endptr) is invalid, can e.g. wrap around, when there is not enough room. Use (size < endptr-ptr). * Some failures assert(0)/exit() like ch_malloc.c, but without giving a Debug() message first. * The slap_sl_malloc() pool implementation can return NULL on failure at comment "FIXME: missing return; guessing...". For consistency it should exit on failure like the rest of sl_malloc.c and ch_malloc.c. * Realloc(last block) when the following free space is not big enough, does not move sh_last and reclaim free space properly. * slap_sl_calloc()/ber_memcalloc() do not catch <count>*<size> overflow: ber_len_t total = n * size; if (!n && total/n == size) <allocate...>; That can be optimized a bit, if anyone cares: #define LIM_SQRT(t) /* some value < sqrt(max value of unsigned type t) */ \ ((0UL|(t)-1) >>31>>31 > 1 ? ((t)1 <<32) - 1 : \ (0UL|(t)-1) >>31 ? 65535U : (0UL|(t)-1) >>15 ? 255U : 15U) ... /* sqrt test = slight optimization: often avoids the division */ if ((n | size) <= LIM_SQRT(ber_len_t) || !n || total/n == size) ... * slap_sl_<malloc/realloc> increase size before proceeding, they should reduce it before falling back to ch_malloc and for Debug messages. Space optimizations/cleanup: * To save space on 64-bit hosts: We can store store sizes (head, tail) in 4-byte typedef ber_uint_t slap_sl_size_t instead of a ber_len_t. After fixing the misalignment issue above, however. * Also the stack implementation does not need to allocate the tail. It can store the "block is freed" bit in the head of the _next_ block, and only store a tail in free blocks (i.e. if that bit is set). I think that does add a bit of code, though it is convenient for the "reclaim freed blocks at sh_last" code in slap_sl_free(). * Ignoring last point, the "reclaim" code can in any case be simplified: /* mark it free */ p[-1] = size |= 1; /* reclaim free space off tail */ if (sh->sh_last == p) { do { p = (slap_sllen_t*) ((char*) p - size + 1) - 1; size = p[-1]; } while (size & 1); sh->sh_last = p; } * Realloc can check if previous or next block is free before giving up and doing malloc/copy/free. That does multiply its code size though. Dunno if it's a good idea. Needs testing/timing, I guess. * Aligning with (size & ~pad) is wrong for non-two's complement signed pad, it should be (size & -(pad + 1)) aka (size & -Alignment). All right, so I'm nitpicking. I just dislike unportable code which isn't even simpler than the portable variant. * pad and order_start can be compile-time enums instead of computed at run time (if any compilers do not optimize it away already): enum { Align = 2*sizeof(int), Align_log2 = 1 + (Align>2) + (Align>4) + (Align>8) + (Align>16), pad = Align - 1, /* but drop that, it's just as easy to use Align */ order_start = Align_log2 /* which also can be dropped */ }; ... assert(Align == 1 << Align_log2); /* Adding head+tail preserves alignment */ assert(2*sizeof(ber_len_t) % Align == 0); * The pool code can be cleaned up by basing 'order' on Align-sized chunks instead of byte-sized chunks, i.e. subtract order_start from sh->sh_maxorder, order/i/j variables. Also subtract 1 from mem_create:order, so that 'order' means the same as 'order' elsewhere. (Straightforward replacement, no need to invoke the brain and understand the code:-) * The pool implementation sometimes does 1<<foo where it should do (ber_len_t)1<<foo or 1U<<foo.
h.b.furuseth@usit.uio.no wrote: > Full_Name: Hallvard B Furuseth > Version: HEAD > OS: Linux x86_64 > URL: > Submission from: (NULL) (129.240.6.233) > Submitted by: hallvard > > > Here are a number of bugs and other issues with slapd/sl_malloc.c, and > one for ber_memcalloc. Some for RE24, likely some for RE25, but I don't > know where to cut off for RE24? (Waiting for someone to say...) Go ahead and commit. In working pieces if necessary. We'll pick and choose... > > * It claims to align on 2*sizeof(int) boundaries, but it looks like > stack:realloc and pool:mem_create can un-align to (2n+1)*sizeof(int) > boundaries on 32-bit hosts, where sizeof(ber_len_t) == sizeof(int). > > A slap_sl_realloc() fix for the stack implementation is to replace > size += pad + sizeof( ber_len_t ); > size&= ~pad; > with > size = ((size + pad)& ~pad) + sizeof( ber_len_t ); > however see the optimizations and tweaks below. Fine. > For the pool implementation, maybe slap_sl_mem_create() can just set > so->so_ptr = (char *) sh->sh_base + sizeof(ber_len_t) % (pad+1); > Unless that messes up its expectations of how much space is left, I > haven't checked. Should mem_create size be increased accordingly? Nobody uses the pool code. I would suggest you profile it against the stack code before spending any time fixing it. > Unfortunately sl_malloc.c exports its internals, so I don't know what > kind of fix would affect it: > > * Why is there a slap_sl_mem_detach() when there is no way to reattach > it? As far as I can tell all the caller can do is to inspect/clean up > the returned slab_heap itself. It was originally implemented in 2.2 to support psearch for sync replication. It was obsoleted when the syncprov overlay was written in 2.3, we just never deleted it. Go ahead and delete it. (And remember, the point of sl_malloc is that a heap here doesn't need any special cleanup - just free it and forget it.) > * slap_sl_mem_create() switching implementation stack<->pool on an old > memctx slab can crash or leak, it runs the wrong cleanup code. > > Fix: Remove code duplication, rearrange. The pool cleanup duplicates > slap_sl_mem_destroy(), except the final frees. Let destroy(key=NULL,) > mean don't free and call that. Also the create/destroy pool branches > duplicate the stack branches. > > * However, will the pool (i.e. non-stack) implementation ever be used? > If not, a simpler fix to some of this is to delete it. OTOH if it > someday may be used, hopefully some of these cleanups will be leaving > it a bit more correct, easier to read, and faster. Though I haven't > gone to the drastic step of studying it to see how it works. > > Or we could #ifdef it out by default. It's unclear to me why this is > a run-time argument to the create function instead of instead of a > -D<compiler flag>, why would the caller care? Dunno. I guess it was for quicker turnaround during testing. But afaik only Jong ever tested/used it, and no one else has spent any time on it. #if'ing it all out is fine with me. Of course, if you find that it's actually better performing than the stack version, then we can #if out the stack code instead. > > #define SLAP_SLAB_MODE<undef or 1:stack, 0:pool: -1:support both>. > #define SLAP_SLAB_ISSTACK(stack)<stack if both, else SLAP_SLAB_MODE> > > * Regarding code duplication, the #ifdef SLAP_NO_SL_MALLOC code > duplicates the !ctx code. Make that if(<enum No_sl_malloc> && !ctx) > and the #ifdefs can be killed. I would personally prefer no runtime checks, and all compile-time checks. > * Also we can throw the #ifdef NO_THREADS/#else/#endif code out to two > macros #define GET_MEMCTX()/SET_MEMCTX(). right. > * The "enough room?" test (ptr + size< endptr) is invalid, can e.g. > wrap around, when there is not enough room. Use (size< endptr-ptr). ok. > * Some failures assert(0)/exit() like ch_malloc.c, but without giving a > Debug() message first. don't really care... > * The slap_sl_malloc() pool implementation can return NULL on failure > at comment "FIXME: missing return; guessing...". For consistency it > should exit on failure like the rest of sl_malloc.c and ch_malloc.c. don't care, unless we decide to keep the pool code. Go ahead and commit. > * Realloc(last block) when the following free space is not big enough, > does not move sh_last and reclaim free space properly. > > * slap_sl_calloc()/ber_memcalloc() do not catch<count>*<size> overflow: > ber_len_t total = n * size; > if (!n&& total/n == size)<allocate...>; > > That can be optimized a bit, if anyone cares: > > #define LIM_SQRT(t) /* some value< sqrt(max value of unsigned type t) */ \ > ((0UL|(t)-1)>>31>>31> 1 ? ((t)1<<32) - 1 : \ > (0UL|(t)-1)>>31 ? 65535U : (0UL|(t)-1)>>15 ? 255U : 15U) > ... > /* sqrt test = slight optimization: often avoids the division */ > if ((n | size)<= LIM_SQRT(ber_len_t) || !n || total/n == size) ... > > * slap_sl_<malloc/realloc> increase size before proceeding, they should > reduce it before falling back to ch_malloc and for Debug messages. > > Space optimizations/cleanup: > > * To save space on 64-bit hosts: We can store store sizes (head, tail) > in 4-byte typedef ber_uint_t slap_sl_size_t instead of a ber_len_t. > > After fixing the misalignment issue above, however. > > * Also the stack implementation does not need to allocate the tail. > It can store the "block is freed" bit in the head of the _next_ block, > and only store a tail in free blocks (i.e. if that bit is set). > > I think that does add a bit of code, though it is convenient for the > "reclaim freed blocks at sh_last" code in slap_sl_free(). > > * Ignoring last point, the "reclaim" code can in any case be simplified: > /* mark it free */ > p[-1] = size |= 1; > /* reclaim free space off tail */ > if (sh->sh_last == p) { > do { > p = (slap_sllen_t*) ((char*) p - size + 1) - 1; > size = p[-1]; > } while (size& 1); > sh->sh_last = p; > } > > * Realloc can check if previous or next block is free before giving up > and doing malloc/copy/free. That does multiply its code size though. > Dunno if it's a good idea. Needs testing/timing, I guess. > > * Aligning with (size& ~pad) is wrong for non-two's complement signed > pad, it should be (size& -(pad + 1)) aka (size& -Alignment). > > All right, so I'm nitpicking. I just dislike unportable code which > isn't even simpler than the portable variant. -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
hyc@symas.com writes: > Go ahead and commit. In working pieces if necessary. We'll pick and choose... Thanks. Will do. > Nobody uses the pool code. I would suggest you profile it against the stack > code before spending any time fixing it. Right, that and detach was what I wondered most about. Beyond just "with enough changes, some will be bugs":-) Well, the current pool code blocks the "ber_len_t -> 32-bit" space optimization, which would import the alignment problem to 64-bit hosts. Will profile first. When I get time. >> * Regarding code duplication, the #ifdef SLAP_NO_SL_MALLOC code >> duplicates the !ctx code. Make that if(<enum No_sl_malloc> && !ctx) >> and the #ifdefs can be killed. > > I would personally prefer no runtime checks, and all compile-time checks. Right, but if(compile-time constant) is a compile-time check on a decent compiler. Otherwise we shouldn't be using '#define foobar do { ... } while(0)' trick to allow ';' after a macro. -- Hallvard
hyc@symas.com writes: > h.b.furuseth@usit.uio.no wrote: > Go ahead and commit. In working pieces if necessary. We'll pick and choose... Committed one batch of fixes/tweaks, hopefully independent after the first one or two. Next batch will have to walk all over some of these. >> * Why is there a slap_sl_mem_detach() when there is no way to reattach >> it? As far as I can tell all the caller can do is to inspect/clean up >> the returned slab_heap itself. > > It was originally implemented in 2.2 to support psearch for sync replication. > It was obsoleted when the syncprov overlay was written in 2.3, we just never > deleted it. Go ahead and delete it. Actually I just realized the context can be used thread-less just fine, even if it's only relevant if any 3rd party modules do it. Might as well keep it in RE24, just in case. But I'll move structs slap_object and slap_heap back to sl_malloc.c, so such modules cannot mess with them. > (And remember, the point of sl_malloc is that a heap here doesn't need > any special cleanup - just free it and forget it.) Not quite, due to the ch_malloc fallbacks. Anyway, I've committed some comments with my understanding of the package. >> * Some failures assert(0)/exit() like ch_malloc.c, but without giving a >> Debug() message first. > > don't really care... I do, I've had silent exits from slapd before and hated it. -- Hallvard
changed notes changed state Open to Active moved from Incoming to Software Bugs
hyc@symas.com writes: > Nobody uses the pool code. I would suggest you profile it against the stack > code before spending any time fixing it. The pool version crashes if I divide SLAP_SLAB_SIZE by 16 or multiply the allocated amounts by 17. The stack version stays happy. I also tried RE23 and rev 1.48 in case of code rot. (Should have tried OpenLDAP 2.2, but would need to install an old Berkeley DB.) Adding a sl_malloc fallback to ch_malloc instead of returning NULL didn't help. Anyway, I tried this because I assume the pool version is intended to reduce the number of sl_malloc fallbacks to ch_malloc, but that almost never happens in 'make test' with the current SLAP_SLAB_SIZE. Which made that aspect hard to test. Also the pool version does a number of ch_mallocs for administration and ch_frees them at reset. If the idea is to reduce ch_mallocs, it should keep these blocks around. (Or allocate them with the slab.) Except the 'slab_object's It only needs cleanup if the slab size changes, which slapd never seems to do. -- Hallvard
Hallvard B Furuseth wrote: > hyc@symas.com writes: >> Nobody uses the pool code. I would suggest you profile it against the stack >> code before spending any time fixing it. > > The pool version crashes if I divide SLAP_SLAB_SIZE by 16 or multiply > the allocated amounts by 17. The stack version stays happy. I also > tried RE23 and rev 1.48 in case of code rot. (Should have tried > OpenLDAP 2.2, but would need to install an old Berkeley DB.) Adding a > sl_malloc fallback to ch_malloc instead of returning NULL didn't help. > > Anyway, I tried this because I assume the pool version is intended to > reduce the number of sl_malloc fallbacks to ch_malloc, but that almost > never happens in 'make test' with the current SLAP_SLAB_SIZE. Which > made that aspect hard to test. > > Also the pool version does a number of ch_mallocs for administration and > ch_frees them at reset. If the idea is to reduce ch_mallocs, it should > keep these blocks around. (Or allocate them with the slab.) Except the > 'slab_object's It only needs cleanup if the slab size changes, which > slapd never seems to do. This stuff was originally written to grow the slab as needed, but it was also being very braindead and would have kept a lot of excess memory lying around so I dropped that. So right, currently the slab size doesn't change. -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
changed notes changed state Active to Test
changed notes changed state Test to Partial
Some fixes in HEAD Some fixes in RE24 The rest waits for RE25