Issue 8685 - Invalid memory access
Summary: Invalid memory access
Status: VERIFIED FIXED
Alias: None
Product: OpenLDAP
Classification: Unclassified
Component: slapd (show other issues)
Version: unspecified
Hardware: All All
: --- normal
Target Milestone: ---
Assignee: OpenLDAP project
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-07-06 19:13 UTC by leitao@debian.org
Modified: 2017-07-09 15:33 UTC (History)
0 users

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this issue.
Description leitao@debian.org 2017-07-06 19:13:11 UTC
Full_Name: Breno Leitao
Version: upstream
OS: Debian
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (32.104.18.202)


Currently, do_random() function in tests/progs/slapd-mtread.c uses a random
number (upto RAND_MAX) to access an array that is much smaller than RAND_MAX,
causing a segfault.

This causes a segmentation fault and more details could be found at
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=866122
Comment 1 Howard Chu 2017-07-06 19:55:01 UTC
leitao@debian.org wrote:
> Full_Name: Breno Leitao
> Version: upstream
> OS: Debian
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (32.104.18.202)
> 
> 
> Currently, do_random() function in tests/progs/slapd-mtread.c uses a random
> number (upto RAND_MAX) to access an array that is much smaller than RAND_MAX,
> causing a segfault.
> 
> This causes a segmentation fault and more details could be found at
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=866122
> 
> 
Thanks for the report. I've examined your proposed patch in your debian 
bugtracker. It doesn't make much sense though.

The random number is being correctly scaled, line 682:

int r = ((double)nvalues)*rand()/(RAND_MAX + 1.0);

Which means the value of r can only be from 0 to nvalues-1.

And there should be no difference between nvalues and i, since on line 657:

nvalues = ldap_count_entries( ld, res );

Since i is simply iterated through all of the entries in the response, the two 
values cannot disagree.

Finally, such a simple bug as your patch suggests would have crashed long ago 
on every other machine/OS, and it has never done so. I don't believe you've 
identified the actual bug.

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 2 leitao@debian.org 2017-07-07 00:57:30 UTC
Hello Howard,

On Thu, Jul 06, 2017 at 08:55:01PM +0100, Howard Chu wrote:
> leitao@debian.org wrote:
> > Full_Name: Breno Leitao
> > Version: upstream
> > OS: Debian
> > URL: ftp://ftp.openldap.org/incoming/
> > Submission from: (NULL) (32.104.18.202)
> > 
> > 
> > Currently, do_random() function in tests/progs/slapd-mtread.c uses a random
> > number (upto RAND_MAX) to access an array that is much smaller than RAND_MAX,
> > causing a segfault.
> > 
> > This causes a segmentation fault and more details could be found at
> > https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=866122
> > 
> > 
> Thanks for the report. I've examined your proposed patch in your debian
> bugtracker. It doesn't make much sense though.
> 
> The random number is being correctly scaled, line 682:
> 
> int r = ((double)nvalues)*rand()/(RAND_MAX + 1.0);
> 
> Which means the value of r can only be from 0 to nvalues-1.
> 
> And there should be no difference between nvalues and i, since on line 657:
> 
> nvalues = ldap_count_entries( ld, res );
> 
> Since i is simply iterated through all of the entries in the response, the
> two values cannot disagree.

Thanks for looking at it, and your suggestion made me revisit it. let me
share I am finding in the debug. This is the failure frame:

       #5  do_read (ld=0x3fff980008e0, entry=0x6e6d756c413d756f <error: Cannot access memory at address 0x6e6d756c413d756f>, 
          attrs=0x20020058 <srchattrs>, noattrs=<optimized out>, maxloop=<optimized out>, maxretries=<optimized out>, force=<optimized out>, 
          idx=<optimized out>, chaserefs=<optimized out>, delay=<optimized out>, nobind=<optimized out>) at ../../../../tests/progs/slapd-mtread.c:791

On this frame, these are the values we have:

       i = 0
       do_retry = 0
       rc = <optimized out>
       thrstr = "Read(1): entry=\"0 cnt: 1 (retried 0) (dc=example,dc=com)\000u=Alumni Association,ou=People,dc=example,dc=com)\000mple,dc=com)", '\000' <repeats 6641 times>...
       e = <optimized out>
       attrs = {0x200072a0 "1.1", 0x0}
       rc = 0
       nvalues = <optimized out>
       res = 0x3fffa0001ac0

So, that is what I suppose is happening. On the following loop,
ldap_first_entry() is returning NULL, thus, i = 0;

       for ( i = 0, e = ldap_first_entxury( ld, res ); e != NULL; i++, e = ldap_next_entry( ld, e ) )
       {
             values[ i ] = ldap_get_dn( ld, e );
       }
       values[ i ] = NULL;

Thus, value[0] = NULL;

Later, the do_random() code does the following loop and innerloop is 10000.

       for ( i = 0; i < innerloop; i++ ) {
               int     r = ((double)nvalues)*rand()/(RAND_MAX + 1.0);
               do_read( ld, values[ r ],
                       srchattrs, noattrs, nobind, 1, maxretries,
                       delay, force, chaserefs, idx );
       }

Thus, independent of the r value, values[r] will have garbage, right?  But I
agree with you, I need to find a better patch that address this e = NULL corner
case.

Comment 3 Ryan Tandy 2017-07-07 06:20:46 UTC
Hi Breno,

Thanks a lot for taking the time to look at this.

I reproduced the crash on a minicloud VM (thanks!) with gcc -O2 (but not 
-O1 or -O0) and also with clang -O2 and -O1 (but not -O0).

On Fri, Jul 07, 2017 at 12:57:47AM +0000, leitao@debian.org wrote:
>So, that is what I suppose is happening. On the following loop,
>ldap_first_entry() is returning NULL, thus, i = 0;
>
>       for ( i = 0, e = ldap_first_entxury( ld, res ); e != NULL; i++, e = ldap_next_entry( ld, e ) )
>       {
>             values[ i ] = ldap_get_dn( ld, e );
>       }
>       values[ i ] = NULL;
>
>Thus, value[0] = NULL;

I have not been able to reproduce that. I added an assert(i == nvalues) 
here, but it never failed in the time I spent on this today.

On IRC, Howard pointed at the computation of 'r' (line 682), and indeed 
some printf debugging suggests that may be going wrong. I occasionally 
saw values of 'r' greater than 'nvalues', leading to reads past the end 
of the array like you suggested.

for example:

nvalues = 19
[...]
i=20 r=11 values[r]=0x10014fba0f0
i=48 r=6 values[r]=0x3fff40000bd0
i=49 r=3 values[r]=0x3fff40000b50
i=51 r=7 values[r]=0x3fff6c001310
i=46 r=11 values[r]=0x3fff0c003530
i=56 r=27 values[r]=0x25
Segmentation fault (core dumped)

Unpacking the computation, it looks like the multiplication is the part 
that sometimes returns the wrong result. For example:

nvalues = 19
[...]
rand() -> 745824322
((double)19)*745824322 -> 45495283642.000000
45495283642.000000 / 2147483648.000000 -> 21.185392
i=75 r=21 values[r]=0x35
Segmentation fault (core dumped)

but 19.0*745824322 should be 14170662118.

I'm not really sure where to look next. Compiler bug would be a nice 
explanation, but wouldn't it be unusual for gcc and clang to have the 
same bug?

Comment 4 Howard Chu 2017-07-07 13:54:04 UTC
Breno Leitao wrote:
> Hello Howard,
> 
> On Thu, Jul 06, 2017 at 08:55:01PM +0100, Howard Chu wrote:
>> leitao@debian.org wrote:
>>> Full_Name: Breno Leitao
>>> Version: upstream
>>> OS: Debian
>>> URL: ftp://ftp.openldap.org/incoming/
>>> Submission from: (NULL) (32.104.18.202)
>>>
>>>
>>> Currently, do_random() function in tests/progs/slapd-mtread.c uses a random
>>> number (upto RAND_MAX) to access an array that is much smaller than RAND_MAX,
>>> causing a segfault.
>>>
>>> This causes a segmentation fault and more details could be found at
>>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=866122
>>>
>>>
>> Thanks for the report. I've examined your proposed patch in your debian
>> bugtracker. It doesn't make much sense though.
>>
>> The random number is being correctly scaled, line 682:
>>
>> int r = ((double)nvalues)*rand()/(RAND_MAX + 1.0);
>>
>> Which means the value of r can only be from 0 to nvalues-1.
>>
>> And there should be no difference between nvalues and i, since on line 657:
>>
>> nvalues = ldap_count_entries( ld, res );
>>
>> Since i is simply iterated through all of the entries in the response, the
>> two values cannot disagree.
> 
> Thanks for looking at it, and your suggestion made me revisit it. let me
> share I am finding in the debug. This is the failure frame:

You cannot possibly diagnose this with an optimized binary. You're missing 
several crucial variables.

Your analysis here is completely invalid.
> 
>         #5  do_read (ld=0x3fff980008e0, entry=0x6e6d756c413d756f <error: Cannot access memory at address 0x6e6d756c413d756f>,
>            attrs=0x20020058 <srchattrs>, noattrs=<optimized out>, maxloop=<optimized out>, maxretries=<optimized out>, force=<optimized out>,
>            idx=<optimized out>, chaserefs=<optimized out>, delay=<optimized out>, nobind=<optimized out>) at ../../../../tests/progs/slapd-mtread.c:791
> 
> On this frame, these are the values we have:
> 
>         i = 0
>         do_retry = 0
>         rc = <optimized out>
>         thrstr = "Read(1): entry=\"0 cnt: 1 (retried 0) (dc=example,dc=com)\000u=Alumni Association,ou=People,dc=example,dc=com)\000mple,dc=com)", '\000' <repeats 6641 times>...
>         e = <optimized out>
>         attrs = {0x200072a0 "1.1", 0x0}
>         rc = 0
>         nvalues = <optimized out>
>         res = 0x3fffa0001ac0

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 5 Ryan Tandy 2017-07-08 17:07:31 UTC
On Fri, Jul 07, 2017 at 06:20:55AM +0000, ryan@nardis.ca wrote:
>Unpacking the computation, it looks like the multiplication is the part
>that sometimes returns the wrong result.

Not the multiplication, but rather the cast of nvalues to double.

I'm going to take further followups to the Debian tracker, as it seems 
fairly clear to me that this is not an OpenLDAP bug.

Comment 6 OpenLDAP project 2017-07-09 15:33:51 UTC
kernel bug on powerpc64
Comment 7 Ryan Tandy 2017-07-09 15:33:51 UTC
changed notes
changed state Open to Closed