Full_Name: Breno Leitao Version: upstream OS: Debian URL: ftp://ftp.openldap.org/incoming/ Submission from: (NULL) (32.104.18.202) Currently, do_random() function in tests/progs/slapd-mtread.c uses a random number (upto RAND_MAX) to access an array that is much smaller than RAND_MAX, causing a segfault. This causes a segmentation fault and more details could be found at https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=866122
leitao@debian.org wrote: > Full_Name: Breno Leitao > Version: upstream > OS: Debian > URL: ftp://ftp.openldap.org/incoming/ > Submission from: (NULL) (32.104.18.202) > > > Currently, do_random() function in tests/progs/slapd-mtread.c uses a random > number (upto RAND_MAX) to access an array that is much smaller than RAND_MAX, > causing a segfault. > > This causes a segmentation fault and more details could be found at > https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=866122 > > Thanks for the report. I've examined your proposed patch in your debian bugtracker. It doesn't make much sense though. The random number is being correctly scaled, line 682: int r = ((double)nvalues)*rand()/(RAND_MAX + 1.0); Which means the value of r can only be from 0 to nvalues-1. And there should be no difference between nvalues and i, since on line 657: nvalues = ldap_count_entries( ld, res ); Since i is simply iterated through all of the entries in the response, the two values cannot disagree. Finally, such a simple bug as your patch suggests would have crashed long ago on every other machine/OS, and it has never done so. I don't believe you've identified the actual bug. -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
Hello Howard, On Thu, Jul 06, 2017 at 08:55:01PM +0100, Howard Chu wrote: > leitao@debian.org wrote: > > Full_Name: Breno Leitao > > Version: upstream > > OS: Debian > > URL: ftp://ftp.openldap.org/incoming/ > > Submission from: (NULL) (32.104.18.202) > > > > > > Currently, do_random() function in tests/progs/slapd-mtread.c uses a random > > number (upto RAND_MAX) to access an array that is much smaller than RAND_MAX, > > causing a segfault. > > > > This causes a segmentation fault and more details could be found at > > https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=866122 > > > > > Thanks for the report. I've examined your proposed patch in your debian > bugtracker. It doesn't make much sense though. > > The random number is being correctly scaled, line 682: > > int r = ((double)nvalues)*rand()/(RAND_MAX + 1.0); > > Which means the value of r can only be from 0 to nvalues-1. > > And there should be no difference between nvalues and i, since on line 657: > > nvalues = ldap_count_entries( ld, res ); > > Since i is simply iterated through all of the entries in the response, the > two values cannot disagree. Thanks for looking at it, and your suggestion made me revisit it. let me share I am finding in the debug. This is the failure frame: #5 do_read (ld=0x3fff980008e0, entry=0x6e6d756c413d756f <error: Cannot access memory at address 0x6e6d756c413d756f>, attrs=0x20020058 <srchattrs>, noattrs=<optimized out>, maxloop=<optimized out>, maxretries=<optimized out>, force=<optimized out>, idx=<optimized out>, chaserefs=<optimized out>, delay=<optimized out>, nobind=<optimized out>) at ../../../../tests/progs/slapd-mtread.c:791 On this frame, these are the values we have: i = 0 do_retry = 0 rc = <optimized out> thrstr = "Read(1): entry=\"0 cnt: 1 (retried 0) (dc=example,dc=com)\000u=Alumni Association,ou=People,dc=example,dc=com)\000mple,dc=com)", '\000' <repeats 6641 times>... e = <optimized out> attrs = {0x200072a0 "1.1", 0x0} rc = 0 nvalues = <optimized out> res = 0x3fffa0001ac0 So, that is what I suppose is happening. On the following loop, ldap_first_entry() is returning NULL, thus, i = 0; for ( i = 0, e = ldap_first_entxury( ld, res ); e != NULL; i++, e = ldap_next_entry( ld, e ) ) { values[ i ] = ldap_get_dn( ld, e ); } values[ i ] = NULL; Thus, value[0] = NULL; Later, the do_random() code does the following loop and innerloop is 10000. for ( i = 0; i < innerloop; i++ ) { int r = ((double)nvalues)*rand()/(RAND_MAX + 1.0); do_read( ld, values[ r ], srchattrs, noattrs, nobind, 1, maxretries, delay, force, chaserefs, idx ); } Thus, independent of the r value, values[r] will have garbage, right? But I agree with you, I need to find a better patch that address this e = NULL corner case.
Hi Breno, Thanks a lot for taking the time to look at this. I reproduced the crash on a minicloud VM (thanks!) with gcc -O2 (but not -O1 or -O0) and also with clang -O2 and -O1 (but not -O0). On Fri, Jul 07, 2017 at 12:57:47AM +0000, leitao@debian.org wrote: >So, that is what I suppose is happening. On the following loop, >ldap_first_entry() is returning NULL, thus, i = 0; > > for ( i = 0, e = ldap_first_entxury( ld, res ); e != NULL; i++, e = ldap_next_entry( ld, e ) ) > { > values[ i ] = ldap_get_dn( ld, e ); > } > values[ i ] = NULL; > >Thus, value[0] = NULL; I have not been able to reproduce that. I added an assert(i == nvalues) here, but it never failed in the time I spent on this today. On IRC, Howard pointed at the computation of 'r' (line 682), and indeed some printf debugging suggests that may be going wrong. I occasionally saw values of 'r' greater than 'nvalues', leading to reads past the end of the array like you suggested. for example: nvalues = 19 [...] i=20 r=11 values[r]=0x10014fba0f0 i=48 r=6 values[r]=0x3fff40000bd0 i=49 r=3 values[r]=0x3fff40000b50 i=51 r=7 values[r]=0x3fff6c001310 i=46 r=11 values[r]=0x3fff0c003530 i=56 r=27 values[r]=0x25 Segmentation fault (core dumped) Unpacking the computation, it looks like the multiplication is the part that sometimes returns the wrong result. For example: nvalues = 19 [...] rand() -> 745824322 ((double)19)*745824322 -> 45495283642.000000 45495283642.000000 / 2147483648.000000 -> 21.185392 i=75 r=21 values[r]=0x35 Segmentation fault (core dumped) but 19.0*745824322 should be 14170662118. I'm not really sure where to look next. Compiler bug would be a nice explanation, but wouldn't it be unusual for gcc and clang to have the same bug?
Breno Leitao wrote: > Hello Howard, > > On Thu, Jul 06, 2017 at 08:55:01PM +0100, Howard Chu wrote: >> leitao@debian.org wrote: >>> Full_Name: Breno Leitao >>> Version: upstream >>> OS: Debian >>> URL: ftp://ftp.openldap.org/incoming/ >>> Submission from: (NULL) (32.104.18.202) >>> >>> >>> Currently, do_random() function in tests/progs/slapd-mtread.c uses a random >>> number (upto RAND_MAX) to access an array that is much smaller than RAND_MAX, >>> causing a segfault. >>> >>> This causes a segmentation fault and more details could be found at >>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=866122 >>> >>> >> Thanks for the report. I've examined your proposed patch in your debian >> bugtracker. It doesn't make much sense though. >> >> The random number is being correctly scaled, line 682: >> >> int r = ((double)nvalues)*rand()/(RAND_MAX + 1.0); >> >> Which means the value of r can only be from 0 to nvalues-1. >> >> And there should be no difference between nvalues and i, since on line 657: >> >> nvalues = ldap_count_entries( ld, res ); >> >> Since i is simply iterated through all of the entries in the response, the >> two values cannot disagree. > > Thanks for looking at it, and your suggestion made me revisit it. let me > share I am finding in the debug. This is the failure frame: You cannot possibly diagnose this with an optimized binary. You're missing several crucial variables. Your analysis here is completely invalid. > > #5 do_read (ld=0x3fff980008e0, entry=0x6e6d756c413d756f <error: Cannot access memory at address 0x6e6d756c413d756f>, > attrs=0x20020058 <srchattrs>, noattrs=<optimized out>, maxloop=<optimized out>, maxretries=<optimized out>, force=<optimized out>, > idx=<optimized out>, chaserefs=<optimized out>, delay=<optimized out>, nobind=<optimized out>) at ../../../../tests/progs/slapd-mtread.c:791 > > On this frame, these are the values we have: > > i = 0 > do_retry = 0 > rc = <optimized out> > thrstr = "Read(1): entry=\"0 cnt: 1 (retried 0) (dc=example,dc=com)\000u=Alumni Association,ou=People,dc=example,dc=com)\000mple,dc=com)", '\000' <repeats 6641 times>... > e = <optimized out> > attrs = {0x200072a0 "1.1", 0x0} > rc = 0 > nvalues = <optimized out> > res = 0x3fffa0001ac0 -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
On Fri, Jul 07, 2017 at 06:20:55AM +0000, ryan@nardis.ca wrote: >Unpacking the computation, it looks like the multiplication is the part >that sometimes returns the wrong result. Not the multiplication, but rather the cast of nvalues to double. I'm going to take further followups to the Debian tracker, as it seems fairly clear to me that this is not an OpenLDAP bug.
kernel bug on powerpc64
changed notes changed state Open to Closed