[Date Prev][Date Next] [Chronological] [Thread] [Top]

MAJOR slurpd: ri.c: Using strcmp() on the Timestamp (ITS#1323)



Full_Name: James Tavares
Version: 1.2.12
OS: Solaris 7
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (24.218.235.205)


This weekend we witnessed the historic change from 999,999,999 seconds to
1,000,000,000 in the unix seconds-since-epoch counter.

At the same time, all replication on my systems ceased to exist. Running slurpd
in debug mode claimed that the records in my slurpd.replog were OLDER than what
my slurpd.status showed, which was easily verified as FALSE... My slurpd.status
hadn't been updated since 999,999,942 (verified to be TRUE against the file
dates on the slave servers) and records in my slurpd.replog were dated
1,000,000,000 and beyond.

I grep'd for the "skip repl record for" error message and found it in
slurpd/ri.c. As you all know, the function "ri.c:isnew()" is responsible for
determining if a REPLOG entry is newer than the last update recorded in
SLURPD.STATUS. However, the function uses strcmp() to compare the two
timestamps. Since strcmp() compares alphabetically, it makes sense that slurpd
was reporting 1,000,000,000 to be older than 999,999,942. A simple (and working)
patch follows:

24a25
> #include <math.h>
229d229
<     int       x;
234,235c234
<     x = strcmp( re->re_timestamp, ri->ri_stel->last );
<     if ( x > 0 ) {
---
>     if ( atof(re->re_timestamp) > atof(ri->ri_stel->last) ) {
238c237
<     } else if ( x < 0 ) {
---
>     } else if ( atof(re->re_timestamp) < atof(ri->ri_stel->last) ) {


I am not very familiar with openldap source code (this is the first time I've
looked through it) so to be sure I didn't miss anything I used atof() instead of
atoi().. I remember seeing a ".0" on a timestamp once in slurpd.replog, and
while I guess that could be the sequence number referenced later in the function
as opposed to some fractional-second, I didn't want to take any chances. Feel
free to correct it to atoi() if re->re_timestamp and ri->ri_stel->last are
guaranteed to be integers (whole seconds from time()).

Also, I have no idea where ELSE in slapd / slurpd source code this type of bug
may exist. I just happened to locate this one while trying to get my replication
working :) Which, by the way, this patch (at least) seems to have fixed.

Any questions? Just bounce me an email: jtavares@loa.com.