Alex Shi, along with Yanmin Zhang have been noticing occasional time
inconsistencies recently. Through their great diagnosis, they found that
the xtime_nsec value used in update_wall_time was occasionally going
negative. After looking through the code for awhile, I realized we have
the possibility for an underflow when three conditions are met in
update_wall_time():
1) We have accumulated a second's worth of nanoseconds, so we
incremented xtime.tv_sec and appropriately decrement xtime_nsec. (This
doesn't cause xtime_nsec to go negative, but it can cause it to be
small).
2) The remaining offset value is large, but just slightly less then
cycle_interval.
3) clocksource_adjust() is speeding up the clock, causing a corrective
amount (compensating for the increase in the multiplier being multiplied
against the unaccumulated offset value) to be subtracted from
xtime_nsec.
This can cause xtime_nsec to underflow.
Unfortunately, since we notify the NTP subsystem via second_overflow()
whenever we accumulate a full second, and this effects the error
accumulation that has already occured, we cannot simply revert the
accumulated second from xtime nor move the second accumulation to after
the clocksource_adjust call without a change in behavior.
This leaves us with (at least) two options:
1) Simply return from clocksource_adjust() without making a change if we
notice the adjustment would cause xtime_nsec to go negative.
This would work, but I'm concerned that if a large adjustment was needed
(due to the error being large), it may be possible to get stuck with an
ever increasing error that becomes too large to correct (since it may
always force xtime_nsec negative). This may just be paranoia on my part.
2) Catch xtime_nsec if it is negative, then add back the amount its
negative to both xtime_nsec and the error.
This second method is consistent with how we've handled earlier rounding
issues, and also has the benefit that the error being added is always in
the oposite direction also always equal or smaller then the correction
being applied. So the risk of a corner case where things get out of
control is lessened.
This patch fixes bug 11970, as tested by Yanmin Zhang
http://bugzilla.kernel.org/show_bug.cgi?id=11970
Signed-off-by: John Stultz <[email protected]>
Index: linux-2.6/kernel/time/timekeeping.c
===================================================================
--- linux-2.6.orig/kernel/time/timekeeping.c 2008-12-01 18:24:58.000000000 -0800
+++ linux-2.6/kernel/time/timekeeping.c 2008-12-01 18:25:00.000000000 -0800
@@ -518,6 +518,26 @@ void update_wall_time(void)
/* correct the clock when NTP error is too big */
clocksource_adjust(offset);
+
+ /* Since in the loop above, we accumulate any amount of time
+ * in xtime_nsec over a second into xtime.tv_sec, its possible for
+ * xtime_nsec to be fairly small after the loop. Further, if we're
+ * slightly speeding the clocksource up in clocksource_adjust(),
+ * its possible the required corrective factor to xtime_nsec could
+ * cause it to underflow.
+ * Now, we cannot simply roll the accumulated second back, since
+ * the NTP subsystem has been notified via second_overflow. So
+ * instead we push xtime_nsec forward by the amount we underflowed,
+ * and add that amount into the error.
+ * We'll correct this error next time through this function, when
+ * xtime_nsec is not as small.
+ */
+ if (unlikely((s64)clock->xtime_nsec < 0)) {
+ s64 neg = -(s64)clock->xtime_nsec;
+ clock->xtime_nsec = 0;
+ clock->error += neg << (NTP_SCALE_SHIFT - clock->shift);
+ }
+
/* store full nanoseconds into xtime after rounding it up and
* add the remainder to the error difference.
*/
Hi,
On Mon, 1 Dec 2008, john stultz wrote:
> Alex Shi, along with Yanmin Zhang have been noticing occasional time
> inconsistencies recently. Through their great diagnosis, they found that
> the xtime_nsec value used in update_wall_time was occasionally going
> negative. After looking through the code for awhile, I realized we have
> the possibility for an underflow when three conditions are met in
> update_wall_time():
>
> 1) We have accumulated a second's worth of nanoseconds, so we
> incremented xtime.tv_sec and appropriately decrement xtime_nsec. (This
> doesn't cause xtime_nsec to go negative, but it can cause it to be
> small).
>
> 2) The remaining offset value is large, but just slightly less then
> cycle_interval.
>
> 3) clocksource_adjust() is speeding up the clock, causing a corrective
> amount (compensating for the increase in the multiplier being multiplied
> against the unaccumulated offset value) to be subtracted from
> xtime_nsec.
>
> This can cause xtime_nsec to underflow.
This doesn't explain the problem entirely, I considered a negative
xtime_nsec before, but xtime_nsec+offset should still be positive and
produce the correct result, at least I can't find anything in
getnstimeofday(). It should also be a very rare event, so it's really
puzzling that it's so easy to reproduce.
So there must be more to it than just a negative xtime_nsec, it triggers
the problem, but it's not the actual problem. One possible explanation is
this line:
clock->xtime_nsec -= (s64)xtime.tv_nsec << clock->shift;
The rounding further increases the problem as the error is adjusted into
the wrong direction and under the right conditions it seems to be possible
to go out of sync as the error increasingly gets worse. I'd like to see
some numbers to confirm this theory, in any case above line is incorrect
for negative numbers.
> + /* Since in the loop above, we accumulate any amount of time
> + * in xtime_nsec over a second into xtime.tv_sec, its possible for
> + * xtime_nsec to be fairly small after the loop. Further, if we're
> + * slightly speeding the clocksource up in clocksource_adjust(),
> + * its possible the required corrective factor to xtime_nsec could
> + * cause it to underflow.
> + * Now, we cannot simply roll the accumulated second back, since
> + * the NTP subsystem has been notified via second_overflow. So
> + * instead we push xtime_nsec forward by the amount we underflowed,
> + * and add that amount into the error.
> + * We'll correct this error next time through this function, when
> + * xtime_nsec is not as small.
> + */
> + if (unlikely((s64)clock->xtime_nsec < 0)) {
> + s64 neg = -(s64)clock->xtime_nsec;
> + clock->xtime_nsec = 0;
> + clock->error += neg << (NTP_SCALE_SHIFT - clock->shift);
> + }
I don't mind this solution, but to be precise it avoids the problem.
My favourite solution would involve improving the xtime handling, as it's
not really necessary to copy the nsec value back and forth between xtime
and xtime_nsec, but it requires going through all xtime users, especially
all settimeofday implementations, which also have to set xtime_nsec so
update_wall_time() doesn't has to read it in. Then it's possible to make
xtime_nsec signed and allow it to be negative.
So avoiding the negative nsec value is the better shorttime solution, but
I'd prefer you'd drop the second paragraph, instead of suggesting a broken
solution I'd rather see a bit info about how to fix this properly, i.e.
fixing the xtime handling, so it's safe to allow negative values.
bye, Roman
On Wed, 2008-12-03 at 03:35 +0100, Roman Zippel wrote:
> Hi,
>
> On Mon, 1 Dec 2008, john stultz wrote:
>
> > Alex Shi, along with Yanmin Zhang have been noticing occasional time
> > inconsistencies recently. Through their great diagnosis, they found that
> > the xtime_nsec value used in update_wall_time was occasionally going
> > negative. After looking through the code for awhile, I realized we have
> > the possibility for an underflow when three conditions are met in
> > update_wall_time():
> >
> > 1) We have accumulated a second's worth of nanoseconds, so we
> > incremented xtime.tv_sec and appropriately decrement xtime_nsec. (This
> > doesn't cause xtime_nsec to go negative, but it can cause it to be
> > small).
> >
> > 2) The remaining offset value is large, but just slightly less then
> > cycle_interval.
> >
> > 3) clocksource_adjust() is speeding up the clock, causing a corrective
> > amount (compensating for the increase in the multiplier being multiplied
> > against the unaccumulated offset value) to be subtracted from
> > xtime_nsec.
> >
> > This can cause xtime_nsec to underflow.
>
> This doesn't explain the problem entirely, I considered a negative
> xtime_nsec before, but xtime_nsec+offset should still be positive
xtime_nsec underflows after clocksource_adjust. Before clocksource_adjust,
xtime_nsec is a small positive.
When xtime_nsec underflows at the first time, xtime.tv_nsec becomes -1.
Later on when the second tick arrives, below statement in the while loop
clock->xtime_nsec += clock->xtime_interval;
will cause clock->xtime_nsec becomes positive again. So the second tick
appears a going-backward time.
> and
> produce the correct result, at least I can't find anything in
> getnstimeofday().
The testing uses vsyscall to get call gettimeofday. vsyscall_gtod_data.wall_time_nsec
is a u32 while timespec->tv_nsec is a signed long.
> It should also be a very rare event, so it's really
> puzzling that it's so easy to reproduce.
We just triggered it on 2 machines. Other machines are ok.
> So there must be more to it than just a negative xtime_nsec, it triggers
> the problem, but it's not the actual problem. One possible explanation is
> this line:
>
> clock->xtime_nsec -= (s64)xtime.tv_nsec << clock->shift;
>
> The rounding further increases the problem as the error is adjusted into
> the wrong direction and under the right conditions it seems to be possible
> to go out of sync as the error increasingly gets worse. I'd like to see
> some numbers to confirm this theory, in any case above line is incorrect
> for negative numbers.
>
> > + /* Since in the loop above, we accumulate any amount of time
> > + * in xtime_nsec over a second into xtime.tv_sec, its possible for
> > + * xtime_nsec to be fairly small after the loop. Further, if we're
> > + * slightly speeding the clocksource up in clocksource_adjust(),
> > + * its possible the required corrective factor to xtime_nsec could
> > + * cause it to underflow.
> > + * Now, we cannot simply roll the accumulated second back, since
> > + * the NTP subsystem has been notified via second_overflow. So
> > + * instead we push xtime_nsec forward by the amount we underflowed,
> > + * and add that amount into the error.
> > + * We'll correct this error next time through this function, when
> > + * xtime_nsec is not as small.
> > + */
> > + if (unlikely((s64)clock->xtime_nsec < 0)) {
> > + s64 neg = -(s64)clock->xtime_nsec;
> > + clock->xtime_nsec = 0;
> > + clock->error += neg << (NTP_SCALE_SHIFT - clock->shift);
> > + }
>
> I don't mind this solution, but to be precise it avoids the problem.
> My favourite solution would involve improving the xtime handling, as it's
> not really necessary to copy the nsec value back and forth between xtime
> and xtime_nsec, but it requires going through all xtime users, especially
> all settimeofday implementations, which also have to set xtime_nsec so
> update_wall_time() doesn't has to read it in. Then it's possible to make
> xtime_nsec signed and allow it to be negative.
> So avoiding the negative nsec value is the better shorttime solution, but
> I'd prefer you'd drop the second paragraph, instead of suggesting a broken
> solution I'd rather see a bit info about how to fix this properly, i.e.
> fixing the xtime handling, so it's safe to allow negative values.
>
> bye, Roman
Hi,
On Wed, 3 Dec 2008, Zhang, Yanmin wrote:
> > This doesn't explain the problem entirely, I considered a negative
> > xtime_nsec before, but xtime_nsec+offset should still be positive
> xtime_nsec underflows after clocksource_adjust. Before clocksource_adjust,
> xtime_nsec is a small positive.
>
> When xtime_nsec underflows at the first time, xtime.tv_nsec becomes -1.
> Later on when the second tick arrives, below statement in the while loop
> clock->xtime_nsec += clock->xtime_interval;
> will cause clock->xtime_nsec becomes positive again. So the second tick
> appears a going-backward time.
Yes, but only by 1nsec, so normally it wouldn't be noticable.
> > and
> > produce the correct result, at least I can't find anything in
> > getnstimeofday().
> The testing uses vsyscall to get call gettimeofday. vsyscall_gtod_data.wall_time_nsec
> is a u32 while timespec->tv_nsec is a signed long.
Ok, I was missing this part, I looked at the 32bit version of
getnstimeofday() and there xtime.tv_nsec was correctly sign extended.
To be safe for the future wall_time_nsec should also be a s32.
bye, Roman
On Wed, 2008-12-03 at 04:48 +0100, Roman Zippel wrote:
> Hi,
>
> On Wed, 3 Dec 2008, Zhang, Yanmin wrote:
>
> > > This doesn't explain the problem entirely, I considered a negative
> > > xtime_nsec before, but xtime_nsec+offset should still be positive
> > xtime_nsec underflows after clocksource_adjust. Before clocksource_adjust,
> > xtime_nsec is a small positive.
> >
> > When xtime_nsec underflows at the first time, xtime.tv_nsec becomes -1.
> > Later on when the second tick arrives, below statement in the while loop
> > clock->xtime_nsec += clock->xtime_interval;
> > will cause clock->xtime_nsec becomes positive again. So the second tick
> > appears a going-backward time.
>
> Yes, but only by 1nsec, so normally it wouldn't be noticable.
Not 1nsec. At the second tick, go back about 4294967296 nsec (2^32), about
4 seconds. That matches the output of testing process.
>
> > > and
> > > produce the correct result, at least I can't find anything in
> > > getnstimeofday().
> > The testing uses vsyscall to get call gettimeofday. vsyscall_gtod_data.wall_time_nsec
> > is a u32 while timespec->tv_nsec is a signed long.
>
> Ok, I was missing this part, I looked at the 32bit version of
> getnstimeofday() and there xtime.tv_nsec was correctly sign extended.
> To be safe for the future wall_time_nsec should also be a s32.
Hi,
On Wed, 3 Dec 2008, Zhang, Yanmin wrote:
> > > When xtime_nsec underflows at the first time, xtime.tv_nsec becomes -1.
> > > Later on when the second tick arrives, below statement in the while loop
> > > clock->xtime_nsec += clock->xtime_interval;
> > > will cause clock->xtime_nsec becomes positive again. So the second tick
> > > appears a going-backward time.
> >
> > Yes, but only by 1nsec, so normally it wouldn't be noticable.
> Not 1nsec. At the second tick, go back about 4294967296 nsec (2^32), about
> 4 seconds. That matches the output of testing process.
Above code is not the problem. The error appeared in do_vgettimeofday() as
it only uses unsigned values, so the -1 became 0xffffffff in nsec and the
clock jumped forward 4 seconds and a tick later 4 seconds back.
bye, Roman
On Wed, 2008-12-03 at 03:35 +0100, Roman Zippel wrote:
> Hi,
>
> On Mon, 1 Dec 2008, john stultz wrote:
>
> > Alex Shi, along with Yanmin Zhang have been noticing occasional time
> > inconsistencies recently. Through their great diagnosis, they found that
> > the xtime_nsec value used in update_wall_time was occasionally going
> > negative. After looking through the code for awhile, I realized we have
> > the possibility for an underflow when three conditions are met in
> > update_wall_time():
> >
> > 1) We have accumulated a second's worth of nanoseconds, so we
> > incremented xtime.tv_sec and appropriately decrement xtime_nsec. (This
> > doesn't cause xtime_nsec to go negative, but it can cause it to be
> > small).
> >
> > 2) The remaining offset value is large, but just slightly less then
> > cycle_interval.
> >
> > 3) clocksource_adjust() is speeding up the clock, causing a corrective
> > amount (compensating for the increase in the multiplier being multiplied
> > against the unaccumulated offset value) to be subtracted from
> > xtime_nsec.
> >
> > This can cause xtime_nsec to underflow.
>
> This doesn't explain the problem entirely, I considered a negative
> xtime_nsec before, but xtime_nsec+offset should still be positive and
> produce the correct result, at least I can't find anything in
> getnstimeofday(). It should also be a very rare event, so it's really
> puzzling that it's so easy to reproduce.
> So there must be more to it than just a negative xtime_nsec, it triggers
> the problem, but it's not the actual problem. One possible explanation is
> this line:
>
> clock->xtime_nsec -= (s64)xtime.tv_nsec << clock->shift;
>
> The rounding further increases the problem as the error is adjusted into
> the wrong direction and under the right conditions it seems to be possible
> to go out of sync as the error increasingly gets worse. I'd like to see
> some numbers to confirm this theory, in any case above line is incorrect
> for negative numbers.
>
> > + /* Since in the loop above, we accumulate any amount of time
> > + * in xtime_nsec over a second into xtime.tv_sec, its possible for
> > + * xtime_nsec to be fairly small after the loop. Further, if we're
> > + * slightly speeding the clocksource up in clocksource_adjust(),
> > + * its possible the required corrective factor to xtime_nsec could
> > + * cause it to underflow.
> > + * Now, we cannot simply roll the accumulated second back, since
> > + * the NTP subsystem has been notified via second_overflow. So
> > + * instead we push xtime_nsec forward by the amount we underflowed,
> > + * and add that amount into the error.
> > + * We'll correct this error next time through this function, when
> > + * xtime_nsec is not as small.
> > + */
> > + if (unlikely((s64)clock->xtime_nsec < 0)) {
> > + s64 neg = -(s64)clock->xtime_nsec;
> > + clock->xtime_nsec = 0;
> > + clock->error += neg << (NTP_SCALE_SHIFT - clock->shift);
> > + }
>
> I don't mind this solution, but to be precise it avoids the problem.
> My favourite solution would involve improving the xtime handling, as it's
Yea, I agree.
> not really necessary to copy the nsec value back and forth between xtime
> and xtime_nsec, but it requires going through all xtime users, especially
> all settimeofday implementations, which also have to set xtime_nsec so
> update_wall_time() doesn't has to read it in. Then it's possible to make
> xtime_nsec signed and allow it to be negative.
> So avoiding the negative nsec value is the better shorttime solution, but
> I'd prefer you'd drop the second paragraph, instead of suggesting a broken
> solution I'd rather see a bit info about how to fix this properly, i.e.
> fixing the xtime handling, so it's safe to allow negative values.
So to make sure I understand, you're suggesting we push xtime_nsec to
effectively replace the xtime.tv_nsec value, right? That would effect
quite a few places. But mostly for the remaining non-GENERIC_TIME
arches, which I need to get around to converting at some point.
thanks
-john