2006-05-17 23:04:30

by Tim Mann

[permalink] [raw]
Subject: Fix time going backward with clock=pit [1/2]

This is patch 1 of a 2-patch series. The diff is against the
2.6.17-rc3-git17 snapshot.

* * *

Currently, if you boot with clock=pit on the kernel command line and
run a program that loops calling gettimeofday, on many machines you'll
observe that time frequently goes backward by about one jiffy. This
patch fixes that symptom and also some other related bugs.

Bugs and fixes:

1) get_offset_pit was assuming that jiffies could not change because
the read side of xtime_lock is held by its caller, do_gettimeofday.
But xtime_lock is a seqlock now, so jiffies can change; the seqlock
only ensures that get_offset_pit will be retried if jiffies
changes. This matters because get_offset_pit has side effects on
internal state that are not undone in the retry case; in
particular, it remembers the last values of jiffies and count. If
events occur in this order:

* get_offset_pit latches the PIT count
* the PIT counter overflows
* the resulting interrupt is handled and jiffies is incremented
* get_offset_pit reads jiffies

...then the resulting (jiffies, count) pair is about 1 jiffy ahead
of real time. Although do_gettimeofday's seqlock loop will discard
this bogus value and call get_offset_pit again, that doesn't help,
because get_offset_pit saved the bogus value in (jiffies_p,
count_p).

I fixed this by reading jiffies before latching the count, so we
only have to deal with the case where jiffies is older than the
count, never the case where it's newer. (We always have to handle
the case of jiffies being older, because events can happen in this
order:)

* the PIT counter overflows
* get_offset_pit latches the count and reads jiffies (either order)
* the PIT interrupt is handled and jiffies is incremented

2) do_timer_overflow was trying to detect the case where the counter
has wrapped since jiffies was incremented by checking whether the
timer interrupt is still pending in the PIC. This is bogus for a
couple of reasons: (a) There is a window between the point where
the PIC interrupt is acknowledged and jiffies is incremented.
(b) Most systems use the IOAPIC for interrupt routing now. The
kernel has code that tries to also route the timer interrupt to the
PIC and acknowledge it there, but this does not appear to work; in
my testing on a couple of IOAPIC systems, do_timer_overflow always
thought a timer interrupt was pending. Also, this code has the
same window as in (a).

I fixed this by not calling do_timer_overflow, instead going with
the simple solution of preventing count from going in the wrong
direction within a jiffy.

arch/i386/kernel/timers/timer_pit.c | 57 +++++++++++++++++++-----------------
1 files changed, 31 insertions(+), 26 deletions(-)

Index: linux-2.6.17-rc3-git17/arch/i386/kernel/timers/timer_pit.c
===================================================================
--- linux-2.6.17-rc3-git17.orig/arch/i386/kernel/timers/timer_pit.c
+++ linux-2.6.17-rc3-git17/arch/i386/kernel/timers/timer_pit.c
@@ -93,24 +93,25 @@ static unsigned long get_offset_pit(void
int count;
unsigned long flags;
static unsigned long jiffies_p = 0;
-
- /*
- * cache volatile jiffies temporarily; we have xtime_lock.
- */
unsigned long jiffies_t;

spin_lock_irqsave(&i8253_lock, flags);
- /* timer count may underflow right here */
- outb_p(0x00, PIT_MODE); /* latch the count ASAP */
-
- count = inb_p(PIT_CH0); /* read the latched count */
-
/*
- * We do this guaranteed double memory access instead of a _p
- * postfix in the previous port access. Wheee, hackady hack
+ * Although our caller has the read side of xtime_lock, this
+ * is now a seqlock, and we are cheating in this routine by
+ * having side effects on state that we cannot undo if
+ * there is a collision on the seqlock and our caller has to
+ * retry. (Namely, jiffies_p and count_p.) So we must treat
+ * jiffies as volatile despite the lock. We read jiffies
+ * before latching the timer count to guarantee that although
+ * the jiffies value might be older than the count (that is,
+ * the counter may underflow between the last point where
+ * jiffies was incremented and the point where we latch the
+ * count), it cannot be newer.
*/
- jiffies_t = jiffies;
-
+ jiffies_t = jiffies;
+ outb_p(0x00, PIT_MODE); /* latch the count */
+ count = inb_p(PIT_CH0); /* read the latched count */
count |= inb_p(PIT_CH0) << 8;

/* VIA686a test code... reset the latch if count > max + 1 */
@@ -122,18 +123,22 @@ static unsigned long get_offset_pit(void
}

/*
- * avoiding timer inconsistencies (they are rare, but they happen)...
- * there are two kinds of problems that must be avoided here:
- * 1. the timer counter underflows
- * 2. hardware problem with the timer, not giving us continuous time,
- * the counter does small "jumps" upwards on some Pentium systems,
- * (see c't 95/10 page 335 for Neptun bug.)
+ * There are two kinds of problems that may occur here:
+ * 1. The timer counter underflowed between the point
+ * where jiffies was last incremented and the point
+ * where we latched the counter above.
+ * 2. Hardware problem with the timer, not giving us
+ * continuous time. The counter does small "jumps" upwards
+ * on some Pentium systems (see c't 95/10 page 335 for
+ * Neptun bug).
+ * Past attempts to distinguish these cases have been buggy, so we
+ * do the simple thing: just don't allow time to go backward.
*/

if( jiffies_t == jiffies_p ) {
if( count > count_p ) {
/* the nutcase */
- count = do_timer_overflow(count);
+ count = count_p;
}
} else
jiffies_p = jiffies_t;


--
Tim Mann work: [email protected] home: [email protected]
http://www.vmware.com http://tim-mann.org


2006-05-17 23:06:32

by Tim Mann

[permalink] [raw]
Subject: Fix time going backward with clock=pit [2/2]

This is patch 2 of a 2-patch series. The diff is against the
2.6.17-rc3-git17 snapshot. It nukes some code that is dead after the
first patch.

* * *

do_timer_overflow is no longer used now; nuke it.

(By the way, the comment "It means that a timer tick interrupt came
along while the previous one was pending, thus a tick was missed" in
do_timer_overflow was wrong. The code was meant to handle the easier
and more common case where the PIT counter wrapped just before we read
it and the tick interrupt that that caused was still pending.)

Note: There is some other code in the kernel that was meant to support
what do_timer_overflow was doing here. See the "Subtle..." comments
in do_timer_interrupt and check_timer. Unfortunately I don't know how
much of that code (if any) can be cleaned out now, as I don't
understand the stuff about "NMI lines for the watchdog if run on an
82489DX-based system" or all the details of check_timer.

include/asm-i386/mach-default/do_timer.h | 52 -------------------------------
include/asm-i386/mach-visws/do_timer.h | 26 ---------------
include/asm-i386/mach-voyager/do_timer.h | 13 -------
3 files changed, 91 deletions(-)

Index: linux-2.6.17-rc3-git17/include/asm-i386/mach-default/do_timer.h
===================================================================
--- linux-2.6.17-rc3-git17.orig/include/asm-i386/mach-default/do_timer.h
+++ linux-2.6.17-rc3-git17/include/asm-i386/mach-default/do_timer.h
@@ -32,55 +32,3 @@ static inline void do_timer_interrupt_ho
smp_local_timer_interrupt(regs);
#endif
}
-
-
-/* you can safely undefine this if you don't have the Neptune chipset */
-
-#define BUGGY_NEPTUN_TIMER
-
-/**
- * do_timer_overflow - process a detected timer overflow condition
- * @count: hardware timer interrupt count on overflow
- *
- * Description:
- * This call is invoked when the jiffies count has not incremented but
- * the hardware timer interrupt has. It means that a timer tick interrupt
- * came along while the previous one was pending, thus a tick was missed
- **/
-static inline int do_timer_overflow(int count)
-{
- int i;
-
- spin_lock(&i8259A_lock);
- /*
- * This is tricky when I/O APICs are used;
- * see do_timer_interrupt().
- */
- i = inb(0x20);
- spin_unlock(&i8259A_lock);
-
- /* assumption about timer being IRQ0 */
- if (i & 0x01) {
- /*
- * We cannot detect lost timer interrupts ...
- * well, that's why we call them lost, don't we? :)
- * [hmm, on the Pentium and Alpha we can ... sort of]
- */
- count -= LATCH;
- } else {
-#ifdef BUGGY_NEPTUN_TIMER
- /*
- * for the Neptun bug we know that the 'latch'
- * command doesn't latch the high and low value
- * of the counter atomically. Thus we have to
- * substract 256 from the counter
- * ... funny, isnt it? :)
- */
-
- count -= 256;
-#else
- printk("do_slow_gettimeoffset(): hardware timer problem?\n");
-#endif
- }
- return count;
-}
Index: linux-2.6.17-rc3-git17/include/asm-i386/mach-visws/do_timer.h
===================================================================
--- linux-2.6.17-rc3-git17.orig/include/asm-i386/mach-visws/do_timer.h
+++ linux-2.6.17-rc3-git17/include/asm-i386/mach-visws/do_timer.h
@@ -25,29 +25,3 @@ static inline void do_timer_interrupt_ho
smp_local_timer_interrupt(regs);
#endif
}
-
-static inline int do_timer_overflow(int count)
-{
- int i;
-
- spin_lock(&i8259A_lock);
- /*
- * This is tricky when I/O APICs are used;
- * see do_timer_interrupt().
- */
- i = inb(0x20);
- spin_unlock(&i8259A_lock);
-
- /* assumption about timer being IRQ0 */
- if (i & 0x01) {
- /*
- * We cannot detect lost timer interrupts ...
- * well, that's why we call them lost, don't we? :)
- * [hmm, on the Pentium and Alpha we can ... sort of]
- */
- count -= LATCH;
- } else {
- printk("do_slow_gettimeoffset(): hardware timer problem?\n");
- }
- return count;
-}
Index: linux-2.6.17-rc3-git17/include/asm-i386/mach-voyager/do_timer.h
===================================================================
--- linux-2.6.17-rc3-git17.orig/include/asm-i386/mach-voyager/do_timer.h
+++ linux-2.6.17-rc3-git17/include/asm-i386/mach-voyager/do_timer.h
@@ -10,16 +10,3 @@ static inline void do_timer_interrupt_ho

voyager_timer_interrupt(regs);
}
-
-static inline int do_timer_overflow(int count)
-{
- /* can't read the ISR, just assume 1 tick
- overflow */
- if(count > LATCH || count < 0) {
- printk(KERN_ERR "VOYAGER PROBLEM: count is %d, latch is %d\n", count, LATCH);
- count = LATCH;
- }
- count -= LATCH;
-
- return count;
-}
Index: linux-2.6.17-rc3-git17/arch/i386/kernel/time.c
===================================================================
--- linux-2.6.17-rc3-git17.orig/arch/i386/kernel/time.c
+++ linux-2.6.17-rc3-git17/arch/i386/kernel/time.c
@@ -254,6 +254,12 @@ static inline void do_timer_interrupt(in
* manually to reset the IRR bit for do_slow_gettimeoffset().
* This will also deassert NMI lines for the watchdog if run
* on an 82489DX-based system.
+ *
+ * XXX The code that formerly looked at the IRR bit in
+ * do_slow_gettimeoffset no longer exists. Can we
+ * remove the "timer_ack" code here and the
+ * corresponding setup in check_timer, or do NMI
+ * watchdogs still need this?
*/
spin_lock(&i8259A_lock);
outb(0x0c, PIC_MASTER_OCW3);


--
Tim Mann work: [email protected] home: [email protected]
http://www.vmware.com http://tim-mann.org

2006-05-18 08:11:20

by Andrew Morton

[permalink] [raw]
Subject: Re: Fix time going backward with clock=pit [1/2]

Tim Mann <[email protected]> wrote:
>
> Currently, if you boot with clock=pit on the kernel command line and
> run a program that loops calling gettimeofday, on many machines you'll
> observe that time frequently goes backward by about one jiffy. This
> patch fixes that symptom and also some other related bugs.

It might be a bit late to get this into 2.6.17. Although it does look
pretty safe and simple.

Are many people hitting this problem?

And for 2.6.18 we're hoping to get John's x86 timer rework merged up.
John, do those patches address this bug?

So if we decide these two patches are not-for-2.6.17 then I'll sit on them
until we decide whether or not to merge John's patches. If we do, and if
those patches fix this problem then your two patches aren't needed. If
John's patches don't get merged then I'll need to merge these two.

Hope that all makes sense ;)

2006-05-18 11:51:52

by Roman Zippel

[permalink] [raw]
Subject: Re: Fix time going backward with clock=pit [1/2]

Hi,

On Wed, 17 May 2006, Tim Mann wrote:

> 2) do_timer_overflow was trying to detect the case where the counter
> has wrapped since jiffies was incremented by checking whether the
> timer interrupt is still pending in the PIC. This is bogus for a
> couple of reasons: (a) There is a window between the point where
> the PIC interrupt is acknowledged and jiffies is incremented.
> (b) Most systems use the IOAPIC for interrupt routing now. The
> kernel has code that tries to also route the timer interrupt to the
> PIC and acknowledge it there, but this does not appear to work; in
> my testing on a couple of IOAPIC systems, do_timer_overflow always
> thought a timer interrupt was pending. Also, this code has the
> same window as in (a).

Do you (or anyone else) have more information about this? If it's not
possible to detect the underflow, it would mean that PIT isn't usable
as clock in these cases, as the clock would still jump around (just not
visibly backwards).

> if( jiffies_t == jiffies_p ) {
> if( count > count_p ) {
> /* the nutcase */
> - count = do_timer_overflow(count);
> + count = count_p;
> }
> } else
> jiffies_p = jiffies_t;

IMO the else part is not correct. I think the whole think should look
something like this:

if (jiffies_t == jiffies_p) {
if (count > count_p) {
underflow or crappy timer;
}
} else {
jiffies_p = jiffies_t;
if (count > LATCH/2 && underflow)
count -= LATCH;
}

This would also basically solve the ordering problem, retrying the
function would produce the correct result.

So we basically have two issues:
1. Fix the timekeeping to produce correct results.
2. Broken underflow handling.

If the second isn't fixable, it would make the whole thing practically
unusable. I hope that it's at least detectable, so we don't use it as a
clock, which would be IMO preferable instead of completely removing the
underflow check (which would make the clock unreliable for everyone).

BTW another bug here is the wrong initialization of jiffies_p (it should
be INITIAL_JIFFIES).

bye, Roman

2006-05-18 18:50:25

by Tim Mann

[permalink] [raw]
Subject: Re: Fix time going backward with clock=pit [1/2]

On Thu, 18 May 2006 13:51:36 +0200 (CEST), Roman Zippel <[email protected]> wrote:
> Hi,
>
> On Wed, 17 May 2006, Tim Mann wrote:
>
> > 2) do_timer_overflow was trying to detect the case where the counter
> > has wrapped since jiffies was incremented by checking whether the
> > timer interrupt is still pending in the PIC. This is bogus for a
> > couple of reasons: (a) There is a window between the point where
> > the PIC interrupt is acknowledged and jiffies is incremented.
> > (b) Most systems use the IOAPIC for interrupt routing now. The
> > kernel has code that tries to also route the timer interrupt to the
> > PIC and acknowledge it there, but this does not appear to work; in
> > my testing on a couple of IOAPIC systems, do_timer_overflow always
> > thought a timer interrupt was pending. Also, this code has the
> > same window as in (a).
>
> Do you (or anyone else) have more information about this? If it's not
> possible to detect the underflow,

I don't know of a reliable way to detect when we're in the state that
the PIT has wrapped but the resulting interrupt hasn't yet been handled
far enough to cause jiffies to be incremented. I doubt there is one.
Here are two ideas that work in other roughly-similar cases but not here:

1) Generally, on counters that don't interrupt when they wrap, you can
watch for the high-order bit to toggle and detect the wrap that way, and
that method doesn't require handling an interrupt atomically with
reading the counter. However, this is only reliable if you can be
certain to read the counter often enough to detect all the wraps, which
isn't the case with the PIT counter -- it wraps way too often.

2) There are algorithms for reading multi-word hardware counters that
increment atomically but can't be read atomically; for example:

high1 = <read high word of counter>;
do {
high2 = high1;
low = <read low word of counter>;
high1 = high2;
} while (high1 != high2);

... but that doesn't work either, because there still could be a pending
PIT timer interrupt that was routed to another CPU and not handled yet.
The unpredictable delay between the PIT wrapping and jiffies getting
incremented is a killer.

> it would mean that PIT isn't usable
> as clock in these cases, as the clock would still jump around (just not
> visibly backwards).

Yeah, the PIT counter is still not a very good clock source. It's
better on average than just returning the jiffy count, but in the worst
case you can still get a time that's old by up to almost a jiffy (using
the code I sent, and I don't see how to do better).

> > if( jiffies_t == jiffies_p ) {
> > if( count > count_p ) {
> > /* the nutcase */
> > - count = do_timer_overflow(count);
> > + count = count_p;
> > }
> > } else
> > jiffies_p = jiffies_t;
>
> IMO the else part is not correct.

The else part is correct if we eliminate the possibility that jiffies
gets incremented between reading count and reading jiffies (by reading
jiffies first). It looks like you're trying to solve that problem
without moving the read of jiffies above count, but your code is too
sketchy. Also, I don't really see why you want to do that.

> I think the whole think should look
> something like this:
>
> if (jiffies_t == jiffies_p) {
> if (count > count_p) {
> underflow or crappy timer;

What should the code do in this case?

> }
> } else {
> jiffies_p = jiffies_t;
> if (count > LATCH/2 && underflow)
> count -= LATCH;

I think I see what you're aiming at here: in the case where we read
count, then the counter wraps, then we read jiffies, you want to detect
that and fix it. Actually if you could detect that, the right way to
fix it would be to set count = LATCH, since the old count is, well, old:
the current time is right after the jiffy.

However, we don't really have a way to detect that this case happened --
the "&& underflow" in your code is a handwave.

> }
>
> This would also basically solve the ordering problem, retrying the
> function would produce the correct result.
>
> So we basically have two issues:
> 1. Fix the timekeeping to produce correct results.
> 2. Broken underflow handling.
>
> If the second isn't fixable, it would make the whole thing practically
> unusable. I hope that it's at least detectable, so we don't use it as a
> clock, which would be IMO preferable instead of completely removing the
> underflow check (which would make the clock unreliable for everyone).
>
> BTW another bug here is the wrong initialization of jiffies_p (it should
> be INITIAL_JIFFIES).
>
> bye, Roman


--
Tim Mann work: [email protected] home: [email protected]
http://www.vmware.com http://tim-mann.org

2006-05-19 00:09:42

by Roman Zippel

[permalink] [raw]
Subject: Re: Fix time going backward with clock=pit [1/2]

Hi,

On Thu, 18 May 2006, Tim Mann wrote:

> > I think the whole think should look
> > something like this:
> >
> > if (jiffies_t == jiffies_p) {
> > if (count > count_p) {
> > underflow or crappy timer;
>
> What should the code do in this case?

Basically the current do_timer_overflow().

> > }
> > } else {
> > jiffies_p = jiffies_t;
> > if (count > LATCH/2 && underflow)
> > count -= LATCH;
>
> I think I see what you're aiming at here: in the case where we read
> count, then the counter wraps, then we read jiffies, you want to detect
> that and fix it. Actually if you could detect that, the right way to
> fix it would be to set count = LATCH, since the old count is, well, old:
> the current time is right after the jiffy.

It's really "-= LATCH". :)

> However, we don't really have a way to detect that this case happened --
> the "&& underflow" in your code is a handwave.

Ok, I'm not that familiar with Intel hardware (it must be crappier than I
thought). Is there really no way to detect the pending interrupt (e.g.
what do_timer_overflow() does)? Without that information one can really
only guess the time.
It's not that important if it's not completely correct for SMP systems,
they usually have other sources, but for the few systems there this is the
only time source, we should at least make an effort to avoid the read
error.

bye, Roman

2006-05-19 01:54:48

by Tim Mann

[permalink] [raw]
Subject: Re: Fix time going backward with clock=pit [1/2]

On Thu, 18 May 2006 01:11:16 -0700, Andrew Morton <[email protected]> wrote:
> Tim Mann <[email protected]> wrote:
> >
> > Currently, if you boot with clock=pit on the kernel command line and
> > run a program that loops calling gettimeofday, on many machines you'll
> > observe that time frequently goes backward by about one jiffy. This
> > patch fixes that symptom and also some other related bugs.
>
> It might be a bit late to get this into 2.6.17. Although it does look
> pretty safe and simple.
>
> Are many people hitting this problem?

No, since pit is a lower-priority clock source than tsc, pmtmr, or hpet,
and most systems can use one of the latter. However, we currently
recommend clock=pit to VMware users because clock=tsc and clock=pmtmr
have bugs in their lost-tick compensation that makes them actually gain
extra time, and those bugs get tickled often in a VM, making the time
gain quite bad. (See http://bugzilla.kernel.org/show_bug.cgi?id=5127.)
So it would be nice for us if clock=pit didn't go backward, but most
people wouldn't care.

> And for 2.6.18 we're hoping to get John's x86 timer rework merged up.

That would be great. I'm much more excited about seeing John's rework
go in than this little patch of mine. I only sent out this patch
because John said he wasn't sure how soon his rework would get in.

> John, do those patches address this bug?

I can answer that. Yes -- John's patch had a similar bug, but I sent
him a similar fix for it and he has merged it in his version C2.
(However, the issues that Roman and I are discussing apply there too, so
more work may still be needed to make pit a good clocksource, and maybe
it can be good only on UP systems...)

> So if we decide these two patches are not-for-2.6.17 then I'll sit on them
> until we decide whether or not to merge John's patches. If we do, and if
> those patches fix this problem then your two patches aren't needed. If
> John's patches don't get merged then I'll need to merge these two.
>
> Hope that all makes sense ;)

It makes perfect sense to me and sounds like the right thing to do. Thanks!


--
Tim Mann work: [email protected] home: [email protected]
http://www.vmware.com http://tim-mann.org

2006-05-19 02:02:21

by Tim Mann

[permalink] [raw]
Subject: Re: Fix time going backward with clock=pit [1/2]

On Fri, 19 May 2006 02:09:29 +0200 (CEST), Roman Zippel <[email protected]> wrote:
> Hi,
>
> On Thu, 18 May 2006, Tim Mann wrote:
>
> > > I think the whole think should look
> > > something like this:
> > >
> > > if (jiffies_t == jiffies_p) {
> > > if (count > count_p) {
> > > underflow or crappy timer;
> >
> > What should the code do in this case?
>
> Basically the current do_timer_overflow().

Oh, I see, you were also assuming there's a way to fix that.

> > > }
> > > } else {
> > > jiffies_p = jiffies_t;
> > > if (count > LATCH/2 && underflow)
> > > count -= LATCH;
> >
> > I think I see what you're aiming at here: in the case where we read
> > count, then the counter wraps, then we read jiffies, you want to detect
> > that and fix it. Actually if you could detect that, the right way to
> > fix it would be to set count = LATCH, since the old count is, well, old:
> > the current time is right after the jiffy.
>
> It's really "-= LATCH". :)

Yeah. :-)

Tangentially, let me point out another thing: letting get_offset_pit
return more than a jiffy is dangerous if we can ever lose a tick (due to
interrupts being disabled for too long, etc.). If that happens,
get_offset_pit might advance time past the lost tick, but when the next
non-lost tick comes in, jiffies is incremented by only 1 and count
recycles again, so time effectively snaps backward to the point where
the lost tick occurred. Losing ticks is bad in itself, of course, but
having that make time actually go backward by about a jiffy (rather than
just stop for one jiffy) seems a bit worse.

> > However, we don't really have a way to detect that this case happened --
> > the "&& underflow" in your code is a handwave.
>
> Ok, I'm not that familiar with Intel hardware (it must be crappier than I
> thought). Is there really no way to detect the pending interrupt (e.g.
> what do_timer_overflow() does)? Without that information one can really
> only guess the time.
>
> It's not that important if it's not completely correct for SMP systems,
> they usually have other sources, but for the few systems there this is the
> only time source, we should at least make an effort to avoid the read
> error.

Hmm. If you don't care about SMP systems, that makes the problem
tractable. In that case get_offset_pit can assume that acknowledging
the interrupt and incrementing jiffies happen atomically (since that's
done at interrupt level), so checking whether there's an unacknowledged
interrupt is a sound approach. I'm definitely not expert enough to be
sure how/if you can do that correctly, though. The current code in
do_timer_overflow may be correct for systems using PIC interrupt
routing, but it doesn't seem to work in the APIC systems I've tried it
on, and I don't have a suggestion for how to fix that case. Maybe
someone else does...?

It also would be preferable to fix the SMP case so that at least time
doesn't go backward there, in case someone tries to use the pit
clocksource there. It's quite easy to hit the window where one CPU
calls gettimeofday while another one has ack'd a timer interrupt but
hasn't incremented jiffies yet. Or I suppose we could disable the pit
clocksource for SMP systems, but that seems a bit draconian.

--
Tim Mann work: [email protected] home: [email protected]
http://www.vmware.com http://tim-mann.org

2006-05-19 15:31:53

by Roman Zippel

[permalink] [raw]
Subject: Re: Fix time going backward with clock=pit [1/2]

Hi,

On Thu, 18 May 2006, Tim Mann wrote:

> Tangentially, let me point out another thing: letting get_offset_pit
> return more than a jiffy is dangerous if we can ever lose a tick (due to
> interrupts being disabled for too long, etc.). If that happens,
> get_offset_pit might advance time past the lost tick, but when the next
> non-lost tick comes in, jiffies is incremented by only 1 and count
> recycles again, so time effectively snaps backward to the point where
> the lost tick occurred. Losing ticks is bad in itself, of course, but
> having that make time actually go backward by about a jiffy (rather than
> just stop for one jiffy) seems a bit worse.

This can only happen if someone disables the interrupts for too long and
then calls get_offset_pit(). As soon as the timer interrupt had a chance
to run, time is advanced and everything should be fine.
IMO it's not worth it to support such misbehaviour.

> > > However, we don't really have a way to detect that this case happened --
> > > the "&& underflow" in your code is a handwave.
> >
> > Ok, I'm not that familiar with Intel hardware (it must be crappier than I
> > thought). Is there really no way to detect the pending interrupt (e.g.
> > what do_timer_overflow() does)? Without that information one can really
> > only guess the time.
> >
> > It's not that important if it's not completely correct for SMP systems,
> > they usually have other sources, but for the few systems there this is the
> > only time source, we should at least make an effort to avoid the read
> > error.
>
> Hmm. If you don't care about SMP systems, that makes the problem
> tractable. In that case get_offset_pit can assume that acknowledging
> the interrupt and incrementing jiffies happen atomically (since that's
> done at interrupt level), so checking whether there's an unacknowledged
> interrupt is a sound approach. I'm definitely not expert enough to be
> sure how/if you can do that correctly, though. The current code in
> do_timer_overflow may be correct for systems using PIC interrupt
> routing, but it doesn't seem to work in the APIC systems I've tried it
> on, and I don't have a suggestion for how to fix that case. Maybe
> someone else does...?
>
> It also would be preferable to fix the SMP case so that at least time
> doesn't go backward there, in case someone tries to use the pit
> clocksource there. It's quite easy to hit the window where one CPU
> calls gettimeofday while another one has ack'd a timer interrupt but
> hasn't incremented jiffies yet. Or I suppose we could disable the pit
> clocksource for SMP systems, but that seems a bit draconian.

We should at least add a warning that the clock is not usable for precise
timekeeping (in the resolution limits one would expect from it).

In the UP case we can live without the underflow information, if we assume
the function is called with interrupts enabled and it's properly restarted
in case of an underflow via the timer interrupt. If we do this, we also
have to document this somewhere that it relies on the current seq_lock
bevaviour.
I guess in the SMP IO-APIC case we can't do much more than print the
warning and make sure the time doesn't go backwards as you suggested.

If the underflow information is usable from the PIC, we could make proper
use of it as I suggested and this had the advantange it's safe to use with
interrupts disabled and doesn't require retrying. Although to make it safe
for SMP it would also require synchronisation with the interrupt
acknowledgement.

Anyway, the current underflow handling is next to useless, so I guess it's
better to remove it and just document the current limitations. If someone
cares enough, he can then do an alternative offset function properly using
the information from the PIC.

bye, Roman

2006-05-19 23:15:33

by Tim Mann

[permalink] [raw]
Subject: Re: Fix time going backward with clock=pit [1/2]

> > > It's not that important if it's not completely correct for SMP systems,
> > > they usually have other sources, but for the few systems there this is the
> > > only time source, we should at least make an effort to avoid the read
> > > error.
> >
> > Hmm. If you don't care about SMP systems, that makes the problem
> > tractable. In that case get_offset_pit can assume that acknowledging
> > the interrupt and incrementing jiffies happen atomically (since that's
> > done at interrupt level), so checking whether there's an unacknowledged
> > interrupt is a sound approach. I'm definitely not expert enough to be
> > sure how/if you can do that correctly, though. The current code in
> > do_timer_overflow may be correct for systems using PIC interrupt
> > routing, but it doesn't seem to work in the APIC systems I've tried it
> > on, and I don't have a suggestion for how to fix that case. Maybe
> > someone else does...?
> >
> > It also would be preferable to fix the SMP case so that at least time
> > doesn't go backward there, in case someone tries to use the pit
> > clocksource there. It's quite easy to hit the window where one CPU
> > calls gettimeofday while another one has ack'd a timer interrupt but
> > hasn't incremented jiffies yet. Or I suppose we could disable the pit
> > clocksource for SMP systems, but that seems a bit draconian.
>
> We should at least add a warning that the clock is not usable for precise
> timekeeping (in the resolution limits one would expect from it).
>
> In the UP case we can live without the underflow information, if we assume
> the function is called with interrupts enabled and it's properly restarted
> in case of an underflow via the timer interrupt. If we do this, we also
> have to document this somewhere that it relies on the current seq_lock
> bevaviour.

I think it's cleanest to put a loop into get_offset_pit itself,
something like this:

{
static unsigned long jiffies_p = INITIAL_JIFFIES;
unsigned long jiffies1, jiffies2;
static int count_p = LATCH;
int count;

/*
* It's difficult to get a jiffies value and count that are
* guaranteed to be coherent, because count can wrap at any
* time, generating an interrupt which may not be handled
* immediately.
*
* The algorithm below is correct in the uniprocessor case and
* if interrupts are enabled. It works because although we
* cannot read jiffies and count in one atomic operation, they
* are effectively updated atomically: when count wraps, that
* causes an interrupt that is handled by the one and only
* CPU, and the interrupt hander increments jiffies. Thus,
* code that runs with interrupts enabled cannot read the
* count and then subsequently read an *older* value of
* jiffies.
*
* In the SMP case, however, the interrupt may be routed to a
* different CPU and handled there, and that CPU may not yet
* have gotten around to incrementing jiffies before we next
* read it. So as a band-aid for the SMP case, we store the
* last (jiffies, count) value returned by this function and
* make sure that time never appears to go backward. However,
* time can still spuriously jump forward by up to almost 1
* jiffy (though usually less) and stick there until real time
* catches up.
*/
jiffies1 = jiffies;
do {
jiffies2 = jiffies1;
count = <read pit counter>;
jiffies1 = jiffies;
} while (jiffies1 != jiffies2);

if (jiffies1 == jiffies_p && count > count_p) {
/*
* Should happen only on SMP systems or due to Neptun
* bug. (Well, or if you call this routine with
* interrupts disabled.)
*/
count = count_p;
}

jiffies_p = jiffies1;
count_p = count;

<...>
}

I think this actually ends up producing the same results as what I
originally sent out, but it's commented better, it does its own retries
instead of relying on the essentially incidental fact that it's called
from inside a seqlock retry loop, and it's a little easier to reason
about because it stores a value in (jiffies_p, count_p) only if it
passes the jiffies1 == jiffies2 test.

I'll try this idea in real code and update my patch, and will also
update the corresponding fix for clocksource=pit in John's rework.

> I guess in the SMP IO-APIC case we can't do much more than print the
> warning and make sure the time doesn't go backwards as you suggested.
>
> If the underflow information is usable from the PIC, we could make proper
> use of it as I suggested and this had the advantange it's safe to use with
> interrupts disabled and doesn't require retrying.

Do you know if we need this code to work with interrupts disabled?
I can try to find out...

> Although to make it safe
> for SMP it would also require synchronisation with the interrupt
> acknowledgement.
>
> Anyway, the current underflow handling is next to useless, so I guess it's
> better to remove it and just document the current limitations. If someone
> cares enough, he can then do an alternative offset function properly using
> the information from the PIC.

--
Tim Mann work: [email protected] home: [email protected]
http://www.vmware.com http://tim-mann.org

2006-05-26 00:09:50

by john stultz

[permalink] [raw]
Subject: [-mm PATCH] time: fix time going backward w/ clock=pit

On Thu, 2006-05-18 at 01:11 -0700, Andrew Morton wrote:
> Tim Mann <[email protected]> wrote:
> >
> > Currently, if you boot with clock=pit on the kernel command line and
> > run a program that loops calling gettimeofday, on many machines you'll
> > observe that time frequently goes backward by about one jiffy. This
> > patch fixes that symptom and also some other related bugs.
>
> And for 2.6.18 we're hoping to get John's x86 timer rework merged up.
> John, do those patches address this bug?
>
> So if we decide these two patches are not-for-2.6.17 then I'll sit on them
> until we decide whether or not to merge John's patches. If we do, and if
> those patches fix this problem then your two patches aren't needed. If
> John's patches don't get merged then I'll need to merge these two.

Hey Andrew,
Sorry I've been so slow here, just starting to recover from a one week
+ flu. :P Here is the PIT fix against the TOD patches that Tim pointed
out. Many thanks to Tim for hunting this down.

thanks
-john

Signed-off-by: John Stultz <[email protected]>

Index: devmm/arch/i386/kernel/i8253.c
===================================================================
--- devmm.orig/arch/i386/kernel/i8253.c 2006-05-25 18:12:41.000000000 -0500
+++ devmm/arch/i386/kernel/i8253.c 2006-05-25 18:52:31.000000000 -0500
@@ -41,9 +41,25 @@
{
unsigned long flags;
int count;
- u64 jifs;
+ u32 jifs;
+ static int old_count;
+ static u32 old_jifs;

spin_lock_irqsave(&i8253_lock, flags);
+ /*
+ * Although our caller may have the read side of xtime_lock,
+ * this is now a seqlock, and we are cheating in this routine
+ * by having side effects on state that we cannot undo if
+ * there is a collision on the seqlock and our caller has to
+ * retry. (Namely, old_jifs and old_count.) So we must treat
+ * jiffies as volatile despite the lock. We read jiffies
+ * before latching the timer count to guarantee that although
+ * the jiffies value might be older than the count (that is,
+ * the counter may underflow between the last point where
+ * jiffies was incremented and the point where we latch the
+ * count), it cannot be newer.
+ */
+ jifs = jiffies;
outb_p(0x00, PIT_MODE); /* latch the count ASAP */
count = inb_p(PIT_CH0); /* read the latched count */
count |= inb_p(PIT_CH0) << 8;
@@ -55,12 +71,29 @@
outb(LATCH >> 8, PIT_CH0);
count = LATCH - 1;
}
- spin_unlock_irqrestore(&i8253_lock, flags);

- jifs = jiffies_64;
+ /*
+ * It's possible for count to appear to go the wrong way for a
+ * couple of reasons:
+ *
+ * 1. The timer counter underflows, but we haven't handled the
+ * resulting interrupt and incremented jiffies yet.
+ * 2. Hardware problem with the timer, not giving us continuous time,
+ * the counter does small "jumps" upwards on some Pentium systems,
+ * (see c't 95/10 page 335 for Neptun bug.)
+ *
+ * Previous attempts to handle these cases intelligently were
+ * buggy, so we just do the simple thing now.
+ */
+ if (count > old_count && jifs == old_jifs) {
+ count = old_count;
+ }
+ old_count = count;
+ old_jifs = jifs;
+
+ spin_unlock_irqrestore(&i8253_lock, flags);

- jifs -= INITIAL_JIFFIES;
- count = (LATCH-1) - count;
+ count = (LATCH - 1) - count;

return (cycle_t)(jifs * LATCH) + count;
}
@@ -69,7 +102,7 @@
.name = "pit",
.rating = 110,
.read = pit_read,
- .mask = CLOCKSOURCE_MASK(64),
+ .mask = CLOCKSOURCE_MASK(32),
.mult = 0,
.shift = 20,
};