2006-10-11 19:54:26

by john stultz

[permalink] [raw]
Subject: [PATCH] i386 Time: Avoid PIT SMP lockups

Andrew: I think this is 2.6.19 material, but probably should go through an -mm or two.

thanks
-john


This patch avoids possible PIT livelock issues seen on SMP systems (and
reported by Andi), by not allowing it as a clocksource on SMP boxes.

However, since the PIT may no longer be present, we have to properly
handle the cases where SMP systems have TSC skew and fall back from the
TSC. Since the PIT isn't there, it would "fall back" to the TSC again.
So this changes the jiffies rating to 1, and the TSC-bad rating value to
0.

Thus you will get the following behavior priority on i386 systems:

tsc [if present & stable]
hpet [if present]
cyclone [if present]
acpi_pm [if present]
pit [if UP]
jiffies

Rather then the current more complicated:
tsc [if present & stable]
hpet [if present]
cyclone [if present]
acpi_pm [if present]
pit [if cpus < 4]
tsc [if present & unstable]
jiffies

Signed-off-by: John Stultz <[email protected]>

diff --git a/arch/i386/kernel/i8253.c b/arch/i386/kernel/i8253.c
index 477b24d..9a0060b 100644
--- a/arch/i386/kernel/i8253.c
+++ b/arch/i386/kernel/i8253.c
@@ -109,7 +109,7 @@ static struct clocksource clocksource_pi

static int __init init_pit_clocksource(void)
{
- if (num_possible_cpus() > 4) /* PIT does not scale! */
+ if (num_possible_cpus() > 1) /* PIT does not scale! */
return 0;

clocksource_pit.mult = clocksource_hz2mult(CLOCK_TICK_RATE, 20);
diff --git a/arch/i386/kernel/tsc.c b/arch/i386/kernel/tsc.c
index b8fa0a8..fbc9582 100644
--- a/arch/i386/kernel/tsc.c
+++ b/arch/i386/kernel/tsc.c
@@ -349,8 +349,8 @@ static int tsc_update_callback(void)
int change = 0;

/* check to see if we should switch to the safe clocksource: */
- if (clocksource_tsc.rating != 50 && check_tsc_unstable()) {
- clocksource_tsc.rating = 50;
+ if (clocksource_tsc.rating != 0 && check_tsc_unstable()) {
+ clocksource_tsc.rating = 0;
clocksource_reselect();
change = 1;
}
@@ -461,7 +461,7 @@ static int __init init_tsc_clocksource(v
clocksource_tsc.shift);
/* lower the rating if we already know its unstable: */
if (check_tsc_unstable())
- clocksource_tsc.rating = 50;
+ clocksource_tsc.rating = 0;

init_timer(&verify_tsc_freq_timer);
verify_tsc_freq_timer.function = verify_tsc_freq;
diff --git a/kernel/time/jiffies.c b/kernel/time/jiffies.c
index 126bb30..a99b2a6 100644
--- a/kernel/time/jiffies.c
+++ b/kernel/time/jiffies.c
@@ -57,7 +57,7 @@ static cycle_t jiffies_read(void)

struct clocksource clocksource_jiffies = {
.name = "jiffies",
- .rating = 0, /* lowest rating*/
+ .rating = 1, /* lowest valid rating*/
.read = jiffies_read,
.mask = 0xffffffff, /*32bits*/
.mult = NSEC_PER_JIFFY << JIFFIES_SHIFT, /* details above */



2006-10-11 21:27:16

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] i386 Time: Avoid PIT SMP lockups

On Wed, 11 Oct 2006 12:54:21 -0700
john stultz <[email protected]> wrote:

> Andrew: I think this is 2.6.19 material, but probably should go through an -mm or two.
>
> thanks
> -john
>
>
> This patch avoids possible PIT livelock issues seen on SMP systems (and
> reported by Andi), by not allowing it as a clocksource on SMP boxes.
>
> However, since the PIT may no longer be present, we have to properly
> handle the cases where SMP systems have TSC skew and fall back from the
> TSC. Since the PIT isn't there, it would "fall back" to the TSC again.
> So this changes the jiffies rating to 1, and the TSC-bad rating value to
> 0.
>
> Thus you will get the following behavior priority on i386 systems:
>
> tsc [if present & stable]
> hpet [if present]
> cyclone [if present]
> acpi_pm [if present]
> pit [if UP]
> jiffies
>
> Rather then the current more complicated:
> tsc [if present & stable]
> hpet [if present]
> cyclone [if present]
> acpi_pm [if present]
> pit [if cpus < 4]

Actually <=4, and that matters: there are a lot of 4-ways.

> tsc [if present & unstable]
> jiffies
>

So this patch has the potential to screw up people who have 2-way or 4-way,
no hpet/pm-timer and dodgy TSCs.

Wouldn't it be better to fix the livelock? What's causing it?

2006-10-11 22:48:41

by john stultz

[permalink] [raw]
Subject: Re: [PATCH] i386 Time: Avoid PIT SMP lockups

On Wed, 2006-10-11 at 14:26 -0700, Andrew Morton wrote:
> On Wed, 11 Oct 2006 12:54:21 -0700
> john stultz <[email protected]> wrote:
> > Andrew: I think this is 2.6.19 material, but probably should go through an -mm or two.
> >
> >
> > This patch avoids possible PIT livelock issues seen on SMP systems (and
> > reported by Andi), by not allowing it as a clocksource on SMP boxes.
> >
> > However, since the PIT may no longer be present, we have to properly
> > handle the cases where SMP systems have TSC skew and fall back from the
> > TSC. Since the PIT isn't there, it would "fall back" to the TSC again.
> > So this changes the jiffies rating to 1, and the TSC-bad rating value to
> > 0.
> >
> > Thus you will get the following behavior priority on i386 systems:
> >
> > tsc [if present & stable]
> > hpet [if present]
> > cyclone [if present]
> > acpi_pm [if present]
> > pit [if UP]
> > jiffies
> >
> > Rather then the current more complicated:
> > tsc [if present & stable]
> > hpet [if present]
> > cyclone [if present]
> > acpi_pm [if present]
> > pit [if cpus < 4]
>
> Actually <=4, and that matters: there are a lot of 4-ways.

Yes, you're right.

>
> > tsc [if present & unstable]
> > jiffies
> >
>
> So this patch has the potential to screw up people who have 2-way or 4-way,
> no hpet/pm-timer and dodgy TSCs.

Indeed. Although I believe the number of TSC screwy SMP boxes w/o an
ACPI PM or HPET is quite small (NUMAQ being the one I'm familiar with,
but it already was using jiffies in multi-node configs).

And by "screw up", it would mean pretty course(tick) granular time.
Otherwise the systems should still function correctly, but I recognize
the loss in granularity could be seen as a regression. The trade off
being that a hang (which would be seen otherwise) isn't acceptable
either.

> Wouldn't it be better to fix the livelock? What's causing it?

I spent a few days trying to narrow this down, and I haven't been able
to do so to my satisfaction.

At this point, my suspicion is that because the PIT io-read is very slow
(~18us), and done while holding a lock. It would be possible that one
cpu calling gettimeofday would do the following:

grab xtime sequence read lock
grab i8253 spin lock
do port io (very slow)
release i8253 spin lock
realize xtime has been grabed and repeat

While another cpu does the following after in a timer interrupt:
Grabs xtime sequence write lock
spins trying to grab i8253 spin lock

Assuming the first thread can reacquire the i8253 lock before the
second, you could have both threads potentially spinning forever.

I've not yet been able to prove this. alt-sysrq-t usually doesn't have
*any* threads in gettimeofday/timer_interrupt, but removing the
expensive port io in pit_read() causes the hang to go away.

I'm open to digging on this more, but I wanted to get the fix out so
others didn't hit the hang in the meantime.

thanks
-john

2006-10-11 23:03:36

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] i386 Time: Avoid PIT SMP lockups

On Wed, 11 Oct 2006 15:48:31 -0700
john stultz <[email protected]> wrote:

> > Wouldn't it be better to fix the livelock? What's causing it?
>
> I spent a few days trying to narrow this down, and I haven't been able
> to do so to my satisfaction.
>
> At this point, my suspicion is that because the PIT io-read is very slow
> (~18us), and done while holding a lock. It would be possible that one
> cpu calling gettimeofday would do the following:
>
> grab xtime sequence read lock
> grab i8253 spin lock
> do port io (very slow)
> release i8253 spin lock
> realize xtime has been grabed and repeat
>
> While another cpu does the following after in a timer interrupt:
> Grabs xtime sequence write lock
> spins trying to grab i8253 spin lock
>
> Assuming the first thread can reacquire the i8253 lock before the
> second, you could have both threads potentially spinning forever.

Is there any actual need to hold xtime_lock while doing the port IO? I'd
have thought it would suffice to do

temp = port_io
write_seqlock(xtime_lock);
xtime = muck_with(temp);
write_sequnlock(xtime_lock);

?

2006-10-16 13:48:12

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] i386 Time: Avoid PIT SMP lockups

Andrew Morton <[email protected]> writes:
>
> Is there any actual need to hold xtime_lock while doing the port IO? I'd
> have thought it would suffice to do
>
> temp = port_io
> write_seqlock(xtime_lock);
> xtime = muck_with(temp);
> write_sequnlock(xtime_lock);
>
> ?

That would be a good idea in general. The trouble is just that whatever race
is there will be still there then, just harder to trigger (so instead of
every third boot it will muck up every 6 weeks). Not sure that is
a real improvement.

-Andi

2006-10-16 18:39:51

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] i386 Time: Avoid PIT SMP lockups

On 16 Oct 2006 15:48:02 +0200
Andi Kleen <[email protected]> wrote:

> Andrew Morton <[email protected]> writes:
> >
> > Is there any actual need to hold xtime_lock while doing the port IO? I'd
> > have thought it would suffice to do
> >
> > temp = port_io
> > write_seqlock(xtime_lock);
> > xtime = muck_with(temp);
> > write_sequnlock(xtime_lock);
> >
> > ?
>
> That would be a good idea in general. The trouble is just that whatever race
> is there will be still there then, just harder to trigger (so instead of
> every third boot it will muck up every 6 weeks). Not sure that is
> a real improvement.
>

Confused. What race are you referring to?

This is addressing a starvation problem which is due to the slowness of the
port-io (iirc).

2006-10-16 18:42:34

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] i386 Time: Avoid PIT SMP lockups

On Monday 16 October 2006 20:39, Andrew Morton wrote:
> On 16 Oct 2006 15:48:02 +0200
> Andi Kleen <[email protected]> wrote:
>
> > Andrew Morton <[email protected]> writes:
> > >
> > > Is there any actual need to hold xtime_lock while doing the port IO? I'd
> > > have thought it would suffice to do
> > >
> > > temp = port_io
> > > write_seqlock(xtime_lock);
> > > xtime = muck_with(temp);
> > > write_sequnlock(xtime_lock);
> > >
> > > ?
> >
> > That would be a good idea in general. The trouble is just that whatever race
> > is there will be still there then, just harder to trigger (so instead of
> > every third boot it will muck up every 6 weeks). Not sure that is
> > a real improvement.
> >
>
> Confused. What race are you referring to?

Sorry s/race/starvation/

>
> This is addressing a starvation problem which is due to the slowness of the
> port-io (iirc).

Is it just sure to go away when the critical section is shorter?

-Andi



2006-10-16 19:09:19

by john stultz

[permalink] [raw]
Subject: Re: [PATCH] i386 Time: Avoid PIT SMP lockups

On Mon, 2006-10-16 at 20:42 +0200, Andi Kleen wrote:
> On Monday 16 October 2006 20:39, Andrew Morton wrote:
> > On 16 Oct 2006 15:48:02 +0200
> > Andi Kleen <[email protected]> wrote:
> >
> > > Andrew Morton <[email protected]> writes:
> > > >
> > > > Is there any actual need to hold xtime_lock while doing the port IO? I'd
> > > > have thought it would suffice to do
> > > >
> > > > temp = port_io
> > > > write_seqlock(xtime_lock);
> > > > xtime = muck_with(temp);
> > > > write_sequnlock(xtime_lock);
> > > >
> > > > ?
> > >
> > > That would be a good idea in general. The trouble is just that whatever race
> > > is there will be still there then, just harder to trigger (so instead of
> > > every third boot it will muck up every 6 weeks). Not sure that is
> > > a real improvement.

It is an interesting idea that could be applied generally. There might
be some small races possible (where the cycle_last grabed within
xtime_lock might be ahead of the value pulled from the port_io), but
I'll put some time into seeing if it will work.


> > Confused. What race are you referring to?
>
> Sorry s/race/starvation/
>
> >
> > This is addressing a starvation problem which is due to the slowness of the
> > port-io (iirc).
>
> Is it just sure to go away when the critical section is shorter?

Yea. I don't see the box hangs when the portio is removed. Although I
don't really feel I've narrowed it down completely to the starvation
theory. Its just the best theory I've got so far.

Currently my test box is busy with other things, but as soon as its free
I'll put some more time into this one.

thanks
-john

2006-10-18 16:16:25

by Maciej W. Rozycki

[permalink] [raw]
Subject: Re: [PATCH] i386 Time: Avoid PIT SMP lockups

On Wed, 11 Oct 2006, Andrew Morton wrote:

> So this patch has the potential to screw up people who have 2-way or 4-way,
> no hpet/pm-timer and dodgy TSCs.

Note that all APIC-based SMP systems (even these rare i486 beasts) by
definition do have local APIC timers, one per CPU, with a reasonable
resolution which could likely be used instead. There is an erratum in
some early Pentium integrated APICs, where a tick is lost when 0 is
reached (the timer counts downwards), so it may be needed to be taken into
account for good timekeeping, but otherwise the timers look like a
reasonable choice. The timers are local to their respective CPUs and are
never written by Linux (see references to APIC_TMCCT), so there is no need
for locking to access them.

Maciej

2006-10-18 16:27:45

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] i386 Time: Avoid PIT SMP lockups

"Maciej W. Rozycki" <[email protected]> writes:

> On Wed, 11 Oct 2006, Andrew Morton wrote:
>
> > So this patch has the potential to screw up people who have 2-way or 4-way,
> > no hpet/pm-timer and dodgy TSCs.
>
> Note that all APIC-based SMP systems (even these rare i486 beasts) by
> definition do have local APIC timers, one per CPU, with a reasonable
> resolution which could likely be used instead.

It wouldn't work on dual core laptop chipsets which support C3 and where
the APIC timer stops during C3.

I had a apicrunsmaintimer option for some time on x86-64 but it broke
on a few other non laptop machines for unknown reasons too, so I cannot
really recommend it.

-Andi

2006-10-18 16:45:17

by Maciej W. Rozycki

[permalink] [raw]
Subject: Re: [PATCH] i386 Time: Avoid PIT SMP lockups

On Wed, 18 Oct 2006, Andi Kleen wrote:

> It wouldn't work on dual core laptop chipsets which support C3 and where
> the APIC timer stops during C3.
>
> I had a apicrunsmaintimer option for some time on x86-64 but it broke
> on a few other non laptop machines for unknown reasons too, so I cannot
> really recommend it.

So they managed to break it too? Oh well...

If you look at early APIC documentation, then you'll see how the APIC
timers were recommended to be the only source of the clock tick for SMP
systems in favour to the old-fashioned PIT. Now, how can it be used if it
does not run all the time?...

Maciej