2001-11-07 11:50:48

by Jonas Diemer

Subject: VIA 686 timer bugfix incomplete

Hi!

I was very happy to see that the VIA timer bugfix made it into Linus'
version of the kernel as of 2.4.10.

But it seems that the patch was incomplete: the bug is still triggered on my
computer with 2.4.14, while the bugfix seems to work with -ac kernels.


So I diffed the -ac version of that file against the current version in Linus'
kernel:

# diff -u arch/i386/kernel/time.c ../linux-2.4.10-ac12/arch/i386/kernel/time.c


--- arch/i386/kernel/time.c Wed Oct 24 17:16:13 2001
+++ ../linux-2.4.10-ac12/arch/i386/kernel/time.c Sun Oct 14 16:23:52 2001
@@ -501,6 +501,24 @@

 	count = inb_p(0x40);	/* read the latched count */
 	count |= inb(0x40) << 8;
+
+
+	/* VIA686a test code... reset the latch if count > max */
+	if (count > LATCH) {
+		static int last_whine;
+		outb_p(0x34, 0x43);
+		outb_p(LATCH & 0xff, 0x40);
+		outb(LATCH >> 8, 0x40);
+		count = LATCH - 1;
+		if(time_after(jiffies, last_whine))
+		{
+			printk(KERN_WARNING "probable hardware bug: clock timer configuration lost - probably a VIA686a motherboard.\n");
+			printk(KERN_WARNING "probable hardware bug: restoring chip configuration.\n");
+			last_whine = jiffies + HZ;
+		}
+	}
+
+
 	spin_unlock(&i8253_lock);

 	count = ((LATCH-1) - count) * TICK_SIZE;


You can see what's missing to actually work around the VIA timer bug. I hope
this will go into 2.4.15.
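
For readers without the kernel tree at hand, the logic of the added hunk can be sketched in user space roughly as follows. This is only an illustration under assumptions: `check_pit_count()`, `reprogram_count` and `whine_count` are hypothetical names, the port writes are replaced by a counter, `jiffies` is a plain variable, and the `LATCH` value assumes HZ=100 with a 1193180 Hz PIT clock.

```c
#include <stdio.h>

#define HZ 100                     /* assumed tick rate */
#define CLOCK_TICK_RATE 1193180    /* assumed PIT input clock in Hz */
#define LATCH ((CLOCK_TICK_RATE + HZ / 2) / HZ)

/* Wraparound-safe "a is after b", in the spirit of the kernel's time_after(). */
#define time_after(a, b) ((long)(b) - (long)(a) < 0)

static unsigned long jiffies;      /* stand-in for the kernel tick counter */
static int reprogram_count;        /* how often the PIT would have been reprogrammed */
static int whine_count;            /* how often the warning was printed */

/* Hypothetical helper: validate a count read from the PIT. */
static unsigned int check_pit_count(unsigned int count)
{
    static unsigned long last_whine;

    if (count > LATCH) {
        /* Real code reprograms the chip here: outb_p(0x34, 0x43), then LATCH. */
        reprogram_count++;
        count = LATCH - 1;
        /* Rate-limit the message to roughly one per second. */
        if (time_after(jiffies, last_whine)) {
            fprintf(stderr, "probable hardware bug: restoring chip configuration.\n");
            whine_count++;
            last_whine = jiffies + HZ;
        }
    }
    return count;
}
```

Note that every out-of-range read triggers a reprogram, while the warning itself is printed at most once per HZ jiffies.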

regards

PS: CC me in your answers, I am not subscribed to the list


2001-11-07 12:08:59

by Alan Cox

Subject: Re: VIA 686 timer bugfix incomplete

> but it seems that the patch was incomplete: The bug is still triggered on my
> computer using 2.4.14, but the bugfix seems to work with -ac kernels.

The first piece is in.

> you can see what's missing to actually work around the via timer bug. I hope
> this will go into 2.4.15.

I don't plan to submit it until the locking fixes for the timer access are
done and we know the real cause

2001-11-07 19:26:18

by Steve Underwood

Subject: Re: VIA 686 timer bugfix incomplete

Alan Cox wrote:

>>but it seems that the patch was incomplete: The bug is still triggered on my
>>computer using 2.4.14, but the bugfix seems to work with -ac kernels.
>>
>
> The first piece is in.
>
>
>>you can see what's missing to actually work around the via timer bug. I hope
>>this will go into 2.4.15.
>>
>
> I don't plan to submit it until the locking fixes for the timer access are
> done and we know the real cause


If the messages:

probable hardware bug: clock timer configuration lost - probably a
VIA686a motherboard.
probable hardware bug: restoring chip configuration.

are really related to a VIA686A bug, why do they erratically appear on
Compaq ML370's, which use ServerWorks chip sets? Is there a common bug
between these chip sets? Seems unlikely.

Regards,
Steve



2001-11-07 19:26:17

by Vojtech Pavlik

Subject: Re: VIA 686 timer bugfix incomplete

On Wed, Nov 07, 2001 at 12:15:47PM +0000, Alan Cox wrote:
> > but it seems that the patch was incomplete: The bug is still triggered on my
> > computer using 2.4.14, but the bugfix seems to work with -ac kernels.
>
> The first piece is in.
>
> > you can see what's missing to actually work around the via timer bug. I hope
> > this will go into 2.4.15.
>
> I don't plan to submit it until the locking fixes for the timer access are
> done and we know the real cause

I'm trying to figure it out (locking, bug workarounds), but it's tough:

We have two hw bugs:

1) The VIA 686a bug (happening at least on vt82c686a, possibly also 686b),
where the timer chip sometimes corrupts its programming, not counting
from 11920 down to 0, but from a higher value, presumably 65536. There
is no good workaround known - all we can do is detect it when it happens
and restore the programming. Some ticks are lost irreversibly, though.

2) The Intel Neptune bug (happening at least on Mercury and Neptune P6
chipsets, but very likely also on newer chipsets, including SiS). The
bug is in the 0x00 (latch) command to the timer chip, which instead of
reading the 16-bit counter into a temporary buffer just selects it to be
read. The subsequent two 8-bit reads read the counter non-atomically,
which can cause a value larger by 256 to be read instead of the correct
one.

The bug #2 can trigger the test for #1, because the timer is read just
after the timer interrupt happens and thus the value is usually around
11920, which, plus 256 is larger than 11920.
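
The arithmetic behind this false alarm can be sketched as below. It is a model, not kernel code: `via_check_fires()` and `neptune_misread()` are hypothetical names, the misread is simply modelled as "true value plus 256" as described above, and `LATCH` is computed assuming HZ=100 and a 1193180 Hz PIT clock (which gives 11932 rather than the rounded 11920 used in the text).

```c
#define HZ 100                     /* assumed tick rate */
#define CLOCK_TICK_RATE 1193180    /* assumed PIT input clock in Hz */
#define LATCH ((CLOCK_TICK_RATE + HZ / 2) / HZ)   /* 11932 with these values */

/* The test used by the VIA workaround: a count above LATCH is "impossible". */
static int via_check_fires(unsigned int count)
{
    return count > LATCH;
}

/* Hypothetical model of bug #2: the two 8-bit reads are not atomic and
 * the assembled value comes out larger by 256 than the real count. */
static unsigned int neptune_misread(unsigned int true_count)
{
    return true_count + 256;
}
```

A count read shortly after the interrupt (say 11900) passes the check when read correctly, but fails it once 256 is added; a count in the middle of the period (say 5000) stays below `LATCH` even when misread, which is why the message only shows up erratically.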

Also, bug #2 isn't correctly worked around in the kernel. There is some
logic to work around it when it would give too inconsistent results, but
the kernel still isn't giving correct results on affected chips.

Furthermore, the i8253 is accessed from more than one place:

timer.c:              do_slow_gettimeofday() ... has both workarounds
                      timer_interrupt()      ... only has VIA workaround
apic.c:                                      ... only has Neptune workaround
ftape-calibr.c:                              ... has a crazy workaround for some
                                                 other hardware bug, bad
                                                 implementation
gameport.c, analog.c:                        ... no workarounds present, not
                                                 too critical
hd.c, ide.c:                                 ... no workarounds, bad implementation,
                                                 #ifdef-ed out.

Only timer.c and apic.c do proper locking.

The locking itself isn't a problem to solve. And it's also not enough to
fix the problems that are seen on SiS and other newer chipsets - most of
the users don't use gameport/analog/ftape, and thus have the locking
correct.

The problem is how to work around the bugs 1) and 2) reliably and
without too much performance impact. I haven't found a feasible way to
do that yet.

Best would be to forget about the i8253 reading altogether and use some
other means of doing gettimeofday and timex et cetera, if there is any
present (RTC, TSC, whatever) ...

--
Vojtech Pavlik
SuSE Labs

2001-11-07 19:48:17

by Jonas Diemer

Subject: Re: VIA 686 timer bugfix incomplete

On Wed, 7 Nov 2001 20:25:46 +0100
Vojtech Pavlik wrote:

...
> The bug #2 can trigger the test for #1, because the timer is read just
> after the timer interrupt happens and thus the value is usually around
> 11920, which, plus 256 is larger than 11920.
>

So why don't you simply add a new option to the config file that says "work
around VIA 686a bug"? That way, only people who have the bug need the fix.

...
> Only timer.c and apic.c do proper locking.
>

Well, but as I said, the workaround in arch/i386/kernel/time.c is incomplete, at
least in Linus' kernel tree!

> The problem is how to work around the bugs 1) and 2) reliably and
> without too much performance impact. I haven't found a feasible way to
> do that yet.

Well, just use the option described above. That way, people who need the fix can
choose to use it (at a cost of performance); others simply don't need the check.

-jonas

PS: CC me in your answers plz, I am not subscribed to the list.

2001-11-07 20:15:20

by Vojtech Pavlik

Subject: Re: VIA 686 timer bugfix incomplete

On Wed, Nov 07, 2001 at 08:48:00PM +0100, Jonas Diemer wrote:
> On Wed, 7 Nov 2001 20:25:46 +0100
> Vojtech Pavlik wrote:
>
> ...
> > The bug #2 can trigger the test for #1, because the timer is read just
> > after the timer interrupt happens and thus the value is usually around
> > 11920, which, plus 256 is larger than 11920.
> >
>
> so why don't you simply add a new option to the config file, that says "work
> around Via 686a bug"? that way, only ppl who have the bug need the fix.
>
> ...
> > Only timer.c and apic.c do proper locking.
> >
>
> well, but as I said, the workaround in arch/i386/kernel/time.c is incomplete, at
> least in linus' kernel tree!
>
> > The problem is how to work around the bugs 1) and 2) reliably and
> > without too much performance impact. I haven't found a feasible way to
> > do that yet.
>
> well, just use the option described above. that way, ppl that need the fix can
> choose to use it (at a cost of performance), others simply don't need checking.
>
> -jonas
>
> PS: CC me in your answers plz, I am not subscribed to the list.

The VIA bug isn't a problem: The fix doesn't cause performance problems
to people unaffected by the bug, it just prints an annoying message to
people who see it triggered by bug #2 (Neptune).

The Neptune bug (which seems much more widespread than expected) is a
much larger problem - it's hard to detect without performance
degradation and currently it isn't known which chipsets are affected. It
is known that Intel Mercury and Intel Neptune (old P6 chipsets) are. But
how about others ... ?

--
Vojtech Pavlik
SuSE Labs

2001-11-07 22:22:37

by Neale Banks

Subject: Re: VIA 686 timer bugfix incomplete

On Wed, 7 Nov 2001, Vojtech Pavlik wrote:

[...]
> On Wed, Nov 07, 2001 at 08:48:00PM +0100, Jonas Diemer wrote:
[...]
> > well, just use the option described above. that way, ppl that need the fix can
> > choose to use it (at a cost of performance), others simply don't need checking.
> >
> > -jonas
> >
> > PS: CC me in your answers plz, I am not subscribed to the list.
>
> The VIA bug isn't a problem: The fix doesn't cause performance problems
> to people unaffected by the bug, it just prints an annoying message to
> people who see it triggered by bug #2 (Neptune).
[snip]

Maybe not performance problems, but my tired-but-otherwise-reliable
AcerNote-950C (which definitely does not have a VIA686a - it's a Pentium)
doesn't seem to like this VIA686a fix (but only sometimes {:-( ).

Prior to 2.2.19, on going to sleep due to low battery, I could reliably
wake it up. With 2.2.19 (where this fix entered 2.2) this isn't the
case: sometimes I just get the "probable bug" message, sometimes also a
diagnostic about hda (sorry, can't quote it right now), and on one occasion
serious file system corruption (OK, maybe it was a coincidence, or maybe not).

Yes, I probably have a bug in the timer department, but I strongly suspect
that the fix for the 686a is not appropriate for my chipset.

If the current VIA686a "probable bug" fix is going to remain as default,
then I for one would like to see a knob to disable it.

For 2.2, I'm happy to have a go at making and alpha-testing a patch for a
kernel command-line switch to disable this - but I'd very much like to
hear from the custodians of consistency in such matters as to an
appropriate/best attribute=value to use for this. Some suggestions:

chips=novia686a
via_hacks=no686a
via_hacks=none
timer=no686a
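
As an illustration of how such a switch might be wired up, here is a user-space sketch for the `timer=no686a` spelling. Everything in it is hypothetical: `via686a_workaround_enabled`, `timer_setup()` and `parse_cmdline()` are made-up names, and a real kernel patch would register the handler through the kernel's own boot-option mechanism rather than scan the string itself.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flag consulted by the timer code; workaround on by default. */
static int via686a_workaround_enabled = 1;

/* Hypothetical handler for the value part of "timer=<value>". */
static int timer_setup(const char *value)
{
    if (strcmp(value, "no686a") == 0) {
        via686a_workaround_enabled = 0;
        return 1;
    }
    return 0;   /* unknown value: leave the default alone */
}

/* Scan a space-separated command line for "timer=" options. */
static void parse_cmdline(const char *cmdline)
{
    const char *p = cmdline;

    while (*p) {
        while (*p == ' ')
            p++;
        size_t len = strcspn(p, " ");
        if (len > 6 && strncmp(p, "timer=", 6) == 0) {
            char value[32];
            size_t vlen = len - 6;
            if (vlen < sizeof(value)) {
                memcpy(value, p + 6, vlen);
                value[vlen] = '\0';
                timer_setup(value);
            }
        }
        p += len;
    }
}
```

With this sketch, a command line such as `root=/dev/hda1 timer=no686a ro` clears the flag, and the check on `count > LATCH` could then be skipped when the flag is zero.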

Regards,
Neale.

2001-11-08 08:02:38

by Vojtech Pavlik

Subject: Re: VIA 686 timer bugfix incomplete

On Thu, Nov 08, 2001 at 09:32:30AM +1100, Neale Banks wrote:
> On Wed, 7 Nov 2001, Vojtech Pavlik wrote:
>
> [...]
> > On Wed, Nov 07, 2001 at 08:48:00PM +0100, Jonas Diemer wrote:
> [...]
> > > well, just use the option described above. that way, ppl that need the fix can
> > > choose to use it (at a cost of performance), others simply don't need checking.
> > >
> > > -jonas
> > >
> > > PS: CC me in your answers plz, I am not subscribed to the list.
> >
> > The VIA bug isn't a problem: The fix doesn't cause performance problems
> > to people unaffected by the bug, it just prints an annoying message to
> > people who see it triggered by bug #2 (Neptune).
> [snip]
>
> Maybe not performance problems, but my tired-but-otherwise-reliable
> AcerNote-950C (which definitely does not have a VIA686a - it's a Pentium)
> doesn't seem to like this VIA686a fix (but only sometimes {:-( ).
>
> Prior to 2.2.19, on going to sleep due to low battery, I could reliably
> wake it up. with 2.2.19 (being where this fix entered 2.2) this isn't the
> case - sometimes I just get the "probable bug" message, sometimes also a
> diag re hda (sorry, can't quote right now) and on one occasion serious
> file system corruption (OK, maybe it was a co-incidence, or maybe not).
>
> Yes, I probably have a bug in the timer department, but I strongly suspect
> that the fix for the 686a is not appropriate for my chipset.
>
> If the current VIA686a "probable bug" fix is going to remain as default,
> then I for one would like to see a knob to disable it.
>
> For 2.2, I'm happy to have a go at making and alpha-testing a patch for a
> kernel command-line switch to disable this - but I'd very much like to
> hear from the custodians of consistency in such matters as to an
> appropriate/best attribute=value to use for this. Some suggestions:
>
> chips=novia686a
> via_hacks=no686a
> via_hacks=none
> timer=no686a

I think the last one would be best. Anyway, I don't think it could be the
cause of your IDE problems.

--
Vojtech Pavlik
SuSE Labs

2001-11-08 09:21:52

by Jonas Diemer

Subject: Re: VIA 686 timer bugfix incomplete

Well, then maybe Vojtech's suggestion is best: use the RTC for timing, not the
chipset...
To my knowledge, every i386 system has a standard RTC, so why not use this? Or
even better: make an option in the config to choose whether to use the RTC or
the chipset.

-jonas

PS: CC me in your answers, plz.

2001-11-08 20:09:03

by Vojtech Pavlik

Subject: Re: VIA 686 timer bugfix incomplete

On Thu, Nov 08, 2001 at 10:21:24AM +0100, Jonas Diemer wrote:

> Well, then maybe Vojtech's suggestion is best: use RTC for timing, not the
> chipset...
> as to my knowledge, every i386 system has a standard RTC, so why not use this? or
> even better: make an option in the config to choose whether use RTC or the
> chipset.

There is a little problem with RTC, though:

While you can set it up to generate interrupts at say 1024 Hz, you can't
read any value of how much time passed since last interrupt. You can do
this on the PIT (i8253), and this is the part that is buggy.
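
To make "how much time passed since last interrupt" concrete: the PIT counts down from LATCH-1 to 0 within one tick, so a latched count can be scaled to microseconds. The sketch below follows the scaling visible in the time.c hunk earlier in the thread; `pit_count_to_usec()` is a hypothetical name, and the constants assume HZ=100 and a 1193180 Hz PIT clock.

```c
#define HZ 100                     /* assumed tick rate */
#define CLOCK_TICK_RATE 1193180    /* assumed PIT input clock in Hz */
#define LATCH ((CLOCK_TICK_RATE + HZ / 2) / HZ)
#define TICK_SIZE (1000000 / HZ)   /* microseconds per timer tick */

/* Hypothetical helper: convert a latched PIT count (counting down from
 * LATCH-1 to 0) into microseconds elapsed since the last timer tick. */
static unsigned long pit_count_to_usec(unsigned long count)
{
    unsigned long elapsed = ((LATCH - 1) - count) * TICK_SIZE;

    /* Divide by LATCH with rounding to the nearest microsecond. */
    return (elapsed + LATCH / 2) / LATCH;
}
```

A wrong count therefore turns directly into a wrong sub-tick offset, which is why the bugs discussed here show up in gettimeofday.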

TSC is perfect, precise and accurate, but not reliable in the long term.
Some CPUs do thermal throttling, notebooks play with CPU speed, etc,
etc. And it's not synchronized to any interrupt source.

Ugly, ugly, ugly is the PC architecture.

--
Vojtech Pavlik
SuSE Labs

2001-11-08 20:11:53

by Vojtech Pavlik

Subject: Re: VIA 686 timer bugfix incomplete

On Thu, Nov 08, 2001 at 03:29:45AM +0800, Steve Underwood wrote:
> Alan Cox wrote:
>
> >>but it seems that the patch was incomplete: The bug is still triggered on my
> >>computer using 2.4.14, but the bugfix seems to work with -ac kernels.
> >>
> >
> > The first piece is in.
> >
> >
> >>you can see what's missing to actually work around the via timer bug. I hope
> >>this will go into 2.4.15.
> >>
> >
> > I don't plan to submit it until the locking fixes for the timer access are
> > done and we know the real cause
>
>
> If the messages:
>
> probable hardware bug: clock timer configuration lost - probably a
> VIA686a motherboard.
> probable hardware bug: restoring chip configuration.
>
> are really related to a VIA686A bug, why do they erratically appear on
> Compaq ML370's, which use ServerWorks chip sets? Is there a common bug
> between these chip sets? Seems unlikely.

Just to make sure: is ftape or any joystick driver in use on the
system? If not, then:

The ServerWorks chipset most likely has a bug that it shares with the old
Intel Neptune chipset. This is a problem per se, but it also triggers the VIA
bug workaround. The VIA bug test can be enhanced to detect this false
alarm, but the Neptune-like bug still stays and is dangerous as well.

At least the VIA workaround told us something fishy is happening on
non-VIA hardware as well.

For example on my VIA686a/cg (late revision), the workaround is never
triggered.

--
Vojtech Pavlik
SuSE Labs

2001-11-08 21:22:59

by Vojtech Pavlik

Subject: Re: VIA 686 timer bugfix incomplete

On Thu, Nov 08, 2001 at 10:17:51PM +0100, Jonas Diemer wrote:

> On Thu, 8 Nov 2001 21:08:40 +0100
> Vojtech Pavlik wrote:
>
> > There is a little problem with RTC, though:
> >
> > While you can set it up to generate interrupts at say 1024 Hz, you can't
> > read any value of how much time passed since last interrupt. You can do
> > this on the PIT (i8253), and this is the part that is buggy.
> >
> > TSC is perfect, precise and accurate, but not reliable in long term.
> > Some CPUs do thermal throttling, notebooks play with CPU speed, etc,
> > etc. And it's not synchronized to any interrupt source.
> >
> > Ugly, ugly, ugly is the PC architecture.
> >
>
> can't you just read the battery buffered clock? how are other OSes such as
> Window$ doing the timing?

You can. But you only get 0.5 second resolution, which obviously isn't
good enough for microsecond timing.

--
Vojtech Pavlik
SuSE Labs

2001-11-08 21:31:20

by George Anzinger

Subject: Re: VIA 686 timer bugfix incomplete

Vojtech Pavlik wrote:
>
> On Thu, Nov 08, 2001 at 10:21:24AM +0100, Jonas Diemer wrote:
>
> > Well, then maybe Vojtech's suggestion is best: use RTC for timing, not the
> > chipset...
> > as to my knowledge, every i386 system has a standard RTC, so why not use this? or
> > even better: make an option in the config to choose whether use RTC or the
> > chipset.
>
> There is a little problem with RTC, though:
>
> While you can set it up to generate interrupts at say 1024 Hz, you can't
> read any value of how much time passed since last interrupt. You can do
> this on the PIT (i8253), and this is the part that is buggy.
>
> TSC is perfect, precise and accurate, but not reliable in long term.
> Some CPUs do thermal throttling, notebooks play with CPU speed, etc,
> etc. And it's not synchronized to any interrupt source.
>
> Ugly, ugly, ugly is the PC architecture.
>
Methinks the real solution is the ACPI PM timer. It has 3 times the
resolution of the PIT and you cannot stop it. The high-res-timers
patch will allow you to use this as the timekeeper and just use the PIT
to generate interrupts.

Finding the ACPI pm timer, on the other hand, is MOST obscure and not
all x86 platforms have ACPI. Still, we are almost there.
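
A rough sketch of timekeeping from a free-running PM timer is shown below. It is not taken from the high-res-timers patch: `pmtmr_delta()` and `pmtmr_ticks_to_usec()` are hypothetical names, the 3579545 Hz rate is the nominal PM timer frequency (three times the PIT clock), and the counter is assumed to be 24 bits wide, which is the minimum width; some chipsets implement 32 bits.

```c
#include <stdint.h>

#define PMTMR_HZ   3579545u      /* nominal ACPI PM timer rate, 3x the PIT clock */
#define PMTMR_MASK 0x00FFFFFFu   /* assumption: 24-bit wide counter */

/* Ticks elapsed between two readings, tolerating one counter wraparound. */
static uint32_t pmtmr_delta(uint32_t now, uint32_t last)
{
    return (now - last) & PMTMR_MASK;
}

/* Convert PM timer ticks to microseconds (64-bit intermediate avoids overflow). */
static uint64_t pmtmr_ticks_to_usec(uint32_t ticks)
{
    return (uint64_t)ticks * 1000000u / PMTMR_HZ;
}
```

Because the counter is free-running and only ever read, there is no latch command and no reprogramming, so neither of the two PIT bugs discussed in this thread has an equivalent here. At this rate a 24-bit counter wraps every few seconds, so it has to be sampled at least that often.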

George

2001-11-08 23:24:06

by Alan Cox

Subject: Re: VIA 686 timer bugfix incomplete

> Me thinks the real solution is the ACPI pm timer. 3 times the
> resolution of the PIT and you can not stop it. The high-res-timers
> patch will allow you to use this as the time keeper and just use the PIT
> to generate interrupts.

For awkward boxes you can use the PIT, for good boxes we can use rdtsc or
eventually the ACPI timers when running with ACPI

2001-11-09 02:54:35

by Steve Underwood

Subject: Re: VIA 686 timer bugfix incomplete

Hi,

Vojtech Pavlik wrote:

> On Thu, Nov 08, 2001 at 03:29:45AM +0800, Steve Underwood wrote:
>>If the messages:
>>
>>probable hardware bug: clock timer configuration lost - probably a
>>VIA686a motherboard.
>>probable hardware bug: restoring chip configuration.
>>
>>are really related to a VIA686A bug, why do they erratically appear on
>>Compaq ML370's, which use ServerWorks chip sets? Is there a common bug
>>between these chip sets? Seems unlikely.
>>
>
> Just to make sure: Is on the system the Ftape of any joystick driver in
> use? If not, then:
>
> The ServerWorks chip set has a bug that is shared with old Intel Neptune
> chipset most likely. This is a problem per se, but also triggers the VIA
> bug workaround. The VIA bug test can be enhanced to detect this false
> alarm, but the Neptune-like bug still stays and is dangerous as well.
>
> At least the VIA workaround told us something fishy is happening on
> non-VIA hardware as well.
>
> For example on my VIA686a/cg (late revision), the workaround is never
> triggered.


There are no such devices in use in our machines. This is happening on 3
Compaq servers, and each has a similar configuration. A Compaq ML370,
1G RAM, a Compaq Smart Array 431 RAID controller, and some Dialogic
telephony cards.


I don't have one of these machines running without telephony cards to
see if that has any significance. The only external interfaces we use are
the LAN, one of the serial ports, and the phone lines connected to the
Dialogic cards. The SCSI controller is idle, as the disks are on the
RAID controller. The IDE interface has a CD-ROM on it.

From what you say, it sounds like the ServerWorks chipset may well have
a timer bug. This machine uses the LE 3.0 chipset.

Regards,
Steve



2001-11-09 08:35:16

by Vojtech Pavlik

Subject: Re: VIA 686 timer bugfix incomplete

On Thu, Nov 08, 2001 at 11:30:52PM +0000, Alan Cox wrote:
> > Me thinks the real solution is the ACPI pm timer. 3 times the
> > resolution of the PIT and you can not stop it. The high-res-timers
> > patch will allow you to use this as the time keeper and just use the PIT
> > to generate interrupts.
>
> For awkward boxes you can use the PIT, for good boxes we can use rdtsc or
> eventually the ACPI timers when running with ACPI

The problem is that we use PIT even together with TSC because we need to
know how much time passed since last interrupt to be able to synchronize
the TSC with possibly delayed timer interrupts and TSC doesn't tell us
that ... but hopefully this can be done with some kind of PLL ...
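
The dependency can be sketched like this: the TSC only gives cycles since the last tick was handled, and the PIT-derived delay of that tick handler still has to be added. The code is a simplified user-space model with hypothetical names (`make_quotient()`, `tsc_offset_usec()`); the fixed-point quotient of 2^32 divided by cycles per microsecond is one common way to avoid a division on every call, and a constant CPU clock is assumed.

```c
#include <stdint.h>

/* 2^32 / (CPU cycles per microsecond); cycles_per_usec must be > 1 here. */
static uint32_t make_quotient(uint32_t cycles_per_usec)
{
    return (uint32_t)((1ULL << 32) / cycles_per_usec);
}

/* Microseconds since the last timer tick.
 * tsc_at_tick:             low 32 bits of the TSC sampled in the tick handler
 * delay_at_last_interrupt: how late the handler ran, measured via the PIT */
static uint32_t tsc_offset_usec(uint32_t tsc_now, uint32_t tsc_at_tick,
                                uint32_t quotient,
                                uint32_t delay_at_last_interrupt)
{
    uint32_t delta = tsc_now - tsc_at_tick;   /* wraps correctly in 32 bits */
    uint32_t usec = (uint32_t)(((uint64_t)delta * quotient) >> 32);

    return delay_at_last_interrupt + usec;
}
```

The `delay_at_last_interrupt` term is exactly the piece that comes from reading the PIT, so a misread there propagates into TSC-based timekeeping as well.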

--
Vojtech Pavlik
SuSE Labs

2001-11-09 17:22:44

by George Anzinger

Subject: Re: VIA 686 timer bugfix incomplete

Alan Cox wrote:
>
> > Me thinks the real solution is the ACPI pm timer. 3 times the
> > resolution of the PIT and you can not stop it. The high-res-timers
> > patch will allow you to use this as the time keeper and just use the PIT
> > to generate interrupts.
>
> For awkward boxes you can use the PIT, for good boxes we can use rdtsc or
> eventually the ACPI timers when running with ACPI

I am attempting to use the ACPI timer without waiting for or running
ACPI. After all it is there if you can find it.

George



2001-11-09 19:21:39

by Andrew Grover

Subject: RE: VIA 686 timer bugfix incomplete

George, I was mistaken before, sorry.

The address of the PM timer is in a table, not in the ACPI namespace. It is
in the FADT. Therefore you should be able to use it iff ACPI tables are
present, but it should not strictly require the interpreter.
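
As a sketch of what "in a table" means in practice: once a copy of the FADT is in memory, the PM timer I/O port can be read from a fixed offset without any AML interpreter. The code below rests on assumptions: `fadt_pm_timer_port()` is a hypothetical helper, the table is assumed to have been located and copied already (finding it via the RSDP and RSDT is not shown), and the byte offset 76 for PM_TMR_BLK is my reading of the ACPI FADT layout and should be checked against the specification.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FADT_LENGTH_OFFSET      4    /* 32-bit table length in the common header */
#define FADT_PM_TMR_BLK_OFFSET  76   /* assumed offset of PM_TMR_BLK; verify in the spec */

/* Read a little-endian 32-bit value from a byte buffer. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Return the PM timer I/O port from a raw FADT image, or 0 if the buffer
 * does not look like a FADT or is too short to contain the field. */
static uint32_t fadt_pm_timer_port(const uint8_t *table, size_t size)
{
    if (size < FADT_PM_TMR_BLK_OFFSET + 4)
        return 0;
    if (memcmp(table, "FACP", 4) != 0)      /* the FADT signature is "FACP" */
        return 0;
    if (read_le32(table + FADT_LENGTH_OFFSET) < FADT_PM_TMR_BLK_OFFSET + 4)
        return 0;
    return read_le32(table + FADT_PM_TMR_BLK_OFFSET);
}
```

A return value of 0 also covers the case where the firmware does not provide a PM timer block at all.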

Regards -- Andy

> -----Original Message-----
> From: george anzinger
> Sent: Friday, November 09, 2001 9:22 AM
> To: Alan Cox
> Cc: Vojtech Pavlik; Jonas Diemer
> Subject: Re: VIA 686 timer bugfix incomplete
> Importance: High
>
>
> Alan Cox wrote:
> >
> > > Me thinks the real solution is the ACPI pm timer. 3 times the
> > > resolution of the PIT and you can not stop it. The high-res-timers
> > > patch will allow you to use this as the time keeper and just use the PIT
> > > to generate interrupts.
> >
> > For awkward boxes you can use the PIT, for good boxes we can use rdtsc or
> > eventually the ACPI timers when running with ACPI
>
> I am attempting to use the ACPI timer without waiting for or running
> ACPI. After all it is there if you can find it.
>
> George
>