2001-12-05 19:22:27

by Troels Walsted Hansen

[permalink] [raw]
Subject: VIA acknowledges North Bridge bug (AKA Linux Kernel with Athlon optimizations bug)

Remember the pci_fixup_via_athlon_bug() (AKA "Athlon bug stomper")
function which went into kernel 2.4.14 after much discussion?

Apparently the mysterious register 55 in the Northbridge controls a
buggy Memory Write Queue timer. They finally acknowledged the problem
when Nvidia drivers and Windows XP started pushing the hardware enough
to trigger the bug...

http://bbs.pcstats.com/viahardware/messageview.cfm?catid=19&threadid=8638

--
Troels Walsted Hansen



2001-12-05 21:10:16

by Calin A. Culianu

[permalink] [raw]
Subject: Re: VIA acknowledges North Bridge bug (AKA Linux Kernel with Athlon optimizations bug)


So does this mean we will be seeing a patch that clears bits 6,7, and 8 in
register 55 on the northbridge soon?

-Calin

On Wed, 5 Dec 2001, Troels Walsted Hansen wrote:

> Remember the pci_fixup_via_athlon_bug() (AKA "Athlon bug stomper")
> function which went into kernel 2.4.14 after much discussion?
>
> Apparently the mysterious register 55 in the Northbridge controls a
> buggy Memory Write Queue timer. They finally acknowledged the problem
> when Nvidia drivers and Windows XP started pushing the hardware enough
> to trigger the bug...
>
> http://bbs.pcstats.com/viahardware/messageview.cfm?catid=19&threadid=8638
>
>

2001-12-06 00:33:58

by Alan

[permalink] [raw]
Subject: Re: VIA acknowledges North Bridge bug (AKA Linux Kernel with Athlon

> So does this mean we will be seeing a patch that clears bits 6,7, and 8 in
> register 55 on the northbridge soon?

We already have one. The Linux folks saw the problem much earlier than
windows people because our athlon optimised memory copies triggered it
reliably on many boards.

What's sad is it's taken VIA this long to finally acknowledge a bug that we
showed to exist months and months ago, and that we have even had Linux fixes
for in 2.4 for a while.

Alan

2001-12-06 08:45:36

by Petri Kaukasoina

[permalink] [raw]
Subject: Re: VIA acknowledges North Bridge bug (AKA Linux Kernel with Athlon

On Thu, Dec 06, 2001 at 12:41:36AM +0000, Alan Cox wrote:
> We already have one. The Linux folks saw the problem much earlier than
> windows people because our athlon optimised memory copies triggered it
> reliably on many boards.

The details were a bit different in that web page:

Linux looks for chip with id 1106:0305 (KT133) and clears only bit 7 of
register 55. The Windows driver checks for chips 1106:0305, 1106:3099,
1106:3102, 1106:3112, and clears three bits: bits 5-7 of that register. In
addition, the web page tells that it's probably not right for 1106:3099
(KT266) because there it should be register 95.

Maybe this is not relevant: maybe all BIOSes for KT266 chipsets already set
the right values and maybe KT133 boards with problems only have bit 7 set,
not bits 5 and 6. (PCs here with KT133 already have all bits 5-7 zero in
reg. 55 and PCs with KT266 have bits 5-7 both in reg. 55 and 95 zero.)
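
To make the difference between the two behaviours concrete, here is a minimal sketch of the masks involved. The macro and function names are made up for illustration and come from neither the kernel nor the Windows driver; bits are numbered from 0, so bit 7 is 0x80 and bits 5-7 together are 0xE0.

```c
#include <stdint.h>

/* Bit 7 alone, which is what the existing Linux fixup is described as clearing. */
#define MWQ_BIT7_ONLY   0x80u
/* Bits 5, 6 and 7, which is what the Windows driver is described as clearing. */
#define MWQ_BITS_5_TO_7 0xE0u

/* Hypothetical helper: return `value` with every bit in `mask` cleared. */
static uint8_t clear_bits(uint8_t value, uint8_t mask)
{
    return (uint8_t)(value & (uint8_t)~mask);
}
```

A register that reads 0xFF becomes 0x7F with the first mask and 0x1F with the second, so a board with bit 5 or 6 set would be left partly unpatched by the bit-7-only variant.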

2001-12-06 21:35:53

by Calin A. Culianu

[permalink] [raw]
Subject: Re: VIA acknowledges North Bridge bug (AKA Linux Kernel with Athlon

On Thu, 6 Dec 2001, Alan Cox wrote:

> > So does this mean we will be seeing a patch that clears bits 6,7, and 8 in
> > register 55 on the northbridge soon?
>
> We already have one. The Linux folks saw the problem much earlier than
> windows people because our athlon optimised memory copies triggered it
> reliably on many boards.
>
> What's sad is it's taken VIA this long to finally acknowledge a bug that we
> showed to exist months and months ago, and that we have even had Linux fixes
> for in 2.4 for a while.


There seems to be some confusion though.. I probably should just read the
code myself.. but it seems from what I've read that the patch we had
didn't clear all the bits and that maybe on the KT266, the chipset isn't
being detected as 'buggy' by the patch so nothing is being cleared... is
this correct? If not would you mind taking the time to just tell me where
in the code I can grep for the patch?


> > Alan
>

2001-12-06 22:09:37

by Calin A. Culianu

[permalink] [raw]
Subject: Re: VIA acknowledges North Bridge bug (AKA Linux Kernel with Athlon

On Thu, 6 Dec 2001, Petri Kaukasoina wrote:

> On Thu, Dec 06, 2001 at 12:41:36AM +0000, Alan Cox wrote:
> > We already have one. The Linux folks saw the problem much earlier than
> > windows people because our athlon optimised memory copies triggered it
> > reliably on many boards.
>
> The details were a bit different in that web page:
>
> Linux looks for chip with id 1106:0305 (KT133) and clears only bit 7 of
> register 55. The Windows driver checks for chips 1106:0305, 1106:3099,
> 1106:3102, 1106:3112, and clears three bits: bits 5-7 of that register. In
> addition, the web page tells that it's probably not right for 1106:3099
> (KT266) because there it should be register 95.
>
> Maybe this is not relevant: maybe all BIOSes for KT266 chipsets already set
> the right values and maybe KT133 boards with problems only have bit 7 set,
> not bits 5 and 6. (PCs here with KT133 already have all bits 5-7 zero in
> reg. 55 and PCs with KT266 have bits 5-7 both in reg. 55 and 95 zero.)
>

Here is the webpage:

"This patch detects the 0305, 3099, 3102, and 3112 (KT133x, KT266x, VT8662,
and KLE133) *only*. On these chipsets, it will patch register 55 in the
Northbridge, which will supposedly switch off a Memory Write Queue timer.
In the KT133A datasheet, register 55 is "reserved". But - yikes! - in the
KT266, the documented MWQ register is register 95, not 55. Register 55
contains unrelated DDR timing adjustments and could actually be dangerous
to program. For this reason, I do not recommend installing this driver on
the KT266x chipsets until VIA examines this issue. For now, use WPCREDIT
and set bits 5, 6, and 7 to zero in register 95 instead."

----

Clearly, we need to modify the VIA workaround patches to take into account
the other VIA device IDs (namely 3099, 3102, and 3112), and for each one
change the appropriate register: register 55, or register 95 in the case of
the KT266x. I am grepping through quirks.c right now and it seems this would
be the correct file to modify. Any other suggestions on what file to modify?
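
As a rough sketch of the per-chipset register choice described above (not actual kernel code), the mapping could be a small table. The struct, table and function names are hypothetical. The register offsets are hex, and the 55/95 split is taken only from the quoted web page, so the entries for 3102 and 3112 in particular are unverified assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry: VIA northbridge device ID -> register to patch. */
struct mwq_quirk {
    uint16_t device;  /* PCI device ID; the vendor is 0x1106 (VIA) */
    uint8_t  reg;     /* config-space offset said to hold the MWQ bits */
};

static const struct mwq_quirk mwq_quirks[] = {
    { 0x0305, 0x55 },  /* KT133x */
    { 0x3099, 0x95 },  /* KT266x: the quoted page says register 95, not 55 */
    { 0x3102, 0x55 },  /* assumption: same register as the Windows driver uses */
    { 0x3112, 0x55 },  /* KLE133, same assumption */
};

/* Return the register offset for `device`, or -1 if the ID is not listed. */
static int mwq_register_for(uint16_t device)
{
    for (size_t i = 0; i < sizeof(mwq_quirks) / sizeof(mwq_quirks[0]); i++) {
        if (mwq_quirks[i].device == device)
            return mwq_quirks[i].reg;
    }
    return -1;
}
```

Keeping the IDs and offsets in one table means a corrected register for any chipset is a one-line change.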

I am going to play with this right now myself using user-space tools and
experiment to see if it solves my issues with my 33-node beowulf cluster
(all nodes use the kt266) being very unstable.

Read again: I seriously believe that I am afflicted by this (or some other
unknown) hardware bug, as any Linux kernel I try is relatively unstable
(approx. 5% chance of a crash per day, 13% if we turn on Athlon
optimizations)! This means that in a few weeks, almost all of our beowulf
nodes go down if left unattended! And I KNOW it's not due to CPU
temperature, the wrong GCC version, or any other really obvious thing.

-Calin

2001-12-07 10:29:06

by Martin Eriksson

[permalink] [raw]
Subject: [PATCH][highly-experimental] via-mwq (Was: Re: VIA acknowledges North Bridge bug...)

----- Original Message -----
From: "Calin A. Culianu" <[email protected]>
To: "Petri Kaukasoina" <[email protected]>
Cc: <[email protected]>
Sent: Thursday, December 06, 2001 11:09 PM
Subject: Re: VIA acknowledges North Bridge bug (AKA Linux Kernel with Athlon

> Here is the webpage:
>
> This patch detects the 0305, 3099, 3102, and 3112 (KT133x, KT266x, VT8662,
> and KLE133) *only*. On these chipsets, it will patch register 55 in the
> Northbridge, which will supposedly switch off a Memory Write Queue timer.
> In the KT133A datasheet, register 55 is "reserved". But - yikes! - in the
> KT266, the documented MWQ register is register 95, not 55. Register 55
> contains unrelated DDR timing adjustments and could actually be dangerous
> to program. For this reason, I do not recommend installing this driver on
> the KT266x chipsets until VIA examines this issue. For now, use WPCREDIT
> and set bits 5, 6, and 7 to zero in register 95 instead."
>
> ----
>
> Clearly, we need to modify the via workaround patches to take into account
> the other via device id's (namely 3099, 3102, and 3112), and for each one
> change the appropriate register. Either register 55 or in the case of the
> kt266x, register 95. I am grepping through quirks.c right now and it
> seems this would be the correct file to modify.. any other suggestions on
> what file to modify?

I've (hastily) put these changes into "arch/i386/kernel/pci-pc.c" and had to
modify "include/linux/pci_ids.h" too.

The patch is included, but a warning: I have no VIA based computer that I
can test this on myself...

_____________________________________________________
| Martin Eriksson <[email protected]>
| MSc CSE student, department of Computing Science
| Umeå University, Sweden


Attachments:
via-mwq.patch (2.56 kB)

2001-12-07 12:47:43

by Pozsar Balazs

[permalink] [raw]
Subject: Re: [PATCH][highly-experimental] via-mwq (Was: Re: VIA acknowledges North Bridge bug...)


On Fri, 7 Dec 2001, Martin Eriksson wrote:

> I've (hastily) put these changes into "arch/i386/kernel/pci-pc.c" and had to
> modify "include/linux/pci_ids.h" too.
>
> The patch is included, but a warning: I have no VIA based computer that I
> can test this on myself...


I noticed one little typo:

[...]
+static void __init pci_fixup_via_kt266_athlon_bug(struct pci_dev *d)
+{
+    u8 v;
+    pci_read_config_byte(d, 0x95, &v);
+    if (v & 0xE0) {
+        printk("PCI: Disabling VIA VT8366 Memory Write Queue\n");
+        v &= 0x1f; /* clear bit 55.7, 6, 5 */
                                 ^^^^^
+        pci_write_config_byte(d, 0x95, v);
+    }
+}
+

It should also be 95 imho.

regards,
--
Balazs Pozsar

2001-12-07 18:46:49

by Calin A. Culianu

[permalink] [raw]
Subject: Re: [PATCH][highly-experimental] via-mwq (Was: Re: VIA acknowledges North Bridge bug...)


I wrote a patch as well.. and tested it.. It works fine. I would
recommend not writing an entirely new function. It's a bit more efficient
just to overload the athlon bug stomper function and register it for a few
extra chipsets.

I will post my patch that basically works just like your patch but has a
smaller footprint.. :)
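
Until that patch is posted, here is a stand-alone sketch of the idea of one shared fixup. It is not the patch itself. `struct fake_pci_dev` and `fixup_via_mwq` are made-up names, and the array accesses stand in for pci_read_config_byte() and pci_write_config_byte() so that the logic can be compiled outside the kernel. It assumes, as discussed in this thread, that only the KT266 (device 3099) uses register 0x95.

```c
#include <stdint.h>

/* Stand-in for struct pci_dev: a device ID plus a fake 256-byte config space. */
struct fake_pci_dev {
    uint16_t device;
    uint8_t  config[256];
};

/*
 * One fixup shared by several chipsets. Assumption from the thread:
 * the KT266 (0x3099) keeps the bits in register 0x95, the others in 0x55.
 * Returns 1 if the register was changed, 0 if bits 5-7 were already clear.
 */
static int fixup_via_mwq(struct fake_pci_dev *d)
{
    uint8_t where = (d->device == 0x3099) ? 0x95 : 0x55;
    uint8_t v = d->config[where];          /* stands in for the config read */

    if (!(v & 0xE0))
        return 0;                          /* nothing to clear */

    d->config[where] = (uint8_t)(v & 0x1F); /* stands in for the config write */
    return 1;
}
```

In the kernel, the same function would be registered once per device ID in the fixup table instead of duplicating the body for each chipset.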

-Calin


On Fri, 7 Dec 2001, Martin Eriksson wrote:

> ----- Original Message -----
> From: "Calin A. Culianu" <[email protected]>
> To: "Petri Kaukasoina" <[email protected]>
> Cc: <[email protected]>
> Sent: Thursday, December 06, 2001 11:09 PM
> Subject: Re: VIA acknowledges North Bridge bug (AKA Linux Kernel with Athlon
>
> > Here is the webpage:
> >
> > This patch detects the 0305, 3099, 3102, and 3112 (KT133x, KT266x, VT8662,
> > and KLE133) *only*. On these chipsets, it will patch register 55 in the
> > Northbridge, which will supposedly switch off a Memory Write Queue timer.
> > In the KT133A datasheet, register 55 is "reserved". But - yikes! - in the
> > KT266, the documented MWQ register is register 95, not 55. Register 55
> > contains unrelated DDR timing adjustments and could actually be dangerous
> > to program. For this reason, I do not recommend installing this driver on
> > the KT266x chipsets until VIA examines this issue. For now, use WPCREDIT
> > and set bits 5, 6, and 7 to zero in register 95 instead."
> >
> > ----
> >
> > Clearly, we need to modify the via workaround patches to take into account
> > the other via device id's (namely 3099, 3102, and 3112), and for each one
> > change the appropriate register. Either register 55 or in the case of the
> > kt266x, register 95. I am grepping through quirks.c right now and it
> > seems this would be the correct file to modify.. any other suggestions on
> > what file to modify?
>
> I've (hastily) put these changes into "arch/i386/kernel/pci-pc.c" and had to
> modify "include/linux/pci_ids.h" too.
>
> The patch is included, but a warning: I have no VIA based computer that I
> can test this on myself...
>
> _____________________________________________________
> | Martin Eriksson <[email protected]>
> | MSc CSE student, department of Computing Science
> | Umeå University, Sweden
>
>