2001-11-08 16:47:16

by Calin A. Culianu

Subject: Any lingering Athlon bugs in Kernel 2.4.14?


Hi, I am wondering if maybe there are any lingering Athlon bugs in Kernel
2.4.14?

I basically have a 33-node AMD Athlon Beowulf Cluster using the KT266
chipset. I compiled kernel 2.4.14 optimized for athlons.

If I leave the computers up for several days, without fail random nodes in
the beowulf start to drop like flies. Every other day, a different,
random node will get those Aiiiee messages and complain about some virtual
page request being invalid or somesuch, hanging the machine.

I am sure all the machines have good hardware as we ran thorough tests on
the machines using things like memtest86. I only started experiencing
problems since upgrading the kernels from the stock redhat kernels that
came with those machines.

I haven't yet tried just compiling the kernel without the Athlon
optimizations. I was wondering, though, if there are any known or
suspected issues with Athlons and the latest kernel?

Any help/advice/thoughts/even flames would be appreciated... :)

-Calin


2001-11-08 17:51:35

by Robert Love

Subject: Re: Any lingering Athlon bugs in Kernel 2.4.14?

On Thu, 2001-11-08 at 11:46, Calin A. Culianu wrote:
> Hi, I am wondering if maybe there are any lingering Athlon bugs in Kernel
> 2.4.14?
> [...]
> Any help/advice/thoughts/even flames would be appreciated... :)

Would you mind trying Alan's tree? Get linux-2.4.13 and
patch-2.4.13-ac7. The newest is 2.4.13-ac8, but stick with 7 for now.

Ie, give kernel 2.4.13-ac7 a whirl.

Robert Love

2001-11-08 18:17:28

by Calin A. Culianu

Subject: Re: Any lingering Athlon bugs in Kernel 2.4.14?

On 8 Nov 2001, Robert Love wrote:

> On Thu, 2001-11-08 at 11:46, Calin A. Culianu wrote:
> > Hi, I am wondering if maybe there are any lingering Athlon bugs in Kernel
> > 2.4.14?
> > [...]
> > Any help/advice/thoughts/even flames would be appreciated... :)
>
> Would you mind trying Alan's tree? Get linux-2.4.13 and
> patch-2.4.13-ac7. The newest is 2.4.13-ac8, but stick with 7 for now.

I wouldn't mind trying his tree at all. Does his tree somehow use the
older VM, or does it try to address Athlon bugs more aggressively? Ie: Why
is this a great idea? (Apart from Alan's tree just being really cool).

-Calin

> Ie, give kernel 2.4.13-ac7 a whirl.
>
> Robert Love
>

2001-11-08 19:18:28

by Robert Love

Subject: Re: Any lingering Athlon bugs in Kernel 2.4.14?

On Thu, 2001-11-08 at 13:17, Calin A. Culianu wrote:
> I wouldn't mind trying his tree at all. Does his tree somehow use the
> older VM, or does it try to address Athlon bugs more aggressively? Ie: Why
> is this a great idea? (Apart from Alan's tree just being really cool).

It does use the older VM, and more importantly it has some odds-and-ends fixes
that have yet to be incorporated into Linus's tree. And, yes, it is
just really cool :)

After that, I would look into compiling without optimization.

Also, what exactly happens on the systems? Do they hard lock? Do you
have an oops?

Robert Love

2001-11-08 21:43:49

by Gniazdowski

Subject: Re: Any lingering Athlon bugs in Kernel 2.4.14?

> older VM, or does it try to address Athlon bugs more aggressively? Ie: Why

Athlon have bugs ?!
Or are we talking about Athlon-optimizations bugs ? Or about Athlon SMP ?

Regards Gniazdowski Mariusz.

2001-11-08 21:54:49

by Wilson

Subject: Re: Any lingering Athlon bugs in Kernel 2.4.14?

----- Original Message -----
From: "Gniazdowski" <[email protected]>
To: <[email protected]>
Sent: Thursday, November 08, 2001 4:50 PM
Subject: Re: Any lingering Athlon bugs in Kernel 2.4.14?


> > older VM, or does it try to address Athlon bugs more aggressively? Ie: Why
>
> Athlon have bugs ?!
> Or are we talking about Athlon-optimizations bugs ? Or about Athlon SMP ?
>

Bugs in the Athlon optimizations present in the Linux kernel.


2001-11-08 22:16:49

by Mark Hahn

Subject: Re: Any lingering Athlon bugs in Kernel 2.4.14?

> Bugs in the Athlon optimizations present in the Linux kernel.

what bugs would those be? if you're thinking of the infamous
"my athlon dies when I boot a CONFIG_MK7 kernel on a kt133",
it is by all accounts a *chipset* bug, not a kernel bug.
it's still unclear whether the voodoo workaround
(in both linux and ac) is doing something sensible.

2001-11-08 22:38:31

by Wilson

Subject: Re: Any lingering Athlon bugs in Kernel 2.4.14?

----- Original Message -----
From: "Mark Hahn" <[email protected]>
To: "Wilson" <[email protected]>
Cc: <[email protected]>
Sent: Thursday, November 08, 2001 5:16 PM
Subject: Re: Any lingering Athlon bugs in Kernel 2.4.14?


> > Bugs in the Athlon optimizations present in the Linux kernel.
>
> what bugs would those be? if you're thinking of the infamous
> "my athlon dies when I boot a CONFIG_MK7 kernel on a kt133",
> it is by all accounts a *chipset* bug, not a kernel bug.
> it's still unclear whether the voodoo workaround
> (in both linux and ac) is doing something sensible.
>
Perhaps I should have said "Unfortunate interactions" rather than "bug."
The bottom line is that some people have trouble running Linux with "Athlon"
selected as the processor type.
I was just trying to reassure the original poster that there wasn't anything
wrong with the Athlon CPU itself.




2001-11-08 23:14:55

by Dan Hollis

Subject: Re: Any lingering Athlon bugs in Kernel 2.4.14?

On Thu, 8 Nov 2001, Gniazdowski wrote:
> > older VM, or does it try to address Athlon bugs more aggressively? Ie: Why
> Athlon have bugs ?!
> Or are we talking about Athlon-optimizations bugs ? Or about Athlon SMP ?

We're talking about VIA northbridge bugs triggered by athlon optimized
code.

-Dan
--
[-] Omae no subete no kichi wa ore no mono da. [-]

2001-11-08 23:22:46

by Alan

Subject: Re: Any lingering Athlon bugs in Kernel 2.4.14?

> > Or are we talking about Athlon-optimizations bugs ? Or about Athlon SMP ?
>
> Bugs in the Athlon optimizations present in the Linux kernel.

The only bugs we've seen recently appear to be in Athlon chipsets and/or
BIOS setup. 2.4.14 should sort those by poking around and doing what the
BIOS didn't

2001-11-09 23:30:38

by Calin A. Culianu

Subject: Re: Any lingering Athlon bugs in Kernel 2.4.14?


They get random NULL pointer dereference attempts from the kernel, as well
as some messages relating to errors in virtual page blah blah blah.
Really I should write down the error messages.. but basically the system
prints what looks like a black screen of death to the console and then
becomes completely non-responsive.


On 8 Nov 2001, Robert Love wrote:

> On Thu, 2001-11-08 at 13:17, Calin A. Culianu wrote:
> > I wouldn't mind trying his tree at all. Does his tree somehow use the
> > older VM, or does it try to address Athlon bugs more aggressively? Ie: Why
> > is this a great idea? (Apart from Alan's tree just being really cool).
>
> It does use the older VM, and more importantly it has some odd end fixes
> that have yet to be incorporated into Linus's tree. And, yes, it is
> just really cool :)
>
> After that, I would look into compiling without optimization.
>
> Also, what exactly happens on the systems? Do they hard lock? Do you
> have an oops?
>
> Robert Love
>

2001-11-09 23:33:18

by Calin A. Culianu

Subject: Re: Any lingering Athlon bugs in Kernel 2.4.14?

On Thu, 8 Nov 2001, Wilson wrote:

> ----- Original Message -----
> From: "Mark Hahn" <[email protected]>
> To: "Wilson" <[email protected]>
> Cc: <[email protected]>
> Sent: Thursday, November 08, 2001 5:16 PM
> Subject: Re: Any lingering Athlon bugs in Kernel 2.4.14?
>
>
> > > Bugs in the Athlon optimizations present in the Linux kernel.
> >
> > what bugs would those be? if you're thinking of the infamous
> > "my athlon dies when I boot a CONFIG_MK7 kernel on a kt133",
> > it is by all accounts a *chipset* bug, not a kernel bug.
> > it's still unclear whether the voodoo workaround
> > (in both linux and ac) is doing something sensible.
> >
> Perhaps I should have said "Unfortunate interactions" rather than "bug."
> The bottom line is that some people have trouble running Linux with "Athlon"
> selected as the processor type.
> I was just trying to reassure the original poster that there wasn't anything
> wrong with the Athlon CPU itself.

So you think I should turn Athlon optimizations off? Exactly what kinds
of problems were people who use Athlon optimizations experiencing?


>
>
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

2001-11-09 23:37:38

by Calin A. Culianu

Subject: Re: Any lingering Athlon bugs in Kernel 2.4.14?

On Thu, 8 Nov 2001, Alan Cox wrote:

> > > Or are we talking about Athlon-optimizations bugs ? Or about Athlon SMP ?
> >
> > Bugs in the Athlon optimizations present in the Linux kernel.
>
> The only bugs we've seen recently appear to be in Athlon chipsets and/or
> BIOS setup. 2.4.14 should sort those by poking around and doing what the
> BIOS didn't

Alan:

Specifically what chipsets are affected, and/or what things in the BIOS
can trigger problems? (I have VIA KT266 chipsets on SpaceWalker AK31
motherboards... 33 of them to be precise.. and many of the machines seem
to be somewhat unstable!)

-Calin


2001-11-09 23:35:09

by Robert Love

Subject: Re: Any lingering Athlon bugs in Kernel 2.4.14?

On Fri, 2001-11-09 at 18:30, Calin A. Culianu wrote:
> They get random NULL pointer dereference attempts from the kernel, as well
> as some messages relating to errors in virtual page blah blah blah.
> Really I should write down the error messages.. but basically the system
> prints what looks like a black screen of death to the console and then
> becomes completely non-responsive.

Any luck with 2.4.13-ac7 yet ?

Robert Love

2001-11-09 23:35:18

by Calin A. Culianu

Subject: Re: Any lingering Athlon bugs in Kernel 2.4.14?

On Thu, 8 Nov 2001, Dan Hollis wrote:

> On Thu, 8 Nov 2001, Gniazdowski wrote:
> > > older VM, or does it try to address Athlon bugs more aggressively? Ie: Why
> > Athlon have bugs ?!
> > Or are we talking about Athlon-optimizations bugs ? Or about Athlon SMP ?
>
> We're talking about VIA northbridge bugs triggered by athlon optimized
> code.

I have a VIA northbridge, and, verily, I turned Athlon optimizations on!
Should I not have done this?!?!?! What are the symptoms of these
bugs/problems?


>
> -Dan
>

2001-11-09 23:46:10

by Calin A. Culianu

Subject: Re: Any lingering Athlon bugs in Kernel 2.4.14?

On 9 Nov 2001, Robert Love wrote:

> On Fri, 2001-11-09 at 18:30, Calin A. Culianu wrote:
> > They get random NULL pointer dereference attempts from the kernel, as well
> > as some messages relating to errors in virtual page blah blah blah.
> > Really I should write down the error messages.. but basically the system
> > prints what looks like a black screen of death to the console and then
> > becomes completely non-responsive.
>
> Any luck with 2.4.13-ac7 yet ?

I haven't gotten authorization from people using this beowulf cluster to
try 2.4.13-ac7 yet! :) I did however compile it. I kind of reverted back
to 2.4.2 before I became convinced of 2.4.13-ac7's virtues... So you
really think 2.4.13-ac7 has some cool hw bug workarounds? I guess I should
read about what went into -ac7.... Where would be a good place to find
more info?

-Calin

>
> Robert Love
>

2001-11-10 09:01:43

by Luigi Genoni

Subject: Re: Any lingering Athlon bugs in Kernel 2.4.14?



On Fri, 9 Nov 2001, Calin A. Culianu wrote:

> On Thu, 8 Nov 2001, Alan Cox wrote:
>
> > > > Or are we talking about Athlon-optimizations bugs ? Or about Athlon SMP ?
> > >
> > > Bugs in the Athlon optimizations present in the Linux kernel.
> >
> > The only bugs we've seen recently appear to be in Athlon chipsets and/or
> > BIOS setup. 2.4.14 should sort those by poking around and doing what the
> > BIOS didn't
>
> Alan:
>
> Specifically what chipsets are affected, and/or what things in the BIOS
> can trigger problems? (I have VIA KT266 chipsets on SpaceWalker AK31
> motherboards... 33 of them to be precise.. and many of the machines seem
> to be somewhat unstable!)

VIA KT133 KT133 for sure, with Abit BIOS 1.3R, but we saw reports of other
BIOSes with similar problems.
A workaround for BIOSes that do not set bit 55.7 (bit 7 of register
0x55) to 0 has been merged into the main kernel starting from 2.4.11.

About symptoms: hang at boot, filesystem corruption under heavy I/O
load...
Do not worry, those are the kind of problems that one can hardly
ignore.

Luigi


2001-11-10 09:17:53

by Wayne Whitney

Subject: Re: Any lingering Athlon bugs in Kernel 2.4.14?

In mailing-lists.linux-kernel, Luigi Genoni wrote:

> On Fri, 9 Nov 2001, Calin A. Culianu wrote:

> > Specifically what chipsets are affected, and/or what things in the BIOS
> > can trigger problems?
>
> VIA KT133 KT133 for sure, with abit bios 1.3R. but We saw report of other
> bios with similar problema.

I just wanted to ask for a clarification, because "VIA KT133 KT133"
looks like a typo of some sort. Does the problem really affect some
KT133 motherboards? I thought it was KT133A specific.

Thanks, Wayne

2001-11-10 13:34:11

by Luigi Genoni

Subject: Re: Any lingering Athlon bugs in Kernel 2.4.14?

Yes, it was a typo: KT133 and KT133A.
The BIOS bug has been reported for many KT133A chipsets, but if you take
the time to consult the lkml archive, some reports concern the KT133
chipset.



On Sat, 10 Nov 2001, Wayne Whitney wrote:

> In mailing-lists.linux-kernel, Luigi Genoni wrote:
>
> > On Fri, 9 Nov 2001, Calin A. Culianu wrote:
>
> > > Specifically what chipsets are affected, and/or what things in the BIOS
> > > can trigger problems?
> >
> > VIA KT133 KT133 for sure, with abit bios 1.3R. but We saw report of other
> > bios with similar problema.
>
> I just wanted to ask for a clarification, because "VIA KT133 KT133"
> looks like a typo of some sort. Does the problem really affect some
> KT133 motherboards? I thought it was KT133A specific.
>
> Thanks, Wayne
>

2001-11-10 13:44:39

by Alan

Subject: Re: Any lingering Athlon bugs in Kernel 2.4.14?

> really think 2.4.13-ac7 has some cool hw bug workarounds? I guess I should
> read about what went into -ac7.... Where would be a good place to find
> more info?

If you want to be predictable about your test set then you can simply pull
the VIA Athlon workaround pci quirk from 2.4.13-ac or 2.4.14 and merge it
with your base 2.4.2, or 2.4.2-rh whatever tree.

In fact you can do it in userspace with setpci if that's politically optimal
8)


Alan

2001-11-11 05:06:14

by Calin A. Culianu

Subject: Re: Any lingering Athlon bugs in Kernel 2.4.14?

On Sat, 10 Nov 2001, Alan Cox wrote:

> > really think 2.4.13-ac7 has some cool hw bug workarounds? I guess I should
> > read about what went into -ac7.... Where would be a good place to find
> > more info?
>
> If you want to be predictable about your test set then you can simply pull
> the VIA Athlon workaround pci quirk from 2.4.13-ac or 2.4.14 and merge it
> with your base 2.4.2, or 2.4.2-rh whatever tree.
>
> In fact you can do it in userspace with setpci if that's politically optimal
> 8)
>
>
> Alan
>

Alan:

Good idea. I actually would like to be scientific about it by sticking
to the kernel that the machines came with, and just trying the VIA Athlon
workaround pci quirk on those kernels. That way I can experiment and see
if it makes much of a difference.

I suppose this question is stuff I can find elsewhere, but: where do I find
just that patch alone (so that I can see about adapting it to my redhat
kernel) and/or how do I do it from userspace ;) (I assume I just have to
write to some registers or something using some user-space mechanism that
I am unaware of?).

-Calin

2001-11-11 16:48:40

by Wayne Whitney

Subject: Re: Any lingering Athlon bugs in Kernel 2.4.14?

In mailing-lists.linux-kernel, you wrote:

> I suppose this question is stuff I can find elsewhere but: where do I find
> just that patch alone (so that I can see about adapting it to my redhat
> kernel)

The relevant function is from arch/i386/kernel/pci-pc.c:

static void __init pci_fixup_via_athlon_bug(struct pci_dev *d)
{
	u8 v;
	pci_read_config_byte(d, 0x55, &v);
	if (v & 0x80) {
		printk("Trying to stomp on Athlon bug...\n");
		v &= 0x7f; /* clear bit 55.7 */
		pci_write_config_byte(d, 0x55, v);
	}
}

Note also this line from struct pci_fixup pcibios_fixups[] in the same
file:

{ PCI_FIXUP_HEADER, PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_8363_0, pci_fixup_via_athlon_bug },

I believe that all you have to do is add the above line to
pcibios_fixups[] and add the above function to pci-pc.c.

> and/or how do I do it from userspace ;) (i assume i just have to
> write to some registers or something using some user-space mechanism
> that i am unaware of???).

You can see what pci_fixup_via_athlon_bug does: it clears bit 7 of
register 0x55 of the PCI device VIA 8363. On a machine that has a VIA
8363 (the northbridge), I believe it will be at PCI address 0:0.0 (bus 0,
device 0, function 0). You can check this with 'lspci -n -s
0:0.0'; it should say '00:00.0 Class 0600: 1106:0305' followed by a
revision (2 for the KT133, 3 for the KT133A).

Then as root, you do 'setpci -s 0:0.0 55' to query the register, do
the computation of clearing bit 7 of the result to get a value YY, and
do 'setpci -s 0:0.0 55=YY' to set the register. Note that both 55 and
YY are in hexadecimal.
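
The "clear bit 7 to get YY" arithmetic can be sketched in C. This is
only an illustration of the masking step, not part of the kernel patch:
clear_bit7 and format_setpci_cmd are hypothetical helper names, and the
0:0.0 address is the assumption stated above, so check it with lspci
first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: clear bit 7 of an 8-bit register value,
 * the same masking as "v &= 0x7f" in pci_fixup_via_athlon_bug. */
static uint8_t clear_bit7(uint8_t v)
{
	return (uint8_t)(v & 0x7f);
}

/* Hypothetical helper: build the setpci command that would write the
 * masked value back. It assumes the northbridge sits at 0:0.0 and
 * only formats a string; it does not touch any hardware.
 * Returns the length reported by snprintf. */
static int format_setpci_cmd(char *buf, size_t len, uint8_t current)
{
	assert(buf != NULL);
	return snprintf(buf, len, "setpci -s 0:0.0 55=%02x",
			(unsigned)clear_bit7(current));
}
```

For instance, if the query had returned 89 (a made-up reading, not one
taken from real hardware), clearing bit 7 gives 09, so YY would be 09.
A value with bit 7 already clear, such as 09, comes back unchanged,
which corresponds to the case where the kernel function skips the write.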

Cheers, Wayne

2001-11-19 18:48:57

by Bill Davidsen

Subject: Re: Any lingering Athlon bugs in Kernel 2.4.14?

In article <Pine.LNX.4.10.10111081706491.31943-100000@coffee.psychology.mcmaster.ca>
[email protected] wrote:
>> Bugs in the Athlon optimizations present in the Linux kernel.
>
>what bugs would those be? if you're thinking of the infamous
>"my athlon dies when I boot a CONFIG_MK7 kernel on a kt133",
>it is by all accounts a *chipset* bug, not a kernel bug.
>it's still unclear whether the voodoo workaround
>(in both linux and ac) is doing something sensible.

Without the voodoo the Athlon is a very dubious chip to use indeed...
because user mode code can and will use Athlon optimizations which hang
the system. This is a case of "I do it because if I don't the system
doesn't work right."

--
bill davidsen <[email protected]>
His first management concern is not solving the problem, but covering
his ass. If he lived in the middle ages he'd wear his codpiece backward.