2003-11-02 06:13:30

by Geoffrey Lee

[permalink] [raw]
Subject: [patch] reproducible athlon mce fix

Hi all,


After switching from 2.4.22 to 2.6.0-test9, I have received reproducible
MCE non-fatal error check messages in my kernel log. (For example, one
shows up right after my first scsi card init).

>From Dave Jones' patch here:

http://lists.insecure.org/lists/linux-kernel/2003/Sep/7362.html

and another message here:

http://lkml.org/lkml/2003/10/7/214

would seem to imply that Athlons don't like having their Bank 0 poked at,
though that's what non-fatal.c does. Would it be correct to make sure
that that non-fatal.c starts at bank 1, if it is an Athlon?

Dave, is the following patch correct? Booted and tested, no ill effects
so far ...


- g.


Attachments:
(No filename) (660.00 B)
mce-fix.patch (428.00 B)
Download all attachments

2003-11-02 07:25:49

by Willy Tarreau

[permalink] [raw]
Subject: Re: [patch] reproducible athlon mce fix

Hi,

I don't know if the patch is correct, but :

On Sun, Nov 02, 2003 at 01:57:48PM +0800, Geoffrey Lee wrote:
> preempt_disable();
> +#if CONFIG_MK7
> + for (i=1; i<nr_mce_banks; i++) {
> +#else
> for (i=0; i<nr_mce_banks; i++) {
> +#endif

Including opening braces within #if often fools editors such as emacs
which count them and don't know about #if. Then, editing the rest of
the file can become annoying because it simply thinks that there are
two embedded for loops.

Other solutions to this problem often include one of these constructs
which are unfortunately not as beautiful :

1) a still readable one
#if CONFIG_MK7
i=1;
#else
i=0;
#endif
for (; i<nr_mce_banks; i++) {

2) when there's absolutely no choice (eg changing part of an
expression...) something unreadable like this.

for (
#if CONFIG_MK7
i=1;
#else
i=0;
#endif
i<nr_mce_banks; i++) {

I'm sure other people have other solution that don't come to mind.

Cheers,
Willy

2003-11-02 07:44:24

by Geoffrey Lee

[permalink] [raw]
Subject: Re: [patch] reproducible athlon mce fix

On Sun, Nov 02, 2003 at 08:25:19AM +0100, Willy Tarreau wrote:
> Hi,
>
>
> Other solutions to this problem often include one of these constructs
> which are unfortunately not as beautiful :
>


Hi,

Yep, I know it's ugly, but what I am really after is whether the patch
is correct or not, in terms of what it does.

If it's correct we can work out how to fix this properly, since non-fatal.c
is generic code and I am not sure if doing a CONFIG_MK7 like that is very
pretty there anyway ...

- g.

2003-11-02 09:00:33

by Zwane Mwaikambo

[permalink] [raw]
Subject: Re: [patch] reproducible athlon mce fix

On Sun, 2 Nov 2003, Willy Tarreau wrote:

> 1) a still readable one
> #if CONFIG_MK7
> i=1;
> #else
> i=0;
> #endif
> for (; i<nr_mce_banks; i++) {
>
> 2) when there's absolutely no choice (eg changing part of an
> expression...) something unreadable like this.
>
> for (
> #if CONFIG_MK7
> i=1;
> #else
> i=0;
> #endif
> i<nr_mce_banks; i++) {
>
> I'm sure other people have other solution that don't come to mind.

Perhaps..

#ifdef CONFIG_MK7
#define MCE_BANK_START 1
#else
#define MCE_BANK_START 0
#endif

for (i = MCE_BANK_START; nr_mce_banks; i++)

2003-11-02 12:52:40

by Dave Jones

[permalink] [raw]
Subject: Re: [patch] reproducible athlon mce fix

On Sun, Nov 02, 2003 at 01:57:48PM +0800, Geoffrey Lee wrote:

> preempt_disable();
> +#if CONFIG_MK7
> + for (i=1; i<nr_mce_banks; i++) {
> +#else
> for (i=0; i<nr_mce_banks; i++) {
> +#endif
> rdmsr (MSR_IA32_MC0_STATUS+i*4, low, high);

This needs to be a runtime check. In 2.6, a K7 can boot
a P4 kernel, and vice versa.

Dave

--
Dave Jones http://www.codemonkey.org.uk

2003-11-02 19:27:58

by Luca

[permalink] [raw]
Subject: Re: [patch] reproducible athlon mce fix

Geoffrey Lee ha scritto:
> After switching from 2.4.22 to 2.6.0-test9, I have received reproducible
> MCE non-fatal error check messages in my kernel log. (For example, one
> shows up right after my first scsi card init).
[cut]
> would seem to imply that Athlons don't like having their Bank 0 poked at,
> though that's what non-fatal.c does. Would it be correct to make sure
> that that non-fatal.c starts at bank 1, if it is an Athlon?
>
> --- linux-2.6.0-test9/arch/i386/kernel/cpu/mcheck/non-fatal.c.orig 2003-11-02 13:31:43.000000000 +0800
> +++ linux-2.6.0-test9/arch/i386/kernel/cpu/mcheck/non-fatal.c 2003-11-02 13:34:37.000000000 +0800
> @@ -30,7 +30,11 @@
> int i;
>
> preempt_disable();
> +#if CONFIG_MK7
> + for (i=1; i<nr_mce_banks; i++) {
> +#else
> for (i=0; i<nr_mce_banks; i++) {
> +#endif
> rdmsr (MSR_IA32_MC0_STATUS+i*4, low, high);
>
> if (high & (1<<31)) {

In this way you don't read from bank 0. The strange thing is that
amd_mcheck_init doesn't enable reporting on this bank... it should stay
clean. What's going on here?

Luca
--
Reply-To: [email protected]
Home: http://kronoz.cjb.net
Quando un uomo porta dei fiori a sua moglie senza motivo,
un motivo c'e`.

2003-11-03 13:25:41

by Dave Jones

[permalink] [raw]
Subject: Re: [patch] reproducible athlon mce fix

On Mon, Nov 03, 2003 at 05:20:48PM +0800, Geoffrey Lee wrote:

> Would checking boot_cpu_data.x86_vendor == X86_VENDOR_AMD and
> boot_cpu_data.x86 == 6 be sufficient? It seems to do the right thing ..
> Updated patch attached.

Yup, looks to be pretty much the same change I made in my local tree.

Dave

2003-11-09 06:22:08

by Geoffrey Lee

[permalink] [raw]
Subject: Re: [patch] reproducible athlon mce fix

On Sun, Nov 02, 2003 at 07:25:56PM +0100, Kronos wrote:

> In this way you don't read from bank 0. The strange thing is that



In any case:

(1) only does it for k7 (which seems to do the right thing)
(2) k7.c mcheck doesn't read from bank 0 as well

> amd_mcheck_init doesn't enable reporting on this bank... it should stay
> clean. What's going on here?
>


Notice how k7.c doesn't read from bank 0 either, and this was the fix
submitted by Dave earlier on for k7.c but was not done for non-fatal.c.


- g.
--
\x42\x20\x70\x65\x6f\x70\x6c\x65\x20\x61\x72\x65\x20\x77\x61\x6e\x6b\x65\x72\x73

2003-11-09 06:23:18

by Geoffrey Lee

[permalink] [raw]
Subject: Re: [patch] reproducible athlon mce fix

On Sun, Nov 02, 2003 at 12:52:03PM +0000, Dave Jones wrote:
> On Sun, Nov 02, 2003 at 01:57:48PM +0800, Geoffrey Lee wrote:
>
> > preempt_disable();
> > +#if CONFIG_MK7
> > + for (i=1; i<nr_mce_banks; i++) {
> > +#else
> > for (i=0; i<nr_mce_banks; i++) {
> > +#endif
> > rdmsr (MSR_IA32_MC0_STATUS+i*4, low, high);
>
> This needs to be a runtime check. In 2.6, a K7 can boot
> a P4 kernel, and vice versa.
>

Yes, of course.

Would it be sufficient to check for the boot_cpu_data.x86 == 6 and
boot_cpu_data.x86_vendor == X86_VENDOR_AMD? This would seem to do
the right thing. Updated patch attached.


- g.


Attachments:
(No filename) (629.00 B)
mce-fix.patch (1.04 kB)
Download all attachments

2003-11-09 06:22:05

by Geoffrey Lee

[permalink] [raw]
Subject: Re: [patch] reproducible athlon mce fix

On Sun, Nov 02, 2003 at 12:52:03PM +0000, Dave Jones wrote:
> On Sun, Nov 02, 2003 at 01:57:48PM +0800, Geoffrey Lee wrote:
>
> > preempt_disable();
> > +#if CONFIG_MK7
> > + for (i=1; i<nr_mce_banks; i++) {
> > +#else
> > for (i=0; i<nr_mce_banks; i++) {
> > +#endif
> > rdmsr (MSR_IA32_MC0_STATUS+i*4, low, high);
>
> This needs to be a runtime check. In 2.6, a K7 can boot
> a P4 kernel, and vice versa.
>


(Resending as it seems to have eaten my mail due to dns problems ... apologies
if you get this twice.)


Would checking boot_cpu_data.x86_vendor == X86_VENDOR_AMD and
boot_cpu_data.x86 == 6 be sufficient? It seems to do the right thing ..
Updated patch attached.

- g.


Attachments:
(No filename) (702.00 B)
mce-fix.patch (1.04 kB)
Download all attachments

2003-11-11 00:05:45

by Bill Davidsen

[permalink] [raw]
Subject: Re: [patch] reproducible athlon mce fix

In article <[email protected]>,
Willy Tarreau <[email protected]> wrote:

| I don't know if the patch is correct, but :
|
| On Sun, Nov 02, 2003 at 01:57:48PM +0800, Geoffrey Lee wrote:
| > preempt_disable();
| > +#if CONFIG_MK7
| > + for (i=1; i<nr_mce_banks; i++) {
| > +#else
| > for (i=0; i<nr_mce_banks; i++) {
| > +#endif
|
| Including opening braces within #if often fools editors such as emacs
| which count them and don't know about #if. Then, editing the rest of
| the file can become annoying because it simply thinks that there are
| two embedded for loops.

Wouldn't it be easier to just move the brace out of the ifdef and put it
on a line by itself? Readable, doesn't confuse, etc?

preempt_disable();
+#if CONFIG_MK7
+ for (i=1; i<nr_mce_banks; i++) {
+#else
- for (i=0; i<nr_mce_banks; i++) {
+ for (i=0; i<nr_mce_banks; i++)
+#endif
{

or similar. Otherwise I guess the solution defining a starting value
would be "less unreadable."


--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.