2007-11-12 20:39:49

by Max Asbock

Subject: x86 32-bit machine check handler

Now that the 32-bit and 64-bit x86 machine check handlers live next to
each other a certain asymmetry in functionality is apparent. Notably,
the 64-bit machine check handler implements a timer that periodically
polls for silent machine check errors and makes them accessible to user
space through /dev/mcelog. Are there reasons the x86 32-bit machine
check handler couldn't do the same?

thanks,
Max



2007-11-12 21:24:26

by H. Peter Anvin

Subject: Re: x86 32-bit machine check handler

Max Asbock wrote:
> Now that the 32-bit and 64-bit x86 machine check handlers live next to
> each other a certain asymmetry in functionality is apparent. Notably,
> the 64-bit machine check handler implements a timer that periodically
> polls for silent machine check errors and makes them accessible to user
> space through /dev/mcelog. Are there reasons the x86 32-bit machine
> check handler couldn't do the same?

No, and in fact, it should.

-hpa

2007-11-13 14:15:58

by Andi Kleen

Subject: Re: x86 32-bit machine check handler

Max Asbock writes:

> Now that the 32-bit and 64-bit x86 machine check handlers live next to
> each other a certain asymmetry in functionality is apparent. Notably,
> the 64-bit machine check handler implements a timer that periodically
> polls for silent machine check errors and makes them accessible to user
> space through /dev/mcelog.

Actually 32bit implements that too (non-fatal.c). But it misses some
of the more advanced functionality like AMD Threshold Interrupts.
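
To make the polling idea concrete, here is a minimal userspace sketch in C of what one timer tick would do: walk the machine check banks, pick up any status register whose valid bit (bit 63 of MCi_STATUS) is set, record it, and clear the bank. The names poll_banks and struct mce_record are hypothetical, and the status array is only a stand-in for the per-bank MSRs that a real handler would read with rdmsr. It is not the code in mce_64.c or non-fatal.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MCI_STATUS_VAL (1ULL << 63)  /* bit 63: bank holds a valid error */

/* Hypothetical log record; fixed-width so it is the same on 32 and 64 bit. */
struct mce_record {
    unsigned bank;
    uint64_t status;
};

/*
 * Poll every bank once. `status` models the per-bank status MSRs.
 * Valid entries are copied into `log` and the bank is cleared so the
 * same error is not reported again on the next tick.
 * Returns the number of records written (at most log_cap).
 */
static size_t poll_banks(uint64_t *status, unsigned nbanks,
                         struct mce_record *log, size_t log_cap)
{
    size_t n = 0;

    assert(status != NULL && log != NULL);

    for (unsigned i = 0; i < nbanks && n < log_cap; i++) {
        if (!(status[i] & MCI_STATUS_VAL))
            continue;           /* nothing logged in this bank */
        log[n].bank = i;
        log[n].status = status[i];
        n++;
        status[i] = 0;          /* acknowledge: clear the bank */
    }
    return n;
}
```

A periodic timer would call poll_banks() and hand the collected records to whatever exports them to user space. Banks left unread because the log filled up keep their valid bit and are picked up on a later pass.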

> Are there reasons the x86 32-bit machine
> check handler couldn't do the same?

The 32bit machine check code has some serious design problems. The
best option would probably be to just move 32bit over to the 64bit code too. In
fact there was a patch to do that some time ago, but it ran into some
minor problems and was unfortunately never merged. But it would be the
right thing to do.

The only missing functionality on the 64bit side would be support for
old non-IA-compliant machine checks like P5 or WinChip. One option
would be to simply drop them. AFAIK these CPUs don't really have
anywhere near usable machine check capability anyways so dropping it
would not make much difference. Or alternatively keep p5.c/winchip.c
around. But if you look at them they don't do much except simple
printk with not much information and printk in a machine check handler
is always wrong because it can deadlock. I personally would prefer
dropping.

And I think one or two K7 quirks are also missing on 64bit, but these
would be very easy to add. Other than that it should just work on
32bit CPUs.

-Andi

2007-11-15 01:06:49

by Max Asbock

Subject: Re: x86 32-bit machine check handler

On Tue, 2007-11-13 at 15:15 +0100, Andi Kleen wrote:
> Max Asbock writes:
>
> > Now that the 32-bit and 64-bit x86 machine check handlers live next to
> > each other a certain asymmetry in functionality is apparent. Notably,
> > the 64-bit machine check handler implements a timer that periodically
> > polls for silent machine check errors and makes them accessible to user
> > space through /dev/mcelog.
>
> Actually 32bit implements that too (non-fatal.c). But it misses some
> of the more advanced functionality like AMD Threshold Interrupts.
>
> > Are there reasons the x86 32-bit machine
> > check handler couldn't do the same?
>
> The 32bit machine check code has some serious design problems. The
> best option would probably be to just move 32bit over to the 64bit code too. In
> fact there was a patch to do that some time ago, but it ran into some
> minor problems and was unfortunately never merged. But it would be the
> right thing to do.

I found a patch from about three years ago that implemented a 32-bit
version of the x86_64 machine check handler. Do you know of any newer
attempts?
However, given the merge of x86, a single implementation should be able
to handle both the 32-bit and 64-bit cases. I tried to build the 64-bit
machine check handler (mce_64.c) for 32-bit to see what kind of problems it
would run into. So far I found a few things:
- there is no idle_notifier_register in 32-bit x86
- there is no oops_begin in 32-bit x86
- register names are different (rip, cs)
- some data types would have to be adjusted to be 64 bit
The issues seem to be surmountable.

> The only missing functionality on the 64bit side would be support for
> old non-IA-compliant machine checks like P5 or WinChip. One option
> would be to simply drop them. AFAIK these CPUs don't really have
> anywhere near usable machine check capability anyways so dropping it
> would not make much difference. Or alternatively keep p5.c/winchip.c
> around. But if you look at them they don't do much except simple
> printk with not much information and printk in a machine check handler
> is always wrong because it can deadlock. I personally would prefer
> dropping.
>
> And I think one or two K7 quirks are also missing on 64bit, but these
> would be very easy to add. Other than that it should just work on
> 32bit CPUs.
>
So it looks like giving 32-bit x86 the same machine check support as in
64-bit is both feasible and desirable.
Are there any plans to do this or is anybody currently working on it?

thanks,
Max


2007-11-15 05:36:55

by Andi Kleen

Subject: Re: x86 32-bit machine check handler

> I found a patch from about three years ago that implemented a 32-bit
> version of the x86_64 machine check handler. Do you know of any newer
> attempts?

No.

> However, given the merge of x86, a single implementation should be able
> to handle both the 32-bit and 64-bit cases. I tried to build the 64-bit
> machine check handler (mce_64.c) for 32-bit to see what kind of problems it
> would run into. So far I found a few things:
> - there is no idle_notifier_register in 32-bit x86

There used to be one, just needs to be readded.

> - there is no oops_begin in 32-bit x86
> - register names are different (rip, cs)

regs->rip -> instruction_pointer()
->cs just needs a similar macro
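
As a rough illustration of the accessor approach, the sketch below hides the register-name difference behind two macros so that the code filling in a machine check record is identical for both layouts. Everything here is a userspace model under stated assumptions: struct pt_regs_model, regs_ip(), regs_cs(), struct mce_model, mce_fill() and the MODEL_32BIT switch are hypothetical names, and the field names only mimic the 32-bit (eip, xcs) and 64-bit (rip, cs) register structs. In the kernel itself instruction_pointer() would play the role of regs_ip().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifdef MODEL_32BIT
/* Model of the 32-bit register layout (assumed field names). */
struct pt_regs_model { uint32_t eip; uint32_t xcs; };
#define regs_ip(r) ((uint64_t)(r)->eip)
#define regs_cs(r) ((uint64_t)(r)->xcs)
#else
/* Model of the 64-bit register layout. */
struct pt_regs_model { uint64_t rip; uint64_t cs; };
#define regs_ip(r) ((r)->rip)
#define regs_cs(r) ((r)->cs)
#endif

/* Record fields are 64 bit wide regardless of the build. */
struct mce_model {
    uint64_t ip;
    uint64_t cs;
};

/* Shared code: no direct reference to rip/eip or cs/xcs. */
static void mce_fill(struct mce_model *m, const struct pt_regs_model *regs)
{
    assert(m != NULL && regs != NULL);
    m->ip = regs_ip(regs);
    m->cs = regs_cs(regs);
}
```

Making the record fields uint64_t also covers the data type point from the list above: the record layout stays the same on a 32-bit build, where unsigned long would only be 32 bits wide.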

> So it looks like giving 32-bit x86 the same machine check support as in
> 64-bit is both feasible and desirable.

Yep.

-Andi