Subject: [PATCH 0/2] x86: mcheck: suspend/resume bug fixes

Following two patches fix suspend/resume issues with current kernel.
Patches are against v2.6.28-rc8-1-g6c34bc2
IMHO this should go into .28.


Thanks,

Andreas


Subject: [PATCH 1/2] x86: don't enable MCE after suspend/resume when system was booted with "nomce"

Impact: fix suspend/resume bug with MCE

A suspend/resume cycle unconditionally enables MCE
for the boot CPU if MCE is compiled into the kernel.
I.e. even a system booted with "nomce" configures MCE for the boot CPU.

This patch ensures that MCE is not turned on for systems booted with
"nomce".

Signed-off-by: Andreas Herrmann <[email protected]>
---
arch/x86/kernel/cpu/mcheck/mce_64.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce_64.c b/arch/x86/kernel/cpu/mcheck/mce_64.c
index 4b031a4..e2d9649 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_64.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_64.c
@@ -443,6 +443,9 @@ static void mce_init(void *dummy)
u64 cap;
int i;

+ if (mce_dont_init)
+ return;
+
rdmsrl(MSR_IA32_MCG_CAP, cap);
banks = cap & 0xff;
if (banks > MCE_EXTENDED_BANK) {
--
1.6.0.4


Subject: [PATCH 2/2] x86: re-enable MCE on secondary CPUS after suspend/resume

Impact: fix suspend/resume bug with MCE

After suspend/resume MCx_CTL registers of secondary CPUs are cleared.
(At least that's what I've observed on several systems.)
Linux currently only re-initializes MCE on the boot CPU - see mce_resume().
Thus after suspend/resume we end up with a system where MCE is active
on the boot CPU but switched off on all other CPUs.

By calling mce_init() whenever a CPU comes online this problem is
solved.

Signed-off-by: Andreas Herrmann <[email protected]>
---
arch/x86/kernel/cpu/mcheck/mce_64.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce_64.c b/arch/x86/kernel/cpu/mcheck/mce_64.c
index e2d9649..0174539 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_64.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_64.c
@@ -889,6 +889,7 @@ static int __cpuinit mce_cpu_callback(struct notifier_block *nfb,
mce_create_device(cpu);
if (threshold_cpu_callback)
threshold_cpu_callback(action, cpu);
+ mce_init(NULL);
break;
case CPU_DEAD:
case CPU_DEAD_FROZEN:
--
1.6.0.4


2008-12-12 19:06:21

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 2/2] x86: re-enable MCE on secondary CPUS after suspend/resume

Andreas Herrmann <[email protected]> writes:

> Impact: fix suspend/resume bug with MCE
>
> After suspend/resume MCx_CTL registers of secondary CPUs are cleared.
> (At least that's what I've observed on several systems.)
> Linux currently only re-initializes MCE on the boot CPU - see mce_resume().
> Thus after suspend/resume we end up with a system where MCE is active
> on the boot CPU but switched off on all other CPUs.
>
> By calling mce_init() whenever a CPU comes online this problem is
> solved.

Can you double check that please?

Suspend/resume are supposted to hotunplug all CPUs except the BP and
then re-online them on resume (with "disable_nonboot_cpus()) . The
re-online initializes MCEs in the standard CPU bootup path.

A good way is to stick a WARN_ON(num_online_cpus() > 1) into
mce_suspend(). I had that here for some time and didn't see
it trigger.

I got a couple of suspend bug fixes in my mce improvement tree, see:

http://git.kernel.org/?p=linux/kernel/git/mingo/linux-2.6-x86.git;a=history;f=arch/x86/kernel/cpu/mcheck/mce_64.c;h=9512a7eab4e7b03a584f5bb647bd242bd4c003dc;hb=x86/mce

During review it was decided to all defer it to .29 though.

-Andi

--
[email protected]

2008-12-12 19:08:18

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 1/2] x86: don't enable MCE after suspend/resume when system was booted with "nomce"

Andreas Herrmann <[email protected]> writes:

> Impact: fix suspend/resume bug with MCE
>
> A suspend/resume cycle unconditionally enables MCE
> for the boot CPU if MCE is compiled into the kernel.
> I.e. even a system booted with "nomce" configures MCE for the boot CPU.
>
> This patch ensures that MCE is not turned on for systems booted with
> "nomce".

I have that fixed in a better way in

http://git.kernel.org/?p=linux/kernel/git/mingo/linux-2.6-x86.git;a=commit;h=81c7af997dafbabad464ccefba89af2b247899da

I believe


-Andi

--
[email protected]

Subject: Re: [PATCH 1/2] x86: don't enable MCE after suspend/resume when system was booted with "nomce"

On Fri, Dec 12, 2008 at 08:08:28PM +0100, Andi Kleen wrote:
> Andreas Herrmann <[email protected]> writes:
>
> > Impact: fix suspend/resume bug with MCE
> >
> > A suspend/resume cycle unconditionally enables MCE
> > for the boot CPU if MCE is compiled into the kernel.
> > I.e. even a system booted with "nomce" configures MCE for the boot CPU.
> >
> > This patch ensures that MCE is not turned on for systems booted with
> > "nomce".
>
> I have that fixed in a better way in

> http://git.kernel.org/?p=linux/kernel/git/mingo/linux-2.6-x86.git;a=commit;h=81c7af997dafbabad464ccefba89af2b247899da

Thanks for that pointer.
I didn't look at tip/x86/mce up to know. Is this .28 material?
My fix is intended for .28.

I didn't care about the lurking sysfs interface when booted with
"nomce". It's there and you could enable MCE with changing attributes
but then you are actively fiddling with MCE.

The fix is for the more severe problem that mce_init() is called on
resume and thus will enable MCE (setup cr4 and corresponding MSRs)
which shouldn't happen when booted with "nomce".


Regards,

Andreas

Subject: Re: [PATCH 2/2] x86: re-enable MCE on secondary CPUS after suspend/resume

On Fri, Dec 12, 2008 at 08:06:21PM +0100, Andi Kleen wrote:
> Andreas Herrmann <[email protected]> writes:
>
> > Impact: fix suspend/resume bug with MCE
> >
> > After suspend/resume MCx_CTL registers of secondary CPUs are cleared.
> > (At least that's what I've observed on several systems.)
> > Linux currently only re-initializes MCE on the boot CPU - see mce_resume().
> > Thus after suspend/resume we end up with a system where MCE is active
> > on the boot CPU but switched off on all other CPUs.
> >
> > By calling mce_init() whenever a CPU comes online this problem is
> > solved.
>
> Can you double check that please?
>
> Suspend/resume are supposted to hotunplug all CPUs except the BP and
> then re-online them on resume (with "disable_nonboot_cpus()) . The
> re-online initializes MCEs in the standard CPU bootup path.

For BP we have

/* On resume clear all MCE state. Don't want to see leftovers from the BIOS.
Only one CPU is active at this time, the others get readded later using
CPU hotplug. */
static int mce_resume(struct sys_device *dev)
{
mce_init(NULL);
return 0;
}

For APs mcheck_init() is called on resume. But as the respective bit
for an AP is usually set in "mce_cpus" after boot (which is correct, I
think) mcheck_init does not call mce_init, see:

void __cpuinit mcheck_init(struct cpuinfo_x86 *c)
{
static cpumask_t mce_cpus = CPU_MASK_NONE;

mce_cpu_quirks(c);

if (mce_dont_init ||
cpu_test_and_set(smp_processor_id(), mce_cpus) ||
!mce_available(c))
=> return;

mce_init(NULL);
mce_cpu_features(c);
}

But we need to call mce_init to clear all MCE state.
IMHO the best location to call mce_init for APs is the cpu notifier.


Regards,

Andreas

2008-12-15 22:11:17

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 1/2] x86: don't enable MCE after suspend/resume when system was booted with "nomce"

> > http://git.kernel.org/?p=linux/kernel/git/mingo/linux-2.6-x86.git;a=commit;h=81c7af997dafbabad464ccefba89af2b247899da
>
> Thanks for that pointer.
> I didn't look at tip/x86/mce up to know. Is this .28 material?

I would think so, but I'm not sure the x86 maintainers think the same.

> My fix is intended for .28.
>
> I didn't care about the lurking sysfs interface when booted with
> "nomce". It's there and you could enable MCE with changing attributes
> but then you are actively fiddling with MCE.
>
> The fix is for the more severe problem that mce_init() is called on
> resume and thus will enable MCE (setup cr4 and corresponding MSRs)
> which shouldn't happen when booted with "nomce".

The patch does the same (although the description doesn't say that).
When there is no sysfs interface there won't
be any resume either because resume goes through the sysdev devices.
So it's a cleaner and more complete fix imho.

-Andi

2008-12-15 22:21:19

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 2/2] x86: re-enable MCE on secondary CPUS after suspend/resume

> void __cpuinit mcheck_init(struct cpuinfo_x86 *c)
> {
> static cpumask_t mce_cpus = CPU_MASK_NONE;
>
> mce_cpu_quirks(c);
>
> if (mce_dont_init ||
> cpu_test_and_set(smp_processor_id(), mce_cpus) ||
> !mce_available(c))
> => return;
>
> mce_init(NULL);
> mce_cpu_features(c);
> }
>
> But we need to call mce_init to clear all MCE state.
> IMHO the best location to call mce_init for APs is the cpu notifier.

Ah got it. Thanks that makes sense.

But I think the better fix is to just drop the mce_cpus check and
then handly it naturally in the standard bootup path. I'm not
sure what it was good for anyways. I copied it into the 64bit code
from 32bit, but I suppose even there it isn't really needed and on
32bit it is already gone even.

How about this patch. Does it fix the problem for you too?

-Andi

--


Don't prevent multiple initialization of MCEs.

Back from early prehistory mcheck_init() has a reentry check. Presumably
that was needed in very old kernels to prevent it entering twice.

But as Andreas points out this prevents CPU hotplug (and therefore resume)
to correctly reinitialize MCEs when a AP boots again after being
offlined.

Just drop the check.

Based on a report from Andreas Herrmann

Signed-off-by: Andi Kleen <[email protected]>

---
arch/x86/kernel/cpu/mcheck/mce_64.c | 3 ---
1 file changed, 3 deletions(-)

Index: linux/arch/x86/kernel/cpu/mcheck/mce_64.c
===================================================================
--- linux.orig/arch/x86/kernel/cpu/mcheck/mce_64.c 2008-12-15 23:13:02.000000000 +0100
+++ linux/arch/x86/kernel/cpu/mcheck/mce_64.c 2008-12-15 23:13:44.000000000 +0100
@@ -510,12 +510,9 @@
*/
void __cpuinit mcheck_init(struct cpuinfo_x86 *c)
{
- static cpumask_t mce_cpus = CPU_MASK_NONE;
-
mce_cpu_quirks(c);

if (mce_dont_init ||
- cpu_test_and_set(smp_processor_id(), mce_cpus) ||
!mce_available(c))
return;

Subject: Re: [PATCH 2/2] x86: re-enable MCE on secondary CPUS after suspend/resume

On Mon, Dec 15, 2008 at 11:33:10PM +0100, Andi Kleen wrote:
> > void __cpuinit mcheck_init(struct cpuinfo_x86 *c)
> > {
> > static cpumask_t mce_cpus = CPU_MASK_NONE;
> >
> > mce_cpu_quirks(c);
> >
> > if (mce_dont_init ||
> > cpu_test_and_set(smp_processor_id(), mce_cpus) ||
> > !mce_available(c))
> > => return;
> >
> > mce_init(NULL);
> > mce_cpu_features(c);
> > }
> >
> > But we need to call mce_init to clear all MCE state.
> > IMHO the best location to call mce_init for APs is the cpu notifier.
>
> Ah got it. Thanks that makes sense.
>
> But I think the better fix is to just drop the mce_cpus check and
> then handly it naturally in the standard bootup path. I'm not
> sure what it was good for anyways. I copied it into the 64bit code
> from 32bit, but I suppose even there it isn't really needed and on
> 32bit it is already gone even.
>
> How about this patch. Does it fix the problem for you too?

Yup, works as well:

Tested-by: Andreas Herrmann <[email protected]>


Thanks,

Andreas

> -Andi
>
> --
>
>
> Don't prevent multiple initialization of MCEs.
>
> Back from early prehistory mcheck_init() has a reentry check. Presumably
> that was needed in very old kernels to prevent it entering twice.
>
> But as Andreas points out this prevents CPU hotplug (and therefore resume)
> to correctly reinitialize MCEs when a AP boots again after being
> offlined.
>
> Just drop the check.
>
> Based on a report from Andreas Herrmann
>
> Signed-off-by: Andi Kleen <[email protected]>
>
> ---
> arch/x86/kernel/cpu/mcheck/mce_64.c | 3 ---
> 1 file changed, 3 deletions(-)
>
> Index: linux/arch/x86/kernel/cpu/mcheck/mce_64.c
> ===================================================================
> --- linux.orig/arch/x86/kernel/cpu/mcheck/mce_64.c 2008-12-15 23:13:02.000000000 +0100
> +++ linux/arch/x86/kernel/cpu/mcheck/mce_64.c 2008-12-15 23:13:44.000000000 +0100
> @@ -510,12 +510,9 @@
> */
> void __cpuinit mcheck_init(struct cpuinfo_x86 *c)
> {
> - static cpumask_t mce_cpus = CPU_MASK_NONE;
> -
> mce_cpu_quirks(c);
>
> if (mce_dont_init ||
> - cpu_test_and_set(smp_processor_id(), mce_cpus) ||
> !mce_available(c))
> return;
>
>

2008-12-16 22:03:39

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 2/2] x86: re-enable MCE on secondary CPUS after suspend/resume


* Andreas Herrmann <[email protected]> wrote:

> On Mon, Dec 15, 2008 at 11:33:10PM +0100, Andi Kleen wrote:
> > > void __cpuinit mcheck_init(struct cpuinfo_x86 *c)
> > > {
> > > static cpumask_t mce_cpus = CPU_MASK_NONE;
> > >
> > > mce_cpu_quirks(c);
> > >
> > > if (mce_dont_init ||
> > > cpu_test_and_set(smp_processor_id(), mce_cpus) ||
> > > !mce_available(c))
> > > => return;
> > >
> > > mce_init(NULL);
> > > mce_cpu_features(c);
> > > }
> > >
> > > But we need to call mce_init to clear all MCE state.
> > > IMHO the best location to call mce_init for APs is the cpu notifier.
> >
> > Ah got it. Thanks that makes sense.
> >
> > But I think the better fix is to just drop the mce_cpus check and
> > then handly it naturally in the standard bootup path. I'm not
> > sure what it was good for anyways. I copied it into the 64bit code
> > from 32bit, but I suppose even there it isn't really needed and on
> > 32bit it is already gone even.
> >
> > How about this patch. Does it fix the problem for you too?
>
> Yup, works as well:
>
> Tested-by: Andreas Herrmann <[email protected]>

applied to tip/x86/urgent, thanks guys.

Ingo