Subject: [tip:x86/urgent] x86/mce: Fix CMCI preemption bugs

Commit-ID: ea431643d6c38728195e2c456801c3ef66bb9991
Gitweb: http://git.kernel.org/tip/ea431643d6c38728195e2c456801c3ef66bb9991
Author: Ingo Molnar <[email protected]>
AuthorDate: Thu, 17 Apr 2014 10:25:53 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Thu, 17 Apr 2014 10:28:42 +0200

x86/mce: Fix CMCI preemption bugs

The following commit:

27f6c573e0f7 ("x86, CMCI: Add proper detection of end of CMCI storms")

Added two preemption bugs:

- machine_check_poll() does a get_cpu_var() without a matching
put_cpu_var(), which causes preemption imbalance and crashes upon
bootup.

- it does percpu ops without disabling preemption. Preemption is not
disabled due to the mistaken use of a raw spinlock.

To fix these bugs fix the imbalance and change
cmci_discover_lock to a regular spinlock.

Reported-by: Owen Kibel <[email protected]>
Reported-by: Linus Torvalds <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Chen, Gong <[email protected]>
Cc: Josh Boyer <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Alexander Todorov <[email protected]>
Cc: Borislav Petkov <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
--
arch/x86/kernel/cpu/mcheck/mce.c | 4 +---
arch/x86/kernel/cpu/mcheck/mce_intel.c | 18 +++++++++---------
2 files changed, 10 insertions(+), 12 deletions(-)
---
arch/x86/kernel/cpu/mcheck/mce.c | 4 +---
arch/x86/kernel/cpu/mcheck/mce_intel.c | 18 +++++++++---------
2 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index eeee23f..68317c8 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -598,7 +598,6 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
{
struct mce m;
int i;
- unsigned long *v;

this_cpu_inc(mce_poll_count);

@@ -618,8 +617,7 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
if (!(m.status & MCI_STATUS_VAL))
continue;

- v = &get_cpu_var(mce_polled_error);
- set_bit(0, v);
+ this_cpu_write(mce_polled_error, 1);
/*
* Uncorrected or signalled events are handled by the exception
* handler when it is enabled, so don't process those here.
diff --git a/arch/x86/kernel/cpu/mcheck/mce_intel.c b/arch/x86/kernel/cpu/mcheck/mce_intel.c
index 3bdb95a..9a316b2 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_intel.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_intel.c
@@ -42,7 +42,7 @@ static DEFINE_PER_CPU(mce_banks_t, mce_banks_owned);
* cmci_discover_lock protects against parallel discovery attempts
* which could race against each other.
*/
-static DEFINE_RAW_SPINLOCK(cmci_discover_lock);
+static DEFINE_SPINLOCK(cmci_discover_lock);

#define CMCI_THRESHOLD 1
#define CMCI_POLL_INTERVAL (30 * HZ)
@@ -144,14 +144,14 @@ static void cmci_storm_disable_banks(void)
int bank;
u64 val;

- raw_spin_lock_irqsave(&cmci_discover_lock, flags);
+ spin_lock_irqsave(&cmci_discover_lock, flags);
owned = __get_cpu_var(mce_banks_owned);
for_each_set_bit(bank, owned, MAX_NR_BANKS) {
rdmsrl(MSR_IA32_MCx_CTL2(bank), val);
val &= ~MCI_CTL2_CMCI_EN;
wrmsrl(MSR_IA32_MCx_CTL2(bank), val);
}
- raw_spin_unlock_irqrestore(&cmci_discover_lock, flags);
+ spin_unlock_irqrestore(&cmci_discover_lock, flags);
}

static bool cmci_storm_detect(void)
@@ -211,7 +211,7 @@ static void cmci_discover(int banks)
int i;
int bios_wrong_thresh = 0;

- raw_spin_lock_irqsave(&cmci_discover_lock, flags);
+ spin_lock_irqsave(&cmci_discover_lock, flags);
for (i = 0; i < banks; i++) {
u64 val;
int bios_zero_thresh = 0;
@@ -266,7 +266,7 @@ static void cmci_discover(int banks)
WARN_ON(!test_bit(i, __get_cpu_var(mce_poll_banks)));
}
}
- raw_spin_unlock_irqrestore(&cmci_discover_lock, flags);
+ spin_unlock_irqrestore(&cmci_discover_lock, flags);
if (mca_cfg.bios_cmci_threshold && bios_wrong_thresh) {
pr_info_once(
"bios_cmci_threshold: Some banks do not have valid thresholds set\n");
@@ -316,10 +316,10 @@ void cmci_clear(void)

if (!cmci_supported(&banks))
return;
- raw_spin_lock_irqsave(&cmci_discover_lock, flags);
+ spin_lock_irqsave(&cmci_discover_lock, flags);
for (i = 0; i < banks; i++)
__cmci_disable_bank(i);
- raw_spin_unlock_irqrestore(&cmci_discover_lock, flags);
+ spin_unlock_irqrestore(&cmci_discover_lock, flags);
}

static void cmci_rediscover_work_func(void *arg)
@@ -360,9 +360,9 @@ void cmci_disable_bank(int bank)
if (!cmci_supported(&banks))
return;

- raw_spin_lock_irqsave(&cmci_discover_lock, flags);
+ spin_lock_irqsave(&cmci_discover_lock, flags);
__cmci_disable_bank(bank);
- raw_spin_unlock_irqrestore(&cmci_discover_lock, flags);
+ spin_unlock_irqrestore(&cmci_discover_lock, flags);
}

static void intel_init_cmci(void)


2014-04-17 10:10:00

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [tip:x86/urgent] x86/mce: Fix CMCI preemption bugs

On Thu, Apr 17, 2014 at 02:57:54AM -0700, tip-bot for Ingo Molnar wrote:
> Commit-ID: ea431643d6c38728195e2c456801c3ef66bb9991
> Gitweb: http://git.kernel.org/tip/ea431643d6c38728195e2c456801c3ef66bb9991
> Author: Ingo Molnar <[email protected]>
> AuthorDate: Thu, 17 Apr 2014 10:25:53 +0200
> Committer: Ingo Molnar <[email protected]>
> CommitDate: Thu, 17 Apr 2014 10:28:42 +0200
>
> x86/mce: Fix CMCI preemption bugs
>
> The following commit:
>
> 27f6c573e0f7 ("x86, CMCI: Add proper detection of end of CMCI storms")
>
> Added two preemption bugs:
>
> - machine_check_poll() does a get_cpu_var() without a matching
> put_cpu_var(), which causes preemption imbalance and crashes upon
> bootup.
>
> - it does percpu ops without disabling preemption. Preemption is not
> disabled due to the mistaken use of a raw spinlock.

it is arch_spinlock that doesn't disable preemption. raw_spinlock
disables preemption just fine.

2014-04-17 10:24:49

by Borislav Petkov

[permalink] [raw]
Subject: Re: [tip:x86/urgent] x86/mce: Fix CMCI preemption bugs

On Thu, Apr 17, 2014 at 12:09:44PM +0200, Peter Zijlstra wrote:
> On Thu, Apr 17, 2014 at 02:57:54AM -0700, tip-bot for Ingo Molnar wrote:
> > Commit-ID: ea431643d6c38728195e2c456801c3ef66bb9991
> > Gitweb: http://git.kernel.org/tip/ea431643d6c38728195e2c456801c3ef66bb9991
> > Author: Ingo Molnar <[email protected]>
> > AuthorDate: Thu, 17 Apr 2014 10:25:53 +0200
> > Committer: Ingo Molnar <[email protected]>
> > CommitDate: Thu, 17 Apr 2014 10:28:42 +0200
> >
> > x86/mce: Fix CMCI preemption bugs
> >
> > The following commit:
> >
> > 27f6c573e0f7 ("x86, CMCI: Add proper detection of end of CMCI storms")
> >
> > Added two preemption bugs:
> >
> > - machine_check_poll() does a get_cpu_var() without a matching
> > put_cpu_var(), which causes preemption imbalance and crashes upon
> > bootup.
> >
> > - it does percpu ops without disabling preemption. Preemption is not
> > disabled due to the mistaken use of a raw spinlock.
>
> it is arch_spinlock that doesn't disable preemption. raw_spinlock
> disables preemption just fine.

Hohum, __raw_spin_lock_irqsave does preempt_disable(). And
machine_check_poll should be running in irq context so why would the
original issue happen?

> kernel: [ 7.341085] BUG: using __this_cpu_write() in preemptible [00000000] code: modprobe/546

Unfortunately, I have only one line in a mail CCed to me.

Color me confused.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2014-04-17 14:05:51

by Luck, Tony

[permalink] [raw]
Subject: RE: [tip:x86/urgent] x86/mce: Fix CMCI preemption bugs

> Hohum, __raw_spin_lock_irqsave does preempt_disable(). And
> machine_check_poll should be running in irq context so why would the
> original issue happen?
>
>> kernel: [ 7.341085] BUG: using __this_cpu_write() in preemptible [00000000] code: modprobe/546
>
> Unfortunately, I have only one line in a mail CCed to me.
>
> Color me confused.

Is this just the missing put_cpu() that Chen Gong already sent a patch for?

See attached

-Tony


Attachments:
(No filename) (2.70 kB)

2014-04-17 15:26:21

by Borislav Petkov

[permalink] [raw]
Subject: Re: [tip:x86/urgent] x86/mce: Fix CMCI preemption bugs

On Thu, Apr 17, 2014 at 02:03:34PM +0000, Luck, Tony wrote:
> > Hohum, __raw_spin_lock_irqsave does preempt_disable(). And
> > machine_check_poll should be running in irq context so why would the
> > original issue happen?
> >
> >> kernel: [ 7.341085] BUG: using __this_cpu_write() in preemptible [00000000] code: modprobe/546
> >
> > Unfortunately, I have only one line in a mail CCed to me.
> >
> > Color me confused.
>
> Is this just the missing put_cpu() that Chen Gong already sent a patch for?

I'm not sure. There's some bug report floating around which contains the
"BUG" line above but I can't seem to find/get it.

I'll boot latest Linus tree on my SNB machine to check whether it
triggers here. Ingo says CONFIG_DEBUG_PREEMPT=y is causing it but this
is all hearsay stuff from where I'm standing...

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2014-04-17 16:54:32

by Josh Boyer

[permalink] [raw]
Subject: Re: [tip:x86/urgent] x86/mce: Fix CMCI preemption bugs

On Thu, Apr 17, 2014 at 11:26 AM, Borislav Petkov <[email protected]> wrote:
> On Thu, Apr 17, 2014 at 02:03:34PM +0000, Luck, Tony wrote:
>> > Hohum, __raw_spin_lock_irqsave does preempt_disable(). And
>> > machine_check_poll should be running in irq context so why would the
>> > original issue happen?
>> >
>> >> kernel: [ 7.341085] BUG: using __this_cpu_write() in preemptible [00000000] code: modprobe/546
>> >
>> > Unfortunately, I have only one line in a mail CCed to me.
>> >
>> > Color me confused.
>>
>> Is this just the missing put_cpu() that Chen Gong already sent a patch for?
>
> I'm not sure. There's some bug report floating around which contains the
> "BUG" line above but I can't seem to find/get it.
>
> I'll boot latest Linus tree on my SNB machine to check whether it
> triggers here. Ingo says CONFIG_DEBUG_PREEMPT=y is causing it but this
> is all hearsay stuff from where I'm standing...

For some context.

A user (Owen) reported seeing the following backtrace with 3.15-rc1+:

kernel: [ 120.253539] Hardware name: Hewlett-Packard HP ENVY 15
Notebook PC/1962, BIOS F.24 08/27/2013
kernel: [ 120.253540] ffff88025f2146c0 ffffffff81c01e40
ffffffff8171dc1d ffffffff81d11aa0
kernel: [ 120.253543] ffffffff81c01e50 ffffffff81719a39
ffffffff81c01eb0 ffffffff8172151b
kernel: [ 120.253545] ffffffff81c18480 ffffffff81c01fd8
00000000000146c0 00000000000146c0
kernel: [ 120.253548] Call Trace:
kernel: [ 120.253555] [<ffffffff8171dc1d>] dump_stack+0x4d/0x6f
kernel: [ 120.253558] [<ffffffff81719a39>] __schedule_bug+0x4c/0x5a
kernel: [ 120.253560] [<ffffffff8172151b>] __schedule+0x6eb/0x7a0
kernel: [ 120.253563] [<ffffffff81721b91>] schedule_preempt_disabled+0x31/0x80
kernel: [ 120.253566] [<ffffffff810af8f3>] cpu_startup_entry+0x173/0x490
kernel: [ 120.253570] [<ffffffff8170e3e4>] rest_init+0x84/0x90
kernel: [ 120.253574] [<ffffffff81d34f83>] start_kernel+0x450/0x45b
kernel: [ 120.253576] [<ffffffff81d3493c>] ? repair_env_string+0x5c/0x5c
kernel: [ 120.253578] [<ffffffff81d34120>] ? early_idt_handlers+0x120/0x120
kernel: [ 120.253581] [<ffffffff81d345ee>] x86_64_start_reservations+0x2a/0x2c
kernel: [ 120.253583] [<ffffffff81d3472e>] x86_64_start_kernel+0x13e/0x14d

For whatever reason, his report didn't hit lkml, but it did hit Linus'
inbox. Linus took a look around, googled some, and came across a
report that Alexander filed 2 days ago against Fedora rawhide with a
similar backtrace:

https://bugzilla.redhat.com/show_bug.cgi?id=1087810

Linus CC'd me and Alexander on his reply to Owen, Ingo, Peter, etc.
Ingo was aware of Chen Gong's patch, but when Owen tested it it
produced the BUG line above. So Ingo came up with a slightly
different fix to hopefully resolve that as well. We haven't heard
from Owen whether Ingo's patch resolves everything yet.

I think (hope) that is the full backstory. Ingo or Peter could correct me.

josh

2014-04-17 19:23:46

by Borislav Petkov

[permalink] [raw]
Subject: Re: [tip:x86/urgent] x86/mce: Fix CMCI preemption bugs

On Thu, Apr 17, 2014 at 12:54:21PM -0400, Josh Boyer wrote:
> A user (Owen) reported seeing the following backtrace with 3.15-rc1+:
>
> kernel: [ 120.253539] Hardware name: Hewlett-Packard HP ENVY 15
> Notebook PC/1962, BIOS F.24 08/27/2013
> kernel: [ 120.253540] ffff88025f2146c0 ffffffff81c01e40
> ffffffff8171dc1d ffffffff81d11aa0
> kernel: [ 120.253543] ffffffff81c01e50 ffffffff81719a39
> ffffffff81c01eb0 ffffffff8172151b
> kernel: [ 120.253545] ffffffff81c18480 ffffffff81c01fd8
> 00000000000146c0 00000000000146c0
> kernel: [ 120.253548] Call Trace:
> kernel: [ 120.253555] [<ffffffff8171dc1d>] dump_stack+0x4d/0x6f
> kernel: [ 120.253558] [<ffffffff81719a39>] __schedule_bug+0x4c/0x5a
> kernel: [ 120.253560] [<ffffffff8172151b>] __schedule+0x6eb/0x7a0
> kernel: [ 120.253563] [<ffffffff81721b91>] schedule_preempt_disabled+0x31/0x80
> kernel: [ 120.253566] [<ffffffff810af8f3>] cpu_startup_entry+0x173/0x490
> kernel: [ 120.253570] [<ffffffff8170e3e4>] rest_init+0x84/0x90
> kernel: [ 120.253574] [<ffffffff81d34f83>] start_kernel+0x450/0x45b
> kernel: [ 120.253576] [<ffffffff81d3493c>] ? repair_env_string+0x5c/0x5c
> kernel: [ 120.253578] [<ffffffff81d34120>] ? early_idt_handlers+0x120/0x120
> kernel: [ 120.253581] [<ffffffff81d345ee>] x86_64_start_reservations+0x2a/0x2c
> kernel: [ 120.253583] [<ffffffff81d3472e>] x86_64_start_kernel+0x13e/0x14d
>
> For whatever reason, his report didn't hit lkml, but it did hit Linus'
> inbox. Linus took a look around, googled some, and came across a
> report that Alexander filed 2 days ago against Fedora rawhide with a
> similar backtrace:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1087810
>
> Linus CC'd me and Alexander on his reply to Owen, Ingo, Peter, etc.
> Ingo was aware of Chen Gong's patch, but when Owen tested it it
> produced the BUG line above. So Ingo came up with a slightly
> different fix to hopefully resolve that as well. We haven't heard
> from Owen whether Ingo's patch resolves everything yet.
>
> I think (hope) that is the full backstory. Ingo or Peter could correct me.

Thanks Josh, that explains it. Owen bounced me his mail too, in the
meantime.

So, the only thing I'm seeing so far is that I cannot reproduce this
with Owen's config on my machines with Linus' tree from right now.

And the only sane idea I have currently is that without the get_cpu_var
fix, preemption count gets fumbled just right down the

identify_boot_cpu ->
mcheck_cpu_init ->
machine_check_poll ->
get_cpu_var ->
preempt_disable

path so that the initial schedule we do, catches it:

start_kernel -> rest_init -> schedule_preempt_disabled

Which would mean that the only fix for that should be Gong's
initial fix which Tony just sent (see attached), i.e. without the
s/raw_spin_lock/spin_lock/ changes from Ingo.

But you're saying here, Owen tested it already. Which kinda hints at with that

kernel: [ 7.341085] BUG: using __this_cpu_write() in preemptible [00000000] code: modprobe/546

line which was in one of the previous mails.

Anyway, I did merge Owen's config wrt PREEMPT into mine:

$ grep PREEMPT .config
CONFIG_TREE_PREEMPT_RCU=y
CONFIG_PREEMPT_RCU=y
CONFIG_PREEMPT_NOTIFIERS=y
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
CONFIG_PREEMPT_COUNT=y
# CONFIG_DEBUG_PREEMPT is not set
# CONFIG_PREEMPT_TRACER is not set

and I can't repro it. With, or without the attached patch :-\.
Which basically could mean that there's either some specific timing
Owen's or Alex' boxes are hitting which mine aren't or there's some
miscommunication in the whole thing.

@Owen: can you please test the attached patch if you haven't done so
already and report back?

Thanks.

/me is still confused.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


Attachments:
(No filename) (3.76 kB)
diff (1.24 kB)
Download all attachments

2014-04-17 19:25:24

by Linus Torvalds

[permalink] [raw]
Subject: Re: [tip:x86/urgent] x86/mce: Fix CMCI preemption bugs

On Thu, Apr 17, 2014 at 12:23 PM, Borislav Petkov <[email protected]> wrote:
>
> But you're saying here, Owen tested it already.

No, Owen tested a simpler patch that just changes the "get_cpu_var()"
to "__get_cpu_var()" and avoids the preempt increment.

Linus

2014-04-17 19:42:53

by Borislav Petkov

[permalink] [raw]
Subject: Re: [tip:x86/urgent] x86/mce: Fix CMCI preemption bugs

On Thu, Apr 17, 2014 at 12:25:14PM -0700, Linus Torvalds wrote:
> No, Owen tested a simpler patch that just changes the "get_cpu_var()"
> to "__get_cpu_var()" and avoids the preempt increment.

Which basically would be the same as doing this_cpu_write() in the
proposed fix - both don't touch preemption. So it is something else.
More staring...

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2014-04-17 20:58:49

by Borislav Petkov

[permalink] [raw]
Subject: Re: [tip:x86/urgent] x86/mce: Fix CMCI preemption bugs

On Thu, Apr 17, 2014 at 09:42:41PM +0200, Borislav Petkov wrote:
> On Thu, Apr 17, 2014 at 12:25:14PM -0700, Linus Torvalds wrote:
> > No, Owen tested a simpler patch that just changes the "get_cpu_var()"
> > to "__get_cpu_var()" and avoids the preempt increment.
>
> Which basically would be the same as doing this_cpu_write() in the
> proposed fix - both don't touch preemption. So it is something else.
> More staring...

Ok, in one of the mails Ingo forwarded to me, it said it still failed with

> kernel: [ 7.341085] BUG: using __this_cpu_write() in preemptible [00000000] code: modprobe/546

but considering Owen tried with a simpler __get_cpu_var version, I
fail to see how the __this_cpu_write() BUG will happen. Btw, those
__this_cpu_write things have received preemption checks. I'm seeing
right now another thread happening on lkml:

http://lkml.kernel.org/r/[email protected]

So, Owen, can you please clarify which patch you *did* text exactly and
whether it worked or not.

Also, did you test the patch below? If not, please give it a run too.

Thanks.

---
This bug is introduced by me in commit 27f6c573e0. I forget
to execute put_cpu_var operation after get_cpu_var. Fix it
via this_cpu_write instead of get_cpu_var.

v2 -> v1: Separate cleanup from bug fix.

Signed-off-by: Chen, Gong <[email protected]>
Suggested-by: H. Peter Anvin <[email protected]>
---
arch/x86/kernel/cpu/mcheck/mce.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index eeee23f..68317c8 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -598,7 +598,6 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
{
struct mce m;
int i;
- unsigned long *v;

this_cpu_inc(mce_poll_count);

@@ -618,8 +617,7 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
if (!(m.status & MCI_STATUS_VAL))
continue;

- v = &get_cpu_var(mce_polled_error);
- set_bit(0, v);
+ this_cpu_write(mce_polled_error, 1);
/*
* Uncorrected or signalled events are handled by the exception
* handler when it is enabled, so don't process those here.
--
1.9.0


--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2014-04-17 21:30:20

by Borislav Petkov

[permalink] [raw]
Subject: Re: [tip:x86/urgent] x86/mce: Fix CMCI preemption bugs

On Thu, Apr 17, 2014 at 02:21:41PM -0700, Owen Kibel wrote:
> The patch tested was from Chen, Gong
>
> https://lkml.org/lkml/2014/4/15/838
>
> It worked, apart from the warning on boot described previously - the
> possible boot message was extracted from /var/log/kern.log
>
> The above patch on mainline: 3.15-rc1 2014-04-13 [tar.xz]
>
> seems to have cured the problem

Ok, good. So this is straightened out.

@Ingo: just pick up Gong's fix without the raw_spin_lock* stuff.

> - the kernel appears fine after announcing something like
>
> kernel: [ 7.341085] BUG: using __this_cpu_write() in preemptible
> [00000000] code: modprobe/546
>
> in the boot process.

This is most likely unrelated and is caused by the preemption checks
added to __this_cpu_* in 188a81409ff7. If you'd like to debug this
further, please send a full dmesg:

dmesg > dmesg.log

Privately is fine too.

Thanks.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2014-04-17 22:20:33

by Borislav Petkov

[permalink] [raw]
Subject: Re: [tip:x86/urgent] x86/mce: Fix CMCI preemption bugs

On Thu, Apr 17, 2014 at 11:30:12PM +0200, Borislav Petkov wrote:
> This is most likely unrelated and is caused by the preemption checks
> added to __this_cpu_* in 188a81409ff7. If you'd like to debug this
> further, please send a full dmesg:
>
> dmesg > dmesg.log
>
> Privately is fine too.

Ok, thanks for the dmesg. Replying to the thread with everybody:

The splat Owen is seeing is the same as this one at the beginning of
this thread here:

http://lkml.kernel.org/r/[email protected]

which has a viable fix. Btw, those two splats happen on HP notebooks.

Ok, good, I think we're all solved. Phew :-)

Thanks to all for their help.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2014-04-18 08:07:42

by Ingo Molnar

[permalink] [raw]
Subject: Re: [tip:x86/urgent] x86/mce: Fix CMCI preemption bugs


* Borislav Petkov <[email protected]> wrote:

> On Thu, Apr 17, 2014 at 11:30:12PM +0200, Borislav Petkov wrote:
> > This is most likely unrelated and is caused by the preemption checks
> > added to __this_cpu_* in 188a81409ff7. If you'd like to debug this
> > further, please send a full dmesg:
> >
> > dmesg > dmesg.log
> >
> > Privately is fine too.
>
> Ok, thanks for the dmesg. Replying to the thread with everybody:
>
> The splat Owen is seeing is the same as this one at the beginning of
> this thread here:
>
> http://lkml.kernel.org/r/[email protected]
>
> which has a viable fix. Btw, those two splats happen on HP notebooks.
>
> Ok, good, I think we're all solved. Phew :-)
>
> Thanks to all for their help.

Okay, so AFAICS the fix in x86/urgent isn't wrong functionally, it's
just that the changelog incorrectly claims the raw-spinlock use is a
bug causing a problem here.

Still that raw spinlock is bogus and might be hiding other problems,
so we can keep the x86/urgent change (ea431643d6c3) as-is and I'll get
it to Linus later today ...

Thanks,

Ingo

2014-04-18 09:22:25

by Borislav Petkov

[permalink] [raw]
Subject: Re: [tip:x86/urgent] x86/mce: Fix CMCI preemption bugs

On Fri, Apr 18, 2014 at 10:07:16AM +0200, Ingo Molnar wrote:
> Okay, so AFAICS the fix in x86/urgent isn't wrong functionally, it's
> just that the changelog incorrectly claims the raw-spinlock use is a
> bug causing a problem here.
>
> Still that raw spinlock is bogus and might be hiding other problems,
> so we can keep the x86/urgent change (ea431643d6c3) as-is and I'll get
> it to Linus later today ...

Ok.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2014-04-25 09:13:20

by Chen, Gong

[permalink] [raw]
Subject: Re: [tip:x86/urgent] x86/mce: Fix CMCI preemption bugs

On Fri, Apr 18, 2014 at 10:07:16AM +0200, Ingo Molnar wrote:
> Okay, so AFAICS the fix in x86/urgent isn't wrong functionally, it's
> just that the changelog incorrectly claims the raw-spinlock use is a
> bug causing a problem here.
>
> Still that raw spinlock is bogus and might be hiding other problems,
> so we can keep the x86/urgent change (ea431643d6c3) as-is and I'll get
> it to Linus later today ...
>
> Thanks,
>
> Ingo
Hi, Ingo

We ever had a patch(59d958d2c7) to make spinlock -> raw_spinlock.
Would you please explain it a little more why you revert it?


Attachments:
(No filename) (575.00 B)
signature.asc (819.00 B)
Digital signature
Download all attachments

2014-04-25 13:15:53

by Ingo Molnar

[permalink] [raw]
Subject: Re: [tip:x86/urgent] x86/mce: Fix CMCI preemption bugs


* Chen, Gong <[email protected]> wrote:

> On Fri, Apr 18, 2014 at 10:07:16AM +0200, Ingo Molnar wrote:
> > Okay, so AFAICS the fix in x86/urgent isn't wrong functionally, it's
> > just that the changelog incorrectly claims the raw-spinlock use is a
> > bug causing a problem here.
> >
> > Still that raw spinlock is bogus and might be hiding other problems,
> > so we can keep the x86/urgent change (ea431643d6c3) as-is and I'll get
> > it to Linus later today ...
> >
> > Thanks,
> >
> > Ingo
> Hi, Ingo
>
> We ever had a patch(59d958d2c7) to make spinlock -> raw_spinlock.
> Would you please explain it a little more why you revert it?

It was years ago and I forgot about that. Should be redone I guess.

Thanks,

Ingo