Zhang Liguang reported a bug as follows:
1) The system detected a CMCI storm on the current CPU.
2) CMCI interrupts were disabled on the banks owned by the current CPU, and the CPU switched to poll mode.
3) A few minutes later, the system switched back to interrupt mode on the current CPU.
4) We expect the system to re-enable CMCI interrupts on the banks owned by the current CPU, but it does not:
cmci_intel_adjust_timer
  |-> cmci_reenable
        |-> cmci_discover   # but owned banks are skipped here
> static void cmci_discover(int banks)
> ...
> for (i = 0; i < banks; i++) {
> ...
> if (test_bit(i, owned)) # owned banks are skipped here
> continue;
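To see why the banks stay masked, here is a rough user-space model of the sequence above. It is a sketch under stated assumptions, not kernel code: the arrays `ctl2[]` and `owned[]` and the `model_*` functions are hypothetical stand-ins for the per-bank MSR_IA32_MCx_CTL2 registers, the per-CPU mce_banks_owned bitmap, cmci_storm_disable_banks() and cmci_discover(). Locking and the MSR read-back that the real code uses to decide ownership are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_BANKS      8
#define CTL2_CMCI_EN  (1ULL << 30)   /* same bit as MCI_CTL2_CMCI_EN */

static uint64_t ctl2[NR_BANKS];   /* hypothetical model of MSR_IA32_MCx_CTL2 per bank */
static bool owned[NR_BANKS];      /* hypothetical model of this CPU's mce_banks_owned */

/* Storm start: CMCI is masked on owned banks, but ownership is kept. */
static void model_storm_disable_banks(void)
{
	for (int i = 0; i < NR_BANKS; i++)
		if (owned[i])
			ctl2[i] &= ~CTL2_CMCI_EN;
}

/* Reduced model of the discovery loop: owned banks are skipped, and every
 * free bank is simply claimed (the real code reads CTL2 back to decide). */
static void model_cmci_discover(int banks)
{
	assert(banks <= NR_BANKS);
	for (int i = 0; i < banks; i++) {
		if (owned[i])
			continue;   /* storm-masked banks never get CMCI_EN back */
		ctl2[i] |= CTL2_CMCI_EN;
		owned[i] = true;
	}
}
```

Marking bank 2 as owned, running model_storm_disable_banks() and then model_cmci_discover(NR_BANKS) leaves CMCI_EN clear on bank 2, which is the reported symptom.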
This patch adds cmci_storm_enable_banks(), which re-enables CMCI on the banks
owned by the current CPU without clearing their owned flags, and calls it
instead of cmci_reenable() when switching back to interrupt mode.
Reported-by: Zhang Liguang <[email protected]>
Cc: [email protected] # v4.1+
Signed-off-by: Xie XiuQi <[email protected]>
---
arch/x86/kernel/cpu/mcheck/mce_intel.c | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/mcheck/mce_intel.c b/arch/x86/kernel/cpu/mcheck/mce_intel.c
index 844f56c..d4e98c7 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_intel.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_intel.c
@@ -146,6 +146,22 @@ void mce_intel_hcpu_update(unsigned long cpu)
per_cpu(cmci_storm_state, cpu) = CMCI_STORM_NONE;
}
+static void cmci_storm_enable_banks(void)
+{
+ unsigned long flags, *owned;
+ int bank;
+ u64 val;
+
+ raw_spin_lock_irqsave(&cmci_discover_lock, flags);
+ owned = this_cpu_ptr(mce_banks_owned);
+ for_each_set_bit(bank, owned, MAX_NR_BANKS) {
+ rdmsrl(MSR_IA32_MCx_CTL2(bank), val);
+ val |= MCI_CTL2_CMCI_EN;
+ wrmsrl(MSR_IA32_MCx_CTL2(bank), val);
+ }
+ raw_spin_unlock_irqrestore(&cmci_discover_lock, flags);
+}
+
unsigned long cmci_intel_adjust_timer(unsigned long interval)
{
if ((this_cpu_read(cmci_backoff_cnt) > 0) &&
@@ -175,7 +191,7 @@ unsigned long cmci_intel_adjust_timer(unsigned long interval)
*/
if (!atomic_read(&cmci_storm_on_cpus)) {
__this_cpu_write(cmci_storm_state, CMCI_STORM_NONE);
- cmci_reenable();
+ cmci_storm_enable_banks();
cmci_recheck();
}
return CMCI_POLL_INTERVAL;
--
2.0.0
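The effect of the new helper can be modeled the same way. Again this is a hypothetical user-space sketch: `ctl2[]` stands in for the CTL2 MSRs, `owned_mask` for this CPU's mce_banks_owned bitmap, the plain loop replaces for_each_set_bit(), and the raw spinlock taken in the patch is omitted.

```c
#include <assert.h>
#include <stdint.h>

#define NR_BANKS      8
#define CTL2_CMCI_EN  (1ULL << 30)   /* same bit as MCI_CTL2_CMCI_EN */

static_assert(NR_BANKS <= 8 * sizeof(unsigned long), "owned bitmap too small");

static uint64_t ctl2[NR_BANKS];    /* hypothetical model of MSR_IA32_MCx_CTL2 */
static unsigned long owned_mask;   /* hypothetical model of mce_banks_owned */

/* Model of cmci_storm_enable_banks(): set CMCI_EN on every owned bank and
 * leave owned_mask untouched, so the banks stay assigned to this CPU. */
static void model_storm_enable_banks(void)
{
	for (int bank = 0; bank < NR_BANKS; bank++) {
		if (!(owned_mask & (1UL << bank)))
			continue;
		uint64_t val = ctl2[bank];   /* rdmsrl() in the kernel */
		val |= CTL2_CMCI_EN;
		ctl2[bank] = val;            /* wrmsrl() in the kernel */
	}
}
```

The key design point is that only CMCI_EN changes; ownership is not rediscovered, so the bug in the discovery path is never hit.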
On Tue, Aug 11, 2015 at 06:09:37PM +0800, Xie XiuQi wrote:
Hmm, and we cannot clear the owned bit because those banks won't be
polled otherwise, see:
27f6c573e0f7 ("x86, CMCI: Add proper detection of end of CMCI storms")
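A small model of why the owned bit has to survive the storm. It assumes that the storm-mode poll path walks this CPU's mce_banks_owned bitmap; `status[]`, `owned_mask` and model_storm_poll() are hypothetical names for illustration only.

```c
#include <assert.h>
#include <stdint.h>

#define NR_BANKS 8

static_assert(NR_BANKS <= 8 * sizeof(unsigned long), "owned bitmap too small");

static uint64_t status[NR_BANKS];   /* hypothetical MCi_STATUS model; nonzero = error logged */
static unsigned long owned_mask;    /* hypothetical model of mce_banks_owned */

/* Storm-mode poll model: only banks in owned_mask are examined, so a bank
 * whose owned bit was cleared is never looked at. Returns errors collected. */
static int model_storm_poll(void)
{
	int found = 0;

	for (int bank = 0; bank < NR_BANKS; bank++) {
		if (!(owned_mask & (1UL << bank)))
			continue;
		if (status[bank]) {
			found++;
			status[bank] = 0;   /* "log" the error and clear the status */
		}
	}
	return found;
}
```

With the owned bit cleared, an error sitting in that bank is simply not collected during the storm.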
Yuck.
Well, ok, but do it differently, please: rename
cmci_storm_disable_banks() to cmci_storm_switch_banks(bool on) which
turns them on and off. Unless Tony has a better suggestion...
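A sketch of what a single on/off helper could look like, modeled in user space. The name model_storm_switch_banks() and the globals are hypothetical; a kernel version would keep the cmci_discover_lock section and the rdmsrl()/wrmsrl() pair from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_BANKS      8
#define CTL2_CMCI_EN  (1ULL << 30)   /* same bit as MCI_CTL2_CMCI_EN */

static_assert(NR_BANKS <= 8 * sizeof(unsigned long), "owned bitmap too small");

static uint64_t ctl2[NR_BANKS];    /* hypothetical model of MSR_IA32_MCx_CTL2 */
static unsigned long owned_mask;   /* hypothetical model of mce_banks_owned */

/* One loop over owned banks: CMCI_EN is set or cleared according to @on,
 * and ownership is never changed. */
static void model_storm_switch_banks(bool on)
{
	for (int bank = 0; bank < NR_BANKS; bank++) {
		if (!(owned_mask & (1UL << bank)))
			continue;
		uint64_t val = ctl2[bank];
		if (on)
			val |= CTL2_CMCI_EN;
		else
			val &= ~CTL2_CMCI_EN;
		ctl2[bank] = val;
	}
}
```

Storm start would call it with false and storm end with true, replacing two nearly identical functions with one.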
> Reported-by: Zhang Liguang <[email protected]>
> Cc: [email protected] # v4.1+
Why 4.1 only?
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
--
> Well, ok, but do it differently, please: rename
> cmci_storm_disable_banks() to cmci_storm_switch_banks(bool on) which
> turns them on and off. Unless Tony has a better suggestion...
I like the boolean argument ... but not the "switch_banks" name. It sounds more
like we are juggling between banks, rather than setting a switch/flag in a bank.
How does "cmci_storm_set_cmci(bool on)" sound? Too many "cmci" in one name?
-Tony
On 2015/8/11 22:46, Borislav Petkov wrote:
> Hmm, and we cannot clear the owned bit because those banks won't be
> polled otherwise, see:
>
> 27f6c573e0f7 ("x86, CMCI: Add proper detection of end of CMCI storms")
OK, thanks.
>
> Yuck.
>
> Well, ok, but do it differently, please: rename
> cmci_storm_disable_banks() to cmci_storm_switch_banks(bool on) which
> turns them on and off. Unless Tony has a better suggestion...
>
>> Reported-by: Zhang Liguang <[email protected]>
>> Cc: [email protected] # v4.1+
>
> Why 4.1 only?
My fault, it's v3.15+.
Thanks,
Xie XiuQi
>
On 2015/8/12 2:52, Luck, Tony wrote:
>> Well, ok, but do it differently, please: rename
>> cmci_storm_disable_banks() to cmci_storm_switch_banks(bool on) which
>> turns them on and off. Unless Tony has a better suggestion...
>
> I like the boolean argument ... but not the "switch_banks" name. It sounds more
> like we are juggling between banks, rather than setting a switch/flag in a bank.
>
> How does "cmci_storm_set_cmci(bool on)" sound? Too many "cmci" in one name?
Thanks, I'll use this name.
--
Xie XiuQi
>
> -Tony
>