Subject: Re: [PATCH v2] x86/mce: fix failed to reenable cmci when swiching to interrupt mode
From: Xie XiuQi
To: Borislav Petkov
Date: Wed, 12 Aug 2015 19:52:11 +0800
Message-ID: <55CB336B.8080708@huawei.com>
In-Reply-To: <20150812095457.GB14011@nazgul.tnic>

On 2015/8/12 17:54, Borislav Petkov wrote:
> On Wed, Aug 12, 2015 at 10:51:11AM +0800, Xie XiuQi wrote:
>> Zhang Liguang reported a bug as follows:
>> 1) The system detected a CMCI storm on the current CPU.
>> 2) CMCI interrupts were disabled on the banks owned by the current CPU,
>>    and the CPU switched to poll mode.
>> 3) A few minutes later, the system switched back to interrupt mode on the
>>    current CPU.
>> 4) We expect the system to reenable the CMCI interrupt on the banks owned
>>    by the current CPU:
>>    mce_intel_adjust_timer
>>     |-> cmci_reenable
>>         |-> cmci_discover    # but owned banks are ignored here
>>
>>> static void cmci_discover(int banks)
>>> ...
>>>     for (i = 0; i < banks; i++) {
>>>         ...
>>>         if (test_bit(i, owned))    # owned banks are ignored here
>>>             continue;
>>
>> In this patch, we add a function cmci_storm_set_cmci(), just to enable or
>
> Yeah, that's too many "cmci"'s in the name.
> Here's what I committed:

It looks much better than mine. Thanks.

>
> ---
> From: Xie XiuQi
> Date: Wed, 12 Aug 2015 10:51:11 +0800
> Subject: [PATCH] x86/mce: Reenable CMCI banks when swiching back to interrupt mode
>
> Zhang Liguang reported the following issue:
>
> 1) The system detects a CMCI storm on the current CPU.
>
> 2) The kernel disables the CMCI interrupt on banks owned by the current
>    CPU and switches to poll mode.
>
> 3) After the CMCI storm subsides, the kernel switches back to interrupt
>    mode.
>
> 4) We expect the system to reenable the CMCI interrupt on banks owned by
>    the current CPU.
>
>    mce_intel_adjust_timer
>     |-> cmci_reenable
>         |-> cmci_discover       # owned banks are ignored here
>
>      static void cmci_discover(int banks)
>        ...
>        for (i = 0; i < banks; i++) {
>            ...
>            if (test_bit(i, owned))   # owned banks are ignored here
>                    continue;
>
> So convert cmci_storm_disable_banks() to cmci_toggle_interrupt_mode(),
> which enables or disables CMCI interrupts depending on its argument.
>
> NB: We cannot clear the owned bit because the banks won't be polled,
> otherwise. See
>
>   27f6c573e0f7 ("x86, CMCI: Add proper detection of end of CMCI storms")
>
> for more info.
>
> Reported-by: Zhang Liguang
> Signed-off-by: Xie XiuQi
> Cc: # v3.15+
> Cc: "H. Peter Anvin"
> Cc: huawei.libin@huawei.com
> Cc: Ingo Molnar
> Cc: linux-edac
> Cc: rui.xiang@huawei.com
> Cc: Thomas Gleixner
> Cc: Tony Luck
> Cc: x86-ml
> Link: http://lkml.kernel.org/r/1439347871-2702-1-git-send-email-xiexiuqi@huawei.com
> Signed-off-by: Borislav Petkov
> ---
>  arch/x86/kernel/cpu/mcheck/mce_intel.c | 41 +++++++++++++++++++---------------
>  1 file changed, 23 insertions(+), 18 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/mcheck/mce_intel.c b/arch/x86/kernel/cpu/mcheck/mce_intel.c
> index c5c003291861..1e8bb6c94f14 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce_intel.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce_intel.c
> @@ -146,6 +146,27 @@ void mce_intel_hcpu_update(unsigned long cpu)
>  	per_cpu(cmci_storm_state, cpu) = CMCI_STORM_NONE;
>  }
>  
> +static void cmci_toggle_interrupt_mode(bool on)
> +{
> +	unsigned long flags, *owned;
> +	int bank;
> +	u64 val;
> +
> +	raw_spin_lock_irqsave(&cmci_discover_lock, flags);
> +	owned = this_cpu_ptr(mce_banks_owned);
> +	for_each_set_bit(bank, owned, MAX_NR_BANKS) {
> +		rdmsrl(MSR_IA32_MCx_CTL2(bank), val);
> +
> +		if (on)
> +			val |= MCI_CTL2_CMCI_EN;
> +		else
> +			val &= ~MCI_CTL2_CMCI_EN;
> +
> +		wrmsrl(MSR_IA32_MCx_CTL2(bank), val);
> +	}
> +	raw_spin_unlock_irqrestore(&cmci_discover_lock, flags);
> +}
> +
>  unsigned long cmci_intel_adjust_timer(unsigned long interval)
>  {
>  	if ((this_cpu_read(cmci_backoff_cnt) > 0) &&
> @@ -175,7 +196,7 @@ unsigned long cmci_intel_adjust_timer(unsigned long interval)
>  		 */
>  		if (!atomic_read(&cmci_storm_on_cpus)) {
>  			__this_cpu_write(cmci_storm_state, CMCI_STORM_NONE);
> -			cmci_reenable();
> +			cmci_toggle_interrupt_mode(true);
>  			cmci_recheck();
>  		}
>  		return CMCI_POLL_INTERVAL;
> @@ -186,22 +207,6 @@ unsigned long cmci_intel_adjust_timer(unsigned long interval)
>  	}
>  }
>  
> -static void cmci_storm_disable_banks(void)
> -{
> -	unsigned long flags, *owned;
> -	int bank;
> -	u64 val;
> -
> -	raw_spin_lock_irqsave(&cmci_discover_lock, flags);
> -	owned = this_cpu_ptr(mce_banks_owned);
> -	for_each_set_bit(bank, owned, MAX_NR_BANKS) {
> -		rdmsrl(MSR_IA32_MCx_CTL2(bank), val);
> -		val &= ~MCI_CTL2_CMCI_EN;
> -		wrmsrl(MSR_IA32_MCx_CTL2(bank), val);
> -	}
> -	raw_spin_unlock_irqrestore(&cmci_discover_lock, flags);
> -}
> -
>  static bool cmci_storm_detect(void)
>  {
>  	unsigned int cnt = __this_cpu_read(cmci_storm_cnt);
> @@ -223,7 +228,7 @@ static bool cmci_storm_detect(void)
>  	if (cnt <= CMCI_STORM_THRESHOLD)
>  		return false;
>  
> -	cmci_storm_disable_banks();
> +	cmci_toggle_interrupt_mode(false);
>  	__this_cpu_write(cmci_storm_state, CMCI_STORM_ACTIVE);
>  	r = atomic_add_return(1, &cmci_storm_on_cpus);
>  	mce_timer_kick(CMCI_STORM_INTERVAL);