Date: Thu, 19 Nov 2015 17:15:21 +0100
From: Borislav Petkov
To: Tony Luck
Cc: "Chen, Gong" , linux-edac@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [UNTESTED PATCH] x86, mce: Avoid double entry of deferred errors into the genpool.
Message-ID: <20151119161521.GF6065@pd.tnic>
References: <20151111193845.GA9055@agluck-desk.sc.intel.com>
 <3165a4989dcb45fc0306438d40d0cf2ace429c4c.1447280215.git.tony.luck@intel.com>
In-Reply-To: <3165a4989dcb45fc0306438d40d0cf2ace429c4c.1447280215.git.tony.luck@intel.com>

On Wed, Nov 11, 2015 at 02:01:51PM -0800, Tony Luck wrote:
> We used to have a special ring buffer for deferred errors that
> was used to mark problem pages. We replaced that with a genpool.
> Then later converted mce_log() to also use the same genpool. As
> a result we end up adding all deferred errors to the genpool twice.
>
> Rearrange this code. Make sure to set the m.severity and m.usable_addr
> fields for deferred errors. Then if flags and mca_cfg.dont_log_ce mean
> we call mce_log() we are done, because that will add this entry to the
> genpool.
>
> If we skipped mce_log(), then we still want to take action for the
> deferred error, so add to the genpool.
>
> Changed the name of the boolean "error_logged" to "error_seen", we
> should set it whether or not we logged an error because the return
> value from machine_check_poll() is used to decide whether storms
> have subsided or not.
>
> Reported-by: Chen, Gong
> Signed-off-by: Tony Luck
> ---
>  arch/x86/kernel/cpu/mcheck/mce.c | 24 +++++++++++++-----------
>  1 file changed, 13 insertions(+), 11 deletions(-)

Applied, thanks.

Btw, looking at that mce.usable_addr, it doesn't make a whole lotta
sense to me and we can use mce_usable_address() directly instead and use
the byte in struct mce for something more important.

So how about I kill it (diff on top of yours):

---
diff --git a/arch/x86/include/uapi/asm/mce.h b/arch/x86/include/uapi/asm/mce.h
index 03429da2fa80..2184943341bf 100644
--- a/arch/x86/include/uapi/asm/mce.h
+++ b/arch/x86/include/uapi/asm/mce.h
@@ -16,7 +16,7 @@ struct mce {
 	__u8 cpuvendor;	/* cpu vendor as encoded in system.h */
 	__u8 inject_flags;	/* software inject flags */
 	__u8 severity;
-	__u8 usable_addr;
+	__u8 pad;
 	__u32 cpuid;	/* CPUID 1 EAX */
 	__u8 cs;		/* code segment */
 	__u8 bank;	/* machine check bank */
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 6531cb46803c..fb8b1db7b150 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -484,7 +484,7 @@ static int srao_decode_notifier(struct notifier_block *nb, unsigned long val,
 	if (!mce)
 		return NOTIFY_DONE;
 
-	if (mce->usable_addr && (mce->severity == MCE_AO_SEVERITY)) {
+	if (mce_usable_address(mce) && (mce->severity == MCE_AO_SEVERITY)) {
 		pfn = mce->addr >> PAGE_SHIFT;
 		memory_failure(pfn, MCE_VECTOR, 0);
 	}
@@ -610,12 +610,9 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 
 		severity = mce_severity(&m, mca_cfg.tolerant, NULL, false);
 
-		if (severity == MCE_DEFERRED_SEVERITY && memory_error(&m)) {
-			if (m.status & MCI_STATUS_ADDRV) {
+		if (severity == MCE_DEFERRED_SEVERITY && memory_error(&m))
+			if (m.status & MCI_STATUS_ADDRV)
 				m.severity = severity;
-				m.usable_addr = mce_usable_address(&m);
-			}
-		}
 
 		/*
 		 * Don't get the IP here because it's unlikely to
@@ -623,7 +620,7 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 		 */
 		if (!(flags & MCP_DONTLOG) && !mca_cfg.dont_log_ce)
 			mce_log(&m);
-		else if (m.usable_addr) {
+		else if (mce_usable_address(&m)) {
 			/*
 			 * Although we skipped logging this, we still want
 			 * to take action. Add to the pool so the registered
@@ -1091,7 +1088,6 @@ void do_machine_check(struct pt_regs *regs, long error_code)
 
 	/* assuming valid severity level != 0 */
 	m.severity = severity;
-	m.usable_addr = mce_usable_address(&m);
 
 	mce_log(&m);
 

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.