Date: Sat, 21 Nov 2015 20:15:56 +0100
From: Borislav Petkov
To: Tony Luck
Cc: "Chen, Gong", linux-edac@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [UNTESTED PATCH] x86, mce: Avoid double entry of deferred errors into the genpool.
Message-ID: <20151121191556.GB15172@pd.tnic>
In-Reply-To: <3165a4989dcb45fc0306438d40d0cf2ace429c4c.1447280215.git.tony.luck@intel.com>

On Wed, Nov 11, 2015 at 02:01:51PM -0800, Tony Luck wrote:
> We used to have a special ring buffer for deferred errors that
> was used to mark problem pages. We replaced that with a genpool.
> Then later converted mce_log() to also use the same genpool. As
> a result we end up adding all deferred errors to the genpool twice.
>
> Rearrange this code. Make sure to set the m.severity and m.usable_addr
> fields for deferred errors. Then if flags and mca_cfg.dont_log_ce mean
> we call mce_log() we are done, because that will add this entry to the
> genpool.
>
> If we skipped mce_log(), then we still want to take action for the
> deferred error, so add to the genpool.
>
> Changed the name of the boolean "error_logged" to "error_seen"; we
> should set it whether or not we logged an error because the return
> value from machine_check_poll() is used to decide whether storms
> have subsided or not.
>
> Reported-by: Chen, Gong
> Signed-off-by: Tony Luck
> ---
>  arch/x86/kernel/cpu/mcheck/mce.c | 24 +++++++++++++-----------
>  1 file changed, 13 insertions(+), 11 deletions(-)

...

> @@ -626,9 +621,16 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
>  		 * Don't get the IP here because it's unlikely to
>  		 * have anything to do with the actual error location.
>  		 */
> -		if (!(flags & MCP_DONTLOG) && !mca_cfg.dont_log_ce) {
> -			error_logged = true;
> +		if (!(flags & MCP_DONTLOG) && !mca_cfg.dont_log_ce)
>  			mce_log(&m);
> +		else if (m.usable_addr) {
> +			/*
> +			 * Although we skipped logging this, we still want
> +			 * to take action. Add to the pool so the registered
> +			 * notifiers will see it.
> +			 */
> +			if (!mce_gen_pool_add(&m))
> +				mce_schedule_work();

Right, this still causes the error to be reported on AMD, because the
notifier calls amd_decode_mce().

I guess we can extend the "if (m.usable_addr)" check above with "if the
error is not a CE" too, and only add it to the generic pool when its
severity is stronger than MCE_KEEP_SEVERITY...

Also, I'm sending two more fixes, which I made while injecting errors in
a kvm guest, as a reply to this message. Will inject on a real box too.

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.