Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751682AbbEDPrO (ORCPT ); Mon, 4 May 2015 11:47:14 -0400 Received: from mail.skyhub.de ([78.46.96.112]:44169 "EHLO mail.skyhub.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750887AbbEDPrF (ORCPT ); Mon, 4 May 2015 11:47:05 -0400 Date: Mon, 4 May 2015 17:46:52 +0200 From: Borislav Petkov To: Aravind Gopalakrishnan Cc: tglx@linutronix.de, mingo@redhat.com, hpa@zytor.com, tony.luck@intel.com, jiang.liu@linux.intel.com, yinghai@kernel.org, x86@kernel.org, dvlasenk@redhat.com, JBeulich@suse.com, slaoub@gmail.com, luto@amacapital.net, dave.hansen@linux.intel.com, oleg@redhat.com, rostedt@goodmis.org, rusty@rustcorp.com.au, prarit@redhat.com, linux@rasmusvillemoes.dk, jroedel@suse.de, andriy.shevchenko@linux.intel.com, macro@linux-mips.org, wangnan0@huawei.com, linux-kernel@vger.kernel.org, linux-edac@vger.kernel.org, Robert Richter Subject: Re: [PATCH 2/4] x86/mce/amd: Introduce deferred error interrupt handler Message-ID: <20150504154652.GF3829@pd.tnic> References: <1430405365-4473-1-git-send-email-Aravind.Gopalakrishnan@amd.com> <1430405365-4473-3-git-send-email-Aravind.Gopalakrishnan@amd.com> <20150503092212.GC18048@pd.tnic> <5547906E.3060701@amd.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <5547906E.3060701@amd.com> User-Agent: Mutt/1.5.23 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2429 Lines: 64 On Mon, May 04, 2015 at 10:29:50AM -0500, Aravind Gopalakrishnan wrote: > >What's the family check for? for BIOSes which don't set the LVT offset > >to 2, as they should? > > > >If so, we probably should say > > > > pr_err(FW_BUG "Your BIOS is not setting up LVT offset 0x2 for deferred error IRQs correctly.\n"); > > > >or similar... > > Yeah. I meant to provide a comment at least for this. > Forgot to do that. > > I'll print out a error message as you suggested (considering we do this in > other places like threshold setup or IBS setup..) lvt_off_valid() does that already. Adding Robert. > Right. I think a __log_error() is a good idea. > Except, in amd_threshold_interrupt(), we have- > m.misc = ((u64)high << 32) | low; > > which, is actually useless as we don't use m.misc anywhere in > amd_decode_mce() or anywhere else in the decoding pipeline AFAICT. > We only print out if 'misc' is valid and we only need status bits for that- > ((m->status & MCI_STATUS_MISCV) ? "MiscV" : "-"), > > But, more importantly, we don't setup 'm.addr' here (in > amd_threshold_interrupt() or in amd_deferred_error_interrupt()) > Which means anytime we pass an error to be decoded from the interrupt > handlers, we don't get any info about the error address. So what are we reporting with a deferred error if it is not a full-fledged MCE? We better fix that otherwise we probably shouldn't even report those. I mean, userspace is supposed to do some error handling based on error info but if that info's missing, we might just as well panic right then and there, right? > So, we can do one of these- > 1. Remove m.misc setup in amd_threshold_interrupt() and > rdmsrl(MSR_IA32_MCx_ADDR(bank), m.addr) before we call mce_log() > 2. Since we have mce_read_aux() that reads misc and addr registers, we can > move the mce_[rd|wr]msrl wrappers and mce_read_aux() into mce.h and use it > here in mce_amd.c > > Thoughts? Makes sense but you need to first check though, which registers are valid in the hw when a threshold/deferred error happens and collect them. Only then we can do proper recovery. Thanks. -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply. -- -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/