Date: Fri, 18 Nov 2016 08:42:04 -0800
From: "Luck, Tony"
To: Borislav Petkov
Cc: linux-kernel@vger.kernel.org, Andi Kleen
Subject: Re: [PATCH 1/2] x86/mce: Include the PPIN in machine check records when it is available
Message-ID: <20161118164204.GA10543@intel.com>
References: <1479429348-1664-1-git-send-email-tony.luck@intel.com> <1479429348-1664-2-git-send-email-tony.luck@intel.com> <20161118130022.vcnodxrslg6khycc@pd.tnic>
In-Reply-To: <20161118130022.vcnodxrslg6khycc@pd.tnic>

On Fri, Nov 18, 2016 at 02:00:22PM +0100, Borislav Petkov wrote:
> On Thu, Nov 17, 2016 at 04:35:48PM -0800, Luck, Tony wrote:
> > @@ -2134,8 +2140,37 @@ static int __init mcheck_enable(char *str)
> >  }
> >  __setup("mce", mcheck_enable);
> >
> > +static void mcheck_intel_ppin_init(void)
>
> So this functionality could all be moved to arch/x86/kernel/cpu/intel.c
> where you could set an artificial X86_FEATURE_PPIN and get rid of the
> have_ppin var.

Ok - will do.

> > +	switch (boot_cpu_data.x86_model) {
> > +	case INTEL_FAM6_IVYBRIDGE_X:
> > +	case INTEL_FAM6_HASWELL_X:
> > +	case INTEL_FAM6_BROADWELL_XEON_D:
> > +	case INTEL_FAM6_BROADWELL_X:
> > +	case INTEL_FAM6_SKYLAKE_X:
> > +		if (rdmsrl_safe(MSR_PPIN_CTL, &msr_ppin_ctl))
> > +			return;
>
> I don't think you need to check models - if the RDMSR fails, you're
> done.

Other models may use this MSR number for some other purpose. So the
read might succeed, but what I get might be something else entirely.

Technically with the model check I shouldn't have to use the _safe
versions ... but I'm paranoid that some SKUs might not implement this.

> > +		if (msr_ppin_ctl == 1) {
>
> & BIT_ULL(0)
>
> for future robustness in case those other reserved bits get used.

Unlikely ... but paranoia is good (see above about using rdmsr_safe).

> > +			pr_info("PPIN available but disabled\n");
>
> We don't care, do we?

Probably not ... there might be a BIOS setting, but the user that finds
they aren't getting PPIN in their logs could diagnose by making their
own rdmsr checks ... will delete this pr_info().

> > +			return;
> > +		}
> > +		/* if PPIN is disabled, but not locked, try to enable */
> > +		if (msr_ppin_ctl == 0) {
>
> Also, properly masked off. There are [63:2] reserved bits which might be
> assigned someday.

Ok.

> > +			wrmsrl_safe(MSR_PPIN_CTL, 2);
> > +			rdmsrl_safe(MSR_PPIN_CTL, &msr_ppin_ctl);
>
> Why aren't we programming a number here? Or are users supposed to do
> that?
>
> If so, please design a proper sysfs interface and not make them use
> msr-tools.

The PPIN is programmed at the fab. To the user it is just a handy unique
number. I think Intel can decode it back to which fab and production run
this chip came from (useful to us if there are many chips reporting some
error).

> > +		}
> > +		if (msr_ppin_ctl == 2)
> > +			have_ppin = 1;
>
> set_cpu_cap(c, X86_FEATURE_PPIN);

Yes - that looks prettier.

Thanks

-Tony
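
For readers following the masking discussion, here is a minimal
userspace sketch of the PPIN_CTL check with the reserved bits [63:2]
masked off. It is not the revised patch: struct fake_msr, fake_rdmsr(),
fake_wrmsr() and ppin_usable() are hypothetical stand-ins for the real
MSR and for rdmsrl_safe()/wrmsrl_safe(), the macro names are invented
for illustration, and the bit meanings are assumed from the values
discussed in this thread (1 = locked, 2 = enabled).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout, taken from the values discussed in the thread. */
#define PPIN_CTL_LOCKOUT (1ULL << 0)	/* bit 0: control register is locked */
#define PPIN_CTL_ENABLE  (1ULL << 1)	/* bit 1: PPIN is enabled */
#define PPIN_CTL_MASK    (PPIN_CTL_LOCKOUT | PPIN_CTL_ENABLE)

/* Hypothetical stand-in for the hardware register. */
struct fake_msr {
	uint64_t val;
};

/* Stand-in for rdmsrl_safe(): returns 0 on success. */
int fake_rdmsr(const struct fake_msr *m, uint64_t *out)
{
	*out = m->val;
	return 0;
}

/*
 * Stand-in for wrmsrl_safe(): returns 0 on success.
 * Assumption: a write to a locked register fails.
 */
int fake_wrmsr(struct fake_msr *m, uint64_t v)
{
	if (m->val & PPIN_CTL_LOCKOUT)
		return -1;
	m->val = v;
	return 0;
}

/* Returns true when PPIN ends up enabled, looking only at bits [1:0]. */
bool ppin_usable(struct fake_msr *ctl)
{
	uint64_t val;

	if (fake_rdmsr(ctl, &val))
		return false;

	/* Locked while disabled: nothing more we can do. */
	if ((val & PPIN_CTL_MASK) == PPIN_CTL_LOCKOUT)
		return false;

	/* Disabled but not locked: try to enable, keeping the other bits. */
	if (!(val & PPIN_CTL_MASK)) {
		fake_wrmsr(ctl, val | PPIN_CTL_ENABLE);
		if (fake_rdmsr(ctl, &val))
			return false;
	}

	/* Mirrors the "== 2" test in the quoted patch, after masking. */
	return (val & PPIN_CTL_MASK) == PPIN_CTL_ENABLE;
}
```

Comparing the masked value rather than the raw register means a
reserved bit that gets assigned later would not change the outcome,
which is the robustness point raised in the review.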