From: "Kani, Toshimitsu"
To: "tony.luck@intel.com", "bp@alien8.de"
CC: "linux-kernel@vger.kernel.org", "tglx@linutronix.de", "mchehab@kernel.org", "rjw@rjwysocki.net", "srinivas.pandruvada@linux.intel.com", "lenb@kernel.org", "linux-acpi@vger.kernel.org", "linux-edac@vger.kernel.org"
Subject: Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac
Date: Tue, 18 Jul 2017 19:58:54 +0000
Message-ID: <1500407379.2042.21.camel@hpe.com>
References: <20170717215912.26070-1-toshi.kani@hpe.com> <20170717215912.26070-4-toshi.kani@hpe.com> <20170718060007.GB8736@nazgul.tnic>
In-Reply-To: <20170718060007.GB8736@nazgul.tnic>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 2017-07-18 at 08:00 +0200, Borislav Petkov wrote:
> On Mon, Jul 17, 2017 at 03:59:12PM -0600, Toshi Kani wrote:
> > The ghes_edac driver was introduced in 2013 [1], but it has not
> > been enabled by any distro yet.  This driver obtains error info
> > from firmware interfaces, which are not properly implemented on
> > many platforms, as the driver always emits the messages below:
> >
> >  This EDAC driver relies on BIOS to enumerate memory and get error
> > reports.  Unfortunately, not all BIOSes reflect the memory layout
> > correctly.  So, the end result of using this driver varies from
> > vendor to vendor.  If you find incorrect reports, please contact
> > your hardware vendor  to correct its BIOS.
> >
> > To get out from this situation, add a platform type check to
> > selectively enable the driver on the platforms that are known to
> > have proper firmware implementation.  Platform vendors can add
> > their platforms to the list when they support ghes_edac.
>
> So maintaining whitelists for things has always been a PITA and we
> should try to avoid it, if possible. (We can always do it if nothing
> saner comes along.)

Agreed.

> Now, below is a dirty patch converting ghes_edac to a normal module.
> On systems where we have GHES, the firmware generally disables the
> detection of the presence of ECC hardware, thus preventing the
> platform EDAC driver from loading.

I have HPE Haswell and Skylake test systems with GHES, but they do not
hide IMCs from the OS.  So, the sb_edac and skx_edac drivers get
attached on these systems when ghes_edac is disabled.

> Let me clarify: I have an AMD HP box which, when GHES is enabled in
> the BIOS, says that ECC is disabled in the memory controller and the
> amd64_edac driver doesn't load for that memory controller.

Hmm... what's the platform name of this box?  I can look into this
case if you need.

> And I think we should try this first: have the firmware disable
> detection methods so that the platform drivers don't load.

I do not think we can rely on this method.

> Then, ghes_edac can be a simple module and no other driver would
> attempt loading.

I like the use of a notifier chain, which is much cleaner.

> The question is: does the platform do this disabling now?

Unfortunately, that is not the case today.  The IMCs cannot be hidden
with the Device Hide registers, for Skylake at least.

Thanks,
-Toshi