From: "Luck, Tony"
To: Borislav Petkov
CC: "Chen, Gong", "linux-edac@vger.kernel.org", "linux-kernel@vger.kernel.org"
Subject: RE: [UNTESTED PATCH] x86, mce: Avoid double entry of deferred errors into the genpool.
Date: Tue, 24 Nov 2015 15:51:21 +0000

>> Ok ... applied those two on top of my "UNTESTED" patch and injected an error to force a UCNA log.
>
> Ok, what error type is that in EINJ nomenclature?
> I had only
>
> /sys/kernel/debug/apei/einj/available_error_type:0x00000002 Processor Uncorrectable non-fatal
> /sys/kernel/debug/apei/einj/available_error_type:0x00000008 Memory Correctable
> /sys/kernel/debug/apei/einj/available_error_type:0x00000010 Memory Uncorrectable non-fatal
>
> and I would've guessed it is the 0x10 type, i.e., the memory
> uncorrectable which is non-fatal - assuming here - but that one got
> promoted to a #MC on my box.

I juggled with the type of the injection and the instruction sequence to access the target location.

I used 0x10 to inject an uncorrected memory error with "# echo 1 > notrigger" to make sure the
EINJ driver skipped the trigger actions. Then I had a user mode test program write a byte to the
cache line. That pulled the uncorrected data into the cache (which logged the UCNA error signaled
with CMCI). But the processor didn't actually consume the poison (no registers had corrupted data),
so there was no machine check.

Sneaky, huh?

-Tony
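
For anyone wanting to reproduce this, a rough sketch of the EINJ side of the sequence described above might look like the following. It is not the script used for the test in this thread. The function name, the optional directory argument, and the page-granular param2 mask are assumptions made for illustration; the file names (param1, param2, error_type, notrigger, error_inject) are the standard EINJ debugfs entries. The physical address of the target cache line has to come from the caller.

```shell
#!/bin/sh
# Sketch only: needs root, debugfs mounted, and a platform whose
# available_error_type list includes 0x10 (Memory Uncorrectable non-fatal).

# einj_inject_notrigger PHYS_ADDR [EINJ_DIR]   (hypothetical helper name)
#   PHYS_ADDR  physical address of the target cache line, supplied by the caller
#   EINJ_DIR   EINJ debugfs directory, defaults to the usual location
einj_inject_notrigger() {
    addr=$1
    dir=${2:-/sys/kernel/debug/apei/einj}

    [ -n "$addr" ] || { echo "usage: einj_inject_notrigger PHYS_ADDR [EINJ_DIR]" >&2; return 2; }
    [ -d "$dir" ]  || { echo "no EINJ directory at $dir" >&2; return 1; }

    echo "$addr"            > "$dir/param1"       || return 1  # target physical address
    echo 0xfffffffffffff000 > "$dir/param2"       || return 1  # address mask (assumed value)
    echo 0x10               > "$dir/error_type"   || return 1  # memory uncorrectable non-fatal
    echo 1                  > "$dir/notrigger"    || return 1  # skip the trigger actions
    echo 1                  > "$dir/error_inject" || return 1  # plant the error
}
```

After a call such as `einj_inject_notrigger "$PHYS_ADDR"`, the second half of the recipe is the user mode program storing one byte to that cache line. That part is not covered by the sketch, since the program needs its own way to map the physical address it hands to EINJ.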