From: "Elliott, Robert (Persistent Memory)"
To: Andrew Morton , "Luck, Tony"
CC: Borislav Petkov , Dave Hansen , "Naoya Horiguchi" , "x86@kernel.org" , "linux-mm@kvack.org" , "linux-kernel@vger.kernel.org"
Subject: RE: [PATCH-resend] mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages
Date: Thu, 17 Aug 2017 22:29:48 +0000
References: <20170816171803.28342-1-tony.luck@intel.com> <20170817150942.017f87537b6cbb48e9cfc082@linux-foundation.org>
In-Reply-To: <20170817150942.017f87537b6cbb48e9cfc082@linux-foundation.org>
> -----Original Message-----
> From: Andrew Morton [mailto:akpm@linux-foundation.org]
> Sent: Thursday, August 17, 2017 5:10 PM
> To: Luck, Tony
> Cc: Borislav Petkov ; Dave Hansen ;
> Naoya Horiguchi ; Elliott, Robert (Persistent
> Memory) ; x86@kernel.org; linux-mm@kvack.org;
> linux-kernel@vger.kernel.org
> Subject: Re: [PATCH-resend] mm/hwpoison: Clear PRESENT bit for kernel 1:1
> mappings of poison pages
>
> On Wed, 16 Aug 2017 10:18:03 -0700 "Luck, Tony"
> wrote:
>
> > Speculative processor accesses may reference any memory that has a
> > valid page table entry. While a speculative access won't generate
> > a machine check, it will log the error in a machine check bank. That
> > could cause escalation of a subsequent error since the overflow bit
> > will be then set in the machine check bank status register.
> >
> > Code has to be double-plus-tricky to avoid mentioning the 1:1 virtual
> > address of the page we want to map out otherwise we may trigger the
> > very problem we are trying to avoid. We use a non-canonical address
> > that passes through the usual Linux table walking code to get to the
> > same "pte".
> >
> > Thanks to Dave Hansen for reviewing several iterations of this.
>
> It's unclear (to lil ole me) what the end-user-visible effects of this
> are.
>
> Could we please have a description of that? So a) people can
> understand your decision to cc:stable and b) people whose kernels are
> misbehaving can use your description to decide whether your patch might
> fix the issue their users are reporting.

In general, the system is subject to halting due to uncorrectable memory
errors at addresses that software is not even accessing.
The first error doesn't cause the crash, but if a second error occurs
before the machine check handler has serviced the first one, the handler
will find the Overflow bit set and cannot tell which errors, or how many,
occurred (e.g., one of them might have been in an instruction fetch, in
which case the instructions the CPU is about to run are bogus). Halting
is the only safe thing to do.

For persistent memory, the BIOS reports known-bad addresses via ACPI ARS
(Address Range Scrub). Because the memory is persistent, those addresses
are likely to reappear on every boot, so you can't just reboot and hope
they go away. Software is supposed to avoid reading those addresses
until it repairs them (e.g., by writing new data to those locations).
Even if it follows this rule, the system can still crash if speculative
reads (e.g., prefetches) touch those addresses. Tony's patch clears the
PRESENT bit in the kernel 1:1 mapping of those pages so the CPU won't
speculatively read them.

---
Robert Elliott, HPE Persistent Memory