Date: Wed, 11 Nov 2015 13:48:04 -0800
From: "Luck, Tony"
To: Borislav Petkov
Cc: linux-kernel@vger.kernel.org, linux-edac@vger.kernel.org, x86@kernel.org
Subject: Re: [RFC PATCH 0/3] Machine check recovery when kernel accesses poison
Message-ID: <20151111214803.GA11052@agluck-desk.sc.intel.com>
In-Reply-To: <20151111204157.GL22512@pd.tnic>
References: <20151110112101.GB19187@pd.tnic> <20151110215546.GA28172@agluck-desk.sc.intel.com> <20151111204157.GL22512@pd.tnic>

On Wed, Nov 11, 2015 at 09:41:58PM +0100, Borislav Petkov wrote:
> On Tue, Nov 10, 2015 at 01:55:46PM -0800, Luck, Tony wrote:
> > I need to add more to the motivation part of this. The people who want
> > this are playing with NVDIMMs as storage. So think of many GBytes of
> > non-volatile memory on the source end of the memcpy(). People are used
> > to disk errors just giving them a -EIO error. They'll be unhappy if an
> > NVDIMM error crashes the machine.
>
> Ah.
>
> Btw, there's no flag, by chance, somewhere in the MCA regs bunch at
> error time which says that the error is originating from NVDIMM? Because
> if there were, this patchset is moot. :)

No flag. We can search MCi_ADDR across the ranges to see whether this
was a normal RAM error or a non-volatile one.

But that doesn't make this patch moot.
We still need to change the return address to go to the fixup code
instead of back to the place where we hit the error.

The exception table is a list of pairs of instruction pointers:

	[Instruction-that-may-fault, Address-of-fixup-code]

In my RFC code I only have one function that can fault, and all the
fixup addresses point to the same place. But that doesn't scale to
adding more functions (like mcsafe_copy_from_user()).

-Tony
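
The exception table described above can be modelled in a few lines of
user-space C. This is only a sketch under stated assumptions: struct
ex_entry, find_fixup(), recover(), demo_table and every address in it
are made up for illustration, and the kernel's real table layout and
lookup differ in detail. It shows why per-instruction fixup addresses
matter once more than one function can take a recoverable machine check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one exception table entry (not the kernel's layout). */
struct ex_entry {
    uintptr_t insn;   /* address of an instruction that may fault */
    uintptr_t fixup;  /* address to resume at if that instruction faults */
};

/*
 * Binary search for the faulting instruction pointer.
 * The table must be sorted by insn, ascending.
 * Returns the fixup address, or 0 if ip has no entry.
 */
static uintptr_t find_fixup(const struct ex_entry *table, size_t n, uintptr_t ip)
{
    size_t lo = 0, hi = n;

    assert(table != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (table[mid].insn == ip)
            return table[mid].fixup;
        if (table[mid].insn < ip)
            lo = mid + 1;
        else
            hi = mid;
    }
    return 0;
}

/*
 * Model of the recovery step: if the faulting ip has an entry, rewrite
 * the return address to the fixup code and report success (1).
 * Otherwise leave ip alone and report failure (0).
 */
static int recover(const struct ex_entry *table, size_t n, uintptr_t *ip)
{
    uintptr_t fixup = find_fixup(table, n, *ip);

    if (!fixup)
        return 0;
    *ip = fixup;
    return 1;
}

/* Made-up addresses: two loads in one copy routine, one in a second routine. */
static const struct ex_entry demo_table[] = {
    { 0x1000, 0x9000 },  /* first function, load 1  -> its fixup */
    { 0x1008, 0x9000 },  /* first function, load 2  -> same fixup */
    { 0x2000, 0xa000 },  /* second function, load 1 -> a different fixup */
};
```

With a single faulting function every entry can share one fixup address,
as in the first two rows. The third row is the case that does not scale
without a real lookup: a second function needs its own fixup target.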