From: Tony Luck
Date: Mon, 9 Nov 2015 10:26:08 -0800
Subject: [RFC PATCH 0/3] Machine check recovery when kernel accesses poison
To: Borislav Petkov
Cc: linux-kernel@vger.kernel.org, linux-edac@vger.kernel.org, x86@kernel.org

This is a first draft to show the direction I'm taking to make it
possible for the kernel to recover from machine checks taken while
kernel code is executing.

In a nutshell, I'm duplicating the existing annotation mechanism we use
to handle faults when copying to/from user space. The machine check
handler then checks whether we were running code in a marked region
and, if so, fudges the return IP to point to the recovery path.

Note that I also fudge the return value. I'd like in the future to be
able to write a "mcsafe_copy_from_user()" function that would be
annotated for both page faults and machine checks, returning either a
count of bytes uncopied or an indication that there was a machine
check. Hence the BIT(63) bit. Internal feedback suggested we'd need
some IS_ERR()-like macros to help users decode what happened and take
the right action. But this is "RFC" to see if people have better ideas
on how to handle this.

This almost certainly breaks 32-bit kernels. We must not break them,
but I don't see that we need to make this work for them: machine check
recovery only works on Xeon systems whose minimum memory is too big
for a 32-bit kernel to even boot.
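To make the mechanism above concrete, here is a minimal user-space sketch of the two ideas: looking up the faulting IP in a fixup table, and tagging the return value with BIT(63). It is not the code from the patches. Every name in it (mc_fixup_entry, search_mc_fixup(), mc_try_fixup(), mcsafe_is_mc(), mcsafe_uncopied(), struct fake_regs) and the table contents are hypothetical, and storing only the flag in the return register is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name for the BIT(63) marker mentioned in the cover letter. */
#define MCSAFE_MC_FLAG (UINT64_C(1) << 63)

/* One annotation: an instruction that may touch poison, and where to resume. */
struct mc_fixup_entry {
	uint64_t insn;   /* address of the annotated instruction */
	uint64_t fixup;  /* address of the recovery path */
};

/* Made-up addresses, sorted by insn so the table can be binary searched. */
static const struct mc_fixup_entry mc_fixup_table[] = {
	{ 0x1000, 0x9000 },
	{ 0x1040, 0x9010 },
	{ 0x2000, 0x9020 },
};

/* Binary search for an entry whose insn matches the interrupted IP. */
static const struct mc_fixup_entry *search_mc_fixup(uint64_t ip)
{
	size_t lo = 0;
	size_t hi = sizeof(mc_fixup_table) / sizeof(mc_fixup_table[0]);

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (mc_fixup_table[mid].insn == ip)
			return &mc_fixup_table[mid];
		if (mc_fixup_table[mid].insn < ip)
			lo = mid + 1;
		else
			hi = mid;
	}
	return NULL;
}

/* Stand-in for the saved register state seen by the handler. */
struct fake_regs {
	uint64_t ip;  /* interrupted instruction pointer */
	uint64_t ax;  /* return value register */
};

/*
 * Model of the handler's decision: if the IP is annotated, redirect it to
 * the recovery path and flag the return value. Returns 1 if recovered,
 * 0 if the IP was not in a marked region (registers left untouched).
 */
static int mc_try_fixup(struct fake_regs *regs)
{
	const struct mc_fixup_entry *e = search_mc_fixup(regs->ip);

	if (!e)
		return 0;
	regs->ip = e->fixup;
	regs->ax = MCSAFE_MC_FLAG;  /* assumption: only the flag is stored */
	return 1;
}

/* IS_ERR()-style helpers for callers decoding the return value. */
static int mcsafe_is_mc(uint64_t ret)
{
	return (ret & MCSAFE_MC_FLAG) != 0;
}

static uint64_t mcsafe_uncopied(uint64_t ret)
{
	return ret & ~MCSAFE_MC_FLAG;
}
```

A caller would test mcsafe_is_mc() first and only then treat the remaining bits as a count of uncopied bytes. Reserving the top bit works in this model because a 64-bit byte count never legitimately reaches 2^63, which is also one reason the scheme does not carry over to 32-bit kernels.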

Tony Luck (3):
  x86, ras: Add new infrastructure for machine check fixup tables
  x86, ras: Extend machine check recovery code to annotated ring0 areas
  x86, ras: Add mcsafe_memcpy() function to recover from machine checks

 arch/x86/include/asm/asm.h                |  7 +++
 arch/x86/include/asm/uaccess.h            |  1 +
 arch/x86/include/asm/uaccess_64.h         |  3 +
 arch/x86/kernel/cpu/mcheck/mce-severity.c | 19 ++++++-
 arch/x86/kernel/cpu/mcheck/mce.c          | 13 ++++-
 arch/x86/kernel/x8664_ksyms_64.c          |  2 +
 arch/x86/lib/copy_user_64.S               | 91 +++++++++++++++++++++++++++++++
 arch/x86/mm/extable.c                     | 16 ++++++
 include/asm-generic/vmlinux.lds.h         |  6 ++
 include/linux/module.h                    |  1 +
 kernel/extable.c                          | 14 +++++
 11 files changed, 168 insertions(+), 5 deletions(-)

-- 
2.1.4