From: Tony Luck
To: Sean Christopherson, Jarkko Sakkinen, Dave Hansen
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Tony Luck
Subject: [PATCH v2 0/6] Basic recovery for machine checks inside SGX
Date: Mon, 19 Jul 2021 11:20:03 -0700
Message-Id: <20210719182009.1409895-1-tony.luck@intel.com>
In-Reply-To: <20210708181423.1312359-1-tony.luck@intel.com>

This version is very different from version 1, based on feedback.

Sean: Didn't like tracking the types of SGX pages, so that's all gone
now. I do track the life cycle (in patch 1), using the "owner" field to
determine whether a page is in use or dirty/free. This series doesn't
currently make use of that, so patch 1 could be dropped. But it is very
small, and I think it is a prerequisite for future improvements that take
pre-emptive action on asynchronous poison notification (rather than just
hoping that the enclave will exit without accessing the poison, or that,
if it does consume the poison, the error will be recoverable).

I think we should defer the whole asynchronous action to a subsequent
series that can build on top of this one (and do it properly: my
version 1 sent SIGBUS signals without regard for the system-wide
(/proc/sys/vm/memory_failure_early_kill) or per-task (prctl
PR_MCE_KILL) policies).

Jarkko: Said poison pages should not just be dropped on the floor; they
should be added to a list for future tools to examine. I tried the list
approach, but safely removing pages from the free/dirty lists involved
some complex locking. So I skipped ahead to the "tools" idea and just
added files in debugfs that show the count of poison pages and a list
of their addresses. (Maybe the count is redundant? One could just run
"wc -l poison_page_list".)

Other: I got a complaint that after a poison page is handled, Linux
prints this message:

    Could not invalidate pfn=0x2000c4d from 1:1 map

This comes from set_mce_nospec() and happens because EPC pages are not
in the 1:1 map. Add code to check for EPC pages and ignore them.
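The patch 1 life-cycle tracking is only outlined above. The following is a minimal sketch of the idea. It assumes a page descriptor whose "owner" pointer is non-NULL exactly while the page is in use and NULL while it is dirty or free. The struct and helper names are hypothetical and are not the ones used in the patch:

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical, simplified stand-in for an EPC page descriptor
 * (the real kernel structure has more fields).
 */
struct epc_page_sketch {
	unsigned long pfn;
	void *owner;	/* NULL while the page is dirty or free */
};

/* Allocation records the owner; the caller must pass a non-NULL owner. */
static void epc_page_mark_in_use(struct epc_page_sketch *page, void *owner)
{
	page->owner = owner;
}

/* Freeing the page (back to the dirty/free pool) clears the owner. */
static void epc_page_mark_free(struct epc_page_sketch *page)
{
	page->owner = NULL;
}

/*
 * Poison handling could branch on this. For example, a dirty/free page
 * could simply never be handed out again, while an in-use page would
 * need further action.
 */
static bool epc_page_in_use(const struct epc_page_sketch *page)
{
	return page->owner != NULL;
}
```

The attraction of this design is that it reuses an existing pointer as the state indicator rather than adding per-type tracking. That matches the feedback that led to dropping page types.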
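Any future asynchronous handling would need to honor the two policies mentioned above. As background, here is a minimal user-space sketch of how a task opts into "early kill" with prctl(PR_MCE_KILL) and how the system-wide default can be read. The prctl constants and the sysctl path are standard Linux interfaces; the helper names are illustrative only:

```c
#include <stdio.h>
#include <sys/prctl.h>

/*
 * Ask for early kill for the calling task: SIGBUS as soon as poison in
 * one of its pages is detected, not only when the page is touched.
 * Returns 0 on success, -1 on error (errno set by prctl).
 */
static int request_early_mce_kill(void)
{
	return prctl(PR_MCE_KILL, (unsigned long)PR_MCE_KILL_SET,
		     (unsigned long)PR_MCE_KILL_EARLY, 0UL, 0UL);
}

/* Returns PR_MCE_KILL_EARLY, PR_MCE_KILL_LATE, PR_MCE_KILL_DEFAULT, or -1. */
static int current_mce_kill_policy(void)
{
	return prctl(PR_MCE_KILL_GET, 0UL, 0UL, 0UL, 0UL);
}

/*
 * System-wide default, used by tasks left at PR_MCE_KILL_DEFAULT.
 * Returns 0 or 1, or -1 if the sysctl file is missing (for example on
 * a kernel built without memory-failure support).
 */
static int system_early_kill_default(void)
{
	FILE *f = fopen("/proc/sys/vm/memory_failure_early_kill", "r");
	int val = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &val) != 1)
		val = -1;
	fclose(f);
	return val;
}
```

A handler that sends SIGBUS on asynchronous notification would be expected to consult the per-task setting first and fall back to the sysctl only for tasks left at PR_MCE_KILL_DEFAULT. Version 1 skipped both checks.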
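The fix described under "Other" amounts to an early return. Before set_mce_nospec() tries to remove a poisoned pfn from the 1:1 map, it checks whether the pfn lies in an EPC section. The user-space sketch below models only that check. The struct, the helper names, the 4 KiB page shift and the example address range are all assumptions for illustration; the kernel would consult its own record of EPC sections:

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12	/* assumed 4 KiB pages */

/* Hypothetical description of one EPC section as a physical range. */
struct epc_section_sketch {
	unsigned long long phys_start;
	unsigned long long size;
};

/* True if the page frame number falls inside any EPC section. */
static bool pfn_is_epc(const struct epc_section_sketch *secs, size_t nsecs,
		       unsigned long long pfn)
{
	unsigned long long paddr = pfn << SKETCH_PAGE_SHIFT;

	for (size_t i = 0; i < nsecs; i++) {
		/* The >= test runs first, so the subtraction cannot wrap. */
		if (paddr >= secs[i].phys_start &&
		    paddr - secs[i].phys_start < secs[i].size)
			return true;
	}
	return false;
}

/*
 * EPC pages are not in the 1:1 map, so there is nothing to invalidate.
 * Skipping them avoids the "Could not invalidate pfn=... from 1:1 map"
 * message.
 */
static bool should_invalidate_direct_map(const struct epc_section_sketch *secs,
					 size_t nsecs, unsigned long long pfn)
{
	return !pfn_is_epc(secs, nsecs, pfn);
}
```

With a made-up EPC section starting at 0x2000000000, the pfn from the reported message (0x2000c4d) is recognized as EPC and skipped. Ordinary RAM pfns still go through the normal invalidation path.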
Tony Luck (6):
  x86/sgx: Provide indication of life-cycle of EPC pages
  x86/sgx: Add infrastructure to identify SGX EPC pages
  x86/sgx: Initial poison handling for dirty and free pages
  x86/sgx: Add SGX infrastructure to recover from poison
  x86/sgx: Hook sgx_memory_failure() into mainline code
  x86/sgx: Add hook to error injection address validation

 .../firmware-guide/acpi/apei/einj.rst |  19 +++
 arch/x86/include/asm/set_memory.h     |   4 +
 arch/x86/kernel/cpu/sgx/encl.c        |   2 +-
 arch/x86/kernel/cpu/sgx/main.c        | 137 +++++++++++++++++-
 arch/x86/kernel/cpu/sgx/sgx.h         |   6 +-
 drivers/acpi/apei/einj.c              |   3 +-
 include/linux/mm.h                    |  15 ++
 mm/memory-failure.c                   |  19 ++-
 8 files changed, 195 insertions(+), 10 deletions(-)

base-commit: 2734d6c1b1a089fb593ef6a23d4b70903526fe0c
-- 
2.29.2