From: Xie XiuQi
Subject: [PATCH v5 0/3] arm64/ras: support sea error recovery
Date: Fri, 26 Jan 2018 20:31:22 +0800
Message-ID: <1516969885-150532-1-git-send-email-xiexiuqi@huawei.com>
X-Mailing-List: linux-kernel@vger.kernel.org

With the ARMv8.2 RAS Extension, an SEA (Synchronous External Abort) is
usually triggered when a memory error is consumed. In the existing
flow, an error consumed in the kernel leads directly to a panic, and an
error consumed in user space simply kills the process.
But there is a class of errors for which killing the process is not
necessary: the process can recover and continue to run. One example is
corrupted instruction data. The memory page may be read-only and
unmodified, so the disk still holds the correct data, and the page can
simply be dropped and reloaded when it is needed again.

This patchset tries to solve exactly that problem: if the error is
consumed in user space and occurs on a clean page, the page can be
dropped without killing the process.

If the corrupted page is clean, it is dropped and control returns to
user space without side effects. If the corrupted page is dirty,
memory_failure() sends SIGBUS with code=BUS_MCEERR_AR. Without this
patchset, do_sea() just sends SIGBUS, so the process is always killed
at that point.

Because memory_failure() may sleep, we cannot call it directly in SEA
exception context. So we save the faulting physical address associated
with the process in the ghes handler and set __TIF_SEA_NOTIFY. When we
return from SEA exception context and enter do_notify_resume() before
the process runs again, we check the flag and call memory_failure() to
do the recovery. This is safe, because we are in process context by
then.

On some platforms, when an SEA is triggered, the physical address may
be reported by either the memory section or the processor section, so
we save the address in both places.

---
v5 - v4:
 - rebased on top of 4.15-rc9 + efi patches
   efi patches:
   https://patchwork.codeaurora.org/patch/415877/
   https://patchwork.codeaurora.org/patch/415879/
 - add Tyler & Xiongfeng's Tested-by.
v4 - v3:
 - rebase on top of the latest mainline
 - make ghes_arm_process_error a weak function
 - only pick cache errors from the arm processor section for error recovery
 - fix s-o-b issue

https://lkml.org/lkml/2017/9/7/98

v3 - v2:
 - fix patch style issue

v2 - v1:
 - wrap arm_proc_error_check and log_arm_hw_error in a single
   arm_process_error()
 - fix sea_save_info return value issue
 - fix link error if CONFIG_ARM64_ERR_RECOV is not selected
 - use a notify chain instead of calling arch_apei_report_mem_error
   directly

https://lkml.org/lkml/2017/9/1/189

Xie XiuQi (3):
  arm64/ras: support sea error recovery
  GHES: add a notify chain for process memory section
  arm64/ras: save error address from memory section for recovery

 arch/arm64/Kconfig                   |  11 +++
 arch/arm64/include/asm/ras.h         |  23 +++++
 arch/arm64/include/asm/thread_info.h |   4 +-
 arch/arm64/kernel/Makefile           |   1 +
 arch/arm64/kernel/ras.c              | 173 +++++++++++++++++++++++++++++++++++
 arch/arm64/kernel/signal.c           |   7 ++
 arch/arm64/mm/fault.c                |  27 ++++--
 drivers/acpi/apei/ghes.c             |  18 +++-
 include/acpi/ghes.h                  |  11 +++
 9 files changed, 265 insertions(+), 10 deletions(-)
 create mode 100644 arch/arm64/include/asm/ras.h
 create mode 100644 arch/arm64/kernel/ras.c

-- 
1.8.3.1
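
For readers who want the control flow of this cover letter at a glance,
here is a rough user-space model of it in C. It is only a sketch and is
not code from the patches: struct task_model, sea_save_error(),
sea_decide() and sea_notify_resume() are hypothetical names made up for
illustration, and the "page is dirty" input is passed in as a plain
boolean instead of being looked up by memory_failure(). It shows the
two halves described above: the exception-context half only records the
address and sets a flag, and the process-context half acts on it later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcome of handling one consumed memory error. */
enum sea_outcome {
	SEA_PANIC,        /* error consumed in kernel mode */
	SEA_PAGE_DROPPED, /* clean user page: drop it, reload on next access */
	SEA_SIGBUS_AR,    /* dirty user page: SIGBUS with BUS_MCEERR_AR */
};

/* Hypothetical per-task state: the saved address and the notify flag. */
struct task_model {
	bool     sea_notify; /* models the "SEA pending" thread flag */
	uint64_t sea_paddr;  /* faulting physical address saved by the handler */
};

/* Exception-context half: must not sleep, so it only records the error. */
static void sea_save_error(struct task_model *t, uint64_t paddr)
{
	t->sea_paddr  = paddr;
	t->sea_notify = true;
}

/* Policy from the cover letter, reduced to two booleans. */
static enum sea_outcome sea_decide(bool user_mode, bool page_dirty)
{
	if (!user_mode)
		return SEA_PANIC;
	return page_dirty ? SEA_SIGBUS_AR : SEA_PAGE_DROPPED;
}

/*
 * Process-context half: models the check on the return-to-user path,
 * where sleeping is allowed. Returns true if a pending error was
 * handled and stores the outcome in *out.
 */
static bool sea_notify_resume(struct task_model *t, bool page_dirty,
			      enum sea_outcome *out)
{
	if (!t->sea_notify)
		return false;

	t->sea_notify = false;
	*out = sea_decide(true, page_dirty);
	return true;
}
```

In this model a task with no pending error passes through
sea_notify_resume() untouched, while a task that went through
sea_save_error() first ends up with SEA_PAGE_DROPPED for a clean page
and SEA_SIGBUS_AR for a dirty one.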