Received: by 2002:a05:7208:3188:b0:7e:5202:c8b4 with SMTP id r8csp798552rbd; Fri, 23 Feb 2024 04:09:03 -0800 (PST) X-Forwarded-Encrypted: i=3; AJvYcCUEe0WXGXgh41JuTC88IH9sOBaeo6nKneqMm6gXI83+xX+Gt/T0D4Bxu1igWveGZgo0G9dQ72pyCwYJdezErCaggMr1aoj7lg7X5oyPvg== X-Google-Smtp-Source: AGHT+IED7yNOL7xKK33pMLJs3joqCNIQ/kA7SYWY7Jp762VJGmvYn2TRHbMHCxURajZL/JJqX9ii X-Received: by 2002:a17:906:d8d1:b0:a3e:526c:3516 with SMTP id re17-20020a170906d8d100b00a3e526c3516mr1023184ejb.77.1708690143165; Fri, 23 Feb 2024 04:09:03 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1708690143; cv=pass; d=google.com; s=arc-20160816; b=tAI9YG07TkXbBKaMVW0nhqwhvui/CV37OLLv6hVhzJHF88MVStq5pT+Aq9akybRBsM +rVpkF0UV+yXEjtnMC+qVo6YFHTDzT0SaR2znjL83EAmlNs13mIEnevolPFpHPgLSSyF TJE4olJ7n6ftJ4VKKX4prJRg1mcsEFy3xchwsmmegqFJmBNpu2YZ0MQuMc/yem+B094K KB6NlgaYfRnlHiSjgbbea0l+V4zj/HnJKWAXPXYHtrv/VaoZB3LV4MlpN60MCpOusYlL 86P/NsXi8EtXqwnE4h2gmVhTG9aXL/ioHljMrDB0GQrkORRhjcrPh5QjYZSc9jPxar3F rxeQ== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:organization:references :in-reply-to:message-id:subject:cc:to:from:date; bh=xoIe3sPXYstbgPx+iFnzmo7Fbmwm9KqcCOlM3ee1tT4=; fh=gYmxPOVWiMcJpDPMUcuXHaRGtBhvAgZW5JkMHQQ3lvA=; b=MnHrn7kQycKHoWL0QWaF5OaU6QNQpKQF3Z4QcEDoqh3GxgnIAXPjqV0KkLQW1SVi/s oUQAe41zFzzNxsO47AkMr19imxZPtObxuEcCHKYmFPSm24T7fA8KLHaIW5QYMHSj0l2C cBqvv0amPx3gJlCvUsY0C/CZndi+8AykAH39LYM8A34wF+f1q3M75KTo/tmgHajcoIce G4BN6PSJOrh+l8oCzPI3RwivS1qg/+tjOzIgPlnBfamBVobyUa1aZ6zprrIKecn71Lv5 e0hlHykh+i4s7tpIMnelhxFctgNUF7ElsfPeTjYRuVQkJf9C22y2lwbADzt5b+8T1bOq QM2Q==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; arc=pass (i=1 spf=pass spfdomain=huawei.com dmarc=pass fromdomain=huawei.com); spf=pass (google.com: domain of linux-kernel+bounces-78271-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.80.249 as permitted sender) smtp.mailfrom="linux-kernel+bounces-78271-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=huawei.com Return-Path: Received: from am.mirrors.kernel.org (am.mirrors.kernel.org. [147.75.80.249]) by mx.google.com with ESMTPS id e19-20020a170906045300b00a3ea381b5c6si4132270eja.585.2024.02.23.04.09.03 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 23 Feb 2024 04:09:03 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-78271-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.80.249 as permitted sender) client-ip=147.75.80.249; Authentication-Results: mx.google.com; arc=pass (i=1 spf=pass spfdomain=huawei.com dmarc=pass fromdomain=huawei.com); spf=pass (google.com: domain of linux-kernel+bounces-78271-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.80.249 as permitted sender) smtp.mailfrom="linux-kernel+bounces-78271-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=huawei.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by am.mirrors.kernel.org (Postfix) with ESMTPS id E89191F22224 for ; Fri, 23 Feb 2024 12:09:02 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 508B67CF27; Fri, 23 Feb 2024 12:08:29 +0000 (UTC) Received: from frasgout.his.huawei.com (frasgout.his.huawei.com [185.176.79.56]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C1C0D7C099; Fri, 23 Feb 2024 12:08:24 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=185.176.79.56 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708690108; cv=none; b=EqOQlKx0eLtNyiFouykPfNY1o7/FV0SkimAoIelNj5+bSdjafc4qAimsXTF0J9StTgs5kZgDGqB9bmuCUmdGw+91nCBHT1bKgp4HTHqwE5QSmxpaEPKrN8nOylPiKt/6xBp48nYZcj+WE6AGrXLly/cRQLTLDfbjQxlnMgrn0dc= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708690108; c=relaxed/simple; bh=kbwCh5ivkreyialYBJQE89WYwcGywhW6JhNL6UddKKo=; h=Date:From:To:CC:Subject:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=JwFrFQbvJ+WkSWRlT5EkDsXKo8lRtMd0VCGlyy3pOsWIxLiTSpWu+vRjMn0U9kVNbyageK3wK0mpYz2WBI+2CJMiVwJQimMfUMJYp2ORUnYCfBvtMoRDdR/AlZ4tI0QXwec8t4U6ien7R/dX/QYeqD5qlHesVKlKOQkoQzZKeNs= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=Huawei.com; spf=pass smtp.mailfrom=huawei.com; arc=none smtp.client-ip=185.176.79.56 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=Huawei.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huawei.com Received: from mail.maildlp.com (unknown [172.18.186.231]) by frasgout.his.huawei.com (SkyGuard) with ESMTP id 4Th7w71fm3z6K6B1; Fri, 23 Feb 2024 20:04:07 +0800 (CST) Received: from lhrpeml500005.china.huawei.com (unknown [7.191.163.240]) by mail.maildlp.com (Postfix) with ESMTPS id 2F56D140B33; Fri, 23 Feb 2024 20:08:16 +0800 (CST) Received: from localhost (10.202.227.76) by lhrpeml500005.china.huawei.com (7.191.163.240) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.1.2507.35; Fri, 23 Feb 2024 12:08:15 +0000 Date: Fri, 23 Feb 2024 12:08:13 +0000 From: Jonathan Cameron To: Dan Williams CC: Shuai Xue , Borislav Petkov , Ira Weiny , "Luck, Tony" , "james.morse@arm.com" , , , , , , , , , , , , , , , , , , , , , , , , , , , , , Subject: Re: [PATCH v11 1/3] ACPI: APEI: send SIGBUS to current task if synchronous memory error not recovered Message-ID: <20240223120813.00005d1f@Huawei.com> In-Reply-To: <65d82c9352e78_24f3f294d5@dwillia2-mobl3.amr.corp.intel.com.notmuch> References: <20221027042445.60108-1-xueshuai@linux.alibaba.com> <20240204080144.7977-2-xueshuai@linux.alibaba.com> <20240219092528.GTZdMeiDWIDz613VeT@fat_crate.local> <65d82c9352e78_24f3f294d5@dwillia2-mobl3.amr.corp.intel.com.notmuch> Organization: Huawei Technologies Research and Development (UK) Ltd. X-Mailer: Claws Mail 4.1.0 (GTK 3.24.33; x86_64-w64-mingw32) Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit X-ClientProxiedBy: lhrpeml100004.china.huawei.com (7.191.162.219) To lhrpeml500005.china.huawei.com (7.191.163.240) On Thu, 22 Feb 2024 21:26:43 -0800 Dan Williams wrote: > Shuai Xue wrote: > > > > > > On 2024/2/19 17:25, Borislav Petkov wrote: > > > On Sun, Feb 04, 2024 at 04:01:42PM +0800, Shuai Xue wrote: > > >> Synchronous error was detected as a result of user-space process accessing > > >> a 2-bit uncorrected error. The CPU will take a synchronous error exception > > >> such as Synchronous External Abort (SEA) on Arm64. The kernel will queue a > > >> memory_failure() work which poisons the related page, unmaps the page, and > > >> then sends a SIGBUS to the process, so that a system wide panic can be > > >> avoided. > > >> > > >> However, no memory_failure() work will be queued when abnormal synchronous > > >> errors occur. These errors can include situations such as invalid PA, > > >> unexpected severity, no memory failure config support, invalid GUID > > >> section, etc. In such case, the user-space process will trigger SEA again. > > >> This loop can potentially exceed the platform firmware threshold or even > > >> trigger a kernel hard lockup, leading to a system reboot. > > >> > > >> Fix it by performing a force kill if no memory_failure() work is queued > > >> for synchronous errors. > > >> > > >> Signed-off-by: Shuai Xue > > >> --- > > >> drivers/acpi/apei/ghes.c | 9 +++++++++ > > >> 1 file changed, 9 insertions(+) > > >> > > >> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c > > >> index 7b7c605166e0..0892550732d4 100644 > > >> --- a/drivers/acpi/apei/ghes.c > > >> +++ b/drivers/acpi/apei/ghes.c > > >> @@ -806,6 +806,15 @@ static bool ghes_do_proc(struct ghes *ghes, > > >> } > > >> } > > >> > > >> + /* > > >> + * If no memory failure work is queued for abnormal synchronous > > >> + * errors, do a force kill. > > >> + */ > > >> + if (sync && !queued) { > > >> + pr_err("Sending SIGBUS to current task due to memory error not recovered"); > > >> + force_sig(SIGBUS); > > >> + } > > > > > > Except that there are a bunch of CXL GUIDs being handled there too and > > > this will sigbus those processes now automatically. > > > > Before the CXL GUIDs added, @Tony confirmed that the HEST notifications are always > > asynchronous on x86 platform, so only Synchronous External Abort (SEA) on ARM is > > delivered as a synchronous notification. > > > > Will the CXL component trigger synchronous events for which we need to terminate the > > current process by sending sigbus to process? > > None of the CXL component errors should be handled as synchronous > events. They are either asynchronous protocol errors, or effectively > equivalent to CPER_SEC_PLATFORM_MEM notifications. Not a good example, CPER_SEC_PLATFORM_MEM is sometimes signaled via SEA.