Date: Thu, 23 Nov 2023 16:07:10 +0100
From: Borislav Petkov
To: Shuai Xue
Cc: rafael@kernel.org, wangkefeng.wang@huawei.com, tanxiaofei@huawei.com,
    mawupeng1@huawei.com, tony.luck@intel.com, linmiaohe@huawei.com,
    naoya.horiguchi@nec.com, james.morse@arm.com, gregkh@linuxfoundation.org,
    will@kernel.org, jarkko@kernel.org, linux-acpi@vger.kernel.org,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
    linux-edac@vger.kernel.org, acpica-devel@lists.linuxfoundation.org,
    stable@vger.kernel.org, x86@kernel.org, justin.he@arm.com, ardb@kernel.org,
    ying.huang@intel.com, ashish.kalra@amd.com, baolin.wang@linux.alibaba.com,
    tglx@linutronix.de, mingo@redhat.com, dave.hansen@linux.intel.com,
    lenb@kernel.org, hpa@zytor.com, robert.moore@intel.com, lvying6@huawei.com,
    xiexiuqi@huawei.com, zhuo.song@linux.alibaba.com
Subject: Re: [PATCH v9 0/2] ACPI: APEI: handle synchronous errors in task work with proper si_code
Message-ID: <20231123150710.GEZV9qnkWMBWrggGc1@fat_crate.local>
References: <20221027042445.60108-1-xueshuai@linux.alibaba.com> <20231007072818.58951-1-xueshuai@linux.alibaba.com>
In-Reply-To: <20231007072818.58951-1-xueshuai@linux.alibaba.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, Oct 07, 2023 at 03:28:16PM +0800, Shuai Xue wrote:
> However, this trick is not always be effective

So far so good. What's missing here is why "this trick" is not always
effective. Basically to explain what exactly the problem is.

> For example, hwpoison-aware user-space processes use the si_code:
> BUS_MCEERR_AO for 'action optional' early notifications, and BUS_MCEERR_AR
> for 'action required' synchronous/late notifications. Specifically, when a
> signal with SIGBUS_MCEERR_AR is delivered to QEMU, it will inject a vSEA to
> Guest kernel. In contrast, a signal with SIGBUS_MCEERR_AO will be ignored
> by QEMU.[1]
>
> Fix it by seting memory failure flags as MF_ACTION_REQUIRED on synchronous events. (PATCH 1)

So you're fixing qemu by "fixing" the kernel?

This doesn't make any sense.

Make errors which are ACPI_HEST_NOTIFY_SEA type return
MF_ACTION_REQUIRED so that it *happens* to fix your use case.
Sounds like a lot of nonsense to me.

What is the issue here you're trying to solve?

> 2. Handle memory_failure() abnormal fails to avoid a unnecessary reboot
>
> If process mapping fault page, but memory_failure() abnormal return before
> try_to_unmap(), for example, the fault page process mapping is KSM page.
> In this case, arm64 cannot use the page fault process to terminate the
> synchronous exception loop.[4]
>
> This loop can potentially exceed the platform firmware threshold or even trigger
> a kernel hard lockup, leading to a system reboot. However, kernel has the
> capability to recover from this error.
>
> Fix it by performing a force kill when memory_failure() abnormal fails or when
> other abnormal synchronous errors occur.

Just like that?

Without giving the process the opportunity to even save its other data?

So this all is still very confusing, patches definitely need splitting
and this whole thing needs restraint.

You go and do this: you split *each* issue you're addressing into a
separate patch and explain it like this:

---
1. Prepare the context for the explanation briefly.

2. Explain the problem at hand.

3. "It happens because of <...>"

4. "Fix it by doing X"

5. "(Potentially do Y)."
---

and each patch explains *exactly* *one* issue, what happens, why it
happens and just the fix for it and *why* it is needed.

Otherwise, this is unreviewable.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette