Date: Thu, 30 Nov 2023 17:43:38 +0000
From: James Morse
To: Borislav Petkov, Shuai Xue
Cc: rafael@kernel.org, wangkefeng.wang@huawei.com, tanxiaofei@huawei.com, mawupeng1@huawei.com, tony.luck@intel.com, linmiaohe@huawei.com, naoya.horiguchi@nec.com, gregkh@linuxfoundation.org, will@kernel.org, jarkko@kernel.org, linux-acpi@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-edac@vger.kernel.org, acpica-devel@lists.linuxfoundation.org, stable@vger.kernel.org, x86@kernel.org, justin.he@arm.com, ardb@kernel.org, ying.huang@intel.com, ashish.kalra@amd.com, baolin.wang@linux.alibaba.com, tglx@linutronix.de, mingo@redhat.com, dave.hansen@linux.intel.com, lenb@kernel.org, hpa@zytor.com, robert.moore@intel.com, lvying6@huawei.com, xiexiuqi@huawei.com, zhuo.song@linux.alibaba.com
Subject: Re: [PATCH v9 0/2] ACPI: APEI: handle synchronous errors in task work with proper si_code
In-Reply-To: <20231130144001.GGZWiewYtvMSJir62f@fat_crate.local>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Boris,

On 30/11/2023 14:40, Borislav Petkov wrote:
> FTR, this is starting to make sense,
> thanks for explaining.
>
> Replying only to this one for now:
>
> On Thu, Nov 30, 2023 at 10:58:53AM +0800, Shuai Xue wrote:
>> To reproduce this problem:
>>
>> # STEP1: enable early kill mode
>> #sysctl -w vm.memory_failure_early_kill=1
>> vm.memory_failure_early_kill = 1
>>
>> # STEP2: inject an UCE error and consume it to trigger a synchronous error
>
> So this is for ARM folks to deal with, BUT:
>
> A consumed uncorrectable error on x86 means panic. On some hw like on
> AMD, that error doesn't even get seen by the OS but the hw does
> something called syncflood to prevent further error propagation. So
> there's no any action required - the hw does that.
>
> But I'd like to hear from ARM folks whether consuming an uncorrectable
> error even lets software run. Dunno.

I think we mean different things by 'consume' here.

I'd assume Shuai's test is poisoning a cache-line. When the CPU tries to access that
cache-line it will get an 'external abort' signal back from the memory system. Shuai - is
this what you mean by 'consume' - the CPU received an external abort from the poisoned
cache line?

It's then up to the CPU whether it can put the world back in order to take this as a
synchronous-external-abort or an asynchronous-external-abort, which for arm64 are two
different interrupt/exception types.
The synchronous exceptions can't be masked, but the asynchronous one can.

If, by the time the asynchronous-external-abort interrupt/exception has been unmasked, the
CPU has used the poisoned value in some calculation (which is what we usually mean by
consume) which has resulted in a memory access - it will report the error as 'uncontained',
because the error has been silently propagated. APEI should always report those as 'fatal',
and there is little point getting the OS involved at this point. Also in this category are
things like 'tag ram corruption', where you can no longer trust anything about memory.

Everything in this thread is about synchronous errors where this can't happen. The CPU
stops and takes an interrupt/exception instead.


Thanks,

James
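
For readers following the thread, the distinction drawn above can be written down as a small decision table. This is only an illustrative sketch in Python, not kernel or firmware code: the names `AbortKind`, `Containment`, `classify` and `apei_should_report_fatal` are hypothetical, and the treatment of an asynchronous abort whose poisoned value was not yet used is an assumption, since the message above does not cover that case.

```python
from enum import Enum


class AbortKind(Enum):
    # The two arm64 exception types named in the message above.
    SYNCHRONOUS = "synchronous-external-abort"
    ASYNCHRONOUS = "asynchronous-external-abort"


class Containment(Enum):
    CONTAINED = "contained"
    UNCONTAINED = "uncontained"


def classify(kind: AbortKind, poison_used_in_memory_access: bool) -> Containment:
    """Hypothetical model of the contained/uncontained distinction."""
    if kind is AbortKind.SYNCHRONOUS:
        # The CPU stops and takes the exception instead of using the
        # poisoned value, so silent propagation cannot happen.
        return Containment.CONTAINED
    if poison_used_in_memory_access:
        # The poisoned value fed a calculation that led to a memory
        # access before the abort was taken: silently propagated.
        return Containment.UNCONTAINED
    # Assumption (not stated in the message): an asynchronous abort taken
    # before the value was used is treated as contained here.
    return Containment.CONTAINED


def apei_should_report_fatal(containment: Containment) -> bool:
    # Per the message above, uncontained errors should always be 'fatal'.
    return containment is Containment.UNCONTAINED


if __name__ == "__main__":
    for kind in AbortKind:
        for used in (False, True):
            result = classify(kind, used)
            print(kind.value, used, result.value, apei_should_report_fatal(result))
```

The second argument is ignored for synchronous aborts, which mirrors the point that the synchronous case is the one where software can still usefully act.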