From: "Rafael J. Wysocki"
Date: Thu, 17 Jun 2021 14:07:01 +0200
Subject: Re: [PATCH v7] ACPI / APEI: fix the regression of synchronous external aborts occur in user-mode
To: Xiaofei Tan
Cc: "Rafael J. Wysocki", James Morse, "Rafael J. Wysocki", Len Brown, Tony Luck, Borislav Petkov, Andrew Morton, Joerg Roedel, Peter Zijlstra, ACPI Devel Mailing List, Linux Kernel Mailing List, linuxarm@openeuler.org
References: <1623415027-36130-1-git-send-email-tanxiaofei@huawei.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jun 15, 2021 at 5:47 AM Xiaofei Tan wrote:
>
> Hi Rafael,
>
> On 2021/6/14 23:46, Rafael J. Wysocki wrote:
> > On Fri, Jun 11, 2021 at 2:40 PM Xiaofei Tan wrote:
> >>
> >> Before commit 8fcc4ae6faf8 ("arm64: acpi: Make apei_claim_sea()
> >> synchronise with APEI's irq work"), do_sea() would unconditionally
> >> signal the affected task from the arch code. Since that change,
> >> the GHES driver sends the signals.
> >>
> >> This exposes a problem as errors the GHES driver doesn't understand
> >> or doesn't handle effectively are silently ignored. This causes the
> >> errors to be taken again and circulate endlessly, and the user-space
> >> task gets stuck in this loop.
> >>
> >> Existing firmware on Kunpeng9xx systems reports cache errors with the
> >> 'ARM Processor Error' CPER records.
> >>
> >> Do memory failure handling for ARM Processor Error Section just like
> >> for Memory Error Section.
> >
> > Still, I'm not convinced that this is the right way to address the problem.
> >
> > In particular, is it guaranteed that "ARM Processor Error" will always
> > mean "memory failure" on all platforms?
> >
>
> There are two sources for ARM Processor cache errors (the second case does
> not exist on platforms that don't support the poison mechanism).
>
> 1. The error occurs in the cache. If it is transient, we have a chance to recover by doing memory failure.
> If it is persistent, we have to handle it elsewhere, for example by doing cache way isolation in firmware
> or triggering CPU core isolation in user space. I think most platforms can't support such a feature,
> so the simplest and most effective way is to report it as a fatal error and do the isolation during the
> firmware start-up phase.
>
> 2. The error is transferred from another RAS node. If it is from DDR, I think there is no doubt, and this is
> the most common case we have met so far. If it is from elsewhere in the SoC, such as internal SRAM (much
> less likely than DDR), the error is still in the hardware, but the RAS node that detected the SRAM error
> will also report it.
>
> To sum up, it is effective in most situations and does no harm in the others.

OK, so applied as 5.14 material under edited subject.

Thanks!
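To make the failure mode discussed in this thread concrete, here is a small user-space C model of the decision the patch changes. Everything in it is hypothetical: the enums, structs and functions (`handle_section()`, `queue_memory_failure()`) are illustrative stand-ins, not the kernel's GHES code or its CPER definitions, and severity handling is reduced to a single `recoverable` flag. It only shows the idea from the patch description: a recoverable ARM Processor Error record reporting a cache error with a valid physical address is routed to the same memory-failure path as a Memory Error Section record, while anything else stays unhandled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified stand-ins for CPER section and error-info
 * types. These are NOT the kernel's definitions. */
enum sec_type { SEC_MEMORY_ERROR, SEC_ARM_PROCESSOR, SEC_OTHER };
enum arm_err_type { ARM_ERR_CACHE, ARM_ERR_TLB, ARM_ERR_BUS };

struct arm_err_info {
	enum arm_err_type type;
	bool has_phys_addr;   /* models the "physical address valid" bit */
	uint64_t phys_addr;
};

struct section {
	enum sec_type type;
	bool recoverable;                 /* simplified severity */
	uint64_t mem_phys_addr;           /* used by SEC_MEMORY_ERROR */
	const struct arm_err_info *info;  /* used by SEC_ARM_PROCESSOR */
	size_t n_info;
};

/* Hypothetical stand-in for queueing memory-failure work; in the model,
 * this is the path that would eventually signal the affected task. */
static size_t queued_pages;

static bool queue_memory_failure(uint64_t pa)
{
	printf("queue memory failure for PA 0x%llx\n", (unsigned long long)pa);
	queued_pages++;
	return true;
}

/* Returns true if recovery work was queued for this section. A false
 * return models the problem in the thread: nothing signals the task, so
 * it re-executes the faulting access and takes the same abort again. */
static bool handle_section(const struct section *sec)
{
	bool queued = false;

	assert(sec != NULL);
	if (!sec->recoverable)
		return false;

	switch (sec->type) {
	case SEC_MEMORY_ERROR:
		return queue_memory_failure(sec->mem_phys_addr);
	case SEC_ARM_PROCESSOR:
		/* The patch's idea: treat cache errors that carry a physical
		 * address like Memory Error Section records. */
		for (size_t i = 0; i < sec->n_info; i++) {
			const struct arm_err_info *e = &sec->info[i];

			if (e->type == ARM_ERR_CACHE && e->has_phys_addr)
				queued |= queue_memory_failure(e->phys_addr);
		}
		return queued;
	default:
		return false;
	}
}
```

A `false` return corresponds to the case the thread worries about: no recovery work is queued, so nothing signals the task. Whether every such cache error really is a memory failure is the question Rafael raised; the model simply encodes the patch's assumption, and Xiaofei's reply argues why it holds for the two error sources.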