Message-ID: <31816165-e3fc-5bb2-71ad-6fe77ecd64a7@linux.alibaba.com>
Date: Thu, 15 Jun 2023 10:12:41 +0800
Subject: Re: [PATCH 2/3] x86/mce: Define amd_mce_usable_address()
To: Yazen Ghannam, linux-edac@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, tony.luck@intel.com, x86@kernel.org,
 muralidhara.mk@amd.com, joao.m.martins@oracle.com, william.roche@oracle.com,
 boris.ostrovsky@oracle.com, john.allen@amd.com, baolin.wang@linux.alibaba.com
References: <20230613141142.36801-1-yazen.ghannam@amd.com>
 <20230613141142.36801-3-yazen.ghannam@amd.com>
 <31fdaacc-cc2b-5ea5-8a0e-e5ccfe674834@linux.alibaba.com>
 <1e9b1a0c-564d-6a3c-c253-1b1da1773ecc@amd.com>
From: Shuai Xue
In-Reply-To: <1e9b1a0c-564d-6a3c-c253-1b1da1773ecc@amd.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2023/6/14 23:09, Yazen Ghannam wrote:
> On 6/13/2023 10:19 PM, Shuai Xue wrote:
>>
>>
>> On 2023/6/13 22:11, Yazen Ghannam wrote:
>>> Currently, all valid MCA_ADDR values are assumed to be usable on AMD
>>> systems. However, this is not correct in most cases. Notifiers expecting
>>> usable addresses may then operate on inappropriate values.
>>>
>>> Define a helper function to do AMD-specific checks for a usable memory
>>> address. List out all known cases.
>>>
>>> Signed-off-by: Yazen Ghannam
>>> ---
>>>   arch/x86/kernel/cpu/mce/amd.c      | 38 ++++++++++++++++++++++++++++++
>>>   arch/x86/kernel/cpu/mce/core.c     |  3 +++
>>>   arch/x86/kernel/cpu/mce/internal.h |  2 ++
>>>   3 files changed, 43 insertions(+)
>>>
>>> diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
>>> index 1ccfb0c9257f..ca79fa10b844 100644
>>> --- a/arch/x86/kernel/cpu/mce/amd.c
>>> +++ b/arch/x86/kernel/cpu/mce/amd.c
>>> @@ -746,6 +746,44 @@ bool amd_mce_is_memory_error(struct mce *m)
>>>       return legacy_mce_is_memory_error(m);
>>>   }
>>>   +/*
>>> + * AMD systems do not have an explicit indicator that the value in MCA_ADDR is
>>> + * a system physical address. Therefore individual cases need to be detected.
>>> + * Future cases and checks will be added as needed.
>>> + *
>>> + * 1) General case
>>> + *    a) Assume address is not usable.
>>> + * 2) "Poison" errors
>>> + *    a) Indicated by MCA_STATUS[43]: POISON. Defined for all banks except legacy
>>> + *       Northbridge (bank 4).
>>> + *    b) Refers to poison consumption in the Core. Does not include "no action",
>>> + *       "action optional", or "deferred" error severities.
>>> + *    c) Will include a usuable address so that immediate action can be taken.
>>> + * 3) Northbridge DRAM ECC errors
>>> + *    a) Reported in legacy bank 4 with XEC 8.
>>> + *    b) MCA_STATUS[43] is *not* defined as POISON in legacy bank 4. Therefore,
>>> + *       this bit should not be checked.
>> [nit]
>>
>>> + *
>>> + * NOTE: SMCA UMC memory errors fall into case #1.
>>
>> hi, Yazen
>>
>> The address for SMCA UMC memory error is not system physical address, it make sense
>> to be not usable. But how we deal with the SMCA address? The MCE chain like
>> uc_decode_notifier will do a sanity check with mce_usable_address and it will not
>> handle SMCA address.
>>
>
> Hi Shuai,
>
> That's correct.
>
> There isn't a good solution today. This will be handled in future changes.

Hi, Yazen,

Do you have a plan to address it? If not, I can help. We have run into this
problem in our products.

Thanks,
Shuai
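
For readers following the thread, the three cases listed in the quoted comment can be sketched in plain C roughly as below. This is only an illustrative userspace sketch under stated assumptions, not the code from the patch: struct mce_sketch, the SKETCH_* macros, sketch_xec(), sketch_usable_address(), and sketch_self_check() are hypothetical names, the smca flag is passed in by hand, and the XEC position in MCA_STATUS[20:16] is an assumption that the thread does not state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for the kernel's struct mce. */
struct mce_sketch {
	uint64_t status; /* MCA_STATUS register value */
	uint8_t bank;    /* MCA bank that logged the error */
};

#define SKETCH_STATUS_POISON  (1ULL << 43) /* MCA_STATUS[43], per the quoted comment */
#define SKETCH_LEGACY_NB_BANK 4            /* legacy Northbridge bank */
#define SKETCH_XEC_DRAM_ECC   8            /* XEC value for DRAM ECC errors */

/* Assumption: on legacy systems XEC sits in MCA_STATUS[20:16]. */
static unsigned int sketch_xec(uint64_t status)
{
	return (unsigned int)((status >> 16) & 0x1f);
}

/* smca: true when the system uses Scalable MCA (hypothetical parameter). */
static bool sketch_usable_address(const struct mce_sketch *m, bool smca)
{
	/* Case 3: legacy bank 4. Bit 43 is not POISON here, so use XEC only. */
	if (!smca && m->bank == SKETCH_LEGACY_NB_BANK)
		return sketch_xec(m->status) == SKETCH_XEC_DRAM_ECC;

	/* Case 2: poison consumption carries a usable address. */
	if (m->status & SKETCH_STATUS_POISON)
		return true;

	/* Case 1: general case, assume the address is not usable. */
	return false;
}

/* Small usage example exercising each case. */
void sketch_self_check(void)
{
	struct mce_sketch poison = { .status = SKETCH_STATUS_POISON, .bank = 0 };
	struct mce_sketch nb_ecc = { .status = 8ULL << 16, .bank = 4 };
	struct mce_sketch umc    = { .status = 0, .bank = 17 };

	assert(sketch_usable_address(&poison, true));  /* case 2 */
	assert(sketch_usable_address(&nb_ecc, false)); /* case 3 */
	assert(!sketch_usable_address(&umc, true));    /* case 1 */
}
```

With this shape, an SMCA UMC memory error without the POISON bit falls through to the general case and is reported as not usable, which is the gap raised in the question above.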