Date: Thu, 3 Nov 2022 16:58:27 +0000
From: Jonathan Cameron
To: Dan Williams
CC: Jonathan Zhang (Infra), Smita Koralahalli, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org, Alison Schofield, Vishal Verma, Ira Weiny, Ben Widawsky, Robert Richter, Yazen Ghannam, Terry Bowman, Ard Biesheuvel
Subject: Re: [PATCH 0/2] efi/cper, cxl: Decode CXL Protocol Errors CPER
Message-ID: <20221103165827.00000b39@Huawei.com>
In-Reply-To: <635c3f9e39742_6be12941a@dwillia2-xfh.jf.intel.com.notmuch>
Organization: Huawei Technologies Research and Development (UK) Ltd.

On Fri, 28 Oct 2022 13:46:22 -0700
Dan Williams wrote:

> Jonathan Zhang (Infra) wrote:
> >
> >
> > > On Oct 26, 2022, at 12:31 PM, Smita Koralahalli wrote:
> > >
> > > On 10/25/2022 5:11 PM, Dan Williams wrote:
> > >> Smita Koralahalli wrote:
> > >>> Hi Dan,
> > >>>
> > >>> On 10/21/2022 3:18 PM, Dan Williams wrote:
> > >>>> Hi Smita,
> > >>>>
> > >>>> Smita Koralahalli wrote:
> > >>>>> This series adds decoding for the CXL Protocol Errors Common Platform
> > >>>>> Error Record.
> > >>>> Be sure to copy Ard Biesheuvel, added, on
> > >>>> drivers/firmware/efi/ patches.
> > >>>>
> > >>>> Along those lines, drivers/cxl/ developers have an idea of what is
> > >>>> contained in the new CXL protocol error records and why Linux might want
> > >>>> to decode them, others from outside drivers/cxl/ might not. It always
> > >>>> helps to have a small summary of the benefit to end users of the
> > >>>> motivation to apply a patch set.
> > >>> Sure, will include in my v2.
> > >>>
> > >>>>> Smita Koralahalli (2):
> > >>>>>   efi/cper, cxl: Decode CXL Protocol Error Section
> > >>>>>   efi/cper, cxl: Decode CXL Error Log
> > >>>>>
> > >>>>>  drivers/firmware/efi/Makefile   |   2 +-
> > >>>>>  drivers/firmware/efi/cper.c     |   9 +++
> > >>>>>  drivers/firmware/efi/cper_cxl.c | 108 ++++++++++++++++++++++++++++++++
> > >>>>>  drivers/firmware/efi/cper_cxl.h |  58 +++++++++++++++++
> > >>>>>  include/linux/cxl_err.h         |  21 +++++++
> > >>>>>  5 files changed, 197 insertions(+), 1 deletion(-)
> > >>>> I notice no updates for the trace events in ghes_do_proc(), is that next
> > >>>> in your queue? That's ok to be a follow-on after v2.
> > >>> Sorry, if I haven't understood this right. Are you implying about the
> > >>> "handling" of cxl memory errors in ghes_do_proc() or is it just copying
> > >>> of CPER entries to tracepoints?
> > >> Right now ghes_do_proc() will let the CXL CPER records fall through to
> > >> log_non_standard_event(). Are you planning to add trace event decode
> > >> there for CPER_SEC_CXL_PROT_ERR records?
> > >
> > > Thanks! Yeah its a good idea to add. I did not think about this before.
> > > I will send this as a separate patchset after v2.
> > >
> > > I think with this cxl cper trace event support and Ira's patchset which traces
> > > specific event record types via Get Event Record, we can start the userspace
> > > handling probably in rasdaemon?
> > Yes, I think this makes sense. rasdaemon could aggregate data and provide user
> > with full picture:
> > * Memory errors from both processor attached memory and CXL memory.
> > * CXL protocol errors.
> > * CXL device errors.
> > Such errors may be handled either firmware first or OS first.
>
> I have no concerns about rasdaemon subscribing to CXL RAS events, but
> the nice thing about trace-events is that any number of subscribers can
> attach to the event stream. So I expect cxl-cli to have a monitor of
> these CXL specific events and that does not preclude rasdaemon from
> also incorporating CXL events into its event list.

FYI, we posted some poison list rasdaemon patches a while back.
https://lore.kernel.org/all/20220622122021.1986-1-shiju.jose@huawei.com/

Absolutely agree we'll want all the rest of these once the kernel patches
are in (and hence we know the tracepoint definitions are stable).

If anyone is working on the rasdaemon side of things, shout on the list,
as I'd rather not see duplication of effort.

Jonathan