Received: by 2002:a05:6358:3188:b0:123:57c1:9b43 with SMTP id q8csp1621687rwd; Thu, 25 May 2023 15:41:32 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ7Y3gi9wSWpjM3bLofbMqnHnEMJqS5wfjJ2srZ60jZsSS7Z/YHl/TUT0Zw5ybMCvbVP5pa4 X-Received: by 2002:a05:6a20:3d09:b0:10a:cbe6:69f0 with SMTP id y9-20020a056a203d0900b0010acbe669f0mr22814811pzi.10.1685054492084; Thu, 25 May 2023 15:41:32 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1685054492; cv=none; d=google.com; s=arc-20160816; b=Xz/GdFIKyILwvBkFdkljHe+t2tracQXr7dML/g7Q+dGQBc9BXVvj+h3eXLzm7n1eh3 z8Mp7vKXoiqnum9kCIGMkH+j8izGAA/tsWU8/tx1zcaJ8cFcx3xJMoE6Vw1tzgDgp92k yUxPMSDhJy5OI2I7OLS8D5YCjKWB7BqRMX6yc7Q/RmHTzpeNX+g1UMj/+vwUKEcxC60f FkfQ4Sf7ifuyqMqhuzRTbpP+fMN6+LnA5qELUrMKStVYaimM0bmpXY8GnBMfDxr13Bzm 9ZVyG1liMGovdKEP6IRVNzOoCin86vyh0zZvCaWvVv4p1awsNj+6alFYnt9c56YRnsPA mFIw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :message-id:subject:cc:to:from:date:dkim-signature; bh=XoRf0BdxQVMacvX3+C7SDDYMeKQHbRjjiS/qOVPqpjo=; b=ikR/gXpgReXEVB8xHATEBmn6Ml6qmH1jeKokJQxBi7QnV3oIqbtttnqrzo56L0cjzL toSIuQIQGEGkZx6Z2mk2kLvnufykfvoijlxmLNMJH/N3JAINFaazC99THnFgMq/s8x2j EPyFWbLlorOZXGliB/jS5pnQUwDzNTy0TIbaFPbJdbbEHBgXljhC+mGyMMWbErf7IAQJ CHNIYga9ZWG90VvpEZ0t5J3Bh0TDFBwSezBPj4kCODTbCWRN+xxUM6kCnyvbYzHy1ffL Ag/tt1M6Vsf68pEOQBB4wR+SHyhpUjgsD/EARCNMqRRvLzAdO7YCY9h5Wa4+TY15jAjS Ibxw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=tlUgaxrF; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id v3-20020a637a03000000b0052856cec94csi2075115pgc.877.2023.05.25.15.41.20; Thu, 25 May 2023 15:41:32 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=tlUgaxrF; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241637AbjEYWBJ (ORCPT + 99 others); Thu, 25 May 2023 18:01:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56636 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229689AbjEYWBH (ORCPT ); Thu, 25 May 2023 18:01:07 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 72DDE183; Thu, 25 May 2023 15:01:04 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 03ECA615C8; Thu, 25 May 2023 22:01:04 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 1A547C433D2; Thu, 25 May 2023 22:01:02 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1685052063; bh=ptE4zUO2N1qn5e28e8ELQ07KDbno8edydjVNz2FJipk=; h=Date:From:To:Cc:Subject:In-Reply-To:From; b=tlUgaxrF3z9IwUeA0DGL4h+NIbcmOJM/r14l3L3I9HRklMIf5eX3CeONLvzDEntda osuUN5uTSj89B76YksTuXRcjG3o8fnn1/nKVfoBt6SjNv1TMJy8pxP0weYoXN08/LZ NnWpZaqHEPkpZ9EFY31YI9li4C+FWhtxNf9dtO5psADQ1v26O4DYueomEjqdkpDHdo L9s2ilCnNe0E8J44KETH5vfeLzcAjK1qIvGazQyKlmwaBwWn7amMIDus4DjZp+Q/fw CgxEXg46iB38Nmidqd9ugZLtJxkJb0isTuKOi0eGXQXe/NfDzgbtfkzFzNEvgjyGAN CxYZdeN6e4wtQ== Date: Thu, 25 May 2023 17:01:01 -0500 From: Bjorn Helgaas To: Robert Richter Cc: alison.schofield@intel.com, dave.jiang@intel.com, Terry Bowman , vishal.l.verma@intel.com, linuxppc-dev@lists.ozlabs.org, linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org, linux-cxl@vger.kernel.org, bhelgaas@google.com, Oliver O'Halloran , Jonathan.Cameron@huawei.com, bwidawsk@kernel.org, dan.j.williams@intel.com, ira.weiny@intel.com Subject: Re: [PATCH v4 22/23] PCI/AER: Forward RCH downstream port-detected errors to the CXL.mem dev handler Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, May 25, 2023 at 11:29:58PM +0200, Robert Richter wrote: > eOn 24.05.23 16:32:35, Bjorn Helgaas wrote: > > On Tue, May 23, 2023 at 06:22:13PM -0500, Terry Bowman wrote: > > > From: Robert Richter > > > > > > In Restricted CXL Device (RCD) mode a CXL device is exposed as an > > > RCiEP, but CXL downstream and upstream ports are not enumerated and > > > not visible in the PCIe hierarchy. Protocol and link errors are sent > > > to an RCEC. > > > > > > Restricted CXL host (RCH) downstream port-detected errors are signaled > > > as internal AER errors, either Uncorrectable Internal Error (UIE) or > > > Corrected Internal Errors (CIE). > > > > From the parallelism with RCD above, I first thought that RCH devices > > were non-RCD mode and *were* enumerated as part of the PCIe hierarchy, > > but actually I suspect it's more like the following? > > > > ... but CXL downstream and upstream ports are not enumerated and not > > visible in the PCIe hierarchy. > > > > Protocol and link errors from these non-enumerated ports are > > signaled as internal AER errors ... via a CXL RCEC. > > Exactly, except the RCEC is standard PCIe and also must not > necessarily on the same PCI bus as the CXL RCiEPs are. So make it "RCEC" instead of "CXL RCEC", I guess? PCIe r6.0, sec 7.9.10.3, allows an RCEC to be associated with RCiEPs on different buses, so nothing to see there. > > > The error source is the id of the RCEC. > > > > This seems odd; I assume this refers to the RCEC's AER Error Source > > Identification register, and the ERR_COR or ERR_FATAL/NONFATAL Source > > Identification would ordinarily be the Requester ID of the RCiEP that > > "sent" the Error Message. But you're saying it's actually the ID of > > the *RCEC*, not the RCiEP? > > Right, the downstream port has its own AER ext capability in > non-config (io mapped) RCRB register range. Errors originating from > there are signaled as internal AER errors via the RCEC *with* the > RCEC's Requester ID. Code walks through all associated CXL endpoints, > determines the dport and checks its AER. > > There is also an RDPAS structure defined in CXL but that is only a > different way to provide the RCEC to dport association instead of > using the RCEC's Endpoint Association Extended Capability. In the end > we get all associated RCHs and check the AER of all their dports. > > The upstream port is signaled using the RCiEP's AER. CXL spec is > strict here: "Upstream Port RCRB shall not implement the AER Extended > Capability." The RCiEP's requestor ID is used then and its config > space the AER is in. > > CXL.cachemem errors are reported with the RCiEP as requester > too. Status is in the CXL RAS cap and the UIE or CIE is set > respectively in the AER status of the RCiEP. > > > We're going to call pci_aer_handle_error() as well, to handle the > > non-internal errors, and I'm pretty sure that path expects the RCiEP > > ID there. > > > > Whatever the answer, I'm not sure this sentence is actually relevant > > to this patch, since this patch doesn't read PCI_ERR_ROOT_ERR_SRC or > > look at struct aer_err_source.id. > > The source id is used in aer_process_err_devices() which finally calls > handle_error_source() for the device with the requestor id. This is > the place where cxl_rch_handle_error() checks if it is an RCEC that > received an internal error and has cxl devices connected to it. Then, > the request is forwarded to the cxl_mem handler which also needs to > check the dport now. That is, pcie_walk_rcec() in > cxl_rch_handle_error() is called with the RCEC's pci handle, > cxl_rch_handle_error_iter() with the RCiEP's pci handle. I'm still not sure this is relevant. Isn't that last sentence just the way we always use pcie_walk_rcec()? If there's something *different* here about CXL, and it's important to this patch, sure. But I don't see that yet. Maybe a comment in the code if you think it's important to clarify something there. Bjorn