2022-11-09 11:09:29

by Robert Richter

[permalink] [raw]
Subject: [PATCH v3 0/9] cxl: Add support for Restricted CXL hosts (RCD mode)

In Restricted CXL Device (RCD) mode (formerly referred to as CXL 1.1)
the PCIe enumeration hierarchy is different from CXL VH Enumeration
(formerly referred to as 2.0, for both modes see CXL spec 3.0: 9.11
and 9.12, [1]). This series adds support for RCD mode. It implements
the detection of Restricted CXL Hosts (RCHs) and its corresponding
Restricted CXL Devices (RCDs). It does the necessary enumeration of
ports and connects the endpoints. With all the plumbing an RCH/RCD
pair is registered at the Linux CXL bus and becomes visible in sysfs
in the same way as CXL VH hosts and devices do already. RCDs are
brought up as CXL endpoints and bound to subsequent drivers such as
cxl_mem.

For CXL VH the host driver (cxl_acpi) starts host bridge discovery
once the ACPI0017 CXL root device is detected and then searches for
ACPI0016 host bridges to enable CXL. This implementation requires the
ACPI device enumeration for RCD mode. It also expects the host's
downstream and upstream port RCRBs base address being reported by
firmware using the optional CEDT CHBS entry of the host bridge (see
CXL spec 3.0, 9.17.1.2).

RCD mode does not support hot-plug, so host discovery is at boot time
only.

This version bases on 2b76fc22aefd ("cxl/acpi: Improve debug messages
in cxl_acpi_probe()") containing the already accepted patches.

[1] https://www.computeexpresslink.org/spec-landing

---

v3:

* Rebased onto 2b76fc22aefd ("cxl/acpi: Improve debug messages in
cxl_acpi_probe()").
* Added Co-developed-by tag. (Rafael)
* Added public function cxl_rcrb_to_component() to regs.c for later
reuse. Added arg to switch between upstream and downstream RCRB.
(Dan, Dave)
* Added patch to register CXL host ports by bridge device. Note the
alias detection in ndctl (cxl list command) need to check both
sysfs entries, firmware_node and physical_node. A rework is needed
here. (Dan, Vishal)
* Reworked implementation to skip intermediate port enumeration of
restricted endpoints (RCDs). (Dan, Dave)
* Added check to only register RCDs with device 0, function 0 as CXL
memory.
* Added Terry's patch to set ACPI's CXL _OSC to indicate CXL1.1
support.

v2:
* Reworked series to use add_host_bridge_dport() and
add_host_bridge_uport(). There is a single cxl root device
(ACPI0017) also for RCHs (must have a ACPI0016 id). (Dan)
* Rebased onto 6.1-rc1.
* Added a WARN_ON_ONCE() to CXL_RESOURCE_NONE check. Updated patch
description with an example case. (Dan, Jonathan)
* Added wrapper functions to devm_cxl_add_port() and
devm_cxl_add_dport(). (Dan)
* Dropped "PCI/ACPI: Link host bridge to its ACPI fw node" patch.
* Updated spec refs to use 3.0. Added PCIe base spec refs. (Jonathan)
* Reused UID detect code. (Dan)
* Dropped "cxl/acpi: Specify module load order dependency for the
cxl_acpi module" patch. Return -EINVAL if host not yet ready. (Dan)
* Minor other changes.
* Note: I haven't included most of the received Reviewed-by tags due
to the major rework. In any case, thanks to all here.

Robert Richter (8):
cxl/acpi: Register CXL host ports by bridge device
cxl/acpi: Extract component registers of restricted hosts from RCRB
cxl/mem: Adjust cxl_mem_find_port() to find an RCH's port
cxl/mem: Skip intermediate port enumeration of restricted endpoints
(RCDs)
cxl/pci: Only register RCDs with device 0, function 0 as CXL memory
device
cxl/pci: Do not ignore PCI config read errors in match_add_dports()
cxl/pci: Factor out code in match_add_dports() to pci_dev_add_dport()
cxl/pci: Extend devm_cxl_port_enumerate_dports() to support restricted
hosts (RCH)

Terry Bowman (1):
cxl/acpi: Set ACPI's CXL _OSC to indicate CXL1.1 support

drivers/acpi/pci_root.c | 1 +
drivers/cxl/acpi.c | 89 +++++++++++++++++++++++++++++------------
drivers/cxl/core/pci.c | 74 ++++++++++++++++++++++++++++------
drivers/cxl/core/port.c | 56 +++++++++++++++++++++++++-
drivers/cxl/core/regs.c | 46 +++++++++++++++++++++
drivers/cxl/cxl.h | 8 ++++
drivers/cxl/pci.c | 25 +++++++++++-
7 files changed, 257 insertions(+), 42 deletions(-)


base-commit: 2b76fc22aefd39820c0520255875f99b326ede99
--
2.30.2



2022-11-09 11:12:33

by Robert Richter

[permalink] [raw]
Subject: [PATCH v3 8/9] cxl/pci: Extend devm_cxl_port_enumerate_dports() to support restricted hosts (RCH)

The PCIe Software View of an RCH and RCD is different to VH mode. An
RCD is paired with an RCH and shows up as RCiEP. Its downstream and
upstream ports are hidden to the PCI hierarchy. This different PCI
topology requires a different handling of RCHs.

Extend devm_cxl_port_enumerate_dports() to support restricted hosts
(RCH). If an RCH is detected, register its port as dport to the
device. An RCH is found if the host's dev 0 func 0 devices is an RCiEP
with an existing PCIe DVSEC for CXL Devices (ID 0000h).

Signed-off-by: Robert Richter <[email protected]>
---
drivers/cxl/core/pci.c | 37 ++++++++++++++++++++++++++++++++++++-
1 file changed, 36 insertions(+), 1 deletion(-)

diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index 667de4f125f6..a6b1a1501db3 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -48,6 +48,37 @@ static int pci_dev_add_dport(struct pci_dev *pdev, struct cxl_port *port,
return 0;
}

+static int restricted_host_enumerate_dport(struct cxl_port *port,
+ struct pci_bus *bus)
+{
+ struct pci_dev *pdev;
+ bool is_restricted_host;
+ int rc;
+
+ /* Check CXL DVSEC of dev 0 func 0 */
+ pdev = pci_get_slot(bus, PCI_DEVFN(0, 0));
+
+ is_restricted_host = pdev
+ && (pci_pcie_type(pdev) == PCI_EXP_TYPE_RC_END)
+ && pci_find_dvsec_capability(pdev,
+ PCI_DVSEC_VENDOR_ID_CXL,
+ CXL_DVSEC_PCIE_DEVICE);
+ if (is_restricted_host)
+ rc = pci_dev_add_dport(pdev, port, CXL_RESOURCE_NONE);
+
+ pci_dev_put(pdev);
+
+ if (!is_restricted_host)
+ return 0;
+
+ dev_dbg(bus->bridge, "CXL restricted host found\n");
+
+ if (rc)
+ return rc;
+
+ return 1;
+}
+
static int match_add_dports(struct pci_dev *pdev, void *data)
{
struct cxl_walk_context *ctx = data;
@@ -91,11 +122,15 @@ int devm_cxl_port_enumerate_dports(struct cxl_port *port)
{
struct pci_bus *bus = cxl_port_to_pci_bus(port);
struct cxl_walk_context ctx;
- int type;
+ int type, count;

if (!bus)
return -ENXIO;

+ count = restricted_host_enumerate_dport(port, bus);
+ if (count)
+ return count;
+
if (pci_is_root_bus(bus))
type = PCI_EXP_TYPE_ROOT_PORT;
else
--
2.30.2


2022-11-09 11:25:58

by Robert Richter

[permalink] [raw]
Subject: [PATCH v3 3/9] cxl/mem: Adjust cxl_mem_find_port() to find an RCH's port

The PCIe software view of an RCH and RCD is different to VH mode. An
RCD is paired with an RCH and shows up as RCiEP with a parent already
pointing to a PCI bridge (struct pci_host_bridge). In contrast, in VH
mode an PCI Express Endpoint is a PCI type 0 device with a PCI type 1
device as parent (struct pci_dev, most of the time a downstream switch
port, but could also be a root port). The following hierarchy applies
in VH mode:

CXL memory device, cxl_memdev endpoint
└──PCIe Endpoint (type 0), pci_dev |
└──Downstream Port (type 1), pci_dev (Nth switch) portN
└──Upstream Port (type 1), pci_dev (Nth switch) |
: :
└──Downstream Port (type 1), pci_dev (1st switch) port1
└──Upstream Port (type 1), pci_dev (1st switch) |
└──Root Port (type 1), pci_dev |
└──PCI host bridge, pci_host_bridge port0
: |
:..ACPI0017, acpi_dev root

(There can be zero or any other number of switches in between.)

An iterator through the grandparents takes us to the root port which
is registered as dport to the bridge. The next port an endpoint is
connected to can be determined by using the grandparent of the memory
device as a dport_dev in cxl_mem_find_port().

The same does not work in RCD mode where only an RCiEP is connected to
the host bridge:

CXL memory device, cxl_memdev endpoint
└──PCIe Endpoint (type 0), pci_dev |
└──PCI host bridge, pci_host_bridge port0
: |
:..ACPI0017, acpi_dev root

Here, an endpoint is directly connected to the host bridge without a
type 1 PCI device (root or downstream port) in between. To link the
endpoint to the correct port, the endpoint's PCI device (parent of the
memory device) must be taken as dport_dev arg in cxl_mem_find_port().

Change cxl_mem_find_port() to find an RCH's port.

Signed-off-by: Robert Richter <[email protected]>
---
drivers/cxl/core/port.c | 38 ++++++++++++++++++++++++++++++++++++++
1 file changed, 38 insertions(+)

diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 0431ed860d8e..d10c3580719b 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -1354,6 +1354,14 @@ static int add_port_attach_ep(struct cxl_memdev *cxlmd,
return rc;
}

+static inline bool is_cxl_restricted(struct cxl_memdev *cxlmd)
+{
+ struct device *parent = cxlmd->dev.parent;
+ if (!dev_is_pci(parent))
+ return false;
+ return pci_pcie_type(to_pci_dev(parent)) == PCI_EXP_TYPE_RC_END;
+}
+
int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd)
{
struct device *dev = &cxlmd->dev;
@@ -1433,9 +1441,39 @@ int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd)
}
EXPORT_SYMBOL_NS_GPL(devm_cxl_enumerate_ports, CXL);

+/*
+ * CXL memory device and port hierarchy:
+ *
+ * VH mode:
+ *
+ * CXL memory device, cxl_memdev endpoint
+ * └──PCIe Endpoint (type 0), pci_dev |
+ * └──Downstream Port (type 1), pci_dev (Nth switch) portN
+ * └──Upstream Port (type 1), pci_dev (Nth switch) |
+ * : :
+ * └──Downstream Port (type 1), pci_dev (1st switch) port1
+ * └──Upstream Port (type 1), pci_dev (1st switch) |
+ * └──Root Port (type 1), pci_dev |
+ * └──PCI host bridge, pci_host_bridge port0
+ * : |
+ * :..ACPI0017, acpi_dev root
+ *
+ * (There can be zero or any other number of switches in between.)
+ *
+ * RCD mode:
+ *
+ * CXL memory device, cxl_memdev endpoint
+ * └──PCIe Endpoint (type 0), pci_dev |
+ * └──PCI host bridge, pci_host_bridge port0
+ * : |
+ * :..ACPI0017, acpi_dev root
+ */
struct cxl_port *cxl_mem_find_port(struct cxl_memdev *cxlmd,
struct cxl_dport **dport)
{
+ if (is_cxl_restricted(cxlmd))
+ return find_cxl_port(cxlmd->dev.parent, dport);
+
return find_cxl_port(grandparent(&cxlmd->dev), dport);
}
EXPORT_SYMBOL_NS_GPL(cxl_mem_find_port, CXL);
--
2.30.2


2022-11-09 11:30:49

by Robert Richter

[permalink] [raw]
Subject: [PATCH v3 4/9] cxl/mem: Skip intermediate port enumeration of restricted endpoints (RCDs)

When an endpoint is found, all ports in beetween the endpoint and the
CXL host bridge need to be created. In the RCH case there are no ports
in between a host bridge and the endpoint. Skip the enumeration of
intermediate ports.

The port enumeration does not only create all ports, it also
initializes the endpoint chain by adding the endpoint to every
downstream port up to the root bridge. This must be done also in RCD
mode, but is much more simple as the endpoint only needs to be added
to the host bridge's dport.

Note: For endpoint removal the cxl_detach_ep() is not needed as it is
released in cxl_port_release().

Signed-off-by: Robert Richter <[email protected]>
---
drivers/cxl/core/port.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index d10c3580719b..e21a9c3fe4da 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -1366,8 +1366,24 @@ int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd)
{
struct device *dev = &cxlmd->dev;
struct device *iter;
+ struct cxl_dport *dport;
+ struct cxl_port *port;
int rc;

+ /*
+ * Skip intermediate port enumeration in the RCH case, there
+ * are no ports in between a host bridge and an endpoint. Only
+ * initialize the EP chain.
+ */
+ if (is_cxl_restricted(cxlmd)) {
+ port = cxl_mem_find_port(cxlmd, &dport);
+ if (!port)
+ return -ENXIO;
+ rc = cxl_add_ep(dport, &cxlmd->dev);
+ put_device(&port->dev);
+ return rc;
+ }
+
rc = devm_add_action_or_reset(&cxlmd->dev, cxl_detach_ep, cxlmd);
if (rc)
return rc;
@@ -1381,8 +1397,6 @@ int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd)
for (iter = dev; iter; iter = grandparent(iter)) {
struct device *dport_dev = grandparent(iter);
struct device *uport_dev;
- struct cxl_dport *dport;
- struct cxl_port *port;

if (!dport_dev)
return 0;
--
2.30.2


2022-11-09 11:31:47

by Robert Richter

[permalink] [raw]
Subject: [PATCH v3 5/9] cxl/pci: Only register RCDs with device 0, function 0 as CXL memory device

The Device 0, Function 0 DVSEC controls the CXL functionality of the
entire device. Add a check to prevent registration of any other PCI
device on the bus as a CXL memory device.

Signed-off-by: Robert Richter <[email protected]>
---
drivers/cxl/pci.c | 25 +++++++++++++++++++++++--
1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index faeb5d9d7a7a..cc4f206f24b3 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -428,11 +428,26 @@ static void devm_cxl_pci_create_doe(struct cxl_dev_state *cxlds)
}
}

+static int check_restricted_device(struct pci_dev *pdev, u16 pcie_dvsec)
+{
+ if (pci_pcie_type(pdev) != PCI_EXP_TYPE_RC_END)
+ return 0; /* no RCD */
+
+ if (pdev->devfn == PCI_DEVFN(0, 0) && pcie_dvsec)
+ return 0; /* ok */
+
+ dev_warn(&pdev->dev, "Skipping RCD: devfn=0x%02x dvsec=%u\n",
+ pdev->devfn, pcie_dvsec);
+
+ return -ENODEV;
+}
+
static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
struct cxl_register_map map;
struct cxl_memdev *cxlmd;
struct cxl_dev_state *cxlds;
+ u16 pcie_dvsec;
int rc;

/*
@@ -442,6 +457,13 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
BUILD_BUG_ON(offsetof(struct cxl_regs, memdev) !=
offsetof(struct cxl_regs, device_regs.memdev));

+ pcie_dvsec = pci_find_dvsec_capability(
+ pdev, PCI_DVSEC_VENDOR_ID_CXL, CXL_DVSEC_PCIE_DEVICE);
+
+ rc = check_restricted_device(pdev, pcie_dvsec);
+ if (rc)
+ return rc;
+
rc = pcim_enable_device(pdev);
if (rc)
return rc;
@@ -451,8 +473,7 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
return PTR_ERR(cxlds);

cxlds->serial = pci_get_dsn(pdev);
- cxlds->cxl_dvsec = pci_find_dvsec_capability(
- pdev, PCI_DVSEC_VENDOR_ID_CXL, CXL_DVSEC_PCIE_DEVICE);
+ cxlds->cxl_dvsec = pcie_dvsec;
if (!cxlds->cxl_dvsec)
dev_warn(&pdev->dev,
"Device DVSEC not present, skip CXL.mem init\n");
--
2.30.2


2022-11-09 17:49:59

by Dave Jiang

[permalink] [raw]
Subject: Re: [PATCH v3 4/9] cxl/mem: Skip intermediate port enumeration of restricted endpoints (RCDs)



On 11/9/2022 2:40 AM, Robert Richter wrote:
> When an endpoint is found, all ports in beetween the endpoint and the
s/beetween/between/

DJ
> CXL host bridge need to be created. In the RCH case there are no ports
> in between a host bridge and the endpoint. Skip the enumeration of
> intermediate ports.
>
> The port enumeration does not only create all ports, it also
> initializes the endpoint chain by adding the endpoint to every
> downstream port up to the root bridge. This must be done also in RCD
> mode, but is much more simple as the endpoint only needs to be added
> to the host bridge's dport.
>
> Note: For endpoint removal the cxl_detach_ep() is not needed as it is
> released in cxl_port_release().
>
> Signed-off-by: Robert Richter <[email protected]>
> ---
> drivers/cxl/core/port.c | 18 ++++++++++++++++--
> 1 file changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index d10c3580719b..e21a9c3fe4da 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -1366,8 +1366,24 @@ int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd)
> {
> struct device *dev = &cxlmd->dev;
> struct device *iter;
> + struct cxl_dport *dport;
> + struct cxl_port *port;
> int rc;
>
> + /*
> + * Skip intermediate port enumeration in the RCH case, there
> + * are no ports in between a host bridge and an endpoint. Only
> + * initialize the EP chain.
> + */
> + if (is_cxl_restricted(cxlmd)) {
> + port = cxl_mem_find_port(cxlmd, &dport);
> + if (!port)
> + return -ENXIO;
> + rc = cxl_add_ep(dport, &cxlmd->dev);
> + put_device(&port->dev);
> + return rc;
> + }
> +
> rc = devm_add_action_or_reset(&cxlmd->dev, cxl_detach_ep, cxlmd);
> if (rc)
> return rc;
> @@ -1381,8 +1397,6 @@ int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd)
> for (iter = dev; iter; iter = grandparent(iter)) {
> struct device *dport_dev = grandparent(iter);
> struct device *uport_dev;
> - struct cxl_dport *dport;
> - struct cxl_port *port;
>
> if (!dport_dev)
> return 0;

2022-11-11 12:04:51

by Robert Richter

[permalink] [raw]
Subject: Re: [PATCH v3 8/9] cxl/pci: Extend devm_cxl_port_enumerate_dports() to support restricted hosts (RCH)

On 09.11.22 11:40:58, Robert Richter wrote:

> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> index 667de4f125f6..a6b1a1501db3 100644
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c
> @@ -48,6 +48,37 @@ static int pci_dev_add_dport(struct pci_dev *pdev, struct cxl_port *port,
> return 0;
> }
>
> +static int restricted_host_enumerate_dport(struct cxl_port *port,
> + struct pci_bus *bus)
> +{
> + struct pci_dev *pdev;
> + bool is_restricted_host;
> + int rc;
> +
> + /* Check CXL DVSEC of dev 0 func 0 */
> + pdev = pci_get_slot(bus, PCI_DEVFN(0, 0));
> +
> + is_restricted_host = pdev
> + && (pci_pcie_type(pdev) == PCI_EXP_TYPE_RC_END)
> + && pci_find_dvsec_capability(pdev,
> + PCI_DVSEC_VENDOR_ID_CXL,
> + CXL_DVSEC_PCIE_DEVICE);
> + if (is_restricted_host)
> + rc = pci_dev_add_dport(pdev, port, CXL_RESOURCE_NONE);

See my comment in patch #6. This reads the port id from RCiEP's PCIe
cap, but instead the RCD UP RCRB should be used for this.
pci_dev_add_dport() needs to be updated to handle that.

-Robert

> +
> + pci_dev_put(pdev);
> +
> + if (!is_restricted_host)
> + return 0;
> +
> + dev_dbg(bus->bridge, "CXL restricted host found\n");
> +
> + if (rc)
> + return rc;
> +
> + return 1;
> +}

2022-11-15 00:35:58

by Dan Williams

[permalink] [raw]
Subject: RE: [PATCH v3 3/9] cxl/mem: Adjust cxl_mem_find_port() to find an RCH's port

Robert Richter wrote:
> The PCIe software view of an RCH and RCD is different to VH mode. An
> RCD is paired with an RCH and shows up as RCiEP with a parent already
> pointing to a PCI bridge (struct pci_host_bridge). In contrast, in VH
> mode an PCI Express Endpoint is a PCI type 0 device with a PCI type 1
> device as parent (struct pci_dev, most of the time a downstream switch
> port, but could also be a root port). The following hierarchy applies
> in VH mode:
>
> CXL memory device, cxl_memdev endpoint
> └──PCIe Endpoint (type 0), pci_dev |
> └──Downstream Port (type 1), pci_dev (Nth switch) portN
> └──Upstream Port (type 1), pci_dev (Nth switch) |
> : :
> └──Downstream Port (type 1), pci_dev (1st switch) port1
> └──Upstream Port (type 1), pci_dev (1st switch) |
> └──Root Port (type 1), pci_dev |
> └──PCI host bridge, pci_host_bridge port0
> : |
> :..ACPI0017, acpi_dev root
>
> (There can be zero or any other number of switches in between.)
>
> An iterator through the grandparents takes us to the root port which
> is registered as dport to the bridge. The next port an endpoint is
> connected to can be determined by using the grandparent of the memory
> device as a dport_dev in cxl_mem_find_port().
>
> The same does not work in RCD mode where only an RCiEP is connected to
> the host bridge:
>
> CXL memory device, cxl_memdev endpoint
> └──PCIe Endpoint (type 0), pci_dev |
> └──PCI host bridge, pci_host_bridge port0
> : |
> :..ACPI0017, acpi_dev root
>
> Here, an endpoint is directly connected to the host bridge without a
> type 1 PCI device (root or downstream port) in between. To link the
> endpoint to the correct port, the endpoint's PCI device (parent of the
> memory device) must be taken as dport_dev arg in cxl_mem_find_port().
>
> Change cxl_mem_find_port() to find an RCH's port.
>
> Signed-off-by: Robert Richter <[email protected]>
> ---
> drivers/cxl/core/port.c | 38 ++++++++++++++++++++++++++++++++++++++
> 1 file changed, 38 insertions(+)
>
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 0431ed860d8e..d10c3580719b 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -1354,6 +1354,14 @@ static int add_port_attach_ep(struct cxl_memdev *cxlmd,
> return rc;
> }
>
> +static inline bool is_cxl_restricted(struct cxl_memdev *cxlmd)
> +{
> + struct device *parent = cxlmd->dev.parent;
> + if (!dev_is_pci(parent))
> + return false;
> + return pci_pcie_type(to_pci_dev(parent)) == PCI_EXP_TYPE_RC_END;
> +}
> +
> int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd)
> {
> struct device *dev = &cxlmd->dev;
> @@ -1433,9 +1441,39 @@ int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd)
> }
> EXPORT_SYMBOL_NS_GPL(devm_cxl_enumerate_ports, CXL);
>
> +/*
> + * CXL memory device and port hierarchy:
> + *
> + * VH mode:
> + *
> + * CXL memory device, cxl_memdev endpoint
> + * └──PCIe Endpoint (type 0), pci_dev |
> + * └──Downstream Port (type 1), pci_dev (Nth switch) portN
> + * └──Upstream Port (type 1), pci_dev (Nth switch) |
> + * : :
> + * └──Downstream Port (type 1), pci_dev (1st switch) port1
> + * └──Upstream Port (type 1), pci_dev (1st switch) |
> + * └──Root Port (type 1), pci_dev |
> + * └──PCI host bridge, pci_host_bridge port0
> + * : |
> + * :..ACPI0017, acpi_dev root
> + *
> + * (There can be zero or any other number of switches in between.)
> + *
> + * RCD mode:
> + *
> + * CXL memory device, cxl_memdev endpoint
> + * └──PCIe Endpoint (type 0), pci_dev |
> + * └──PCI host bridge, pci_host_bridge port0
> + * : |
> + * :..ACPI0017, acpi_dev root
> + */
> struct cxl_port *cxl_mem_find_port(struct cxl_memdev *cxlmd,
> struct cxl_dport **dport)
> {
> + if (is_cxl_restricted(cxlmd))
> + return find_cxl_port(cxlmd->dev.parent, dport);
> +
> return find_cxl_port(grandparent(&cxlmd->dev), dport);

I do not see why this change is needed. For example:

# readlink -f /sys/bus/cxl/devices/mem0
/sys/devices/pci0000:38/0000:38:00.0/mem0
# cxl list -BT
[
{
"bus":"root0",
"provider":"ACPI.CXL",
"nr_dports":1,
"dports":[
{
"dport":"pci0000:38",
"id":49
}
]
}
]

...so, in this case, the grandparent of "mem0" is "pci0000:38", and
"pci0000:38" is a dport. Unmodified cxl_mem_find_port() will do the
right thing and find that this CXL RCIEP is directly connected to
"root0".

2022-11-15 00:36:43

by Dan Williams

[permalink] [raw]
Subject: RE: [PATCH v3 4/9] cxl/mem: Skip intermediate port enumeration of restricted endpoints (RCDs)

Robert Richter wrote:
> When an endpoint is found, all ports in beetween the endpoint and the
> CXL host bridge need to be created. In the RCH case there are no ports
> in between a host bridge and the endpoint. Skip the enumeration of
> intermediate ports.
>
> The port enumeration does not only create all ports, it also
> initializes the endpoint chain by adding the endpoint to every
> downstream port up to the root bridge. This must be done also in RCD
> mode, but is much more simple as the endpoint only needs to be added
> to the host bridge's dport.
>
> Note: For endpoint removal the cxl_detach_ep() is not needed as it is
> released in cxl_port_release().
>
> Signed-off-by: Robert Richter <[email protected]>
> ---
> drivers/cxl/core/port.c | 18 ++++++++++++++++--
> 1 file changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index d10c3580719b..e21a9c3fe4da 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -1366,8 +1366,24 @@ int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd)
> {
> struct device *dev = &cxlmd->dev;
> struct device *iter;
> + struct cxl_dport *dport;
> + struct cxl_port *port;
> int rc;
>
> + /*
> + * Skip intermediate port enumeration in the RCH case, there
> + * are no ports in between a host bridge and an endpoint. Only
> + * initialize the EP chain.
> + */
> + if (is_cxl_restricted(cxlmd)) {
> + port = cxl_mem_find_port(cxlmd, &dport);
> + if (!port)
> + return -ENXIO;
> + rc = cxl_add_ep(dport, &cxlmd->dev);

On second look, this seems problematic. While yes it will be deleted
when the root CXL port dies, it will not be deleted if the cxl_pci
driver is reloaded. I will code up a unit test to double check.

I note that cxl_add_ep() for the VH case is skipped for the root CXL
port, so I do not suspect it is needed here either. Did you add it for a
specific reason?

2022-11-15 01:00:39

by Dan Williams

[permalink] [raw]
Subject: RE: [PATCH v3 4/9] cxl/mem: Skip intermediate port enumeration of restricted endpoints (RCDs)

Robert Richter wrote:
> When an endpoint is found, all ports in beetween the endpoint and the
> CXL host bridge need to be created. In the RCH case there are no ports
> in between a host bridge and the endpoint. Skip the enumeration of
> intermediate ports.
>
> The port enumeration does not only create all ports, it also
> initializes the endpoint chain by adding the endpoint to every
> downstream port up to the root bridge. This must be done also in RCD
> mode, but is much more simple as the endpoint only needs to be added
> to the host bridge's dport.
>
> Note: For endpoint removal the cxl_detach_ep() is not needed as it is
> released in cxl_port_release().
>
> Signed-off-by: Robert Richter <[email protected]>
> ---
> drivers/cxl/core/port.c | 18 ++++++++++++++++--
> 1 file changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index d10c3580719b..e21a9c3fe4da 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -1366,8 +1366,24 @@ int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd)
> {
> struct device *dev = &cxlmd->dev;
> struct device *iter;
> + struct cxl_dport *dport;
> + struct cxl_port *port;
> int rc;
>
> + /*
> + * Skip intermediate port enumeration in the RCH case, there
> + * are no ports in between a host bridge and an endpoint. Only
> + * initialize the EP chain.
> + */
> + if (is_cxl_restricted(cxlmd)) {

I changed this to:

if (cxlmd->cxlds->rcd) {

...where the cxl_pci driver sets that rcd flag. Aside from keeping some
PCI specifics out of this helper it also allows RCH/RCD configurations
to be mocked with cxl_test.

> + port = cxl_mem_find_port(cxlmd, &dport);
> + if (!port)
> + return -ENXIO;
> + rc = cxl_add_ep(dport, &cxlmd->dev);

Ah, good catch, I had missed this detail previously.

> + put_device(&port->dev);
> + return rc;
> + }
> +
> rc = devm_add_action_or_reset(&cxlmd->dev, cxl_detach_ep, cxlmd);
> if (rc)
> return rc;
> @@ -1381,8 +1397,6 @@ int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd)
> for (iter = dev; iter; iter = grandparent(iter)) {
> struct device *dport_dev = grandparent(iter);
> struct device *uport_dev;
> - struct cxl_dport *dport;
> - struct cxl_port *port;
>
> if (!dport_dev)
> return 0;
> --
> 2.30.2
>



2022-11-15 13:46:09

by Robert Richter

[permalink] [raw]
Subject: Re: [PATCH v3 4/9] cxl/mem: Skip intermediate port enumeration of restricted endpoints (RCDs)

On 14.11.22 16:24:06, Dan Williams wrote:
> Robert Richter wrote:
> > When an endpoint is found, all ports in beetween the endpoint and the
> > CXL host bridge need to be created. In the RCH case there are no ports
> > in between a host bridge and the endpoint. Skip the enumeration of
> > intermediate ports.
> >
> > The port enumeration does not only create all ports, it also
> > initializes the endpoint chain by adding the endpoint to every
> > downstream port up to the root bridge. This must be done also in RCD
> > mode, but is much more simple as the endpoint only needs to be added
> > to the host bridge's dport.
> >
> > Note: For endpoint removal the cxl_detach_ep() is not needed as it is
> > released in cxl_port_release().
> >
> > Signed-off-by: Robert Richter <[email protected]>
> > ---
> > drivers/cxl/core/port.c | 18 ++++++++++++++++--
> > 1 file changed, 16 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> > index d10c3580719b..e21a9c3fe4da 100644
> > --- a/drivers/cxl/core/port.c
> > +++ b/drivers/cxl/core/port.c
> > @@ -1366,8 +1366,24 @@ int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd)
> > {
> > struct device *dev = &cxlmd->dev;
> > struct device *iter;
> > + struct cxl_dport *dport;
> > + struct cxl_port *port;
> > int rc;
> >
> > + /*
> > + * Skip intermediate port enumeration in the RCH case, there
> > + * are no ports in between a host bridge and an endpoint. Only
> > + * initialize the EP chain.
> > + */
> > + if (is_cxl_restricted(cxlmd)) {
> > + port = cxl_mem_find_port(cxlmd, &dport);
> > + if (!port)
> > + return -ENXIO;
> > + rc = cxl_add_ep(dport, &cxlmd->dev);
>
> On second look, this seems problematic. While yes it will be deleted
> when the root CXL port dies, it will not be deleted if the cxl_pci
> driver is reloaded. I will code up a unit test to double check.
>
> I note that cxl_add_ep() for the VH case is skipped for the root CXL
> port, so I do not suspect it is needed here either. Did you add it for a
> specific reason?

Yes, all endpoint iterators need to be reworked. Also true, the first
endpoint is skipped in the chain. So only intermediate EP structs are
touched by the loops actually.

In particular, cxl_ep_load() returned NULL for the first lookup if the
ep is missing. We could stop the iteration then. I tried to avoid a
rework here, but maybe it is not too extensive as I expected first.

-Robert

2022-11-15 13:50:41

by Robert Richter

[permalink] [raw]
Subject: Re: [PATCH v3 4/9] cxl/mem: Skip intermediate port enumeration of restricted endpoints (RCDs)

On 14.11.22 16:07:58, Dan Williams wrote:
> Robert Richter wrote:
> > When an endpoint is found, all ports in beetween the endpoint and the
> > CXL host bridge need to be created. In the RCH case there are no ports
> > in between a host bridge and the endpoint. Skip the enumeration of
> > intermediate ports.
> >
> > The port enumeration does not only create all ports, it also
> > initializes the endpoint chain by adding the endpoint to every
> > downstream port up to the root bridge. This must be done also in RCD
> > mode, but is much more simple as the endpoint only needs to be added
> > to the host bridge's dport.
> >
> > Note: For endpoint removal the cxl_detach_ep() is not needed as it is
> > released in cxl_port_release().
> >
> > Signed-off-by: Robert Richter <[email protected]>
> > ---
> > drivers/cxl/core/port.c | 18 ++++++++++++++++--
> > 1 file changed, 16 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> > index d10c3580719b..e21a9c3fe4da 100644
> > --- a/drivers/cxl/core/port.c
> > +++ b/drivers/cxl/core/port.c
> > @@ -1366,8 +1366,24 @@ int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd)
> > {
> > struct device *dev = &cxlmd->dev;
> > struct device *iter;
> > + struct cxl_dport *dport;
> > + struct cxl_port *port;
> > int rc;
> >
> > + /*
> > + * Skip intermediate port enumeration in the RCH case, there
> > + * are no ports in between a host bridge and an endpoint. Only
> > + * initialize the EP chain.
> > + */
> > + if (is_cxl_restricted(cxlmd)) {
>
> I changed this to:
>
> if (cxlmd->cxlds->rcd) {

I a mail to Bjorn I suggested to have cxl_dev_cap and a cxl_port_cap
in the pci_dev struct that could be looked up too including RCD mode.
Checking the pci_dev looks more reasonable to me here, though we could
have a flag of it in cxlds too.

-Robert

>
> ...where the cxl_pci driver sets that rcd flag. Aside from keeping some
> PCI specifics out of this helper it also allows RCH/RCD configurations
> to be mocked with cxl_test.
>
> > + port = cxl_mem_find_port(cxlmd, &dport);
> > + if (!port)
> > + return -ENXIO;
> > + rc = cxl_add_ep(dport, &cxlmd->dev);
>
> Ah, good catch, I had missed this detail previously.
>
> > + put_device(&port->dev);
> > + return rc;
> > + }
> > +
> > rc = devm_add_action_or_reset(&cxlmd->dev, cxl_detach_ep, cxlmd);
> > if (rc)
> > return rc;
> > @@ -1381,8 +1397,6 @@ int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd)
> > for (iter = dev; iter; iter = grandparent(iter)) {
> > struct device *dport_dev = grandparent(iter);
> > struct device *uport_dev;
> > - struct cxl_dport *dport;
> > - struct cxl_port *port;
> >
> > if (!dport_dev)
> > return 0;
> > --
> > 2.30.2
> >
>
>

2022-11-15 14:12:39

by Robert Richter

[permalink] [raw]
Subject: Re: [PATCH v3 3/9] cxl/mem: Adjust cxl_mem_find_port() to find an RCH's port

On 14.11.22 15:45:19, Dan Williams wrote:
> Robert Richter wrote:
> > The PCIe software view of an RCH and RCD is different to VH mode. An
> > RCD is paired with an RCH and shows up as RCiEP with a parent already
> > pointing to a PCI bridge (struct pci_host_bridge). In contrast, in VH
> > mode an PCI Express Endpoint is a PCI type 0 device with a PCI type 1
> > device as parent (struct pci_dev, most of the time a downstream switch
> > port, but could also be a root port). The following hierarchy applies
> > in VH mode:
> >
> > CXL memory device, cxl_memdev endpoint
> > └──PCIe Endpoint (type 0), pci_dev |
> > └──Downstream Port (type 1), pci_dev (Nth switch) portN
> > └──Upstream Port (type 1), pci_dev (Nth switch) |
> > : :
> > └──Downstream Port (type 1), pci_dev (1st switch) port1
> > └──Upstream Port (type 1), pci_dev (1st switch) |
> > └──Root Port (type 1), pci_dev |
> > └──PCI host bridge, pci_host_bridge port0
> > : |
> > :..ACPI0017, acpi_dev root
> >
> > (There can be zero or any other number of switches in between.)
> >
> > An iterator through the grandparents takes us to the root port which
> > is registered as dport to the bridge. The next port an endpoint is
> > connected to can be determined by using the grandparent of the memory
> > device as a dport_dev in cxl_mem_find_port().
> >
> > The same does not work in RCD mode where only an RCiEP is connected to
> > the host bridge:
> >
> > CXL memory device, cxl_memdev endpoint
> > └──PCIe Endpoint (type 0), pci_dev |
> > └──PCI host bridge, pci_host_bridge port0
> > : |
> > :..ACPI0017, acpi_dev root
> >
> > Here, an endpoint is directly connected to the host bridge without a
> > type 1 PCI device (root or downstream port) in between. To link the
> > endpoint to the correct port, the endpoint's PCI device (parent of the
> > memory device) must be taken as dport_dev arg in cxl_mem_find_port().
> >
> > Change cxl_mem_find_port() to find an RCH's port.
> >
> > Signed-off-by: Robert Richter <[email protected]>
> > ---
> > drivers/cxl/core/port.c | 38 ++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 38 insertions(+)
> >
> > diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> > index 0431ed860d8e..d10c3580719b 100644
> > --- a/drivers/cxl/core/port.c
> > +++ b/drivers/cxl/core/port.c
> > @@ -1354,6 +1354,14 @@ static int add_port_attach_ep(struct cxl_memdev *cxlmd,
> > return rc;
> > }
> >
> > +static inline bool is_cxl_restricted(struct cxl_memdev *cxlmd)
> > +{
> > + struct device *parent = cxlmd->dev.parent;
> > + if (!dev_is_pci(parent))
> > + return false;
> > + return pci_pcie_type(to_pci_dev(parent)) == PCI_EXP_TYPE_RC_END;
> > +}
> > +
> > int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd)
> > {
> > struct device *dev = &cxlmd->dev;
> > @@ -1433,9 +1441,39 @@ int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd)
> > }
> > EXPORT_SYMBOL_NS_GPL(devm_cxl_enumerate_ports, CXL);
> >
> > +/*
> > + * CXL memory device and port hierarchy:
> > + *
> > + * VH mode:
> > + *
> > + * CXL memory device, cxl_memdev endpoint
> > + * └──PCIe Endpoint (type 0), pci_dev |
> > + * └──Downstream Port (type 1), pci_dev (Nth switch) portN
> > + * └──Upstream Port (type 1), pci_dev (Nth switch) |
> > + * : :
> > + * └──Downstream Port (type 1), pci_dev (1st switch) port1
> > + * └──Upstream Port (type 1), pci_dev (1st switch) |
> > + * └──Root Port (type 1), pci_dev |
> > + * └──PCI host bridge, pci_host_bridge port0
> > + * : |
> > + * :..ACPI0017, acpi_dev root
> > + *
> > + * (There can be zero or any other number of switches in between.)
> > + *
> > + * RCD mode:
> > + *
> > + * CXL memory device, cxl_memdev endpoint
> > + * └──PCIe Endpoint (type 0), pci_dev |
> > + * └──PCI host bridge, pci_host_bridge port0
> > + * : |
> > + * :..ACPI0017, acpi_dev root
> > + */
> > struct cxl_port *cxl_mem_find_port(struct cxl_memdev *cxlmd,
> > struct cxl_dport **dport)
> > {
> > + if (is_cxl_restricted(cxlmd))
> > + return find_cxl_port(cxlmd->dev.parent, dport);
> > +
> > return find_cxl_port(grandparent(&cxlmd->dev), dport);
>
> I do not see why this change is needed. For example:
>
> # readlink -f /sys/bus/cxl/devices/mem0
> /sys/devices/pci0000:38/0000:38:00.0/mem0
> # cxl list -BT
> [
> {
> "bus":"root0",
> "provider":"ACPI.CXL",
> "nr_dports":1,
> "dports":[
> {
> "dport":"pci0000:38",
> "id":49
> }
> ]
> }
> ]
>
> ...so, in this case, the grandparent of "mem0" is "pci0000:38", and
> "pci0000:38" is a dport. Unmodified cxl_mem_find_port() will do the
> right thing and find that this CXL RCIEP is directly connected to
> "root0".

find_cxl_port() uses the dport_dev, not the uport_dev. A lookup of
pci0000:38 gives the cxl root (ACPI.CXL). Instead, the endpoint's
device (0000:38:00.0) must be used to get to the bridge
("pci0000:38").

There is a parent missing because there is no Root Port in the RCD
hierarchy, simplified example:

VH mode:

CXL memory device, cxl_memdev endpoint <- cxlmd
└──PCIe Endpoint (type 0), pci_dev |
└──Downstream Port (type 1), pci_dev (1st switch) port1 <- port1: registered as dport at port0
└──Upstream Port (type 1), pci_dev (1st switch) | port1: grandparent(cxlmd)
└──Root Port (type 1), pci_dev | <- pdev: registered as dport at port0
| | pdev: grandparent(port1)
└──PCI host bridge, pci_host_bridge port0 <- find_cxl_port(grandparent(grandparent(cxlmd)))
: | port0 enumerates root ports (pdev) as dports
:..ACPI0017, acpi_dev root

Additional switches add another grandparent() level each (adding an
dport and uport).

RCD mode:

CXL memory device, cxl_memdev endpoint <- cxlmd
└──PCIe Endpoint (type 0), pci_dev | <- pdev: registered as dport at port0
| | pdev: parent(cxlmd)
└──PCI host bridge, pci_host_bridge port0 <- find_cxl_port(parent(endpoint))
: | port0 enumerates endpoint (RCiEP) as dports
:..ACPI0017, acpi_dev root

I hope I could shed some light here.

-Robert

2022-11-15 18:30:31

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH v3 4/9] cxl/mem: Skip intermediate port enumeration of restricted endpoints (RCDs)

Robert Richter wrote:
> On 14.11.22 16:07:58, Dan Williams wrote:
> > Robert Richter wrote:
> > > When an endpoint is found, all ports in beetween the endpoint and the
> > > CXL host bridge need to be created. In the RCH case there are no ports
> > > in between a host bridge and the endpoint. Skip the enumeration of
> > > intermediate ports.
> > >
> > > The port enumeration does not only create all ports, it also
> > > initializes the endpoint chain by adding the endpoint to every
> > > downstream port up to the root bridge. This must be done also in RCD
> > > mode, but is much more simple as the endpoint only needs to be added
> > > to the host bridge's dport.
> > >
> > > Note: For endpoint removal the cxl_detach_ep() is not needed as it is
> > > released in cxl_port_release().
> > >
> > > Signed-off-by: Robert Richter <[email protected]>
> > > ---
> > > drivers/cxl/core/port.c | 18 ++++++++++++++++--
> > > 1 file changed, 16 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> > > index d10c3580719b..e21a9c3fe4da 100644
> > > --- a/drivers/cxl/core/port.c
> > > +++ b/drivers/cxl/core/port.c
> > > @@ -1366,8 +1366,24 @@ int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd)
> > > {
> > > struct device *dev = &cxlmd->dev;
> > > struct device *iter;
> > > + struct cxl_dport *dport;
> > > + struct cxl_port *port;
> > > int rc;
> > >
> > > + /*
> > > + * Skip intermediate port enumeration in the RCH case, there
> > > + * are no ports in between a host bridge and an endpoint. Only
> > > + * initialize the EP chain.
> > > + */
> > > + if (is_cxl_restricted(cxlmd)) {
> >
> > I changed this to:
> >
> > if (cxlmd->cxlds->rcd) {
>
> I a mail to Bjorn I suggested to have cxl_dev_cap and a cxl_port_cap
> in the pci_dev struct that could be looked up too including RCD mode.
> Checking the pci_dev looks more reasonable to me here, though we could
> have a flag of it in cxlds too.

Would that not need the PCI core to understand how to walk the RCRB
generically? As far as I understand the RCRB association is ACPI.CEDT
specific.

2022-11-15 18:33:05

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH v3 3/9] cxl/mem: Adjust cxl_mem_find_port() to find an RCH's port

Robert Richter wrote:
> On 14.11.22 15:45:19, Dan Williams wrote:
> > Robert Richter wrote:
> > > The PCIe software view of an RCH and RCD is different to VH mode. An
> > > RCD is paired with an RCH and shows up as RCiEP with a parent already
> > > pointing to a PCI bridge (struct pci_host_bridge). In contrast, in VH
> > > mode an PCI Express Endpoint is a PCI type 0 device with a PCI type 1
> > > device as parent (struct pci_dev, most of the time a downstream switch
> > > port, but could also be a root port). The following hierarchy applies
> > > in VH mode:
> > >
> > > CXL memory device, cxl_memdev endpoint
> > > └──PCIe Endpoint (type 0), pci_dev |
> > > └──Downstream Port (type 1), pci_dev (Nth switch) portN
> > > └──Upstream Port (type 1), pci_dev (Nth switch) |
> > > : :
> > > └──Downstream Port (type 1), pci_dev (1st switch) port1
> > > └──Upstream Port (type 1), pci_dev (1st switch) |
> > > └──Root Port (type 1), pci_dev |
> > > └──PCI host bridge, pci_host_bridge port0
> > > : |
> > > :..ACPI0017, acpi_dev root
> > >
> > > (There can be zero or any other number of switches in between.)
> > >
> > > An iterator through the grandparents takes us to the root port which
> > > is registered as dport to the bridge. The next port an endpoint is
> > > connected to can be determined by using the grandparent of the memory
> > > device as a dport_dev in cxl_mem_find_port().
> > >
> > > The same does not work in RCD mode where only an RCiEP is connected to
> > > the host bridge:
> > >
> > > CXL memory device, cxl_memdev endpoint
> > > └──PCIe Endpoint (type 0), pci_dev |
> > > └──PCI host bridge, pci_host_bridge port0
> > > : |
> > > :..ACPI0017, acpi_dev root
> > >
> > > Here, an endpoint is directly connected to the host bridge without a
> > > type 1 PCI device (root or downstream port) in between. To link the
> > > endpoint to the correct port, the endpoint's PCI device (parent of the
> > > memory device) must be taken as dport_dev arg in cxl_mem_find_port().
> > >
> > > Change cxl_mem_find_port() to find an RCH's port.
> > >
> > > Signed-off-by: Robert Richter <[email protected]>
> > > ---
> > > drivers/cxl/core/port.c | 38 ++++++++++++++++++++++++++++++++++++++
> > > 1 file changed, 38 insertions(+)
> > >
> > > diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> > > index 0431ed860d8e..d10c3580719b 100644
> > > --- a/drivers/cxl/core/port.c
> > > +++ b/drivers/cxl/core/port.c
> > > @@ -1354,6 +1354,14 @@ static int add_port_attach_ep(struct cxl_memdev *cxlmd,
> > > return rc;
> > > }
> > >
> > > +static inline bool is_cxl_restricted(struct cxl_memdev *cxlmd)
> > > +{
> > > + struct device *parent = cxlmd->dev.parent;
> > > + if (!dev_is_pci(parent))
> > > + return false;
> > > + return pci_pcie_type(to_pci_dev(parent)) == PCI_EXP_TYPE_RC_END;
> > > +}
> > > +
> > > int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd)
> > > {
> > > struct device *dev = &cxlmd->dev;
> > > @@ -1433,9 +1441,39 @@ int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd)
> > > }
> > > EXPORT_SYMBOL_NS_GPL(devm_cxl_enumerate_ports, CXL);
> > >
> > > +/*
> > > + * CXL memory device and port hierarchy:
> > > + *
> > > + * VH mode:
> > > + *
> > > + * CXL memory device, cxl_memdev endpoint
> > > + * └──PCIe Endpoint (type 0), pci_dev |
> > > + * └──Downstream Port (type 1), pci_dev (Nth switch) portN
> > > + * └──Upstream Port (type 1), pci_dev (Nth switch) |
> > > + * : :
> > > + * └──Downstream Port (type 1), pci_dev (1st switch) port1
> > > + * └──Upstream Port (type 1), pci_dev (1st switch) |
> > > + * └──Root Port (type 1), pci_dev |
> > > + * └──PCI host bridge, pci_host_bridge port0
> > > + * : |
> > > + * :..ACPI0017, acpi_dev root
> > > + *
> > > + * (There can be zero or any other number of switches in between.)
> > > + *
> > > + * RCD mode:
> > > + *
> > > + * CXL memory device, cxl_memdev endpoint
> > > + * └──PCIe Endpoint (type 0), pci_dev |
> > > + * └──PCI host bridge, pci_host_bridge port0
> > > + * : |
> > > + * :..ACPI0017, acpi_dev root
> > > + */
> > > struct cxl_port *cxl_mem_find_port(struct cxl_memdev *cxlmd,
> > > struct cxl_dport **dport)
> > > {
> > > + if (is_cxl_restricted(cxlmd))
> > > + return find_cxl_port(cxlmd->dev.parent, dport);
> > > +
> > > return find_cxl_port(grandparent(&cxlmd->dev), dport);
> >
> > I do not see why this change is needed. For example:
> >
> > # readlink -f /sys/bus/cxl/devices/mem0
> > /sys/devices/pci0000:38/0000:38:00.0/mem0
> > # cxl list -BT
> > [
> > {
> > "bus":"root0",
> > "provider":"ACPI.CXL",
> > "nr_dports":1,
> > "dports":[
> > {
> > "dport":"pci0000:38",
> > "id":49
> > }
> > ]
> > }
> > ]
> >
> > ...so, in this case, the grandparent of "mem0" is "pci0000:38", and
> > "pci0000:38" is a dport. Unmodified cxl_mem_find_port() will do the
> > right thing and find that this CXL RCIEP is directly connected to
> > "root0".
>
> find_cxl_port() uses the dport_dev, not the uport_dev. A lookup of
> pci0000:38 gives the cxl root (ACPI.CXL).

...but that is what I would expect. I.e. that RCDs appear directly
connected to the cxl root with no intervening cxl_port instance, and RCH
host-bridges only serve the role of dport devices. This also matches the
diagram from 9.11.8 CXL Devices Attached to an RCH where a
downstream-port RCRB in the host bridge is directly connected to the
RCIEP endpoint.

2022-11-15 19:00:25

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH v3 4/9] cxl/mem: Skip intermediate port enumeration of restricted endpoints (RCDs)

Robert Richter wrote:
> On 14.11.22 16:24:06, Dan Williams wrote:
> > Robert Richter wrote:
> > > When an endpoint is found, all ports in beetween the endpoint and the
> > > CXL host bridge need to be created. In the RCH case there are no ports
> > > in between a host bridge and the endpoint. Skip the enumeration of
> > > intermediate ports.
> > >
> > > The port enumeration does not only create all ports, it also
> > > initializes the endpoint chain by adding the endpoint to every
> > > downstream port up to the root bridge. This must be done also in RCD
> > > mode, but is much more simple as the endpoint only needs to be added
> > > to the host bridge's dport.
> > >
> > > Note: For endpoint removal the cxl_detach_ep() is not needed as it is
> > > released in cxl_port_release().
> > >
> > > Signed-off-by: Robert Richter <[email protected]>
> > > ---
> > > drivers/cxl/core/port.c | 18 ++++++++++++++++--
> > > 1 file changed, 16 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> > > index d10c3580719b..e21a9c3fe4da 100644
> > > --- a/drivers/cxl/core/port.c
> > > +++ b/drivers/cxl/core/port.c
> > > @@ -1366,8 +1366,24 @@ int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd)
> > > {
> > > struct device *dev = &cxlmd->dev;
> > > struct device *iter;
> > > + struct cxl_dport *dport;
> > > + struct cxl_port *port;
> > > int rc;
> > >
> > > + /*
> > > + * Skip intermediate port enumeration in the RCH case, there
> > > + * are no ports in between a host bridge and an endpoint. Only
> > > + * initialize the EP chain.
> > > + */
> > > + if (is_cxl_restricted(cxlmd)) {
> > > + port = cxl_mem_find_port(cxlmd, &dport);
> > > + if (!port)
> > > + return -ENXIO;
> > > + rc = cxl_add_ep(dport, &cxlmd->dev);
> >
> > On second look, this seems problematic. While yes it will be deleted
> > when the root CXL port dies, it will not be deleted if the cxl_pci
> > driver is reloaded. I will code up a unit test to double check.
> >
> > I note that cxl_add_ep() for the VH case is skipped for the root CXL
> > port, so I do not suspect it is needed here either. Did you add it for a
> > specific reason?
>
> Yes, all endpoint iterators need to be reworked. Also true, the first
> endpoint is skipped in the chain. So only intermediate EP structs are
> touched by the loops actually.
>
> In particular, cxl_ep_load() returned NULL for the first lookup if the
> ep is missing. We could stop the iteration then. I tried to avoid a
> rework here, but maybe it is not too extensive as I expected first.

Hmm, ok, let me get the unit test topology working for this to make sure
my assumptions are correct about when an @ep reference is used / needed.

2022-11-16 20:04:06

by Dan Williams

[permalink] [raw]
Subject: RE: [PATCH v3 5/9] cxl/pci: Only register RCDs with device 0, function 0 as CXL memory device

Robert Richter wrote:
> The Device 0, Function 0 DVSEC controls the CXL functionality of the
> entire device. Add a check to prevent registration of any other PCI
> device on the bus as a CXL memory device.

Can you reference the specification wording that indicates that the OS
needs to actively avoid these situations, or otherwise point to the real
world scenario where this filtering is needed?

>
> Signed-off-by: Robert Richter <[email protected]>
> ---
> drivers/cxl/pci.c | 25 +++++++++++++++++++++++--
> 1 file changed, 23 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index faeb5d9d7a7a..cc4f206f24b3 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -428,11 +428,26 @@ static void devm_cxl_pci_create_doe(struct cxl_dev_state *cxlds)
> }
> }
>
> +static int check_restricted_device(struct pci_dev *pdev, u16 pcie_dvsec)
> +{
> + if (pci_pcie_type(pdev) != PCI_EXP_TYPE_RC_END)
> + return 0; /* no RCD */
> +
> + if (pdev->devfn == PCI_DEVFN(0, 0) && pcie_dvsec)
> + return 0; /* ok */
> +
> + dev_warn(&pdev->dev, "Skipping RCD: devfn=0x%02x dvsec=%u\n",

s/0x%02x/%#02x/

> + pdev->devfn, pcie_dvsec);

This looks like a dev_dbg() to me. Otherwise a warning will always fire
on a benign condition.

> +
> + return -ENODEV;
> +}
> +
> static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> {
> struct cxl_register_map map;
> struct cxl_memdev *cxlmd;
> struct cxl_dev_state *cxlds;
> + u16 pcie_dvsec;
> int rc;
>
> /*
> @@ -442,6 +457,13 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> BUILD_BUG_ON(offsetof(struct cxl_regs, memdev) !=
> offsetof(struct cxl_regs, device_regs.memdev));
>
> + pcie_dvsec = pci_find_dvsec_capability(
> + pdev, PCI_DVSEC_VENDOR_ID_CXL, CXL_DVSEC_PCIE_DEVICE);
> +
> + rc = check_restricted_device(pdev, pcie_dvsec);
> + if (rc)
> + return rc;
> +
> rc = pcim_enable_device(pdev);
> if (rc)
> return rc;
> @@ -451,8 +473,7 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> return PTR_ERR(cxlds);
>
> cxlds->serial = pci_get_dsn(pdev);
> - cxlds->cxl_dvsec = pci_find_dvsec_capability(
> - pdev, PCI_DVSEC_VENDOR_ID_CXL, CXL_DVSEC_PCIE_DEVICE);
> + cxlds->cxl_dvsec = pcie_dvsec;
> if (!cxlds->cxl_dvsec)
> dev_warn(&pdev->dev,
> "Device DVSEC not present, skip CXL.mem init\n");
> --
> 2.30.2
>



2022-11-16 21:46:44

by Dan Williams

[permalink] [raw]
Subject: RE: [PATCH v3 8/9] cxl/pci: Extend devm_cxl_port_enumerate_dports() to support restricted hosts (RCH)

Robert Richter wrote:
> The PCIe Software View of an RCH and RCD is different to VH mode. An
> RCD is paired with an RCH and shows up as RCiEP. Its downstream and
> upstream ports are hidden to the PCI hierarchy. This different PCI
> topology requires a different handling of RCHs.
>
> Extend devm_cxl_port_enumerate_dports() to support restricted hosts
> (RCH). If an RCH is detected, register its port as dport to the
> device. An RCH is found if the host's dev 0 func 0 devices is an RCiEP
> with an existing PCIe DVSEC for CXL Devices (ID 0000h).

It is not clear to me what this extra dport represents. Here are the
Linux CXL objects I see in a VH vs an RCH topology:

VH
┌──────────┐
│ ACPI0017 │
│ root0 │
└─────┬────┘

┌─────┴────┐
│ dport0 │
┌─────┤ ACPI0016 ├─────┐
│ │ port1 │ │
│ └────┬─────┘ │
│ │ │
┌──┴───┐ ┌──┴───┐ ┌───┴──┐
│dport0│ │dport1│ │dport2│
│ RP0 │ │ RP1 │ │ RP2 │
└──────┘ └──┬───┘ └──────┘

┌───┴─────┐
│endpoint0│
│ port2 │
└─────────┘


RCH
┌──────────┐
│ ACPI0017 │
│ root0 │
└────┬─────┘

┌───┴────┐
│ dport0 │
│ACPI0016│
└───┬────┘

┌────┴─────┐
│endpoint0 │
│ port1 │
└──────────┘

So in the RCH case the only dport is the dport that root0 targets, and
then that dport is directly connected to the RCIEP endpoint-port.

In the VH case another level of dports are needed to route from root0 to
the fan out across the CXL root ports.

2022-11-17 16:17:43

by Robert Richter

[permalink] [raw]
Subject: Re: [PATCH v3 5/9] cxl/pci: Only register RCDs with device 0, function 0 as CXL memory device

On 16.11.22 11:24:48, Dan Williams wrote:
> Robert Richter wrote:
> > The Device 0, Function 0 DVSEC controls the CXL functionality of the
> > entire device. Add a check to prevent registration of any other PCI
> > device on the bus as a CXL memory device.
>
> Can you reference the specification wording that indicates that the OS
> needs to actively avoid these situations, or otherwise point to the real
> world scenario where this filtering is needed?

CXL 3.0

8.1.3 PCIe DVSEC for CXL Device

"""
An RCD creates a new PCIe enumeration hierarchy. As such, it spawns a new Root Bus
and can expose one or more PCIe device numbers and function numbers at this bus
number. These are exposed as Root Complex Integrated Endpoints (RCiEP). The PCIe
Configuration Space of Device 0, Function 0 shall include the CXL PCIe DVSEC as shown
in Figure 8-1.
"""

"""
In either case, the capability, status, and control fields in Device 0, Function 0 DVSEC
control the CXL functionality of the entire device.
"""

There are some other occurrences. I think this is even true for VH
mode, as multiple CXL devices on the bus are exposed through multiple
DSPs or Root Ports.

Anyway, I limited this to an RCD only, esp. because its counterpart
would be missing and thus port mapping would fail otherwise. See
restricted_host_enumerate_dport() of this series.

>
> >
> > Signed-off-by: Robert Richter <[email protected]>
> > ---
> > drivers/cxl/pci.c | 25 +++++++++++++++++++++++--
> > 1 file changed, 23 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> > index faeb5d9d7a7a..cc4f206f24b3 100644
> > --- a/drivers/cxl/pci.c
> > +++ b/drivers/cxl/pci.c
> > @@ -428,11 +428,26 @@ static void devm_cxl_pci_create_doe(struct cxl_dev_state *cxlds)
> > }
> > }
> >
> > +static int check_restricted_device(struct pci_dev *pdev, u16 pcie_dvsec)
> > +{
> > + if (pci_pcie_type(pdev) != PCI_EXP_TYPE_RC_END)
> > + return 0; /* no RCD */
> > +
> > + if (pdev->devfn == PCI_DEVFN(0, 0) && pcie_dvsec)
> > + return 0; /* ok */
> > +
> > + dev_warn(&pdev->dev, "Skipping RCD: devfn=0x%02x dvsec=%u\n",
>
> s/0x%02x/%#02x/
>
> > + pdev->devfn, pcie_dvsec);

Ok.

> This looks like a dev_dbg() to me. Otherwise a warning will always fire
> on a benign condition.

I have chosen dev_warn() here as this is a non-compliant unexpected
behavior of the device. There are no (legal) cases this may happen. I
suppose you are worried about spamming the console here, but that
error should be reported somewhere and thus being visible.

Note: There can be multiple devices on the bus, but those shouldn't
have a CXL mem dev class code and the devices shouldn't being probed
by cxl_pci_probe() which contains the check and later creates a cxlmd
dev.

-Robert



2022-11-17 18:11:29

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH v3 5/9] cxl/pci: Only register RCDs with device 0, function 0 as CXL memory device

Robert Richter wrote:
> On 16.11.22 11:24:48, Dan Williams wrote:
> > Robert Richter wrote:
> > > The Device 0, Function 0 DVSEC controls the CXL functionality of the
> > > entire device. Add a check to prevent registration of any other PCI
> > > device on the bus as a CXL memory device.
> >
> > Can you reference the specification wording that indicates that the OS
> > needs to actively avoid these situations, or otherwise point to the real
> > world scenario where this filtering is needed?
>
> CXL 3.0
>
> 8.1.3 PCIe DVSEC for CXL Device
>
> """
> An RCD creates a new PCIe enumeration hierarchy. As such, it spawns a new Root Bus
> and can expose one or more PCIe device numbers and function numbers at this bus
> number. These are exposed as Root Complex Integrated Endpoints (RCiEP). The PCIe
> Configuration Space of Device 0, Function 0 shall include the CXL PCIe DVSEC as shown
> in Figure 8-1.
> """
>
> """
> In either case, the capability, status, and control fields in Device 0, Function 0 DVSEC
> control the CXL functionality of the entire device.
> """
>
> There are some other occurrences. I think this is even true for VH
> mode, as multiple CXL devices on the bus are exposed through multiple
> DSPs or Root Ports.
>
> Anyway, I limited this to an RCD only, esp. because its counterpart
> would be missing and thus port mapping would fail otherwise. See
> restricted_host_enumerate_dport() of this series.
>
> >
> > >
> > > Signed-off-by: Robert Richter <[email protected]>
> > > ---
> > > drivers/cxl/pci.c | 25 +++++++++++++++++++++++--
> > > 1 file changed, 23 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> > > index faeb5d9d7a7a..cc4f206f24b3 100644
> > > --- a/drivers/cxl/pci.c
> > > +++ b/drivers/cxl/pci.c
> > > @@ -428,11 +428,26 @@ static void devm_cxl_pci_create_doe(struct cxl_dev_state *cxlds)
> > > }
> > > }
> > >
> > > +static int check_restricted_device(struct pci_dev *pdev, u16 pcie_dvsec)
> > > +{
> > > + if (pci_pcie_type(pdev) != PCI_EXP_TYPE_RC_END)
> > > + return 0; /* no RCD */
> > > +
> > > + if (pdev->devfn == PCI_DEVFN(0, 0) && pcie_dvsec)
> > > + return 0; /* ok */
> > > +
> > > + dev_warn(&pdev->dev, "Skipping RCD: devfn=0x%02x dvsec=%u\n",
> >
> > s/0x%02x/%#02x/
> >
> > > + pdev->devfn, pcie_dvsec);
>
> Ok.
>
> > This looks like a dev_dbg() to me. Otherwise a warning will always fire
> > on a benign condition.
>
> I have chosen dev_warn() here as this is a non-compliant unexpected
> behavior of the device. There are no (legal) cases this may happen. I
> suppose you are worried about spamming the console here, but that
> error should be reported somewhere and thus being visible.

There are so many spec illegal values and conditions that the driver
could checki, but does not. The reason I am poking here is why does the
driver need to be explicit about *this* illegal condition versus all the
other potential conditions? What is the practical end user impact if
Linux does not include this change? For example, if it is just one
vendor that made this mistake that can be an explicit quirk.

A dev_warn() is not necessary for simple quirks.

2022-11-17 19:12:06

by Robert Richter

[permalink] [raw]
Subject: Re: [PATCH v3 4/9] cxl/mem: Skip intermediate port enumeration of restricted endpoints (RCDs)

On 15.11.22 10:08:51, Dan Williams wrote:
> Robert Richter wrote:
> > On 14.11.22 16:07:58, Dan Williams wrote:
> > > Robert Richter wrote:
> > > > When an endpoint is found, all ports in beetween the endpoint and the
> > > > CXL host bridge need to be created. In the RCH case there are no ports
> > > > in between a host bridge and the endpoint. Skip the enumeration of
> > > > intermediate ports.
> > > >
> > > > The port enumeration does not only create all ports, it also
> > > > initializes the endpoint chain by adding the endpoint to every
> > > > downstream port up to the root bridge. This must be done also in RCD
> > > > mode, but is much more simple as the endpoint only needs to be added
> > > > to the host bridge's dport.
> > > >
> > > > Note: For endpoint removal the cxl_detach_ep() is not needed as it is
> > > > released in cxl_port_release().
> > > >
> > > > Signed-off-by: Robert Richter <[email protected]>
> > > > ---
> > > > drivers/cxl/core/port.c | 18 ++++++++++++++++--
> > > > 1 file changed, 16 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> > > > index d10c3580719b..e21a9c3fe4da 100644
> > > > --- a/drivers/cxl/core/port.c
> > > > +++ b/drivers/cxl/core/port.c
> > > > @@ -1366,8 +1366,24 @@ int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd)
> > > > {
> > > > struct device *dev = &cxlmd->dev;
> > > > struct device *iter;
> > > > + struct cxl_dport *dport;
> > > > + struct cxl_port *port;
> > > > int rc;
> > > >
> > > > + /*
> > > > + * Skip intermediate port enumeration in the RCH case, there
> > > > + * are no ports in between a host bridge and an endpoint. Only
> > > > + * initialize the EP chain.
> > > > + */
> > > > + if (is_cxl_restricted(cxlmd)) {
> > >
> > > I changed this to:
> > >
> > > if (cxlmd->cxlds->rcd) {
> >
> > I a mail to Bjorn I suggested to have cxl_dev_cap and a cxl_port_cap
> > in the pci_dev struct that could be looked up too including RCD mode.
> > Checking the pci_dev looks more reasonable to me here, though we could
> > have a flag of it in cxlds too.
>
> Would that not need the PCI core to understand how to walk the RCRB
> generically? As far as I understand the RCRB association is ACPI.CEDT
> specific.

I am thinking of doing some sort of cxl_setup_pci_dev(pdev) in
cxl_pci_probe() which extracts the caps and writes them into a struct
pci_dev. Possibly the CXL 68B Flit and VH Capable/Enable bit could be
used for RCD mode. But if that is not feasible for some reason a flag
somewhere could work too.

-Robert

2022-11-17 20:16:25

by Robert Richter

[permalink] [raw]
Subject: Re: [PATCH v3 3/9] cxl/mem: Adjust cxl_mem_find_port() to find an RCH's port

On 15.11.22 10:06:28, Dan Williams wrote:
> Robert Richter wrote:
> > On 14.11.22 15:45:19, Dan Williams wrote:
> > > Robert Richter wrote:
> > > > The PCIe software view of an RCH and RCD is different to VH mode. An
> > > > RCD is paired with an RCH and shows up as RCiEP with a parent already
> > > > pointing to a PCI bridge (struct pci_host_bridge). In contrast, in VH
> > > > mode an PCI Express Endpoint is a PCI type 0 device with a PCI type 1
> > > > device as parent (struct pci_dev, most of the time a downstream switch
> > > > port, but could also be a root port). The following hierarchy applies
> > > > in VH mode:
> > > >
> > > > CXL memory device, cxl_memdev endpoint
> > > > └──PCIe Endpoint (type 0), pci_dev |
> > > > └──Downstream Port (type 1), pci_dev (Nth switch) portN
> > > > └──Upstream Port (type 1), pci_dev (Nth switch) |
> > > > : :
> > > > └──Downstream Port (type 1), pci_dev (1st switch) port1
> > > > └──Upstream Port (type 1), pci_dev (1st switch) |
> > > > └──Root Port (type 1), pci_dev |
> > > > └──PCI host bridge, pci_host_bridge port0
> > > > : |
> > > > :..ACPI0017, acpi_dev root
> > > >
> > > > (There can be zero or any other number of switches in between.)
> > > >
> > > > An iterator through the grandparents takes us to the root port which
> > > > is registered as dport to the bridge. The next port an endpoint is
> > > > connected to can be determined by using the grandparent of the memory
> > > > device as a dport_dev in cxl_mem_find_port().
> > > >
> > > > The same does not work in RCD mode where only an RCiEP is connected to
> > > > the host bridge:
> > > >
> > > > CXL memory device, cxl_memdev endpoint
> > > > └──PCIe Endpoint (type 0), pci_dev |
> > > > └──PCI host bridge, pci_host_bridge port0
> > > > : |
> > > > :..ACPI0017, acpi_dev root
> > > >
> > > > Here, an endpoint is directly connected to the host bridge without a
> > > > type 1 PCI device (root or downstream port) in between. To link the
> > > > endpoint to the correct port, the endpoint's PCI device (parent of the
> > > > memory device) must be taken as dport_dev arg in cxl_mem_find_port().
> > > >
> > > > Change cxl_mem_find_port() to find an RCH's port.
> > > >
> > > > Signed-off-by: Robert Richter <[email protected]>
> > > > ---
> > > > drivers/cxl/core/port.c | 38 ++++++++++++++++++++++++++++++++++++++
> > > > 1 file changed, 38 insertions(+)
> > > >
> > > > diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> > > > index 0431ed860d8e..d10c3580719b 100644
> > > > --- a/drivers/cxl/core/port.c
> > > > +++ b/drivers/cxl/core/port.c
> > > > @@ -1354,6 +1354,14 @@ static int add_port_attach_ep(struct cxl_memdev *cxlmd,
> > > > return rc;
> > > > }
> > > >
> > > > +static inline bool is_cxl_restricted(struct cxl_memdev *cxlmd)
> > > > +{
> > > > + struct device *parent = cxlmd->dev.parent;
> > > > + if (!dev_is_pci(parent))
> > > > + return false;
> > > > + return pci_pcie_type(to_pci_dev(parent)) == PCI_EXP_TYPE_RC_END;
> > > > +}
> > > > +
> > > > int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd)
> > > > {
> > > > struct device *dev = &cxlmd->dev;
> > > > @@ -1433,9 +1441,39 @@ int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd)
> > > > }
> > > > EXPORT_SYMBOL_NS_GPL(devm_cxl_enumerate_ports, CXL);
> > > >
> > > > +/*
> > > > + * CXL memory device and port hierarchy:
> > > > + *
> > > > + * VH mode:
> > > > + *
> > > > + * CXL memory device, cxl_memdev endpoint
> > > > + * └──PCIe Endpoint (type 0), pci_dev |
> > > > + * └──Downstream Port (type 1), pci_dev (Nth switch) portN
> > > > + * └──Upstream Port (type 1), pci_dev (Nth switch) |
> > > > + * : :
> > > > + * └──Downstream Port (type 1), pci_dev (1st switch) port1
> > > > + * └──Upstream Port (type 1), pci_dev (1st switch) |
> > > > + * └──Root Port (type 1), pci_dev |
> > > > + * └──PCI host bridge, pci_host_bridge port0
> > > > + * : |
> > > > + * :..ACPI0017, acpi_dev root
> > > > + *
> > > > + * (There can be zero or any other number of switches in between.)
> > > > + *
> > > > + * RCD mode:
> > > > + *
> > > > + * CXL memory device, cxl_memdev endpoint
> > > > + * └──PCIe Endpoint (type 0), pci_dev |
> > > > + * └──PCI host bridge, pci_host_bridge port0
> > > > + * : |
> > > > + * :..ACPI0017, acpi_dev root
> > > > + */
> > > > struct cxl_port *cxl_mem_find_port(struct cxl_memdev *cxlmd,
> > > > struct cxl_dport **dport)
> > > > {
> > > > + if (is_cxl_restricted(cxlmd))
> > > > + return find_cxl_port(cxlmd->dev.parent, dport);
> > > > +
> > > > return find_cxl_port(grandparent(&cxlmd->dev), dport);
> > >
> > > I do not see why this change is needed. For example:
> > >
> > > # readlink -f /sys/bus/cxl/devices/mem0
> > > /sys/devices/pci0000:38/0000:38:00.0/mem0
> > > # cxl list -BT
> > > [
> > > {
> > > "bus":"root0",
> > > "provider":"ACPI.CXL",
> > > "nr_dports":1,
> > > "dports":[
> > > {
> > > "dport":"pci0000:38",
> > > "id":49
> > > }
> > > ]
> > > }
> > > ]
> > >
> > > ...so, in this case, the grandparent of "mem0" is "pci0000:38", and
> > > "pci0000:38" is a dport. Unmodified cxl_mem_find_port() will do the
> > > right thing and find that this CXL RCIEP is directly connected to
> > > "root0".
> >
> > find_cxl_port() uses the dport_dev, not the uport_dev. A lookup of
> > pci0000:38 gives the cxl root (ACPI.CXL).
>
> ...but that is what I would expect. I.e. that RCDs appear directly
> connected to the cxl root with no intervening cxl_port instance, and RCH
> host-bridges only serve the role of dport devices. This also matches the
> diagram from 9.11.8 CXL Devices Attached to an RCH where a
> downstream-port RCRB in the host bridge is directly connected to the
> RCIEP endpoint.

All devices connected to ACPI.CXL? IMO this needs to be bound to the
host bridge. The hierarchy should show a host/device mapping. In fact,
in a VH the root port is also not part of the mapping, instead the
host bridge shows up there. Also, there can be multiple RCHs.

-Robert

2022-11-18 09:21:06

by Robert Richter

[permalink] [raw]
Subject: Re: [PATCH v3 5/9] cxl/pci: Only register RCDs with device 0, function 0 as CXL memory device

On 17.11.22 09:27:23, Dan Williams wrote:
> Robert Richter wrote:
> > On 16.11.22 11:24:48, Dan Williams wrote:
> > > Robert Richter wrote:
> > > > The Device 0, Function 0 DVSEC controls the CXL functionality of the
> > > > entire device. Add a check to prevent registration of any other PCI
> > > > device on the bus as a CXL memory device.
> > >
> > > Can you reference the specification wording that indicates that the OS
> > > needs to actively avoid these situations, or otherwise point to the real
> > > world scenario where this filtering is needed?
> >
> > CXL 3.0
> >
> > 8.1.3 PCIe DVSEC for CXL Device
> >
> > """
> > An RCD creates a new PCIe enumeration hierarchy. As such, it spawns a new Root Bus
> > and can expose one or more PCIe device numbers and function numbers at this bus
> > number. These are exposed as Root Complex Integrated Endpoints (RCiEP). The PCIe
> > Configuration Space of Device 0, Function 0 shall include the CXL PCIe DVSEC as shown
> > in Figure 8-1.
> > """
> >
> > """
> > In either case, the capability, status, and control fields in Device 0, Function 0 DVSEC
> > control the CXL functionality of the entire device.
> > """
> >
> > There are some other occurrences. I think this is even true for VH
> > mode, as multiple CXL devices on the bus are exposed through multiple
> > DSPs or Root Ports.
> >
> > Anyway, I limited this to an RCD only, esp. because its counterpart
> > would be missing and thus port mapping would fail otherwise. See
> > restricted_host_enumerate_dport() of this series.
> >
> > >
> > > >
> > > > Signed-off-by: Robert Richter <[email protected]>
> > > > ---
> > > > drivers/cxl/pci.c | 25 +++++++++++++++++++++++--
> > > > 1 file changed, 23 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> > > > index faeb5d9d7a7a..cc4f206f24b3 100644
> > > > --- a/drivers/cxl/pci.c
> > > > +++ b/drivers/cxl/pci.c
> > > > @@ -428,11 +428,26 @@ static void devm_cxl_pci_create_doe(struct cxl_dev_state *cxlds)
> > > > }
> > > > }
> > > >
> > > > +static int check_restricted_device(struct pci_dev *pdev, u16 pcie_dvsec)
> > > > +{
> > > > + if (pci_pcie_type(pdev) != PCI_EXP_TYPE_RC_END)
> > > > + return 0; /* no RCD */
> > > > +
> > > > + if (pdev->devfn == PCI_DEVFN(0, 0) && pcie_dvsec)
> > > > + return 0; /* ok */
> > > > +
> > > > + dev_warn(&pdev->dev, "Skipping RCD: devfn=0x%02x dvsec=%u\n",
> > >
> > > s/0x%02x/%#02x/
> > >
> > > > + pdev->devfn, pcie_dvsec);
> >
> > Ok.
> >
> > > This looks like a dev_dbg() to me. Otherwise a warning will always fire
> > > on a benign condition.
> >
> > I have chosen dev_warn() here as this is a non-compliant unexpected
> > behavior of the device. There are no (legal) cases this may happen. I
> > suppose you are worried about spamming the console here, but that
> > error should be reported somewhere and thus being visible.
>
> There are so many spec illegal values and conditions that the driver
> could checki, but does not. The reason I am poking here is why does the
> driver need to be explicit about *this* illegal condition versus all the
> other potential conditions? What is the practical end user impact if
> Linux does not include this change? For example, if it is just one
> vendor that made this mistake that can be an explicit quirk.
>
> A dev_warn() is not necessary for simple quirks.

This is not simply a cross check, the driver prevents enablement of
CXL mem devs other than PCI_DEVFN(0, 0). It shouldn't silently drop
out then. It's a device malfunction which should appropriate reported
and not only visible if dbg is enabled.

As written above, the check is necessary as the counterpart is missing
otherwise and init would fail later with a more obfuscating error
message.

-Robert

2022-11-18 18:28:10

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH v3 5/9] cxl/pci: Only register RCDs with device 0, function 0 as CXL memory device

Robert Richter wrote:
> On 17.11.22 09:27:23, Dan Williams wrote:
> > Robert Richter wrote:
> > > On 16.11.22 11:24:48, Dan Williams wrote:
> > > > Robert Richter wrote:
> > > > > The Device 0, Function 0 DVSEC controls the CXL functionality of the
> > > > > entire device. Add a check to prevent registration of any other PCI
> > > > > device on the bus as a CXL memory device.
> > > >
> > > > Can you reference the specification wording that indicates that the OS
> > > > needs to actively avoid these situations, or otherwise point to the real
> > > > world scenario where this filtering is needed?
> > >
> > > CXL 3.0
> > >
> > > 8.1.3 PCIe DVSEC for CXL Device
> > >
> > > """
> > > An RCD creates a new PCIe enumeration hierarchy. As such, it spawns a new Root Bus
> > > and can expose one or more PCIe device numbers and function numbers at this bus
> > > number. These are exposed as Root Complex Integrated Endpoints (RCiEP). The PCIe
> > > Configuration Space of Device 0, Function 0 shall include the CXL PCIe DVSEC as shown
> > > in Figure 8-1.
> > > """
> > >
> > > """
> > > In either case, the capability, status, and control fields in Device 0, Function 0 DVSEC
> > > control the CXL functionality of the entire device.
> > > """
> > >
> > > There are some other occurrences. I think this is even true for VH
> > > mode, as multiple CXL devices on the bus are exposed through multiple
> > > DSPs or Root Ports.
> > >
> > > Anyway, I limited this to an RCD only, esp. because its counterpart
> > > would be missing and thus port mapping would fail otherwise. See
> > > restricted_host_enumerate_dport() of this series.
> > >
> > > >
> > > > >
> > > > > Signed-off-by: Robert Richter <[email protected]>
> > > > > ---
> > > > > drivers/cxl/pci.c | 25 +++++++++++++++++++++++--
> > > > > 1 file changed, 23 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> > > > > index faeb5d9d7a7a..cc4f206f24b3 100644
> > > > > --- a/drivers/cxl/pci.c
> > > > > +++ b/drivers/cxl/pci.c
> > > > > @@ -428,11 +428,26 @@ static void devm_cxl_pci_create_doe(struct cxl_dev_state *cxlds)
> > > > > }
> > > > > }
> > > > >
> > > > > +static int check_restricted_device(struct pci_dev *pdev, u16 pcie_dvsec)
> > > > > +{
> > > > > + if (pci_pcie_type(pdev) != PCI_EXP_TYPE_RC_END)
> > > > > + return 0; /* no RCD */
> > > > > +
> > > > > + if (pdev->devfn == PCI_DEVFN(0, 0) && pcie_dvsec)
> > > > > + return 0; /* ok */
> > > > > +
> > > > > + dev_warn(&pdev->dev, "Skipping RCD: devfn=0x%02x dvsec=%u\n",
> > > >
> > > > s/0x%02x/%#02x/
> > > >
> > > > > + pdev->devfn, pcie_dvsec);
> > >
> > > Ok.
> > >
> > > > This looks like a dev_dbg() to me. Otherwise a warning will always fire
> > > > on a benign condition.
> > >
> > > I have chosen dev_warn() here as this is a non-compliant unexpected
> > > behavior of the device. There are no (legal) cases this may happen. I
> > > suppose you are worried about spamming the console here, but that
> > > error should be reported somewhere and thus being visible.
> >
> > There are so many spec illegal values and conditions that the driver
> > could checki, but does not. The reason I am poking here is why does the
> > driver need to be explicit about *this* illegal condition versus all the
> > other potential conditions? What is the practical end user impact if
> > Linux does not include this change? For example, if it is just one
> > vendor that made this mistake that can be an explicit quirk.
> >
> > A dev_warn() is not necessary for simple quirks.
>
> This is not simply a cross check, the driver prevents enablement of
> CXL mem devs other than PCI_DEVFN(0, 0). It shouldn't silently drop
> out then. It's a device malfunction which should appropriate reported
> and not only visible if dbg is enabled.
>
> As written above, the check is necessary as the counterpart is missing

It is only necessary if this condition happens in practice, not a
theoretically. So I am asking, are you seeing this with an actual device
that someone will use in production? If so, that's what pci quirks are
for to keep those workarounds organized in a common location.

2022-11-18 20:38:32

by Robert Richter

[permalink] [raw]
Subject: Re: [PATCH v3 5/9] cxl/pci: Only register RCDs with device 0, function 0 as CXL memory device

On 18.11.22 08:55:13, Dan Williams wrote:
> Robert Richter wrote:
> > On 17.11.22 09:27:23, Dan Williams wrote:
> > > Robert Richter wrote:
> > > > On 16.11.22 11:24:48, Dan Williams wrote:
> > > > > Robert Richter wrote:
> > > > > > The Device 0, Function 0 DVSEC controls the CXL functionality of the
> > > > > > entire device. Add a check to prevent registration of any other PCI
> > > > > > device on the bus as a CXL memory device.
> > > > >
> > > > > Can you reference the specification wording that indicates that the OS
> > > > > needs to actively avoid these situations, or otherwise point to the real
> > > > > world scenario where this filtering is needed?
> > > >
> > > > CXL 3.0
> > > >
> > > > 8.1.3 PCIe DVSEC for CXL Device
> > > >
> > > > """
> > > > An RCD creates a new PCIe enumeration hierarchy. As such, it spawns a new Root Bus
> > > > and can expose one or more PCIe device numbers and function numbers at this bus
> > > > number. These are exposed as Root Complex Integrated Endpoints (RCiEP). The PCIe
> > > > Configuration Space of Device 0, Function 0 shall include the CXL PCIe DVSEC as shown
> > > > in Figure 8-1.
> > > > """
> > > >
> > > > """
> > > > In either case, the capability, status, and control fields in Device 0, Function 0 DVSEC
> > > > control the CXL functionality of the entire device.
> > > > """
> > > >
> > > > There are some other occurrences. I think this is even true for VH
> > > > mode, as multiple CXL devices on the bus are exposed through multiple
> > > > DSPs or Root Ports.
> > > >
> > > > Anyway, I limited this to an RCD only, esp. because its counterpart
> > > > would be missing and thus port mapping would fail otherwise. See
> > > > restricted_host_enumerate_dport() of this series.
> > > >
> > > > >
> > > > > >
> > > > > > Signed-off-by: Robert Richter <[email protected]>
> > > > > > ---
> > > > > > drivers/cxl/pci.c | 25 +++++++++++++++++++++++--
> > > > > > 1 file changed, 23 insertions(+), 2 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> > > > > > index faeb5d9d7a7a..cc4f206f24b3 100644
> > > > > > --- a/drivers/cxl/pci.c
> > > > > > +++ b/drivers/cxl/pci.c
> > > > > > @@ -428,11 +428,26 @@ static void devm_cxl_pci_create_doe(struct cxl_dev_state *cxlds)
> > > > > > }
> > > > > > }
> > > > > >
> > > > > > +static int check_restricted_device(struct pci_dev *pdev, u16 pcie_dvsec)
> > > > > > +{
> > > > > > + if (pci_pcie_type(pdev) != PCI_EXP_TYPE_RC_END)
> > > > > > + return 0; /* no RCD */
> > > > > > +
> > > > > > + if (pdev->devfn == PCI_DEVFN(0, 0) && pcie_dvsec)
> > > > > > + return 0; /* ok */
> > > > > > +
> > > > > > + dev_warn(&pdev->dev, "Skipping RCD: devfn=0x%02x dvsec=%u\n",
> > > > >
> > > > > s/0x%02x/%#02x/
> > > > >
> > > > > > + pdev->devfn, pcie_dvsec);
> > > >
> > > > Ok.
> > > >
> > > > > This looks like a dev_dbg() to me. Otherwise a warning will always fire
> > > > > on a benign condition.
> > > >
> > > > I have chosen dev_warn() here as this is a non-compliant unexpected
> > > > behavior of the device. There are no (legal) cases this may happen. I
> > > > suppose you are worried about spamming the console here, but that
> > > > error should be reported somewhere and thus being visible.
> > >
> > > There are so many spec illegal values and conditions that the driver
> > > could checki, but does not. The reason I am poking here is why does the
> > > driver need to be explicit about *this* illegal condition versus all the
> > > other potential conditions? What is the practical end user impact if
> > > Linux does not include this change? For example, if it is just one
> > > vendor that made this mistake that can be an explicit quirk.
> > >
> > > A dev_warn() is not necessary for simple quirks.
> >
> > This is not simply a cross check, the driver prevents enablement of
> > CXL mem devs other than PCI_DEVFN(0, 0). It shouldn't silently drop
> > out then. It's a device malfunction which should appropriate reported
> > and not only visible if dbg is enabled.
> >
> > As written above, the check is necessary as the counterpart is missing
>
> It is only necessary if this condition happens in practice, not a
> theoretically. So I am asking, are you seeing this with an actual device
> that someone will use in production? If so, that's what pci quirks are
> for to keep those workarounds organized in a common location.

I can make it a dev_dbg() message. But I do not understand the ratio
behind this. This is not a quirk nor a workaround or a fix for
something. The likely paths are the conditions checked that return 0.
Only if the unlikely case happens where a CXL mem dev is not a dev 0,
func 0, a warning is shown to inform the user that this dev is not
enabled. So yes, this might be theoretical similar to that a driver
cannot allocate memory. But why not print this as a warning message
then?

Anyway, let's make it a dev_dbg().

Thanks,

-Robert

2022-11-18 21:06:12

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH v3 5/9] cxl/pci: Only register RCDs with device 0, function 0 as CXL memory device

Robert Richter wrote:
> On 18.11.22 08:55:13, Dan Williams wrote:
> > Robert Richter wrote:
> > > On 17.11.22 09:27:23, Dan Williams wrote:
> > > > Robert Richter wrote:
> > > > > On 16.11.22 11:24:48, Dan Williams wrote:
> > > > > > Robert Richter wrote:
> > > > > > > The Device 0, Function 0 DVSEC controls the CXL functionality of the
> > > > > > > entire device. Add a check to prevent registration of any other PCI
> > > > > > > device on the bus as a CXL memory device.
> > > > > >
> > > > > > Can you reference the specification wording that indicates that the OS
> > > > > > needs to actively avoid these situations, or otherwise point to the real
> > > > > > world scenario where this filtering is needed?
> > > > >
> > > > > CXL 3.0
> > > > >
> > > > > 8.1.3 PCIe DVSEC for CXL Device
> > > > >
> > > > > """
> > > > > An RCD creates a new PCIe enumeration hierarchy. As such, it spawns a new Root Bus
> > > > > and can expose one or more PCIe device numbers and function numbers at this bus
> > > > > number. These are exposed as Root Complex Integrated Endpoints (RCiEP). The PCIe
> > > > > Configuration Space of Device 0, Function 0 shall include the CXL PCIe DVSEC as shown
> > > > > in Figure 8-1.
> > > > > """
> > > > >
> > > > > """
> > > > > In either case, the capability, status, and control fields in Device 0, Function 0 DVSEC
> > > > > control the CXL functionality of the entire device.
> > > > > """
> > > > >
> > > > > There are some other occurrences. I think this is even true for VH
> > > > > mode, as multiple CXL devices on the bus are exposed through multiple
> > > > > DSPs or Root Ports.
> > > > >
> > > > > Anyway, I limited this to an RCD only, esp. because its counterpart
> > > > > would be missing and thus port mapping would fail otherwise. See
> > > > > restricted_host_enumerate_dport() of this series.
> > > > >
> > > > > >
> > > > > > >
> > > > > > > Signed-off-by: Robert Richter <[email protected]>
> > > > > > > ---
> > > > > > > drivers/cxl/pci.c | 25 +++++++++++++++++++++++--
> > > > > > > 1 file changed, 23 insertions(+), 2 deletions(-)
> > > > > > >
> > > > > > > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> > > > > > > index faeb5d9d7a7a..cc4f206f24b3 100644
> > > > > > > --- a/drivers/cxl/pci.c
> > > > > > > +++ b/drivers/cxl/pci.c
> > > > > > > @@ -428,11 +428,26 @@ static void devm_cxl_pci_create_doe(struct cxl_dev_state *cxlds)
> > > > > > > }
> > > > > > > }
> > > > > > >
> > > > > > > +static int check_restricted_device(struct pci_dev *pdev, u16 pcie_dvsec)
> > > > > > > +{
> > > > > > > + if (pci_pcie_type(pdev) != PCI_EXP_TYPE_RC_END)
> > > > > > > + return 0; /* no RCD */
> > > > > > > +
> > > > > > > + if (pdev->devfn == PCI_DEVFN(0, 0) && pcie_dvsec)
> > > > > > > + return 0; /* ok */
> > > > > > > +
> > > > > > > + dev_warn(&pdev->dev, "Skipping RCD: devfn=0x%02x dvsec=%u\n",
> > > > > >
> > > > > > s/0x%02x/%#02x/
> > > > > >
> > > > > > > + pdev->devfn, pcie_dvsec);
> > > > >
> > > > > Ok.
> > > > >
> > > > > > This looks like a dev_dbg() to me. Otherwise a warning will always fire
> > > > > > on a benign condition.
> > > > >
> > > > > I have chosen dev_warn() here as this is a non-compliant unexpected
> > > > > behavior of the device. There are no (legal) cases this may happen. I
> > > > > suppose you are worried about spamming the console here, but that
> > > > > error should be reported somewhere and thus being visible.
> > > >
> > > > There are so many spec illegal values and conditions that the driver
> > > > could checki, but does not. The reason I am poking here is why does the
> > > > driver need to be explicit about *this* illegal condition versus all the
> > > > other potential conditions? What is the practical end user impact if
> > > > Linux does not include this change? For example, if it is just one
> > > > vendor that made this mistake that can be an explicit quirk.
> > > >
> > > > A dev_warn() is not necessary for simple quirks.
> > >
> > > This is not simply a cross check, the driver prevents enablement of
> > > CXL mem devs other than PCI_DEVFN(0, 0). It shouldn't silently drop
> > > out then. It's a device malfunction which should appropriate reported
> > > and not only visible if dbg is enabled.
> > >
> > > As written above, the check is necessary as the counterpart is missing
> >
> > It is only necessary if this condition happens in practice, not a
> > theoretically. So I am asking, are you seeing this with an actual device
> > that someone will use in production? If so, that's what pci quirks are
> > for to keep those workarounds organized in a common location.
>
> I can make it a dev_dbg() message. But I do not understand the ratio
> behind this. This is not a quirk nor a workaround or a fix for
> something. The likely paths are the conditions checked that return 0.
> Only if the unlikely case happens where a CXL mem dev is not a dev 0,
> func 0, a warning is shown to inform the user that this dev is not
> enabled. So yes, this might be theoretical similar to that a driver
> cannot allocate memory. But why not print this as a warning message
> then?
>
> Anyway, let's make it a dev_dbg().

Sorry for the thrash, lets set aside the the dev_dbg() vs dev_warn()
issue. It is minor compared to *why* this patch needs to be applied. I
would expect all production devices to be spec compliant and not
advertise the CXL memory device class code on anything but function0.

So either, there is a real threat that someone will build such a mistake
and Linux needs to take this action to protect itself, or no one will
ever build such a device and this patch is not needed.

Basically I read the changelog and it answered the "What?" question, but
it did not answer the "Why?" question.