2022-04-16 01:37:24

by Ira Weiny

[permalink] [raw]
Subject: [PATCH V8 00/10] CXL: Read CDAT and DSMAS data from the device

From: Ira Weiny <[email protected]>

Changes from V7:[4]
Avoid code bloat by making pci-doe.c conditional on CONFIG_PCI_DOE
which is auto selected by the CXL_PCI config option.
Minor code clean ups
Fix bug in pci_doe_supports_prot()
Rebase to cxl-pending

Changes from V6:[3]
The big change is the removal of the auxiliary bus code from the PCI
layer. The auxiliary bus usage is now in the CXL layer. The PCI layer
provides helpers for subsystems to utilize DOE mailboxes by creating a
pci_doe_mb object which controls a state machine for that mailbox
capability. The CXL layer wraps this object in an auxiliary device and
driver which can then be used to determine if the kernel is controlling
the capability or it is available to be used by user space. Reads from
user space via lspci are allowed. Writes are allowed but flagged via a
tainting the kernel.

Feedback from Bjorn, Jonathan, and Dan
Details in each patch

Changes from V5:[0]

Rework the patch set to split PCI vs CXL changes
Also make each change a bit more stand alone for easier review
Add cxl_cdat structure
Put CDAT related data structures in cdat.h
Clarify some device lifetimes with comments
Incorporate feedback from Jonathan, Bjorn and Dan
The bigest change is placing the DOE scanning code into the
pci_doe driver (part of the PCI codre).
Validate the CDAT when it is read rather than before DSMAS
parsing
Do not report DSMAS failure as an error, report a warning and
keep going.
Retry reading the table 1 time.
Update commit messages and this cover letter



CXL drivers need various data which are provided through generic DOE mailboxes
as defined in the PCIe 6.0 spec.[1]

One such data is the Coherent Device Atribute Table (CDAT). CDAT data provides
coherent information about the various devices in the system. It was developed
because systems no longer have a priori knowledge of all coherent devices
within a system. CDAT describes the coherent characteristics of the
components on the CXL bus separate from system configurations. The OS can
then, for example, use this information to form correct interleave sets.

To begin reading the CDAT the OS must have support to access the DOE mailboxes
provided by the CXL devices.

Because DOE is not specific to DOE but is provided within the PCI spec, the
series adds PCI DOE capability library functions. These functions allow for
the iteration of the DOE capabilities on a device as well as creating
pci_doe_mb structures which can control the operation of the DOE state machine.

CXL iterates the DOE capabilities creates auxiliary bus devices. These devices
are driven by a CXL DOE auxiliary driver which calls into the PCI DOE library
functions as appropriate.

The auxiliary bus architecture allows for root users to control which DOE
mailboxes are controlled by the kernel and which should be allowed for
unrestricted access by user space. One such use case is to allow for CXL
Compliance Testing (CXL 2.0 14.16.4 Compliance Mode DOE). By default the
kernel controls all mailboxes found.

After the devices are created and the driver attaches, CDAT data is read from
the device and DSMAS information parsed from that CDAT blob for use later.

This work was tested using qemu with additional patches.


[0] https://lore.kernel.org/linux-cxl/[email protected]/
[1] https://pcisig.com/specifications
[2] https://lore.kernel.org/qemu-devel/[email protected]/
[3] https://lore.kernel.org/linux-cxl/[email protected]/
[4] https://lore.kernel.org/linux-cxl/[email protected]/


Ira Weiny (7):
PCI: Replace magic constant for PCI Sig Vendor ID
cxl/pci: Create auxiliary devices for each DOE mailbox
cxl/pci: Create DOE auxiliary driver
cxl/pci: Find the DOE mailbox which supports CDAT
cxl/cdat: Introduce cxl_cdat_valid()
cxl/mem: Retry reading CDAT on failure
cxl/port: Parse out DSMAS data from CDAT table

Jonathan Cameron (3):
PCI: Add vendor ID for the PCI SIG
PCI: Create PCI library functions in support of DOE mailboxes.
cxl/mem: Read CDAT table

drivers/cxl/Kconfig | 14 +
drivers/cxl/Makefile | 2 +
drivers/cxl/cdat.h | 116 +++++++
drivers/cxl/core/memdev.c | 38 +++
drivers/cxl/cxl.h | 8 +
drivers/cxl/cxlmem.h | 28 ++
drivers/cxl/cxlpci.h | 35 ++
drivers/cxl/doe.c | 90 +++++
drivers/cxl/pci.c | 423 +++++++++++++++++++++++
drivers/cxl/port.c | 65 ++++
drivers/pci/Kconfig | 3 +
drivers/pci/Makefile | 1 +
drivers/pci/pci-doe.c | 607 ++++++++++++++++++++++++++++++++++
drivers/pci/probe.c | 2 +-
include/linux/pci-doe.h | 119 +++++++
include/linux/pci_ids.h | 1 +
include/uapi/linux/pci_regs.h | 30 +-
17 files changed, 1580 insertions(+), 2 deletions(-)
create mode 100644 drivers/cxl/cdat.h
create mode 100644 drivers/cxl/doe.c
create mode 100644 drivers/pci/pci-doe.c
create mode 100644 include/linux/pci-doe.h


base-commit: 4f497e0dd3530981f8c897d5d47ee880b62baefb
--
2.35.1


2022-04-16 01:41:51

by Ira Weiny

[permalink] [raw]
Subject: [PATCH V8 09/10] cxl/mem: Retry reading CDAT on failure

From: Ira Weiny <[email protected]>

The CDAT read may fail for a number of reasons but mainly it is possible
to get different parts of a valid state. The checksum in the CDAT table
protects against this.

Now that the cdat data is validated issue a retries if the CDAT read
fails. For now 5 retries are implemented.

Signed-off-by: Ira Weiny <[email protected]>

---
Changes from V6
Move to pci.c
Fix retries count
Change to 5 retries

Changes from V5:
New patch -- easy to push off or drop.
---
drivers/cxl/pci.c | 21 +++++++++++++++++----
1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index d7952156dd02..43cbc297079d 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -940,7 +940,7 @@ static void cxl_initialize_cdat_callbacks(struct cxl_dev_state *cxlds)
cxlds->cdat_read_table = cxl_cdat_read_table;
}

-static int read_cdat_data(struct cxl_dev_state *cxlds)
+static int __read_cdat_data(struct cxl_dev_state *cxlds)
{
struct device *dev = cxlds->dev;
size_t cdat_length;
@@ -962,6 +962,21 @@ static int read_cdat_data(struct cxl_dev_state *cxlds)
return ret;
}

+static void read_cdat_data(struct cxl_dev_state *cxlds)
+{
+ int retries = 5;
+ int rc;
+
+ while (retries--) {
+ rc = __read_cdat_data(cxlds);
+ if (!rc)
+ break;
+ dev_err(cxlds->dev,
+ "CDAT data read error rc=%d (retries %d)\n",
+ rc, retries);
+ }
+}
+
static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
struct cxl_register_map map;
@@ -1035,9 +1050,7 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
cxl_initialize_cdat_callbacks(cxlds);

/* Cache the data early to ensure is_visible() works */
- rc = read_cdat_data(cxlds);
- if (rc)
- dev_err(&pdev->dev, "CDAT data read error (%d)\n", rc);
+ read_cdat_data(cxlds);

cxl_dvsec_ranges(cxlds);

--
2.35.1

2022-04-16 01:47:16

by Ira Weiny

[permalink] [raw]
Subject: [PATCH V8 08/10] cxl/cdat: Introduce cxl_cdat_valid()

From: Ira Weiny <[email protected]>

The CDAT data is protected by a checksum and should be the proper
length.

Introduce cxl_cdat_valid() to validate the data. While at it check and
store the sequence number.

Signed-off-by: Ira Weiny <[email protected]>

---
Changes from V6
Change name to cxl_cdat_valid() as this validates all the CDAT
data not just the header
Add error and debug prints

Changes from V5
New patch, split out
Update cdat_hdr_valid()
Remove revision and cs field parsing
There is no point in these
Add seq check and debug print.
---
drivers/cxl/cdat.h | 2 ++
drivers/cxl/pci.c | 36 ++++++++++++++++++++++++++++++++++++
2 files changed, 38 insertions(+)

diff --git a/drivers/cxl/cdat.h b/drivers/cxl/cdat.h
index 4722b6bbbaf0..a7725d26f2d2 100644
--- a/drivers/cxl/cdat.h
+++ b/drivers/cxl/cdat.h
@@ -88,10 +88,12 @@
*
* @table: cache of CDAT table
* @length: length of cached CDAT table
+ * @seq: Last read Sequence number of the CDAT table
*/
struct cxl_cdat {
void *table;
size_t length;
+ u32 seq;
};

#endif /* !__CXL_CDAT_H__ */
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index aecb327911a0..d7952156dd02 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -781,6 +781,40 @@ static int cxl_setup_doe_devices(struct cxl_dev_state *cxlds)
return 0;
}

+static bool cxl_cdat_valid(struct device *dev, struct cxl_cdat *cdat)
+{
+ u32 *table = cdat->table;
+ u8 *data8 = cdat->table;
+ u32 length, seq;
+ u8 check;
+ int i;
+
+ length = FIELD_GET(CDAT_HEADER_DW0_LENGTH, table[0]);
+ if ((length < CDAT_HEADER_LENGTH_BYTES) || (length > cdat->length)) {
+ dev_err(dev, "Invalid length %u (%lu-%lu)\n", length,
+ CDAT_HEADER_LENGTH_BYTES, cdat->length);
+ return false;
+ }
+
+ for (check = 0, i = 0; i < length; i++)
+ check += data8[i];
+
+ dev_dbg(dev, "CDAT length %u CS %u\n", length, check);
+ if (check != 0) {
+ dev_err(dev, "Invalid checksum %u\n", check);
+ return false;
+ }
+
+ seq = FIELD_GET(CDAT_HEADER_DW3_SEQUENCE, table[3]);
+ /* Store the sequence for now. */
+ if (cdat->seq != seq) {
+ dev_info(dev, "CDAT seq change %x -> %x\n", cdat->seq, seq);
+ cdat->seq = seq;
+ }
+
+ return true;
+}
+
#define CDAT_DOE_REQ(entry_handle) \
(FIELD_PREP(CXL_DOE_TABLE_ACCESS_REQ_CODE, \
CXL_DOE_TABLE_ACCESS_REQ_CODE_READ) | \
@@ -892,6 +926,8 @@ static int cxl_cdat_read_table(struct cxl_dev_state *cxlds,

release_driver:
cxl_pci_doe_put_drv(doe_dev);
+ if (!rc && !cxl_cdat_valid(cxlds->dev, cdat))
+ return -EIO;
return rc;
}

--
2.35.1

2022-04-16 02:02:24

by Ira Weiny

[permalink] [raw]
Subject: [PATCH V8 06/10] cxl/pci: Find the DOE mailbox which supports CDAT

From: Ira Weiny <[email protected]>

Each CXL device may have multiple DOE mailbox capabilities and each
mailbox may support multiple protocols.

Search the DOE auxiliary devices for one which supports the CDAT
protocol. Cache that device to be used for future queries.

Signed-off-by: Ira Weiny <[email protected]>

---
Changes from V7
Minor code clean ups

Changes from V6
Adjust for aux devices being a CXL only concept
Update commit msg.
Ensure devices iterated by auxiliary_find_device() are checked
to be DOE devices prior to checking for the CDAT
protocol
From Ben
Ensure reference from auxiliary_find_device() is dropped
---
drivers/cxl/cxl.h | 2 ++
drivers/cxl/cxlmem.h | 2 ++
drivers/cxl/pci.c | 76 +++++++++++++++++++++++++++++++++++++++++++-
3 files changed, 79 insertions(+), 1 deletion(-)

diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 990b6670222e..50817fd2c912 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -90,6 +90,8 @@ static inline int cxl_hdm_decoder_count(u32 cap_hdr)
#define CXLDEV_MBOX_BG_CMD_STATUS_OFFSET 0x18
#define CXLDEV_MBOX_PAYLOAD_OFFSET 0x20

+#define CXL_DOE_PROTOCOL_TABLE_ACCESS 2
+
/*
* Using struct_group() allows for per register-block-type helper routines,
* without requiring block-type agnostic code to include the prefix.
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 7235d2f976e5..0dc6988afb30 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -168,6 +168,7 @@ struct cxl_endpoint_dvsec_info {
* Currently only memory devices are represented.
*
* @dev: The device associated with this CXL state
+ * @cdat_doe: Auxiliary DOE device capabile of reading CDAT
* @regs: Parsed register blocks
* @cxl_dvsec: Offset to the PCIe device DVSEC
* @payload_size: Size of space for payload
@@ -200,6 +201,7 @@ struct cxl_endpoint_dvsec_info {
struct cxl_dev_state {
struct device *dev;

+ struct cxl_doe_dev *cdat_doe;
struct cxl_regs regs;
int cxl_dvsec;

diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 0dec1f1a3f38..622cac925262 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -706,6 +706,80 @@ static int cxl_pci_create_doe_devices(struct pci_dev *pdev)
return 0;
}

+static bool cxl_doe_dev_supports_prot(struct cxl_doe_dev *doe_dev, u16 vid, u16 pid)
+{
+ struct cxl_doe_drv_state *doe_ds;
+ bool ret;
+
+ doe_ds = cxl_pci_doe_get_drv(doe_dev);
+ if (!doe_ds) {
+ cxl_pci_doe_put_drv(doe_dev);
+ return false;
+ }
+ ret = pci_doe_supports_prot(doe_ds->doe_mb, vid, pid);
+ cxl_pci_doe_put_drv(doe_dev);
+ return ret;
+}
+
+static int cxl_match_cdat_doe_device(struct device *dev, const void *data)
+{
+ const struct cxl_dev_state *cxlds = data;
+ struct auxiliary_device *adev = to_auxiliary_dev(dev);
+ struct cxl_doe_dev *doe_dev;
+
+ /* Ensure this is a DOE device */
+ if (strcmp(DOE_DEV_NAME, adev->name))
+ return 0;
+
+ /* Determine if this auxiliary device belongs to the cxlds */
+ if (cxlds->dev != dev->parent)
+ return 0;
+
+ doe_dev = container_of(adev, struct cxl_doe_dev, adev);
+
+ /* If it is one of ours check for the CDAT protocol */
+ if (!cxl_doe_dev_supports_prot(doe_dev, PCI_DVSEC_VENDOR_ID_CXL,
+ CXL_DOE_PROTOCOL_TABLE_ACCESS))
+ return 0;
+
+ return 1;
+}
+
+static void drop_cdat_doe_ref(void *data)
+{
+ struct cxl_doe_dev *cdat_doe = data;
+
+ put_device(&cdat_doe->adev.dev);
+}
+
+static int cxl_setup_doe_devices(struct cxl_dev_state *cxlds)
+{
+ struct device *dev = cxlds->dev;
+ struct pci_dev *pdev = to_pci_dev(dev);
+ struct auxiliary_device *adev;
+ int rc;
+
+ rc = cxl_pci_create_doe_devices(pdev);
+ if (rc)
+ return rc;
+
+ adev = auxiliary_find_device(NULL, cxlds, &cxl_match_cdat_doe_device);
+
+ if (adev) {
+ struct cxl_doe_dev *doe_dev = container_of(adev,
+ struct cxl_doe_dev,
+ adev);
+
+ /* auxiliary_find_device() takes the reference */
+ rc = devm_add_action_or_reset(dev, drop_cdat_doe_ref, doe_dev);
+ if (rc)
+ return rc;
+ cxlds->cdat_doe = doe_dev;
+ }
+
+ return 0;
+}
+
static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
struct cxl_register_map map;
@@ -772,7 +846,7 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
if (rc)
return rc;

- rc = cxl_pci_create_doe_devices(pdev);
+ rc = cxl_setup_doe_devices(cxlds);
if (rc)
return rc;

--
2.35.1

2022-04-16 02:17:03

by Ira Weiny

[permalink] [raw]
Subject: [PATCH V8 04/10] cxl/pci: Create auxiliary devices for each DOE mailbox

From: Ira Weiny <[email protected]>

CXL kernel drivers optionally need to access DOE mailbox capabilities.
Access to mailboxes for things such as CDAT, SPDM, and IDE are needed by
the kernel while other access is designed towards user space usage. An
example of this is for CXL Compliance Testing (see CXL 2.0 14.16.4
Compliance Mode DOE) which offers a mechanism to set different test
modes for a device.

There is no anticipated need for the kernel to share an individual
mailbox with user space. Thus developing an interface to marshal access
between the kernel and user space for a single mailbox is unnecessary
overhead. However, having the kernel relinquish some mailboxes to be
controlled by user space is a reasonable compromise to share access to
the device.

The auxiliary bus provides an elegant solution for this. Each DOE
capability is given its own auxiliary device. This device is controlled
by a kernel driver by default which restricts access to the mailbox.
Unbinding the driver from a single auxiliary device (DOE mailbox
capability) frees the mailbox for user space access. This architecture
also allows a clear picture on which mailboxes are kernel controlled vs
not.

Iterate each DOE mailbox capability and create auxiliary bus devices.
Follow on patches will define a driver for the newly created devices.

sysfs shows the devices.

$ ls -l /sys/bus/auxiliary/devices/
total 0
lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.0 -> ../../../devices/pci0000:bf/0000:bf:00.0/0000:c0:00.0/cxl_pci.doe.0
lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.1 -> ../../../devices/pci0000:bf/0000:bf:01.0/0000:c1:00.0/cxl_pci.doe.1
lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.2 -> ../../../devices/pci0000:35/0000:35:00.0/0000:36:00.0/cxl_pci.doe.2
lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.3 -> ../../../devices/pci0000:35/0000:35:01.0/0000:37:00.0/cxl_pci.doe.3
lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.4 -> ../../../devices/pci0000:35/0000:35:00.0/0000:36:00.0/cxl_pci.doe.4
lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.5 -> ../../../devices/pci0000:bf/0000:bf:00.0/0000:c0:00.0/cxl_pci.doe.5
lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.6 -> ../../../devices/pci0000:35/0000:35:01.0/0000:37:00.0/cxl_pci.doe.6
lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.7 -> ../../../devices/pci0000:bf/0000:bf:01.0/0000:c1:00.0/cxl_pci.doe.7

Signed-off-by: Ira Weiny <[email protected]>

---
Changes from V7:
Minor code clean ups
Rebased on cxl-pending

Changes from V6:
Move all the auxiliary device stuff to the CXL layer

Changes from V5:
Split the CXL specific stuff off from the PCI DOE create
auxiliary device code.
---
drivers/cxl/Kconfig | 1 +
drivers/cxl/cxlpci.h | 21 +++++++
drivers/cxl/pci.c | 127 +++++++++++++++++++++++++++++++++++++++++++
3 files changed, 149 insertions(+)

diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
index f64e3984689f..ac0f5ca95431 100644
--- a/drivers/cxl/Kconfig
+++ b/drivers/cxl/Kconfig
@@ -16,6 +16,7 @@ if CXL_BUS
config CXL_PCI
tristate "PCI manageability"
default CXL_BUS
+ select AUXILIARY_BUS
help
The CXL specification defines a "CXL memory device" sub-class in the
PCI "memory controller" base class of devices. Device's identified by
diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
index 329e7ea3f36a..2ad8715173ce 100644
--- a/drivers/cxl/cxlpci.h
+++ b/drivers/cxl/cxlpci.h
@@ -2,6 +2,7 @@
/* Copyright(c) 2020 Intel Corporation. All rights reserved. */
#ifndef __CXL_PCI_H__
#define __CXL_PCI_H__
+#include <linux/auxiliary_bus.h>
#include <linux/pci.h>
#include "cxl.h"

@@ -72,4 +73,24 @@ static inline resource_size_t cxl_regmap_to_base(struct pci_dev *pdev,
}

int devm_cxl_port_enumerate_dports(struct cxl_port *port);
+
+/**
+ * struct cxl_doe_dev - CXL DOE auxiliary bus device
+ *
+ * @adev: Auxiliary bus device
+ * @pdev: PCI device this belongs to
+ * @cap_offset: Capability offset
+ * @use_irq: Set if IRQs are to be used with this mailbox
+ *
+ * This represents a single DOE mailbox device. CXL devices should create this
+ * device and register it on the Auxiliary bus for the CXL DOE driver to drive.
+ */
+struct cxl_doe_dev {
+ struct auxiliary_device adev;
+ struct pci_dev *pdev;
+ int cap_offset;
+ bool use_irq;
+};
+#define DOE_DEV_NAME "doe"
+
#endif /* __CXL_PCI_H__ */
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index e7ab9a34d718..41a6f3eb0a5c 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -8,6 +8,7 @@
#include <linux/mutex.h>
#include <linux/list.h>
#include <linux/pci.h>
+#include <linux/pci-doe.h>
#include <linux/io.h>
#include "cxlmem.h"
#include "cxlpci.h"
@@ -564,6 +565,128 @@ static void cxl_dvsec_ranges(struct cxl_dev_state *cxlds)
info->ranges = __cxl_dvsec_ranges(cxlds, info);
}

+static void cxl_pci_free_irq_vectors(void *data)
+{
+ pci_free_irq_vectors(data);
+}
+
+static DEFINE_IDA(pci_doe_adev_ida);
+
+static void cxl_pci_doe_dev_release(struct device *dev)
+{
+ struct auxiliary_device *adev = container_of(dev,
+ struct auxiliary_device,
+ dev);
+ struct cxl_doe_dev *doe_dev = container_of(adev, struct cxl_doe_dev,
+ adev);
+
+ ida_free(&pci_doe_adev_ida, adev->id);
+ kfree(doe_dev);
+}
+
+static void cxl_pci_doe_destroy_device(void *ad)
+{
+ auxiliary_device_delete(ad);
+ auxiliary_device_uninit(ad);
+}
+
+/**
+ * cxl_pci_create_doe_devices - Create auxiliary bus DOE devices for all DOE
+ * mailboxes found
+ *
+ * @pci_dev: The PCI device to scan for DOE mailboxes
+ *
+ * There is no coresponding destroy of these devices. This function associates
+ * the DOE auxiliary devices created with the pci_dev passed in. That
+ * association is device managed (devm_*) such that the DOE auxiliary device
+ * lifetime is always less than or equal to the lifetime of the pci_dev.
+ *
+ * RETURNS: 0 on success -ERRNO on failure.
+ */
+static int cxl_pci_create_doe_devices(struct pci_dev *pdev)
+{
+ struct device *dev = &pdev->dev;
+ bool use_irq = true;
+ int irqs = 0;
+ u16 off = 0;
+ int rc;
+
+ pci_doe_for_each_off(pdev, off)
+ irqs++;
+ pci_info(pdev, "Found %d DOE mailbox's\n", irqs);
+
+ /*
+ * Allocate enough vectors for the DOE's
+ */
+ rc = pci_alloc_irq_vectors(pdev, irqs, irqs, PCI_IRQ_MSI |
+ PCI_IRQ_MSIX);
+ if (rc != irqs) {
+ pci_err(pdev,
+ "Not enough interrupts for all the DOEs; use polling\n");
+ use_irq = false;
+ /* Some got allocated; clean them up */
+ if (rc > 0)
+ cxl_pci_free_irq_vectors(pdev);
+ } else {
+ /*
+ * Enabling bus mastering is require for MSI/MSIx. It could be
+ * done later within the DOE initialization, but as it
+ * potentially has other impacts keep it here when setting up
+ * the IRQ's.
+ */
+ pci_set_master(pdev);
+ rc = devm_add_action_or_reset(dev,
+ cxl_pci_free_irq_vectors,
+ pdev);
+ if (rc)
+ return rc;
+ }
+
+ pci_doe_for_each_off(pdev, off) {
+ struct auxiliary_device *adev;
+ struct cxl_doe_dev *new_dev;
+ int id;
+
+ new_dev = kzalloc(sizeof(*new_dev), GFP_KERNEL);
+ if (!new_dev)
+ return -ENOMEM;
+
+ new_dev->pdev = pdev;
+ new_dev->cap_offset = off;
+ new_dev->use_irq = use_irq;
+
+ /* Set up struct auxiliary_device */
+ adev = &new_dev->adev;
+ id = ida_alloc(&pci_doe_adev_ida, GFP_KERNEL);
+ if (id < 0) {
+ kfree(new_dev);
+ return -ENOMEM;
+ }
+
+ adev->id = id;
+ adev->name = DOE_DEV_NAME;
+ adev->dev.release = cxl_pci_doe_dev_release;
+ adev->dev.parent = dev;
+
+ if (auxiliary_device_init(adev)) {
+ cxl_pci_doe_dev_release(&adev->dev);
+ return -EIO;
+ }
+
+ if (auxiliary_device_add(adev)) {
+ auxiliary_device_uninit(adev);
+ return -EIO;
+ }
+
+ rc = devm_add_action_or_reset(dev, cxl_pci_doe_destroy_device,
+ adev);
+ if (rc)
+ return rc;
+ }
+
+ return 0;
+}
+
static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
struct cxl_register_map map;
@@ -630,6 +753,10 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
if (rc)
return rc;

+ rc = cxl_pci_create_doe_devices(pdev);
+ if (rc)
+ return rc;
+
cxl_dvsec_ranges(cxlds);

cxlmd = devm_cxl_add_memdev(cxlds);
--
2.35.1

2022-04-16 02:24:34

by Ira Weiny

[permalink] [raw]
Subject: [PATCH V8 10/10] cxl/port: Parse out DSMAS data from CDAT table

From: Ira Weiny <[email protected]>

CXL Ports with memory devices attached need the information from the
Device Scoped Memory Affinity Structure (DSMAS). This information is
contained within the CDAT table buffer which is previously read and
cached in the device state.

If CDAT data is available, parse and cache DSMAS data from the table.
Store this data in unmarshaled struct dsmas data structures for ease of
use.

Signed-off-by: Ira Weiny <[email protected]>

---
Changes from V7
Rebased on cxl-pending

Changes from V6
Move to port.c
It is not an error if no DSMAS data is found

Changes from V5
Fix up sparse warnings
Split out cdat_hdr_valid()
Update cdat_hdr_valid()
Remove revision and cs field parsing
There is no point in these
Add seq check and debug print.
From Jonathan
Add spaces around '+' and '/'
use devm_krealloc() for dmas_ary
---
drivers/cxl/cdat.h | 17 ++++++++++++
drivers/cxl/cxl.h | 6 +++++
drivers/cxl/port.c | 65 ++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 88 insertions(+)

diff --git a/drivers/cxl/cdat.h b/drivers/cxl/cdat.h
index a7725d26f2d2..66706b238bc9 100644
--- a/drivers/cxl/cdat.h
+++ b/drivers/cxl/cdat.h
@@ -83,6 +83,23 @@
#define CDAT_SSLBIS_ENTRY_PORT_Y(entry, i) (((entry)[4 + (i) * 2] & 0xffff0000) >> 16)
#define CDAT_SSLBIS_ENTRY_LAT_OR_BW(entry, i) ((entry)[4 + (i) * 2 + 1] & 0x0000ffff)

+/**
+ * struct cxl_dsmas - host unmarshaled version of DSMAS data
+ *
+ * As defined in the Coherent Device Attribute Table (CDAT) specification this
+ * represents a single DSMAS entry in that table.
+ *
+ * @dpa_base: The lowest Device Physical Address associated with this DSMAD
+ * @length: Length in bytes of this DSMAD
+ * @non_volatile: If set, the memory region represents Non-Volatile memory
+ */
+struct cxl_dsmas {
+ u64 dpa_base;
+ u64 length;
+ /* Flags */
+ u8 non_volatile:1;
+};
+
/**
* struct cxl_cdat - CXL CDAT data
*
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 50817fd2c912..80fd39769a65 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -9,6 +9,8 @@
#include <linux/bitops.h>
#include <linux/io.h>

+#include "cdat.h"
+
/**
* DOC: cxl objects
*
@@ -269,6 +271,8 @@ struct cxl_nvdimm {
* @component_reg_phys: component register capability base address (optional)
* @dead: last ep has been removed, force port re-creation
* @depth: How deep this port is relative to the root. depth 0 is the root.
+ * @dsmas_ary: Array of DSMAS entries as parsed from the CDAT table
+ * @nr_dsmas: Number of entries in dsmas_ary
*/
struct cxl_port {
struct device dev;
@@ -280,6 +284,8 @@ struct cxl_port {
resource_size_t component_reg_phys;
bool dead;
unsigned int depth;
+ struct cxl_dsmas *dsmas_ary;
+ int nr_dsmas;
};

/**
diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
index d420da5fc39c..2288432a27cd 100644
--- a/drivers/cxl/port.c
+++ b/drivers/cxl/port.c
@@ -30,6 +30,70 @@ static void schedule_detach(void *cxlmd)
schedule_cxl_memdev_detach(cxlmd);
}

+static void parse_dsmas(struct cxl_port *port, struct cxl_dev_state *cxlds)
+{
+ struct device *dev = &port->dev;
+ struct cxl_dsmas *dsmas_ary = NULL;
+ u32 *data = cxlds->cdat.table;
+ int bytes_left = cxlds->cdat.length;
+ int nr_dsmas = 0;
+
+ if (!data) {
+ dev_info(dev, "No CDAT data available for DSMAS\n");
+ return;
+ }
+
+ /* Skip header */
+ data += CDAT_HEADER_LENGTH_DW;
+ bytes_left -= CDAT_HEADER_LENGTH_BYTES;
+
+ while (bytes_left > 0) {
+ u32 *cur_rec = data;
+ u8 type = FIELD_GET(CDAT_STRUCTURE_DW0_TYPE, cur_rec[0]);
+ u16 length = FIELD_GET(CDAT_STRUCTURE_DW0_LENGTH, cur_rec[0]);
+
+ if (type == CDAT_STRUCTURE_DW0_TYPE_DSMAS) {
+ struct cxl_dsmas *new_ary;
+ u8 flags;
+
+ new_ary = devm_krealloc(dev, dsmas_ary,
+ sizeof(*dsmas_ary) * (nr_dsmas + 1),
+ GFP_KERNEL);
+ if (!new_ary) {
+ dev_err(dev,
+ "Failed to allocate memory for DSMAS data (nr_dsmas %d)\n",
+ nr_dsmas);
+ return;
+ }
+ dsmas_ary = new_ary;
+
+ flags = FIELD_GET(CDAT_DSMAS_DW1_FLAGS, cur_rec[1]);
+
+ dsmas_ary[nr_dsmas].dpa_base = CDAT_DSMAS_DPA_OFFSET(cur_rec);
+ dsmas_ary[nr_dsmas].length = CDAT_DSMAS_DPA_LEN(cur_rec);
+ dsmas_ary[nr_dsmas].non_volatile = CDAT_DSMAS_NON_VOLATILE(flags);
+
+ dev_dbg(dev, "DSMAS %d: %llx:%llx %s\n",
+ nr_dsmas,
+ dsmas_ary[nr_dsmas].dpa_base,
+ dsmas_ary[nr_dsmas].dpa_base +
+ dsmas_ary[nr_dsmas].length,
+ (dsmas_ary[nr_dsmas].non_volatile ?
+ "Persistent" : "Volatile")
+ );
+
+ nr_dsmas++;
+ }
+
+ data += (length / sizeof(u32));
+ bytes_left -= length;
+ }
+
+ dev_dbg(dev, "Found %d DSMAS entries\n", nr_dsmas);
+ port->dsmas_ary = dsmas_ary;
+ port->nr_dsmas = nr_dsmas;
+}
+
static int cxl_port_probe(struct device *dev)
{
struct cxl_port *port = to_cxl_port(dev);
@@ -43,6 +107,7 @@ static int cxl_port_probe(struct device *dev)
rc = devm_add_action_or_reset(dev, schedule_detach, cxlmd);
if (rc)
return rc;
+ parse_dsmas(port, cxlmd->cxlds);
} else {
rc = devm_cxl_port_enumerate_dports(port);
if (rc < 0)
--
2.35.1

2022-04-16 02:43:55

by Ira Weiny

[permalink] [raw]
Subject: [PATCH V8 05/10] cxl/pci: Create DOE auxiliary driver

From: Ira Weiny <[email protected]>

CXL kernel drivers optionally need to access DOE mailbox capabilities.
Access to mailboxes for things such as CDAT, SPDM, and IDE are needed by
the kernel while other access is designed towards user space usage. An
example of this is for CXL Compliance Testing (see CXL 2.0 14.16.4
Compliance Mode DOE) which offers a mechanism to set different test
modes for a device.

There is no anticipated need for the kernel to share an individual
mailbox with user space. Thus developing an interface to marshal access
between the kernel and user space for a single mailbox is unnecessary
overhead. However, having the kernel relinquish some mailboxes to be
controlled by user space is a reasonable compromise to share access to
the device.

The auxiliary bus provides an elegant solution for this. Each DOE
capability is given its own auxiliary device. This device is controlled
by a kernel driver by default which restricts access to the mailbox.
Unbinding the driver from a single auxiliary device (DOE mailbox
capability) frees the mailbox for user space access. This architecture
also allows a clear picture on which mailboxes are kernel controlled vs
not.

Create a driver for the DOE auxiliary devices. The driver uses the PCI
DOE core to manage the mailbox.

User space must be prevented from unbinding the driver state when the
DOE auxiliary driver is being accessed by the kernel. Add a read write
lock to the DOE auxiliary device to protect the driver data portion.

Finally, flag the driver module to be preloaded by device creation to
ensure the driver is attached when iterating the DOE capabilities.

User space access can be obtained by unbinding the driver from that
device. For example:

$ ls -l /sys/bus/auxiliary/drivers
total 0
drwxr-xr-x 2 root root 0 Mar 24 10:45 cxl_doe.cxl_doe_drv

$ ls -l /sys/bus/auxiliary/drivers/cxl_doe.cxl_doe_drv/cxl_pci*
lrwxrwxrwx 1 root root 0 Mar 24 10:53 /sys/bus/auxiliary/drivers/cxl_doe.cxl_doe_drv/cxl_pci.doe.0 -> ../../../../devices/pci0000:bf/0000:bf:00.0/0000:c0:00.0/cxl_pci.doe.0
lrwxrwxrwx 1 root root 0 Mar 24 10:53 /sys/bus/auxiliary/drivers/cxl_doe.cxl_doe_drv/cxl_pci.doe.1 -> ../../../../devices/pci0000:bf/0000:bf:01.0/0000:c1:00.0/cxl_pci.doe.1
lrwxrwxrwx 1 root root 0 Mar 24 10:53 /sys/bus/auxiliary/drivers/cxl_doe.cxl_doe_drv/cxl_pci.doe.2 -> ../../../../devices/pci0000:35/0000:35:00.0/0000:36:00.0/cxl_pci.doe.2
lrwxrwxrwx 1 root root 0 Mar 24 10:53 /sys/bus/auxiliary/drivers/cxl_doe.cxl_doe_drv/cxl_pci.doe.3 -> ../../../../devices/pci0000:35/0000:35:01.0/0000:37:00.0/cxl_pci.doe.3
lrwxrwxrwx 1 root root 0 Mar 24 10:53 /sys/bus/auxiliary/drivers/cxl_doe.cxl_doe_drv/cxl_pci.doe.3 -> ../../../../devices/pci0000:35/0000:35:01.0/0000:37:00.0/cxl_pci.doe.4
lrwxrwxrwx 1 root root 0 Mar 24 10:53 /sys/bus/auxiliary/drivers/cxl_doe.cxl_doe_drv/cxl_pci.doe.5 -> ../../../../devices/pci0000:bf/0000:bf:00.0/0000:c0:00.0/cxl_pci.doe.5
lrwxrwxrwx 1 root root 0 Mar 24 10:53 /sys/bus/auxiliary/drivers/cxl_doe.cxl_doe_drv/cxl_pci.doe.6 -> ../../../../devices/pci0000:35/0000:35:01.0/0000:37:00.0/cxl_pci.doe.6
lrwxrwxrwx 1 root root 0 Mar 24 10:53 /sys/bus/auxiliary/drivers/cxl_doe.cxl_doe_drv/cxl_pci.doe.7 -> ../../../../devices/pci0000:bf/0000:bf:01.0/0000:c1:00.0/cxl_pci.doe.7

$ echo "cxl_pci.doe.4" > /sys/bus/auxiliary/drivers/cxl_doe.cxl_doe_drv/unbind

$ ls -l /sys/bus/auxiliary/drivers/cxl_doe.cxl_doe_drv/cxl_pci*
lrwxrwxrwx 1 root root 0 Mar 24 10:53 /sys/bus/auxiliary/drivers/cxl_doe.cxl_doe_drv/cxl_pci.doe.0 -> ../../../../devices/pci0000:bf/0000:bf:00.0/0000:c0:00.0/cxl_pci.doe.0
lrwxrwxrwx 1 root root 0 Mar 24 10:53 /sys/bus/auxiliary/drivers/cxl_doe.cxl_doe_drv/cxl_pci.doe.1 -> ../../../../devices/pci0000:bf/0000:bf:01.0/0000:c1:00.0/cxl_pci.doe.1
lrwxrwxrwx 1 root root 0 Mar 24 10:53 /sys/bus/auxiliary/drivers/cxl_doe.cxl_doe_drv/cxl_pci.doe.2 -> ../../../../devices/pci0000:35/0000:35:00.0/0000:36:00.0/cxl_pci.doe.2
lrwxrwxrwx 1 root root 0 Mar 24 10:53 /sys/bus/auxiliary/drivers/cxl_doe.cxl_doe_drv/cxl_pci.doe.3 -> ../../../../devices/pci0000:35/0000:35:01.0/0000:37:00.0/cxl_pci.doe.3
lrwxrwxrwx 1 root root 0 Mar 24 10:53 /sys/bus/auxiliary/drivers/cxl_doe.cxl_doe_drv/cxl_pci.doe.5 -> ../../../../devices/pci0000:bf/0000:bf:00.0/0000:c0:00.0/cxl_pci.doe.5
lrwxrwxrwx 1 root root 0 Mar 24 10:53 /sys/bus/auxiliary/drivers/cxl_doe.cxl_doe_drv/cxl_pci.doe.6 -> ../../../../devices/pci0000:35/0000:35:01.0/0000:37:00.0/cxl_pci.doe.6
lrwxrwxrwx 1 root root 0 Mar 24 10:53 /sys/bus/auxiliary/drivers/cxl_doe.cxl_doe_drv/cxl_pci.doe.7 -> ../../../../devices/pci0000:bf/0000:bf:01.0/0000:c1:00.0/cxl_pci.doe.7

Signed-off-by: Ira Weiny <[email protected]>

---
Changes from V7:
Now need to select PCI_DOE
Change MODULE_LICENSE to 'GPL' instead of old 'GPL v2'

Changes from V6:
The CXL layer now contains the driver for these auxiliary
devices.

Changes from V5:
Split the CXL specific stuff off from the PCI DOE create
auxiliary device code.
---
drivers/cxl/Kconfig | 13 +++++
drivers/cxl/Makefile | 2 +
drivers/cxl/cxlpci.h | 13 +++++
drivers/cxl/doe.c | 90 +++++++++++++++++++++++++++++++++++
drivers/cxl/pci.c | 20 ++++++++
include/uapi/linux/pci_regs.h | 1 +
6 files changed, 139 insertions(+)
create mode 100644 drivers/cxl/doe.c

diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
index ac0f5ca95431..82f3908fa5cc 100644
--- a/drivers/cxl/Kconfig
+++ b/drivers/cxl/Kconfig
@@ -103,4 +103,17 @@ config CXL_SUSPEND
def_bool y
depends on SUSPEND && CXL_MEM

+config CXL_PCI_DOE
+ tristate "CXL PCI Data Object Exchange (DOE) support"
+ depends on CXL_PCI
+ default CXL_BUS
+ select PCI_DOE
+ help
+ Driver for DOE auxiliary devices.
+
+ The DOE capabilities provides a simple mailbox in PCI config space that
+ is used for a number of different protocols useful to CXL. The CXL PCI
+ subsystem creates auxiliary devices for each DOE mailbox capability
+ found. This driver is required for the kernel to use these devices.
+
endif
diff --git a/drivers/cxl/Makefile b/drivers/cxl/Makefile
index a78270794150..c71b7a6345fb 100644
--- a/drivers/cxl/Makefile
+++ b/drivers/cxl/Makefile
@@ -1,6 +1,7 @@
# SPDX-License-Identifier: GPL-2.0
obj-y += core/
obj-$(CONFIG_CXL_PCI) += cxl_pci.o
+obj-$(CONFIG_CXL_PCI_DOE) += cxl_doe.o
obj-$(CONFIG_CXL_MEM) += cxl_mem.o
obj-$(CONFIG_CXL_ACPI) += cxl_acpi.o
obj-$(CONFIG_CXL_PMEM) += cxl_pmem.o
@@ -8,6 +9,7 @@ obj-$(CONFIG_CXL_PORT) += cxl_port.o

cxl_mem-y := mem.o
cxl_pci-y := pci.o
+cxl_doe-y := doe.o
cxl_acpi-y := acpi.o
cxl_pmem-y := pmem.o
cxl_port-y := port.o
diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
index 2ad8715173ce..821fe05e8289 100644
--- a/drivers/cxl/cxlpci.h
+++ b/drivers/cxl/cxlpci.h
@@ -79,6 +79,7 @@ int devm_cxl_port_enumerate_dports(struct cxl_port *port);
*
* @adev: Auxiliary bus device
* @pdev: PCI device this belongs to
+ * @driver_access: Lock the driver during access
* @cap_offset: Capability offset
* @use_irq: Set if IRQs are to be used with this mailbox
*
@@ -88,9 +89,21 @@ int devm_cxl_port_enumerate_dports(struct cxl_port *port);
struct cxl_doe_dev {
struct auxiliary_device adev;
struct pci_dev *pdev;
+ struct rw_semaphore driver_access;
int cap_offset;
bool use_irq;
};
#define DOE_DEV_NAME "doe"

+/**
+ * struct cxl_doe_drv_state - state of the DOE Aux driver
+ *
+ * @doe_dev: The Auxiliary DOE device
+ * @doe_mb: PCI DOE mailbox state
+ */
+struct cxl_doe_drv_state {
+ struct cxl_doe_dev *doe_dev;
+ struct pci_doe_mb *doe_mb;
+};
+
#endif /* __CXL_PCI_H__ */
diff --git a/drivers/cxl/doe.c b/drivers/cxl/doe.c
new file mode 100644
index 000000000000..1d3a24a77002
--- /dev/null
+++ b/drivers/cxl/doe.c
@@ -0,0 +1,90 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright(c) 2022 Intel Corporation. All rights reserved. */
+
+#include <linux/pci.h>
+#include <linux/pci-doe.h>
+
+#include "cxlpci.h"
+
+static void doe_destroy_mb(void *ds)
+{
+ struct cxl_doe_drv_state *doe_ds = ds;
+
+ pci_doe_destroy_mb(doe_ds->doe_mb);
+}
+
+static int cxl_pci_doe_probe(struct auxiliary_device *aux_dev,
+ const struct auxiliary_device_id *id)
+{
+ struct cxl_doe_dev *doe_dev = container_of(aux_dev, struct cxl_doe_dev,
+ adev);
+ struct device *dev = &aux_dev->dev;
+ struct cxl_doe_drv_state *doe_ds;
+ struct pci_doe_mb *doe_mb;
+
+ doe_ds = devm_kzalloc(dev, sizeof(*doe_ds), GFP_KERNEL);
+ if (!doe_ds)
+ return -ENOMEM;
+
+ doe_mb = pci_doe_create_mb(doe_dev->pdev, doe_dev->cap_offset,
+ doe_dev->use_irq);
+ if (IS_ERR(doe_mb)) {
+ dev_err(dev, "Failed to create the DOE mailbox state machine\n");
+ return PTR_ERR(doe_mb);
+ }
+
+ doe_ds->doe_mb = doe_mb;
+ devm_add_action_or_reset(dev, doe_destroy_mb, doe_ds);
+
+ down_write(&doe_dev->driver_access);
+ auxiliary_set_drvdata(aux_dev, doe_ds);
+ up_write(&doe_dev->driver_access);
+
+ return 0;
+}
+
+static void cxl_pci_doe_remove(struct auxiliary_device *aux_dev)
+{
+ struct cxl_doe_dev *doe_dev = container_of(aux_dev, struct cxl_doe_dev,
+ adev);
+
+ down_write(&doe_dev->driver_access);
+ auxiliary_set_drvdata(aux_dev, NULL);
+ up_write(&doe_dev->driver_access);
+}
+
+static const struct auxiliary_device_id cxl_pci_doe_auxiliary_id_table[] = {
+ {.name = "cxl_pci." DOE_DEV_NAME, },
+ {},
+};
+
+MODULE_DEVICE_TABLE(auxiliary, cxl_pci_doe_auxiliary_id_table);
+
+struct auxiliary_driver cxl_pci_doe_auxiliary_drv = {
+ .name = "cxl_doe_drv",
+ .id_table = cxl_pci_doe_auxiliary_id_table,
+ .probe = cxl_pci_doe_probe,
+ .remove = cxl_pci_doe_remove,
+};
+
+static int __init cxl_pci_doe_init_module(void)
+{
+ int ret;
+
+ ret = auxiliary_driver_register(&cxl_pci_doe_auxiliary_drv);
+ if (ret) {
+ pr_err("Failed cxl_pci_doe auxiliary_driver_register() ret=%d\n",
+ ret);
+ }
+
+ return ret;
+}
+
+static void __exit cxl_pci_doe_exit_module(void)
+{
+ auxiliary_driver_unregister(&cxl_pci_doe_auxiliary_drv);
+}
+
+module_init(cxl_pci_doe_init_module);
+module_exit(cxl_pci_doe_exit_module);
+MODULE_LICENSE("GPL");
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 41a6f3eb0a5c..0dec1f1a3f38 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -590,6 +590,17 @@ static void cxl_pci_doe_destroy_device(void *ad)
auxiliary_device_uninit(ad);
}

+static struct cxl_doe_drv_state *cxl_pci_doe_get_drv(struct cxl_doe_dev *doe_dev)
+{
+ down_read(&doe_dev->driver_access);
+ return auxiliary_get_drvdata(&doe_dev->adev);
+}
+
+static void cxl_pci_doe_put_drv(struct cxl_doe_dev *doe_dev)
+{
+ up_read(&doe_dev->driver_access);
+}
+
/**
* cxl_pci_create_doe_devices - Create auxiliary bus DOE devices for all DOE
* mailboxes found
@@ -652,6 +663,7 @@ static int cxl_pci_create_doe_devices(struct pci_dev *pdev)
return -ENOMEM;

new_dev->pdev = pdev;
+ init_rwsem(&new_dev->driver_access);
new_dev->cap_offset = off;
new_dev->use_irq = use_irq;

@@ -682,6 +694,13 @@ static int cxl_pci_create_doe_devices(struct pci_dev *pdev)
adev);
if (rc)
return rc;
+
+ if (device_attach(&adev->dev) != 1) {
+ dev_err(&adev->dev,
+ "Failed to attach a driver to DOE device %d\n",
+ adev->id);
+ return -ENODEV;
+ }
}

return 0;
@@ -785,6 +804,7 @@ static struct pci_driver cxl_pci_driver = {
},
};

+MODULE_SOFTDEP("pre: cxl_doe");
MODULE_LICENSE("GPL v2");
module_pci_driver(cxl_pci_driver);
MODULE_IMPORT_NS(CXL);
diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
index 4e96b45ee36d..c268df088dd4 100644
--- a/include/uapi/linux/pci_regs.h
+++ b/include/uapi/linux/pci_regs.h
@@ -1118,6 +1118,7 @@
#define PCI_DOE_STATUS_DATA_OBJECT_READY 0x80000000 /* Data Object Ready */
#define PCI_DOE_WRITE 0x10 /* DOE Write Data Mailbox Register */
#define PCI_DOE_READ 0x14 /* DOE Read Data Mailbox Register */
+#define PCI_DOE_CAP_SIZE (0x14 + 4) /* Size of this register block */

/* DOE Data Object - note not actually registers */
#define PCI_DOE_DATA_OBJECT_HEADER_1_VID 0x0000ffff
--
2.35.1

2022-04-27 17:46:46

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH V8 04/10] cxl/pci: Create auxiliary devices for each DOE mailbox

On Thu, 14 Apr 2022 13:32:31 -0700
[email protected] wrote:

> From: Ira Weiny <[email protected]>
>
> CXL kernel drivers optionally need to access DOE mailbox capabilities.
> Access to mailboxes for things such as CDAT, SPDM, and IDE are needed by
> the kernel while other access is designed towards user space usage. An
> example of this is for CXL Compliance Testing (see CXL 2.0 14.16.4
> Compliance Mode DOE) which offers a mechanism to set different test
> modes for a device.
>
> There is no anticipated need for the kernel to share an individual
> mailbox with user space. Thus developing an interface to marshal access
> between the kernel and user space for a single mailbox is unnecessary
> overhead. However, having the kernel relinquish some mailboxes to be
> controlled by user space is a reasonable compromise to share access to
> the device.
>
> The auxiliary bus provides an elegant solution for this. Each DOE
> capability is given its own auxiliary device. This device is controlled
> by a kernel driver by default which restricts access to the mailbox.
> Unbinding the driver from a single auxiliary device (DOE mailbox
> capability) frees the mailbox for user space access. This architecture
> also allows a clear picture on which mailboxes are kernel controlled vs
> not.
>
> Iterate each DOE mailbox capability and create auxiliary bus devices.
> Follow on patches will define a driver for the newly created devices.
>
> sysfs shows the devices.
>
> $ ls -l /sys/bus/auxiliary/devices/
> total 0
> lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.0 -> ../../../devices/pci0000:bf/0000:bf:00.0/0000:c0:00.0/cxl_pci.doe.0
> lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.1 -> ../../../devices/pci0000:bf/0000:bf:01.0/0000:c1:00.0/cxl_pci.doe.1
> lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.2 -> ../../../devices/pci0000:35/0000:35:00.0/0000:36:00.0/cxl_pci.doe.2
> lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.3 -> ../../../devices/pci0000:35/0000:35:01.0/0000:37:00.0/cxl_pci.doe.3
> lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.4 -> ../../../devices/pci0000:35/0000:35:00.0/0000:36:00.0/cxl_pci.doe.4
> lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.5 -> ../../../devices/pci0000:bf/0000:bf:00.0/0000:c0:00.0/cxl_pci.doe.5
> lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.6 -> ../../../devices/pci0000:35/0000:35:01.0/0000:37:00.0/cxl_pci.doe.6
> lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.7 -> ../../../devices/pci0000:bf/0000:bf:01.0/0000:c1:00.0/cxl_pci.doe.7
>
> Signed-off-by: Ira Weiny <[email protected]>

I'm not 100% happy with effectively having one solution for CXL
and probably a different one for DOEs on switch ports
(which I just hacked into a port service driver to convince
myself there was at least one plausible way of doing that) but if
this effectively separates the two discussions then I guess I can
live with it for now ;)

Once this is merged we can start the discussion about how to
handle switch ports with DOEs both for CDAT and SPDM.

I'll send out an RFC that is so hideous it will get people to
suggestion how to do it better! Currently it starts and
stops the mailbox 3 times in the registration path and I think
it's more luck than judgement that is landing me with the right
MSI.

Anyhow, this looks good to me.

Reviewed-by: Jonathan Cameron <[email protected]>


>
> ---
> Changes from V7:
> Minor code clean ups
> Rebased on cxl-pending
>
> Changes from V6:
> Move all the auxiliary device stuff to the CXL layer
>
> Changes from V5:
> Split the CXL specific stuff off from the PCI DOE create
> auxiliary device code.
> ---
> drivers/cxl/Kconfig | 1 +
> drivers/cxl/cxlpci.h | 21 +++++++
> drivers/cxl/pci.c | 127 +++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 149 insertions(+)
>
> diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> index f64e3984689f..ac0f5ca95431 100644
> --- a/drivers/cxl/Kconfig
> +++ b/drivers/cxl/Kconfig
> @@ -16,6 +16,7 @@ if CXL_BUS
> config CXL_PCI
> tristate "PCI manageability"
> default CXL_BUS
> + select AUXILIARY_BUS
> help
> The CXL specification defines a "CXL memory device" sub-class in the
> PCI "memory controller" base class of devices. Device's identified by
> diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
> index 329e7ea3f36a..2ad8715173ce 100644
> --- a/drivers/cxl/cxlpci.h
> +++ b/drivers/cxl/cxlpci.h
> @@ -2,6 +2,7 @@
> /* Copyright(c) 2020 Intel Corporation. All rights reserved. */
> #ifndef __CXL_PCI_H__
> #define __CXL_PCI_H__
> +#include <linux/auxiliary_bus.h>
> #include <linux/pci.h>
> #include "cxl.h"
>
> @@ -72,4 +73,24 @@ static inline resource_size_t cxl_regmap_to_base(struct pci_dev *pdev,
> }
>
> int devm_cxl_port_enumerate_dports(struct cxl_port *port);
> +
> +/**
> + * struct cxl_doe_dev - CXL DOE auxiliary bus device
> + *
> + * @adev: Auxiliary bus device
> + * @pdev: PCI device this belongs to
> + * @cap_offset: Capability offset
> + * @use_irq: Set if IRQs are to be used with this mailbox
> + *
> + * This represents a single DOE mailbox device. CXL devices should create this
> + * device and register it on the Auxiliary bus for the CXL DOE driver to drive.
> + */
> +struct cxl_doe_dev {
> + struct auxiliary_device adev;
> + struct pci_dev *pdev;
> + int cap_offset;
> + bool use_irq;
> +};
> +#define DOE_DEV_NAME "doe"
> +
> #endif /* __CXL_PCI_H__ */
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index e7ab9a34d718..41a6f3eb0a5c 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -8,6 +8,7 @@
> #include <linux/mutex.h>
> #include <linux/list.h>
> #include <linux/pci.h>
> +#include <linux/pci-doe.h>
> #include <linux/io.h>
> #include "cxlmem.h"
> #include "cxlpci.h"
> @@ -564,6 +565,128 @@ static void cxl_dvsec_ranges(struct cxl_dev_state *cxlds)
> info->ranges = __cxl_dvsec_ranges(cxlds, info);
> }
>
> +static void cxl_pci_free_irq_vectors(void *data)
> +{
> + pci_free_irq_vectors(data);
> +}
> +
> +static DEFINE_IDA(pci_doe_adev_ida);
> +
> +static void cxl_pci_doe_dev_release(struct device *dev)
> +{
> + struct auxiliary_device *adev = container_of(dev,
> + struct auxiliary_device,
> + dev);
> + struct cxl_doe_dev *doe_dev = container_of(adev, struct cxl_doe_dev,
> + adev);
> +
> + ida_free(&pci_doe_adev_ida, adev->id);
> + kfree(doe_dev);
> +}
> +
> +static void cxl_pci_doe_destroy_device(void *ad)
> +{
> + auxiliary_device_delete(ad);
> + auxiliary_device_uninit(ad);
> +}
> +
> +/**
> + * cxl_pci_create_doe_devices - Create auxiliary bus DOE devices for all DOE
> + * mailboxes found
> + *
> + * @pci_dev: The PCI device to scan for DOE mailboxes
> + *
> + * There is no coresponding destroy of these devices. This function associates
> + * the DOE auxiliary devices created with the pci_dev passed in. That
> + * association is device managed (devm_*) such that the DOE auxiliary device
> + * lifetime is always less than or equal to the lifetime of the pci_dev.
> + *
> + * RETURNS: 0 on success -ERRNO on failure.
> + */
> +static int cxl_pci_create_doe_devices(struct pci_dev *pdev)
> +{
> + struct device *dev = &pdev->dev;
> + bool use_irq = true;
> + int irqs = 0;
> + u16 off = 0;
> + int rc;
> +
> + pci_doe_for_each_off(pdev, off)
> + irqs++;
> + pci_info(pdev, "Found %d DOE mailbox's\n", irqs);
> +
> + /*
> + * Allocate enough vectors for the DOE's
> + */
> + rc = pci_alloc_irq_vectors(pdev, irqs, irqs, PCI_IRQ_MSI |
> + PCI_IRQ_MSIX);
> + if (rc != irqs) {
> + pci_err(pdev,
> + "Not enough interrupts for all the DOEs; use polling\n");
> + use_irq = false;
> + /* Some got allocated; clean them up */
> + if (rc > 0)
> + cxl_pci_free_irq_vectors(pdev);
> + } else {
> + /*
> + * Enabling bus mastering is require for MSI/MSIx. It could be
> + * done later within the DOE initialization, but as it
> + * potentially has other impacts keep it here when setting up
> + * the IRQ's.
> + */
> + pci_set_master(pdev);
> + rc = devm_add_action_or_reset(dev,
> + cxl_pci_free_irq_vectors,
> + pdev);
> + if (rc)
> + return rc;
> + }
> +
> + pci_doe_for_each_off(pdev, off) {
> + struct auxiliary_device *adev;
> + struct cxl_doe_dev *new_dev;
> + int id;
> +
> + new_dev = kzalloc(sizeof(*new_dev), GFP_KERNEL);
> + if (!new_dev)
> + return -ENOMEM;
> +
> + new_dev->pdev = pdev;
> + new_dev->cap_offset = off;
> + new_dev->use_irq = use_irq;
> +
> + /* Set up struct auxiliary_device */
> + adev = &new_dev->adev;
> + id = ida_alloc(&pci_doe_adev_ida, GFP_KERNEL);
> + if (id < 0) {
> + kfree(new_dev);
> + return -ENOMEM;
> + }
> +
> + adev->id = id;
> + adev->name = DOE_DEV_NAME;
> + adev->dev.release = cxl_pci_doe_dev_release;
> + adev->dev.parent = dev;
> +
> + if (auxiliary_device_init(adev)) {
> + cxl_pci_doe_dev_release(&adev->dev);
> + return -EIO;
> + }
> +
> + if (auxiliary_device_add(adev)) {
> + auxiliary_device_uninit(adev);
> + return -EIO;
> + }
> +
> + rc = devm_add_action_or_reset(dev, cxl_pci_doe_destroy_device,
> + adev);
> + if (rc)
> + return rc;
> + }
> +
> + return 0;
> +}
> +
> static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> {
> struct cxl_register_map map;
> @@ -630,6 +753,10 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> if (rc)
> return rc;
>
> + rc = cxl_pci_create_doe_devices(pdev);
> + if (rc)
> + return rc;
> +
> cxl_dvsec_ranges(cxlds);
>
> cxlmd = devm_cxl_add_memdev(cxlds);

2022-04-27 18:26:06

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH V8 09/10] cxl/mem: Retry reading CDAT on failure

On Thu, 14 Apr 2022 13:32:36 -0700
[email protected] wrote:

> From: Ira Weiny <[email protected]>
>
> The CDAT read may fail for a number of reasons but mainly it is possible
> to get different parts of a valid state. The checksum in the CDAT table
> protects against this.
>
> Now that the cdat data is validated issue a retries if the CDAT read

validated, retry if the CDAT read fails.

> fails. For now 5 retries are implemented.
>
> Signed-off-by: Ira Weiny <[email protected]>

Reviewed-by: Jonathan Cameron <[email protected]>

>
> ---
> Changes from V6
> Move to pci.c
> Fix retries count
> Change to 5 retries
>
> Changes from V5:
> New patch -- easy to push off or drop.
> ---
> drivers/cxl/pci.c | 21 +++++++++++++++++----
> 1 file changed, 17 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index d7952156dd02..43cbc297079d 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -940,7 +940,7 @@ static void cxl_initialize_cdat_callbacks(struct cxl_dev_state *cxlds)
> cxlds->cdat_read_table = cxl_cdat_read_table;
> }
>
> -static int read_cdat_data(struct cxl_dev_state *cxlds)
> +static int __read_cdat_data(struct cxl_dev_state *cxlds)
> {
> struct device *dev = cxlds->dev;
> size_t cdat_length;
> @@ -962,6 +962,21 @@ static int read_cdat_data(struct cxl_dev_state *cxlds)
> return ret;
> }
>
> +static void read_cdat_data(struct cxl_dev_state *cxlds)
> +{
> + int retries = 5;
> + int rc;
> +
> + while (retries--) {
> + rc = __read_cdat_data(cxlds);
> + if (!rc)
> + break;
> + dev_err(cxlds->dev,
> + "CDAT data read error rc=%d (retries %d)\n",
> + rc, retries);
> + }
> +}
> +
> static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> {
> struct cxl_register_map map;
> @@ -1035,9 +1050,7 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> cxl_initialize_cdat_callbacks(cxlds);
>
> /* Cache the data early to ensure is_visible() works */
> - rc = read_cdat_data(cxlds);
> - if (rc)
> - dev_err(&pdev->dev, "CDAT data read error (%d)\n", rc);
> + read_cdat_data(cxlds);
>
> cxl_dvsec_ranges(cxlds);
>

2022-04-27 18:29:24

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH V8 10/10] cxl/port: Parse out DSMAS data from CDAT table

On Thu, 14 Apr 2022 13:32:37 -0700
[email protected] wrote:

> From: Ira Weiny <[email protected]>
>
> CXL Ports with memory devices attached need the information from the
> Device Scoped Memory Affinity Structure (DSMAS). This information is
> contained within the CDAT table buffer which is previously read and
> cached in the device state.
>
> If CDAT data is available, parse and cache DSMAS data from the table.
> Store this data in unmarshaled struct dsmas data structures for ease of
> use.
>
> Signed-off-by: Ira Weiny <[email protected]>

You could hold off on this patch and having it in the series that uses
the data.

Patch itself looks fine - it's just a bit random to parse one particular
record and do nothing with it beyond some debug prints :)

Reviewed-by: Jonathan Cameron <[email protected]>

>
> ---
> Changes from V7
> Rebased on cxl-pending
>
> Changes from V6
> Move to port.c
> It is not an error if no DSMAS data is found
>
> Changes from V5
> Fix up sparse warnings
> Split out cdat_hdr_valid()
> Update cdat_hdr_valid()
> Remove revision and cs field parsing
> There is no point in these
> Add seq check and debug print.
> From Jonathan
> Add spaces around '+' and '/'
> use devm_krealloc() for dmas_ary
> ---
> drivers/cxl/cdat.h | 17 ++++++++++++
> drivers/cxl/cxl.h | 6 +++++
> drivers/cxl/port.c | 65 ++++++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 88 insertions(+)
>
> diff --git a/drivers/cxl/cdat.h b/drivers/cxl/cdat.h
> index a7725d26f2d2..66706b238bc9 100644
> --- a/drivers/cxl/cdat.h
> +++ b/drivers/cxl/cdat.h
> @@ -83,6 +83,23 @@
> #define CDAT_SSLBIS_ENTRY_PORT_Y(entry, i) (((entry)[4 + (i) * 2] & 0xffff0000) >> 16)
> #define CDAT_SSLBIS_ENTRY_LAT_OR_BW(entry, i) ((entry)[4 + (i) * 2 + 1] & 0x0000ffff)
>
> +/**
> + * struct cxl_dsmas - host unmarshaled version of DSMAS data
> + *
> + * As defined in the Coherent Device Attribute Table (CDAT) specification this
> + * represents a single DSMAS entry in that table.
> + *
> + * @dpa_base: The lowest Device Physical Address associated with this DSMAD
> + * @length: Length in bytes of this DSMAD
> + * @non_volatile: If set, the memory region represents Non-Volatile memory
> + */
> +struct cxl_dsmas {
> + u64 dpa_base;
> + u64 length;
> + /* Flags */
> + u8 non_volatile:1;
> +};
> +
> /**
> * struct cxl_cdat - CXL CDAT data
> *
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 50817fd2c912..80fd39769a65 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -9,6 +9,8 @@
> #include <linux/bitops.h>
> #include <linux/io.h>
>
> +#include "cdat.h"
> +
> /**
> * DOC: cxl objects
> *
> @@ -269,6 +271,8 @@ struct cxl_nvdimm {
> * @component_reg_phys: component register capability base address (optional)
> * @dead: last ep has been removed, force port re-creation
> * @depth: How deep this port is relative to the root. depth 0 is the root.
> + * @dsmas_ary: Array of DSMAS entries as parsed from the CDAT table
> + * @nr_dsmas: Number of entries in dsmas_ary
> */
> struct cxl_port {
> struct device dev;
> @@ -280,6 +284,8 @@ struct cxl_port {
> resource_size_t component_reg_phys;
> bool dead;
> unsigned int depth;
> + struct cxl_dsmas *dsmas_ary;
> + int nr_dsmas;
> };
>
> /**
> diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
> index d420da5fc39c..2288432a27cd 100644
> --- a/drivers/cxl/port.c
> +++ b/drivers/cxl/port.c
> @@ -30,6 +30,70 @@ static void schedule_detach(void *cxlmd)
> schedule_cxl_memdev_detach(cxlmd);
> }
>
> +static void parse_dsmas(struct cxl_port *port, struct cxl_dev_state *cxlds)
> +{
> + struct device *dev = &port->dev;
> + struct cxl_dsmas *dsmas_ary = NULL;
> + u32 *data = cxlds->cdat.table;
> + int bytes_left = cxlds->cdat.length;
> + int nr_dsmas = 0;
> +
> + if (!data) {
> + dev_info(dev, "No CDAT data available for DSMAS\n");
> + return;
> + }
> +
> + /* Skip header */
> + data += CDAT_HEADER_LENGTH_DW;
> + bytes_left -= CDAT_HEADER_LENGTH_BYTES;
> +
> + while (bytes_left > 0) {
> + u32 *cur_rec = data;
> + u8 type = FIELD_GET(CDAT_STRUCTURE_DW0_TYPE, cur_rec[0]);
> + u16 length = FIELD_GET(CDAT_STRUCTURE_DW0_LENGTH, cur_rec[0]);
> +
> + if (type == CDAT_STRUCTURE_DW0_TYPE_DSMAS) {
> + struct cxl_dsmas *new_ary;
> + u8 flags;
> +
> + new_ary = devm_krealloc(dev, dsmas_ary,
> + sizeof(*dsmas_ary) * (nr_dsmas + 1),
> + GFP_KERNEL);
> + if (!new_ary) {
> + dev_err(dev,
> + "Failed to allocate memory for DSMAS data (nr_dsmas %d)\n",
> + nr_dsmas);
> + return;
> + }
> + dsmas_ary = new_ary;
> +
> + flags = FIELD_GET(CDAT_DSMAS_DW1_FLAGS, cur_rec[1]);
> +
> + dsmas_ary[nr_dsmas].dpa_base = CDAT_DSMAS_DPA_OFFSET(cur_rec);
> + dsmas_ary[nr_dsmas].length = CDAT_DSMAS_DPA_LEN(cur_rec);
> + dsmas_ary[nr_dsmas].non_volatile = CDAT_DSMAS_NON_VOLATILE(flags);
> +
> + dev_dbg(dev, "DSMAS %d: %llx:%llx %s\n",
> + nr_dsmas,
> + dsmas_ary[nr_dsmas].dpa_base,
> + dsmas_ary[nr_dsmas].dpa_base +
> + dsmas_ary[nr_dsmas].length,
> + (dsmas_ary[nr_dsmas].non_volatile ?
> + "Persistent" : "Volatile")
> + );
> +
> + nr_dsmas++;
> + }
> +
> + data += (length / sizeof(u32));
> + bytes_left -= length;
> + }
> +
> + dev_dbg(dev, "Found %d DSMAS entries\n", nr_dsmas);
> + port->dsmas_ary = dsmas_ary;
> + port->nr_dsmas = nr_dsmas;
> +}
> +

2022-04-29 09:12:14

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH V8 05/10] cxl/pci: Create DOE auxiliary driver

On Thu, 28 Apr 2022 07:48:00 -0700
[email protected] wrote:

> On Wed, Apr 27, 2022 at 06:43:45PM +0100, Jonathan Cameron wrote:
> > On Thu, 14 Apr 2022 13:32:32 -0700
> > [email protected] wrote:
> >
> > > From: Ira Weiny <[email protected]>
> > >
> > > CXL kernel drivers optionally need to access DOE mailbox capabilities.
> > > Access to mailboxes for things such as CDAT, SPDM, and IDE are needed by
> > > the kernel while other access is designed towards user space usage. An
> > > example of this is for CXL Compliance Testing (see CXL 2.0 14.16.4
> > > Compliance Mode DOE) which offers a mechanism to set different test
> > > modes for a device.
> > >
> > > There is no anticipated need for the kernel to share an individual
> > > mailbox with user space. Thus developing an interface to marshal access
> > > between the kernel and user space for a single mailbox is unnecessary
> > > overhead. However, having the kernel relinquish some mailboxes to be
> > > controlled by user space is a reasonable compromise to share access to
> > > the device.
> > >
> > > The auxiliary bus provides an elegant solution for this. Each DOE
> > > capability is given its own auxiliary device. This device is controlled
> > > by a kernel driver by default which restricts access to the mailbox.
> > > Unbinding the driver from a single auxiliary device (DOE mailbox
> > > capability) frees the mailbox for user space access. This architecture
> > > also allows a clear picture on which mailboxes are kernel controlled vs
> > > not.
> > >
> > > Create a driver for the DOE auxiliary devices. The driver uses the PCI
> > > DOE core to manage the mailbox.
> > >
> > > User space must be prevented from unbinding the driver state when the
> > > DOE auxiliary driver is being accessed by the kernel. Add a read write
> > > lock to the DOE auxiliary device to protect the driver data portion.
> > >
> > > Finally, flag the driver module to be preloaded by device creation to
> > > ensure the driver is attached when iterating the DOE capabilities.
> > >
> > > User space access can be obtained by unbinding the driver from that
> > > device. For example:
> > >
> > > $ ls -l /sys/bus/auxiliary/drivers
> > > total 0
> > > drwxr-xr-x 2 root root 0 Mar 24 10:45 cxl_doe.cxl_doe_drv
> > >
> > > $ ls -l /sys/bus/auxiliary/drivers/cxl_doe.cxl_doe_drv/cxl_pci*
> > > lrwxrwxrwx 1 root root 0 Mar 24 10:53 /sys/bus/auxiliary/drivers/cxl_doe.cxl_doe_drv/cxl_pci.doe.0 -> ../../../../devices/pci0000:bf/0000:bf:00.0/0000:c0:00.0/cxl_pci.doe.0
> > > lrwxrwxrwx 1 root root 0 Mar 24 10:53 /sys/bus/auxiliary/drivers/cxl_doe.cxl_doe_drv/cxl_pci.doe.1 -> ../../../../devices/pci0000:bf/0000:bf:01.0/0000:c1:00.0/cxl_pci.doe.1
> > > lrwxrwxrwx 1 root root 0 Mar 24 10:53 /sys/bus/auxiliary/drivers/cxl_doe.cxl_doe_drv/cxl_pci.doe.2 -> ../../../../devices/pci0000:35/0000:35:00.0/0000:36:00.0/cxl_pci.doe.2
> > > lrwxrwxrwx 1 root root 0 Mar 24 10:53 /sys/bus/auxiliary/drivers/cxl_doe.cxl_doe_drv/cxl_pci.doe.3 -> ../../../../devices/pci0000:35/0000:35:01.0/0000:37:00.0/cxl_pci.doe.3
> > > lrwxrwxrwx 1 root root 0 Mar 24 10:53 /sys/bus/auxiliary/drivers/cxl_doe.cxl_doe_drv/cxl_pci.doe.3 -> ../../../../devices/pci0000:35/0000:35:01.0/0000:37:00.0/cxl_pci.doe.4
> > > lrwxrwxrwx 1 root root 0 Mar 24 10:53 /sys/bus/auxiliary/drivers/cxl_doe.cxl_doe_drv/cxl_pci.doe.5 -> ../../../../devices/pci0000:bf/0000:bf:00.0/0000:c0:00.0/cxl_pci.doe.5
> > > lrwxrwxrwx 1 root root 0 Mar 24 10:53 /sys/bus/auxiliary/drivers/cxl_doe.cxl_doe_drv/cxl_pci.doe.6 -> ../../../../devices/pci0000:35/0000:35:01.0/0000:37:00.0/cxl_pci.doe.6
> > > lrwxrwxrwx 1 root root 0 Mar 24 10:53 /sys/bus/auxiliary/drivers/cxl_doe.cxl_doe_drv/cxl_pci.doe.7 -> ../../../../devices/pci0000:bf/0000:bf:01.0/0000:c1:00.0/cxl_pci.doe.7
> > >
> > > $ echo "cxl_pci.doe.4" > /sys/bus/auxiliary/drivers/cxl_doe.cxl_doe_drv/unbind
> > >
> > > $ ls -l /sys/bus/auxiliary/drivers/cxl_doe.cxl_doe_drv/cxl_pci*
> > > lrwxrwxrwx 1 root root 0 Mar 24 10:53 /sys/bus/auxiliary/drivers/cxl_doe.cxl_doe_drv/cxl_pci.doe.0 -> ../../../../devices/pci0000:bf/0000:bf:00.0/0000:c0:00.0/cxl_pci.doe.0
> > > lrwxrwxrwx 1 root root 0 Mar 24 10:53 /sys/bus/auxiliary/drivers/cxl_doe.cxl_doe_drv/cxl_pci.doe.1 -> ../../../../devices/pci0000:bf/0000:bf:01.0/0000:c1:00.0/cxl_pci.doe.1
> > > lrwxrwxrwx 1 root root 0 Mar 24 10:53 /sys/bus/auxiliary/drivers/cxl_doe.cxl_doe_drv/cxl_pci.doe.2 -> ../../../../devices/pci0000:35/0000:35:00.0/0000:36:00.0/cxl_pci.doe.2
> > > lrwxrwxrwx 1 root root 0 Mar 24 10:53 /sys/bus/auxiliary/drivers/cxl_doe.cxl_doe_drv/cxl_pci.doe.3 -> ../../../../devices/pci0000:35/0000:35:01.0/0000:37:00.0/cxl_pci.doe.3
> > > lrwxrwxrwx 1 root root 0 Mar 24 10:53 /sys/bus/auxiliary/drivers/cxl_doe.cxl_doe_drv/cxl_pci.doe.5 -> ../../../../devices/pci0000:bf/0000:bf:00.0/0000:c0:00.0/cxl_pci.doe.5
> > > lrwxrwxrwx 1 root root 0 Mar 24 10:53 /sys/bus/auxiliary/drivers/cxl_doe.cxl_doe_drv/cxl_pci.doe.6 -> ../../../../devices/pci0000:35/0000:35:01.0/0000:37:00.0/cxl_pci.doe.6
> > > lrwxrwxrwx 1 root root 0 Mar 24 10:53 /sys/bus/auxiliary/drivers/cxl_doe.cxl_doe_drv/cxl_pci.doe.7 -> ../../../../devices/pci0000:bf/0000:bf:01.0/0000:c1:00.0/cxl_pci.doe.7
> > >
> > > Signed-off-by: Ira Weiny <[email protected]>
> >
> > Hi Ira,
> >
> > A few minor comments inline.
> > With those cleaned up
>
> Thanks!
>
> >
> > Reviewed-by: Jonathan Cameron <[email protected]>
> >
> > >
> > > ---
> > > Changes from V7:
> > > Now need to select PCI_DOE
> > > Change MODULE_LICENSE to 'GPL' instead of old 'GPL v2'
> > >
> > > Changes from V6:
> > > The CXL layer now contains the driver for these auxiliary
> > > devices.
> > >
> > > Changes from V5:
> > > Split the CXL specific stuff off from the PCI DOE create
> > > auxiliary device code.
> > > ---
> > > drivers/cxl/Kconfig | 13 +++++
> > > drivers/cxl/Makefile | 2 +
> > > drivers/cxl/cxlpci.h | 13 +++++
> > > drivers/cxl/doe.c | 90 +++++++++++++++++++++++++++++++++++
> > > drivers/cxl/pci.c | 20 ++++++++
> > > include/uapi/linux/pci_regs.h | 1 +
> > > 6 files changed, 139 insertions(+)
> > > create mode 100644 drivers/cxl/doe.c
> > >

...


> > > diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
> > > index 2ad8715173ce..821fe05e8289 100644
> > > --- a/drivers/cxl/cxlpci.h
> > > +++ b/drivers/cxl/cxlpci.h
> > > @@ -79,6 +79,7 @@ int devm_cxl_port_enumerate_dports(struct cxl_port *port);
> > > *
> > > * @adev: Auxiliary bus device
> > > * @pdev: PCI device this belongs to
> > > + * @driver_access: Lock the driver during access
> > > * @cap_offset: Capability offset
> > > * @use_irq: Set if IRQs are to be used with this mailbox
> > > *
> > > @@ -88,9 +89,21 @@ int devm_cxl_port_enumerate_dports(struct cxl_port *port);
> > > struct cxl_doe_dev {
> > > struct auxiliary_device adev;
> > > struct pci_dev *pdev;
> > > + struct rw_semaphore driver_access;
> > > int cap_offset;
> > > bool use_irq;
> > > };
> > > #define DOE_DEV_NAME "doe"
> > >
> > > +/**
> > > + * struct cxl_doe_drv_state - state of the DOE Aux driver
> > > + *
> > > + * @doe_dev: The Auxiliary DOE device
> >
> > As far as I can tell no one actually uses the doe_dev from here for anything
> > so do we need it at all?
>
> Oh wow! Great catch. Worse yet I never even set it. :-(
>
> It must have been left over cruft from development which I missed in final
> review. I think the logic was that the device goes along with the driver
> state...
>
> Yes I will remove it.
>
> But that unfortunately begs the question does cxl_doe_drv_state even need to
> exist?
>
> I don't like returning the pci_doe_mb directly but having a struct contain a
> struct is worse IMO.
>
> So cxl_pci_doe_get_drv() and cxl_pci_doe_put_drv() are going to be
> get_mb/put_mb respectively.
>
> Can I make that change with your review by?

Yes, that's fine.

>
> >
> > > + * @doe_mb: PCI DOE mailbox state
> > > + */
> > > +struct cxl_doe_drv_state {
> > > + struct cxl_doe_dev *doe_dev;
> > > + struct pci_doe_mb *doe_mb;
> > > +};
> > > +
> > > #endif /* __CXL_PCI_H__ */
> > > diff --git a/drivers/cxl/doe.c b/drivers/cxl/doe.c
> > > new file mode 100644
> > > index 000000000000..1d3a24a77002
> > > --- /dev/null
> > > +++ b/drivers/cxl/doe.c

...

> >
> > > + return PTR_ERR(doe_mb);
> >
> > > + }
> > > +
> > > + doe_ds->doe_mb = doe_mb;
> > > + devm_add_action_or_reset(dev, doe_destroy_mb, doe_ds);
> > > +
> > > + down_write(&doe_dev->driver_access);
> > > + auxiliary_set_drvdata(aux_dev, doe_ds);
> > > + up_write(&doe_dev->driver_access);
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +static void cxl_pci_doe_remove(struct auxiliary_device *aux_dev)
> > > +{
> > > + struct cxl_doe_dev *doe_dev = container_of(aux_dev, struct cxl_doe_dev,
> > > + adev);
> > > +
> > > + down_write(&doe_dev->driver_access);
> > > + auxiliary_set_drvdata(aux_dev, NULL);
> >
> > This confused me for a bit. I 'think' you are doing this to be able to use
> > it as a flag for whether the driver is still bound. If so, a comment would
> > be useful.
>
> Yes.
>
> I'll add this from the commit message:
>
> User space must be prevented from unbinding the driver state when the
> DOE auxiliary driver is being accessed by the kernel.

Maybe a comment here as well as setting drvdata to NULL is rather rare
as I'm fairly sure the driver core does it for you these days (but too late
for this particular use).

>
> >
> > > + up_write(&doe_dev->driver_access);
> > > +}
> > > +
> > > +static const struct auxiliary_device_id cxl_pci_doe_auxiliary_id_table[] = {
> > > + {.name = "cxl_pci." DOE_DEV_NAME, },
> > > + {},
> > > +};
> > > +
> > > +MODULE_DEVICE_TABLE(auxiliary, cxl_pci_doe_auxiliary_id_table);
> > > +
> > > +struct auxiliary_driver cxl_pci_doe_auxiliary_drv = {
> > > + .name = "cxl_doe_drv",
> > > + .id_table = cxl_pci_doe_auxiliary_id_table,
> > > + .probe = cxl_pci_doe_probe,
> > > + .remove = cxl_pci_doe_remove,
> > > +};
> > > +
> > > +static int __init cxl_pci_doe_init_module(void)
> > > +{
> > > + int ret;
> > > +
> > > + ret = auxiliary_driver_register(&cxl_pci_doe_auxiliary_drv);
> > > + if (ret) {
> > > + pr_err("Failed cxl_pci_doe auxiliary_driver_register() ret=%d\n",
> > > + ret);
> > > + }
> > > +
> > > + return ret;
> > > +}
> > > +
> > > +static void __exit cxl_pci_doe_exit_module(void)
> > > +{
> > > + auxiliary_driver_unregister(&cxl_pci_doe_auxiliary_drv);
> > > +}
> > > +
> > > +module_init(cxl_pci_doe_init_module);
> > > +module_exit(cxl_pci_doe_exit_module);
> > > +MODULE_LICENSE("GPL");
> > > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> > > index 41a6f3eb0a5c..0dec1f1a3f38 100644
> > > --- a/drivers/cxl/pci.c
> > > +++ b/drivers/cxl/pci.c
> > > @@ -590,6 +590,17 @@ static void cxl_pci_doe_destroy_device(void *ad)
> > > auxiliary_device_uninit(ad);
> > > }
> > >
> > > +static struct cxl_doe_drv_state *cxl_pci_doe_get_drv(struct cxl_doe_dev *doe_dev)
> > > +{
> > > + down_read(&doe_dev->driver_access);
> > > + return auxiliary_get_drvdata(&doe_dev->adev);
> > > +}
> > > +
> > > +static void cxl_pci_doe_put_drv(struct cxl_doe_dev *doe_dev)
> > > +{
> > > + up_read(&doe_dev->driver_access);
> > > +}
> > > +
> > > /**
> > > * cxl_pci_create_doe_devices - Create auxiliary bus DOE devices for all DOE
> > > * mailboxes found
> > > @@ -652,6 +663,7 @@ static int cxl_pci_create_doe_devices(struct pci_dev *pdev)
> > > return -ENOMEM;
> > >
> > > new_dev->pdev = pdev;
> > > + init_rwsem(&new_dev->driver_access);
> > > new_dev->cap_offset = off;
> > > new_dev->use_irq = use_irq;
> > >
> > > @@ -682,6 +694,13 @@ static int cxl_pci_create_doe_devices(struct pci_dev *pdev)
> > > adev);
> > > if (rc)
> > > return rc;
> > > +
> > > + if (device_attach(&adev->dev) != 1) {
> > > + dev_err(&adev->dev,
> > > + "Failed to attach a driver to DOE device %d\n",
> > > + adev->id);
> > > + return -ENODEV;
> > > + }
> >
> > Can you add a comment on why this has to be the case at this point. Why can't
> > the driver come along later?
>
> Yes I will.
>
> I've not really liked this aspect of the auxiliary device arch. But I've not
> convinced myself that it will work ok to leave this till later.
>
> Putting the CDAT retry on a timer might be a better option but it would also
> mean searching for the proper DOE would need to be delayed as well...

Ultimately we may not need to know CDAT info at the time of driver
load, but only on either a userspace read, or setup of a region etc.
I guess in majority of cases that will all be automated though (based
on LSA contents etc).

>
> For now I'll put in a comment.
Great,

>
> Thanks for the review!

You are welcome.

Thanks for driving this forwards.

Jonathan

> Ira
>
> >
> > > }
> > >
> > > return 0;
> > > @@ -785,6 +804,7 @@ static struct pci_driver cxl_pci_driver = {
> > > },
> > > };
> >
> > ...
> >

2022-04-29 13:54:35

by Ira Weiny

[permalink] [raw]
Subject: Re: [PATCH V8 05/10] cxl/pci: Create DOE auxiliary driver

On Wed, Apr 27, 2022 at 06:43:45PM +0100, Jonathan Cameron wrote:
> On Thu, 14 Apr 2022 13:32:32 -0700
> [email protected] wrote:
>
> > From: Ira Weiny <[email protected]>
> >
> > CXL kernel drivers optionally need to access DOE mailbox capabilities.
> > Access to mailboxes for things such as CDAT, SPDM, and IDE are needed by
> > the kernel while other access is designed towards user space usage. An
> > example of this is for CXL Compliance Testing (see CXL 2.0 14.16.4
> > Compliance Mode DOE) which offers a mechanism to set different test
> > modes for a device.
> >
> > There is no anticipated need for the kernel to share an individual
> > mailbox with user space. Thus developing an interface to marshal access
> > between the kernel and user space for a single mailbox is unnecessary
> > overhead. However, having the kernel relinquish some mailboxes to be
> > controlled by user space is a reasonable compromise to share access to
> > the device.
> >
> > The auxiliary bus provides an elegant solution for this. Each DOE
> > capability is given its own auxiliary device. This device is controlled
> > by a kernel driver by default which restricts access to the mailbox.
> > Unbinding the driver from a single auxiliary device (DOE mailbox
> > capability) frees the mailbox for user space access. This architecture
> > also allows a clear picture on which mailboxes are kernel controlled vs
> > not.
> >
> > Create a driver for the DOE auxiliary devices. The driver uses the PCI
> > DOE core to manage the mailbox.
> >
> > User space must be prevented from unbinding the driver state when the
> > DOE auxiliary driver is being accessed by the kernel. Add a read write
> > lock to the DOE auxiliary device to protect the driver data portion.
> >
> > Finally, flag the driver module to be preloaded by device creation to
> > ensure the driver is attached when iterating the DOE capabilities.
> >
> > User space access can be obtained by unbinding the driver from that
> > device. For example:
> >
> > $ ls -l /sys/bus/auxiliary/drivers
> > total 0
> > drwxr-xr-x 2 root root 0 Mar 24 10:45 cxl_doe.cxl_doe_drv
> >
> > $ ls -l /sys/bus/auxiliary/drivers/cxl_doe.cxl_doe_drv/cxl_pci*
> > lrwxrwxrwx 1 root root 0 Mar 24 10:53 /sys/bus/auxiliary/drivers/cxl_doe.cxl_doe_drv/cxl_pci.doe.0 -> ../../../../devices/pci0000:bf/0000:bf:00.0/0000:c0:00.0/cxl_pci.doe.0
> > lrwxrwxrwx 1 root root 0 Mar 24 10:53 /sys/bus/auxiliary/drivers/cxl_doe.cxl_doe_drv/cxl_pci.doe.1 -> ../../../../devices/pci0000:bf/0000:bf:01.0/0000:c1:00.0/cxl_pci.doe.1
> > lrwxrwxrwx 1 root root 0 Mar 24 10:53 /sys/bus/auxiliary/drivers/cxl_doe.cxl_doe_drv/cxl_pci.doe.2 -> ../../../../devices/pci0000:35/0000:35:00.0/0000:36:00.0/cxl_pci.doe.2
> > lrwxrwxrwx 1 root root 0 Mar 24 10:53 /sys/bus/auxiliary/drivers/cxl_doe.cxl_doe_drv/cxl_pci.doe.3 -> ../../../../devices/pci0000:35/0000:35:01.0/0000:37:00.0/cxl_pci.doe.3
> > lrwxrwxrwx 1 root root 0 Mar 24 10:53 /sys/bus/auxiliary/drivers/cxl_doe.cxl_doe_drv/cxl_pci.doe.3 -> ../../../../devices/pci0000:35/0000:35:01.0/0000:37:00.0/cxl_pci.doe.4
> > lrwxrwxrwx 1 root root 0 Mar 24 10:53 /sys/bus/auxiliary/drivers/cxl_doe.cxl_doe_drv/cxl_pci.doe.5 -> ../../../../devices/pci0000:bf/0000:bf:00.0/0000:c0:00.0/cxl_pci.doe.5
> > lrwxrwxrwx 1 root root 0 Mar 24 10:53 /sys/bus/auxiliary/drivers/cxl_doe.cxl_doe_drv/cxl_pci.doe.6 -> ../../../../devices/pci0000:35/0000:35:01.0/0000:37:00.0/cxl_pci.doe.6
> > lrwxrwxrwx 1 root root 0 Mar 24 10:53 /sys/bus/auxiliary/drivers/cxl_doe.cxl_doe_drv/cxl_pci.doe.7 -> ../../../../devices/pci0000:bf/0000:bf:01.0/0000:c1:00.0/cxl_pci.doe.7
> >
> > $ echo "cxl_pci.doe.4" > /sys/bus/auxiliary/drivers/cxl_doe.cxl_doe_drv/unbind
> >
> > $ ls -l /sys/bus/auxiliary/drivers/cxl_doe.cxl_doe_drv/cxl_pci*
> > lrwxrwxrwx 1 root root 0 Mar 24 10:53 /sys/bus/auxiliary/drivers/cxl_doe.cxl_doe_drv/cxl_pci.doe.0 -> ../../../../devices/pci0000:bf/0000:bf:00.0/0000:c0:00.0/cxl_pci.doe.0
> > lrwxrwxrwx 1 root root 0 Mar 24 10:53 /sys/bus/auxiliary/drivers/cxl_doe.cxl_doe_drv/cxl_pci.doe.1 -> ../../../../devices/pci0000:bf/0000:bf:01.0/0000:c1:00.0/cxl_pci.doe.1
> > lrwxrwxrwx 1 root root 0 Mar 24 10:53 /sys/bus/auxiliary/drivers/cxl_doe.cxl_doe_drv/cxl_pci.doe.2 -> ../../../../devices/pci0000:35/0000:35:00.0/0000:36:00.0/cxl_pci.doe.2
> > lrwxrwxrwx 1 root root 0 Mar 24 10:53 /sys/bus/auxiliary/drivers/cxl_doe.cxl_doe_drv/cxl_pci.doe.3 -> ../../../../devices/pci0000:35/0000:35:01.0/0000:37:00.0/cxl_pci.doe.3
> > lrwxrwxrwx 1 root root 0 Mar 24 10:53 /sys/bus/auxiliary/drivers/cxl_doe.cxl_doe_drv/cxl_pci.doe.5 -> ../../../../devices/pci0000:bf/0000:bf:00.0/0000:c0:00.0/cxl_pci.doe.5
> > lrwxrwxrwx 1 root root 0 Mar 24 10:53 /sys/bus/auxiliary/drivers/cxl_doe.cxl_doe_drv/cxl_pci.doe.6 -> ../../../../devices/pci0000:35/0000:35:01.0/0000:37:00.0/cxl_pci.doe.6
> > lrwxrwxrwx 1 root root 0 Mar 24 10:53 /sys/bus/auxiliary/drivers/cxl_doe.cxl_doe_drv/cxl_pci.doe.7 -> ../../../../devices/pci0000:bf/0000:bf:01.0/0000:c1:00.0/cxl_pci.doe.7
> >
> > Signed-off-by: Ira Weiny <[email protected]>
>
> Hi Ira,
>
> A few minor comments inline.
> With those cleaned up

Thanks!

>
> Reviewed-by: Jonathan Cameron <[email protected]>
>
> >
> > ---
> > Changes from V7:
> > Now need to select PCI_DOE
> > Change MODULE_LICENSE to 'GPL' instead of old 'GPL v2'
> >
> > Changes from V6:
> > The CXL layer now contains the driver for these auxiliary
> > devices.
> >
> > Changes from V5:
> > Split the CXL specific stuff off from the PCI DOE create
> > auxiliary device code.
> > ---
> > drivers/cxl/Kconfig | 13 +++++
> > drivers/cxl/Makefile | 2 +
> > drivers/cxl/cxlpci.h | 13 +++++
> > drivers/cxl/doe.c | 90 +++++++++++++++++++++++++++++++++++
> > drivers/cxl/pci.c | 20 ++++++++
> > include/uapi/linux/pci_regs.h | 1 +
> > 6 files changed, 139 insertions(+)
> > create mode 100644 drivers/cxl/doe.c
> >
> > diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> > index ac0f5ca95431..82f3908fa5cc 100644
> > --- a/drivers/cxl/Kconfig
> > +++ b/drivers/cxl/Kconfig
> > @@ -103,4 +103,17 @@ config CXL_SUSPEND
> > def_bool y
> > depends on SUSPEND && CXL_MEM
> >
> > +config CXL_PCI_DOE
> > + tristate "CXL PCI Data Object Exchange (DOE) support"
> > + depends on CXL_PCI
>
> Should be leading tabs not spaces.

Opps must have been a bad copy/paste. Sorry.

>
> > + default CXL_BUS
> > + select PCI_DOE
> > + help
> > + Driver for DOE auxiliary devices.
> > +
> > + The DOE capabilities provides a simple mailbox in PCI config space that
> > + is used for a number of different protocols useful to CXL. The CXL PCI
> > + subsystem creates auxiliary devices for each DOE mailbox capability
> > + found. This driver is required for the kernel to use these devices.
> > +
> > endif
> > diff --git a/drivers/cxl/Makefile b/drivers/cxl/Makefile
> > index a78270794150..c71b7a6345fb 100644
> > --- a/drivers/cxl/Makefile
> > +++ b/drivers/cxl/Makefile
> > @@ -1,6 +1,7 @@
> > # SPDX-License-Identifier: GPL-2.0
> > obj-y += core/
> > obj-$(CONFIG_CXL_PCI) += cxl_pci.o
> > +obj-$(CONFIG_CXL_PCI_DOE) += cxl_doe.o
> > obj-$(CONFIG_CXL_MEM) += cxl_mem.o
> > obj-$(CONFIG_CXL_ACPI) += cxl_acpi.o
> > obj-$(CONFIG_CXL_PMEM) += cxl_pmem.o
> > @@ -8,6 +9,7 @@ obj-$(CONFIG_CXL_PORT) += cxl_port.o
> >
> > cxl_mem-y := mem.o
> > cxl_pci-y := pci.o
> > +cxl_doe-y := doe.o
> > cxl_acpi-y := acpi.o
> > cxl_pmem-y := pmem.o
> > cxl_port-y := port.o
> > diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
> > index 2ad8715173ce..821fe05e8289 100644
> > --- a/drivers/cxl/cxlpci.h
> > +++ b/drivers/cxl/cxlpci.h
> > @@ -79,6 +79,7 @@ int devm_cxl_port_enumerate_dports(struct cxl_port *port);
> > *
> > * @adev: Auxiliary bus device
> > * @pdev: PCI device this belongs to
> > + * @driver_access: Lock the driver during access
> > * @cap_offset: Capability offset
> > * @use_irq: Set if IRQs are to be used with this mailbox
> > *
> > @@ -88,9 +89,21 @@ int devm_cxl_port_enumerate_dports(struct cxl_port *port);
> > struct cxl_doe_dev {
> > struct auxiliary_device adev;
> > struct pci_dev *pdev;
> > + struct rw_semaphore driver_access;
> > int cap_offset;
> > bool use_irq;
> > };
> > #define DOE_DEV_NAME "doe"
> >
> > +/**
> > + * struct cxl_doe_drv_state - state of the DOE Aux driver
> > + *
> > + * @doe_dev: The Auxiliary DOE device
>
> As far as I can tell no one actually uses the doe_dev from here for anything
> so do we need it at all?

Oh wow! Great catch. Worse yet I never even set it. :-(

It must have been left over cruft from development which I missed in final
review. I think the logic was that the device goes along with the driver
state...

Yes I will remove it.

But that unfortunately begs the question does cxl_doe_drv_state even need to
exist?

I don't like returning the pci_doe_mb directly but having a struct contain a
struct is worse IMO.

So cxl_pci_doe_get_drv() and cxl_pci_doe_put_drv() are going to be
get_mb/put_mb respectively.

Can I make that change with your review by?

>
> > + * @doe_mb: PCI DOE mailbox state
> > + */
> > +struct cxl_doe_drv_state {
> > + struct cxl_doe_dev *doe_dev;
> > + struct pci_doe_mb *doe_mb;
> > +};
> > +
> > #endif /* __CXL_PCI_H__ */
> > diff --git a/drivers/cxl/doe.c b/drivers/cxl/doe.c
> > new file mode 100644
> > index 000000000000..1d3a24a77002
> > --- /dev/null
> > +++ b/drivers/cxl/doe.c
> > @@ -0,0 +1,90 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/* Copyright(c) 2022 Intel Corporation. All rights reserved. */
> > +
> > +#include <linux/pci.h>
> > +#include <linux/pci-doe.h>
>
> Rather sparse include list given what is going on in here.
> We shouldn't rely too heavily on what comes indirectly except
> where it's really standard headers.
>
> Probably mod_devicetable.h and some headers for auxiliary devices
> and rwsem.h at least?

Yea I'll add to this list.

>
> > +
> > +#include "cxlpci.h"
> > +
> > +static void doe_destroy_mb(void *ds)
> > +{
> > + struct cxl_doe_drv_state *doe_ds = ds;
> > +
> > + pci_doe_destroy_mb(doe_ds->doe_mb);
> > +}
> > +
> > +static int cxl_pci_doe_probe(struct auxiliary_device *aux_dev,
> > + const struct auxiliary_device_id *id)
> > +{
> > + struct cxl_doe_dev *doe_dev = container_of(aux_dev, struct cxl_doe_dev,
> > + adev);
> > + struct device *dev = &aux_dev->dev;
> > + struct cxl_doe_drv_state *doe_ds;
> > + struct pci_doe_mb *doe_mb;
> > +
> > + doe_ds = devm_kzalloc(dev, sizeof(*doe_ds), GFP_KERNEL);
> > + if (!doe_ds)
> > + return -ENOMEM;
> > +
> > + doe_mb = pci_doe_create_mb(doe_dev->pdev, doe_dev->cap_offset,
> > + doe_dev->use_irq);
> > + if (IS_ERR(doe_mb)) {
> > + dev_err(dev, "Failed to create the DOE mailbox state machine\n");
>
> You could use dev_err_probe() here to tidy this up a little bit.

Sure!

>
> > + return PTR_ERR(doe_mb);
>
> > + }
> > +
> > + doe_ds->doe_mb = doe_mb;
> > + devm_add_action_or_reset(dev, doe_destroy_mb, doe_ds);
> > +
> > + down_write(&doe_dev->driver_access);
> > + auxiliary_set_drvdata(aux_dev, doe_ds);
> > + up_write(&doe_dev->driver_access);
> > +
> > + return 0;
> > +}
> > +
> > +static void cxl_pci_doe_remove(struct auxiliary_device *aux_dev)
> > +{
> > + struct cxl_doe_dev *doe_dev = container_of(aux_dev, struct cxl_doe_dev,
> > + adev);
> > +
> > + down_write(&doe_dev->driver_access);
> > + auxiliary_set_drvdata(aux_dev, NULL);
>
> This confused me for a bit. I 'think' you are doing this to be able to use
> it as a flag for whether the driver is still bound. If so, a comment would
> be useful.

Yes.

I'll add this from the commit message:

User space must be prevented from unbinding the driver state when the
DOE auxiliary driver is being accessed by the kernel.

>
> > + up_write(&doe_dev->driver_access);
> > +}
> > +
> > +static const struct auxiliary_device_id cxl_pci_doe_auxiliary_id_table[] = {
> > + {.name = "cxl_pci." DOE_DEV_NAME, },
> > + {},
> > +};
> > +
> > +MODULE_DEVICE_TABLE(auxiliary, cxl_pci_doe_auxiliary_id_table);
> > +
> > +struct auxiliary_driver cxl_pci_doe_auxiliary_drv = {
> > + .name = "cxl_doe_drv",
> > + .id_table = cxl_pci_doe_auxiliary_id_table,
> > + .probe = cxl_pci_doe_probe,
> > + .remove = cxl_pci_doe_remove,
> > +};
> > +
> > +static int __init cxl_pci_doe_init_module(void)
> > +{
> > + int ret;
> > +
> > + ret = auxiliary_driver_register(&cxl_pci_doe_auxiliary_drv);
> > + if (ret) {
> > + pr_err("Failed cxl_pci_doe auxiliary_driver_register() ret=%d\n",
> > + ret);
> > + }
> > +
> > + return ret;
> > +}
> > +
> > +static void __exit cxl_pci_doe_exit_module(void)
> > +{
> > + auxiliary_driver_unregister(&cxl_pci_doe_auxiliary_drv);
> > +}
> > +
> > +module_init(cxl_pci_doe_init_module);
> > +module_exit(cxl_pci_doe_exit_module);
> > +MODULE_LICENSE("GPL");
> > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> > index 41a6f3eb0a5c..0dec1f1a3f38 100644
> > --- a/drivers/cxl/pci.c
> > +++ b/drivers/cxl/pci.c
> > @@ -590,6 +590,17 @@ static void cxl_pci_doe_destroy_device(void *ad)
> > auxiliary_device_uninit(ad);
> > }
> >
> > +static struct cxl_doe_drv_state *cxl_pci_doe_get_drv(struct cxl_doe_dev *doe_dev)
> > +{
> > + down_read(&doe_dev->driver_access);
> > + return auxiliary_get_drvdata(&doe_dev->adev);
> > +}
> > +
> > +static void cxl_pci_doe_put_drv(struct cxl_doe_dev *doe_dev)
> > +{
> > + up_read(&doe_dev->driver_access);
> > +}
> > +
> > /**
> > * cxl_pci_create_doe_devices - Create auxiliary bus DOE devices for all DOE
> > * mailboxes found
> > @@ -652,6 +663,7 @@ static int cxl_pci_create_doe_devices(struct pci_dev *pdev)
> > return -ENOMEM;
> >
> > new_dev->pdev = pdev;
> > + init_rwsem(&new_dev->driver_access);
> > new_dev->cap_offset = off;
> > new_dev->use_irq = use_irq;
> >
> > @@ -682,6 +694,13 @@ static int cxl_pci_create_doe_devices(struct pci_dev *pdev)
> > adev);
> > if (rc)
> > return rc;
> > +
> > + if (device_attach(&adev->dev) != 1) {
> > + dev_err(&adev->dev,
> > + "Failed to attach a driver to DOE device %d\n",
> > + adev->id);
> > + return -ENODEV;
> > + }
>
> Can you add a comment on why this has to be the case at this point. Why can't
> the driver come along later?

Yes I will.

I've not really liked this aspect of the auxiliary device arch. But I've not
convinced myself that it will work ok to leave this till later.

Putting the CDAT retry on a timer might be a better option but it would also
mean searching for the proper DOE would need to be delayed as well...

For now I'll put in a comment.

Thanks for the review!
Ira

>
> > }
> >
> > return 0;
> > @@ -785,6 +804,7 @@ static struct pci_driver cxl_pci_driver = {
> > },
> > };
>
> ...
>

2022-04-30 19:12:11

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH V8 04/10] cxl/pci: Create auxiliary devices for each DOE mailbox

On Fri, Apr 29, 2022 at 9:39 AM Jonathan Cameron
<[email protected]> wrote:
>
> On Thu, 28 Apr 2022 14:09:38 -0700
> [email protected] wrote:
>
> > On Wed, Apr 27, 2022 at 06:19:42PM +0100, Jonathan Cameron wrote:
> > > On Thu, 14 Apr 2022 13:32:31 -0700
> > > [email protected] wrote:
> > >
> > > > From: Ira Weiny <[email protected]>
> > > >
> > > > CXL kernel drivers optionally need to access DOE mailbox capabilities.
> > > > Access to mailboxes for things such as CDAT, SPDM, and IDE are needed by
> > > > the kernel while other access is designed towards user space usage. An
> > > > example of this is for CXL Compliance Testing (see CXL 2.0 14.16.4
> > > > Compliance Mode DOE) which offers a mechanism to set different test
> > > > modes for a device.
> > > >
> > > > There is no anticipated need for the kernel to share an individual
> > > > mailbox with user space. Thus developing an interface to marshal access
> > > > between the kernel and user space for a single mailbox is unnecessary
> > > > overhead. However, having the kernel relinquish some mailboxes to be
> > > > controlled by user space is a reasonable compromise to share access to
> > > > the device.
> > > >
> > > > The auxiliary bus provides an elegant solution for this. Each DOE
> > > > capability is given its own auxiliary device. This device is controlled
> > > > by a kernel driver by default which restricts access to the mailbox.
> > > > Unbinding the driver from a single auxiliary device (DOE mailbox
> > > > capability) frees the mailbox for user space access. This architecture
> > > > also allows a clear picture on which mailboxes are kernel controlled vs
> > > > not.
> > > >
> > > > Iterate each DOE mailbox capability and create auxiliary bus devices.
> > > > Follow on patches will define a driver for the newly created devices.
> > > >
> > > > sysfs shows the devices.
> > > >
> > > > $ ls -l /sys/bus/auxiliary/devices/
> > > > total 0
> > > > lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.0 -> ../../../devices/pci0000:bf/0000:bf:00.0/0000:c0:00.0/cxl_pci.doe.0
> > > > lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.1 -> ../../../devices/pci0000:bf/0000:bf:01.0/0000:c1:00.0/cxl_pci.doe.1
> > > > lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.2 -> ../../../devices/pci0000:35/0000:35:00.0/0000:36:00.0/cxl_pci.doe.2
> > > > lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.3 -> ../../../devices/pci0000:35/0000:35:01.0/0000:37:00.0/cxl_pci.doe.3
> > > > lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.4 -> ../../../devices/pci0000:35/0000:35:00.0/0000:36:00.0/cxl_pci.doe.4
> > > > lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.5 -> ../../../devices/pci0000:bf/0000:bf:00.0/0000:c0:00.0/cxl_pci.doe.5
> > > > lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.6 -> ../../../devices/pci0000:35/0000:35:01.0/0000:37:00.0/cxl_pci.doe.6
> > > > lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.7 -> ../../../devices/pci0000:bf/0000:bf:01.0/0000:c1:00.0/cxl_pci.doe.7
> > > >
> > > > Signed-off-by: Ira Weiny <[email protected]>
> > >
> > > I'm not 100% happy with effectively having one solution for CXL
> > > and probably a different one for DOEs on switch ports
> > > (which I just hacked into a port service driver to convince
> > > myself there was at least one plausible way of doing that) but if
> > > this effectively separates the two discussions then I guess I can
> > > live with it for now ;)
> >
> > I took some time this morning to mull this over and talk to Dan...
> >
> > :-(
> >
> > Truthfully the aux driver does very little except provide a way for admins to
> > trigger the driver to stop/start accessing the Mailbox.
> >
> > I suppose a simple sysfs interface could be done to do the same?
> >
> > I'll let Dan weigh in here.
>
> I wonder if best short term option is to not provide a means of
> removing it at all (separate from the PCI driver that is).
> Then we can take our time to decide on the interface if we ever
> get much demand for one.
>
> >
> > >
> > > Once this is merged we can start the discussion about how to
> > > handle switch ports with DOEs both for CDAT and SPDM.
> >
> > I'm ok with that too. However, I was thinking that this was not a user ABI.
> > But it really is. If user space starts writing script to unbind drivers and
> > then we drop the aux driver support it will break them...
> >
> > >
> > > I'll send out an RFC that is so hideous it will get people to
> > > suggestion how to do it better!
> >
> > I think I'd like to see that.
>
> Fair enough. It may muddy the waters a bit :( I'll send an RFC
> next week. I've not looked at how the CXL region code etc would
> actually get to the latency / bandwidth info from the driver yet
> it just goes as far as reading a CDAT length. I also want to actually
> hook up some decent switch CDAT emulation in the QEMU code
> (right now it's giving the same default table as for a type 3 device).
>
> I just hope we don't bikeshed around the RFC in a fashion that slows
> this series moving forwards.

I think we have time in the sense that the worst that happens is that
tooling picks the wrong CFMWS to dynamically create a region and the
performance ends up being sub-optimal. That's tolerable to work around
in userspace in the near term. I want to get some wider confidence in
the DOE ABI with respect to the known protocols and what to do about
the vendor-specific protocols that may conflict and will be driven
from userspace issued config-cycles. That likely means that no DOE ABI
is the best ABI to start which means not moving forward with
aux-devices so scripts do not become attached to something that is not
fully committed to being carried forward.

I still want to refresh the request_config_region() support for at
least having the kernel warn on userspace conflicting configuration
writes to config areas claimed by a driver.

2022-05-02 07:59:40

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH V8 04/10] cxl/pci: Create auxiliary devices for each DOE mailbox

On Thu, 14 Apr 2022 13:32:31 -0700
[email protected] wrote:

> From: Ira Weiny <[email protected]>
>
> CXL kernel drivers optionally need to access DOE mailbox capabilities.
> Access to mailboxes for things such as CDAT, SPDM, and IDE are needed by
> the kernel while other access is designed towards user space usage. An
> example of this is for CXL Compliance Testing (see CXL 2.0 14.16.4
> Compliance Mode DOE) which offers a mechanism to set different test
> modes for a device.
>
> There is no anticipated need for the kernel to share an individual
> mailbox with user space. Thus developing an interface to marshal access
> between the kernel and user space for a single mailbox is unnecessary
> overhead. However, having the kernel relinquish some mailboxes to be
> controlled by user space is a reasonable compromise to share access to
> the device.
>
> The auxiliary bus provides an elegant solution for this. Each DOE
> capability is given its own auxiliary device. This device is controlled
> by a kernel driver by default which restricts access to the mailbox.
> Unbinding the driver from a single auxiliary device (DOE mailbox
> capability) frees the mailbox for user space access. This architecture
> also allows a clear picture on which mailboxes are kernel controlled vs
> not.
>
> Iterate each DOE mailbox capability and create auxiliary bus devices.
> Follow on patches will define a driver for the newly created devices.
>
> sysfs shows the devices.
>
> $ ls -l /sys/bus/auxiliary/devices/
> total 0
> lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.0 -> ../../../devices/pci0000:bf/0000:bf:00.0/0000:c0:00.0/cxl_pci.doe.0
> lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.1 -> ../../../devices/pci0000:bf/0000:bf:01.0/0000:c1:00.0/cxl_pci.doe.1
> lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.2 -> ../../../devices/pci0000:35/0000:35:00.0/0000:36:00.0/cxl_pci.doe.2
> lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.3 -> ../../../devices/pci0000:35/0000:35:01.0/0000:37:00.0/cxl_pci.doe.3
> lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.4 -> ../../../devices/pci0000:35/0000:35:00.0/0000:36:00.0/cxl_pci.doe.4
> lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.5 -> ../../../devices/pci0000:bf/0000:bf:00.0/0000:c0:00.0/cxl_pci.doe.5
> lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.6 -> ../../../devices/pci0000:35/0000:35:01.0/0000:37:00.0/cxl_pci.doe.6
> lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.7 -> ../../../devices/pci0000:bf/0000:bf:01.0/0000:c1:00.0/cxl_pci.doe.7
>
> Signed-off-by: Ira Weiny <[email protected]>

oops. I just replied to v7. The point still stands though.
Repeated briefly below.

>
> ---
> Changes from V7:
> Minor code clean ups
> Rebased on cxl-pending
>
> Changes from V6:
> Move all the auxiliary device stuff to the CXL layer
>
> Changes from V5:
> Split the CXL specific stuff off from the PCI DOE create
> auxiliary device code.
> ---
> drivers/cxl/Kconfig | 1 +
> drivers/cxl/cxlpci.h | 21 +++++++
> drivers/cxl/pci.c | 127 +++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 149 insertions(+)
>
> diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> index f64e3984689f..ac0f5ca95431 100644
> --- a/drivers/cxl/Kconfig
> +++ b/drivers/cxl/Kconfig
> @@ -16,6 +16,7 @@ if CXL_BUS
> config CXL_PCI
> tristate "PCI manageability"
> default CXL_BUS
> + select AUXILIARY_BUS
> help
> The CXL specification defines a "CXL memory device" sub-class in the
> PCI "memory controller" base class of devices. Device's identified by
> diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
> index 329e7ea3f36a..2ad8715173ce 100644
> --- a/drivers/cxl/cxlpci.h
> +++ b/drivers/cxl/cxlpci.h
> @@ -2,6 +2,7 @@
> /* Copyright(c) 2020 Intel Corporation. All rights reserved. */
> #ifndef __CXL_PCI_H__
> #define __CXL_PCI_H__
> +#include <linux/auxiliary_bus.h>
> #include <linux/pci.h>
> #include "cxl.h"
>
> @@ -72,4 +73,24 @@ static inline resource_size_t cxl_regmap_to_base(struct pci_dev *pdev,
> }
>
> int devm_cxl_port_enumerate_dports(struct cxl_port *port);
> +
> +/**
> + * struct cxl_doe_dev - CXL DOE auxiliary bus device
> + *
> + * @adev: Auxiliary bus device
> + * @pdev: PCI device this belongs to
> + * @cap_offset: Capability offset
> + * @use_irq: Set if IRQs are to be used with this mailbox
> + *
> + * This represents a single DOE mailbox device. CXL devices should create this
> + * device and register it on the Auxiliary bus for the CXL DOE driver to drive.
> + */
> +struct cxl_doe_dev {
> + struct auxiliary_device adev;
> + struct pci_dev *pdev;
> + int cap_offset;
> + bool use_irq;
> +};
> +#define DOE_DEV_NAME "doe"
> +
> #endif /* __CXL_PCI_H__ */
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index e7ab9a34d718..41a6f3eb0a5c 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -8,6 +8,7 @@
> #include <linux/mutex.h>
> #include <linux/list.h>
> #include <linux/pci.h>
> +#include <linux/pci-doe.h>
> #include <linux/io.h>
> #include "cxlmem.h"
> #include "cxlpci.h"
> @@ -564,6 +565,128 @@ static void cxl_dvsec_ranges(struct cxl_dev_state *cxlds)
> info->ranges = __cxl_dvsec_ranges(cxlds, info);
> }
>
> +static void cxl_pci_free_irq_vectors(void *data)
> +{
> + pci_free_irq_vectors(data);
> +}
> +
> +static DEFINE_IDA(pci_doe_adev_ida);
> +
> +static void cxl_pci_doe_dev_release(struct device *dev)
> +{
> + struct auxiliary_device *adev = container_of(dev,
> + struct auxiliary_device,
> + dev);
> + struct cxl_doe_dev *doe_dev = container_of(adev, struct cxl_doe_dev,
> + adev);
> +
> + ida_free(&pci_doe_adev_ida, adev->id);
> + kfree(doe_dev);
> +}
> +
> +static void cxl_pci_doe_destroy_device(void *ad)
> +{
> + auxiliary_device_delete(ad);
> + auxiliary_device_uninit(ad);
> +}
> +
> +/**
> + * cxl_pci_create_doe_devices - Create auxiliary bus DOE devices for all DOE
> + * mailboxes found
> + *
> + * @pci_dev: The PCI device to scan for DOE mailboxes
> + *
> + * There is no coresponding destroy of these devices. This function associates
> + * the DOE auxiliary devices created with the pci_dev passed in. That
> + * association is device managed (devm_*) such that the DOE auxiliary device
> + * lifetime is always less than or equal to the lifetime of the pci_dev.
> + *
> + * RETURNS: 0 on success -ERRNO on failure.
> + */
> +static int cxl_pci_create_doe_devices(struct pci_dev *pdev)
> +{
> + struct device *dev = &pdev->dev;
> + bool use_irq = true;
> + int irqs = 0;
> + u16 off = 0;
> + int rc;
> +
> + pci_doe_for_each_off(pdev, off)
> + irqs++;
I believe this is insufficient because there may be other irqs in use
on the device. In a similar fashion to that done in pcie/portdrv_core.c
I think we need to instead find the maximum msi/msix vector number
by reading the config space. Then we request one more vector
than that max value to ensure the vector we need has been requested.

Jonathan

> + pci_info(pdev, "Found %d DOE mailbox's\n", irqs);
> +
> + /*
> + * Allocate enough vectors for the DOE's
> + */
> + rc = pci_alloc_irq_vectors(pdev, irqs, irqs, PCI_IRQ_MSI |
> + PCI_IRQ_MSIX);
> + if (rc != irqs) {
> + pci_err(pdev,
> + "Not enough interrupts for all the DOEs; use polling\n");
> + use_irq = false;
> + /* Some got allocated; clean them up */
> + if (rc > 0)
> + cxl_pci_free_irq_vectors(pdev);
> + } else {
> + /*
> + * Enabling bus mastering is require for MSI/MSIx. It could be
> + * done later within the DOE initialization, but as it
> + * potentially has other impacts keep it here when setting up
> + * the IRQ's.
> + */
> + pci_set_master(pdev);
> + rc = devm_add_action_or_reset(dev,
> + cxl_pci_free_irq_vectors,
> + pdev);
> + if (rc)
> + return rc;
> + }
> +
> + pci_doe_for_each_off(pdev, off) {
> + struct auxiliary_device *adev;
> + struct cxl_doe_dev *new_dev;
> + int id;
> +
> + new_dev = kzalloc(sizeof(*new_dev), GFP_KERNEL);
> + if (!new_dev)
> + return -ENOMEM;
> +
> + new_dev->pdev = pdev;
> + new_dev->cap_offset = off;
> + new_dev->use_irq = use_irq;
> +
> + /* Set up struct auxiliary_device */
> + adev = &new_dev->adev;
> + id = ida_alloc(&pci_doe_adev_ida, GFP_KERNEL);
> + if (id < 0) {
> + kfree(new_dev);
> + return -ENOMEM;
> + }
> +
> + adev->id = id;
> + adev->name = DOE_DEV_NAME;
> + adev->dev.release = cxl_pci_doe_dev_release;
> + adev->dev.parent = dev;
> +
> + if (auxiliary_device_init(adev)) {
> + cxl_pci_doe_dev_release(&adev->dev);
> + return -EIO;
> + }
> +
> + if (auxiliary_device_add(adev)) {
> + auxiliary_device_uninit(adev);
> + return -EIO;
> + }
> +
> + rc = devm_add_action_or_reset(dev, cxl_pci_doe_destroy_device,
> + adev);
> + if (rc)
> + return rc;
> + }
> +
> + return 0;
> +}
> +
> static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> {
> struct cxl_register_map map;
> @@ -630,6 +753,10 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> if (rc)
> return rc;
>
> + rc = cxl_pci_create_doe_devices(pdev);
> + if (rc)
> + return rc;
> +
> cxl_dvsec_ranges(cxlds);
>
> cxlmd = devm_cxl_add_memdev(cxlds);

2022-05-02 12:55:26

by Ira Weiny

[permalink] [raw]
Subject: Re: [PATCH V8 04/10] cxl/pci: Create auxiliary devices for each DOE mailbox

On Wed, Apr 27, 2022 at 06:19:42PM +0100, Jonathan Cameron wrote:
> On Thu, 14 Apr 2022 13:32:31 -0700
> [email protected] wrote:
>
> > From: Ira Weiny <[email protected]>
> >
> > CXL kernel drivers optionally need to access DOE mailbox capabilities.
> > Access to mailboxes for things such as CDAT, SPDM, and IDE are needed by
> > the kernel while other access is designed towards user space usage. An
> > example of this is for CXL Compliance Testing (see CXL 2.0 14.16.4
> > Compliance Mode DOE) which offers a mechanism to set different test
> > modes for a device.
> >
> > There is no anticipated need for the kernel to share an individual
> > mailbox with user space. Thus developing an interface to marshal access
> > between the kernel and user space for a single mailbox is unnecessary
> > overhead. However, having the kernel relinquish some mailboxes to be
> > controlled by user space is a reasonable compromise to share access to
> > the device.
> >
> > The auxiliary bus provides an elegant solution for this. Each DOE
> > capability is given its own auxiliary device. This device is controlled
> > by a kernel driver by default which restricts access to the mailbox.
> > Unbinding the driver from a single auxiliary device (DOE mailbox
> > capability) frees the mailbox for user space access. This architecture
> > also allows a clear picture on which mailboxes are kernel controlled vs
> > not.
> >
> > Iterate each DOE mailbox capability and create auxiliary bus devices.
> > Follow on patches will define a driver for the newly created devices.
> >
> > sysfs shows the devices.
> >
> > $ ls -l /sys/bus/auxiliary/devices/
> > total 0
> > lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.0 -> ../../../devices/pci0000:bf/0000:bf:00.0/0000:c0:00.0/cxl_pci.doe.0
> > lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.1 -> ../../../devices/pci0000:bf/0000:bf:01.0/0000:c1:00.0/cxl_pci.doe.1
> > lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.2 -> ../../../devices/pci0000:35/0000:35:00.0/0000:36:00.0/cxl_pci.doe.2
> > lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.3 -> ../../../devices/pci0000:35/0000:35:01.0/0000:37:00.0/cxl_pci.doe.3
> > lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.4 -> ../../../devices/pci0000:35/0000:35:00.0/0000:36:00.0/cxl_pci.doe.4
> > lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.5 -> ../../../devices/pci0000:bf/0000:bf:00.0/0000:c0:00.0/cxl_pci.doe.5
> > lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.6 -> ../../../devices/pci0000:35/0000:35:01.0/0000:37:00.0/cxl_pci.doe.6
> > lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.7 -> ../../../devices/pci0000:bf/0000:bf:01.0/0000:c1:00.0/cxl_pci.doe.7
> >
> > Signed-off-by: Ira Weiny <[email protected]>
>
> I'm not 100% happy with effectively having one solution for CXL
> and probably a different one for DOEs on switch ports
> (which I just hacked into a port service driver to convince
> myself there was at least one plausible way of doing that) but if
> this effectively separates the two discussions then I guess I can
> live with it for now ;)

I took some time this morning to mull this over and talk to Dan...

:-(

Truthfully the aux driver does very little except provide a way for admins to
trigger the driver to stop/start accessing the Mailbox.

I suppose a simple sysfs interface could be done to do the same?

I'll let Dan weigh in here.

>
> Once this is merged we can start the discussion about how to
> handle switch ports with DOEs both for CDAT and SPDM.

I'm ok with that too. However, I was thinking that this was not a user ABI.
But it really is. If user space starts writing script to unbind drivers and
then we drop the aux driver support it will break them...

>
> I'll send out an RFC that is so hideous it will get people to
> suggestion how to do it better!

I think I'd like to see that.

> Currently it starts and
> stops the mailbox 3 times in the registration path and I think
> it's more luck than judgement that is landing me with the right
> MSI.
>
> Anyhow, this looks good to me.
>
> Reviewed-by: Jonathan Cameron <[email protected]>

Thanks!

The good news is that the main support for the algorithm is now just part of
the pci core as library functions. I think this discussion is a pretty
light lift to move the calls to those around...

Ira

>
>
> >
> > ---
> > Changes from V7:
> > Minor code clean ups
> > Rebased on cxl-pending
> >
> > Changes from V6:
> > Move all the auxiliary device stuff to the CXL layer
> >
> > Changes from V5:
> > Split the CXL specific stuff off from the PCI DOE create
> > auxiliary device code.
> > ---
> > drivers/cxl/Kconfig | 1 +
> > drivers/cxl/cxlpci.h | 21 +++++++
> > drivers/cxl/pci.c | 127 +++++++++++++++++++++++++++++++++++++++++++
> > 3 files changed, 149 insertions(+)
> >
> > diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> > index f64e3984689f..ac0f5ca95431 100644
> > --- a/drivers/cxl/Kconfig
> > +++ b/drivers/cxl/Kconfig
> > @@ -16,6 +16,7 @@ if CXL_BUS
> > config CXL_PCI
> > tristate "PCI manageability"
> > default CXL_BUS
> > + select AUXILIARY_BUS
> > help
> > The CXL specification defines a "CXL memory device" sub-class in the
> > PCI "memory controller" base class of devices. Device's identified by
> > diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
> > index 329e7ea3f36a..2ad8715173ce 100644
> > --- a/drivers/cxl/cxlpci.h
> > +++ b/drivers/cxl/cxlpci.h
> > @@ -2,6 +2,7 @@
> > /* Copyright(c) 2020 Intel Corporation. All rights reserved. */
> > #ifndef __CXL_PCI_H__
> > #define __CXL_PCI_H__
> > +#include <linux/auxiliary_bus.h>
> > #include <linux/pci.h>
> > #include "cxl.h"
> >
> > @@ -72,4 +73,24 @@ static inline resource_size_t cxl_regmap_to_base(struct pci_dev *pdev,
> > }
> >
> > int devm_cxl_port_enumerate_dports(struct cxl_port *port);
> > +
> > +/**
> > + * struct cxl_doe_dev - CXL DOE auxiliary bus device
> > + *
> > + * @adev: Auxiliary bus device
> > + * @pdev: PCI device this belongs to
> > + * @cap_offset: Capability offset
> > + * @use_irq: Set if IRQs are to be used with this mailbox
> > + *
> > + * This represents a single DOE mailbox device. CXL devices should create this
> > + * device and register it on the Auxiliary bus for the CXL DOE driver to drive.
> > + */
> > +struct cxl_doe_dev {
> > + struct auxiliary_device adev;
> > + struct pci_dev *pdev;
> > + int cap_offset;
> > + bool use_irq;
> > +};
> > +#define DOE_DEV_NAME "doe"
> > +
> > #endif /* __CXL_PCI_H__ */
> > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> > index e7ab9a34d718..41a6f3eb0a5c 100644
> > --- a/drivers/cxl/pci.c
> > +++ b/drivers/cxl/pci.c
> > @@ -8,6 +8,7 @@
> > #include <linux/mutex.h>
> > #include <linux/list.h>
> > #include <linux/pci.h>
> > +#include <linux/pci-doe.h>
> > #include <linux/io.h>
> > #include "cxlmem.h"
> > #include "cxlpci.h"
> > @@ -564,6 +565,128 @@ static void cxl_dvsec_ranges(struct cxl_dev_state *cxlds)
> > info->ranges = __cxl_dvsec_ranges(cxlds, info);
> > }
> >
> > +static void cxl_pci_free_irq_vectors(void *data)
> > +{
> > + pci_free_irq_vectors(data);
> > +}
> > +
> > +static DEFINE_IDA(pci_doe_adev_ida);
> > +
> > +static void cxl_pci_doe_dev_release(struct device *dev)
> > +{
> > + struct auxiliary_device *adev = container_of(dev,
> > + struct auxiliary_device,
> > + dev);
> > + struct cxl_doe_dev *doe_dev = container_of(adev, struct cxl_doe_dev,
> > + adev);
> > +
> > + ida_free(&pci_doe_adev_ida, adev->id);
> > + kfree(doe_dev);
> > +}
> > +
> > +static void cxl_pci_doe_destroy_device(void *ad)
> > +{
> > + auxiliary_device_delete(ad);
> > + auxiliary_device_uninit(ad);
> > +}
> > +
> > +/**
> > + * cxl_pci_create_doe_devices - Create auxiliary bus DOE devices for all DOE
> > + * mailboxes found
> > + *
> > + * @pci_dev: The PCI device to scan for DOE mailboxes
> > + *
> > + * There is no coresponding destroy of these devices. This function associates
> > + * the DOE auxiliary devices created with the pci_dev passed in. That
> > + * association is device managed (devm_*) such that the DOE auxiliary device
> > + * lifetime is always less than or equal to the lifetime of the pci_dev.
> > + *
> > + * RETURNS: 0 on success -ERRNO on failure.
> > + */
> > +static int cxl_pci_create_doe_devices(struct pci_dev *pdev)
> > +{
> > + struct device *dev = &pdev->dev;
> > + bool use_irq = true;
> > + int irqs = 0;
> > + u16 off = 0;
> > + int rc;
> > +
> > + pci_doe_for_each_off(pdev, off)
> > + irqs++;
> > + pci_info(pdev, "Found %d DOE mailbox's\n", irqs);
> > +
> > + /*
> > + * Allocate enough vectors for the DOE's
> > + */
> > + rc = pci_alloc_irq_vectors(pdev, irqs, irqs, PCI_IRQ_MSI |
> > + PCI_IRQ_MSIX);
> > + if (rc != irqs) {
> > + pci_err(pdev,
> > + "Not enough interrupts for all the DOEs; use polling\n");
> > + use_irq = false;
> > + /* Some got allocated; clean them up */
> > + if (rc > 0)
> > + cxl_pci_free_irq_vectors(pdev);
> > + } else {
> > + /*
> > + * Enabling bus mastering is require for MSI/MSIx. It could be
> > + * done later within the DOE initialization, but as it
> > + * potentially has other impacts keep it here when setting up
> > + * the IRQ's.
> > + */
> > + pci_set_master(pdev);
> > + rc = devm_add_action_or_reset(dev,
> > + cxl_pci_free_irq_vectors,
> > + pdev);
> > + if (rc)
> > + return rc;
> > + }
> > +
> > + pci_doe_for_each_off(pdev, off) {
> > + struct auxiliary_device *adev;
> > + struct cxl_doe_dev *new_dev;
> > + int id;
> > +
> > + new_dev = kzalloc(sizeof(*new_dev), GFP_KERNEL);
> > + if (!new_dev)
> > + return -ENOMEM;
> > +
> > + new_dev->pdev = pdev;
> > + new_dev->cap_offset = off;
> > + new_dev->use_irq = use_irq;
> > +
> > + /* Set up struct auxiliary_device */
> > + adev = &new_dev->adev;
> > + id = ida_alloc(&pci_doe_adev_ida, GFP_KERNEL);
> > + if (id < 0) {
> > + kfree(new_dev);
> > + return -ENOMEM;
> > + }
> > +
> > + adev->id = id;
> > + adev->name = DOE_DEV_NAME;
> > + adev->dev.release = cxl_pci_doe_dev_release;
> > + adev->dev.parent = dev;
> > +
> > + if (auxiliary_device_init(adev)) {
> > + cxl_pci_doe_dev_release(&adev->dev);
> > + return -EIO;
> > + }
> > +
> > + if (auxiliary_device_add(adev)) {
> > + auxiliary_device_uninit(adev);
> > + return -EIO;
> > + }
> > +
> > + rc = devm_add_action_or_reset(dev, cxl_pci_doe_destroy_device,
> > + adev);
> > + if (rc)
> > + return rc;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> > {
> > struct cxl_register_map map;
> > @@ -630,6 +753,10 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> > if (rc)
> > return rc;
> >
> > + rc = cxl_pci_create_doe_devices(pdev);
> > + if (rc)
> > + return rc;
> > +
> > cxl_dvsec_ranges(cxlds);
> >
> > cxlmd = devm_cxl_add_memdev(cxlds);
>

2022-05-02 18:31:54

by Ira Weiny

[permalink] [raw]
Subject: Re: [PATCH V8 04/10] cxl/pci: Create auxiliary devices for each DOE mailbox

On Fri, Apr 29, 2022 at 04:55:02PM +0100, Jonathan Cameron wrote:
> On Thu, 14 Apr 2022 13:32:31 -0700
> [email protected] wrote:
>
> > From: Ira Weiny <[email protected]>
> >

[snip]

> > +
> > +/**
> > + * cxl_pci_create_doe_devices - Create auxiliary bus DOE devices for all DOE
> > + * mailboxes found
> > + *
> > + * @pci_dev: The PCI device to scan for DOE mailboxes
> > + *
> > + * There is no coresponding destroy of these devices. This function associates
> > + * the DOE auxiliary devices created with the pci_dev passed in. That
> > + * association is device managed (devm_*) such that the DOE auxiliary device
> > + * lifetime is always less than or equal to the lifetime of the pci_dev.
> > + *
> > + * RETURNS: 0 on success -ERRNO on failure.
> > + */
> > +static int cxl_pci_create_doe_devices(struct pci_dev *pdev)
> > +{
> > + struct device *dev = &pdev->dev;
> > + bool use_irq = true;
> > + int irqs = 0;
> > + u16 off = 0;
> > + int rc;
> > +
> > + pci_doe_for_each_off(pdev, off)
> > + irqs++;
> I believe this is insufficient because there may be other irqs in use
> on the device.

I did not think that was true for any current CXL device. From what I could
tell all CXL devices would be covered by this simple calculation. I left it to
the reader to determine if a new CXL device came along which needed other irq's
to lift this somewhere to cover those allocations. I probably should have made
some comment. Sorry.

> In a similar fashion to that done in pcie/portdrv_core.c
> I think we need to instead find the maximum msi/msix vector number
> by reading the config space.

I was not aware I could do that...

> Then we request one more vector
> than that max value to ensure the vector we need has been requested.

Yea at some point I figured this would be lifted out of here as part of a
larger 'allocate all the vectors for the device' function.

But for now this is the only place that needs irqs so I punted. I can lift
this into something like

cxl_pci_alloc_irq_vectors(...) and then pass use_irq here.

But to move this series forward I would propose that
cxl_pci_alloc_irq_vectors() do what I'm doing here for now.

Ira

>
> Jonathan
>
> > + pci_info(pdev, "Found %d DOE mailbox's\n", irqs);
> > +
> > + /*
> > + * Allocate enough vectors for the DOE's
> > + */
> > + rc = pci_alloc_irq_vectors(pdev, irqs, irqs, PCI_IRQ_MSI |
> > + PCI_IRQ_MSIX);
> > + if (rc != irqs) {
> > + pci_err(pdev,
> > + "Not enough interrupts for all the DOEs; use polling\n");
> > + use_irq = false;
> > + /* Some got allocated; clean them up */
> > + if (rc > 0)
> > + cxl_pci_free_irq_vectors(pdev);
> > + } else {
> > + /*
> > + * Enabling bus mastering is require for MSI/MSIx. It could be
> > + * done later within the DOE initialization, but as it
> > + * potentially has other impacts keep it here when setting up
> > + * the IRQ's.
> > + */
> > + pci_set_master(pdev);
> > + rc = devm_add_action_or_reset(dev,
> > + cxl_pci_free_irq_vectors,
> > + pdev);
> > + if (rc)
> > + return rc;
> > + }
> > +
> > + pci_doe_for_each_off(pdev, off) {
> > + struct auxiliary_device *adev;
> > + struct cxl_doe_dev *new_dev;
> > + int id;
> > +
> > + new_dev = kzalloc(sizeof(*new_dev), GFP_KERNEL);
> > + if (!new_dev)
> > + return -ENOMEM;
> > +
> > + new_dev->pdev = pdev;
> > + new_dev->cap_offset = off;
> > + new_dev->use_irq = use_irq;
> > +
> > + /* Set up struct auxiliary_device */
> > + adev = &new_dev->adev;
> > + id = ida_alloc(&pci_doe_adev_ida, GFP_KERNEL);
> > + if (id < 0) {
> > + kfree(new_dev);
> > + return -ENOMEM;
> > + }
> > +
> > + adev->id = id;
> > + adev->name = DOE_DEV_NAME;
> > + adev->dev.release = cxl_pci_doe_dev_release;
> > + adev->dev.parent = dev;
> > +
> > + if (auxiliary_device_init(adev)) {
> > + cxl_pci_doe_dev_release(&adev->dev);
> > + return -EIO;
> > + }
> > +
> > + if (auxiliary_device_add(adev)) {
> > + auxiliary_device_uninit(adev);
> > + return -EIO;
> > + }
> > +
> > + rc = devm_add_action_or_reset(dev, cxl_pci_doe_destroy_device,
> > + adev);
> > + if (rc)
> > + return rc;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> > {
> > struct cxl_register_map map;
> > @@ -630,6 +753,10 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> > if (rc)
> > return rc;
> >
> > + rc = cxl_pci_create_doe_devices(pdev);
> > + if (rc)
> > + return rc;
> > +
> > cxl_dvsec_ranges(cxlds);
> >
> > cxlmd = devm_cxl_add_memdev(cxlds);
>

2022-05-04 01:50:06

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH V8 04/10] cxl/pci: Create auxiliary devices for each DOE mailbox

On Fri, 29 Apr 2022 10:01:09 -0700
Dan Williams <[email protected]> wrote:

> On Fri, Apr 29, 2022 at 9:39 AM Jonathan Cameron
> <[email protected]> wrote:
> >
> > On Thu, 28 Apr 2022 14:09:38 -0700
> > [email protected] wrote:
> >
> > > On Wed, Apr 27, 2022 at 06:19:42PM +0100, Jonathan Cameron wrote:
> > > > On Thu, 14 Apr 2022 13:32:31 -0700
> > > > [email protected] wrote:
> > > >
> > > > > From: Ira Weiny <[email protected]>
> > > > >
> > > > > CXL kernel drivers optionally need to access DOE mailbox capabilities.
> > > > > Access to mailboxes for things such as CDAT, SPDM, and IDE are needed by
> > > > > the kernel while other access is designed towards user space usage. An
> > > > > example of this is for CXL Compliance Testing (see CXL 2.0 14.16.4
> > > > > Compliance Mode DOE) which offers a mechanism to set different test
> > > > > modes for a device.
> > > > >
> > > > > There is no anticipated need for the kernel to share an individual
> > > > > mailbox with user space. Thus developing an interface to marshal access
> > > > > between the kernel and user space for a single mailbox is unnecessary
> > > > > overhead. However, having the kernel relinquish some mailboxes to be
> > > > > controlled by user space is a reasonable compromise to share access to
> > > > > the device.
> > > > >
> > > > > The auxiliary bus provides an elegant solution for this. Each DOE
> > > > > capability is given its own auxiliary device. This device is controlled
> > > > > by a kernel driver by default which restricts access to the mailbox.
> > > > > Unbinding the driver from a single auxiliary device (DOE mailbox
> > > > > capability) frees the mailbox for user space access. This architecture
> > > > > also allows a clear picture on which mailboxes are kernel controlled vs
> > > > > not.
> > > > >
> > > > > Iterate each DOE mailbox capability and create auxiliary bus devices.
> > > > > Follow on patches will define a driver for the newly created devices.
> > > > >
> > > > > sysfs shows the devices.
> > > > >
> > > > > $ ls -l /sys/bus/auxiliary/devices/
> > > > > total 0
> > > > > lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.0 -> ../../../devices/pci0000:bf/0000:bf:00.0/0000:c0:00.0/cxl_pci.doe.0
> > > > > lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.1 -> ../../../devices/pci0000:bf/0000:bf:01.0/0000:c1:00.0/cxl_pci.doe.1
> > > > > lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.2 -> ../../../devices/pci0000:35/0000:35:00.0/0000:36:00.0/cxl_pci.doe.2
> > > > > lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.3 -> ../../../devices/pci0000:35/0000:35:01.0/0000:37:00.0/cxl_pci.doe.3
> > > > > lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.4 -> ../../../devices/pci0000:35/0000:35:00.0/0000:36:00.0/cxl_pci.doe.4
> > > > > lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.5 -> ../../../devices/pci0000:bf/0000:bf:00.0/0000:c0:00.0/cxl_pci.doe.5
> > > > > lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.6 -> ../../../devices/pci0000:35/0000:35:01.0/0000:37:00.0/cxl_pci.doe.6
> > > > > lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.7 -> ../../../devices/pci0000:bf/0000:bf:01.0/0000:c1:00.0/cxl_pci.doe.7
> > > > >
> > > > > Signed-off-by: Ira Weiny <[email protected]>
> > > >
> > > > I'm not 100% happy with effectively having one solution for CXL
> > > > and probably a different one for DOEs on switch ports
> > > > (which I just hacked into a port service driver to convince
> > > > myself there was at least one plausible way of doing that) but if
> > > > this effectively separates the two discussions then I guess I can
> > > > live with it for now ;)
> > >
> > > I took some time this morning to mull this over and talk to Dan...
> > >
> > > :-(
> > >
> > > Truthfully the aux driver does very little except provide a way for admins to
> > > trigger the driver to stop/start accessing the Mailbox.
> > >
> > > I suppose a simple sysfs interface could be done to do the same?
> > >
> > > I'll let Dan weigh in here.
> >
> > I wonder if best short term option is to not provide a means of
> > removing it at all (separate from the PCI driver that is).
> > Then we can take our time to decide on the interface if we ever
> > get much demand for one.
> >
> > >
> > > >
> > > > Once this is merged we can start the discussion about how to
> > > > handle switch ports with DOEs both for CDAT and SPDM.
> > >
> > > I'm ok with that too. However, I was thinking that this was not a user ABI.
> > > But it really is. If user space starts writing script to unbind drivers and
> > > then we drop the aux driver support it will break them...
> > >
> > > >
> > > > I'll send out an RFC that is so hideous it will get people to
> > > > suggestion how to do it better!
> > >
> > > I think I'd like to see that.
> >
> > Fair enough. It may muddy the waters a bit :( I'll send an RFC
> > next week. I've not looked at how the CXL region code etc would
> > actually get to the latency / bandwidth info from the driver yet
> > it just goes as far as reading a CDAT length. I also want to actually
> > hook up some decent switch CDAT emulation in the QEMU code
> > (right now it's giving the same default table as for a type 3 device).
> >
> > I just hope we don't bikeshed around the RFC in a fashion that slows
> > this series moving forwards.
>
> I think we have time in the sense that the worst that happens is that
> tooling picks the wrong CFMWS to dynamically create a region and the
> performance ends up being sub-optimal. That's tolerable to work around
> in userspace in the near term. I want to get some wider confidence in
> the DOE ABI with respect to the known protocols and what to do about
> the vendor-specific protocols that may conflict and will be driven
> from userspace issued config-cycles.
Hi Dan,

Ok, though I would like to try to push the conversation forwards beyond where
we got to last year. IIRC the general conclusion was that all protocols
sharing a DOE instance (and more generally because they all share
discovery) must be mediated by the kernel and that we would not be
supporting a generic access interface (there was one in a much earlier
version of this patch set). Whilst it's far from the only thing that needs
resolving, this DOE question is also a blocker on getting anywhere with
CMA/SPDM. Unbinding a driver as the means to stop the kernel accessing
a DOE is reliant on the host driver not deciding to do any more protocol
discovery - we can probably rely on that but it's not pretty.

> That likely means that no DOE ABI
> is the best ABI to start which means not moving forward with
> aux-devices so scripts do not become attached to something that is not
> fully committed to being carried forward.

This isn't really DOE ABI we are currently discussing, it's a CXL subsystem ABI.
If we decided to only expose the DOE as an internal implementation
detail (calls made directly from appropriate existing CXL driver) then
there wouldn't be an ABI question. We would be limiting DOE access
for other protocols but personally I don't see that as a problem
in the short to medium term.

Things may be more 'interesting' for the PCIe port services driver though
(see RFC just sent out)
https://lore.kernel.org/linux-cxl/[email protected]/T/#t
Perhaps if we can resolve what that should look like, the CXL EP side
of things will be much easier to figure out?

>
> I still want to refresh the request_config_region() support for at
> least having the kernel warn on userspace conflicting configuration
> writes to config areas claimed by a driver.

Great, that seems like a sensible step to do in parallel.

Jonathan


2022-05-04 15:16:16

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH V8 04/10] cxl/pci: Create auxiliary devices for each DOE mailbox

On Fri, 29 Apr 2022 10:20:54 -0700
Ira Weiny <[email protected]> wrote:

> On Fri, Apr 29, 2022 at 04:55:02PM +0100, Jonathan Cameron wrote:
> > On Thu, 14 Apr 2022 13:32:31 -0700
> > [email protected] wrote:
> >
> > > From: Ira Weiny <[email protected]>
> > >
>
> [snip]
>
> > > +
> > > +/**
> > > + * cxl_pci_create_doe_devices - Create auxiliary bus DOE devices for all DOE
> > > + * mailboxes found
> > > + *
> > > + * @pci_dev: The PCI device to scan for DOE mailboxes
> > > + *
> > > + * There is no coresponding destroy of these devices. This function associates
> > > + * the DOE auxiliary devices created with the pci_dev passed in. That
> > > + * association is device managed (devm_*) such that the DOE auxiliary device
> > > + * lifetime is always less than or equal to the lifetime of the pci_dev.
> > > + *
> > > + * RETURNS: 0 on success -ERRNO on failure.
> > > + */
> > > +static int cxl_pci_create_doe_devices(struct pci_dev *pdev)
> > > +{
> > > + struct device *dev = &pdev->dev;
> > > + bool use_irq = true;
> > > + int irqs = 0;
> > > + u16 off = 0;
> > > + int rc;
> > > +
> > > + pci_doe_for_each_off(pdev, off)
> > > + irqs++;
> > I believe this is insufficient because there may be other irqs in use
> > on the device.
>
> I did not think that was true for any current CXL device. From what I could
> tell all CXL devices would be covered by this simple calculation. I left it to
> the reader to determine if a new CXL device came along which needed other irq's
> to lift this somewhere to cover those allocations. I probably should have made
> some comment. Sorry.
>
> > In a similar fashion to that done in pcie/portdrv_core.c
> > I think we need to instead find the maximum msi/msix vector number
> > by reading the config space.
>
> I was not aware I could do that...
>
> > Then we request one more vector
> > than that max value to ensure the vector we need has been requested.
>
> Yea at some point I figured this would be lifted out of here as part of a
> larger 'allocate all the vectors for the device' function.
>
> But for now this is the only place that needs irqs so I punted. I can lift
> this into something like
>
> cxl_pci_alloc_irq_vectors(...) and then pass use_irq here.
>
> But to move this series forward I would propose that
> cxl_pci_alloc_irq_vectors() do what I'm doing here for now.

Handling this right is pretty simple code e.g. equivalent of
https://elixir.bootlin.com/linux/latest/source/drivers/pci/pcie/portdrv_core.c#L62
So my inclination would be to fix it now rather than leaving chance
of some odd failures later.

Jonathan

>
> Ira
>
> >
> > Jonathan
> >
> > > + pci_info(pdev, "Found %d DOE mailbox's\n", irqs);
> > > +
> > > + /*
> > > + * Allocate enough vectors for the DOE's
> > > + */
> > > + rc = pci_alloc_irq_vectors(pdev, irqs, irqs, PCI_IRQ_MSI |
> > > + PCI_IRQ_MSIX);
> > > + if (rc != irqs) {
> > > + pci_err(pdev,
> > > + "Not enough interrupts for all the DOEs; use polling\n");
> > > + use_irq = false;
> > > + /* Some got allocated; clean them up */
> > > + if (rc > 0)
> > > + cxl_pci_free_irq_vectors(pdev);
> > > + } else {
> > > + /*
> > > + * Enabling bus mastering is require for MSI/MSIx. It could be
> > > + * done later within the DOE initialization, but as it
> > > + * potentially has other impacts keep it here when setting up
> > > + * the IRQ's.
> > > + */
> > > + pci_set_master(pdev);
> > > + rc = devm_add_action_or_reset(dev,
> > > + cxl_pci_free_irq_vectors,
> > > + pdev);
> > > + if (rc)
> > > + return rc;
> > > + }
> > > +
> > > + pci_doe_for_each_off(pdev, off) {
> > > + struct auxiliary_device *adev;
> > > + struct cxl_doe_dev *new_dev;
> > > + int id;
> > > +
> > > + new_dev = kzalloc(sizeof(*new_dev), GFP_KERNEL);
> > > + if (!new_dev)
> > > + return -ENOMEM;
> > > +
> > > + new_dev->pdev = pdev;
> > > + new_dev->cap_offset = off;
> > > + new_dev->use_irq = use_irq;
> > > +
> > > + /* Set up struct auxiliary_device */
> > > + adev = &new_dev->adev;
> > > + id = ida_alloc(&pci_doe_adev_ida, GFP_KERNEL);
> > > + if (id < 0) {
> > > + kfree(new_dev);
> > > + return -ENOMEM;
> > > + }
> > > +
> > > + adev->id = id;
> > > + adev->name = DOE_DEV_NAME;
> > > + adev->dev.release = cxl_pci_doe_dev_release;
> > > + adev->dev.parent = dev;
> > > +
> > > + if (auxiliary_device_init(adev)) {
> > > + cxl_pci_doe_dev_release(&adev->dev);
> > > + return -EIO;
> > > + }
> > > +
> > > + if (auxiliary_device_add(adev)) {
> > > + auxiliary_device_uninit(adev);
> > > + return -EIO;
> > > + }
> > > +
> > > + rc = devm_add_action_or_reset(dev, cxl_pci_doe_destroy_device,
> > > + adev);
> > > + if (rc)
> > > + return rc;
> > > + }
> > > +
> > > + return 0;
> > > +}
> > > +
> > > static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> > > {
> > > struct cxl_register_map map;
> > > @@ -630,6 +753,10 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> > > if (rc)
> > > return rc;
> > >
> > > + rc = cxl_pci_create_doe_devices(pdev);
> > > + if (rc)
> > > + return rc;
> > > +
> > > cxl_dvsec_ranges(cxlds);
> > >
> > > cxlmd = devm_cxl_add_memdev(cxlds);
> >


2022-05-09 21:49:11

by Ira Weiny

[permalink] [raw]
Subject: Re: [PATCH V8 06/10] cxl/pci: Find the DOE mailbox which supports CDAT

On Wed, Apr 27, 2022 at 06:49:56PM +0100, Jonathan Cameron wrote:
> On Thu, 14 Apr 2022 13:32:33 -0700
> [email protected] wrote:
>
> > From: Ira Weiny <[email protected]>
> >
> > Each CXL device may have multiple DOE mailbox capabilities and each
> > mailbox may support multiple protocols.
> >
> > Search the DOE auxiliary devices for one which supports the CDAT
> > protocol. Cache that device to be used for future queries.
> >
> > Signed-off-by: Ira Weiny <[email protected]>
>
> One question inline. With that addressed (or convincing argument why not)
> the rest looks good.
>
> Reviewed-by: Jonathan Cameron <[email protected]>

Thanks!

>
> >
> > ---
> > Changes from V7
> > Minor code clean ups
> >
> > Changes from V6
> > Adjust for aux devices being a CXL only concept
> > Update commit msg.
> > Ensure devices iterated by auxiliary_find_device() are checked
> > to be DOE devices prior to checking for the CDAT
> > protocol
> > From Ben
> > Ensure reference from auxiliary_find_device() is dropped
> > ---
> > drivers/cxl/cxl.h | 2 ++
> > drivers/cxl/cxlmem.h | 2 ++
> > drivers/cxl/pci.c | 76 +++++++++++++++++++++++++++++++++++++++++++-
> > 3 files changed, 79 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> > index 990b6670222e..50817fd2c912 100644
> > --- a/drivers/cxl/cxl.h
> > +++ b/drivers/cxl/cxl.h
> > @@ -90,6 +90,8 @@ static inline int cxl_hdm_decoder_count(u32 cap_hdr)
> > #define CXLDEV_MBOX_BG_CMD_STATUS_OFFSET 0x18
> > #define CXLDEV_MBOX_PAYLOAD_OFFSET 0x20
> >
> > +#define CXL_DOE_PROTOCOL_TABLE_ACCESS 2
> > +
> > /*
> > * Using struct_group() allows for per register-block-type helper routines,
> > * without requiring block-type agnostic code to include the prefix.
> > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> > index 7235d2f976e5..0dc6988afb30 100644
> > --- a/drivers/cxl/cxlmem.h
> > +++ b/drivers/cxl/cxlmem.h
> > @@ -168,6 +168,7 @@ struct cxl_endpoint_dvsec_info {
> > * Currently only memory devices are represented.
> > *
> > * @dev: The device associated with this CXL state
> > + * @cdat_doe: Auxiliary DOE device capabile of reading CDAT
> > * @regs: Parsed register blocks
> > * @cxl_dvsec: Offset to the PCIe device DVSEC
> > * @payload_size: Size of space for payload
> > @@ -200,6 +201,7 @@ struct cxl_endpoint_dvsec_info {
> > struct cxl_dev_state {
> > struct device *dev;
> >
> > + struct cxl_doe_dev *cdat_doe;
> > struct cxl_regs regs;
> > int cxl_dvsec;
> >
> > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> > index 0dec1f1a3f38..622cac925262 100644
> > --- a/drivers/cxl/pci.c
> > +++ b/drivers/cxl/pci.c
> > @@ -706,6 +706,80 @@ static int cxl_pci_create_doe_devices(struct pci_dev *pdev)
> > return 0;
> > }
> >
> > +static bool cxl_doe_dev_supports_prot(struct cxl_doe_dev *doe_dev, u16 vid, u16 pid)
> > +{
> > + struct cxl_doe_drv_state *doe_ds;
> > + bool ret;
> > +
> > + doe_ds = cxl_pci_doe_get_drv(doe_dev);
> > + if (!doe_ds) {
> > + cxl_pci_doe_put_drv(doe_dev);
>
> That means the get_drv has a rather unexpected side effect.
> Could you move the NULL check into the function so that we don't retain
> a reference if it fails?

Indeed. I struggled a bit with this because get_* is usually taking a
reference on the subject of the get but in this case we are really getting a
reference on the device not the driver...

I originally open coded this and the functions were 'get_device'...

:-/

I think this also may have been why the device was originally part of the
driver state.

I think this is the better way to code these access functions:

static struct cxl_doe_drv_state *cxl_pci_doe_get_drv(struct cxl_doe_dev *doe_dev)
{
struct cxl_doe_drv_state *cxl_ds;

down_read(&doe_dev->driver_access);
cxl_ds = auxiliary_get_drvdata(&doe_dev->adev);
if (!cxl_ds)
up_read(&doe_dev->driver_access);
return cxl_ds;
}

static void cxl_pci_doe_put_drv(struct cxl_doe_drv_state *cxl_ds)
{
if (!cxl_ds)
return;
up_read(&cxl_ds->doe_dev->driver_access);
}

Ira

>
> > + return false;
> > + }
> > + ret = pci_doe_supports_prot(doe_ds->doe_mb, vid, pid);
> > + cxl_pci_doe_put_drv(doe_dev);
> > + return ret;
> > +}
> > +
> > +static int cxl_match_cdat_doe_device(struct device *dev, const void *data)
> > +{
> > + const struct cxl_dev_state *cxlds = data;
> > + struct auxiliary_device *adev = to_auxiliary_dev(dev);
> > + struct cxl_doe_dev *doe_dev;
> > +
> > + /* Ensure this is a DOE device */
> > + if (strcmp(DOE_DEV_NAME, adev->name))
> > + return 0;
> > +
> > + /* Determine if this auxiliary device belongs to the cxlds */
> > + if (cxlds->dev != dev->parent)
> > + return 0;
> > +
> > + doe_dev = container_of(adev, struct cxl_doe_dev, adev);
> > +
> > + /* If it is one of ours check for the CDAT protocol */
> > + if (!cxl_doe_dev_supports_prot(doe_dev, PCI_DVSEC_VENDOR_ID_CXL,
> > + CXL_DOE_PROTOCOL_TABLE_ACCESS))
> > + return 0;
> > +
> > + return 1;
> > +}
> > +
> > +static void drop_cdat_doe_ref(void *data)
> > +{
> > + struct cxl_doe_dev *cdat_doe = data;
> > +
> > + put_device(&cdat_doe->adev.dev);
> > +}
> > +
> > +static int cxl_setup_doe_devices(struct cxl_dev_state *cxlds)
> > +{
> > + struct device *dev = cxlds->dev;
> > + struct pci_dev *pdev = to_pci_dev(dev);
> > + struct auxiliary_device *adev;
> > + int rc;
> > +
> > + rc = cxl_pci_create_doe_devices(pdev);
> > + if (rc)
> > + return rc;
> > +
> > + adev = auxiliary_find_device(NULL, cxlds, &cxl_match_cdat_doe_device);
> > +
> > + if (adev) {
> > + struct cxl_doe_dev *doe_dev = container_of(adev,
> > + struct cxl_doe_dev,
> > + adev);
> > +
> > + /* auxiliary_find_device() takes the reference */
> > + rc = devm_add_action_or_reset(dev, drop_cdat_doe_ref, doe_dev);
> > + if (rc)
> > + return rc;
> > + cxlds->cdat_doe = doe_dev;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> > {
> > struct cxl_register_map map;
> > @@ -772,7 +846,7 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> > if (rc)
> > return rc;
> >
> > - rc = cxl_pci_create_doe_devices(pdev);
> > + rc = cxl_setup_doe_devices(cxlds);
> > if (rc)
> > return rc;
> >
>