2020-11-11 05:47:55

by Ben Widawsky

Subject: [RFC PATCH 0/9] CXL 2.0 Support

Introduce support for “type-3” memory devices defined in the recently released
Compute Express Link (CXL) 2.0 specification. Specifically, these are the memory
devices defined by section 8.2.8.5 of the CXL 2.0 spec. A reference
implementation emulating these devices is being submitted to the QEMU mailing
list. A “Type-3” device is a CXL device that acts as a memory expander for
RAM or PMEM. It might be interleaved with other CXL devices in a given
physical address range.

These changes allow for foundational enumeration of CXL 2.0 memory devices. The
functionality present is:
- Initial driver bring-up
- Device enumeration and an initial sysfs representation
- Submission of a basic firmware command via the ‘mailbox’ to an emulated
  memory device with non-volatile capacity

Some of the functionality that is still missing includes:
- Memory interleaving at the host bridge, root port, or switch level
- CXL 1.1 Root Complex Integrated Endpoint Support
- CXL 2.0 Hot plug support

In addition to the core functionality of discovering the spec defined registers
and resources, introduce a CXL device model that will be the foundation for
translating CXL capabilities into existing Linux infrastructure for Persistent
Memory and other memory devices. For now, this only includes support for the
management command mailbox that type-3 devices surface. These control devices
fill the role of “DIMMs” / nmemX memory-devices in LIBNVDIMM terms.

While implementing the driver, feedback on the specification was generated to
cover perceived gaps and address conflicts. The feedback is presented as a
reference implementation in the driver and QEMU emulation. Specifically, the
following concepts are original to the Linux implementation, and feedback /
collaboration is requested to develop these into specification proposals:
1. Top level ACPI object (ACPI0017)
2. HW imposed address space and interleave constraints
3. _OSC UUID A4D1629D-FF52-4888-BE96-E5CADE548DB1

ACPI0017
--------
Introduce a new ACPI namespace device with an _HID of ACPI0017. The purpose of
this object is twofold: to support a legacy OS with a set of out-of-tree CXL
modules, and to establish an attach point for a driver that knows about
interleaving. Both of these boil down to the same point: to centralize
Operating System support for resources described by the CXL Early Discovery
Table (CEDT).

The legacy OS problem stems from the spec's description of a host bridge:
ACPI0016 is denoted as the _HID for host bridges, with a _CID of PNP0A08. In a
CXL-unaware version of Linux, the core ACPI subsystem will bind a driver to
PNP0A08 and preclude a CXL-aware driver from binding to ACPI0016. An ACPI0017
device allows a standalone CXL-aware driver to register for handling /
coordinating the CEDT and CXL-specific _OSC control.

Similarly, when managing interleaving, there needs to be some management layer
above the ACPI0016 device that is capable of assembling leaf nodes into
interleave sets. Just as ACPI0012 does this central coordination for
NFIT-defined resources, ACPI0017 does the same for CEDT-described resources.

Memory Windows
--------------
For CXL.mem capable platforms, platform firmware needs a mechanism to make the
Operating System aware of any restrictions that hardware might impose in
address space. For example, in a system with 4 host bridges all participating
in an interleave set, the firmware needs to provide some description of this.
That information is missing from the CXL 2.0 spec as of today, and it is also
not implemented in the driver. A variety of ACPI based mechanisms, for example
_CRS fields on the ACPI0017 device, were considered.


CXL Exclusive _OSC
------------------
The CXL 2.0 definition provides new _OSC fields for host bridges to allow for new
services provided by CXL - error handling, hot plug, capabilities, etc. This is
built on top of PCIe _OSC via a new UUID. A CXL unaware OS will use the old UUID
to configure the PCIe host bridge. The expectation is that a CXL aware OS uses
the new UUID to modify both CXL and PCIE capabilities in one shot. The issue
arises when trying to define a standalone CXL driver. The core OS will configure
the PCIe _OSC, but when the CXL driver attempts to set CXL _OSC the current
definition makes that driver re-specify PCIE capabilities. An isolated CXL-only
_OSC allows the PCIE core to be unchanged and lets a CXL driver stack manage CXL
_OSC without the possibility of clobbering / colliding with PCIE core OSC
management. The proposal moves the new _OSC dwords (SUPC and CTRC) to their own
_OSC UUID.

Next steps after this basic foundation are expanded command support and LIBNVDIMM
integration. This is the initial “release early / release often” version of the
Linux CXL enabling.


Ben Widawsky (5):
cxl/mem: Map memory device registers
cxl/mem: Find device capabilities
cxl/mem: Initialize the mailbox interface
cxl/mem: Implement polled mode mailbox
MAINTAINERS: Add maintainers of the CXL driver

Dan Williams (2):
cxl/mem: Add a driver for the type-3 mailbox
cxl/mem: Register CXL memX devices

Vishal Verma (2):
cxl/acpi: Add an acpi_cxl module for the CXL interconnect
cxl/acpi: add OSC support

MAINTAINERS | 9 +
drivers/Kconfig | 1 +
drivers/Makefile | 1 +
drivers/cxl/Kconfig | 50 ++++
drivers/cxl/Makefile | 9 +
drivers/cxl/acpi.c | 325 ++++++++++++++++++++++
drivers/cxl/acpi.h | 33 +++
drivers/cxl/bus.c | 35 +++
drivers/cxl/bus.h | 8 +
drivers/cxl/cxl.h | 166 +++++++++++
drivers/cxl/mem.c | 631 ++++++++++++++++++++++++++++++++++++++++++
drivers/cxl/pci.h | 21 ++
include/acpi/actbl1.h | 52 ++++
13 files changed, 1341 insertions(+)
create mode 100644 drivers/cxl/Kconfig
create mode 100644 drivers/cxl/Makefile
create mode 100644 drivers/cxl/acpi.c
create mode 100644 drivers/cxl/acpi.h
create mode 100644 drivers/cxl/bus.c
create mode 100644 drivers/cxl/bus.h
create mode 100644 drivers/cxl/cxl.h
create mode 100644 drivers/cxl/mem.c
create mode 100644 drivers/cxl/pci.h

--
2.29.2


2020-11-11 05:47:57

by Ben Widawsky

Subject: [RFC PATCH 5/9] cxl/mem: Find device capabilities

CXL devices contain an array of capabilities that describe the
interactions software can interact with the device, or firmware running
on the device. A CXL compliant device must implement the device status
and the mailbox capability. A CXL compliant memory device must implement
the memory device capability.

Each of the capabilities can [will] provide an offset within the MMIO
region for interacting with the CXL device.

Signed-off-by: Ben Widawsky <[email protected]>
---
drivers/cxl/cxl.h | 89 +++++++++++++++++++++++++++++++++++++++++++++++
drivers/cxl/mem.c | 58 +++++++++++++++++++++++++++---
2 files changed, 143 insertions(+), 4 deletions(-)
create mode 100644 drivers/cxl/cxl.h

diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
new file mode 100644
index 000000000000..02858ae63d6d
--- /dev/null
+++ b/drivers/cxl/cxl.h
@@ -0,0 +1,89 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright(c) 2020 Intel Corporation. All rights reserved.
+
+#ifndef __CXL_H__
+#define __CXL_H__
+
+/* Device */
+#define CXLDEV_CAP_ARRAY_REG 0x0
+#define CXLDEV_CAP_ARRAY_CAP_ID 0
+#define CXLDEV_CAP_ARRAY_ID(x) ((x) & 0xffff)
+#define CXLDEV_CAP_ARRAY_COUNT(x) (((x) >> 32) & 0xffff)
+
+#define CXL_CAPABILITIES_CAP_ID_DEVICE_STATUS 1
+#define CXL_CAPABILITIES_CAP_ID_PRIMARY_MAILBOX 2
+#define CXL_CAPABILITIES_CAP_ID_SECONDARY_MAILBOX 3
+#define CXL_CAPABILITIES_CAP_ID_MEMDEV 0x4000
+
+/* Mailbox */
+#define CXLDEV_MB_CAPS 0x00
+#define CXLDEV_MB_CAP_PAYLOAD_SIZE(cap) ((cap) & 0x1F)
+#define CXLDEV_MB_CTRL 0x04
+#define CXLDEV_MB_CMD 0x08
+#define CXLDEV_MB_STATUS 0x10
+#define CXLDEV_MB_BG_CMD_STATUS 0x18
+
+struct cxl_mem {
+ struct pci_dev *pdev;
+ void __iomem *regs;
+
+ /* Cap 0000h */
+ struct {
+ void __iomem *regs;
+ } status;
+
+ /* Cap 0002h */
+ struct {
+ void __iomem *regs;
+ size_t payload_size;
+ } mbox;
+
+ /* Cap 0040h */
+ struct {
+ void __iomem *regs;
+ } mem;
+};
+
+#define cxl_reg(type) \
+ static inline void cxl_write_##type##_reg32(struct cxl_mem *cxlm, \
+ u32 reg, u32 value) \
+ { \
+ void __iomem *reg_addr = READ_ONCE(cxlm->type.regs); \
+ writel(value, reg_addr + reg); \
+ } \
+ static inline void cxl_write_##type##_reg64(struct cxl_mem *cxlm, \
+ u32 reg, u64 value) \
+ { \
+ void __iomem *reg_addr = READ_ONCE(cxlm->type.regs); \
+ writeq(value, reg_addr + reg); \
+ } \
+ static inline u32 cxl_read_##type##_reg32(struct cxl_mem *cxlm, \
+ u32 reg) \
+ { \
+ void __iomem *reg_addr = READ_ONCE(cxlm->type.regs); \
+ return readl(reg_addr + reg); \
+ } \
+ static inline u64 cxl_read_##type##_reg64(struct cxl_mem *cxlm, \
+ u32 reg) \
+ { \
+ void __iomem *reg_addr = READ_ONCE(cxlm->type.regs); \
+ return readq(reg_addr + reg); \
+ }
+
+cxl_reg(status)
+cxl_reg(mbox)
+
+static inline u32 __cxl_raw_read_reg32(struct cxl_mem *cxlm, u32 reg)
+{
+ void __iomem *reg_addr = READ_ONCE(cxlm->regs);
+
+ return readl(reg_addr + reg);
+}
+
+static inline u64 __cxl_raw_read_reg64(struct cxl_mem *cxlm, u32 reg)
+{
+ void __iomem *reg_addr = READ_ONCE(cxlm->regs);
+
+ return readq(reg_addr + reg);
+}
+#endif /* __CXL_H__ */
diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
index 8d9b9ab6c5ea..4109ef7c3ecb 100644
--- a/drivers/cxl/mem.c
+++ b/drivers/cxl/mem.c
@@ -5,11 +5,57 @@
#include <linux/io.h>
#include "acpi.h"
#include "pci.h"
+#include "cxl.h"

-struct cxl_mem {
- struct pci_dev *pdev;
- void __iomem *regs;
-};
+static int cxl_mem_setup_regs(struct cxl_mem *cxlm)
+{
+ u64 cap_array;
+ int cap;
+
+ cap_array = __cxl_raw_read_reg64(cxlm, CXLDEV_CAP_ARRAY_REG);
+ if (CXLDEV_CAP_ARRAY_ID(cap_array) != CXLDEV_CAP_ARRAY_CAP_ID)
+ return -ENODEV;
+
+ for (cap = 1; cap <= CXLDEV_CAP_ARRAY_COUNT(cap_array); cap++) {
+ void *__iomem register_block;
+ u32 offset;
+ u16 cap_id;
+
+ cap_id = __cxl_raw_read_reg32(cxlm, cap * 0x10) & 0xffff;
+ offset = __cxl_raw_read_reg32(cxlm, cap * 0x10 + 0x4);
+ register_block = cxlm->regs + offset;
+
+ switch (cap_id) {
+ case CXL_CAPABILITIES_CAP_ID_DEVICE_STATUS:
+ dev_dbg(&cxlm->pdev->dev, "found Status capability\n");
+ cxlm->status.regs = register_block;
+ break;
+ case CXL_CAPABILITIES_CAP_ID_PRIMARY_MAILBOX:
+ dev_dbg(&cxlm->pdev->dev,
+ "found Mailbox capability\n");
+ cxlm->mbox.regs = register_block;
+ cxlm->mbox.payload_size = CXLDEV_MB_CAP_PAYLOAD_SIZE(cap_id);
+ break;
+ case CXL_CAPABILITIES_CAP_ID_SECONDARY_MAILBOX:
+ dev_dbg(&cxlm->pdev->dev,
+ "found UNSUPPORTED Secondary Mailbox capability\n");
+ break;
+ case CXL_CAPABILITIES_CAP_ID_MEMDEV:
+ dev_dbg(&cxlm->pdev->dev,
+ "found Memory Device capability\n");
+ cxlm->mem.regs = register_block;
+ break;
+ default:
+ dev_err(&cxlm->pdev->dev, "Unknown cap ID: %d\n", cap_id);
+ return -ENXIO;
+ }
+ }
+
+ if (!cxlm->status.regs || !cxlm->mbox.regs || !cxlm->mem.regs)
+ return -ENXIO;
+
+ return 0;
+}

static struct cxl_mem *cxl_mem_create(struct pci_dev *pdev, u32 reg_lo, u32 reg_hi)
{
@@ -110,6 +156,10 @@ static int cxl_mem_probe(struct pci_dev *pdev, const struct pci_device_id *id)
if (IS_ERR(cxlm))
return -ENXIO;

+ rc = cxl_mem_setup_regs(cxlm);
+ if (rc)
+ return rc;
+
pci_set_drvdata(pdev, cxlm);

return 0;
--
2.29.2

2020-11-11 22:08:32

by Ben Widawsky


2020-11-12 02:15:39

by Bjorn Helgaas

Subject: Re: [RFC PATCH 0/9] CXL 2.0 Support

On Tue, Nov 10, 2020 at 09:43:47PM -0800, Ben Widawsky wrote:
> ...
> Ben Widawsky (5):
> cxl/mem: Map memory device registers
> cxl/mem: Find device capabilities
> cxl/mem: Initialize the mailbox interface
> cxl/mem: Implement polled mode mailbox
> MAINTAINERS: Add maintainers of the CXL driver
>
> Dan Williams (2):
> cxl/mem: Add a driver for the type-3 mailbox

To include important words first and use "Type 3" as in spec:

cxl/mem: Add Type 3 mailbox driver

> cxl/mem: Register CXL memX devices
>
> Vishal Verma (2):
> cxl/acpi: Add an acpi_cxl module for the CXL interconnect
> cxl/acpi: add OSC support

For consistency:

cxl/acpi: Add _OSC support

It's conventional in drivers/acpi and drivers/pci to capitalize the
"ACPI" and "PCI" initialisms except in actual C code. Seems like
you're mostly doing the same with "CXL", except in the subject lines
above. Since you're making a new directory, I guess you get to
choose.

I use "PCIe" (not "PCIE" or "PCI-E"; you have a mix) because that
seems to be the official abbreviation.

2020-11-13 18:28:16

by Bjorn Helgaas

Subject: Re: [RFC PATCH 5/9] cxl/mem: Find device capabilities

On Tue, Nov 10, 2020 at 09:43:52PM -0800, Ben Widawsky wrote:
> CXL devices contain an array of capabilities that describe the
> interactions software can interact with the device, or firmware running
> on the device. A CXL compliant device must implement the device status
> and the mailbox capability. A CXL compliant memory device must implement
> the memory device capability.
>
> Each of the capabilities can [will] provide an offset within the MMIO
> region for interacting with the CXL device.
>
> Signed-off-by: Ben Widawsky <[email protected]>
> ---
> drivers/cxl/cxl.h | 89 +++++++++++++++++++++++++++++++++++++++++++++++
> drivers/cxl/mem.c | 58 +++++++++++++++++++++++++++---
> 2 files changed, 143 insertions(+), 4 deletions(-)
> create mode 100644 drivers/cxl/cxl.h
>
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> new file mode 100644
> index 000000000000..02858ae63d6d
> --- /dev/null
> +++ b/drivers/cxl/cxl.h
> @@ -0,0 +1,89 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +// Copyright(c) 2020 Intel Corporation. All rights reserved.

Fix comment usage (I think SPDX in .h needs "/* */")

> +#ifndef __CXL_H__
> +#define __CXL_H__
> +
> +/* Device */
> +#define CXLDEV_CAP_ARRAY_REG 0x0
> +#define CXLDEV_CAP_ARRAY_CAP_ID 0
> +#define CXLDEV_CAP_ARRAY_ID(x) ((x) & 0xffff)
> +#define CXLDEV_CAP_ARRAY_COUNT(x) (((x) >> 32) & 0xffff)
> +
> +#define CXL_CAPABILITIES_CAP_ID_DEVICE_STATUS 1
> +#define CXL_CAPABILITIES_CAP_ID_PRIMARY_MAILBOX 2
> +#define CXL_CAPABILITIES_CAP_ID_SECONDARY_MAILBOX 3
> +#define CXL_CAPABILITIES_CAP_ID_MEMDEV 0x4000

Strange that the first three are decimal and the last is hex.

> +/* Mailbox */
> +#define CXLDEV_MB_CAPS 0x00
> +#define CXLDEV_MB_CAP_PAYLOAD_SIZE(cap) ((cap) & 0x1F)

Use upper- or lower-case hex consistently. Add tabs to line things
up.

> +#define CXLDEV_MB_CTRL 0x04
> +#define CXLDEV_MB_CMD 0x08
> +#define CXLDEV_MB_STATUS 0x10
> +#define CXLDEV_MB_BG_CMD_STATUS 0x18
> +
> +struct cxl_mem {
> + struct pci_dev *pdev;
> + void __iomem *regs;
> +
> + /* Cap 0000h */
> + struct {
> + void __iomem *regs;
> + } status;
> +
> + /* Cap 0002h */
> + struct {
> + void __iomem *regs;
> + size_t payload_size;
> + } mbox;
> +
> + /* Cap 0040h */
> + struct {
> + void __iomem *regs;
> + } mem;
> +};

Maybe a note about why READ_ONCE() is required?

> +#define cxl_reg(type) \
> + static inline void cxl_write_##type##_reg32(struct cxl_mem *cxlm, \
> + u32 reg, u32 value) \
> + { \
> + void __iomem *reg_addr = READ_ONCE(cxlm->type.regs); \
> + writel(value, reg_addr + reg); \
> + } \
> + static inline void cxl_write_##type##_reg64(struct cxl_mem *cxlm, \
> + u32 reg, u64 value) \
> + { \
> + void __iomem *reg_addr = READ_ONCE(cxlm->type.regs); \
> + writeq(value, reg_addr + reg); \
> + } \
> + static inline u32 cxl_read_##type##_reg32(struct cxl_mem *cxlm, \
> + u32 reg) \
> + { \
> + void __iomem *reg_addr = READ_ONCE(cxlm->type.regs); \
> + return readl(reg_addr + reg); \
> + } \
> + static inline u64 cxl_read_##type##_reg64(struct cxl_mem *cxlm, \
> + u32 reg) \
> + { \
> + void __iomem *reg_addr = READ_ONCE(cxlm->type.regs); \
> + return readq(reg_addr + reg); \
> + }
> +
> +cxl_reg(status)
> +cxl_reg(mbox)
> +
> +static inline u32 __cxl_raw_read_reg32(struct cxl_mem *cxlm, u32 reg)
> +{
> + void __iomem *reg_addr = READ_ONCE(cxlm->regs);
> +
> + return readl(reg_addr + reg);
> +}
> +
> +static inline u64 __cxl_raw_read_reg64(struct cxl_mem *cxlm, u32 reg)
> +{
> + void __iomem *reg_addr = READ_ONCE(cxlm->regs);
> +
> + return readq(reg_addr + reg);
> +}

Are the "__" prefixes here to leave space for something else in the
future? "__" typically means something like "raw", so right now it
sort of reads like "raw cxl raw read". So if you don't *need* the
"__" prefix, I'd drop it.

> +#endif /* __CXL_H__ */
> diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> index 8d9b9ab6c5ea..4109ef7c3ecb 100644
> --- a/drivers/cxl/mem.c
> +++ b/drivers/cxl/mem.c
> @@ -5,11 +5,57 @@
> #include <linux/io.h>
> #include "acpi.h"
> #include "pci.h"
> +#include "cxl.h"
>
> -struct cxl_mem {
> - struct pci_dev *pdev;
> - void __iomem *regs;
> -};

Probably nicer if you put "struct cxl_mem" in its ultimate destination
(drivers/cxl/cxl.h) from the beginning. Then it's easier to see what
this patch adds because it's not moving at the same time.

> +static int cxl_mem_setup_regs(struct cxl_mem *cxlm)
> +{
> + u64 cap_array;
> + int cap;
> +
> + cap_array = __cxl_raw_read_reg64(cxlm, CXLDEV_CAP_ARRAY_REG);
> + if (CXLDEV_CAP_ARRAY_ID(cap_array) != CXLDEV_CAP_ARRAY_CAP_ID)
> + return -ENODEV;
> +
> + for (cap = 1; cap <= CXLDEV_CAP_ARRAY_COUNT(cap_array); cap++) {
> + void *__iomem register_block;
> + u32 offset;
> + u16 cap_id;
> +
> + cap_id = __cxl_raw_read_reg32(cxlm, cap * 0x10) & 0xffff;
> + offset = __cxl_raw_read_reg32(cxlm, cap * 0x10 + 0x4);
> + register_block = cxlm->regs + offset;
> +
> + switch (cap_id) {
> + case CXL_CAPABILITIES_CAP_ID_DEVICE_STATUS:
> + dev_dbg(&cxlm->pdev->dev, "found Status capability\n");

Consider including the address or offset in these messages to help
debug? Printing a completely constant string always seems like a
missed opportunity to me.

> + cxlm->status.regs = register_block;
> + break;
> + case CXL_CAPABILITIES_CAP_ID_PRIMARY_MAILBOX:
> + dev_dbg(&cxlm->pdev->dev,
> + "found Mailbox capability\n");
> + cxlm->mbox.regs = register_block;
> + cxlm->mbox.payload_size = CXLDEV_MB_CAP_PAYLOAD_SIZE(cap_id);
> + break;
> + case CXL_CAPABILITIES_CAP_ID_SECONDARY_MAILBOX:
> + dev_dbg(&cxlm->pdev->dev,
> + "found UNSUPPORTED Secondary Mailbox capability\n");
> + break;
> + case CXL_CAPABILITIES_CAP_ID_MEMDEV:
> + dev_dbg(&cxlm->pdev->dev,
> + "found Memory Device capability\n");
> + cxlm->mem.regs = register_block;
> + break;
> + default:
> + dev_err(&cxlm->pdev->dev, "Unknown cap ID: %d\n", cap_id);
> + return -ENXIO;
> + }
> + }
> +
> + if (!cxlm->status.regs || !cxlm->mbox.regs || !cxlm->mem.regs)
> + return -ENXIO;
> +
> + return 0;
> +}
>
> static struct cxl_mem *cxl_mem_create(struct pci_dev *pdev, u32 reg_lo, u32 reg_hi)
> {
> @@ -110,6 +156,10 @@ static int cxl_mem_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> if (IS_ERR(cxlm))
> return -ENXIO;
>
> + rc = cxl_mem_setup_regs(cxlm);
> + if (rc)
> + return rc;
> +
> pci_set_drvdata(pdev, cxlm);
>
> return 0;
> --
> 2.29.2
>

2020-11-14 01:40:48

by Ben Widawsky

Subject: Re: [RFC PATCH 5/9] cxl/mem: Find device capabilities

On 20-11-13 12:26:03, Bjorn Helgaas wrote:
> On Tue, Nov 10, 2020 at 09:43:52PM -0800, Ben Widawsky wrote:
> > CXL devices contain an array of capabilities that describe the
> > interactions software can interact with the device, or firmware running
> > on the device. A CXL compliant device must implement the device status
> > and the mailbox capability. A CXL compliant memory device must implement
> > the memory device capability.
> >
> > Each of the capabilities can [will] provide an offset within the MMIO
> > region for interacting with the CXL device.
> >
> > Signed-off-by: Ben Widawsky <[email protected]>
> > ---
> > drivers/cxl/cxl.h | 89 +++++++++++++++++++++++++++++++++++++++++++++++
> > drivers/cxl/mem.c | 58 +++++++++++++++++++++++++++---
> > 2 files changed, 143 insertions(+), 4 deletions(-)
> > create mode 100644 drivers/cxl/cxl.h
> >
> > diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> > new file mode 100644
> > index 000000000000..02858ae63d6d
> > --- /dev/null
> > +++ b/drivers/cxl/cxl.h
> > @@ -0,0 +1,89 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +// Copyright(c) 2020 Intel Corporation. All rights reserved.
>
> Fix comment usage (I think SPDX in .h needs "/* */")
>
> > +#ifndef __CXL_H__
> > +#define __CXL_H__
> > +
> > +/* Device */
> > +#define CXLDEV_CAP_ARRAY_REG 0x0
> > +#define CXLDEV_CAP_ARRAY_CAP_ID 0
> > +#define CXLDEV_CAP_ARRAY_ID(x) ((x) & 0xffff)
> > +#define CXLDEV_CAP_ARRAY_COUNT(x) (((x) >> 32) & 0xffff)
> > +
> > +#define CXL_CAPABILITIES_CAP_ID_DEVICE_STATUS 1
> > +#define CXL_CAPABILITIES_CAP_ID_PRIMARY_MAILBOX 2
> > +#define CXL_CAPABILITIES_CAP_ID_SECONDARY_MAILBOX 3
> > +#define CXL_CAPABILITIES_CAP_ID_MEMDEV 0x4000
>
> Strange that the first three are decimal and the last is hex.
>
> > +/* Mailbox */
> > +#define CXLDEV_MB_CAPS 0x00
> > +#define CXLDEV_MB_CAP_PAYLOAD_SIZE(cap) ((cap) & 0x1F)
>
> Use upper- or lower-case hex consistently. Add tabs to line things
> up.
>
> > +#define CXLDEV_MB_CTRL 0x04
> > +#define CXLDEV_MB_CMD 0x08
> > +#define CXLDEV_MB_STATUS 0x10
> > +#define CXLDEV_MB_BG_CMD_STATUS 0x18
> > +
> > +struct cxl_mem {
> > + struct pci_dev *pdev;
> > + void __iomem *regs;
> > +
> > + /* Cap 0000h */
> > + struct {
> > + void __iomem *regs;
> > + } status;
> > +
> > + /* Cap 0002h */
> > + struct {
> > + void __iomem *regs;
> > + size_t payload_size;
> > + } mbox;
> > +
> > + /* Cap 0040h */
> > + struct {
> > + void __iomem *regs;
> > + } mem;
> > +};
>
> Maybe a note about why READ_ONCE() is required?
>

I don't believe it's actually necessary. I will drop it.

> > +#define cxl_reg(type) \
> > + static inline void cxl_write_##type##_reg32(struct cxl_mem *cxlm, \
> > + u32 reg, u32 value) \
> > + { \
> > + void __iomem *reg_addr = READ_ONCE(cxlm->type.regs); \
> > + writel(value, reg_addr + reg); \
> > + } \
> > + static inline void cxl_write_##type##_reg64(struct cxl_mem *cxlm, \
> > + u32 reg, u64 value) \
> > + { \
> > + void __iomem *reg_addr = READ_ONCE(cxlm->type.regs); \
> > + writeq(value, reg_addr + reg); \
> > + } \
> > + static inline u32 cxl_read_##type##_reg32(struct cxl_mem *cxlm, \
> > + u32 reg) \
> > + { \
> > + void __iomem *reg_addr = READ_ONCE(cxlm->type.regs); \
> > + return readl(reg_addr + reg); \
> > + } \
> > + static inline u64 cxl_read_##type##_reg64(struct cxl_mem *cxlm, \
> > + u32 reg) \
> > + { \
> > + void __iomem *reg_addr = READ_ONCE(cxlm->type.regs); \
> > + return readq(reg_addr + reg); \
> > + }
> > +
> > +cxl_reg(status)
> > +cxl_reg(mbox)
> > +
> > +static inline u32 __cxl_raw_read_reg32(struct cxl_mem *cxlm, u32 reg)
> > +{
> > + void __iomem *reg_addr = READ_ONCE(cxlm->regs);
> > +
> > + return readl(reg_addr + reg);
> > +}
> > +
> > +static inline u64 __cxl_raw_read_reg64(struct cxl_mem *cxlm, u32 reg)
> > +{
> > + void __iomem *reg_addr = READ_ONCE(cxlm->regs);
> > +
> > + return readq(reg_addr + reg);
> > +}
>
> Are the "__" prefixes here to leave space for something else in the
> future? "__" typically means something like "raw", so right now it
> sort of reads like "raw cxl raw read". So if you don't *need* the
> "__" prefix, I'd drop it.
>

The "__" prefix is because those functions really shouldn't be used except in
early initialization and perhaps for debugfs kinds of things. I can remove the
'raw' from the name, but I do consider this a raw read as it isn't going to
read/write to any particular function of a CXL device.

So unless you're deeply offended by it, I'd like to make it

__cxl_read/write_reg64()

> > +#endif /* __CXL_H__ */
> > diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> > index 8d9b9ab6c5ea..4109ef7c3ecb 100644
> > --- a/drivers/cxl/mem.c
> > +++ b/drivers/cxl/mem.c
> > @@ -5,11 +5,57 @@
> > #include <linux/io.h>
> > #include "acpi.h"
> > #include "pci.h"
> > +#include "cxl.h"
> >
> > -struct cxl_mem {
> > - struct pci_dev *pdev;
> > - void __iomem *regs;
> > -};
>
> Probably nicer if you put "struct cxl_mem" in its ultimate destination
> (drivers/cxl/cxl.h) from the beginning. Then it's easier to see what
> this patch adds because it's not moving at the same time.
>

Yes, this is sort of the wart again of 3 of us all working on the code at the
same time. Dan, you want to squash it into yours?

> > +static int cxl_mem_setup_regs(struct cxl_mem *cxlm)
> > +{
> > + u64 cap_array;
> > + int cap;
> > +
> > + cap_array = __cxl_raw_read_reg64(cxlm, CXLDEV_CAP_ARRAY_REG);
> > + if (CXLDEV_CAP_ARRAY_ID(cap_array) != CXLDEV_CAP_ARRAY_CAP_ID)
> > + return -ENODEV;
> > +
> > + for (cap = 1; cap <= CXLDEV_CAP_ARRAY_COUNT(cap_array); cap++) {
> > + void *__iomem register_block;
> > + u32 offset;
> > + u16 cap_id;
> > +
> > + cap_id = __cxl_raw_read_reg32(cxlm, cap * 0x10) & 0xffff;
> > + offset = __cxl_raw_read_reg32(cxlm, cap * 0x10 + 0x4);
> > + register_block = cxlm->regs + offset;
> > +
> > + switch (cap_id) {
> > + case CXL_CAPABILITIES_CAP_ID_DEVICE_STATUS:
> > + dev_dbg(&cxlm->pdev->dev, "found Status capability\n");
>
> Consider including the address or offset in these messages to help
> debug? Printing a completely constant string always seems like a
> missed opportunity to me.
>

Sure. The main thing the debug message is trying to help notify is textual
versions of the caps, compared to what one might expect. I don't see offsets as
immediately useful, but they definitely do not hurt.

> > + cxlm->status.regs = register_block;
> > + break;
> > + case CXL_CAPABILITIES_CAP_ID_PRIMARY_MAILBOX:
> > + dev_dbg(&cxlm->pdev->dev,
> > + "found Mailbox capability\n");
> > + cxlm->mbox.regs = register_block;
> > + cxlm->mbox.payload_size = CXLDEV_MB_CAP_PAYLOAD_SIZE(cap_id);
> > + break;
> > + case CXL_CAPABILITIES_CAP_ID_SECONDARY_MAILBOX:
> > + dev_dbg(&cxlm->pdev->dev,
> > + "found UNSUPPORTED Secondary Mailbox capability\n");
> > + break;
> > + case CXL_CAPABILITIES_CAP_ID_MEMDEV:
> > + dev_dbg(&cxlm->pdev->dev,
> > + "found Memory Device capability\n");
> > + cxlm->mem.regs = register_block;
> > + break;
> > + default:
> > + dev_err(&cxlm->pdev->dev, "Unknown cap ID: %d\n", cap_id);
> > + return -ENXIO;
> > + }
> > + }
> > +
> > + if (!cxlm->status.regs || !cxlm->mbox.regs || !cxlm->mem.regs)
> > + return -ENXIO;
> > +
> > + return 0;
> > +}
> >
> > static struct cxl_mem *cxl_mem_create(struct pci_dev *pdev, u32 reg_lo, u32 reg_hi)
> > {
> > @@ -110,6 +156,10 @@ static int cxl_mem_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> > if (IS_ERR(cxlm))
> > return -ENXIO;
> >
> > + rc = cxl_mem_setup_regs(cxlm);
> > + if (rc)
> > + return rc;
> > +
> > pci_set_drvdata(pdev, cxlm);
> >
> > return 0;
> > --
> > 2.29.2
> >

2020-11-17 15:18:19

by Jonathan Cameron

Subject: Re: [RFC PATCH 5/9] cxl/mem: Find device capabilities

On Tue, 10 Nov 2020 21:43:52 -0800
Ben Widawsky <[email protected]> wrote:

> CXL devices contain an array of capabilities that describe the
> interactions software can interact with the device, or firmware running
> on the device. A CXL compliant device must implement the device status
> and the mailbox capability. A CXL compliant memory device must implement
> the memory device capability.
>
> Each of the capabilities can [will] provide an offset within the MMIO
> region for interacting with the CXL device.
>
> Signed-off-by: Ben Widawsky <[email protected]>

A few really minor things in this one.

Jonathan

> ---
> drivers/cxl/cxl.h | 89 +++++++++++++++++++++++++++++++++++++++++++++++
> drivers/cxl/mem.c | 58 +++++++++++++++++++++++++++---
> 2 files changed, 143 insertions(+), 4 deletions(-)
> create mode 100644 drivers/cxl/cxl.h
>
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> new file mode 100644
> index 000000000000..02858ae63d6d
> --- /dev/null
> +++ b/drivers/cxl/cxl.h
> @@ -0,0 +1,89 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +// Copyright(c) 2020 Intel Corporation. All rights reserved.
> +
> +#ifndef __CXL_H__
> +#define __CXL_H__
> +
> +/* Device */
> +#define CXLDEV_CAP_ARRAY_REG 0x0
> +#define CXLDEV_CAP_ARRAY_CAP_ID 0
> +#define CXLDEV_CAP_ARRAY_ID(x) ((x) & 0xffff)
> +#define CXLDEV_CAP_ARRAY_COUNT(x) (((x) >> 32) & 0xffff)
> +
> +#define CXL_CAPABILITIES_CAP_ID_DEVICE_STATUS 1

I'm not sure what you can do about consistent naming, but
perhaps this really does need to be
CXLDEV_CAP_CAP_ID_x however silly that looks.

It's a funny exercise I've only seen done once in a spec, but
I wish everyone would try working out what their fully expanded
field names will end up as and make sure the long form naming shortens
to something sensible. It definitely helps with clarity!

> +#define CXL_CAPABILITIES_CAP_ID_PRIMARY_MAILBOX 2
> +#define CXL_CAPABILITIES_CAP_ID_SECONDARY_MAILBOX 3
> +#define CXL_CAPABILITIES_CAP_ID_MEMDEV 0x4000
> +
> +/* Mailbox */
> +#define CXLDEV_MB_CAPS 0x00

Elsewhere you've used _OFFSET postfix. That's useful so I'd do it here
as well. Cross references to the spec section always appreciated as well!

> +#define CXLDEV_MB_CAP_PAYLOAD_SIZE(cap) ((cap) & 0x1F)
> +#define CXLDEV_MB_CTRL 0x04
> +#define CXLDEV_MB_CMD 0x08
> +#define CXLDEV_MB_STATUS 0x10
> +#define CXLDEV_MB_BG_CMD_STATUS 0x18
> +
> +struct cxl_mem {
> + struct pci_dev *pdev;
> + void __iomem *regs;
> +
> + /* Cap 0000h */

What are the numbers here? These capabilities have too
many indexes associated with them in different ways for it
to be obvious, so perhaps more detail in the comments?

> + struct {
> + void __iomem *regs;
> + } status;
> +
> + /* Cap 0002h */
> + struct {
> + void __iomem *regs;
> + size_t payload_size;
> + } mbox;
> +
> + /* Cap 0040h */
> + struct {
> + void __iomem *regs;
> + } mem;
> +};

2020-11-24 00:45:12

by Ben Widawsky

Subject: Re: [RFC PATCH 5/9] cxl/mem: Find device capabilities

On 20-11-17 15:15:19, Jonathan Cameron wrote:
> On Tue, 10 Nov 2020 21:43:52 -0800
> Ben Widawsky <[email protected]> wrote:
>
> > CXL devices contain an array of capabilities that describe the
> > interactions software can interact with the device, or firmware running
> > on the device. A CXL compliant device must implement the device status
> > and the mailbox capability. A CXL compliant memory device must implement
> > the memory device capability.
> >
> > Each of the capabilities can [will] provide an offset within the MMIO
> > region for interacting with the CXL device.
> >
> > Signed-off-by: Ben Widawsky <[email protected]>
>
> A few really minor things in this one.
>
> Jonathan
>
> > ---
> > drivers/cxl/cxl.h | 89 +++++++++++++++++++++++++++++++++++++++++++++++
> > drivers/cxl/mem.c | 58 +++++++++++++++++++++++++++---
> > 2 files changed, 143 insertions(+), 4 deletions(-)
> > create mode 100644 drivers/cxl/cxl.h
> >
> > diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> > new file mode 100644
> > index 000000000000..02858ae63d6d
> > --- /dev/null
> > +++ b/drivers/cxl/cxl.h
> > @@ -0,0 +1,89 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +// Copyright(c) 2020 Intel Corporation. All rights reserved.
> > +
> > +#ifndef __CXL_H__
> > +#define __CXL_H__
> > +
> > +/* Device */
> > +#define CXLDEV_CAP_ARRAY_REG 0x0
> > +#define CXLDEV_CAP_ARRAY_CAP_ID 0
> > +#define CXLDEV_CAP_ARRAY_ID(x) ((x) & 0xffff)
> > +#define CXLDEV_CAP_ARRAY_COUNT(x) (((x) >> 32) & 0xffff)
> > +
> > +#define CXL_CAPABILITIES_CAP_ID_DEVICE_STATUS 1
>
> I'm not sure what you can do about consistent naming, but
> perhaps this really does need to be
> CXLDEV_CAP_CAP_ID_x however silly that looks.
>
> It's a funny exercise I've only seen done once in a spec, but
> I wish everyone would try working out what their fully expanded
> field names will end up as and make sure the long form naming shortens
> to something sensible. It definitely helps with clarity!
>
> > +#define CXL_CAPABILITIES_CAP_ID_PRIMARY_MAILBOX 2
> > +#define CXL_CAPABILITIES_CAP_ID_SECONDARY_MAILBOX 3
> > +#define CXL_CAPABILITIES_CAP_ID_MEMDEV 0x4000
> > +
> > +/* Mailbox */
> > +#define CXLDEV_MB_CAPS 0x00
>
> Elsewhere you've used _OFFSET postfix. That's useful so I'd do it here
> as well. Cross references to the spec section always appreciated as well!
>
> > +#define CXLDEV_MB_CAP_PAYLOAD_SIZE(cap) ((cap) & 0x1F)
> > +#define CXLDEV_MB_CTRL 0x04
> > +#define CXLDEV_MB_CMD 0x08
> > +#define CXLDEV_MB_STATUS 0x10
> > +#define CXLDEV_MB_BG_CMD_STATUS 0x18
> > +
> > +struct cxl_mem {
> > + struct pci_dev *pdev;
> > + void __iomem *regs;
> > +
> > + /* Cap 0000h */
>
> What are the numbers here? These capabilities have too
> many indexes associated with them in different ways for it
> to be obvious, so perhaps more detail in the comments?

This comment was a bug. The cap is 0001h actually. I've added the hash define
for its cap id as part of the comment.

Everything else is accepted.

>
> > + struct {
> > + void __iomem *regs;
> > + } status;
> > +
> > + /* Cap 0002h */
> > + struct {
> > + void __iomem *regs;
> > + size_t payload_size;
> > + } mbox;
> > +
> > + /* Cap 0040h */
> > + struct {
> > + void __iomem *regs;
> > + } mem;
> > +};

2020-11-26 09:48:46

by Jon Masters

[permalink] [raw]
Subject: Re: [RFC PATCH 5/9] cxl/mem: Find device capabilities

On 11/11/20 12:43 AM, Ben Widawsky wrote:

> + case CXL_CAPABILITIES_CAP_ID_SECONDARY_MAILBOX:
> + dev_dbg(&cxlm->pdev->dev,
> + "found UNSUPPORTED Secondary Mailbox capability\n");

Per spec, the secondary mailbox is intended for use by platform
firmware, so Linux should never be using it anyway. Maybe that message
is slightly misleading?

Jon.

P.S. Related - I've severe doubts about the mailbox approach being
proposed by CXL and have begun to push back through the spec org.

--
Computer Architect

2020-11-27 08:39:09

by Ben Widawsky

[permalink] [raw]
Subject: Re: [RFC PATCH 5/9] cxl/mem: Find device capabilities

On 20-11-26 01:05:56, Jon Masters wrote:
> On 11/11/20 12:43 AM, Ben Widawsky wrote:
>
> > + case CXL_CAPABILITIES_CAP_ID_SECONDARY_MAILBOX:
> > + dev_dbg(&cxlm->pdev->dev,
> > + "found UNSUPPORTED Secondary Mailbox capability\n");
>
> Per spec, the secondary mailbox is intended for use by platform firmware, so
> Linux should never be using it anyway. Maybe that message is slightly
> misleading?

Yeah, I think the message could be reworded, but it is dev_dbg, so I wasn't too
worried about the wording in the first place. I think it is a mistake in this
case for the spec to describe the intended purpose. If the expectation is for
platform firmware to use it, but there is no negotiation mechanism in place,
it's essentially useless.

>
> Jon.
>
> P.S. Related - I've severe doubts about the mailbox approach being proposed
> by CXL and have begun to push back through the spec org.

Any reason not to articulate that here? Now that the spec is public, I don't see
any reason not to disclose that publicly.

2020-12-04 07:38:16

by Dan Williams

[permalink] [raw]
Subject: Re: [RFC PATCH 5/9] cxl/mem: Find device capabilities

On Wed, Nov 25, 2020 at 10:06 PM Jon Masters <[email protected]> wrote:
>
> On 11/11/20 12:43 AM, Ben Widawsky wrote:
>
> > + case CXL_CAPABILITIES_CAP_ID_SECONDARY_MAILBOX:
> > + dev_dbg(&cxlm->pdev->dev,
> > + "found UNSUPPORTED Secondary Mailbox capability\n");
>
> Per spec, the secondary mailbox is intended for use by platform
> firmware, so Linux should never be using it anyway. Maybe that message
> is slightly misleading?
>
> Jon.
>
> P.S. Related - I've severe doubts about the mailbox approach being
> proposed by CXL and have begun to push back through the spec org.

The more Linux software voices the better. At the same time, the spec
is already released, so we're into xkcd territory [1] regarding what the
driver will be expected to support for any future improvements.

[1]: https://xkcd.com/927/

2020-12-04 07:43:44

by Dan Williams

[permalink] [raw]
Subject: Re: [RFC PATCH 5/9] cxl/mem: Find device capabilities

On Tue, Nov 10, 2020 at 9:44 PM Ben Widawsky <[email protected]> wrote:
>
> CXL devices contain an array of capabilities that describe how
> software can interact with the device, or with firmware running
> on the device. A CXL compliant device must implement the device status
> and the mailbox capability. A CXL compliant memory device must implement
> the memory device capability.
>
> Each of the capabilities can [will] provide an offset within the MMIO
> region for interacting with the CXL device.
>
> Signed-off-by: Ben Widawsky <[email protected]>
> ---
> drivers/cxl/cxl.h | 89 +++++++++++++++++++++++++++++++++++++++++++++++
> drivers/cxl/mem.c | 58 +++++++++++++++++++++++++++---
> 2 files changed, 143 insertions(+), 4 deletions(-)
> create mode 100644 drivers/cxl/cxl.h
>
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> new file mode 100644
> index 000000000000..02858ae63d6d
> --- /dev/null
> +++ b/drivers/cxl/cxl.h
[..]
> +static inline u32 __cxl_raw_read_reg32(struct cxl_mem *cxlm, u32 reg)

Going through my reworks and the "raw" jumped out at me. My typical
interpretation of "raw" in respect to register access macros is the
difference between readl() and __raw_readl() which means "don't do
bus endian swizzling, and don't do a memory clobber barrier". Any
heartburn to drop the "raw"?

...is it only me that reacts that way?

2020-12-04 17:47:18

by Chris Browy

[permalink] [raw]
Subject: Re: [RFC PATCH 0/9] CXL 2.0 Support

Hi Ben,

Trying to bring up the environment using the latest developments as follows:

1. Linux kernel baseline version is cloned using
     git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
   Using master branch.  Merged the 9 CXL linux kernel patches manually and built kernel

2. QEMU baseline version is cloned using
     git clone https://gitlab.com/bwidawsk/qemu.git

3. UEFI baseline is cloned using
     git clone https://github.com/tianocore/edk2.git
   Using master and built

4. Now can run qemu as follows:
     The qcow2 we use is based on Ubuntu 20.10, updated with the kernel from 1) above

     QEMU command:

     sudo qemu-system-x86_64 -nic \
     user,hostfwd=tcp::2222-:22,hostfwd=tcp::1234-:1234 -machine \
     type=pc-q35-4.0,hmat=on,accel=kvm -enable-kvm -cpu host -smp \
     6,cores=6,threads=1,sockets=1 -m 8G -boot order=d -k 'en-us' -vga virtio \
     -drive file=/home/chris/Downloads/AQCXL/ubuntu_20.qcow,format=qcow2 -drive \
     if=pflash,format=raw,readonly,file=/home/chris/OVMF_CODE.fd \
     -drive if=pflash,format=raw,file=/home/chris/OVMF_VARS.fd \
     -object memory-backend-file,id=cxl-mem1,share,mem-path=/tmp/cxl-test/cxl,size=512M \
     -device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52,uid=0,len-window-base=1,\
     window-base[0]=0x4c0000000,memdev[0]=cxl-mem1 \
     -device cxl-rp,id=rp0,bus=cxl.0,addr=0.0,chassis=0,slot=0  \
     -device cxl-type3,bus=rp0,memdev=cxl-mem1,id=cxl-pmem0,size=256M  2>&1 | tee -a \
     /home/chris/Downloads/AQCXL/log/qemu.log

   The qemu options are derived from looking at the tests/qtests/cxl-test.c
   along with the -hmat=on which seemed to make sense.

   The system boots and lspci -vvv shows the CXL device is enumerated.  But
   the DOE capability register for CDAT access is missing (see below).  Otherwise the
   DVSEC registers are present.

   acpidump indicates the CXL0 and CXLM devices but no SRAT or HMAT tables are
   in the dump which is curious.


35:00.0 Memory controller [0502]: Intel Corporation Device 0d93 (rev 01) (prog-if 10)
    Subsystem: Red Hat, Inc. Device 1100
    Physical Slot: 0
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0
    Region 0: Memory at c0a00000 (64-bit, non-prefetchable) [size=64K]
    Region 2: Memory at c0a10000 (64-bit, non-prefetchable) [size=4K]
    Capabilities: [80] Express (v2) Endpoint, MSI 00
        DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
            ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
        DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
            RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
            MaxPayload 128 bytes, MaxReadReq 128 bytes
        DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
        LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s, Exit Latency L0s <64ns
            ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 2.5GT/s (ok), Width x1 (ok)
            TrErr- Train- SlotClk- DLActive+ BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Not Supported, TimeoutDis-, NROPrPrP-, LTR-
             10BitTagComp-, 10BitTagReq-, OBFF Not Supported, ExtFmt+, EETLPPrefix+, MaxEETLPPrefixes 4
               EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
             FRS-, TPHComp-, ExtTPHComp-
             AtomicOpsCap: 32bit- 64bit- 128bitCAS-
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
             AtomicOpsCtl: ReqEn-
        LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
             Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
             Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
             EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
    Capabilities: [100 v1] Designated Vendor-Specific <?>
    Capabilities: [138 v1] Designated Vendor-Specific <?>
    Kernel driver in use: cxl_mem

Questions/Comments:
-------------------
1. Linux
  a. Is there a gitlab for the linux kernel patches for CXL?  This would
     facilitate review and code modifications.

2. UEFI (edk2 from tianocore)
  a. seems to only support CXL 1.1 which means only method #1 (Device
     option ROM) of Coherent Device Attribute Table_1.02 spec
     for CDAT handling is possible now.

     Does device option ROM need to be added to QEMU CXL setup?

     Can we add a CXL 1.1 emulated device?

  b. lspci doesn’t show the existence of the DOE extended capability register
     in the CXL CT3D (needed to support method #2).  Are there more patches?

3. Do you have example user programs to share or better yet the CXL 2.0
   Sec 14.3.6.1 Application Layer/ Transaction layer test for CXL.mem?

4. What are the userspace system APIs for targeting CXL HDM address domain?
   Usually you can mmap a SPA if you know how to look it up.


Best Regards,
Chris Browy



2020-12-04 18:16:00

by Ben Widawsky

[permalink] [raw]
Subject: Re: [RFC PATCH 0/9] CXL 2.0 Support

Hi Chris.

On 20-12-04 12:40:03, Chris Browy wrote:
> Hi Ben,
>
> Trying to bring up the environment using the latest developments as follows:
>
> 1. Linux kernel baseline version is cloned using
>      git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
>    Using master branch.  Merged the 9 CXL linux kernel patches manually and built kernel
>
> 2. QEMU baseline version is cloned using
>      git clone https://gitlab.com/bwidawsk/qemu.git
>
> 3. UEFI baseline is cloned using
>      git clone https://github.com/tianocore/edk2.git
>    Using master and built
>
> 4. Now can run qemu as follows:
>      The qcow2 we use is based on Ubuntu 20.10 with updated with kernel from 1) above
>
>      QEMU command:
>
>      sudo qemu-system-x86_64 -nic \
>      user,hostfwd=tcp::2222-:22,hostfwd=tcp::1234-:1234 -machine \
>      type=pc-q35-4.0,hmat=on,accel=kvm -enable-kvm -cpu host -smp \
>      6,cores=6,threads=1,sockets=1 -m 8G -boot order=d -k 'en-us' -vga virtio \
>      -drive file=/home/chris/Downloads/AQCXL/ubuntu_20.qcow,format=qcow2 -drive \
>      if=pflash,format=raw,readonly,file=/home/chris/OVMF_CODE.fd \
>      -drive if=pflash,format=raw,file=/home/chris/OVMF_VARS.fd \
>      -object memory-backend-file,id=cxl-mem1,share,mem-path=/tmp/cxl-test/cxl,size=512M \
>      -device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52,uid=0,len-window-base=1,\
>      window-base[0]=0x4c0000000,memdev[0]=cxl-mem1 \
>      -device cxl-rp,id=rp0,bus=cxl.0,addr=0.0,chassis=0,slot=0  \
>      -device cxl-type3,bus=rp0,memdev=cxl-mem1,id=cxl-pmem0,size=256M  2>&1 | tee -a \
>      /home/chris/Downloads/AQCXL/log/qemu.log
>
>    The qemu options are derived from looking at the tests/qtests/cxl-test.c
>    along with the -hmat=on which seemed to make sense.
>
>    The system boots and lspci -vvv shows the CXL device is enumerated.  But
>    no DOE capability register for CDAT access though (see below).  Otherwise the
>    DVSEC registers are present.

DOE is not supported yet in either Linux or QEMU. For us, CDAT isn't a high
priority yet so it likely won't be done for a while. I'd really like to see DOE
support added by someone - not me - so that we can wire it up. Not sure what
that would look like in the QEMU side.

>
>    acpidump indicates the CXL0 and CXLM devices but no SRAT or HMAT tables are
>    in the dump which is curious.

I don't typically use HMAT, but I do have an SRAT in mine, so that's strange.
You should also have a CEDT.

>
>
> 35:00.0 Memory controller [0502]: Intel Corporation Device 0d93 (rev 01) (prog-if 10)
>     Subsystem: Red Hat, Inc. Device 1100
>     Physical Slot: 0
>     Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
>     Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>     Latency: 0
>     Region 0: Memory at c0a00000 (64-bit, non-prefetchable) [size=64K]
>     Region 2: Memory at c0a10000 (64-bit, non-prefetchable) [size=4K]
>     Capabilities: [80] Express (v2) Endpoint, MSI 00
>         DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
>             ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
>         DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
>             RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
>             MaxPayload 128 bytes, MaxReadReq 128 bytes
>         DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
>         LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s, Exit Latency L0s <64ns
>             ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
>         LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
>             ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>         LnkSta: Speed 2.5GT/s (ok), Width x1 (ok)
>             TrErr- Train- SlotClk- DLActive+ BWMgmt- ABWMgmt-
>         DevCap2: Completion Timeout: Not Supported, TimeoutDis-, NROPrPrP-, LTR-
>              10BitTagComp-, 10BitTagReq-, OBFF Not Supported, ExtFmt+, EETLPPrefix+, MaxEETLPPrefixes 4
>                EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
>              FRS-, TPHComp-, ExtTPHComp-
>              AtomicOpsCap: 32bit- 64bit- 128bitCAS-
>         DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
>              AtomicOpsCtl: ReqEn-
>         LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
>              Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
>              Compliance De-emphasis: -6dB
>         LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
>              EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
>     Capabilities: [100 v1] Designated Vendor-Specific <?>
>     Capabilities: [138 v1] Designated Vendor-Specific <?>
>     Kernel driver in use: cxl_mem
>
> Questions/Comments:
> -------------------
> 1. Linux
>   a. Is there a gitlab for the linux kernel patches for CXL?  This would
>      facilitate review and code modifications.

We're hopefully going to send out v2 in the next couple of days. I'll push the
repo somewhere as well.

>
> 2. UEFI (edk2 from tianocore)
>   a. seems to only support CXL 1.1 which means only method #1 (Device
>      option ROM) of Coherent Device Attribute Table_1.02 spec
>      for CDAT handling is possible now.
>
>      Does device option ROM need to be added to QEMU CXL setup?
>
>      Can we add a CXL 1.1 emulated device?

Patches welcome :-). I know of other people who want this, but I only care about
2.0+, so I have no intention to implement it.

>
>   b. lspci doesn’t show the existence of the DOE extended capability register
>      in the CXL CT3D (needed to support method #2).  Are there more patches?

As above, it's not supported. I'm hoping someone else will do that work since I
don't care about it just yet.

>
> 3. Do you have example user programs to share or better yet the CXL 2.0
>    Sec 14.3.6.1 Application Layer/ Transaction layer test for CXL.mem?
>

I don't have, mostly because I haven't actually implemented a lot of the real
CXL support. My primary concern was having the Linux driver be able to enumerate
devices and communicate with the device via the mailbox interface. v2 will
contain support for userspace to do this, which I think is a step toward what
you're asking for.

> 4. What are the userspace system APIs for targeting CXL HDM address domain?
>    Usually you can mmap a SPA if you know how to look it up.

I think Dan answered this in the other thread...

>
>
> Best Regards,
> Chris Browy
>
>
>

2020-12-04 19:28:35

by Vishal Verma

[permalink] [raw]
Subject: Re: [RFC PATCH 0/9] CXL 2.0 Support

On Fri, 2020-12-04 at 10:12 -0800, Ben Widawsky wrote:
> Hi Chris.
>
> On 20-12-04 12:40:03, Chris Browy wrote:
[..]
>
> > acpidump indicates the CXL0 and CXLM devices but no SRAT or HMAT tables are
> > in the dump which is curious.
>
> I don't typically use HMAT, but I do have an SRAT in mine, so that's strange.
> You should also have a CEDT.
>
I suspect an SRAT is only added if you have distinct numa nodes. Adding
a few '-numa node' bits to the qemu command line should be enough to
make that happen.
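
[Editor's note: a hypothetical sketch of what Vishal suggests, appended to the QEMU command line above. The ids, sizes, and cpu ranges are placeholders; the backend sizes must add up to the -m total.]

```shell
# Untested: back guest memory with two explicit backends and declare two
# NUMA nodes so that QEMU generates an SRAT for the guest.
-object memory-backend-ram,id=ram0,size=4G \
-object memory-backend-ram,id=ram1,size=4G \
-numa node,nodeid=0,cpus=0-2,memdev=ram0 \
-numa node,nodeid=1,cpus=3-5,memdev=ram1
```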

2020-12-04 19:48:09

by Dan Williams

[permalink] [raw]
Subject: Re: [RFC PATCH 0/9] CXL 2.0 Support

On Fri, Dec 4, 2020 at 11:26 AM Verma, Vishal L
<[email protected]> wrote:
>
> On Fri, 2020-12-04 at 10:12 -0800, Ben Widawsky wrote:
> > Hi Chris.
> >
> > On 20-12-04 12:40:03, Chris Browy wrote:
> [..]
> >
> > > acpidump indicates the CXL0 and CXLM devices but no SRAT or HMAT tables are
> > > in the dump which is curious.
> >
> > I don't typically use HMAT, but I do have an SRAT in mine, so that's strange.
> > You should also have a CEDT.
> >
> I suspect an SRAT is only added if you have distinct numa nodes. Adding
> a few '-numa node' bits to the qemu command line should be enough to
> make that happen.

For CXL-2.0-Type-3, BIOS is responsible for retrieving CDATs and
synthesizing SRAT/SLIT/HMAT tables for the CXL.mem that is mapped by
platform firmware. For CXL.mem that is mapped by the OS, there is no
requirement to publish updated ACPI tables. CXL.mem mapped by the OS
need only support native CXL memory enumeration and leave ACPI only
for static platform resources.

2020-12-07 04:45:48

by Chris Browy

[permalink] [raw]
Subject: Re: [RFC PATCH 0/9] CXL 2.0 Support

Hi Ben,


>On Dec 4, 2020, at 1:12 PM, Ben Widawsky <mailto:[email protected]> wrote:
>
>Hi Chris.
>
>On 20-12-04 12:40:03, Chris Browy wrote:
>
>Hi Ben,
>
>Trying to bring up the environment using the latest developments as follows:
>
>1. Linux kernel baseline version is cloned using
> git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
> Using master branch. Merged the 9 CXL linux kernel patches manually and built kernel
>
>2. QEMU baseline version is cloned using
> git clone https://gitlab.com/bwidawsk/qemu.git
>
>3. UEFI baseline is cloned using
> git clone https://github.com/tianocore/edk2.git
> Using master and built
>
>4. Now can run qemu as follows:
> The qcow2 we use is based on Ubuntu 20.10 with updated with kernel from 1) above
>
> QEMU command:
>
> sudo qemu-system-x86_64 -nic \
> user,hostfwd=tcp::2222-:22,hostfwd=tcp::1234-:1234 -machine \
> type=pc-q35-4.0,hmat=on,accel=kvm -enable-kvm -cpu host -smp \
> 6,cores=6,threads=1,sockets=1 -m 8G -boot order=d -k 'en-us' -vga virtio \
> -drive file=/home/chris/Downloads/AQCXL/ubuntu_20.qcow,format=qcow2 -drive \
> if=pflash,format=raw,readonly,file=/home/chris/OVMF_CODE.fd \
> -drive if=pflash,format=raw,file=/home/chris/OVMF_VARS.fd \
> -object memory-backend-file,id=cxl-mem1,share,mem-path=/tmp/cxl-test/cxl,size=512M \
> -device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52,uid=0,len-window-base=1,\
> window-base[0]=0x4c0000000,memdev[0]=cxl-mem1 \
> -device cxl-rp,id=rp0,bus=cxl.0,addr=0.0,chassis=0,slot=0 \
> -device cxl-type3,bus=rp0,memdev=cxl-mem1,id=cxl-pmem0,size=256M 2>&1 | tee -a \
> /home/chris/Downloads/AQCXL/log/qemu.log
>
> The qemu options are derived from looking at the tests/qtests/cxl-test.c
> along with the -hmat=on which seemed to make sense.
>
> The system boots and lspci -vvv shows the CXL device is enumerated. But
> no DOE capability register for CDAT access though (see below). Otherwise the
> DVSEC registers are present.
>
>DOE is not supported yet in either Linux or QEMU. For us, CDAT isn't a high
>priority yet so it likely won't be done for a while. I'd really like to see DOE
>support added by someone - not me - so that we can wire it up. Not sure what
>that would look like in the QEMU side.
>
>
>
> acpidump indicates the CXL0 and CXLM devices but no SRAT or HMAT tables are
> in the dump which is curious.
>
>I don't typically use HMAT, but I do have an SRAT in mine, so that's strange.
>You should also have a CEDT.
>
Could you provide the QEMU command line? I was not successful adding a numa node for cxl-mem1
and RAM. Leaving out the numa node for cxl-mem1, I can now see the SRAT table being created, but that’s
not exactly the point.

Are you using UEFI or legacy BIOS to boot?

Reproducing a known working environment is better for now; we can then deviate to try new
configurations.

>
>
>35:00.0 Memory controller [0502]: Intel Corporation Device 0d93 (rev 01) (prog-if 10)
> Subsystem: Red Hat, Inc. Device 1100
> Physical Slot: 0
> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> Latency: 0
> Region 0: Memory at c0a00000 (64-bit, non-prefetchable) [size=64K]
> Region 2: Memory at c0a10000 (64-bit, non-prefetchable) [size=4K]
> Capabilities: [80] Express (v2) Endpoint, MSI 00
> DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
> ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
> DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
> RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
> MaxPayload 128 bytes, MaxReadReq 128 bytes
> DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
> LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s, Exit Latency L0s <64ns
> ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
> LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
> ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> LnkSta: Speed 2.5GT/s (ok), Width x1 (ok)
> TrErr- Train- SlotClk- DLActive+ BWMgmt- ABWMgmt-
> DevCap2: Completion Timeout: Not Supported, TimeoutDis-, NROPrPrP-, LTR-
> 10BitTagComp-, 10BitTagReq-, OBFF Not Supported, ExtFmt+, EETLPPrefix+, MaxEETLPPrefixes 4
> EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
> FRS-, TPHComp-, ExtTPHComp-
> AtomicOpsCap: 32bit- 64bit- 128bitCAS-
> DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
> AtomicOpsCtl: ReqEn-
> LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
> Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
> Compliance De-emphasis: -6dB
> LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
> EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
> Capabilities: [100 v1] Designated Vendor-Specific <?>
> Capabilities: [138 v1] Designated Vendor-Specific <?>
> Kernel driver in use: cxl_mem
Can you check this in cxl_component_register_init_common() in QEMU's cxl-component-utils.c?


I used devmem2 to dump out the CXL Component CXL.mem registers located at BAR + 0x1000. Header
List:
04110000 CXL CAP
08010002 CXL RAS
0D820004 CXL Link
11010005 CXL HDM Decoder
00000000

I think the 1st entry, the CXL Capability Header, should be
0x03110001
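
[Editor's note: to make the mismatch concrete, a small decode of the first header DWORD. The field layout here is taken from the CXL 2.0 CXL_Capability_Header register definition; double-check it against your copy of the spec.]

```python
# Decode a CXL_Capability_Header DWORD:
#   Capability_ID [15:0], Version [19:16],
#   Cache_Mem_Version [23:20], Array_Size [31:24]
def decode_cap_header(dword):
    return {
        "id": dword & 0xFFFF,
        "version": (dword >> 16) & 0xF,
        "cache_mem_version": (dword >> 20) & 0xF,
        "array_size": (dword >> 24) & 0xFF,
    }

observed = decode_cap_header(0x04110000)  # value read via devmem2
expected = decode_cap_header(0x03110001)  # value argued for above

print(observed)  # id decodes to 0, array_size to 4 -- the reported bug
print(expected)  # id 1, array_size 3
```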


Capability_ID looks like it is set using ARRAY_FIELD_DP32

/* CXL Capability Header Register */
ARRAY_FIELD_DP32(reg_state, CXL_CAPABILITY_HEADER, ID, 1);
ARRAY_FIELD_DP32(reg_state, CXL_CAPABILITY_HEADER, VERSION, 1);
ARRAY_FIELD_DP32(reg_state, CXL_CAPABILITY_HEADER, CACHE_MEM_VERSION, 1);
ARRAY_FIELD_DP32(reg_state, CXL_CAPABILITY_HEADER, ARRAY_SIZE, caps);


But the value reported for ID is ‘0’.

Since only RAS, Link, and HDM headers are found, the Array_Size should be ‘3’
according to the spec:

Array_Size: This defines the number of elements present in
the CXL_Capability array, not including the
CXL_Capability_Header element. Each element is 1 DWORD in
size and is located contiguous with previous elements.


‘caps’ should be one less than what is calculated in the switch statement

void cxl_component_register_init_common(uint32_t *reg_state, enum reg_type type)
{
    int caps = 0;
    switch (type) {
    case CXL2_DOWNSTREAM_PORT:
    case CXL2_DEVICE:
        /* CAP, RAS, Link */
        caps = 3;
        break;
    case CXL2_UPSTREAM_PORT:
    case CXL2_TYPE3_DEVICE:
    case CXL2_LOGICAL_DEVICE:
        /* + HDM */
        caps = 4;
        break;
    case CXL2_ROOT_PORT:
        /* + Extended Security, + Snoop */
        caps = 6;
        break;
    default:
        abort();
    }


>
>Questions/Comments:
>-------------------
>1. Linux
> a. Is there a gitlab for the linux kernel patches for CXL? This would
> facilitate review and code modifications.
>
>We're hopefully going to send out v2 in the next couple of days. I'll push the
>repo somewhere as well.
>
That’s great!

>
>
>
>2. UEFI (edk2 from tianocore)
> a. seems to only support CXL 1.1 which means only method #1 (Device
> option ROM) of Coherent Device Attribute Table_1.02 spec
> for CDAT handling is possible now.
>
> Does device option ROM need to be added to QEMU CXL setup?
>
> Can we add a CXL 1.1 emulated device?
>
>Patches welcome :-). I know of other people who want this, but I only care about
>2.0+, so I have no intention to implement it.
Can you summarize how the System Physical Address (SPA) gets assigned to the
CXL pmem if not by the UEFI or CXL driver methods?

Is it some alternate method using the host bridge window-base and then
cxl_realize() in cxl_type3.c?

-device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52,uid=0,len-window-base=1, \
window-base[0]=0x4c0000000,memdev[0]=cxl-mem1 \

The QEMU command would tell us a lot, especially whether you
create 2 backend memories with numa nodes to model
RAM and PMEM.

Eventually the CXL driver would need to handle hot plug of
CT3D and access CDAT from the CT3D via DOE Mailbox.

As Dan mentions, the Coherent Device Attribute Table (CDAT) Specification r1.02
requires CDAT to build the SRAT/HMAT statically and dynamically:
- Figure 1 Pre-boot CDAT Extraction Method (for CXL devices) (uses CDAT in option ROM)
- Figure 2 OS Runtime CDAT Extraction Method (for CXL devices) (uses DOE mailbox)

>
>
>
>
> b. lspci doesn’t show the existence of the DOE extended capability register
> in the CXL CT3D (needed to support method #2). Are there more patches?
>
>As above, it's not supported. I'm hoping someone else will do that work since I
>don't care about it just yet.

We can get a start on some QEMU updates to support DOE mailbox access.

>
>
>
>
>3. Do you have example user programs to share or better yet the CXL 2.0
> Sec 14.3.6.1 Application Layer/ Transaction layer test for CXL.mem?
>
>I don't have, mostly because I haven't actually implemented a lot of the real
>CXL support. My primary concern was having the Linux driver be able to enumerate
>devices and communicate with the device via the mailbox interface. v2 will
>contain support for userspace to do this, which I think is a step toward what
>you're asking for.
>
Understood. Looking forward to v2 and linux branch! Working from a known good
reference environment will allow for more in depth code review.


>
>
>4. What are the userspace system APIs for targeting CXL HDM address domain?
> Usually you can mmap a SPA if you know how to look it up.
>
>I think Dan answered this in the other thread…

Correct


Best Regards,
Chris Browy



2020-12-07 06:15:25

by Ben Widawsky

[permalink] [raw]
Subject: Re: [RFC PATCH 5/9] cxl/mem: Find device capabilities

On 20-12-03 23:41:16, Dan Williams wrote:
> On Tue, Nov 10, 2020 at 9:44 PM Ben Widawsky <[email protected]> wrote:
> >
> > CXL devices contain an array of capabilities that describe how
> > software can interact with the device, or with firmware running
> > on the device. A CXL compliant device must implement the device status
> > and the mailbox capability. A CXL compliant memory device must implement
> > the memory device capability.
> >
> > Each of the capabilities can [will] provide an offset within the MMIO
> > region for interacting with the CXL device.
> >
> > Signed-off-by: Ben Widawsky <[email protected]>
> > ---
> > drivers/cxl/cxl.h | 89 +++++++++++++++++++++++++++++++++++++++++++++++
> > drivers/cxl/mem.c | 58 +++++++++++++++++++++++++++---
> > 2 files changed, 143 insertions(+), 4 deletions(-)
> > create mode 100644 drivers/cxl/cxl.h
> >
> > diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> > new file mode 100644
> > index 000000000000..02858ae63d6d
> > --- /dev/null
> > +++ b/drivers/cxl/cxl.h
> [..]
> > +static inline u32 __cxl_raw_read_reg32(struct cxl_mem *cxlm, u32 reg)
>
> Going through my reworks and the "raw" jumped out at me. My typical
> interpretation of "raw" in respect to register access macros is the
> difference between readl() and __raw_readl() which means "don't do
> bus endian swizzling, and don't do a memory clobber barrier". Any
> heartburn to drop the "raw"?
>
> ...is it only me that reacts that way?

I will drop "raw". Especially given that I intend to reuse the word in v2 for
something entirely different, it makes sense.

My idea of "raw" was that it's just unfettered access to the device's MMIO
space. No offsets, no checks. I'm not sure of a better adjective to describe
that, but if you have any in mind, I'd like to add it.

2020-12-09 00:52:59

by Ben Widawsky

[permalink] [raw]
Subject: Re: [RFC PATCH 0/9] CXL 2.0 Support

On 20-12-06 23:40:47, Chris Browy wrote:
> Hi Ben,
>
>
> >On Dec 4, 2020, at 1:12 PM, Ben Widawsky <mailto:[email protected]> wrote:
> >
> >Hi Chris.
> >
> >On 20-12-04 12:40:03, Chris Browy wrote:
> >
> >Hi Ben,
> >
> >Trying to bring up the environment using the latest developments as follows:
> >
> >1. Linux kernel baseline version is cloned using
> > git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
> > Using master branch. Merged the 9 CXL linux kernel patches manually and built kernel
> >
> >2. QEMU baseline version is cloned using
> > git clone https://gitlab.com/bwidawsk/qemu.git
> >
> >3. UEFI baseline is cloned using
> > git clone https://github.com/tianocore/edk2.git
> > Using master and built
> >
> >4. Now can run qemu as follows:
> > The qcow2 we use is based on Ubuntu 20.10 with updated with kernel from 1) above
> >
> > QEMU command:
> >
> > sudo qemu-system-x86_64 -nic \
> > user,hostfwd=tcp::2222-:22,hostfwd=tcp::1234-:1234 -machine \
> > type=pc-q35-4.0,hmat=on,accel=kvm -enable-kvm -cpu host -smp \
> > 6,cores=6,threads=1,sockets=1 -m 8G -boot order=d -k 'en-us' -vga virtio \
> > -drive file=/home/chris/Downloads/AQCXL/ubuntu_20.qcow,format=qcow2 -drive \
> > if=pflash,format=raw,readonly,file=/home/chris/OVMF_CODE.fd \
> > -drive if=pflash,format=raw,file=/home/chris/OVMF_VARS.fd \
> > -object memory-backend-file,id=cxl-mem1,share,mem-path=/tmp/cxl-test/cxl,size=512M \
> > -device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52,uid=0,len-window-base=1,\
> > window-base[0]=0x4c0000000,memdev[0]=cxl-mem1 \
> > -device cxl-rp,id=rp0,bus=cxl.0,addr=0.0,chassis=0,slot=0 \
> > -device cxl-type3,bus=rp0,memdev=cxl-mem1,id=cxl-pmem0,size=256M 2>&1 | tee -a \
> > /home/chris/Downloads/AQCXL/log/qemu.log
> >
> > The qemu options are derived from looking at the tests/qtests/cxl-test.c
> > along with the -hmat=on which seemed to make sense.
> >
> > The system boots and lspci -vvv shows the CXL device is enumerated. But
> > no DOE capability register for CDAT access though (see below). Otherwise the
> > DVSEC registers are present.
> >
> >DOE is not supported yet in either Linux or QEMU. For us, CDAT isn't a high
> >priority yet so it likely won't be done for a while. I'd really like to see DOE
> >support added by someone - not me - so that we can wire it up. Not sure what
> >that would look like in the QEMU side.
> >
> >
> >
> > acpidump indicates the CXL0 and CXLM devices but no SRAT or HMAT tables are
> > in the dump which is curious.
> >
> >I don't typically use HMAT, but I do have an SRAT in mine, so that's strange.
> >You should also have a CEDT.
> >
> Could you provide the QEMU command line? I was not successful adding a numa node for cxl-mem1
> and RAM. Leaving out the numa node for cxl-mem1, I can now see the SRAT table being created, but
> that’s not exactly the point.
>
> Are you using UEFI or legacy BIOS to boot?
>
> Reproducing a known working environment is better for now; we can then deviate to try new
> configurations.

I'm using UEFI. I haven't built my own in a while. The script we use to run QEMU
pulls whatever is here: https://www.archlinux.org/packages/extra/any/edk2-ovmf/download/

-machine q35,accel=kvm,nvdimm,cxl
-m 8192M,slots=4,maxmem=40964M
-smp 8,sockets=2,cores=2,threads=2
-enable-kvm
-display none
-nographic
-drive if=pflash,format=raw,unit=0,file=OVMF_CODE.fd,readonly=on
-drive if=pflash,format=raw,unit=1,file=OVMF_VARS.fd
-debugcon file:uefi_debug.log
-global isa-debugcon.iobase=0x402
-drive file=root.img,format=raw,media=disk
-kernel ./mkosi.extra/boot/vmlinuz
-initrd mkosi.extra/boot/initramfs-5.10.0-rc2-BEN-00016-geb5b35e092af-dirty.img
-append selinux=0 audit=0 console=tty0 console=ttyS0 root=/dev/sda2 ignore_loglevel rw dyndbg="file drivers/cxl/mem.c +p"
-device e1000,netdev=net0,mac=52:54:00:12:34:56
-netdev user,id=net0,hostfwd=tcp::10022-:22
-object memory-backend-file,id=cxl-mem1,share,mem-path=cxl-type3,size=512M
-device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52,uid=0,len-window-base=1,window-base[0]=0x4c0000000,memdev[0]=cxl-mem1
-device cxl-rp,id=rp0,bus=cxl.0,addr=0.0,chassis=0,slot=0
-device cxl-type3,bus=rp0,memdev=cxl-mem1,id=cxl-pmem0,size=256M
-snapshot
-object memory-backend-ram,id=mem0,size=2048M
-numa node,nodeid=0,memdev=mem0,
-numa cpu,node-id=0,socket-id=0
-object memory-backend-ram,id=mem1,size=2048M
-numa node,nodeid=1,memdev=mem1,
-numa cpu,node-id=1,socket-id=1
-object memory-backend-ram,id=mem2,size=2048M
-numa node,nodeid=2,memdev=mem2,
-object memory-backend-ram,id=mem3,size=2048M
-numa node,nodeid=3,memdev=mem3,
-numa node,nodeid=4,
-object memory-backend-file,id=nvmem0,share,mem-path=nvdimm-0,size=16384M,align=1G
-device nvdimm,memdev=nvmem0,id=nv0,label-size=2M,node=4
-numa node,nodeid=5,
-object memory-backend-file,id=nvmem1,share,mem-path=nvdimm-1,size=16384M,align=1G
-device nvdimm,memdev=nvmem1,id=nv1,label-size=2M,node=5
-numa dist,src=0,dst=0,val=10
-numa dist,src=0,dst=1,val=21
-numa dist,src=0,dst=2,val=12
-numa dist,src=0,dst=3,val=21
-numa dist,src=0,dst=4,val=17
-numa dist,src=0,dst=5,val=28
-numa dist,src=1,dst=1,val=10
-numa dist,src=1,dst=2,val=21
-numa dist,src=1,dst=3,val=12
-numa dist,src=1,dst=4,val=28
-numa dist,src=1,dst=5,val=17
-numa dist,src=2,dst=2,val=10
-numa dist,src=2,dst=3,val=21
-numa dist,src=2,dst=4,val=28
-numa dist,src=2,dst=5,val=28
-numa dist,src=3,dst=3,val=10
-numa dist,src=3,dst=4,val=28
-numa dist,src=3,dst=5,val=28
-numa dist,src=4,dst=4,val=10
-numa dist,src=4,dst=5,val=28
-numa dist,src=5,dst=5,val=10


>
> >
> >
> >35:00.0 Memory controller [0502]: Intel Corporation Device 0d93 (rev 01) (prog-if 10)
> > Subsystem: Red Hat, Inc. Device 1100
> > Physical Slot: 0
> > Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
> > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> > Latency: 0
> > Region 0: Memory at c0a00000 (64-bit, non-prefetchable) [size=64K]
> > Region 2: Memory at c0a10000 (64-bit, non-prefetchable) [size=4K]
> > Capabilities: [80] Express (v2) Endpoint, MSI 00
> > DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
> > ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
> > DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
> > RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
> > MaxPayload 128 bytes, MaxReadReq 128 bytes
> > DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
> > LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s, Exit Latency L0s <64ns
> > ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
> > LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
> > ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> > LnkSta: Speed 2.5GT/s (ok), Width x1 (ok)
> > TrErr- Train- SlotClk- DLActive+ BWMgmt- ABWMgmt-
> > DevCap2: Completion Timeout: Not Supported, TimeoutDis-, NROPrPrP-, LTR-
> > 10BitTagComp-, 10BitTagReq-, OBFF Not Supported, ExtFmt+, EETLPPrefix+, MaxEETLPPrefixes 4
> > EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
> > FRS-, TPHComp-, ExtTPHComp-
> > AtomicOpsCap: 32bit- 64bit- 128bitCAS-
> > DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
> > AtomicOpsCtl: ReqEn-
> > LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
> > Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
> > Compliance De-emphasis: -6dB
> > LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
> > EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
> > Capabilities: [100 v1] Designated Vendor-Specific <?>
> > Capabilities: [138 v1] Designated Vendor-Specific <?>
> > Kernel driver in use: cxl_mem
> Can you check this in cxl_component_register_init_common() in QEMU's cxl-component-utils.c?
>
>
> I used devmem2 to dump out the CXL Component CXL.mem registers located at BAR + 0x1000. Header
> list:
> 04110000 CXL CAP
> 08010002 CXL RAS
> 0D820004 CXL Link
> 11010005 CXL HDM Decoder
> 00000000
>
> I think the 1st entry, the CXL Capability Header, should be
> 0x03110001
>
>
> Capability_ID looks like it is set using ARRAY_FIELD_DP32
>
> /* CXL Capability Header Register */
> ARRAY_FIELD_DP32(reg_state, CXL_CAPABILITY_HEADER, ID, 1);
> ARRAY_FIELD_DP32(reg_state, CXL_CAPABILITY_HEADER, VERSION, 1);
> ARRAY_FIELD_DP32(reg_state, CXL_CAPABILITY_HEADER, CACHE_MEM_VERSION, 1);
> ARRAY_FIELD_DP32(reg_state, CXL_CAPABILITY_HEADER, ARRAY_SIZE, caps);
>
>
> But the value reported for ID is ‘0’.
>
> Since only the RAS, Link, and HDM headers are found, Array_Size should be ‘3’
> according to the spec:
>
> Array_Size: This defines the number of elements present in
> the CXL_Capability array, not including the
> CXL_Capability_Header element. Each element is 1 DWORD in
> size and is located contiguous with previous elements.
>
>
> ‘caps’ should be one less than calculated in the switch statement:
>
> void cxl_component_register_init_common(uint32_t *reg_state, enum reg_type type)
> {
>     int caps = 0;
>
>     switch (type) {
>     case CXL2_DOWNSTREAM_PORT:
>     case CXL2_DEVICE:
>         /* CAP, RAS, Link */
>         caps = 3;
>         break;
>     case CXL2_UPSTREAM_PORT:
>     case CXL2_TYPE3_DEVICE:
>     case CXL2_LOGICAL_DEVICE:
>         /* + HDM */
>         caps = 4;
>         break;
>     case CXL2_ROOT_PORT:
>         /* + Extended Security, + Snoop */
>         caps = 6;
>         break;
>     default:
>         abort();
>     }
>
>

You're correct. I've fixed it and pushed it to the v2 branch.

> >
> >Questions/Comments:
> >-------------------
> >1. Linux
> > a. Is there a gitlab for the linux kernel patches for CXL? This would
> > facilitate review and code modifications.
> >
> >We're hopefully going to send out v2 in the next couple of days. I'll push the
> >repo somewhere as well.
> >
> That’s great!
>
> >
> >
> >
> >2. UEFI (edk2 from tianocore)
> > a. seems to only support CXL 1.1, which means only method #1 (Device
> > Option ROM) of the Coherent Device Attribute Table 1.02 spec
> > is possible for CDAT handling now.
> >
> > Does device option ROM need to be added to QEMU CXL setup?
> >
> > Can we add a CXL 1.1 emulated device?
> >
> >Patches welcome :-). I know of other people who want this, but I only care about
> >2.0+, so I have no intention to implement it.
> Can you summarize how the System Physical Address (SPA) gets assigned to the
> CXL pmem if not by the UEFI or CXL driver methods?

The PMEM address isn't currently assigned. It expects the driver to program the
HDMs. However, for testing I faked this a bit, grep for SET_PMEM_PADDR. I did
have a tool I used to verify the HDM programming, but it appears that's been
lost to the sands of time.


>
> Is it some alternate method using the host bridge window-base and then
> cxl_realize() in cxl_type3.c?
>
> -device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52,uid=0,len-window-base=1, \
> window-base[0]=0x4c0000000,memdev[0]=cxl-mem1 \
>
> The QEMU command line would inform a lot, especially whether you
> create 2 backend memories with numa nodes to model
> RAM and PMEM.
>
> Eventually the CXL driver would need to handle hot plug of
> CT3D and access CDAT from the CT3D via DOE Mailbox.
>
> As Dan mentions, the Coherent Device Attribute Table (CDAT) Specification r1.02
> requires CDAT to build the SRAT/HMAT statically and dynamically:
> - Figure 1: Pre-boot CDAT Extraction Method (for CXL devices) (uses CDAT in option ROM)
> - Figure 2: OS Runtime CDAT Extraction Method (for CXL devices) (uses DOE mailbox)
>
> >
> >
> >
> >
> > b. lspci doesn’t show the existence of the DOE extended capability register
> > in the CXL CT3D (needed to support method #2). Are there more patches?
> >
> >As above, it's not supported. I'm hoping someone else will do that work since I
> >don't care about it just yet.
>
> We can get a start on some QEMU updates to support DOE mailbox access.
>
> >
> >
> >
> >
> >3. Do you have example user programs to share or better yet the CXL 2.0
> > Sec 14.3.6.1 Application Layer/ Transaction layer test for CXL.mem?
> >
> >I don't have, mostly because I haven't actually implemented a lot of the real
> >CXL support. My primary concern was having the Linux driver be able to enumerate
> >devices and communicate with the device via the mailbox interface. v2 will
> >contain support for userspace to do this, which I think is a step toward what
> >you're asking for.
> >
> Understood. Looking forward to v2 and linux branch! Working from a known good
> reference environment will allow for more in depth code review.
>
>
> >
> >
> >4. What are the userspace system APIs for targeting CXL HDM address domain?
> > Usually you can mmap a SPA if you know how to look it up.
> >
> >I think Dan answered this in the other thread…
>
> Correct
>
>
> Best Regards,
> Chris Browy
>