Changes since v1 [1]
A few additions have been made:
- IOCTL (UAPI) interface has been added with commands
- Kernel docs have been created
- A new debug macro is introduced and sprinkled throughout.
A deletion was made:
- Removal of the non-standard _OSC UUID.
The detailed list of fixes is:
- fix cxl_register() no previous prototype warning (0day robot)
- s/REGLOG/REGLOC/ (Ben)
- Wait for doorbell on cxl_mem_mbox_get() and add comment on why (Ben)
- make "type-3" a proper adjective, add spec references, also did the same for
the Kconfig (Bjorn)
- align some defines (Bjorn)
- s/bar/BAR (Bjorn)
- rename cxl_bus_prepared() to cxl_bus_acquire() (Bjorn)
- move definition of struct cxl_mem to "cxl/mem: Map memory device registers" (Bjorn)
- use consistent hex/decimal (Bjorn)
- use consistent upper/lower hex values (Bjorn)
- get rid of READ_ONCE (Bjorn)
- add offsets to debug messages (Bjorn)
- cleanup SPDX comment style (Bjorn, Christoph)
- change errors returned by case (Bjorn, Dan)
- 80 character violation cleanups (Christoph)
- cleanup CXL_BUS_PROVIDER dependencies (Christoph, Randy)
- remove "raw" from mmio functions (Dan)
- rename PCI_DVSEC_VENDOR_CXL to add _ID (Jonathan)
- combine introduction of mbox infrastruct and cxl_mem_identify() (Jonathan)
- add ABI documentation for sysfs attributes (Jonathan)
- document scope of cxl_memdev_lock (Jonathan)
- rework cxl_register() to have devm semantics (reaction to comments about
cxl_mem_remove() and cxl_mem_add_memdev() semantics) (Jonathan)
- fix cxl_mem_exit() ordering (Jonathan)
- use GENMASK/GET_FIELD (Jonathan)
- fix and add comments for cap ids (Jonathan)
- use _OFFSET postfix in definitions (Jonathan)
- save pci_set_drvdata for later (Jonathan)
[1]: https://lore.kernel.org/linux-cxl/[email protected]/
---
Introduce support for “type-3” memory devices defined in the recently released
Compute Express Link (CXL) 2.0 specification[2]. Specifically, these are the
memory devices defined by section 8.2.8.5 of the CXL 2.0 spec. A reference
implementation emulating these devices has been submitted to the QEMU mailing
list and is available on gitlab [3]. “Type-3” is a CXL device that acts as a
memory expander for RAM or PMEM. It might be interleaved with other CXL devices
in a given physical address range.
These changes allow for foundational enumeration of CXL 2.0 memory devices as
well as basic userspace interaction. The functionality present is:
- Initial driver bring-up
- Device enumeration and an initial sysfs representation
- Submit a basic firmware command via ‘mailbox’ to an emulated memory device
with non-volatile capacity.
- Provide an interface to send "raw" commands to the hardware.
Some of the functionality that is still missing includes:
- Memory interleaving at the host bridge, root port, or switch level
- CXL 1.1 Root Complex Integrated Endpoint Support
- CXL 2.0 Hot plug support
- A bevy of supported device commands
In addition to the core functionality of discovering the spec defined registers
and resources, introduce a CXL device model that will be the foundation for
translating CXL capabilities into existing Linux infrastructure for Persistent
Memory and other memory devices. For now, this only includes support for the
management command mailbox that type-3 devices surface. These control devices
fill the role of “DIMMs” / nmemX memory-devices in LIBNVDIMM terms.
Now, while implementing the driver some feedback for the specification was
generated to cover perceived gaps and address conflicts. The feedback is
presented as a reference implementation in the driver and QEMU emulation.
Specifically the following concepts are original to the Linux implementation and
feedback / collaboration is requested to develop these into specification
proposals:
1. Top level ACPI object (ACPI0017)
2. HW imposed address space and interleave constraints
ACPI0017
--------
Introduce a new ACPI namespace device with an _HID of ACPI0017. The purpose of
this object is twofold, support a legacy OS with a set of out-of-tree CXL
modules, and establish an attach point for a driver that knows about
interleaving. Both of these boil down to the same point, to centralize Operating
System support for resources described by the CXL Early Discovery Table (CEDT).
The legacy OS problem stems from the spec's description of a host bridge,
ACPI0016 is denoted as the _HID for host bridges, with a _CID of PNP0A08. In a
CXL unaware version of Linux, the core ACPI subsystem will bind a driver to
PNP0A08 and preclude a CXL-aware driver from binding to ACPI0016. An ACPI0017
device allows a standalone CXL-aware driver to register for handling /
coordinating CEDT and CXL-specific _OSC control.
Similarly when managing interleaving there needs to be some management layer
above the ACPI0016 device that is capable of assembling leaf nodes into
interleave sets. As is the case with ACPI0012 that does this central
coordination for NFIT defined resources, ACPI0017 does the same for CEDT
described resources.
Memory Windows
-------
For CXL.mem capable platforms, there is a need for a mechanism for platform
firmware to make the Operating System aware of any restrictions that hardware
might have in address space. For example, in a system with 4 host bridges all
participating in an interleave set, the firmware needs to provide some
description of this. That information is missing from the CXL 2.0 spec as of
today and it also is not implemented in the driver. A variety of ACPI based
mechanisms, for example _CRS fields on the ACPI0017 device, were considered.
Next steps after this basic foundation is expanded command support and LIBNVDIMM
integration. This is the initial “release early / release often” version of the
Linux CXL enabling.
[2]: https://www.computeexpresslink.org/
[3]: https://gitlab.com/bwidawsk/qemu/-/tree/cxl-2.0v2
Ben Widawsky (10):
docs: cxl: Add basic documentation
cxl/mem: Map memory device registers
cxl/mem: Find device capabilities
cxl/mem: Implement polled mode mailbox
cxl/mem: Add basic IOCTL interface
cxl/mem: Add send command
cxl/mem: Add a "RAW" send command
cxl: Add basic debugging
MAINTAINERS: Add maintainers of the CXL driver
WIP/cxl/mem: Add get firmware for testing
Dan Williams (2):
cxl/mem: Introduce a driver for CXL-2.0-Type-3 endpoints
cxl/mem: Register CXL memX devices
Vishal Verma (2):
cxl/acpi: Add an acpi_cxl module for the CXL interconnect
cxl/acpi: add OSC support
Documentation/ABI/testing/sysfs-bus-cxl | 26 +
Documentation/cxl/index.rst | 12 +
Documentation/cxl/memory-devices.rst | 51 ++
Documentation/index.rst | 1 +
MAINTAINERS | 10 +
drivers/Kconfig | 1 +
drivers/Makefile | 1 +
drivers/cxl/Kconfig | 58 ++
drivers/cxl/Makefile | 9 +
drivers/cxl/acpi.c | 352 ++++++++
drivers/cxl/acpi.h | 35 +
drivers/cxl/bus.c | 54 ++
drivers/cxl/bus.h | 8 +
drivers/cxl/cxl.h | 188 +++++
drivers/cxl/mem.c | 1022 +++++++++++++++++++++++
drivers/cxl/pci.h | 34 +
include/acpi/actbl1.h | 51 ++
include/uapi/linux/cxl_mem.h | 148 ++++
18 files changed, 2061 insertions(+)
create mode 100644 Documentation/ABI/testing/sysfs-bus-cxl
create mode 100644 Documentation/cxl/index.rst
create mode 100644 Documentation/cxl/memory-devices.rst
create mode 100644 drivers/cxl/Kconfig
create mode 100644 drivers/cxl/Makefile
create mode 100644 drivers/cxl/acpi.c
create mode 100644 drivers/cxl/acpi.h
create mode 100644 drivers/cxl/bus.c
create mode 100644 drivers/cxl/bus.h
create mode 100644 drivers/cxl/cxl.h
create mode 100644 drivers/cxl/mem.c
create mode 100644 drivers/cxl/pci.h
create mode 100644 include/uapi/linux/cxl_mem.h
--
2.29.2
This starts a new cxl directory because CXL is a new bus and it is
expected that the documentation will grow large over time. Currently
only memory devices are documented as they are all that's supported.
Signed-off-by: Ben Widawsky <[email protected]>
---
Documentation/cxl/index.rst | 12 ++++++++++++
Documentation/cxl/memory-devices.rst | 15 +++++++++++++++
Documentation/index.rst | 1 +
3 files changed, 28 insertions(+)
create mode 100644 Documentation/cxl/index.rst
create mode 100644 Documentation/cxl/memory-devices.rst
diff --git a/Documentation/cxl/index.rst b/Documentation/cxl/index.rst
new file mode 100644
index 000000000000..036e49553542
--- /dev/null
+++ b/Documentation/cxl/index.rst
@@ -0,0 +1,12 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+====================
+Compute Express Link
+====================
+
+.. toctree::
+ :maxdepth: 1
+
+ memory-devices
+
+.. only:: subproject and html
diff --git a/Documentation/cxl/memory-devices.rst b/Documentation/cxl/memory-devices.rst
new file mode 100644
index 000000000000..aa4262280c67
--- /dev/null
+++ b/Documentation/cxl/memory-devices.rst
@@ -0,0 +1,15 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: <isonum.txt>
+
+===================================
+Compute Express Link Memory Devices
+===================================
+
+A Compute Express Link Memory Device is a CXL component that implements the
+CXL.mem protocol. It contains some amount of volatile memory, persistent memory,
+or both.
+
+Driver Infrastructure
+=====================
+
+This sections covers the driver infrastructure for a CXL memory device.
diff --git a/Documentation/index.rst b/Documentation/index.rst
index 57719744774c..38678f6c5676 100644
--- a/Documentation/index.rst
+++ b/Documentation/index.rst
@@ -137,6 +137,7 @@ needed).
misc-devices/index
scheduler/index
mhi/index
+ cxl/index
Architecture-agnostic documentation
-----------------------------------
--
2.29.2
From: Dan Williams <[email protected]>
The CXL.mem protocol allows a device to act as a provider of "System
RAM" and/or "Persistent Memory" that is fully coherent as if the memory
was attached to the typical CPU memory controller.
With the CXL-2.0 specification a PCI endpoint can implement a "Type-3"
device interface and give the operating system control over "Host
Managed Device Memory". See section 2.3 Type 3 CXL Device.
The memory range exported by the device may optionally be described by
the platform firmware memory map, or by infrastructure like LIBNVDIMM to
provision persistent memory capacity from one, or more, CXL.mem devices.
A pre-requisite for Linux-managed memory-capacity provisioning is this
cxl_mem driver that can speak the mailbox protocol defined in section
8.2.8.4 Mailbox Registers.
For now just land the driver boiler-plate and fill it in with
functionality in subsequent commits.
Link: https://www.computeexpresslink.org/download-the-specification
Signed-off-by: Dan Williams <[email protected]>
Signed-off-by: Ben Widawsky <[email protected]>
---
Documentation/cxl/memory-devices.rst | 9 ++++
drivers/cxl/Kconfig | 22 +++++++++
drivers/cxl/Makefile | 2 +
drivers/cxl/mem.c | 69 ++++++++++++++++++++++++++++
drivers/cxl/pci.h | 20 ++++++++
5 files changed, 122 insertions(+)
create mode 100644 drivers/cxl/mem.c
create mode 100644 drivers/cxl/pci.h
diff --git a/Documentation/cxl/memory-devices.rst b/Documentation/cxl/memory-devices.rst
index 6ce88f9d5f4f..134c9b6b4ff4 100644
--- a/Documentation/cxl/memory-devices.rst
+++ b/Documentation/cxl/memory-devices.rst
@@ -23,6 +23,15 @@ ACPI CXL
.. kernel-doc:: drivers/cxl/acpi.c
:internal:
+CXL Memory Device
+-----------------
+
+.. kernel-doc:: drivers/cxl/mem.c
+ :doc: cxl mem
+
+.. kernel-doc:: drivers/cxl/mem.c
+ :internal:
+
External Interfaces
===================
diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
index 68da926ba5b1..0ac5080cd6e0 100644
--- a/drivers/cxl/Kconfig
+++ b/drivers/cxl/Kconfig
@@ -33,4 +33,26 @@ config CXL_ACPI
specification.
If unsure say 'm'
+
+config CXL_MEM
+ tristate "CXL.mem: Endpoint Support"
+ depends on PCI && CXL_BUS_PROVIDER
+ default CXL_BUS_PROVIDER
+ help
+ The CXL.mem protocol allows a device to act as a provider of
+ "System RAM" and/or "Persistent Memory" that is fully coherent
+ as if the memory was attached to the typical CPU memory
+ controller.
+
+ Say 'y/m' to enable a driver (named "cxl_mem.ko" when built as
+ a module) that will attach to CXL.mem devices for
+ configuration, provisioning, and health monitoring. This
+ driver is required for dynamic provisioning of CXL.mem
+ attached memory which is a pre-requisite for persistent memory
+ support. Typically volatile memory is mapped by platform
+ firmware and included in the platform memory map, but in some
+ cases the OS is responsible for mapping that memory. See
+ Chapter 2.3 Type 3 CXL Device in the CXL 2.0 specification.
+
+ If unsure say 'm'.
endif
diff --git a/drivers/cxl/Makefile b/drivers/cxl/Makefile
index d38cd34a2582..97fdffb00f2d 100644
--- a/drivers/cxl/Makefile
+++ b/drivers/cxl/Makefile
@@ -1,5 +1,7 @@
# SPDX-License-Identifier: GPL-2.0
obj-$(CONFIG_CXL_ACPI) += cxl_acpi.o
+obj-$(CONFIG_CXL_MEM) += cxl_mem.o
ccflags-y += -DDEFAULT_SYMBOL_NAMESPACE=CXL
cxl_acpi-y := acpi.o
+cxl_mem-y := mem.o
diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
new file mode 100644
index 000000000000..005404888942
--- /dev/null
+++ b/drivers/cxl/mem.c
@@ -0,0 +1,69 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright(c) 2020 Intel Corporation. All rights reserved. */
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/io.h>
+#include "acpi.h"
+#include "pci.h"
+
+static int cxl_mem_dvsec(struct pci_dev *pdev, int dvsec)
+{
+ int pos;
+
+ pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_DVSEC);
+ if (!pos)
+ return 0;
+
+ while (pos) {
+ u16 vendor, id;
+
+ pci_read_config_word(pdev, pos + PCI_DVSEC_VENDOR_ID_OFFSET,
+ &vendor);
+ pci_read_config_word(pdev, pos + PCI_DVSEC_ID_OFFSET, &id);
+ if (vendor == PCI_DVSEC_VENDOR_ID_CXL && dvsec == id)
+ return pos;
+
+ pos = pci_find_next_ext_capability(pdev, pos,
+ PCI_EXT_CAP_ID_DVSEC);
+ }
+
+ return 0;
+}
+
+static int cxl_mem_probe(struct pci_dev *pdev, const struct pci_device_id *id)
+{
+ struct device *dev = &pdev->dev;
+ int rc, regloc;
+
+ rc = cxl_bus_acquire(pdev);
+ if (rc != 0) {
+ dev_err(dev, "failed to acquire interface\n");
+ return rc;
+ }
+
+ regloc = cxl_mem_dvsec(pdev, PCI_DVSEC_ID_CXL_REGLOC);
+ if (!regloc) {
+ dev_err(dev, "register location dvsec not found\n");
+ return -ENXIO;
+ }
+
+ return 0;
+}
+
+static const struct pci_device_id cxl_mem_pci_tbl[] = {
+ /* PCI class code for CXL.mem Type-3 Devices */
+ { PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID,
+ PCI_CLASS_MEMORY_CXL, 0xffffff, 0 },
+ { /* terminate list */ },
+};
+MODULE_DEVICE_TABLE(pci, cxl_mem_pci_tbl);
+
+static struct pci_driver cxl_mem_driver = {
+ .name = KBUILD_MODNAME,
+ .id_table = cxl_mem_pci_tbl,
+ .probe = cxl_mem_probe,
+};
+
+MODULE_LICENSE("GPL v2");
+module_pci_driver(cxl_mem_driver);
+MODULE_IMPORT_NS(CXL);
diff --git a/drivers/cxl/pci.h b/drivers/cxl/pci.h
new file mode 100644
index 000000000000..a8a9935fa90b
--- /dev/null
+++ b/drivers/cxl/pci.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright(c) 2020 Intel Corporation. All rights reserved. */
+#ifndef __CXL_PCI_H__
+#define __CXL_PCI_H__
+
+#define PCI_CLASS_MEMORY_CXL 0x050210
+
+/*
+ * See section 8.1 Configuration Space Registers in the CXL 2.0
+ * Specification
+ */
+#define PCI_EXT_CAP_ID_DVSEC 0x23
+#define PCI_DVSEC_VENDOR_ID_CXL 0x1E98
+#define PCI_DVSEC_VENDOR_ID_OFFSET 0x4
+#define PCI_DVSEC_ID_CXL 0x0
+#define PCI_DVSEC_ID_OFFSET 0x8
+
+#define PCI_DVSEC_ID_CXL_REGLOC 0x8
+
+#endif /* __CXL_PCI_H__ */
--
2.29.2
Add a straightforward IOCTL that provides a mechanism for userspace to
query the supported memory device commands.
Memory device commands are specified in 8.2.9 of the CXL 2.0
specification. They are submitted through a mailbox mechanism specified
in 8.2.8.4.
Signed-off-by: Ben Widawsky <[email protected]>
---
I did attempt to use the same struct for querying commands as well as
sending commands (upcoming patch). The number of unused fields between
the two made for a bad fit IMO.
Signed-off-by: Ben Widawsky <[email protected]>
---
Documentation/cxl/memory-devices.rst | 9 +++
drivers/cxl/mem.c | 89 +++++++++++++++++++++++
include/uapi/linux/cxl_mem.h | 102 +++++++++++++++++++++++++++
3 files changed, 200 insertions(+)
create mode 100644 include/uapi/linux/cxl_mem.h
diff --git a/Documentation/cxl/memory-devices.rst b/Documentation/cxl/memory-devices.rst
index 5f723c25382b..ec54674b3822 100644
--- a/Documentation/cxl/memory-devices.rst
+++ b/Documentation/cxl/memory-devices.rst
@@ -32,6 +32,15 @@ CXL Memory Device
.. kernel-doc:: drivers/cxl/mem.c
:internal:
+CXL IOCTL Interface
+-------------------
+
+.. kernel-doc:: include/uapi/linux/cxl_mem.h
+ :doc: UAPI
+
+.. kernel-doc:: include/uapi/linux/cxl_mem.h
+ :internal:
+
External Interfaces
===================
diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
index bb6ea58f6c7b..2c4aadcea0e4 100644
--- a/drivers/cxl/mem.c
+++ b/drivers/cxl/mem.c
@@ -7,6 +7,7 @@
#include <linux/idr.h>
#include <linux/pci.h>
#include <linux/io.h>
+#include <uapi/linux/cxl_mem.h>
#include "acpi.h"
#include "pci.h"
#include "cxl.h"
@@ -73,6 +74,49 @@ static DEFINE_IDR(cxl_mem_idr);
/* protect cxl_mem_idr allocations */
static DEFINE_MUTEX(cxl_memdev_lock);
+/*
+ * This table defines the supported mailboxes commands for the driver. The id is
+ * ordinal and thus gaps in this table aren't allowed. This table is made up of
+ * a UAPI structure. Non-negative values in the table will be validated against
+ * the user's input. For example, if size_in is 0, and the user passed in 1, it
+ * is an error.
+ */
+#define CXL_CMD(_id, _flags, sin, sout, _name, _enable, op) \
+ { \
+ { .id = CXL_MEM_COMMAND_ID_##_id, \
+ .flags = CXL_MEM_COMMAND_FLAG_##_flags, \
+ .size_in = sin, \
+ .size_out = sout, \
+ .name = _name }, \
+ .enable = _enable, .opcode = op \
+ }
+
+/**
+ * struct cxl_mem_command - Driver representation of a memory device command
+ * @info: Command information as it exists for the UAPI
+ * @opcode: The actual bits used for the mailbox protocol
+ * @enable: Whether the command is enabled. The driver may support a large set
+ * of commands that may not be enabled. The primary reason a command
+ * would not be enabled is for commands that are specified as optional
+ * and the hardware doesn't support the command.
+ *
+ * The cxl_mem_command is the driver's internal representation of commands that
+ * are supported by the driver. Some of these commands may not be supported by
+ * the hardware (!@enable). The driver will use @info to validate the fields
+ * passed in by the user then submit the @opcode to the hardware.
+ *
+ * See struct cxl_command_info.
+ */
+struct cxl_mem_command {
+ const struct cxl_command_info info;
+ const u16 opcode;
+ bool enable;
+};
+
+static struct cxl_mem_command mem_commands[] = {
+ CXL_CMD(INVALID, NONE, 0, 0, "Reserved", false, 0),
+};
+
static int cxl_mem_wait_for_doorbell(struct cxl_mem *cxlm)
{
const int timeout = msecs_to_jiffies(2000);
@@ -268,8 +312,53 @@ static int cxl_mem_open(struct inode *inode, struct file *file)
return 0;
}
+static int cxl_mem_count_commands(void)
+{
+ int i, n = 0;
+
+ for (i = 0; i < ARRAY_SIZE(mem_commands); i++) {
+ struct cxl_mem_command *c = &mem_commands[i];
+
+ if (c->enable)
+ n++;
+ }
+
+ return n;
+}
+
static long cxl_mem_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
+ if (cmd == CXL_MEM_QUERY_COMMANDS) {
+ struct cxl_mem_query_commands __user *q = (void __user *)arg;
+ u32 n_commands;
+ int i, j;
+
+ if (get_user(n_commands, (u32 __user *)arg))
+ return -EFAULT;
+
+ if (n_commands == 0)
+ return put_user(cxl_mem_count_commands(),
+ (u32 __user *)arg);
+
+ for (i = 0, j = 0;
+ i < ARRAY_SIZE(mem_commands) && j < n_commands; i++) {
+ struct cxl_mem_command *c = &mem_commands[i];
+ const struct cxl_command_info *info = &c->info;
+
+ if (!c->enable)
+ continue;
+
+ if (copy_to_user(&q->commands[j], info, sizeof(*info)))
+ return -EFAULT;
+
+ if (copy_to_user(&q->commands[j].name, info->name,
+ strlen(info->name)))
+ return -EFAULT;
+
+ j++;
+ }
+ }
+
return -ENOTTY;
}
diff --git a/include/uapi/linux/cxl_mem.h b/include/uapi/linux/cxl_mem.h
new file mode 100644
index 000000000000..1d1e143f98ec
--- /dev/null
+++ b/include/uapi/linux/cxl_mem.h
@@ -0,0 +1,102 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * CXL IOCTLs for Memory Devices
+ */
+
+#ifndef _UAPI_CXL_MEM_H_
+#define _UAPI_CXL_MEM_H_
+
+#if defined(__cplusplus)
+extern "C" {
+#endif
+
+/**
+ * DOC: UAPI
+ *
+ * CXL memory devices expose UAPI to have a standard user interface.
+ * Userspace can refer to these structure definitions and UAPI formats
+ * to communicate to driver
+ */
+
+#define CXL_MEM_QUERY_COMMANDS _IOR('C', 1, struct cxl_mem_query_commands)
+
+#define CXL_MEM_COMMAND_NAME_LENGTH 32
+
+/**
+ * struct cxl_command_info - Command information returned from a query.
+ * @id: ID number for the command.
+ * @flags: Flags that specify command behavior.
+ *
+ * - CXL_MEM_COMMAND_FLAG_TAINT: Using this command will taint the kernel.
+ * @size_in: Expected input size, or -1 if variable length.
+ * @size_out: Expected output size, or -1 if variable length.
+ * @name: Name describing the command.
+ *
+ * Represents a single command that is supported by both the driver and the
+ * hardware. The is returned as part of an array from the query ioctl. The
+ * following would be a command named "foobar" that takes a variable length
+ * input and returns 0 bytes of output.
+ *
+ * - @id = 10
+ * - @name = foobar
+ * - @flags = 0
+ * - @size_in = -1
+ * - @size_out = 0
+ *
+ * See struct cxl_mem_query_commands.
+ */
+struct cxl_command_info {
+ __u32 id;
+#define CXL_MEM_COMMAND_ID_INVALID 0
+
+ __u32 flags;
+#define CXL_MEM_COMMAND_FLAG_NONE 0
+#define CXL_MEM_COMMAND_FLAG_TAINT BIT(0)
+
+ __s32 size_in;
+ __s32 size_out;
+
+ char name[32];
+};
+
+/**
+ * struct cxl_mem_query_commands - Query supported commands.
+ * @n_commands: In/out parameter. When @n_commands is > 0, the driver will
+ * return min(num_support_commands, n_commands). When @n_commands
+ * is 0, driver will return the number of total supported commands.
+ * @rsvd: Reserved for future use.
+ * @commands: Output array of supported commands. This array must be allocated
+ * by userspace to be at least min(num_support_commands, @n_commands)
+ *
+ * Allow userspace to query the available commands supported by both the driver,
+ * and the hardware. Commands that aren't supported by either the driver, or the
+ * hardware are not returned in the query.
+ *
+ * Examples:
+ *
+ * - { .n_commands = 0 } // Get number of supported commands
+ * - { .n_commands = 15, .commands = buf } // Return first 15 (or less)
+ * supported commands
+ *
+ * See struct cxl_command_info.
+ */
+struct cxl_mem_query_commands {
+ /*
+ * Input: Number of commands to return (space allocated by user)
+ * Output: Number of commands supported by the driver/hardware
+ *
+ * If n_commands is 0, kernel will only return number of commands and
+ * not try to populate commands[], thus allowing userspace to know how
+ * much space to allocate
+ */
+ __u32 n_commands;
+ __u32 rsvd;
+
+ struct cxl_command_info __user commands[]; /* out: supported commands */
+};
+
+#if defined(__cplusplus)
+}
+#endif
+
+#endif
--
2.29.2
From: Vishal Verma <[email protected]>
Add support to advertise OS capabilities, and request OS control for CXL
features using the ACPI _OSC mechanism. Advertise support for all
possible CXL features, and attempt to request control for all possible
features.
Based on a patch by Sean Kelley.
Signed-off-by: Vishal Verma <[email protected]>
Signed-off-by: Ben Widawsky <[email protected]>
---
Documentation/cxl/memory-devices.rst | 15 ++
drivers/cxl/acpi.c | 261 ++++++++++++++++++++++++++-
drivers/cxl/acpi.h | 20 ++
3 files changed, 293 insertions(+), 3 deletions(-)
diff --git a/Documentation/cxl/memory-devices.rst b/Documentation/cxl/memory-devices.rst
index aa4262280c67..6ce88f9d5f4f 100644
--- a/Documentation/cxl/memory-devices.rst
+++ b/Documentation/cxl/memory-devices.rst
@@ -13,3 +13,18 @@ Driver Infrastructure
=====================
This sections covers the driver infrastructure for a CXL memory device.
+
+ACPI CXL
+--------
+
+.. kernel-doc:: drivers/cxl/acpi.c
+ :doc: cxl acpi
+
+.. kernel-doc:: drivers/cxl/acpi.c
+ :internal:
+
+External Interfaces
+===================
+
+.. kernel-doc:: drivers/cxl/acpi.c
+ :export:
diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index 0f1ba9b3f1ed..42180c64cbee 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -11,7 +11,259 @@
#include <linux/pci.h>
#include "acpi.h"
-/*
+/**
+ * DOC: cxl acpi
+ *
+ * ACPI _OSC setup: The exported function cxl_bus_acquire() sets up ACPI
+ * Operating System Capabilities (_OSC) for the CXL bus. It declares support
+ * for all CXL capabilities, and attempts to request control for all possible
+ * capabilities. The resulting support and control sets are saved in global
+ * variables cxl_osc_support_set and cxl_osc_control_set. The internal
+ * functions cxl_osc_declare_support(), and cxl_osc_request_control() can be
+ * used to update the support and control sets in accordance with the ACPI
+ * rules for _OSC evaluation - most importantly, capabilities already granted
+ * should not be rescinded by either the OS or firmware.
+ */
+
+static u32 cxl_osc_support_set;
+static u32 cxl_osc_control_set;
+static DEFINE_MUTEX(acpi_desc_lock);
+
+struct pci_osc_bit_struct {
+ u32 bit;
+ char *desc;
+};
+
+static struct pci_osc_bit_struct cxl_osc_support_bit[] = {
+ { CXL_OSC_PORT_REG_ACCESS_SUPPORT, "CXLPortRegAccess" },
+ { CXL_OSC_PORT_DEV_REG_ACCESS_SUPPORT, "CXLPortDevRegAccess" },
+ { CXL_OSC_PER_SUPPORT, "CXLProtocolErrorReporting" },
+ { CXL_OSC_NATIVE_HP_SUPPORT, "CXLNativeHotPlug" },
+};
+
+static struct pci_osc_bit_struct cxl_osc_control_bit[] = {
+ { CXL_OSC_MEM_ERROR_CONTROL, "CXLMemErrorReporting" },
+};
+
+static u8 cxl_osc_uuid_str[] = "68F2D50B-C469-4d8A-BD3D-941A103FD3FC";
+
+static void decode_osc_bits(struct device *dev, char *msg, u32 word,
+ struct pci_osc_bit_struct *table, int size)
+{
+ char buf[80];
+ int i, len = 0;
+ struct pci_osc_bit_struct *entry;
+
+ buf[0] = '\0';
+ for (i = 0, entry = table; i < size; i++, entry++)
+ if (word & entry->bit)
+ len += scnprintf(buf + len, sizeof(buf) - len, "%s%s",
+ len ? " " : "", entry->desc);
+
+ dev_info(dev, "_OSC: %s [%s]\n", msg, buf);
+}
+
+static void decode_cxl_osc_support(struct device *dev, char *msg, u32 word)
+{
+ decode_osc_bits(dev, msg, word, cxl_osc_support_bit,
+ ARRAY_SIZE(cxl_osc_support_bit));
+}
+
+static void decode_cxl_osc_control(struct device *dev, char *msg, u32 word)
+{
+ decode_osc_bits(dev, msg, word, cxl_osc_control_bit,
+ ARRAY_SIZE(cxl_osc_control_bit));
+}
+
+static acpi_status acpi_cap_run_osc(acpi_handle handle, const u32 *capbuf,
+ u8 *uuid_str, u32 *retval)
+{
+ struct acpi_osc_context context = {
+ .uuid_str = uuid_str,
+ .rev = 1,
+ .cap.length = 20,
+ .cap.pointer = (void *)capbuf,
+ };
+ acpi_status status;
+
+ status = acpi_run_osc(handle, &context);
+ if (ACPI_SUCCESS(status)) {
+ /* pointer + offset to DWORD 5 */
+ *retval = *((u32 *)(context.ret.pointer + 16));
+ kfree(context.ret.pointer);
+ }
+ return status;
+}
+
+static acpi_status cxl_query_osc(acpi_handle handle, u32 support, u32 *control)
+{
+ struct acpi_pci_root *root;
+ acpi_status status;
+ u32 result, capbuf[5];
+
+ root = acpi_pci_find_root(handle);
+ if (!root)
+ return -ENXIO;
+
+ support &= CXL_OSC_SUPPORT_VALID_MASK;
+ support |= cxl_osc_support_set;
+
+ capbuf[OSC_QUERY_DWORD] = OSC_QUERY_ENABLE;
+ capbuf[PCI_OSC_SUPPORT_DWORD] = root->osc_support_set;
+ capbuf[PCI_OSC_CONTROL_DWORD] = root->osc_control_set;
+ capbuf[CXL_OSC_SUPPORT_DWORD] = support;
+ if (control) {
+ *control &= CXL_OSC_CONTROL_VALID_MASK;
+ capbuf[CXL_OSC_CONTROL_DWORD] = *control | cxl_osc_control_set;
+ } else {
+ /* Run _OSC query only with existing controls. */
+ capbuf[CXL_OSC_CONTROL_DWORD] = cxl_osc_control_set;
+ }
+
+ status = acpi_cap_run_osc(handle, capbuf, cxl_osc_uuid_str, &result);
+ if (ACPI_SUCCESS(status)) {
+ cxl_osc_support_set = support;
+ if (control)
+ *control = result;
+ }
+ return status;
+}
+
+/**
+ * cxl_osc_declare_support - Declare support for CXL root _OSC features.
+ * @handle: ACPI device handle for the PCI root bridge (or PCIe Root Complex).
+ * @flags: The _OSC bits to declare support for.
+ *
+ * The PCI specific DWORDS are obtained from the pci root port device, and
+ * used as-is. The resulting CXL support set is saved in cxl_osc_support_set.
+ * Any future calls to this are forced to be strictly incremental from the
+ * existing cxl_osc_support_set.
+ */
+static acpi_status cxl_osc_declare_support(acpi_handle handle, u32 flags)
+{
+ acpi_status status;
+
+ mutex_lock(&acpi_desc_lock);
+ status = cxl_query_osc(handle, flags, NULL);
+ mutex_unlock(&acpi_desc_lock);
+ return status;
+}
+
+/**
+ * cxl_osc_request_control - Request control of CXL root _OSC features.
+ * @adev: ACPI device for the PCI root bridge (or PCIe Root Complex).
+ * @mask: Mask of _OSC bits to request control of, place to store control mask.
+ * @req: Mask of _OSC bits the control of is essential to the caller.
+ *
+ * The PCI specific DWORDS are obtained from the pci root port device, and
+ * used as-is. The resulting CXL control set is saved in cxl_osc_control_set.
+ * Any future calls to this are forced to be strictly incremental from the
+ * existing cxl_osc_control_set.
+ */
+static acpi_status cxl_osc_request_control(struct acpi_device *adev, u32 *mask,
+ u32 req)
+{
+ acpi_handle handle = adev->handle;
+ struct device *dev = &adev->dev;
+ struct acpi_pci_root *root;
+ acpi_status status = AE_OK;
+ u32 ctrl, capbuf[5];
+
+ if (!mask)
+ return AE_BAD_PARAMETER;
+
+ ctrl = *mask & CXL_OSC_MEM_ERROR_CONTROL;
+ if ((ctrl & req) != req)
+ return AE_TYPE;
+
+ root = acpi_pci_find_root(handle);
+ if (!root)
+ return -ENXIO;
+
+ mutex_lock(&acpi_desc_lock);
+
+ *mask = ctrl | cxl_osc_control_set;
+ /* No need to evaluate _OSC if the control was already granted. */
+ if ((cxl_osc_control_set & ctrl) == ctrl)
+ goto out;
+
+ /* Need to check the available controls bits before requesting them. */
+ while (*mask) {
+ status = cxl_query_osc(handle, cxl_osc_support_set, mask);
+ if (ACPI_FAILURE(status))
+ goto out;
+ if (ctrl == *mask)
+ break;
+ decode_cxl_osc_control(dev, "platform does not support",
+ ctrl & ~(*mask));
+ ctrl = *mask;
+ }
+
+ if ((ctrl & req) != req) {
+ decode_cxl_osc_control(
+ dev,
+ "not requesting control; platform does not support",
+ req & ~(ctrl));
+ status = AE_SUPPORT;
+ goto out;
+ }
+
+ capbuf[OSC_QUERY_DWORD] = 0;
+ capbuf[PCI_OSC_SUPPORT_DWORD] = root->osc_support_set;
+ capbuf[PCI_OSC_CONTROL_DWORD] = root->osc_control_set;
+ capbuf[CXL_OSC_SUPPORT_DWORD] = cxl_osc_support_set;
+ capbuf[CXL_OSC_CONTROL_DWORD] = ctrl;
+ status = acpi_cap_run_osc(handle, capbuf, cxl_osc_uuid_str, mask);
+ if (ACPI_SUCCESS(status))
+ cxl_osc_control_set = *mask;
+out:
+ mutex_unlock(&acpi_desc_lock);
+ return status;
+}
+
+static int cxl_negotiate_osc(struct acpi_device *adev)
+{
+ u32 cxl_support, cxl_control, requested;
+ acpi_handle handle = adev->handle;
+ struct device *dev = &adev->dev;
+ acpi_status status;
+
+ /* Declare support for everything */
+ cxl_support = CXL_OSC_SUPPORT_VALID_MASK;
+ decode_cxl_osc_support(dev, "OS supports", cxl_support);
+ status = cxl_osc_declare_support(handle, cxl_support);
+ if (ACPI_FAILURE(status)) {
+ dev_info(dev, "CXL_OSC failed (%s)\n",
+ acpi_format_exception(status));
+ return -ENXIO;
+ }
+
+ /* Request control for everything */
+ cxl_control = CXL_OSC_CONTROL_VALID_MASK;
+ requested = cxl_control;
+ status = cxl_osc_request_control(adev, &cxl_control,
+ CXL_OSC_MEM_ERROR_CONTROL);
+ if (ACPI_SUCCESS(status)) {
+ decode_cxl_osc_control(dev, "OS now controls", cxl_control);
+ } else {
+ decode_cxl_osc_control(dev, "OS requested", requested);
+ decode_cxl_osc_control(dev, "platform willing to grant",
+ cxl_control);
+ dev_info(dev, "_OSC failed (%s)\n",
+ acpi_format_exception(status));
+ }
+ return 0;
+}
+
+/**
+ * cxl_bus_acquire - Perform platform-specific bus operations
+ * @pdev: pci_dev associated with the CXL device
+ *
+ * This performs bus-specific operations such as ACPI _OSC to ensure that
+ * the bus is 'prepared'. Since the CXL definition of _OSC depends on the
+ * existing PCI _OSC DWORDS in 'Arg3', pull those in from the pci root port
+ * device, and merge those into the new CXL-augmented _OSC calls.
+ *
* If/when CXL support is defined by other platform firmware the kernel
* will need a mechanism to select between the platform specific version
* of this routine, until then, hard-code ACPI assumptions
@@ -21,6 +273,7 @@ int cxl_bus_acquire(struct pci_dev *pdev)
struct acpi_device *adev;
struct pci_dev *root_port;
struct device *root;
+ int rc;
root_port = pcie_find_root_port(pdev);
if (!root_port)
@@ -34,9 +287,11 @@ int cxl_bus_acquire(struct pci_dev *pdev)
if (!adev)
return -ENXIO;
- /* TODO: OSC enabling */
+ rc = cxl_negotiate_osc(adev);
+ if (rc)
+ dev_err(&pdev->dev, "Failed to negotiate OSC\n");
- return 0;
+ return rc;
}
EXPORT_SYMBOL_GPL(cxl_bus_acquire);
diff --git a/drivers/cxl/acpi.h b/drivers/cxl/acpi.h
index d638f8886ab7..6ef154021745 100644
--- a/drivers/cxl/acpi.h
+++ b/drivers/cxl/acpi.h
@@ -12,4 +12,24 @@ struct cxl_acpi_desc {
int cxl_bus_acquire(struct pci_dev *pci_dev);
+/* Indexes into _OSC Capabilities Buffer */
+#define PCI_OSC_SUPPORT_DWORD 1 /* DWORD 2 */
+#define PCI_OSC_CONTROL_DWORD 2 /* DWORD 3 */
+#define CXL_OSC_SUPPORT_DWORD 3 /* DWORD 4 */
+#define CXL_OSC_CONTROL_DWORD 4 /* DWORD 5 */
+
+/* CXL Host Bridge _OSC: Capabilities DWORD 4: Support Field */
+#define CXL_OSC_PORT_REG_ACCESS_SUPPORT 0x00000001
+#define CXL_OSC_PORT_DEV_REG_ACCESS_SUPPORT 0x00000002
+#define CXL_OSC_PER_SUPPORT 0x00000004
+#define CXL_OSC_NATIVE_HP_SUPPORT 0x00000008
+#define CXL_OSC_SUPPORT_VALID_MASK (CXL_OSC_PORT_REG_ACCESS_SUPPORT | \
+ CXL_OSC_PORT_DEV_REG_ACCESS_SUPPORT | \
+ CXL_OSC_PER_SUPPORT | \
+ CXL_OSC_NATIVE_HP_SUPPORT)
+
+/* CXL Host Bridge _OSC: Capabilities DWORD 5: Control Field */
+#define CXL_OSC_MEM_ERROR_CONTROL 0x00000001
+#define CXL_OSC_CONTROL_VALID_MASK (CXL_OSC_MEM_ERROR_CONTROL)
+
#endif /* __CXL_ACPI_H__ */
--
2.29.2
On Tue, 2020-12-08 at 16:24 -0800, Ben Widawsky wrote:
> Changes since v1 [1]
>
> A few additions have been made:
> - IOCTL (UAPI) interface has been added with commands
> - Kernel docs have been created
> - A new debug macro is introduced and sprinkled throughout.
>
> A deletion was made:
> - Removal of the non-standard _OSC UUID.
>
> The detailed list of fixes is:
> - fix cxl_register() no previous prototype warning (0day robot)
> - s/REGLOG/REGLOC/ (Ben)
> - Wait for doorbell on cxl_mem_mbox_get() and add comment on why (Ben)
> - make "type-3" a proper adjective, add spec references, also did the same for
> the Kconfig (Bjorn)
> - align some defines (Bjorn)
> - s/bar/BAR (Bjorn)
> - rename cxl_bus_prepared() to cxl_bus_acquire() (Bjorn)
> - move definition of struct cxl_mem to "cxl/mem: Map memory device registers" (Bjorn)
> - use consistent hex/decimal (Bjorn)
> - use consistent upper/lower hex values (Bjorn)
> - get rid of READ_ONCE (Bjorn)
> - add offsets to debug messages (Bjorn)
> - cleanup SPDX comment style (Bjorn, Christoph)
> - change errors returned by case (Bjorn, Dan)
> - 80 character violation cleanups (Christoph)
> - cleanup CXL_BUS_PROVIDER dependencies (Christoph, Randy)
> - remove "raw" from mmio functions (Dan)
> - rename PCI_DVSEC_VENDOR_CXL to add _ID (Jonathan)
> - combine introduction of mbox infrastruct and cxl_mem_identify() (Jonathan)
> - add ABI documentation for sysfs attributes (Jonathan)
> - document scope of cxl_memdev_lock (Jonathan)
> - rework cxl_register() to have devm semantics (reaction to comments about
> cxl_mem_remove() and cxl_mem_add_memdev() semantics) (Jonathan)
> - fix cxl_mem_exit() ordering (Jonathan)
> - use GENMASK/GET_FIELD (Jonathan)
> - fix and add comments for cap ids (Jonathan)
> - use _OFFSET postfix in definitions (Jonathan)
> - save pci_set_drvdata for later (Jonathan)
There are a few more change credits for the acpi patches:
- Remove unnecessary helpers and callbacks in drivers/cxl/acpi.c (Christoph, Jonathan)
- Remove some unnecessary variable initalizations (Bjorn, Jonathan)
- Convert to platform_driver (Rafael)
>
> [1]: https://lore.kernel.org/linux-cxl/[email protected]/
>
> ---
>
> Introduce support for “type-3” memory devices defined in the recently released
> Compute Express Link (CXL) 2.0 specification[2]. Specifically, these are the
> memory devices defined by section 8.2.8.5 of the CXL 2.0 spec. A reference
> implementation emulating these devices has been submitted to the QEMU mailing
> list and is available on gitlab [3]. “Type-3” is a CXL device that acts as a
> memory expander for RAM or PMEM. It might be interleaved with other CXL devices
> in a given physical address range.
>
> These changes allow for foundational enumeration of CXL 2.0 memory devices as
> well as basic userspace interaction. The functionality present is:
> - Initial driver bring-up
> - Device enumeration and an initial sysfs representation
> - Submit a basic firmware command via ‘mailbox’ to an emulated memory device
> with non-volatile capacity.
> - Provide an interface to send "raw" commands to the hardware.
>
> Some of the functionality that is still missing includes:
> - Memory interleaving at the host bridge, root port, or switch level
> - CXL 1.1 Root Complex Integrated Endpoint Support
> - CXL 2.0 Hot plug support
> - A bevy of supported device commands
>
> In addition to the core functionality of discovering the spec defined registers
> and resources, introduce a CXL device model that will be the foundation for
> translating CXL capabilities into existing Linux infrastructure for Persistent
> Memory and other memory devices. For now, this only includes support for the
> management command mailbox that type-3 devices surface. These control devices
> fill the role of “DIMMs” / nmemX memory-devices in LIBNVDIMM terms.
>
> Now, while implementing the driver some feedback for the specification was
> generated to cover perceived gaps and address conflicts. The feedback is
> presented as a reference implementation in the driver and QEMU emulation.
> Specifically the following concepts are original to the Linux implementation and
> feedback / collaboration is requested to develop these into specification
> proposals:
> 1. Top level ACPI object (ACPI0017)
> 2. HW imposed address space and interleave constraints
>
> ACPI0017
> --------
> Introduce a new ACPI namespace device with an _HID of ACPI0017. The purpose of
> this object is twofold, support a legacy OS with a set of out-of-tree CXL
> modules, and establish an attach point for a driver that knows about
> interleaving. Both of these boil down to the same point, to centralize Operating
> System support for resources described by the CXL Early Discovery Table (CEDT).
>
> The legacy OS problem stems from the spec's description of a host bridge,
> ACPI0016 is denoted as the _HID for host bridges, with a _CID of PNP0A08. In a
> CXL unaware version of Linux, the core ACPI subsystem will bind a driver to
> PNP0A08 and preclude a CXL-aware driver from binding to ACPI0016. An ACPI0017
> device allows a standalone CXL-aware driver to register for handling /
> coordinating CEDT and CXL-specific _OSC control.
>
> Similarly when managing interleaving there needs to be some management layer
> above the ACPI0016 device that is capable of assembling leaf nodes into
> interleave sets. As is the case with ACPI0012 that does this central
> coordination for NFIT defined resources, ACPI0017 does the same for CEDT
> described resources.
>
> Memory Windows
> -------
> For CXL.mem capable platforms, there is a need for a mechanism for platform
> firmware to make the Operating System aware of any restrictions that hardware
> might have in address space. For example, in a system with 4 host bridges all
> participating in an interleave set, the firmware needs to provide some
> description of this. That information is missing from the CXL 2.0 spec as of
> today and it also is not implemented in the driver. A variety of ACPI based
> mechanisms, for example _CRS fields on the ACPI0017 device, were considered.
>
> Next steps after this basic foundation is expanded command support and LIBNVDIMM
> integration. This is the initial “release early / release often” version of the
> Linux CXL enabling.
>
> [2]: https://www.computeexpresslink.org/
> [3]: https://gitlab.com/bwidawsk/qemu/-/tree/cxl-2.0v2
>
>
> Ben Widawsky (10):
> docs: cxl: Add basic documentation
> cxl/mem: Map memory device registers
> cxl/mem: Find device capabilities
> cxl/mem: Implement polled mode mailbox
> cxl/mem: Add basic IOCTL interface
> cxl/mem: Add send command
> cxl/mem: Add a "RAW" send command
> cxl: Add basic debugging
> MAINTAINERS: Add maintainers of the CXL driver
> WIP/cxl/mem: Add get firmware for testing
>
> Dan Williams (2):
> cxl/mem: Introduce a driver for CXL-2.0-Type-3 endpoints
> cxl/mem: Register CXL memX devices
>
> Vishal Verma (2):
> cxl/acpi: Add an acpi_cxl module for the CXL interconnect
> cxl/acpi: add OSC support
>
> Documentation/ABI/testing/sysfs-bus-cxl | 26 +
> Documentation/cxl/index.rst | 12 +
> Documentation/cxl/memory-devices.rst | 51 ++
> Documentation/index.rst | 1 +
> MAINTAINERS | 10 +
> drivers/Kconfig | 1 +
> drivers/Makefile | 1 +
> drivers/cxl/Kconfig | 58 ++
> drivers/cxl/Makefile | 9 +
> drivers/cxl/acpi.c | 352 ++++++++
> drivers/cxl/acpi.h | 35 +
> drivers/cxl/bus.c | 54 ++
> drivers/cxl/bus.h | 8 +
> drivers/cxl/cxl.h | 188 +++++
> drivers/cxl/mem.c | 1022 +++++++++++++++++++++++
> drivers/cxl/pci.h | 34 +
> include/acpi/actbl1.h | 51 ++
> include/uapi/linux/cxl_mem.h | 148 ++++
> 18 files changed, 2061 insertions(+)
> create mode 100644 Documentation/ABI/testing/sysfs-bus-cxl
> create mode 100644 Documentation/cxl/index.rst
> create mode 100644 Documentation/cxl/memory-devices.rst
> create mode 100644 drivers/cxl/Kconfig
> create mode 100644 drivers/cxl/Makefile
> create mode 100644 drivers/cxl/acpi.c
> create mode 100644 drivers/cxl/acpi.h
> create mode 100644 drivers/cxl/bus.c
> create mode 100644 drivers/cxl/bus.h
> create mode 100644 drivers/cxl/cxl.h
> create mode 100644 drivers/cxl/mem.c
> create mode 100644 drivers/cxl/pci.h
> create mode 100644 include/uapi/linux/cxl_mem.h
>
On 20-12-08 16:24:04, Ben Widawsky wrote:
[snip]
This is available on gitlab here: https://gitlab.com/bwidawsk/linux/-/tree/cxl-2.0v2
On Tue, Dec 8, 2020 at 4:24 PM Ben Widawsky <[email protected]> wrote:
>
> Add a straightforward IOCTL that provides a mechanism for userspace to
> query the supported memory device commands.
>
> Memory device commands are specified in 8.2.9 of the CXL 2.0
> specification. They are submitted through a mailbox mechanism specified
> in 8.2.8.4.
>
> Signed-off-by: Ben Widawsky <[email protected]>
>
> ---
>
> I did attempt to use the same struct for querying commands as well as
> sending commands (upcoming patch). The number of unused fields between
> the two made for a bad fit IMO.
>
> Signed-off-by: Ben Widawsky <[email protected]>
> ---
> Documentation/cxl/memory-devices.rst | 9 +++
> drivers/cxl/mem.c | 89 +++++++++++++++++++++++
> include/uapi/linux/cxl_mem.h | 102 +++++++++++++++++++++++++++
> 3 files changed, 200 insertions(+)
> create mode 100644 include/uapi/linux/cxl_mem.h
>
> diff --git a/Documentation/cxl/memory-devices.rst b/Documentation/cxl/memory-devices.rst
> index 5f723c25382b..ec54674b3822 100644
> --- a/Documentation/cxl/memory-devices.rst
> +++ b/Documentation/cxl/memory-devices.rst
> @@ -32,6 +32,15 @@ CXL Memory Device
> .. kernel-doc:: drivers/cxl/mem.c
> :internal:
>
> +CXL IOCTL Interface
> +-------------------
> +
> +.. kernel-doc:: include/uapi/linux/cxl_mem.h
> + :doc: UAPI
> +
> +.. kernel-doc:: include/uapi/linux/cxl_mem.h
> + :internal:
> +
> External Interfaces
> ===================
>
> diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> index bb6ea58f6c7b..2c4aadcea0e4 100644
> --- a/drivers/cxl/mem.c
> +++ b/drivers/cxl/mem.c
> @@ -7,6 +7,7 @@
> #include <linux/idr.h>
> #include <linux/pci.h>
> #include <linux/io.h>
> +#include <uapi/linux/cxl_mem.h>
> #include "acpi.h"
> #include "pci.h"
> #include "cxl.h"
> @@ -73,6 +74,49 @@ static DEFINE_IDR(cxl_mem_idr);
> /* protect cxl_mem_idr allocations */
> static DEFINE_MUTEX(cxl_memdev_lock);
>
> +/*
> + * This table defines the supported mailboxes commands for the driver. The id is
> + * ordinal and thus gaps in this table aren't allowed. This table is made up of
> + * a UAPI structure. Non-negative values in the table will be validated against
> + * the user's input. For example, if size_in is 0, and the user passed in 1, it
> + * is an error.
> + */
> +#define CXL_CMD(_id, _flags, sin, sout, _name, _enable, op) \
> + { \
> + { .id = CXL_MEM_COMMAND_ID_##_id, \
> + .flags = CXL_MEM_COMMAND_FLAG_##_flags, \
> + .size_in = sin, \
> + .size_out = sout, \
> + .name = _name }, \
> + .enable = _enable, .opcode = op \
> + }
Seems the ordinality requirement could be dropped if the definition was:
#define CXL_CMD(_id, _flags, sin, sout, _name, _enable, op) \
[CXL_MEM_COMMAND_ID_##_id] = {
\
{ .id = CXL_MEM_COMMAND_ID_##_id, \
...
Then command 0 and 42 could be defined out of order in the table.
Especially if we need to config-disable or deprecate commands, I think
it would be useful if this table was tolerant to being sparse.
> +
> +/**
> + * struct cxl_mem_command - Driver representation of a memory device command
> + * @info: Command information as it exists for the UAPI
> + * @opcode: The actual bits used for the mailbox protocol
> + * @enable: Whether the command is enabled. The driver may support a large set
> + * of commands that may not be enabled. The primary reason a command
> + * would not be enabled is for commands that are specified as optional
> + * and the hardware doesn't support the command.
> + *
> + * The cxl_mem_command is the driver's internal representation of commands that
> + * are supported by the driver. Some of these commands may not be supported by
> + * the hardware (!@enable). The driver will use @info to validate the fields
> + * passed in by the user then submit the @opcode to the hardware.
> + *
> + * See struct cxl_command_info.
> + */
> +struct cxl_mem_command {
> + const struct cxl_command_info info;
> + const u16 opcode;
> + bool enable;
> +};
> +
> +static struct cxl_mem_command mem_commands[] = {
> + CXL_CMD(INVALID, NONE, 0, 0, "Reserved", false, 0),
> +};
> +
> static int cxl_mem_wait_for_doorbell(struct cxl_mem *cxlm)
> {
> const int timeout = msecs_to_jiffies(2000);
> @@ -268,8 +312,53 @@ static int cxl_mem_open(struct inode *inode, struct file *file)
> return 0;
> }
>
> +static int cxl_mem_count_commands(void)
> +{
> + int i, n = 0;
> +
> + for (i = 0; i < ARRAY_SIZE(mem_commands); i++) {
> + struct cxl_mem_command *c = &mem_commands[i];
> +
> + if (c->enable)
> + n++;
> + }
> +
> + return n;
> +}
> +
> static long cxl_mem_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
> {
> + if (cmd == CXL_MEM_QUERY_COMMANDS) {
> + struct cxl_mem_query_commands __user *q = (void __user *)arg;
> + u32 n_commands;
> + int i, j;
> +
> + if (get_user(n_commands, (u32 __user *)arg))
> + return -EFAULT;
> +
> + if (n_commands == 0)
> + return put_user(cxl_mem_count_commands(),
> + (u32 __user *)arg);
> +
> + for (i = 0, j = 0;
> + i < ARRAY_SIZE(mem_commands) && j < n_commands; i++) {
> + struct cxl_mem_command *c = &mem_commands[i];
> + const struct cxl_command_info *info = &c->info;
> +
> + if (!c->enable)
> + continue;
> +
> + if (copy_to_user(&q->commands[j], info, sizeof(*info)))
> + return -EFAULT;
> +
> + if (copy_to_user(&q->commands[j].name, info->name,
> + strlen(info->name)))
> + return -EFAULT;
Not sure why this is needed, see comment below about @name in
cxl_mem_query_commands.
> +
> + j++;
> + }
> + }
> +
> return -ENOTTY;
> }
>
> diff --git a/include/uapi/linux/cxl_mem.h b/include/uapi/linux/cxl_mem.h
> new file mode 100644
> index 000000000000..1d1e143f98ec
> --- /dev/null
> +++ b/include/uapi/linux/cxl_mem.h
> @@ -0,0 +1,102 @@
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +/*
> + * CXL IOCTLs for Memory Devices
> + */
> +
> +#ifndef _UAPI_CXL_MEM_H_
> +#define _UAPI_CXL_MEM_H_
> +
> +#if defined(__cplusplus)
> +extern "C" {
> +#endif
> +
> +/**
> + * DOC: UAPI
> + *
> + * CXL memory devices expose UAPI to have a standard user interface.
> + * Userspace can refer to these structure definitions and UAPI formats
> + * to communicate to driver
> + */
> +
> +#define CXL_MEM_QUERY_COMMANDS _IOR('C', 1, struct cxl_mem_query_commands)
> +
> +#define CXL_MEM_COMMAND_NAME_LENGTH 32
> +
> +/**
> + * struct cxl_command_info - Command information returned from a query.
> + * @id: ID number for the command.
> + * @flags: Flags that specify command behavior.
> + *
> + * - CXL_MEM_COMMAND_FLAG_TAINT: Using this command will taint the kernel.
> + * @size_in: Expected input size, or -1 if variable length.
> + * @size_out: Expected output size, or -1 if variable length.
> + * @name: Name describing the command.
> + *
> + * Represents a single command that is supported by both the driver and the
> + * hardware. The is returned as part of an array from the query ioctl. The
> + * following would be a command named "foobar" that takes a variable length
> + * input and returns 0 bytes of output.
> + *
> + * - @id = 10
> + * - @name = foobar
> + * - @flags = 0
> + * - @size_in = -1
> + * - @size_out = 0
> + *
> + * See struct cxl_mem_query_commands.
> + */
> +struct cxl_command_info {
> + __u32 id;
> +#define CXL_MEM_COMMAND_ID_INVALID 0
> +
> + __u32 flags;
> +#define CXL_MEM_COMMAND_FLAG_NONE 0
> +#define CXL_MEM_COMMAND_FLAG_TAINT BIT(0)
> +
> + __s32 size_in;
> + __s32 size_out;
> +
> + char name[32];
Why does the name for a command need to be shuffled back and forth
over the ioctl interface. Can't this be handled by a static lookup
table defined in the header?
> +};
> +
> +/**
> + * struct cxl_mem_query_commands - Query supported commands.
> + * @n_commands: In/out parameter. When @n_commands is > 0, the driver will
> + * return min(num_support_commands, n_commands). When @n_commands
> + * is 0, driver will return the number of total supported commands.
> + * @rsvd: Reserved for future use.
> + * @commands: Output array of supported commands. This array must be allocated
> + * by userspace to be at least min(num_support_commands, @n_commands)
> + *
> + * Allow userspace to query the available commands supported by both the driver,
> + * and the hardware. Commands that aren't supported by either the driver, or the
> + * hardware are not returned in the query.
> + *
> + * Examples:
> + *
> + * - { .n_commands = 0 } // Get number of supported commands
> + * - { .n_commands = 15, .commands = buf } // Return first 15 (or less)
> + * supported commands
> + *
> + * See struct cxl_command_info.
> + */
> +struct cxl_mem_query_commands {
> + /*
> + * Input: Number of commands to return (space allocated by user)
> + * Output: Number of commands supported by the driver/hardware
> + *
> + * If n_commands is 0, kernel will only return number of commands and
> + * not try to populate commands[], thus allowing userspace to know how
> + * much space to allocate
> + */
> + __u32 n_commands;
> + __u32 rsvd;
> +
> + struct cxl_command_info __user commands[]; /* out: supported commands */
> +};
> +
> +#if defined(__cplusplus)
> +}
> +#endif
> +
> +#endif
> --
> 2.29.2
>
On 20-12-08 17:37:50, Dan Williams wrote:
> On Tue, Dec 8, 2020 at 4:24 PM Ben Widawsky <[email protected]> wrote:
> >
> > Add a straightforward IOCTL that provides a mechanism for userspace to
> > query the supported memory device commands.
> >
> > Memory device commands are specified in 8.2.9 of the CXL 2.0
> > specification. They are submitted through a mailbox mechanism specified
> > in 8.2.8.4.
> >
> > Signed-off-by: Ben Widawsky <[email protected]>
> >
> > ---
> >
> > I did attempt to use the same struct for querying commands as well as
> > sending commands (upcoming patch). The number of unused fields between
> > the two made for a bad fit IMO.
> >
> > Signed-off-by: Ben Widawsky <[email protected]>
> > ---
> > Documentation/cxl/memory-devices.rst | 9 +++
> > drivers/cxl/mem.c | 89 +++++++++++++++++++++++
> > include/uapi/linux/cxl_mem.h | 102 +++++++++++++++++++++++++++
> > 3 files changed, 200 insertions(+)
> > create mode 100644 include/uapi/linux/cxl_mem.h
> >
> > diff --git a/Documentation/cxl/memory-devices.rst b/Documentation/cxl/memory-devices.rst
> > index 5f723c25382b..ec54674b3822 100644
> > --- a/Documentation/cxl/memory-devices.rst
> > +++ b/Documentation/cxl/memory-devices.rst
> > @@ -32,6 +32,15 @@ CXL Memory Device
> > .. kernel-doc:: drivers/cxl/mem.c
> > :internal:
> >
> > +CXL IOCTL Interface
> > +-------------------
> > +
> > +.. kernel-doc:: include/uapi/linux/cxl_mem.h
> > + :doc: UAPI
> > +
> > +.. kernel-doc:: include/uapi/linux/cxl_mem.h
> > + :internal:
> > +
> > External Interfaces
> > ===================
> >
> > diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> > index bb6ea58f6c7b..2c4aadcea0e4 100644
> > --- a/drivers/cxl/mem.c
> > +++ b/drivers/cxl/mem.c
> > @@ -7,6 +7,7 @@
> > #include <linux/idr.h>
> > #include <linux/pci.h>
> > #include <linux/io.h>
> > +#include <uapi/linux/cxl_mem.h>
> > #include "acpi.h"
> > #include "pci.h"
> > #include "cxl.h"
> > @@ -73,6 +74,49 @@ static DEFINE_IDR(cxl_mem_idr);
> > /* protect cxl_mem_idr allocations */
> > static DEFINE_MUTEX(cxl_memdev_lock);
> >
> > +/*
> > + * This table defines the supported mailboxes commands for the driver. The id is
> > + * ordinal and thus gaps in this table aren't allowed. This table is made up of
> > + * a UAPI structure. Non-negative values in the table will be validated against
> > + * the user's input. For example, if size_in is 0, and the user passed in 1, it
> > + * is an error.
> > + */
> > +#define CXL_CMD(_id, _flags, sin, sout, _name, _enable, op) \
> > + { \
> > + { .id = CXL_MEM_COMMAND_ID_##_id, \
> > + .flags = CXL_MEM_COMMAND_FLAG_##_flags, \
> > + .size_in = sin, \
> > + .size_out = sout, \
> > + .name = _name }, \
> > + .enable = _enable, .opcode = op \
> > + }
>
> Seems the ordinality requirement could be dropped if the definition was:
>
> #define CXL_CMD(_id, _flags, sin, sout, _name, _enable, op) \
> [CXL_MEM_COMMAND_ID_##_id] = {
> \
> { .id = CXL_MEM_COMMAND_ID_##_id, \
> ...
>
> Then command 0 and 42 could be defined out of order in the table.
> Especially if we need to config-disable or deprecate commands, I think
> it would be useful if this table was tolerant to being sparse.
>
How sparse are we talking? The current form does support sparseness, but
obviously gets quite large if the ID numbering is similar to random
distribution.
I think if we do see this being more like random distribution, it can be
supported, but I think it adds a decent amount of complexity for what I see as
not much reward - unless you know of a fairly simple way to create this data
structure with full sparse ID support?
> > +
> > +/**
> > + * struct cxl_mem_command - Driver representation of a memory device command
> > + * @info: Command information as it exists for the UAPI
> > + * @opcode: The actual bits used for the mailbox protocol
> > + * @enable: Whether the command is enabled. The driver may support a large set
> > + * of commands that may not be enabled. The primary reason a command
> > + * would not be enabled is for commands that are specified as optional
> > + * and the hardware doesn't support the command.
> > + *
> > + * The cxl_mem_command is the driver's internal representation of commands that
> > + * are supported by the driver. Some of these commands may not be supported by
> > + * the hardware (!@enable). The driver will use @info to validate the fields
> > + * passed in by the user then submit the @opcode to the hardware.
> > + *
> > + * See struct cxl_command_info.
> > + */
> > +struct cxl_mem_command {
> > + const struct cxl_command_info info;
> > + const u16 opcode;
> > + bool enable;
> > +};
> > +
> > +static struct cxl_mem_command mem_commands[] = {
> > + CXL_CMD(INVALID, NONE, 0, 0, "Reserved", false, 0),
> > +};
> > +
> > static int cxl_mem_wait_for_doorbell(struct cxl_mem *cxlm)
> > {
> > const int timeout = msecs_to_jiffies(2000);
> > @@ -268,8 +312,53 @@ static int cxl_mem_open(struct inode *inode, struct file *file)
> > return 0;
> > }
> >
> > +static int cxl_mem_count_commands(void)
> > +{
> > + int i, n = 0;
> > +
> > + for (i = 0; i < ARRAY_SIZE(mem_commands); i++) {
> > + struct cxl_mem_command *c = &mem_commands[i];
> > +
> > + if (c->enable)
> > + n++;
> > + }
> > +
> > + return n;
> > +}
> > +
> > static long cxl_mem_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
> > {
> > + if (cmd == CXL_MEM_QUERY_COMMANDS) {
> > + struct cxl_mem_query_commands __user *q = (void __user *)arg;
> > + u32 n_commands;
> > + int i, j;
> > +
> > + if (get_user(n_commands, (u32 __user *)arg))
> > + return -EFAULT;
> > +
> > + if (n_commands == 0)
> > + return put_user(cxl_mem_count_commands(),
> > + (u32 __user *)arg);
> > +
> > + for (i = 0, j = 0;
> > + i < ARRAY_SIZE(mem_commands) && j < n_commands; i++) {
> > + struct cxl_mem_command *c = &mem_commands[i];
> > + const struct cxl_command_info *info = &c->info;
> > +
> > + if (!c->enable)
> > + continue;
> > +
> > + if (copy_to_user(&q->commands[j], info, sizeof(*info)))
> > + return -EFAULT;
> > +
> > + if (copy_to_user(&q->commands[j].name, info->name,
> > + strlen(info->name)))
> > + return -EFAULT;
>
> Not sure why this is needed, see comment below about @name in
> cxl_mem_query_commands.
>
> > +
> > + j++;
> > + }
> > + }
> > +
> > return -ENOTTY;
> > }
> >
> > diff --git a/include/uapi/linux/cxl_mem.h b/include/uapi/linux/cxl_mem.h
> > new file mode 100644
> > index 000000000000..1d1e143f98ec
> > --- /dev/null
> > +++ b/include/uapi/linux/cxl_mem.h
> > @@ -0,0 +1,102 @@
> > +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> > +/*
> > + * CXL IOCTLs for Memory Devices
> > + */
> > +
> > +#ifndef _UAPI_CXL_MEM_H_
> > +#define _UAPI_CXL_MEM_H_
> > +
> > +#if defined(__cplusplus)
> > +extern "C" {
> > +#endif
> > +
> > +/**
> > + * DOC: UAPI
> > + *
> > + * CXL memory devices expose UAPI to have a standard user interface.
> > + * Userspace can refer to these structure definitions and UAPI formats
> > + * to communicate to driver
> > + */
> > +
> > +#define CXL_MEM_QUERY_COMMANDS _IOR('C', 1, struct cxl_mem_query_commands)
> > +
> > +#define CXL_MEM_COMMAND_NAME_LENGTH 32
> > +
> > +/**
> > + * struct cxl_command_info - Command information returned from a query.
> > + * @id: ID number for the command.
> > + * @flags: Flags that specify command behavior.
> > + *
> > + * - CXL_MEM_COMMAND_FLAG_TAINT: Using this command will taint the kernel.
> > + * @size_in: Expected input size, or -1 if variable length.
> > + * @size_out: Expected output size, or -1 if variable length.
> > + * @name: Name describing the command.
> > + *
> > + * Represents a single command that is supported by both the driver and the
> > + * hardware. The is returned as part of an array from the query ioctl. The
> > + * following would be a command named "foobar" that takes a variable length
> > + * input and returns 0 bytes of output.
> > + *
> > + * - @id = 10
> > + * - @name = foobar
> > + * - @flags = 0
> > + * - @size_in = -1
> > + * - @size_out = 0
> > + *
> > + * See struct cxl_mem_query_commands.
> > + */
> > +struct cxl_command_info {
> > + __u32 id;
> > +#define CXL_MEM_COMMAND_ID_INVALID 0
> > +
> > + __u32 flags;
> > +#define CXL_MEM_COMMAND_FLAG_NONE 0
> > +#define CXL_MEM_COMMAND_FLAG_TAINT BIT(0)
> > +
> > + __s32 size_in;
> > + __s32 size_out;
> > +
> > + char name[32];
>
> Why does the name for a command need to be shuffled back and forth
> over the ioctl interface. Can't this be handled by a static lookup
> table defined in the header?
>
I was thinking of cases where the userspace application doesn't match the
current kernel's UAPI and giving the driver flexibility to return whatever.
OTTOMH, I also can't think of a way to do this if you want to do define the
table sparsely though. Do you have ideas for that?
> > +};
> > +
> > +/**
> > + * struct cxl_mem_query_commands - Query supported commands.
> > + * @n_commands: In/out parameter. When @n_commands is > 0, the driver will
> > + * return min(num_support_commands, n_commands). When @n_commands
> > + * is 0, driver will return the number of total supported commands.
> > + * @rsvd: Reserved for future use.
> > + * @commands: Output array of supported commands. This array must be allocated
> > + * by userspace to be at least min(num_support_commands, @n_commands)
> > + *
> > + * Allow userspace to query the available commands supported by both the driver,
> > + * and the hardware. Commands that aren't supported by either the driver, or the
> > + * hardware are not returned in the query.
> > + *
> > + * Examples:
> > + *
> > + * - { .n_commands = 0 } // Get number of supported commands
> > + * - { .n_commands = 15, .commands = buf } // Return first 15 (or less)
> > + * supported commands
> > + *
> > + * See struct cxl_command_info.
> > + */
> > +struct cxl_mem_query_commands {
> > + /*
> > + * Input: Number of commands to return (space allocated by user)
> > + * Output: Number of commands supported by the driver/hardware
> > + *
> > + * If n_commands is 0, kernel will only return number of commands and
> > + * not try to populate commands[], thus allowing userspace to know how
> > + * much space to allocate
> > + */
> > + __u32 n_commands;
> > + __u32 rsvd;
> > +
> > + struct cxl_command_info __user commands[]; /* out: supported commands */
> > +};
> > +
> > +#if defined(__cplusplus)
> > +}
> > +#endif
> > +
> > +#endif
> > --
> > 2.29.2
> >
On Tue, Dec 8, 2020 at 6:13 PM Ben Widawsky <[email protected]> wrote:
>
> On 20-12-08 17:37:50, Dan Williams wrote:
> > On Tue, Dec 8, 2020 at 4:24 PM Ben Widawsky <[email protected]> wrote:
> > >
> > > Add a straightforward IOCTL that provides a mechanism for userspace to
> > > query the supported memory device commands.
> > >
> > > Memory device commands are specified in 8.2.9 of the CXL 2.0
> > > specification. They are submitted through a mailbox mechanism specified
> > > in 8.2.8.4.
> > >
> > > Signed-off-by: Ben Widawsky <[email protected]>
> > >
> > > ---
> > >
> > > I did attempt to use the same struct for querying commands as well as
> > > sending commands (upcoming patch). The number of unused fields between
> > > the two made for a bad fit IMO.
> > >
> > > Signed-off-by: Ben Widawsky <[email protected]>
> > > ---
> > > Documentation/cxl/memory-devices.rst | 9 +++
> > > drivers/cxl/mem.c | 89 +++++++++++++++++++++++
> > > include/uapi/linux/cxl_mem.h | 102 +++++++++++++++++++++++++++
> > > 3 files changed, 200 insertions(+)
> > > create mode 100644 include/uapi/linux/cxl_mem.h
> > >
> > > diff --git a/Documentation/cxl/memory-devices.rst b/Documentation/cxl/memory-devices.rst
> > > index 5f723c25382b..ec54674b3822 100644
> > > --- a/Documentation/cxl/memory-devices.rst
> > > +++ b/Documentation/cxl/memory-devices.rst
> > > @@ -32,6 +32,15 @@ CXL Memory Device
> > > .. kernel-doc:: drivers/cxl/mem.c
> > > :internal:
> > >
> > > +CXL IOCTL Interface
> > > +-------------------
> > > +
> > > +.. kernel-doc:: include/uapi/linux/cxl_mem.h
> > > + :doc: UAPI
> > > +
> > > +.. kernel-doc:: include/uapi/linux/cxl_mem.h
> > > + :internal:
> > > +
> > > External Interfaces
> > > ===================
> > >
> > > diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> > > index bb6ea58f6c7b..2c4aadcea0e4 100644
> > > --- a/drivers/cxl/mem.c
> > > +++ b/drivers/cxl/mem.c
> > > @@ -7,6 +7,7 @@
> > > #include <linux/idr.h>
> > > #include <linux/pci.h>
> > > #include <linux/io.h>
> > > +#include <uapi/linux/cxl_mem.h>
> > > #include "acpi.h"
> > > #include "pci.h"
> > > #include "cxl.h"
> > > @@ -73,6 +74,49 @@ static DEFINE_IDR(cxl_mem_idr);
> > > /* protect cxl_mem_idr allocations */
> > > static DEFINE_MUTEX(cxl_memdev_lock);
> > >
> > > +/*
> > > + * This table defines the supported mailboxes commands for the driver. The id is
> > > + * ordinal and thus gaps in this table aren't allowed. This table is made up of
> > > + * a UAPI structure. Non-negative values in the table will be validated against
> > > + * the user's input. For example, if size_in is 0, and the user passed in 1, it
> > > + * is an error.
> > > + */
> > > +#define CXL_CMD(_id, _flags, sin, sout, _name, _enable, op) \
> > > + { \
> > > + { .id = CXL_MEM_COMMAND_ID_##_id, \
> > > + .flags = CXL_MEM_COMMAND_FLAG_##_flags, \
> > > + .size_in = sin, \
> > > + .size_out = sout, \
> > > + .name = _name }, \
> > > + .enable = _enable, .opcode = op \
> > > + }
> >
> > Seems the ordinality requirement could be dropped if the definition was:
> >
> > #define CXL_CMD(_id, _flags, sin, sout, _name, _enable, op) \
> > [CXL_MEM_COMMAND_ID_##_id] = {
> > \
> > { .id = CXL_MEM_COMMAND_ID_##_id, \
> > ...
> >
> > Then command 0 and 42 could be defined out of order in the table.
> > Especially if we need to config-disable or deprecate commands, I think
> > it would be useful if this table was tolerant to being sparse.
> >
>
> How sparse are we talking? The current form does support sparseness, but
> obviously gets quite large if the ID numbering is similar to random
> distribution.
"Sparse" may have been the wrong word to use. I was implying sparse
enough that if I add command N+1 I don't need to be careful where I
put it in mem_commands, but still be able to rely on lookups into
mem_commands being indexed by the command-id.
> I think if we do see this being more like random distribution, it can be
> supported, but I think it adds a decent amount of complexity for what I see as
> not much reward - unless you know of a fairly simple way to create this data
> structure with full sparse ID support?
I'm expecting the command distribution to be mostly uniform, it's more
of the lookup property that I think would be useful especially for the
dynamic case of walking mem_commands to update it relative to what the
hardware supports or other metadata. Speaking of which I think @enable
should be turned into @flags of which 'enable' is one, in case we want
to define more flags in the future.
>
> > > +
> > > +/**
> > > + * struct cxl_mem_command - Driver representation of a memory device command
> > > + * @info: Command information as it exists for the UAPI
> > > + * @opcode: The actual bits used for the mailbox protocol
> > > + * @enable: Whether the command is enabled. The driver may support a large set
> > > + * of commands that may not be enabled. The primary reason a command
> > > + * would not be enabled is for commands that are specified as optional
> > > + * and the hardware doesn't support the command.
> > > + *
> > > + * The cxl_mem_command is the driver's internal representation of commands that
> > > + * are supported by the driver. Some of these commands may not be supported by
> > > + * the hardware (!@enable). The driver will use @info to validate the fields
> > > + * passed in by the user then submit the @opcode to the hardware.
> > > + *
> > > + * See struct cxl_command_info.
> > > + */
> > > +struct cxl_mem_command {
> > > + const struct cxl_command_info info;
> > > + const u16 opcode;
> > > + bool enable;
> > > +};
> > > +
> > > +static struct cxl_mem_command mem_commands[] = {
> > > + CXL_CMD(INVALID, NONE, 0, 0, "Reserved", false, 0),
> > > +};
> > > +
> > > static int cxl_mem_wait_for_doorbell(struct cxl_mem *cxlm)
> > > {
> > > const int timeout = msecs_to_jiffies(2000);
> > > @@ -268,8 +312,53 @@ static int cxl_mem_open(struct inode *inode, struct file *file)
> > > return 0;
> > > }
> > >
> > > +static int cxl_mem_count_commands(void)
> > > +{
> > > + int i, n = 0;
> > > +
> > > + for (i = 0; i < ARRAY_SIZE(mem_commands); i++) {
> > > + struct cxl_mem_command *c = &mem_commands[i];
> > > +
> > > + if (c->enable)
> > > + n++;
> > > + }
> > > +
> > > + return n;
> > > +}
> > > +
> > > static long cxl_mem_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
> > > {
> > > + if (cmd == CXL_MEM_QUERY_COMMANDS) {
> > > + struct cxl_mem_query_commands __user *q = (void __user *)arg;
> > > + u32 n_commands;
> > > + int i, j;
> > > +
> > > + if (get_user(n_commands, (u32 __user *)arg))
> > > + return -EFAULT;
> > > +
> > > + if (n_commands == 0)
> > > + return put_user(cxl_mem_count_commands(),
> > > + (u32 __user *)arg);
> > > +
> > > + for (i = 0, j = 0;
> > > + i < ARRAY_SIZE(mem_commands) && j < n_commands; i++) {
> > > + struct cxl_mem_command *c = &mem_commands[i];
> > > + const struct cxl_command_info *info = &c->info;
> > > +
> > > + if (!c->enable)
> > > + continue;
> > > +
> > > + if (copy_to_user(&q->commands[j], info, sizeof(*info)))
> > > + return -EFAULT;
> > > +
> > > + if (copy_to_user(&q->commands[j].name, info->name,
> > > + strlen(info->name)))
> > > + return -EFAULT;
> >
> > Not sure why this is needed, see comment below about @name in
> > cxl_mem_query_commands.
> >
> > > +
> > > + j++;
> > > + }
> > > + }
> > > +
> > > return -ENOTTY;
> > > }
> > >
> > > diff --git a/include/uapi/linux/cxl_mem.h b/include/uapi/linux/cxl_mem.h
> > > new file mode 100644
> > > index 000000000000..1d1e143f98ec
> > > --- /dev/null
> > > +++ b/include/uapi/linux/cxl_mem.h
> > > @@ -0,0 +1,102 @@
> > > +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> > > +/*
> > > + * CXL IOCTLs for Memory Devices
> > > + */
> > > +
> > > +#ifndef _UAPI_CXL_MEM_H_
> > > +#define _UAPI_CXL_MEM_H_
> > > +
> > > +#if defined(__cplusplus)
> > > +extern "C" {
> > > +#endif
> > > +
> > > +/**
> > > + * DOC: UAPI
> > > + *
> > > + * CXL memory devices expose UAPI to have a standard user interface.
> > > + * Userspace can refer to these structure definitions and UAPI formats
> > > + * to communicate to driver
> > > + */
> > > +
> > > +#define CXL_MEM_QUERY_COMMANDS _IOR('C', 1, struct cxl_mem_query_commands)
> > > +
> > > +#define CXL_MEM_COMMAND_NAME_LENGTH 32
> > > +
> > > +/**
> > > + * struct cxl_command_info - Command information returned from a query.
> > > + * @id: ID number for the command.
> > > + * @flags: Flags that specify command behavior.
> > > + *
> > > + * - CXL_MEM_COMMAND_FLAG_TAINT: Using this command will taint the kernel.
> > > + * @size_in: Expected input size, or -1 if variable length.
> > > + * @size_out: Expected output size, or -1 if variable length.
> > > + * @name: Name describing the command.
> > > + *
> > > + * Represents a single command that is supported by both the driver and the
> > > + * hardware. The is returned as part of an array from the query ioctl. The
> > > + * following would be a command named "foobar" that takes a variable length
> > > + * input and returns 0 bytes of output.
> > > + *
> > > + * - @id = 10
> > > + * - @name = foobar
> > > + * - @flags = 0
> > > + * - @size_in = -1
> > > + * - @size_out = 0
> > > + *
> > > + * See struct cxl_mem_query_commands.
> > > + */
> > > +struct cxl_command_info {
> > > + __u32 id;
> > > +#define CXL_MEM_COMMAND_ID_INVALID 0
> > > +
> > > + __u32 flags;
> > > +#define CXL_MEM_COMMAND_FLAG_NONE 0
> > > +#define CXL_MEM_COMMAND_FLAG_TAINT BIT(0)
> > > +
> > > + __s32 size_in;
> > > + __s32 size_out;
> > > +
> > > + char name[32];
> >
> > Why does the name for a command need to be shuffled back and forth
> > over the ioctl interface. Can't this be handled by a static lookup
> > table defined in the header?
> >
>
> I was thinking of cases where the userspace application doesn't match the
> current kernel's UAPI and giving the driver flexibility to return whatever.
How / why would the application by looking at @name for UAPI compatibility?
> OTTOMH, I also can't think of a way to do this if you want to do define the
> table sparsely though. Do you have ideas for that?
I don't think the name lookup would be sparse. i.e. it would be ok for
mem_commands to not have an entry for everything in the name lookup
table. As for defining the table it could use C preprocessor trick
popularized by Steven Rostedt:
#define CMDS \
C(CMD1, "command one"), \
C(CMD2, "command two") \
#undef C
#define C(a, b) a
enum commands_enum { CMDS };
#undef C
#define C(a, b) { b }
static struct {
const char *name;
} commands[] = { CMDS };
#undef C
...then there's no way for the command ids to get out of sync with the names.
On 20-12-08 19:33:13, Dan Williams wrote:
> On Tue, Dec 8, 2020 at 6:13 PM Ben Widawsky <[email protected]> wrote:
> >
> > On 20-12-08 17:37:50, Dan Williams wrote:
> > > On Tue, Dec 8, 2020 at 4:24 PM Ben Widawsky <[email protected]> wrote:
> > > >
> > > > Add a straightforward IOCTL that provides a mechanism for userspace to
> > > > query the supported memory device commands.
> > > >
> > > > Memory device commands are specified in 8.2.9 of the CXL 2.0
> > > > specification. They are submitted through a mailbox mechanism specified
> > > > in 8.2.8.4.
> > > >
> > > > Signed-off-by: Ben Widawsky <[email protected]>
> > > >
> > > > ---
> > > >
> > > > I did attempt to use the same struct for querying commands as well as
> > > > sending commands (upcoming patch). The number of unused fields between
> > > > the two made for a bad fit IMO.
> > > >
> > > > Signed-off-by: Ben Widawsky <[email protected]>
> > > > ---
> > > > Documentation/cxl/memory-devices.rst | 9 +++
> > > > drivers/cxl/mem.c | 89 +++++++++++++++++++++++
> > > > include/uapi/linux/cxl_mem.h | 102 +++++++++++++++++++++++++++
> > > > 3 files changed, 200 insertions(+)
> > > > create mode 100644 include/uapi/linux/cxl_mem.h
> > > >
> > > > diff --git a/Documentation/cxl/memory-devices.rst b/Documentation/cxl/memory-devices.rst
> > > > index 5f723c25382b..ec54674b3822 100644
> > > > --- a/Documentation/cxl/memory-devices.rst
> > > > +++ b/Documentation/cxl/memory-devices.rst
> > > > @@ -32,6 +32,15 @@ CXL Memory Device
> > > > .. kernel-doc:: drivers/cxl/mem.c
> > > > :internal:
> > > >
> > > > +CXL IOCTL Interface
> > > > +-------------------
> > > > +
> > > > +.. kernel-doc:: include/uapi/linux/cxl_mem.h
> > > > + :doc: UAPI
> > > > +
> > > > +.. kernel-doc:: include/uapi/linux/cxl_mem.h
> > > > + :internal:
> > > > +
> > > > External Interfaces
> > > > ===================
> > > >
> > > > diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> > > > index bb6ea58f6c7b..2c4aadcea0e4 100644
> > > > --- a/drivers/cxl/mem.c
> > > > +++ b/drivers/cxl/mem.c
> > > > @@ -7,6 +7,7 @@
> > > > #include <linux/idr.h>
> > > > #include <linux/pci.h>
> > > > #include <linux/io.h>
> > > > +#include <uapi/linux/cxl_mem.h>
> > > > #include "acpi.h"
> > > > #include "pci.h"
> > > > #include "cxl.h"
> > > > @@ -73,6 +74,49 @@ static DEFINE_IDR(cxl_mem_idr);
> > > > /* protect cxl_mem_idr allocations */
> > > > static DEFINE_MUTEX(cxl_memdev_lock);
> > > >
> > > > +/*
> > > > + * This table defines the supported mailboxes commands for the driver. The id is
> > > > + * ordinal and thus gaps in this table aren't allowed. This table is made up of
> > > > + * a UAPI structure. Non-negative values in the table will be validated against
> > > > + * the user's input. For example, if size_in is 0, and the user passed in 1, it
> > > > + * is an error.
> > > > + */
> > > > +#define CXL_CMD(_id, _flags, sin, sout, _name, _enable, op) \
> > > > + { \
> > > > + { .id = CXL_MEM_COMMAND_ID_##_id, \
> > > > + .flags = CXL_MEM_COMMAND_FLAG_##_flags, \
> > > > + .size_in = sin, \
> > > > + .size_out = sout, \
> > > > + .name = _name }, \
> > > > + .enable = _enable, .opcode = op \
> > > > + }
> > >
> > > Seems the ordinality requirement could be dropped if the definition was:
> > >
> > > #define CXL_CMD(_id, _flags, sin, sout, _name, _enable, op) \
> > > [CXL_MEM_COMMAND_ID_##_id] = {
> > > \
> > > { .id = CXL_MEM_COMMAND_ID_##_id, \
> > > ...
> > >
> > > Then command 0 and 42 could be defined out of order in the table.
> > > Especially if we need to config-disable or deprecate commands, I think
> > > it would be useful if this table was tolerant to being sparse.
> > >
> >
> > How sparse are we talking? The current form does support sparseness, but
> > obviously gets quite large if the ID numbering is similar to random
> > distribution.
>
> "Sparse" may have been the wrong word to use. I was implying sparse
> enough that if I add command N+1 I don't need to be careful where I
> put it in mem_commands, but still be able to rely on lookups into
> mem_commands being indexed by the command-id.
>
I'm not sure I understand the issue then. It's already demonstrated via the
first command being reserved - ie. already sparse.
CXL_CMD(INVALID, NONE, 0, 0, "Reserved", false, 0)
As long as the command doesn't have @enable set, it's effectively ignored for
all user interactions.
If you look at the last patch in the series, WIP, there is an example for
enabling one.
> > I think if we do see this being more like random distribution, it can be
> > supported, but I think it adds a decent amount of complexity for what I see as
> > not much reward - unless you know of a fairly simple way to create this data
> > structure with full sparse ID support?
>
> I'm expecting the command distribution to be mostly uniform, it's more
> of the lookup property that I think would be useful especially for the
> dynamic case of walking mem_commands to update it relative to what the
> hardware supports or other metadata. Speaking of which I think @enable
> should be turned into @flags of which 'enable' is one, in case we want
> to define more flags in the future.
>
I like the idea of moving enable to flags. I still don't see a reason to change
how it's defined today, can you give me an example where what is there won't
work?
> >
> > > > +
> > > > +/**
> > > > + * struct cxl_mem_command - Driver representation of a memory device command
> > > > + * @info: Command information as it exists for the UAPI
> > > > + * @opcode: The actual bits used for the mailbox protocol
> > > > + * @enable: Whether the command is enabled. The driver may support a large set
> > > > + * of commands that may not be enabled. The primary reason a command
> > > > + * would not be enabled is for commands that are specified as optional
> > > > + * and the hardware doesn't support the command.
> > > > + *
> > > > + * The cxl_mem_command is the driver's internal representation of commands that
> > > > + * are supported by the driver. Some of these commands may not be supported by
> > > > + * the hardware (!@enable). The driver will use @info to validate the fields
> > > > + * passed in by the user then submit the @opcode to the hardware.
> > > > + *
> > > > + * See struct cxl_command_info.
> > > > + */
> > > > +struct cxl_mem_command {
> > > > + const struct cxl_command_info info;
> > > > + const u16 opcode;
> > > > + bool enable;
> > > > +};
> > > > +
> > > > +static struct cxl_mem_command mem_commands[] = {
> > > > + CXL_CMD(INVALID, NONE, 0, 0, "Reserved", false, 0),
> > > > +};
> > > > +
> > > > static int cxl_mem_wait_for_doorbell(struct cxl_mem *cxlm)
> > > > {
> > > > const int timeout = msecs_to_jiffies(2000);
> > > > @@ -268,8 +312,53 @@ static int cxl_mem_open(struct inode *inode, struct file *file)
> > > > return 0;
> > > > }
> > > >
> > > > +static int cxl_mem_count_commands(void)
> > > > +{
> > > > + int i, n = 0;
> > > > +
> > > > + for (i = 0; i < ARRAY_SIZE(mem_commands); i++) {
> > > > + struct cxl_mem_command *c = &mem_commands[i];
> > > > +
> > > > + if (c->enable)
> > > > + n++;
> > > > + }
> > > > +
> > > > + return n;
> > > > +}
> > > > +
> > > > static long cxl_mem_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
> > > > {
> > > > + if (cmd == CXL_MEM_QUERY_COMMANDS) {
> > > > + struct cxl_mem_query_commands __user *q = (void __user *)arg;
> > > > + u32 n_commands;
> > > > + int i, j;
> > > > +
> > > > + if (get_user(n_commands, (u32 __user *)arg))
> > > > + return -EFAULT;
> > > > +
> > > > + if (n_commands == 0)
> > > > + return put_user(cxl_mem_count_commands(),
> > > > + (u32 __user *)arg);
> > > > +
> > > > + for (i = 0, j = 0;
> > > > + i < ARRAY_SIZE(mem_commands) && j < n_commands; i++) {
> > > > + struct cxl_mem_command *c = &mem_commands[i];
> > > > + const struct cxl_command_info *info = &c->info;
> > > > +
> > > > + if (!c->enable)
> > > > + continue;
> > > > +
> > > > + if (copy_to_user(&q->commands[j], info, sizeof(*info)))
> > > > + return -EFAULT;
> > > > +
> > > > + if (copy_to_user(&q->commands[j].name, info->name,
> > > > + strlen(info->name)))
> > > > + return -EFAULT;
> > >
> > > Not sure why this is needed, see comment below about @name in
> > > cxl_mem_query_commands.
> > >
> > > > +
> > > > + j++;
> > > > + }
> > > > + }
> > > > +
> > > > return -ENOTTY;
> > > > }
> > > >
> > > > diff --git a/include/uapi/linux/cxl_mem.h b/include/uapi/linux/cxl_mem.h
> > > > new file mode 100644
> > > > index 000000000000..1d1e143f98ec
> > > > --- /dev/null
> > > > +++ b/include/uapi/linux/cxl_mem.h
> > > > @@ -0,0 +1,102 @@
> > > > +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> > > > +/*
> > > > + * CXL IOCTLs for Memory Devices
> > > > + */
> > > > +
> > > > +#ifndef _UAPI_CXL_MEM_H_
> > > > +#define _UAPI_CXL_MEM_H_
> > > > +
> > > > +#if defined(__cplusplus)
> > > > +extern "C" {
> > > > +#endif
> > > > +
> > > > +/**
> > > > + * DOC: UAPI
> > > > + *
> > > > + * CXL memory devices expose UAPI to have a standard user interface.
> > > > + * Userspace can refer to these structure definitions and UAPI formats
> > > > + * to communicate to driver
> > > > + */
> > > > +
> > > > +#define CXL_MEM_QUERY_COMMANDS _IOR('C', 1, struct cxl_mem_query_commands)
> > > > +
> > > > +#define CXL_MEM_COMMAND_NAME_LENGTH 32
> > > > +
> > > > +/**
> > > > + * struct cxl_command_info - Command information returned from a query.
> > > > + * @id: ID number for the command.
> > > > + * @flags: Flags that specify command behavior.
> > > > + *
> > > > + * - CXL_MEM_COMMAND_FLAG_TAINT: Using this command will taint the kernel.
> > > > + * @size_in: Expected input size, or -1 if variable length.
> > > > + * @size_out: Expected output size, or -1 if variable length.
> > > > + * @name: Name describing the command.
> > > > + *
> > > > + * Represents a single command that is supported by both the driver and the
> > > > + * hardware. The is returned as part of an array from the query ioctl. The
> > > > + * following would be a command named "foobar" that takes a variable length
> > > > + * input and returns 0 bytes of output.
> > > > + *
> > > > + * - @id = 10
> > > > + * - @name = foobar
> > > > + * - @flags = 0
> > > > + * - @size_in = -1
> > > > + * - @size_out = 0
> > > > + *
> > > > + * See struct cxl_mem_query_commands.
> > > > + */
> > > > +struct cxl_command_info {
> > > > + __u32 id;
> > > > +#define CXL_MEM_COMMAND_ID_INVALID 0
> > > > +
> > > > + __u32 flags;
> > > > +#define CXL_MEM_COMMAND_FLAG_NONE 0
> > > > +#define CXL_MEM_COMMAND_FLAG_TAINT BIT(0)
> > > > +
> > > > + __s32 size_in;
> > > > + __s32 size_out;
> > > > +
> > > > + char name[32];
> > >
> > > Why does the name for a command need to be shuffled back and forth
> > > over the ioctl interface. Can't this be handled by a static lookup
> > > table defined in the header?
> > >
> >
> > I was thinking of cases where the userspace application doesn't match the
> > current kernel's UAPI and giving the driver flexibility to return whatever.
>
> How / why would the application by looking at @name for UAPI compatibility?
>
> > OTTOMH, I also can't think of a way to do this if you want to do define the
> > table sparsely though. Do you have ideas for that?
>
> I don't think the name lookup would be sparse. i.e. it would be ok for
> mem_commands to not have an entry for everything in the name lookup
> table. As for defining the table it could use C preprocessor trick
> popularized by Steven Rostedt:
>
> #define CMDS \
> C(CMD1, "command one"), \
> C(CMD2, "command two") \
> #undef C
> #define C(a, b) a
> enum commands_enum { CMDS };
> #undef C
> #define C(a, b) { b }
> static struct {
> const char *name;
> } commands[] = { CMDS };
> #undef C
>
> ...then there's no way for the command ids to get out of sync with the names.
I will move it to the header and drop name[32] from UAPI. My personal preference
is to have the driver fill in the field, but I have no objective reason for
that.
On 12/8/20 4:24 PM, Ben Widawsky wrote:
> +
> +#define CXL_MEM_QUERY_COMMANDS _IOR('C', 1, struct cxl_mem_query_commands)
Hi,
I could have missed it, but IOCTL major "numbers" (like 'C') should be
listed in Documentation/userspace-api/ioctl/ioctl-number.rst.
thanks.
--
~Randy