2020-03-27 07:14:37

by Alastair D'Silva

Subject: [PATCH v4 00/25] Add support for OpenCAPI Persistent Memory devices

This series adds support for OpenCAPI Persistent Memory devices on bare metal (arch/powernv), exposing them as nvdimms so that we can make use of the existing infrastructure. There already exists a driver for the same devices abstracted through PowerVM (arch/pseries): arch/powerpc/platforms/pseries/papr_scm.c

These devices are connected via OpenCAPI, and present as LPC (Lowest Point of Coherency) memory to the system. In practice, that means memory on these cards can be treated as conventional, cache-coherent memory.

Since the devices are connected via OpenCAPI, they are not enumerated via ACPI. Instead, OpenCAPI links present as pseudo-PCI bridges, with devices below them.

This series introduces a driver that exposes the memory on these cards as nvdimms, with each card getting its own bus. This is somewhat complicated by the fact that the cards do not have out-of-band persistent storage for metadata, so one SECTION_SIZE (see SPARSEMEM) worth of storage is carved out of the top of the card storage to implement the ndctl_config_* calls.
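
As a rough illustration of the carve-out described above, the following userspace sketch computes where the label area would start. The names (pmem_layout, carve_label_area) are hypothetical and the section size is passed in as a parameter; none of this is taken from the driver itself.

```c
#include <stdint.h>

/* Hypothetical layout descriptor, for illustration only. */
struct pmem_layout {
	uint64_t usable_size;   /* bytes exposed as the pmem region */
	uint64_t label_offset;  /* offset of the label area from the card base */
	uint64_t label_size;    /* bytes reserved for ndctl_config_* data */
};

/*
 * Reserve one section at the top of the card's storage for the label area.
 * section_size stands in for the kernel's SPARSEMEM section size and is
 * passed in so the sketch does not depend on kernel headers.
 * Returns 0 on success, -1 if the card is too small to carve out a section.
 */
static int carve_label_area(uint64_t card_size, uint64_t section_size,
			    struct pmem_layout *out)
{
	if (section_size == 0 || card_size <= section_size)
		return -1;

	out->label_size = section_size;
	out->usable_size = card_size - section_size;
	out->label_offset = out->usable_size;
	return 0;
}
```

The label area sits directly above the usable region, so the capacity visible to the pmem consumer is the card size minus one section.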

The driver is not responsible for configuring the NPU (NVLink Processing Unit) BARs to map the LPC memory from the card into the system's physical address space. Instead, it requests this to be done via OPAL calls (typically implemented by Skiboot).

The series is structured as follows:
- Required infrastructure changes & cleanup
- A minimal driver implementation
- Implementing additional features within the driver

Changelog:
V4:
- Rebase on next-20200320
- Bump copyright to 2020
- Ensure all uapi headers use C89 compatible comments (missed ocxlpmem.h)
- Move the driver back to drivers/nvdimm/ocxl, after confirmation
that this location is desirable
- Rename ocxl.c to ocxlpmem.c (+ support files)
- Rename all ocxl_pmem to ocxlpmem
- Address checkpatch --strict issues
- "powerpc/powernv: Add OPAL calls for LPC memory alloc/release"
- Pass base address as __be64
- "ocxl: Tally up the LPC memory on a link & allow it to be mapped"
- Address checkpatch spacing warnings
- Reword blurb
- Reword size description for ocxl_link_add_lpc_mem()
- Add an early exit in ocxl_link_lpc_release() to avoid triggering
bogus warnings if called after ocxl_link_lpc_map() fails
- "powerpc/powernv: Add OPAL calls for LPC memory alloc/release"
- Reword blurb
- "powerpc/powernv: Map & release OpenCAPI LPC memory"
- Reword blurb
- Move minor_idr init from file_init() to ocxlpmem_init() (fixes runtime error
in "nvdimm: Add driver for OpenCAPI Persistent Memory")
- Wrap long lines
- "nvdimm: Add driver for OpenCAPI Storage Class Memory"
- Remove '+ 1' workaround from serial number->cookie assignment
- Drop out of memory message for ocxlpmem in probe()
- Fix leaks of ocxlpmem & ocxlpmem->ocxl_fn in probe()
- remove struct ocxlpmem_function0, it didn't add value
- factor out err_unregistered label in probe
- Address more checkpatch warnings
- get/put the pci dev on probe/free
- Drop ocxlpmem_ prefix from static functions
- Propagate errors up from called functions in probe()
- Set MODULE_LICENSE to GPLv2
- Add myself as module author
- Call nvdimm_bus_unregister() in remove() to release references
- Don't call devm_memunmap on metadata_address, the release handler on
the device already deals with this
- "nvdimm/ocxl: Read the capability registers & wait for device ready"
- Fix mask for read_latency
- Fold in is_usable logic into timeout to remove error message race
- propagate bad rc from read_device_metadata
- "nvdimm/ocxl: Add register addresses & status values to the header"
- Add comments for register abbreviations where names have been
expanded
- Add missing status for blocked on background task
- Alias defines for firmware update status to show that the duplication
of values is intentional
- "nvdimm/ocxl: Register a character device for userspace to interact with"
- Add lock around minors IDR, delete the cdev before device_unregister
- Propagate errors up from called functions in probe()
- "nvdimm/ocxl: Add support for Admin commands"
- Fix typo in setup_command_data error message, and drop 'ocxl' from it
- Drop vestigial CHI read from admin_command_request
- Change command ID mismatch message to dev_err, and return an error
- Use jiffies to implement admin_command_complete_timeout()
- Flesh out blurb
- Create a wrapper to issue the command & wait for timeout
- "nvdimm/ocxl: Add support for near storage commands"
- dropped (will submit with the patches for nvdimm overwrite)
- "nvdimm/ocxl: Implement the Read Error Log command"
- Remove stray blank line
- change misplaced goto to an early exit in read_error_log
- Inline error_log_offset_0x08
- Read WWID data as LE rather than host endian
- Move the include of nvdimm/ocxlpmem.h to ocxl.c
- Add padding after fwrevision in struct ioctl_ocxl_pmem_error_log
- Register IOCTL magic
- Coerce pointers to __u64 in IOCTLs
- "nvdimm/ocxl: Add controller dump IOCTLs"
- Coerce pointers to __u64 in IOCTLs
- Document expected IOCTL usage in blurb
- Add missing rc check
- Only populate up to the number of bytes returned by the card,
and return this length to the caller
- Add missing header check
- "nvdimm/ocxl: Add an IOCTL to report controller statistics"
- Update to match the latest version of the spec
- Verify that parameter block IDs & lengths match what we expect
- Use defines for offsets
- "nvdimm/ocxl: Forward events to userspace"
- Don't enable NSCRA doorbell
- return -EBUSY if the event context is already used
- return -ENODEV if IRQs cannot be mapped
- Tag IRQ pointers with __iomem
- Drop ocxlpmem_ prefix from static functions
- Propagate error from eventfd_ctx_fdget
- Fix error check in copy_to_user
- Drop GLOBAL_MMIO_CHI_NSCRA (this should be in the overwrite patch)
- Drop unused irq_pgmap
- Don't redef BIT_ULL
- "nvdimm/ocxl: Add debug IOCTLs"
- Eliminate clearing loop (now done in admin_command_execute())
- Drop dummy IOCTLs if CONFIG_OCXL_PMEM_DEBUG is not set
- Group debug IOCTLs together & comment that they may not be available
- "nvdimm/ocxl: Expose SMART data via ndctl"
- Drop 'rc = 0; goto out;'
- Propagate errors from ndctl_smart()
- "nvdimm/ocxl: Expose the serial number in sysfs" & "nvdimm/ocxl: Expose the firmware version in sysfs"
- Squash these 2 patches together
- Expose data as a DIMM attribute rather than an ocxlpmem
attribute
- "nvdimm/ocxl: Add an IOCTL to request controller health & perf data"
- Reword blurb
- "nvdimm/ocxl: Implement the heartbeat command"
- Propagate rc in probe()

V3:
- Rebase against next/next-20200220
- Move driver to arch/powerpc/platforms/powernv, we now expect this
driver to go upstream via the powerpc tree
- "nvdimm/ocxl: Implement the Read Error Log command"
- Fix bad header path
- "nvdimm/ocxl: Read the capability registers & wait for device ready"
- Fix overlapping masks between readiness_timeout & memory_available_timeout
- "nvdimm: Add driver for OpenCAPI Storage Class Memory"
- Address minor review comments from Jonathan Cameron
- Remove attributes
- Default to module if building LIBNVDIMM
- Propagate errors up from called functions in probe()
- "nvdimm/ocxl: Expose SMART data via ndctl"
- Pack attributes in struct
- Support different size SMART buffers for compatibility with newer
ndctls that may want more SMART attribs than we provide
- Rework to use ND_CMD_CALL instead of ND_CMD_SMART
- drop "ocxl: Free detached contexts in ocxl_context_detach_all()"
- "powerpc: Map & release OpenCAPI LPC memory"
- Remove 'extern'
- Only available with CONFIG_MEMORY_HOTPLUG_SPARSE
- "ocxl: Tally up the LPC memory on a link & allow it to be mapped"
- Address minor review comments from Jonathan Cameron
- "ocxl: Add functions to map/unmap LPC memory"
- Split detected memory message into a separate patch
- Address minor review comments from Jonathan Cameron
- Add a comment explaining why unmap_lpc_mem is in deconfigure_afu
- "nvdimm/ocxl: Add support for Admin commands"
- use sizeof(u64) rather than 0x08 when iterating u64s
- "nvdimm/ocxl: Implement the heartbeat command"
- Fix typo in blurb
- Address kernel doc issues
- Ensure all uapi headers use C89 compatible comments
- Drop patches for firmware update & overwrite, these will be
submitted later once patches are available for ndctl
- Rename SCM to OpenCAPI Persistent Memory

V2:
- "powerpc: Map & release OpenCAPI LPC memory"
- Fix #if -> #ifdef
- use pci_dev_id to get the bdfn
- use __be64 to hold be data
- indent check_hotplug_memory_addressable correctly
- Remove export of check_hotplug_memory_addressable
- "ocxl: Conditionally bind SCM devices to the generic OCXL driver"
- Improve patch description and remove redundant default
- "nvdimm: Add driver for OpenCAPI Storage Class Memory"
- Mark a few funcs as static as identified by the 0day bot
- Add OCXL dependencies to OCXL_SCM
- Use memcpy_mcsafe in scm_ndctl_config_read
- Rename scm_foo_offset_0x00 to scm_foo_header_parse & add docs
- Name DIMM attribs "ocxl" rather than "scm"
- Split out into base + many feature patches
- "powerpc: Enable OpenCAPI Storage Class Memory driver on bare metal"
- Build DEV_DAX & friends as modules
- "ocxl: Conditionally bind SCM devices to the generic OCXL driver"
- Patch dropped (easy enough to maintain this out of tree for development)
- "ocxl: Tally up the LPC memory on a link & allow it to be mapped"
- Add a warning if an unmatched lpc_release is called
- "ocxl: Add functions to map/unmap LPC memory"
- Use EXPORT_SYMBOL_GPL


Alastair D'Silva (25):
powerpc/powernv: Add OPAL calls for LPC memory alloc/release
mm/memory_hotplug: Allow check_hotplug_memory_addressable to be called
from drivers
powerpc/powernv: Map & release OpenCAPI LPC memory
ocxl: Remove unnecessary externs
ocxl: Address kernel doc errors & warnings
ocxl: Tally up the LPC memory on a link & allow it to be mapped
ocxl: Add functions to map/unmap LPC memory
ocxl: Emit a log message showing how much LPC memory was detected
ocxl: Save the device serial number in ocxl_fn
nvdimm: Add driver for OpenCAPI Persistent Memory
powerpc: Enable the OpenCAPI Persistent Memory driver for
powernv_defconfig
nvdimm/ocxl: Add register addresses & status values to the header
nvdimm/ocxl: Read the capability registers & wait for device ready
nvdimm/ocxl: Add support for Admin commands
nvdimm/ocxl: Register a character device for userspace to interact
with
nvdimm/ocxl: Implement the Read Error Log command
nvdimm/ocxl: Add controller dump IOCTLs
nvdimm/ocxl: Add an IOCTL to report controller statistics
nvdimm/ocxl: Forward events to userspace
nvdimm/ocxl: Add an IOCTL to request controller health & perf data
nvdimm/ocxl: Implement the heartbeat command
nvdimm/ocxl: Add debug IOCTLs
nvdimm/ocxl: Expose SMART data via ndctl
nvdimm/ocxl: Expose the serial number & firmware version in sysfs
MAINTAINERS: Add myself & nvdimm/ocxl to ocxl

.../userspace-api/ioctl/ioctl-number.rst | 1 +
MAINTAINERS | 3 +
arch/powerpc/configs/powernv_defconfig | 5 +
arch/powerpc/include/asm/opal-api.h | 2 +
arch/powerpc/include/asm/opal.h | 2 +
arch/powerpc/include/asm/pnv-ocxl.h | 42 +-
arch/powerpc/platforms/powernv/ocxl.c | 43 +
arch/powerpc/platforms/powernv/opal-call.c | 2 +
drivers/misc/ocxl/config.c | 74 +-
drivers/misc/ocxl/core.c | 61 +
drivers/misc/ocxl/link.c | 60 +
drivers/misc/ocxl/ocxl_internal.h | 45 +-
drivers/nvdimm/Kconfig | 2 +
drivers/nvdimm/Makefile | 1 +
drivers/nvdimm/ocxl/Kconfig | 21 +
drivers/nvdimm/ocxl/Makefile | 7 +
drivers/nvdimm/ocxl/main.c | 1975 +++++++++++++++++
drivers/nvdimm/ocxl/ocxlpmem.h | 197 ++
drivers/nvdimm/ocxl/ocxlpmem_internal.c | 280 +++
include/linux/memory_hotplug.h | 5 +
include/misc/ocxl.h | 122 +-
include/uapi/linux/ndctl.h | 1 +
include/uapi/nvdimm/ocxlpmem.h | 127 ++
mm/memory_hotplug.c | 4 +-
24 files changed, 2983 insertions(+), 99 deletions(-)
create mode 100644 drivers/nvdimm/ocxl/Kconfig
create mode 100644 drivers/nvdimm/ocxl/Makefile
create mode 100644 drivers/nvdimm/ocxl/main.c
create mode 100644 drivers/nvdimm/ocxl/ocxlpmem.h
create mode 100644 drivers/nvdimm/ocxl/ocxlpmem_internal.c
create mode 100644 include/uapi/nvdimm/ocxlpmem.h

--
2.24.1


2020-03-27 07:15:04

by Alastair D'Silva

Subject: [PATCH v4 11/25] powerpc: Enable the OpenCAPI Persistent Memory driver for powernv_defconfig

This patch enables the OpenCAPI Persistent Memory driver, as well
as DAX support, for the 'powernv' defconfig.

DAX is not a strict requirement for the functioning of the driver, but it
is likely that a user will want to create a DAX device on top of their
persistent memory device.

Signed-off-by: Alastair D'Silva <[email protected]>
Reviewed-by: Andrew Donnellan <[email protected]>
---
arch/powerpc/configs/powernv_defconfig | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/arch/powerpc/configs/powernv_defconfig b/arch/powerpc/configs/powernv_defconfig
index 71749377d164..921d77bbd3d2 100644
--- a/arch/powerpc/configs/powernv_defconfig
+++ b/arch/powerpc/configs/powernv_defconfig
@@ -348,3 +348,8 @@ CONFIG_KVM_BOOK3S_64=m
CONFIG_KVM_BOOK3S_64_HV=m
CONFIG_VHOST_NET=m
CONFIG_PRINTK_TIME=y
+CONFIG_ZONE_DEVICE=y
+CONFIG_OCXL_PMEM=m
+CONFIG_DEV_DAX=m
+CONFIG_DEV_DAX_PMEM=m
+CONFIG_FS_DAX=y
--
2.24.1

2020-03-29 02:28:55

by Alastair D'Silva

Subject: [PATCH v4 01/25] powerpc/powernv: Add OPAL calls for LPC memory alloc/release

Add OPAL calls for LPC memory alloc/release

Signed-off-by: Alastair D'Silva <[email protected]>
Acked-by: Andrew Donnellan <[email protected]>
Acked-by: Frederic Barrat <[email protected]>
---
arch/powerpc/include/asm/opal-api.h | 2 ++
arch/powerpc/include/asm/opal.h | 2 ++
arch/powerpc/platforms/powernv/opal-call.c | 2 ++
3 files changed, 6 insertions(+)

diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
index c1f25a760eb1..9298e603001b 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -208,6 +208,8 @@
#define OPAL_HANDLE_HMI2 166
#define OPAL_NX_COPROC_INIT 167
#define OPAL_XIVE_GET_VP_STATE 170
+#define OPAL_NPU_MEM_ALLOC 171
+#define OPAL_NPU_MEM_RELEASE 172
#define OPAL_MPIPL_UPDATE 173
#define OPAL_MPIPL_REGISTER_TAG 174
#define OPAL_MPIPL_QUERY_TAG 175
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 9986ac34b8e2..301fea46c7ca 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -39,6 +39,8 @@ int64_t opal_npu_spa_clear_cache(uint64_t phb_id, uint32_t bdfn,
uint64_t PE_handle);
int64_t opal_npu_tl_set(uint64_t phb_id, uint32_t bdfn, long cap,
uint64_t rate_phys, uint32_t size);
+int64_t opal_npu_mem_alloc(u64 phb_id, u32 bdfn, u64 size, __be64 *bar);
+int64_t opal_npu_mem_release(u64 phb_id, u32 bdfn);

int64_t opal_console_write(int64_t term_number, __be64 *length,
const uint8_t *buffer);
diff --git a/arch/powerpc/platforms/powernv/opal-call.c b/arch/powerpc/platforms/powernv/opal-call.c
index 5cd0f52d258f..f26e58b72c04 100644
--- a/arch/powerpc/platforms/powernv/opal-call.c
+++ b/arch/powerpc/platforms/powernv/opal-call.c
@@ -287,6 +287,8 @@ OPAL_CALL(opal_pci_set_pbcq_tunnel_bar, OPAL_PCI_SET_PBCQ_TUNNEL_BAR);
OPAL_CALL(opal_sensor_read_u64, OPAL_SENSOR_READ_U64);
OPAL_CALL(opal_sensor_group_enable, OPAL_SENSOR_GROUP_ENABLE);
OPAL_CALL(opal_nx_coproc_init, OPAL_NX_COPROC_INIT);
+OPAL_CALL(opal_npu_mem_alloc, OPAL_NPU_MEM_ALLOC);
+OPAL_CALL(opal_npu_mem_release, OPAL_NPU_MEM_RELEASE);
OPAL_CALL(opal_mpipl_update, OPAL_MPIPL_UPDATE);
OPAL_CALL(opal_mpipl_register_tag, OPAL_MPIPL_REGISTER_TAG);
OPAL_CALL(opal_mpipl_query_tag, OPAL_MPIPL_QUERY_TAG);
--
2.24.1

2020-03-29 02:28:56

by Alastair D'Silva

Subject: [PATCH v4 08/25] ocxl: Emit a log message showing how much LPC memory was detected

This patch emits a message showing how much LPC memory & special purpose
memory was detected on an OCXL device.

Signed-off-by: Alastair D'Silva <[email protected]>
Acked-by: Frederic Barrat <[email protected]>
Acked-by: Andrew Donnellan <[email protected]>
---
drivers/misc/ocxl/config.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/drivers/misc/ocxl/config.c b/drivers/misc/ocxl/config.c
index a62e3d7db2bf..69cca341d446 100644
--- a/drivers/misc/ocxl/config.c
+++ b/drivers/misc/ocxl/config.c
@@ -568,6 +568,10 @@ static int read_afu_lpc_memory_info(struct pci_dev *dev,
afu->special_purpose_mem_size =
total_mem_size - lpc_mem_size;
}
+
+ dev_info(&dev->dev, "Probed LPC memory of %#llx bytes and special purpose memory of %#llx bytes\n",
+ afu->lpc_mem_size, afu->special_purpose_mem_size);
+
return 0;
}

--
2.24.1

2020-03-29 02:30:16

by Alastair D'Silva

Subject: [PATCH v4 02/25] mm/memory_hotplug: Allow check_hotplug_memory_addressable to be called from drivers

When setting up OpenCAPI connected persistent memory, the range check may
not be performed until quite late (or perhaps not at all, if the user does
not establish a DAX device).

This patch makes the range check callable so we can perform the check while
probing the OpenCAPI Persistent Memory device.
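
To illustrate the kind of check being exported, here is a userspace sketch.
Only the max_addr computation mirrors the function shown in this patch; the
page shift, the max_physmem_bits parameter, the error value and the function
name are assumptions made for illustration.

```c
#include <errno.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 16 /* assumed 64K pages; illustrative only */

/*
 * Sketch of a hotplug range check: the last byte of the range
 * [pfn, pfn + nr_pages) must be addressable within max_physmem_bits
 * (assumed to be less than 64).
 */
static int sketch_check_addressable(uint64_t pfn, uint64_t nr_pages,
				    unsigned int max_physmem_bits)
{
	/* Same shape as PFN_PHYS(pfn + nr_pages) - 1 in the patch */
	const uint64_t max_addr = ((pfn + nr_pages) << SKETCH_PAGE_SHIFT) - 1;

	if (max_addr >> max_physmem_bits)
		return -E2BIG;

	return 0;
}
```

Calling a check like this at probe time lets the driver reject a card whose
mapped range cannot be addressed, rather than failing later when a DAX device
is created.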

Signed-off-by: Alastair D'Silva <[email protected]>
Reviewed-by: Andrew Donnellan <[email protected]>
---
include/linux/memory_hotplug.h | 5 +++++
mm/memory_hotplug.c | 4 ++--
2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index f4d59155f3d4..9a19ae0d7e31 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -337,6 +337,11 @@ static inline void __remove_memory(int nid, u64 start, u64 size) {}
extern void set_zone_contiguous(struct zone *zone);
extern void clear_zone_contiguous(struct zone *zone);

+#ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
+int check_hotplug_memory_addressable(unsigned long pfn,
+ unsigned long nr_pages);
+#endif /* CONFIG_MEMORY_HOTPLUG_SPARSE */
+
extern void __ref free_area_init_core_hotplug(int nid);
extern int __add_memory(int nid, u64 start, u64 size);
extern int add_memory(int nid, u64 start, u64 size);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 0a54ffac8c68..14945f033594 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -276,8 +276,8 @@ static int check_pfn_span(unsigned long pfn, unsigned long nr_pages,
return 0;
}

-static int check_hotplug_memory_addressable(unsigned long pfn,
- unsigned long nr_pages)
+int check_hotplug_memory_addressable(unsigned long pfn,
+ unsigned long nr_pages)
{
const u64 max_addr = PFN_PHYS(pfn + nr_pages) - 1;

--
2.24.1

2020-03-29 02:30:16

by Alastair D'Silva

Subject: [PATCH v4 14/25] nvdimm/ocxl: Add support for Admin commands

Admin commands for these devices are the primary means of interacting
with the device controller to provide functionality beyond the load/store
capabilities offered via the NPU.

For example, SMART data, firmware update, and device error logs are
implemented via admin commands.

This patch reads the metadata required to issue admin commands, and adds
helper functions to construct commands and check their completion.
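
The request and response header layout used by these helpers can be sketched
in plain C. The bit positions (op code or status in bits 0-7, command ID in
bits 16-31) and the split of a 64-bit metadata word into two 32-bit fields
follow the code in this patch; the function names are hypothetical and the
MMIO accessors are left out.

```c
#include <stdint.h>

/* Build the first 64-bit word of a request: op code plus command ID. */
static uint64_t sketch_request_header(uint8_t op_code, uint16_t id)
{
	return (uint64_t)op_code | ((uint64_t)id << 16);
}

/* Extract the status byte from the first word of a response. */
static uint8_t sketch_response_status(uint64_t val)
{
	return val & 0xff;
}

/* Extract the command ID echoed back in a response. */
static uint16_t sketch_response_id(uint64_t val)
{
	return (val >> 16) & 0xffff;
}

/* Split a metadata word into its upper and lower 32-bit fields. */
static void sketch_split_offsets(uint64_t word, uint32_t *hi, uint32_t *lo)
{
	*hi = word >> 32;
	*lo = word & 0xFFFFFFFF;
}
```

A response whose ID does not match the ID of the request just issued is
treated as an error rather than as a status for the current command.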

Signed-off-by: Alastair D'Silva <[email protected]>
---
drivers/nvdimm/ocxl/main.c | 65 ++++++
drivers/nvdimm/ocxl/ocxlpmem.h | 50 ++++-
drivers/nvdimm/ocxl/ocxlpmem_internal.c | 261 ++++++++++++++++++++++++
3 files changed, 375 insertions(+), 1 deletion(-)

diff --git a/drivers/nvdimm/ocxl/main.c b/drivers/nvdimm/ocxl/main.c
index be76acd33d74..8db573036423 100644
--- a/drivers/nvdimm/ocxl/main.c
+++ b/drivers/nvdimm/ocxl/main.c
@@ -217,6 +217,58 @@ static int register_lpc_mem(struct ocxlpmem *ocxlpmem)
return 0;
}

+/**
+ * extract_command_metadata() - Extract command data from MMIO & save it for further use
+ * @ocxlpmem: the device metadata
+ * @offset: The base address of the command data structures (address of CREQO)
+ * @command_metadata: A pointer to the command metadata to populate
+ * Return: 0 on success, negative on failure
+ */
+static int extract_command_metadata(struct ocxlpmem *ocxlpmem, u32 offset,
+ struct command_metadata *command_metadata)
+{
+ int rc;
+ u64 tmp;
+
+ rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu, offset,
+ OCXL_LITTLE_ENDIAN, &tmp);
+ if (rc)
+ return rc;
+
+ command_metadata->request_offset = tmp >> 32;
+ command_metadata->response_offset = tmp & 0xFFFFFFFF;
+
+ rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu, offset + 8,
+ OCXL_LITTLE_ENDIAN, &tmp);
+ if (rc)
+ return rc;
+
+ command_metadata->data_offset = tmp >> 32;
+ command_metadata->data_size = tmp & 0xFFFFFFFF;
+
+ command_metadata->id = 0;
+
+ return 0;
+}
+
+/**
+ * setup_command_metadata() - Set up the command metadata
+ * @ocxlpmem: the device metadata
+ */
+static int setup_command_metadata(struct ocxlpmem *ocxlpmem)
+{
+ int rc;
+
+ mutex_init(&ocxlpmem->admin_command.lock);
+
+ rc = extract_command_metadata(ocxlpmem, GLOBAL_MMIO_ACMA_CREQO,
+ &ocxlpmem->admin_command);
+ if (rc)
+ return rc;
+
+ return 0;
+}
+
/**
* allocate_minor() - Allocate a minor number to use for an OpenCAPI pmem device
* @ocxlpmem: the device metadata
@@ -421,6 +473,14 @@ static int probe(struct pci_dev *pdev, const struct pci_device_id *ent)

ocxlpmem->pdev = pci_dev_get(pdev);

+ ocxlpmem->timeouts[ADMIN_COMMAND_ERRLOG] = 2000; // ms
+ ocxlpmem->timeouts[ADMIN_COMMAND_HEARTBEAT] = 100; // ms
+ ocxlpmem->timeouts[ADMIN_COMMAND_SMART] = 100; // ms
+ ocxlpmem->timeouts[ADMIN_COMMAND_CONTROLLER_DUMP] = 1000; // ms
+ ocxlpmem->timeouts[ADMIN_COMMAND_CONTROLLER_STATS] = 100; // ms
+ ocxlpmem->timeouts[ADMIN_COMMAND_SHUTDOWN] = 1000; // ms
+ ocxlpmem->timeouts[ADMIN_COMMAND_FW_UPDATE] = 16000; // ms
+
pci_set_drvdata(pdev, ocxlpmem);

ocxlpmem->ocxl_fn = ocxl_function_open(pdev);
@@ -467,6 +527,11 @@ static int probe(struct pci_dev *pdev, const struct pci_device_id *ent)
goto err;
}

+ if (setup_command_metadata(ocxlpmem)) {
+ dev_err(&pdev->dev, "Could not read command metadata\n");
+ goto err;
+ }
+
elapsed = 0;
timeout = ocxlpmem->readiness_timeout +
ocxlpmem->memory_available_timeout;
diff --git a/drivers/nvdimm/ocxl/ocxlpmem.h b/drivers/nvdimm/ocxl/ocxlpmem.h
index 3eadbe19f6d0..b72b3f909fc3 100644
--- a/drivers/nvdimm/ocxl/ocxlpmem.h
+++ b/drivers/nvdimm/ocxl/ocxlpmem.h
@@ -7,6 +7,7 @@
#include <linux/mm.h>

#define LABEL_AREA_SIZE BIT_ULL(PA_SECTION_SHIFT)
+#define DEFAULT_TIMEOUT 100

#define GLOBAL_MMIO_CHI 0x000
#define GLOBAL_MMIO_CHIC 0x008
@@ -57,6 +58,7 @@
#define GLOBAL_MMIO_HCI_CONTROLLER_DUMP_COLLECTED BIT_ULL(5) // CDC
#define GLOBAL_MMIO_HCI_REQ_HEALTH_PERF BIT_ULL(6) // CHPD

+// must be maintained with admin_command_names in ocxlpmem_internal.c
#define ADMIN_COMMAND_HEARTBEAT 0x00u
#define ADMIN_COMMAND_SHUTDOWN 0x01u
#define ADMIN_COMMAND_FW_UPDATE 0x02u
@@ -68,6 +70,13 @@
#define ADMIN_COMMAND_CMD_CAPS 0x08u
#define ADMIN_COMMAND_MAX 0x08u

+#define NS_COMMAND_SECURE_ERASE 0x20ull
+
+#define NS_RESPONSE_SECURE_ERASE_ACCESSIBLE_SUCCESS 0x20
+#define NS_RESPONSE_SECURE_ERASE_ACCESSIBLE_ATTEMPTED 0x28
+#define NS_RESPONSE_SECURE_ERASE_DEFECTIVE_SUCCESS 0x30
+#define NS_RESPONSE_SECURE_ERASE_DEFECTIVE_ATTEMPTED 0x38
+
#define STATUS_SUCCESS 0x00
#define STATUS_MEM_UNAVAILABLE 0x20
#define STATUS_BLOCKED_BG_TASK 0x21
@@ -81,6 +90,16 @@
#define STATUS_FW_ARG_INVALID STATUS_BAD_REQUEST_PARM
#define STATUS_FW_INVALID STATUS_BAD_DATA_PARM

+struct command_metadata {
+ u32 request_offset;
+ u32 response_offset;
+ u32 data_offset;
+ u32 data_size;
+ struct mutex lock; /* locks access to this command */
+ u16 id;
+ u8 op_code;
+};
+
struct ocxlpmem {
struct device dev;
struct pci_dev *pdev;
@@ -91,10 +110,11 @@ struct ocxlpmem {
struct ocxl_afu *ocxl_afu;
struct ocxl_context *ocxl_context;
void *metadata_addr;
+ struct command_metadata admin_command;
struct resource pmem_res;
struct nd_region *nd_region;
char fw_version[8 + 1];
-
+ u32 timeouts[ADMIN_COMMAND_MAX + 1];
u32 max_controller_dump_size;
u16 scm_revision; // major/minor
u8 readiness_timeout; /* The worst case time (in seconds) that the host
@@ -123,3 +143,31 @@ struct ocxlpmem {
* Returns 0 on success, negative on error
*/
int ocxlpmem_chi(const struct ocxlpmem *ocxlpmem, u64 *chi);
+
+/**
+ * admin_command_execute() - Execute an admin command and wait for completion
+ *
+ * Additional MMIO registers (dependent on the command) may
+ * need to be initialized
+ *
+ * @ocxlpmem: the device metadata
+ * @op_code: the code for the admin command
+ * Returns 0 on success, -EINVAL for a bad op code, -EBUSY on timeout
+ */
+int admin_command_execute(struct ocxlpmem *ocxlpmem, u8 op_code);
+
+/**
+ * admin_response_handled() - Notify the controller that the admin response has been handled
+ * @ocxlpmem: the device metadata
+ * Returns 0 on success, negative on failure
+ */
+int admin_response_handled(const struct ocxlpmem *ocxlpmem);
+
+/**
+ * warn_status() - Emit a kernel warning showing a command status.
+ * @ocxlpmem: the device metadata
+ * @message: A message to accompany the warning
+ * @status: The command status
+ */
+void warn_status(const struct ocxlpmem *ocxlpmem, const char *message,
+ u8 status);
diff --git a/drivers/nvdimm/ocxl/ocxlpmem_internal.c b/drivers/nvdimm/ocxl/ocxlpmem_internal.c
index 5578169b7515..7470a6ab3b08 100644
--- a/drivers/nvdimm/ocxl/ocxlpmem_internal.c
+++ b/drivers/nvdimm/ocxl/ocxlpmem_internal.c
@@ -17,3 +17,264 @@ int ocxlpmem_chi(const struct ocxlpmem *ocxlpmem, u64 *chi)

return 0;
}
+
+#define COMMAND_REQUEST_SIZE (8 * sizeof(u64))
+/**
+ * scm_command_request() - Set up a command request
+ * @cmd: The metadata for the type of command to be issued
+ * @op_code: the op code for the command
+ * @valid_bytes: the number of bytes in the header to preserve (these must be set before calling)
+ */
+static int scm_command_request(const struct ocxlpmem *ocxlpmem,
+ struct command_metadata *cmd, u8 op_code,
+ u8 valid_bytes)
+{
+ u64 val = op_code;
+ int rc;
+ u8 i;
+
+ cmd->op_code = op_code;
+ cmd->id++;
+
+ val |= ((u64)cmd->id) << 16;
+
+ rc = ocxl_global_mmio_write64(ocxlpmem->ocxl_afu, cmd->request_offset,
+ OCXL_LITTLE_ENDIAN, val);
+ if (rc)
+ return rc;
+
+ for (i = valid_bytes; i < COMMAND_REQUEST_SIZE; i += sizeof(u64)) {
+ rc = ocxl_global_mmio_write64(ocxlpmem->ocxl_afu,
+ cmd->request_offset + i,
+ OCXL_LITTLE_ENDIAN, 0);
+ if (rc)
+ return rc;
+ }
+
+ return 0;
+}
+
+/**
+ * admin_command_request() - Issue an admin command request
+ * @ocxlpmem: the device metadata
+ * @op_code: The op-code for the command
+ *
+ * Returns an identifier for the command, or negative on error
+ */
+static int admin_command_request(struct ocxlpmem *ocxlpmem, u8 op_code)
+{
+ u8 valid_bytes = sizeof(u64);
+
+ switch (op_code) {
+ case ADMIN_COMMAND_HEARTBEAT:
+ case ADMIN_COMMAND_SHUTDOWN:
+ case ADMIN_COMMAND_ERRLOG:
+ case ADMIN_COMMAND_CMD_CAPS:
+ valid_bytes += 0;
+ break;
+ case ADMIN_COMMAND_FW_UPDATE:
+ case ADMIN_COMMAND_SMART:
+ case ADMIN_COMMAND_CONTROLLER_STATS:
+ case ADMIN_COMMAND_CONTROLLER_DUMP:
+ valid_bytes += sizeof(u64);
+ break;
+ case ADMIN_COMMAND_FW_DEBUG:
+ valid_bytes += 3 * sizeof(u64);
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ return scm_command_request(ocxlpmem, &ocxlpmem->admin_command, op_code,
+ valid_bytes);
+}
+
+static int command_response(const struct ocxlpmem *ocxlpmem,
+ const struct command_metadata *cmd)
+{
+ u64 val;
+ u16 id;
+ u8 status;
+ int rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu,
+ cmd->response_offset,
+ OCXL_LITTLE_ENDIAN, &val);
+ if (rc)
+ return rc;
+
+ status = val & 0xff;
+ id = (val >> 16) & 0xffff;
+
+ if (id != cmd->id) {
+ dev_err(&ocxlpmem->dev,
+ "Expected response for command %d, but received response for command %d instead.\n",
+ cmd->id, id);
+ return -EBUSY;
+ }
+
+ return status;
+}
+
+/**
+ * admin_response() - Validate an admin response
+ * @ocxlpmem: the device metadata
+ * Returns the status code of the command, or negative on error
+ */
+static int admin_response(const struct ocxlpmem *ocxlpmem)
+{
+ return command_response(ocxlpmem, &ocxlpmem->admin_command);
+}
+
+/**
+ * admin_command_exec() - Notify the controller to start processing a pending admin command
+ * @ocxlpmem: the device metadata
+ * Returns 0 on success, negative on error
+ */
+static int admin_command_exec(const struct ocxlpmem *ocxlpmem)
+{
+ return ocxl_global_mmio_set64(ocxlpmem->ocxl_afu, GLOBAL_MMIO_HCI,
+ OCXL_LITTLE_ENDIAN, GLOBAL_MMIO_HCI_ACRW);
+}
+
+static bool admin_command_complete(const struct ocxlpmem *ocxlpmem)
+{
+ u64 val = 0;
+
+ int rc = ocxlpmem_chi(ocxlpmem, &val);
+
+ WARN_ON(rc);
+
+ return (val & GLOBAL_MMIO_CHI_ACRA) != 0;
+}
+
+/**
+ * admin_command_complete_timeout() - Wait for an admin command to finish executing
+ * @ocxlpmem: the device metadata
+ * @command: the admin command to wait for completion (determines the timeout)
+ * Returns 0 on success, -EBUSY on timeout
+ */
+static int admin_command_complete_timeout(const struct ocxlpmem *ocxlpmem,
+ int command)
+{
+ unsigned long timeout = jiffies +
+ msecs_to_jiffies(ocxlpmem->timeouts[command]);
+
+ // 32 is the next power of 2 greater than the 20ms minimum for msleep
+#define TIMEOUT_SLEEP_MILLIS 32
+ do {
+ if (admin_command_complete(ocxlpmem))
+ return 0;
+ msleep(TIMEOUT_SLEEP_MILLIS);
+ } while (time_before(jiffies, timeout));
+
+ if (admin_command_complete(ocxlpmem))
+ return 0;
+
+ return -EBUSY;
+}
+
+// Must be maintained with ADMIN_COMMAND_* in ocxlpmem.h
+static const char * const admin_command_names[] = {
+ "heartbeat",
+ "shutdown",
+ "firmware update",
+ "firmware debug",
+ "retrieve error log",
+ "retrieve SMART data",
+ "controller statistics",
+ "controller dump",
+ "command capabilities",
+};
+
+/**
+ * admin_command_name() - get the name of an admin command
+ * @ocxlpmem: the device metadata
+ * @op_code: the code for the admin command
+ * Returns a string representing the name of the command
+ */
+static const char *admin_command_name(u8 op_code)
+{
+ if (op_code > ADMIN_COMMAND_MAX)
+ return "unknown command";
+
+ return admin_command_names[op_code];
+}
+
+/**
+ * admin_command_execute() - Execute an admin command and wait for completion
+ *
+ * Additional MMIO registers (dependent on the command) may
+ * need to be initialized
+ *
+ * @ocxlpmem: the device metadata
+ * @op_code: the code for the admin command
+ * Returns 0 on success, -EINVAL for a bad op code, -EBUSY on timeout
+ */
+int admin_command_execute(struct ocxlpmem *ocxlpmem, u8 op_code)
+{
+ int rc;
+
+ if (op_code > ADMIN_COMMAND_MAX)
+ return -EINVAL;
+
+ rc = admin_command_request(ocxlpmem, op_code);
+ if (rc)
+ return rc;
+
+ rc = admin_command_exec(ocxlpmem);
+ if (rc)
+ return rc;
+
+ rc = admin_command_complete_timeout(ocxlpmem, op_code);
+ if (rc < 0) {
+ dev_warn(&ocxlpmem->dev, "%s timed out\n",
+ admin_command_name(op_code));
+ return rc;
+ }
+
+ return admin_response(ocxlpmem);
+}
+
+int admin_response_handled(const struct ocxlpmem *ocxlpmem)
+{
+ // writing to the CHIC register clears the bit in CHI
+ return ocxl_global_mmio_set64(ocxlpmem->ocxl_afu, GLOBAL_MMIO_CHIC,
+ OCXL_LITTLE_ENDIAN, GLOBAL_MMIO_CHI_ACRA);
+}
+
+void warn_status(const struct ocxlpmem *ocxlpmem, const char *message,
+ u8 status)
+{
+ const char *text = "Unknown";
+
+ switch (status) {
+ case STATUS_SUCCESS:
+ text = "Success";
+ break;
+
+ case STATUS_MEM_UNAVAILABLE:
+ text = "Persistent memory unavailable";
+ break;
+
+ case STATUS_BAD_OPCODE:
+ text = "Bad opcode";
+ break;
+
+ case STATUS_BAD_REQUEST_PARM:
+ text = "Bad request parameter";
+ break;
+
+ case STATUS_BAD_DATA_PARM:
+ text = "Bad data parameter";
+ break;
+
+ case STATUS_DEBUG_BLOCKED:
+ text = "Debug action blocked";
+ break;
+
+ case STATUS_FAIL:
+ text = "Failed";
+ break;
+ }
+
+ dev_warn(&ocxlpmem->dev, "%s: %s (%x)\n", message, text, status);
+}
--
2.24.1

2020-03-29 02:30:56

by Alastair D'Silva

Subject: [PATCH v4 09/25] ocxl: Save the device serial number in ocxl_fn

This patch retrieves the serial number of the card and makes it available
to consumers of the ocxl driver via the ocxl_fn struct.
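
As a small sketch of the two calculations involved: the serial number is
assembled from the two dwords of the PCI Device Serial Number capability, and
when the capability is absent on the current function, function 0 of the same
slot is consulted instead. The helper names below are hypothetical, and the
devfn encoding (slot in bits 3-7, function in bits 0-2) is the standard PCI
layout rather than anything specific to this patch.

```c
#include <stdint.h>

/* Combine the two dwords of the Device Serial Number capability. */
static uint64_t sketch_serial(uint32_t low, uint32_t high)
{
	return (uint64_t)low | ((uint64_t)high << 32);
}

/* devfn of function 0 in the same slot: clear the 3-bit function field. */
static unsigned int sketch_function_0(unsigned int devfn)
{
	return devfn & 0xf8u;
}
```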

Signed-off-by: Alastair D'Silva <[email protected]>
Acked-by: Frederic Barrat <[email protected]>
Acked-by: Andrew Donnellan <[email protected]>
---
drivers/misc/ocxl/config.c | 46 ++++++++++++++++++++++++++++++++++++++
include/misc/ocxl.h | 1 +
2 files changed, 47 insertions(+)

diff --git a/drivers/misc/ocxl/config.c b/drivers/misc/ocxl/config.c
index 69cca341d446..2b835e57daca 100644
--- a/drivers/misc/ocxl/config.c
+++ b/drivers/misc/ocxl/config.c
@@ -71,6 +71,51 @@ static int find_dvsec_afu_ctrl(struct pci_dev *dev, u8 afu_idx)
return 0;
}

+/**
+ * get_function_0() - Find a related PCI device (function 0)
+ * @dev: PCI device to match
+ *
+ * Returns a pointer to the related device, or NULL if not found
+ */
+static struct pci_dev *get_function_0(struct pci_dev *dev)
+{
+ unsigned int devfn = PCI_DEVFN(PCI_SLOT(dev->devfn), 0);
+
+ return pci_get_domain_bus_and_slot(pci_domain_nr(dev->bus),
+ dev->bus->number, devfn);
+}
+
+static void read_serial(struct pci_dev *dev, struct ocxl_fn_config *fn)
+{
+ u32 low, high;
+ int pos;
+
+ pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DSN);
+ if (pos) {
+ pci_read_config_dword(dev, pos + 0x04, &low);
+ pci_read_config_dword(dev, pos + 0x08, &high);
+
+ fn->serial = low | ((u64)high) << 32;
+
+ return;
+ }
+
+ if (PCI_FUNC(dev->devfn) != 0) {
+ struct pci_dev *related = get_function_0(dev);
+
+ if (!related) {
+ fn->serial = 0;
+ return;
+ }
+
+ read_serial(related, fn);
+ pci_dev_put(related);
+ return;
+ }
+
+ fn->serial = 0;
+}
+
static void read_pasid(struct pci_dev *dev, struct ocxl_fn_config *fn)
{
u16 val;
@@ -208,6 +253,7 @@ int ocxl_config_read_function(struct pci_dev *dev, struct ocxl_fn_config *fn)
int rc;

read_pasid(dev, fn);
+ read_serial(dev, fn);

rc = read_dvsec_tl(dev, fn);
if (rc) {
diff --git a/include/misc/ocxl.h b/include/misc/ocxl.h
index d8b0b4d46bfb..b8514dc64bd0 100644
--- a/include/misc/ocxl.h
+++ b/include/misc/ocxl.h
@@ -46,6 +46,7 @@ struct ocxl_fn_config {
int dvsec_afu_info_pos; /* offset of the AFU information DVSEC */
s8 max_pasid_log;
s8 max_afu_index;
+ u64 serial;
};

enum ocxl_endian {
--
2.24.1
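
The two pieces of arithmetic in this patch can be tried outside the kernel. The sketch below is a userspace illustration only: the SKETCH_* macros mirror the bit layout of the kernel's PCI_DEVFN/PCI_SLOT/PCI_FUNC macros, and function0_devfn() and combine_serial() are hypothetical helper names, not functions from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Same bit layout as the kernel's PCI_DEVFN/PCI_SLOT/PCI_FUNC macros. */
#define SKETCH_DEVFN(slot, func) ((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define SKETCH_SLOT(devfn)       (((devfn) >> 3) & 0x1f)
#define SKETCH_FUNC(devfn)       ((devfn) & 0x07)

/* devfn of function 0 in the same slot as the given device. */
static unsigned int function0_devfn(unsigned int devfn)
{
	assert(devfn <= 0xff); /* a devfn is 5 bits of slot plus 3 bits of function */
	return SKETCH_DEVFN(SKETCH_SLOT(devfn), 0);
}

/* Combine the two 32-bit dwords of the serial number into one 64-bit value. */
static uint64_t combine_serial(uint32_t low, uint32_t high)
{
	return (uint64_t)low | ((uint64_t)high << 32);
}
```

For slot 3, function 1 the devfn is 25, and function0_devfn(25) gives 24 (slot 3, function 0). The cast before the shift matters: shifting a 32-bit value left by 32 would be undefined in C.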

2020-03-29 02:31:13

by Alastair D'Silva

Subject: [PATCH v4 13/25] nvdimm/ocxl: Read the capability registers & wait for device ready

This patch reads timeouts & firmware version from the controller, and
uses those timeouts to wait for the controller to report that it is ready
before handing the memory over to libnvdimm.

Signed-off-by: Alastair D'Silva <[email protected]>
---
drivers/nvdimm/ocxl/Makefile | 2 +-
drivers/nvdimm/ocxl/main.c | 85 +++++++++++++++++++++++++
drivers/nvdimm/ocxl/ocxlpmem.h | 29 +++++++++
drivers/nvdimm/ocxl/ocxlpmem_internal.c | 19 ++++++
4 files changed, 134 insertions(+), 1 deletion(-)
create mode 100644 drivers/nvdimm/ocxl/ocxlpmem_internal.c

diff --git a/drivers/nvdimm/ocxl/Makefile b/drivers/nvdimm/ocxl/Makefile
index e0e8ade1987a..bab97082e062 100644
--- a/drivers/nvdimm/ocxl/Makefile
+++ b/drivers/nvdimm/ocxl/Makefile
@@ -4,4 +4,4 @@ ccflags-$(CONFIG_PPC_WERROR) += -Werror

obj-$(CONFIG_OCXL_PMEM) += ocxlpmem.o

-ocxlpmem-y := main.o
\ No newline at end of file
+ocxlpmem-y := main.o ocxlpmem_internal.o
diff --git a/drivers/nvdimm/ocxl/main.c b/drivers/nvdimm/ocxl/main.c
index c0066fedf9cc..be76acd33d74 100644
--- a/drivers/nvdimm/ocxl/main.c
+++ b/drivers/nvdimm/ocxl/main.c
@@ -8,6 +8,7 @@

#include <linux/module.h>
#include <misc/ocxl.h>
+#include <linux/delay.h>
#include <linux/ndctl.h>
#include <linux/mm_types.h>
#include <linux/memory_hotplug.h>
@@ -327,6 +328,50 @@ static void remove(struct pci_dev *pdev)
}
}

+/**
+ * read_device_metadata() - Retrieve config information from the AFU and save it for future use
+ * @ocxlpmem: the device metadata
+ * Return: 0 on success, negative on failure
+ */
+static int read_device_metadata(struct ocxlpmem *ocxlpmem)
+{
+ u64 val;
+ int rc;
+
+ rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu, GLOBAL_MMIO_CCAP0,
+ OCXL_LITTLE_ENDIAN, &val);
+ if (rc)
+ return rc;
+
+ ocxlpmem->scm_revision = val & 0xFFFF;
+ ocxlpmem->read_latency = (val >> 32) & 0xFFFF;
+ ocxlpmem->readiness_timeout = (val >> 48) & 0x0F;
+ ocxlpmem->memory_available_timeout = val >> 52;
+
+ rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu, GLOBAL_MMIO_CCAP1,
+ OCXL_LITTLE_ENDIAN, &val);
+ if (rc)
+ return rc;
+
+ ocxlpmem->max_controller_dump_size = val & 0xFFFFFFFF;
+
+ // Extract firmware version text
+ rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu, GLOBAL_MMIO_FWVER,
+ OCXL_HOST_ENDIAN,
+ (u64 *)ocxlpmem->fw_version);
+ if (rc)
+ return rc;
+
+ ocxlpmem->fw_version[8] = '\0';
+
+ dev_info(&ocxlpmem->dev,
+ "Firmware version '%s' SCM revision %d:%d\n",
+ ocxlpmem->fw_version, ocxlpmem->scm_revision >> 4,
+ ocxlpmem->scm_revision & 0x0F);
+
+ return 0;
+}
+
/**
* probe_function0() - Set up function 0 for an OpenCAPI persistent memory device
* This is important as it enables templates higher than 0 across all other
@@ -359,6 +404,9 @@ static int probe(struct pci_dev *pdev, const struct pci_device_id *ent)
{
struct ocxlpmem *ocxlpmem;
int rc;
+ u64 chi;
+ u16 elapsed, timeout;
+ bool ready = false;

if (PCI_FUNC(pdev->devfn) == 0)
return probe_function0(pdev);
@@ -413,6 +461,43 @@ static int probe(struct pci_dev *pdev, const struct pci_device_id *ent)
goto err;
}

+ rc = read_device_metadata(ocxlpmem);
+ if (rc) {
+ dev_err(&pdev->dev, "Could not read metadata\n");
+ goto err;
+ }
+
+ elapsed = 0;
+ timeout = ocxlpmem->readiness_timeout +
+ ocxlpmem->memory_available_timeout;
+
+ while (true) {
+ rc = ocxlpmem_chi(ocxlpmem, &chi);
+ ready = (chi & (GLOBAL_MMIO_CHI_CRDY | GLOBAL_MMIO_CHI_MA)) ==
+ (GLOBAL_MMIO_CHI_CRDY | GLOBAL_MMIO_CHI_MA);
+
+ if (ready)
+ break;
+
+ if (elapsed++ > timeout) {
+ dev_err(&ocxlpmem->dev,
+ "OpenCAPI Persistent Memory ready timeout.\n");
+
+ if (!(chi & GLOBAL_MMIO_CHI_CRDY))
+ dev_err(&ocxlpmem->dev,
+ "controller is not ready.\n");
+
+ if (!(chi & GLOBAL_MMIO_CHI_MA))
+ dev_err(&ocxlpmem->dev,
+ "controller does not have memory available.\n");
+
+ rc = -ENXIO;
+ goto err;
+ }
+
+ msleep(1000);
+ }
+
rc = register_lpc_mem(ocxlpmem);
if (rc) {
dev_err(&pdev->dev,
diff --git a/drivers/nvdimm/ocxl/ocxlpmem.h b/drivers/nvdimm/ocxl/ocxlpmem.h
index 322387873b4b..3eadbe19f6d0 100644
--- a/drivers/nvdimm/ocxl/ocxlpmem.h
+++ b/drivers/nvdimm/ocxl/ocxlpmem.h
@@ -93,4 +93,33 @@ struct ocxlpmem {
void *metadata_addr;
struct resource pmem_res;
struct nd_region *nd_region;
+ char fw_version[8 + 1];
+
+ u32 max_controller_dump_size;
+ u16 scm_revision; // major/minor
+ u8 readiness_timeout; /* The worst case time (in seconds) that the host
+ * shall wait for the controller to become
+ * operational following a reset (CHI.CRDY).
+ */
+ u8 memory_available_timeout; /* The worst case time (in seconds) that
+ * the host shall wait for memory to
+ * become available following a reset
+ * (CHI.MA).
+ */
+
+ u16 read_latency; /* The nominal measure of latency (in nanoseconds)
+ * associated with an unassisted read of a memory
+ * block.
+ * This represents the capability of the raw media
+ * technology without assistance
+ */
};
+
+/**
+ * ocxlpmem_chi() - Get the value of the CHI register
+ * @ocxlpmem: the device metadata
+ * @chi: returns the CHI value
+ *
+ * Returns 0 on success, negative on error
+ */
+int ocxlpmem_chi(const struct ocxlpmem *ocxlpmem, u64 *chi);
diff --git a/drivers/nvdimm/ocxl/ocxlpmem_internal.c b/drivers/nvdimm/ocxl/ocxlpmem_internal.c
new file mode 100644
index 000000000000..5578169b7515
--- /dev/null
+++ b/drivers/nvdimm/ocxl/ocxlpmem_internal.c
@@ -0,0 +1,19 @@
+// SPDX-License-Identifier: GPL-2.0+
+// Copyright 2020 IBM Corp.
+
+#include <misc/ocxl.h>
+#include <linux/delay.h>
+#include "ocxlpmem.h"
+
+int ocxlpmem_chi(const struct ocxlpmem *ocxlpmem, u64 *chi)
+{
+ u64 val;
+ int rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu, GLOBAL_MMIO_CHI,
+ OCXL_LITTLE_ENDIAN, &val);
+ if (rc)
+ return rc;
+
+ *chi = val;
+
+ return 0;
+}
--
2.24.1
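
As a reading aid for the patch above, here is a userspace sketch of the CCAP0 field extraction and of a readiness poll. Several things are assumptions for illustration: the SKETCH_CHI_* bit positions are invented (the real values are defined in the driver headers, which are not shown here), the read and sleep hooks are hypothetical, and the loop budget is close to but not identical to the one in probe(). Unlike the loop in the patch, this sketch also checks the return value of the register read before looking at the value.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical bit positions; the real values live in the driver headers. */
#define SKETCH_CHI_CRDY (UINT64_C(1) << 0)
#define SKETCH_CHI_MA   (UINT64_C(1) << 1)

struct ccap0 {
	uint16_t scm_revision;            /* bits 0-15 */
	uint16_t read_latency;            /* bits 32-47 */
	uint8_t readiness_timeout;        /* bits 48-51, seconds */
	uint8_t memory_available_timeout; /* bits 52 and up, truncated to 8 bits */
};

/* Split a raw CCAP0 value using the same shifts and masks as the patch. */
static struct ccap0 decode_ccap0(uint64_t val)
{
	struct ccap0 c;

	c.scm_revision = val & 0xFFFF;
	c.read_latency = (val >> 32) & 0xFFFF;
	c.readiness_timeout = (val >> 48) & 0x0F;
	c.memory_available_timeout = (uint8_t)(val >> 52);
	return c;
}

/*
 * Poll once per (simulated) second until both bits are set.
 * read_chi returns 0 on success and fills *chi.
 */
static int wait_ready(int (*read_chi)(uint64_t *chi), void (*sleep_1s)(void),
		      unsigned int timeout_s)
{
	const uint64_t want = SKETCH_CHI_CRDY | SKETCH_CHI_MA;
	unsigned int elapsed;

	assert(read_chi && sleep_1s);

	for (elapsed = 0; elapsed <= timeout_s; elapsed++) {
		uint64_t chi = 0;
		int rc = read_chi(&chi);

		if (rc)
			return rc; /* propagate a failed register read */
		if ((chi & want) == want)
			return 0;
		sleep_1s();
	}
	return -ENXIO; /* same error code the patch uses on timeout */
}

/* Test doubles: a device that reports both bits on the third read. */
static int fake_reads;

static int fake_read_chi(uint64_t *chi)
{
	*chi = (++fake_reads >= 3) ? (SKETCH_CHI_CRDY | SKETCH_CHI_MA)
				   : SKETCH_CHI_CRDY;
	return 0;
}

static void fake_sleep(void) { }
```

Passing the read and sleep steps in as function pointers keeps the loop testable without hardware; in the driver these would be ocxlpmem_chi() and msleep(1000).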

2020-03-29 02:31:15

by Alastair D'Silva

Subject: [PATCH v4 10/25] nvdimm: Add driver for OpenCAPI Persistent Memory

This driver exposes LPC memory on OpenCAPI pmem cards
as an NVDIMM, allowing the existing nvdimm infrastructure
to be used.

Namespace metadata is stored on the media itself, so
reserve_metadata() maps 1 section's worth of PMEM storage
at the start to hold this. The rest of the PMEM range is registered
with libnvdimm as an nvdimm. ndctl_config_read/write/size() provide
callbacks to libnvdimm to access the metadata.

Signed-off-by: Alastair D'Silva <[email protected]>
---
drivers/nvdimm/Kconfig | 2 +
drivers/nvdimm/Makefile | 1 +
drivers/nvdimm/ocxl/Kconfig | 15 ++
drivers/nvdimm/ocxl/Makefile | 7 +
drivers/nvdimm/ocxl/main.c | 476 +++++++++++++++++++++++++++++++++
drivers/nvdimm/ocxl/ocxlpmem.h | 23 ++
6 files changed, 524 insertions(+)
create mode 100644 drivers/nvdimm/ocxl/Kconfig
create mode 100644 drivers/nvdimm/ocxl/Makefile
create mode 100644 drivers/nvdimm/ocxl/main.c
create mode 100644 drivers/nvdimm/ocxl/ocxlpmem.h

diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig
index b7d1eb38b27d..368328637182 100644
--- a/drivers/nvdimm/Kconfig
+++ b/drivers/nvdimm/Kconfig
@@ -131,4 +131,6 @@ config NVDIMM_TEST_BUILD
core devm_memremap_pages() implementation and other
infrastructure.

+source "drivers/nvdimm/ocxl/Kconfig"
+
endif
diff --git a/drivers/nvdimm/Makefile b/drivers/nvdimm/Makefile
index 29203f3d3069..bc02be11c794 100644
--- a/drivers/nvdimm/Makefile
+++ b/drivers/nvdimm/Makefile
@@ -33,3 +33,4 @@ libnvdimm-$(CONFIG_NVDIMM_KEYS) += security.o
TOOLS := ../../tools
TEST_SRC := $(TOOLS)/testing/nvdimm/test
obj-$(CONFIG_NVDIMM_TEST_BUILD) += $(TEST_SRC)/iomap.o
+obj-$(CONFIG_LIBNVDIMM) += ocxl/
diff --git a/drivers/nvdimm/ocxl/Kconfig b/drivers/nvdimm/ocxl/Kconfig
new file mode 100644
index 000000000000..c5d927520920
--- /dev/null
+++ b/drivers/nvdimm/ocxl/Kconfig
@@ -0,0 +1,15 @@
+# SPDX-License-Identifier: GPL-2.0-only
+if LIBNVDIMM
+
+config OCXL_PMEM
+ tristate "OpenCAPI Persistent Memory"
+ depends on LIBNVDIMM && PPC_POWERNV && PCI && EEH && ZONE_DEVICE && OCXL
+ help
+ Exposes devices that implement the OpenCAPI Storage Class Memory
+ specification as persistent memory regions. You may also want
+ DEV_DAX, DEV_DAX_PMEM & FS_DAX if you plan on using DAX devices
+ stacked on top of this driver.
+
+ Select N if unsure.
+
+endif
diff --git a/drivers/nvdimm/ocxl/Makefile b/drivers/nvdimm/ocxl/Makefile
new file mode 100644
index 000000000000..e0e8ade1987a
--- /dev/null
+++ b/drivers/nvdimm/ocxl/Makefile
@@ -0,0 +1,7 @@
+# SPDX-License-Identifier: GPL-2.0
+
+ccflags-$(CONFIG_PPC_WERROR) += -Werror
+
+obj-$(CONFIG_OCXL_PMEM) += ocxlpmem.o
+
+ocxlpmem-y := main.o
\ No newline at end of file
diff --git a/drivers/nvdimm/ocxl/main.c b/drivers/nvdimm/ocxl/main.c
new file mode 100644
index 000000000000..c0066fedf9cc
--- /dev/null
+++ b/drivers/nvdimm/ocxl/main.c
@@ -0,0 +1,476 @@
+// SPDX-License-Identifier: GPL-2.0+
+// Copyright 2020 IBM Corp.
+
+/*
+ * A driver for OpenCAPI devices that implement the Storage Class
+ * Memory specification.
+ */
+
+#include <linux/module.h>
+#include <misc/ocxl.h>
+#include <linux/ndctl.h>
+#include <linux/mm_types.h>
+#include <linux/memory_hotplug.h>
+#include "ocxlpmem.h"
+
+static const struct pci_device_id pci_tbl[] = {
+ { PCI_DEVICE(PCI_VENDOR_ID_IBM, 0x0625), },
+ { }
+};
+
+MODULE_DEVICE_TABLE(pci, pci_tbl);
+
+#define NUM_MINORS 256 // Total to reserve
+
+static dev_t ocxlpmem_dev;
+static struct class *ocxlpmem_class;
+static struct mutex minors_idr_lock;
+static struct idr minors_idr;
+
+/**
+ * ndctl_config_write() - Handle a ND_CMD_SET_CONFIG_DATA command from ndctl
+ * @ocxlpmem: the device metadata
+ * @command: the incoming data to write
+ * Return: 0 on success, negative on failure
+ */
+static int ndctl_config_write(struct ocxlpmem *ocxlpmem,
+ struct nd_cmd_set_config_hdr *command)
+{
+ if (command->in_offset + command->in_length > LABEL_AREA_SIZE)
+ return -EINVAL;
+
+ memcpy_flushcache(ocxlpmem->metadata_addr + command->in_offset,
+ command->in_buf, command->in_length);
+
+ return 0;
+}
+
+/**
+ * ndctl_config_read() - Handle a ND_CMD_GET_CONFIG_DATA command from ndctl
+ * @ocxlpmem: the device metadata
+ * @command: the read request
+ * Return: 0 on success, negative on failure
+ */
+static int ndctl_config_read(struct ocxlpmem *ocxlpmem,
+ struct nd_cmd_get_config_data_hdr *command)
+{
+ if (command->in_offset + command->in_length > LABEL_AREA_SIZE)
+ return -EINVAL;
+
+ memcpy_mcsafe(command->out_buf,
+ ocxlpmem->metadata_addr + command->in_offset,
+ command->in_length);
+
+ return 0;
+}
+
+/**
+ * ndctl_config_size() - Handle a ND_CMD_GET_CONFIG_SIZE command from ndctl
+ * @command: the config size response to populate
+ * Return: 0 on success, negative on failure
+ */
+static int ndctl_config_size(struct nd_cmd_get_config_size *command)
+{
+ command->status = 0;
+ command->config_size = LABEL_AREA_SIZE;
+ command->max_xfer = PAGE_SIZE;
+
+ return 0;
+}
+
+static int ndctl(struct nvdimm_bus_descriptor *nd_desc,
+ struct nvdimm *nvdimm,
+ unsigned int cmd, void *buf, unsigned int buf_len, int *cmd_rc)
+{
+ struct ocxlpmem *ocxlpmem = container_of(nd_desc,
+ struct ocxlpmem, bus_desc);
+
+ switch (cmd) {
+ case ND_CMD_GET_CONFIG_SIZE:
+ *cmd_rc = ndctl_config_size(buf);
+ return 0;
+
+ case ND_CMD_GET_CONFIG_DATA:
+ *cmd_rc = ndctl_config_read(ocxlpmem, buf);
+ return 0;
+
+ case ND_CMD_SET_CONFIG_DATA:
+ *cmd_rc = ndctl_config_write(ocxlpmem, buf);
+ return 0;
+
+ default:
+ return -ENOTTY;
+ }
+}
+
+/**
+ * reserve_metadata() - Reserve space for nvdimm metadata
+ * @ocxlpmem: the device metadata
+ * @lpc_mem: The resource representing the LPC memory of the OpenCAPI device
+ */
+static int reserve_metadata(struct ocxlpmem *ocxlpmem,
+ struct resource *lpc_mem)
+{
+ ocxlpmem->metadata_addr = devm_memremap(&ocxlpmem->dev, lpc_mem->start,
+ LABEL_AREA_SIZE, MEMREMAP_WB);
+ if (IS_ERR(ocxlpmem->metadata_addr))
+ return PTR_ERR(ocxlpmem->metadata_addr);
+
+ return 0;
+}
+
+/**
+ * register_lpc_mem() - Discover persistent memory on a device and register it with the NVDIMM subsystem
+ * @ocxlpmem: the device metadata
+ * Return: 0 on success
+ */
+static int register_lpc_mem(struct ocxlpmem *ocxlpmem)
+{
+ struct nd_region_desc region_desc;
+ struct nd_mapping_desc nd_mapping_desc;
+ struct resource *lpc_mem;
+ const struct ocxl_afu_config *config;
+ const struct ocxl_fn_config *fn_config;
+ int rc;
+ unsigned long nvdimm_cmd_mask = 0;
+ unsigned long nvdimm_flags = 0;
+ int target_node;
+ char serial[16 + 1];
+
+ // Set up the reserved metadata area
+ rc = ocxl_afu_map_lpc_mem(ocxlpmem->ocxl_afu);
+ if (rc < 0)
+ return rc;
+
+ lpc_mem = ocxl_afu_lpc_mem(ocxlpmem->ocxl_afu);
+ if (!lpc_mem || !lpc_mem->start)
+ return -EINVAL;
+
+ config = ocxl_afu_config(ocxlpmem->ocxl_afu);
+ fn_config = ocxl_function_config(ocxlpmem->ocxl_fn);
+
+ rc = reserve_metadata(ocxlpmem, lpc_mem);
+ if (rc)
+ return rc;
+
+ ocxlpmem->bus_desc.provider_name = "ocxlpmem";
+ ocxlpmem->bus_desc.ndctl = ndctl;
+ ocxlpmem->bus_desc.module = THIS_MODULE;
+
+ ocxlpmem->nvdimm_bus = nvdimm_bus_register(&ocxlpmem->dev,
+ &ocxlpmem->bus_desc);
+ if (!ocxlpmem->nvdimm_bus)
+ return -EINVAL;
+
+ ocxlpmem->pmem_res.start = (u64)lpc_mem->start + LABEL_AREA_SIZE;
+ ocxlpmem->pmem_res.end = (u64)lpc_mem->start + config->lpc_mem_size - 1;
+ ocxlpmem->pmem_res.name = "OpenCAPI persistent memory";
+
+ set_bit(ND_CMD_GET_CONFIG_SIZE, &nvdimm_cmd_mask);
+ set_bit(ND_CMD_GET_CONFIG_DATA, &nvdimm_cmd_mask);
+ set_bit(ND_CMD_SET_CONFIG_DATA, &nvdimm_cmd_mask);
+
+ set_bit(NDD_ALIASING, &nvdimm_flags);
+
+ snprintf(serial, sizeof(serial), "%llx", fn_config->serial);
+ nd_mapping_desc.nvdimm = nvdimm_create(ocxlpmem->nvdimm_bus, ocxlpmem,
+ NULL, nvdimm_flags,
+ nvdimm_cmd_mask, 0, NULL);
+ if (!nd_mapping_desc.nvdimm)
+ return -ENOMEM;
+
+ if (nvdimm_bus_check_dimm_count(ocxlpmem->nvdimm_bus, 1))
+ return -EINVAL;
+
+ nd_mapping_desc.start = ocxlpmem->pmem_res.start;
+ nd_mapping_desc.size = resource_size(&ocxlpmem->pmem_res);
+ nd_mapping_desc.position = 0;
+
+ ocxlpmem->nd_set.cookie1 = fn_config->serial;
+ ocxlpmem->nd_set.cookie2 = fn_config->serial;
+
+ target_node = of_node_to_nid(ocxlpmem->pdev->dev.of_node);
+
+ memset(&region_desc, 0, sizeof(region_desc));
+ region_desc.res = &ocxlpmem->pmem_res;
+ region_desc.numa_node = NUMA_NO_NODE;
+ region_desc.target_node = target_node;
+ region_desc.num_mappings = 1;
+ region_desc.mapping = &nd_mapping_desc;
+ region_desc.nd_set = &ocxlpmem->nd_set;
+
+ set_bit(ND_REGION_PAGEMAP, &region_desc.flags);
+ /*
+ * NB: libnvdimm copies the data from region_desc into its own
+ * structures so passing a stack pointer is fine.
+ */
+ ocxlpmem->nd_region = nvdimm_pmem_region_create(ocxlpmem->nvdimm_bus,
+ &region_desc);
+ if (!ocxlpmem->nd_region)
+ return -EINVAL;
+
+ dev_info(&ocxlpmem->dev,
+ "Onlining %lluMB of persistent memory\n",
+ nd_mapping_desc.size / SZ_1M);
+
+ return 0;
+}
+
+/**
+ * allocate_minor() - Allocate a minor number to use for an OpenCAPI pmem device
+ * @ocxlpmem: the device metadata
+ * Return: the allocated minor number
+ */
+static int allocate_minor(struct ocxlpmem *ocxlpmem)
+{
+ int minor;
+
+ mutex_lock(&minors_idr_lock);
+ minor = idr_alloc(&minors_idr, ocxlpmem, 0, NUM_MINORS, GFP_KERNEL);
+ mutex_unlock(&minors_idr_lock);
+ return minor;
+}
+
+static void free_minor(struct ocxlpmem *ocxlpmem)
+{
+ mutex_lock(&minors_idr_lock);
+ idr_remove(&minors_idr, MINOR(ocxlpmem->dev.devt));
+ mutex_unlock(&minors_idr_lock);
+}
+
+/**
+ * free_ocxlpmem() - Free all members of an ocxlpmem struct
+ * @ocxlpmem: the device struct to clear
+ */
+static void free_ocxlpmem(struct ocxlpmem *ocxlpmem)
+{
+ int rc;
+
+ free_minor(ocxlpmem);
+
+ if (ocxlpmem->ocxl_context) {
+ rc = ocxl_context_detach(ocxlpmem->ocxl_context);
+ if (rc == -EBUSY)
+ dev_warn(&ocxlpmem->dev, "Timeout detaching ocxl context\n");
+ else
+ ocxl_context_free(ocxlpmem->ocxl_context);
+ }
+
+ if (ocxlpmem->ocxl_afu)
+ ocxl_afu_put(ocxlpmem->ocxl_afu);
+
+ if (ocxlpmem->ocxl_fn)
+ ocxl_function_close(ocxlpmem->ocxl_fn);
+
+ pci_dev_put(ocxlpmem->pdev);
+
+ kfree(ocxlpmem);
+}
+
+/**
+ * free_ocxlpmem_dev() - Free an OpenCAPI persistent memory device
+ * @dev: The device struct
+ */
+static void free_ocxlpmem_dev(struct device *dev)
+{
+ struct ocxlpmem *ocxlpmem = container_of(dev, struct ocxlpmem, dev);
+
+ free_ocxlpmem(ocxlpmem);
+}
+
+/**
+ * ocxlpmem_register() - Register an OpenCAPI pmem device with the kernel
+ * @ocxlpmem: the device metadata
+ * Return: 0 on success, negative on failure
+ */
+static int ocxlpmem_register(struct ocxlpmem *ocxlpmem)
+{
+ int rc;
+ int minor = allocate_minor(ocxlpmem);
+
+ if (minor < 0)
+ return minor;
+
+ ocxlpmem->dev.release = free_ocxlpmem_dev;
+ rc = dev_set_name(&ocxlpmem->dev, "ocxlpmem%d", minor);
+ if (rc < 0)
+ return rc;
+
+ ocxlpmem->dev.devt = MKDEV(MAJOR(ocxlpmem_dev), minor);
+ ocxlpmem->dev.class = ocxlpmem_class;
+ ocxlpmem->dev.parent = &ocxlpmem->pdev->dev;
+
+ return device_register(&ocxlpmem->dev);
+}
+
+/**
+ * remove() - Free an OpenCAPI persistent memory device
+ * @pdev: the PCI device information struct
+ */
+static void remove(struct pci_dev *pdev)
+{
+ if (PCI_FUNC(pdev->devfn) == 0) {
+ struct ocxl_fn *func0 = pci_get_drvdata(pdev);
+
+ if (func0)
+ ocxl_function_close(func0);
+ } else {
+ struct ocxlpmem *ocxlpmem = pci_get_drvdata(pdev);
+
+ if (!ocxlpmem)
+ return;
+
+ if (ocxlpmem->nvdimm_bus)
+ nvdimm_bus_unregister(ocxlpmem->nvdimm_bus);
+
+ device_unregister(&ocxlpmem->dev);
+ }
+}
+
+/**
+ * probe_function0() - Set up function 0 for an OpenCAPI persistent memory device
+ * This is important as it enables templates higher than 0 across all other
+ * functions, which in turn enables higher bandwidth accesses
+ * @pdev: the PCI device information struct
+ * Return: 0 on success, negative on failure
+ */
+static int probe_function0(struct pci_dev *pdev)
+{
+ struct ocxl_fn *fn;
+
+ fn = ocxl_function_open(pdev);
+ if (IS_ERR(fn)) {
+ dev_err(&pdev->dev, "failed to open OCXL function\n");
+ return PTR_ERR(fn);
+ }
+
+ pci_set_drvdata(pdev, fn);
+
+ return 0;
+}
+
+/**
+ * probe() - Init an OpenCAPI persistent memory device
+ * @pdev: the PCI device information struct
+ * @ent: The entry from pci_tbl
+ * Return: 0 on success, negative on failure
+ */
+static int probe(struct pci_dev *pdev, const struct pci_device_id *ent)
+{
+ struct ocxlpmem *ocxlpmem;
+ int rc;
+
+ if (PCI_FUNC(pdev->devfn) == 0)
+ return probe_function0(pdev);
+ else if (PCI_FUNC(pdev->devfn) != 1)
+ return 0;
+
+ ocxlpmem = kzalloc(sizeof(*ocxlpmem), GFP_KERNEL);
+ if (!ocxlpmem) {
+ rc = -ENOMEM;
+ goto err;
+ }
+
+ ocxlpmem->pdev = pci_dev_get(pdev);
+
+ pci_set_drvdata(pdev, ocxlpmem);
+
+ ocxlpmem->ocxl_fn = ocxl_function_open(pdev);
+ if (IS_ERR(ocxlpmem->ocxl_fn)) {
+ rc = PTR_ERR(ocxlpmem->ocxl_fn);
+ dev_err(&pdev->dev, "failed to open OCXL function\n");
+ goto err_unregistered;
+ }
+
+ ocxlpmem->ocxl_afu = ocxl_function_fetch_afu(ocxlpmem->ocxl_fn, 0);
+ if (!ocxlpmem->ocxl_afu) {
+ rc = -ENXIO;
+ dev_err(&pdev->dev, "Could not get OCXL AFU from function\n");
+ goto err_unregistered;
+ }
+
+ ocxl_afu_get(ocxlpmem->ocxl_afu);
+
+ // Resources allocated below here are cleaned up in the release handler
+
+ rc = ocxlpmem_register(ocxlpmem);
+ if (rc) {
+ dev_err(&pdev->dev,
+ "Could not register OpenCAPI persistent memory device with the kernel\n");
+ goto err;
+ }
+
+ rc = ocxl_context_alloc(&ocxlpmem->ocxl_context, ocxlpmem->ocxl_afu,
+ NULL);
+ if (rc) {
+ dev_err(&pdev->dev, "Could not allocate OCXL context\n");
+ goto err;
+ }
+
+ rc = ocxl_context_attach(ocxlpmem->ocxl_context, 0, NULL);
+ if (rc) {
+ dev_err(&pdev->dev, "Could not attach ocxl context\n");
+ goto err;
+ }
+
+ rc = register_lpc_mem(ocxlpmem);
+ if (rc) {
+ dev_err(&pdev->dev,
+ "Could not register OpenCAPI persistent memory with libnvdimm\n");
+ goto err;
+ }
+
+ return 0;
+
+err_unregistered:
+ if (!IS_ERR(ocxlpmem->ocxl_fn))
+ ocxl_function_close(ocxlpmem->ocxl_fn);
+ pci_dev_put(ocxlpmem->pdev);
+ kfree(ocxlpmem);
+ pci_set_drvdata(pdev, NULL);
+
+err:
+ /*
+ * Further cleanup is done in the release handler via free_ocxlpmem()
+ * This allows us to keep the character device live to handle IOCTLs to
+ * investigate issues if the card has an error
+ */
+
+ dev_err(&pdev->dev,
+ "Error detected, will not register OpenCAPI persistent memory\n");
+ return 0;
+}
+
+static struct pci_driver pci_driver = {
+ .name = "ocxlpmem",
+ .id_table = pci_tbl,
+ .probe = probe,
+ .remove = remove,
+ .shutdown = remove,
+};
+
+static int __init ocxlpmem_init(void)
+{
+ int rc;
+
+ mutex_init(&minors_idr_lock);
+ idr_init(&minors_idr);
+
+ rc = pci_register_driver(&pci_driver);
+ if (rc)
+ return rc;
+
+ return 0;
+}
+
+static void ocxlpmem_exit(void)
+{
+ pci_unregister_driver(&pci_driver);
+}
+
+module_init(ocxlpmem_init);
+module_exit(ocxlpmem_exit);
+
+MODULE_DESCRIPTION("OpenCAPI Persistent Memory");
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Alastair D'Silva <[email protected]>");
diff --git a/drivers/nvdimm/ocxl/ocxlpmem.h b/drivers/nvdimm/ocxl/ocxlpmem.h
new file mode 100644
index 000000000000..03fe7a264281
--- /dev/null
+++ b/drivers/nvdimm/ocxl/ocxlpmem.h
@@ -0,0 +1,23 @@
+// SPDX-License-Identifier: GPL-2.0+
+// Copyright 2020 IBM Corp.
+
+#include <linux/pci.h>
+#include <misc/ocxl.h>
+#include <linux/libnvdimm.h>
+#include <linux/mm.h>
+
+#define LABEL_AREA_SIZE BIT_ULL(PA_SECTION_SHIFT)
+
+struct ocxlpmem {
+ struct device dev;
+ struct pci_dev *pdev;
+ struct ocxl_fn *ocxl_fn;
+ struct nd_interleave_set nd_set;
+ struct nvdimm_bus_descriptor bus_desc;
+ struct nvdimm_bus *nvdimm_bus;
+ struct ocxl_afu *ocxl_afu;
+ struct ocxl_context *ocxl_context;
+ void *metadata_addr;
+ struct resource pmem_res;
+ struct nd_region *nd_region;
+};
--
2.24.1
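
Two details of this patch are easy to check in isolation: the bounds check on label-area accesses and the split of LPC memory into a label area followed by the pmem region. The userspace sketch below is an illustration under stated assumptions, not the driver code. SKETCH_LABEL_AREA_SIZE is assumed to be 16 MiB for the example (the driver derives it from PA_SECTION_SHIFT), and label_range_ok() and pmem_range() are hypothetical names. The offset and length are taken as 32-bit values, as in the ndctl command headers, so the sketch widens them before adding; a plain 32-bit sum such as 0xFFFFFFFF + 2 would wrap around to 1 and pass a naive comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed label area size for the sketch (the driver uses one memory section). */
#define SKETCH_LABEL_AREA_SIZE (UINT64_C(1) << 24)

/* True when [offset, offset + length) fits in the label area. */
static bool label_range_ok(uint32_t offset, uint32_t length)
{
	/* Widen before adding so the sum cannot wrap around at 32 bits. */
	return (uint64_t)offset + (uint64_t)length <= SKETCH_LABEL_AREA_SIZE;
}

struct range {
	uint64_t start;
	uint64_t end; /* inclusive, like struct resource */
};

/* Split LPC memory: label area first, the rest becomes the pmem region. */
static struct range pmem_range(uint64_t lpc_start, uint64_t lpc_size)
{
	struct range r;

	assert(lpc_size > SKETCH_LABEL_AREA_SIZE);
	r.start = lpc_start + SKETCH_LABEL_AREA_SIZE;
	r.end = lpc_start + lpc_size - 1;
	return r;
}
```

For a 1 GiB device, the resulting pmem region is 1 GiB minus the label area, which is the size the driver would report when onlining the memory.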

2020-03-29 02:58:37

by Matthew Wilcox

Subject: Re: [PATCH v4 10/25] nvdimm: Add driver for OpenCAPI Persistent Memory

On Fri, Mar 27, 2020 at 06:11:47PM +1100, Alastair D'Silva wrote:
> +static struct mutex minors_idr_lock;
> +static struct idr minors_idr;
...
> + mutex_lock(&minors_idr_lock);
> + minor = idr_alloc(&minors_idr, ocxlpmem, 0, NUM_MINORS, GFP_KERNEL);
> + mutex_unlock(&minors_idr_lock);
...
> + mutex_lock(&minors_idr_lock);
> + idr_remove(&minors_idr, MINOR(ocxlpmem->dev.devt));
> + mutex_unlock(&minors_idr_lock);
...
> + mutex_init(&minors_idr_lock);
> + idr_init(&minors_idr);

Unless you look up ocxlpmem by minor number later in the patch series (and
most of the series didn't make it to my mailbox), this can just be an ida.

static DEFINE_IDA(minors);
...
minor = ida_alloc_max(&minors, NUM_MINORS, GFP_KERNEL);
...
ida_free(&minors, MINOR(ocxlpmem->dev.devt));
...
and you can drop the dynamic initialisation. And the mutex.

2020-03-29 03:02:20

by Matthew Wilcox

Subject: Re: [PATCH v4 10/25] nvdimm: Add driver for OpenCAPI Persistent Memory

On Sat, Mar 28, 2020 at 07:56:17PM -0700, Matthew Wilcox wrote:
> On Fri, Mar 27, 2020 at 06:11:47PM +1100, Alastair D'Silva wrote:
> > +static struct mutex minors_idr_lock;
> > +static struct idr minors_idr;
> ...
> > + mutex_lock(&minors_idr_lock);
> > + minor = idr_alloc(&minors_idr, ocxlpmem, 0, NUM_MINORS, GFP_KERNEL);
> > + mutex_unlock(&minors_idr_lock);
> ...
> > + mutex_lock(&minors_idr_lock);
> > + idr_remove(&minors_idr, MINOR(ocxlpmem->dev.devt));
> > + mutex_unlock(&minors_idr_lock);
> ...
> > + mutex_init(&minors_idr_lock);
> > + idr_init(&minors_idr);
>
> Unless you look up ocxlpmem by minor number later in the patch series (and
> most of the series didn't make it to my mailbox), this can just be an ida.
>
> static DEFINE_IDA(minors);
> ...
> minor = ida_alloc_max(&minors, NUM_MINORS, GFP_KERNEL);

Oops. NUM_MINORS - 1. Or #define MAX_MINOR 255 since you only seem to use
NUM_MINORS in this one place.

> ...
> ida_free(&minors, MINOR(ocxlpmem->dev.devt));
> ...
> and you can drop the dynamic initialisation. And the mutex.
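
To make the inclusive upper bound in the correction above concrete, here is a toy userspace model of an ID allocator. It is not the kernel IDA: toy_ida, toy_ida_alloc_max() and toy_ida_free() are names invented for this sketch, it does no locking, and it returns -1 on exhaustion where the kernel API returns a negative errno. The only point it illustrates is that the maximum is itself a valid ID, so 256 minors means a maximum of 255.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAX_MINOR 255 /* inclusive upper bound, i.e. NUM_MINORS - 1 */

/* A toy stand-in for an IDA: it tracks which IDs are taken, nothing else. */
struct toy_ida {
	bool used[SKETCH_MAX_MINOR + 1];
};

/* Return the lowest free ID in [0, max], or -1 when the range is exhausted. */
static int toy_ida_alloc_max(struct toy_ida *ida, unsigned int max)
{
	unsigned int id;

	assert(ida && max <= SKETCH_MAX_MINOR);
	for (id = 0; id <= max; id++) {
		if (!ida->used[id]) {
			ida->used[id] = true;
			return (int)id;
		}
	}
	return -1;
}

/* Mark an ID as free again so a later allocation can reuse it. */
static void toy_ida_free(struct toy_ida *ida, unsigned int id)
{
	assert(ida && id <= SKETCH_MAX_MINOR);
	ida->used[id] = false;
}
```

Because nothing is stored against the ID, this shape only fits when the driver never needs to look a device up by minor number, which is the condition stated in the reply.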

2020-03-30 05:23:31

by Alastair D'Silva

Subject: [PATCH v4 03/25] powerpc/powernv: Map & release OpenCAPI LPC memory

This patch adds OPAL calls to powernv so that the OpenCAPI
driver can map & release LPC (Lowest Point of Coherency) memory.

Signed-off-by: Alastair D'Silva <[email protected]>
Reviewed-by: Andrew Donnellan <[email protected]>
---
arch/powerpc/include/asm/pnv-ocxl.h | 2 ++
arch/powerpc/platforms/powernv/ocxl.c | 43 +++++++++++++++++++++++++++
2 files changed, 45 insertions(+)

diff --git a/arch/powerpc/include/asm/pnv-ocxl.h b/arch/powerpc/include/asm/pnv-ocxl.h
index 7de82647e761..560a19bb71b7 100644
--- a/arch/powerpc/include/asm/pnv-ocxl.h
+++ b/arch/powerpc/include/asm/pnv-ocxl.h
@@ -32,5 +32,7 @@ extern int pnv_ocxl_spa_remove_pe_from_cache(void *platform_data, int pe_handle)

extern int pnv_ocxl_alloc_xive_irq(u32 *irq, u64 *trigger_addr);
extern void pnv_ocxl_free_xive_irq(u32 irq);
+u64 pnv_ocxl_platform_lpc_setup(struct pci_dev *pdev, u64 size);
+void pnv_ocxl_platform_lpc_release(struct pci_dev *pdev);

#endif /* _ASM_PNV_OCXL_H */
diff --git a/arch/powerpc/platforms/powernv/ocxl.c b/arch/powerpc/platforms/powernv/ocxl.c
index 8c65aacda9c8..f13119a7c026 100644
--- a/arch/powerpc/platforms/powernv/ocxl.c
+++ b/arch/powerpc/platforms/powernv/ocxl.c
@@ -475,6 +475,49 @@ void pnv_ocxl_spa_release(void *platform_data)
}
EXPORT_SYMBOL_GPL(pnv_ocxl_spa_release);

+u64 pnv_ocxl_platform_lpc_setup(struct pci_dev *pdev, u64 size)
+{
+ struct pci_controller *hose = pci_bus_to_host(pdev->bus);
+ struct pnv_phb *phb = hose->private_data;
+ u32 bdfn = pci_dev_id(pdev);
+ __be64 base_addr_be64;
+ u64 base_addr;
+ int rc;
+
+ rc = opal_npu_mem_alloc(phb->opal_id, bdfn, size, &base_addr_be64);
+ if (rc) {
+ dev_warn(&pdev->dev,
+ "OPAL could not allocate LPC memory, rc=%d\n", rc);
+ return 0;
+ }
+
+ base_addr = be64_to_cpu(base_addr_be64);
+
+#ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
+ rc = check_hotplug_memory_addressable(base_addr >> PAGE_SHIFT,
+ size >> PAGE_SHIFT);
+ if (rc)
+ return 0;
+#endif
+
+ return base_addr;
+}
+EXPORT_SYMBOL_GPL(pnv_ocxl_platform_lpc_setup);
+
+void pnv_ocxl_platform_lpc_release(struct pci_dev *pdev)
+{
+ struct pci_controller *hose = pci_bus_to_host(pdev->bus);
+ struct pnv_phb *phb = hose->private_data;
+ u32 bdfn = pci_dev_id(pdev);
+ int rc;
+
+ rc = opal_npu_mem_release(phb->opal_id, bdfn);
+ if (rc)
+ dev_warn(&pdev->dev,
+ "OPAL reported rc=%d when releasing LPC memory\n", rc);
+}
+EXPORT_SYMBOL_GPL(pnv_ocxl_platform_lpc_release);
+
int pnv_ocxl_spa_remove_pe_from_cache(void *platform_data, int pe_handle)
{
struct spa_data *data = (struct spa_data *) platform_data;
--
2.24.1
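
pnv_ocxl_platform_lpc_setup() above receives the base address from OPAL in big-endian form and then works in page frame units for the hotplug check. The userspace sketch below shows both conversions in portable C. It is an illustration only: be64_decode() and to_pfn() are hypothetical names, and SKETCH_PAGE_SHIFT assumes 64 KiB pages, which depends on the kernel configuration.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 16 /* assumes 64 KiB pages; configuration dependent */

/* Decode a big-endian 64-bit value from its byte representation. */
static uint64_t be64_decode(const unsigned char bytes[8])
{
	uint64_t v = 0;
	int i;

	assert(bytes);
	for (i = 0; i < 8; i++)
		v = (v << 8) | bytes[i]; /* most significant byte comes first */
	return v;
}

/* Convert a byte address or size into units of pages. */
static uint64_t to_pfn(uint64_t addr)
{
	return addr >> SKETCH_PAGE_SHIFT;
}
```

Decoding byte by byte gives the same result on little-endian and big-endian hosts, which is the property be64_to_cpu() provides in the kernel.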

2020-03-30 05:24:42

by Alastair D'Silva

Subject: [PATCH v4 05/25] ocxl: Address kernel doc errors & warnings

This patch addresses warnings and errors from the kernel doc scripts for
the OpenCAPI driver.

It also makes minor tweaks to make the docs more consistent.

Signed-off-by: Alastair D'Silva <[email protected]>
Acked-by: Andrew Donnellan <[email protected]>
---
drivers/misc/ocxl/config.c | 24 ++++----
drivers/misc/ocxl/ocxl_internal.h | 9 +--
include/misc/ocxl.h | 96 ++++++++++++-------------------
3 files changed, 55 insertions(+), 74 deletions(-)

diff --git a/drivers/misc/ocxl/config.c b/drivers/misc/ocxl/config.c
index c8e19bfb5ef9..a62e3d7db2bf 100644
--- a/drivers/misc/ocxl/config.c
+++ b/drivers/misc/ocxl/config.c
@@ -273,16 +273,16 @@ static int read_afu_info(struct pci_dev *dev, struct ocxl_fn_config *fn,
}

/**
- * Read the template version from the AFU
- * dev: the device for the AFU
- * fn: the AFU offsets
- * len: outputs the template length
- * version: outputs the major<<8,minor version
+ * read_template_version() - Read the template version from the AFU
+ * @dev: the device for the AFU
+ * @fn: the AFU offsets
+ * @len: outputs the template length
+ * @version: outputs the major<<8,minor version
*
* Returns 0 on success, negative on failure
*/
static int read_template_version(struct pci_dev *dev, struct ocxl_fn_config *fn,
- u16 *len, u16 *version)
+ u16 *len, u16 *version)
{
u32 val32;
u8 major, minor;
@@ -476,16 +476,16 @@ static int validate_afu(struct pci_dev *dev, struct ocxl_afu_config *afu)
}

/**
- * Populate AFU metadata regarding LPC memory
- * dev: the device for the AFU
- * fn: the AFU offsets
- * afu: the AFU struct to populate the LPC metadata into
+ * read_afu_lpc_memory_info() - Populate AFU metadata regarding LPC memory
+ * @dev: the device for the AFU
+ * @fn: the AFU offsets
+ * @afu: the AFU struct to populate the LPC metadata into
*
* Returns 0 on success, negative on failure
*/
static int read_afu_lpc_memory_info(struct pci_dev *dev,
- struct ocxl_fn_config *fn,
- struct ocxl_afu_config *afu)
+ struct ocxl_fn_config *fn,
+ struct ocxl_afu_config *afu)
{
int rc;
u32 val32;
diff --git a/drivers/misc/ocxl/ocxl_internal.h b/drivers/misc/ocxl/ocxl_internal.h
index 345bf843a38e..198e4e4bc51d 100644
--- a/drivers/misc/ocxl/ocxl_internal.h
+++ b/drivers/misc/ocxl/ocxl_internal.h
@@ -122,11 +122,12 @@ int ocxl_config_check_afu_index(struct pci_dev *dev,
struct ocxl_fn_config *fn, int afu_idx);

/**
- * Update values within a Process Element
+ * ocxl_link_update_pe() - Update values within a Process Element
+ * @link_handle: the link handle associated with the process element
+ * @pasid: the PASID for the AFU context
+ * @tid: the new thread id for the process element
*
- * link_handle: the link handle associated with the process element
- * pasid: the PASID for the AFU context
- * tid: the new thread id for the process element
+ * Returns 0 on success
*/
int ocxl_link_update_pe(void *link_handle, int pasid, __u16 tid);

diff --git a/include/misc/ocxl.h b/include/misc/ocxl.h
index 0a762e387418..357ef1aadbc0 100644
--- a/include/misc/ocxl.h
+++ b/include/misc/ocxl.h
@@ -62,8 +62,7 @@ struct ocxl_context;
// Device detection & initialisation

/**
- * Open an OpenCAPI function on an OpenCAPI device
- *
+ * ocxl_function_open() - Open an OpenCAPI function on an OpenCAPI device
* @dev: The PCI device that contains the function
*
* Returns an opaque pointer to the function, or an error pointer (check with IS_ERR)
@@ -71,8 +70,7 @@ struct ocxl_context;
struct ocxl_fn *ocxl_function_open(struct pci_dev *dev);

/**
- * Get the list of AFUs associated with a PCI function device
- *
+ * ocxl_function_afu_list() - Get the list of AFUs associated with a PCI function device
* Returns a list of struct ocxl_afu *
*
* @fn: The OpenCAPI function containing the AFUs
@@ -80,8 +78,7 @@ struct ocxl_fn *ocxl_function_open(struct pci_dev *dev);
struct list_head *ocxl_function_afu_list(struct ocxl_fn *fn);

/**
- * Fetch an AFU instance from an OpenCAPI function
- *
+ * ocxl_function_fetch_afu() - Fetch an AFU instance from an OpenCAPI function
* @fn: The OpenCAPI function to get the AFU from
* @afu_idx: The index of the AFU to get
*
@@ -92,23 +89,20 @@ struct list_head *ocxl_function_afu_list(struct ocxl_fn *fn);
struct ocxl_afu *ocxl_function_fetch_afu(struct ocxl_fn *fn, u8 afu_idx);

/**
- * Take a reference to an AFU
- *
+ * ocxl_afu_get() - Take a reference to an AFU
* @afu: The AFU to increment the reference count on
*/
void ocxl_afu_get(struct ocxl_afu *afu);

/**
- * Release a reference to an AFU
- *
+ * ocxl_afu_put() - Release a reference to an AFU
* @afu: The AFU to decrement the reference count on
*/
void ocxl_afu_put(struct ocxl_afu *afu);


/**
- * Get the configuration information for an OpenCAPI function
- *
+ * ocxl_function_config() - Get the configuration information for an OpenCAPI function
* @fn: The OpenCAPI function to get the config for
*
* Returns the function config, or NULL on error
@@ -116,8 +110,7 @@ void ocxl_afu_put(struct ocxl_afu *afu);
const struct ocxl_fn_config *ocxl_function_config(struct ocxl_fn *fn);

/**
- * Close an OpenCAPI function
- *
+ * ocxl_function_close() - Close an OpenCAPI function
* This will free any AFUs previously retrieved from the function, and
* detach and associated contexts. The contexts must by freed by the caller.
*
@@ -129,8 +122,7 @@ void ocxl_function_close(struct ocxl_fn *fn);
// Context allocation

/**
- * Allocate an OpenCAPI context
- *
+ * ocxl_context_alloc() - Allocate an OpenCAPI context
* @context: The OpenCAPI context to allocate, must be freed with ocxl_context_free
* @afu: The AFU the context belongs to
* @mapping: The mapping to unmap when the context is closed (may be NULL)
@@ -139,14 +131,13 @@ int ocxl_context_alloc(struct ocxl_context **context, struct ocxl_afu *afu,
struct address_space *mapping);

/**
- * Free an OpenCAPI context
- *
+ * ocxl_context_free() - Free an OpenCAPI context
* @ctx: The OpenCAPI context to free
*/
void ocxl_context_free(struct ocxl_context *ctx);

/**
- * Grant access to an MM to an OpenCAPI context
+ * ocxl_context_attach() - Grant access to an MM to an OpenCAPI context
* @ctx: The OpenCAPI context to attach
* @amr: The value of the AMR register to restrict access
* @mm: The mm to attach to the context
@@ -157,7 +148,7 @@ int ocxl_context_attach(struct ocxl_context *ctx, u64 amr,
struct mm_struct *mm);

/**
- * Detach an MM from an OpenCAPI context
+ * ocxl_context_detach() - Detach an MM from an OpenCAPI context
* @ctx: The OpenCAPI context to attach
*
* Returns 0 on success, negative on failure
@@ -167,7 +158,7 @@ int ocxl_context_detach(struct ocxl_context *ctx);
// AFU IRQs

/**
- * Allocate an IRQ associated with an AFU context
+ * ocxl_afu_irq_alloc() - Allocate an IRQ associated with an AFU context
* @ctx: the AFU context
* @irq_id: out, the IRQ ID
*
@@ -176,7 +167,7 @@ int ocxl_context_detach(struct ocxl_context *ctx);
int ocxl_afu_irq_alloc(struct ocxl_context *ctx, int *irq_id);

/**
- * Frees an IRQ associated with an AFU context
+ * ocxl_afu_irq_free() - Frees an IRQ associated with an AFU context
* @ctx: the AFU context
* @irq_id: the IRQ ID
*
@@ -185,7 +176,7 @@ int ocxl_afu_irq_alloc(struct ocxl_context *ctx, int *irq_id);
int ocxl_afu_irq_free(struct ocxl_context *ctx, int irq_id);

/**
- * Gets the address of the trigger page for an IRQ
+ * ocxl_afu_irq_get_addr() - Gets the address of the trigger page for an IRQ
* This can then be provided to an AFU which will write to that
* page to trigger the IRQ.
* @ctx: The AFU context that the IRQ is associated with
@@ -196,7 +187,7 @@ int ocxl_afu_irq_free(struct ocxl_context *ctx, int irq_id);
u64 ocxl_afu_irq_get_addr(struct ocxl_context *ctx, int irq_id);

/**
- * Provide a callback to be called when an IRQ is triggered
+ * ocxl_irq_set_handler() - Provide a callback to be called when an IRQ is triggered
* @ctx: The AFU context that the IRQ is associated with
* @irq_id: The IRQ ID
* @handler: the callback to be called when the IRQ is triggered
@@ -213,8 +204,7 @@ int ocxl_irq_set_handler(struct ocxl_context *ctx, int irq_id,
// AFU Metadata

/**
- * Get a pointer to the config for an AFU
- *
+ * ocxl_afu_config() - Get a pointer to the config for an AFU
* @afu: a pointer to the AFU to get the config for
*
* Returns a pointer to the AFU config
@@ -222,27 +212,24 @@ int ocxl_irq_set_handler(struct ocxl_context *ctx, int irq_id,
struct ocxl_afu_config *ocxl_afu_config(struct ocxl_afu *afu);

/**
- * Assign opaque hardware specific information to an OpenCAPI AFU.
- *
- * @dev: The PCI device associated with the OpenCAPI device
+ * ocxl_afu_set_private() - Assign opaque hardware specific information to an OpenCAPI AFU.
+ * @afu: The OpenCAPI AFU
* @private: the opaque hardware specific information to assign to the driver
*/
void ocxl_afu_set_private(struct ocxl_afu *afu, void *private);

/**
- * Fetch the hardware specific information associated with an external OpenCAPI
- * AFU. This may be consumed by an external OpenCAPI driver.
- *
- * @afu: The AFU
+ * ocxl_afu_get_private() - Fetch the hardware specific information associated with
+ * an external OpenCAPI AFU. This may be consumed by an external OpenCAPI driver.
+ * @afu: The OpenCAPI AFU
*
* Returns the opaque pointer associated with the device, or NULL if not set
*/
-void *ocxl_afu_get_private(struct ocxl_afu *dev);
+void *ocxl_afu_get_private(struct ocxl_afu *afu);

// Global MMIO
/**
- * Read a 32 bit value from global MMIO
- *
+ * ocxl_global_mmio_read32() - Read a 32 bit value from global MMIO
* @afu: The AFU
* @offset: The Offset from the start of MMIO
* @endian: the endianness that the MMIO data is in
@@ -251,11 +238,10 @@ void *ocxl_afu_get_private(struct ocxl_afu *dev);
* Returns 0 for success, negative on error
*/
int ocxl_global_mmio_read32(struct ocxl_afu *afu, size_t offset,
- enum ocxl_endian endian, u32 *val);
+ enum ocxl_endian endian, u32 *val);

/**
- * Read a 64 bit value from global MMIO
- *
+ * ocxl_global_mmio_read64() - Read a 64 bit value from global MMIO
* @afu: The AFU
* @offset: The Offset from the start of MMIO
* @endian: the endianness that the MMIO data is in
@@ -264,11 +250,10 @@ int ocxl_global_mmio_read32(struct ocxl_afu *afu, size_t offset,
* Returns 0 for success, negative on error
*/
int ocxl_global_mmio_read64(struct ocxl_afu *afu, size_t offset,
- enum ocxl_endian endian, u64 *val);
+ enum ocxl_endian endian, u64 *val);

/**
- * Write a 32 bit value to global MMIO
- *
+ * ocxl_global_mmio_write32() - Write a 32 bit value to global MMIO
* @afu: The AFU
* @offset: The Offset from the start of MMIO
* @endian: the endianness that the MMIO data is in
@@ -277,11 +262,10 @@ int ocxl_global_mmio_read64(struct ocxl_afu *afu, size_t offset,
* Returns 0 for success, negative on error
*/
int ocxl_global_mmio_write32(struct ocxl_afu *afu, size_t offset,
- enum ocxl_endian endian, u32 val);
+ enum ocxl_endian endian, u32 val);

/**
- * Write a 64 bit value to global MMIO
- *
+ * ocxl_global_mmio_write64() - Write a 64 bit value to global MMIO
* @afu: The AFU
* @offset: The Offset from the start of MMIO
* @endian: the endianness that the MMIO data is in
@@ -290,11 +274,10 @@ int ocxl_global_mmio_write32(struct ocxl_afu *afu, size_t offset,
* Returns 0 for success, negative on error
*/
int ocxl_global_mmio_write64(struct ocxl_afu *afu, size_t offset,
- enum ocxl_endian endian, u64 val);
+ enum ocxl_endian endian, u64 val);

/**
- * Set bits in a 32 bit global MMIO register
- *
+ * ocxl_global_mmio_set32() - Set bits in a 32 bit global MMIO register
* @afu: The AFU
* @offset: The Offset from the start of MMIO
* @endian: the endianness that the MMIO data is in
@@ -303,11 +286,10 @@ int ocxl_global_mmio_write64(struct ocxl_afu *afu, size_t offset,
* Returns 0 for success, negative on error
*/
int ocxl_global_mmio_set32(struct ocxl_afu *afu, size_t offset,
- enum ocxl_endian endian, u32 mask);
+ enum ocxl_endian endian, u32 mask);

/**
- * Set bits in a 64 bit global MMIO register
- *
+ * ocxl_global_mmio_set64() - Set bits in a 64 bit global MMIO register
* @afu: The AFU
* @offset: The Offset from the start of MMIO
* @endian: the endianness that the MMIO data is in
@@ -316,11 +298,10 @@ int ocxl_global_mmio_set32(struct ocxl_afu *afu, size_t offset,
* Returns 0 for success, negative on error
*/
int ocxl_global_mmio_set64(struct ocxl_afu *afu, size_t offset,
- enum ocxl_endian endian, u64 mask);
+ enum ocxl_endian endian, u64 mask);

/**
- * Set bits in a 32 bit global MMIO register
- *
+ * ocxl_global_mmio_clear32() - Clear bits in a 32 bit global MMIO register
* @afu: The AFU
* @offset: The Offset from the start of MMIO
* @endian: the endianness that the MMIO data is in
@@ -329,11 +310,10 @@ int ocxl_global_mmio_set64(struct ocxl_afu *afu, size_t offset,
* Returns 0 for success, negative on error
*/
int ocxl_global_mmio_clear32(struct ocxl_afu *afu, size_t offset,
- enum ocxl_endian endian, u32 mask);
+ enum ocxl_endian endian, u32 mask);

/**
- * Set bits in a 64 bit global MMIO register
- *
+ * ocxl_global_mmio_clear64() - Clear bits in a 64 bit global MMIO register
* @afu: The AFU
* @offset: The Offset from the start of MMIO
* @endian: the endianness that the MMIO data is in
@@ -342,7 +322,7 @@ int ocxl_global_mmio_clear32(struct ocxl_afu *afu, size_t offset,
* Returns 0 for success, negative on error
*/
int ocxl_global_mmio_clear64(struct ocxl_afu *afu, size_t offset,
- enum ocxl_endian endian, u64 mask);
+ enum ocxl_endian endian, u64 mask);

// Functions left here are for compatibility with the cxlflash driver

--
2.24.1
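
The comment shape these hunks converge on (function name with parentheses, a one-line summary, then @param lines and a return description) can be sketched as below. This is only an illustration: demo_mmio_clear32() is a hypothetical helper that works on a plain variable rather than on MMIO, and it is not part of the ocxl API.

```c
#include <stdint.h>

/**
 * demo_mmio_clear32() - Clear bits in a 32 bit register image
 * @reg: the register image to update (hypothetical stand-in for MMIO)
 * @mask: the bits to clear
 *
 * Returns 0 on success, negative on failure
 */
int demo_mmio_clear32(uint32_t *reg, uint32_t mask)
{
	if (!reg)
		return -1;

	/* Drop every bit that is set in the mask, keep the rest */
	*reg &= ~mask;
	return 0;
}
```

Keeping the summary on the same line as the function name is what lets the kernel-doc tooling associate the description with the symbol.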

2020-03-30 05:53:28

by Alastair D'Silva

Subject: [PATCH v4 25/25] MAINTAINERS: Add myself & nvdimm/ocxl to ocxl

The OpenCAPI Persistent Memory driver will be maintained as part of
the ppc tree.

I'm also adding myself as an author of the driver & contributor to
the generic ocxl driver.

Signed-off-by: Alastair D'Silva <[email protected]>
---
MAINTAINERS | 3 +++
1 file changed, 3 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index f8670989ec91..3fb9a9f576a7 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12064,13 +12064,16 @@ F: tools/objtool/
OCXL (Open Coherent Accelerator Processor Interface OpenCAPI) DRIVER
M: Frederic Barrat <[email protected]>
M: Andrew Donnellan <[email protected]>
+M: Alastair D'Silva <[email protected]>
L: [email protected]
S: Supported
F: arch/powerpc/platforms/powernv/ocxl.c
+F: drivers/nvdimm/ocxl/
F: arch/powerpc/include/asm/pnv-ocxl.h
F: drivers/misc/ocxl/
F: include/misc/ocxl*
F: include/uapi/misc/ocxl.h
+F: include/uapi/nvdimm/ocxlpmem.h
F: Documentation/userspace-api/accelerators/ocxl.rst

OMAP AUDIO SUPPORT
--
2.24.1

2020-03-30 05:53:32

by Alastair D'Silva

Subject: [PATCH v4 21/25] nvdimm/ocxl: Implement the heartbeat command

The heartbeat admin command is a simple admin command that exercises
the communication mechanisms within the controller.

This patch issues a heartbeat command to the card during init to ensure
we can communicate with the card's controller.

Signed-off-by: Alastair D'Silva <[email protected]>
Reviewed-by: Andrew Donnellan <[email protected]>
---
drivers/nvdimm/ocxl/main.c | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)

diff --git a/drivers/nvdimm/ocxl/main.c b/drivers/nvdimm/ocxl/main.c
index a4315472683c..2fbe3f2f77d9 100644
--- a/drivers/nvdimm/ocxl/main.c
+++ b/drivers/nvdimm/ocxl/main.c
@@ -272,6 +272,30 @@ static int setup_command_metadata(struct ocxlpmem *ocxlpmem)
return 0;
}

+/**
+ * heartbeat() - Issue a heartbeat command to the controller
+ * @ocxlpmem: the device metadata
+ * Return: 0 if the controller responded correctly, negative on error
+ */
+static int heartbeat(struct ocxlpmem *ocxlpmem)
+{
+ int rc;
+
+ mutex_lock(&ocxlpmem->admin_command.lock);
+
+ rc = admin_command_execute(ocxlpmem, ADMIN_COMMAND_HEARTBEAT);
+ if (rc < 0)
+ goto out;
+ if (rc != STATUS_SUCCESS)
+ warn_status(ocxlpmem, "Unexpected status from heartbeat", rc);
+
+ (void)admin_response_handled(ocxlpmem);
+
+out:
+ mutex_unlock(&ocxlpmem->admin_command.lock);
+ return rc;
+}
+
/**
* allocate_minor() - Allocate a minor number to use for an OpenCAPI pmem device
* @ocxlpmem: the device metadata
@@ -1460,6 +1484,12 @@ static int probe(struct pci_dev *pdev, const struct pci_device_id *ent)
goto err;
}

+ rc = heartbeat(ocxlpmem);
+ if (rc) {
+ dev_err(&pdev->dev, "Heartbeat failed\n");
+ goto err;
+ }
+
elapsed = 0;
timeout = ocxlpmem->readiness_timeout +
ocxlpmem->memory_available_timeout;
--
2.24.1
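
The locking and error path of heartbeat() above can be modelled in plain C as follows. This is a minimal sketch under stated assumptions, not the driver code: struct demo_dev, demo_execute() and demo_heartbeat() are hypothetical names, the mutex is replaced by a boolean flag, and the controller is replaced by a canned result.

```c
#include <stdbool.h>

#define DEMO_STATUS_SUCCESS 0x00

/* Hypothetical stand-in for struct ocxlpmem */
struct demo_dev {
	bool locked;      /* models admin_command.lock */
	int next_result;  /* what the fake controller will answer */
	int handled;      /* number of responses acknowledged */
};

/* Hypothetical stand-in for admin_command_execute() */
static int demo_execute(struct demo_dev *dev)
{
	return dev->next_result;
}

int demo_heartbeat(struct demo_dev *dev)
{
	int rc;

	dev->locked = true;

	rc = demo_execute(dev);
	if (rc < 0)
		goto out;  /* transport failure: no response to acknowledge */

	/* A response arrived, so acknowledge it whatever its status was */
	dev->handled++;

out:
	/* Single exit point: the lock is dropped on every path */
	dev->locked = false;
	return rc;
}
```

In this model, as in the patch, a positive non-success status is returned unchanged, so a caller that tests `if (rc)` treats it as a failure as well.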

2020-03-30 05:53:46

by Alastair D'Silva

Subject: [PATCH v4 20/25] nvdimm/ocxl: Add an IOCTL to request controller health & perf data

When health & performance data is requested from the controller,
it responds with an error log containing the requested information.

This patch allows the request to be issued via an IOCTL. The data
can later be collected in userspace via the error log IOCTL introduced
in a previous patch. Userspace will be notified of pending error
logs via an event.

Signed-off-by: Alastair D'Silva <[email protected]>
---
drivers/nvdimm/ocxl/main.c | 16 ++++++++++++++++
include/uapi/nvdimm/ocxlpmem.h | 1 +
2 files changed, 17 insertions(+)

diff --git a/drivers/nvdimm/ocxl/main.c b/drivers/nvdimm/ocxl/main.c
index cb6cdc9eb899..a4315472683c 100644
--- a/drivers/nvdimm/ocxl/main.c
+++ b/drivers/nvdimm/ocxl/main.c
@@ -991,6 +991,18 @@ static int ioctl_event_check(struct ocxlpmem *ocxlpmem, u64 __user *uarg)
return rc;
}

+/**
+ * req_controller_health_perf() - Request controller health & performance data
+ * @ocxlpmem: the device metadata
+ * Return: 0 on success, negative on failure
+ */
+int req_controller_health_perf(struct ocxlpmem *ocxlpmem)
+{
+ return ocxl_global_mmio_set64(ocxlpmem->ocxl_afu, GLOBAL_MMIO_HCI,
+ OCXL_LITTLE_ENDIAN,
+ GLOBAL_MMIO_HCI_REQ_HEALTH_PERF);
+}
+
static long file_ioctl(struct file *file, unsigned int cmd, unsigned long args)
{
struct ocxlpmem *ocxlpmem = file->private_data;
@@ -1028,6 +1040,10 @@ static long file_ioctl(struct file *file, unsigned int cmd, unsigned long args)
case IOCTL_OCXLPMEM_EVENT_CHECK:
rc = ioctl_event_check(ocxlpmem, (u64 __user *)args);
break;
+
+ case IOCTL_OCXLPMEM_REQUEST_HEALTH:
+ rc = req_controller_health_perf(ocxlpmem);
+ break;
}

return rc;
diff --git a/include/uapi/nvdimm/ocxlpmem.h b/include/uapi/nvdimm/ocxlpmem.h
index d573bd307e35..9c5c8585c1c2 100644
--- a/include/uapi/nvdimm/ocxlpmem.h
+++ b/include/uapi/nvdimm/ocxlpmem.h
@@ -91,5 +91,6 @@ struct ioctl_ocxlpmem_eventfd {
#define IOCTL_OCXLPMEM_CONTROLLER_STATS _IO(OCXLPMEM_MAGIC, 0x34)
#define IOCTL_OCXLPMEM_EVENTFD _IOW(OCXLPMEM_MAGIC, 0x35, struct ioctl_ocxlpmem_eventfd)
#define IOCTL_OCXLPMEM_EVENT_CHECK _IOR(OCXLPMEM_MAGIC, 0x36, __u64)
+#define IOCTL_OCXLPMEM_REQUEST_HEALTH _IO(OCXLPMEM_MAGIC, 0x37)

#endif /* _UAPI_OCXL_SCM_H */
--
2.24.1
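
The request in req_controller_health_perf() above amounts to setting one bit in the HCI register. A minimal sketch of that "set bits" operation follows, assuming it behaves as a bitwise OR on the register value. The register is modelled as an ordinary variable, and demo_mmio_set64() and demo_req_health_perf() are hypothetical names rather than the ocxl API.

```c
#include <stdint.h>

#define DEMO_BIT_ULL(n) (1ULL << (n))

/* Same bit position as GLOBAL_MMIO_HCI_REQ_HEALTH_PERF in the driver header */
#define DEMO_HCI_REQ_HEALTH_PERF DEMO_BIT_ULL(6)

/* Set bits in a 64 bit register image held in memory (not real MMIO) */
int demo_mmio_set64(uint64_t *reg, uint64_t mask)
{
	if (!reg)
		return -1;

	*reg |= mask;
	return 0;
}

/* Raise the health & performance request bit, leaving other bits alone */
int demo_req_health_perf(uint64_t *hci)
{
	return demo_mmio_set64(hci, DEMO_HCI_REQ_HEALTH_PERF);
}
```

Bits that were already set in the register image are preserved, which is why the operation is expressed as a mask rather than a full register write.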

2020-03-30 05:53:50

by Alastair D'Silva

Subject: [PATCH v4 24/25] nvdimm/ocxl: Expose the serial number & firmware version in sysfs

This patch exposes the serial number & firmware version in sysfs,
which will be used by ndctl in userspace to help users identify
the device.

Signed-off-by: Alastair D'Silva <[email protected]>
---
drivers/nvdimm/ocxl/main.c | 42 ++++++++++++++++++++++++++++++++++++--
1 file changed, 40 insertions(+), 2 deletions(-)

diff --git a/drivers/nvdimm/ocxl/main.c b/drivers/nvdimm/ocxl/main.c
index 92b4389e8cbb..1f422f0d51ef 100644
--- a/drivers/nvdimm/ocxl/main.c
+++ b/drivers/nvdimm/ocxl/main.c
@@ -235,6 +235,43 @@ static int reserve_metadata(struct ocxlpmem *ocxlpmem,
return 0;
}

+static ssize_t serial_show(struct device *device, struct device_attribute *attr,
+ char *buf)
+{
+ struct nvdimm *nvdimm = to_nvdimm(device);
+ struct ocxlpmem *ocxlpmem = nvdimm_provider_data(nvdimm);
+ const struct ocxl_fn_config *fn_config = ocxl_function_config(ocxlpmem->ocxl_fn);
+
+ return scnprintf(buf, PAGE_SIZE, "%llu\n", fn_config->serial);
+}
+static DEVICE_ATTR_RO(serial);
+
+static ssize_t fw_version_show(struct device *device,
+ struct device_attribute *attr, char *buf)
+{
+ struct nvdimm *nvdimm = to_nvdimm(device);
+ struct ocxlpmem *ocxlpmem = nvdimm_provider_data(nvdimm);
+
+ return scnprintf(buf, PAGE_SIZE, "%s\n", ocxlpmem->fw_version);
+}
+static DEVICE_ATTR_RO(fw_version);
+
+static struct attribute *ocxl_pmem_attrs[] = {
+ &dev_attr_serial.attr,
+ &dev_attr_fw_version.attr,
+ NULL,
+};
+
+static const struct attribute_group ocxl_pmem_attribute_group = {
+ .name = "ocxlpmem",
+ .attrs = ocxl_pmem_attrs,
+};
+
+static const struct attribute_group *ocxl_pmem_dimm_attribute_groups[] = {
+ &ocxl_pmem_attribute_group,
+ NULL,
+};
+
/**
* register_lpc_mem() - Discover persistent memory on a device and register it with the NVDIMM subsystem
* @ocxlpmem: the device metadata
@@ -291,8 +328,9 @@ static int register_lpc_mem(struct ocxlpmem *ocxlpmem)

snprintf(serial, sizeof(serial), "%llx", fn_config->serial);
nd_mapping_desc.nvdimm = nvdimm_create(ocxlpmem->nvdimm_bus, ocxlpmem,
- NULL, nvdimm_flags,
- nvdimm_cmd_mask, 0, NULL);
+ ocxl_pmem_dimm_attribute_groups,
+ nvdimm_flags, nvdimm_cmd_mask, 0,
+ NULL);
if (!nd_mapping_desc.nvdimm)
return -ENOMEM;

--
2.24.1

2020-03-30 05:54:02

by Alastair D'Silva

Subject: [PATCH v4 12/25] nvdimm/ocxl: Add register addresses & status values to the header

These values have been taken from the device specifications.

Signed-off-by: Alastair D'Silva <[email protected]>
---
drivers/nvdimm/ocxl/ocxlpmem.h | 73 ++++++++++++++++++++++++++++++++++
1 file changed, 73 insertions(+)

diff --git a/drivers/nvdimm/ocxl/ocxlpmem.h b/drivers/nvdimm/ocxl/ocxlpmem.h
index 03fe7a264281..322387873b4b 100644
--- a/drivers/nvdimm/ocxl/ocxlpmem.h
+++ b/drivers/nvdimm/ocxl/ocxlpmem.h
@@ -8,6 +8,79 @@

#define LABEL_AREA_SIZE BIT_ULL(PA_SECTION_SHIFT)

+#define GLOBAL_MMIO_CHI 0x000
+#define GLOBAL_MMIO_CHIC 0x008
+#define GLOBAL_MMIO_CHIE 0x010
+#define GLOBAL_MMIO_CHIEC 0x018
+#define GLOBAL_MMIO_HCI 0x020
+#define GLOBAL_MMIO_HCIC 0x028
+#define GLOBAL_MMIO_IMA0_OHP 0x040
+#define GLOBAL_MMIO_IMA0_CFP 0x048
+#define GLOBAL_MMIO_IMA1_OHP 0x050
+#define GLOBAL_MMIO_IMA1_CFP 0x058
+#define GLOBAL_MMIO_ACMA_CREQO 0x100
+#define GLOBAL_MMIO_ACMA_CRSPO 0x104
+#define GLOBAL_MMIO_ACMA_CDBO 0x108
+#define GLOBAL_MMIO_ACMA_CDBS 0x10c
+#define GLOBAL_MMIO_NSCMA_CREQO 0x120
+#define GLOBAL_MMIO_NSCMA_CRSPO 0x124
+#define GLOBAL_MMIO_NSCMA_CDBO 0x128
+#define GLOBAL_MMIO_NSCMA_CDBS 0x12c
+#define GLOBAL_MMIO_CSTS 0x140
+#define GLOBAL_MMIO_FWVER 0x148
+#define GLOBAL_MMIO_CCAP0 0x160
+#define GLOBAL_MMIO_CCAP1 0x168
+
+#define GLOBAL_MMIO_CHI_ACRA BIT_ULL(0)
+#define GLOBAL_MMIO_CHI_NSCRA BIT_ULL(1)
+#define GLOBAL_MMIO_CHI_CRDY BIT_ULL(4)
+#define GLOBAL_MMIO_CHI_CFFS BIT_ULL(5)
+#define GLOBAL_MMIO_CHI_MA BIT_ULL(6)
+#define GLOBAL_MMIO_CHI_ELA BIT_ULL(7)
+#define GLOBAL_MMIO_CHI_CDA BIT_ULL(8)
+#define GLOBAL_MMIO_CHI_CHFS BIT_ULL(9)
+
+#define GLOBAL_MMIO_CHI_ALL (GLOBAL_MMIO_CHI_ACRA | \
+ GLOBAL_MMIO_CHI_NSCRA | \
+ GLOBAL_MMIO_CHI_CRDY | \
+ GLOBAL_MMIO_CHI_CFFS | \
+ GLOBAL_MMIO_CHI_MA | \
+ GLOBAL_MMIO_CHI_ELA | \
+ GLOBAL_MMIO_CHI_CDA | \
+ GLOBAL_MMIO_CHI_CHFS)
+
+#define GLOBAL_MMIO_HCI_ACRW BIT_ULL(0) // ACRW
+#define GLOBAL_MMIO_HCI_NSCRW BIT_ULL(1) // NSCRW
+#define GLOBAL_MMIO_HCI_AFU_RESET BIT_ULL(2) // AR
+#define GLOBAL_MMIO_HCI_FW_DEBUG BIT_ULL(3) // FDE
+#define GLOBAL_MMIO_HCI_CONTROLLER_DUMP BIT_ULL(4) // CD
+#define GLOBAL_MMIO_HCI_CONTROLLER_DUMP_COLLECTED BIT_ULL(5) // CDC
+#define GLOBAL_MMIO_HCI_REQ_HEALTH_PERF BIT_ULL(6) // CHPD
+
+#define ADMIN_COMMAND_HEARTBEAT 0x00u
+#define ADMIN_COMMAND_SHUTDOWN 0x01u
+#define ADMIN_COMMAND_FW_UPDATE 0x02u
+#define ADMIN_COMMAND_FW_DEBUG 0x03u
+#define ADMIN_COMMAND_ERRLOG 0x04u
+#define ADMIN_COMMAND_SMART 0x05u
+#define ADMIN_COMMAND_CONTROLLER_STATS 0x06u
+#define ADMIN_COMMAND_CONTROLLER_DUMP 0x07u
+#define ADMIN_COMMAND_CMD_CAPS 0x08u
+#define ADMIN_COMMAND_MAX 0x08u
+
+#define STATUS_SUCCESS 0x00
+#define STATUS_MEM_UNAVAILABLE 0x20
+#define STATUS_BLOCKED_BG_TASK 0x21
+#define STATUS_BAD_OPCODE 0x50
+#define STATUS_BAD_REQUEST_PARM 0x51
+#define STATUS_BAD_DATA_PARM 0x52
+#define STATUS_DEBUG_BLOCKED 0x70
+#define STATUS_FAIL 0xFF
+
+#define STATUS_FW_UPDATE_BLOCKED STATUS_BLOCKED_BG_TASK
+#define STATUS_FW_ARG_INVALID STATUS_BAD_REQUEST_PARM
+#define STATUS_FW_INVALID STATUS_BAD_DATA_PARM
+
struct ocxlpmem {
struct device dev;
struct pci_dev *pdev;
--
2.24.1

2020-03-30 05:54:30

by Alastair D'Silva

Subject: [PATCH v4 15/25] nvdimm/ocxl: Register a character device for userspace to interact with

This patch introduces a character device (/dev/ocxlpmemX), which later
patches will use to expose error logs, controller stats and card debug
functionality to userspace.

Signed-off-by: Alastair D'Silva <[email protected]>
---
drivers/nvdimm/ocxl/main.c | 117 ++++++++++++++++++++++++++++++++-
drivers/nvdimm/ocxl/ocxlpmem.h | 2 +
2 files changed, 117 insertions(+), 2 deletions(-)

diff --git a/drivers/nvdimm/ocxl/main.c b/drivers/nvdimm/ocxl/main.c
index 8db573036423..9b85fcd3f1c9 100644
--- a/drivers/nvdimm/ocxl/main.c
+++ b/drivers/nvdimm/ocxl/main.c
@@ -10,6 +10,7 @@
#include <misc/ocxl.h>
#include <linux/delay.h>
#include <linux/ndctl.h>
+#include <linux/fs.h>
#include <linux/mm_types.h>
#include <linux/memory_hotplug.h>
#include "ocxlpmem.h"
@@ -356,6 +357,67 @@ static int ocxlpmem_register(struct ocxlpmem *ocxlpmem)
return device_register(&ocxlpmem->dev);
}

+static void ocxlpmem_put(struct ocxlpmem *ocxlpmem)
+{
+ put_device(&ocxlpmem->dev);
+}
+
+static struct ocxlpmem *ocxlpmem_get(struct ocxlpmem *ocxlpmem)
+{
+ return (!get_device(&ocxlpmem->dev)) ? NULL : ocxlpmem;
+}
+
+static struct ocxlpmem *find_and_get_ocxlpmem(dev_t devno)
+{
+ struct ocxlpmem *ocxlpmem;
+ int minor = MINOR(devno);
+
+ mutex_lock(&minors_idr_lock);
+ ocxlpmem = idr_find(&minors_idr, minor);
+ if (ocxlpmem)
+ ocxlpmem_get(ocxlpmem);
+ mutex_unlock(&minors_idr_lock);
+
+ return ocxlpmem;
+}
+
+static int file_open(struct inode *inode, struct file *file)
+{
+ struct ocxlpmem *ocxlpmem;
+
+ ocxlpmem = find_and_get_ocxlpmem(inode->i_rdev);
+ if (!ocxlpmem)
+ return -ENODEV;
+
+ file->private_data = ocxlpmem;
+ return 0;
+}
+
+static int file_release(struct inode *inode, struct file *file)
+{
+ struct ocxlpmem *ocxlpmem = file->private_data;
+
+ ocxlpmem_put(ocxlpmem);
+ return 0;
+}
+
+static const struct file_operations fops = {
+ .owner = THIS_MODULE,
+ .open = file_open,
+ .release = file_release,
+};
+
+/**
+ * create_cdev() - Create the chardev in /dev for the device
+ * @ocxlpmem: the device metadata
+ * Return: 0 on success, negative on failure
+ */
+static int create_cdev(struct ocxlpmem *ocxlpmem)
+{
+ cdev_init(&ocxlpmem->cdev, &fops);
+ return cdev_add(&ocxlpmem->cdev, ocxlpmem->dev.devt, 1);
+}
+
/**
* ocxlpmem_remove() - Free an OpenCAPI persistent memory device
* @pdev: the PCI device information struct
@@ -376,6 +438,13 @@ static void remove(struct pci_dev *pdev)
if (ocxlpmem->nvdimm_bus)
nvdimm_bus_unregister(ocxlpmem->nvdimm_bus);

+ /*
+ * Remove the cdev early to prevent a race against userspace
+ * via the char dev
+ */
+ if (ocxlpmem->cdev.owner)
+ cdev_del(&ocxlpmem->cdev);
+
device_unregister(&ocxlpmem->dev);
}
}
@@ -527,11 +596,18 @@ static int probe(struct pci_dev *pdev, const struct pci_device_id *ent)
goto err;
}

- if (setup_command_metadata(ocxlpmem)) {
+ rc = setup_command_metadata(ocxlpmem);
+ if (rc) {
dev_err(&pdev->dev, "Could not read command metadata\n");
goto err;
}

+ rc = create_cdev(ocxlpmem);
+ if (rc) {
+ dev_err(&pdev->dev, "Could not create character device\n");
+ goto err;
+ }
+
elapsed = 0;
timeout = ocxlpmem->readiness_timeout +
ocxlpmem->memory_available_timeout;
@@ -599,6 +675,36 @@ static struct pci_driver pci_driver = {
.shutdown = remove,
};

+static int file_init(void)
+{
+ int rc;
+
+ rc = alloc_chrdev_region(&ocxlpmem_dev, 0, NUM_MINORS, "ocxlpmem");
+ if (rc) {
+ idr_destroy(&minors_idr);
+ pr_err("Unable to allocate OpenCAPI persistent memory major number: %d\n",
+ rc);
+ return rc;
+ }
+
+ ocxlpmem_class = class_create(THIS_MODULE, "ocxlpmem");
+ if (IS_ERR(ocxlpmem_class)) {
+ idr_destroy(&minors_idr);
+ pr_err("Unable to create ocxlpmem class\n");
+ unregister_chrdev_region(ocxlpmem_dev, NUM_MINORS);
+ return PTR_ERR(ocxlpmem_class);
+ }
+
+ return 0;
+}
+
+static void file_exit(void)
+{
+ class_destroy(ocxlpmem_class);
+ unregister_chrdev_region(ocxlpmem_dev, NUM_MINORS);
+ idr_destroy(&minors_idr);
+}
+
static int __init ocxlpmem_init(void)
{
int rc;
@@ -606,16 +712,23 @@ static int __init ocxlpmem_init(void)
mutex_init(&minors_idr_lock);
idr_init(&minors_idr);

- rc = pci_register_driver(&pci_driver);
+ rc = file_init();
if (rc)
return rc;

+ rc = pci_register_driver(&pci_driver);
+ if (rc) {
+ file_exit();
+ return rc;
+ }
+
return 0;
}

static void ocxlpmem_exit(void)
{
pci_unregister_driver(&pci_driver);
+ file_exit();
}

module_init(ocxlpmem_init);
diff --git a/drivers/nvdimm/ocxl/ocxlpmem.h b/drivers/nvdimm/ocxl/ocxlpmem.h
index b72b3f909fc3..ee3bd651f254 100644
--- a/drivers/nvdimm/ocxl/ocxlpmem.h
+++ b/drivers/nvdimm/ocxl/ocxlpmem.h
@@ -2,6 +2,7 @@
// Copyright 2020 IBM Corp.

#include <linux/pci.h>
+#include <linux/cdev.h>
#include <misc/ocxl.h>
#include <linux/libnvdimm.h>
#include <linux/mm.h>
@@ -103,6 +104,7 @@ struct command_metadata {
struct ocxlpmem {
struct device dev;
struct pci_dev *pdev;
+ struct cdev cdev;
struct ocxl_fn *ocxl_fn;
struct nd_interleave_set nd_set;
struct nvdimm_bus_descriptor bus_desc;
--
2.24.1

2020-03-30 05:55:20

by Alastair D'Silva

Subject: [PATCH v4 06/25] ocxl: Tally up the LPC memory on a link & allow it to be mapped

OpenCAPI LPC memory is allocated per link, but each link supports
multiple AFUs, and each AFU can have LPC memory assigned to it.

This patch tallies the memory for all AFUs on a link, allowing it
to be mapped in a single operation after the AFUs have been
enumerated.

Signed-off-by: Alastair D'Silva <[email protected]>
---
drivers/misc/ocxl/core.c | 10 ++++++
drivers/misc/ocxl/link.c | 60 +++++++++++++++++++++++++++++++
drivers/misc/ocxl/ocxl_internal.h | 33 +++++++++++++++++
3 files changed, 103 insertions(+)

diff --git a/drivers/misc/ocxl/core.c b/drivers/misc/ocxl/core.c
index b7a09b21ab36..2531c6cf19a0 100644
--- a/drivers/misc/ocxl/core.c
+++ b/drivers/misc/ocxl/core.c
@@ -230,8 +230,18 @@ static int configure_afu(struct ocxl_afu *afu, u8 afu_idx, struct pci_dev *dev)
if (rc)
goto err_free_pasid;

+ if (afu->config.lpc_mem_size || afu->config.special_purpose_mem_size) {
+ rc = ocxl_link_add_lpc_mem(afu->fn->link, afu->config.lpc_mem_offset,
+ afu->config.lpc_mem_size +
+ afu->config.special_purpose_mem_size);
+ if (rc)
+ goto err_free_mmio;
+ }
+
return 0;

+err_free_mmio:
+ unmap_mmio_areas(afu);
err_free_pasid:
reclaim_afu_pasid(afu);
err_free_actag:
diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
index 58d111afd9f6..af119d3ef79a 100644
--- a/drivers/misc/ocxl/link.c
+++ b/drivers/misc/ocxl/link.c
@@ -84,6 +84,11 @@ struct ocxl_link {
int dev;
atomic_t irq_available;
struct spa *spa;
+ struct mutex lpc_mem_lock; /* protects lpc_mem & lpc_mem_sz */
+ u64 lpc_mem_sz; /* Total amount of LPC memory presented on the link */
+ u64 lpc_mem;
+ int lpc_consumers;
+
void *platform_data;
};
static struct list_head links_list = LIST_HEAD_INIT(links_list);
@@ -396,6 +401,8 @@ static int alloc_link(struct pci_dev *dev, int PE_mask, struct ocxl_link **out_l
if (rc)
goto err_spa;

+ mutex_init(&link->lpc_mem_lock);
+
/* platform specific hook */
rc = pnv_ocxl_spa_setup(dev, link->spa->spa_mem, PE_mask,
&link->platform_data);
@@ -711,3 +718,56 @@ void ocxl_link_free_irq(void *link_handle, int hw_irq)
atomic_inc(&link->irq_available);
}
EXPORT_SYMBOL_GPL(ocxl_link_free_irq);
+
+int ocxl_link_add_lpc_mem(void *link_handle, u64 offset, u64 size)
+{
+ struct ocxl_link *link = (struct ocxl_link *)link_handle;
+
+ // Check for overflow
+ if (offset > (offset + size))
+ return -EINVAL;
+
+ mutex_lock(&link->lpc_mem_lock);
+ link->lpc_mem_sz = max(link->lpc_mem_sz, offset + size);
+
+ mutex_unlock(&link->lpc_mem_lock);
+
+ return 0;
+}
+
+u64 ocxl_link_lpc_map(void *link_handle, struct pci_dev *pdev)
+{
+ struct ocxl_link *link = (struct ocxl_link *)link_handle;
+
+ mutex_lock(&link->lpc_mem_lock);
+
+ if (!link->lpc_mem)
+ link->lpc_mem = pnv_ocxl_platform_lpc_setup(pdev, link->lpc_mem_sz);
+
+ if (link->lpc_mem)
+ link->lpc_consumers++;
+ mutex_unlock(&link->lpc_mem_lock);
+
+ return link->lpc_mem;
+}
+
+void ocxl_link_lpc_release(void *link_handle, struct pci_dev *pdev)
+{
+ struct ocxl_link *link = (struct ocxl_link *)link_handle;
+
+ mutex_lock(&link->lpc_mem_lock);
+
+ if (!link->lpc_mem) {
+ mutex_unlock(&link->lpc_mem_lock);
+ return;
+ }
+
+ WARN_ON(--link->lpc_consumers < 0);
+
+ if (link->lpc_consumers == 0) {
+ pnv_ocxl_platform_lpc_release(pdev);
+ link->lpc_mem = 0;
+ }
+
+ mutex_unlock(&link->lpc_mem_lock);
+}
diff --git a/drivers/misc/ocxl/ocxl_internal.h b/drivers/misc/ocxl/ocxl_internal.h
index 198e4e4bc51d..2d7575225bd7 100644
--- a/drivers/misc/ocxl/ocxl_internal.h
+++ b/drivers/misc/ocxl/ocxl_internal.h
@@ -142,4 +142,37 @@ int ocxl_irq_offset_to_id(struct ocxl_context *ctx, u64 offset);
u64 ocxl_irq_id_to_offset(struct ocxl_context *ctx, int irq_id);
void ocxl_afu_irq_free_all(struct ocxl_context *ctx);

+/**
+ * ocxl_link_add_lpc_mem() - Account for an AFU's LPC memory on an OpenCAPI link
+ *
+ * @link_handle: The OpenCAPI link handle
+ * @offset: The offset of the AFU's memory within the link's LPC memory
+ * @size: The size of the AFU's memory, in bytes
+ *
+ * Returns 0 on success, -EINVAL on overflow
+ */
+int ocxl_link_add_lpc_mem(void *link_handle, u64 offset, u64 size);
+
+/**
+ * ocxl_link_lpc_map() - Map the LPC memory for an OpenCAPI device
+ * Since LPC memory belongs to a link, the whole LPC memory available
+ * on the link must be mapped in order to make it accessible to a device.
+ * @link_handle: The OpenCAPI link handle
+ * @pdev: A device that is on the link
+ *
+ * Returns the address of the mapped LPC memory, or 0 on error
+ */
+u64 ocxl_link_lpc_map(void *link_handle, struct pci_dev *pdev);
+
+/**
+ * ocxl_link_lpc_release() - Release the LPC memory device for an OpenCAPI device
+ *
+ * Offlines LPC memory on an OpenCAPI link for a device. If this is the
+ * last device on the link to release the memory, unmap it from the link.
+ *
+ * @link_handle: The OpenCAPI link handle
+ * @pdev: A device that is on the link
+ */
+void ocxl_link_lpc_release(void *link_handle, struct pci_dev *pdev);
+
#endif /* _OCXL_INTERNAL_H_ */
--
2.24.1

2020-03-31 08:59:55

by Alastair D'Silva

Subject: [PATCH v4 19/25] nvdimm/ocxl: Forward events to userspace

Some of the interrupts that the card generates are better handled
by the userspace daemon, in particular:
- Controller Hardware/Firmware Fatal
- Controller Dump Available
- Error Log Available

This patch allows a userspace application to register an eventfd with
the driver via IOCTL_OCXLPMEM_EVENTFD to receive notifications of these
interrupts.
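
An eventfd carries only a counter, not the identity of the interrupt that
fired, and several signals may be collapsed into one wakeup. The sketch
below illustrates that behaviour on a Linux host. It is not part of the
patch: drain_after_signals() is a hypothetical helper, and the write()
calls merely stand in for the signals the driver would raise from its
interrupt handlers.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/eventfd.h>
#include <sys/types.h>

/*
 * Hypothetical helper: post n notifications to a fresh eventfd and return
 * the counter value that a single read() drains. Returns -1 on failure,
 * which includes the "nothing pending" case on a non-blocking eventfd.
 */
static int64_t drain_after_signals(unsigned int n)
{
	uint64_t one = 1, count = 0;
	unsigned int i;
	int efd = eventfd(0, EFD_NONBLOCK);

	if (efd < 0)
		return -1;

	/* Stand-in for the driver signalling the eventfd n times */
	for (i = 0; i < n; i++) {
		if (write(efd, &one, sizeof(one)) != (ssize_t)sizeof(one)) {
			close(efd);
			return -1;
		}
	}

	/* One read returns the accumulated count and resets it to zero */
	if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count)) {
		close(efd);
		return -1;
	}

	close(efd);
	assert(count == n);
	return (int64_t)count;
}
```

Because three signals arrive as a single read of 3, a daemon should treat
a wakeup as "something happened" and then query the driver for details
rather than counting wakeups.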

Userspace can then identify which events have occurred by calling
IOCTL_OCXLPMEM_EVENT_CHECK and testing the result against the
IOCTL_OCXLPMEM_EVENT_* masks.
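
As a rough illustration of the check step, the sketch below decodes an
event mask into readable names. The bit values are copied from the uapi
header added by this patch; describe_events() is a hypothetical helper
that exists only for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Bit values copied from include/uapi/nvdimm/ocxlpmem.h in this patch */
#define IOCTL_OCXLPMEM_EVENT_CONTROLLER_DUMP_AVAILABLE	(1ULL << 0)
#define IOCTL_OCXLPMEM_EVENT_ERROR_LOG_AVAILABLE	(1ULL << 1)
#define IOCTL_OCXLPMEM_EVENT_HARDWARE_FATAL		(1ULL << 2)
#define IOCTL_OCXLPMEM_EVENT_FIRMWARE_FATAL		(1ULL << 3)

/*
 * Hypothetical helper: write a comma separated list of the events set in
 * mask into buf and return how many events were written.
 */
static int describe_events(uint64_t mask, char *buf, size_t len)
{
	static const struct {
		uint64_t bit;
		const char *name;
	} events[] = {
		{ IOCTL_OCXLPMEM_EVENT_CONTROLLER_DUMP_AVAILABLE,
		  "controller dump available" },
		{ IOCTL_OCXLPMEM_EVENT_ERROR_LOG_AVAILABLE,
		  "error log available" },
		{ IOCTL_OCXLPMEM_EVENT_HARDWARE_FATAL, "hardware fatal" },
		{ IOCTL_OCXLPMEM_EVENT_FIRMWARE_FATAL, "firmware fatal" },
	};
	size_t i, used = 0;
	int found = 0;

	assert(buf && len > 0);
	buf[0] = '\0';

	for (i = 0; i < sizeof(events) / sizeof(events[0]); i++) {
		int n;

		if (!(mask & events[i].bit))
			continue;

		n = snprintf(buf + used, len - used, "%s%s",
			     found ? ", " : "", events[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break;	/* out of space, stop appending */

		used += (size_t)n;
		found++;
	}

	return found;
}
```

In a real daemon the mask would come from the driver, presumably via
ioctl(fd, IOCTL_OCXLPMEM_EVENT_CHECK, &mask) on the character device.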

Signed-off-by: Alastair D'Silva <[email protected]>
---
drivers/nvdimm/ocxl/main.c | 220 +++++++++++++++++++++++++++++++++
drivers/nvdimm/ocxl/ocxlpmem.h | 4 +
include/uapi/nvdimm/ocxlpmem.h | 12 ++
3 files changed, 236 insertions(+)

diff --git a/drivers/nvdimm/ocxl/main.c b/drivers/nvdimm/ocxl/main.c
index 0040fc09cceb..cb6cdc9eb899 100644
--- a/drivers/nvdimm/ocxl/main.c
+++ b/drivers/nvdimm/ocxl/main.c
@@ -10,6 +10,7 @@
#include <misc/ocxl.h>
#include <linux/delay.h>
#include <linux/ndctl.h>
+#include <linux/eventfd.h>
#include <linux/fs.h>
#include <linux/mm_types.h>
#include <linux/memory_hotplug.h>
@@ -301,8 +302,19 @@ static void free_ocxlpmem(struct ocxlpmem *ocxlpmem)
{
int rc;

+ // Disable doorbells
+ (void)ocxl_global_mmio_set64(ocxlpmem->ocxl_afu, GLOBAL_MMIO_CHIEC,
+ OCXL_LITTLE_ENDIAN,
+ GLOBAL_MMIO_CHI_ALL);
+
free_minor(ocxlpmem);

+ if (ocxlpmem->irq_addr[1])
+ iounmap(ocxlpmem->irq_addr[1]);
+
+ if (ocxlpmem->irq_addr[0])
+ iounmap(ocxlpmem->irq_addr[0]);
+
if (ocxlpmem->ocxl_context) {
rc = ocxl_context_detach(ocxlpmem->ocxl_context);
if (rc == -EBUSY)
@@ -398,6 +410,11 @@ static int file_release(struct inode *inode, struct file *file)
{
struct ocxlpmem *ocxlpmem = file->private_data;

+ if (ocxlpmem->ev_ctx) {
+ eventfd_ctx_put(ocxlpmem->ev_ctx);
+ ocxlpmem->ev_ctx = NULL;
+ }
+
ocxlpmem_put(ocxlpmem);
return 0;
}
@@ -928,6 +945,52 @@ static int ioctl_controller_stats(struct ocxlpmem *ocxlpmem,
return rc;
}

+static int ioctl_eventfd(struct ocxlpmem *ocxlpmem,
+ struct ioctl_ocxlpmem_eventfd __user *uarg)
+{
+ struct ioctl_ocxlpmem_eventfd args;
+
+ if (copy_from_user(&args, uarg, sizeof(args)))
+ return -EFAULT;
+
+ if (ocxlpmem->ev_ctx)
+ return -EBUSY;
+
+ ocxlpmem->ev_ctx = eventfd_ctx_fdget(args.eventfd);
+ if (IS_ERR(ocxlpmem->ev_ctx))
+ return PTR_ERR(ocxlpmem->ev_ctx);
+
+ return 0;
+}
+
+static int ioctl_event_check(struct ocxlpmem *ocxlpmem, u64 __user *uarg)
+{
+ u64 val = 0;
+ int rc;
+ u64 chi = 0;
+
+ rc = ocxlpmem_chi(ocxlpmem, &chi);
+ if (rc < 0)
+ return rc;
+
+ if (chi & GLOBAL_MMIO_CHI_ELA)
+ val |= IOCTL_OCXLPMEM_EVENT_ERROR_LOG_AVAILABLE;
+
+ if (chi & GLOBAL_MMIO_CHI_CDA)
+ val |= IOCTL_OCXLPMEM_EVENT_CONTROLLER_DUMP_AVAILABLE;
+
+ if (chi & GLOBAL_MMIO_CHI_CFFS)
+ val |= IOCTL_OCXLPMEM_EVENT_FIRMWARE_FATAL;
+
+ if (chi & GLOBAL_MMIO_CHI_CHFS)
+ val |= IOCTL_OCXLPMEM_EVENT_HARDWARE_FATAL;
+
+ if (copy_to_user((u64 __user *)uarg, &val, sizeof(val)))
+ return -EFAULT;
+
+ return rc;
+}
+
static long file_ioctl(struct file *file, unsigned int cmd, unsigned long args)
{
struct ocxlpmem *ocxlpmem = file->private_data;
@@ -956,6 +1019,15 @@ static long file_ioctl(struct file *file, unsigned int cmd, unsigned long args)
rc = ioctl_controller_stats(ocxlpmem,
(struct ioctl_ocxlpmem_controller_stats __user *)args);
break;
+
+ case IOCTL_OCXLPMEM_EVENTFD:
+ rc = ioctl_eventfd(ocxlpmem,
+ (struct ioctl_ocxlpmem_eventfd __user *)args);
+ break;
+
+ case IOCTL_OCXLPMEM_EVENT_CHECK:
+ rc = ioctl_event_check(ocxlpmem, (u64 __user *)args);
+ break;
}

return rc;
@@ -1109,6 +1181,148 @@ static void dump_error_log(struct ocxlpmem *ocxlpmem)
kfree(buf);
}

+static irqreturn_t imn0_handler(void *private)
+{
+ struct ocxlpmem *ocxlpmem = private;
+ u64 chi = 0;
+
+ (void)ocxlpmem_chi(ocxlpmem, &chi);
+
+ if (chi & GLOBAL_MMIO_CHI_ELA) {
+ dev_warn(&ocxlpmem->dev, "Error log is available\n");
+
+ if (ocxlpmem->ev_ctx)
+ eventfd_signal(ocxlpmem->ev_ctx, 1);
+ }
+
+ if (chi & GLOBAL_MMIO_CHI_CDA) {
+ dev_warn(&ocxlpmem->dev, "Controller dump is available\n");
+
+ if (ocxlpmem->ev_ctx)
+ eventfd_signal(ocxlpmem->ev_ctx, 1);
+ }
+
+ return IRQ_HANDLED;
+}
+
+static irqreturn_t imn1_handler(void *private)
+{
+ struct ocxlpmem *ocxlpmem = private;
+ u64 chi = 0;
+
+ (void)ocxlpmem_chi(ocxlpmem, &chi);
+
+ if (chi & (GLOBAL_MMIO_CHI_CFFS | GLOBAL_MMIO_CHI_CHFS)) {
+ dev_err(&ocxlpmem->dev,
+ "Controller status is fatal, chi=0x%llx, going offline\n",
+ chi);
+
+ if (ocxlpmem->nvdimm_bus) {
+ nvdimm_bus_unregister(ocxlpmem->nvdimm_bus);
+ ocxlpmem->nvdimm_bus = NULL;
+ }
+
+ if (ocxlpmem->ev_ctx)
+ eventfd_signal(ocxlpmem->ev_ctx, 1);
+ }
+
+ return IRQ_HANDLED;
+}
+
+/**
+ * setup_irqs() - Set up the IRQs for the OpenCAPI Persistent Memory device
+ * @ocxlpmem: the device metadata
+ * Return: 0 on success, negative on failure
+ */
+static int setup_irqs(struct ocxlpmem *ocxlpmem)
+{
+ int rc;
+ u64 irq_addr;
+
+ rc = ocxl_afu_irq_alloc(ocxlpmem->ocxl_context,
+ &ocxlpmem->irq_id[0]);
+ if (rc)
+ return rc;
+
+ rc = ocxl_irq_set_handler(ocxlpmem->ocxl_context, ocxlpmem->irq_id[0],
+ imn0_handler, NULL, ocxlpmem);
+
+ irq_addr = ocxl_afu_irq_get_addr(ocxlpmem->ocxl_context,
+ ocxlpmem->irq_id[0]);
+ if (!irq_addr)
+ return -EFAULT;
+
+ ocxlpmem->irq_addr[0] = ioremap(irq_addr, PAGE_SIZE);
+ if (!ocxlpmem->irq_addr[0])
+ return -ENODEV;
+
+ rc = ocxl_global_mmio_write64(ocxlpmem->ocxl_afu, GLOBAL_MMIO_IMA0_OHP,
+ OCXL_LITTLE_ENDIAN,
+ (u64)ocxlpmem->irq_addr[0]);
+ if (rc)
+ goto out_irq0;
+
+ rc = ocxl_global_mmio_write64(ocxlpmem->ocxl_afu, GLOBAL_MMIO_IMA0_CFP,
+ OCXL_LITTLE_ENDIAN, 0);
+ if (rc)
+ goto out_irq0;
+
+ rc = ocxl_afu_irq_alloc(ocxlpmem->ocxl_context, &ocxlpmem->irq_id[1]);
+ if (rc)
+ goto out_irq0;
+
+ rc = ocxl_irq_set_handler(ocxlpmem->ocxl_context, ocxlpmem->irq_id[1],
+ imn1_handler, NULL, ocxlpmem);
+ if (rc)
+ goto out_irq0;
+
+ irq_addr = ocxl_afu_irq_get_addr(ocxlpmem->ocxl_context,
+ ocxlpmem->irq_id[1]);
+ if (!irq_addr) {
+ rc = -EFAULT;
+ goto out_irq0;
+ }
+
+ ocxlpmem->irq_addr[1] = ioremap(irq_addr, PAGE_SIZE);
+ if (!ocxlpmem->irq_addr[1]) {
+ rc = -ENODEV;
+ goto out_irq0;
+ }
+
+ rc = ocxl_global_mmio_write64(ocxlpmem->ocxl_afu, GLOBAL_MMIO_IMA1_OHP,
+ OCXL_LITTLE_ENDIAN,
+ (u64)ocxlpmem->irq_addr[1]);
+ if (rc)
+ goto out_irq1;
+
+ rc = ocxl_global_mmio_write64(ocxlpmem->ocxl_afu, GLOBAL_MMIO_IMA1_CFP,
+ OCXL_LITTLE_ENDIAN, 0);
+ if (rc)
+ goto out_irq1;
+
+ // Enable doorbells
+ rc = ocxl_global_mmio_set64(ocxlpmem->ocxl_afu, GLOBAL_MMIO_CHIE,
+ OCXL_LITTLE_ENDIAN,
+ GLOBAL_MMIO_CHI_ELA |
+ GLOBAL_MMIO_CHI_CDA |
+ GLOBAL_MMIO_CHI_CFFS |
+ GLOBAL_MMIO_CHI_CHFS);
+ if (rc)
+ goto out_irq1;
+
+ return 0;
+
+out_irq1:
+ iounmap(ocxlpmem->irq_addr[1]);
+ ocxlpmem->irq_addr[1] = NULL;
+
+out_irq0:
+ iounmap(ocxlpmem->irq_addr[0]);
+ ocxlpmem->irq_addr[0] = NULL;
+
+ return rc;
+}
+
/**
* probe_function0() - Set up function 0 for an OpenCAPI persistent memory device
* This is important as it enables templates higher than 0 across all other
@@ -1212,6 +1426,12 @@ static int probe(struct pci_dev *pdev, const struct pci_device_id *ent)
goto err;
}

+ rc = setup_irqs(ocxlpmem);
+ if (rc) {
+ dev_err(&pdev->dev, "Could not set up OCXL IRQs\n");
+ goto err;
+ }
+
rc = setup_command_metadata(ocxlpmem);
if (rc) {
dev_err(&pdev->dev, "Could not read command metadata\n");
diff --git a/drivers/nvdimm/ocxl/ocxlpmem.h b/drivers/nvdimm/ocxl/ocxlpmem.h
index ee3bd651f254..01721596f982 100644
--- a/drivers/nvdimm/ocxl/ocxlpmem.h
+++ b/drivers/nvdimm/ocxl/ocxlpmem.h
@@ -106,6 +106,9 @@ struct ocxlpmem {
struct pci_dev *pdev;
struct cdev cdev;
struct ocxl_fn *ocxl_fn;
+#define SCM_IRQ_COUNT 2
+ int irq_id[SCM_IRQ_COUNT];
+ void __iomem *irq_addr[SCM_IRQ_COUNT];
struct nd_interleave_set nd_set;
struct nvdimm_bus_descriptor bus_desc;
struct nvdimm_bus *nvdimm_bus;
@@ -117,6 +120,7 @@ struct ocxlpmem {
struct nd_region *nd_region;
char fw_version[8 + 1];
u32 timeouts[ADMIN_COMMAND_MAX + 1];
+ struct eventfd_ctx *ev_ctx;
u32 max_controller_dump_size;
u16 scm_revision; // major/minor
u8 readiness_timeout; /* The worst case time (in seconds) that the host
diff --git a/include/uapi/nvdimm/ocxlpmem.h b/include/uapi/nvdimm/ocxlpmem.h
index ca3a7098fa9d..d573bd307e35 100644
--- a/include/uapi/nvdimm/ocxlpmem.h
+++ b/include/uapi/nvdimm/ocxlpmem.h
@@ -71,6 +71,16 @@ struct ioctl_ocxlpmem_controller_stats {
__u64 fast_write_count;
};

+struct ioctl_ocxlpmem_eventfd {
+ __s32 eventfd;
+ __u32 reserved;
+};
+
+#define IOCTL_OCXLPMEM_EVENT_CONTROLLER_DUMP_AVAILABLE (1ULL << (0))
+#define IOCTL_OCXLPMEM_EVENT_ERROR_LOG_AVAILABLE (1ULL << (1))
+#define IOCTL_OCXLPMEM_EVENT_HARDWARE_FATAL (1ULL << (2))
+#define IOCTL_OCXLPMEM_EVENT_FIRMWARE_FATAL (1ULL << (3))
+
/* ioctl numbers */
#define OCXLPMEM_MAGIC 0xCA
/* OpenCAPI Persistent memory devices */
@@ -79,5 +89,7 @@ struct ioctl_ocxlpmem_controller_stats {
#define IOCTL_OCXLPMEM_CONTROLLER_DUMP_DATA _IOWR(OCXLPMEM_MAGIC, 0x32, struct ioctl_ocxlpmem_controller_dump_data)
#define IOCTL_OCXLPMEM_CONTROLLER_DUMP_COMPLETE _IO(OCXLPMEM_MAGIC, 0x33)
#define IOCTL_OCXLPMEM_CONTROLLER_STATS _IO(OCXLPMEM_MAGIC, 0x34)
+#define IOCTL_OCXLPMEM_EVENTFD _IOW(OCXLPMEM_MAGIC, 0x35, struct ioctl_ocxlpmem_eventfd)
+#define IOCTL_OCXLPMEM_EVENT_CHECK _IOR(OCXLPMEM_MAGIC, 0x36, __u64)

#endif /* _UAPI_OCXL_SCM_H */
--
2.24.1

2020-03-31 08:59:57

by Alastair D'Silva

Subject: [PATCH v4 23/25] nvdimm/ocxl: Expose SMART data via ndctl

This patch retrieves SMART data from the card in its proprietary format
and makes it available via ndctl (ND_CMD_CALL). A later contribution to
ndctl will add support for parsing this data.
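
For orientation, the 64 bit response header that smart_header_parse()
reads can be decoded as sketched below. The field positions mirror the
code in this patch; the function and macro names are hypothetical and the
sketch runs in userspace rather than against the card.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the values mirror the driver code in this patch */
#define SMART_DATA_IDENTIFIER	0x534D	/* 'SM' */
#define SMART_ATTRIBUTE_BYTES	(2 * sizeof(uint64_t))

/*
 * Decode the response header: the identifier lives in bits 63..48 and the
 * payload length (in bytes, excluding the header) in bits 31..0.
 * Returns 0 on success, -1 if the identifier is not 'SM'.
 */
static int smart_header_decode(uint64_t header, uint32_t *length)
{
	uint16_t id = (uint16_t)(header >> 48);

	assert(length);

	if (id != SMART_DATA_IDENTIFIER)
		return -1;

	*length = (uint32_t)(header & 0xFFFFFFFF);
	return 0;
}

/* Each SMART attribute occupies two 64 bit words */
static uint32_t smart_attribute_count(uint32_t length)
{
	return (uint32_t)(length / SMART_ATTRIBUTE_BYTES);
}
```

A 64 byte payload therefore carries four attributes, which is the value
the driver would place in the count field of struct nd_ocxl_smart.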

Signed-off-by: Alastair D'Silva <[email protected]>
---
drivers/nvdimm/ocxl/main.c | 113 +++++++++++++++++++++++++++++++++
drivers/nvdimm/ocxl/ocxlpmem.h | 18 ++++++
include/uapi/linux/ndctl.h | 1 +
3 files changed, 132 insertions(+)

diff --git a/drivers/nvdimm/ocxl/main.c b/drivers/nvdimm/ocxl/main.c
index 2811bf7efbab..92b4389e8cbb 100644
--- a/drivers/nvdimm/ocxl/main.c
+++ b/drivers/nvdimm/ocxl/main.c
@@ -82,6 +82,114 @@ static int ndctl_config_size(struct nd_cmd_get_config_size *command)
return 0;
}

+/**
+ * smart_header_parse() - Parse the first 64 bits of the SMART admin command response
+ * @ocxlpmem: the device metadata
+ * @length: out, returns the number of bytes in the response (excluding the 64 bit header)
+ */
+static int smart_header_parse(struct ocxlpmem *ocxlpmem, u32 *length)
+{
+ int rc;
+ u64 val;
+
+ u16 data_identifier;
+ u32 data_length;
+
+ rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu,
+ ocxlpmem->admin_command.data_offset,
+ OCXL_LITTLE_ENDIAN, &val);
+ if (rc)
+ return rc;
+
+ data_identifier = val >> 48;
+ data_length = val & 0xFFFFFFFF;
+
+ if (data_identifier != 0x534D) { // 'SM'
+ dev_err(&ocxlpmem->dev,
+ "Bad data identifier for smart data, expected 'SM', got '%.2s'\n",
+ (char *)&data_identifier);
+ return -EINVAL;
+ }
+
+ *length = data_length;
+ return 0;
+}
+
+static int ndctl_smart(struct ocxlpmem *ocxlpmem, struct nd_cmd_pkg *pkg)
+{
+ u32 length, i;
+ struct nd_ocxl_smart *out;
+ int rc;
+
+ mutex_lock(&ocxlpmem->admin_command.lock);
+
+ rc = admin_command_execute(ocxlpmem, ADMIN_COMMAND_SMART);
+ if (rc < 0)
+ goto out;
+ if (rc != STATUS_SUCCESS) {
+ warn_status(ocxlpmem, "Unexpected status from SMART", rc);
+ goto out;
+ }
+
+ rc = smart_header_parse(ocxlpmem, &length);
+ if (rc)
+ goto out;
+
+ pkg->nd_fw_size = length + offsetof(struct nd_ocxl_smart, attribs);
+
+ length = min(length, (u32)(pkg->nd_size_out -
+ offsetof(struct nd_ocxl_smart, attribs))); // bytes
+ out = (struct nd_ocxl_smart *)pkg->nd_payload;
+ // Each SMART attribute is 2 * 64 bits
+ out->count = length / (2 * sizeof(u64)); // attributes
+
+ for (i = 0; i < length; i += sizeof(u64)) {
+ rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu,
+ ocxlpmem->admin_command.data_offset +
+ sizeof(u64) + i,
+ OCXL_LITTLE_ENDIAN,
+ &out->attribs[i / sizeof(u64)]);
+ if (rc)
+ goto out;
+ }
+
+ rc = admin_response_handled(ocxlpmem);
+ if (rc)
+ goto out;
+
+out:
+ mutex_unlock(&ocxlpmem->admin_command.lock);
+ return rc;
+}
+
+static int ndctl_call(struct ocxlpmem *ocxlpmem, void *buf,
+ unsigned int buf_len)
+{
+ struct nd_cmd_pkg *pkg = buf;
+
+ if (buf_len < sizeof(struct nd_cmd_pkg)) {
+ dev_err(&ocxlpmem->dev, "Invalid ND_CALL size=%u\n", buf_len);
+ return -EINVAL;
+ }
+
+ if (pkg->nd_family != NVDIMM_FAMILY_OCXL) {
+ dev_err(&ocxlpmem->dev, "Invalid ND_CALL family=0x%llx\n",
+ pkg->nd_family);
+ return -EINVAL;
+ }
+
+ switch (pkg->nd_command) {
+ case ND_CMD_OCXL_SMART:
+ return ndctl_smart(ocxlpmem, pkg);
+ default:
+ dev_err(&ocxlpmem->dev, "Invalid ND_CALL command=0x%llx\n",
+ pkg->nd_command);
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
static int ndctl(struct nvdimm_bus_descriptor *nd_desc,
struct nvdimm *nvdimm,
unsigned int cmd, void *buf, unsigned int buf_len, int *cmd_rc)
@@ -90,6 +198,10 @@ static int ndctl(struct nvdimm_bus_descriptor *nd_desc,
struct ocxlpmem, bus_desc);

switch (cmd) {
+ case ND_CMD_CALL:
+ *cmd_rc = ndctl_call(ocxlpmem, buf, buf_len);
+ return 0;
+
case ND_CMD_GET_CONFIG_SIZE:
*cmd_rc = ndctl_config_size(buf);
return 0;
@@ -173,6 +285,7 @@ static int register_lpc_mem(struct ocxlpmem *ocxlpmem)
set_bit(ND_CMD_GET_CONFIG_SIZE, &nvdimm_cmd_mask);
set_bit(ND_CMD_GET_CONFIG_DATA, &nvdimm_cmd_mask);
set_bit(ND_CMD_SET_CONFIG_DATA, &nvdimm_cmd_mask);
+ set_bit(ND_CMD_CALL, &nvdimm_cmd_mask);

set_bit(NDD_ALIASING, &nvdimm_flags);

diff --git a/drivers/nvdimm/ocxl/ocxlpmem.h b/drivers/nvdimm/ocxl/ocxlpmem.h
index c8794e7775ec..ebaf11692c09 100644
--- a/drivers/nvdimm/ocxl/ocxlpmem.h
+++ b/drivers/nvdimm/ocxl/ocxlpmem.h
@@ -6,6 +6,7 @@
#include <misc/ocxl.h>
#include <linux/libnvdimm.h>
#include <linux/mm.h>
+#include <linux/ndctl.h>

#define LABEL_AREA_SIZE BIT_ULL(PA_SECTION_SHIFT)
#define DEFAULT_TIMEOUT 100
@@ -101,6 +102,23 @@ struct command_metadata {
u8 op_code;
};

+struct nd_ocxl_smart {
+ __u8 count;
+ __u8 reserved[7];
+ __u64 attribs[0];
+} __packed;
+
+struct nd_pkg_ocxl {
+ struct nd_cmd_pkg gen;
+ union {
+ struct nd_ocxl_smart smart;
+ };
+};
+
+enum nd_cmd_ocxl {
+ ND_CMD_OCXL_SMART = 1,
+};
+
struct ocxlpmem {
struct device dev;
struct pci_dev *pdev;
diff --git a/include/uapi/linux/ndctl.h b/include/uapi/linux/ndctl.h
index de5d90212409..2885052e7f40 100644
--- a/include/uapi/linux/ndctl.h
+++ b/include/uapi/linux/ndctl.h
@@ -244,6 +244,7 @@ struct nd_cmd_pkg {
#define NVDIMM_FAMILY_HPE2 2
#define NVDIMM_FAMILY_MSFT 3
#define NVDIMM_FAMILY_HYPERV 4
+#define NVDIMM_FAMILY_OCXL 6

#define ND_IOCTL_CALL _IOWR(ND_IOCTL, ND_CMD_CALL,\
struct nd_cmd_pkg)
--
2.24.1

2020-03-31 09:00:45

by Alastair D'Silva

Subject: [PATCH v4 22/25] nvdimm/ocxl: Add debug IOCTLs

These IOCTLs provide low-level access to the card to aid in debugging
controller/FPGA firmware. They are only built when
CONFIG_OCXL_PMEM_DEBUG is enabled.
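
The FW debug request packs DebugAction and FunctionCode into a single 64
bit word and only accepts data buffers that are a multiple of 8 bytes and
no larger than the admin command data area. A userspace sketch of those
two rules follows; the helper names are hypothetical, and the shifts and
checks mirror ioctl_fwdebug() in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: build the request word written at
 * request_offset + 0x08. DebugAction occupies bits 63..56 and
 * FunctionCode bits 55..40.
 */
static uint64_t fwdebug_request_word(uint16_t debug_action,
				     uint16_t function_code)
{
	/* Only 8 bits of the action survive the shift by 56 */
	assert(debug_action <= 0xFF);

	return ((uint64_t)debug_action << 56) |
	       ((uint64_t)function_code << 40);
}

/*
 * Hypothetical helper: the buffer must be a multiple of 8 bytes and must
 * fit within the admin command data area (data_size bytes).
 */
static bool fwdebug_buf_size_ok(uint16_t buf_size, uint32_t data_size)
{
	return (buf_size & 0x07) == 0 && buf_size <= data_size;
}
```

Note that debug_action is declared as a 16 bit field in the uapi struct
but only its low 8 bits reach the card, which is why the sketch asserts
on the range.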

Signed-off-by: Alastair D'Silva <[email protected]>
---
drivers/nvdimm/ocxl/Kconfig | 6 +
drivers/nvdimm/ocxl/main.c | 198 +++++++++++++++++++++++++++++++++
drivers/nvdimm/ocxl/ocxlpmem.h | 2 +-
include/uapi/nvdimm/ocxlpmem.h | 31 ++++++
4 files changed, 236 insertions(+), 1 deletion(-)

diff --git a/drivers/nvdimm/ocxl/Kconfig b/drivers/nvdimm/ocxl/Kconfig
index c5d927520920..3f44429d70c9 100644
--- a/drivers/nvdimm/ocxl/Kconfig
+++ b/drivers/nvdimm/ocxl/Kconfig
@@ -12,4 +12,10 @@ config OCXL_PMEM

Select N if unsure.

+config OCXL_PMEM_DEBUG
+ bool "OpenCAPI Persistent Memory debugging"
+ depends on OCXL_PMEM
+ help
+ Enables low level IOCTLs for OpenCAPI Persistent Memory firmware development
+
endif
diff --git a/drivers/nvdimm/ocxl/main.c b/drivers/nvdimm/ocxl/main.c
index 2fbe3f2f77d9..2811bf7efbab 100644
--- a/drivers/nvdimm/ocxl/main.c
+++ b/drivers/nvdimm/ocxl/main.c
@@ -1027,6 +1027,183 @@ int req_controller_health_perf(struct ocxlpmem *ocxlpmem)
GLOBAL_MMIO_HCI_REQ_HEALTH_PERF);
}

+#ifdef CONFIG_OCXL_PMEM_DEBUG
+/**
+ * enable_fwdebug() - Enable FW debug on the controller
+ * @ocxlpmem: the device metadata
+ * Return: 0 on success, negative on failure
+ */
+static int enable_fwdebug(const struct ocxlpmem *ocxlpmem)
+{
+ return ocxl_global_mmio_set64(ocxlpmem->ocxl_afu, GLOBAL_MMIO_HCI,
+ OCXL_LITTLE_ENDIAN,
+ GLOBAL_MMIO_HCI_FW_DEBUG);
+}
+
+/**
+ * disable_fwdebug() - Disable FW debug on the controller
+ * @ocxlpmem: the device metadata
+ * Return: 0 on success, negative on failure
+ */
+static int disable_fwdebug(const struct ocxlpmem *ocxlpmem)
+{
+ return ocxl_global_mmio_set64(ocxlpmem->ocxl_afu, GLOBAL_MMIO_HCIC,
+ OCXL_LITTLE_ENDIAN,
+ GLOBAL_MMIO_HCI_FW_DEBUG);
+}
+
+static int ioctl_fwdebug(struct ocxlpmem *ocxlpmem,
+ struct ioctl_ocxlpmem_fwdebug __user *uarg)
+{
+ struct ioctl_ocxlpmem_fwdebug args;
+ u64 val;
+ int i;
+ int rc;
+
+ if (copy_from_user(&args, uarg, sizeof(args)))
+ return -EFAULT;
+
+ // Buffer size must be a multiple of 8
+ if ((args.buf_size & 0x07))
+ return -EINVAL;
+
+ if (args.buf_size > ocxlpmem->admin_command.data_size)
+ return -EINVAL;
+
+ mutex_lock(&ocxlpmem->admin_command.lock);
+
+ rc = enable_fwdebug(ocxlpmem);
+ if (rc)
+ goto out;
+
+ // Write DebugAction & FunctionCode
+ val = ((u64)args.debug_action << 56) | ((u64)args.function_code << 40);
+
+ rc = ocxl_global_mmio_write64(ocxlpmem->ocxl_afu,
+ ocxlpmem->admin_command.request_offset + 0x08,
+ OCXL_LITTLE_ENDIAN, val);
+ if (rc)
+ goto out;
+
+ rc = ocxl_global_mmio_write64(ocxlpmem->ocxl_afu,
+ ocxlpmem->admin_command.request_offset + 0x10,
+ OCXL_LITTLE_ENDIAN,
+ args.debug_parameter_1);
+ if (rc)
+ goto out;
+
+ rc = ocxl_global_mmio_write64(ocxlpmem->ocxl_afu,
+ ocxlpmem->admin_command.request_offset + 0x18,
+ OCXL_LITTLE_ENDIAN,
+ args.debug_parameter_2);
+ if (rc)
+ goto out;
+
+ // Populate admin command buffer
+ if (args.buf_size) {
+ for (i = 0; i < args.buf_size; i += sizeof(u64)) {
+ u64 val;
+
+ if (copy_from_user(&val, u64_to_user_ptr(args.buf) + i, sizeof(u64))) {
+ rc = -EFAULT;
+ goto out;
+ }
+
+ rc = ocxl_global_mmio_write64(ocxlpmem->ocxl_afu,
+ ocxlpmem->admin_command.data_offset + i,
+ OCXL_HOST_ENDIAN, val);
+ if (rc)
+ goto out;
+ }
+ }
+
+ rc = admin_command_execute(ocxlpmem,
+ ADMIN_COMMAND_FW_DEBUG);
+ if (rc < 0)
+ goto out;
+ if (rc != STATUS_SUCCESS) {
+ warn_status(ocxlpmem, "Unexpected status from FW Debug", rc);
+ goto out;
+ }
+
+ if (args.buf_size) {
+ for (i = 0; i < args.buf_size; i += sizeof(u64)) {
+ u64 val;
+
+ rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu,
+ ocxlpmem->admin_command.data_offset + i,
+ OCXL_HOST_ENDIAN, &val);
+ if (rc)
+ goto out;
+
+ if (copy_to_user(u64_to_user_ptr(args.buf) + i, &val, sizeof(u64))) {
+ rc = -EFAULT;
+ goto out;
+ }
+ }
+ }
+
+ rc = admin_response_handled(ocxlpmem);
+ if (rc)
+ goto out;
+
+ rc = disable_fwdebug(ocxlpmem);
+
+out:
+ mutex_unlock(&ocxlpmem->admin_command.lock);
+ return rc;
+}
+
+static int ioctl_shutdown(struct ocxlpmem *ocxlpmem)
+{
+ int rc;
+
+ mutex_lock(&ocxlpmem->admin_command.lock);
+
+ rc = admin_command_execute(ocxlpmem, ADMIN_COMMAND_SHUTDOWN);
+ if (rc)
+ goto out;
+
+ rc = admin_response_handled(ocxlpmem);
+
+out:
+ mutex_unlock(&ocxlpmem->admin_command.lock);
+ return rc;
+}
+
+static int ioctl_mmio_write(struct ocxlpmem *ocxlpmem,
+ struct ioctl_ocxlpmem_mmio __user *uarg)
+{
+ struct ioctl_ocxlpmem_mmio args;
+
+ if (copy_from_user(&args, uarg, sizeof(args)))
+ return -EFAULT;
+
+ return ocxl_global_mmio_write64(ocxlpmem->ocxl_afu, args.address,
+ OCXL_LITTLE_ENDIAN, args.val);
+}
+
+static int ioctl_mmio_read(struct ocxlpmem *ocxlpmem,
+ struct ioctl_ocxlpmem_mmio __user *uarg)
+{
+ struct ioctl_ocxlpmem_mmio args;
+ int rc;
+
+ if (copy_from_user(&args, uarg, sizeof(args)))
+ return -EFAULT;
+
+ rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu, args.address,
+ OCXL_LITTLE_ENDIAN, &args.val);
+ if (rc)
+ return rc;
+
+ if (copy_to_user(uarg, &args, sizeof(args)))
+ return -EFAULT;
+
+ return 0;
+}
+#endif /* CONFIG_OCXL_PMEM_DEBUG */
+
static long file_ioctl(struct file *file, unsigned int cmd, unsigned long args)
{
struct ocxlpmem *ocxlpmem = file->private_data;
@@ -1068,6 +1245,27 @@ static long file_ioctl(struct file *file, unsigned int cmd, unsigned long args)
case IOCTL_OCXLPMEM_REQUEST_HEALTH:
rc = req_controller_health_perf(ocxlpmem);
break;
+
+#ifdef CONFIG_OCXL_PMEM_DEBUG
+ case IOCTL_OCXLPMEM_FWDEBUG:
+ rc = ioctl_fwdebug(ocxlpmem,
+ (struct ioctl_ocxlpmem_fwdebug __user *)args);
+ break;
+
+ case IOCTL_OCXLPMEM_SHUTDOWN:
+ rc = ioctl_shutdown(ocxlpmem);
+ break;
+
+ case IOCTL_OCXLPMEM_MMIO_WRITE:
+ rc = ioctl_mmio_write(ocxlpmem,
+ (struct ioctl_ocxlpmem_mmio __user *)args);
+ break;
+
+ case IOCTL_OCXLPMEM_MMIO_READ:
+ rc = ioctl_mmio_read(ocxlpmem,
+ (struct ioctl_ocxlpmem_mmio __user *)args);
+ break;
+ break;
+#endif
}

return rc;
diff --git a/drivers/nvdimm/ocxl/ocxlpmem.h b/drivers/nvdimm/ocxl/ocxlpmem.h
index 01721596f982..c8794e7775ec 100644
--- a/drivers/nvdimm/ocxl/ocxlpmem.h
+++ b/drivers/nvdimm/ocxl/ocxlpmem.h
@@ -158,7 +158,7 @@ int ocxlpmem_chi(const struct ocxlpmem *ocxlpmem, u64 *chi);
*
* @ocxlpmem: the device metadata
* @op_code: the code for the admin command
- * Returns 0 on success, -EINVAL for a bad op code, -EBUSY on timeout
+ * Returns 0 on success, -EINVAL for a bad op code, -EBUSY on timeout
*/
int admin_command_execute(struct ocxlpmem *ocxlpmem, u8 op_code);

diff --git a/include/uapi/nvdimm/ocxlpmem.h b/include/uapi/nvdimm/ocxlpmem.h
index 9c5c8585c1c2..374b70690b3a 100644
--- a/include/uapi/nvdimm/ocxlpmem.h
+++ b/include/uapi/nvdimm/ocxlpmem.h
@@ -93,4 +93,35 @@ struct ioctl_ocxlpmem_eventfd {
#define IOCTL_OCXLPMEM_EVENT_CHECK _IOR(OCXLPMEM_MAGIC, 0x36, __u64)
#define IOCTL_OCXLPMEM_REQUEST_HEALTH _IO(OCXLPMEM_MAGIC, 0x37)

+/*
+ * Debug IOCTLs, these are only available if the kernel is compiled with
+ * CONFIG_OCXL_PMEM_DEBUG
+ */
+
+#define OCXLPMEM_FWDEBUG_READ_CONTROLLER_MEMORY 0x01
+#define OCXLPMEM_FWDEBUG_WRITE_CONTROLLER_MEMORY 0x02
+#define OCXLPMEM_FWDEBUG_ENABLE_FUNCTION 0x03
+#define OCXLPMEM_FWDEBUG_DISABLE_FUNCTION 0x04
+#define OCXLPMEM_FWDEBUG_GET_PEL 0x05 /* Retrieve Persistent Error Log */
+
+struct ioctl_ocxlpmem_fwdebug { /* All args are inputs */
+ __u64 debug_parameter_1;
+ __u64 debug_parameter_2;
+ __u64 buf; /* Coerced pointer to optional in/out data buffer */
+ __u16 debug_action; /* OCXLPMEM_FWDEBUG_... */
+ __u16 function_code;
+ __u16 buf_size; /* Size of optional data buffer */
+ __u16 reserved;
+};
+
+struct ioctl_ocxlpmem_mmio {
+ __u64 address; /* Offset in global MMIO space */
+ __u64 val; /* value to write/was read */
+};
+
+#define IOCTL_OCXLPMEM_FWDEBUG _IOWR(OCXLPMEM_MAGIC, 0xf0, struct ioctl_ocxlpmem_fwdebug)
+#define IOCTL_OCXLPMEM_MMIO_WRITE _IOW(OCXLPMEM_MAGIC, 0xf1, struct ioctl_ocxlpmem_mmio)
+#define IOCTL_OCXLPMEM_MMIO_READ _IOWR(OCXLPMEM_MAGIC, 0xf2, struct ioctl_ocxlpmem_mmio)
+#define IOCTL_OCXLPMEM_SHUTDOWN _IO(OCXLPMEM_MAGIC, 0xf3)
+
#endif /* _UAPI_OCXL_SCM_H */
--
2.24.1

2020-03-31 09:00:58

by Alastair D'Silva

Subject: [PATCH v4 16/25] nvdimm/ocxl: Implement the Read Error Log command

The read error log command extracts information from the controller's
internal error log.

This patch exposes this information in two ways:
- During probe, if an error occurs and a log is available, print it to
the console
- After probe, make the error log available to userspace via an IOCTL.
Userspace is notified of pending error logs in a later patch
("nvdimm/ocxl: Forward events to userspace")
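
To make the response layout easier to follow, here is a userspace sketch
of how the second and third 64 bit words of the error log response are
split into fields. The bit positions and type names mirror
read_error_log() and decode_error_log_type() in this patch; struct
error_log_fields and both helper functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define OCXLPMEM_ERROR_LOG_TYPE_GENERAL	0x00	/* from the uapi header */

/* Hypothetical container for the decoded fields */
struct error_log_fields {
	uint32_t log_identifier;
	uint32_t program_reference_code;
	uint32_t action_flags;
	uint32_t power_on_seconds;
	uint8_t error_log_type;
};

/* word1 is read from data_offset + 0x08, word2 from data_offset + 0x10 */
static void error_log_decode(uint64_t word1, uint64_t word2,
			     struct error_log_fields *out)
{
	assert(out);

	out->log_identifier = (uint32_t)(word1 >> 32);
	out->program_reference_code = (uint32_t)(word1 & 0xFFFFFFFF);

	out->error_log_type = (uint8_t)(word2 >> 56);
	/* Action flags are only meaningful for the "general" log type */
	out->action_flags =
		(out->error_log_type == OCXLPMEM_ERROR_LOG_TYPE_GENERAL) ?
		(uint32_t)((word2 >> 32) & 0xFFFFFF) : 0;
	out->power_on_seconds = (uint32_t)(word2 & 0xFFFFFFFF);
}

static const char *error_log_type_name(uint8_t type)
{
	switch (type) {
	case 0x00: return "general";
	case 0x01: return "predictive failure";
	case 0x02: return "thermal warning";
	case 0x03: return "data loss";
	case 0x04: return "health & performance";
	default:   return "unknown";
	}
}
```

With this layout, an action_flags value of 0x800001 corresponds to
OCXLPMEM_ERROR_LOG_ACTION_DUMP | OCXLPMEM_ERROR_LOG_ACTION_RESET as
defined in the uapi header.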

Signed-off-by: Alastair D'Silva <[email protected]>
---
.../userspace-api/ioctl/ioctl-number.rst | 1 +
drivers/nvdimm/ocxl/main.c | 240 ++++++++++++++++++
include/uapi/nvdimm/ocxlpmem.h | 46 ++++
3 files changed, 287 insertions(+)
create mode 100644 include/uapi/nvdimm/ocxlpmem.h

diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst
index 9425377615ce..ba0ce7dca643 100644
--- a/Documentation/userspace-api/ioctl/ioctl-number.rst
+++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
@@ -340,6 +340,7 @@ Code Seq# Include File Comments
0xC0 00-0F linux/usb/iowarrior.h
0xCA 00-0F uapi/misc/cxl.h
0xCA 10-2F uapi/misc/ocxl.h
+0xCA 30-3F uapi/nvdimm/ocxlpmem.h OpenCAPI Persistent Memory
0xCA 80-BF uapi/scsi/cxlflash_ioctl.h
0xCB 00-1F CBM serial IEC bus in development:
<mailto:[email protected]>
diff --git a/drivers/nvdimm/ocxl/main.c b/drivers/nvdimm/ocxl/main.c
index 9b85fcd3f1c9..e6be0029f658 100644
--- a/drivers/nvdimm/ocxl/main.c
+++ b/drivers/nvdimm/ocxl/main.c
@@ -13,6 +13,7 @@
#include <linux/fs.h>
#include <linux/mm_types.h>
#include <linux/memory_hotplug.h>
+#include <uapi/nvdimm/ocxlpmem.h>
#include "ocxlpmem.h"

static const struct pci_device_id pci_tbl[] = {
@@ -401,10 +402,190 @@ static int file_release(struct inode *inode, struct file *file)
return 0;
}

+/**
+ * error_log_header_parse() - Parse the first 64 bits of the error log command response
+ * @ocxlpmem: the device metadata
+ * @length: out, returns the number of bytes in the response (excluding the 64 bit header)
+ */
+static int error_log_header_parse(struct ocxlpmem *ocxlpmem, u16 *length)
+{
+ int rc;
+ u64 val;
+ u16 data_identifier;
+ u32 data_length;
+
+ rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu,
+ ocxlpmem->admin_command.data_offset,
+ OCXL_LITTLE_ENDIAN, &val);
+ if (rc)
+ return rc;
+
+ data_identifier = val >> 48;
+ data_length = val & 0xFFFF;
+
+ if (data_identifier != 0x454C) { // 'EL'
+ dev_err(&ocxlpmem->dev,
+ "Bad data identifier for error log data, expected 'EL', got '%.2s' (%#x), data_length=%u\n",
+ (char *)&data_identifier,
+ (unsigned int)data_identifier, data_length);
+ return -EINVAL;
+ }
+
+ *length = data_length;
+ return 0;
+}
+
+static int read_error_log(struct ocxlpmem *ocxlpmem,
+ struct ioctl_ocxlpmem_error_log *log,
+ bool buf_is_user)
+{
+ u64 val;
+ u16 user_buf_length;
+ u16 buf_length;
+ u64 *buf = (u64 *)log->buf_ptr;
+ u16 i;
+ int rc;
+
+ if (log->buf_size % 8)
+ return -EINVAL;
+
+ rc = ocxlpmem_chi(ocxlpmem, &val);
+ if (rc)
+ return rc;
+
+ if (!(val & GLOBAL_MMIO_CHI_ELA))
+ return -EAGAIN;
+
+ user_buf_length = log->buf_size;
+
+ mutex_lock(&ocxlpmem->admin_command.lock);
+
+ rc = admin_command_execute(ocxlpmem, ADMIN_COMMAND_ERRLOG);
+ if (rc != STATUS_SUCCESS) {
+ warn_status(ocxlpmem,
+ "Unexpected status from retrieve error log", rc);
+ goto out;
+ }
+
+ rc = error_log_header_parse(ocxlpmem, &log->buf_size);
+ if (rc)
+ goto out;
+ // log->buf_size now contains the returned buffer size, not the user size
+
+ rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu,
+ ocxlpmem->admin_command.data_offset + 0x08,
+ OCXL_LITTLE_ENDIAN, &val);
+ if (rc)
+ goto out;
+
+ log->log_identifier = val >> 32;
+ log->program_reference_code = val & 0xFFFFFFFF;
+
+ rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu,
+ ocxlpmem->admin_command.data_offset + 0x10,
+ OCXL_LITTLE_ENDIAN, &val);
+ if (rc)
+ goto out;
+
+ log->error_log_type = val >> 56;
+ log->action_flags = (log->error_log_type == OCXLPMEM_ERROR_LOG_TYPE_GENERAL) ?
+ (val >> 32) & 0xFFFFFF : 0;
+ log->power_on_seconds = val & 0xFFFFFFFF;
+
+ rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu,
+ ocxlpmem->admin_command.data_offset + 0x18,
+ OCXL_LITTLE_ENDIAN, &log->timestamp);
+ if (rc)
+ goto out;
+
+ rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu,
+ ocxlpmem->admin_command.data_offset + 0x20,
+ OCXL_LITTLE_ENDIAN, &log->wwid[0]);
+ if (rc)
+ goto out;
+
+ rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu,
+ ocxlpmem->admin_command.data_offset + 0x28,
+ OCXL_LITTLE_ENDIAN, &log->wwid[1]);
+ if (rc)
+ goto out;
+
+ rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu,
+ ocxlpmem->admin_command.data_offset + 0x30,
+ OCXL_HOST_ENDIAN, (u64 *)log->fw_revision);
+ if (rc)
+ goto out;
+ log->fw_revision[8] = '\0';
+
+ buf_length = (user_buf_length < log->buf_size) ?
+ user_buf_length : log->buf_size;
+ for (i = 0; i < buf_length / (sizeof(u64)); i++) {
+ rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu,
+ ocxlpmem->admin_command.data_offset +
+ i * sizeof(u64),
+ OCXL_HOST_ENDIAN, &val);
+ if (rc)
+ goto out;
+
+ if (buf_is_user) {
+ if (copy_to_user((u64 __user *)&buf[i], &val,
+ sizeof(u64))) {
+ rc = -EFAULT;
+ goto out;
+ }
+ } else {
+ buf[i] = val;
+ }
+ }
+
+ rc = admin_response_handled(ocxlpmem);
+ if (rc)
+ goto out;
+
+out:
+ mutex_unlock(&ocxlpmem->admin_command.lock);
+ return rc;
+}
+
+static int ioctl_error_log(struct ocxlpmem *ocxlpmem,
+ struct ioctl_ocxlpmem_error_log __user *uarg)
+{
+ struct ioctl_ocxlpmem_error_log args;
+ int rc;
+
+ if (copy_from_user(&args, uarg, sizeof(args)))
+ return -EFAULT;
+
+ rc = read_error_log(ocxlpmem, &args, true);
+ if (rc)
+ return rc;
+
+ if (copy_to_user(uarg, &args, sizeof(args)))
+ return -EFAULT;
+
+ return 0;
+}
+
+static long file_ioctl(struct file *file, unsigned int cmd, unsigned long args)
+{
+ struct ocxlpmem *ocxlpmem = file->private_data;
+ int rc = -EINVAL;
+
+ switch (cmd) {
+ case IOCTL_OCXLPMEM_ERROR_LOG:
+ rc = ioctl_error_log(ocxlpmem,
+ (struct ioctl_ocxlpmem_error_log __user *)args);
+ break;
+ }
+ return rc;
+}
+
static const struct file_operations fops = {
.owner = THIS_MODULE,
.open = file_open,
.release = file_release,
+ .unlocked_ioctl = file_ioctl,
+ .compat_ioctl = file_ioctl,
};

/**
@@ -493,6 +674,60 @@ static int read_device_metadata(struct ocxlpmem *ocxlpmem)
return 0;
}

+static const char *decode_error_log_type(u8 error_log_type)
+{
+ switch (error_log_type) {
+ case 0x00:
+ return "general";
+ case 0x01:
+ return "predictive failure";
+ case 0x02:
+ return "thermal warning";
+ case 0x03:
+ return "data loss";
+ case 0x04:
+ return "health & performance";
+ default:
+ return "unknown";
+ }
+}
+
+static void dump_error_log(struct ocxlpmem *ocxlpmem)
+{
+ struct ioctl_ocxlpmem_error_log log;
+ u32 buf_size;
+ u8 *buf;
+ int rc;
+
+ if (ocxlpmem->admin_command.data_size == 0)
+ return;
+
+ buf_size = ocxlpmem->admin_command.data_size - 0x48;
+ buf = kzalloc(buf_size, GFP_KERNEL);
+ if (!buf)
+ return;
+
+ log.buf_ptr = (u64)buf;
+ log.buf_size = buf_size;
+
+ rc = read_error_log(ocxlpmem, &log, false);
+ if (rc < 0)
+ goto out;
+
+ dev_warn(&ocxlpmem->dev,
+ "OCXL PMEM Error log: WWID=0x%016llx%016llx LID=0x%x PRC=%x type=0x%x %s, Uptime=%u seconds timestamp=0x%llx\n",
+ log.wwid[0], log.wwid[1],
+ log.log_identifier, log.program_reference_code,
+ log.error_log_type,
+ decode_error_log_type(log.error_log_type),
+ log.power_on_seconds, log.timestamp);
+ print_hex_dump(KERN_WARNING, "buf", DUMP_PREFIX_OFFSET, 16, 1, buf,
+ log.buf_size, false);
+
+out:
+ kfree(buf);
+}
+
/**
* probe_function0() - Set up function 0 for an OpenCAPI persistent memory device
* This is important as it enables templates higher than 0 across all other
@@ -656,6 +891,11 @@ static int probe(struct pci_dev *pdev, const struct pci_device_id *ent)
pci_set_drvdata(pdev, NULL);

err:
+ if (ocxlpmem &&
+ (ocxlpmem_chi(ocxlpmem, &chi) == 0) &&
+ (chi & GLOBAL_MMIO_CHI_ELA))
+ dump_error_log(ocxlpmem);
+
/*
* Further cleanup is done in the release handler via free_ocxlpmem()
* This allows us to keep the character device live to handle IOCTLs to
diff --git a/include/uapi/nvdimm/ocxlpmem.h b/include/uapi/nvdimm/ocxlpmem.h
new file mode 100644
index 000000000000..5d3a03ea1e08
--- /dev/null
+++ b/include/uapi/nvdimm/ocxlpmem.h
@@ -0,0 +1,46 @@
+/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
+/* Copyright 2020 IBM Corp. */
+#ifndef _UAPI_OCXL_SCM_H
+#define _UAPI_OCXL_SCM_H
+
+#include <linux/types.h>
+#include <linux/ioctl.h>
+
+#define OCXLPMEM_ERROR_LOG_ACTION_RESET (1 << (32 - 32))
+#define OCXLPMEM_ERROR_LOG_ACTION_CHKFW (1 << (53 - 32))
+#define OCXLPMEM_ERROR_LOG_ACTION_REPLACE (1 << (54 - 32))
+#define OCXLPMEM_ERROR_LOG_ACTION_DUMP (1 << (55 - 32))
+
+#define OCXLPMEM_ERROR_LOG_TYPE_GENERAL (0x00)
+#define OCXLPMEM_ERROR_LOG_TYPE_PREDICTIVE_FAILURE (0x01)
+#define OCXLPMEM_ERROR_LOG_TYPE_THERMAL_WARNING (0x02)
+#define OCXLPMEM_ERROR_LOG_TYPE_DATA_LOSS (0x03)
+#define OCXLPMEM_ERROR_LOG_TYPE_HEALTH_PERFORMANCE (0x04)
+
+struct ioctl_ocxlpmem_error_log {
+ __u32 log_identifier; /* out */
+ __u32 program_reference_code; /* out */
+ __u32 action_flags; /* out, recommended course of action */
+ __u32 power_on_seconds; /* out, number of seconds the controller had been powered on when the error occurred */
+ __u64 timestamp; /* out, relative time since the current IPL */
+ __u64 wwid[2]; /* out, the NAA formatted WWID associated with the controller */
+ char fw_revision[8 + 1]; /* out, firmware revision as null terminated text */
+ __u8 reserved0[7];
+ __u16 buf_size; /* in/out, buffer size provided/required.
+ * If required is greater than provided, the buffer
+ * will be truncated to the amount provided. If it's
+ * less, then only the required bytes will be populated.
+ * If it is 0, then there are no more error log entries.
+ */
+ __u8 error_log_type;
+ __u8 reserved1[5];
+ __u64 buf_ptr; /* coerced pointer to output buffer */
+ __u64 reserved2[2];
+};
+
+/* ioctl numbers */
+#define OCXLPMEM_MAGIC 0xCA
+/* OpenCAPI Persistent memory devices */
+#define IOCTL_OCXLPMEM_ERROR_LOG _IOWR(OCXLPMEM_MAGIC, 0x30, struct ioctl_ocxlpmem_error_log)
+
+#endif /* _UAPI_OCXL_SCM_H */
--
2.24.1

2020-03-31 09:01:08

by Alastair D'Silva

Subject: [PATCH v4 18/25] nvdimm/ocxl: Add an IOCTL to report controller statistics

The controller can report a number of statistics that are useful
in evaluating the performance and reliability of the card.

This patch exposes this information via an IOCTL.
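As a rough illustration of how userspace might consume these statistics, the sketch below derives mean per-operation latencies from the cumulative counters. The field names follow struct ioctl_ocxlpmem_controller_stats from this patch, but struct stats_sample and the helper functions are hypothetical, and the ioctl call itself is left out so that the snippet builds without the hardware or the uapi header.

```c
#include <stdint.h>

/*
 * Hypothetical local mirror of a few fields of
 * struct ioctl_ocxlpmem_controller_stats. The durations are cumulative
 * totals in nanoseconds, as documented in the uapi header comments.
 */
struct stats_sample {
	uint64_t host_load_count;
	uint64_t host_load_duration;	/* nanoseconds, total */
	uint64_t media_read_count;
	uint64_t media_read_duration;	/* nanoseconds, total */
};

/* Mean duration per operation in nanoseconds; 0 when nothing was counted. */
static uint64_t mean_ns(uint64_t total_ns, uint64_t count)
{
	return count ? total_ns / count : 0;
}

/* Mean host load latency derived from the cumulative counters. */
static uint64_t mean_host_load_ns(const struct stats_sample *s)
{
	return mean_ns(s->host_load_duration, s->host_load_count);
}

/* Mean media read latency derived from the cumulative counters. */
static uint64_t mean_media_read_ns(const struct stats_sample *s)
{
	return mean_ns(s->media_read_duration, s->media_read_count);
}
```

Because the counters are totals, a monitoring tool would more likely sample twice and apply the same division to the deltas.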

Signed-off-by: Alastair D'Silva <[email protected]>
---
drivers/nvdimm/ocxl/main.c | 220 +++++++++++++++++++++++++++++++++
include/uapi/nvdimm/ocxlpmem.h | 21 ++++
2 files changed, 241 insertions(+)

diff --git a/drivers/nvdimm/ocxl/main.c b/drivers/nvdimm/ocxl/main.c
index d0db358ded43..0040fc09cceb 100644
--- a/drivers/nvdimm/ocxl/main.c
+++ b/drivers/nvdimm/ocxl/main.c
@@ -713,6 +713,221 @@ static int ioctl_controller_dump_complete(struct ocxlpmem *ocxlpmem)
GLOBAL_MMIO_HCI_CONTROLLER_DUMP_COLLECTED);
}

+/**
+ * controller_stats_header_parse() - Parse the first 64 bits of the controller stats admin command response
+ * @ocxlpmem: the device metadata
+ * @length: out, returns the number of bytes in the response (excluding the 64 bit header)
+ */
+static int controller_stats_header_parse(struct ocxlpmem *ocxlpmem,
+ u32 *length)
+{
+ int rc;
+ u64 val;
+
+ u16 data_identifier;
+ u32 data_length;
+
+ rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu,
+ ocxlpmem->admin_command.data_offset,
+ OCXL_LITTLE_ENDIAN, &val);
+ if (rc)
+ return rc;
+
+ data_identifier = val >> 48;
+ data_length = val & 0xFFFFFFFF;
+
+ if (data_identifier != 0x4353) { // 'CS'
+ dev_err(&ocxlpmem->dev,
+ "Bad data identifier for controller stats data, expected 'CS', got '%2s' (%#x), data_length=%u\n",
+ (char *)&data_identifier,
+ (unsigned int)data_identifier, data_length);
+ return -EINVAL;
+ }
+
+ *length = data_length;
+ return 0;
+}
+
+static int ioctl_controller_stats(struct ocxlpmem *ocxlpmem,
+ struct ioctl_ocxlpmem_controller_stats __user *uarg)
+{
+ struct ioctl_ocxlpmem_controller_stats args;
+ u32 length;
+ int rc;
+ u64 val;
+
+ memset(&args, '\0', sizeof(args));
+
+ mutex_lock(&ocxlpmem->admin_command.lock);
+
+ rc = ocxl_global_mmio_write64(ocxlpmem->ocxl_afu,
+ ocxlpmem->admin_command.request_offset + 0x08,
+ OCXL_LITTLE_ENDIAN, 0);
+ if (rc)
+ goto out;
+
+ rc = admin_command_execute(ocxlpmem, ADMIN_COMMAND_CONTROLLER_STATS);
+ if (rc < 0)
+ goto out;
+ if (rc != STATUS_SUCCESS) {
+ warn_status(ocxlpmem,
+ "Unexpected status from controller stats", rc);
+ goto out;
+ }
+
+ rc = controller_stats_header_parse(ocxlpmem, &length);
+ if (rc)
+ goto out;
+
+ if (length != 0x140) // Documented length of controller stats response
+ dev_warn(&ocxlpmem->dev,
+ "Unexpected length for controller stats data, expected 0x140, got %#x\n",
+ length);
+
+#define SPID1_OFFSET (ocxlpmem->admin_command.data_offset + 0x08)
+#define SPID2_OFFSET (ocxlpmem->admin_command.data_offset + 0x48)
+#define RESET_INFO (SPID1_OFFSET + 0x08)
+#define UPTIME_LIFE (SPID1_OFFSET + 0x10)
+#define CRITICAL_UTIL (SPID2_OFFSET + 0x08)
+#define HOST_LOAD_COUNT (SPID2_OFFSET + 0x10)
+#define HOST_STORE_COUNT (SPID2_OFFSET + 0x18)
+#define HOST_LOAD_DURATION (SPID2_OFFSET + 0x20)
+#define HOST_STORE_DURATION (SPID2_OFFSET + 0x28)
+#define MEDIA_READ_COUNT (SPID2_OFFSET + 0x50)
+#define MEDIA_WRITE_COUNT (SPID2_OFFSET + 0x58)
+#define MEDIA_READ_DURATION (SPID2_OFFSET + 0x60)
+#define MEDIA_WRITE_DURATION (SPID2_OFFSET + 0x68)
+#define CACHE_READ_HIT_COUNT (SPID2_OFFSET + 0x90)
+#define CACHE_WRITE_HIT_COUNT (SPID2_OFFSET + 0x98)
+#define FAST_WRITE_COUNT (SPID2_OFFSET + 0xA0)
+
+ rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu, SPID1_OFFSET,
+ OCXL_LITTLE_ENDIAN, &val);
+ if (rc)
+ goto out;
+ if ((val >> 56) != 0x01) { // Check the parameter ID
+ rc = -ENODEV;
+ goto out;
+ }
+ if ((val & 0xFFFFFFFF) != 0x38) { // Check the length
+ rc = -ENODEV;
+ goto out;
+ }
+
+ rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu, RESET_INFO,
+ OCXL_LITTLE_ENDIAN, &val);
+ if (rc)
+ goto out;
+
+ args.reset_count = val >> 32;
+ args.reset_uptime = val & 0xFFFFFFFF;
+
+ rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu, UPTIME_LIFE,
+ OCXL_LITTLE_ENDIAN, &val);
+ if (rc)
+ goto out;
+
+ args.power_on_uptime = val >> 32;
+ args.life_remaining = val & 0xFF;
+
+ rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu, SPID2_OFFSET,
+ OCXL_LITTLE_ENDIAN, &val);
+ if (rc)
+ goto out;
+ if ((val >> 56) != 0x02) { // Check the parameter ID
+ rc = -ENODEV;
+ goto out;
+ }
+ if ((val & 0xFFFFFFFF) != 0xF8) { // Check the length
+ rc = -ENODEV;
+ goto out;
+ }
+
+ rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu, CRITICAL_UTIL,
+ OCXL_LITTLE_ENDIAN, &val);
+ if (rc)
+ goto out;
+
+ args.critical_resource_utilization = val & 0xff;
+
+ rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu, HOST_LOAD_COUNT,
+ OCXL_LITTLE_ENDIAN,
+ &args.host_load_count);
+ if (rc)
+ goto out;
+
+ rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu, HOST_STORE_COUNT,
+ OCXL_LITTLE_ENDIAN,
+ &args.host_store_count);
+ if (rc)
+ goto out;
+
+ rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu, HOST_LOAD_DURATION,
+ OCXL_LITTLE_ENDIAN,
+ &args.host_load_duration);
+ if (rc)
+ goto out;
+
+ rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu, HOST_STORE_DURATION,
+ OCXL_LITTLE_ENDIAN,
+ &args.host_store_duration);
+ if (rc)
+ goto out;
+
+ rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu, MEDIA_READ_COUNT,
+ OCXL_LITTLE_ENDIAN,
+ &args.media_read_count);
+ if (rc)
+ goto out;
+
+ rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu, MEDIA_WRITE_COUNT,
+ OCXL_LITTLE_ENDIAN,
+ &args.media_write_count);
+ if (rc)
+ goto out;
+
+ rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu, MEDIA_READ_DURATION,
+ OCXL_LITTLE_ENDIAN,
+ &args.media_read_duration);
+ if (rc)
+ goto out;
+
+ rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu, MEDIA_WRITE_DURATION,
+ OCXL_LITTLE_ENDIAN,
+ &args.media_write_duration);
+ if (rc)
+ goto out;
+
+ rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu, CACHE_READ_HIT_COUNT,
+ OCXL_LITTLE_ENDIAN,
+ &args.cache_read_hit_count);
+ if (rc)
+ goto out;
+
+ rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu, CACHE_WRITE_HIT_COUNT,
+ OCXL_LITTLE_ENDIAN,
+ &args.cache_write_hit_count);
+ if (rc)
+ goto out;
+
+ rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu, FAST_WRITE_COUNT,
+ OCXL_LITTLE_ENDIAN,
+ &args.fast_write_count);
+ if (rc)
+ goto out;
+
+ if (copy_to_user(uarg, &args, sizeof(args))) {
+ rc = -EFAULT;
+ goto out;
+ }
+
+ rc = admin_response_handled(ocxlpmem);
+
+out:
+ mutex_unlock(&ocxlpmem->admin_command.lock);
+ return rc;
+}
+
static long file_ioctl(struct file *file, unsigned int cmd, unsigned long args)
{
struct ocxlpmem *ocxlpmem = file->private_data;
@@ -736,6 +951,11 @@ static long file_ioctl(struct file *file, unsigned int cmd, unsigned long args)
case IOCTL_OCXLPMEM_CONTROLLER_DUMP_COMPLETE:
rc = ioctl_controller_dump_complete(ocxlpmem);
break;
+
+ case IOCTL_OCXLPMEM_CONTROLLER_STATS:
+ rc = ioctl_controller_stats(ocxlpmem,
+ (struct ioctl_ocxlpmem_controller_stats __user *)args);
+ break;
}

return rc;
diff --git a/include/uapi/nvdimm/ocxlpmem.h b/include/uapi/nvdimm/ocxlpmem.h
index 05e2b3f7b27c..ca3a7098fa9d 100644
--- a/include/uapi/nvdimm/ocxlpmem.h
+++ b/include/uapi/nvdimm/ocxlpmem.h
@@ -51,6 +51,26 @@ struct ioctl_ocxlpmem_controller_dump_data {
__u64 reserved[8];
};

+struct ioctl_ocxlpmem_controller_stats {
+ __u32 reset_count;
+ __u32 reset_uptime; /* seconds */
+ __u32 power_on_uptime; /* seconds */
+ __u8 life_remaining; /* percentage, 0-100 */
+ __u8 critical_resource_utilization; /* percentage, 0-100 */
+ __u8 reserved[2];
+ __u64 host_load_count;
+ __u64 host_store_count;
+ __u64 host_load_duration; /* nanoseconds, total */
+ __u64 host_store_duration; /* nanoseconds, total */
+ __u64 media_read_count;
+ __u64 media_write_count;
+ __u64 media_read_duration; /* nanoseconds, total */
+ __u64 media_write_duration; /* nanoseconds, total */
+ __u64 cache_read_hit_count;
+ __u64 cache_write_hit_count;
+ __u64 fast_write_count;
+};
+
/* ioctl numbers */
#define OCXLPMEM_MAGIC 0xCA
/* OpenCAPI Persistent memory devices */
@@ -58,5 +78,6 @@ struct ioctl_ocxlpmem_controller_dump_data {
#define IOCTL_OCXLPMEM_CONTROLLER_DUMP _IO(OCXLPMEM_MAGIC, 0x31)
#define IOCTL_OCXLPMEM_CONTROLLER_DUMP_DATA _IOWR(OCXLPMEM_MAGIC, 0x32, struct ioctl_ocxlpmem_controller_dump_data)
#define IOCTL_OCXLPMEM_CONTROLLER_DUMP_COMPLETE _IO(OCXLPMEM_MAGIC, 0x33)
+#define IOCTL_OCXLPMEM_CONTROLLER_STATS _IOR(OCXLPMEM_MAGIC, 0x34, struct ioctl_ocxlpmem_controller_stats)

#endif /* _UAPI_OCXL_SCM_H */
--
2.24.1

2020-03-31 09:01:44

by Alastair D'Silva

Subject: [PATCH v4 17/25] nvdimm/ocxl: Add controller dump IOCTLs

This patch adds IOCTLs to allow userspace to request & fetch dumps
of the internal controller state.

This is useful during debugging or when a fatal error on the controller
has occurred.

The expected flow of operations is:
1. IOCTL_OCXLPMEM_CONTROLLER_DUMP to request the controller to take
a dump. This IOCTL will complete after the dump is available for
collection.
2. IOCTL_OCXLPMEM_CONTROLLER_DUMP_DATA called repeatedly to fetch
chunks from the buffer
3. IOCTL_OCXLPMEM_CONTROLLER_DUMP_COMPLETE to notify the controller
that it can free any internal resources used for the dump
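The fetch loop in step 2 can be sketched as follows. This is a simplified model, not the driver's interface: fetch_chunk_fn is a hypothetical stand-in for the DUMP_DATA ioctl, fake_fetch is a mock device, and the 64-bit response header that the real ioctl prepends to each chunk is ignored. Steps 1 and 3 would bracket the call to collect_dump().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for the DUMP_DATA ioctl: copies up to buf_size
 * bytes of the dump, starting at offset, into buf. Returns the number of
 * bytes written, or 0 once no more dump data is available.
 */
typedef uint16_t (*fetch_chunk_fn)(void *ctx, uint32_t offset,
				   void *buf, uint16_t buf_size);

/* Fetch chunks at increasing offsets until the dump or the buffer ends. */
static size_t collect_dump(fetch_chunk_fn fetch, void *ctx,
			   uint8_t *out, size_t out_size, uint16_t chunk)
{
	size_t total = 0;

	assert(fetch != NULL && chunk > 0);

	while (total < out_size) {
		uint16_t want = chunk;
		uint16_t got;

		if (out_size - total < want)
			want = (uint16_t)(out_size - total);

		got = fetch(ctx, (uint32_t)total, out + total, want);
		if (got == 0)
			break;	/* a size of 0 means the dump is exhausted */

		total += got;
	}

	return total;
}

/* Mock device used only to exercise the loop without hardware. */
struct fake_dump {
	const uint8_t *data;
	size_t len;
};

static uint16_t fake_fetch(void *ctx, uint32_t offset,
			   void *buf, uint16_t buf_size)
{
	const struct fake_dump *d = ctx;
	size_t n;

	if (offset >= d->len)
		return 0;

	n = d->len - offset;
	if (n > buf_size)
		n = buf_size;

	memcpy(buf, d->data + offset, n);
	return (uint16_t)n;
}
```

A real caller would also keep each requested size a multiple of 8 bytes, since the driver rejects other sizes with -EINVAL.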

Signed-off-by: Alastair D'Silva <[email protected]>
---
drivers/nvdimm/ocxl/main.c | 161 +++++++++++++++++++++++++++++++++
include/uapi/nvdimm/ocxlpmem.h | 16 ++++
2 files changed, 177 insertions(+)

diff --git a/drivers/nvdimm/ocxl/main.c b/drivers/nvdimm/ocxl/main.c
index e6be0029f658..d0db358ded43 100644
--- a/drivers/nvdimm/ocxl/main.c
+++ b/drivers/nvdimm/ocxl/main.c
@@ -566,6 +566,153 @@ static int ioctl_error_log(struct ocxlpmem *ocxlpmem,
return 0;
}

+/**
+ * controller_dump_header_parse() - Parse the first 64 bits of the controller dump command response
+ * @ocxlpmem: the device metadata
+ * @length: out, returns the number of bytes in the response (excluding the 64 bit header)
+ */
+static int controller_dump_header_parse(struct ocxlpmem *ocxlpmem, u16 *length)
+{
+ int rc;
+ u64 val;
+ u16 data_identifier;
+ u32 data_length;
+
+ rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu,
+ ocxlpmem->admin_command.data_offset,
+ OCXL_LITTLE_ENDIAN, &val);
+ if (rc)
+ return rc;
+
+ data_identifier = val >> 48;
+ data_length = val & 0xFFFF;
+
+ if (data_identifier != 0x4344) { // 'CD'
+ dev_err(&ocxlpmem->dev,
+ "Bad data identifier for controller dump data, expected 'CD', got '%2s' (%#x), data_length=%u\n",
+ (char *)&data_identifier,
+ (unsigned int)data_identifier, data_length);
+ return -EINVAL;
+ }
+
+ *length = data_length;
+ return 0;
+}
+
+static int ioctl_controller_dump_data(struct ocxlpmem *ocxlpmem,
+ struct ioctl_ocxlpmem_controller_dump_data __user *uarg)
+{
+ struct ioctl_ocxlpmem_controller_dump_data args;
+ u64 __user *buf;
+ u16 i, buf_size;
+ u64 val;
+ int rc;
+
+ if (copy_from_user(&args, uarg, sizeof(args)))
+ return -EFAULT;
+
+ if (args.buf_size % sizeof(u64))
+ return -EINVAL;
+
+ if (args.buf_size > ocxlpmem->admin_command.data_size)
+ return -EINVAL;
+
+ buf = u64_to_user_ptr(args.buf_ptr);
+
+ mutex_lock(&ocxlpmem->admin_command.lock);
+
+ val = ((u64)args.offset) << 32;
+ val |= args.buf_size;
+ rc = ocxl_global_mmio_write64(ocxlpmem->ocxl_afu,
+ ocxlpmem->admin_command.request_offset + 0x08,
+ OCXL_LITTLE_ENDIAN, val);
+ if (rc)
+ goto out;
+
+ rc = admin_command_execute(ocxlpmem, ADMIN_COMMAND_CONTROLLER_DUMP);
+ if (rc < 0)
+ goto out;
+ if (rc != STATUS_SUCCESS) {
+ warn_status(ocxlpmem,
+ "Unexpected status from controller dump",
+ rc);
+ goto out;
+ }
+
+ rc = controller_dump_header_parse(ocxlpmem, &buf_size);
+ if (rc)
+ goto out;
+
+ buf_size = min((u16)(buf_size + sizeof(u64)), args.buf_size);
+
+ for (i = 0; i < buf_size / sizeof(u64); i++) {
+ rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu,
+ ocxlpmem->admin_command.data_offset +
+ i * sizeof(u64),
+ OCXL_HOST_ENDIAN, &val);
+ if (rc)
+ goto out;
+
+ if (copy_to_user(&buf[i], &val, sizeof(u64))) {
+ rc = -EFAULT;
+ goto out;
+ }
+ }
+
+ args.buf_size = buf_size;
+
+ if (copy_to_user(uarg, &args, sizeof(args))) {
+ rc = -EFAULT;
+ goto out;
+ }
+
+ rc = admin_response_handled(ocxlpmem);
+ if (rc)
+ goto out;
+
+out:
+ mutex_unlock(&ocxlpmem->admin_command.lock);
+ return rc;
+}
+
+int request_controller_dump(struct ocxlpmem *ocxlpmem)
+{
+ int rc;
+ u64 busy = 1;
+
+ rc = ocxl_global_mmio_set64(ocxlpmem->ocxl_afu, GLOBAL_MMIO_CHIC,
+ OCXL_LITTLE_ENDIAN,
+ GLOBAL_MMIO_CHI_CDA);
+ if (rc)
+ return rc;
+
+ rc = ocxl_global_mmio_set64(ocxlpmem->ocxl_afu, GLOBAL_MMIO_HCI,
+ OCXL_LITTLE_ENDIAN,
+ GLOBAL_MMIO_HCI_CONTROLLER_DUMP);
+ if (rc)
+ return rc;
+
+ while (busy) {
+ rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu,
+ GLOBAL_MMIO_HCI,
+ OCXL_LITTLE_ENDIAN, &busy);
+ if (rc)
+ return rc;
+
+ busy &= GLOBAL_MMIO_HCI_CONTROLLER_DUMP;
+ cond_resched();
+ }
+
+ return 0;
+}
+
+static int ioctl_controller_dump_complete(struct ocxlpmem *ocxlpmem)
+{
+ return ocxl_global_mmio_set64(ocxlpmem->ocxl_afu, GLOBAL_MMIO_HCI,
+ OCXL_LITTLE_ENDIAN,
+ GLOBAL_MMIO_HCI_CONTROLLER_DUMP_COLLECTED);
+}
+
static long file_ioctl(struct file *file, unsigned int cmd, unsigned long args)
{
struct ocxlpmem *ocxlpmem = file->private_data;
@@ -576,7 +723,21 @@ static long file_ioctl(struct file *file, unsigned int cmd, unsigned long args)
rc = ioctl_error_log(ocxlpmem,
(struct ioctl_ocxlpmem_error_log __user *)args);
break;
+
+ case IOCTL_OCXLPMEM_CONTROLLER_DUMP:
+ rc = request_controller_dump(ocxlpmem);
+ break;
+
+ case IOCTL_OCXLPMEM_CONTROLLER_DUMP_DATA:
+ rc = ioctl_controller_dump_data(ocxlpmem,
+ (struct ioctl_ocxlpmem_controller_dump_data __user *)args);
+ break;
+
+ case IOCTL_OCXLPMEM_CONTROLLER_DUMP_COMPLETE:
+ rc = ioctl_controller_dump_complete(ocxlpmem);
+ break;
}
+
return rc;
}

diff --git a/include/uapi/nvdimm/ocxlpmem.h b/include/uapi/nvdimm/ocxlpmem.h
index 5d3a03ea1e08..05e2b3f7b27c 100644
--- a/include/uapi/nvdimm/ocxlpmem.h
+++ b/include/uapi/nvdimm/ocxlpmem.h
@@ -38,9 +38,25 @@ struct ioctl_ocxlpmem_error_log {
__u64 reserved2[2];
};

+struct ioctl_ocxlpmem_controller_dump_data {
+ __u64 buf_ptr; /* coerced pointer to output buffer */
+ __u16 buf_size; /* in/out, buffer size provided/required.
+ * If required is greater than provided, the buffer
+ * will be truncated to the amount provided. If it's
+ * less, then only the required bytes will be populated.
+ * If it is 0, then there is no more dump data available.
+ */
+ __u16 reserved0;
+ __u32 offset; /* in, Offset within the dump */
+ __u64 reserved[8];
+};
+
/* ioctl numbers */
#define OCXLPMEM_MAGIC 0xCA
/* OpenCAPI Persistent memory devices */
#define IOCTL_OCXLPMEM_ERROR_LOG _IOWR(OCXLPMEM_MAGIC, 0x30, struct ioctl_ocxlpmem_error_log)
+#define IOCTL_OCXLPMEM_CONTROLLER_DUMP _IO(OCXLPMEM_MAGIC, 0x31)
+#define IOCTL_OCXLPMEM_CONTROLLER_DUMP_DATA _IOWR(OCXLPMEM_MAGIC, 0x32, struct ioctl_ocxlpmem_controller_dump_data)
+#define IOCTL_OCXLPMEM_CONTROLLER_DUMP_COMPLETE _IO(OCXLPMEM_MAGIC, 0x33)

#endif /* _UAPI_OCXL_SCM_H */
--
2.24.1

2020-04-01 08:48:59

by Dan Williams

Subject: Re: [PATCH v4 01/25] powerpc/powernv: Add OPAL calls for LPC memory alloc/release

On Sun, Mar 29, 2020 at 10:23 PM Alastair D'Silva <[email protected]> wrote:
>
> Add OPAL calls for LPC memory alloc/release
>

This seems to be referencing an existing API definition. Can you
include a pointer to the spec in case someone wants to understand
what these routines do? I suspect this is not allocating memory in the
traditional sense so much as allocating physical address space
for a device to be mapped?


> Signed-off-by: Alastair D'Silva <[email protected]>
> Acked-by: Andrew Donnellan <[email protected]>
> Acked-by: Frederic Barrat <[email protected]>
> ---
> arch/powerpc/include/asm/opal-api.h | 2 ++
> arch/powerpc/include/asm/opal.h | 2 ++
> arch/powerpc/platforms/powernv/opal-call.c | 2 ++
> 3 files changed, 6 insertions(+)
>
> diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
> index c1f25a760eb1..9298e603001b 100644
> --- a/arch/powerpc/include/asm/opal-api.h
> +++ b/arch/powerpc/include/asm/opal-api.h
> @@ -208,6 +208,8 @@
> #define OPAL_HANDLE_HMI2 166
> #define OPAL_NX_COPROC_INIT 167
> #define OPAL_XIVE_GET_VP_STATE 170
> +#define OPAL_NPU_MEM_ALLOC 171
> +#define OPAL_NPU_MEM_RELEASE 172
> #define OPAL_MPIPL_UPDATE 173
> #define OPAL_MPIPL_REGISTER_TAG 174
> #define OPAL_MPIPL_QUERY_TAG 175
> diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
> index 9986ac34b8e2..301fea46c7ca 100644
> --- a/arch/powerpc/include/asm/opal.h
> +++ b/arch/powerpc/include/asm/opal.h
> @@ -39,6 +39,8 @@ int64_t opal_npu_spa_clear_cache(uint64_t phb_id, uint32_t bdfn,
> uint64_t PE_handle);
> int64_t opal_npu_tl_set(uint64_t phb_id, uint32_t bdfn, long cap,
> uint64_t rate_phys, uint32_t size);
> +int64_t opal_npu_mem_alloc(u64 phb_id, u32 bdfn, u64 size, __be64 *bar);
> +int64_t opal_npu_mem_release(u64 phb_id, u32 bdfn);
>
> int64_t opal_console_write(int64_t term_number, __be64 *length,
> const uint8_t *buffer);
> diff --git a/arch/powerpc/platforms/powernv/opal-call.c b/arch/powerpc/platforms/powernv/opal-call.c
> index 5cd0f52d258f..f26e58b72c04 100644
> --- a/arch/powerpc/platforms/powernv/opal-call.c
> +++ b/arch/powerpc/platforms/powernv/opal-call.c
> @@ -287,6 +287,8 @@ OPAL_CALL(opal_pci_set_pbcq_tunnel_bar, OPAL_PCI_SET_PBCQ_TUNNEL_BAR);
> OPAL_CALL(opal_sensor_read_u64, OPAL_SENSOR_READ_U64);
> OPAL_CALL(opal_sensor_group_enable, OPAL_SENSOR_GROUP_ENABLE);
> OPAL_CALL(opal_nx_coproc_init, OPAL_NX_COPROC_INIT);
> +OPAL_CALL(opal_npu_mem_alloc, OPAL_NPU_MEM_ALLOC);
> +OPAL_CALL(opal_npu_mem_release, OPAL_NPU_MEM_RELEASE);
> OPAL_CALL(opal_mpipl_update, OPAL_MPIPL_UPDATE);
> OPAL_CALL(opal_mpipl_register_tag, OPAL_MPIPL_REGISTER_TAG);
> OPAL_CALL(opal_mpipl_query_tag, OPAL_MPIPL_QUERY_TAG);
> --
> 2.24.1
>

2020-04-01 08:49:17

by Dan Williams

Subject: Re: [PATCH v4 02/25] mm/memory_hotplug: Allow check_hotplug_memory_addressable to be called from drivers

On Sun, Mar 29, 2020 at 10:23 PM Alastair D'Silva <[email protected]> wrote:
>
> When setting up OpenCAPI connected persistent memory, the range check may
> not be performed until quite late (or perhaps not at all, if the user does
> not establish a DAX device).
>
> This patch makes the range check callable so we can perform the check while
> probing the OpenCAPI Persistent Memory device.
>
> Signed-off-by: Alastair D'Silva <[email protected]>
> Reviewed-by: Andrew Donnellan <[email protected]>
> ---
> include/linux/memory_hotplug.h | 5 +++++
> mm/memory_hotplug.c | 4 ++--
> 2 files changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index f4d59155f3d4..9a19ae0d7e31 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -337,6 +337,11 @@ static inline void __remove_memory(int nid, u64 start, u64 size) {}
> extern void set_zone_contiguous(struct zone *zone);
> extern void clear_zone_contiguous(struct zone *zone);
>
> +#ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
> +int check_hotplug_memory_addressable(unsigned long pfn,
> + unsigned long nr_pages);
> +#endif /* CONFIG_MEMORY_HOTPLUG_SPARSE */

Let's move this to include/linux/memory.h with the other
CONFIG_MEMORY_HOTPLUG_SPARSE declarations, and add a dummy
implementation for the CONFIG_MEMORY_HOTPLUG_SPARSE=n case.

Also, this patch can be squashed with the next one, no need for it to
be stand alone.
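A minimal sketch of the declaration-plus-stub pattern suggested above, written so that it compiles in userspace. The stub returning 0 (range treated as addressable) is an assumption about the desired behaviour, and SKETCH_PAGE_SHIFT and lpc_setup_sketch() are hypothetical names used only to show a caller that needs no #ifdef.

```c
/*
 * CONFIG_MEMORY_HOTPLUG_SPARSE is deliberately left undefined here, so the
 * stub branch is the one that gets compiled.
 */
#ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
int check_hotplug_memory_addressable(unsigned long pfn,
				     unsigned long nr_pages);
#else
/* Assumed stub behaviour: report every range as addressable. */
static inline int check_hotplug_memory_addressable(unsigned long pfn,
						   unsigned long nr_pages)
{
	(void)pfn;
	(void)nr_pages;
	return 0;
}
#endif

#define SKETCH_PAGE_SHIFT 12	/* hypothetical; the kernel uses PAGE_SHIFT */

/* The caller is identical for both configurations. */
static unsigned long long lpc_setup_sketch(unsigned long long base_addr,
					   unsigned long long size)
{
	if (check_hotplug_memory_addressable(base_addr >> SKETCH_PAGE_SHIFT,
					     size >> SKETCH_PAGE_SHIFT))
		return 0;

	return base_addr;
}
```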


> +
> extern void __ref free_area_init_core_hotplug(int nid);
> extern int __add_memory(int nid, u64 start, u64 size);
> extern int add_memory(int nid, u64 start, u64 size);
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 0a54ffac8c68..14945f033594 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -276,8 +276,8 @@ static int check_pfn_span(unsigned long pfn, unsigned long nr_pages,
> return 0;
> }
>
> -static int check_hotplug_memory_addressable(unsigned long pfn,
> - unsigned long nr_pages)
> +int check_hotplug_memory_addressable(unsigned long pfn,
> + unsigned long nr_pages)
> {
> const u64 max_addr = PFN_PHYS(pfn + nr_pages) - 1;
>
> --
> 2.24.1
>

2020-04-01 08:49:31

by Dan Williams

Subject: Re: [PATCH v4 03/25] powerpc/powernv: Map & release OpenCAPI LPC memory

On Sun, Mar 29, 2020 at 10:23 PM Alastair D'Silva <[email protected]> wrote:
>
> This patch adds OPAL calls to powernv so that the OpenCAPI
> driver can map & release LPC (Lowest Point of Coherency) memory.
>
> Signed-off-by: Alastair D'Silva <[email protected]>
> Reviewed-by: Andrew Donnellan <[email protected]>
> ---
> arch/powerpc/include/asm/pnv-ocxl.h | 2 ++
> arch/powerpc/platforms/powernv/ocxl.c | 43 +++++++++++++++++++++++++++
> 2 files changed, 45 insertions(+)
>
> diff --git a/arch/powerpc/include/asm/pnv-ocxl.h b/arch/powerpc/include/asm/pnv-ocxl.h
> index 7de82647e761..560a19bb71b7 100644
> --- a/arch/powerpc/include/asm/pnv-ocxl.h
> +++ b/arch/powerpc/include/asm/pnv-ocxl.h
> @@ -32,5 +32,7 @@ extern int pnv_ocxl_spa_remove_pe_from_cache(void *platform_data, int pe_handle)
>
> extern int pnv_ocxl_alloc_xive_irq(u32 *irq, u64 *trigger_addr);
> extern void pnv_ocxl_free_xive_irq(u32 irq);
> +u64 pnv_ocxl_platform_lpc_setup(struct pci_dev *pdev, u64 size);
> +void pnv_ocxl_platform_lpc_release(struct pci_dev *pdev);
>
> #endif /* _ASM_PNV_OCXL_H */
> diff --git a/arch/powerpc/platforms/powernv/ocxl.c b/arch/powerpc/platforms/powernv/ocxl.c
> index 8c65aacda9c8..f13119a7c026 100644
> --- a/arch/powerpc/platforms/powernv/ocxl.c
> +++ b/arch/powerpc/platforms/powernv/ocxl.c
> @@ -475,6 +475,49 @@ void pnv_ocxl_spa_release(void *platform_data)
> }
> EXPORT_SYMBOL_GPL(pnv_ocxl_spa_release);
>
> +u64 pnv_ocxl_platform_lpc_setup(struct pci_dev *pdev, u64 size)
> +{
> + struct pci_controller *hose = pci_bus_to_host(pdev->bus);
> + struct pnv_phb *phb = hose->private_data;

Is calling the local variable 'hose' instead of 'host' on purpose?

> + u32 bdfn = pci_dev_id(pdev);
> + __be64 base_addr_be64;
> + u64 base_addr;
> + int rc;
> +
> + rc = opal_npu_mem_alloc(phb->opal_id, bdfn, size, &base_addr_be64);
> + if (rc) {
> + dev_warn(&pdev->dev,
> + "OPAL could not allocate LPC memory, rc=%d\n", rc);
> + return 0;
> + }
> +
> + base_addr = be64_to_cpu(base_addr_be64);
> +
> +#ifdef CONFIG_MEMORY_HOTPLUG_SPARSE

With the proposed cleanup in patch2 the ifdef can be elided here.

> + rc = check_hotplug_memory_addressable(base_addr >> PAGE_SHIFT,
> + size >> PAGE_SHIFT);
> + if (rc)
> + return 0;

Is this an error worth logging if someone is wondering why their
device is not showing up?


> +#endif
> +
> + return base_addr;
> +}
> +EXPORT_SYMBOL_GPL(pnv_ocxl_platform_lpc_setup);
> +
> +void pnv_ocxl_platform_lpc_release(struct pci_dev *pdev)
> +{
> + struct pci_controller *hose = pci_bus_to_host(pdev->bus);
> + struct pnv_phb *phb = hose->private_data;
> + u32 bdfn = pci_dev_id(pdev);
> + int rc;
> +
> + rc = opal_npu_mem_release(phb->opal_id, bdfn);
> + if (rc)
> + dev_warn(&pdev->dev,
> + "OPAL reported rc=%d when releasing LPC memory\n", rc);
> +}
> +EXPORT_SYMBOL_GPL(pnv_ocxl_platform_lpc_release);
> +
> int pnv_ocxl_spa_remove_pe_from_cache(void *platform_data, int pe_handle)
> {
> struct spa_data *data = (struct spa_data *) platform_data;
> --
> 2.24.1
>

2020-04-01 08:50:26

by Dan Williams

Subject: Re: [PATCH v4 08/25] ocxl: Emit a log message showing how much LPC memory was detected

On Sun, Mar 29, 2020 at 10:23 PM Alastair D'Silva <[email protected]> wrote:
>
> This patch emits a message showing how much LPC memory & special purpose
> memory was detected on an OCXL device.
>
> Signed-off-by: Alastair D'Silva <[email protected]>
> Acked-by: Frederic Barrat <[email protected]>
> Acked-by: Andrew Donnellan <[email protected]>
> ---
> drivers/misc/ocxl/config.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/drivers/misc/ocxl/config.c b/drivers/misc/ocxl/config.c
> index a62e3d7db2bf..69cca341d446 100644
> --- a/drivers/misc/ocxl/config.c
> +++ b/drivers/misc/ocxl/config.c
> @@ -568,6 +568,10 @@ static int read_afu_lpc_memory_info(struct pci_dev *dev,
> afu->special_purpose_mem_size =
> total_mem_size - lpc_mem_size;
> }
> +
> + dev_info(&dev->dev, "Probed LPC memory of %#llx bytes and special purpose memory of %#llx bytes\n",
> + afu->lpc_mem_size, afu->special_purpose_mem_size);

A patch for a single log message is too fine grained for my taste,
let's squash this into another patch in the series.

> +
> return 0;
> }
>
> --
> 2.24.1
>

2020-04-01 08:50:48

by Dan Williams

Subject: Re: [PATCH v4 06/25] ocxl: Tally up the LPC memory on a link & allow it to be mapped

On Sun, Mar 29, 2020 at 10:53 PM Alastair D'Silva <[email protected]> wrote:
>
> OpenCAPI LPC memory is allocated per link, but each link supports
> multiple AFUs, and each AFU can have LPC memory assigned to it.

Is there an OpenCAPI primer to decode these objects and their
associations that I can reference?


>
> This patch tallies the memory for all AFUs on a link, allowing it
> to be mapped in a single operation after the AFUs have been
> enumerated.
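The tally described here can be modelled in a few lines. The sketch below mirrors the logic of ocxl_link_add_lpc_mem() from the quoted diff (track the highest end offset seen, reject wrap-around), but struct link_tally and the -1 error value are simplifications for illustration; the kernel version takes a mutex and returns -EINVAL.

```c
#include <stdint.h>

/* Simplified stand-in for the LPC fields of struct ocxl_link. */
struct link_tally {
	uint64_t lpc_mem_sz;	/* highest end offset seen so far */
};

/*
 * Record one AFU's LPC range on the link. Returns 0 on success or -1 if
 * offset + size wraps around.
 */
static int link_add_lpc_mem(struct link_tally *link,
			    uint64_t offset, uint64_t size)
{
	uint64_t end = offset + size;

	if (end < offset)	/* unsigned overflow */
		return -1;

	if (end > link->lpc_mem_sz)
		link->lpc_mem_sz = end;

	return 0;
}
```

Taking the maximum end offset rather than summing sizes means the mapped window covers every AFU even if their ranges are sparse or overlap.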
>
> Signed-off-by: Alastair D'Silva <[email protected]>
> ---
> drivers/misc/ocxl/core.c | 10 ++++++
> drivers/misc/ocxl/link.c | 60 +++++++++++++++++++++++++++++++
> drivers/misc/ocxl/ocxl_internal.h | 33 +++++++++++++++++
> 3 files changed, 103 insertions(+)
>
> diff --git a/drivers/misc/ocxl/core.c b/drivers/misc/ocxl/core.c
> index b7a09b21ab36..2531c6cf19a0 100644
> --- a/drivers/misc/ocxl/core.c
> +++ b/drivers/misc/ocxl/core.c
> @@ -230,8 +230,18 @@ static int configure_afu(struct ocxl_afu *afu, u8 afu_idx, struct pci_dev *dev)
> if (rc)
> goto err_free_pasid;
>
> + if (afu->config.lpc_mem_size || afu->config.special_purpose_mem_size) {
> + rc = ocxl_link_add_lpc_mem(afu->fn->link, afu->config.lpc_mem_offset,
> + afu->config.lpc_mem_size +
> + afu->config.special_purpose_mem_size);
> + if (rc)
> + goto err_free_mmio;
> + }
> +
> return 0;
>
> +err_free_mmio:
> + unmap_mmio_areas(afu);
> err_free_pasid:
> reclaim_afu_pasid(afu);
> err_free_actag:
> diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
> index 58d111afd9f6..af119d3ef79a 100644
> --- a/drivers/misc/ocxl/link.c
> +++ b/drivers/misc/ocxl/link.c
> @@ -84,6 +84,11 @@ struct ocxl_link {
> int dev;
> atomic_t irq_available;
> struct spa *spa;
> + struct mutex lpc_mem_lock; /* protects lpc_mem & lpc_mem_sz */
> + u64 lpc_mem_sz; /* Total amount of LPC memory presented on the link */
> + u64 lpc_mem;
> + int lpc_consumers;
> +
> void *platform_data;
> };
> static struct list_head links_list = LIST_HEAD_INIT(links_list);
> @@ -396,6 +401,8 @@ static int alloc_link(struct pci_dev *dev, int PE_mask, struct ocxl_link **out_l
> if (rc)
> goto err_spa;
>
> + mutex_init(&link->lpc_mem_lock);
> +
> /* platform specific hook */
> rc = pnv_ocxl_spa_setup(dev, link->spa->spa_mem, PE_mask,
> &link->platform_data);
> @@ -711,3 +718,56 @@ void ocxl_link_free_irq(void *link_handle, int hw_irq)
> atomic_inc(&link->irq_available);
> }
> EXPORT_SYMBOL_GPL(ocxl_link_free_irq);
> +
> +int ocxl_link_add_lpc_mem(void *link_handle, u64 offset, u64 size)
> +{
> + struct ocxl_link *link = (struct ocxl_link *)link_handle;
> +
> + // Check for overflow
> + if (offset > (offset + size))
> + return -EINVAL;
> +
> + mutex_lock(&link->lpc_mem_lock);
> + link->lpc_mem_sz = max(link->lpc_mem_sz, offset + size);
> +
> + mutex_unlock(&link->lpc_mem_lock);
> +
> + return 0;
> +}
> +
> +u64 ocxl_link_lpc_map(void *link_handle, struct pci_dev *pdev)
> +{
> + struct ocxl_link *link = (struct ocxl_link *)link_handle;
> +
> + mutex_lock(&link->lpc_mem_lock);
> +
> + if (!link->lpc_mem)
> + link->lpc_mem = pnv_ocxl_platform_lpc_setup(pdev, link->lpc_mem_sz);
> +
> + if (link->lpc_mem)
> + link->lpc_consumers++;
> + mutex_unlock(&link->lpc_mem_lock);
> +
> + return link->lpc_mem;
> +}
> +
> +void ocxl_link_lpc_release(void *link_handle, struct pci_dev *pdev)
> +{
> + struct ocxl_link *link = (struct ocxl_link *)link_handle;
> +
> + mutex_lock(&link->lpc_mem_lock);
> +
> + if (!link->lpc_mem) {
> + mutex_unlock(&link->lpc_mem_lock);
> + return;
> + }
> +
> + WARN_ON(--link->lpc_consumers < 0);
> +
> + if (link->lpc_consumers == 0) {
> + pnv_ocxl_platform_lpc_release(pdev);
> + link->lpc_mem = 0;
> + }
> +
> + mutex_unlock(&link->lpc_mem_lock);
> +}
> diff --git a/drivers/misc/ocxl/ocxl_internal.h b/drivers/misc/ocxl/ocxl_internal.h
> index 198e4e4bc51d..2d7575225bd7 100644
> --- a/drivers/misc/ocxl/ocxl_internal.h
> +++ b/drivers/misc/ocxl/ocxl_internal.h
> @@ -142,4 +142,37 @@ int ocxl_irq_offset_to_id(struct ocxl_context *ctx, u64 offset);
> u64 ocxl_irq_id_to_offset(struct ocxl_context *ctx, int irq_id);
> void ocxl_afu_irq_free_all(struct ocxl_context *ctx);
>
> +/**
> + * ocxl_link_add_lpc_mem() - Increment the amount of memory required by an OpenCAPI link
> + *
> + * @link_handle: The OpenCAPI link handle
> + * @offset: The offset of the memory to add
> + * @size: The number of bytes to increment memory on the link by
> + *
> + * Returns 0 on success, -EINVAL on overflow
> + */
> +int ocxl_link_add_lpc_mem(void *link_handle, u64 offset, u64 size);
> +
> +/**
> + * ocxl_link_lpc_map() - Map the LPC memory for an OpenCAPI device
> + * Since LPC memory belongs to a link, the whole LPC memory available
> + * on the link must be mapped in order to make it accessible to a device.
> + * @link_handle: The OpenCAPI link handle
> + * @pdev: A device that is on the link
> + *
> + * Returns the address of the mapped LPC memory, or 0 on error
> + */
> +u64 ocxl_link_lpc_map(void *link_handle, struct pci_dev *pdev);
> +
> +/**
> + * ocxl_link_lpc_release() - Release the LPC memory device for an OpenCAPI device
> + *
> + * Offlines LPC memory on an OpenCAPI link for a device. If this is the
> + * last device on the link to release the memory, unmap it from the link.
> + *
> + * @link_handle: The OpenCAPI link handle
> + * @pdev: A device that is on the link
> + */
> +void ocxl_link_lpc_release(void *link_handle, struct pci_dev *pdev);
> +
> #endif /* _OCXL_INTERNAL_H_ */
> --
> 2.24.1
>

2020-04-01 08:51:06

by Dan Williams

Subject: Re: [PATCH v4 10/25] nvdimm: Add driver for OpenCAPI Persistent Memory

On Sun, Mar 29, 2020 at 10:23 PM Alastair D'Silva <[email protected]> wrote:
>
> This driver exposes LPC memory on OpenCAPI pmem cards
> as an NVDIMM, allowing the existing nvdimm infrastructure
> to be used.
>
> Namespace metadata is stored on the media itself, so
> scm_reserve_metadata() maps 1 section's worth of PMEM storage
> at the start to hold this. The rest of the PMEM range is registered
> with libnvdimm as an nvdimm. ndctl_config_read/write/size() provide
> callbacks to libnvdimm to access the metadata.
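The split between metadata and usable storage can be sketched as below. It assumes, as this commit message says, that the reserved section sits at the start of the range; struct pmem_layout and carve_metadata() are hypothetical names, and the section size is passed in rather than taken from the kernel's SPARSEMEM configuration.

```c
#include <stdint.h>

/* Hypothetical description of how the card's range is divided. */
struct pmem_layout {
	uint64_t metadata_start;
	uint64_t metadata_size;
	uint64_t data_start;
	uint64_t data_size;
};

/*
 * Reserve one section at the start of [base, base + size) for namespace
 * metadata and describe the remainder as the region to register.
 * Returns 0 on success, -1 if the range cannot hold a section plus data.
 */
static int carve_metadata(uint64_t base, uint64_t size,
			  uint64_t section_size, struct pmem_layout *out)
{
	if (section_size == 0 || size <= section_size)
		return -1;

	out->metadata_start = base;
	out->metadata_size = section_size;
	out->data_start = base + section_size;
	out->data_size = size - section_size;

	return 0;
}
```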
>
> Signed-off-by: Alastair D'Silva <[email protected]>
> ---
> drivers/nvdimm/Kconfig | 2 +
> drivers/nvdimm/Makefile | 1 +
> drivers/nvdimm/ocxl/Kconfig | 15 ++
> drivers/nvdimm/ocxl/Makefile | 7 +
> drivers/nvdimm/ocxl/main.c | 476 +++++++++++++++++++++++++++++++++
> drivers/nvdimm/ocxl/ocxlpmem.h | 23 ++
> 6 files changed, 524 insertions(+)
> create mode 100644 drivers/nvdimm/ocxl/Kconfig
> create mode 100644 drivers/nvdimm/ocxl/Makefile
> create mode 100644 drivers/nvdimm/ocxl/main.c
> create mode 100644 drivers/nvdimm/ocxl/ocxlpmem.h
>
> diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig
> index b7d1eb38b27d..368328637182 100644
> --- a/drivers/nvdimm/Kconfig
> +++ b/drivers/nvdimm/Kconfig
> @@ -131,4 +131,6 @@ config NVDIMM_TEST_BUILD
> core devm_memremap_pages() implementation and other
> infrastructure.
>
> +source "drivers/nvdimm/ocxl/Kconfig"
> +
> endif
> diff --git a/drivers/nvdimm/Makefile b/drivers/nvdimm/Makefile
> index 29203f3d3069..bc02be11c794 100644
> --- a/drivers/nvdimm/Makefile
> +++ b/drivers/nvdimm/Makefile
> @@ -33,3 +33,4 @@ libnvdimm-$(CONFIG_NVDIMM_KEYS) += security.o
> TOOLS := ../../tools
> TEST_SRC := $(TOOLS)/testing/nvdimm/test
> obj-$(CONFIG_NVDIMM_TEST_BUILD) += $(TEST_SRC)/iomap.o
> +obj-$(CONFIG_LIBNVDIMM) += ocxl/
> diff --git a/drivers/nvdimm/ocxl/Kconfig b/drivers/nvdimm/ocxl/Kconfig
> new file mode 100644
> index 000000000000..c5d927520920
> --- /dev/null
> +++ b/drivers/nvdimm/ocxl/Kconfig
> @@ -0,0 +1,15 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +if LIBNVDIMM
> +
> +config OCXL_PMEM
> + tristate "OpenCAPI Persistent Memory"
> + depends on LIBNVDIMM && PPC_POWERNV && PCI && EEH && ZONE_DEVICE && OCXL

Does OCXL_PMEM itself have any CONFIG_ZONE_DEVICE dependencies? That's
more a function of CONFIG_DEV_DAX and CONFIG_FS_DAX. Doesn't OCXL
already depend on CONFIG_PCI?


> + help
> + Exposes devices that implement the OpenCAPI Storage Class Memory
> + specification as persistent memory regions. You may also want
> + DEV_DAX, DEV_DAX_PMEM & FS_DAX if you plan on using DAX devices
> + stacked on top of this driver.
> +
> + Select N if unsure.
> +
> +endif
> diff --git a/drivers/nvdimm/ocxl/Makefile b/drivers/nvdimm/ocxl/Makefile
> new file mode 100644
> index 000000000000..e0e8ade1987a
> --- /dev/null
> +++ b/drivers/nvdimm/ocxl/Makefile
> @@ -0,0 +1,7 @@
> +# SPDX-License-Identifier: GPL-2.0
> +
> +ccflags-$(CONFIG_PPC_WERROR) += -Werror
> +
> +obj-$(CONFIG_OCXL_PMEM) += ocxlpmem.o
> +
> +ocxlpmem-y := main.o
> \ No newline at end of file
> diff --git a/drivers/nvdimm/ocxl/main.c b/drivers/nvdimm/ocxl/main.c
> new file mode 100644
> index 000000000000..c0066fedf9cc
> --- /dev/null
> +++ b/drivers/nvdimm/ocxl/main.c
> @@ -0,0 +1,476 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +// Copyright 2020 IBM Corp.
> +
> +/*
> + * A driver for OpenCAPI devices that implement the Storage Class
> + * Memory specification.
> + */
> +
> +#include <linux/module.h>
> +#include <misc/ocxl.h>
> +#include <linux/ndctl.h>
> +#include <linux/mm_types.h>
> +#include <linux/memory_hotplug.h>
> +#include "ocxlpmem.h"
> +
> +static const struct pci_device_id pci_tbl[] = {
> + { PCI_DEVICE(PCI_VENDOR_ID_IBM, 0x0625), },
> + { }
> +};
> +
> +MODULE_DEVICE_TABLE(pci, pci_tbl);
> +
> +#define NUM_MINORS 256 // Total to reserve
> +
> +static dev_t ocxlpmem_dev;
> +static struct class *ocxlpmem_class;
> +static struct mutex minors_idr_lock;
> +static struct idr minors_idr;
> +
> +/**
> + * ndctl_config_write() - Handle a ND_CMD_SET_CONFIG_DATA command from ndctl
> + * @ocxlpmem: the device metadata
> + * @command: the incoming data to write
> + * Return: 0 on success, negative on failure
> + */
> +static int ndctl_config_write(struct ocxlpmem *ocxlpmem,
> + struct nd_cmd_set_config_hdr *command)
> +{
> + if (command->in_offset + command->in_length > LABEL_AREA_SIZE)
> + return -EINVAL;
> +
> + memcpy_flushcache(ocxlpmem->metadata_addr + command->in_offset,
> + command->in_buf, command->in_length);
> +
> + return 0;
> +}
> +
> +/**
> + * ndctl_config_read() - Handle a ND_CMD_GET_CONFIG_DATA command from ndctl
> + * @ocxlpmem: the device metadata
> + * @command: the read request
> + * Return: 0 on success, negative on failure
> + */
> +static int ndctl_config_read(struct ocxlpmem *ocxlpmem,
> + struct nd_cmd_get_config_data_hdr *command)
> +{
> + if (command->in_offset + command->in_length > LABEL_AREA_SIZE)
> + return -EINVAL;
> +
> + memcpy_mcsafe(command->out_buf,
> + ocxlpmem->metadata_addr + command->in_offset,
> + command->in_length);
> +
> + return 0;
> +}
> +
> +/**
> + * ndctl_config_size() - Handle a ND_CMD_GET_CONFIG_SIZE command from ndctl
> + * @command: the read request
> + * Return: 0 on success, negative on failure
> + */
> +static int ndctl_config_size(struct nd_cmd_get_config_size *command)
> +{
> + command->status = 0;
> + command->config_size = LABEL_AREA_SIZE;
> + command->max_xfer = PAGE_SIZE;
> +
> + return 0;
> +}
> +
> +static int ndctl(struct nvdimm_bus_descriptor *nd_desc,
> + struct nvdimm *nvdimm,
> + unsigned int cmd, void *buf, unsigned int buf_len, int *cmd_rc)
> +{
> + struct ocxlpmem *ocxlpmem = container_of(nd_desc,
> + struct ocxlpmem, bus_desc);
> +
> + switch (cmd) {
> + case ND_CMD_GET_CONFIG_SIZE:
> + *cmd_rc = ndctl_config_size(buf);
> + return 0;
> +
> + case ND_CMD_GET_CONFIG_DATA:
> + *cmd_rc = ndctl_config_read(ocxlpmem, buf);
> + return 0;
> +
> + case ND_CMD_SET_CONFIG_DATA:
> + *cmd_rc = ndctl_config_write(ocxlpmem, buf);
> + return 0;
> +
> + default:
> + return -ENOTTY;
> + }
> +}
> +
> +/**
> + * reserve_metadata() - Reserve space for nvdimm metadata
> + * @ocxlpmem: the device metadata
> + * @lpc_mem: The resource representing the LPC memory of the OpenCAPI device
> + */
> +static int reserve_metadata(struct ocxlpmem *ocxlpmem,
> + struct resource *lpc_mem)
> +{
> + ocxlpmem->metadata_addr = devm_memremap(&ocxlpmem->dev, lpc_mem->start,
> + LABEL_AREA_SIZE, MEMREMAP_WB);
> + if (IS_ERR(ocxlpmem->metadata_addr))
> + return PTR_ERR(ocxlpmem->metadata_addr);
> +
> + return 0;
> +}
> +
> +/**
> + * register_lpc_mem() - Discover persistent memory on a device and register it with the NVDIMM subsystem
> + * @ocxlpmem: the device metadata
> + * Return: 0 on success
> + */
> +static int register_lpc_mem(struct ocxlpmem *ocxlpmem)
> +{
> + struct nd_region_desc region_desc;
> + struct nd_mapping_desc nd_mapping_desc;
> + struct resource *lpc_mem;
> + const struct ocxl_afu_config *config;
> + const struct ocxl_fn_config *fn_config;
> + int rc;
> + unsigned long nvdimm_cmd_mask = 0;
> + unsigned long nvdimm_flags = 0;
> + int target_node;
> + char serial[16 + 1];
> +
> + // Set up the reserved metadata area
> + rc = ocxl_afu_map_lpc_mem(ocxlpmem->ocxl_afu);
> + if (rc < 0)
> + return rc;
> +
> + lpc_mem = ocxl_afu_lpc_mem(ocxlpmem->ocxl_afu);
> + if (!lpc_mem || !lpc_mem->start)
> + return -EINVAL;
> +
> + config = ocxl_afu_config(ocxlpmem->ocxl_afu);
> + fn_config = ocxl_function_config(ocxlpmem->ocxl_fn);
> +
> + rc = reserve_metadata(ocxlpmem, lpc_mem);
> + if (rc)
> + return rc;
> +
> + ocxlpmem->bus_desc.provider_name = "ocxlpmem";
> + ocxlpmem->bus_desc.ndctl = ndctl;
> + ocxlpmem->bus_desc.module = THIS_MODULE;
> +
> + ocxlpmem->nvdimm_bus = nvdimm_bus_register(&ocxlpmem->dev,
> + &ocxlpmem->bus_desc);
> + if (!ocxlpmem->nvdimm_bus)
> + return -EINVAL;
> +
> + ocxlpmem->pmem_res.start = (u64)lpc_mem->start + LABEL_AREA_SIZE;
> + ocxlpmem->pmem_res.end = (u64)lpc_mem->start + config->lpc_mem_size - 1;
> + ocxlpmem->pmem_res.name = "OpenCAPI persistent memory";
> +
> + set_bit(ND_CMD_GET_CONFIG_SIZE, &nvdimm_cmd_mask);
> + set_bit(ND_CMD_GET_CONFIG_DATA, &nvdimm_cmd_mask);
> + set_bit(ND_CMD_SET_CONFIG_DATA, &nvdimm_cmd_mask);
> +
> + set_bit(NDD_ALIASING, &nvdimm_flags);
> +
> + snprintf(serial, sizeof(serial), "%llx", fn_config->serial);
> + nd_mapping_desc.nvdimm = nvdimm_create(ocxlpmem->nvdimm_bus, ocxlpmem,
> + NULL, nvdimm_flags,
> + nvdimm_cmd_mask, 0, NULL);
> + if (!nd_mapping_desc.nvdimm)
> + return -ENOMEM;
> +
> + if (nvdimm_bus_check_dimm_count(ocxlpmem->nvdimm_bus, 1))
> + return -EINVAL;
> +
> + nd_mapping_desc.start = ocxlpmem->pmem_res.start;
> + nd_mapping_desc.size = resource_size(&ocxlpmem->pmem_res);
> + nd_mapping_desc.position = 0;
> +
> + ocxlpmem->nd_set.cookie1 = fn_config->serial;
> + ocxlpmem->nd_set.cookie2 = fn_config->serial;
> +
> + target_node = of_node_to_nid(ocxlpmem->pdev->dev.of_node);
> +
> + memset(&region_desc, 0, sizeof(region_desc));
> + region_desc.res = &ocxlpmem->pmem_res;
> + region_desc.numa_node = NUMA_NO_NODE;
> + region_desc.target_node = target_node;
> + region_desc.num_mappings = 1;
> + region_desc.mapping = &nd_mapping_desc;
> + region_desc.nd_set = &ocxlpmem->nd_set;
> +
> + set_bit(ND_REGION_PAGEMAP, &region_desc.flags);
> + /*
> + * NB: libnvdimm copies the data from ndr_desc into it's own
> + * structures so passing a stack pointer is fine.
> + */
> + ocxlpmem->nd_region = nvdimm_pmem_region_create(ocxlpmem->nvdimm_bus,
> + &region_desc);
> + if (!ocxlpmem->nd_region)
> + return -EINVAL;
> +
> + dev_info(&ocxlpmem->dev,
> + "Onlining %lluMB of persistent memory\n",
> + nd_mapping_desc.size / SZ_1M);
> +
> + return 0;
> +}
> +
> +/**
> + * allocate_minor() - Allocate a minor number to use for an OpenCAPI pmem device
> + * @ocxlpmem: the device metadata
> + * Return: the allocated minor number
> + */
> +static int allocate_minor(struct ocxlpmem *ocxlpmem)
> +{
> + int minor;
> +
> + mutex_lock(&minors_idr_lock);
> + minor = idr_alloc(&minors_idr, ocxlpmem, 0, NUM_MINORS, GFP_KERNEL);
> + mutex_unlock(&minors_idr_lock);
> + return minor;
> +}
> +
> +static void free_minor(struct ocxlpmem *ocxlpmem)
> +{
> + mutex_lock(&minors_idr_lock);
> + idr_remove(&minors_idr, MINOR(ocxlpmem->dev.devt));
> + mutex_unlock(&minors_idr_lock);
> +}
> +
> +/**
> + * free_ocxlpmem() - Free all members of an ocxlpmem struct
> + * @ocxlpmem: the device struct to clear
> + */
> +static void free_ocxlpmem(struct ocxlpmem *ocxlpmem)
> +{
> + int rc;
> +
> + free_minor(ocxlpmem);
> +
> + if (ocxlpmem->ocxl_context) {
> + rc = ocxl_context_detach(ocxlpmem->ocxl_context);
> + if (rc == -EBUSY)
> + dev_warn(&ocxlpmem->dev, "Timeout detaching ocxl context\n");
> + else
> + ocxl_context_free(ocxlpmem->ocxl_context);
> + }
> +
> + if (ocxlpmem->ocxl_afu)
> + ocxl_afu_put(ocxlpmem->ocxl_afu);
> +
> + if (ocxlpmem->ocxl_fn)
> + ocxl_function_close(ocxlpmem->ocxl_fn);
> +
> + pci_dev_put(ocxlpmem->pdev);
> +
> + kfree(ocxlpmem);
> +}
> +
> +/**
> + * free_ocxlpmem_dev() - Free an OpenCAPI persistent memory device
> + * @dev: The device struct
> + */
> +static void free_ocxlpmem_dev(struct device *dev)
> +{
> + struct ocxlpmem *ocxlpmem = container_of(dev, struct ocxlpmem, dev);
> +
> + free_ocxlpmem(ocxlpmem);
> +}
> +
> +/**
> + * ocxlpmem_register() - Register an OpenCAPI pmem device with the kernel
> + * @ocxlpmem: the device metadata
> + * Return: 0 on success, negative on failure
> + */
> +static int ocxlpmem_register(struct ocxlpmem *ocxlpmem)
> +{
> + int rc;
> + int minor = allocate_minor(ocxlpmem);
> +
> + if (minor < 0)
> + return minor;
> +
> + ocxlpmem->dev.release = free_ocxlpmem_dev;
> + rc = dev_set_name(&ocxlpmem->dev, "ocxlpmem%d", minor);
> + if (rc < 0)
> + return rc;
> +
> + ocxlpmem->dev.devt = MKDEV(MAJOR(ocxlpmem_dev), minor);
> + ocxlpmem->dev.class = ocxlpmem_class;
> + ocxlpmem->dev.parent = &ocxlpmem->pdev->dev;
> +
> + return device_register(&ocxlpmem->dev);

You lost me, why does this need its own ioctl path and device node?
I'd expect you'd tunnel the commands you need through the existing
infrastructure ideally with an eye for cross-arch compatibility. This
is a fundamental concern that probably has significant implications
for what follows.



> +}
> +
> +/**
> + * ocxlpmem_remove() - Free an OpenCAPI persistent memory device
> + * @pdev: the PCI device information struct
> + */
> +static void remove(struct pci_dev *pdev)
> +{
> + if (PCI_FUNC(pdev->devfn) == 0) {
> + struct ocxl_fn *func0 = pci_get_drvdata(pdev);
> +
> + if (func0)
> + ocxl_function_close(func0);
> + } else {
> + struct ocxlpmem *ocxlpmem = pci_get_drvdata(pdev);
> +
> + if (!ocxlpmem)
> + return;
> +
> + if (ocxlpmem->nvdimm_bus)
> + nvdimm_bus_unregister(ocxlpmem->nvdimm_bus);
> +
> + device_unregister(&ocxlpmem->dev);
> + }
> +}
> +
> +/**
> + * probe_function0() - Set up function 0 for an OpenCAPI persistent memory device
> + * This is important as it enables templates higher than 0 across all other
> + * functions, which in turn enables higher bandwidth accesses
> + * @pdev: the PCI device information struct
> + * Return: 0 on success, negative on failure
> + */
> +static int probe_function0(struct pci_dev *pdev)
> +{
> + struct ocxl_fn *fn;
> +
> + fn = ocxl_function_open(pdev);
> + if (IS_ERR(fn)) {
> + dev_err(&pdev->dev, "failed to open OCXL function\n");
> + return PTR_ERR(fn);
> + }
> +
> + pci_set_drvdata(pdev, fn);
> +
> + return 0;
> +}
> +
> +/**
> + * probe() - Init an OpenCAPI persistent memory device
> + * @pdev: the PCI device information struct
> + * @ent: The entry from ocxlpmem_pci_tbl
> + * Return: 0 on success, negative on failure
> + */
> +static int probe(struct pci_dev *pdev, const struct pci_device_id *ent)
> +{
> + struct ocxlpmem *ocxlpmem;
> + int rc;
> +
> + if (PCI_FUNC(pdev->devfn) == 0)
> + return probe_function0(pdev);
> + else if (PCI_FUNC(pdev->devfn) != 1)
> + return 0;
> +
> + ocxlpmem = kzalloc(sizeof(*ocxlpmem), GFP_KERNEL);
> + if (!ocxlpmem) {
> + rc = -ENOMEM;
> + goto err;
> + }
> +
> + ocxlpmem->pdev = pci_dev_get(pdev);
> +
> + pci_set_drvdata(pdev, ocxlpmem);
> +
> + ocxlpmem->ocxl_fn = ocxl_function_open(pdev);
> + if (IS_ERR(ocxlpmem->ocxl_fn)) {
> + rc = PTR_ERR(ocxlpmem->ocxl_fn);
> + dev_err(&pdev->dev, "failed to open OCXL function\n");
> + goto err_unregistered;
> + }
> +
> + ocxlpmem->ocxl_afu = ocxl_function_fetch_afu(ocxlpmem->ocxl_fn, 0);
> + if (!ocxlpmem->ocxl_afu) {
> + rc = -ENXIO;
> + dev_err(&pdev->dev, "Could not get OCXL AFU from function\n");
> + goto err_unregistered;
> + }
> +
> + ocxl_afu_get(ocxlpmem->ocxl_afu);
> +
> + // Resources allocated below here are cleaned up in the release handler
> +
> + rc = ocxlpmem_register(ocxlpmem);
> + if (rc) {
> + dev_err(&pdev->dev,
> + "Could not register OpenCAPI persistent memory device with the kernel\n");
> + goto err;
> + }
> +
> + rc = ocxl_context_alloc(&ocxlpmem->ocxl_context, ocxlpmem->ocxl_afu,
> + NULL);
> + if (rc) {
> + dev_err(&pdev->dev, "Could not allocate OCXL context\n");
> + goto err;
> + }
> +
> + rc = ocxl_context_attach(ocxlpmem->ocxl_context, 0, NULL);
> + if (rc) {
> + dev_err(&pdev->dev, "Could not attach ocxl context\n");
> + goto err;
> + }
> +
> + rc = register_lpc_mem(ocxlpmem);
> + if (rc) {
> + dev_err(&pdev->dev,
> + "Could not register OpenCAPI persistent memory with libnvdimm\n");
> + goto err;
> + }
> +
> + return 0;
> +
> +err_unregistered:
> + if (!IS_ERR(ocxlpmem->ocxl_fn))
> + ocxl_function_close(ocxlpmem->ocxl_fn);
> + pci_dev_put(ocxlpmem->pdev);
> + kfree(ocxlpmem);
> + pci_set_drvdata(pdev, NULL);
> +
> +err:
> + /*
> + * Further cleanup is done in the release handler via free_ocxlpmem()
> + * This allows us to keep the character device live to handle IOCTLs to
> + * investigate issues if the card has an error
> + */
> +
> + dev_err(&pdev->dev,
> + "Error detected, will not register OpenCAPI persistent memory\n");
> + return 0;
> +}
> +
> +static struct pci_driver pci_driver = {
> + .name = "ocxlpmem",
> + .id_table = pci_tbl,
> + .probe = probe,
> + .remove = remove,
> + .shutdown = remove,
> +};
> +
> +static int __init ocxlpmem_init(void)
> +{
> + int rc;
> +
> + mutex_init(&minors_idr_lock);
> + idr_init(&minors_idr);
> +
> + rc = pci_register_driver(&pci_driver);
> + if (rc)
> + return rc;
> +
> + return 0;
> +}
> +
> +static void ocxlpmem_exit(void)
> +{
> + pci_unregister_driver(&pci_driver);
> +}
> +
> +module_init(ocxlpmem_init);
> +module_exit(ocxlpmem_exit);
> +
> +MODULE_DESCRIPTION("OpenCAPI Persistent Memory");
> +MODULE_LICENSE("GPL v2");
> +MODULE_AUTHOR("Alastair D'Silva <[email protected]>");
> diff --git a/drivers/nvdimm/ocxl/ocxlpmem.h b/drivers/nvdimm/ocxl/ocxlpmem.h
> new file mode 100644
> index 000000000000..03fe7a264281
> --- /dev/null
> +++ b/drivers/nvdimm/ocxl/ocxlpmem.h
> @@ -0,0 +1,23 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +// Copyright 2020 IBM Corp.
> +
> +#include <linux/pci.h>
> +#include <misc/ocxl.h>
> +#include <linux/libnvdimm.h>
> +#include <linux/mm.h>
> +
> +#define LABEL_AREA_SIZE BIT_ULL(PA_SECTION_SHIFT)
> +
> +struct ocxlpmem {
> + struct device dev;
> + struct pci_dev *pdev;
> + struct ocxl_fn *ocxl_fn;
> + struct nd_interleave_set nd_set;
> + struct nvdimm_bus_descriptor bus_desc;
> + struct nvdimm_bus *nvdimm_bus;
> + struct ocxl_afu *ocxl_afu;
> + struct ocxl_context *ocxl_context;
> + void *metadata_addr;
> + struct resource pmem_res;
> + struct nd_region *nd_region;
> +};
> --
> 2.24.1
>
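
An aside on the bounds checks in ndctl_config_read() and ndctl_config_write() quoted above: assuming in_offset and in_length are 32-bit fields (as they are declared in the ndctl UAPI header), their sum is computed in 32 bits before it is compared against LABEL_AREA_SIZE, so a large offset could wrap and slip past the check. The following is only a sketch of one way to write the test so that it cannot wrap. The helper names are hypothetical and are not part of the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: returns true if the byte
 * range [offset, offset + length) fits inside an area of area_size bytes.
 */
static bool label_range_ok(uint32_t offset, uint32_t length,
			   uint64_t area_size)
{
	/* Widen both operands first so the sum cannot wrap at 32 bits. */
	return (uint64_t)offset + (uint64_t)length <= area_size;
}

/*
 * The same test with the sum computed in 32 bits, which is how the
 * quoted check evaluates when both fields are 32-bit. Shown only for
 * comparison.
 */
static bool label_range_ok_u32(uint32_t offset, uint32_t length,
			       uint64_t area_size)
{
	uint32_t end = offset + length;	/* may wrap around */

	return end <= area_size;
}
```

With a 16MB area, an offset of 0xFFFFFFF0 and a length of 0x20 is rejected by the widened check, while the 32-bit version wraps the sum to 0x10 and accepts it.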

2020-04-01 08:51:21

by Dan Williams

Subject: Re: [PATCH v4 05/25] ocxl: Address kernel doc errors & warnings

On Sun, Mar 29, 2020 at 10:23 PM Alastair D'Silva <[email protected]> wrote:
>
> This patch addresses warnings and errors from the kernel doc scripts for
> the OpenCAPI driver.
>
> It also makes minor tweaks to make the docs more consistent.
>
> Signed-off-by: Alastair D'Silva <[email protected]>
> Acked-by: Andrew Donnellan <[email protected]>
> ---
> drivers/misc/ocxl/config.c | 24 ++++----
> drivers/misc/ocxl/ocxl_internal.h | 9 +--
> include/misc/ocxl.h | 96 ++++++++++++-------------------
> 3 files changed, 55 insertions(+), 74 deletions(-)

Looks good.


>
> diff --git a/drivers/misc/ocxl/config.c b/drivers/misc/ocxl/config.c
> index c8e19bfb5ef9..a62e3d7db2bf 100644
> --- a/drivers/misc/ocxl/config.c
> +++ b/drivers/misc/ocxl/config.c
> @@ -273,16 +273,16 @@ static int read_afu_info(struct pci_dev *dev, struct ocxl_fn_config *fn,
> }
>
> /**
> - * Read the template version from the AFU
> - * dev: the device for the AFU
> - * fn: the AFU offsets
> - * len: outputs the template length
> - * version: outputs the major<<8,minor version
> + * read_template_version() - Read the template version from the AFU
> + * @dev: the device for the AFU
> + * @fn: the AFU offsets
> + * @len: outputs the template length
> + * @version: outputs the major<<8,minor version
> *
> * Returns 0 on success, negative on failure
> */
> static int read_template_version(struct pci_dev *dev, struct ocxl_fn_config *fn,
> - u16 *len, u16 *version)
> + u16 *len, u16 *version)
> {
> u32 val32;
> u8 major, minor;
> @@ -476,16 +476,16 @@ static int validate_afu(struct pci_dev *dev, struct ocxl_afu_config *afu)
> }
>
> /**
> - * Populate AFU metadata regarding LPC memory
> - * dev: the device for the AFU
> - * fn: the AFU offsets
> - * afu: the AFU struct to populate the LPC metadata into
> + * read_afu_lpc_memory_info() - Populate AFU metadata regarding LPC memory
> + * @dev: the device for the AFU
> + * @fn: the AFU offsets
> + * @afu: the AFU struct to populate the LPC metadata into
> *
> * Returns 0 on success, negative on failure
> */
> static int read_afu_lpc_memory_info(struct pci_dev *dev,
> - struct ocxl_fn_config *fn,
> - struct ocxl_afu_config *afu)
> + struct ocxl_fn_config *fn,
> + struct ocxl_afu_config *afu)
> {
> int rc;
> u32 val32;
> diff --git a/drivers/misc/ocxl/ocxl_internal.h b/drivers/misc/ocxl/ocxl_internal.h
> index 345bf843a38e..198e4e4bc51d 100644
> --- a/drivers/misc/ocxl/ocxl_internal.h
> +++ b/drivers/misc/ocxl/ocxl_internal.h
> @@ -122,11 +122,12 @@ int ocxl_config_check_afu_index(struct pci_dev *dev,
> struct ocxl_fn_config *fn, int afu_idx);
>
> /**
> - * Update values within a Process Element
> + * ocxl_link_update_pe() - Update values within a Process Element
> + * @link_handle: the link handle associated with the process element
> + * @pasid: the PASID for the AFU context
> + * @tid: the new thread id for the process element
> *
> - * link_handle: the link handle associated with the process element
> - * pasid: the PASID for the AFU context
> - * tid: the new thread id for the process element
> + * Returns 0 on success
> */
> int ocxl_link_update_pe(void *link_handle, int pasid, __u16 tid);
>
> diff --git a/include/misc/ocxl.h b/include/misc/ocxl.h
> index 0a762e387418..357ef1aadbc0 100644
> --- a/include/misc/ocxl.h
> +++ b/include/misc/ocxl.h
> @@ -62,8 +62,7 @@ struct ocxl_context;
> // Device detection & initialisation
>
> /**
> - * Open an OpenCAPI function on an OpenCAPI device
> - *
> + * ocxl_function_open() - Open an OpenCAPI function on an OpenCAPI device
> * @dev: The PCI device that contains the function
> *
> * Returns an opaque pointer to the function, or an error pointer (check with IS_ERR)
> @@ -71,8 +70,7 @@ struct ocxl_context;
> struct ocxl_fn *ocxl_function_open(struct pci_dev *dev);
>
> /**
> - * Get the list of AFUs associated with a PCI function device
> - *
> + * ocxl_function_afu_list() - Get the list of AFUs associated with a PCI function device
> * Returns a list of struct ocxl_afu *
> *
> * @fn: The OpenCAPI function containing the AFUs
> @@ -80,8 +78,7 @@ struct ocxl_fn *ocxl_function_open(struct pci_dev *dev);
> struct list_head *ocxl_function_afu_list(struct ocxl_fn *fn);
>
> /**
> - * Fetch an AFU instance from an OpenCAPI function
> - *
> + * ocxl_function_fetch_afu() - Fetch an AFU instance from an OpenCAPI function
> * @fn: The OpenCAPI function to get the AFU from
> * @afu_idx: The index of the AFU to get
> *
> @@ -92,23 +89,20 @@ struct list_head *ocxl_function_afu_list(struct ocxl_fn *fn);
> struct ocxl_afu *ocxl_function_fetch_afu(struct ocxl_fn *fn, u8 afu_idx);
>
> /**
> - * Take a reference to an AFU
> - *
> + * ocxl_afu_get() - Take a reference to an AFU
> * @afu: The AFU to increment the reference count on
> */
> void ocxl_afu_get(struct ocxl_afu *afu);
>
> /**
> - * Release a reference to an AFU
> - *
> + * ocxl_afu_put() - Release a reference to an AFU
> * @afu: The AFU to decrement the reference count on
> */
> void ocxl_afu_put(struct ocxl_afu *afu);
>
>
> /**
> - * Get the configuration information for an OpenCAPI function
> - *
> + * ocxl_function_config() - Get the configuration information for an OpenCAPI function
> * @fn: The OpenCAPI function to get the config for
> *
> * Returns the function config, or NULL on error
> @@ -116,8 +110,7 @@ void ocxl_afu_put(struct ocxl_afu *afu);
> const struct ocxl_fn_config *ocxl_function_config(struct ocxl_fn *fn);
>
> /**
> - * Close an OpenCAPI function
> - *
> + * ocxl_function_close() - Close an OpenCAPI function
> * This will free any AFUs previously retrieved from the function, and
> * detach and associated contexts. The contexts must by freed by the caller.
> *
> @@ -129,8 +122,7 @@ void ocxl_function_close(struct ocxl_fn *fn);
> // Context allocation
>
> /**
> - * Allocate an OpenCAPI context
> - *
> + * ocxl_context_alloc() - Allocate an OpenCAPI context
> * @context: The OpenCAPI context to allocate, must be freed with ocxl_context_free
> * @afu: The AFU the context belongs to
> * @mapping: The mapping to unmap when the context is closed (may be NULL)
> @@ -139,14 +131,13 @@ int ocxl_context_alloc(struct ocxl_context **context, struct ocxl_afu *afu,
> struct address_space *mapping);
>
> /**
> - * Free an OpenCAPI context
> - *
> + * ocxl_context_free() - Free an OpenCAPI context
> * @ctx: The OpenCAPI context to free
> */
> void ocxl_context_free(struct ocxl_context *ctx);
>
> /**
> - * Grant access to an MM to an OpenCAPI context
> + * ocxl_context_attach() - Grant access to an MM to an OpenCAPI context
> * @ctx: The OpenCAPI context to attach
> * @amr: The value of the AMR register to restrict access
> * @mm: The mm to attach to the context
> @@ -157,7 +148,7 @@ int ocxl_context_attach(struct ocxl_context *ctx, u64 amr,
> struct mm_struct *mm);
>
> /**
> - * Detach an MM from an OpenCAPI context
> + * ocxl_context_detach() - Detach an MM from an OpenCAPI context
> * @ctx: The OpenCAPI context to attach
> *
> * Returns 0 on success, negative on failure
> @@ -167,7 +158,7 @@ int ocxl_context_detach(struct ocxl_context *ctx);
> // AFU IRQs
>
> /**
> - * Allocate an IRQ associated with an AFU context
> + * ocxl_afu_irq_alloc() - Allocate an IRQ associated with an AFU context
> * @ctx: the AFU context
> * @irq_id: out, the IRQ ID
> *
> @@ -176,7 +167,7 @@ int ocxl_context_detach(struct ocxl_context *ctx);
> int ocxl_afu_irq_alloc(struct ocxl_context *ctx, int *irq_id);
>
> /**
> - * Frees an IRQ associated with an AFU context
> + * ocxl_afu_irq_free() - Frees an IRQ associated with an AFU context
> * @ctx: the AFU context
> * @irq_id: the IRQ ID
> *
> @@ -185,7 +176,7 @@ int ocxl_afu_irq_alloc(struct ocxl_context *ctx, int *irq_id);
> int ocxl_afu_irq_free(struct ocxl_context *ctx, int irq_id);
>
> /**
> - * Gets the address of the trigger page for an IRQ
> + * ocxl_afu_irq_get_addr() - Gets the address of the trigger page for an IRQ
> * This can then be provided to an AFU which will write to that
> * page to trigger the IRQ.
> * @ctx: The AFU context that the IRQ is associated with
> @@ -196,7 +187,7 @@ int ocxl_afu_irq_free(struct ocxl_context *ctx, int irq_id);
> u64 ocxl_afu_irq_get_addr(struct ocxl_context *ctx, int irq_id);
>
> /**
> - * Provide a callback to be called when an IRQ is triggered
> + * ocxl_irq_set_handler() - Provide a callback to be called when an IRQ is triggered
> * @ctx: The AFU context that the IRQ is associated with
> * @irq_id: The IRQ ID
> * @handler: the callback to be called when the IRQ is triggered
> @@ -213,8 +204,7 @@ int ocxl_irq_set_handler(struct ocxl_context *ctx, int irq_id,
> // AFU Metadata
>
> /**
> - * Get a pointer to the config for an AFU
> - *
> + * ocxl_afu_config() - Get a pointer to the config for an AFU
> * @afu: a pointer to the AFU to get the config for
> *
> * Returns a pointer to the AFU config
> @@ -222,27 +212,24 @@ int ocxl_irq_set_handler(struct ocxl_context *ctx, int irq_id,
> struct ocxl_afu_config *ocxl_afu_config(struct ocxl_afu *afu);
>
> /**
> - * Assign opaque hardware specific information to an OpenCAPI AFU.
> - *
> - * @dev: The PCI device associated with the OpenCAPI device
> + * ocxl_afu_set_private() - Assign opaque hardware specific information to an OpenCAPI AFU.
> + * @afu: The OpenCAPI AFU
> * @private: the opaque hardware specific information to assign to the driver
> */
> void ocxl_afu_set_private(struct ocxl_afu *afu, void *private);
>
> /**
> - * Fetch the hardware specific information associated with an external OpenCAPI
> - * AFU. This may be consumed by an external OpenCAPI driver.
> - *
> - * @afu: The AFU
> + * ocxl_afu_get_private() - Fetch the hardware specific information associated with
> + * an external OpenCAPI AFU. This may be consumed by an external OpenCAPI driver.
> + * @afu: The OpenCAPI AFU
> *
> * Returns the opaque pointer associated with the device, or NULL if not set
> */
> -void *ocxl_afu_get_private(struct ocxl_afu *dev);
> +void *ocxl_afu_get_private(struct ocxl_afu *afu);
>
> // Global MMIO
> /**
> - * Read a 32 bit value from global MMIO
> - *
> + * ocxl_global_mmio_read32() - Read a 32 bit value from global MMIO
> * @afu: The AFU
> * @offset: The Offset from the start of MMIO
> * @endian: the endianness that the MMIO data is in
> @@ -251,11 +238,10 @@ void *ocxl_afu_get_private(struct ocxl_afu *dev);
> * Returns 0 for success, negative on error
> */
> int ocxl_global_mmio_read32(struct ocxl_afu *afu, size_t offset,
> - enum ocxl_endian endian, u32 *val);
> + enum ocxl_endian endian, u32 *val);
>
> /**
> - * Read a 64 bit value from global MMIO
> - *
> + * ocxl_global_mmio_read64() - Read a 64 bit value from global MMIO
> * @afu: The AFU
> * @offset: The Offset from the start of MMIO
> * @endian: the endianness that the MMIO data is in
> @@ -264,11 +250,10 @@ int ocxl_global_mmio_read32(struct ocxl_afu *afu, size_t offset,
> * Returns 0 for success, negative on error
> */
> int ocxl_global_mmio_read64(struct ocxl_afu *afu, size_t offset,
> - enum ocxl_endian endian, u64 *val);
> + enum ocxl_endian endian, u64 *val);
>
> /**
> - * Write a 32 bit value to global MMIO
> - *
> + * ocxl_global_mmio_write32() - Write a 32 bit value to global MMIO
> * @afu: The AFU
> * @offset: The Offset from the start of MMIO
> * @endian: the endianness that the MMIO data is in
> @@ -277,11 +262,10 @@ int ocxl_global_mmio_read64(struct ocxl_afu *afu, size_t offset,
> * Returns 0 for success, negative on error
> */
> int ocxl_global_mmio_write32(struct ocxl_afu *afu, size_t offset,
> - enum ocxl_endian endian, u32 val);
> + enum ocxl_endian endian, u32 val);
>
> /**
> - * Write a 64 bit value to global MMIO
> - *
> + * ocxl_global_mmio_write64() - Write a 64 bit value to global MMIO
> * @afu: The AFU
> * @offset: The Offset from the start of MMIO
> * @endian: the endianness that the MMIO data is in
> @@ -290,11 +274,10 @@ int ocxl_global_mmio_write32(struct ocxl_afu *afu, size_t offset,
> * Returns 0 for success, negative on error
> */
> int ocxl_global_mmio_write64(struct ocxl_afu *afu, size_t offset,
> - enum ocxl_endian endian, u64 val);
> + enum ocxl_endian endian, u64 val);
>
> /**
> - * Set bits in a 32 bit global MMIO register
> - *
> + * ocxl_global_mmio_set32() - Set bits in a 32 bit global MMIO register
> * @afu: The AFU
> * @offset: The Offset from the start of MMIO
> * @endian: the endianness that the MMIO data is in
> @@ -303,11 +286,10 @@ int ocxl_global_mmio_write64(struct ocxl_afu *afu, size_t offset,
> * Returns 0 for success, negative on error
> */
> int ocxl_global_mmio_set32(struct ocxl_afu *afu, size_t offset,
> - enum ocxl_endian endian, u32 mask);
> + enum ocxl_endian endian, u32 mask);
>
> /**
> - * Set bits in a 64 bit global MMIO register
> - *
> + * ocxl_global_mmio_set64() - Set bits in a 64 bit global MMIO register
> * @afu: The AFU
> * @offset: The Offset from the start of MMIO
> * @endian: the endianness that the MMIO data is in
> @@ -316,11 +298,10 @@ int ocxl_global_mmio_set32(struct ocxl_afu *afu, size_t offset,
> * Returns 0 for success, negative on error
> */
> int ocxl_global_mmio_set64(struct ocxl_afu *afu, size_t offset,
> - enum ocxl_endian endian, u64 mask);
> + enum ocxl_endian endian, u64 mask);
>
> /**
> - * Set bits in a 32 bit global MMIO register
> - *
> + * ocxl_global_mmio_clear32() - Set bits in a 32 bit global MMIO register
> * @afu: The AFU
> * @offset: The Offset from the start of MMIO
> * @endian: the endianness that the MMIO data is in
> @@ -329,11 +310,10 @@ int ocxl_global_mmio_set64(struct ocxl_afu *afu, size_t offset,
> * Returns 0 for success, negative on error
> */
> int ocxl_global_mmio_clear32(struct ocxl_afu *afu, size_t offset,
> - enum ocxl_endian endian, u32 mask);
> + enum ocxl_endian endian, u32 mask);
>
> /**
> - * Set bits in a 64 bit global MMIO register
> - *
> + * ocxl_global_mmio_clear64() - Set bits in a 64 bit global MMIO register
> * @afu: The AFU
> * @offset: The Offset from the start of MMIO
> * @endian: the endianness that the MMIO data is in
> @@ -342,7 +322,7 @@ int ocxl_global_mmio_clear32(struct ocxl_afu *afu, size_t offset,
> * Returns 0 for success, negative on error
> */
> int ocxl_global_mmio_clear64(struct ocxl_afu *afu, size_t offset,
> - enum ocxl_endian endian, u64 mask);
> + enum ocxl_endian endian, u64 mask);
>
> // Functions left here are for compatibility with the cxlflash driver
>
> --
> 2.24.1
>

2020-04-01 09:18:59

by Dan Williams

Subject: Re: [PATCH v4 00/25] Add support for OpenCAPI Persistent Memory devices

On Sun, Mar 29, 2020 at 10:23 PM Alastair D'Silva <[email protected]> wrote:
>
> This series adds support for OpenCAPI Persistent Memory devices on bare metal (arch/powernv), exposing them as nvdimms so that we can make use of the existing infrastructure. There already exists a driver for the same devices abstracted through PowerVM (arch/pseries): arch/powerpc/platforms/pseries/papr_scm.c
>
> These devices are connected via OpenCAPI, and present as LPC (lowest coherence point) memory to the system, practically, that means that memory on these cards could be treated as conventional, cache-coherent memory.
>
> Since the devices are connected via OpenCAPI, they are not enumerated via ACPI. Instead, OpenCAPI links present as pseudo-PCI bridges, with devices below them.
>
> This series introduces a driver that exposes the memory on these cards as nvdimms, with each card getting its own bus. This is somewhat complicated by the fact that the cards do not have out of band persistent storage for metadata, so 1 SECTION_SIZE's (see SPARSEMEM) worth of storage is carved out of the top of the card storage to implement the ndctl_config_* calls.

Is it really tied to section-size? Can't that change based on the
configured page-size? It's not clear to me why that would be the
choice, but I'll dig into the implementation.

> The driver is not responsible for configuring the NPU (NVLink Processing Unit) BARs to map the LPC memory from the card into the system's physical address space, instead, it requests this to be done via OPAL calls (typically implemented by Skiboot).

Are OPAL calls similar to ACPI DSMs? I.e. methods for the OS to invoke
platform firmware services? What's Skiboot?

>
> The series is structured as follows:
> - Required infrastructure changes & cleanup
> - A minimal driver implementation
> - Implementing additional features within the driver

Thanks for the intro and the changelog!

>
> Changelog:
> V4:
> - Rebase on next-20200320

Do you have dependencies on other material that's in -next? Otherwise
-next is only a viable development baseline if you are going to merge
through Andrew's tree.

> - Bump copyright to 2020
> - Ensure all uapi headers use C89 compatible comments (missed ocxlpmem.h)
> - Move the driver back to drivers/nvdimm/ocxl, after confirmation
> that this location is desirable
> - Rename ocxl.c to ocxlpmem.c (+ support files)
> - Rename all ocxl_pmem to ocxlpmem
> - Address checkpatch --strict issues
> - "powerpc/powernv: Add OPAL calls for LPC memory alloc/release"
> - Pass base address as __be64
> - "ocxl: Tally up the LPC memory on a link & allow it to be mapped"
> - Address checkpatch spacing warnings
> - Reword blurb
> - Reword size description for ocxl_link_add_lpc_mem()
> - Add an early exit in ocxl_link_lpc_release() to avoid triggering
> bogus warnings if called after ocxl_link_lpc_map() fails
> - "powerpc/powernv: Add OPAL calls for LPC memory alloc/release"
> - Reword blurb
> - "powerpc/powernv: Map & release OpenCAPI LPC memory"
> - Reword blurb
> - Move minor_idr init from file_init() to ocxlpmem_init() (fixes runtime error
> in "nvdimm: Add driver for OpenCAPI Persistent Memory")
> - Wrap long lines
> - "nvdimm: Add driver for OpenCAPI Storage Class Memory"
> - Remove '+ 1' workaround from serial number->cookie assignment
> - Drop out of memory message for ocxlpmem in probe()
> - Fix leaks of ocxlpmem & ocxlpmem->ocxl_fn in probe()
> - remove struct ocxlpmem_function0, it didn't value add
> - factor out err_unregistered label in probe
> - Address more checkpatch warnings
> - get/put the pci dev on probe/free
> - Drop ocxlpmem_ prefix from static functions
> - Propagate errors up from called functions in probe()
> - Set MODULE_LICENSE to GPLv2
> - Add myself as module author
> - Call nvdimm_bus_unregister() in remove() to release references
> - Don't call devm_memunmap on metadata_address, the release handler on
> the device already deals with this
> - "nvdimm/ocxl: Read the capability registers & wait for device ready"
> - Fix mask for read_latency
> - Fold in is_usable logic into timeout to remove error message race
> - propagate bad rc from read_device_metadata
> - "nvdimm/ocxl: Add register addresses & status values to the header"
> - Add comments for register abbreviations where names have been
> expanded
> - Add missing status for blocked on background task
> - Alias defines for firmware update status to show that the duplication
> of values is intentional
> - "nvdimm/ocxl: Register a character device for userspace to interact with"
> - Add lock around minors IDR, delete the cdev before device_unregister
> - Propagate errors up from called functions in probe()
> - "nvdimm/ocxl: Add support for Admin commands"
> - Fix typo in setup_command_data error message, and drop 'ocxl' from it
> - Drop vestigial CHI read from admin_command_request
> - Change command ID mismatch message to dev_err, and return an error
> - Use jiffies to implement admin_command_complete_timeout()
> - Flesh out blurb
> - Create a wrapper to issue the command & wait for timeout
> - "nvdimm/ocxl: Add support for near storage commands"
> - dropped (will submit with the patches for nvdimm overwrite)
> - "nvdimm/ocxl: Implement the Read Error Log command"
> - Remove stray blank line
> - change misplaced goto to an early exit in read_error_log
> - Inline error_log_offset_0x08
> - Read WWID data as LE rather than host endian
> - Move the include of nvdimm/ocxlpmem.h to ocxl.c
> - Add padding after fwrevision in struct ioctl_ocxl_pmem_error_log
> - Register IOCTL magic
> - Coerce pointers to __u64 in IOCTLs
> - "nvdimm/ocxl: Add controller dump IOCTLs"
> - Coerce pointers to __u64 in IOCTLs
> - Document expected IOCTL usage in blurb
> - Add missing rc check
> - Only populate up to the number of bytes returned by the card,
> and return this length to the caller
> - Add missing header check
> - "nvdimm/ocxl: Add an IOCTL to report controller statistics"
> - Update to match the latest version of the spec
> - Verify that parameter block IDs & lengths match what we expect
> - Use defines for offsets
> - "nvdimm/ocxl: Forward events to userspace"
> - Don't enable NSCRA doorbell
> - return -EBUSY if the event context is already used
> - return -ENODEV if IRQs cannot be mapped
> - Tag IRQ pointers with __iomem
> - Drop ocxlpmem_ prefix from static functions
> - Propagate error from eventfd_ctx_fdget
> - Fix error check in copy_to_user
> - Drop GLOBAL_MMIO_CHI_NSCRA (this should be in the overwrite patch)
> - Drop unused irq_pgmap
> - Don't redef BIT_ULL
> - "nvdimm/ocxl: Add debug IOCTLs"
> - Eliminate clearing loop (now done in admin_command_execute())
> - Drop dummy IOCTLs if CONFIG_OCXL_PMEM_DEBUG is not set
> - Group debug IOCTLs together & comment that they may not be available
> - "nvdimm/ocxl: Expose SMART data via ndctl"
> - Drop 'rc = 0; goto out;'
> - Propagate errors from ndctl_smart()
> - "nvdimm/ocxl: Expose the serial number in sysfs" & "nvdimm/ocxl: Expose the firmware version in sysfs"
> - Squash these 2 patches together
> - Expose data as a DIMM attribute rather than an ocxlpmem
> attribute
> - "nvdimm/ocxl: Add an IOCTL to request controller health & perf data"
> - Reword blurb
> - "nvdimm/ocxl: Implement the heartbeat command"
> - Propagate rc in probe()
>
> V3:
> - Rebase against next/next-20200220
> - Move driver to arch/powerpc/platforms/powernv, we now expect this
> driver to go upstream via the powerpc tree
> - "nvdimm/ocxl: Implement the Read Error Log command"
> - Fix bad header path
> - "nvdimm/ocxl: Read the capability registers & wait for device ready"
> - Fix overlapping masks between readiness_timeout & memory_available_timeout
> - "nvdimm: Add driver for OpenCAPI Storage Class Memory"
> - Address minor review comments from Jonathan Cameron
> - Remove attributes
> - Default to module if building LIBNVDIMM
> - Propagate errors up from called functions in probe()
> - "nvdimm/ocxl: Expose SMART data via ndctl"
> - Pack attributes in struct
> - Support different size SMART buffers for compatibility with newer
> ndctls that may want more SMART attribs than we provide
> - Rework to use ND_CMD_CALL instead of ND_CMD_SMART
> - drop "ocxl: Free detached contexts in ocxl_context_detach_all()"
> - "powerpc: Map & release OpenCAPI LPC memory"
> - Remove 'extern'
> - Only available with CONFIG_MEMORY_HOTPLUG_SPARSE
> - "ocxl: Tally up the LPC memory on a link & allow it to be mapped"
> - Address minor review comments from Jonathan Cameron
> - "ocxl: Add functions to map/unmap LPC memory"
> - Split detected memory message into a separate patch
> - Address minor review comments from Jonathan Cameron
> - Add a comment explaining why unmap_lpc_mem is in deconfigure_afu
> - "nvdimm/ocxl: Add support for Admin commands"
> - use sizeof(u64) rather than 0x08 when iterating u64s
> - "nvdimm/ocxl: Implement the heartbeat command"
> - Fix typo in blurb
> - Address kernel doc issues
> - Ensure all uapi headers use C89 compatible comments
> - Drop patches for firmware update & overwrite, these will be
> submitted later once patches are available for ndctl
> - Rename SCM to OpenCAPI Persistent Memory
>
> V2:
> - "powerpc: Map & release OpenCAPI LPC memory"
> - Fix #if -> #ifdef
> - use pci_dev_id to get the bdfn
> - use __be64 to hold be data
> - indent check_hotplug_memory_addressable correctly
> - Remove export of check_hotplug_memory_addressable
> - "ocxl: Conditionally bind SCM devices to the generic OCXL driver"
> - Improve patch description and remove redundant default
> - "nvdimm: Add driver for OpenCAPI Storage Class Memory"
> - Mark a few funcs as static as identified by the 0day bot
> - Add OCXL dependencies to OCXL_SCM
> - Use memcpy_mcsafe in scm_ndctl_config_read
> - Rename scm_foo_offset_0x00 to scm_foo_header_parse & add docs
> - Name DIMM attribs "ocxl" rather than "scm"
> - Split out into base + many feature patches
> - "powerpc: Enable OpenCAPI Storage Class Memory driver on bare metal"
> - Build DEV_DAX & friends as modules
> - "ocxl: Conditionally bind SCM devices to the generic OCXL driver"
> - Patch dropped (easy enough to maintain this out of tree for development)
> - "ocxl: Tally up the LPC memory on a link & allow it to be mapped"
> - Add a warning if an unmatched lpc_release is called
> - "ocxl: Add functions to map/unmap LPC memory"
> - Use EXPORT_SYMBOL_GPL
>
>
> Alastair D'Silva (25):
> powerpc/powernv: Add OPAL calls for LPC memory alloc/release
> mm/memory_hotplug: Allow check_hotplug_memory_addressable to be called
> from drivers
> powerpc/powernv: Map & release OpenCAPI LPC memory
> ocxl: Remove unnecessary externs
> ocxl: Address kernel doc errors & warnings
> ocxl: Tally up the LPC memory on a link & allow it to be mapped
> ocxl: Add functions to map/unmap LPC memory
> ocxl: Emit a log message showing how much LPC memory was detected
> ocxl: Save the device serial number in ocxl_fn
> nvdimm: Add driver for OpenCAPI Persistent Memory
> powerpc: Enable the OpenCAPI Persistent Memory driver for
> powernv_defconfig
> nvdimm/ocxl: Add register addresses & status values to the header
> nvdimm/ocxl: Read the capability registers & wait for device ready
> nvdimm/ocxl: Add support for Admin commands
> nvdimm/ocxl: Register a character device for userspace to interact
> with
> nvdimm/ocxl: Implement the Read Error Log command
> nvdimm/ocxl: Add controller dump IOCTLs
> nvdimm/ocxl: Add an IOCTL to report controller statistics
> nvdimm/ocxl: Forward events to userspace
> nvdimm/ocxl: Add an IOCTL to request controller health & perf data
> nvdimm/ocxl: Implement the heartbeat command
> nvdimm/ocxl: Add debug IOCTLs
> nvdimm/ocxl: Expose SMART data via ndctl
> nvdimm/ocxl: Expose the serial number & firmware version in sysfs
> MAINTAINERS: Add myself & nvdimm/ocxl to ocxl
>
> .../userspace-api/ioctl/ioctl-number.rst | 1 +
> MAINTAINERS | 3 +
> arch/powerpc/configs/powernv_defconfig | 5 +
> arch/powerpc/include/asm/opal-api.h | 2 +
> arch/powerpc/include/asm/opal.h | 2 +
> arch/powerpc/include/asm/pnv-ocxl.h | 42 +-
> arch/powerpc/platforms/powernv/ocxl.c | 43 +
> arch/powerpc/platforms/powernv/opal-call.c | 2 +
> drivers/misc/ocxl/config.c | 74 +-
> drivers/misc/ocxl/core.c | 61 +
> drivers/misc/ocxl/link.c | 60 +
> drivers/misc/ocxl/ocxl_internal.h | 45 +-
> drivers/nvdimm/Kconfig | 2 +
> drivers/nvdimm/Makefile | 1 +
> drivers/nvdimm/ocxl/Kconfig | 21 +
> drivers/nvdimm/ocxl/Makefile | 7 +
> drivers/nvdimm/ocxl/main.c | 1975 +++++++++++++++++
> drivers/nvdimm/ocxl/ocxlpmem.h | 197 ++
> drivers/nvdimm/ocxl/ocxlpmem_internal.c | 280 +++
> include/linux/memory_hotplug.h | 5 +
> include/misc/ocxl.h | 122 +-
> include/uapi/linux/ndctl.h | 1 +
> include/uapi/nvdimm/ocxlpmem.h | 127 ++
> mm/memory_hotplug.c | 4 +-
> 24 files changed, 2983 insertions(+), 99 deletions(-)
> create mode 100644 drivers/nvdimm/ocxl/Kconfig
> create mode 100644 drivers/nvdimm/ocxl/Makefile
> create mode 100644 drivers/nvdimm/ocxl/main.c
> create mode 100644 drivers/nvdimm/ocxl/ocxlpmem.h
> create mode 100644 drivers/nvdimm/ocxl/ocxlpmem_internal.c
> create mode 100644 include/uapi/nvdimm/ocxlpmem.h
>
> --
> 2.24.1
>

2020-04-01 19:37:40

by Dan Williams

Subject: Re: [PATCH v4 10/25] nvdimm: Add driver for OpenCAPI Persistent Memory

On Wed, Apr 1, 2020 at 1:49 AM Dan Williams <[email protected]> wrote:
>
> On Sun, Mar 29, 2020 at 10:23 PM Alastair D'Silva <[email protected]> wrote:
> >
> > This driver exposes LPC memory on OpenCAPI pmem cards
> > as an NVDIMM, allowing the existing nvram infrastructure
> > to be used.
> >
> > Namespace metadata is stored on the media itself, so
> > scm_reserve_metadata() maps 1 section's worth of PMEM storage
> > at the start to hold this. The rest of the PMEM range is registered
> > with libnvdimm as an nvdimm. ndctl_config_read/write/size() provide
> > callbacks to libnvdimm to access the metadata.
> >
> > Signed-off-by: Alastair D'Silva <[email protected]>
> > ---
> > drivers/nvdimm/Kconfig | 2 +
> > drivers/nvdimm/Makefile | 1 +
> > drivers/nvdimm/ocxl/Kconfig | 15 ++
> > drivers/nvdimm/ocxl/Makefile | 7 +
> > drivers/nvdimm/ocxl/main.c | 476 +++++++++++++++++++++++++++++++++
> > drivers/nvdimm/ocxl/ocxlpmem.h | 23 ++
> > 6 files changed, 524 insertions(+)
> > create mode 100644 drivers/nvdimm/ocxl/Kconfig
> > create mode 100644 drivers/nvdimm/ocxl/Makefile
> > create mode 100644 drivers/nvdimm/ocxl/main.c
> > create mode 100644 drivers/nvdimm/ocxl/ocxlpmem.h
> >
> > diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig
> > index b7d1eb38b27d..368328637182 100644
> > --- a/drivers/nvdimm/Kconfig
> > +++ b/drivers/nvdimm/Kconfig
> > @@ -131,4 +131,6 @@ config NVDIMM_TEST_BUILD
> > core devm_memremap_pages() implementation and other
> > infrastructure.
> >
> > +source "drivers/nvdimm/ocxl/Kconfig"
> > +
> > endif
> > diff --git a/drivers/nvdimm/Makefile b/drivers/nvdimm/Makefile
> > index 29203f3d3069..bc02be11c794 100644
> > --- a/drivers/nvdimm/Makefile
> > +++ b/drivers/nvdimm/Makefile
> > @@ -33,3 +33,4 @@ libnvdimm-$(CONFIG_NVDIMM_KEYS) += security.o
> > TOOLS := ../../tools
> > TEST_SRC := $(TOOLS)/testing/nvdimm/test
> > obj-$(CONFIG_NVDIMM_TEST_BUILD) += $(TEST_SRC)/iomap.o
> > +obj-$(CONFIG_LIBNVDIMM) += ocxl/
> > diff --git a/drivers/nvdimm/ocxl/Kconfig b/drivers/nvdimm/ocxl/Kconfig
> > new file mode 100644
> > index 000000000000..c5d927520920
> > --- /dev/null
> > +++ b/drivers/nvdimm/ocxl/Kconfig
> > @@ -0,0 +1,15 @@
> > +# SPDX-License-Identifier: GPL-2.0-only
> > +if LIBNVDIMM
> > +
> > +config OCXL_PMEM
> > + tristate "OpenCAPI Persistent Memory"
> > + depends on LIBNVDIMM && PPC_POWERNV && PCI && EEH && ZONE_DEVICE && OCXL
>
> > Does OCXL_PMEM itself have any CONFIG_ZONE_DEVICE dependencies? That's
> more a function of CONFIG_DEV_DAX and CONFIG_FS_DAX. Doesn't OCXL
> already depend on CONFIG_PCI?
>
>
> > + help
> > + Exposes devices that implement the OpenCAPI Storage Class Memory
> > + specification as persistent memory regions. You may also want
> > + DEV_DAX, DEV_DAX_PMEM & FS_DAX if you plan on using DAX devices
> > + stacked on top of this driver.
> > +
> > + Select N if unsure.
> > +
> > +endif
> > diff --git a/drivers/nvdimm/ocxl/Makefile b/drivers/nvdimm/ocxl/Makefile
> > new file mode 100644
> > index 000000000000..e0e8ade1987a
> > --- /dev/null
> > +++ b/drivers/nvdimm/ocxl/Makefile
> > @@ -0,0 +1,7 @@
> > +# SPDX-License-Identifier: GPL-2.0
> > +
> > +ccflags-$(CONFIG_PPC_WERROR) += -Werror
> > +
> > +obj-$(CONFIG_OCXL_PMEM) += ocxlpmem.o
> > +
> > +ocxlpmem-y := main.o
> > \ No newline at end of file
> > diff --git a/drivers/nvdimm/ocxl/main.c b/drivers/nvdimm/ocxl/main.c
> > new file mode 100644
> > index 000000000000..c0066fedf9cc
> > --- /dev/null
> > +++ b/drivers/nvdimm/ocxl/main.c
> > @@ -0,0 +1,476 @@
> > +// SPDX-License-Identifier: GPL-2.0+
> > +// Copyright 2020 IBM Corp.
> > +
> > +/*
> > + * A driver for OpenCAPI devices that implement the Storage Class
> > + * Memory specification.
> > + */
> > +
> > +#include <linux/module.h>
> > +#include <misc/ocxl.h>
> > +#include <linux/ndctl.h>
> > +#include <linux/mm_types.h>
> > +#include <linux/memory_hotplug.h>
> > +#include "ocxlpmem.h"
> > +
> > +static const struct pci_device_id pci_tbl[] = {
> > + { PCI_DEVICE(PCI_VENDOR_ID_IBM, 0x0625), },
> > + { }
> > +};
> > +
> > +MODULE_DEVICE_TABLE(pci, pci_tbl);
> > +
> > +#define NUM_MINORS 256 // Total to reserve
> > +
> > +static dev_t ocxlpmem_dev;
> > +static struct class *ocxlpmem_class;
> > +static struct mutex minors_idr_lock;
> > +static struct idr minors_idr;
> > +
> > +/**
> > + * ndctl_config_write() - Handle a ND_CMD_SET_CONFIG_DATA command from ndctl
> > + * @ocxlpmem: the device metadata
> > + * @command: the incoming data to write
> > + * Return: 0 on success, negative on failure
> > + */
> > +static int ndctl_config_write(struct ocxlpmem *ocxlpmem,
> > + struct nd_cmd_set_config_hdr *command)
> > +{
> > + if (command->in_offset + command->in_length > LABEL_AREA_SIZE)
> > + return -EINVAL;
> > +
> > + memcpy_flushcache(ocxlpmem->metadata_addr + command->in_offset,
> > + command->in_buf, command->in_length);
> > +
> > + return 0;
> > +}
> > +
> > +/**
> > + * ndctl_config_read() - Handle a ND_CMD_GET_CONFIG_DATA command from ndctl
> > + * @ocxlpmem: the device metadata
> > + * @command: the read request
> > + * Return: 0 on success, negative on failure
> > + */
> > +static int ndctl_config_read(struct ocxlpmem *ocxlpmem,
> > + struct nd_cmd_get_config_data_hdr *command)
> > +{
> > + if (command->in_offset + command->in_length > LABEL_AREA_SIZE)
> > + return -EINVAL;
> > +
> > + memcpy_mcsafe(command->out_buf,
> > + ocxlpmem->metadata_addr + command->in_offset,
> > + command->in_length);
> > +
> > + return 0;
> > +}
> > +
> > +/**
> > + * ndctl_config_size() - Handle a ND_CMD_GET_CONFIG_SIZE command from ndctl
> > + * @command: the read request
> > + * Return: 0 on success, negative on failure
> > + */
> > +static int ndctl_config_size(struct nd_cmd_get_config_size *command)
> > +{
> > + command->status = 0;
> > + command->config_size = LABEL_AREA_SIZE;
> > + command->max_xfer = PAGE_SIZE;
> > +
> > + return 0;
> > +}
> > +
> > +static int ndctl(struct nvdimm_bus_descriptor *nd_desc,
> > + struct nvdimm *nvdimm,
> > + unsigned int cmd, void *buf, unsigned int buf_len, int *cmd_rc)
> > +{
> > + struct ocxlpmem *ocxlpmem = container_of(nd_desc,
> > + struct ocxlpmem, bus_desc);
> > +
> > + switch (cmd) {
> > + case ND_CMD_GET_CONFIG_SIZE:
> > + *cmd_rc = ndctl_config_size(buf);
> > + return 0;
> > +
> > + case ND_CMD_GET_CONFIG_DATA:
> > + *cmd_rc = ndctl_config_read(ocxlpmem, buf);
> > + return 0;
> > +
> > + case ND_CMD_SET_CONFIG_DATA:
> > + *cmd_rc = ndctl_config_write(ocxlpmem, buf);
> > + return 0;
> > +
> > + default:
> > + return -ENOTTY;
> > + }
> > +}
> > +
> > +/**
> > + * reserve_metadata() - Reserve space for nvdimm metadata
> > + * @ocxlpmem: the device metadata
> > + * @lpc_mem: The resource representing the LPC memory of the OpenCAPI device
> > + */
> > +static int reserve_metadata(struct ocxlpmem *ocxlpmem,
> > + struct resource *lpc_mem)
> > +{
> > + ocxlpmem->metadata_addr = devm_memremap(&ocxlpmem->dev, lpc_mem->start,
> > + LABEL_AREA_SIZE, MEMREMAP_WB);
> > + if (IS_ERR(ocxlpmem->metadata_addr))
> > + return PTR_ERR(ocxlpmem->metadata_addr);
> > +
> > + return 0;
> > +}
> > +
> > +/**
> > + * register_lpc_mem() - Discover persistent memory on a device and register it with the NVDIMM subsystem
> > + * @ocxlpmem: the device metadata
> > + * Return: 0 on success
> > + */
> > +static int register_lpc_mem(struct ocxlpmem *ocxlpmem)
> > +{
> > + struct nd_region_desc region_desc;
> > + struct nd_mapping_desc nd_mapping_desc;
> > + struct resource *lpc_mem;
> > + const struct ocxl_afu_config *config;
> > + const struct ocxl_fn_config *fn_config;
> > + int rc;
> > + unsigned long nvdimm_cmd_mask = 0;
> > + unsigned long nvdimm_flags = 0;
> > + int target_node;
> > + char serial[16 + 1];
> > +
> > + // Set up the reserved metadata area
> > + rc = ocxl_afu_map_lpc_mem(ocxlpmem->ocxl_afu);
> > + if (rc < 0)
> > + return rc;
> > +
> > + lpc_mem = ocxl_afu_lpc_mem(ocxlpmem->ocxl_afu);
> > + if (!lpc_mem || !lpc_mem->start)
> > + return -EINVAL;
> > +
> > + config = ocxl_afu_config(ocxlpmem->ocxl_afu);
> > + fn_config = ocxl_function_config(ocxlpmem->ocxl_fn);
> > +
> > + rc = reserve_metadata(ocxlpmem, lpc_mem);
> > + if (rc)
> > + return rc;
> > +
> > + ocxlpmem->bus_desc.provider_name = "ocxlpmem";
> > + ocxlpmem->bus_desc.ndctl = ndctl;
> > + ocxlpmem->bus_desc.module = THIS_MODULE;
> > +
> > + ocxlpmem->nvdimm_bus = nvdimm_bus_register(&ocxlpmem->dev,
> > + &ocxlpmem->bus_desc);
> > + if (!ocxlpmem->nvdimm_bus)
> > + return -EINVAL;
> > +
> > + ocxlpmem->pmem_res.start = (u64)lpc_mem->start + LABEL_AREA_SIZE;
> > + ocxlpmem->pmem_res.end = (u64)lpc_mem->start + config->lpc_mem_size - 1;
> > + ocxlpmem->pmem_res.name = "OpenCAPI persistent memory";
> > +
> > + set_bit(ND_CMD_GET_CONFIG_SIZE, &nvdimm_cmd_mask);
> > + set_bit(ND_CMD_GET_CONFIG_DATA, &nvdimm_cmd_mask);
> > + set_bit(ND_CMD_SET_CONFIG_DATA, &nvdimm_cmd_mask);
> > +
> > + set_bit(NDD_ALIASING, &nvdimm_flags);
> > +
> > + snprintf(serial, sizeof(serial), "%llx", fn_config->serial);
> > + nd_mapping_desc.nvdimm = nvdimm_create(ocxlpmem->nvdimm_bus, ocxlpmem,
> > + NULL, nvdimm_flags,
> > + nvdimm_cmd_mask, 0, NULL);
> > + if (!nd_mapping_desc.nvdimm)
> > + return -ENOMEM;
> > +
> > + if (nvdimm_bus_check_dimm_count(ocxlpmem->nvdimm_bus, 1))
> > + return -EINVAL;
> > +
> > + nd_mapping_desc.start = ocxlpmem->pmem_res.start;
> > + nd_mapping_desc.size = resource_size(&ocxlpmem->pmem_res);
> > + nd_mapping_desc.position = 0;
> > +
> > + ocxlpmem->nd_set.cookie1 = fn_config->serial;
> > + ocxlpmem->nd_set.cookie2 = fn_config->serial;
> > +
> > + target_node = of_node_to_nid(ocxlpmem->pdev->dev.of_node);
> > +
> > + memset(&region_desc, 0, sizeof(region_desc));
> > + region_desc.res = &ocxlpmem->pmem_res;
> > + region_desc.numa_node = NUMA_NO_NODE;
> > + region_desc.target_node = target_node;
> > + region_desc.num_mappings = 1;
> > + region_desc.mapping = &nd_mapping_desc;
> > + region_desc.nd_set = &ocxlpmem->nd_set;
> > +
> > + set_bit(ND_REGION_PAGEMAP, &region_desc.flags);
> > + /*
> > + * NB: libnvdimm copies the data from ndr_desc into it's own
> > + * structures so passing a stack pointer is fine.
> > + */
> > + ocxlpmem->nd_region = nvdimm_pmem_region_create(ocxlpmem->nvdimm_bus,
> > + &region_desc);
> > + if (!ocxlpmem->nd_region)
> > + return -EINVAL;
> > +
> > + dev_info(&ocxlpmem->dev,
> > + "Onlining %lluMB of persistent memory\n",
> > + nd_mapping_desc.size / SZ_1M);
> > +
> > + return 0;
> > +}
> > +
> > +/**
> > + * allocate_minor() - Allocate a minor number to use for an OpenCAPI pmem device
> > + * @ocxlpmem: the device metadata
> > + * Return: the allocated minor number
> > + */
> > +static int allocate_minor(struct ocxlpmem *ocxlpmem)
> > +{
> > + int minor;
> > +
> > + mutex_lock(&minors_idr_lock);
> > + minor = idr_alloc(&minors_idr, ocxlpmem, 0, NUM_MINORS, GFP_KERNEL);
> > + mutex_unlock(&minors_idr_lock);
> > + return minor;
> > +}
> > +
> > +static void free_minor(struct ocxlpmem *ocxlpmem)
> > +{
> > + mutex_lock(&minors_idr_lock);
> > + idr_remove(&minors_idr, MINOR(ocxlpmem->dev.devt));
> > + mutex_unlock(&minors_idr_lock);
> > +}
> > +
> > +/**
> > + * free_ocxlpmem() - Free all members of an ocxlpmem struct
> > + * @ocxlpmem: the device struct to clear
> > + */
> > +static void free_ocxlpmem(struct ocxlpmem *ocxlpmem)
> > +{
> > + int rc;
> > +
> > + free_minor(ocxlpmem);
> > +
> > + if (ocxlpmem->ocxl_context) {
> > + rc = ocxl_context_detach(ocxlpmem->ocxl_context);
> > + if (rc == -EBUSY)
> > + dev_warn(&ocxlpmem->dev, "Timeout detaching ocxl context\n");
> > + else
> > + ocxl_context_free(ocxlpmem->ocxl_context);
> > + }
> > +
> > + if (ocxlpmem->ocxl_afu)
> > + ocxl_afu_put(ocxlpmem->ocxl_afu);
> > +
> > + if (ocxlpmem->ocxl_fn)
> > + ocxl_function_close(ocxlpmem->ocxl_fn);
> > +
> > + pci_dev_put(ocxlpmem->pdev);
> > +
> > + kfree(ocxlpmem);
> > +}
> > +
> > +/**
> > + * free_ocxlpmem_dev() - Free an OpenCAPI persistent memory device
> > + * @dev: The device struct
> > + */
> > +static void free_ocxlpmem_dev(struct device *dev)
> > +{
> > + struct ocxlpmem *ocxlpmem = container_of(dev, struct ocxlpmem, dev);
> > +
> > + free_ocxlpmem(ocxlpmem);
> > +}
> > +
> > +/**
> > + * ocxlpmem_register() - Register an OpenCAPI pmem device with the kernel
> > + * @ocxlpmem: the device metadata
> > + * Return: 0 on success, negative on failure
> > + */
> > +static int ocxlpmem_register(struct ocxlpmem *ocxlpmem)
> > +{
> > + int rc;
> > + int minor = allocate_minor(ocxlpmem);
> > +
> > + if (minor < 0)
> > + return minor;
> > +
> > + ocxlpmem->dev.release = free_ocxlpmem_dev;
> > + rc = dev_set_name(&ocxlpmem->dev, "ocxlpmem%d", minor);
> > + if (rc < 0)
> > + return rc;
> > +
> > + ocxlpmem->dev.devt = MKDEV(MAJOR(ocxlpmem_dev), minor);
> > + ocxlpmem->dev.class = ocxlpmem_class;
> > + ocxlpmem->dev.parent = &ocxlpmem->pdev->dev;
> > +
> > + return device_register(&ocxlpmem->dev);
>
> You lost me, why does this need its own ioctl path and device node?
> I'd expect you'd tunnel the commands you need through the existing
> infrastructure ideally with an eye for cross-arch compatibility. This
> is a fundamental concern that probably has significant implications
> for what follows.

At a minimum this device registration should move to the end of the
patchset once all the libnvdimm generic enabling is done. Even then I
think the libnvdimm core has all the hooks you need to provide device
specific enabling *through* the existing interfaces. Have a look at
how the ACPI NFIT bus driver provides "nfit/" scoped sysfs
attributes and tunnels "struct nd_cmd_pkg" ioctl invocations to either
the bus or "dimm" level device objects. The dimm security model is an
example of an implementation that wraps a libnvdimm-generic sysfs
front-end to an intel-specific-device backend. It would be unfortunate
to need to invent another ABI.
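
As a rough illustration of the tunnelling idea above, here is a userspace
mock of an envelope-style command dispatcher. The struct is only loosely
modelled on "struct nd_cmd_pkg"; the names pkg_hdr, EXAMPLE_FAMILY,
EXAMPLE_CMD_FW_VERSION and dispatch_pkg() are hypothetical, and so is the
convention that output bytes follow the input bytes in the payload. This is
a sketch under those assumptions, not the libnvdimm implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical envelope, loosely modelled on struct nd_cmd_pkg. */
struct pkg_hdr {
	uint64_t family;	/* which vendor/bus command set */
	uint64_t command;	/* command number within that family */
	uint32_t size_in;	/* bytes of input in payload[] */
	uint32_t size_out;	/* bytes of output space after the input */
	uint32_t reserved[9];	/* must be zero */
	uint32_t fw_size;	/* bytes the backend wanted to return */
	unsigned char payload[];
};

#define EXAMPLE_FAMILY		0x42ULL	/* hypothetical family id */
#define EXAMPLE_CMD_FW_VERSION	1ULL	/* hypothetical command */

/*
 * Validate the envelope, then route the command to a device-specific
 * backend. Returns 0 on success or a negative errno value.
 */
static int dispatch_pkg(struct pkg_hdr *pkg, size_t buf_len)
{
	static const char fw_version[] = "1.0";	/* made-up backend data */
	size_t i;

	assert(pkg != NULL);

	if (buf_len < sizeof(*pkg))
		return -EINVAL;
	if (pkg->family != EXAMPLE_FAMILY)
		return -ENOTTY;
	for (i = 0; i < 9; i++)
		if (pkg->reserved[i])
			return -EINVAL;
	/* Input plus output must fit in the space after the header. */
	if ((uint64_t)pkg->size_in + pkg->size_out > buf_len - sizeof(*pkg))
		return -EINVAL;

	switch (pkg->command) {
	case EXAMPLE_CMD_FW_VERSION:
		pkg->fw_size = sizeof(fw_version);
		if (pkg->size_out < sizeof(fw_version))
			return -ENOSPC;
		memcpy(pkg->payload + pkg->size_in, fw_version,
		       sizeof(fw_version));
		return 0;
	default:
		return -ENOTTY;
	}
}
```

The point of the shape is that the generic layer only needs to understand
the header; everything device-specific lives behind the family/command
pair, so no new device node or ioctl number is required.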

2020-04-01 20:29:54

by Dan Williams

Subject: Re: [PATCH v4 12/25] nvdimm/ocxl: Add register addresses & status values to the header

On Sun, Mar 29, 2020 at 10:53 PM Alastair D'Silva <[email protected]> wrote:
>
> These values have been taken from the device specifications.

Link to specification?

2020-04-01 20:30:14

by Dan Williams

Subject: Re: [PATCH v4 11/25] powerpc: Enable the OpenCAPI Persistent Memory driver for powernv_defconfig

On Sun, Mar 29, 2020 at 10:23 PM Alastair D'Silva <[email protected]> wrote:
>
> This patch enables the OpenCAPI Persistent Memory driver, as well
> as DAX support, for the 'powernv' defconfig.
>
> DAX is not a strict requirement for the functioning of the driver, but it
> is likely that a user will want to create a DAX device on top of their
> persistent memory device.
>
> Signed-off-by: Alastair D'Silva <[email protected]>
> Reviewed-by: Andrew Donnellan <[email protected]>
> ---
> arch/powerpc/configs/powernv_defconfig | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/arch/powerpc/configs/powernv_defconfig b/arch/powerpc/configs/powernv_defconfig
> index 71749377d164..921d77bbd3d2 100644
> --- a/arch/powerpc/configs/powernv_defconfig
> +++ b/arch/powerpc/configs/powernv_defconfig
> @@ -348,3 +348,8 @@ CONFIG_KVM_BOOK3S_64=m
> CONFIG_KVM_BOOK3S_64_HV=m
> CONFIG_VHOST_NET=m
> CONFIG_PRINTK_TIME=y
> +CONFIG_ZONE_DEVICE=y
> +CONFIG_OCXL_PMEM=m
> +CONFIG_DEV_DAX=m
> +CONFIG_DEV_DAX_PMEM=m
> +CONFIG_FS_DAX=y

These options have dependencies. I think it would be better to implement
a top-level configuration question called something like
PERSISTENT_MEMORY_ALL that goes and selects all the bus providers and
infrastructure and lets other defaults follow along. For example,
CONFIG_DEV_DAX could grow a "default LIBNVDIMM" and then
CONFIG_DEV_DAX_PMEM would default on as well. If
CONFIG_PERSISTENT_MEMORY_ALL selected all the bus providers and
ZONE_DEVICE then the Kconfig system could prompt you to where the
dependencies are not satisfied.

2020-04-01 22:44:58

by Alastair D'Silva

Subject: RE: [PATCH v4 00/25] Add support for OpenCAPI Persistent Memory devices


> -----Original Message-----
> From: Dan Williams <[email protected]>
> Sent: Wednesday, 1 April 2020 7:48 PM
> To: Alastair D'Silva <[email protected]>
> Cc: Aneesh Kumar K . V <[email protected]>; Oliver O'Halloran
> <[email protected]>; Benjamin Herrenschmidt
> <[email protected]>; Paul Mackerras <[email protected]>; Michael
> Ellerman <[email protected]>; Frederic Barrat <[email protected]>;
> Andrew Donnellan <[email protected]>; Arnd Bergmann
> <[email protected]>; Greg Kroah-Hartman <[email protected]>;
> Vishal Verma <[email protected]>; Dave Jiang
> <[email protected]>; Ira Weiny <[email protected]>; Andrew Morton
> <[email protected]>; Mauro Carvalho Chehab
> <[email protected]>; David S. Miller <[email protected]>;
> Rob Herring <[email protected]>; Anton Blanchard <[email protected]>;
> Krzysztof Kozlowski <[email protected]>; Mahesh Salgaonkar
> <[email protected]>; Madhavan Srinivasan
> <[email protected]>; Cédric Le Goater <[email protected]>; Anju T
> Sudhakar <[email protected]>; Hari Bathini
> <[email protected]>; Thomas Gleixner <[email protected]>; Greg
> Kurz <[email protected]>; Nicholas Piggin <[email protected]>; Masahiro
> Yamada <[email protected]>; Alexey Kardashevskiy
> <[email protected]>; Linux Kernel Mailing List <[email protected]>;
> linuxppc-dev <[email protected]>; linux-nvdimm <linux-
> [email protected]>; Linux MM <[email protected]>
> Subject: Re: [PATCH v4 00/25] Add support for OpenCAPI Persistent Memory
> devices
>
> On Sun, Mar 29, 2020 at 10:23 PM Alastair D'Silva <[email protected]>
> wrote:
> >
> > This series adds support for OpenCAPI Persistent Memory devices on
> > bare metal (arch/powernv), exposing them as nvdimms so that we can
> > make use of the existing infrastructure. There already exists a driver
> > for the same devices abstracted through PowerVM (arch/pseries):
> > arch/powerpc/platforms/pseries/papr_scm.c
> >
> > These devices are connected via OpenCAPI, and present as LPC (lowest
> coherence point) memory to the system, practically, that means that
> memory on these cards could be treated as conventional, cache-coherent
> memory.
> >
> > Since the devices are connected via OpenCAPI, they are not enumerated
> via ACPI. Instead, OpenCAPI links present as pseudo-PCI bridges, with
> devices below them.
> >
> > This series introduces a driver that exposes the memory on these cards as
> nvdimms, with each card getting its own bus. This is somewhat complicated
> by the fact that the cards do not have out of band persistent storage for
> metadata, so 1 SECTION_SIZE's (see SPARSEMEM) worth of storage is carved
> out of the top of the card storage to implement the ndctl_config_* calls.
>
> Is it really tied to section-size? Can't that change based on the configured
> page-size? It's not clear to me why that would be the choice, but I'll dig into
> the implementation.
>

I had tried using PAGE_SIZE, but ran into problems carving off just 1 page and handing it to the kernel, while leaving the rest as pmem. That was a while ago though, so maybe I should retry it.
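
For reference, the carve-out being discussed amounts to the arithmetic
below. This is a minimal userspace sketch, assuming the label area sits at
the start of the LPC range (as the driver patch does with LABEL_AREA_SIZE).
struct range and carve_label_area() are hypothetical names, and the label
size is left as a parameter because it could be a section, a page, or
something else.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct resource (inclusive end). */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Split an LPC range into a label (metadata) area at the start and the
 * remaining pmem range. Returns 0 on success, or -1 if the label area
 * would leave no room for pmem.
 */
static int carve_label_area(const struct range *lpc, uint64_t label_size,
			    struct range *label, struct range *pmem)
{
	uint64_t size;

	assert(lpc->end >= lpc->start);
	size = lpc->end - lpc->start + 1;

	if (label_size == 0 || label_size >= size)
		return -1;

	label->start = lpc->start;
	label->end = lpc->start + label_size - 1;

	/* Everything after the label area is handed over as pmem. */
	pmem->start = lpc->start + label_size;
	pmem->end = lpc->end;

	return 0;
}
```

With a 64MB range and a 16MB label area, for example, the pmem range starts
16MB into the card and runs to the end of the range.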

> > The driver is not responsible for configuring the NPU (NVLink Processing
> Unit) BARs to map the LPC memory from the card into the system's physical
> address space, instead, it requests this to be done via OPAL calls (typically
> implemented by Skiboot).
>
> Are OPAL calls similar to ACPI DSMs? I.e. methods for the OS to invoke
> platform firmware services? What's Skiboot?
>

Yes, OPAL is the interface to firmware for POWER. Skiboot is the open-source (and only) implementation of OPAL.

> >
> > The series is structured as follows:
> > - Required infrastructure changes & cleanup
> > - A minimal driver implementation
> > - Implementing additional features within the driver
>
> Thanks for the intro and the changelog!
>
> >
> > Changelog:
> > V4:
> > - Rebase on next-20200320
>
> Do you have dependencies on other material that's in -next? Otherwise
> -next is only a viable development baseline if you are going to merge through
> Andrew's tree.
>
> > - Bump copyright to 2020
> > - Ensure all uapi headers use C89 compatible comments (missed
> ocxlpmem.h)
> > - Move the driver back to drivers/nvdimm/ocxl, after confirmation
> > that this location is desirable
> > - Rename ocxl.c to ocxlpmem.c (+ support files)
> > - Rename all ocxl_pmem to ocxlpmem
> > - Address checkpatch --strict issues
> > - "powerpc/powernv: Add OPAL calls for LPC memory alloc/release"
> > - Pass base address as __be64
> > - "ocxl: Tally up the LPC memory on a link & allow it to be mapped"
> > - Address checkpatch spacing warnings
> > - Reword blurb
> > - Reword size description for ocxl_link_add_lpc_mem()
> > - Add an early exit in ocxl_link_lpc_release() to avoid triggering
> > bogus warnings if called after ocxl_link_lpc_map() fails
> > - "powerpc/powernv: Add OPAL calls for LPC memory alloc/release"
> > - Reword blurb
> > - "powerpc/powernv: Map & release OpenCAPI LPC memory"
> > - Reword blurb
> > - Move minor_idr init from file_init() to ocxlpmem_init() (fixes runtime
> error
> > in "nvdimm: Add driver for OpenCAPI Persistent Memory")
> > - Wrap long lines
> > - "nvdimm: Add driver for OpenCAPI Storage Class Memory"
> > - Remove '+ 1' workround from serial number->cookie assignment
> > - Drop out of memory message for ocxlpmem in probe()
> > - Fix leaks of ocxlpmem & ocxlpmem->ocxl_fn in probe()
> > - remove struct ocxlpmem_function0, it didn't value add
> > - factor out err_unregistered label in probe
> > - Address more checkpatch warnings
> > - get/put the pci dev on probe/free
> > - Drop ocxlpmem_ prefix from static functions
> > - Propogate errors up from called functions in probe()
> > - Set MODULE_LICENSE to GPLv2
> > - Add myself as module author
> > - Call nvdimm_bus_unregister() in remove() to release references
> > - Don't call devm_memunmap on metadata_address, the release
> handler on
> > the device already deals with this
> > - "nvdimm/ocxl: Read the capability registers & wait for device ready"
> > - Fix mask for read_latency
> > - Fold in is_usable logic into timeout to remove error message race
> > - propogate bad rc from read_device_metadata
> > - "nvdimm/ocxl: Add register addresses & status values to the header"
> > - Add comments for register abbreviations where names have been
> > expanded
> > - Add missing status for blocked on background task
> > - Alias defines for firmware update status to show that the duplication
> > of values is intentional
> > - "nvdimm/ocxl: Register a character device for userspace to interact with"
> > - Add lock around minors IDR, delete the cdev before
> device_unregister
> > - Propogate errors up from called functions in probe()
> > - "nvdimm/ocxl: Add support for Admin commands"
> > - Fix typo in setup_command_data error message, and drop 'ocxl' from
> it
> > - Drop vestigial CHI read from admin_command_request
> > - Change command ID mismatch message to dev_err, and return an
> error
> > - Use jiffies to implement admin_command_complete_timeout()
> > - Flesh out blurb
> > - Create a wrapper to issue the command & wait for timeout
> > - "nvdimm/ocxl: Add support for near storage commands"
> > - dropped (will submit with the patches for nvdimm overwrite)
> > - "nvdimm/ocxl: Implement the Read Error Log command"
> > - Remove stray blank line
> > - change misplaced goto to an early exit in read_error_log
> > - Inline error_log_offset_0x08
> > - Read WWID data as LE rather than host endian
> > - Move the include of nvdimm/ocxlpmem.h to ocxl.c
> > - Add padding after fwrevision in struct ioctl_ocxl_pmem_error_log
> > - Register IOCTL magic
> > - Coerce pointers to __u64 in IOCTLs
> > - "nvdimm/ocxl: Add controller dump IOCTLs"
> > - Coerce pointers to __u64 in IOCTLs
> > - Document expected IOCTL usage in blurb
> > - Add missing rc check
> > - Only populate up to the number of bytes returned by the card,
> > and return this length to the caller
> > - Add missing header check
> > - "nvdimm/ocxl: Add an IOCTL to report controller statistics"
> > - Update to match the latest version of the spec
> > - Verify that parametr block IDs & lengths match what we expect
> > - Use defines for offsets
> > - "nvdimm/ocxl: Forward events to userspace"
> > - Don't enable NSCRA doorbell
> > - return -EBUSY if the event context is already used
> > - return -ENODEV if IRQs cannot be mapped
> > - Tag IRQ pointers with __iomem
> > - Drop ocxlpmem_ prefix from static functions
> > - Propogate error from eventfd_ctx_fdget
> > - Fix error check in copy_to_user
> > - Drop GLOBAL_MMIO_CHI_NSCRA (this should be in the overwrite
> patch)
> > - Drop unused irq_pgmap
> > - Don't redef BIT_ULL
> > - "nvdimm/ocxl: Add debug IOCTLs"
> > - Eliminate clearing loop (now done in admin_command_execute()
> > - Drop dummy IOCTLs if CONFIG_OCXL_PMEM_DEBUG is not set
> > - Group debug IOCTLs together & comment that they may not be
> available
> > - "nvdimm/ocxl: Expose SMART data via ndctl"
> > - Drop 'rc = 0; goto out;'
> > - Propogate errors from ndctl_smart()
> > - "nvdimm/ocxl: Expose the serial number in sysfs" & "nvdimm/ocxl:
> Expose the firmware version in sysfs"
> > - Squash these 2 patches together
> > - Expose data as a DIMM attribute rather than an ocxlpmem
> > attribute
> > - "nvdimm/ocxl: Add an IOCTL to request controller health & perf data"
> > - Reword blurb
> > - "nvdimm/ocxl: Implement the heartbeat command"
> > - Propogate rc in probe()
> >
> > V3:
> > - Rebase against next/next-20200220
> > - Move driver to arch/powerpc/platforms/powernv, we now expect this
> > driver to go upstream via the powerpc tree
> > - "nvdimm/ocxl: Implement the Read Error Log command"
> > - Fix bad header path
> > - "nvdimm/ocxl: Read the capability registers & wait for device ready"
> > - Fix overlapping masks between readiness_timeout &
> memory_available_timeout
> > - "nvdimm: Add driver for OpenCAPI Storage Class Memory"
> > - Address minor review comments from Jonathan Cameron
> > - Remove attributes
> > - Default to module if building LIBNVDIMM
> > - Propogate errors up from called functions in probe()
> > - "nvdimm/ocxl: Expose SMART data via ndctl"
> > - Pack attributes in struct
> > - Support different size SMART buffers for compatibility with newer
> > ndctls that may want more SMART attribs than we provide
> > - Rework to to use ND_CMD_CALL instead of ND_CMD_SMART
> > - drop "ocxl: Free detached contexts in ocxl_context_detach_all()"
> > - "powerpc: Map & release OpenCAPI LPC memory"
> > - Remove 'extern'
> > - Only available with CONFIG_MEMORY_HOTPLUG_SPARSE
> > - "ocxl: Tally up the LPC memory on a link & allow it to be mapped"
> > - Address minor review comments from Jonathan Cameron
> > - "ocxl: Add functions to map/unmap LPC memory"
> > - Split detected memory message into a separate patch
> > - Address minor review comments from Jonathan Cameron
> > - Add a comment explaining why unmap_lpc_mem is in
> deconfigure_afu
> > - "nvdimm/ocxl: Add support for Admin commands"
> > - use sizeof(u64) rather than 0x08 when iterating u64s
> > - "nvdimm/ocxl: Implement the heartbeat command"
> > - Fix typo in blurb
> > - Address kernel doc issues
> > - Ensure all uapi headers use C89 compatible comments
> > - Drop patches for firmware update & overwrite, these will be
> > submitted later once patches are available for ndctl
> > - Rename SCM to OpenCAPI Persistent Memory
> >
> > V2:
> > - "powerpc: Map & release OpenCAPI LPC memory"
> > - Fix #if -> #ifdef
> > - use pci_dev_id to get the bdfn
> > - use __be64 to hold be data
> > - indent check_hotplug_memory_addressable correctly
> > - Remove export of check_hotplug_memory_addressable
> > - "ocxl: Conditionally bind SCM devices to the generic OCXL driver"
> > - Improve patch description and remove redundant default
> > - "nvdimm: Add driver for OpenCAPI Storage Class Memory"
> > - Mark a few funcs as static as identified by the 0day bot
> > - Add OCXL dependancies to OCXL_SCM
> > - Use memcpy_mcsafe in scm_ndctl_config_read
> > - Rename scm_foo_offset_0x00 to scm_foo_header_parse & add docs
> > - Name DIMM attribs "ocxl" rather than "scm"
> > - Split out into base + many feature patches
> > - "powerpc: Enable OpenCAPI Storage Class Memory driver on bare
> metal"
> > - Build DEV_DAX & friends as modules
> > - "ocxl: Conditionally bind SCM devices to the generic OCXL driver"
> > - Patch dropped (easy enough to maintain this out of tree for
> development)
> > - "ocxl: Tally up the LPC memory on a link & allow it to be mapped"
> > - Add a warning if an unmatched lpc_release is called
> > - "ocxl: Add functions to map/unmap LPC memory"
> > - Use EXPORT_SYMBOL_GPL
> >
> >
> > Alastair D'Silva (25):
> > powerpc/powernv: Add OPAL calls for LPC memory alloc/release
> > mm/memory_hotplug: Allow check_hotplug_memory_addressable to be
> called
> > from drivers
> > powerpc/powernv: Map & release OpenCAPI LPC memory
> > ocxl: Remove unnecessary externs
> > ocxl: Address kernel doc errors & warnings
> > ocxl: Tally up the LPC memory on a link & allow it to be mapped
> > ocxl: Add functions to map/unmap LPC memory
> > ocxl: Emit a log message showing how much LPC memory was detected
> > ocxl: Save the device serial number in ocxl_fn
> > nvdimm: Add driver for OpenCAPI Persistent Memory
> > powerpc: Enable the OpenCAPI Persistent Memory driver for
> > powernv_defconfig
> > nvdimm/ocxl: Add register addresses & status values to the header
> > nvdimm/ocxl: Read the capability registers & wait for device ready
> > nvdimm/ocxl: Add support for Admin commands
> > nvdimm/ocxl: Register a character device for userspace to interact
> > with
> > nvdimm/ocxl: Implement the Read Error Log command
> > nvdimm/ocxl: Add controller dump IOCTLs
> > nvdimm/ocxl: Add an IOCTL to report controller statistics
> > nvdimm/ocxl: Forward events to userspace
> > nvdimm/ocxl: Add an IOCTL to request controller health & perf data
> > nvdimm/ocxl: Implement the heartbeat command
> > nvdimm/ocxl: Add debug IOCTLs
> > nvdimm/ocxl: Expose SMART data via ndctl
> > nvdimm/ocxl: Expose the serial number & firmware version in sysfs
> > MAINTAINERS: Add myself & nvdimm/ocxl to ocxl
> >
> > .../userspace-api/ioctl/ioctl-number.rst | 1 +
> > MAINTAINERS | 3 +
> > arch/powerpc/configs/powernv_defconfig | 5 +
> > arch/powerpc/include/asm/opal-api.h | 2 +
> > arch/powerpc/include/asm/opal.h | 2 +
> > arch/powerpc/include/asm/pnv-ocxl.h | 42 +-
> > arch/powerpc/platforms/powernv/ocxl.c | 43 +
> > arch/powerpc/platforms/powernv/opal-call.c | 2 +
> > drivers/misc/ocxl/config.c | 74 +-
> > drivers/misc/ocxl/core.c | 61 +
> > drivers/misc/ocxl/link.c | 60 +
> > drivers/misc/ocxl/ocxl_internal.h | 45 +-
> > drivers/nvdimm/Kconfig | 2 +
> > drivers/nvdimm/Makefile | 1 +
> > drivers/nvdimm/ocxl/Kconfig | 21 +
> > drivers/nvdimm/ocxl/Makefile | 7 +
> > drivers/nvdimm/ocxl/main.c | 1975 +++++++++++++++++
> > drivers/nvdimm/ocxl/ocxlpmem.h | 197 ++
> > drivers/nvdimm/ocxl/ocxlpmem_internal.c | 280 +++
> > include/linux/memory_hotplug.h | 5 +
> > include/misc/ocxl.h | 122 +-
> > include/uapi/linux/ndctl.h | 1 +
> > include/uapi/nvdimm/ocxlpmem.h | 127 ++
> > mm/memory_hotplug.c | 4 +-
> > 24 files changed, 2983 insertions(+), 99 deletions(-) create mode
> > 100644 drivers/nvdimm/ocxl/Kconfig create mode 100644
> > drivers/nvdimm/ocxl/Makefile create mode 100644
> > drivers/nvdimm/ocxl/main.c create mode 100644
> > drivers/nvdimm/ocxl/ocxlpmem.h create mode 100644
> > drivers/nvdimm/ocxl/ocxlpmem_internal.c
> > create mode 100644 include/uapi/nvdimm/ocxlpmem.h
> >
> > --
> > 2.24.1
> >


--
Alastair D'Silva mob: 0423 762 819
skype: alastair_dsilva msn: [email protected]
blog: http://alastair.d-silva.org Twitter: @EvilDeece

2020-04-01 22:58:39

by Alastair D'Silva

Subject: RE: [PATCH v4 01/25] powerpc/powernv: Add OPAL calls for LPC memory alloc/release

> -----Original Message-----
> From: Dan Williams <[email protected]>
> Sent: Wednesday, 1 April 2020 7:48 PM
> To: Alastair D'Silva <[email protected]>
> Cc: Aneesh Kumar K . V <[email protected]>; Oliver O'Halloran
> <[email protected]>; Benjamin Herrenschmidt
> <[email protected]>; Paul Mackerras <[email protected]>; Michael
> Ellerman <[email protected]>; Frederic Barrat <[email protected]>;
> Andrew Donnellan <[email protected]>; Arnd Bergmann
> <[email protected]>; Greg Kroah-Hartman <[email protected]>;
> Vishal Verma <[email protected]>; Dave Jiang
> <[email protected]>; Ira Weiny <[email protected]>; Andrew Morton
> <[email protected]>; Mauro Carvalho Chehab
> <[email protected]>; David S. Miller <[email protected]>;
> Rob Herring <[email protected]>; Anton Blanchard <[email protected]>;
> Krzysztof Kozlowski <[email protected]>; Mahesh Salgaonkar
> <[email protected]>; Madhavan Srinivasan
> <[email protected]>; Cédric Le Goater <[email protected]>; Anju T
> Sudhakar <[email protected]>; Hari Bathini
> <[email protected]>; Thomas Gleixner <[email protected]>; Greg
> Kurz <[email protected]>; Nicholas Piggin <[email protected]>; Masahiro
> Yamada <[email protected]>; Alexey Kardashevskiy
> <[email protected]>; Linux Kernel Mailing List <[email protected]>;
> linuxppc-dev <[email protected]>; linux-nvdimm <linux-
> [email protected]>; Linux MM <[email protected]>
> Subject: Re: [PATCH v4 01/25] powerpc/powernv: Add OPAL calls for LPC
> memory alloc/release
>
> On Sun, Mar 29, 2020 at 10:23 PM Alastair D'Silva <[email protected]>
> wrote:
> >
> > Add OPAL calls for LPC memory alloc/release
> >
>
> This seems to be referencing an existing api definition, can you include a
> pointer to the spec in case someone wanted to understand what these
> routines do? I suspect this is not allocating memory in the traditional sense as
> much as it's allocating physical address space for a device to be mapped?
>

These API calls were introduced in the following skiboot commit:
https://github.com/open-power/skiboot/commit/1a548857ce1f02f43585b326a891eed18a7b43b3

I'll add it to the description.

>
> > Signed-off-by: Alastair D'Silva <[email protected]>
> > Acked-by: Andrew Donnellan <[email protected]>
> > Acked-by: Frederic Barrat <[email protected]>
> > ---
> > arch/powerpc/include/asm/opal-api.h | 2 ++
> > arch/powerpc/include/asm/opal.h | 2 ++
> > arch/powerpc/platforms/powernv/opal-call.c | 2 ++
> > 3 files changed, 6 insertions(+)
> >
> > diff --git a/arch/powerpc/include/asm/opal-api.h
> > b/arch/powerpc/include/asm/opal-api.h
> > index c1f25a760eb1..9298e603001b 100644
> > --- a/arch/powerpc/include/asm/opal-api.h
> > +++ b/arch/powerpc/include/asm/opal-api.h
> > @@ -208,6 +208,8 @@
> > #define OPAL_HANDLE_HMI2 166
> > #define OPAL_NX_COPROC_INIT 167
> > #define OPAL_XIVE_GET_VP_STATE 170
> > +#define OPAL_NPU_MEM_ALLOC 171
> > +#define OPAL_NPU_MEM_RELEASE 172
> > #define OPAL_MPIPL_UPDATE 173
> > #define OPAL_MPIPL_REGISTER_TAG 174
> > #define OPAL_MPIPL_QUERY_TAG 175
> > diff --git a/arch/powerpc/include/asm/opal.h
> > b/arch/powerpc/include/asm/opal.h index 9986ac34b8e2..301fea46c7ca
> > 100644
> > --- a/arch/powerpc/include/asm/opal.h
> > +++ b/arch/powerpc/include/asm/opal.h
> > @@ -39,6 +39,8 @@ int64_t opal_npu_spa_clear_cache(uint64_t phb_id,
> uint32_t bdfn,
> > uint64_t PE_handle); int64_t
> > opal_npu_tl_set(uint64_t phb_id, uint32_t bdfn, long cap,
> > uint64_t rate_phys, uint32_t size);
> > +int64_t opal_npu_mem_alloc(u64 phb_id, u32 bdfn, u64 size, __be64
> > +*bar); int64_t opal_npu_mem_release(u64 phb_id, u32 bdfn);
> >
> > int64_t opal_console_write(int64_t term_number, __be64 *length,
> > const uint8_t *buffer); diff --git
> > a/arch/powerpc/platforms/powernv/opal-call.c
> > b/arch/powerpc/platforms/powernv/opal-call.c
> > index 5cd0f52d258f..f26e58b72c04 100644
> > --- a/arch/powerpc/platforms/powernv/opal-call.c
> > +++ b/arch/powerpc/platforms/powernv/opal-call.c
> > @@ -287,6 +287,8 @@ OPAL_CALL(opal_pci_set_pbcq_tunnel_bar,
> OPAL_PCI_SET_PBCQ_TUNNEL_BAR);
> > OPAL_CALL(opal_sensor_read_u64,
> OPAL_SENSOR_READ_U64);
> > OPAL_CALL(opal_sensor_group_enable,
> OPAL_SENSOR_GROUP_ENABLE);
> > OPAL_CALL(opal_nx_coproc_init, OPAL_NX_COPROC_INIT);
> > +OPAL_CALL(opal_npu_mem_alloc, OPAL_NPU_MEM_ALLOC);
> > +OPAL_CALL(opal_npu_mem_release,
> OPAL_NPU_MEM_RELEASE);
> > OPAL_CALL(opal_mpipl_update, OPAL_MPIPL_UPDATE);
> > OPAL_CALL(opal_mpipl_register_tag,
> OPAL_MPIPL_REGISTER_TAG);
> > OPAL_CALL(opal_mpipl_query_tag,
> OPAL_MPIPL_QUERY_TAG);
> > --
> > 2.24.1
> >
>
>


--
Alastair D'Silva mob: 0423 762 819
skype: alastair_dsilva msn: [email protected]
blog: http://alastair.d-silva.org Twitter: @EvilDeece

2020-04-02 00:22:24

by Dan Williams

Subject: Re: [PATCH v4 13/25] nvdimm/ocxl: Read the capability registers & wait for device ready

On Sun, Mar 29, 2020 at 10:23 PM Alastair D'Silva <[email protected]> wrote:
>
> This patch reads timeouts & firmware version from the controller, and
> uses those timeouts to wait for the controller to report that it is ready
> before handing the memory over to libnvdimm.
>
> Signed-off-by: Alastair D'Silva <[email protected]>
> ---
> drivers/nvdimm/ocxl/Makefile | 2 +-
> drivers/nvdimm/ocxl/main.c | 85 +++++++++++++++++++++++++
> drivers/nvdimm/ocxl/ocxlpmem.h | 29 +++++++++
> drivers/nvdimm/ocxl/ocxlpmem_internal.c | 19 ++++++
> 4 files changed, 134 insertions(+), 1 deletion(-)
> create mode 100644 drivers/nvdimm/ocxl/ocxlpmem_internal.c
>
> diff --git a/drivers/nvdimm/ocxl/Makefile b/drivers/nvdimm/ocxl/Makefile
> index e0e8ade1987a..bab97082e062 100644
> --- a/drivers/nvdimm/ocxl/Makefile
> +++ b/drivers/nvdimm/ocxl/Makefile
> @@ -4,4 +4,4 @@ ccflags-$(CONFIG_PPC_WERROR) += -Werror
>
> obj-$(CONFIG_OCXL_PMEM) += ocxlpmem.o
>
> -ocxlpmem-y := main.o
> \ No newline at end of file
> +ocxlpmem-y := main.o ocxlpmem_internal.o
> diff --git a/drivers/nvdimm/ocxl/main.c b/drivers/nvdimm/ocxl/main.c
> index c0066fedf9cc..be76acd33d74 100644
> --- a/drivers/nvdimm/ocxl/main.c
> +++ b/drivers/nvdimm/ocxl/main.c
> @@ -8,6 +8,7 @@
>
> #include <linux/module.h>
> #include <misc/ocxl.h>
> +#include <linux/delay.h>
> #include <linux/ndctl.h>
> #include <linux/mm_types.h>
> #include <linux/memory_hotplug.h>
> @@ -327,6 +328,50 @@ static void remove(struct pci_dev *pdev)
> }
> }
>
> +/**
> + * read_device_metadata() - Retrieve config information from the AFU and save it for future use
> + * @ocxlpmem: the device metadata
> + * Return: 0 on success, negative on failure
> + */
> +static int read_device_metadata(struct ocxlpmem *ocxlpmem)
> +{
> + u64 val;
> + int rc;
> +
> + rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu, GLOBAL_MMIO_CCAP0,
> + OCXL_LITTLE_ENDIAN, &val);

This calling convention would seem to defeat the ability of sparse to
validate endian correctness. That's independent of this series, but I
wonder how someone reviewing this can tell why the argument is sometimes
OCXL_LITTLE_ENDIAN and sometimes OCXL_HOST_ENDIAN.
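
One way to make the choice reviewable, sketched here in plain userspace C rather than against the ocxl API: give each kind of access its own helper, so the byte-order decision is visible in the name instead of in a runtime argument. In the kernel the analogous tools are the __le64 type and le64_to_cpu(), which sparse can check. The helper names below are hypothetical, and the assumption that the firmware version register is read unswapped because it holds a byte string is only inferred from the quoted patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Decode a 64-bit little-endian register image into a host-order value. */
static uint64_t reg64_from_le(const uint8_t bytes[8])
{
	uint64_t v = 0;
	int i;

	/* Byte 7 is the most significant byte of a little-endian value. */
	for (i = 7; i >= 0; i--)
		v = (v << 8) | bytes[i];
	return v;
}

/* Copy a register that holds an 8-character string; no byte swap applies. */
static void reg64_to_text(const uint8_t bytes[8], char out[9])
{
	memcpy(out, bytes, 8);
	out[8] = '\0';
}
```

With this split, a numeric register decoded through the text helper (or the reverse) stands out at the call site.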

> + if (rc)
> + return rc;
> +
> + ocxlpmem->scm_revision = val & 0xFFFF;
> + ocxlpmem->read_latency = (val >> 32) & 0xFFFF;
> + ocxlpmem->readiness_timeout = (val >> 48) & 0x0F;
> + ocxlpmem->memory_available_timeout = val >> 52;

Maybe some macros to parse out these register fields?
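
A minimal sketch of what such macros might look like, written as userspace C so it can be compiled on its own. In the kernel, GENMASK_ULL() and FIELD_GET() from <linux/bits.h> and <linux/bitfield.h> would play the role of the two stand-in macros. The CCAP0_* names are hypothetical, and the bit ranges only mirror the shifts and masks in the quoted hunk (with the memory-available field assumed to be 8 bits wide because it is stored in a u8); they have not been checked against the device specification.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for GENMASK_ULL(hi, lo): bits hi..lo set, inclusive. */
#define FIELD_MASK(hi, lo) \
	((~0ULL >> (63 - (hi))) & (~0ULL << (lo)))

/* Stand-in for FIELD_GET(): mask, then divide by the mask's lowest set bit. */
#define FIELD_EXTRACT(mask, val) \
	(((val) & (mask)) / ((mask) & (~(mask) + 1)))

/* Hypothetical names; ranges taken from the shifts in the quoted patch. */
#define CCAP0_SCM_REVISION	FIELD_MASK(15, 0)
#define CCAP0_READ_LATENCY	FIELD_MASK(47, 32)
#define CCAP0_READINESS_TIMEOUT	FIELD_MASK(51, 48)
#define CCAP0_MEM_AVAIL_TIMEOUT	FIELD_MASK(59, 52)

struct ccap0_fields {
	uint16_t scm_revision;
	uint16_t read_latency;
	uint8_t readiness_timeout;
	uint8_t memory_available_timeout;
};

/* Split a host-order CCAP0 value into its fields. */
static struct ccap0_fields parse_ccap0(uint64_t val)
{
	struct ccap0_fields f;

	f.scm_revision = FIELD_EXTRACT(CCAP0_SCM_REVISION, val);
	f.read_latency = FIELD_EXTRACT(CCAP0_READ_LATENCY, val);
	f.readiness_timeout = FIELD_EXTRACT(CCAP0_READINESS_TIMEOUT, val);
	f.memory_available_timeout = FIELD_EXTRACT(CCAP0_MEM_AVAIL_TIMEOUT, val);
	return f;
}
```

Naming the ranges once keeps the layout in the header next to the register addresses, so a reviewer can compare it against the spec without decoding each shift.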

> +
> + rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu, GLOBAL_MMIO_CCAP1,
> + OCXL_LITTLE_ENDIAN, &val);
> + if (rc)
> + return rc;
> +
> + ocxlpmem->max_controller_dump_size = val & 0xFFFFFFFF;
> +
> + // Extract firmware version text
> + rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu, GLOBAL_MMIO_FWVER,
> + OCXL_HOST_ENDIAN,
> + (u64 *)ocxlpmem->fw_version);
> + if (rc)
> + return rc;
> +
> + ocxlpmem->fw_version[8] = '\0';
> +
> + dev_info(&ocxlpmem->dev,
> + "Firmware version '%s' SCM revision %d:%d\n",
> + ocxlpmem->fw_version, ocxlpmem->scm_revision >> 4,
> + ocxlpmem->scm_revision & 0x0F);

Does the driver need to be chatty here? If this data is relevant,
should it appear in sysfs by default?

> +
> + return 0;
> +}
> +
> /**
> * probe_function0() - Set up function 0 for an OpenCAPI persistent memory device
> * This is important as it enables templates higher than 0 across all other
> @@ -359,6 +404,9 @@ static int probe(struct pci_dev *pdev, const struct pci_device_id *ent)
> {
> struct ocxlpmem *ocxlpmem;
> int rc;
> + u64 chi;
> + u16 elapsed, timeout;
> + bool ready = false;
>
> if (PCI_FUNC(pdev->devfn) == 0)
> return probe_function0(pdev);
> @@ -413,6 +461,43 @@ static int probe(struct pci_dev *pdev, const struct pci_device_id *ent)
> goto err;
> }
>
> + rc = read_device_metadata(ocxlpmem);
> + if (rc) {
> + dev_err(&pdev->dev, "Could not read metadata\n");
> + goto err;
> + }
> +
> + elapsed = 0;
> + timeout = ocxlpmem->readiness_timeout +
> + ocxlpmem->memory_available_timeout;
> +
> + while (true) {
> + rc = ocxlpmem_chi(ocxlpmem, &chi);
> + ready = (chi & (GLOBAL_MMIO_CHI_CRDY | GLOBAL_MMIO_CHI_MA)) ==
> + (GLOBAL_MMIO_CHI_CRDY | GLOBAL_MMIO_CHI_MA);
> +
> + if (ready)
> + break;
> +
> + if (elapsed++ > timeout) {
> + dev_err(&ocxlpmem->dev,
> + "OpenCAPI Persistent Memory ready timeout.\n");
> +
> + if (!(chi & GLOBAL_MMIO_CHI_CRDY))
> + dev_err(&ocxlpmem->dev,
> + "controller is not ready.\n");
> +
> + if (!(chi & GLOBAL_MMIO_CHI_MA))
> + dev_err(&ocxlpmem->dev,
> + "controller does not have memory available.\n");
> +
> + rc = -ENXIO;
> + goto err;
> + }
> +
> + msleep(1000);

At platform boot this is going to serialize / delay other pci hardware
init. Do you need this determination to be synchronous with the call
to ->probe()? If not, let's move it out of line. For example nvdimm
device registration is asynchronous by default with options to flush
if userspace needs to know that the kernel has finished loading
drivers.
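
As a rough illustration of what "out of line" could look like, the readiness decision can be separated from the sleeping. The sketch below is plain userspace C with hypothetical names and hypothetical bit positions for CRDY and MA (the real definitions are in ocxlpmem.h); it only shows a single polling step that a deferred context, for example a delayed work item that reschedules itself once per second, could call instead of looping inside ->probe().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. */
#define CHI_CRDY	(1ULL << 63)
#define CHI_MA		(1ULL << 62)

enum ready_state { READY_OK, READY_RETRY, READY_TIMED_OUT };

/*
 * One polling step: given the latest CHI value and the seconds already
 * waited, report whether the device is ready, should be polled again,
 * or has exceeded its timeout. Nothing sleeps here, so the caller
 * decides in which context the waiting happens.
 */
static enum ready_state ready_poll_step(uint64_t chi, unsigned int elapsed_s,
					unsigned int timeout_s)
{
	const uint64_t want = CHI_CRDY | CHI_MA;

	if ((chi & want) == want)
		return READY_OK;
	if (elapsed_s >= timeout_s)
		return READY_TIMED_OUT;
	return READY_RETRY;
}
```

The caller would register the nvdimm bus on READY_OK, re-arm itself on READY_RETRY, and log the failure on READY_TIMED_OUT.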

> + }
> +
> rc = register_lpc_mem(ocxlpmem);
> if (rc) {
> dev_err(&pdev->dev,
> diff --git a/drivers/nvdimm/ocxl/ocxlpmem.h b/drivers/nvdimm/ocxl/ocxlpmem.h
> index 322387873b4b..3eadbe19f6d0 100644
> --- a/drivers/nvdimm/ocxl/ocxlpmem.h
> +++ b/drivers/nvdimm/ocxl/ocxlpmem.h
> @@ -93,4 +93,33 @@ struct ocxlpmem {
> void *metadata_addr;
> struct resource pmem_res;
> struct nd_region *nd_region;
> + char fw_version[8 + 1];
> +
> + u32 max_controller_dump_size;
> + u16 scm_revision; // major/minor
> + u8 readiness_timeout; /* The worst case time (in seconds) that the host
> + * shall wait for the controller to become
> + * operational following a reset (CHI.CRDY).
> + */
> + u8 memory_available_timeout; /* The worst case time (in seconds) that
> + * the host shall wait for memory to
> + * become available following a reset
> + * (CHI.MA).
> + */
> +
> + u16 read_latency; /* The nominal measure of latency (in nanoseconds)
> + * associated with an unassisted read of a memory
> + * block.
> + * This represents the capability of the raw media
> + * technology without assistance
> + */
> };
> +
> +/**
> + * ocxlpmem_chi() - Get the value of the CHI register
> + * @ocxlpmem: the device metadata
> + * @chi: returns the CHI value
> + *
> + * Returns 0 on success, negative on error
> + */
> +int ocxlpmem_chi(const struct ocxlpmem *ocxlpmem, u64 *chi);
> diff --git a/drivers/nvdimm/ocxl/ocxlpmem_internal.c b/drivers/nvdimm/ocxl/ocxlpmem_internal.c
> new file mode 100644
> index 000000000000..5578169b7515
> --- /dev/null
> +++ b/drivers/nvdimm/ocxl/ocxlpmem_internal.c
> @@ -0,0 +1,19 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +// Copyright 2020 IBM Corp.
> +
> +#include <misc/ocxl.h>
> +#include <linux/delay.h>
> +#include "ocxlpmem.h"
> +
> +int ocxlpmem_chi(const struct ocxlpmem *ocxlpmem, u64 *chi)
> +{
> + u64 val;
> + int rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu, GLOBAL_MMIO_CHI,
> + OCXL_LITTLE_ENDIAN, &val);
> + if (rc)
> + return rc;
> +
> + *chi = val;
> +
> + return 0;
> +}
> --
> 2.24.1
>

2020-04-02 00:29:42

by Dan Williams

Subject: Re: [PATCH v4 15/25] nvdimm/ocxl: Register a character device for userspace to interact with

On Sun, Mar 29, 2020 at 10:53 PM Alastair D'Silva <[email protected]> wrote:
>
> This patch introduces a character device (/dev/ocxlpmemX) which further
> patches will use to interact with userspace, such as error logs,
> controller stats and card debug functionality.

This was asked earlier, but I'll reiterate: I do not see what
justifies an ocxlpmemX private device ABI vs routing through the
existing generic ndbusX and nmemX character devices.

>
> Signed-off-by: Alastair D'Silva <[email protected]>
> ---
> drivers/nvdimm/ocxl/main.c | 117 ++++++++++++++++++++++++++++++++-
> drivers/nvdimm/ocxl/ocxlpmem.h | 2 +
> 2 files changed, 117 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/nvdimm/ocxl/main.c b/drivers/nvdimm/ocxl/main.c
> index 8db573036423..9b85fcd3f1c9 100644
> --- a/drivers/nvdimm/ocxl/main.c
> +++ b/drivers/nvdimm/ocxl/main.c
> @@ -10,6 +10,7 @@
> #include <misc/ocxl.h>
> #include <linux/delay.h>
> #include <linux/ndctl.h>
> +#include <linux/fs.h>
> #include <linux/mm_types.h>
> #include <linux/memory_hotplug.h>
> #include "ocxlpmem.h"
> @@ -356,6 +357,67 @@ static int ocxlpmem_register(struct ocxlpmem *ocxlpmem)
> return device_register(&ocxlpmem->dev);
> }
>
> +static void ocxlpmem_put(struct ocxlpmem *ocxlpmem)
> +{
> + put_device(&ocxlpmem->dev);
> +}
> +
> +static struct ocxlpmem *ocxlpmem_get(struct ocxlpmem *ocxlpmem)
> +{
> + return (!get_device(&ocxlpmem->dev)) ? NULL : ocxlpmem;
> +}
> +
> +static struct ocxlpmem *find_and_get_ocxlpmem(dev_t devno)
> +{
> + struct ocxlpmem *ocxlpmem;
> + int minor = MINOR(devno);
> +
> + mutex_lock(&minors_idr_lock);
> + ocxlpmem = idr_find(&minors_idr, minor);
> + if (ocxlpmem)
> + ocxlpmem_get(ocxlpmem);
> + mutex_unlock(&minors_idr_lock);
> +
> + return ocxlpmem;
> +}
> +
> +static int file_open(struct inode *inode, struct file *file)
> +{
> + struct ocxlpmem *ocxlpmem;
> +
> + ocxlpmem = find_and_get_ocxlpmem(inode->i_rdev);
> + if (!ocxlpmem)
> + return -ENODEV;
> +
> + file->private_data = ocxlpmem;
> + return 0;
> +}
> +
> +static int file_release(struct inode *inode, struct file *file)
> +{
> + struct ocxlpmem *ocxlpmem = file->private_data;
> +
> + ocxlpmem_put(ocxlpmem);
> + return 0;
> +}
> +
> +static const struct file_operations fops = {
> + .owner = THIS_MODULE,
> + .open = file_open,
> + .release = file_release,
> +};
> +
> +/**
> + * create_cdev() - Create the chardev in /dev for the device
> + * @ocxlpmem: the SCM metadata
> + * Return: 0 on success, negative on failure
> + */
> +static int create_cdev(struct ocxlpmem *ocxlpmem)
> +{
> + cdev_init(&ocxlpmem->cdev, &fops);
> + return cdev_add(&ocxlpmem->cdev, ocxlpmem->dev.devt, 1);
> +}
> +
> /**
> * ocxlpmem_remove() - Free an OpenCAPI persistent memory device
> * @pdev: the PCI device information struct
> @@ -376,6 +438,13 @@ static void remove(struct pci_dev *pdev)
> if (ocxlpmem->nvdimm_bus)
> nvdimm_bus_unregister(ocxlpmem->nvdimm_bus);
>
> + /*
> + * Remove the cdev early to prevent a race against userspace
> + * via the char dev
> + */
> + if (ocxlpmem->cdev.owner)
> + cdev_del(&ocxlpmem->cdev);
> +
> device_unregister(&ocxlpmem->dev);
> }
> }
> @@ -527,11 +596,18 @@ static int probe(struct pci_dev *pdev, const struct pci_device_id *ent)
> goto err;
> }
>
> - if (setup_command_metadata(ocxlpmem)) {
> + rc = setup_command_metadata(ocxlpmem);
> + if (rc) {
> dev_err(&pdev->dev, "Could not read command metadata\n");
> goto err;
> }
>
> + rc = create_cdev(ocxlpmem);
> + if (rc) {
> + dev_err(&pdev->dev, "Could not create character device\n");
> + goto err;
> + }
> +
> elapsed = 0;
> timeout = ocxlpmem->readiness_timeout +
> ocxlpmem->memory_available_timeout;
> @@ -599,6 +675,36 @@ static struct pci_driver pci_driver = {
> .shutdown = remove,
> };
>
> +static int file_init(void)
> +{
> + int rc;
> +
> + rc = alloc_chrdev_region(&ocxlpmem_dev, 0, NUM_MINORS, "ocxlpmem");
> + if (rc) {
> + idr_destroy(&minors_idr);
> + pr_err("Unable to allocate OpenCAPI persistent memory major number: %d\n",
> + rc);
> + return rc;
> + }
> +
> + ocxlpmem_class = class_create(THIS_MODULE, "ocxlpmem");
> + if (IS_ERR(ocxlpmem_class)) {
> + idr_destroy(&minors_idr);
> + pr_err("Unable to create ocxlpmem class\n");
> + unregister_chrdev_region(ocxlpmem_dev, NUM_MINORS);
> + return PTR_ERR(ocxlpmem_class);
> + }
> +
> + return 0;
> +}
> +
> +static void file_exit(void)
> +{
> + class_destroy(ocxlpmem_class);
> + unregister_chrdev_region(ocxlpmem_dev, NUM_MINORS);
> + idr_destroy(&minors_idr);
> +}
> +
> static int __init ocxlpmem_init(void)
> {
> int rc;
> @@ -606,16 +712,23 @@ static int __init ocxlpmem_init(void)
> mutex_init(&minors_idr_lock);
> idr_init(&minors_idr);
>
> - rc = pci_register_driver(&pci_driver);
> + rc = file_init();
> if (rc)
> return rc;
>
> + rc = pci_register_driver(&pci_driver);
> + if (rc) {
> + file_exit();
> + return rc;
> + }
> +
> return 0;
> }
>
> static void ocxlpmem_exit(void)
> {
> pci_unregister_driver(&pci_driver);
> + file_exit();
> }
>
> module_init(ocxlpmem_init);
> diff --git a/drivers/nvdimm/ocxl/ocxlpmem.h b/drivers/nvdimm/ocxl/ocxlpmem.h
> index b72b3f909fc3..ee3bd651f254 100644
> --- a/drivers/nvdimm/ocxl/ocxlpmem.h
> +++ b/drivers/nvdimm/ocxl/ocxlpmem.h
> @@ -2,6 +2,7 @@
> // Copyright 2020 IBM Corp.
>
> #include <linux/pci.h>
> +#include <linux/cdev.h>
> #include <misc/ocxl.h>
> #include <linux/libnvdimm.h>
> #include <linux/mm.h>
> @@ -103,6 +104,7 @@ struct command_metadata {
> struct ocxlpmem {
> struct device dev;
> struct pci_dev *pdev;
> + struct cdev cdev;
> struct ocxl_fn *ocxl_fn;
> struct nd_interleave_set nd_set;
> struct nvdimm_bus_descriptor bus_desc;
> --
> 2.24.1
>

2020-04-02 01:33:41

by Joe Perches

Subject: Re: [PATCH v4 08/25] ocxl: Emit a log message showing how much LPC memory was detected

On Wed, 2020-04-01 at 01:49 -0700, Dan Williams wrote:
> On Sun, Mar 29, 2020 at 10:23 PM Alastair D'Silva <[email protected]> wrote:
> > This patch emits a message showing how much LPC memory & special purpose
> > memory was detected on an OCXL device.
[]
> > diff --git a/drivers/misc/ocxl/config.c b/drivers/misc/ocxl/config.c
[]
> > @@ -568,6 +568,10 @@ static int read_afu_lpc_memory_info(struct pci_dev *dev,
> > afu->special_purpose_mem_size =
> > total_mem_size - lpc_mem_size;
> > }
> > +
> > + dev_info(&dev->dev, "Probed LPC memory of %#llx bytes and special purpose memory of %#llx bytes\n",
> > + afu->lpc_mem_size, afu->special_purpose_mem_size);
>
> A patch for a single log message is too fine grained for my taste,
> let's squash this into another patch in the series.

Is the granularity of lpc_mem_size actually bytes?
Might this be better reported in KiB or similar, using helper functions?

Maybe something like:

/* Scale val down by 1024 for as long as it divides evenly */
unsigned long long si_val(unsigned long long val)
{
	static const char units[] = "BKMGTPE";
	const char *unit = units;

	/* Stop at zero so that 0 is reported as 0B rather than 0E */
	while (val && !(val & 1023) && unit[1]) {
		val >>= 10;
		unit++;
	}

	return val;
}

/* Unit suffix matching the scaling applied by si_val() */
char si_type(unsigned long long val)
{
	static const char units[] = "BKMGTPE";
	const char *unit = units;

	while (val && !(val & 1023) && unit[1]) {
		val >>= 10;
		unit++;
	}

	return *unit;
}

so this could be something like:

	dev_info(&dev->dev, "Probed LPC memory of %llu%c and special purpose memory of %llu%c\n",
		 si_val(afu->lpc_mem_size), si_type(afu->lpc_mem_size),
		 si_val(afu->special_purpose_mem_size), si_type(afu->special_purpose_mem_size));



2020-04-02 02:10:21

by Dan Williams

Subject: Re: [PATCH v4 19/25] nvdimm/ocxl: Forward events to userspace

On Tue, Mar 31, 2020 at 1:59 AM Alastair D'Silva <[email protected]> wrote:
>
> Some of the interrupts that the card generates are better handled
> by the userspace daemon, in particular:
> Controller Hardware/Firmware Fatal
> Controller Dump Available
> Error Log available
>
> This patch allows a userspace application to register an eventfd with
> the driver via SCM_IOCTL_EVENTFD to receive notifications of these
> interrupts.
>
> Userspace can then identify what events have occurred by calling
> SCM_IOCTL_EVENT_CHECK and checking against the SCM_IOCTL_EVENT_FOO
> masks.

The number of new ioctls in this driver is too high; it seems much of
this data can be exported via sysfs attributes, which are more
maintainable than ioctls. sysfs also has the ability to signal
events on sysfs attributes, see sysfs_notify_dirent().
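
As a rough sketch of what the userspace side of that could look like: a process opens the attribute, reads it once, and then poll()s for POLLPRI/POLLERR, which is how sysfs notifications are delivered. The helper name wait_sysfs_attr() and any attribute path passed to it are hypothetical; the driver in this series does not define such an attribute.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/*
 * Block until the kernel signals a change on a sysfs attribute, then
 * re-read it. Returns the number of bytes placed in buf (NUL-terminated),
 * 0 on timeout, or a negative errno value.
 */
static ssize_t wait_sysfs_attr(const char *path, char *buf, size_t len,
			       int timeout_ms)
{
	struct pollfd pfd = { .events = POLLPRI | POLLERR };
	ssize_t n;
	int rc;

	if (len < 2)
		return -EINVAL;

	pfd.fd = open(path, O_RDONLY);
	if (pfd.fd < 0)
		return -errno;

	/* Read once before polling so that only later changes are reported */
	n = read(pfd.fd, buf, len - 1);
	if (n < 0) {
		n = -errno;
		goto out;
	}

	rc = poll(&pfd, 1, timeout_ms);
	if (rc < 0) {
		n = -errno;
		goto out;
	}
	if (rc == 0) {
		n = 0;	/* timed out, nothing new to report */
		goto out;
	}

	/* Rewind and fetch the updated value */
	if (lseek(pfd.fd, 0, SEEK_SET) < 0) {
		n = -errno;
		goto out;
	}
	n = read(pfd.fd, buf, len - 1);
	if (n < 0) {
		n = -errno;
		goto out;
	}
	buf[n] = '\0';
out:
	close(pfd.fd);
	return n;
}
```

A daemon would call this in a loop on whatever event attribute the driver ends up exposing, with no eventfd registration or event-check ioctl involved.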

Can you step back and review the ABI exposure of the driver and what
can be moved to sysfs? If you need to have bus specific attributes
ordered underneath the libnvdimm generic attributes you can create a
sysfs attribute subdirectory.

In general a roadmap document of all the proposed ABI is needed to
make sure it is both sufficient and necessary. See the libnvdimm
document that introduced the initial libnvdimm ABI:

https://www.kernel.org/doc/Documentation/nvdimm/nvdimm.txt

>
> Signed-off-by: Alastair D'Silva <[email protected]>
> ---
> drivers/nvdimm/ocxl/main.c | 220 +++++++++++++++++++++++++++++++++
> drivers/nvdimm/ocxl/ocxlpmem.h | 4 +
> include/uapi/nvdimm/ocxlpmem.h | 12 ++
> 3 files changed, 236 insertions(+)
>
> diff --git a/drivers/nvdimm/ocxl/main.c b/drivers/nvdimm/ocxl/main.c
> index 0040fc09cceb..cb6cdc9eb899 100644
> --- a/drivers/nvdimm/ocxl/main.c
> +++ b/drivers/nvdimm/ocxl/main.c
> @@ -10,6 +10,7 @@
> #include <misc/ocxl.h>
> #include <linux/delay.h>
> #include <linux/ndctl.h>
> +#include <linux/eventfd.h>
> #include <linux/fs.h>
> #include <linux/mm_types.h>
> #include <linux/memory_hotplug.h>
> @@ -301,8 +302,19 @@ static void free_ocxlpmem(struct ocxlpmem *ocxlpmem)
> {
> int rc;
>
> + // Disable doorbells
> + (void)ocxl_global_mmio_set64(ocxlpmem->ocxl_afu, GLOBAL_MMIO_CHIEC,
> + OCXL_LITTLE_ENDIAN,
> + GLOBAL_MMIO_CHI_ALL);
> +
> free_minor(ocxlpmem);
>
> + if (ocxlpmem->irq_addr[1])
> + iounmap(ocxlpmem->irq_addr[1]);
> +
> + if (ocxlpmem->irq_addr[0])
> + iounmap(ocxlpmem->irq_addr[0]);
> +
> if (ocxlpmem->ocxl_context) {
> rc = ocxl_context_detach(ocxlpmem->ocxl_context);
> if (rc == -EBUSY)
> @@ -398,6 +410,11 @@ static int file_release(struct inode *inode, struct file *file)
> {
> struct ocxlpmem *ocxlpmem = file->private_data;
>
> + if (ocxlpmem->ev_ctx) {
> + eventfd_ctx_put(ocxlpmem->ev_ctx);
> + ocxlpmem->ev_ctx = NULL;
> + }
> +
> ocxlpmem_put(ocxlpmem);
> return 0;
> }
> @@ -928,6 +945,52 @@ static int ioctl_controller_stats(struct ocxlpmem *ocxlpmem,
> return rc;
> }
>
> +static int ioctl_eventfd(struct ocxlpmem *ocxlpmem,
> + struct ioctl_ocxlpmem_eventfd __user *uarg)
> +{
> + struct ioctl_ocxlpmem_eventfd args;
> +
> + if (copy_from_user(&args, uarg, sizeof(args)))
> + return -EFAULT;
> +
> + if (ocxlpmem->ev_ctx)
> + return -EBUSY;
> +
> + ocxlpmem->ev_ctx = eventfd_ctx_fdget(args.eventfd);
> + if (IS_ERR(ocxlpmem->ev_ctx))
> + return PTR_ERR(ocxlpmem->ev_ctx);
> +
> + return 0;
> +}
> +
> +static int ioctl_event_check(struct ocxlpmem *ocxlpmem, u64 __user *uarg)
> +{
> + u64 val = 0;
> + int rc;
> + u64 chi = 0;
> +
> + rc = ocxlpmem_chi(ocxlpmem, &chi);
> + if (rc < 0)
> + return rc;
> +
> + if (chi & GLOBAL_MMIO_CHI_ELA)
> + val |= IOCTL_OCXLPMEM_EVENT_ERROR_LOG_AVAILABLE;
> +
> + if (chi & GLOBAL_MMIO_CHI_CDA)
> + val |= IOCTL_OCXLPMEM_EVENT_CONTROLLER_DUMP_AVAILABLE;
> +
> + if (chi & GLOBAL_MMIO_CHI_CFFS)
> + val |= IOCTL_OCXLPMEM_EVENT_FIRMWARE_FATAL;
> +
> + if (chi & GLOBAL_MMIO_CHI_CHFS)
> + val |= IOCTL_OCXLPMEM_EVENT_HARDWARE_FATAL;
> +
> + if (copy_to_user((u64 __user *)uarg, &val, sizeof(val)))
> + return -EFAULT;
> +
> + return rc;
> +}
> +
> static long file_ioctl(struct file *file, unsigned int cmd, unsigned long args)
> {
> struct ocxlpmem *ocxlpmem = file->private_data;
> @@ -956,6 +1019,15 @@ static long file_ioctl(struct file *file, unsigned int cmd, unsigned long args)
> rc = ioctl_controller_stats(ocxlpmem,
> (struct ioctl_ocxlpmem_controller_stats __user *)args);
> break;
> +
> + case IOCTL_OCXLPMEM_EVENTFD:
> + rc = ioctl_eventfd(ocxlpmem,
> + (struct ioctl_ocxlpmem_eventfd __user *)args);
> + break;
> +
> + case IOCTL_OCXLPMEM_EVENT_CHECK:
> + rc = ioctl_event_check(ocxlpmem, (u64 __user *)args);
> + break;
> }
>
> return rc;
> @@ -1109,6 +1181,148 @@ static void dump_error_log(struct ocxlpmem *ocxlpmem)
> kfree(buf);
> }
>
> +static irqreturn_t imn0_handler(void *private)
> +{
> + struct ocxlpmem *ocxlpmem = private;
> + u64 chi = 0;
> +
> + (void)ocxlpmem_chi(ocxlpmem, &chi);
> +
> + if (chi & GLOBAL_MMIO_CHI_ELA) {
> + dev_warn(&ocxlpmem->dev, "Error log is available\n");
> +
> + if (ocxlpmem->ev_ctx)
> + eventfd_signal(ocxlpmem->ev_ctx, 1);
> + }
> +
> + if (chi & GLOBAL_MMIO_CHI_CDA) {
> + dev_warn(&ocxlpmem->dev, "Controller dump is available\n");
> +
> + if (ocxlpmem->ev_ctx)
> + eventfd_signal(ocxlpmem->ev_ctx, 1);
> + }
> +
> + return IRQ_HANDLED;
> +}
> +
> +static irqreturn_t imn1_handler(void *private)
> +{
> + struct ocxlpmem *ocxlpmem = private;
> + u64 chi = 0;
> +
> + (void)ocxlpmem_chi(ocxlpmem, &chi);
> +
> + if (chi & (GLOBAL_MMIO_CHI_CFFS | GLOBAL_MMIO_CHI_CHFS)) {
> + dev_err(&ocxlpmem->dev,
> + "Controller status is fatal, chi=0x%llx, going offline\n",
> + chi);
> +
> + if (ocxlpmem->nvdimm_bus) {
> + nvdimm_bus_unregister(ocxlpmem->nvdimm_bus);
> + ocxlpmem->nvdimm_bus = NULL;
> + }
> +
> + if (ocxlpmem->ev_ctx)
> + eventfd_signal(ocxlpmem->ev_ctx, 1);
> + }
> +
> + return IRQ_HANDLED;
> +}
> +
> +/**
> + * ocxlpmem_setup_irq() - Set up the IRQs for the OpenCAPI Persistent Memory device
> + * @ocxlpmem: the device metadata
> + * Return: 0 on success, negative on failure
> + */
> +static int setup_irqs(struct ocxlpmem *ocxlpmem)
> +{
> + int rc;
> + u64 irq_addr;
> +
> + rc = ocxl_afu_irq_alloc(ocxlpmem->ocxl_context,
> + &ocxlpmem->irq_id[0]);
> + if (rc)
> + return rc;
> +
> + rc = ocxl_irq_set_handler(ocxlpmem->ocxl_context, ocxlpmem->irq_id[0],
> + imn0_handler, NULL, ocxlpmem);
> +
> + irq_addr = ocxl_afu_irq_get_addr(ocxlpmem->ocxl_context,
> + ocxlpmem->irq_id[0]);
> + if (!irq_addr)
> + return -EFAULT;
> +
> + ocxlpmem->irq_addr[0] = ioremap(irq_addr, PAGE_SIZE);
> + if (!ocxlpmem->irq_addr[0])
> + return -ENODEV;
> +
> + rc = ocxl_global_mmio_write64(ocxlpmem->ocxl_afu, GLOBAL_MMIO_IMA0_OHP,
> + OCXL_LITTLE_ENDIAN,
> + (u64)ocxlpmem->irq_addr[0]);
> + if (rc)
> + goto out_irq0;
> +
> + rc = ocxl_global_mmio_write64(ocxlpmem->ocxl_afu, GLOBAL_MMIO_IMA0_CFP,
> + OCXL_LITTLE_ENDIAN, 0);
> + if (rc)
> + goto out_irq0;
> +
> + rc = ocxl_afu_irq_alloc(ocxlpmem->ocxl_context, &ocxlpmem->irq_id[1]);
> + if (rc)
> + goto out_irq0;
> +
> + rc = ocxl_irq_set_handler(ocxlpmem->ocxl_context, ocxlpmem->irq_id[1],
> + imn1_handler, NULL, ocxlpmem);
> + if (rc)
> + goto out_irq0;
> +
> + irq_addr = ocxl_afu_irq_get_addr(ocxlpmem->ocxl_context,
> + ocxlpmem->irq_id[1]);
> + if (!irq_addr) {
> + rc = -EFAULT;
> + goto out_irq0;
> + }
> +
> + ocxlpmem->irq_addr[1] = ioremap(irq_addr, PAGE_SIZE);
> + if (!ocxlpmem->irq_addr[1]) {
> + rc = -ENODEV;
> + goto out_irq0;
> + }
> +
> + rc = ocxl_global_mmio_write64(ocxlpmem->ocxl_afu, GLOBAL_MMIO_IMA1_OHP,
> + OCXL_LITTLE_ENDIAN,
> + (u64)ocxlpmem->irq_addr[1]);
> + if (rc)
> + goto out_irq1;
> +
> + rc = ocxl_global_mmio_write64(ocxlpmem->ocxl_afu, GLOBAL_MMIO_IMA1_CFP,
> + OCXL_LITTLE_ENDIAN, 0);
> + if (rc)
> + goto out_irq1;
> +
> + // Enable doorbells
> + rc = ocxl_global_mmio_set64(ocxlpmem->ocxl_afu, GLOBAL_MMIO_CHIE,
> + OCXL_LITTLE_ENDIAN,
> + GLOBAL_MMIO_CHI_ELA |
> + GLOBAL_MMIO_CHI_CDA |
> + GLOBAL_MMIO_CHI_CFFS |
> + GLOBAL_MMIO_CHI_CHFS);
> + if (rc)
> + goto out_irq1;
> +
> + return 0;
> +
> +out_irq1:
> + iounmap(ocxlpmem->irq_addr[1]);
> + ocxlpmem->irq_addr[1] = NULL;
> +
> +out_irq0:
> + iounmap(ocxlpmem->irq_addr[0]);
> + ocxlpmem->irq_addr[0] = NULL;
> +
> + return rc;
> +}
> +
> /**
> * probe_function0() - Set up function 0 for an OpenCAPI persistent memory device
> * This is important as it enables templates higher than 0 across all other
> @@ -1212,6 +1426,12 @@ static int probe(struct pci_dev *pdev, const struct pci_device_id *ent)
> goto err;
> }
>
> + rc = setup_irqs(ocxlpmem);
> + if (rc) {
> + dev_err(&pdev->dev, "Could not set up OCXL IRQs\n");
> + goto err;
> + }
> +
> rc = setup_command_metadata(ocxlpmem);
> if (rc) {
> dev_err(&pdev->dev, "Could not read command metadata\n");
> diff --git a/drivers/nvdimm/ocxl/ocxlpmem.h b/drivers/nvdimm/ocxl/ocxlpmem.h
> index ee3bd651f254..01721596f982 100644
> --- a/drivers/nvdimm/ocxl/ocxlpmem.h
> +++ b/drivers/nvdimm/ocxl/ocxlpmem.h
> @@ -106,6 +106,9 @@ struct ocxlpmem {
> struct pci_dev *pdev;
> struct cdev cdev;
> struct ocxl_fn *ocxl_fn;
> +#define SCM_IRQ_COUNT 2
> + int irq_id[SCM_IRQ_COUNT];
> + void __iomem *irq_addr[SCM_IRQ_COUNT];
> struct nd_interleave_set nd_set;
> struct nvdimm_bus_descriptor bus_desc;
> struct nvdimm_bus *nvdimm_bus;
> @@ -117,6 +120,7 @@ struct ocxlpmem {
> struct nd_region *nd_region;
> char fw_version[8 + 1];
> u32 timeouts[ADMIN_COMMAND_MAX + 1];
> + struct eventfd_ctx *ev_ctx;
> u32 max_controller_dump_size;
> u16 scm_revision; // major/minor
> u8 readiness_timeout; /* The worst case time (in seconds) that the host
> diff --git a/include/uapi/nvdimm/ocxlpmem.h b/include/uapi/nvdimm/ocxlpmem.h
> index ca3a7098fa9d..d573bd307e35 100644
> --- a/include/uapi/nvdimm/ocxlpmem.h
> +++ b/include/uapi/nvdimm/ocxlpmem.h
> @@ -71,6 +71,16 @@ struct ioctl_ocxlpmem_controller_stats {
> __u64 fast_write_count;
> };
>
> +struct ioctl_ocxlpmem_eventfd {
> + __s32 eventfd;
> + __u32 reserved;
> +};
> +
> +#define IOCTL_OCXLPMEM_EVENT_CONTROLLER_DUMP_AVAILABLE (1ULL << (0))
> +#define IOCTL_OCXLPMEM_EVENT_ERROR_LOG_AVAILABLE (1ULL << (1))
> +#define IOCTL_OCXLPMEM_EVENT_HARDWARE_FATAL (1ULL << (2))
> +#define IOCTL_OCXLPMEM_EVENT_FIRMWARE_FATAL (1ULL << (3))
> +
> /* ioctl numbers */
> #define OCXLPMEM_MAGIC 0xCA
> /* OpenCAPI Persistent memory devices */
> @@ -79,5 +89,7 @@ struct ioctl_ocxlpmem_controller_stats {
> #define IOCTL_OCXLPMEM_CONTROLLER_DUMP_DATA _IOWR(OCXLPMEM_MAGIC, 0x32, struct ioctl_ocxlpmem_controller_dump_data)
> #define IOCTL_OCXLPMEM_CONTROLLER_DUMP_COMPLETE _IO(OCXLPMEM_MAGIC, 0x33)
> #define IOCTL_OCXLPMEM_CONTROLLER_STATS _IO(OCXLPMEM_MAGIC, 0x34)
> +#define IOCTL_OCXLPMEM_EVENTFD _IOW(OCXLPMEM_MAGIC, 0x35, struct ioctl_ocxlpmem_eventfd)
> +#define IOCTL_OCXLPMEM_EVENT_CHECK _IOR(OCXLPMEM_MAGIC, 0x36, __u64)
>
> #endif /* _UAPI_OCXL_SCM_H */
> --
> 2.24.1
>

2020-04-02 03:44:58

by Michael Ellerman

[permalink] [raw]
Subject: RE: [PATCH v4 00/25] Add support for OpenCAPI Persistent Memory devices

"Alastair D'Silva" <[email protected]> writes:
>> -----Original Message-----
>> From: Dan Williams <[email protected]>
>>
>> On Sun, Mar 29, 2020 at 10:23 PM Alastair D'Silva <[email protected]>
>> wrote:
>> >
>> > This series adds support for OpenCAPI Persistent Memory devices on
>> > bare metal (arch/powernv), exposing them as nvdimms so that we can
>> > make use of the existing infrastructure. There already exists a driver
>> > for the same devices abstracted through PowerVM (arch/pseries):
>> > arch/powerpc/platforms/pseries/papr_scm.c
>> >
>> > These devices are connected via OpenCAPI, and present as LPC (lowest
>> > coherence point) memory to the system, practically, that means that
>> > memory on these cards could be treated as conventional, cache-coherent
>> > memory.
>> >
>> > Since the devices are connected via OpenCAPI, they are not enumerated
>> > via ACPI. Instead, OpenCAPI links present as pseudo-PCI bridges, with
>> > devices below them.
>> >
>> > This series introduces a driver that exposes the memory on these cards as
>> > nvdimms, with each card getting it's own bus. This is somewhat complicated
>> > by the fact that the cards do not have out of band persistent storage for
>> > metadata, so 1 SECTION_SIZE's (see SPARSEMEM) worth of storage is carved
>> > out of the top of the card storage to implement the ndctl_config_* calls.
>>
>> Is it really tied to section-size? Can't that change based on the configured
>> page-size? It's not clear to me why that would be the choice, but I'll dig into
>> the implementation.
>>
>
> I had tried using PAGE_SIZE, but ran into problems carving off just 1 page and handing it to the kernel, while leaving the rest as pmem. That was a while ago though, so maybe I should retry it.
>
>> > The driver is not responsible for configuring the NPU (NVLink Processing
>> > Unit) BARs to map the LPC memory from the card into the system's physical
>> > address space, instead, it requests this to be done via OPAL calls (typically
>> > implemented by Skiboot).
>>
>> Are OPAL calls similar to ACPI DSMs? I.e. methods for the OS to invoke
>> platform firmware services? What's Skiboot?
>>
>
> Yes, OPAL is the interface to firmware for POWER. Skiboot is the open-source (and only) implementation of OPAL.

https://github.com/open-power/skiboot

In particular the tokens for calls are defined here:

https://github.com/open-power/skiboot/blob/master/include/opal-api.h#L220

And you can grep for the token to find the implementation:

https://github.com/open-power/skiboot/blob/master/hw/npu2-opencapi.c#L2328


cheers

2020-04-02 03:52:13

by Oliver O'Halloran

[permalink] [raw]
Subject: Re: [PATCH v4 00/25] Add support for OpenCAPI Persistent Memory devices

On Thu, Apr 2, 2020 at 2:42 PM Michael Ellerman <[email protected]> wrote:
>
> "Alastair D'Silva" <[email protected]> writes:
> >> -----Original Message-----
> >> From: Dan Williams <[email protected]>
> >>
> >> On Sun, Mar 29, 2020 at 10:23 PM Alastair D'Silva <[email protected]>
> >> wrote:
> >> >
> >> > *snip*
> >> Are OPAL calls similar to ACPI DSMs? I.e. methods for the OS to invoke
> >> platform firmware services? What's Skiboot?
> >>
> >
> > Yes, OPAL is the interface to firmware for POWER. Skiboot is the open-source (and only) implementation of OPAL.
>
> https://github.com/open-power/skiboot
>
> In particular the tokens for calls are defined here:
>
> https://github.com/open-power/skiboot/blob/master/include/opal-api.h#L220
>
> And you can grep for the token to find the implementation:
>
> https://github.com/open-power/skiboot/blob/master/hw/npu2-opencapi.c#L2328

I'm not sure I'd encourage anyone to read npu2-opencapi.c. I find it
hard enough to follow even with access to the workbooks.

There's an OPAL call API reference here:
http://open-power.github.io/skiboot/doc/opal-api/index.html

Oliver

2020-04-02 04:35:26

by Alastair D'Silva

[permalink] [raw]
Subject: RE: [PATCH v4 02/25] mm/memory_hotplug: Allow check_hotplug_memory_addressable to be called from drivers

> -----Original Message-----
> From: Dan Williams <[email protected]>
> Sent: Wednesday, 1 April 2020 7:48 PM
> To: Alastair D'Silva <[email protected]>
> Subject: Re: [PATCH v4 02/25] mm/memory_hotplug: Allow
> check_hotplug_memory_addressable to be called from drivers
>
> On Sun, Mar 29, 2020 at 10:23 PM Alastair D'Silva <[email protected]>
> wrote:
> >
> > When setting up OpenCAPI connected persistent memory, the range check
> > may not be performed until quite late (or perhaps not at all, if the
> > user does not establish a DAX device).
> >
> > This patch makes the range check callable so we can perform the check
> > while probing the OpenCAPI Persistent Memory device.
> >
> > Signed-off-by: Alastair D'Silva <[email protected]>
> > Reviewed-by: Andrew Donnellan <[email protected]>
> > ---
> > include/linux/memory_hotplug.h | 5 +++++
> > mm/memory_hotplug.c | 4 ++--
> > 2 files changed, 7 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> > index f4d59155f3d4..9a19ae0d7e31 100644
> > --- a/include/linux/memory_hotplug.h
> > +++ b/include/linux/memory_hotplug.h
> > @@ -337,6 +337,11 @@ static inline void __remove_memory(int nid, u64 start, u64 size) {}
> > extern void set_zone_contiguous(struct zone *zone);
> > extern void clear_zone_contiguous(struct zone *zone);
> >
> > +#ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
> > +int check_hotplug_memory_addressable(unsigned long pfn,
> > + unsigned long nr_pages);
> > +#endif /* CONFIG_MEMORY_HOTPLUG_SPARSE */
>
> Let's move this to include/linux/memory.h with the other
> CONFIG_MEMORY_HOTPLUG_SPARSE declarations, and add a dummy
> implementation for the CONFIG_MEMORY_HOTPLUG_SPARSE=n case.
>
> Also, this patch can be squashed with the next one, no need for it to be
> stand alone.
>

Ok
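
A minimal sketch of that arrangement, written as standalone C so it can be compiled outside the kernel. The stub returning 0 (meaning "nothing to check") when CONFIG_MEMORY_HOTPLUG_SPARSE is not set is an assumption, as are the PAGE_SHIFT value and the caller lpc_base_if_addressable(), which only stands in for the powernv LPC setup code.

```c
/* Uncomment to model CONFIG_MEMORY_HOTPLUG_SPARSE=y (needs the real function) */
/* #define CONFIG_MEMORY_HOTPLUG_SPARSE 1 */

#ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
int check_hotplug_memory_addressable(unsigned long pfn,
				     unsigned long nr_pages);
#else
/* Assumed dummy: with sparse hotplug disabled there is nothing to reject */
static inline int check_hotplug_memory_addressable(unsigned long pfn,
						   unsigned long nr_pages)
{
	(void)pfn;
	(void)nr_pages;
	return 0;
}
#endif

#define PAGE_SHIFT 16	/* assumption for the example: 64K pages */

/* Hypothetical caller: no #ifdef is needed at the call site any more */
static unsigned long long lpc_base_if_addressable(unsigned long long base_addr,
						  unsigned long long size)
{
	if (check_hotplug_memory_addressable(base_addr >> PAGE_SHIFT,
					     size >> PAGE_SHIFT))
		return 0;

	return base_addr;
}
```

The point of the dummy is that callers compile unchanged in both configurations.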

>
> > +
> > extern void __ref free_area_init_core_hotplug(int nid);
> > extern int __add_memory(int nid, u64 start, u64 size);
> > extern int add_memory(int nid, u64 start, u64 size);
> > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> > index 0a54ffac8c68..14945f033594 100644
> > --- a/mm/memory_hotplug.c
> > +++ b/mm/memory_hotplug.c
> > @@ -276,8 +276,8 @@ static int check_pfn_span(unsigned long pfn, unsigned long nr_pages,
> > return 0;
> > }
> >
> > -static int check_hotplug_memory_addressable(unsigned long pfn,
> > - unsigned long nr_pages)
> > +int check_hotplug_memory_addressable(unsigned long pfn,
> > + unsigned long nr_pages)
> > {
> > const u64 max_addr = PFN_PHYS(pfn + nr_pages) - 1;
> >
> > --
> > 2.24.1
> >

--
Alastair D'Silva mob: 0423 762 819
skype: alastair_dsilva msn: [email protected]
blog: http://alastair.d-silva.org Twitter: @EvilDeece



2020-04-02 04:37:25

by Alastair D'Silva

[permalink] [raw]
Subject: RE: [PATCH v4 03/25] powerpc/powernv: Map & release OpenCAPI LPC memory

> -----Original Message-----
> From: Dan Williams <[email protected]>
> Sent: Wednesday, 1 April 2020 7:49 PM
> To: Alastair D'Silva <[email protected]>
> Subject: Re: [PATCH v4 03/25] powerpc/powernv: Map & release OpenCAPI
> LPC memory
>
> On Sun, Mar 29, 2020 at 10:23 PM Alastair D'Silva <[email protected]>
> wrote:
> >
> > This patch adds OPAL calls to powernv so that the OpenCAPI driver can
> > map & release LPC (Lowest Point of Coherency) memory.
> >
> > Signed-off-by: Alastair D'Silva <[email protected]>
> > Reviewed-by: Andrew Donnellan <[email protected]>
> > ---
> > arch/powerpc/include/asm/pnv-ocxl.h | 2 ++
> > arch/powerpc/platforms/powernv/ocxl.c | 43 +++++++++++++++++++++++++++
> > 2 files changed, 45 insertions(+)
> >
> > diff --git a/arch/powerpc/include/asm/pnv-ocxl.h b/arch/powerpc/include/asm/pnv-ocxl.h
> > index 7de82647e761..560a19bb71b7 100644
> > --- a/arch/powerpc/include/asm/pnv-ocxl.h
> > +++ b/arch/powerpc/include/asm/pnv-ocxl.h
> > @@ -32,5 +32,7 @@ extern int pnv_ocxl_spa_remove_pe_from_cache(void *platform_data, int pe_handle)
> >
> > extern int pnv_ocxl_alloc_xive_irq(u32 *irq, u64 *trigger_addr);
> > extern void pnv_ocxl_free_xive_irq(u32 irq);
> > +u64 pnv_ocxl_platform_lpc_setup(struct pci_dev *pdev, u64 size);
> > +void pnv_ocxl_platform_lpc_release(struct pci_dev *pdev);
> >
> > #endif /* _ASM_PNV_OCXL_H */
> > diff --git a/arch/powerpc/platforms/powernv/ocxl.c b/arch/powerpc/platforms/powernv/ocxl.c
> > index 8c65aacda9c8..f13119a7c026 100644
> > --- a/arch/powerpc/platforms/powernv/ocxl.c
> > +++ b/arch/powerpc/platforms/powernv/ocxl.c
> > @@ -475,6 +475,49 @@ void pnv_ocxl_spa_release(void *platform_data)
> > }
> > EXPORT_SYMBOL_GPL(pnv_ocxl_spa_release);
> >
> > +u64 pnv_ocxl_platform_lpc_setup(struct pci_dev *pdev, u64 size)
> > +{
> > + struct pci_controller *hose = pci_bus_to_host(pdev->bus);
> > + struct pnv_phb *phb = hose->private_data;
>
> Is calling the local variable 'hose' instead of 'host' on purpose?
>

Yes, this follows the convention used in other functions in this file.

> > + u32 bdfn = pci_dev_id(pdev);
> > + __be64 base_addr_be64;
> > + u64 base_addr;
> > + int rc;
> > +
> > + rc = opal_npu_mem_alloc(phb->opal_id, bdfn, size,
> > + &base_addr_be64);
> > + if (rc) {
> > + dev_warn(&pdev->dev,
> > + "OPAL could not allocate LPC memory, rc=%d\n", rc);
> > + return 0;
> > + }
> > +
> > + base_addr = be64_to_cpu(base_addr_be64);
> > +
> > +#ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
>
> With the proposed cleanup in patch2 the ifdef can be elided here.

Ok
>
> > + rc = check_hotplug_memory_addressable(base_addr >> PAGE_SHIFT,
> > + size >> PAGE_SHIFT);
> > + if (rc)
> > + return 0;
>
> Is this an error worth logging if someone is wondering why their device is not
> showing up?
>

Yes, I'll add a message.

>
> > +#endif
> > +
> > + return base_addr;
> > +}
> > +EXPORT_SYMBOL_GPL(pnv_ocxl_platform_lpc_setup);
> > +
> > +void pnv_ocxl_platform_lpc_release(struct pci_dev *pdev)
> > +{
> > + struct pci_controller *hose = pci_bus_to_host(pdev->bus);
> > + struct pnv_phb *phb = hose->private_data;
> > + u32 bdfn = pci_dev_id(pdev);
> > + int rc;
> > +
> > + rc = opal_npu_mem_release(phb->opal_id, bdfn);
> > + if (rc)
> > + dev_warn(&pdev->dev,
> > + "OPAL reported rc=%d when releasing LPC memory\n", rc);
> > +}
> > +EXPORT_SYMBOL_GPL(pnv_ocxl_platform_lpc_release);
> > +
> > int pnv_ocxl_spa_remove_pe_from_cache(void *platform_data, int pe_handle)
> > {
> > struct spa_data *data = (struct spa_data *) platform_data;
> > --
> > 2.24.1
> >
>
>


2020-04-02 06:26:50

by Andrew Donnellan

[permalink] [raw]
Subject: Re: [PATCH v4 06/25] ocxl: Tally up the LPC memory on a link & allow it to be mapped

On 1/4/20 7:48 pm, Dan Williams wrote:
> On Sun, Mar 29, 2020 at 10:53 PM Alastair D'Silva <[email protected]> wrote:
>>
>> OpenCAPI LPC memory is allocated per link, but each link supports
>> multiple AFUs, and each AFU can have LPC memory assigned to it.
>
> Is there an OpenCAPI primer to decode these objects and their
> associations that I can reference?

There isn't presently a primer that I think addresses these questions
nicely (to my knowledge - Fred might have something he can link to?) -
there are the specs published by the OpenCAPI Consortium at
https://opencapi.org but they're really for hardware implementers.

We should probably expand what's currently documented in
Documentation/userspace-api/accelerators/ocxl.rst generally, and this
series should probably update that to include details on LPC.

To explain the specific objects here:

- A "link" is a point-to-point link between the host CPU, and a single
OpenCAPI card. (We don't currently support cards making use of multiple
links for increased bandwidth, though that is supported from a hardware
point of view.)

- On POWER9, each link appears as a separate PCI domain, with a single
bus, and the card appears as a single device.

- A device can have up to 8 functions, per PCI.

- An Attached Functional Unit (AFU) is the abstraction for a particular
application function. Each PCI function defines the number of AFUs it
has through a set of OpenCAPI-specific DVSECs, max 64 per function. The
ocxl driver handles AFU discovery.

- On the host side, LPC memory is mapped by setting a single BAR for the
whole link, but on the device side, LPC memory is requested on a per-AFU
basis, through an AFU descriptor that is exposed through the
aforementioned DVSECs. Hence the need to loop through the AFUs and get
the total required LPC memory to work out the correct BAR value.
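
To make the last point concrete, here is a small standalone sketch of the tally. The struct afu_lpc_req type and link_lpc_total() are invented for this example and are not the ocxl driver's real data structures; the sketch also assumes a plain sum is enough and ignores any alignment or offset requirements the real BAR calculation may have.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-AFU view: only the field needed for the tally */
struct afu_lpc_req {
	uint64_t lpc_mem_size;	/* bytes requested in the AFU descriptor */
};

/*
 * Sum the LPC memory requested by every AFU behind one link.
 * Returns 0 and stores the total, or -EOVERFLOW if the sum would wrap.
 */
static int link_lpc_total(const struct afu_lpc_req *afus, size_t count,
			  uint64_t *total)
{
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i < count; i++) {
		if (afus[i].lpc_mem_size > UINT64_MAX - sum)
			return -EOVERFLOW;
		sum += afus[i].lpc_mem_size;
	}

	*total = sum;
	return 0;
}
```

AFUs that request no LPC memory simply contribute zero, so the same loop covers cards where only some AFUs use it.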

--
Andrew Donnellan OzLabs, ADL Canberra
[email protected] IBM Australia Limited

2020-04-02 06:43:11

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH v4 14/25] nvdimm/ocxl: Add support for Admin commands

On Sun, Mar 29, 2020 at 10:23 PM Alastair D'Silva <[email protected]> wrote:
>
> Admin commands for these devices are the primary means of interacting
> with the device controller to provide functionality beyond the load/store
> capabilities offered via the NPU.
>
> For example, SMART data, firmware update, and device error logs are
> implemented via admin commands.
>
> This patch requests the metadata required to issue admin commands, as well
> as some helper functions to construct and check the completion of the
> commands.
>
> Signed-off-by: Alastair D'Silva <[email protected]>
> ---
> drivers/nvdimm/ocxl/main.c | 65 ++++++
> drivers/nvdimm/ocxl/ocxlpmem.h | 50 ++++-
> drivers/nvdimm/ocxl/ocxlpmem_internal.c | 261 ++++++++++++++++++++++++
> 3 files changed, 375 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/nvdimm/ocxl/main.c b/drivers/nvdimm/ocxl/main.c
> index be76acd33d74..8db573036423 100644
> --- a/drivers/nvdimm/ocxl/main.c
> +++ b/drivers/nvdimm/ocxl/main.c
> @@ -217,6 +217,58 @@ static int register_lpc_mem(struct ocxlpmem *ocxlpmem)
> return 0;
> }
>
> +/**
> + * extract_command_metadata() - Extract command data from MMIO & save it for further use
> + * @ocxlpmem: the device metadata
> + * @offset: The base address of the command data structures (address of CREQO)
> + * @command_metadata: A pointer to the command metadata to populate
> + * Return: 0 on success, negative on failure
> + */
> +static int extract_command_metadata(struct ocxlpmem *ocxlpmem, u32 offset,
> + struct command_metadata *command_metadata)

How about "struct ocxlpmem *ocp" throughout all these patches? The
full duplication of the type name as the local variable name makes
this look like non-idiomatic Linux code to me. It had not quite hit me
until I saw "struct command_metadata *command_metadata", which just
strikes me as too literal. The person who gets to maintain this
code later will appreciate a smaller amount of typing.

Also, is it really the case that the layout of the admin command
metadata needs to be programmatically determined at runtime? I would
expect it to be a static command definition in the spec.


> +{
> + int rc;
> + u64 tmp;
> +
> + rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu, offset,
> + OCXL_LITTLE_ENDIAN, &tmp);
> + if (rc)
> + return rc;
> +
> + command_metadata->request_offset = tmp >> 32;
> + command_metadata->response_offset = tmp & 0xFFFFFFFF;
> +
> + rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu, offset + 8,
> + OCXL_LITTLE_ENDIAN, &tmp);
> + if (rc)
> + return rc;
> +
> + command_metadata->data_offset = tmp >> 32;
> + command_metadata->data_size = tmp & 0xFFFFFFFF;
> +
> + command_metadata->id = 0;
> +
> + return 0;
> +}
> +
> +/**
> + * setup_command_metadata() - Set up the command metadata
> + * @ocxlpmem: the device metadata
> + */
> +static int setup_command_metadata(struct ocxlpmem *ocxlpmem)
> +{
> + int rc;
> +
> + mutex_init(&ocxlpmem->admin_command.lock);
> +
> + rc = extract_command_metadata(ocxlpmem, GLOBAL_MMIO_ACMA_CREQO,
> + &ocxlpmem->admin_command);
> + if (rc)
> + return rc;
> +
> + return 0;
> +}
> +
> /**
> * allocate_minor() - Allocate a minor number to use for an OpenCAPI pmem device
> * @ocxlpmem: the device metadata
> @@ -421,6 +473,14 @@ static int probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>
> ocxlpmem->pdev = pci_dev_get(pdev);
>
> + ocxlpmem->timeouts[ADMIN_COMMAND_ERRLOG] = 2000; // ms
> + ocxlpmem->timeouts[ADMIN_COMMAND_HEARTBEAT] = 100; // ms
> + ocxlpmem->timeouts[ADMIN_COMMAND_SMART] = 100; // ms
> + ocxlpmem->timeouts[ADMIN_COMMAND_CONTROLLER_DUMP] = 1000; // ms
> + ocxlpmem->timeouts[ADMIN_COMMAND_CONTROLLER_STATS] = 100; // ms
> + ocxlpmem->timeouts[ADMIN_COMMAND_SHUTDOWN] = 1000; // ms
> + ocxlpmem->timeouts[ADMIN_COMMAND_FW_UPDATE] = 16000; // ms
> +
> pci_set_drvdata(pdev, ocxlpmem);
>
> ocxlpmem->ocxl_fn = ocxl_function_open(pdev);
> @@ -467,6 +527,11 @@ static int probe(struct pci_dev *pdev, const struct pci_device_id *ent)
> goto err;
> }
>
> + if (setup_command_metadata(ocxlpmem)) {
> + dev_err(&pdev->dev, "Could not read command metadata\n");
> + goto err;
> + }
> +
> elapsed = 0;
> timeout = ocxlpmem->readiness_timeout +
> ocxlpmem->memory_available_timeout;
> diff --git a/drivers/nvdimm/ocxl/ocxlpmem.h b/drivers/nvdimm/ocxl/ocxlpmem.h
> index 3eadbe19f6d0..b72b3f909fc3 100644
> --- a/drivers/nvdimm/ocxl/ocxlpmem.h
> +++ b/drivers/nvdimm/ocxl/ocxlpmem.h
> @@ -7,6 +7,7 @@
> #include <linux/mm.h>
>
> #define LABEL_AREA_SIZE BIT_ULL(PA_SECTION_SHIFT)
> +#define DEFAULT_TIMEOUT 100
>
> #define GLOBAL_MMIO_CHI 0x000
> #define GLOBAL_MMIO_CHIC 0x008
> @@ -57,6 +58,7 @@
> #define GLOBAL_MMIO_HCI_CONTROLLER_DUMP_COLLECTED BIT_ULL(5) // CDC
> #define GLOBAL_MMIO_HCI_REQ_HEALTH_PERF BIT_ULL(6) // CHPD
>
> +// must be maintained with admin_command_names in ocxlpmem_internal.c
> #define ADMIN_COMMAND_HEARTBEAT 0x00u
> #define ADMIN_COMMAND_SHUTDOWN 0x01u
> #define ADMIN_COMMAND_FW_UPDATE 0x02u
> @@ -68,6 +70,13 @@
> #define ADMIN_COMMAND_CMD_CAPS 0x08u
> #define ADMIN_COMMAND_MAX 0x08u
>
> +#define NS_COMMAND_SECURE_ERASE 0x20ull
> +
> +#define NS_RESPONSE_SECURE_ERASE_ACCESSIBLE_SUCCESS 0x20
> +#define NS_RESPONSE_SECURE_ERASE_ACCESSIBLE_ATTEMPTED 0x28
> +#define NS_RESPONSE_SECURE_ERASE_DEFECTIVE_SUCCESS 0x30
> +#define NS_RESPONSE_SECURE_ERASE_DEFECTIVE_ATTEMPTED 0x38
> +
> #define STATUS_SUCCESS 0x00
> #define STATUS_MEM_UNAVAILABLE 0x20
> #define STATUS_BLOCKED_BG_TASK 0x21
> @@ -81,6 +90,16 @@
> #define STATUS_FW_ARG_INVALID STATUS_BAD_REQUEST_PARM
> #define STATUS_FW_INVALID STATUS_BAD_DATA_PARM
>
> +struct command_metadata {
> + u32 request_offset;
> + u32 response_offset;
> + u32 data_offset;
> + u32 data_size;
> + struct mutex lock; /* locks access to this command */
> + u16 id;
> + u8 op_code;
> +};
> +
> struct ocxlpmem {
> struct device dev;
> struct pci_dev *pdev;
> @@ -91,10 +110,11 @@ struct ocxlpmem {
> struct ocxl_afu *ocxl_afu;
> struct ocxl_context *ocxl_context;
> void *metadata_addr;
> + struct command_metadata admin_command;
> struct resource pmem_res;
> struct nd_region *nd_region;
> char fw_version[8 + 1];
> -
> + u32 timeouts[ADMIN_COMMAND_MAX + 1];
> u32 max_controller_dump_size;
> u16 scm_revision; // major/minor
> u8 readiness_timeout; /* The worst case time (in seconds) that the host
> @@ -123,3 +143,31 @@ struct ocxlpmem {
> * Returns 0 on success, negative on error
> */
> int ocxlpmem_chi(const struct ocxlpmem *ocxlpmem, u64 *chi);
> +
> +/**
> + * admin_command_execute() - Execute an admin command and wait for completion
> + *
> + * Additional MMIO registers (dependent on the command) may
> + * need to be initialized
> + *
> + * @ocxlpmem: the device metadata
> + * @op_code: the code for the admin command
> + * Returns 0 on success, -EINVAL for a bad op code, -EBUSY on timeout
> + */
> +int admin_command_execute(struct ocxlpmem *ocxlpmem, u8 op_code);
> +
> +/**
> + * admin_response_handled() - Notify the controller that the admin response has been handled
> + * @ocxlpmem: the device metadata
> + * Returns 0 on success, negative on failure
> + */
> +int admin_response_handled(const struct ocxlpmem *ocxlpmem);
> +
> +/**
> + * warn_status() - Emit a kernel warning showing a command status.
> + * @ocxlpmem: the device metadata
> + * @message: A message to accompany the warning
> + * @status: The command status
> + */
> +void warn_status(const struct ocxlpmem *ocxlpmem, const char *message,
> + u8 status);
> diff --git a/drivers/nvdimm/ocxl/ocxlpmem_internal.c b/drivers/nvdimm/ocxl/ocxlpmem_internal.c
> index 5578169b7515..7470a6ab3b08 100644
> --- a/drivers/nvdimm/ocxl/ocxlpmem_internal.c
> +++ b/drivers/nvdimm/ocxl/ocxlpmem_internal.c
> @@ -17,3 +17,264 @@ int ocxlpmem_chi(const struct ocxlpmem *ocxlpmem, u64 *chi)
>
> return 0;
> }
> +
> +#define COMMAND_REQUEST_SIZE (8 * sizeof(u64))
> +/**
> + * scm_command_request() - Set up a command request
> + * @cmd: The metadata for the type of command to be issued
> + * @op_code: the op code for the command
> + * @valid_bytes: the number of bytes in the header to preserve (these must be set before calling)
> + */
> +static int scm_command_request(const struct ocxlpmem *ocxlpmem,
> + struct command_metadata *cmd, u8 op_code,
> + u8 valid_bytes)

Ah, here's an abbreviation for command metadata.

> +{
> + u64 val = op_code;
> + int rc;
> + u8 i;
> +
> + cmd->op_code = op_code;
> + cmd->id++;
> +
> + val |= ((u64)cmd->id) << 16;
> +
> + rc = ocxl_global_mmio_write64(ocxlpmem->ocxl_afu, cmd->request_offset,
> + OCXL_LITTLE_ENDIAN, val);
> + if (rc)
> + return rc;
> +
> + for (i = valid_bytes; i < COMMAND_REQUEST_SIZE; i += sizeof(u64)) {
> + rc = ocxl_global_mmio_write64(ocxlpmem->ocxl_afu,
> + cmd->request_offset + i,
> + OCXL_LITTLE_ENDIAN, 0);
> + if (rc)
> + return rc;
> + }
> +
> + return 0;
> +}
> +
> +/**
> + * admin_command_request() - Issue an admin command request
> + * @ocxlpmem: the device metadata
> + * @op_code: The op-code for the command
> + *
> + * Returns an identifier for the command, or negative on error
> + */
> +static int admin_command_request(struct ocxlpmem *ocxlpmem, u8 op_code)
> +{
> + u8 valid_bytes = sizeof(u64);
> +
> + switch (op_code) {
> + case ADMIN_COMMAND_HEARTBEAT:
> + case ADMIN_COMMAND_SHUTDOWN:
> + case ADMIN_COMMAND_ERRLOG:
> + case ADMIN_COMMAND_CMD_CAPS:
> + valid_bytes += 0;
> + break;
> + case ADMIN_COMMAND_FW_UPDATE:
> + case ADMIN_COMMAND_SMART:
> + case ADMIN_COMMAND_CONTROLLER_STATS:
> + case ADMIN_COMMAND_CONTROLLER_DUMP:
> + valid_bytes += sizeof(u64);
> + break;
> + case ADMIN_COMMAND_FW_DEBUG:
> + valid_bytes += 3 * sizeof(u64);
> + break;
> + default:
> + return -EINVAL;
> + }
> +
> + return scm_command_request(ocxlpmem, &ocxlpmem->admin_command, op_code,
> + valid_bytes);

Why isn't this just:

ocp_command_request(ocp, op_code, valid_bytes);

...because it's not clear to me why the naming goes from a generic
admin_command_request() to an scm_-prefixed helper. The
"command_metadata" is implied by the ocp context and the op_code,
right?
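
For readers following the naming discussion, the request and response words that these helpers exchange can be sketched in plain C. This is a userspace illustration only, not driver code: the helper names (pack_request_word, unpack_response_word, words_zeroed) are made up for the example, and the bit positions are taken from the quoted scm_command_request() and command_response().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same size as in the quoted patch: eight 64-bit words per request. */
#define COMMAND_REQUEST_SIZE (8 * sizeof(uint64_t))

/*
 * Build the first 64-bit request word as the quoted code does:
 * op code in the low byte, command id starting at bit 16.
 * (Hypothetical helper name, for illustration only.)
 */
static uint64_t pack_request_word(uint8_t op_code, uint16_t id)
{
	return (uint64_t)op_code | ((uint64_t)id << 16);
}

/*
 * Split a response word the way command_response() does:
 * status in the low byte, id in bits 16-31.
 */
static void unpack_response_word(uint64_t val, uint8_t *status, uint16_t *id)
{
	*status = val & 0xff;
	*id = (val >> 16) & 0xffff;
}

/*
 * Count how many 64-bit words the zeroing loop would clear for a
 * given valid_bytes. The alignment assert is an assumption of this
 * sketch; the quoted code does not check it.
 */
static unsigned int words_zeroed(uint8_t valid_bytes)
{
	unsigned int n = 0;
	size_t i;

	assert(valid_bytes % sizeof(uint64_t) == 0);

	for (i = valid_bytes; i < COMMAND_REQUEST_SIZE; i += sizeof(uint64_t))
		n++;

	return n;
}
```

With valid_bytes = sizeof(u64), as for the heartbeat case, the loop clears the remaining seven words; with the firmware debug case (four valid words) it clears four.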


> +}
> +
> +static int command_response(const struct ocxlpmem *ocxlpmem,
> + const struct command_metadata *cmd)
> +{
> + u64 val;
> + u16 id;
> + u8 status;
> + int rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu,
> + cmd->response_offset,
> + OCXL_LITTLE_ENDIAN, &val);
> + if (rc)
> + return rc;
> +
> + status = val & 0xff;
> + id = (val >> 16) & 0xffff;
> +
> + if (id != cmd->id) {
> + dev_err(&ocxlpmem->dev,
> + "Expected response for command %d, but received response for command %d instead.\n",
> + cmd->id, id);
> + return -EBUSY;
> + }
> +
> + return status;
> +}
> +
> +/**
> + * admin_response() - Validate an admin response
> + * @ocxlpmem: the device metadata
> + * Returns the status code of the command, or negative on error
> + */
> +static int admin_response(const struct ocxlpmem *ocxlpmem)
> +{
> + return command_response(ocxlpmem, &ocxlpmem->admin_command);
> +}
> +
> +/**
> + * admin_command_exec() - Notify the controller to start processing a pending admin command
> + * @ocxlpmem: the device metadata
> + * Returns 0 on success, negative on error
> + */
> +static int admin_command_exec(const struct ocxlpmem *ocxlpmem)
> +{
> + return ocxl_global_mmio_set64(ocxlpmem->ocxl_afu, GLOBAL_MMIO_HCI,
> + OCXL_LITTLE_ENDIAN, GLOBAL_MMIO_HCI_ACRW);
> +}
> +
> +static bool admin_command_complete(const struct ocxlpmem *ocxlpmem)
> +{
> + u64 val = 0;
> +
> + int rc = ocxlpmem_chi(ocxlpmem, &val);
> +
> + WARN_ON(rc);
> +
> + return (val & GLOBAL_MMIO_CHI_ACRA) != 0;
> +}
> +
> +/**
> + * admin_command_complete_timeout() - Wait for an admin command to finish executing
> + * @ocxlpmem: the device metadata
> + * @command: the admin command to wait for completion (determines the timeout)
> + * Returns 0 on success, -EBUSY on timeout
> + */
> +static int admin_command_complete_timeout(const struct ocxlpmem *ocxlpmem,
> + int command)
> +{
> + unsigned long timeout = jiffies +
> + msecs_to_jiffies(ocxlpmem->timeouts[command]);
> +
> + // 32 is the next power of 2 greater than the 20ms minimum for msleep
> +#define TIMEOUT_SLEEP_MILLIS 32
> + do {
> + if (admin_command_complete(ocxlpmem))
> + return 0;
> + msleep(TIMEOUT_SLEEP_MILLIS);
> + } while (time_before(jiffies, timeout));
> +
> + if (admin_command_complete(ocxlpmem))
> + return 0;
> +
> + return -EBUSY;
> +}
> +
> +// Must be maintained with ADMIN_COMMAND_* in ocxlpmem.h
> +static const char * const admin_command_names[] = {
> + "heartbeat",
> + "shutdown",
> + "firmware update",
> + "firmware debug",
> + "retrieve error log",
> + "retrieve SMART data",
> + "controller statistics",
> + "controller dump",
> + "command capabilities",
> +};
> +
> +/**
> + * admin_command_name() - get the name of an admin command
> + * @ocxlpmem: the device metadata
> + * @op_code: the code for the admin command
> + * Returns a string representing the name of the command
> + */
> +static const char *admin_command_name(u8 op_code)
> +{
> + if (op_code > ADMIN_COMMAND_MAX)
> + return "unknown command";
> +
> + return admin_command_names[op_code];
> +}
> +
> +/**
> + * admin_command_execute() - Execute an admin command and wait for completion
> + *
> + * Additional MMIO registers (dependent on the command) may
> + * need to be initialized
> + *
> + * @ocxlpmem: the device metadata
> + * @op_code: the code for the admin command
> + * Returns 0 on success, -EINVAL for a bad op code, -EBUSY on timeout
> + */
> +int admin_command_execute(struct ocxlpmem *ocxlpmem, u8 op_code)
> +{
> + int rc;
> +
> + if (op_code > ADMIN_COMMAND_MAX)
> + return -EINVAL;
> +
> + rc = admin_command_request(ocxlpmem, op_code);
> + if (rc)
> + return rc;
> +
> + rc = admin_command_exec(ocxlpmem);
> + if (rc)
> + return rc;
> +
> + rc = admin_command_complete_timeout(ocxlpmem, op_code);
> + if (rc < 0) {
> + dev_warn(&ocxlpmem->dev, "%s timed out\n",
> + admin_command_name(op_code));
> + return rc;
> + }
> +
> + return admin_response(ocxlpmem);
> +}
> +
> +int admin_response_handled(const struct ocxlpmem *ocxlpmem)
> +{
> + // writing to the CHIC register clears the bit in CHI
> + return ocxl_global_mmio_set64(ocxlpmem->ocxl_afu, GLOBAL_MMIO_CHIC,
> + OCXL_LITTLE_ENDIAN, GLOBAL_MMIO_CHI_ACRA);
> +}
> +
> +void warn_status(const struct ocxlpmem *ocxlpmem, const char *message,
> + u8 status)
> +{
> + const char *text = "Unknown";
> +
> + switch (status) {
> + case STATUS_SUCCESS:
> + text = "Success";
> + break;
> +
> + case STATUS_MEM_UNAVAILABLE:
> + text = "Persistent memory unavailable";
> + break;
> +
> + case STATUS_BAD_OPCODE:
> + text = "Bad opcode";
> + break;
> +
> + case STATUS_BAD_REQUEST_PARM:
> + text = "Bad request parameter";
> + break;
> +
> + case STATUS_BAD_DATA_PARM:
> + text = "Bad data parameter";
> + break;
> +
> + case STATUS_DEBUG_BLOCKED:
> + text = "Debug action blocked";
> + break;
> +
> + case STATUS_FAIL:
> + text = "Failed";
> + break;
> + }
> +
> + dev_warn(&ocxlpmem->dev, "%s: %s (%x)\n", message, text, status);
> +}
> --
> 2.24.1
>
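
The wait loop in admin_command_complete_timeout() follows a common pattern: check, sleep, repeat until a deadline, then check once more so that a completion arriving during the last sleep is not reported as a timeout. A rough userspace analogue is sketched below. It uses POSIX clock_gettime() and nanosleep() in place of jiffies and msleep(), and the names poll_until_done, now_ms, sleep_ms and done_after_n_polls are invented for the example.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Monotonic time in milliseconds (stand-in for jiffies). */
static uint64_t now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000u + (uint64_t)ts.tv_nsec / 1000000u;
}

/* Sleep for roughly ms milliseconds (stand-in for msleep). */
static void sleep_ms(unsigned int ms)
{
	struct timespec ts = {
		.tv_sec = ms / 1000,
		.tv_nsec = (long)(ms % 1000) * 1000000L,
	};

	nanosleep(&ts, NULL);
}

/*
 * Poll done(ctx) until it returns true or timeout_ms elapses.
 * Returns 0 on completion, -EBUSY on timeout, mirroring the
 * return convention documented in the quoted patch.
 */
static int poll_until_done(bool (*done)(void *), void *ctx,
			   unsigned int timeout_ms, unsigned int interval_ms)
{
	uint64_t deadline = now_ms() + timeout_ms;

	assert(interval_ms > 0);

	do {
		if (done(ctx))
			return 0;
		sleep_ms(interval_ms);
	} while (now_ms() < deadline);

	/* Final check: the command may have completed during the last sleep. */
	return done(ctx) ? 0 : -EBUSY;
}

/* Example predicate: reports completion once the counter reaches zero. */
static bool done_after_n_polls(void *ctx)
{
	int *remaining = ctx;

	if (*remaining == 0)
		return true;
	(*remaining)--;
	return false;
}
```

Because the sleep interval is fixed, the real elapsed time can overshoot the requested timeout by up to one interval; the quoted code accepts the same trade-off with its 32 ms step.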

2020-04-02 10:07:50

by Michael Ellerman

Subject: Re: [PATCH v4 00/25] Add support for OpenCAPI Persistent Memory devices

"Oliver O'Halloran" <[email protected]> writes:
> On Thu, Apr 2, 2020 at 2:42 PM Michael Ellerman <[email protected]> wrote:
>> "Alastair D'Silva" <[email protected]> writes:
>> >> -----Original Message-----
>> >> From: Dan Williams <[email protected]>
>> >>
>> >> On Sun, Mar 29, 2020 at 10:23 PM Alastair D'Silva <[email protected]>
>> >> wrote:
>> >> >
>> >> > *snip*
>> >> Are OPAL calls similar to ACPI DSMs? I.e. methods for the OS to invoke
>> >> platform firmware services? What's Skiboot?
>> >
>> > Yes, OPAL is the interface to firmware for POWER. Skiboot is the open-source (and only) implementation of OPAL.
>>
>> https://github.com/open-power/skiboot
>>
>> In particular the tokens for calls are defined here:
>>
>> https://github.com/open-power/skiboot/blob/master/include/opal-api.h#L220
>>
>> And you can grep for the token to find the implementation:
>>
>> https://github.com/open-power/skiboot/blob/master/hw/npu2-opencapi.c#L2328
>
> I'm not sure I'd encourage anyone to read npu2-opencapi.c. I find it
> hard enough to follow even with access to the workbooks.

Compared to certain firmwares that run on certain other platforms it's
actually pretty readable code ;)

> There's an OPAL call API reference here:
> http://open-power.github.io/skiboot/doc/opal-api/index.html

Even better.

cheers

2020-04-02 10:44:54

by Benjamin Herrenschmidt

Subject: Re: [PATCH v4 03/25] powerpc/powernv: Map & release OpenCAPI LPC memory

On Wed, 2020-04-01 at 01:48 -0700, Dan Williams wrote:
> >
> > +u64 pnv_ocxl_platform_lpc_setup(struct pci_dev *pdev, u64 size)
> > +{
> > + struct pci_controller *hose = pci_bus_to_host(pdev->bus);
> > + struct pnv_phb *phb = hose->private_data;
>
> Is calling the local variable 'hose' instead of 'host' on purpose?

Haha that's funny :-)

It's an oooooooold usage that comes iirc from sparc ? or maybe alpha ?
I somewhat accidentally picked it up when adding multiple host-bridge
support on powerpc in the early 2000's and it hasn't quite died yet :)

Cheers,
Ben.

2020-04-02 12:00:03

by Greg Kurz

Subject: Re: [PATCH v4 00/25] Add support for OpenCAPI Persistent Memory devices

On Thu, 02 Apr 2020 21:06:01 +1100
Michael Ellerman <[email protected]> wrote:

> "Oliver O'Halloran" <[email protected]> writes:
> > On Thu, Apr 2, 2020 at 2:42 PM Michael Ellerman <[email protected]> wrote:
> >> "Alastair D'Silva" <[email protected]> writes:
> >> >> -----Original Message-----
> >> >> From: Dan Williams <[email protected]>
> >> >>
> >> >> On Sun, Mar 29, 2020 at 10:23 PM Alastair D'Silva <[email protected]>
> >> >> wrote:
> >> >> >
> >> >> > *snip*
> >> >> Are OPAL calls similar to ACPI DSMs? I.e. methods for the OS to invoke
> >> >> platform firmware services? What's Skiboot?
> >> >
> >> > Yes, OPAL is the interface to firmware for POWER. Skiboot is the open-source (and only) implementation of OPAL.
> >>
> >> https://github.com/open-power/skiboot
> >>
> >> In particular the tokens for calls are defined here:
> >>
> >> https://github.com/open-power/skiboot/blob/master/include/opal-api.h#L220
> >>
> >> And you can grep for the token to find the implementation:
> >>
> >> https://github.com/open-power/skiboot/blob/master/hw/npu2-opencapi.c#L2328
> >
> > I'm not sure I'd encourage anyone to read npu2-opencapi.c. I find it
> > hard enough to follow even with access to the workbooks.
>
> Compared to certain firmwares that run on certain other platforms it's
> actually pretty readable code ;)
>

Forth rocks ! ;-)

> > There's an OPAL call API reference here:
> > http://open-power.github.io/skiboot/doc/opal-api/index.html
>
> Even better.
>
> cheers

2020-04-03 01:16:43

by Dan Williams

Subject: Re: [PATCH v4 16/25] nvdimm/ocxl: Implement the Read Error Log command

On Tue, Mar 31, 2020 at 1:59 AM Alastair D'Silva <[email protected]> wrote:
>
> The read error log command extracts information from the controller's
> internal error log.
>
> This patch exposes this information in 2 ways:
> - During probe, if an error occurs & a log is available, print it to the
> console
> - After probe, make the error log available to userspace via an IOCTL.
> Userspace is notified of pending error logs in a later patch
> ("powerpc/powernv/pmem: Forward events to userspace")

So, have a look at the recent papr_scm patches to add health flags and
smart data retrieval. I'd prefer to extend existing nvdimm device
retrieval mechanisms than invent new ones.


>
> Signed-off-by: Alastair D'Silva <[email protected]>
> ---
> .../userspace-api/ioctl/ioctl-number.rst | 1 +
> drivers/nvdimm/ocxl/main.c | 240 ++++++++++++++++++
> include/uapi/nvdimm/ocxlpmem.h | 46 ++++
> 3 files changed, 287 insertions(+)
> create mode 100644 include/uapi/nvdimm/ocxlpmem.h
>
> diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst
> index 9425377615ce..ba0ce7dca643 100644
> --- a/Documentation/userspace-api/ioctl/ioctl-number.rst
> +++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
> @@ -340,6 +340,7 @@ Code Seq# Include File Comments
> 0xC0 00-0F linux/usb/iowarrior.h
> 0xCA 00-0F uapi/misc/cxl.h
> 0xCA 10-2F uapi/misc/ocxl.h
> +0xCA 30-3F uapi/nvdimm/ocxlpmem.h OpenCAPI Persistent Memory
> 0xCA 80-BF uapi/scsi/cxlflash_ioctl.h
> 0xCB 00-1F CBM serial IEC bus in development:
> <mailto:[email protected]>
> diff --git a/drivers/nvdimm/ocxl/main.c b/drivers/nvdimm/ocxl/main.c
> index 9b85fcd3f1c9..e6be0029f658 100644
> --- a/drivers/nvdimm/ocxl/main.c
> +++ b/drivers/nvdimm/ocxl/main.c
> @@ -13,6 +13,7 @@
> #include <linux/fs.h>
> #include <linux/mm_types.h>
> #include <linux/memory_hotplug.h>
> +#include <uapi/nvdimm/ocxlpmem.h>
> #include "ocxlpmem.h"
>
> static const struct pci_device_id pci_tbl[] = {
> @@ -401,10 +402,190 @@ static int file_release(struct inode *inode, struct file *file)
> return 0;
> }
>
> +/**
> + * error_log_header_parse() - Parse the first 64 bits of the error log command response
> + * @ocxlpmem: the device metadata
> + * @length: out, returns the number of bytes in the response (excluding the 64 bit header)
> + */
> +static int error_log_header_parse(struct ocxlpmem *ocxlpmem, u16 *length)
> +{
> + int rc;
> + u64 val;
> + u16 data_identifier;
> + u32 data_length;
> +
> + rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu,
> + ocxlpmem->admin_command.data_offset,
> + OCXL_LITTLE_ENDIAN, &val);
> + if (rc)
> + return rc;
> +
> + data_identifier = val >> 48;
> + data_length = val & 0xFFFF;
> +
> + if (data_identifier != 0x454C) { // 'EL'
> + dev_err(&ocxlpmem->dev,
> + "Bad data identifier for error log data, expected 'EL', got '%2s' (%#x), data_length=%u\n",
> + (char *)&data_identifier,
> + (unsigned int)data_identifier, data_length);
> + return -EINVAL;
> + }
> +
> + *length = data_length;
> + return 0;
> +}
> +
> +static int read_error_log(struct ocxlpmem *ocxlpmem,
> + struct ioctl_ocxlpmem_error_log *log,
> + bool buf_is_user)
> +{
> + u64 val;
> + u16 user_buf_length;
> + u16 buf_length;
> + u64 *buf = (u64 *)log->buf_ptr;
> + u16 i;
> + int rc;
> +
> + if (log->buf_size % 8)
> + return -EINVAL;
> +
> + rc = ocxlpmem_chi(ocxlpmem, &val);
> + if (rc)
> + return rc;
> +
> + if (!(val & GLOBAL_MMIO_CHI_ELA))
> + return -EAGAIN;
> +
> + user_buf_length = log->buf_size;
> +
> + mutex_lock(&ocxlpmem->admin_command.lock);
> +
> + rc = admin_command_execute(ocxlpmem, ADMIN_COMMAND_ERRLOG);
> + if (rc != STATUS_SUCCESS) {
> + warn_status(ocxlpmem,
> + "Unexpected status from retrieve error log", rc);
> + goto out;
> + }
> +
> + rc = error_log_header_parse(ocxlpmem, &log->buf_size);
> + if (rc)
> + goto out;
> + // log->buf_size now contains the returned buffer size, not the user size
> +
> + rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu,
> + ocxlpmem->admin_command.data_offset + 0x08,
> + OCXL_LITTLE_ENDIAN, &val);
> + if (rc)
> + goto out;
> +
> + log->log_identifier = val >> 32;
> + log->program_reference_code = val & 0xFFFFFFFF;
> +
> + rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu,
> + ocxlpmem->admin_command.data_offset + 0x10,
> + OCXL_LITTLE_ENDIAN, &val);
> + if (rc)
> + goto out;
> +
> + log->error_log_type = val >> 56;
> + log->action_flags = (log->error_log_type == OCXLPMEM_ERROR_LOG_TYPE_GENERAL) ?
> + (val >> 32) & 0xFFFFFF : 0;
> + log->power_on_seconds = val & 0xFFFFFFFF;
> +
> + rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu,
> + ocxlpmem->admin_command.data_offset + 0x18,
> + OCXL_LITTLE_ENDIAN, &log->timestamp);
> + if (rc)
> + goto out;
> +
> + rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu,
> + ocxlpmem->admin_command.data_offset + 0x20,
> + OCXL_LITTLE_ENDIAN, &log->wwid[0]);
> + if (rc)
> + goto out;
> +
> + rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu,
> + ocxlpmem->admin_command.data_offset + 0x28,
> + OCXL_LITTLE_ENDIAN, &log->wwid[1]);
> + if (rc)
> + goto out;
> +
> + rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu,
> + ocxlpmem->admin_command.data_offset + 0x30,
> + OCXL_HOST_ENDIAN, (u64 *)log->fw_revision);
> + if (rc)
> + goto out;
> + log->fw_revision[8] = '\0';
> +
> + buf_length = (user_buf_length < log->buf_size) ?
> + user_buf_length : log->buf_size;
> + for (i = 0; i < buf_length / (sizeof(u64)); i++) {
> + rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu,
> + ocxlpmem->admin_command.data_offset +
> + i * sizeof(u64),
> + OCXL_HOST_ENDIAN, &val);
> + if (rc)
> + goto out;
> +
> + if (buf_is_user) {
> + if (copy_to_user((u64 __user *)&buf[i], &val,
> + sizeof(u64))) {
> + rc = -EFAULT;
> + goto out;
> + }
> + } else {
> + buf[i] = val;
> + }
> + }
> +
> + rc = admin_response_handled(ocxlpmem);
> + if (rc)
> + goto out;
> +
> +out:
> + mutex_unlock(&ocxlpmem->admin_command.lock);
> + return rc;
> +}
> +
> +static int ioctl_error_log(struct ocxlpmem *ocxlpmem,
> + struct ioctl_ocxlpmem_error_log __user *uarg)
> +{
> + struct ioctl_ocxlpmem_error_log args;
> + int rc;
> +
> + if (copy_from_user(&args, uarg, sizeof(args)))
> + return -EFAULT;
> +
> + rc = read_error_log(ocxlpmem, &args, true);
> + if (rc)
> + return rc;
> +
> + if (copy_to_user(uarg, &args, sizeof(args)))
> + return -EFAULT;
> +
> + return 0;
> +}
> +
> +static long file_ioctl(struct file *file, unsigned int cmd, unsigned long args)
> +{
> + struct ocxlpmem *ocxlpmem = file->private_data;
> + int rc = -EINVAL;
> +
> + switch (cmd) {
> + case IOCTL_OCXLPMEM_ERROR_LOG:
> + rc = ioctl_error_log(ocxlpmem,
> + (struct ioctl_ocxlpmem_error_log __user *)args);
> + break;
> + }
> + return rc;
> +}
> +
> static const struct file_operations fops = {
> .owner = THIS_MODULE,
> .open = file_open,
> .release = file_release,
> + .unlocked_ioctl = file_ioctl,
> + .compat_ioctl = file_ioctl,
> };
>
> /**
> @@ -493,6 +674,60 @@ static int read_device_metadata(struct ocxlpmem *ocxlpmem)
> return 0;
> }
>
> +static const char *decode_error_log_type(u8 error_log_type)
> +{
> + switch (error_log_type) {
> + case 0x00:
> + return "general";
> + case 0x01:
> + return "predictive failure";
> + case 0x02:
> + return "thermal warning";
> + case 0x03:
> + return "data loss";
> + case 0x04:
> + return "health & performance";
> + default:
> + return "unknown";
> + }
> +}
> +
> +static void dump_error_log(struct ocxlpmem *ocxlpmem)
> +{
> + struct ioctl_ocxlpmem_error_log log;
> + u32 buf_size;
> + u8 *buf;
> + int rc;
> +
> + if (ocxlpmem->admin_command.data_size == 0)
> + return;
> +
> + buf_size = ocxlpmem->admin_command.data_size - 0x48;
> + buf = kzalloc(buf_size, GFP_KERNEL);
> + if (!buf)
> + return;
> +
> + log.buf_ptr = (u64)buf;
> + log.buf_size = buf_size;
> +
> + rc = read_error_log(ocxlpmem, &log, false);
> + if (rc < 0)
> + goto out;
> +
> + dev_warn(&ocxlpmem->dev,
> + "OCXL PMEM Error log: WWID=0x%016llx%016llx LID=0x%x PRC=%x type=0x%x %s, Uptime=%u seconds timestamp=0x%llx\n",
> + log.wwid[0], log.wwid[1],
> + log.log_identifier, log.program_reference_code,
> + log.error_log_type,
> + decode_error_log_type(log.error_log_type),
> + log.power_on_seconds, log.timestamp);
> + print_hex_dump(KERN_WARNING, "buf", DUMP_PREFIX_OFFSET, 16, 1, buf,
> + log.buf_size, false);
> +
> +out:
> + kfree(buf);
> +}
> +
> /**
> * probe_function0() - Set up function 0 for an OpenCAPI persistent memory device
> * This is important as it enables templates higher than 0 across all other
> @@ -656,6 +891,11 @@ static int probe(struct pci_dev *pdev, const struct pci_device_id *ent)
> pci_set_drvdata(pdev, NULL);
>
> err:
> + if (ocxlpmem &&
> + (ocxlpmem_chi(ocxlpmem, &chi) == 0) &&
> + (chi & GLOBAL_MMIO_CHI_ELA))
> + dump_error_log(ocxlpmem);
> +
> /*
> * Further cleanup is done in the release handler via free_ocxlpmem()
> * This allows us to keep the character device live to handle IOCTLs to
> diff --git a/include/uapi/nvdimm/ocxlpmem.h b/include/uapi/nvdimm/ocxlpmem.h
> new file mode 100644
> index 000000000000..5d3a03ea1e08
> --- /dev/null
> +++ b/include/uapi/nvdimm/ocxlpmem.h
> @@ -0,0 +1,46 @@
> +/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
> +/* Copyright 2020 IBM Corp. */
> +#ifndef _UAPI_OCXL_SCM_H
> +#define _UAPI_OCXL_SCM_H
> +
> +#include <linux/types.h>
> +#include <linux/ioctl.h>
> +
> +#define OCXLPMEM_ERROR_LOG_ACTION_RESET (1 << (32 - 32))
> +#define OCXLPMEM_ERROR_LOG_ACTION_CHKFW (1 << (53 - 32))
> +#define OCXLPMEM_ERROR_LOG_ACTION_REPLACE (1 << (54 - 32))
> +#define OCXLPMEM_ERROR_LOG_ACTION_DUMP (1 << (55 - 32))
> +
> +#define OCXLPMEM_ERROR_LOG_TYPE_GENERAL (0x00)
> +#define OCXLPMEM_ERROR_LOG_TYPE_PREDICTIVE_FAILURE (0x01)
> +#define OCXLPMEM_ERROR_LOG_TYPE_THERMAL_WARNING (0x02)
> +#define OCXLPMEM_ERROR_LOG_TYPE_DATA_LOSS (0x03)
> +#define OCXLPMEM_ERROR_LOG_TYPE_HEALTH_PERFORMANCE (0x04)
> +
> +struct ioctl_ocxlpmem_error_log {
> + __u32 log_identifier; /* out */
> + __u32 program_reference_code; /* out */
> + __u32 action_flags; /* out, recommended course of action */
> + __u32 power_on_seconds; /* out, Number of seconds the controller has been on when the error occurred */
> + __u64 timestamp; /* out, relative time since the current IPL */
> + __u64 wwid[2]; /* out, the NAA formatted WWID associated with the controller */
> + char fw_revision[8 + 1]; /* out, firmware revision as null terminated text */
> + __u8 reserved0[7];
> + __u16 buf_size; /* in/out, buffer size provided/required.
> + * If required is greater than provided, the buffer
> + * will be truncated to the amount provided. If it's
> + * less, then only the required bytes will be populated.
> + * If it is 0, then there are no more error log entries.
> + */
> + __u8 error_log_type;
> + __u8 reserved1[5];
> + __u64 buf_ptr; /* coerced pointer to output buffer */
> + __u64 reserved2[2];
> +};
> +
> +/* ioctl numbers */
> +#define OCXLPMEM_MAGIC 0xCA
> +/* OpenCAPI Persistent memory devices */
> +#define IOCTL_OCXLPMEM_ERROR_LOG _IOWR(OCXLPMEM_MAGIC, 0x30, struct ioctl_ocxlpmem_error_log)
> +
> +#endif /* _UAPI_OCXL_SCM_H */
> --
> 2.24.1
>
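
The bit layout that error_log_header_parse() and read_error_log() rely on can be exercised outside the kernel. The sketch below is a standalone illustration, not the driver's code: parse_error_log_header, decode_error_log_word and struct error_log_info are names made up here, while the shifts, masks, the 'EL' identifier and the OCXLPMEM_ERROR_LOG_* values are copied from the quoted patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Values copied from the quoted include/uapi/nvdimm/ocxlpmem.h. */
#define OCXLPMEM_ERROR_LOG_ACTION_RESET   (1u << (32 - 32))
#define OCXLPMEM_ERROR_LOG_ACTION_CHKFW   (1u << (53 - 32))
#define OCXLPMEM_ERROR_LOG_ACTION_REPLACE (1u << (54 - 32))
#define OCXLPMEM_ERROR_LOG_ACTION_DUMP    (1u << (55 - 32))
#define OCXLPMEM_ERROR_LOG_TYPE_GENERAL   0x00

#define ERROR_LOG_DATA_ID 0x454C /* ASCII 'E' 'L' */

/* Hypothetical container for the fields packed into the word at +0x10. */
struct error_log_info {
	uint8_t type;
	uint32_t action_flags;
	uint32_t power_on_seconds;
};

/*
 * Check the identifier in the top 16 bits of the first data word and
 * return the payload length from the low 16 bits.
 * Returns 0 on success, -EINVAL on a bad identifier.
 */
static int parse_error_log_header(uint64_t val, uint16_t *length)
{
	uint16_t data_identifier = val >> 48;

	if (data_identifier != ERROR_LOG_DATA_ID)
		return -EINVAL;

	*length = val & 0xFFFF;
	return 0;
}

/*
 * Decode the word at offset 0x10: type in the top byte, 24 bits of
 * action flags (only meaningful for the "general" type), and the
 * power-on time in the low 32 bits.
 */
static struct error_log_info decode_error_log_word(uint64_t val)
{
	struct error_log_info info;

	info.type = val >> 56;
	info.action_flags = (info.type == OCXLPMEM_ERROR_LOG_TYPE_GENERAL) ?
			    (val >> 32) & 0xFFFFFF : 0;
	info.power_on_seconds = val & 0xFFFFFFFF;
	return info;
}
```

The "(n - 32)" form of the action macros reflects that the flags are numbered by their bit position in the full 64-bit word but stored after shifting right by 32, so bit 54 of the register becomes bit 22 of action_flags.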

2020-04-03 03:59:17

by Alastair D'Silva

Subject: RE: [PATCH v4 08/25] ocxl: Emit a log message showing how much LPC memory was detected

> -----Original Message-----
> From: Dan Williams <[email protected]>
> Sent: Wednesday, 1 April 2020 7:49 PM
> To: Alastair D'Silva <[email protected]>
> Cc: Aneesh Kumar K . V <[email protected]>; Oliver O'Halloran
> <[email protected]>; Benjamin Herrenschmidt
> <[email protected]>; Paul Mackerras <[email protected]>; Michael
> Ellerman <[email protected]>; Frederic Barrat <[email protected]>;
> Andrew Donnellan <[email protected]>; Arnd Bergmann
> <[email protected]>; Greg Kroah-Hartman <[email protected]>;
> Vishal Verma <[email protected]>; Dave Jiang
> <[email protected]>; Ira Weiny <[email protected]>; Andrew Morton
> <[email protected]>; Mauro Carvalho Chehab
> <[email protected]>; David S. Miller <[email protected]>;
> Rob Herring <[email protected]>; Anton Blanchard <[email protected]>;
> Krzysztof Kozlowski <[email protected]>; Mahesh Salgaonkar
> <[email protected]>; Madhavan Srinivasan
> <[email protected]>; Cédric Le Goater <[email protected]>; Anju T
> Sudhakar <[email protected]>; Hari Bathini
> <[email protected]>; Thomas Gleixner <[email protected]>; Greg
> Kurz <[email protected]>; Nicholas Piggin <[email protected]>; Masahiro
> Yamada <[email protected]>; Alexey Kardashevskiy
> <[email protected]>; Linux Kernel Mailing List <[email protected]>;
> linuxppc-dev <[email protected]>; linux-nvdimm <linux-
> [email protected]>; Linux MM <[email protected]>
> Subject: Re: [PATCH v4 08/25] ocxl: Emit a log message showing how much
> LPC memory was detected
>
> On Sun, Mar 29, 2020 at 10:23 PM Alastair D'Silva <[email protected]>
> wrote:
> >
> > This patch emits a message showing how much LPC memory & special
> > purpose memory was detected on an OCXL device.
> >
> > Signed-off-by: Alastair D'Silva <[email protected]>
> > Acked-by: Frederic Barrat <[email protected]>
> > Acked-by: Andrew Donnellan <[email protected]>
> > ---
> > drivers/misc/ocxl/config.c | 4 ++++
> > 1 file changed, 4 insertions(+)
> >
> > diff --git a/drivers/misc/ocxl/config.c b/drivers/misc/ocxl/config.c
> > index a62e3d7db2bf..69cca341d446 100644
> > --- a/drivers/misc/ocxl/config.c
> > +++ b/drivers/misc/ocxl/config.c
> > @@ -568,6 +568,10 @@ static int read_afu_lpc_memory_info(struct
> pci_dev *dev,
> > afu->special_purpose_mem_size =
> > total_mem_size - lpc_mem_size;
> > }
> > +
> > + dev_info(&dev->dev, "Probed LPC memory of %#llx bytes and special
> purpose memory of %#llx bytes\n",
> > + afu->lpc_mem_size, afu->special_purpose_mem_size);
>
> A patch for a single log message is too fine grained for my taste, let's squash
> this into another patch in the series.
>

I'm not sure there's a great place for it. At a pinch, it could go with the previous patch, but they are really different layers.

> > +
> > return 0;
> > }
> >
> > --
> > 2.24.1
> >


2020-04-03 04:39:33

by Michael Ellerman

Subject: Re: [PATCH v4 03/25] powerpc/powernv: Map & release OpenCAPI LPC memory

Benjamin Herrenschmidt <[email protected]> writes:
> On Wed, 2020-04-01 at 01:48 -0700, Dan Williams wrote:
>> >
>> > +u64 pnv_ocxl_platform_lpc_setup(struct pci_dev *pdev, u64 size)
>> > +{
>> > + struct pci_controller *hose = pci_bus_to_host(pdev->bus);
>> > + struct pnv_phb *phb = hose->private_data;
>>
>> Is calling the local variable 'hose' instead of 'host' on purpose?
>
> Haha that's funny :-)
>
> It's an oooooooold usage that comes iirc from sparc ? or maybe alpha ?

Yeah it was alpha, I found it in the history tree:

https://github.com/mpe/linux-fullhistory/blob/1928de59ba4209dc5e9f2cef63560c09ba0df73b/arch/alpha/kernel/mcpcia.c

And airlied found an old manual which confirms it:

The TIOP module interfaces the AlphaServer 8000 system bus to four I/O channels, called "hoses."

https://www.hpl.hp.com/hpjournal/dtj/vol7num1/vol7num1art4.pdf


So at least now we know where it comes from.

It's also used widely in mips, microblaze, sh and a little bit in drm.

cheers