2017-12-18 15:21:43

by Frederic Barrat

Subject: [PATCH 00/13] New driver to support OpenCAPI devices on POWER9


This series adds support for Open Coherent Accelerator (ocxl) devices
on the POWER9 processor. OpenCAPI is a consortium developing the
specifications for an interface between processors and accelerators,
which allows the accelerators to share host memory using virtual
addresses.

An OpenCAPI device can also have its own local memory and make it
accessible to the host, though this is not supported by this series.

The OpenCAPI specification is processor agnostic, but this series adds
support specifically for powerpc.

Even though the underlying transport is not PCI, the firmware
abstracts the hardware as a PCI host bridge and Linux sees the
OpenCAPI devices as PCI devices, so a lot of existing infrastructure
and commands can be reused.
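Because the devices show up as ordinary PCI functions, standard PCI discovery works from user space too. A minimal sketch, assuming sysfs is mounted at /sys and that OpenCAPI functions report the "Processing accelerators" base class (0x12) in their config space, the class the cxl patch in this series refers to. The helper names are hypothetical:

```c
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>

/* PCI base class 0x12: "Processing accelerators". */
#define PCI_BASE_CLASS_PROCESSING_ACCEL 0x12UL

/* Return 1 if a sysfs "class" string (e.g. "0x120000\n") has base class 0x12. */
static int is_processing_accel(const char *class_str)
{
	char *end;
	unsigned long cls = strtoul(class_str, &end, 16);

	if (end == class_str)
		return 0;		/* not a hex number */
	return (cls >> 16) == PCI_BASE_CLASS_PROCESSING_ACCEL;
}

/*
 * Print the PCI address of every function under @root (normally
 * /sys/bus/pci/devices) in the "Processing accelerators" class.
 * Returns how many were found, or -1 if @root cannot be opened.
 */
static int list_accel_devices(const char *root)
{
	DIR *dir = opendir(root);
	struct dirent *de;
	int found = 0;

	if (!dir)
		return -1;
	while ((de = readdir(dir)) != NULL) {
		char path[512], buf[32];
		FILE *f;

		if (de->d_name[0] == '.')
			continue;	/* skip "." and ".." */
		snprintf(path, sizeof(path), "%s/%s/class", root, de->d_name);
		f = fopen(path, "r");
		if (!f)
			continue;
		if (fgets(buf, sizeof(buf), f) && is_processing_accel(buf)) {
			printf("%s\n", de->d_name);
			found++;
		}
		fclose(f);
	}
	closedir(dir);
	return found;
}
```

Calling `list_accel_devices("/sys/bus/pci/devices")` prints the address of each matching function; `lspci` reports the same devices with more decoding.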

Patches 1-5: add the platform-specific services needed by the driver
Patches 6-10: driver code
Patch 11: small correction to existing cxl driver
Patch 12: documentation
Patch 13: MAINTAINERS entry

Current limitations, which will be addressed in later patches:
- no capability to trigger a reset of the opencapi adapter
- no support for the 'wake_host_thread' command
- no support for adapters with a dual-link connection (none exists yet)
- no access to the adapter-local memory

Many people contributed directly or indirectly, from the software,
hardware and bringup teams. In particular, Andrew Donnellan and
Alastair D'Silva are developing the related firmware and library.

Feedback welcome!



Frederic Barrat (13):
powerpc/powernv: Introduce new PHB type for opencapi links
powerpc/powernv: Set correct configuration space size for opencapi devices
powerpc/powernv: Add opal calls for opencapi
powerpc/powernv: Add platform-specific services for opencapi
powerpc/powernv: Capture actag information for the device
ocxl: Driver code for 'generic' opencapi devices
ocxl: Add AFU interrupt support
ocxl: Add a kernel API for other opencapi drivers
ocxl: Add trace points
ocxl: Add Makefile and Kconfig
cxl: Remove support for "Processing accelerators" class
ocxl: Documentation
ocxl: add MAINTAINERS entry

Documentation/accelerators/ocxl.rst | 151 +++++
Documentation/ioctl/ioctl-number.txt | 1 +
MAINTAINERS | 12 +
arch/powerpc/include/asm/opal-api.h | 5 +-
arch/powerpc/include/asm/opal.h | 6 +
arch/powerpc/include/asm/pnv-ocxl.h | 43 ++
arch/powerpc/platforms/powernv/Makefile | 1 +
arch/powerpc/platforms/powernv/npu-dma.c | 2 +-
arch/powerpc/platforms/powernv/ocxl.c | 519 ++++++++++++++++++
arch/powerpc/platforms/powernv/opal-wrappers.S | 3 +
arch/powerpc/platforms/powernv/pci-ioda.c | 56 +-
arch/powerpc/platforms/powernv/pci.c | 4 +
arch/powerpc/platforms/powernv/pci.h | 8 +-
drivers/misc/Kconfig | 1 +
drivers/misc/Makefile | 1 +
drivers/misc/cxl/pci.c | 2 -
drivers/misc/ocxl/Kconfig | 25 +
drivers/misc/ocxl/Makefile | 10 +
drivers/misc/ocxl/afu_irq.c | 209 +++++++
drivers/misc/ocxl/config.c | 729 +++++++++++++++++++++++++
drivers/misc/ocxl/context.c | 275 ++++++++++
drivers/misc/ocxl/file.c | 438 +++++++++++++++
drivers/misc/ocxl/link.c | 654 ++++++++++++++++++++++
drivers/misc/ocxl/main.c | 40 ++
drivers/misc/ocxl/ocxl_internal.h | 138 +++++
drivers/misc/ocxl/pasid.c | 114 ++++
drivers/misc/ocxl/pci.c | 592 ++++++++++++++++++++
drivers/misc/ocxl/sysfs.c | 150 +++++
drivers/misc/ocxl/trace.c | 13 +
drivers/misc/ocxl/trace.h | 189 +++++++
include/misc/ocxl-config.h | 52 ++
include/misc/ocxl.h | 221 ++++++++
include/uapi/misc/ocxl.h | 56 ++
33 files changed, 4702 insertions(+), 18 deletions(-)
create mode 100644 Documentation/accelerators/ocxl.rst
create mode 100644 arch/powerpc/include/asm/pnv-ocxl.h
create mode 100644 arch/powerpc/platforms/powernv/ocxl.c
create mode 100644 drivers/misc/ocxl/Kconfig
create mode 100644 drivers/misc/ocxl/Makefile
create mode 100644 drivers/misc/ocxl/afu_irq.c
create mode 100644 drivers/misc/ocxl/config.c
create mode 100644 drivers/misc/ocxl/context.c
create mode 100644 drivers/misc/ocxl/file.c
create mode 100644 drivers/misc/ocxl/link.c
create mode 100644 drivers/misc/ocxl/main.c
create mode 100644 drivers/misc/ocxl/ocxl_internal.h
create mode 100644 drivers/misc/ocxl/pasid.c
create mode 100644 drivers/misc/ocxl/pci.c
create mode 100644 drivers/misc/ocxl/sysfs.c
create mode 100644 drivers/misc/ocxl/trace.c
create mode 100644 drivers/misc/ocxl/trace.h
create mode 100644 include/misc/ocxl-config.h
create mode 100644 include/misc/ocxl.h
create mode 100644 include/uapi/misc/ocxl.h

--
2.14.1


2017-12-18 15:21:55

by Frederic Barrat

Subject: [PATCH 04/13] powerpc/powernv: Add platform-specific services for opencapi

Implement a few platform-specific calls which can be used by drivers:

- provide the Transaction Layer capabilities of the host, so that the
driver can find some common ground and configure the device and host
appropriately.

- provide the hw interrupt to be used for translation faults raised by
the NPU

- map/unmap some NPU mmio registers to get the fault context when the
NPU raises an address translation fault

The rest are wrappers around the previously-introduced opal calls.
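The rate buffer exchanged through these calls packs one 4-bit receive rate per template, two templates per byte, starting with template 63 in the high nibble of the first byte. The sketch below is a user-space restatement of `set_templ_rate()` and the P9 defaults from this patch, for illustration only. `get_templ_rate()` and `build_p9_rates()` are hypothetical helpers, and reading the capability mask as "bit n set means template n is supported" is an assumption based on the 0xF value and the "templates 0 -> 3" comment:

```c
#include <string.h>

/* Same values as the PNV_OCXL_TL_* macros in pnv-ocxl.h. */
#define TL_MAX_TEMPLATE   63
#define TL_BITS_PER_RATE  4
#define TL_RATE_BUF_SIZE  ((TL_MAX_TEMPLATE + 1) * TL_BITS_PER_RATE / 8)

/*
 * Store the 4-bit receive rate of @templ in @buf. Template 63 is the high
 * nibble of buf[0]; template 0 is the low nibble of buf[TL_RATE_BUF_SIZE - 1].
 */
static void set_templ_rate(unsigned int templ, unsigned int rate,
			   unsigned char *buf)
{
	unsigned int pos;

	if (templ > TL_MAX_TEMPLATE)
		return;			/* the kernel version WARNs instead */
	pos = TL_MAX_TEMPLATE - templ;
	buf[pos / 2] |= (rate & 0xf) << (4 * (1 - (pos % 2)));
}

/* Hypothetical: read back the rate of @templ to check the packing. */
static unsigned int get_templ_rate(unsigned int templ, const unsigned char *buf)
{
	unsigned int pos = TL_MAX_TEMPLATE - templ;

	return (buf[pos / 2] >> (4 * (1 - (pos % 2)))) & 0xf;
}

/* Hypothetical: P9 host values, templates 0-3 supported, template 2 at rate 1. */
static void build_p9_rates(unsigned char *buf, long *cap)
{
	memset(buf, 0, TL_RATE_BUF_SIZE);
	set_templ_rate(2, 1, buf);
	*cap = 0xF;			/* assumed: bit n => template n */
}
```

With these defaults the 32-byte buffer is all zeroes except byte 30, which holds 0x01 (template 2, rate 1).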


Signed-off-by: Frederic Barrat <[email protected]>
---
arch/powerpc/include/asm/pnv-ocxl.h | 36 ++++++
arch/powerpc/platforms/powernv/Makefile | 1 +
arch/powerpc/platforms/powernv/ocxl.c | 187 ++++++++++++++++++++++++++++++++
3 files changed, 224 insertions(+)
create mode 100644 arch/powerpc/include/asm/pnv-ocxl.h
create mode 100644 arch/powerpc/platforms/powernv/ocxl.c

diff --git a/arch/powerpc/include/asm/pnv-ocxl.h b/arch/powerpc/include/asm/pnv-ocxl.h
new file mode 100644
index 000000000000..b9ab3f0a9634
--- /dev/null
+++ b/arch/powerpc/include/asm/pnv-ocxl.h
@@ -0,0 +1,36 @@
+/*
+ * Copyright 2017 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _ASM_PNV_OCXL_H
+#define _ASM_PNV_OCXL_H
+
+#include <linux/pci.h>
+
+#define PNV_OCXL_TL_MAX_TEMPLATE 63
+#define PNV_OCXL_TL_BITS_PER_RATE 4
+#define PNV_OCXL_TL_RATE_BUF_SIZE ((PNV_OCXL_TL_MAX_TEMPLATE+1) * PNV_OCXL_TL_BITS_PER_RATE / 8)
+
+extern int pnv_ocxl_get_tl_cap(struct pci_dev *dev, long *cap,
+ char *rate_buf, int rate_buf_size);
+extern int pnv_ocxl_set_tl_conf(struct pci_dev *dev, long cap,
+ uint64_t rate_buf_phys, int rate_buf_size);
+
+extern int pnv_ocxl_get_xsl_irq(struct pci_dev *dev, int *hwirq);
+extern void pnv_ocxl_unmap_xsl_regs(void __iomem *dsisr, void __iomem *dar,
+ void __iomem *tfc, void __iomem *pe_handle);
+extern int pnv_ocxl_map_xsl_regs(struct pci_dev *dev, void __iomem **dsisr,
+ void __iomem **dar, void __iomem **tfc,
+ void __iomem **pe_handle);
+
+extern int pnv_ocxl_spa_setup(struct pci_dev *dev, void *spa_mem, int PE_mask,
+ void **platform_data);
+extern void pnv_ocxl_spa_release(void *platform_data);
+extern int pnv_ocxl_spa_remove_pe(void *platform_data, int pe_handle);
+
+#endif /* _ASM_PNV_OCXL_H */
diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index 3732118a0482..6c9d5199a7e2 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -17,3 +17,4 @@ obj-$(CONFIG_PERF_EVENTS) += opal-imc.o
obj-$(CONFIG_PPC_MEMTRACE) += memtrace.o
obj-$(CONFIG_PPC_VAS) += vas.o vas-window.o vas-debug.o
obj-$(CONFIG_PPC_FTW) += nx-ftw.o
+obj-$(CONFIG_OCXL_BASE) += ocxl.o
diff --git a/arch/powerpc/platforms/powernv/ocxl.c b/arch/powerpc/platforms/powernv/ocxl.c
new file mode 100644
index 000000000000..3378b75cf5e5
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/ocxl.c
@@ -0,0 +1,187 @@
+/*
+ * Copyright 2017 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <asm/pnv-ocxl.h>
+#include <asm/opal.h>
+#include "pci.h"
+
+#define PNV_OCXL_TL_P9_RECV_CAP 0x000000000000000Full
+/* PASIDs are 20-bit, but on P9, NPU can only handle 15 bits */
+#define PNV_OCXL_PASID_BITS 15
+#define PNV_OCXL_PASID_MAX ((1 << PNV_OCXL_PASID_BITS) - 1)
+
+
+static void set_templ_rate(unsigned int templ, unsigned int rate, char *buf)
+{
+ int shift, idx;
+
+ WARN_ON(templ > PNV_OCXL_TL_MAX_TEMPLATE);
+ idx = (PNV_OCXL_TL_MAX_TEMPLATE - templ) / 2;
+ shift = 4 * (1 - ((PNV_OCXL_TL_MAX_TEMPLATE - templ) % 2));
+ buf[idx] |= rate << shift;
+}
+
+int pnv_ocxl_get_tl_cap(struct pci_dev *dev, long *cap,
+ char *rate_buf, int rate_buf_size)
+{
+ if (rate_buf_size != PNV_OCXL_TL_RATE_BUF_SIZE)
+ return -EINVAL;
+ /*
+ * The TL capabilities are a characteristic of the NPU, so
+ * we go with hard-coded values.
+ *
+ * The receiving rate of each template is encoded on 4 bits.
+ *
+ * On P9:
+ * - templates 0 -> 3 are supported
+ * - templates 0, 1 and 3 have a 0 receiving rate
+ * - template 2 has receiving rate of 1 (extra cycle)
+ */
+ memset(rate_buf, 0, rate_buf_size);
+ set_templ_rate(2, 1, rate_buf);
+ *cap = PNV_OCXL_TL_P9_RECV_CAP;
+ return 0;
+}
+EXPORT_SYMBOL_GPL(pnv_ocxl_get_tl_cap);
+
+int pnv_ocxl_set_tl_conf(struct pci_dev *dev, long cap,
+ uint64_t rate_buf_phys, int rate_buf_size)
+{
+ struct pci_controller *hose = pci_bus_to_host(dev->bus);
+ struct pnv_phb *phb = hose->private_data;
+ int rc;
+
+ if (rate_buf_size != PNV_OCXL_TL_RATE_BUF_SIZE)
+ return -EINVAL;
+
+ rc = opal_npu_tl_set(phb->opal_id, dev->devfn, cap,
+ rate_buf_phys, rate_buf_size);
+ if (rc) {
+ dev_err(&dev->dev, "Can't configure host TL: %d\n", rc);
+ return -EINVAL;
+ }
+ return 0;
+}
+EXPORT_SYMBOL_GPL(pnv_ocxl_set_tl_conf);
+
+int pnv_ocxl_get_xsl_irq(struct pci_dev *dev, int *hwirq)
+{
+ int rc;
+
+ rc = of_property_read_u32(dev->dev.of_node, "ibm,opal-xsl-irq", hwirq);
+ if (rc) {
+ dev_err(&dev->dev,
+ "Can't get translation interrupt for device\n");
+ return rc;
+ }
+ return 0;
+}
+EXPORT_SYMBOL_GPL(pnv_ocxl_get_xsl_irq);
+
+void pnv_ocxl_unmap_xsl_regs(void __iomem *dsisr, void __iomem *dar,
+ void __iomem *tfc, void __iomem *pe_handle)
+{
+ iounmap(dsisr);
+ iounmap(dar);
+ iounmap(tfc);
+ iounmap(pe_handle);
+}
+EXPORT_SYMBOL_GPL(pnv_ocxl_unmap_xsl_regs);
+
+int pnv_ocxl_map_xsl_regs(struct pci_dev *dev, void __iomem **dsisr,
+ void __iomem **dar, void __iomem **tfc,
+ void __iomem **pe_handle)
+{
+ u64 reg;
+ int i, j, rc = 0;
+ void __iomem *regs[4];
+
+ /*
+ * opal stores the mmio addresses of the DSISR, DAR, TFC and
+ * PE_HANDLE registers in a device tree property, in that
+ * order
+ */
+ for (i = 0; i < 4; i++) {
+ rc = of_property_read_u64_index(dev->dev.of_node,
+ "ibm,opal-xsl-mmio", i, &reg);
+ if (rc)
+ break;
+ regs[i] = ioremap(reg, 8);
+ if (!regs[i]) {
+ rc = -EINVAL;
+ break;
+ }
+ }
+ if (rc) {
+ dev_err(&dev->dev, "Can't map translation mmio registers\n");
+ for (j = i - 1; j >= 0; j--)
+ iounmap(regs[j]);
+ } else {
+ *dsisr = regs[0];
+ *dar = regs[1];
+ *tfc = regs[2];
+ *pe_handle = regs[3];
+ }
+ return rc;
+}
+EXPORT_SYMBOL_GPL(pnv_ocxl_map_xsl_regs);
+
+struct spa_data {
+ u64 phb_opal_id;
+ u32 bdfn;
+};
+
+int pnv_ocxl_spa_setup(struct pci_dev *dev, void *spa_mem, int PE_mask,
+ void **platform_data)
+{
+ struct pci_controller *hose = pci_bus_to_host(dev->bus);
+ struct pnv_phb *phb = hose->private_data;
+ struct spa_data *data;
+ u32 bdfn;
+ int rc;
+
+ data = kzalloc(sizeof(*data), GFP_KERNEL);
+ if (!data)
+ return -ENOMEM;
+
+ bdfn = (dev->bus->number << 8) | dev->devfn;
+ rc = opal_npu_spa_setup(phb->opal_id, bdfn, virt_to_phys(spa_mem),
+ PE_mask);
+ if (rc) {
+ dev_err(&dev->dev, "Can't setup Shared Process Area: %d\n", rc);
+ kfree(data);
+ return rc;
+ }
+ data->phb_opal_id = phb->opal_id;
+ data->bdfn = bdfn;
+ *platform_data = (void *) data;
+ return 0;
+}
+EXPORT_SYMBOL_GPL(pnv_ocxl_spa_setup);
+
+void pnv_ocxl_spa_release(void *platform_data)
+{
+ struct spa_data *data = (struct spa_data *) platform_data;
+ int rc;
+
+ rc = opal_npu_spa_setup(data->phb_opal_id, data->bdfn, 0, 0);
+ WARN_ON(rc);
+ kfree(data);
+}
+EXPORT_SYMBOL_GPL(pnv_ocxl_spa_release);
+
+int pnv_ocxl_spa_remove_pe(void *platform_data, int pe_handle)
+{
+ struct spa_data *data = (struct spa_data *) platform_data;
+ int rc;
+
+ rc = opal_npu_spa_clear_cache(data->phb_opal_id, data->bdfn, pe_handle);
+ return rc;
+}
+EXPORT_SYMBOL_GPL(pnv_ocxl_spa_remove_pe);
--
2.14.1

2017-12-18 15:21:59

by Frederic Barrat

Subject: [PATCH 03/13] powerpc/powernv: Add opal calls for opencapi

Add opal calls to interact with the NPU:

OPAL_NPU_SPA_SETUP: set up the Shared Process Area (SPA)
The SPA is a table containing one entry (Process Element) per memory
context which can be accessed by the opencapi device.

OPAL_NPU_SPA_CLEAR_CACHE: clear the context cache
The NPU keeps a cache of recently accessed memory contexts. When a
Process Element is removed from the SPA, the cache for the link must
be cleared.

OPAL_NPU_TL_SET: configure the Transaction Layer
The Transaction Layer specification defines several templates for
messages to be exchanged on the link. During link setup, the host and
device must negotiate what templates are supported on both sides and
at what rates those messages can be sent.
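To illustrate why OPAL_NPU_SPA_CLEAR_CACHE is needed, here is a deliberately simplified, hypothetical model of a link: an SPA table indexed by PE handle plus a tiny cache of recently used entries. It is not the NPU's real cache or the real Process Element layout; all names are made up. It only shows that zeroing an SPA entry is not enough while a cached copy can still be hit:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified model: not the real NPU cache or SPA layout. */
#define SPA_ENTRIES 8
#define CACHE_SLOTS 2

struct pe { bool valid; unsigned int pid; };

struct link_model {
	struct pe spa[SPA_ENTRIES];	/* Shared Process Area */
	struct { bool valid; int handle; struct pe copy; } cache[CACHE_SLOTS];
};

/* Look up a context: try the cache first, then fill it from the SPA. */
static bool lookup(struct link_model *l, int handle, struct pe *out)
{
	int slot = handle % CACHE_SLOTS;

	if (handle < 0 || handle >= SPA_ENTRIES)
		return false;
	if (l->cache[slot].valid && l->cache[slot].handle == handle) {
		*out = l->cache[slot].copy;	/* may be stale */
		return out->valid;
	}
	if (!l->spa[handle].valid)
		return false;
	l->cache[slot].valid = true;
	l->cache[slot].handle = handle;
	l->cache[slot].copy = l->spa[handle];
	*out = l->spa[handle];
	return true;
}

/* Analogue of OPAL_NPU_SPA_CLEAR_CACHE for one process element. */
static void clear_cache(struct link_model *l, int handle)
{
	int slot = handle % CACHE_SLOTS;

	if (l->cache[slot].handle == handle)
		l->cache[slot].valid = false;
}

/* Removing a PE is only complete once any cached copy is dropped too. */
static void remove_pe(struct link_model *l, int handle, bool flush)
{
	memset(&l->spa[handle], 0, sizeof(l->spa[handle]));
	if (flush)
		clear_cache(l, handle);
}
```

In this model, removing a PE without the flush still lets a lookup succeed from the stale cached copy; only after the cache is cleared does the lookup fail as expected.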


Signed-off-by: Frederic Barrat <[email protected]>
---
arch/powerpc/include/asm/opal-api.h | 5 ++++-
arch/powerpc/include/asm/opal.h | 6 ++++++
arch/powerpc/platforms/powernv/opal-wrappers.S | 3 +++
3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
index 233c7504b1f2..24c73f5575ee 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -201,7 +201,10 @@
#define OPAL_SET_POWER_SHIFT_RATIO 155
#define OPAL_SENSOR_GROUP_CLEAR 156
#define OPAL_PCI_SET_P2P 157
-#define OPAL_LAST 157
+#define OPAL_NPU_SPA_SETUP 159
+#define OPAL_NPU_SPA_CLEAR_CACHE 160
+#define OPAL_NPU_TL_SET 161
+#define OPAL_LAST 161

/* Device tree flags */

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 0c545f7fc77b..12e70fb58700 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -34,6 +34,12 @@ int64_t opal_npu_init_context(uint64_t phb_id, int pasid, uint64_t msr,
uint64_t bdf);
int64_t opal_npu_map_lpar(uint64_t phb_id, uint64_t bdf, uint64_t lparid,
uint64_t lpcr);
+int64_t opal_npu_spa_setup(uint64_t phb_id, uint32_t bdfn,
+ uint64_t addr, uint64_t PE_mask);
+int64_t opal_npu_spa_clear_cache(uint64_t phb_id, uint32_t bdfn,
+ uint64_t PE_handle);
+int64_t opal_npu_tl_set(uint64_t phb_id, uint32_t bdfn, long cap,
+ uint64_t rate_phys, uint32_t size);
int64_t opal_console_write(int64_t term_number, __be64 *length,
const uint8_t *buffer);
int64_t opal_console_read(int64_t term_number, __be64 *length,
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
index 6f4b00a2ac46..1b2936ba6040 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -320,3 +320,6 @@ OPAL_CALL(opal_set_powercap, OPAL_SET_POWERCAP);
OPAL_CALL(opal_get_power_shift_ratio, OPAL_GET_POWER_SHIFT_RATIO);
OPAL_CALL(opal_set_power_shift_ratio, OPAL_SET_POWER_SHIFT_RATIO);
OPAL_CALL(opal_sensor_group_clear, OPAL_SENSOR_GROUP_CLEAR);
+OPAL_CALL(opal_npu_spa_setup, OPAL_NPU_SPA_SETUP);
+OPAL_CALL(opal_npu_spa_clear_cache, OPAL_NPU_SPA_CLEAR_CACHE);
+OPAL_CALL(opal_npu_tl_set, OPAL_NPU_TL_SET);
--
2.14.1

2017-12-18 15:22:13

by Frederic Barrat

Subject: [PATCH 09/13] ocxl: Add trace points

Define a few trace points so that we can use the standard tracing
mechanism for debug and/or monitoring.
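Since TRACE_SYSTEM is set to `ocxl`, the new events are expected to appear under `events/ocxl/` in tracefs. A minimal user-space sketch for toggling them, assuming root privileges and that the caller passes the tracefs mount point (commonly /sys/kernel/debug/tracing, or /sys/kernel/tracing). The helper names are hypothetical:

```c
#include <stdio.h>
#include <string.h>

/*
 * Build the tracefs "enable" path for one event of the "ocxl" trace system,
 * or for the whole system when @event is NULL. Returns 0, or -1 on error.
 */
static int ocxl_trace_enable_path(char *buf, size_t len, const char *tracefs,
				  const char *event)
{
	int n;

	/* Reject names that would escape the events/ocxl/ directory. */
	if (event && strchr(event, '/'))
		return -1;
	if (event)
		n = snprintf(buf, len, "%s/events/ocxl/%s/enable", tracefs, event);
	else
		n = snprintf(buf, len, "%s/events/ocxl/enable", tracefs);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write "1" (enable) or "0" (disable); needs root and a mounted tracefs. */
static int ocxl_trace_set(const char *tracefs, const char *event, int on)
{
	char path[256];
	FILE *f;

	if (ocxl_trace_enable_path(path, sizeof(path), tracefs, event))
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	fputs(on ? "1" : "0", f);
	return fclose(f) == 0 ? 0 : -1;
}
```

For example, `ocxl_trace_set("/sys/kernel/debug/tracing", "ocxl_fault", 1)` enables only the fault event, and passing NULL as the event enables all of them; records can then be read from the `trace` file in the same tracefs directory.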

Signed-off-by: Frederic Barrat <[email protected]>
---
drivers/misc/ocxl/afu_irq.c | 5 ++
drivers/misc/ocxl/context.c | 2 +
drivers/misc/ocxl/link.c | 11 ++-
drivers/misc/ocxl/trace.c | 13 +++
drivers/misc/ocxl/trace.h | 189 ++++++++++++++++++++++++++++++++++++++++++++
5 files changed, 219 insertions(+), 1 deletion(-)
create mode 100644 drivers/misc/ocxl/trace.c
create mode 100644 drivers/misc/ocxl/trace.h

diff --git a/drivers/misc/ocxl/afu_irq.c b/drivers/misc/ocxl/afu_irq.c
index 0b217a854837..472fe1fb9fd9 100644
--- a/drivers/misc/ocxl/afu_irq.c
+++ b/drivers/misc/ocxl/afu_irq.c
@@ -11,6 +11,7 @@
#include <linux/eventfd.h>
#include <asm/pnv-ocxl.h>
#include "ocxl_internal.h"
+#include "trace.h"

struct afu_irq {
int id;
@@ -35,6 +36,7 @@ static irqreturn_t afu_irq_handler(int virq, void *data)
{
struct afu_irq *irq = (struct afu_irq *) data;

+ trace_ocxl_afu_irq_receive(virq);
if (irq->ev_ctx)
eventfd_signal(irq->ev_ctx, 1);
return IRQ_HANDLED;
@@ -109,6 +111,8 @@ int ocxl_afu_irq_alloc(struct ocxl_context *ctx, u64 *irq_offset)

*irq_offset = irq_id_to_offset(ctx, irq->id);

+ trace_ocxl_afu_irq_alloc(ctx->pasid, irq->id, irq->virq, irq->hw_irq,
+ *irq_offset);
mutex_unlock(&ctx->irq_lock);
return 0;

@@ -124,6 +128,7 @@ int ocxl_afu_irq_alloc(struct ocxl_context *ctx, u64 *irq_offset)

static void afu_irq_free(struct afu_irq *irq, struct ocxl_context *ctx)
{
+ trace_ocxl_afu_irq_free(ctx->pasid, irq->id);
if (ctx->mapping)
unmap_mapping_range(ctx->mapping,
irq_id_to_offset(ctx, irq->id),
diff --git a/drivers/misc/ocxl/context.c b/drivers/misc/ocxl/context.c
index 19575269ed22..bc6546a8877e 100644
--- a/drivers/misc/ocxl/context.c
+++ b/drivers/misc/ocxl/context.c
@@ -8,6 +8,7 @@
*/

#include <linux/sched/mm.h>
+#include "trace.h"
#include "ocxl_internal.h"

struct ocxl_context *ocxl_context_alloc(void)
@@ -210,6 +211,7 @@ int ocxl_context_detach(struct ocxl_context *ctx)
mutex_lock(&ctx->afu->afu_control_lock);
rc = ocxl_config_terminate_pasid(dev, afu_control_pos, ctx->pasid);
mutex_unlock(&ctx->afu->afu_control_lock);
+ trace_ocxl_terminate_pasid(ctx->pasid, rc);
if (rc) {
/*
* If we timeout waiting for the AFU to terminate the
diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
index fda2c9def4ff..99dfb198b3d9 100644
--- a/drivers/misc/ocxl/link.c
+++ b/drivers/misc/ocxl/link.c
@@ -14,6 +14,7 @@
#include <asm/pnv-ocxl.h>
#include <misc/ocxl.h>
#include "ocxl_internal.h"
+#include "trace.h"


#define SPA_PASID_BITS 15
@@ -123,8 +124,11 @@ static void ack_irq(struct spa *spa, enum xsl_response r)
else
WARN(1, "Invalid irq response %d\n", r);

- if (reg)
+ if (reg) {
+ trace_ocxl_fault_ack(spa->spa_mem, spa->xsl_fault.pe,
+ spa->xsl_fault.dsisr, spa->xsl_fault.dar, reg);
out_be64(spa->reg_tfc, reg);
+ }
}

static void xsl_fault_handler_bh(struct work_struct *fault_work)
@@ -189,6 +193,7 @@ static irqreturn_t xsl_fault_handler(int irq, void *data)
int lpid, pid, tid;

read_irq(spa, &dsisr, &dar, &pe_handle);
+ trace_ocxl_fault(spa->spa_mem, pe_handle, dsisr, dar, -1);

WARN_ON(pe_handle > SPA_PE_MASK);
pe = spa->spa_mem + pe_handle;
@@ -539,6 +544,7 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
* the problem.
*/
mmgrab(mm);
+ trace_ocxl_context_add(current->pid, spa->spa_mem, pasid, pidr, tidr);
unlock:
mutex_unlock(&spa->spa_lock);
return rc;
@@ -584,6 +590,9 @@ int ocxl_link_remove_pe(void *link_handle, int pasid)
goto unlock;
}

+ trace_ocxl_context_remove(current->pid, spa->spa_mem, pasid,
+ be32_to_cpu(pe->pid), be32_to_cpu(pe->tid));
+
memset(pe, 0, sizeof(struct ocxl_process_element));
/*
* The barrier makes sure the PE is removed from the SPA
diff --git a/drivers/misc/ocxl/trace.c b/drivers/misc/ocxl/trace.c
new file mode 100644
index 000000000000..7d3993283f8d
--- /dev/null
+++ b/drivers/misc/ocxl/trace.c
@@ -0,0 +1,13 @@
+/*
+ * Copyright 2017 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef __CHECKER__
+#define CREATE_TRACE_POINTS
+#include "trace.h"
+#endif
diff --git a/drivers/misc/ocxl/trace.h b/drivers/misc/ocxl/trace.h
new file mode 100644
index 000000000000..2ec202c79c2d
--- /dev/null
+++ b/drivers/misc/ocxl/trace.h
@@ -0,0 +1,189 @@
+/*
+ * Copyright 2017 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM ocxl
+
+#if !defined(_TRACE_OCXL_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_OCXL_H
+
+#include <linux/tracepoint.h>
+
+DECLARE_EVENT_CLASS(ocxl_context,
+ TP_PROTO(pid_t pid, void *spa, int pasid, u32 pidr, u32 tidr),
+ TP_ARGS(pid, spa, pasid, pidr, tidr),
+
+ TP_STRUCT__entry(
+ __field(pid_t, pid)
+ __field(void*, spa)
+ __field(int, pasid)
+ __field(u32, pidr)
+ __field(u32, tidr)
+ ),
+
+ TP_fast_assign(
+ __entry->pid = pid;
+ __entry->spa = spa;
+ __entry->pasid = pasid;
+ __entry->pidr = pidr;
+ __entry->tidr = tidr;
+ ),
+
+ TP_printk("linux pid=%d spa=0x%p pasid=0x%x pidr=0x%x tidr=0x%x",
+ __entry->pid,
+ __entry->spa,
+ __entry->pasid,
+ __entry->pidr,
+ __entry->tidr
+ )
+);
+
+DEFINE_EVENT(ocxl_context, ocxl_context_add,
+ TP_PROTO(pid_t pid, void *spa, int pasid, u32 pidr, u32 tidr),
+ TP_ARGS(pid, spa, pasid, pidr, tidr)
+);
+
+DEFINE_EVENT(ocxl_context, ocxl_context_remove,
+ TP_PROTO(pid_t pid, void *spa, int pasid, u32 pidr, u32 tidr),
+ TP_ARGS(pid, spa, pasid, pidr, tidr)
+);
+
+TRACE_EVENT(ocxl_terminate_pasid,
+ TP_PROTO(int pasid, int rc),
+ TP_ARGS(pasid, rc),
+
+ TP_STRUCT__entry(
+ __field(int, pasid)
+ __field(int, rc)
+ ),
+
+ TP_fast_assign(
+ __entry->pasid = pasid;
+ __entry->rc = rc;
+ ),
+
+ TP_printk("pasid=0x%x rc=%d",
+ __entry->pasid,
+ __entry->rc
+ )
+);
+
+DECLARE_EVENT_CLASS(ocxl_fault_handler,
+ TP_PROTO(void *spa, u64 pe, u64 dsisr, u64 dar, u64 tfc),
+ TP_ARGS(spa, pe, dsisr, dar, tfc),
+
+ TP_STRUCT__entry(
+ __field(void *, spa)
+ __field(u64, pe)
+ __field(u64, dsisr)
+ __field(u64, dar)
+ __field(u64, tfc)
+ ),
+
+ TP_fast_assign(
+ __entry->spa = spa;
+ __entry->pe = pe;
+ __entry->dsisr = dsisr;
+ __entry->dar = dar;
+ __entry->tfc = tfc;
+ ),
+
+ TP_printk("spa=%p pe=0x%llx dsisr=0x%llx dar=0x%llx tfc=0x%llx",
+ __entry->spa,
+ __entry->pe,
+ __entry->dsisr,
+ __entry->dar,
+ __entry->tfc
+ )
+);
+
+DEFINE_EVENT(ocxl_fault_handler, ocxl_fault,
+ TP_PROTO(void *spa, u64 pe, u64 dsisr, u64 dar, u64 tfc),
+ TP_ARGS(spa, pe, dsisr, dar, tfc)
+);
+
+DEFINE_EVENT(ocxl_fault_handler, ocxl_fault_ack,
+ TP_PROTO(void *spa, u64 pe, u64 dsisr, u64 dar, u64 tfc),
+ TP_ARGS(spa, pe, dsisr, dar, tfc)
+);
+
+TRACE_EVENT(ocxl_afu_irq_alloc,
+ TP_PROTO(int pasid, int irq_id, unsigned int virq, int hw_irq,
+ u64 irq_offset),
+ TP_ARGS(pasid, irq_id, virq, hw_irq, irq_offset),
+
+ TP_STRUCT__entry(
+ __field(int, pasid)
+ __field(int, irq_id)
+ __field(unsigned int, virq)
+ __field(int, hw_irq)
+ __field(u64, irq_offset)
+ ),
+
+ TP_fast_assign(
+ __entry->pasid = pasid;
+ __entry->irq_id = irq_id;
+ __entry->virq = virq;
+ __entry->hw_irq = hw_irq;
+ __entry->irq_offset = irq_offset;
+ ),
+
+ TP_printk("pasid=0x%x irq_id=%d virq=%u hw_irq=%d irq_offset=0x%llx",
+ __entry->pasid,
+ __entry->irq_id,
+ __entry->virq,
+ __entry->hw_irq,
+ __entry->irq_offset
+ )
+);
+
+TRACE_EVENT(ocxl_afu_irq_free,
+ TP_PROTO(int pasid, int irq_id),
+ TP_ARGS(pasid, irq_id),
+
+ TP_STRUCT__entry(
+ __field(int, pasid)
+ __field(int, irq_id)
+ ),
+
+ TP_fast_assign(
+ __entry->pasid = pasid;
+ __entry->irq_id = irq_id;
+ ),
+
+ TP_printk("pasid=0x%x irq_id=%d",
+ __entry->pasid,
+ __entry->irq_id
+ )
+);
+
+TRACE_EVENT(ocxl_afu_irq_receive,
+ TP_PROTO(int virq),
+ TP_ARGS(virq),
+
+ TP_STRUCT__entry(
+ __field(int, virq)
+ ),
+
+ TP_fast_assign(
+ __entry->virq = virq;
+ ),
+
+ TP_printk("virq=%d",
+ __entry->virq
+ )
+);
+
+#endif /* _TRACE_OCXL_H */
+
+/* This part must be outside protection */
+#undef TRACE_INCLUDE_PATH
+#define TRACE_INCLUDE_PATH .
+#define TRACE_INCLUDE_FILE trace
+#include <trace/define_trace.h>
--
2.14.1

2017-12-18 15:22:48

by Frederic Barrat

Subject: [PATCH 10/13] ocxl: Add Makefile and Kconfig

OCXL_BASE triggers the platform support needed by the driver.

Signed-off-by: Frederic Barrat <[email protected]>
---
drivers/misc/Kconfig | 1 +
drivers/misc/Makefile | 1 +
drivers/misc/ocxl/Kconfig | 25 +++++++++++++++++++++++++
drivers/misc/ocxl/Makefile | 10 ++++++++++
4 files changed, 37 insertions(+)
create mode 100644 drivers/misc/ocxl/Kconfig
create mode 100644 drivers/misc/ocxl/Makefile

diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index f1a5c2357b14..0534f338c84a 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -508,4 +508,5 @@ source "drivers/misc/mic/Kconfig"
source "drivers/misc/genwqe/Kconfig"
source "drivers/misc/echo/Kconfig"
source "drivers/misc/cxl/Kconfig"
+source "drivers/misc/ocxl/Kconfig"
endmenu
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index 5ca5f64df478..73326d54e246 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -55,6 +55,7 @@ obj-$(CONFIG_CXL_BASE) += cxl/
obj-$(CONFIG_ASPEED_LPC_CTRL) += aspeed-lpc-ctrl.o
obj-$(CONFIG_ASPEED_LPC_SNOOP) += aspeed-lpc-snoop.o
obj-$(CONFIG_PCI_ENDPOINT_TEST) += pci_endpoint_test.o
+obj-$(CONFIG_OCXL) += ocxl/

lkdtm-$(CONFIG_LKDTM) += lkdtm_core.o
lkdtm-$(CONFIG_LKDTM) += lkdtm_bugs.o
diff --git a/drivers/misc/ocxl/Kconfig b/drivers/misc/ocxl/Kconfig
new file mode 100644
index 000000000000..4496b61f48db
--- /dev/null
+++ b/drivers/misc/ocxl/Kconfig
@@ -0,0 +1,25 @@
+#
+# Open Coherent Accelerator (OCXL) compatible devices
+#
+
+config OCXL_BASE
+ bool
+ default n
+ select PPC_COPRO_BASE
+
+config OCXL
+ tristate "Support for Open Coherent Accelerators (OCXL)"
+ depends on PPC_POWERNV && PCI && EEH
+ select OCXL_BASE
+ default m
+ help
+
+ Select this option to enable driver support for Open
+ Coherent Accelerators (OCXL). OCXL is otherwise known as
+ Open Coherent Accelerator Processor Interface (OpenCAPI).
+ OpenCAPI allows accelerators in FPGAs to be coherently attached
+ to a CPU through an OpenCAPI link. This driver enables
+ userspace programs to access these accelerators through
+ devices found in /dev/ocxl/.
+
+ If unsure, say N.
diff --git a/drivers/misc/ocxl/Makefile b/drivers/misc/ocxl/Makefile
new file mode 100644
index 000000000000..f75853411cfd
--- /dev/null
+++ b/drivers/misc/ocxl/Makefile
@@ -0,0 +1,10 @@
+ccflags-$(CONFIG_PPC_WERROR) += -Werror
+
+ocxl-y += main.o pci.o config.o file.o pasid.o
+ocxl-y += link.o context.o afu_irq.o sysfs.o trace.o
+obj-$(CONFIG_OCXL) += ocxl.o
+
+# For tracepoints to include our trace.h from tracepoint infrastructure:
+CFLAGS_trace.o := -I$(src)
+
+# ccflags-y += -DDEBUG
--
2.14.1

2017-12-18 15:22:08

by Frederic Barrat

Subject: [PATCH 11/13] cxl: Remove support for "Processing accelerators" class

The cxl driver currently declares the "Processing accelerators" class
in its table of supported PCI devices. Therefore it may be called to
probe opencapi devices, which generates errors, as the config space of
an opencapi device is not compatible with what cxl expects.

So remove support for the generic class, as we now have (at least) two
drivers for devices of the same class. Most cxl devices are FPGAs with
a PSL which will show a known device ID of 0x477. Other devices are
actually supported by the cxlflash driver and are already listed in the
table. So removing the class is expected to go unnoticed.
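The PCI core decides whether to call a driver's probe routine by comparing the device against each `pci_device_id` entry of the driver's table. A reduced sketch of that comparison (vendor, device and masked class only; the kernel's check also covers subsystem IDs), with made-up structure names, shows why a `PCI_DEVICE_CLASS(0x120000, ~0)` entry catches any function of that class, OpenCAPI ones included, while an ID entry such as 0x0477 does not:

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_ANY_ID (~0U)

/* Reduced, hypothetical stand-ins for struct pci_device_id / struct pci_dev. */
struct id_entry {
	uint32_t vendor, device;
	uint32_t class, class_mask;
};

struct fake_dev {
	uint16_t vendor, device;
	uint32_t class;		/* 24 bits: base class, subclass, prog-if */
};

/* Vendor/device must match (or be wildcards) and the masked class must agree. */
static bool id_matches(const struct id_entry *id, const struct fake_dev *dev)
{
	return (id->vendor == PCI_ANY_ID || id->vendor == dev->vendor) &&
	       (id->device == PCI_ANY_ID || id->device == dev->device) &&
	       !((id->class ^ dev->class) & id->class_mask);
}
```

A class entry is modelled as `{ PCI_ANY_ID, PCI_ANY_ID, 0x120000, ~0U }`: with a full mask it matches any device whose class is exactly 0x120000, whatever its vendor and device IDs, which is how cxl ended up probing opencapi devices.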

Signed-off-by: Frederic Barrat <[email protected]>
---
drivers/misc/cxl/pci.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
index 19969ee86d6f..758842f65a1b 100644
--- a/drivers/misc/cxl/pci.c
+++ b/drivers/misc/cxl/pci.c
@@ -125,8 +125,6 @@ static const struct pci_device_id cxl_pci_tbl[] = {
{ PCI_DEVICE(PCI_VENDOR_ID_IBM, 0x0601), },
{ PCI_DEVICE(PCI_VENDOR_ID_IBM, 0x0623), },
{ PCI_DEVICE(PCI_VENDOR_ID_IBM, 0x0628), },
- { PCI_DEVICE_CLASS(0x120000, ~0), },
-
{ }
};
MODULE_DEVICE_TABLE(pci, cxl_pci_tbl);
--
2.14.1

2017-12-18 15:22:16

by Frederic Barrat

Subject: [PATCH 13/13] ocxl: add MAINTAINERS entry

Signed-off-by: Frederic Barrat <[email protected]>
Signed-off-by: Andrew Donnellan <[email protected]>
---
MAINTAINERS | 12 ++++++++++++
1 file changed, 12 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index a6e86e20761e..edc9e1db352b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3919,6 +3919,18 @@ F: drivers/scsi/cxlflash/
F: include/uapi/scsi/cxlflash_ioctls.h
F: Documentation/powerpc/cxlflash.txt

+OCXL (Open Coherent Accelerator Processor Interface OpenCAPI) DRIVER
+M: Frederic Barrat <[email protected]>
+M: Andrew Donnellan <[email protected]>
+L: [email protected]
+S: Supported
+F: arch/powerpc/platforms/powernv/ocxl.c
+F: arch/powerpc/include/asm/pnv-ocxl.h
+F: drivers/misc/ocxl/
+F: include/misc/ocxl*
+F: include/uapi/misc/ocxl.h
+F: Documentation/accelerators/ocxl.rst
+
CYBERPRO FB DRIVER
M: Russell King <[email protected]>
L: [email protected] (moderated for non-subscribers)
--
2.14.1

2017-12-18 15:22:52

by Frederic Barrat

Subject: [PATCH 12/13] ocxl: Documentation

ocxl.rst gives a quick, high-level view of opencapi.

Update ioctl-number.txt to reflect the ioctl numbers used by the ocxl
driver.

Signed-off-by: Frederic Barrat <[email protected]>
---
Documentation/accelerators/ocxl.rst | 151 +++++++++++++++++++++++++++++++++++
Documentation/ioctl/ioctl-number.txt | 1 +
2 files changed, 152 insertions(+)
create mode 100644 Documentation/accelerators/ocxl.rst

diff --git a/Documentation/accelerators/ocxl.rst b/Documentation/accelerators/ocxl.rst
new file mode 100644
index 000000000000..94ccd55f2acd
--- /dev/null
+++ b/Documentation/accelerators/ocxl.rst
@@ -0,0 +1,151 @@
+========================================================
+OpenCAPI (Open Coherent Accelerator Processor Interface)
+========================================================
+
+OpenCAPI is an interface between processors and accelerators. It aims
+to be low-latency and high-bandwidth. The specification is
+developed by the `OpenCAPI Consortium <http://opencapi.org/>`_.
+
+It allows an accelerator (which could be an FPGA, an ASIC, ...) to
+access the host memory coherently, using virtual addresses. An
+OpenCAPI device can also host its own memory, which can be accessed
+from the host.
+
+OpenCAPI is known in Linux as 'ocxl', as the open, processor-agnostic
+evolution of 'cxl' (the driver for the IBM CAPI interface for
+powerpc), which was named that way to avoid confusion with the ISDN
+CAPI subsystem.
+
+
+High-level view
+===============
+
+OpenCAPI defines a Data Link Layer (DL) and Transaction Layer (TL), to
+be implemented on top of a physical link. Any processor or device
+implementing the DL and TL can start sharing memory.
+
+::
+
+  +-----------+                               +-------------+
+  |           |                               |             |
+  |           |                               | Accelerated |
+  | Processor |                               |  Function   |
+  |           |  +--------+                   |    Unit     |  +--------+
+  |           |--| Memory |                   |    (AFU)    |--| Memory |
+  |           |  +--------+                   |             |  +--------+
+  +-----------+                               +-------------+
+        |                                            |
+  +-----------+                               +-------------+
+  |    TL     |                               |    TLX      |
+  +-----------+                               +-------------+
+        |                                            |
+  +-----------+                               +-------------+
+  |    DL     |                               |    DLX      |
+  +-----------+                               +-------------+
+        |                                            |
+        |                    PHY                     |
+        +--------------------------------------------+
+
+
+
+Device discovery
+================
+
+OpenCAPI relies on a PCI-like configuration space, implemented on the
+device, so the host can discover AFUs by querying the config space.
+
+OpenCAPI devices in Linux are treated like PCI devices (with a few
+caveats). The firmware is expected to abstract the hardware as if it
+were a PCI link. A lot of the existing PCI infrastructure is reused:
+devices are scanned and BARs are assigned during the standard PCI
+enumeration. Commands like 'lspci' can therefore be used to see what
+devices are available.
+
+The configuration space defines the AFU(s) found on the physical
+adapter and describes each of them: its name, how many memory
+contexts it can work with, the size of its MMIO areas, ...
+
+
+
+MMIO
+====
+
+OpenCAPI defines two MMIO areas for each AFU:
+
+* the global MMIO area, with registers pertinent to the whole AFU.
+* a per-process MMIO area, which has a fixed size for each context.
+
+
+
+AFU interrupts
+==============
+
+OpenCAPI allows an AFU to send an interrupt to a host process. This
+is done through an 'intrp_req' command defined in the Transaction
+Layer, specifying a 64-bit object handle which identifies the
+interrupt.
+
+The driver allows a process to allocate an interrupt and obtain its
+64-bit object handle, which can be passed to the AFU.
+
+
+
+Char devices
+============
+
+The driver creates one char device per AFU found on the physical
+device. A physical device may have multiple functions and each
+function can have multiple AFUs. At the time of writing, though, it
+has only been tested with devices exporting a single AFU.
+
+Char devices can be found in /dev/ocxl/ and are named
+``/dev/ocxl/<AFU name>.<location>.<index>``, where:
+
+* <AFU name> is a name of at most 24 characters, as found in the
+  config space of the AFU.
+* <location> is added by the driver and helps distinguish devices
+  when a system has more than one instance of the same OpenCAPI device.
+* <index> helps distinguish AFUs in the unlikely case where a device
+  carries multiple copies of the same AFU.
+
+
+
+User API
+========
+
+open
+----
+
+Based on the AFU definition found in the config space, an AFU may
+support working with more than one memory context, in which case the
+associated char device may be opened multiple times by different
+processes.
+
+
+ioctl
+-----
+
+OCXL_IOCTL_ATTACH:
+
+ Attach the memory context of the calling process to the AFU so that
+ the AFU can access its memory.
+
+OCXL_IOCTL_IRQ_ALLOC:
+
+ Allocate an AFU interrupt and return an identifier.
+
+OCXL_IOCTL_IRQ_FREE:
+
+ Free a previously allocated AFU interrupt.
+
+OCXL_IOCTL_IRQ_SET_FD:
+
+ Associate an eventfd with an AFU interrupt so that the user process
+ can be notified when the AFU sends an interrupt.
+
+
+mmap
+----
+
+A process can mmap the per-process MMIO area for interactions with the
+AFU.
diff --git a/Documentation/ioctl/ioctl-number.txt b/Documentation/ioctl/ioctl-number.txt
index 3e3fdae5f3ed..6501389d55b9 100644
--- a/Documentation/ioctl/ioctl-number.txt
+++ b/Documentation/ioctl/ioctl-number.txt
@@ -326,6 +326,7 @@ Code Seq#(hex) Include File Comments
0xB5 00-0F uapi/linux/rpmsg.h <mailto:[email protected]>
0xC0 00-0F linux/usb/iowarrior.h
0xCA 00-0F uapi/misc/cxl.h
+0xCA 10-2F uapi/misc/ocxl.h
0xCA 80-BF uapi/scsi/cxlflash_ioctl.h
0xCB 00-1F CBM serial IEC bus in development:
<mailto:[email protected]>
--
2.14.1
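The ioctl-number.txt hunk above reserves sequence numbers 0x10-0x2F of magic 0xCA for uapi/misc/ocxl.h, between cxl (00-0F) and cxlflash (80-BF). As a minimal user-space sketch (not part of the series), the check below copies the three OCXL_IOCTL_IRQ_* definitions added by PATCH 07/13 and decodes them with the standard <linux/ioctl.h> macros. OCXL_IOCTL_ATTACH is left out because its full structure is not shown here, and the helper name ocxl_ioctl_in_reserved_range() is hypothetical.

```c
#include <assert.h>
#include <linux/ioctl.h>	/* _IOR, _IOW, _IOC_NR, _IOC_TYPE, _IOC_SIZE, _IOC_DIR */
#include <linux/types.h>	/* __u64, __s32, __u32 */

/* Local copy of the uapi structure from include/uapi/misc/ocxl.h */
struct ocxl_ioctl_irq_fd {
	__u64 irq_offset;
	__s32 eventfd;
	__u32 reserved;
};

/* A uapi structure should have no implicit padding */
static_assert(sizeof(struct ocxl_ioctl_irq_fd) == 16, "unexpected padding");

#define OCXL_MAGIC 0xCA
#define OCXL_IOCTL_IRQ_ALLOC  _IOR(OCXL_MAGIC, 0x11, __u64)
#define OCXL_IOCTL_IRQ_FREE   _IOW(OCXL_MAGIC, 0x12, __u64)
#define OCXL_IOCTL_IRQ_SET_FD _IOW(OCXL_MAGIC, 0x13, struct ocxl_ioctl_irq_fd)

/*
 * Return 1 if 'cmd' uses the ocxl magic and a sequence number inside the
 * 0x10-0x2F block reserved in ioctl-number.txt, 0 otherwise.
 */
static int ocxl_ioctl_in_reserved_range(unsigned int cmd)
{
	unsigned int nr = _IOC_NR(cmd);

	return _IOC_TYPE(cmd) == OCXL_MAGIC && nr >= 0x10 && nr <= 0x2F;
}
```

Defining the structure locally rather than including <misc/ocxl.h> is only so the sketch compiles on a system where that uapi header is not installed.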

2017-12-18 15:23:20

by Frederic Barrat

Subject: [PATCH 08/13] ocxl: Add a kernel API for other opencapi drivers

Some of the functions performed by the generic driver are also needed
by other opencapi drivers: attaching a context to an adapter,
translation fault handling, AFU interrupt allocation...

So to avoid code duplication, the driver provides a kernel API that
other drivers can use, similar to calling an in-kernel library.

It is still a bit theoretical, for lack of real hardware, and will
likely need adjustments down the road. But we used the cxlflash
driver as a guinea pig.
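The header comment on ocxl_config_check_afu_index(), in include/misc/ocxl.h below, notes that AFU indexes can be sparse, so a driver built on this API has to probe every index up to max_afu_index. The sketch below shows that loop in user space. The config-space access is replaced by a hypothetical table (fake_function, fake_check_afu_index), and the return convention (1 for a valid AFU, 0 for a hole, negative on error) is an assumption based on the 'return 1;' visible at the end of the function in config.c.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for one function's config space: in the kernel,
 * ocxl_config_check_afu_index() reads the AFU Information DVSEC; here a
 * plain table says which AFU indexes are populated.
 */
struct fake_function {
	signed char max_afu_index;	/* same role as ocxl_fn_config.max_afu_index */
	const int *afu_present;		/* afu_present[i] != 0: an AFU exists at i */
};

/* Assumed convention: 1 = valid AFU, 0 = hole in the map, < 0 = error */
static int fake_check_afu_index(const struct fake_function *fn, int afu_idx)
{
	if (afu_idx < 0 || afu_idx > fn->max_afu_index)
		return -EINVAL;
	return fn->afu_present[afu_idx] ? 1 : 0;
}

/*
 * Probe every index up to max_afu_index, skipping holes. Returns the
 * number of AFUs found, or a negative errno value.
 */
static int enumerate_afus(const struct fake_function *fn,
			  void (*found)(int afu_idx, void *data), void *data)
{
	int idx, rc, count = 0;

	assert(fn != NULL);
	for (idx = 0; idx <= fn->max_afu_index; idx++) {
		rc = fake_check_afu_index(fn, idx);
		if (rc < 0)
			return rc;
		if (rc == 0)
			continue;	/* sparse index map: keep scanning */
		/* a real driver would call ocxl_config_read_afu() here */
		if (found)
			found(idx, data);
		count++;
	}
	return count;
}
```

Stopping at the first hole instead of scanning up to max_afu_index would silently miss AFUs placed after it, which is the mistake the header comment warns about.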

Signed-off-by: Frederic Barrat <[email protected]>
---
drivers/misc/ocxl/config.c | 13 ++-
drivers/misc/ocxl/link.c | 7 ++
drivers/misc/ocxl/ocxl_internal.h | 71 +-----------
include/misc/ocxl.h | 221 ++++++++++++++++++++++++++++++++++++++
4 files changed, 241 insertions(+), 71 deletions(-)
create mode 100644 include/misc/ocxl.h

diff --git a/drivers/misc/ocxl/config.c b/drivers/misc/ocxl/config.c
index bb2fde5967e2..2326e8da4029 100644
--- a/drivers/misc/ocxl/config.c
+++ b/drivers/misc/ocxl/config.c
@@ -9,8 +9,8 @@

#include <linux/pci.h>
#include <asm/pnv-ocxl.h>
+#include <misc/ocxl.h>
#include <misc/ocxl-config.h>
-#include "ocxl_internal.h"

#define EXTRACT_BIT(val, bit) (!!(val & BIT(bit)))
#define EXTRACT_BITS(val, s, e) ((val & GENMASK(e, s)) >> s)
@@ -250,6 +250,7 @@ int ocxl_config_read_function(struct pci_dev *dev, struct ocxl_fn_config *fn)
rc = validate_function(dev, fn);
return rc;
}
+EXPORT_SYMBOL_GPL(ocxl_config_read_function);

static int read_afu_info(struct pci_dev *dev, struct ocxl_fn_config *fn,
int offset, u32 *data)
@@ -308,6 +309,7 @@ int ocxl_config_check_afu_index(struct pci_dev *dev,
}
return 1;
}
+EXPORT_SYMBOL_GPL(ocxl_config_check_afu_index);

static int read_afu_name(struct pci_dev *dev, struct ocxl_fn_config *fn,
struct ocxl_afu_config *afu)
@@ -505,6 +507,7 @@ int ocxl_config_read_afu(struct pci_dev *dev, struct ocxl_fn_config *fn,
rc = validate_afu(dev, afu);
return rc;
}
+EXPORT_SYMBOL_GPL(ocxl_config_read_afu);

int ocxl_config_get_actag_info(struct pci_dev *dev, u16 *base, u16 *enabled,
u16 *supported)
@@ -523,6 +526,7 @@ int ocxl_config_get_actag_info(struct pci_dev *dev, u16 *base, u16 *enabled,
}
return 0;
}
+EXPORT_SYMBOL_GPL(ocxl_config_get_actag_info);

void ocxl_config_set_afu_actag(struct pci_dev *dev, int pos, int actag_base,
int actag_count)
@@ -535,11 +539,13 @@ void ocxl_config_set_afu_actag(struct pci_dev *dev, int pos, int actag_base,
val = actag_base & OCXL_DVSEC_ACTAG_MASK;
pci_write_config_dword(dev, pos + OCXL_DVSEC_AFU_CTRL_ACTAG_BASE, val);
}
+EXPORT_SYMBOL_GPL(ocxl_config_set_afu_actag);

int ocxl_config_get_pasid_info(struct pci_dev *dev, int *count)
{
return pnv_ocxl_get_pasid_count(dev, count);
}
+EXPORT_SYMBOL_GPL(ocxl_config_get_pasid_info);

void ocxl_config_set_afu_pasid(struct pci_dev *dev, int pos, int pasid_base,
u32 pasid_count_log)
@@ -557,6 +563,7 @@ void ocxl_config_set_afu_pasid(struct pci_dev *dev, int pos, int pasid_base,
pci_write_config_dword(dev, pos + OCXL_DVSEC_AFU_CTRL_PASID_BASE,
val32);
}
+EXPORT_SYMBOL_GPL(ocxl_config_set_afu_pasid);

void ocxl_config_set_afu_state(struct pci_dev *dev, int pos, int enable)
{
@@ -569,6 +576,7 @@ void ocxl_config_set_afu_state(struct pci_dev *dev, int pos, int enable)
val &= 0xFE;
pci_write_config_byte(dev, pos + OCXL_DVSEC_AFU_CTRL_ENABLE, val);
}
+EXPORT_SYMBOL_GPL(ocxl_config_set_afu_state);

int ocxl_config_set_TL(struct pci_dev *dev, int tl_dvsec)
{
@@ -666,6 +674,7 @@ int ocxl_config_set_TL(struct pci_dev *dev, int tl_dvsec)
kfree(recv_rate);
return rc;
}
+EXPORT_SYMBOL_GPL(ocxl_config_set_TL);

int ocxl_config_terminate_pasid(struct pci_dev *dev, int afu_control, int pasid)
{
@@ -705,6 +714,7 @@ int ocxl_config_terminate_pasid(struct pci_dev *dev, int afu_control, int pasid)
}
return 0;
}
+EXPORT_SYMBOL_GPL(ocxl_config_terminate_pasid);

void ocxl_config_set_actag(struct pci_dev *dev, int func_dvsec, u32 tag_first,
u32 tag_count)
@@ -716,3 +726,4 @@ void ocxl_config_set_actag(struct pci_dev *dev, int func_dvsec, u32 tag_first,
pci_write_config_dword(dev, func_dvsec + OCXL_DVSEC_FUNC_OFF_ACTAG,
val);
}
+EXPORT_SYMBOL_GPL(ocxl_config_set_actag);
diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
index 5f12564eea99..fda2c9def4ff 100644
--- a/drivers/misc/ocxl/link.c
+++ b/drivers/misc/ocxl/link.c
@@ -12,6 +12,7 @@
#include <linux/mmu_context.h>
#include <asm/copro.h>
#include <asm/pnv-ocxl.h>
+#include <misc/ocxl.h>
#include "ocxl_internal.h"


@@ -427,6 +428,7 @@ int ocxl_link_setup(struct pci_dev *dev, int PE_mask, void **link_handle)
mutex_unlock(&links_list_lock);
return rc;
}
+EXPORT_SYMBOL_GPL(ocxl_link_setup);

static void release_xsl(struct kref *ref)
{
@@ -446,6 +448,7 @@ void ocxl_link_release(struct pci_dev *dev, void *link_handle)
kref_put(&link->ref, release_xsl);
mutex_unlock(&links_list_lock);
}
+EXPORT_SYMBOL_GPL(ocxl_link_release);

static u64 calculate_cfg_state(bool kernel)
{
@@ -540,6 +543,7 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
mutex_unlock(&spa->spa_lock);
return rc;
}
+EXPORT_SYMBOL_GPL(ocxl_link_add_pe);

int ocxl_link_remove_pe(void *link_handle, int pasid)
{
@@ -608,6 +612,7 @@ int ocxl_link_remove_pe(void *link_handle, int pasid)
mutex_unlock(&spa->spa_lock);
return rc;
}
+EXPORT_SYMBOL_GPL(ocxl_link_remove_pe);

int ocxl_link_irq_alloc(void *link_handle, int *hw_irq, u64 *trigger_addr)
{
@@ -628,6 +633,7 @@ int ocxl_link_irq_alloc(void *link_handle, int *hw_irq, u64 *trigger_addr)
*trigger_addr = addr;
return 0;
}
+EXPORT_SYMBOL_GPL(ocxl_link_irq_alloc);

void ocxl_link_free_irq(void *link_handle, int hw_irq)
{
@@ -636,3 +642,4 @@ void ocxl_link_free_irq(void *link_handle, int hw_irq)
pnv_ocxl_free_xive_irq(hw_irq);
atomic_inc(&link->irq_available);
}
+EXPORT_SYMBOL_GPL(ocxl_link_free_irq);
diff --git a/drivers/misc/ocxl/ocxl_internal.h b/drivers/misc/ocxl/ocxl_internal.h
index 829369c5f004..18ec28c0a5cb 100644
--- a/drivers/misc/ocxl/ocxl_internal.h
+++ b/drivers/misc/ocxl/ocxl_internal.h
@@ -13,8 +13,8 @@
#include <linux/pci.h>
#include <linux/cdev.h>
#include <linux/list.h>
+#include <misc/ocxl.h>

-#define OCXL_AFU_NAME_SZ (24+1) /* add 1 for NULL termination */
#define MAX_IRQ_PER_LINK 2000
#define MAX_IRQ_PER_CONTEXT MAX_IRQ_PER_LINK

@@ -23,38 +23,6 @@

extern struct pci_driver ocxl_pci_driver;

-/*
- * The following 2 structures are a fairly generic way of representing
- * the configuration data for a function and AFU, as read from the
- * configuration space.
- */
-struct ocxl_afu_config {
- u8 idx;
- int dvsec_afu_control_pos;
- char name[OCXL_AFU_NAME_SZ];
- u8 version_major;
- u8 version_minor;
- u8 afuc_type;
- u8 afum_type;
- u8 profile;
- u8 global_mmio_bar;
- u64 global_mmio_offset;
- u32 global_mmio_size;
- u8 pp_mmio_bar;
- u64 pp_mmio_offset;
- u32 pp_mmio_stride;
- u8 log_mem_size;
- u8 pasid_supported_log;
- u16 actag_supported;
-};
-
-struct ocxl_fn_config {
- int dvsec_tl_pos;
- int dvsec_function_pos;
- int dvsec_afu_info_pos;
- s8 max_pasid_log;
- s8 max_afu_index;
-};

struct ocxl_fn {
struct device dev;
@@ -142,43 +110,6 @@ extern void ocxl_unregister_afu(struct ocxl_afu *afu);
extern int ocxl_file_init(void);
extern void ocxl_file_exit(void);

-extern int ocxl_config_read_function(struct pci_dev *dev,
- struct ocxl_fn_config *fn);
-
-extern int ocxl_config_check_afu_index(struct pci_dev *dev,
- struct ocxl_fn_config *fn, int afu_idx);
-extern int ocxl_config_read_afu(struct pci_dev *dev,
- struct ocxl_fn_config *fn,
- struct ocxl_afu_config *afu,
- u8 afu_idx);
-extern int ocxl_config_get_pasid_info(struct pci_dev *dev, int *count);
-extern void ocxl_config_set_afu_pasid(struct pci_dev *dev,
- int afu_control,
- int pasid_base, u32 pasid_count_log);
-extern int ocxl_config_get_actag_info(struct pci_dev *dev,
- u16 *base, u16 *enabled, u16 *supported);
-extern void ocxl_config_set_actag(struct pci_dev *dev, int func_dvsec,
- u32 tag_first, u32 tag_count);
-extern void ocxl_config_set_afu_actag(struct pci_dev *dev, int afu_control,
- int actag_base, int actag_count);
-extern void ocxl_config_set_afu_state(struct pci_dev *dev, int afu_control,
- int enable);
-extern int ocxl_config_set_TL(struct pci_dev *dev, int tl_dvsec);
-extern int ocxl_config_terminate_pasid(struct pci_dev *dev, int afu_control,
- int pasid);
-
-extern int ocxl_link_setup(struct pci_dev *dev, int PE_mask,
- void **link_handle);
-extern void ocxl_link_release(struct pci_dev *dev, void *link_handle);
-extern int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
- u64 amr, struct mm_struct *mm,
- void (*xsl_err_cb)(void *data, u64 addr, u64 dsisr),
- void *xsl_err_data);
-extern int ocxl_link_remove_pe(void *link_handle, int pasid);
-extern int ocxl_link_irq_alloc(void *link_handle, int *hw_irq,
- u64 *addr);
-extern void ocxl_link_free_irq(void *link_handle, int hw_irq);
-
extern int ocxl_pasid_afu_alloc(struct ocxl_fn *fn, u32 size);
extern void ocxl_pasid_afu_free(struct ocxl_fn *fn, u32 start, u32 size);
extern int ocxl_actag_afu_alloc(struct ocxl_fn *fn, u32 size);
diff --git a/include/misc/ocxl.h b/include/misc/ocxl.h
new file mode 100644
index 000000000000..2b66a2083a23
--- /dev/null
+++ b/include/misc/ocxl.h
@@ -0,0 +1,221 @@
+/*
+ * Copyright 2017 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _MISC_OCXL_H_
+#define _MISC_OCXL_H_
+
+#include <linux/pci.h>
+
+/*
+ * Opencapi drivers all need some common facilities, like parsing the
+ * device configuration space, adding a Process Element to the Shared
+ * Process Area, etc...
+ *
+ * The ocxl module provides a kernel API, to allow other drivers to
+ * reuse common code. A bit like an in-kernel library.
+ */
+
+#define OCXL_AFU_NAME_SZ (24+1) /* add 1 for NULL termination */
+
+/*
+ * The following 2 structures are a fairly generic way of representing
+ * the configuration data for a function and AFU, as read from the
+ * configuration space.
+ */
+struct ocxl_afu_config {
+ u8 idx;
+ int dvsec_afu_control_pos; /* offset of AFU control DVSEC */
+ char name[OCXL_AFU_NAME_SZ];
+ u8 version_major;
+ u8 version_minor;
+ u8 afuc_type;
+ u8 afum_type;
+ u8 profile;
+ u8 global_mmio_bar; /* global MMIO area */
+ u64 global_mmio_offset;
+ u32 global_mmio_size;
+ u8 pp_mmio_bar; /* per-process MMIO area */
+ u64 pp_mmio_offset;
+ u32 pp_mmio_stride;
+ u8 log_mem_size;
+ u8 pasid_supported_log;
+ u16 actag_supported;
+};
+
+struct ocxl_fn_config {
+ int dvsec_tl_pos; /* offset of the Transaction Layer DVSEC */
+ int dvsec_function_pos; /* offset of the Function DVSEC */
+ int dvsec_afu_info_pos; /* offset of the AFU information DVSEC */
+ s8 max_pasid_log;
+ s8 max_afu_index;
+};
+
+/*
+ * Read the configuration space of a function and fill in an
+ * ocxl_fn_config structure with all the function details
+ */
+extern int ocxl_config_read_function(struct pci_dev *dev,
+ struct ocxl_fn_config *fn);
+
+/*
+ * Check if an AFU index is valid for the given function.
+ *
+ * AFU indexes can be sparse, so a driver should check all indexes up
+ * to the maximum found in the function description
+ */
+extern int ocxl_config_check_afu_index(struct pci_dev *dev,
+ struct ocxl_fn_config *fn, int afu_idx);
+
+/*
+ * Read the configuration space of a function for the AFU specified by
+ * the index 'afu_idx'. Fills in an ocxl_afu_config structure
+ */
+extern int ocxl_config_read_afu(struct pci_dev *dev,
+ struct ocxl_fn_config *fn,
+ struct ocxl_afu_config *afu,
+ u8 afu_idx);
+
+/*
+ * Get the number of PASIDs that can be used by the function
+ */
+extern int ocxl_config_get_pasid_info(struct pci_dev *dev, int *count);
+
+/*
+ * Tell an AFU, by writing in the configuration space, the PASIDs that
+ * it can use. Range starts at 'pasid_base' and its size is a power
+ * of 2 (2^pasid_count_log)
+ *
+ * 'afu_control_offset' is the offset of the AFU control DVSEC which
+ * can be found in the function configuration
+ */
+extern void ocxl_config_set_afu_pasid(struct pci_dev *dev,
+ int afu_control_offset,
+ int pasid_base, u32 pasid_count_log);
+
+/*
+ * Get the actag configuration for the function:
+ * 'base' is the first actag value that can be used.
+ * 'enabled' it the number of actags available, starting from base.
+ * 'supported' is the total number of actags desired by all the AFUs
+ * of the function.
+ */
+extern int ocxl_config_get_actag_info(struct pci_dev *dev,
+ u16 *base, u16 *enabled, u16 *supported);
+
+/*
+ * Tell a function, by writing in the configuration space, the actags
+ * it can use.
+ *
+ * 'func_offset' is the offset of the Function DVSEC that can be found in
+ * the function configuration
+ */
+extern void ocxl_config_set_actag(struct pci_dev *dev, int func_offset,
+ u32 actag_base, u32 actag_count);
+
+/*
+ * Tell an AFU, by writing in the configuration space, the actags it
+ * can use.
+ *
+ * 'afu_control_offset' is the offset of the AFU control DVSEC for the
+ * desired AFU. It can be found in the AFU configuration
+ */
+extern void ocxl_config_set_afu_actag(struct pci_dev *dev,
+ int afu_control_offset,
+ int actag_base, int actag_count);
+
+/*
+ * Enable/disable an AFU, by writing in the configuration space.
+ *
+ * 'afu_control_offset' is the offset of the AFU control DVSEC for the
+ * desired AFU. It can be found in the AFU configuration
+ */
+extern void ocxl_config_set_afu_state(struct pci_dev *dev,
+ int afu_control_offset, int enable);
+
+/*
+ * Set the Transaction Layer configuration in the configuration space.
+ * Only needed for function 0.
+ *
+ * It queries the host TL capabilities, finds some common ground
+ * between the host and device, and sets the Transaction Layer on both
+ * accordingly.
+ */
+extern int ocxl_config_set_TL(struct pci_dev *dev, int tl_dvsec);
+
+/*
+ * Request an AFU to terminate a PASID.
+ * Returns once the AFU has acknowledged the request, or an error in
+ * case of timeout.
+ *
+ * The hardware can only terminate one PASID at a time, so the caller
+ * must guarantee some kind of serialization.
+ *
+ * 'afu_control_offset' is the offset of the AFU control DVSEC for the
+ * desired AFU. It can be found in the AFU configuration
+ */
+extern int ocxl_config_terminate_pasid(struct pci_dev *dev,
+ int afu_control_offset, int pasid);
+
+/*
+ * Set up the opencapi link for the function.
+ *
+ * When called for the first time for a link, it sets up the Shared
+ * Process Area for the link and the interrupt handler to process
+ * translation faults.
+ *
+ * Returns a 'link handle' that should be used for further calls for
+ * the link
+ */
+extern int ocxl_link_setup(struct pci_dev *dev, int PE_mask,
+ void **link_handle);
+
+/*
+ * Remove the association between the function and its link.
+ */
+extern void ocxl_link_release(struct pci_dev *dev, void *link_handle);
+
+/*
+ * Add a Process Element to the Shared Process Area for a link.
+ * The process is defined by its PASID, pid, tid and its mm_struct.
+ *
+ * 'xsl_err_cb' is an optional callback if the driver wants to be
+ * notified when the translation fault interrupt handler detects an
+ * address error.
+ * 'xsl_err_data' is an argument passed to the above callback, if
+ * defined
+ */
+extern int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
+ u64 amr, struct mm_struct *mm,
+ void (*xsl_err_cb)(void *data, u64 addr, u64 dsisr),
+ void *xsl_err_data);
+
+/*
+ * Remove a Process Element from the Shared Process Area for a link
+ */
+extern int ocxl_link_remove_pe(void *link_handle, int pasid);
+
+/*
+ * Allocate an AFU interrupt associated with the link.
+ *
+ * 'hw_irq' is the hardware interrupt number
+ * 'obj_handle' is the 64-bit object handle to be passed to the AFU to
+ * trigger the interrupt.
+ * On P9, 'obj_handle' is an address, which, if written, triggers the
+ * interrupt. It is an MMIO address which needs to be remapped (one
+ * page).
+ */
+extern int ocxl_link_irq_alloc(void *link_handle, int *hw_irq,
+ u64 *obj_handle);
+
+/*
+ * Free a previously allocated AFU interrupt
+ */
+extern void ocxl_link_free_irq(void *link_handle, int hw_irq);
+
+#endif /* _MISC_OCXL_H_ */
--
2.14.1
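ocxl_config_set_afu_pasid(), declared in include/misc/ocxl.h above, receives the size of the PASID range as a log2 value, pasid_count_log, so the range written to the AFU control DVSEC holds 2^pasid_count_log PASIDs starting at pasid_base. A minimal user-space sketch of that arithmetic follows. Checking the range against the count returned by ocxl_config_get_pasid_info() is an assumption about how a caller might validate it, and both helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Number of PASIDs described by a log2 size, as passed in pasid_count_log */
static uint32_t pasid_range_size(uint32_t pasid_count_log)
{
	assert(pasid_count_log < 32);
	return UINT32_C(1) << pasid_count_log;
}

/*
 * Check that [pasid_base, pasid_base + 2^pasid_count_log) stays within
 * 'max_count' PASIDs (assumed to be the value reported by
 * ocxl_config_get_pasid_info() for the function).
 */
static bool pasid_range_fits(uint32_t pasid_base, uint32_t pasid_count_log,
			     uint32_t max_count)
{
	/* 64-bit sum so a large base cannot wrap around */
	uint64_t end = (uint64_t)pasid_base + pasid_range_size(pasid_count_log);

	return end <= max_count;
}
```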

2017-12-18 15:23:23

by Frederic Barrat

Subject: [PATCH 07/13] ocxl: Add AFU interrupt support

Add user APIs through ioctl to allocate, free, and be notified of an
AFU interrupt.
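From user space, notification goes through an eventfd: OCXL_IOCTL_IRQ_SET_FD takes a struct ocxl_ioctl_irq_fd whose 'reserved' field must be zero (afu_ioctl() returns -EINVAL otherwise), and the interrupt handler in afu_irq.c calls eventfd_signal(irq->ev_ctx, 1) each time the AFU fires. The sketch below, with hypothetical helper names, builds that argument and waits on the eventfd. It does not touch an ocxl device, so the test simulates the handler by writing to the eventfd directly.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/types.h>

/* Local copy of the uapi structure from include/uapi/misc/ocxl.h */
struct ocxl_ioctl_irq_fd {
	__u64 irq_offset;
	__s32 eventfd;
	__u32 reserved;
};

/* Build the OCXL_IOCTL_IRQ_SET_FD argument; 'reserved' is left at zero */
static struct ocxl_ioctl_irq_fd make_irq_fd_arg(__u64 irq_offset, int efd)
{
	struct ocxl_ioctl_irq_fd arg;

	memset(&arg, 0, sizeof(arg));
	arg.irq_offset = irq_offset;
	arg.eventfd = efd;
	return arg;
}

/*
 * Block until the eventfd counter is non-zero, then return it (read()
 * also resets it to zero). Returns -1 on error.
 */
static int64_t wait_for_afu_irq(int efd)
{
	uint64_t count;

	assert(efd >= 0);
	if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count))
		return -1;
	return (int64_t)count;
}
```

On real hardware, the offset would come from OCXL_IOCTL_IRQ_ALLOC, the eventfd from eventfd(0, 0), and the structure would be passed with ioctl(fd, OCXL_IOCTL_IRQ_SET_FD, &arg) on an attached /dev/ocxl/ file descriptor. Since the eventfd is not in semaphore mode, a single read() returns the number of interrupts received since the previous read.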

For opencapi, an AFU can trigger an interrupt on the host by sending a
specific command targeting a 64-bit object handle. On POWER9, this is
implemented by mapping a special page in the address space of a
process; a write to that page triggers an interrupt.
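Within the driver, each interrupt allocated by a context gets an id from the context's idr and is exposed as one page of the char device's mmap space, starting at the AFU's irq_base_offset (see irq_id_to_offset() and irq_offset_to_id() in afu_irq.c). ocxl_context_mmap() and ocxl_mmap_fault() then route offsets below irq_base_offset to the per-process MMIO area and the others to the interrupt trigger page. A user-space sketch of that arithmetic follows; the 64K page size and the irq_base_offset value are assumptions for illustration, since how the driver computes irq_base_offset is not shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration only */
#define SKETCH_PAGE_SHIFT	16			/* 64K pages */
#define SKETCH_IRQ_BASE_OFFSET	(UINT64_C(1) << 32)	/* hypothetical */

enum mmap_target { TARGET_PP_MMIO, TARGET_AFU_IRQ };

/* Same arithmetic as irq_id_to_offset(): one page per interrupt id */
static uint64_t irq_id_to_offset(uint64_t irq_base_offset, int id)
{
	assert(id >= 0);
	return irq_base_offset + ((uint64_t)id << SKETCH_PAGE_SHIFT);
}

/* Same arithmetic as irq_offset_to_id() */
static int irq_offset_to_id(uint64_t irq_base_offset, uint64_t offset)
{
	return (int)((offset - irq_base_offset) >> SKETCH_PAGE_SHIFT);
}

/* Same test as in ocxl_context_mmap() and ocxl_mmap_fault() */
static enum mmap_target classify_offset(uint64_t irq_base_offset,
					uint64_t offset)
{
	return offset < irq_base_offset ? TARGET_PP_MMIO : TARGET_AFU_IRQ;
}
```

A process would pass the offset returned by OCXL_IOCTL_IRQ_ALLOC as the offset argument of mmap() on the AFU char device, mapping exactly one page: check_mmap_afu_irq() rejects any other length.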

Signed-off-by: Frederic Barrat <[email protected]>
---
arch/powerpc/include/asm/pnv-ocxl.h | 3 +
arch/powerpc/platforms/powernv/ocxl.c | 30 +++++
drivers/misc/ocxl/afu_irq.c | 204 ++++++++++++++++++++++++++++++++++
drivers/misc/ocxl/context.c | 40 ++++++-
drivers/misc/ocxl/file.c | 33 ++++++
drivers/misc/ocxl/link.c | 28 +++++
drivers/misc/ocxl/ocxl_internal.h | 7 ++
include/uapi/misc/ocxl.h | 9 ++
8 files changed, 352 insertions(+), 2 deletions(-)
create mode 100644 drivers/misc/ocxl/afu_irq.c

diff --git a/arch/powerpc/include/asm/pnv-ocxl.h b/arch/powerpc/include/asm/pnv-ocxl.h
index 5a7ae7f28209..1e26f0a39500 100644
--- a/arch/powerpc/include/asm/pnv-ocxl.h
+++ b/arch/powerpc/include/asm/pnv-ocxl.h
@@ -37,4 +37,7 @@ extern int pnv_ocxl_spa_setup(struct pci_dev *dev, void *spa_mem, int PE_mask,
extern void pnv_ocxl_spa_release(void *platform_data);
extern int pnv_ocxl_spa_remove_pe(void *platform_data, int pe_handle);

+extern int pnv_ocxl_alloc_xive_irq(u32 *irq, u64 *trigger_addr);
+extern void pnv_ocxl_free_xive_irq(u32 irq);
+
#endif /* _ASM_PVN_OCXL_H */
diff --git a/arch/powerpc/platforms/powernv/ocxl.c b/arch/powerpc/platforms/powernv/ocxl.c
index 6c79924b95c8..96cafba6aef1 100644
--- a/arch/powerpc/platforms/powernv/ocxl.c
+++ b/arch/powerpc/platforms/powernv/ocxl.c
@@ -9,6 +9,7 @@

#include <asm/pnv-ocxl.h>
#include <asm/opal.h>
+#include <asm/xive.h>
#include <misc/ocxl-config.h>
#include "pci.h"

@@ -487,3 +488,32 @@ int pnv_ocxl_spa_remove_pe(void *platform_data, int pe_handle)
return rc;
}
EXPORT_SYMBOL_GPL(pnv_ocxl_spa_remove_pe);
+
+int pnv_ocxl_alloc_xive_irq(u32 *irq, u64 *trigger_addr)
+{
+ __be64 flags, trigger_page;
+ s64 rc;
+ u32 hwirq;
+
+ hwirq = xive_native_alloc_irq();
+ if (!hwirq)
+ return -ENOENT;
+
+ rc = opal_xive_get_irq_info(hwirq, &flags, NULL, &trigger_page, NULL,
+ NULL);
+ if (rc || !trigger_page) {
+ xive_native_free_irq(hwirq);
+ return -ENOENT;
+ }
+ *irq = hwirq;
+ *trigger_addr = be64_to_cpu(trigger_page);
+ return 0;
+
+}
+EXPORT_SYMBOL_GPL(pnv_ocxl_alloc_xive_irq);
+
+void pnv_ocxl_free_xive_irq(u32 irq)
+{
+ xive_native_free_irq(irq);
+}
+EXPORT_SYMBOL_GPL(pnv_ocxl_free_xive_irq);
diff --git a/drivers/misc/ocxl/afu_irq.c b/drivers/misc/ocxl/afu_irq.c
new file mode 100644
index 000000000000..0b217a854837
--- /dev/null
+++ b/drivers/misc/ocxl/afu_irq.c
@@ -0,0 +1,204 @@
+/*
+ * Copyright 2017 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/interrupt.h>
+#include <linux/eventfd.h>
+#include <asm/pnv-ocxl.h>
+#include "ocxl_internal.h"
+
+struct afu_irq {
+ int id;
+ int hw_irq;
+ unsigned int virq;
+ char *name;
+ u64 trigger_page;
+ struct eventfd_ctx *ev_ctx;
+};
+
+static int irq_offset_to_id(struct ocxl_context *ctx, u64 offset)
+{
+ return (offset - ctx->afu->irq_base_offset) >> PAGE_SHIFT;
+}
+
+static u64 irq_id_to_offset(struct ocxl_context *ctx, int id)
+{
+ return ctx->afu->irq_base_offset + (id << PAGE_SHIFT);
+}
+
+static irqreturn_t afu_irq_handler(int virq, void *data)
+{
+ struct afu_irq *irq = (struct afu_irq *) data;
+
+ if (irq->ev_ctx)
+ eventfd_signal(irq->ev_ctx, 1);
+ return IRQ_HANDLED;
+}
+
+static int setup_afu_irq(struct ocxl_context *ctx, struct afu_irq *irq)
+{
+ int rc;
+
+ irq->virq = irq_create_mapping(NULL, irq->hw_irq);
+ if (!irq->virq) {
+ pr_err("irq_create_mapping failed\n");
+ return -ENOMEM;
+ }
+ pr_debug("hw_irq %d mapped to virq %u\n", irq->hw_irq, irq->virq);
+
+ irq->name = kasprintf(GFP_KERNEL, "ocxl-afu-%u", irq->virq);
+ if (!irq->name) {
+ irq_dispose_mapping(irq->virq);
+ return -ENOMEM;
+ }
+
+ rc = request_irq(irq->virq, afu_irq_handler, 0, irq->name, irq);
+ if (rc) {
+ kfree(irq->name);
+ irq->name = NULL;
+ irq_dispose_mapping(irq->virq);
+ pr_err("request_irq failed: %d\n", rc);
+ return rc;
+ }
+ return 0;
+}
+
+static void release_afu_irq(struct afu_irq *irq)
+{
+ free_irq(irq->virq, irq);
+ irq_dispose_mapping(irq->virq);
+ kfree(irq->name);
+}
+
+int ocxl_afu_irq_alloc(struct ocxl_context *ctx, u64 *irq_offset)
+{
+ struct afu_irq *irq;
+ int rc;
+
+ irq = kzalloc(sizeof(struct afu_irq), GFP_KERNEL);
+ if (!irq)
+ return -ENOMEM;
+
+ /*
+ * We limit the number of afu irqs per context and per link to
+ * avoid a single process or user depleting the pool of IPIs
+ */
+
+ mutex_lock(&ctx->irq_lock);
+
+ irq->id = idr_alloc(&ctx->irq_idr, irq, 0, MAX_IRQ_PER_CONTEXT,
+ GFP_KERNEL);
+ if (irq->id < 0) {
+ rc = -ENOSPC;
+ goto err_unlock;
+ }
+
+ rc = ocxl_link_irq_alloc(ctx->afu->fn->link, &irq->hw_irq,
+ &irq->trigger_page);
+ if (rc)
+ goto err_idr;
+
+ rc = setup_afu_irq(ctx, irq);
+ if (rc)
+ goto err_alloc;
+
+ *irq_offset = irq_id_to_offset(ctx, irq->id);
+
+ mutex_unlock(&ctx->irq_lock);
+ return 0;
+
+err_alloc:
+ ocxl_link_free_irq(ctx->afu->fn->link, irq->hw_irq);
+err_idr:
+ idr_remove(&ctx->irq_idr, irq->id);
+err_unlock:
+ mutex_unlock(&ctx->irq_lock);
+ kfree(irq);
+ return rc;
+}
+
+static void afu_irq_free(struct afu_irq *irq, struct ocxl_context *ctx)
+{
+ if (ctx->mapping)
+ unmap_mapping_range(ctx->mapping,
+ irq_id_to_offset(ctx, irq->id),
+ 1 << PAGE_SHIFT, 1);
+ release_afu_irq(irq);
+ if (irq->ev_ctx)
+ eventfd_ctx_put(irq->ev_ctx);
+ ocxl_link_free_irq(ctx->afu->fn->link, irq->hw_irq);
+ kfree(irq);
+}
+
+int ocxl_afu_irq_free(struct ocxl_context *ctx, u64 irq_offset)
+{
+ struct afu_irq *irq;
+ int id = irq_offset_to_id(ctx, irq_offset);
+
+ mutex_lock(&ctx->irq_lock);
+
+ irq = idr_find(&ctx->irq_idr, id);
+ if (!irq) {
+ mutex_unlock(&ctx->irq_lock);
+ return -EINVAL;
+ }
+ idr_remove(&ctx->irq_idr, irq->id);
+ afu_irq_free(irq, ctx);
+ mutex_unlock(&ctx->irq_lock);
+ return 0;
+}
+
+void ocxl_afu_irq_free_all(struct ocxl_context *ctx)
+{
+ struct afu_irq *irq;
+ int id;
+
+ mutex_lock(&ctx->irq_lock);
+ idr_for_each_entry(&ctx->irq_idr, irq, id)
+ afu_irq_free(irq, ctx);
+ mutex_unlock(&ctx->irq_lock);
+}
+
+int ocxl_afu_irq_set_fd(struct ocxl_context *ctx, u64 irq_offset, int eventfd)
+{
+ struct afu_irq *irq;
+ struct eventfd_ctx *ev_ctx;
+ int rc = 0, id = irq_offset_to_id(ctx, irq_offset);
+
+ mutex_lock(&ctx->irq_lock);
+ irq = idr_find(&ctx->irq_idr, id);
+ if (!irq) {
+ rc = -EINVAL;
+ goto unlock;
+ }
+
+ ev_ctx = eventfd_ctx_fdget(eventfd);
+ if (IS_ERR(ev_ctx)) {
+ rc = -EINVAL;
+ goto unlock;
+ }
+
+ irq->ev_ctx = ev_ctx;
+unlock:
+ mutex_unlock(&ctx->irq_lock);
+ return rc;
+}
+
+u64 ocxl_afu_irq_get_addr(struct ocxl_context *ctx, u64 irq_offset)
+{
+ struct afu_irq *irq;
+ int id = irq_offset_to_id(ctx, irq_offset);
+ u64 addr = 0;
+
+ mutex_lock(&ctx->irq_lock);
+ irq = idr_find(&ctx->irq_idr, id);
+ if (irq)
+ addr = irq->trigger_page;
+ mutex_unlock(&ctx->irq_lock);
+ return addr;
+}
diff --git a/drivers/misc/ocxl/context.c b/drivers/misc/ocxl/context.c
index 0bc0dd97d784..19575269ed22 100644
--- a/drivers/misc/ocxl/context.c
+++ b/drivers/misc/ocxl/context.c
@@ -38,6 +38,8 @@ int ocxl_context_init(struct ocxl_context *ctx, struct ocxl_afu *afu,
mutex_init(&ctx->mapping_lock);
init_waitqueue_head(&ctx->events_wq);
mutex_init(&ctx->xsl_error_lock);
+ mutex_init(&ctx->irq_lock);
+ idr_init(&ctx->irq_idr);
/*
* Keep a reference on the AFU to make sure it's valid for the
* duration of the life of the context
@@ -87,6 +89,19 @@ int ocxl_context_attach(struct ocxl_context *ctx, u64 amr)
return rc;
}

+static int map_afu_irq(struct vm_area_struct *vma, unsigned long address,
+ u64 offset, struct ocxl_context *ctx)
+{
+ u64 trigger_addr;
+
+ trigger_addr = ocxl_afu_irq_get_addr(ctx, offset);
+ if (!trigger_addr)
+ return VM_FAULT_SIGBUS;
+
+ vm_insert_pfn(vma, address, trigger_addr >> PAGE_SHIFT);
+ return VM_FAULT_NOPAGE;
+}
+
static int map_pp_mmio(struct vm_area_struct *vma, unsigned long address,
u64 offset, struct ocxl_context *ctx)
{
@@ -125,7 +140,10 @@ static int ocxl_mmap_fault(struct vm_fault *vmf)
pr_debug("%s: pasid %d address 0x%lx offset 0x%llx\n", __func__,
ctx->pasid, vmf->address, offset);

- rc = map_pp_mmio(vma, vmf->address, offset, ctx);
+ if (offset < ctx->afu->irq_base_offset)
+ rc = map_pp_mmio(vma, vmf->address, offset, ctx);
+ else
+ rc = map_afu_irq(vma, vmf->address, offset, ctx);
return rc;
}

@@ -133,6 +151,19 @@ static const struct vm_operations_struct ocxl_vmops = {
.fault = ocxl_mmap_fault,
};

+static int check_mmap_afu_irq(struct ocxl_context *ctx,
+ struct vm_area_struct *vma)
+{
+ /* only one page */
+ if (vma_pages(vma) != 1)
+ return -EINVAL;
+
+ /* check offset validity */
+ if (!ocxl_afu_irq_get_addr(ctx, vma->vm_pgoff << PAGE_SHIFT))
+ return -EINVAL;
+ return 0;
+}
+
static int check_mmap_mmio(struct ocxl_context *ctx,
struct vm_area_struct *vma)
{
@@ -146,7 +177,10 @@ int ocxl_context_mmap(struct ocxl_context *ctx, struct vm_area_struct *vma)
{
int rc;

- rc = check_mmap_mmio(ctx, vma);
+ if ((vma->vm_pgoff << PAGE_SHIFT) < ctx->afu->irq_base_offset)
+ rc = check_mmap_mmio(ctx, vma);
+ else
+ rc = check_mmap_afu_irq(ctx, vma);
if (rc)
return rc;

@@ -231,6 +265,8 @@ void ocxl_context_free(struct ocxl_context *ctx)
idr_remove(&ctx->afu->contexts_idr, ctx->pasid);
mutex_unlock(&ctx->afu->contexts_lock);

+ ocxl_afu_irq_free_all(ctx);
+ idr_destroy(&ctx->irq_idr);
/* reference to the AFU taken in ocxl_context_init */
ocxl_afu_put(ctx->afu);
kfree(ctx);
diff --git a/drivers/misc/ocxl/file.c b/drivers/misc/ocxl/file.c
index a51386eff4f5..0a73e2c11ba6 100644
--- a/drivers/misc/ocxl/file.c
+++ b/drivers/misc/ocxl/file.c
@@ -110,12 +110,17 @@ static long afu_ioctl_attach(struct ocxl_context *ctx,
}

#define CMD_STR(x) (x == OCXL_IOCTL_ATTACH ? "ATTACH" : \
+ x == OCXL_IOCTL_IRQ_ALLOC ? "IRQ_ALLOC" : \
+ x == OCXL_IOCTL_IRQ_FREE ? "IRQ_FREE" : \
+ x == OCXL_IOCTL_IRQ_SET_FD ? "IRQ_SET_FD" : \
"UNKNOWN")

static long afu_ioctl(struct file *file, unsigned int cmd,
unsigned long args)
{
struct ocxl_context *ctx = file->private_data;
+ struct ocxl_ioctl_irq_fd irq_fd;
+ u64 irq_offset;
long rc;

pr_debug("%s for context %d, command %s\n", __func__, ctx->pasid,
@@ -130,6 +135,34 @@ static long afu_ioctl(struct file *file, unsigned int cmd,
(struct ocxl_ioctl_attach __user *) args);
break;

+ case OCXL_IOCTL_IRQ_ALLOC:
+ rc = ocxl_afu_irq_alloc(ctx, &irq_offset);
+ /* copy_to_user() returns the number of bytes left uncopied */
+ if (!rc && copy_to_user((u64 __user *) args, &irq_offset,
+ sizeof(irq_offset))) {
+ ocxl_afu_irq_free(ctx, irq_offset);
+ rc = -EFAULT;
+ }
+ break;
+
+ case OCXL_IOCTL_IRQ_FREE:
+ rc = copy_from_user(&irq_offset, (u64 __user *) args,
+ sizeof(irq_offset));
+ if (rc)
+ return -EFAULT;
+ rc = ocxl_afu_irq_free(ctx, irq_offset);
+ break;
+
+ case OCXL_IOCTL_IRQ_SET_FD:
+ rc = copy_from_user(&irq_fd, (struct ocxl_ioctl_irq_fd __user *) args,
+ sizeof(irq_fd));
+ if (rc)
+ return -EFAULT;
+ if (irq_fd.reserved)
+ return -EINVAL;
+ rc = ocxl_afu_irq_set_fd(ctx, irq_fd.irq_offset, irq_fd.eventfd);
+ break;
+
default:
rc = -EINVAL;
}
diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
index 6b184cd7d2a6..5f12564eea99 100644
--- a/drivers/misc/ocxl/link.c
+++ b/drivers/misc/ocxl/link.c
@@ -608,3 +608,31 @@ int ocxl_link_remove_pe(void *link_handle, int pasid)
mutex_unlock(&spa->spa_lock);
return rc;
}
+
+int ocxl_link_irq_alloc(void *link_handle, int *hw_irq, u64 *trigger_addr)
+{
+ struct link *link = (struct link *) link_handle;
+ int rc, irq;
+ u64 addr;
+
+ if (atomic_dec_if_positive(&link->irq_available) < 0)
+ return -ENOSPC;
+
+ rc = pnv_ocxl_alloc_xive_irq(&irq, &addr);
+ if (rc) {
+ atomic_inc(&link->irq_available);
+ return rc;
+ }
+
+ *hw_irq = irq;
+ *trigger_addr = addr;
+ return 0;
+}
+
+void ocxl_link_free_irq(void *link_handle, int hw_irq)
+{
+ struct link *link = (struct link *) link_handle;
+
+ pnv_ocxl_free_xive_irq(hw_irq);
+ atomic_inc(&link->irq_available);
+}
diff --git a/drivers/misc/ocxl/ocxl_internal.h b/drivers/misc/ocxl/ocxl_internal.h
index e07f7d523275..829369c5f004 100644
--- a/drivers/misc/ocxl/ocxl_internal.h
+++ b/drivers/misc/ocxl/ocxl_internal.h
@@ -197,4 +197,11 @@ extern void ocxl_context_free(struct ocxl_context *ctx);
extern int ocxl_sysfs_add_afu(struct ocxl_afu *afu);
extern void ocxl_sysfs_remove_afu(struct ocxl_afu *afu);

+extern int ocxl_afu_irq_alloc(struct ocxl_context *ctx, u64 *irq_offset);
+extern int ocxl_afu_irq_free(struct ocxl_context *ctx, u64 irq_offset);
+extern void ocxl_afu_irq_free_all(struct ocxl_context *ctx);
+extern int ocxl_afu_irq_set_fd(struct ocxl_context *ctx, u64 irq_offset,
+ int eventfd);
+extern u64 ocxl_afu_irq_get_addr(struct ocxl_context *ctx, u64 irq_offset);
+
#endif /* _OCXL_INTERNAL_H_ */
diff --git a/include/uapi/misc/ocxl.h b/include/uapi/misc/ocxl.h
index 71fa387f2efd..488e75228c33 100644
--- a/include/uapi/misc/ocxl.h
+++ b/include/uapi/misc/ocxl.h
@@ -39,9 +39,18 @@ struct ocxl_ioctl_attach {
__u64 reserved3;
};

+struct ocxl_ioctl_irq_fd {
+ __u64 irq_offset;
+ __s32 eventfd;
+ __u32 reserved;
+};
+
/* ioctl numbers */
#define OCXL_MAGIC 0xCA
/* AFU devices */
#define OCXL_IOCTL_ATTACH _IOW(OCXL_MAGIC, 0x10, struct ocxl_ioctl_attach)
+#define OCXL_IOCTL_IRQ_ALLOC _IOR(OCXL_MAGIC, 0x11, __u64)
+#define OCXL_IOCTL_IRQ_FREE _IOW(OCXL_MAGIC, 0x12, __u64)
+#define OCXL_IOCTL_IRQ_SET_FD _IOW(OCXL_MAGIC, 0x13, struct ocxl_ioctl_irq_fd)

#endif /* _UAPI_MISC_OCXL_H */
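From userspace, the three new ioctls are meant to be chained: allocate an AFU interrupt (the returned value is an opaque offset), create an eventfd, then bind the two with OCXL_IOCTL_IRQ_SET_FD. The sketch below mirrors the uapi definitions locally so it builds without the new header installed; `ocxl_setup_irq()` is a hypothetical helper, and how the AFU is told to raise the interrupt is device-specific and not shown.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/ioctl.h>
#include <linux/types.h>

/* Local mirror of the definitions added to include/uapi/misc/ocxl.h. */
struct ocxl_ioctl_irq_fd {
	__u64 irq_offset;
	__s32 eventfd;
	__u32 reserved;	/* must be zero, the driver returns -EINVAL otherwise */
};

#define OCXL_MAGIC 0xCA
#define OCXL_IOCTL_IRQ_ALLOC  _IOR(OCXL_MAGIC, 0x11, __u64)
#define OCXL_IOCTL_IRQ_FREE   _IOW(OCXL_MAGIC, 0x12, __u64)
#define OCXL_IOCTL_IRQ_SET_FD _IOW(OCXL_MAGIC, 0x13, struct ocxl_ioctl_irq_fd)

/*
 * Hypothetical helper: allocate one AFU interrupt on an already-opened AFU
 * file descriptor and route it to a new eventfd. Returns the eventfd, or
 * -1 with errno set (the interrupt is freed again on failure).
 */
static int ocxl_setup_irq(int afu_fd, __u64 *irq_offset)
{
	struct ocxl_ioctl_irq_fd arg = { 0 };
	int efd, saved;

	if (ioctl(afu_fd, OCXL_IOCTL_IRQ_ALLOC, irq_offset) < 0)
		return -1;

	efd = eventfd(0, EFD_CLOEXEC);
	if (efd < 0)
		goto free_irq;

	arg.irq_offset = *irq_offset;
	arg.eventfd = efd;
	if (ioctl(afu_fd, OCXL_IOCTL_IRQ_SET_FD, &arg) < 0) {
		saved = errno;
		close(efd);
		errno = saved;
		goto free_irq;
	}
	return efd;

free_irq:
	saved = errno;
	/* IRQ_FREE takes a pointer to the offset, like IRQ_ALLOC */
	ioctl(afu_fd, OCXL_IOCTL_IRQ_FREE, irq_offset);
	errno = saved;
	return -1;
}
```

With standard eventfd semantics, a blocking 8-byte read() on the returned descriptor then waits until the counter is non-zero and returns the accumulated count.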
--
2.14.1

2017-12-18 15:23:30

by Frederic Barrat

Subject: [PATCH 06/13] ocxl: Driver code for 'generic' opencapi devices

Add an ocxl driver to handle generic opencapi devices. It is not
meant to be the only opencapi driver: any device is free to implement
its own. But if a host application only needs basic services, such as
attaching to an opencapi adapter, having translation faults handled or
allocating AFU interrupts, the generic driver should suffice.

The AFU config space must follow the opencapi specification and use
the expected vendor/device ID to be seen by the generic driver.

The driver exposes each AFU of the device as a char device under /dev/ocxl/.

Note that the driver currently doesn't handle memory attached to the
opencapi device.
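A host application would typically open one of these char devices, attach its address space with OCXL_IOCTL_ATTACH, and then mmap the per-process MMIO area, whose size is bounded by the AFU's per-process MMIO stride. The sketch below assumes the uapi layout of struct ocxl_ioctl_attach (`amr` followed by three reserved __u64 fields, mirrored locally); the device path is supplied by the caller and `ocxl_attach_and_map()` is hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <linux/ioctl.h>
#include <linux/types.h>

/* Assumed local mirror of struct ocxl_ioctl_attach from the uapi header. */
struct ocxl_ioctl_attach {
	__u64 amr;
	__u64 reserved1;	/* reserved fields must be zero */
	__u64 reserved2;
	__u64 reserved3;
};

#define OCXL_MAGIC 0xCA
#define OCXL_IOCTL_ATTACH _IOW(OCXL_MAGIC, 0x10, struct ocxl_ioctl_attach)

/*
 * Hypothetical helper: open an AFU device, attach the calling process and
 * map 'mmio_size' bytes of its per-process MMIO area (must not exceed the
 * AFU's pp_mmio_stride). Returns the mapping, or MAP_FAILED with errno set.
 */
static void *ocxl_attach_and_map(const char *path, size_t mmio_size,
				 int *out_fd)
{
	struct ocxl_ioctl_attach arg = { 0 };	/* amr = 0, reserved = 0 */
	void *mmio;
	int fd, saved;

	fd = open(path, O_RDWR | O_CLOEXEC);
	if (fd < 0)
		return MAP_FAILED;

	if (ioctl(fd, OCXL_IOCTL_ATTACH, &arg) < 0)
		goto err;

	/* Pages are inserted at fault time; faults raise SIGBUS unless the
	 * context is attached, so attach before touching the mapping. */
	mmio = mmap(NULL, mmio_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (mmio == MAP_FAILED)
		goto err;

	*out_fd = fd;
	return mmio;

err:
	saved = errno;
	close(fd);
	errno = saved;
	return MAP_FAILED;
}
```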

Signed-off-by: Frederic Barrat <[email protected]>
Signed-off-by: Andrew Donnellan <[email protected]>
Signed-off-by: Alastair D'Silva <[email protected]>
---
drivers/misc/ocxl/config.c | 718 ++++++++++++++++++++++++++++++++++++++
drivers/misc/ocxl/context.c | 237 +++++++++++++
drivers/misc/ocxl/file.c | 405 +++++++++++++++++++++
drivers/misc/ocxl/link.c | 610 ++++++++++++++++++++++++++++++++
drivers/misc/ocxl/main.c | 40 +++
drivers/misc/ocxl/ocxl_internal.h | 200 +++++++++++
drivers/misc/ocxl/pasid.c | 114 ++++++
drivers/misc/ocxl/pci.c | 592 +++++++++++++++++++++++++++++++
drivers/misc/ocxl/sysfs.c | 150 ++++++++
include/uapi/misc/ocxl.h | 47 +++
10 files changed, 3113 insertions(+)
create mode 100644 drivers/misc/ocxl/config.c
create mode 100644 drivers/misc/ocxl/context.c
create mode 100644 drivers/misc/ocxl/file.c
create mode 100644 drivers/misc/ocxl/link.c
create mode 100644 drivers/misc/ocxl/main.c
create mode 100644 drivers/misc/ocxl/ocxl_internal.h
create mode 100644 drivers/misc/ocxl/pasid.c
create mode 100644 drivers/misc/ocxl/pci.c
create mode 100644 drivers/misc/ocxl/sysfs.c
create mode 100644 include/uapi/misc/ocxl.h

diff --git a/drivers/misc/ocxl/config.c b/drivers/misc/ocxl/config.c
new file mode 100644
index 000000000000..bb2fde5967e2
--- /dev/null
+++ b/drivers/misc/ocxl/config.c
@@ -0,0 +1,718 @@
+/*
+ * Copyright 2017 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/pci.h>
+#include <asm/pnv-ocxl.h>
+#include <misc/ocxl-config.h>
+#include "ocxl_internal.h"
+
+#define EXTRACT_BIT(val, bit) (!!(val & BIT(bit)))
+#define EXTRACT_BITS(val, s, e) ((val & GENMASK(e, s)) >> s)
+
+#define OCXL_DVSEC_AFU_IDX_MASK GENMASK(5, 0)
+#define OCXL_DVSEC_ACTAG_MASK GENMASK(11, 0)
+#define OCXL_DVSEC_PASID_MASK GENMASK(19, 0)
+#define OCXL_DVSEC_PASID_LOG_MASK GENMASK(4, 0)
+
+#define OCXL_DVSEC_TEMPL_VERSION 0x0
+#define OCXL_DVSEC_TEMPL_NAME 0x4
+#define OCXL_DVSEC_TEMPL_AFU_VERSION 0x1C
+#define OCXL_DVSEC_TEMPL_MMIO_GLOBAL 0x20
+#define OCXL_DVSEC_TEMPL_MMIO_GLOBAL_SZ 0x28
+#define OCXL_DVSEC_TEMPL_MMIO_PP 0x30
+#define OCXL_DVSEC_TEMPL_MMIO_PP_SZ 0x38
+#define OCXL_DVSEC_TEMPL_MEM_SZ 0x3C
+#define OCXL_DVSEC_TEMPL_WWID 0x40
+
+#define OCXL_MAX_AFU_PER_FUNCTION 64
+#define OCXL_TEMPL_LEN 0x58
+#define OCXL_TEMPL_NAME_LEN 24
+#define OCXL_CFG_TIMEOUT 3
+
+static int find_dvsec(struct pci_dev *dev, int dvsec_id)
+{
+ int vsec = 0;
+ u16 vendor, id;
+
+ while ((vsec = pci_find_next_ext_capability(dev, vsec,
+ OCXL_EXT_CAP_ID_DVSEC))) {
+ pci_read_config_word(dev, vsec + OCXL_DVSEC_VENDOR_OFFSET,
+ &vendor);
+ pci_read_config_word(dev, vsec + OCXL_DVSEC_ID_OFFSET, &id);
+ if (vendor == PCI_VENDOR_ID_IBM && id == dvsec_id)
+ return vsec;
+ }
+ return 0;
+}
+
+static int find_dvsec_afu_ctrl(struct pci_dev *dev, u8 afu_idx)
+{
+ int vsec = 0;
+ u16 vendor, id;
+ u8 idx;
+
+ while ((vsec = pci_find_next_ext_capability(dev, vsec,
+ OCXL_EXT_CAP_ID_DVSEC))) {
+ pci_read_config_word(dev, vsec + OCXL_DVSEC_VENDOR_OFFSET,
+ &vendor);
+ pci_read_config_word(dev, vsec + OCXL_DVSEC_ID_OFFSET, &id);
+
+ if (vendor == PCI_VENDOR_ID_IBM &&
+ id == OCXL_DVSEC_AFU_CTRL_ID) {
+ pci_read_config_byte(dev,
+ vsec + OCXL_DVSEC_AFU_CTRL_AFU_IDX,
+ &idx);
+ if (idx == afu_idx)
+ return vsec;
+ }
+ }
+ return 0;
+}
+
+static int read_pasid(struct pci_dev *dev, struct ocxl_fn_config *fn)
+{
+ u16 val;
+ int pos;
+
+ pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_PASID);
+ if (!pos) {
+ /*
+ * PASID capability is not mandatory, but without it
+ * the function shouldn't define any AFU
+ */
+ dev_dbg(&dev->dev, "Function doesn't require any PASID\n");
+ fn->max_pasid_log = -1;
+ goto out;
+ }
+ pci_read_config_word(dev, pos + PCI_PASID_CAP, &val);
+ fn->max_pasid_log = EXTRACT_BITS(val, 8, 12);
+
+out:
+ dev_dbg(&dev->dev, "PASID capability:\n");
+ dev_dbg(&dev->dev, " Max PASID log = %d\n", fn->max_pasid_log);
+ return 0;
+}
+
+static int read_dvsec_tl(struct pci_dev *dev, struct ocxl_fn_config *fn)
+{
+ int pos;
+
+ pos = find_dvsec(dev, OCXL_DVSEC_TL_ID);
+ if (!pos && PCI_FUNC(dev->devfn) == 0) {
+ dev_err(&dev->dev, "Can't find TL DVSEC\n");
+ return -ENODEV;
+ }
+ if (pos && PCI_FUNC(dev->devfn) != 0) {
+ dev_err(&dev->dev, "TL DVSEC is only allowed on function 0\n");
+ return -ENODEV;
+ }
+ fn->dvsec_tl_pos = pos;
+ return 0;
+}
+
+static int read_dvsec_function(struct pci_dev *dev, struct ocxl_fn_config *fn)
+{
+ int pos, afu_present;
+ u32 val;
+
+ pos = find_dvsec(dev, OCXL_DVSEC_FUNC_ID);
+ if (!pos) {
+ dev_err(&dev->dev, "Can't find function DVSEC\n");
+ return -ENODEV;
+ }
+ fn->dvsec_function_pos = pos;
+
+ pci_read_config_dword(dev, pos + OCXL_DVSEC_FUNC_OFF_INDEX, &val);
+ afu_present = EXTRACT_BIT(val, 31);
+ if (!afu_present) {
+ fn->max_afu_index = -1;
+ dev_dbg(&dev->dev, "Function doesn't define any AFU\n");
+ goto out;
+ }
+ fn->max_afu_index = EXTRACT_BITS(val, 24, 29);
+
+out:
+ dev_dbg(&dev->dev, "Function DVSEC:\n");
+ dev_dbg(&dev->dev, " Max AFU index = %d\n", fn->max_afu_index);
+ return 0;
+}
+
+static int read_dvsec_afu_info(struct pci_dev *dev, struct ocxl_fn_config *fn)
+{
+ int pos;
+
+ if (fn->max_afu_index < 0) {
+ fn->dvsec_afu_info_pos = -1;
+ return 0;
+ }
+
+ pos = find_dvsec(dev, OCXL_DVSEC_AFU_INFO_ID);
+ if (!pos) {
+ dev_err(&dev->dev, "Can't find AFU information DVSEC\n");
+ return -ENODEV;
+ }
+ fn->dvsec_afu_info_pos = pos;
+ return 0;
+}
+
+static int read_dvsec_vendor(struct pci_dev *dev)
+{
+ int pos;
+ u32 cfg, tlx, dlx;
+
+ /*
+ * vendor specific DVSEC is optional
+ *
+ * It's currently only used on function 0 to specify the
+ * version of some logic blocks. Some older images may not
+ * even have it so we ignore any errors
+ */
+ if (PCI_FUNC(dev->devfn) != 0)
+ return 0;
+
+ pos = find_dvsec(dev, OCXL_DVSEC_VENDOR_ID);
+ if (!pos)
+ return 0;
+
+ pci_read_config_dword(dev, pos + OCXL_DVSEC_VENDOR_CFG_VERS, &cfg);
+ pci_read_config_dword(dev, pos + OCXL_DVSEC_VENDOR_TLX_VERS, &tlx);
+ pci_read_config_dword(dev, pos + OCXL_DVSEC_VENDOR_DLX_VERS, &dlx);
+
+ dev_dbg(&dev->dev, "Vendor specific DVSEC:\n");
+ dev_dbg(&dev->dev, " CFG version = 0x%x\n", cfg);
+ dev_dbg(&dev->dev, " TLX version = 0x%x\n", tlx);
+ dev_dbg(&dev->dev, " DLX version = 0x%x\n", dlx);
+ return 0;
+}
+
+static int validate_function(struct pci_dev *dev, struct ocxl_fn_config *fn)
+{
+ if (fn->max_pasid_log == -1 && fn->max_afu_index >= 0) {
+ dev_err(&dev->dev,
+ "AFUs are defined but no PASIDs are requested\n");
+ return -EINVAL;
+ }
+
+ if (fn->max_afu_index > OCXL_MAX_AFU_PER_FUNCTION) {
+ dev_err(&dev->dev,
+ "Max AFU index out of architectural limit (%d vs %d)\n",
+ fn->max_afu_index, OCXL_MAX_AFU_PER_FUNCTION);
+ return -EINVAL;
+ }
+ return 0;
+}
+
+int ocxl_config_read_function(struct pci_dev *dev, struct ocxl_fn_config *fn)
+{
+ int rc;
+
+ rc = read_pasid(dev, fn);
+ if (rc) {
+ dev_err(&dev->dev, "Invalid PASID configuration: %d\n", rc);
+ return -ENODEV;
+ }
+
+ rc = read_dvsec_tl(dev, fn);
+ if (rc) {
+ dev_err(&dev->dev,
+ "Invalid Transaction Layer DVSEC configuration: %d\n",
+ rc);
+ return -ENODEV;
+ }
+
+ rc = read_dvsec_function(dev, fn);
+ if (rc) {
+ dev_err(&dev->dev,
+ "Invalid Function DVSEC configuration: %d\n", rc);
+ return -ENODEV;
+ }
+
+ rc = read_dvsec_afu_info(dev, fn);
+ if (rc) {
+ dev_err(&dev->dev, "Invalid AFU configuration: %d\n", rc);
+ return -ENODEV;
+ }
+
+ rc = read_dvsec_vendor(dev);
+ if (rc) {
+ dev_err(&dev->dev,
+ "Invalid vendor specific DVSEC configuration: %d\n",
+ rc);
+ return -ENODEV;
+ }
+
+ rc = validate_function(dev, fn);
+ return rc;
+}
+
+static int read_afu_info(struct pci_dev *dev, struct ocxl_fn_config *fn,
+ int offset, u32 *data)
+{
+ u32 val;
+ unsigned long timeout = jiffies + (HZ * OCXL_CFG_TIMEOUT);
+ int pos = fn->dvsec_afu_info_pos;
+
+ /* Protect 'data valid' bit */
+ if (EXTRACT_BIT(offset, 31)) {
+ dev_err(&dev->dev, "Invalid offset in AFU info DVSEC\n");
+ return -EINVAL;
+ }
+
+ pci_write_config_dword(dev, pos + OCXL_DVSEC_AFU_INFO_OFF, offset);
+ pci_read_config_dword(dev, pos + OCXL_DVSEC_AFU_INFO_OFF, &val);
+ while (!EXTRACT_BIT(val, 31)) {
+ if (time_after_eq(jiffies, timeout)) {
+ dev_err(&dev->dev,
+ "Timeout while reading AFU info DVSEC (offset=%d)\n",
+ offset);
+ return -EBUSY;
+ }
+ cpu_relax();
+ pci_read_config_dword(dev, pos + OCXL_DVSEC_AFU_INFO_OFF, &val);
+ }
+ pci_read_config_dword(dev, pos + OCXL_DVSEC_AFU_INFO_DATA, data);
+ return 0;
+}
+
+int ocxl_config_check_afu_index(struct pci_dev *dev,
+ struct ocxl_fn_config *fn, int afu_idx)
+{
+ u32 val;
+ int rc, templ_major, templ_minor, len;
+
+ pci_write_config_byte(dev, fn->dvsec_afu_info_pos + OCXL_DVSEC_AFU_INFO_AFU_IDX, afu_idx);
+ rc = read_afu_info(dev, fn, OCXL_DVSEC_TEMPL_VERSION, &val);
+ if (rc)
+ return rc;
+
+ /* AFU index map can have holes */
+ if (!val)
+ return 0;
+
+ templ_major = EXTRACT_BITS(val, 8, 15);
+ templ_minor = EXTRACT_BITS(val, 0, 7);
+ dev_dbg(&dev->dev, "AFU descriptor template version %d.%d\n",
+ templ_major, templ_minor);
+
+ len = EXTRACT_BITS(val, 16, 31);
+ if (len != OCXL_TEMPL_LEN) {
+ dev_warn(&dev->dev,
+ "Unexpected template length in AFU information (%#x)\n",
+ len);
+ }
+ return 1;
+}
+
+static int read_afu_name(struct pci_dev *dev, struct ocxl_fn_config *fn,
+ struct ocxl_afu_config *afu)
+{
+ int i, rc;
+ u32 val, *ptr;
+
+ BUILD_BUG_ON(OCXL_AFU_NAME_SZ < OCXL_TEMPL_NAME_LEN);
+ for (i = 0; i < OCXL_TEMPL_NAME_LEN; i += 4) {
+ rc = read_afu_info(dev, fn, OCXL_DVSEC_TEMPL_NAME + i, &val);
+ if (rc)
+ return rc;
+ ptr = (u32 *) &afu->name[i];
+ *ptr = val;
+ }
+ afu->name[OCXL_AFU_NAME_SZ - 1] = '\0'; /* play safe */
+ return 0;
+}
+
+static int read_afu_mmio(struct pci_dev *dev, struct ocxl_fn_config *fn,
+ struct ocxl_afu_config *afu)
+{
+ int rc;
+ u32 val;
+
+ /*
+ * Global MMIO
+ */
+ rc = read_afu_info(dev, fn, OCXL_DVSEC_TEMPL_MMIO_GLOBAL, &val);
+ if (rc)
+ return rc;
+ afu->global_mmio_bar = EXTRACT_BITS(val, 0, 2);
+ afu->global_mmio_offset = EXTRACT_BITS(val, 16, 31) << 16;
+
+ rc = read_afu_info(dev, fn, OCXL_DVSEC_TEMPL_MMIO_GLOBAL + 4, &val);
+ if (rc)
+ return rc;
+ afu->global_mmio_offset += (u64) val << 32;
+
+ rc = read_afu_info(dev, fn, OCXL_DVSEC_TEMPL_MMIO_GLOBAL_SZ, &val);
+ if (rc)
+ return rc;
+ afu->global_mmio_size = val;
+
+ /*
+ * Per-process MMIO
+ */
+ rc = read_afu_info(dev, fn, OCXL_DVSEC_TEMPL_MMIO_PP, &val);
+ if (rc)
+ return rc;
+ afu->pp_mmio_bar = EXTRACT_BITS(val, 0, 2);
+ afu->pp_mmio_offset = EXTRACT_BITS(val, 16, 31) << 16;
+
+ rc = read_afu_info(dev, fn, OCXL_DVSEC_TEMPL_MMIO_PP + 4, &val);
+ if (rc)
+ return rc;
+ afu->pp_mmio_offset += (u64) val << 32;
+
+ rc = read_afu_info(dev, fn, OCXL_DVSEC_TEMPL_MMIO_PP_SZ, &val);
+ if (rc)
+ return rc;
+ afu->pp_mmio_stride = val;
+
+ return 0;
+}
+
+static int read_afu_control(struct pci_dev *dev, struct ocxl_afu_config *afu)
+{
+ int pos;
+ u8 val8;
+ u16 val16;
+
+ pos = find_dvsec_afu_ctrl(dev, afu->idx);
+ if (!pos) {
+ dev_err(&dev->dev, "Can't find AFU control DVSEC for AFU %d\n",
+ afu->idx);
+ return -ENODEV;
+ }
+ afu->dvsec_afu_control_pos = pos;
+
+ pci_read_config_byte(dev, pos + OCXL_DVSEC_AFU_CTRL_PASID_SUP, &val8);
+ afu->pasid_supported_log = EXTRACT_BITS(val8, 0, 4);
+
+ pci_read_config_word(dev, pos + OCXL_DVSEC_AFU_CTRL_ACTAG_SUP, &val16);
+ afu->actag_supported = EXTRACT_BITS(val16, 0, 11);
+ return 0;
+}
+
+static bool char_allowed(int c)
+{
+ /*
+ * Permitted Characters : Alphanumeric, hyphen, underscore, comma
+ */
+ if ((c >= 0x30 && c <= 0x39) /* digits */ ||
+ (c >= 0x41 && c <= 0x5A) /* upper case */ ||
+ (c >= 0x61 && c <= 0x7A) /* lower case */ ||
+ c == 0 /* NULL */ ||
+ c == 0x2D /* - */ ||
+ c == 0x5F /* _ */ ||
+ c == 0x2C /* , */)
+ return true;
+ return false;
+}
+
+static int validate_afu(struct pci_dev *dev, struct ocxl_afu_config *afu)
+{
+ int i;
+
+ if (!afu->name[0]) {
+ dev_err(&dev->dev, "Empty AFU name\n");
+ return -EINVAL;
+ }
+ for (i = 0; i < OCXL_TEMPL_NAME_LEN; i++) {
+ if (!char_allowed(afu->name[i])) {
+ dev_err(&dev->dev,
+ "Invalid character in AFU name\n");
+ return -EINVAL;
+ }
+ }
+
+ if (afu->global_mmio_bar != 0 &&
+ afu->global_mmio_bar != 2 &&
+ afu->global_mmio_bar != 4) {
+ dev_err(&dev->dev, "Invalid global MMIO bar number\n");
+ return -EINVAL;
+ }
+ if (afu->pp_mmio_bar != 0 &&
+ afu->pp_mmio_bar != 2 &&
+ afu->pp_mmio_bar != 4) {
+ dev_err(&dev->dev, "Invalid per-process MMIO bar number\n");
+ return -EINVAL;
+ }
+ return 0;
+}
+
+int ocxl_config_read_afu(struct pci_dev *dev, struct ocxl_fn_config *fn,
+ struct ocxl_afu_config *afu, u8 afu_idx)
+{
+ int rc;
+ u32 val32;
+
+ /*
+ * First, we need to write the AFU idx for the AFU we want to
+ * access.
+ */
+ WARN_ON((afu_idx & OCXL_DVSEC_AFU_IDX_MASK) != afu_idx);
+ afu->idx = afu_idx;
+ pci_write_config_byte(dev,
+ fn->dvsec_afu_info_pos + OCXL_DVSEC_AFU_INFO_AFU_IDX,
+ afu->idx);
+
+ rc = read_afu_name(dev, fn, afu);
+ if (rc)
+ return rc;
+
+ rc = read_afu_info(dev, fn, OCXL_DVSEC_TEMPL_AFU_VERSION, &val32);
+ if (rc)
+ return rc;
+ afu->version_major = EXTRACT_BITS(val32, 24, 31);
+ afu->version_minor = EXTRACT_BITS(val32, 16, 23);
+ afu->afuc_type = EXTRACT_BITS(val32, 14, 15);
+ afu->afum_type = EXTRACT_BITS(val32, 12, 13);
+ afu->profile = EXTRACT_BITS(val32, 0, 7);
+
+ rc = read_afu_mmio(dev, fn, afu);
+ if (rc)
+ return rc;
+
+ rc = read_afu_info(dev, fn, OCXL_DVSEC_TEMPL_MEM_SZ, &val32);
+ if (rc)
+ return rc;
+ afu->log_mem_size = EXTRACT_BITS(val32, 0, 7);
+
+ rc = read_afu_control(dev, afu);
+ if (rc)
+ return rc;
+
+ dev_dbg(&dev->dev, "AFU configuration:\n");
+ dev_dbg(&dev->dev, " name = %s\n", afu->name);
+ dev_dbg(&dev->dev, " version = %d.%d\n", afu->version_major,
+ afu->version_minor);
+ dev_dbg(&dev->dev, " global mmio bar = %hhu\n", afu->global_mmio_bar);
+ dev_dbg(&dev->dev, " global mmio offset = %#llx\n",
+ afu->global_mmio_offset);
+ dev_dbg(&dev->dev, " global mmio size = %#x\n", afu->global_mmio_size);
+ dev_dbg(&dev->dev, " pp mmio bar = %hhu\n", afu->pp_mmio_bar);
+ dev_dbg(&dev->dev, " pp mmio offset = %#llx\n", afu->pp_mmio_offset);
+ dev_dbg(&dev->dev, " pp mmio stride = %#x\n", afu->pp_mmio_stride);
+ dev_dbg(&dev->dev, " mem size (log) = %hhu\n", afu->log_mem_size);
+ dev_dbg(&dev->dev, " pasid supported (log) = %u\n",
+ afu->pasid_supported_log);
+ dev_dbg(&dev->dev, " actag supported = %u\n",
+ afu->actag_supported);
+
+ rc = validate_afu(dev, afu);
+ return rc;
+}
+
+int ocxl_config_get_actag_info(struct pci_dev *dev, u16 *base, u16 *enabled,
+ u16 *supported)
+{
+ int rc;
+
+ /*
+ * This is really a simple wrapper for the kernel API, to
+ * avoid an external driver using ocxl as a library to call
+ * platform-dependent code
+ */
+ rc = pnv_ocxl_get_actag(dev, base, enabled, supported);
+ if (rc) {
+ dev_err(&dev->dev, "Can't get actag for device: %d\n", rc);
+ return rc;
+ }
+ return 0;
+}
+
+void ocxl_config_set_afu_actag(struct pci_dev *dev, int pos, int actag_base,
+ int actag_count)
+{
+ u16 val;
+
+ val = actag_count & OCXL_DVSEC_ACTAG_MASK;
+ pci_write_config_word(dev, pos + OCXL_DVSEC_AFU_CTRL_ACTAG_EN, val);
+
+ val = actag_base & OCXL_DVSEC_ACTAG_MASK;
+ pci_write_config_dword(dev, pos + OCXL_DVSEC_AFU_CTRL_ACTAG_BASE, val);
+}
+
+int ocxl_config_get_pasid_info(struct pci_dev *dev, int *count)
+{
+ return pnv_ocxl_get_pasid_count(dev, count);
+}
+
+void ocxl_config_set_afu_pasid(struct pci_dev *dev, int pos, int pasid_base,
+ u32 pasid_count_log)
+{
+ u8 val8;
+ u32 val32;
+
+ val8 = pasid_count_log & OCXL_DVSEC_PASID_LOG_MASK;
+ pci_write_config_byte(dev, pos + OCXL_DVSEC_AFU_CTRL_PASID_EN, val8);
+
+ pci_read_config_dword(dev, pos + OCXL_DVSEC_AFU_CTRL_PASID_BASE,
+ &val32);
+ val32 &= ~OCXL_DVSEC_PASID_MASK;
+ val32 |= pasid_base & OCXL_DVSEC_PASID_MASK;
+ pci_write_config_dword(dev, pos + OCXL_DVSEC_AFU_CTRL_PASID_BASE,
+ val32);
+}
+
+void ocxl_config_set_afu_state(struct pci_dev *dev, int pos, int enable)
+{
+ u8 val;
+
+ pci_read_config_byte(dev, pos + OCXL_DVSEC_AFU_CTRL_ENABLE, &val);
+ if (enable)
+ val |= 1;
+ else
+ val &= 0xFE;
+ pci_write_config_byte(dev, pos + OCXL_DVSEC_AFU_CTRL_ENABLE, val);
+}
+
+int ocxl_config_set_TL(struct pci_dev *dev, int tl_dvsec)
+{
+ u32 val, *ptr32;
+ u8 timers;
+ int i, rc;
+ long recv_cap;
+ char *recv_rate;
+
+ /*
+ * Skip on function != 0, as the TL can only be defined on 0
+ */
+ if (PCI_FUNC(dev->devfn) != 0)
+ return 0;
+
+ recv_rate = kzalloc(PNV_OCXL_TL_RATE_BUF_SIZE, GFP_KERNEL);
+ if (!recv_rate)
+ return -ENOMEM;
+ /*
+ * The spec defines 64 templates for messages in the
+ * Transaction Layer (TL).
+ *
+ * The host and device each support a subset, so we need to
+ * configure the transmitters on each side to send only
+ * templates the receiver understands, at a rate the receiver
+ * can process. Per the spec, template 0 must be supported by
+ * everybody. That's the template which has been used by the
+ * host and device so far.
+ *
+ * The sending rate limit must be set before the template is
+ * enabled.
+ */
+
+ /*
+ * Device -> host
+ */
+ rc = pnv_ocxl_get_tl_cap(dev, &recv_cap, recv_rate,
+ PNV_OCXL_TL_RATE_BUF_SIZE);
+ if (rc)
+ goto out;
+
+ for (i = 0; i < PNV_OCXL_TL_RATE_BUF_SIZE; i += 4) {
+ ptr32 = (u32 *) &recv_rate[i];
+ pci_write_config_dword(dev,
+ tl_dvsec + OCXL_DVSEC_TL_SEND_RATE + i,
+ be32_to_cpu(*ptr32));
+ }
+ val = recv_cap >> 32;
+ pci_write_config_dword(dev, tl_dvsec + OCXL_DVSEC_TL_SEND_CAP, val);
+ val = recv_cap & GENMASK(31, 0);
+ pci_write_config_dword(dev, tl_dvsec + OCXL_DVSEC_TL_SEND_CAP + 4, val);
+
+ /*
+ * Host -> device
+ */
+ for (i = 0; i < PNV_OCXL_TL_RATE_BUF_SIZE; i += 4) {
+ pci_read_config_dword(dev,
+ tl_dvsec + OCXL_DVSEC_TL_RECV_RATE + i,
+ &val);
+ ptr32 = (u32 *) &recv_rate[i];
+ *ptr32 = cpu_to_be32(val);
+ }
+ pci_read_config_dword(dev, tl_dvsec + OCXL_DVSEC_TL_RECV_CAP, &val);
+ recv_cap = (long) val << 32;
+ pci_read_config_dword(dev, tl_dvsec + OCXL_DVSEC_TL_RECV_CAP + 4, &val);
+ recv_cap |= val;
+
+ rc = pnv_ocxl_set_tl_conf(dev, recv_cap, __pa(recv_rate),
+ PNV_OCXL_TL_RATE_BUF_SIZE);
+ if (rc)
+ goto out;
+
+ /*
+ * Opencapi commands needing to be retried are classified per
+ * the TL in 2 groups: short and long commands.
+ *
+ * The short back off timer is not used for now. It will be
+ * used for opencapi 4.0.
+ *
+ * The long back off timer is typically used when an AFU hits
+ * a page fault but the NPU is already processing one. So the
+ * AFU needs to wait before it can resubmit. Having a value
+ * too low doesn't break anything, but can generate extra
+ * traffic on the link.
+ * We set it to 1.6 us for now. It's shorter than, but in the
+ * same order of magnitude as the time spent to process a page
+ * fault.
+ */
+ timers = 0x2 << 4; /* long timer = 1.6 us */
+ pci_write_config_byte(dev, tl_dvsec + OCXL_DVSEC_TL_BACKOFF_TIMERS,
+ timers);
+
+ rc = 0;
+out:
+ kfree(recv_rate);
+ return rc;
+}
+
+int ocxl_config_terminate_pasid(struct pci_dev *dev, int afu_control, int pasid)
+{
+ u32 val;
+ unsigned long timeout;
+
+ pci_read_config_dword(dev, afu_control + OCXL_DVSEC_AFU_CTRL_TERM_PASID,
+ &val);
+ if (EXTRACT_BIT(val, 20)) {
+ dev_err(&dev->dev,
+ "Can't terminate PASID %#x, previous termination didn't complete\n",
+ pasid);
+ return -EBUSY;
+ }
+
+ val &= ~OCXL_DVSEC_PASID_MASK;
+ val |= pasid & OCXL_DVSEC_PASID_MASK;
+ val |= BIT(20);
+ pci_write_config_dword(dev,
+ afu_control + OCXL_DVSEC_AFU_CTRL_TERM_PASID,
+ val);
+
+ timeout = jiffies + (HZ * OCXL_CFG_TIMEOUT);
+ pci_read_config_dword(dev, afu_control + OCXL_DVSEC_AFU_CTRL_TERM_PASID,
+ &val);
+ while (EXTRACT_BIT(val, 20)) {
+ if (time_after_eq(jiffies, timeout)) {
+ dev_err(&dev->dev,
+ "Timeout while waiting for AFU to terminate PASID %#x\n",
+ pasid);
+ return -EBUSY;
+ }
+ cpu_relax();
+ pci_read_config_dword(dev,
+ afu_control + OCXL_DVSEC_AFU_CTRL_TERM_PASID,
+ &val);
+ }
+ return 0;
+}
+
+void ocxl_config_set_actag(struct pci_dev *dev, int func_dvsec, u32 tag_first,
+ u32 tag_count)
+{
+ u32 val;
+
+ val = (tag_first & OCXL_DVSEC_ACTAG_MASK) << 16;
+ val |= tag_count & OCXL_DVSEC_ACTAG_MASK;
+ pci_write_config_dword(dev, func_dvsec + OCXL_DVSEC_FUNC_OFF_ACTAG,
+ val);
+}
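The AFU descriptor is read one dword at a time through the AFU information DVSEC, and some fields are split across dwords. Below is a standalone sketch of the decoding done by ocxl_config_check_afu_index() and read_afu_mmio(), using the same bit positions as the driver; the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same semantics as the driver's EXTRACT_BITS(val, s, e): bits s..e inclusive. */
static uint32_t extract_bits(uint32_t val, unsigned int s, unsigned int e)
{
	return (uint32_t)((val >> s) & ((1ull << (e - s + 1)) - 1));
}

struct templ_version {
	unsigned int major, minor, len;
};

/* Template version dword: length in 31:16, major in 15:8, minor in 7:0. */
static struct templ_version decode_templ_version(uint32_t val)
{
	struct templ_version v = {
		.major = extract_bits(val, 8, 15),
		.minor = extract_bits(val, 0, 7),
		.len = extract_bits(val, 16, 31),
	};
	return v;
}

/*
 * MMIO descriptor: the low dword holds the BAR number in bits 2:0 and
 * offset bits 31:16 in its own bits 31:16 (so the offset is 64 KiB
 * aligned); the following dword holds offset bits 63:32.
 */
static bool decode_mmio_desc(uint32_t lo, uint32_t hi, uint8_t *bar,
			     uint64_t *offset)
{
	*bar = extract_bits(lo, 0, 2);
	*offset = ((uint64_t)extract_bits(lo, 16, 31) << 16) |
		  ((uint64_t)hi << 32);
	/* Only BARs 0, 2 and 4 are accepted, as in validate_afu(). */
	return *bar == 0 || *bar == 2 || *bar == 4;
}
```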
diff --git a/drivers/misc/ocxl/context.c b/drivers/misc/ocxl/context.c
new file mode 100644
index 000000000000..0bc0dd97d784
--- /dev/null
+++ b/drivers/misc/ocxl/context.c
@@ -0,0 +1,237 @@
+/*
+ * Copyright 2017 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/sched/mm.h>
+#include "ocxl_internal.h"
+
+struct ocxl_context *ocxl_context_alloc(void)
+{
+ return kzalloc(sizeof(struct ocxl_context), GFP_KERNEL);
+}
+
+int ocxl_context_init(struct ocxl_context *ctx, struct ocxl_afu *afu,
+ struct address_space *mapping)
+{
+ int pasid;
+
+ ctx->afu = afu;
+ mutex_lock(&afu->contexts_lock);
+ pasid = idr_alloc(&afu->contexts_idr, ctx, afu->pasid_base,
+ afu->pasid_base + afu->pasid_max, GFP_KERNEL);
+ if (pasid < 0) {
+ mutex_unlock(&afu->contexts_lock);
+ return pasid;
+ }
+ afu->pasid_count++;
+ mutex_unlock(&afu->contexts_lock);
+
+ ctx->pasid = pasid;
+ ctx->status = OPENED;
+ mutex_init(&ctx->status_mutex);
+ ctx->mapping = mapping;
+ mutex_init(&ctx->mapping_lock);
+ init_waitqueue_head(&ctx->events_wq);
+ mutex_init(&ctx->xsl_error_lock);
+ /*
+ * Keep a reference on the AFU to make sure it's valid for the
+ * duration of the life of the context
+ */
+ ocxl_afu_get(afu);
+ return 0;
+}
+
+/*
+ * Callback for when a translation fault triggers an error
+ * data: a pointer to the context which triggered the fault
+ * addr: the address that triggered the error
+ * dsisr: the value of the PPC64 dsisr register
+ */
+static void xsl_fault_error(void *data, u64 addr, u64 dsisr)
+{
+ struct ocxl_context *ctx = (struct ocxl_context *) data;
+
+ mutex_lock(&ctx->xsl_error_lock);
+ ctx->xsl_error.addr = addr;
+ ctx->xsl_error.dsisr = dsisr;
+ ctx->xsl_error.count++;
+ mutex_unlock(&ctx->xsl_error_lock);
+
+ wake_up_all(&ctx->events_wq);
+}
+
+int ocxl_context_attach(struct ocxl_context *ctx, u64 amr)
+{
+ int rc;
+
+ mutex_lock(&ctx->status_mutex);
+ if (ctx->status != OPENED) {
+ rc = -EIO;
+ goto out;
+ }
+
+ rc = ocxl_link_add_pe(ctx->afu->fn->link, ctx->pasid,
+ current->mm->context.id, 0, amr, current->mm,
+ xsl_fault_error, ctx);
+ if (rc)
+ goto out;
+
+ ctx->status = ATTACHED;
+out:
+ mutex_unlock(&ctx->status_mutex);
+ return rc;
+}
+
+static int map_pp_mmio(struct vm_area_struct *vma, unsigned long address,
+ u64 offset, struct ocxl_context *ctx)
+{
+ u64 pp_mmio_addr;
+ int pasid_off;
+
+ if (offset >= ctx->afu->config.pp_mmio_stride)
+ return VM_FAULT_SIGBUS;
+
+ mutex_lock(&ctx->status_mutex);
+ if (ctx->status != ATTACHED) {
+ mutex_unlock(&ctx->status_mutex);
+ pr_debug("%s: Context not attached, failing mmio mmap\n",
+ __func__);
+ return VM_FAULT_SIGBUS;
+ }
+
+ pasid_off = ctx->pasid - ctx->afu->pasid_base;
+ pp_mmio_addr = ctx->afu->pp_mmio_start +
+ pasid_off * ctx->afu->config.pp_mmio_stride +
+ offset;
+
+ vm_insert_pfn(vma, address, pp_mmio_addr >> PAGE_SHIFT);
+ mutex_unlock(&ctx->status_mutex);
+ return VM_FAULT_NOPAGE;
+}
+
+static int ocxl_mmap_fault(struct vm_fault *vmf)
+{
+ struct vm_area_struct *vma = vmf->vma;
+ struct ocxl_context *ctx = vma->vm_file->private_data;
+ u64 offset;
+ int rc;
+
+ offset = vmf->pgoff << PAGE_SHIFT;
+ pr_debug("%s: pasid %d address 0x%lx offset 0x%llx\n", __func__,
+ ctx->pasid, vmf->address, offset);
+
+ rc = map_pp_mmio(vma, vmf->address, offset, ctx);
+ return rc;
+}
+
+static const struct vm_operations_struct ocxl_vmops = {
+ .fault = ocxl_mmap_fault,
+};
+
+static int check_mmap_mmio(struct ocxl_context *ctx,
+ struct vm_area_struct *vma)
+{
+ if ((vma_pages(vma) + vma->vm_pgoff) >
+ (ctx->afu->config.pp_mmio_stride >> PAGE_SHIFT))
+ return -EINVAL;
+ return 0;
+}
+
+int ocxl_context_mmap(struct ocxl_context *ctx, struct vm_area_struct *vma)
+{
+ int rc;
+
+ rc = check_mmap_mmio(ctx, vma);
+ if (rc)
+ return rc;
+
+ vma->vm_flags |= VM_IO | VM_PFNMAP;
+ vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
+ vma->vm_ops = &ocxl_vmops;
+ return 0;
+}
+
+int ocxl_context_detach(struct ocxl_context *ctx)
+{
+ struct pci_dev *dev;
+ int afu_control_pos;
+ enum ocxl_context_status status;
+ int rc;
+
+ mutex_lock(&ctx->status_mutex);
+ status = ctx->status;
+ ctx->status = CLOSED;
+ mutex_unlock(&ctx->status_mutex);
+ if (status != ATTACHED)
+ return 0;
+
+ dev = to_pci_dev(ctx->afu->fn->dev.parent);
+ afu_control_pos = ctx->afu->config.dvsec_afu_control_pos;
+
+ mutex_lock(&ctx->afu->afu_control_lock);
+ rc = ocxl_config_terminate_pasid(dev, afu_control_pos, ctx->pasid);
+ mutex_unlock(&ctx->afu->afu_control_lock);
+ if (rc) {
+ /*
+ * If we time out waiting for the AFU to terminate the
+ * pasid, then it's dangerous to clean up the Process
+ * Element entry in the SPA, as it may be referenced
+ * in the future by the AFU, in which case we would
+ * checkstop because of an invalid PE access (FIR
+ * register 2, bit 42). So leave the PE
+ * defined. Caller shouldn't free the context so that
+ * PASID remains allocated.
+ *
+ * A link reset will be required to clean up the AFU
+ * and the SPA.
+ */
+ if (rc == -EBUSY)
+ return rc;
+ }
+ rc = ocxl_link_remove_pe(ctx->afu->fn->link, ctx->pasid);
+ if (rc) {
+ dev_warn(&ctx->afu->dev,
+ "Couldn't remove PE entry cleanly: %d\n", rc);
+ }
+ return 0;
+}
+
+void ocxl_context_detach_all(struct ocxl_afu *afu)
+{
+ struct ocxl_context *ctx;
+ int tmp;
+
+ mutex_lock(&afu->contexts_lock);
+ idr_for_each_entry(&afu->contexts_idr, ctx, tmp) {
+ ocxl_context_detach(ctx);
+ /*
+ * We are force detaching - remove any active mmio
+ * mappings so userspace cannot interfere with the
+ * card if it comes back. Easiest way to exercise
+ * this is to unbind and rebind the driver via sysfs
+ * while it is in use.
+ */
+ mutex_lock(&ctx->mapping_lock);
+ if (ctx->mapping)
+ unmap_mapping_range(ctx->mapping, 0, 0, 1);
+ mutex_unlock(&ctx->mapping_lock);
+ }
+ mutex_unlock(&afu->contexts_lock);
+}
+
+void ocxl_context_free(struct ocxl_context *ctx)
+{
+ mutex_lock(&ctx->afu->contexts_lock);
+ ctx->afu->pasid_count--;
+ idr_remove(&ctx->afu->contexts_idr, ctx->pasid);
+ mutex_unlock(&ctx->afu->contexts_lock);
+
+ /* reference to the AFU taken in ocxl_context_init */
+ ocxl_afu_put(ctx->afu);
+ kfree(ctx);
+}
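Each context's per-process MMIO window is located by the PASID's offset from the AFU's PASID base, multiplied by the per-process stride, as computed in map_pp_mmio(); check_mmap_mmio() rejects any mapping that would spill past one stride. A small sketch of that arithmetic, with hypothetical names and the page shift passed in explicitly since PAGE_SHIFT depends on the kernel configuration:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the AFU fields map_pp_mmio() reads. */
struct pp_mmio_layout {
	uint64_t pp_mmio_start;		/* physical start of the per-process area */
	uint32_t pp_mmio_stride;	/* bytes reserved per PASID */
	int pasid_base;			/* first PASID assigned to this AFU */
};

/* Physical address backing 'offset' in the window of 'pasid'; returns
 * false where the driver would return VM_FAULT_SIGBUS. */
static bool pp_mmio_addr(const struct pp_mmio_layout *l, int pasid,
			 uint64_t offset, uint64_t *addr)
{
	if (offset >= l->pp_mmio_stride)
		return false;
	*addr = l->pp_mmio_start +
		(uint64_t)(pasid - l->pasid_base) * l->pp_mmio_stride + offset;
	return true;
}

/* Same check as check_mmap_mmio(): pages mapped plus the page offset
 * must fit inside one stride. */
static bool mmap_fits(const struct pp_mmio_layout *l, unsigned long npages,
		      unsigned long pgoff, unsigned int page_shift)
{
	return npages + pgoff <= (l->pp_mmio_stride >> page_shift);
}
```

Because the window is selected by the PASID rather than by the mmap offset, two processes mapping the same file offset land in different hardware windows.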
diff --git a/drivers/misc/ocxl/file.c b/drivers/misc/ocxl/file.c
new file mode 100644
index 000000000000..a51386eff4f5
--- /dev/null
+++ b/drivers/misc/ocxl/file.c
@@ -0,0 +1,405 @@
+/*
+ * Copyright 2017 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/fs.h>
+#include <linux/poll.h>
+#include <linux/sched/signal.h>
+#include <linux/uaccess.h>
+#include <uapi/misc/ocxl.h>
+#include "ocxl_internal.h"
+
+
+#define OCXL_NUM_MINORS 256 /* Total to reserve */
+
+static dev_t ocxl_dev;
+static struct class *ocxl_class;
+static struct mutex minors_idr_lock;
+static struct idr minors_idr;
+
+static struct ocxl_afu *find_and_get_afu(dev_t devno)
+{
+ struct ocxl_afu *afu;
+ int afu_minor;
+
+ afu_minor = MINOR(devno);
+ /*
+ * We don't declare an RCU critical section here, as our AFU
+ * is protected by a reference counter on the device. By the time the
+ * minor number of a device is removed from the idr, the ref count of
+ * the device is already at 0, so no user API will access that AFU and
+ * this function can't return it.
+ */
+ afu = idr_find(&minors_idr, afu_minor);
+ if (afu)
+ ocxl_afu_get(afu);
+ return afu;
+}
+
+static int allocate_afu_minor(struct ocxl_afu *afu)
+{
+ int minor;
+
+ mutex_lock(&minors_idr_lock);
+ minor = idr_alloc(&minors_idr, afu, 0, OCXL_NUM_MINORS, GFP_KERNEL);
+ mutex_unlock(&minors_idr_lock);
+ return minor;
+}
+
+static void free_afu_minor(struct ocxl_afu *afu)
+{
+ mutex_lock(&minors_idr_lock);
+ idr_remove(&minors_idr, MINOR(afu->dev.devt));
+ mutex_unlock(&minors_idr_lock);
+}
+
+static int afu_open(struct inode *inode, struct file *file)
+{
+ struct ocxl_afu *afu;
+ struct ocxl_context *ctx;
+ int rc;
+
+ pr_debug("%s for device %x\n", __func__, inode->i_rdev);
+
+ afu = find_and_get_afu(inode->i_rdev);
+ if (!afu)
+ return -ENODEV;
+
+ ctx = ocxl_context_alloc();
+ if (!ctx) {
+ rc = -ENOMEM;
+ goto put_afu;
+ }
+
+ rc = ocxl_context_init(ctx, afu, inode->i_mapping);
+ if (rc) {
+ kfree(ctx);
+ goto put_afu;
+ }
+ file->private_data = ctx;
+ ocxl_afu_put(afu);
+ return 0;
+
+put_afu:
+ ocxl_afu_put(afu);
+ return rc;
+}
+
+static long afu_ioctl_attach(struct ocxl_context *ctx,
+ struct ocxl_ioctl_attach __user *uarg)
+{
+ struct ocxl_ioctl_attach arg;
+ u64 amr = 0;
+ int rc;
+
+ pr_debug("%s for context %d\n", __func__, ctx->pasid);
+
+ if (copy_from_user(&arg, uarg, sizeof(arg)))
+ return -EFAULT;
+
+ /* Make sure reserved fields are not set for forward compatibility */
+ if (arg.reserved1 || arg.reserved2 || arg.reserved3)
+ return -EINVAL;
+
+ amr = arg.amr & mfspr(SPRN_UAMOR);
+ rc = ocxl_context_attach(ctx, amr);
+ return rc;
+}
+
+#define CMD_STR(x) (x == OCXL_IOCTL_ATTACH ? "ATTACH" : \
+ "UNKNOWN")
+
+static long afu_ioctl(struct file *file, unsigned int cmd,
+ unsigned long args)
+{
+ struct ocxl_context *ctx = file->private_data;
+ long rc;
+
+ pr_debug("%s for context %d, command %s\n", __func__, ctx->pasid,
+ CMD_STR(cmd));
+
+ if (ctx->status == CLOSED)
+ return -EIO;
+
+ switch (cmd) {
+ case OCXL_IOCTL_ATTACH:
+ rc = afu_ioctl_attach(ctx,
+ (struct ocxl_ioctl_attach __user *) args);
+ break;
+
+ default:
+ rc = -EINVAL;
+ }
+ return rc;
+}
+
+static long afu_compat_ioctl(struct file *file, unsigned int cmd,
+ unsigned long args)
+{
+ return afu_ioctl(file, cmd, args);
+}
+
+static int afu_mmap(struct file *file, struct vm_area_struct *vma)
+{
+ struct ocxl_context *ctx = file->private_data;
+
+ pr_debug("%s for context %d\n", __func__, ctx->pasid);
+ return ocxl_context_mmap(ctx, vma);
+}
+
+static bool has_xsl_error(struct ocxl_context *ctx)
+{
+ bool ret;
+
+ mutex_lock(&ctx->xsl_error_lock);
+ ret = !!ctx->xsl_error.addr;
+ mutex_unlock(&ctx->xsl_error_lock);
+
+ return ret;
+}
+
+/*
+ * Are there any events pending on the AFU
+ * ctx: The AFU context
+ * Returns: true if there are events pending
+ */
+static bool afu_events_pending(struct ocxl_context *ctx)
+{
+ if (has_xsl_error(ctx))
+ return true;
+ return false;
+}
+
+static unsigned int afu_poll(struct file *file, struct poll_table_struct *wait)
+{
+ struct ocxl_context *ctx = file->private_data;
+ unsigned int mask = 0;
+ bool closed;
+
+ pr_debug("%s for context %d\n", __func__, ctx->pasid);
+
+ poll_wait(file, &ctx->events_wq, wait);
+
+ mutex_lock(&ctx->status_mutex);
+ closed = (ctx->status == CLOSED);
+ mutex_unlock(&ctx->status_mutex);
+
+ if (afu_events_pending(ctx))
+ mask = POLLIN | POLLRDNORM;
+ else if (closed)
+ mask = POLLERR;
+
+ return mask;
+}
+
+/*
+ * Populate the supplied buffer with a single XSL error
+ * ctx: The AFU context to report the error from
+ * header: the event header to populate
+ * buf: The buffer to write the body into (should be at least
+ * sizeof(struct ocxl_kernel_event_xsl_fault_error) bytes)
+ * Return: bytes written, 0 if no error was pending, or -EFAULT
+ */
+static ssize_t append_xsl_error(struct ocxl_context *ctx,
+ struct ocxl_kernel_event_header *header,
+ char __user *buf)
+{
+ struct ocxl_kernel_event_xsl_fault_error body;
+
+ memset(&body, 0, sizeof(body));
+
+ mutex_lock(&ctx->xsl_error_lock);
+ if (!ctx->xsl_error.addr) {
+ mutex_unlock(&ctx->xsl_error_lock);
+ return 0;
+ }
+
+ body.addr = ctx->xsl_error.addr;
+ body.dsisr = ctx->xsl_error.dsisr;
+ body.count = ctx->xsl_error.count;
+
+ ctx->xsl_error.addr = 0;
+ ctx->xsl_error.dsisr = 0;
+ ctx->xsl_error.count = 0;
+
+ mutex_unlock(&ctx->xsl_error_lock);
+
+ header->type = OCXL_AFU_EVENT_XSL_FAULT_ERROR;
+
+ if (copy_to_user(buf, &body, sizeof(body)))
+ return -EFAULT;
+
+ return sizeof(body);
+}
+
+#define AFU_EVENT_BODY_MAX_SIZE sizeof(struct ocxl_kernel_event_xsl_fault_error)
+
+/*
+ * Reports events on the AFU
+ * Format:
+ * Header (struct ocxl_kernel_event_header)
+ * Body (struct ocxl_kernel_event_*)
+ * Header...
+ */
+static ssize_t afu_read(struct file *file, char __user *buf, size_t count,
+ loff_t *off)
+{
+ struct ocxl_context *ctx = file->private_data;
+ struct ocxl_kernel_event_header header;
+ ssize_t rc;
+ ssize_t used = 0;
+ DEFINE_WAIT(event_wait);
+
+ memset(&header, 0, sizeof(header));
+
+ /* Require offset to be 0 */
+ if (*off != 0)
+ return -EINVAL;
+
+ if (count < (sizeof(struct ocxl_kernel_event_header) +
+ AFU_EVENT_BODY_MAX_SIZE))
+ return -EINVAL;
+
+ for (;;) {
+ prepare_to_wait(&ctx->events_wq, &event_wait,
+ TASK_INTERRUPTIBLE);
+
+ if (afu_events_pending(ctx))
+ break;
+
+ if (ctx->status == CLOSED)
+ break;
+
+ if (file->f_flags & O_NONBLOCK) {
+ finish_wait(&ctx->events_wq, &event_wait);
+ return -EAGAIN;
+ }
+
+ if (signal_pending(current)) {
+ finish_wait(&ctx->events_wq, &event_wait);
+ return -ERESTARTSYS;
+ }
+
+ schedule();
+ }
+
+ finish_wait(&ctx->events_wq, &event_wait);
+
+ if (has_xsl_error(ctx)) {
+ used = append_xsl_error(ctx, &header, buf + sizeof(header));
+ if (used < 0)
+ return used;
+ }
+
+ if (!afu_events_pending(ctx))
+ header.flags |= OCXL_KERNEL_EVENT_FLAG_LAST;
+
+ if (copy_to_user(buf, &header, sizeof(header)))
+ return -EFAULT;
+
+ used += sizeof(header);
+
+ rc = (ssize_t) used;
+ return rc;
+}
+
+static int afu_release(struct inode *inode, struct file *file)
+{
+ struct ocxl_context *ctx = file->private_data;
+ int rc;
+
+ pr_debug("%s for device %x\n", __func__, inode->i_rdev);
+ rc = ocxl_context_detach(ctx);
+ mutex_lock(&ctx->mapping_lock);
+ ctx->mapping = NULL;
+ mutex_unlock(&ctx->mapping_lock);
+ wake_up_all(&ctx->events_wq);
+ if (rc != -EBUSY)
+ ocxl_context_free(ctx);
+ return 0;
+}
+
+static const struct file_operations ocxl_afu_fops = {
+ .owner = THIS_MODULE,
+ .open = afu_open,
+ .unlocked_ioctl = afu_ioctl,
+ .compat_ioctl = afu_compat_ioctl,
+ .mmap = afu_mmap,
+ .poll = afu_poll,
+ .read = afu_read,
+ .release = afu_release,
+};
+
+int ocxl_create_cdev(struct ocxl_afu *afu)
+{
+ int rc;
+
+ cdev_init(&afu->cdev, &ocxl_afu_fops);
+ rc = cdev_add(&afu->cdev, afu->dev.devt, 1);
+ if (rc) {
+ dev_err(&afu->dev, "Unable to add afu char device: %d\n", rc);
+ return rc;
+ }
+ return 0;
+}
+
+void ocxl_destroy_cdev(struct ocxl_afu *afu)
+{
+ cdev_del(&afu->cdev);
+}
+
+int ocxl_register_afu(struct ocxl_afu *afu)
+{
+ int minor;
+
+ minor = allocate_afu_minor(afu);
+ if (minor < 0)
+ return minor;
+ afu->dev.devt = MKDEV(MAJOR(ocxl_dev), minor);
+ afu->dev.class = ocxl_class;
+ return device_register(&afu->dev);
+}
+
+void ocxl_unregister_afu(struct ocxl_afu *afu)
+{
+ free_afu_minor(afu);
+}
+
+static char *ocxl_devnode(struct device *dev, umode_t *mode)
+{
+ return kasprintf(GFP_KERNEL, "ocxl/%s", dev_name(dev));
+}
+
+int ocxl_file_init(void)
+{
+ int rc;
+
+ mutex_init(&minors_idr_lock);
+ idr_init(&minors_idr);
+
+ rc = alloc_chrdev_region(&ocxl_dev, 0, OCXL_NUM_MINORS, "ocxl");
+ if (rc) {
+ pr_err("Unable to allocate ocxl major number: %d\n", rc);
+ return rc;
+ }
+
+ ocxl_class = class_create(THIS_MODULE, "ocxl");
+ if (IS_ERR(ocxl_class)) {
+ pr_err("Unable to create ocxl class\n");
+ unregister_chrdev_region(ocxl_dev, OCXL_NUM_MINORS);
+ return PTR_ERR(ocxl_class);
+ }
+
+ ocxl_class->devnode = ocxl_devnode;
+ return 0;
+}
+
+void ocxl_file_exit(void)
+{
+ class_destroy(ocxl_class);
+ unregister_chrdev_region(ocxl_dev, OCXL_NUM_MINORS);
+ idr_destroy(&minors_idr);
+}
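afu_read() copies the optional event body before the header, and it must pass any negative error from append_xsl_error() back to the caller. The sketch below models that accounting in userspace. fake_append() is a hypothetical stand-in for append_xsl_error(). The sketch shows why the running total has to be signed. With a size_t accumulator, a `used < 0` test can never be true, so -EFAULT would come back to the reader as a huge byte count.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for append_xsl_error(): returns the number of
 * body bytes written, or -EFAULT if the copy to user memory failed. */
static ssize_t fake_append(int fault, size_t body_size)
{
	if (fault)
		return -EFAULT;
	return (ssize_t)body_size;
}

/* Models the tail of afu_read(): body first (if an error is pending),
 * then the header. The accumulator is signed so a failure is seen. */
static ssize_t model_afu_read(int pending, int fault, size_t header_size,
			      size_t body_size)
{
	ssize_t used = 0;

	if (pending) {
		used = fake_append(fault, body_size);
		if (used < 0)
			return used; /* propagate -EFAULT unchanged */
	}
	used += (ssize_t)header_size;
	return used;
}
```

The header and body sizes are parameters here so that the sketch does not depend on the exact uapi struct layouts.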
diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
new file mode 100644
index 000000000000..6b184cd7d2a6
--- /dev/null
+++ b/drivers/misc/ocxl/link.c
@@ -0,0 +1,610 @@
+/*
+ * Copyright 2017 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/sched/mm.h>
+#include <linux/mutex.h>
+#include <linux/mmu_context.h>
+#include <asm/copro.h>
+#include <asm/pnv-ocxl.h>
+#include "ocxl_internal.h"
+
+
+#define SPA_PASID_BITS 15
+#define SPA_PASID_MAX ((1 << SPA_PASID_BITS) - 1)
+#define SPA_PE_MASK SPA_PASID_MAX
+#define SPA_SPA_SIZE_LOG 22 /* Each SPA is 4 MB */
+
+#define SPA_CFG_SF (1ull << (63-0))
+#define SPA_CFG_TA (1ull << (63-1))
+#define SPA_CFG_HV (1ull << (63-3))
+#define SPA_CFG_UV (1ull << (63-4))
+#define SPA_CFG_XLAT_hpt (0ull << (63-6)) /* Hashed page table (HPT) mode */
+#define SPA_CFG_XLAT_roh (2ull << (63-6)) /* Radix on HPT mode */
+#define SPA_CFG_XLAT_ror (3ull << (63-6)) /* Radix on Radix mode */
+#define SPA_CFG_PR (1ull << (63-49))
+#define SPA_CFG_TC (1ull << (63-54))
+#define SPA_CFG_DR (1ull << (63-59))
+
+#define SPA_XSL_TF (1ull << (63-3)) /* Translation fault */
+#define SPA_XSL_S (1ull << (63-38)) /* Store operation */
+
+#define SPA_PE_VALID 0x80000000
+
+
+struct pe_data {
+ struct mm_struct *mm;
+ /* callback to trigger when a translation fault occurs */
+ void (*xsl_err_cb)(void *data, u64 addr, u64 dsisr);
+ /* opaque pointer to be passed to the above callback */
+ void *xsl_err_data;
+ struct rcu_head rcu;
+};
+
+struct spa {
+ struct ocxl_process_element *spa_mem;
+ int spa_order;
+ struct mutex spa_lock;
+ struct radix_tree_root pe_tree; /* Maps PE handles to pe_data */
+ char *irq_name;
+ int virq;
+ void __iomem *reg_dsisr;
+ void __iomem *reg_dar;
+ void __iomem *reg_tfc;
+ void __iomem *reg_pe_handle;
+ /*
+ * The following fields are used by the memory fault
+ * interrupt handler. We can only have one interrupt at a
+ * time. The NPU won't raise another interrupt until the
+ * previous one has been ack'd by writing to the TFC register
+ */
+ struct xsl_fault {
+ struct work_struct fault_work;
+ u64 pe;
+ u64 dsisr;
+ u64 dar;
+ struct pe_data pe_data;
+ } xsl_fault;
+};
+
+/*
+ * An opencapi link can be used by several PCI functions. We have
+ * one link per device slot.
+ *
+ * A linked list of opencapi links should suffice, as there's a
+ * limited number of opencapi slots on a system and lookup is only
+ * done when the device is probed
+ */
+struct link {
+ struct list_head list;
+ struct kref ref;
+ int domain;
+ int bus;
+ int dev;
+ atomic_t irq_available;
+ struct spa *spa;
+ void *platform_data;
+};
+static struct list_head links_list = LIST_HEAD_INIT(links_list);
+static DEFINE_MUTEX(links_list_lock);
+
+enum xsl_response {
+ CONTINUE,
+ ADDRESS_ERROR,
+ RESTART,
+};
+
+
+static void read_irq(struct spa *spa, u64 *dsisr, u64 *dar, u64 *pe)
+{
+ u64 reg;
+
+ *dsisr = in_be64(spa->reg_dsisr);
+ *dar = in_be64(spa->reg_dar);
+ reg = in_be64(spa->reg_pe_handle);
+ *pe = reg & SPA_PE_MASK;
+}
+
+static void ack_irq(struct spa *spa, enum xsl_response r)
+{
+ u64 reg = 0;
+
+ /* continue is not supported */
+ if (r == RESTART)
+ reg = PPC_BIT(31);
+ else if (r == ADDRESS_ERROR)
+ reg = PPC_BIT(30);
+ else
+ WARN(1, "Invalid irq response %d\n", r);
+
+ if (reg)
+ out_be64(spa->reg_tfc, reg);
+}
+
+static void xsl_fault_handler_bh(struct work_struct *fault_work)
+{
+ unsigned int flt = 0;
+ unsigned long access, flags, inv_flags = 0;
+ enum xsl_response r;
+ struct xsl_fault *fault = container_of(fault_work, struct xsl_fault,
+ fault_work);
+ struct spa *spa = container_of(fault, struct spa, xsl_fault);
+
+ int rc;
+
+ /*
+ * We need to release a reference on the mm whenever exiting this
+ * function (taken in the memory fault interrupt handler)
+ */
+ rc = copro_handle_mm_fault(fault->pe_data.mm, fault->dar, fault->dsisr,
+ &flt);
+ if (rc) {
+ pr_debug("copro_handle_mm_fault failed: %d\n", rc);
+ if (fault->pe_data.xsl_err_cb) {
+ fault->pe_data.xsl_err_cb(
+ fault->pe_data.xsl_err_data,
+ fault->dar, fault->dsisr);
+ }
+ r = ADDRESS_ERROR;
+ goto ack;
+ }
+
+ if (!radix_enabled()) {
+ /*
+ * update_mmu_cache() will not have loaded the hash
+ * since current->trap is not a 0x400 or 0x300, so
+ * just call hash_page_mm() here.
+ */
+ access = _PAGE_PRESENT | _PAGE_READ;
+ if (fault->dsisr & SPA_XSL_S)
+ access |= _PAGE_WRITE;
+
+ if (REGION_ID(fault->dar) != USER_REGION_ID)
+ access |= _PAGE_PRIVILEGED;
+
+ local_irq_save(flags);
+ hash_page_mm(fault->pe_data.mm, fault->dar, access, 0x300,
+ inv_flags);
+ local_irq_restore(flags);
+ }
+ r = RESTART;
+ack:
+ mmdrop(fault->pe_data.mm);
+ ack_irq(spa, r);
+}
+
+static irqreturn_t xsl_fault_handler(int irq, void *data)
+{
+ struct link *link = (struct link *) data;
+ struct spa *spa = link->spa;
+ u64 dsisr, dar, pe_handle;
+ struct pe_data *pe_data;
+ struct ocxl_process_element *pe;
+ int lpid, pid, tid;
+
+ read_irq(spa, &dsisr, &dar, &pe_handle);
+
+ WARN_ON(pe_handle > SPA_PE_MASK);
+ pe = spa->spa_mem + pe_handle;
+ lpid = be32_to_cpu(pe->lpid);
+ pid = be32_to_cpu(pe->pid);
+ tid = be32_to_cpu(pe->tid);
+ /* We could be reading all null values here if the PE is being
+ * removed while an interrupt kicks in. It's not supposed to
+ * happen if the driver notified the AFU to terminate the
+ * PASID, and the AFU waited for pending operations before
+ * acknowledging. But even if it happens, we won't find a
+ * memory context below and fail silently, so it should be ok.
+ */
+ if (!(dsisr & SPA_XSL_TF)) {
+ WARN(1, "Invalid xsl interrupt fault register %#llx\n", dsisr);
+ ack_irq(spa, ADDRESS_ERROR);
+ return IRQ_HANDLED;
+ }
+
+ rcu_read_lock();
+ pe_data = radix_tree_lookup(&spa->pe_tree, pe_handle);
+ if (!pe_data) {
+ /*
+ * Could only happen if the driver didn't notify the
+ * AFU about PASID termination before removing the PE,
+ * or the AFU didn't wait for all memory accesses to
+ * complete.
+ *
+ * Either way, we fail early, but we shouldn't log an
+ * error message, as it is a valid (if unexpected)
+ * scenario
+ */
+ rcu_read_unlock();
+ pr_debug("Unknown mm context for xsl interrupt\n");
+ ack_irq(spa, ADDRESS_ERROR);
+ return IRQ_HANDLED;
+ }
+ WARN_ON(pe_data->mm->context.id != pid);
+
+ spa->xsl_fault.pe = pe_handle;
+ spa->xsl_fault.dar = dar;
+ spa->xsl_fault.dsisr = dsisr;
+ spa->xsl_fault.pe_data = *pe_data;
+ mmgrab(pe_data->mm); /* mm count is released by bottom half */
+
+ rcu_read_unlock();
+ schedule_work(&spa->xsl_fault.fault_work);
+ return IRQ_HANDLED;
+}
+
+static void unmap_irq_registers(struct spa *spa)
+{
+ pnv_ocxl_unmap_xsl_regs(spa->reg_dsisr, spa->reg_dar, spa->reg_tfc,
+ spa->reg_pe_handle);
+}
+
+static int map_irq_registers(struct pci_dev *dev, struct spa *spa)
+{
+ return pnv_ocxl_map_xsl_regs(dev, &spa->reg_dsisr, &spa->reg_dar,
+ &spa->reg_tfc, &spa->reg_pe_handle);
+}
+
+static int setup_xsl_irq(struct pci_dev *dev, struct link *link)
+{
+ struct spa *spa = link->spa;
+ int rc;
+ int hwirq;
+
+ rc = pnv_ocxl_get_xsl_irq(dev, &hwirq);
+ if (rc)
+ return rc;
+
+ rc = map_irq_registers(dev, spa);
+ if (rc)
+ return rc;
+
+ spa->irq_name = kasprintf(GFP_KERNEL, "ocxl-xsl-%x-%x-%x",
+ link->domain, link->bus, link->dev);
+ if (!spa->irq_name) {
+ unmap_irq_registers(spa);
+ dev_err(&dev->dev, "Can't allocate name for xsl interrupt\n");
+ return -ENOMEM;
+ }
+ /*
+ * At some point, we'll need to look into allowing a higher
+ * number of interrupts. Could we have an IRQ domain per link?
+ */
+ spa->virq = irq_create_mapping(NULL, hwirq);
+ if (!spa->virq) {
+ kfree(spa->irq_name);
+ unmap_irq_registers(spa);
+ dev_err(&dev->dev,
+ "irq_create_mapping failed for translation interrupt\n");
+ return -EINVAL;
+ }
+
+ dev_dbg(&dev->dev, "hwirq %d mapped to virq %d\n", hwirq, spa->virq);
+
+ rc = request_irq(spa->virq, xsl_fault_handler, 0, spa->irq_name,
+ link);
+ if (rc) {
+ irq_dispose_mapping(spa->virq);
+ kfree(spa->irq_name);
+ unmap_irq_registers(spa);
+ dev_err(&dev->dev,
+ "request_irq failed for translation interrupt: %d\n",
+ rc);
+ return -EINVAL;
+ }
+ return 0;
+}
+
+static void release_xsl_irq(struct link *link)
+{
+ struct spa *spa = link->spa;
+
+ if (spa->virq) {
+ free_irq(spa->virq, link);
+ irq_dispose_mapping(spa->virq);
+ }
+ kfree(spa->irq_name);
+ unmap_irq_registers(spa);
+}
+
+static int alloc_spa(struct pci_dev *dev, struct link *link)
+{
+ struct spa *spa;
+
+ spa = kzalloc(sizeof(struct spa), GFP_KERNEL);
+ if (!spa)
+ return -ENOMEM;
+
+ mutex_init(&spa->spa_lock);
+ INIT_RADIX_TREE(&spa->pe_tree, GFP_KERNEL);
+ INIT_WORK(&spa->xsl_fault.fault_work, xsl_fault_handler_bh);
+
+ spa->spa_order = SPA_SPA_SIZE_LOG - PAGE_SHIFT;
+ spa->spa_mem = (struct ocxl_process_element *)
+ __get_free_pages(GFP_KERNEL | __GFP_ZERO, spa->spa_order);
+ if (!spa->spa_mem) {
+ dev_err(&dev->dev, "Can't allocate Shared Process Area\n");
+ kfree(spa);
+ return -ENOMEM;
+ }
+ pr_debug("Allocated SPA for %x:%x:%x at %p\n", link->domain, link->bus,
+ link->dev, spa->spa_mem);
+
+ link->spa = spa;
+ return 0;
+}
+
+static void free_spa(struct link *link)
+{
+ struct spa *spa = link->spa;
+
+ pr_debug("Freeing SPA for %x:%x:%x\n", link->domain, link->bus,
+ link->dev);
+
+ if (spa && spa->spa_mem) {
+ free_pages((unsigned long) spa->spa_mem, spa->spa_order);
+ kfree(spa);
+ link->spa = NULL;
+ }
+}
+
+static int alloc_link(struct pci_dev *dev, int PE_mask, struct link **out_link)
+{
+ struct link *link;
+ int rc;
+
+ link = kzalloc(sizeof(struct link), GFP_KERNEL);
+ if (!link)
+ return -ENOMEM;
+
+ kref_init(&link->ref);
+ link->domain = pci_domain_nr(dev->bus);
+ link->bus = dev->bus->number;
+ link->dev = PCI_SLOT(dev->devfn);
+ atomic_set(&link->irq_available, MAX_IRQ_PER_LINK);
+
+ rc = alloc_spa(dev, link);
+ if (rc)
+ goto err_free;
+
+ rc = setup_xsl_irq(dev, link);
+ if (rc)
+ goto err_spa;
+
+ /* platform specific hook */
+ rc = pnv_ocxl_spa_setup(dev, link->spa->spa_mem, PE_mask,
+ &link->platform_data);
+ if (rc)
+ goto err_xsl_irq;
+
+ *out_link = link;
+ return 0;
+
+err_xsl_irq:
+ release_xsl_irq(link);
+err_spa:
+ free_spa(link);
+err_free:
+ kfree(link);
+ return rc;
+}
+
+static void free_link(struct link *link)
+{
+ release_xsl_irq(link);
+ free_spa(link);
+ kfree(link);
+}
+
+int ocxl_link_setup(struct pci_dev *dev, int PE_mask, void **link_handle)
+{
+ int rc = 0;
+ struct link *link;
+
+ mutex_lock(&links_list_lock);
+ list_for_each_entry(link, &links_list, list) {
+ /* The functions of a device all share the same link */
+ if (link->domain == pci_domain_nr(dev->bus) &&
+ link->bus == dev->bus->number &&
+ link->dev == PCI_SLOT(dev->devfn)) {
+ kref_get(&link->ref);
+ *link_handle = link;
+ goto unlock;
+ }
+ }
+ rc = alloc_link(dev, PE_mask, &link);
+ if (rc)
+ goto unlock;
+
+ list_add(&link->list, &links_list);
+ *link_handle = link;
+unlock:
+ mutex_unlock(&links_list_lock);
+ return rc;
+}
+
+static void release_xsl(struct kref *ref)
+{
+ struct link *link = container_of(ref, struct link, ref);
+
+ list_del(&link->list);
+ /* call platform code before releasing data */
+ pnv_ocxl_spa_release(link->platform_data);
+ free_link(link);
+}
+
+void ocxl_link_release(struct pci_dev *dev, void *link_handle)
+{
+ struct link *link = (struct link *) link_handle;
+
+ mutex_lock(&links_list_lock);
+ kref_put(&link->ref, release_xsl);
+ mutex_unlock(&links_list_lock);
+}
+
+static u64 calculate_cfg_state(bool kernel)
+{
+ u64 state;
+
+ state = SPA_CFG_DR;
+ if (mfspr(SPRN_LPCR) & LPCR_TC)
+ state |= SPA_CFG_TC;
+ if (radix_enabled())
+ state |= SPA_CFG_XLAT_ror;
+ else
+ state |= SPA_CFG_XLAT_hpt;
+ state |= SPA_CFG_HV;
+ if (kernel) {
+ if (mfmsr() & MSR_SF)
+ state |= SPA_CFG_SF;
+ } else {
+ state |= SPA_CFG_PR;
+ if (!test_tsk_thread_flag(current, TIF_32BIT))
+ state |= SPA_CFG_SF;
+ }
+ return state;
+}
+
+int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
+ u64 amr, struct mm_struct *mm,
+ void (*xsl_err_cb)(void *data, u64 addr, u64 dsisr),
+ void *xsl_err_data)
+{
+ struct link *link = (struct link *) link_handle;
+ struct spa *spa = link->spa;
+ struct ocxl_process_element *pe;
+ int pe_handle, rc = 0;
+ struct pe_data *pe_data;
+
+ BUILD_BUG_ON(sizeof(struct ocxl_process_element) != 128);
+ if (pasid > SPA_PASID_MAX)
+ return -EINVAL;
+
+ mutex_lock(&spa->spa_lock);
+ pe_handle = pasid & SPA_PE_MASK;
+ pe = spa->spa_mem + pe_handle;
+
+ if (pe->software_state) {
+ rc = -EBUSY;
+ goto unlock;
+ }
+
+ pe_data = kmalloc(sizeof(*pe_data), GFP_KERNEL);
+ if (!pe_data) {
+ rc = -ENOMEM;
+ goto unlock;
+ }
+
+ pe_data->mm = mm;
+ pe_data->xsl_err_cb = xsl_err_cb;
+ pe_data->xsl_err_data = xsl_err_data;
+
+ memset(pe, 0, sizeof(struct ocxl_process_element));
+ pe->config_state = cpu_to_be64(calculate_cfg_state(pidr == 0));
+ pe->lpid = cpu_to_be32(mfspr(SPRN_LPID));
+ pe->pid = cpu_to_be32(pidr);
+ pe->tid = cpu_to_be32(tidr);
+ pe->amr = cpu_to_be64(amr);
+ pe->software_state = cpu_to_be32(SPA_PE_VALID);
+
+ mm_context_add_copro(mm);
+ /*
+ * Barrier is to make sure PE is visible in the SPA before it
+ * is used by the device. It also helps with the global TLBI
+ * invalidation
+ */
+ mb();
+ radix_tree_insert(&spa->pe_tree, pe_handle, pe_data);
+
+ /*
+ * The mm must stay valid for as long as the device uses it. We
+ * lower the count when the context is removed from the SPA.
+ *
+ * We grab mm_count (and not mm_users), as we don't want to
+ * end up in a circular dependency if a process mmaps its
+ * mmio, therefore incrementing the file ref count when
+ * calling mmap(), and forgets to unmap before exiting. In
+ * that scenario, when the kernel handles the death of the
+ * process, the file is not cleaned because unmap was not
+ * called, and the mm wouldn't be freed because we would still
+ * have a reference on mm_users. Incrementing mm_count solves
+ * the problem.
+ */
+ mmgrab(mm);
+unlock:
+ mutex_unlock(&spa->spa_lock);
+ return rc;
+}
+
+int ocxl_link_remove_pe(void *link_handle, int pasid)
+{
+ struct link *link = (struct link *) link_handle;
+ struct spa *spa = link->spa;
+ struct ocxl_process_element *pe;
+ struct pe_data *pe_data;
+ int pe_handle, rc;
+
+ if (pasid > SPA_PASID_MAX)
+ return -EINVAL;
+
+ /*
+ * About synchronization with our memory fault handler:
+ *
+ * Before removing the PE, the driver is supposed to have
+ * notified the AFU, which should have cleaned up and made
+ * sure the PASID is no longer in use, including pending
+ * interrupts. However, there's no way to be sure...
+ *
+ * We clear the PE and remove the context from our radix
+ * tree. From that point on, any new interrupt for that
+ * context will fail silently, which is ok. As mentioned
+ * above, that's not expected, but it could happen if the
+ * driver or AFU didn't do the right thing.
+ *
+ * There could still be a bottom half running, but we don't
+ * need to wait/flush, as it is managing a reference count on
+ * the mm it reads from the radix tree.
+ */
+ pe_handle = pasid & SPA_PE_MASK;
+ pe = spa->spa_mem + pe_handle;
+
+ mutex_lock(&spa->spa_lock);
+
+ if (!(pe->software_state & cpu_to_be32(SPA_PE_VALID))) {
+ rc = -EINVAL;
+ goto unlock;
+ }
+
+ memset(pe, 0, sizeof(struct ocxl_process_element));
+ /*
+ * The barrier makes sure the PE is removed from the SPA
+ * before we clear the NPU context cache below, so that the
+ * old PE cannot be reloaded erroneously.
+ */
+ mb();
+
+ /*
+ * hook to platform code
+ * On powerpc, the entry needs to be cleared from the context
+ * cache of the NPU.
+ */
+ rc = pnv_ocxl_spa_remove_pe(link->platform_data, pe_handle);
+ WARN_ON(rc);
+
+ pe_data = radix_tree_delete(&spa->pe_tree, pe_handle);
+ if (!pe_data) {
+ WARN(1, "Couldn't find pe data when removing PE\n");
+ } else {
+ mm_context_remove_copro(pe_data->mm);
+ mmdrop(pe_data->mm);
+ kfree_rcu(pe_data, rcu);
+ }
+unlock:
+ mutex_unlock(&spa->spa_lock);
+ return rc;
+}
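link.c uses IBM (MSB-0) bit numbering in several places: the SPA_CFG_* and SPA_XSL_* masks and the PPC_BIT() values written to the TFC register. In this convention, bit 0 is the most significant bit of a 64-bit word. The sketch below reproduces that convention and the decision logic of calculate_cfg_state() in userspace. IBM_BIT() and struct cfg_inputs are hypothetical. IBM_BIT() stands in for the kernel's PPC_BIT(), and the struct's flags replace the LPCR, MSR and TIF_32BIT checks that the real function performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IBM (MSB-0) numbering: bit 0 is the MSB of a 64-bit word. */
#define IBM_BIT(n) (1ull << (63 - (n)))

#define CFG_SF       IBM_BIT(0)
#define CFG_HV       IBM_BIT(3)
#define CFG_XLAT_HPT (0ull << (63 - 6))
#define CFG_XLAT_ROR (3ull << (63 - 6))
#define CFG_PR       IBM_BIT(49)
#define CFG_TC       IBM_BIT(54)
#define CFG_DR       IBM_BIT(59)

/* Hypothetical inputs replacing the SPR/MSR/thread-flag reads. */
struct cfg_inputs {
	bool lpcr_tc;    /* LPCR[TC] set */
	bool radix;      /* radix MMU in use */
	bool kernel;     /* kernel context (pidr == 0) */
	bool msr_sf;     /* 64-bit mode for kernel contexts */
	bool task_32bit; /* user task has TIF_32BIT */
};

/* Same branch structure as calculate_cfg_state(). */
static uint64_t model_cfg_state(const struct cfg_inputs *in)
{
	uint64_t state = CFG_DR | CFG_HV;

	if (in->lpcr_tc)
		state |= CFG_TC;
	state |= in->radix ? CFG_XLAT_ROR : CFG_XLAT_HPT;
	if (in->kernel) {
		if (in->msr_sf)
			state |= CFG_SF;
	} else {
		state |= CFG_PR;
		if (!in->task_32bit)
			state |= CFG_SF;
	}
	return state;
}
```

For example, the RESTART response written by ack_irq() (PPC_BIT(31)) corresponds to IBM_BIT(31), which is 0x100000000 in conventional LSB-0 terms.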
diff --git a/drivers/misc/ocxl/main.c b/drivers/misc/ocxl/main.c
new file mode 100644
index 000000000000..be34b8fae97a
--- /dev/null
+++ b/drivers/misc/ocxl/main.c
@@ -0,0 +1,40 @@
+/*
+ * Copyright 2017 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/pci.h>
+#include "ocxl_internal.h"
+
+static int __init init_ocxl(void)
+{
+ int rc = 0;
+
+ rc = ocxl_file_init();
+ if (rc)
+ return rc;
+
+ rc = pci_register_driver(&ocxl_pci_driver);
+ if (rc) {
+ ocxl_file_exit();
+ return rc;
+ }
+ return 0;
+}
+
+static void __exit exit_ocxl(void)
+{
+ pci_unregister_driver(&ocxl_pci_driver);
+ ocxl_file_exit();
+}
+
+module_init(init_ocxl);
+module_exit(exit_ocxl);
+
+MODULE_DESCRIPTION("Open Coherent Accelerator");
+MODULE_LICENSE("GPL");
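init_ocxl() sets up the character-device region and the class before registering the PCI driver. exit_ocxl() then tears them down in reverse order. This ordering presumably matters because the probe path registers AFU devices, and ocxl_register_afu() needs ocxl_dev and ocxl_class to exist. The sketch below models the same unwind pattern in userspace. All of its functions are hypothetical stand-ins that only record a trace of the calls made.

```c
#include <assert.h>
#include <string.h>

/* Records the order of setup ("+") and teardown ("-") steps. */
static char trace[64];

static void log_step(const char *s) { strcat(trace, s); }

/* Hypothetical stand-ins for ocxl_file_init()/ocxl_file_exit() and
 * pci_register_driver()/pci_unregister_driver(). */
static int file_init(void) { log_step("F+"); return 0; }
static void file_exit(void) { log_step("F-"); }
static int pci_register(int fail) { log_step("P+"); return fail ? -1 : 0; }
static void pci_unregister(void) { log_step("P-"); }

/* Same shape as init_ocxl(): undo earlier steps if a later one fails. */
static int model_init(int pci_fails)
{
	int rc = file_init();

	if (rc)
		return rc;
	rc = pci_register(pci_fails);
	if (rc) {
		file_exit();
		return rc;
	}
	return 0;
}

/* Same shape as exit_ocxl(): tear down in reverse order of setup. */
static void model_exit(void)
{
	pci_unregister();
	file_exit();
}
```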
diff --git a/drivers/misc/ocxl/ocxl_internal.h b/drivers/misc/ocxl/ocxl_internal.h
new file mode 100644
index 000000000000..e07f7d523275
--- /dev/null
+++ b/drivers/misc/ocxl/ocxl_internal.h
@@ -0,0 +1,200 @@
+/*
+ * Copyright 2017 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _OCXL_INTERNAL_H_
+#define _OCXL_INTERNAL_H_
+
+#include <linux/pci.h>
+#include <linux/cdev.h>
+#include <linux/list.h>
+
+#define OCXL_AFU_NAME_SZ (24+1) /* add 1 for the NUL terminator */
+#define MAX_IRQ_PER_LINK 2000
+#define MAX_IRQ_PER_CONTEXT MAX_IRQ_PER_LINK
+
+#define to_ocxl_function(d) container_of(d, struct ocxl_fn, dev)
+#define to_ocxl_afu(d) container_of(d, struct ocxl_afu, dev)
+
+extern struct pci_driver ocxl_pci_driver;
+
+/*
+ * The following 2 structures are a fairly generic way of representing
+ * the configuration data for a function and AFU, as read from the
+ * configuration space.
+ */
+struct ocxl_afu_config {
+ u8 idx;
+ int dvsec_afu_control_pos;
+ char name[OCXL_AFU_NAME_SZ];
+ u8 version_major;
+ u8 version_minor;
+ u8 afuc_type;
+ u8 afum_type;
+ u8 profile;
+ u8 global_mmio_bar;
+ u64 global_mmio_offset;
+ u32 global_mmio_size;
+ u8 pp_mmio_bar;
+ u64 pp_mmio_offset;
+ u32 pp_mmio_stride;
+ u8 log_mem_size;
+ u8 pasid_supported_log;
+ u16 actag_supported;
+};
+
+struct ocxl_fn_config {
+ int dvsec_tl_pos;
+ int dvsec_function_pos;
+ int dvsec_afu_info_pos;
+ s8 max_pasid_log;
+ s8 max_afu_index;
+};
+
+struct ocxl_fn {
+ struct device dev;
+ int bar_used[3];
+ struct ocxl_fn_config config;
+ struct list_head afu_list;
+ int pasid_base;
+ int actag_base;
+ int actag_enabled;
+ int actag_supported;
+ struct list_head pasid_list;
+ struct list_head actag_list;
+ void *link;
+};
+
+struct ocxl_afu {
+ struct ocxl_fn *fn;
+ struct list_head list;
+ struct device dev;
+ struct cdev cdev;
+ struct ocxl_afu_config config;
+ int pasid_base;
+ int pasid_count; /* opened contexts */
+ int pasid_max; /* maximum number of contexts */
+ int actag_base;
+ int actag_enabled;
+ struct mutex contexts_lock;
+ struct idr contexts_idr;
+ struct mutex afu_control_lock;
+ u64 global_mmio_start;
+ u64 irq_base_offset;
+ void __iomem *global_mmio_ptr;
+ u64 pp_mmio_start;
+ struct bin_attribute attr_global_mmio;
+};
+
+enum ocxl_context_status {
+ CLOSED,
+ OPENED,
+ ATTACHED,
+};
+
+// Contains metadata about a translation fault
+struct ocxl_xsl_error {
+ u64 addr; // The address that triggered the fault
+ u64 dsisr; // The value of the DSISR register
+ u64 count; // The number of times this fault has been triggered
+};
+
+struct ocxl_context {
+ struct ocxl_afu *afu;
+ int pasid;
+ struct mutex status_mutex;
+ enum ocxl_context_status status;
+ struct address_space *mapping;
+ struct mutex mapping_lock;
+ wait_queue_head_t events_wq;
+ struct mutex xsl_error_lock;
+ struct ocxl_xsl_error xsl_error;
+ struct mutex irq_lock;
+ struct idr irq_idr;
+};
+
+struct ocxl_process_element {
+ u64 config_state;
+ u32 reserved1[11];
+ u32 lpid;
+ u32 tid;
+ u32 pid;
+ u32 reserved2[10];
+ u64 amr;
+ u32 reserved3[3];
+ u32 software_state;
+};
+
+
+extern struct ocxl_afu *ocxl_afu_get(struct ocxl_afu *afu);
+extern void ocxl_afu_put(struct ocxl_afu *afu);
+
+extern int ocxl_create_cdev(struct ocxl_afu *afu);
+extern void ocxl_destroy_cdev(struct ocxl_afu *afu);
+extern int ocxl_register_afu(struct ocxl_afu *afu);
+extern void ocxl_unregister_afu(struct ocxl_afu *afu);
+
+extern int ocxl_file_init(void);
+extern void ocxl_file_exit(void);
+
+extern int ocxl_config_read_function(struct pci_dev *dev,
+ struct ocxl_fn_config *fn);
+
+extern int ocxl_config_check_afu_index(struct pci_dev *dev,
+ struct ocxl_fn_config *fn, int afu_idx);
+extern int ocxl_config_read_afu(struct pci_dev *dev,
+ struct ocxl_fn_config *fn,
+ struct ocxl_afu_config *afu,
+ u8 afu_idx);
+extern int ocxl_config_get_pasid_info(struct pci_dev *dev, int *count);
+extern void ocxl_config_set_afu_pasid(struct pci_dev *dev,
+ int afu_control,
+ int pasid_base, u32 pasid_count_log);
+extern int ocxl_config_get_actag_info(struct pci_dev *dev,
+ u16 *base, u16 *enabled, u16 *supported);
+extern void ocxl_config_set_actag(struct pci_dev *dev, int func_dvsec,
+ u32 tag_first, u32 tag_count);
+extern void ocxl_config_set_afu_actag(struct pci_dev *dev, int afu_control,
+ int actag_base, int actag_count);
+extern void ocxl_config_set_afu_state(struct pci_dev *dev, int afu_control,
+ int enable);
+extern int ocxl_config_set_TL(struct pci_dev *dev, int tl_dvsec);
+extern int ocxl_config_terminate_pasid(struct pci_dev *dev, int afu_control,
+ int pasid);
+
+extern int ocxl_link_setup(struct pci_dev *dev, int PE_mask,
+ void **link_handle);
+extern void ocxl_link_release(struct pci_dev *dev, void *link_handle);
+extern int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
+ u64 amr, struct mm_struct *mm,
+ void (*xsl_err_cb)(void *data, u64 addr, u64 dsisr),
+ void *xsl_err_data);
+extern int ocxl_link_remove_pe(void *link_handle, int pasid);
+extern int ocxl_link_irq_alloc(void *link_handle, int *hw_irq,
+ u64 *addr);
+extern void ocxl_link_free_irq(void *link_handle, int hw_irq);
+
+extern int ocxl_pasid_afu_alloc(struct ocxl_fn *fn, u32 size);
+extern void ocxl_pasid_afu_free(struct ocxl_fn *fn, u32 start, u32 size);
+extern int ocxl_actag_afu_alloc(struct ocxl_fn *fn, u32 size);
+extern void ocxl_actag_afu_free(struct ocxl_fn *fn, u32 start, u32 size);
+
+extern struct ocxl_context *ocxl_context_alloc(void);
+extern int ocxl_context_init(struct ocxl_context *ctx, struct ocxl_afu *afu,
+ struct address_space *mapping);
+extern int ocxl_context_attach(struct ocxl_context *ctx, u64 amr);
+extern int ocxl_context_mmap(struct ocxl_context *ctx,
+ struct vm_area_struct *vma);
+extern int ocxl_context_detach(struct ocxl_context *ctx);
+extern void ocxl_context_detach_all(struct ocxl_afu *afu);
+extern void ocxl_context_free(struct ocxl_context *ctx);
+
+extern int ocxl_sysfs_add_afu(struct ocxl_afu *afu);
+extern void ocxl_sysfs_remove_afu(struct ocxl_afu *afu);
+
+#endif /* _OCXL_INTERNAL_H_ */
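struct ocxl_process_element must be exactly 128 bytes, as the BUILD_BUG_ON() in ocxl_link_add_pe() enforces. A 2^22-byte (4 MB) SPA therefore holds 2^15 entries, which matches SPA_PASID_BITS. The sketch below checks this arithmetic with a userspace mirror of the struct. The mirror and its helpers are illustrative, not part of the driver, and the real fields are big-endian in memory, which does not affect the layout. The page shifts in the usage checks are examples: 16 corresponds to 64 KB pages, as commonly used on ppc64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of struct ocxl_process_element. */
struct pe_mirror {
	uint64_t config_state;
	uint32_t reserved1[11];
	uint32_t lpid;
	uint32_t tid;
	uint32_t pid;
	uint32_t reserved2[10];
	uint64_t amr;
	uint32_t reserved3[3];
	uint32_t software_state;
};

/* Same check as the BUILD_BUG_ON() in ocxl_link_add_pe(). */
_Static_assert(sizeof(struct pe_mirror) == 128, "PE must be 128 bytes");

#define SPA_SIZE_LOG 22 /* as SPA_SPA_SIZE_LOG: 4 MB */
#define PASID_BITS   15 /* as SPA_PASID_BITS */

/* Number of process elements that fit in one SPA. */
static unsigned long spa_entries(void)
{
	return (1ul << SPA_SIZE_LOG) / sizeof(struct pe_mirror);
}

/* Page order that alloc_spa() passes to __get_free_pages(). */
static int spa_order(int page_shift)
{
	return SPA_SIZE_LOG - page_shift;
}
```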
diff --git a/drivers/misc/ocxl/pasid.c b/drivers/misc/ocxl/pasid.c
new file mode 100644
index 000000000000..ea999a3a99b4
--- /dev/null
+++ b/drivers/misc/ocxl/pasid.c
@@ -0,0 +1,114 @@
+/*
+ * Copyright 2017 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include "ocxl_internal.h"
+
+
+struct id_range {
+ struct list_head list;
+ u32 start;
+ u32 end;
+};
+
+#ifdef DEBUG
+static void dump_list(struct list_head *head, char *type_str)
+{
+ struct id_range *cur;
+
+ pr_debug("%s ranges allocated:\n", type_str);
+ list_for_each_entry(cur, head, list) {
+ pr_debug("Range %u->%u\n", cur->start, cur->end);
+ }
+}
+#endif
+
+static int range_alloc(struct list_head *head, u32 size, int max_id,
+ char *type_str)
+{
+ struct list_head *pos;
+ struct id_range *cur, *new;
+ int rc, last_end;
+
+ new = kmalloc(sizeof(struct id_range), GFP_KERNEL);
+ if (!new)
+ return -ENOMEM;
+
+ pos = head;
+ last_end = -1;
+ list_for_each_entry(cur, head, list) {
+ if ((cur->start - last_end) > size)
+ break;
+ last_end = cur->end;
+ pos = &cur->list;
+ }
+
+ new->start = last_end + 1;
+ new->end = new->start + size - 1;
+
+ if (new->end >= max_id) {
+ kfree(new);
+ rc = -ENOSPC;
+ } else {
+ list_add(&new->list, pos);
+ rc = new->start;
+ }
+
+#ifdef DEBUG
+ dump_list(head, type_str);
+#endif
+ return rc;
+}
+
+static void range_free(struct list_head *head, u32 start, u32 size,
+ char *type_str)
+{
+ bool found = false;
+ struct id_range *cur, *tmp;
+
+ list_for_each_entry_safe(cur, tmp, head, list) {
+ if (cur->start == start && cur->end == (start + size - 1)) {
+ found = true;
+ list_del(&cur->list);
+ kfree(cur);
+ break;
+ }
+ }
+ WARN_ON(!found);
+#ifdef DEBUG
+ dump_list(head, type_str);
+#endif
+}
+
+int ocxl_pasid_afu_alloc(struct ocxl_fn *fn, u32 size)
+{
+ int max_pasid;
+
+ if (fn->config.max_pasid_log < 0)
+ return -ENOSPC;
+ max_pasid = 1 << fn->config.max_pasid_log;
+ return range_alloc(&fn->pasid_list, size, max_pasid, "afu pasid");
+}
+
+void ocxl_pasid_afu_free(struct ocxl_fn *fn, u32 start, u32 size)
+{
+ range_free(&fn->pasid_list, start, size, "afu pasid");
+}
+
+int ocxl_actag_afu_alloc(struct ocxl_fn *fn, u32 size)
+{
+ int max_actag;
+
+ max_actag = fn->actag_enabled;
+ return range_alloc(&fn->actag_list, size, max_actag, "afu actag");
+}
+
+void ocxl_actag_afu_free(struct ocxl_fn *fn, u32 start, u32 size)
+{
+ range_free(&fn->actag_list, start, size, "afu actag");
+}
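range_alloc() is a first-fit allocator over a sorted list of allocated [start, end] ranges. It places a new range in the first gap that is large enough, or after the last range. The sketch below models it in userspace with a hypothetical singly linked struct range instead of list_head.

The sketch differs from the driver in two ways. It treats max_id as an exclusive bound (the number of available IDs), matching how the callers compute it: 1 << max_pasid_log and actag_enabled. It also rejects size 0, a case the kernel version does not check.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical list node: one allocated [start, end] range. */
struct range {
	struct range *next;
	unsigned int start;
	unsigned int end;
};

/* First-fit allocation of 'size' consecutive IDs below max_id. */
static int model_range_alloc(struct range **head, unsigned int size,
			     unsigned int max_id)
{
	struct range **pos = head;
	struct range *new;
	long last_end = -1;

	/* Skip ranges until the gap before the current one fits 'size'. */
	while (*pos && (long)(*pos)->start - last_end - 1 < (long)size) {
		last_end = (*pos)->end;
		pos = &(*pos)->next;
	}
	if (size == 0 || last_end + (long)size >= (long)max_id)
		return -1; /* stands in for -ENOSPC */

	new = malloc(sizeof(*new));
	if (!new)
		return -2; /* stands in for -ENOMEM */
	new->start = (unsigned int)(last_end + 1);
	new->end = new->start + size - 1;
	new->next = *pos; /* keep the list sorted by start */
	*pos = new;
	return (int)new->start;
}

/* Frees an exact [start, start + size - 1] range; -1 if not found. */
static int model_range_free(struct range **head, unsigned int start,
			    unsigned int size)
{
	struct range **pos;

	for (pos = head; *pos; pos = &(*pos)->next) {
		struct range *cur = *pos;

		if (cur->start == start && cur->end == start + size - 1) {
			*pos = cur->next;
			free(cur);
			return 0;
		}
	}
	return -1; /* the kernel version WARN()s instead */
}
```

Freeing a range in the middle leaves a gap. Later allocations that fit in that gap reuse it before any space past the last range.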
diff --git a/drivers/misc/ocxl/pci.c b/drivers/misc/ocxl/pci.c
new file mode 100644
index 000000000000..39e7bdd48215
--- /dev/null
+++ b/drivers/misc/ocxl/pci.c
@@ -0,0 +1,592 @@
+/*
+ * Copyright 2017 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/idr.h>
+#include <asm/pnv-ocxl.h>
+#include "ocxl_internal.h"
+
+/*
+ * Any opencapi device which wants to use this 'generic' driver should
+ * use the 0x062B device ID. Vendors should define the subsystem
+ * vendor/device ID to help differentiate devices.
+ */
+static const struct pci_device_id ocxl_pci_tbl[] = {
+ { PCI_DEVICE(PCI_VENDOR_ID_IBM, 0x062B), },
+ { }
+};
+MODULE_DEVICE_TABLE(pci, ocxl_pci_tbl);
+
+
+static struct ocxl_fn *ocxl_fn_get(struct ocxl_fn *fn)
+{
+ return (get_device(&fn->dev) == NULL) ? NULL : fn;
+}
+
+static void ocxl_fn_put(struct ocxl_fn *fn)
+{
+ put_device(&fn->dev);
+}
+
+struct ocxl_afu *ocxl_afu_get(struct ocxl_afu *afu)
+{
+ return (get_device(&afu->dev) == NULL) ? NULL : afu;
+}
+
+void ocxl_afu_put(struct ocxl_afu *afu)
+{
+ put_device(&afu->dev);
+}
+
+static struct ocxl_afu *alloc_afu(struct ocxl_fn *fn)
+{
+ struct ocxl_afu *afu;
+
+ afu = kzalloc(sizeof(struct ocxl_afu), GFP_KERNEL);
+ if (!afu)
+ return NULL;
+
+ mutex_init(&afu->contexts_lock);
+ mutex_init(&afu->afu_control_lock);
+ idr_init(&afu->contexts_idr);
+ afu->fn = fn;
+ ocxl_fn_get(fn);
+ return afu;
+}
+
+static void free_afu(struct ocxl_afu *afu)
+{
+ idr_destroy(&afu->contexts_idr);
+ ocxl_fn_put(afu->fn);
+ kfree(afu);
+}
+
+static void free_afu_dev(struct device *dev)
+{
+ struct ocxl_afu *afu = to_ocxl_afu(dev);
+
+ ocxl_unregister_afu(afu);
+ free_afu(afu);
+}
+
+static int set_afu_device(struct ocxl_afu *afu, const char *location)
+{
+ struct ocxl_fn *fn = afu->fn;
+ int rc;
+
+ afu->dev.parent = &fn->dev;
+ afu->dev.release = free_afu_dev;
+ rc = dev_set_name(&afu->dev, "%s.%s.%hhu", afu->config.name, location,
+ afu->config.idx);
+ return rc;
+}
+
+static int assign_afu_actag(struct ocxl_afu *afu, struct pci_dev *dev)
+{
+ struct ocxl_fn *fn = afu->fn;
+ int actag_count, actag_offset;
+
+ /*
+ * if there were not enough actags for the function, each afu
+ * reduces its count as well
+ */
+ actag_count = afu->config.actag_supported *
+ fn->actag_enabled / fn->actag_supported;
+ actag_offset = ocxl_actag_afu_alloc(fn, actag_count);
+ if (actag_offset < 0) {
+ dev_err(&afu->dev, "Can't allocate %d actags for AFU: %d\n",
+ actag_count, actag_offset);
+ return actag_offset;
+ }
+ afu->actag_base = fn->actag_base + actag_offset;
+ afu->actag_enabled = actag_count;
+
+ ocxl_config_set_afu_actag(dev, afu->config.dvsec_afu_control_pos,
+ afu->actag_base, afu->actag_enabled);
+ dev_dbg(&afu->dev, "actag base=%d enabled=%d\n",
+ afu->actag_base, afu->actag_enabled);
+ return 0;
+}
+
+static void reclaim_afu_actag(struct ocxl_afu *afu)
+{
+ struct ocxl_fn *fn = afu->fn;
+ int start_offset, size;
+
+ start_offset = afu->actag_base - fn->actag_base;
+ size = afu->actag_enabled;
+ ocxl_actag_afu_free(afu->fn, start_offset, size);
+}
+
+static int assign_afu_pasid(struct ocxl_afu *afu, struct pci_dev *dev)
+{
+ struct ocxl_fn *fn = afu->fn;
+ int pasid_count, pasid_offset;
+
+ /*
+ * We only support the case where the function configuration
+ * requested enough PASIDs to cover all AFUs.
+ */
+ pasid_count = 1 << afu->config.pasid_supported_log;
+ pasid_offset = ocxl_pasid_afu_alloc(fn, pasid_count);
+ if (pasid_offset < 0) {
+ dev_err(&afu->dev, "Can't allocate %d PASIDs for AFU: %d\n",
+ pasid_count, pasid_offset);
+ return pasid_offset;
+ }
+ afu->pasid_base = fn->pasid_base + pasid_offset;
+ afu->pasid_count = 0;
+ afu->pasid_max = pasid_count;
+
+ ocxl_config_set_afu_pasid(dev, afu->config.dvsec_afu_control_pos,
+ afu->pasid_base,
+ afu->config.pasid_supported_log);
+ dev_dbg(&afu->dev, "PASID base=%d, enabled=%d\n",
+ afu->pasid_base, pasid_count);
+ return 0;
+}
+
+static void reclaim_afu_pasid(struct ocxl_afu *afu)
+{
+ struct ocxl_fn *fn = afu->fn;
+ int start_offset, size;
+
+ start_offset = afu->pasid_base - fn->pasid_base;
+ size = 1 << afu->config.pasid_supported_log;
+ ocxl_pasid_afu_free(afu->fn, start_offset, size);
+}
+
+static int reserve_fn_bar(struct ocxl_fn *fn, int bar)
+{
+ struct pci_dev *dev = to_pci_dev(fn->dev.parent);
+ int rc, idx;
+
+ if (bar != 0 && bar != 2 && bar != 4)
+ return -EINVAL;
+
+ idx = bar >> 1;
+ if (fn->bar_used[idx]++ == 0) {
+ rc = pci_request_region(dev, bar, "ocxl");
+ if (rc)
+ return rc;
+ }
+ return 0;
+}
+
+static void release_fn_bar(struct ocxl_fn *fn, int bar)
+{
+ struct pci_dev *dev = to_pci_dev(fn->dev.parent);
+ int idx;
+
+ if (bar != 0 && bar != 2 && bar != 4)
+ return;
+
+ idx = bar >> 1;
+ if (--fn->bar_used[idx] == 0)
+ pci_release_region(dev, bar);
+ WARN_ON(fn->bar_used[idx] < 0);
+}
+
+static int map_mmio_areas(struct ocxl_afu *afu, struct pci_dev *dev)
+{
+ int rc;
+
+ rc = reserve_fn_bar(afu->fn, afu->config.global_mmio_bar);
+ if (rc)
+ return rc;
+
+ rc = reserve_fn_bar(afu->fn, afu->config.pp_mmio_bar);
+ if (rc) {
+ release_fn_bar(afu->fn, afu->config.global_mmio_bar);
+ return rc;
+ }
+
+ afu->global_mmio_start =
+ pci_resource_start(dev, afu->config.global_mmio_bar) +
+ afu->config.global_mmio_offset;
+ afu->pp_mmio_start =
+ pci_resource_start(dev, afu->config.pp_mmio_bar) +
+ afu->config.pp_mmio_offset;
+
+ afu->global_mmio_ptr = ioremap(afu->global_mmio_start,
+ afu->config.global_mmio_size);
+ if (!afu->global_mmio_ptr) {
+ release_fn_bar(afu->fn, afu->config.pp_mmio_bar);
+ release_fn_bar(afu->fn, afu->config.global_mmio_bar);
+ dev_err(&dev->dev, "Error mapping global mmio area\n");
+ return -ENOMEM;
+ }
+
+ /*
+ * Leave an empty page between the per-process mmio area and
+ * the AFU interrupt mappings
+ */
+ afu->irq_base_offset = afu->config.pp_mmio_stride + PAGE_SIZE;
+ return 0;
+}
+
+static void unmap_mmio_areas(struct ocxl_afu *afu)
+{
+ if (afu->global_mmio_ptr) {
+ iounmap(afu->global_mmio_ptr);
+ afu->global_mmio_ptr = NULL;
+ }
+ afu->global_mmio_start = 0;
+ afu->pp_mmio_start = 0;
+ release_fn_bar(afu->fn, afu->config.pp_mmio_bar);
+ release_fn_bar(afu->fn, afu->config.global_mmio_bar);
+}
+
+static int configure_afu(struct ocxl_afu *afu, u8 afu_idx, struct pci_dev *dev)
+{
+ int rc;
+
+ rc = ocxl_config_read_afu(dev, &afu->fn->config, &afu->config, afu_idx);
+ if (rc)
+ return rc;
+
+ rc = set_afu_device(afu, dev_name(&dev->dev));
+ if (rc)
+ return rc;
+
+ rc = assign_afu_actag(afu, dev);
+ if (rc)
+ return rc;
+
+ rc = assign_afu_pasid(afu, dev);
+ if (rc) {
+ reclaim_afu_actag(afu);
+ return rc;
+ }
+
+ rc = map_mmio_areas(afu, dev);
+ if (rc) {
+ reclaim_afu_pasid(afu);
+ reclaim_afu_actag(afu);
+ return rc;
+ }
+ return 0;
+}
+
+static void deconfigure_afu(struct ocxl_afu *afu)
+{
+ unmap_mmio_areas(afu);
+ reclaim_afu_pasid(afu);
+ reclaim_afu_actag(afu);
+}
+
+static int activate_afu(struct pci_dev *dev, struct ocxl_afu *afu)
+{
+ int rc;
+
+ ocxl_config_set_afu_state(dev, afu->config.dvsec_afu_control_pos, 1);
+ /*
+ * Char device creation is the last step, as processes can
+ * call our driver immediately, so all our inits must be finished.
+ */
+ rc = ocxl_create_cdev(afu);
+ if (rc)
+ return rc;
+ return 0;
+}
+
+static void deactivate_afu(struct ocxl_afu *afu)
+{
+ struct pci_dev *dev = to_pci_dev(afu->fn->dev.parent);
+
+ ocxl_destroy_cdev(afu);
+ ocxl_config_set_afu_state(dev, afu->config.dvsec_afu_control_pos, 0);
+}
+
+static int init_afu(struct pci_dev *dev, struct ocxl_fn *fn, u8 afu_idx)
+{
+ int rc;
+ struct ocxl_afu *afu;
+
+ afu = alloc_afu(fn);
+ if (!afu)
+ return -ENOMEM;
+
+ rc = configure_afu(afu, afu_idx, dev);
+ if (rc) {
+ free_afu(afu);
+ return rc;
+ }
+
+ rc = ocxl_register_afu(afu);
+ if (rc)
+ goto err;
+
+ rc = ocxl_sysfs_add_afu(afu);
+ if (rc)
+ goto err;
+
+ rc = activate_afu(dev, afu);
+ if (rc)
+ goto err_sys;
+
+ list_add_tail(&afu->list, &fn->afu_list);
+ return 0;
+
+err_sys:
+ ocxl_sysfs_remove_afu(afu);
+err:
+ deconfigure_afu(afu);
+ device_unregister(&afu->dev);
+ return rc;
+}
+
+static void remove_afu(struct ocxl_afu *afu)
+{
+ list_del(&afu->list);
+ ocxl_context_detach_all(afu);
+ deactivate_afu(afu);
+ ocxl_sysfs_remove_afu(afu);
+ deconfigure_afu(afu);
+ device_unregister(&afu->dev);
+}
+
+static struct ocxl_fn *alloc_function(struct pci_dev *dev)
+{
+ struct ocxl_fn *fn;
+
+ fn = kzalloc(sizeof(struct ocxl_fn), GFP_KERNEL);
+ if (!fn)
+ return NULL;
+
+ INIT_LIST_HEAD(&fn->afu_list);
+ INIT_LIST_HEAD(&fn->pasid_list);
+ INIT_LIST_HEAD(&fn->actag_list);
+ return fn;
+}
+
+static void free_function(struct ocxl_fn *fn)
+{
+ WARN_ON(!list_empty(&fn->afu_list));
+ WARN_ON(!list_empty(&fn->pasid_list));
+ kfree(fn);
+}
+
+static void free_function_dev(struct device *dev)
+{
+ struct ocxl_fn *fn = to_ocxl_function(dev);
+
+ free_function(fn);
+}
+
+static int set_function_device(struct ocxl_fn *fn, struct pci_dev *dev)
+{
+ int rc;
+
+ fn->dev.parent = &dev->dev;
+ fn->dev.release = free_function_dev;
+ rc = dev_set_name(&fn->dev, "ocxlfn.%s", dev_name(&dev->dev));
+ if (rc)
+ return rc;
+ pci_set_drvdata(dev, fn);
+ return 0;
+}
+
+static int assign_function_actag(struct ocxl_fn *fn)
+{
+ struct pci_dev *dev = to_pci_dev(fn->dev.parent);
+ u16 base, enabled, supported;
+ int rc;
+
+ rc = ocxl_config_get_actag_info(dev, &base, &enabled, &supported);
+ if (rc)
+ return rc;
+
+ fn->actag_base = base;
+ fn->actag_enabled = enabled;
+ fn->actag_supported = supported;
+
+ ocxl_config_set_actag(dev, fn->config.dvsec_function_pos,
+ fn->actag_base, fn->actag_enabled);
+ dev_dbg(&fn->dev, "actag range starting at %d, enabled %d\n",
+ fn->actag_base, fn->actag_enabled);
+ return 0;
+}
+
+static int set_function_pasid(struct ocxl_fn *fn)
+{
+ struct pci_dev *dev = to_pci_dev(fn->dev.parent);
+ int rc, desired_count, max_count;
+
+ /* A function may not require any PASID */
+ if (fn->config.max_pasid_log < 0)
+ return 0;
+
+ rc = ocxl_config_get_pasid_info(dev, &max_count);
+ if (rc)
+ return rc;
+
+ desired_count = 1 << fn->config.max_pasid_log;
+
+ if (desired_count > max_count) {
+ dev_err(&fn->dev,
+ "Function requires more PASIDs than are available (%d vs. %d)\n",
+ desired_count, max_count);
+ return -ENOSPC;
+ }
+
+ fn->pasid_base = 0;
+ return 0;
+}
+
+static int configure_function(struct ocxl_fn *fn, struct pci_dev *dev)
+{
+ int rc;
+
+ rc = pci_enable_device(dev);
+ if (rc) {
+ dev_err(&dev->dev, "pci_enable_device failed: %d\n", rc);
+ return rc;
+ }
+
+ /*
+ * Once it has been confirmed to work on our hardware, we
+ * should reset the function, to force the adapter to restart
+ * from scratch.
+ * A function reset would also reset all its AFUs.
+ *
+ * Some hints for implementation:
+ *
+ * - there's no status bit to know when the reset is done. We
+ * should try reading the config space to know when it's
+ * done.
+ * - probably something like:
+ * Reset
+ * wait 100ms
+ * issue config read
+ * allow device up to 1 sec to return success on config
+ * read before declaring it broken
+ *
+ * Some shared logic on the card (CFG, TLX) won't be reset, so
+ * there's no guarantee that it will be enough.
+ */
+ rc = ocxl_config_read_function(dev, &fn->config);
+ if (rc)
+ return rc;
+
+ rc = set_function_device(fn, dev);
+ if (rc)
+ return rc;
+
+ rc = assign_function_actag(fn);
+ if (rc)
+ return rc;
+
+ rc = set_function_pasid(fn);
+ if (rc)
+ return rc;
+
+ rc = ocxl_link_setup(dev, 0, &fn->link);
+ if (rc)
+ return rc;
+
+ rc = ocxl_config_set_TL(dev, fn->config.dvsec_tl_pos);
+ if (rc) {
+ ocxl_link_release(dev, fn->link);
+ return rc;
+ }
+ return 0;
+}
+
+static void deconfigure_function(struct ocxl_fn *fn)
+{
+ struct pci_dev *dev = to_pci_dev(fn->dev.parent);
+
+ ocxl_link_release(dev, fn->link);
+ pci_disable_device(dev);
+}
+
+static struct ocxl_fn *init_function(struct pci_dev *dev)
+{
+ struct ocxl_fn *fn;
+ int rc;
+
+ fn = alloc_function(dev);
+ if (!fn)
+ return ERR_PTR(-ENOMEM);
+
+ rc = configure_function(fn, dev);
+ if (rc) {
+ free_function(fn);
+ return ERR_PTR(rc);
+ }
+
+ rc = device_register(&fn->dev);
+ if (rc) {
+ deconfigure_function(fn);
+ put_device(&fn->dev);
+ return ERR_PTR(rc);
+ }
+ return fn;
+}
+
+static void remove_function(struct ocxl_fn *fn)
+{
+ deconfigure_function(fn);
+ device_unregister(&fn->dev);
+}
+
+static int ocxl_probe(struct pci_dev *dev, const struct pci_device_id *id)
+{
+ int rc, afu_count = 0;
+ u8 afu;
+ struct ocxl_fn *fn;
+
+ if (!radix_enabled()) {
+ dev_err(&dev->dev, "Unsupported memory model (hash)\n");
+ return -ENODEV;
+ }
+
+ fn = init_function(dev);
+ if (IS_ERR(fn)) {
+ dev_err(&dev->dev, "function init failed: %li\n",
+ PTR_ERR(fn));
+ return PTR_ERR(fn);
+ }
+
+ for (afu = 0; afu <= fn->config.max_afu_index; afu++) {
+ rc = ocxl_config_check_afu_index(dev, &fn->config, afu);
+ if (rc > 0) {
+ rc = init_afu(dev, fn, afu);
+ if (rc) {
+ dev_err(&dev->dev,
+ "Can't initialize AFU index %d\n", afu);
+ continue;
+ }
+ afu_count++;
+ }
+ }
+ dev_info(&dev->dev, "%d AFU(s) configured\n", afu_count);
+ return 0;
+}
+
+static void ocxl_remove(struct pci_dev *dev)
+{
+ struct ocxl_afu *afu, *tmp;
+ struct ocxl_fn *fn = pci_get_drvdata(dev);
+
+ list_for_each_entry_safe(afu, tmp, &fn->afu_list, list) {
+ remove_afu(afu);
+ }
+ remove_function(fn);
+}
+
+struct pci_driver ocxl_pci_driver = {
+ .name = "ocxl",
+ .id_table = ocxl_pci_tbl,
+ .probe = ocxl_probe,
+ .remove = ocxl_remove,
+ .shutdown = ocxl_remove,
+};
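assign_afu_actag() and assign_afu_pasid() size each AFU's share of its function's resources. The sketch below isolates that arithmetic in plain C. `struct fn_actags`, `afu_actag_count()` and `afu_pasid_count()` are hypothetical names, and the divide-by-zero guard is an addition of the sketch that the driver code does not have.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the actag fields of struct ocxl_fn. */
struct fn_actags {
	int actag_supported;	/* actags desired by all AFUs of the function */
	int actag_enabled;	/* actags the platform granted to the function */
};

/* Scale one AFU's request like assign_afu_actag(), rounding down. */
static int afu_actag_count(const struct fn_actags *fn, int afu_supported)
{
	if (fn->actag_supported <= 0)	/* guard added for this sketch */
		return -EINVAL;
	return afu_supported * fn->actag_enabled / fn->actag_supported;
}

/* Each AFU reserves 2^pasid_supported_log PASIDs, like assign_afu_pasid(). */
static int afu_pasid_count(int pasid_supported_log)
{
	assert(pasid_supported_log >= 0 && pasid_supported_log < 31);
	return 1 << pasid_supported_log;
}
```

Because each share is rounded down, the shares of all AFUs add up to at most what the function was granted: with 64 actags desired and 32 granted, an AFU asking for 32 gets 16.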
diff --git a/drivers/misc/ocxl/sysfs.c b/drivers/misc/ocxl/sysfs.c
new file mode 100644
index 000000000000..b7b1d1735c07
--- /dev/null
+++ b/drivers/misc/ocxl/sysfs.c
@@ -0,0 +1,150 @@
+/*
+ * Copyright 2017 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/sysfs.h>
+#include "ocxl_internal.h"
+
+static ssize_t global_mmio_size_show(struct device *device,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct ocxl_afu *afu = to_ocxl_afu(device);
+
+ return scnprintf(buf, PAGE_SIZE, "%d\n",
+ afu->config.global_mmio_size);
+}
+
+static ssize_t pp_mmio_size_show(struct device *device,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct ocxl_afu *afu = to_ocxl_afu(device);
+
+ return scnprintf(buf, PAGE_SIZE, "%d\n",
+ afu->config.pp_mmio_stride);
+}
+
+static ssize_t afu_version_show(struct device *device,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct ocxl_afu *afu = to_ocxl_afu(device);
+
+ return scnprintf(buf, PAGE_SIZE, "%hhu:%hhu\n",
+ afu->config.version_major,
+ afu->config.version_minor);
+}
+
+static ssize_t contexts_show(struct device *device,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct ocxl_afu *afu = to_ocxl_afu(device);
+
+ return scnprintf(buf, PAGE_SIZE, "%d/%d\n",
+ afu->pasid_count, afu->pasid_max);
+}
+
+static struct device_attribute afu_attrs[] = {
+ __ATTR_RO(global_mmio_size),
+ __ATTR_RO(pp_mmio_size),
+ __ATTR_RO(afu_version),
+ __ATTR_RO(contexts),
+};
+
+static ssize_t global_mmio_read(struct file *filp, struct kobject *kobj,
+ struct bin_attribute *bin_attr, char *buf,
+ loff_t off, size_t count)
+{
+ struct ocxl_afu *afu = to_ocxl_afu(kobj_to_dev(kobj));
+
+ if (count == 0 || off < 0 ||
+ off >= afu->config.global_mmio_size)
+ return 0;
+
+ memcpy_fromio(buf, afu->global_mmio_ptr + off, count);
+ return count;
+}
+
+static int global_mmio_fault(struct vm_fault *vmf)
+{
+ struct vm_area_struct *vma = vmf->vma;
+ struct ocxl_afu *afu = vma->vm_private_data;
+ unsigned long offset;
+
+ if (vmf->pgoff >= (afu->config.global_mmio_size >> PAGE_SHIFT))
+ return VM_FAULT_SIGBUS;
+
+ offset = vmf->pgoff;
+ offset += (afu->global_mmio_start >> PAGE_SHIFT);
+ vm_insert_pfn(vma, vmf->address, offset);
+ return VM_FAULT_NOPAGE;
+}
+
+static const struct vm_operations_struct global_mmio_vmops = {
+ .fault = global_mmio_fault,
+};
+
+static int global_mmio_mmap(struct file *filp, struct kobject *kobj,
+ struct bin_attribute *bin_attr,
+ struct vm_area_struct *vma)
+{
+ struct ocxl_afu *afu = to_ocxl_afu(kobj_to_dev(kobj));
+
+ if ((vma_pages(vma) + vma->vm_pgoff) >
+ (afu->config.global_mmio_size >> PAGE_SHIFT))
+ return -EINVAL;
+
+ vma->vm_flags |= VM_IO | VM_PFNMAP;
+ vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
+ vma->vm_ops = &global_mmio_vmops;
+ vma->vm_private_data = afu;
+ return 0;
+}
+
+int ocxl_sysfs_add_afu(struct ocxl_afu *afu)
+{
+ int i, rc;
+
+ for (i = 0; i < ARRAY_SIZE(afu_attrs); i++) {
+ rc = device_create_file(&afu->dev, &afu_attrs[i]);
+ if (rc)
+ goto err;
+ }
+
+ sysfs_attr_init(&afu->attr_global_mmio.attr);
+ afu->attr_global_mmio.attr.name = "global_mmio_area";
+ afu->attr_global_mmio.attr.mode = 0600;
+ afu->attr_global_mmio.size = afu->config.global_mmio_size;
+ afu->attr_global_mmio.read = global_mmio_read;
+ afu->attr_global_mmio.mmap = global_mmio_mmap;
+ rc = device_create_bin_file(&afu->dev, &afu->attr_global_mmio);
+ if (rc) {
+ dev_err(&afu->dev,
+ "Unable to create global mmio attr for afu: %d\n",
+ rc);
+ goto err;
+ }
+
+ return 0;
+
+err:
+ for (i--; i >= 0; i--)
+ device_remove_file(&afu->dev, &afu_attrs[i]);
+ return rc;
+}
+
+void ocxl_sysfs_remove_afu(struct ocxl_afu *afu)
+{
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(afu_attrs); i++)
+ device_remove_file(&afu->dev, &afu_attrs[i]);
+ device_remove_bin_file(&afu->dev, &afu->attr_global_mmio);
+}
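global_mmio_mmap() only validates the requested window; global_mmio_fault() later translates each faulting page into a page frame inside the AFU's global MMIO area. A userspace sketch of that arithmetic follows. The 4 KiB page size is an assumption (many ppc64 kernels use 64 KiB pages), as are the helper names and the page-alignment check on the area start.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Same test as global_mmio_mmap(): the window must fit in the area. */
static bool mmio_window_fits(uint32_t mmio_size, unsigned long vm_pgoff,
			     unsigned long nr_pages)
{
	return nr_pages + vm_pgoff <= (mmio_size >> SKETCH_PAGE_SHIFT);
}

/*
 * Same translation as global_mmio_fault(): page 'pgoff' of the mapping
 * maps to the returned page frame number, or -1 where the kernel would
 * return VM_FAULT_SIGBUS.
 */
static int64_t mmio_fault_pfn(uint64_t mmio_start, uint32_t mmio_size,
			      unsigned long pgoff)
{
	/* The area start is assumed to be page aligned. */
	assert((mmio_start & ((1u << SKETCH_PAGE_SHIFT) - 1)) == 0);
	if (pgoff >= (mmio_size >> SKETCH_PAGE_SHIFT))
		return -1;
	return (int64_t)(pgoff + (mmio_start >> SKETCH_PAGE_SHIFT));
}
```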
diff --git a/include/uapi/misc/ocxl.h b/include/uapi/misc/ocxl.h
new file mode 100644
index 000000000000..71fa387f2efd
--- /dev/null
+++ b/include/uapi/misc/ocxl.h
@@ -0,0 +1,47 @@
+/*
+ * Copyright 2017 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _UAPI_MISC_OCXL_H
+#define _UAPI_MISC_OCXL_H
+
+#include <linux/types.h>
+#include <linux/ioctl.h>
+
+enum ocxl_event_type {
+ OCXL_AFU_EVENT_XSL_FAULT_ERROR = 0,
+};
+
+#define OCXL_KERNEL_EVENT_FLAG_LAST 0x0001 /* This is the last event pending */
+
+struct ocxl_kernel_event_header {
+ __u16 type;
+ __u16 flags;
+ __u32 reserved;
+};
+
+struct ocxl_kernel_event_xsl_fault_error {
+ __u64 addr;
+ __u64 dsisr;
+ __u64 count;
+ __u64 reserved;
+};
+
+struct ocxl_ioctl_attach {
+ __u64 amr;
+ __u64 reserved1;
+ __u64 reserved2;
+ __u64 reserved3;
+};
+
+/* ioctl numbers */
+#define OCXL_MAGIC 0xCA
+/* AFU devices */
+#define OCXL_IOCTL_ATTACH _IOW(OCXL_MAGIC, 0x10, struct ocxl_ioctl_attach)
+
+#endif /* _UAPI_MISC_OCXL_H */
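A userspace caller would fill struct ocxl_ioctl_attach and pass it with OCXL_IOCTL_ATTACH on an open AFU character device. The sketch below is only an illustration under several assumptions: it duplicates the uapi layout so it builds without the kernel header, it assumes a Linux/glibc toolchain where <sys/ioctl.h> provides ioctl() and <linux/ioctl.h> the _IOW()/_IOC_*() macros, and it leaves out the device path because this patch does not show it. `ocxl_attach()` is a hypothetical helper name.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <linux/ioctl.h>	/* _IOW() and the _IOC_*() decoders */
#include <sys/ioctl.h>		/* ioctl() */

/* Copy of the uapi layout above, so the sketch builds on its own. */
struct ocxl_ioctl_attach {
	uint64_t amr;
	uint64_t reserved1;
	uint64_t reserved2;
	uint64_t reserved3;
};
_Static_assert(sizeof(struct ocxl_ioctl_attach) == 32, "uapi layout");

#define OCXL_MAGIC		0xCA
#define OCXL_IOCTL_ATTACH	_IOW(OCXL_MAGIC, 0x10, struct ocxl_ioctl_attach)

/*
 * Hypothetical helper: attach a context on an already open AFU device
 * file descriptor. Returns 0 on success or a negative errno value.
 */
static int ocxl_attach(int afu_fd, uint64_t amr)
{
	struct ocxl_ioctl_attach arg;

	memset(&arg, 0, sizeof(arg));	/* keep reserved fields zeroed */
	arg.amr = amr;
	if (ioctl(afu_fd, OCXL_IOCTL_ATTACH, &arg) < 0)
		return -errno;
	return 0;
}
```

Zeroing the whole structure before setting amr keeps the reserved fields at zero, which is the usual convention for reserved ioctl fields.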
--
2.14.1

2017-12-18 15:24:26

by Frederic Barrat

Subject: [PATCH 05/13] powerpc/powernv: Capture actag information for the device

In the opencapi protocol, host memory contexts are referenced by an
'actag'. During setup, a driver must tell the device how many actags
it can use, and what values are acceptable.

On POWER9, the NPU can handle 64 actags per link, so they must be
shared between all the PCI functions of the link. To get a global
picture of how many actags are used by each AFU of every function, we
capture some data at the end of PCI enumeration, so that actags can be
shared fairly if needed.
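The pro-rating can be illustrated with a short standalone C sketch of what assign_fn_actags() and assign_actags() do further down in this patch. It assumes the POWER9 limit of 64 actags per link and up to 8 functions per device; the function and constant names are made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>

#define ACTAG_MAX_PER_LINK	64	/* PNV_OCXL_ACTAG_MAX on POWER9 */
#define MAX_FUNCTIONS		8	/* PCI functions per device */

struct actag_range {
	uint16_t start;
	uint16_t count;
};

/*
 * Grant every function what it desires if the link can hold all the
 * requests; otherwise scale each request by 64 / total, rounding down.
 * Ranges are packed back to back in function order.
 */
static void assign_actags(const uint16_t desired[MAX_FUNCTIONS],
			  struct actag_range out[MAX_FUNCTIONS])
{
	unsigned int total = 0, start = 0, count;
	int i;

	for (i = 0; i < MAX_FUNCTIONS; i++)
		total += desired[i];

	for (i = 0; i < MAX_FUNCTIONS; i++) {
		out[i].start = 0;
		out[i].count = 0;
		if (!desired[i])
			continue;	/* no actags desired: empty range */
		if (total <= ACTAG_MAX_PER_LINK)
			count = desired[i];
		else
			count = ACTAG_MAX_PER_LINK * desired[i] / total;
		out[i].start = start;
		out[i].count = count;
		start += count;
	}
	/* Rounding down guarantees that the packed ranges fit on the link. */
	assert(start <= ACTAG_MAX_PER_LINK);
}
```

For example, two functions desiring 32 and 64 actags (96 in total) end up with 21 and 42 actags at offsets 0 and 21; rounding down leaves one actag unused.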

This is not powernv specific per se, but rather a consequence of the
opencapi configuration specification being quite general. The limited
number of actags available on POWER9 makes running out of them more
likely. This is somewhat mitigated by the fact that existing AFUs
request a reasonable number of actags and existing devices carry only
one AFU.


Signed-off-by: Frederic Barrat <[email protected]>
---
arch/powerpc/include/asm/pnv-ocxl.h | 4 +
arch/powerpc/platforms/powernv/ocxl.c | 302 ++++++++++++++++++++++++++++++++++
include/misc/ocxl-config.h | 52 ++++++
3 files changed, 358 insertions(+)
create mode 100644 include/misc/ocxl-config.h

diff --git a/arch/powerpc/include/asm/pnv-ocxl.h b/arch/powerpc/include/asm/pnv-ocxl.h
index b9ab3f0a9634..5a7ae7f28209 100644
--- a/arch/powerpc/include/asm/pnv-ocxl.h
+++ b/arch/powerpc/include/asm/pnv-ocxl.h
@@ -16,6 +16,10 @@
#define PNV_OCXL_TL_BITS_PER_RATE 4
#define PNV_OCXL_TL_RATE_BUF_SIZE ((PNV_OCXL_TL_MAX_TEMPLATE+1) * PNV_OCXL_TL_BITS_PER_RATE / 8)

+extern int pnv_ocxl_get_actag(struct pci_dev *dev, u16 *base, u16 *enabled,
+ u16 *supported);
+extern int pnv_ocxl_get_pasid_count(struct pci_dev *dev, int *count);
+
extern int pnv_ocxl_get_tl_cap(struct pci_dev *dev, long *cap,
char *rate_buf, int rate_buf_size);
extern int pnv_ocxl_set_tl_conf(struct pci_dev *dev, long cap,
diff --git a/arch/powerpc/platforms/powernv/ocxl.c b/arch/powerpc/platforms/powernv/ocxl.c
index 3378b75cf5e5..6c79924b95c8 100644
--- a/arch/powerpc/platforms/powernv/ocxl.c
+++ b/arch/powerpc/platforms/powernv/ocxl.c
@@ -9,13 +9,315 @@

#include <asm/pnv-ocxl.h>
#include <asm/opal.h>
+#include <misc/ocxl-config.h>
#include "pci.h"

#define PNV_OCXL_TL_P9_RECV_CAP 0x000000000000000Full
+#define PNV_OCXL_ACTAG_MAX 64
/* PASIDs are 20-bit, but on P9, NPU can only handle 15 bits */
#define PNV_OCXL_PASID_BITS 15
#define PNV_OCXL_PASID_MAX ((1 << PNV_OCXL_PASID_BITS) - 1)

+#define AFU_PRESENT (1 << 31)
+#define AFU_INDEX_MASK 0x3F000000
+#define AFU_INDEX_SHIFT 24
+#define ACTAG_MASK 0xFFF
+
+
+struct actag_range {
+ u16 start;
+ u16 count;
+};
+
+struct npu_link {
+ struct list_head list;
+ int domain;
+ int bus;
+ int dev;
+ u16 fn_desired_actags[8];
+ struct actag_range fn_actags[8];
+ bool assignment_done;
+};
+static struct list_head links_list = LIST_HEAD_INIT(links_list);
+static DEFINE_MUTEX(links_list_lock);
+
+
+/*
+ * opencapi actags handling:
+ *
+ * When sending commands, the opencapi device references the memory
+ * context it's targeting with an 'actag', which is really an alias
+ * for a (BDF, pasid) combination. When it receives a command, the NPU
+ * must do a lookup of the actag to identify the memory context. The
+ * hardware supports a finite number of actags per link (64 for
+ * POWER9).
+ *
+ * The device can carry multiple functions, and each function can have
+ * multiple AFUs. Each AFU advertises in its config space the number
+ * of desired actags. The host must configure in the config space of
+ * the AFU how many actags the AFU is really allowed to use (which can
+ * be less than what the AFU desires).
+ *
+ * When a PCI function is probed by the driver, it has no visibility
+ * into the other PCI functions and how many actags they'd like,
+ * which makes it impossible to distribute actags fairly among AFUs.
+ *
+ * Unfortunately, the only way to know how many actags a function
+ * desires is to look at the data for each AFU in the config space
+ * and add them up. Similarly, the only way to know how many actags
+ * all the functions of the physical device desire is to add up the
+ * previously computed function counts. Then we can match that against
+ * what the hardware supports.
+ *
+ * To get a comprehensive view, we use a 'pci fixup': at the end of
+ * PCI enumeration, each function counts how many actags its AFUs
+ * desire and we save it in a 'npu_link' structure, shared between all
+ * the PCI functions of the same device. Therefore, when the first
+ * function is probed by the driver, we can get an idea of the total
+ * count of desired actags for the device, and assign the actags to
+ * the AFUs, by pro-rating if needed.
+ */
+
+static int find_dvsec_from_pos(struct pci_dev *dev, int dvsec_id, int pos)
+{
+ int vsec = pos;
+ u16 vendor, id;
+
+ while ((vsec = pci_find_next_ext_capability(dev, vsec,
+ OCXL_EXT_CAP_ID_DVSEC))) {
+ pci_read_config_word(dev, vsec + OCXL_DVSEC_VENDOR_OFFSET,
+ &vendor);
+ pci_read_config_word(dev, vsec + OCXL_DVSEC_ID_OFFSET, &id);
+ if (vendor == PCI_VENDOR_ID_IBM && id == dvsec_id)
+ return vsec;
+ }
+ return 0;
+}
+
+static int find_dvsec_afu_ctrl(struct pci_dev *dev, u8 afu_idx)
+{
+ int vsec = 0;
+ u8 idx;
+
+ while ((vsec = find_dvsec_from_pos(dev, OCXL_DVSEC_AFU_CTRL_ID,
+ vsec))) {
+ pci_read_config_byte(dev, vsec + OCXL_DVSEC_AFU_CTRL_AFU_IDX,
+ &idx);
+ if (idx == afu_idx)
+ return vsec;
+ }
+ return 0;
+}
+
+static int get_max_afu_index(struct pci_dev *dev, int *afu_idx)
+{
+ int pos;
+ u32 val;
+
+ pos = find_dvsec_from_pos(dev, OCXL_DVSEC_FUNC_ID, 0);
+ if (!pos)
+ return -ESRCH;
+
+ pci_read_config_dword(dev, pos + OCXL_DVSEC_FUNC_OFF_INDEX, &val);
+ if (val & AFU_PRESENT)
+ *afu_idx = (val & AFU_INDEX_MASK) >> AFU_INDEX_SHIFT;
+ else
+ *afu_idx = -1;
+ return 0;
+}
+
+static int get_actag_count(struct pci_dev *dev, int afu_idx, int *actag)
+{
+ int pos;
+ u16 actag_sup;
+
+ pos = find_dvsec_afu_ctrl(dev, afu_idx);
+ if (!pos)
+ return -ESRCH;
+
+ pci_read_config_word(dev, pos + OCXL_DVSEC_AFU_CTRL_ACTAG_SUP,
+ &actag_sup);
+ *actag = actag_sup & ACTAG_MASK;
+ return 0;
+}
+
+static struct npu_link *find_link(struct pci_dev *dev)
+{
+ struct npu_link *link;
+
+ list_for_each_entry(link, &links_list, list) {
+ /* The functions of a device all share the same link */
+ if (link->domain == pci_domain_nr(dev->bus) &&
+ link->bus == dev->bus->number &&
+ link->dev == PCI_SLOT(dev->devfn)) {
+ return link;
+ }
+ }
+
+ /* link doesn't exist yet. Allocate one */
+ link = kzalloc(sizeof(struct npu_link), GFP_KERNEL);
+ if (!link)
+ return NULL;
+ link->domain = pci_domain_nr(dev->bus);
+ link->bus = dev->bus->number;
+ link->dev = PCI_SLOT(dev->devfn);
+ list_add(&link->list, &links_list);
+ return link;
+}
+
+static void pnv_ocxl_fixup_actag(struct pci_dev *dev)
+{
+ struct pci_controller *hose = pci_bus_to_host(dev->bus);
+ struct pnv_phb *phb = hose->private_data;
+ struct npu_link *link;
+ int rc, afu_idx = -1, i, actag;
+
+ if (phb->type != PNV_PHB_NPU_OCAPI)
+ return;
+
+ mutex_lock(&links_list_lock);
+
+ link = find_link(dev);
+ if (!link) {
+ dev_warn(&dev->dev, "couldn't update actag information\n");
+ mutex_unlock(&links_list_lock);
+ return;
+ }
+
+ /*
+ * Check how many actags are desired for the AFUs under that
+ * function and add it to the count for the link
+ */
+ rc = get_max_afu_index(dev, &afu_idx);
+ if (rc) {
+ /* Most likely an invalid config space */
+ dev_dbg(&dev->dev, "couldn't find AFU information\n");
+ afu_idx = -1;
+ }
+
+ link->fn_desired_actags[PCI_FUNC(dev->devfn)] = 0;
+ for (i = 0; i <= afu_idx; i++) {
+ /*
+ * AFU index 'holes' are allowed. So don't fail if we
+ * can't read the actag info for an index
+ */
+ rc = get_actag_count(dev, i, &actag);
+ if (rc)
+ continue;
+ link->fn_desired_actags[PCI_FUNC(dev->devfn)] += actag;
+ }
+ dev_dbg(&dev->dev, "total actags for function: %d\n",
+ link->fn_desired_actags[PCI_FUNC(dev->devfn)]);
+
+ mutex_unlock(&links_list_lock);
+}
+DECLARE_PCI_FIXUP_HEADER(PCI_ANY_ID, PCI_ANY_ID, pnv_ocxl_fixup_actag);
+
+static u16 assign_fn_actags(u16 desired, u16 total)
+{
+ u16 count;
+
+ if (total <= PNV_OCXL_ACTAG_MAX)
+ count = desired;
+ else
+ count = PNV_OCXL_ACTAG_MAX * desired / total;
+
+ return count;
+}
+
+static void assign_actags(struct npu_link *link)
+{
+ u16 actag_count, range_start = 0, total_desired = 0;
+ int i;
+
+ for (i = 0; i < 8; i++)
+ total_desired += link->fn_desired_actags[i];
+
+ for (i = 0; i < 8; i++) {
+ if (link->fn_desired_actags[i]) {
+ actag_count = assign_fn_actags(
+ link->fn_desired_actags[i],
+ total_desired);
+ link->fn_actags[i].start = range_start;
+ link->fn_actags[i].count = actag_count;
+ range_start += actag_count;
+ WARN_ON(range_start > PNV_OCXL_ACTAG_MAX);
+ }
+ pr_debug("link %x:%x:%x fct %d actags: start=%d count=%d (desired=%d)\n",
+ link->domain, link->bus, link->dev, i,
+ link->fn_actags[i].start, link->fn_actags[i].count,
+ link->fn_desired_actags[i]);
+ }
+ link->assignment_done = true;
+}
+
+int pnv_ocxl_get_actag(struct pci_dev *dev, u16 *base, u16 *enabled,
+ u16 *supported)
+{
+ struct npu_link *link;
+
+ mutex_lock(&links_list_lock);
+
+ link = find_link(dev);
+ if (!link) {
+ dev_err(&dev->dev, "actag information not found\n");
+ mutex_unlock(&links_list_lock);
+ return -ENODEV;
+ }
+ /*
+ * On p9, we only have 64 actags per link, so they must be
+ * shared by all the functions of the same adapter. We counted
+ * the desired actag counts during PCI enumeration, so that we
+ * can allocate a pro-rated number of actags to each function.
+ */
+ if (!link->assignment_done)
+ assign_actags(link);
+
+ *base = link->fn_actags[PCI_FUNC(dev->devfn)].start;
+ *enabled = link->fn_actags[PCI_FUNC(dev->devfn)].count;
+ *supported = link->fn_desired_actags[PCI_FUNC(dev->devfn)];
+
+ mutex_unlock(&links_list_lock);
+ return 0;
+}
+EXPORT_SYMBOL_GPL(pnv_ocxl_get_actag);
+
+int pnv_ocxl_get_pasid_count(struct pci_dev *dev, int *count)
+{
+ struct npu_link *link;
+ int i, rc = -EINVAL;
+
+ /*
+ * The number of PASIDs (process address space ID) which can
+ * be used by a function depends on how many functions exist
+ * on the device. The NPU needs to be configured to know how
+ * many bits are available to PASIDs and how many are to be
+ * used by the function BDF identifier.
+ *
+ * We only support one AFU-carrying function for now.
+ */
+ mutex_lock(&links_list_lock);
+
+ link = find_link(dev);
+ if (!link) {
+ dev_err(&dev->dev, "actag information not found\n");
+ mutex_unlock(&links_list_lock);
+ return -ENODEV;
+ }
+
+ for (i = 0; i < 8; i++)
+ if (link->fn_desired_actags[i] && (i == PCI_FUNC(dev->devfn))) {
+ *count = PNV_OCXL_PASID_MAX;
+ rc = 0;
+ break;
+ }
+
+ mutex_unlock(&links_list_lock);
+ dev_dbg(&dev->dev, "%d PASIDs available for function\n",
+ rc ? 0 : *count);
+ return rc;
+}
+EXPORT_SYMBOL_GPL(pnv_ocxl_get_pasid_count);

static void set_templ_rate(unsigned int templ, unsigned int rate, char *buf)
{
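get_max_afu_index(), get_actag_count() and the loop in pnv_ocxl_fixup_actag() decode a few config space fields and sum the per-AFU requests. The following userspace sketch reproduces that decoding with the masks defined in this patch. Feeding it plain integers instead of pci_read_config_*() results, and marking AFU index holes with a negative value, are assumptions of the sketch; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Masks copied from the patch (AFU_PRESENT written as unsigned here). */
#define AFU_PRESENT	(1u << 31)
#define AFU_INDEX_MASK	0x3F000000u
#define AFU_INDEX_SHIFT	24
#define ACTAG_MASK	0xFFFu

/* Decode the function DVSEC dword like get_max_afu_index(); -1: no AFU. */
static int decode_max_afu_index(uint32_t val)
{
	if (val & AFU_PRESENT)
		return (int)((val & AFU_INDEX_MASK) >> AFU_INDEX_SHIFT);
	return -1;
}

/* Keep the 12-bit count, like get_actag_count(). */
static int decode_actag_supported(uint16_t actag_sup)
{
	return actag_sup & ACTAG_MASK;
}

/*
 * Sum the desired actags of AFU indexes 0..max, skipping index holes
 * (marked by a negative entry in this sketch), like the fixup loop.
 */
static int function_desired_actags(uint32_t index_dword,
				   const int *afu_actags, int n)
{
	int max = decode_max_afu_index(index_dword);
	int i, total = 0;

	for (i = 0; i <= max && i < n; i++)
		if (afu_actags[i] >= 0)
			total += afu_actags[i];
	return total;
}
```

For instance, an index dword of 0x82000000 has the present bit set and a maximum AFU index of 2.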
diff --git a/include/misc/ocxl-config.h b/include/misc/ocxl-config.h
new file mode 100644
index 000000000000..b6677899ed09
--- /dev/null
+++ b/include/misc/ocxl-config.h
@@ -0,0 +1,52 @@
+/*
+ * Copyright 2017 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _OCXL_CONFIG_H_
+#define _OCXL_CONFIG_H_
+
+/*
+ * This file lists the various constants used to read the
+ * configuration space of an opencapi adapter.
+ *
+ * It follows the specification for opencapi 3.0
+ */
+
+#define OCXL_EXT_CAP_ID_DVSEC 0x23
+
+#define OCXL_DVSEC_VENDOR_OFFSET 0x4
+#define OCXL_DVSEC_ID_OFFSET 0x8
+#define OCXL_DVSEC_TL_ID 0xF000
+#define OCXL_DVSEC_TL_BACKOFF_TIMERS 0x10
+#define OCXL_DVSEC_TL_RECV_CAP 0x18
+#define OCXL_DVSEC_TL_SEND_CAP 0x20
+#define OCXL_DVSEC_TL_RECV_RATE 0x30
+#define OCXL_DVSEC_TL_SEND_RATE 0x50
+#define OCXL_DVSEC_FUNC_ID 0xF001
+#define OCXL_DVSEC_FUNC_OFF_INDEX 0x08
+#define OCXL_DVSEC_FUNC_OFF_ACTAG 0x0C
+#define OCXL_DVSEC_AFU_INFO_ID 0xF003
+#define OCXL_DVSEC_AFU_INFO_AFU_IDX 0x0A
+#define OCXL_DVSEC_AFU_INFO_OFF 0x0C
+#define OCXL_DVSEC_AFU_INFO_DATA 0x10
+#define OCXL_DVSEC_AFU_CTRL_ID 0xF004
+#define OCXL_DVSEC_AFU_CTRL_AFU_IDX 0x0A
+#define OCXL_DVSEC_AFU_CTRL_TERM_PASID 0x0C
+#define OCXL_DVSEC_AFU_CTRL_ENABLE 0x0F
+#define OCXL_DVSEC_AFU_CTRL_PASID_SUP 0x10
+#define OCXL_DVSEC_AFU_CTRL_PASID_EN 0x11
+#define OCXL_DVSEC_AFU_CTRL_PASID_BASE 0x14
+#define OCXL_DVSEC_AFU_CTRL_ACTAG_SUP 0x18
+#define OCXL_DVSEC_AFU_CTRL_ACTAG_EN 0x1A
+#define OCXL_DVSEC_AFU_CTRL_ACTAG_BASE 0x1C
+#define OCXL_DVSEC_VENDOR_ID 0xF0F0
+#define OCXL_DVSEC_VENDOR_CFG_VERS 0x0C
+#define OCXL_DVSEC_VENDOR_TLX_VERS 0x10
+#define OCXL_DVSEC_VENDOR_DLX_VERS 0x20
+
+#endif /* _OCXL_CONFIG_H_ */
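The offsets above are meant to be used while walking the PCIe extended capability list, as find_dvsec_from_pos() does through pci_find_next_ext_capability(). Below is a self-contained sketch of such a walk over an in-memory copy of config space. The little-endian buffer, the helper names and the simplified search (always from the start of the list, returning the first match) are assumptions. The early return for a 256-byte config space mirrors the kernel walker, which is why the cfg_size fixup from patch 02 matters.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CFG_SPACE_SIZE		256	/* legacy PCI config space */
#define CFG_SPACE_EXP_SIZE	4096	/* PCIe extended config space */
#define EXT_CAP_ID_DVSEC	0x23	/* OCXL_EXT_CAP_ID_DVSEC */
#define DVSEC_VENDOR_OFFSET	0x4	/* OCXL_DVSEC_VENDOR_OFFSET */
#define DVSEC_ID_OFFSET		0x8	/* OCXL_DVSEC_ID_OFFSET */
#define VENDOR_ID_IBM		0x1014

/* Little-endian reads from an in-memory copy of config space. */
static uint32_t rd32(const uint8_t *cfg, int pos)
{
	return cfg[pos] | cfg[pos + 1] << 8 | cfg[pos + 2] << 16 |
	       (uint32_t)cfg[pos + 3] << 24;
}

static uint16_t rd16(const uint8_t *cfg, int pos)
{
	return cfg[pos] | cfg[pos + 1] << 8;
}

/* Return the offset of the first IBM DVSEC with this id, or 0. */
static int find_dvsec(const uint8_t *cfg, int cfg_size, uint16_t dvsec_id)
{
	int pos = CFG_SPACE_SIZE;	/* extended caps start at 0x100 */
	int ttl = (CFG_SPACE_EXP_SIZE - CFG_SPACE_SIZE) / 8;

	if (cfg_size <= CFG_SPACE_SIZE)
		return 0;	/* no extended config space visible */

	while (pos >= CFG_SPACE_SIZE && ttl-- > 0) {
		uint32_t hdr = rd32(cfg, pos);

		if (hdr == 0)
			break;
		if ((hdr & 0xFFFF) == EXT_CAP_ID_DVSEC &&
		    rd16(cfg, pos + DVSEC_VENDOR_OFFSET) == VENDOR_ID_IBM &&
		    rd16(cfg, pos + DVSEC_ID_OFFSET) == dvsec_id)
			return pos;
		pos = (hdr >> 20) & 0xFFC;	/* next capability pointer */
	}
	return 0;
}

/* Helper for experiments: write a DVSEC header into a fake config space. */
static void put_dvsec(uint8_t *cfg, int pos, int next, uint16_t vendor,
		      uint16_t id)
{
	uint32_t hdr = EXT_CAP_ID_DVSEC | 1u << 16 | (uint32_t)next << 20;
	int i;

	for (i = 0; i < 4; i++)
		cfg[pos + i] = hdr >> (8 * i);
	cfg[pos + DVSEC_VENDOR_OFFSET] = vendor & 0xFF;
	cfg[pos + DVSEC_VENDOR_OFFSET + 1] = vendor >> 8;
	cfg[pos + DVSEC_ID_OFFSET] = id & 0xFF;
	cfg[pos + DVSEC_ID_OFFSET + 1] = id >> 8;
}
```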
--
2.14.1

2017-12-18 15:24:40

by Frederic Barrat

Subject: [PATCH 02/13] powerpc/powernv: Set correct configuration space size for opencapi devices

From: Andrew Donnellan <[email protected]>

The configuration space for opencapi devices doesn't have a PCI
Express capability, so Linux assumes it is an old-style PCI device
with a 256-byte configuration space instead of the desired 4 KB one.
Add a PCI fixup to declare the correct size.


Signed-off-by: Andrew Donnellan <[email protected]>
Signed-off-by: Frederic Barrat <[email protected]>
---
arch/powerpc/platforms/powernv/pci-ioda.c | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index c37b5d288f9c..b8ec76aa266f 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -4079,6 +4079,16 @@ void __init pnv_pci_init_npu2_opencapi_phb(struct device_node *np)
pnv_pci_init_ioda_phb(np, 0, PNV_PHB_NPU_OCAPI);
}

+static void pnv_npu2_opencapi_cfg_size_fixup(struct pci_dev *dev)
+{
+ struct pci_controller *hose = pci_bus_to_host(dev->bus);
+ struct pnv_phb *phb = hose->private_data;
+
+ if (phb->type == PNV_PHB_NPU_OCAPI)
+ dev->cfg_size = PCI_CFG_SPACE_EXP_SIZE;
+}
+DECLARE_PCI_FIXUP_EARLY(PCI_ANY_ID, PCI_ANY_ID, pnv_npu2_opencapi_cfg_size_fixup);
+
void __init pnv_pci_init_ioda_hub(struct device_node *np)
{
struct device_node *phbn;
--
2.14.1
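The fixup above can be modelled in user space roughly as follows. This is a simplified sketch, assuming the generic size probe has already run when early fixups are applied. Without a PCIe capability the generic logic settles on 256 bytes, and the fixup overrides that to 4096 only for devices behind a `PNV_PHB_NPU_OCAPI` PHB. Because the fixup is declared for `PCI_ANY_ID`/`PCI_ANY_ID`, the PHB-type check is what keeps it from touching other devices. `struct fake_dev` and `probe_cfg_size()` are hypothetical stand-ins; the kernel's real size probe performs more checks than this.

```c
#define PCI_CFG_SPACE_SIZE     256   /* legacy PCI config space */
#define PCI_CFG_SPACE_EXP_SIZE 4096  /* PCIe extended config space */

enum pnv_phb_type {
	PNV_PHB_IODA1 = 0,
	PNV_PHB_IODA2 = 1,
	PNV_PHB_NPU_NVLINK = 2,
	PNV_PHB_NPU_OCAPI = 3,
};

/* Hypothetical stand-in for struct pci_dev plus its PHB's type */
struct fake_dev {
	enum pnv_phb_type phb_type;
	int has_pcie_cap;
	int cfg_size;
};

/* Simplified generic probe: no PCIe capability means 256 bytes */
static void probe_cfg_size(struct fake_dev *dev)
{
	dev->cfg_size = dev->has_pcie_cap ? PCI_CFG_SPACE_EXP_SIZE
					  : PCI_CFG_SPACE_SIZE;
}

/* Model of pnv_npu2_opencapi_cfg_size_fixup(): runs for every device,
 * but only changes those behind an OpenCAPI PHB */
static void opencapi_cfg_size_fixup(struct fake_dev *dev)
{
	if (dev->phb_type == PNV_PHB_NPU_OCAPI)
		dev->cfg_size = PCI_CFG_SPACE_EXP_SIZE;
}
```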

2017-12-18 15:24:42

by Frederic Barrat

Subject: [PATCH 01/13] powerpc/powernv: Introduce new PHB type for opencapi links

The NPU was already abstracted by opal as a virtual PHB for nvlink,
but it helps to be able to differentiate between an nvlink and an
opencapi PHB, as the difference is not completely transparent to Linux.
In particular, PE assignment differs, and we'll also need the
information in later patches.

So rename existing PNV_PHB_NPU type to PNV_PHB_NPU_NVLINK and add a
new type PNV_PHB_NPU_OCAPI.

Signed-off-by: Frederic Barrat <[email protected]>
Signed-off-by: Andrew Donnellan <[email protected]>
---
arch/powerpc/platforms/powernv/npu-dma.c | 2 +-
arch/powerpc/platforms/powernv/pci-ioda.c | 46 +++++++++++++++++++++++--------
arch/powerpc/platforms/powernv/pci.c | 4 +++
arch/powerpc/platforms/powernv/pci.h | 8 ++++--
4 files changed, 45 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
index f6cbc1a71472..c5899c107d59 100644
--- a/arch/powerpc/platforms/powernv/npu-dma.c
+++ b/arch/powerpc/platforms/powernv/npu-dma.c
@@ -277,7 +277,7 @@ static int pnv_npu_dma_set_bypass(struct pnv_ioda_pe *npe)
int64_t rc = 0;
phys_addr_t top = memblock_end_of_DRAM();

- if (phb->type != PNV_PHB_NPU || !npe->pdev)
+ if (phb->type != PNV_PHB_NPU_NVLINK || !npe->pdev)
return -EINVAL;

rc = pnv_npu_unset_window(npe, 0);
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 749055553064..c37b5d288f9c 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -54,7 +54,8 @@
#define POWERNV_IOMMU_DEFAULT_LEVELS 1
#define POWERNV_IOMMU_MAX_LEVELS 5

-static const char * const pnv_phb_names[] = { "IODA1", "IODA2", "NPU" };
+static const char * const pnv_phb_names[] = { "IODA1", "IODA2", "NPU_NVLINK",
+ "NPU_OCAPI" };
static void pnv_pci_ioda2_table_free_pages(struct iommu_table *tbl);

void pe_level_printk(const struct pnv_ioda_pe *pe, const char *level,
@@ -924,7 +925,7 @@ static int pnv_ioda_configure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
* Configure PELTV. NPUs don't have a PELTV table so skip
* configuration on them.
*/
- if (phb->type != PNV_PHB_NPU)
+ if (phb->type != PNV_PHB_NPU_NVLINK && phb->type != PNV_PHB_NPU_OCAPI)
pnv_ioda_set_peltv(phb, pe, true);

/* Setup reverse map */
@@ -1260,12 +1261,13 @@ static struct pnv_ioda_pe *pnv_ioda_setup_npu_PE(struct pci_dev *npu_pdev)
return pe;
}

-static void pnv_ioda_setup_npu_PEs(struct pci_bus *bus)
+static void pnv_ioda_setup_npu_PEs(struct pci_bus *bus,
+ struct pnv_ioda_pe *fn(struct pci_dev *npu_pdev))
{
struct pci_dev *pdev;

list_for_each_entry(pdev, &bus->devices, bus_list)
- pnv_ioda_setup_npu_PE(pdev);
+ fn(pdev);
}

static void pnv_pci_ioda_setup_PEs(void)
@@ -1275,13 +1277,18 @@ static void pnv_pci_ioda_setup_PEs(void)

list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
phb = hose->private_data;
- if (phb->type == PNV_PHB_NPU) {
+ if (phb->type == PNV_PHB_NPU_NVLINK) {
/* PE#0 is needed for error reporting */
pnv_ioda_reserve_pe(phb, 0);
- pnv_ioda_setup_npu_PEs(hose->bus);
+ pnv_ioda_setup_npu_PEs(hose->bus,
+ pnv_ioda_setup_npu_PE);
if (phb->model == PNV_PHB_MODEL_NPU2)
pnv_npu2_init(phb);
}
+ if (phb->type == PNV_PHB_NPU_OCAPI) {
+ pnv_ioda_setup_npu_PEs(hose->bus,
+ pnv_ioda_setup_dev_PE);
+ }
}
}

@@ -2640,7 +2647,7 @@ static int gpe_table_group_to_npe_cb(struct device *dev, void *opaque)

hose = pci_bus_to_host(pdev->bus);
phb = hose->private_data;
- if (phb->type != PNV_PHB_NPU)
+ if (phb->type != PNV_PHB_NPU_NVLINK)
return 0;

*ptmppe = &phb->ioda.pe_array[pdn->pe_number];
@@ -2724,7 +2731,7 @@ static void pnv_pci_ioda_setup_iommu_api(void)
list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
phb = hose->private_data;

- if (phb->type != PNV_PHB_NPU)
+ if (phb->type != PNV_PHB_NPU_NVLINK)
continue;

list_for_each_entry(pe, &phb->ioda.pe_list, list) {
@@ -3774,6 +3781,13 @@ static const struct pci_controller_ops pnv_npu_ioda_controller_ops = {
.shutdown = pnv_pci_ioda_shutdown,
};

+static const struct pci_controller_ops pnv_npu_ocapi_ioda_controller_ops = {
+ .enable_device_hook = pnv_pci_enable_device_hook,
+ .window_alignment = pnv_pci_window_alignment,
+ .reset_secondary_bus = pnv_pci_reset_secondary_bus,
+ .shutdown = pnv_pci_ioda_shutdown,
+};
+
#ifdef CONFIG_CXL_BASE
const struct pci_controller_ops pnv_cxl_cx4_ioda_controller_ops = {
.dma_dev_setup = pnv_pci_dma_dev_setup,
@@ -4007,9 +4021,14 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
*/
ppc_md.pcibios_fixup = pnv_pci_ioda_fixup;

- if (phb->type == PNV_PHB_NPU) {
+ switch (phb->type) {
+ case PNV_PHB_NPU_NVLINK:
hose->controller_ops = pnv_npu_ioda_controller_ops;
- } else {
+ break;
+ case PNV_PHB_NPU_OCAPI:
+ hose->controller_ops = pnv_npu_ocapi_ioda_controller_ops;
+ break;
+ default:
phb->dma_dev_setup = pnv_pci_ioda_dma_dev_setup;
hose->controller_ops = pnv_pci_ioda_controller_ops;
}
@@ -4052,7 +4071,12 @@ void __init pnv_pci_init_ioda2_phb(struct device_node *np)

void __init pnv_pci_init_npu_phb(struct device_node *np)
{
- pnv_pci_init_ioda_phb(np, 0, PNV_PHB_NPU);
+ pnv_pci_init_ioda_phb(np, 0, PNV_PHB_NPU_NVLINK);
+}
+
+void __init pnv_pci_init_npu2_opencapi_phb(struct device_node *np)
+{
+ pnv_pci_init_ioda_phb(np, 0, PNV_PHB_NPU_OCAPI);
}

void __init pnv_pci_init_ioda_hub(struct device_node *np)
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index 5422f4a6317c..69d102cbf48f 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -1142,6 +1142,10 @@ void __init pnv_pci_init(void)
for_each_compatible_node(np, NULL, "ibm,ioda2-npu2-phb")
pnv_pci_init_npu_phb(np);

+ /* Look for NPU2 OpenCAPI PHBs */
+ for_each_compatible_node(np, NULL, "ibm,ioda2-npu2-opencapi-phb")
+ pnv_pci_init_npu2_opencapi_phb(np);
+
/* Configure IOMMU DMA hooks */
set_pci_dma_ops(&dma_iommu_ops);
}
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index b772d7473896..eada4b6068cb 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -12,9 +12,10 @@ struct pci_dn;
#define NV_NMMU_ATSD_REGS 8

enum pnv_phb_type {
- PNV_PHB_IODA1 = 0,
- PNV_PHB_IODA2 = 1,
- PNV_PHB_NPU = 2,
+ PNV_PHB_IODA1 = 0,
+ PNV_PHB_IODA2 = 1,
+ PNV_PHB_NPU_NVLINK = 2,
+ PNV_PHB_NPU_OCAPI = 3,
};

/* Precise PHB model for error management */
@@ -227,6 +228,7 @@ extern void pnv_pci_setup_iommu_table(struct iommu_table *tbl,
extern void pnv_pci_init_ioda_hub(struct device_node *np);
extern void pnv_pci_init_ioda2_phb(struct device_node *np);
extern void pnv_pci_init_npu_phb(struct device_node *np);
+extern void pnv_pci_init_npu2_opencapi_phb(struct device_node *np);
extern void pnv_pci_reset_secondary_bus(struct pci_dev *dev);
extern int pnv_eeh_phb_reset(struct pci_controller *hose, int option);

--
2.14.1
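Patch 01 makes two related changes: it grows `pnv_phb_names[]` alongside `enum pnv_phb_type`, and it turns `pnv_ioda_setup_npu_PEs()` into a helper that applies a caller-supplied per-device callback. In C, a parameter declared with function type, as in `struct pnv_ioda_pe *fn(struct pci_dev *npu_pdev)`, is adjusted to a pointer to that function. The user-space sketch below models both changes. The `_Static_assert` guard and the integer "PE" return values are hypothetical additions for illustration and are not part of the patch.

```c
#include <stddef.h>

enum pnv_phb_type {
	PNV_PHB_IODA1 = 0,
	PNV_PHB_IODA2 = 1,
	PNV_PHB_NPU_NVLINK = 2,
	PNV_PHB_NPU_OCAPI = 3,
};

static const char * const pnv_phb_names[] = { "IODA1", "IODA2", "NPU_NVLINK",
					      "NPU_OCAPI" };

/* Hypothetical guard: compilation fails if the enum and table diverge */
_Static_assert(sizeof(pnv_phb_names) / sizeof(pnv_phb_names[0]) ==
	       PNV_PHB_NPU_OCAPI + 1,
	       "pnv_phb_names needs one entry per pnv_phb_type");

/* Stand-ins for pnv_ioda_setup_npu_PE() / pnv_ioda_setup_dev_PE() */
typedef int pe_setup_fn(int devfn);
static int setup_npu_pe(int devfn) { return 1000 + devfn; }
static int setup_dev_pe(int devfn) { return 2000 + devfn; }

/* Like pnv_ioda_setup_npu_PEs(): apply the chosen callback to each device */
static void setup_pes(const int *devfns, size_t n, int *out, pe_setup_fn *fn)
{
	for (size_t i = 0; i < n; i++)
		out[i] = fn(devfns[i]);
}

/* Callback per PHB type; NULL for types the setup loop leaves alone */
static pe_setup_fn *pe_setup_for(enum pnv_phb_type type)
{
	switch (type) {
	case PNV_PHB_NPU_NVLINK:
		return setup_npu_pe;
	case PNV_PHB_NPU_OCAPI:
		return setup_dev_pe;
	default:
		return NULL;
	}
}
```

Passing the callback keeps one bus-walking loop for both NPU flavours, while the per-type difference in PE assignment lives entirely in the function chosen.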

2017-12-18 16:08:57

by Joe Perches

Subject: Re: [PATCH 13/13] ocxl: add MAINTAINERS entry

On Mon, 2017-12-18 at 16:21 +0100, Frederic Barrat wrote:
> Signed-off-by: Frederic Barrat <[email protected]>
> Signed-off-by: Andrew Donnellan <[email protected]>
> ---
> MAINTAINERS | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index a6e86e20761e..edc9e1db352b 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -3919,6 +3919,18 @@ F: drivers/scsi/cxlflash/
> F: include/uapi/scsi/cxlflash_ioctls.h
> F: Documentation/powerpc/cxlflash.txt
>
> +OCXL (Open Coherent Accelerator Processor Interface OpenCAPI) DRIVER
> +M: Frederic Barrat <[email protected]>
> +M: Andrew Donnellan <[email protected]>
> +L: [email protected]
> +S: Supported
> +F: arch/powerpc/platforms/powernv/ocxl.c
> +F: arch/powerpc/include/asm/pnv-ocxl.h
> +F: drivers/misc/ocxl/
> +F: include/misc/ocxl*
> +F: include/uapi/misc/ocxl.h
> +F: Documentation/accelerators/ocxl.txt
> +

Alphabetic ordering by section header please...

Maybe CXL - OCXL ...

> CYBERPRO FB DRIVER
> M: Russell King <[email protected]>
> L: [email protected] (moderated for non-subscribers)

2017-12-18 16:48:55

by Philippe Ombredanne

Subject: Re: [PATCH 09/13] ocxl: Add trace points

Frederic,

On Mon, Dec 18, 2017 at 4:21 PM, Frederic Barrat
<[email protected]> wrote:
> Define a few trace points so that we can use the standard tracing
> mechanism for debug and/or monitoring.
>
> Signed-off-by: Frederic Barrat <[email protected]>

<snip>

> --- /dev/null
> +++ b/drivers/misc/ocxl/trace.h
> @@ -0,0 +1,189 @@
> +/*
> + * Copyright 2017 IBM Corp.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or (at your option) any later version.
> + */

Would you mind using the new SPDX tags documented in Thomas's patch set
[1] rather than this legalese?

Thank you!

[1] https://lkml.org/lkml/2017/12/4/934

--
Cordially
Philippe Ombredanne

2017-12-18 23:47:17

by Andrew Donnellan

Subject: Re: [PATCH 11/13] cxl: Remove support for "Processing accelerators" class

On 19/12/17 02:21, Frederic Barrat wrote:
> The cxl driver currently declares in its table of supported PCI
> devices the class "Processing accelerators". Therefore it may be
> called to probe for opencapi devices, which generates errors, as the
> config space of a cxl device is not compatible with opencapi.
>
> So remove support for the generic class, as we now have (at least) two
> drivers for devices of the same class. Most cxl devices are FPGAs with
> a PSL which will show a known device ID of 0x477. Other devices are
> really supported by the cxlflash driver and are already listed in the
> table. So removing the class is expected to go unnoticed.
>
> Signed-off-by: Frederic Barrat <[email protected]>

Acked-by: Andrew Donnellan <[email protected]>

> ---
> drivers/misc/cxl/pci.c | 2 --
> 1 file changed, 2 deletions(-)
>
> diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
> index 19969ee86d6f..758842f65a1b 100644
> --- a/drivers/misc/cxl/pci.c
> +++ b/drivers/misc/cxl/pci.c
> @@ -125,8 +125,6 @@ static const struct pci_device_id cxl_pci_tbl[] = {
> { PCI_DEVICE(PCI_VENDOR_ID_IBM, 0x0601), },
> { PCI_DEVICE(PCI_VENDOR_ID_IBM, 0x0623), },
> { PCI_DEVICE(PCI_VENDOR_ID_IBM, 0x0628), },
> - { PCI_DEVICE_CLASS(0x120000, ~0), },
> -
> { }
> };
> MODULE_DEVICE_TABLE(pci, cxl_pci_tbl);
>

--
Andrew Donnellan OzLabs, ADL Canberra
[email protected] IBM Australia Limited
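Patch 11 relies on how a PCI ID table entry matches a device. Each vendor/device field either equals the device's value or is the wildcard, and the class matches when `(id.class ^ dev.class) & id.class_mask` is zero. `PCI_DEVICE()` leaves the class mask at 0, so the class never filters. `PCI_DEVICE_CLASS(0x120000, ~0)`, however, wildcards vendor and device and matches any device whose full 24-bit class is 0x120000 (processing accelerators). The simplified model below omits subvendor/subdevice, abbreviates the table, and uses a made-up OpenCAPI device ID (0xABCD). It shows why such a device was claimed by cxl before the change and is not after it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ANY_ID 0xffffffffu /* plays the role of PCI_ANY_ID */

/* Simplified stand-in for struct pci_device_id */
struct id_entry {
	uint32_t vendor, device;
	uint32_t class_code, class_mask;
};

struct dev_info {
	uint32_t vendor, device;
	uint32_t class_code; /* 24 bits: base class, subclass, prog-if */
};

static bool id_matches(const struct id_entry *id, const struct dev_info *dev)
{
	return (id->vendor == ANY_ID || id->vendor == dev->vendor) &&
	       (id->device == ANY_ID || id->device == dev->device) &&
	       !((id->class_code ^ dev->class_code) & id->class_mask);
}

static bool table_matches(const struct id_entry *tbl, size_t n,
			  const struct dev_info *dev)
{
	for (size_t i = 0; i < n; i++)
		if (id_matches(&tbl[i], dev))
			return true;
	return false;
}

/* Abbreviated tables; PCI_DEVICE() entries have class_mask == 0 */
static const struct id_entry cxl_tbl_before[] = {
	{ 0x1014, 0x0477, 0, 0 },
	{ 0x1014, 0x0628, 0, 0 },
	{ ANY_ID, ANY_ID, 0x120000, ~0u }, /* PCI_DEVICE_CLASS(0x120000, ~0) */
};

static const struct id_entry cxl_tbl_after[] = {
	{ 0x1014, 0x0477, 0, 0 },
	{ 0x1014, 0x0628, 0, 0 },
};
```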

2017-12-19 00:23:07

by Andrew Donnellan

Subject: Re: [PATCH 00/13] New driver to support OpenCAPI devices on POWER9

On 19/12/17 02:21, Frederic Barrat wrote:
> This series adds support for Open Coherent Accelerator (ocxl) devices
> on POWER9 processor. OpenCAPI is a consortium developing the
> specifications for an interface between processors and accelerators,
> allowing sharing the host memory with the accelerators, using virtual
> addresses.
>
> The OpenCAPI device can also have its own local memory and provide
> access to the host, though it is not supported by this series.
>
> The OpenCAPI specification is processor agnostic, but this series adds
> support specifically for powerpc.
>
> Even though the underlying transport is not PCI, the firmware
> abstracts the hardware like a PCI host bridge and Linux sees the
> OpenCAPI devices as PCI devices. So a lot of existing infrastructure
> and commands can be reused.
>
> Patches 1-5: add the platform-specific services needed by the driver
> Patches 6-10: driver code
> Patch 11: small correction to existing cxl driver
> Patch 12: documentation
>
> Current limitations, that will be addressed in later patches:
> - no capability to trigger a reset of the opencapi adapter
> - no support for the 'wake_host_thread' command
> - no support for adapters with a dual-link connection (none exists yet)
> - no access to the adapter-local memory
>
> Many people contributed directly or indirectly, from the software,
> hardware and bringup teams. In particular Andrew Donnellan and
> Alastair D'Silva, who are developing the related firmware and library.
>
> Feedback welcome!

[+ linux-accelerators]

The corresponding patch series for skiboot/OPAL can be found at:

https://patchwork.ozlabs.org/project/skiboot/list/?series=19043


Andrew

--
Andrew Donnellan OzLabs, ADL Canberra
[email protected] IBM Australia Limited

2017-12-19 00:47:34

by Andrew Donnellan

Subject: Re: [PATCH 03/13] powerpc/powernv: Add opal calls for opencapi

On 19/12/17 02:21, Frederic Barrat wrote:
> Add opal calls to interact with the NPU:
>
> OPAL_NPU_SPA_SETUP: set the Shared Process Area (SPA)
> The SPA is a table containing one entry (Process Element) per memory
> context which can be accessed by the opencapi device.
>
> OPAL_NPU_SPA_CLEAR_CACHE: clear the context cache
> The NPU keeps a cache of recently accessed memory contexts. When a
> Process Element is removed from the SPA, the cache for the link must
> be cleared.
>
> OPAL_NPU_TL_SET: configure the Transaction Layer
> The Transaction Layer specification defines several templates for
> messages to be exchanged on the link. During link setup, the host and
> device must negotiate what templates are supported on both sides and
> at what rates those messages can be sent.
>
>
> Signed-off-by: Frederic Barrat <[email protected]>

Corresponding skiboot patch: https://patchwork.ozlabs.org/patch/849830/

Acked-by: Andrew Donnellan <[email protected]>

> ---
> arch/powerpc/include/asm/opal-api.h | 5 ++++-
> arch/powerpc/include/asm/opal.h | 6 ++++++
> arch/powerpc/platforms/powernv/opal-wrappers.S | 3 +++
> 3 files changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
> index 233c7504b1f2..24c73f5575ee 100644
> --- a/arch/powerpc/include/asm/opal-api.h
> +++ b/arch/powerpc/include/asm/opal-api.h
> @@ -201,7 +201,10 @@
> #define OPAL_SET_POWER_SHIFT_RATIO 155
> #define OPAL_SENSOR_GROUP_CLEAR 156
> #define OPAL_PCI_SET_P2P 157
> -#define OPAL_LAST 157
> +#define OPAL_NPU_SPA_SETUP 159
> +#define OPAL_NPU_SPA_CLEAR_CACHE 160
> +#define OPAL_NPU_TL_SET 161
> +#define OPAL_LAST 161
>
> /* Device tree flags */
>
> diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
> index 0c545f7fc77b..12e70fb58700 100644
> --- a/arch/powerpc/include/asm/opal.h
> +++ b/arch/powerpc/include/asm/opal.h
> @@ -34,6 +34,12 @@ int64_t opal_npu_init_context(uint64_t phb_id, int pasid, uint64_t msr,
> uint64_t bdf);
> int64_t opal_npu_map_lpar(uint64_t phb_id, uint64_t bdf, uint64_t lparid,
> uint64_t lpcr);
> +int64_t opal_npu_spa_setup(uint64_t phb_id, uint32_t bdfn,
> + uint64_t addr, uint64_t PE_mask);
> +int64_t opal_npu_spa_clear_cache(uint64_t phb_id, uint32_t bdfn,
> + uint64_t PE_handle);
> +int64_t opal_npu_tl_set(uint64_t phb_id, uint32_t bdfn, long cap,
> + uint64_t rate_phys, uint32_t size);

The function prototype on the skiboot side has slightly different
parameter names: long capabilities, uint32_t rate_sz.

> int64_t opal_console_write(int64_t term_number, __be64 *length,
> const uint8_t *buffer);
> int64_t opal_console_read(int64_t term_number, __be64 *length,
> diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
> index 6f4b00a2ac46..1b2936ba6040 100644
> --- a/arch/powerpc/platforms/powernv/opal-wrappers.S
> +++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
> @@ -320,3 +320,6 @@ OPAL_CALL(opal_set_powercap, OPAL_SET_POWERCAP);
> OPAL_CALL(opal_get_power_shift_ratio, OPAL_GET_POWER_SHIFT_RATIO);
> OPAL_CALL(opal_set_power_shift_ratio, OPAL_SET_POWER_SHIFT_RATIO);
> OPAL_CALL(opal_sensor_group_clear, OPAL_SENSOR_GROUP_CLEAR);
> +OPAL_CALL(opal_npu_spa_setup, OPAL_NPU_SPA_SETUP);
> +OPAL_CALL(opal_npu_spa_clear_cache, OPAL_NPU_SPA_CLEAR_CACHE);
> +OPAL_CALL(opal_npu_tl_set, OPAL_NPU_TL_SET);
>

--
Andrew Donnellan OzLabs, ADL Canberra
[email protected] IBM Australia Limited
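The OPAL_NPU_TL_SET description says that host and device must agree on which Transaction Layer templates both sides support. One way to picture the template half of that negotiation is a per-direction intersection: whatever one side may send must be something the other side can receive. The sketch below assumes templates are encoded as a bitmask where bit N means "template N supported". It leaves out rates entirely and does not reproduce the real register layout or the OPAL call's arguments. `struct tl_caps` and `tl_negotiate()` are hypothetical.

```c
#include <stdint.h>

/* Hypothetical encoding: bit N set means TL template N is supported */
struct tl_caps {
	uint64_t send; /* templates this side can transmit */
	uint64_t recv; /* templates this side can accept */
};

struct tl_config {
	uint64_t host_to_dev; /* templates the host may use towards the device */
	uint64_t dev_to_host; /* templates the device may use towards the host */
};

static struct tl_config tl_negotiate(struct tl_caps host, struct tl_caps dev)
{
	struct tl_config cfg = {
		.host_to_dev = host.send & dev.recv,
		.dev_to_host = dev.send & host.recv,
	};
	return cfg;
}

/* Number of templates enabled in one direction */
static int tl_template_count(uint64_t mask)
{
	int n = 0;

	while (mask) {
		mask &= mask - 1; /* clear lowest set bit */
		n++;
	}
	return n;
}
```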

2017-12-19 01:15:08

by Andrew Donnellan

Subject: Re: [PATCH 02/13] powerpc/powernv: Set correct configuration space size for opencapi devices

On 19/12/17 02:21, Frederic Barrat wrote:
> From Andrew Donnellan <[email protected]>

Good try :) That should be "From: ..."

git format-patch/send-email will handle this automatically if the commit
author is set correctly, i.e.:

git commit --amend --author="Andrew Donnellan <[email protected]>"

>
> The configuration space for opencapi devices doesn't have a PCI
> Express capability, which confuses Linux into thinking it's of an
> old PCI type with a 256-byte configuration space, instead of the
> desired 4K. So add a PCI fixup to declare the correct size.
>
>
> Signed-off-by: Andrew Donnellan <[email protected]>
> Signed-off-by: Frederic Barrat <[email protected]>
> ---
> arch/powerpc/platforms/powernv/pci-ioda.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index c37b5d288f9c..b8ec76aa266f 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -4079,6 +4079,16 @@ void __init pnv_pci_init_npu2_opencapi_phb(struct device_node *np)
> pnv_pci_init_ioda_phb(np, 0, PNV_PHB_NPU_OCAPI);
> }
>
> +static void pnv_npu2_opencapi_cfg_size_fixup(struct pci_dev *dev)
> +{
> + struct pci_controller *hose = pci_bus_to_host(dev->bus);
> + struct pnv_phb *phb = hose->private_data;
> +
> + if (phb->type == PNV_PHB_NPU_OCAPI)
> + dev->cfg_size = PCI_CFG_SPACE_EXP_SIZE;
> +}
> +DECLARE_PCI_FIXUP_EARLY(PCI_ANY_ID, PCI_ANY_ID, pnv_npu2_opencapi_cfg_size_fixup);
> +
> void __init pnv_pci_init_ioda_hub(struct device_node *np)
> {
> struct device_node *phbn;
>

--
Andrew Donnellan OzLabs, ADL Canberra
[email protected] IBM Australia Limited

2017-12-19 04:05:55

by Benjamin Herrenschmidt

Subject: Re: [PATCH 07/13] ocxl: Add AFU interrupt support

On Mon, 2017-12-18 at 16:21 +0100, Frederic Barrat wrote:
> Add user APIs through ioctl to allocate, free, and be notified of an
> AFU interrupt.
>
> For opencapi, an AFU can trigger an interrupt on the host by sending a
> specific command targeting a 64-bit object handle. On POWER9, this is
> implemented by mapping a special page in the address space of a
> process and a write to that page will trigger an interrupt.

We need to figure out how that plays with KVM. +Cedric..

For all those "generic xive" interrupts, whether they are used for
OpenCAPI, plain guest IPIs, NX interrupts etc... but also for actual
pass-through ones, we'll need a mechanism to map the trigger and ESB
pages into qemu.

We can't have a bazillion VMAs and KVM memory regions either, so we'll
need some kind of mechanism/driver which allows for a single fairly
large mmap'ed VMA which can then be "populated" with interrupt control
pages.

The issue, of course, is that we can't really do a "generic" system that
allows mapping any interrupt; that would be a security issue. So we need
the interrupt "owner" to be the one allowing this: VFIO for PCI, for
example, possibly a specific VFIO variant for OpenCAPI, and something
else for guest IPIs?

Food for thoughts...

Ben.
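The quoted patch keeps two pieces of bookkeeping for AFU interrupts. First, each interrupt is exposed as one page of the context's mmap space starting at `irq_base_offset`, so an ID and its offset convert with a shift by `PAGE_SHIFT`. Second, a per-link budget is taken with `atomic_dec_if_positive()`, which only decrements when the result stays non-negative and always returns the old value minus one. Below is a user-space model of both using C11 atomics. It assumes 64K pages and uses -1 in place of `-ENOSPC`. The helper names are hypothetical and not the driver's.

```c
#include <stdatomic.h>
#include <stdint.h>

#define PAGE_SHIFT 16 /* assumption: 64K pages */

/* Interrupt 'id' is backed by the page at irq_base_offset + id pages */
static uint64_t irq_id_to_offset(uint64_t irq_base_offset, int id)
{
	return irq_base_offset + ((uint64_t)id << PAGE_SHIFT);
}

static int irq_offset_to_id(uint64_t irq_base_offset, uint64_t offset)
{
	return (int)((offset - irq_base_offset) >> PAGE_SHIFT);
}

/* Model of atomic_dec_if_positive(): decrement only if the result is
 * >= 0; returns the old value minus one either way */
static int dec_if_positive(atomic_int *v)
{
	int old = atomic_load(v);

	while (old > 0) {
		/* on failure, 'old' is refreshed with the current value */
		if (atomic_compare_exchange_weak(v, &old, old - 1))
			return old - 1;
	}
	return old - 1;
}

/* Per-link budget, in the spirit of ocxl_link_irq_alloc()/free() */
static int link_irq_reserve(atomic_int *irq_available)
{
	if (dec_if_positive(irq_available) < 0)
		return -1; /* stands in for -ENOSPC */
	return 0;
}

static void link_irq_release(atomic_int *irq_available)
{
	atomic_fetch_add(irq_available, 1);
}
```

The compare-and-swap loop means two racing allocations can never both take the last slot, which is what keeps a single process from draining the link's pool.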

>
> Signed-off-by: Frederic Barrat <[email protected]>
> ---
> arch/powerpc/include/asm/pnv-ocxl.h | 3 +
> arch/powerpc/platforms/powernv/ocxl.c | 30 +++++
> drivers/misc/ocxl/afu_irq.c | 204 ++++++++++++++++++++++++++++++++++
> drivers/misc/ocxl/context.c | 40 ++++++-
> drivers/misc/ocxl/file.c | 33 ++++++
> drivers/misc/ocxl/link.c | 28 +++++
> drivers/misc/ocxl/ocxl_internal.h | 7 ++
> include/uapi/misc/ocxl.h | 9 ++
> 8 files changed, 352 insertions(+), 2 deletions(-)
> create mode 100644 drivers/misc/ocxl/afu_irq.c
>
> diff --git a/arch/powerpc/include/asm/pnv-ocxl.h b/arch/powerpc/include/asm/pnv-ocxl.h
> index 5a7ae7f28209..1e26f0a39500 100644
> --- a/arch/powerpc/include/asm/pnv-ocxl.h
> +++ b/arch/powerpc/include/asm/pnv-ocxl.h
> @@ -37,4 +37,7 @@ extern int pnv_ocxl_spa_setup(struct pci_dev *dev, void *spa_mem, int PE_mask,
> extern void pnv_ocxl_spa_release(void *platform_data);
> extern int pnv_ocxl_spa_remove_pe(void *platform_data, int pe_handle);
>
> +extern int pnv_ocxl_alloc_xive_irq(u32 *irq, u64 *trigger_addr);
> +extern void pnv_ocxl_free_xive_irq(u32 irq);
> +
> #endif /* _ASM_PVN_OCXL_H */
> diff --git a/arch/powerpc/platforms/powernv/ocxl.c b/arch/powerpc/platforms/powernv/ocxl.c
> index 6c79924b95c8..96cafba6aef1 100644
> --- a/arch/powerpc/platforms/powernv/ocxl.c
> +++ b/arch/powerpc/platforms/powernv/ocxl.c
> @@ -9,6 +9,7 @@
>
> #include <asm/pnv-ocxl.h>
> #include <asm/opal.h>
> +#include <asm/xive.h>
> #include <misc/ocxl-config.h>
> #include "pci.h"
>
> @@ -487,3 +488,32 @@ int pnv_ocxl_spa_remove_pe(void *platform_data, int pe_handle)
> return rc;
> }
> EXPORT_SYMBOL_GPL(pnv_ocxl_spa_remove_pe);
> +
> +int pnv_ocxl_alloc_xive_irq(u32 *irq, u64 *trigger_addr)
> +{
> + __be64 flags, trigger_page;
> + s64 rc;
> + u32 hwirq;
> +
> + hwirq = xive_native_alloc_irq();
> + if (!hwirq)
> + return -ENOENT;
> +
> + rc = opal_xive_get_irq_info(hwirq, &flags, NULL, &trigger_page, NULL,
> + NULL);
> + if (rc || !trigger_page) {
> + xive_native_free_irq(hwirq);
> + return -ENOENT;
> + }
> + *irq = hwirq;
> + *trigger_addr = be64_to_cpu(trigger_page);
> + return 0;
> +
> +}
> +EXPORT_SYMBOL_GPL(pnv_ocxl_alloc_xive_irq);
> +
> +void pnv_ocxl_free_xive_irq(u32 irq)
> +{
> + xive_native_free_irq(irq);
> +}
> +EXPORT_SYMBOL_GPL(pnv_ocxl_free_xive_irq);
> diff --git a/drivers/misc/ocxl/afu_irq.c b/drivers/misc/ocxl/afu_irq.c
> new file mode 100644
> index 000000000000..0b217a854837
> --- /dev/null
> +++ b/drivers/misc/ocxl/afu_irq.c
> @@ -0,0 +1,204 @@
> +/*
> + * Copyright 2017 IBM Corp.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or (at your option) any later version.
> + */
> +
> +#include <linux/interrupt.h>
> +#include <linux/eventfd.h>
> +#include <asm/pnv-ocxl.h>
> +#include "ocxl_internal.h"
> +
> +struct afu_irq {
> + int id;
> + int hw_irq;
> + unsigned int virq;
> + char *name;
> + u64 trigger_page;
> + struct eventfd_ctx *ev_ctx;
> +};
> +
> +static int irq_offset_to_id(struct ocxl_context *ctx, u64 offset)
> +{
> + return (offset - ctx->afu->irq_base_offset) >> PAGE_SHIFT;
> +}
> +
> +static u64 irq_id_to_offset(struct ocxl_context *ctx, int id)
> +{
> + return ctx->afu->irq_base_offset + (id << PAGE_SHIFT);
> +}
> +
> +static irqreturn_t afu_irq_handler(int virq, void *data)
> +{
> + struct afu_irq *irq = (struct afu_irq *) data;
> +
> + if (irq->ev_ctx)
> + eventfd_signal(irq->ev_ctx, 1);
> + return IRQ_HANDLED;
> +}
> +
> +static int setup_afu_irq(struct ocxl_context *ctx, struct afu_irq *irq)
> +{
> + int rc;
> +
> + irq->virq = irq_create_mapping(NULL, irq->hw_irq);
> + if (!irq->virq) {
> + pr_err("irq_create_mapping failed\n");
> + return -ENOMEM;
> + }
> + pr_debug("hw_irq %d mapped to virq %u\n", irq->hw_irq, irq->virq);
> +
> + irq->name = kasprintf(GFP_KERNEL, "ocxl-afu-%u", irq->virq);
> + if (!irq->name) {
> + irq_dispose_mapping(irq->virq);
> + return -ENOMEM;
> + }
> +
> + rc = request_irq(irq->virq, afu_irq_handler, 0, irq->name, irq);
> + if (rc) {
> + kfree(irq->name);
> + irq->name = NULL;
> + irq_dispose_mapping(irq->virq);
> + pr_err("request_irq failed: %d\n", rc);
> + return rc;
> + }
> + return 0;
> +}
> +
> +static void release_afu_irq(struct afu_irq *irq)
> +{
> + free_irq(irq->virq, irq);
> + irq_dispose_mapping(irq->virq);
> + kfree(irq->name);
> +}
> +
> +int ocxl_afu_irq_alloc(struct ocxl_context *ctx, u64 *irq_offset)
> +{
> + struct afu_irq *irq;
> + int rc;
> +
> + irq = kzalloc(sizeof(struct afu_irq), GFP_KERNEL);
> + if (!irq)
> + return -ENOMEM;
> +
> + /*
> + * We limit the number of afu irqs per context and per link to
> + * avoid a single process or user depleting the pool of IPIs
> + */
> +
> + mutex_lock(&ctx->irq_lock);
> +
> + irq->id = idr_alloc(&ctx->irq_idr, irq, 0, MAX_IRQ_PER_CONTEXT,
> + GFP_KERNEL);
> + if (irq->id < 0) {
> + rc = -ENOSPC;
> + goto err_unlock;
> + }
> +
> + rc = ocxl_link_irq_alloc(ctx->afu->fn->link, &irq->hw_irq,
> + &irq->trigger_page);
> + if (rc)
> + goto err_idr;
> +
> + rc = setup_afu_irq(ctx, irq);
> + if (rc)
> + goto err_alloc;
> +
> + *irq_offset = irq_id_to_offset(ctx, irq->id);
> +
> + mutex_unlock(&ctx->irq_lock);
> + return 0;
> +
> +err_alloc:
> + ocxl_link_free_irq(ctx->afu->fn->link, irq->hw_irq);
> +err_idr:
> + idr_remove(&ctx->irq_idr, irq->id);
> +err_unlock:
> + mutex_unlock(&ctx->irq_lock);
> + kfree(irq);
> + return rc;
> +}
> +
> +static void afu_irq_free(struct afu_irq *irq, struct ocxl_context *ctx)
> +{
> + if (ctx->mapping)
> + unmap_mapping_range(ctx->mapping,
> + irq_id_to_offset(ctx, irq->id),
> + 1 << PAGE_SHIFT, 1);
> + release_afu_irq(irq);
> + if (irq->ev_ctx)
> + eventfd_ctx_put(irq->ev_ctx);
> + ocxl_link_free_irq(ctx->afu->fn->link, irq->hw_irq);
> + kfree(irq);
> +}
> +
> +int ocxl_afu_irq_free(struct ocxl_context *ctx, u64 irq_offset)
> +{
> + struct afu_irq *irq;
> + int id = irq_offset_to_id(ctx, irq_offset);
> +
> + mutex_lock(&ctx->irq_lock);
> +
> + irq = idr_find(&ctx->irq_idr, id);
> + if (!irq) {
> + mutex_unlock(&ctx->irq_lock);
> + return -EINVAL;
> + }
> + idr_remove(&ctx->irq_idr, irq->id);
> + afu_irq_free(irq, ctx);
> + mutex_unlock(&ctx->irq_lock);
> + return 0;
> +}
> +
> +void ocxl_afu_irq_free_all(struct ocxl_context *ctx)
> +{
> + struct afu_irq *irq;
> + int id;
> +
> + mutex_lock(&ctx->irq_lock);
> + idr_for_each_entry(&ctx->irq_idr, irq, id)
> + afu_irq_free(irq, ctx);
> + mutex_unlock(&ctx->irq_lock);
> +}
> +
> +int ocxl_afu_irq_set_fd(struct ocxl_context *ctx, u64 irq_offset, int eventfd)
> +{
> + struct afu_irq *irq;
> + struct eventfd_ctx *ev_ctx;
> + int rc = 0, id = irq_offset_to_id(ctx, irq_offset);
> +
> + mutex_lock(&ctx->irq_lock);
> + irq = idr_find(&ctx->irq_idr, id);
> + if (!irq) {
> + rc = -EINVAL;
> + goto unlock;
> + }
> +
> + ev_ctx = eventfd_ctx_fdget(eventfd);
> + if (IS_ERR(ev_ctx)) {
> + rc = -EINVAL;
> + goto unlock;
> + }
> +
> + irq->ev_ctx = ev_ctx;
> +unlock:
> + mutex_unlock(&ctx->irq_lock);
> + return rc;
> +}
> +
> +u64 ocxl_afu_irq_get_addr(struct ocxl_context *ctx, u64 irq_offset)
> +{
> + struct afu_irq *irq;
> + int id = irq_offset_to_id(ctx, irq_offset);
> + u64 addr = 0;
> +
> + mutex_lock(&ctx->irq_lock);
> + irq = idr_find(&ctx->irq_idr, id);
> + if (irq)
> + addr = irq->trigger_page;
> + mutex_unlock(&ctx->irq_lock);
> + return addr;
> +}
> diff --git a/drivers/misc/ocxl/context.c b/drivers/misc/ocxl/context.c
> index 0bc0dd97d784..19575269ed22 100644
> --- a/drivers/misc/ocxl/context.c
> +++ b/drivers/misc/ocxl/context.c
> @@ -38,6 +38,8 @@ int ocxl_context_init(struct ocxl_context *ctx, struct ocxl_afu *afu,
> mutex_init(&ctx->mapping_lock);
> init_waitqueue_head(&ctx->events_wq);
> mutex_init(&ctx->xsl_error_lock);
> + mutex_init(&ctx->irq_lock);
> + idr_init(&ctx->irq_idr);
> /*
> * Keep a reference on the AFU to make sure it's valid for the
> * duration of the life of the context
> @@ -87,6 +89,19 @@ int ocxl_context_attach(struct ocxl_context *ctx, u64 amr)
> return rc;
> }
>
> +static int map_afu_irq(struct vm_area_struct *vma, unsigned long address,
> + u64 offset, struct ocxl_context *ctx)
> +{
> + u64 trigger_addr;
> +
> + trigger_addr = ocxl_afu_irq_get_addr(ctx, offset);
> + if (!trigger_addr)
> + return VM_FAULT_SIGBUS;
> +
> + vm_insert_pfn(vma, address, trigger_addr >> PAGE_SHIFT);
> + return VM_FAULT_NOPAGE;
> +}
> +
> static int map_pp_mmio(struct vm_area_struct *vma, unsigned long address,
> u64 offset, struct ocxl_context *ctx)
> {
> @@ -125,7 +140,10 @@ static int ocxl_mmap_fault(struct vm_fault *vmf)
> pr_debug("%s: pasid %d address 0x%lx offset 0x%llx\n", __func__,
> ctx->pasid, vmf->address, offset);
>
> - rc = map_pp_mmio(vma, vmf->address, offset, ctx);
> + if (offset < ctx->afu->irq_base_offset)
> + rc = map_pp_mmio(vma, vmf->address, offset, ctx);
> + else
> + rc = map_afu_irq(vma, vmf->address, offset, ctx);
> return rc;
> }
>
> @@ -133,6 +151,19 @@ static const struct vm_operations_struct ocxl_vmops = {
> .fault = ocxl_mmap_fault,
> };
>
> +static int check_mmap_afu_irq(struct ocxl_context *ctx,
> + struct vm_area_struct *vma)
> +{
> + /* only one page */
> + if (vma_pages(vma) != 1)
> + return -EINVAL;
> +
> + /* check offset validity */
> + if (!ocxl_afu_irq_get_addr(ctx, vma->vm_pgoff << PAGE_SHIFT))
> + return -EINVAL;
> + return 0;
> +}
> +
> static int check_mmap_mmio(struct ocxl_context *ctx,
> struct vm_area_struct *vma)
> {
> @@ -146,7 +177,10 @@ int ocxl_context_mmap(struct ocxl_context *ctx, struct vm_area_struct *vma)
> {
> int rc;
>
> - rc = check_mmap_mmio(ctx, vma);
> + if ((vma->vm_pgoff << PAGE_SHIFT) < ctx->afu->irq_base_offset)
> + rc = check_mmap_mmio(ctx, vma);
> + else
> + rc = check_mmap_afu_irq(ctx, vma);
> if (rc)
> return rc;
>
> @@ -231,6 +265,8 @@ void ocxl_context_free(struct ocxl_context *ctx)
> idr_remove(&ctx->afu->contexts_idr, ctx->pasid);
> mutex_unlock(&ctx->afu->contexts_lock);
>
> + ocxl_afu_irq_free_all(ctx);
> + idr_destroy(&ctx->irq_idr);
> /* reference to the AFU taken in ocxl_context_init */
> ocxl_afu_put(ctx->afu);
> kfree(ctx);
> diff --git a/drivers/misc/ocxl/file.c b/drivers/misc/ocxl/file.c
> index a51386eff4f5..0a73e2c11ba6 100644
> --- a/drivers/misc/ocxl/file.c
> +++ b/drivers/misc/ocxl/file.c
> @@ -110,12 +110,17 @@ static long afu_ioctl_attach(struct ocxl_context *ctx,
> }
>
> #define CMD_STR(x) (x == OCXL_IOCTL_ATTACH ? "ATTACH" : \
> + x == OCXL_IOCTL_IRQ_ALLOC ? "IRQ_ALLOC" : \
> + x == OCXL_IOCTL_IRQ_FREE ? "IRQ_FREE" : \
> + x == OCXL_IOCTL_IRQ_SET_FD ? "IRQ_SET_FD" : \
> "UNKNOWN")
>
> static long afu_ioctl(struct file *file, unsigned int cmd,
> unsigned long args)
> {
> struct ocxl_context *ctx = file->private_data;
> + struct ocxl_ioctl_irq_fd irq_fd;
> + u64 irq_offset;
> long rc;
>
> pr_debug("%s for context %d, command %s\n", __func__, ctx->pasid,
> @@ -130,6 +135,34 @@ static long afu_ioctl(struct file *file, unsigned int cmd,
> (struct ocxl_ioctl_attach __user *) args);
> break;
>
> + case OCXL_IOCTL_IRQ_ALLOC:
> + rc = ocxl_afu_irq_alloc(ctx, &irq_offset);
> + if (!rc) {
> + rc = copy_to_user((u64 *) args, &irq_offset,
> + sizeof(irq_offset));
> + if (rc)
> + ocxl_afu_irq_free(ctx, irq_offset);
> + }
> + break;
> +
> + case OCXL_IOCTL_IRQ_FREE:
> + rc = copy_from_user(&irq_offset, (u64 *) args,
> + sizeof(irq_offset));
> + if (rc)
> + return -EFAULT;
> + rc = ocxl_afu_irq_free(ctx, irq_offset);
> + break;
> +
> + case OCXL_IOCTL_IRQ_SET_FD:
> + rc = copy_from_user(&irq_fd, (u64 *) args, sizeof(irq_fd));
> + if (rc)
> + return -EFAULT;
> + if (irq_fd.reserved)
> + return -EINVAL;
> + rc = ocxl_afu_irq_set_fd(ctx, irq_fd.irq_offset,
> + irq_fd.eventfd);
> + break;
> +
> default:
> rc = -EINVAL;
> }
> diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
> index 6b184cd7d2a6..5f12564eea99 100644
> --- a/drivers/misc/ocxl/link.c
> +++ b/drivers/misc/ocxl/link.c
> @@ -608,3 +608,31 @@ int ocxl_link_remove_pe(void *link_handle, int pasid)
> mutex_unlock(&spa->spa_lock);
> return rc;
> }
> +
> +int ocxl_link_irq_alloc(void *link_handle, int *hw_irq, u64 *trigger_addr)
> +{
> + struct link *link = (struct link *) link_handle;
> + int rc, irq;
> + u64 addr;
> +
> + if (atomic_dec_if_positive(&link->irq_available) < 0)
> + return -ENOSPC;
> +
> + rc = pnv_ocxl_alloc_xive_irq(&irq, &addr);
> + if (rc) {
> + atomic_inc(&link->irq_available);
> + return rc;
> + }
> +
> + *hw_irq = irq;
> + *trigger_addr = addr;
> + return 0;
> +}
> +
> +void ocxl_link_free_irq(void *link_handle, int hw_irq)
> +{
> + struct link *link = (struct link *) link_handle;
> +
> + pnv_ocxl_free_xive_irq(hw_irq);
> + atomic_inc(&link->irq_available);
> +}
> diff --git a/drivers/misc/ocxl/ocxl_internal.h b/drivers/misc/ocxl/ocxl_internal.h
> index e07f7d523275..829369c5f004 100644
> --- a/drivers/misc/ocxl/ocxl_internal.h
> +++ b/drivers/misc/ocxl/ocxl_internal.h
> @@ -197,4 +197,11 @@ extern void ocxl_context_free(struct ocxl_context *ctx);
> extern int ocxl_sysfs_add_afu(struct ocxl_afu *afu);
> extern void ocxl_sysfs_remove_afu(struct ocxl_afu *afu);
>
> +extern int ocxl_afu_irq_alloc(struct ocxl_context *ctx, u64 *irq_offset);
> +extern int ocxl_afu_irq_free(struct ocxl_context *ctx, u64 irq_offset);
> +extern void ocxl_afu_irq_free_all(struct ocxl_context *ctx);
> +extern int ocxl_afu_irq_set_fd(struct ocxl_context *ctx, u64 irq_offset,
> + int eventfd);
> +extern u64 ocxl_afu_irq_get_addr(struct ocxl_context *ctx, u64 irq_offset);
> +
> #endif /* _OCXL_INTERNAL_H_ */
> diff --git a/include/uapi/misc/ocxl.h b/include/uapi/misc/ocxl.h
> index 71fa387f2efd..488e75228c33 100644
> --- a/include/uapi/misc/ocxl.h
> +++ b/include/uapi/misc/ocxl.h
> @@ -39,9 +39,18 @@ struct ocxl_ioctl_attach {
> __u64 reserved3;
> };
>
> +struct ocxl_ioctl_irq_fd {
> + __u64 irq_offset;
> + __s32 eventfd;
> + __u32 reserved;
> +};
> +
> /* ioctl numbers */
> #define OCXL_MAGIC 0xCA
> /* AFU devices */
> #define OCXL_IOCTL_ATTACH _IOW(OCXL_MAGIC, 0x10, struct ocxl_ioctl_attach)
> +#define OCXL_IOCTL_IRQ_ALLOC _IOR(OCXL_MAGIC, 0x11, __u64)
> +#define OCXL_IOCTL_IRQ_FREE _IOW(OCXL_MAGIC, 0x12, __u64)
> +#define OCXL_IOCTL_IRQ_SET_FD _IOW(OCXL_MAGIC, 0x13, struct ocxl_ioctl_irq_fd)
>
> #endif /* _UAPI_MISC_OCXL_H */
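From user space, the three new ioctls would presumably be used in sequence: allocate an AFU interrupt, then bind an eventfd to it. The sketch below makes a few assumptions. It takes an already opened and attached AFU file descriptor (`afu_fd`). It copies the struct and ioctl numbers from the UAPI header in this patch instead of including `<misc/ocxl.h>`, which may not be installed. The helper name `ocxl_irq_setup` is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/ioctl.h>
#include <linux/types.h>

/* Copied from the proposed include/uapi/misc/ocxl.h in this patch. */
struct ocxl_ioctl_irq_fd {
	__u64 irq_offset;
	__s32 eventfd;
	__u32 reserved;
};

#define OCXL_MAGIC 0xCA
#define OCXL_IOCTL_IRQ_ALLOC _IOR(OCXL_MAGIC, 0x11, __u64)
#define OCXL_IOCTL_IRQ_FREE _IOW(OCXL_MAGIC, 0x12, __u64)
#define OCXL_IOCTL_IRQ_SET_FD _IOW(OCXL_MAGIC, 0x13, struct ocxl_ioctl_irq_fd)

/*
 * Hypothetical helper: allocate one AFU interrupt on afu_fd and bind a
 * new eventfd to it. On success, returns the eventfd and stores the
 * interrupt offset in *irq_offset. On failure, returns -1 with errno set
 * and releases anything it allocated.
 */
static int ocxl_irq_setup(int afu_fd, uint64_t *irq_offset)
{
	struct ocxl_ioctl_irq_fd irq_fd = { 0 }; /* driver rejects nonzero reserved */
	__u64 offset;
	int efd, saved;

	if (ioctl(afu_fd, OCXL_IOCTL_IRQ_ALLOC, &offset) < 0)
		return -1;

	efd = eventfd(0, EFD_CLOEXEC);
	if (efd < 0)
		goto err_free;

	irq_fd.irq_offset = offset;
	irq_fd.eventfd = efd;
	if (ioctl(afu_fd, OCXL_IOCTL_IRQ_SET_FD, &irq_fd) < 0)
		goto err_close;

	*irq_offset = offset;
	return efd;

err_close:
	saved = errno;
	close(efd);
	errno = saved;
err_free:
	saved = errno;
	ioctl(afu_fd, OCXL_IOCTL_IRQ_FREE, &offset); /* best effort */
	errno = saved;
	return -1;
}
```

Once the interrupt is bound, the eventfd can be read() or poll()ed. Under standard eventfd semantics, an 8-byte read() returns the number of signals accumulated since the last read, presumably one per AFU interrupt.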

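The mmap change in ocxl_context_mmap() splits a context's file offset space at `ctx->afu->irq_base_offset`. Pages below it go through check_mmap_mmio(), and pages at or above it go through check_mmap_afu_irq(). The code that converts an interrupt ID to its offset lives in afu_irq.c, which is not part of this excerpt. The user-space model below therefore assumes one page per interrupt, starting at irq_base_offset. The page size, the base value and every name in it are hypothetical. Under that assumption, the offset returned by OCXL_IOCTL_IRQ_ALLOC could be passed directly as the mmap() offset.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page size for illustration only (e.g. a 64K-page kernel). */
#define MODEL_PAGE_SHIFT 16

enum model_target { TARGET_MMIO, TARGET_AFU_IRQ };

struct model_afu {
	uint64_t irq_base_offset; /* first byte of the interrupt part of the offset space */
};

/* Mirrors the test in ocxl_context_mmap(): pgoff is counted in pages. */
static enum model_target classify_pgoff(const struct model_afu *afu, uint64_t pgoff)
{
	return (pgoff << MODEL_PAGE_SHIFT) < afu->irq_base_offset ?
		TARGET_MMIO : TARGET_AFU_IRQ;
}

/* Assumed encoding: interrupt N occupies the Nth page after irq_base_offset. */
static uint64_t irq_id_to_offset(const struct model_afu *afu, int id)
{
	return afu->irq_base_offset + ((uint64_t)id << MODEL_PAGE_SHIFT);
}

static int irq_offset_to_id(const struct model_afu *afu, uint64_t offset)
{
	return (int)((offset - afu->irq_base_offset) >> MODEL_PAGE_SHIFT);
}
```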
2017-12-19 14:39:40

by Frederic Barrat

Subject: Re: [PATCH 13/13] ocxl: add MAINTAINERS entry



On 18/12/2017 at 17:04, Joe Perches wrote:
>> +OCXL (Open Coherent Accelerator Processor Interface OpenCAPI) DRIVER
>> +M: Frederic Barrat<[email protected]>
>> +M: Andrew Donnellan<[email protected]>
>> +L: [email protected]
>> +S: Supported
>> +F: arch/powerpc/platforms/powernv/ocxl.c
>> +F: arch/powerpc/include/asm/pnv-ocxl.h
>> +F: drivers/misc/ocxl/
>> +F: include/misc/ocxl*
>> +F: include/uapi/misc/ocxl.h
>> +F: Documentation/accelerators/ocxl.txt
>> +
> Alphabetic ordering by section header please...
>
> Maybe CXL - OCXL ...


Sure, I'll fix it. Thanks!

Fred

2017-12-20 12:01:22

by Frederic Barrat

Subject: Re: [PATCH 09/13] ocxl: Add trace points



On 18/12/2017 at 17:48, Philippe Ombredanne wrote:
>> --- /dev/null
>> +++ b/drivers/misc/ocxl/trace.h
>> @@ -0,0 +1,189 @@
>> +/*
>> + * Copyright 2017 IBM Corp.
>> + *
>> + * This program is free software; you can redistribute it and/or
>> + * modify it under the terms of the GNU General Public License
>> + * as published by the Free Software Foundation; either version
>> + * 2 of the License, or (at your option) any later version.
>> + */
> Would you mind using the new SPDX tags documented in Thomas patch set
> [1] rather than this legalese?

ok, it will be in the next revision. Thanks!

Fred
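This is a sketch of what the SPDX request amounts to. The kernel's license rules put the identifier on the first line of the file, as a `//` comment in .c files and a `/* */` comment in headers such as trace.h. The boilerplate quoted above (GPL version 2, "or (at your option) any later version") corresponds to the identifier `GPL-2.0+`, equivalently `GPL-2.0-or-later`. The top of trace.h could then shrink to something like:

```c
/* SPDX-License-Identifier: GPL-2.0+ */
/*
 * Copyright 2017 IBM Corp.
 */
```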

2018-01-03 03:53:51

by Andrew Donnellan

Subject: Re: [PATCH 01/13] powerpc/powernv: Introduce new PHB type for opencapi links

On 19/12/17 02:21, Frederic Barrat wrote:
> The NPU was already abstracted by opal as a virtual PHB for nvlink,
> but it helps to be able to differentiate between a nvlink or opencapi
> PHB, as it's not completely transparent to linux. In particular, PE
> assignment differs and we'll also need the information in later
> patches.
>
> So rename existing PNV_PHB_NPU type to PNV_PHB_NPU_NVLINK and add a
> new type PNV_PHB_NPU_OCAPI.
>
> Signed-off-by: Frederic Barrat <[email protected]>
> Signed-off-by: Andrew Donnellan <[email protected]>
> ---
> arch/powerpc/platforms/powernv/npu-dma.c | 2 +-
> arch/powerpc/platforms/powernv/pci-ioda.c | 46 +++++++++++++++++++++++--------
> arch/powerpc/platforms/powernv/pci.c | 4 +++
> arch/powerpc/platforms/powernv/pci.h | 8 ++++--
> 4 files changed, 45 insertions(+), 15 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
> index f6cbc1a71472..c5899c107d59 100644
> --- a/arch/powerpc/platforms/powernv/npu-dma.c
> +++ b/arch/powerpc/platforms/powernv/npu-dma.c
> @@ -277,7 +277,7 @@ static int pnv_npu_dma_set_bypass(struct pnv_ioda_pe *npe)
> int64_t rc = 0;
> phys_addr_t top = memblock_end_of_DRAM();
>
> - if (phb->type != PNV_PHB_NPU || !npe->pdev)
> + if (phb->type != PNV_PHB_NPU_NVLINK || !npe->pdev)
> return -EINVAL;
>
> rc = pnv_npu_unset_window(npe, 0);
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 749055553064..c37b5d288f9c 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -54,7 +54,8 @@
> #define POWERNV_IOMMU_DEFAULT_LEVELS 1
> #define POWERNV_IOMMU_MAX_LEVELS 5
>
> -static const char * const pnv_phb_names[] = { "IODA1", "IODA2", "NPU" };
> +static const char * const pnv_phb_names[] = { "IODA1", "IODA2", "NPU_NVLINK",
> + "NPU_OCAPI" };
> static void pnv_pci_ioda2_table_free_pages(struct iommu_table *tbl);
>
> void pe_level_printk(const struct pnv_ioda_pe *pe, const char *level,
> @@ -924,7 +925,7 @@ static int pnv_ioda_configure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
> * Configure PELTV. NPUs don't have a PELTV table so skip
> * configuration on them.
> */
> - if (phb->type != PNV_PHB_NPU)
> + if (phb->type != PNV_PHB_NPU_NVLINK && phb->type != PNV_PHB_NPU_OCAPI)
> pnv_ioda_set_peltv(phb, pe, true);
>
> /* Setup reverse map */
> @@ -1260,12 +1261,13 @@ static struct pnv_ioda_pe *pnv_ioda_setup_npu_PE(struct pci_dev *npu_pdev)
> return pe;
> }
>
> -static void pnv_ioda_setup_npu_PEs(struct pci_bus *bus)
> +static void pnv_ioda_setup_npu_PEs(struct pci_bus *bus,
> + struct pnv_ioda_pe *fn(struct pci_dev *npu_pdev))
> {
> struct pci_dev *pdev;
>
> list_for_each_entry(pdev, &bus->devices, bus_list)
> - pnv_ioda_setup_npu_PE(pdev);
> + fn(pdev);
> }

I think adding a function pointer here is rather ugly; at this point you
might as well just do this directly in pnv_pci_ioda_setup_PEs().
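One reading of this suggestion is to drop the helper and open-code the device loop for each PHB type inside pnv_pci_ioda_setup_PEs(). The user-space model below sketches that shape. The types and the two setup functions are simplified, hypothetical stand-ins for struct pnv_phb, pnv_ioda_setup_npu_PE() and pnv_ioda_setup_dev_PE(). The PE numbers they assign are made up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the powernv PHB types and devices. */
enum model_phb_type { MODEL_PHB_IODA2, MODEL_PHB_NPU_NVLINK, MODEL_PHB_NPU_OCAPI };

struct model_dev {
	int id;
	int pe;			/* PE number assigned by setup, -1 if none */
};

struct model_phb {
	enum model_phb_type type;
	struct model_dev *devs;
	size_t ndevs;
	int pe0_reserved;	/* nvlink reserves PE#0 for error reporting */
};

/* Placeholders for pnv_ioda_setup_npu_PE() and pnv_ioda_setup_dev_PE(). */
static void model_setup_npu_pe(struct model_dev *d) { d->pe = 100 + d->id; }
static void model_setup_dev_pe(struct model_dev *d) { d->pe = 200 + d->id; }

/* Each PHB type gets its own explicit loop; no function pointer is passed. */
static void model_setup_pes(struct model_phb *phb)
{
	size_t i;

	switch (phb->type) {
	case MODEL_PHB_NPU_NVLINK:
		phb->pe0_reserved = 1;
		for (i = 0; i < phb->ndevs; i++)
			model_setup_npu_pe(&phb->devs[i]);
		break;
	case MODEL_PHB_NPU_OCAPI:
		for (i = 0; i < phb->ndevs; i++)
			model_setup_dev_pe(&phb->devs[i]);
		break;
	default:
		/* Other PHB types are left untouched in this model. */
		break;
	}
}
```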

>
> static void pnv_pci_ioda_setup_PEs(void)
> @@ -1275,13 +1277,18 @@ static void pnv_pci_ioda_setup_PEs(void)
>
> list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
> phb = hose->private_data;
> - if (phb->type == PNV_PHB_NPU) {
> + if (phb->type == PNV_PHB_NPU_NVLINK) {
> /* PE#0 is needed for error reporting */
> pnv_ioda_reserve_pe(phb, 0);
> - pnv_ioda_setup_npu_PEs(hose->bus);
> + pnv_ioda_setup_npu_PEs(hose->bus,
> + pnv_ioda_setup_npu_PE);
> if (phb->model == PNV_PHB_MODEL_NPU2)
> pnv_npu2_init(phb);
> }
> + if (phb->type == PNV_PHB_NPU_OCAPI) {
> + pnv_ioda_setup_npu_PEs(hose->bus,
> + pnv_ioda_setup_dev_PE);
> + }
> }
> }
>

--
Andrew Donnellan OzLabs, ADL Canberra
[email protected] IBM Australia Limited

2018-01-03 05:48:25

by Andrew Donnellan

Subject: Re: [PATCH 10/13] ocxl: Add Makefile and Kconfig

On 19/12/17 02:21, Frederic Barrat wrote:
> OCXL_BASE triggers the platform support needed by the driver.
>
> Signed-off-by: Frederic Barrat <[email protected]>
> ---
> drivers/misc/Kconfig | 1 +
> drivers/misc/Makefile | 1 +
> drivers/misc/ocxl/Kconfig | 25 +++++++++++++++++++++++++
> drivers/misc/ocxl/Makefile | 10 ++++++++++
> 4 files changed, 37 insertions(+)
> create mode 100644 drivers/misc/ocxl/Kconfig
> create mode 100644 drivers/misc/ocxl/Makefile
>
> diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
> index f1a5c2357b14..0534f338c84a 100644
> --- a/drivers/misc/Kconfig
> +++ b/drivers/misc/Kconfig
> @@ -508,4 +508,5 @@ source "drivers/misc/mic/Kconfig"
> source "drivers/misc/genwqe/Kconfig"
> source "drivers/misc/echo/Kconfig"
> source "drivers/misc/cxl/Kconfig"
> +source "drivers/misc/ocxl/Kconfig"
> endmenu
> diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
> index 5ca5f64df478..73326d54e246 100644
> --- a/drivers/misc/Makefile
> +++ b/drivers/misc/Makefile
> @@ -55,6 +55,7 @@ obj-$(CONFIG_CXL_BASE) += cxl/
> obj-$(CONFIG_ASPEED_LPC_CTRL) += aspeed-lpc-ctrl.o
> obj-$(CONFIG_ASPEED_LPC_SNOOP) += aspeed-lpc-snoop.o
> obj-$(CONFIG_PCI_ENDPOINT_TEST) += pci_endpoint_test.o
> +obj-$(CONFIG_OCXL) += ocxl/
>
> lkdtm-$(CONFIG_LKDTM) += lkdtm_core.o
> lkdtm-$(CONFIG_LKDTM) += lkdtm_bugs.o
> diff --git a/drivers/misc/ocxl/Kconfig b/drivers/misc/ocxl/Kconfig
> new file mode 100644
> index 000000000000..4496b61f48db
> --- /dev/null
> +++ b/drivers/misc/ocxl/Kconfig
> @@ -0,0 +1,25 @@
> +#
> +# Open Coherent Accelerator (OCXL) compatible devices
> +#
> +
> +config OCXL_BASE
> + bool
> + default n
> + select PPC_COPRO_BASE
> +
> +config OCXL
> + tristate "Support for Open Coherent Accelerators (OCXL)"
> + depends on PPC_POWERNV && PCI && EEH
> + select OCXL_BASE
> + default m
> + help
> +
> + Select this option to enable driver support for Open
> + Coherent Accelerators (OCXL). OCXL is otherwise known as
> + Open Coherent Accelerator Processor Interface (OCAPI).
> + OCAPI allows accelerators in FPGAs to be coherently attached
> + to a CPU through a Open CAPI link. This driver enables
> + userspace programs to access these accelerators through
> + devices found in /dev/ocxl/

I'd prefer more consistency in how we refer to OpenCAPI. "ocxl" is a
driver name that we have purely for historical reasons; it's not really
the name of anything else. I know we use "OCAPI" a lot throughout the
various specs and code, but that's not really an abbreviation that should
be user-facing.

Something like:

config OCXL
	tristate "OpenCAPI coherent accelerator support"
	help
	  Select this option to enable the ocxl driver for Open Coherent
	  Accelerator Processor Interface (OpenCAPI) devices.

	  OpenCAPI allows FPGA and ASIC accelerators to be coherently
	  attached to a CPU over an OpenCAPI link.

	  The ocxl driver enables userspace programs to access these
	  accelerators through devices in /dev/ocxl/.

	  For more information, see http://opencapi.org.

	  If unsure, say N.

> +
> + If unsure, say N.
> diff --git a/drivers/misc/ocxl/Makefile b/drivers/misc/ocxl/Makefile
> new file mode 100644
> index 000000000000..f75853411cfd
> --- /dev/null
> +++ b/drivers/misc/ocxl/Makefile
> @@ -0,0 +1,10 @@
> +ccflags-$(CONFIG_PPC_WERROR) += -Werror
> +
> +ocxl-y += main.o pci.o config.o file.o pasid.o
> +ocxl-y += link.o context.o afu_irq.o sysfs.o trace.o
> +obj-$(CONFIG_OCXL) += ocxl.o
> +
> +# For tracepoints to include our trace.h from tracepoint infrastructure:
> +CFLAGS_trace.o := -I$(src)
> +
> +# ccflags-y += -DDEBUG
>


2018-01-03 07:31:13

by Andrew Donnellan

Subject: Re: [PATCH 06/13] ocxl: Driver code for 'generic' opencapi devices

On 19/12/17 02:21, Frederic Barrat wrote:
> Add an ocxl driver to handle generic opencapi devices. Of course, it's
> not meant to be the only opencapi driver, any device is free to
> implement its own. But if a host application only needs basic services
> like attaching to an opencapi adapter, have translation faults handled
> or allocate AFU interrupts, it should suffice.
>
> The AFU config space must follow the opencapi specification and use
> the expected vendor/device ID to be seen by the generic driver.
>
> The driver exposes the device AFUs as a char device in /dev/ocxl/
>
> Note that the driver currently doesn't handle memory attached to the
> opencapi device.
>
> Signed-off-by: Frederic Barrat <[email protected]>
> Signed-off-by: Andrew Donnellan <[email protected]>
> Signed-off-by: Alastair D'Silva <[email protected]>

There are a bunch of sparse warnings we should look at (and a few more
that appear in later patches too).

> ---
> drivers/misc/ocxl/config.c | 718 ++++++++++++++++++++++++++++++++++++++
> drivers/misc/ocxl/context.c | 237 +++++++++++++
> drivers/misc/ocxl/file.c | 405 +++++++++++++++++++++
> drivers/misc/ocxl/link.c | 610 ++++++++++++++++++++++++++++++++
> drivers/misc/ocxl/main.c | 40 +++
> drivers/misc/ocxl/ocxl_internal.h | 200 +++++++++++
> drivers/misc/ocxl/pasid.c | 114 ++++++
> drivers/misc/ocxl/pci.c | 592 +++++++++++++++++++++++++++++++
> drivers/misc/ocxl/sysfs.c | 150 ++++++++
> include/uapi/misc/ocxl.h | 47 +++
> 10 files changed, 3113 insertions(+)
> create mode 100644 drivers/misc/ocxl/config.c
> create mode 100644 drivers/misc/ocxl/context.c
> create mode 100644 drivers/misc/ocxl/file.c
> create mode 100644 drivers/misc/ocxl/link.c
> create mode 100644 drivers/misc/ocxl/main.c
> create mode 100644 drivers/misc/ocxl/ocxl_internal.h
> create mode 100644 drivers/misc/ocxl/pasid.c
> create mode 100644 drivers/misc/ocxl/pci.c
> create mode 100644 drivers/misc/ocxl/sysfs.c
> create mode 100644 include/uapi/misc/ocxl.h
>
> diff --git a/drivers/misc/ocxl/config.c b/drivers/misc/ocxl/config.c
> new file mode 100644
> index 000000000000..bb2fde5967e2
> --- /dev/null
> +++ b/drivers/misc/ocxl/config.c
> @@ -0,0 +1,718 @@
> +/*
> + * Copyright 2017 IBM Corp.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or (at your option) any later version.
> + */
> +
> +#include <linux/pci.h>
> +#include <asm/pnv-ocxl.h>
> +#include <misc/ocxl-config.h>
> +#include "ocxl_internal.h"
> +
> +#define EXTRACT_BIT(val, bit) (!!(val & BIT(bit)))
> +#define EXTRACT_BITS(val, s, e) ((val & GENMASK(e, s)) >> s)
> +
> +#define OCXL_DVSEC_AFU_IDX_MASK GENMASK(5, 0)
> +#define OCXL_DVSEC_ACTAG_MASK GENMASK(11, 0)
> +#define OCXL_DVSEC_PASID_MASK GENMASK(19, 0)
> +#define OCXL_DVSEC_PASID_LOG_MASK GENMASK(4, 0)
> +
> +#define OCXL_DVSEC_TEMPL_VERSION 0x0
> +#define OCXL_DVSEC_TEMPL_NAME 0x4
> +#define OCXL_DVSEC_TEMPL_AFU_VERSION 0x1C
> +#define OCXL_DVSEC_TEMPL_MMIO_GLOBAL 0x20
> +#define OCXL_DVSEC_TEMPL_MMIO_GLOBAL_SZ 0x28
> +#define OCXL_DVSEC_TEMPL_MMIO_PP 0x30
> +#define OCXL_DVSEC_TEMPL_MMIO_PP_SZ 0x38
> +#define OCXL_DVSEC_TEMPL_MEM_SZ 0x3C
> +#define OCXL_DVSEC_TEMPL_WWID 0x40
> +
> +#define OCXL_MAX_AFU_PER_FUNCTION 64
> +#define OCXL_TEMPL_LEN 0x58
> +#define OCXL_TEMPL_NAME_LEN 24
> +#define OCXL_CFG_TIMEOUT 3
> +
> +static int find_dvsec(struct pci_dev *dev, int dvsec_id)
> +{
> + int vsec = 0;
> + u16 vendor, id;
> +
> + while ((vsec = pci_find_next_ext_capability(dev, vsec,
> + OCXL_EXT_CAP_ID_DVSEC))) {
> + pci_read_config_word(dev, vsec + OCXL_DVSEC_VENDOR_OFFSET,
> + &vendor);
> + pci_read_config_word(dev, vsec + OCXL_DVSEC_ID_OFFSET, &id);
> + if (vendor == PCI_VENDOR_ID_IBM && id == dvsec_id)
> + return vsec;
> + }
> + return 0;
> +}
> +
> +static int find_dvsec_afu_ctrl(struct pci_dev *dev, u8 afu_idx)
> +{
> + int vsec = 0;
> + u16 vendor, id;
> + u8 idx;
> +
> + while ((vsec = pci_find_next_ext_capability(dev, vsec,
> + OCXL_EXT_CAP_ID_DVSEC))) {
> + pci_read_config_word(dev, vsec + OCXL_DVSEC_VENDOR_OFFSET,
> + &vendor);
> + pci_read_config_word(dev, vsec + OCXL_DVSEC_ID_OFFSET, &id);
> +
> + if (vendor == PCI_VENDOR_ID_IBM &&
> + id == OCXL_DVSEC_AFU_CTRL_ID) {
> + pci_read_config_byte(dev,
> + vsec + OCXL_DVSEC_AFU_CTRL_AFU_IDX,
> + &idx);
> + if (idx == afu_idx)
> + return vsec;
> + }
> + }
> + return 0;
> +}
> +
> +static int read_pasid(struct pci_dev *dev, struct ocxl_fn_config *fn)
> +{
> + u16 val;
> + int pos;
> +
> + pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_PASID);
> + if (!pos) {
> + /*
> + * PASID capability is not mandatory, but there
> + * shouldn't be any AFU
> + */
> + dev_dbg(&dev->dev, "Function doesn't require any PASID\n");
> + fn->max_pasid_log = -1;
> + goto out;
> + }
> + pci_read_config_word(dev, pos + PCI_PASID_CAP, &val);
> + fn->max_pasid_log = EXTRACT_BITS(val, 8, 12);
> +
> +out:
> + dev_dbg(&dev->dev, "PASID capability:\n");
> + dev_dbg(&dev->dev, " Max PASID log = %d\n", fn->max_pasid_log);
> + return 0;
> +}
> +
> +static int read_dvsec_tl(struct pci_dev *dev, struct ocxl_fn_config *fn)
> +{
> + int pos;
> +
> + pos = find_dvsec(dev, OCXL_DVSEC_TL_ID);
> + if (!pos && PCI_FUNC(dev->devfn) == 0) {
> + dev_err(&dev->dev, "Can't find TL DVSEC\n");
> + return -ENODEV;
> + }
> + if (pos && PCI_FUNC(dev->devfn) != 0) {
> + dev_err(&dev->dev, "TL DVSEC is only allowed on function 0\n");
> + return -ENODEV;
> + }
> + fn->dvsec_tl_pos = pos;
> + return 0;
> +}
> +
> +static int read_dvsec_function(struct pci_dev *dev, struct ocxl_fn_config *fn)
> +{
> + int pos, afu_present;
> + u32 val;
> +
> + pos = find_dvsec(dev, OCXL_DVSEC_FUNC_ID);
> + if (!pos) {
> + dev_err(&dev->dev, "Can't find function DVSEC\n");
> + return -ENODEV;
> + }
> + fn->dvsec_function_pos = pos;
> +
> + pci_read_config_dword(dev, pos + OCXL_DVSEC_FUNC_OFF_INDEX, &val);
> + afu_present = EXTRACT_BIT(val, 31);
> + if (!afu_present) {
> + fn->max_afu_index = -1;
> + dev_dbg(&dev->dev, "Function doesn't define any AFU\n");
> + goto out;
> + }
> + fn->max_afu_index = EXTRACT_BITS(val, 24, 29);
> +
> +out:
> + dev_dbg(&dev->dev, "Function DVSEC:\n");
> + dev_dbg(&dev->dev, " Max AFU index = %d\n", fn->max_afu_index);
> + return 0;
> +}
> +
> +static int read_dvsec_afu_info(struct pci_dev *dev, struct ocxl_fn_config *fn)
> +{
> + int pos;
> +
> + if (fn->max_afu_index < 0) {
> + fn->dvsec_afu_info_pos = -1;
> + return 0;
> + }
> +
> + pos = find_dvsec(dev, OCXL_DVSEC_AFU_INFO_ID);
> + if (!pos) {
> + dev_err(&dev->dev, "Can't find AFU information DVSEC\n");
> + return -ENODEV;
> + }
> + fn->dvsec_afu_info_pos = pos;
> + return 0;
> +}
> +
> +static int read_dvsec_vendor(struct pci_dev *dev)
> +{
> + int pos;
> + u32 cfg, tlx, dlx;
> +
> + /*
> + * vendor specific DVSEC is optional
> + *
> + * It's currently only used on function 0 to specify the
> + * version of some logic blocks. Some older images may not
> + * even have it so we ignore any errors
> + */
> + if (PCI_FUNC(dev->devfn) != 0)
> + return 0;
> +
> + pos = find_dvsec(dev, OCXL_DVSEC_VENDOR_ID);
> + if (!pos)
> + return 0;
> +
> + pci_read_config_dword(dev, pos + OCXL_DVSEC_VENDOR_CFG_VERS, &cfg);
> + pci_read_config_dword(dev, pos + OCXL_DVSEC_VENDOR_TLX_VERS, &tlx);
> + pci_read_config_dword(dev, pos + OCXL_DVSEC_VENDOR_DLX_VERS, &dlx);
> +
> + dev_dbg(&dev->dev, "Vendor specific DVSEC:\n");
> + dev_dbg(&dev->dev, " CFG version = 0x%x\n", cfg);
> + dev_dbg(&dev->dev, " TLX version = 0x%x\n", tlx);
> + dev_dbg(&dev->dev, " DLX version = 0x%x\n", dlx);
> + return 0;
> +}
> +
> +static int validate_function(struct pci_dev *dev, struct ocxl_fn_config *fn)
> +{
> + if (fn->max_pasid_log == -1 && fn->max_afu_index >= 0) {
> + dev_err(&dev->dev,
> + "AFUs are defined but no PASIDs are requested\n");
> + return -EINVAL;
> + }
> +
> + if (fn->max_afu_index > OCXL_MAX_AFU_PER_FUNCTION) {
> + dev_err(&dev->dev,
> + "Max AFU index out of architectural limit (%d vs %d)\n",
> + fn->max_afu_index, OCXL_MAX_AFU_PER_FUNCTION);
> + return -EINVAL;
> + }
> + return 0;
> +}
> +
> +int ocxl_config_read_function(struct pci_dev *dev, struct ocxl_fn_config *fn)
> +{
> + int rc;
> +
> + rc = read_pasid(dev, fn);
> + if (rc) {
> + dev_err(&dev->dev, "Invalid PASID configuration: %d\n", rc);
> + return -ENODEV;
> + }
> +
> + rc = read_dvsec_tl(dev, fn);
> + if (rc) {
> + dev_err(&dev->dev,
> + "Invalid Transaction Layer DVSEC configuration: %d\n",
> + rc);
> + return -ENODEV;
> + }
> +
> + rc = read_dvsec_function(dev, fn);
> + if (rc) {
> + dev_err(&dev->dev,
> + "Invalid Function DVSEC configuration: %d\n", rc);
> + return -ENODEV;
> + }
> +
> + rc = read_dvsec_afu_info(dev, fn);
> + if (rc) {
> + dev_err(&dev->dev, "Invalid AFU configuration: %d\n", rc);
> + return -ENODEV;
> + }
> +
> + rc = read_dvsec_vendor(dev);
> + if (rc) {
> + dev_err(&dev->dev,
> + "Invalid vendor specific DVSEC configuration: %d\n",
> + rc);
> + return -ENODEV;
> + }
> +
> + rc = validate_function(dev, fn);
> + return rc;
> +}
> +
> +static int read_afu_info(struct pci_dev *dev, struct ocxl_fn_config *fn,
> + int offset, u32 *data)
> +{
> + u32 val;
> + unsigned long timeout = jiffies + (HZ * OCXL_CFG_TIMEOUT);
> + int pos = fn->dvsec_afu_info_pos;
> +
> + /* Protect 'data valid' bit */
> + if (EXTRACT_BIT(offset, 31)) {
> + dev_err(&dev->dev, "Invalid offset in AFU info DVSEC\n");
> + return -EINVAL;
> + }
> +
> + pci_write_config_dword(dev, pos + OCXL_DVSEC_AFU_INFO_OFF, offset);
> + pci_read_config_dword(dev, pos + OCXL_DVSEC_AFU_INFO_OFF, &val);
> + while (!EXTRACT_BIT(val, 31)) {
> + if (time_after_eq(jiffies, timeout)) {
> + dev_err(&dev->dev,
> + "Timeout while reading AFU info DVSEC (offset=%d)\n",
> + offset);
> + return -EBUSY;
> + }
> + cpu_relax();
> + pci_read_config_dword(dev, pos + OCXL_DVSEC_AFU_INFO_OFF, &val);
> + }
> + pci_read_config_dword(dev, pos + OCXL_DVSEC_AFU_INFO_DATA, data);
> + return 0;
> +}
> +
> +int ocxl_config_check_afu_index(struct pci_dev *dev,
> + struct ocxl_fn_config *fn, int afu_idx)
> +{
> + u32 val;
> + int rc, templ_major, templ_minor, len;
> +
> + pci_write_config_word(dev, fn->dvsec_afu_info_pos, afu_idx);
> + rc = read_afu_info(dev, fn, OCXL_DVSEC_TEMPL_VERSION, &val);
> + if (rc)
> + return rc;
> +
> + /* AFU index map can have holes */
> + if (!val)
> + return 0;
> +
> + templ_major = EXTRACT_BITS(val, 8, 15);
> + templ_minor = EXTRACT_BITS(val, 0, 7);
> + dev_dbg(&dev->dev, "AFU descriptor template version %d.%d\n",
> + templ_major, templ_minor);
> +
> + len = EXTRACT_BITS(val, 16, 31);
> + if (len != OCXL_TEMPL_LEN) {
> + dev_warn(&dev->dev,
> + "Unexpected template length in AFU information (%#x)\n",
> + len);
> + }
> + return 1;
> +}
> +
> +static int read_afu_name(struct pci_dev *dev, struct ocxl_fn_config *fn,
> + struct ocxl_afu_config *afu)
> +{
> + int i, rc;
> + u32 val, *ptr;
> +
> + BUILD_BUG_ON(OCXL_AFU_NAME_SZ < OCXL_TEMPL_NAME_LEN);
> + for (i = 0; i < OCXL_TEMPL_NAME_LEN; i += 4) {
> + rc = read_afu_info(dev, fn, OCXL_DVSEC_TEMPL_NAME + i, &val);
> + if (rc)
> + return rc;
> + ptr = (u32 *) &afu->name[i];
> + *ptr = val;
> + }
> + afu->name[OCXL_AFU_NAME_SZ - 1] = '\0'; /* play safe */
> + return 0;
> +}
> +
> +static int read_afu_mmio(struct pci_dev *dev, struct ocxl_fn_config *fn,
> + struct ocxl_afu_config *afu)
> +{
> + int rc;
> + u32 val;
> +
> + /*
> + * Global MMIO
> + */
> + rc = read_afu_info(dev, fn, OCXL_DVSEC_TEMPL_MMIO_GLOBAL, &val);
> + if (rc)
> + return rc;
> + afu->global_mmio_bar = EXTRACT_BITS(val, 0, 2);
> + afu->global_mmio_offset = EXTRACT_BITS(val, 16, 31) << 16;
> +
> + rc = read_afu_info(dev, fn, OCXL_DVSEC_TEMPL_MMIO_GLOBAL + 4, &val);
> + if (rc)
> + return rc;
> + afu->global_mmio_offset += (u64) val << 32;
> +
> + rc = read_afu_info(dev, fn, OCXL_DVSEC_TEMPL_MMIO_GLOBAL_SZ, &val);
> + if (rc)
> + return rc;
> + afu->global_mmio_size = val;
> +
> + /*
> + * Per-process MMIO
> + */
> + rc = read_afu_info(dev, fn, OCXL_DVSEC_TEMPL_MMIO_PP, &val);
> + if (rc)
> + return rc;
> + afu->pp_mmio_bar = EXTRACT_BITS(val, 0, 2);
> + afu->pp_mmio_offset = EXTRACT_BITS(val, 16, 31) << 16;
> +
> + rc = read_afu_info(dev, fn, OCXL_DVSEC_TEMPL_MMIO_PP + 4, &val);
> + if (rc)
> + return rc;
> + afu->pp_mmio_offset += (u64) val << 32;
> +
> + rc = read_afu_info(dev, fn, OCXL_DVSEC_TEMPL_MMIO_PP_SZ, &val);
> + if (rc)
> + return rc;
> + afu->pp_mmio_stride = val;
> +
> + return 0;
> +}
> +
> +static int read_afu_control(struct pci_dev *dev, struct ocxl_afu_config *afu)
> +{
> + int pos;
> + u8 val8;
> + u16 val16;
> +
> + pos = find_dvsec_afu_ctrl(dev, afu->idx);
> + if (!pos) {
> + dev_err(&dev->dev, "Can't find AFU control DVSEC for AFU %d\n",
> + afu->idx);
> + return -ENODEV;
> + }
> + afu->dvsec_afu_control_pos = pos;
> +
> + pci_read_config_byte(dev, pos + OCXL_DVSEC_AFU_CTRL_PASID_SUP, &val8);
> + afu->pasid_supported_log = EXTRACT_BITS(val8, 0, 4);
> +
> + pci_read_config_word(dev, pos + OCXL_DVSEC_AFU_CTRL_ACTAG_SUP, &val16);
> + afu->actag_supported = EXTRACT_BITS(val16, 0, 11);
> + return 0;
> +}
> +
> +static bool char_allowed(int c)
> +{
> + /*
> + * Permitted Characters : Alphanumeric, hyphen, underscore, comma
> + */
> + if ((c >= 0x30 && c <= 0x39) /* digits */ ||
> + (c >= 0x41 && c <= 0x5A) /* upper case */ ||
> + (c >= 0x61 && c <= 0x7A) /* lower case */ ||
> + c == 0 /* NULL */ ||
> + c == 0x2D /* - */ ||
> + c == 0x5F /* _ */ ||
> + c == 0x2C /* , */)
> + return true;
> + return false;
> +}
> +
> +static int validate_afu(struct pci_dev *dev, struct ocxl_afu_config *afu)
> +{
> + int i;
> +
> + if (!afu->name[0]) {
> + dev_err(&dev->dev, "Empty AFU name\n");
> + return -EINVAL;
> + }
> + for (i = 0; i < OCXL_TEMPL_NAME_LEN; i++) {
> + if (!char_allowed(afu->name[i])) {
> + dev_err(&dev->dev,
> + "Invalid character in AFU name\n");
> + return -EINVAL;
> + }
> + }
> +
> + if (afu->global_mmio_bar != 0 &&
> + afu->global_mmio_bar != 2 &&
> + afu->global_mmio_bar != 4) {
> + dev_err(&dev->dev, "Invalid global MMIO bar number\n");
> + return -EINVAL;
> + }
> + if (afu->pp_mmio_bar != 0 &&
> + afu->pp_mmio_bar != 2 &&
> + afu->pp_mmio_bar != 4) {
> + dev_err(&dev->dev, "Invalid per-process MMIO bar number\n");
> + return -EINVAL;
> + }
> + return 0;
> +}
> +
> +int ocxl_config_read_afu(struct pci_dev *dev, struct ocxl_fn_config *fn,
> + struct ocxl_afu_config *afu, u8 afu_idx)
> +{
> + int rc;
> + u32 val32;
> +
> + /*
> + * First, we need to write the AFU idx for the AFU we want to
> + * access.
> + */
> + WARN_ON((afu_idx & OCXL_DVSEC_AFU_IDX_MASK) != afu_idx);
> + afu->idx = afu_idx;
> + pci_write_config_byte(dev,
> + fn->dvsec_afu_info_pos + OCXL_DVSEC_AFU_INFO_AFU_IDX,
> + afu->idx);
> +
> + rc = read_afu_name(dev, fn, afu);
> + if (rc)
> + return rc;
> +
> + rc = read_afu_info(dev, fn, OCXL_DVSEC_TEMPL_AFU_VERSION, &val32);
> + if (rc)
> + return rc;
> + afu->version_major = EXTRACT_BITS(val32, 24, 31);
> + afu->version_minor = EXTRACT_BITS(val32, 16, 23);
> + afu->afuc_type = EXTRACT_BITS(val32, 14, 15);
> + afu->afum_type = EXTRACT_BITS(val32, 12, 13);
> + afu->profile = EXTRACT_BITS(val32, 0, 7);
> +
> + rc = read_afu_mmio(dev, fn, afu);
> + if (rc)
> + return rc;
> +
> + rc = read_afu_info(dev, fn, OCXL_DVSEC_TEMPL_MEM_SZ, &val32);
> + if (rc)
> + return rc;
> + afu->log_mem_size = EXTRACT_BITS(val32, 0, 7);
> +
> + rc = read_afu_control(dev, afu);
> + if (rc)
> + return rc;
> +
> + dev_dbg(&dev->dev, "AFU configuration:\n");
> + dev_dbg(&dev->dev, " name = %s\n", afu->name);
> + dev_dbg(&dev->dev, " version = %d.%d\n", afu->version_major,
> + afu->version_minor);
> + dev_dbg(&dev->dev, " global mmio bar = %hhu\n", afu->global_mmio_bar);
> + dev_dbg(&dev->dev, " global mmio offset = %#llx\n",
> + afu->global_mmio_offset);
> + dev_dbg(&dev->dev, " global mmio size = %#x\n", afu->global_mmio_size);
> + dev_dbg(&dev->dev, " pp mmio bar = %hhu\n", afu->pp_mmio_bar);
> + dev_dbg(&dev->dev, " pp mmio offset = %#llx\n", afu->pp_mmio_offset);
> + dev_dbg(&dev->dev, " pp mmio stride = %#x\n", afu->pp_mmio_stride);
> + dev_dbg(&dev->dev, " mem size (log) = %hhu\n", afu->log_mem_size);
> + dev_dbg(&dev->dev, " pasid supported (log) = %u\n",
> + afu->pasid_supported_log);
> + dev_dbg(&dev->dev, " actag supported = %u\n",
> + afu->actag_supported);
> +
> + rc = validate_afu(dev, afu);
> + return rc;
> +}
> +
> +int ocxl_config_get_actag_info(struct pci_dev *dev, u16 *base, u16 *enabled,
> + u16 *supported)
> +{
> + int rc;
> +
> + /*
> + * This is really a simple wrapper for the kernel API, to
> + * avoid an external driver using ocxl as a library to call
> + * platform-dependent code
> + */
> + rc = pnv_ocxl_get_actag(dev, base, enabled, supported);
> + if (rc) {
> + dev_err(&dev->dev, "Can't get actag for device: %d\n", rc);
> + return rc;
> + }
> + return 0;
> +}
> +
> +void ocxl_config_set_afu_actag(struct pci_dev *dev, int pos, int actag_base,
> + int actag_count)
> +{
> + u16 val;
> +
> + val = actag_count & OCXL_DVSEC_ACTAG_MASK;
> + pci_write_config_byte(dev, pos + OCXL_DVSEC_AFU_CTRL_ACTAG_EN, val);
> +
> + val = actag_base & OCXL_DVSEC_ACTAG_MASK;
> + pci_write_config_dword(dev, pos + OCXL_DVSEC_AFU_CTRL_ACTAG_BASE, val);
> +}
> +
> +int ocxl_config_get_pasid_info(struct pci_dev *dev, int *count)
> +{
> + return pnv_ocxl_get_pasid_count(dev, count);
> +}
> +
> +void ocxl_config_set_afu_pasid(struct pci_dev *dev, int pos, int pasid_base,
> + u32 pasid_count_log)
> +{
> + u8 val8;
> + u32 val32;
> +
> + val8 = pasid_count_log & OCXL_DVSEC_PASID_LOG_MASK;
> + pci_write_config_byte(dev, pos + OCXL_DVSEC_AFU_CTRL_PASID_EN, val8);
> +
> + pci_read_config_dword(dev, pos + OCXL_DVSEC_AFU_CTRL_PASID_BASE,
> + &val32);
> + val32 &= ~OCXL_DVSEC_PASID_MASK;
> + val32 |= pasid_base & OCXL_DVSEC_PASID_MASK;
> + pci_write_config_dword(dev, pos + OCXL_DVSEC_AFU_CTRL_PASID_BASE,
> + val32);
> +}
> +
> +void ocxl_config_set_afu_state(struct pci_dev *dev, int pos, int enable)
> +{
> + u8 val;
> +
> + pci_read_config_byte(dev, pos + OCXL_DVSEC_AFU_CTRL_ENABLE, &val);
> + if (enable)
> + val |= 1;
> + else
> + val &= 0xFE;
> + pci_write_config_byte(dev, pos + OCXL_DVSEC_AFU_CTRL_ENABLE, val);
> +}
> +
> +int ocxl_config_set_TL(struct pci_dev *dev, int tl_dvsec)
> +{
> + u32 val, *ptr32;
> + u8 timers;
> + int i, rc;
> + long recv_cap;
> + char *recv_rate;
> +
> + /*
> + * Skip on function != 0, as the TL can only be defined on 0
> + */
> + if (PCI_FUNC(dev->devfn) != 0)
> + return 0;
> +
> + recv_rate = kzalloc(PNV_OCXL_TL_RATE_BUF_SIZE, GFP_KERNEL);
> + if (!recv_rate)
> + return -ENOMEM;
> + /*
> + * The spec defines 64 templates for messages in the
> + * Transaction Layer (TL).
> + *
> + * The host and device each support a subset, so we need to
> + * configure the transmitters on each side to send only
> + * templates the receiver understands, at a rate the receiver
> + * can process. Per the spec, template 0 must be supported by
> + * everybody. That's the template which has been used by the
> + * host and device so far.
> + *
> + * The sending rate limit must be set before the template is
> + * enabled.
> + */
> +
> + /*
> + * Device -> host
> + */
> + rc = pnv_ocxl_get_tl_cap(dev, &recv_cap, recv_rate,
> + PNV_OCXL_TL_RATE_BUF_SIZE);
> + if (rc)
> + goto out;
> +
> + for (i = 0; i < PNV_OCXL_TL_RATE_BUF_SIZE; i += 4) {
> + ptr32 = (u32 *) &recv_rate[i];
> + pci_write_config_dword(dev,
> + tl_dvsec + OCXL_DVSEC_TL_SEND_RATE + i,
> + be32_to_cpu(*ptr32));

drivers/misc/ocxl/config.c:618:33: warning: cast to restricted __be32

> + }
> + val = recv_cap >> 32;
> + pci_write_config_dword(dev, tl_dvsec + OCXL_DVSEC_TL_SEND_CAP, val);
> + val = recv_cap & GENMASK(31, 0);
> + pci_write_config_dword(dev, tl_dvsec + OCXL_DVSEC_TL_SEND_CAP + 4, val);
> +
> + /*
> + * Host -> device
> + */
> + for (i = 0; i < PNV_OCXL_TL_RATE_BUF_SIZE; i += 4) {
> + pci_read_config_dword(dev,
> + tl_dvsec + OCXL_DVSEC_TL_RECV_RATE + i,
> + &val);
> + ptr32 = (u32 *) &recv_rate[i];
> + *ptr32 = cpu_to_be32(val);

drivers/misc/ocxl/config.c:633:24: warning: incorrect type in assignment (different base types)
drivers/misc/ocxl/config.c:633:24:    expected unsigned int [unsigned] [usertype] <noident>
drivers/misc/ocxl/config.c:633:24:    got restricted __be32 [usertype] <noident>

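Both warnings come from moving big-endian data through a plain u32 pointer. Declaring ptr32 as __be32 * should be enough to keep be32_to_cpu() and cpu_to_be32() type-correct for sparse; get_unaligned_be32()/put_unaligned_be32() on &recv_rate[i] would be another option. The userspace sketch below shows the byte-order contract the rate buffer follows (big-endian on the firmware side, regardless of host endianness). The helper names load_be32/store_be32 are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Decode one big-endian 32-bit word from a byte buffer, on any host. */
static uint32_t load_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Encode a CPU-order 32-bit value into big-endian byte order. */
static void store_be32(unsigned char *p, uint32_t v)
{
	p[0] = (unsigned char)(v >> 24);
	p[1] = (unsigned char)(v >> 16);
	p[2] = (unsigned char)(v >> 8);
	p[3] = (unsigned char)v;
}
```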
> + }
> + pci_read_config_dword(dev, tl_dvsec + OCXL_DVSEC_TL_RECV_CAP, &val);
> + recv_cap = (long) val << 32;
> + pci_read_config_dword(dev, tl_dvsec + OCXL_DVSEC_TL_RECV_CAP + 4, &val);
> + recv_cap |= val;
> +
> + rc = pnv_ocxl_set_tl_conf(dev, recv_cap, __pa(recv_rate),
> + PNV_OCXL_TL_RATE_BUF_SIZE);
> + if (rc)
> + goto out;
> +
> + /*
> + * Opencapi commands needing to be retried are classified per
> + * the TL in 2 groups: short and long commands.
> + *
> + * The short back off timer is not used for now. It will be
> + * for opencapi 4.0.
> + *
> + * The long back off timer is typically used when an AFU hits
> + * a page fault but the NPU is already processing one. So the
> + * AFU needs to wait before it can resubmit. Having a value
> + * too low doesn't break anything, but can generate extra
> + * traffic on the link.
> + * We set it to 1.6 us for now. It's shorter than, but in the
> + * same order of magnitude as the time spent to process a page
> + * fault.
> + */
> + timers = 0x2 << 4; /* long timer = 1.6 us */
> + pci_write_config_byte(dev, tl_dvsec + OCXL_DVSEC_TL_BACKOFF_TIMERS,
> + timers);
> +
> + rc = 0;
> +out:
> + kfree(recv_rate);
> + return rc;
> +}
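The 64-bit template capability mask is exchanged with the device as two config-space dwords: the high half at OCXL_DVSEC_TL_SEND_CAP (or _RECV_CAP) and the low half at offset +4. A minimal userspace sketch of that split and join, with hypothetical helper names; using an unsigned 64-bit type, rather than the long used in the patch, keeps the shift well-defined in standard C when bit 63 ends up set.

```c
#include <assert.h>
#include <stdint.h>

/* Split the mask: hi is written at the CAP offset, lo at CAP + 4. */
static void split_cap(uint64_t cap, uint32_t *hi, uint32_t *lo)
{
	*hi = (uint32_t)(cap >> 32);
	*lo = (uint32_t)(cap & 0xffffffffu);
}

/* Reassemble the mask from the two dwords read back from the device. */
static uint64_t join_cap(uint32_t hi, uint32_t lo)
{
	return ((uint64_t)hi << 32) | lo;
}
```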
> +
> +int ocxl_config_terminate_pasid(struct pci_dev *dev, int afu_control, int pasid)
> +{
> + u32 val;
> + unsigned long timeout;
> +
> + pci_read_config_dword(dev, afu_control + OCXL_DVSEC_AFU_CTRL_TERM_PASID,
> + &val);
> + if (EXTRACT_BIT(val, 20)) {
> + dev_err(&dev->dev,
> + "Can't terminate PASID %#x, previous termination didn't complete\n",
> + pasid);
> + return -EBUSY;
> + }
> +
> + val &= ~OCXL_DVSEC_PASID_MASK;
> + val |= pasid & OCXL_DVSEC_PASID_MASK;
> + val |= BIT(20);
> + pci_write_config_dword(dev,
> + afu_control + OCXL_DVSEC_AFU_CTRL_TERM_PASID,
> + val);
> +
> + timeout = jiffies + (HZ * OCXL_CFG_TIMEOUT);
> + pci_read_config_dword(dev, afu_control + OCXL_DVSEC_AFU_CTRL_TERM_PASID,
> + &val);
> + while (EXTRACT_BIT(val, 20)) {
> + if (time_after_eq(jiffies, timeout)) {
> + dev_err(&dev->dev,
> + "Timeout while waiting for AFU to terminate PASID %#x\n",
> + pasid);
> + return -EBUSY;
> + }
> + cpu_relax();
> + pci_read_config_dword(dev,
> + afu_control + OCXL_DVSEC_AFU_CTRL_TERM_PASID,
> + &val);
> + }
> + return 0;
> +}
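ocxl_config_terminate_pasid() writes the PASID with bit 20 set, then busy-waits until the AFU clears that bit or OCXL_CFG_TIMEOUT seconds elapse. The sketch below models that handshake in userspace. Two things are assumptions: the PASID field is taken to be 20 bits wide (PASID_MASK_ASSUMED stands in for OCXL_DVSEC_PASID_MASK), and the jiffies deadline is replaced by a bounded read count so the example stays deterministic. The fake device is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TERM_PASID_BIT     (1u << 20) /* bit tested with EXTRACT_BIT(val, 20) */
#define PASID_MASK_ASSUMED 0xFFFFFu   /* assumption: 20-bit PASID field */

/* Value written to start a termination: replace the PASID, set bit 20. */
static uint32_t term_pasid_request(uint32_t cur, uint32_t pasid)
{
	cur &= ~PASID_MASK_ASSUMED;
	cur |= pasid & PASID_MASK_ASSUMED;
	return cur | TERM_PASID_BIT;
}

/* Poll until the device clears bit 20; give up after max_reads reads. */
static bool wait_term_done(uint32_t (*read_reg)(void *), void *dev,
			   unsigned int max_reads)
{
	for (unsigned int i = 0; i < max_reads; i++) {
		if (!(read_reg(dev) & TERM_PASID_BIT))
			return true;
	}
	return false;
}

/* Simulated AFU that clears the bit after a fixed number of reads. */
struct fake_dev {
	unsigned int reads_until_done;
};

static uint32_t fake_read(void *opaque)
{
	struct fake_dev *d = opaque;

	if (d->reads_until_done == 0)
		return 0;
	d->reads_until_done--;
	return TERM_PASID_BIT;
}
```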
> +
> +void ocxl_config_set_actag(struct pci_dev *dev, int func_dvsec, u32 tag_first,
> + u32 tag_count)
> +{
> + u32 val;
> +
> + val = (tag_first & OCXL_DVSEC_ACTAG_MASK) << 16;
> + val |= tag_count & OCXL_DVSEC_ACTAG_MASK;
> + pci_write_config_dword(dev, func_dvsec + OCXL_DVSEC_FUNC_OFF_ACTAG,
> + val);
> +}
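ocxl_config_set_actag() packs the first acTag into the upper half of the dword and the tag count into the lower half. A userspace sketch of the packing; ACTAG_MASK_ASSUMED assumes a 12-bit acTag field in place of OCXL_DVSEC_ACTAG_MASK.

```c
#include <assert.h>
#include <stdint.h>

#define ACTAG_MASK_ASSUMED 0xFFFu /* assumption: 12-bit acTag fields */

/* tag_first in bits 16 and up, tag_count in the low bits. */
static uint32_t actag_dword(uint32_t tag_first, uint32_t tag_count)
{
	return ((tag_first & ACTAG_MASK_ASSUMED) << 16) |
	       (tag_count & ACTAG_MASK_ASSUMED);
}
```

Out-of-range inputs are silently truncated by the mask rather than rejected, as in the driver.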
> diff --git a/drivers/misc/ocxl/context.c b/drivers/misc/ocxl/context.c
> new file mode 100644
> index 000000000000..0bc0dd97d784
> --- /dev/null
> +++ b/drivers/misc/ocxl/context.c
> @@ -0,0 +1,237 @@
> +/*
> + * Copyright 2017 IBM Corp.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or (at your option) any later version.
> + */
> +
> +#include <linux/sched/mm.h>
> +#include "ocxl_internal.h"
> +
> +struct ocxl_context *ocxl_context_alloc(void)
> +{
> + return kzalloc(sizeof(struct ocxl_context), GFP_KERNEL);
> +}
> +
> +int ocxl_context_init(struct ocxl_context *ctx, struct ocxl_afu *afu,
> + struct address_space *mapping)
> +{
> + int pasid;
> +
> + ctx->afu = afu;
> + mutex_lock(&afu->contexts_lock);
> + pasid = idr_alloc(&afu->contexts_idr, ctx, afu->pasid_base,
> + afu->pasid_base + afu->pasid_max, GFP_KERNEL);
> + if (pasid < 0) {
> + mutex_unlock(&afu->contexts_lock);
> + return pasid;
> + }
> + afu->pasid_count++;
> + mutex_unlock(&afu->contexts_lock);
> +
> + ctx->pasid = pasid;
> + ctx->status = OPENED;
> + mutex_init(&ctx->status_mutex);
> + ctx->mapping = mapping;
> + mutex_init(&ctx->mapping_lock);
> + init_waitqueue_head(&ctx->events_wq);
> + mutex_init(&ctx->xsl_error_lock);
> + /*
> + * Keep a reference on the AFU to make sure it's valid for the
> + * duration of the life of the context
> + */
> + ocxl_afu_get(afu);
> + return 0;
> +}
> +
> +/*
> + * Callback for when a translation fault triggers an error
> + * data: a pointer to the context which triggered the fault
> + * addr: the address that triggered the error
> + * dsisr: the value of the PPC64 dsisr register
> + */
> +static void xsl_fault_error(void *data, u64 addr, u64 dsisr)
> +{
> + struct ocxl_context *ctx = (struct ocxl_context *) data;
> +
> + mutex_lock(&ctx->xsl_error_lock);
> + ctx->xsl_error.addr = addr;
> + ctx->xsl_error.dsisr = dsisr;
> + ctx->xsl_error.count++;
> + mutex_unlock(&ctx->xsl_error_lock);
> +
> + wake_up_all(&ctx->events_wq);
> +}
> +
> +int ocxl_context_attach(struct ocxl_context *ctx, u64 amr)
> +{
> + int rc;
> +
> + mutex_lock(&ctx->status_mutex);
> + if (ctx->status != OPENED) {
> + rc = -EIO;
> + goto out;
> + }
> +
> + rc = ocxl_link_add_pe(ctx->afu->fn->link, ctx->pasid,
> + current->mm->context.id, 0, amr, current->mm,
> + xsl_fault_error, ctx);
> + if (rc)
> + goto out;
> +
> + ctx->status = ATTACHED;
> +out:
> + mutex_unlock(&ctx->status_mutex);
> + return rc;
> +}
> +
> +static int map_pp_mmio(struct vm_area_struct *vma, unsigned long address,
> + u64 offset, struct ocxl_context *ctx)
> +{
> + u64 pp_mmio_addr;
> + int pasid_off;
> +
> + if (offset >= ctx->afu->config.pp_mmio_stride)
> + return VM_FAULT_SIGBUS;
> +
> + mutex_lock(&ctx->status_mutex);
> + if (ctx->status != ATTACHED) {
> + mutex_unlock(&ctx->status_mutex);
> + pr_debug("%s: Context not attached, failing mmio mmap\n",
> + __func__);
> + return VM_FAULT_SIGBUS;
> + }
> +
> + pasid_off = ctx->pasid - ctx->afu->pasid_base;
> + pp_mmio_addr = ctx->afu->pp_mmio_start +
> + pasid_off * ctx->afu->config.pp_mmio_stride +
> + offset;
> +
> + vm_insert_pfn(vma, address, pp_mmio_addr >> PAGE_SHIFT);
> + mutex_unlock(&ctx->status_mutex);
> + return VM_FAULT_NOPAGE;
> +}
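Each PASID owns one pp_mmio_stride-sized window in the per-process MMIO area, indexed by its offset from the AFU's pasid_base. The fault handler maps the page that contains pp_mmio_start + (pasid - pasid_base) * stride + offset, and refuses offsets past the window. A userspace sketch of that computation; struct afu_mmio is a hypothetical mirror of the fields used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the AFU fields used by map_pp_mmio(). */
struct afu_mmio {
	uint64_t pp_mmio_start;  /* base of the per-process MMIO area */
	uint64_t pp_mmio_stride; /* bytes of MMIO per PASID */
	int pasid_base;          /* first PASID assigned to this AFU */
};

/* Physical address backing 'offset' in the window of 'pasid'.
 * Returns false where the fault handler returns VM_FAULT_SIGBUS. */
static bool pp_mmio_addr(const struct afu_mmio *afu, int pasid,
			 uint64_t offset, uint64_t *addr)
{
	if (offset >= afu->pp_mmio_stride)
		return false;
	*addr = afu->pp_mmio_start +
		(uint64_t)(pasid - afu->pasid_base) * afu->pp_mmio_stride +
		offset;
	return true;
}
```

The driver then hands addr >> PAGE_SHIFT to vm_insert_pfn().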
> +
> +static int ocxl_mmap_fault(struct vm_fault *vmf)
> +{
> + struct vm_area_struct *vma = vmf->vma;
> + struct ocxl_context *ctx = vma->vm_file->private_data;
> + u64 offset;
> + int rc;
> +
> + offset = vmf->pgoff << PAGE_SHIFT;
> + pr_debug("%s: pasid %d address 0x%lx offset 0x%llx\n", __func__,
> + ctx->pasid, vmf->address, offset);
> +
> + rc = map_pp_mmio(vma, vmf->address, offset, ctx);
> + return rc;
> +}
> +
> +static const struct vm_operations_struct ocxl_vmops = {
> + .fault = ocxl_mmap_fault,
> +};
> +
> +static int check_mmap_mmio(struct ocxl_context *ctx,
> + struct vm_area_struct *vma)
> +{
> + if ((vma_pages(vma) + vma->vm_pgoff) >
> + (ctx->afu->config.pp_mmio_stride >> PAGE_SHIFT))
> + return -EINVAL;
> + return 0;
> +}
> +
> +int ocxl_context_mmap(struct ocxl_context *ctx, struct vm_area_struct *vma)
> +{
> + int rc;
> +
> + rc = check_mmap_mmio(ctx, vma);
> + if (rc)
> + return rc;
> +
> + vma->vm_flags |= VM_IO | VM_PFNMAP;
> + vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
> + vma->vm_ops = &ocxl_vmops;
> + return 0;
> +}
> +
> +int ocxl_context_detach(struct ocxl_context *ctx)
> +{
> + struct pci_dev *dev;
> + int afu_control_pos;
> + enum ocxl_context_status status;
> + int rc;
> +
> + mutex_lock(&ctx->status_mutex);
> + status = ctx->status;
> + ctx->status = CLOSED;
> + mutex_unlock(&ctx->status_mutex);
> + if (status != ATTACHED)
> + return 0;
> +
> + dev = to_pci_dev(ctx->afu->fn->dev.parent);
> + afu_control_pos = ctx->afu->config.dvsec_afu_control_pos;
> +
> + mutex_lock(&ctx->afu->afu_control_lock);
> + rc = ocxl_config_terminate_pasid(dev, afu_control_pos, ctx->pasid);
> + mutex_unlock(&ctx->afu->afu_control_lock);
> + if (rc) {
> + /*
> + * If we timeout waiting for the AFU to terminate the
> + * pasid, then it's dangerous to clean up the Process
> + * Element entry in the SPA, as it may be referenced
> + * in the future by the AFU. In which case, we would
> + * checkstop because of an invalid PE access (FIR
> + * register 2, bit 42). So leave the PE
> + * defined. Caller shouldn't free the context so that
> + * PASID remains allocated.
> + *
> + * A link reset will be required to cleanup the AFU
> + * and the SPA.
> + */
> + if (rc == -EBUSY)
> + return rc;
> + }
> + rc = ocxl_link_remove_pe(ctx->afu->fn->link, ctx->pasid);
> + if (rc) {
> + dev_warn(&ctx->afu->dev,
> + "Couldn't remove PE entry cleanly: %d\n", rc);
> + }
> + return 0;
> +}
> +
> +void ocxl_context_detach_all(struct ocxl_afu *afu)
> +{
> + struct ocxl_context *ctx;
> + int tmp;
> +
> + mutex_lock(&afu->contexts_lock);
> + idr_for_each_entry(&afu->contexts_idr, ctx, tmp) {
> + ocxl_context_detach(ctx);
> + /*
> + * We are force detaching - remove any active mmio
> + * mappings so userspace cannot interfere with the
> + * card if it comes back. Easiest way to exercise
> + * this is to unbind and rebind the driver via sysfs
> + * while it is in use.
> + */
> + mutex_lock(&ctx->mapping_lock);
> + if (ctx->mapping)
> + unmap_mapping_range(ctx->mapping, 0, 0, 1);
> + mutex_unlock(&ctx->mapping_lock);
> + }
> + mutex_unlock(&afu->contexts_lock);
> +}
> +
> +void ocxl_context_free(struct ocxl_context *ctx)
> +{
> + mutex_lock(&ctx->afu->contexts_lock);
> + ctx->afu->pasid_count--;
> + idr_remove(&ctx->afu->contexts_idr, ctx->pasid);
> + mutex_unlock(&ctx->afu->contexts_lock);
> +
> + /* reference to the AFU taken in ocxl_context_init */
> + ocxl_afu_put(ctx->afu);
> + kfree(ctx);
> +}
> diff --git a/drivers/misc/ocxl/file.c b/drivers/misc/ocxl/file.c
> new file mode 100644
> index 000000000000..a51386eff4f5
> --- /dev/null
> +++ b/drivers/misc/ocxl/file.c
> @@ -0,0 +1,405 @@
> +/*
> + * Copyright 2017 IBM Corp.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or (at your option) any later version.
> + */
> +
> +#include <linux/fs.h>
> +#include <linux/poll.h>
> +#include <linux/sched/signal.h>
> +#include <linux/uaccess.h>
> +#include <uapi/misc/ocxl.h>
> +#include "ocxl_internal.h"
> +
> +
> +#define OCXL_NUM_MINORS 256 /* Total to reserve */
> +
> +static dev_t ocxl_dev;
> +static struct class *ocxl_class;
> +static struct mutex minors_idr_lock;
> +static struct idr minors_idr;
> +
> +static struct ocxl_afu *find_and_get_afu(dev_t devno)
> +{
> + struct ocxl_afu *afu;
> + int afu_minor;
> +
> + afu_minor = MINOR(devno);
> + /*
> + * We don't declare an RCU critical section here, as our AFU
> + * is protected by a reference counter on the device. By the time the
> + * minor number of a device is removed from the idr, the ref count of
> + * the device is already at 0, so no user API will access that AFU and
> + * this function can't return it.
> + */
> + afu = idr_find(&minors_idr, afu_minor);
> + if (afu)
> + ocxl_afu_get(afu);
> + return afu;
> +}
> +
> +static int allocate_afu_minor(struct ocxl_afu *afu)
> +{
> + int minor;
> +
> + mutex_lock(&minors_idr_lock);
> + minor = idr_alloc(&minors_idr, afu, 0, OCXL_NUM_MINORS, GFP_KERNEL);
> + mutex_unlock(&minors_idr_lock);
> + return minor;
> +}
> +
> +static void free_afu_minor(struct ocxl_afu *afu)
> +{
> + mutex_lock(&minors_idr_lock);
> + idr_remove(&minors_idr, MINOR(afu->dev.devt));
> + mutex_unlock(&minors_idr_lock);
> +}
> +
> +static int afu_open(struct inode *inode, struct file *file)
> +{
> + struct ocxl_afu *afu;
> + struct ocxl_context *ctx;
> + int rc;
> +
> + pr_debug("%s for device %x\n", __func__, inode->i_rdev);
> +
> + afu = find_and_get_afu(inode->i_rdev);
> + if (!afu)
> + return -ENODEV;
> +
> + ctx = ocxl_context_alloc();
> + if (!ctx) {
> + rc = -ENOMEM;
> + goto put_afu;
> + }
> +
> + rc = ocxl_context_init(ctx, afu, inode->i_mapping);
> + if (rc) {
> + kfree(ctx);
> + goto put_afu;
> + }
> + file->private_data = ctx;
> + ocxl_afu_put(afu);
> + return 0;
> +
> +put_afu:
> + ocxl_afu_put(afu);
> + return rc;
> +}
> +
> +static long afu_ioctl_attach(struct ocxl_context *ctx,
> + struct ocxl_ioctl_attach __user *uarg)
> +{
> + struct ocxl_ioctl_attach arg;
> + u64 amr = 0;
> + int rc;
> +
> + pr_debug("%s for context %d\n", __func__, ctx->pasid);
> +
> + if (copy_from_user(&arg, uarg, sizeof(arg)))
> + return -EFAULT;
> +
> + /* Make sure reserved fields are not set for forward compatibility */
> + if (arg.reserved1 || arg.reserved2 || arg.reserved3)
> + return -EINVAL;
> +
> + amr = arg.amr & mfspr(SPRN_UAMOR);
> + rc = ocxl_context_attach(ctx, amr);
> + return rc;
> +}
> +
> +#define CMD_STR(x) (x == OCXL_IOCTL_ATTACH ? "ATTACH" : \
> + "UNKNOWN")
> +
> +static long afu_ioctl(struct file *file, unsigned int cmd,
> + unsigned long args)
> +{
> + struct ocxl_context *ctx = file->private_data;
> + long rc;
> +
> + pr_debug("%s for context %d, command %s\n", __func__, ctx->pasid,
> + CMD_STR(cmd));
> +
> + if (ctx->status == CLOSED)
> + return -EIO;
> +
> + switch (cmd) {
> + case OCXL_IOCTL_ATTACH:
> + rc = afu_ioctl_attach(ctx,
> + (struct ocxl_ioctl_attach __user *) args);
> + break;
> +
> + default:
> + rc = -EINVAL;
> + }
> + return rc;
> +}
> +
> +static long afu_compat_ioctl(struct file *file, unsigned int cmd,
> + unsigned long args)
> +{
> + return afu_ioctl(file, cmd, args);
> +}
> +
> +static int afu_mmap(struct file *file, struct vm_area_struct *vma)
> +{
> + struct ocxl_context *ctx = file->private_data;
> +
> + pr_debug("%s for context %d\n", __func__, ctx->pasid);
> + return ocxl_context_mmap(ctx, vma);
> +}
> +
> +static bool has_xsl_error(struct ocxl_context *ctx)
> +{
> + bool ret;
> +
> + mutex_lock(&ctx->xsl_error_lock);
> + ret = !!ctx->xsl_error.addr;
> + mutex_unlock(&ctx->xsl_error_lock);
> +
> + return ret;
> +}
> +
> +/*
> + * Are there any events pending on the AFU
> + * ctx: The AFU context
> + * Returns: true if there are events pending
> + */
> +static bool afu_events_pending(struct ocxl_context *ctx)
> +{
> + if (has_xsl_error(ctx))
> + return true;
> + return false;
> +}
> +
> +static unsigned int afu_poll(struct file *file, struct poll_table_struct *wait)
> +{
> + struct ocxl_context *ctx = file->private_data;
> + unsigned int mask = 0;
> + bool closed;
> +
> + pr_debug("%s for context %d\n", __func__, ctx->pasid);
> +
> + poll_wait(file, &ctx->events_wq, wait);
> +
> + mutex_lock(&ctx->status_mutex);
> + closed = (ctx->status == CLOSED);
> + mutex_unlock(&ctx->status_mutex);
> +
> + if (afu_events_pending(ctx))
> + mask = POLLIN | POLLRDNORM;
> + else if (closed)
> + mask = POLLERR;
> +
> + return mask;
> +}
> +
> +/*
> + * Populate the supplied buffer with a single XSL error
> + * ctx: The AFU context to report the error from
> + * header: the event header to populate
> + * buf: The buffer to write the body into (should be at least
> + * AFU_EVENT_BODY_XSL_ERROR_SIZE)
> + * Return: the amount of buffer that was populated
> + */
> +static ssize_t append_xsl_error(struct ocxl_context *ctx,
> + struct ocxl_kernel_event_header *header,
> + char __user *buf)
> +{
> + struct ocxl_kernel_event_xsl_fault_error body;
> +
> + memset(&body, 0, sizeof(body));
> +
> + mutex_lock(&ctx->xsl_error_lock);
> + if (!ctx->xsl_error.addr) {
> + mutex_unlock(&ctx->xsl_error_lock);
> + return 0;
> + }
> +
> + body.addr = ctx->xsl_error.addr;
> + body.dsisr = ctx->xsl_error.dsisr;
> + body.count = ctx->xsl_error.count;
> +
> + ctx->xsl_error.addr = 0;
> + ctx->xsl_error.dsisr = 0;
> + ctx->xsl_error.count = 0;
> +
> + mutex_unlock(&ctx->xsl_error_lock);
> +
> + header->type = OCXL_AFU_EVENT_XSL_FAULT_ERROR;
> +
> + if (copy_to_user(buf, &body, sizeof(body)))
> + return -EFAULT;
> +
> + return sizeof(body);
> +}
> +
> +#define AFU_EVENT_BODY_MAX_SIZE sizeof(struct ocxl_kernel_event_xsl_fault_error)
> +
> +/*
> + * Reports events on the AFU
> + * Format:
> + * Header (struct ocxl_kernel_event_header)
> + * Body (struct ocxl_kernel_event_*)
> + * Header...
> + */
> +static ssize_t afu_read(struct file *file, char __user *buf, size_t count,
> + loff_t *off)
> +{
> + struct ocxl_context *ctx = file->private_data;
> + struct ocxl_kernel_event_header header;
> + ssize_t rc;
> + ssize_t used = 0;
> + DEFINE_WAIT(event_wait);
> +
> + memset(&header, 0, sizeof(header));
> +
> + /* Require offset to be 0 */
> + if (*off != 0)
> + return -EINVAL;
> +
> + if (count < (sizeof(struct ocxl_kernel_event_header) +
> + AFU_EVENT_BODY_MAX_SIZE))
> + return -EINVAL;
> +
> + for (;;) {
> + prepare_to_wait(&ctx->events_wq, &event_wait,
> + TASK_INTERRUPTIBLE);
> +
> + if (afu_events_pending(ctx))
> + break;
> +
> + if (ctx->status == CLOSED)
> + break;
> +
> + if (file->f_flags & O_NONBLOCK) {
> + finish_wait(&ctx->events_wq, &event_wait);
> + return -EAGAIN;
> + }
> +
> + if (signal_pending(current)) {
> + finish_wait(&ctx->events_wq, &event_wait);
> + return -ERESTARTSYS;
> + }
> +
> + schedule();
> + }
> +
> + finish_wait(&ctx->events_wq, &event_wait);
> +
> + if (has_xsl_error(ctx)) {
> + used = append_xsl_error(ctx, &header, buf + sizeof(header));
> + if (used < 0)
> + return used;
> + }
> +
> + if (!afu_events_pending(ctx))
> + header.flags |= OCXL_KERNEL_EVENT_FLAG_LAST;
> +
> + if (copy_to_user(buf, &header, sizeof(header)))
> + return -EFAULT;
> +
> + used += sizeof(header);
> +
> + rc = (ssize_t) used;
> + return rc;
> +}
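From userspace, each read() returns one event header, optionally followed by one body, and the buffer must hold at least a header plus the largest body. Since afu_read() zeroes the header, a bare header returned for a closed context also has type 0; if the XSL fault type is 0, as assumed here, the read() length is what distinguishes the two cases. The structs and constant values below are hypothetical mirrors of the uapi definitions, not copied from <misc/ocxl.h>.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirrors of the uapi structs; check the real header. */
struct event_header {
	uint16_t type;
	uint16_t flags;
	uint32_t reserved;
};

struct event_xsl_fault {
	uint64_t addr;
	uint64_t dsisr;
	uint64_t count;
	uint64_t reserved;
};

#define EVT_XSL_FAULT_ERROR 0      /* assumed value */
#define EVT_FLAG_LAST       0x0001 /* assumed value */

/* Decode one read() result of 'len' bytes. Returns 1 if an XSL fault
 * body was decoded, 0 for a bare header, -1 if malformed. */
static int parse_event(const unsigned char *buf, size_t len,
		       struct event_header *hdr, struct event_xsl_fault *body)
{
	if (len < sizeof(*hdr))
		return -1;
	memcpy(hdr, buf, sizeof(*hdr));
	if (len == sizeof(*hdr))
		return 0;
	if (len < sizeof(*hdr) + sizeof(*body) ||
	    hdr->type != EVT_XSL_FAULT_ERROR)
		return -1;
	memcpy(body, buf + sizeof(*hdr), sizeof(*body));
	return 1;
}
```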
> +
> +static int afu_release(struct inode *inode, struct file *file)
> +{
> + struct ocxl_context *ctx = file->private_data;
> + int rc;
> +
> + pr_debug("%s for device %x\n", __func__, inode->i_rdev);
> + rc = ocxl_context_detach(ctx);
> + mutex_lock(&ctx->mapping_lock);
> + ctx->mapping = NULL;
> + mutex_unlock(&ctx->mapping_lock);
> + wake_up_all(&ctx->events_wq);
> + if (rc != -EBUSY)
> + ocxl_context_free(ctx);
> + return 0;
> +}
> +
> +static const struct file_operations ocxl_afu_fops = {
> + .owner = THIS_MODULE,
> + .open = afu_open,
> + .unlocked_ioctl = afu_ioctl,
> + .compat_ioctl = afu_compat_ioctl,
> + .mmap = afu_mmap,
> + .poll = afu_poll,
> + .read = afu_read,
> + .release = afu_release,
> +};
> +
> +int ocxl_create_cdev(struct ocxl_afu *afu)
> +{
> + int rc;
> +
> + cdev_init(&afu->cdev, &ocxl_afu_fops);
> + rc = cdev_add(&afu->cdev, afu->dev.devt, 1);
> + if (rc) {
> + dev_err(&afu->dev, "Unable to add afu char device: %d\n", rc);
> + return rc;
> + }
> + return 0;
> +}
> +
> +void ocxl_destroy_cdev(struct ocxl_afu *afu)
> +{
> + cdev_del(&afu->cdev);
> +}
> +
> +int ocxl_register_afu(struct ocxl_afu *afu)
> +{
> + int minor;
> +
> + minor = allocate_afu_minor(afu);
> + if (minor < 0)
> + return minor;
> + afu->dev.devt = MKDEV(MAJOR(ocxl_dev), minor);
> + afu->dev.class = ocxl_class;
> + return device_register(&afu->dev);
> +}
> +
> +void ocxl_unregister_afu(struct ocxl_afu *afu)
> +{
> + free_afu_minor(afu);
> +}
> +
> +static char *ocxl_devnode(struct device *dev, umode_t *mode)
> +{
> + return kasprintf(GFP_KERNEL, "ocxl/%s", dev_name(dev));
> +}
> +
> +int ocxl_file_init(void)
> +{
> + int rc;
> +
> + mutex_init(&minors_idr_lock);
> + idr_init(&minors_idr);
> +
> + rc = alloc_chrdev_region(&ocxl_dev, 0, OCXL_NUM_MINORS, "ocxl");
> + if (rc) {
> + pr_err("Unable to allocate ocxl major number: %d\n", rc);
> + return rc;
> + }
> +
> + ocxl_class = class_create(THIS_MODULE, "ocxl");
> + if (IS_ERR(ocxl_class)) {
> + pr_err("Unable to create ocxl class\n");
> + unregister_chrdev_region(ocxl_dev, OCXL_NUM_MINORS);
> + return PTR_ERR(ocxl_class);
> + }
> +
> + ocxl_class->devnode = ocxl_devnode;
> + return 0;
> +}
> +
> +void ocxl_file_exit(void)
> +{
> + class_destroy(ocxl_class);
> + unregister_chrdev_region(ocxl_dev, OCXL_NUM_MINORS);
> + idr_destroy(&minors_idr);
> +}
> diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
> new file mode 100644
> index 000000000000..6b184cd7d2a6
> --- /dev/null
> +++ b/drivers/misc/ocxl/link.c
> @@ -0,0 +1,610 @@
> +/*
> + * Copyright 2017 IBM Corp.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or (at your option) any later version.
> + */
> +
> +#include <linux/sched/mm.h>
> +#include <linux/mutex.h>
> +#include <linux/mmu_context.h>
> +#include <asm/copro.h>
> +#include <asm/pnv-ocxl.h>
> +#include "ocxl_internal.h"
> +
> +
> +#define SPA_PASID_BITS 15
> +#define SPA_PASID_MAX ((1 << SPA_PASID_BITS) - 1)
> +#define SPA_PE_MASK SPA_PASID_MAX
> +#define SPA_SPA_SIZE_LOG 22 /* Each SPA is 4 MB */
> +
> +#define SPA_CFG_SF (1ull << (63-0))
> +#define SPA_CFG_TA (1ull << (63-1))
> +#define SPA_CFG_HV (1ull << (63-3))
> +#define SPA_CFG_UV (1ull << (63-4))
> +#define SPA_CFG_XLAT_hpt (0ull << (63-6)) /* Hashed page table (HPT) mode */
> +#define SPA_CFG_XLAT_roh (2ull << (63-6)) /* Radix on HPT mode */
> +#define SPA_CFG_XLAT_ror (3ull << (63-6)) /* Radix on Radix mode */
> +#define SPA_CFG_PR (1ull << (63-49))
> +#define SPA_CFG_TC (1ull << (63-54))
> +#define SPA_CFG_DR (1ull << (63-59))
> +
> +#define SPA_XSL_TF (1ull << (63-3)) /* Translation fault */
> +#define SPA_XSL_S (1ull << (63-38)) /* Store operation */
> +
> +#define SPA_PE_VALID 0x80000000
> +
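The SPA constants are consistent with each other: a 2^22-byte (4 MB) SPA holds 4 MB / 128 bytes = 32768 = 2^15 process elements (128 bytes being the size enforced by the BUILD_BUG_ON in ocxl_link_add_pe()), which is exactly one entry per 15-bit PASID. The SPA_CFG_* and SPA_XSL_* masks use IBM bit numbering, where bit 0 is the most significant bit of the 64-bit word. A small userspace check of that arithmetic; IBM_BIT and the helpers are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* IBM bit numbering: bit 0 is the MSB of a 64-bit word. */
#define IBM_BIT(n) (1ull << (63 - (n)))

#define SPA_SIZE_LOG 22  /* SPA_SPA_SIZE_LOG */
#define PE_SIZE      128 /* sizeof(struct ocxl_process_element) */

/* Number of process elements that fit in one SPA. */
static uint64_t spa_entries(void)
{
	return (1ull << SPA_SIZE_LOG) / PE_SIZE;
}

/* Page order passed to __get_free_pages() for a given PAGE_SHIFT. */
static int spa_order(int page_shift)
{
	return SPA_SIZE_LOG - page_shift;
}
```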
> +
> +struct pe_data {
> + struct mm_struct *mm;
> + /* callback to trigger when a translation fault occurs */
> + void (*xsl_err_cb)(void *data, u64 addr, u64 dsisr);
> + /* opaque pointer to be passed to the above callback */
> + void *xsl_err_data;
> + struct rcu_head rcu;
> +};
> +
> +struct spa {
> + struct ocxl_process_element *spa_mem;
> + int spa_order;
> + struct mutex spa_lock;
> + struct radix_tree_root pe_tree; /* Maps PE handles to pe_data */
> + char *irq_name;
> + int virq;
> + void __iomem *reg_dsisr;
> + void __iomem *reg_dar;
> + void __iomem *reg_tfc;
> + void __iomem *reg_pe_handle;
> + /*
> + * The following fields are used by the memory fault
> + * interrupt handler. We can only have one interrupt at a
> + * time. The NPU won't raise another interrupt until the
> + * previous one has been ack'd by writing to the TFC register
> + */
> + struct xsl_fault {
> + struct work_struct fault_work;
> + u64 pe;
> + u64 dsisr;
> + u64 dar;
> + struct pe_data pe_data;
> + } xsl_fault;
> +};
> +
> +/*
> + * An opencapi link can be used by several PCI functions. We have
> + * one link per device slot.
> + *
> + * A linked list of opencapi links should suffice, as there's a
> + * limited number of opencapi slots on a system and lookup is only
> + * done when the device is probed
> + */
> +struct link {
> + struct list_head list;
> + struct kref ref;
> + int domain;
> + int bus;
> + int dev;
> + atomic_t irq_available;
> + struct spa *spa;
> + void *platform_data;
> +};
> +static struct list_head links_list = LIST_HEAD_INIT(links_list);
> +static DEFINE_MUTEX(links_list_lock);
> +
> +enum xsl_response {
> + CONTINUE,
> + ADDRESS_ERROR,
> + RESTART,
> +};
> +
> +
> +static void read_irq(struct spa *spa, u64 *dsisr, u64 *dar, u64 *pe)
> +{
> + u64 reg;
> +
> + *dsisr = in_be64(spa->reg_dsisr);
> + *dar = in_be64(spa->reg_dar);
> + reg = in_be64(spa->reg_pe_handle);
> + *pe = reg & SPA_PE_MASK;
> +}
> +
> +static void ack_irq(struct spa *spa, enum xsl_response r)
> +{
> + u64 reg = 0;
> +
> + /* continue is not supported */
> + if (r == RESTART)
> + reg = PPC_BIT(31);
> + else if (r == ADDRESS_ERROR)
> + reg = PPC_BIT(30);
> + else
> + WARN(1, "Invalid irq response %d\n", r);
> +
> + if (reg)
> + out_be64(spa->reg_tfc, reg);
> +}
> +
> +static void xsl_fault_handler_bh(struct work_struct *fault_work)
> +{
> + unsigned int flt = 0;
> + unsigned long access, flags, inv_flags = 0;
> + enum xsl_response r;
> + struct xsl_fault *fault = container_of(fault_work, struct xsl_fault,
> + fault_work);
> + struct spa *spa = container_of(fault, struct spa, xsl_fault);
> +
> + int rc;
> +
> + /*
> + * We need to release a reference on the mm whenever exiting this
> + * function (taken in the memory fault interrupt handler)
> + */
> + rc = copro_handle_mm_fault(fault->pe_data.mm, fault->dar, fault->dsisr,
> + &flt);
> + if (rc) {
> + pr_debug("copro_handle_mm_fault failed: %d\n", rc);
> + if (fault->pe_data.xsl_err_cb) {
> + fault->pe_data.xsl_err_cb(
> + fault->pe_data.xsl_err_data,
> + fault->dar, fault->dsisr);
> + }
> + r = ADDRESS_ERROR;
> + goto ack;
> + }
> +
> + if (!radix_enabled()) {
> + /*
> + * update_mmu_cache() will not have loaded the hash
> + * since current->trap is not a 0x400 or 0x300, so
> + * just call hash_page_mm() here.
> + */
> + access = _PAGE_PRESENT | _PAGE_READ;
> + if (fault->dsisr & SPA_XSL_S)
> + access |= _PAGE_WRITE;
> +
> + if (REGION_ID(fault->dar) != USER_REGION_ID)
> + access |= _PAGE_PRIVILEGED;
> +
> + local_irq_save(flags);
> + hash_page_mm(fault->pe_data.mm, fault->dar, access, 0x300,
> + inv_flags);
> + local_irq_restore(flags);
> + }
> + r = RESTART;
> +ack:
> + mmdrop(fault->pe_data.mm);
> + ack_irq(spa, r);
> +}
> +
> +static irqreturn_t xsl_fault_handler(int irq, void *data)
> +{
> + struct link *link = (struct link *) data;
> + struct spa *spa = link->spa;
> + u64 dsisr, dar, pe_handle;
> + struct pe_data *pe_data;
> + struct ocxl_process_element *pe;
> + int lpid, pid, tid;
> +
> + read_irq(spa, &dsisr, &dar, &pe_handle);
> +
> + WARN_ON(pe_handle > SPA_PE_MASK);
> + pe = spa->spa_mem + pe_handle;
> + lpid = be32_to_cpu(pe->lpid);
> + pid = be32_to_cpu(pe->pid);
> + tid = be32_to_cpu(pe->tid);

drivers/misc/ocxl/link.c:193:16: warning: cast to restricted __be32
drivers/misc/ocxl/link.c:194:15: warning: cast to restricted __be32
drivers/misc/ocxl/link.c:195:15: warning: cast to restricted __be32

> + /* We could be reading all null values here if the PE is being
> + * removed while an interrupt kicks in. It's not supposed to
> + * happen if the driver notified the AFU to terminate the
> + * PASID, and the AFU waited for pending operations before
> + * acknowledging. But even if it happens, we won't find a
> + * memory context below and fail silently, so it should be ok.
> + */
> + if (!(dsisr & SPA_XSL_TF)) {
> + WARN(1, "Invalid xsl interrupt fault register %#llx\n", dsisr);
> + ack_irq(spa, ADDRESS_ERROR);
> + return IRQ_HANDLED;
> + }
> +
> + rcu_read_lock();
> + pe_data = radix_tree_lookup(&spa->pe_tree, pe_handle);
> + if (!pe_data) {
> + /*
> + * Could only happen if the driver didn't notify the
> + * AFU about PASID termination before removing the PE,
> + * or the AFU didn't wait for all memory access to
> + * have completed.
> + *
> + * Either way, we fail early, but we shouldn't log an
> + * error message, as it is a valid (if unexpected)
> + * scenario
> + */
> + rcu_read_unlock();
> + pr_debug("Unknown mm context for xsl interrupt\n");
> + ack_irq(spa, ADDRESS_ERROR);
> + return IRQ_HANDLED;
> + }
> + WARN_ON(pe_data->mm->context.id != pid);
> +
> + spa->xsl_fault.pe = pe_handle;
> + spa->xsl_fault.dar = dar;
> + spa->xsl_fault.dsisr = dsisr;
> + spa->xsl_fault.pe_data = *pe_data;
> + mmgrab(pe_data->mm); /* mm count is released by bottom half */
> +
> + rcu_read_unlock();
> + schedule_work(&spa->xsl_fault.fault_work);
> + return IRQ_HANDLED;
> +}
> +
> +static void unmap_irq_registers(struct spa *spa)
> +{
> + pnv_ocxl_unmap_xsl_regs(spa->reg_dsisr, spa->reg_dar, spa->reg_tfc,
> + spa->reg_pe_handle);
> +}
> +
> +static int map_irq_registers(struct pci_dev *dev, struct spa *spa)
> +{
> + return pnv_ocxl_map_xsl_regs(dev, &spa->reg_dsisr, &spa->reg_dar,
> + &spa->reg_tfc, &spa->reg_pe_handle);
> +}
> +
> +static int setup_xsl_irq(struct pci_dev *dev, struct link *link)
> +{
> + struct spa *spa = link->spa;
> + int rc;
> + int hwirq;
> +
> + rc = pnv_ocxl_get_xsl_irq(dev, &hwirq);
> + if (rc)
> + return rc;
> +
> + rc = map_irq_registers(dev, spa);
> + if (rc)
> + return rc;
> +
> + spa->irq_name = kasprintf(GFP_KERNEL, "ocxl-xsl-%x-%x-%x",
> + link->domain, link->bus, link->dev);
> + if (!spa->irq_name) {
> + unmap_irq_registers(spa);
> + dev_err(&dev->dev, "Can't allocate name for xsl interrupt\n");
> + return -ENOMEM;
> + }
> + /*
> + * At some point, we'll need to look into allowing a higher
> + * number of interrupts. Could we have an IRQ domain per link?
> + */
> + spa->virq = irq_create_mapping(NULL, hwirq);
> + if (!spa->virq) {
> + kfree(spa->irq_name);
> + unmap_irq_registers(spa);
> + dev_err(&dev->dev,
> + "irq_create_mapping failed for translation interrupt\n");
> + return -EINVAL;
> + }
> +
> + dev_dbg(&dev->dev, "hwirq %d mapped to virq %d\n", hwirq, spa->virq);
> +
> + rc = request_irq(spa->virq, xsl_fault_handler, 0, spa->irq_name,
> + link);
> + if (rc) {
> + irq_dispose_mapping(spa->virq);
> + kfree(spa->irq_name);
> + unmap_irq_registers(spa);
> + dev_err(&dev->dev,
> + "request_irq failed for translation interrupt: %d\n",
> + rc);
> + return -EINVAL;
> + }
> + return 0;
> +}
> +
> +static void release_xsl_irq(struct link *link)
> +{
> + struct spa *spa = link->spa;
> +
> + if (spa->virq) {
> + free_irq(spa->virq, link);
> + irq_dispose_mapping(spa->virq);
> + }
> + kfree(spa->irq_name);
> + unmap_irq_registers(spa);
> +}
> +
> +static int alloc_spa(struct pci_dev *dev, struct link *link)
> +{
> + struct spa *spa;
> +
> + spa = kzalloc(sizeof(struct spa), GFP_KERNEL);
> + if (!spa)
> + return -ENOMEM;
> +
> + mutex_init(&spa->spa_lock);
> + INIT_RADIX_TREE(&spa->pe_tree, GFP_KERNEL);
> + INIT_WORK(&spa->xsl_fault.fault_work, xsl_fault_handler_bh);
> +
> + spa->spa_order = SPA_SPA_SIZE_LOG - PAGE_SHIFT;
> + spa->spa_mem = (struct ocxl_process_element *)
> + __get_free_pages(GFP_KERNEL | __GFP_ZERO, spa->spa_order);
> + if (!spa->spa_mem) {
> + dev_err(&dev->dev, "Can't allocate Shared Process Area\n");
> + kfree(spa);
> + return -ENOMEM;
> + }
> + pr_debug("Allocated SPA for %x:%x:%x at %p\n", link->domain, link->bus,
> + link->dev, spa->spa_mem);
> +
> + link->spa = spa;
> + return 0;
> +}
> +
> +static void free_spa(struct link *link)
> +{
> + struct spa *spa = link->spa;
> +
> + pr_debug("Freeing SPA for %x:%x:%x\n", link->domain, link->bus,
> + link->dev);
> +
> + if (spa && spa->spa_mem) {
> + free_pages((unsigned long) spa->spa_mem, spa->spa_order);
> + kfree(spa);
> + link->spa = NULL;
> + }
> +}
> +
> +static int alloc_link(struct pci_dev *dev, int PE_mask, struct link **out_link)
> +{
> + struct link *link;
> + int rc;
> +
> + link = kzalloc(sizeof(struct link), GFP_KERNEL);
> + if (!link)
> + return -ENOMEM;
> +
> + kref_init(&link->ref);
> + link->domain = pci_domain_nr(dev->bus);
> + link->bus = dev->bus->number;
> + link->dev = PCI_SLOT(dev->devfn);
> + atomic_set(&link->irq_available, MAX_IRQ_PER_LINK);
> +
> + rc = alloc_spa(dev, link);
> + if (rc)
> + goto err_free;
> +
> + rc = setup_xsl_irq(dev, link);
> + if (rc)
> + goto err_spa;
> +
> + /* platform specific hook */
> + rc = pnv_ocxl_spa_setup(dev, link->spa->spa_mem, PE_mask,
> + &link->platform_data);
> + if (rc)
> + goto err_xsl_irq;
> +
> + *out_link = link;
> + return 0;
> +
> +err_xsl_irq:
> + release_xsl_irq(link);
> +err_spa:
> + free_spa(link);
> +err_free:
> + kfree(link);
> + return rc;
> +}
> +
> +static void free_link(struct link *link)
> +{
> + release_xsl_irq(link);
> + free_spa(link);
> + kfree(link);
> +}
> +
> +int ocxl_link_setup(struct pci_dev *dev, int PE_mask, void **link_handle)
> +{
> + int rc = 0;
> + struct link *link;
> +
> + mutex_lock(&links_list_lock);
> + list_for_each_entry(link, &links_list, list) {
> + /* The functions of a device all share the same link */
> + if (link->domain == pci_domain_nr(dev->bus) &&
> + link->bus == dev->bus->number &&
> + link->dev == PCI_SLOT(dev->devfn)) {
> + kref_get(&link->ref);
> + *link_handle = link;
> + goto unlock;
> + }
> + }
> + rc = alloc_link(dev, PE_mask, &link);
> + if (rc)
> + goto unlock;
> +
> + list_add(&link->list, &links_list);
> + *link_handle = link;
> +unlock:
> + mutex_unlock(&links_list_lock);
> + return rc;
> +}
> +
> +static void release_xsl(struct kref *ref)
> +{
> + struct link *link = container_of(ref, struct link, ref);
> +
> + list_del(&link->list);
> + /* call platform code before releasing data */
> + pnv_ocxl_spa_release(link->platform_data);
> + free_link(link);
> +}
> +
> +void ocxl_link_release(struct pci_dev *dev, void *link_handle)
> +{
> + struct link *link = (struct link *) link_handle;
> +
> + mutex_lock(&links_list_lock);
> + kref_put(&link->ref, release_xsl);
> + mutex_unlock(&links_list_lock);
> +}
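Together, `ocxl_link_setup()` and `ocxl_link_release()` form a refcounted, mutex-protected list keyed on (domain, bus, device), so all functions of one device share a single link. Below is a userspace sketch of that pattern. A pthread mutex and a plain counter stand in for the kernel `mutex` and `kref`. All names (`link_sketch`, `link_get()`, `link_put()`) are hypothetical, and the SPA, XSL irq and platform setup/teardown steps are omitted.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical model of struct link: only the lookup key and refcount. */
struct link_sketch {
	struct link_sketch *next;
	int domain, bus, dev;
	int refs;			/* plays the role of the kref */
};

static struct link_sketch *links;	/* list head, like links_list */
static pthread_mutex_t links_lock = PTHREAD_MUTEX_INITIALIZER;

/* Return the existing link for (domain, bus, dev) with an extra reference,
 * or allocate a new one holding a single reference. NULL if out of memory. */
static struct link_sketch *link_get(int domain, int bus, int dev)
{
	struct link_sketch *l;

	pthread_mutex_lock(&links_lock);
	for (l = links; l; l = l->next) {
		if (l->domain == domain && l->bus == bus && l->dev == dev) {
			l->refs++;
			goto unlock;
		}
	}
	l = calloc(1, sizeof(*l));
	if (l) {
		l->domain = domain;
		l->bus = bus;
		l->dev = dev;
		l->refs = 1;
		l->next = links;
		links = l;
	}
unlock:
	pthread_mutex_unlock(&links_lock);
	return l;
}

/* Drop a reference. The last one unlinks and frees the link while still
 * holding the lock, so a concurrent link_get() cannot find it mid-free. */
static void link_put(struct link_sketch *l)
{
	struct link_sketch **pp;

	pthread_mutex_lock(&links_lock);
	if (--l->refs == 0) {
		for (pp = &links; *pp; pp = &(*pp)->next) {
			if (*pp == l) {
				*pp = l->next;
				break;
			}
		}
		free(l);
	}
	pthread_mutex_unlock(&links_lock);
}
```

The final `kref_put()` in the patch runs under `links_list_lock`, which is what makes this safe. The `list_del()` in the release callback therefore cannot race with the list walk in `ocxl_link_setup()`.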
> +
> +static u64 calculate_cfg_state(bool kernel)
> +{
> + u64 state;
> +
> + state = SPA_CFG_DR;
> + if (mfspr(SPRN_LPCR) & LPCR_TC)
> + state |= SPA_CFG_TC;
> + if (radix_enabled())
> + state |= SPA_CFG_XLAT_ror;
> + else
> + state |= SPA_CFG_XLAT_hpt;
> + state |= SPA_CFG_HV;
> + if (kernel) {
> + if (mfmsr() & MSR_SF)
> + state |= SPA_CFG_SF;
> + } else {
> + state |= SPA_CFG_PR;
> + if (!test_tsk_thread_flag(current, TIF_32BIT))
> + state |= SPA_CFG_SF;
> + }
> + return state;
> +}
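`calculate_cfg_state()` is a small decision tree:

- DR and HV are always set.
- TC follows LPCR[TC].
- The translation mode follows the MMU type.
- PR and SF depend on whether the context is a kernel one (`pidr == 0` in `ocxl_link_add_pe()`) and on 32- versus 64-bit mode.

Pulling the inputs out as booleans makes that tree testable on its own. The `CFG_*` bit positions below are hypothetical placeholders, not the patch's `SPA_CFG_*` values, which this excerpt does not show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignments, for illustration only. */
#define CFG_TC		(1ull << 0)
#define CFG_DR		(1ull << 1)
#define CFG_XLAT_MASK	(3ull << 2)	/* two-bit translation-mode field */
#define CFG_XLAT_HPT	(0ull << 2)
#define CFG_XLAT_ROR	(2ull << 2)
#define CFG_HV		(1ull << 4)
#define CFG_PR		(1ull << 5)
#define CFG_SF		(1ull << 6)

struct cfg_inputs {
	bool lpcr_tc;		/* LPCR[TC] is set */
	bool radix;		/* radix MMU in use */
	bool kernel;		/* kernel context (pidr == 0 in the patch) */
	bool msr_sf;		/* kernel runs in 64-bit mode (MSR[SF]) */
	bool task_32bit;	/* user task is 32-bit (TIF_32BIT) */
};

static uint64_t cfg_state_sketch(struct cfg_inputs in)
{
	uint64_t state = CFG_DR | CFG_HV;	/* unconditional bits */

	if (in.lpcr_tc)
		state |= CFG_TC;
	state |= in.radix ? CFG_XLAT_ROR : CFG_XLAT_HPT;
	if (in.kernel) {
		if (in.msr_sf)
			state |= CFG_SF;
	} else {
		state |= CFG_PR;		/* user (problem state) context */
		if (!in.task_32bit)
			state |= CFG_SF;
	}
	return state;
}
```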
> +
> +int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
> + u64 amr, struct mm_struct *mm,
> + void (*xsl_err_cb)(void *data, u64 addr, u64 dsisr),
> + void *xsl_err_data)
> +{
> + struct link *link = (struct link *) link_handle;
> + struct spa *spa = link->spa;
> + struct ocxl_process_element *pe;
> + int pe_handle, rc = 0;
> + struct pe_data *pe_data;
> +
> + BUILD_BUG_ON(sizeof(struct ocxl_process_element) != 128);
> + if (pasid > SPA_PASID_MAX)
> + return -EINVAL;
> +
> + mutex_lock(&spa->spa_lock);
> + pe_handle = pasid & SPA_PE_MASK;
> + pe = spa->spa_mem + pe_handle;
> +
> + if (pe->software_state) {
> + rc = -EBUSY;
> + goto unlock;
> + }
> +
> + pe_data = kmalloc(sizeof(*pe_data), GFP_KERNEL);
> + if (!pe_data) {
> + rc = -ENOMEM;
> + goto unlock;
> + }
> +
> + pe_data->mm = mm;
> + pe_data->xsl_err_cb = xsl_err_cb;
> + pe_data->xsl_err_data = xsl_err_data;
> +
> + memset(pe, 0, sizeof(struct ocxl_process_element));
> + pe->config_state = cpu_to_be64(calculate_cfg_state(pidr == 0));
> + pe->lpid = cpu_to_be32(mfspr(SPRN_LPID));
> + pe->pid = cpu_to_be32(pidr);
> + pe->tid = cpu_to_be32(tidr);
> + pe->amr = cpu_to_be64(amr);
> + pe->software_state = cpu_to_be32(SPA_PE_VALID);

drivers/misc/ocxl/link.c:509:26: warning: incorrect type in assignment (different base types)
drivers/misc/ocxl/link.c:509:26: expected unsigned long long [unsigned] [usertype] config_state
drivers/misc/ocxl/link.c:509:26: got restricted __be64 [usertype] <noident>
drivers/misc/ocxl/link.c:510:18: warning: incorrect type in assignment (different base types)
drivers/misc/ocxl/link.c:510:18: expected unsigned int [unsigned] [usertype] lpid
drivers/misc/ocxl/link.c:510:18: got restricted __be32 [usertype] <noident>
drivers/misc/ocxl/link.c:511:17: warning: incorrect type in assignment (different base types)
drivers/misc/ocxl/link.c:511:17: expected unsigned int [unsigned] [usertype] pid
drivers/misc/ocxl/link.c:511:17: got restricted __be32 [usertype] <noident>
drivers/misc/ocxl/link.c:512:17: warning: incorrect type in assignment (different base types)
drivers/misc/ocxl/link.c:512:17: expected unsigned int [unsigned] [usertype] tid
drivers/misc/ocxl/link.c:512:17: got restricted __be32 [usertype] <noident>
drivers/misc/ocxl/link.c:513:17: warning: incorrect type in assignment (different base types)
drivers/misc/ocxl/link.c:513:17: expected unsigned long long [unsigned] [usertype] amr
drivers/misc/ocxl/link.c:513:17: got restricted __be64 [usertype] <noident>
drivers/misc/ocxl/link.c:514:28: warning: incorrect type in assignment (different base types)
drivers/misc/ocxl/link.c:514:28: expected unsigned int [unsigned] [usertype] software_state
drivers/misc/ocxl/link.c:514:28: got restricted __be32 [usertype] <noident>
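Most likely these warnings go away if the big-endian fields of `struct ocxl_process_element` in ocxl_internal.h are declared `__be64`/`__be32` rather than `u64`/`u32`. The `cpu_to_be64()`/`cpu_to_be32()` results would then land in fields of matching type. The userspace sketch below models that change, with the layout copied from the patch. `be32_t`, `be64_t`, `to_be32()`, `to_be64()` and `from_be32()` are hypothetical stand-ins for the kernel types and helpers. Wrapping the raw integer in a one-member struct makes a plain C compiler reject the assignments that sparse flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for __be32/__be64: a one-member struct cannot be
 * assigned from a plain integer, mimicking sparse's restricted types. */
typedef struct { uint32_t raw; } be32_t;
typedef struct { uint64_t raw; } be64_t;

static int host_is_little_endian(void)
{
	const uint16_t one = 1;

	return *(const uint8_t *)&one == 1;
}

static uint32_t swap32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0xff00u) |
	       ((v << 8) & 0xff0000u) | (v << 24);
}

static uint64_t swap64(uint64_t v)
{
	return ((uint64_t)swap32((uint32_t)v) << 32) | swap32((uint32_t)(v >> 32));
}

/* Hypothetical equivalents of cpu_to_be32(), cpu_to_be64(), be32_to_cpu(). */
static be32_t to_be32(uint32_t v)
{
	be32_t r = { host_is_little_endian() ? swap32(v) : v };
	return r;
}

static be64_t to_be64(uint64_t v)
{
	be64_t r = { host_is_little_endian() ? swap64(v) : v };
	return r;
}

static uint32_t from_be32(be32_t v)
{
	return host_is_little_endian() ? swap32(v.raw) : v.raw;
}

/* Same layout as the patch, with the big-endian fields typed as such. */
struct process_element_sketch {
	be64_t config_state;
	uint32_t reserved1[11];
	be32_t lpid;
	be32_t tid;
	be32_t pid;
	uint32_t reserved2[10];
	be64_t amr;
	uint32_t reserved3[3];
	be32_t software_state;
};

/* Mirrors BUILD_BUG_ON(sizeof(struct ocxl_process_element) != 128). */
_Static_assert(sizeof(struct process_element_sketch) == 128,
	       "process element must be 128 bytes");
```

With the fields typed this way, `pe.lpid = 0x12345678;` no longer compiles. `pe.lpid = to_be32(0x12345678);` does compile and stores the bytes 12 34 56 78 regardless of host byte order.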

> +
> + mm_context_add_copro(mm);
> + /*
> + * Barrier is to make sure PE is visible in the SPA before it
> + * is used by the device. It also helps with the global TLBI
> + * invalidation
> + */
> + mb();
> + radix_tree_insert(&spa->pe_tree, pe_handle, pe_data);
> +
> + /*
> + * The mm must stay valid for as long as the device uses it. We
> + * lower the count when the context is removed from the SPA.
> + *
> + * We grab mm_count (and not mm_users), as we don't want to
> + * end up in a circular dependency if a process mmaps its
> + * mmio, therefore incrementing the file ref count when
> + * calling mmap(), and forgets to unmap before exiting. In
> + * that scenario, when the kernel handles the death of the
> + * process, the file is not cleaned because unmap was not
> + * called, and the mm wouldn't be freed because we would still
> + * have a reference on mm_users. Incrementing mm_count solves
> + * the problem.
> + */
> + mmgrab(mm);
> +unlock:
> + mutex_unlock(&spa->spa_lock);
> + return rc;
> +}
> +
> +int ocxl_link_remove_pe(void *link_handle, int pasid)
> +{
> + struct link *link = (struct link *) link_handle;
> + struct spa *spa = link->spa;
> + struct ocxl_process_element *pe;
> + struct pe_data *pe_data;
> + int pe_handle, rc;
> +
> + if (pasid > SPA_PASID_MAX)
> + return -EINVAL;
> +
> + /*
> + * About synchronization with our memory fault handler:
> + *
> + * Before removing the PE, the driver is supposed to have
> + * notified the AFU, which should have cleaned up and made
> + * sure the PASID is no longer in use, including pending
> + * interrupts. However, there's no way to be sure...
> + *
> + * We clear the PE and remove the context from our radix
> + * tree. From that point on, any new interrupt for that
> + * context will fail silently, which is ok. As mentioned
> + * above, that's not expected, but it could happen if the
> + * driver or AFU didn't do the right thing.
> + *
> + * There could still be a bottom half running, but we don't
> + * need to wait/flush, as it is managing a reference count on
> + * the mm it reads from the radix tree.
> + */
> + pe_handle = pasid & SPA_PE_MASK;
> + pe = spa->spa_mem + pe_handle;
> +
> + mutex_lock(&spa->spa_lock);
> +
> + if (!(pe->software_state & cpu_to_be32(SPA_PE_VALID))) {

drivers/misc/ocxl/link.c:581:36: warning: restricted __be32 degrades to integer
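This warning comes from mixing the plain `u32` field with the `__be32` constant. Declaring `software_state` as `__be32` should also cover it, since sparse accepts `&` between two values of the same restricted type. The alternative is to convert the field first: `be32_to_cpu(pe->software_state) & SPA_PE_VALID`.

Below is a minimal userspace sketch of the second form, which reads the big-endian word byte by byte. `load_be32()` and `pe_is_valid()` are hypothetical names. The value of `SPA_PE_VALID_SKETCH` is also an assumption, since the real `SPA_PE_VALID` definition is not quoted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed value for illustration; see SPA_PE_VALID in the patch. */
#define SPA_PE_VALID_SKETCH 0x80000000u

/* Read a 32-bit big-endian word from memory, whatever the host byte order. */
static uint32_t load_be32(const void *p)
{
	const uint8_t *b = p;

	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* Convert to CPU order first, then test against a CPU-order constant. */
static bool pe_is_valid(const void *software_state)
{
	return (load_be32(software_state) & SPA_PE_VALID_SKETCH) != 0;
}
```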

> + rc = -EINVAL;
> + goto unlock;
> + }
> +
> + memset(pe, 0, sizeof(struct ocxl_process_element));
> + /*
> + * The barrier makes sure the PE is removed from the SPA
> + * before we clear the NPU context cache below, so that the
> + * old PE cannot be reloaded erroneously.
> + */
> + mb();
> +
> + /*
> + * hook to platform code
> + * On powerpc, the entry needs to be cleared from the context
> + * cache of the NPU.
> + */
> + rc = pnv_ocxl_spa_remove_pe(link->platform_data, pe_handle);
> + WARN_ON(rc);
> +
> + pe_data = radix_tree_delete(&spa->pe_tree, pe_handle);
> + if (!pe_data) {
> + WARN(1, "Couldn't find pe data when removing PE\n");
> + } else {
> + mm_context_remove_copro(pe_data->mm);
> + mmdrop(pe_data->mm);
> + kfree_rcu(pe_data, rcu);
> + }
> +unlock:
> + mutex_unlock(&spa->spa_lock);
> + return rc;
> +}
> diff --git a/drivers/misc/ocxl/main.c b/drivers/misc/ocxl/main.c
> new file mode 100644
> index 000000000000..be34b8fae97a
> --- /dev/null
> +++ b/drivers/misc/ocxl/main.c
> @@ -0,0 +1,40 @@
> +/*
> + * Copyright 2017 IBM Corp.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or (at your option) any later version.
> + */
> +
> +#include <linux/module.h>
> +#include <linux/pci.h>
> +#include "ocxl_internal.h"
> +
> +static int __init init_ocxl(void)
> +{
> + int rc = 0;
> +
> + rc = ocxl_file_init();
> + if (rc)
> + return rc;
> +
> + rc = pci_register_driver(&ocxl_pci_driver);
> + if (rc) {
> + ocxl_file_exit();
> + return rc;
> + }
> + return 0;
> +}
> +
> +static void exit_ocxl(void)
> +{
> + pci_unregister_driver(&ocxl_pci_driver);
> + ocxl_file_exit();
> +}
> +
> +module_init(init_ocxl);
> +module_exit(exit_ocxl);
> +
> +MODULE_DESCRIPTION("Open Coherent Accelerator");
> +MODULE_LICENSE("GPL");
> diff --git a/drivers/misc/ocxl/ocxl_internal.h b/drivers/misc/ocxl/ocxl_internal.h
> new file mode 100644
> index 000000000000..e07f7d523275
> --- /dev/null
> +++ b/drivers/misc/ocxl/ocxl_internal.h
> @@ -0,0 +1,200 @@
> +/*
> + * Copyright 2017 IBM Corp.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or (at your option) any later version.
> + */
> +
> +#ifndef _OCXL_INTERNAL_H_
> +#define _OCXL_INTERNAL_H_
> +
> +#include <linux/pci.h>
> +#include <linux/cdev.h>
> +#include <linux/list.h>
> +
> +#define OCXL_AFU_NAME_SZ (24+1) /* add 1 for NULL termination */
> +#define MAX_IRQ_PER_LINK 2000
> +#define MAX_IRQ_PER_CONTEXT MAX_IRQ_PER_LINK
> +
> +#define to_ocxl_function(d) container_of(d, struct ocxl_fn, dev)
> +#define to_ocxl_afu(d) container_of(d, struct ocxl_afu, dev)
> +
> +extern struct pci_driver ocxl_pci_driver;
> +
> +/*
> + * The following 2 structures are a fairly generic way of representing
> + * the configuration data for a function and AFU, as read from the
> + * configuration space.
> + */
> +struct ocxl_afu_config {
> + u8 idx;
> + int dvsec_afu_control_pos;
> + char name[OCXL_AFU_NAME_SZ];
> + u8 version_major;
> + u8 version_minor;
> + u8 afuc_type;
> + u8 afum_type;
> + u8 profile;
> + u8 global_mmio_bar;
> + u64 global_mmio_offset;
> + u32 global_mmio_size;
> + u8 pp_mmio_bar;
> + u64 pp_mmio_offset;
> + u32 pp_mmio_stride;
> + u8 log_mem_size;
> + u8 pasid_supported_log;
> + u16 actag_supported;
> +};
> +
> +struct ocxl_fn_config {
> + int dvsec_tl_pos;
> + int dvsec_function_pos;
> + int dvsec_afu_info_pos;
> + s8 max_pasid_log;
> + s8 max_afu_index;
> +};
> +
> +struct ocxl_fn {
> + struct device dev;
> + int bar_used[3];
> + struct ocxl_fn_config config;
> + struct list_head afu_list;
> + int pasid_base;
> + int actag_base;
> + int actag_enabled;
> + int actag_supported;
> + struct list_head pasid_list;
> + struct list_head actag_list;
> + void *link;
> +};
> +
> +struct ocxl_afu {
> + struct ocxl_fn *fn;
> + struct list_head list;
> + struct device dev;
> + struct cdev cdev;
> + struct ocxl_afu_config config;
> + int pasid_base;
> + int pasid_count; /* opened contexts */
> + int pasid_max; /* maximum number of contexts */
> + int actag_base;
> + int actag_enabled;
> + struct mutex contexts_lock;
> + struct idr contexts_idr;
> + struct mutex afu_control_lock;
> + u64 global_mmio_start;
> + u64 irq_base_offset;
> + void __iomem *global_mmio_ptr;
> + u64 pp_mmio_start;
> + struct bin_attribute attr_global_mmio;
> +};
> +
> +enum ocxl_context_status {
> + CLOSED,
> + OPENED,
> + ATTACHED,
> +};
> +
> +// Contains metadata about a translation fault
> +struct ocxl_xsl_error {
> + u64 addr; // The address that triggered the fault
> + u64 dsisr; // the value of the dsisr register
> + u64 count; // The number of times this fault has been triggered
> +};
> +
> +struct ocxl_context {
> + struct ocxl_afu *afu;
> + int pasid;
> + struct mutex status_mutex;
> + enum ocxl_context_status status;
> + struct address_space *mapping;
> + struct mutex mapping_lock;
> + wait_queue_head_t events_wq;
> + struct mutex xsl_error_lock;
> + struct ocxl_xsl_error xsl_error;
> + struct mutex irq_lock;
> + struct idr irq_idr;
> +};
> +
> +struct ocxl_process_element {
> + u64 config_state;
> + u32 reserved1[11];
> + u32 lpid;
> + u32 tid;
> + u32 pid;
> + u32 reserved2[10];
> + u64 amr;
> + u32 reserved3[3];
> + u32 software_state;
> +};
> +
> +
> +extern struct ocxl_afu *ocxl_afu_get(struct ocxl_afu *afu);
> +extern void ocxl_afu_put(struct ocxl_afu *afu);
> +
> +extern int ocxl_create_cdev(struct ocxl_afu *afu);
> +extern void ocxl_destroy_cdev(struct ocxl_afu *afu);
> +extern int ocxl_register_afu(struct ocxl_afu *afu);
> +extern void ocxl_unregister_afu(struct ocxl_afu *afu);
> +
> +extern int ocxl_file_init(void);
> +extern void ocxl_file_exit(void);
> +
> +extern int ocxl_config_read_function(struct pci_dev *dev,
> + struct ocxl_fn_config *fn);
> +
> +extern int ocxl_config_check_afu_index(struct pci_dev *dev,
> + struct ocxl_fn_config *fn, int afu_idx);
> +extern int ocxl_config_read_afu(struct pci_dev *dev,
> + struct ocxl_fn_config *fn,
> + struct ocxl_afu_config *afu,
> + u8 afu_idx);
> +extern int ocxl_config_get_pasid_info(struct pci_dev *dev, int *count);
> +extern void ocxl_config_set_afu_pasid(struct pci_dev *dev,
> + int afu_control,
> + int pasid_base, u32 pasid_count_log);
> +extern int ocxl_config_get_actag_info(struct pci_dev *dev,
> + u16 *base, u16 *enabled, u16 *supported);
> +extern void ocxl_config_set_actag(struct pci_dev *dev, int func_dvsec,
> + u32 tag_first, u32 tag_count);
> +extern void ocxl_config_set_afu_actag(struct pci_dev *dev, int afu_control,
> + int actag_base, int actag_count);
> +extern void ocxl_config_set_afu_state(struct pci_dev *dev, int afu_control,
> + int enable);
> +extern int ocxl_config_set_TL(struct pci_dev *dev, int tl_dvsec);
> +extern int ocxl_config_terminate_pasid(struct pci_dev *dev, int afu_control,
> + int pasid);
> +
> +extern int ocxl_link_setup(struct pci_dev *dev, int PE_mask,
> + void **link_handle);
> +extern void ocxl_link_release(struct pci_dev *dev, void *link_handle);
> +extern int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
> + u64 amr, struct mm_struct *mm,
> + void (*xsl_err_cb)(void *data, u64 addr, u64 dsisr),
> + void *xsl_err_data);
> +extern int ocxl_link_remove_pe(void *link_handle, int pasid);
> +extern int ocxl_link_irq_alloc(void *link_handle, int *hw_irq,
> + u64 *addr);
> +extern void ocxl_link_free_irq(void *link_handle, int hw_irq);
> +
> +extern int ocxl_pasid_afu_alloc(struct ocxl_fn *fn, u32 size);
> +extern void ocxl_pasid_afu_free(struct ocxl_fn *fn, u32 start, u32 size);
> +extern int ocxl_actag_afu_alloc(struct ocxl_fn *fn, u32 size);
> +extern void ocxl_actag_afu_free(struct ocxl_fn *fn, u32 start, u32 size);
> +
> +extern struct ocxl_context *ocxl_context_alloc(void);
> +extern int ocxl_context_init(struct ocxl_context *ctx, struct ocxl_afu *afu,
> + struct address_space *mapping);
> +extern int ocxl_context_attach(struct ocxl_context *ctx, u64 amr);
> +extern int ocxl_context_mmap(struct ocxl_context *ctx,
> + struct vm_area_struct *vma);
> +extern int ocxl_context_detach(struct ocxl_context *ctx);
> +extern void ocxl_context_detach_all(struct ocxl_afu *afu);
> +extern void ocxl_context_free(struct ocxl_context *ctx);
> +
> +extern int ocxl_sysfs_add_afu(struct ocxl_afu *afu);
> +extern void ocxl_sysfs_remove_afu(struct ocxl_afu *afu);
> +
> +#endif /* _OCXL_INTERNAL_H_ */
> diff --git a/drivers/misc/ocxl/pasid.c b/drivers/misc/ocxl/pasid.c
> new file mode 100644
> index 000000000000..ea999a3a99b4
> --- /dev/null
> +++ b/drivers/misc/ocxl/pasid.c
> @@ -0,0 +1,114 @@
> +/*
> + * Copyright 2017 IBM Corp.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or (at your option) any later version.
> + */
> +
> +#include "ocxl_internal.h"
> +
> +
> +struct id_range {
> + struct list_head list;
> + u32 start;
> + u32 end;
> +};
> +
> +#ifdef DEBUG
> +static void dump_list(struct list_head *head, char *type_str)
> +{
> + struct id_range *cur;
> +
> + pr_debug("%s ranges allocated:\n", type_str);
> + list_for_each_entry(cur, head, list) {
> + pr_debug("Range %d->%d\n", cur->start, cur->end);
> + }
> +}
> +#endif
> +
> +static int range_alloc(struct list_head *head, u32 size, int max_id,
> + char *type_str)
> +{
> + struct list_head *pos;
> + struct id_range *cur, *new;
> + int rc, last_end;
> +
> + new = kmalloc(sizeof(struct id_range), GFP_KERNEL);
> + if (!new)
> + return -ENOMEM;
> +
> + pos = head;
> + last_end = -1;
> + list_for_each_entry(cur, head, list) {
> + if ((cur->start - last_end) > size)
> + break;
> + last_end = cur->end;
> + pos = &cur->list;
> + }
> +
> + new->start = last_end + 1;
> + new->end = new->start + size - 1;
> +
> + if (new->end > max_id) {
> + kfree(new);
> + rc = -ENOSPC;
> + } else {
> + list_add(&new->list, pos);
> + rc = new->start;
> + }
> +
> +#ifdef DEBUG
> + dump_list(head, type_str);
> +#endif
> + return rc;
> +}
> +
> +static void range_free(struct list_head *head, u32 start, u32 size,
> + char *type_str)
> +{
> + bool found = false;
> + struct id_range *cur, *tmp;
> +
> + list_for_each_entry_safe(cur, tmp, head, list) {
> + if (cur->start == start && cur->end == (start + size - 1)) {
> + found = true;
> + list_del(&cur->list);
> + kfree(cur);
> + break;
> + }
> + }
> + WARN_ON(!found);
> +#ifdef DEBUG
> + dump_list(head, type_str);
> +#endif
> +}
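`range_alloc()` is a first-fit allocator over a list of allocated ranges kept sorted by start ID, and `range_free()` removes an exact match. The userspace sketch below reproduces that logic with a singly linked list so it can be exercised outside the kernel. `struct range`, `range_alloc_sketch()` and `range_free_sketch()` are hypothetical names. The sketch differs from the patch in three small ways:

- It checks the limit before allocating the node.
- It asserts `size > 0`.
- It returns `-ENOENT` where the patch uses `WARN_ON()`.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* One allocated ID range, [start, end] inclusive, list sorted by start. */
struct range {
	struct range *next;
	uint32_t start;
	uint32_t end;
};

/*
 * First fit: stop at the first gap that can hold "size" IDs and link the
 * new range there (or after the last range). max_id is inclusive, like
 * the patch's "new->end > max_id" test. Returns the first ID or -errno.
 */
static int range_alloc_sketch(struct range **head, uint32_t size,
			      long long max_id)
{
	struct range **pos = head;	/* link the new node here */
	long long last_end = -1;	/* end of the previous range */
	struct range *new;

	assert(size > 0);
	for (struct range *cur = *head; cur; cur = cur->next) {
		/* the gap holds cur->start - last_end - 1 free IDs */
		if ((long long)cur->start - last_end > (long long)size)
			break;
		last_end = cur->end;
		pos = &cur->next;
	}
	if (last_end + (long long)size > max_id)	/* new end > max_id */
		return -ENOSPC;

	new = malloc(sizeof(*new));
	if (!new)
		return -ENOMEM;
	new->start = (uint32_t)(last_end + 1);
	new->end = new->start + size - 1;
	new->next = *pos;
	*pos = new;
	return (int)new->start;
}

/* Remove the range exactly matching [start, start + size - 1]. */
static int range_free_sketch(struct range **head, uint32_t start, uint32_t size)
{
	for (struct range **pp = head; *pp; pp = &(*pp)->next) {
		struct range *cur = *pp;

		if (cur->start == start && cur->end == start + size - 1) {
			*pp = cur->next;
			free(cur);
			return 0;
		}
	}
	return -ENOENT;
}
```

For example, with `max_id` = 15, allocating 4, 4 and 8 IDs yields starts 0, 4 and 8. After freeing the range at 4, two 2-ID requests land at 4 and 6, filling the hole.

One thing that may be worth double-checking in the patch: `ocxl_pasid_afu_alloc()` passes `1 << max_pasid_log`, which reads like a count rather than a highest valid ID. With the inclusive check, a range could then end one PASID past the function's last one. The same question applies to `fn->actag_enabled` in `ocxl_actag_afu_alloc()`.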
> +
> +int ocxl_pasid_afu_alloc(struct ocxl_fn *fn, u32 size)
> +{
> + int max_pasid;
> +
> + if (fn->config.max_pasid_log < 0)
> + return -ENOSPC;
> + max_pasid = 1 << fn->config.max_pasid_log;
> + return range_alloc(&fn->pasid_list, size, max_pasid, "afu pasid");
> +}
> +
> +void ocxl_pasid_afu_free(struct ocxl_fn *fn, u32 start, u32 size)
> +{
> + return range_free(&fn->pasid_list, start, size, "afu pasid");
> +}
> +
> +int ocxl_actag_afu_alloc(struct ocxl_fn *fn, u32 size)
> +{
> + int max_actag;
> +
> + max_actag = fn->actag_enabled;
> + return range_alloc(&fn->actag_list, size, max_actag, "afu actag");
> +}
> +
> +void ocxl_actag_afu_free(struct ocxl_fn *fn, u32 start, u32 size)
> +{
> + return range_free(&fn->actag_list, start, size, "afu actag");
> +}
> diff --git a/drivers/misc/ocxl/pci.c b/drivers/misc/ocxl/pci.c
> new file mode 100644
> index 000000000000..39e7bdd48215
> --- /dev/null
> +++ b/drivers/misc/ocxl/pci.c
> @@ -0,0 +1,592 @@
> +/*
> + * Copyright 2017 IBM Corp.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or (at your option) any later version.
> + */
> +
> +#include <linux/module.h>
> +#include <linux/pci.h>
> +#include <linux/idr.h>
> +#include <asm/pnv-ocxl.h>
> +#include "ocxl_internal.h"
> +
> +/*
> + * Any opencapi device which wants to use this 'generic' driver should
> + * use the 0x062B device ID. Vendors should define the subsystem
> + * vendor/device ID to help differentiate devices.
> + */
> +static const struct pci_device_id ocxl_pci_tbl[] = {
> + { PCI_DEVICE(PCI_VENDOR_ID_IBM, 0x062B), },
> + { }
> +};
> +MODULE_DEVICE_TABLE(pci, ocxl_pci_tbl);
> +
> +
> +static struct ocxl_fn *ocxl_fn_get(struct ocxl_fn *fn)
> +{
> + return (get_device(&fn->dev) == NULL) ? NULL : fn;
> +}
> +
> +static void ocxl_fn_put(struct ocxl_fn *fn)
> +{
> + put_device(&fn->dev);
> +}
> +
> +struct ocxl_afu *ocxl_afu_get(struct ocxl_afu *afu)
> +{
> + return (get_device(&afu->dev) == NULL) ? NULL : afu;
> +}
> +
> +void ocxl_afu_put(struct ocxl_afu *afu)
> +{
> + put_device(&afu->dev);
> +}
> +
> +static struct ocxl_afu *alloc_afu(struct ocxl_fn *fn)
> +{
> + struct ocxl_afu *afu;
> +
> + afu = kzalloc(sizeof(struct ocxl_afu), GFP_KERNEL);
> + if (!afu)
> + return NULL;
> +
> + mutex_init(&afu->contexts_lock);
> + mutex_init(&afu->afu_control_lock);
> + idr_init(&afu->contexts_idr);
> + afu->fn = fn;
> + ocxl_fn_get(fn);
> + return afu;
> +}
> +
> +static void free_afu(struct ocxl_afu *afu)
> +{
> + idr_destroy(&afu->contexts_idr);
> + ocxl_fn_put(afu->fn);
> + kfree(afu);
> +}
> +
> +static void free_afu_dev(struct device *dev)
> +{
> + struct ocxl_afu *afu = to_ocxl_afu(dev);
> +
> + ocxl_unregister_afu(afu);
> + free_afu(afu);
> +}
> +
> +static int set_afu_device(struct ocxl_afu *afu, const char *location)
> +{
> + struct ocxl_fn *fn = afu->fn;
> + int rc;
> +
> + afu->dev.parent = &fn->dev;
> + afu->dev.release = free_afu_dev;
> + rc = dev_set_name(&afu->dev, "%s.%s.%hhu", afu->config.name, location,
> + afu->config.idx);
> + return rc;
> +}
> +
> +static int assign_afu_actag(struct ocxl_afu *afu, struct pci_dev *dev)
> +{
> + struct ocxl_fn *fn = afu->fn;
> + int actag_count, actag_offset;
> +
> + /*
> + * If fewer actags were enabled for the function than it
> + * supports, each AFU's count is scaled down by the same ratio
> + */
> + actag_count = afu->config.actag_supported *
> + fn->actag_enabled / fn->actag_supported;
> + actag_offset = ocxl_actag_afu_alloc(fn, actag_count);
> + if (actag_offset < 0) {
> + dev_err(&afu->dev, "Can't allocate %d actags for AFU: %d\n",
> + actag_count, actag_offset);
> + return actag_offset;
> + }
> + afu->actag_base = fn->actag_base + actag_offset;
> + afu->actag_enabled = actag_count;
> +
> + ocxl_config_set_afu_actag(dev, afu->config.dvsec_afu_control_pos,
> + afu->actag_base, afu->actag_enabled);
> + dev_dbg(&afu->dev, "actag base=%d enabled=%d\n",
> + afu->actag_base, afu->actag_enabled);
> + return 0;
> +}
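`assign_afu_actag()` scales each AFU's `actag_supported` by the ratio of acTags enabled to acTags supported for the whole function. It multiplies before dividing, so the integer result is rounded down only once. Below is a worked version of that arithmetic; `afu_actag_share()` is a hypothetical helper name.

```c
#include <assert.h>

/* Scale an AFU's requested acTag count by the function's enabled/supported
 * ratio; integer division rounds down, as in assign_afu_actag(). */
static int afu_actag_share(int afu_supported, int fn_enabled, int fn_supported)
{
	assert(fn_supported > 0);	/* the patch has no such guard */
	return afu_supported * fn_enabled / fn_supported;
}
```

For example, consider an AFU asking for 32 acTags on a function that got only 16 of its 64. It ends up with 32 * 16 / 64 = 8. If `fn->actag_supported` is the sum of the AFUs' requests, rounding down keeps the per-AFU shares within `fn->actag_enabled`. Unless `ocxl_config_get_actag_info()` rules it out, a function reporting zero supported acTags would cause a division by zero in the code as quoted.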
> +
> +static void reclaim_afu_actag(struct ocxl_afu *afu)
> +{
> + struct ocxl_fn *fn = afu->fn;
> + int start_offset, size;
> +
> + start_offset = afu->actag_base - fn->actag_base;
> + size = afu->actag_enabled;
> + ocxl_actag_afu_free(afu->fn, start_offset, size);
> +}
> +
> +static int assign_afu_pasid(struct ocxl_afu *afu, struct pci_dev *dev)
> +{
> + struct ocxl_fn *fn = afu->fn;
> + int pasid_count, pasid_offset;
> +
> + /*
> + * We only support the case where the function configuration
> + * requested enough PASIDs to cover all AFUs.
> + */
> + pasid_count = 1 << afu->config.pasid_supported_log;
> + pasid_offset = ocxl_pasid_afu_alloc(fn, pasid_count);
> + if (pasid_offset < 0) {
> + dev_err(&afu->dev, "Can't allocate %d PASIDs for AFU: %d\n",
> + pasid_count, pasid_offset);
> + return pasid_offset;
> + }
> + afu->pasid_base = fn->pasid_base + pasid_offset;
> + afu->pasid_count = 0;
> + afu->pasid_max = pasid_count;
> +
> + ocxl_config_set_afu_pasid(dev, afu->config.dvsec_afu_control_pos,
> + afu->pasid_base,
> + afu->config.pasid_supported_log);
> + dev_dbg(&afu->dev, "PASID base=%d, enabled=%d\n",
> + afu->pasid_base, pasid_count);
> + return 0;
> +}
> +
> +static void reclaim_afu_pasid(struct ocxl_afu *afu)
> +{
> + struct ocxl_fn *fn = afu->fn;
> + int start_offset, size;
> +
> + start_offset = afu->pasid_base - fn->pasid_base;
> + size = 1 << afu->config.pasid_supported_log;
> + ocxl_pasid_afu_free(afu->fn, start_offset, size);
> +}
> +
> +static int reserve_fn_bar(struct ocxl_fn *fn, int bar)
> +{
> + struct pci_dev *dev = to_pci_dev(fn->dev.parent);
> + int rc, idx;
> +
> + if (bar != 0 && bar != 2 && bar != 4)
> + return -EINVAL;
> +
> + idx = bar >> 1;
> + if (fn->bar_used[idx]++ == 0) {
> + rc = pci_request_region(dev, bar, "ocxl");
> + if (rc)
> + return rc;
> + }
> + return 0;
> +}
> +
> +static void release_fn_bar(struct ocxl_fn *fn, int bar)
> +{
> + struct pci_dev *dev = to_pci_dev(fn->dev.parent);
> + int idx;
> +
> + if (bar != 0 && bar != 2 && bar != 4)
> + return;
> +
> + idx = bar >> 1;
> + if (--fn->bar_used[idx] == 0)
> + pci_release_region(dev, bar);
> + WARN_ON(fn->bar_used[idx] < 0);
> +}
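`reserve_fn_bar()` and `release_fn_bar()` refcount the three BARs the driver accepts (0, 2 and 4, folded to slots 0 to 2 by `bar >> 1`). The region is requested by the first AFU that uses it and released by the last. Below is a userspace model of that counting, with the PCI calls replaced by counters. `struct bar_refs`, its `fail_request` field and the helper names are hypothetical.

One difference from the patch is deliberate: the sketch only counts a user once the request has succeeded. In the patch as quoted, `fn->bar_used[idx]` is already incremented when `pci_request_region()` fails. A later reservation of the same BAR would then report success without the region being held.

```c
#include <assert.h>
#include <errno.h>

struct bar_refs {
	int used[3];		/* users of BAR 0, 2, 4 (index = bar >> 1) */
	int fail_request;	/* simulated pci_request_region() result */
	int request_calls;	/* regions actually requested */
	int release_calls;	/* regions actually released */
};

static int bar_reserve(struct bar_refs *r, int bar)
{
	int idx;

	if (bar != 0 && bar != 2 && bar != 4)
		return -EINVAL;
	idx = bar >> 1;
	if (r->used[idx] == 0) {
		/* first user: pci_request_region() would run here */
		if (r->fail_request)
			return r->fail_request;
		r->request_calls++;
	}
	r->used[idx]++;		/* count the user only once the region is held */
	return 0;
}

static void bar_release(struct bar_refs *r, int bar)
{
	int idx;

	if (bar != 0 && bar != 2 && bar != 4)
		return;
	idx = bar >> 1;
	assert(r->used[idx] > 0);	/* the patch WARNs on underflow */
	if (--r->used[idx] == 0)
		r->release_calls++;	/* last user: pci_release_region() */
}
```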
> +
> +static int map_mmio_areas(struct ocxl_afu *afu, struct pci_dev *dev)
> +{
> + int rc;
> +
> + rc = reserve_fn_bar(afu->fn, afu->config.global_mmio_bar);
> + if (rc)
> + return rc;
> +
> + rc = reserve_fn_bar(afu->fn, afu->config.pp_mmio_bar);
> + if (rc) {
> + release_fn_bar(afu->fn, afu->config.global_mmio_bar);
> + return rc;
> + }
> +
> + afu->global_mmio_start =
> + pci_resource_start(dev, afu->config.global_mmio_bar) +
> + afu->config.global_mmio_offset;
> + afu->pp_mmio_start =
> + pci_resource_start(dev, afu->config.pp_mmio_bar) +
> + afu->config.pp_mmio_offset;
> +
> + afu->global_mmio_ptr = ioremap(afu->global_mmio_start,
> + afu->config.global_mmio_size);
> + if (!afu->global_mmio_ptr) {
> + release_fn_bar(afu->fn, afu->config.pp_mmio_bar);
> + release_fn_bar(afu->fn, afu->config.global_mmio_bar);
> + dev_err(&dev->dev, "Error mapping global mmio area\n");
> + return -ENOMEM;
> + }
> +
> + /*
> + * Leave an empty page between the per-process mmio area and
> + * the AFU interrupt mappings
> + */
> + afu->irq_base_offset = afu->config.pp_mmio_stride + PAGE_SIZE;
> + return 0;
> +}
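`map_mmio_areas()` computes each MMIO area's physical start as its BAR start plus the offset read from the AFU configuration. It also places the AFU interrupt mappings one empty page past the per-process MMIO area (`pp_mmio_stride + PAGE_SIZE`). The sketch below shows that arithmetic in isolation, with the page size passed in because it depends on the kernel configuration. `struct mmio_layout` and `compute_mmio_layout()` are hypothetical, and the sample addresses in the test are made up.

```c
#include <assert.h>
#include <stdint.h>

struct mmio_layout {
	uint64_t global_mmio_start;	/* BAR start + global MMIO offset */
	uint64_t pp_mmio_start;		/* BAR start + per-process MMIO offset */
	uint64_t irq_base_offset;	/* first offset used for AFU irq pages */
};

static struct mmio_layout compute_mmio_layout(uint64_t global_bar_start,
					      uint64_t global_offset,
					      uint64_t pp_bar_start,
					      uint64_t pp_offset,
					      uint32_t pp_stride,
					      uint64_t page_size)
{
	struct mmio_layout l;

	l.global_mmio_start = global_bar_start + global_offset;
	l.pp_mmio_start = pp_bar_start + pp_offset;
	/* leave one empty page after the per-process MMIO area */
	l.irq_base_offset = pp_stride + page_size;
	return l;
}
```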
> +
> +static void unmap_mmio_areas(struct ocxl_afu *afu)
> +{
> + if (afu->global_mmio_ptr) {
> + iounmap(afu->global_mmio_ptr);
> + afu->global_mmio_ptr = NULL;
> + }
> + afu->global_mmio_start = 0;
> + afu->pp_mmio_start = 0;
> + release_fn_bar(afu->fn, afu->config.pp_mmio_bar);
> + release_fn_bar(afu->fn, afu->config.global_mmio_bar);
> +}
> +
> +static int configure_afu(struct ocxl_afu *afu, u8 afu_idx, struct pci_dev *dev)
> +{
> + int rc;
> +
> + rc = ocxl_config_read_afu(dev, &afu->fn->config, &afu->config, afu_idx);
> + if (rc)
> + return rc;
> +
> + rc = set_afu_device(afu, dev_name(&dev->dev));
> + if (rc)
> + return rc;
> +
> + rc = assign_afu_actag(afu, dev);
> + if (rc)
> + return rc;
> +
> + rc = assign_afu_pasid(afu, dev);
> + if (rc) {
> + reclaim_afu_actag(afu);
> + return rc;
> + }
> +
> + rc = map_mmio_areas(afu, dev);
> + if (rc) {
> + reclaim_afu_pasid(afu);
> + reclaim_afu_actag(afu);
> + return rc;
> + }
> + return 0;
> +}
> +
> +static void deconfigure_afu(struct ocxl_afu *afu)
> +{
> + unmap_mmio_areas(afu);
> + reclaim_afu_pasid(afu);
> + reclaim_afu_actag(afu);
> +}
> +
> +static int activate_afu(struct pci_dev *dev, struct ocxl_afu *afu)
> +{
> + int rc;
> +
> + ocxl_config_set_afu_state(dev, afu->config.dvsec_afu_control_pos, 1);
> + /*
> + * Char device creation is the last step, as processes can
> + * call our driver immediately, so all our inits must be finished.
> + */
> + rc = ocxl_create_cdev(afu);
> + if (rc)
> + return rc;
> + return 0;
> +}
> +
> +static void deactivate_afu(struct ocxl_afu *afu)
> +{
> + struct pci_dev *dev = to_pci_dev(afu->fn->dev.parent);
> +
> + ocxl_destroy_cdev(afu);
> + ocxl_config_set_afu_state(dev, afu->config.dvsec_afu_control_pos, 0);
> +}
> +
> +static int init_afu(struct pci_dev *dev, struct ocxl_fn *fn, u8 afu_idx)
> +{
> + int rc;
> + struct ocxl_afu *afu;
> +
> + afu = alloc_afu(fn);
> + if (!afu)
> + return -ENOMEM;
> +
> + rc = configure_afu(afu, afu_idx, dev);
> + if (rc) {
> + free_afu(afu);
> + return rc;
> + }
> +
> + rc = ocxl_register_afu(afu);
> + if (rc)
> + goto err;
> +
> + rc = ocxl_sysfs_add_afu(afu);
> + if (rc)
> + goto err;
> +
> + rc = activate_afu(dev, afu);
> + if (rc)
> + goto err_sys;
> +
> + list_add_tail(&afu->list, &fn->afu_list);
> + return 0;
> +
> +err_sys:
> + ocxl_sysfs_remove_afu(afu);
> +err:
> + deconfigure_afu(afu);
> + device_unregister(&afu->dev);
> + return rc;
> +}
> +
> +static void remove_afu(struct ocxl_afu *afu)
> +{
> + list_del(&afu->list);
> + ocxl_context_detach_all(afu);
> + deactivate_afu(afu);
> + ocxl_sysfs_remove_afu(afu);
> + deconfigure_afu(afu);
> + device_unregister(&afu->dev);
> +}
> +
> +static struct ocxl_fn *alloc_function(struct pci_dev *dev)
> +{
> + struct ocxl_fn *fn;
> +
> + fn = kzalloc(sizeof(struct ocxl_fn), GFP_KERNEL);
> + if (!fn)
> + return NULL;
> +
> + INIT_LIST_HEAD(&fn->afu_list);
> + INIT_LIST_HEAD(&fn->pasid_list);
> + INIT_LIST_HEAD(&fn->actag_list);
> + return fn;
> +}
> +
> +static void free_function(struct ocxl_fn *fn)
> +{
> + WARN_ON(!list_empty(&fn->afu_list));
> + WARN_ON(!list_empty(&fn->pasid_list));
> + kfree(fn);
> +}
> +
> +static void free_function_dev(struct device *dev)
> +{
> + struct ocxl_fn *fn = to_ocxl_function(dev);
> +
> + free_function(fn);
> +}
> +
> +static int set_function_device(struct ocxl_fn *fn, struct pci_dev *dev)
> +{
> + int rc;
> +
> + fn->dev.parent = &dev->dev;
> + fn->dev.release = free_function_dev;
> + rc = dev_set_name(&fn->dev, "ocxlfn.%s", dev_name(&dev->dev));
> + if (rc)
> + return rc;
> + pci_set_drvdata(dev, fn);
> + return 0;
> +}
> +
> +static int assign_function_actag(struct ocxl_fn *fn)
> +{
> + struct pci_dev *dev = to_pci_dev(fn->dev.parent);
> + u16 base, enabled, supported;
> + int rc;
> +
> + rc = ocxl_config_get_actag_info(dev, &base, &enabled, &supported);
> + if (rc)
> + return rc;
> +
> + fn->actag_base = base;
> + fn->actag_enabled = enabled;
> + fn->actag_supported = supported;
> +
> + ocxl_config_set_actag(dev, fn->config.dvsec_function_pos,
> + fn->actag_base, fn->actag_enabled);
> + dev_dbg(&fn->dev, "actag range starting at %d, enabled %d\n",
> + fn->actag_base, fn->actag_enabled);
> + return 0;
> +}
> +
> +static int set_function_pasid(struct ocxl_fn *fn)
> +{
> + struct pci_dev *dev = to_pci_dev(fn->dev.parent);
> + int rc, desired_count, max_count;
> +
> + /* A function may not require any PASID */
> + if (fn->config.max_pasid_log < 0)
> + return 0;
> +
> + rc = ocxl_config_get_pasid_info(dev, &max_count);
> + if (rc)
> + return rc;
> +
> + desired_count = 1 << fn->config.max_pasid_log;
> +
> + if (desired_count > max_count) {
> + dev_err(&fn->dev,
> + "Function requires more PASIDs than are available (%d vs. %d)\n",
> + desired_count, max_count);
> + return -ENOSPC;
> + }
> +
> + fn->pasid_base = 0;
> + return 0;
> +}
> +
> +static int configure_function(struct ocxl_fn *fn, struct pci_dev *dev)
> +{
> + int rc;
> +
> + rc = pci_enable_device(dev);
> + if (rc) {
> + dev_err(&dev->dev, "pci_enable_device failed: %d\n", rc);
> + return rc;
> + }
> +
> + /*
> + * Once it has been confirmed to work on our hardware, we
> + * should reset the function, to force the adapter to restart
> + * from scratch.
> + * A function reset would also reset all its AFUs.
> + *
> + * Some hints for implementation:
> + *
> + * - there's no status bit to know when the reset is done. We
> + * should try reading the config space to know when it's
> + * done.
> + * - probably something like:
> + * Reset
> + * wait 100ms
> + * issue config read
> + * allow device up to 1 sec to return success on config
> + * read before declaring it broken
> + *
> + * Some shared logic on the card (CFG, TLX) won't be reset, so
> + * there's no guarantee that it will be enough.
> + */
> + rc = ocxl_config_read_function(dev, &fn->config);
> + if (rc)
> + return rc;
> +
> + rc = set_function_device(fn, dev);
> + if (rc)
> + return rc;
> +
> + rc = assign_function_actag(fn);
> + if (rc)
> + return rc;
> +
> + rc = set_function_pasid(fn);
> + if (rc)
> + return rc;
> +
> + rc = ocxl_link_setup(dev, 0, &fn->link);
> + if (rc)
> + return rc;
> +
> + rc = ocxl_config_set_TL(dev, fn->config.dvsec_tl_pos);
> + if (rc) {
> + ocxl_link_release(dev, fn->link);
> + return rc;
> + }
> + return 0;
> +}
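The reset hints in the comment in `configure_function()` describe a bounded poll: issue the reset, wait 100 ms, then retry a config read until it succeeds, giving up after 1 s. Below is a userspace sketch of that loop, with the reset, config read and sleep injected as callbacks so it can be exercised against a fake device. In the driver these would presumably map to a function reset, `msleep()` and a config-space read such as `pci_read_config_dword()`.

The following are all hypothetical: `struct reset_ops`, `reset_and_wait()`, the fake-device helpers and the 10 ms poll interval. The 1 s budget is counted after the initial 100 ms wait, which is one reading of the comment.

```c
#include <assert.h>
#include <stdbool.h>

#define RESET_SETTLE_MS		100
#define RESET_TIMEOUT_MS	1000
#define RESET_POLL_MS		10	/* arbitrary poll interval */

struct reset_ops {
	void (*reset)(void *ctx);
	bool (*config_read_ok)(void *ctx);
	void (*sleep_ms)(void *ctx, unsigned int ms);
	void *ctx;
};

/* 0 once the device answers a config read, -1 if it is declared broken. */
static int reset_and_wait(const struct reset_ops *ops)
{
	unsigned int waited = 0;

	ops->reset(ops->ctx);
	ops->sleep_ms(ops->ctx, RESET_SETTLE_MS);
	for (;;) {
		if (ops->config_read_ok(ops->ctx))
			return 0;
		if (waited >= RESET_TIMEOUT_MS)
			return -1;
		ops->sleep_ms(ops->ctx, RESET_POLL_MS);
		waited += RESET_POLL_MS;
	}
}

/* Fake device: config reads start succeeding after N failed attempts. */
struct fake_dev {
	int resets;
	int reads_before_ok;
	unsigned int slept_ms;
};

static void fake_reset(void *ctx)
{
	((struct fake_dev *)ctx)->resets++;
}

static bool fake_read_ok(void *ctx)
{
	struct fake_dev *d = ctx;

	if (d->reads_before_ok > 0) {
		d->reads_before_ok--;
		return false;
	}
	return true;
}

static void fake_sleep(void *ctx, unsigned int ms)
{
	((struct fake_dev *)ctx)->slept_ms += ms;
}
```

As the comment itself notes, shared logic on the card (CFG, TLX) is not reset, so a successful config read is a necessary condition for recovery, not a guarantee of it.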
> +
> +static void deconfigure_function(struct ocxl_fn *fn)
> +{
> + struct pci_dev *dev = to_pci_dev(fn->dev.parent);
> +
> + ocxl_link_release(dev, fn->link);
> + pci_disable_device(dev);
> +}
> +
> +static struct ocxl_fn *init_function(struct pci_dev *dev)
> +{
> + struct ocxl_fn *fn;
> + int rc;
> +
> + fn = alloc_function(dev);
> + if (!fn)
> + return ERR_PTR(-ENOMEM);
> +
> + rc = configure_function(fn, dev);
> + if (rc) {
> + free_function(fn);
> + return ERR_PTR(rc);
> + }
> +
> + rc = device_register(&fn->dev);
> + if (rc) {
> + deconfigure_function(fn);
> + device_unregister(&fn->dev);
> + return ERR_PTR(rc);
> + }
> + return fn;
> +}
> +
> +static void remove_function(struct ocxl_fn *fn)
> +{
> + deconfigure_function(fn);
> + device_unregister(&fn->dev);
> +}
> +
> +static int ocxl_probe(struct pci_dev *dev, const struct pci_device_id *id)
> +{
> + int rc, afu_count = 0;
> + u8 afu;
> + struct ocxl_fn *fn;
> +
> + if (!radix_enabled()) {
> + dev_err(&dev->dev, "Unsupported memory model (hash)\n");
> + return -ENODEV;
> + }
> +
> + fn = init_function(dev);
> + if (IS_ERR(fn)) {
> + dev_err(&dev->dev, "function init failed: %li\n",
> + PTR_ERR(fn));
> + return PTR_ERR(fn);
> + }
> +
> + for (afu = 0; afu <= fn->config.max_afu_index; afu++) {
> + rc = ocxl_config_check_afu_index(dev, &fn->config, afu);
> + if (rc > 0) {
> + rc = init_afu(dev, fn, afu);
> + if (rc) {
> + dev_err(&dev->dev,
> + "Can't initialize AFU index %d\n", afu);
> + continue;
> + }
> + afu_count++;
> + }
> + }
> + dev_info(&dev->dev, "%d AFU(s) configured\n", afu_count);
> + return 0;
> +}
> +
> +static void ocxl_remove(struct pci_dev *dev)
> +{
> + struct ocxl_afu *afu, *tmp;
> + struct ocxl_fn *fn = pci_get_drvdata(dev);
> +
> + list_for_each_entry_safe(afu, tmp, &fn->afu_list, list) {
> + remove_afu(afu);
> + }
> + remove_function(fn);
> +}
> +
> +struct pci_driver ocxl_pci_driver = {
> + .name = "ocxl",
> + .id_table = ocxl_pci_tbl,
> + .probe = ocxl_probe,
> + .remove = ocxl_remove,
> + .shutdown = ocxl_remove,
> +};
> diff --git a/drivers/misc/ocxl/sysfs.c b/drivers/misc/ocxl/sysfs.c
> new file mode 100644
> index 000000000000..b7b1d1735c07
> --- /dev/null
> +++ b/drivers/misc/ocxl/sysfs.c
> @@ -0,0 +1,150 @@
> +/*
> + * Copyright 2017 IBM Corp.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or (at your option) any later version.
> + */
> +
> +#include <linux/sysfs.h>
> +#include "ocxl_internal.h"
> +
> +static ssize_t global_mmio_size_show(struct device *device,
> + struct device_attribute *attr,
> + char *buf)
> +{
> + struct ocxl_afu *afu = to_ocxl_afu(device);
> +
> + return scnprintf(buf, PAGE_SIZE, "%d\n",
> + afu->config.global_mmio_size);
> +}
> +
> +static ssize_t pp_mmio_size_show(struct device *device,
> + struct device_attribute *attr,
> + char *buf)
> +{
> + struct ocxl_afu *afu = to_ocxl_afu(device);
> +
> + return scnprintf(buf, PAGE_SIZE, "%d\n",
> + afu->config.pp_mmio_stride);
> +}
> +
> +static ssize_t afu_version_show(struct device *device,
> + struct device_attribute *attr,
> + char *buf)
> +{
> + struct ocxl_afu *afu = to_ocxl_afu(device);
> +
> + return scnprintf(buf, PAGE_SIZE, "%hhu:%hhu\n",
> + afu->config.version_major,
> + afu->config.version_minor);
> +}
> +
> +static ssize_t contexts_show(struct device *device,
> + struct device_attribute *attr,
> + char *buf)
> +{
> + struct ocxl_afu *afu = to_ocxl_afu(device);
> +
> + return scnprintf(buf, PAGE_SIZE, "%d/%d\n",
> + afu->pasid_count, afu->pasid_max);
> +}
> +
> +static struct device_attribute afu_attrs[] = {
> + __ATTR_RO(global_mmio_size),
> + __ATTR_RO(pp_mmio_size),
> + __ATTR_RO(afu_version),
> + __ATTR_RO(contexts),
> +};
> +
> +static ssize_t global_mmio_read(struct file *filp, struct kobject *kobj,
> + struct bin_attribute *bin_attr, char *buf,
> + loff_t off, size_t count)
> +{
> + struct ocxl_afu *afu = to_ocxl_afu(kobj_to_dev(kobj));
> +
> + if (count == 0 || off < 0 ||
> + off >= afu->config.global_mmio_size)
> + return 0;
> +
> + memcpy(buf, afu->global_mmio_ptr + off, count);

drivers/misc/ocxl/sysfs.c:64:42: warning: incorrect type in argument 2 (different address spaces)
drivers/misc/ocxl/sysfs.c:64:42: expected void const *<noident>
drivers/misc/ocxl/sysfs.c:64:42: got void [noderef] <asn:2>*
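The sparse warning comes from passing `afu->global_mmio_ptr`, a `void __iomem *`, to plain `memcpy()`, which expects a normal kernel pointer. In the driver the usual remedy would be something like `memcpy_fromio(buf, afu->global_mmio_ptr + off, count);`. Since that helper only exists in kernel context, the snippet below is a userspace model, not driver code: `copy_from_io()` is a hypothetical stand-in that reads through a `volatile` pointer, and `global_mmio_read_model()` mirrors the quoted bounds check and additionally clamps `count` to the end of the area (the sysfs core is expected to clamp bin-attribute reads to the attribute size already; treat that as an assumption).

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Hypothetical stand-in for the kernel's memcpy_fromio(): copy byte by
 * byte through a volatile pointer so every access is a real load from
 * the (simulated) device window. Not the kernel implementation.
 */
static void copy_from_io(void *dst, const volatile void *src, size_t count)
{
	unsigned char *d = dst;
	const volatile unsigned char *s = src;

	while (count--)
		*d++ = *s++;
}

/*
 * Userspace model of global_mmio_read(): 'mmio' plays the role of
 * afu->global_mmio_ptr and 'mmio_size' of afu->config.global_mmio_size.
 */
static ssize_t global_mmio_read_model(const volatile void *mmio,
				      size_t mmio_size, char *buf,
				      off_t off, size_t count)
{
	if (count == 0 || off < 0 || (size_t)off >= mmio_size)
		return 0;
	assert(buf != NULL);

	/* Never read past the end of the MMIO area. */
	if (count > mmio_size - (size_t)off)
		count = mmio_size - (size_t)off;

	copy_from_io(buf, (const volatile unsigned char *)mmio + off, count);
	return (ssize_t)count;
}
```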

> + return count;
> +}
> +
> +static int global_mmio_fault(struct vm_fault *vmf)
> +{
> + struct vm_area_struct *vma = vmf->vma;
> + struct ocxl_afu *afu = vma->vm_private_data;
> + unsigned long offset;
> +
> + if (vmf->pgoff >= (afu->config.global_mmio_size >> PAGE_SHIFT))
> + return VM_FAULT_SIGBUS;
> +
> + offset = vmf->pgoff;
> + offset += (afu->global_mmio_start >> PAGE_SHIFT);
> + vm_insert_pfn(vma, vmf->address, offset);
> + return VM_FAULT_NOPAGE;
> +}
> +
> +static const struct vm_operations_struct global_mmio_vmops = {
> + .fault = global_mmio_fault,
> +};
> +
> +static int global_mmio_mmap(struct file *filp, struct kobject *kobj,
> + struct bin_attribute *bin_attr,
> + struct vm_area_struct *vma)
> +{
> + struct ocxl_afu *afu = to_ocxl_afu(kobj_to_dev(kobj));
> +
> + if ((vma_pages(vma) + vma->vm_pgoff) >
> + (afu->config.global_mmio_size >> PAGE_SHIFT))
> + return -EINVAL;
> +
> + vma->vm_flags |= VM_IO | VM_PFNMAP;
> + vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
> + vma->vm_ops = &global_mmio_vmops;
> + vma->vm_private_data = afu;
> + return 0;
> +}
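As a worked example of the bounds checks in `global_mmio_mmap()` and `global_mmio_fault()`: a mapping of `npages` pages at page offset `pgoff` is accepted only if `pgoff + npages` does not exceed `global_mmio_size >> PAGE_SHIFT`, and a fault at page offset `pgoff` is served with PFN `(global_mmio_start >> PAGE_SHIFT) + pgoff`. Because the size is shifted down, a trailing partial page (if the area were not page-aligned) can never be mapped. The userspace model below assumes 64K pages via a hypothetical `MODEL_PAGE_SHIFT` and returns -1 where the driver returns `VM_FAULT_SIGBUS`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 16	/* assumption: 64K pages */

/* Mirrors the size check in global_mmio_mmap(). */
static bool mmap_request_fits(uint64_t mmio_size, uint64_t pgoff,
			      uint64_t npages)
{
	return pgoff + npages <= (mmio_size >> MODEL_PAGE_SHIFT);
}

/* Mirrors global_mmio_fault(): page offset -> PFN, or -1 for SIGBUS. */
static int64_t fault_pfn(uint64_t mmio_start, uint64_t mmio_size,
			 uint64_t pgoff)
{
	if (pgoff >= (mmio_size >> MODEL_PAGE_SHIFT))
		return -1;
	return (int64_t)((mmio_start >> MODEL_PAGE_SHIFT) + pgoff);
}
```

For instance, a 256K area (0x40000) spans four 64K pages, so a four-page mapping at offset 0 fits but the same mapping at offset 1 is rejected.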
> +
> +int ocxl_sysfs_add_afu(struct ocxl_afu *afu)
> +{
> + int i, rc;
> +
> + for (i = 0; i < ARRAY_SIZE(afu_attrs); i++) {
> + rc = device_create_file(&afu->dev, &afu_attrs[i]);
> + if (rc)
> + goto err;
> + }
> +
> + sysfs_attr_init(&afu->attr_global_mmio.attr);
> + afu->attr_global_mmio.attr.name = "global_mmio_area";
> + afu->attr_global_mmio.attr.mode = 0600;
> + afu->attr_global_mmio.size = afu->config.global_mmio_size;
> + afu->attr_global_mmio.read = global_mmio_read;
> + afu->attr_global_mmio.mmap = global_mmio_mmap;
> + rc = device_create_bin_file(&afu->dev, &afu->attr_global_mmio);
> + if (rc) {
> + dev_err(&afu->dev,
> + "Unable to create global mmio attr for afu: %d\n",
> + rc);
> + goto err;
> + }
> +
> + return 0;
> +
> +err:
> + for (i--; i >= 0; i--)
> + device_remove_file(&afu->dev, &afu_attrs[i]);
> + return rc;
> +}
> +
> +void ocxl_sysfs_remove_afu(struct ocxl_afu *afu)
> +{
> + int i;
> +
> + for (i = 0; i < ARRAY_SIZE(afu_attrs); i++)
> + device_remove_file(&afu->dev, &afu_attrs[i]);
> + device_remove_bin_file(&afu->dev, &afu->attr_global_mmio);
> +}
> diff --git a/include/uapi/misc/ocxl.h b/include/uapi/misc/ocxl.h
> new file mode 100644
> index 000000000000..71fa387f2efd
> --- /dev/null
> +++ b/include/uapi/misc/ocxl.h
> @@ -0,0 +1,47 @@
> +/*
> + * Copyright 2017 IBM Corp.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or (at your option) any later version.
> + */
> +
> +#ifndef _UAPI_MISC_OCXL_H
> +#define _UAPI_MISC_OCXL_H
> +
> +#include <linux/types.h>
> +#include <linux/ioctl.h>
> +
> +enum ocxl_event_type {
> + OCXL_AFU_EVENT_XSL_FAULT_ERROR = 0,
> +};
> +
> +#define OCXL_KERNEL_EVENT_FLAG_LAST 0x0001 /* This is the last event pending */
> +
> +struct ocxl_kernel_event_header {
> + __u16 type;
> + __u16 flags;
> + __u32 reserved;
> +};
> +
> +struct ocxl_kernel_event_xsl_fault_error {
> + __u64 addr;
> + __u64 dsisr;
> + __u64 count;
> + __u64 reserved;
> +};
> +
> +struct ocxl_ioctl_attach {
> + __u64 amr;
> + __u64 reserved1;
> + __u64 reserved2;
> + __u64 reserved3;
> +};
> +
> +/* ioctl numbers */
> +#define OCXL_MAGIC 0xCA
> +/* AFU devices */
> +#define OCXL_IOCTL_ATTACH _IOW(OCXL_MAGIC, 0x10, struct ocxl_ioctl_attach)
> +
> +#endif /* _UAPI_MISC_OCXL_H */
>

--
Andrew Donnellan OzLabs, ADL Canberra
[email protected] IBM Australia Limited

2018-01-03 07:32:06

by Andrew Donnellan

Subject: Re: [PATCH 04/13] powerpc/powernv: Add platform-specific services for opencapi

On 19/12/17 02:21, Frederic Barrat wrote:
> Implement a few platform-specific calls which can be used by drivers:
>
> - provide the Transaction Layer capabilities of the host, so that the
> driver can find some common ground and configure the device and host
> appropriately.
>
> - provide the hw interrupt to be used for translation faults raised by
> the NPU
>
> - map/unmap some NPU mmio registers to get the fault context when the
> NPU raises an address translation fault
>
> The rest are wrappers around the previously-introduced opal calls.
>
>
> Signed-off-by: Frederic Barrat <[email protected]>
> ---
> arch/powerpc/include/asm/pnv-ocxl.h | 36 ++++++
> arch/powerpc/platforms/powernv/Makefile | 1 +
> arch/powerpc/platforms/powernv/ocxl.c | 187 ++++++++++++++++++++++++++++++++
> 3 files changed, 224 insertions(+)
> create mode 100644 arch/powerpc/include/asm/pnv-ocxl.h
> create mode 100644 arch/powerpc/platforms/powernv/ocxl.c
>
> diff --git a/arch/powerpc/include/asm/pnv-ocxl.h b/arch/powerpc/include/asm/pnv-ocxl.h
> new file mode 100644
> index 000000000000..b9ab3f0a9634
> --- /dev/null
> +++ b/arch/powerpc/include/asm/pnv-ocxl.h
> @@ -0,0 +1,36 @@
> +/*
> + * Copyright 2017 IBM Corp.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or (at your option) any later version.
> + */
> +
> +#ifndef _ASM_PVN_OCXL_H
> +#define _ASM_PVN_OCXL_H

I assume you meant "PNV" here.

> +
> +#include <linux/pci.h>
> +
> +#define PNV_OCXL_TL_MAX_TEMPLATE 63
> +#define PNV_OCXL_TL_BITS_PER_RATE 4
> +#define PNV_OCXL_TL_RATE_BUF_SIZE ((PNV_OCXL_TL_MAX_TEMPLATE+1) * PNV_OCXL_TL_BITS_PER_RATE / 8)
> +
> +extern int pnv_ocxl_get_tl_cap(struct pci_dev *dev, long *cap,
> + char *rate_buf, int rate_buf_size);
> +extern int pnv_ocxl_set_tl_conf(struct pci_dev *dev, long cap,
> + uint64_t rate_buf_phys, int rate_buf_size);
> +
> +extern int pnv_ocxl_get_xsl_irq(struct pci_dev *dev, int *hwirq);
> +extern void pnv_ocxl_unmap_xsl_regs(void __iomem *dsisr, void __iomem *dar,
> + void __iomem *tfc, void __iomem *pe_handle);
> +extern int pnv_ocxl_map_xsl_regs(struct pci_dev *dev, void __iomem **dsisr,
> + void __iomem **dar, void __iomem **tfc,
> + void __iomem **pe_handle);
> +
> +extern int pnv_ocxl_spa_setup(struct pci_dev *dev, void *spa_mem, int PE_mask,
> + void **platform_data);
> +extern void pnv_ocxl_spa_release(void *platform_data);
> +extern int pnv_ocxl_spa_remove_pe(void *platform_data, int pe_handle);
> +
> +#endif /* _ASM_PVN_OCXL_H */

And here
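For reference, `PNV_OCXL_TL_RATE_BUF_SIZE` in this header works out to 32 bytes: template indexes run from 0 to `PNV_OCXL_TL_MAX_TEMPLATE` (63), i.e. 64 templates, each with a 4-bit rate, and 64 x 4 / 8 = 32. The snippet copies the three macros from the quoted header (with the guard spelled `PNV`, as suggested above) and checks the arithmetic at compile time. `rate_byte_index()` is a hypothetical helper that assumes the rates are packed densely in template order; the nibble order within a byte is not specified here.

```c
#include <assert.h>

#ifndef _ASM_PNV_OCXL_H
#define _ASM_PNV_OCXL_H

/* Values copied from the quoted pnv-ocxl.h. */
#define PNV_OCXL_TL_MAX_TEMPLATE 63
#define PNV_OCXL_TL_BITS_PER_RATE 4
#define PNV_OCXL_TL_RATE_BUF_SIZE ((PNV_OCXL_TL_MAX_TEMPLATE+1) * PNV_OCXL_TL_BITS_PER_RATE / 8)

#endif /* _ASM_PNV_OCXL_H */

/* 64 templates x 4 bits = 256 bits = 32 bytes. */
_Static_assert(PNV_OCXL_TL_RATE_BUF_SIZE == 32,
	       "rate buffer should be 32 bytes");

/* Hypothetical: byte of the rate buffer holding the rate for 'tmpl'. */
static int rate_byte_index(int tmpl)
{
	assert(tmpl >= 0 && tmpl <= PNV_OCXL_TL_MAX_TEMPLATE);
	return tmpl * PNV_OCXL_TL_BITS_PER_RATE / 8;
}
```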

> diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
> index 3732118a0482..6c9d5199a7e2 100644
> --- a/arch/powerpc/platforms/powernv/Makefile
> +++ b/arch/powerpc/platforms/powernv/Makefile
> @@ -17,3 +17,4 @@ obj-$(CONFIG_PERF_EVENTS) += opal-imc.o
> obj-$(CONFIG_PPC_MEMTRACE) += memtrace.o
> obj-$(CONFIG_PPC_VAS) += vas.o vas-window.o vas-debug.o
> obj-$(CONFIG_PPC_FTW) += nx-ftw.o
> +obj-$(CONFIG_OCXL_BASE) += ocxl.o
> diff --git a/arch/powerpc/platforms/powernv/ocxl.c b/arch/powerpc/platforms/powernv/ocxl.c
> new file mode 100644
> index 000000000000..3378b75cf5e5
> --- /dev/null
> +++ b/arch/powerpc/platforms/powernv/ocxl.c
> +int pnv_ocxl_get_xsl_irq(struct pci_dev *dev, int *hwirq)
> +{
> + int rc;
> +
> + rc = of_property_read_u32(dev->dev.of_node, "ibm,opal-xsl-irq", hwirq);
> + if (rc) {
> + dev_err(&dev->dev,
> + "Can't translation xsl interrupt for device\n");

Can't get?



2018-01-09 15:46:07

by Frederic Barrat

Subject: Re: [PATCH 10/13] ocxl: Add Makefile and Kconfig



On 03/01/2018 06:48, Andrew Donnellan wrote:
> On 19/12/17 02:21, Frederic Barrat wrote:
>> OCXL_BASE triggers the platform support needed by the driver.
>>
>> Signed-off-by: Frederic Barrat <[email protected]>
>> ---
>>   drivers/misc/Kconfig       |  1 +
>>   drivers/misc/Makefile      |  1 +
>>   drivers/misc/ocxl/Kconfig  | 25 +++++++++++++++++++++++++
>>   drivers/misc/ocxl/Makefile | 10 ++++++++++
>>   4 files changed, 37 insertions(+)
>>   create mode 100644 drivers/misc/ocxl/Kconfig
>>   create mode 100644 drivers/misc/ocxl/Makefile
>>
>> diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
>> index f1a5c2357b14..0534f338c84a 100644
>> --- a/drivers/misc/Kconfig
>> +++ b/drivers/misc/Kconfig
>> @@ -508,4 +508,5 @@ source "drivers/misc/mic/Kconfig"
>>   source "drivers/misc/genwqe/Kconfig"
>>   source "drivers/misc/echo/Kconfig"
>>   source "drivers/misc/cxl/Kconfig"
>> +source "drivers/misc/ocxl/Kconfig"
>>   endmenu
>> diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
>> index 5ca5f64df478..73326d54e246 100644
>> --- a/drivers/misc/Makefile
>> +++ b/drivers/misc/Makefile
>> @@ -55,6 +55,7 @@ obj-$(CONFIG_CXL_BASE)        += cxl/
>>   obj-$(CONFIG_ASPEED_LPC_CTRL)    += aspeed-lpc-ctrl.o
>>   obj-$(CONFIG_ASPEED_LPC_SNOOP)    += aspeed-lpc-snoop.o
>>   obj-$(CONFIG_PCI_ENDPOINT_TEST)    += pci_endpoint_test.o
>> +obj-$(CONFIG_OCXL)        += ocxl/
>>
>>   lkdtm-$(CONFIG_LKDTM)        += lkdtm_core.o
>>   lkdtm-$(CONFIG_LKDTM)        += lkdtm_bugs.o
>> diff --git a/drivers/misc/ocxl/Kconfig b/drivers/misc/ocxl/Kconfig
>> new file mode 100644
>> index 000000000000..4496b61f48db
>> --- /dev/null
>> +++ b/drivers/misc/ocxl/Kconfig
>> @@ -0,0 +1,25 @@
>> +#
>> +# Open Coherent Accelerator (OCXL) compatible devices
>> +#
>> +
>> +config OCXL_BASE
>> +    bool
>> +    default n
>> +    select PPC_COPRO_BASE
>> +
>> +config OCXL
>> +    tristate "Support for Open Coherent Accelerators (OCXL)"
>> +    depends on PPC_POWERNV && PCI && EEH
>> +    select OCXL_BASE
>> +    default m
>> +    help
>> +
>> +      Select this option to enable driver support for Open
>> +      Coherent Accelerators (OCXL).  OCXL is otherwise known as
>> +      Open Coherent Accelerator Processor Interface (OCAPI).
>> +      OCAPI allows accelerators in FPGAs to be coherently attached
>> +      to a CPU through a Open CAPI link.  This driver enables
>> +      userspace programs to access these accelerators through
>> +      devices found in /dev/ocxl/
>
> I'd prefer more consistency in how we refer to OpenCAPI. "ocxl" is a
> driver name that we have purely for historical reasons, it's not really
> the name of anything else. I know throughout the various specs and code,
> we use "OCAPI" a lot, but that's not really an abbreviation that should
> be "user-facing".
>
> Something like:
>
> config OCXL
>      tristate "OpenCAPI coherent accelerator support"
>      help
>
>        Select this option to enable the ocxl driver for Open Coherent
>        Accelerator Processor Interface (OpenCAPI) devices.
>
>        OpenCAPI allows FPGA and ASIC accelerators to be coherently
>        attached to a CPU over an OpenCAPI link.
>
>        The ocxl driver enables userspace programs to access these
>        accelerators through devices in /dev/ocxl/.
>
>        For more information, see http://opencapi.org.
>
>        If unsure, say N.
>

Agreed, and stolen.

Fred


>> +
>> +      If unsure, say N.
>> diff --git a/drivers/misc/ocxl/Makefile b/drivers/misc/ocxl/Makefile
>> new file mode 100644
>> index 000000000000..f75853411cfd
>> --- /dev/null
>> +++ b/drivers/misc/ocxl/Makefile
>> @@ -0,0 +1,10 @@
>> +ccflags-$(CONFIG_PPC_WERROR)    += -Werror
>> +
>> +ocxl-y                += main.o pci.o config.o file.o pasid.o
>> +ocxl-y                += link.o context.o afu_irq.o sysfs.o trace.o
>> +obj-$(CONFIG_OCXL)        += ocxl.o
>> +
>> +# For tracepoints to include our trace.h from tracepoint infrastructure:
>> +CFLAGS_trace.o := -I$(src)
>> +
>> +# ccflags-y += -DDEBUG
>>
>

2018-01-09 23:21:08

by Michael Ellerman

Subject: Re: [PATCH 10/13] ocxl: Add Makefile and Kconfig



On 10 January 2018 2:45:56 am AEDT, Frederic Barrat <[email protected]> wrote:
>
>
>On 03/01/2018 06:48, Andrew Donnellan wrote:
>> On 19/12/17 02:21, Frederic Barrat wrote:
>>> OCXL_BASE triggers the platform support needed by the driver.
>>>
>>> Signed-off-by: Frederic Barrat <[email protected]>
>>> ---
>>>   drivers/misc/Kconfig       |  1 +
>>>   drivers/misc/Makefile      |  1 +
>>>   drivers/misc/ocxl/Kconfig  | 25 +++++++++++++++++++++++++
>>>   drivers/misc/ocxl/Makefile | 10 ++++++++++
>>>   4 files changed, 37 insertions(+)
>>>   create mode 100644 drivers/misc/ocxl/Kconfig
>>>   create mode 100644 drivers/misc/ocxl/Makefile
>>>
>>> diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
>>> index f1a5c2357b14..0534f338c84a 100644
>>> --- a/drivers/misc/Kconfig
>>> +++ b/drivers/misc/Kconfig
>>> @@ -508,4 +508,5 @@ source "drivers/misc/mic/Kconfig"
>>>   source "drivers/misc/genwqe/Kconfig"
>>>   source "drivers/misc/echo/Kconfig"
>>>   source "drivers/misc/cxl/Kconfig"
>>> +source "drivers/misc/ocxl/Kconfig"
>>>   endmenu
>>> diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
>>> index 5ca5f64df478..73326d54e246 100644
>>> --- a/drivers/misc/Makefile
>>> +++ b/drivers/misc/Makefile
>>> @@ -55,6 +55,7 @@ obj-$(CONFIG_CXL_BASE)        += cxl/
>>>   obj-$(CONFIG_ASPEED_LPC_CTRL)    += aspeed-lpc-ctrl.o
>>>   obj-$(CONFIG_ASPEED_LPC_SNOOP)    += aspeed-lpc-snoop.o
>>>   obj-$(CONFIG_PCI_ENDPOINT_TEST)    += pci_endpoint_test.o
>>> +obj-$(CONFIG_OCXL)        += ocxl/
>>>
>>>   lkdtm-$(CONFIG_LKDTM)        += lkdtm_core.o
>>>   lkdtm-$(CONFIG_LKDTM)        += lkdtm_bugs.o
>>> diff --git a/drivers/misc/ocxl/Kconfig b/drivers/misc/ocxl/Kconfig
>>> new file mode 100644
>>> index 000000000000..4496b61f48db
>>> --- /dev/null
>>> +++ b/drivers/misc/ocxl/Kconfig
>>> @@ -0,0 +1,25 @@
>>> +#
>>> +# Open Coherent Accelerator (OCXL) compatible devices
>>> +#
>>> +
>>> +config OCXL_BASE
>>> +    bool
>>> +    default n
>>> +    select PPC_COPRO_BASE
>>> +
>>> +config OCXL
>>> +    tristate "Support for Open Coherent Accelerators (OCXL)"
>>> +    depends on PPC_POWERNV && PCI && EEH
>>> +    select OCXL_BASE
>>> +    default m
>>> +    help
>>> +
>>> +      Select this option to enable driver support for Open
>>> +      Coherent Accelerators (OCXL).  OCXL is otherwise known as
>>> +      Open Coherent Accelerator Processor Interface (OCAPI).
>>> +      OCAPI allows accelerators in FPGAs to be coherently attached
>>> +      to a CPU through a Open CAPI link.  This driver enables
>>> +      userspace programs to access these accelerators through
>>> +      devices found in /dev/ocxl/
>>
>> I'd prefer more consistency in how we refer to OpenCAPI. "ocxl" is a
>> driver name that we have purely for historical reasons, it's not really
>> the name of anything else. I know throughout the various specs and code,
>> we use "OCAPI" a lot, but that's not really an abbreviation that should
>> be "user-facing".
>>
>> Something like:
>>
>> config OCXL
>>      tristate "OpenCAPI coherent accelerator support"
>>      help
>>
>>        Select this option to enable the ocxl driver for Open Coherent
>>        Accelerator Processor Interface (OpenCAPI) devices.
>>
>>        OpenCAPI allows FPGA and ASIC accelerators to be coherently
>>        attached to a CPU over an OpenCAPI link.
>>
>>        The ocxl driver enables userspace programs to access these
>>        accelerators through devices in /dev/ocxl/.
>>
>>        For more information, see http://opencapi.org.
>>
>>        If unsure, say N.
>>
>
>Agreed, and stolen.

Would also be great to add something describing the relationship to CAPI and cxl. Otherwise people will be confused about whether they need this one or the other one or both.

cheers

2018-01-10 19:17:56

by Frederic Barrat

Subject: Re: [PATCH 10/13] ocxl: Add Makefile and Kconfig



On 10/01/2018 00:21, Michael Ellerman wrote:
> Would also be great to add something describing the relationship to CAPI and cxl. Otherwise people will be confused about whether they need this one or the other one or both.

OK, I'll add something.

Fred

2018-01-20 09:58:22

by Michael Ellerman

Subject: Re: [PATCH 02/13] powerpc/powernv: Set correct configuration space size for opencapi devices

Frederic Barrat <[email protected]> writes:

> From Andrew Donnellan <[email protected]>
>
> The configuration space for opencapi devices doesn't have a PCI
> Express capability, therefore confusing linux in thinking it's of an
> old PCI type with a 256-byte configuration space size, instead of the
> desired 4k. So add a PCI fixup to declare the correct size.
>
>
> Signed-off-by: Andrew Donnellan <[email protected]>
> Signed-off-by: Frederic Barrat <[email protected]>
> ---
> arch/powerpc/platforms/powernv/pci-ioda.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index c37b5d288f9c..b8ec76aa266f 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -4079,6 +4079,16 @@ void __init pnv_pci_init_npu2_opencapi_phb(struct device_node *np)
> pnv_pci_init_ioda_phb(np, 0, PNV_PHB_NPU_OCAPI);
> }
>
> +static void pnv_npu2_opencapi_cfg_size_fixup(struct pci_dev *dev)
> +{
> + struct pci_controller *hose = pci_bus_to_host(dev->bus);
> + struct pnv_phb *phb = hose->private_data;
> +
> + if (phb->type == PNV_PHB_NPU_OCAPI)
> + dev->cfg_size = PCI_CFG_SPACE_EXP_SIZE;
> +}
> +DECLARE_PCI_FIXUP_EARLY(PCI_ANY_ID, PCI_ANY_ID, pnv_npu2_opencapi_cfg_size_fixup);

On my Power8 PowerVM LPAR:

[ 0.096846] PCI: Probing PCI hardware
[ 0.096878] PCI host bridge to bus 0015:70
[ 0.096883] pci_bus 0015:70: root bus resource [mem 0x3fc0c0000000-0x3fc0cfffffff] (bus address [0xc0000000-0xcfffffff])
[ 0.096888] pci_bus 0015:70: root bus resource [mem 0x301800000000-0x301bffffffff] (bus address [0x3d01800000000-0x3d01bffffffff])
[ 0.096892] pci_bus 0015:70: root bus resource [bus 70-ff]
[ 0.097523] Unable to handle kernel paging request for data at address 0x00000008
[ 0.097526] Faulting instruction address: 0xc0000000000b3330
[ 0.097530] Oops: Kernel access of bad area, sig: 11 [#1]
[ 0.097532] LE SMP NR_CPUS=2048 NUMA pSeries
[ 0.097536] Modules linked in:
[ 0.097539] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.15.0-rc2-gcc7x-gf93b9d8 #1
[ 0.097543] task: 000000007ef679b5 task.stack: 00000000c7c0b3f9
[ 0.097546] NIP: c0000000000b3330 LR: c00000000067ee78 CTR: c0000000000b3300
[ 0.097549] REGS: 0000000012013889 TRAP: 0380 Not tainted (4.15.0-rc2-gcc7x-gf93b9d8)
[ 0.097552] MSR: 8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE> CR: 88000842 XER: 2000000f
[ 0.097559] CFAR: c00000000067ee74 SOFTE: 0
[ 0.097559] GPR00: c00000000067ee78 c0000003f7583980 c00000000103f000 c0000003fd619800
[ 0.097559] GPR04: c000000000d9c0e0 c000000000d9c7a0 ffff0a01ffffff10 0000000000000030
[ 0.097559] GPR08: 0000000000000000 0000000000000000 000000000000ffff c000000000ba3428
[ 0.097559] GPR12: c0000000000b3300 c00000000fd40000 c00000000000d938 0000000000000000
[ 0.097559] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 0.097559] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 0.097559] GPR24: c000000000e7b380 0000000000000000 0000000000000000 c0000000010d0088
[ 0.097559] GPR28: 000000000000ffff c000000000d9c7a0 c0000003fd619800 c0000003fd619800
[ 0.097595] NIP [c0000000000b3330] pnv_npu2_opencapi_cfg_size_fixup+0x30/0x60
[ 0.097599] LR [c00000000067ee78] pci_do_fixups+0xd8/0x140
[ 0.097602] Call Trace:
[ 0.097605] [c0000003f7583980] [c0000000000607a8] pci_dev_pdn_setup+0x58/0x70 (unreliable)
[ 0.097609] [c0000003f75839b0] [c00000000067ee78] pci_do_fixups+0xd8/0x140
[ 0.097613] [c0000003f7583a00] [c000000000064334] of_create_pci_dev+0x1d4/0x910
[ 0.097617] [c0000003f7583ab0] [c000000000064b98] __of_scan_bus+0x128/0x1e0
[ 0.097621] [c0000003f7583b20] [c00000000006225c] pcibios_scan_phb+0x22c/0x260
[ 0.097625] [c0000003f7583bc0] [c000000000e11648] pcibios_init+0x8c/0xe4
[ 0.097629] [c0000003f7583c40] [c00000000000d6b8] do_one_initcall+0x68/0x1e0
[ 0.097633] [c0000003f7583d00] [c000000000e04534] kernel_init_freeable+0x280/0x36c
[ 0.097636] [c0000003f7583dc0] [c00000000000d95c] kernel_init+0x2c/0x160
[ 0.097640] [c0000003f7583e30] [c00000000000bae8] ret_from_kernel_thread+0x5c/0x74
[ 0.097643] Instruction dump:
[ 0.097646] 3c4c00f9 3842bd00 7c0802a6 fbe1fff8 f8010010 f821ffd1 7c7f1b78 60000000
[ 0.097652] 60000000 e93f0010 e92900d0 e9290280 <81290008> 2f890003 409e000c 39201000
[ 0.097659] ---[ end trace 57e6a876df59eda0 ]---


And Power8 KVM guest:

[ 0.271653] PCI: Probing PCI hardware
[ 0.272640] PCI host bridge to bus 0000:00
[ 0.272897] pci_bus 0000:00: root bus resource [io 0x10000-0x1ffff] (bus address [0x0000-0xffff])
[ 0.273127] pci_bus 0000:00: root bus resource [mem 0x100a0000000-0x1101fffffff] (bus address [0x80000000-0xfffffffff])
[ 0.273346] pci_bus 0000:00: root bus resource [bus 00-ff]
[ 0.273610] Unable to handle kernel paging request for data at address 0x00000008
[ 0.273752] Faulting instruction address: 0xc0000000000b0030
[ 0.273878] Oops: Kernel access of bad area, sig: 11 [#1]
[ 0.273972] LE SMP NR_CPUS=2048 NUMA pSeries
[ 0.274068] Modules linked in:
[ 0.274140] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.15.0-rc2-gcc5x-gf93b9d8 #1
[ 0.274283] task: 000000001f3d5330 task.stack: 00000000f3462ef5
[ 0.274405] NIP: c0000000000b0030 LR: c00000000064aed8 CTR: c0000000000b0000
[ 0.274549] REGS: 000000000ef1a4d9 TRAP: 0380 Not tainted (4.15.0-rc2-gcc5x-gf93b9d8)
[ 0.274692] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 28000222 XER: 20000000
[ 0.274838] CFAR: c00000000064aed4 SOFTE: 0
[ 0.274838] GPR00: c00000000064aed8 c0000000fea87970 c000000000feec00 c0000000fe184108
[ 0.274838] GPR04: c000000000d4bbf8 00000000ffffffff 0000000000000008 0000000098968000
[ 0.274838] GPR08: 0000000000000038 0000000000000000 0000000000000000 0000000000000001
[ 0.274838] GPR12: c0000000000b0000 c00000000fd40000 c00000000000d798 0000000000000000
[ 0.274838] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 0.274838] GPR20: 0000000000000000 0000000000000000 0000000000000000 c000000000d987b8
[ 0.274838] GPR24: c000000000db3894 0000000000000000 c000000000ca7d28 c000000001080088
[ 0.274838] GPR28: 000000000000ffff c000000000d4c2b8 c0000000fe184108 c0000000fe184108
[ 0.276032] NIP [c0000000000b0030] pnv_npu2_opencapi_cfg_size_fixup+0x30/0x60
[ 0.276178] LR [c00000000064aed8] pci_do_fixups+0xd8/0x130
[ 0.276269] Call Trace:
[ 0.276317] [c0000000fea87970] [c00000000005eb78] pci_dev_pdn_setup+0x58/0x70 (unreliable)
[ 0.276458] [c0000000fea879a0] [c00000000064aed8] pci_do_fixups+0xd8/0x130
[ 0.276575] [c0000000fea879f0] [c000000000062608] of_create_pci_dev+0x1d8/0x420
[ 0.276715] [c0000000fea87aa0] [c000000000062974] __of_scan_bus+0x124/0x1f0
[ 0.276831] [c0000000fea87b10] [c0000000000604dc] pcibios_scan_phb+0x24c/0x280
[ 0.276967] [c0000000fea87bb0] [c000000000dc16c4] pcibios_init+0x90/0xe8
[ 0.277078] [c0000000fea87c30] [c00000000000d5f8] do_one_initcall+0x138/0x1d0
[ 0.277208] [c0000000fea87cf0] [c000000000db45e8] kernel_init_freeable+0x298/0x378
[ 0.277341] [c0000000fea87dc0] [c00000000000d7bc] kernel_init+0x2c/0x160
[ 0.277454] [c0000000fea87e30] [c00000000000bae8] ret_from_kernel_thread+0x5c/0x74
[ 0.277591] Instruction dump:
[ 0.277662] 3c4c00f4 3842ec00 7c0802a6 fbe1fff8 f8010010 f821ffd1 7c7f1b78 60000000
[ 0.277804] 60000000 e93f0010 e92900d0 e9290280 <81290008> 2f890003 409e000c 39201000
[ 0.277950] ---[ end trace df10e6159ca0c179 ]---

cheers
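Both traces above fault in `pnv_npu2_opencapi_cfg_size_fixup+0x30` on pSeries kernels (a PowerVM LPAR and a KVM guest). Because the fixup is declared with `PCI_ANY_ID`, it runs for every PCI device on every platform, yet it assumes `hose->private_data` is a PowerNV `struct pnv_phb`; the faulting data address of 0x8 is consistent with reading `phb->type` through a NULL `private_data`. One possible fix is to return early unless running on PowerNV, e.g. with `machine_is(powernv)`. The sketch below models that guard in userspace; the structures are trimmed stand-ins (not the kernel definitions) and `platform_is_powernv` is a hypothetical substitute for `machine_is(powernv)`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Trimmed stand-ins for kernel types: NOT the real definitions. */
enum phb_type_model { PHB_OTHER, PHB_NPU_OCAPI };

struct pnv_phb_model {
	void *hose;
	enum phb_type_model type;
};

struct pci_controller_model {
	void *private_data;	/* pnv_phb on PowerNV, something else on pSeries */
};

struct pci_dev_model {
	struct pci_controller_model *hose;
	int cfg_size;
};

#define CFG_SPACE_SIZE     256	/* PCI_CFG_SPACE_SIZE */
#define CFG_SPACE_EXP_SIZE 4096	/* PCI_CFG_SPACE_EXP_SIZE */

/* Hypothetical substitute for machine_is(powernv). */
static bool platform_is_powernv;

static void opencapi_cfg_size_fixup_model(struct pci_dev_model *dev)
{
	struct pnv_phb_model *phb;

	assert(dev != NULL && dev->hose != NULL);

	/* The fixup matches any vendor/device, so it also runs on pSeries,
	 * where private_data must not be treated as a pnv_phb. */
	if (!platform_is_powernv)
		return;

	phb = dev->hose->private_data;
	if (phb && phb->type == PHB_NPU_OCAPI)
		dev->cfg_size = CFG_SPACE_EXP_SIZE;
}
```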

2018-01-22 05:08:32

by Andrew Donnellan

Subject: Re: [PATCH 02/13] powerpc/powernv: Set correct configuration space size for opencapi devices

On 20/01/18 20:52, Michael Ellerman wrote:
> On my Power8 PowerVM LPAR:

<snip>

Will fix...



2018-01-23 18:31:22

by Cédric Le Goater

Subject: Re: [PATCH 07/13] ocxl: Add AFU interrupt support

On 12/19/2017 04:05 AM, Benjamin Herrenschmidt wrote:
> On Mon, 2017-12-18 at 16:21 +0100, Frederic Barrat wrote:
>> Add user APIs through ioctl to allocate, free, and be notified of an
>> AFU interrupt.
>>
>> For opencapi, an AFU can trigger an interrupt on the host by sending a
>> specific command targeting a 64-bit object handle. On POWER9, this is
>> implemented by mapping a special page in the address space of a
>> process and a write to that page will trigger an interrupt.
>
> We need to figure out how that plays with KVM. +Cedric..
>
> For all those "generic xive" interrupts, whether they are used for
> OpenCAPI, plain guest IPIs, NX interrupts etc... but also for actual
> pass-through ones, we'll need a mechanism to map the trigger and ESB
> pages into qemu.

It seems feasible to use a common driver, at least for QEMU/KVM
and OCXL, to expose the ESB pages of a range of IRQ numbers. Fred
has already defined a user API: a set of ioctls which allocate and
free one IRQ, and also associate an IRQ with an eventfd for handling.
The VMA is populated on demand.

This XIVE IRQ "device", which I'm not sure how to name, defines
generic IRQ sources and handlers for a given range. We would need
a couple of properties to describe it in a device tree:

- "ibm,xive-lisn-ranges" for the range.

Anything else?

The current code needs some changes to distinguish the XIVE IRQ
driver from the OCXL one, and range support should be added,
probably using a bitmap to track allocations.

From an OCXL perspective, the XIVE IRQ device driver would be
instantiated from the OCXL one using an ioctl returning an fd,
like KVM does with KVM devices. User space would then allocate,
free and associate IRQs, and mmap the ESB pages to configure the
OpenCAPI device. As for QEMU, I think we could add an extra KVM
device; QEMU does not need the 'associate' feature, though.

Such devices could, in theory, also be defined by the firmware for
general-purpose use and be accessed through a char device.


> We can't have a bazillion VMAs and KVM memory regions either, so we'll
> need some kind of mechanism/driver which allows for a single fairly
> large mmap'ed VMA which can then be "populated" with interrupt control
> pages.

Yes. The full address range should be mmapped for the IRQ range
defined for the device. Access to unpopulated pages would return
EFAULT.
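Combining the per-device IRQ range tracked with a bitmap and the single large VMA whose unpopulated pages fail with EFAULT, the bookkeeping could look roughly like this sketch. Everything in it is hypothetical: `struct irq_range`, its helpers and `IRQ_RANGE_MAX` are illustrative names, not an existing kernel or OPAL API. Inside the kernel one would more likely build on the existing bitmap helpers such as `find_first_zero_bit()`, `set_bit()` and `clear_bit()` than on an open-coded loop.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IRQ_RANGE_MAX 256	/* hypothetical upper bound on range size */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define RANGE_WORDS ((IRQ_RANGE_MAX + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* A device-owned range of hardware IRQ numbers [base, base + count). */
struct irq_range {
	uint32_t base;
	uint32_t count;
	unsigned long used[RANGE_WORDS];	/* bit i set => base + i allocated */
};

static void irq_range_init(struct irq_range *r, uint32_t base, uint32_t count)
{
	assert(count <= IRQ_RANGE_MAX);
	r->base = base;
	r->count = count;
	for (size_t w = 0; w < RANGE_WORDS; w++)
		r->used[w] = 0;
}

static bool irq_range_test(const struct irq_range *r, uint32_t i)
{
	return r->used[i / BITS_PER_WORD] & (1UL << (i % BITS_PER_WORD));
}

/* Allocate the lowest free IRQ in the range; returns false if full. */
static bool irq_range_alloc(struct irq_range *r, uint32_t *hwirq)
{
	for (uint32_t i = 0; i < r->count; i++) {
		if (!irq_range_test(r, i)) {
			r->used[i / BITS_PER_WORD] |= 1UL << (i % BITS_PER_WORD);
			*hwirq = r->base + i;
			return true;
		}
	}
	return false;
}

/* Release an allocated IRQ; numbers outside the range are ignored. */
static void irq_range_free(struct irq_range *r, uint32_t hwirq)
{
	if (hwirq < r->base || hwirq - r->base >= r->count)
		return;
	uint32_t i = hwirq - r->base;
	r->used[i / BITS_PER_WORD] &= ~(1UL << (i % BITS_PER_WORD));
}

/* Fault-time check: may the ESB page for 'hwirq' be populated? */
static bool irq_range_is_allocated(const struct irq_range *r, uint32_t hwirq)
{
	if (hwirq < r->base || hwirq - r->base >= r->count)
		return false;
	return irq_range_test(r, hwirq - r->base);
}
```

A fault handler for the large VMA would then map a page only when `irq_range_is_allocated()` is true and report an error otherwise.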

> The issue of course is that we can't really do a "generic" system that
> allows to map any interrupt, it's a security issue. So we need the
> interrupt "owner" to be the one allowing this. VFIO for PCI for
> example, possibly a specific VFIO variant for OpenCAPI, something else
> for guest IPIs ?

If we have ranges defined per device, that should be enough, no?

Thanks,

C.

> Food for thoughts...
>
> Ben.
>
>>
>> Signed-off-by: Frederic Barrat <[email protected]>
>> ---
>> arch/powerpc/include/asm/pnv-ocxl.h | 3 +
>> arch/powerpc/platforms/powernv/ocxl.c | 30 +++++
>> drivers/misc/ocxl/afu_irq.c | 204 ++++++++++++++++++++++++++++++++++
>> drivers/misc/ocxl/context.c | 40 ++++++-
>> drivers/misc/ocxl/file.c | 33 ++++++
>> drivers/misc/ocxl/link.c | 28 +++++
>> drivers/misc/ocxl/ocxl_internal.h | 7 ++
>> include/uapi/misc/ocxl.h | 9 ++
>> 8 files changed, 352 insertions(+), 2 deletions(-)
>> create mode 100644 drivers/misc/ocxl/afu_irq.c
>>
>> diff --git a/arch/powerpc/include/asm/pnv-ocxl.h b/arch/powerpc/include/asm/pnv-ocxl.h
>> index 5a7ae7f28209..1e26f0a39500 100644
>> --- a/arch/powerpc/include/asm/pnv-ocxl.h
>> +++ b/arch/powerpc/include/asm/pnv-ocxl.h
>> @@ -37,4 +37,7 @@ extern int pnv_ocxl_spa_setup(struct pci_dev *dev, void *spa_mem, int PE_mask,
>> extern void pnv_ocxl_spa_release(void *platform_data);
>> extern int pnv_ocxl_spa_remove_pe(void *platform_data, int pe_handle);
>>
>> +extern int pnv_ocxl_alloc_xive_irq(u32 *irq, u64 *trigger_addr);
>> +extern void pnv_ocxl_free_xive_irq(u32 irq);
>> +
>> #endif /* _ASM_PVN_OCXL_H */
>> diff --git a/arch/powerpc/platforms/powernv/ocxl.c b/arch/powerpc/platforms/powernv/ocxl.c
>> index 6c79924b95c8..96cafba6aef1 100644
>> --- a/arch/powerpc/platforms/powernv/ocxl.c
>> +++ b/arch/powerpc/platforms/powernv/ocxl.c
>> @@ -9,6 +9,7 @@
>>
>> #include <asm/pnv-ocxl.h>
>> #include <asm/opal.h>
>> +#include <asm/xive.h>
>> #include <misc/ocxl-config.h>
>> #include "pci.h"
>>
>> @@ -487,3 +488,32 @@ int pnv_ocxl_spa_remove_pe(void *platform_data, int pe_handle)
>> return rc;
>> }
>> EXPORT_SYMBOL_GPL(pnv_ocxl_spa_remove_pe);
>> +
>> +int pnv_ocxl_alloc_xive_irq(u32 *irq, u64 *trigger_addr)
>> +{
>> + __be64 flags, trigger_page;
>> + s64 rc;
>> + u32 hwirq;
>> +
>> + hwirq = xive_native_alloc_irq();
>> + if (!hwirq)
>> + return -ENOENT;
>> +
>> + rc = opal_xive_get_irq_info(hwirq, &flags, NULL, &trigger_page, NULL,
>> + NULL);
>> + if (rc || !trigger_page) {
>> + xive_native_free_irq(hwirq);
>> + return -ENOENT;
>> + }
>> + *irq = hwirq;
>> + *trigger_addr = be64_to_cpu(trigger_page);
>> + return 0;
>> +
>> +}
>> +EXPORT_SYMBOL_GPL(pnv_ocxl_alloc_xive_irq);
>> +
>> +void pnv_ocxl_free_xive_irq(u32 irq)
>> +{
>> + xive_native_free_irq(irq);
>> +}
>> +EXPORT_SYMBOL_GPL(pnv_ocxl_free_xive_irq);
>> diff --git a/drivers/misc/ocxl/afu_irq.c b/drivers/misc/ocxl/afu_irq.c
>> new file mode 100644
>> index 000000000000..0b217a854837
>> --- /dev/null
>> +++ b/drivers/misc/ocxl/afu_irq.c
>> @@ -0,0 +1,204 @@
>> +/*
>> + * Copyright 2017 IBM Corp.
>> + *
>> + * This program is free software; you can redistribute it and/or
>> + * modify it under the terms of the GNU General Public License
>> + * as published by the Free Software Foundation; either version
>> + * 2 of the License, or (at your option) any later version.
>> + */
>> +
>> +#include <linux/interrupt.h>
>> +#include <linux/eventfd.h>
>> +#include <asm/pnv-ocxl.h>
>> +#include "ocxl_internal.h"
>> +
>> +struct afu_irq {
>> + int id;
>> + int hw_irq;
>> + unsigned int virq;
>> + char *name;
>> + u64 trigger_page;
>> + struct eventfd_ctx *ev_ctx;
>> +};
>> +
>> +static int irq_offset_to_id(struct ocxl_context *ctx, u64 offset)
>> +{
>> + return (offset - ctx->afu->irq_base_offset) >> PAGE_SHIFT;
>> +}
>> +
>> +static u64 irq_id_to_offset(struct ocxl_context *ctx, int id)
>> +{
>> + return ctx->afu->irq_base_offset + (id << PAGE_SHIFT);
>> +}
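The two helpers above give each AFU interrupt one page-sized slot above `irq_base_offset` in the context's mmap space, and the fault and mmap paths dispatch on that boundary. Below is a small user-space sketch of the same arithmetic. It assumes 64 KiB pages (`SKETCH_PAGE_SHIFT = 16`) and, in the usage, a hypothetical base offset. The `sketch_*` names are illustrative only and are not part of the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 64 KiB pages, as commonly configured on ppc64. */
#define SKETCH_PAGE_SHIFT 16

/* Same arithmetic as irq_id_to_offset(): one page per interrupt id. */
static uint64_t sketch_irq_id_to_offset(uint64_t irq_base_offset, int id)
{
	assert(id >= 0);
	/* Widen before shifting so the result cannot overflow an int. */
	return irq_base_offset + ((uint64_t)id << SKETCH_PAGE_SHIFT);
}

/* Same arithmetic as irq_offset_to_id(); any offset inside the page works. */
static int sketch_irq_offset_to_id(uint64_t irq_base_offset, uint64_t offset)
{
	return (int)((offset - irq_base_offset) >> SKETCH_PAGE_SHIFT);
}

/* The dispatch used by ocxl_mmap_fault() and ocxl_context_mmap(). */
static bool sketch_offset_is_afu_irq(uint64_t irq_base_offset, uint64_t offset)
{
	return offset >= irq_base_offset;
}
```

Any offset inside an interrupt's page maps back to the same id. This is why `check_mmap_afu_irq()` only needs to validate the page-aligned `vm_pgoff << PAGE_SHIFT`.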
>> +
>> +static irqreturn_t afu_irq_handler(int virq, void *data)
>> +{
>> + struct afu_irq *irq = (struct afu_irq *) data;
>> +
>> + if (irq->ev_ctx)
>> + eventfd_signal(irq->ev_ctx, 1);
>> + return IRQ_HANDLED;
>> +}
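`afu_irq_handler()` does no device-specific work. It only adds 1 to the eventfd counter registered through `OCXL_IOCTL_IRQ_SET_FD`. If no eventfd has been set, the interrupt is acknowledged and otherwise ignored.

The counter semantics a consumer observes can be sketched from user space, with `write()` standing in for the kernel's `eventfd_signal(ctx, 1)`. This illustrates eventfd behaviour and is not driver code. The `sketch_*` helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Stand-in for eventfd_signal(ctx, 1): add 1 to the eventfd counter. */
static int sketch_signal(int efd)
{
	uint64_t one = 1;

	return write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/*
 * What a consumer sees: in non-semaphore mode, a read returns the
 * accumulated count and resets the counter to 0. Returns -1 if the
 * read fails (e.g. EAGAIN on an empty non-blocking eventfd).
 */
static int64_t sketch_consume(int efd)
{
	uint64_t count;

	if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count))
		return -1;
	return (int64_t)count;
}
```

A non-semaphore eventfd coalesces signals. Several AFU interrupts raised before the consumer reads are therefore reported as a single read that returns their count.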
>> +
>> +static int setup_afu_irq(struct ocxl_context *ctx, struct afu_irq *irq)
>> +{
>> + int rc;
>> +
>> + irq->virq = irq_create_mapping(NULL, irq->hw_irq);
>> + if (!irq->virq) {
>> + pr_err("irq_create_mapping failed\n");
>> + return -ENOMEM;
>> + }
>> + pr_debug("hw_irq %d mapped to virq %u\n", irq->hw_irq, irq->virq);
>> +
>> + irq->name = kasprintf(GFP_KERNEL, "ocxl-afu-%u", irq->virq);
>> + if (!irq->name) {
>> + irq_dispose_mapping(irq->virq);
>> + return -ENOMEM;
>> + }
>> +
>> + rc = request_irq(irq->virq, afu_irq_handler, 0, irq->name, irq);
>> + if (rc) {
>> + kfree(irq->name);
>> + irq->name = NULL;
>> + irq_dispose_mapping(irq->virq);
>> + pr_err("request_irq failed: %d\n", rc);
>> + return rc;
>> + }
>> + return 0;
>> +}
>> +
>> +static void release_afu_irq(struct afu_irq *irq)
>> +{
>> + free_irq(irq->virq, irq);
>> + irq_dispose_mapping(irq->virq);
>> + kfree(irq->name);
>> +}
>> +
>> +int ocxl_afu_irq_alloc(struct ocxl_context *ctx, u64 *irq_offset)
>> +{
>> + struct afu_irq *irq;
>> + int rc;
>> +
>> + irq = kzalloc(sizeof(struct afu_irq), GFP_KERNEL);
>> + if (!irq)
>> + return -ENOMEM;
>> +
>> + /*
>> + * We limit the number of afu irqs per context and per link to
>> + * avoid a single process or user depleting the pool of IPIs
>> + */
>> +
>> + mutex_lock(&ctx->irq_lock);
>> +
>> + irq->id = idr_alloc(&ctx->irq_idr, irq, 0, MAX_IRQ_PER_CONTEXT,
>> + GFP_KERNEL);
>> + if (irq->id < 0) {
>> + rc = -ENOSPC;
>> + goto err_unlock;
>> + }
>> +
>> + rc = ocxl_link_irq_alloc(ctx->afu->fn->link, &irq->hw_irq,
>> + &irq->trigger_page);
>> + if (rc)
>> + goto err_idr;
>> +
>> + rc = setup_afu_irq(ctx, irq);
>> + if (rc)
>> + goto err_alloc;
>> +
>> + *irq_offset = irq_id_to_offset(ctx, irq->id);
>> +
>> + mutex_unlock(&ctx->irq_lock);
>> + return 0;
>> +
>> +err_alloc:
>> + ocxl_link_free_irq(ctx->afu->fn->link, irq->hw_irq);
>> +err_idr:
>> + idr_remove(&ctx->irq_idr, irq->id);
>> +err_unlock:
>> + mutex_unlock(&ctx->irq_lock);
>> + kfree(irq);
>> + return rc;
>> +}
>> +
>> +static void afu_irq_free(struct afu_irq *irq, struct ocxl_context *ctx)
>> +{
>> + if (ctx->mapping)
>> + unmap_mapping_range(ctx->mapping,
>> + irq_id_to_offset(ctx, irq->id),
>> + 1 << PAGE_SHIFT, 1);
>> + release_afu_irq(irq);
>> + if (irq->ev_ctx)
>> + eventfd_ctx_put(irq->ev_ctx);
>> + ocxl_link_free_irq(ctx->afu->fn->link, irq->hw_irq);
>> + kfree(irq);
>> +}
>> +
>> +int ocxl_afu_irq_free(struct ocxl_context *ctx, u64 irq_offset)
>> +{
>> + struct afu_irq *irq;
>> + int id = irq_offset_to_id(ctx, irq_offset);
>> +
>> + mutex_lock(&ctx->irq_lock);
>> +
>> + irq = idr_find(&ctx->irq_idr, id);
>> + if (!irq) {
>> + mutex_unlock(&ctx->irq_lock);
>> + return -EINVAL;
>> + }
>> + idr_remove(&ctx->irq_idr, irq->id);
>> + afu_irq_free(irq, ctx);
>> + mutex_unlock(&ctx->irq_lock);
>> + return 0;
>> +}
>> +
>> +void ocxl_afu_irq_free_all(struct ocxl_context *ctx)
>> +{
>> + struct afu_irq *irq;
>> + int id;
>> +
>> + mutex_lock(&ctx->irq_lock);
>> + idr_for_each_entry(&ctx->irq_idr, irq, id)
>> + afu_irq_free(irq, ctx);
>> + mutex_unlock(&ctx->irq_lock);
>> +}
>> +
>> +int ocxl_afu_irq_set_fd(struct ocxl_context *ctx, u64 irq_offset, int eventfd)
>> +{
>> + struct afu_irq *irq;
>> + struct eventfd_ctx *ev_ctx;
>> + int rc = 0, id = irq_offset_to_id(ctx, irq_offset);
>> +
>> + mutex_lock(&ctx->irq_lock);
>> + irq = idr_find(&ctx->irq_idr, id);
>> + if (!irq) {
>> + rc = -EINVAL;
>> + goto unlock;
>> + }
>> +
>> + ev_ctx = eventfd_ctx_fdget(eventfd);
>> + if (IS_ERR(ev_ctx)) {
>> + rc = -EINVAL;
>> + goto unlock;
>> + }
>> +
>> + irq->ev_ctx = ev_ctx;
>> +unlock:
>> + mutex_unlock(&ctx->irq_lock);
>> + return rc;
>> +}
>> +
>> +u64 ocxl_afu_irq_get_addr(struct ocxl_context *ctx, u64 irq_offset)
>> +{
>> + struct afu_irq *irq;
>> + int id = irq_offset_to_id(ctx, irq_offset);
>> + u64 addr = 0;
>> +
>> + mutex_lock(&ctx->irq_lock);
>> + irq = idr_find(&ctx->irq_idr, id);
>> + if (irq)
>> + addr = irq->trigger_page;
>> + mutex_unlock(&ctx->irq_lock);
>> + return addr;
>> +}
>> diff --git a/drivers/misc/ocxl/context.c b/drivers/misc/ocxl/context.c
>> index 0bc0dd97d784..19575269ed22 100644
>> --- a/drivers/misc/ocxl/context.c
>> +++ b/drivers/misc/ocxl/context.c
>> @@ -38,6 +38,8 @@ int ocxl_context_init(struct ocxl_context *ctx, struct ocxl_afu *afu,
>> mutex_init(&ctx->mapping_lock);
>> init_waitqueue_head(&ctx->events_wq);
>> mutex_init(&ctx->xsl_error_lock);
>> + mutex_init(&ctx->irq_lock);
>> + idr_init(&ctx->irq_idr);
>> /*
>> * Keep a reference on the AFU to make sure it's valid for the
>> * duration of the life of the context
>> @@ -87,6 +89,19 @@ int ocxl_context_attach(struct ocxl_context *ctx, u64 amr)
>> return rc;
>> }
>>
>> +static int map_afu_irq(struct vm_area_struct *vma, unsigned long address,
>> + u64 offset, struct ocxl_context *ctx)
>> +{
>> + u64 trigger_addr;
>> +
>> + trigger_addr = ocxl_afu_irq_get_addr(ctx, offset);
>> + if (!trigger_addr)
>> + return VM_FAULT_SIGBUS;
>> +
>> + vm_insert_pfn(vma, address, trigger_addr >> PAGE_SHIFT);
>> + return VM_FAULT_NOPAGE;
>> +}
>> +
>> static int map_pp_mmio(struct vm_area_struct *vma, unsigned long address,
>> u64 offset, struct ocxl_context *ctx)
>> {
>> @@ -125,7 +140,10 @@ static int ocxl_mmap_fault(struct vm_fault *vmf)
>> pr_debug("%s: pasid %d address 0x%lx offset 0x%llx\n", __func__,
>> ctx->pasid, vmf->address, offset);
>>
>> - rc = map_pp_mmio(vma, vmf->address, offset, ctx);
>> + if (offset < ctx->afu->irq_base_offset)
>> + rc = map_pp_mmio(vma, vmf->address, offset, ctx);
>> + else
>> + rc = map_afu_irq(vma, vmf->address, offset, ctx);
>> return rc;
>> }
>>
>> @@ -133,6 +151,19 @@ static const struct vm_operations_struct ocxl_vmops = {
>> .fault = ocxl_mmap_fault,
>> };
>>
>> +static int check_mmap_afu_irq(struct ocxl_context *ctx,
>> + struct vm_area_struct *vma)
>> +{
>> + /* only one page */
>> + if (vma_pages(vma) != 1)
>> + return -EINVAL;
>> +
>> + /* check offset validity */
>> + if (!ocxl_afu_irq_get_addr(ctx, vma->vm_pgoff << PAGE_SHIFT))
>> + return -EINVAL;
>> + return 0;
>> +}
>> +
>> static int check_mmap_mmio(struct ocxl_context *ctx,
>> struct vm_area_struct *vma)
>> {
>> @@ -146,7 +177,10 @@ int ocxl_context_mmap(struct ocxl_context *ctx, struct vm_area_struct *vma)
>> {
>> int rc;
>>
>> - rc = check_mmap_mmio(ctx, vma);
>> + if ((vma->vm_pgoff << PAGE_SHIFT) < ctx->afu->irq_base_offset)
>> + rc = check_mmap_mmio(ctx, vma);
>> + else
>> + rc = check_mmap_afu_irq(ctx, vma);
>> if (rc)
>> return rc;
>>
>> @@ -231,6 +265,8 @@ void ocxl_context_free(struct ocxl_context *ctx)
>> idr_remove(&ctx->afu->contexts_idr, ctx->pasid);
>> mutex_unlock(&ctx->afu->contexts_lock);
>>
>> + ocxl_afu_irq_free_all(ctx);
>> + idr_destroy(&ctx->irq_idr);
>> /* reference to the AFU taken in ocxl_context_init */
>> ocxl_afu_put(ctx->afu);
>> kfree(ctx);
>> diff --git a/drivers/misc/ocxl/file.c b/drivers/misc/ocxl/file.c
>> index a51386eff4f5..0a73e2c11ba6 100644
>> --- a/drivers/misc/ocxl/file.c
>> +++ b/drivers/misc/ocxl/file.c
>> @@ -110,12 +110,17 @@ static long afu_ioctl_attach(struct ocxl_context *ctx,
>> }
>>
>> #define CMD_STR(x) (x == OCXL_IOCTL_ATTACH ? "ATTACH" : \
>> + x == OCXL_IOCTL_IRQ_ALLOC ? "IRQ_ALLOC" : \
>> + x == OCXL_IOCTL_IRQ_FREE ? "IRQ_FREE" : \
>> + x == OCXL_IOCTL_IRQ_SET_FD ? "IRQ_SET_FD" : \
>> "UNKNOWN")
>>
>> static long afu_ioctl(struct file *file, unsigned int cmd,
>> unsigned long args)
>> {
>> struct ocxl_context *ctx = file->private_data;
>> + struct ocxl_ioctl_irq_fd irq_fd;
>> + u64 irq_offset;
>> long rc;
>>
>> pr_debug("%s for context %d, command %s\n", __func__, ctx->pasid,
>> @@ -130,6 +135,34 @@ static long afu_ioctl(struct file *file, unsigned int cmd,
>> (struct ocxl_ioctl_attach __user *) args);
>> break;
>>
>> + case OCXL_IOCTL_IRQ_ALLOC:
>> + rc = ocxl_afu_irq_alloc(ctx, &irq_offset);
>> + if (!rc && copy_to_user((u64 __user *) args, &irq_offset,
>> + sizeof(irq_offset))) {
>> + /* copy_to_user() returns the number of bytes not copied */
>> + ocxl_afu_irq_free(ctx, irq_offset);
>> + rc = -EFAULT;
>> + }
>> + break;
>> +
>> + case OCXL_IOCTL_IRQ_FREE:
>> + rc = copy_from_user(&irq_offset, (u64 __user *) args,
>> + sizeof(irq_offset));
>> + if (rc)
>> + return -EFAULT;
>> + rc = ocxl_afu_irq_free(ctx, irq_offset);
>> + break;
>> +
>> + case OCXL_IOCTL_IRQ_SET_FD:
>> + rc = copy_from_user(&irq_fd, (void __user *) args, sizeof(irq_fd));
>> + if (rc)
>> + return -EFAULT;
>> + if (irq_fd.reserved)
>> + return -EINVAL;
>> + rc = ocxl_afu_irq_set_fd(ctx, irq_fd.irq_offset,
>> + irq_fd.eventfd);
>> + break;
>> +
>> default:
>> rc = -EINVAL;
>> }
>> diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
>> index 6b184cd7d2a6..5f12564eea99 100644
>> --- a/drivers/misc/ocxl/link.c
>> +++ b/drivers/misc/ocxl/link.c
>> @@ -608,3 +608,31 @@ int ocxl_link_remove_pe(void *link_handle, int pasid)
>> mutex_unlock(&spa->spa_lock);
>> return rc;
>> }
>> +
>> +int ocxl_link_irq_alloc(void *link_handle, int *hw_irq, u64 *trigger_addr)
>> +{
>> + struct link *link = (struct link *) link_handle;
>> + int rc, irq;
>> + u64 addr;
>> +
>> + if (atomic_dec_if_positive(&link->irq_available) < 0)
>> + return -ENOSPC;
>> +
>> + rc = pnv_ocxl_alloc_xive_irq(&irq, &addr);
>> + if (rc) {
>> + atomic_inc(&link->irq_available);
>> + return rc;
>> + }
>> +
>> + *hw_irq = irq;
>> + *trigger_addr = addr;
>> + return 0;
>> +}
>> +
>> +void ocxl_link_free_irq(void *link_handle, int hw_irq)
>> +{
>> + struct link *link = (struct link *) link_handle;
>> +
>> + pnv_ocxl_free_xive_irq(hw_irq);
>> + atomic_inc(&link->irq_available);
>> +}
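`ocxl_link_irq_alloc()` enforces a per-link budget with `atomic_dec_if_positive()`, which decrements only when the result stays non-negative. `ocxl_link_free_irq()` hands the slot back with `atomic_inc()`.

Below is a user-space C11 sketch of the same reserve/release pattern. `sketch_dec_if_positive()` is an assumed stand-in for the kernel primitive, written as a compare-and-swap loop. `struct sketch_link` is hypothetical and carries only the budget.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Stand-in for the kernel's atomic_dec_if_positive(): decrement *v only if
 * the result stays >= 0. Returns (old value - 1) in both cases, so a
 * negative return means nothing was changed.
 */
static int sketch_dec_if_positive(atomic_int *v)
{
	int old = atomic_load(v);

	/* On failure the CAS reloads 'old' with the current value; retry. */
	while (old > 0 && !atomic_compare_exchange_weak(v, &old, old - 1))
		;
	return old - 1;
}

/* Hypothetical link carrying only the interrupt budget. */
struct sketch_link {
	atomic_int irq_available;
};

static int sketch_link_irq_reserve(struct sketch_link *link)
{
	if (sketch_dec_if_positive(&link->irq_available) < 0)
		return -1;	/* the patch returns -ENOSPC here */
	return 0;
}

static void sketch_link_irq_release(struct sketch_link *link)
{
	atomic_fetch_add(&link->irq_available, 1);
}
```

The patch reserves a slot before calling `pnv_ocxl_alloc_xive_irq()` and gives it back if that call fails. A failed XIVE allocation therefore does not consume budget.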
>> diff --git a/drivers/misc/ocxl/ocxl_internal.h b/drivers/misc/ocxl/ocxl_internal.h
>> index e07f7d523275..829369c5f004 100644
>> --- a/drivers/misc/ocxl/ocxl_internal.h
>> +++ b/drivers/misc/ocxl/ocxl_internal.h
>> @@ -197,4 +197,11 @@ extern void ocxl_context_free(struct ocxl_context *ctx);
>> extern int ocxl_sysfs_add_afu(struct ocxl_afu *afu);
>> extern void ocxl_sysfs_remove_afu(struct ocxl_afu *afu);
>>
>> +extern int ocxl_afu_irq_alloc(struct ocxl_context *ctx, u64 *irq_offset);
>> +extern int ocxl_afu_irq_free(struct ocxl_context *ctx, u64 irq_offset);
>> +extern void ocxl_afu_irq_free_all(struct ocxl_context *ctx);
>> +extern int ocxl_afu_irq_set_fd(struct ocxl_context *ctx, u64 irq_offset,
>> + int eventfd);
>> +extern u64 ocxl_afu_irq_get_addr(struct ocxl_context *ctx, u64 irq_offset);
>> +
>> #endif /* _OCXL_INTERNAL_H_ */
>> diff --git a/include/uapi/misc/ocxl.h b/include/uapi/misc/ocxl.h
>> index 71fa387f2efd..488e75228c33 100644
>> --- a/include/uapi/misc/ocxl.h
>> +++ b/include/uapi/misc/ocxl.h
>> @@ -39,9 +39,18 @@ struct ocxl_ioctl_attach {
>> __u64 reserved3;
>> };
>>
>> +struct ocxl_ioctl_irq_fd {
>> + __u64 irq_offset;
>> + __s32 eventfd;
>> + __u32 reserved;
>> +};
>> +
>> /* ioctl numbers */
>> #define OCXL_MAGIC 0xCA
>> /* AFU devices */
>> #define OCXL_IOCTL_ATTACH _IOW(OCXL_MAGIC, 0x10, struct ocxl_ioctl_attach)
>> +#define OCXL_IOCTL_IRQ_ALLOC _IOR(OCXL_MAGIC, 0x11, __u64)
>> +#define OCXL_IOCTL_IRQ_FREE _IOW(OCXL_MAGIC, 0x12, __u64)
>> +#define OCXL_IOCTL_IRQ_SET_FD _IOW(OCXL_MAGIC, 0x13, struct ocxl_ioctl_irq_fd)
>>
>> #endif /* _UAPI_MISC_OCXL_H */
>
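The new uapi implies a user-space sequence along these lines: allocate an interrupt, bind an eventfd to it, then `mmap()` exactly one page at the returned offset. This is a sketch under several assumptions:

- `afu_fd` is an AFU device file that has already gone through `OCXL_IOCTL_ATTACH`.
- The structure and ioctl numbers are copied locally from the patch under `sketch_` names instead of including `<misc/ocxl.h>`.
- Error handling is minimal.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Local mirror of the uapi additions in the patch (normally <misc/ocxl.h>). */
struct sketch_ocxl_ioctl_irq_fd {
	uint64_t irq_offset;
	int32_t eventfd;
	uint32_t reserved;	/* must be zero, or the driver returns -EINVAL */
};

#define SKETCH_OCXL_MAGIC 0xCA
#define SKETCH_OCXL_IOCTL_IRQ_ALLOC  _IOR(SKETCH_OCXL_MAGIC, 0x11, uint64_t)
#define SKETCH_OCXL_IOCTL_IRQ_FREE   _IOW(SKETCH_OCXL_MAGIC, 0x12, uint64_t)
#define SKETCH_OCXL_IOCTL_IRQ_SET_FD _IOW(SKETCH_OCXL_MAGIC, 0x13, struct sketch_ocxl_ioctl_irq_fd)

/*
 * Allocate one AFU interrupt on an attached context, bind a new eventfd
 * to it and map its trigger page. Returns the eventfd, or -1 on error.
 */
static int sketch_setup_afu_irq(int afu_fd, uint64_t *irq_offset,
				void **trigger)
{
	struct sketch_ocxl_ioctl_irq_fd irq_fd = { 0 };
	long page_size = sysconf(_SC_PAGESIZE);
	int efd;

	if (ioctl(afu_fd, SKETCH_OCXL_IOCTL_IRQ_ALLOC, irq_offset) < 0)
		return -1;

	efd = eventfd(0, 0);
	if (efd < 0)
		goto err_free_irq;

	irq_fd.irq_offset = *irq_offset;
	irq_fd.eventfd = efd;
	if (ioctl(afu_fd, SKETCH_OCXL_IOCTL_IRQ_SET_FD, &irq_fd) < 0)
		goto err_close;

	/* check_mmap_afu_irq() rejects anything other than a single page. */
	*trigger = mmap(NULL, (size_t)page_size, PROT_WRITE, MAP_SHARED,
			afu_fd, (off_t)*irq_offset);
	if (*trigger == MAP_FAILED)
		goto err_close;
	return efd;

err_close:
	close(efd);
err_free_irq:
	ioctl(afu_fd, SKETCH_OCXL_IOCTL_IRQ_FREE, irq_offset);
	return -1;
}
```

The mapped address is the process-virtual alias of the interrupt's trigger page. How that address is handed to the AFU is device-specific and not defined by this patch. Any interrupts still allocated when the context is freed are released by `ocxl_afu_irq_free_all()`.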