2020-09-19 15:30:06

by Matthew Rosato

[permalink] [raw]
Subject: [PATCH 0/4] Pass zPCI hardware information via VFIO

This patchset provides a means by which hardware information about the
underlying PCI device can be passed up to userspace (ie, QEMU) so that
this hardware information can be used rather than previously hard-coded
assumptions. A new VFIO region type is defined which holds this
information.

A form of these patches saw some rounds last year but has been back-
tabled for a while. The original work for this feature was done by Pierre
Morel. I'd like to refresh the discussion on this and get this finished up
so that we can move forward with better-supporting additional types of
PCI-attached devices. The proposal here presents a completely different
region mapping vs the prior approach, taking inspiration from vfio info
capability chains to provide device CLP information in a way that allows
for future expansion (new CLP features).

This feature is toggled via the CONFIG_VFIO_PCI_ZDEV configuration entry.

Matthew Rosato (4):
s390/pci: stash version in the zpci_dev
s390/pci: track whether util_str is valid in the zpci_dev
vfio-pci/zdev: define the vfio_zdev header
vfio-pci/zdev: use a device region to retrieve zPCI information

arch/s390/include/asm/pci.h | 4 +-
arch/s390/pci/pci_clp.c | 2 +
drivers/vfio/pci/Kconfig | 13 ++
drivers/vfio/pci/Makefile | 1 +
drivers/vfio/pci/vfio_pci.c | 8 ++
drivers/vfio/pci/vfio_pci_private.h | 10 ++
drivers/vfio/pci/vfio_pci_zdev.c | 242 ++++++++++++++++++++++++++++++++++++
include/uapi/linux/vfio.h | 5 +
include/uapi/linux/vfio_zdev.h | 116 +++++++++++++++++
9 files changed, 400 insertions(+), 1 deletion(-)
create mode 100644 drivers/vfio/pci/vfio_pci_zdev.c
create mode 100644 include/uapi/linux/vfio_zdev.h

--
1.8.3.1


2020-09-19 15:30:13

by Matthew Rosato

[permalink] [raw]
Subject: [PATCH 3/4] vfio-pci/zdev: define the vfio_zdev header

We define a new device region in vfio.h to be able to get the ZPCI CLP
information by reading this region from userspace.

We create a new file, vfio_zdev.h to define the structure of the new
region defined in vfio.h

Signed-off-by: Matthew Rosato <[email protected]>
---
include/uapi/linux/vfio.h | 5 ++
include/uapi/linux/vfio_zdev.h | 116 +++++++++++++++++++++++++++++++++++++++++
2 files changed, 121 insertions(+)
create mode 100644 include/uapi/linux/vfio_zdev.h

diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 9204705..65eb367 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -326,6 +326,11 @@ struct vfio_region_info_cap_type {
* to do TLB invalidation on a GPU.
*/
#define VFIO_REGION_SUBTYPE_IBM_NVLINK2_ATSD (1)
+/*
+ * IBM zPCI specific hardware feature information for a devcie. The contents
+ * of this region are mapped by struct vfio_region_zpci_info.
+ */
+#define VFIO_REGION_SUBTYPE_IBM_ZPCI_CLP (2)

/* sub-types for VFIO_REGION_TYPE_GFX */
#define VFIO_REGION_SUBTYPE_GFX_EDID (1)
diff --git a/include/uapi/linux/vfio_zdev.h b/include/uapi/linux/vfio_zdev.h
new file mode 100644
index 0000000..c9e4891
--- /dev/null
+++ b/include/uapi/linux/vfio_zdev.h
@@ -0,0 +1,116 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * Region definition for ZPCI devices
+ *
+ * Copyright IBM Corp. 2020
+ *
+ * Author(s): Pierre Morel <[email protected]>
+ * Matthew Rosato <[email protected]>
+ */
+
+#ifndef _VFIO_ZDEV_H_
+#define _VFIO_ZDEV_H_
+
+#include <linux/types.h>
+
+/**
+ * struct vfio_region_zpci_info - ZPCI information
+ *
+ * This region provides zPCI specific hardware feature information for a
+ * device.
+ *
+ * The ZPCI information structure is presented as a chain of CLP features
+ * defined below. argsz provides the size of the entire region, and offset
+ * provides the location of the first CLP feature in the chain.
+ *
+ */
+struct vfio_region_zpci_info {
+ __u32 argsz; /* Size of entire payload */
+ __u32 offset; /* Location of first entry */
+} __packed;
+
+/**
+ * struct vfio_region_zpci_info_hdr - ZPCI header information
+ *
+ * This structure is included at the top of each CLP feature to define what
+ * type of CLP feature is presented / the structure version. The next value
+ * defines the offset of the next CLP feature, and is an offset from the very
+ * beginning of the region (vfio_region_zpci_info).
+ *
+ * Each CLP feature must have it's own unique 'id'.
+ */
+struct vfio_region_zpci_info_hdr {
+ __u16 id; /* Identifies the CLP type */
+ __u16 version; /* version of the CLP data */
+ __u32 next; /* Offset of next entry */
+} __packed;
+
+/**
+ * struct vfio_region_zpci_info_qpci - Initial Query PCI information
+ *
+ * This region provides an initial set of data from the Query PCI Function
+ * CLP.
+ */
+#define VFIO_REGION_ZPCI_INFO_QPCI 1
+
+struct vfio_region_zpci_info_qpci {
+ struct vfio_region_zpci_info_hdr hdr;
+ __u64 start_dma; /* Start of available DMA addresses */
+ __u64 end_dma; /* End of available DMA addresses */
+ __u16 pchid; /* Physical Channel ID */
+ __u16 vfn; /* Virtual function number */
+ __u16 fmb_length; /* Measurement Block Length (in bytes) */
+ __u8 pft; /* PCI Function Type */
+ __u8 gid; /* PCI function group ID */
+} __packed;
+
+
+/**
+ * struct vfio_region_zpci_info_qpcifg - Initial Query PCI Function Group info
+ *
+ * This region provides an initial set of data from the Query PCI Function
+ * Group CLP.
+ */
+#define VFIO_REGION_ZPCI_INFO_QPCIFG 2
+
+struct vfio_region_zpci_info_qpcifg {
+ struct vfio_region_zpci_info_hdr hdr;
+ __u64 dasm; /* DMA Address space mask */
+ __u64 msi_addr; /* MSI address */
+ __u64 flags;
+#define VFIO_PCI_ZDEV_FLAGS_REFRESH 1 /* Use program-specified TLB refresh */
+ __u16 mui; /* Measurement Block Update Interval */
+ __u16 noi; /* Maximum number of MSIs */
+ __u16 maxstbl; /* Maximum Store Block Length */
+ __u8 version; /* Supported PCI Version */
+} __packed;
+
+/**
+ * struct vfio_region_zpci_info_util - Utility String
+ *
+ * This region provides the utility string for the associated device, which is
+ * a device identifier string.
+ */
+#define VFIO_REGION_ZPCI_INFO_UTIL 3
+
+struct vfio_region_zpci_info_util {
+ struct vfio_region_zpci_info_hdr hdr;
+ __u32 size;
+ __u8 util_str[];
+} __packed;
+
+/**
+ * struct vfio_region_zpci_info_pfip - PCI Function Path
+ *
+ * This region provides the PCI function path string, which is an identifier
+ * that describes the internal hardware path of the device.
+ */
+#define VFIO_REGION_ZPCI_INFO_PFIP 4
+
+struct vfio_region_zpci_info_pfip {
+struct vfio_region_zpci_info_hdr hdr;
+ __u32 size;
+ __u8 pfip[];
+} __packed;
+
+#endif
--
1.8.3.1

2020-09-19 15:30:19

by Matthew Rosato

[permalink] [raw]
Subject: [PATCH 1/4] s390/pci: stash version in the zpci_dev

In preparation for passing the info on to vfio-pci devices, stash the
supported PCI version for the target device in the zpci_dev.

Signed-off-by: Matthew Rosato <[email protected]>
---
arch/s390/include/asm/pci.h | 1 +
arch/s390/pci/pci_clp.c | 1 +
2 files changed, 2 insertions(+)

diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
index 99b92c3..882e233 100644
--- a/arch/s390/include/asm/pci.h
+++ b/arch/s390/include/asm/pci.h
@@ -179,6 +179,7 @@ struct zpci_dev {
atomic64_t mapped_pages;
atomic64_t unmapped_pages;

+ u8 version;
enum pci_bus_speed max_bus_speed;

struct dentry *debugfs_dev;
diff --git a/arch/s390/pci/pci_clp.c b/arch/s390/pci/pci_clp.c
index 7e735f4..48bf316 100644
--- a/arch/s390/pci/pci_clp.c
+++ b/arch/s390/pci/pci_clp.c
@@ -102,6 +102,7 @@ static void clp_store_query_pci_fngrp(struct zpci_dev *zdev,
zdev->msi_addr = response->msia;
zdev->max_msi = response->noi;
zdev->fmb_update = response->mui;
+ zdev->version = response->version;

switch (response->version) {
case 1:
--
1.8.3.1

2020-09-19 15:31:20

by Matthew Rosato

[permalink] [raw]
Subject: [PATCH 4/4] vfio-pci/zdev: use a device region to retrieve zPCI information

Define a new configuration entry VFIO_PCI_ZDEV for VFIO/PCI.

When this s390-only feature is configured we initialize a new device
region, VFIO_REGION_SUBTYPE_IBM_ZPCI_CLP, to hold information provided
by the underlying hardware.

This patch is based on work previously done by Pierre Morel.

Signed-off-by: Matthew Rosato <[email protected]>
---
drivers/vfio/pci/Kconfig | 13 ++
drivers/vfio/pci/Makefile | 1 +
drivers/vfio/pci/vfio_pci.c | 8 ++
drivers/vfio/pci/vfio_pci_private.h | 10 ++
drivers/vfio/pci/vfio_pci_zdev.c | 242 ++++++++++++++++++++++++++++++++++++
5 files changed, 274 insertions(+)
create mode 100644 drivers/vfio/pci/vfio_pci_zdev.c

diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
index ac3c1dd..07b4a35 100644
--- a/drivers/vfio/pci/Kconfig
+++ b/drivers/vfio/pci/Kconfig
@@ -45,3 +45,16 @@ config VFIO_PCI_NVLINK2
depends on VFIO_PCI && PPC_POWERNV
help
VFIO PCI support for P9 Witherspoon machine with NVIDIA V100 GPUs
+
+config VFIO_PCI_ZDEV
+ bool "VFIO PCI ZPCI device CLP support"
+ depends on VFIO_PCI && S390
+ default y
+ help
+ Enabling this options exposes a region containing hardware
+ configuration for zPCI devices. This enables userspace (e.g. QEMU)
+ to supply proper configuration values instead of hard-coded defaults
+ for zPCI devices passed through via VFIO on s390.
+
+ Say Y here.
+
diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
index f027f8a..781e080 100644
--- a/drivers/vfio/pci/Makefile
+++ b/drivers/vfio/pci/Makefile
@@ -3,5 +3,6 @@
vfio-pci-y := vfio_pci.o vfio_pci_intrs.o vfio_pci_rdwr.o vfio_pci_config.o
vfio-pci-$(CONFIG_VFIO_PCI_IGD) += vfio_pci_igd.o
vfio-pci-$(CONFIG_VFIO_PCI_NVLINK2) += vfio_pci_nvlink2.o
+vfio-pci-$(CONFIG_VFIO_PCI_ZDEV) += vfio_pci_zdev.o

obj-$(CONFIG_VFIO_PCI) += vfio-pci.o
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 1ab1f5c..cfb04d9 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -409,6 +409,14 @@ static int vfio_pci_enable(struct vfio_pci_device *vdev)
}
}

+ if (IS_ENABLED(CONFIG_VFIO_PCI_ZDEV)) {
+ ret = vfio_pci_zdev_init(vdev);
+ if (ret && ret != -ENODEV) {
+ pci_warn(pdev, "Failed to setup zPCI CLP region\n");
+ goto disable_exit;
+ }
+ }
+
vfio_pci_probe_mmaps(vdev);

return 0;
diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
index 61ca8ab..729af20 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -213,4 +213,14 @@ static inline int vfio_pci_ibm_npu2_init(struct vfio_pci_device *vdev)
return -ENODEV;
}
#endif
+
+#ifdef CONFIG_VFIO_PCI_ZDEV
+extern int vfio_pci_zdev_init(struct vfio_pci_device *vdev);
+#else
+static inline int vfio_pci_zdev_init(struct vfio_pci_device *vdev)
+{
+ return -ENODEV;
+}
+#endif
+
#endif /* VFIO_PCI_PRIVATE_H */
diff --git a/drivers/vfio/pci/vfio_pci_zdev.c b/drivers/vfio/pci/vfio_pci_zdev.c
new file mode 100644
index 0000000..9c7d659
--- /dev/null
+++ b/drivers/vfio/pci/vfio_pci_zdev.c
@@ -0,0 +1,242 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * VFIO ZPCI devices support
+ *
+ * Copyright (C) IBM Corp. 2020. All rights reserved.
+ * Author(s): Pierre Morel <[email protected]>
+ * Matthew Rosato <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ */
+#include <linux/io.h>
+#include <linux/pci.h>
+#include <linux/uaccess.h>
+#include <linux/vfio.h>
+#include <linux/vfio_zdev.h>
+#include <asm/pci_clp.h>
+#include <asm/pci_io.h>
+
+#include "vfio_pci_private.h"
+
+static size_t vfio_pci_zdev_rw(struct vfio_pci_device *vdev,
+ char __user *buf, size_t count, loff_t *ppos,
+ bool iswrite)
+{
+ unsigned int i = VFIO_PCI_OFFSET_TO_INDEX(*ppos) - VFIO_PCI_NUM_REGIONS;
+ struct vfio_region_zpci_info *region = vdev->region[i].data;
+ loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
+
+ if ((!vdev->pdev->bus) || (!to_zpci(vdev->pdev)))
+ return -ENODEV;
+
+ if (pos >= vdev->region[i].size || iswrite)
+ return -EINVAL;
+
+ count = min(count, (size_t)(vdev->region[i].size - pos));
+ if (copy_to_user(buf, region + pos, count))
+ return -EFAULT;
+
+ return count;
+}
+
+static void vfio_pci_zdev_release(struct vfio_pci_device *vdev,
+ struct vfio_pci_region *region)
+{
+ kfree(region->data);
+}
+
+static const struct vfio_pci_regops vfio_pci_zdev_regops = {
+ .rw = vfio_pci_zdev_rw,
+ .release = vfio_pci_zdev_release,
+};
+
+static void vfio_pci_zdev_fill_hdr(struct vfio_region_zpci_info_hdr *hdr,
+ __u16 id, __u16 version, size_t offset,
+ size_t size)
+{
+ hdr->id = id;
+ hdr->version = version;
+ hdr->next = offset + size;
+}
+
+/*
+ * Add the Query PCI Function information to the device region.
+ *
+ * zdev - the zPCI device to get information from
+ * region - start of the vfio device region
+ * offset - location within region to place the data
+ *
+ * On return, provide the offset of the end of this CLP feature.
+ */
+static size_t vfio_pci_zdev_add_qpci(struct zpci_dev *zdev, void *region,
+ size_t offset)
+{
+ struct vfio_region_zpci_info_qpci *clp;
+
+ /* Jump to the CLP region via the offset */
+ clp = (struct vfio_region_zpci_info_qpci *) (region + offset);
+
+ /* Fill in the clp header */
+ vfio_pci_zdev_fill_hdr(&clp->hdr, VFIO_REGION_ZPCI_INFO_QPCI, 1,
+ offset, sizeof(*clp));
+
+ /* Fill in the CLP feature info */
+ clp->start_dma = zdev->start_dma;
+ clp->end_dma = zdev->end_dma;
+ clp->pchid = zdev->pchid;
+ clp->vfn = zdev->vfn;
+ clp->fmb_length = zdev->fmb_length;
+ clp->pft = zdev->pft;
+ clp->gid = zdev->pfgid;
+
+ /* Return offset to the end of this CLP feature */
+ return clp->hdr.next;
+}
+
+/*
+ * Add the Query PCI Function Group information to the device region.
+ *
+ * zdev - the zPCI device to get information from
+ * region - start of the vfio device region
+ * offset - location within region to place the data
+ *
+ * On return, provide the offset of the end of this CLP feature.
+ */
+static size_t vfio_pci_zdev_add_qpcifg(struct zpci_dev *zdev, void *region,
+ size_t offset)
+{
+ struct vfio_region_zpci_info_qpcifg *clp;
+
+ /* Jump to the CLP region via the offset */
+ clp = (struct vfio_region_zpci_info_qpcifg *) (region + offset);
+
+ /* Fill in the clp header */
+ vfio_pci_zdev_fill_hdr(&clp->hdr, VFIO_REGION_ZPCI_INFO_QPCIFG, 1,
+ offset, sizeof(*clp));
+
+ /* Fill in the CLP feature info */
+ clp->dasm = zdev->dma_mask;
+ clp->msi_addr = zdev->msi_addr;
+ clp->flags = VFIO_PCI_ZDEV_FLAGS_REFRESH;
+ clp->mui = zdev->fmb_update;
+ clp->noi = zdev->max_msi;
+ clp->maxstbl = ZPCI_MAX_WRITE_SIZE;
+ clp->version = zdev->version;
+
+ /* Return offset to the end of this CLP feature */
+ return clp->hdr.next;
+}
+
+/*
+ * Add the device utility string to the device region.
+ *
+ * zdev - the zPCI device to get information from
+ * region - start of the vfio device region
+ * offset - location within region to place the data
+ *
+ * On return, provide the offset of the end of this CLP feature.
+ */
+static size_t vfio_pci_zdev_add_util(struct zpci_dev *zdev, void *region,
+ size_t offset)
+{
+ struct vfio_region_zpci_info_util *clp;
+ size_t size = CLP_UTIL_STR_LEN;
+
+ /* Only add a utility string if one is available */
+ if (!zdev->util_avail)
+ return offset;
+
+ /* Jump to the CLP region via the offset */
+ clp = (struct vfio_region_zpci_info_util *) (region + offset);
+
+ /* Fill in the clp header */
+ vfio_pci_zdev_fill_hdr(&clp->hdr, VFIO_REGION_ZPCI_INFO_UTIL, 1,
+ offset, sizeof(*clp) + size);
+
+ /* Fill in the CLP feature info */
+ clp->size = size;
+ memcpy(clp->util_str, zdev->util_str, size);
+
+ /* Return offset to the end of this CLP feature */
+ return clp->hdr.next;
+}
+
+/*
+ * Add the function path string to the device region.
+ *
+ * zdev - the zPCI device to get information from
+ * region - start of the vfio device region
+ * offset - location within region to place the data
+ *
+ * On return, provide the offset of the end of this CLP feature.
+ */
+static size_t vfio_pci_zdev_add_pfip(struct zpci_dev *zdev, void *region,
+ size_t offset)
+{
+ struct vfio_region_zpci_info_pfip *clp;
+ size_t size = CLP_PFIP_NR_SEGMENTS;
+
+ /* Jump to the CLP region via the offset */
+ clp = (struct vfio_region_zpci_info_pfip *) (region + offset);
+
+ /* Fill in the clp header */
+ vfio_pci_zdev_fill_hdr(&clp->hdr, VFIO_REGION_ZPCI_INFO_PFIP, 1,
+ offset, sizeof(*clp) + size);
+
+ /* Fill in the CLP feature info */
+ clp->size = size;
+ memcpy(clp->pfip, zdev->pfip, size);
+
+ /* Return offset to the end of this CLP feature */
+ return clp->hdr.next;
+}
+
+int vfio_pci_zdev_init(struct vfio_pci_device *vdev)
+{
+ struct vfio_region_zpci_info *region;
+ struct zpci_dev *zdev;
+ size_t clp_offset;
+ int size;
+ int ret;
+
+ if (!vdev->pdev->bus)
+ return -ENODEV;
+
+ zdev = to_zpci(vdev->pdev);
+ if (!zdev)
+ return -ENODEV;
+
+ /* Calculate size needed for all supported CLP features */
+ size = sizeof(*region) +
+ sizeof(struct vfio_region_zpci_info_qpci) +
+ sizeof(struct vfio_region_zpci_info_qpcifg) +
+ (sizeof(struct vfio_region_zpci_info_util) + CLP_UTIL_STR_LEN) +
+ (sizeof(struct vfio_region_zpci_info_pfip) +
+ CLP_PFIP_NR_SEGMENTS);
+
+ region = kmalloc(size, GFP_KERNEL);
+ if (!region)
+ return -ENOMEM;
+
+ /* Fill in header */
+ region->argsz = size;
+ clp_offset = region->offset = sizeof(struct vfio_region_zpci_info);
+
+ /* Fill the supported CLP features */
+ clp_offset = vfio_pci_zdev_add_qpci(zdev, region, clp_offset);
+ clp_offset = vfio_pci_zdev_add_qpcifg(zdev, region, clp_offset);
+ clp_offset = vfio_pci_zdev_add_util(zdev, region, clp_offset);
+ clp_offset = vfio_pci_zdev_add_pfip(zdev, region, clp_offset);
+
+ ret = vfio_pci_register_dev_region(vdev,
+ PCI_VENDOR_ID_IBM | VFIO_REGION_TYPE_PCI_VENDOR_TYPE,
+ VFIO_REGION_SUBTYPE_IBM_ZPCI_CLP, &vfio_pci_zdev_regops,
+ size, VFIO_REGION_INFO_FLAG_READ, region);
+ if (ret)
+ kfree(region);
+
+ return ret;
+}
--
1.8.3.1

2020-09-19 15:31:28

by Matthew Rosato

[permalink] [raw]
Subject: [PATCH 2/4] s390/pci: track whether util_str is valid in the zpci_dev

We'll need to keep track of whether or not the byte string in util_str is
valid and thus needs to be passed to a vfio-pci passthrough device.

Signed-off-by: Matthew Rosato <[email protected]>
---
arch/s390/include/asm/pci.h | 3 ++-
arch/s390/pci/pci_clp.c | 1 +
2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
index 882e233..32eb975 100644
--- a/arch/s390/include/asm/pci.h
+++ b/arch/s390/include/asm/pci.h
@@ -132,7 +132,8 @@ struct zpci_dev {
u8 rid_available : 1;
u8 has_hp_slot : 1;
u8 is_physfn : 1;
- u8 reserved : 5;
+ u8 util_avail : 1;
+ u8 reserved : 4;
unsigned int devfn; /* DEVFN part of the RID*/

struct mutex lock;
diff --git a/arch/s390/pci/pci_clp.c b/arch/s390/pci/pci_clp.c
index 48bf316..d011134 100644
--- a/arch/s390/pci/pci_clp.c
+++ b/arch/s390/pci/pci_clp.c
@@ -168,6 +168,7 @@ static int clp_store_query_pci_fn(struct zpci_dev *zdev,
if (response->util_str_avail) {
memcpy(zdev->util_str, response->util_str,
sizeof(zdev->util_str));
+ zdev->util_avail = 1;
}
zdev->mio_capable = response->mio_addr_avail;
for (i = 0; i < PCI_STD_NUM_BARS; i++) {
--
1.8.3.1

2020-09-19 15:55:54

by Matthew Rosato

[permalink] [raw]
Subject: Re: [PATCH 0/4] Pass zPCI hardware information via VFIO

On 9/19/20 11:28 AM, Matthew Rosato wrote:
> This patchset provides a means by which hardware information about the
> underlying PCI device can be passed up to userspace (ie, QEMU) so that
> this hardware information can be used rather than previously hard-coded
> assumptions. A new VFIO region type is defined which holds this
> information.
>
> A form of these patches saw some rounds last year but has been back-
> tabled for a while. The original work for this feature was done by Pierre
> Morel. I'd like to refresh the discussion on this and get this finished up
> so that we can move forward with better-supporting additional types of
> PCI-attached devices. The proposal here presents a completely different
> region mapping vs the prior approach, taking inspiration from vfio info
> capability chains to provide device CLP information in a way that allows
> for future expansion (new CLP features).
>
> This feature is toggled via the CONFIG_VFIO_PCI_ZDEV configuration entry.

QEMU patchset that exploits the new region:
https://lists.gnu.org/archive/html/qemu-devel/2020-09/msg07076.html

>
> Matthew Rosato (4):
> s390/pci: stash version in the zpci_dev
> s390/pci: track whether util_str is valid in the zpci_dev
> vfio-pci/zdev: define the vfio_zdev header
> vfio-pci/zdev: use a device region to retrieve zPCI information
>
> arch/s390/include/asm/pci.h | 4 +-
> arch/s390/pci/pci_clp.c | 2 +
> drivers/vfio/pci/Kconfig | 13 ++
> drivers/vfio/pci/Makefile | 1 +
> drivers/vfio/pci/vfio_pci.c | 8 ++
> drivers/vfio/pci/vfio_pci_private.h | 10 ++
> drivers/vfio/pci/vfio_pci_zdev.c | 242 ++++++++++++++++++++++++++++++++++++
> include/uapi/linux/vfio.h | 5 +
> include/uapi/linux/vfio_zdev.h | 116 +++++++++++++++++
> 9 files changed, 400 insertions(+), 1 deletion(-)
> create mode 100644 drivers/vfio/pci/vfio_pci_zdev.c
> create mode 100644 include/uapi/linux/vfio_zdev.h
>

2020-09-21 09:39:07

by Niklas Schnelle

[permalink] [raw]
Subject: Re: [PATCH 1/4] s390/pci: stash version in the zpci_dev

Hi Matthew,

On 9/19/20 5:28 PM, Matthew Rosato wrote:
> In preparation for passing the info on to vfio-pci devices, stash the
> supported PCI version for the target device in the zpci_dev.
>
> Signed-off-by: Matthew Rosato <[email protected]>

Acked-by: Niklas Schnelle <[email protected]>

> ---
> arch/s390/include/asm/pci.h | 1 +
> arch/s390/pci/pci_clp.c | 1 +
> 2 files changed, 2 insertions(+)
>
> diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
> index 99b92c3..882e233 100644
> --- a/arch/s390/include/asm/pci.h
> +++ b/arch/s390/include/asm/pci.h
> @@ -179,6 +179,7 @@ struct zpci_dev {
> atomic64_t mapped_pages;
> atomic64_t unmapped_pages;
>
> + u8 version;
> enum pci_bus_speed max_bus_speed;
>
> struct dentry *debugfs_dev;
> diff --git a/arch/s390/pci/pci_clp.c b/arch/s390/pci/pci_clp.c
> index 7e735f4..48bf316 100644
> --- a/arch/s390/pci/pci_clp.c
> +++ b/arch/s390/pci/pci_clp.c
> @@ -102,6 +102,7 @@ static void clp_store_query_pci_fngrp(struct zpci_dev *zdev,
> zdev->msi_addr = response->msia;
> zdev->max_msi = response->noi;
> zdev->fmb_update = response->mui;
> + zdev->version = response->version;
>
> switch (response->version) {
> case 1:
>

2020-09-21 09:39:32

by Christian Borntraeger

[permalink] [raw]
Subject: Re: [PATCH 1/4] s390/pci: stash version in the zpci_dev

On 19.09.20 17:28, Matthew Rosato wrote:
> In preparation for passing the info on to vfio-pci devices, stash the
> supported PCI version for the target device in the zpci_dev.
>
> Signed-off-by: Matthew Rosato <[email protected]>

In case this will go via the vfio tree, with my s390 kernel maintainer hat on:

Acked-by: Christian Borntraeger <[email protected]>
> ---
> arch/s390/include/asm/pci.h | 1 +
> arch/s390/pci/pci_clp.c | 1 +
> 2 files changed, 2 insertions(+)
>
> diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
> index 99b92c3..882e233 100644
> --- a/arch/s390/include/asm/pci.h
> +++ b/arch/s390/include/asm/pci.h
> @@ -179,6 +179,7 @@ struct zpci_dev {
> atomic64_t mapped_pages;
> atomic64_t unmapped_pages;
>
> + u8 version;
> enum pci_bus_speed max_bus_speed;
>
> struct dentry *debugfs_dev;
> diff --git a/arch/s390/pci/pci_clp.c b/arch/s390/pci/pci_clp.c
> index 7e735f4..48bf316 100644
> --- a/arch/s390/pci/pci_clp.c
> +++ b/arch/s390/pci/pci_clp.c
> @@ -102,6 +102,7 @@ static void clp_store_query_pci_fngrp(struct zpci_dev *zdev,
> zdev->msi_addr = response->msia;
> zdev->max_msi = response->noi;
> zdev->fmb_update = response->mui;
> + zdev->version = response->version;
>
> switch (response->version) {
> case 1:
>

2020-09-21 09:40:38

by Christian Borntraeger

[permalink] [raw]
Subject: Re: [PATCH 2/4] s390/pci: track whether util_str is valid in the zpci_dev



On 19.09.20 17:28, Matthew Rosato wrote:
> We'll need to keep track of whether or not the byte string in util_str is
> valid and thus needs to be passed to a vfio-pci passthrough device.
>
> Signed-off-by: Matthew Rosato <[email protected]>

If this goes via the vfio tree,

Acked-by: Christian Borntraeger <[email protected]>

(please wait for an ack from Niklas Schnelle as well)
> ---
> arch/s390/include/asm/pci.h | 3 ++-
> arch/s390/pci/pci_clp.c | 1 +
> 2 files changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
> index 882e233..32eb975 100644
> --- a/arch/s390/include/asm/pci.h
> +++ b/arch/s390/include/asm/pci.h
> @@ -132,7 +132,8 @@ struct zpci_dev {
> u8 rid_available : 1;
> u8 has_hp_slot : 1;
> u8 is_physfn : 1;
> - u8 reserved : 5;
> + u8 util_avail : 1;
> + u8 reserved : 4;
> unsigned int devfn; /* DEVFN part of the RID*/
>
> struct mutex lock;
> diff --git a/arch/s390/pci/pci_clp.c b/arch/s390/pci/pci_clp.c
> index 48bf316..d011134 100644
> --- a/arch/s390/pci/pci_clp.c
> +++ b/arch/s390/pci/pci_clp.c
> @@ -168,6 +168,7 @@ static int clp_store_query_pci_fn(struct zpci_dev *zdev,
> if (response->util_str_avail) {
> memcpy(zdev->util_str, response->util_str,
> sizeof(zdev->util_str));
> + zdev->util_avail = 1;
> }
> zdev->mio_capable = response->mio_addr_avail;
> for (i = 0; i < PCI_STD_NUM_BARS; i++) {
>

2020-09-21 09:42:44

by Niklas Schnelle

[permalink] [raw]
Subject: Re: [PATCH 2/4] s390/pci: track whether util_str is valid in the zpci_dev

Hi Matthew,

On 9/19/20 5:28 PM, Matthew Rosato wrote:
> We'll need to keep track of whether or not the byte string in util_str is
> valid and thus needs to be passed to a vfio-pci passthrough device.
>
> Signed-off-by: Matthew Rosato <[email protected]>
> ---
> arch/s390/include/asm/pci.h | 3 ++-
> arch/s390/pci/pci_clp.c | 1 +
> 2 files changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
> index 882e233..32eb975 100644
> --- a/arch/s390/include/asm/pci.h
> +++ b/arch/s390/include/asm/pci.h
> @@ -132,7 +132,8 @@ struct zpci_dev {
> u8 rid_available : 1;
> u8 has_hp_slot : 1;
> u8 is_physfn : 1;
> - u8 reserved : 5;
> + u8 util_avail : 1;

Any reason you're not matching the util_str_avail name in the response struct?
I think this is currently always an EBCDIC encoded string so the information that
even if it looks like binary for anyone with a non-mainframe background
it is in fact a string seems quite helpful.
Other than that

Acked-by: Niklas Schnelle <[email protected]>

> + u8 reserved : 4;
> unsigned int devfn; /* DEVFN part of the RID*/
>
> struct mutex lock;
> diff --git a/arch/s390/pci/pci_clp.c b/arch/s390/pci/pci_clp.c
> index 48bf316..d011134 100644
> --- a/arch/s390/pci/pci_clp.c
> +++ b/arch/s390/pci/pci_clp.c
> @@ -168,6 +168,7 @@ static int clp_store_query_pci_fn(struct zpci_dev *zdev,
> if (response->util_str_avail) {
> memcpy(zdev->util_str, response->util_str,
> sizeof(zdev->util_str));
> + zdev->util_avail = 1;
> }
> zdev->mio_capable = response->mio_addr_avail;
> for (i = 0; i < PCI_STD_NUM_BARS; i++) {
>

2020-09-21 15:03:44

by Cornelia Huck

[permalink] [raw]
Subject: Re: [PATCH 1/4] s390/pci: stash version in the zpci_dev

On Sat, 19 Sep 2020 11:28:35 -0400
Matthew Rosato <[email protected]> wrote:

> In preparation for passing the info on to vfio-pci devices, stash the
> supported PCI version for the target device in the zpci_dev.

Hm, what kind of version is that? The version of the zPCI interface?

Inquiring minds want to know :)

>
> Signed-off-by: Matthew Rosato <[email protected]>
> ---
> arch/s390/include/asm/pci.h | 1 +
> arch/s390/pci/pci_clp.c | 1 +
> 2 files changed, 2 insertions(+)

2020-09-21 15:46:07

by Matthew Rosato

[permalink] [raw]
Subject: Re: [PATCH 1/4] s390/pci: stash version in the zpci_dev

On 9/21/20 11:01 AM, Cornelia Huck wrote:
> On Sat, 19 Sep 2020 11:28:35 -0400
> Matthew Rosato <[email protected]> wrote:
>
>> In preparation for passing the info on to vfio-pci devices, stash the
>> supported PCI version for the target device in the zpci_dev.
>
> Hm, what kind of version is that? The version of the zPCI interface?
>
> Inquiring minds want to know :)
>

Ha :) It's related to PCI-SIG spec versions and which one the zPCI
facility supports for this device.

>>
>> Signed-off-by: Matthew Rosato <[email protected]>
>> ---
>> arch/s390/include/asm/pci.h | 1 +
>> arch/s390/pci/pci_clp.c | 1 +
>> 2 files changed, 2 insertions(+)
>

2020-09-21 15:51:34

by Cornelia Huck

[permalink] [raw]
Subject: Re: [PATCH 1/4] s390/pci: stash version in the zpci_dev

On Mon, 21 Sep 2020 11:44:20 -0400
Matthew Rosato <[email protected]> wrote:

> On 9/21/20 11:01 AM, Cornelia Huck wrote:
> > On Sat, 19 Sep 2020 11:28:35 -0400
> > Matthew Rosato <[email protected]> wrote:
> >
> >> In preparation for passing the info on to vfio-pci devices, stash the
> >> supported PCI version for the target device in the zpci_dev.
> >
> > Hm, what kind of version is that? The version of the zPCI interface?
> >
> > Inquiring minds want to know :)
> >
>
> Ha :) It's related to PCI-SIG spec versions and which one the zPCI
> facility supports for this device.

Thanks for the info :)

>
> >>
> >> Signed-off-by: Matthew Rosato <[email protected]>
> >> ---
> >> arch/s390/include/asm/pci.h | 1 +
> >> arch/s390/pci/pci_clp.c | 1 +
> >> 2 files changed, 2 insertions(+)
> >
>

FWIW,

Acked-by: Cornelia Huck <[email protected]>

2020-09-22 10:56:03

by Cornelia Huck

[permalink] [raw]
Subject: Re: [PATCH 3/4] vfio-pci/zdev: define the vfio_zdev header

On Sat, 19 Sep 2020 11:28:37 -0400
Matthew Rosato <[email protected]> wrote:

> We define a new device region in vfio.h to be able to get the ZPCI CLP
> information by reading this region from userspace.
>
> We create a new file, vfio_zdev.h to define the structure of the new
> region defined in vfio.h
>
> Signed-off-by: Matthew Rosato <[email protected]>
> ---
> include/uapi/linux/vfio.h | 5 ++
> include/uapi/linux/vfio_zdev.h | 116 +++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 121 insertions(+)
> create mode 100644 include/uapi/linux/vfio_zdev.h
>
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 9204705..65eb367 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -326,6 +326,11 @@ struct vfio_region_info_cap_type {
> * to do TLB invalidation on a GPU.
> */
> #define VFIO_REGION_SUBTYPE_IBM_NVLINK2_ATSD (1)
> +/*
> + * IBM zPCI specific hardware feature information for a devcie. The contents
> + * of this region are mapped by struct vfio_region_zpci_info.
> + */
> +#define VFIO_REGION_SUBTYPE_IBM_ZPCI_CLP (2)

This is not really for a 10de vendor, but for all pci devices accessed
via zpci, isn't it?

We obviously want to avoid collisions here; not really sure how to
cover all possible vendors. Maybe just pick a high number?

>
> /* sub-types for VFIO_REGION_TYPE_GFX */
> #define VFIO_REGION_SUBTYPE_GFX_EDID (1)
> diff --git a/include/uapi/linux/vfio_zdev.h b/include/uapi/linux/vfio_zdev.h
> new file mode 100644
> index 0000000..c9e4891
> --- /dev/null
> +++ b/include/uapi/linux/vfio_zdev.h
> @@ -0,0 +1,116 @@
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +/*
> + * Region definition for ZPCI devices
> + *
> + * Copyright IBM Corp. 2020
> + *
> + * Author(s): Pierre Morel <[email protected]>
> + * Matthew Rosato <[email protected]>
> + */
> +
> +#ifndef _VFIO_ZDEV_H_
> +#define _VFIO_ZDEV_H_
> +
> +#include <linux/types.h>
> +
> +/**
> + * struct vfio_region_zpci_info - ZPCI information
> + *
> + * This region provides zPCI specific hardware feature information for a
> + * device.
> + *
> + * The ZPCI information structure is presented as a chain of CLP features

"CLP features" == "features returned by the CLP instruction", I guess?
Maybe mention that explicitly?

> + * defined below. argsz provides the size of the entire region, and offset
> + * provides the location of the first CLP feature in the chain.
> + *
> + */
> +struct vfio_region_zpci_info {
> + __u32 argsz; /* Size of entire payload */
> + __u32 offset; /* Location of first entry */
> +} __packed;

This '__packed' annotation seems redundant. I think that all of these
structures should be defined in a way that packing is unneeded (which
seems to be the case on a quick browse.)

> +
> +/**
> + * struct vfio_region_zpci_info_hdr - ZPCI header information
> + *
> + * This structure is included at the top of each CLP feature to define what
> + * type of CLP feature is presented / the structure version. The next value
> + * defines the offset of the next CLP feature, and is an offset from the very
> + * beginning of the region (vfio_region_zpci_info).
> + *
> + * Each CLP feature must have it's own unique 'id'.

s/it's/its/

Is the 'id' something that is already provided by the CLP instruction?

> + */
> +struct vfio_region_zpci_info_hdr {
> + __u16 id; /* Identifies the CLP type */
> + __u16 version; /* version of the CLP data */
> + __u32 next; /* Offset of next entry */
> +} __packed;
> +
> +/**
> + * struct vfio_region_zpci_info_qpci - Initial Query PCI information
> + *
> + * This region provides an initial set of data from the Query PCI Function

What does 'initial' mean in this context? Information you get for a
freshly initialized function?

> + * CLP.
> + */
> +#define VFIO_REGION_ZPCI_INFO_QPCI 1
> +
> +struct vfio_region_zpci_info_qpci {
> + struct vfio_region_zpci_info_hdr hdr;
> + __u64 start_dma; /* Start of available DMA addresses */
> + __u64 end_dma; /* End of available DMA addresses */
> + __u16 pchid; /* Physical Channel ID */
> + __u16 vfn; /* Virtual function number */
> + __u16 fmb_length; /* Measurement Block Length (in bytes) */
> + __u8 pft; /* PCI Function Type */
> + __u8 gid; /* PCI function group ID */
> +} __packed;
> +
> +
> +/**
> + * struct vfio_region_zpci_info_qpcifg - Initial Query PCI Function Group info
> + *
> + * This region provides an initial set of data from the Query PCI Function
> + * Group CLP.
> + */
> +#define VFIO_REGION_ZPCI_INFO_QPCIFG 2
> +
> +struct vfio_region_zpci_info_qpcifg {
> + struct vfio_region_zpci_info_hdr hdr;
> + __u64 dasm; /* DMA Address space mask */
> + __u64 msi_addr; /* MSI address */
> + __u64 flags;
> +#define VFIO_PCI_ZDEV_FLAGS_REFRESH 1 /* Use program-specified TLB refresh */
> + __u16 mui; /* Measurement Block Update Interval */
> + __u16 noi; /* Maximum number of MSIs */
> + __u16 maxstbl; /* Maximum Store Block Length */
> + __u8 version; /* Supported PCI Version */
> +} __packed;
> +
> +/**
> + * struct vfio_region_zpci_info_util - Utility String
> + *
> + * This region provides the utility string for the associated device, which is
> + * a device identifier string.

Is there an upper boundary for this string?

Is this a classic NUL-terminated string, or a list of EBCDIC characters?

> + */
> +#define VFIO_REGION_ZPCI_INFO_UTIL 3
> +
> +struct vfio_region_zpci_info_util {
> + struct vfio_region_zpci_info_hdr hdr;
> + __u32 size;
> + __u8 util_str[];
> +} __packed;
> +
> +/**
> + * struct vfio_region_zpci_info_pfip - PCI Function Path
> + *
> + * This region provides the PCI function path string, which is an identifier
> + * that describes the internal hardware path of the device.

Same question here.

> + */
> +#define VFIO_REGION_ZPCI_INFO_PFIP 4
> +
> +struct vfio_region_zpci_info_pfip {
> +struct vfio_region_zpci_info_hdr hdr;
> + __u32 size;
> + __u8 pfip[];
> +} __packed;
> +
> +#endif

2020-09-22 11:24:27

by Cornelia Huck

[permalink] [raw]
Subject: Re: [PATCH 4/4] vfio-pci/zdev: use a device region to retrieve zPCI information

On Sat, 19 Sep 2020 11:28:38 -0400
Matthew Rosato <[email protected]> wrote:

> Define a new configuration entry VFIO_PCI_ZDEV for VFIO/PCI.
>
> When this s390-only feature is configured we initialize a new device
> region, VFIO_REGION_SUBTYPE_IBM_ZPCI_CLP, to hold information provided
> by the underlying hardware.
>
> This patch is based on work previously done by Pierre Morel.
>
> Signed-off-by: Matthew Rosato <[email protected]>
> ---
> drivers/vfio/pci/Kconfig | 13 ++
> drivers/vfio/pci/Makefile | 1 +
> drivers/vfio/pci/vfio_pci.c | 8 ++
> drivers/vfio/pci/vfio_pci_private.h | 10 ++
> drivers/vfio/pci/vfio_pci_zdev.c | 242 ++++++++++++++++++++++++++++++++++++

Maybe you want to add yourself to MAINTAINERS for the zdev-specific
files? You're probably better suited to review changes to the
zpci-specific code :)

> 5 files changed, 274 insertions(+)
> create mode 100644 drivers/vfio/pci/vfio_pci_zdev.c

(...)

> +int vfio_pci_zdev_init(struct vfio_pci_device *vdev)
> +{
> + struct vfio_region_zpci_info *region;
> + struct zpci_dev *zdev;
> + size_t clp_offset;
> + int size;
> + int ret;
> +
> + if (!vdev->pdev->bus)
> + return -ENODEV;
> +
> + zdev = to_zpci(vdev->pdev);
> + if (!zdev)
> + return -ENODEV;
> +
> + /* Calculate size needed for all supported CLP features */
> + size = sizeof(*region) +
> + sizeof(struct vfio_region_zpci_info_qpci) +
> + sizeof(struct vfio_region_zpci_info_qpcifg) +
> + (sizeof(struct vfio_region_zpci_info_util) + CLP_UTIL_STR_LEN) +
> + (sizeof(struct vfio_region_zpci_info_pfip) +
> + CLP_PFIP_NR_SEGMENTS);
> +
> + region = kmalloc(size, GFP_KERNEL);
> + if (!region)
> + return -ENOMEM;
> +
> + /* Fill in header */
> + region->argsz = size;
> + clp_offset = region->offset = sizeof(struct vfio_region_zpci_info);
> +
> + /* Fill the supported CLP features */
> + clp_offset = vfio_pci_zdev_add_qpci(zdev, region, clp_offset);
> + clp_offset = vfio_pci_zdev_add_qpcifg(zdev, region, clp_offset);
> + clp_offset = vfio_pci_zdev_add_util(zdev, region, clp_offset);
> + clp_offset = vfio_pci_zdev_add_pfip(zdev, region, clp_offset);

So, the regions are populated once. Can any of the values in the
hardware structures be modified by a guest? Or changed from the
hardware side?

> +
> + ret = vfio_pci_register_dev_region(vdev,
> + PCI_VENDOR_ID_IBM | VFIO_REGION_TYPE_PCI_VENDOR_TYPE,
> + VFIO_REGION_SUBTYPE_IBM_ZPCI_CLP, &vfio_pci_zdev_regops,
> + size, VFIO_REGION_INFO_FLAG_READ, region);
> + if (ret)
> + kfree(region);
> +
> + return ret;
> +}

2020-09-22 13:57:49

by Matthew Rosato

[permalink] [raw]
Subject: Re: [PATCH 3/4] vfio-pci/zdev: define the vfio_zdev header

On 9/22/20 6:54 AM, Cornelia Huck wrote:
> On Sat, 19 Sep 2020 11:28:37 -0400
> Matthew Rosato <[email protected]> wrote:
>
>> We define a new device region in vfio.h to be able to get the ZPCI CLP
>> information by reading this region from userspace.
>>
>> We create a new file, vfio_zdev.h to define the structure of the new
>> region defined in vfio.h
>>
>> Signed-off-by: Matthew Rosato <[email protected]>
>> ---
>> include/uapi/linux/vfio.h | 5 ++
>> include/uapi/linux/vfio_zdev.h | 116 +++++++++++++++++++++++++++++++++++++++++
>> 2 files changed, 121 insertions(+)
>> create mode 100644 include/uapi/linux/vfio_zdev.h
>>
>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>> index 9204705..65eb367 100644
>> --- a/include/uapi/linux/vfio.h
>> +++ b/include/uapi/linux/vfio.h
>> @@ -326,6 +326,11 @@ struct vfio_region_info_cap_type {
>> * to do TLB invalidation on a GPU.
>> */
>> #define VFIO_REGION_SUBTYPE_IBM_NVLINK2_ATSD (1)
>> +/*
>> + * IBM zPCI specific hardware feature information for a devcie. The contents
>> + * of this region are mapped by struct vfio_region_zpci_info.
>> + */
>> +#define VFIO_REGION_SUBTYPE_IBM_ZPCI_CLP (2)
>
> This is not really for a 10de vendor, but for all pci devices accessed
> via zpci, isn't it?
s/10de/1014/ (10de is the set of regions prior to this one)

1014 == PCI_VENDOR_ID_IBM

But yes, this region is intended to be assigned to all pci devices
accessed thru zpci. But the next patch always assigns the region to the
zpci device using type 1014 subtype 2 (and userspace always searches
using that pair) -- So it should always be unique as I understand it
unless someone re-defines another type 1014 subtype 2?

> We obviously want to avoid collisions here; not really sure how to
> cover all possible vendors. Maybe just pick a high number?
>

I don't think this is necessary unless I'm misunderstanding something.

>>
>> /* sub-types for VFIO_REGION_TYPE_GFX */
>> #define VFIO_REGION_SUBTYPE_GFX_EDID (1)
>> diff --git a/include/uapi/linux/vfio_zdev.h b/include/uapi/linux/vfio_zdev.h
>> new file mode 100644
>> index 0000000..c9e4891
>> --- /dev/null
>> +++ b/include/uapi/linux/vfio_zdev.h
>> @@ -0,0 +1,116 @@
>> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
>> +/*
>> + * Region definition for ZPCI devices
>> + *
>> + * Copyright IBM Corp. 2020
>> + *
>> + * Author(s): Pierre Morel <[email protected]>
>> + * Matthew Rosato <[email protected]>
>> + */
>> +
>> +#ifndef _VFIO_ZDEV_H_
>> +#define _VFIO_ZDEV_H_
>> +
>> +#include <linux/types.h>
>> +
>> +/**
>> + * struct vfio_region_zpci_info - ZPCI information
>> + *
>> + * This region provides zPCI specific hardware feature information for a
>> + * device.
>> + *
>> + * The ZPCI information structure is presented as a chain of CLP features
>
> "CLP features" == "features returned by the CLP instruction", I guess?
> Maybe mention that explicitly?

Yes, that's correct. I'm trying to clarify that these things aren't a
1:1 relationship to a CLP instruction payload.

>
>> + * defined below. argsz provides the size of the entire region, and offset
>> + * provides the location of the first CLP feature in the chain.
>> + *
>> + */
>> +struct vfio_region_zpci_info {
>> + __u32 argsz; /* Size of entire payload */
>> + __u32 offset; /* Location of first entry */
>> +} __packed;
>
> This '__packed' annotation seems redundant. I think that all of these
> structures should be defined in a way that packing is unneeded (which
> seems to be the case on a quick browse.)
>
OK, I'll double-check and remove the __packed annotation.

>> +
>> +/**
>> + * struct vfio_region_zpci_info_hdr - ZPCI header information
>> + *
>> + * This structure is included at the top of each CLP feature to define what
>> + * type of CLP feature is presented / the structure version. The next value
>> + * defines the offset of the next CLP feature, and is an offset from the very
>> + * beginning of the region (vfio_region_zpci_info).
>> + *
>> + * Each CLP feature must have it's own unique 'id'.
>
> s/it's/its/
>
> Is the 'id' something that is already provided by the CLP instruction?
>

No, these IDs correspond only to the API for the vfio region and don't
directly relate to which CLP instruction they are associated with. The
term 'CLP feature' was intended to abstract these structures from
individual CLP instructions.

So, it might help to explain the design a bit here -- The CLP
instructions each return a specific, hardware-architected payload. But
we're not sending the entirety of that payload to the guest, rather
identifying a subset to forward via the vfio region. Currently, I've
sub-divided it as follows:

1) query pci info we currently care about
2) query pci fg info we currently care about
3) utility string
4) function path

This was done in such a way that, when we need to add further CLP
information to this region (ex: new device support, new zpci feature
support, etc) we can do so by adding new 'CLP features' to the region.
Those 'CLP features' could be additional parts of the query pci CLP,
additional parts of the query pci fg CLP, or parts of some other CLP.
Technically, #3 and #4 are part of query pci info, but the nature of the
way they are sized made it more convenient to make them separate features.

Userspace can then scan the region only for the 'CLP features' it
understands (or is enabled for) and pick only those (and use defaults
and/or turn support off for 'CLP features' it cannot find but expected to).

>> + */
>> +struct vfio_region_zpci_info_hdr {
>> + __u16 id; /* Identifies the CLP type */
>> + __u16 version; /* version of the CLP data */
>> + __u32 next; /* Offset of next entry */
>> +} __packed;
>> +
>> +/**
>> + * struct vfio_region_zpci_info_qpci - Initial Query PCI information
>> + *
>> + * This region provides an initial set of data from the Query PCI Function
>
> What does 'initial' mean in this context? Information you get for a
> freshly initialized function?
>

So this goes back to my statement above about 'query pci info we
currently care about' - It's not the entire query pci payload and I was
trying to avoid implying it was to prevent future confusion. So
'initial' is more in a sense of 'what we initially care to send to
userspace.'

But really, the vfio region API doesn't care which CLP the info came
from / where userspace is planning to stick these fields -- Perhaps I
should drop 'initial' and re-phrase without mentioning the CLP itself.
This feature is providing basic descriptive information about the
device, so maybe something like "Base zPCI device information"

>> + * CLP.
>> + */
>> +#define VFIO_REGION_ZPCI_INFO_QPCI 1
>> +
>> +struct vfio_region_zpci_info_qpci {
>> + struct vfio_region_zpci_info_hdr hdr;
>> + __u64 start_dma; /* Start of available DMA addresses */
>> + __u64 end_dma; /* End of available DMA addresses */
>> + __u16 pchid; /* Physical Channel ID */
>> + __u16 vfn; /* Virtual function number */
>> + __u16 fmb_length; /* Measurement Block Length (in bytes) */
>> + __u8 pft; /* PCI Function Type */
>> + __u8 gid; /* PCI function group ID */
>> +} __packed;
>> +
>> +
>> +/**
>> + * struct vfio_region_zpci_info_qpcifg - Initial Query PCI Function Group info
>> + *
>> + * This region provides an initial set of data from the Query PCI Function
>> + * Group CLP.
>> + */

And the same thing here -- It's the subset of query pci fg info we
currently care about -- So I can rename and drop the 'Initial' bit.
Something like "Base zPCI group information"

>> +#define VFIO_REGION_ZPCI_INFO_QPCIFG 2
>> +
>> +struct vfio_region_zpci_info_qpcifg {
>> + struct vfio_region_zpci_info_hdr hdr;
>> + __u64 dasm; /* DMA Address space mask */
>> + __u64 msi_addr; /* MSI address */
>> + __u64 flags;
>> +#define VFIO_PCI_ZDEV_FLAGS_REFRESH 1 /* Use program-specified TLB refresh */
>> + __u16 mui; /* Measurement Block Update Interval */
>> + __u16 noi; /* Maximum number of MSIs */
>> + __u16 maxstbl; /* Maximum Store Block Length */
>> + __u8 version; /* Supported PCI Version */
>> +} __packed;
>> +
>> +/**
>> + * struct vfio_region_zpci_info_util - Utility String
>> + *
>> + * This region provides the utility string for the associated device, which is
>> + * a device identifier string.
>
> Is there an upper boundary for this string?
>
> Is this a classic NUL-terminated string, or a list of EBCDIC characters?
>

EBCDIC characters.

So, there is indeed an upper-boundary for the string, CLP_UTIL_STR_LEN.
It's coming from a hardware-architected field and shouldn't change size,
but we send the length anyway so that the API can act independent of the
CLP hardware region. So the expectation is that userspace (qemu) would
compare the provided size of the util_str with what it expects the CLP
hardware payload to look like -- If it's too big, userspace can't use
that string to properly emulate the CLP response so it would have to
ignore this feature and use defaults.

>> + */
>> +#define VFIO_REGION_ZPCI_INFO_UTIL 3
>> +
>> +struct vfio_region_zpci_info_util {
>> + struct vfio_region_zpci_info_hdr hdr;
>> + __u32 size;
>> + __u8 util_str[];
>> +} __packed;
>> +
>> +/**
>> + * struct vfio_region_zpci_info_pfip - PCI Function Path
>> + *
>> + * This region provides the PCI function path string, which is an identifier
>> + * that describes the internal hardware path of the device.
>
> Same question here.

Hex string bounded by CLP_PFIP_NR_SEGMENTS and again coming from a
hardware-architected field that shouldn't change -- the rest of my
answer from above applies here too.

>
>> + */
>> +#define VFIO_REGION_ZPCI_INFO_PFIP 4
>> +
>> +struct vfio_region_zpci_info_pfip {
>> +struct vfio_region_zpci_info_hdr hdr;
>> + __u32 size;
>> + __u8 pfip[];
>> +} __packed;
>> +
>> +#endif
>

2020-09-22 14:04:09

by Matthew Rosato

[permalink] [raw]
Subject: Re: [PATCH 4/4] vfio-pci/zdev: use a device region to retrieve zPCI information

On 9/22/20 7:22 AM, Cornelia Huck wrote:
> On Sat, 19 Sep 2020 11:28:38 -0400
> Matthew Rosato <[email protected]> wrote:
>
>> Define a new configuration entry VFIO_PCI_ZDEV for VFIO/PCI.
>>
>> When this s390-only feature is configured we initialize a new device
>> region, VFIO_REGION_SUBTYPE_IBM_ZPCI_CLP, to hold information provided
>> by the underlying hardware.
>>
>> This patch is based on work previously done by Pierre Morel.
>>
>> Signed-off-by: Matthew Rosato <[email protected]>
>> ---
>> drivers/vfio/pci/Kconfig | 13 ++
>> drivers/vfio/pci/Makefile | 1 +
>> drivers/vfio/pci/vfio_pci.c | 8 ++
>> drivers/vfio/pci/vfio_pci_private.h | 10 ++
>> drivers/vfio/pci/vfio_pci_zdev.c | 242 ++++++++++++++++++++++++++++++++++++
>
> Maybe you want to add yourself to MAINTAINERS for the zdev-specific
> files? You're probably better suited to review changes to the
> zpci-specific code :)

Of course, will do. Looking at how we split vfio-ap and vfio-ccw, I'll
add an S390 VFIO-PCI DRIVER category and point to the new .h and .c file
for now.

>
>> 5 files changed, 274 insertions(+)
>> create mode 100644 drivers/vfio/pci/vfio_pci_zdev.c
>
> (...)
>
>> +int vfio_pci_zdev_init(struct vfio_pci_device *vdev)
>> +{
>> + struct vfio_region_zpci_info *region;
>> + struct zpci_dev *zdev;
>> + size_t clp_offset;
>> + int size;
>> + int ret;
>> +
>> + if (!vdev->pdev->bus)
>> + return -ENODEV;
>> +
>> + zdev = to_zpci(vdev->pdev);
>> + if (!zdev)
>> + return -ENODEV;
>> +
>> + /* Calculate size needed for all supported CLP features */
>> + size = sizeof(*region) +
>> + sizeof(struct vfio_region_zpci_info_qpci) +
>> + sizeof(struct vfio_region_zpci_info_qpcifg) +
>> + (sizeof(struct vfio_region_zpci_info_util) + CLP_UTIL_STR_LEN) +
>> + (sizeof(struct vfio_region_zpci_info_pfip) +
>> + CLP_PFIP_NR_SEGMENTS);
>> +
>> + region = kmalloc(size, GFP_KERNEL);
>> + if (!region)
>> + return -ENOMEM;
>> +
>> + /* Fill in header */
>> + region->argsz = size;
>> + clp_offset = region->offset = sizeof(struct vfio_region_zpci_info);
>> +
>> + /* Fill the supported CLP features */
>> + clp_offset = vfio_pci_zdev_add_qpci(zdev, region, clp_offset);
>> + clp_offset = vfio_pci_zdev_add_qpcifg(zdev, region, clp_offset);
>> + clp_offset = vfio_pci_zdev_add_util(zdev, region, clp_offset);
>> + clp_offset = vfio_pci_zdev_add_pfip(zdev, region, clp_offset);
>
> So, the regions are populated once. Can any of the values in the
> hardware structures be modified by a guest? Or changed from the
> hardware side?
>

The region is created read-only (vfio_pci_zdev_rw returns -EINVAL on
writes), so no guest modification. The expectation is the guest can
read the region and take what it wants / ignore what it doesn't.

The CLPs covered by this region are not intended to change once the
device is up.

If we end up needing either of the above for a different CLP, I would
think an additional/different region would be appropriate.


>> +
>> + ret = vfio_pci_register_dev_region(vdev,
>> + PCI_VENDOR_ID_IBM | VFIO_REGION_TYPE_PCI_VENDOR_TYPE,
>> + VFIO_REGION_SUBTYPE_IBM_ZPCI_CLP, &vfio_pci_zdev_regops,
>> + size, VFIO_REGION_INFO_FLAG_READ, region);
>> + if (ret)
>> + kfree(region);
>> +
>> + return ret;
>> +}
>

2020-09-22 14:08:37

by Matthew Rosato

[permalink] [raw]
Subject: Re: [PATCH 2/4] s390/pci: track whether util_str is valid in the zpci_dev

On 9/21/20 5:41 AM, Niklas Schnelle wrote:
> Hi Matthew,
>
> On 9/19/20 5:28 PM, Matthew Rosato wrote:
>> We'll need to keep track of whether or not the byte string in util_str is
>> valid and thus needs to be passed to a vfio-pci passthrough device.
>>
>> Signed-off-by: Matthew Rosato <[email protected]>
>> ---
>> arch/s390/include/asm/pci.h | 3 ++-
>> arch/s390/pci/pci_clp.c | 1 +
>> 2 files changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
>> index 882e233..32eb975 100644
>> --- a/arch/s390/include/asm/pci.h
>> +++ b/arch/s390/include/asm/pci.h
>> @@ -132,7 +132,8 @@ struct zpci_dev {
>> u8 rid_available : 1;
>> u8 has_hp_slot : 1;
>> u8 is_physfn : 1;
>> - u8 reserved : 5;
>> + u8 util_avail : 1;
>
> Any reason you're not matching the util_str_avail name in the response struct? > I think this is currently always an EBCDIC encoded string so the
information that
> even if it looks like binary for anyone with a non-mainframe background
> it is in fact a string seems quite helpful.

Frankly, the dropping of 'str_' was arbitrary on my part -- I'll go
ahead and rename it to util_str_avail with v2.

> Other than that
>
> Acked-by: Niklas Schnelle <[email protected]>
>

Thanks!

>> + u8 reserved : 4;
>> unsigned int devfn; /* DEVFN part of the RID*/
>>
>> struct mutex lock;
>> diff --git a/arch/s390/pci/pci_clp.c b/arch/s390/pci/pci_clp.c
>> index 48bf316..d011134 100644
>> --- a/arch/s390/pci/pci_clp.c
>> +++ b/arch/s390/pci/pci_clp.c
>> @@ -168,6 +168,7 @@ static int clp_store_query_pci_fn(struct zpci_dev *zdev,
>> if (response->util_str_avail) {
>> memcpy(zdev->util_str, response->util_str,
>> sizeof(zdev->util_str));
>> + zdev->util_avail = 1;
>> }
>> zdev->mio_capable = response->mio_addr_avail;
>> for (i = 0; i < PCI_STD_NUM_BARS; i++) {
>>