2022-03-15 08:45:22

by Matthew Rosato

Subject: [PATCH v4 00/32] KVM: s390: enable zPCI for interpretive execution

Note: A few patches in this series are dependent on Baolu's IOMMU domain ops
split, which is currently in the next branch of linux-iommu. This series
applies on top:
https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git

Enable interpretive execution of zPCI instructions + adapter interruption
forwarding for s390x KVM vfio-pci. This is done by introducing a new IOMMU
domain for s390x (KVM-managed), indicating via vfio that this IOMMU domain
should be used instead of the default, with subsequent management of the
hardware assists being handled via a new KVM ioctl for zPCI management.

By allowing interpretation of zPCI instructions and firmware delivery of
interrupts to guests, we can significantly reduce the frequency of guest
SIE exits for zPCI. We then see additional gains by handling a hot-path
instruction that can still intercept to the hypervisor (RPCIT) directly
in KVM via the new IOMMU domain, whose map operations update the host
DMA table with pinned guest entries over the specified range.

From the perspective of guest configuration, you pass through zPCI devices
in the same manner as before, with interpretation support being used by
default if available in both the kernel and QEMU.

Will reply with a link to the associated QEMU series.

Changelog v3->v4:
v3: https://lore.kernel.org/kvm/[email protected]/
- Significant overhaul of the userspace API. Remove all vfio device
feature ioctls. Remove CONFIG_VFIO_PCI_ZDEV; this code is once again always
built with vfio-pci for s390, so IS_ENABLED checks can instead look at
CONFIG_VFIO_PCI. Most earlier patches in the series could keep their
reviews, but some reviews had to be dropped due to required code
changes.
- Instead use a KVM ioctl for zPCI management. The API is very similar
to the feature ioctls used in the prior series, with an additional step
to create an association between an IOMMU domain + KVM + zPCI device.
- Introduce a new IOMMU domain ops type for s390-iommu, to be used when
the IOMMU is managed by KVM rather than in response to VFIO mapping ioctls.
- Add an IOMMU method for specifying the type of domain to allocate.
- Add a new type to vfio_iommu_type1 (KVM-owned) to trigger the allocation
of the KVM-owned IOMMU domain when zPCI interpretation is requested.
In this case, the KVM-owned type is specified on VFIO_SET_IOMMU.
- Wire the RPCIT intercepts into the new IOMMU domain via the kernel
IOMMU API.
- Remove a bunch of unnecessary symbol externs and make the associated
functions static.
- Now that we keep a list of zPCI devices associated with a given KVM, we
can do the fh lookup on this list instead of the list of all zPCI devices
on the host. A host-wide fh lookup is only needed during the initial
device<->KVM association.


Matthew Rosato (32):
s390/sclp: detect the zPCI load/store interpretation facility
s390/sclp: detect the AISII facility
s390/sclp: detect the AENI facility
s390/sclp: detect the AISI facility
s390/airq: pass more TPI info to airq handlers
s390/airq: allow for airq structure that uses an input vector
s390/pci: externalize the SIC operation controls and routine
s390/pci: stash associated GISA designation
s390/pci: export some routines related to RPCIT processing
s390/pci: stash dtsm and maxstbl
s390/pci: add helper function to find device by handle
s390/pci: get SHM information from list pci
s390/pci: return status from zpci_refresh_trans
iommu: introduce iommu_domain_alloc_type and the KVM type
vfio: introduce KVM-owned IOMMU type
vfio-pci/zdev: add function handle to clp base capability
KVM: s390: pci: add basic kvm_zdev structure
iommu/s390: add support for IOMMU_DOMAIN_KVM
KVM: s390: pci: do initial setup for AEN interpretation
KVM: s390: pci: enable host forwarding of Adapter Event Notifications
KVM: s390: mechanism to enable guest zPCI Interpretation
KVM: s390: pci: routines for (dis)associating zPCI devices with a KVM
KVM: s390: pci: provide routines for enabling/disabling interpretation
KVM: s390: pci: provide routines for enabling/disabling interrupt forwarding
KVM: s390: pci: provide routines for enabling/disabling IOAT assist
KVM: s390: pci: handle refresh of PCI translations
KVM: s390: intercept the rpcit instruction
KVM: s390: add KVM_S390_ZPCI_OP to manage guest zPCI devices
vfio-pci/zdev: add DTSM to clp group capability
KVM: s390: introduce CPU feature for zPCI Interpretation
MAINTAINERS: additional files related kvm s390 pci passthrough
MAINTAINERS: update s390 IOMMU entry

Documentation/virt/kvm/api.rst | 60 +++
MAINTAINERS | 4 +-
arch/s390/include/asm/airq.h | 7 +-
arch/s390/include/asm/kvm_host.h | 7 +
arch/s390/include/asm/kvm_pci.h | 40 ++
arch/s390/include/asm/pci.h | 12 +
arch/s390/include/asm/pci_clp.h | 11 +-
arch/s390/include/asm/pci_dma.h | 3 +
arch/s390/include/asm/pci_insn.h | 31 +-
arch/s390/include/asm/sclp.h | 4 +
arch/s390/include/asm/tpi.h | 13 +
arch/s390/include/uapi/asm/kvm.h | 1 +
arch/s390/kvm/Makefile | 1 +
arch/s390/kvm/interrupt.c | 95 +++-
arch/s390/kvm/kvm-s390.c | 90 +++-
arch/s390/kvm/kvm-s390.h | 10 +
arch/s390/kvm/pci.c | 833 +++++++++++++++++++++++++++++++
arch/s390/kvm/pci.h | 63 +++
arch/s390/kvm/priv.c | 46 ++
arch/s390/pci/pci.c | 31 ++
arch/s390/pci/pci_clp.c | 28 +-
arch/s390/pci/pci_dma.c | 7 +-
arch/s390/pci/pci_insn.c | 15 +-
arch/s390/pci/pci_irq.c | 48 +-
drivers/iommu/Kconfig | 8 +
drivers/iommu/Makefile | 1 +
drivers/iommu/iommu.c | 7 +
drivers/iommu/s390-iommu.c | 53 +-
drivers/iommu/s390-iommu.h | 53 ++
drivers/iommu/s390-kvm-iommu.c | 469 +++++++++++++++++
drivers/s390/char/sclp_early.c | 4 +
drivers/s390/cio/airq.c | 12 +-
drivers/s390/cio/qdio_thinint.c | 6 +-
drivers/s390/crypto/ap_bus.c | 9 +-
drivers/s390/virtio/virtio_ccw.c | 6 +-
drivers/vfio/pci/vfio_pci_zdev.c | 17 +-
drivers/vfio/vfio_iommu_type1.c | 12 +-
include/linux/iommu.h | 12 +
include/uapi/linux/kvm.h | 43 ++
include/uapi/linux/vfio.h | 6 +
include/uapi/linux/vfio_zdev.h | 6 +
41 files changed, 2090 insertions(+), 94 deletions(-)
create mode 100644 arch/s390/include/asm/kvm_pci.h
create mode 100644 arch/s390/kvm/pci.c
create mode 100644 arch/s390/kvm/pci.h
create mode 100644 drivers/iommu/s390-iommu.h
create mode 100644 drivers/iommu/s390-kvm-iommu.c

--
2.27.0


2022-03-15 12:24:11

by Matthew Rosato

Subject: [PATCH v4 28/32] KVM: s390: add KVM_S390_ZPCI_OP to manage guest zPCI devices

The KVM_S390_ZPCI_OP ioctl provides a series of operations that
can be invoked to manage hardware-assisted virtualization features
for s390x PCI passthrough.

Signed-off-by: Matthew Rosato <[email protected]>
---
Documentation/virt/kvm/api.rst | 60 ++++++++++++++++++++++++++
arch/s390/kvm/kvm-s390.c | 26 ++++++++++++
arch/s390/kvm/pci.c | 77 ++++++++++++++++++++++++++++++++++
arch/s390/kvm/pci.h | 3 +-
include/uapi/linux/kvm.h | 43 +++++++++++++++++++
5 files changed, 208 insertions(+), 1 deletion(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 9f3172376ec3..c642ff891cf2 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -5574,6 +5574,66 @@ enabled with ``arch_prctl()``, but this may change in the future.
The offsets of the state save areas in struct kvm_xsave follow the contents
of CPUID leaf 0xD on the host.

+4.134 KVM_S390_ZPCI_OP
+----------------------
+
+:Capability: KVM_CAP_S390_ZPCI_OP
+:Architectures: s390
+:Type: vm ioctl
+:Parameters: struct kvm_s390_zpci_op (in, out)
+:Returns: 0 on success, <0 on error
+
+Used to manage hardware-assisted virtualization features for zPCI devices.
+
+Parameters are specified via the following structure::
+
+ struct kvm_s390_zpci_op {
+ /* in */
+ __u32 fh; /* target device */
+ __u8 op; /* operation to perform */
+ __u8 pad[3];
+ union {
+ /* for KVM_S390_ZPCIOP_REG_INT */
+ struct {
+ __u64 ibv; /* Guest addr of interrupt bit vector */
+ __u64 sb; /* Guest addr of summary bit */
+ __u32 flags;
+ __u32 noi; /* Number of interrupts */
+ __u8 isc; /* Guest interrupt subclass */
+ __u8 sbo; /* Offset of guest summary bit vector */
+ __u16 pad;
+ } reg_int;
+ /* for KVM_S390_ZPCIOP_REG_IOAT */
+ struct {
+ __u64 iota; /* I/O Translation settings */
+ } reg_ioat;
+ __u8 reserved[64];
+ } u;
+ /* out */
+ __u32 newfh; /* updated device handle */
+ };
+
+The type of operation is specified in the "op" field.
+KVM_S390_ZPCIOP_INIT is used to associate a zPCI function with this VM.
+Conversely, KVM_S390_ZPCIOP_END is used to terminate that association.
+KVM_S390_ZPCIOP_START_INTERP is used to enable interpretive execution
+for the specified zPCI function for this VM; KVM_S390_ZPCIOP_STOP_INTERP
+is used to subsequently disable interpretive execution.
+KVM_S390_ZPCIOP_REG_INT is used to register the VM for adapter interruption
+forwarding, which will allow firmware delivery of interrupts directly to
+the VM, with KVM providing a backup delivery mechanism;
+KVM_S390_ZPCIOP_DEREG_INT is used to subsequently disable interrupt forwarding.
+KVM_S390_ZPCIOP_REG_IOAT is used to enable KVM-managed IOMMU ops to begin
+synchronizing guest and host DMA tables; KVM_S390_ZPCIOP_DEREG_IOAT is used
+to subsequently disable IOMMU mapping.
+
+The target zPCI function must also be specified via the "fh" field. For the
+KVM_S390_ZPCIOP_REG_INT operation, additional information to establish the
+interrupt forwarding must be provided via the "reg_int" struct. For the
+KVM_S390_ZPCIOP_REG_IOAT operation, guest table format and location must be
+specified via the "reg_ioat" struct.
+
+The "reserved" field is meant for future extensions.

5. The kvm_run structure
========================
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 84acaf59a7d3..613101ba29be 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -616,6 +616,15 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_S390_PROTECTED:
r = is_prot_virt_host();
break;
+ case KVM_CAP_S390_ZPCI_OP:
+ if (IS_ENABLED(CONFIG_S390_KVM_IOMMU) && test_facility(69) &&
+ test_facility(70) && test_facility(71) &&
+ test_facility(72)) {
+ r = 1;
+ } else {
+ r = 0;
+ }
+ break;
default:
r = 0;
}
@@ -2532,6 +2541,23 @@ long kvm_arch_vm_ioctl(struct file *filp,
}
break;
}
+ case KVM_S390_ZPCI_OP: {
+ struct kvm_s390_zpci_op args;
+
+ r = -EINVAL;
+ if (!IS_ENABLED(CONFIG_VFIO_PCI))
+ break;
+ if (copy_from_user(&args, argp, sizeof(args))) {
+ r = -EFAULT;
+ break;
+ }
+ r = kvm_s390_pci_zpci_op(kvm, &args);
+ if (r)
+ break;
+ if (copy_to_user(argp, &args, sizeof(args)))
+ r = -EFAULT;
+ break;
+ }
default:
r = -ENOTTY;
}
diff --git a/arch/s390/kvm/pci.c b/arch/s390/kvm/pci.c
index 40d2fadbfbd5..15b581915cd7 100644
--- a/arch/s390/kvm/pci.c
+++ b/arch/s390/kvm/pci.c
@@ -739,6 +739,83 @@ void kvm_s390_pci_clear_list(struct kvm *kvm)
}
}

+static int kvm_s390_pci_zpci_reg_int(struct zpci_dev *zdev,
+ struct kvm_s390_zpci_op *args)
+{
+ struct zpci_fib fib = {};
+
+ fib.fmt0.aibv = args->u.reg_int.ibv;
+ fib.fmt0.isc = args->u.reg_int.isc;
+ fib.fmt0.noi = args->u.reg_int.noi;
+ if (args->u.reg_int.sb != 0) {
+ fib.fmt0.aisb = args->u.reg_int.sb;
+ fib.fmt0.aisbo = args->u.reg_int.sbo;
+ fib.fmt0.sum = 1;
+ } else {
+ fib.fmt0.aisb = 0;
+ fib.fmt0.aisbo = 0;
+ fib.fmt0.sum = 0;
+ }
+
+ if (args->u.reg_int.flags & KVM_S390_ZPCIOP_REGINT_HOST)
+ return kvm_s390_pci_aif_enable(zdev, &fib, true);
+ else
+ return kvm_s390_pci_aif_enable(zdev, &fib, false);
+}
+
+int kvm_s390_pci_zpci_op(struct kvm *kvm, struct kvm_s390_zpci_op *args)
+{
+ struct kvm_zdev *kzdev;
+ struct zpci_dev *zdev;
+ int r;
+
+ if (args->op == KVM_S390_ZPCIOP_INIT) {
+ zdev = get_zdev_by_fh(args->fh);
+ if (!zdev)
+ return -ENODEV;
+ } else {
+ kzdev = get_kzdev_by_fh(kvm, args->fh);
+ if (!kzdev || !kzdev->zdev)
+ return -ENODEV;
+ zdev = kzdev->zdev;
+ }
+
+ switch (args->op) {
+ case KVM_S390_ZPCIOP_INIT:
+ r = kvm_s390_pci_zpci_start(kvm, zdev);
+ break;
+ case KVM_S390_ZPCIOP_END:
+ r = kvm_s390_pci_zpci_stop(kvm, zdev);
+ break;
+ case KVM_S390_ZPCIOP_START_INTERP:
+ r = kvm_s390_pci_interp_enable(zdev);
+ break;
+ case KVM_S390_ZPCIOP_STOP_INTERP:
+ r = kvm_s390_pci_interp_disable(zdev, false);
+ break;
+ case KVM_S390_ZPCIOP_REG_INT:
+ r = kvm_s390_pci_zpci_reg_int(zdev, args);
+ break;
+ case KVM_S390_ZPCIOP_DEREG_INT:
+ r = kvm_s390_pci_aif_disable(zdev, false);
+ break;
+ case KVM_S390_ZPCIOP_REG_IOAT:
+ r = kvm_s390_pci_ioat_enable(zdev, args->u.reg_ioat.iota);
+ break;
+ case KVM_S390_ZPCIOP_DEREG_IOAT:
+ r = kvm_s390_pci_ioat_disable(zdev);
+ break;
+ default:
+ r = -EINVAL;
+ }
+
+ /* On success, always return the current host function handle */
+ if (r == 0)
+ args->newfh = zdev->fh;
+
+ return r;
+}
+
int kvm_s390_pci_init(void)
{
int rc;
diff --git a/arch/s390/kvm/pci.h b/arch/s390/kvm/pci.h
index 2cb1b27396c1..c30b0bacca00 100644
--- a/arch/s390/kvm/pci.h
+++ b/arch/s390/kvm/pci.h
@@ -12,6 +12,7 @@

#include <linux/pci.h>
#include <linux/mutex.h>
+#include <linux/kvm.h>
#include <linux/kvm_host.h>
#include <asm/airq.h>
#include <asm/kvm_pci.h>
@@ -56,7 +57,7 @@ int kvm_s390_pci_zpci_start(struct kvm *kvm, struct zpci_dev *zdev);
int kvm_s390_pci_zpci_stop(struct kvm *kvm, struct zpci_dev *zdev);
void kvm_s390_pci_init_list(struct kvm *kvm);
void kvm_s390_pci_clear_list(struct kvm *kvm);
-
+int kvm_s390_pci_zpci_op(struct kvm *kvm, struct kvm_s390_zpci_op *args);
int kvm_s390_pci_init(void);

#endif /* __KVM_S390_PCI_H */
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 507ee1f2aa96..be8693ccc833 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1135,6 +1135,7 @@ struct kvm_ppc_resize_hpt {
#define KVM_CAP_XSAVE2 208
#define KVM_CAP_SYS_ATTRIBUTES 209
#define KVM_CAP_PPC_AIL_MODE_3 210
+#define KVM_CAP_S390_ZPCI_OP 211

#ifdef KVM_CAP_IRQ_ROUTING

@@ -2049,4 +2050,46 @@ struct kvm_stats_desc {
/* Available with KVM_CAP_XSAVE2 */
#define KVM_GET_XSAVE2 _IOR(KVMIO, 0xcf, struct kvm_xsave)

+/* Available with KVM_CAP_S390_ZPCI_OP */
+#define KVM_S390_ZPCI_OP _IOWR(KVMIO, 0xd0, struct kvm_s390_zpci_op)
+
+struct kvm_s390_zpci_op {
+ /* in */
+ __u32 fh; /* target device */
+ __u8 op; /* operation to perform */
+ __u8 pad[3];
+ union {
+ /* for KVM_S390_ZPCIOP_REG_INT */
+ struct {
+ __u64 ibv; /* Guest addr of interrupt bit vector */
+ __u64 sb; /* Guest addr of summary bit */
+ __u32 flags;
+ __u32 noi; /* Number of interrupts */
+ __u8 isc; /* Guest interrupt subclass */
+ __u8 sbo; /* Offset of guest summary bit vector */
+ __u16 pad;
+ } reg_int;
+ /* for KVM_S390_ZPCIOP_REG_IOAT */
+ struct {
+ __u64 iota; /* I/O Translation settings */
+ } reg_ioat;
+ __u8 reserved[64];
+ } u;
+ /* out */
+ __u32 newfh; /* updated device handle */
+};
+
+/* types for kvm_s390_zpci_op->op */
+#define KVM_S390_ZPCIOP_INIT 0
+#define KVM_S390_ZPCIOP_END 1
+#define KVM_S390_ZPCIOP_START_INTERP 2
+#define KVM_S390_ZPCIOP_STOP_INTERP 3
+#define KVM_S390_ZPCIOP_REG_INT 4
+#define KVM_S390_ZPCIOP_DEREG_INT 5
+#define KVM_S390_ZPCIOP_REG_IOAT 6
+#define KVM_S390_ZPCIOP_DEREG_IOAT 7
+
+/* flags for kvm_s390_zpci_op->u.reg_int.flags */
+#define KVM_S390_ZPCIOP_REGINT_HOST (1 << 0)
+
#endif /* __LINUX_KVM_H */
--
2.27.0
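A minimal userspace sketch of how a VMM might fill in the request for two of the operations documented above. It mirrors the v4 structure and op values locally (as struct zpci_op_v4 and the ZPCIOP_* enum, both hypothetical names) instead of including <linux/kvm.h>, because the uapi in a given kernel's headers may not match this revision. The helpers zpci_op_prepare(), zpci_op_reg_int() and zpci_op_reg_ioat() are illustrative only, and the ioctl call itself is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the v4 uapi layout; deliberately not <linux/kvm.h>. */
struct zpci_op_v4 {
	uint32_t fh;            /* in: target device handle */
	uint8_t op;             /* in: operation to perform */
	uint8_t pad[3];
	union {
		struct {
			uint64_t ibv;   /* guest addr of interrupt bit vector */
			uint64_t sb;    /* guest addr of summary bit, 0 = none */
			uint32_t flags;
			uint32_t noi;   /* number of interrupts */
			uint8_t isc;    /* guest interrupt subclass */
			uint8_t sbo;    /* summary bit offset */
			uint16_t pad;
		} reg_int;
		struct {
			uint64_t iota;  /* I/O translation settings */
		} reg_ioat;
		uint8_t reserved[64];
	} u;
	uint32_t newfh;         /* out: updated host function handle */
};

/* Layout checks that hold for the structure as posted. */
_Static_assert(offsetof(struct zpci_op_v4, u) == 8, "union offset");
_Static_assert(offsetof(struct zpci_op_v4, newfh) == 72, "newfh offset");

/* Op values as defined in this patch. */
enum {
	ZPCIOP_INIT = 0, ZPCIOP_END = 1,
	ZPCIOP_START_INTERP = 2, ZPCIOP_STOP_INTERP = 3,
	ZPCIOP_REG_INT = 4, ZPCIOP_DEREG_INT = 5,
	ZPCIOP_REG_IOAT = 6, ZPCIOP_DEREG_IOAT = 7,
};

/* Hypothetical helper: zero the request so pad and reserved bytes are clean. */
static void zpci_op_prepare(struct zpci_op_v4 *req, uint32_t fh, uint8_t op)
{
	memset(req, 0, sizeof(*req));
	req->fh = fh;
	req->op = op;
}

/* Hypothetical helper for KVM_S390_ZPCIOP_REG_INT. */
static void zpci_op_reg_int(struct zpci_op_v4 *req, uint32_t fh, uint64_t ibv,
			    uint64_t sb, uint8_t sbo, uint32_t noi, uint8_t isc)
{
	zpci_op_prepare(req, fh, ZPCIOP_REG_INT);
	req->u.reg_int.ibv = ibv;
	req->u.reg_int.sb = sb;   /* the kernel disables the summary bit if 0 */
	req->u.reg_int.sbo = sbo;
	req->u.reg_int.noi = noi;
	req->u.reg_int.isc = isc;
}

/* Hypothetical helper for KVM_S390_ZPCIOP_REG_IOAT. */
static void zpci_op_reg_ioat(struct zpci_op_v4 *req, uint32_t fh, uint64_t iota)
{
	zpci_op_prepare(req, fh, ZPCIOP_REG_IOAT);
	req->u.reg_ioat.iota = iota;
}
```

In a VMM, each prepared request would then be passed to ioctl() on the VM fd with KVM_S390_ZPCI_OP. Because the kernel writes the current host function handle to newfh after every successful operation, a caller would presumably carry newfh forward as the fh of its next request (for example INIT, then START_INTERP, REG_INT and REG_IOAT).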

2022-03-15 12:24:52

by Matthew Rosato

Subject: [PATCH v4 12/32] s390/pci: get SHM information from list pci

KVM will need information on the special handle mask (SHM) used to
indicate emulated devices. Obtaining it requires a new type of List PCI
call, so extend clp_list_pci_req to also fetch the model-dependent data
(MDD) field that holds this mask.

Reviewed-by: Niklas Schnelle <[email protected]>
Acked-by: Pierre Morel <[email protected]>
Signed-off-by: Matthew Rosato <[email protected]>
---
arch/s390/include/asm/pci.h | 1 +
arch/s390/include/asm/pci_clp.h | 2 +-
arch/s390/pci/pci_clp.c | 25 ++++++++++++++++++++++---
3 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
index 3c0b9986dcdc..e8a3fd5bc169 100644
--- a/arch/s390/include/asm/pci.h
+++ b/arch/s390/include/asm/pci.h
@@ -227,6 +227,7 @@ int clp_enable_fh(struct zpci_dev *zdev, u32 *fh, u8 nr_dma_as);
int clp_disable_fh(struct zpci_dev *zdev, u32 *fh);
int clp_get_state(u32 fid, enum zpci_state *state);
int clp_refresh_fh(u32 fid, u32 *fh);
+int zpci_get_mdd(u32 *mdd);

/* UID */
void update_uid_checking(bool new);
diff --git a/arch/s390/include/asm/pci_clp.h b/arch/s390/include/asm/pci_clp.h
index d6189ed14f84..dc2041e97de4 100644
--- a/arch/s390/include/asm/pci_clp.h
+++ b/arch/s390/include/asm/pci_clp.h
@@ -76,7 +76,7 @@ struct clp_req_list_pci {
struct clp_rsp_list_pci {
struct clp_rsp_hdr hdr;
u64 resume_token;
- u32 reserved2;
+ u32 mdd;
u16 max_fn;
u8 : 7;
u8 uid_checking : 1;
diff --git a/arch/s390/pci/pci_clp.c b/arch/s390/pci/pci_clp.c
index dc733b58e74f..7477956be632 100644
--- a/arch/s390/pci/pci_clp.c
+++ b/arch/s390/pci/pci_clp.c
@@ -328,7 +328,7 @@ int clp_disable_fh(struct zpci_dev *zdev, u32 *fh)
}

static int clp_list_pci_req(struct clp_req_rsp_list_pci *rrb,
- u64 *resume_token, int *nentries)
+ u64 *resume_token, int *nentries, u32 *mdd)
{
int rc;

@@ -354,6 +354,8 @@ static int clp_list_pci_req(struct clp_req_rsp_list_pci *rrb,
*nentries = (rrb->response.hdr.len - LIST_PCI_HDR_LEN) /
rrb->response.entry_size;
*resume_token = rrb->response.resume_token;
+ if (mdd)
+ *mdd = rrb->response.mdd;

return rc;
}
@@ -365,7 +367,7 @@ static int clp_list_pci(struct clp_req_rsp_list_pci *rrb, void *data,
int nentries, i, rc;

do {
- rc = clp_list_pci_req(rrb, &resume_token, &nentries);
+ rc = clp_list_pci_req(rrb, &resume_token, &nentries, NULL);
if (rc)
return rc;
for (i = 0; i < nentries; i++)
@@ -383,7 +385,7 @@ static int clp_find_pci(struct clp_req_rsp_list_pci *rrb, u32 fid,
int nentries, i, rc;

do {
- rc = clp_list_pci_req(rrb, &resume_token, &nentries);
+ rc = clp_list_pci_req(rrb, &resume_token, &nentries, NULL);
if (rc)
return rc;
fh_list = rrb->response.fh_list;
@@ -468,6 +470,23 @@ int clp_get_state(u32 fid, enum zpci_state *state)
return rc;
}

+int zpci_get_mdd(u32 *mdd)
+{
+ struct clp_req_rsp_list_pci *rrb;
+ u64 resume_token = 0;
+ int nentries, rc;
+
+ rrb = clp_alloc_block(GFP_KERNEL);
+ if (!rrb)
+ return -ENOMEM;
+
+ rc = clp_list_pci_req(rrb, &resume_token, &nentries, mdd);
+
+ clp_free_block(rrb);
+ return rc;
+}
+EXPORT_SYMBOL_GPL(zpci_get_mdd);
+
static int clp_base_slpc(struct clp_req *req, struct clp_req_rsp_slpc *lpcb)
{
unsigned long limit = PAGE_SIZE - sizeof(lpcb->request);
--
2.27.0

2022-03-15 17:55:23

by Matthew Rosato

Subject: [PATCH v4 26/32] KVM: s390: pci: handle refresh of PCI translations

Add a routine that shadows the guest IOAT into the host DMA tables.
A subsequent patch will invoke it in response to an RPCIT instruction
intercept (intercept code 0x04).

Signed-off-by: Matthew Rosato <[email protected]>
---
arch/s390/include/asm/kvm_pci.h | 1 +
arch/s390/kvm/pci.c | 31 ++++++++++++++++++++++++++++++-
arch/s390/kvm/pci.h | 3 +++
3 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/arch/s390/include/asm/kvm_pci.h b/arch/s390/include/asm/kvm_pci.h
index e27dbede723c..9578b5dafb45 100644
--- a/arch/s390/include/asm/kvm_pci.h
+++ b/arch/s390/include/asm/kvm_pci.h
@@ -25,6 +25,7 @@ struct kvm_zdev {
struct zpci_fib fib;
struct notifier_block nb;
struct list_head entry;
+ u64 rpcit_count;
};

int kvm_s390_pci_dev_open(struct zpci_dev *zdev);
diff --git a/arch/s390/kvm/pci.c b/arch/s390/kvm/pci.c
index 1a8b82220b29..40d2fadbfbd5 100644
--- a/arch/s390/kvm/pci.c
+++ b/arch/s390/kvm/pci.c
@@ -8,6 +8,7 @@
*/

#include <linux/kvm_host.h>
+#include <linux/iommu.h>
#include <linux/pci.h>
#include <linux/vfio.h>
#include <asm/kvm_pci.h>
@@ -173,6 +174,30 @@ int kvm_s390_pci_aen_init(u8 nisc)
return rc;
}

+int kvm_s390_pci_refresh_trans(struct kvm_vcpu *vcpu, unsigned long req,
+ unsigned long start, unsigned long size)
+{
+ struct kvm_zdev *kzdev;
+ u32 fh = req >> 32;
+ int rc;
+
+ /* Make sure this is a valid device associated with this guest */
+ kzdev = get_kzdev_by_fh(vcpu->kvm, fh);
+ if (!kzdev)
+ return -EINVAL;
+
+ /*
+ * The KVM-managed IOMMU map operation will synchronize the associated
+ * guest IOAT tables with the host DMA tables. A physical address is
+ * not specified as it will be derived from pinned guest PTEs.
+ */
+ rc = iommu_map(kzdev->dom, start, 0, size, IOMMU_WRITE | IOMMU_READ);
+
+ kzdev->rpcit_count++;
+
+ return rc;
+}
+
/* Modify PCI: Register floating adapter interruption forwarding */
static int kvm_zpci_set_airq(struct zpci_dev *zdev)
{
@@ -716,6 +741,8 @@ void kvm_s390_pci_clear_list(struct kvm *kvm)

int kvm_s390_pci_init(void)
{
+ int rc;
+
aift = kzalloc(sizeof(struct zpci_aift), GFP_KERNEL);
if (!aift)
return -ENOMEM;
@@ -723,5 +750,7 @@ int kvm_s390_pci_init(void)
spin_lock_init(&aift->gait_lock);
mutex_init(&aift->aift_lock);

- return 0;
+ rc = zpci_get_mdd(&aift->mdd);
+
+ return rc;
}
diff --git a/arch/s390/kvm/pci.h b/arch/s390/kvm/pci.h
index 867f04cae3a1..2cb1b27396c1 100644
--- a/arch/s390/kvm/pci.h
+++ b/arch/s390/kvm/pci.h
@@ -33,6 +33,7 @@ struct zpci_aift {
struct kvm_zdev **kzdev;
spinlock_t gait_lock; /* Protects the gait, used during AEN forward */
struct mutex aift_lock; /* Protects the other structures in aift */
+ u32 mdd;
};

extern struct zpci_aift *aift;
@@ -48,6 +49,8 @@ static inline struct kvm *kvm_s390_pci_si_to_kvm(struct zpci_aift *aift,

int kvm_s390_pci_aen_init(u8 nisc);
void kvm_s390_pci_aen_exit(void);
+int kvm_s390_pci_refresh_trans(struct kvm_vcpu *vcpu, unsigned long req,
+ unsigned long start, unsigned long size);

int kvm_s390_pci_zpci_start(struct kvm *kvm, struct zpci_dev *zdev);
int kvm_s390_pci_zpci_stop(struct kvm *kvm, struct zpci_dev *zdev);
--
2.27.0
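To make the handle decoding and the association check in kvm_s390_pci_refresh_trans() concrete, here is a userspace model. It assumes a hypothetical flat array of associated devices in place of kvm->arch.kzdev_list and omits the iommu_map() call entirely; only the `req >> 32` handle extraction, the -EINVAL on an unknown handle, and the rpcit_count increment are taken from the patch (which also counts the RPCIT when the map fails).

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-device record; the patch uses struct kvm_zdev. */
struct mock_kzdev {
	uint32_t fh;            /* host function handle */
	uint64_t rpcit_count;   /* mirrors kzdev->rpcit_count */
};

/* The function handle is carried in the upper 32 bits of the request. */
static uint32_t rpcit_req_to_fh(uint64_t req)
{
	return (uint32_t)(req >> 32);
}

/* Linear search standing in for get_kzdev_by_fh(). */
static struct mock_kzdev *mock_lookup(struct mock_kzdev *devs, size_t n,
				      uint32_t fh)
{
	for (size_t i = 0; i < n; i++)
		if (devs[i].fh == fh)
			return &devs[i];
	return NULL;
}

/* Models the checks in kvm_s390_pci_refresh_trans(); no DMA mapping is done. */
static int mock_refresh_trans(struct mock_kzdev *devs, size_t n, uint64_t req)
{
	struct mock_kzdev *kzdev = mock_lookup(devs, n, rpcit_req_to_fh(req));

	if (!kzdev)
		return -EINVAL; /* handle not associated with this guest */

	/* The patch calls iommu_map(kzdev->dom, start, 0, size, ...) here. */
	kzdev->rpcit_count++;
	return 0;
}
```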

2022-03-15 20:24:07

by Matthew Rosato

Subject: Re: [PATCH v4 00/32] KVM: s390: enable zPCI for interpretive execution

On 3/14/22 3:44 PM, Matthew Rosato wrote:
> Will reply with a link to the associated QEMU series.

QEMU series:
https://lore.kernel.org/kvm/[email protected]/

2022-03-15 21:15:27

by Matthew Rosato

Subject: [PATCH v4 22/32] KVM: s390: pci: routines for (dis)associating zPCI devices with a KVM

These routines will be wired into a KVM ioctl, to be issued from
userspace to (dis)associate a specific zPCI device with the issuing
KVM. This will create/delete a relationship between KVM, zPCI device
and the associated IOMMU domain for the device.

Signed-off-by: Matthew Rosato <[email protected]>
---
arch/s390/include/asm/kvm_host.h | 2 +
arch/s390/include/asm/kvm_pci.h | 2 +
arch/s390/kvm/kvm-s390.c | 5 +
arch/s390/kvm/pci.c | 225 +++++++++++++++++++++++++++++++
arch/s390/kvm/pci.h | 5 +
5 files changed, 239 insertions(+)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index bf61ab05f98c..bd171abbb8ef 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -965,6 +965,8 @@ struct kvm_arch{
DECLARE_BITMAP(idle_mask, KVM_MAX_VCPUS);
struct kvm_s390_gisa_interrupt gisa_int;
struct kvm_s390_pv pv;
+ struct list_head kzdev_list;
+ spinlock_t kzdev_list_lock;
};

#define KVM_HVA_ERR_BAD (-1UL)
diff --git a/arch/s390/include/asm/kvm_pci.h b/arch/s390/include/asm/kvm_pci.h
index ebc0da5d9ac1..47ce18b5bddd 100644
--- a/arch/s390/include/asm/kvm_pci.h
+++ b/arch/s390/include/asm/kvm_pci.h
@@ -21,6 +21,8 @@ struct kvm_zdev {
struct zpci_dev *zdev;
struct kvm *kvm;
struct iommu_domain *dom; /* Used to invoke IOMMU API for RPCIT */
+ struct notifier_block nb;
+ struct list_head entry;
};

int kvm_s390_pci_dev_open(struct zpci_dev *zdev);
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index d91b2547f0bf..84acaf59a7d3 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -2775,6 +2775,9 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)

kvm_s390_crypto_init(kvm);

+ if (IS_ENABLED(CONFIG_VFIO_PCI))
+ kvm_s390_pci_init_list(kvm);
+
mutex_init(&kvm->arch.float_int.ais_lock);
spin_lock_init(&kvm->arch.float_int.lock);
for (i = 0; i < FIRQ_LIST_COUNT; i++)
@@ -2860,6 +2863,8 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
if (!kvm_is_ucontrol(kvm))
gmap_remove(kvm->arch.gmap);
kvm_s390_destroy_adapters(kvm);
+ if (IS_ENABLED(CONFIG_VFIO_PCI))
+ kvm_s390_pci_clear_list(kvm);
kvm_s390_clear_float_irqs(kvm);
kvm_s390_vsie_destroy(kvm);
KVM_EVENT(3, "vm 0x%pK destroyed", kvm);
diff --git a/arch/s390/kvm/pci.c b/arch/s390/kvm/pci.c
index 1c42d25de697..28fe95f13c33 100644
--- a/arch/s390/kvm/pci.c
+++ b/arch/s390/kvm/pci.c
@@ -9,6 +9,7 @@

#include <linux/kvm_host.h>
#include <linux/pci.h>
+#include <linux/vfio.h>
#include <asm/kvm_pci.h>
#include <asm/pci.h>
#include <asm/pci_insn.h>
@@ -23,6 +24,22 @@ static inline int __set_irq_noiib(u16 ctl, u8 isc)
return zpci_set_irq_ctrl(ctl, isc, &iib);
}

+static struct kvm_zdev *get_kzdev_by_fh(struct kvm *kvm, u32 fh)
+{
+ struct kvm_zdev *kzdev, *retval = NULL;
+
+ spin_lock(&kvm->arch.kzdev_list_lock);
+ list_for_each_entry(kzdev, &kvm->arch.kzdev_list, entry) {
+ if (kzdev->zdev->fh == fh) {
+ retval = kzdev;
+ break;
+ }
+ }
+ spin_unlock(&kvm->arch.kzdev_list_lock);
+
+ return retval;
+}
+
/* Caller must hold the aift lock before calling this function */
void kvm_s390_pci_aen_exit(void)
{
@@ -153,6 +170,20 @@ int kvm_s390_pci_aen_init(u8 nisc)
return rc;
}

+static int kvm_s390_pci_group_notifier(struct notifier_block *nb,
+ unsigned long action, void *data)
+{
+ struct kvm_zdev *kzdev = container_of(nb, struct kvm_zdev, nb);
+
+ if (action == VFIO_GROUP_NOTIFY_SET_KVM) {
+ if (!data || !kzdev->zdev)
+ return NOTIFY_DONE;
+ kzdev->kvm = data;
+ }
+
+ return NOTIFY_OK;
+}
+
int kvm_s390_pci_dev_open(struct zpci_dev *zdev)
{
struct kvm_zdev *kzdev;
@@ -179,6 +210,200 @@ void kvm_s390_pci_dev_release(struct zpci_dev *zdev)
}
EXPORT_SYMBOL_GPL(kvm_s390_pci_dev_release);

+static struct vfio_device *get_vdev(struct device *dev)
+{
+ struct vfio_device *(*fn)(struct device *dev);
+ struct vfio_device *vdev;
+
+ fn = symbol_get(vfio_device_get_from_dev);
+ if (!fn)
+ return NULL;
+
+ vdev = fn(dev);
+
+ symbol_put(vfio_device_get_from_dev);
+
+ return vdev;
+}
+
+static void put_vdev(struct vfio_device *vdev)
+{
+ void (*fn)(struct vfio_device *vdev);
+
+ fn = symbol_get(vfio_device_put);
+ if (!fn)
+ return;
+
+ fn(vdev);
+
+ symbol_put(vfio_device_put);
+}
+
+static int register_notifier(struct device *dev, struct notifier_block *nb)
+{
+ int (*fn)(struct device *dev, enum vfio_notify_type type,
+ unsigned long *events, struct notifier_block *nb);
+ unsigned long events = VFIO_GROUP_NOTIFY_SET_KVM;
+ int rc;
+
+ fn = symbol_get(vfio_register_notifier);
+ if (!fn)
+ return -EINVAL;
+
+ rc = fn(dev, VFIO_GROUP_NOTIFY, &events, nb);
+
+ symbol_put(vfio_register_notifier);
+
+ return rc;
+}
+
+static int unregister_notifier(struct device *dev, struct notifier_block *nb)
+{
+ int (*fn)(struct device *dev, enum vfio_notify_type type,
+ struct notifier_block *nb);
+ int rc;
+
+ fn = symbol_get(vfio_unregister_notifier);
+ if (!fn)
+ return -EINVAL;
+
+ rc = fn(dev, VFIO_GROUP_NOTIFY, nb);
+
+ symbol_put(vfio_unregister_notifier);
+
+ return rc;
+}
+
+int kvm_s390_pci_zpci_start(struct kvm *kvm, struct zpci_dev *zdev)
+{
+ struct vfio_device *vdev;
+ struct pci_dev *pdev;
+ int rc;
+
+ rc = kvm_s390_pci_dev_open(zdev);
+ if (rc)
+ return rc;
+
+ pdev = pci_get_slot(zdev->zbus->bus, zdev->devfn);
+ if (!pdev) {
+ rc = -ENODEV;
+ goto exit_err;
+ }
+
+ vdev = get_vdev(&pdev->dev);
+ if (!vdev) {
+ pci_dev_put(pdev);
+ rc = -ENODEV;
+ goto exit_err;
+ }
+
+ zdev->kzdev->nb.notifier_call = kvm_s390_pci_group_notifier;
+
+ /*
+ * At this point, a KVM should already be associated with this device,
+ * so registering the notifier now should immediately trigger the
+ * event. We also want to know if the KVM association is later removed
+ * to ensure proper cleanup happens.
+ */
+ rc = register_notifier(vdev->dev, &zdev->kzdev->nb);
+
+ put_vdev(vdev);
+ pci_dev_put(pdev);
+
+ /* Make sure the registered KVM matches the KVM issuing the ioctl */
+ if (rc || zdev->kzdev->kvm != kvm) {
+ rc = -ENODEV;
+ goto exit_err;
+ }
+
+ /* Must support KVM-managed IOMMU to proceed */
+ if (IS_ENABLED(CONFIG_S390_KVM_IOMMU))
+ rc = zpci_iommu_attach_kvm(zdev, kvm);
+ else
+ rc = -EINVAL;
+
+ if (rc)
+ goto exit_err;
+
+ spin_lock(&kvm->arch.kzdev_list_lock);
+ list_add_tail(&zdev->kzdev->entry, &kvm->arch.kzdev_list);
+ spin_unlock(&kvm->arch.kzdev_list_lock);
+ return 0;
+
+exit_err:
+ kvm_s390_pci_dev_release(zdev);
+ return rc;
+}
+
+int kvm_s390_pci_zpci_stop(struct kvm *kvm, struct zpci_dev *zdev)
+{
+ struct vfio_device *vdev;
+ struct pci_dev *pdev;
+ int rc = 0;
+
+ if (!zdev || !zdev->kzdev)
+ return -EINVAL;
+
+ pdev = pci_get_slot(zdev->zbus->bus, zdev->devfn);
+ if (!pdev) {
+ rc = -ENODEV;
+ goto exit_err;
+ }
+
+ vdev = get_vdev(&pdev->dev);
+ if (!vdev) {
+ pci_dev_put(pdev);
+ rc = -ENODEV;
+ goto exit_err;
+ }
+
+ spin_lock(&kvm->arch.kzdev_list_lock);
+ list_del(&zdev->kzdev->entry);
+ spin_unlock(&kvm->arch.kzdev_list_lock);
+
+ rc = unregister_notifier(vdev->dev, &zdev->kzdev->nb);
+
+ put_vdev(vdev);
+ pci_dev_put(pdev);
+
+exit_err:
+ kvm_s390_pci_dev_release(zdev);
+ return rc;
+}
+
+void kvm_s390_pci_init_list(struct kvm *kvm)
+{
+ spin_lock_init(&kvm->arch.kzdev_list_lock);
+ INIT_LIST_HEAD(&kvm->arch.kzdev_list);
+}
+
+void kvm_s390_pci_clear_list(struct kvm *kvm)
+{
+ struct kvm_zdev *tmp, *kzdev;
+ struct vfio_device *vdev;
+ struct pci_dev *pdev;
+ LIST_HEAD(remove);
+
+ spin_lock(&kvm->arch.kzdev_list_lock);
+ list_for_each_entry_safe(kzdev, tmp, &kvm->arch.kzdev_list, entry)
+ list_move_tail(&kzdev->entry, &remove);
+ spin_unlock(&kvm->arch.kzdev_list_lock);
+
+ list_for_each_entry_safe(kzdev, tmp, &remove, entry) {
+ pdev = pci_get_slot(kzdev->zdev->zbus->bus, kzdev->zdev->devfn);
+ if (pdev) {
+ vdev = get_vdev(&pdev->dev);
+ if (vdev) {
+ unregister_notifier(vdev->dev,
+ &kzdev->nb);
+ put_vdev(vdev);
+ }
+ pci_dev_put(pdev);
+ }
+ kvm_s390_pci_dev_release(kzdev->zdev);
+ }
+}
+
int kvm_s390_pci_init(void)
{
aift = kzalloc(sizeof(struct zpci_aift), GFP_KERNEL);
diff --git a/arch/s390/kvm/pci.h b/arch/s390/kvm/pci.h
index 25cb1c787190..a95d9fdc91be 100644
--- a/arch/s390/kvm/pci.h
+++ b/arch/s390/kvm/pci.h
@@ -47,6 +47,11 @@ static inline struct kvm *kvm_s390_pci_si_to_kvm(struct zpci_aift *aift,
int kvm_s390_pci_aen_init(u8 nisc);
void kvm_s390_pci_aen_exit(void);

+int kvm_s390_pci_zpci_start(struct kvm *kvm, struct zpci_dev *zdev);
+int kvm_s390_pci_zpci_stop(struct kvm *kvm, struct zpci_dev *zdev);
+void kvm_s390_pci_init_list(struct kvm *kvm);
+void kvm_s390_pci_clear_list(struct kvm *kvm);
+
int kvm_s390_pci_init(void);

#endif /* __KVM_S390_PCI_H */
--
2.27.0

2022-03-15 21:21:41

by Matthew Rosato

Subject: [PATCH v4 14/32] iommu: introduce iommu_domain_alloc_type and the KVM type

s390x will introduce an additional domain type that is used for
managing an IOMMU owned by KVM. Define the type here and add an
interface for allocating a domain of a specified type rather than the
default type.
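The new type is composed from the existing flag bits, so it remains a paging domain (iommu_map/iommu_unmap still apply) while carrying a dedicated KVM bit that a driver can test for. The sketch below is a userspace illustration only: the bit values mirror include/linux/iommu.h as shown in the diff, and the two helper predicates are hypothetical names, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace mirror of the domain type bits in include/linux/iommu.h. */
#define __IOMMU_DOMAIN_PAGING  (1U << 0) /* Support for iommu_map/unmap */
#define __IOMMU_DOMAIN_DMA_API (1U << 1) /* Domain for use in DMA-API */
#define __IOMMU_DOMAIN_PT      (1U << 2) /* Domain is identity mapped */
#define __IOMMU_DOMAIN_DMA_FQ  (1U << 3) /* DMA-API uses flush queue */
#define __IOMMU_DOMAIN_KVM     (1U << 4) /* Domain is controlled by KVM */

#define IOMMU_DOMAIN_BLOCKED   (0U)
#define IOMMU_DOMAIN_IDENTITY  (__IOMMU_DOMAIN_PT)
#define IOMMU_DOMAIN_UNMANAGED (__IOMMU_DOMAIN_PAGING)
#define IOMMU_DOMAIN_DMA       (__IOMMU_DOMAIN_PAGING | __IOMMU_DOMAIN_DMA_API)
#define IOMMU_DOMAIN_KVM       (__IOMMU_DOMAIN_PAGING | __IOMMU_DOMAIN_KVM)

/* Hypothetical helper: the domain supports map/unmap operations. */
static bool domain_type_is_paging(unsigned int type)
{
	return (type & __IOMMU_DOMAIN_PAGING) != 0;
}

/* Hypothetical helper: the domain's mappings are managed by KVM. */
static bool domain_type_is_kvm(unsigned int type)
{
	return (type & __IOMMU_DOMAIN_KVM) != 0;
}
```

With this encoding, iommu_domain_alloc_type(bus, IOMMU_DOMAIN_KVM) simply hands the combined value to __iommu_domain_alloc(), and a driver that does not recognize the KVM bit can refuse the allocation.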

Signed-off-by: Matthew Rosato <[email protected]>
---
drivers/iommu/iommu.c | 7 +++++++
include/linux/iommu.h | 12 ++++++++++++
2 files changed, 19 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index f2c45b85b9fc..8bb57e0e3945 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1976,6 +1976,13 @@ void iommu_domain_free(struct iommu_domain *domain)
}
EXPORT_SYMBOL_GPL(iommu_domain_free);

+struct iommu_domain *iommu_domain_alloc_type(struct bus_type *bus,
+ unsigned int t)
+{
+ return __iommu_domain_alloc(bus, t);
+}
+EXPORT_SYMBOL_GPL(iommu_domain_alloc_type);
+
static int __iommu_attach_device(struct iommu_domain *domain,
struct device *dev)
{
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 9208eca4b0d1..b427bbb9f387 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -63,6 +63,7 @@ struct iommu_domain_geometry {
implementation */
#define __IOMMU_DOMAIN_PT (1U << 2) /* Domain is identity mapped */
#define __IOMMU_DOMAIN_DMA_FQ (1U << 3) /* DMA-API uses flush queue */
+#define __IOMMU_DOMAIN_KVM (1U << 4) /* Domain is controlled by KVM */

/*
* This are the possible domain-types
@@ -77,6 +78,7 @@ struct iommu_domain_geometry {
* certain optimizations for these domains
* IOMMU_DOMAIN_DMA_FQ - As above, but definitely using batched TLB
* invalidation.
+ * IOMMU_DOMAIN_KVM - DMA mappings managed by KVM, used for VMs
*/
#define IOMMU_DOMAIN_BLOCKED (0U)
#define IOMMU_DOMAIN_IDENTITY (__IOMMU_DOMAIN_PT)
@@ -86,6 +88,8 @@ struct iommu_domain_geometry {
#define IOMMU_DOMAIN_DMA_FQ (__IOMMU_DOMAIN_PAGING | \
__IOMMU_DOMAIN_DMA_API | \
__IOMMU_DOMAIN_DMA_FQ)
+#define IOMMU_DOMAIN_KVM (__IOMMU_DOMAIN_PAGING | \
+ __IOMMU_DOMAIN_KVM)

struct iommu_domain {
unsigned type;
@@ -421,6 +425,8 @@ extern bool iommu_capable(struct bus_type *bus, enum iommu_cap cap);
extern struct iommu_domain *iommu_domain_alloc(struct bus_type *bus);
extern struct iommu_group *iommu_group_get_by_id(int id);
extern void iommu_domain_free(struct iommu_domain *domain);
+extern struct iommu_domain *iommu_domain_alloc_type(struct bus_type *bus,
+ unsigned int t);
extern int iommu_attach_device(struct iommu_domain *domain,
struct device *dev);
extern void iommu_detach_device(struct iommu_domain *domain,
@@ -708,6 +714,12 @@ static inline void iommu_domain_free(struct iommu_domain *domain)
{
}

+static inline struct iommu_domain *iommu_domain_alloc_type(struct bus_type *bus,
+ unsigned int t)
+{
+ return NULL;
+}
+
static inline int iommu_attach_device(struct iommu_domain *domain,
struct device *dev)
{
--
2.27.0

2022-03-15 21:50:53

by Matthew Rosato

Subject: [PATCH v4 23/32] KVM: s390: pci: provide routines for enabling/disabling interpretation

These routines will be wired into a kvm ioctl in order to respond to
requests to enable / disable a device for zPCI Load/Store interpretation.

The first time such a request is received, enable the necessary facilities
for the guest.
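The ordering in the enable path matters: the guest designation must be validated, the function disabled if it is already enabled, the GISA stored, and only then the function re-enabled and its IOAT re-registered, with the designation rolled back on failure. The following is a minimal userspace model of that ordering, assuming a hypothetical mock_zdev struct and mock enable/disable/register helpers in place of the real zpci calls; 0 is used for "no designation" as elsewhere in the series.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the struct zpci_dev fields used here. */
struct mock_zdev {
	uint32_t gisa;          /* 0 means "no guest designation" */
	bool enabled;
	bool ioat_registered;
	bool fail_enable;       /* test hook: make the enable step fail */
};

static int mock_disable_device(struct mock_zdev *z)
{
	z->enabled = false;
	z->ioat_registered = false;
	return 0;
}

static int mock_enable_device(struct mock_zdev *z)
{
	if (z->fail_enable)
		return -EIO;
	z->enabled = true;
	return 0;
}

static int mock_register_ioat(struct mock_zdev *z)
{
	z->ioat_registered = true;
	return 0;
}

/* Follows the ordering of kvm_s390_pci_interp_enable() in the patch. */
static int interp_enable(struct mock_zdev *z, uint32_t guest_gisa)
{
	int rc;

	/* A leftover designation is only acceptable if it names this guest. */
	if (z->gisa != 0 && z->gisa != guest_gisa)
		return -EPERM;

	/* Disable first so the function can be re-enabled with the GISA. */
	if (z->enabled) {
		z->gisa = 0;
		rc = mock_disable_device(z);
		if (rc)
			return rc;
	}

	/* Set the designation before enabling so the enable request carries it. */
	z->gisa = guest_gisa;

	rc = mock_enable_device(z);
	if (!rc)
		rc = mock_register_ioat(z);
	if (rc)
		z->gisa = 0;    /* roll back the designation on any failure */
	return rc;
}
```

Re-enabling for the same guest (for example after a userspace system reset) succeeds, while a request on behalf of a different GISA is rejected without touching the device.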

Signed-off-by: Matthew Rosato <[email protected]>
---
arch/s390/kvm/pci.c | 86 +++++++++++++++++++++++++++++++++++++++++++++
arch/s390/pci/pci.c | 3 ++
2 files changed, 89 insertions(+)

diff --git a/arch/s390/kvm/pci.c b/arch/s390/kvm/pci.c
index 28fe95f13c33..df50dd6114c3 100644
--- a/arch/s390/kvm/pci.c
+++ b/arch/s390/kvm/pci.c
@@ -13,7 +13,9 @@
#include <asm/kvm_pci.h>
#include <asm/pci.h>
#include <asm/pci_insn.h>
+#include <asm/sclp.h>
#include "pci.h"
+#include "kvm-s390.h"

struct zpci_aift *aift;

@@ -170,6 +172,87 @@ int kvm_s390_pci_aen_init(u8 nisc)
return rc;
}

+static int kvm_s390_pci_interp_enable(struct zpci_dev *zdev)
+{
+ u32 gisa;
+ int rc;
+
+ if (!zdev->kzdev || !zdev->kzdev->kvm)
+ return -EINVAL;
+
+ /*
+ * If this is the first request to use an interpreted device, make the
+ * necessary vcpu changes
+ */
+ if (!zdev->kzdev->kvm->arch.use_zpci_interp)
+ kvm_s390_vcpu_pci_enable_interp(zdev->kzdev->kvm);
+
+ /*
+ * In the event of a system reset in userspace, the GISA designation
+ * may still be assigned because the device is still enabled.
+ * Verify it's the same guest before proceeding.
+ */
+ gisa = (u32)virt_to_phys(&zdev->kzdev->kvm->arch.sie_page2->gisa);
+ if (zdev->gisa != 0 && zdev->gisa != gisa)
+ return -EPERM;
+
+ if (zdev_enabled(zdev)) {
+ zdev->gisa = 0;
+ rc = zpci_disable_device(zdev);
+ if (rc)
+ return rc;
+ }
+
+ /*
+ * Store information about the identity of the kvm guest allowed to
+ * access this device via interpretation to be used by host CLP
+ */
+ zdev->gisa = gisa;
+
+ rc = zpci_enable_device(zdev);
+ if (rc)
+ goto err;
+
+ /* Re-register the IOMMU that was already created */
+ rc = zpci_register_ioat(zdev, 0, zdev->start_dma, zdev->end_dma,
+ virt_to_phys(zdev->dma_table));
+ if (rc)
+ goto err;
+
+ return rc;
+
+err:
+ zdev->gisa = 0;
+ return rc;
+}
+
+static int kvm_s390_pci_interp_disable(struct zpci_dev *zdev)
+{
+ int rc;
+
+ if (zdev->gisa == 0)
+ return -EINVAL;
+
+ /* Remove the host CLP guest designation */
+ zdev->gisa = 0;
+
+ if (zdev_enabled(zdev)) {
+ rc = zpci_disable_device(zdev);
+ if (rc)
+ return rc;
+ }
+
+ rc = zpci_enable_device(zdev);
+ if (rc)
+ return rc;
+
+ /* Re-register the IOMMU that was already created */
+ rc = zpci_register_ioat(zdev, 0, zdev->start_dma, zdev->end_dma,
+ virt_to_phys(zdev->dma_table));
+
+ return rc;
+}
+
static int kvm_s390_pci_group_notifier(struct notifier_block *nb,
unsigned long action, void *data)
{
@@ -203,6 +286,9 @@ void kvm_s390_pci_dev_release(struct zpci_dev *zdev)
{
struct kvm_zdev *kzdev;

+ if (zdev->gisa != 0)
+ kvm_s390_pci_interp_disable(zdev, true);
+
kzdev = zdev->kzdev;
WARN_ON(kzdev->zdev != zdev);
zdev->kzdev = 0;
diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
index 13033717cd4e..5dbe49ec325e 100644
--- a/arch/s390/pci/pci.c
+++ b/arch/s390/pci/pci.c
@@ -147,6 +147,7 @@ int zpci_register_ioat(struct zpci_dev *zdev, u8 dmaas,
zpci_dbg(3, "reg ioat fid:%x, cc:%d, status:%d\n", zdev->fid, cc, status);
return cc;
}
+EXPORT_SYMBOL_GPL(zpci_register_ioat);

/* Modify PCI: Unregister I/O address translation parameters */
int zpci_unregister_ioat(struct zpci_dev *zdev, u8 dmaas)
@@ -727,6 +728,7 @@ int zpci_enable_device(struct zpci_dev *zdev)
zpci_update_fh(zdev, fh);
return rc;
}
+EXPORT_SYMBOL_GPL(zpci_enable_device);

int zpci_disable_device(struct zpci_dev *zdev)
{
@@ -750,6 +752,7 @@ int zpci_disable_device(struct zpci_dev *zdev)
}
return rc;
}
+EXPORT_SYMBOL_GPL(zpci_disable_device);

/**
* zpci_hot_reset_device - perform a reset of the given zPCI function
--
2.27.0

2022-03-16 15:46:01

by Matthew Rosato

Subject: [PATCH v4 16/32] vfio-pci/zdev: add function handle to clp base capability

The function handle is a system-wide unique identifier for a zPCI
device. It is used as input for various zPCI operations.
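Because the field is appended after the version 1 layout and the capability version is bumped to 2, a consumer should only read fh when the reported version says it is present. A minimal sketch, assuming hypothetical userspace mirrors of vfio_info_cap_header and vfio_device_info_cap_zpci_base (the real definitions live in the uapi headers) and a hypothetical zpci_base_get_fh() helper:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace mirrors of the uapi layouts. */
struct cap_header {
	uint16_t id;
	uint16_t version;
	uint32_t next;
};

struct cap_zpci_base {
	struct cap_header header;
	uint64_t start_dma;
	uint64_t end_dma;
	uint16_t pchid;
	uint16_t vfn;
	uint16_t fmb_length;
	uint8_t pft;
	uint8_t gid;
	/* End of version 1 */
	uint32_t fh;
	/* End of version 2 */
};

/*
 * Only trust fields that the reported version defines: an older kernel
 * reporting version 1 does not provide fh at all.
 */
static bool zpci_base_get_fh(const struct cap_zpci_base *cap, uint32_t *fh)
{
	if (cap->header.version < 2)
		return false;
	*fh = cap->fh;
	return true;
}
```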

Signed-off-by: Matthew Rosato <[email protected]>
---
drivers/vfio/pci/vfio_pci_zdev.c | 5 +++--
include/uapi/linux/vfio_zdev.h | 3 +++
2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_zdev.c b/drivers/vfio/pci/vfio_pci_zdev.c
index ea4c0d2b0663..4a653ce480c7 100644
--- a/drivers/vfio/pci/vfio_pci_zdev.c
+++ b/drivers/vfio/pci/vfio_pci_zdev.c
@@ -23,14 +23,15 @@ static int zpci_base_cap(struct zpci_dev *zdev, struct vfio_info_cap *caps)
{
struct vfio_device_info_cap_zpci_base cap = {
.header.id = VFIO_DEVICE_INFO_CAP_ZPCI_BASE,
- .header.version = 1,
+ .header.version = 2,
.start_dma = zdev->start_dma,
.end_dma = zdev->end_dma,
.pchid = zdev->pchid,
.vfn = zdev->vfn,
.fmb_length = zdev->fmb_length,
.pft = zdev->pft,
- .gid = zdev->pfgid
+ .gid = zdev->pfgid,
+ .fh = zdev->fh
};

return vfio_info_add_capability(caps, &cap.header, sizeof(cap));
diff --git a/include/uapi/linux/vfio_zdev.h b/include/uapi/linux/vfio_zdev.h
index b4309397b6b2..78c022af3d29 100644
--- a/include/uapi/linux/vfio_zdev.h
+++ b/include/uapi/linux/vfio_zdev.h
@@ -29,6 +29,9 @@ struct vfio_device_info_cap_zpci_base {
__u16 fmb_length; /* Measurement Block Length (in bytes) */
__u8 pft; /* PCI Function Type */
__u8 gid; /* PCI function group ID */
+ /* End of version 1 */
+ __u32 fh; /* PCI function handle */
+ /* End of version 2 */
};

/**
--
2.27.0

2022-03-16 16:07:33

by Matthew Rosato

Subject: [PATCH v4 30/32] KVM: s390: introduce CPU feature for zPCI Interpretation

KVM_S390_VM_CPU_FEAT_ZPCI_INTERP relays whether zPCI interpretive
execution is possible based on the available hardware facilities.
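The feature is only offered when both kernel config options are enabled and facilities 69 through 72 are all installed. On s390, facility numbers count from the most significant bit of the first byte of the facility list, so facility nr lives in byte nr / 8 under mask 0x80 >> (nr % 8). The sketch below models that check in userspace; test_facility_in() and zpci_interp_possible() are hypothetical helpers, and the two booleans stand in for IS_ENABLED(CONFIG_VFIO_PCI) and IS_ENABLED(CONFIG_S390_KVM_IOMMU).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* s390 numbers facility bits from the most significant bit of byte 0. */
static bool test_facility_in(const unsigned char *list, size_t len,
			     unsigned long nr)
{
	if (nr / 8 >= len)
		return false;
	return (list[nr >> 3] & (0x80u >> (nr & 7))) != 0;
}

/* Hypothetical model of the check added to kvm_s390_cpu_feat_init(). */
static bool zpci_interp_possible(const unsigned char *list, size_t len,
				 bool vfio_pci, bool kvm_iommu)
{
	return vfio_pci && kvm_iommu &&
	       test_facility_in(list, len, 69) &&
	       test_facility_in(list, len, 70) &&
	       test_facility_in(list, len, 71) &&
	       test_facility_in(list, len, 72);
}
```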

Signed-off-by: Matthew Rosato <[email protected]>
---
arch/s390/include/uapi/asm/kvm.h | 1 +
arch/s390/kvm/kvm-s390.c | 6 ++++++
2 files changed, 7 insertions(+)

diff --git a/arch/s390/include/uapi/asm/kvm.h b/arch/s390/include/uapi/asm/kvm.h
index 7a6b14874d65..ed06458a871f 100644
--- a/arch/s390/include/uapi/asm/kvm.h
+++ b/arch/s390/include/uapi/asm/kvm.h
@@ -130,6 +130,7 @@ struct kvm_s390_vm_cpu_machine {
#define KVM_S390_VM_CPU_FEAT_PFMFI 11
#define KVM_S390_VM_CPU_FEAT_SIGPIF 12
#define KVM_S390_VM_CPU_FEAT_KSS 13
+#define KVM_S390_VM_CPU_FEAT_ZPCI_INTERP 14
struct kvm_s390_vm_cpu_feat {
__u64 feat[16];
};
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 613101ba29be..137ab8c09b82 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -434,6 +434,12 @@ static void kvm_s390_cpu_feat_init(void)
if (test_facility(151)) /* DFLTCC */
__insn32_query(INSN_DFLTCC, kvm_s390_available_subfunc.dfltcc);

+ /* zPCI Interpretation */
+ if (IS_ENABLED(CONFIG_VFIO_PCI) && IS_ENABLED(CONFIG_S390_KVM_IOMMU) &&
+ test_facility(69) && test_facility(70) && test_facility(71) &&
+ test_facility(72))
+ allow_cpu_feat(KVM_S390_VM_CPU_FEAT_ZPCI_INTERP);
+
if (MACHINE_HAS_ESOP)
allow_cpu_feat(KVM_S390_VM_CPU_FEAT_ESOP);
/*
--
2.27.0

2022-03-16 16:43:58

by Matthew Rosato

Subject: [PATCH v4 08/32] s390/pci: stash associated GISA designation

For passthrough devices, we will need to know the GISA designation of the
guest if interpretation facilities are to be used. Add a field to stash this in
the zdev and set a default of 0 (no GISA designation) for now; a subsequent
patch will set a valid GISA designation for passthrough devices.
Also, extend mpcific routines to specify this stashed designation as part
of the mpcific command.
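The change to struct clp_req_set_pci splits the 64-bit reserved3 field into a 32-bit reserved word followed by the 32-bit gisa, so the request size and the offsets of all other fields stay the same. The sketch below checks that property on hypothetical packed mirrors of only the tail of the request shown in the diff; it assumes, as with any reserved field, that the request block is zeroed before use, so a designation of 0 yields the same bytes as before.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirrors of only the tail of struct clp_req_set_pci. */
struct set_pci_tail_old {
	uint16_t reserved2;
	uint8_t oc;             /* operation controls */
	uint8_t ndas;           /* number of dma spaces */
	uint64_t reserved3;
} __attribute__((packed));

struct set_pci_tail_new {
	uint16_t reserved2;
	uint8_t oc;
	uint8_t ndas;
	uint32_t reserved3;
	uint32_t gisa;          /* GISA designation, 0 = none */
} __attribute__((packed));

/* The split must not move anything else in the request. */
_Static_assert(sizeof(struct set_pci_tail_old) ==
	       sizeof(struct set_pci_tail_new),
	       "request size must not change");
/* gisa occupies the second 32-bit word of the old reserved3 field. */
_Static_assert(offsetof(struct set_pci_tail_new, gisa) ==
	       offsetof(struct set_pci_tail_old, reserved3) + 4,
	       "gisa must sit where the old reserved field ended");
```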

Reviewed-by: Niklas Schnelle <[email protected]>
Reviewed-by: Christian Borntraeger <[email protected]>
Reviewed-by: Eric Farman <[email protected]>
Reviewed-by: Pierre Morel <[email protected]>
Signed-off-by: Matthew Rosato <[email protected]>
---
arch/s390/include/asm/pci.h | 1 +
arch/s390/include/asm/pci_clp.h | 3 ++-
arch/s390/pci/pci.c | 6 ++++++
arch/s390/pci/pci_clp.c | 1 +
arch/s390/pci/pci_irq.c | 5 +++++
5 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
index 90824be5ce9a..d07d7c3205de 100644
--- a/arch/s390/include/asm/pci.h
+++ b/arch/s390/include/asm/pci.h
@@ -123,6 +123,7 @@ struct zpci_dev {
enum zpci_state state;
u32 fid; /* function ID, used by sclp */
u32 fh; /* function handle, used by insn's */
+ u32 gisa; /* GISA designation for passthrough */
u16 vfn; /* virtual function number */
u16 pchid; /* physical channel ID */
u8 pfgid; /* function group ID */
diff --git a/arch/s390/include/asm/pci_clp.h b/arch/s390/include/asm/pci_clp.h
index 1f4b666e85ee..f3286bc5ba6e 100644
--- a/arch/s390/include/asm/pci_clp.h
+++ b/arch/s390/include/asm/pci_clp.h
@@ -173,7 +173,8 @@ struct clp_req_set_pci {
u16 reserved2;
u8 oc; /* operation controls */
u8 ndas; /* number of dma spaces */
- u64 reserved3;
+ u32 reserved3;
+ u32 gisa; /* GISA designation */
} __packed;

/* Set PCI function response */
diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
index 792f8e0f2178..ca9c29386de6 100644
--- a/arch/s390/pci/pci.c
+++ b/arch/s390/pci/pci.c
@@ -119,6 +119,7 @@ int zpci_register_ioat(struct zpci_dev *zdev, u8 dmaas,
fib.pba = base;
fib.pal = limit;
fib.iota = iota | ZPCI_IOTA_RTTO_FLAG;
+ fib.gd = zdev->gisa;
cc = zpci_mod_fc(req, &fib, &status);
if (cc)
zpci_dbg(3, "reg ioat fid:%x, cc:%d, status:%d\n", zdev->fid, cc, status);
@@ -132,6 +133,8 @@ int zpci_unregister_ioat(struct zpci_dev *zdev, u8 dmaas)
struct zpci_fib fib = {0};
u8 cc, status;

+ fib.gd = zdev->gisa;
+
cc = zpci_mod_fc(req, &fib, &status);
if (cc)
zpci_dbg(3, "unreg ioat fid:%x, cc:%d, status:%d\n", zdev->fid, cc, status);
@@ -159,6 +162,7 @@ int zpci_fmb_enable_device(struct zpci_dev *zdev)
atomic64_set(&zdev->unmapped_pages, 0);

fib.fmb_addr = virt_to_phys(zdev->fmb);
+ fib.gd = zdev->gisa;
cc = zpci_mod_fc(req, &fib, &status);
if (cc) {
kmem_cache_free(zdev_fmb_cache, zdev->fmb);
@@ -177,6 +181,8 @@ int zpci_fmb_disable_device(struct zpci_dev *zdev)
if (!zdev->fmb)
return -EINVAL;

+ fib.gd = zdev->gisa;
+
/* Function measurement is disabled if fmb address is zero */
cc = zpci_mod_fc(req, &fib, &status);
if (cc == 3) /* Function already gone. */
diff --git a/arch/s390/pci/pci_clp.c b/arch/s390/pci/pci_clp.c
index be077b39da33..4dcc37ddeeaf 100644
--- a/arch/s390/pci/pci_clp.c
+++ b/arch/s390/pci/pci_clp.c
@@ -240,6 +240,7 @@ static int clp_set_pci_fn(struct zpci_dev *zdev, u32 *fh, u8 nr_dma_as, u8 comma
rrb->request.fh = zdev->fh;
rrb->request.oc = command;
rrb->request.ndas = nr_dma_as;
+ rrb->request.gisa = zdev->gisa;

rc = clp_req(rrb, CLP_LPS_PCI);
if (rrb->response.hdr.rsp == CLP_RC_SETPCIFN_BUSY) {
diff --git a/arch/s390/pci/pci_irq.c b/arch/s390/pci/pci_irq.c
index 2f675355fd0c..a19ac0282929 100644
--- a/arch/s390/pci/pci_irq.c
+++ b/arch/s390/pci/pci_irq.c
@@ -43,6 +43,7 @@ static int zpci_set_airq(struct zpci_dev *zdev)
fib.fmt0.aibvo = 0; /* each zdev has its own interrupt vector */
fib.fmt0.aisb = virt_to_phys(zpci_sbv->vector) + (zdev->aisb / 64) * 8;
fib.fmt0.aisbo = zdev->aisb & 63;
+ fib.gd = zdev->gisa;

return zpci_mod_fc(req, &fib, &status) ? -EIO : 0;
}
@@ -54,6 +55,8 @@ static int zpci_clear_airq(struct zpci_dev *zdev)
struct zpci_fib fib = {0};
u8 cc, status;

+ fib.gd = zdev->gisa;
+
cc = zpci_mod_fc(req, &fib, &status);
if (cc == 3 || (cc == 1 && status == 24))
/* Function already gone or IRQs already deregistered. */
@@ -72,6 +75,7 @@ static int zpci_set_directed_irq(struct zpci_dev *zdev)
fib.fmt = 1;
fib.fmt1.noi = zdev->msi_nr_irqs;
fib.fmt1.dibvo = zdev->msi_first_bit;
+ fib.gd = zdev->gisa;

return zpci_mod_fc(req, &fib, &status) ? -EIO : 0;
}
@@ -84,6 +88,7 @@ static int zpci_clear_directed_irq(struct zpci_dev *zdev)
u8 cc, status;

fib.fmt = 1;
+ fib.gd = zdev->gisa;
cc = zpci_mod_fc(req, &fib, &status);
if (cc == 3 || (cc == 1 && status == 24))
/* Function already gone or IRQs already deregistered. */
--
2.27.0

2022-03-16 17:30:25

by Jason Gunthorpe

Subject: Re: [PATCH v4 22/32] KVM: s390: pci: routines for (dis)associating zPCI devices with a KVM

On Mon, Mar 14, 2022 at 03:44:41PM -0400, Matthew Rosato wrote:
> +int kvm_s390_pci_zpci_start(struct kvm *kvm, struct zpci_dev *zdev)
> +{
> + struct vfio_device *vdev;
> + struct pci_dev *pdev;
> + int rc;
> +
> + rc = kvm_s390_pci_dev_open(zdev);
> + if (rc)
> + return rc;
> +
> + pdev = pci_get_slot(zdev->zbus->bus, zdev->devfn);
> + if (!pdev) {
> + rc = -ENODEV;
> + goto exit_err;
> + }
> +
> + vdev = get_vdev(&pdev->dev);
> + if (!vdev) {
> + pci_dev_put(pdev);
> + rc = -ENODEV;
> + goto exit_err;
> + }
> +
> + zdev->kzdev->nb.notifier_call = kvm_s390_pci_group_notifier;
> +
> + /*
> + * At this point, a KVM should already be associated with this device,
> + * so registering the notifier now should immediately trigger the
> + * event. We also want to know if the KVM association is later removed
> + * to ensure proper cleanup happens.
> + */
> + rc = register_notifier(vdev->dev, &zdev->kzdev->nb);
> +
> + put_vdev(vdev);
> + pci_dev_put(pdev);
> +
> + /* Make sure the registered KVM matches the KVM issuing the ioctl */
> + if (rc || zdev->kzdev->kvm != kvm) {
> + rc = -ENODEV;
> + goto exit_err;
> + }
> +
> + /* Must support KVM-managed IOMMU to proceed */
> + if (IS_ENABLED(CONFIG_S390_KVM_IOMMU))
> + rc = zpci_iommu_attach_kvm(zdev, kvm);
> + else
> + rc = -EINVAL;

This seems like kind of a strange API, shouldn't kvm be getting a
reference on the underlying iommu_domain and then calling into it to
get the mapping table instead of pushing KVM specific logic into the
iommu driver?

It would be nice if all the special kvm stuff could be more isolated in
kvm code.

I'm still a little unclear about why this is so complicated - can't
you get the iommu_domain from the group FD directly in KVM code as
power does?

Jason

2022-03-16 19:14:04

by Matthew Rosato

Subject: [PATCH v4 13/32] s390/pci: return status from zpci_refresh_trans

Current callers of zpci_refresh_trans don't need to interrogate the status
returned from the underlying instructions. However, a subsequent patch
will add a KVM caller that needs this information. Add a new argument to
zpci_refresh_trans to pass the address of a status byte and update
existing call sites to provide it.
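After this change the busy-retry loop and the error mapping are unchanged, but the final status byte is written through the caller's pointer, which therefore must always be valid (every existing caller in the diff passes a local u8). A minimal userspace model of the same decision logic, assuming a hypothetical scripted mock_rpcit() in place of the real RPCIT instruction and omitting the busy delay:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical scripted RPCIT results, one (cc, status) pair per call. */
struct rpcit_script {
	const uint8_t *cc;
	const uint8_t *status;
	int next;
};

static uint8_t mock_rpcit(struct rpcit_script *s, uint8_t *status)
{
	int i = s->next++;

	*status = s->status[i];
	return s->cc[i];
}

/* Same decision logic as zpci_refresh_trans() after this patch. */
static int refresh_trans(struct rpcit_script *s, uint8_t *status)
{
	uint8_t cc;

	do {
		/* cc 2 means busy; the kernel delays briefly before retrying */
		cc = mock_rpcit(s, status);
	} while (cc == 2);

	/* cc 1 with status 4 or 16 maps to -ENOMEM, as in the kernel */
	if (cc == 1 && (*status == 4 || *status == 16))
		return -ENOMEM;
	return cc ? -EIO : 0;
}
```

A caller such as the later KVM RPCIT handler can then inspect the returned status byte directly instead of only seeing -ENOMEM or -EIO.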

Reviewed-by: Pierre Morel <[email protected]>
Reviewed-by: Claudio Imbrenda <[email protected]>
Reviewed-by: Niklas Schnelle <[email protected]>
Signed-off-by: Matthew Rosato <[email protected]>
---
arch/s390/include/asm/pci_insn.h | 2 +-
arch/s390/pci/pci_dma.c | 6 ++++--
arch/s390/pci/pci_insn.c | 10 +++++-----
drivers/iommu/s390-iommu.c | 4 +++-
4 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/arch/s390/include/asm/pci_insn.h b/arch/s390/include/asm/pci_insn.h
index 5331082fa516..32759c407b8f 100644
--- a/arch/s390/include/asm/pci_insn.h
+++ b/arch/s390/include/asm/pci_insn.h
@@ -135,7 +135,7 @@ union zpci_sic_iib {
DECLARE_STATIC_KEY_FALSE(have_mio);

u8 zpci_mod_fc(u64 req, struct zpci_fib *fib, u8 *status);
-int zpci_refresh_trans(u64 fn, u64 addr, u64 range);
+int zpci_refresh_trans(u64 fn, u64 addr, u64 range, u8 *status);
int __zpci_load(u64 *data, u64 req, u64 offset);
int zpci_load(u64 *data, const volatile void __iomem *addr, unsigned long len);
int __zpci_store(u64 data, u64 req, u64 offset);
diff --git a/arch/s390/pci/pci_dma.c b/arch/s390/pci/pci_dma.c
index a81de48d5ea7..b0a2380bcad8 100644
--- a/arch/s390/pci/pci_dma.c
+++ b/arch/s390/pci/pci_dma.c
@@ -23,8 +23,9 @@ static u32 s390_iommu_aperture_factor = 1;

static int zpci_refresh_global(struct zpci_dev *zdev)
{
+ u8 status;
return zpci_refresh_trans((u64) zdev->fh << 32, zdev->start_dma,
- zdev->iommu_pages * PAGE_SIZE);
+ zdev->iommu_pages * PAGE_SIZE, &status);
}

unsigned long *dma_alloc_cpu_table(void)
@@ -183,6 +184,7 @@ static int __dma_purge_tlb(struct zpci_dev *zdev, dma_addr_t dma_addr,
size_t size, int flags)
{
unsigned long irqflags;
+ u8 status;
int ret;

/*
@@ -201,7 +203,7 @@ static int __dma_purge_tlb(struct zpci_dev *zdev, dma_addr_t dma_addr,
}

ret = zpci_refresh_trans((u64) zdev->fh << 32, dma_addr,
- PAGE_ALIGN(size));
+ PAGE_ALIGN(size), &status);
if (ret == -ENOMEM && !s390_iommu_strict) {
/* enable the hypervisor to free some resources */
if (zpci_refresh_global(zdev))
diff --git a/arch/s390/pci/pci_insn.c b/arch/s390/pci/pci_insn.c
index 0509554301c7..ca6399d52767 100644
--- a/arch/s390/pci/pci_insn.c
+++ b/arch/s390/pci/pci_insn.c
@@ -77,20 +77,20 @@ static inline u8 __rpcit(u64 fn, u64 addr, u64 range, u8 *status)
return cc;
}

-int zpci_refresh_trans(u64 fn, u64 addr, u64 range)
+int zpci_refresh_trans(u64 fn, u64 addr, u64 range, u8 *status)
{
- u8 cc, status;
+ u8 cc;

do {
- cc = __rpcit(fn, addr, range, &status);
+ cc = __rpcit(fn, addr, range, status);
if (cc == 2)
udelay(ZPCI_INSN_BUSY_DELAY);
} while (cc == 2);

if (cc)
- zpci_err_insn(cc, status, addr, range);
+ zpci_err_insn(cc, *status, addr, range);

- if (cc == 1 && (status == 4 || status == 16))
+ if (cc == 1 && (*status == 4 || *status == 16))
return -ENOMEM;

return (cc) ? -EIO : 0;
diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
index 3833e86c6e7b..73a85c599dc2 100644
--- a/drivers/iommu/s390-iommu.c
+++ b/drivers/iommu/s390-iommu.c
@@ -214,6 +214,7 @@ static int s390_iommu_update_trans(struct s390_domain *s390_domain,
unsigned long irq_flags, nr_pages, i;
unsigned long *entry;
int rc = 0;
+ u8 status;

if (dma_addr < s390_domain->domain.geometry.aperture_start ||
dma_addr + size > s390_domain->domain.geometry.aperture_end)
@@ -238,7 +239,8 @@ static int s390_iommu_update_trans(struct s390_domain *s390_domain,
spin_lock(&s390_domain->list_lock);
list_for_each_entry(domain_device, &s390_domain->devices, list) {
rc = zpci_refresh_trans((u64) domain_device->zdev->fh << 32,
- start_dma_addr, nr_pages * PAGE_SIZE);
+ start_dma_addr, nr_pages * PAGE_SIZE,
+ &status);
if (rc)
break;
}
--
2.27.0

2022-03-16 23:16:19

by Matthew Rosato

Subject: [PATCH v4 24/32] KVM: s390: pci: provide routines for enabling/disabling interrupt forwarding

These routines will be wired into a kvm ioctl in order to respond to
requests to enable / disable a device for Adapter Event Notifications /
Adapter Interruption Forwarding.
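Two pieces of address arithmetic recur in these routines: a summary bit number (zdev->aisb) is split into the byte offset of its 64-bit word, (aisb / 64) * 8, and the bit offset within that word, aisb & 63; and the GAIT entry for the same number is found by indexing the table. The sketch below shows both in userspace. struct gaite_mock and the helper names are hypothetical, and the field list is not the real struct zpci_gaite layout; the point it illustrates is that adding an index to a typed pointer already scales by the entry size, so the entry for bit n is simply gait + n.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical entry type; not the real struct zpci_gaite layout. */
struct gaite_mock {
	uint32_t gisa;
	uint8_t gisc;
	uint8_t count;
	uint8_t aisbo;
	uint64_t aisb;
};

/* Byte offset of the 64-bit word holding summary bit 'aisb'. */
static uint64_t aisb_word_offset(unsigned long aisb)
{
	return (aisb / 64) * 8;
}

/* Bit position of 'aisb' within that word. */
static uint8_t aisb_bit_offset(unsigned long aisb)
{
	return aisb & 63;
}

/* Typed pointer arithmetic already scales by sizeof(*gait). */
static struct gaite_mock *gait_entry(struct gaite_mock *gait,
				     unsigned long aisb)
{
	return gait + aisb;
}
```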

Signed-off-by: Matthew Rosato <[email protected]>
---
arch/s390/include/asm/kvm_pci.h | 2 +
arch/s390/kvm/pci.c | 201 +++++++++++++++++++++++++++++++-
arch/s390/pci/pci_insn.c | 1 +
3 files changed, 203 insertions(+), 1 deletion(-)

diff --git a/arch/s390/include/asm/kvm_pci.h b/arch/s390/include/asm/kvm_pci.h
index 47ce18b5bddd..ed596880fb06 100644
--- a/arch/s390/include/asm/kvm_pci.h
+++ b/arch/s390/include/asm/kvm_pci.h
@@ -16,11 +16,13 @@
#include <linux/kvm_host.h>
#include <linux/kvm.h>
#include <linux/pci.h>
+#include <asm/pci_insn.h>

struct kvm_zdev {
struct zpci_dev *zdev;
struct kvm *kvm;
struct iommu_domain *dom; /* Used to invoke IOMMU API for RPCIT */
+ struct zpci_fib fib;
struct notifier_block nb;
struct list_head entry;
};
diff --git a/arch/s390/kvm/pci.c b/arch/s390/kvm/pci.c
index df50dd6114c3..2287c1c6a3e5 100644
--- a/arch/s390/kvm/pci.c
+++ b/arch/s390/kvm/pci.c
@@ -13,6 +13,7 @@
#include <asm/kvm_pci.h>
#include <asm/pci.h>
#include <asm/pci_insn.h>
+#include <asm/pci_io.h>
#include <asm/sclp.h>
#include "pci.h"
#include "kvm-s390.h"
@@ -172,6 +173,200 @@ int kvm_s390_pci_aen_init(u8 nisc)
return rc;
}

+/* Modify PCI: Register floating adapter interruption forwarding */
+static int kvm_zpci_set_airq(struct zpci_dev *zdev)
+{
+ u64 req = ZPCI_CREATE_REQ(zdev->fh, 0, ZPCI_MOD_FC_REG_INT);
+ struct zpci_fib fib = {};
+ u8 status;
+
+ fib.fmt0.isc = zdev->kzdev->fib.fmt0.isc;
+ fib.fmt0.sum = 1; /* enable summary notifications */
+ fib.fmt0.noi = airq_iv_end(zdev->aibv);
+ fib.fmt0.aibv = virt_to_phys(zdev->aibv->vector);
+ fib.fmt0.aibvo = 0;
+ fib.fmt0.aisb = virt_to_phys(aift->sbv->vector + (zdev->aisb / 64) * 8);
+ fib.fmt0.aisbo = zdev->aisb & 63;
+ fib.gd = zdev->gisa;
+
+ return zpci_mod_fc(req, &fib, &status) ? -EIO : 0;
+}
+
+/* Modify PCI: Unregister floating adapter interruption forwarding */
+static int kvm_zpci_clear_airq(struct zpci_dev *zdev)
+{
+ u64 req = ZPCI_CREATE_REQ(zdev->fh, 0, ZPCI_MOD_FC_DEREG_INT);
+ struct zpci_fib fib = {};
+ u8 cc, status;
+
+ fib.gd = zdev->gisa;
+
+ cc = zpci_mod_fc(req, &fib, &status);
+ if (cc == 3 || (cc == 1 && status == 24))
+ /* Function already gone or IRQs already deregistered. */
+ cc = 0;
+
+ return cc ? -EIO : 0;
+}
+
+static int kvm_s390_pci_aif_enable(struct zpci_dev *zdev, struct zpci_fib *fib,
+ bool assist)
+{
+ struct page *aibv_page, *aisb_page = NULL;
+ unsigned int msi_vecs, idx;
+ struct zpci_gaite *gaite;
+ unsigned long bit;
+ struct kvm *kvm;
+ phys_addr_t gaddr;
+ int rc = 0, gisc;
+
+ /*
+ * Interrupt forwarding is only applicable if the device is already
+ * enabled for interpretation
+ */
+ if (zdev->gisa == 0)
+ return -EINVAL;
+
+ kvm = zdev->kzdev->kvm;
+ msi_vecs = min_t(unsigned int, fib->fmt0.noi, zdev->max_msi);
+
+ /* Get the associated forwarding ISC - if invalid, return the error */
+ gisc = kvm_s390_gisc_register(kvm, fib->fmt0.isc);
+ if (gisc < 0)
+ return gisc;
+
+ /* Replace AIBV address */
+ idx = srcu_read_lock(&kvm->srcu);
+ aibv_page = gfn_to_page(kvm, gpa_to_gfn((gpa_t)fib->fmt0.aibv));
+ srcu_read_unlock(&kvm->srcu, idx);
+ if (is_error_page(aibv_page)) {
+ rc = -EIO;
+ goto out;
+ }
+ gaddr = page_to_phys(aibv_page) + (fib->fmt0.aibv & ~PAGE_MASK);
+ fib->fmt0.aibv = gaddr;
+
+ /* Pin the guest AISB if one was specified */
+ if (fib->fmt0.sum == 1) {
+ idx = srcu_read_lock(&kvm->srcu);
+ aisb_page = gfn_to_page(kvm, gpa_to_gfn((gpa_t)fib->fmt0.aisb));
+ srcu_read_unlock(&kvm->srcu, idx);
+ if (is_error_page(aisb_page)) {
+ rc = -EIO;
+ goto unpin1;
+ }
+ }
+
+ /* AISB must be allocated before we can fill in GAITE */
+ mutex_lock(&aift->aift_lock);
+ bit = airq_iv_alloc_bit(aift->sbv);
+ if (bit == -1UL) {
+ rc = -EIO;
+ goto unpin2;
+ }
+ zdev->aisb = bit; /* store the summary bit number */
+ zdev->aibv = airq_iv_create(msi_vecs, AIRQ_IV_DATA |
+ AIRQ_IV_BITLOCK |
+ AIRQ_IV_GUESTVEC,
+ phys_to_virt(fib->fmt0.aibv));
+
+ spin_lock_irq(&aift->gait_lock);
+ /* Pointer arithmetic on the typed table already scales by entry size */
+ gaite = (struct zpci_gaite *)aift->gait + zdev->aisb;
+
+ /* If assist not requested, host will get all alerts */
+ if (assist)
+ gaite->gisa = (u32)virt_to_phys(&kvm->arch.sie_page2->gisa);
+ else
+ gaite->gisa = 0;
+
+ gaite->gisc = fib->fmt0.isc;
+ gaite->count++;
+ gaite->aisbo = fib->fmt0.aisbo;
+ gaite->aisb = virt_to_phys(page_address(aisb_page) + (fib->fmt0.aisb &
+ ~PAGE_MASK));
+ aift->kzdev[zdev->aisb] = zdev->kzdev;
+ spin_unlock_irq(&aift->gait_lock);
+
+ /* Update guest FIB for re-issue */
+ fib->fmt0.aisbo = zdev->aisb & 63;
+ fib->fmt0.aisb = virt_to_phys(aift->sbv->vector + (zdev->aisb / 64) * 8);
+ fib->fmt0.isc = gisc;
+
+ /* Save some guest fib values in the host for later use */
+ zdev->kzdev->fib.fmt0.isc = fib->fmt0.isc;
+ zdev->kzdev->fib.fmt0.aibv = fib->fmt0.aibv;
+ mutex_unlock(&aift->aift_lock);
+
+ /* Issue the clp to setup the irq now */
+ rc = kvm_zpci_set_airq(zdev);
+ return rc;
+
+unpin2:
+ mutex_unlock(&aift->aift_lock);
+ if (fib->fmt0.sum == 1) {
+ gaddr = page_to_phys(aisb_page);
+ kvm_release_pfn_dirty(gaddr >> PAGE_SHIFT);
+ }
+unpin1:
+ kvm_release_pfn_dirty(fib->fmt0.aibv >> PAGE_SHIFT);
+out:
+ return rc;
+}
+
+static int kvm_s390_pci_aif_disable(struct zpci_dev *zdev, bool force)
+{
+ struct kvm_zdev *kzdev = zdev->kzdev;
+ struct zpci_gaite *gaite;
+ int rc;
+ u8 isc;
+
+ if (zdev->gisa == 0)
+ return -EINVAL;
+
+ mutex_lock(&aift->aift_lock);
+
+ /*
+ * If the clear fails due to an error, leave now unless we know this
+ * device is about to go away (force) -- In that case clear the GAITE
+ * regardless.
+ */
+ rc = kvm_zpci_clear_airq(zdev);
+ if (rc && !force)
+ goto out;
+
+ if (zdev->kzdev->fib.fmt0.aibv == 0)
+ goto out;
+ spin_lock_irq(&aift->gait_lock);
+ /* Pointer arithmetic on the typed table already scales by entry size */
+ gaite = (struct zpci_gaite *)aift->gait + zdev->aisb;
+ isc = gaite->gisc;
+ gaite->count--;
+ if (gaite->count == 0) {
+ /* Release guest AIBV and AISB */
+ kvm_release_pfn_dirty(kzdev->fib.fmt0.aibv >> PAGE_SHIFT);
+ if (gaite->aisb != 0)
+ kvm_release_pfn_dirty(gaite->aisb >> PAGE_SHIFT);
+ /* Clear the GAIT entry */
+ gaite->aisb = 0;
+ gaite->gisc = 0;
+ gaite->aisbo = 0;
+ gaite->gisa = 0;
+ aift->kzdev[zdev->aisb] = NULL;
+ /* Clear zdev info */
+ airq_iv_free_bit(aift->sbv, zdev->aisb);
+ airq_iv_release(zdev->aibv);
+ zdev->aisb = 0;
+ zdev->aibv = NULL;
+ }
+ spin_unlock_irq(&aift->gait_lock);
+ kvm_s390_gisc_unregister(kzdev->kvm, isc);
+ kzdev->fib.fmt0.isc = 0;
+ kzdev->fib.fmt0.aibv = 0;
+out:
+ mutex_unlock(&aift->aift_lock);
+
+ return rc;
+}
+
static int kvm_s390_pci_interp_enable(struct zpci_dev *zdev)
{
u32 gisa;
@@ -226,13 +421,17 @@ static int kvm_s390_pci_interp_enable(struct zpci_dev *zdev)
return rc;
}

-static int kvm_s390_pci_interp_disable(struct zpci_dev *zdev)
+static int kvm_s390_pci_interp_disable(struct zpci_dev *zdev, bool force)
{
int rc;

if (zdev->gisa == 0)
return -EINVAL;

+ /* Forwarding must be turned off before interpretation */
+ if (zdev->kzdev->fib.fmt0.aibv != 0)
+ kvm_s390_pci_aif_disable(zdev, force);
+
/* Remove the host CLP guest designation */
zdev->gisa = 0;

diff --git a/arch/s390/pci/pci_insn.c b/arch/s390/pci/pci_insn.c
index ca6399d52767..f7d0e29bbf0b 100644
--- a/arch/s390/pci/pci_insn.c
+++ b/arch/s390/pci/pci_insn.c
@@ -59,6 +59,7 @@ u8 zpci_mod_fc(u64 req, struct zpci_fib *fib, u8 *status)

return cc;
}
+EXPORT_SYMBOL_GPL(zpci_mod_fc);

/* Refresh PCI Translations */
static inline u8 __rpcit(u64 fn, u64 addr, u64 range, u8 *status)
--
2.27.0

2022-03-17 01:45:43

by Matthew Rosato

Subject: [PATCH v4 03/32] s390/sclp: detect the AENI facility

Detect the Adapter Event Notification Interpretation facility.
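The new flag is taken from the same facility byte (fac118) as the neighbouring AISII and zPCI LSI bits, each selected by its own mask. A small userspace sketch of that decoding, with a hypothetical struct fac118_bits and decode_fac118() helper standing in for the sclp_info bitfields:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the three sclp_info bits derived from fac118. */
struct fac118_bits {
	bool has_aisii;
	bool has_aeni;
	bool has_zpci_lsi;
};

/* Masks match sclp_early_facilities_detect() after this patch. */
static struct fac118_bits decode_fac118(unsigned char fac118)
{
	struct fac118_bits b = {
		.has_aisii    = !!(fac118 & 0x40),
		.has_aeni     = !!(fac118 & 0x20),
		.has_zpci_lsi = !!(fac118 & 0x01),
	};
	return b;
}
```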

Reviewed-by: Eric Farman <[email protected]>
Reviewed-by: Christian Borntraeger <[email protected]>
Reviewed-by: Claudio Imbrenda <[email protected]>
Signed-off-by: Matthew Rosato <[email protected]>
---
arch/s390/include/asm/sclp.h | 1 +
drivers/s390/char/sclp_early.c | 1 +
2 files changed, 2 insertions(+)

diff --git a/arch/s390/include/asm/sclp.h b/arch/s390/include/asm/sclp.h
index 8b56ac5ae496..8c2e142000d4 100644
--- a/arch/s390/include/asm/sclp.h
+++ b/arch/s390/include/asm/sclp.h
@@ -90,6 +90,7 @@ struct sclp_info {
unsigned char has_dirq : 1;
unsigned char has_zpci_lsi : 1;
unsigned char has_aisii : 1;
+ unsigned char has_aeni : 1;
unsigned int ibc;
unsigned int mtid;
unsigned int mtid_cp;
diff --git a/drivers/s390/char/sclp_early.c b/drivers/s390/char/sclp_early.c
index 29fee179e197..e9af01b4c97a 100644
--- a/drivers/s390/char/sclp_early.c
+++ b/drivers/s390/char/sclp_early.c
@@ -46,6 +46,7 @@ static void __init sclp_early_facilities_detect(void)
sclp.has_hvs = !!(sccb->fac119 & 0x80);
sclp.has_kss = !!(sccb->fac98 & 0x01);
sclp.has_aisii = !!(sccb->fac118 & 0x40);
+ sclp.has_aeni = !!(sccb->fac118 & 0x20);
sclp.has_zpci_lsi = !!(sccb->fac118 & 0x01);
if (sccb->fac85 & 0x02)
S390_lowcore.machine_flags |= MACHINE_FLAG_ESOP;
--
2.27.0

2022-03-17 03:33:58

by Matthew Rosato

Subject: [PATCH v4 29/32] vfio-pci/zdev: add DTSM to clp group capability

The DTSM, or designation type supported mask, indicates what IOAT formats
are available to the guest. For an interpreted device, userspace will not
know what format(s) the IOAT assist supports, so pass it via the
capability chain. Since the value belongs to the Query PCI Function Group
clp, let's extend the existing capability with a new version.
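On the producing side, the capability always reports version 2 with a default dtsm of 0, and only an interpreted device (one with an associated kzdev) overrides maxstbl and dtsm with the values the hardware assist actually uses. A minimal sketch of that selection, assuming hypothetical group_src/group_cap structs, with DEFAULT_MAXSTBL as a placeholder for ZPCI_MAX_WRITE_SIZE whose real value is not shown here:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical inputs standing in for zdev fields and the KVM query. */
struct group_src {
	bool interpreted;        /* stands in for zdev->kzdev != NULL */
	uint16_t host_maxstbl;   /* stands in for zdev->maxstbl */
	uint8_t assist_dtsm;     /* stands in for kvm_s390_pci_get_dtsm() */
};

/* Hypothetical subset of the group capability fields. */
struct group_cap {
	uint16_t version;
	uint16_t maxstbl;
	uint8_t dtsm;
};

/* Placeholder value; the real default comes from ZPCI_MAX_WRITE_SIZE. */
#define DEFAULT_MAXSTBL 0x1000

static struct group_cap fill_group_cap(const struct group_src *src)
{
	struct group_cap cap = {
		.version = 2,
		.maxstbl = DEFAULT_MAXSTBL,
		.dtsm = 0,           /* nothing reported for non-interpreted */
	};

	/* Interpreted devices report the values the assist really uses. */
	if (src->interpreted) {
		cap.maxstbl = src->host_maxstbl;
		cap.dtsm = src->assist_dtsm;
	}
	return cap;
}
```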

Reviewed-by: Pierre Morel <[email protected]>
Signed-off-by: Matthew Rosato <[email protected]>
---
drivers/vfio/pci/vfio_pci_zdev.c | 12 ++++++++++--
include/uapi/linux/vfio_zdev.h | 3 +++
2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_zdev.c b/drivers/vfio/pci/vfio_pci_zdev.c
index 4a653ce480c7..aadd2b58b822 100644
--- a/drivers/vfio/pci/vfio_pci_zdev.c
+++ b/drivers/vfio/pci/vfio_pci_zdev.c
@@ -13,6 +13,7 @@
#include <linux/vfio_zdev.h>
#include <asm/pci_clp.h>
#include <asm/pci_io.h>
+#include <asm/kvm_pci.h>

#include <linux/vfio_pci_core.h>

@@ -44,16 +45,23 @@ static int zpci_group_cap(struct zpci_dev *zdev, struct vfio_info_cap *caps)
{
struct vfio_device_info_cap_zpci_group cap = {
.header.id = VFIO_DEVICE_INFO_CAP_ZPCI_GROUP,
- .header.version = 1,
+ .header.version = 2,
.dasm = zdev->dma_mask,
.msi_addr = zdev->msi_addr,
.flags = VFIO_DEVICE_INFO_ZPCI_FLAG_REFRESH,
.mui = zdev->fmb_update,
.noi = zdev->max_msi,
.maxstbl = ZPCI_MAX_WRITE_SIZE,
- .version = zdev->version
+ .version = zdev->version,
+ .dtsm = 0
};

+ /* Some values are different for interpreted devices */
+ if (zdev->kzdev) {
+ cap.maxstbl = zdev->maxstbl;
+ cap.dtsm = kvm_s390_pci_get_dtsm(zdev);
+ }
+
return vfio_info_add_capability(caps, &cap.header, sizeof(cap));
}

diff --git a/include/uapi/linux/vfio_zdev.h b/include/uapi/linux/vfio_zdev.h
index 78c022af3d29..29351687e914 100644
--- a/include/uapi/linux/vfio_zdev.h
+++ b/include/uapi/linux/vfio_zdev.h
@@ -50,6 +50,9 @@ struct vfio_device_info_cap_zpci_group {
__u16 noi; /* Maximum number of MSIs */
__u16 maxstbl; /* Maximum Store Block Length */
__u8 version; /* Supported PCI Version */
+ /* End of version 1 */
+ __u8 dtsm; /* Supported IOAT Designations */
+ /* End of version 2 */
};

/**
--
2.27.0

2022-03-17 03:41:20

by Matthew Rosato

Subject: [PATCH v4 06/32] s390/airq: allow for airq structure that uses an input vector

When doing device passthrough where interrupts are forwarded from host
to guest, we wish to use a pinned section of guest memory (the same
memory the guest itself uses as the vector) as the interrupt vector. To
accomplish this, add a new parameter to airq_iv_create that allows
passing an existing vector, together with the new AIRQ_IV_GUESTVEC flag,
to be used instead of allocating a new one. airq_iv_release will not
free such a vector. The caller is responsible for ensuring the vector is
pinned in memory as well as for unpinning the memory when the vector is
no longer needed.

A subsequent patch will use this new parameter for zPCI interpretation.
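
The ownership rule described above can be sketched as a simplified user-space model: a caller-supplied vector is neither allocated nor freed by the airq code. This is a sketch under stated assumptions. struct iv, iv_create(), iv_release() and IV_GUESTVEC are hypothetical stand-ins for struct airq_iv, airq_iv_create(), airq_iv_release() and AIRQ_IV_GUESTVEC. DMA-pool and cacheline handling are omitted.

```c
#include <stdlib.h>

#define IV_GUESTVEC 32	/* mirrors AIRQ_IV_GUESTVEC: vector memory is borrowed */
#define LONG_BITS (8 * sizeof(unsigned long))

/* Simplified stand-in for struct airq_iv; only the ownership bits are kept. */
struct iv {
	unsigned long *vector;
	unsigned long bits;
	unsigned long flags;
};

/* Model of airq_iv_create(): borrow vec, or allocate a zeroed vector. */
static struct iv *iv_create(unsigned long bits, unsigned long flags,
			    unsigned long *vec)
{
	struct iv *iv = calloc(1, sizeof(*iv));

	if (!iv)
		return NULL;
	iv->bits = bits;
	iv->flags = flags;
	if (flags & IV_GUESTVEC) {
		/* Caller pinned this memory and stays responsible for it. */
		iv->vector = vec;
	} else {
		size_t longs = (bits + LONG_BITS - 1) / LONG_BITS;

		iv->vector = calloc(longs, sizeof(unsigned long));
		if (!iv->vector) {
			free(iv);
			return NULL;
		}
	}
	return iv;
}

/* Model of airq_iv_release(): never free memory this code did not allocate. */
static void iv_release(struct iv *iv)
{
	if (!(iv->flags & IV_GUESTVEC))
		free(iv->vector);
	free(iv);
}
```

In the diff, the AIRQ_IV_CACHELINE branch is tested before AIRQ_IV_GUESTVEC in both airq_iv_create and airq_iv_release. The two flags therefore appear not to be meant for combination, and the model does not cover that case.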

Reviewed-by: Pierre Morel <[email protected]>
Reviewed-by: Claudio Imbrenda <[email protected]>
Acked-by: Cornelia Huck <[email protected]>
Signed-off-by: Matthew Rosato <[email protected]>
---
arch/s390/include/asm/airq.h | 4 +++-
arch/s390/pci/pci_irq.c | 8 ++++----
drivers/s390/cio/airq.c | 10 +++++++---
drivers/s390/virtio/virtio_ccw.c | 2 +-
4 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/arch/s390/include/asm/airq.h b/arch/s390/include/asm/airq.h
index 7918a7d09028..e82e5626e139 100644
--- a/arch/s390/include/asm/airq.h
+++ b/arch/s390/include/asm/airq.h
@@ -47,8 +47,10 @@ struct airq_iv {
#define AIRQ_IV_PTR 4 /* Allocate the ptr array */
#define AIRQ_IV_DATA 8 /* Allocate the data array */
#define AIRQ_IV_CACHELINE 16 /* Cacheline alignment for the vector */
+#define AIRQ_IV_GUESTVEC 32 /* Vector is a pinned guest page */

-struct airq_iv *airq_iv_create(unsigned long bits, unsigned long flags);
+struct airq_iv *airq_iv_create(unsigned long bits, unsigned long flags,
+ unsigned long *vec);
void airq_iv_release(struct airq_iv *iv);
unsigned long airq_iv_alloc(struct airq_iv *iv, unsigned long num);
void airq_iv_free(struct airq_iv *iv, unsigned long bit, unsigned long num);
diff --git a/arch/s390/pci/pci_irq.c b/arch/s390/pci/pci_irq.c
index cc4c8d7c8f5c..0d0a02a9fbbf 100644
--- a/arch/s390/pci/pci_irq.c
+++ b/arch/s390/pci/pci_irq.c
@@ -296,7 +296,7 @@ int arch_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
zdev->aisb = bit;

/* Create adapter interrupt vector */
- zdev->aibv = airq_iv_create(msi_vecs, AIRQ_IV_DATA | AIRQ_IV_BITLOCK);
+ zdev->aibv = airq_iv_create(msi_vecs, AIRQ_IV_DATA | AIRQ_IV_BITLOCK, NULL);
if (!zdev->aibv)
return -ENOMEM;

@@ -419,7 +419,7 @@ static int __init zpci_directed_irq_init(void)
union zpci_sic_iib iib = {{0}};
unsigned int cpu;

- zpci_sbv = airq_iv_create(num_possible_cpus(), 0);
+ zpci_sbv = airq_iv_create(num_possible_cpus(), 0, NULL);
if (!zpci_sbv)
return -ENOMEM;

@@ -441,7 +441,7 @@ static int __init zpci_directed_irq_init(void)
zpci_ibv[cpu] = airq_iv_create(cache_line_size() * BITS_PER_BYTE,
AIRQ_IV_DATA |
AIRQ_IV_CACHELINE |
- (!cpu ? AIRQ_IV_ALLOC : 0));
+ (!cpu ? AIRQ_IV_ALLOC : 0), NULL);
if (!zpci_ibv[cpu])
return -ENOMEM;
}
@@ -458,7 +458,7 @@ static int __init zpci_floating_irq_init(void)
if (!zpci_ibv)
return -ENOMEM;

- zpci_sbv = airq_iv_create(ZPCI_NR_DEVICES, AIRQ_IV_ALLOC);
+ zpci_sbv = airq_iv_create(ZPCI_NR_DEVICES, AIRQ_IV_ALLOC, NULL);
if (!zpci_sbv)
goto out_free;

diff --git a/drivers/s390/cio/airq.c b/drivers/s390/cio/airq.c
index 2f2226786319..375a58b1c838 100644
--- a/drivers/s390/cio/airq.c
+++ b/drivers/s390/cio/airq.c
@@ -122,10 +122,12 @@ static inline unsigned long iv_size(unsigned long bits)
* airq_iv_create - create an interrupt vector
* @bits: number of bits in the interrupt vector
* @flags: allocation flags
+ * @vec: pointer to pinned guest memory if AIRQ_IV_GUESTVEC
*
* Returns a pointer to an interrupt vector structure
*/
-struct airq_iv *airq_iv_create(unsigned long bits, unsigned long flags)
+struct airq_iv *airq_iv_create(unsigned long bits, unsigned long flags,
+ unsigned long *vec)
{
struct airq_iv *iv;
unsigned long size;
@@ -146,6 +148,8 @@ struct airq_iv *airq_iv_create(unsigned long bits, unsigned long flags)
&iv->vector_dma);
if (!iv->vector)
goto out_free;
+ } else if (flags & AIRQ_IV_GUESTVEC) {
+ iv->vector = vec;
} else {
iv->vector = cio_dma_zalloc(size);
if (!iv->vector)
@@ -185,7 +189,7 @@ struct airq_iv *airq_iv_create(unsigned long bits, unsigned long flags)
kfree(iv->avail);
if (iv->flags & AIRQ_IV_CACHELINE && iv->vector)
dma_pool_free(airq_iv_cache, iv->vector, iv->vector_dma);
- else
+ else if (!(iv->flags & AIRQ_IV_GUESTVEC))
cio_dma_free(iv->vector, size);
kfree(iv);
out:
@@ -204,7 +208,7 @@ void airq_iv_release(struct airq_iv *iv)
kfree(iv->bitlock);
if (iv->flags & AIRQ_IV_CACHELINE)
dma_pool_free(airq_iv_cache, iv->vector, iv->vector_dma);
- else
+ else if (!(iv->flags & AIRQ_IV_GUESTVEC))
cio_dma_free(iv->vector, iv_size(iv->bits));
kfree(iv->avail);
kfree(iv);
diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
index 52c376d15978..410498d693f8 100644
--- a/drivers/s390/virtio/virtio_ccw.c
+++ b/drivers/s390/virtio/virtio_ccw.c
@@ -241,7 +241,7 @@ static struct airq_info *new_airq_info(int index)
return NULL;
rwlock_init(&info->lock);
info->aiv = airq_iv_create(VIRTIO_IV_BITS, AIRQ_IV_ALLOC | AIRQ_IV_PTR
- | AIRQ_IV_CACHELINE);
+ | AIRQ_IV_CACHELINE, NULL);
if (!info->aiv) {
kfree(info);
return NULL;
--
2.27.0

2022-03-17 03:41:31

by Matthew Rosato

Subject: [PATCH v4 01/32] s390/sclp: detect the zPCI load/store interpretation facility

Detect the zPCI Load/Store Interpretation facility, reported by the 0x01
bit of sccb->fac118, and record it in sclp.has_zpci_lsi.
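
Each of these facility flags is derived by masking one SCCB byte. The sketch below decodes fac118 with the masks that appear in this series: 0x40 for has_aisii, 0x20 for has_aeni, 0x08 for has_gisaf and 0x01 for has_zpci_lsi. It assumes a hypothetical struct fac118_flags and decode_fac118() in place of struct sclp_info and the code in sclp_early.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of struct sclp_info that is fed by fac118. */
struct fac118_flags {
	bool has_aisii;
	bool has_aeni;
	bool has_gisaf;
	bool has_zpci_lsi;
};

/* Decode fac118 with the same !!(byte & mask) idiom as sclp_early.c. */
static struct fac118_flags decode_fac118(uint8_t fac118)
{
	struct fac118_flags f = {
		.has_aisii    = !!(fac118 & 0x40),
		.has_aeni     = !!(fac118 & 0x20),
		.has_gisaf    = !!(fac118 & 0x08),
		.has_zpci_lsi = !!(fac118 & 0x01),
	};
	return f;
}
```

The `!!` normalization matters for the real struct, whose members are one-bit bitfields such as `unsigned char has_zpci_lsi : 1`. Assigning `fac118 & 0x40` directly to such a field would keep only the lowest bit and store 0.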

Reviewed-by: Eric Farman <[email protected]>
Reviewed-by: Christian Borntraeger <[email protected]>
Reviewed-by: Claudio Imbrenda <[email protected]>
Signed-off-by: Matthew Rosato <[email protected]>
---
arch/s390/include/asm/sclp.h | 1 +
drivers/s390/char/sclp_early.c | 1 +
2 files changed, 2 insertions(+)

diff --git a/arch/s390/include/asm/sclp.h b/arch/s390/include/asm/sclp.h
index c68ea35de498..58a4d3d354b7 100644
--- a/arch/s390/include/asm/sclp.h
+++ b/arch/s390/include/asm/sclp.h
@@ -88,6 +88,7 @@ struct sclp_info {
unsigned char has_diag318 : 1;
unsigned char has_sipl : 1;
unsigned char has_dirq : 1;
+ unsigned char has_zpci_lsi : 1;
unsigned int ibc;
unsigned int mtid;
unsigned int mtid_cp;
diff --git a/drivers/s390/char/sclp_early.c b/drivers/s390/char/sclp_early.c
index e9943a86c361..b88dd0da1231 100644
--- a/drivers/s390/char/sclp_early.c
+++ b/drivers/s390/char/sclp_early.c
@@ -45,6 +45,7 @@ static void __init sclp_early_facilities_detect(void)
sclp.has_gisaf = !!(sccb->fac118 & 0x08);
sclp.has_hvs = !!(sccb->fac119 & 0x80);
sclp.has_kss = !!(sccb->fac98 & 0x01);
+ sclp.has_zpci_lsi = !!(sccb->fac118 & 0x01);
if (sccb->fac85 & 0x02)
S390_lowcore.machine_flags |= MACHINE_FLAG_ESOP;
if (sccb->fac91 & 0x40)
--
2.27.0

2022-03-17 03:53:48

by Matthew Rosato

Subject: [PATCH v4 07/32] s390/pci: externalize the SIC operation controls and routine

A subsequent patch will issue SIC from KVM, so export the necessary
routine and move the operation control definitions into a header.
Because the routine will now be exported, rename __zpci_set_irq_ctrl to
zpci_set_irq_ctrl and remove the wrapper function of the same name that
supplied a zeroed iib; callers now pass their own zeroed iib.
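
Dropping the wrapper shifts responsibility for a zeroed iib to each caller. This is why cpu_enable_directed_irq() now declares a second union, ziib. The first SIC call fills iib with dibv_addr, and reusing that union for the D_SINGLE call would no longer match the old wrapper's zeroed argument. The sketch below is a minimal model under stated assumptions. union sic_iib is a reduced stand-in for union zpci_sic_iib, and mock_set_irq_ctrl() and enable_directed() are hypothetical. The mock only records whether it received an all-zero block.

```c
#include <stdint.h>
#include <string.h>

#define SIC_IRQ_MODE_D_SINGLE 17	/* values as in the patch */
#define SIC_IRQ_MODE_SET_CPU 18

/* Reduced stand-in for union zpci_sic_iib. */
union sic_iib {
	uint64_t raw[4];	/* first member, so {{0}} zeroes every byte */
	struct {
		uint64_t dibv_addr;
	} cdiib;
};

static int last_iib_was_zero;

/* Mock of zpci_set_irq_ctrl(); the real routine issues SIC. */
static int mock_set_irq_ctrl(uint16_t ctl, uint8_t isc,
			     const union sic_iib *iib)
{
	static const union sic_iib zero;

	(void)ctl;
	(void)isc;
	last_iib_was_zero = memcmp(iib, &zero, sizeof(zero)) == 0;
	return 0;
}

/* Mirrors cpu_enable_directed_irq(); reuse != 0 shows the pitfall. */
static void enable_directed(uint64_t dibv_addr, int reuse)
{
	union sic_iib iib = {{0}};
	union sic_iib ziib = {{0}};

	iib.cdiib.dibv_addr = dibv_addr;
	mock_set_irq_ctrl(SIC_IRQ_MODE_SET_CPU, 0, &iib);
	/* isc value is arbitrary here; the mock ignores it */
	mock_set_irq_ctrl(SIC_IRQ_MODE_D_SINGLE, 0, reuse ? &iib : &ziib);
}
```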

Reviewed-by: Niklas Schnelle <[email protected]>
Reviewed-by: Claudio Imbrenda <[email protected]>
Reviewed-by: Pierre Morel <[email protected]>
Signed-off-by: Matthew Rosato <[email protected]>
---
arch/s390/include/asm/pci_insn.h | 17 +++++++++--------
arch/s390/pci/pci_insn.c | 3 ++-
arch/s390/pci/pci_irq.c | 26 ++++++++++++--------------
3 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/arch/s390/include/asm/pci_insn.h b/arch/s390/include/asm/pci_insn.h
index 61cf9531f68f..5331082fa516 100644
--- a/arch/s390/include/asm/pci_insn.h
+++ b/arch/s390/include/asm/pci_insn.h
@@ -98,6 +98,14 @@ struct zpci_fib {
u32 gd;
} __packed __aligned(8);

+/* Set Interruption Controls Operation Controls */
+#define SIC_IRQ_MODE_ALL 0
+#define SIC_IRQ_MODE_SINGLE 1
+#define SIC_IRQ_MODE_DIRECT 4
+#define SIC_IRQ_MODE_D_ALL 16
+#define SIC_IRQ_MODE_D_SINGLE 17
+#define SIC_IRQ_MODE_SET_CPU 18
+
/* directed interruption information block */
struct zpci_diib {
u32 : 1;
@@ -134,13 +142,6 @@ int __zpci_store(u64 data, u64 req, u64 offset);
int zpci_store(const volatile void __iomem *addr, u64 data, unsigned long len);
int __zpci_store_block(const u64 *data, u64 req, u64 offset);
void zpci_barrier(void);
-int __zpci_set_irq_ctrl(u16 ctl, u8 isc, union zpci_sic_iib *iib);
-
-static inline int zpci_set_irq_ctrl(u16 ctl, u8 isc)
-{
- union zpci_sic_iib iib = {{0}};
-
- return __zpci_set_irq_ctrl(ctl, isc, &iib);
-}
+int zpci_set_irq_ctrl(u16 ctl, u8 isc, union zpci_sic_iib *iib);

#endif
diff --git a/arch/s390/pci/pci_insn.c b/arch/s390/pci/pci_insn.c
index 4dd58b196cea..2a47b3936e44 100644
--- a/arch/s390/pci/pci_insn.c
+++ b/arch/s390/pci/pci_insn.c
@@ -97,7 +97,7 @@ int zpci_refresh_trans(u64 fn, u64 addr, u64 range)
}

/* Set Interruption Controls */
-int __zpci_set_irq_ctrl(u16 ctl, u8 isc, union zpci_sic_iib *iib)
+int zpci_set_irq_ctrl(u16 ctl, u8 isc, union zpci_sic_iib *iib)
{
if (!test_facility(72))
return -EIO;
@@ -108,6 +108,7 @@ int __zpci_set_irq_ctrl(u16 ctl, u8 isc, union zpci_sic_iib *iib)

return 0;
}
+EXPORT_SYMBOL_GPL(zpci_set_irq_ctrl);

/* PCI Load */
static inline int ____pcilg(u64 *data, u64 req, u64 offset, u8 *status)
diff --git a/arch/s390/pci/pci_irq.c b/arch/s390/pci/pci_irq.c
index 0d0a02a9fbbf..2f675355fd0c 100644
--- a/arch/s390/pci/pci_irq.c
+++ b/arch/s390/pci/pci_irq.c
@@ -15,13 +15,6 @@

static enum {FLOATING, DIRECTED} irq_delivery;

-#define SIC_IRQ_MODE_ALL 0
-#define SIC_IRQ_MODE_SINGLE 1
-#define SIC_IRQ_MODE_DIRECT 4
-#define SIC_IRQ_MODE_D_ALL 16
-#define SIC_IRQ_MODE_D_SINGLE 17
-#define SIC_IRQ_MODE_SET_CPU 18
-
/*
* summary bit vector
* FLOATING - summary bit per function
@@ -154,6 +147,7 @@ static struct irq_chip zpci_irq_chip = {
static void zpci_handle_cpu_local_irq(bool rescan)
{
struct airq_iv *dibv = zpci_ibv[smp_processor_id()];
+ union zpci_sic_iib iib = {{0}};
unsigned long bit;
int irqs_on = 0;

@@ -165,7 +159,7 @@ static void zpci_handle_cpu_local_irq(bool rescan)
/* End of second scan with interrupts on. */
break;
/* First scan complete, reenable interrupts. */
- if (zpci_set_irq_ctrl(SIC_IRQ_MODE_D_SINGLE, PCI_ISC))
+ if (zpci_set_irq_ctrl(SIC_IRQ_MODE_D_SINGLE, PCI_ISC, &iib))
break;
bit = 0;
continue;
@@ -193,6 +187,7 @@ static void zpci_handle_remote_irq(void *data)
static void zpci_handle_fallback_irq(void)
{
struct cpu_irq_data *cpu_data;
+ union zpci_sic_iib iib = {{0}};
unsigned long cpu;
int irqs_on = 0;

@@ -203,7 +198,7 @@ static void zpci_handle_fallback_irq(void)
/* End of second scan with interrupts on. */
break;
/* First scan complete, reenable interrupts. */
- if (zpci_set_irq_ctrl(SIC_IRQ_MODE_SINGLE, PCI_ISC))
+ if (zpci_set_irq_ctrl(SIC_IRQ_MODE_SINGLE, PCI_ISC, &iib))
break;
cpu = 0;
continue;
@@ -234,6 +229,7 @@ static void zpci_directed_irq_handler(struct airq_struct *airq,
static void zpci_floating_irq_handler(struct airq_struct *airq,
struct tpi_info *tpi_info)
{
+ union zpci_sic_iib iib = {{0}};
unsigned long si, ai;
struct airq_iv *aibv;
int irqs_on = 0;
@@ -247,7 +243,7 @@ static void zpci_floating_irq_handler(struct airq_struct *airq,
/* End of second scan with interrupts on. */
break;
/* First scan complete, reenable interrupts. */
- if (zpci_set_irq_ctrl(SIC_IRQ_MODE_SINGLE, PCI_ISC))
+ if (zpci_set_irq_ctrl(SIC_IRQ_MODE_SINGLE, PCI_ISC, &iib))
break;
si = 0;
continue;
@@ -407,11 +403,12 @@ static struct airq_struct zpci_airq = {
static void __init cpu_enable_directed_irq(void *unused)
{
union zpci_sic_iib iib = {{0}};
+ union zpci_sic_iib ziib = {{0}};

iib.cdiib.dibv_addr = (u64) zpci_ibv[smp_processor_id()]->vector;

- __zpci_set_irq_ctrl(SIC_IRQ_MODE_SET_CPU, 0, &iib);
- zpci_set_irq_ctrl(SIC_IRQ_MODE_D_SINGLE, PCI_ISC);
+ zpci_set_irq_ctrl(SIC_IRQ_MODE_SET_CPU, 0, &iib);
+ zpci_set_irq_ctrl(SIC_IRQ_MODE_D_SINGLE, PCI_ISC, &ziib);
}

static int __init zpci_directed_irq_init(void)
@@ -426,7 +423,7 @@ static int __init zpci_directed_irq_init(void)
iib.diib.isc = PCI_ISC;
iib.diib.nr_cpus = num_possible_cpus();
iib.diib.disb_addr = virt_to_phys(zpci_sbv->vector);
- __zpci_set_irq_ctrl(SIC_IRQ_MODE_DIRECT, 0, &iib);
+ zpci_set_irq_ctrl(SIC_IRQ_MODE_DIRECT, 0, &iib);

zpci_ibv = kcalloc(num_possible_cpus(), sizeof(*zpci_ibv),
GFP_KERNEL);
@@ -471,6 +468,7 @@ static int __init zpci_floating_irq_init(void)

int __init zpci_irq_init(void)
{
+ union zpci_sic_iib iib = {{0}};
int rc;

irq_delivery = sclp.has_dirq ? DIRECTED : FLOATING;
@@ -502,7 +500,7 @@ int __init zpci_irq_init(void)
* Enable floating IRQs (with suppression after one IRQ). When using
* directed IRQs this enables the fallback path.
*/
- zpci_set_irq_ctrl(SIC_IRQ_MODE_SINGLE, PCI_ISC);
+ zpci_set_irq_ctrl(SIC_IRQ_MODE_SINGLE, PCI_ISC, &iib);

return 0;
out_airq:
--
2.27.0

2022-03-17 04:02:23

by Matthew Rosato

Subject: [PATCH v4 18/32] iommu/s390: add support for IOMMU_DOMAIN_KVM

Add an alternate set of domain ops for type IOMMU_DOMAIN_KVM. This type
is intended for use when KVM is managing the IOMMU domain on behalf of a
VM. Mapping can only be performed once both a KVM and a guest IOTA
(address translation anchor) have been registered with the domain.
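
The ordering rules in the paragraph above can be modeled as a small state machine. This is a minimal sketch that assumes a hypothetical struct kvm_dom_state. It keeps only the two fields the checks in s390-kvm-iommu.c test, kvm and map_enabled. Pinning, locking, the RTTO format check and the prot/size checks in the map path are left out.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reduction of struct s390_kvm_domain to its gating state. */
struct kvm_dom_state {
	const void *kvm;	/* stands in for struct kvm * */
	bool map_enabled;	/* true once a guest IOTA is registered */
};

/* Like zpci_iommu_attach_kvm(): only one KVM per domain. */
static int attach_kvm(struct kvm_dom_state *d, const void *kvm)
{
	if (d->kvm)
		return -EINVAL;
	d->kvm = kvm;
	return 0;
}

/* Like zpci_iommu_kvm_assign_iota(): needs a KVM and no prior IOTA. */
static int assign_iota(struct kvm_dom_state *d)
{
	if (!d->kvm || d->map_enabled)
		return -EINVAL;
	d->map_enabled = true;	/* real code pins the guest region table first */
	return 0;
}

/* Like s390_kvm_iommu_map(): refused until an IOTA is registered. */
static int map(const struct kvm_dom_state *d)
{
	return d->map_enabled ? 0 : -EINVAL;
}

/* Like s390_kvm_clear_ioat_tables(): undoes assign_iota(). */
static int remove_iota(struct kvm_dom_state *d)
{
	if (!d->kvm || !d->map_enabled)
		return -EINVAL;
	d->map_enabled = false;
	return 0;
}
```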

The map operation is expected to be received in response to an 04
intercept of a guest RPCIT instruction; it synchronizes the host DMA
table with the guest DMA table over the specified range, pinning the
guest pages that are referenced.
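
The per-entry synchronization performed by dma_shadow_cpu_trans() reduces to five cases. This is a minimal sketch that assumes a hypothetical decoded entry type, struct pte, instead of the real ZPCI PTE bit encoding. Pinning and unpinning appear only as comments.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded view of a host DMA page-table entry. */
struct pte {
	bool valid;
	uint64_t addr;	/* host address the entry points at */
};

enum shadow_action {
	SHADOW_NOOP, SHADOW_NEW, SHADOW_DUPLICATE,
	SHADOW_REPLACE, SHADOW_INVALIDATE
};

/*
 * Classify one update as dma_shadow_cpu_trans() does. gaddr is the host
 * address of the pinned guest page, or 0 if the guest entry is invalid.
 * *refresh mirrors that function returning 1 (replace or invalidate).
 */
static enum shadow_action shadow_entry(struct pte *host, uint64_t gaddr,
				       bool *refresh)
{
	*refresh = false;
	if (host->valid) {
		if (gaddr && host->addr == gaddr)
			return SHADOW_DUPLICATE;	/* extra pin dropped */
		if (gaddr) {
			host->addr = gaddr;		/* old page unpinned */
			*refresh = true;
			return SHADOW_REPLACE;
		}
		host->valid = false;			/* old page unpinned */
		*refresh = true;
		return SHADOW_INVALIDATE;
	}
	if (gaddr) {
		host->addr = gaddr;
		host->valid = true;
		return SHADOW_NEW;
	}
	return SHADOW_NOOP;
}
```

Only the replace and invalidate cases report a change. s390_kvm_iommu_update_trans() sums these results and calls zpci_refresh_trans() for each attached device only when the sum is positive.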

Signed-off-by: Matthew Rosato <[email protected]>
---
arch/s390/include/asm/kvm_pci.h | 6 +
arch/s390/include/asm/pci_dma.h | 3 +
drivers/iommu/Kconfig | 8 +
drivers/iommu/Makefile | 1 +
drivers/iommu/s390-iommu.c | 49 ++--
drivers/iommu/s390-iommu.h | 53 ++++
drivers/iommu/s390-kvm-iommu.c | 469 ++++++++++++++++++++++++++++++++
7 files changed, 562 insertions(+), 27 deletions(-)
create mode 100644 drivers/iommu/s390-iommu.h
create mode 100644 drivers/iommu/s390-kvm-iommu.c

diff --git a/arch/s390/include/asm/kvm_pci.h b/arch/s390/include/asm/kvm_pci.h
index ae8669105f72..ebc0da5d9ac1 100644
--- a/arch/s390/include/asm/kvm_pci.h
+++ b/arch/s390/include/asm/kvm_pci.h
@@ -11,6 +11,7 @@
#define ASM_KVM_PCI_H

#include <linux/types.h>
+#include <linux/iommu.h>
#include <linux/kvm_types.h>
#include <linux/kvm_host.h>
#include <linux/kvm.h>
@@ -19,9 +20,14 @@
struct kvm_zdev {
struct zpci_dev *zdev;
struct kvm *kvm;
+ struct iommu_domain *dom; /* Used to invoke IOMMU API for RPCIT */
};

int kvm_s390_pci_dev_open(struct zpci_dev *zdev);
void kvm_s390_pci_dev_release(struct zpci_dev *zdev);

+int zpci_iommu_attach_kvm(struct zpci_dev *zdev, struct kvm *kvm);
+int zpci_iommu_kvm_assign_iota(struct zpci_dev *zdev, u64 iota);
+int zpci_iommu_kvm_remove_iota(struct zpci_dev *zdev);
+
#endif /* ASM_KVM_PCI_H */
diff --git a/arch/s390/include/asm/pci_dma.h b/arch/s390/include/asm/pci_dma.h
index 91e63426bdc5..38004e0a4383 100644
--- a/arch/s390/include/asm/pci_dma.h
+++ b/arch/s390/include/asm/pci_dma.h
@@ -50,6 +50,9 @@ enum zpci_ioat_dtype {
#define ZPCI_TABLE_ALIGN ZPCI_TABLE_SIZE
#define ZPCI_TABLE_ENTRY_SIZE (sizeof(unsigned long))
#define ZPCI_TABLE_ENTRIES (ZPCI_TABLE_SIZE / ZPCI_TABLE_ENTRY_SIZE)
+#define ZPCI_TABLE_PAGES (ZPCI_TABLE_SIZE >> PAGE_SHIFT)
+#define ZPCI_TABLE_ENTRIES_PAGES (ZPCI_TABLE_ENTRIES * ZPCI_TABLE_PAGES)
+#define ZPCI_TABLE_ENTRIES_PER_PAGE (ZPCI_TABLE_ENTRIES / ZPCI_TABLE_PAGES)

#define ZPCI_TABLE_BITS 11
#define ZPCI_PT_BITS 8
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 3eb68fa1b8cc..9637f73925ec 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -411,6 +411,14 @@ config S390_AP_IOMMU
Enables bits of IOMMU API required by VFIO. The iommu_ops
is not implemented as it is not necessary for VFIO.

+config S390_KVM_IOMMU
+ bool "S390 KVM IOMMU Support"
+ depends on S390_IOMMU && KVM || COMPILE_TEST
+ select IOMMU_API
+ help
+ Extends the S390 IOMMU API to support a domain owned and managed by
+ KVM. This allows KVM, rather than userspace, to manage nested mappings.
+
config MTK_IOMMU
tristate "MediaTek IOMMU Support"
depends on ARCH_MEDIATEK || COMPILE_TEST
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index bc7f730edbb0..5476e978d7f5 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -25,6 +25,7 @@ obj-$(CONFIG_TEGRA_IOMMU_SMMU) += tegra-smmu.o
obj-$(CONFIG_EXYNOS_IOMMU) += exynos-iommu.o
obj-$(CONFIG_FSL_PAMU) += fsl_pamu.o fsl_pamu_domain.o
obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
+obj-$(CONFIG_S390_KVM_IOMMU) += s390-kvm-iommu.o
obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o
obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
obj-$(CONFIG_IOMMU_SVA_LIB) += iommu-sva-lib.o io-pgfault.o
diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
index 73a85c599dc2..0ead37f6e232 100644
--- a/drivers/iommu/s390-iommu.c
+++ b/drivers/iommu/s390-iommu.c
@@ -11,6 +11,7 @@
#include <linux/iommu-helper.h>
#include <linux/sizes.h>
#include <asm/pci_dma.h>
+#include "s390-iommu.h"

/*
* Physically contiguous memory regions can be mapped with 4 KiB alignment,
@@ -21,24 +22,6 @@

static const struct iommu_ops s390_iommu_ops;

-struct s390_domain {
- struct iommu_domain domain;
- struct list_head devices;
- unsigned long *dma_table;
- spinlock_t dma_table_lock;
- spinlock_t list_lock;
-};
-
-struct s390_domain_device {
- struct list_head list;
- struct zpci_dev *zdev;
-};
-
-static struct s390_domain *to_s390_domain(struct iommu_domain *dom)
-{
- return container_of(dom, struct s390_domain, domain);
-}
-
static bool s390_iommu_capable(enum iommu_cap cap)
{
switch (cap) {
@@ -55,7 +38,12 @@ static struct iommu_domain *s390_domain_alloc(unsigned domain_type)
{
struct s390_domain *s390_domain;

- if (domain_type != IOMMU_DOMAIN_UNMANAGED)
+ if (domain_type != IOMMU_DOMAIN_UNMANAGED &&
+ domain_type != IOMMU_DOMAIN_KVM)
+ return NULL;
+
+ if (domain_type == IOMMU_DOMAIN_KVM &&
+ !IS_ENABLED(CONFIG_S390_KVM_IOMMU))
return NULL;

s390_domain = kzalloc(sizeof(*s390_domain), GFP_KERNEL);
@@ -68,23 +56,30 @@ static struct iommu_domain *s390_domain_alloc(unsigned domain_type)
return NULL;
}

+ /* If KVM-managed, swap in alternate ops now */
+ if (IS_ENABLED(CONFIG_S390_KVM_IOMMU) &&
+ domain_type == IOMMU_DOMAIN_KVM)
+ s390_domain->domain.ops = &s390_kvm_domain_ops;
+
spin_lock_init(&s390_domain->dma_table_lock);
spin_lock_init(&s390_domain->list_lock);
+ mutex_init(&s390_domain->kvm_dom.ioat_lock);
INIT_LIST_HEAD(&s390_domain->devices);

return &s390_domain->domain;
}

-static void s390_domain_free(struct iommu_domain *domain)
+void s390_domain_free(struct iommu_domain *domain)
{
struct s390_domain *s390_domain = to_s390_domain(domain);

dma_cleanup_tables(s390_domain->dma_table);
+ mutex_destroy(&s390_domain->kvm_dom.ioat_lock);
kfree(s390_domain);
}

-static int s390_iommu_attach_device(struct iommu_domain *domain,
- struct device *dev)
+int s390_iommu_attach_device(struct iommu_domain *domain,
+ struct device *dev)
{
struct s390_domain *s390_domain = to_s390_domain(domain);
struct zpci_dev *zdev = to_zpci_dev(dev);
@@ -143,8 +138,8 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
return rc;
}

-static void s390_iommu_detach_device(struct iommu_domain *domain,
- struct device *dev)
+void s390_iommu_detach_device(struct iommu_domain *domain,
+ struct device *dev)
{
struct s390_domain *s390_domain = to_s390_domain(domain);
struct zpci_dev *zdev = to_zpci_dev(dev);
@@ -200,7 +195,7 @@ static void s390_iommu_release_device(struct device *dev)
if (zdev && zdev->s390_domain) {
domain = iommu_get_domain_for_dev(dev);
if (domain)
- s390_iommu_detach_device(domain, dev);
+ domain->ops->detach_dev(domain, dev);
}
}

@@ -282,8 +277,8 @@ static int s390_iommu_map(struct iommu_domain *domain, unsigned long iova,
return rc;
}

-static phys_addr_t s390_iommu_iova_to_phys(struct iommu_domain *domain,
- dma_addr_t iova)
+phys_addr_t s390_iommu_iova_to_phys(struct iommu_domain *domain,
+ dma_addr_t iova)
{
struct s390_domain *s390_domain = to_s390_domain(domain);
unsigned long *sto, *pto, *rto, flags;
diff --git a/drivers/iommu/s390-iommu.h b/drivers/iommu/s390-iommu.h
new file mode 100644
index 000000000000..21c8243a36b1
--- /dev/null
+++ b/drivers/iommu/s390-iommu.h
@@ -0,0 +1,53 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * IOMMU API for s390 PCI devices
+ *
+ * Copyright IBM Corp. 2022
+ * Author(s): Matthew Rosato <[email protected]>
+ */
+
+#ifndef _S390_IOMMU_H
+#define _S390_IOMMU_H
+
+#include <linux/iommu.h>
+#include <linux/kvm_host.h>
+
+extern const struct iommu_domain_ops s390_kvm_domain_ops;
+
+struct s390_kvm_domain {
+ struct kvm *kvm;
+ unsigned long *head[ZPCI_TABLE_PAGES];
+ unsigned long **seg;
+ unsigned long ***pt;
+ struct page *(*pin)(struct kvm *kvm, gfn_t gfn);
+ void (*unpin)(kvm_pfn_t pfn);
+ struct mutex ioat_lock;
+ bool map_enabled;
+};
+
+struct s390_domain {
+ struct iommu_domain domain;
+ struct list_head devices;
+ unsigned long *dma_table;
+ spinlock_t dma_table_lock;
+ spinlock_t list_lock;
+ struct s390_kvm_domain kvm_dom;
+};
+
+struct s390_domain_device {
+ struct list_head list;
+ struct zpci_dev *zdev;
+};
+
+static inline struct s390_domain *to_s390_domain(struct iommu_domain *dom)
+{
+ return container_of(dom, struct s390_domain, domain);
+}
+
+void s390_domain_free(struct iommu_domain *domain);
+int s390_iommu_attach_device(struct iommu_domain *domain, struct device *dev);
+void s390_iommu_detach_device(struct iommu_domain *domain, struct device *dev);
+phys_addr_t s390_iommu_iova_to_phys(struct iommu_domain *domain,
+ dma_addr_t iova);
+
+#endif /* _S390_IOMMU_H */
diff --git a/drivers/iommu/s390-kvm-iommu.c b/drivers/iommu/s390-kvm-iommu.c
new file mode 100644
index 000000000000..d24e6904d5f8
--- /dev/null
+++ b/drivers/iommu/s390-kvm-iommu.c
@@ -0,0 +1,469 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * IOMMU API domain ops for s390 PCI devices using KVM passthrough
+ *
+ * Copyright IBM Corp. 2022
+ * Author(s): Matthew Rosato <[email protected]>
+ */
+
+#include <linux/pci.h>
+#include <linux/iommu.h>
+#include <linux/iommu-helper.h>
+#include <linux/sizes.h>
+#include <linux/kvm_host.h>
+#include <asm/kvm_pci.h>
+#include <asm/pci_dma.h>
+#include "s390-iommu.h"
+
+const struct iommu_domain_ops s390_kvm_domain_ops;
+
+static int dma_shadow_cpu_trans(struct s390_kvm_domain *kvm_dom,
+ unsigned long *entry, unsigned long *gentry)
+{
+ phys_addr_t gaddr = 0;
+ unsigned long idx;
+ struct page *page;
+ kvm_pfn_t pfn;
+ gpa_t addr;
+ int rc = 0;
+
+ if (pt_entry_isvalid(*gentry)) {
+ /* pin and validate */
+ addr = *gentry & ZPCI_PTE_ADDR_MASK;
+ idx = srcu_read_lock(&kvm_dom->kvm->srcu);
+ page = kvm_dom->pin(kvm_dom->kvm, gpa_to_gfn(addr));
+ srcu_read_unlock(&kvm_dom->kvm->srcu, idx);
+ if (is_error_page(page))
+ return -EIO;
+ gaddr = page_to_phys(page) + (addr & ~PAGE_MASK);
+ }
+
+ if (pt_entry_isvalid(*entry)) {
+ /* Either we are invalidating, replacing or no-op */
+ if (gaddr != 0) {
+ if ((*entry & ZPCI_PTE_ADDR_MASK) == gaddr) {
+ /* Duplicate */
+ kvm_dom->unpin(*entry >> PAGE_SHIFT);
+ } else {
+ /* Replace */
+ pfn = (*entry >> PAGE_SHIFT);
+ invalidate_pt_entry(entry);
+ set_pt_pfaa(entry, gaddr);
+ validate_pt_entry(entry);
+ kvm_dom->unpin(pfn);
+ rc = 1;
+ }
+ } else {
+ /* Invalidate */
+ pfn = (*entry >> PAGE_SHIFT);
+ invalidate_pt_entry(entry);
+ kvm_dom->unpin(pfn);
+ rc = 1;
+ }
+ } else if (gaddr != 0) {
+ /* New Entry */
+ set_pt_pfaa(entry, gaddr);
+ validate_pt_entry(entry);
+ }
+
+ return rc;
+}
+
+static unsigned long *dma_walk_guest_cpu_trans(struct s390_kvm_domain *kvm_dom,
+ dma_addr_t dma_addr)
+{
+ unsigned long *rto, *sto, *pto;
+ unsigned int rtx, rts, sx, px, idx;
+ struct page *page;
+ gpa_t addr;
+ int i;
+
+ /* Pin guest segment table if needed */
+ rtx = calc_rtx(dma_addr);
+ rto = kvm_dom->head[(rtx / ZPCI_TABLE_ENTRIES_PER_PAGE)];
+ rts = rtx * ZPCI_TABLE_PAGES;
+ if (!kvm_dom->seg[rts]) {
+ if (!reg_entry_isvalid(rto[rtx % ZPCI_TABLE_ENTRIES_PER_PAGE]))
+ return NULL;
+ sto = get_rt_sto(rto[rtx % ZPCI_TABLE_ENTRIES_PER_PAGE]);
+ addr = ((u64)sto & ZPCI_RTE_ADDR_MASK);
+ idx = srcu_read_lock(&kvm_dom->kvm->srcu);
+ for (i = 0; i < ZPCI_TABLE_PAGES; i++) {
+ page = kvm_dom->pin(kvm_dom->kvm, gpa_to_gfn(addr));
+ if (is_error_page(page)) {
+ srcu_read_unlock(&kvm_dom->kvm->srcu, idx);
+ return NULL;
+ }
+ kvm_dom->seg[rts + i] = (page_to_virt(page) +
+ (addr & ~PAGE_MASK));
+ addr += PAGE_SIZE;
+ }
+ srcu_read_unlock(&kvm_dom->kvm->srcu, idx);
+ }
+
+ /* Allocate pin pointers for another segment table if needed */
+ if (!kvm_dom->pt[rtx]) {
+ kvm_dom->pt[rtx] = kcalloc(ZPCI_TABLE_ENTRIES,
+ (sizeof(unsigned long *)),
+ GFP_KERNEL);
+ if (!kvm_dom->pt[rtx])
+ return NULL;
+ }
+ /* Pin guest page table if needed */
+ sx = calc_sx(dma_addr);
+ sto = kvm_dom->seg[(rts + (sx / ZPCI_TABLE_ENTRIES_PER_PAGE))];
+ if (!kvm_dom->pt[rtx][sx]) {
+ if (!reg_entry_isvalid(sto[sx % ZPCI_TABLE_ENTRIES_PER_PAGE]))
+ return NULL;
+ pto = get_st_pto(sto[sx % ZPCI_TABLE_ENTRIES_PER_PAGE]);
+ if (!pto)
+ return NULL;
+ addr = ((u64)pto & ZPCI_STE_ADDR_MASK);
+ idx = srcu_read_lock(&kvm_dom->kvm->srcu);
+ page = kvm_dom->pin(kvm_dom->kvm, gpa_to_gfn(addr));
+ srcu_read_unlock(&kvm_dom->kvm->srcu, idx);
+ if (is_error_page(page))
+ return NULL;
+ kvm_dom->pt[rtx][sx] = page_to_virt(page) + (addr & ~PAGE_MASK);
+ }
+ pto = kvm_dom->pt[rtx][sx];
+
+ /* Return guest PTE */
+ px = calc_px(dma_addr);
+ return &pto[px];
+}
+
+static int dma_table_shadow(struct s390_domain *s390_domain,
+ dma_addr_t dma_addr, size_t nr_pages,
+ size_t *mapped_pages)
+{
+ struct s390_kvm_domain *kvm_dom = &s390_domain->kvm_dom;
+ unsigned long *entry, *gentry;
+ int rc = 0, rc2;
+
+ for (*mapped_pages = 0; *mapped_pages < nr_pages;
+ (*mapped_pages)++, dma_addr += PAGE_SIZE) {
+ gentry = dma_walk_guest_cpu_trans(kvm_dom, dma_addr);
+ if (!gentry)
+ continue;
+ entry = dma_walk_cpu_trans(s390_domain->dma_table, dma_addr);
+
+ if (!entry)
+ return -ENOMEM;
+
+ rc2 = dma_shadow_cpu_trans(kvm_dom, entry, gentry);
+ if (rc2 < 0)
+ return -EIO;
+
+ rc += rc2;
+ }
+
+ return rc;
+}
+
+static int s390_kvm_iommu_update_trans(struct s390_domain *s390_domain,
+ dma_addr_t dma_addr, size_t nr_pages,
+ size_t *mapped)
+{
+ struct s390_domain_device *domain_device;
+ unsigned long irq_flags;
+ size_t mapped_pages;
+ int rc = 0;
+ u8 status;
+
+ mutex_lock(&s390_domain->kvm_dom.ioat_lock);
+ rc = dma_table_shadow(s390_domain, dma_addr, nr_pages, &mapped_pages);
+
+ /* If error or no new mappings, leave immediately without refresh */
+ if (rc <= 0)
+ goto exit;
+
+ spin_lock_irqsave(&s390_domain->list_lock, irq_flags);
+ list_for_each_entry(domain_device, &s390_domain->devices, list) {
+ rc = zpci_refresh_trans((u64) domain_device->zdev->fh << 32,
+ dma_addr, nr_pages * PAGE_SIZE,
+ &status);
+ if (rc) {
+ if (status == 0)
+ rc = -EINVAL;
+ else
+ rc = -EIO;
+ }
+ }
+ spin_unlock_irqrestore(&s390_domain->list_lock, irq_flags);
+
+exit:
+ if (mapped)
+ *mapped = mapped_pages << PAGE_SHIFT;
+
+ mutex_unlock(&s390_domain->kvm_dom.ioat_lock);
+ return rc;
+}
+
+static int s390_kvm_iommu_map(struct iommu_domain *domain, unsigned long iova,
+ phys_addr_t paddr, size_t size, int prot,
+ gfp_t gfp)
+{
+ struct s390_domain *s390_domain = to_s390_domain(domain);
+ size_t nr_pages;
+
+ int rc = 0;
+
+ if (!(prot & (IOMMU_READ | IOMMU_WRITE)))
+ return -EINVAL;
+
+ /* Can only perform mapping when a guest IOTA is registered */
+ if (!s390_domain->kvm_dom.map_enabled)
+ return -EINVAL;
+
+ nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;
+ if (!nr_pages)
+ return -EINVAL;
+
+ rc = s390_kvm_iommu_update_trans(s390_domain, iova, nr_pages, NULL);
+
+ return rc;
+}
+
+static int s390_kvm_iommu_map_pages(struct iommu_domain *domain,
+ unsigned long iova, phys_addr_t paddr,
+ size_t pgsize, size_t pgcount, int prot,
+ gfp_t gfp, size_t *mapped)
+{
+ struct s390_domain *s390_domain = to_s390_domain(domain);
+ size_t nr_pages;
+
+ int rc = 0;
+
+ if (!(prot & (IOMMU_READ | IOMMU_WRITE)))
+ return -EINVAL;
+
+ /* Can only perform mapping when a guest IOTA is registered */
+ if (!s390_domain->kvm_dom.map_enabled)
+ return -EINVAL;
+
+ nr_pages = pgcount * (pgsize / PAGE_SIZE);
+ if (!nr_pages)
+ return -EINVAL;
+
+ rc = s390_kvm_iommu_update_trans(s390_domain, iova, nr_pages, mapped);
+
+ return rc;
+}
+
+static void free_pt_entry(struct s390_kvm_domain *kvm_dom, int st, int pt)
+{
+ if (!kvm_dom->pt[st][pt])
+ return;
+
+ kvm_dom->unpin((u64)kvm_dom->pt[st][pt] >> PAGE_SHIFT);
+}
+
+static void free_seg_entry(struct s390_kvm_domain *kvm_dom, int entry)
+{
+ int i, st, count = 0;
+
+ for (i = 0; i < ZPCI_TABLE_PAGES; i++) {
+ if (kvm_dom->seg[entry + i]) {
+ kvm_dom->unpin((u64)kvm_dom->seg[entry + i] >> PAGE_SHIFT);
+ count++;
+ }
+ }
+
+ if (count == 0)
+ return;
+
+ st = entry / ZPCI_TABLE_PAGES;
+ for (i = 0; i < ZPCI_TABLE_ENTRIES; i++)
+ free_pt_entry(kvm_dom, st, i);
+ kfree(kvm_dom->pt[st]);
+}
+
+static int s390_kvm_clear_ioat_tables(struct s390_domain *s390_domain)
+{
+ struct s390_kvm_domain *kvm_dom = &s390_domain->kvm_dom;
+ unsigned long *entry;
+ dma_addr_t dma_addr;
+ kvm_pfn_t pfn;
+ int i;
+
+ if (!kvm_dom->kvm || !kvm_dom->map_enabled)
+ return -EINVAL;
+
+ mutex_lock(&s390_domain->kvm_dom.ioat_lock);
+
+ /* Invalidate and unpin remaining guest pages */
+ for (dma_addr = s390_domain->domain.geometry.aperture_start;
+ dma_addr < s390_domain->domain.geometry.aperture_end;
+ dma_addr += PAGE_SIZE) {
+ entry = dma_walk_cpu_trans(s390_domain->dma_table, dma_addr);
+ if (entry && pt_entry_isvalid(*entry)) {
+ pfn = (*entry >> PAGE_SHIFT);
+ invalidate_pt_entry(entry);
+ kvm_dom->unpin(pfn);
+ }
+ }
+
+ /* Unpin all shadow tables */
+ for (i = 0; i < ZPCI_TABLE_PAGES; i++) {
+ kvm_dom->unpin((u64)kvm_dom->head[i] >> PAGE_SHIFT);
+ kvm_dom->head[i] = 0;
+ }
+
+ for (i = 0; i < ZPCI_TABLE_ENTRIES_PAGES; i += ZPCI_TABLE_PAGES)
+ free_seg_entry(kvm_dom, i);
+
+ kfree(kvm_dom->seg);
+ kfree(kvm_dom->pt);
+
+ mutex_unlock(&s390_domain->kvm_dom.ioat_lock);
+
+ kvm_dom->map_enabled = false;
+
+ return 0;
+}
+
+static void s390_kvm_domain_free(struct iommu_domain *domain)
+{
+ struct s390_domain *s390_domain = to_s390_domain(domain);
+
+ s390_kvm_clear_ioat_tables(s390_domain);
+
+ if (s390_domain->kvm_dom.kvm) {
+ symbol_put(gfn_to_page);
+ symbol_put(kvm_release_pfn_dirty);
+ }
+
+ s390_domain_free(domain);
+}
+
+int zpci_iommu_attach_kvm(struct zpci_dev *zdev, struct kvm *kvm)
+{
+ struct s390_domain *s390_domain = zdev->s390_domain;
+ struct iommu_domain *domain = &s390_domain->domain;
+ struct s390_domain_device *domain_device;
+ unsigned long flags;
+ int rc = 0;
+
+ if (domain->type != IOMMU_DOMAIN_KVM)
+ return -EINVAL;
+
+ if (s390_domain->kvm_dom.kvm != 0)
+ return -EINVAL;
+
+ spin_lock_irqsave(&s390_domain->list_lock, flags);
+ list_for_each_entry(domain_device, &s390_domain->devices, list) {
+ if (domain_device->zdev->kzdev->kvm != kvm) {
+ rc = -EINVAL;
+ break;
+ }
+ domain_device->zdev->kzdev->dom = domain;
+ }
+ spin_unlock_irqrestore(&s390_domain->list_lock, flags);
+
+ if (rc)
+ return rc;
+
+ s390_domain->kvm_dom.pin = symbol_get(gfn_to_page);
+ if (!s390_domain->kvm_dom.pin)
+ return -EINVAL;
+
+ s390_domain->kvm_dom.unpin = symbol_get(kvm_release_pfn_dirty);
+ if (!s390_domain->kvm_dom.unpin) {
+ symbol_put(gfn_to_page);
+ return -EINVAL;
+ }
+
+ s390_domain->kvm_dom.kvm = kvm;
+ return 0;
+}
+EXPORT_SYMBOL_GPL(zpci_iommu_attach_kvm);
+
+int zpci_iommu_kvm_assign_iota(struct zpci_dev *zdev, u64 iota)
+{
+ struct s390_domain *s390_domain = zdev->s390_domain;
+ struct s390_kvm_domain *kvm_dom = &s390_domain->kvm_dom;
+ gpa_t gpa = (gpa_t)(iota & ZPCI_RTE_ADDR_MASK);
+ struct page *page;
+ struct kvm *kvm;
+ unsigned int idx;
+ void *iaddr;
+ int i, rc = -ENOMEM;
+
+ /* Ensure KVM associated and IOTA not already registered */
+ if (!kvm_dom->kvm || kvm_dom->map_enabled)
+ return -EINVAL;
+
+ /* Ensure supported type specified */
+ if ((iota & ZPCI_IOTA_RTTO_FLAG) != ZPCI_IOTA_RTTO_FLAG)
+ return -EINVAL;
+
+ kvm = kvm_dom->kvm;
+ mutex_lock(&s390_domain->kvm_dom.ioat_lock);
+ idx = srcu_read_lock(&kvm->srcu);
+ for (i = 0; i < ZPCI_TABLE_PAGES; i++) {
+ page = kvm_dom->pin(kvm, gpa_to_gfn(gpa));
+ if (is_error_page(page)) {
+ srcu_read_unlock(&kvm->srcu, idx);
+ rc = -EIO;
+ goto unpin;
+ }
+ iaddr = page_to_virt(page) + (gpa & ~PAGE_MASK);
+ kvm_dom->head[i] = (unsigned long *)iaddr;
+ gpa += PAGE_SIZE;
+ }
+ srcu_read_unlock(&kvm->srcu, idx);
+
+ kvm_dom->seg = kcalloc(ZPCI_TABLE_ENTRIES_PAGES,
+ sizeof(unsigned long *), GFP_KERNEL);
+ if (!kvm_dom->seg)
+ goto unpin;
+ kvm_dom->pt = kcalloc(ZPCI_TABLE_ENTRIES, sizeof(unsigned long **),
+ GFP_KERNEL);
+ if (!kvm_dom->pt)
+ goto free_seg;
+
+ mutex_unlock(&s390_domain->kvm_dom.ioat_lock);
+ kvm_dom->map_enabled = true;
+ return 0;
+
+free_seg:
+ kfree(kvm_dom->seg);
+ rc = -ENOMEM;
+unpin:
+ for (i = 0; i < ZPCI_TABLE_PAGES && kvm_dom->head[i]; i++) {
+ kvm_dom->unpin((u64)kvm_dom->head[i] >> PAGE_SHIFT);
+ kvm_dom->head[i] = 0;
+ }
+ mutex_unlock(&s390_domain->kvm_dom.ioat_lock);
+ return rc;
+}
+EXPORT_SYMBOL_GPL(zpci_iommu_kvm_assign_iota);
+
+int zpci_iommu_kvm_remove_iota(struct zpci_dev *zdev)
+{
+ struct s390_domain *s390_domain = zdev->s390_domain;
+
+ return s390_kvm_clear_ioat_tables(s390_domain);
+}
+EXPORT_SYMBOL_GPL(zpci_iommu_kvm_remove_iota);
+
+const struct iommu_domain_ops s390_kvm_domain_ops = {
+ .attach_dev = s390_iommu_attach_device,
+ .detach_dev = s390_iommu_detach_device,
+ /*
+ * All iommu mapping and unmapping operations are handled via the map
+ * ops. A map over a given range will synchronize the host and guest
+ * DMA tables, performing the necessary mappings / unmappings to
+ * synchronize the table states.
+ * Partial mapping failures do not require a rewind; the guest will
+ * receive an indication that triggers a global refresh of the
+ * tables.
+ */
+ .map = s390_kvm_iommu_map,
+ .map_pages = s390_kvm_iommu_map_pages,
+ .unmap = NULL,
+ .unmap_pages = NULL,
+ .iova_to_phys = s390_iommu_iova_to_phys,
+ .free = s390_kvm_domain_free,
+};
--
2.27.0
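
For illustration only, the synchronization described in the s390_kvm_domain_ops comment (walk a range, make each host shadow entry match the guest entry, pin and unpin as needed, and stop without rewinding on failure) can be modeled in plain C. This is a sketch under simplifying assumptions: a flat one-level table instead of the real region/segment/page tables, host entries that store the guest value rather than a host pfn, and hypothetical model_pin()/model_unpin() helpers standing in for the gfn_to_page()/kvm_release_pfn_dirty() symbols the domain obtains. It is not the s390_kvm_iommu_map() implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flat model: one entry per IOVA page, 0 means invalid. */
#define NR_ENTRIES 8

struct table_model {
	uint64_t guest[NR_ENTRIES]; /* guest-owned DMA table */
	uint64_t host[NR_ENTRIES];  /* host shadow table used by the device */
	int pinned;                 /* pages currently pinned by the model */
};

/* While true, every pin attempt fails. */
static bool fail_pin;

/* Stand-in for pinning a guest page; returns 0 on success. */
static int model_pin(struct table_model *t)
{
	if (fail_pin)
		return -1;
	t->pinned++;
	return 0;
}

/* Stand-in for releasing a previously pinned page. */
static void model_unpin(struct table_model *t)
{
	t->pinned--;
}

/*
 * Make host[first, first + count) match guest[]. Returns 0 on success or
 * -1 on the first pin failure; entries handled before the failure are
 * left as they are (no rewind).
 */
static int sync_range(struct table_model *t, size_t first, size_t count)
{
	assert(first <= NR_ENTRIES);
	for (size_t i = first; i < first + count && i < NR_ENTRIES; i++) {
		if (t->host[i] == t->guest[i])
			continue;              /* already in sync */
		if (t->host[i]) {              /* stale or changed mapping */
			t->host[i] = 0;
			model_unpin(t);
		}
		if (t->guest[i]) {             /* new mapping requested */
			if (model_pin(t))
				return -1;
			t->host[i] = t->guest[i];
		}
	}
	return 0;
}
```

Leaving already-synchronized entries in place on failure follows the comment's reasoning: the guest is told the refresh failed and performs a global refresh, which in this model amounts to calling sync_range() again over the whole table.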

2022-03-17 04:08:50

by Matthew Rosato

Subject: [PATCH v4 05/32] s390/airq: pass more TPI info to airq handlers

A subsequent patch will introduce an airq handler that requires additional
TPI information beyond directed vs. floating, so pass the entire tpi_info
structure to the handler. Only PCI actually uses this information today;
for the other airq handlers this is effectively a no-op.
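
A reduced model of the new calling convention, assuming cut-down stand-in types (tpi_info_model and airq_model are hypothetical, not the real struct tpi_info / struct airq_struct): the dispatcher hands every handler the whole TPI information block, and a handler that still only cares about directed vs. floating derives the old flag itself, as zpci_directed_irq_handler() does in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cut-down stand-in for struct tpi_info. */
struct tpi_info_model {
	unsigned int directed_irq : 1;
	unsigned int isc : 3;
};

/* Hypothetical cut-down stand-in for struct airq_struct. */
struct airq_model {
	void (*handler)(struct airq_model *airq,
			struct tpi_info_model *tpi_info);
	int floating_seen;
	int directed_seen;
};

/* Like zpci_directed_irq_handler(): derive the old bool locally. */
static void pci_like_handler(struct airq_model *airq,
			     struct tpi_info_model *tpi_info)
{
	bool floating = !tpi_info->directed_irq;

	if (floating)
		airq->floating_seen++;
	else
		airq->directed_seen++;
}

/* Like do_airq_interrupt(): pass the whole structure, not a bool. */
static void dispatch(struct airq_model *airq,
		     struct tpi_info_model *tpi_info)
{
	assert(airq->handler != NULL && tpi_info != NULL);
	airq->handler(airq, tpi_info);
}
```

Handlers that ignore the extra information, such as ap_interrupt_handler() or tiqdio_thinint_handler(), only need their prototype changed.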

Reviewed-by: Eric Farman <[email protected]>
Reviewed-by: Claudio Imbrenda <[email protected]>
Reviewed-by: Pierre Morel <[email protected]>
Reviewed-by: Thomas Huth <[email protected]>
Acked-by: Christian Borntraeger <[email protected]>
Acked-by: Cornelia Huck <[email protected]>
Signed-off-by: Matthew Rosato <[email protected]>
---
arch/s390/include/asm/airq.h | 3 ++-
arch/s390/kvm/interrupt.c | 4 +++-
arch/s390/pci/pci_irq.c | 9 +++++++--
drivers/s390/cio/airq.c | 2 +-
drivers/s390/cio/qdio_thinint.c | 6 ++++--
drivers/s390/crypto/ap_bus.c | 9 ++++++---
drivers/s390/virtio/virtio_ccw.c | 4 +++-
7 files changed, 26 insertions(+), 11 deletions(-)

diff --git a/arch/s390/include/asm/airq.h b/arch/s390/include/asm/airq.h
index 01936fdfaddb..7918a7d09028 100644
--- a/arch/s390/include/asm/airq.h
+++ b/arch/s390/include/asm/airq.h
@@ -12,10 +12,11 @@

#include <linux/bit_spinlock.h>
#include <linux/dma-mapping.h>
+#include <asm/tpi.h>

struct airq_struct {
struct hlist_node list; /* Handler queueing. */
- void (*handler)(struct airq_struct *airq, bool floating);
+ void (*handler)(struct airq_struct *airq, struct tpi_info *tpi_info);
u8 *lsi_ptr; /* Local-Summary-Indicator pointer */
u8 lsi_mask; /* Local-Summary-Indicator mask */
u8 isc; /* Interrupt-subclass */
diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index db933c252dbc..65e75ca2fc5d 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -28,6 +28,7 @@
#include <asm/switch_to.h>
#include <asm/nmi.h>
#include <asm/airq.h>
+#include <asm/tpi.h>
#include "kvm-s390.h"
#include "gaccess.h"
#include "trace-s390.h"
@@ -3269,7 +3270,8 @@ int kvm_s390_gisc_unregister(struct kvm *kvm, u32 gisc)
}
EXPORT_SYMBOL_GPL(kvm_s390_gisc_unregister);

-static void gib_alert_irq_handler(struct airq_struct *airq, bool floating)
+static void gib_alert_irq_handler(struct airq_struct *airq,
+ struct tpi_info *tpi_info)
{
inc_irq_stat(IRQIO_GAL);
process_gib_alert_list();
diff --git a/arch/s390/pci/pci_irq.c b/arch/s390/pci/pci_irq.c
index 2b6062c486f5..cc4c8d7c8f5c 100644
--- a/arch/s390/pci/pci_irq.c
+++ b/arch/s390/pci/pci_irq.c
@@ -11,6 +11,7 @@

#include <asm/isc.h>
#include <asm/airq.h>
+#include <asm/tpi.h>

static enum {FLOATING, DIRECTED} irq_delivery;

@@ -216,8 +217,11 @@ static void zpci_handle_fallback_irq(void)
}
}

-static void zpci_directed_irq_handler(struct airq_struct *airq, bool floating)
+static void zpci_directed_irq_handler(struct airq_struct *airq,
+ struct tpi_info *tpi_info)
{
+ bool floating = !tpi_info->directed_irq;
+
if (floating) {
inc_irq_stat(IRQIO_PCF);
zpci_handle_fallback_irq();
@@ -227,7 +231,8 @@ static void zpci_directed_irq_handler(struct airq_struct *airq, bool floating)
}
}

-static void zpci_floating_irq_handler(struct airq_struct *airq, bool floating)
+static void zpci_floating_irq_handler(struct airq_struct *airq,
+ struct tpi_info *tpi_info)
{
unsigned long si, ai;
struct airq_iv *aibv;
diff --git a/drivers/s390/cio/airq.c b/drivers/s390/cio/airq.c
index e56535c99888..2f2226786319 100644
--- a/drivers/s390/cio/airq.c
+++ b/drivers/s390/cio/airq.c
@@ -99,7 +99,7 @@ static irqreturn_t do_airq_interrupt(int irq, void *dummy)
rcu_read_lock();
hlist_for_each_entry_rcu(airq, head, list)
if ((*airq->lsi_ptr & airq->lsi_mask) != 0)
- airq->handler(airq, !tpi_info->directed_irq);
+ airq->handler(airq, tpi_info);
rcu_read_unlock();

return IRQ_HANDLED;
diff --git a/drivers/s390/cio/qdio_thinint.c b/drivers/s390/cio/qdio_thinint.c
index 8e09bf3a2fcd..9b9335dd06db 100644
--- a/drivers/s390/cio/qdio_thinint.c
+++ b/drivers/s390/cio/qdio_thinint.c
@@ -15,6 +15,7 @@
#include <asm/qdio.h>
#include <asm/airq.h>
#include <asm/isc.h>
+#include <asm/tpi.h>

#include "cio.h"
#include "ioasm.h"
@@ -93,9 +94,10 @@ static inline u32 clear_shared_ind(void)
/**
* tiqdio_thinint_handler - thin interrupt handler for qdio
* @airq: pointer to adapter interrupt descriptor
- * @floating: flag to recognize floating vs. directed interrupts (unused)
+ * @tpi_info: interrupt information (e.g. floating vs directed -- unused)
*/
-static void tiqdio_thinint_handler(struct airq_struct *airq, bool floating)
+static void tiqdio_thinint_handler(struct airq_struct *airq,
+ struct tpi_info *tpi_info)
{
u64 irq_time = S390_lowcore.int_clock;
u32 si_used = clear_shared_ind();
diff --git a/drivers/s390/crypto/ap_bus.c b/drivers/s390/crypto/ap_bus.c
index 1986243f9cd3..df1a038442db 100644
--- a/drivers/s390/crypto/ap_bus.c
+++ b/drivers/s390/crypto/ap_bus.c
@@ -27,6 +27,7 @@
#include <linux/kthread.h>
#include <linux/mutex.h>
#include <asm/airq.h>
+#include <asm/tpi.h>
#include <linux/atomic.h>
#include <asm/isc.h>
#include <linux/hrtimer.h>
@@ -129,7 +130,8 @@ static int ap_max_adapter_id = 63;
static struct bus_type ap_bus_type;

/* Adapter interrupt definitions */
-static void ap_interrupt_handler(struct airq_struct *airq, bool floating);
+static void ap_interrupt_handler(struct airq_struct *airq,
+ struct tpi_info *tpi_info);

static bool ap_irq_flag;

@@ -442,9 +444,10 @@ static enum hrtimer_restart ap_poll_timeout(struct hrtimer *unused)
/**
* ap_interrupt_handler() - Schedule ap_tasklet on interrupt
* @airq: pointer to adapter interrupt descriptor
- * @floating: ignored
+ * @tpi_info: ignored
*/
-static void ap_interrupt_handler(struct airq_struct *airq, bool floating)
+static void ap_interrupt_handler(struct airq_struct *airq,
+ struct tpi_info *tpi_info)
{
inc_irq_stat(IRQIO_APB);
tasklet_schedule(&ap_tasklet);
diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
index d35e7a3f7067..52c376d15978 100644
--- a/drivers/s390/virtio/virtio_ccw.c
+++ b/drivers/s390/virtio/virtio_ccw.c
@@ -33,6 +33,7 @@
#include <asm/virtio-ccw.h>
#include <asm/isc.h>
#include <asm/airq.h>
+#include <asm/tpi.h>

/*
* virtio related functions
@@ -203,7 +204,8 @@ static void drop_airq_indicator(struct virtqueue *vq, struct airq_info *info)
write_unlock_irqrestore(&info->lock, flags);
}

-static void virtio_airq_handler(struct airq_struct *airq, bool floating)
+static void virtio_airq_handler(struct airq_struct *airq,
+ struct tpi_info *tpi_info)
{
struct airq_info *info = container_of(airq, struct airq_info, airq);
unsigned long ai;
--
2.27.0

2022-03-17 04:09:01

by Matthew Rosato

Subject: [PATCH v4 19/32] KVM: s390: pci: do initial setup for AEN interpretation

Perform the initial setup for Adapter Event Notification (AEN)
Interpretation for zPCI passthrough devices. Specifically, allocate a
structure for forwarding adapter events and pass the address of this
structure to firmware.
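
The comment in zpci_reset_aipb() explains that AEN registration can happen only once per boot, so the aipb lives in the base kernel (zpci_aipb) and is reused if the KVM module is reloaded. A minimal userspace model of that register-once, validate-on-reuse pattern follows; MODEL_NR_DEVICES, aipb_model, and the firmware_registrations counter are hypothetical stand-ins for ZPCI_NR_DEVICES, union zpci_sic_iib, and the zpci_set_irq_ctrl(SIC_SET_AENI_CONTROLS, ...) call.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for ZPCI_NR_DEVICES. */
#define MODEL_NR_DEVICES 512

/* The two aipb fields that zpci_reset_aipb() validates. */
struct aipb_model {
	uint16_t faal; /* set to the device count, as in zpci_setup_aipb() */
	uint8_t afi;   /* interruption subclass (nisc), 3 bits wide */
};

/* Lives in the "base kernel", so it survives module reloads. */
static struct aipb_model *preserved_aipb;
/* Counts the one-time firmware registration. */
static int firmware_registrations;

static int model_aen_init(uint8_t nisc)
{
	assert(nisc < 8); /* afi is a 3-bit field */

	if (!preserved_aipb) {
		/* First load since boot: build the block and register it. */
		preserved_aipb = calloc(1, sizeof(*preserved_aipb));
		if (!preserved_aipb)
			return -ENOMEM;
		preserved_aipb->faal = MODEL_NR_DEVICES;
		preserved_aipb->afi = nisc;
		firmware_registrations++;
		return 0;
	}

	/* Reload: registration cannot be repeated, so contents must match. */
	if (preserved_aipb->faal != MODEL_NR_DEVICES ||
	    preserved_aipb->afi != nisc)
		return -EINVAL;
	return 0;
}
```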

Signed-off-by: Matthew Rosato <[email protected]>
---
arch/s390/include/asm/pci.h | 4 +
arch/s390/include/asm/pci_insn.h | 12 +++
arch/s390/kvm/interrupt.c | 14 +++
arch/s390/kvm/kvm-s390.c | 9 ++
arch/s390/kvm/pci.c | 154 +++++++++++++++++++++++++++++++
arch/s390/kvm/pci.h | 42 +++++++++
arch/s390/pci/pci.c | 6 ++
7 files changed, 241 insertions(+)
create mode 100644 arch/s390/kvm/pci.h

diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
index 4faff673078b..1ae49330d1c8 100644
--- a/arch/s390/include/asm/pci.h
+++ b/arch/s390/include/asm/pci.h
@@ -9,6 +9,7 @@
#include <asm-generic/pci.h>
#include <asm/pci_clp.h>
#include <asm/pci_debug.h>
+#include <asm/pci_insn.h>
#include <asm/sclp.h>

#define PCIBIOS_MIN_IO 0x1000
@@ -204,6 +205,9 @@ extern const struct attribute_group *zpci_attr_groups[];
extern unsigned int s390_pci_force_floating __initdata;
extern unsigned int s390_pci_no_rid;

+extern union zpci_sic_iib *zpci_aipb;
+extern struct airq_iv *zpci_aif_sbv;
+
/* -----------------------------------------------------------------------------
Prototypes
----------------------------------------------------------------------------- */
diff --git a/arch/s390/include/asm/pci_insn.h b/arch/s390/include/asm/pci_insn.h
index 32759c407b8f..ad9000295c82 100644
--- a/arch/s390/include/asm/pci_insn.h
+++ b/arch/s390/include/asm/pci_insn.h
@@ -101,6 +101,7 @@ struct zpci_fib {
/* Set Interruption Controls Operation Controls */
#define SIC_IRQ_MODE_ALL 0
#define SIC_IRQ_MODE_SINGLE 1
+#define SIC_SET_AENI_CONTROLS 2
#define SIC_IRQ_MODE_DIRECT 4
#define SIC_IRQ_MODE_D_ALL 16
#define SIC_IRQ_MODE_D_SINGLE 17
@@ -127,9 +128,20 @@ struct zpci_cdiib {
u64 : 64;
} __packed __aligned(8);

+/* adapter interruption parameters block */
+struct zpci_aipb {
+ u64 faisb;
+ u64 gait;
+ u16 : 13;
+ u16 afi : 3;
+ u32 : 32;
+ u16 faal;
+} __packed __aligned(8);
+
union zpci_sic_iib {
struct zpci_diib diib;
struct zpci_cdiib cdiib;
+ struct zpci_aipb aipb;
};

DECLARE_STATIC_KEY_FALSE(have_mio);
diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index 65e75ca2fc5d..17c7deb516d2 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -32,6 +32,7 @@
#include "kvm-s390.h"
#include "gaccess.h"
#include "trace-s390.h"
+#include "pci.h"

#define PFAULT_INIT 0x0600
#define PFAULT_DONE 0x0680
@@ -3286,6 +3287,11 @@ void kvm_s390_gib_destroy(void)
{
if (!gib)
return;
+ if (IS_ENABLED(CONFIG_VFIO_PCI) && sclp.has_aeni && aift) {
+ mutex_lock(&aift->aift_lock);
+ kvm_s390_pci_aen_exit();
+ mutex_unlock(&aift->aift_lock);
+ }
chsc_sgib(0);
unregister_adapter_interrupt(&gib_alert_irq);
free_page((unsigned long)gib);
@@ -3323,6 +3329,14 @@ int kvm_s390_gib_init(u8 nisc)
goto out_unreg_gal;
}

+ if (IS_ENABLED(CONFIG_VFIO_PCI) && sclp.has_aeni) {
+ if (kvm_s390_pci_aen_init(nisc)) {
+ pr_err("Initializing AEN for PCI failed\n");
+ rc = -EIO;
+ goto out_unreg_gal;
+ }
+ }
+
KVM_EVENT(3, "gib 0x%pK (nisc=%d) initialized", gib, gib->nisc);
goto out;

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 2296b1ff1e02..d89cd16b57dd 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -48,6 +48,7 @@
#include <asm/fpu/api.h>
#include "kvm-s390.h"
#include "gaccess.h"
+#include "pci.h"

#define CREATE_TRACE_POINTS
#include "trace.h"
@@ -503,6 +504,14 @@ int kvm_arch_init(void *opaque)
goto out;
}

+ if (IS_ENABLED(CONFIG_VFIO_PCI)) {
+ rc = kvm_s390_pci_init();
+ if (rc) {
+ pr_err("Unable to allocate AIFT for PCI\n");
+ goto out;
+ }
+ }
+
rc = kvm_s390_gib_init(GAL_ISC);
if (rc)
goto out;
diff --git a/arch/s390/kvm/pci.c b/arch/s390/kvm/pci.c
index 612faf87126d..1c42d25de697 100644
--- a/arch/s390/kvm/pci.c
+++ b/arch/s390/kvm/pci.c
@@ -10,6 +10,148 @@
#include <linux/kvm_host.h>
#include <linux/pci.h>
#include <asm/kvm_pci.h>
+#include <asm/pci.h>
+#include <asm/pci_insn.h>
+#include "pci.h"
+
+struct zpci_aift *aift;
+
+static inline int __set_irq_noiib(u16 ctl, u8 isc)
+{
+ union zpci_sic_iib iib = {{0}};
+
+ return zpci_set_irq_ctrl(ctl, isc, &iib);
+}
+
+/* Caller must hold the aift lock before calling this function */
+void kvm_s390_pci_aen_exit(void)
+{
+ unsigned long flags;
+ struct kvm_zdev **gait_kzdev;
+
+ /*
+ * Contents of the aipb remain registered for the life of the host
+ * kernel; the information is preserved in zpci_aipb and zpci_aif_sbv
+ * in case the KVM module is inserted again later. Clear the AIFT
+ * information and free anything not registered with the underlying
+ * firmware.
+ */
+ spin_lock_irqsave(&aift->gait_lock, flags);
+ gait_kzdev = aift->kzdev;
+ aift->gait = 0;
+ aift->sbv = 0;
+ aift->kzdev = 0;
+ spin_unlock_irqrestore(&aift->gait_lock, flags);
+
+ kfree(gait_kzdev);
+}
+
+static int zpci_setup_aipb(u8 nisc)
+{
+ struct page *page;
+ int size, rc;
+
+ zpci_aipb = kzalloc(sizeof(union zpci_sic_iib), GFP_KERNEL);
+ if (!zpci_aipb)
+ return -ENOMEM;
+
+ aift->sbv = airq_iv_create(ZPCI_NR_DEVICES, AIRQ_IV_ALLOC, 0);
+ if (!aift->sbv) {
+ rc = -ENOMEM;
+ goto free_aipb;
+ }
+ zpci_aif_sbv = aift->sbv;
+ size = get_order(PAGE_ALIGN(ZPCI_NR_DEVICES *
+ sizeof(struct zpci_gaite)));
+ page = alloc_pages(GFP_KERNEL | __GFP_ZERO, size);
+ if (!page) {
+ rc = -ENOMEM;
+ goto free_sbv;
+ }
+ aift->gait = (struct zpci_gaite *)page_to_phys(page);
+
+ zpci_aipb->aipb.faisb = virt_to_phys(aift->sbv->vector);
+ zpci_aipb->aipb.gait = virt_to_phys(aift->gait);
+ zpci_aipb->aipb.afi = nisc;
+ zpci_aipb->aipb.faal = ZPCI_NR_DEVICES;
+
+ /* Set up Adapter Event Notification Interpretation */
+ if (zpci_set_irq_ctrl(SIC_SET_AENI_CONTROLS, 0, zpci_aipb)) {
+ rc = -EIO;
+ goto free_gait;
+ }
+
+ return 0;
+
+free_gait:
+ size = get_order(PAGE_ALIGN(ZPCI_NR_DEVICES *
+ sizeof(struct zpci_gaite)));
+ free_pages((unsigned long)aift->gait, size);
+free_sbv:
+ airq_iv_release(aift->sbv);
+ zpci_aif_sbv = 0;
+free_aipb:
+ kfree(zpci_aipb);
+ zpci_aipb = 0;
+
+ return rc;
+}
+
+static int zpci_reset_aipb(u8 nisc)
+{
+ /*
+ * AEN registration can only happen once per system boot. If
+ * an aipb already exists then AEN was already registered and
+ * we can re-use the aipb contents. This can only happen if
+ * the KVM module was removed and re-inserted.
+ */
+ if (zpci_aipb->aipb.faal != ZPCI_NR_DEVICES ||
+ zpci_aipb->aipb.afi != nisc) {
+ return -EINVAL;
+ }
+ aift->sbv = zpci_aif_sbv;
+ aift->gait = (struct zpci_gaite *)zpci_aipb->aipb.gait;
+
+ return 0;
+}
+
+int kvm_s390_pci_aen_init(u8 nisc)
+{
+ int rc = 0;
+
+ /* If already enabled for AEN, bail out now */
+ if (aift->gait || aift->sbv)
+ return -EPERM;
+
+ mutex_lock(&aift->aift_lock);
+ aift->kzdev = kcalloc(ZPCI_NR_DEVICES, sizeof(struct kvm_zdev *),
+ GFP_KERNEL);
+ if (!aift->kzdev) {
+ rc = -ENOMEM;
+ goto unlock;
+ }
+
+ if (!zpci_aipb)
+ rc = zpci_setup_aipb(nisc);
+ else
+ rc = zpci_reset_aipb(nisc);
+ if (rc)
+ goto free_zdev;
+
+ /* Enable floating IRQs */
+ if (__set_irq_noiib(SIC_IRQ_MODE_SINGLE, nisc)) {
+ rc = -EIO;
+ kvm_s390_pci_aen_exit();
+ }
+
+ goto unlock;
+
+free_zdev:
+ kfree(aift->kzdev);
+unlock:
+ mutex_unlock(&aift->aift_lock);
+ return rc;
+}

int kvm_s390_pci_dev_open(struct zpci_dev *zdev)
{
@@ -36,3 +178,15 @@ void kvm_s390_pci_dev_release(struct zpci_dev *zdev)
kfree(kzdev);
}
EXPORT_SYMBOL_GPL(kvm_s390_pci_dev_release);
+
+int kvm_s390_pci_init(void)
+{
+ aift = kzalloc(sizeof(struct zpci_aift), GFP_KERNEL);
+ if (!aift)
+ return -ENOMEM;
+
+ spin_lock_init(&aift->gait_lock);
+ mutex_init(&aift->aift_lock);
+
+ return 0;
+}
diff --git a/arch/s390/kvm/pci.h b/arch/s390/kvm/pci.h
new file mode 100644
index 000000000000..19609d7a53a7
--- /dev/null
+++ b/arch/s390/kvm/pci.h
@@ -0,0 +1,42 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * s390 kvm PCI passthrough support
+ *
+ * Copyright IBM Corp. 2022
+ *
+ * Author(s): Matthew Rosato <[email protected]>
+ */
+
+#ifndef __KVM_S390_PCI_H
+#define __KVM_S390_PCI_H
+
+#include <linux/pci.h>
+#include <linux/mutex.h>
+#include <asm/airq.h>
+#include <asm/kvm_pci.h>
+
+struct zpci_gaite {
+ u32 gisa;
+ u8 gisc;
+ u8 count;
+ u8 reserved;
+ u8 aisbo;
+ u64 aisb;
+};
+
+struct zpci_aift {
+ struct zpci_gaite *gait;
+ struct airq_iv *sbv;
+ struct kvm_zdev **kzdev;
+ spinlock_t gait_lock; /* Protects the gait, used during AEN forward */
+ struct mutex aift_lock; /* Protects the other structures in aift */
+};
+
+extern struct zpci_aift *aift;
+
+int kvm_s390_pci_aen_init(u8 nisc);
+void kvm_s390_pci_aen_exit(void);
+
+int kvm_s390_pci_init(void);
+
+#endif /* __KVM_S390_PCI_H */
diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
index 04c16312ad54..13033717cd4e 100644
--- a/arch/s390/pci/pci.c
+++ b/arch/s390/pci/pci.c
@@ -61,6 +61,12 @@ DEFINE_STATIC_KEY_FALSE(have_mio);

static struct kmem_cache *zdev_fmb_cache;

+/* AEN structures that must be preserved over KVM module re-insertion */
+union zpci_sic_iib *zpci_aipb;
+EXPORT_SYMBOL_GPL(zpci_aipb);
+struct airq_iv *zpci_aif_sbv;
+EXPORT_SYMBOL_GPL(zpci_aif_sbv);
+
struct zpci_dev *get_zdev_by_fid(u32 fid)
{
struct zpci_dev *tmp, *zdev = NULL;
--
2.27.0

2022-03-17 04:12:05

by Matthew Rosato

Subject: [PATCH v4 32/32] MAINTAINERS: update s390 IOMMU entry

Use a wildcard to pick up the new files added by KVM domain support.

Signed-off-by: Matthew Rosato <[email protected]>
---
MAINTAINERS | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 6c76eb66b10a..d803f490eafb 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16867,7 +16867,7 @@ M: Gerald Schaefer <[email protected]>
L: [email protected]
S: Supported
W: http://www.ibm.com/developerworks/linux/linux390/
-F: drivers/iommu/s390-iommu.c
+F: drivers/iommu/s390*

S390 IUCV NETWORK LAYER
M: Alexandra Winter <[email protected]>
--
2.27.0

2022-03-17 04:12:21

by Matthew Rosato

Subject: [PATCH v4 09/32] s390/pci: export some routines related to RPCIT processing

KVM will reuse dma_walk_cpu_trans() to walk the host shadow table and
will also need to call zpci_refresh_trans() to re-issue an RPCIT.
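
As background for the dma_walk_cpu_trans() export, the walk splits a DMA address into a region-table, segment-table, and page-table index. A sketch of that split, assuming the constant values from arch/s390/include/asm/pci_dma.h with 4 KiB pages (copied here so the example is self-contained; verify against your tree):

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match arch/s390/include/asm/pci_dma.h (4 KiB pages). */
#define PAGE_SHIFT      12
#define ZPCI_TABLE_BITS 11
#define ZPCI_PT_BITS    8
#define ZPCI_ST_SHIFT   (ZPCI_PT_BITS + PAGE_SHIFT)       /* 20 */
#define ZPCI_RT_SHIFT   (ZPCI_ST_SHIFT + ZPCI_TABLE_BITS) /* 31 */
#define ZPCI_INDEX_MASK ((1u << ZPCI_TABLE_BITS) - 1)     /* 0x7ff */
#define ZPCI_PT_MASK    ((1u << ZPCI_PT_BITS) - 1)        /* 0xff */

struct dma_indices {
	unsigned int rtx; /* region-table index */
	unsigned int sx;  /* segment-table index */
	unsigned int px;  /* page-table index */
};

/* Assumed to mirror calc_rtx(), calc_sx() and calc_px(). */
static struct dma_indices split_dma_addr(uint64_t dma_addr)
{
	struct dma_indices idx = {
		.rtx = (unsigned int)((dma_addr >> ZPCI_RT_SHIFT) & ZPCI_INDEX_MASK),
		.sx  = (unsigned int)((dma_addr >> ZPCI_ST_SHIFT) & ZPCI_INDEX_MASK),
		.px  = (unsigned int)((dma_addr >> PAGE_SHIFT) & ZPCI_PT_MASK),
	};

	return idx;
}
```

The low PAGE_SHIFT bits are the offset within the page and are not part of any index.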

Reviewed-by: Niklas Schnelle <[email protected]>
Reviewed-by: Pierre Morel <[email protected]>
Acked-by: Christian Borntraeger <[email protected]>
Signed-off-by: Matthew Rosato <[email protected]>
---
arch/s390/pci/pci_dma.c | 1 +
arch/s390/pci/pci_insn.c | 1 +
2 files changed, 2 insertions(+)

diff --git a/arch/s390/pci/pci_dma.c b/arch/s390/pci/pci_dma.c
index f46833a25526..a81de48d5ea7 100644
--- a/arch/s390/pci/pci_dma.c
+++ b/arch/s390/pci/pci_dma.c
@@ -116,6 +116,7 @@ unsigned long *dma_walk_cpu_trans(unsigned long *rto, dma_addr_t dma_addr)
px = calc_px(dma_addr);
return &pto[px];
}
+EXPORT_SYMBOL_GPL(dma_walk_cpu_trans);

void dma_update_cpu_trans(unsigned long *entry, phys_addr_t page_addr, int flags)
{
diff --git a/arch/s390/pci/pci_insn.c b/arch/s390/pci/pci_insn.c
index 2a47b3936e44..0509554301c7 100644
--- a/arch/s390/pci/pci_insn.c
+++ b/arch/s390/pci/pci_insn.c
@@ -95,6 +95,7 @@ int zpci_refresh_trans(u64 fn, u64 addr, u64 range)

return (cc) ? -EIO : 0;
}
+EXPORT_SYMBOL_GPL(zpci_refresh_trans);

/* Set Interruption Controls */
int zpci_set_irq_ctrl(u16 ctl, u8 isc, union zpci_sic_iib *iib)
--
2.27.0

2022-03-17 04:40:43

by Matthew Rosato

Subject: [PATCH v4 17/32] KVM: s390: pci: add basic kvm_zdev structure

This structure will be used to carry KVM passthrough information related to
zPCI devices.
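
The kvm_zdev lifetime introduced here is a simple linked pair: kvm_s390_pci_dev_open() allocates it and points both structures at each other, and kvm_s390_pci_dev_release() checks the back-pointer, unlinks, and frees. A userspace model of that pattern, with hypothetical reduced struct names (zdev_model, kzdev_model) and assert() standing in for WARN_ON() (note that WARN_ON() only warns, whereas assert() aborts):

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct zdev_model;

/* Hypothetical reduced stand-in for struct kvm_zdev. */
struct kzdev_model {
	struct zdev_model *zdev; /* back-pointer to the PCI function */
	void *kvm;               /* owning VM, filled in on association */
};

/* Hypothetical reduced stand-in for struct zpci_dev. */
struct zdev_model {
	struct kzdev_model *kzdev; /* NULL while not passed through */
};

static int model_dev_open(struct zdev_model *zdev)
{
	struct kzdev_model *kzdev = calloc(1, sizeof(*kzdev));

	if (!kzdev)
		return -ENOMEM;
	kzdev->zdev = zdev;
	zdev->kzdev = kzdev;
	return 0;
}

static void model_dev_release(struct zdev_model *zdev)
{
	struct kzdev_model *kzdev = zdev->kzdev;

	assert(kzdev && kzdev->zdev == zdev); /* plays the WARN_ON() role */
	zdev->kzdev = NULL;
	free(kzdev);
}
```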

Signed-off-by: Matthew Rosato <[email protected]>
---
arch/s390/include/asm/kvm_pci.h | 27 +++++++++++++++++++++++
arch/s390/include/asm/pci.h | 3 +++
arch/s390/kvm/Makefile | 1 +
arch/s390/kvm/pci.c | 38 +++++++++++++++++++++++++++++++++
4 files changed, 69 insertions(+)
create mode 100644 arch/s390/include/asm/kvm_pci.h
create mode 100644 arch/s390/kvm/pci.c

diff --git a/arch/s390/include/asm/kvm_pci.h b/arch/s390/include/asm/kvm_pci.h
new file mode 100644
index 000000000000..ae8669105f72
--- /dev/null
+++ b/arch/s390/include/asm/kvm_pci.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * KVM PCI Passthrough for virtual machines on s390
+ *
+ * Copyright IBM Corp. 2022
+ *
+ * Author(s): Matthew Rosato <[email protected]>
+ */
+
+#ifndef ASM_KVM_PCI_H
+#define ASM_KVM_PCI_H
+
+#include <linux/types.h>
+#include <linux/kvm_types.h>
+#include <linux/kvm_host.h>
+#include <linux/kvm.h>
+#include <linux/pci.h>
+
+struct kvm_zdev {
+ struct zpci_dev *zdev;
+ struct kvm *kvm;
+};
+
+int kvm_s390_pci_dev_open(struct zpci_dev *zdev);
+void kvm_s390_pci_dev_release(struct zpci_dev *zdev);
+
+#endif /* ASM_KVM_PCI_H */
diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
index e8a3fd5bc169..4faff673078b 100644
--- a/arch/s390/include/asm/pci.h
+++ b/arch/s390/include/asm/pci.h
@@ -97,6 +97,7 @@ struct zpci_bar_struct {
};

struct s390_domain;
+struct kvm_zdev;

#define ZPCI_FUNCTIONS_PER_BUS 256
struct zpci_bus {
@@ -190,6 +191,8 @@ struct zpci_dev {
struct dentry *debugfs_dev;

struct s390_domain *s390_domain; /* s390 IOMMU domain data */
+
+ struct kvm_zdev *kzdev; /* passthrough data */
};

static inline bool zdev_enabled(struct zpci_dev *zdev)
diff --git a/arch/s390/kvm/Makefile b/arch/s390/kvm/Makefile
index 26f4a74e5ce4..00cf6853d93f 100644
--- a/arch/s390/kvm/Makefile
+++ b/arch/s390/kvm/Makefile
@@ -10,4 +10,5 @@ ccflags-y := -Ivirt/kvm -Iarch/s390/kvm
kvm-y += kvm-s390.o intercept.o interrupt.o priv.o sigp.o
kvm-y += diag.o gaccess.o guestdbg.o vsie.o pv.o

+kvm-$(CONFIG_PCI) += pci.o
obj-$(CONFIG_KVM) += kvm.o
diff --git a/arch/s390/kvm/pci.c b/arch/s390/kvm/pci.c
new file mode 100644
index 000000000000..612faf87126d
--- /dev/null
+++ b/arch/s390/kvm/pci.c
@@ -0,0 +1,38 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * s390 kvm PCI passthrough support
+ *
+ * Copyright IBM Corp. 2022
+ *
+ * Author(s): Matthew Rosato <[email protected]>
+ */
+
+#include <linux/kvm_host.h>
+#include <linux/pci.h>
+#include <asm/kvm_pci.h>
+
+int kvm_s390_pci_dev_open(struct zpci_dev *zdev)
+{
+ struct kvm_zdev *kzdev;
+
+ kzdev = kzalloc(sizeof(struct kvm_zdev), GFP_KERNEL);
+ if (!kzdev)
+ return -ENOMEM;
+
+ kzdev->zdev = zdev;
+ zdev->kzdev = kzdev;
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(kvm_s390_pci_dev_open);
+
+void kvm_s390_pci_dev_release(struct zpci_dev *zdev)
+{
+ struct kvm_zdev *kzdev;
+
+ kzdev = zdev->kzdev;
+ WARN_ON(kzdev->zdev != zdev);
+ zdev->kzdev = 0;
+ kfree(kzdev);
+}
+EXPORT_SYMBOL_GPL(kvm_s390_pci_dev_release);
--
2.27.0

2022-03-17 04:40:52

by Matthew Rosato

Subject: [PATCH v4 15/32] vfio: introduce KVM-owned IOMMU type

s390x will introduce a new IOMMU domain type where the mappings are
managed by KVM rather than in response to userspace mapping ioctls. Allow
this type to be specified on the VFIO_SET_IOMMU ioctl, triggering the
appropriate iommu interface to override the default domain.
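
From userspace, selecting the new type would presumably follow the usual type1 container sequence, requesting VFIO_KVM_IOMMU instead of VFIO_TYPE1v2_IOMMU. A sketch, assuming the value 11 proposed in this patch (it is not in mainline headers, hence the fallback #define) and omitting the VFIO_GROUP_GET_STATUS viability check a real user would perform; setup_kvm_iommu_container() is a hypothetical helper:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/vfio.h>

/* Value proposed by this series; absent from mainline <linux/vfio.h>. */
#ifndef VFIO_KVM_IOMMU
#define VFIO_KVM_IOMMU 11
#endif

/*
 * Open a container, attach the group at group_path, and request the
 * KVM-managed IOMMU type. On success returns the container fd and stores
 * the group fd in *group_fd (keep it open; closing it detaches the group).
 * On failure returns -errno.
 */
static int setup_kvm_iommu_container(const char *container_path,
				     const char *group_path, int *group_fd)
{
	int container, group, err;

	assert(group_fd != NULL);
	container = open(container_path, O_RDWR);
	if (container < 0)
		return -errno;

	if (ioctl(container, VFIO_GET_API_VERSION) != VFIO_API_VERSION) {
		err = -EINVAL;
		goto out_container;
	}
	/* type1 reports 1 only if it knows the KVM-owned type */
	if (ioctl(container, VFIO_CHECK_EXTENSION, VFIO_KVM_IOMMU) != 1) {
		err = -EOPNOTSUPP;
		goto out_container;
	}

	group = open(group_path, O_RDWR);
	if (group < 0) {
		err = -errno;
		goto out_container;
	}
	/* VFIO_SET_IOMMU fails unless a group is already attached */
	if (ioctl(group, VFIO_GROUP_SET_CONTAINER, &container) < 0 ||
	    ioctl(container, VFIO_SET_IOMMU, VFIO_KVM_IOMMU) < 0) {
		err = -errno;
		close(group);
		goto out_container;
	}

	*group_fd = group;
	return container;

out_container:
	close(container);
	return err;
}
```

On a kernel without this series, the extension check returns 0 and the helper reports -EOPNOTSUPP, so a caller can fall back to VFIO_TYPE1v2_IOMMU.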

Signed-off-by: Matthew Rosato <[email protected]>
---
drivers/vfio/vfio_iommu_type1.c | 12 +++++++++++-
include/uapi/linux/vfio.h | 6 ++++++
2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 9394aa9444c1..0bec97077d61 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -77,6 +77,7 @@ struct vfio_iommu {
bool nesting;
bool dirty_page_tracking;
bool container_open;
+ bool kvm;
struct list_head emulated_iommu_groups;
};

@@ -2203,7 +2204,12 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
goto out_free_group;

ret = -EIO;
- domain->domain = iommu_domain_alloc(bus);
+
+ if (iommu->kvm)
+ domain->domain = iommu_domain_alloc_type(bus, IOMMU_DOMAIN_KVM);
+ else
+ domain->domain = iommu_domain_alloc(bus);
+
if (!domain->domain)
goto out_free_domain;

@@ -2552,6 +2558,9 @@ static void *vfio_iommu_type1_open(unsigned long arg)
case VFIO_TYPE1v2_IOMMU:
iommu->v2 = true;
break;
+ case VFIO_KVM_IOMMU:
+ iommu->kvm = true;
+ break;
default:
kfree(iommu);
return ERR_PTR(-EINVAL);
@@ -2637,6 +2646,7 @@ static int vfio_iommu_type1_check_extension(struct vfio_iommu *iommu,
case VFIO_TYPE1_NESTING_IOMMU:
case VFIO_UNMAP_ALL:
case VFIO_UPDATE_VADDR:
+ case VFIO_KVM_IOMMU:
return 1;
case VFIO_DMA_CC_IOMMU:
if (!iommu)
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index ef33ea002b0b..666edb6957ac 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -52,6 +52,12 @@
/* Supports the vaddr flag for DMA map and unmap */
#define VFIO_UPDATE_VADDR 10

+/*
+ * The KVM_IOMMU type implies that the hypervisor will control the mappings
+ * rather than userspace.
+ */
+#define VFIO_KVM_IOMMU 11
+
/*
* The IOCTL interface is designed for extensibility by embedding the
* structure length (argsz) and flags into structures passed between
--
2.27.0

2022-03-17 04:41:08

by Jason Gunthorpe

Subject: Re: [PATCH v4 29/32] vfio-pci/zdev: add DTSM to clp group capability

On Mon, Mar 14, 2022 at 03:44:48PM -0400, Matthew Rosato wrote:
> The DTSM, or designation type supported mask, indicates what IOAT formats
> are available to the guest. For an interpreted device, userspace will not
> know what format(s) the IOAT assist supports, so pass it via the
> capability chain. Since the value belongs to the Query PCI Function Group
> clp, let's extend the existing capability with a new version.

Why is this on the VFIO device?

Maybe I don't quite understand it right, but the IOAT is the
'userspace page table'?

That is something that should be modeled as a nested iommu domain.

Querying the formats and any control logic for this should be on the
iommu side not built into VFIO.

Jason

2022-03-17 04:41:49

by Matthew Rosato

Subject: Re: [PATCH v4 22/32] KVM: s390: pci: routines for (dis)associating zPCI devices with a KVM

On 3/14/22 5:46 PM, Jason Gunthorpe wrote:
> On Mon, Mar 14, 2022 at 03:44:41PM -0400, Matthew Rosato wrote:
>> +int kvm_s390_pci_zpci_start(struct kvm *kvm, struct zpci_dev *zdev)
>> +{
>> + struct vfio_device *vdev;
>> + struct pci_dev *pdev;
>> + int rc;
>> +
>> + rc = kvm_s390_pci_dev_open(zdev);
>> + if (rc)
>> + return rc;
>> +
>> + pdev = pci_get_slot(zdev->zbus->bus, zdev->devfn);
>> + if (!pdev) {
>> + rc = -ENODEV;
>> + goto exit_err;
>> + }
>> +
>> + vdev = get_vdev(&pdev->dev);
>> + if (!vdev) {
>> + pci_dev_put(pdev);
>> + rc = -ENODEV;
>> + goto exit_err;
>> + }
>> +
>> + zdev->kzdev->nb.notifier_call = kvm_s390_pci_group_notifier;
>> +
>> + /*
>> + * At this point, a KVM should already be associated with this device,
>> + * so registering the notifier now should immediately trigger the
>> + * event. We also want to know if the KVM association is later removed
>> + * to ensure proper cleanup happens.
>> + */
>> + rc = register_notifier(vdev->dev, &zdev->kzdev->nb);
>> +
>> + put_vdev(vdev);
>> + pci_dev_put(pdev);
>> +
>> + /* Make sure the registered KVM matches the KVM issuing the ioctl */
>> + if (rc || zdev->kzdev->kvm != kvm) {
>> + rc = -ENODEV;
>> + goto exit_err;
>> + }
>> +
>> + /* Must support KVM-managed IOMMU to proceed */
>> + if (IS_ENABLED(CONFIG_S390_KVM_IOMMU))
>> + rc = zpci_iommu_attach_kvm(zdev, kvm);
>> + else
>> + rc = -EINVAL;
>
> This seems like kind of a strange API, shouldn't kvm be getting a
> reference on the underlying iommu_domain and then calling into it to
> get the mapping table instead of pushing KVM specific logic into the
> iommu driver?
>
> It would be nice if all the special kvm stuff could be more isolated in
> kvm code.
>
> I'm still a little unclear about why this is so complicated - can't
> you get the iommu_domain from the group FD directly in KVM code as
> power does?

Yeah, I think I could do something like that using the vfio group fd
like power does.

The reference to the kvm itself inside the iommu was used for the
pin/unpin operations, which would not be necessary if we switched to
having the 1st-layer iommu pin all of guest memory.

2022-03-17 05:44:27

by Matthew Rosato

Subject: [PATCH v4 21/32] KVM: s390: mechanism to enable guest zPCI Interpretation

The guest must have access to certain facilities in order to allow
interpretive execution of zPCI instructions and adapter event
notifications. However, in some cases a guest might disable
interpretation, so provide a mechanism to defer enabling the associated
zPCI interpretation facilities until the guest indicates that it wishes
to use them.
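
The deferral described above boils down to a per-VM flag consulted at two points: vCPU creation and the moment the guest opts in. A userspace model of that logic, using the ECB bit values from this patch but hypothetical vm_model/vcpu_model types; the kernel version additionally takes kvm->lock, blocks all vCPUs, and issues KVM_REQ_VSIE_RESTART, none of which is modeled here:

```c
#include <assert.h>
#include <stdint.h>

/* Bit values taken from the kvm_host.h hunk in this patch. */
#define ECB2_ZPCI_LSI 0x02
#define ECB3_AISI     0x20
#define ECB3_AISII    0x10

#define MAX_VCPUS 4 /* hypothetical limit for the model */

struct vcpu_model {
	uint8_t ecb2, ecb3;
};

struct vm_model {
	int use_zpci_interp;
	int nr_vcpus;
	struct vcpu_model vcpu[MAX_VCPUS];
};

/* Applied at vCPU creation and again when the guest opts in. */
static void vcpu_pci_setup(struct vm_model *vm, struct vcpu_model *vcpu)
{
	if (!vm->use_zpci_interp)
		return;
	vcpu->ecb2 |= ECB2_ZPCI_LSI;
	vcpu->ecb3 |= ECB3_AISII | ECB3_AISI;
}

static struct vcpu_model *vcpu_create(struct vm_model *vm)
{
	struct vcpu_model *vcpu;

	assert(vm->nr_vcpus < MAX_VCPUS);
	vcpu = &vm->vcpu[vm->nr_vcpus++];
	vcpu_pci_setup(vm, vcpu);
	return vcpu;
}

/* Guest asked for interpretation: set the flag, then patch every vCPU. */
static void vm_enable_interp(struct vm_model *vm)
{
	vm->use_zpci_interp = 1;
	for (int i = 0; i < vm->nr_vcpus; i++)
		vcpu_pci_setup(vm, &vm->vcpu[i]);
}
```

Because vcpu_pci_setup() is idempotent, calling it both at creation and on enable cannot double-apply anything; vCPUs created after the switch pick up the bits automatically.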

Acked-by: Pierre Morel <[email protected]>
Signed-off-by: Matthew Rosato <[email protected]>
---
arch/s390/include/asm/kvm_host.h | 4 ++++
arch/s390/kvm/kvm-s390.c | 41 ++++++++++++++++++++++++++++++++
arch/s390/kvm/kvm-s390.h | 10 ++++++++
3 files changed, 55 insertions(+)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index b468d3a2215e..bf61ab05f98c 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -252,7 +252,10 @@ struct kvm_s390_sie_block {
#define ECB2_IEP 0x20
#define ECB2_PFMFI 0x08
#define ECB2_ESCA 0x04
+#define ECB2_ZPCI_LSI 0x02
__u8 ecb2; /* 0x0062 */
+#define ECB3_AISI 0x20
+#define ECB3_AISII 0x10
#define ECB3_DEA 0x08
#define ECB3_AES 0x04
#define ECB3_RI 0x01
@@ -938,6 +941,7 @@ struct kvm_arch{
int use_cmma;
int use_pfmfi;
int use_skf;
+ int use_zpci_interp;
int user_cpu_state_ctrl;
int user_sigp;
int user_stsi;
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 32e75f6f4e4d..d91b2547f0bf 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -1029,6 +1029,45 @@ static int kvm_s390_vm_set_crypto(struct kvm *kvm, struct kvm_device_attr *attr)
return 0;
}

+static void kvm_s390_vcpu_pci_setup(struct kvm_vcpu *vcpu)
+{
+ /* Only set the ECB bits after the guest requests zPCI interpretation */
+ if (!vcpu->kvm->arch.use_zpci_interp)
+ return;
+
+ vcpu->arch.sie_block->ecb2 |= ECB2_ZPCI_LSI;
+ vcpu->arch.sie_block->ecb3 |= ECB3_AISII | ECB3_AISI;
+}
+
+void kvm_s390_vcpu_pci_enable_interp(struct kvm *kvm)
+{
+ struct kvm_vcpu *vcpu;
+ unsigned long i;
+
+ /*
+ * If the host is configured for PCI and the necessary facilities are
+ * available, turn on interpretation for the life of this guest.
+ */
+ if (!sclp.has_zpci_lsi || !sclp.has_aisii || !sclp.has_aeni ||
+ !sclp.has_aisi || !IS_ENABLED(CONFIG_VFIO_PCI) ||
+ !IS_ENABLED(CONFIG_S390_KVM_IOMMU))
+ return;
+
+ mutex_lock(&kvm->lock);
+
+ kvm->arch.use_zpci_interp = 1;
+
+ kvm_s390_vcpu_block_all(kvm);
+
+ kvm_for_each_vcpu(i, vcpu, kvm) {
+ kvm_s390_vcpu_pci_setup(vcpu);
+ kvm_s390_sync_request(KVM_REQ_VSIE_RESTART, vcpu);
+ }
+
+ kvm_s390_vcpu_unblock_all(kvm);
+ mutex_unlock(&kvm->lock);
+}
+
static void kvm_s390_sync_request_broadcast(struct kvm *kvm, int req)
{
unsigned long cx;
@@ -3236,6 +3275,8 @@ static int kvm_s390_vcpu_setup(struct kvm_vcpu *vcpu)

kvm_s390_vcpu_crypto_setup(vcpu);

+ kvm_s390_vcpu_pci_setup(vcpu);
+
mutex_lock(&vcpu->kvm->lock);
if (kvm_s390_pv_is_protected(vcpu->kvm)) {
rc = kvm_s390_pv_create_cpu(vcpu, &uvrc, &uvrrc);
diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h
index 098831e815e6..14bb2539f837 100644
--- a/arch/s390/kvm/kvm-s390.h
+++ b/arch/s390/kvm/kvm-s390.h
@@ -496,6 +496,16 @@ void kvm_s390_reinject_machine_check(struct kvm_vcpu *vcpu,
*/
void kvm_s390_vcpu_crypto_reset_all(struct kvm *kvm);

+/**
+ * kvm_s390_vcpu_pci_enable_interp
+ *
+ * Set the associated PCI attributes for each vcpu to allow for zPCI Load/Store
+ * interpretation as well as adapter interruption forwarding.
+ *
+ * @kvm: the KVM guest
+ */
+void kvm_s390_vcpu_pci_enable_interp(struct kvm *kvm);
+
/**
* diag9c_forwarding_hz
*
--
2.27.0
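
A minimal user-space sketch of the enablement logic above. It models only the bit logic: interpretation is possible only when all four facilities and both config options are present, and the ECB bits are set only once `use_zpci_interp` is on. The names `struct sim_facilities`, `struct sim_sie_block`, `sim_zpci_interp_possible()` and `sim_vcpu_pci_setup()` are hypothetical stand-ins for the sclp flags and the SIE control block. Locking, vcpu blocking and the KVM_REQ_VSIE_RESTART requests are not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values copied from the kvm_host.h hunk above. */
#define ECB2_ZPCI_LSI	0x02
#define ECB3_AISI	0x20
#define ECB3_AISII	0x10

/* Hypothetical stand-ins for the sclp facility flags and config checks. */
struct sim_facilities {
	bool zpci_lsi, aisii, aeni, aisi;
	bool vfio_pci, kvm_iommu;	/* models the two IS_ENABLED() checks */
};

/* Hypothetical stand-in for the two ECB bytes of the SIE control block. */
struct sim_sie_block {
	uint8_t ecb2;
	uint8_t ecb3;
};

/* Every prerequisite must be present, otherwise interpretation stays off. */
static bool sim_zpci_interp_possible(const struct sim_facilities *f)
{
	return f->zpci_lsi && f->aisii && f->aeni && f->aisi &&
	       f->vfio_pci && f->kvm_iommu;
}

/* Set the ECB bits only once interpretation has been turned on. */
static void sim_vcpu_pci_setup(struct sim_sie_block *sb, bool use_zpci_interp)
{
	if (!use_zpci_interp)
		return;
	sb->ecb2 |= ECB2_ZPCI_LSI;
	/* '|' and '+' give the same mask here because the bits are disjoint. */
	sb->ecb3 |= ECB3_AISII | ECB3_AISI;
}
```

For example, a block that already has ECB2_ESCA (0x04) and ECB3_DEA (0x08) set ends up with ecb2 == 0x06 and ecb3 == 0x38; if any facility is missing, both bytes are left untouched.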

2022-03-17 05:54:34

by Matthew Rosato

Subject: [PATCH v4 10/32] s390/pci: stash dtsm and maxstbl

Store information about which IOAT designation types are supported by the
underlying hardware, as well as the largest store block size allowed.
These values will be needed for passthrough.

Reviewed-by: Niklas Schnelle <[email protected]>
Reviewed-by: Pierre Morel <[email protected]>
Reviewed-by: Christian Borntraeger <[email protected]>
Signed-off-by: Matthew Rosato <[email protected]>
---
arch/s390/include/asm/pci.h | 2 ++
arch/s390/include/asm/pci_clp.h | 6 ++++--
arch/s390/pci/pci_clp.c | 2 ++
3 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
index d07d7c3205de..7ee52a70a96f 100644
--- a/arch/s390/include/asm/pci.h
+++ b/arch/s390/include/asm/pci.h
@@ -126,9 +126,11 @@ struct zpci_dev {
u32 gisa; /* GISA designation for passthrough */
u16 vfn; /* virtual function number */
u16 pchid; /* physical channel ID */
+ u16 maxstbl; /* Maximum store block size */
u8 pfgid; /* function group ID */
u8 pft; /* pci function type */
u8 port;
+ u8 dtsm; /* Supported DT mask */
u8 rid_available : 1;
u8 has_hp_slot : 1;
u8 has_resources : 1;
diff --git a/arch/s390/include/asm/pci_clp.h b/arch/s390/include/asm/pci_clp.h
index f3286bc5ba6e..d6189ed14f84 100644
--- a/arch/s390/include/asm/pci_clp.h
+++ b/arch/s390/include/asm/pci_clp.h
@@ -153,9 +153,11 @@ struct clp_rsp_query_pci_grp {
u8 : 6;
u8 frame : 1;
u8 refresh : 1; /* TLB refresh mode */
- u16 reserved2;
+ u16 : 3;
+ u16 maxstbl : 13; /* Maximum store block size */
u16 mui;
- u16 : 16;
+ u8 dtsm; /* Supported DT mask */
+ u8 reserved3;
u16 maxfaal;
u16 : 4;
u16 dnoi : 12;
diff --git a/arch/s390/pci/pci_clp.c b/arch/s390/pci/pci_clp.c
index 4dcc37ddeeaf..dc733b58e74f 100644
--- a/arch/s390/pci/pci_clp.c
+++ b/arch/s390/pci/pci_clp.c
@@ -103,6 +103,8 @@ static void clp_store_query_pci_fngrp(struct zpci_dev *zdev,
zdev->max_msi = response->noi;
zdev->fmb_update = response->mui;
zdev->version = response->version;
+ zdev->maxstbl = response->maxstbl;
+ zdev->dtsm = response->dtsm;

switch (response->version) {
case 1:
--
2.27.0
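
Bit-field ordering is implementation-defined in C; the `u16 :3; u16 maxstbl:13;` declaration above relies on the compiler allocating bit-fields from the most significant bit, as GCC does on big-endian s390, so maxstbl occupies the low 13 bits of the halfword. A portable sketch of decoding the same fields from raw response bytes, assuming `p` points at the halfword that holds maxstbl; the offsets follow the layout in the hunk above, and `struct grp_fields` and `decode_grp_fields()` are hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical decoder. Assumed layout, starting at 'p' (big-endian):
 *   p[0..1]: 3 reserved bits + 13-bit maxstbl
 *   p[2..3]: mui
 *   p[4]   : dtsm
 *   p[5]   : reserved3
 */
struct grp_fields {
	uint16_t maxstbl;	/* maximum store block size */
	uint16_t mui;
	uint8_t dtsm;		/* supported designation type mask */
};

static uint16_t load_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

static struct grp_fields decode_grp_fields(const uint8_t *p)
{
	struct grp_fields g = {
		/* Mask off the 3 high-order reserved bits. */
		.maxstbl = load_be16(p) & 0x1fff,
		.mui = load_be16(p + 2),
		.dtsm = p[4],
	};

	return g;
}
```

A first halfword of 0xE100, for instance, yields maxstbl == 0x100 because the three reserved high-order bits are discarded.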

2022-03-17 06:01:34

by Matthew Rosato

Subject: [PATCH v4 20/32] KVM: s390: pci: enable host forwarding of Adapter Event Notifications

In cases where interrupts are not forwarded to the guest via firmware,
KVM is responsible for ensuring delivery. When an interrupt arrives
with the forward bit set, we must process the forwarding tables until
all interrupts are delivered.

Signed-off-by: Matthew Rosato <[email protected]>
---
arch/s390/include/asm/kvm_host.h | 1 +
arch/s390/include/asm/tpi.h | 13 ++++++
arch/s390/kvm/interrupt.c | 77 +++++++++++++++++++++++++++++++-
arch/s390/kvm/kvm-s390.c | 3 +-
arch/s390/kvm/pci.h | 10 +++++
5 files changed, 102 insertions(+), 2 deletions(-)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index a22c9266ea05..b468d3a2215e 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -757,6 +757,7 @@ struct kvm_vm_stat {
u64 inject_pfault_done;
u64 inject_service_signal;
u64 inject_virtio;
+ u64 aen_forward;
};

struct kvm_arch_memory_slot {
diff --git a/arch/s390/include/asm/tpi.h b/arch/s390/include/asm/tpi.h
index 1ac538b8cbf5..f76e5fdff23a 100644
--- a/arch/s390/include/asm/tpi.h
+++ b/arch/s390/include/asm/tpi.h
@@ -19,6 +19,19 @@ struct tpi_info {
u32 :12;
} __packed __aligned(4);

+/* I/O-Interruption Code as stored by TPI for an Adapter I/O */
+struct tpi_adapter_info {
+ u32 aism:8;
+ u32 :22;
+ u32 error:1;
+ u32 forward:1;
+ u32 reserved;
+ u32 adapter_IO:1;
+ u32 directed_irq:1;
+ u32 isc:3;
+ u32 :27;
+} __packed __aligned(4);
+
#endif /* __ASSEMBLY__ */

#endif /* _ASM_S390_TPI_H */
diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index 17c7deb516d2..513b393d5d0d 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -3271,11 +3271,86 @@ int kvm_s390_gisc_unregister(struct kvm *kvm, u32 gisc)
}
EXPORT_SYMBOL_GPL(kvm_s390_gisc_unregister);

+static void aen_host_forward(unsigned long si)
+{
+ struct kvm_s390_gisa_interrupt *gi;
+ struct zpci_gaite *gaite;
+ struct kvm *kvm;
+
+ gaite = (struct zpci_gaite *)aift->gait +
+ (si * sizeof(struct zpci_gaite));
+ if (gaite->count == 0)
+ return;
+ if (gaite->aisb != 0)
+ set_bit_inv(gaite->aisbo, (unsigned long *)gaite->aisb);
+
+ kvm = kvm_s390_pci_si_to_kvm(aift, si);
+ if (!kvm)
+ return;
+ gi = &kvm->arch.gisa_int;
+
+ if (!(gi->origin->g1.simm & AIS_MODE_MASK(gaite->gisc)) ||
+ !(gi->origin->g1.nimm & AIS_MODE_MASK(gaite->gisc))) {
+ gisa_set_ipm_gisc(gi->origin, gaite->gisc);
+ if (hrtimer_active(&gi->timer))
+ hrtimer_cancel(&gi->timer);
+ hrtimer_start(&gi->timer, 0, HRTIMER_MODE_REL);
+ kvm->stat.aen_forward++;
+ }
+}
+
+static void aen_process_gait(u8 isc)
+{
+ bool found = false, first = true;
+ union zpci_sic_iib iib = {{0}};
+ unsigned long si, flags;
+
+ spin_lock_irqsave(&aift->gait_lock, flags);
+
+ if (!aift->gait) {
+ spin_unlock_irqrestore(&aift->gait_lock, flags);
+ return;
+ }
+
+ for (si = 0;;) {
+ /* Scan adapter summary indicator bit vector */
+ si = airq_iv_scan(aift->sbv, si, airq_iv_end(aift->sbv));
+ if (si == -1UL) {
+ if (first || found) {
+ /* Re-enable interrupts. */
+ zpci_set_irq_ctrl(SIC_IRQ_MODE_SINGLE, isc,
+ &iib);
+ first = found = false;
+ } else {
+ /* Interrupts on and all bits processed */
+ break;
+ }
+ found = false;
+ si = 0;
+ /* Scan again after re-enabling interrupts */
+ continue;
+ }
+ found = true;
+ aen_host_forward(si);
+ }
+
+ spin_unlock_irqrestore(&aift->gait_lock, flags);
+}
+
static void gib_alert_irq_handler(struct airq_struct *airq,
struct tpi_info *tpi_info)
{
+ struct tpi_adapter_info *info = (struct tpi_adapter_info *)tpi_info;
+
inc_irq_stat(IRQIO_GAL);
- process_gib_alert_list();
+
+ if (IS_ENABLED(CONFIG_VFIO_PCI) && (info->forward || info->error)) {
+ aen_process_gait(info->isc);
+ if (info->aism != 0)
+ process_gib_alert_list();
+ } else {
+ process_gib_alert_list();
+ }
}

static struct airq_struct gib_alert_irq = {
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index d89cd16b57dd..32e75f6f4e4d 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -65,7 +65,8 @@ const struct _kvm_stats_desc kvm_vm_stats_desc[] = {
STATS_DESC_COUNTER(VM, inject_float_mchk),
STATS_DESC_COUNTER(VM, inject_pfault_done),
STATS_DESC_COUNTER(VM, inject_service_signal),
- STATS_DESC_COUNTER(VM, inject_virtio)
+ STATS_DESC_COUNTER(VM, inject_virtio),
+ STATS_DESC_COUNTER(VM, aen_forward)
};

const struct kvm_stats_header kvm_vm_stats_header = {
diff --git a/arch/s390/kvm/pci.h b/arch/s390/kvm/pci.h
index 19609d7a53a7..25cb1c787190 100644
--- a/arch/s390/kvm/pci.h
+++ b/arch/s390/kvm/pci.h
@@ -12,6 +12,7 @@

#include <linux/pci.h>
#include <linux/mutex.h>
+#include <linux/kvm_host.h>
#include <asm/airq.h>
#include <asm/kvm_pci.h>

@@ -34,6 +35,15 @@ struct zpci_aift {

extern struct zpci_aift *aift;

+static inline struct kvm *kvm_s390_pci_si_to_kvm(struct zpci_aift *aift,
+ unsigned long si)
+{
+ if (!IS_ENABLED(CONFIG_VFIO_PCI) || aift->kzdev == 0 ||
+ aift->kzdev[si] == 0)
+ return 0;
+ return aift->kzdev[si]->kvm;
+};
+
int kvm_s390_pci_aen_init(u8 nisc);
void kvm_s390_pci_aen_exit(void);

--
2.27.0
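
A portable sketch of how the adapter interruption code in `struct tpi_adapter_info` could be decoded, and of the dispatch decision gib_alert_irq_handler() makes from it. It assumes the MSB-first bit-field layout GCC uses on s390; `struct aen_info`, `decode_tpi_adapter_info()`, `alert_actions()` and the `RUN_*` flags are hypothetical, and the `vfio_pci_enabled` parameter stands in for IS_ENABLED(CONFIG_VFIO_PCI).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed layout (12 bytes, big-endian words):
 *   word 0: aism (bits 31-24), reserved, error (bit 1), forward (bit 0)
 *   word 1: reserved
 *   word 2: adapter_IO (bit 31), directed_irq (bit 30), isc (bits 29-27)
 */
struct aen_info {
	uint8_t aism;
	bool error;
	bool forward;
	bool adapter_io;
	bool directed_irq;
	uint8_t isc;
};

static uint32_t load_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static struct aen_info decode_tpi_adapter_info(const uint8_t code[12])
{
	uint32_t w0 = load_be32(code);
	uint32_t w2 = load_be32(code + 8);
	struct aen_info info = {
		.aism = (uint8_t)(w0 >> 24),
		.error = (w0 >> 1) & 1,
		.forward = w0 & 1,
		.adapter_io = (w2 >> 31) & 1,
		.directed_irq = (w2 >> 30) & 1,
		.isc = (uint8_t)((w2 >> 27) & 0x7),
	};

	return info;
}

enum { RUN_GAIT = 1, RUN_GIB_LIST = 2 };

/*
 * Which handlers the alert path runs: the GAIT scan first when the
 * forward or error bit is set, then the GIB alert list if aism != 0.
 * Otherwise only the GIB alert list is processed.
 */
static int alert_actions(const struct aen_info *info, bool vfio_pci_enabled)
{
	if (vfio_pci_enabled && (info->forward || info->error))
		return RUN_GAIT | (info->aism != 0 ? RUN_GIB_LIST : 0);
	return RUN_GIB_LIST;
}
```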

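The rescan logic in aen_process_gait() can be modeled in user space. In this sketch (all `sim_*` names are hypothetical), a 64-bit word stands in for the adapter summary bit vector, `sim_scan()` mimics airq_iv_scan() by returning and clearing the next set bit, and `sim_reenable()` stands in for the zpci_set_irq_ctrl(SIC_IRQ_MODE_SINGLE, ...) call while injecting bits that raced with the re-enable. The loop only terminates after a full scan that follows a re-enable finds nothing, so an indicator set in that window is still processed. The entry lookup uses `&gait[si]`: C pointer arithmetic already scales by the element size, so no extra `sizeof` factor is needed when indexing a typed table.

```c
#include <stdbool.h>
#include <stdint.h>

#define SIM_NR_SI 64

/* Hypothetical GAIT entry with only the fields this sketch needs. */
struct sim_gaite {
	unsigned int count;	/* users of this entry; 0 means unused */
	unsigned int delivered;	/* how often this entry was forwarded */
};

struct sim_aift {
	uint64_t sbv;		/* summary bits; bit i <-> gait[i] (LSB numbering,
				 * unlike the inverted numbering of the s390 helpers) */
	uint64_t late_bits;	/* bits that "arrive" at the next re-enable */
	int reenables;		/* number of re-enable calls */
	struct sim_gaite gait[SIM_NR_SI];
};

/* Like airq_iv_scan(): return the first set bit >= start and clear it, or -1. */
static long sim_scan(struct sim_aift *a, unsigned long start)
{
	for (unsigned long i = start; i < SIM_NR_SI; i++) {
		uint64_t mask = UINT64_C(1) << i;

		if (a->sbv & mask) {
			a->sbv &= ~mask;
			return (long)i;
		}
	}
	return -1;
}

/* Re-enable interrupts; model interrupts that raced with the re-enable. */
static void sim_reenable(struct sim_aift *a)
{
	a->reenables++;
	a->sbv |= a->late_bits;
	a->late_bits = 0;
}

static void sim_forward(struct sim_aift *a, unsigned long si)
{
	/* &gait[si] already accounts for sizeof(struct sim_gaite). */
	struct sim_gaite *gaite = &a->gait[si];

	if (gaite->count == 0)
		return;
	gaite->delivered++;
}

/* Same first/found state machine as aen_process_gait(). */
static void sim_process_gait(struct sim_aift *a)
{
	bool found = false, first = true;
	unsigned long si = 0;

	for (;;) {
		long bit = sim_scan(a, si);

		if (bit == -1) {
			if (!(first || found))
				break;	/* interrupts on and nothing new: done */
			sim_reenable(a);
			first = found = false;
			si = 0;		/* rescan from the start */
			continue;
		}
		found = true;
		si = (unsigned long)bit;
		sim_forward(a, si);
	}
}
```

With bits 3 and 10 pending and bit 5 arriving at the first re-enable, each of the three entries is delivered once and interrupts are re-enabled twice; with nothing pending, interrupts are re-enabled exactly once.
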
2022-03-17 06:10:50

by Jason Gunthorpe

Subject: Re: [PATCH v4 15/32] vfio: introduce KVM-owned IOMMU type

On Mon, Mar 14, 2022 at 03:44:34PM -0400, Matthew Rosato wrote:

> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 9394aa9444c1..0bec97077d61 100644
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -77,6 +77,7 @@ struct vfio_iommu {
> bool nesting;
> bool dirty_page_tracking;
> bool container_open;
> + bool kvm;
> struct list_head emulated_iommu_groups;
> };
>
> @@ -2203,7 +2204,12 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
> goto out_free_group;
>
> ret = -EIO;
> - domain->domain = iommu_domain_alloc(bus);
> +
> + if (iommu->kvm)
> + domain->domain = iommu_domain_alloc_type(bus, IOMMU_DOMAIN_KVM);
> + else
> + domain->domain = iommu_domain_alloc(bus);
> +
> if (!domain->domain)
> goto out_free_domain;
>
> @@ -2552,6 +2558,9 @@ static void *vfio_iommu_type1_open(unsigned long arg)
> case VFIO_TYPE1v2_IOMMU:
> iommu->v2 = true;
> break;
> + case VFIO_KVM_IOMMU:
> + iommu->kvm = true;
> + break;

Same remark for this, but more so: this is called KVM, but it doesn't
accept a KVM fd or anything else to link the domain to the KVM
in use.

Jason

2022-03-17 06:16:23

by Matthew Rosato

Subject: [PATCH v4 02/32] s390/sclp: detect the AISII facility

Detect the Adapter Interruption Source ID Interpretation facility.

Reviewed-by: Eric Farman <[email protected]>
Reviewed-by: Christian Borntraeger <[email protected]>
Reviewed-by: Claudio Imbrenda <[email protected]>
Signed-off-by: Matthew Rosato <[email protected]>
---
arch/s390/include/asm/sclp.h | 1 +
drivers/s390/char/sclp_early.c | 1 +
2 files changed, 2 insertions(+)

diff --git a/arch/s390/include/asm/sclp.h b/arch/s390/include/asm/sclp.h
index 58a4d3d354b7..8b56ac5ae496 100644
--- a/arch/s390/include/asm/sclp.h
+++ b/arch/s390/include/asm/sclp.h
@@ -89,6 +89,7 @@ struct sclp_info {
unsigned char has_sipl : 1;
unsigned char has_dirq : 1;
unsigned char has_zpci_lsi : 1;
+ unsigned char has_aisii : 1;
unsigned int ibc;
unsigned int mtid;
unsigned int mtid_cp;
diff --git a/drivers/s390/char/sclp_early.c b/drivers/s390/char/sclp_early.c
index b88dd0da1231..29fee179e197 100644
--- a/drivers/s390/char/sclp_early.c
+++ b/drivers/s390/char/sclp_early.c
@@ -45,6 +45,7 @@ static void __init sclp_early_facilities_detect(void)
sclp.has_gisaf = !!(sccb->fac118 & 0x08);
sclp.has_hvs = !!(sccb->fac119 & 0x80);
sclp.has_kss = !!(sccb->fac98 & 0x01);
+ sclp.has_aisii = !!(sccb->fac118 & 0x40);
sclp.has_zpci_lsi = !!(sccb->fac118 & 0x01);
if (sccb->fac85 & 0x02)
S390_lowcore.machine_flags |= MACHINE_FLAG_ESOP;
--
2.27.0
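
A minimal sketch of the facility-byte decoding, using only the fac118 masks visible in sclp_early_facilities_detect() above (0x40 AISII, 0x08 GISAF, 0x01 zPCI LSI). `struct fac118_flags` and `decode_fac118()` are hypothetical. The `!!` idiom matters because the flags are one-bit fields: storing the masked value 0x40 directly into a one-bit unsigned field would keep only its lowest bit, which is 0.

```c
#include <stdint.h>

/* Hypothetical subset of struct sclp_info, using one-bit fields as it does. */
struct fac118_flags {
	unsigned int has_aisii : 1;	/* fac118 & 0x40 */
	unsigned int has_gisaf : 1;	/* fac118 & 0x08 */
	unsigned int has_zpci_lsi : 1;	/* fac118 & 0x01 */
};

static struct fac118_flags decode_fac118(uint8_t fac118)
{
	struct fac118_flags f = { 0 };

	/* !! collapses each masked value to 0 or 1 before the narrow store. */
	f.has_aisii = !!(fac118 & 0x40);
	f.has_gisaf = !!(fac118 & 0x08);
	f.has_zpci_lsi = !!(fac118 & 0x01);
	return f;
}
```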

2022-03-17 06:44:22

by Matthew Rosato

Subject: [PATCH v4 31/32] MAINTAINERS: additional files related kvm s390 pci passthrough

Add entries from the s390 kvm subdirectory related to pci passthrough.

Acked-by: Christian Borntraeger <[email protected]>
Signed-off-by: Matthew Rosato <[email protected]>
---
MAINTAINERS | 2 ++
1 file changed, 2 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index e127c2fb08a7..6c76eb66b10a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16928,6 +16928,8 @@ M: Eric Farman <[email protected]>
L: [email protected]
L: [email protected]
S: Supported
+F: arch/s390/include/asm/kvm_pci.h
+F: arch/s390/kvm/pci*
F: drivers/vfio/pci/vfio_pci_zdev.c
F: include/uapi/linux/vfio_zdev.h

--
2.27.0
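
The new `F:` entries are wildcard patterns. As a rough approximation (this is not the code scripts/get_maintainer.pl uses), fnmatch(3) with FNM_PATHNAME treats `arch/s390/kvm/pci*` as matching files directly in arch/s390/kvm/ whose names start with `pci`, such as pci.h, but not files in a subdirectory, because `*` cannot match `/`. `f_pattern_matches()` is a hypothetical helper.

```c
#include <fnmatch.h>
#include <stdbool.h>

/* Approximate check of a MAINTAINERS "F:" pattern against a path. */
static bool f_pattern_matches(const char *pattern, const char *path)
{
	/* FNM_PATHNAME: wildcards never match the '/' separator. */
	return fnmatch(pattern, path, FNM_PATHNAME) == 0;
}
```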

2022-03-21 04:26:52

by Tian, Kevin

Subject: RE: [PATCH v4 14/32] iommu: introduce iommu_domain_alloc_type and the KVM type

> From: Jason Gunthorpe <[email protected]>
> Sent: Friday, March 18, 2022 10:13 PM
>
> On Fri, Mar 18, 2022 at 02:23:57AM +0000, Tian, Kevin wrote:
>
> > Yes, that is another major piece of work besides the iommufd work. And
> > it is not compatible with KVM features which rely on the dynamic
> > manner of EPT. Though it is a bit questionable whether it's worth
> > doing so just to save memory footprint while losing other capabilities,
> > it is a requirement for some future security extension in Intel trusted
> > computing architecture. And KVM has been pinning pages for SEV/TDX/etc.
> > today, so some facilities can be reused. But I agree it is not a simple
> > task, so we need to start the discussion early to explore various gaps in
> > iommu and kvm.
>
> Yikes. IMHO this might work better going the other way: have KVM
> import the iommu_domain and use that as the KVM page table rather than
> vice versa.
>
> The semantics are a heck of a lot clearer, and it is really obvious
> that a lot of KVM becomes disabled if you do this.
>

This is an interesting angle. But given that pinning is already
required in KVM to support SEV/TDX even without an assigned device, those
restrictions have to be understood by the KVM MMU code, which makes
a KVM-managed page table under such restrictions closer to being
sharable with the IOMMU.

Thanks
Kevin

2022-03-21 22:56:01

by Jason Gunthorpe

Subject: Re: [PATCH v4 14/32] iommu: introduce iommu_domain_alloc_type and the KVM type

On Sat, Mar 19, 2022 at 07:51:31AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <[email protected]>
> > Sent: Friday, March 18, 2022 10:13 PM
> >
> > On Fri, Mar 18, 2022 at 02:23:57AM +0000, Tian, Kevin wrote:
> >
> > > Yes, that is another major piece of work besides the iommufd work. And
> > > it is not compatible with KVM features which rely on the dynamic
> > > manner of EPT. Though it is a bit questionable whether it's worth
> > > doing so just to save memory footprint while losing other capabilities,
> > > it is a requirement for some future security extension in Intel trusted
> > > computing architecture. And KVM has been pinning pages for SEV/TDX/etc.
> > > today, so some facilities can be reused. But I agree it is not a simple
> > > task, so we need to start the discussion early to explore various gaps in
> > > iommu and kvm.
> >
> > Yikes. IMHO this might work better going the other way: have KVM
> > import the iommu_domain and use that as the KVM page table rather than
> > vice versa.
> >
> > The semantics are a heck of a lot clearer, and it is really obvious
> > that a lot of KVM becomes disabled if you do this.
> >
>
> This is an interesting angle. But given that pinning is already
> required in KVM to support SEV/TDX even without an assigned device, those
> restrictions have to be understood by the KVM MMU code, which makes
> a KVM-managed page table under such restrictions closer to being
> sharable with the IOMMU.

I thought the SEV/TDX stuff wasn't being done with pinning but via a
memfd in a special mode that does sort of pin under the covers, but it
is not necessarily a DMA pin. (It isn't even struct page memory, so
I'm not even sure what pin means.)

Certainly, there is no inherent problem with SEV/TDX having movable
memory, and KVM could conceivably handle this - but the iommu cannot.

I would not equate SEV/TDX with the iommu, at least.

Jason

2022-03-22 07:43:38

by Tian, Kevin

Subject: RE: [PATCH v4 14/32] iommu: introduce iommu_domain_alloc_type and the KVM type

> From: Jason Gunthorpe
> Sent: Monday, March 21, 2022 10:07 PM
>
> On Sat, Mar 19, 2022 at 07:51:31AM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe <[email protected]>
> > > Sent: Friday, March 18, 2022 10:13 PM
> > >
> > > On Fri, Mar 18, 2022 at 02:23:57AM +0000, Tian, Kevin wrote:
> > >
> > > > Yes, that is another major piece of work besides the iommufd work. And
> > > > it is not compatible with KVM features which rely on the dynamic
> > > > manner of EPT. Though it is a bit questionable whether it's worth
> > > > doing so just to save memory footprint while losing other capabilities,
> > > > it is a requirement for some future security extension in Intel trusted
> > > > computing architecture. And KVM has been pinning pages for SEV/TDX/etc.
> > > > today, so some facilities can be reused. But I agree it is not a simple
> > > > task, so we need to start the discussion early to explore various gaps in
> > > > iommu and kvm.
> > >
> > > Yikes. IMHO this might work better going the other way: have KVM
> > > import the iommu_domain and use that as the KVM page table rather than
> > > vice versa.
> > >
> > > The semantics are a heck of a lot clearer, and it is really obvious
> > > that a lot of KVM becomes disabled if you do this.
> > >
> >
> > This is an interesting angle. But given that pinning is already
> > required in KVM to support SEV/TDX even without an assigned device, those
> > restrictions have to be understood by the KVM MMU code, which makes
> > a KVM-managed page table under such restrictions closer to being
> > sharable with the IOMMU.
>
> I thought the SEV/TDX stuff wasn't being done with pinning but via a
> memfd in a special mode that does sort of pin under the covers, but it
> is not necessarily a DMA pin. (It isn't even struct page memory, so
> I'm not even sure what pin means.)
>
> Certainly, there is no inherent problem with SEV/TDX having movable
> memory, and KVM could conceivably handle this - but the iommu cannot.
>
> I would not equate SEV/TDX with the iommu, at least.
>

Currently SEV does use a DMA pin, i.e. pin_user_pages() in sev_pin_memory().

I'm not sure whether it's a hardware limitation or just a software tradeoff
for simplicity. But having that code does imply that KVM has already absorbed
certain restrictions that come with pinning.

But I agree they are not equivalent. For example, SEV/TDX may pin only
private/encrypted memory, while the iommu requires pinning the entire
guest memory (if the device has no IOPF support).

BTW, whether KVM imports the iommu domain or iommufd imports the KVM
page table, in the end the KVM MMU needs to explicitly mark its page
table as shared with the IOMMU and enforce all the restrictions that
such sharing implies.

Thanks
Kevin