2019-03-22 14:45:27

by Pierre Morel

Subject: [PATCH v6 0/7] vfio: ap: AP Queue Interrupt Control

This patch series implements PQAP/AQIC interception in KVM.

To implement this we need to add a new structure, vfio_ap_queue, to be
able to retrieve the mediated device associated with a queue and the
specific values needed to register/unregister the interrupt structures:
- APQN: to be able to issue the commands and search for queue structures
- NIB : to unpin the NIB on clear IRQ
- ISC : to unregister with the GIB interface
- MATRIX: a pointer to the matrix mediated device
- LIST: the list_head to handle the vfio_ap_queue life cycle

Having this structure and the list management greatly eases the handling
of the AP queues and reduces the size of the vfio_ap driver by more than
150 lines of code in comparison with the previous version.
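
As an illustration of the fields listed above, here is a minimal user-space
sketch of such a per-queue structure. It is not the definition from the
patch: every sk_ name is hypothetical, the list head is a stand-in for the
kernel list_head, and the APQN layout (APID in bits 8-15, APQI in bits 0-7)
is an assumption modelled on the AP bus macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed APQN layout: APID in bits 8-15, APQI in bits 0-7. */
#define SK_MKQID(apid, apqi)  ((((apid) & 0xff) << 8) | ((apqi) & 0xff))
#define SK_QID_CARD(apqn)     (((apqn) >> 8) & 0xff)
#define SK_QID_QUEUE(apqn)    ((apqn) & 0xff)

/* Stand-in for the kernel's struct list_head (hypothetical name). */
struct sk_list_head {
	struct sk_list_head *next, *prev;
};

struct sk_matrix_mdev;	/* opaque here: the matrix mediated device */

/* Hypothetical per-queue structure mirroring the cover letter's list. */
struct sk_vfio_ap_queue {
	struct sk_list_head list;		/* LIST: life cycle handling */
	struct sk_matrix_mdev *matrix_mdev;	/* MATRIX: owning mediated device */
	uint64_t nib;				/* NIB: address to unpin on clear IRQ */
	int apqn;				/* APQN: to issue commands and search */
	unsigned char isc;			/* ISC: to unregister from the GIB */
};

/* Initialize a queue that is not yet assigned to any mediated device. */
static void sk_queue_init(struct sk_vfio_ap_queue *q, int apid, int apqi)
{
	assert(q != NULL);
	q->list.next = &q->list;	/* empty list: points to itself */
	q->list.prev = &q->list;
	q->matrix_mdev = NULL;
	q->nib = 0;
	q->apqn = SK_MKQID(apid, apqi);
	q->isc = 0;
}
```

A freshly probed queue would be initialized this way and then linked onto
the free list of the matrix device.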


0) Queues life cycle

vfio_ap_queues are created on probe.

We define one bucket on the matrix device to store the free vfio_ap_queues,
i.e. the queues not assigned to any matrix mediated device.

We define one bucket on each matrix mediated device to hold the
vfio_ap_queues belonging to it.

vfio_ap_queues are deleted on remove.

This makes the search for a queue easy, makes the detection of an
assignment incoherency obvious (the queue is not available) and
simplifies assignment.


1) Phase 1, probe and remove of vfio_ap_queue

The vfio_ap_queue structures are dynamically allocated and set up
when a queue is probed by the vfio_ap driver.
The vfio_ap_queue is linked to the ap_queue device as the driver data.

The new vfio_ap_queue is put on a free_list belonging to the
matrix device.

The vfio_ap_queue structures are freed during remove.


2) Phase 2, assignment of vfio_ap_queue to a mediated device

When an APID is assigned we look for the APQIs already assigned to
the matrix mediated device and associate all the queues with
APQN = (APID,APQI) to the mediated device by adding them to
the mediated device queue list.
We do the same when an APQI is assigned.

If any queue with a matching APQN cannot be found on the matrix
device free list, it means it is already associated with another matrix
mediated device, and no queue is added to the matrix mediated device.
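
The all-or-nothing behaviour described above can be sketched in user space
as follows. This is only an illustration under stated assumptions: the
lists are replaced by an array with an owner field, all sk_ names are
hypothetical, and the APQN layout (APID in bits 8-15, APQI in bits 0-7)
is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed APQN layout: APID in bits 8-15, APQI in bits 0-7. */
#define SK_MKQID(apid, apqi) ((((apid) & 0xff) << 8) | ((apqi) & 0xff))
#define SK_FREE (-1)	/* owner value meaning "on the free list" */

struct sk_queue {
	int apqn;
	int owner;	/* SK_FREE, or the id of the owning mediated device */
};

/* Look up a queue by APQN among the free queues only. */
static struct sk_queue *sk_find_free(struct sk_queue *qs, size_t n, int apqn)
{
	for (size_t i = 0; i < n; i++)
		if (qs[i].owner == SK_FREE && qs[i].apqn == apqn)
			return &qs[i];
	return NULL;
}

/*
 * Assign adapter @apid to mediated device @mdev, whose already assigned
 * domains are listed in @apqis. Either every (apid, apqi) queue is moved
 * to the mediated device, or none is.
 */
static bool sk_assign_apid(struct sk_queue *qs, size_t n, int mdev,
			   int apid, const int *apqis, size_t n_apqi)
{
	assert(mdev != SK_FREE);

	/* First pass: every needed queue must still be free. */
	for (size_t i = 0; i < n_apqi; i++)
		if (!sk_find_free(qs, n, SK_MKQID(apid, apqis[i])))
			return false;

	/* Second pass: move the queues to the mediated device. */
	for (size_t i = 0; i < n_apqi; i++) {
		struct sk_queue *q = sk_find_free(qs, n,
						  SK_MKQID(apid, apqis[i]));
		if (q)
			q->owner = mdev;
	}
	return true;
}
```

Checking first and moving afterwards keeps the free list untouched when
the assignment is refused.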

3) Phase 3, starting the guest

When the VFIO device is opened the PQAP callback and a pointer to
the matrix mediated device are set inside KVM during the open callback.

When the device is closed or if a queue is removed, the vfio_ap_queue is
dissociated from the mediated device.


4) Phase 4, intercepting the PQAP/AQIC instruction

On interception of the PQAP/AQIC instruction, the interception code
verifies that AP instructions are available on the hardware and in the
guest, and returns the usual -EOPNOTSUPP return code to let
QEMU handle the fault if this is not the case.

If the instructions are allowed but intercepted, it can only be due to
specification errors in the instruction usage or to the valid
interception of PQAP/AQIC.

In this case, we make sure the pqap_hook is initialized and call it.
Otherwise, if the hook is not initialized, we assume that there is
no VFIO AP driver to handle the CRYCB, which is consequently empty,
and set up a response as AP queue unavailable.

The pqap callback searches for the queue associated with the APQN
stored in register 0, setting the code to "illegal APQN"
if the vfio_ap_queue cannot be found.

Depending on the "i" bit of register 1, the pqap callback
sets up or clears the interruption by calling the host format PQAP/AQIC
instruction.
When setting up the interruption it uses the NIB and the guest ISC
provided by the guest and the host ISC provided by the registration
with the GIB code, pins the NIB and also stores the ISC and NIB inside
the vfio_ap_queue structure.
When clearing the interrupt it retrieves the host ISC to unregister
with the GIB code and unpins the NIB.

When enabling GISA, we take into account that the guest may have issued
a reset and therefore does not need to disable the interruptions before
re-enabling interruptions.
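
The callback flow described above can be sketched in user space as
follows. It is a simplified model, not the code from the patch: register
decoding, the GIB registration and the real page pinning are left out,
the status values and all sk_ names are hypothetical, and a counter
stands in for the pinned NIB page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status values, not the architected response codes. */
enum sk_status { SK_OK, SK_ILLEGAL_APQN };

struct sk_queue {
	int apqn;
	bool irq_enabled;	/* separate flag: ISC 0 is a valid ISC */
	uint64_t nib;		/* stored to unpin on clear */
	unsigned char isc;	/* stored to unregister on clear */
	int pinned;		/* stand-in for the pinned NIB page count */
};

static struct sk_queue *sk_get_queue(struct sk_queue *qs, size_t n, int apqn)
{
	for (size_t i = 0; i < n; i++)
		if (qs[i].apqn == apqn)
			return &qs[i];
	return NULL;
}

/* Clear the interruption state of a queue; harmless if already clear. */
static void sk_free_irq(struct sk_queue *q)
{
	if (!q->irq_enabled)
		return;
	q->pinned--;		/* "unpin" the NIB */
	q->nib = 0;
	q->isc = 0;
	q->irq_enabled = false;
}

/* Model of the callback: @i_bit selects set up (true) or clear (false). */
static enum sk_status sk_handle_aqic(struct sk_queue *qs, size_t n, int apqn,
				     bool i_bit, uint64_t nib,
				     unsigned char isc)
{
	struct sk_queue *q;

	assert(qs != NULL || n == 0);
	q = sk_get_queue(qs, n, apqn);
	if (!q)
		return SK_ILLEGAL_APQN;

	/*
	 * Release any previous state first: the guest may re-enable
	 * after a reset without having disabled the interruption.
	 */
	sk_free_irq(q);
	if (i_bit) {
		q->pinned++;	/* "pin" the NIB */
		q->nib = nib;
		q->isc = isc;
		q->irq_enabled = true;
	}
	return SK_OK;
}
```

Releasing the old state on both paths keeps exactly one pinned page per
enabled queue, even when the guest enables twice in a row.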

To make sure that the module holding the callback does not disappear,
we use module reference counting in the structure containing the
callback.


5) Phase 5, clean dissociation from the mediated device on remove

When the AP device is removed, the remove callback is called.
To be sure that the guest will not access the queue anymore,
we clear the APID CRYCB bit.
Clearing the APID, rather than the APQI, is chosen because the
architecture specifies that only the APID can be dynamically changed
outside IPL.
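
The AP masks use inverted (MSB-0) bit numbering, where bit 0 is the
leftmost bit of the mask. The user-space sketch below illustrates
clearing an APID bit under that numbering. The sk_ helpers are
hypothetical stand-ins for the kernel's inverted bit operations, and
the 256-bit mask size is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define SK_MASK_BITS	256	/* assumed mask size */
#define SK_MASK_WORDS	(SK_MASK_BITS / 64)

/* Set bit @nr, counting from the most significant bit of word 0. */
static void sk_set_bit_inv(unsigned int nr, uint64_t *mask)
{
	assert(nr < SK_MASK_BITS);
	mask[nr / 64] |= UINT64_C(1) << (63 - (nr % 64));
}

/* Clear bit @nr, e.g. the APID of a queue that is being removed. */
static void sk_clear_bit_inv(unsigned int nr, uint64_t *mask)
{
	assert(nr < SK_MASK_BITS);
	mask[nr / 64] &= ~(UINT64_C(1) << (63 - (nr % 64)));
}

/* Return 1 if bit @nr is set, 0 otherwise. */
static int sk_test_bit_inv(unsigned int nr, const uint64_t *mask)
{
	assert(nr < SK_MASK_BITS);
	return (int)((mask[nr / 64] >> (63 - (nr % 64))) & 1);
}
```

Once the APID bit is cleared in the mask copied to the guest, no queue
on that adapter is usable by the guest, whatever its APQI.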

To be sure that the IRQ is cleared before the GISA is freed, we use
the KVM reference counting: we raise it on open and lower it on release.


6) Associated QEMU patch

There is a QEMU patch which is needed to enable the PQAP/AQIC
facility in the guest.

Posted in [email protected] as:
Message-Id: <[email protected]>



Pierre Morel (7):
s390: ap: kvm: add PQAP interception for AQIC
s390: ap: new vfio_ap_queue structure
s390: ap: setup relation betwen KVM and mediated device
vfio: ap: register IOMMU VFIO notifier
s390: ap: implement PAPQ AQIC interception in kernel
s390: ap: Cleanup on removing the AP device
s390: ap: kvm: Enable PQAP/AQIC facility for the guest

arch/s390/include/asm/kvm_host.h | 8 +
arch/s390/kvm/priv.c | 90 ++++
arch/s390/tools/gen_facilities.c | 1 +
drivers/s390/crypto/ap_bus.h | 1 +
drivers/s390/crypto/vfio_ap_drv.c | 69 ++-
drivers/s390/crypto/vfio_ap_ops.c | 784 +++++++++++++++++++++++-----------
drivers/s390/crypto/vfio_ap_private.h | 20 +
7 files changed, 728 insertions(+), 245 deletions(-)

--
2.7.4

Changelog from v5:
- Refactoring of the PQAP interception after all discussions
(Conny, Halil (offline))
- take a big lock around open to avoid parallel changes through
assignment
- verify that at least one queue has the APID or APQI when the
first assignment is done, to not accept an unavailable APID/APQI
(myself)
- Adding comment for locks on free_list
(Conny)
- Modified comment for
"s390: ap: setup relation betwen KVM and mediated device"
(Halil)

Changelog from v4:
- Add forgotten locking for vfio_get_queue() in pqap callback
(Conny / Halil)
- Add KVM reference counting to make sure GISA is free after IRQ
(Christian / Halil)
- Take care that ISC = 0 is a valid ISC
(Halil)
- Integrate the PQAP callback in a structure with module owner
reference counting to make sure the callback does not disappear.
- Restrict functionality to always open KVM before opening the
VFIO device.
- Search all devices in the vfio_ap driver list when associating
a queue to a mediated device
(Halil / Tony)
- Get vfio_ap_free_irq() out of vfio_ap_mdev_reset_queue() to call
it always, whatever the result of the reset.
(Tony)

Changelog from v3:
- Associating the vfio_queues during APID/APQI assign
(Tony)
- Dissociating the vfio_queues during APID/APQI unassign
(Tony)
- Taking care that the guest can directly disable the interrupt
by using a RESET
(Halil)
- Remove the patch creating the matrix bus to accelerate its
integration in Linux stable
(Christian)



2019-03-22 14:45:20

by Pierre Morel

Subject: [PATCH v6 7/7] s390: ap: kvm: Enable PQAP/AQIC facility for the guest

The AP Queue Interruption Control (AQIC) facility gives
the guest the possibility to control interruptions for
the Cryptographic Adjunct Processor queues.

Signed-off-by: Pierre Morel <[email protected]>
Reviewed-by: Tony Krowiak <[email protected]>
---
arch/s390/tools/gen_facilities.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/s390/tools/gen_facilities.c b/arch/s390/tools/gen_facilities.c
index 04c5f07..be4b826 100644
--- a/arch/s390/tools/gen_facilities.c
+++ b/arch/s390/tools/gen_facilities.c
@@ -111,6 +111,7 @@ static struct facility_def facility_defs[] = {
.bits = (int[]){
12, /* AP Query Configuration Information */
15, /* AP Facilities Test */
+ 65, /* AP Queue Interruption Control */
156, /* etoken facility */
-1 /* END */
}
--
2.7.4


2019-03-22 14:45:27

by Pierre Morel

Subject: [PATCH v6 6/7] s390: ap: Cleanup on removing the AP device

When an AP device is removed, clear the queue's APID bit in the guest CRYCB
to be sure that the guest will not access the AP queue anymore.

Then we clear the interruptions and reset the AP device properly.

Signed-off-by: Pierre Morel <[email protected]>
---
drivers/s390/crypto/vfio_ap_drv.c | 36 +++++++++++++++++++++++++++++++++++
drivers/s390/crypto/vfio_ap_ops.c | 16 +++++++++++++---
drivers/s390/crypto/vfio_ap_private.h | 3 +++
3 files changed, 52 insertions(+), 3 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
index 796e73d4..850ba6e 100644
--- a/drivers/s390/crypto/vfio_ap_drv.c
+++ b/drivers/s390/crypto/vfio_ap_drv.c
@@ -5,6 +5,7 @@
* Copyright IBM Corp. 2018
*
* Author(s): Tony Krowiak <[email protected]>
+ * Pierre Morel <[email protected]>
*/

#include <linux/module.h>
@@ -12,6 +13,8 @@
#include <linux/slab.h>
#include <linux/string.h>
#include <asm/facility.h>
+#include <linux/bitops.h>
+#include <linux/kvm_host.h>
#include "vfio_ap_private.h"

#define VFIO_AP_ROOT_NAME "vfio_ap"
@@ -65,6 +68,33 @@ static int vfio_ap_queue_dev_probe(struct ap_device *apdev)
}

/**
+ * vfio_ap_update_crycb
+ * @q: A pointer to the queue being removed
+ *
+ * We clear the APID of the queue, making this queue unusable for the guest.
+ * After this function we can reset the queue without to fear a race with
+ * the guest to access the queue again.
+ * We do not fear race with the host as we still get the device.
+ */
+static void vfio_ap_update_crycb(struct vfio_ap_queue *q)
+{
+ struct ap_matrix_mdev *matrix_mdev = q->matrix_mdev;
+
+ if (!matrix_mdev)
+ return;
+
+ clear_bit_inv(AP_QID_CARD(q->apqn), matrix_mdev->matrix.apm);
+
+ if (!matrix_mdev->kvm)
+ return;
+
+ kvm_arch_crypto_set_masks(matrix_mdev->kvm,
+ matrix_mdev->matrix.apm,
+ matrix_mdev->matrix.aqm,
+ matrix_mdev->matrix.adm);
+}
+
+/**
* vfio_ap_queue_dev_remove:
*
* Free the associated vfio_ap_queue structure
@@ -74,7 +104,13 @@ static void vfio_ap_queue_dev_remove(struct ap_device *apdev)
struct vfio_ap_queue *q;

q = dev_get_drvdata(&apdev->device);
+ if (!q)
+ return;
+
mutex_lock(&matrix_dev->lock);
+ vfio_ap_update_crycb(q);
+ vfio_ap_mdev_reset_queue(q);
+ vfio_ap_free_irq(q);
list_del(&q->list);
mutex_unlock(&matrix_dev->lock);
kfree(q);
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 7559b84..5db671c 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -63,15 +63,22 @@ static int vfio_ap_find_any_domain(int apqi)
return 0;
}

-static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q)
+int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q)
{
struct ap_queue_status status;
- int retry = 1;
+ int retry = 20;

do {
status = ap_zapq(q->apqn);
switch (status.response_code) {
case AP_RESPONSE_NORMAL:
+ while (!status.queue_empty && retry--) {
+ msleep(20);
+ status = ap_tapq(q->apqn, NULL);
+ }
+ if (retry <= 0)
+ pr_warn("%s: queue 0x%04x not empty\n",
+ __func__, q->apqn);
return 0;
case AP_RESPONSE_RESET_IN_PROGRESS:
case AP_RESPONSE_BUSY:
@@ -94,7 +101,7 @@ static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q)
* Unregister the ISC from the GIB alert
* Clear the vfio_ap_queue intern fields
*/
-static void vfio_ap_free_irq(struct vfio_ap_queue *q)
+void vfio_ap_free_irq(struct vfio_ap_queue *q)
{
if (!q)
return;
@@ -320,6 +327,7 @@ static int vfio_ap_mdev_remove(struct mdev_device *mdev)
list_for_each_entry_safe(q, qtmp, &matrix_mdev->qlist, list) {
q->matrix_mdev = NULL;
vfio_ap_mdev_reset_queue(q);
+ vfio_ap_free_irq(q);
list_move(&q->list, &matrix_dev->free_list);
}
list_del(&matrix_mdev->node);
@@ -382,6 +390,7 @@ static void vfio_ap_free_queue(int apqn, struct ap_matrix_mdev *matrix_mdev)
return;
q->matrix_mdev = NULL;
vfio_ap_mdev_reset_queue(q);
+ vfio_ap_free_irq(q);
list_move(&q->list, &matrix_dev->free_list);
}

@@ -1036,6 +1045,7 @@ static int vfio_ap_mdev_reset_queues(struct ap_matrix_mdev *matrix_mdev)

list_for_each_entry(q, &matrix_mdev->qlist, list) {
ret = vfio_ap_mdev_reset_queue(q);
+ vfio_ap_free_irq(q);
/*
* Regardless whether a queue turns out to be busy, or
* is not operational, we need to continue resetting
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index 968d8aa..9fe580b 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -4,6 +4,7 @@
*
* Author(s): Tony Krowiak <[email protected]>
* Halil Pasic <[email protected]>
+ * Pierre Morel <[email protected]>
*
* Copyright IBM Corp. 2018
*/
@@ -103,4 +104,6 @@ struct vfio_ap_queue {
unsigned char a_isc;
unsigned char p_isc;
};
+void vfio_ap_free_irq(struct vfio_ap_queue *q);
+int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q);
#endif /* _VFIO_AP_PRIVATE_H_ */
--
2.7.4


2019-03-22 14:45:42

by Pierre Morel

Subject: [PATCH v6 3/7] s390: ap: setup relation betwen KVM and mediated device

When the mediated device is opened we set up the relation with KVM, and
we unset it when the mediated device is released.

We lock the matrix mediated device to avoid any change until the
open is done.
We make sure that KVM is present when opening the mediated device
otherwise we return an error.

Increase kvm's refcount to ensure the KVM structures are still available
during the use of the mediated device by the guest.

Signed-off-by: Pierre Morel <[email protected]>
---
drivers/s390/crypto/vfio_ap_ops.c | 143 +++++++++++++++++++++-----------------
1 file changed, 79 insertions(+), 64 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 77f7bac..bdb36e0 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -787,74 +787,24 @@ static const struct attribute_group *vfio_ap_mdev_attr_groups[] = {
NULL
};

-/**
- * vfio_ap_mdev_set_kvm
- *
- * @matrix_mdev: a mediated matrix device
- * @kvm: reference to KVM instance
- *
- * Verifies no other mediated matrix device has @kvm and sets a reference to
- * it in @matrix_mdev->kvm.
- *
- * Return 0 if no other mediated matrix device has a reference to @kvm;
- * otherwise, returns an -EPERM.
- */
-static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
- struct kvm *kvm)
-{
- struct ap_matrix_mdev *m;
-
- mutex_lock(&matrix_dev->lock);
-
- list_for_each_entry(m, &matrix_dev->mdev_list, node) {
- if ((m != matrix_mdev) && (m->kvm == kvm)) {
- mutex_unlock(&matrix_dev->lock);
- return -EPERM;
- }
- }
-
- matrix_mdev->kvm = kvm;
- mutex_unlock(&matrix_dev->lock);
-
- return 0;
-}
-
static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
unsigned long action, void *data)
{
- int ret;
struct ap_matrix_mdev *matrix_mdev;

if (action != VFIO_GROUP_NOTIFY_SET_KVM)
return NOTIFY_OK;

matrix_mdev = container_of(nb, struct ap_matrix_mdev, group_notifier);
-
- if (!data) {
- matrix_mdev->kvm = NULL;
- return NOTIFY_OK;
- }
-
- ret = vfio_ap_mdev_set_kvm(matrix_mdev, data);
- if (ret)
- return NOTIFY_DONE;
-
- /* If there is no CRYCB pointer, then we can't copy the masks */
- if (!matrix_mdev->kvm->arch.crypto.crycbd)
- return NOTIFY_DONE;
-
- kvm_arch_crypto_set_masks(matrix_mdev->kvm, matrix_mdev->matrix.apm,
- matrix_mdev->matrix.aqm,
- matrix_mdev->matrix.adm);
+ matrix_mdev->kvm = data;

return NOTIFY_OK;
}

-static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
+static int vfio_ap_mdev_reset_queues(struct ap_matrix_mdev *matrix_mdev)
{
int ret;
int rc = 0;
- struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
struct vfio_ap_queue *q;

list_for_each_entry(q, &matrix_mdev->qlist, list) {
@@ -871,41 +821,106 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
return rc;
}

+/**
+ * vfio_ap_mdev_set_kvm
+ *
+ * @matrix_mdev: a mediated matrix device
+ *
+ * - Verifies that the hook is free and install the PQAP hook
+ * - Copy the matrix masks inside the CRYCB
+ * - Increment the KVM rerference count
+ *
+ * Return 0 if no other mediated matrix device has a reference to @kvm;
+ * otherwise, returns an -EPERM.
+ */
+static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev)
+{
+ if (matrix_mdev->kvm->arch.crypto.pqap_hook)
+ return -EPERM;
+ matrix_mdev->kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
+
+ kvm_arch_crypto_set_masks(matrix_mdev->kvm, matrix_mdev->matrix.apm,
+ matrix_mdev->matrix.aqm,
+ matrix_mdev->matrix.adm);
+
+ kvm_get_kvm(matrix_mdev->kvm);
+ return 0;
+}
+
+/**
+ * vfio_ap_mdev_unset_kvm
+ *
+ * @matrix_mdev: a mediated matrix device
+ *
+ * - Clears the matrix masks inside the CRYCB
+ * - Reset the queues before to clear the hook in case IRQ happen during
+ * reset.
+ * - Clears the hook
+ * - Decrement the KVM rerference count
+ */
+static int vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev)
+{
+ struct kvm *kvm = matrix_mdev->kvm;
+
+ kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
+ vfio_ap_mdev_reset_queues(matrix_mdev);
+ matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
+ matrix_mdev->kvm = NULL;
+ kvm_put_kvm(kvm);
+ return 0;
+}
+
static int vfio_ap_mdev_open(struct mdev_device *mdev)
{
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
unsigned long events;
int ret;

+ mutex_lock(&matrix_dev->lock);

- if (!try_module_get(THIS_MODULE))
- return -ENODEV;
+ if (!try_module_get(THIS_MODULE)) {
+ ret = -ENODEV;
+ goto unlock;
+ }

matrix_mdev->group_notifier.notifier_call = vfio_ap_mdev_group_notifier;
events = VFIO_GROUP_NOTIFY_SET_KVM;

ret = vfio_register_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
&events, &matrix_mdev->group_notifier);
- if (ret) {
- module_put(THIS_MODULE);
- return ret;
+ if (ret)
+ goto put_unlock;
+
+ /* We do not support opening the mediated device without KVM */
+ if (!matrix_mdev->kvm) {
+ ret = -ENOENT;
+ goto free_notifier;
}

- return 0;
+ ret = vfio_ap_mdev_set_kvm(matrix_mdev);
+ if (!ret)
+ goto unlock;
+
+free_notifier:
+ vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
+ &matrix_mdev->group_notifier);
+put_unlock:
+ module_put(THIS_MODULE);
+unlock:
+ mutex_unlock(&matrix_dev->lock);
+ return ret;
}

static void vfio_ap_mdev_release(struct mdev_device *mdev)
{
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);

- if (matrix_mdev->kvm)
- kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
-
- matrix_mdev->kvm = NULL;
- vfio_ap_mdev_reset_queues(mdev);
+ mutex_lock(&matrix_dev->lock);
+ vfio_ap_mdev_unset_kvm(matrix_mdev);
vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
&matrix_mdev->group_notifier);
module_put(THIS_MODULE);
+ mutex_unlock(&matrix_dev->lock);
}

static int vfio_ap_mdev_get_device_info(unsigned long arg)
@@ -939,7 +954,7 @@ static ssize_t vfio_ap_mdev_ioctl(struct mdev_device *mdev,
break;
case VFIO_DEVICE_RESET:
mutex_lock(&matrix_dev->lock);
- ret = vfio_ap_mdev_reset_queues(mdev);
+ ret = vfio_ap_mdev_reset_queues(mdev_get_drvdata(mdev));
mutex_unlock(&matrix_dev->lock);
break;
default:
--
2.7.4


2019-03-22 14:45:44

by Pierre Morel

Subject: [PATCH v6 4/7] vfio: ap: register IOMMU VFIO notifier

To be able to use the VFIO interface to facilitate the
mediated device memory pinning/unpinning we need to register
a notifier for IOMMU.

While we will start to pin one guest page for the interrupt indicator
byte, this is still OK with ballooning, as this page will never be
used by the guest virtio-balloon driver.
So the pinned page will never be freed. And even if a broken guest does
so, that would not impact the host, as the original page is still
under the control of vfio.

Signed-off-by: Pierre Morel <[email protected]>
Reviewed-by: Cornelia Huck <[email protected]>
---
drivers/s390/crypto/vfio_ap_ops.c | 38 +++++++++++++++++++++++++++++++++++
drivers/s390/crypto/vfio_ap_private.h | 2 ++
2 files changed, 40 insertions(+)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index bdb36e0..3478499 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -787,6 +787,35 @@ static const struct attribute_group *vfio_ap_mdev_attr_groups[] = {
NULL
};

+/**
+ * vfio_ap_mdev_iommu_notifier: IOMMU notifier callback
+ *
+ * @nb: The notifier block
+ * @action: Action to be taken
+ * @data: data associated with the request
+ *
+ * For an UNMAP request, unpin the guest IOVA (the NIB guest address we
+ * pinned before). Other requests are ignored.
+ *
+ */
+static int vfio_ap_mdev_iommu_notifier(struct notifier_block *nb,
+ unsigned long action, void *data)
+{
+ struct ap_matrix_mdev *matrix_mdev;
+
+ matrix_mdev = container_of(nb, struct ap_matrix_mdev, iommu_notifier);
+
+ if (action == VFIO_IOMMU_NOTIFY_DMA_UNMAP) {
+ struct vfio_iommu_type1_dma_unmap *unmap = data;
+ unsigned long g_pfn = unmap->iova >> PAGE_SHIFT;
+
+ vfio_unpin_pages(mdev_dev(matrix_mdev->mdev), &g_pfn, 1);
+ return NOTIFY_OK;
+ }
+
+ return NOTIFY_DONE;
+}
+
static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
unsigned long action, void *data)
{
@@ -897,6 +926,13 @@ static int vfio_ap_mdev_open(struct mdev_device *mdev)
goto free_notifier;
}

+ matrix_mdev->iommu_notifier.notifier_call = vfio_ap_mdev_iommu_notifier;
+ events = VFIO_IOMMU_NOTIFY_DMA_UNMAP;
+ ret = vfio_register_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
+ &events, &matrix_mdev->iommu_notifier);
+ if (ret)
+ goto free_notifier;
+
ret = vfio_ap_mdev_set_kvm(matrix_mdev);
if (!ret)
goto unlock;
@@ -917,6 +953,8 @@ static void vfio_ap_mdev_release(struct mdev_device *mdev)

mutex_lock(&matrix_dev->lock);
vfio_ap_mdev_unset_kvm(matrix_mdev);
+ vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
+ &matrix_mdev->iommu_notifier);
vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
&matrix_mdev->group_notifier);
module_put(THIS_MODULE);
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index 3e6940c..4a287c8 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -82,9 +82,11 @@ struct ap_matrix_mdev {
struct list_head node;
struct ap_matrix matrix;
struct notifier_block group_notifier;
+ struct notifier_block iommu_notifier;
struct kvm *kvm;
struct kvm_s390_module_hook pqap_hook;
struct list_head qlist;
+ struct mdev_device *mdev;
};

extern int vfio_ap_mdev_register(void);
--
2.7.4


2019-03-22 14:46:48

by Pierre Morel

Subject: [PATCH v6 2/7] s390: ap: new vfio_ap_queue structure

The AP interruptions are assigned on a queue basis and
the GISA structure is handled on a VM basis, so we need
to add a structure that we can retrieve from both sides,
holding the information we need to handle PQAP/AQIC interception
and to set up the GISA.

Since we cannot add more information to the ap_device,
we add a new structure, vfio_ap_queue, to hold the per-queue
information needed to handle interruptions, and set it as the
driver data of the standard ap_queue device.

Usually, the device and the mediated device are linked together,
but in the vfio_ap driver design we have a bunch of "sub" devices
(the ap_queue devices) belonging to the mediated device.

Linking these structures to the mediated device they are assigned to,
with the help of the vfio_ap_queue structure, will help us to
retrieve the AP devices associated with the mediated devices
during the mediated device operations.

  ------------    -------------
  | AP queue |--> | AP_vfio_q |<----
  ------------    ------^------    |    ---------------
                        |          <--->| matrix_mdev |
  ------------    ------v------    |    ---------------
  | AP queue |--> | AP_vfio_q |-----
  ------------    -------------

The vfio_ap_queue device will hold the following entries:
- apqn: AP queue number (defined here)
- isc : Interrupt subclass (defined later)
- nib : notification information byte (defined later)
- list: a list_head entry used to link this structure to the
matrix mediated device it is assigned to.

The vfio_ap_queue structure is allocated when the vfio_ap_driver
is probed and added as driver data to the ap_queue device.
It is freed on remove.

When the device is probed, the structure is linked to the matrix_dev
host device, building a kind of free list for the matrix mediated
devices.

When the vfio_ap_queue is associated with a matrix mediated device,
during assign_adapter or assign_domain,
the vfio_ap_queue device is linked to this matrix mediated device,
and it is unlinked when dissociated.

Queuing the devices on a list of free devices and testing the
matrix_mdev pointer to the associated matrix allows us to know
whether the queue is associated with the matrix device and whether
or not it is associated with a mediated device.

All operations on the free_list must be protected by the
VFIO AP matrix_dev lock.

Signed-off-by: Pierre Morel <[email protected]>
---
drivers/s390/crypto/vfio_ap_drv.c | 31 ++-
drivers/s390/crypto/vfio_ap_ops.c | 423 ++++++++++++++++++----------------
drivers/s390/crypto/vfio_ap_private.h | 7 +
3 files changed, 266 insertions(+), 195 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
index e9824c3..df6f21a 100644
--- a/drivers/s390/crypto/vfio_ap_drv.c
+++ b/drivers/s390/crypto/vfio_ap_drv.c
@@ -40,14 +40,42 @@ static struct ap_device_id ap_queue_ids[] = {

MODULE_DEVICE_TABLE(vfio_ap, ap_queue_ids);

+/**
+ * vfio_ap_queue_dev_probe:
+ *
+ * Allocate a vfio_ap_queue structure and associate it
+ * with the device as driver_data.
+ */
static int vfio_ap_queue_dev_probe(struct ap_device *apdev)
{
+ struct vfio_ap_queue *q;
+
+ q = kzalloc(sizeof(*q), GFP_KERNEL);
+ if (!q)
+ return -ENOMEM;
+ dev_set_drvdata(&apdev->device, q);
+ q->apqn = to_ap_queue(&apdev->device)->qid;
+ INIT_LIST_HEAD(&q->list);
+ mutex_lock(&matrix_dev->lock);
+ list_add(&q->list, &matrix_dev->free_list);
+ mutex_unlock(&matrix_dev->lock);
return 0;
}

+/**
+ * vfio_ap_queue_dev_remove:
+ *
+ * Free the associated vfio_ap_queue structure
+ */
static void vfio_ap_queue_dev_remove(struct ap_device *apdev)
{
- /* Nothing to do yet */
+ struct vfio_ap_queue *q;
+
+ q = dev_get_drvdata(&apdev->device);
+ mutex_lock(&matrix_dev->lock);
+ list_del(&q->list);
+ mutex_unlock(&matrix_dev->lock);
+ kfree(q);
}

static void vfio_ap_matrix_dev_release(struct device *dev)
@@ -107,6 +135,7 @@ static int vfio_ap_matrix_dev_create(void)
matrix_dev->device.bus = &matrix_bus;
matrix_dev->device.release = vfio_ap_matrix_dev_release;
matrix_dev->vfio_ap_drv = &vfio_ap_drv;
+ INIT_LIST_HEAD(&matrix_dev->free_list);

ret = device_register(&matrix_dev->device);
if (ret)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 900b9cf..77f7bac 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -24,6 +24,68 @@
#define VFIO_AP_MDEV_TYPE_HWVIRT "passthrough"
#define VFIO_AP_MDEV_NAME_HWVIRT "VFIO AP Passthrough Device"

+/**
+ * vfio_ap_get_queue: Retrieve a queue with a specific APQN from a list
+ * @apqn: The queue APQN
+ *
+ * Retrieve a queue with a specific APQN from the list of the
+ * devices associated with a list.
+ *
+ * Returns the pointer to the associated vfio_ap_queue
+ */
+struct vfio_ap_queue *vfio_ap_get_queue(int apqn, struct list_head *l)
+{
+ struct vfio_ap_queue *q;
+
+ list_for_each_entry(q, l, list)
+ if (q->apqn == apqn)
+ return q;
+ return NULL;
+}
+
+static int vfio_ap_find_any_card(int apid)
+{
+ struct vfio_ap_queue *q;
+
+ list_for_each_entry(q, &matrix_dev->free_list, list)
+ if (AP_QID_CARD(q->apqn) == apid)
+ return 1;
+ return 0;
+}
+
+static int vfio_ap_find_any_domain(int apqi)
+{
+ struct vfio_ap_queue *q;
+
+ list_for_each_entry(q, &matrix_dev->free_list, list)
+ if (AP_QID_QUEUE(q->apqn) == apqi)
+ return 1;
+ return 0;
+}
+
+static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q)
+{
+ struct ap_queue_status status;
+ int retry = 1;
+
+ do {
+ status = ap_zapq(q->apqn);
+ switch (status.response_code) {
+ case AP_RESPONSE_NORMAL:
+ return 0;
+ case AP_RESPONSE_RESET_IN_PROGRESS:
+ case AP_RESPONSE_BUSY:
+ msleep(20);
+ break;
+ default:
+ /* things are really broken, give up */
+ return -EIO;
+ }
+ } while (retry--);
+
+ return -EBUSY;
+}
+
static void vfio_ap_matrix_init(struct ap_config_info *info,
struct ap_matrix *matrix)
{
@@ -45,6 +107,7 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
return -ENOMEM;
}

+ INIT_LIST_HEAD(&matrix_mdev->qlist);
vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
mdev_set_drvdata(mdev, matrix_mdev);
mutex_lock(&matrix_dev->lock);
@@ -113,162 +176,189 @@ static struct attribute_group *vfio_ap_mdev_type_groups[] = {
NULL,
};

-struct vfio_ap_queue_reserved {
- unsigned long *apid;
- unsigned long *apqi;
- bool reserved;
-};
+static void vfio_ap_free_queue(int apqn, struct ap_matrix_mdev *matrix_mdev)
+{
+ struct vfio_ap_queue *q;
+
+ q = vfio_ap_get_queue(apqn, &matrix_mdev->qlist);
+ if (!q)
+ return;
+ q->matrix_mdev = NULL;
+ vfio_ap_mdev_reset_queue(q);
+ list_move(&q->list, &matrix_dev->free_list);
+}

/**
- * vfio_ap_has_queue
- *
- * @dev: an AP queue device
- * @data: a struct vfio_ap_queue_reserved reference
+ * vfio_ap_put_all_domains:
*
- * Flags whether the AP queue device (@dev) has a queue ID containing the APQN,
- * apid or apqi specified in @data:
+ * @matrix_mdev: the matrix mediated device for which we want to associate
+ * all available queues with a given apqi.
+ * @apid: The apid which associated with all defined APQI of the
+ * mediated device will define a AP queue.
*
- * - If @data contains both an apid and apqi value, then @data will be flagged
- * as reserved if the APID and APQI fields for the AP queue device matches
- *
- * - If @data contains only an apid value, @data will be flagged as
- * reserved if the APID field in the AP queue device matches
- *
- * - If @data contains only an apqi value, @data will be flagged as
- * reserved if the APQI field in the AP queue device matches
- *
- * Returns 0 to indicate the input to function succeeded. Returns -EINVAL if
- * @data does not contain either an apid or apqi.
+ * We remove the queue from the list of queues associated with the
+ * mediated device and put them back to the free list of the matrix
+ * device and clear the matrix_mdev pointer.
*/
-static int vfio_ap_has_queue(struct device *dev, void *data)
+static void vfio_ap_put_all_domains(struct ap_matrix_mdev *matrix_mdev,
+ int apid)
{
- struct vfio_ap_queue_reserved *qres = data;
- struct ap_queue *ap_queue = to_ap_queue(dev);
- ap_qid_t qid;
- unsigned long id;
+ int apqi, apqn;

- if (qres->apid && qres->apqi) {
- qid = AP_MKQID(*qres->apid, *qres->apqi);
- if (qid == ap_queue->qid)
- qres->reserved = true;
- } else if (qres->apid && !qres->apqi) {
- id = AP_QID_CARD(ap_queue->qid);
- if (id == *qres->apid)
- qres->reserved = true;
- } else if (!qres->apid && qres->apqi) {
- id = AP_QID_QUEUE(ap_queue->qid);
- if (id == *qres->apqi)
- qres->reserved = true;
- } else {
- return -EINVAL;
+ for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, AP_DOMAINS) {
+ apqn = AP_MKQID(apid, apqi);
+ vfio_ap_free_queue(apqn, matrix_mdev);
}
-
- return 0;
}

/**
- * vfio_ap_verify_queue_reserved
- *
- * @matrix_dev: a mediated matrix device
- * @apid: an AP adapter ID
- * @apqi: an AP queue index
+ * vfio_ap_put_all_cards:
*
- * Verifies that the AP queue with @apid/@apqi is reserved by the VFIO AP device
- * driver according to the following rules:
+ * @matrix_mdev: the matrix mediated device for which we want to associate
+ * all available queues with a given apqi.
+ * @apqi: The apqi which associated with all defined APID of the
+ * mediated device will define a AP queue.
*
- * - If both @apid and @apqi are not NULL, then there must be an AP queue
- * device bound to the vfio_ap driver with the APQN identified by @apid and
- * @apqi
- *
- * - If only @apid is not NULL, then there must be an AP queue device bound
- * to the vfio_ap driver with an APQN containing @apid
- *
- * - If only @apqi is not NULL, then there must be an AP queue device bound
- * to the vfio_ap driver with an APQN containing @apqi
- *
- * Returns 0 if the AP queue is reserved; otherwise, returns -EADDRNOTAVAIL.
+ * We remove the queue from the list of queues associated with the
+ * mediated device and put them back to the free list of the matrix
+ * device and clear the matrix_mdev pointer.
*/
-static int vfio_ap_verify_queue_reserved(unsigned long *apid,
- unsigned long *apqi)
+static void vfio_ap_put_all_cards(struct ap_matrix_mdev *matrix_mdev, int apqi)
{
- int ret;
- struct vfio_ap_queue_reserved qres;
+ int apid, apqn;

- qres.apid = apid;
- qres.apqi = apqi;
- qres.reserved = false;
-
- ret = driver_for_each_device(&matrix_dev->vfio_ap_drv->driver, NULL,
- &qres, vfio_ap_has_queue);
- if (ret)
- return ret;
+ for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {
+ apqn = AP_MKQID(apid, apqi);
+ vfio_ap_free_queue(apqn, matrix_mdev);
+ }
+}

- if (qres.reserved)
- return 0;
+static void move_and_set(struct list_head *src, struct list_head *dst,
+ struct ap_matrix_mdev *matrix_mdev)
+{
+ struct vfio_ap_queue *q, *qtmp;

- return -EADDRNOTAVAIL;
+ list_for_each_entry_safe(q, qtmp, src, list) {
+ list_move(&q->list, dst);
+ q->matrix_mdev = matrix_mdev;
+ }
}

-static int
-vfio_ap_mdev_verify_queues_reserved_for_apid(struct ap_matrix_mdev *matrix_mdev,
- unsigned long apid)
+static int vfio_ap_queue_match(struct device *dev, void *data)
{
- int ret;
- unsigned long apqi;
- unsigned long nbits = matrix_mdev->matrix.aqm_max + 1;
+ struct ap_queue *ap;

- if (find_first_bit_inv(matrix_mdev->matrix.aqm, nbits) >= nbits)
- return vfio_ap_verify_queue_reserved(&apid, NULL);
+ ap = to_ap_queue(dev);
+ return ap->qid == *(int *)data;
+}

- for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, nbits) {
- ret = vfio_ap_verify_queue_reserved(&apid, &apqi);
- if (ret)
- return ret;
- }
+static struct vfio_ap_queue *vfio_ap_find_queue(int apqn)
+{
+ struct device *dev;
+ struct vfio_ap_queue *q;
+
+ dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL,
+ &apqn, vfio_ap_queue_match);
+ if (!dev)
+ return NULL;
+ q = dev_get_drvdata(dev);
+ put_device(dev);
+ return q;
+}

+/**
+ * vfio_ap_get_all_domains:
+ *
+ * @matrix_mdev: the matrix mediated device for which we want to associate
+ * all available queues with a given apid.
+ * @apid: The apid which, associated with all defined APQI of the
+ * mediated device, will define an AP queue.
+ *
+ * We define a local list to put all queues we find on the matrix driver
+ * device list when associating the apid with all already defined apqi for
+ * this matrix mediated device.
+ *
+ * If we can get all the devices we roll them to the mediated device list
+ * If we get errors we unroll them to the free list.
+ */
+static int vfio_ap_get_all_domains(struct ap_matrix_mdev *matrix_mdev, int apid)
+{
+ int apqi, apqn;
+ int ret = 0;
+ struct vfio_ap_queue *q;
+ struct list_head q_list;
+
+ if (!vfio_ap_find_any_card(apid))
+ return -EADDRNOTAVAIL;
+
+ INIT_LIST_HEAD(&q_list);
+
+ for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, AP_DOMAINS) {
+ apqn = AP_MKQID(apid, apqi);
+ q = vfio_ap_find_queue(apqn);
+ if (!q) {
+ ret = -EADDRNOTAVAIL;
+ goto rewind;
+ }
+ if (q->matrix_mdev) {
+ ret = -EADDRINUSE;
+ goto rewind;
+ }
+ list_move(&q->list, &q_list);
+ }
+ move_and_set(&q_list, &matrix_mdev->qlist, matrix_mdev);
return 0;
+rewind:
+ move_and_set(&q_list, &matrix_dev->free_list, NULL);
+ return ret;
}
-
/**
- * vfio_ap_mdev_verify_no_sharing
+ * vfio_ap_get_all_cards:
*
- * Verifies that the APQNs derived from the cross product of the AP adapter IDs
- * and AP queue indexes comprising the AP matrix are not configured for another
- * mediated device. AP queue sharing is not allowed.
+ * @matrix_mdev: the matrix mediated device for which we want to associate
+ * all available queues with a given apqi.
+ * @apqi: The apqi which associated with all defined APID of the
+ * mediated device will define a AP queue.
*
- * @matrix_mdev: the mediated matrix device
+ * We define a local list to put all queues we find on the matrix device
+ * free list when associating the apqi with all already defined apid for
+ * this matrix mediated device.
*
- * Returns 0 if the APQNs are not shared, otherwise; returns -EADDRINUSE.
+ * If we can get all the devices we roll them to the mediated device list
+ * If we get errors we unroll them to the free list.
*/
-static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
+static int vfio_ap_get_all_cards(struct ap_matrix_mdev *matrix_mdev, int apqi)
{
- struct ap_matrix_mdev *lstdev;
- DECLARE_BITMAP(apm, AP_DEVICES);
- DECLARE_BITMAP(aqm, AP_DOMAINS);
-
- list_for_each_entry(lstdev, &matrix_dev->mdev_list, node) {
- if (matrix_mdev == lstdev)
- continue;
-
- memset(apm, 0, sizeof(apm));
- memset(aqm, 0, sizeof(aqm));
-
- /*
- * We work on full longs, as we can only exclude the leftover
- * bits in non-inverse order. The leftover is all zeros.
- */
- if (!bitmap_and(apm, matrix_mdev->matrix.apm,
- lstdev->matrix.apm, AP_DEVICES))
- continue;
-
- if (!bitmap_and(aqm, matrix_mdev->matrix.aqm,
- lstdev->matrix.aqm, AP_DOMAINS))
- continue;
-
- return -EADDRINUSE;
+ int apid, apqn;
+ int ret = 0;
+ struct vfio_ap_queue *q;
+ struct list_head q_list;
+ struct ap_matrix_mdev *tmp = NULL;
+
+ if (!vfio_ap_find_any_domain(apqi))
+ return -EADDRNOTAVAIL;
+
+ INIT_LIST_HEAD(&q_list);
+
+ for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {
+ apqn = AP_MKQID(apid, apqi);
+ q = vfio_ap_find_queue(apqn);
+ if (!q) {
+ ret = -EADDRNOTAVAIL;
+ goto rewind;
+ }
+ if (q->matrix_mdev) {
+ ret = -EADDRINUSE;
+ goto rewind;
+ }
+ list_move(&q->list, &q_list);
}
-
+ tmp = matrix_mdev;
+ move_and_set(&q_list, &matrix_mdev->qlist, matrix_mdev);
return 0;
+rewind:
+ move_and_set(&q_list, &matrix_dev->free_list, NULL);
+ return ret;
}

/**
@@ -330,21 +420,15 @@ static ssize_t assign_adapter_store(struct device *dev,
*/
mutex_lock(&matrix_dev->lock);

- ret = vfio_ap_mdev_verify_queues_reserved_for_apid(matrix_mdev, apid);
+ ret = vfio_ap_get_all_domains(matrix_mdev, apid);
if (ret)
goto done;

set_bit_inv(apid, matrix_mdev->matrix.apm);

- ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev);
- if (ret)
- goto share_err;
-
ret = count;
goto done;

-share_err:
- clear_bit_inv(apid, matrix_mdev->matrix.apm);
done:
mutex_unlock(&matrix_dev->lock);

@@ -391,32 +475,13 @@ static ssize_t unassign_adapter_store(struct device *dev,

mutex_lock(&matrix_dev->lock);
clear_bit_inv((unsigned long)apid, matrix_mdev->matrix.apm);
+ vfio_ap_put_all_domains(matrix_mdev, apid);
mutex_unlock(&matrix_dev->lock);

return count;
}
static DEVICE_ATTR_WO(unassign_adapter);

-static int
-vfio_ap_mdev_verify_queues_reserved_for_apqi(struct ap_matrix_mdev *matrix_mdev,
- unsigned long apqi)
-{
- int ret;
- unsigned long apid;
- unsigned long nbits = matrix_mdev->matrix.apm_max + 1;
-
- if (find_first_bit_inv(matrix_mdev->matrix.apm, nbits) >= nbits)
- return vfio_ap_verify_queue_reserved(NULL, &apqi);
-
- for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, nbits) {
- ret = vfio_ap_verify_queue_reserved(&apid, &apqi);
- if (ret)
- return ret;
- }
-
- return 0;
-}
-
/**
* assign_domain_store
*
@@ -471,21 +536,15 @@ static ssize_t assign_domain_store(struct device *dev,

mutex_lock(&matrix_dev->lock);

- ret = vfio_ap_mdev_verify_queues_reserved_for_apqi(matrix_mdev, apqi);
+ ret = vfio_ap_get_all_cards(matrix_mdev, apqi);
if (ret)
goto done;

set_bit_inv(apqi, matrix_mdev->matrix.aqm);

- ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev);
- if (ret)
- goto share_err;
-
ret = count;
goto done;

-share_err:
- clear_bit_inv(apqi, matrix_mdev->matrix.aqm);
done:
mutex_unlock(&matrix_dev->lock);

@@ -533,6 +592,7 @@ static ssize_t unassign_domain_store(struct device *dev,

mutex_lock(&matrix_dev->lock);
clear_bit_inv((unsigned long)apqi, matrix_mdev->matrix.aqm);
+ vfio_ap_put_all_cards(matrix_mdev, apqi);
mutex_unlock(&matrix_dev->lock);

return count;
@@ -790,49 +850,22 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
return NOTIFY_OK;
}

-static int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
- unsigned int retry)
-{
- struct ap_queue_status status;
-
- do {
- status = ap_zapq(AP_MKQID(apid, apqi));
- switch (status.response_code) {
- case AP_RESPONSE_NORMAL:
- return 0;
- case AP_RESPONSE_RESET_IN_PROGRESS:
- case AP_RESPONSE_BUSY:
- msleep(20);
- break;
- default:
- /* things are really broken, give up */
- return -EIO;
- }
- } while (retry--);
-
- return -EBUSY;
-}
-
static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
{
int ret;
int rc = 0;
- unsigned long apid, apqi;
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
+ struct vfio_ap_queue *q;

- for_each_set_bit_inv(apid, matrix_mdev->matrix.apm,
- matrix_mdev->matrix.apm_max + 1) {
- for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm,
- matrix_mdev->matrix.aqm_max + 1) {
- ret = vfio_ap_mdev_reset_queue(apid, apqi, 1);
- /*
- * Regardless whether a queue turns out to be busy, or
- * is not operational, we need to continue resetting
- * the remaining queues.
- */
- if (ret)
- rc = ret;
- }
+ list_for_each_entry(q, &matrix_mdev->qlist, list) {
+ ret = vfio_ap_mdev_reset_queue(q);
+ /*
+ * Regardless whether a queue turns out to be busy, or
+ * is not operational, we need to continue resetting
+ * the remaining queues but notice the last error code.
+ */
+ if (ret)
+ rc = ret;
}

return rc;
@@ -868,10 +901,10 @@ static void vfio_ap_mdev_release(struct mdev_device *mdev)
if (matrix_mdev->kvm)
kvm_arch_crypto_clear_masks(matrix_mdev->kvm);

+ matrix_mdev->kvm = NULL;
vfio_ap_mdev_reset_queues(mdev);
vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
&matrix_mdev->group_notifier);
- matrix_mdev->kvm = NULL;
module_put(THIS_MODULE);
}

@@ -905,7 +938,9 @@ static ssize_t vfio_ap_mdev_ioctl(struct mdev_device *mdev,
ret = vfio_ap_mdev_get_device_info(arg);
break;
case VFIO_DEVICE_RESET:
+ mutex_lock(&matrix_dev->lock);
ret = vfio_ap_mdev_reset_queues(mdev);
+ mutex_unlock(&matrix_dev->lock);
break;
default:
ret = -EOPNOTSUPP;
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index a910be1..3e6940c 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -40,6 +40,7 @@ struct ap_matrix_dev {
atomic_t available_instances;
struct ap_config_info info;
struct list_head mdev_list;
+ struct list_head free_list;
struct mutex lock;
struct ap_driver *vfio_ap_drv;
};
@@ -83,9 +84,15 @@ struct ap_matrix_mdev {
struct notifier_block group_notifier;
struct kvm *kvm;
struct kvm_s390_module_hook pqap_hook;
+ struct list_head qlist;
};

extern int vfio_ap_mdev_register(void);
extern void vfio_ap_mdev_unregister(void);

+struct vfio_ap_queue {
+ struct list_head list;
+ struct ap_matrix_mdev *matrix_mdev;
+ int apqn;
+};
#endif /* _VFIO_AP_PRIVATE_H_ */
--
2.7.4
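
The all-or-nothing assignment done by vfio_ap_get_all_domains() and
vfio_ap_get_all_cards() in the patch above can be modelled in user space.
This is only a sketch under assumptions: the names q_state, find_q, settle
and claim_all are hypothetical, and an ownership flag on an array stands in
for the kernel list_head moves between free_list, the local q_list and the
mediated device qlist.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* FREE ~ matrix_dev->free_list, STAGED ~ local q_list, ASSIGNED ~ qlist */
enum owner { FREE = 0, STAGED, ASSIGNED };

/* Hypothetical model of one queue: its APQN and where it currently lives. */
struct q_state {
	int apqn;
	enum owner owner;
};

static struct q_state *find_q(struct q_state *qs, size_t n, int apqn)
{
	for (size_t i = 0; i < n; i++)
		if (qs[i].apqn == apqn)
			return &qs[i];
	return NULL;
}

/* Move every STAGED queue to the given state (commit or rewind). */
static void settle(struct q_state *qs, size_t n, enum owner to)
{
	for (size_t i = 0; i < n; i++)
		if (qs[i].owner == STAGED)
			qs[i].owner = to;
}

/* Claim all wanted APQNs or none of them. */
static int claim_all(struct q_state *qs, size_t n,
		     const int *want, size_t nwant)
{
	assert(qs != NULL && want != NULL);
	for (size_t i = 0; i < nwant; i++) {
		struct q_state *q = find_q(qs, n, want[i]);
		int ret = 0;

		if (!q)
			ret = -EADDRNOTAVAIL;	/* no such queue bound */
		else if (q->owner != FREE)
			ret = -EADDRINUSE;	/* already owned elsewhere */
		if (ret) {
			settle(qs, n, FREE);	/* rewind the staged queues */
			return ret;
		}
		q->owner = STAGED;
	}
	settle(qs, n, ASSIGNED);		/* commit */
	return 0;
}
```

Staging first and settling afterwards means a failure halfway through the
loop leaves the free list exactly as it was before the call.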


2019-03-22 14:47:38

by Pierre Morel

[permalink] [raw]
Subject: [PATCH v6 5/7] s390: ap: implement PAPQ AQIC interception in kernel

We register the AP PQAP instruction hook when the mediated device
is opened, and unregister it on release.

In the AP PQAP instruction hook, if we receive a request to
enable IRQs,
- we retrieve the vfio_ap_queue based on the APQN we receive
in REG0,
- we retrieve the guest address of the NIB from
register REG2,
- we use the mediated device to pin the page of the guest address
through the VFIO pinning infrastructure,
- we retrieve the pointer to KVM to register the guest ISC
and retrieve the host ISC,
- finally we activate GISA.

If we receive a request to disable IRQs,
- we deactivate GISA,
- we unregister from the GIB,
- we unpin the NIB.
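
The ordering of the enable and disable steps listed above can be sketched
as a small user-space model. Everything here is an assumption made for
illustration: irq_model, model_aqic, model_enable_irq and model_disable_irq
are hypothetical names, and boolean flags stand in for the pinned NIB page,
the GIB registration and the host-side PQAP/AQIC call.

```c
#include <assert.h>
#include <stdbool.h>

#define ISC_INVALID 0xff	/* mirrors VFIO_AP_ISC_INVALID from the patch */

/* Hypothetical model of the per-queue interrupt state. */
struct irq_model {
	bool nib_pinned;	/* stands in for the pinned guest NIB page */
	unsigned char isc;	/* registered guest ISC, or ISC_INVALID */
	bool irq_enabled;	/* stands in for the host-side AQIC state */
};

/* Stub for the host PQAP/AQIC call: 0 on success, -1 on failure. */
static int model_aqic(struct irq_model *q, bool enable, bool hw_ok)
{
	if (!hw_ok)
		return -1;
	q->irq_enabled = enable;
	return 0;
}

/* Enable: pin the NIB, register the ISC, then activate; undo on failure. */
static int model_enable_irq(struct irq_model *q, unsigned char isc,
			    bool hw_ok)
{
	assert(isc != ISC_INVALID);
	q->nib_pinned = true;
	q->isc = isc;
	if (model_aqic(q, true, hw_ok)) {
		q->isc = ISC_INVALID;
		q->nib_pinned = false;
		return -1;
	}
	return 0;
}

/* Disable: deactivate first, then unregister the ISC and unpin the NIB. */
static int model_disable_irq(struct irq_model *q, bool hw_ok)
{
	if (model_aqic(q, false, hw_ok))
		return -1;	/* keep resources, they may still be in use */
	q->isc = ISC_INVALID;
	q->nib_pinned = false;
	return 0;
}
```

Resources are released only after the deactivation succeeded, which is the
same choice vfio_ap_clrirq() makes by calling vfio_ap_free_irq() only when
the response code is zero.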

Signed-off-by: Pierre Morel <[email protected]>
---
drivers/s390/crypto/ap_bus.h | 1 +
drivers/s390/crypto/vfio_ap_drv.c | 2 +
drivers/s390/crypto/vfio_ap_ops.c | 204 +++++++++++++++++++++++++++++++++-
drivers/s390/crypto/vfio_ap_private.h | 6 +
4 files changed, 210 insertions(+), 3 deletions(-)

diff --git a/drivers/s390/crypto/ap_bus.h b/drivers/s390/crypto/ap_bus.h
index d0059ea..9a4fd96 100644
--- a/drivers/s390/crypto/ap_bus.h
+++ b/drivers/s390/crypto/ap_bus.h
@@ -43,6 +43,7 @@ static inline int ap_test_bit(unsigned int *ptr, unsigned int nr)
#define AP_RESPONSE_BUSY 0x05
#define AP_RESPONSE_INVALID_ADDRESS 0x06
#define AP_RESPONSE_OTHERWISE_CHANGED 0x07
+#define AP_RESPONSE_INVALID_GISA 0x08
#define AP_RESPONSE_Q_FULL 0x10
#define AP_RESPONSE_NO_PENDING_REPLY 0x10
#define AP_RESPONSE_INDEX_TOO_BIG 0x11
diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
index df6f21a..796e73d4 100644
--- a/drivers/s390/crypto/vfio_ap_drv.c
+++ b/drivers/s390/crypto/vfio_ap_drv.c
@@ -55,6 +55,8 @@ static int vfio_ap_queue_dev_probe(struct ap_device *apdev)
return -ENOMEM;
dev_set_drvdata(&apdev->device, q);
q->apqn = to_ap_queue(&apdev->device)->qid;
+ q->a_isc = VFIO_AP_ISC_INVALID;
+ q->p_isc = VFIO_AP_ISC_INVALID;
INIT_LIST_HEAD(&q->list);
mutex_lock(&matrix_dev->lock);
list_add(&q->list, &matrix_dev->free_list);
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 3478499..7559b84 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -86,6 +86,194 @@ static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q)
return -EBUSY;
}

+/**
+ * vfio_ap_free_irq:
+ * @q: The vfio_ap_queue
+ *
+ * Unpin the guest NIB
+ * Unregister the ISC from the GIB alert
+ * Clear the vfio_ap_queue internal fields
+ */
+static void vfio_ap_free_irq(struct vfio_ap_queue *q)
+{
+ if (!q)
+ return;
+ if (q->a_pfn)
+ vfio_unpin_pages(mdev_dev(q->matrix_mdev->mdev), &q->a_pfn, 1);
+ if (q->a_isc != VFIO_AP_ISC_INVALID)
+ kvm_s390_gisc_unregister(q->matrix_mdev->kvm, q->a_isc);
+ q->a_pfn = 0;
+ q->p_pfn = 0;
+ q->a_isc = VFIO_AP_ISC_INVALID;
+ q->p_isc = VFIO_AP_ISC_INVALID;
+}
+
+/**
+ * vfio_ap_clrirq: Disable Interruption for a APQN
+ *
+ * @dev: the device associated with the ap_queue
+ * @q: the vfio_ap_queue holding AQIC parameters
+ *
+ * Issue the host side PQAP/AQIC
+ * On success: unpin the NIB saved in *q and unregister from GIB
+ * interface
+ *
+ * Return the ap_queue_status returned by the ap_aqic()
+ */
+static struct ap_queue_status vfio_ap_clrirq(struct vfio_ap_queue *q)
+{
+ struct ap_qirq_ctrl aqic_gisa = {};
+ struct ap_queue_status status;
+
+ status = ap_aqic(q->apqn, aqic_gisa, NULL);
+ if (!status.response_code)
+ vfio_ap_free_irq(q);
+
+ return status;
+}
+
+/**
+ * vfio_ap_setirq: Enable Interruption for a APQN
+ *
+ * @dev: the device associated with the ap_queue
+ * @q: the vfio_ap_queue holding AQIC parameters
+ *
+ * Pin the NIB saved in *q
+ * Register the guest ISC to GIB interface and retrieve the
+ * host ISC to issue the host side PQAP/AQIC
+ *
+ * Response.status may be set to following Response Code in case of error:
+ * - AP_RESPONSE_INVALID_ADDRESS: vfio_pin_pages failed
+ * - AP_RESPONSE_OTHERWISE_CHANGED: Hypervisor GISA internal error
+ *
+ * Otherwise return the ap_queue_status returned by the ap_aqic()
+ */
+static struct ap_queue_status vfio_ap_setirq(struct vfio_ap_queue *q)
+{
+ struct ap_qirq_ctrl aqic_gisa = {};
+ struct ap_queue_status status = {};
+ struct kvm_s390_gisa *gisa;
+ struct kvm *kvm;
+ unsigned long h_nib, h_pfn;
+ int ret;
+
+ kvm = q->matrix_mdev->kvm;
+ gisa = kvm->arch.gisa_int.origin;
+
+ q->a_pfn = q->a_nib >> PAGE_SHIFT;
+ ret = vfio_pin_pages(mdev_dev(q->matrix_mdev->mdev), &q->a_pfn, 1,
+ IOMMU_READ | IOMMU_WRITE, &h_pfn);
+ switch (ret) {
+ case 1:
+ break;
+ case -EINVAL:
+ case -E2BIG:
+ status.response_code = AP_RESPONSE_INVALID_ADDRESS;
+ /* Fallthrough */
+ default:
+ return status;
+ }
+
+ h_nib = (h_pfn << PAGE_SHIFT) | (q->a_nib & ~PAGE_MASK);
+ aqic_gisa.gisc = q->a_isc;
+ aqic_gisa.isc = kvm_s390_gisc_register(kvm, q->a_isc);
+ aqic_gisa.ir = 1;
+ aqic_gisa.gisa = gisa->next_alert >> 4;
+
+ status = ap_aqic(q->apqn, aqic_gisa, (void *)h_nib);
+ switch (status.response_code) {
+ case AP_RESPONSE_NORMAL:
+ /* See if we did clear older IRQ configuration */
+ if (q->p_pfn)
+ vfio_unpin_pages(mdev_dev(q->matrix_mdev->mdev),
+ &q->p_pfn, 1);
+ if (q->p_isc != VFIO_AP_ISC_INVALID)
+ kvm_s390_gisc_unregister(kvm, q->p_isc);
+ q->p_pfn = q->a_pfn;
+ q->p_isc = q->a_isc;
+ break;
+ case AP_RESPONSE_OTHERWISE_CHANGED:
+ /* We could not modify IRQ settings: clear new configuration */
+ vfio_unpin_pages(mdev_dev(q->matrix_mdev->mdev), &q->a_pfn, 1);
+ kvm_s390_gisc_unregister(kvm, q->a_isc);
+ break;
+ case AP_RESPONSE_INVALID_GISA:
+ status.response_code = AP_RESPONSE_INVALID_ADDRESS;
+ default: /* Fall Through */
+ pr_warn("%s: apqn %04x: response: %02x\n", __func__, q->apqn,
+ status.response_code);
+ vfio_ap_free_irq(q);
+ break;
+ }
+
+ return status;
+}
+
+/**
+ * handle_pqap: PQAP instruction callback
+ *
+ * @vcpu: The vcpu on which we received the PQAP instruction
+ *
+ * Get the general register contents to initialize internal variables.
+ * REG[0]: APQN
+ * REG[1]: IR and ISC
+ * REG[2]: NIB
+ *
+ * Response.status may be set to following Response Code:
+ * - AP_RESPONSE_Q_NOT_AVAIL: if the queue is not available
+ * - AP_RESPONSE_DECONFIGURED: if the queue is not configured
+ * - AP_RESPONSE_NORMAL (0) : in case of success
+ * Check vfio_ap_setirq() and vfio_ap_clrirq() for other possible RC.
+ * We take the matrix_dev lock to ensure serialization on queues and
+ * mediated device access.
+ *
+ * Return 0 if we could handle the request inside KVM.
+ * otherwise, returns -EOPNOTSUPP to let QEMU handle the fault.
+ */
+static int handle_pqap(struct kvm_vcpu *vcpu)
+{
+ uint64_t status;
+ uint16_t apqn;
+ struct vfio_ap_queue *q;
+ struct ap_queue_status qstatus = {
+ .response_code = AP_RESPONSE_Q_NOT_AVAIL, };
+ struct ap_matrix_mdev *matrix_mdev;
+
+ /* If we do not use the AIV facility just go to userland */
+ if (!(vcpu->arch.sie_block->eca & ECA_AIV))
+ return -EOPNOTSUPP;
+
+ apqn = vcpu->run->s.regs.gprs[0] & 0xffff;
+ mutex_lock(&matrix_dev->lock);
+ matrix_mdev = container_of(vcpu->kvm->arch.crypto.pqap_hook,
+ struct ap_matrix_mdev, pqap_hook);
+ if (!matrix_mdev)
+ goto out_unlock;
+ q = vfio_ap_get_queue(apqn, &matrix_mdev->qlist);
+ if (!q)
+ goto out_noqueue;
+
+ status = vcpu->run->s.regs.gprs[1];
+
+ /* If IR bit(16) is set we enable the interrupt */
+ if ((status >> (63 - 16)) & 0x01) {
+ q->a_isc = status & 0x07;
+ q->a_nib = vcpu->run->s.regs.gprs[2];
+ qstatus = vfio_ap_setirq(q);
+ if (qstatus.response_code) {
+ q->a_nib = 0;
+ q->a_isc = VFIO_AP_ISC_INVALID;
+ }
+ } else
+ qstatus = vfio_ap_clrirq(q);
+
+out_noqueue:
+ memcpy(&vcpu->run->s.regs.gprs[1], &qstatus, sizeof(qstatus));
+out_unlock:
+ mutex_unlock(&matrix_dev->lock);
+ return 0;
+}
+
static void vfio_ap_matrix_init(struct ap_config_info *info,
struct ap_matrix *matrix)
{
@@ -108,8 +296,11 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
}

INIT_LIST_HEAD(&matrix_mdev->qlist);
+ matrix_mdev->mdev = mdev;
vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
mdev_set_drvdata(mdev, matrix_mdev);
+ matrix_mdev->pqap_hook.hook = handle_pqap;
+ matrix_mdev->pqap_hook.owner = THIS_MODULE;
mutex_lock(&matrix_dev->lock);
list_add(&matrix_mdev->node, &matrix_dev->mdev_list);
mutex_unlock(&matrix_dev->lock);
@@ -120,11 +311,17 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
static int vfio_ap_mdev_remove(struct mdev_device *mdev)
{
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
+ struct vfio_ap_queue *q, *qtmp;

if (matrix_mdev->kvm)
return -EBUSY;

mutex_lock(&matrix_dev->lock);
+ list_for_each_entry_safe(q, qtmp, &matrix_mdev->qlist, list) {
+ q->matrix_mdev = NULL;
+ vfio_ap_mdev_reset_queue(q);
+ list_move(&q->list, &matrix_dev->free_list);
+ }
list_del(&matrix_mdev->node);
mutex_unlock(&matrix_dev->lock);

@@ -787,7 +984,7 @@ static const struct attribute_group *vfio_ap_mdev_attr_groups[] = {
NULL
};

-/**
+ /*
* vfio_ap_mdev_iommu_notifier: IOMMU notifier callback
*
* @nb: The notifier block
@@ -807,9 +1004,10 @@ static int vfio_ap_mdev_iommu_notifier(struct notifier_block *nb,

if (action == VFIO_IOMMU_NOTIFY_DMA_UNMAP) {
struct vfio_iommu_type1_dma_unmap *unmap = data;
- unsigned long g_pfn = unmap->iova >> PAGE_SHIFT;
+ unsigned long pfn = unmap->iova >> PAGE_SHIFT;

- vfio_unpin_pages(mdev_dev(matrix_mdev->mdev), &g_pfn, 1);
+ if (matrix_mdev->mdev)
+ vfio_unpin_pages(mdev_dev(matrix_mdev->mdev), &pfn, 1);
return NOTIFY_OK;
}

diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index 4a287c8..968d8aa 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -95,6 +95,12 @@ extern void vfio_ap_mdev_unregister(void);
struct vfio_ap_queue {
struct list_head list;
struct ap_matrix_mdev *matrix_mdev;
+ unsigned long a_nib;
+ unsigned long a_pfn;
+ unsigned long p_pfn;
int apqn;
+#define VFIO_AP_ISC_INVALID 0xff
+ unsigned char a_isc;
+ unsigned char p_isc;
};
#endif /* _VFIO_AP_PRIVATE_H_ */
--
2.7.4


2019-03-22 14:47:48

by Pierre Morel

[permalink] [raw]
Subject: [PATCH v6 1/7] s390: ap: kvm: add PQAP interception for AQIC

We prepare the interception of the PQAP/AQIC instruction for
the case where the AQIC facility is enabled in the guest.

First of all, we do not want to change the existing behavior when
intercepting AP instructions without the SIE allowing the guest
to use AP instructions.

In this patch we only handle the AQIC interception allowed by
facility 65, which will be enabled once the complete interception
infrastructure is present.

We add a callback inside the KVM arch structure for s390 so that
a VFIO driver can provide a specific response to the PQAP
instruction with the AQIC command, and only this command.

But we want to be able to return a correct answer to the guest
even if there is no VFIO AP driver in the kernel.
Therefore, we inject the correct exceptions from inside KVM for the
case where the callback is not initialized, which happens when the
vfio_ap driver is not loaded.

We consider it the responsibility of the driver to always initialize
the PQAP callback if it defines queues by initializing the CRYCB for
a guest.
If the callback has been set up, we call it.
If not, we set up an answer indicating that no queue is available
for the guest.
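
The dispatch rule described above (call the hook if one is registered,
otherwise answer that no queue is available) can be sketched in user space.
This is a simplified model under assumptions: vcpu_model, hook_model,
dispatch_pqap and sample_hook are hypothetical names, and the module
reference counting done with try_module_get()/module_put() in the patch is
left out.

```c
#include <assert.h>
#include <stddef.h>

#define RESP_NORMAL      0x00
#define RESP_Q_NOT_AVAIL 0x01	/* the value the patch stores without a hook */

/* Hypothetical stand-in for the vcpu: only the response code is modelled. */
struct vcpu_model {
	unsigned char response_code;
};

/* Hypothetical stand-in for struct kvm_s390_module_hook. */
struct hook_model {
	int (*hook)(struct vcpu_model *vcpu);
};

/* Call the registered hook, or report that no queue is available. */
static int dispatch_pqap(struct vcpu_model *vcpu, const struct hook_model *h)
{
	assert(vcpu != NULL);
	if (h && h->hook)
		return h->hook(vcpu);
	vcpu->response_code = RESP_Q_NOT_AVAIL;
	return 0;
}

/* Example hook that simply reports success. */
static int sample_hook(struct vcpu_model *vcpu)
{
	vcpu->response_code = RESP_NORMAL;
	return 0;
}
```

In both branches the function returns 0, so the guest always sees a
well-formed status word whether or not a driver is present.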

Signed-off-by: Pierre Morel <[email protected]>
---
arch/s390/include/asm/kvm_host.h | 8 ++++
arch/s390/kvm/priv.c | 90 +++++++++++++++++++++++++++++++++++
drivers/s390/crypto/vfio_ap_private.h | 2 +
3 files changed, 100 insertions(+)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index a496276..624460b 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -18,6 +18,7 @@
#include <linux/kvm_host.h>
#include <linux/kvm.h>
#include <linux/seqlock.h>
+#include <linux/module.h>
#include <asm/debug.h>
#include <asm/cpu.h>
#include <asm/fpu/api.h>
@@ -721,8 +722,15 @@ struct kvm_s390_cpu_model {
unsigned short ibc;
};

+struct kvm_s390_module_hook {
+ int (*hook)(struct kvm_vcpu *vcpu);
+ void *data;
+ struct module *owner;
+};
+
struct kvm_s390_crypto {
struct kvm_s390_crypto_cb *crycb;
+ struct kvm_s390_module_hook *pqap_hook;
__u32 crycbd;
__u8 aes_kw;
__u8 dea_kw;
diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
index 8679bd7..793e48a 100644
--- a/arch/s390/kvm/priv.c
+++ b/arch/s390/kvm/priv.c
@@ -27,6 +27,7 @@
#include <asm/io.h>
#include <asm/ptrace.h>
#include <asm/sclp.h>
+#include <asm/ap.h>
#include "gaccess.h"
#include "kvm-s390.h"
#include "trace.h"
@@ -592,6 +593,93 @@ static int handle_io_inst(struct kvm_vcpu *vcpu)
}
}

+/*
+ * handle_pqap: Handling pqap interception
+ * @vcpu: the vcpu that issued the pqap instruction
+ *
+ * We now support PQAP/AQIC instructions and we need to correctly
+ * answer the guest even if no dedicated driver's hook is available.
+ *
+ * The intercepting code calls a dedicated callback for this instruction
+ * if a driver did register one in the CRYPTO satellite of the
+ * SIE block.
+ *
+ * For PQAP AQIC and TAPQ instructions, verify privilege and specifications.
+ *
+ * If no callback available, the queues are not available, return this to
+ * the caller.
+ * Else return the value returned by the callback.
+ */
+static int handle_pqap(struct kvm_vcpu *vcpu)
+{
+ struct ap_queue_status status = {};
+ unsigned long reg0;
+ int ret;
+ uint8_t fc;
+
+ /* Verify that the AP instructions are available */
+ if (!ap_instructions_available())
+ return -EOPNOTSUPP;
+ /* Verify that the guest is allowed to use AP instructions */
+ if (!(vcpu->arch.sie_block->eca & ECA_APIE))
+ return -EOPNOTSUPP;
+ /*
+ * The only possibly intercepted instructions when AP instructions are
+ * available for the guest are AQIC and TAPQ with the t bit set
+ * since we do not set IC.3 (FIII) we currently will not intercept
+ * TAPQ.
+ * The following code will only treat AQIC function code.
+ */
+ reg0 = vcpu->run->s.regs.gprs[0];
+ fc = reg0 >> 24;
+ if (fc != 0x03) {
+ pr_warn("%s: Unexpected interception code 0x%02x\n",
+ __func__, fc);
+ return -EOPNOTSUPP;
+ }
+ /* All PQAP instructions are allowed for guest kernel only */
+ if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE)
+ return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP);
+ /*
+ * Common tests for PQAP instructions to generate a specification
+ * exception
+ */
+ /* Zero bits overwrite produce a specification exception */
+ if (reg0 & 0x007f0000UL)
+ goto specification_except;
+ /* If APXA is not installed APQN is limited */
+ if (!(vcpu->kvm->arch.crypto.crycbd & 0x02))
+ if (reg0 & 0x000030f0UL)
+ goto specification_except;
+ /* AQIC needs facility 65 */
+ if (!test_kvm_facility(vcpu->kvm, 65))
+ goto specification_except;
+
+ /*
+ * Verify that the hook callback is registered, lock the owner
+ * and call the hook.
+ */
+ if (vcpu->kvm->arch.crypto.pqap_hook) {
+ if (!try_module_get(vcpu->kvm->arch.crypto.pqap_hook->owner))
+ return -EOPNOTSUPP;
+ ret = vcpu->kvm->arch.crypto.pqap_hook->hook(vcpu);
+ module_put(vcpu->kvm->arch.crypto.pqap_hook->owner);
+ return ret;
+ }
+ /*
+ * It is the duty of the vfio_driver to register a hook
+ * If it does not and we get an exception on AQIC we must
+ * guess that there is no vfio_ap_driver at all and no one
+ * to handle the guest's CRYCB and the CRYCB is empty.
+ */
+ status.response_code = 0x01;
+ memcpy(&vcpu->run->s.regs.gprs[1], &status, sizeof(status));
+ return 0;
+
+specification_except:
+ return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
+}
+
static int handle_stfl(struct kvm_vcpu *vcpu)
{
int rc;
@@ -878,6 +966,8 @@ int kvm_s390_handle_b2(struct kvm_vcpu *vcpu)
return handle_sthyi(vcpu);
case 0x7d:
return handle_stsi(vcpu);
+ case 0xaf:
+ return handle_pqap(vcpu);
case 0xb1:
return handle_stfl(vcpu);
case 0xb2:
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index 76b7f98..a910be1 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -16,6 +16,7 @@
#include <linux/mdev.h>
#include <linux/delay.h>
#include <linux/mutex.h>
+#include <linux/kvm_host.h>

#include "ap_bus.h"

@@ -81,6 +82,7 @@ struct ap_matrix_mdev {
struct ap_matrix matrix;
struct notifier_block group_notifier;
struct kvm *kvm;
+ struct kvm_s390_module_hook pqap_hook;
};

extern int vfio_ap_mdev_register(void);
--
2.7.4


2019-03-25 08:08:05

by Harald Freudenberger

[permalink] [raw]
Subject: Re: [PATCH v6 2/7] s390: ap: new vfio_ap_queue structure

On 22.03.19 15:43, Pierre Morel wrote:
> The AP interruptions are assigned on a queue basis and
> the GISA structure is handled on a VM basis, so that
> we need to add a structure we can retrieve from both side
> holding the information we need to handle PQAP/AQIC interception
> and setup the GISA.
>
> Since we can not add more information to the ap_device
> we add a new structure vfio_ap_queue, to hold per queue
> information useful to handle interruptions and set it as
> driver's data of the standard ap_queue device.
>
> Usually, the device and the mediated device are linked together
> but in the vfio_ap driver design we have a bunch of "sub" devices
> (the ap_queue devices) belonging to the mediated device.
>
> Linking these structure to the mediated device it is assigned to,
> with the help of the vfio_ap_queue structure will help us to
> retrieve the AP devices associated with the mediated devices
> during the mediated device operations.
>
> ------------ -------------
> | AP queue |--> | AP_vfio_q |<----
> ------------ ------^------ | ---------------
> | <--->| matrix_mdev |
> ------------ ------v------ | ---------------
> | AP queue |--> | AP_vfio_q |-----
> ------------ -------------
>
> The vfio_ap_queue device will hold the following entries:
> - apqn: AP queue number (defined here)
> - isc : Interrupt subclass (defined later)
> - nib : notification information byte (defined later)
> - list: a list_head entry allowing to link this structure to a
> matrix mediated device it is assigned to.
>
> The vfio_ap_queue structure is allocated when the vfio_ap_driver
> is probed and added as driver data to the ap_queue device.
> It is free on remove.
>
> The structure is linked to the matrix_dev host device at the
> probe of the device building some kind of free list for the
> matrix mediated devices.
>
> When the vfio_queue is associated to a matrix mediated device,
> during assign_adapter or assign_domain,
> the vfio_ap_queue device is linked to this matrix mediated device
> and unlinked when dissociated.
>
> Queuing the devices on a list of free devices and testing the
> matrix_mdev pointer to the associated matrix allow us to know
> if the queue is associated to the matrix device and associated
> or not to a mediated device.
>
> All the operation on the free_list must be protected by the
> VFIO AP matrix_dev lock.
>
> Signed-off-by: Pierre Morel <[email protected]>
> ---
> drivers/s390/crypto/vfio_ap_drv.c | 31 ++-
> drivers/s390/crypto/vfio_ap_ops.c | 423 ++++++++++++++++++----------------
> drivers/s390/crypto/vfio_ap_private.h | 7 +
> 3 files changed, 266 insertions(+), 195 deletions(-)
>
> diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
> index e9824c3..df6f21a 100644
> --- a/drivers/s390/crypto/vfio_ap_drv.c
> +++ b/drivers/s390/crypto/vfio_ap_drv.c
> @@ -40,14 +40,42 @@ static struct ap_device_id ap_queue_ids[] = {
>
> MODULE_DEVICE_TABLE(vfio_ap, ap_queue_ids);
>
> +/**
> + * vfio_ap_queue_dev_probe:
> + *
> + * Allocate a vfio_ap_queue structure and associate it
> + * with the device as driver_data.
> + */
> static int vfio_ap_queue_dev_probe(struct ap_device *apdev)
> {
> + struct vfio_ap_queue *q;
> +
> + q = kzalloc(sizeof(*q), GFP_KERNEL);
> + if (!q)
> + return -ENOMEM;
> + dev_set_drvdata(&apdev->device, q);
> + q->apqn = to_ap_queue(&apdev->device)->qid;
> + INIT_LIST_HEAD(&q->list);
> + mutex_lock(&matrix_dev->lock);
> + list_add(&q->list, &matrix_dev->free_list);
> + mutex_unlock(&matrix_dev->lock);
> return 0;
> }
>
> +/**
> + * vfio_ap_queue_dev_remove:
> + *
> + * Free the associated vfio_ap_queue structure
> + */
> static void vfio_ap_queue_dev_remove(struct ap_device *apdev)
> {
> - /* Nothing to do yet */
> + struct vfio_ap_queue *q;
> +
> + q = dev_get_drvdata(&apdev->device);
I'd add a check if q != NULL here.
> + mutex_lock(&matrix_dev->lock);
> + list_del(&q->list);
> + mutex_unlock(&matrix_dev->lock);
> + kfree(q);
I would add a line:
    dev_set_drvdata(&apdev->device, NULL);
> }
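
Taken together, the two remarks above (check q for NULL, clear the driver
data after the free) could look roughly like the following user-space
sketch. It is not the driver code: dev_model, q_model, probe_model and
remove_model are hypothetical stand-ins, calloc()/free() replace
kzalloc()/kfree(), and the matrix_dev lock and list handling are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct device and struct vfio_ap_queue. */
struct dev_model {
	void *drvdata;
};

struct q_model {
	int apqn;
};

/* Allocate the per-queue structure and attach it as driver data. */
static int probe_model(struct dev_model *dev, int apqn)
{
	struct q_model *q = calloc(1, sizeof(*q));

	if (!q)
		return -1;
	q->apqn = apqn;
	dev->drvdata = q;
	return 0;
}

/* Free the per-queue structure, tolerating a device that has none. */
static void remove_model(struct dev_model *dev)
{
	struct q_model *q;

	assert(dev != NULL);
	q = dev->drvdata;
	if (!q)			/* first remark: nothing to free */
		return;
	free(q);
	dev->drvdata = NULL;	/* second remark: leave no dangling pointer */
}
```

With both changes a second call to the remove path becomes a no-op instead
of a double free.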
>
> static void vfio_ap_matrix_dev_release(struct device *dev)
> @@ -107,6 +135,7 @@ static int vfio_ap_matrix_dev_create(void)
> matrix_dev->device.bus = &matrix_bus;
> matrix_dev->device.release = vfio_ap_matrix_dev_release;
> matrix_dev->vfio_ap_drv = &vfio_ap_drv;
> + INIT_LIST_HEAD(&matrix_dev->free_list);
>
> ret = device_register(&matrix_dev->device);
> if (ret)
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 900b9cf..77f7bac 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -24,6 +24,68 @@
> #define VFIO_AP_MDEV_TYPE_HWVIRT "passthrough"
> #define VFIO_AP_MDEV_NAME_HWVIRT "VFIO AP Passthrough Device"
>
> +/**
> + * vfio_ap_get_queue: Retrieve a queue with a specific APQN from a list
> + * @apqn: The queue APQN
> + *
> + * Retrieve a queue with a specific APQN from the list of the
> + * devices associated with a list.
> + *
> + * Returns the pointer to the associated vfio_ap_queue
> + */
> +struct vfio_ap_queue *vfio_ap_get_queue(int apqn, struct list_head *l)
> +{
> + struct vfio_ap_queue *q;
> +
> + list_for_each_entry(q, l, list)
> + if (q->apqn == apqn)
> + return q;
> + return NULL;
> +}
> +
> +static int vfio_ap_find_any_card(int apid)
> +{
> + struct vfio_ap_queue *q;
> +
> + list_for_each_entry(q, &matrix_dev->free_list, list)
> + if (AP_QID_CARD(q->apqn) == apid)
> + return 1;
> + return 0;
> +}
> +
> +static int vfio_ap_find_any_domain(int apqi)
> +{
> + struct vfio_ap_queue *q;
> +
> + list_for_each_entry(q, &matrix_dev->free_list, list)
> + if (AP_QID_QUEUE(q->apqn) == apqi)
> + return 1;
> + return 0;
> +}
> +
> +static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q)
> +{
> + struct ap_queue_status status;
> + int retry = 1;
> +
> + do {
> + status = ap_zapq(q->apqn);
> + switch (status.response_code) {
> + case AP_RESPONSE_NORMAL:
> + return 0;
> + case AP_RESPONSE_RESET_IN_PROGRESS:
> + case AP_RESPONSE_BUSY:
> + msleep(20);
> + break;
> + default:
> + /* things are really broken, give up */
> + return -EIO;
> + }
> + } while (retry--);
> +
> + return -EBUSY;
> +}
> +
> static void vfio_ap_matrix_init(struct ap_config_info *info,
> struct ap_matrix *matrix)
> {
> @@ -45,6 +107,7 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
> return -ENOMEM;
> }
>
> + INIT_LIST_HEAD(&matrix_mdev->qlist);
> vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
> mdev_set_drvdata(mdev, matrix_mdev);
> mutex_lock(&matrix_dev->lock);
> @@ -113,162 +176,189 @@ static struct attribute_group *vfio_ap_mdev_type_groups[] = {
> NULL,
> };
>
> -struct vfio_ap_queue_reserved {
> - unsigned long *apid;
> - unsigned long *apqi;
> - bool reserved;
> -};
> +static void vfio_ap_free_queue(int apqn, struct ap_matrix_mdev *matrix_mdev)
> +{
> + struct vfio_ap_queue *q;
> +
> + q = vfio_ap_get_queue(apqn, &matrix_mdev->qlist);
> + if (!q)
> + return;
> + q->matrix_mdev = NULL;
> + vfio_ap_mdev_reset_queue(q);
> + list_move(&q->list, &matrix_dev->free_list);
> +}
>
> /**
> - * vfio_ap_has_queue
> - *
> - * @dev: an AP queue device
> - * @data: a struct vfio_ap_queue_reserved reference
> + * vfio_ap_put_all_domains:
> *
> - * Flags whether the AP queue device (@dev) has a queue ID containing the APQN,
> - * apid or apqi specified in @data:
> + * @matrix_mdev: the matrix mediated device for which we want to associate
> + * all available queues with a given apqi.
> + * @apid: The apid which associated with all defined APQI of the
> + * mediated device will define a AP queue.
> *
> - * - If @data contains both an apid and apqi value, then @data will be flagged
> - * as reserved if the APID and APQI fields for the AP queue device matches
> - *
> - * - If @data contains only an apid value, @data will be flagged as
> - * reserved if the APID field in the AP queue device matches
> - *
> - * - If @data contains only an apqi value, @data will be flagged as
> - * reserved if the APQI field in the AP queue device matches
> - *
> - * Returns 0 to indicate the input to function succeeded. Returns -EINVAL if
> - * @data does not contain either an apid or apqi.
> + * We remove the queue from the list of queues associated with the
> + * mediated device and put them back to the free list of the matrix
> + * device and clear the matrix_mdev pointer.
> */
> -static int vfio_ap_has_queue(struct device *dev, void *data)
> +static void vfio_ap_put_all_domains(struct ap_matrix_mdev *matrix_mdev,
> + int apid)
> {
> - struct vfio_ap_queue_reserved *qres = data;
> - struct ap_queue *ap_queue = to_ap_queue(dev);
> - ap_qid_t qid;
> - unsigned long id;
> + int apqi, apqn;
>
> - if (qres->apid && qres->apqi) {
> - qid = AP_MKQID(*qres->apid, *qres->apqi);
> - if (qid == ap_queue->qid)
> - qres->reserved = true;
> - } else if (qres->apid && !qres->apqi) {
> - id = AP_QID_CARD(ap_queue->qid);
> - if (id == *qres->apid)
> - qres->reserved = true;
> - } else if (!qres->apid && qres->apqi) {
> - id = AP_QID_QUEUE(ap_queue->qid);
> - if (id == *qres->apqi)
> - qres->reserved = true;
> - } else {
> - return -EINVAL;
> + for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, AP_DOMAINS) {
> + apqn = AP_MKQID(apid, apqi);
> + vfio_ap_free_queue(apqn, matrix_mdev);
> }
> -
> - return 0;
> }
>
> /**
> - * vfio_ap_verify_queue_reserved
> - *
> - * @matrix_dev: a mediated matrix device
> - * @apid: an AP adapter ID
> - * @apqi: an AP queue index
> + * vfio_ap_put_all_cards:
> *
> - * Verifies that the AP queue with @apid/@apqi is reserved by the VFIO AP device
> - * driver according to the following rules:
> + * @matrix_mdev: the matrix mediated device for which we want to associate
> + * all available queues with a given apqi.
> + * @apqi: The apqi which associated with all defined APID of the
> + * mediated device will define a AP queue.
> *
> - * - If both @apid and @apqi are not NULL, then there must be an AP queue
> - * device bound to the vfio_ap driver with the APQN identified by @apid and
> - * @apqi
> - *
> - * - If only @apid is not NULL, then there must be an AP queue device bound
> - * to the vfio_ap driver with an APQN containing @apid
> - *
> - * - If only @apqi is not NULL, then there must be an AP queue device bound
> - * to the vfio_ap driver with an APQN containing @apqi
> - *
> - * Returns 0 if the AP queue is reserved; otherwise, returns -EADDRNOTAVAIL.
> + * We remove the queue from the list of queues associated with the
> + * mediated device and put them back to the free list of the matrix
> + * device and clear the matrix_mdev pointer.
> */
> -static int vfio_ap_verify_queue_reserved(unsigned long *apid,
> - unsigned long *apqi)
> +static void vfio_ap_put_all_cards(struct ap_matrix_mdev *matrix_mdev, int apqi)
> {
> - int ret;
> - struct vfio_ap_queue_reserved qres;
> + int apid, apqn;
>
> - qres.apid = apid;
> - qres.apqi = apqi;
> - qres.reserved = false;
> -
> - ret = driver_for_each_device(&matrix_dev->vfio_ap_drv->driver, NULL,
> - &qres, vfio_ap_has_queue);
> - if (ret)
> - return ret;
> + for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {
> + apqn = AP_MKQID(apid, apqi);
> + vfio_ap_free_queue(apqn, matrix_mdev);
> + }
> +}
>
> - if (qres.reserved)
> - return 0;
> +static void move_and_set(struct list_head *src, struct list_head *dst,
> + struct ap_matrix_mdev *matrix_mdev)
> +{
> + struct vfio_ap_queue *q, *qtmp;
>
> - return -EADDRNOTAVAIL;
> + list_for_each_entry_safe(q, qtmp, src, list) {
> + list_move(&q->list, dst);
> + q->matrix_mdev = matrix_mdev;
> + }
> }
>
> -static int
> -vfio_ap_mdev_verify_queues_reserved_for_apid(struct ap_matrix_mdev *matrix_mdev,
> - unsigned long apid)
> +static int vfio_ap_queue_match(struct device *dev, void *data)
> {
> - int ret;
> - unsigned long apqi;
> - unsigned long nbits = matrix_mdev->matrix.aqm_max + 1;
> + struct ap_queue *ap;
>
> - if (find_first_bit_inv(matrix_mdev->matrix.aqm, nbits) >= nbits)
> - return vfio_ap_verify_queue_reserved(&apid, NULL);
> + ap = to_ap_queue(dev);
> + return ap->qid == *(int *)data;
> +}
>
> - for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, nbits) {
> - ret = vfio_ap_verify_queue_reserved(&apid, &apqi);
> - if (ret)
> - return ret;
> - }
> +static struct vfio_ap_queue *vfio_ap_find_queue(int apqn)
> +{
> + struct device *dev;
> + struct vfio_ap_queue *q;
> +
> + dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL,
> + &apqn, vfio_ap_queue_match);
> + if (!dev)
> + return NULL;
> + q = dev_get_drvdata(dev);
> + put_device(dev);
> + return q;
> +}
>
> +/**
> + * vfio_ap_get_all_domains:
> + *
> + * @matrix_mdev: the matrix mediated device for which we want to associate
> + * all available queues with a given apqi.
> + * @apqi: The apqi which associated with all defined APID of the
> + * mediated device will define a AP queue.
> + *
> + * We define a local list to put all queues we find on the matrix driver
> + * device list when associating the apqi with all already defined apid for
> + * this matrix mediated device.
> + *
> + * If we can get all the devices we roll them to the mediated device list
> + * If we get errors we unroll them to the free list.
> + */
> +static int vfio_ap_get_all_domains(struct ap_matrix_mdev *matrix_mdev, int apid)
> +{
> + int apqi, apqn;
> + int ret = 0;
> + struct vfio_ap_queue *q;
> + struct list_head q_list;
> +
> + if (!vfio_ap_find_any_card(apid))
> + return -EADDRNOTAVAIL;
> +
> + INIT_LIST_HEAD(&q_list);
> +
> + for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, AP_DOMAINS) {
> + apqn = AP_MKQID(apid, apqi);
> + q = vfio_ap_find_queue(apqn);
> + if (!q) {
> + ret = -EADDRNOTAVAIL;
> + goto rewind;
> + }
> + if (q->matrix_mdev) {
> + ret = -EADDRINUSE;
> + goto rewind;
> + }
> + list_move(&q->list, &q_list);
> + }
> + move_and_set(&q_list, &matrix_mdev->qlist, matrix_mdev);
> return 0;
> +rewind:
> + move_and_set(&q_list, &matrix_dev->free_list, NULL);
> + return ret;
> }
> -
> /**
> - * vfio_ap_mdev_verify_no_sharing
> + * vfio_ap_get_all_cards:
> *
> - * Verifies that the APQNs derived from the cross product of the AP adapter IDs
> - * and AP queue indexes comprising the AP matrix are not configured for another
> - * mediated device. AP queue sharing is not allowed.
> + * @matrix_mdev: the matrix mediated device for which we want to associate
> + * all available queues with a given apqi.
> + * @apqi: The apqi which associated with all defined APID of the
> + * mediated device will define a AP queue.
> *
> - * @matrix_mdev: the mediated matrix device
> + * We define a local list to put all queues we find on the matrix device
> + * free list when associating the apqi with all already defined apid for
> + * this matrix mediated device.
> *
> - * Returns 0 if the APQNs are not shared, otherwise; returns -EADDRINUSE.
> + * If we can get all the devices we roll them to the mediated device list
> + * If we get errors we unroll them to the free list.
> */
> -static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
> +static int vfio_ap_get_all_cards(struct ap_matrix_mdev *matrix_mdev, int apqi)
> {
> - struct ap_matrix_mdev *lstdev;
> - DECLARE_BITMAP(apm, AP_DEVICES);
> - DECLARE_BITMAP(aqm, AP_DOMAINS);
> -
> - list_for_each_entry(lstdev, &matrix_dev->mdev_list, node) {
> - if (matrix_mdev == lstdev)
> - continue;
> -
> - memset(apm, 0, sizeof(apm));
> - memset(aqm, 0, sizeof(aqm));
> -
> - /*
> - * We work on full longs, as we can only exclude the leftover
> - * bits in non-inverse order. The leftover is all zeros.
> - */
> - if (!bitmap_and(apm, matrix_mdev->matrix.apm,
> - lstdev->matrix.apm, AP_DEVICES))
> - continue;
> -
> - if (!bitmap_and(aqm, matrix_mdev->matrix.aqm,
> - lstdev->matrix.aqm, AP_DOMAINS))
> - continue;
> -
> - return -EADDRINUSE;
> + int apid, apqn;
> + int ret = 0;
> + struct vfio_ap_queue *q;
> + struct list_head q_list;
> + struct ap_matrix_mdev *tmp = NULL;
> +
> + if (!vfio_ap_find_any_domain(apqi))
> + return -EADDRNOTAVAIL;
> +
> + INIT_LIST_HEAD(&q_list);
> +
> + for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {
> + apqn = AP_MKQID(apid, apqi);
> + q = vfio_ap_find_queue(apqn);
> + if (!q) {
> + ret = -EADDRNOTAVAIL;
> + goto rewind;
> + }
> + if (q->matrix_mdev) {
> + ret = -EADDRINUSE;
> + goto rewind;
> + }
> + list_move(&q->list, &q_list);
> }
> -
> + tmp = matrix_mdev;
> + move_and_set(&q_list, &matrix_mdev->qlist, matrix_mdev);
> return 0;
> +rewind:
> + move_and_set(&q_list, &matrix_dev->free_list, NULL);
> + return ret;
> }
>
> /**
> @@ -330,21 +420,15 @@ static ssize_t assign_adapter_store(struct device *dev,
> */
> mutex_lock(&matrix_dev->lock);
>
> - ret = vfio_ap_mdev_verify_queues_reserved_for_apid(matrix_mdev, apid);
> + ret = vfio_ap_get_all_domains(matrix_mdev, apid);
> if (ret)
> goto done;
>
> set_bit_inv(apid, matrix_mdev->matrix.apm);
>
> - ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev);
> - if (ret)
> - goto share_err;
> -
> ret = count;
> goto done;
>
> -share_err:
> - clear_bit_inv(apid, matrix_mdev->matrix.apm);
> done:
> mutex_unlock(&matrix_dev->lock);
>
> @@ -391,32 +475,13 @@ static ssize_t unassign_adapter_store(struct device *dev,
>
> mutex_lock(&matrix_dev->lock);
> clear_bit_inv((unsigned long)apid, matrix_mdev->matrix.apm);
> + vfio_ap_put_all_domains(matrix_mdev, apid);
> mutex_unlock(&matrix_dev->lock);
>
> return count;
> }
> static DEVICE_ATTR_WO(unassign_adapter);
>
> -static int
> -vfio_ap_mdev_verify_queues_reserved_for_apqi(struct ap_matrix_mdev *matrix_mdev,
> - unsigned long apqi)
> -{
> - int ret;
> - unsigned long apid;
> - unsigned long nbits = matrix_mdev->matrix.apm_max + 1;
> -
> - if (find_first_bit_inv(matrix_mdev->matrix.apm, nbits) >= nbits)
> - return vfio_ap_verify_queue_reserved(NULL, &apqi);
> -
> - for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, nbits) {
> - ret = vfio_ap_verify_queue_reserved(&apid, &apqi);
> - if (ret)
> - return ret;
> - }
> -
> - return 0;
> -}
> -
> /**
> * assign_domain_store
> *
> @@ -471,21 +536,15 @@ static ssize_t assign_domain_store(struct device *dev,
>
> mutex_lock(&matrix_dev->lock);
>
> - ret = vfio_ap_mdev_verify_queues_reserved_for_apqi(matrix_mdev, apqi);
> + ret = vfio_ap_get_all_cards(matrix_mdev, apqi);
> if (ret)
> goto done;
>
> set_bit_inv(apqi, matrix_mdev->matrix.aqm);
>
> - ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev);
> - if (ret)
> - goto share_err;
> -
> ret = count;
> goto done;
>
> -share_err:
> - clear_bit_inv(apqi, matrix_mdev->matrix.aqm);
> done:
> mutex_unlock(&matrix_dev->lock);
>
> @@ -533,6 +592,7 @@ static ssize_t unassign_domain_store(struct device *dev,
>
> mutex_lock(&matrix_dev->lock);
> clear_bit_inv((unsigned long)apqi, matrix_mdev->matrix.aqm);
> + vfio_ap_put_all_cards(matrix_mdev, apqi);
> mutex_unlock(&matrix_dev->lock);
>
> return count;
> @@ -790,49 +850,22 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
> return NOTIFY_OK;
> }
>
> -static int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
> - unsigned int retry)
> -{
> - struct ap_queue_status status;
> -
> - do {
> - status = ap_zapq(AP_MKQID(apid, apqi));
> - switch (status.response_code) {
> - case AP_RESPONSE_NORMAL:
> - return 0;
> - case AP_RESPONSE_RESET_IN_PROGRESS:
> - case AP_RESPONSE_BUSY:
> - msleep(20);
> - break;
> - default:
> - /* things are really broken, give up */
> - return -EIO;
> - }
> - } while (retry--);
> -
> - return -EBUSY;
> -}
> -
> static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
> {
> int ret;
> int rc = 0;
> - unsigned long apid, apqi;
> struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
> + struct vfio_ap_queue *q;
>
> - for_each_set_bit_inv(apid, matrix_mdev->matrix.apm,
> - matrix_mdev->matrix.apm_max + 1) {
> - for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm,
> - matrix_mdev->matrix.aqm_max + 1) {
> - ret = vfio_ap_mdev_reset_queue(apid, apqi, 1);
> - /*
> - * Regardless whether a queue turns out to be busy, or
> - * is not operational, we need to continue resetting
> - * the remaining queues.
> - */
> - if (ret)
> - rc = ret;
> - }
> + list_for_each_entry(q, &matrix_mdev->qlist, list) {
> + ret = vfio_ap_mdev_reset_queue(q);
> + /*
> + * Regardless whether a queue turns out to be busy, or
> + * is not operational, we need to continue resetting
> + * the remaining queues but notice the last error code.
> + */
> + if (ret)
> + rc = ret;
> }
>
> return rc;
> @@ -868,10 +901,10 @@ static void vfio_ap_mdev_release(struct mdev_device *mdev)
> if (matrix_mdev->kvm)
> kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
>
> + matrix_mdev->kvm = NULL;
> vfio_ap_mdev_reset_queues(mdev);
> vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
> &matrix_mdev->group_notifier);
> - matrix_mdev->kvm = NULL;
> module_put(THIS_MODULE);
> }
>
> @@ -905,7 +938,9 @@ static ssize_t vfio_ap_mdev_ioctl(struct mdev_device *mdev,
> ret = vfio_ap_mdev_get_device_info(arg);
> break;
> case VFIO_DEVICE_RESET:
> + mutex_lock(&matrix_dev->lock);
> ret = vfio_ap_mdev_reset_queues(mdev);
> + mutex_unlock(&matrix_dev->lock);
> break;
> default:
> ret = -EOPNOTSUPP;
> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
> index a910be1..3e6940c 100644
> --- a/drivers/s390/crypto/vfio_ap_private.h
> +++ b/drivers/s390/crypto/vfio_ap_private.h
> @@ -40,6 +40,7 @@ struct ap_matrix_dev {
> atomic_t available_instances;
> struct ap_config_info info;
> struct list_head mdev_list;
> + struct list_head free_list;
> struct mutex lock;
> struct ap_driver *vfio_ap_drv;
> };
> @@ -83,9 +84,15 @@ struct ap_matrix_mdev {
> struct notifier_block group_notifier;
> struct kvm *kvm;
> struct kvm_s390_module_hook pqap_hook;
> + struct list_head qlist;
> };
>
> extern int vfio_ap_mdev_register(void);
> extern void vfio_ap_mdev_unregister(void);
>
> +struct vfio_ap_queue {
> + struct list_head list;
> + struct ap_matrix_mdev *matrix_mdev;
> + int apqn;
> +};
> #endif /* _VFIO_AP_PRIVATE_H_ */


2019-03-26 19:00:23

by Anthony Krowiak

Subject: Re: [PATCH v6 1/7] s390: ap: kvm: add PQAP interception for AQIC

On 3/22/19 10:43 AM, Pierre Morel wrote:
> We prepare the interception of the PQAP/AQIC instruction for
> the case the AQIC facility is enabled in the guest.
>
> First of all we do not want to change existing behavior when
> intercepting AP instructions without the SIE allowing the guest
> to use AP instructions.
>
> In this patch we only handle the AQIC interception allowed by
> facility 65 which will be enabled when the complete interception
> infrastructure will be present.
>
> We add a callback inside the KVM arch structure for s390 for
> a VFIO driver to handle a specific response to the PQAP
> instruction with the AQIC command and only this command.
>
> But we want to be able to return a correct answer to the guest
> even there is no VFIO AP driver in the kernel.
> Therefor, we inject the correct exceptions from inside KVM for the
> case the callback is not initialized, which happens when the vfio_ap
> driver is not loaded.
>
> We do consider the responsability of the driver to always initialize
> the PQAP callback if it defines queues by initializing the CRYCB for
> a guest.
> If the callback has been setup we call it.
> If not we setup an answer considering that no queue is available
> for the guest when no callback has been setup.
>
> Signed-off-by: Pierre Morel <[email protected]>
> ---
> arch/s390/include/asm/kvm_host.h | 8 ++++
> arch/s390/kvm/priv.c | 90 +++++++++++++++++++++++++++++++++++
> drivers/s390/crypto/vfio_ap_private.h | 2 +
> 3 files changed, 100 insertions(+)
>
> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
> index a496276..624460b 100644
> --- a/arch/s390/include/asm/kvm_host.h
> +++ b/arch/s390/include/asm/kvm_host.h
> @@ -18,6 +18,7 @@
> #include <linux/kvm_host.h>
> #include <linux/kvm.h>
> #include <linux/seqlock.h>
> +#include <linux/module.h>
> #include <asm/debug.h>
> #include <asm/cpu.h>
> #include <asm/fpu/api.h>
> @@ -721,8 +722,15 @@ struct kvm_s390_cpu_model {
> unsigned short ibc;
> };
>
> +struct kvm_s390_module_hook {
> + int (*hook)(struct kvm_vcpu *vcpu);
> + void *data;
> + struct module *owner;
> +};
> +
> struct kvm_s390_crypto {
> struct kvm_s390_crypto_cb *crycb;
> + struct kvm_s390_module_hook *pqap_hook;
> __u32 crycbd;
> __u8 aes_kw;
> __u8 dea_kw;
> diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
> index 8679bd7..793e48a 100644
> --- a/arch/s390/kvm/priv.c
> +++ b/arch/s390/kvm/priv.c
> @@ -27,6 +27,7 @@
> #include <asm/io.h>
> #include <asm/ptrace.h>
> #include <asm/sclp.h>
> +#include <asm/ap.h>
> #include "gaccess.h"
> #include "kvm-s390.h"
> #include "trace.h"
> @@ -592,6 +593,93 @@ static int handle_io_inst(struct kvm_vcpu *vcpu)
> }
> }
>
> +/*
> + * handle_pqap: Handling pqap interception
> + * @vcpu: the vcpu having issue the pqap instruction
> + *
> + * We now support PQAP/AQIC instructions and we need to correctly
> + * answer the guest even if no dedicated driver's hook is available.
> + *
> + * The intercepting code calls a dedicated callback for this instruction
> + * if a driver did register one in the CRYPTO satellite of the
> + * SIE block.
> + *
> + * For PQAP AQIC and TAPQ instructions, verify privilege and specifications.

The two paragraphs above should be described via the comments embedded
in the code and are not necessary here.

> + *
> + * If no callback available, the queues are not available, return this to
> + * the caller.

This implies it is specified via the return code when it is in fact
the response code in the status word.

> + * Else return the value returned by the callback.
> + */

Given this handler may be called for any PQAP instruction sub-function,
I think the function doc should be more generic, providing:

* A general description of what the function does
* A description of each input parameter
* A description of the value returned. If the return value is a return
code, the possible rc values can be enumerated with a description
of the reason each particular value may be returned.

> +static int handle_pqap(struct kvm_vcpu *vcpu)
> +{
> + struct ap_queue_status status = {};
> + unsigned long reg0;
> + int ret;
> + uint8_t fc;
> +
> + /* Verify that the AP instruction are available */
> + if (!ap_instructions_available())
> + return -EOPNOTSUPP;
> + /* Verify that the guest is allowed to use AP instructions */
> + if (!(vcpu->arch.sie_block->eca & ECA_APIE))
> + return -EOPNOTSUPP;
> + /*
> + * The only possibly intercepted instructions when AP instructions are
> + * available for the guest are AQIC and TAPQ with the t bit set
> + * since we do not set IC.3 (FIII) we currently will not intercept
> + * TAPQ.
> + * The following code will only treat AQIC function code.
> + */

Simplify to:

/* The only supported PQAP function is AQIC (0x03) */

> + reg0 = vcpu->run->s.regs.gprs[0];
> + fc = reg0 >> 24;
> + if (fc != 0x03) {
> + pr_warn("%s: Unexpected interception code 0x%02x\n",
> + __func__, fc);
> + return -EOPNOTSUPP;
> + }
> + /* All PQAP instructions are allowed for guest kernel only */

There is only one PQAP instruction with multiple sub-functions.
/* PQAP instruction is allowed for guest kernel only */
or
/* PQAP instruction is privileged */

> + if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE)
> + return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP);
> + /*
> + * Common tests for PQAP instructions to generate a specification
> + * exception
> + */

This comment is unnecessary as the individual comments below adequately
do the job.

> + /* Zero bits overwrite produce a specification exception */

This comment has no meaning unless you intimately know the architecture.
The following would make more sense:

/* Bits 41-47 must all be zeros */

It's probably not a big deal, but since we don't support PQAP(TAPQ),
would it make more sense to make sure bits 40-47 are zeros (i.e.,
the 't' bit is not set)?
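
As a cross-check of the bit numbering used above, here is a minimal user-space sketch. It assumes the s390 convention that bit 0 is the most significant bit of the 64-bit register; ibm_bits() is a hypothetical helper for illustration, not an existing kernel function.

```c
#include <stdint.h>

/*
 * Build a mask covering bits first..last of a 64-bit register, where
 * bit 0 is the most significant bit (s390 numbering).
 * Assumes 0 <= first <= last <= 63.
 */
uint64_t ibm_bits(unsigned int first, unsigned int last)
{
	uint64_t mask = 0;
	unsigned int bit;

	for (bit = first; bit <= last; bit++)
		mask |= UINT64_C(1) << (63 - bit);
	return mask;
}
```

Under that assumption, ibm_bits(41, 47) gives 0x007f0000, the mask used in the patch, and ibm_bits(40, 47) gives 0x00ff0000, which would also reject the 't' bit.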

> + if (reg0 & 0x007f0000UL)
> + goto specification_except;
> + /* If APXA is not installed APQN is limited */

Wouldn't it be better to state how the APQN is limited?
For example:

/*
* If APXA is not installed, then the maximum APID is
* 63 (bits 48-49 of reg0 must be zero) and the maximum
* APQI is 15 (bits 56-59 must be zero)
*/

> + if (!(vcpu->kvm->arch.crypto.crycbd & 0x02))
> + if (reg0 & 0x000030f0UL)

If APXA is not installed, then bits 48-49 and 56-59 must all be
zeros. Shouldn't this mask be 0x0000c0f0UL?
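
To make the difference between the two masks concrete, here is a small user-space sketch. It assumes the APQN layout of AP_MKQID() (APID in bits 48-55, APQI in bits 56-63 of reg0) and the non-APXA limits quoted above (APID <= 63, APQI <= 15). Both function names are hypothetical.

```c
#include <stdint.h>

/* Nonzero if the APQN in reg0 exceeds the limits that apply without APXA. */
int apqn_exceeds_legacy_limits(uint64_t reg0)
{
	unsigned int apid = (reg0 >> 8) & 0xff;	/* bits 48-55 */
	unsigned int apqi = reg0 & 0xff;	/* bits 56-63 */

	return apid > 63 || apqi > 15;
}

/* Nonzero if any bit selected by mask is set in reg0. */
int rejected_by_mask(uint64_t reg0, uint64_t mask)
{
	return (reg0 & mask) != 0;
}
```

For APID 64 with APQI 0 (reg0 low bits 0x4000), the limit is exceeded, yet 0x30f0 lets it through while 0xc0f0 rejects it. For APID 16 (0x1000), which is within the limit, 0x30f0 rejects it while 0xc0f0 accepts it.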

> + goto specification_except;
> + /* AQIC needs facility 65 */
> + if (!test_kvm_facility(vcpu->kvm, 65))
> + goto specification_except;
> +
> + /*
> + * Verify that the hook callback is registered, lock the owner
> + * and call the hook.
> + */
> + if (vcpu->kvm->arch.crypto.pqap_hook) {
> + if (!try_module_get(vcpu->kvm->arch.crypto.pqap_hook->owner))
> + return -EOPNOTSUPP;
> + ret = vcpu->kvm->arch.crypto.pqap_hook->hook(vcpu);
> + module_put(vcpu->kvm->arch.crypto.pqap_hook->owner);
> + return ret;
> + }
> + /*
> + * It is the duty of the vfio_driver to register a hook
> + * If it does not and we get an exception on AQIC we must
> + * guess that there is no vfio_ap_driver at all and no one
> + * to handle the guests's CRYCB and the CRYCB is empty.
> + */

The comment above does not make sense to me. If there is no pqap
hook registered, then we need to handle that case for sure. But why
mention getting an exception? Why even mention whose responsibility
it is to set the hook when all we need to know is whether a hook is
set or not?

I am wondering whether merely setting a response code indicating the
APQN is invalid is the correct thing to do here. First of all, if the
guest's CRYCB is empty, then the AP bus running in the guest would not
create any AP devices or any AP queues bound to any zcrypt driver. In
that case, I don't think the PQAP(AQIC) would ever be issued. If a
PQAP is intercepted, wouldn't we want to return -EOPNOTSUPP?



> + status.response_code = 0x01;
> + memcpy(&vcpu->run->s.regs.gprs[1], &status, sizeof(status));
> + return 0;
> +
> +specification_except:
> + return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
> +}
> +
> static int handle_stfl(struct kvm_vcpu *vcpu)
> {
> int rc;
> @@ -878,6 +966,8 @@ int kvm_s390_handle_b2(struct kvm_vcpu *vcpu)
> return handle_sthyi(vcpu);
> case 0x7d:
> return handle_stsi(vcpu);
> + case 0xaf:
> + return handle_pqap(vcpu);
> case 0xb1:
> return handle_stfl(vcpu);
> case 0xb2:
> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
> index 76b7f98..a910be1 100644
> --- a/drivers/s390/crypto/vfio_ap_private.h
> +++ b/drivers/s390/crypto/vfio_ap_private.h
> @@ -16,6 +16,7 @@
> #include <linux/mdev.h>
> #include <linux/delay.h>
> #include <linux/mutex.h>
> +#include <linux/kvm_host.h>
>
> #include "ap_bus.h"
>
> @@ -81,6 +82,7 @@ struct ap_matrix_mdev {
> struct ap_matrix matrix;
> struct notifier_block group_notifier;
> struct kvm *kvm;
> + struct kvm_s390_module_hook pqap_hook;
> };
>
> extern int vfio_ap_mdev_register(void);
>


2019-03-26 20:46:32

by Anthony Krowiak

Subject: Re: [PATCH v6 2/7] s390: ap: new vfio_ap_queue structure

On 3/22/19 10:43 AM, Pierre Morel wrote:
> The AP interruptions are assigned on a queue basis and
> the GISA structure is handled on a VM basis, so that
> we need to add a structure we can retrieve from both side

s/side/sides/

> holding the information we need to handle PQAP/AQIC interception
> and setup the GISA.

s/setup/set up/

>
> Since we can not add more information to the ap_device
> we add a new structure vfio_ap_queue, to hold per queue
> information useful to handle interruptions and set it as
> driver's data of the standard ap_queue device.
>
> Usually, the device and the mediated device are linked together
> but in the vfio_ap driver design we have a bunch of "sub" devices
> (the ap_queue devices) belonging to the mediated device.
>
> Linking these structure to the mediated device it is assigned to,
> with the help of the vfio_ap_queue structure will help us to
> retrieve the AP devices associated with the mediated devices
> during the mediated device operations.
>
> ------------ -------------
> | AP queue |--> | AP_vfio_q |<----
> ------------ ------^------ | ---------------
> | <--->| matrix_mdev |
> ------------ ------v------ | ---------------
> | AP queue |--> | AP_vfio_q |-----
> ------------ -------------
>
> The vfio_ap_queue device will hold the following entries:
> - apqn: AP queue number (defined here)
> - isc : Interrupt subclass (defined later)
> - nib : notification information byte (defined later)
> - list: a list_head entry allowing to link this structure to a
> matrix mediated device it is assigned to.
>
> The vfio_ap_queue structure is allocated when the vfio_ap_driver
> is probed and added as driver data to the ap_queue device.
> It is free on remove.
>
> The structure is linked to the matrix_dev host device at the
> probe of the device building some kind of free list for the
> matrix mediated devices.
>
> When the vfio_queue is associated to a matrix mediated device,
> during assign_adapter or assign_domain,
> the vfio_ap_queue device is linked to this matrix mediated device
> and unlinked when dissociated.
>
> Queuing the devices on a list of free devices and testing the
> matrix_mdev pointer to the associated matrix allow us to know
> if the queue is associated to the matrix device and associated
> or not to a mediated device.
>
> All the operation on the free_list must be protected by the
> VFIO AP matrix_dev lock.
>
> Signed-off-by: Pierre Morel <[email protected]>
> ---
> drivers/s390/crypto/vfio_ap_drv.c | 31 ++-
> drivers/s390/crypto/vfio_ap_ops.c | 423 ++++++++++++++++++----------------
> drivers/s390/crypto/vfio_ap_private.h | 7 +
> 3 files changed, 266 insertions(+), 195 deletions(-)
>
> diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
> index e9824c3..df6f21a 100644
> --- a/drivers/s390/crypto/vfio_ap_drv.c
> +++ b/drivers/s390/crypto/vfio_ap_drv.c
> @@ -40,14 +40,42 @@ static struct ap_device_id ap_queue_ids[] = {
>
> MODULE_DEVICE_TABLE(vfio_ap, ap_queue_ids);
>
> +/**
> + * vfio_ap_queue_dev_probe:
> + *
> + * Allocate a vfio_ap_queue structure and associate it
> + * with the device as driver_data.
> + */
> static int vfio_ap_queue_dev_probe(struct ap_device *apdev)
> {
> + struct vfio_ap_queue *q;
> +
> + q = kzalloc(sizeof(*q), GFP_KERNEL);
> + if (!q)
> + return -ENOMEM;
> + dev_set_drvdata(&apdev->device, q);
> + q->apqn = to_ap_queue(&apdev->device)->qid;
> + INIT_LIST_HEAD(&q->list);
> + mutex_lock(&matrix_dev->lock);
> + list_add(&q->list, &matrix_dev->free_list);
> + mutex_unlock(&matrix_dev->lock);
> return 0;
> }
>
> +/**
> + * vfio_ap_queue_dev_remove:
> + *
> + * Free the associated vfio_ap_queue structure
> + */
> static void vfio_ap_queue_dev_remove(struct ap_device *apdev)
> {
> - /* Nothing to do yet */
> + struct vfio_ap_queue *q;
> +
> + q = dev_get_drvdata(&apdev->device);
> + mutex_lock(&matrix_dev->lock);
> + list_del(&q->list);
> + mutex_unlock(&matrix_dev->lock);
> + kfree(q);
> }
>
> static void vfio_ap_matrix_dev_release(struct device *dev)
> @@ -107,6 +135,7 @@ static int vfio_ap_matrix_dev_create(void)
> matrix_dev->device.bus = &matrix_bus;
> matrix_dev->device.release = vfio_ap_matrix_dev_release;
> matrix_dev->vfio_ap_drv = &vfio_ap_drv;
> + INIT_LIST_HEAD(&matrix_dev->free_list);
>
> ret = device_register(&matrix_dev->device);
> if (ret)
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 900b9cf..77f7bac 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -24,6 +24,68 @@
> #define VFIO_AP_MDEV_TYPE_HWVIRT "passthrough"
> #define VFIO_AP_MDEV_NAME_HWVIRT "VFIO AP Passthrough Device"
>
> +/**
> + * vfio_ap_get_queue: Retrieve a queue with a specific APQN from a list
> + * @apqn: The queue APQN
> + *
> + * Retrieve a queue with a specific APQN from the list of the
> + * devices associated with a list.
> + *
> + * Returns the pointer to the associated vfio_ap_queue
> + */
> +struct vfio_ap_queue *vfio_ap_get_queue(int apqn, struct list_head *l)
> +{
> + struct vfio_ap_queue *q;
> +
> + list_for_each_entry(q, l, list)
> + if (q->apqn == apqn)
> + return q;
> + return NULL;
> +}
> +
> +static int vfio_ap_find_any_card(int apid)
> +{
> + struct vfio_ap_queue *q;
> +
> + list_for_each_entry(q, &matrix_dev->free_list, list)
> + if (AP_QID_CARD(q->apqn) == apid)
> + return 1;
> + return 0;
> +}
> +
> +static int vfio_ap_find_any_domain(int apqi)
> +{
> + struct vfio_ap_queue *q;
> +
> + list_for_each_entry(q, &matrix_dev->free_list, list)
> + if (AP_QID_QUEUE(q->apqn) == apqi)
> + return 1;
> + return 0;
> +}
> +
> +static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q)
> +{
> + struct ap_queue_status status;
> + int retry = 1;
> +
> + do {
> + status = ap_zapq(q->apqn);
> + switch (status.response_code) {
> + case AP_RESPONSE_NORMAL:
> + return 0;
> + case AP_RESPONSE_RESET_IN_PROGRESS:
> + case AP_RESPONSE_BUSY:
> + msleep(20);
> + break;
> + default:
> + /* things are really broken, give up */

I'm not sure things are necessarily broken. We could end up here if
the AP is removed from the configuration via the SE or SCLP Deconfigure
Adjunct Processor command.
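
To illustrate the point, here is a minimal user-space sketch of the same retry loop with a reworded comment in the default branch. Everything in it is a stand-in and an assumption made for illustration: the enum values, fake_zapq() and the scripted responses are hypothetical and do not reflect the real AP response codes or the ap_zapq() interface. Only the control flow mirrors the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for AP response codes (illustrative only). */
enum ap_response {
	RESP_NORMAL,
	RESP_RESET_IN_PROGRESS,
	RESP_BUSY,
	RESP_DECONFIGURED,
	RESP_OTHER,
};

/* Hypothetical reset primitive: returns the next scripted response. */
static enum ap_response fake_zapq(const enum ap_response *script, int *pos)
{
	return script[(*pos)++];
}

/*
 * Same shape as the loop in the patch: one initial attempt plus
 * @retry further attempts while the queue reports busy.
 */
static int reset_queue(const enum ap_response *script, int retry)
{
	int pos = 0;

	assert(script != NULL);
	do {
		switch (fake_zapq(script, &pos)) {
		case RESP_NORMAL:
			return 0;
		case RESP_RESET_IN_PROGRESS:
		case RESP_BUSY:
			/* the driver sleeps for 20 ms here before retrying */
			break;
		default:
			/*
			 * Not necessarily broken: the queue may have been
			 * deconfigured or taken offline. Give up either way.
			 */
			return -EIO;
		}
	} while (retry--);

	return -EBUSY;
}
```

With retry set to 1, a busy response followed by a normal one yields 0, two busy responses yield -EBUSY, and any other response ends the loop at once with -EIO.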

> + return -EIO;
> + }
> + } while (retry--);
> +
> + return -EBUSY;
> +}
> +
> static void vfio_ap_matrix_init(struct ap_config_info *info,
> struct ap_matrix *matrix)
> {
> @@ -45,6 +107,7 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
> return -ENOMEM;
> }
>
> + INIT_LIST_HEAD(&matrix_mdev->qlist);
> vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
> mdev_set_drvdata(mdev, matrix_mdev);
> mutex_lock(&matrix_dev->lock);
> @@ -113,162 +176,189 @@ static struct attribute_group *vfio_ap_mdev_type_groups[] = {
> NULL,
> };
>
> -struct vfio_ap_queue_reserved {
> - unsigned long *apid;
> - unsigned long *apqi;
> - bool reserved;
> -};
> +static void vfio_ap_free_queue(int apqn, struct ap_matrix_mdev *matrix_mdev)
> +{
> + struct vfio_ap_queue *q;
> +
> + q = vfio_ap_get_queue(apqn, &matrix_mdev->qlist);
> + if (!q)
> + return;
> + q->matrix_mdev = NULL;
> + vfio_ap_mdev_reset_queue(q);

I'm wondering if it's necessary to reset the queue here. The only time
a queue is used is when a guest using the mdev device is started. When
that guest is terminated, the fd for the mdev device is closed and the
mdev device's release callback is invoked. The release callback resets
the queues assigned to the mdev device. Is it really necessary to
reset the queue again when it is unassigned even if there would have
been no subsequent activity?

> + list_move(&q->list, &matrix_dev->free_list);
> +}
>
> /**
> - * vfio_ap_has_queue
> - *
> - * @dev: an AP queue device
> - * @data: a struct vfio_ap_queue_reserved reference
> + * vfio_ap_put_all_domains:
> *
> - * Flags whether the AP queue device (@dev) has a queue ID containing the APQN,
> - * apid or apqi specified in @data:
> + * @matrix_mdev: the matrix mediated device for which we want to associate
> + * all available queues with a given apqi.
> + * @apid: The apid which associated with all defined APQI of the
> + * mediated device will define a AP queue.
> *
> - * - If @data contains both an apid and apqi value, then @data will be flagged
> - * as reserved if the APID and APQI fields for the AP queue device matches
> - *
> - * - If @data contains only an apid value, @data will be flagged as
> - * reserved if the APID field in the AP queue device matches
> - *
> - * - If @data contains only an apqi value, @data will be flagged as
> - * reserved if the APQI field in the AP queue device matches
> - *
> - * Returns 0 to indicate the input to function succeeded. Returns -EINVAL if
> - * @data does not contain either an apid or apqi.
> + * We remove the queue from the list of queues associated with the
> + * mediated device and put them back to the free list of the matrix
> + * device and clear the matrix_mdev pointer.
> */
> -static int vfio_ap_has_queue(struct device *dev, void *data)
> +static void vfio_ap_put_all_domains(struct ap_matrix_mdev *matrix_mdev,
> + int apid)
> {
> - struct vfio_ap_queue_reserved *qres = data;
> - struct ap_queue *ap_queue = to_ap_queue(dev);
> - ap_qid_t qid;
> - unsigned long id;
> + int apqi, apqn;
>
> - if (qres->apid && qres->apqi) {
> - qid = AP_MKQID(*qres->apid, *qres->apqi);
> - if (qid == ap_queue->qid)
> - qres->reserved = true;
> - } else if (qres->apid && !qres->apqi) {
> - id = AP_QID_CARD(ap_queue->qid);
> - if (id == *qres->apid)
> - qres->reserved = true;
> - } else if (!qres->apid && qres->apqi) {
> - id = AP_QID_QUEUE(ap_queue->qid);
> - if (id == *qres->apqi)
> - qres->reserved = true;
> - } else {
> - return -EINVAL;
> + for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, AP_DOMAINS) {
> + apqn = AP_MKQID(apid, apqi);
> + vfio_ap_free_queue(apqn, matrix_mdev);
> }
> -
> - return 0;
> }
>
> /**
> - * vfio_ap_verify_queue_reserved
> - *
> - * @matrix_dev: a mediated matrix device
> - * @apid: an AP adapter ID
> - * @apqi: an AP queue index
> + * vfio_ap_put_all_cards:
> *
> - * Verifies that the AP queue with @apid/@apqi is reserved by the VFIO AP device
> - * driver according to the following rules:
> + * @matrix_mdev: the matrix mediated device for which we want to associate
> + * all available queues with a given apqi.
> + * @apqi: The apqi which associated with all defined APID of the
> + * mediated device will define a AP queue.
> *
> - * - If both @apid and @apqi are not NULL, then there must be an AP queue
> - * device bound to the vfio_ap driver with the APQN identified by @apid and
> - * @apqi
> - *
> - * - If only @apid is not NULL, then there must be an AP queue device bound
> - * to the vfio_ap driver with an APQN containing @apid
> - *
> - * - If only @apqi is not NULL, then there must be an AP queue device bound
> - * to the vfio_ap driver with an APQN containing @apqi
> - *
> - * Returns 0 if the AP queue is reserved; otherwise, returns -EADDRNOTAVAIL.
> + * We remove the queue from the list of queues associated with the
> + * mediated device and put them back to the free list of the matrix
> + * device and clear the matrix_mdev pointer.
> */
> -static int vfio_ap_verify_queue_reserved(unsigned long *apid,
> - unsigned long *apqi)
> +static void vfio_ap_put_all_cards(struct ap_matrix_mdev *matrix_mdev, int apqi)
> {
> - int ret;
> - struct vfio_ap_queue_reserved qres;
> + int apid, apqn;
>
> - qres.apid = apid;
> - qres.apqi = apqi;
> - qres.reserved = false;
> -
> - ret = driver_for_each_device(&matrix_dev->vfio_ap_drv->driver, NULL,
> - &qres, vfio_ap_has_queue);
> - if (ret)
> - return ret;
> + for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {
> + apqn = AP_MKQID(apid, apqi);
> + vfio_ap_free_queue(apqn, matrix_mdev);
> + }
> +}
>
> - if (qres.reserved)
> - return 0;
> +static void move_and_set(struct list_head *src, struct list_head *dst,
> + struct ap_matrix_mdev *matrix_mdev)
> +{
> + struct vfio_ap_queue *q, *qtmp;
>
> - return -EADDRNOTAVAIL;
> + list_for_each_entry_safe(q, qtmp, src, list) {
> + list_move(&q->list, dst);
> + q->matrix_mdev = matrix_mdev;
> + }
> }
>
> -static int
> -vfio_ap_mdev_verify_queues_reserved_for_apid(struct ap_matrix_mdev *matrix_mdev,
> - unsigned long apid)
> +static int vfio_ap_queue_match(struct device *dev, void *data)
> {
> - int ret;
> - unsigned long apqi;
> - unsigned long nbits = matrix_mdev->matrix.aqm_max + 1;
> + struct ap_queue *ap;
>
> - if (find_first_bit_inv(matrix_mdev->matrix.aqm, nbits) >= nbits)
> - return vfio_ap_verify_queue_reserved(&apid, NULL);
> + ap = to_ap_queue(dev);
> + return ap->qid == *(int *)data;
> +}
>
> - for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, nbits) {
> - ret = vfio_ap_verify_queue_reserved(&apid, &apqi);
> - if (ret)
> - return ret;
> - }
> +static struct vfio_ap_queue *vfio_ap_find_queue(int apqn)
> +{
> + struct device *dev;
> + struct vfio_ap_queue *q;
> +
> + dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL,
> + &apqn, vfio_ap_queue_match);
> + if (!dev)
> + return NULL;
> + q = dev_get_drvdata(dev);
> + put_device(dev);
> + return q;
> +}
>
> +/**
> + * vfio_ap_get_all_domains:
> + *
> + * @matrix_mdev: the matrix mediated device for which we want to associate
> + * all available queues with a given apqi.
> + * @apqi: The apqi which associated with all defined APID of the
> + * mediated device will define a AP queue.
> + *
> + * We define a local list to put all queues we find on the matrix driver
> + * device list when associating the apqi with all already defined apid for
> + * this matrix mediated device.
> + *
> + * If we can get all the devices we roll them to the mediated device list
> + * If we get errors we unroll them to the free list.
> + */
> +static int vfio_ap_get_all_domains(struct ap_matrix_mdev *matrix_mdev, int apid)
> +{
> + int apqi, apqn;
> + int ret = 0;
> + struct vfio_ap_queue *q;
> + struct list_head q_list;
> +
> + if (!vfio_ap_find_any_card(apid))
> + return -EADDRNOTAVAIL;
> +
> + INIT_LIST_HEAD(&q_list);
> +
> + for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, AP_DOMAINS) {
> + apqn = AP_MKQID(apid, apqi);
> + q = vfio_ap_find_queue(apqn);
> + if (!q) {
> + ret = -EADDRNOTAVAIL;
> + goto rewind;
> + }
> + if (q->matrix_mdev) {

If somebody assigns the same adapter a second time, the assignment will
fail because the matrix_mdev will already have been associated with the
queue. I don't think it is appropriate to fail the assignment if the
q->matrix_mdev is the same as the input matrix_mdev. This should be
changed to:

if (q->matrix_mdev != matrix_mdev)
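
One caveat, sketched below under stated assumptions: a free queue has a NULL matrix_mdev pointer (the structure is allocated with kzalloc and the pointer is cleared on unassign), so a plain inequality test would also reject queues that are still free. The following stand-alone sketch distinguishes the three cases. The struct and function names are hypothetical stand-ins and are not part of the driver.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ap_matrix_mdev. */
struct mdev {
	int id;
};

/* Hypothetical stand-in for struct vfio_ap_queue. */
struct queue {
	struct mdev *owner;	/* NULL while the queue is on the free list */
};

/*
 * Returns 0 if the queue is free or already belongs to @target,
 * -EADDRINUSE if another mediated device owns it.
 */
static int check_owner(const struct queue *q, const struct mdev *target)
{
	assert(q != NULL && target != NULL);
	if (q->owner && q->owner != target)
		return -EADDRINUSE;
	return 0;
}
```

In this sketch, assigning the same adapter twice succeeds, a free queue is accepted, and only a queue held by a different mediated device is refused.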

> + ret = -EADDRINUSE;
> + goto rewind;
> + }
> + list_move(&q->list, &q_list);
> + }
> + move_and_set(&q_list, &matrix_mdev->qlist, matrix_mdev);
> return 0;
> +rewind:
> + move_and_set(&q_list, &matrix_dev->free_list, NULL);
> + return ret;
> }
> -
> /**
> - * vfio_ap_mdev_verify_no_sharing
> + * vfio_ap_get_all_cards:
> *
> - * Verifies that the APQNs derived from the cross product of the AP adapter IDs
> - * and AP queue indexes comprising the AP matrix are not configured for another
> - * mediated device. AP queue sharing is not allowed.
> + * @matrix_mdev: the matrix mediated device for which we want to associate
> + * all available queues with a given apqi.
> + * @apqi: The apqi which associated with all defined APID of the
> + * mediated device will define a AP queue.
> *
> - * @matrix_mdev: the mediated matrix device
> + * We define a local list to put all queues we find on the matrix device
> + * free list when associating the apqi with all already defined apid for
> + * this matrix mediated device.
> *
> - * Returns 0 if the APQNs are not shared, otherwise; returns -EADDRINUSE.
> + * If we can get all the devices we roll them to the mediated device list
> + * If we get errors we unroll them to the free list.
> */
> -static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
> +static int vfio_ap_get_all_cards(struct ap_matrix_mdev *matrix_mdev, int apqi)
> {
> - struct ap_matrix_mdev *lstdev;
> - DECLARE_BITMAP(apm, AP_DEVICES);
> - DECLARE_BITMAP(aqm, AP_DOMAINS);
> -
> - list_for_each_entry(lstdev, &matrix_dev->mdev_list, node) {
> - if (matrix_mdev == lstdev)
> - continue;
> -
> - memset(apm, 0, sizeof(apm));
> - memset(aqm, 0, sizeof(aqm));
> -
> - /*
> - * We work on full longs, as we can only exclude the leftover
> - * bits in non-inverse order. The leftover is all zeros.
> - */
> - if (!bitmap_and(apm, matrix_mdev->matrix.apm,
> - lstdev->matrix.apm, AP_DEVICES))
> - continue;
> -
> - if (!bitmap_and(aqm, matrix_mdev->matrix.aqm,
> - lstdev->matrix.aqm, AP_DOMAINS))
> - continue;
> -
> - return -EADDRINUSE;
> + int apid, apqn;
> + int ret = 0;
> + struct vfio_ap_queue *q;
> + struct list_head q_list;
> + struct ap_matrix_mdev *tmp = NULL;
> +
> + if (!vfio_ap_find_any_domain(apqi))
> + return -EADDRNOTAVAIL;
> +
> + INIT_LIST_HEAD(&q_list);
> +
> + for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {
> + apqn = AP_MKQID(apid, apqi);
> + q = vfio_ap_find_queue(apqn);
> + if (!q) {
> + ret = -EADDRNOTAVAIL;
> + goto rewind;
> + }
> + if (q->matrix_mdev) {

If somebody assigns the same domain a second time, the assignment will
fail because the matrix_mdev will already have been associated with the
queue. I don't think it is appropriate to fail the assignment if the
q->matrix_mdev is the same as the input matrix_mdev. This should be
changed to:

if (q->matrix_mdev != matrix_mdev)

> + ret = -EADDRINUSE;
> + goto rewind;
> + }
> + list_move(&q->list, &q_list);
> }
> -
> + tmp = matrix_mdev;
> + move_and_set(&q_list, &matrix_mdev->qlist, matrix_mdev);
> return 0;
> +rewind:
> + move_and_set(&q_list, &matrix_dev->free_list, NULL);
> + return ret;
> }
>
> /**
> @@ -330,21 +420,15 @@ static ssize_t assign_adapter_store(struct device *dev,
> */
> mutex_lock(&matrix_dev->lock);
>
> - ret = vfio_ap_mdev_verify_queues_reserved_for_apid(matrix_mdev, apid);
> + ret = vfio_ap_get_all_domains(matrix_mdev, apid);
> if (ret)
> goto done;
>
> set_bit_inv(apid, matrix_mdev->matrix.apm);
>
> - ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev);
> - if (ret)
> - goto share_err;
> -
> ret = count;
> goto done;
>
> -share_err:
> - clear_bit_inv(apid, matrix_mdev->matrix.apm);
> done:
> mutex_unlock(&matrix_dev->lock);
>
> @@ -391,32 +475,13 @@ static ssize_t unassign_adapter_store(struct device *dev,
>
> mutex_lock(&matrix_dev->lock);
> clear_bit_inv((unsigned long)apid, matrix_mdev->matrix.apm);
> + vfio_ap_put_all_domains(matrix_mdev, apid);
> mutex_unlock(&matrix_dev->lock);
>
> return count;
> }
> static DEVICE_ATTR_WO(unassign_adapter);
>
> -static int
> -vfio_ap_mdev_verify_queues_reserved_for_apqi(struct ap_matrix_mdev *matrix_mdev,
> - unsigned long apqi)
> -{
> - int ret;
> - unsigned long apid;
> - unsigned long nbits = matrix_mdev->matrix.apm_max + 1;
> -
> - if (find_first_bit_inv(matrix_mdev->matrix.apm, nbits) >= nbits)
> - return vfio_ap_verify_queue_reserved(NULL, &apqi);
> -
> - for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, nbits) {
> - ret = vfio_ap_verify_queue_reserved(&apid, &apqi);
> - if (ret)
> - return ret;
> - }
> -
> - return 0;
> -}
> -
> /**
> * assign_domain_store
> *
> @@ -471,21 +536,15 @@ static ssize_t assign_domain_store(struct device *dev,
>
> mutex_lock(&matrix_dev->lock);
>
> - ret = vfio_ap_mdev_verify_queues_reserved_for_apqi(matrix_mdev, apqi);
> + ret = vfio_ap_get_all_cards(matrix_mdev, apqi);
> if (ret)
> goto done;
>
> set_bit_inv(apqi, matrix_mdev->matrix.aqm);
>
> - ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev);
> - if (ret)
> - goto share_err;
> -
> ret = count;
> goto done;
>
> -share_err:
> - clear_bit_inv(apqi, matrix_mdev->matrix.aqm);
> done:
> mutex_unlock(&matrix_dev->lock);
>
> @@ -533,6 +592,7 @@ static ssize_t unassign_domain_store(struct device *dev,
>
> mutex_lock(&matrix_dev->lock);
> clear_bit_inv((unsigned long)apqi, matrix_mdev->matrix.aqm);
> + vfio_ap_put_all_cards(matrix_mdev, apqi);
> mutex_unlock(&matrix_dev->lock);
>
> return count;
> @@ -790,49 +850,22 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
> return NOTIFY_OK;
> }
>
> -static int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
> - unsigned int retry)
> -{
> - struct ap_queue_status status;
> -
> - do {
> - status = ap_zapq(AP_MKQID(apid, apqi));
> - switch (status.response_code) {
> - case AP_RESPONSE_NORMAL:
> - return 0;
> - case AP_RESPONSE_RESET_IN_PROGRESS:
> - case AP_RESPONSE_BUSY:
> - msleep(20);
> - break;
> - default:
> - /* things are really broken, give up */
> - return -EIO;
> - }
> - } while (retry--);
> -
> - return -EBUSY;
> -}
> -
> static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
> {
> int ret;
> int rc = 0;
> - unsigned long apid, apqi;
> struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
> + struct vfio_ap_queue *q;
>
> - for_each_set_bit_inv(apid, matrix_mdev->matrix.apm,
> - matrix_mdev->matrix.apm_max + 1) {
> - for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm,
> - matrix_mdev->matrix.aqm_max + 1) {
> - ret = vfio_ap_mdev_reset_queue(apid, apqi, 1);
> - /*
> - * Regardless whether a queue turns out to be busy, or
> - * is not operational, we need to continue resetting
> - * the remaining queues.
> - */
> - if (ret)
> - rc = ret;
> - }
> + list_for_each_entry(q, &matrix_mdev->qlist, list) {
> + ret = vfio_ap_mdev_reset_queue(q);
> + /*
> + * Regardless whether a queue turns out to be busy, or
> + * is not operational, we need to continue resetting
> + * the remaining queues but notice the last error code.
> + */
> + if (ret)
> + rc = ret;
> }
>
> return rc;
> @@ -868,10 +901,10 @@ static void vfio_ap_mdev_release(struct mdev_device *mdev)
> if (matrix_mdev->kvm)
> kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
>
> + matrix_mdev->kvm = NULL;
> vfio_ap_mdev_reset_queues(mdev);
> vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
> &matrix_mdev->group_notifier);
> - matrix_mdev->kvm = NULL;
> module_put(THIS_MODULE);
> }
>
> @@ -905,7 +938,9 @@ static ssize_t vfio_ap_mdev_ioctl(struct mdev_device *mdev,
> ret = vfio_ap_mdev_get_device_info(arg);
> break;
> case VFIO_DEVICE_RESET:
> + mutex_lock(&matrix_dev->lock);
> ret = vfio_ap_mdev_reset_queues(mdev);
> + mutex_unlock(&matrix_dev->lock);
> break;
> default:
> ret = -EOPNOTSUPP;
> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
> index a910be1..3e6940c 100644
> --- a/drivers/s390/crypto/vfio_ap_private.h
> +++ b/drivers/s390/crypto/vfio_ap_private.h
> @@ -40,6 +40,7 @@ struct ap_matrix_dev {
> atomic_t available_instances;
> struct ap_config_info info;
> struct list_head mdev_list;
> + struct list_head free_list;
> struct mutex lock;
> struct ap_driver *vfio_ap_drv;
> };
> @@ -83,9 +84,15 @@ struct ap_matrix_mdev {
> struct notifier_block group_notifier;
> struct kvm *kvm;
> struct kvm_s390_module_hook pqap_hook;
> + struct list_head qlist;
> };
>
> extern int vfio_ap_mdev_register(void);
> extern void vfio_ap_mdev_unregister(void);
>
> +struct vfio_ap_queue {
> + struct list_head list;
> + struct ap_matrix_mdev *matrix_mdev;
> + int apqn;
> +};
> #endif /* _VFIO_AP_PRIVATE_H_ */
>


2019-03-27 11:01:10

by Harald Freudenberger

Subject: Re: [PATCH v6 2/7] s390: ap: new vfio_ap_queue structure

On 26.03.19 21:45, Tony Krowiak wrote:
> On 3/22/19 10:43 AM, Pierre Morel wrote:
>> The AP interruptions are assigned on a queue basis and
>> the GISA structure is handled on a VM basis, so that
>> we need to add a structure we can retrieve from both side
>
> s/side/sides/
>
>> holding the information we need to handle PQAP/AQIC interception
>> and setup the GISA.
>
> s/setup/set up/
>
>>
>> Since we can not add more information to the ap_device
>> we add a new structure vfio_ap_queue, to hold per queue
>> information useful to handle interruptions and set it as
>> driver's data of the standard ap_queue device.
>>
>> Usually, the device and the mediated device are linked together
>> but in the vfio_ap driver design we have a bunch of "sub" devices
>> (the ap_queue devices) belonging to the mediated device.
>>
>> Linking these structure to the mediated device it is assigned to,
>> with the help of the vfio_ap_queue structure will help us to
>> retrieve the AP devices associated with the mediated devices
>> during the mediated device operations.
>>
>> ------------    -------------
>> | AP queue |--> | AP_vfio_q |<----
>> ------------    ------^------    |    ---------------
>>                        |          <--->| matrix_mdev |
>> ------------    ------v------    |    ---------------
>> | AP queue |--> | AP_vfio_q |-----
>> ------------    -------------
>>
>> The vfio_ap_queue device will hold the following entries:
>> - apqn: AP queue number (defined here)
>> - isc : Interrupt subclass (defined later)
>> - nib : notification information byte (defined later)
>> - list: a list_head entry allowing to link this structure to a
>>     matrix mediated device it is assigned to.
>>
>> The vfio_ap_queue structure is allocated when the vfio_ap_driver
>> is probed and added as driver data to the ap_queue device.
>> It is free on remove.
>>
>> The structure is linked to the matrix_dev host device at the
>> probe of the device building some kind of free list for the
>> matrix mediated devices.
>>
>> When the vfio_queue is associated to a matrix mediated device,
>> during assign_adapter or assign_domain,
>> the vfio_ap_queue device is linked to this matrix mediated device
>> and unlinked when dissociated.
>>
>> Queuing the devices on a list of free devices and testing the
>> matrix_mdev pointer to the associated matrix allow us to know
>> if the queue is associated to the matrix device and associated
>> or not to a mediated device.
>>
>> All the operation on the free_list must be protected by the
>> VFIO AP matrix_dev lock.
>>
>> Signed-off-by: Pierre Morel <[email protected]>
>> ---
>>   drivers/s390/crypto/vfio_ap_drv.c     |  31 ++-
>>   drivers/s390/crypto/vfio_ap_ops.c     | 423 ++++++++++++++++++----------------
>>   drivers/s390/crypto/vfio_ap_private.h |   7 +
>>   3 files changed, 266 insertions(+), 195 deletions(-)
>>
>> diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
>> index e9824c3..df6f21a 100644
>> --- a/drivers/s390/crypto/vfio_ap_drv.c
>> +++ b/drivers/s390/crypto/vfio_ap_drv.c
>> @@ -40,14 +40,42 @@ static struct ap_device_id ap_queue_ids[] = {
>>     MODULE_DEVICE_TABLE(vfio_ap, ap_queue_ids);
>>   +/**
>> + * vfio_ap_queue_dev_probe:
>> + *
>> + * Allocate a vfio_ap_queue structure and associate it
>> + * with the device as driver_data.
>> + */
>>   static int vfio_ap_queue_dev_probe(struct ap_device *apdev)
>>   {
>> +    struct vfio_ap_queue *q;
>> +
>> +    q = kzalloc(sizeof(*q), GFP_KERNEL);
>> +    if (!q)
>> +        return -ENOMEM;
>> +    dev_set_drvdata(&apdev->device, q);
>> +    q->apqn = to_ap_queue(&apdev->device)->qid;
>> +    INIT_LIST_HEAD(&q->list);
>> +    mutex_lock(&matrix_dev->lock);
>> +    list_add(&q->list, &matrix_dev->free_list);
>> +    mutex_unlock(&matrix_dev->lock);
>>       return 0;
>>   }
>>   +/**
>> + * vfio_ap_queue_dev_remove:
>> + *
>> + * Free the associated vfio_ap_queue structure
>> + */
>>   static void vfio_ap_queue_dev_remove(struct ap_device *apdev)
>>   {
>> -    /* Nothing to do yet */
>> +    struct vfio_ap_queue *q;
>> +
>> +    q = dev_get_drvdata(&apdev->device);
>> +    mutex_lock(&matrix_dev->lock);
>> +    list_del(&q->list);
>> +    mutex_unlock(&matrix_dev->lock);
>> +    kfree(q);
>>   }
>>     static void vfio_ap_matrix_dev_release(struct device *dev)
>> @@ -107,6 +135,7 @@ static int vfio_ap_matrix_dev_create(void)
>>       matrix_dev->device.bus = &matrix_bus;
>>       matrix_dev->device.release = vfio_ap_matrix_dev_release;
>>       matrix_dev->vfio_ap_drv = &vfio_ap_drv;
>> +    INIT_LIST_HEAD(&matrix_dev->free_list);
>>         ret = device_register(&matrix_dev->device);
>>       if (ret)
>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>> index 900b9cf..77f7bac 100644
>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -24,6 +24,68 @@
>>   #define VFIO_AP_MDEV_TYPE_HWVIRT "passthrough"
>>   #define VFIO_AP_MDEV_NAME_HWVIRT "VFIO AP Passthrough Device"
>>   +/**
>> + * vfio_ap_get_queue: Retrieve a queue with a specific APQN from a list
>> + * @apqn: The queue APQN
>> + *
>> + * Retrieve a queue with a specific APQN from the list of the
>> + * devices associated with a list.
>> + *
>> + * Returns the pointer to the associated vfio_ap_queue
>> + */
>> +struct vfio_ap_queue *vfio_ap_get_queue(int apqn, struct list_head *l)
>> +{
>> +    struct vfio_ap_queue *q;
>> +
>> +    list_for_each_entry(q, l, list)
>> +        if (q->apqn == apqn)
>> +            return q;
>> +    return NULL;
>> +}
>> +
>> +static int vfio_ap_find_any_card(int apid)
>> +{
>> +    struct vfio_ap_queue *q;
>> +
>> +    list_for_each_entry(q, &matrix_dev->free_list, list)
>> +        if (AP_QID_CARD(q->apqn) == apid)
>> +            return 1;
>> +    return 0;
>> +}
>> +
>> +static int vfio_ap_find_any_domain(int apqi)
>> +{
>> +    struct vfio_ap_queue *q;
>> +
>> +    list_for_each_entry(q, &matrix_dev->free_list, list)
>> +        if (AP_QID_QUEUE(q->apqn) == apqi)
>> +            return 1;
>> +    return 0;
>> +}
>> +
>> +static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q)
>> +{
>> +    struct ap_queue_status status;
>> +    int retry = 1;
>> +
>> +    do {
>> +        status = ap_zapq(q->apqn);
>> +        switch (status.response_code) {
>> +        case AP_RESPONSE_NORMAL:
>> +            return 0;
>> +        case AP_RESPONSE_RESET_IN_PROGRESS:
>> +        case AP_RESPONSE_BUSY:
>> +            msleep(20);
>> +            break;
>> +        default:
>> +            /* things are really broken, give up */
>
> I'm not sure things are necessarily broken. We could end up here if
> the AP is removed from the configuration via the SE or SCLP Deconfigure
> Adjunct Processor command.
Yes, that's right. The default case is also reached when the APQN is
removed from the configuration, e.g. when an admin sets the card
"offline" on the SE. So the comment should probably be corrected.
>
>> +            return -EIO;
>> +        }
>> +    } while (retry--);
>> +
>> +    return -EBUSY;
>> +}
>> +
>>   static void vfio_ap_matrix_init(struct ap_config_info *info,
>>                   struct ap_matrix *matrix)
>>   {
>> @@ -45,6 +107,7 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
>>           return -ENOMEM;
>>       }
>>   +    INIT_LIST_HEAD(&matrix_mdev->qlist);
>>       vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
>>       mdev_set_drvdata(mdev, matrix_mdev);
>>       mutex_lock(&matrix_dev->lock);
>> @@ -113,162 +176,189 @@ static struct attribute_group *vfio_ap_mdev_type_groups[] = {
>>       NULL,
>>   };
>>   -struct vfio_ap_queue_reserved {
>> -    unsigned long *apid;
>> -    unsigned long *apqi;
>> -    bool reserved;
>> -};
>> +static void vfio_ap_free_queue(int apqn, struct ap_matrix_mdev *matrix_mdev)
>> +{
>> +    struct vfio_ap_queue *q;
>> +
>> +    q = vfio_ap_get_queue(apqn, &matrix_mdev->qlist);
>> +    if (!q)
>> +        return;
>> +    q->matrix_mdev = NULL;
>> +    vfio_ap_mdev_reset_queue(q);
>
> I'm wondering if it's necessary to reset the queue here. The only time
> a queue is used is when a guest using the mdev device is started. When
> that guest is terminated, the fd for the mdev device is closed and the
> mdev device's release callback is invoked. The release callback resets
> the queues assigned to the mdev device. Is it really necessary to
> reset the queue again when it is unassigned even if there would have
> been no subsequent activity?
If I understand this correctly, this code is called when a queue is
taken away from the guest but is still reserved for use by the vfio
device driver. The queue can then be assigned to another guest. In that
case it makes sense to clear all the entries in the millicode queue,
because a pending reply could otherwise be "received" by the wrong guest.

If this function is only called on removal of a queue device, where the
device goes back to the AP bus, a reset is not needed.
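
The reasoning can be made concrete with a toy model. This is only a sketch under assumptions: toy_queue, its pending counter and the guest ids are hypothetical and do not model the real millicode queue. It shows how replies left behind by one guest would become visible to the next owner when the queue is handed over without a reset.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model of a queue that can change owners. */
struct toy_queue {
	int owner;	/* guest id, 0 means free */
	int pending;	/* replies still sitting in the queue */
};

/* Stand-in for the reset: drop everything that is still pending. */
static void toy_reset(struct toy_queue *q)
{
	q->pending = 0;
}

/* Hand the queue back to the free pool, optionally resetting it first. */
static void toy_unassign(struct toy_queue *q, int do_reset)
{
	assert(q != NULL);
	if (do_reset)
		toy_reset(q);
	q->owner = 0;
}

/* Give the queue to @guest; returns the stale replies it would see. */
static int toy_assign(struct toy_queue *q, int guest)
{
	assert(q != NULL && q->owner == 0);
	q->owner = guest;
	return q->pending;
}
```

Without the reset, the second guest inherits whatever the first one left pending. With the reset, it starts from an empty queue.
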
>
>> +    list_move(&q->list, &matrix_dev->free_list);
>> +}
>>     /**
>> - * vfio_ap_has_queue
>> - *
>> - * @dev: an AP queue device
>> - * @data: a struct vfio_ap_queue_reserved reference
>> + * vfio_ap_put_all_domains:
>>    *
>> - * Flags whether the AP queue device (@dev) has a queue ID containing the APQN,
>> - * apid or apqi specified in @data:
>> + * @matrix_mdev: the matrix mediated device for which we want to associate
>> + *         all available queues with a given apqi.
>> + * @apid:     The apid which associated with all defined APQI of the
>> + *         mediated device will define a AP queue.
>>    *
>> - * - If @data contains both an apid and apqi value, then @data will be flagged
>> - *   as reserved if the APID and APQI fields for the AP queue device matches
>> - *
>> - * - If @data contains only an apid value, @data will be flagged as
>> - *   reserved if the APID field in the AP queue device matches
>> - *
>> - * - If @data contains only an apqi value, @data will be flagged as
>> - *   reserved if the APQI field in the AP queue device matches
>> - *
>> - * Returns 0 to indicate the input to function succeeded. Returns -EINVAL if
>> - * @data does not contain either an apid or apqi.
>> + * We remove the queue from the list of queues associated with the
>> + * mediated device and put them back to the free list of the matrix
>> + * device and clear the matrix_mdev pointer.
>>    */
>> -static int vfio_ap_has_queue(struct device *dev, void *data)
>> +static void vfio_ap_put_all_domains(struct ap_matrix_mdev *matrix_mdev,
>> +                    int apid)
>>   {
>> -    struct vfio_ap_queue_reserved *qres = data;
>> -    struct ap_queue *ap_queue = to_ap_queue(dev);
>> -    ap_qid_t qid;
>> -    unsigned long id;
>> +    int apqi, apqn;
>>   -    if (qres->apid && qres->apqi) {
>> -        qid = AP_MKQID(*qres->apid, *qres->apqi);
>> -        if (qid == ap_queue->qid)
>> -            qres->reserved = true;
>> -    } else if (qres->apid && !qres->apqi) {
>> -        id = AP_QID_CARD(ap_queue->qid);
>> -        if (id == *qres->apid)
>> -            qres->reserved = true;
>> -    } else if (!qres->apid && qres->apqi) {
>> -        id = AP_QID_QUEUE(ap_queue->qid);
>> -        if (id == *qres->apqi)
>> -            qres->reserved = true;
>> -    } else {
>> -        return -EINVAL;
>> +    for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, AP_DOMAINS) {
>> +        apqn = AP_MKQID(apid, apqi);
>> +        vfio_ap_free_queue(apqn, matrix_mdev);
>>       }
>> -
>> -    return 0;
>>   }
>>     /**
>> - * vfio_ap_verify_queue_reserved
>> - *
>> - * @matrix_dev: a mediated matrix device
>> - * @apid: an AP adapter ID
>> - * @apqi: an AP queue index
>> + * vfio_ap_put_all_cards:
>>    *
>> - * Verifies that the AP queue with @apid/@apqi is reserved by the VFIO AP device
>> - * driver according to the following rules:
>> + * @matrix_mdev: the matrix mediated device for which we want to associate
>> + *         all available queues with a given apqi.
>> + * @apqi:     The apqi which associated with all defined APID of the
>> + *         mediated device will define a AP queue.
>>    *
>> - * - If both @apid and @apqi are not NULL, then there must be an AP queue
>> - *   device bound to the vfio_ap driver with the APQN identified by @apid and
>> - *   @apqi
>> - *
>> - * - If only @apid is not NULL, then there must be an AP queue device bound
>> - *   to the vfio_ap driver with an APQN containing @apid
>> - *
>> - * - If only @apqi is not NULL, then there must be an AP queue device bound
>> - *   to the vfio_ap driver with an APQN containing @apqi
>> - *
>> - * Returns 0 if the AP queue is reserved; otherwise, returns -EADDRNOTAVAIL.
>> + * We remove the queue from the list of queues associated with the
>> + * mediated device and put them back to the free list of the matrix
>> + * device and clear the matrix_mdev pointer.
>>    */
>> -static int vfio_ap_verify_queue_reserved(unsigned long *apid,
>> -                     unsigned long *apqi)
>> +static void vfio_ap_put_all_cards(struct ap_matrix_mdev *matrix_mdev, int apqi)
>>   {
>> -    int ret;
>> -    struct vfio_ap_queue_reserved qres;
>> +    int apid, apqn;
>>   -    qres.apid = apid;
>> -    qres.apqi = apqi;
>> -    qres.reserved = false;
>> -
>> -    ret = driver_for_each_device(&matrix_dev->vfio_ap_drv->driver, NULL,
>> -                     &qres, vfio_ap_has_queue);
>> -    if (ret)
>> -        return ret;
>> +    for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {
>> +        apqn = AP_MKQID(apid, apqi);
>> +        vfio_ap_free_queue(apqn, matrix_mdev);
>> +    }
>> +}
>>   -    if (qres.reserved)
>> -        return 0;
>> +static void move_and_set(struct list_head *src, struct list_head *dst,
>> +             struct ap_matrix_mdev *matrix_mdev)
>> +{
>> +    struct vfio_ap_queue *q, *qtmp;
>>   -    return -EADDRNOTAVAIL;
>> +    list_for_each_entry_safe(q, qtmp, src, list) {
>> +        list_move(&q->list, dst);
>> +        q->matrix_mdev = matrix_mdev;
>> +    }
>>   }
>>   -static int
>> -vfio_ap_mdev_verify_queues_reserved_for_apid(struct ap_matrix_mdev *matrix_mdev,
>> -                         unsigned long apid)
>> +static int vfio_ap_queue_match(struct device *dev, void *data)
>>   {
>> -    int ret;
>> -    unsigned long apqi;
>> -    unsigned long nbits = matrix_mdev->matrix.aqm_max + 1;
>> +    struct ap_queue *ap;
>>   -    if (find_first_bit_inv(matrix_mdev->matrix.aqm, nbits) >= nbits)
>> -        return vfio_ap_verify_queue_reserved(&apid, NULL);
>> +    ap = to_ap_queue(dev);
>> +    return ap->qid == *(int *)data;
>> +}
>>   -    for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, nbits) {
>> -        ret = vfio_ap_verify_queue_reserved(&apid, &apqi);
>> -        if (ret)
>> -            return ret;
>> -    }
>> +static struct vfio_ap_queue *vfio_ap_find_queue(int apqn)
>> +{
>> +    struct device *dev;
>> +    struct vfio_ap_queue *q;
>> +
>> +    dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL,
>> +                 &apqn, vfio_ap_queue_match);
>> +    if (!dev)
>> +        return NULL;
>> +    q = dev_get_drvdata(dev);
>> +    put_device(dev);
>> +    return q;
>> +}
>>   +/**
>> + * vfio_ap_get_all_domains:
>> + *
>> + * @matrix_mdev: the matrix mediated device for which we want to associate
>> + *         all available queues with a given apqi.
>> + * @apqi:     The apqi which associated with all defined APID of the
>> + *         mediated device will define a AP queue.
>> + *
>> + * We define a local list to put all queues we find on the matrix driver
>> + * device list when associating the apqi with all already defined apid for
>> + * this matrix mediated device.
>> + *
>> + * If we can get all the devices we roll them to the mediated device list
>> + * If we get errors we unroll them to the free list.
>> + */
>> +static int vfio_ap_get_all_domains(struct ap_matrix_mdev *matrix_mdev, int apid)
>> +{
>> +    int apqi, apqn;
>> +    int ret = 0;
>> +    struct vfio_ap_queue *q;
>> +    struct list_head q_list;
>> +
>> +    if (!vfio_ap_find_any_card(apid))
>> +        return -EADDRNOTAVAIL;
>> +
>> +    INIT_LIST_HEAD(&q_list);
>> +
>> +    for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, AP_DOMAINS) {
>> +        apqn = AP_MKQID(apid, apqi);
>> +        q = vfio_ap_find_queue(apqn);
>> +        if (!q) {
>> +            ret = -EADDRNOTAVAIL;
>> +            goto rewind;
>> +        }
>> +        if (q->matrix_mdev) {
>
> If somebody assigns the same adapter a second time, the assignment will
> fail because the matrix_mdev will already have been associated with the
> queue. I don't think it is appropriate to fail the assignment if the
> q->matrix_mdev is the same as the input matrix_mdev. This should be
> changed to:
>
>     if (q->matrix_mdev != matrix_mdev)
>
>> +            ret = -EADDRINUSE;
>> +            goto rewind;
>> +        }
>> +        list_move(&q->list, &q_list);
>> +    }
>> +    move_and_set(&q_list, &matrix_mdev->qlist, matrix_mdev);
>>       return 0;
>> +rewind:
>> +    move_and_set(&q_list, &matrix_dev->free_list, NULL);
>> +    return ret;
>>   }
>> -
>>   /**
>> - * vfio_ap_mdev_verify_no_sharing
>> + * vfio_ap_get_all_cards:
>>    *
>> - * Verifies that the APQNs derived from the cross product of the AP adapter IDs
>> - * and AP queue indexes comprising the AP matrix are not configured for another
>> - * mediated device. AP queue sharing is not allowed.
>> + * @matrix_mdev: the matrix mediated device for which we want to associate
>> + *         all available queues with a given apqi.
>> + * @apqi:     The apqi which associated with all defined APID of the
>> + *         mediated device will define a AP queue.
>>    *
>> - * @matrix_mdev: the mediated matrix device
>> + * We define a local list to put all queues we find on the matrix device
>> + * free list when associating the apqi with all already defined apid for
>> + * this matrix mediated device.
>>    *
>> - * Returns 0 if the APQNs are not shared, otherwise; returns -EADDRINUSE.
>> + * If we can get all the devices we roll them to the mediated device list
>> + * If we get errors we unroll them to the free list.
>>    */
>> -static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
>> +static int vfio_ap_get_all_cards(struct ap_matrix_mdev *matrix_mdev, int apqi)
>>   {
>> -    struct ap_matrix_mdev *lstdev;
>> -    DECLARE_BITMAP(apm, AP_DEVICES);
>> -    DECLARE_BITMAP(aqm, AP_DOMAINS);
>> -
>> -    list_for_each_entry(lstdev, &matrix_dev->mdev_list, node) {
>> -        if (matrix_mdev == lstdev)
>> -            continue;
>> -
>> -        memset(apm, 0, sizeof(apm));
>> -        memset(aqm, 0, sizeof(aqm));
>> -
>> -        /*
>> -         * We work on full longs, as we can only exclude the leftover
>> -         * bits in non-inverse order. The leftover is all zeros.
>> -         */
>> -        if (!bitmap_and(apm, matrix_mdev->matrix.apm,
>> -                lstdev->matrix.apm, AP_DEVICES))
>> -            continue;
>> -
>> -        if (!bitmap_and(aqm, matrix_mdev->matrix.aqm,
>> -                lstdev->matrix.aqm, AP_DOMAINS))
>> -            continue;
>> -
>> -        return -EADDRINUSE;
>> +    int apid, apqn;
>> +    int ret = 0;
>> +    struct vfio_ap_queue *q;
>> +    struct list_head q_list;
>> +    struct ap_matrix_mdev *tmp = NULL;
>> +
>> +    if (!vfio_ap_find_any_domain(apqi))
>> +        return -EADDRNOTAVAIL;
>> +
>> +    INIT_LIST_HEAD(&q_list);
>> +
>> +    for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {
>> +        apqn = AP_MKQID(apid, apqi);
>> +        q = vfio_ap_find_queue(apqn);
>> +        if (!q) {
>> +            ret = -EADDRNOTAVAIL;
>> +            goto rewind;
>> +        }
>> +        if (q->matrix_mdev) {
>
> If somebody assigns the same domain a second time, the assignment will
> fail because the matrix_mdev will already have been associated with the
> queue. I don't think it is appropriate to fail the assignment if the
> q->matrix_mdev is the same as the input matrix_mdev. This should be
> changed to:
>
>     if (q->matrix_mdev != matrix_mdev)
>
>> +            ret = -EADDRINUSE;
>> +            goto rewind;
>> +        }
>> +        list_move(&q->list, &q_list);
>>       }
>> -
>> +    tmp = matrix_mdev;
>> +    move_and_set(&q_list, &matrix_mdev->qlist, matrix_mdev);
>>       return 0;
>> +rewind:
>> +    move_and_set(&q_list, &matrix_dev->free_list, NULL);
>> +    return ret;
>>   }
>>     /**
>> @@ -330,21 +420,15 @@ static ssize_t assign_adapter_store(struct device *dev,
>>        */
>>       mutex_lock(&matrix_dev->lock);
>>   -    ret = vfio_ap_mdev_verify_queues_reserved_for_apid(matrix_mdev, apid);
>> +    ret = vfio_ap_get_all_domains(matrix_mdev, apid);
>>       if (ret)
>>           goto done;
>>         set_bit_inv(apid, matrix_mdev->matrix.apm);
>>   -    ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev);
>> -    if (ret)
>> -        goto share_err;
>> -
>>       ret = count;
>>       goto done;
>>   -share_err:
>> -    clear_bit_inv(apid, matrix_mdev->matrix.apm);
>>   done:
>>       mutex_unlock(&matrix_dev->lock);
>>   @@ -391,32 +475,13 @@ static ssize_t unassign_adapter_store(struct device *dev,
>>         mutex_lock(&matrix_dev->lock);
>>       clear_bit_inv((unsigned long)apid, matrix_mdev->matrix.apm);
>> +    vfio_ap_put_all_domains(matrix_mdev, apid);
>>       mutex_unlock(&matrix_dev->lock);
>>         return count;
>>   }
>>   static DEVICE_ATTR_WO(unassign_adapter);
>>   -static int
>> -vfio_ap_mdev_verify_queues_reserved_for_apqi(struct ap_matrix_mdev *matrix_mdev,
>> -                         unsigned long apqi)
>> -{
>> -    int ret;
>> -    unsigned long apid;
>> -    unsigned long nbits = matrix_mdev->matrix.apm_max + 1;
>> -
>> -    if (find_first_bit_inv(matrix_mdev->matrix.apm, nbits) >= nbits)
>> -        return vfio_ap_verify_queue_reserved(NULL, &apqi);
>> -
>> -    for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, nbits) {
>> -        ret = vfio_ap_verify_queue_reserved(&apid, &apqi);
>> -        if (ret)
>> -            return ret;
>> -    }
>> -
>> -    return 0;
>> -}
>> -
>>   /**
>>    * assign_domain_store
>>    *
>> @@ -471,21 +536,15 @@ static ssize_t assign_domain_store(struct device *dev,
>>         mutex_lock(&matrix_dev->lock);
>>   -    ret = vfio_ap_mdev_verify_queues_reserved_for_apqi(matrix_mdev, apqi);
>> +    ret = vfio_ap_get_all_cards(matrix_mdev, apqi);
>>       if (ret)
>>           goto done;
>>         set_bit_inv(apqi, matrix_mdev->matrix.aqm);
>>   -    ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev);
>> -    if (ret)
>> -        goto share_err;
>> -
>>       ret = count;
>>       goto done;
>>   -share_err:
>> -    clear_bit_inv(apqi, matrix_mdev->matrix.aqm);
>>   done:
>>       mutex_unlock(&matrix_dev->lock);
>>   @@ -533,6 +592,7 @@ static ssize_t unassign_domain_store(struct device *dev,
>>         mutex_lock(&matrix_dev->lock);
>>       clear_bit_inv((unsigned long)apqi, matrix_mdev->matrix.aqm);
>> +    vfio_ap_put_all_cards(matrix_mdev, apqi);
>>       mutex_unlock(&matrix_dev->lock);
>>         return count;
>> @@ -790,49 +850,22 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
>>       return NOTIFY_OK;
>>   }
>>   -static int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
>> -                    unsigned int retry)
>> -{
>> -    struct ap_queue_status status;
>> -
>> -    do {
>> -        status = ap_zapq(AP_MKQID(apid, apqi));
>> -        switch (status.response_code) {
>> -        case AP_RESPONSE_NORMAL:
>> -            return 0;
>> -        case AP_RESPONSE_RESET_IN_PROGRESS:
>> -        case AP_RESPONSE_BUSY:
>> -            msleep(20);
>> -            break;
>> -        default:
>> -            /* things are really broken, give up */
>> -            return -EIO;
>> -        }
>> -    } while (retry--);
>> -
>> -    return -EBUSY;
>> -}
>> -
>>   static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
>>   {
>>       int ret;
>>       int rc = 0;
>> -    unsigned long apid, apqi;
>>       struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>> +    struct vfio_ap_queue *q;
>>   -    for_each_set_bit_inv(apid, matrix_mdev->matrix.apm,
>> -                 matrix_mdev->matrix.apm_max + 1) {
>> -        for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm,
>> -                     matrix_mdev->matrix.aqm_max + 1) {
>> -            ret = vfio_ap_mdev_reset_queue(apid, apqi, 1);
>> -            /*
>> -             * Regardless whether a queue turns out to be busy, or
>> -             * is not operational, we need to continue resetting
>> -             * the remaining queues.
>> -             */
>> -            if (ret)
>> -                rc = ret;
>> -        }
>> +    list_for_each_entry(q, &matrix_mdev->qlist, list) {
>> +        ret = vfio_ap_mdev_reset_queue(q);
>> +        /*
>> +         * Regardless whether a queue turns out to be busy, or
>> +         * is not operational, we need to continue resetting
>> +         * the remaining queues but notice the last error code.
>> +         */
>> +        if (ret)
>> +            rc = ret;
>>       }
>>         return rc;
>> @@ -868,10 +901,10 @@ static void vfio_ap_mdev_release(struct mdev_device *mdev)
>>       if (matrix_mdev->kvm)
>>           kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
>>   +    matrix_mdev->kvm = NULL;
>>       vfio_ap_mdev_reset_queues(mdev);
>>       vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
>>                    &matrix_mdev->group_notifier);
>> -    matrix_mdev->kvm = NULL;
>>       module_put(THIS_MODULE);
>>   }
>>   @@ -905,7 +938,9 @@ static ssize_t vfio_ap_mdev_ioctl(struct mdev_device *mdev,
>>           ret = vfio_ap_mdev_get_device_info(arg);
>>           break;
>>       case VFIO_DEVICE_RESET:
>> +        mutex_lock(&matrix_dev->lock);
>>           ret = vfio_ap_mdev_reset_queues(mdev);
>> +        mutex_unlock(&matrix_dev->lock);
>>           break;
>>       default:
>>           ret = -EOPNOTSUPP;
>> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
>> index a910be1..3e6940c 100644
>> --- a/drivers/s390/crypto/vfio_ap_private.h
>> +++ b/drivers/s390/crypto/vfio_ap_private.h
>> @@ -40,6 +40,7 @@ struct ap_matrix_dev {
>>       atomic_t available_instances;
>>       struct ap_config_info info;
>>       struct list_head mdev_list;
>> +    struct list_head free_list;
>>       struct mutex lock;
>>       struct ap_driver  *vfio_ap_drv;
>>   };
>> @@ -83,9 +84,15 @@ struct ap_matrix_mdev {
>>       struct notifier_block group_notifier;
>>       struct kvm *kvm;
>>       struct kvm_s390_module_hook pqap_hook;
>> +    struct list_head qlist;
>>   };
>>     extern int vfio_ap_mdev_register(void);
>>   extern void vfio_ap_mdev_unregister(void);
>>   +struct vfio_ap_queue {
>> +    struct list_head list;
>> +    struct ap_matrix_mdev *matrix_mdev;
>> +    int    apqn;
>> +};
>>   #endif /* _VFIO_AP_PRIVATE_H_ */
>>
>
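
On the re-assignment check raised in the review comments above: the
suggested test `q->matrix_mdev != matrix_mdev` would also be true for a
free queue, whose matrix_mdev pointer is NULL, so a combined test may be
closer to what is wanted. The following is a minimal user-space sketch of
that condition only. The types `struct mdev` and `struct queue` and the
helper `check_assign()` are hypothetical stand-ins, not the driver code.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for ap_matrix_mdev and vfio_ap_queue. */
struct mdev { int id; };
struct queue { struct mdev *matrix_mdev; int apqn; };

/*
 * Return 0 if the queue is free or already belongs to @m, and
 * -EADDRINUSE if it belongs to a different mediated device.
 */
static int check_assign(const struct queue *q, const struct mdev *m)
{
	if (q->matrix_mdev && q->matrix_mdev != m)
		return -EADDRINUSE;
	return 0;
}
```

With this form, assigning the same adapter or domain a second time to the
same mediated device is accepted, while a queue owned by another mediated
device is still refused.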


2019-03-27 16:08:47

by Anthony Krowiak

[permalink] [raw]
Subject: Re: [PATCH v6 1/7] s390: ap: kvm: add PQAP interception for AQIC

On 3/26/19 2:57 PM, Tony Krowiak wrote:
> On 3/22/19 10:43 AM, Pierre Morel wrote:
>> We prepare the interception of the PQAP/AQIC instruction for
>> the case the AQIC facility is enabled in the guest.
>>
>> First of all we do not want to change existing behavior when
>> intercepting AP instructions without the SIE allowing the guest
>> to use AP instructions.
>>
>> In this patch we only handle the AQIC interception allowed by
>> facility 65 which will be enabled when the complete interception
>> infrastructure will be present.
>>
>> We add a callback inside the KVM arch structure for s390 for
>> a VFIO driver to handle a specific response to the PQAP
>> instruction with the AQIC command and only this command.
>>
>> But we want to be able to return a correct answer to the guest
>> even there is no VFIO AP driver in the kernel.
>> Therefor, we inject the correct exceptions from inside KVM for the
>> case the callback is not initialized, which happens when the vfio_ap
>> driver is not loaded.
>>
>> We do consider the responsability of the driver to always initialize
>> the PQAP callback if it defines queues by initializing the CRYCB for
>> a guest.
>> If the callback has been setup we call it.
>> If not we setup an answer considering that no queue is available
>> for the guest when no callback has been setup.
>>
>> Signed-off-by: Pierre Morel <[email protected]>
>> ---
>>   arch/s390/include/asm/kvm_host.h      |  8 ++++
>>   arch/s390/kvm/priv.c                  | 90
>> +++++++++++++++++++++++++++++++++++
>>   drivers/s390/crypto/vfio_ap_private.h |  2 +
>>   3 files changed, 100 insertions(+)
>>
>> diff --git a/arch/s390/include/asm/kvm_host.h
>> b/arch/s390/include/asm/kvm_host.h
>> index a496276..624460b 100644
>> --- a/arch/s390/include/asm/kvm_host.h
>> +++ b/arch/s390/include/asm/kvm_host.h
>> @@ -18,6 +18,7 @@
>>   #include <linux/kvm_host.h>
>>   #include <linux/kvm.h>
>>   #include <linux/seqlock.h>
>> +#include <linux/module.h>
>>   #include <asm/debug.h>
>>   #include <asm/cpu.h>
>>   #include <asm/fpu/api.h>
>> @@ -721,8 +722,15 @@ struct kvm_s390_cpu_model {
>>       unsigned short ibc;
>>   };
>> +struct kvm_s390_module_hook {
>> +    int (*hook)(struct kvm_vcpu *vcpu);
>> +    void *data;
>> +    struct module *owner;
>> +};
>> +
>>   struct kvm_s390_crypto {
>>       struct kvm_s390_crypto_cb *crycb;
>> +    struct kvm_s390_module_hook *pqap_hook;
>>       __u32 crycbd;
>>       __u8 aes_kw;
>>       __u8 dea_kw;
>> diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
>> index 8679bd7..793e48a 100644
>> --- a/arch/s390/kvm/priv.c
>> +++ b/arch/s390/kvm/priv.c
>> @@ -27,6 +27,7 @@
>>   #include <asm/io.h>
>>   #include <asm/ptrace.h>
>>   #include <asm/sclp.h>
>> +#include <asm/ap.h>
>>   #include "gaccess.h"
>>   #include "kvm-s390.h"
>>   #include "trace.h"
>> @@ -592,6 +593,93 @@ static int handle_io_inst(struct kvm_vcpu *vcpu)
>>       }
>>   }
>> +/*
>> + * handle_pqap: Handling pqap interception
>> + * @vcpu: the vcpu having issue the pqap instruction
>> + *
>> + * We now support PQAP/AQIC instructions and we need to correctly
>> + * answer the guest even if no dedicated driver's hook is available.
>> + *
>> + * The intercepting code calls a dedicated callback for this instruction
>> + * if a driver did register one in the CRYPTO satellite of the
>> + * SIE block.
>> + *
>> + * For PQAP AQIC and TAPQ instructions, verify privilege and
>> specifications.
>
> The two paragraphs above should be described via the comments embedded
> in the code and is not necessary here.
>
>> + *
>> + * If no callback available, the queues are not available, return
>> this to
>> + * the caller.
>
> This implies it is specified via the return code when it is in fact
> the response code in the status word.
>
>> + * Else return the value returned by the callback.
>> + */
>
> Given this handler may be called for any PQAP instruction sub-function,
> I think the function doc should be more generic, providing:
>
> * A general description of what the function does
> * A description of each input parameter
> * A description of the value returned. If the return value is a return
>   code, the possible rc values can be enumerated with a description for
>   of the reason each particular value may be returned.
>
>> +static int handle_pqap(struct kvm_vcpu *vcpu)
>> +{
>> +    struct ap_queue_status status = {};
>> +    unsigned long reg0;
>> +    int ret;
>> +    uint8_t fc;
>> +
>> +    /* Verify that the AP instruction are available */
>> +    if (!ap_instructions_available())
>> +        return -EOPNOTSUPP;
>> +    /* Verify that the guest is allowed to use AP instructions */
>> +    if (!(vcpu->arch.sie_block->eca & ECA_APIE))
>> +        return -EOPNOTSUPP;
>> +    /*
>> +     * The only possibly intercepted instructions when AP
>> instructions are
>> +     * available for the guest are AQIC and TAPQ with the t bit set
>> +     * since we do not set IC.3 (FIII) we currently will not intercept
>> +     * TAPQ.
>> +     * The following code will only treat AQIC function code.
>> +     */
>
> Simplify to:
>
> /* The only supported PQAP function is AQIC (0x03) */
>
>> +    reg0 = vcpu->run->s.regs.gprs[0];
>> +    fc = reg0 >> 24;
>> +    if (fc != 0x03) {
>> +        pr_warn("%s: Unexpected interception code 0x%02x\n",
>> +            __func__, fc);
>> +        return -EOPNOTSUPP;
>> +    }
>> +    /* All PQAP instructions are allowed for guest kernel only */
>
> There is only one PQAP instruction with multiple sub-functions.
> /* PQAP instruction is allowed for guest kernel only */
>                         or
> /* PQAP instruction is privileged */
>
>> +    if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE)
>> +        return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP);
>> +    /*
>> +     * Common tests for PQAP instructions to generate a specification
>> +     * exception
>> +     */
>
> This comment is unnecessary as the individual comments below adequately
> do the job.
>
>> +    /* Zero bits overwrite produce a specification exception */
>
> This comment has no meaning unless you intimately know the architecture.
> The following would make more sense:
>
>     /* Bits 41-47 must all be zeros */
>
> It's probably not a big deal, but since we don't support PQAP(TAPQ),
> would it make more sense to make sure bits 40-47 are zeros (i.e.,
> the 't' bit is not set)?
>
>> +    if (reg0 & 0x007f0000UL)
>> +        goto specification_except;
>> +    /* If APXA is not installed APQN is limited */
>
> Wouldn't it be better to state how the APQN is limited?
> For example:
>
>     /*
>      * If APXA is not installed, then the maximum APID is
>      * 63 (bits 48-49 of reg0 must be zero) and the maximum
>      * APQI is 15 (bits 56-59 must be zero)
>      */
>
>> +    if (!(vcpu->kvm->arch.crypto.crycbd & 0x02))
>> +        if (reg0 & 0x000030f0UL)
>
> If APXA is not installed, then bits 48-49 and 56-59 must all be
> zeros. Shouldn't this mask be 0x0000c0f0UL?
>
>> +            goto specification_except;
>> +    /* AQIC needs facility 65 */
>> +    if (!test_kvm_facility(vcpu->kvm, 65))
>> +        goto specification_except;
>> +
>> +    /*
>> +     * Verify that the hook callback is registered, lock the owner
>> +     * and call the hook.
>> +     */
>> +    if (vcpu->kvm->arch.crypto.pqap_hook) {
>> +        if (!try_module_get(vcpu->kvm->arch.crypto.pqap_hook->owner))
>> +            return -EOPNOTSUPP;
>> +        ret = vcpu->kvm->arch.crypto.pqap_hook->hook(vcpu);
>> +        module_put(vcpu->kvm->arch.crypto.pqap_hook->owner);
>> +        return ret;
>> +    }
>> +    /*
>> +     * It is the duty of the vfio_driver to register a hook
>> +     * If it does not and we get an exception on AQIC we must
>> +     * guess that there is no vfio_ap_driver at all and no one
>> +     * to handle the guests's CRYCB and the CRYCB is empty.
>> +     */
>
> The comment above does not make sense to me. If there is no pqap
> hook registered, then we need to handle that case for sure. But why
> mention getting an exception? Why even mention whose responsibility
> it is to set the hook when all we need to know is whether a hook is
> set or not?
>
> I am wondering whether merely setting a response code indicating the
> APQN is invalid is the correct thing to do here. First of all, if the
> guest's CRYCB is empty, then the AP bus running in the guest would not
> create any AP devices or any AP queues bound to any zcrypt driver. In
> that case, I don't think the PQAP(AQIC) would ever be issued. If a
> PQAP is intercepted, wouldn't we want to return -EOPNOTSUPP?

I dug back through the previous comments and see this has been discussed
before, so you can ignore this comment.

>
>
>
>> +    status.response_code = 0x01;
>> +    memcpy(&vcpu->run->s.regs.gprs[1], &status, sizeof(status));
>> +    return 0;
>> +
>> +specification_except:
>> +    return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
>> +}
>> +
>>   static int handle_stfl(struct kvm_vcpu *vcpu)
>>   {
>>       int rc;
>> @@ -878,6 +966,8 @@ int kvm_s390_handle_b2(struct kvm_vcpu *vcpu)
>>           return handle_sthyi(vcpu);
>>       case 0x7d:
>>           return handle_stsi(vcpu);
>> +    case 0xaf:
>> +        return handle_pqap(vcpu);
>>       case 0xb1:
>>           return handle_stfl(vcpu);
>>       case 0xb2:
>> diff --git a/drivers/s390/crypto/vfio_ap_private.h
>> b/drivers/s390/crypto/vfio_ap_private.h
>> index 76b7f98..a910be1 100644
>> --- a/drivers/s390/crypto/vfio_ap_private.h
>> +++
>> b/drivers/s390/crypto/vfio_ap_private.h
>>
>> @@ -16,6 +16,7 @@
>>   #include <linux/mdev.h>
>>   #include <linux/delay.h>
>>   #include <linux/mutex.h>
>> +#include <linux/kvm_host.h>
>>   #include "ap_bus.h"
>> @@ -81,6 +82,7 @@ struct ap_matrix_mdev {
>>       struct ap_matrix matrix;
>>       struct notifier_block group_notifier;
>>       struct kvm *kvm;
>> +    struct kvm_s390_module_hook pqap_hook;
>>   };
>>   extern int vfio_ap_mdev_register(void);
>>
>
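
The mask question in the review above comes down to bit numbering: the
architecture numbers register bits from the most significant bit (bit 0)
to the least significant (bit 63). The sketch below derives the masks from
the bit ranges quoted in the comments. `ibm_bits()` is a hypothetical
helper written for this illustration, not a kernel API.

```c
#include <stdint.h>

/*
 * Build a mask covering bits first..last (inclusive) of a 64-bit
 * register, where bit 0 is the most significant bit and bit 63 the
 * least significant one.
 */
static uint64_t ibm_bits(unsigned int first, unsigned int last)
{
	uint64_t mask = 0;
	unsigned int bit;

	for (bit = first; bit <= last; bit++)
		mask |= UINT64_C(1) << (63 - bit);
	return mask;
}
```

Under this numbering, bits 41-47 give 0x007f0000, and bits 48-49 together
with bits 56-59 give 0x0000c0f0. The value 0x000030f0 used in the patch
covers bits 50-51 and 56-59 instead, which matches the reviewer's remark.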


2019-03-28 12:44:37

by Pierre Morel

[permalink] [raw]
Subject: Re: [PATCH v6 1/7] s390: ap: kvm: add PQAP interception for AQIC

On 26/03/2019 19:57, Tony Krowiak wrote:
> On 3/22/19 10:43 AM, Pierre Morel wrote:
>> We prepare the interception of the PQAP/AQIC instruction for
>> the case the AQIC facility is enabled in the guest.
>>

...snip...

>> +/*
>> + * handle_pqap: Handling pqap interception
>> + * @vcpu: the vcpu having issue the pqap instruction
>> + *
>> + * We now support PQAP/AQIC instructions and we need to correctly
>> + * answer the guest even if no dedicated driver's hook is available.
>> + *
>> + * The intercepting code calls a dedicated callback for this instruction
>> + * if a driver did register one in the CRYPTO satellite of the
>> + * SIE block.
>> + *
>> + * For PQAP AQIC and TAPQ instructions, verify privilege and
>> specifications.
>
> The two paragraphs above should be described via the comments embedded
> in the code and is not necessary here.
>
>> + *
>> + * If no callback available, the queues are not available, return
>> this to
>> + * the caller.
>
> This implies it is specified via the return code when it is in fact
> the response code in the status word.
>
>> + * Else return the value returned by the callback.
>> + */
>
> Given this handler may be called for any PQAP instruction sub-function,
> I think the function doc should be more generic, providing:
>
> * A general description of what the function does
> * A description of each input parameter
> * A description of the value returned. If the return value is a return
>   code, the possible rc values can be enumerated with a description for
>   of the reason each particular value may be returned.

Sorry, I do not understand what you want here.
Isn't that exactly what is done?

And aren't you saying exactly the opposite when you say that the
description should be given by the embedded comments?


>
>> +static int handle_pqap(struct kvm_vcpu *vcpu)
>> +{
>> +    struct ap_queue_status status = {};
>> +    unsigned long reg0;
>> +    int ret;
>> +    uint8_t fc;
>> +
>> +    /* Verify that the AP instruction are available */
>> +    if (!ap_instructions_available())
>> +        return -EOPNOTSUPP;
>> +    /* Verify that the guest is allowed to use AP instructions */
>> +    if (!(vcpu->arch.sie_block->eca & ECA_APIE))
>> +        return -EOPNOTSUPP;
>> +    /*
>> +     * The only possibly intercepted instructions when AP
>> instructions are
>> +     * available for the guest are AQIC and TAPQ with the t bit set
>> +     * since we do not set IC.3 (FIII) we currently will not intercept
>> +     * TAPQ.
>> +     * The following code will only treat AQIC function code.
>> +     */
>
> Simplify to:
>
> /* The only supported PQAP function is AQIC (0x03) */

OK, but then isn't it obvious from reading the code?

>
>> +    reg0 = vcpu->run->s.regs.gprs[0];
>> +    fc = reg0 >> 24;
>> +    if (fc != 0x03) {
>> +        pr_warn("%s: Unexpected interception code 0x%02x\n",
>> +            __func__, fc);
>> +        return -EOPNOTSUPP;
>> +    }
>> +    /* All PQAP instructions are allowed for guest kernel only */
>
> There is only one PQAP instruction with multiple sub-functions.
> /* PQAP instruction is allowed for guest kernel only */
>                         or
> /* PQAP instruction is privileged */

OK

>
>> +    if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE)
>> +        return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP);
>> +    /*
>> +     * Common tests for PQAP instructions to generate a specification
>> +     * exception
>> +     */
>
> This comment is unnecessary as the individual comments below adequately
> do the job.

OK

>
>> +    /* Zero bits overwrite produce a specification exception */
>
> This comment has no meaning unless you intimately know the architecture.
> The following would make more sense:
>
>     /* Bits 41-47 must all be zeros */
>
> It's probably not a big deal, but since we don't support PQAP(TAPQ),
> would it make more sense to make sure bits 40-47 are zeros (i.e.,
> the 't' bit is not set)?

I am not sure about this one, as APFT is installed in our case.
Or do you want us to test whether it is installed and then test bit 40?

We should discuss this offline because I do not find any evidence in the
documentation that we should really do this.

>
>> +    if (reg0 & 0x007f0000UL)
>> +        goto specification_except;
>> +    /* If APXA is not installed APQN is limited */
>
> Wouldn't it be better to state how the APQN is limited?
> For example:
>
>     /*
>      * If APXA is not installed, then the maximum APID is
>      * 63 (bits 48-49 of reg0 must be zero) and the maximum
>      * APQI is 15 (bits 56-59 must be zero)
>      */
OK
>
>> +    if (!(vcpu->kvm->arch.crypto.crycbd & 0x02))
>> +        if (reg0 & 0x000030f0UL)
>
> If APXA is not installed, then bits 48-49 and 56-59 must all be
> zeros. Shouldn't this mask be 0x0000c0f0UL?

You can count better than I can ;)
I will change this to c0f0.

...snip...
>
>
>
>> +    status.response_code = 0x01;
>> +    memcpy(&vcpu->run->s.regs.gprs[1], &status, sizeof(status));

Hmm,
I am missing a
kvm_s390_set_psw_cc(vcpu, 3);
here,
and certainly wherever a fault is set in the status response code.

Will be corrected in the next iteration.


Thanks for the comments,

regards,
Pierre



--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany
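
The missing condition code mentioned above can be pictured with a small
user-space model. It assumes that the condition code occupies bits 18-19
of the PSW mask (a shift of 44 from the least significant bit), which is
how kvm_s390_set_psw_cc() is understood to work. `struct model_vcpu`,
`model_set_cc()` and `model_reject_aqic()` are hypothetical names for this
sketch and do not exist in KVM.

```c
#include <stdint.h>

/* Hypothetical model of the guest state touched when AQIC is refused. */
struct model_vcpu {
	uint64_t psw_mask;	/* guest PSW mask, CC assumed in bits 18-19 */
	uint8_t response_code;	/* response code of the AP queue status */
};

/* Replace the two condition code bits, leave the rest of the mask alone. */
static void model_set_cc(struct model_vcpu *v, uint64_t cc)
{
	v->psw_mask &= ~(UINT64_C(3) << 44);
	v->psw_mask |= (cc & 3) << 44;
}

/* Report response code 0x01, as the patch does, together with CC 3. */
static void model_reject_aqic(struct model_vcpu *v)
{
	v->response_code = 0x01;
	model_set_cc(v, 3);
}
```

The point of the sketch is only that the response code and the condition
code are set together, so that the guest sees a consistent failure.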


2019-03-28 12:54:35

by Pierre Morel

[permalink] [raw]
Subject: Re: [PATCH v6 2/7] s390: ap: new vfio_ap_queue structure

On 27/03/2019 12:00, Harald Freudenberger wrote:
> On 26.03.19 21:45, Tony Krowiak wrote:
>> On 3/22/19 10:43 AM, Pierre Morel wrote:
>>> The AP interruptions are assigned on a queue basis and

...snip...

>>> +static void vfio_ap_free_queue(int apqn, struct ap_matrix_mdev *matrix_mdev)
>>> +{
>>> +    struct vfio_ap_queue *q;
>>> +
>>> +    q = vfio_ap_get_queue(apqn, &matrix_mdev->qlist);
>>> +    if (!q)
>>> +        return;
>>> +    q->matrix_mdev = NULL;
>>> +    vfio_ap_mdev_reset_queue(q);
>>
>> I'm wondering if it's necessary to reset the queue here. The only time
>> a queue is used is when a guest using the mdev device is started. When
>> that guest is terminated, the fd for the mdev device is closed and the
>> mdev device's release callback is invoked. The release callback resets
>> the queues assigned to the mdev device. Is it really necessary to
>> reset the queue again when it is unassigned even if there would have
>> been no subsequent activity?
> When I understand this here right this code is called when a queue goes
> away from the guest but is still reserved for use by the vfio dd. So it is
> possible to assign the queue now to another guest. But then it makes
> sense to clear all the entries in the millicode queue because a pending
> reply could be "received" by the wrong guest.
>
> If this function is just called on remove of a queue device where the
> device goes back to the AP bus, a reset is not needed.

You are right, Harald: the function is called when un-assigning a queue
from a mediated device, so it must be reset before it is assigned to
another one.


Regards,
Pierre

--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany
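
The reset applied when a queue is un-assigned uses the retry pattern of
vfio_ap_mdev_reset_queue() from this patch. The sketch below models only
its control flow in user space. The enum values, the `zapq_fn` callback
type, the stub backends and `model_reset_queue()` are assumptions made for
this illustration; the real function calls ap_zapq() and sleeps 20 ms
between attempts.

```c
#include <errno.h>

/* Hypothetical response classes, not the real AP response code values. */
enum model_rc { RC_NORMAL, RC_RESET_IN_PROGRESS, RC_BUSY, RC_OTHER };

typedef enum model_rc (*zapq_fn)(int apqn);

static int zapq_calls;	/* counts reset attempts made by the stubs */

/* Stub backends standing in for the hardware reset. */
static enum model_rc stub_ok(int apqn)   { (void)apqn; zapq_calls++; return RC_NORMAL; }
static enum model_rc stub_busy(int apqn) { (void)apqn; zapq_calls++; return RC_BUSY; }
static enum model_rc stub_gone(int apqn) { (void)apqn; zapq_calls++; return RC_OTHER; }

/*
 * Try the reset once, plus @retry further attempts while the queue
 * reports busy or reset in progress. Any other response gives up.
 */
static int model_reset_queue(int apqn, zapq_fn zapq, int retry)
{
	do {
		switch (zapq(apqn)) {
		case RC_NORMAL:
			return 0;
		case RC_RESET_IN_PROGRESS:
		case RC_BUSY:
			/* the driver would sleep here before retrying */
			break;
		default:
			return -EIO;
		}
	} while (retry--);

	return -EBUSY;
}
```

With retry set to 1, as in the patch, a queue that stays busy is tried
twice before -EBUSY is returned.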


2019-03-28 13:08:11

by Pierre Morel

[permalink] [raw]
Subject: Re: [PATCH v6 2/7] s390: ap: new vfio_ap_queue structure

On 26/03/2019 21:45, Tony Krowiak wrote:
> On 3/22/19 10:43 AM, Pierre Morel wrote:
>> The AP interruptions are assigned on a queue basis and
>> the GISA structure is handled on a VM basis, so that
>> we need to add a structure we can retrieve from both side
>
> s/side/sides/
OK

>
>> holding the information we need to handle PQAP/AQIC interception
>> and setup the GISA.
>
> s/setup/set up/

OK

...snip...

>> +
>> +static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q)
>> +{
>> +    struct ap_queue_status status;
>> +    int retry = 1;
>> +
>> +    do {
>> +        status = ap_zapq(q->apqn);
>> +        switch (status.response_code) {
>> +        case AP_RESPONSE_NORMAL:
>> +            return 0;
>> +        case AP_RESPONSE_RESET_IN_PROGRESS:
>> +        case AP_RESPONSE_BUSY:
>> +            msleep(20);
>> +            break;
>> +        default:
>> +            /* things are really broken, give up */
>
> I'm not sure things are necessarily broken. We could end up here if
> the AP is removed from the configuration via the SE or SCLP Deconfigure
> Adjunct Processor command.

OK, but note that it is your original comment; I just moved the function
here ;)

>
>> +            return -EIO;
>> +        }
>> +    } while (retry--);
>> +
>> +    return -EBUSY;
>> +}
>> +
>>   static void vfio_ap_matrix_init(struct ap_config_info *info,
>>                   struct ap_matrix *matrix)
>>   {
>> @@ -45,6 +107,7 @@ static int vfio_ap_mdev_create(struct kobject
>> *kobj, struct mdev_device *mdev)
>>           return -ENOMEM;
>>       }
>> +    INIT_LIST_HEAD(&matrix_mdev->qlist);
>>       vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
>>       mdev_set_drvdata(mdev, matrix_mdev);
>>       mutex_lock(&matrix_dev->lock);
>> @@ -113,162 +176,189 @@ static struct attribute_group
>> *vfio_ap_mdev_type_groups[] = {
>>       NULL,
>>   };
>> -struct vfio_ap_queue_reserved {
>> -    unsigned long *apid;
>> -    unsigned long *apqi;
>> -    bool reserved;
>> -};
>> +static void vfio_ap_free_queue(int apqn, struct ap_matrix_mdev
>> *matrix_mdev)
>> +{
>> +    struct vfio_ap_queue *q;
>> +
>> +    q = vfio_ap_get_queue(apqn, &matrix_mdev->qlist);
>> +    if (!q)
>> +        return;
>> +    q->matrix_mdev = NULL;
>> +    vfio_ap_mdev_reset_queue(q);
>
> I'm wondering if it's necessary to reset the queue here. The only time
> a queue is used is when a guest using the mdev device is started. When
> that guest is terminated, the fd for the mdev device is closed and the
> mdev device's release callback is invoked. The release callback resets
> the queues assigned to the mdev device. Is it really necessary to
> reset the queue again when it is unassigned even if there would have
> been no subsequent activity?

Yes, it is necessary: the queue can be reassigned to another guest later.
Release will only be called when unbinding the queue from the driver.

>
>> +    list_move(&q->list, &matrix_dev->free_list);
>> +}

...snip...

>> +    for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {
>> +        apqn = AP_MKQID(apid, apqi);
>> +        q = vfio_ap_find_queue(apqn);
>> +        if (!q) {
>> +            ret = -EADDRNOTAVAIL;
>> +            goto rewind;
>> +        }
>> +        if (q->matrix_mdev) {
>
> If somebody assigns the same domain a second time, the assignment will
> fail because the matrix_mdev will already have been associated with the
> queue. I don't think it is appropriate to fail the assignment if the

It is usual to report a failure when the requested operation has
already been done.
But we can do as you want. Any other opinions?

> q->matrix_mdev is the same as the input matrix_mdev. This should be
> changed to:
>
>     if (q->matrix_mdev != matrix_mdev)

You surely want to say: add this, not change to this. ;)

>

Thanks for commenting,

Regards,
Pierre


--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany


2019-03-28 13:15:07

by Pierre Morel

[permalink] [raw]
Subject: Re: [PATCH v6 2/7] s390: ap: new vfio_ap_queue structure

On 25/03/2019 09:05, Harald Freudenberger wrote:
> On 22.03.19 15:43, Pierre Morel wrote:
>> The AP interruptions are assigned on a queue basis and
>> the GISA structure is handled on a VM basis, so that

...snip...

>> + * vfio_ap_queue_dev_remove:
>> + *
>> + * Free the associated vfio_ap_queue structure
>> + */
>> static void vfio_ap_queue_dev_remove(struct ap_device *apdev)
>> {
>> - /* Nothing to do yet */
>> + struct vfio_ap_queue *q;
>> +
>> + q = dev_get_drvdata(&apdev->device);
> I'd add a check if q != NULL here.

I wonder if this can ever happen.
However, I added a check in the next patch.
I can move it here.

>> + mutex_lock(&matrix_dev->lock);
>> + list_del(&q->list);
>> + mutex_unlock(&matrix_dev->lock);
>> + kfree(q);
> I would add a line:
>     dev_set_drvdata(&apdev->device, NULL);

OK, I will clear it before giving it back, fair.

Thanks.

Regards,
Pierre

--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany


2019-03-28 15:25:33

by Anthony Krowiak

[permalink] [raw]
Subject: Re: [PATCH v6 1/7] s390: ap: kvm: add PQAP interception for AQIC

On 3/28/19 8:43 AM, Pierre Morel wrote:
> On 26/03/2019 19:57, Tony Krowiak wrote:
>> On 3/22/19 10:43 AM, Pierre Morel wrote:
>>> We prepare the interception of the PQAP/AQIC instruction for
>>> the case the AQIC facility is enabled in the guest.
>>>
>
> ...snip...
>
>>> +/*
>>> + * handle_pqap: Handling pqap interception
>>> + * @vcpu: the vcpu having issue the pqap instruction
>>> + *
>>> + * We now support PQAP/AQIC instructions and we need to correctly
>>> + * answer the guest even if no dedicated driver's hook is available.
>>> + *
>>> + * The intercepting code calls a dedicated callback for this
>>> instruction
>>> + * if a driver did register one in the CRYPTO satellite of the
>>> + * SIE block.
>>> + *
>>> + * For PQAP AQIC and TAPQ instructions, verify privilege and
>>> specifications.
>>
>> The two paragraphs above should be described via the comments embedded
>> in the code and is not necessary here.
>>
>>> + *
>>> + * If no callback available, the queues are not available, return
>>> this to
>>> + * the caller.
>>
>> This implies it is specified via the return code when it is in fact
>> the response code in the status word.
>>
>>> + * Else return the value returned by the callback.
>>> + */
>>
>> Given this handler may be called for any PQAP instruction sub-function,
>> I think the function doc should be more generic, providing:
>>
>> * A general description of what the function does
>> * A description of each input parameter
>> * A description of the value returned. If the return value is a return
>>    code, the possible rc values can be enumerated with a description for
>>    of the reason each particular value may be returned.
>
> Sorry, I do not understand what you want here.
> Isn't it exactly what is done?

No, what you have provided is a description that includes details that
may not apply in the future. I'm thinking something more like this:

/*
* handle_pqap
*
* @vcpu: the vcpu that executed the PQAP instruction
*
* Handles interception of the PQAP instruction. A specification
* exception will be injected into the guest if the input parameters
* to the PQAP instruction are not properly formatted.
*
* Returns zero if the PQAP instruction is handled successfully;
* otherwise, returns an error.
*/

>
> And don't you exactly say the opposite when you say that the description
> should be done by the embedded comments?

Not really, that was directed at only the two sentences preceding the
comment.

>
>
>>
>>> +static int handle_pqap(struct kvm_vcpu *vcpu)
>>> +{
>>> +    struct ap_queue_status status = {};
>>> +    unsigned long reg0;
>>> +    int ret;
>>> +    uint8_t fc;
>>> +
>>> +    /* Verify that the AP instruction are available */
>>> +    if (!ap_instructions_available())
>>> +        return -EOPNOTSUPP;
>>> +    /* Verify that the guest is allowed to use AP instructions */
>>> +    if (!(vcpu->arch.sie_block->eca & ECA_APIE))
>>> +        return -EOPNOTSUPP;
>>> +    /*
>>> +     * The only possibly intercepted instructions when AP
>>> instructions are
>>> +     * available for the guest are AQIC and TAPQ with the t bit set
>>> +     * since we do not set IC.3 (FIII) we currently will not intercept
>>> +     * TAPQ.
>>> +     * The following code will only treat AQIC function code.
>>> +     */
>>
>> Simplify to:
>>
>> /* The only supported PQAP function is AQIC (0x03) */
>
> OK, but then istn't obvious from reading the code ?

It's obvious that you are verifying the function code is
0x03, but only those familiar with the architecture will
know that this is the AQIC function. Besides, I was merely modifying
the comment you already had. You can leave the comment out
if you prefer.

>
>>
>>> +    reg0 = vcpu->run->s.regs.gprs[0];
>>> +    fc = reg0 >> 24;
>>> +    if (fc != 0x03) {
>>> +        pr_warn("%s: Unexpected interception code 0x%02x\n",
>>> +            __func__, fc);

I would change the text to:
"Unexpected PQAP function code 0x%02x\n"

>>> +        return -EOPNOTSUPP;
>>> +    }
>>> +    /* All PQAP instructions are allowed for guest kernel only */
>>
>> There is only one PQAP instruction with multiple sub-functions.
>> /* PQAP instruction is allowed for guest kernel only */
>>                          or
>> /* PQAP instruction is privileged */
>
> OK
>
>>
>>> +    if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE)
>>> +        return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP);
>>> +    /*
>>> +     * Common tests for PQAP instructions to generate a specification
>>> +     * exception
>>> +     */
>>
>> This comment is unnecessary as the individual comments below adequately
>> do the job.
>
> OK
>
>>
>>> +    /* Zero bits overwrite produce a specification exception */
>>
>> This comment has no meaning unless you intimately know the architecture.
>> The following would make more sense:
>>
>>      /* Bits 41-47 must all be zeros */
>>
>> It's probably not a big deal, but since we don't support PQAP(TAPQ),
>> would it make more sense to make sure bits 40-47 are zeros (i.e.,
>> the 't' bit is not set)?
>
> I am not sure about this one as APFT is installed in our case.
> Or do you want that we test if it is installed and test the bit 40?
>
> We should discuss this offline because I do not find any evidence that
> we should really do this in the documentation.

I am okay with not checking bit 40, but I would change the comment as
suggested: /* Bits 41-47 must all be zeros */
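
For reference, the masks discussed in this thread follow from the s390
convention of numbering register bits from the left (bit 0 is the most
significant bit of the 64-bit register). A minimal sketch, assuming
hypothetical helper macros REG_BIT and REG_BITS and hypothetical names
pqap_fc and pqap_reg0_ok (none of them are kernel symbols), shows how
0x007f0000 and 0x0000c0f0 fall out of the bit ranges:

```c
#include <assert.h>
#include <stdint.h>

/*
 * s390 numbers register bits from the left: bit 0 is the most
 * significant bit of the 64-bit register, bit 63 the least significant.
 * REG_BIT and REG_BITS are illustrative helpers, not kernel macros.
 */
#define REG_BIT(n)            (1ULL << (63 - (n)))
/* Mask covering bits first..last inclusive, with first <= last. */
#define REG_BITS(first, last) ((REG_BIT(first) << 1) - REG_BIT(last))

#define PQAP_T_BIT        REG_BIT(40)      /* the 't' bit, not checked */
#define PQAP_MBZ_MASK     REG_BITS(41, 47) /* must all be zeros */
#define PQAP_NO_APXA_MASK (REG_BITS(48, 49) | REG_BITS(56, 59))

static_assert(PQAP_T_BIT == 0x00800000ULL, "bit 40");
static_assert(PQAP_MBZ_MASK == 0x007f0000ULL, "bits 41-47");
static_assert(PQAP_NO_APXA_MASK == 0x0000c0f0ULL, "bits 48-49, 56-59");

/* The function code sits in bits 32-39 of reg0. */
static inline uint8_t pqap_fc(uint64_t reg0)
{
	return (uint8_t)(reg0 >> 24);
}

/* Returns 1 if reg0 passes the specification checks, 0 otherwise. */
static inline int pqap_reg0_ok(uint64_t reg0, int apxa_installed)
{
	if (reg0 & PQAP_MBZ_MASK)
		return 0;
	if (!apxa_installed && (reg0 & PQAP_NO_APXA_MASK))
		return 0;
	return 1;
}
```

With these definitions, the 0x000030f0UL mask in the patch covers bits
50-51 rather than 48-49, which is why c0f0 is the value that matches the
proposed comment.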

>
>>
>>> +    if (reg0 & 0x007f0000UL)
>>> +        goto specification_except;
>>> +    /* If APXA is not installed APQN is limited */
>>
>> Wouldn't it be better to state how the APQN is limited?
>> For example:
>>
>>      /*
>>       * If APXA is not installed, then the maximum APID is
>>       * 63 (bits 48-49 of reg0 must be zero) and the maximum
>>       * APQI is 15 (bits 56-59 must be zero)
>>       */
> OK
>>
>>> +    if (!(vcpu->kvm->arch.crypto.crycbd & 0x02))
>>> +        if (reg0 & 0x000030f0UL)
>>
>> If APXA is not installed, then bits 48-49 and 56-59 must all be
>> zeros. Shouldn't this mask be 0x0000c0f0UL?
>
> You can better count than I do ;)
> I will change this to c0f0.
>
> ...snip...
>>
>>
>>
>>> +    status.response_code = 0x01;
>>> +    memcpy(&vcpu->run->s.regs.gprs[1], &status, sizeof(status));
>
> hum,
> I miss a
>     kvm_s390_set_psw_cc(vcpu, 3);
> here
> and certainly wherever fault in the status response code are set.
>
> Will be corrected in the next iteration.

Sounds good.

>
>
> Thanks for the comments,
>
> regards,
> Pierre
>
>
>


2019-03-28 15:34:58

by Anthony Krowiak

[permalink] [raw]
Subject: Re: [PATCH v6 2/7] s390: ap: new vfio_ap_queue structure

On 3/28/19 9:06 AM, Pierre Morel wrote:
> On 26/03/2019 21:45, Tony Krowiak wrote:
>> On 3/22/19 10:43 AM, Pierre Morel wrote:
>>> The AP interruptions are assigned on a queue basis and
>>> the GISA structure is handled on a VM basis, so that
>>> we need to add a structure we can retrieve from both side
>>
>> s/side/sides/
> OK
>
>>
>>> holding the information we need to handle PQAP/AQIC interception
>>> and setup the GISA.
>>
>> s/setup/set up/
>
> OK
>
> ...snip...
>
>>> +
>>> +static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q)
>>> +{
>>> +    struct ap_queue_status status;
>>> +    int retry = 1;
>>> +
>>> +    do {
>>> +        status = ap_zapq(q->apqn);
>>> +        switch (status.response_code) {
>>> +        case AP_RESPONSE_NORMAL:
>>> +            return 0;
>>> +        case AP_RESPONSE_RESET_IN_PROGRESS:
>>> +        case AP_RESPONSE_BUSY:
>>> +            msleep(20);
>>> +            break;
>>> +        default:
>>> +            /* things are really broken, give up */
>>
>> I'm not sure things are necessarily broken. We could end up here if
>> the AP is removed from the configuration via the SE or SCLP Deconfigure
>> Adjunct Processor command.
>
> OK, but note that it is your original comment I just moved the function
> here ;)

Yes, it is. I'm smarter now;)
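
To make the retry behaviour easier to discuss, here is a user-space
sketch of the same loop. It assumes the response code values match
those in asm/ap.h and replaces ap_zapq() with a hypothetical callback
(zapq_fn) and two hypothetical stubs so that it can run outside the
kernel. The default case keeps -EIO but no longer claims that things
are broken:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Response codes; values assumed to mirror arch/s390/include/asm/ap.h. */
enum {
	AP_RESPONSE_NORMAL            = 0x00,
	AP_RESPONSE_RESET_IN_PROGRESS = 0x02,
	AP_RESPONSE_BUSY              = 0x05,
};

/* Hypothetical stand-in for ap_zapq(): returns only the response code. */
typedef int (*zapq_fn)(int apqn);

static int reset_queue(int apqn, zapq_fn zapq)
{
	int retry = 1;	/* do { } while (retry--) runs the body twice */

	assert(zapq != NULL);
	do {
		switch (zapq(apqn)) {
		case AP_RESPONSE_NORMAL:
			return 0;
		case AP_RESPONSE_RESET_IN_PROGRESS:
		case AP_RESPONSE_BUSY:
			/* the driver calls msleep(20) here before retrying */
			break;
		default:
			/*
			 * Not retryable. The queue may simply be gone, for
			 * example after the adapter was deconfigured.
			 */
			return -EIO;
		}
	} while (retry--);

	return -EBUSY;
}

/* Hypothetical stubs used to exercise the loop. */
static int zapq_calls;

static int zapq_always_busy(int apqn)
{
	(void)apqn;
	zapq_calls++;
	return AP_RESPONSE_BUSY;
}

static int zapq_ok(int apqn)
{
	(void)apqn;
	zapq_calls++;
	return AP_RESPONSE_NORMAL;
}
```

A queue that stays busy is tried exactly twice before -EBUSY is
returned, so the caller sees at most one 20 ms sleep between attempts
and one after the last.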

>
>>
>>> +            return -EIO;
>>> +        }
>>> +    } while (retry--);
>>> +
>>> +    return -EBUSY;
>>> +}
>>> +
>>>   static void vfio_ap_matrix_init(struct ap_config_info *info,
>>>                   struct ap_matrix *matrix)
>>>   {
>>> @@ -45,6 +107,7 @@ static int vfio_ap_mdev_create(struct kobject
>>> *kobj, struct mdev_device *mdev)
>>>           return -ENOMEM;
>>>       }
>>> +    INIT_LIST_HEAD(&matrix_mdev->qlist);
>>>       vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
>>>       mdev_set_drvdata(mdev, matrix_mdev);
>>>       mutex_lock(&matrix_dev->lock);
>>> @@ -113,162 +176,189 @@ static struct attribute_group
>>> *vfio_ap_mdev_type_groups[] = {
>>>       NULL,
>>>   };
>>> -struct vfio_ap_queue_reserved {
>>> -    unsigned long *apid;
>>> -    unsigned long *apqi;
>>> -    bool reserved;
>>> -};
>>> +static void vfio_ap_free_queue(int apqn, struct ap_matrix_mdev
>>> *matrix_mdev)
>>> +{
>>> +    struct vfio_ap_queue *q;
>>> +
>>> +    q = vfio_ap_get_queue(apqn, &matrix_mdev->qlist);
>>> +    if (!q)
>>> +        return;
>>> +    q->matrix_mdev = NULL;
>>> +    vfio_ap_mdev_reset_queue(q);
>>
>> I'm wondering if it's necessary to reset the queue here. The only time
>> a queue is used is when a guest using the mdev device is started. When
>> that guest is terminated, the fd for the mdev device is closed and the
>> mdev device's release callback is invoked. The release callback resets
>> the queues assigned to the mdev device. Is it really necessary to
>> reset the queue again when it is unassigned even if there would have
>> been no subsequent activity?
>
> Yes, it is necessary, the queue can be re-assigned to another guest later.
> Release will only be called when unbinding the queue from the driver.

That is true, but if the queue is never used, there is nothing to reset.

>
>>
>>> +    list_move(&q->list, &matrix_dev->free_list);
>>> +}
>
> ...snip...
>
>>> +    for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {
>>> +        apqn = AP_MKQID(apid, apqi);
>>> +        q = vfio_ap_find_queue(apqn);
>>> +        if (!q) {
>>> +            ret = -EADDRNOTAVAIL;
>>> +            goto rewind;
>>> +        }
>>> +        if (q->matrix_mdev) {
>>
>> If somebody assigns the same domain a second time, the assignment will
>> fail because the matrix_mdev will already have been associated with the
>> queue. I don't think it is appropriate to fail the assignment if the
>
> It is usual to report a failure in the case the operation requested has
> already be done.
> But we can do as you want. Any other opinion?
>
>> q->matrix_mdev is the same as the input matrix_mdev. This should be
>> changed to:
>>
>>      if (q->matrix_mdev != matrix_mdev)
>
> You surely want to say: add this, not change to this. ;)

Yes
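
To be explicit about the check I have in mind, here is a small
user-space sketch. The struct and function names are hypothetical
stand-ins for vfio_ap_queue and ap_matrix_mdev, and the -EBUSY value is
only an assumption, since the error path of the patch is snipped above:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ap_matrix_mdev. */
struct mdev {
	int id;
};

/* Hypothetical stand-in for struct vfio_ap_queue. */
struct queue {
	int apqn;
	struct mdev *matrix_mdev;	/* NULL while the queue is free */
};

/*
 * Associate q with m. Assigning a queue to the mediated device that
 * already holds it succeeds; only a different owner is a conflict.
 */
static int claim_queue(struct queue *q, struct mdev *m)
{
	assert(q != NULL && m != NULL);

	if (q->matrix_mdev && q->matrix_mdev != m)
		return -EBUSY;	/* assumed errno, not taken from the patch */

	q->matrix_mdev = m;
	return 0;
}
```

The existing test for a non-NULL q->matrix_mdev stays; the comparison
with the input matrix_mdev is added to it.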

>
>>
>
> Thanks for commenting,
>
> Regards,
> Pierre
>
>


2019-03-28 16:07:10

by Pierre Morel

[permalink] [raw]
Subject: Re: [PATCH v6 2/7] s390: ap: new vfio_ap_queue structure

On 28/03/2019 16:32, Tony Krowiak wrote:
> On 3/28/19 9:06 AM, Pierre Morel wrote:
>> On 26/03/2019 21:45, Tony Krowiak wrote:
>>> On 3/22/19 10:43 AM, Pierre Morel wrote:
>>>> The AP interruptions are assigned on a queue basis and

...
>>>> +static void vfio_ap_free_queue(int apqn, struct ap_matrix_mdev
>>>> *matrix_mdev)
>>>> +{
>>>> +    struct vfio_ap_queue *q;
>>>> +
>>>> +    q = vfio_ap_get_queue(apqn, &matrix_mdev->qlist);
>>>> +    if (!q)
>>>> +        return;
>>>> +    q->matrix_mdev = NULL;
>>>> +    vfio_ap_mdev_reset_queue(q);
>>>
>>> I'm wondering if it's necessary to reset the queue here. The only time
>>> a queue is used is when a guest using the mdev device is started. When
>>> that guest is terminated, the fd for the mdev device is closed and the
>>> mdev device's release callback is invoked. The release callback resets
>>> the queues assigned to the mdev device. Is it really necessary to
>>> reset the queue again when it is unassigned even if there would have
>>> been no subsequent activity?
>>
>> Yes, it is necessary, the queue can be re-assigned to another guest
>> later.
>> Release will only be called when unbinding the queue from the driver.
>
> That is true, but if the queue is never used, there is nothing to reset.

:) OK


Regards,
Pierre

--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany


2019-03-28 16:13:24

by Anthony Krowiak

[permalink] [raw]
Subject: Re: [PATCH v6 1/7] s390: ap: kvm: add PQAP interception for AQIC

On 3/22/19 10:43 AM, Pierre Morel wrote:
> We prepare the interception of the PQAP/AQIC instruction for
> the case the AQIC facility is enabled in the guest.
>
> First of all we do not want to change existing behavior when
> intercepting AP instructions without the SIE allowing the guest
> to use AP instructions.
>
> In this patch we only handle the AQIC interception allowed by
> facility 65 which will be enabled when the complete interception
> infrastructure will be present.
>
> We add a callback inside the KVM arch structure for s390 for
> a VFIO driver to handle a specific response to the PQAP
> instruction with the AQIC command and only this command.
>
> But we want to be able to return a correct answer to the guest
> even there is no VFIO AP driver in the kernel.
> Therefor, we inject the correct exceptions from inside KVM for the
> case the callback is not initialized, which happens when the vfio_ap
> driver is not loaded.
>
> We do consider the responsability of the driver to always initialize
> the PQAP callback if it defines queues by initializing the CRYCB for
> a guest.
> If the callback has been setup we call it.
> If not we setup an answer considering that no queue is available
> for the guest when no callback has been setup.
>
> Signed-off-by: Pierre Morel <[email protected]>
> ---
> arch/s390/include/asm/kvm_host.h | 8 ++++
> arch/s390/kvm/priv.c | 90 +++++++++++++++++++++++++++++++++++
> drivers/s390/crypto/vfio_ap_private.h | 2 +
> 3 files changed, 100 insertions(+)
>
> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
> index a496276..624460b 100644
> --- a/arch/s390/include/asm/kvm_host.h
> +++ b/arch/s390/include/asm/kvm_host.h
> @@ -18,6 +18,7 @@
> #include <linux/kvm_host.h>
> #include <linux/kvm.h>
> #include <linux/seqlock.h>
> +#include <linux/module.h>
> #include <asm/debug.h>
> #include <asm/cpu.h>
> #include <asm/fpu/api.h>
> @@ -721,8 +722,15 @@ struct kvm_s390_cpu_model {
> unsigned short ibc;
> };
>
> +struct kvm_s390_module_hook {
> + int (*hook)(struct kvm_vcpu *vcpu);
> + void *data;
> + struct module *owner;
> +};
> +
> struct kvm_s390_crypto {
> struct kvm_s390_crypto_cb *crycb;
> + struct kvm_s390_module_hook *pqap_hook;
> __u32 crycbd;
> __u8 aes_kw;
> __u8 dea_kw;
> diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
> index 8679bd7..793e48a 100644
> --- a/arch/s390/kvm/priv.c
> +++ b/arch/s390/kvm/priv.c
> @@ -27,6 +27,7 @@
> #include <asm/io.h>
> #include <asm/ptrace.h>
> #include <asm/sclp.h>
> +#include <asm/ap.h>
> #include "gaccess.h"
> #include "kvm-s390.h"
> #include "trace.h"
> @@ -592,6 +593,93 @@ static int handle_io_inst(struct kvm_vcpu *vcpu)
> }
> }
>
> +/*
> + * handle_pqap: Handling pqap interception
> + * @vcpu: the vcpu having issue the pqap instruction
> + *
> + * We now support PQAP/AQIC instructions and we need to correctly
> + * answer the guest even if no dedicated driver's hook is available.
> + *
> + * The intercepting code calls a dedicated callback for this instruction
> + * if a driver did register one in the CRYPTO satellite of the
> + * SIE block.
> + *
> + * For PQAP AQIC and TAPQ instructions, verify privilege and specifications.
> + *
> + * If no callback available, the queues are not available, return this to
> + * the caller.
> + * Else return the value returned by the callback.
> + */
> +static int handle_pqap(struct kvm_vcpu *vcpu)
> +{
> + struct ap_queue_status status = {};
> + unsigned long reg0;
> + int ret;
> + uint8_t fc;
> +
> + /* Verify that the AP instruction are available */
> + if (!ap_instructions_available())
> + return -EOPNOTSUPP;
> + /* Verify that the guest is allowed to use AP instructions */
> + if (!(vcpu->arch.sie_block->eca & ECA_APIE))
> + return -EOPNOTSUPP;
> + /*
> + * The only possibly intercepted instructions when AP instructions are
> + * available for the guest are AQIC and TAPQ with the t bit set
> + * since we do not set IC.3 (FIII) we currently will not intercept
> + * TAPQ.
> + * The following code will only treat AQIC function code.
> + */
> + reg0 = vcpu->run->s.regs.gprs[0];
> + fc = reg0 >> 24;
> + if (fc != 0x03) {
> + pr_warn("%s: Unexpected interception code 0x%02x\n",
> + __func__, fc);
> + return -EOPNOTSUPP;
> + }
> + /* All PQAP instructions are allowed for guest kernel only */
> + if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE)
> + return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP);
> + /*
> + * Common tests for PQAP instructions to generate a specification
> + * exception
> + */
> + /* Zero bits overwrite produce a specification exception */
> + if (reg0 & 0x007f0000UL)
> + goto specification_except;
> + /* If APXA is not installed APQN is limited */
> + if (!(vcpu->kvm->arch.crypto.crycbd & 0x02))
> + if (reg0 & 0x000030f0UL)
> + goto specification_except;
> + /* AQIC needs facility 65 */
> + if (!test_kvm_facility(vcpu->kvm, 65))
> + goto specification_except;
> +
> + /*
> + * Verify that the hook callback is registered, lock the owner
> + * and call the hook.
> + */
> + if (vcpu->kvm->arch.crypto.pqap_hook) {
> + if (!try_module_get(vcpu->kvm->arch.crypto.pqap_hook->owner))
> + return -EOPNOTSUPP;
> + ret = vcpu->kvm->arch.crypto.pqap_hook->hook(vcpu);
> + module_put(vcpu->kvm->arch.crypto.pqap_hook->owner);
> + return ret;
> + }
> + /*
> + * It is the duty of the vfio_driver to register a hook
> + * If it does not and we get an exception on AQIC we must
> + * guess that there is no vfio_ap_driver at all and no one
> + * to handle the guests's CRYCB and the CRYCB is empty.
> + */
> + status.response_code = 0x01;
> + memcpy(&vcpu->run->s.regs.gprs[1], &status, sizeof(status));
> + return 0;
> +
> +specification_except:
> + return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
> +}
> +
> static int handle_stfl(struct kvm_vcpu *vcpu)
> {
> int rc;
> @@ -878,6 +966,8 @@ int kvm_s390_handle_b2(struct kvm_vcpu *vcpu)
> return handle_sthyi(vcpu);
> case 0x7d:
> return handle_stsi(vcpu);
> + case 0xaf:
> + return handle_pqap(vcpu);
> case 0xb1:
> return handle_stfl(vcpu);
> case 0xb2:
> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
> index 76b7f98..a910be1 100644
> --- a/drivers/s390/crypto/vfio_ap_private.h
> +++ b/drivers/s390/crypto/vfio_ap_private.h
> @@ -16,6 +16,7 @@
> #include <linux/mdev.h>
> #include <linux/delay.h>
> #include <linux/mutex.h>
> +#include <linux/kvm_host.h>
>
> #include "ap_bus.h"
>
> @@ -81,6 +82,7 @@ struct ap_matrix_mdev {
> struct ap_matrix matrix;
> struct notifier_block group_notifier;
> struct kvm *kvm;
> + struct kvm_s390_module_hook pqap_hook;

I don't understand the purpose of adding this field. We set up
kvm->arch.crypto.pqap_hook in vfio_ap_mdev_set_kvm, which is
also in this same file. Why not just use a static struct
kvm_s390_module_hook and reuse it when setting up
kvm->arch.crypto.pqap_hook? It saves you from initializing it every
time an ap_matrix_mdev is created.
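
Roughly what I mean, as a user-space sketch with hypothetical stand-in
types (module_hook for kvm_s390_module_hook without the owner field,
crypto for kvm->arch.crypto, and a stub handler):

```c
#include <assert.h>
#include <stddef.h>

struct vcpu;	/* opaque stand-in for struct kvm_vcpu */

/* Stand-in for struct kvm_s390_module_hook, owner field omitted. */
struct module_hook {
	int (*hook)(struct vcpu *vcpu);
	void *data;
};

/* Stand-in for the pqap_hook pointer in kvm->arch.crypto. */
struct crypto {
	struct module_hook *pqap_hook;
};

/* Hypothetical handler; the real one would process PQAP/AQIC. */
static int handle_pqap_stub(struct vcpu *vcpu)
{
	(void)vcpu;
	return 0;
}

/* One instance for the whole driver, initialized at build time. */
static struct module_hook pqap_hook = {
	.hook = handle_pqap_stub,
	.data = NULL,
};

/* Every guest points at the same static hook. */
static void install_hook(struct crypto *c)
{
	assert(c != NULL);
	c->pqap_hook = &pqap_hook;
}
```

One caveat with a shared instance: the data pointer can no longer carry
per-mdev state. Whether that matters depends on how data is used in the
later patches, which I have not checked here.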

> };
>
> extern int vfio_ap_mdev_register(void);
>


2019-03-28 16:15:07

by Anthony Krowiak

[permalink] [raw]
Subject: Re: [PATCH v6 3/7] s390: ap: setup relation betwen KVM and mediated device

On 3/22/19 10:43 AM, Pierre Morel wrote:
> When the mediated device is open we setup the relation with KVM unset it
> when the mediated device is released.

s/open we setup/open, we set up/
s/with KVM unset/with KVM and unset/

>
> We lock the matrix mediated device to avoid any change until the
> open is done.
> We make sure that KVM is present when opening the mediated device
> otherwise we return an error.

s/mediated device/mediated device,/

>
> Increase kvm's refcount to ensure the KVM structures are still available
> during the use of the mediated device by the guest.
>
> Signed-off-by: Pierre Morel <[email protected]>
> ---
> drivers/s390/crypto/vfio_ap_ops.c | 143 +++++++++++++++++++++-----------------
> 1 file changed, 79 insertions(+), 64 deletions(-)
>
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 77f7bac..bdb36e0 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -787,74 +787,24 @@ static const struct attribute_group *vfio_ap_mdev_attr_groups[] = {
> NULL
> };
>
> -/**
> - * vfio_ap_mdev_set_kvm
> - *
> - * @matrix_mdev: a mediated matrix device
> - * @kvm: reference to KVM instance
> - *
> - * Verifies no other mediated matrix device has @kvm and sets a reference to
> - * it in @matrix_mdev->kvm.
> - *
> - * Return 0 if no other mediated matrix device has a reference to @kvm;
> - * otherwise, returns an -EPERM.
> - */
> -static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
> - struct kvm *kvm)
> -{
> - struct ap_matrix_mdev *m;
> -
> - mutex_lock(&matrix_dev->lock);
> -
> - list_for_each_entry(m, &matrix_dev->mdev_list, node) {
> - if ((m != matrix_mdev) && (m->kvm == kvm)) {
> - mutex_unlock(&matrix_dev->lock);
> - return -EPERM;
> - }
> - }
> -
> - matrix_mdev->kvm = kvm;
> - mutex_unlock(&matrix_dev->lock);
> -
> - return 0;
> -}
> -
> static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
> unsigned long action, void *data)
> {
> - int ret;
> struct ap_matrix_mdev *matrix_mdev;
>
> if (action != VFIO_GROUP_NOTIFY_SET_KVM)
> return NOTIFY_OK;
>
> matrix_mdev = container_of(nb, struct ap_matrix_mdev, group_notifier);
> -
> - if (!data) {
> - matrix_mdev->kvm = NULL;
> - return NOTIFY_OK;
> - }
> -
> - ret = vfio_ap_mdev_set_kvm(matrix_mdev, data);
> - if (ret)
> - return NOTIFY_DONE;
> -
> - /* If there is no CRYCB pointer, then we can't copy the masks */
> - if (!matrix_mdev->kvm->arch.crypto.crycbd)
> - return NOTIFY_DONE;
> -
> - kvm_arch_crypto_set_masks(matrix_mdev->kvm, matrix_mdev->matrix.apm,
> - matrix_mdev->matrix.aqm,
> - matrix_mdev->matrix.adm);
> + matrix_mdev->kvm = data;
>
> return NOTIFY_OK;
> }
>
> -static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
> +static int vfio_ap_mdev_reset_queues(struct ap_matrix_mdev *matrix_mdev)
> {
> int ret;
> int rc = 0;
> - struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
> struct vfio_ap_queue *q;
>
> list_for_each_entry(q, &matrix_mdev->qlist, list) {
> @@ -871,41 +821,106 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
> return rc;
> }
>
> +/**
> + * vfio_ap_mdev_set_kvm
> + *
> + * @matrix_mdev: a mediated matrix device
> + *
> + * - Verifies that the hook is free and install the PQAP hook
> + * - Copy the matrix masks inside the CRYCB
> + * - Increment the KVM rerference count
> + *
> + * Return 0 if no other mediated matrix device has a reference to @kvm;
> + * otherwise, returns an -EPERM.
> + */
> +static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev)
> +{
> + if (matrix_mdev->kvm->arch.crypto.pqap_hook)
> + return -EPERM;

How would this happen; in other words, why are we checking this?

> + matrix_mdev->kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
> +
> + kvm_arch_crypto_set_masks(matrix_mdev->kvm, matrix_mdev->matrix.apm,
> + matrix_mdev->matrix.aqm,
> + matrix_mdev->matrix.adm);
> +
> + kvm_get_kvm(matrix_mdev->kvm);
> + return 0;
> +}
> +
> +/**
> + * vfio_ap_mdev_unset_kvm
> + *
> + * @matrix_mdev: a mediated matrix device
> + *
> + * - Clears the matrix masks inside the CRYCB
> + * - Reset the queues before to clear the hook in case IRQ happen during
> + * reset.
> + * - Clears the hook
> + * - Decrement the KVM rerference count
> + */
> +static int vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev)
> +{
> + struct kvm *kvm = matrix_mdev->kvm;
> +
> + kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
> + vfio_ap_mdev_reset_queues(matrix_mdev);
> + matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
> + matrix_mdev->kvm = NULL;
> + kvm_put_kvm(kvm);
> + return 0;
> +}
> +
> static int vfio_ap_mdev_open(struct mdev_device *mdev)
> {
> struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
> unsigned long events;
> int ret;
>
> + mutex_lock(&matrix_dev->lock);
>
> - if (!try_module_get(THIS_MODULE))
> - return -ENODEV;
> + if (!try_module_get(THIS_MODULE)) {
> + ret = -ENODEV;
> + goto unlock;
> + }
>
> matrix_mdev->group_notifier.notifier_call = vfio_ap_mdev_group_notifier;
> events = VFIO_GROUP_NOTIFY_SET_KVM;
>
> ret = vfio_register_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
> &events, &matrix_mdev->group_notifier);
> - if (ret) {
> - module_put(THIS_MODULE);
> - return ret;
> + if (ret)
> + goto put_unlock;
> +
> + /* We do not support opening the mediated device without KVM */
> + if (!matrix_mdev->kvm) {
> + ret = -ENOENT;
> + goto free_notifier;
> }
>
> - return 0;
> + ret = vfio_ap_mdev_set_kvm(matrix_mdev);
> + if (!ret)
> + goto unlock;
> +
> +free_notifier:
> + vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
> + &matrix_mdev->group_notifier);
> +put_unlock:
> + module_put(THIS_MODULE);
> +unlock:
> + mutex_unlock(&matrix_dev->lock);
> + return ret;
> }
>
> static void vfio_ap_mdev_release(struct mdev_device *mdev)
> {
> struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>
> - if (matrix_mdev->kvm)
> - kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
> -
> - matrix_mdev->kvm = NULL;
> - vfio_ap_mdev_reset_queues(mdev);
> + mutex_lock(&matrix_dev->lock);
> + vfio_ap_mdev_unset_kvm(matrix_mdev);
> vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
> &matrix_mdev->group_notifier);
> module_put(THIS_MODULE);
> + mutex_unlock(&matrix_dev->lock);
> }
>
> static int vfio_ap_mdev_get_device_info(unsigned long arg)
> @@ -939,7 +954,7 @@ static ssize_t vfio_ap_mdev_ioctl(struct mdev_device *mdev,
> break;
> case VFIO_DEVICE_RESET:
> mutex_lock(&matrix_dev->lock);
> - ret = vfio_ap_mdev_reset_queues(mdev);
> + ret = vfio_ap_mdev_reset_queues(mdev_get_drvdata(mdev));
> mutex_unlock(&matrix_dev->lock);
> break;
> default:
>


2019-03-28 16:28:58

by Pierre Morel

[permalink] [raw]
Subject: Re: [PATCH v6 3/7] s390: ap: setup relation betwen KVM and mediated device

On 28/03/2019 17:12, Tony Krowiak wrote:
> On 3/22/19 10:43 AM, Pierre Morel wrote:
>> When the mediated device is open we setup the relation with KVM unset it
>> when the mediated device is released.
>
> s/open we setup/open, we set up/
> s/with KVM unset/with KVM and unset/
>
>>
>> We lock the matrix mediated device to avoid any change until the
>> open is done.
>> We make sure that KVM is present when opening the mediated device
>> otherwise we return an error.
>
> s/mediated device/mediated device,/
>
>>
>> Increase kvm's refcount to ensure the KVM structures are still available
>> during the use of the mediated device by the guest.
>>
>> Signed-off-by: Pierre Morel <[email protected]>
>> ---
>>   drivers/s390/crypto/vfio_ap_ops.c | 143
>> +++++++++++++++++++++-----------------
>>   1 file changed, 79 insertions(+), 64 deletions(-)
>>
>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c
>> b/drivers/s390/crypto/vfio_ap_ops.c
>> index 77f7bac..bdb36e0 100644
>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -787,74 +787,24 @@ static const struct attribute_group
>> *vfio_ap_mdev_attr_groups[] = {
>>       NULL
>>   };
>> -/**
>> - * vfio_ap_mdev_set_kvm
>> - *
>> - * @matrix_mdev: a mediated matrix device
>> - * @kvm: reference to KVM instance
>> - *
>> - * Verifies no other mediated matrix device has @kvm and sets a
>> reference to
>> - * it in @matrix_mdev->kvm.
>> - *
>> - * Return 0 if no other mediated matrix device has a reference to @kvm;
>> - * otherwise, returns an -EPERM.
>> - */
>> -static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
>> -                struct kvm *kvm)
>> -{
>> -    struct ap_matrix_mdev *m;
>> -
>> -    mutex_lock(&matrix_dev->lock);
>> -
>> -    list_for_each_entry(m, &matrix_dev->mdev_list, node) {
>> -        if ((m != matrix_mdev) && (m->kvm == kvm)) {
>> -            mutex_unlock(&matrix_dev->lock);
>> -            return -EPERM;
>> -        }
>> -    }
>> -
>> -    matrix_mdev->kvm = kvm;
>> -    mutex_unlock(&matrix_dev->lock);
>> -
>> -    return 0;
>> -}
>> -
>>   static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
>>                          unsigned long action, void *data)
>>   {
>> -    int ret;
>>       struct ap_matrix_mdev *matrix_mdev;
>>       if (action != VFIO_GROUP_NOTIFY_SET_KVM)
>>           return NOTIFY_OK;
>>       matrix_mdev = container_of(nb, struct ap_matrix_mdev,
>> group_notifier);
>> -
>> -    if (!data) {
>> -        matrix_mdev->kvm = NULL;
>> -        return NOTIFY_OK;
>> -    }
>> -
>> -    ret = vfio_ap_mdev_set_kvm(matrix_mdev, data);
>> -    if (ret)
>> -        return NOTIFY_DONE;
>> -
>> -    /* If there is no CRYCB pointer, then we can't copy the masks */
>> -    if (!matrix_mdev->kvm->arch.crypto.crycbd)
>> -        return NOTIFY_DONE;
>> -
>> -    kvm_arch_crypto_set_masks(matrix_mdev->kvm, matrix_mdev->matrix.apm,
>> -                  matrix_mdev->matrix.aqm,
>> -                  matrix_mdev->matrix.adm);
>> +    matrix_mdev->kvm = data;
>>       return NOTIFY_OK;
>>   }
>> -static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
>> +static int vfio_ap_mdev_reset_queues(struct ap_matrix_mdev *matrix_mdev)
>>   {
>>       int ret;
>>       int rc = 0;
>> -    struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>>       struct vfio_ap_queue *q;
>>       list_for_each_entry(q, &matrix_mdev->qlist, list) {
>> @@ -871,41 +821,106 @@ static int vfio_ap_mdev_reset_queues(struct
>> mdev_device *mdev)
>>       return rc;
>>   }
>> +/**
>> + * vfio_ap_mdev_set_kvm
>> + *
>> + * @matrix_mdev: a mediated matrix device
>> + *
>> + * - Verifies that the hook is free and install the PQAP hook
>> + * - Copy the matrix masks inside the CRYCB
>> + * - Increment the KVM rerference count
>> + *
>> + * Return 0 if no other mediated matrix device has a reference to @kvm;
>> + * otherwise, returns an -EPERM.
>> + */
>> +static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev)
>> +{
>> +    if (matrix_mdev->kvm->arch.crypto.pqap_hook)
>> +        return -EPERM;
>
> How would this happen; in other words, why are we checking this?

I check this to verify that no other AP mediated device is already in
use by this VM.
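
The check can be pictured with a small userspace model. This is only a sketch under assumptions: the struct and function names below (kvm_model, mdev_model, mdev_set_kvm, mdev_unset_kvm) are hypothetical stand-ins, not the kernel definitions, and all locking and reference counting is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Reduced stand-in for struct kvm_s390_module_hook (hypothetical). */
struct module_hook {
	int (*hook)(void *vcpu);
};

/* Reduced stand-in for the per-VM crypto state: one hook slot per VM. */
struct kvm_model {
	struct module_hook *pqap_hook;
};

/* Reduced stand-in for struct ap_matrix_mdev. */
struct mdev_model {
	struct kvm_model *kvm;
	struct module_hook pqap_hook;	/* embedded in the mediated device */
};

/*
 * Install this device's hook in the VM. If the slot is already taken,
 * another AP mediated device is in use by this VM, so refuse with -EPERM.
 */
int mdev_set_kvm(struct mdev_model *m)
{
	assert(m != NULL && m->kvm != NULL);
	if (m->kvm->pqap_hook)
		return -EPERM;
	m->kvm->pqap_hook = &m->pqap_hook;
	return 0;
}

/* Release the slot, but only if this device is the one holding it. */
void mdev_unset_kvm(struct mdev_model *m)
{
	assert(m != NULL);
	if (m->kvm && m->kvm->pqap_hook == &m->pqap_hook)
		m->kvm->pqap_hook = NULL;
}
```

In this model a second device opened against the same VM gets -EPERM until the first one has been released.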

Regards,
Pierre



--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany


2019-03-28 17:26:45

by Anthony Krowiak

Subject: Re: [PATCH v6 3/7] s390: ap: setup relation betwen KVM and mediated device

On 3/28/19 12:27 PM, Pierre Morel wrote:
> On 28/03/2019 17:12, Tony Krowiak wrote:
>> On 3/22/19 10:43 AM, Pierre Morel wrote:
>>> When the mediated device is open we setup the relation with KVM unset it
>>> when the mediated device is released.
>>
>> s/open we setup/open, we set up/
>> s/with KVM unset/with KVM and unset/
>>
>>>
>>> We lock the matrix mediated device to avoid any change until the
>>> open is done.
>>> We make sure that KVM is present when opening the mediated device
>>> otherwise we return an error.
>>
>> s/mediated device/mediated device,/
>>
>>>
>>> Increase kvm's refcount to ensure the KVM structures are still available
>>> during the use of the mediated device by the guest.
>>>
>>> Signed-off-by: Pierre Morel <[email protected]>
>>> ---
>>>   drivers/s390/crypto/vfio_ap_ops.c | 143
>>> +++++++++++++++++++++-----------------
>>>   1 file changed, 79 insertions(+), 64 deletions(-)
>>>
>>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c
>>> b/drivers/s390/crypto/vfio_ap_ops.c
>>> index 77f7bac..bdb36e0 100644
>>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>>> @@ -787,74 +787,24 @@ static const struct attribute_group
>>> *vfio_ap_mdev_attr_groups[] = {
>>>       NULL
>>>   };
>>> -/**
>>> - * vfio_ap_mdev_set_kvm
>>> - *
>>> - * @matrix_mdev: a mediated matrix device
>>> - * @kvm: reference to KVM instance
>>> - *
>>> - * Verifies no other mediated matrix device has @kvm and sets a
>>> reference to
>>> - * it in @matrix_mdev->kvm.
>>> - *
>>> - * Return 0 if no other mediated matrix device has a reference to @kvm;
>>> - * otherwise, returns an -EPERM.
>>> - */
>>> -static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
>>> -                struct kvm *kvm)
>>> -{
>>> -    struct ap_matrix_mdev *m;
>>> -
>>> -    mutex_lock(&matrix_dev->lock);
>>> -
>>> -    list_for_each_entry(m, &matrix_dev->mdev_list, node) {
>>> -        if ((m != matrix_mdev) && (m->kvm == kvm)) {
>>> -            mutex_unlock(&matrix_dev->lock);
>>> -            return -EPERM;
>>> -        }
>>> -    }
>>> -
>>> -    matrix_mdev->kvm = kvm;
>>> -    mutex_unlock(&matrix_dev->lock);
>>> -
>>> -    return 0;
>>> -}
>>> -
>>>   static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
>>>                          unsigned long action, void *data)
>>>   {
>>> -    int ret;
>>>       struct ap_matrix_mdev *matrix_mdev;
>>>       if (action != VFIO_GROUP_NOTIFY_SET_KVM)
>>>           return NOTIFY_OK;
>>>       matrix_mdev = container_of(nb, struct ap_matrix_mdev,
>>> group_notifier);
>>> -
>>> -    if (!data) {
>>> -        matrix_mdev->kvm = NULL;
>>> -        return NOTIFY_OK;
>>> -    }
>>> -
>>> -    ret = vfio_ap_mdev_set_kvm(matrix_mdev, data);
>>> -    if (ret)
>>> -        return NOTIFY_DONE;
>>> -
>>> -    /* If there is no CRYCB pointer, then we can't copy the masks */
>>> -    if (!matrix_mdev->kvm->arch.crypto.crycbd)
>>> -        return NOTIFY_DONE;
>>> -
>>> -    kvm_arch_crypto_set_masks(matrix_mdev->kvm,
>>> matrix_mdev->matrix.apm,
>>> -                  matrix_mdev->matrix.aqm,
>>> -                  matrix_mdev->matrix.adm);
>>> +    matrix_mdev->kvm = data;
>>>       return NOTIFY_OK;
>>>   }
>>> -static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
>>> +static int vfio_ap_mdev_reset_queues(struct ap_matrix_mdev
>>> *matrix_mdev)
>>>   {
>>>       int ret;
>>>       int rc = 0;
>>> -    struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>>>       struct vfio_ap_queue *q;
>>>       list_for_each_entry(q, &matrix_mdev->qlist, list) {
>>> @@ -871,41 +821,106 @@ static int vfio_ap_mdev_reset_queues(struct
>>> mdev_device *mdev)
>>>       return rc;
>>>   }
>>> +/**
>>> + * vfio_ap_mdev_set_kvm
>>> + *
>>> + * @matrix_mdev: a mediated matrix device
>>> + *
>>> + * - Verifies that the hook is free and install the PQAP hook
>>> + * - Copy the matrix masks inside the CRYCB
>>> + * - Increment the KVM rerference count
>>> + *
>>> + * Return 0 if no other mediated matrix device has a reference to @kvm;
>>> + * otherwise, returns an -EPERM.
>>> + */
>>> +static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev)
>>> +{
>>> +    if (matrix_mdev->kvm->arch.crypto.pqap_hook)
>>> +        return -EPERM;
>>
>> How would this happen; in other words, why are we checking this?
>
> I check this to verify that no other AP mediated device is already in
> use by this VM.

Maybe you should insert a comment to that effect.

>
> Regards,
> Pierre
>
>
>


2019-03-28 20:49:13

by Anthony Krowiak

Subject: Re: [PATCH v6 4/7] vfio: ap: register IOMMU VFIO notifier

On 3/22/19 10:43 AM, Pierre Morel wrote:
> To be able to use the VFIO interface to facilitate the
> mediated device memory pinning/unpinning we need to register
> a notifier for IOMMU.
>
> While we will start to pin one guest page for the interrupt indicator
> byte, this is still ok with ballooning as this page will never be
> used by the guest virtio-balloon driver.
> So the pinned page will never be freed. And even a broken guest does
> so, that would not impact the host as the original page is still
> in control by vfio.

I apologize, but I do not understand what you are saying in the second
sentence of the paragraph above. Why will the pinned page never be
freed? I understand that the pinned page is under the control of vfio
until it is freed, but have no idea what you mean by "and even a broken
guest does so"? A broken guest does what? Can you please reword this so
it makes more sense?

>
> Signed-off-by: Pierre Morel <[email protected]>
> Reviewed-by: Cornelia Huck <[email protected]>
> ---
> drivers/s390/crypto/vfio_ap_ops.c | 38 +++++++++++++++++++++++++++++++++++
> drivers/s390/crypto/vfio_ap_private.h | 2 ++
> 2 files changed, 40 insertions(+)
>
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index bdb36e0..3478499 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -787,6 +787,35 @@ static const struct attribute_group *vfio_ap_mdev_attr_groups[] = {
> NULL
> };
>
> +/**
> + * vfio_ap_mdev_iommu_notifier: IOMMU notifier callback
> + *
> + * @nb: The notifier block
> + * @action: Action to be taken
> + * @data: data associated with the request
> + *
> + * For an UNMAP request, unpin the guest IOVA (the NIB guest address we
> + * pinned before). Other requests are ignored.
> + *
> + */
> +static int vfio_ap_mdev_iommu_notifier(struct notifier_block *nb,
> + unsigned long action, void *data)
> +{
> + struct ap_matrix_mdev *matrix_mdev;
> +
> + matrix_mdev = container_of(nb, struct ap_matrix_mdev, iommu_notifier);
> +

I don't understand why we registered this notifier. I may be wrong, but
AFAIU, this notifier will be invoked only when the VFIO_IOMMU_UNMAP_DMA
ioctl is called from userspace. I did an experiment and inserted some
printf's to see if this ever gets called and verified it does not. Maybe
you have a good reason of which I'm not aware. Can you enlighten me
here?

> + if (action == VFIO_IOMMU_NOTIFY_DMA_UNMAP) {
> + struct vfio_iommu_type1_dma_unmap *unmap = data;
> + unsigned long g_pfn = unmap->iova >> PAGE_SHIFT;
> +
> + vfio_unpin_pages(mdev_dev(matrix_mdev->mdev), &g_pfn, 1);
> + return NOTIFY_OK;
> + }
> +
> + return NOTIFY_DONE;
> +}
> +
> static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
> unsigned long action, void *data)
> {
> @@ -897,6 +926,13 @@ static int vfio_ap_mdev_open(struct mdev_device *mdev)
> goto free_notifier;
> }
>
> + matrix_mdev->iommu_notifier.notifier_call = vfio_ap_mdev_iommu_notifier;
> + events = VFIO_IOMMU_NOTIFY_DMA_UNMAP;
> + ret = vfio_register_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
> + &events, &matrix_mdev->iommu_notifier);
> + if (ret)
> + goto free_notifier;
> +
> ret = vfio_ap_mdev_set_kvm(matrix_mdev);
> if (!ret)
> goto unlock;
> @@ -917,6 +953,8 @@ static void vfio_ap_mdev_release(struct mdev_device *mdev)
>
> mutex_lock(&matrix_dev->lock);
> vfio_ap_mdev_unset_kvm(matrix_mdev);
> + vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
> + &matrix_mdev->iommu_notifier);
> vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
> &matrix_mdev->group_notifier);
> module_put(THIS_MODULE);
> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
> index 3e6940c..4a287c8 100644
> --- a/drivers/s390/crypto/vfio_ap_private.h
> +++ b/drivers/s390/crypto/vfio_ap_private.h
> @@ -82,9 +82,11 @@ struct ap_matrix_mdev {
> struct list_head node;
> struct ap_matrix matrix;
> struct notifier_block group_notifier;
> + struct notifier_block iommu_notifier;
> struct kvm *kvm;
> struct kvm_s390_module_hook pqap_hook;
> struct list_head qlist;
> + struct mdev_device *mdev;
> };
>
> extern int vfio_ap_mdev_register(void);
>


2019-03-29 08:53:13

by Pierre Morel

Subject: Re: [PATCH v6 1/7] s390: ap: kvm: add PQAP interception for AQIC

On 28/03/2019 17:12, Tony Krowiak wrote:
> On 3/22/19 10:43 AM, Pierre Morel wrote:
>> We prepare the interception of the PQAP/AQIC instruction for
>> the case the AQIC facility is enabled in the guest.
>>
>> First of all we do not want to change existing behavior when
>> intercepting AP instructions without the SIE allowing the guest
>> to use AP instructions.
>>
>> In this patch we only handle the AQIC interception allowed by
>> facility 65 which will be enabled when the complete interception
>> infrastructure will be present.
>>
>> We add a callback inside the KVM arch structure for s390 for
>> a VFIO driver to handle a specific response to the PQAP
>> instruction with the AQIC command and only this command.
>>
>> But we want to be able to return a correct answer to the guest
>> even there is no VFIO AP driver in the kernel.
>> Therefor, we inject the correct exceptions from inside KVM for the
>> case the callback is not initialized, which happens when the vfio_ap
>> driver is not loaded.
>>
>> We do consider the responsability of the driver to always initialize
>> the PQAP callback if it defines queues by initializing the CRYCB for
>> a guest.
>> If the callback has been setup we call it.
>> If not we setup an answer considering that no queue is available
>> for the guest when no callback has been setup.
>>
>> Signed-off-by: Pierre Morel <[email protected]>
>> ---
>>   arch/s390/include/asm/kvm_host.h      |  8 ++++
>>   arch/s390/kvm/priv.c                  | 90
>> +++++++++++++++++++++++++++++++++++
>>   drivers/s390/crypto/vfio_ap_private.h |  2 +
>>   3 files changed, 100 insertions(+)
>>
>> diff --git a/arch/s390/include/asm/kvm_host.h
>> b/arch/s390/include/asm/kvm_host.h
>> index a496276..624460b 100644
>> --- a/arch/s390/include/asm/kvm_host.h
>> +++ b/arch/s390/include/asm/kvm_host.h
>> @@ -18,6 +18,7 @@
>>   #include <linux/kvm_host.h>
>>   #include <linux/kvm.h>
>>   #include <linux/seqlock.h>
>> +#include <linux/module.h>
>>   #include <asm/debug.h>
>>   #include <asm/cpu.h>
>>   #include <asm/fpu/api.h>
>> @@ -721,8 +722,15 @@ struct kvm_s390_cpu_model {
>>       unsigned short ibc;
>>   };
>> +struct kvm_s390_module_hook {
>> +    int (*hook)(struct kvm_vcpu *vcpu);
>> +    void *data;
>> +    struct module *owner;
>> +};
>> +
>>   struct kvm_s390_crypto {
>>       struct kvm_s390_crypto_cb *crycb;
>> +    struct kvm_s390_module_hook *pqap_hook;
>>       __u32 crycbd;
>>       __u8 aes_kw;
>>       __u8 dea_kw;
>> diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
>> index 8679bd7..793e48a 100644
>> --- a/arch/s390/kvm/priv.c
>> +++ b/arch/s390/kvm/priv.c
>> @@ -27,6 +27,7 @@
>>   #include <asm/io.h>
>>   #include <asm/ptrace.h>
>>   #include <asm/sclp.h>
>> +#include <asm/ap.h>
>>   #include "gaccess.h"
>>   #include "kvm-s390.h"
>>   #include "trace.h"
>> @@ -592,6 +593,93 @@ static int handle_io_inst(struct kvm_vcpu *vcpu)
>>       }
>>   }
>> +/*
>> + * handle_pqap: Handling pqap interception
>> + * @vcpu: the vcpu having issue the pqap instruction
>> + *
>> + * We now support PQAP/AQIC instructions and we need to correctly
>> + * answer the guest even if no dedicated driver's hook is available.
>> + *
>> + * The intercepting code calls a dedicated callback for this instruction
>> + * if a driver did register one in the CRYPTO satellite of the
>> + * SIE block.
>> + *
>> + * For PQAP AQIC and TAPQ instructions, verify privilege and
>> specifications.
>> + *
>> + * If no callback available, the queues are not available, return
>> this to
>> + * the caller.
>> + * Else return the value returned by the callback.
>> + */
>> +static int handle_pqap(struct kvm_vcpu *vcpu)
>> +{
>> +    struct ap_queue_status status = {};
>> +    unsigned long reg0;
>> +    int ret;
>> +    uint8_t fc;
>> +
>> +    /* Verify that the AP instruction are available */
>> +    if (!ap_instructions_available())
>> +        return -EOPNOTSUPP;
>> +    /* Verify that the guest is allowed to use AP instructions */
>> +    if (!(vcpu->arch.sie_block->eca & ECA_APIE))
>> +        return -EOPNOTSUPP;
>> +    /*
>> +     * The only possibly intercepted instructions when AP
>> instructions are
>> +     * available for the guest are AQIC and TAPQ with the t bit set
>> +     * since we do not set IC.3 (FIII) we currently will not intercept
>> +     * TAPQ.
>> +     * The following code will only treat AQIC function code.
>> +     */
>> +    reg0 = vcpu->run->s.regs.gprs[0];
>> +    fc = reg0 >> 24;
>> +    if (fc != 0x03) {
>> +        pr_warn("%s: Unexpected interception code 0x%02x\n",
>> +            __func__, fc);
>> +        return -EOPNOTSUPP;
>> +    }
>> +    /* All PQAP instructions are allowed for guest kernel only */
>> +    if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE)
>> +        return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP);
>> +    /*
>> +     * Common tests for PQAP instructions to generate a specification
>> +     * exception
>> +     */
>> +    /* Zero bits overwrite produce a specification exception */
>> +    if (reg0 & 0x007f0000UL)
>> +        goto specification_except;
>> +    /* If APXA is not installed APQN is limited */
>> +    if (!(vcpu->kvm->arch.crypto.crycbd & 0x02))
>> +        if (reg0 & 0x000030f0UL)
>> +            goto specification_except;
>> +    /* AQIC needs facility 65 */
>> +    if (!test_kvm_facility(vcpu->kvm, 65))
>> +        goto specification_except;
>> +
>> +    /*
>> +     * Verify that the hook callback is registered, lock the owner
>> +     * and call the hook.
>> +     */
>> +    if (vcpu->kvm->arch.crypto.pqap_hook) {
>> +        if (!try_module_get(vcpu->kvm->arch.crypto.pqap_hook->owner))
>> +            return -EOPNOTSUPP;
>> +        ret = vcpu->kvm->arch.crypto.pqap_hook->hook(vcpu);
>> +        module_put(vcpu->kvm->arch.crypto.pqap_hook->owner);
>> +        return ret;
>> +    }
>> +    /*
>> +     * It is the duty of the vfio_driver to register a hook
>> +     * If it does not and we get an exception on AQIC we must
>> +     * guess that there is no vfio_ap_driver at all and no one
>> +     * to handle the guests's CRYCB and the CRYCB is empty.
>> +     */
>> +    status.response_code = 0x01;
>> +    memcpy(&vcpu->run->s.regs.gprs[1], &status, sizeof(status));
>> +    return 0;
>> +
>> +specification_except:
>> +    return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
>> +}
>> +
>>   static int handle_stfl(struct kvm_vcpu *vcpu)
>>   {
>>       int rc;
>> @@ -878,6 +966,8 @@ int kvm_s390_handle_b2(struct kvm_vcpu *vcpu)
>>           return handle_sthyi(vcpu);
>>       case 0x7d:
>>           return handle_stsi(vcpu);
>> +    case 0xaf:
>> +        return handle_pqap(vcpu);
>>       case 0xb1:
>>           return handle_stfl(vcpu);
>>       case 0xb2:
>> diff --git a/drivers/s390/crypto/vfio_ap_private.h
>> b/drivers/s390/crypto/vfio_ap_private.h
>> index 76b7f98..a910be1 100644
>> --- a/drivers/s390/crypto/vfio_ap_private.h
>> +++ b/drivers/s390/crypto/vfio_ap_private.h
>> @@ -16,6 +16,7 @@
>>   #include <linux/mdev.h>
>>   #include <linux/delay.h>
>>   #include <linux/mutex.h>
>> +#include <linux/kvm_host.h>
>>   #include "ap_bus.h"
>> @@ -81,6 +82,7 @@ struct ap_matrix_mdev {
>>       struct ap_matrix matrix;
>>       struct notifier_block group_notifier;
>>       struct kvm *kvm;
>> +    struct kvm_s390_module_hook pqap_hook;
>
> I don't understand the purpose of adding this field. We set up the
> the kvm->arch.crypto.pqap_hook in the vfio_ap_mdev_set_kvm which is
> also in this same file, why not just use a static struct
> kvm_s390_module_hook and reuse it when setting up
> kvm->arch.crypto.pqap_hook? It saves you from initializing it every
> time an ap_matrix_mdev is created.

Having this field embedded in the matrix_mdev makes it easy to retrieve
the matrix_mdev from the hook.
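
As an illustration only, here is a userspace sketch of that retrieval. The container_of() macro is a simplified stand-in for the kernel one, the structures are reduced models of those in the patch, and hook_to_mdev() and the id field are hypothetical names used for the demo.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Reduced model of struct kvm_s390_module_hook. */
struct module_hook {
	int (*hook)(void *vcpu);
	void *data;
};

/* Reduced model of struct ap_matrix_mdev; most fields are omitted. */
struct matrix_mdev {
	int id;				/* hypothetical, for the demo only */
	struct module_hook pqap_hook;	/* embedded, not a pointer */
};

/* Recover the owning matrix_mdev from a pointer to its embedded hook. */
struct matrix_mdev *hook_to_mdev(struct module_hook *h)
{
	assert(h != NULL);
	return container_of(h, struct matrix_mdev, pqap_hook);
}
```

With a single static hook shared by all mediated devices this pointer arithmetic would have nothing to anchor on, and the owner would have to be looked up some other way.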

Thanks,
Pierre




--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany


2019-03-29 09:01:04

by Pierre Morel

Subject: Re: [PATCH v6 3/7] s390: ap: setup relation betwen KVM and mediated device

On 28/03/2019 18:25, Tony Krowiak wrote:
> On 3/28/19 12:27 PM, Pierre Morel wrote:
>> On 28/03/2019 17:12, Tony Krowiak wrote:
>>> On 3/22/19 10:43 AM, Pierre Morel wrote:
>>>> When the mediated device is open we setup the relation with KVM
>>>> unset it
>>>> when the mediated device is released.
>>>
>>> s/open we setup/open, we set up/
>>> s/with KVM unset/with KVM and unset/
>>>
>>>>
>>>> We lock the matrix mediated device to avoid any change until the
>>>> open is done.
>>>> We make sure that KVM is present when opening the mediated device
>>>> otherwise we return an error.
>>>
>>> s/mediated device/mediated device,/
>>>
>>>>
>>>> Increase kvm's refcount to ensure the KVM structures are still
>>>> available
>>>> during the use of the mediated device by the guest.
>>>>
>>>> Signed-off-by: Pierre Morel <[email protected]>
>>>> ---
>>>>   drivers/s390/crypto/vfio_ap_ops.c | 143
>>>> +++++++++++++++++++++-----------------
>>>>   1 file changed, 79 insertions(+), 64 deletions(-)
>>>>
>>>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c
>>>> b/drivers/s390/crypto/vfio_ap_ops.c
>>>> index 77f7bac..bdb36e0 100644
>>>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>>>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>>>> @@ -787,74 +787,24 @@ static const struct attribute_group
>>>> *vfio_ap_mdev_attr_groups[] = {
>>>>       NULL
>>>>   };
>>>> -/**
>>>> - * vfio_ap_mdev_set_kvm
>>>> - *
>>>> - * @matrix_mdev: a mediated matrix device
>>>> - * @kvm: reference to KVM instance
>>>> - *
>>>> - * Verifies no other mediated matrix device has @kvm and sets a
>>>> reference to
>>>> - * it in @matrix_mdev->kvm.
>>>> - *
>>>> - * Return 0 if no other mediated matrix device has a reference to
>>>> @kvm;
>>>> - * otherwise, returns an -EPERM.
>>>> - */
>>>> -static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
>>>> -                struct kvm *kvm)
>>>> -{
>>>> -    struct ap_matrix_mdev *m;
>>>> -
>>>> -    mutex_lock(&matrix_dev->lock);
>>>> -
>>>> -    list_for_each_entry(m, &matrix_dev->mdev_list, node) {
>>>> -        if ((m != matrix_mdev) && (m->kvm == kvm)) {
>>>> -            mutex_unlock(&matrix_dev->lock);
>>>> -            return -EPERM;
>>>> -        }
>>>> -    }
>>>> -
>>>> -    matrix_mdev->kvm = kvm;
>>>> -    mutex_unlock(&matrix_dev->lock);
>>>> -
>>>> -    return 0;
>>>> -}
>>>> -
>>>>   static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
>>>>                          unsigned long action, void *data)
>>>>   {
>>>> -    int ret;
>>>>       struct ap_matrix_mdev *matrix_mdev;
>>>>       if (action != VFIO_GROUP_NOTIFY_SET_KVM)
>>>>           return NOTIFY_OK;
>>>>       matrix_mdev = container_of(nb, struct ap_matrix_mdev,
>>>> group_notifier);
>>>> -
>>>> -    if (!data) {
>>>> -        matrix_mdev->kvm = NULL;
>>>> -        return NOTIFY_OK;
>>>> -    }
>>>> -
>>>> -    ret = vfio_ap_mdev_set_kvm(matrix_mdev, data);
>>>> -    if (ret)
>>>> -        return NOTIFY_DONE;
>>>> -
>>>> -    /* If there is no CRYCB pointer, then we can't copy the masks */
>>>> -    if (!matrix_mdev->kvm->arch.crypto.crycbd)
>>>> -        return NOTIFY_DONE;
>>>> -
>>>> -    kvm_arch_crypto_set_masks(matrix_mdev->kvm,
>>>> matrix_mdev->matrix.apm,
>>>> -                  matrix_mdev->matrix.aqm,
>>>> -                  matrix_mdev->matrix.adm);
>>>> +    matrix_mdev->kvm = data;
>>>>       return NOTIFY_OK;
>>>>   }
>>>> -static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
>>>> +static int vfio_ap_mdev_reset_queues(struct ap_matrix_mdev
>>>> *matrix_mdev)
>>>>   {
>>>>       int ret;
>>>>       int rc = 0;
>>>> -    struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>>>>       struct vfio_ap_queue *q;
>>>>       list_for_each_entry(q, &matrix_mdev->qlist, list) {
>>>> @@ -871,41 +821,106 @@ static int vfio_ap_mdev_reset_queues(struct
>>>> mdev_device *mdev)
>>>>       return rc;
>>>>   }
>>>> +/**
>>>> + * vfio_ap_mdev_set_kvm
>>>> + *
>>>> + * @matrix_mdev: a mediated matrix device
>>>> + *
>>>> + * - Verifies that the hook is free and install the PQAP hook
>>>> + * - Copy the matrix masks inside the CRYCB
>>>> + * - Increment the KVM rerference count
>>>> + *
>>>> + * Return 0 if no other mediated matrix device has a reference to
>>>> @kvm;
>>>> + * otherwise, returns an -EPERM.
>>>> + */
>>>> +static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev)
>>>> +{
>>>> +    if (matrix_mdev->kvm->arch.crypto.pqap_hook)
>>>> +        return -EPERM;
>>>
>>> How would this happen; in other words, why are we checking this?
>>
>> I check this to verify that no other AP mediated device is already in
>> use by this VM.
>
> Maybe you should insert a comment to that effect.

Please note that there is already a comment on this in the description
of the function.

Regards,
Pierre



--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany


2019-03-29 09:32:10

by Pierre Morel

Subject: Re: [PATCH v6 4/7] vfio: ap: register IOMMU VFIO notifier

On 28/03/2019 21:46, Tony Krowiak wrote:
> On 3/22/19 10:43 AM, Pierre Morel wrote:
>> To be able to use the VFIO interface to facilitate the
>> mediated device memory pinning/unpinning we need to register
>> a notifier for IOMMU.
>>
>> While we will start to pin one guest page for the interrupt indicator
>> byte, this is still ok with ballooning as this page will never be
>> used by the guest virtio-balloon driver.
>> So the pinned page will never be freed. And even a broken guest does
>> so, that would not impact the host as the original page is still
>> in control by vfio.
>
> I apologize, but I do not understand what you are saying in the second
> sentence of the paragraph above. Why will the pinned page never be
> freed?

Because it is in use by the guest's kernel as the notification indicator
byte (NIB) for the original PQAP AQIC.

> I understand that the pinned page is under the control of vfio
> until it is freed, but have no idea what you mean by "and even a broken
> guest does so"? A broken guest does what? Can you please reword this so
> it makes more sense?

A broken guest could free the page used for the NIB, which is obviously
wrong.

>
>>
>> Signed-off-by: Pierre Morel <[email protected]>
>> Reviewed-by: Cornelia Huck <[email protected]>
>> ---
>>   drivers/s390/crypto/vfio_ap_ops.c     | 38
>> +++++++++++++++++++++++++++++++++++
>>   drivers/s390/crypto/vfio_ap_private.h |  2 ++
>>   2 files changed, 40 insertions(+)
>>
>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c
>> b/drivers/s390/crypto/vfio_ap_ops.c
>> index bdb36e0..3478499 100644
>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -787,6 +787,35 @@ static const struct attribute_group
>> *vfio_ap_mdev_attr_groups[] = {
>>       NULL
>>   };
>> +/**
>> + * vfio_ap_mdev_iommu_notifier: IOMMU notifier callback
>> + *
>> + * @nb: The notifier block
>> + * @action: Action to be taken
>> + * @data: data associated with the request
>> + *
>> + * For an UNMAP request, unpin the guest IOVA (the NIB guest address we
>> + * pinned before). Other requests are ignored.
>> + *
>> + */
>> +static int vfio_ap_mdev_iommu_notifier(struct notifier_block *nb,
>> +                       unsigned long action, void *data)
>> +{
>> +    struct ap_matrix_mdev *matrix_mdev;
>> +
>> +    matrix_mdev = container_of(nb, struct ap_matrix_mdev,
>> iommu_notifier);
>> +
>
> I don't understand why we registered this notifier. I may be wrong, but
> AFAIU, this notifier will be invoked only when the VFIO_IOMMU_UNMAP_DMA
> ioctl is called from userspace. I did an experiment and inserted some
> printf's to see if this ever gets called and verified it does not. Maybe
> you have a good reason of which I'm not aware. Can you enlighten me
> here?

Pinning a page through vfio_iommu_type1 requires a registered notifier.

Regards,
Pierre

--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany


2019-03-29 13:03:55

by Anthony Krowiak

Subject: Re: [PATCH v6 1/7] s390: ap: kvm: add PQAP interception for AQIC

On 3/29/19 4:52 AM, Pierre Morel wrote:
> On 28/03/2019 17:12, Tony Krowiak wrote:
>> On 3/22/19 10:43 AM, Pierre Morel wrote:
>>> We prepare the interception of the PQAP/AQIC instruction for
>>> the case the AQIC facility is enabled in the guest.
>>>
>>> First of all we do not want to change existing behavior when
>>> intercepting AP instructions without the SIE allowing the guest
>>> to use AP instructions.
>>>
>>> In this patch we only handle the AQIC interception allowed by
>>> facility 65 which will be enabled when the complete interception
>>> infrastructure will be present.
>>>
>>> We add a callback inside the KVM arch structure for s390 for
>>> a VFIO driver to handle a specific response to the PQAP
>>> instruction with the AQIC command and only this command.
>>>
>>> But we want to be able to return a correct answer to the guest
>>> even there is no VFIO AP driver in the kernel.
>>> Therefor, we inject the correct exceptions from inside KVM for the
>>> case the callback is not initialized, which happens when the vfio_ap
>>> driver is not loaded.
>>>
>>> We do consider the responsability of the driver to always initialize
>>> the PQAP callback if it defines queues by initializing the CRYCB for
>>> a guest.
>>> If the callback has been setup we call it.
>>> If not we setup an answer considering that no queue is available
>>> for the guest when no callback has been setup.
>>>
>>> Signed-off-by: Pierre Morel <[email protected]>
>>> ---
>>>   arch/s390/include/asm/kvm_host.h      |  8 ++++
>>>   arch/s390/kvm/priv.c                  | 90
>>> +++++++++++++++++++++++++++++++++++
>>>   drivers/s390/crypto/vfio_ap_private.h |  2 +
>>>   3 files changed, 100 insertions(+)
>>>
>>> diff --git a/arch/s390/include/asm/kvm_host.h
>>> b/arch/s390/include/asm/kvm_host.h
>>> index a496276..624460b 100644
>>> --- a/arch/s390/include/asm/kvm_host.h
>>> +++ b/arch/s390/include/asm/kvm_host.h
>>> @@ -18,6 +18,7 @@
>>>   #include <linux/kvm_host.h>
>>>   #include <linux/kvm.h>
>>>   #include <linux/seqlock.h>
>>> +#include <linux/module.h>
>>>   #include <asm/debug.h>
>>>   #include <asm/cpu.h>
>>>   #include <asm/fpu/api.h>
>>> @@ -721,8 +722,15 @@ struct kvm_s390_cpu_model {
>>>       unsigned short ibc;
>>>   };
>>> +struct kvm_s390_module_hook {
>>> +    int (*hook)(struct kvm_vcpu *vcpu);
>>> +    void *data;
>>> +    struct module *owner;
>>> +};
>>> +
>>>   struct kvm_s390_crypto {
>>>       struct kvm_s390_crypto_cb *crycb;
>>> +    struct kvm_s390_module_hook *pqap_hook;
>>>       __u32 crycbd;
>>>       __u8 aes_kw;
>>>       __u8 dea_kw;
>>> diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
>>> index 8679bd7..793e48a 100644
>>> --- a/arch/s390/kvm/priv.c
>>> +++ b/arch/s390/kvm/priv.c
>>> @@ -27,6 +27,7 @@
>>>   #include <asm/io.h>
>>>   #include <asm/ptrace.h>
>>>   #include <asm/sclp.h>
>>> +#include <asm/ap.h>
>>>   #include "gaccess.h"
>>>   #include "kvm-s390.h"
>>>   #include "trace.h"
>>> @@ -592,6 +593,93 @@ static int handle_io_inst(struct kvm_vcpu *vcpu)
>>>       }
>>>   }
>>> +/*
>>> + * handle_pqap: Handling pqap interception
>>> + * @vcpu: the vcpu having issue the pqap instruction
>>> + *
>>> + * We now support PQAP/AQIC instructions and we need to correctly
>>> + * answer the guest even if no dedicated driver's hook is available.
>>> + *
>>> + * The intercepting code calls a dedicated callback for this
>>> instruction
>>> + * if a driver did register one in the CRYPTO satellite of the
>>> + * SIE block.
>>> + *
>>> + * For PQAP AQIC and TAPQ instructions, verify privilege and
>>> specifications.
>>> + *
>>> + * If no callback available, the queues are not available, return
>>> this to
>>> + * the caller.
>>> + * Else return the value returned by the callback.
>>> + */
>>> +static int handle_pqap(struct kvm_vcpu *vcpu)
>>> +{
>>> +    struct ap_queue_status status = {};
>>> +    unsigned long reg0;
>>> +    int ret;
>>> +    uint8_t fc;
>>> +
>>> +    /* Verify that the AP instruction are available */
>>> +    if (!ap_instructions_available())
>>> +        return -EOPNOTSUPP;
>>> +    /* Verify that the guest is allowed to use AP instructions */
>>> +    if (!(vcpu->arch.sie_block->eca & ECA_APIE))
>>> +        return -EOPNOTSUPP;
>>> +    /*
>>> +     * The only possibly intercepted instructions when AP
>>> instructions are
>>> +     * available for the guest are AQIC and TAPQ with the t bit set
>>> +     * since we do not set IC.3 (FIII) we currently will not intercept
>>> +     * TAPQ.
>>> +     * The following code will only treat AQIC function code.
>>> +     */
>>> +    reg0 = vcpu->run->s.regs.gprs[0];
>>> +    fc = reg0 >> 24;
>>> +    if (fc != 0x03) {
>>> +        pr_warn("%s: Unexpected interception code 0x%02x\n",
>>> +            __func__, fc);
>>> +        return -EOPNOTSUPP;
>>> +    }
>>> +    /* All PQAP instructions are allowed for guest kernel only */
>>> +    if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE)
>>> +        return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP);
>>> +    /*
>>> +     * Common tests for PQAP instructions to generate a specification
>>> +     * exception
>>> +     */
>>> +    /* Zero bits overwrite produce a specification exception */
>>> +    if (reg0 & 0x007f0000UL)
>>> +        goto specification_except;
>>> +    /* If APXA is not installed APQN is limited */
>>> +    if (!(vcpu->kvm->arch.crypto.crycbd & 0x02))
>>> +        if (reg0 & 0x000030f0UL)
>>> +            goto specification_except;
>>> +    /* AQIC needs facility 65 */
>>> +    if (!test_kvm_facility(vcpu->kvm, 65))
>>> +        goto specification_except;
>>> +
>>> +    /*
>>> +     * Verify that the hook callback is registered, lock the owner
>>> +     * and call the hook.
>>> +     */
>>> +    if (vcpu->kvm->arch.crypto.pqap_hook) {
>>> +        if (!try_module_get(vcpu->kvm->arch.crypto.pqap_hook->owner))
>>> +            return -EOPNOTSUPP;
>>> +        ret = vcpu->kvm->arch.crypto.pqap_hook->hook(vcpu);
>>> +        module_put(vcpu->kvm->arch.crypto.pqap_hook->owner);
>>> +        return ret;
>>> +    }
>>> +    /*
>>> +     * It is the duty of the vfio_driver to register a hook
>>> +     * If it does not and we get an exception on AQIC we must
>>> +     * guess that there is no vfio_ap_driver at all and no one
>>> +     * to handle the guests's CRYCB and the CRYCB is empty.
>>> +     */
>>> +    status.response_code = 0x01;
>>> +    memcpy(&vcpu->run->s.regs.gprs[1], &status, sizeof(status));
>>> +    return 0;
>>> +
>>> +specification_except:
>>> +    return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
>>> +}
>>> +
>>>   static int handle_stfl(struct kvm_vcpu *vcpu)
>>>   {
>>>       int rc;
>>> @@ -878,6 +966,8 @@ int kvm_s390_handle_b2(struct kvm_vcpu *vcpu)
>>>           return handle_sthyi(vcpu);
>>>       case 0x7d:
>>>           return handle_stsi(vcpu);
>>> +    case 0xaf:
>>> +        return handle_pqap(vcpu);
>>>       case 0xb1:
>>>           return handle_stfl(vcpu);
>>>       case 0xb2:
>>> diff --git a/drivers/s390/crypto/vfio_ap_private.h
>>> b/drivers/s390/crypto/vfio_ap_private.h
>>> index 76b7f98..a910be1 100644
>>> --- a/drivers/s390/crypto/vfio_ap_private.h
>>> +++ b/drivers/s390/crypto/vfio_ap_private.h
>>> @@ -16,6 +16,7 @@
>>>   #include <linux/mdev.h>
>>>   #include <linux/delay.h>
>>>   #include <linux/mutex.h>
>>> +#include <linux/kvm_host.h>
>>>   #include "ap_bus.h"
>>> @@ -81,6 +82,7 @@ struct ap_matrix_mdev {
>>>       struct ap_matrix matrix;
>>>       struct notifier_block group_notifier;
>>>       struct kvm *kvm;
>>> +    struct kvm_s390_module_hook pqap_hook;
>>
>> I don't understand the purpose of adding this field. We set up
>> kvm->arch.crypto.pqap_hook in vfio_ap_mdev_set_kvm(), which is
>> also in this same file, so why not just use a static struct
>> kvm_s390_module_hook and reuse it when setting up
>> kvm->arch.crypto.pqap_hook? It saves you from initializing it every
>> time an ap_matrix_mdev is created.
>
> Having this field embedded in the matrix_mdev makes it easy to retrieve
> the matrix_mdev from the hook.

The only place you do this is in the handle_pqap hook. The reason you
get the matrix_mdev there is to get the vfio_ap_queue object from the
matrix_mdev qlist. You could just as well have gotten the vfio_ap_queue
using the vfio_ap_find_queue() function.
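
For readers following the argument, the retrieval Pierre describes is the usual container_of() idiom: given a pointer to an embedded member, subtract the member offset to reach the enclosing structure. The following is a minimal userspace sketch, not the driver code. The struct names ending in _model and the helper mdev_from_hook() are hypothetical stand-ins, and the macro is a simplified version of the kernel's container_of().

```c
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, reduced model of a module hook (illustration only). */
struct module_hook_model {
	int (*hook)(void *vcpu);
	void *owner;
};

/* Hypothetical, reduced model of a matrix mediated device. */
struct matrix_mdev_model {
	int id;                             /* illustrative payload */
	struct module_hook_model pqap_hook; /* embedded, not a pointer */
};

/* Recover the enclosing device from a pointer to its embedded hook. */
struct matrix_mdev_model *mdev_from_hook(struct module_hook_model *h)
{
	return container_of(h, struct matrix_mdev_model, pqap_hook);
}
```

This only works because the hook is embedded per device. With a single static hook shared by all devices, as suggested in the review, there is no enclosing structure to recover, and the queue would have to be looked up by APQN instead.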

>
> Thanks,
> Pierre
>
>
>
>


2019-03-29 13:07:08

by Anthony Krowiak

Subject: Re: [PATCH v6 3/7] s390: ap: setup relation between KVM and mediated device

On 3/29/19 4:58 AM, Pierre Morel wrote:
> On 28/03/2019 18:25, Tony Krowiak wrote:
>> On 3/28/19 12:27 PM, Pierre Morel wrote:
>>> On 28/03/2019 17:12, Tony Krowiak wrote:
>>>> On 3/22/19 10:43 AM, Pierre Morel wrote:
>>>>> When the mediated device is open we setup the relation with KVM
>>>>> unset it
>>>>> when the mediated device is released.
>>>>
>>>> s/open we setup/open, we set up/
>>>> s/with KVM unset/with KVM and unset/
>>>>
>>>>>
>>>>> We lock the matrix mediated device to avoid any change until the
>>>>> open is done.
>>>>> We make sure that KVM is present when opening the mediated device
>>>>> otherwise we return an error.
>>>>
>>>> s/mediated device/mediated device,/
>>>>
>>>>>
>>>>> Increase kvm's refcount to ensure the KVM structures are still
>>>>> available
>>>>> during the use of the mediated device by the guest.
>>>>>
>>>>> Signed-off-by: Pierre Morel <[email protected]>
>>>>> ---
>>>>>   drivers/s390/crypto/vfio_ap_ops.c | 143
>>>>> +++++++++++++++++++++-----------------
>>>>>   1 file changed, 79 insertions(+), 64 deletions(-)
>>>>>
>>>>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c
>>>>> b/drivers/s390/crypto/vfio_ap_ops.c
>>>>> index 77f7bac..bdb36e0 100644
>>>>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>>>>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>>>>> @@ -787,74 +787,24 @@ static const struct attribute_group
>>>>> *vfio_ap_mdev_attr_groups[] = {
>>>>>       NULL
>>>>>   };
>>>>> -/**
>>>>> - * vfio_ap_mdev_set_kvm
>>>>> - *
>>>>> - * @matrix_mdev: a mediated matrix device
>>>>> - * @kvm: reference to KVM instance
>>>>> - *
>>>>> - * Verifies no other mediated matrix device has @kvm and sets a
>>>>> reference to
>>>>> - * it in @matrix_mdev->kvm.
>>>>> - *
>>>>> - * Return 0 if no other mediated matrix device has a reference to
>>>>> @kvm;
>>>>> - * otherwise, returns an -EPERM.
>>>>> - */
>>>>> -static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
>>>>> -                struct kvm *kvm)
>>>>> -{
>>>>> -    struct ap_matrix_mdev *m;
>>>>> -
>>>>> -    mutex_lock(&matrix_dev->lock);
>>>>> -
>>>>> -    list_for_each_entry(m, &matrix_dev->mdev_list, node) {
>>>>> -        if ((m != matrix_mdev) && (m->kvm == kvm)) {
>>>>> -            mutex_unlock(&matrix_dev->lock);
>>>>> -            return -EPERM;
>>>>> -        }
>>>>> -    }
>>>>> -
>>>>> -    matrix_mdev->kvm = kvm;
>>>>> -    mutex_unlock(&matrix_dev->lock);
>>>>> -
>>>>> -    return 0;
>>>>> -}
>>>>> -
>>>>>   static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
>>>>>                          unsigned long action, void *data)
>>>>>   {
>>>>> -    int ret;
>>>>>       struct ap_matrix_mdev *matrix_mdev;
>>>>>       if (action != VFIO_GROUP_NOTIFY_SET_KVM)
>>>>>           return NOTIFY_OK;
>>>>>       matrix_mdev = container_of(nb, struct ap_matrix_mdev,
>>>>> group_notifier);
>>>>> -
>>>>> -    if (!data) {
>>>>> -        matrix_mdev->kvm = NULL;
>>>>> -        return NOTIFY_OK;
>>>>> -    }
>>>>> -
>>>>> -    ret = vfio_ap_mdev_set_kvm(matrix_mdev, data);
>>>>> -    if (ret)
>>>>> -        return NOTIFY_DONE;
>>>>> -
>>>>> -    /* If there is no CRYCB pointer, then we can't copy the masks */
>>>>> -    if (!matrix_mdev->kvm->arch.crypto.crycbd)
>>>>> -        return NOTIFY_DONE;
>>>>> -
>>>>> -    kvm_arch_crypto_set_masks(matrix_mdev->kvm,
>>>>> matrix_mdev->matrix.apm,
>>>>> -                  matrix_mdev->matrix.aqm,
>>>>> -                  matrix_mdev->matrix.adm);
>>>>> +    matrix_mdev->kvm = data;
>>>>>       return NOTIFY_OK;
>>>>>   }
>>>>> -static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
>>>>> +static int vfio_ap_mdev_reset_queues(struct ap_matrix_mdev
>>>>> *matrix_mdev)
>>>>>   {
>>>>>       int ret;
>>>>>       int rc = 0;
>>>>> -    struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>>>>>       struct vfio_ap_queue *q;
>>>>>       list_for_each_entry(q, &matrix_mdev->qlist, list) {
>>>>> @@ -871,41 +821,106 @@ static int vfio_ap_mdev_reset_queues(struct
>>>>> mdev_device *mdev)
>>>>>       return rc;
>>>>>   }
>>>>> +/**
>>>>> + * vfio_ap_mdev_set_kvm
>>>>> + *
>>>>> + * @matrix_mdev: a mediated matrix device
>>>>> + *
>>>>> + * - Verifies that the hook is free and install the PQAP hook
>>>>> + * - Copy the matrix masks inside the CRYCB
>>>>> + * - Increment the KVM rerference count
>>>>> + *
>>>>> + * Return 0 if no other mediated matrix device has a reference to
>>>>> @kvm;
>>>>> + * otherwise, returns an -EPERM.
>>>>> + */
>>>>> +static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev)
>>>>> +{
>>>>> +    if (matrix_mdev->kvm->arch.crypto.pqap_hook)
>>>>> +        return -EPERM;
>>>>
>>>> How would this happen; in other words, why are we checking this?
>>>
>>> I check this to verify that no other AP mediated device is already in
>>> use by this VM.
>>
>> Maybe you should insert a comment to that effect.
>
> Please notice that there is already a comment on this in the description
> of the function.

True, but that comment merely states that the function verifies the
hook is free, not the reason why that particular check is done. When
I reviewed the code and saw this check, I wondered why it was necessary.
The comment you have would not have helped in this regard, so maybe
you need to update your comment.
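
As an illustration of the kind of comment being asked for, here is a minimal userspace sketch of the check with the reason stated next to it. It is an assumption-laden model, not the patch: the *_model types and mdev_set_kvm_model() are hypothetical, and the rationale in the comment is taken from Pierre's explanation earlier in this thread.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, reduced models of the structures discussed above. */
struct hook_model {
	int (*hook)(void *vcpu);
};

struct kvm_model {
	struct hook_model *pqap_hook; /* NULL while no device owns the guest */
};

struct mdev_model {
	struct kvm_model *kvm;
	struct hook_model pqap_hook;
};

int mdev_set_kvm_model(struct mdev_model *m)
{
	/*
	 * A guest may use only one AP mediated device. An already
	 * installed PQAP hook means another mediated device is in use
	 * by this VM, so refuse to attach a second one.
	 */
	if (m->kvm->pqap_hook)
		return -EPERM;

	m->kvm->pqap_hook = &m->pqap_hook;
	return 0;
}
```

The point of the sketch is the placement of the explanation: the "why" sits at the check itself rather than only in the function header.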

>
> Regards,
> Pierre
>
>
>


2019-03-29 13:24:29

by Anthony Krowiak

Subject: Re: [PATCH v6 4/7] vfio: ap: register IOMMU VFIO notifier

On 3/29/19 5:31 AM, Pierre Morel wrote:
> On 28/03/2019 21:46, Tony Krowiak wrote:
>> On 3/22/19 10:43 AM, Pierre Morel wrote:
>>> To be able to use the VFIO interface to facilitate the
>>> mediated device memory pinning/unpinning we need to register
>>> a notifier for IOMMU.
>>>
>>> While we will start to pin one guest page for the interrupt indicator
>>> byte, this is still ok with ballooning as this page will never be
>>> used by the guest virtio-balloon driver.
>>> So the pinned page will never be freed. And even a broken guest does
>>> so, that would not impact the host as the original page is still
>>> in control by vfio.
>>
>> I apologize, but I do not understand what you are saying in the second
>> sentence of the paragraph above. Why will the pinned page never be freed?
> Because it is in use by the guest's kernel as a notification information
> byte for the original PQAP AQIC.

Your comment says the pinned page will never be freed. Doesn't it get
freed when the guest is terminated?

>
>  I understand that the pinned page is under the control of vfio
>> until it is freed, but have no idea what you mean by "and even a broken
>> guest does so"? A broken guest does what? Can you please reword this so
>> it makes more sense?
>
> A broken guest could free the page used for the NIB, which is obviously
> wrong.

Then why not simply say that a pinned page is under the control of the
vfio driver, so if a broken (malicious?) guest frees the page, it will
not impact the host, or something to that effect?

>
>>
>>>
>>> Signed-off-by: Pierre Morel <[email protected]>
>>> Reviewed-by: Cornelia Huck <[email protected]>
>>> ---
>>>   drivers/s390/crypto/vfio_ap_ops.c     | 38
>>> +++++++++++++++++++++++++++++++++++
>>>   drivers/s390/crypto/vfio_ap_private.h |  2 ++
>>>   2 files changed, 40 insertions(+)
>>>
>>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c
>>> b/drivers/s390/crypto/vfio_ap_ops.c
>>> index bdb36e0..3478499 100644
>>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>>> @@ -787,6 +787,35 @@ static const struct attribute_group
>>> *vfio_ap_mdev_attr_groups[] = {
>>>       NULL
>>>   };
>>> +/**
>>> + * vfio_ap_mdev_iommu_notifier: IOMMU notifier callback
>>> + *
>>> + * @nb: The notifier block
>>> + * @action: Action to be taken
>>> + * @data: data associated with the request
>>> + *
>>> + * For an UNMAP request, unpin the guest IOVA (the NIB guest address we
>>> + * pinned before). Other requests are ignored.
>>> + *
>>> + */
>>> +static int vfio_ap_mdev_iommu_notifier(struct notifier_block *nb,
>>> +                       unsigned long action, void *data)
>>> +{
>>> +    struct ap_matrix_mdev *matrix_mdev;
>>> +
>>> +    matrix_mdev = container_of(nb, struct ap_matrix_mdev,
>>> iommu_notifier);
>>> +
>>
>> I don't understand why we registered this notifier. I may be wrong, but
>> AFAIU, this notifier will be invoked only when the VFIO_IOMMU_UNMAP_DMA
>> ioctl is called from userspace. I did an experiment and inserted some
>> printf's to see if this ever gets called and verified it does not. Maybe
>> you have a good reason of which I'm not aware. Can you enlighten me
>> here?
>
> The vfio_iommu_type1 pin page requires a notifier.
>
> Regards,
> Pierre
>


2019-04-02 13:32:27

by Pierre Morel

Subject: Re: [PATCH v6 2/7] s390: ap: new vfio_ap_queue structure


Hi all,

I will abandon this single patch of the series, so there is no need to
comment on it anymore.
Even though I liked using lists and thought it could make things easier
for hotplug later, it is not the right time for this change, if ever.

Sorry for this change of plan.
I will send a new v7 without this patch soon.

Regards,
Pierre
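
For context on what is being dropped, the list handling in the patch quoted below stages queues on a local list and either commits them to the mediated device or rolls them back to the free list. Here is a minimal userspace sketch of that stage-then-commit pattern. Everything in it is hypothetical (the list helpers, queue_model, claim_queues), and it simplifies the patch: it searches only the free list, so it cannot distinguish "not bound" from "already in use".

```c
#include <errno.h>
#include <stddef.h>

/* Tiny circular doubly linked list, standing in for the kernel's list_head. */
struct list_node {
	struct list_node *prev, *next;
};

void list_init(struct list_node *h)
{
	h->prev = h->next = h;
}

/* Unlink n from wherever it is and insert it right after head. */
void list_move_to(struct list_node *n, struct list_node *head)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

int list_count(const struct list_node *head)
{
	int c = 0;

	for (const struct list_node *n = head->next; n != head; n = n->next)
		c++;
	return c;
}

/* Hypothetical per-queue record; list must stay the first member. */
struct queue_model {
	struct list_node list;
	int apqn;
	void *owner; /* NULL while the queue is on the free list */
};

struct queue_model *find_queue(struct list_node *head, int apqn)
{
	for (struct list_node *n = head->next; n != head; n = n->next) {
		struct queue_model *q = (struct queue_model *)n;

		if (q->apqn == apqn)
			return q;
	}
	return NULL;
}

/* Move every entry of src to dst and record the new owner. */
void move_and_set(struct list_node *src, struct list_node *dst, void *owner)
{
	while (src->next != src) {
		struct queue_model *q = (struct queue_model *)src->next;

		list_move_to(&q->list, dst);
		q->owner = owner;
	}
}

/* Claim all requested queues for owner, or none of them. */
int claim_queues(struct list_node *free_list, struct list_node *mdev_list,
		 void *owner, const int *apqns, int n)
{
	struct list_node staged;

	list_init(&staged);
	for (int i = 0; i < n; i++) {
		struct queue_model *q = find_queue(free_list, apqns[i]);

		if (!q) {
			/* Roll back: return staged queues to the free list. */
			move_and_set(&staged, free_list, NULL);
			return -EADDRNOTAVAIL;
		}
		list_move_to(&q->list, &staged);
	}
	move_and_set(&staged, mdev_list, owner);
	return 0;
}
```

In the driver all of this would run under the matrix_dev lock. The sketch leaves locking out to keep the ownership transitions visible.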



On 22/03/2019 15:43, Pierre Morel wrote:
> The AP interruptions are assigned on a per-queue basis and
> the GISA structure is handled on a per-VM basis, so
> we need to add a structure that we can retrieve from both sides,
> holding the information we need to handle PQAP/AQIC interception
> and set up the GISA.
>
> Since we can not add more information to the ap_device
> we add a new structure vfio_ap_queue, to hold per queue
> information useful to handle interruptions and set it as
> driver's data of the standard ap_queue device.
>
> Usually, the device and the mediated device are linked together
> but in the vfio_ap driver design we have a bunch of "sub" devices
> (the ap_queue devices) belonging to the mediated device.
>
> Linking each of these structures to the mediated device it is assigned
> to, by means of the vfio_ap_queue structure, will help us
> retrieve the AP devices associated with the mediated devices
> during the mediated device operations.
>
>  ------------    -------------
>  | AP queue |--> | AP_vfio_q |<----
>  ------------    ------^------    |    ---------------
>                        |          <--->| matrix_mdev |
>  ------------    ------v------    |    ---------------
>  | AP queue |--> | AP_vfio_q |-----
>  ------------    -------------
>
> The vfio_ap_queue device will hold the following entries:
> - apqn: AP queue number (defined here)
> - isc : Interrupt subclass (defined later)
> - nib : notification information byte (defined later)
> - list: a list_head entry used to link this structure to the
> matrix mediated device it is assigned to.
>
> The vfio_ap_queue structure is allocated when the vfio_ap_driver
> is probed and added as driver data to the ap_queue device.
> It is freed on remove.
>
> The structure is linked to the matrix_dev host device at the
> probe of the device building some kind of free list for the
> matrix mediated devices.
>
> When the vfio_ap_queue is associated with a matrix mediated device,
> during assign_adapter or assign_domain,
> it is linked to this matrix mediated device
> and unlinked when dissociated.
>
> Queuing the devices on a list of free devices and testing the
> matrix_mdev pointer to the associated matrix allow us to know
> whether the queue is bound to the matrix device and whether or not
> it is assigned to a mediated device.
>
> All operations on the free_list must be protected by the
> VFIO AP matrix_dev lock.
>
> Signed-off-by: Pierre Morel <[email protected]>
> ---
> drivers/s390/crypto/vfio_ap_drv.c | 31 ++-
> drivers/s390/crypto/vfio_ap_ops.c | 423 ++++++++++++++++++----------------
> drivers/s390/crypto/vfio_ap_private.h | 7 +
> 3 files changed, 266 insertions(+), 195 deletions(-)
>
> diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
> index e9824c3..df6f21a 100644
> --- a/drivers/s390/crypto/vfio_ap_drv.c
> +++ b/drivers/s390/crypto/vfio_ap_drv.c
> @@ -40,14 +40,42 @@ static struct ap_device_id ap_queue_ids[] = {
>
> MODULE_DEVICE_TABLE(vfio_ap, ap_queue_ids);
>
> +/**
> + * vfio_ap_queue_dev_probe:
> + *
> + * Allocate a vfio_ap_queue structure and associate it
> + * with the device as driver_data.
> + */
> static int vfio_ap_queue_dev_probe(struct ap_device *apdev)
> {
> + struct vfio_ap_queue *q;
> +
> + q = kzalloc(sizeof(*q), GFP_KERNEL);
> + if (!q)
> + return -ENOMEM;
> + dev_set_drvdata(&apdev->device, q);
> + q->apqn = to_ap_queue(&apdev->device)->qid;
> + INIT_LIST_HEAD(&q->list);
> + mutex_lock(&matrix_dev->lock);
> + list_add(&q->list, &matrix_dev->free_list);
> + mutex_unlock(&matrix_dev->lock);
> return 0;
> }
>
> +/**
> + * vfio_ap_queue_dev_remove:
> + *
> + * Free the associated vfio_ap_queue structure
> + */
> static void vfio_ap_queue_dev_remove(struct ap_device *apdev)
> {
> - /* Nothing to do yet */
> + struct vfio_ap_queue *q;
> +
> + q = dev_get_drvdata(&apdev->device);
> + mutex_lock(&matrix_dev->lock);
> + list_del(&q->list);
> + mutex_unlock(&matrix_dev->lock);
> + kfree(q);
> }
>
> static void vfio_ap_matrix_dev_release(struct device *dev)
> @@ -107,6 +135,7 @@ static int vfio_ap_matrix_dev_create(void)
> matrix_dev->device.bus = &matrix_bus;
> matrix_dev->device.release = vfio_ap_matrix_dev_release;
> matrix_dev->vfio_ap_drv = &vfio_ap_drv;
> + INIT_LIST_HEAD(&matrix_dev->free_list);
>
> ret = device_register(&matrix_dev->device);
> if (ret)
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 900b9cf..77f7bac 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -24,6 +24,68 @@
> #define VFIO_AP_MDEV_TYPE_HWVIRT "passthrough"
> #define VFIO_AP_MDEV_NAME_HWVIRT "VFIO AP Passthrough Device"
>
> +/**
> + * vfio_ap_get_queue: Retrieve a queue with a specific APQN from a list
> + * @apqn: The queue APQN
> + *
> + * Retrieve a queue with a specific APQN from the list of the
> + * devices associated with a list.
> + *
> + * Returns the pointer to the associated vfio_ap_queue
> + */
> +struct vfio_ap_queue *vfio_ap_get_queue(int apqn, struct list_head *l)
> +{
> + struct vfio_ap_queue *q;
> +
> + list_for_each_entry(q, l, list)
> + if (q->apqn == apqn)
> + return q;
> + return NULL;
> +}
> +
> +static int vfio_ap_find_any_card(int apid)
> +{
> + struct vfio_ap_queue *q;
> +
> + list_for_each_entry(q, &matrix_dev->free_list, list)
> + if (AP_QID_CARD(q->apqn) == apid)
> + return 1;
> + return 0;
> +}
> +
> +static int vfio_ap_find_any_domain(int apqi)
> +{
> + struct vfio_ap_queue *q;
> +
> + list_for_each_entry(q, &matrix_dev->free_list, list)
> + if (AP_QID_QUEUE(q->apqn) == apqi)
> + return 1;
> + return 0;
> +}
> +
> +static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q)
> +{
> + struct ap_queue_status status;
> + int retry = 1;
> +
> + do {
> + status = ap_zapq(q->apqn);
> + switch (status.response_code) {
> + case AP_RESPONSE_NORMAL:
> + return 0;
> + case AP_RESPONSE_RESET_IN_PROGRESS:
> + case AP_RESPONSE_BUSY:
> + msleep(20);
> + break;
> + default:
> + /* things are really broken, give up */
> + return -EIO;
> + }
> + } while (retry--);
> +
> + return -EBUSY;
> +}
> +
> static void vfio_ap_matrix_init(struct ap_config_info *info,
> struct ap_matrix *matrix)
> {
> @@ -45,6 +107,7 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
> return -ENOMEM;
> }
>
> + INIT_LIST_HEAD(&matrix_mdev->qlist);
> vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
> mdev_set_drvdata(mdev, matrix_mdev);
> mutex_lock(&matrix_dev->lock);
> @@ -113,162 +176,189 @@ static struct attribute_group *vfio_ap_mdev_type_groups[] = {
> NULL,
> };
>
> -struct vfio_ap_queue_reserved {
> - unsigned long *apid;
> - unsigned long *apqi;
> - bool reserved;
> -};
> +static void vfio_ap_free_queue(int apqn, struct ap_matrix_mdev *matrix_mdev)
> +{
> + struct vfio_ap_queue *q;
> +
> + q = vfio_ap_get_queue(apqn, &matrix_mdev->qlist);
> + if (!q)
> + return;
> + q->matrix_mdev = NULL;
> + vfio_ap_mdev_reset_queue(q);
> + list_move(&q->list, &matrix_dev->free_list);
> +}
>
> /**
> - * vfio_ap_has_queue
> - *
> - * @dev: an AP queue device
> - * @data: a struct vfio_ap_queue_reserved reference
> + * vfio_ap_put_all_domains:
> *
> - * Flags whether the AP queue device (@dev) has a queue ID containing the APQN,
> - * apid or apqi specified in @data:
> + * @matrix_mdev: the matrix mediated device for which we want to associate
> + * all available queues with a given apqi.
> + * @apid: The apid which associated with all defined APQI of the
> + * mediated device will define a AP queue.
> *
> - * - If @data contains both an apid and apqi value, then @data will be flagged
> - * as reserved if the APID and APQI fields for the AP queue device matches
> - *
> - * - If @data contains only an apid value, @data will be flagged as
> - * reserved if the APID field in the AP queue device matches
> - *
> - * - If @data contains only an apqi value, @data will be flagged as
> - * reserved if the APQI field in the AP queue device matches
> - *
> - * Returns 0 to indicate the input to function succeeded. Returns -EINVAL if
> - * @data does not contain either an apid or apqi.
> + * We remove the queue from the list of queues associated with the
> + * mediated device and put them back to the free list of the matrix
> + * device and clear the matrix_mdev pointer.
> */
> -static int vfio_ap_has_queue(struct device *dev, void *data)
> +static void vfio_ap_put_all_domains(struct ap_matrix_mdev *matrix_mdev,
> + int apid)
> {
> - struct vfio_ap_queue_reserved *qres = data;
> - struct ap_queue *ap_queue = to_ap_queue(dev);
> - ap_qid_t qid;
> - unsigned long id;
> + int apqi, apqn;
>
> - if (qres->apid && qres->apqi) {
> - qid = AP_MKQID(*qres->apid, *qres->apqi);
> - if (qid == ap_queue->qid)
> - qres->reserved = true;
> - } else if (qres->apid && !qres->apqi) {
> - id = AP_QID_CARD(ap_queue->qid);
> - if (id == *qres->apid)
> - qres->reserved = true;
> - } else if (!qres->apid && qres->apqi) {
> - id = AP_QID_QUEUE(ap_queue->qid);
> - if (id == *qres->apqi)
> - qres->reserved = true;
> - } else {
> - return -EINVAL;
> + for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, AP_DOMAINS) {
> + apqn = AP_MKQID(apid, apqi);
> + vfio_ap_free_queue(apqn, matrix_mdev);
> }
> -
> - return 0;
> }
>
> /**
> - * vfio_ap_verify_queue_reserved
> - *
> - * @matrix_dev: a mediated matrix device
> - * @apid: an AP adapter ID
> - * @apqi: an AP queue index
> + * vfio_ap_put_all_cards:
> *
> - * Verifies that the AP queue with @apid/@apqi is reserved by the VFIO AP device
> - * driver according to the following rules:
> + * @matrix_mdev: the matrix mediated device for which we want to associate
> + * all available queues with a given apqi.
> + * @apqi: The apqi which associated with all defined APID of the
> + * mediated device will define a AP queue.
> *
> - * - If both @apid and @apqi are not NULL, then there must be an AP queue
> - * device bound to the vfio_ap driver with the APQN identified by @apid and
> - * @apqi
> - *
> - * - If only @apid is not NULL, then there must be an AP queue device bound
> - * to the vfio_ap driver with an APQN containing @apid
> - *
> - * - If only @apqi is not NULL, then there must be an AP queue device bound
> - * to the vfio_ap driver with an APQN containing @apqi
> - *
> - * Returns 0 if the AP queue is reserved; otherwise, returns -EADDRNOTAVAIL.
> + * We remove the queue from the list of queues associated with the
> + * mediated device and put them back to the free list of the matrix
> + * device and clear the matrix_mdev pointer.
> */
> -static int vfio_ap_verify_queue_reserved(unsigned long *apid,
> - unsigned long *apqi)
> +static void vfio_ap_put_all_cards(struct ap_matrix_mdev *matrix_mdev, int apqi)
> {
> - int ret;
> - struct vfio_ap_queue_reserved qres;
> + int apid, apqn;
>
> - qres.apid = apid;
> - qres.apqi = apqi;
> - qres.reserved = false;
> -
> - ret = driver_for_each_device(&matrix_dev->vfio_ap_drv->driver, NULL,
> - &qres, vfio_ap_has_queue);
> - if (ret)
> - return ret;
> + for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {
> + apqn = AP_MKQID(apid, apqi);
> + vfio_ap_free_queue(apqn, matrix_mdev);
> + }
> +}
>
> - if (qres.reserved)
> - return 0;
> +static void move_and_set(struct list_head *src, struct list_head *dst,
> + struct ap_matrix_mdev *matrix_mdev)
> +{
> + struct vfio_ap_queue *q, *qtmp;
>
> - return -EADDRNOTAVAIL;
> + list_for_each_entry_safe(q, qtmp, src, list) {
> + list_move(&q->list, dst);
> + q->matrix_mdev = matrix_mdev;
> + }
> }
>
> -static int
> -vfio_ap_mdev_verify_queues_reserved_for_apid(struct ap_matrix_mdev *matrix_mdev,
> - unsigned long apid)
> +static int vfio_ap_queue_match(struct device *dev, void *data)
> {
> - int ret;
> - unsigned long apqi;
> - unsigned long nbits = matrix_mdev->matrix.aqm_max + 1;
> + struct ap_queue *ap;
>
> - if (find_first_bit_inv(matrix_mdev->matrix.aqm, nbits) >= nbits)
> - return vfio_ap_verify_queue_reserved(&apid, NULL);
> + ap = to_ap_queue(dev);
> + return ap->qid == *(int *)data;
> +}
>
> - for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, nbits) {
> - ret = vfio_ap_verify_queue_reserved(&apid, &apqi);
> - if (ret)
> - return ret;
> - }
> +static struct vfio_ap_queue *vfio_ap_find_queue(int apqn)
> +{
> + struct device *dev;
> + struct vfio_ap_queue *q;
> +
> + dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL,
> + &apqn, vfio_ap_queue_match);
> + if (!dev)
> + return NULL;
> + q = dev_get_drvdata(dev);
> + put_device(dev);
> + return q;
> +}
>
> +/**
> + * vfio_ap_get_all_domains:
> + *
> + * @matrix_mdev: the matrix mediated device for which we want to associate
> + * all available queues with a given apqi.
> + * @apqi: The apqi which associated with all defined APID of the
> + * mediated device will define a AP queue.
> + *
> + * We define a local list to put all queues we find on the matrix driver
> + * device list when associating the apqi with all already defined apid for
> + * this matrix mediated device.
> + *
> + * If we can get all the devices we roll them to the mediated device list
> + * If we get errors we unroll them to the free list.
> + */
> +static int vfio_ap_get_all_domains(struct ap_matrix_mdev *matrix_mdev, int apid)
> +{
> + int apqi, apqn;
> + int ret = 0;
> + struct vfio_ap_queue *q;
> + struct list_head q_list;
> +
> + if (!vfio_ap_find_any_card(apid))
> + return -EADDRNOTAVAIL;
> +
> + INIT_LIST_HEAD(&q_list);
> +
> + for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, AP_DOMAINS) {
> + apqn = AP_MKQID(apid, apqi);
> + q = vfio_ap_find_queue(apqn);
> + if (!q) {
> + ret = -EADDRNOTAVAIL;
> + goto rewind;
> + }
> + if (q->matrix_mdev) {
> + ret = -EADDRINUSE;
> + goto rewind;
> + }
> + list_move(&q->list, &q_list);
> + }
> + move_and_set(&q_list, &matrix_mdev->qlist, matrix_mdev);
> return 0;
> +rewind:
> + move_and_set(&q_list, &matrix_dev->free_list, NULL);
> + return ret;
> }
> -
> /**
> - * vfio_ap_mdev_verify_no_sharing
> + * vfio_ap_get_all_cards:
> *
> - * Verifies that the APQNs derived from the cross product of the AP adapter IDs
> - * and AP queue indexes comprising the AP matrix are not configured for another
> - * mediated device. AP queue sharing is not allowed.
> + * @matrix_mdev: the matrix mediated device for which we want to associate
> + * all available queues with a given apqi.
> + * @apqi: The apqi which associated with all defined APID of the
> + * mediated device will define a AP queue.
> *
> - * @matrix_mdev: the mediated matrix device
> + * We define a local list to put all queues we find on the matrix device
> + * free list when associating the apqi with all already defined apid for
> + * this matrix mediated device.
> *
> - * Returns 0 if the APQNs are not shared, otherwise; returns -EADDRINUSE.
> + * If we can get all the devices we roll them to the mediated device list
> + * If we get errors we unroll them to the free list.
> */
> -static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
> +static int vfio_ap_get_all_cards(struct ap_matrix_mdev *matrix_mdev, int apqi)
> {
> - struct ap_matrix_mdev *lstdev;
> - DECLARE_BITMAP(apm, AP_DEVICES);
> - DECLARE_BITMAP(aqm, AP_DOMAINS);
> -
> - list_for_each_entry(lstdev, &matrix_dev->mdev_list, node) {
> - if (matrix_mdev == lstdev)
> - continue;
> -
> - memset(apm, 0, sizeof(apm));
> - memset(aqm, 0, sizeof(aqm));
> -
> - /*
> - * We work on full longs, as we can only exclude the leftover
> - * bits in non-inverse order. The leftover is all zeros.
> - */
> - if (!bitmap_and(apm, matrix_mdev->matrix.apm,
> - lstdev->matrix.apm, AP_DEVICES))
> - continue;
> -
> - if (!bitmap_and(aqm, matrix_mdev->matrix.aqm,
> - lstdev->matrix.aqm, AP_DOMAINS))
> - continue;
> -
> - return -EADDRINUSE;
> + int apid, apqn;
> + int ret = 0;
> + struct vfio_ap_queue *q;
> + struct list_head q_list;
> + struct ap_matrix_mdev *tmp = NULL;
> +
> + if (!vfio_ap_find_any_domain(apqi))
> + return -EADDRNOTAVAIL;
> +
> + INIT_LIST_HEAD(&q_list);
> +
> + for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {
> + apqn = AP_MKQID(apid, apqi);
> + q = vfio_ap_find_queue(apqn);
> + if (!q) {
> + ret = -EADDRNOTAVAIL;
> + goto rewind;
> + }
> + if (q->matrix_mdev) {
> + ret = -EADDRINUSE;
> + goto rewind;
> + }
> + list_move(&q->list, &q_list);
> }
> -
> + tmp = matrix_mdev;
> + move_and_set(&q_list, &matrix_mdev->qlist, matrix_mdev);
> return 0;
> +rewind:
> + move_and_set(&q_list, &matrix_dev->free_list, NULL);
> + return ret;
> }
>
> /**
> @@ -330,21 +420,15 @@ static ssize_t assign_adapter_store(struct device *dev,
> */
> mutex_lock(&matrix_dev->lock);
>
> - ret = vfio_ap_mdev_verify_queues_reserved_for_apid(matrix_mdev, apid);
> + ret = vfio_ap_get_all_domains(matrix_mdev, apid);
> if (ret)
> goto done;
>
> set_bit_inv(apid, matrix_mdev->matrix.apm);
>
> - ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev);
> - if (ret)
> - goto share_err;
> -
> ret = count;
> goto done;
>
> -share_err:
> - clear_bit_inv(apid, matrix_mdev->matrix.apm);
> done:
> mutex_unlock(&matrix_dev->lock);
>
> @@ -391,32 +475,13 @@ static ssize_t unassign_adapter_store(struct device *dev,
>
> mutex_lock(&matrix_dev->lock);
> clear_bit_inv((unsigned long)apid, matrix_mdev->matrix.apm);
> + vfio_ap_put_all_domains(matrix_mdev, apid);
> mutex_unlock(&matrix_dev->lock);
>
> return count;
> }
> static DEVICE_ATTR_WO(unassign_adapter);
>
> -static int
> -vfio_ap_mdev_verify_queues_reserved_for_apqi(struct ap_matrix_mdev *matrix_mdev,
> - unsigned long apqi)
> -{
> - int ret;
> - unsigned long apid;
> - unsigned long nbits = matrix_mdev->matrix.apm_max + 1;
> -
> - if (find_first_bit_inv(matrix_mdev->matrix.apm, nbits) >= nbits)
> - return vfio_ap_verify_queue_reserved(NULL, &apqi);
> -
> - for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, nbits) {
> - ret = vfio_ap_verify_queue_reserved(&apid, &apqi);
> - if (ret)
> - return ret;
> - }
> -
> - return 0;
> -}
> -
> /**
> * assign_domain_store
> *
> @@ -471,21 +536,15 @@ static ssize_t assign_domain_store(struct device *dev,
>
> mutex_lock(&matrix_dev->lock);
>
> - ret = vfio_ap_mdev_verify_queues_reserved_for_apqi(matrix_mdev, apqi);
> + ret = vfio_ap_get_all_cards(matrix_mdev, apqi);
> if (ret)
> goto done;
>
> set_bit_inv(apqi, matrix_mdev->matrix.aqm);
>
> - ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev);
> - if (ret)
> - goto share_err;
> -
> ret = count;
> goto done;
>
> -share_err:
> - clear_bit_inv(apqi, matrix_mdev->matrix.aqm);
> done:
> mutex_unlock(&matrix_dev->lock);
>
> @@ -533,6 +592,7 @@ static ssize_t unassign_domain_store(struct device *dev,
>
> mutex_lock(&matrix_dev->lock);
> clear_bit_inv((unsigned long)apqi, matrix_mdev->matrix.aqm);
> + vfio_ap_put_all_cards(matrix_mdev, apqi);
> mutex_unlock(&matrix_dev->lock);
>
> return count;
> @@ -790,49 +850,22 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
> return NOTIFY_OK;
> }
>
> -static int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
> - unsigned int retry)
> -{
> - struct ap_queue_status status;
> -
> - do {
> - status = ap_zapq(AP_MKQID(apid, apqi));
> - switch (status.response_code) {
> - case AP_RESPONSE_NORMAL:
> - return 0;
> - case AP_RESPONSE_RESET_IN_PROGRESS:
> - case AP_RESPONSE_BUSY:
> - msleep(20);
> - break;
> - default:
> - /* things are really broken, give up */
> - return -EIO;
> - }
> - } while (retry--);
> -
> - return -EBUSY;
> -}
> -
> static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
> {
> int ret;
> int rc = 0;
> - unsigned long apid, apqi;
> struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
> + struct vfio_ap_queue *q;
>
> - for_each_set_bit_inv(apid, matrix_mdev->matrix.apm,
> - matrix_mdev->matrix.apm_max + 1) {
> - for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm,
> - matrix_mdev->matrix.aqm_max + 1) {
> - ret = vfio_ap_mdev_reset_queue(apid, apqi, 1);
> - /*
> - * Regardless whether a queue turns out to be busy, or
> - * is not operational, we need to continue resetting
> - * the remaining queues.
> - */
> - if (ret)
> - rc = ret;
> - }
> + list_for_each_entry(q, &matrix_mdev->qlist, list) {
> + ret = vfio_ap_mdev_reset_queue(q);
> + /*
> + * Regardless of whether a queue turns out to be busy or
> + * not operational, we need to continue resetting the
> + * remaining queues, but remember the last error code.
> + */
> + if (ret)
> + rc = ret;
> }
>
> return rc;
> @@ -868,10 +901,10 @@ static void vfio_ap_mdev_release(struct mdev_device *mdev)
> if (matrix_mdev->kvm)
> kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
>
> + matrix_mdev->kvm = NULL;
> vfio_ap_mdev_reset_queues(mdev);
> vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
> &matrix_mdev->group_notifier);
> - matrix_mdev->kvm = NULL;
> module_put(THIS_MODULE);
> }
>
> @@ -905,7 +938,9 @@ static ssize_t vfio_ap_mdev_ioctl(struct mdev_device *mdev,
> ret = vfio_ap_mdev_get_device_info(arg);
> break;
> case VFIO_DEVICE_RESET:
> + mutex_lock(&matrix_dev->lock);
> ret = vfio_ap_mdev_reset_queues(mdev);
> + mutex_unlock(&matrix_dev->lock);
> break;
> default:
> ret = -EOPNOTSUPP;
> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
> index a910be1..3e6940c 100644
> --- a/drivers/s390/crypto/vfio_ap_private.h
> +++ b/drivers/s390/crypto/vfio_ap_private.h
> @@ -40,6 +40,7 @@ struct ap_matrix_dev {
> atomic_t available_instances;
> struct ap_config_info info;
> struct list_head mdev_list;
> + struct list_head free_list;
> struct mutex lock;
> struct ap_driver *vfio_ap_drv;
> };
> @@ -83,9 +84,15 @@ struct ap_matrix_mdev {
> struct notifier_block group_notifier;
> struct kvm *kvm;
> struct kvm_s390_module_hook pqap_hook;
> + struct list_head qlist;
> };
>
> extern int vfio_ap_mdev_register(void);
> extern void vfio_ap_mdev_unregister(void);
>
> +struct vfio_ap_queue {
> + struct list_head list;
> + struct ap_matrix_mdev *matrix_mdev;
> + int apqn;
> +};
> #endif /* _VFIO_AP_PRIVATE_H_ */
>
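
For readers following the list handling in the hunks quoted above, here is
a minimal user-space sketch of the idea, not the driver code. It models a
queue moving from the matrix device's free_list to a mediated device's
qlist on assignment, and a reset loop that visits every queue and keeps the
last error. Everything named here is an assumption made for illustration:
struct queue, list_push(), list_take(), assign_queue(), unassign_queue(),
reset_queues(), the MKQID() layout, the canned reset_result field and the
-ENODEV return value. A singly linked list stands in for the kernel's
struct list_head.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Assumed APQN layout for this sketch: APID in the high byte, APQI in the low byte. */
#define MKQID(apid, apqi) ((((apid) & 0xff) << 8) | ((apqi) & 0xff))

/* Simplified, hypothetical stand-in for struct vfio_ap_queue. */
struct queue {
	struct queue *next;	/* replaces the kernel's struct list_head */
	int apqn;
	int reset_result;	/* canned result of resetting this queue */
};

/* Put a queue at the head of a list. */
void list_push(struct queue **head, struct queue *q)
{
	assert(q != NULL);
	q->next = *head;
	*head = q;
}

/* Unlink and return the queue with the given APQN, or NULL if absent. */
struct queue *list_take(struct queue **head, int apqn)
{
	struct queue **pp;

	for (pp = head; *pp; pp = &(*pp)->next) {
		if ((*pp)->apqn == apqn) {
			struct queue *q = *pp;

			*pp = q->next;
			q->next = NULL;
			return q;
		}
	}
	return NULL;
}

/* Assignment: the queue must still be free, otherwise it is not available. */
int assign_queue(struct queue **free_list, struct queue **qlist, int apqn)
{
	struct queue *q = list_take(free_list, apqn);

	if (!q)
		return -ENODEV;	/* hypothetical error code */
	list_push(qlist, q);
	return 0;
}

/* Unassignment: give the queue back to the free list. */
int unassign_queue(struct queue **free_list, struct queue **qlist, int apqn)
{
	struct queue *q = list_take(qlist, apqn);

	if (!q)
		return -ENODEV;	/* hypothetical error code */
	list_push(free_list, q);
	return 0;
}

/* Reset every queue on the list; do not stop on error, keep the last one. */
int reset_queues(const struct queue *qlist)
{
	const struct queue *q;
	int rc = 0;

	for (q = qlist; q; q = q->next) {
		int ret = q->reset_result;

		if (ret)
			rc = ret;
	}
	return rc;
}
```

In this model a second attempt to assign the same APQN fails simply because
the queue is no longer on the free list, so no separate sharing check is
needed, and the reset loop only has to walk the queues that actually belong
to the mediated device.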


--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany