2021-10-21 15:25:17

by Anthony Krowiak

Subject: [PATCH v17 00/15] s390/vfio-ap: dynamic configuration support

The current design for AP pass-through does not support making dynamic
changes to the AP matrix of a running guest, resulting in a few
deficiencies that this patch series is intended to mitigate:

1. Adapters, domains and control domains cannot be added to or removed
from a running guest. In order to modify a guest's AP configuration,
the guest must be terminated; only then can AP resources be assigned
to or unassigned from the guest's matrix mdev. The new AP
configuration becomes available to the guest when it is subsequently
restarted.

2. The AP bus's /sys/bus/ap/apmask and /sys/bus/ap/aqmask interfaces can
be modified by a root user without any restrictions. A change to
either mask can result in AP queue devices being unbound from the
vfio_ap device driver and bound to a zcrypt device driver even if a
guest is using the queues, thus giving the host access to the guest's
private crypto data and vice versa.

3. The APQNs derived from the Cartesian product of the APIDs of the
adapters and APQIs of the domains assigned to a matrix mdev must
reference an AP queue device bound to the vfio_ap device driver. The
AP architecture allows assignment of AP resources that are not
available to the system, so this artificial restriction is not
compliant with the architecture.

4. The AP configuration profile can be dynamically changed for the Linux
host after a KVM guest is started. For example, a new domain can be
dynamically added to the configuration profile via the SE or an HMC
connected to a DPM-enabled LPAR. Likewise, AP adapters can be
dynamically configured (online state) and deconfigured (standby state)
using the SE, an SCLP command or an HMC connected to a DPM-enabled
LPAR. This can result in inadvertent sharing of AP queues between the
guest and host.

5. A root user can manually unbind an AP queue device representing a
queue in use by a KVM guest via the vfio_ap device driver's sysfs
unbind attribute. In this case, the guest will be using a queue that
is not bound to the driver, which violates the device model.
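
To make the Cartesian product in item 3 concrete, here is a minimal
user-space sketch, not code from this series. The three macros are
illustrative stand-ins for the AP bus helpers of the same names, and the
sketch assumes an APQN is a 16-bit value with the APID in the high byte and
the APQI in the low byte. apqn_product() is a hypothetical helper.

```c
/*
 * Illustrative stand-ins for the AP bus helpers of the same names. An APQN
 * is assumed to carry the APID in its high byte and the APQI in its low
 * byte.
 */
#define AP_MKQID(apid, apqi)	((((apid) & 0xff) << 8) | ((apqi) & 0xff))
#define AP_QID_CARD(apqn)	(((apqn) >> 8) & 0xff)
#define AP_QID_QUEUE(apqn)	((apqn) & 0xff)

/*
 * Hypothetical helper: fill @out with the APQN of every (APID, APQI) pair
 * and return the number of entries written. @out must have room for
 * n_apids * n_apqis entries.
 */
int apqn_product(const int *apids, int n_apids,
		 const int *apqis, int n_apqis, int *out)
{
	int n = 0;

	for (int i = 0; i < n_apids; i++)
		for (int j = 0; j < n_apqis; j++)
			out[n++] = AP_MKQID(apids[i], apqis[j]);

	return n;
}
```

With adapters 5 and 6 and domains 4 and 71 (0x47) assigned, the sketch
yields the four APQNs 05.0004, 05.0047, 06.0004 and 06.0047. Under the
current design each of them must reference a queue device bound to the
vfio_ap device driver.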

This patch series introduces the following changes to the current design
to alleviate the shortcomings described above as well as to implement
more of the AP architecture:

1. A root user will be prevented from making edits to the AP bus's
/sys/bus/ap/apmask or /sys/bus/ap/aqmask if the change would transfer
ownership of an APQN from the vfio_ap device driver to a zcrypt driver
while the APQN is assigned to a matrix mdev.

2. Allow a root user to hot plug/unplug AP adapters, domains and control
domains for a KVM guest using the matrix mdev via its sysfs
assign/unassign attributes.

3. Allow assignment of an AP adapter or domain to a matrix mdev even if
it results in assignment of an APQN that does not reference an AP
queue device bound to the vfio_ap device driver, as long as the APQN
is not reserved for use by the default zcrypt drivers. This is also
known as over-provisioning of AP resources. Allowing over-provisioning
better models the architecture, which does not preclude assigning AP
resources that are not yet available in the system. Such APQNs,
however, will not be assigned to the guest using the matrix mdev; only
APQNs referencing AP queue devices bound to the vfio_ap device driver
will actually get assigned to the guest.

4. Handle dynamic changes to the AP device model.

1. Rationale for changes to AP bus's apmask/aqmask interfaces:
----------------------------------------------------------
Due to the extremely sensitive nature of cryptographic data, it is
imperative that great care be taken to ensure that such data is secured.
A root user should not be able, either inadvertently or maliciously, to
configure these masks such that a queue is shared between the host and a
guest; such sharing is avoidable, and avoiding it is advisable. It was
suggested that this scenario is better handled in user space with
management software, but that does
not preclude a malicious administrator from using the sysfs interfaces
to gain access to a guest's crypto data. It was also suggested that this
scenario could be avoided by taking access to the adapter away from the
guest and zeroing out the queues prior to the vfio_ap driver releasing the
device; however, stealing an adapter in use from a guest as a by-product
of an operation is bad and will likely cause problems for the guest
unnecessarily. It was decided that the most effective solution with the
least number of negative side effects is to prevent the situation at the
source.
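
The check that rejects such a mask change can be pictured with the
following user-space sketch. It is a simplified model under stated
assumptions, not the implementation in this series: struct mdev_matrix and
mask_change_in_use() are hypothetical names, masks are limited to 64 bits
with plain least-significant-bit numbering, and a set bit in the proposed
apmask/aqmask is taken to mean "reserved for the default zcrypt drivers".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of a matrix mdev: one bit per APID/APQI. */
struct mdev_matrix {
	uint64_t apm;	/* adapters (APIDs) assigned to the mdev */
	uint64_t aqm;	/* domains (APQIs) assigned to the mdev */
};

/*
 * Return true if applying the proposed masks would hand at least one APQN
 * assigned to @m to the default zcrypt drivers. In this model an APQN is
 * reserved for the default drivers only when both its APID bit in
 * @new_apmask and its APQI bit in @new_aqmask are set, so it is enough to
 * test whether each mask overlaps the mdev's assignments.
 */
bool mask_change_in_use(const struct mdev_matrix *m,
			uint64_t new_apmask, uint64_t new_aqmask)
{
	return (m->apm & new_apmask) && (m->aqm & new_aqmask);
}
```

A caller in this model would refuse the write to apmask or aqmask whenever
the function returns true, leaving both the masks and the guest's queues
untouched.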

2. Rationale for hot plug/unplug using matrix mdev sysfs interfaces:
----------------------------------------------------------------
Allowing a user to hot plug/unplug AP resources using the matrix mdev
sysfs interfaces circumvents the need to terminate the guest in order to
modify its AP configuration. Allowing dynamic configuration makes
reconfiguring a guest's AP matrix much less disruptive.

3. Rationale for allowing over-provisioning of AP resources:
-----------------------------------------------------------
Allowing assignment of AP resources to a matrix mdev and ultimately to a
guest better models the AP architecture. The architecture does not
preclude assignment of unavailable AP resources. If a queue subsequently
becomes available while a guest is using the matrix mdev to which its APQN
is assigned, the guest will be given access to it. If an APQN
is dynamically unassigned from the underlying host system, it will
automatically become unavailable to the guest.

Change log v16-v17:
------------------
* Introduced a new patch (patch 1) to remove the setting of the pqap hook
in the group notifier callback. It is now set when the vfio_ap device
driver is loaded.

* Patch 6:
- Split the filtering of the APQNs and the control domains into
two functions and consolidated the vfio_ap_mdev_refresh_apcb and
vfio_ap_mdev_filter_apcb into one function named
vfio_ap_mdev_filter_matrix because the matrix is actually what is
being filtered.

- Removed ACK by Halil Pasic because of changes above; needs re-review.

* Introduced a new patch (patch 8) to keep track of active guests.

* Patch 9 (patch 8 in v16):
- Refactored locking to ensure KVM lock is taken before
matrix_dev->lock when hot plugging adapters, domains and
control domains.

- Removed ACK by Halil because of changes above; needs re-review.

* Patch 14 (patch 13 in v16):
- This patch has been redesigned to ensure proper locking order (i.e.,
taking kvm->lock before matrix_dev->lock).

- Removed Halil's Reviewed-by because of changes above; needs re-review.
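
The locking rule mentioned for patches 9 and 14 (kvm->lock before
matrix_dev->lock) can be illustrated with a toy order checker. This is only
a sketch of the rule, loosely in the spirit of what a lock validator
enforces; the ranks, struct lock_tracker and both functions are
hypothetical and do not appear in the series.

```c
#include <stdbool.h>

/* Hypothetical ranks: a lower rank must be taken before a higher one. */
enum lock_rank {
	RANK_KVM_LOCK = 1,		/* stands in for kvm->lock */
	RANK_MATRIX_DEV_LOCK = 2,	/* stands in for matrix_dev->lock */
};

struct lock_tracker {
	int highest_held;	/* rank of the innermost lock held, 0 if none */
};

/* Record an acquisition; return false if it would invert the lock order. */
bool tracker_acquire(struct lock_tracker *t, enum lock_rank rank)
{
	if ((int)rank <= t->highest_held)
		return false;
	t->highest_held = rank;
	return true;
}

/* Release everything held (sufficient for this two-lock illustration). */
void tracker_release_all(struct lock_tracker *t)
{
	t->highest_held = 0;
}
```

Taking RANK_KVM_LOCK and then RANK_MATRIX_DEV_LOCK succeeds in this model,
while attempting RANK_KVM_LOCK with RANK_MATRIX_DEV_LOCK already held is
reported as an inversion.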

Tony Krowiak (15):
s390/vfio-ap: Set pqap hook when vfio_ap module is loaded
s390/vfio-ap: use new AP bus interface to search for queue devices
s390/vfio-ap: move probe and remove callbacks to vfio_ap_ops.c
s390/vfio-ap: manage link between queue struct and matrix mdev
s390/vfio-ap: introduce shadow APCB
s390/vfio-ap: refresh guest's APCB by filtering APQNs assigned to mdev
s390/vfio-ap: allow assignment of unavailable AP queues to mdev device
s390/vfio-ap: keep track of active guests
s390/vfio-ap: allow hot plug/unplug of AP resources using mdev device
s390/vfio-ap: reset queues after adapter/domain unassignment
s390/ap: driver callback to indicate resource in use
s390/vfio-ap: implement in-use callback for vfio_ap driver
s390/vfio-ap: sysfs attribute to display the guest's matrix
s390/ap: notify drivers on config changed and scan complete callbacks
s390/vfio-ap: update docs to include dynamic config support

Documentation/s390/vfio-ap.rst | 492 ++++++---
arch/s390/include/asm/kvm_host.h | 10 +-
arch/s390/kvm/kvm-s390.c | 1 -
arch/s390/kvm/priv.c | 45 +-
drivers/s390/crypto/ap_bus.c | 241 ++++-
drivers/s390/crypto/ap_bus.h | 16 +
drivers/s390/crypto/vfio_ap_drv.c | 52 +-
drivers/s390/crypto/vfio_ap_ops.c | 1379 ++++++++++++++++++-------
drivers/s390/crypto/vfio_ap_private.h | 66 +-
9 files changed, 1714 insertions(+), 588 deletions(-)

--
2.31.1


2021-10-21 15:25:18

by Anthony Krowiak

Subject: [PATCH v17 01/15] s390/vfio-ap: Set pqap hook when vfio_ap module is loaded

Rather than storing the function pointer to the PQAP(AQIC) instruction
interception handler with the mediated device (struct ap_matrix_mdev) when
the vfio_ap device driver is notified that the KVM pointer is being set,
let's store it once in a global variable when the vfio_ap module is
loaded.

There are three reasons for doing this:

1. The lifetime of the interception handler function coincides with the
lifetime of the vfio_ap module, so it makes little sense to tie it to
the mediated device and complete sense to tie it to the module in which
the function resides.

2. The setting/clearing of the function pointer is currently protected by a
lock (pqap_hook_rwsem). This increases the number of locks taken during
VFIO_GROUP_NOTIFY_SET_KVM notification and increases the complexity of
ensuring locking integrity and avoiding circular lock dependencies.

3. The lock will only be taken for writing twice: when the vfio_ap module
is loaded and when it is removed. As it stands now, the lock is taken
for writing whenever a guest is started or terminated.
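
The register-once pattern described above can be sketched in user space as
follows. This is a simplified model, not the kernel code: struct
crypto_hook, the hook_* functions and sample_handler() are hypothetical
names, the handler takes an int instead of a vcpu pointer, and the mutex
that guards the kernel's global pointer is omitted because the sketch is
single-threaded.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for struct kvm_s390_crypto_hook. */
struct crypto_hook {
	int (*fcn)(int vcpu_id);
};

/* Single module-wide slot; a real implementation would guard it with a lock. */
static struct crypto_hook *registered_hook;

/* Install @hook if the slot is free, otherwise refuse with -EACCES. */
int hook_register(struct crypto_hook *hook)
{
	if (registered_hook)
		return -EACCES;
	registered_hook = hook;
	return 0;
}

/* Clear the slot, but only for the hook that currently owns it. */
int hook_unregister(struct crypto_hook *hook)
{
	if (hook != registered_hook)
		return -EACCES;
	registered_hook = NULL;
	return 0;
}

/* Dispatch to the registered handler, or report that none is present. */
int hook_call(int vcpu_id)
{
	if (!registered_hook)
		return -EOPNOTSUPP;
	return registered_hook->fcn(vcpu_id);
}

/* Trivial handler used only to exercise the sketch. */
int sample_handler(int vcpu_id)
{
	return vcpu_id + 1;
}
```

Because the slot is filled once at module load and cleared once at module
removal, starting or stopping a guest never writes to it in this model.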

Signed-off-by: Tony Krowiak <[email protected]>
---
arch/s390/include/asm/kvm_host.h | 10 ++++--
arch/s390/kvm/kvm-s390.c | 1 -
arch/s390/kvm/priv.c | 45 ++++++++++++++++++++++-----
drivers/s390/crypto/vfio_ap_ops.c | 27 ++++++++--------
drivers/s390/crypto/vfio_ap_private.h | 1 -
5 files changed, 58 insertions(+), 26 deletions(-)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index a604d51acfc8..05569d077d7f 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -799,16 +799,17 @@ struct kvm_s390_cpu_model {
unsigned short ibc;
};

-typedef int (*crypto_hook)(struct kvm_vcpu *vcpu);
+struct kvm_s390_crypto_hook {
+ int (*fcn)(struct kvm_vcpu *vcpu);
+};

struct kvm_s390_crypto {
struct kvm_s390_crypto_cb *crycb;
- struct rw_semaphore pqap_hook_rwsem;
- crypto_hook *pqap_hook;
__u32 crycbd;
__u8 aes_kw;
__u8 dea_kw;
__u8 apie;
+ void *data;
};

#define APCB0_MASK_SIZE 1
@@ -998,6 +999,9 @@ extern char sie_exit;
extern int kvm_s390_gisc_register(struct kvm *kvm, u32 gisc);
extern int kvm_s390_gisc_unregister(struct kvm *kvm, u32 gisc);

+extern int kvm_s390_pqap_hook_register(struct kvm_s390_crypto_hook *hook);
+extern int kvm_s390_pqap_hook_unregister(struct kvm_s390_crypto_hook *hook);
+
static inline void kvm_arch_hardware_disable(void) {}
static inline void kvm_arch_sync_events(struct kvm *kvm) {}
static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 6a6dd5e1daf6..c91981599328 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -2649,7 +2649,6 @@ static void kvm_s390_crypto_init(struct kvm *kvm)
{
kvm->arch.crypto.crycb = &kvm->arch.sie_page2->crycb;
kvm_s390_set_crycb_format(kvm);
- init_rwsem(&kvm->arch.crypto.pqap_hook_rwsem);

if (!test_kvm_facility(kvm, 76))
return;
diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
index 53da4ceb16a3..3d91ff934c0c 100644
--- a/arch/s390/kvm/priv.c
+++ b/arch/s390/kvm/priv.c
@@ -31,6 +31,39 @@
#include "kvm-s390.h"
#include "trace.h"

+DEFINE_MUTEX(pqap_hook_lock);
+static struct kvm_s390_crypto_hook *pqap_hook;
+
+int kvm_s390_pqap_hook_register(struct kvm_s390_crypto_hook *hook)
+{
+ int ret = 0;
+
+ mutex_lock(&pqap_hook_lock);
+ if (pqap_hook)
+ ret = -EACCES;
+ else
+ pqap_hook = hook;
+ mutex_unlock(&pqap_hook_lock);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(kvm_s390_pqap_hook_register);
+
+int kvm_s390_pqap_hook_unregister(struct kvm_s390_crypto_hook *hook)
+{
+ int ret = 0;
+
+ mutex_lock(&pqap_hook_lock);
+ if (hook != pqap_hook)
+ ret = -EACCES;
+ else
+ pqap_hook = NULL;
+ mutex_unlock(&pqap_hook_lock);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(kvm_s390_pqap_hook_unregister);
+
static int handle_ri(struct kvm_vcpu *vcpu)
{
vcpu->stat.instruction_ri++;
@@ -610,7 +643,6 @@ static int handle_io_inst(struct kvm_vcpu *vcpu)
static int handle_pqap(struct kvm_vcpu *vcpu)
{
struct ap_queue_status status = {};
- crypto_hook pqap_hook;
unsigned long reg0;
int ret;
uint8_t fc;
@@ -659,16 +691,15 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
* hook function pointer in the kvm_s390_crypto structure. Lock the
* owner, retrieve the hook function pointer and call the hook.
*/
- down_read(&vcpu->kvm->arch.crypto.pqap_hook_rwsem);
- if (vcpu->kvm->arch.crypto.pqap_hook) {
- pqap_hook = *vcpu->kvm->arch.crypto.pqap_hook;
- ret = pqap_hook(vcpu);
+ mutex_lock(&pqap_hook_lock);
+ if (pqap_hook) {
+ ret = pqap_hook->fcn(vcpu);
if (!ret && vcpu->run->s.regs.gprs[1] & 0x00ff0000)
kvm_s390_set_psw_cc(vcpu, 3);
- up_read(&vcpu->kvm->arch.crypto.pqap_hook_rwsem);
+ mutex_unlock(&pqap_hook_lock);
return ret;
}
- up_read(&vcpu->kvm->arch.crypto.pqap_hook_rwsem);
+ mutex_unlock(&pqap_hook_lock);
/*
* A vfio_driver must register a hook.
* No hook means no driver to enable the SIE CRYCB and no queues.
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 94c1c9bd58ad..02275d246b39 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -293,13 +293,10 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
apqn = vcpu->run->s.regs.gprs[0] & 0xffff;
mutex_lock(&matrix_dev->lock);

- if (!vcpu->kvm->arch.crypto.pqap_hook)
- goto out_unlock;
- matrix_mdev = container_of(vcpu->kvm->arch.crypto.pqap_hook,
- struct ap_matrix_mdev, pqap_hook);
+ matrix_mdev = vcpu->kvm->arch.crypto.data;

/* If the there is no guest using the mdev, there is nothing to do */
- if (!matrix_mdev->kvm)
+ if (!matrix_mdev || !matrix_mdev->kvm)
goto out_unlock;

q = vfio_ap_get_queue(matrix_mdev, apqn);
@@ -348,7 +345,6 @@ static int vfio_ap_mdev_probe(struct mdev_device *mdev)

matrix_mdev->mdev = mdev;
vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
- matrix_mdev->pqap_hook = handle_pqap;
mutex_lock(&matrix_dev->lock);
list_add(&matrix_mdev->node, &matrix_dev->mdev_list);
mutex_unlock(&matrix_dev->lock);
@@ -1078,10 +1074,6 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
struct ap_matrix_mdev *m;

if (kvm->arch.crypto.crycbd) {
- down_write(&kvm->arch.crypto.pqap_hook_rwsem);
- kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
- up_write(&kvm->arch.crypto.pqap_hook_rwsem);
-
mutex_lock(&kvm->lock);
mutex_lock(&matrix_dev->lock);

@@ -1095,6 +1087,7 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,

kvm_get_kvm(kvm);
matrix_mdev->kvm = kvm;
+ kvm->arch.crypto.data = matrix_mdev;
kvm_arch_crypto_set_masks(kvm,
matrix_mdev->matrix.apm,
matrix_mdev->matrix.aqm,
@@ -1155,16 +1148,13 @@ static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev,
struct kvm *kvm)
{
if (kvm && kvm->arch.crypto.crycbd) {
- down_write(&kvm->arch.crypto.pqap_hook_rwsem);
- kvm->arch.crypto.pqap_hook = NULL;
- up_write(&kvm->arch.crypto.pqap_hook_rwsem);
-
mutex_lock(&kvm->lock);
mutex_lock(&matrix_dev->lock);

kvm_arch_crypto_clear_masks(kvm);
vfio_ap_mdev_reset_queues(matrix_mdev);
kvm_put_kvm(kvm);
+ kvm->arch.crypto.data = NULL;
matrix_mdev->kvm = NULL;

mutex_unlock(&kvm->lock);
@@ -1391,12 +1381,20 @@ static const struct mdev_parent_ops vfio_ap_matrix_ops = {
.supported_type_groups = vfio_ap_mdev_type_groups,
};

+static struct kvm_s390_crypto_hook pqap_hook = {
+ .fcn = handle_pqap,
+};
+
int vfio_ap_mdev_register(void)
{
int ret;

atomic_set(&matrix_dev->available_instances, MAX_ZDEV_ENTRIES_EXT);

+ ret = kvm_s390_pqap_hook_register(&pqap_hook);
+ if (ret)
+ return ret;
+
ret = mdev_register_driver(&vfio_ap_matrix_driver);
if (ret)
return ret;
@@ -1413,6 +1411,7 @@ int vfio_ap_mdev_register(void)

void vfio_ap_mdev_unregister(void)
{
+ WARN_ON(kvm_s390_pqap_hook_unregister(&pqap_hook));
mdev_unregister_device(&matrix_dev->device);
mdev_unregister_driver(&vfio_ap_matrix_driver);
}
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index 648fcaf8104a..907f41160de7 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -97,7 +97,6 @@ struct ap_matrix_mdev {
struct notifier_block group_notifier;
struct notifier_block iommu_notifier;
struct kvm *kvm;
- crypto_hook pqap_hook;
struct mdev_device *mdev;
};

--
2.31.1

2021-10-21 15:25:21

by Anthony Krowiak

Subject: [PATCH v17 03/15] s390/vfio-ap: move probe and remove callbacks to vfio_ap_ops.c

Let's move the probe and remove callbacks into the vfio_ap_ops.c
file to keep all code related to managing queues in a single file. This
way, all functions related to queue management can be removed from the
vfio_ap_private.h header file defining the public interfaces for the
vfio_ap device driver.

Signed-off-by: Tony Krowiak <[email protected]>
Reviewed-by: Halil Pasic <[email protected]>
---
drivers/s390/crypto/vfio_ap_drv.c | 45 ++-------------------------
drivers/s390/crypto/vfio_ap_ops.c | 31 ++++++++++++++++--
drivers/s390/crypto/vfio_ap_private.h | 5 +--
3 files changed, 34 insertions(+), 47 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
index 03311a476366..5255e338591d 100644
--- a/drivers/s390/crypto/vfio_ap_drv.c
+++ b/drivers/s390/crypto/vfio_ap_drv.c
@@ -41,50 +41,9 @@ static struct ap_device_id ap_queue_ids[] = {

MODULE_DEVICE_TABLE(vfio_ap, ap_queue_ids);

-/**
- * vfio_ap_queue_dev_probe: Allocate a vfio_ap_queue structure and associate it
- * with the device as driver_data.
- *
- * @apdev: the AP device being probed
- *
- * Return: returns 0 if the probe succeeded; otherwise, returns -ENOMEM if
- * storage could not be allocated for a vfio_ap_queue object.
- */
-static int vfio_ap_queue_dev_probe(struct ap_device *apdev)
-{
- struct vfio_ap_queue *q;
-
- q = kzalloc(sizeof(*q), GFP_KERNEL);
- if (!q)
- return -ENOMEM;
- dev_set_drvdata(&apdev->device, q);
- q->apqn = to_ap_queue(&apdev->device)->qid;
- q->saved_isc = VFIO_AP_ISC_INVALID;
- return 0;
-}
-
-/**
- * vfio_ap_queue_dev_remove: Free the associated vfio_ap_queue structure.
- *
- * @apdev: the AP device being removed
- *
- * Takes the matrix lock to avoid actions on this device while doing the remove.
- */
-static void vfio_ap_queue_dev_remove(struct ap_device *apdev)
-{
- struct vfio_ap_queue *q;
-
- mutex_lock(&matrix_dev->lock);
- q = dev_get_drvdata(&apdev->device);
- vfio_ap_mdev_reset_queue(q, 1);
- dev_set_drvdata(&apdev->device, NULL);
- kfree(q);
- mutex_unlock(&matrix_dev->lock);
-}
-
static struct ap_driver vfio_ap_drv = {
- .probe = vfio_ap_queue_dev_probe,
- .remove = vfio_ap_queue_dev_remove,
+ .probe = vfio_ap_mdev_probe_queue,
+ .remove = vfio_ap_mdev_remove_queue,
.ids = ap_queue_ids,
};

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 7a9d1b04ecd4..cd2fe1586327 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -1191,8 +1191,7 @@ static struct vfio_ap_queue *vfio_ap_find_queue(int apqn)
return q;
}

-int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q,
- unsigned int retry)
+static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q, unsigned int retry)
{
struct ap_queue_status status;
int ret;
@@ -1410,3 +1409,31 @@ void vfio_ap_mdev_unregister(void)
mdev_unregister_device(&matrix_dev->device);
mdev_unregister_driver(&vfio_ap_matrix_driver);
}
+
+int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
+{
+ struct vfio_ap_queue *q;
+
+ q = kzalloc(sizeof(*q), GFP_KERNEL);
+ if (!q)
+ return -ENOMEM;
+ mutex_lock(&matrix_dev->lock);
+ q->apqn = to_ap_queue(&apdev->device)->qid;
+ q->saved_isc = VFIO_AP_ISC_INVALID;
+ dev_set_drvdata(&apdev->device, q);
+ mutex_unlock(&matrix_dev->lock);
+
+ return 0;
+}
+
+void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
+{
+ struct vfio_ap_queue *q;
+
+ mutex_lock(&matrix_dev->lock);
+ q = dev_get_drvdata(&apdev->device);
+ vfio_ap_mdev_reset_queue(q, 1);
+ dev_set_drvdata(&apdev->device, NULL);
+ kfree(q);
+ mutex_unlock(&matrix_dev->lock);
+}
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index 907f41160de7..e3f0d42b094c 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -118,7 +118,8 @@ struct vfio_ap_queue {

int vfio_ap_mdev_register(void);
void vfio_ap_mdev_unregister(void);
-int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q,
- unsigned int retry);
+
+int vfio_ap_mdev_probe_queue(struct ap_device *queue);
+void vfio_ap_mdev_remove_queue(struct ap_device *queue);

#endif /* _VFIO_AP_PRIVATE_H_ */
--
2.31.1

2021-10-21 15:25:24

by Anthony Krowiak

Subject: [PATCH v17 02/15] s390/vfio-ap: use new AP bus interface to search for queue devices

This patch refactors the vfio_ap device driver to use the AP bus's
ap_get_qdev() function to retrieve the vfio_ap_queue struct containing
information about a queue that is bound to the vfio_ap device driver.
The bus's ap_get_qdev() function retrieves the queue device from a
hashtable keyed by APQN. This is several orders of magnitude more
efficient than looping over the list of devices attached to the AP bus.
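
For readers unfamiliar with the idea, the following user-space sketch shows
why a lookup keyed by APQN avoids a full scan. It is not the AP bus
implementation: struct ap_qdev, struct qdev_table, the bucket count and the
modulo hash are all assumptions made for illustration.

```c
#include <stddef.h>

#define QTABLE_BUCKETS 64

/* Hypothetical queue record; only the APQN matters for the lookup. */
struct ap_qdev {
	int apqn;
	struct ap_qdev *next;	/* next entry in the same bucket */
};

struct qdev_table {
	struct ap_qdev *buckets[QTABLE_BUCKETS];
};

/* Map an APQN to a bucket index (a deliberately simple hash). */
static unsigned int qtable_bucket(int apqn)
{
	return (unsigned int)apqn % QTABLE_BUCKETS;
}

/* Insert @q at the head of its bucket. */
void qtable_add(struct qdev_table *t, struct ap_qdev *q)
{
	unsigned int b = qtable_bucket(q->apqn);

	q->next = t->buckets[b];
	t->buckets[b] = q;
}

/* Walk only the one bucket that @apqn hashes to. */
struct ap_qdev *qtable_find(const struct qdev_table *t, int apqn)
{
	struct ap_qdev *q;

	for (q = t->buckets[qtable_bucket(apqn)]; q; q = q->next)
		if (q->apqn == apqn)
			return q;

	return NULL;
}
```

A lookup touches only the entries that share a bucket with the requested
APQN, whereas a scan of the bus's device list visits every device until a
match is found.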

Signed-off-by: Tony Krowiak <[email protected]>
Reviewed-by: Halil Pasic <[email protected]>
Reviewed-by: Jason J. Herne <[email protected]>
---
drivers/s390/crypto/vfio_ap_ops.c | 23 +++++++++--------------
1 file changed, 9 insertions(+), 14 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 02275d246b39..7a9d1b04ecd4 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -28,13 +28,6 @@ static int vfio_ap_mdev_reset_queues(struct ap_matrix_mdev *matrix_mdev);
static struct vfio_ap_queue *vfio_ap_find_queue(int apqn);
static const struct vfio_device_ops vfio_ap_matrix_dev_ops;

-static int match_apqn(struct device *dev, const void *data)
-{
- struct vfio_ap_queue *q = dev_get_drvdata(dev);
-
- return (q->apqn == *(int *)(data)) ? 1 : 0;
-}
-
/**
* vfio_ap_get_queue - retrieve a queue with a specific APQN from a list
* @matrix_mdev: the associated mediated matrix
@@ -1183,15 +1176,17 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,

static struct vfio_ap_queue *vfio_ap_find_queue(int apqn)
{
- struct device *dev;
+ struct ap_queue *queue;
struct vfio_ap_queue *q = NULL;

- dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL,
- &apqn, match_apqn);
- if (dev) {
- q = dev_get_drvdata(dev);
- put_device(dev);
- }
+ queue = ap_get_qdev(apqn);
+ if (!queue)
+ return NULL;
+
+ if (queue->ap_dev.device.driver == &matrix_dev->vfio_ap_drv->driver)
+ q = dev_get_drvdata(&queue->ap_dev.device);
+
+ put_device(&queue->ap_dev.device);

return q;
}
--
2.31.1

2021-10-21 15:25:35

by Anthony Krowiak

Subject: [PATCH v17 04/15] s390/vfio-ap: manage link between queue struct and matrix mdev

Let's create links between each queue device bound to the vfio_ap device
driver and the matrix mdev to which the queue's APQN is assigned. The idea
is to facilitate efficient retrieval of the objects representing the queue
devices and matrix mdevs as well as to verify that a queue assigned to
a matrix mdev is bound to the driver.

The links will be created as follows:

* When the queue device is probed, if its APQN is assigned to a matrix
mdev, the structures representing the queue device and the matrix mdev
will be linked.

* When an adapter or domain is assigned to a matrix mdev, for each new
APQN assigned that references a queue device bound to the vfio_ap
device driver, the structures representing the queue device and the
matrix mdev will be linked.

The links will be removed as follows:

* When the queue device is removed, if its APQN is assigned to a matrix
mdev, the link from the structure representing the matrix mdev to the
structure representing the queue will be removed. The link from the
queue to the matrix mdev will be maintained because if the queue device
is being removed due to a manual sysfs unbind, it may be needed after
the queue is reset to clean up the IRQ resources allocated to enable AP
interrupts for the KVM guest. Since the storage for the structure
representing the queue device is ultimately freed by the remove
callback, keeping the reference shouldn't be a problem.

* When an adapter or domain is unassigned from a matrix mdev, for each
APQN unassigned that references a queue device bound to the vfio_ap
device driver, the structures representing the queue device and the
matrix mdev will be unlinked.

* When an mdev is removed, the link to the mdev from each queue assigned
to it will be removed.
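
The two directions of the link, and the asymmetric unlink on queue removal,
can be sketched in user space as follows. This is a pared-down model, not
the driver code: struct sketch_queue, struct sketch_mdev and the functions
are hypothetical, and a small fixed array stands in for the hash table the
patch uses.

```c
#include <stddef.h>

#define MAX_LINKED 16

struct sketch_mdev;

/* Hypothetical, pared-down counterpart of a queue bound to the driver. */
struct sketch_queue {
	int apqn;
	struct sketch_mdev *mdev;	/* queue -> mdev link */
};

/* Hypothetical, pared-down counterpart of a matrix mdev. */
struct sketch_mdev {
	struct sketch_queue *queues[MAX_LINKED];	/* mdev -> queue links */
};

/* Create both directions of the link; return -1 if the table is full. */
int link_queue(struct sketch_mdev *m, struct sketch_queue *q)
{
	for (int i = 0; i < MAX_LINKED; i++) {
		if (!m->queues[i]) {
			m->queues[i] = q;
			q->mdev = m;
			return 0;
		}
	}
	return -1;
}

/* Drop only the mdev -> queue direction (the queue-removal case). */
void unlink_queue_from_mdev(struct sketch_queue *q)
{
	struct sketch_mdev *m = q->mdev;

	for (int i = 0; m && i < MAX_LINKED; i++)
		if (m->queues[i] == q)
			m->queues[i] = NULL;
}

/* Drop both directions (the adapter/domain unassignment case). */
void unlink_queue(struct sketch_queue *q)
{
	unlink_queue_from_mdev(q);
	q->mdev = NULL;
}

/* Look up a linked queue by APQN; NULL if it is not linked to @m. */
struct sketch_queue *mdev_get_queue(const struct sketch_mdev *m, int apqn)
{
	for (int i = 0; i < MAX_LINKED; i++)
		if (m->queues[i] && m->queues[i]->apqn == apqn)
			return m->queues[i];
	return NULL;
}
```

After unlink_queue_from_mdev() the mdev no longer finds the queue, but the
queue can still reach its mdev, which mirrors the case above where the
back-reference is kept until the remove callback frees the queue.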

Signed-off-by: Tony Krowiak <[email protected]>
Reviewed-by: Halil Pasic <[email protected]>
---
drivers/s390/crypto/vfio_ap_ops.c | 192 +++++++++++++++++++++-----
drivers/s390/crypto/vfio_ap_private.h | 14 ++
2 files changed, 169 insertions(+), 37 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index cd2fe1586327..e2be29e9d310 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -29,32 +29,27 @@ static struct vfio_ap_queue *vfio_ap_find_queue(int apqn);
static const struct vfio_device_ops vfio_ap_matrix_dev_ops;

/**
- * vfio_ap_get_queue - retrieve a queue with a specific APQN from a list
- * @matrix_mdev: the associated mediated matrix
- * @apqn: The queue APQN
+ * vfio_ap_mdev_get_queue - retrieve a queue with a specific APQN from a
+ * hash table of queues assigned to a matrix mdev
+ * @matrix_mdev: the matrix mdev
+ * @apqn: The APQN of a queue device
*
- * Retrieve a queue with a specific APQN from the list of the
- * devices of the vfio_ap_drv.
- * Verify that the APID and the APQI are set in the matrix.
- *
- * Return: the pointer to the associated vfio_ap_queue
+ * Return: the pointer to the vfio_ap_queue struct representing the queue or
+ * NULL if the queue is not assigned to @matrix_mdev
*/
-static struct vfio_ap_queue *vfio_ap_get_queue(
+static struct vfio_ap_queue *vfio_ap_mdev_get_queue(
struct ap_matrix_mdev *matrix_mdev,
int apqn)
{
struct vfio_ap_queue *q;

- if (!test_bit_inv(AP_QID_CARD(apqn), matrix_mdev->matrix.apm))
- return NULL;
- if (!test_bit_inv(AP_QID_QUEUE(apqn), matrix_mdev->matrix.aqm))
- return NULL;
-
- q = vfio_ap_find_queue(apqn);
- if (q)
- q->matrix_mdev = matrix_mdev;
+ hash_for_each_possible(matrix_mdev->qtable.queues, q, mdev_qnode,
+ apqn) {
+ if (q && q->apqn == apqn)
+ return q;
+ }

- return q;
+ return NULL;
}

/**
@@ -172,7 +167,6 @@ static struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q)
status.response_code);
end_free:
vfio_ap_free_aqic_resources(q);
- q->matrix_mdev = NULL;
return status;
}

@@ -292,7 +286,7 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
if (!matrix_mdev || !matrix_mdev->kvm)
goto out_unlock;

- q = vfio_ap_get_queue(matrix_mdev, apqn);
+ q = vfio_ap_mdev_get_queue(matrix_mdev, apqn);
if (!q)
goto out_unlock;

@@ -338,6 +332,8 @@ static int vfio_ap_mdev_probe(struct mdev_device *mdev)

matrix_mdev->mdev = mdev;
vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
+ hash_init(matrix_mdev->qtable.queues);
+ mdev_set_drvdata(mdev, matrix_mdev);
mutex_lock(&matrix_dev->lock);
list_add(&matrix_mdev->node, &matrix_dev->mdev_list);
mutex_unlock(&matrix_dev->lock);
@@ -359,6 +355,55 @@ static int vfio_ap_mdev_probe(struct mdev_device *mdev)
return ret;
}

+static void vfio_ap_mdev_link_queue(struct ap_matrix_mdev *matrix_mdev,
+ struct vfio_ap_queue *q)
+{
+ if (q) {
+ q->matrix_mdev = matrix_mdev;
+ hash_add(matrix_mdev->qtable.queues, &q->mdev_qnode, q->apqn);
+ }
+}
+
+static void vfio_ap_mdev_link_apqn(struct ap_matrix_mdev *matrix_mdev, int apqn)
+{
+ struct vfio_ap_queue *q;
+
+ q = vfio_ap_find_queue(apqn);
+ vfio_ap_mdev_link_queue(matrix_mdev, q);
+}
+
+static void vfio_ap_unlink_queue_fr_mdev(struct vfio_ap_queue *q)
+{
+ hash_del(&q->mdev_qnode);
+}
+
+static void vfio_ap_unlink_mdev_fr_queue(struct vfio_ap_queue *q)
+{
+ q->matrix_mdev = NULL;
+}
+
+static void vfio_ap_mdev_unlink_queue(struct vfio_ap_queue *q)
+{
+ vfio_ap_unlink_queue_fr_mdev(q);
+ vfio_ap_unlink_mdev_fr_queue(q);
+}
+
+static void vfio_ap_mdev_unlink_fr_queues(struct ap_matrix_mdev *matrix_mdev)
+{
+ struct vfio_ap_queue *q;
+ unsigned long apid, apqi;
+
+ for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {
+ for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm,
+ AP_DOMAINS) {
+ q = vfio_ap_mdev_get_queue(matrix_mdev,
+ AP_MKQID(apid, apqi));
+ if (q)
+ q->matrix_mdev = NULL;
+ }
+ }
+}
+
static void vfio_ap_mdev_remove(struct mdev_device *mdev)
{
struct ap_matrix_mdev *matrix_mdev = dev_get_drvdata(&mdev->dev);
@@ -367,6 +412,7 @@ static void vfio_ap_mdev_remove(struct mdev_device *mdev)

mutex_lock(&matrix_dev->lock);
vfio_ap_mdev_reset_queues(matrix_mdev);
+ vfio_ap_mdev_unlink_fr_queues(matrix_mdev);
list_del(&matrix_mdev->node);
mutex_unlock(&matrix_dev->lock);
vfio_uninit_group_dev(&matrix_mdev->vdev);
@@ -575,6 +621,16 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
return 0;
}

+static void vfio_ap_mdev_link_adapter(struct ap_matrix_mdev *matrix_mdev,
+ unsigned long apid)
+{
+ unsigned long apqi;
+
+ for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, AP_DOMAINS)
+ vfio_ap_mdev_link_apqn(matrix_mdev,
+ AP_MKQID(apid, apqi));
+}
+
/**
* assign_adapter_store - parses the APID from @buf and sets the
* corresponding bit in the mediated matrix device's APM
@@ -645,6 +701,7 @@ static ssize_t assign_adapter_store(struct device *dev,
if (ret)
goto share_err;

+ vfio_ap_mdev_link_adapter(matrix_mdev, apid);
ret = count;
goto done;

@@ -657,6 +714,20 @@ static ssize_t assign_adapter_store(struct device *dev,
}
static DEVICE_ATTR_WO(assign_adapter);

+static void vfio_ap_mdev_unlink_adapter(struct ap_matrix_mdev *matrix_mdev,
+ unsigned long apid)
+{
+ unsigned long apqi;
+ struct vfio_ap_queue *q;
+
+ for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, AP_DOMAINS) {
+ q = vfio_ap_mdev_get_queue(matrix_mdev, AP_MKQID(apid, apqi));
+
+ if (q)
+ vfio_ap_mdev_unlink_queue(q);
+ }
+}
+
/**
* unassign_adapter_store - parses the APID from @buf and clears the
* corresponding bit in the mediated matrix device's APM
@@ -698,6 +769,7 @@ static ssize_t unassign_adapter_store(struct device *dev,
}

clear_bit_inv((unsigned long)apid, matrix_mdev->matrix.apm);
+ vfio_ap_mdev_unlink_adapter(matrix_mdev, apid);
ret = count;
done:
mutex_unlock(&matrix_dev->lock);
@@ -725,6 +797,16 @@ vfio_ap_mdev_verify_queues_reserved_for_apqi(struct ap_matrix_mdev *matrix_mdev,
return 0;
}

+static void vfio_ap_mdev_link_domain(struct ap_matrix_mdev *matrix_mdev,
+ unsigned long apqi)
+{
+ unsigned long apid;
+
+ for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES)
+ vfio_ap_mdev_link_apqn(matrix_mdev,
+ AP_MKQID(apid, apqi));
+}
+
/**
* assign_domain_store - parses the APQI from @buf and sets the
* corresponding bit in the mediated matrix device's AQM
@@ -790,6 +872,7 @@ static ssize_t assign_domain_store(struct device *dev,
if (ret)
goto share_err;

+ vfio_ap_mdev_link_domain(matrix_mdev, apqi);
ret = count;
goto done;

@@ -802,6 +885,19 @@ static ssize_t assign_domain_store(struct device *dev,
}
static DEVICE_ATTR_WO(assign_domain);

+static void vfio_ap_mdev_unlink_domain(struct ap_matrix_mdev *matrix_mdev,
+ unsigned long apqi)
+{
+ unsigned long apid;
+ struct vfio_ap_queue *q;
+
+ for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {
+ q = vfio_ap_mdev_get_queue(matrix_mdev, AP_MKQID(apid, apqi));
+
+ if (q)
+ vfio_ap_mdev_unlink_queue(q);
+ }
+}

/**
* unassign_domain_store - parses the APQI from @buf and clears the
@@ -844,6 +940,7 @@ static ssize_t unassign_domain_store(struct device *dev,
}

clear_bit_inv((unsigned long)apqi, matrix_mdev->matrix.aqm);
+ vfio_ap_mdev_unlink_domain(matrix_mdev, apqi);
ret = count;

done:
@@ -1243,25 +1340,18 @@ static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q, unsigned int retry)

static int vfio_ap_mdev_reset_queues(struct ap_matrix_mdev *matrix_mdev)
{
- int ret;
- int rc = 0;
- unsigned long apid, apqi;
+ int ret, bkt, rc = 0;
struct vfio_ap_queue *q;

- for_each_set_bit_inv(apid, matrix_mdev->matrix.apm,
- matrix_mdev->matrix.apm_max + 1) {
- for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm,
- matrix_mdev->matrix.aqm_max + 1) {
- q = vfio_ap_find_queue(AP_MKQID(apid, apqi));
- ret = vfio_ap_mdev_reset_queue(q, 1);
- /*
- * Regardless whether a queue turns out to be busy, or
- * is not operational, we need to continue resetting
- * the remaining queues.
- */
- if (ret)
- rc = ret;
- }
+ hash_for_each(matrix_mdev->qtable.queues, bkt, q, mdev_qnode) {
+ ret = vfio_ap_mdev_reset_queue(q, 1);
+ /*
+ * Regardless whether a queue turns out to be busy, or
+ * is not operational, we need to continue resetting
+ * the remaining queues.
+ */
+ if (ret)
+ rc = ret;
}

return rc;
@@ -1410,6 +1500,28 @@ void vfio_ap_mdev_unregister(void)
mdev_unregister_driver(&vfio_ap_matrix_driver);
}

+/*
+ * vfio_ap_queue_link_mdev
+ *
+ * @q: The queue to link with the matrix mdev.
+ *
+ * Links @q with the matrix mdev to which the queue's APQN is assigned.
+ */
+static void vfio_ap_queue_link_mdev(struct vfio_ap_queue *q)
+{
+ unsigned long apid = AP_QID_CARD(q->apqn);
+ unsigned long apqi = AP_QID_QUEUE(q->apqn);
+ struct ap_matrix_mdev *matrix_mdev;
+
+ list_for_each_entry(matrix_mdev, &matrix_dev->mdev_list, node) {
+ if (test_bit_inv(apid, matrix_mdev->matrix.apm) &&
+ test_bit_inv(apqi, matrix_mdev->matrix.aqm)) {
+ vfio_ap_mdev_link_queue(matrix_mdev, q);
+ break;
+ }
+ }
+}
+
int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
{
struct vfio_ap_queue *q;
@@ -1417,9 +1529,11 @@ int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
q = kzalloc(sizeof(*q), GFP_KERNEL);
if (!q)
return -ENOMEM;
+
mutex_lock(&matrix_dev->lock);
q->apqn = to_ap_queue(&apdev->device)->qid;
q->saved_isc = VFIO_AP_ISC_INVALID;
+ vfio_ap_queue_link_mdev(q);
dev_set_drvdata(&apdev->device, q);
mutex_unlock(&matrix_dev->lock);

@@ -1432,6 +1546,10 @@ void vfio_ap_mdev_remove_queue(struct ap_device *apdev)

mutex_lock(&matrix_dev->lock);
q = dev_get_drvdata(&apdev->device);
+
+ if (q->matrix_mdev)
+ vfio_ap_unlink_queue_fr_mdev(q);
+
vfio_ap_mdev_reset_queue(q, 1);
dev_set_drvdata(&apdev->device, NULL);
kfree(q);
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index e3f0d42b094c..c1f57f89973e 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -19,6 +19,7 @@
#include <linux/mutex.h>
#include <linux/kvm_host.h>
#include <linux/vfio.h>
+#include <linux/hashtable.h>

#include "ap_bus.h"

@@ -74,6 +75,15 @@ struct ap_matrix {
DECLARE_BITMAP(adm, 256);
};

+/**
+ * struct ap_queue_table - a table of queue objects.
+ *
+ * @queues: a hashtable of queues (struct vfio_ap_queue).
+ */
+struct ap_queue_table {
+ DECLARE_HASHTABLE(queues, 8);
+};
+
/**
* struct ap_matrix_mdev - Contains the data associated with a matrix mediated
* device.
@@ -89,6 +99,7 @@ struct ap_matrix {
* @pqap_hook: the function pointer to the interception handler for the
* PQAP(AQIC) instruction.
* @mdev: the mediated device
+ * @qtable: table of queues (struct vfio_ap_queue) assigned to the mdev
*/
struct ap_matrix_mdev {
struct vfio_device vdev;
@@ -98,6 +109,7 @@ struct ap_matrix_mdev {
struct notifier_block iommu_notifier;
struct kvm *kvm;
struct mdev_device *mdev;
+ struct ap_queue_table qtable;
};

/**
@@ -107,6 +119,7 @@ struct ap_matrix_mdev {
* @saved_pfn: the guest PFN pinned for the guest
* @apqn: the APQN of the AP queue device
* @saved_isc: the guest ISC registered with the GIB interface
+ * @mdev_qnode: allows the vfio_ap_queue struct to be added to a hashtable
*/
struct vfio_ap_queue {
struct ap_matrix_mdev *matrix_mdev;
@@ -114,6 +127,7 @@ struct vfio_ap_queue {
int apqn;
#define VFIO_AP_ISC_INVALID 0xff
unsigned char saved_isc;
+ struct hlist_node mdev_qnode;
};

int vfio_ap_mdev_register(void);
--
2.31.1

2021-10-21 15:25:38

by Anthony Krowiak

Subject: [PATCH v17 06/15] s390/vfio-ap: refresh guest's APCB by filtering APQNs assigned to mdev

Refresh the guest's APCB by filtering the APQNs assigned to the matrix mdev
that do not reference an AP queue device bound to the vfio_ap device
driver. The mdev's APQNs will be filtered according to the following rules:

* The APID of each adapter and the APQI of each domain that is not in the
host's AP configuration is filtered out.

* The APID of each adapter comprising an APQN that does not reference a
queue device bound to the vfio_ap device driver is filtered out. The
APQNs are derived from the Cartesian product of the APID of each adapter
and the APQI of each domain assigned to the mdev.

The control domains that are not assigned to the host's AP configuration
will also be filtered before assigning them to the guest's APCB.
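
The filtering rules above can be illustrated with a small userspace
sketch. It is not the driver code: struct toy_matrix, toy_filter_matrix()
and the bound[][] table are hypothetical stand-ins, the masks are shrunk to
8 bits, plain LSB-first bit numbering is used instead of the kernel's
inverted bit order, and control domains are handled in the same function
for brevity.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_IDS 8	/* toy size; the real masks cover 256 IDs */

/* Hypothetical, simplified stand-in for struct ap_matrix. */
struct toy_matrix {
	uint8_t apm;	/* bit n set: adapter (APID) n is assigned */
	uint8_t aqm;	/* bit n set: usage domain (APQI) n is assigned */
	uint8_t adm;	/* bit n set: control domain n is assigned */
};

/*
 * bound[apid][apqi] is assumed to be true when the queue with that APQN
 * is bound to the pass-through driver.
 */
void toy_filter_matrix(const struct toy_matrix *mdev,
		       const struct toy_matrix *host,
		       bool bound[NR_IDS][NR_IDS],
		       struct toy_matrix *apcb)
{
	/* Keep only what is also in the host's AP configuration. */
	apcb->apm = mdev->apm & host->apm;
	apcb->aqm = mdev->aqm & host->aqm;
	apcb->adm = mdev->adm & host->adm;

	for (int apid = 0; apid < NR_IDS; apid++) {
		if (!(apcb->apm & (1u << apid)))
			continue;
		for (int apqi = 0; apqi < NR_IDS; apqi++) {
			if (!(apcb->aqm & (1u << apqi)))
				continue;
			/* One unbound queue filters out the whole adapter. */
			if (!bound[apid][apqi]) {
				apcb->apm &= (uint8_t)~(1u << apid);
				break;
			}
		}
	}
}
```

The adapter is dropped as a whole because a single APQN cannot be removed
from the Cartesian product of two masks.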

Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/vfio_ap_ops.c | 66 ++++++++++++++++++++++++++++++-
1 file changed, 64 insertions(+), 2 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 4305177029bf..46c179363aca 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -314,6 +314,62 @@ static void vfio_ap_matrix_init(struct ap_config_info *info,
matrix->adm_max = info->apxa ? info->Nd : 15;
}

+static void vfio_ap_mdev_filter_cdoms(struct ap_matrix_mdev *matrix_mdev)
+{
+ bitmap_and(matrix_mdev->shadow_apcb.adm, matrix_mdev->matrix.adm,
+ (unsigned long *)matrix_dev->info.adm, AP_DOMAINS);
+}
+
+/*
+ * vfio_ap_mdev_filter_matrix - copy the mdev's AP configuration to the KVM
+ * guest's APCB then filter out each APID that
+ * comprises at least one APQN that does not reference
+ * a queue device bound to the vfio_ap device driver.
+ *
+ * @matrix_mdev: the mdev whose AP configuration is to be filtered.
+ */
+static void vfio_ap_mdev_filter_matrix(struct ap_matrix_mdev *matrix_mdev)
+{
+ int ret;
+ unsigned long apid, apqi, apqn;
+
+ ret = ap_qci(&matrix_dev->info);
+ if (ret)
+ return;
+
+ vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->shadow_apcb);
+
+ /*
+ * Copy the adapters, domains and control domains to the shadow_apcb
+ * from the matrix mdev, but only those that are assigned to the host's
+ * AP configuration.
+ */
+ bitmap_and(matrix_mdev->shadow_apcb.apm, matrix_mdev->matrix.apm,
+ (unsigned long *)matrix_dev->info.apm, AP_DEVICES);
+ bitmap_and(matrix_mdev->shadow_apcb.aqm, matrix_mdev->matrix.aqm,
+ (unsigned long *)matrix_dev->info.aqm, AP_DOMAINS);
+
+ for_each_set_bit_inv(apid, matrix_mdev->shadow_apcb.apm, AP_DEVICES) {
+ for_each_set_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm,
+ AP_DOMAINS) {
+ /*
+ * If the APQN is not bound to the vfio_ap device
+ * driver, then we can't assign it to the guest's
+ * AP configuration. The AP architecture won't
+ * allow filtering of a single APQN, so if we're
+ * filtering APIDs, then filter the APID; otherwise,
+ * filter the APQI.
+ */
+ apqn = AP_MKQID(apid, apqi);
+ if (!vfio_ap_mdev_get_queue(matrix_mdev, apqn)) {
+ clear_bit_inv(apid,
+ matrix_mdev->shadow_apcb.apm);
+ break;
+ }
+ }
+ }
+}
+
static int vfio_ap_mdev_probe(struct mdev_device *mdev)
{
struct ap_matrix_mdev *matrix_mdev;
@@ -703,6 +759,7 @@ static ssize_t assign_adapter_store(struct device *dev,
goto share_err;

vfio_ap_mdev_link_adapter(matrix_mdev, apid);
+ vfio_ap_mdev_filter_matrix(matrix_mdev);
ret = count;
goto done;

@@ -771,6 +828,7 @@ static ssize_t unassign_adapter_store(struct device *dev,

clear_bit_inv((unsigned long)apid, matrix_mdev->matrix.apm);
vfio_ap_mdev_unlink_adapter(matrix_mdev, apid);
+ vfio_ap_mdev_filter_matrix(matrix_mdev);
ret = count;
done:
mutex_unlock(&matrix_dev->lock);
@@ -874,6 +932,7 @@ static ssize_t assign_domain_store(struct device *dev,
goto share_err;

vfio_ap_mdev_link_domain(matrix_mdev, apqi);
+ vfio_ap_mdev_filter_matrix(matrix_mdev);
ret = count;
goto done;

@@ -942,6 +1001,7 @@ static ssize_t unassign_domain_store(struct device *dev,

clear_bit_inv((unsigned long)apqi, matrix_mdev->matrix.aqm);
vfio_ap_mdev_unlink_domain(matrix_mdev, apqi);
+ vfio_ap_mdev_filter_matrix(matrix_mdev);
ret = count;

done:
@@ -995,6 +1055,7 @@ static ssize_t assign_control_domain_store(struct device *dev,
* number of control domains that can be assigned.
*/
set_bit_inv(id, matrix_mdev->matrix.adm);
+ vfio_ap_mdev_filter_cdoms(matrix_mdev);
ret = count;
done:
mutex_unlock(&matrix_dev->lock);
@@ -1042,6 +1103,7 @@ static ssize_t unassign_control_domain_store(struct device *dev,
}

clear_bit_inv(domid, matrix_mdev->matrix.adm);
+ clear_bit_inv(domid, matrix_mdev->shadow_apcb.adm);
ret = count;
done:
mutex_unlock(&matrix_dev->lock);
@@ -1179,8 +1241,6 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
kvm_get_kvm(kvm);
matrix_mdev->kvm = kvm;
kvm->arch.crypto.data = matrix_mdev;
- memcpy(&matrix_mdev->shadow_apcb, &matrix_mdev->matrix,
- sizeof(struct ap_matrix));
kvm_arch_crypto_set_masks(kvm, matrix_mdev->shadow_apcb.apm,
matrix_mdev->shadow_apcb.aqm,
matrix_mdev->shadow_apcb.adm);
@@ -1536,6 +1596,8 @@ int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
q->apqn = to_ap_queue(&apdev->device)->qid;
q->saved_isc = VFIO_AP_ISC_INVALID;
vfio_ap_queue_link_mdev(q);
+ if (q->matrix_mdev)
+ vfio_ap_mdev_filter_matrix(q->matrix_mdev);
dev_set_drvdata(&apdev->device, q);
mutex_unlock(&matrix_dev->lock);

--
2.31.1

2021-10-21 15:25:48

by Anthony Krowiak

Subject: [PATCH v17 05/15] s390/vfio-ap: introduce shadow APCB

The APCB is a field within the CRYCB that provides the AP configuration
to a KVM guest. Let's introduce a shadow copy of the KVM guest's APCB and
maintain it for the lifespan of the guest.
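
As a rough illustration of the idea, the sketch below keeps the
administrator's assignments and the guest's view in separate copies. The
names struct toy_matrix, struct toy_mdev and toy_set_kvm() are hypothetical
and the masks are shrunk to single bytes; only the copy-then-hand-over
pattern mirrors what this patch does when the KVM pointer is set.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct ap_matrix. */
struct toy_matrix {
	uint8_t apm;
	uint8_t aqm;
	uint8_t adm;
};

struct toy_mdev {
	struct toy_matrix matrix;	/* what was assigned via sysfs */
	struct toy_matrix shadow_apcb;	/* what the guest is given */
};

/* Models the point at which the guest's KVM pointer is set. */
const struct toy_matrix *toy_set_kvm(struct toy_mdev *mdev)
{
	memcpy(&mdev->shadow_apcb, &mdev->matrix, sizeof(mdev->matrix));

	/* The guest's masks are taken from the shadow copy only. */
	return &mdev->shadow_apcb;
}
```

With two copies, a later change to the assignments does not reach the guest
until the shadow copy is refreshed.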

Signed-off-by: Tony Krowiak <[email protected]>
Reviewed-by: Halil Pasic <[email protected]>
---
drivers/s390/crypto/vfio_ap_ops.c | 10 ++++++----
drivers/s390/crypto/vfio_ap_private.h | 2 ++
2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index e2be29e9d310..4305177029bf 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -332,6 +332,7 @@ static int vfio_ap_mdev_probe(struct mdev_device *mdev)

matrix_mdev->mdev = mdev;
vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
+ vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->shadow_apcb);
hash_init(matrix_mdev->qtable.queues);
mdev_set_drvdata(mdev, matrix_mdev);
mutex_lock(&matrix_dev->lock);
@@ -1178,10 +1179,11 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
kvm_get_kvm(kvm);
matrix_mdev->kvm = kvm;
kvm->arch.crypto.data = matrix_mdev;
- kvm_arch_crypto_set_masks(kvm,
- matrix_mdev->matrix.apm,
- matrix_mdev->matrix.aqm,
- matrix_mdev->matrix.adm);
+ memcpy(&matrix_mdev->shadow_apcb, &matrix_mdev->matrix,
+ sizeof(struct ap_matrix));
+ kvm_arch_crypto_set_masks(kvm, matrix_mdev->shadow_apcb.apm,
+ matrix_mdev->shadow_apcb.aqm,
+ matrix_mdev->shadow_apcb.adm);

mutex_unlock(&kvm->lock);
mutex_unlock(&matrix_dev->lock);
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index c1f57f89973e..6dc0ebbf7a06 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -91,6 +91,7 @@ struct ap_queue_table {
* @node: allows the ap_matrix_mdev struct to be added to a list
* @matrix: the adapters, usage domains and control domains assigned to the
* mediated matrix device.
+ * @shadow_apcb: the shadow copy of the APCB field of the KVM guest's CRYCB
* @group_notifier: notifier block used for specifying callback function for
* handling the VFIO_GROUP_NOTIFY_SET_KVM event
* @iommu_notifier: notifier block used for specifying callback function for
@@ -105,6 +106,7 @@ struct ap_matrix_mdev {
struct vfio_device vdev;
struct list_head node;
struct ap_matrix matrix;
+ struct ap_matrix shadow_apcb;
struct notifier_block group_notifier;
struct notifier_block iommu_notifier;
struct kvm *kvm;
--
2.31.1

2021-10-21 15:26:03

by Anthony Krowiak

Subject: [PATCH v17 08/15] s390/vfio-ap: keep track of active guests

The vfio_ap device driver registers for notification when the pointer to
the KVM object for a guest is set. Let's store the KVM pointer as well as
the pointer to the mediated device when the KVM pointer is set.

The reason for storing the KVM and mediated device pointers is to
facilitate hot plug/unplug of AP queues for a KVM guest when a queue device
is probed or removed. When a guest's APCB is hot plugged into the guest,
the kvm->lock must be taken prior to taking the matrix_dev->lock, or there
is potential for a lockdep splat (see below). Unfortunately, when a queue
is probed or removed, we have no idea whether it is assigned to a guest or
which KVM object is associated with the guest. If we take the
matrix_dev->lock to determine whether the APQN is assigned to a running
guest then subsequently take the kvm->lock, in certain situations that will
result in a lockdep splat:

* see commit 0cc00c8d4050 ("Fix circular lockdep when setting/clearing
crypto masks")

* see commit 86956e70761b ("replace open coded locks for
VFIO_GROUP_NOTIFY_SET_KVM notification")

The reason a lockdep splat can occur has to do with the fact that the
kvm->lock has to be taken before the vcpu->mutex; so, for example, when a
secure execution guest is started, you may end up with the following
scenario:

Interception of PQAP(AQIC) instruction executed on the guest:
------------------------------------------------------------
handle_pqap: matrix_dev->lock
kvm_vcpu_ioctl: vcpu->mutex

Start of secure execution guest:
-------------------------------
kvm_s390_cpus_to_pv: vcpu->mutex
kvm_arch_vm_ioctl: kvm->lock

Queue is unbound from vfio_ap device driver:
-------------------------------------------
kvm->lock
vfio_ap_mdev_remove_queue: matrix_dev->lock
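
The circular dependency in the three scenarios above can be checked with a
toy lock-order graph, which is roughly the kind of reasoning lockdep
automates. This is a userspace sketch, not kernel code: the enum, struct
lock_graph and both functions are hypothetical, and it assumes each listing
above is read with the most recently acquired lock on the first line.

```c
#include <stdbool.h>
#include <string.h>

enum toy_lock { KVM_LOCK, VCPU_MUTEX, MATRIX_DEV_LOCK, NR_LOCKS };

struct lock_graph {
	/* after[a][b]: lock b was acquired while lock a was held */
	bool after[NR_LOCKS][NR_LOCKS];
};

/* Depth-first search: can @target be reached from @from? */
static bool reaches(const struct lock_graph *g, int from, int target,
		    bool *seen)
{
	for (int next = 0; next < NR_LOCKS; next++) {
		if (!g->after[from][next])
			continue;
		if (next == target)
			return true;
		if (!seen[next]) {
			seen[next] = true;
			if (reaches(g, next, target, seen))
				return true;
		}
	}
	return false;
}

bool lock_graph_has_cycle(const struct lock_graph *g)
{
	for (int lock = 0; lock < NR_LOCKS; lock++) {
		bool seen[NR_LOCKS] = { false };

		if (reaches(g, lock, lock, seen))
			return true;
	}
	return false;
}

/* Builds the graph for the scenarios and reports whether it is circular. */
bool queue_removal_is_circular(bool matrix_lock_first)
{
	struct lock_graph g;

	memset(&g, 0, sizeof(g));
	g.after[VCPU_MUTEX][MATRIX_DEV_LOCK] = true;	/* PQAP interception */
	g.after[KVM_LOCK][VCPU_MUTEX] = true;		/* secure guest start */
	if (matrix_lock_first)
		g.after[MATRIX_DEV_LOCK][KVM_LOCK] = true; /* queue unbound */
	else
		g.after[KVM_LOCK][MATRIX_DEV_LOCK] = true;

	return lock_graph_has_cycle(&g);
}
```

Taking matrix_dev->lock before kvm->lock closes the loop kvm->lock,
vcpu->mutex, matrix_dev->lock; taking kvm->lock first leaves the graph
acyclic.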

This patch introduces a new ap_guest structure into which the pointers to
the kvm and matrix_mdev can be stored. It also introduces two new fields
in the struct ap_matrix_dev:

struct ap_matrix_dev {
...
struct rw_semaphore guests_lock;
struct list_head guests;
...
};

The 'guests_lock' field is a r/w semaphore to control access to the
'guests' field. The 'guests' field is a list of ap_guest
structures containing the KVM and matrix_mdev pointers for each active
guest. An ap_guest structure will be stored into the list whenever the
vfio_ap device driver is notified that the KVM pointer has been set and
removed when notified that the KVM pointer has been cleared.
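
The bookkeeping described above might look like the following userspace
sketch. It is only an illustration under assumptions: struct toy_guest and
the three helpers are hypothetical, opaque pointers stand in for the kvm
and matrix_mdev objects, a singly linked list replaces the kernel's list
API, and the caller is assumed to hold the lock that protects the list.

```c
#include <stdlib.h>

/* Hypothetical stand-in for an entry in the list of active guests. */
struct toy_guest {
	const void *kvm;		/* stands in for struct kvm * */
	const void *matrix_mdev;	/* stands in for struct ap_matrix_mdev * */
	struct toy_guest *next;
};

/* Called when the KVM pointer is set; returns 0 on success, -1 on failure. */
int toy_guest_add(struct toy_guest **guests, const void *kvm,
		  const void *matrix_mdev)
{
	struct toy_guest *guest = calloc(1, sizeof(*guest));

	if (!guest)
		return -1;

	guest->kvm = kvm;
	guest->matrix_mdev = matrix_mdev;
	guest->next = *guests;
	*guests = guest;

	return 0;
}

/* Look up the mdev for a guest without touching the mdev itself. */
const void *toy_guest_find_mdev(const struct toy_guest *guests,
				const void *kvm)
{
	for (; guests; guests = guests->next)
		if (guests->kvm == kvm)
			return guests->matrix_mdev;

	return NULL;
}

/* Called when the KVM pointer is cleared; frees the matching entry. */
void toy_guest_remove(struct toy_guest **guests, const void *kvm)
{
	for (struct toy_guest **link = guests; *link; link = &(*link)->next) {
		if ((*link)->kvm == kvm) {
			struct toy_guest *guest = *link;

			*link = guest->next;
			free(guest);
			return;
		}
	}
}
```

A lookup like toy_guest_find_mdev() is what lets a caller learn which KVM
object is involved before deciding which lock to take first.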

Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/vfio_ap_drv.c | 2 ++
drivers/s390/crypto/vfio_ap_ops.c | 44 +++++++++++++++++++++++++--
drivers/s390/crypto/vfio_ap_private.h | 10 ++++++
3 files changed, 53 insertions(+), 3 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
index 5255e338591d..1d1746fe50ea 100644
--- a/drivers/s390/crypto/vfio_ap_drv.c
+++ b/drivers/s390/crypto/vfio_ap_drv.c
@@ -98,6 +98,8 @@ static int vfio_ap_matrix_dev_create(void)

mutex_init(&matrix_dev->lock);
INIT_LIST_HEAD(&matrix_dev->mdev_list);
+ init_rwsem(&matrix_dev->guests_lock);
+ INIT_LIST_HEAD(&matrix_dev->guests);

dev_set_name(&matrix_dev->device, "%s", VFIO_AP_DEV_NAME);
matrix_dev->device.parent = root_device;
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 6b40db6dab3c..a2875cf79091 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -1086,6 +1086,20 @@ static const struct attribute_group *vfio_ap_mdev_attr_groups[] = {
NULL
};

+static int vfio_ap_mdev_create_guest(struct kvm *kvm,
+ struct ap_matrix_mdev *matrix_mdev)
+{
+ struct ap_guest *guest;
+
+ guest = kzalloc(sizeof(*guest), GFP_KERNEL);
+ if (!guest)
+ return -ENOMEM;
+
+ list_add(&guest->node, &matrix_dev->guests);
+
+ return 0;
+}
+
/**
* vfio_ap_mdev_set_kvm - sets all data for @matrix_mdev that are needed
* to manage AP resources for the guest whose state is represented by @kvm
@@ -1106,16 +1120,23 @@ static const struct attribute_group *vfio_ap_mdev_attr_groups[] = {
static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
struct kvm *kvm)
{
+ int ret;
struct ap_matrix_mdev *m;

if (kvm->arch.crypto.crycbd) {
+ down_write(&matrix_dev->guests_lock);
+ ret = vfio_ap_mdev_create_guest(kvm, matrix_mdev);
+ if (WARN_ON(ret))
+ return ret;
+
mutex_lock(&kvm->lock);
mutex_lock(&matrix_dev->lock);

list_for_each_entry(m, &matrix_dev->mdev_list, node) {
if (m != matrix_mdev && m->kvm == kvm) {
- mutex_unlock(&kvm->lock);
mutex_unlock(&matrix_dev->lock);
+ mutex_unlock(&kvm->lock);
+ up_write(&matrix_dev->guests_lock);
return -EPERM;
}
}
@@ -1127,8 +1148,9 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
matrix_mdev->shadow_apcb.aqm,
matrix_mdev->shadow_apcb.adm);

- mutex_unlock(&kvm->lock);
mutex_unlock(&matrix_dev->lock);
+ mutex_unlock(&kvm->lock);
+ up_write(&matrix_dev->guests_lock);
}

return 0;
@@ -1164,6 +1186,18 @@ static int vfio_ap_mdev_iommu_notifier(struct notifier_block *nb,
return NOTIFY_DONE;
}

+static void vfio_ap_mdev_remove_guest(struct kvm *kvm)
+{
+ struct ap_guest *guest, *tmp;
+
+ list_for_each_entry_safe(guest, tmp, &matrix_dev->guests, node) {
+ if (guest->kvm == kvm) {
+ list_del(&guest->node);
+ break;
+ }
+ }
+}
+
/**
* vfio_ap_mdev_unset_kvm - performs clean-up of resources no longer needed
* by @matrix_mdev.
@@ -1182,6 +1216,9 @@ static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev,
struct kvm *kvm)
{
if (kvm && kvm->arch.crypto.crycbd) {
+ down_write(&matrix_dev->guests_lock);
+ vfio_ap_mdev_remove_guest(kvm);
+
mutex_lock(&kvm->lock);
mutex_lock(&matrix_dev->lock);

@@ -1191,8 +1228,9 @@ static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev,
kvm->arch.crypto.data = NULL;
matrix_mdev->kvm = NULL;

- mutex_unlock(&kvm->lock);
mutex_unlock(&matrix_dev->lock);
+ mutex_unlock(&kvm->lock);
+ up_write(&matrix_dev->guests_lock);
}
}

diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index 6dc0ebbf7a06..6d28b287d7bf 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -26,6 +26,11 @@
#define VFIO_AP_MODULE_NAME "vfio_ap"
#define VFIO_AP_DRV_NAME "vfio_ap"

+struct ap_guest {
+ struct kvm *kvm;
+ struct list_head node;
+};
+
/**
* struct ap_matrix_dev - Contains the data for the matrix device.
*
@@ -39,6 +44,9 @@
* single ap_matrix_mdev device. It's quite coarse but we don't
* expect much contention.
* @vfio_ap_drv: the vfio_ap device driver
+ * @guests_lock: r/w semaphore for protecting access to @guests
+ * @guests: list of guests (struct ap_guest) using AP devices bound to the
+ * vfio_ap device driver.
*/
struct ap_matrix_dev {
struct device device;
@@ -47,6 +55,8 @@ struct ap_matrix_dev {
struct list_head mdev_list;
struct mutex lock;
struct ap_driver *vfio_ap_drv;
+ struct rw_semaphore guests_lock;
+ struct list_head guests;
};

extern struct ap_matrix_dev *matrix_dev;
--
2.31.1

2021-10-21 15:26:15

by Anthony Krowiak

Subject: [PATCH v17 07/15] s390/vfio-ap: allow assignment of unavailable AP queues to mdev device

The current implementation allows an AP adapter or domain to be assigned
to an mdev device only if every APQN resulting from the assignment
references an AP queue device that is bound to the vfio_ap device driver.
This patch allows assignment of AP resources to the matrix mdev as long as
the APQNs resulting from the assignment:
1. Are not reserved by the AP bus for use by the zcrypt device drivers.
2. Are not assigned to another matrix mdev.

The rationale is that the AP architecture does not preclude assigning
APQNs that are not available to the system to an AP configuration
profile.
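
The two conditions can be sketched in userspace as follows. This is a
simplified model, not the driver's implementation: struct toy_matrix and
toy_validate_masks() are hypothetical, the masks are shrunk to 8 bits, and
a set bit in def_drv is assumed to mark an adapter or domain reserved for
the default (zcrypt) drivers.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the adapter and domain masks. */
struct toy_matrix {
	uint8_t apm;	/* bit n set: adapter (APID) n */
	uint8_t aqm;	/* bit n set: usage domain (APQI) n */
};

int toy_validate_masks(const struct toy_matrix *self,
		       const struct toy_matrix *def_drv,
		       const struct toy_matrix *others, size_t nr_others)
{
	/*
	 * Condition 1: an APQN is reserved when both its APID and its APQI
	 * are reserved, so both intersections must be non-empty to fail.
	 */
	if ((self->apm & def_drv->apm) && (self->aqm & def_drv->aqm))
		return -EADDRNOTAVAIL;

	/* Condition 2: no APQN may also belong to another mdev's matrix. */
	for (size_t i = 0; i < nr_others; i++) {
		if ((self->apm & others[i].apm) &&
		    (self->aqm & others[i].aqm))
			return -EADDRINUSE;
	}

	return 0;
}
```

Neither check asks whether a queue device exists for the APQN, which is the
point of the change.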

Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/vfio_ap_ops.c | 224 +++++++-----------------------
1 file changed, 53 insertions(+), 171 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 46c179363aca..6b40db6dab3c 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -520,141 +520,48 @@ static struct attribute_group *vfio_ap_mdev_type_groups[] = {
NULL,
};

-struct vfio_ap_queue_reserved {
- unsigned long *apid;
- unsigned long *apqi;
- bool reserved;
-};
+#define MDEV_SHARING_ERR "Userspace may not re-assign queue %02lx.%04lx " \
+ "already assigned to %s"

-/**
- * vfio_ap_has_queue - determines if the AP queue containing the target in @data
- *
- * @dev: an AP queue device
- * @data: a struct vfio_ap_queue_reserved reference
- *
- * Flags whether the AP queue device (@dev) has a queue ID containing the APQN,
- * apid or apqi specified in @data:
- *
- * - If @data contains both an apid and apqi value, then @data will be flagged
- * as reserved if the APID and APQI fields for the AP queue device matches
- *
- * - If @data contains only an apid value, @data will be flagged as
- * reserved if the APID field in the AP queue device matches
- *
- * - If @data contains only an apqi value, @data will be flagged as
- * reserved if the APQI field in the AP queue device matches
- *
- * Return: 0 to indicate the input to function succeeded. Returns -EINVAL if
- * @data does not contain either an apid or apqi.
- */
-static int vfio_ap_has_queue(struct device *dev, void *data)
+static void vfio_ap_mdev_log_sharing_err(struct ap_matrix_mdev *matrix_mdev,
+ unsigned long *apm,
+ unsigned long *aqm)
{
- struct vfio_ap_queue_reserved *qres = data;
- struct ap_queue *ap_queue = to_ap_queue(dev);
- ap_qid_t qid;
- unsigned long id;
-
- if (qres->apid && qres->apqi) {
- qid = AP_MKQID(*qres->apid, *qres->apqi);
- if (qid == ap_queue->qid)
- qres->reserved = true;
- } else if (qres->apid && !qres->apqi) {
- id = AP_QID_CARD(ap_queue->qid);
- if (id == *qres->apid)
- qres->reserved = true;
- } else if (!qres->apid && qres->apqi) {
- id = AP_QID_QUEUE(ap_queue->qid);
- if (id == *qres->apqi)
- qres->reserved = true;
- } else {
- return -EINVAL;
- }
-
- return 0;
-}
-
-/**
- * vfio_ap_verify_queue_reserved - verifies that the AP queue containing
- * @apid or @aqpi is reserved
- *
- * @apid: an AP adapter ID
- * @apqi: an AP queue index
- *
- * Verifies that the AP queue with @apid/@apqi is reserved by the VFIO AP device
- * driver according to the following rules:
- *
- * - If both @apid and @apqi are not NULL, then there must be an AP queue
- * device bound to the vfio_ap driver with the APQN identified by @apid and
- * @apqi
- *
- * - If only @apid is not NULL, then there must be an AP queue device bound
- * to the vfio_ap driver with an APQN containing @apid
- *
- * - If only @apqi is not NULL, then there must be an AP queue device bound
- * to the vfio_ap driver with an APQN containing @apqi
- *
- * Return: 0 if the AP queue is reserved; otherwise, returns -EADDRNOTAVAIL.
- */
-static int vfio_ap_verify_queue_reserved(unsigned long *apid,
- unsigned long *apqi)
-{
- int ret;
- struct vfio_ap_queue_reserved qres;
-
- qres.apid = apid;
- qres.apqi = apqi;
- qres.reserved = false;
-
- ret = driver_for_each_device(&matrix_dev->vfio_ap_drv->driver, NULL,
- &qres, vfio_ap_has_queue);
- if (ret)
- return ret;
-
- if (qres.reserved)
- return 0;
-
- return -EADDRNOTAVAIL;
-}
-
-static int
-vfio_ap_mdev_verify_queues_reserved_for_apid(struct ap_matrix_mdev *matrix_mdev,
- unsigned long apid)
-{
- int ret;
- unsigned long apqi;
- unsigned long nbits = matrix_mdev->matrix.aqm_max + 1;
-
- if (find_first_bit_inv(matrix_mdev->matrix.aqm, nbits) >= nbits)
- return vfio_ap_verify_queue_reserved(&apid, NULL);
-
- for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, nbits) {
- ret = vfio_ap_verify_queue_reserved(&apid, &apqi);
- if (ret)
- return ret;
- }
+ unsigned long apid, apqi;
+ const struct device *dev = mdev_dev(matrix_mdev->mdev);
+ const char *mdev_name = dev_name(dev);

- return 0;
+ for_each_set_bit_inv(apid, apm, AP_DEVICES)
+ for_each_set_bit_inv(apqi, aqm, AP_DOMAINS)
+ dev_warn(dev, MDEV_SHARING_ERR, apid, apqi, mdev_name);
}

/**
- * vfio_ap_mdev_verify_no_sharing - verifies that the AP matrix is not configured
+ * vfio_ap_mdev_verify_no_sharing - verify APQNs are not shared by matrix mdevs
*
- * @matrix_mdev: the mediated matrix device
+ * @mdev_apm: mask indicating the APIDs of the APQNs to be verified
+ * @mdev_aqm: mask indicating the APQIs of the APQNs to be verified
*
- * Verifies that the APQNs derived from the cross product of the AP adapter IDs
- * and AP queue indexes comprising the AP matrix are not configured for another
+ * Verifies that each APQN derived from the Cartesian product of a bitmap of
+ * AP adapter IDs and AP queue indexes is not configured for any matrix
* mediated device. AP queue sharing is not allowed.
*
- * Return: 0 if the APQNs are not shared; otherwise returns -EADDRINUSE.
+ * Return: 0 if the APQNs are not shared; otherwise return -EADDRINUSE.
*/
-static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
+static int vfio_ap_mdev_verify_no_sharing(unsigned long *mdev_apm,
+ unsigned long *mdev_aqm)
{
- struct ap_matrix_mdev *lstdev;
+ struct ap_matrix_mdev *matrix_mdev;
DECLARE_BITMAP(apm, AP_DEVICES);
DECLARE_BITMAP(aqm, AP_DOMAINS);

- list_for_each_entry(lstdev, &matrix_dev->mdev_list, node) {
- if (matrix_mdev == lstdev)
+ list_for_each_entry(matrix_mdev, &matrix_dev->mdev_list, node) {
+ /*
+ * If the input apm and aqm belong to the matrix_mdev's matrix,
+ * then move on to the next.
+ */
+ if (mdev_apm == matrix_mdev->matrix.apm &&
+ mdev_aqm == matrix_mdev->matrix.aqm)
continue;

memset(apm, 0, sizeof(apm));
@@ -664,20 +571,32 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
* We work on full longs, as we can only exclude the leftover
* bits in non-inverse order. The leftover is all zeros.
*/
- if (!bitmap_and(apm, matrix_mdev->matrix.apm,
- lstdev->matrix.apm, AP_DEVICES))
+ if (!bitmap_and(apm, mdev_apm, matrix_mdev->matrix.apm,
+ AP_DEVICES))
continue;

- if (!bitmap_and(aqm, matrix_mdev->matrix.aqm,
- lstdev->matrix.aqm, AP_DOMAINS))
+ if (!bitmap_and(aqm, mdev_aqm, matrix_mdev->matrix.aqm,
+ AP_DOMAINS))
continue;

+ vfio_ap_mdev_log_sharing_err(matrix_mdev, apm, aqm);
+
return -EADDRINUSE;
}

return 0;
}

+static int vfio_ap_mdev_validate_masks(struct ap_matrix_mdev *matrix_mdev)
+{
+ if (ap_apqn_in_matrix_owned_by_def_drv(matrix_mdev->matrix.apm,
+ matrix_mdev->matrix.aqm))
+ return -EADDRNOTAVAIL;
+
+ return vfio_ap_mdev_verify_no_sharing(matrix_mdev->matrix.apm,
+ matrix_mdev->matrix.aqm);
+}
+
static void vfio_ap_mdev_link_adapter(struct ap_matrix_mdev *matrix_mdev,
unsigned long apid)
{
@@ -743,28 +662,17 @@ static ssize_t assign_adapter_store(struct device *dev,
goto done;
}

- /*
- * Set the bit in the AP mask (APM) corresponding to the AP adapter
- * number (APID). The bits in the mask, from most significant to least
- * significant bit, correspond to APIDs 0-255.
- */
- ret = vfio_ap_mdev_verify_queues_reserved_for_apid(matrix_mdev, apid);
- if (ret)
- goto done;
-
set_bit_inv(apid, matrix_mdev->matrix.apm);

- ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev);
- if (ret)
- goto share_err;
+ ret = vfio_ap_mdev_validate_masks(matrix_mdev);
+ if (ret) {
+ clear_bit_inv(apid, matrix_mdev->matrix.apm);
+ goto done;
+ }

vfio_ap_mdev_link_adapter(matrix_mdev, apid);
vfio_ap_mdev_filter_matrix(matrix_mdev);
ret = count;
- goto done;
-
-share_err:
- clear_bit_inv(apid, matrix_mdev->matrix.apm);
done:
mutex_unlock(&matrix_dev->lock);

@@ -836,26 +744,6 @@ static ssize_t unassign_adapter_store(struct device *dev,
}
static DEVICE_ATTR_WO(unassign_adapter);

-static int
-vfio_ap_mdev_verify_queues_reserved_for_apqi(struct ap_matrix_mdev *matrix_mdev,
- unsigned long apqi)
-{
- int ret;
- unsigned long apid;
- unsigned long nbits = matrix_mdev->matrix.apm_max + 1;
-
- if (find_first_bit_inv(matrix_mdev->matrix.apm, nbits) >= nbits)
- return vfio_ap_verify_queue_reserved(NULL, &apqi);
-
- for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, nbits) {
- ret = vfio_ap_verify_queue_reserved(&apid, &apqi);
- if (ret)
- return ret;
- }
-
- return 0;
-}
-
static void vfio_ap_mdev_link_domain(struct ap_matrix_mdev *matrix_mdev,
unsigned long apqi)
{
@@ -921,23 +809,17 @@ static ssize_t assign_domain_store(struct device *dev,
goto done;
}

- ret = vfio_ap_mdev_verify_queues_reserved_for_apqi(matrix_mdev, apqi);
- if (ret)
- goto done;
-
set_bit_inv(apqi, matrix_mdev->matrix.aqm);

- ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev);
- if (ret)
- goto share_err;
+ ret = vfio_ap_mdev_validate_masks(matrix_mdev);
+ if (ret) {
+ clear_bit_inv(apqi, matrix_mdev->matrix.aqm);
+ goto done;
+ }

vfio_ap_mdev_link_domain(matrix_mdev, apqi);
vfio_ap_mdev_filter_matrix(matrix_mdev);
ret = count;
- goto done;
-
-share_err:
- clear_bit_inv(apqi, matrix_mdev->matrix.aqm);
done:
mutex_unlock(&matrix_dev->lock);

--
2.31.1

2021-10-21 15:26:23

by Anthony Krowiak

Subject: [PATCH v17 10/15] s390/vfio-ap: reset queues after adapter/domain unassignment

When an adapter or domain is unassigned from an mdev providing the AP
configuration to a running KVM guest, one or more of the guest's queues may
get dynamically removed. Since the removed queues could get re-assigned to
another mdev, they need to be reset. So, when an adapter or domain is
unassigned from the mdev, the queues that are removed from the guest's
AP configuration will be reset.
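
A toy model of that sequence is sketched below, under stated assumptions:
struct toy_mdev and toy_unassign_adapter() are hypothetical, the masks are
8 bits wide with LSB-first numbering, and a "reset" is only recorded in a
flag array rather than issued to hardware.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_IDS 8	/* toy size; the real masks cover 256 IDs */

struct toy_mdev {
	uint8_t apm, aqm;		/* assigned to the mdev */
	uint8_t shadow_apm, shadow_aqm;	/* currently in the guest's APCB */
};

/*
 * Unassign adapter @apid. reset[apqi] is set for every queue of that
 * adapter the guest could use until now. Returns the number of such queues.
 */
int toy_unassign_adapter(struct toy_mdev *mdev, int apid, bool reset[NR_IDS])
{
	bool in_apcb = mdev->shadow_apm & (1u << apid);
	int nr_reset = 0;

	/* Note which queues are about to be taken away from the guest. */
	for (int apqi = 0; apqi < NR_IDS; apqi++) {
		reset[apqi] = in_apcb && (mdev->shadow_aqm & (1u << apqi));
		if (reset[apqi])
			nr_reset++;
	}

	/* Remove the adapter from the mdev and from the guest's view. */
	mdev->apm &= (uint8_t)~(1u << apid);
	mdev->shadow_apm &= (uint8_t)~(1u << apid);

	return nr_reset;
}
```

Queues that were assigned to the mdev but never made it into the guest's
APCB are not flagged, since the guest had no access to them.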

Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/vfio_ap_ops.c | 136 +++++++++++++++++++-------
drivers/s390/crypto/vfio_ap_private.h | 2 +
2 files changed, 100 insertions(+), 38 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 5a484e7afbd0..6b292ed30ada 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -24,9 +24,10 @@
#define VFIO_AP_MDEV_TYPE_HWVIRT "passthrough"
#define VFIO_AP_MDEV_NAME_HWVIRT "VFIO AP Passthrough Device"

-static int vfio_ap_mdev_reset_queues(struct ap_matrix_mdev *matrix_mdev);
+static int vfio_ap_mdev_reset_queues(struct ap_queue_table *qtable);
static struct vfio_ap_queue *vfio_ap_find_queue(int apqn);
static const struct vfio_device_ops vfio_ap_matrix_dev_ops;
+static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q, unsigned int retry);

/**
* vfio_ap_mdev_get_queue - retrieve a queue with a specific APQN from a
@@ -352,6 +353,7 @@ static bool vfio_ap_mdev_filter_matrix(struct ap_matrix_mdev *matrix_mdev)
unsigned long apid, apqi, apqn;
DECLARE_BITMAP(shadow_apm, AP_DEVICES);
DECLARE_BITMAP(shadow_aqm, AP_DOMAINS);
+ struct vfio_ap_queue *q;

ret = ap_qci(&matrix_dev->info);
if (ret)
@@ -383,7 +385,8 @@ static bool vfio_ap_mdev_filter_matrix(struct ap_matrix_mdev *matrix_mdev)
* filter the APQI.
*/
apqn = AP_MKQID(apid, apqi);
- if (!vfio_ap_mdev_get_queue(matrix_mdev, apqn)) {
+ q = vfio_ap_mdev_get_queue(matrix_mdev, apqn);
+ if (!q || q->reset_rc) {
clear_bit_inv(apid,
matrix_mdev->shadow_apcb.apm);
break;
@@ -466,12 +469,6 @@ static void vfio_ap_unlink_mdev_fr_queue(struct vfio_ap_queue *q)
q->matrix_mdev = NULL;
}

-static void vfio_ap_mdev_unlink_queue(struct vfio_ap_queue *q)
-{
- vfio_ap_unlink_queue_fr_mdev(q);
- vfio_ap_unlink_mdev_fr_queue(q);
-}
-
static void vfio_ap_mdev_unlink_fr_queues(struct ap_matrix_mdev *matrix_mdev)
{
struct vfio_ap_queue *q;
@@ -495,7 +492,7 @@ static void vfio_ap_mdev_remove(struct mdev_device *mdev)
vfio_unregister_group_dev(&matrix_mdev->vdev);

mutex_lock(&matrix_dev->lock);
- vfio_ap_mdev_reset_queues(matrix_mdev);
+ vfio_ap_mdev_reset_queues(&matrix_mdev->qtable);
vfio_ap_mdev_unlink_fr_queues(matrix_mdev);
list_del(&matrix_mdev->node);
mutex_unlock(&matrix_dev->lock);
@@ -732,17 +729,59 @@ static ssize_t assign_adapter_store(struct device *dev,
}
static DEVICE_ATTR_WO(assign_adapter);

+static void vfio_ap_unlink_apqn_fr_mdev(struct ap_matrix_mdev *matrix_mdev,
+ unsigned long apid, unsigned long apqi,
+ struct ap_queue_table *qtable)
+{
+ struct vfio_ap_queue *q;
+
+ q = vfio_ap_mdev_get_queue(matrix_mdev, AP_MKQID(apid, apqi));
+ /* If the queue is assigned to the matrix mdev, unlink it. */
+ if (q)
+ vfio_ap_unlink_queue_fr_mdev(q);
+
+ /* If the queue is assigned to the APCB, store it in @qtable. */
+ if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm) &&
+ test_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm))
+ hash_add(qtable->queues, &q->mdev_qnode, q->apqn);
+}
+
+/**
+ * vfio_ap_mdev_unlink_adapter - unlink all queues associated with unassigned
+ * adapter from the matrix mdev to which the
+ * adapter was assigned.
+ * @matrix_mdev: the matrix mediated device to which the adapter was assigned.
+ * @apid: the APID of the unassigned adapter.
+ * @qtable: table for storing queues associated with unassigned adapter.
+ */
static void vfio_ap_mdev_unlink_adapter(struct ap_matrix_mdev *matrix_mdev,
- unsigned long apid)
+ unsigned long apid,
+ struct ap_queue_table *qtable)
{
unsigned long apqi;
+
+ for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, AP_DOMAINS)
+ vfio_ap_unlink_apqn_fr_mdev(matrix_mdev, apid, apqi, qtable);
+}
+
+static void vfio_ap_mdev_hot_unplug_adapter(struct ap_matrix_mdev *matrix_mdev,
+ unsigned long apid)
+{
+ int bkt;
struct vfio_ap_queue *q;
+ struct ap_queue_table qtable;
+
+ hash_init(qtable.queues);
+ vfio_ap_mdev_unlink_adapter(matrix_mdev, apid, &qtable);

- for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, AP_DOMAINS) {
- q = vfio_ap_mdev_get_queue(matrix_mdev, AP_MKQID(apid, apqi));
+ if (vfio_ap_mdev_filter_matrix(matrix_mdev))
+ vfio_ap_mdev_hotplug_apcb(matrix_mdev);

- if (q)
- vfio_ap_mdev_unlink_queue(q);
+ vfio_ap_mdev_reset_queues(&qtable);
+
+ hash_for_each(qtable.queues, bkt, q, mdev_qnode) {
+ vfio_ap_unlink_mdev_fr_queue(q);
+ hash_del(&q->mdev_qnode);
}
}

@@ -778,11 +817,7 @@ static ssize_t unassign_adapter_store(struct device *dev,

vfio_ap_mdev_get_locks(matrix_mdev);
clear_bit_inv((unsigned long)apid, matrix_mdev->matrix.apm);
- vfio_ap_mdev_unlink_adapter(matrix_mdev, apid);
-
- if (vfio_ap_mdev_filter_matrix(matrix_mdev))
- vfio_ap_mdev_hotplug_apcb(matrix_mdev);
-
+ vfio_ap_mdev_hot_unplug_adapter(matrix_mdev, apid);
vfio_ap_mdev_put_locks(matrix_mdev);

return count;
@@ -867,16 +902,33 @@ static ssize_t assign_domain_store(struct device *dev,
static DEVICE_ATTR_WO(assign_domain);

static void vfio_ap_mdev_unlink_domain(struct ap_matrix_mdev *matrix_mdev,
- unsigned long apqi)
+ unsigned long apqi,
+ struct ap_queue_table *qtable)
{
unsigned long apid;
+
+ for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES)
+ vfio_ap_unlink_apqn_fr_mdev(matrix_mdev, apid, apqi, qtable);
+}
+
+static void vfio_ap_mdev_hot_unplug_domain(struct ap_matrix_mdev *matrix_mdev,
+ unsigned long apqi)
+{
+ int bkt;
struct vfio_ap_queue *q;
+ struct ap_queue_table qtable;

- for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {
- q = vfio_ap_mdev_get_queue(matrix_mdev, AP_MKQID(apid, apqi));
+ hash_init(qtable.queues);
+ vfio_ap_mdev_unlink_domain(matrix_mdev, apqi, &qtable);
+
+ if (vfio_ap_mdev_filter_matrix(matrix_mdev))
+ vfio_ap_mdev_hotplug_apcb(matrix_mdev);
+
+ vfio_ap_mdev_reset_queues(&qtable);

- if (q)
- vfio_ap_mdev_unlink_queue(q);
+ hash_for_each(qtable.queues, bkt, q, mdev_qnode) {
+ vfio_ap_unlink_mdev_fr_queue(q);
+ hash_del(&q->mdev_qnode);
}
}

@@ -912,11 +964,7 @@ static ssize_t unassign_domain_store(struct device *dev,

vfio_ap_mdev_get_locks(matrix_mdev);
clear_bit_inv((unsigned long)apqi, matrix_mdev->matrix.aqm);
- vfio_ap_mdev_unlink_domain(matrix_mdev, apqi);
-
- if (vfio_ap_mdev_filter_matrix(matrix_mdev))
- vfio_ap_mdev_hotplug_apcb(matrix_mdev);
-
+ vfio_ap_mdev_hot_unplug_domain(matrix_mdev, apqi);
vfio_ap_mdev_put_locks(matrix_mdev);

return count;
@@ -1243,7 +1291,7 @@ static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev)
mutex_lock(&matrix_dev->lock);

kvm_arch_crypto_clear_masks(matrix_mdev->guest->kvm);
- vfio_ap_mdev_reset_queues(matrix_mdev);
+ vfio_ap_mdev_reset_queues(&matrix_mdev->qtable);
matrix_mdev->guest->kvm->arch.crypto.data = NULL;
kvm_put_kvm(matrix_mdev->guest->kvm);

@@ -1299,12 +1347,14 @@ static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q, unsigned int retry)

if (!q)
return 0;
+ q->reset_rc = 0;

retry_zapq:
status = ap_zapq(q->apqn);
switch (status.response_code) {
case AP_RESPONSE_NORMAL:
ret = 0;
+ q->reset_rc = status.response_code;
break;
case AP_RESPONSE_RESET_IN_PROGRESS:
if (retry--) {
@@ -1316,13 +1366,20 @@ static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q, unsigned int retry)
case AP_RESPONSE_Q_NOT_AVAIL:
case AP_RESPONSE_DECONFIGURED:
case AP_RESPONSE_CHECKSTOPPED:
- WARN_ON_ONCE(status.irq_enabled);
+ WARN_ONCE(status.irq_enabled,
+ "PQAP/ZAPQ for %02x.%04x failed with rc=%u while IRQ enabled",
+ AP_QID_CARD(q->apqn), AP_QID_QUEUE(q->apqn),
+ status.response_code);
+ q->reset_rc = status.response_code;
ret = -EBUSY;
goto free_resources;
default:
/* things are really broken, give up */
- WARN(true, "PQAP/ZAPQ completed with invalid rc (%x)\n",
+ WARN(true,
+ "PQAP/ZAPQ for %02x.%04x failed with invalid rc=%u\n",
+ AP_QID_CARD(q->apqn), AP_QID_QUEUE(q->apqn),
status.response_code);
+ q->reset_rc = status.response_code;
return -EIO;
}

@@ -1333,7 +1390,8 @@ static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q, unsigned int retry)
msleep(20);
status = ap_tapq(q->apqn, NULL);
}
- WARN_ON_ONCE(retry2 <= 0);
+ WARN_ONCE(retry2 <= 0, "unable to verify reset of queue %02x.%04x",
+ AP_QID_CARD(q->apqn), AP_QID_QUEUE(q->apqn));

free_resources:
vfio_ap_free_aqic_resources(q);
@@ -1341,20 +1399,22 @@ static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q, unsigned int retry)
return ret;
}

-static int vfio_ap_mdev_reset_queues(struct ap_matrix_mdev *matrix_mdev)
+static int vfio_ap_mdev_reset_queues(struct ap_queue_table *qtable)
{
- int ret, bkt, rc = 0;
+ int rc = 0, ret, bkt;
struct vfio_ap_queue *q;

- hash_for_each(matrix_mdev->qtable.queues, bkt, q, mdev_qnode) {
+ hash_for_each(qtable->queues, bkt, q, mdev_qnode) {
ret = vfio_ap_mdev_reset_queue(q, 1);
/*
* Regardless whether a queue turns out to be busy, or
* is not operational, we need to continue resetting
* the remaining queues.
*/
- if (ret)
+ if (ret) {
rc = ret;
+ q->reset_rc = ret;
+ }
}

return rc;
@@ -1434,7 +1494,7 @@ static ssize_t vfio_ap_mdev_ioctl(struct vfio_device *vdev,
ret = vfio_ap_mdev_get_device_info(arg);
break;
case VFIO_DEVICE_RESET:
- ret = vfio_ap_mdev_reset_queues(matrix_mdev);
+ ret = vfio_ap_mdev_reset_queues(&matrix_mdev->qtable);
break;
default:
ret = -EOPNOTSUPP;
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index 0e825ffbd0cc..5d59bba8b153 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -131,6 +131,7 @@ struct ap_matrix_mdev {
* @apqn: the APQN of the AP queue device
* @saved_isc: the guest ISC registered with the GIB interface
* @mdev_qnode: allows the vfio_ap_queue struct to be added to a hashtable
+ * @reset_rc: the status response code from the last reset of the queue
*/
struct vfio_ap_queue {
struct ap_matrix_mdev *matrix_mdev;
@@ -139,6 +140,7 @@ struct vfio_ap_queue {
#define VFIO_AP_ISC_INVALID 0xff
unsigned char saved_isc;
struct hlist_node mdev_qnode;
+ unsigned int reset_rc;
};

int vfio_ap_mdev_register(void);
--
2.31.1

2021-10-21 15:26:23

by Anthony Krowiak

Subject: [PATCH v17 09/15] s390/vfio-ap: allow hot plug/unplug of AP resources using mdev device

Let's allow adapters, domains and control domains to be hot plugged into
and hot unplugged from a KVM guest using a matrix mdev when:

* The adapter, domain or control domain is assigned to or unassigned from
the matrix mdev

* A queue device with an APQN assigned to the matrix mdev is bound to or
unbound from the vfio_ap device driver.

Whenever an adapter, domain or control domain is assigned or unassigned,
and whenever a queue device is bound or unbound, the AP configuration
assigned to the matrix mediated device is filtered and the result is
stored in the AP control block (APCB) that supplies the AP configuration
to the guest. Filtering ensures that the APCB contains no adapter, domain
or control domain that is missing from the host's AP configuration, and
no APQN that does not reference a queue device bound to the vfio_ap
device driver.
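
The filtering described above can be sketched in user-space C. This is
only an illustration under stated assumptions, not the driver code: the
names sketch_matrix, sketch_filter_matrix and SKETCH_MAX_IDS are
hypothetical, the masks are shrunk to 8 bits, and plain LSB-first bit
numbering is used instead of the kernel's inverted bit order.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_IDS 8	/* hypothetical: tiny stand-in for the real mask sizes */

struct sketch_matrix {
	uint8_t apm;	/* adapters: bit n set means APID n is present */
	uint8_t aqm;	/* domains: bit n set means APQI n is present */
};

/*
 * bound[apid] has bit apqi set when queue (apid, apqi) is bound to the
 * pass-through driver. Returns true when the shadow copy was changed,
 * which is the condition for hot plugging it into a running guest.
 */
static bool sketch_filter_matrix(const struct sketch_matrix *assigned,
				 const struct sketch_matrix *host,
				 const uint8_t bound[SKETCH_MAX_IDS],
				 struct sketch_matrix *shadow)
{
	struct sketch_matrix old = *shadow;
	unsigned int apid, apqi;

	/* Keep only the adapters and domains the host actually has. */
	shadow->apm = assigned->apm & host->apm;
	shadow->aqm = assigned->aqm & host->aqm;

	for (apid = 0; apid < SKETCH_MAX_IDS; apid++) {
		if (!(shadow->apm & (1u << apid)))
			continue;
		for (apqi = 0; apqi < SKETCH_MAX_IDS; apqi++) {
			if (!(shadow->aqm & (1u << apqi)))
				continue;
			/* One unbound queue filters out the whole adapter. */
			if (!(bound[apid] & (1u << apqi))) {
				shadow->apm &= (uint8_t)~(1u << apid);
				break;
			}
		}
	}

	return old.apm != shadow->apm || old.aqm != shadow->aqm;
}
```

Returning whether the shadow copy changed lets the caller skip the guest
update when filtering produced the same configuration as before.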

If filtering changes the APCB and the mdev is in use by a KVM guest, the
APCB is hot plugged into the guest to dynamically provide access to the
adapters, domains and control domains in the newly refreshed APCB.

Keep in mind that the kvm->lock must be taken outside of the
matrix_dev->lock to avoid circular lock dependencies (i.e., a lockdep
splat). This necessitates taking the matrix_dev->guests_lock in order
to find the guest(s) in the matrix_dev->guests list to which the affected
APQN(s) may be assigned. The kvm->lock can then be taken prior to the
matrix_dev->lock and the APCB plugged into the guest without any problem.

Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/vfio_ap_ops.c | 388 ++++++++++++++++----------
drivers/s390/crypto/vfio_ap_private.h | 7 +-
2 files changed, 238 insertions(+), 157 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index a2875cf79091..5a484e7afbd0 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -105,8 +105,8 @@ static void vfio_ap_free_aqic_resources(struct vfio_ap_queue *q)
if (!q)
return;
if (q->saved_isc != VFIO_AP_ISC_INVALID &&
- !WARN_ON(!(q->matrix_mdev && q->matrix_mdev->kvm))) {
- kvm_s390_gisc_unregister(q->matrix_mdev->kvm, q->saved_isc);
+ !WARN_ON(!(q->matrix_mdev && q->matrix_mdev->guest->kvm))) {
+ kvm_s390_gisc_unregister(q->matrix_mdev->guest->kvm, q->saved_isc);
q->saved_isc = VFIO_AP_ISC_INVALID;
}
if (q->saved_pfn && !WARN_ON(!q->matrix_mdev)) {
@@ -211,7 +211,7 @@ static struct ap_queue_status vfio_ap_irq_enable(struct vfio_ap_queue *q,
return status;
}

- kvm = q->matrix_mdev->kvm;
+ kvm = q->matrix_mdev->guest->kvm;
gisa = kvm->arch.gisa_int.origin;

h_nib = (h_pfn << PAGE_SHIFT) | (nib & ~PAGE_MASK);
@@ -283,7 +283,7 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
matrix_mdev = vcpu->kvm->arch.crypto.data;

/* If the there is no guest using the mdev, there is nothing to do */
- if (!matrix_mdev || !matrix_mdev->kvm)
+ if (!matrix_mdev || !matrix_mdev->guest->kvm)
goto out_unlock;

q = vfio_ap_mdev_get_queue(matrix_mdev, apqn);
@@ -314,10 +314,25 @@ static void vfio_ap_matrix_init(struct ap_config_info *info,
matrix->adm_max = info->apxa ? info->Nd : 15;
}

-static void vfio_ap_mdev_filter_cdoms(struct ap_matrix_mdev *matrix_mdev)
+static void vfio_ap_mdev_hotplug_apcb(struct ap_matrix_mdev *matrix_mdev)
{
+ if (matrix_mdev->guest->kvm)
+ kvm_arch_crypto_set_masks(matrix_mdev->guest->kvm,
+ matrix_mdev->shadow_apcb.apm,
+ matrix_mdev->shadow_apcb.aqm,
+ matrix_mdev->shadow_apcb.adm);
+}
+
+static bool vfio_ap_mdev_filter_cdoms(struct ap_matrix_mdev *matrix_mdev)
+{
+ DECLARE_BITMAP(shadow_adm, AP_DOMAINS);
+
+ bitmap_copy(shadow_adm, matrix_mdev->shadow_apcb.adm, AP_DOMAINS);
bitmap_and(matrix_mdev->shadow_apcb.adm, matrix_mdev->matrix.adm,
(unsigned long *)matrix_dev->info.adm, AP_DOMAINS);
+
+ return !bitmap_equal(shadow_adm, matrix_mdev->shadow_apcb.adm,
+ AP_DOMAINS);
}

/*
@@ -327,16 +342,23 @@ static void vfio_ap_mdev_filter_cdoms(struct ap_matrix_mdev *matrix_mdev)
* queue device bound to the vfio_ap device driver.
*
* @matrix_mdev: the mdev whose AP configuration is to be filtered.
+ *
+ * Return: a boolean value indicating whether the KVM guest's APCB was changed
+ * by the filtering or not.
*/
-static void vfio_ap_mdev_filter_matrix(struct ap_matrix_mdev *matrix_mdev)
+static bool vfio_ap_mdev_filter_matrix(struct ap_matrix_mdev *matrix_mdev)
{
int ret;
unsigned long apid, apqi, apqn;
+ DECLARE_BITMAP(shadow_apm, AP_DEVICES);
+ DECLARE_BITMAP(shadow_aqm, AP_DOMAINS);

ret = ap_qci(&matrix_dev->info);
if (ret)
- return;
+ return false;

+ bitmap_copy(shadow_apm, matrix_mdev->shadow_apcb.apm, AP_DEVICES);
+ bitmap_copy(shadow_aqm, matrix_mdev->shadow_apcb.aqm, AP_DOMAINS);
vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->shadow_apcb);

/*
@@ -368,6 +390,11 @@ static void vfio_ap_mdev_filter_matrix(struct ap_matrix_mdev *matrix_mdev)
}
}
}
+
+ return !bitmap_equal(shadow_apm, matrix_mdev->shadow_apcb.apm,
+ AP_DEVICES) ||
+ !bitmap_equal(shadow_aqm, matrix_mdev->shadow_apcb.aqm,
+ AP_DOMAINS);
}

static int vfio_ap_mdev_probe(struct mdev_device *mdev)
@@ -607,6 +634,37 @@ static void vfio_ap_mdev_link_adapter(struct ap_matrix_mdev *matrix_mdev,
AP_MKQID(apid, apqi));
}

+/**
+ * vfio_ap_mdev_get_locks - lock the kvm->lock and matrix_dev->lock mutexes
+ *
+ * @matrix_mdev: the matrix mediated device object
+ */
+static void vfio_ap_mdev_get_locks(struct ap_matrix_mdev *matrix_mdev)
+{
+ down_read(&matrix_dev->guests_lock);
+
+ /* The kvm->lock must be taken before the matrix_dev->lock */
+ if (matrix_mdev->guest)
+ mutex_lock(&matrix_mdev->guest->kvm->lock);
+
+ mutex_lock(&matrix_dev->lock);
+}
+
+/**
+ * vfio_ap_mdev_put_locks - release the kvm->lock and matrix_dev->lock mutexes
+ *
+ * @matrix_mdev: the matrix mediated device object
+ */
+static void vfio_ap_mdev_put_locks(struct ap_matrix_mdev *matrix_mdev)
+{
+ /* The kvm->lock must be taken before the matrix_dev->lock */
+ if (matrix_mdev->guest)
+ mutex_unlock(&matrix_mdev->guest->kvm->lock);
+
+ mutex_unlock(&matrix_dev->lock);
+ up_read(&matrix_dev->guests_lock);
+}
+
/**
* assign_adapter_store - parses the APID from @buf and sets the
* corresponding bit in the mediated matrix device's APM
@@ -645,23 +703,14 @@ static ssize_t assign_adapter_store(struct device *dev,
unsigned long apid;
struct ap_matrix_mdev *matrix_mdev = dev_get_drvdata(dev);

- mutex_lock(&matrix_dev->lock);
-
- /* If the KVM guest is running, disallow assignment of adapter */
- if (matrix_mdev->kvm) {
- ret = -EBUSY;
- goto done;
- }
-
ret = kstrtoul(buf, 0, &apid);
if (ret)
- goto done;
+ return ret;

- if (apid > matrix_mdev->matrix.apm_max) {
- ret = -ENODEV;
- goto done;
- }
+ if (apid > matrix_mdev->matrix.apm_max)
+ return -ENODEV;

+ vfio_ap_mdev_get_locks(matrix_mdev);
set_bit_inv(apid, matrix_mdev->matrix.apm);

ret = vfio_ap_mdev_validate_masks(matrix_mdev);
@@ -671,10 +720,13 @@ static ssize_t assign_adapter_store(struct device *dev,
}

vfio_ap_mdev_link_adapter(matrix_mdev, apid);
- vfio_ap_mdev_filter_matrix(matrix_mdev);
+
+ if (vfio_ap_mdev_filter_matrix(matrix_mdev))
+ vfio_ap_mdev_hotplug_apcb(matrix_mdev);
+
ret = count;
done:
- mutex_unlock(&matrix_dev->lock);
+ vfio_ap_mdev_put_locks(matrix_mdev);

return ret;
}
@@ -717,30 +769,23 @@ static ssize_t unassign_adapter_store(struct device *dev,
unsigned long apid;
struct ap_matrix_mdev *matrix_mdev = dev_get_drvdata(dev);

- mutex_lock(&matrix_dev->lock);
-
- /* If the KVM guest is running, disallow unassignment of adapter */
- if (matrix_mdev->kvm) {
- ret = -EBUSY;
- goto done;
- }
-
ret = kstrtoul(buf, 0, &apid);
if (ret)
- goto done;
+ return ret;

- if (apid > matrix_mdev->matrix.apm_max) {
- ret = -ENODEV;
- goto done;
- }
+ if (apid > matrix_mdev->matrix.apm_max)
+ return -ENODEV;

+ vfio_ap_mdev_get_locks(matrix_mdev);
clear_bit_inv((unsigned long)apid, matrix_mdev->matrix.apm);
vfio_ap_mdev_unlink_adapter(matrix_mdev, apid);
- vfio_ap_mdev_filter_matrix(matrix_mdev);
- ret = count;
-done:
- mutex_unlock(&matrix_dev->lock);
- return ret;
+
+ if (vfio_ap_mdev_filter_matrix(matrix_mdev))
+ vfio_ap_mdev_hotplug_apcb(matrix_mdev);
+
+ vfio_ap_mdev_put_locks(matrix_mdev);
+
+ return count;
}
static DEVICE_ATTR_WO(unassign_adapter);

@@ -793,22 +838,13 @@ static ssize_t assign_domain_store(struct device *dev,
struct ap_matrix_mdev *matrix_mdev = dev_get_drvdata(dev);
unsigned long max_apqi = matrix_mdev->matrix.aqm_max;

- mutex_lock(&matrix_dev->lock);
-
- /* If the KVM guest is running, disallow assignment of domain */
- if (matrix_mdev->kvm) {
- ret = -EBUSY;
- goto done;
- }
-
ret = kstrtoul(buf, 0, &apqi);
if (ret)
- goto done;
- if (apqi > max_apqi) {
- ret = -ENODEV;
- goto done;
- }
+ return ret;
+ if (apqi > max_apqi)
+ return -ENODEV;

+ vfio_ap_mdev_get_locks(matrix_mdev);
set_bit_inv(apqi, matrix_mdev->matrix.aqm);

ret = vfio_ap_mdev_validate_masks(matrix_mdev);
@@ -818,10 +854,13 @@ static ssize_t assign_domain_store(struct device *dev,
}

vfio_ap_mdev_link_domain(matrix_mdev, apqi);
- vfio_ap_mdev_filter_matrix(matrix_mdev);
+
+ if (vfio_ap_mdev_filter_matrix(matrix_mdev))
+ vfio_ap_mdev_hotplug_apcb(matrix_mdev);
+
ret = count;
done:
- mutex_unlock(&matrix_dev->lock);
+ vfio_ap_mdev_put_locks(matrix_mdev);

return ret;
}
@@ -864,31 +903,23 @@ static ssize_t unassign_domain_store(struct device *dev,
unsigned long apqi;
struct ap_matrix_mdev *matrix_mdev = dev_get_drvdata(dev);

- mutex_lock(&matrix_dev->lock);
-
- /* If the KVM guest is running, disallow unassignment of domain */
- if (matrix_mdev->kvm) {
- ret = -EBUSY;
- goto done;
- }
-
ret = kstrtoul(buf, 0, &apqi);
if (ret)
- goto done;
+ return ret;

- if (apqi > matrix_mdev->matrix.aqm_max) {
- ret = -ENODEV;
- goto done;
- }
+ if (apqi > matrix_mdev->matrix.aqm_max)
+ return -ENODEV;

+ vfio_ap_mdev_get_locks(matrix_mdev);
clear_bit_inv((unsigned long)apqi, matrix_mdev->matrix.aqm);
vfio_ap_mdev_unlink_domain(matrix_mdev, apqi);
- vfio_ap_mdev_filter_matrix(matrix_mdev);
- ret = count;

-done:
- mutex_unlock(&matrix_dev->lock);
- return ret;
+ if (vfio_ap_mdev_filter_matrix(matrix_mdev))
+ vfio_ap_mdev_hotplug_apcb(matrix_mdev);
+
+ vfio_ap_mdev_put_locks(matrix_mdev);
+
+ return count;
}
static DEVICE_ATTR_WO(unassign_domain);

@@ -914,22 +945,14 @@ static ssize_t assign_control_domain_store(struct device *dev,
unsigned long id;
struct ap_matrix_mdev *matrix_mdev = dev_get_drvdata(dev);

- mutex_lock(&matrix_dev->lock);
-
- /* If the KVM guest is running, disallow assignment of control domain */
- if (matrix_mdev->kvm) {
- ret = -EBUSY;
- goto done;
- }
-
ret = kstrtoul(buf, 0, &id);
if (ret)
- goto done;
+ return ret;

- if (id > matrix_mdev->matrix.adm_max) {
- ret = -ENODEV;
- goto done;
- }
+ if (id > matrix_mdev->matrix.adm_max)
+ return -ENODEV;
+
+ vfio_ap_mdev_get_locks(matrix_mdev);

/* Set the bit in the ADM (bitmask) corresponding to the AP control
* domain number (id). The bits in the mask, from most significant to
@@ -937,11 +960,13 @@ static ssize_t assign_control_domain_store(struct device *dev,
* number of control domains that can be assigned.
*/
set_bit_inv(id, matrix_mdev->matrix.adm);
- vfio_ap_mdev_filter_cdoms(matrix_mdev);
- ret = count;
-done:
- mutex_unlock(&matrix_dev->lock);
- return ret;
+
+ if (vfio_ap_mdev_filter_cdoms(matrix_mdev))
+ vfio_ap_mdev_hotplug_apcb(matrix_mdev);
+
+ vfio_ap_mdev_put_locks(matrix_mdev);
+
+ return count;
}
static DEVICE_ATTR_WO(assign_control_domain);

@@ -968,28 +993,21 @@ static ssize_t unassign_control_domain_store(struct device *dev,
struct ap_matrix_mdev *matrix_mdev = dev_get_drvdata(dev);
unsigned long max_domid = matrix_mdev->matrix.adm_max;

- mutex_lock(&matrix_dev->lock);
-
- /* If a KVM guest is running, disallow unassignment of control domain */
- if (matrix_mdev->kvm) {
- ret = -EBUSY;
- goto done;
- }
-
ret = kstrtoul(buf, 0, &domid);
if (ret)
- goto done;
- if (domid > max_domid) {
- ret = -ENODEV;
- goto done;
- }
+ return ret;
+ if (domid > max_domid)
+ return -ENODEV;

+ vfio_ap_mdev_get_locks(matrix_mdev);
clear_bit_inv(domid, matrix_mdev->matrix.adm);
- clear_bit_inv(domid, matrix_mdev->shadow_apcb.adm);
- ret = count;
-done:
- mutex_unlock(&matrix_dev->lock);
- return ret;
+
+ if (vfio_ap_mdev_filter_cdoms(matrix_mdev))
+ vfio_ap_mdev_hotplug_apcb(matrix_mdev);
+
+ vfio_ap_mdev_put_locks(matrix_mdev);
+
+ return count;
}
static DEVICE_ATTR_WO(unassign_control_domain);

@@ -1095,6 +1113,9 @@ static int vfio_ap_mdev_create_guest(struct kvm *kvm,
if (!guest)
return -ENOMEM;

+ guest->kvm = kvm;
+ guest->matrix_mdev = matrix_mdev;
+ matrix_mdev->guest = guest;
list_add(&guest->node, &matrix_dev->guests);

return 0;
@@ -1123,17 +1144,15 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
int ret;
struct ap_matrix_mdev *m;

- if (kvm->arch.crypto.crycbd) {
- down_write(&matrix_dev->guests_lock);
- ret = vfio_ap_mdev_create_guest(kvm, matrix_mdev);
- if (WARN_ON(ret))
- return ret;
+ down_write(&matrix_dev->guests_lock);

+ if (kvm->arch.crypto.crycbd) {
mutex_lock(&kvm->lock);
mutex_lock(&matrix_dev->lock);

list_for_each_entry(m, &matrix_dev->mdev_list, node) {
- if (m != matrix_mdev && m->kvm == kvm) {
+ if (m != matrix_mdev && m->guest &&
+ m->guest->kvm == kvm) {
mutex_unlock(&matrix_dev->lock);
mutex_unlock(&kvm->lock);
up_write(&matrix_dev->guests_lock);
@@ -1141,18 +1160,27 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
}
}

- kvm_get_kvm(kvm);
- matrix_mdev->kvm = kvm;
+ ret = vfio_ap_mdev_create_guest(kvm, matrix_mdev);
+ if (WARN_ON(ret)) {
+ mutex_unlock(&matrix_dev->lock);
+ mutex_unlock(&kvm->lock);
+ up_write(&matrix_dev->guests_lock);
+ return ret;
+ }
+
+ kvm_get_kvm(matrix_mdev->guest->kvm);
kvm->arch.crypto.data = matrix_mdev;
- kvm_arch_crypto_set_masks(kvm, matrix_mdev->shadow_apcb.apm,
+ kvm_arch_crypto_set_masks(matrix_mdev->guest->kvm,
+ matrix_mdev->shadow_apcb.apm,
matrix_mdev->shadow_apcb.aqm,
matrix_mdev->shadow_apcb.adm);

mutex_unlock(&matrix_dev->lock);
- mutex_unlock(&kvm->lock);
- up_write(&matrix_dev->guests_lock);
+ mutex_unlock(&matrix_mdev->guest->kvm->lock);
}

+ up_write(&matrix_dev->guests_lock);
+
return 0;
}

@@ -1186,16 +1214,11 @@ static int vfio_ap_mdev_iommu_notifier(struct notifier_block *nb,
return NOTIFY_DONE;
}

-static void vfio_ap_mdev_remove_guest(struct kvm *kvm)
+static void vfio_ap_mdev_remove_guest(struct ap_matrix_mdev *matrix_mdev)
{
- struct ap_guest *guest, *tmp;
-
- list_for_each_entry_safe(guest, tmp, &matrix_dev->guests, node) {
- if (guest->kvm == kvm) {
- list_del(&guest->node);
- break;
- }
- }
+ list_del(&matrix_mdev->guest->node);
+ kfree(matrix_mdev->guest);
+ matrix_mdev->guest = NULL;
}

/**
@@ -1203,7 +1226,6 @@ static void vfio_ap_mdev_remove_guest(struct kvm *kvm)
* by @matrix_mdev.
*
* @matrix_mdev: a matrix mediated device
- * @kvm: the pointer to the kvm structure being unset.
*
* Note: The matrix_dev->lock must be taken prior to calling
* this function; however, the lock will be temporarily released while the
@@ -1212,26 +1234,25 @@ static void vfio_ap_mdev_remove_guest(struct kvm *kvm)
* certain circumstances, will result in a circular lock dependency if this is
* done under the @matrix_mdev->lock.
*/
-static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev,
- struct kvm *kvm)
+static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev)
{
- if (kvm && kvm->arch.crypto.crycbd) {
- down_write(&matrix_dev->guests_lock);
- vfio_ap_mdev_remove_guest(kvm);
+ down_write(&matrix_dev->guests_lock);

- mutex_lock(&kvm->lock);
+ if (matrix_mdev->guest) {
+ mutex_lock(&matrix_mdev->guest->kvm->lock);
mutex_lock(&matrix_dev->lock);

- kvm_arch_crypto_clear_masks(kvm);
+ kvm_arch_crypto_clear_masks(matrix_mdev->guest->kvm);
vfio_ap_mdev_reset_queues(matrix_mdev);
- kvm_put_kvm(kvm);
- kvm->arch.crypto.data = NULL;
- matrix_mdev->kvm = NULL;
+ matrix_mdev->guest->kvm->arch.crypto.data = NULL;
+ kvm_put_kvm(matrix_mdev->guest->kvm);

mutex_unlock(&matrix_dev->lock);
- mutex_unlock(&kvm->lock);
- up_write(&matrix_dev->guests_lock);
+ mutex_unlock(&matrix_mdev->guest->kvm->lock);
+ vfio_ap_mdev_remove_guest(matrix_mdev);
}
+
+ up_write(&matrix_dev->guests_lock);
}

static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
@@ -1246,7 +1267,7 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
matrix_mdev = container_of(nb, struct ap_matrix_mdev, group_notifier);

if (!data)
- vfio_ap_mdev_unset_kvm(matrix_mdev, matrix_mdev->kvm);
+ vfio_ap_mdev_unset_kvm(matrix_mdev);
else if (vfio_ap_mdev_set_kvm(matrix_mdev, data))
notify_rc = NOTIFY_DONE;

@@ -1377,7 +1398,7 @@ static void vfio_ap_mdev_close_device(struct vfio_device *vdev)
&matrix_mdev->iommu_notifier);
vfio_unregister_notifier(vdev->dev, VFIO_GROUP_NOTIFY,
&matrix_mdev->group_notifier);
- vfio_ap_mdev_unset_kvm(matrix_mdev, matrix_mdev->kvm);
+ vfio_ap_mdev_unset_kvm(matrix_mdev);
}

static int vfio_ap_mdev_get_device_info(unsigned long arg)
@@ -1504,38 +1525,99 @@ static void vfio_ap_queue_link_mdev(struct vfio_ap_queue *q)
}
}

+/**
+ * vfio_ap_mdev_get_qlocks: lock all of the locks required for probe/remove
+ * callbacks.
+ *
+ * @apqn: the APQN of the queue device being probed or removed
+ *
+ * Return: the struct ap_guest object using the matrix mdev to which @apqn is
+ * assigned.
+ */
+static struct ap_guest *vfio_ap_mdev_get_qlocks(int apqn)
+{
+ struct ap_guest *guest;
+ unsigned long apid = AP_QID_CARD(apqn);
+ unsigned long apqi = AP_QID_QUEUE(apqn);
+
+ down_read(&matrix_dev->guests_lock);
+
+ list_for_each_entry(guest, &matrix_dev->guests, node) {
+ if (test_bit_inv(apid, guest->matrix_mdev->matrix.apm) &&
+ test_bit_inv(apqi, guest->matrix_mdev->matrix.aqm)) {
+ mutex_lock(&guest->kvm->lock);
+ mutex_lock(&matrix_dev->lock);
+
+ return guest;
+ }
+ }
+
+ mutex_lock(&matrix_dev->lock);
+
+ return NULL;
+}
+
+/**
+ * vfio_ap_mdev_put_qlocks - unlock all of the locks required for probe/remove
+ * callbacks.
+ *
+ * @guest: the guest using the queue device being probed/removed
+ */
+static void vfio_ap_mdev_put_qlocks(struct ap_guest *guest)
+{
+ if (guest)
+ mutex_unlock(&guest->kvm->lock);
+
+ mutex_unlock(&matrix_dev->lock);
+ up_read(&matrix_dev->guests_lock);
+}
+
int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
{
struct vfio_ap_queue *q;
+ struct ap_guest *guest;
+ int apqn = to_ap_queue(&apdev->device)->qid;

q = kzalloc(sizeof(*q), GFP_KERNEL);
if (!q)
return -ENOMEM;

- mutex_lock(&matrix_dev->lock);
- q->apqn = to_ap_queue(&apdev->device)->qid;
+ q->apqn = apqn;
q->saved_isc = VFIO_AP_ISC_INVALID;
- vfio_ap_queue_link_mdev(q);
- if (q->matrix_mdev)
- vfio_ap_mdev_filter_matrix(q->matrix_mdev);
+ guest = vfio_ap_mdev_get_qlocks(apqn);
+
+ if (guest) {
+ vfio_ap_mdev_link_queue(guest->matrix_mdev, q);
+
+ if (vfio_ap_mdev_filter_matrix(guest->matrix_mdev))
+ vfio_ap_mdev_hotplug_apcb(guest->matrix_mdev);
+ } else {
+ vfio_ap_queue_link_mdev(q);
+ }
+
dev_set_drvdata(&apdev->device, q);
- mutex_unlock(&matrix_dev->lock);
+ vfio_ap_mdev_put_qlocks(guest);

return 0;
}

void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
{
+ struct ap_guest *guest;
struct vfio_ap_queue *q;
+ int apqn = to_ap_queue(&apdev->device)->qid;

- mutex_lock(&matrix_dev->lock);
+ guest = vfio_ap_mdev_get_qlocks(apqn);
q = dev_get_drvdata(&apdev->device);

- if (q->matrix_mdev)
+ if (q->matrix_mdev) {
vfio_ap_unlink_queue_fr_mdev(q);
+ if (vfio_ap_mdev_filter_matrix(q->matrix_mdev))
+ vfio_ap_mdev_hotplug_apcb(q->matrix_mdev);
+ }

vfio_ap_mdev_reset_queue(q, 1);
dev_set_drvdata(&apdev->device, NULL);
+ vfio_ap_mdev_put_qlocks(guest);
kfree(q);
- mutex_unlock(&matrix_dev->lock);
}
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index 6d28b287d7bf..0e825ffbd0cc 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -28,6 +28,7 @@

struct ap_guest {
struct kvm *kvm;
+ struct ap_matrix_mdev *matrix_mdev;
struct list_head node;
};

@@ -106,11 +107,9 @@ struct ap_queue_table {
* handling the VFIO_GROUP_NOTIFY_SET_KVM event
* @iommu_notifier: notifier block used for specifying callback function for
* handling the VFIO_IOMMU_NOTIFY_DMA_UNMAP even
- * @kvm: the struct holding guest's state
- * @pqap_hook: the function pointer to the interception handler for the
- * PQAP(AQIC) instruction.
* @mdev: the mediated device
* @qtable: table of queues (struct vfio_ap_queue) assigned to the mdev
+ * @guest: the KVM guest using the matrix mdev
*/
struct ap_matrix_mdev {
struct vfio_device vdev;
@@ -119,9 +118,9 @@ struct ap_matrix_mdev {
struct ap_matrix shadow_apcb;
struct notifier_block group_notifier;
struct notifier_block iommu_notifier;
- struct kvm *kvm;
struct mdev_device *mdev;
struct ap_queue_table qtable;
+ struct ap_guest *guest;
};

/**
--
2.31.1

2021-10-21 15:26:28

by Anthony Krowiak

Subject: [PATCH v17 11/15] s390/ap: driver callback to indicate resource in use

Introduce a new driver callback to prevent a root user from re-assigning
the APQN of a queue that is in use by a non-default host device driver to
a default host device driver and vice versa. The callback is invoked
whenever a change to the AP bus's sysfs apmask or aqmask attributes would
result in one or more APQNs being re-assigned. If any queried driver
reports that it is using an affected queue, the change to the apmask or
aqmask is rejected with a device busy error (-EBUSY).

For this patch, only non-default drivers will be queried. Currently,
there is only one non-default driver, the vfio_ap device driver. The
vfio_ap device driver facilitates pass-through of an AP queue to a
guest. The idea here is that a guest may be administered by a different
sysadmin than the host and we don't want AP resources to unexpectedly
disappear from a guest's AP configuration (i.e., adapters and domains
assigned to the matrix mdev). This enforces the proper procedure for
removing AP resources intended for guest usage, which is to first
unassign them from the matrix mdev and then unbind them from the
vfio_ap device driver.
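
The check described above can be sketched in user-space C. This is an
illustration under stated assumptions, not the AP bus code: the masks
are shrunk to 8 bits with LSB-first numbering, and sketch_driver,
sketch_apmask_commit and sketch_passthrough_in_use are hypothetical
names. A bit set in the mask reserves that adapter for the default
drivers, so only newly set bits can take queues away from a non-default
driver.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

typedef int (*sketch_in_use_fn)(uint8_t apm, uint8_t aqm);

struct sketch_driver {
	int is_default;			/* default drivers are never queried */
	sketch_in_use_fn in_use;	/* optional callback, may be NULL */
};

/* Hypothetical callback: pretends adapter 2 is assigned to a guest. */
static int sketch_passthrough_in_use(uint8_t apm, uint8_t aqm)
{
	(void)aqm;
	return (apm & (1u << 2)) != 0;
}

/*
 * Commit new_apm only if no non-default driver reports that it is using
 * a queue on one of the adapters being newly reserved.
 */
static int sketch_apmask_commit(uint8_t *cur_apm, uint8_t new_apm,
				uint8_t aqm,
				const struct sketch_driver *drvs,
				size_t ndrvs)
{
	uint8_t newly_set = new_apm & (uint8_t)~*cur_apm;
	size_t i;

	if (newly_set) {
		for (i = 0; i < ndrvs; i++) {
			if (drvs[i].is_default || !drvs[i].in_use)
				continue;
			if (drvs[i].in_use(newly_set, aqm))
				return -EBUSY;	/* leave the mask untouched */
		}
	}

	*cur_apm = new_apm;
	return 0;
}
```

A rejected change leaves the current mask as it was, so the
administrator has to unassign the resource from the guest before
retrying.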

Signed-off-by: Tony Krowiak <[email protected]>
Reviewed-by: Harald Freudenberger <[email protected]>
Reviewed-by: Halil Pasic <[email protected]>
---
drivers/s390/crypto/ap_bus.c | 160 ++++++++++++++++++++++++++++++++---
drivers/s390/crypto/ap_bus.h | 4 +
2 files changed, 154 insertions(+), 10 deletions(-)

diff --git a/drivers/s390/crypto/ap_bus.c b/drivers/s390/crypto/ap_bus.c
index d9b804943d19..15886610f61a 100644
--- a/drivers/s390/crypto/ap_bus.c
+++ b/drivers/s390/crypto/ap_bus.c
@@ -36,6 +36,7 @@
#include <linux/mod_devicetable.h>
#include <linux/debugfs.h>
#include <linux/ctype.h>
+#include <linux/module.h>

#include "ap_bus.h"
#include "ap_debug.h"
@@ -1060,6 +1061,23 @@ static int modify_bitmap(const char *str, unsigned long *bitmap, int bits)
return 0;
}

+static int ap_parse_bitmap_str(const char *str, unsigned long *bitmap, int bits,
+ unsigned long *newmap)
+{
+ unsigned long size;
+ int rc;
+
+ size = BITS_TO_LONGS(bits) * sizeof(unsigned long);
+ if (*str == '+' || *str == '-') {
+ memcpy(newmap, bitmap, size);
+ rc = modify_bitmap(str, newmap, bits);
+ } else {
+ memset(newmap, 0, size);
+ rc = hex2bitmap(str, newmap, bits);
+ }
+ return rc;
+}
+
int ap_parse_mask_str(const char *str,
unsigned long *bitmap, int bits,
struct mutex *lock)
@@ -1079,14 +1097,7 @@ int ap_parse_mask_str(const char *str,
kfree(newmap);
return -ERESTARTSYS;
}
-
- if (*str == '+' || *str == '-') {
- memcpy(newmap, bitmap, size);
- rc = modify_bitmap(str, newmap, bits);
- } else {
- memset(newmap, 0, size);
- rc = hex2bitmap(str, newmap, bits);
- }
+ rc = ap_parse_bitmap_str(str, bitmap, bits, newmap);
if (rc == 0)
memcpy(bitmap, newmap, size);
mutex_unlock(lock);
@@ -1278,12 +1289,76 @@ static ssize_t apmask_show(struct bus_type *bus, char *buf)
return rc;
}

+static int __verify_card_reservations(struct device_driver *drv, void *data)
+{
+ int rc = 0;
+ struct ap_driver *ap_drv = to_ap_drv(drv);
+ unsigned long *newapm = (unsigned long *)data;
+
+ /*
+ * No need to verify whether the driver is using the queues if it is the
+ * default driver.
+ */
+ if (ap_drv->flags & AP_DRIVER_FLAG_DEFAULT)
+ return 0;
+
+ /*
+ * increase the driver's module refcounter to be sure it is not
+ * going away when we invoke the callback function.
+ */
+ if (!try_module_get(drv->owner))
+ return 0;
+
+ if (ap_drv->in_use) {
+ rc = ap_drv->in_use(newapm, ap_perms.aqm);
+ if (rc)
+ rc = -EBUSY;
+ }
+
+ /* release the driver's module */
+ module_put(drv->owner);
+
+ return rc;
+}
+
+static int apmask_commit(unsigned long *newapm)
+{
+ int rc;
+ unsigned long reserved[BITS_TO_LONGS(AP_DEVICES)];
+
+ /*
+ * Check if any bits in the apmask have been set which will
+ * result in queues being removed from non-default drivers
+ */
+ if (bitmap_andnot(reserved, newapm, ap_perms.apm, AP_DEVICES)) {
+ rc = bus_for_each_drv(&ap_bus_type, NULL, reserved,
+ __verify_card_reservations);
+ if (rc)
+ return rc;
+ }
+
+ memcpy(ap_perms.apm, newapm, APMASKSIZE);
+
+ return 0;
+}
+
static ssize_t apmask_store(struct bus_type *bus, const char *buf,
size_t count)
{
int rc;
+ DECLARE_BITMAP(newapm, AP_DEVICES);

- rc = ap_parse_mask_str(buf, ap_perms.apm, AP_DEVICES, &ap_perms_mutex);
+ if (mutex_lock_interruptible(&ap_perms_mutex))
+ return -ERESTARTSYS;
+
+ rc = ap_parse_bitmap_str(buf, ap_perms.apm, AP_DEVICES, newapm);
+ if (rc)
+ goto done;
+
+ rc = apmask_commit(newapm);
+
+done:
+ mutex_unlock(&ap_perms_mutex);
if (rc)
return rc;

@@ -1309,12 +1384,77 @@ static ssize_t aqmask_show(struct bus_type *bus, char *buf)
return rc;
}

+static int __verify_queue_reservations(struct device_driver *drv, void *data)
+{
+ int rc = 0;
+ struct ap_driver *ap_drv = to_ap_drv(drv);
+ unsigned long *newaqm = (unsigned long *)data;
+
+ /*
+ * If the reserved bits do not identify queues reserved for use by the
+ * non-default driver, there is no need to verify the driver is using
+ * the queues.
+ */
+ if (ap_drv->flags & AP_DRIVER_FLAG_DEFAULT)
+ return 0;
+
+ /*
+ * increase the driver's module refcounter to be sure it is not
+ * going away when we invoke the callback function.
+ */
+ if (!try_module_get(drv->owner))
+ return 0;
+
+ if (ap_drv->in_use) {
+ rc = ap_drv->in_use(ap_perms.apm, newaqm);
+ if (rc)
+ rc = -EBUSY;
+ }
+
+ /* release the driver's module */
+ module_put(drv->owner);
+
+ return rc;
+}
+
+static int aqmask_commit(unsigned long *newaqm)
+{
+ int rc;
+ unsigned long reserved[BITS_TO_LONGS(AP_DOMAINS)];
+
+ /*
+ * Check if any bits in the aqmask have been set which will
+ * result in queues being removed from non-default drivers
+ */
+ if (bitmap_andnot(reserved, newaqm, ap_perms.aqm, AP_DOMAINS)) {
+ rc = bus_for_each_drv(&ap_bus_type, NULL, reserved,
+ __verify_queue_reservations);
+ if (rc)
+ return rc;
+ }
+
+ memcpy(ap_perms.aqm, newaqm, AQMASKSIZE);
+
+ return 0;
+}
+
static ssize_t aqmask_store(struct bus_type *bus, const char *buf,
size_t count)
{
int rc;
+ DECLARE_BITMAP(newaqm, AP_DOMAINS);
+
+ if (mutex_lock_interruptible(&ap_perms_mutex))
+ return -ERESTARTSYS;
+
+ rc = ap_parse_bitmap_str(buf, ap_perms.aqm, AP_DOMAINS, newaqm);
+ if (rc)
+ goto done;
+
+ rc = aqmask_commit(newaqm);

- rc = ap_parse_mask_str(buf, ap_perms.aqm, AP_DOMAINS, &ap_perms_mutex);
+done:
+ mutex_unlock(&ap_perms_mutex);
if (rc)
return rc;

diff --git a/drivers/s390/crypto/ap_bus.h b/drivers/s390/crypto/ap_bus.h
index 95b577754b35..67c1bef60ad5 100644
--- a/drivers/s390/crypto/ap_bus.h
+++ b/drivers/s390/crypto/ap_bus.h
@@ -142,6 +142,7 @@ struct ap_driver {

int (*probe)(struct ap_device *);
void (*remove)(struct ap_device *);
+ int (*in_use)(unsigned long *apm, unsigned long *aqm);
};

#define to_ap_drv(x) container_of((x), struct ap_driver, driver)
@@ -289,6 +290,9 @@ void ap_queue_init_state(struct ap_queue *aq);
struct ap_card *ap_card_create(int id, int queue_depth, int raw_type,
int comp_type, unsigned int functions, int ml);

+#define APMASKSIZE (BITS_TO_LONGS(AP_DEVICES) * sizeof(unsigned long))
+#define AQMASKSIZE (BITS_TO_LONGS(AP_DOMAINS) * sizeof(unsigned long))
+
struct ap_perms {
unsigned long ioctlm[BITS_TO_LONGS(AP_IOCTLS)];
unsigned long apm[BITS_TO_LONGS(AP_DEVICES)];
--
2.31.1
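
The commit check in apmask_commit() and aqmask_commit() above can be
modelled in user space as follows. This is only a rough sketch under
stated assumptions: plain 64-bit words stand in for the kernel bitmap
helpers, bits are numbered LSB-first rather than in the inverted order
the AP code uses, and mask_andnot(), mask_commit() and demo_in_use() are
hypothetical names that do not exist in the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MASK_WORDS 4 /* 4 x 64 = 256 bits, mirroring AP_DEVICES */

/* Hypothetical stand-in for a non-default driver's in_use callback. */
typedef bool (*in_use_fn)(const uint64_t *reserved);

/* dst = a & ~b; returns true if any bit of dst is set. */
static bool mask_andnot(uint64_t *dst, const uint64_t *a, const uint64_t *b)
{
	bool any = false;

	for (int i = 0; i < MASK_WORDS; i++) {
		dst[i] = a[i] & ~b[i];
		any |= dst[i] != 0;
	}
	return any;
}

/*
 * Commit 'next' into 'cur' unless a bit that is newly set in 'next'
 * identifies a resource the non-default driver reports as in use.
 */
static int mask_commit(uint64_t *cur, const uint64_t *next, in_use_fn in_use)
{
	uint64_t reserved[MASK_WORDS];

	assert(cur && next);
	if (mask_andnot(reserved, next, cur) && in_use && in_use(reserved))
		return -EBUSY;

	memcpy(cur, next, sizeof(reserved));
	return 0;
}

/* Example callback: pretends the resource at bit 3 is used by a guest. */
static bool demo_in_use(const uint64_t *reserved)
{
	return (reserved[0] & (1ULL << 3)) != 0;
}
```

Only the bits that are newly set in the requested mask are handed to the
callback, so a write that leaves in-use resources untouched is never
rejected.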

2021-10-21 15:26:39

by Anthony Krowiak

[permalink] [raw]
Subject: [PATCH v17 14/15] s390/ap: notify drivers on config changed and scan complete callbacks

This patch introduces an extension to the AP bus to notify device drivers
when the host AP configuration changes - i.e., adapters, domains or
control domains are added or removed. When an adapter or domain is added to
the host's AP configuration, the AP bus will create the associated queue
devices in the Linux sysfs device model. Each new type 10 (i.e., CEX4) or
newer queue device with an APQN that is not reserved for the default device
driver will get bound to the vfio_ap device driver. Likewise, when an
adapter or domain is removed from the host's AP configuration, the AP bus
will remove the associated queue devices from the sysfs device model. Each
of the queues that is bound to the vfio_ap device driver will get unbound.

With the introduction of hot plug support, binding or unbinding of a
queue device will result in plugging or unplugging one or more queues from
a guest that is using the queue. If there are multiple changes to the
host's AP configuration, it could result in the probe and remove callbacks
getting invoked multiple times. Each time queues are plugged into or
unplugged from a guest, the guest's VCPUs must be taken out of SIE.
If this occurs multiple times due to changes in the host's AP
configuration, that can have a negative effect on the guest's
performance.

To alleviate this problem, this patch introduces two new callbacks: one to
notify the vfio_ap device driver when the AP bus scan routine detects a
change to the host's AP configuration, and one to notify the driver when
the AP bus is done scanning. This will allow the vfio_ap driver to do
bulk processing of all affected adapters, domains and control domains for
affected guests rather than plugging or unplugging them one at a time when
the probe or remove callback is invoked. The two new callbacks are:

void (*on_config_changed)(struct ap_config_info *new_config_info,
struct ap_config_info *old_config_info);

This callback is invoked at the start of the AP bus scan
function when it determines that the host AP configuration information
has changed since the previous scan. This is done by storing
an old and current QCI info struct and comparing them. If there is any
difference, the callback is invoked.
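
The snapshot-and-compare step described above might be sketched in user
space as follows. This is an illustration only: struct cfg is a
hypothetical, simplified stand-in for struct ap_config_info, and
cfg_refresh() takes the freshly fetched configuration as a parameter
instead of issuing a QCI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct ap_config_info. */
struct cfg {
	uint64_t apm[4]; /* adapters */
	uint64_t aqm[4]; /* usage domains */
	uint64_t adm[4]; /* control domains */
};

/*
 * Save the current configuration as the old one, install the freshly
 * fetched configuration, and report whether anything changed.
 */
static bool cfg_refresh(struct cfg *cur, struct cfg *old,
			const struct cfg *fetched)
{
	assert(cur && old && fetched);
	*old = *cur;
	*cur = *fetched;

	/* struct cfg has no padding, so a bytewise compare is sufficient. */
	return memcmp(cur, old, sizeof(*cur)) != 0;
}
```

Because the old copy is overwritten on every scan, a change is reported
exactly once; the next scan compares two identical snapshots.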

The vfio_ap device driver registers a callback function for this callback
that performs the following operations:

1. Unplugs the adapters, domains and control domains removed from the
host's AP configuration from the guests to which they are
assigned in a single operation.

2. Disconnects the links between each queue structure representing a
queue that was unplugged from the structure representing
the mediated device to which the queue is assigned. Thus, when the
vfio_ap device driver's remove callback is invoked, the unplugging of
the queue from the guest and the unlinking of the queue structure from
the mediated device structure will be bypassed because the queues and
control domains will have already been unplugged in bulk.

3. Stores bitmaps identifying the adapters, domains and control domains
added to the host's AP configuration with the structure representing
the mediated device. When the vfio_ap device driver's probe callback is
subsequently invoked, the probe function will recognize that the
queue is being probed due to a change in the host's AP configuration
and the plugging of the queue into the guest will be bypassed.
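
Operations 1 and 3 above both start from the set of resources removed
from or added to the host configuration, restricted to what is assigned
to a given matrix mdev. A minimal sketch of that arithmetic, assuming a
single 64-bit word per mask and hypothetical names (struct delta,
cfg_delta(), guest_affected()):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: resources added and removed between scans. */
struct delta {
	uint64_t added;
	uint64_t removed;
};

/* added = cur & ~prev, removed = prev & ~cur */
static struct delta cfg_delta(uint64_t prev, uint64_t cur)
{
	struct delta d = {
		.added = cur & ~prev,
		.removed = prev & ~cur,
	};

	/* A resource cannot be both added and removed in one step. */
	assert((d.added & d.removed) == 0);
	return d;
}

/* Restrict a set of changed resources to those assigned to a guest. */
static uint64_t guest_affected(uint64_t changed, uint64_t assigned)
{
	return changed & assigned;
}
```

A guest whose assigned mask does not intersect either set is unaffected
and can be skipped.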

void (*on_scan_complete)(struct ap_config_info *new_config_info,
struct ap_config_info *old_config_info);

The on_scan_complete callback is invoked after the AP bus scan is
completed if the host AP configuration data has changed. The vfio_ap
device driver registers a callback function for this callback that hot
plugs each queue and control domain added to the AP configuration for each
guest using them in a single hot plug operation.
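
The deferral that the two callbacks make possible can be illustrated
with a small user-space model. This is a sketch under simplifying
assumptions, not the driver's logic: a guest is reduced to a few 64-bit
masks, queue binding and filtering are ignored, and struct guest_model
and the guest_*() functions are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one guest's hot plug state. */
struct guest_model {
	uint64_t assigned;    /* resources assigned to the matrix mdev */
	uint64_t pending_add; /* assigned resources added in this scan */
	uint64_t plugged;     /* resources visible to the guest */
	int hotplug_ops;      /* number of hot plug operations performed */
};

/* Start of scan: remember which assigned resources are being added. */
static void guest_cfg_changed(struct guest_model *g, uint64_t added)
{
	g->pending_add = g->assigned & added;
}

/* Probe of one resource: plug it now unless it is deferred. */
static void guest_probe(struct guest_model *g, unsigned int bit)
{
	uint64_t mask;

	assert(bit < 64);
	mask = 1ULL << bit;
	if (!(g->assigned & mask) || (g->pending_add & mask))
		return;

	g->plugged |= mask;
	g->hotplug_ops++;
}

/* End of scan: plug everything that was deferred in one operation. */
static void guest_scan_complete(struct guest_model *g)
{
	if (!g->pending_add)
		return;

	g->plugged |= g->pending_add;
	g->hotplug_ops++;
	g->pending_add = 0;
}
```

In this model, two resources added during one scan cost a single hot
plug operation rather than one per probe.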

Signed-off-by: Harald Freudenberger <[email protected]>
[[email protected]: implemented callback functions in vfio_ap driver]
Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/ap_bus.c | 81 ++++++-
drivers/s390/crypto/ap_bus.h | 12 +
drivers/s390/crypto/vfio_ap_drv.c | 4 +-
drivers/s390/crypto/vfio_ap_ops.c | 332 ++++++++++++++++++++++++--
drivers/s390/crypto/vfio_ap_private.h | 23 +-
5 files changed, 429 insertions(+), 23 deletions(-)

diff --git a/drivers/s390/crypto/ap_bus.c b/drivers/s390/crypto/ap_bus.c
index 15886610f61a..b97149d02da6 100644
--- a/drivers/s390/crypto/ap_bus.c
+++ b/drivers/s390/crypto/ap_bus.c
@@ -88,6 +88,7 @@ static atomic64_t ap_bindings_complete_count = ATOMIC64_INIT(0);
static DECLARE_COMPLETION(ap_init_apqn_bindings_complete);

static struct ap_config_info *ap_qci_info;
+static struct ap_config_info *ap_qci_info_old;

/*
* AP bus related debug feature things.
@@ -225,9 +226,14 @@ static void __init ap_init_qci_info(void)
ap_qci_info = kzalloc(sizeof(*ap_qci_info), GFP_KERNEL);
if (!ap_qci_info)
return;
+ ap_qci_info_old = kzalloc(sizeof(*ap_qci_info_old), GFP_KERNEL);
+ if (!ap_qci_info_old)
+ return;
if (ap_fetch_qci_info(ap_qci_info) != 0) {
kfree(ap_qci_info);
+ kfree(ap_qci_info_old);
ap_qci_info = NULL;
+ ap_qci_info_old = NULL;
return;
}
AP_DBF_INFO("%s successful fetched initial qci info\n", __func__);
@@ -244,6 +250,8 @@ static void __init ap_init_qci_info(void)
__func__, ap_max_domain_id);
}
}
+
+ memcpy(ap_qci_info_old, ap_qci_info, sizeof(*ap_qci_info));
}

/*
@@ -1635,6 +1643,49 @@ static int __match_queue_device_with_queue_id(struct device *dev, const void *da
&& AP_QID_QUEUE(to_ap_queue(dev)->qid) == (int)(long) data;
}

+/* Helper function for notify_config_changed */
+static int __drv_notify_config_changed(struct device_driver *drv, void *data)
+{
+ struct ap_driver *ap_drv = to_ap_drv(drv);
+
+ if (try_module_get(drv->owner)) {
+ if (ap_drv->on_config_changed)
+ ap_drv->on_config_changed(ap_qci_info, ap_qci_info_old);
+ module_put(drv->owner);
+ }
+
+ return 0;
+}
+
+/* Notify all drivers about a QCI config change */
+static inline void notify_config_changed(void)
+{
+ bus_for_each_drv(&ap_bus_type, NULL, NULL,
+ __drv_notify_config_changed);
+}
+
+/* Helper function for notify_scan_complete */
+static int __drv_notify_scan_complete(struct device_driver *drv, void *data)
+{
+ struct ap_driver *ap_drv = to_ap_drv(drv);
+
+ if (try_module_get(drv->owner)) {
+ if (ap_drv->on_scan_complete)
+ ap_drv->on_scan_complete(ap_qci_info,
+ ap_qci_info_old);
+ module_put(drv->owner);
+ }
+
+ return 0;
+}
+
+/* Notify all drivers about bus scan complete */
+static inline void notify_scan_complete(void)
+{
+ bus_for_each_drv(&ap_bus_type, NULL, NULL,
+ __drv_notify_scan_complete);
+}
+
/*
* Helper function for ap_scan_bus().
* Remove card device and associated queue devices.
@@ -1923,6 +1974,25 @@ static inline void ap_scan_adapter(int ap)
put_device(&ac->ap_dev.device);
}

+/**
+ * ap_get_configuration - get the host AP configuration
+ *
+ * Stores the host AP configuration information returned from the previous call
+ * to Query Configuration Information (QCI), then retrieves and stores the
+ * current AP configuration returned from QCI.
+ *
+ * Return: true if the host AP configuration changed between calls to QCI;
+ * otherwise, return false.
+ */
+static bool ap_get_configuration(void)
+{
+ memcpy(ap_qci_info_old, ap_qci_info, sizeof(*ap_qci_info));
+ ap_fetch_qci_info(ap_qci_info);
+
+ return memcmp(ap_qci_info, ap_qci_info_old,
+ sizeof(struct ap_config_info)) != 0;
+}
+
/**
* ap_scan_bus(): Scan the AP bus for new devices
* Runs periodically, workqueue timer (ap_config_time)
@@ -1930,9 +2000,12 @@ static inline void ap_scan_adapter(int ap)
*/
static void ap_scan_bus(struct work_struct *unused)
{
- int ap;
+ int ap, config_changed = 0;

- ap_fetch_qci_info(ap_qci_info);
+ /* config change notify */
+ config_changed = ap_get_configuration();
+ if (config_changed)
+ notify_config_changed();
ap_select_domain();

AP_DBF_DBG("%s running\n", __func__);
@@ -1941,6 +2014,10 @@ static void ap_scan_bus(struct work_struct *unused)
for (ap = 0; ap <= ap_max_adapter_id; ap++)
ap_scan_adapter(ap);

+ /* scan complete notify */
+ if (config_changed)
+ notify_scan_complete();
+
/* check if there is at least one queue available with default domain */
if (ap_domain_index >= 0) {
struct device *dev =
diff --git a/drivers/s390/crypto/ap_bus.h b/drivers/s390/crypto/ap_bus.h
index 67c1bef60ad5..4de062ea6b76 100644
--- a/drivers/s390/crypto/ap_bus.h
+++ b/drivers/s390/crypto/ap_bus.h
@@ -143,6 +143,18 @@ struct ap_driver {
int (*probe)(struct ap_device *);
void (*remove)(struct ap_device *);
int (*in_use)(unsigned long *apm, unsigned long *aqm);
+ /*
+ * Called at the start of the ap bus scan function when
+ * the crypto config information (qci) has changed.
+ */
+ void (*on_config_changed)(struct ap_config_info *new_config_info,
+ struct ap_config_info *old_config_info);
+ /*
+ * Called at the end of the ap bus scan function when
+ * the crypto config information (qci) has changed.
+ */
+ void (*on_scan_complete)(struct ap_config_info *new_config_info,
+ struct ap_config_info *old_config_info);
};

#define to_ap_drv(x) container_of((x), struct ap_driver, driver)
diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
index df7528dcf6ed..5edd45d4d2fc 100644
--- a/drivers/s390/crypto/vfio_ap_drv.c
+++ b/drivers/s390/crypto/vfio_ap_drv.c
@@ -45,6 +45,8 @@ static struct ap_driver vfio_ap_drv = {
.probe = vfio_ap_mdev_probe_queue,
.remove = vfio_ap_mdev_remove_queue,
.in_use = vfio_ap_mdev_resource_in_use,
+ .on_config_changed = vfio_ap_on_cfg_changed,
+ .on_scan_complete = vfio_ap_on_scan_complete,
.ids = ap_queue_ids,
};

@@ -92,7 +94,7 @@ static int vfio_ap_matrix_dev_create(void)

/* Fill in config info via PQAP(QCI), if available */
if (test_facility(12)) {
- ret = ap_qci(&matrix_dev->info);
+ ret = ap_qci(&matrix_dev->config_info);
if (ret)
goto matrix_alloc_err;
}
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 8075080ef2dd..cedf491c0df4 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -330,7 +330,7 @@ static bool vfio_ap_mdev_filter_cdoms(struct ap_matrix_mdev *matrix_mdev)

bitmap_copy(shadow_adm, matrix_mdev->shadow_apcb.adm, AP_DOMAINS);
bitmap_and(matrix_mdev->shadow_apcb.adm, matrix_mdev->matrix.adm,
- (unsigned long *)matrix_dev->info.adm, AP_DOMAINS);
+ (unsigned long *)matrix_dev->config_info.adm, AP_DOMAINS);

return !bitmap_equal(shadow_adm, matrix_mdev->shadow_apcb.adm,
AP_DOMAINS);
@@ -349,19 +349,15 @@ static bool vfio_ap_mdev_filter_cdoms(struct ap_matrix_mdev *matrix_mdev)
*/
static bool vfio_ap_mdev_filter_matrix(struct ap_matrix_mdev *matrix_mdev)
{
- int ret;
unsigned long apid, apqi, apqn;
DECLARE_BITMAP(shadow_apm, AP_DEVICES);
DECLARE_BITMAP(shadow_aqm, AP_DOMAINS);
struct vfio_ap_queue *q;

- ret = ap_qci(&matrix_dev->info);
- if (ret)
- return false;
-
bitmap_copy(shadow_apm, matrix_mdev->shadow_apcb.apm, AP_DEVICES);
bitmap_copy(shadow_aqm, matrix_mdev->shadow_apcb.aqm, AP_DOMAINS);
- vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->shadow_apcb);
+ vfio_ap_matrix_init(&matrix_dev->config_info,
+ &matrix_mdev->shadow_apcb);

/*
* Copy the adapters, domains and control domains to the shadow_apcb
@@ -369,9 +365,9 @@ static bool vfio_ap_mdev_filter_matrix(struct ap_matrix_mdev *matrix_mdev)
* AP configuration.
*/
bitmap_and(matrix_mdev->shadow_apcb.apm, matrix_mdev->matrix.apm,
- (unsigned long *)matrix_dev->info.apm, AP_DEVICES);
+ (unsigned long *)matrix_dev->config_info.apm, AP_DEVICES);
bitmap_and(matrix_mdev->shadow_apcb.aqm, matrix_mdev->matrix.aqm,
- (unsigned long *)matrix_dev->info.aqm, AP_DOMAINS);
+ (unsigned long *)matrix_dev->config_info.aqm, AP_DOMAINS);

for_each_set_bit_inv(apid, matrix_mdev->shadow_apcb.apm, AP_DEVICES) {
for_each_set_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm,
@@ -417,8 +413,9 @@ static int vfio_ap_mdev_probe(struct mdev_device *mdev)
&vfio_ap_matrix_dev_ops);

matrix_mdev->mdev = mdev;
- vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
- vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->shadow_apcb);
+ vfio_ap_matrix_init(&matrix_dev->config_info, &matrix_mdev->matrix);
+ vfio_ap_matrix_init(&matrix_dev->config_info,
+ &matrix_mdev->shadow_apcb);
hash_init(matrix_mdev->qtable.queues);
mdev_set_drvdata(mdev, matrix_mdev);
mutex_lock(&matrix_dev->lock);
@@ -772,13 +769,17 @@ static void vfio_ap_unlink_apqn_fr_mdev(struct ap_matrix_mdev *matrix_mdev,

q = vfio_ap_mdev_get_queue(matrix_mdev, AP_MKQID(apid, apqi));
/* If the queue is assigned to the matrix mdev, unlink it. */
- if (q)
+ if (q) {
vfio_ap_unlink_queue_fr_mdev(q);

- /* If the queue is assigned to the APCB, store it in @qtable. */
- if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm) &&
- test_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm))
- hash_add(qtable->queues, &q->mdev_qnode, q->apqn);
+ /* If the queue is assigned to the APCB, store it in @qtable. */
+ if (qtable) {
+ if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm) &&
+ test_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm))
+ hash_add(qtable->queues, &q->mdev_qnode,
+ q->apqn);
+ }
+ }
}

/**
@@ -1702,9 +1703,31 @@ static void vfio_ap_mdev_put_qlocks(struct ap_guest *guest)
mutex_unlock(&guest->kvm->lock);

mutex_unlock(&matrix_dev->lock);
+
up_read(&matrix_dev->guests_lock);
}

+static bool vfio_ap_mdev_do_filter_matrix(struct ap_matrix_mdev *matrix_mdev,
+ struct vfio_ap_queue *q)
+{
+ unsigned long apid = AP_QID_CARD(q->apqn);
+ unsigned long apqi = AP_QID_QUEUE(q->apqn);
+
+ /*
+ * If the queue is being probed because its APID or APQI is in the
+ * process of being added to the host's AP configuration, then we don't
+ * want to filter the matrix now as the filtering will be done after
+ * the driver is notified that the AP bus scan operation has completed
+ * (see the vfio_ap_on_scan_complete callback function).
+ */
+ if (test_bit_inv(apid, matrix_mdev->apm_add) ||
+ test_bit_inv(apqi, matrix_mdev->aqm_add))
+ return false;
+
+
+ return true;
+}
+
int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
{
struct vfio_ap_queue *q;
@@ -1722,8 +1745,10 @@ int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
if (guest) {
vfio_ap_mdev_link_queue(guest->matrix_mdev, q);

- if (vfio_ap_mdev_filter_matrix(guest->matrix_mdev))
- vfio_ap_mdev_hotplug_apcb(guest->matrix_mdev);
+ if (vfio_ap_mdev_do_filter_matrix(guest->matrix_mdev, q)) {
+ if (vfio_ap_mdev_filter_matrix(guest->matrix_mdev))
+ vfio_ap_mdev_hotplug_apcb(guest->matrix_mdev);
+ }
} else {
vfio_ap_queue_link_mdev(q);
}
@@ -1767,3 +1792,274 @@ int vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm)

return ret;
}
+
+/**
+ * vfio_ap_mdev_unlink_adapters - unlinks all queues from the matrix mdev with
+ * an APID of an adapter that has been removed from
+ * the host's AP configuration.
+ *
+ * @matrix_mdev: the matrix mdev from which the queues are to be unlinked
+ * @ap_unlink: a bitmap specifying the APIDs of the adapters removed from the
+ * host's AP configuration.
+ */
+static void vfio_ap_mdev_unlink_adapters(struct ap_matrix_mdev *matrix_mdev,
+ unsigned long *ap_unlink)
+{
+ unsigned long apid;
+
+ for_each_set_bit_inv(apid, ap_unlink, AP_DEVICES)
+ vfio_ap_mdev_unlink_adapter(matrix_mdev, apid, NULL);
+}
+
+/**
+ * vfio_ap_mdev_unlink_domains - unlinks all queues from the matrix mdev with an
+ * APQI of a domain that has been removed from the
+ * host's AP configuration.
+ *
+ * @matrix_mdev: the matrix mdev from which the queues are to be unlinked
+ * @aq_unlink: a bitmap specifying the APQIs of the domains removed from the
+ * host's AP configuration.
+ */
+static void vfio_ap_mdev_unlink_domains(struct ap_matrix_mdev *matrix_mdev,
+ unsigned long *aq_unlink)
+{
+ unsigned long apqi;
+
+ for_each_set_bit_inv(apqi, aq_unlink, AP_DOMAINS)
+ vfio_ap_mdev_unlink_domain(matrix_mdev, apqi, NULL);
+}
+
+/**
+ * vfio_ap_mdev_hot_unplug_cfg - hot unplug the adapters, domains and control
+ * domains that have been removed from the host's
+ * AP configuration from a guest.
+ *
+ * @guest: the guest
+ * @aprem: the adapters that have been removed from the host's AP configuration
+ * @aqrem: the domains that have been removed from the host's AP configuration
+ */
+static void vfio_ap_mdev_hot_unplug_cfg(struct ap_guest *guest,
+ unsigned long *aprem,
+ unsigned long *aqrem)
+{
+ vfio_ap_mdev_unlink_adapters(guest->matrix_mdev, aprem);
+ vfio_ap_mdev_unlink_domains(guest->matrix_mdev, aqrem);
+
+ if (vfio_ap_mdev_filter_matrix(guest->matrix_mdev) ||
+ vfio_ap_mdev_filter_cdoms(guest->matrix_mdev)) {
+ mutex_lock(&guest->kvm->lock);
+ mutex_lock(&matrix_dev->lock);
+
+ vfio_ap_mdev_hotplug_apcb(guest->matrix_mdev);
+
+ mutex_unlock(&guest->kvm->lock);
+ mutex_unlock(&matrix_dev->lock);
+ }
+}
+
+/**
+ * vfio_ap_mdev_cfg_remove - determines which guests are using the adapters,
+ * domains and control domains that have been removed
+ * from the host AP configuration and unplugs them
+ * from those guests.
+ *
+ * @ap_remove: bitmap specifying which adapters have been removed from the host
+ * config.
+ * @aq_remove: bitmap specifying which domains have been removed from the host
+ * config.
+ * @cd_remove: bitmap specifying which control domains have been removed from
+ * the host config.
+ */
+static void vfio_ap_mdev_cfg_remove(unsigned long *ap_remove,
+ unsigned long *aq_remove,
+ unsigned long *cd_remove)
+{
+ struct ap_guest *guest;
+ DECLARE_BITMAP(aprem, AP_DEVICES);
+ DECLARE_BITMAP(aqrem, AP_DOMAINS);
+ int do_ap_remove, do_aq_remove, do_cd_remove;
+
+ list_for_each_entry(guest, &matrix_dev->guests, node) {
+ do_ap_remove = bitmap_and(aprem, ap_remove,
+ guest->matrix_mdev->matrix.apm,
+ AP_DEVICES);
+ do_aq_remove = bitmap_and(aqrem, aq_remove,
+ guest->matrix_mdev->matrix.aqm,
+ AP_DOMAINS);
+ do_cd_remove = bitmap_and(aqrem, cd_remove,
+ guest->matrix_mdev->matrix.aqm,
+ AP_DOMAINS);
+
+ if (!do_ap_remove && !do_aq_remove && !do_cd_remove)
+ continue;
+
+ vfio_ap_mdev_hot_unplug_cfg(guest, aprem, aqrem);
+ }
+}
+
+/**
+ * vfio_ap_mdev_on_cfg_remove - responds to the removal of adapters, domains and
+ * control domains from the host AP configuration
+ * by unplugging them from the guests that are
+ * using them.
+ */
+static void vfio_ap_mdev_on_cfg_remove(void)
+{
+ int ap_remove, aq_remove, cd_remove;
+ DECLARE_BITMAP(aprem, AP_DEVICES);
+ DECLARE_BITMAP(aqrem, AP_DOMAINS);
+ DECLARE_BITMAP(cdrem, AP_DOMAINS);
+ unsigned long *cur_apm, *cur_aqm, *cur_adm;
+ unsigned long *prev_apm, *prev_aqm, *prev_adm;
+
+ cur_apm = (unsigned long *)matrix_dev->config_info.apm;
+ cur_aqm = (unsigned long *)matrix_dev->config_info.aqm;
+ cur_adm = (unsigned long *)matrix_dev->config_info.adm;
+ prev_apm = (unsigned long *)matrix_dev->config_info_prev.apm;
+ prev_aqm = (unsigned long *)matrix_dev->config_info_prev.aqm;
+ prev_adm = (unsigned long *)matrix_dev->config_info_prev.adm;
+
+ ap_remove = bitmap_andnot(aprem, prev_apm, cur_apm, AP_DEVICES);
+ aq_remove = bitmap_andnot(aqrem, prev_aqm, cur_aqm, AP_DOMAINS);
+ cd_remove = bitmap_andnot(cdrem, prev_adm, cur_adm, AP_DOMAINS);
+
+ if (ap_remove || aq_remove || cd_remove)
+ vfio_ap_mdev_cfg_remove(aprem, aqrem, cdrem);
+}
+
+/**
+ * vfio_ap_mdev_cfg_add - store bitmaps specifying the adapters, domains and
+ * control domains that have been added to the host's
+ * AP configuration for each matrix mdev to which they
+ * are assigned.
+ *
+ * @apm_add: a bitmap specifying the adapters that have been added to the AP
+ * configuration.
+ * @aqm_add: a bitmap specifying the domains that have been added to the AP
+ * configuration.
+ * @adm_add: a bitmap specifying the control domains that have been added to the
+ * AP configuration.
+ */
+static void vfio_ap_mdev_cfg_add(unsigned long *apm_add, unsigned long *aqm_add,
+ unsigned long *adm_add)
+{
+ struct ap_guest *guest;
+
+ list_for_each_entry(guest, &matrix_dev->guests, node) {
+ bitmap_and(guest->matrix_mdev->apm_add,
+ guest->matrix_mdev->matrix.apm, apm_add, AP_DEVICES);
+ bitmap_and(guest->matrix_mdev->aqm_add,
+ guest->matrix_mdev->matrix.aqm, aqm_add, AP_DOMAINS);
+ bitmap_and(guest->matrix_mdev->adm_add,
+ guest->matrix_mdev->matrix.adm, adm_add, AP_DOMAINS);
+ }
+}
+
+/**
+ * vfio_ap_mdev_on_cfg_add - responds to the addition of adapters, domains and
+ * control domains to the host AP configuration
+ * by updating the bitmaps that specify what adapters,
+ * domains and control domains have been added so they
+ * can be hot plugged into the guest when the AP bus
+ * scan completes (see vfio_ap_on_scan_complete
+ * function).
+ */
+static void vfio_ap_mdev_on_cfg_add(void)
+{
+ bool do_add;
+ DECLARE_BITMAP(apm_add, AP_DEVICES);
+ DECLARE_BITMAP(aqm_add, AP_DOMAINS);
+ DECLARE_BITMAP(adm_add, AP_DOMAINS);
+ unsigned long *cur_apm, *cur_aqm, *cur_adm;
+ unsigned long *prev_apm, *prev_aqm, *prev_adm;
+
+ cur_apm = (unsigned long *)matrix_dev->config_info.apm;
+ cur_aqm = (unsigned long *)matrix_dev->config_info.aqm;
+ cur_adm = (unsigned long *)matrix_dev->config_info.adm;
+
+ prev_apm = (unsigned long *)matrix_dev->config_info_prev.apm;
+ prev_aqm = (unsigned long *)matrix_dev->config_info_prev.aqm;
+ prev_adm = (unsigned long *)matrix_dev->config_info_prev.adm;
+
+ do_add = bitmap_andnot(apm_add, cur_apm, prev_apm, AP_DEVICES);
+ do_add |= bitmap_andnot(aqm_add, cur_aqm, prev_aqm, AP_DOMAINS);
+ do_add |= bitmap_andnot(adm_add, cur_adm, prev_adm, AP_DOMAINS);
+
+ if (do_add)
+ vfio_ap_mdev_cfg_add(apm_add, aqm_add, adm_add);
+}
+
+/**
+ * vfio_ap_on_cfg_changed - handles notification of changes to the host AP
+ * configuration.
+ *
+ * @new_config_info: the new host AP configuration
+ * @old_config_info: the previous host AP configuration
+ */
+void vfio_ap_on_cfg_changed(struct ap_config_info *new_config_info,
+ struct ap_config_info *old_config_info)
+{
+ down_read(&matrix_dev->guests_lock);
+
+ memcpy(&matrix_dev->config_info_prev, old_config_info,
+ sizeof(struct ap_config_info));
+ memcpy(&matrix_dev->config_info, new_config_info,
+ sizeof(struct ap_config_info));
+ vfio_ap_mdev_on_cfg_remove();
+ vfio_ap_mdev_on_cfg_add();
+
+ up_read(&matrix_dev->guests_lock);
+}
+
+static void vfio_ap_mdev_hot_plug_cfg(struct ap_guest *guest)
+{
+ bool filter_matrix, filter_cdoms, do_hotplug = false;
+
+ filter_matrix = bitmap_intersects(guest->matrix_mdev->matrix.apm,
+ guest->matrix_mdev->apm_add,
+ AP_DEVICES) ||
+ bitmap_intersects(guest->matrix_mdev->matrix.aqm,
+ guest->matrix_mdev->aqm_add,
+ AP_DOMAINS);
+
+ filter_cdoms = bitmap_intersects(guest->matrix_mdev->matrix.adm,
+ guest->matrix_mdev->adm_add,
+ AP_DOMAINS);
+
+ mutex_lock(&guest->kvm->lock);
+ mutex_lock(&matrix_dev->lock);
+
+ if (filter_matrix)
+ do_hotplug |= vfio_ap_mdev_filter_matrix(guest->matrix_mdev);
+
+ if (filter_cdoms)
+ do_hotplug |= vfio_ap_mdev_filter_cdoms(guest->matrix_mdev);
+
+ if (do_hotplug)
+ vfio_ap_mdev_hotplug_apcb(guest->matrix_mdev);
+
+ mutex_unlock(&matrix_dev->lock);
+ mutex_unlock(&guest->kvm->lock);
+}
+
+void vfio_ap_on_scan_complete(struct ap_config_info *new_config_info,
+ struct ap_config_info *old_config_info)
+{
+ struct ap_guest *guest;
+
+ down_read(&matrix_dev->guests_lock);
+
+ list_for_each_entry(guest, &matrix_dev->guests, node) {
+ if (bitmap_empty(guest->matrix_mdev->apm_add, AP_DEVICES) &&
+ bitmap_empty(guest->matrix_mdev->aqm_add, AP_DOMAINS) &&
+ bitmap_empty(guest->matrix_mdev->adm_add, AP_DOMAINS))
+ continue;
+
+ vfio_ap_mdev_hot_plug_cfg(guest);
+ bitmap_clear(guest->matrix_mdev->apm_add, 0, AP_DEVICES);
+ bitmap_clear(guest->matrix_mdev->aqm_add, 0, AP_DOMAINS);
+ bitmap_clear(guest->matrix_mdev->adm_add, 0, AP_DOMAINS);
+ }
+
+ up_read(&matrix_dev->guests_lock);
+}
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index 97da41f87c65..affa63da7f88 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -37,7 +37,9 @@ struct ap_guest {
*
* @device: generic device structure associated with the AP matrix device
* @available_instances: number of mediated matrix devices that can be created
- * @info: the struct containing the output from the PQAP(QCI) instruction
+ * @config_info: the struct containing the output from the PQAP(QCI) instruction
+ * @config_info_prev: the struct containing the previous output from the
+ * PQAP(QCI) instruction
* @mdev_list: the list of mediated matrix devices created
* @lock: mutex for locking the AP matrix device. This lock will be
* taken every time we fiddle with state managed by the vfio_ap
@@ -52,7 +54,8 @@ struct ap_guest {
struct ap_matrix_dev {
struct device device;
atomic_t available_instances;
- struct ap_config_info info;
+ struct ap_config_info config_info;
+ struct ap_config_info config_info_prev;
struct list_head mdev_list;
struct mutex lock;
struct ap_driver *vfio_ap_drv;
@@ -110,6 +113,13 @@ struct ap_queue_table {
* @mdev: the mediated device
* @qtable: table of queues (struct vfio_ap_queue) assigned to the mdev
* @guest: the KVM guest using the matrix mdev
+ * @apm_add: adapters to be hot plugged into the guest when the vfio_ap
+ * device driver is notified that the AP bus scan has completed.
+ * @aqm_add: domains to be hot plugged into the guest when the vfio_ap
+ * device driver is notified that the AP bus scan has completed.
+ * @adm_add: control domains to be hot plugged into the guest when the
+ * vfio_ap device driver is notified that the AP bus scan has
+ * completed.
*/
struct ap_matrix_mdev {
struct vfio_device vdev;
@@ -121,6 +131,9 @@ struct ap_matrix_mdev {
struct mdev_device *mdev;
struct ap_queue_table qtable;
struct ap_guest *guest;
+ DECLARE_BITMAP(apm_add, AP_DEVICES);
+ DECLARE_BITMAP(aqm_add, AP_DOMAINS);
+ DECLARE_BITMAP(adm_add, AP_DOMAINS);
};

/**
@@ -151,4 +164,10 @@ void vfio_ap_mdev_remove_queue(struct ap_device *queue);

int vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm);

+
+void vfio_ap_on_cfg_changed(struct ap_config_info *new_config_info,
+ struct ap_config_info *old_config_info);
+void vfio_ap_on_scan_complete(struct ap_config_info *new_config_info,
+ struct ap_config_info *old_config_info);
+
#endif /* _VFIO_AP_PRIVATE_H_ */
--
2.31.1

2021-10-21 15:26:45

by Anthony Krowiak

[permalink] [raw]
Subject: [PATCH v17 13/15] s390/vfio-ap: sysfs attribute to display the guest's matrix

The matrix of adapters and domains configured in a guest's APCB may
differ from the matrix of adapters and domains assigned to the matrix mdev,
so this patch introduces a sysfs attribute to display the matrix of
adapters and domains that are or will be assigned to the APCB of a guest
that is or will be using the matrix mdev. For a matrix mdev denoted by
$uuid, the guest matrix can be displayed as follows:

cat /sys/devices/vfio_ap/matrix/$uuid/guest_matrix
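
The output uses the same layout as the existing matrix attribute. The
following user-space sketch mimics the three cases handled by the show
function, using the format strings from the patch; the list-based
interface and the names format_matrix() and append_line() are
assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Append 'line' to buf if it fits, keeping buf NUL-terminated. */
static bool append_line(char *buf, size_t len, size_t *pos, const char *line)
{
	size_t n = strlen(line);

	if (*pos + n + 1 > len)
		return false;
	memcpy(buf + *pos, line, n + 1);
	*pos += n;
	return true;
}

/*
 * Format the matrix as "apid.apqi" lines. If only adapters or only
 * domains are present, print "apid." or ".apqi" lines instead.
 * Returns the number of characters written, excluding the NUL.
 */
static int format_matrix(char *buf, size_t len,
			 const unsigned int *apids, size_t napids,
			 const unsigned int *apqis, size_t napqis)
{
	char line[32];
	size_t pos = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';

	if (napids && napqis) {
		for (size_t i = 0; i < napids; i++) {
			for (size_t j = 0; j < napqis; j++) {
				snprintf(line, sizeof(line), "%02x.%04x\n",
					 apids[i], apqis[j]);
				if (!append_line(buf, len, &pos, line))
					return (int)pos;
			}
		}
	} else if (napids) {
		for (size_t i = 0; i < napids; i++) {
			snprintf(line, sizeof(line), "%02x.\n", apids[i]);
			if (!append_line(buf, len, &pos, line))
				return (int)pos;
		}
	} else {
		for (size_t j = 0; j < napqis; j++) {
			snprintf(line, sizeof(line), ".%04x\n", apqis[j]);
			if (!append_line(buf, len, &pos, line))
				return (int)pos;
		}
	}
	return (int)pos;
}
```

For adapters 04 and 1a with domain 0033, this produces the lines
04.0033 and 1a.0033.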

Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/vfio_ap_ops.c | 50 +++++++++++++++++++++++--------
1 file changed, 37 insertions(+), 13 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 5386b8635bec..8075080ef2dd 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -1131,28 +1131,24 @@ static ssize_t control_domains_show(struct device *dev,
}
static DEVICE_ATTR_RO(control_domains);

-static ssize_t matrix_show(struct device *dev, struct device_attribute *attr,
- char *buf)
+static ssize_t vfio_ap_mdev_matrix_show(struct ap_matrix *matrix, char *buf)
{
- struct ap_matrix_mdev *matrix_mdev = dev_get_drvdata(dev);
char *bufpos = buf;
unsigned long apid;
unsigned long apqi;
unsigned long apid1;
unsigned long apqi1;
- unsigned long napm_bits = matrix_mdev->matrix.apm_max + 1;
- unsigned long naqm_bits = matrix_mdev->matrix.aqm_max + 1;
+ unsigned long napm_bits = matrix->apm_max + 1;
+ unsigned long naqm_bits = matrix->aqm_max + 1;
int nchars = 0;
int n;

- apid1 = find_first_bit_inv(matrix_mdev->matrix.apm, napm_bits);
- apqi1 = find_first_bit_inv(matrix_mdev->matrix.aqm, naqm_bits);
-
- mutex_lock(&matrix_dev->lock);
+ apid1 = find_first_bit_inv(matrix->apm, napm_bits);
+ apqi1 = find_first_bit_inv(matrix->aqm, naqm_bits);

if ((apid1 < napm_bits) && (apqi1 < naqm_bits)) {
- for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, napm_bits) {
- for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm,
+ for_each_set_bit_inv(apid, matrix->apm, napm_bits) {
+ for_each_set_bit_inv(apqi, matrix->aqm,
naqm_bits) {
n = sprintf(bufpos, "%02lx.%04lx\n", apid,
apqi);
@@ -1161,25 +1157,52 @@ static ssize_t matrix_show(struct device *dev, struct device_attribute *attr,
}
}
} else if (apid1 < napm_bits) {
- for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, napm_bits) {
+ for_each_set_bit_inv(apid, matrix->apm, napm_bits) {
n = sprintf(bufpos, "%02lx.\n", apid);
bufpos += n;
nchars += n;
}
} else if (apqi1 < naqm_bits) {
- for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, naqm_bits) {
+ for_each_set_bit_inv(apqi, matrix->aqm, naqm_bits) {
n = sprintf(bufpos, ".%04lx\n", apqi);
bufpos += n;
nchars += n;
}
}

+ return nchars;
+}
+
+static ssize_t matrix_show(struct device *dev, struct device_attribute *attr,
+ char *buf)
+{
+ ssize_t nchars;
+ struct mdev_device *mdev = mdev_from_dev(dev);
+ struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
+
+ mutex_lock(&matrix_dev->lock);
+ nchars = vfio_ap_mdev_matrix_show(&matrix_mdev->matrix, buf);
mutex_unlock(&matrix_dev->lock);

return nchars;
}
static DEVICE_ATTR_RO(matrix);

+static ssize_t guest_matrix_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ ssize_t nchars;
+ struct mdev_device *mdev = mdev_from_dev(dev);
+ struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
+
+ mutex_lock(&matrix_dev->lock);
+ nchars = vfio_ap_mdev_matrix_show(&matrix_mdev->shadow_apcb, buf);
+ mutex_unlock(&matrix_dev->lock);
+
+ return nchars;
+}
+static DEVICE_ATTR_RO(guest_matrix);
+
static struct attribute *vfio_ap_mdev_attrs[] = {
&dev_attr_assign_adapter.attr,
&dev_attr_unassign_adapter.attr,
@@ -1189,6 +1212,7 @@ static struct attribute *vfio_ap_mdev_attrs[] = {
&dev_attr_unassign_control_domain.attr,
&dev_attr_control_domains.attr,
&dev_attr_matrix.attr,
+ &dev_attr_guest_matrix.attr,
NULL,
};

--
2.31.1

2021-10-21 15:26:47

by Anthony Krowiak

Subject: [PATCH v17 15/15] s390/vfio-ap: update docs to include dynamic config support

Update the documentation in vfio-ap.rst to include information about the
AP dynamic configuration support (e.g., hot plug of adapters, domains
and control domains via the matrix mediated device's sysfs assignment
attributes). This patch also makes a few minor corrections and
clarifications.

Signed-off-by: Tony Krowiak <[email protected]>
---
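Note (not part of the commit): the documentation below states that the bits
of the apmask and aqmask are numbered from left to right. The following is a
minimal user-space sketch of that numbering, assuming the mask is given as a
hex string with an optional "0x" prefix. The function name mask_to_ids is
made up for illustration.

```c
/*
 * Decode a hex mask string into the IDs of its set bits, numbering the bits
 * from left to right starting at 0. At most max_ids IDs are stored; any
 * further set bits are silently dropped. Returns the number of IDs stored,
 * or -1 if a non-hex character is found.
 */
static int mask_to_ids(const char *hex, int *ids, int max_ids)
{
	int count = 0;
	int bit = 0;

	if (hex[0] == '0' && (hex[1] == 'x' || hex[1] == 'X'))
		hex += 2;

	for (; *hex && *hex != '\n'; hex++) {
		int nibble;
		int i;

		if (*hex >= '0' && *hex <= '9')
			nibble = *hex - '0';
		else if (*hex >= 'a' && *hex <= 'f')
			nibble = *hex - 'a' + 10;
		else if (*hex >= 'A' && *hex <= 'F')
			nibble = *hex - 'A' + 10;
		else
			return -1;

		/* The leftmost bit of each hex digit has the lowest ID */
		for (i = 0; i < 4; i++, bit++) {
			if ((nibble & (8 >> i)) && count < max_ids)
				ids[count++] = bit;
		}
	}

	return count;
}
```

Applied to the leading digits of the example masks in the document, "0x7d"
yields IDs 1, 2, 3, 4, 5 and 7, and "0x41" yields IDs 1 and 7.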
Documentation/s390/vfio-ap.rst | 492 +++++++++++++++++++++++----------
1 file changed, 346 insertions(+), 146 deletions(-)
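
Note (not part of the commit): the document describes two filtering rules
that are applied before an AP configuration is handed to a guest. The
following is a rough user-space sketch of those rules, not the driver's
implementation. It assumes plain boolean arrays instead of kernel bitmaps and
assumes the second rule runs on the result of the first. The names ap_cfg,
queue_bound_fn, demo_bound and filter_guest_matrix are made up for
illustration.

```c
#include <stdbool.h>

#define AP_IDS 256

struct ap_cfg {
	bool apm[AP_IDS];	/* adapters (APIDs) */
	bool aqm[AP_IDS];	/* usage domains (APQIs) */
	bool adm[AP_IDS];	/* control domains */
};

/* Hypothetical callback: true if queue (apid, apqi) is bound to vfio_ap */
typedef bool (*queue_bound_fn)(int apid, int apqi);

/* Hypothetical binding state: every queue is bound except 06.0047 */
static bool demo_bound(int apid, int apqi)
{
	return !(apid == 6 && apqi == 0x47);
}

static void filter_guest_matrix(const struct ap_cfg *mdev,
				const struct ap_cfg *host,
				queue_bound_fn bound, struct ap_cfg *guest)
{
	int apid;
	int apqi;
	int i;

	/* Rule 1: drop IDs that are not in the host's AP configuration */
	for (i = 0; i < AP_IDS; i++) {
		guest->apm[i] = mdev->apm[i] && host->apm[i];
		guest->aqm[i] = mdev->aqm[i] && host->aqm[i];
		guest->adm[i] = mdev->adm[i] && host->adm[i];
	}

	/* Rule 2: do not plug an adapter if any of its APQNs is unbound */
	for (apid = 0; apid < AP_IDS; apid++) {
		if (!guest->apm[apid])
			continue;
		for (apqi = 0; apqi < AP_IDS; apqi++) {
			if (guest->aqm[apqi] && !bound(apid, apqi)) {
				guest->apm[apid] = false;
				break;
			}
		}
	}
}
```

With adapters 5, 6 and 9 and domains 4 and 0x47 assigned to the mdev, a host
configuration that lacks adapter 9, and queue 06.0047 unbound, only adapter 5
would remain in the guest's APM under this sketch.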

diff --git a/Documentation/s390/vfio-ap.rst b/Documentation/s390/vfio-ap.rst
index f57ae621f33e..f4b8748ab9a8 100644
--- a/Documentation/s390/vfio-ap.rst
+++ b/Documentation/s390/vfio-ap.rst
@@ -123,27 +123,24 @@ Let's now take a look at how AP instructions executed on a guest are interpreted
by the hardware.

A satellite control block called the Crypto Control Block (CRYCB) is attached to
-our main hardware virtualization control block. The CRYCB contains three fields
-to identify the adapters, usage domains and control domains assigned to the KVM
-guest:
+our main hardware virtualization control block. The CRYCB contains an AP Control
+Block (APCB) that has three fields to identify the adapters, usage domains and
+control domains assigned to the KVM guest:

* The AP Mask (APM) field is a bit mask that identifies the AP adapters assigned
- to the KVM guest. Each bit in the mask, from left to right (i.e. from most
- significant to least significant bit in big endian order), corresponds to
+ to the KVM guest. Each bit in the mask, from left to right, corresponds to
an APID from 0-255. If a bit is set, the corresponding adapter is valid for
use by the KVM guest.

* The AP Queue Mask (AQM) field is a bit mask identifying the AP usage domains
- assigned to the KVM guest. Each bit in the mask, from left to right (i.e. from
- most significant to least significant bit in big endian order), corresponds to
- an AP queue index (APQI) from 0-255. If a bit is set, the corresponding queue
- is valid for use by the KVM guest.
+ assigned to the KVM guest. Each bit in the mask, from left to right,
+ corresponds to an AP queue index (APQI) from 0-255. If a bit is set, the
+ corresponding queue is valid for use by the KVM guest.

* The AP Domain Mask field is a bit mask that identifies the AP control domains
assigned to the KVM guest. The ADM bit mask controls which domains can be
changed by an AP command-request message sent to a usage domain from the
- guest. Each bit in the mask, from left to right (i.e. from most significant to
- least significant bit in big endian order), corresponds to a domain from
+ guest. Each bit in the mask, from left to right, corresponds to a domain from
0-255. If a bit is set, the corresponding domain can be modified by an AP
command-request message sent to a usage domain.

@@ -151,10 +148,10 @@ If you recall from the description of an AP Queue, AP instructions include
an APQN to identify the AP queue to which an AP command-request message is to be
sent (NQAP and PQAP instructions), or from which a command-reply message is to
be received (DQAP instruction). The validity of an APQN is defined by the matrix
-calculated from the APM and AQM; it is the cross product of all assigned adapter
-numbers (APM) with all assigned queue indexes (AQM). For example, if adapters 1
-and 2 and usage domains 5 and 6 are assigned to a guest, the APQNs (1,5), (1,6),
-(2,5) and (2,6) will be valid for the guest.
+calculated from the APM and AQM; it is the Cartesian product of all assigned
+adapter numbers (APM) with all assigned queue indexes (AQM). For example, if
+adapters 1 and 2 and usage domains 5 and 6 are assigned to a guest, the APQNs
+(1,5), (1,6), (2,5) and (2,6) will be valid for the guest.

The APQNs can provide secure key functionality - i.e., a private key is stored
on the adapter card for each of its domains - so each APQN must be assigned to
@@ -192,7 +189,7 @@ The design introduces three new objects:

1. AP matrix device
2. VFIO AP device driver (vfio_ap.ko)
-3. VFIO AP mediated matrix pass-through device
+3. VFIO AP mediated pass-through device

The VFIO AP device driver
-------------------------
@@ -200,12 +197,13 @@ The VFIO AP (vfio_ap) device driver serves the following purposes:

1. Provides the interfaces to secure APQNs for exclusive use of KVM guests.

-2. Sets up the VFIO mediated device interfaces to manage a mediated matrix
+2. Sets up the VFIO mediated device interfaces to manage a vfio_ap mediated
device and creates the sysfs interfaces for assigning adapters, usage
domains, and control domains comprising the matrix for a KVM guest.

-3. Configures the APM, AQM and ADM in the CRYCB referenced by a KVM guest's
- SIE state description to grant the guest access to a matrix of AP devices
+3. Configures the APM, AQM and ADM in the APCB contained in the CRYCB referenced
+ by a KVM guest's SIE state description to grant the guest access to a matrix
+ of AP devices

Reserve APQNs for exclusive use of KVM guests
---------------------------------------------
@@ -235,10 +233,10 @@ reserved::
| | 8 probe | |
+--------^---------+ +--^--^------------+
6 edit | | |
- apmask | +-----------------------------+ | 9 mdev create
+ apmask | +-----------------------------+ | 11 mdev create
aqmask | | 1 modprobe |
+--------+-----+---+ +----------------+-+ +----------------+
- | | | |8 create | mediated |
+ | | | |10 create| mediated |
| admin | | VFIO device core |---------> matrix |
| + | | | device |
+------+-+---------+ +--------^---------+ +--------^-------+
@@ -246,14 +244,14 @@ reserved::
| | 9 create vfio_ap-passthrough | |
| +------------------------------+ |
+-------------------------------------------------------------+
- 10 assign adapter/domain/control domain
+ 12 assign adapter/domain/control domain

The process for reserving an AP queue for use by a KVM guest is:

1. The administrator loads the vfio_ap device driver
2. The vfio-ap driver during its initialization will register a single 'matrix'
device with the device core. This will serve as the parent device for
- all mediated matrix devices used to configure an AP matrix for a guest.
+ all vfio_ap mediated devices used to configure an AP matrix for a guest.
3. The /sys/devices/vfio_ap/matrix device is created by the device core
4. The vfio_ap device driver will register with the AP bus for AP queue devices
of type 10 and higher (CEX4 and newer). The driver will provide the vfio_ap
@@ -269,24 +267,24 @@ The process for reserving an AP queue for use by a KVM guest is:
default zcrypt cex4queue driver.
8. The AP bus probes the vfio_ap device driver to bind the queues reserved for
it.
-9. The administrator creates a passthrough type mediated matrix device to be
+9. The administrator creates a passthrough type vfio_ap mediated device to be
used by a guest
10. The administrator assigns the adapters, usage domains and control domains
to be exclusively used by a guest.

Set up the VFIO mediated device interfaces
------------------------------------------
-The VFIO AP device driver utilizes the common interface of the VFIO mediated
+The VFIO AP device driver utilizes the common interfaces of the VFIO mediated
device core driver to:

-* Register an AP mediated bus driver to add a mediated matrix device to and
+* Register an AP mediated bus driver to add a vfio_ap mediated device to and
remove it from a VFIO group.
-* Create and destroy a mediated matrix device
-* Add a mediated matrix device to and remove it from the AP mediated bus driver
-* Add a mediated matrix device to and remove it from an IOMMU group
+* Create and destroy a vfio_ap mediated device
+* Add a vfio_ap mediated device to and remove it from the AP mediated bus driver
+* Add a vfio_ap mediated device to and remove it from an IOMMU group

The following high-level block diagram shows the main components and interfaces
-of the VFIO AP mediated matrix device driver::
+of the VFIO AP mediated device driver::

+-------------+
| |
@@ -343,7 +341,7 @@ matrix device.
* device_api:
the mediated device type's API
* available_instances:
- the number of mediated matrix passthrough devices
+ the number of vfio_ap mediated passthrough devices
that can be created
* device_api:
specifies the VFIO API
@@ -351,29 +349,37 @@ matrix device.
This attribute group identifies the user-defined sysfs attributes of the
mediated device. When a device is registered with the VFIO mediated device
framework, the sysfs attribute files identified in the 'mdev_attr_groups'
- structure will be created in the mediated matrix device's directory. The
- sysfs attributes for a mediated matrix device are:
+ structure will be created in the vfio_ap mediated device's directory. The
+ sysfs attributes for a vfio_ap mediated device are:

assign_adapter / unassign_adapter:
Write-only attributes for assigning/unassigning an AP adapter to/from the
- mediated matrix device. To assign/unassign an adapter, the APID of the
- adapter is echoed to the respective attribute file.
+ vfio_ap mediated device. To assign/unassign an adapter, the APID of the
+ adapter is echoed into the respective attribute file.
assign_domain / unassign_domain:
Write-only attributes for assigning/unassigning an AP usage domain to/from
- the mediated matrix device. To assign/unassign a domain, the domain
- number of the usage domain is echoed to the respective attribute
+ the vfio_ap mediated device. To assign/unassign a domain, the domain
+ number of the usage domain is echoed into the respective attribute
file.
matrix:
- A read-only file for displaying the APQNs derived from the cross product
- of the adapter and domain numbers assigned to the mediated matrix device.
+ A read-only file for displaying the APQNs derived from the Cartesian
+ product of the adapter and domain numbers assigned to the vfio_ap mediated
+ device.
+ guest_matrix:
+ A read-only file for displaying the APQNs derived from the Cartesian
+ product of the adapter and domain numbers assigned to the APM and AQM
+ fields respectively of the KVM guest's CRYCB. This may differ from the
+ APQNs assigned to the vfio_ap mediated device if any APQN does not
+ reference a queue device bound to the vfio_ap device driver (i.e., the
+ queue is not in the host's AP configuration).
assign_control_domain / unassign_control_domain:
Write-only attributes for assigning/unassigning an AP control domain
- to/from the mediated matrix device. To assign/unassign a control domain,
- the ID of the domain to be assigned/unassigned is echoed to the respective
- attribute file.
+ to/from the vfio_ap mediated device. To assign/unassign a control domain,
+ the ID of the domain to be assigned/unassigned is echoed into the
+ respective attribute file.
control_domains:
A read-only file for displaying the control domain numbers assigned to the
- mediated matrix device.
+ vfio_ap mediated device.

* functions:

@@ -383,45 +389,75 @@ matrix device.
* Store the reference to the KVM structure for the guest using the mdev
* Store the AP matrix configuration for the adapters, domains, and control
domains assigned via the corresponding sysfs attributes files
+ * Store the AP matrix configuration for the adapters, domains and control
+ domains available to a guest. A guest may not be provided access to APQNs
+ referencing queue devices that do not exist, or are not bound to the
+ vfio_ap device driver.

remove:
- deallocates the mediated matrix device's ap_matrix_mdev structure. This will
- be allowed only if a running guest is not using the mdev.
+ deallocates the vfio_ap mediated device's ap_matrix_mdev structure.
+ This will be allowed only if a running guest is not using the mdev.

* callback interfaces

- open:
+ open_device:
The vfio_ap driver uses this callback to register a
- VFIO_GROUP_NOTIFY_SET_KVM notifier callback function for the mdev matrix
- device. The open is invoked when QEMU connects the VFIO iommu group
- for the mdev matrix device to the MDEV bus. Access to the KVM structure used
- to configure the KVM guest is provided via this callback. The KVM structure,
- is used to configure the guest's access to the AP matrix defined via the
- mediated matrix device's sysfs attribute files.
- release:
+ VFIO_GROUP_NOTIFY_SET_KVM notifier callback function for the matrix mdev
+ devices. The open_device callback is invoked by userspace to connect the
+ VFIO iommu group for the matrix mdev device to the MDEV bus. Access to the
+ KVM structure used to configure the KVM guest is provided via this callback.
+ The KVM structure is used to configure the guest's access to the AP matrix
+ defined via the vfio_ap mediated device's sysfs attribute files.
+
+ close_device:
unregisters the VFIO_GROUP_NOTIFY_SET_KVM notifier callback function for the
- mdev matrix device and deconfigures the guest's AP matrix.
+ matrix mdev device and deconfigures the guest's AP matrix.

-Configure the APM, AQM and ADM in the CRYCB
--------------------------------------------
-Configuring the AP matrix for a KVM guest will be performed when the
+ ioctl:
+ this callback handles the VFIO_DEVICE_GET_INFO and VFIO_DEVICE_RESET ioctls
+ defined by the vfio framework.
+
+Configure the guest's AP resources
+----------------------------------
+Configuring the AP resources for a KVM guest will be performed when the
VFIO_GROUP_NOTIFY_SET_KVM notifier callback is invoked. The notifier
-function is called when QEMU connects to KVM. The guest's AP matrix is
-configured via it's CRYCB by:
+function is called when userspace connects to KVM. The guest's AP resources are
+configured via its APCB by:

* Setting the bits in the APM corresponding to the APIDs assigned to the
- mediated matrix device via its 'assign_adapter' interface.
+ vfio_ap mediated device via its 'assign_adapter' interface.
* Setting the bits in the AQM corresponding to the domains assigned to the
- mediated matrix device via its 'assign_domain' interface.
+ vfio_ap mediated device via its 'assign_domain' interface.
* Setting the bits in the ADM corresponding to the domain dIDs assigned to the
- mediated matrix device via its 'assign_control_domains' interface.
+ vfio_ap mediated device via its 'assign_control_domain' interface.
+
+The linux device model precludes passing a device through to a KVM guest that
+is not bound to the device driver facilitating its pass-through. Consequently,
+an APQN that does not reference a queue device bound to the vfio_ap device
+driver will not be assigned to a KVM guest's matrix. The AP architecture,
+however, does not provide a means to filter individual APQNs from the guest's
+matrix, so the adapters, domains and control domains assigned to a vfio_ap
+mediated device via its sysfs 'assign_adapter', 'assign_domain' and
+'assign_control_domain' interfaces will be filtered before providing the AP
+configuration to a guest:
+
+* The APIDs of the adapters, the APQIs of the domains and the domain numbers of
+ the control domains assigned to the matrix mdev that are not also assigned to
+ the host's AP configuration will be filtered.
+
+* Each APQN derived from the Cartesian product of the APIDs and APQIs assigned
+ to the vfio_ap mdev is examined and if any one of them does not reference a
+ queue device bound to the vfio_ap device driver, the adapter will not be
+ plugged into the guest (i.e., the bit corresponding to its APID will not be
+ set in the APM of the guest's APCB).

The CPU model features for AP
-----------------------------
-The AP stack relies on the presence of the AP instructions as well as two
-facilities: The AP Facilities Test (APFT) facility; and the AP Query
-Configuration Information (QCI) facility. These features/facilities are made
-available to a KVM guest via the following CPU model features:
+The AP stack relies on the presence of the AP instructions as well as three
+facilities: The AP Facilities Test (APFT) facility; the AP Query
+Configuration Information (QCI) facility; and the AP Queue Interruption Control
+facility. These features/facilities are made available to a KVM guest via the
+following CPU model features:

1. ap: Indicates whether the AP instructions are installed on the guest. This
feature will be enabled by KVM only if the AP instructions are installed
@@ -435,24 +471,28 @@ available to a KVM guest via the following CPU model features:
can be made available to the guest only if it is available on the host (i.e.,
facility bit 12 is set).

+4. apqi: Indicates AP Queue Interruption Control facility is available on the
+ guest. This facility can be made available to the guest only if it is
+ available on the host (i.e., facility bit 65 is set).
+
Note: If the user chooses to specify a CPU model different than the 'host'
model to QEMU, the CPU model features and facilities need to be turned on
explicitly; for example::

- /usr/bin/qemu-system-s390x ... -cpu z13,ap=on,apqci=on,apft=on
+ /usr/bin/qemu-system-s390x ... -cpu z13,ap=on,apqci=on,apft=on,apqi=on

A guest can be precluded from using AP features/facilities by turning them off
explicitly; for example::

- /usr/bin/qemu-system-s390x ... -cpu host,ap=off,apqci=off,apft=off
+ /usr/bin/qemu-system-s390x ... -cpu host,ap=off,apqci=off,apft=off,apqi=off

Note: If the APFT facility is turned off (apft=off) for the guest, the guest
-will not see any AP devices. The zcrypt device drivers that register for type 10
-and newer AP devices - i.e., the cex4card and cex4queue device drivers - need
-the APFT facility to ascertain the facilities installed on a given AP device. If
-the APFT facility is not installed on the guest, then the probe of device
-drivers will fail since only type 10 and newer devices can be configured for
-guest use.
+will not see any AP devices. The zcrypt device drivers on the guest that
+register for type 10 and newer AP devices - i.e., the cex4card and cex4queue
+device drivers - need the APFT facility to ascertain the facilities installed on
+a given AP device. If the APFT facility is not installed on the guest, then no
+adapter or domain devices will get created by the AP bus running on the
+guest because only type 10 and newer devices can be configured for guest use.

Example
=======
@@ -471,7 +511,7 @@ CARD.DOMAIN TYPE MODE
05.00ab CEX5C CCA-Coproc
06 CEX5A Accelerator
06.0004 CEX5A Accelerator
-06.00ab CEX5C CCA-Coproc
+06.00ab CEX5A Accelerator
=========== ===== ============

Guest2
@@ -479,9 +519,9 @@ Guest2
=========== ===== ============
CARD.DOMAIN TYPE MODE
=========== ===== ============
-05 CEX5A Accelerator
-05.0047 CEX5A Accelerator
-05.00ff CEX5A Accelerator
+05 CEX5C CCA-Coproc
+05.0047 CEX5C CCA-Coproc
+05.00ff CEX5C CCA-Coproc
=========== ===== ============

Guest3
@@ -529,40 +569,56 @@ These are the steps:

2. Secure the AP queues to be used by the three guests so that the host can not
access them. To secure them, there are two sysfs files that specify
- bitmasks marking a subset of the APQN range as 'usable by the default AP
- queue device drivers' or 'not usable by the default device drivers' and thus
- available for use by the vfio_ap device driver'. The location of the sysfs
- files containing the masks are::
+ bitmasks marking a subset of the APQN range as usable only by the default AP
+ queue device drivers. All remaining APQNs are available for use by
+ any other device driver. The vfio_ap device driver is currently the only
+ non-default device driver. The locations of the sysfs files containing the
+ masks are::

/sys/bus/ap/apmask
/sys/bus/ap/aqmask

The 'apmask' is a 256-bit mask that identifies a set of AP adapter IDs
- (APID). Each bit in the mask, from left to right (i.e., from most significant
- to least significant bit in big endian order), corresponds to an APID from
- 0-255. If a bit is set, the APID is marked as usable only by the default AP
- queue device drivers; otherwise, the APID is usable by the vfio_ap
- device driver.
+ (APID). Each bit in the mask, from left to right, corresponds to an APID from
+ 0-255. If a bit is set, the APID belongs to the subset of APQNs marked as
+ available only to the default AP queue device drivers.

The 'aqmask' is a 256-bit mask that identifies a set of AP queue indexes
- (APQI). Each bit in the mask, from left to right (i.e., from most significant
- to least significant bit in big endian order), corresponds to an APQI from
- 0-255. If a bit is set, the APQI is marked as usable only by the default AP
- queue device drivers; otherwise, the APQI is usable by the vfio_ap device
- driver.
+ (APQI). Each bit in the mask, from left to right, corresponds to an APQI from
+ 0-255. If a bit is set, the APQI belongs to the subset of APQNs marked as
+ available only to the default AP queue device drivers.
+
+ The Cartesian product of the APIDs corresponding to the bits set in the
+ apmask and the APQIs corresponding to the bits set in the aqmask comprises
+ the subset of APQNs that can be used only by the host default device drivers.
+ All other APQNs are available to the non-default device drivers such as the
+ vfio_ap driver.
+
+ Take, for example, the following masks::
+
+ apmask:
+ 0x7d00000000000000000000000000000000000000000000000000000000000000

- Take, for example, the following mask::
+ aqmask:
+ 0x8000000000000000000000000000000000000000000000000000000000000000

- 0x7dffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
+ The masks indicate:

- It indicates:
+ * Adapters 1, 2, 3, 4, 5, and 7 are available for use by the host default
+ device drivers.

- 1, 2, 3, 4, 5, and 7-255 belong to the default drivers' pool, and 0 and 6
- belong to the vfio_ap device driver's pool.
+ * Domain 0 is available for use by the host default device drivers.
+
+ * The subset of APQNs available for use only by the default host device
+ drivers are:
+
+ (1,0), (2,0), (3,0), (4,0), (5,0) and (7,0)
+
+ * All other APQNs are available for use by the non-default device drivers.

The APQN of each AP queue device assigned to the linux host is checked by the
- AP bus against the set of APQNs derived from the cross product of APIDs
- and APQIs marked as usable only by the default AP queue device drivers. If a
+ AP bus against the set of APQNs derived from the Cartesian product of APIDs
+ and APQIs marked as available to the default AP queue device drivers. If a
match is detected, only the default AP queue device drivers will be probed;
otherwise, the vfio_ap device driver will be probed.

@@ -579,8 +635,7 @@ These are the steps:

0x4100000000000000000000000000000000000000000000000000000000000000

- Keep in mind that the mask reads from left to right (i.e., most
- significant to least significant bit in big endian order), so the mask
+ Keep in mind that the mask reads from left to right, so the mask
above identifies device numbers 1 and 7 (01000001).

If the string is longer than the mask, the operation is terminated with
@@ -626,11 +681,22 @@ These are the steps:
default drivers pool: adapter 0-15, domain 1
alternate drivers pool: adapter 16-255, domains 0, 2-255

+ Note ***:
+ Changing a mask such that one or more APQNs will be taken from a vfio_ap
+ mediated device (see below) will fail with an error (EBUSY). A message
+ is logged to the kernel ring buffer which can be viewed with the 'dmesg'
+ command. The output identifies each APQN flagged as 'in use' and identifies
+ the vfio_ap mediated device to which it is assigned; for example:
+
+ Userspace may not re-assign queue 05.0054 already assigned to 62177883-f1bb-47f0-914d-32a22e3a8804
+ Userspace may not re-assign queue 04.0054 already assigned to cef03c3c-903d-4ecc-9a83-40694cb8aee4
+
Securing the APQNs for our example
----------------------------------
To secure the AP queues 05.0004, 05.0047, 05.00ab, 05.00ff, 06.0004, 06.0047,
06.00ab, and 06.00ff for use by the vfio_ap device driver, the corresponding
- APQNs can either be removed from the default masks::
+ APQNs can be removed from the default masks using either of the following
+ commands::

echo -5,-6 > /sys/bus/ap/apmask

@@ -683,7 +749,7 @@ Securing the APQNs for our example

/sys/devices/vfio_ap/matrix/
--- [mdev_supported_types]
- ------ [vfio_ap-passthrough] (passthrough mediated matrix device type)
+ ------ [vfio_ap-passthrough] (passthrough vfio_ap mediated device type)
--------- create
--------- [devices]

@@ -734,6 +800,9 @@ Securing the APQNs for our example
----------------unassign_control_domain
----------------unassign_domain

+ Note *****: The vfio_ap mdevs do not persist across reboots unless the
+ mdevctl tool is used to create and persist them.
+
4. The administrator now needs to configure the matrixes for the mediated
devices $uuid1 (for Guest1), $uuid2 (for Guest2) and $uuid3 (for Guest3).

@@ -755,6 +824,10 @@ Securing the APQNs for our example

cat matrix

+ To display the matrix that is or will be assigned to Guest1::
+
+ cat guest_matrix
+
This is how the matrix is configured for Guest2::

echo 5 > assign_adapter
@@ -774,17 +847,24 @@ Securing the APQNs for our example
higher than the maximum is specified, the operation will terminate with
an error (ENODEV).

- * All APQNs that can be derived from the adapter ID and the IDs of
- the previously assigned domains must be bound to the vfio_ap device
- driver. If no domains have yet been assigned, then there must be at least
- one APQN with the specified APID bound to the vfio_ap driver. If no such
- APQNs are bound to the driver, the operation will terminate with an
- error (EADDRNOTAVAIL).
+ Note: The maximum adapter number can be obtained via the sysfs
+ /sys/bus/ap/ap_max_adapter_id attribute file.
+
+ * Each APQN derived from the Cartesian product of the APID of the adapter
+ being assigned and the APQIs of the domains previously assigned:

- No APQN that can be derived from the adapter ID and the IDs of the
- previously assigned domains can be assigned to another mediated matrix
- device. If an APQN is assigned to another mediated matrix device, the
- operation will terminate with an error (EADDRINUSE).
+ - Must only be available to the vfio_ap device driver as specified in the
+ sysfs /sys/bus/ap/apmask and /sys/bus/ap/aqmask attribute files. If even
+ one APQN is reserved for use by the host device driver, the operation
+ will terminate with an error (EADDRNOTAVAIL).
+
+ - Must NOT be assigned to another vfio_ap mediated device. If even one APQN
+ is assigned to another vfio_ap mediated device, the operation will
+ terminate with an error (EBUSY).
+
+ - Must NOT be assigned while the sysfs /sys/bus/ap/apmask and
+ /sys/bus/ap/aqmask attribute files are being edited or the operation may
+ terminate with an error (EBUSY).

In order to successfully assign a domain:

@@ -793,41 +873,50 @@ Securing the APQNs for our example
higher than the maximum is specified, the operation will terminate with
an error (ENODEV).

- * All APQNs that can be derived from the domain ID and the IDs of
- the previously assigned adapters must be bound to the vfio_ap device
- driver. If no domains have yet been assigned, then there must be at least
- one APQN with the specified APQI bound to the vfio_ap driver. If no such
- APQNs are bound to the driver, the operation will terminate with an
- error (EADDRNOTAVAIL).
+ Note: The maximum domain number can be obtained via the sysfs
+ /sys/bus/ap/ap_max_domain_id attribute file.
+
+ * Each APQN derived from the Cartesian product of the APQI of the domain
+ being assigned and the APIDs of the adapters previously assigned:

- No APQN that can be derived from the domain ID and the IDs of the
- previously assigned adapters can be assigned to another mediated matrix
- device. If an APQN is assigned to another mediated matrix device, the
- operation will terminate with an error (EADDRINUSE).
+ - Must only be available to the vfio_ap device driver as specified in the
+ sysfs /sys/bus/ap/apmask and /sys/bus/ap/aqmask attribute files. If even
+ one APQN is reserved for use by the host device driver, the operation
+ will terminate with an error (EADDRNOTAVAIL).

- In order to successfully assign a control domain, the domain number
- specified must represent a value from 0 up to the maximum domain number
- configured for the system. If a control domain number higher than the maximum
- is specified, the operation will terminate with an error (ENODEV).
+ - Must NOT be assigned to another vfio_ap mediated device. If even one APQN
+ is assigned to another vfio_ap mediated device, the operation will
+ terminate with an error (EBUSY).
+
+ - Must NOT be assigned while the sysfs /sys/bus/ap/apmask and
+ /sys/bus/ap/aqmask attribute files are being edited or the operation may
+ terminate with an error (EBUSY).
+
+ In order to successfully assign a control domain:
+
+ * The domain number specified must represent a value from 0 up to the maximum
+ domain number configured for the system. If a control domain number higher
+ than the maximum is specified, the operation will terminate with an
+ error (ENODEV).

5. Start Guest1::

- /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on \
+ /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on,apqi=on \
-device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid1 ...

7. Start Guest2::

- /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on \
+ /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on,apqi=on \
-device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid2 ...

7. Start Guest3::

- /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on \
+ /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on,apqi=on \
-device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid3 ...

-When the guest is shut down, the mediated matrix devices may be removed.
+When the guest is shut down, the vfio_ap mediated devices may be removed.

-Using our example again, to remove the mediated matrix device $uuid1::
+Using our example again, to remove the vfio_ap mediated device $uuid1::

/sys/devices/vfio_ap/matrix/
--- [mdev_supported_types]
@@ -840,26 +929,137 @@ Using our example again, to remove the mediated matrix device $uuid1::

echo 1 > remove

-This will remove all of the mdev matrix device's sysfs structures including
-the mdev device itself. To recreate and reconfigure the mdev matrix device,
+This will remove all of the matrix mdev device's sysfs structures including
+the mdev device itself. To recreate and reconfigure the matrix mdev device,
all of the steps starting with step 3 will have to be performed again. Note
-that the remove will fail if a guest using the mdev is still running.
+that the remove will fail if a guest using the vfio_ap mdev is still running.

-It is not necessary to remove an mdev matrix device, but one may want to
+It is not necessary to remove a vfio_ap mdev, but one may want to
remove it if no guest will use it during the remaining lifetime of the linux
-host. If the mdev matrix device is removed, one may want to also reconfigure
+host. If the vfio_ap mdev is removed, one may want to also reconfigure
the pool of adapters and queues reserved for use by the default drivers.

+Hot plug/unplug support:
+========================
+An adapter, domain or control domain may be hot plugged into a running KVM
+guest by assigning it to the vfio_ap mediated device being used by the guest if
+the following conditions are met:
+
+* The adapter, domain or control domain must also be assigned to the host's
+ AP configuration.
+
+* Each APQN derived from the Cartesian product comprised of the APID of the
+ adapter being assigned and the APQIs of the domains assigned must reference a
+ queue device bound to the vfio_ap device driver.
+
+* To hot plug a domain, each APQN derived from the Cartesian product
+ comprised of the APQI of the domain being assigned and the APIDs of the
+ adapters assigned must reference a queue device bound to the vfio_ap device
+ driver.
+
+An adapter, domain or control domain may be hot unplugged from a running KVM
+guest by unassigning it from the vfio_ap mediated device being used by the
+guest.
+
+Over-provisioning of AP queues for a KVM guest:
+==============================================
+Over-provisioning is defined herein as the assignment of adapters or domains to
+a vfio_ap mediated device that do not reference AP devices in the host's AP
+configuration. The idea here is that when the adapter or domain becomes
+available, it will be automatically hot-plugged into the KVM guest using
+the vfio_ap mediated device to which it is assigned as long as each new APQN
+resulting from plugging it in references a queue device bound to the vfio_ap
+device driver.
+
Limitations
===========
-* The KVM/kernel interfaces do not provide a way to prevent restoring an APQN
- to the default drivers pool of a queue that is still assigned to a mediated
- device in use by a guest. It is incumbent upon the administrator to
- ensure there is no mediated device in use by a guest to which the APQN is
- assigned lest the host be given access to the private data of the AP queue
- device such as a private key configured specifically for the guest.
+Live guest migration is not supported for guests using AP devices without
+intervention by a system administrator. Before a KVM guest can be migrated,
+the vfio_ap mediated device must be removed. Unfortunately, it can not be
+removed manually (i.e., echo 1 > /sys/devices/vfio_ap/matrix/$UUID/remove) while
+the mdev is in use by a KVM guest. If the guest is being emulated by QEMU,
+its mdev can be hot unplugged from the guest in one of two ways:
+
+1. If the KVM guest was started with libvirt, you can hot unplug the mdev via
+ the following commands:
+
+ virsh detach-device <guestname> <path-to-device-xml>
+
+ For example, to hot unplug mdev 62177883-f1bb-47f0-914d-32a22e3a8804 from
+ the guest named 'my-guest':
+
+ virsh detach-device my-guest ~/config/my-guest-hostdev.xml
+
+ The contents of my-guest-hostdev.xml:
+
+ <hostdev mode='subsystem' type='mdev' managed='no' model='vfio-ap'>
+ <source>
+ <address uuid='62177883-f1bb-47f0-914d-32a22e3a8804'/>
+ </source>
+ </hostdev>
+
+
+ virsh qemu-monitor-command <guest-name> --hmp "device_del <device-id>"
+
+ For example, to hot unplug the vfio_ap mediated device identified on the
+ qemu command line with 'id=hostdev0' from the guest named 'my-guest':
+
+ virsh qemu-monitor-command my-guest --hmp "device_del hostdev0"
+
+2. A vfio_ap mediated device can be hot unplugged by attaching the qemu monitor
+ to the guest and using the following qemu monitor command:
+
+ (QEMU) device_del id=<device-id>
+
+ For example, to hot unplug the vfio_ap mediated device that was specified
+ on the qemu command line with 'id=hostdev0' when the guest was started:
+
+ (QEMU) device_del id=hostdev0
+
+After live migration of the KVM guest completes, an AP configuration can be
+restored to the KVM guest by hot plugging a vfio_ap mediated device on the target
+system into the guest in one of two ways:
+
+1. If the KVM guest was started with libvirt, you can hot plug a matrix mediated
+ device into the guest via the following virsh commands:
+
+ virsh attach-device <guestname> <path-to-device-xml>
+
+ For example, to hot plug mdev 62177883-f1bb-47f0-914d-32a22e3a8804 into
+ the guest named 'my-guest':
+
+ virsh attach-device my-guest ~/config/my-guest-hostdev.xml
+
+ The contents of my-guest-hostdev.xml:
+
+ <hostdev mode='subsystem' type='mdev' managed='no' model='vfio-ap'>
+ <source>
+ <address uuid='62177883-f1bb-47f0-914d-32a22e3a8804'/>
+ </source>
+ </hostdev>
+
+
+ virsh qemu-monitor-command <guest-name> --hmp \
+ "device_add vfio-ap,sysfsdev=<path-to-mdev>,id=<device-id>"
+
+ For example, to hot plug the vfio_ap mediated device
+ 62177883-f1bb-47f0-914d-32a22e3a8804 into the guest named 'my-guest' with
+ device-id hostdev0:
+
+ virsh qemu-monitor-command my-guest --hmp \
+ "device_add vfio-ap,\
+ sysfsdev=/sys/devices/vfio_ap/matrix/62177883-f1bb-47f0-914d-32a22e3a8804,\
+ id=hostdev0"
+
+2. A vfio_ap mediated device can be hot plugged by attaching the qemu monitor
+ to the guest and using the following qemu monitor command:
+
+ (qemu) device_add "vfio-ap,sysfsdev=<path-to-mdev>,id=<device-id>"

-* Dynamically modifying the AP matrix for a running guest (which would amount to
- hot(un)plug of AP devices for the guest) is currently not supported
+ For example, to plug the vfio_ap mediated device
+ 62177883-f1bb-47f0-914d-32a22e3a8804 into the guest with the device-id
+ hostdev0:

-* Live guest migration is not supported for guests using AP devices.
+ (QEMU) device_add "vfio-ap,\
+ sysfsdev=/sys/devices/vfio_ap/matrix/62177883-f1bb-47f0-914d-32a22e3a8804,\
+ id=hostdev0"
--
2.31.1

2021-10-21 15:28:04

by Anthony Krowiak

Subject: [PATCH v17 12/15] s390/vfio-ap: implement in-use callback for vfio_ap driver

Let's implement the callback to indicate when an APQN
is in use by the vfio_ap device driver. The callback is
invoked whenever a change to the apmask or aqmask would
result in one or more queue devices being removed from the driver. The
vfio_ap device driver will indicate a resource is in use
if the APQN of any of the queue devices to be removed is assigned to
any of the matrix mdevs under the driver's control.
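
As a rough illustration of that check (a user-space sketch, not the driver's
code), the test reduces to intersecting bitmasks: an APQN is shared only if
both its APID and its APQI appear in the masks being removed and in an mdev's
matrix. The struct mdev_matrix type, the apqn_in_use() name and the 64-bit
masks are assumptions made for brevity; the kernel uses larger bitmaps and the
inverted bit numbering of set_bit_inv().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of one mdev's matrix (64 x 64 only). */
struct mdev_matrix {
	uint64_t apm;	/* bit n set: adapter (APID) n is assigned */
	uint64_t aqm;	/* bit n set: domain (APQI) n is assigned */
};

/*
 * Return true if any APQN in the cross product apm x aqm is also
 * assigned to one of the n mdevs. An APQN (apid, apqi) is common to
 * both cross products only if apid is in both adapter masks and apqi
 * is in both domain masks, so two non-empty intersections suffice.
 */
static bool apqn_in_use(uint64_t apm, uint64_t aqm,
			const struct mdev_matrix *mdevs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if ((apm & mdevs[i].apm) && (aqm & mdevs[i].aqm))
			return true;
	}

	return false;
}
```

With one mdev holding adapter 5 and domain 4, removing the queue 5/4 from the
driver would be reported as in use, while removing 5/6 or 3/4 would not.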

There is potential for a deadlock condition between the matrix_dev->lock
used to lock the matrix device during assignment of adapters and domains
and the ap_perms_mutex locked by the AP bus when changes are made to the
sysfs apmask/aqmask attributes.

Consider the following scenario (courtesy of Halil Pasic):
1) apmask_store() takes ap_perms_mutex
2) assign_adapter_store() takes matrix_dev->lock
3) apmask_store() calls vfio_ap_mdev_resource_in_use() which tries
to take matrix_dev->lock
4) assign_adapter_store() calls ap_apqn_in_matrix_owned_by_def_drv
which tries to take ap_perms_mutex
BANG!

To resolve this issue, the matrix device is no longer locked with
mutex_lock(&matrix_dev->lock) during assignment of an adapter or domain to
a matrix_mdev or during the in_use callback; mutex_trylock(&matrix_dev->lock)
is used instead. If the lock can not be obtained, the assignment functions
terminate with -EAGAIN and the in_use callback terminates with -EBUSY.
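
The back-off can be sketched in user space with pthread mutexes. This is only
an illustration under stated assumptions: perms_lock and matrix_lock stand in
for ap_perms_mutex and matrix_dev->lock, and assign_path() and
mask_edit_path() are hypothetical names, not functions in the driver. The
error values follow the code in this patch. Because neither path ever blocks
waiting for matrix_lock, the circular wait in the scenario above can not form.

```c
#include <errno.h>
#include <pthread.h>

/* Stand-ins for ap_perms_mutex and matrix_dev->lock (assumed names). */
static pthread_mutex_t perms_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t matrix_lock = PTHREAD_MUTEX_INITIALIZER;

/* Assignment path: wants matrix_lock first, then perms_lock. */
static int assign_path(void)
{
	/* Back off instead of blocking if matrix_lock is contended. */
	if (pthread_mutex_trylock(&matrix_lock) != 0)
		return -EAGAIN;

	pthread_mutex_lock(&perms_lock);
	/* ... validate the new assignment against the masks ... */
	pthread_mutex_unlock(&perms_lock);

	pthread_mutex_unlock(&matrix_lock);
	return 0;
}

/* Mask-edit path: holds perms_lock, then wants matrix_lock. */
static int mask_edit_path(void)
{
	int ret = 0;

	pthread_mutex_lock(&perms_lock);

	if (pthread_mutex_trylock(&matrix_lock) != 0) {
		/* Report busy rather than wait while holding perms_lock. */
		ret = -EBUSY;
	} else {
		/* ... check whether the affected APQNs are in use ... */
		pthread_mutex_unlock(&matrix_lock);
	}

	pthread_mutex_unlock(&perms_lock);
	return ret;
}
```

The cost of this design is that user space may see a transient error and has
to retry the write to the sysfs attribute.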

Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/vfio_ap_drv.c | 1 +
drivers/s390/crypto/vfio_ap_ops.c | 80 ++++++++++++++++++++++++---
drivers/s390/crypto/vfio_ap_private.h | 2 +
3 files changed, 74 insertions(+), 9 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
index 1d1746fe50ea..df7528dcf6ed 100644
--- a/drivers/s390/crypto/vfio_ap_drv.c
+++ b/drivers/s390/crypto/vfio_ap_drv.c
@@ -44,6 +44,7 @@ MODULE_DEVICE_TABLE(vfio_ap, ap_queue_ids);
static struct ap_driver vfio_ap_drv = {
.probe = vfio_ap_mdev_probe_queue,
.remove = vfio_ap_mdev_remove_queue,
+ .in_use = vfio_ap_mdev_resource_in_use,
.ids = ap_queue_ids,
};

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 6b292ed30ada..5386b8635bec 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -635,16 +635,45 @@ static void vfio_ap_mdev_link_adapter(struct ap_matrix_mdev *matrix_mdev,
* vfio_ap_mdev_get_locks - lock the kvm->lock and matrix_dev->lock mutexes
*
* @matrix_mdev: the matrix mediated device object
+ * @check_mdev_lock: indicates whether to check that the matrix_dev->lock mutex
+ * is already locked (true = check, false = do not check).
+ *
+ * Return:
+ * -EAGAIN if the matrix_dev->lock mutex is already locked.
+ * 0 if both locks were acquired.
*/
-static void vfio_ap_mdev_get_locks(struct ap_matrix_mdev *matrix_mdev)
+static int vfio_ap_mdev_get_locks(struct ap_matrix_mdev *matrix_mdev,
+ bool check_mdev_lock)
{
+ /*
+ * If the matrix_dev->lock mutex is to be checked, then there's no
+ * sense in proceeding if it is already locked.
+ */
+ if (check_mdev_lock && mutex_is_locked(&matrix_dev->lock))
+ return -EAGAIN;
+
down_read(&matrix_dev->guests_lock);

/* The kvm->lock must be must be taken before the matrix_dev->lock */
if (matrix_mdev->guest)
mutex_lock(&matrix_mdev->guest->kvm->lock);

- mutex_lock(&matrix_dev->lock);
+ /*
+ * If the matrix_dev->lock is to be checked, then let's try to acquire
+ * it. If it can't be acquired, then let's bail out and return
+ * a value indicating locking should be tried again.
+ */
+ if (check_mdev_lock) {
+ if (!mutex_trylock(&matrix_dev->lock)) {
+ mutex_unlock(&matrix_mdev->guest->kvm->lock);
+ up_read(&matrix_dev->guests_lock);
+ return -EAGAIN;
+ }
+ } else {
+ mutex_lock(&matrix_dev->lock);
+ }
+
+ return 0;
}

/**
@@ -654,7 +683,6 @@ static void vfio_ap_mdev_get_locks(struct ap_matrix_mdev *matrix_mdev)
*/
static void vfio_ap_mdev_put_locks(struct ap_matrix_mdev *matrix_mdev)
{
- /* The kvm->lock must be must be taken before the matrix_dev->lock */
if (matrix_mdev->guest)
mutex_unlock(&matrix_mdev->guest->kvm->lock);

@@ -691,6 +719,10 @@ static void vfio_ap_mdev_put_locks(struct ap_matrix_mdev *matrix_mdev)
* An APQN derived from the cross product of the APID being assigned
* and the APQIs previously assigned is being used by another mediated
* matrix device
+ *
+ * 5. -EAGAIN
+ * The mdev lock could not be acquired which is required in order to
+ * change the AP configuration for the mdev
*/
static ssize_t assign_adapter_store(struct device *dev,
struct device_attribute *attr,
@@ -707,7 +739,10 @@ static ssize_t assign_adapter_store(struct device *dev,
if (apid > matrix_mdev->matrix.apm_max)
return -ENODEV;

- vfio_ap_mdev_get_locks(matrix_mdev);
+ ret = vfio_ap_mdev_get_locks(matrix_mdev, true);
+ if (ret)
+ return ret;
+
set_bit_inv(apid, matrix_mdev->matrix.apm);

ret = vfio_ap_mdev_validate_masks(matrix_mdev);
@@ -815,7 +850,10 @@ static ssize_t unassign_adapter_store(struct device *dev,
if (apid > matrix_mdev->matrix.apm_max)
return -ENODEV;

- vfio_ap_mdev_get_locks(matrix_mdev);
+ ret = vfio_ap_mdev_get_locks(matrix_mdev, false);
+ if (ret)
+ return ret;
+
clear_bit_inv((unsigned long)apid, matrix_mdev->matrix.apm);
vfio_ap_mdev_hot_unplug_adapter(matrix_mdev, apid);
vfio_ap_mdev_put_locks(matrix_mdev);
@@ -879,7 +917,10 @@ static ssize_t assign_domain_store(struct device *dev,
if (apqi > max_apqi)
return -ENODEV;

- vfio_ap_mdev_get_locks(matrix_mdev);
+ ret = vfio_ap_mdev_get_locks(matrix_mdev, true);
+ if (ret)
+ return ret;
+
set_bit_inv(apqi, matrix_mdev->matrix.aqm);

ret = vfio_ap_mdev_validate_masks(matrix_mdev);
@@ -962,7 +1003,10 @@ static ssize_t unassign_domain_store(struct device *dev,
if (apqi > matrix_mdev->matrix.aqm_max)
return -ENODEV;

- vfio_ap_mdev_get_locks(matrix_mdev);
+ ret = vfio_ap_mdev_get_locks(matrix_mdev, false);
+ if (ret)
+ return ret;
+
clear_bit_inv((unsigned long)apqi, matrix_mdev->matrix.aqm);
vfio_ap_mdev_hot_unplug_domain(matrix_mdev, apqi);
vfio_ap_mdev_put_locks(matrix_mdev);
@@ -1000,7 +1044,9 @@ static ssize_t assign_control_domain_store(struct device *dev,
if (id > matrix_mdev->matrix.adm_max)
return -ENODEV;

- vfio_ap_mdev_get_locks(matrix_mdev);
+ ret = vfio_ap_mdev_get_locks(matrix_mdev, false);
+ if (ret)
+ return ret;

/* Set the bit in the ADM (bitmask) corresponding to the AP control
* domain number (id). The bits in the mask, from most significant to
@@ -1047,7 +1093,10 @@ static ssize_t unassign_control_domain_store(struct device *dev,
if (domid > max_domid)
return -ENODEV;

- vfio_ap_mdev_get_locks(matrix_mdev);
+ ret = vfio_ap_mdev_get_locks(matrix_mdev, false);
+ if (ret)
+ return ret;
+
clear_bit_inv(domid, matrix_mdev->matrix.adm);

if (vfio_ap_mdev_filter_cdoms(matrix_mdev))
@@ -1681,3 +1730,16 @@ void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
vfio_ap_mdev_put_qlocks(guest);
kfree(q);
}
+
+int vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm)
+{
+ int ret;
+
+ if (!mutex_trylock(&matrix_dev->lock))
+ return -EBUSY;
+
+ ret = vfio_ap_mdev_verify_no_sharing(apm, aqm);
+ mutex_unlock(&matrix_dev->lock);
+
+ return ret;
+}
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index 5d59bba8b153..97da41f87c65 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -149,4 +149,6 @@ void vfio_ap_mdev_unregister(void);
int vfio_ap_mdev_probe_queue(struct ap_device *queue);
void vfio_ap_mdev_remove_queue(struct ap_device *queue);

+int vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm);
+
#endif /* _VFIO_AP_PRIVATE_H_ */
--
2.31.1

2021-10-27 21:32:00

by Anthony Krowiak

Subject: Re: [PATCH v17 00/15] s390/vfio-ap: dynamic configuration support

PING!!

On 10/21/21 11:23 AM, Tony Krowiak wrote:
> The current design for AP pass-through does not support making dynamic
> changes to the AP matrix of a running guest resulting in a few
> deficiencies this patch series is intended to mitigate:
>
> 1. Adapters, domains and control domains can not be added to or removed
> from a running guest. In order to modify a guest's AP configuration,
> the guest must be terminated; only then can AP resources be assigned
> to or unassigned from the guest's matrix mdev. The new AP
> configuration becomes available to the guest when it is subsequently
> restarted.
>
> 2. The AP bus's /sys/bus/ap/apmask and /sys/bus/ap/aqmask interfaces can
> be modified by a root user without any restrictions. A change to
> either mask can result in AP queue devices being unbound from the
> vfio_ap device driver and bound to a zcrypt device driver even if a
> guest is using the queues, thus giving the host access to the guest's
> private crypto data and vice versa.
>
> 3. The APQNs derived from the Cartesian product of the APIDs of the
> adapters and APQIs of the domains assigned to a matrix mdev must
> reference an AP queue device bound to the vfio_ap device driver. The
> AP architecture allows assignment of AP resources that are not
> available to the system, so this artificial restriction is not
> compliant with the architecture.
>
> 4. The AP configuration profile can be dynamically changed for the linux
> host after a KVM guest is started. For example, a new domain can be
> dynamically added to the configuration profile via the SE or an HMC
> connected to a DPM enabled lpar. Likewise, AP adapters can be
> dynamically configured (online state) and deconfigured (standby state)
> using the SE, an SCLP command or an HMC connected to a DPM enabled
> lpar. This can result in inadvertent sharing of AP queues between the
> guest and host.
>
> 5. A root user can manually unbind an AP queue device representing a
> queue in use by a KVM guest via the vfio_ap device driver's sysfs
> unbind attribute. In this case, the guest will be using a queue that
> is not bound to the driver which violates the device model.
>
> This patch series introduces the following changes to the current design
> to alleviate the shortcomings described above as well as to implement
> more of the AP architecture:
>
> 1. A root user will be prevented from making edits to the AP bus's
> /sys/bus/ap/apmask or /sys/bus/ap/aqmask if the change would transfer
> ownership of an APQN from the vfio_ap device driver to a zcrypt driver
> while the APQN is assigned to a matrix mdev.
>
> 2. Allow a root user to hot plug/unplug AP adapters, domains and control
> domains for a KVM guest using the matrix mdev via its sysfs
> assign/unassign attributes.
>
> 4. Allow assignment of an AP adapter or domain to a matrix mdev even if
> it results in assignment of an APQN that does not reference an AP
> queue device bound to the vfio_ap device driver, as long as the APQN
> is not reserved for use by the default zcrypt drivers (also known as
> over-provisioning of AP resources). Allowing over-provisioning of AP
> resources better models the architecture which does not preclude
> assigning AP resources that are not yet available in the system. Such
> APQNs, however, will not be assigned to the guest using the matrix
> mdev; only APQNs referencing AP queue devices bound to the vfio_ap
> device driver will actually get assigned to the guest.
>
> 5. Handle dynamic changes to the AP device model.
>
> 1. Rationale for changes to AP bus's apmask/aqmask interfaces:
> ----------------------------------------------------------
> Due to the extremely sensitive nature of cryptographic data, it is
> imperative that great care be taken to ensure that such data is secured.
> Allowing a root user, either inadvertently or maliciously, to configure
> these masks such that a queue is shared between the host and a guest is
> not only avoidable, it is advisable. It was suggested that this scenario
> is better handled in user space with management software, but that does
> not preclude a malicious administrator from using the sysfs interfaces
> to gain access to a guest's crypto data. It was also suggested that this
> scenario could be avoided by taking access to the adapter away from the
> guest and zeroing out the queues prior to the vfio_ap driver releasing the
> device; however, stealing an adapter in use from a guest as a by-product
> of an operation is bad and will likely cause problems for the guest
> unnecessarily. It was decided that the most effective solution with the
> least number of negative side effects is to prevent the situation at the
> source.
>
> 2. Rationale for hot plug/unplug using matrix mdev sysfs interfaces:
> ----------------------------------------------------------------
> Allowing a user to hot plug/unplug AP resources using the matrix mdev
> sysfs interfaces circumvents the need to terminate the guest in order to
> modify its AP configuration. Allowing dynamic configuration makes
> reconfiguring a guest's AP matrix much less disruptive.
>
> 3. Rationale for allowing over-provisioning of AP resources:
> -----------------------------------------------------------
> Allowing assignment of AP resources to a matrix mdev and ultimately to a
> guest better models the AP architecture. The architecture does not
> preclude assignment of unavailable AP resources. If a queue subsequently
> becomes available while a guest using the matrix mdev to which its APQN
> is assigned, the guest will be given access to it. If an APQN
> is dynamically unassigned from the underlying host system, it will
> automatically become unavailable to the guest.
>
> Change log v16-v17:
> ------------------
> * Introduced a new patch (patch 1) to remove the setting of the pqap hook
> in the group notifier callback. It is now set when the vfio_ap device
> driver is loaded.
>
> * Patch 6:
> - Split the filtering of the APQNs and the control domains into
> two functions and consolidated the vfio_ap_mdev_refresh_apcb and
> vfio_ap_mdev_filter_apcb into one function named
> vfio_ap_mdev_filter_matrix because the matrix is actually what is
> being filtered.
>
> - Removed ACK by Halil Pasic because of changes above; needs re-review.
>
> * Introduced a new patch (patch 8) to keep track of active guests.
>
> * Patch 9 (patch 8 in v16):
> - Refactored locking to ensure KVM lock is taken before
> matrix_dev->lock when hot plugging adapters, domains and
> control domains.
>
> - Removed ACK by Halil because of changes above; needs re-review.
>
> * Patch 14 (patch 13 in v16):
> - This patch has been redesigned to ensure proper locking order (i.e.,
> taking kvm->lock before matrix_dev->lock).
>
> - Removed Halil's Removed-by because of changes above; needs re-review.
>
> Tony Krowiak (15):
> s390/vfio-ap: Set pqap hook when vfio_ap module is loaded
> s390/vfio-ap: use new AP bus interface to search for queue devices
> s390/vfio-ap: move probe and remove callbacks to vfio_ap_ops.c
> s390/vfio-ap: manage link between queue struct and matrix mdev
> s390/vfio-ap: introduce shadow APCB
> s390/vfio-ap: refresh guest's APCB by filtering APQNs assigned to mdev
> s390/vfio-ap: allow assignment of unavailable AP queues to mdev device
> s390/vfio-ap: keep track of active guests
> s390/vfio-ap: allow hot plug/unplug of AP resources using mdev device
> s390/vfio-ap: reset queues after adapter/domain unassignment
> s390/ap: driver callback to indicate resource in use
> s390/vfio-ap: implement in-use callback for vfio_ap driver
> s390/vfio-ap: sysfs attribute to display the guest's matrix
> s390/ap: notify drivers on config changed and scan complete callbacks
> s390/vfio-ap: update docs to include dynamic config support
>
> Documentation/s390/vfio-ap.rst | 492 ++++++---
> arch/s390/include/asm/kvm_host.h | 10 +-
> arch/s390/kvm/kvm-s390.c | 1 -
> arch/s390/kvm/priv.c | 45 +-
> drivers/s390/crypto/ap_bus.c | 241 ++++-
> drivers/s390/crypto/ap_bus.h | 16 +
> drivers/s390/crypto/vfio_ap_drv.c | 52 +-
> drivers/s390/crypto/vfio_ap_ops.c | 1379 ++++++++++++++++++-------
> drivers/s390/crypto/vfio_ap_private.h | 66 +-
> 9 files changed, 1714 insertions(+), 588 deletions(-)
>

2021-11-02 19:27:40

by Anthony Krowiak

Subject: Re: [PATCH v17 00/15] s390/vfio-ap: dynamic configuration support

Anybody interested in doing a review of this series?

On 10/27/21 10:24 AM, Tony Krowiak wrote:
> PING!!
>
> On 10/21/21 11:23 AM, Tony Krowiak wrote:
>> The current design for AP pass-through does not support making dynamic
>> changes to the AP matrix of a running guest resulting in a few
>> deficiencies this patch series is intended to mitigate:
>>
>> 1. Adapters, domains and control domains can not be added to or removed
>>      from a running guest. In order to modify a guest's AP
>> configuration,
>>      the guest must be terminated; only then can AP resources be
>> assigned
>>      to or unassigned from the guest's matrix mdev. The new AP
>>      configuration becomes available to the guest when it is
>> subsequently
>>      restarted.
>>
>> 2. The AP bus's /sys/bus/ap/apmask and /sys/bus/ap/aqmask interfaces can
>>      be modified by a root user without any restrictions. A change to
>>      either mask can result in AP queue devices being unbound from the
>>      vfio_ap device driver and bound to a zcrypt device driver even if a
>>      guest is using the queues, thus giving the host access to the
>> guest's
>>      private crypto data and vice versa.
>>
>> 3. The APQNs derived from the Cartesian product of the APIDs of the
>>      adapters and APQIs of the domains assigned to a matrix mdev must
>>      reference an AP queue device bound to the vfio_ap device driver.
>> The
>>      AP architecture allows assignment of AP resources that are not
>>      available to the system, so this artificial restriction is not
>>      compliant with the architecture.
>>
>> 4. The AP configuration profile can be dynamically changed for the linux
>>      host after a KVM guest is started. For example, a new domain can be
>>      dynamically added to the configuration profile via the SE or an HMC
>>      connected to a DPM enabled lpar. Likewise, AP adapters can be
>>      dynamically configured (online state) and deconfigured (standby
>> state)
>>      using the SE, an SCLP command or an HMC connected to a DPM enabled
>>      lpar. This can result in inadvertent sharing of AP queues
>> between the
>>      guest and host.
>>
>> 5. A root user can manually unbind an AP queue device representing a
>>      queue in use by a KVM guest via the vfio_ap device driver's sysfs
>>      unbind attribute. In this case, the guest will be using a queue
>> that
>>      is not bound to the driver which violates the device model.
>>
>> This patch series introduces the following changes to the current design
>> to alleviate the shortcomings described above as well as to implement
>> more of the AP architecture:
>>
>> 1. A root user will be prevented from making edits to the AP bus's
>>      /sys/bus/ap/apmask or /sys/bus/ap/aqmask if the change would
>> transfer
>>      ownership of an APQN from the vfio_ap device driver to a zcrypt
>> driver
>>      while the APQN is assigned to a matrix mdev.
>>
>> 2. Allow a root user to hot plug/unplug AP adapters, domains and control
>>      domains for a KVM guest using the matrix mdev via its sysfs
>>      assign/unassign attributes.
>>
>> 4. Allow assignment of an AP adapter or domain to a matrix mdev even if
>>      it results in assignment of an APQN that does not reference an AP
>>      queue device bound to the vfio_ap device driver, as long as the
>> APQN
>>      is not reserved for use by the default zcrypt drivers (also
>> known as
>>      over-provisioning of AP resources). Allowing over-provisioning
>> of AP
>>      resources better models the architecture which does not preclude
>>      assigning AP resources that are not yet available in the system.
>> Such
>>      APQNs, however, will not be assigned to the guest using the matrix
>>      mdev; only APQNs referencing AP queue devices bound to the vfio_ap
>>      device driver will actually get assigned to the guest.
>>
>> 5. Handle dynamic changes to the AP device model.
>>
>> 1. Rationale for changes to AP bus's apmask/aqmask interfaces:
>> ----------------------------------------------------------
>> Due to the extremely sensitive nature of cryptographic data, it is
>> imperative that great care be taken to ensure that such data is secured.
>> Allowing a root user, either inadvertently or maliciously, to configure
>> these masks such that a queue is shared between the host and a guest is
>> not only avoidable, it is advisable. It was suggested that this scenario
>> is better handled in user space with management software, but that does
>> not preclude a malicious administrator from using the sysfs interfaces
>> to gain access to a guest's crypto data. It was also suggested that this
>> scenario could be avoided by taking access to the adapter away from the
>> guest and zeroing out the queues prior to the vfio_ap driver
>> releasing the
>> device; however, stealing an adapter in use from a guest as a by-product
>> of an operation is bad and will likely cause problems for the guest
>> unnecessarily. It was decided that the most effective solution with the
>> least number of negative side effects is to prevent the situation at the
>> source.
>>
>> 2. Rationale for hot plug/unplug using matrix mdev sysfs interfaces:
>> ----------------------------------------------------------------
>> Allowing a user to hot plug/unplug AP resources using the matrix mdev
>> sysfs interfaces circumvents the need to terminate the guest in order to
>> modify its AP configuration. Allowing dynamic configuration makes
>> reconfiguring a guest's AP matrix much less disruptive.
>>
>> 3. Rationale for allowing over-provisioning of AP resources:
>> -----------------------------------------------------------
>> Allowing assignment of AP resources to a matrix mdev and ultimately to a
>> guest better models the AP architecture. The architecture does not
>> preclude assignment of unavailable AP resources. If a queue subsequently
>> becomes available while a guest using the matrix mdev to which its APQN
>> is assigned, the guest will be given access to it. If an APQN
>> is dynamically unassigned from the underlying host system, it will
>> automatically become unavailable to the guest.
>>
>> Change log v16-v17:
>> ------------------
>> * Introduced a new patch (patch 1) to remove the setting of the pqap
>> hook
>>    in the group notifier callback. It is now set when the vfio_ap device
>>    driver is loaded.
>>
>> * Patch 6:
>>      - Split the filtering of the APQNs and the control domains into
>>        two functions and consolidated the vfio_ap_mdev_refresh_apcb and
>>        vfio_ap_mdev_filter_apcb into one function named
>>        vfio_ap_mdev_filter_matrix because the matrix is actually what is
>>        being filtered.
>>
>>      - Removed ACK by Halil Pasic because of changes above; needs
>> re-review.
>>
>> * Introduced a new patch (patch 8) to keep track of active guests.
>>
>> * Patch 9 (patch 8 in v16):
>>      - Refactored locking to ensure KVM lock is taken before
>>        matrix_dev->lock when hot plugging adapters, domains and
>>        control domains.
>>
>>      - Removed ACK by Halil because of changes above; needs re-review.
>>
>> * Patch 14 (patch 13 in v16):
>>      - This patch has been redesigned to ensure proper locking order
>> (i.e.,
>>        taking kvm->lock before matrix_dev->lock).
>>
>>      - Removed Halil's Removed-by because of changes above; needs
>> re-review.
>>
>> Tony Krowiak (15):
>>    s390/vfio-ap: Set pqap hook when vfio_ap module is loaded
>>    s390/vfio-ap: use new AP bus interface to search for queue devices
>>    s390/vfio-ap: move probe and remove callbacks to vfio_ap_ops.c
>>    s390/vfio-ap: manage link between queue struct and matrix mdev
>>    s390/vfio-ap: introduce shadow APCB
>>    s390/vfio-ap: refresh guest's APCB by filtering APQNs assigned to
>> mdev
>>    s390/vfio-ap: allow assignment of unavailable AP queues to mdev
>> device
>>    s390/vfio-ap: keep track of active guests
>>    s390/vfio-ap: allow hot plug/unplug of AP resources using mdev device
>>    s390/vfio-ap: reset queues after adapter/domain unassignment
>>    s390/ap: driver callback to indicate resource in use
>>    s390/vfio-ap: implement in-use callback for vfio_ap driver
>>    s390/vfio-ap: sysfs attribute to display the guest's matrix
>>    s390/ap: notify drivers on config changed and scan complete callbacks
>>    s390/vfio-ap: update docs to include dynamic config support
>>
>>   Documentation/s390/vfio-ap.rst        |  492 ++++++---
>>   arch/s390/include/asm/kvm_host.h      |   10 +-
>>   arch/s390/kvm/kvm-s390.c              |    1 -
>>   arch/s390/kvm/priv.c                  |   45 +-
>>   drivers/s390/crypto/ap_bus.c          |  241 ++++-
>>   drivers/s390/crypto/ap_bus.h          |   16 +
>>   drivers/s390/crypto/vfio_ap_drv.c     |   52 +-
>>   drivers/s390/crypto/vfio_ap_ops.c     | 1379 ++++++++++++++++++-------
>>   drivers/s390/crypto/vfio_ap_private.h |   66 +-
>>   9 files changed, 1714 insertions(+), 588 deletions(-)
>>
>

2021-11-04 11:28:46

by Harald Freudenberger

Subject: Re: [PATCH v17 11/15] s390/ap: driver callback to indicate resource in use

On 21.10.21 17:23, Tony Krowiak wrote:
> Introduces a new driver callback to prevent a root user from re-assigning
> the APQN of a queue that is in use by a non-default host device driver to
> a default host device driver and vice versa. The callback will be invoked
> whenever a change to the AP bus's sysfs apmask or aqmask attributes would
> result in one or more APQNs being re-assigned. If the callback responds
> in the affirmative for any driver queried, the change to the apmask or
> aqmask will be rejected with a device busy error.
>
> For this patch, only non-default drivers will be queried. Currently,
> there is only one non-default driver, the vfio_ap device driver. The
> vfio_ap device driver facilitates pass-through of an AP queue to a
> guest. The idea here is that a guest may be administered by a different
> sysadmin than the host and we don't want AP resources to unexpectedly
> disappear from a guest's AP configuration (i.e., adapters and domains
> assigned to the matrix mdev). This will enforce the proper procedure for
> removing AP resources intended for guest usage which is to
> first unassign them from the matrix mdev, then unbind them from the
> vfio_ap device driver.
>
> Signed-off-by: Tony Krowiak <[email protected]>
> Reviewed-by: Harald Freudenberger <[email protected]>
> Reviewed-by: Halil Pasic <[email protected]>
> ---
> drivers/s390/crypto/ap_bus.c | 160 ++++++++++++++++++++++++++++++++---
> drivers/s390/crypto/ap_bus.h | 4 +
> 2 files changed, 154 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/s390/crypto/ap_bus.c b/drivers/s390/crypto/ap_bus.c
> index d9b804943d19..15886610f61a 100644
> --- a/drivers/s390/crypto/ap_bus.c
> +++ b/drivers/s390/crypto/ap_bus.c
> @@ -36,6 +36,7 @@
> #include <linux/mod_devicetable.h>
> #include <linux/debugfs.h>
> #include <linux/ctype.h>
> +#include <linux/module.h>
>
> #include "ap_bus.h"
> #include "ap_debug.h"
> @@ -1060,6 +1061,23 @@ static int modify_bitmap(const char *str, unsigned long *bitmap, int bits)
> return 0;
> }
>
> +static int ap_parse_bitmap_str(const char *str, unsigned long *bitmap, int bits,
> + unsigned long *newmap)
> +{
> + unsigned long size;
> + int rc;
> +
> + size = BITS_TO_LONGS(bits) * sizeof(unsigned long);
> + if (*str == '+' || *str == '-') {
> + memcpy(newmap, bitmap, size);
> + rc = modify_bitmap(str, newmap, bits);
> + } else {
> + memset(newmap, 0, size);
> + rc = hex2bitmap(str, newmap, bits);
> + }
> + return rc;
> +}
> +
> int ap_parse_mask_str(const char *str,
> unsigned long *bitmap, int bits,
> struct mutex *lock)
> @@ -1079,14 +1097,7 @@ int ap_parse_mask_str(const char *str,
> kfree(newmap);
> return -ERESTARTSYS;
> }
> -
> - if (*str == '+' || *str == '-') {
> - memcpy(newmap, bitmap, size);
> - rc = modify_bitmap(str, newmap, bits);
> - } else {
> - memset(newmap, 0, size);
> - rc = hex2bitmap(str, newmap, bits);
> - }
> + rc = ap_parse_bitmap_str(str, bitmap, bits, newmap);
> if (rc == 0)
> memcpy(bitmap, newmap, size);
> mutex_unlock(lock);
> @@ -1278,12 +1289,76 @@ static ssize_t apmask_show(struct bus_type *bus, char *buf)
> return rc;
> }
>
> +static int __verify_card_reservations(struct device_driver *drv, void *data)
> +{
> + int rc = 0;
> + struct ap_driver *ap_drv = to_ap_drv(drv);
> + unsigned long *newapm = (unsigned long *)data;
> +
> + /*
> + * No need to verify whether the driver is using the queues if it is the
> + * default driver.
> + */
> + if (ap_drv->flags & AP_DRIVER_FLAG_DEFAULT)
> + return 0;
> +
> + /*
> + * increase the driver's module refcounter to be sure it is not
> + * going away when we invoke the callback function.
> + */
> + if (!try_module_get(drv->owner))
> + return 0;
> +
> + if (ap_drv->in_use) {
> + rc = ap_drv->in_use(newapm, ap_perms.aqm);
> + if (rc)
> + rc = -EBUSY;
> + }
> +
> + /* release the driver's module */
> + module_put(drv->owner);
> +
> + return rc;
> +}
> +
> +static int apmask_commit(unsigned long *newapm)
> +{
> + int rc;
> + unsigned long reserved[BITS_TO_LONGS(AP_DEVICES)];
> +
> + /*
> + * Check if any bits in the apmask have been set which will
> + * result in queues being removed from non-default drivers
> + */
> + if (bitmap_andnot(reserved, newapm, ap_perms.apm, AP_DEVICES)) {
> + rc = bus_for_each_drv(&ap_bus_type, NULL, reserved,
> + __verify_card_reservations);
> + if (rc)
> + return rc;
> + }
> +
> + memcpy(ap_perms.apm, newapm, APMASKSIZE);
> +
> + return 0;
> +}
> +
> static ssize_t apmask_store(struct bus_type *bus, const char *buf,
> size_t count)
> {
> int rc;
> + DECLARE_BITMAP(newapm, AP_DEVICES);
>
> - rc = ap_parse_mask_str(buf, ap_perms.apm, AP_DEVICES, &ap_perms_mutex);
> + if (mutex_lock_interruptible(&ap_perms_mutex))
> + return -ERESTARTSYS;
> +
> + rc = ap_parse_bitmap_str(buf, ap_perms.apm, AP_DEVICES, newapm);
> + if (rc)
> + goto done;
> +
> + rc = apmask_commit(newapm);
> +
> +done:
> + mutex_unlock(&ap_perms_mutex);
> if (rc)
> return rc;
>
> @@ -1309,12 +1384,77 @@ static ssize_t aqmask_show(struct bus_type *bus, char *buf)
> return rc;
> }
>
> +static int __verify_queue_reservations(struct device_driver *drv, void *data)
> +{
> + int rc = 0;
> + struct ap_driver *ap_drv = to_ap_drv(drv);
> + unsigned long *newaqm = (unsigned long *)data;
> +
> + /*
> + * If the reserved bits do not identify queues reserved for use by the
> + * non-default driver, there is no need to verify the driver is using
> + * the queues.
> + */
> + if (ap_drv->flags & AP_DRIVER_FLAG_DEFAULT)
> + return 0;
> +
> + /*
> + * increase the driver's module refcounter to be sure it is not
> + * going away when we invoke the callback function.
> + */
> + if (!try_module_get(drv->owner))
> + return 0;
> +
> + if (ap_drv->in_use) {
> + rc = ap_drv->in_use(ap_perms.apm, newaqm);
> + if (rc)
> + rc = -EBUSY;
> + }
> +
> + /* release the driver's module */
> + module_put(drv->owner);
> +
> + return rc;
> +}
> +
> +static int aqmask_commit(unsigned long *newaqm)
> +{
> + int rc;
> + unsigned long reserved[BITS_TO_LONGS(AP_DOMAINS)];
> +
> + /*
> + * Check if any bits in the aqmask have been set which will
> + * result in queues being removed from non-default drivers
> + */
> + if (bitmap_andnot(reserved, newaqm, ap_perms.aqm, AP_DOMAINS)) {
> + rc = bus_for_each_drv(&ap_bus_type, NULL, reserved,
> + __verify_queue_reservations);
> + if (rc)
> + return rc;
> + }
> +
> + memcpy(ap_perms.aqm, newaqm, AQMASKSIZE);
> +
> + return 0;
> +}
> +
> static ssize_t aqmask_store(struct bus_type *bus, const char *buf,
> size_t count)
> {
> int rc;
> + DECLARE_BITMAP(newaqm, AP_DOMAINS);
> +
> + if (mutex_lock_interruptible(&ap_perms_mutex))
> + return -ERESTARTSYS;
> +
> + rc = ap_parse_bitmap_str(buf, ap_perms.aqm, AP_DOMAINS, newaqm);
> + if (rc)
> + goto done;
> +
> + rc = aqmask_commit(newaqm);
>
> - rc = ap_parse_mask_str(buf, ap_perms.aqm, AP_DOMAINS, &ap_perms_mutex);
> +done:
> + mutex_unlock(&ap_perms_mutex);
> if (rc)
> return rc;
>
> diff --git a/drivers/s390/crypto/ap_bus.h b/drivers/s390/crypto/ap_bus.h
> index 95b577754b35..67c1bef60ad5 100644
> --- a/drivers/s390/crypto/ap_bus.h
> +++ b/drivers/s390/crypto/ap_bus.h
> @@ -142,6 +142,7 @@ struct ap_driver {
>
> int (*probe)(struct ap_device *);
> void (*remove)(struct ap_device *);
> + int (*in_use)(unsigned long *apm, unsigned long *aqm);
> };
>
> #define to_ap_drv(x) container_of((x), struct ap_driver, driver)
> @@ -289,6 +290,9 @@ void ap_queue_init_state(struct ap_queue *aq);
> struct ap_card *ap_card_create(int id, int queue_depth, int raw_type,
> int comp_type, unsigned int functions, int ml);
>
> +#define APMASKSIZE (BITS_TO_LONGS(AP_DEVICES) * sizeof(unsigned long))
> +#define AQMASKSIZE (BITS_TO_LONGS(AP_DOMAINS) * sizeof(unsigned long))
> +
> struct ap_perms {
> unsigned long ioctlm[BITS_TO_LONGS(AP_IOCTLS)];
> unsigned long apm[BITS_TO_LONGS(AP_DEVICES)];
Reviewed again. I still don't like this, as it introduces an unbalanced weighting for the
vfio device driver, but ... We could consider removing the

if (ap_drv->flags & AP_DRIVER_FLAG_DEFAULT)
return 0;

check in function __verify_queue_reservations. It would still work, as the 'default device
drivers' do not implement the in_use() callback and thus cannot object to
the upcoming change.
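
To illustrate the point about the flag check, here is a minimal user-space sketch, not kernel code. It models only the per-driver decision made in __verify_queue_reservations. Every sketch_* name, the reduced driver struct and both constants are hypothetical stand-ins introduced for illustration; the real code works on struct ap_driver and is driven by bus_for_each_drv().

```c
#include <stddef.h>

/* Hypothetical stand-in for AP_DRIVER_FLAG_DEFAULT. */
#define SKETCH_FLAG_DEFAULT 0x1u
/* Hypothetical stand-in for the kernel's -EBUSY. */
#define SKETCH_EBUSY (-16)

/* Reduced model of struct ap_driver: only the fields this check reads. */
struct sketch_driver {
	unsigned int flags;
	int (*in_use)(const unsigned long *apm, const unsigned long *aqm);
};

/* A callback that always claims the queues, as a busy driver might. */
static int sketch_always_in_use(const unsigned long *apm,
				const unsigned long *aqm)
{
	(void)apm;
	(void)aqm;
	return 1;
}

/*
 * Models the per-driver check. With honor_default_flag == 0 the early
 * return for default drivers is dropped, which is the variant under
 * discussion.
 */
static int sketch_verify(const struct sketch_driver *drv,
			 const unsigned long *apm, const unsigned long *aqm,
			 int honor_default_flag)
{
	if (honor_default_flag && (drv->flags & SKETCH_FLAG_DEFAULT))
		return 0;

	/* A driver without an in_use callback never objects. */
	if (drv->in_use && drv->in_use(apm, aqm))
		return SKETCH_EBUSY;

	return 0;
}
```

For a default driver whose in_use pointer is NULL, sketch_verify() returns 0 with or without the flag check. A non-default driver with sketch_always_in_use gets SKETCH_EBUSY in both variants. The two variants differ only if a default driver were ever to implement in_use.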

2021-11-04 12:11:04

by Harald Freudenberger

Subject: Re: [PATCH v17 14/15] s390/ap: notify drivers on config changed and scan complete callbacks

On 21.10.21 17:23, Tony Krowiak wrote:
> This patch introduces an extension to the ap bus to notify device drivers
> when the host AP configuration changes - i.e., adapters, domains or
> control domains are added or removed. When an adapter or domain is added to
> the host's AP configuration, the AP bus will create the associated queue
> devices in the linux sysfs device model. Each new type 10 (i.e., CEX4) or
> newer queue device with an APQN that is not reserved for the default device
> driver will get bound to the vfio_ap device driver. Likewise, when an
> adapter or domain is removed from the host's AP configuration, the AP bus
> will remove the associated queue devices from the sysfs device model. Each
> of the queues that is bound to the vfio_ap device driver will get unbound.
>
> With the introduction of hot plug support, binding or unbinding of a
> queue device will result in plugging or unplugging one or more queues from
> a guest that is using the queue. If there are multiple changes to the
> host's AP configuration, it could result in the probe and remove callbacks
> getting invoked multiple times. Each time queues are plugged into or
> unplugged from a guest, the guest's VCPUs must be taken out of SIE.
> If this occurs multiple times due to changes in the host's AP
> configuration, that can have an undesirable negative effect on the guest's
> performance.
>
> To alleviate this problem, this patch introduces two new callbacks: one to
> notify the vfio_ap device driver when the AP bus scan routine detects a
> change to the host's AP configuration; and, one to notify the driver when
> the AP bus is done scanning. This will allow the vfio_ap driver to do
> bulk processing of all affected adapters, domains and control domains for
> affected guests rather than plugging or unplugging them one at a time when
> the probe or remove callback is invoked. The two new callbacks are:
>
> void (*on_config_changed)(struct ap_config_info *new_config_info,
> struct ap_config_info *old_config_info);
>
> This callback is invoked at the start of the AP bus scan
> function when it determines that the host AP configuration information
> has changed since the previous scan. This is done by storing
> an old and current QCI info struct and comparing them. If there is any
> difference, the callback is invoked.
>
> The vfio_ap device driver registers a callback function for this callback
> that performs the following operations:
>
> 1. Unplugs the adapters, domains and control domains removed from the
> host's AP configuration from the guests to which they are
> assigned in a single operation.
>
> 2. Disconnects the links between each queue structure representing a
> queue that was unplugged from the structure representing
> the mediated device to which the queue is assigned. Thus, when the
> vfio_ap device driver's remove callback is invoked, the unplugging of
> the queue from the guest and the unlinking of the queue structure from
> the mediated device structure will be bypassed because the queues and
> control domains will have already been unplugged in bulk.
>
> 3. Stores bitmaps identifying the adapters, domains and control domains
> added to the host's AP configuration with the structure representing
> the mediated device. When the vfio_ap device driver's probe callback is
> subsequently invoked, the probe function will recognize that the
> queue is being probed due to a change in the host's AP configuration
> and the plugging of the queue into the guest will be bypassed.
>
> void (*on_scan_complete)(struct ap_config_info *new_config_info,
> struct ap_config_info *old_config_info);
>
> The on_scan_complete callback is invoked after the ap bus scan is
> completed if the host AP configuration data has changed. The vfio_ap
> device driver registers a callback function for this callback that hot
> plugs each queue and control domain added to the AP configuration for each
> guest using them in a single hot plug operation.
>
> Signed-off-by: Harald Freudenberger <[email protected]>
> [[email protected]: implemented callback functions in vfio_ap driver]
> Signed-off-by: Tony Krowiak <[email protected]>
> ---
> drivers/s390/crypto/ap_bus.c | 81 ++++++-
> drivers/s390/crypto/ap_bus.h | 12 +
> drivers/s390/crypto/vfio_ap_drv.c | 4 +-
> drivers/s390/crypto/vfio_ap_ops.c | 332 ++++++++++++++++++++++++--
> drivers/s390/crypto/vfio_ap_private.h | 23 +-
> 5 files changed, 429 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/s390/crypto/ap_bus.c b/drivers/s390/crypto/ap_bus.c
> index 15886610f61a..b97149d02da6 100644
> --- a/drivers/s390/crypto/ap_bus.c
> +++ b/drivers/s390/crypto/ap_bus.c
> @@ -88,6 +88,7 @@ static atomic64_t ap_bindings_complete_count = ATOMIC64_INIT(0);
> static DECLARE_COMPLETION(ap_init_apqn_bindings_complete);
>
> static struct ap_config_info *ap_qci_info;
> +static struct ap_config_info *ap_qci_info_old;
>
> /*
> * AP bus related debug feature things.
> @@ -225,9 +226,14 @@ static void __init ap_init_qci_info(void)
> ap_qci_info = kzalloc(sizeof(*ap_qci_info), GFP_KERNEL);
> if (!ap_qci_info)
> return;
> + ap_qci_info_old = kzalloc(sizeof(*ap_qci_info_old), GFP_KERNEL);
> + if (!ap_qci_info_old)
> + return;
> if (ap_fetch_qci_info(ap_qci_info) != 0) {
> kfree(ap_qci_info);
> + kfree(ap_qci_info_old);
> ap_qci_info = NULL;
> + ap_qci_info_old = NULL;
> return;
> }
> AP_DBF_INFO("%s successful fetched initial qci info\n", __func__);
> @@ -244,6 +250,8 @@ static void __init ap_init_qci_info(void)
> __func__, ap_max_domain_id);
> }
> }
> +
> + memcpy(ap_qci_info_old, ap_qci_info, sizeof(*ap_qci_info));
> }
>
> /*
> @@ -1635,6 +1643,49 @@ static int __match_queue_device_with_queue_id(struct device *dev, const void *da
> && AP_QID_QUEUE(to_ap_queue(dev)->qid) == (int)(long) data;
> }
>
> +/* Helper function for notify_config_changed */
> +static int __drv_notify_config_changed(struct device_driver *drv, void *data)
> +{
> + struct ap_driver *ap_drv = to_ap_drv(drv);
> +
> + if (try_module_get(drv->owner)) {
> + if (ap_drv->on_config_changed)
> + ap_drv->on_config_changed(ap_qci_info, ap_qci_info_old);
> + module_put(drv->owner);
> + }
> +
> + return 0;
> +}
> +
> +/* Notify all drivers about a qci config change */
> +static inline void notify_config_changed(void)
> +{
> + bus_for_each_drv(&ap_bus_type, NULL, NULL,
> + __drv_notify_config_changed);
> +}
> +
> +/* Helper function for notify_scan_complete */
> +static int __drv_notify_scan_complete(struct device_driver *drv, void *data)
> +{
> + struct ap_driver *ap_drv = to_ap_drv(drv);
> +
> + if (try_module_get(drv->owner)) {
> + if (ap_drv->on_scan_complete)
> + ap_drv->on_scan_complete(ap_qci_info,
> + ap_qci_info_old);
> + module_put(drv->owner);
> + }
> +
> + return 0;
> +}
> +
> +/* Notify all drivers about bus scan complete */
> +static inline void notify_scan_complete(void)
> +{
> + bus_for_each_drv(&ap_bus_type, NULL, NULL,
> + __drv_notify_scan_complete);
> +}
> +
> /*
> * Helper function for ap_scan_bus().
> * Remove card device and associated queue devices.
> @@ -1923,6 +1974,25 @@ static inline void ap_scan_adapter(int ap)
> put_device(&ac->ap_dev.device);
> }
>
> +/**
> + * ap_get_configuration - get the host AP configuration
> + *
> + * Stores the host AP configuration information returned from the previous call
> + * to Query Configuration Information (QCI), then retrieves and stores the
> + * current AP configuration returned from QCI.
> + *
> + * Return: true if the host AP configuration changed between calls to QCI;
> + * otherwise, return false.
> + */
> +static bool ap_get_configuration(void)
> +{
> + memcpy(ap_qci_info_old, ap_qci_info, sizeof(*ap_qci_info));
> + ap_fetch_qci_info(ap_qci_info);
> +
> + return memcmp(ap_qci_info, ap_qci_info_old,
> + sizeof(struct ap_config_info)) != 0;
> +}
> +
> /**
> * ap_scan_bus(): Scan the AP bus for new devices
> * Runs periodically, workqueue timer (ap_config_time)
> @@ -1930,9 +2000,12 @@ static inline void ap_scan_adapter(int ap)
> */
> static void ap_scan_bus(struct work_struct *unused)
> {
> - int ap;
> + int ap, config_changed = 0;
>
> - ap_fetch_qci_info(ap_qci_info);
> + /* config change notify */
> + config_changed = ap_get_configuration();
> + if (config_changed)
> + notify_config_changed();
> ap_select_domain();
>
> AP_DBF_DBG("%s running\n", __func__);
> @@ -1941,6 +2014,10 @@ static void ap_scan_bus(struct work_struct *unused)
> for (ap = 0; ap <= ap_max_adapter_id; ap++)
> ap_scan_adapter(ap);
>
> + /* scan complete notify */
> + if (config_changed)
> + notify_scan_complete();
> +
> /* check if there is at least one queue available with default domain */
> if (ap_domain_index >= 0) {
> struct device *dev =
> diff --git a/drivers/s390/crypto/ap_bus.h b/drivers/s390/crypto/ap_bus.h
> index 67c1bef60ad5..4de062ea6b76 100644
> --- a/drivers/s390/crypto/ap_bus.h
> +++ b/drivers/s390/crypto/ap_bus.h
> @@ -143,6 +143,18 @@ struct ap_driver {
> int (*probe)(struct ap_device *);
> void (*remove)(struct ap_device *);
> int (*in_use)(unsigned long *apm, unsigned long *aqm);
> + /*
> + * Called at the start of the ap bus scan function when
> + * the crypto config information (qci) has changed.
> + */
> + void (*on_config_changed)(struct ap_config_info *new_config_info,
> + struct ap_config_info *old_config_info);
> + /*
> + * Called at the end of the ap bus scan function when
> + * the crypto config information (qci) has changed.
> + */
> + void (*on_scan_complete)(struct ap_config_info *new_config_info,
> + struct ap_config_info *old_config_info);
> };
>
> #define to_ap_drv(x) container_of((x), struct ap_driver, driver)
> diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
> index df7528dcf6ed..5edd45d4d2fc 100644
> --- a/drivers/s390/crypto/vfio_ap_drv.c
> +++ b/drivers/s390/crypto/vfio_ap_drv.c
> @@ -45,6 +45,8 @@ static struct ap_driver vfio_ap_drv = {
> .probe = vfio_ap_mdev_probe_queue,
> .remove = vfio_ap_mdev_remove_queue,
> .in_use = vfio_ap_mdev_resource_in_use,
> + .on_config_changed = vfio_ap_on_cfg_changed,
> + .on_scan_complete = vfio_ap_on_scan_complete,
> .ids = ap_queue_ids,
> };
>
> @@ -92,7 +94,7 @@ static int vfio_ap_matrix_dev_create(void)
>
> /* Fill in config info via PQAP(QCI), if available */
> if (test_facility(12)) {
> - ret = ap_qci(&matrix_dev->info);
> + ret = ap_qci(&matrix_dev->config_info);
> if (ret)
> goto matrix_alloc_err;
> }
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 8075080ef2dd..cedf491c0df4 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -330,7 +330,7 @@ static bool vfio_ap_mdev_filter_cdoms(struct ap_matrix_mdev *matrix_mdev)
>
> bitmap_copy(shadow_adm, matrix_mdev->shadow_apcb.adm, AP_DOMAINS);
> bitmap_and(matrix_mdev->shadow_apcb.adm, matrix_mdev->matrix.adm,
> - (unsigned long *)matrix_dev->info.adm, AP_DOMAINS);
> + (unsigned long *)matrix_dev->config_info.adm, AP_DOMAINS);
>
> return !bitmap_equal(shadow_adm, matrix_mdev->shadow_apcb.adm,
> AP_DOMAINS);
> @@ -349,19 +349,15 @@ static bool vfio_ap_mdev_filter_cdoms(struct ap_matrix_mdev *matrix_mdev)
> */
> static bool vfio_ap_mdev_filter_matrix(struct ap_matrix_mdev *matrix_mdev)
> {
> - int ret;
> unsigned long apid, apqi, apqn;
> DECLARE_BITMAP(shadow_apm, AP_DEVICES);
> DECLARE_BITMAP(shadow_aqm, AP_DOMAINS);
> struct vfio_ap_queue *q;
>
> - ret = ap_qci(&matrix_dev->info);
> - if (ret)
> - return false;
> -
> bitmap_copy(shadow_apm, matrix_mdev->shadow_apcb.apm, AP_DEVICES);
> bitmap_copy(shadow_aqm, matrix_mdev->shadow_apcb.aqm, AP_DOMAINS);
> - vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->shadow_apcb);
> + vfio_ap_matrix_init(&matrix_dev->config_info,
> + &matrix_mdev->shadow_apcb);
>
> /*
> * Copy the adapters, domains and control domains to the shadow_apcb
> @@ -369,9 +365,9 @@ static bool vfio_ap_mdev_filter_matrix(struct ap_matrix_mdev *matrix_mdev)
> * AP configuration.
> */
> bitmap_and(matrix_mdev->shadow_apcb.apm, matrix_mdev->matrix.apm,
> - (unsigned long *)matrix_dev->info.apm, AP_DEVICES);
> + (unsigned long *)matrix_dev->config_info.apm, AP_DEVICES);
> bitmap_and(matrix_mdev->shadow_apcb.aqm, matrix_mdev->matrix.aqm,
> - (unsigned long *)matrix_dev->info.aqm, AP_DOMAINS);
> + (unsigned long *)matrix_dev->config_info.aqm, AP_DOMAINS);
>
> for_each_set_bit_inv(apid, matrix_mdev->shadow_apcb.apm, AP_DEVICES) {
> for_each_set_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm,
> @@ -417,8 +413,9 @@ static int vfio_ap_mdev_probe(struct mdev_device *mdev)
> &vfio_ap_matrix_dev_ops);
>
> matrix_mdev->mdev = mdev;
> - vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
> - vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->shadow_apcb);
> + vfio_ap_matrix_init(&matrix_dev->config_info, &matrix_mdev->matrix);
> + vfio_ap_matrix_init(&matrix_dev->config_info,
> + &matrix_mdev->shadow_apcb);
> hash_init(matrix_mdev->qtable.queues);
> mdev_set_drvdata(mdev, matrix_mdev);
> mutex_lock(&matrix_dev->lock);
> @@ -772,13 +769,17 @@ static void vfio_ap_unlink_apqn_fr_mdev(struct ap_matrix_mdev *matrix_mdev,
>
> q = vfio_ap_mdev_get_queue(matrix_mdev, AP_MKQID(apid, apqi));
> /* If the queue is assigned to the matrix mdev, unlink it. */
> - if (q)
> + if (q) {
> vfio_ap_unlink_queue_fr_mdev(q);
>
> - /* If the queue is assigned to the APCB, store it in @qtable. */
> - if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm) &&
> - test_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm))
> - hash_add(qtable->queues, &q->mdev_qnode, q->apqn);
> + /* If the queue is assigned to the APCB, store it in @qtable. */
> + if (qtable) {
> + if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm) &&
> + test_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm))
> + hash_add(qtable->queues, &q->mdev_qnode,
> + q->apqn);
> + }
> + }
> }
>
> /**
> @@ -1702,9 +1703,31 @@ static void vfio_ap_mdev_put_qlocks(struct ap_guest *guest)
> mutex_unlock(&guest->kvm->lock);
>
> mutex_unlock(&matrix_dev->lock);
> +
> up_read(&matrix_dev->guests_lock);
> }
>
> +static bool vfio_ap_mdev_do_filter_matrix(struct ap_matrix_mdev *matrix_mdev,
> + struct vfio_ap_queue *q)
> +{
> + unsigned long apid = AP_QID_CARD(q->apqn);
> + unsigned long apqi = AP_QID_QUEUE(q->apqn);
> +
> + /*
> + * If the queue is being probed because its APID or APQI is in the
> + * process of being added to the host's AP configuration, then we don't
> + * want to filter the matrix now as the filtering will be done after
> + * the driver is notified that the AP bus scan operation has completed
> + * (see the vfio_ap_on_scan_complete callback function).
> + */
> + if (test_bit_inv(apid, matrix_mdev->apm_add) ||
> + test_bit_inv(apqi, matrix_mdev->aqm_add))
> + return false;
> +
> +
> + return true;
> +}
> +
> int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
> {
> struct vfio_ap_queue *q;
> @@ -1722,8 +1745,10 @@ int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
> if (guest) {
> vfio_ap_mdev_link_queue(guest->matrix_mdev, q);
>
> - if (vfio_ap_mdev_filter_matrix(guest->matrix_mdev))
> - vfio_ap_mdev_hotplug_apcb(guest->matrix_mdev);
> + if (vfio_ap_mdev_do_filter_matrix(guest->matrix_mdev, q)) {
> + if (vfio_ap_mdev_filter_matrix(guest->matrix_mdev))
> + vfio_ap_mdev_hotplug_apcb(guest->matrix_mdev);
> + }
> } else {
> vfio_ap_queue_link_mdev(q);
> }
> @@ -1767,3 +1792,274 @@ int vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm)
>
> return ret;
> }
> +
> +/**
> + * vfio_ap_mdev_unlink_adapters - unlinks all queues from the matrix mdev with
> + * an APID of an adapter that has been removed from
> + * the host's AP configuration.
> + *
> + * @matrix_mdev: the matrix mdev from which the queues are to be unlinked
> + * @ap_unlink: a bitmap specifying the APIDs of the adapters removed from the
> + * host's AP configuration.
> + */
> +static void vfio_ap_mdev_unlink_adapters(struct ap_matrix_mdev *matrix_mdev,
> + unsigned long *ap_unlink)
> +{
> + unsigned long apid;
> +
> + for_each_set_bit_inv(apid, ap_unlink, AP_DEVICES)
> + vfio_ap_mdev_unlink_adapter(matrix_mdev, apid, NULL);
> +}
> +
> +/**
> + * vfio_ap_mdev_unlink_domains - unlinks all queues from the matrix mdev with an
> + * APQI of a domain that has been removed from the
> + * host's AP configuration.
> + *
> + * @matrix_mdev: the matrix mdev from which the queues are to be unlinked
> + * @aq_unlink: a bitmap specifying the APQIs of the domains removed from the
> + * host's AP configuration.
> + */
> +static void vfio_ap_mdev_unlink_domains(struct ap_matrix_mdev *matrix_mdev,
> + unsigned long *aq_unlink)
> +{
> + unsigned long apqi;
> +
> + for_each_set_bit_inv(apqi, aq_unlink, AP_DOMAINS)
> + vfio_ap_mdev_unlink_domain(matrix_mdev, apqi, NULL);
> +}
> +
> +/**
> + * vfio_ap_mdev_hot_unplug_cfg - hot unplug the adapters, domains and control
> + * domains that have been removed from the host's
> + * AP configuration from a guest.
> + *
> + * @guest: the guest
> + * @aprem: the adapters that have been removed from the host's AP configuration
> + * @aqrem: the domains that have been removed from the host's AP configuration
> + */
> +static void vfio_ap_mdev_hot_unplug_cfg(struct ap_guest *guest,
> + unsigned long *aprem,
> + unsigned long *aqrem)
> +{
> + vfio_ap_mdev_unlink_adapters(guest->matrix_mdev, aprem);
> + vfio_ap_mdev_unlink_domains(guest->matrix_mdev, aqrem);
> +
> + if (vfio_ap_mdev_filter_matrix(guest->matrix_mdev) ||
> + vfio_ap_mdev_filter_cdoms(guest->matrix_mdev)) {
> + mutex_lock(&guest->kvm->lock);
> + mutex_lock(&matrix_dev->lock);
> +
> + vfio_ap_mdev_hotplug_apcb(guest->matrix_mdev);
> +
> + mutex_unlock(&guest->kvm->lock);
> + mutex_unlock(&matrix_dev->lock);
> + }
> +}
> +
> +/**
> + * vfio_ap_mdev_cfg_remove - determines which guests are using the adapters,
> + * domains and control domains that have been removed
> + * from the host AP configuration and unplugs them
> + * from those guests.
> + *
> + * @ap_remove: bitmap specifying which adapters have been removed from the host
> + * config.
> + * @aq_remove: bitmap specifying which domains have been removed from the host
> + * config.
> + * @cd_remove: bitmap specifying which control domains have been removed from
> + * the host config.
> + */
> +static void vfio_ap_mdev_cfg_remove(unsigned long *ap_remove,
> + unsigned long *aq_remove,
> + unsigned long *cd_remove)
> +{
> + struct ap_guest *guest;
> + DECLARE_BITMAP(aprem, AP_DEVICES);
> + DECLARE_BITMAP(aqrem, AP_DOMAINS);
> + int do_ap_remove, do_aq_remove, do_cd_remove;
> +
> + list_for_each_entry(guest, &matrix_dev->guests, node) {
> + do_ap_remove = bitmap_and(aprem, ap_remove,
> + guest->matrix_mdev->matrix.apm,
> + AP_DEVICES);
> + do_aq_remove = bitmap_and(aqrem, aq_remove,
> + guest->matrix_mdev->matrix.aqm,
> + AP_DOMAINS);
> + do_cd_remove = bitmap_and(aqrem, cd_remove,
> + guest->matrix_mdev->matrix.aqm,
> + AP_DOMAINS);
> +
> + if (!do_ap_remove && !do_aq_remove && !do_cd_remove)
> + continue;
> +
> + vfio_ap_mdev_hot_unplug_cfg(guest, aprem, aqrem);
> + }
> +}
> +
> +/**
> + * vfio_ap_mdev_on_cfg_remove - responds to the removal of adapters, domains and
> + * control domains from the host AP configuration
> + * by unplugging them from the guests that are
> + * using them.
> + */
> +static void vfio_ap_mdev_on_cfg_remove(void)
> +{
> + int ap_remove, aq_remove, cd_remove;
> + DECLARE_BITMAP(aprem, AP_DEVICES);
> + DECLARE_BITMAP(aqrem, AP_DOMAINS);
> + DECLARE_BITMAP(cdrem, AP_DOMAINS);
> + unsigned long *cur_apm, *cur_aqm, *cur_adm;
> + unsigned long *prev_apm, *prev_aqm, *prev_adm;
> +
> + cur_apm = (unsigned long *)matrix_dev->config_info.apm;
> + cur_aqm = (unsigned long *)matrix_dev->config_info.aqm;
> + cur_adm = (unsigned long *)matrix_dev->config_info.adm;
> + prev_apm = (unsigned long *)matrix_dev->config_info_prev.apm;
> + prev_aqm = (unsigned long *)matrix_dev->config_info_prev.aqm;
> + prev_adm = (unsigned long *)matrix_dev->config_info_prev.adm;
> +
> + ap_remove = bitmap_andnot(aprem, prev_apm, cur_apm, AP_DEVICES);
> + aq_remove = bitmap_andnot(aqrem, prev_aqm, cur_aqm, AP_DOMAINS);
> + cd_remove = bitmap_andnot(cdrem, prev_adm, cur_adm, AP_DOMAINS);
> +
> + if (ap_remove || aq_remove || cd_remove)
> + vfio_ap_mdev_cfg_remove(aprem, aqrem, cdrem);
> +}
> +
> +/**
> + * vfio_ap_mdev_cfg_add - store bitmaps specifying the adapters, domains and
> + * control domains that have been added to the host's
> + * AP configuration for each matrix mdev to which they
> + * are assigned.
> + *
> + * @apm_add: a bitmap specifying the adapters that have been added to the AP
> + * configuration.
> + * @aqm_add: a bitmap specifying the domains that have been added to the AP
> + * configuration.
> + * @adm_add: a bitmap specifying the control domains that have been added to the
> + * AP configuration.
> + */
> +static void vfio_ap_mdev_cfg_add(unsigned long *apm_add, unsigned long *aqm_add,
> + unsigned long *adm_add)
> +{
> + struct ap_guest *guest;
> +
> + list_for_each_entry(guest, &matrix_dev->guests, node) {
> + bitmap_and(guest->matrix_mdev->apm_add,
> + guest->matrix_mdev->matrix.apm, apm_add, AP_DEVICES);
> + bitmap_and(guest->matrix_mdev->aqm_add,
> + guest->matrix_mdev->matrix.aqm, aqm_add, AP_DOMAINS);
> + bitmap_and(guest->matrix_mdev->adm_add,
> + guest->matrix_mdev->matrix.adm, adm_add, AP_DOMAINS);
> + }
> +}
> +
> +/**
> + * vfio_ap_mdev_on_cfg_add - responds to the addition of adapters, domains and
> + * control domains to the host AP configuration
> + * by updating the bitmaps that specify what adapters,
> + * domains and control domains have been added so they
> + * can be hot plugged into the guest when the AP bus
> + * scan completes (see vfio_ap_on_scan_complete
> + * function).
> + */
> +static void vfio_ap_mdev_on_cfg_add(void)
> +{
> + bool do_add;
> + DECLARE_BITMAP(apm_add, AP_DEVICES);
> + DECLARE_BITMAP(aqm_add, AP_DOMAINS);
> + DECLARE_BITMAP(adm_add, AP_DOMAINS);
> + unsigned long *cur_apm, *cur_aqm, *cur_adm;
> + unsigned long *prev_apm, *prev_aqm, *prev_adm;
> +
> + cur_apm = (unsigned long *)matrix_dev->config_info.apm;
> + cur_aqm = (unsigned long *)matrix_dev->config_info.aqm;
> + cur_adm = (unsigned long *)matrix_dev->config_info.adm;
> +
> + prev_apm = (unsigned long *)matrix_dev->config_info_prev.apm;
> + prev_aqm = (unsigned long *)matrix_dev->config_info_prev.aqm;
> + prev_adm = (unsigned long *)matrix_dev->config_info_prev.adm;
> +
> + do_add = bitmap_andnot(apm_add, cur_apm, prev_apm, AP_DEVICES);
> + do_add |= bitmap_andnot(aqm_add, cur_aqm, prev_aqm, AP_DOMAINS);
> + do_add |= bitmap_andnot(adm_add, cur_adm, prev_adm, AP_DOMAINS);
> +
> + if (do_add)
> + vfio_ap_mdev_cfg_add(apm_add, aqm_add, adm_add);
> +}
> +
> +/**
> + * vfio_ap_on_cfg_changed - handles notification of changes to the host AP
> + * configuration.
> + *
> + * @new_config_info: the new host AP configuration
> + * @old_config_info: the previous host AP configuration
> + */
> +void vfio_ap_on_cfg_changed(struct ap_config_info *new_config_info,
> + struct ap_config_info *old_config_info)
> +{
> + down_read(&matrix_dev->guests_lock);
> +
> + memcpy(&matrix_dev->config_info_prev, old_config_info,
> + sizeof(struct ap_config_info));
> + memcpy(&matrix_dev->config_info, new_config_info,
> + sizeof(struct ap_config_info));
> + vfio_ap_mdev_on_cfg_remove();
> + vfio_ap_mdev_on_cfg_add();
> +
> + up_read(&matrix_dev->guests_lock);
> +}
> +
> +static void vfio_ap_mdev_hot_plug_cfg(struct ap_guest *guest)
> +{
> + bool filter_matrix, filter_cdoms, do_hotplug = false;
> +
> + filter_matrix = bitmap_intersects(guest->matrix_mdev->matrix.apm,
> + guest->matrix_mdev->apm_add,
> + AP_DEVICES) ||
> + bitmap_intersects(guest->matrix_mdev->matrix.aqm,
> + guest->matrix_mdev->aqm_add,
> + AP_DOMAINS);
> +
> + filter_cdoms = bitmap_intersects(guest->matrix_mdev->matrix.adm,
> + guest->matrix_mdev->aqm_add,
> + AP_DOMAINS);
> +
> + mutex_lock(&guest->kvm->lock);
> + mutex_lock(&matrix_dev->lock);
> +
> + if (filter_matrix)
> + do_hotplug |= vfio_ap_mdev_filter_matrix(guest->matrix_mdev);
> +
> + if (filter_cdoms)
> + do_hotplug |= vfio_ap_mdev_filter_cdoms(guest->matrix_mdev);
> +
> + if (do_hotplug)
> + vfio_ap_mdev_hotplug_apcb(guest->matrix_mdev);
> +
> + mutex_unlock(&matrix_dev->lock);
> + mutex_unlock(&guest->kvm->lock);
> +}
> +
> +void vfio_ap_on_scan_complete(struct ap_config_info *new_config_info,
> + struct ap_config_info *old_config_info)
> +{
> + struct ap_guest *guest;
> +
> + down_read(&matrix_dev->guests_lock);
> +
> + list_for_each_entry(guest, &matrix_dev->guests, node) {
> + if (bitmap_empty(guest->matrix_mdev->apm_add, AP_DEVICES) &&
> + bitmap_empty(guest->matrix_mdev->aqm_add, AP_DOMAINS) &&
> + bitmap_empty(guest->matrix_mdev->adm_add, AP_DOMAINS))
> + continue;
> +
> + vfio_ap_mdev_hot_plug_cfg(guest);
> + bitmap_clear(guest->matrix_mdev->apm_add, 0, AP_DEVICES);
> + bitmap_clear(guest->matrix_mdev->aqm_add, 0, AP_DOMAINS);
> + bitmap_clear(guest->matrix_mdev->adm_add, 0, AP_DOMAINS);
> + }
> +
> + up_read(&matrix_dev->guests_lock);
> +}
> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
> index 97da41f87c65..affa63da7f88 100644
> --- a/drivers/s390/crypto/vfio_ap_private.h
> +++ b/drivers/s390/crypto/vfio_ap_private.h
> @@ -37,7 +37,9 @@ struct ap_guest {
> *
> * @device: generic device structure associated with the AP matrix device
> * @available_instances: number of mediated matrix devices that can be created
> - * @info: the struct containing the output from the PQAP(QCI) instruction
> + * @config_info: the struct containing the output from the PQAP(QCI) instruction
> + * @config_info_prev: the struct containing the previous output from the
> + * PQAP(QCI) instruction
> * @mdev_list: the list of mediated matrix devices created
> * @lock: mutex for locking the AP matrix device. This lock will be
> * taken every time we fiddle with state managed by the vfio_ap
> @@ -52,7 +54,8 @@ struct ap_guest {
> struct ap_matrix_dev {
> struct device device;
> atomic_t available_instances;
> - struct ap_config_info info;
> + struct ap_config_info config_info;
> + struct ap_config_info config_info_prev;
> struct list_head mdev_list;
> struct mutex lock;
> struct ap_driver *vfio_ap_drv;
> @@ -110,6 +113,13 @@ struct ap_queue_table {
> * @mdev: the mediated device
> * @qtable: table of queues (struct vfio_ap_queue) assigned to the mdev
> * @guest: the KVM guest using the matrix mdev
> + * @apm_add: adapters to be hot plugged into the guest when the vfio_ap
> + * device driver is notified that the AP bus scan has completed.
> + * @aqm_add: domains to be hot plugged into the guest when the vfio_ap
> + * device driver is notified that the AP bus scan has completed.
> + * @adm_add: control domains to be hot plugged into the guest when the
> + * vfio_ap device driver is notified that the AP bus scan has
> + * completed.
> */
> struct ap_matrix_mdev {
> struct vfio_device vdev;
> @@ -121,6 +131,9 @@ struct ap_matrix_mdev {
> struct mdev_device *mdev;
> struct ap_queue_table qtable;
> struct ap_guest *guest;
> + DECLARE_BITMAP(apm_add, AP_DEVICES);
> + DECLARE_BITMAP(aqm_add, AP_DOMAINS);
> + DECLARE_BITMAP(adm_add, AP_DOMAINS);
> };
>
> /**
> @@ -151,4 +164,10 @@ void vfio_ap_mdev_remove_queue(struct ap_device *queue);
>
> int vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm);
>
> +
> +void vfio_ap_on_cfg_changed(struct ap_config_info *new_config_info,
> + struct ap_config_info *old_config_info);
> +void vfio_ap_on_scan_complete(struct ap_config_info *new_config_info,
> + struct ap_config_info *old_config_info);
> +
> #endif /* _VFIO_AP_PRIVATE_H_ */
Reviewed the ap parts of this patch again. Looks fine to me, and it applies cleanly to the current devel branch.
Tony, as this is v17: if you do yet another loop, I would pick the ap parts of your patch series and
apply them to the devel branch as separate patches.
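
As an aside for readers of the patch quoted above: vfio_ap_mdev_on_cfg_add() finds the newly added adapters and domains by computing, for each mask, the bits set in the current host configuration but not in the previous one. Below is a minimal userspace sketch of that "current AND NOT previous" step. It assumes 256-bit masks held in four 64-bit words. mask_set(), mask_test() and mask_andnot() are hypothetical helpers, with mask_andnot() standing in for the kernel's bitmap_andnot(). The LSB-first bit numbering is chosen for simplicity and is not necessarily how the AP masks number their bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MASK_BITS  256
#define MASK_WORDS (MASK_BITS / 64)

/* Hypothetical helper: set bit nr in a 256-bit mask (LSB-first numbering). */
static void mask_set(uint64_t *mask, unsigned int nr)
{
    assert(nr < MASK_BITS);
    mask[nr / 64] |= UINT64_C(1) << (nr % 64);
}

/* Hypothetical helper: test bit nr in a 256-bit mask. */
static bool mask_test(const uint64_t *mask, unsigned int nr)
{
    assert(nr < MASK_BITS);
    return (mask[nr / 64] >> (nr % 64)) & 1;
}

/*
 * dst = cur & ~prev. Returns true if any bit is set in dst, i.e. if
 * something was added between the previous and the current configuration.
 */
static bool mask_andnot(uint64_t *dst, const uint64_t *cur,
                        const uint64_t *prev)
{
    uint64_t any = 0;
    size_t i;

    for (i = 0; i < MASK_WORDS; i++) {
        dst[i] = cur[i] & ~prev[i];
        any |= dst[i];
    }
    return any != 0;
}
```

Calling mask_andnot() once per mask (adapters, domains, control domains) and OR-ing the three results gives the same kind of "anything added?" decision that the patch stores in do_add.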

2021-11-04 15:52:51

by Anthony Krowiak

Subject: Re: [PATCH v17 11/15] s390/ap: driver callback to indicate resource in use



On 11/4/21 7:27 AM, Harald Freudenberger wrote:

> reviewed again. I still don't like this as it introduces an unbalanced weighting for the
> vfio dd but ... We could consider removing the
>
> if (ap_drv->flags & AP_DRIVER_FLAG_DEFAULT)
> return 0;
>
> in function __verify_queue_reservations. It would still work as the 'default device
> drivers' do not implement the in_use() callback and thus do not disagree about
> the upcoming change.

I don't have a problem with that given the default drivers may one day
have use for implementing the callback.

>
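
The reasoning in this exchange is that a driver without an in_use() callback can never object to a mask change, so an explicit skip for the default drivers would be redundant. A rough sketch of that logic follows. It uses simplified, hypothetical types and names (struct drv, verify_reservations(), example_in_use()) rather than the real AP bus structures, and the callback takes a single queue instead of the adapter and domain masks.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified driver descriptor; not the kernel's struct ap_driver. */
struct drv {
    const char *name;
    /* Optional: returns true if the driver still uses the queue in question. */
    bool (*in_use)(unsigned int apid, unsigned int apqi);
};

/*
 * Returns 0 if no driver objects to taking the queue (apid, apqi) away,
 * or -EBUSY if some driver reports it as in use. Drivers that do not
 * implement in_use() are skipped, so they can never veto the change.
 */
static int verify_reservations(const struct drv *drvs, size_t n,
                               unsigned int apid, unsigned int apqi)
{
    size_t i;

    assert(drvs != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (!drvs[i].in_use)
            continue;
        if (drvs[i].in_use(apid, apqi))
            return -EBUSY;
    }
    return 0;
}

/* Example callback: pretends queue 05.0004 is assigned to a guest. */
static bool example_in_use(unsigned int apid, unsigned int apqi)
{
    return apid == 0x05 && apqi == 0x0004;
}
```

With this shape, a driver flagged as "default" needs no special case: as long as it leaves in_use unset, the loop passes over it.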

2021-11-04 15:54:31

by Anthony Krowiak

Subject: Re: [PATCH v17 14/15] s390/ap: notify drivers on config changed and scan complete callbacks



On 11/4/21 8:06 AM, Harald Freudenberger wrote:
>
> Tony, as this is v17: if you do yet another loop, I would pick the ap parts of your patch series and
> apply them to the devel branch as separate patches.

Are you suggesting I do this now, or when this is finally ready to go
upstream?


2021-11-05 08:36:18

by Harald Freudenberger

Subject: Re: [PATCH v17 14/15] s390/ap: notify drivers on config changed and scan complete callbacks

On 04.11.21 16:50, Tony Krowiak wrote:
>
>
> On 11/4/21 8:06 AM, Harald Freudenberger wrote:
>>
>> Tony, as this is v17: if you do yet another loop, I would pick the ap parts of your patch series and
>> apply them to the devel branch as separate patches.
>
> Are you suggesting I do this now, or when this is finally ready to go upstream?
>
>
I am suggesting picking all the ap-related stuff into one patch and committing it to the devel branch now (well, in the next few days).
That way the ap stuff is already in place for your patches, and it gives your patch series some relief.

2021-11-05 16:09:22

by Harald Freudenberger

Subject: Re: [PATCH v17 14/15] s390/ap: notify drivers on config changed and scan complete callbacks

On 05.11.21 09:23, Harald Freudenberger wrote:
> On 04.11.21 16:50, Tony Krowiak wrote:
>>
>> On 11/4/21 8:06 AM, Harald Freudenberger wrote:
>>> Tony, as this is v17: if you do yet another loop, I would pick the ap parts of your patch series and
>>> apply them to the devel branch as separate patches.
>> Are you suggesting I do this now, or when this is finally ready to go upstream?
>>
>>
> I am suggesting picking all the ap related stuff into one patch and commit it to the devel branch now (well in the next days).
> So the ap stuff is then prepared for your patches and it gives your patch series some relief.
Of course, I would only do this if you agree to this procedure.

2021-11-08 17:07:45

by Anthony Krowiak

Subject: Re: [PATCH v17 14/15] s390/ap: notify drivers on config changed and scan complete callbacks



On 11/5/21 9:15 AM, Harald Freudenberger wrote:
> On 05.11.21 09:23, Harald Freudenberger wrote:
>> On 04.11.21 16:50, Tony Krowiak wrote:
>>> On 11/4/21 8:06 AM, Harald Freudenberger wrote:
>>>> Tony, as this is v17: if you do yet another loop, I would pick the ap parts of your patch series and
>>>> apply them to the devel branch as separate patches.
>>> Are you suggesting I do this now, or when this is finally ready to go upstream?
>>>
>>>
>> I am suggesting picking all the ap related stuff into one patch and commit it to the devel branch now (well in the next days).
>> So the ap stuff is then prepared for your patches and it gives your patch series some relief.
> Of course I would do this if you agree to this procedure.

I am fine with it; it makes sense.


2021-11-08 20:16:23

by Anthony Krowiak

Subject: Re: [PATCH v17 14/15] s390/ap: notify drivers on config changed and scan complete callbacks



On 11/5/21 4:23 AM, Harald Freudenberger wrote:
> On 04.11.21 16:50, Tony Krowiak wrote:
>>
>> On 11/4/21 8:06 AM, Harald Freudenberger wrote:
>>> Tony, as this is v17: if you do yet another loop, I would pick the ap parts of your patch series and
>>> apply them to the devel branch as separate patches.
>> Are you suggesting I do this now, or when this is finally ready to go upstream?
>>
>>
> I am suggesting picking all the ap related stuff into one patch and commit it to the devel branch now (well in the next days).
> So the ap stuff is then prepared for your patches and it gives your patch series some relief.

Will do.


2021-11-15 15:46:43

by Anthony Krowiak

Subject: Re: [PATCH v17 00/15] s390/vfio-ap: dynamic configuration support

PING!

On 10/21/21 11:23 AM, Tony Krowiak wrote:
> The current design for AP pass-through does not support making dynamic
> changes to the AP matrix of a running guest resulting in a few
> deficiencies this patch series is intended to mitigate:
>
> 1. Adapters, domains and control domains can not be added to or removed
> from a running guest. In order to modify a guest's AP configuration,
> the guest must be terminated; only then can AP resources be assigned
> to or unassigned from the guest's matrix mdev. The new AP
> configuration becomes available to the guest when it is subsequently
> restarted.
>
> 2. The AP bus's /sys/bus/ap/apmask and /sys/bus/ap/aqmask interfaces can
> be modified by a root user without any restrictions. A change to
> either mask can result in AP queue devices being unbound from the
> vfio_ap device driver and bound to a zcrypt device driver even if a
> guest is using the queues, thus giving the host access to the guest's
> private crypto data and vice versa.
>
> 3. The APQNs derived from the Cartesian product of the APIDs of the
> adapters and APQIs of the domains assigned to a matrix mdev must
> reference an AP queue device bound to the vfio_ap device driver. The
> AP architecture allows assignment of AP resources that are not
> available to the system, so this artificial restriction is not
> compliant with the architecture.
>
> 4. The AP configuration profile can be dynamically changed for the linux
> host after a KVM guest is started. For example, a new domain can be
> dynamically added to the configuration profile via the SE or an HMC
> connected to a DPM enabled lpar. Likewise, AP adapters can be
> dynamically configured (online state) and deconfigured (standby state)
> using the SE, an SCLP command or an HMC connected to a DPM enabled
> lpar. This can result in inadvertent sharing of AP queues between the
> guest and host.
>
> 5. A root user can manually unbind an AP queue device representing a
> queue in use by a KVM guest via the vfio_ap device driver's sysfs
> unbind attribute. In this case, the guest will be using a queue that
> is not bound to the driver which violates the device model.
>
> This patch series introduces the following changes to the current design
> to alleviate the shortcomings described above as well as to implement
> more of the AP architecture:
>
> 1. A root user will be prevented from making edits to the AP bus's
> /sys/bus/ap/apmask or /sys/bus/ap/aqmask if the change would transfer
> ownership of an APQN from the vfio_ap device driver to a zcrypt driver
> while the APQN is assigned to a matrix mdev.
>
> 2. Allow a root user to hot plug/unplug AP adapters, domains and control
> domains for a KVM guest using the matrix mdev via its sysfs
> assign/unassign attributes.
>
> 3. Allow assignment of an AP adapter or domain to a matrix mdev even if
> it results in assignment of an APQN that does not reference an AP
> queue device bound to the vfio_ap device driver, as long as the APQN
> is not reserved for use by the default zcrypt drivers (also known as
> over-provisioning of AP resources). Allowing over-provisioning of AP
> resources better models the architecture which does not preclude
> assigning AP resources that are not yet available in the system. Such
> APQNs, however, will not be assigned to the guest using the matrix
> mdev; only APQNs referencing AP queue devices bound to the vfio_ap
> device driver will actually get assigned to the guest.
>
> 4. Handle dynamic changes to the AP device model.
>
> 1. Rationale for changes to AP bus's apmask/aqmask interfaces:
> ----------------------------------------------------------
> Due to the extremely sensitive nature of cryptographic data, it is
> imperative that great care be taken to ensure that such data is secured.
> Allowing a root user, either inadvertently or maliciously, to configure
> these masks such that a queue is shared between the host and a guest is
> not only avoidable; avoiding it is advisable. It was suggested that this scenario
> is better handled in user space with management software, but that does
> not preclude a malicious administrator from using the sysfs interfaces
> to gain access to a guest's crypto data. It was also suggested that this
> scenario could be avoided by taking access to the adapter away from the
> guest and zeroing out the queues prior to the vfio_ap driver releasing the
> device; however, stealing an adapter in use from a guest as a by-product
> of an operation is bad and will likely cause problems for the guest
> unnecessarily. It was decided that the most effective solution with the
> least number of negative side effects is to prevent the situation at the
> source.
>
> 2. Rationale for hot plug/unplug using matrix mdev sysfs interfaces:
> ----------------------------------------------------------------
> Allowing a user to hot plug/unplug AP resources using the matrix mdev
> sysfs interfaces circumvents the need to terminate the guest in order to
> modify its AP configuration. Allowing dynamic configuration makes
> reconfiguring a guest's AP matrix much less disruptive.
>
> 3. Rationale for allowing over-provisioning of AP resources:
> -----------------------------------------------------------
> Allowing assignment of AP resources to a matrix mdev and ultimately to a
> guest better models the AP architecture. The architecture does not
> preclude assignment of unavailable AP resources. If a queue subsequently
> becomes available while a guest is using the matrix mdev to which its APQN
> is assigned, the guest will be given access to it. If an APQN
> is dynamically unassigned from the underlying host system, it will
> automatically become unavailable to the guest.
>
> Change log v16-v17:
> ------------------
> * Introduced a new patch (patch 1) to remove the setting of the pqap hook
> in the group notifier callback. It is now set when the vfio_ap device
> driver is loaded.
>
> * Patch 6:
> - Split the filtering of the APQNs and the control domains into
> two functions and consolidated the vfio_ap_mdev_refresh_apcb and
> vfio_ap_mdev_filter_apcb into one function named
> vfio_ap_mdev_filter_matrix because the matrix is actually what is
> being filtered.
>
> - Removed ACK by Halil Pasic because of changes above; needs re-review.
>
> * Introduced a new patch (patch 8) to keep track of active guests.
>
> * Patch 9 (patch 8 in v16):
> - Refactored locking to ensure KVM lock is taken before
> matrix_dev->lock when hot plugging adapters, domains and
> control domains.
>
> - Removed ACK by Halil because of changes above; needs re-review.
>
> * Patch 14 (patch 13 in v16):
> - This patch has been redesigned to ensure proper locking order (i.e.,
> taking kvm->lock before matrix_dev->lock).
>
> - Removed Halil's Reviewed-by because of changes above; needs re-review.
>
> Tony Krowiak (15):
> s390/vfio-ap: Set pqap hook when vfio_ap module is loaded
> s390/vfio-ap: use new AP bus interface to search for queue devices
> s390/vfio-ap: move probe and remove callbacks to vfio_ap_ops.c
> s390/vfio-ap: manage link between queue struct and matrix mdev
> s390/vfio-ap: introduce shadow APCB
> s390/vfio-ap: refresh guest's APCB by filtering APQNs assigned to mdev
> s390/vfio-ap: allow assignment of unavailable AP queues to mdev device
> s390/vfio-ap: keep track of active guests
> s390/vfio-ap: allow hot plug/unplug of AP resources using mdev device
> s390/vfio-ap: reset queues after adapter/domain unassignment
> s390/ap: driver callback to indicate resource in use
> s390/vfio-ap: implement in-use callback for vfio_ap driver
> s390/vfio-ap: sysfs attribute to display the guest's matrix
> s390/ap: notify drivers on config changed and scan complete callbacks
> s390/vfio-ap: update docs to include dynamic config support
>
> Documentation/s390/vfio-ap.rst | 492 ++++++---
> arch/s390/include/asm/kvm_host.h | 10 +-
> arch/s390/kvm/kvm-s390.c | 1 -
> arch/s390/kvm/priv.c | 45 +-
> drivers/s390/crypto/ap_bus.c | 241 ++++-
> drivers/s390/crypto/ap_bus.h | 16 +
> drivers/s390/crypto/vfio_ap_drv.c | 52 +-
> drivers/s390/crypto/vfio_ap_ops.c | 1379 ++++++++++++++++++-------
> drivers/s390/crypto/vfio_ap_private.h | 66 +-
> 9 files changed, 1714 insertions(+), 588 deletions(-)
>
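
To make the over-provisioning rule in the cover letter quoted above more concrete: any APQN in the Cartesian product of assigned adapters and domains may be assigned to the mdev, but only resources backed by queues bound to the pass-through driver reach the guest. The sketch below is a simplified illustration under several assumptions: tiny 8-bit masks instead of 256-bit ones, a hypothetical bound[][] table in place of the driver's queue lookup, and a policy that drops a whole adapter if any of its APQNs lacks a bound queue. The series' actual filtering may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define N 8 /* illustrative size; the real masks are 256 bits wide */

/*
 * Hypothetical filter. apm/aqm are the adapters/domains assigned to the
 * mdev (bit i set = index i assigned); bound[apid][apqi] says whether a
 * queue device for that APQN is bound to the pass-through driver.
 * Returns the adapter mask that would be handed to the guest: an adapter
 * is kept only if every APQN it forms with an assigned domain is bound.
 */
static uint8_t filter_adapters(uint8_t apm, uint8_t aqm, bool bound[N][N])
{
    uint8_t guest_apm = 0;
    unsigned int apid, apqi;

    assert(bound != NULL);
    for (apid = 0; apid < N; apid++) {
        bool ok = true;

        if (!(apm & (1u << apid)))
            continue;
        for (apqi = 0; apqi < N; apqi++) {
            if ((aqm & (1u << apqi)) && !bound[apid][apqi]) {
                ok = false;
                break;
            }
        }
        if (ok)
            guest_apm |= (uint8_t)(1u << apid);
    }
    return guest_apm;
}
```

The mdev keeps its full apm/aqm assignment in either case. Only the mask handed to the guest is narrowed, so a queue that becomes bound later can be picked up by re-running the filter.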


2021-11-22 16:13:08

by Anthony Krowiak

Subject: Re: [PATCH v17 00/15] s390/vfio-ap: dynamic configuration support

PING!!



2021-12-27 08:22:25

by Halil Pasic

Subject: Re: [PATCH v17 01/15] s390/vfio-ap: Set pqap hook when vfio_ap module is loaded

On Thu, 21 Oct 2021 11:23:18 -0400
Tony Krowiak <[email protected]> wrote:

> Rather than storing the function pointer to the PQAP(AQIC) instruction
> interception handler with the mediated device (struct ap_matrix_mdev) when
> the vfio_ap device driver is notified that the KVM pointer is being set,
> let's store it once in a global variable when the vfio_ap module is
> loaded.

This is a global variable of the kvm module, right? I'm not sure this
makes sense from a 'bigger picture' perspective. Imagine a situation
where we have two drivers doing some sort of vfio-ap-like thing. Only
one of the drivers would be able to register the callback. I do
understand that we are unlikely to see another driver that needs to
register a pqap callback in the foreseeable future, but this still looks
like a very questionable design. The old design, where the AP
pass-through and/or virtualization is done by exactly one driver for one
guest but different guests may use different drivers, looks preferable
to me.
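
The limitation described here follows directly from the single-slot registration in the patch: once one hook is installed, any second registration fails. A userspace sketch of the same pattern is shown below, with a pthread mutex standing in for the kernel mutex. The names (struct crypto_hook, hook_register(), hook_unregister()) are hypothetical stand-ins, not the functions from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the hook structure in the patch. */
struct crypto_hook {
    int (*fcn)(void *vcpu);
};

static pthread_mutex_t hook_lock = PTHREAD_MUTEX_INITIALIZER;
static struct crypto_hook *global_hook; /* one slot for the whole module */

/* Install hook; fails with -EACCES if another hook is already installed. */
static int hook_register(struct crypto_hook *hook)
{
    int ret = 0;

    assert(hook != NULL);
    pthread_mutex_lock(&hook_lock);
    if (global_hook)
        ret = -EACCES;
    else
        global_hook = hook;
    pthread_mutex_unlock(&hook_lock);
    return ret;
}

/* Remove hook; only the current owner may unregister. */
static int hook_unregister(struct crypto_hook *hook)
{
    int ret = 0;

    pthread_mutex_lock(&hook_lock);
    if (hook != global_hook)
        ret = -EACCES;
    else
        global_hook = NULL;
    pthread_mutex_unlock(&hook_lock);
    return ret;
}
```

A second driver calling hook_register() while the first is still registered gets -EACCES and has no way to handle intercepts for its own guests.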



>
> There are three reasons for doing this:
>
> 1. The lifetime of the interception handler function coincides with the
> lifetime of the vfio_ap module, so it makes little sense to tie it to
> the mediated device and complete sense to tie it to the module in which
> the function resides.

Well, the handler lives in the vfio_ap module so this is a given anyway.
I guess we are talking here about the callback which is registered by the
vfio_ap module, in the old design with a specific kvm instance (guest).

>
> 2. The setting/clearing of the function pointer is protected by a mutex
> lock. This increases the number of locks taken during
> VFIO_GROUP_NOTIFY_SET_KVM notification and increases the complexity of
> ensuring locking integrity and avoiding circular lock dependencies.
>
> 3. The lock will only be taken for writing twice: When the vfio_ap module
> is loaded; and, when the vfio_ap module is removed. As it stands now,
> the lock is taken for writing whenever a guest is started or terminated.

What you do is basically get rid of the

crypto_hook *pqap_hook;
pointer and add the
void *data;
pointer instead.

It is obvious why you need to add that pointer: we need a pointer to the
matrix_mdev, and we used to obtain it via the pqap_hook pointer.

But isn't the access to *data racy now? I mean, you still have concurrent
access to a pointer that is just named differently. Yes, instead of
accessing vcpu->kvm->arch.crypto.pqap_hook twice, we access the global
pqap_hook in the kvm module once, and then data once, but the access to
data does not seem to be synchronized at all. And you do not
explain why that is supposed to be OK.



>
> Signed-off-by: Tony Krowiak <[email protected]>
> ---
> arch/s390/include/asm/kvm_host.h | 10 ++++--
> arch/s390/kvm/kvm-s390.c | 1 -
> arch/s390/kvm/priv.c | 45 ++++++++++++++++++++++-----
> drivers/s390/crypto/vfio_ap_ops.c | 27 ++++++++--------
> drivers/s390/crypto/vfio_ap_private.h | 1 -
> 5 files changed, 58 insertions(+), 26 deletions(-)
>
> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
> index a604d51acfc8..05569d077d7f 100644
> --- a/arch/s390/include/asm/kvm_host.h
> +++ b/arch/s390/include/asm/kvm_host.h
> @@ -799,16 +799,17 @@ struct kvm_s390_cpu_model {
> unsigned short ibc;
> };
>
> -typedef int (*crypto_hook)(struct kvm_vcpu *vcpu);
> +struct kvm_s390_crypto_hook {
> + int (*fcn)(struct kvm_vcpu *vcpu);
> +};

I guess you do this for type safety, right?

>
> struct kvm_s390_crypto {
> struct kvm_s390_crypto_cb *crycb;
> - struct rw_semaphore pqap_hook_rwsem;
> - crypto_hook *pqap_hook;

One pointer set by the driver gone...

> __u32 crycbd;
> __u8 aes_kw;
> __u8 dea_kw;
> __u8 apie;
> + void *data;

... but another one added!

> };
>
> #define APCB0_MASK_SIZE 1
> @@ -998,6 +999,9 @@ extern char sie_exit;
> extern int kvm_s390_gisc_register(struct kvm *kvm, u32 gisc);
> extern int kvm_s390_gisc_unregister(struct kvm *kvm, u32 gisc);
>
> +extern int kvm_s390_pqap_hook_register(struct kvm_s390_crypto_hook *hook);
> +extern int kvm_s390_pqap_hook_unregister(struct kvm_s390_crypto_hook *hook);
> +
> static inline void kvm_arch_hardware_disable(void) {}
> static inline void kvm_arch_sync_events(struct kvm *kvm) {}
> static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index 6a6dd5e1daf6..c91981599328 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -2649,7 +2649,6 @@ static void kvm_s390_crypto_init(struct kvm *kvm)
> {
> kvm->arch.crypto.crycb = &kvm->arch.sie_page2->crycb;
> kvm_s390_set_crycb_format(kvm);
> - init_rwsem(&kvm->arch.crypto.pqap_hook_rwsem);
>
> if (!test_kvm_facility(kvm, 76))
> return;
> diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
> index 53da4ceb16a3..3d91ff934c0c 100644
> --- a/arch/s390/kvm/priv.c
> +++ b/arch/s390/kvm/priv.c
> @@ -31,6 +31,39 @@
> #include "kvm-s390.h"
> #include "trace.h"
>
> +DEFINE_MUTEX(pqap_hook_lock);
> +static struct kvm_s390_crypto_hook *pqap_hook;

This is the kvm global variable for the hook.
> +
> +int kvm_s390_pqap_hook_register(struct kvm_s390_crypto_hook *hook)
> +{
> + int ret = 0;
> +
> + mutex_lock(&pqap_hook_lock);
> + if (pqap_hook)
> + ret = -EACCES;
> + else
> + pqap_hook = hook;
> + mutex_unlock(&pqap_hook_lock);
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(kvm_s390_pqap_hook_register);
> +
> +int kvm_s390_pqap_hook_unregister(struct kvm_s390_crypto_hook *hook)
> +{
> + int ret = 0;
> +
> + mutex_lock(&pqap_hook_lock);
> + if (hook != pqap_hook)
> + ret = -EACCES;
> + else
> + pqap_hook = NULL;
> + mutex_unlock(&pqap_hook_lock);
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(kvm_s390_pqap_hook_unregister);
> +
> static int handle_ri(struct kvm_vcpu *vcpu)
> {
> vcpu->stat.instruction_ri++;
> @@ -610,7 +643,6 @@ static int handle_io_inst(struct kvm_vcpu *vcpu)
> static int handle_pqap(struct kvm_vcpu *vcpu)
> {
> struct ap_queue_status status = {};
> - crypto_hook pqap_hook;
> unsigned long reg0;
> int ret;
> uint8_t fc;
> @@ -659,16 +691,15 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
> * hook function pointer in the kvm_s390_crypto structure. Lock the
> * owner, retrieve the hook function pointer and call the hook.
> */
> - down_read(&vcpu->kvm->arch.crypto.pqap_hook_rwsem);

This used to protect any reads of vcpu->kvm->arch.crypto.pqap_hook ...

> - if (vcpu->kvm->arch.crypto.pqap_hook) {
> - pqap_hook = *vcpu->kvm->arch.crypto.pqap_hook;
> - ret = pqap_hook(vcpu);

... while we are executing the hook. Now the hook is instead supposed
to read *vcpu->kvm->arch.crypto.data, but that access is not
synchronized, because it is not protected by the new pqap_hook_lock
mutex.

> + mutex_lock(&pqap_hook_lock);
> + if (pqap_hook) {
> + ret = pqap_hook->fcn(vcpu);
> if (!ret && vcpu->run->s.regs.gprs[1] & 0x00ff0000)
> kvm_s390_set_psw_cc(vcpu, 3);
> - up_read(&vcpu->kvm->arch.crypto.pqap_hook_rwsem);
> + mutex_unlock(&pqap_hook_lock);
> return ret;
> }
> - up_read(&vcpu->kvm->arch.crypto.pqap_hook_rwsem);
> + mutex_unlock(&pqap_hook_lock);
> /*
> * A vfio_driver must register a hook.
> * No hook means no driver to enable the SIE CRYCB and no queues.
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 94c1c9bd58ad..02275d246b39 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -293,13 +293,10 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
> apqn = vcpu->run->s.regs.gprs[0] & 0xffff;
> mutex_lock(&matrix_dev->lock);
>
> - if (!vcpu->kvm->arch.crypto.pqap_hook)
> - goto out_unlock;
> - matrix_mdev = container_of(vcpu->kvm->arch.crypto.pqap_hook,
> - struct ap_matrix_mdev, pqap_hook);
> + matrix_mdev = vcpu->kvm->arch.crypto.data;

Here is the access I'm talking about above. [1]

>
> /* If the there is no guest using the mdev, there is nothing to do */
> - if (!matrix_mdev->kvm)
> + if (!matrix_mdev || !matrix_mdev->kvm)
> goto out_unlock;
>
> q = vfio_ap_get_queue(matrix_mdev, apqn);
> @@ -348,7 +345,6 @@ static int vfio_ap_mdev_probe(struct mdev_device *mdev)
>
> matrix_mdev->mdev = mdev;
> vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
> - matrix_mdev->pqap_hook = handle_pqap;
> mutex_lock(&matrix_dev->lock);
> list_add(&matrix_mdev->node, &matrix_dev->mdev_list);
> mutex_unlock(&matrix_dev->lock);
> @@ -1078,10 +1074,6 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
> struct ap_matrix_mdev *m;
>
> if (kvm->arch.crypto.crycbd) {
> - down_write(&kvm->arch.crypto.pqap_hook_rwsem);
> - kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;

We used to set our old pointer in a synchronized manner...

> - up_write(&kvm->arch.crypto.pqap_hook_rwsem);
> -
> mutex_lock(&kvm->lock);
> mutex_lock(&matrix_dev->lock);
>
> @@ -1095,6 +1087,7 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
>
> kvm_get_kvm(kvm);
> matrix_mdev->kvm = kvm;
> + kvm->arch.crypto.data = matrix_mdev;

... but not any more!

> kvm_arch_crypto_set_masks(kvm,
> matrix_mdev->matrix.apm,
> matrix_mdev->matrix.aqm,
> @@ -1155,16 +1148,13 @@ static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev,
> struct kvm *kvm)
> {
> if (kvm && kvm->arch.crypto.crycbd) {
> - down_write(&kvm->arch.crypto.pqap_hook_rwsem);
> - kvm->arch.crypto.pqap_hook = NULL;
> - up_write(&kvm->arch.crypto.pqap_hook_rwsem);
> -

Same here!

> mutex_lock(&kvm->lock);
> mutex_lock(&matrix_dev->lock);
>
> kvm_arch_crypto_clear_masks(kvm);
> vfio_ap_mdev_reset_queues(matrix_mdev);
> kvm_put_kvm(kvm);
> + kvm->arch.crypto.data = NULL;

And same here. So at [1] we aren't guaranteed to see this write, right?
That does not look right to me.

> matrix_mdev->kvm = NULL;
>
> mutex_unlock(&kvm->lock);
> @@ -1391,12 +1381,20 @@ static const struct mdev_parent_ops vfio_ap_matrix_ops = {
> .supported_type_groups = vfio_ap_mdev_type_groups,
> };
>
> +static struct kvm_s390_crypto_hook pqap_hook = {
> + .fcn = handle_pqap,
> +};
> +
> int vfio_ap_mdev_register(void)
> {
> int ret;
>
> atomic_set(&matrix_dev->available_instances, MAX_ZDEV_ENTRIES_EXT);
>
> + ret = kvm_s390_pqap_hook_register(&pqap_hook);
> + if (ret)
> + return ret;
> +
> ret = mdev_register_driver(&vfio_ap_matrix_driver);
> if (ret)
> return ret;
> @@ -1413,6 +1411,7 @@ int vfio_ap_mdev_register(void)
>
> void vfio_ap_mdev_unregister(void)
> {
> + WARN_ON(kvm_s390_pqap_hook_unregister(&pqap_hook));
> mdev_unregister_device(&matrix_dev->device);
> mdev_unregister_driver(&vfio_ap_matrix_driver);
> }
> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
> index 648fcaf8104a..907f41160de7 100644
> --- a/drivers/s390/crypto/vfio_ap_private.h
> +++ b/drivers/s390/crypto/vfio_ap_private.h
> @@ -97,7 +97,6 @@ struct ap_matrix_mdev {
> struct notifier_block group_notifier;
> struct notifier_block iommu_notifier;
> struct kvm *kvm;
> - crypto_hook pqap_hook;
> struct mdev_device *mdev;
> };
>
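
For reference, the single-slot registration scheme in the priv.c hunk
above can be modelled in userspace roughly as follows. This is only a
sketch under stated assumptions: pthread mutexes stand in for the kernel
mutex, and struct crypto_hook, hook_register(), hook_unregister(),
hook_call(), sample_handler() and the -ENOENT return for "no hook" are
hypothetical names and choices made for illustration. Note that the lock
here covers only the hook pointer itself, not any per-VM data the
handler goes on to read.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct kvm_s390_crypto_hook. */
struct crypto_hook {
    int (*fcn)(void *vcpu);
};

static pthread_mutex_t hook_lock = PTHREAD_MUTEX_INITIALIZER;
static struct crypto_hook *registered_hook; /* single global slot */

/* Register a hook; fails with -EACCES if a hook is already registered. */
int hook_register(struct crypto_hook *hook)
{
    int ret = 0;

    assert(hook != NULL && hook->fcn != NULL);
    pthread_mutex_lock(&hook_lock);
    if (registered_hook)
        ret = -EACCES;
    else
        registered_hook = hook;
    pthread_mutex_unlock(&hook_lock);

    return ret;
}

/* Unregister a hook; only the currently registered hook may be removed. */
int hook_unregister(struct crypto_hook *hook)
{
    int ret = 0;

    pthread_mutex_lock(&hook_lock);
    if (hook != registered_hook)
        ret = -EACCES;
    else
        registered_hook = NULL;
    pthread_mutex_unlock(&hook_lock);

    return ret;
}

/*
 * Invoke the hook with the lock held, so the hook cannot be
 * unregistered while it runs. Returning -ENOENT when no hook is
 * registered is an assumption of this sketch.
 */
int hook_call(void *vcpu)
{
    int ret = -ENOENT;

    pthread_mutex_lock(&hook_lock);
    if (registered_hook)
        ret = registered_hook->fcn(vcpu);
    pthread_mutex_unlock(&hook_lock);

    return ret;
}

/* Trivial handler used only to exercise the sketch. */
int sample_handler(void *vcpu)
{
    (void)vcpu;
    return 0;
}
```

A second hook_register() call fails until the first hook has been
unregistered, which mirrors the -EACCES paths in the quoted patch.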


2021-12-27 08:26:15

by Halil Pasic

Subject: Re: [PATCH v17 02/15] s390/vfio-ap: use new AP bus interface to search for queue devices

On Thu, 21 Oct 2021 11:23:19 -0400
Tony Krowiak <[email protected]> wrote:

> This patch refactors the vfio_ap device driver to use the AP bus's
> ap_get_qdev() function to retrieve the vfio_ap_queue struct containing
> information about a queue that is bound to the vfio_ap device driver.
> The bus's ap_get_qdev() function retrieves the queue device from a
> hashtable keyed by APQN. This is much more efficient than looping over
> the list of devices attached to the AP bus by several orders of
> magnitude.
>
> Signed-off-by: Tony Krowiak <[email protected]>
> Reviewed-by: Halil Pasic <[email protected]>

I assume nothing changed here, and nothing significant changed around
this patch (context). If I'm wrong, please tell me and I will
re-evaluate.

Regards,
Halil

2021-12-27 08:53:44

by Halil Pasic

Subject: Re: [PATCH v17 06/15] s390/vfio-ap: refresh guest's APCB by filtering APQNs assigned to mdev

On Thu, 21 Oct 2021 11:23:23 -0400
Tony Krowiak <[email protected]> wrote:

> Refresh the guest's APCB by filtering the APQNs assigned to the matrix mdev
> that do not reference an AP queue device bound to the vfio_ap device
> driver. The mdev's APQNs will be filtered according to the following rules:
>
> * The APID of each adapter and the APQI of each domain that is not in the
> host's AP configuration is filtered out.
>
> * The APID of each adapter comprising an APQN that does not reference a
> queue device bound to the vfio_ap device driver is filtered. The APQNs
> are derived from the Cartesian product of the APID of each adapter and
> APQI of each domain assigned to the mdev.
>
> The control domains that are not assigned to the host's AP configuration
> will also be filtered before assigning them to the guest's APCB.

The v16 version used to filter on queue removal from vfio_ap, which makes
a ton of sense.

This version will "filter back" the queues once these become bound, but
if a queue is removed from vfio_ap, we don't seem to care to filter. Is
this intentional?

Also, we could probably do the filtering incrementally, in the sense
that only so much changes at a time, and we know that the invariant
held before that change. But that would probably end up trading
complexity for cycles. I will trust your judgment and your tests on this
matter.

>
> Signed-off-by: Tony Krowiak <[email protected]>
> ---
> drivers/s390/crypto/vfio_ap_ops.c | 66 ++++++++++++++++++++++++++++++-
> 1 file changed, 64 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 4305177029bf..46c179363aca 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -314,6 +314,62 @@ static void vfio_ap_matrix_init(struct ap_config_info *info,
> matrix->adm_max = info->apxa ? info->Nd : 15;
> }
>
> +static void vfio_ap_mdev_filter_cdoms(struct ap_matrix_mdev *matrix_mdev)
> +{
> + bitmap_and(matrix_mdev->shadow_apcb.adm, matrix_mdev->matrix.adm,
> + (unsigned long *)matrix_dev->info.adm, AP_DOMAINS);
> +}
> +
> +/*
> + * vfio_ap_mdev_filter_matrix - copy the mdev's AP configuration to the KVM
> + * guest's APCB then filter the APIDs that do not
> + * comprise at least one APQN that references a
> + * queue device bound to the vfio_ap device driver.
> + *
> + * @matrix_mdev: the mdev whose AP configuration is to be filtered.
> + */
> +static void vfio_ap_mdev_filter_matrix(struct ap_matrix_mdev *matrix_mdev)
> +{
> + int ret;
> + unsigned long apid, apqi, apqn;
> +
> + ret = ap_qci(&matrix_dev->info);
> + if (ret)
> + return;
> +
> + vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->shadow_apcb);
> +
> + /*
> + * Copy the adapters, domains and control domains to the shadow_apcb
> + * from the matrix mdev, but only those that are assigned to the host's
> + * AP configuration.
> + */
> + bitmap_and(matrix_mdev->shadow_apcb.apm, matrix_mdev->matrix.apm,
> + (unsigned long *)matrix_dev->info.apm, AP_DEVICES);
> + bitmap_and(matrix_mdev->shadow_apcb.aqm, matrix_mdev->matrix.aqm,
> + (unsigned long *)matrix_dev->info.aqm, AP_DOMAINS);
> +
> + for_each_set_bit_inv(apid, matrix_mdev->shadow_apcb.apm, AP_DEVICES) {
> + for_each_set_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm,
> + AP_DOMAINS) {
> + /*
> + * If the APQN is not bound to the vfio_ap device
> + * driver, then we can't assign it to the guest's
> + * AP configuration. The AP architecture won't
> + * allow filtering of a single APQN, so if we're
> + * filtering APIDs, then filter the APID; otherwise,
> + * filter the APQI.
> + */
> + apqn = AP_MKQID(apid, apqi);
> + if (!vfio_ap_mdev_get_queue(matrix_mdev, apqn)) {
> + clear_bit_inv(apid,
> + matrix_mdev->shadow_apcb.apm);
> + break;
> + }
> + }
> + }
> +}
> +
> static int vfio_ap_mdev_probe(struct mdev_device *mdev)
> {
> struct ap_matrix_mdev *matrix_mdev;
> @@ -703,6 +759,7 @@ static ssize_t assign_adapter_store(struct device *dev,
> goto share_err;
>
> vfio_ap_mdev_link_adapter(matrix_mdev, apid);
> + vfio_ap_mdev_filter_matrix(matrix_mdev);
> ret = count;
> goto done;
>
> @@ -771,6 +828,7 @@ static ssize_t unassign_adapter_store(struct device *dev,
>
> clear_bit_inv((unsigned long)apid, matrix_mdev->matrix.apm);
> vfio_ap_mdev_unlink_adapter(matrix_mdev, apid);
> + vfio_ap_mdev_filter_matrix(matrix_mdev);
> ret = count;
> done:
> mutex_unlock(&matrix_dev->lock);
> @@ -874,6 +932,7 @@ static ssize_t assign_domain_store(struct device *dev,
> goto share_err;
>
> vfio_ap_mdev_link_domain(matrix_mdev, apqi);
> + vfio_ap_mdev_filter_matrix(matrix_mdev);
> ret = count;
> goto done;
>
> @@ -942,6 +1001,7 @@ static ssize_t unassign_domain_store(struct device *dev,
>
> clear_bit_inv((unsigned long)apqi, matrix_mdev->matrix.aqm);
> vfio_ap_mdev_unlink_domain(matrix_mdev, apqi);
> + vfio_ap_mdev_filter_matrix(matrix_mdev);
> ret = count;
>
> done:
> @@ -995,6 +1055,7 @@ static ssize_t assign_control_domain_store(struct device *dev,
> * number of control domains that can be assigned.
> */
> set_bit_inv(id, matrix_mdev->matrix.adm);
> + vfio_ap_mdev_filter_cdoms(matrix_mdev);
> ret = count;
> done:
> mutex_unlock(&matrix_dev->lock);
> @@ -1042,6 +1103,7 @@ static ssize_t unassign_control_domain_store(struct device *dev,
> }
>
> clear_bit_inv(domid, matrix_mdev->matrix.adm);
> + clear_bit_inv(domid, matrix_mdev->shadow_apcb.adm);
> ret = count;
> done:
> mutex_unlock(&matrix_dev->lock);
> @@ -1179,8 +1241,6 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
> kvm_get_kvm(kvm);
> matrix_mdev->kvm = kvm;
> kvm->arch.crypto.data = matrix_mdev;
> - memcpy(&matrix_mdev->shadow_apcb, &matrix_mdev->matrix,
> - sizeof(struct ap_matrix));
> kvm_arch_crypto_set_masks(kvm, matrix_mdev->shadow_apcb.apm,
> matrix_mdev->shadow_apcb.aqm,
> matrix_mdev->shadow_apcb.adm);
> @@ -1536,6 +1596,8 @@ int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
> q->apqn = to_ap_queue(&apdev->device)->qid;
> q->saved_isc = VFIO_AP_ISC_INVALID;
> vfio_ap_queue_link_mdev(q);
> + if (q->matrix_mdev)
> + vfio_ap_mdev_filter_matrix(q->matrix_mdev);
> dev_set_drvdata(&apdev->device, q);
> mutex_unlock(&matrix_dev->lock);
>
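
The filtering rules from the commit message can be sketched in plain C
as below. This is a simplified model, not the driver code: it assumes
8-bit masks with ordinary LSB-first bit numbering instead of the
256-bit, MSB-first masks the kernel uses, and struct matrix,
filter_matrix() and the bound[][] lookup table are hypothetical names
introduced for illustration. The function recomputes the shadow copy
from scratch on every call, i.e. it is the non-incremental variant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_IDS 8 /* tiny illustrative size, not the real mask width */

struct matrix {
    uint8_t apm; /* bit n set: adapter (APID) n assigned */
    uint8_t aqm; /* bit n set: domain (APQI) n assigned */
};

/*
 * Build the shadow configuration for the guest:
 *  1. keep only adapters and domains that are in the host configuration;
 *  2. drop every adapter that has at least one APQN (apid, apqi) with
 *     no queue device bound to the driver.
 * bound[apid][apqi] is a hypothetical stand-in for the queue lookup.
 */
struct matrix filter_matrix(struct matrix mdev, struct matrix host,
                            bool bound[MAX_IDS][MAX_IDS])
{
    struct matrix shadow;
    unsigned int apid, apqi;

    assert(bound != NULL);

    shadow.apm = mdev.apm & host.apm;
    shadow.aqm = mdev.aqm & host.aqm;

    for (apid = 0; apid < MAX_IDS; apid++) {
        if (!(shadow.apm & (1u << apid)))
            continue;
        for (apqi = 0; apqi < MAX_IDS; apqi++) {
            if (!(shadow.aqm & (1u << apqi)))
                continue;
            if (!bound[apid][apqi]) {
                /* A single APQN cannot be filtered: drop the adapter. */
                shadow.apm &= (uint8_t)~(1u << apid);
                break;
            }
        }
    }

    return shadow;
}
```

In this model, calling filter_matrix() again after a queue is unbound
would remove the affected adapter from the shadow copy, because nothing
is carried over from the previous result.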


2021-12-27 09:20:58

by Halil Pasic

Subject: Re: [PATCH v17 07/15] s390/vfio-ap: allow assignment of unavailable AP queues to mdev device

On Thu, 21 Oct 2021 11:23:24 -0400
Tony Krowiak <[email protected]> wrote:

> The current implementation does not allow assignment of an AP adapter or
> domain to an mdev device if each APQN resulting from the assignment
> does not reference an AP queue device that is bound to the vfio_ap device
> driver. This patch allows assignment of AP resources to the matrix mdev as
> long as the APQNs resulting from the assignment:
> 1. Are not reserved by the AP BUS for use by the zcrypt device drivers.
> 2. Are not assigned to another matrix mdev.
>
> The rationale behind this is that the AP architecture does not preclude
> assignment of APQNs to an AP configuration profile that are not available
> to the system.
>
> Signed-off-by: Tony Krowiak <[email protected]>

Reviewed-by: Halil Pasic <[email protected]>

Looks good in isolation!


> ---
> drivers/s390/crypto/vfio_ap_ops.c | 224 +++++++-----------------------
> 1 file changed, 53 insertions(+), 171 deletions(-)
>
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 46c179363aca..6b40db6dab3c 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -520,141 +520,48 @@ static struct attribute_group *vfio_ap_mdev_type_groups[] = {
> NULL,
> };
>
> -struct vfio_ap_queue_reserved {
> - unsigned long *apid;
> - unsigned long *apqi;
> - bool reserved;
> -};
> +#define MDEV_SHARING_ERR "Userspace may not re-assign queue %02lx.%04lx " \
> + "already assigned to %s"
>
> -/**
> - * vfio_ap_has_queue - determines if the AP queue containing the target in @data
> - *
> - * @dev: an AP queue device
> - * @data: a struct vfio_ap_queue_reserved reference
> - *
> - * Flags whether the AP queue device (@dev) has a queue ID containing the APQN,
> - * apid or apqi specified in @data:
> - *
> - * - If @data contains both an apid and apqi value, then @data will be flagged
> - * as reserved if the APID and APQI fields for the AP queue device matches
> - *
> - * - If @data contains only an apid value, @data will be flagged as
> - * reserved if the APID field in the AP queue device matches
> - *
> - * - If @data contains only an apqi value, @data will be flagged as
> - * reserved if the APQI field in the AP queue device matches
> - *
> - * Return: 0 to indicate the input to function succeeded. Returns -EINVAL if
> - * @data does not contain either an apid or apqi.
> - */
> -static int vfio_ap_has_queue(struct device *dev, void *data)
> +static void vfio_ap_mdev_log_sharing_err(struct ap_matrix_mdev *matrix_mdev,
> + unsigned long *apm,
> + unsigned long *aqm)
> {
> - struct vfio_ap_queue_reserved *qres = data;
> - struct ap_queue *ap_queue = to_ap_queue(dev);
> - ap_qid_t qid;
> - unsigned long id;
> -
> - if (qres->apid && qres->apqi) {
> - qid = AP_MKQID(*qres->apid, *qres->apqi);
> - if (qid == ap_queue->qid)
> - qres->reserved = true;
> - } else if (qres->apid && !qres->apqi) {
> - id = AP_QID_CARD(ap_queue->qid);
> - if (id == *qres->apid)
> - qres->reserved = true;
> - } else if (!qres->apid && qres->apqi) {
> - id = AP_QID_QUEUE(ap_queue->qid);
> - if (id == *qres->apqi)
> - qres->reserved = true;
> - } else {
> - return -EINVAL;
> - }
> -
> - return 0;
> -}
> -
> -/**
> - * vfio_ap_verify_queue_reserved - verifies that the AP queue containing
> - * @apid or @aqpi is reserved
> - *
> - * @apid: an AP adapter ID
> - * @apqi: an AP queue index
> - *
> - * Verifies that the AP queue with @apid/@apqi is reserved by the VFIO AP device
> - * driver according to the following rules:
> - *
> - * - If both @apid and @apqi are not NULL, then there must be an AP queue
> - * device bound to the vfio_ap driver with the APQN identified by @apid and
> - * @apqi
> - *
> - * - If only @apid is not NULL, then there must be an AP queue device bound
> - * to the vfio_ap driver with an APQN containing @apid
> - *
> - * - If only @apqi is not NULL, then there must be an AP queue device bound
> - * to the vfio_ap driver with an APQN containing @apqi
> - *
> - * Return: 0 if the AP queue is reserved; otherwise, returns -EADDRNOTAVAIL.
> - */
> -static int vfio_ap_verify_queue_reserved(unsigned long *apid,
> - unsigned long *apqi)
> -{
> - int ret;
> - struct vfio_ap_queue_reserved qres;
> -
> - qres.apid = apid;
> - qres.apqi = apqi;
> - qres.reserved = false;
> -
> - ret = driver_for_each_device(&matrix_dev->vfio_ap_drv->driver, NULL,
> - &qres, vfio_ap_has_queue);
> - if (ret)
> - return ret;
> -
> - if (qres.reserved)
> - return 0;
> -
> - return -EADDRNOTAVAIL;
> -}
> -
> -static int
> -vfio_ap_mdev_verify_queues_reserved_for_apid(struct ap_matrix_mdev *matrix_mdev,
> - unsigned long apid)
> -{
> - int ret;
> - unsigned long apqi;
> - unsigned long nbits = matrix_mdev->matrix.aqm_max + 1;
> -
> - if (find_first_bit_inv(matrix_mdev->matrix.aqm, nbits) >= nbits)
> - return vfio_ap_verify_queue_reserved(&apid, NULL);
> -
> - for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, nbits) {
> - ret = vfio_ap_verify_queue_reserved(&apid, &apqi);
> - if (ret)
> - return ret;
> - }
> + unsigned long apid, apqi;
> + const struct device *dev = mdev_dev(matrix_mdev->mdev);
> + const char *mdev_name = dev_name(dev);
>
> - return 0;
> + for_each_set_bit_inv(apid, apm, AP_DEVICES)
> + for_each_set_bit_inv(apqi, aqm, AP_DOMAINS)
> + dev_warn(dev, MDEV_SHARING_ERR, apid, apqi, mdev_name);
> }
>
> /**
> - * vfio_ap_mdev_verify_no_sharing - verifies that the AP matrix is not configured
> + * vfio_ap_mdev_verify_no_sharing - verify APQNs are not shared by matrix mdevs
> *
> - * @matrix_mdev: the mediated matrix device
> + * @mdev_apm: mask indicating the APIDs of the APQNs to be verified
> + * @mdev_aqm: mask indicating the APQIs of the APQNs to be verified
> *
> - * Verifies that the APQNs derived from the cross product of the AP adapter IDs
> - * and AP queue indexes comprising the AP matrix are not configured for another
> + * Verifies that each APQN derived from the Cartesian product of a bitmap of
> + * AP adapter IDs and AP queue indexes is not configured for any matrix
> * mediated device. AP queue sharing is not allowed.
> *
> - * Return: 0 if the APQNs are not shared; otherwise returns -EADDRINUSE.
> + * Return: 0 if the APQNs are not shared; otherwise return -EADDRINUSE.
> */
> -static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
> +static int vfio_ap_mdev_verify_no_sharing(unsigned long *mdev_apm,
> + unsigned long *mdev_aqm)
> {
> - struct ap_matrix_mdev *lstdev;
> + struct ap_matrix_mdev *matrix_mdev;
> DECLARE_BITMAP(apm, AP_DEVICES);
> DECLARE_BITMAP(aqm, AP_DOMAINS);
>
> - list_for_each_entry(lstdev, &matrix_dev->mdev_list, node) {
> - if (matrix_mdev == lstdev)
> + list_for_each_entry(matrix_mdev, &matrix_dev->mdev_list, node) {
> + /*
> + * If the input apm and aqm belong to the matrix_mdev's matrix,
> + * then move on to the next.
> + */
> + if (mdev_apm == matrix_mdev->matrix.apm &&
> + mdev_aqm == matrix_mdev->matrix.aqm)
> continue;
>
> memset(apm, 0, sizeof(apm));
> @@ -664,20 +571,32 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
> * We work on full longs, as we can only exclude the leftover
> * bits in non-inverse order. The leftover is all zeros.
> */
> - if (!bitmap_and(apm, matrix_mdev->matrix.apm,
> - lstdev->matrix.apm, AP_DEVICES))
> + if (!bitmap_and(apm, mdev_apm, matrix_mdev->matrix.apm,
> + AP_DEVICES))
> continue;
>
> - if (!bitmap_and(aqm, matrix_mdev->matrix.aqm,
> - lstdev->matrix.aqm, AP_DOMAINS))
> + if (!bitmap_and(aqm, mdev_aqm, matrix_mdev->matrix.aqm,
> + AP_DOMAINS))
> continue;
>
> + vfio_ap_mdev_log_sharing_err(matrix_mdev, apm, aqm);
> +
> return -EADDRINUSE;
> }
>
> return 0;
> }
>
> +static int vfio_ap_mdev_validate_masks(struct ap_matrix_mdev *matrix_mdev)
> +{
> + if (ap_apqn_in_matrix_owned_by_def_drv(matrix_mdev->matrix.apm,
> + matrix_mdev->matrix.aqm))
> + return -EADDRNOTAVAIL;
> +
> + return vfio_ap_mdev_verify_no_sharing(matrix_mdev->matrix.apm,
> + matrix_mdev->matrix.aqm);
> +}
> +
> static void vfio_ap_mdev_link_adapter(struct ap_matrix_mdev *matrix_mdev,
> unsigned long apid)
> {
> @@ -743,28 +662,17 @@ static ssize_t assign_adapter_store(struct device *dev,
> goto done;
> }
>
> - /*
> - * Set the bit in the AP mask (APM) corresponding to the AP adapter
> - * number (APID). The bits in the mask, from most significant to least
> - * significant bit, correspond to APIDs 0-255.
> - */
> - ret = vfio_ap_mdev_verify_queues_reserved_for_apid(matrix_mdev, apid);
> - if (ret)
> - goto done;
> -
> set_bit_inv(apid, matrix_mdev->matrix.apm);
>
> - ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev);
> - if (ret)
> - goto share_err;
> + ret = vfio_ap_mdev_validate_masks(matrix_mdev);
> + if (ret) {
> + clear_bit_inv(apid, matrix_mdev->matrix.apm);
> + goto done;
> + }
>
> vfio_ap_mdev_link_adapter(matrix_mdev, apid);
> vfio_ap_mdev_filter_matrix(matrix_mdev);
> ret = count;
> - goto done;
> -
> -share_err:
> - clear_bit_inv(apid, matrix_mdev->matrix.apm);
> done:
> mutex_unlock(&matrix_dev->lock);
>
> @@ -836,26 +744,6 @@ static ssize_t unassign_adapter_store(struct device *dev,
> }
> static DEVICE_ATTR_WO(unassign_adapter);
>
> -static int
> -vfio_ap_mdev_verify_queues_reserved_for_apqi(struct ap_matrix_mdev *matrix_mdev,
> - unsigned long apqi)
> -{
> - int ret;
> - unsigned long apid;
> - unsigned long nbits = matrix_mdev->matrix.apm_max + 1;
> -
> - if (find_first_bit_inv(matrix_mdev->matrix.apm, nbits) >= nbits)
> - return vfio_ap_verify_queue_reserved(NULL, &apqi);
> -
> - for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, nbits) {
> - ret = vfio_ap_verify_queue_reserved(&apid, &apqi);
> - if (ret)
> - return ret;
> - }
> -
> - return 0;
> -}
> -
> static void vfio_ap_mdev_link_domain(struct ap_matrix_mdev *matrix_mdev,
> unsigned long apqi)
> {
> @@ -921,23 +809,17 @@ static ssize_t assign_domain_store(struct device *dev,
> goto done;
> }
>
> - ret = vfio_ap_mdev_verify_queues_reserved_for_apqi(matrix_mdev, apqi);
> - if (ret)
> - goto done;
> -
> set_bit_inv(apqi, matrix_mdev->matrix.aqm);
>
> - ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev);
> - if (ret)
> - goto share_err;
> + ret = vfio_ap_mdev_validate_masks(matrix_mdev);
> + if (ret) {
> + clear_bit_inv(apqi, matrix_mdev->matrix.aqm);
> + goto done;
> + }
>
> vfio_ap_mdev_link_domain(matrix_mdev, apqi);
> vfio_ap_mdev_filter_matrix(matrix_mdev);
> ret = count;
> - goto done;
> -
> -share_err:
> - clear_bit_inv(apqi, matrix_mdev->matrix.aqm);
> done:
> mutex_unlock(&matrix_dev->lock);
>
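
The sharing check in the patch boils down to this: two matrices share
an APQN exactly when they have at least one APID in common and at least
one APQI in common. A minimal sketch of that test follows, assuming
64-bit masks with plain LSB-first numbering rather than the kernel's
bitmaps. struct matrix, matrices_share_apqn() and find_sharing_mdev()
are hypothetical names, and the "skip self" test uses an index instead
of the pointer comparison the patch uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct matrix {
    uint64_t apm; /* bit n set: adapter (APID) n assigned */
    uint64_t aqm; /* bit n set: domain (APQI) n assigned */
};

/*
 * The APQNs of a matrix are the Cartesian product of its APIDs and
 * APQIs, so two matrices overlap iff both mask intersections are
 * non-empty.
 */
bool matrices_share_apqn(const struct matrix *a, const struct matrix *b)
{
    return (a->apm & b->apm) != 0 && (a->aqm & b->aqm) != 0;
}

/*
 * Return the index of the first other matrix that shares an APQN with
 * mdevs[self], or -1 if there is none.
 */
int find_sharing_mdev(const struct matrix *mdevs, size_t n, size_t self)
{
    size_t i;

    assert(self < n);
    for (i = 0; i < n; i++) {
        if (i == self)
            continue;
        if (matrices_share_apqn(&mdevs[self], &mdevs[i]))
            return (int)i;
    }

    return -1;
}
```

A common adapter alone, or a common domain alone, is not a conflict in
this model; only a common (adapter, domain) pair is.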


2021-12-30 02:04:31

by Halil Pasic

Subject: Re: [PATCH v17 08/15] s390/vfio-ap: keep track of active guests

On Thu, 21 Oct 2021 11:23:25 -0400
Tony Krowiak <[email protected]> wrote:

> The reason a lockdep splat can occur has to do with the fact that the
> kvm->lock has to be taken before the vcpu->lock; so, for example, when a
> secure execution guest is started, you may end up with the following
> scenario:
>
> Interception of PQAP(AQIC) instruction executed on the guest:
> ------------------------------------------------------------
> handle_pqap: matrix_dev->lock
> kvm_vcpu_ioctl: vcpu_mutex
>
> Start of secure execution guest:
> -------------------------------
> kvm_s390_cpus_to_pv: vcpu->mutex
> kvm_arch_vm_ioctl: kvm->lock
>
> Queue is unbound from vfio_ap device driver:
> -------------------------------------------
> kvm->lock
> vfio_ap_mdev_remove_queue: matrix_dev->lock

The way you describe your scenario is a little ambiguous. It
seems you chose a stack-trace-like description, in the sense that, for
example, for PQAP the vcpu->mutex is taken first and then
matrix_dev->lock, but you write the latter first and the former second.
I think it is more usual to describe such things as a sequence of
events, in the sense that if A precedes B in the text (from top to
bottom), then the execution of A precedes the execution of B in time.
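
To make the chronological reading concrete, the three paths can be
written down as "first lock, then lock" pairs and checked for a cycle.
The sketch below does that in plain C. It assumes that the
stack-trace-like reading applies to all three paths, in particular that
the unbind path takes matrix_dev->lock before kvm->lock; that is an
interpretation of the quoted description, not something it states. The
enum, struct lock_order and has_lock_cycle() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum lock_id { VCPU_MUTEX, KVM_LOCK, MATRIX_DEV_LOCK, NR_LOCKS };

/* One observed acquisition order: 'first' is held when 'then' is taken. */
struct lock_order {
    enum lock_id first;
    enum lock_id then;
};

/* Depth-bounded search: can 'to' be reached from 'from' via the edges? */
static bool reaches(const struct lock_order *edges, size_t n,
                    enum lock_id from, enum lock_id to, unsigned int depth)
{
    size_t i;

    /* A simple path never needs more than NR_LOCKS edges. */
    if (depth >= (unsigned int)NR_LOCKS)
        return false;

    for (i = 0; i < n; i++) {
        if (edges[i].first != from)
            continue;
        if (edges[i].then == to ||
            reaches(edges, n, edges[i].then, to, depth + 1))
            return true;
    }

    return false;
}

/* A lock-order cycle exists if some lock can reach itself. */
bool has_lock_cycle(const struct lock_order *edges, size_t n)
{
    int id;

    assert(edges != NULL || n == 0);
    for (id = 0; id < NR_LOCKS; id++)
        if (reaches(edges, n, (enum lock_id)id, (enum lock_id)id, 0))
            return true;

    return false;
}
```

With the edges vcpu->mutex then matrix_dev->lock, kvm->lock then
vcpu->mutex, and matrix_dev->lock then kvm->lock, the check finds a
cycle; if the last pair is reversed so that kvm->lock is taken before
matrix_dev->lock, it does not.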

Also you are inconsistent with vcpu_mutex vs vcpu->mutex.

I can't say I understand the need for this yet. I have been staring
at the end result for a while. Let me see if I can come up with an
alternate proposal for some things.

Regards,
Halil



2021-12-30 03:33:36

by Halil Pasic

Subject: Re: [PATCH v17 08/15] s390/vfio-ap: keep track of active guests

On Thu, 21 Oct 2021 11:23:25 -0400
Tony Krowiak <[email protected]> wrote:

> The vfio_ap device driver registers for notification when the pointer to
> the KVM object for a guest is set. Let's store the KVM pointer as well as
> the pointer to the mediated device when the KVM pointer is set.

[..]


> struct ap_matrix_dev {
> ...
> struct rw_semaphore guests_lock;
> struct list_head guests;
> ...
> }
>
> The 'guests_lock' field is a r/w semaphore to control access to the
> 'guests' field. The 'guests' field is a list of ap_guest
> structures containing the KVM and matrix_mdev pointers for each active
> guest. An ap_guest structure will be stored into the list whenever the
> vfio_ap device driver is notified that the KVM pointer has been set and
> removed when notified that the KVM pointer has been cleared.
>

Is this about the field or about the list including all the nodes? This
reads like guests_lock only protects the head element, which makes no
sense to me, because of how these lists work.

The narrowest scope that could make sense is all the list_head stuff
in the entire list. I.e. one would only need the lock to traverse or
manipulate the list, while the payload would still be subject to
the matrix_dev->lock mutex.

[..]

> +struct ap_guest {
> + struct kvm *kvm;
> + struct list_head node;
> +};
> +
> /**
> * struct ap_matrix_dev - Contains the data for the matrix device.
> *
> @@ -39,6 +44,9 @@
> * single ap_matrix_mdev device. It's quite coarse but we don't
> * expect much contention.
> * @vfio_ap_drv: the vfio_ap device driver
> + * @guests_lock: r/w semaphore for protecting access to @guests
> + * @guests: list of guests (struct ap_guest) using AP devices bound to the
> + * vfio_ap device driver.

Please compare the above. Also if it is only about the access to the
list, then you could drop the lock right after create, and not keep it
till the very end of vfio_ap_mdev_set_kvm(). Right?

In any case I'm skeptical about this whole struct ap_guest business. To
me, it looks like something that just makes things more obscure and
complicated without any real benefit.

Regards,
Halil

> */
> struct ap_matrix_dev {
> struct device device;
> @@ -47,6 +55,8 @@ struct ap_matrix_dev {
> struct list_head mdev_list;
> struct mutex lock;
> struct ap_driver *vfio_ap_drv;
> + struct rw_semaphore guests_lock;
> + struct list_head guests;
> };
>
> extern struct ap_matrix_dev *matrix_dev;


2022-01-04 16:22:58

by Jason J. Herne

Subject: Re: [PATCH v17 01/15] s390/vfio-ap: Set pqap hook when vfio_ap module is loaded

On 10/21/21 11:23, Tony Krowiak wrote:

> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
> index a604d51acfc8..05569d077d7f 100644
> --- a/arch/s390/include/asm/kvm_host.h
> +++ b/arch/s390/include/asm/kvm_host.h
> @@ -799,16 +799,17 @@ struct kvm_s390_cpu_model {
> unsigned short ibc;
> };
>
> -typedef int (*crypto_hook)(struct kvm_vcpu *vcpu);
> +struct kvm_s390_crypto_hook {
> + int (*fcn)(struct kvm_vcpu *vcpu);
> +};

Why are we storing a single function pointer inside a struct? Seems simpler to just use a
function pointer. What was the problem with the typedef that you are replacing?

2022-01-09 21:36:46

by Halil Pasic

Subject: Re: [PATCH v17 09/15] s390/vfio-ap: allow hot plug/unplug of AP resources using mdev device

On Thu, 21 Oct 2021 11:23:26 -0400
Tony Krowiak <[email protected]> wrote:

> Keep in mind that the kvm->lock must be taken outside of the
> matrix_mdev->lock to avoid circular lock dependencies (i.e., a lockdep
> splat). This will necessitate taking the matrix_dev->guests_lock in order
> to find the guest(s) in the matrix_dev->guests list to which the affected
> APQN(s) may be assigned. The kvm->lock can then be taken prior to the
> matrix_dev->lock and the APCB plugged into the guest without any problem.

IMHO correct and sane locking is one of the key points we have to
resolve. Frankly, I'm having trouble understanding the why behind some
of your changes, compared to v16, and I suspect that looking for a good
locking scheme might have played a role.

In the beginning, I was not very keen on taking the kvm->lock first
and then the matrix_dev->lock, but the more I think about it, the more I
become convinced that this is probably the simplest way to resolve the
problem in a satisfactory manner. I don't like the idea of
hogging the kvm->lock and potentially stalling out some core kvm code
because there is contention on matrix_dev->lock. And it is kind of up to
userspace and the guests how much pressure is put on the
matrix_dev->lock. I'm still worried about that, but when I went
through the alternatives, my mood turned from bad to worse. Because of
that, I'm fine with this solution, provided some of the KVM/s390
maintainers ack it as well. I don't feel comfortable making a call on
this alone.

That said, let me also sum up my thoughts on alternatives and
non-alternatives, hopefully for the benefit of other reviewers.

1) I deeply regret that I used to argue against handling PQAP in
userspace with an ioctl as Pierre originally proposed. I was unaware of
the kvm->lock --> vcpu->lock locking order. Back then we didn't
have that sequence, but the rule was already there. I guess we could
still go back to that scheme of handling PQAP if QEMU were to support
it, and thus break the circle, but that would result in a very ugly
dependency (we would need QEMU support for dynamic, and we would have
to handle the case of an old QEMU). Technically it is still possible, but
very ugly.
2) I've contemplated if it is possible to simulate the userspace exit
and re-entry via ioctl in KVM. But looking at the code, it does not
look like a sane option to me.
3) I also considered using a read-write lock for matrix_dev->lock. In
theory, a read-write lock that favors readers, in the sense that a steady
stream of readers can starve the writers, would work. But rwsem can't be
used in this situation because rwsem is fair, in the sense that a waiting
writer may effectively block readers that try to acquire the lock while
the lock is held as a read lock. So while rwsem in practice does allow
for more parallelism, regarding lock dependency circles it does not
provide any benefits over a mutex.
4) I considered srcu as well. But rcu is a very different beast and does
not seem to be a great fit for what we are trying to do here. We are
not fine with working with a stale copy of the matrix in most of the
situations.
5) I also contemplated, if relaxing the mutual exclusion is possible.
PQAP only needs the CRYCB matrix to check whether the queue is in the
config or not. So maybe we could get away without taking the
matrix_dev->lock and doing separate locking for the queue in question,
and instead of delaying any updates to the CRYCB while processing AQIC,
we could just work with whatever we see in the CRYCB. Since the setting
up of the interrupts is asynchronous with respect to the instruction
requesting it (PQAP/AQIC) and the CRYCB masks are relevant in the
instruction context... So I was thinking: if we were to introduce a
separate lock for the AQIC state, and find the queue without taking
the matrix_dev->lock, we could actually process the PQAP/AQIC without
the matrix_dev->lock. But then because we would have vcpu->lock -->
vfio_ap_queue->lock, we would have to avoid ending up with a circle
on the cleanup path, and also avoid races on the cleanup path. I'm not
sure how tricky that would end up being, if at all possible.
6) We could practically implement that unfair read-write lock with
a mutex and condition variables (and a waitqueue), but that wouldn't
simplify things either. Still, if we want to avoid taking kvm->lock
before taking the vfio_ap lock, it may be the most straightforward
alternative.
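
For illustration only, the unfair read-write lock from item 6 could be
sketched in userspace C roughly as follows. This is not kernel code and not
part of this series: pthread primitives stand in for the kernel mutex and
waitqueue, and every name here (unfair_rwlock, unfair_read_lock, and so on)
is hypothetical. The point it shows is that readers wait only for an active
writer, never for a waiting one, so a steady stream of readers can starve
writers.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical reader-preferring ("unfair") read-write lock. */
struct unfair_rwlock {
	pthread_mutex_t mtx;     /* protects the two counters below */
	pthread_cond_t  changed; /* broadcast whenever the lock is released */
	unsigned int    readers; /* number of active readers */
	int             writer;  /* 1 while a writer holds the lock */
};

#define UNFAIR_RWLOCK_INIT \
	{ PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, 0 }

static void unfair_read_lock(struct unfair_rwlock *l)
{
	pthread_mutex_lock(&l->mtx);
	/* Wait only for an ACTIVE writer; waiting writers are ignored. */
	while (l->writer)
		pthread_cond_wait(&l->changed, &l->mtx);
	l->readers++;
	pthread_mutex_unlock(&l->mtx);
}

static void unfair_read_unlock(struct unfair_rwlock *l)
{
	pthread_mutex_lock(&l->mtx);
	assert(l->readers > 0);
	if (--l->readers == 0)
		pthread_cond_broadcast(&l->changed);
	pthread_mutex_unlock(&l->mtx);
}

static void unfair_write_lock(struct unfair_rwlock *l)
{
	pthread_mutex_lock(&l->mtx);
	/* A writer needs exclusive access, so readers can starve it. */
	while (l->writer || l->readers)
		pthread_cond_wait(&l->changed, &l->mtx);
	l->writer = 1;
	pthread_mutex_unlock(&l->mtx);
}

/* Non-blocking variant: returns 0 on success, -1 if the lock is busy. */
static int unfair_write_trylock(struct unfair_rwlock *l)
{
	int ret = -1;

	pthread_mutex_lock(&l->mtx);
	if (!l->writer && !l->readers) {
		l->writer = 1;
		ret = 0;
	}
	pthread_mutex_unlock(&l->mtx);
	return ret;
}

static void unfair_write_unlock(struct unfair_rwlock *l)
{
	pthread_mutex_lock(&l->mtx);
	assert(l->writer);
	l->writer = 0;
	pthread_cond_broadcast(&l->changed);
	pthread_mutex_unlock(&l->mtx);
}
```

A second reader is admitted while the first still holds the lock, and a
writer gets in only once the reader count drops to zero. Whether lockdep can
be taught about such a construct is a separate question this sketch does not
address.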

Finally, let me also state that my understanding of some of the
details is still incomplete.

Regards,
Halil




2022-01-11 17:30:06

by Anthony Krowiak

Subject: Re: [PATCH v17 01/15] s390/vfio-ap: Set pqap hook when vfio_ap module is loaded



On 1/4/22 11:22, Jason J. Herne wrote:
> On 10/21/21 11:23, Tony Krowiak wrote:
>
>> diff --git a/arch/s390/include/asm/kvm_host.h
>> b/arch/s390/include/asm/kvm_host.h
>> index a604d51acfc8..05569d077d7f 100644
>> --- a/arch/s390/include/asm/kvm_host.h
>> +++ b/arch/s390/include/asm/kvm_host.h
>> @@ -799,16 +799,17 @@ struct kvm_s390_cpu_model {
>>       unsigned short ibc;
>>   };
>>   -typedef int (*crypto_hook)(struct kvm_vcpu *vcpu);
>> +struct kvm_s390_crypto_hook {
>> +    int (*fcn)(struct kvm_vcpu *vcpu);
>> +};
>
> Why are we storing a single function pointer inside a struct? Seems
> simpler to just use a function pointer. What was the problem with the
> typedef that you are replacing?

In case you didn't see my response to Halil, the point is now moot
because I'm eliminating this patch.



2022-01-11 17:34:11

by Anthony Krowiak

Subject: Re: [PATCH v17 02/15] s390/vfio-ap: use new AP bus interface to search for queue devices



On 12/27/21 03:25, Halil Pasic wrote:
> On Thu, 21 Oct 2021 11:23:19 -0400
> Tony Krowiak <[email protected]> wrote:
>
>> This patch refactors the vfio_ap device driver to use the AP bus's
>> ap_get_qdev() function to retrieve the vfio_ap_queue struct containing
>> information about a queue that is bound to the vfio_ap device driver.
>> The bus's ap_get_qdev() function retrieves the queue device from a
>> hashtable keyed by APQN. This is much more efficient than looping over
>> the list of devices attached to the AP bus by several orders of
>> magnitude.
>>
>> Signed-off-by: Tony Krowiak <[email protected]>
>> Reviewed-by: Halil Pasic <[email protected]>
> I assume nothing changed here, and nothing significant changed around
> this patch (context). If I'm wrong, please tell me and I will
> re-evaluate.

rebase to this patch -> delete the patch -> fix resultant conflicts

>
> Regards,
> Halil


2022-01-11 21:13:34

by Anthony Krowiak

Subject: Re: [PATCH v17 01/15] s390/vfio-ap: Set pqap hook when vfio_ap module is loaded



On 12/27/21 03:21, Halil Pasic wrote:
> On Thu, 21 Oct 2021 11:23:18 -0400
> Tony Krowiak <[email protected]> wrote:
>
>> Rather than storing the function pointer to the PQAP(AQIC) instruction
>> interception handler with the mediated device (struct ap_matrix_mdev) when
>> the vfio_ap device driver is notified that the KVM pointer is being set,
>> let's store it once in a global variable when the vfio_ap module is
>> loaded.
> This is a global variable of the kvm module, right? I'm not sure this
> makes sense from a 'bigger picture' perspective. Imagine a situation
> where we have two drivers doing some sort of a vfio-ap like thing. Only
> one of the drivers would be able to register the callback. I do
> understand that we are unlikely to see another driver that needs to
> register a pqap callback in the foreseeable future, but this still looks
> like a very questionable design. The old design where the AP
> pass-through and/or virtualization is done by exactly one driver for one
> guest, but different guests may use different drivers looks preferable
> to me.

This patch was developed due to lockdep errors I could not get rid of
using the previous model. I reverted back to that model and tested
with an SE guest again and verified that there were no lockdep
splats. Maybe it was inadvertently fixed with the subsequent patches,
so I'm going to run with the previous design.

>
>
>
>
>> There are three reasons for doing this:
>>
>> 1. The lifetime of the interception handler function coincides with the
>> lifetime of the vfio_ap module, so it makes little sense to tie it to
>> the mediated device and complete sense to tie it to the module in which
>> the function resides.
> Well, the handler lives in the vfio_ap module so this is a given anyway.
> I guess we are talking here about the callback which is registered by the
> vfio_ap module, in the old design with a specific kvm instance (guest).
>
>> 2. The setting/clearing of the function pointer is protected by a mutex
>> lock. This increases the number of locks taken during
>> VFIO_GROUP_NOTIFY_SET_KVM notification and increases the complexity of
>> ensuring locking integrity and avoiding circular lock dependencies.
>>
>> 3. The lock will only be taken for writing twice: When the vfio_ap module
>> is loaded; and, when the vfio_ap module is removed. As it stands now,
>> the lock is taken for writing whenever a guest is started or terminated.
> What you do is basically get rid of the
>
> crypto_hook *pqap_hook;
> pointer and add the
> void *data;
> pointer instead.
>
> It is obvious why you need to add that pointer: we need a pointer to the
> matrix_mdev, and we used to obtain it via the pqap_hook pointer.
>
> But isn't the access to *data racy now? I mean you still have concurrent
> access to a pointer that is just called differently. Yes, instead of
> accessing vcpu->kvm->arch.crypto.pqap_hook twice, we access the global
> pqap_hook in the kvm module once, and then data once, but the access to
> data does not seem to be synchronized at all. And you do not care to
> explain why that is supposed to be OK.
>
>
>
>> Signed-off-by: Tony Krowiak <[email protected]>
>> ---
>> arch/s390/include/asm/kvm_host.h | 10 ++++--
>> arch/s390/kvm/kvm-s390.c | 1 -
>> arch/s390/kvm/priv.c | 45 ++++++++++++++++++++++-----
>> drivers/s390/crypto/vfio_ap_ops.c | 27 ++++++++--------
>> drivers/s390/crypto/vfio_ap_private.h | 1 -
>> 5 files changed, 58 insertions(+), 26 deletions(-)
>>
>> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
>> index a604d51acfc8..05569d077d7f 100644
>> --- a/arch/s390/include/asm/kvm_host.h
>> +++ b/arch/s390/include/asm/kvm_host.h
>> @@ -799,16 +799,17 @@ struct kvm_s390_cpu_model {
>> unsigned short ibc;
>> };
>>
>> -typedef int (*crypto_hook)(struct kvm_vcpu *vcpu);
>> +struct kvm_s390_crypto_hook {
>> + int (*fcn)(struct kvm_vcpu *vcpu);
>> +};
> I guess you do this for type safety, or?
>
>>
>> struct kvm_s390_crypto {
>> struct kvm_s390_crypto_cb *crycb;
>> - struct rw_semaphore pqap_hook_rwsem;
>> - crypto_hook *pqap_hook;
> One pointer set by the driver gone...
>
>> __u32 crycbd;
>> __u8 aes_kw;
>> __u8 dea_kw;
>> __u8 apie;
>> + void *data;
> ... but another one added!
>
>> };
>>
>> #define APCB0_MASK_SIZE 1
>> @@ -998,6 +999,9 @@ extern char sie_exit;
>> extern int kvm_s390_gisc_register(struct kvm *kvm, u32 gisc);
>> extern int kvm_s390_gisc_unregister(struct kvm *kvm, u32 gisc);
>>
>> +extern int kvm_s390_pqap_hook_register(struct kvm_s390_crypto_hook *hook);
>> +extern int kvm_s390_pqap_hook_unregister(struct kvm_s390_crypto_hook *hook);
>> +
>> static inline void kvm_arch_hardware_disable(void) {}
>> static inline void kvm_arch_sync_events(struct kvm *kvm) {}
>> static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
>> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
>> index 6a6dd5e1daf6..c91981599328 100644
>> --- a/arch/s390/kvm/kvm-s390.c
>> +++ b/arch/s390/kvm/kvm-s390.c
>> @@ -2649,7 +2649,6 @@ static void kvm_s390_crypto_init(struct kvm *kvm)
>> {
>> kvm->arch.crypto.crycb = &kvm->arch.sie_page2->crycb;
>> kvm_s390_set_crycb_format(kvm);
>> - init_rwsem(&kvm->arch.crypto.pqap_hook_rwsem);
>>
>> if (!test_kvm_facility(kvm, 76))
>> return;
>> diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
>> index 53da4ceb16a3..3d91ff934c0c 100644
>> --- a/arch/s390/kvm/priv.c
>> +++ b/arch/s390/kvm/priv.c
>> @@ -31,6 +31,39 @@
>> #include "kvm-s390.h"
>> #include "trace.h"
>>
>> +DEFINE_MUTEX(pqap_hook_lock);
>> +static struct kvm_s390_crypto_hook *pqap_hook;
> This is the kvm global variable for the hook.
>> +
>> +int kvm_s390_pqap_hook_register(struct kvm_s390_crypto_hook *hook)
>> +{
>> + int ret = 0;
>> +
>> + mutex_lock(&pqap_hook_lock);
>> + if (pqap_hook)
>> + ret = -EACCES;
>> + else
>> + pqap_hook = hook;
>> + mutex_unlock(&pqap_hook_lock);
>> +
>> + return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(kvm_s390_pqap_hook_register);
>> +
>> +int kvm_s390_pqap_hook_unregister(struct kvm_s390_crypto_hook *hook)
>> +{
>> + int ret = 0;
>> +
>> + mutex_lock(&pqap_hook_lock);
>> + if (hook != pqap_hook)
>> + ret = -EACCES;
>> + else
>> + pqap_hook = NULL;
>> + mutex_unlock(&pqap_hook_lock);
>> +
>> + return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(kvm_s390_pqap_hook_unregister);
>> +
>> static int handle_ri(struct kvm_vcpu *vcpu)
>> {
>> vcpu->stat.instruction_ri++;
>> @@ -610,7 +643,6 @@ static int handle_io_inst(struct kvm_vcpu *vcpu)
>> static int handle_pqap(struct kvm_vcpu *vcpu)
>> {
>> struct ap_queue_status status = {};
>> - crypto_hook pqap_hook;
>> unsigned long reg0;
>> int ret;
>> uint8_t fc;
>> @@ -659,16 +691,15 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
>> * hook function pointer in the kvm_s390_crypto structure. Lock the
>> * owner, retrieve the hook function pointer and call the hook.
>> */
>> - down_read(&vcpu->kvm->arch.crypto.pqap_hook_rwsem);
> This used to protect any reads of vcpu->kvm->arch.crypto.pqap_hook ...
>
>> - if (vcpu->kvm->arch.crypto.pqap_hook) {
>> - pqap_hook = *vcpu->kvm->arch.crypto.pqap_hook;
>> - ret = pqap_hook(vcpu);
> ... while we are executing the hook. Instead in the hook we are supposed
> to read *vcpu->kvm->arch.crypto.data but we don't care to synchronize
> that access because that one ain't protected by the new pqap_hook_lock
> mutex.
>
>> + mutex_lock(&pqap_hook_lock);
>> + if (pqap_hook) {
>> + ret = pqap_hook->fcn(vcpu);
>> if (!ret && vcpu->run->s.regs.gprs[1] & 0x00ff0000)
>> kvm_s390_set_psw_cc(vcpu, 3);
>> - up_read(&vcpu->kvm->arch.crypto.pqap_hook_rwsem);
>> + mutex_unlock(&pqap_hook_lock);
>> return ret;
>> }
>> - up_read(&vcpu->kvm->arch.crypto.pqap_hook_rwsem);
>> + mutex_unlock(&pqap_hook_lock);
>> /*
>> * A vfio_driver must register a hook.
>> * No hook means no driver to enable the SIE CRYCB and no queues.
>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>> index 94c1c9bd58ad..02275d246b39 100644
>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -293,13 +293,10 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
>> apqn = vcpu->run->s.regs.gprs[0] & 0xffff;
>> mutex_lock(&matrix_dev->lock);
>>
>> - if (!vcpu->kvm->arch.crypto.pqap_hook)
>> - goto out_unlock;
>> - matrix_mdev = container_of(vcpu->kvm->arch.crypto.pqap_hook,
>> - struct ap_matrix_mdev, pqap_hook);
>> + matrix_mdev = vcpu->kvm->arch.crypto.data;
> Here is the access I'm talking about above. [1]
>
>>
>> /* If the there is no guest using the mdev, there is nothing to do */
>> - if (!matrix_mdev->kvm)
>> + if (!matrix_mdev || !matrix_mdev->kvm)
>> goto out_unlock;
>>
>> q = vfio_ap_get_queue(matrix_mdev, apqn);
>> @@ -348,7 +345,6 @@ static int vfio_ap_mdev_probe(struct mdev_device *mdev)
>>
>> matrix_mdev->mdev = mdev;
>> vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
>> - matrix_mdev->pqap_hook = handle_pqap;
>> mutex_lock(&matrix_dev->lock);
>> list_add(&matrix_mdev->node, &matrix_dev->mdev_list);
>> mutex_unlock(&matrix_dev->lock);
>> @@ -1078,10 +1074,6 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
>> struct ap_matrix_mdev *m;
>>
>> if (kvm->arch.crypto.crycbd) {
>> - down_write(&kvm->arch.crypto.pqap_hook_rwsem);
>> - kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
> We used to set our old pointer in a synchronized manner...
>
>> - up_write(&kvm->arch.crypto.pqap_hook_rwsem);
>> -
>> mutex_lock(&kvm->lock);
>> mutex_lock(&matrix_dev->lock);
>>
>> @@ -1095,6 +1087,7 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
>>
>> kvm_get_kvm(kvm);
>> matrix_mdev->kvm = kvm;
>> + kvm->arch.crypto.data = matrix_mdev;
> ... but not any more!
>
>> kvm_arch_crypto_set_masks(kvm,
>> matrix_mdev->matrix.apm,
>> matrix_mdev->matrix.aqm,
>> @@ -1155,16 +1148,13 @@ static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev,
>> struct kvm *kvm)
>> {
>> if (kvm && kvm->arch.crypto.crycbd) {
>> - down_write(&kvm->arch.crypto.pqap_hook_rwsem);
>> - kvm->arch.crypto.pqap_hook = NULL;
>> - up_write(&kvm->arch.crypto.pqap_hook_rwsem);
>> -
> Same here!
>
>> mutex_lock(&kvm->lock);
>> mutex_lock(&matrix_dev->lock);
>>
>> kvm_arch_crypto_clear_masks(kvm);
>> vfio_ap_mdev_reset_queues(matrix_mdev);
>> kvm_put_kvm(kvm);
>> + kvm->arch.crypto.data = NULL;
> And same here. So at [1] we aren't guaranteed to see this write, right?
> That does not look right to me.
>
>> matrix_mdev->kvm = NULL;
>>
>> mutex_unlock(&kvm->lock);
>> @@ -1391,12 +1381,20 @@ static const struct mdev_parent_ops vfio_ap_matrix_ops = {
>> .supported_type_groups = vfio_ap_mdev_type_groups,
>> };
>>
>> +static struct kvm_s390_crypto_hook pqap_hook = {
>> + .fcn = handle_pqap,
>> +};
>> +
>> int vfio_ap_mdev_register(void)
>> {
>> int ret;
>>
>> atomic_set(&matrix_dev->available_instances, MAX_ZDEV_ENTRIES_EXT);
>>
>> + ret = kvm_s390_pqap_hook_register(&pqap_hook);
>> + if (ret)
>> + return ret;
>> +
>> ret = mdev_register_driver(&vfio_ap_matrix_driver);
>> if (ret)
>> return ret;
>> @@ -1413,6 +1411,7 @@ int vfio_ap_mdev_register(void)
>>
>> void vfio_ap_mdev_unregister(void)
>> {
>> + WARN_ON(kvm_s390_pqap_hook_unregister(&pqap_hook));
>> mdev_unregister_device(&matrix_dev->device);
>> mdev_unregister_driver(&vfio_ap_matrix_driver);
>> }
>> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
>> index 648fcaf8104a..907f41160de7 100644
>> --- a/drivers/s390/crypto/vfio_ap_private.h
>> +++ b/drivers/s390/crypto/vfio_ap_private.h
>> @@ -97,7 +97,6 @@ struct ap_matrix_mdev {
>> struct notifier_block group_notifier;
>> struct notifier_block iommu_notifier;
>> struct kvm *kvm;
>> - crypto_hook pqap_hook;
>> struct mdev_device *mdev;
>> };
>>


2022-01-11 21:19:17

by Anthony Krowiak

Subject: Re: [PATCH v17 06/15] s390/vfio-ap: refresh guest's APCB by filtering APQNs assigned to mdev



On 12/27/21 03:53, Halil Pasic wrote:
> On Thu, 21 Oct 2021 11:23:23 -0400
> Tony Krowiak <[email protected]> wrote:
>
>> Refresh the guest's APCB by filtering the APQNs assigned to the matrix mdev
>> that do not reference an AP queue device bound to the vfio_ap device
>> driver. The mdev's APQNs will be filtered according to the following rules:
>>
>> * The APID of each adapter and the APQI of each domain that is not in the
>> host's AP configuration is filtered out.
>>
>> * The APID of each adapter comprising an APQN that does not reference a
>> queue device bound to the vfio_ap device driver is filtered. The APQNs
>> are derived from the Cartesian product of the APID of each adapter and
>> APQI of each domain assigned to the mdev.
>>
>> The control domains that are not assigned to the host's AP configuration
>> will also be filtered before assigning them to the guest's APCB.
> The v16 version used to filter on queue removal from vfio_ap, which makes
> a ton of sense.
>
> This version will "filter back" the queues once these become bound, but
> if a queue is removed from vfio_ap, we don't seem to care to filter. Is
> this intentional?

See the changes to the vfio_ap_mdev_remove_queue function in patch 09/15,
s390/vfio-ap: allow hot plug/unplug of AP resources using mdev device.

Now that I look at that patch, I should probably rearrange this patch to
also do filtering on queue removal, but only do the hotplug stuff in
patch 09.

>
> Also we could probably do the filtering incrementally. In a sense that
> at a time only so much changes, and we know that the invariant was
> preserved without that change. But that would probably end up trading
> complexity for cycles. I will trust your judgment and your tests on this
> matter.

I am not entirely clear on what you are suggesting. I think you are
suggesting that there may not be a need to look at every APQN
assigned to the mdev when an adapter or domain is assigned or
unassigned or a queue is probed or removed. Maybe you can clarify
what you are suggesting here.
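
To make the "look at every APQN" cost concrete, the filtering rules from the
commit message can be sketched in plain userspace C as below. This is a
simplified illustration, not the driver code: the masks are 8 bits wide
instead of 256, bits are numbered from the least significant end rather
than with the kernel's inverted bit helpers, and the names ap_masks,
queue_bound and filter_matrix are hypothetical. The bound[][] table stands
in for the lookup of queue devices bound to the vfio_ap driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_ADAPTERS 8 /* toy sizes; the real masks are much wider */
#define NR_DOMAINS  8

struct ap_masks {
	uint8_t apm; /* bit n set: adapter (APID) n assigned */
	uint8_t aqm; /* bit n set: domain (APQI) n assigned */
};

/* Stand-in for the "is this APQN bound to vfio_ap?" lookup. */
static bool queue_bound(bool bound[NR_ADAPTERS][NR_DOMAINS],
			unsigned int apid, unsigned int apqi)
{
	assert(apid < NR_ADAPTERS && apqi < NR_DOMAINS);
	return bound[apid][apqi];
}

static struct ap_masks filter_matrix(struct ap_masks mdev, struct ap_masks host,
				     bool bound[NR_ADAPTERS][NR_DOMAINS])
{
	struct ap_masks shadow;
	unsigned int apid, apqi;

	/* Rule 1: drop anything that is not in the host's configuration. */
	shadow.apm = mdev.apm & host.apm;
	shadow.aqm = mdev.aqm & host.aqm;

	/* Rule 2: drop an adapter if any of its APQNs is not bound. */
	for (apid = 0; apid < NR_ADAPTERS; apid++) {
		if (!(shadow.apm & (1u << apid)))
			continue;
		for (apqi = 0; apqi < NR_DOMAINS; apqi++) {
			if (!(shadow.aqm & (1u << apqi)))
				continue;
			if (!queue_bound(bound, apid, apqi)) {
				shadow.apm &= (uint8_t)~(1u << apid);
				break;
			}
		}
	}
	return shadow;
}
```

Every call walks the full Cartesian product of the remaining adapters and
domains. An incremental variant would only re-examine the row or column
touched by the change, at the price of having to argue that the rest of the
shadow masks are still valid.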

>
>> Signed-off-by: Tony Krowiak <[email protected]>
>> ---
>> drivers/s390/crypto/vfio_ap_ops.c | 66 ++++++++++++++++++++++++++++++-
>> 1 file changed, 64 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>> index 4305177029bf..46c179363aca 100644
>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -314,6 +314,62 @@ static void vfio_ap_matrix_init(struct ap_config_info *info,
>> matrix->adm_max = info->apxa ? info->Nd : 15;
>> }
>>
>> +static void vfio_ap_mdev_filter_cdoms(struct ap_matrix_mdev *matrix_mdev)
>> +{
>> + bitmap_and(matrix_mdev->shadow_apcb.adm, matrix_mdev->matrix.adm,
>> + (unsigned long *)matrix_dev->info.adm, AP_DOMAINS);
>> +}
>> +
>> +/*
>> + * vfio_ap_mdev_filter_matrix - copy the mdev's AP configuration to the KVM
>> + * guest's APCB then filter the APIDs that do not
>> + * comprise at least one APQN that references a
>> + * queue device bound to the vfio_ap device driver.
>> + *
>> + * @matrix_mdev: the mdev whose AP configuration is to be filtered.
>> + */
>> +static void vfio_ap_mdev_filter_matrix(struct ap_matrix_mdev *matrix_mdev)
>> +{
>> + int ret;
>> + unsigned long apid, apqi, apqn;
>> +
>> + ret = ap_qci(&matrix_dev->info);
>> + if (ret)
>> + return;
>> +
>> + vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->shadow_apcb);
>> +
>> + /*
>> + * Copy the adapters, domains and control domains to the shadow_apcb
>> + * from the matrix mdev, but only those that are assigned to the host's
>> + * AP configuration.
>> + */
>> + bitmap_and(matrix_mdev->shadow_apcb.apm, matrix_mdev->matrix.apm,
>> + (unsigned long *)matrix_dev->info.apm, AP_DEVICES);
>> + bitmap_and(matrix_mdev->shadow_apcb.aqm, matrix_mdev->matrix.aqm,
>> + (unsigned long *)matrix_dev->info.aqm, AP_DOMAINS);
>> +
>> + for_each_set_bit_inv(apid, matrix_mdev->shadow_apcb.apm, AP_DEVICES) {
>> + for_each_set_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm,
>> + AP_DOMAINS) {
>> + /*
>> + * If the APQN is not bound to the vfio_ap device
>> + * driver, then we can't assign it to the guest's
>> + * AP configuration. The AP architecture won't
>> + * allow filtering of a single APQN, so if we're
>> + * filtering APIDs, then filter the APID; otherwise,
>> + * filter the APQI.
>> + */
>> + apqn = AP_MKQID(apid, apqi);
>> + if (!vfio_ap_mdev_get_queue(matrix_mdev, apqn)) {
>> + clear_bit_inv(apid,
>> + matrix_mdev->shadow_apcb.apm);
>> + break;
>> + }
>> + }
>> + }
>> +}
>> +
>> static int vfio_ap_mdev_probe(struct mdev_device *mdev)
>> {
>> struct ap_matrix_mdev *matrix_mdev;
>> @@ -703,6 +759,7 @@ static ssize_t assign_adapter_store(struct device *dev,
>> goto share_err;
>>
>> vfio_ap_mdev_link_adapter(matrix_mdev, apid);
>> + vfio_ap_mdev_filter_matrix(matrix_mdev);
>> ret = count;
>> goto done;
>>
>> @@ -771,6 +828,7 @@ static ssize_t unassign_adapter_store(struct device *dev,
>>
>> clear_bit_inv((unsigned long)apid, matrix_mdev->matrix.apm);
>> vfio_ap_mdev_unlink_adapter(matrix_mdev, apid);
>> + vfio_ap_mdev_filter_matrix(matrix_mdev);
>> ret = count;
>> done:
>> mutex_unlock(&matrix_dev->lock);
>> @@ -874,6 +932,7 @@ static ssize_t assign_domain_store(struct device *dev,
>> goto share_err;
>>
>> vfio_ap_mdev_link_domain(matrix_mdev, apqi);
>> + vfio_ap_mdev_filter_matrix(matrix_mdev);
>> ret = count;
>> goto done;
>>
>> @@ -942,6 +1001,7 @@ static ssize_t unassign_domain_store(struct device *dev,
>>
>> clear_bit_inv((unsigned long)apqi, matrix_mdev->matrix.aqm);
>> vfio_ap_mdev_unlink_domain(matrix_mdev, apqi);
>> + vfio_ap_mdev_filter_matrix(matrix_mdev);
>> ret = count;
>>
>> done:
>> @@ -995,6 +1055,7 @@ static ssize_t assign_control_domain_store(struct device *dev,
>> * number of control domains that can be assigned.
>> */
>> set_bit_inv(id, matrix_mdev->matrix.adm);
>> + vfio_ap_mdev_filter_cdoms(matrix_mdev);
>> ret = count;
>> done:
>> mutex_unlock(&matrix_dev->lock);
>> @@ -1042,6 +1103,7 @@ static ssize_t unassign_control_domain_store(struct device *dev,
>> }
>>
>> clear_bit_inv(domid, matrix_mdev->matrix.adm);
>> + clear_bit_inv(domid, matrix_mdev->shadow_apcb.adm);
>> ret = count;
>> done:
>> mutex_unlock(&matrix_dev->lock);
>> @@ -1179,8 +1241,6 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
>> kvm_get_kvm(kvm);
>> matrix_mdev->kvm = kvm;
>> kvm->arch.crypto.data = matrix_mdev;
>> - memcpy(&matrix_mdev->shadow_apcb, &matrix_mdev->matrix,
>> - sizeof(struct ap_matrix));
>> kvm_arch_crypto_set_masks(kvm, matrix_mdev->shadow_apcb.apm,
>> matrix_mdev->shadow_apcb.aqm,
>> matrix_mdev->shadow_apcb.adm);
>> @@ -1536,6 +1596,8 @@ int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
>> q->apqn = to_ap_queue(&apdev->device)->qid;
>> q->saved_isc = VFIO_AP_ISC_INVALID;
>> vfio_ap_queue_link_mdev(q);
>> + if (q->matrix_mdev)
>> + vfio_ap_mdev_filter_matrix(q->matrix_mdev);
>> dev_set_drvdata(&apdev->device, q);
>> mutex_unlock(&matrix_dev->lock);
>>


2022-01-11 21:27:57

by Anthony Krowiak

Subject: Re: [PATCH v17 08/15] s390/vfio-ap: keep track of active guests



On 12/29/21 21:04, Halil Pasic wrote:
> On Thu, 21 Oct 2021 11:23:25 -0400
> Tony Krowiak <[email protected]> wrote:
>
>> The reason a lockdep splat can occur has to do with the fact that the
>> kvm->lock has to be taken before the vcpu->lock; so, for example, when a
>> secure execution guest is started, you may end up with the following
>> scenario:
>>
>> Interception of PQAP(AQIC) instruction executed on the guest:
>> ------------------------------------------------------------
>> handle_pqap: matrix_dev->lock
>> kvm_vcpu_ioctl: vcpu_mutex
>>
>> Start of secure execution guest:
>> -------------------------------
>> kvm_s390_cpus_to_pv: vcpu->mutex
>> kvm_arch_vm_ioctl: kvm->lock
>>
>> Queue is unbound from vfio_ap device driver:
>> -------------------------------------------
>> kvm->lock
>> vfio_ap_mdev_remove_queue: matrix_dev->lock
> The way you describe your scenario is a little ambiguous. It
> seems you choose a stack-trace like description, in a sense that for
> example for PQAP: first vcpu->mutex is taken and then matrix_dev->lock
> but you write the latter first and the former second. I think it is more
> usual to describe such stuff as a sequence of events, in the sense that
> if A precedes B in the text (from the top towards the bottom), then
> execution of A precedes the execution of B in time.

I wrote it the way it is displayed in the lockdep splat trace.
I'd be happy to re-arrange it if you'd prefer.

>
> Also you are inconsistent with vcpu_mutex vs vcpu->mutex.
>
> I can't say I understand the need for this yet. I have been staring
> at the end result for a while. Let me see if I can come up with an
> alternate proposal for some things.

Go for it, and may the force be with you.

>
> Regards,
> Halil
>
>


2022-01-11 21:58:22

by Anthony Krowiak

Subject: Re: [PATCH v17 08/15] s390/vfio-ap: keep track of active guests



On 12/29/21 22:33, Halil Pasic wrote:
> On Thu, 21 Oct 2021 11:23:25 -0400
> Tony Krowiak <[email protected]> wrote:
>
>> The vfio_ap device driver registers for notification when the pointer to
>> the KVM object for a guest is set. Let's store the KVM pointer as well as
>> the pointer to the mediated device when the KVM pointer is set.
> [..]
>
>
>> struct ap_matrix_dev {
>> ...
>> struct rw_semaphore guests_lock;
>> struct list_head guests;
>> ...
>> }
>>
>> The 'guests_lock' field is a r/w semaphore to control access to the
>> 'guests' field. The 'guests' field is a list of ap_guest
>> structures containing the KVM and matrix_mdev pointers for each active
>> guest. An ap_guest structure will be stored into the list whenever the
>> vfio_ap device driver is notified that the KVM pointer has been set and
>> removed when notified that the KVM pointer has been cleared.
>>
> Is this about the field or about the list including all the nodes? This
> reads like guests_lock only protects the head element, which makes no
> sense to me. Because of how these lists work.

It locks the list, I can rewrite the description.

>
> The narrowest scope that could make sense is all the list_head stuff
> in the entire list. I.e. one would only need the lock to traverse or
> manipulate the list, while the payload would still be subject to
> the matrix_dev->lock mutex.

The matrix_dev->guests_lock is needed whenever the kvm->lock
is needed because the struct ap_guest object is created and the
struct kvm assigned to it when the kvm pointer is set
(vfio_ap_mdev_set_kvm function). So, in order to access the
ap_guest object and retrieve the kvm pointer, we have to ensure
the ap_guest object is still available. The fact we can get the
kvm pointer from the ap_matrix_mdev object just makes things
more efficient - i.e., we won't have to traverse the list.

Whenever the kvm->lock and matrix_dev->lock mutexes must
be held, the order is:

    matrix_dev->guests_lock
    matrix_dev->guests->kvm->lock
    matrix_dev->lock

There are times when not all three locks are required; for example,
the handle_pqap and vfio_ap_mdev_probe/remove functions only
require the matrix_dev->lock because they do not need to lock kvm.

>
> [..]
>
>> +struct ap_guest {
>> + struct kvm *kvm;
>> + struct list_head node;
>> +};
>> +
>> /**
>> * struct ap_matrix_dev - Contains the data for the matrix device.
>> *
>> @@ -39,6 +44,9 @@
>> * single ap_matrix_mdev device. It's quite coarse but we don't
>> * expect much contention.
>> * @vfio_ap_drv: the vfio_ap device driver
>> + * @guests_lock: r/w semaphore for protecting access to @guests
>> + * @guests: list of guests (struct ap_guest) using AP devices bound to the
>> + * vfio_ap device driver.
> Please compare the above. Also if it is only about the access to the
> list, then you could drop the lock right after create, and not keep it
> till the very end of vfio_ap_mdev_set_kvm(). Right?

That would be true if it only controlled access to the list, but as I
explained above, that is not its sole purpose.

>
> In any case I'm skeptical about this whole struct ap_guest business. To
> me, it looks like something that just makes things more obscure and
> complicated without any real benefit.

I'm open to other ideas, but you'll have to come up with a way
to take the kvm->lock before the matrix_dev->lock in the
vfio_ap_mdev_probe_queue and vfio_ap_mdev_remove_queue
functions where we don't have access to the ap_matrix_mdev
object to which the APQN is assigned and has the pointer to the
kvm object.

In order to retrieve the matrix_mdev, we need the matrix_dev->lock.
In order to hot plug/unplug the queue, we need the kvm->lock.
There's your catch-22 that needs to be solved. This design is my
attempt to solve that.

>
> Regards,
> Halil
>
>> */
>> struct ap_matrix_dev {
>> struct device device;
>> @@ -47,6 +55,8 @@ struct ap_matrix_dev {
>> struct list_head mdev_list;
>> struct mutex lock;
>> struct ap_driver *vfio_ap_drv;
>> + struct rw_semaphore guests_lock;
>> + struct list_head guests;
>> };
>>
>> extern struct ap_matrix_dev *matrix_dev;


2022-01-11 22:19:45

by Anthony Krowiak

Subject: Re: [PATCH v17 08/15] s390/vfio-ap: keep track of active guests



On 1/11/22 16:58, Tony Krowiak wrote:
>
>
> On 12/29/21 22:33, Halil Pasic wrote:
>> On Thu, 21 Oct 2021 11:23:25 -0400
>> Tony Krowiak <[email protected]> wrote:
>>
>>> The vfio_ap device driver registers for notification when the pointer to
>>> the KVM object for a guest is set. Let's store the KVM pointer as well as
>>> the pointer to the mediated device when the KVM pointer is set.
>> [..]
>>
>>
>>> struct ap_matrix_dev {
>>>          ...
>>>          struct rw_semaphore guests_lock;
>>>          struct list_head guests;
>>>         ...
>>> }
>>>
>>> The 'guests_lock' field is a r/w semaphore to control access to the
>>> 'guests' field. The 'guests' field is a list of ap_guest
>>> structures containing the KVM and matrix_mdev pointers for each active
>>> guest. An ap_guest structure will be stored into the list whenever the
>>> vfio_ap device driver is notified that the KVM pointer has been set and
>>> removed when notified that the KVM pointer has been cleared.
>>>
>> Is this about the field or about the list including all the nodes? This
>> reads like guests_lock only protects the head element, which makes no
>> sense to me. Because of how these lists work.
>
> It locks the list, I can rewrite the description.

Ignore this response and read the answers to your comments below.

>
>
>>
>> The narrowest scope that could make sense is all the list_head stuff
>> in the entire list. I.e. one would only need the lock to traverse or
>> manipulate the list, while the payload would still be subject to
>> the matrix_dev->lock mutex.
>
> The matrix_dev->guests_lock is needed whenever the kvm->lock
> is needed because the struct ap_guest object is created and the
> struct kvm assigned to it when the kvm pointer is set
> (vfio_ap_mdev_set_kvm function). So, in order to access the
> ap_guest object and retrieve the kvm pointer, we have to ensure
> the ap_guest object is still available. The fact we can get the
> kvm pointer from the ap_matrix_mdev object just makes things
> more efficient - i.e., we won't have to traverse the list.
>
> Whenever the kvm->lock and matrix_dev->lock mutexes must
> be held, the order is:
>
>     matrix_dev->guests_lock
>     matrix_dev->guests->kvm->lock
>     matrix_dev->lock
>
> There are times when not all three locks are required; for example,
> the handle_pqap and vfio_ap_mdev_probe/remove functions only
> require the matrix_dev->lock because they do not need to lock kvm.
>
>>
>> [..]
>>
>>> +struct ap_guest {
>>> +    struct kvm *kvm;
>>> +    struct list_head node;
>>> +};
>>> +
>>>   /**
>>>    * struct ap_matrix_dev - Contains the data for the matrix device.
>>>    *
>>> @@ -39,6 +44,9 @@
>>>    *        single ap_matrix_mdev device. It's quite coarse but we
>>> don't
>>>    *        expect much contention.
>>>    * @vfio_ap_drv: the vfio_ap device driver
>>> + * @guests_lock: r/w semaphore for protecting access to @guests
>>> + * @guests:    list of guests (struct ap_guest) using AP devices
>>> bound to the
>>> + *        vfio_ap device driver.
>> Please compare the above. Also if it is only about the access to the
>> list, then you could drop the lock right after create, and not keep it
>> till the very end of vfio_ap_mdev_set_kvm(). Right?
>
> That would be true if it only controlled access to the list, but as I
> explained above, that is not its sole purpose.
>
>>
>> In any case I'm skeptical about this whole struct ap_guest business. To
>> me, it looks like something that just makes things more obscure and
>> complicated without any real benefit.
>
> I'm open to other ideas, but you'll have to come up with a way
> to take the kvm->lock before the matrix_dev->lock in the
> vfio_ap_mdev_probe_queue and vfio_ap_mdev_remove_queue
> functions where we don't have access to the ap_matrix_mdev
> object to which the APQN is assigned and has the pointer to the
> kvm object.
>
> In order to retrieve the matrix_mdev, we need the matrix_dev->lock.
> In order to hot plug/unplug the queue, we need the kvm->lock.
> There's your catch-22 that needs to be solved. This design is my
> attempt to solve that.
>
>>
>> Regards,
>> Halil
>>
>>>    */
>>>   struct ap_matrix_dev {
>>>       struct device device;
>>> @@ -47,6 +55,8 @@ struct ap_matrix_dev {
>>>       struct list_head mdev_list;
>>>       struct mutex lock;
>>>       struct ap_driver  *vfio_ap_drv;
>>> +    struct rw_semaphore guests_lock;
>>> +    struct list_head guests;
>>>   };
>>>     extern struct ap_matrix_dev *matrix_dev;
>


2022-01-11 22:42:38

by Anthony Krowiak

Subject: Re: [PATCH v17 09/15] s390/vfio-ap: allow hot plug/unplug of AP resources using mdev device



On 1/9/22 16:36, Halil Pasic wrote:
> On Thu, 21 Oct 2021 11:23:26 -0400
> Tony Krowiak <[email protected]> wrote:
>
>> Keep in mind that the kvm->lock must be taken outside of the
>> matrix_mdev->lock to avoid circular lock dependencies (i.e., a lockdep
>> splat). This will necessitate taking the matrix_dev->guests_lock in order
>> to find the guest(s) in the matrix_dev->guests list to which the affected
>> APQN(s) may be assigned. The kvm->lock can then be taken prior to the
>> matrix_dev->lock and the APCB plugged into the guest without any problem.
> IMHO correct and sane locking is one of the key points we have to
> resolve. Frankly, I'm having trouble understanding the why behind some
> of your changes, compared to v16, and I suspect that looking for a good
> locking scheme might have played a role.

The locking scheme introduced with this patch series was
precipitated by the fact that the queue probe/remove
callbacks require access to both the matrix_dev->lock
and matrix_mdev->kvm->lock. When those callbacks are
invoked, the only information we have is the queue device
being removed. In order to retrieve the mdev to which
the APQN of the queue is assigned, we have to take the
matrix_dev->lock. We also need the matrix_mdev->kvm->lock
because we are hot plugging/unplugging the queue (i.e.,
kvm_arch_crypto_set_masks).

Given we need to take the kvm->lock prior to the
matrix_dev->lock in order to avoid potential lockdep
splats, this design was created.

>
> In the beginning, I was not very keen on taking the kvm->lock first
> and the matrix_dev->lock, but the more I think about it the more I
> become convinced that this is probably the simplest way to resolve the
> problem in a satisfactory manner. I don't like the idea of
> hogging the kvm->lock and potentially stalling out some core kvm code
> because there is contention on matrix_dev->lock. And it is kind of up to
> the user-space and the guests, how much pressure is put on the
> matrix_dev->lock. And I'm still worried about that, but when I went
> through the alternatives, my mood turned from bad to worse. Because of
> that, I'm fine with this solution, provided some of the KVM/s390
> maintainers ack it as well. I don't feel comfortable making a call on
> this alone.

You are feeling the very same frustrations I felt when trying to
come up with a viable solution. I share your concerns, but I
was simply not able to come up with anything better that
wouldn't require redesigning the secure execution ioctls as well
as the instruction interception ioctls.

>
> That said, let me also sum up my thoughts on alternatives and
> non-alternatives, hopefully for the benefit of other reviewers.
>
> 1) I deeply regret that I used to argue against handling PQAP in
> userspace with an ioctl as Pierre originally proposed. I was unaware of
> the kvm->lock vcpu->lock locking order. Back then we didn't use to
> have that sequence, but the rule was already there. I guess we could
> still go back to that scheme of handling PQAP if QEMU were to support
> it, and thus break the circle, but that would result in a very ugly
> dependency (we would need QEMU support for dynamic, and we would have
> to handle the case of an old QEMU). Technically it is still possible, but
> very ugly.

Note that this didn't rear its ugly head until secure execution guests
were introduced.

> 2) I've contemplated if it is possible to simulate the userspace exit
> and re-entry via ioctl in KVM. But looking at the code, it does not
> look like a sane option to me.
> 3) I also considered using a read-write lock for matrix_dev->lock. In
> theory a read-write lock that favors reads in a sense that a steady
> stream of readers can starve the writers would work. But rwsem can't be
> used in this situation because rwsem is fair, in the sense that waiting
> writers may effectively block readers that try to acquire the lock while
> the lock is held as a read lock. So while rwsem in practice does allow
> for more parallelism regarding lock dependency circles it does not
> provide any benefits over a mutex.

Note: I went down this road already and was not able to resolve
          lockdep splats with rwsem. That is not to say I exhausted
          all permutations, but I ended up pulling what little hair I
          have left out.

> 4) I considered srcu as well. But rcu is a very different beast and does
> not seem to be a great fit for what we are trying to do here. We are
> not fine with working with a stale copy of the matrix in most of the
> situations.
> 5) I also contemplated, if relaxing the mutual exclusion is possible.
> PQAP only needs the CRYCB matrix to check whether the queue is in the
> config or not. So maybe we could get away without taking the
> matrix_dev->lock and doing separate locking for the queue in question,
> and instead of delaying any updates to the CRYCB while processing AQIC,
> we could just work with whatever we see in the CRYCB. Since the setting
> up of the interrupts is asynchronous with respect to the instruction
> requesting it (PQAP/AQIC) and the CRYCB masks are relevant in the
> instruction context... So I was thinking: if we were to introduce a
> separate lock for the AQIC state, and find the queue without taking
> the matrix_dev->lock, we could actually process the PQAP/AQIC without
> the matrix_dev->lock. But then because we would have vcpu->lock -->
> vfio_ap_queue->lock, we would have to avoid ending up with a circle
> on the cleanup path, and also avoid races on the cleanup path. I'm not
> sure how tricky that would end up being, if at all possible.

Note: I too considered this, but again failed to make it work.
          I don't recall why, but I ended up with lockdep splats.
          Maybe I just didn't design it properly.

> 6) We could practically implement that unfair read-write lock with
> a mutex and condition variables (and a waitqueue), but that wouldn't
> simplify things either. Still if we want to avoid taking kvm->lock
> before taking the vfio_ap lock, it may be the most straight forward
> alternative.

Note: The original lockdep splat was resolved with a wait queue,
          but as you may recall, that was objected to on the grounds
          that it circumvented lockdep. That is what precipitated this
          mess in the first place.

>
> At the end let me also state, that my understanding of some of the
> details is still incomplete.

Given the difficulty of keeping the mental thread needed to
implement this, I can certainly understand:)

>
> Regards,
> Halil
>
>
>


2022-01-12 11:53:05

by Halil Pasic

Subject: Re: [PATCH v17 06/15] s390/vfio-ap: refresh guest's APCB by filtering APQNs assigned to mdev

On Tue, 11 Jan 2022 16:19:06 -0500
Tony Krowiak <[email protected]> wrote:

> >
> > Also we could probably do the filtering incrementally. In a sense that
> > at a time only so much changes, and we know that the invariant was
> > preserved without that change. But that would probably end up trading
> > complexity for cycles. I will trust your judgment and your tests on this
> > matter.
>
> I am not entirely clear on what you are suggesting. I think you are
> suggesting that there may not be a need to look at every APQN
> assigned to the mdev when an adapter or domain is assigned or
> unassigned or a queue is probed or removed. Maybe you can clarify
> what you are suggesting here.

Exactly. For example if we have the following assigned
adapters:
1, 2, 3
domains:
1, 2, 3
and the operation we are trying to perform is assign domain 4, then it
is sufficient to have a look at the queues with the APQNs (1,4), (2,4)
and (3,4). We don't have to examine all 12 queues.

When an unassign adapter is performed, there is no need to do the
re-filtering, because there is nothing that can pop back or go away. And
when an unassign domain is performed, all we care about are the queues
of that domain on the filtered adapters.

Similarly if after that successful assign the queue (3,4) gets removed
(from vfio_ap) and then added back again and probed, we only have to
look at the queues (3, 1), (3, 2), (3, 3).

But I'm OK with the current design of this. It is certainly conceptually
simpler to say we have a master-copy and we filter that master-copy based
on the very same rules every time something changes. I'm really fine
either way as long as it works well. :D
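
A minimal sketch of the incremental idea for the assign-domain case, in
plain userspace C. The struct and function names are hypothetical and the
real filtering rules are not modeled; the sketch only shows which APQNs
would need examining.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical (APID, APQI) pair identifying one queue. */
struct apqn {
	int apid;
	int apqi;
};

/*
 * When one new domain is assigned, only the pairs
 * (adapter, new_domain) are new, so only those need to be examined.
 * Writes the pairs to out[] and returns how many there are.
 */
static size_t apqns_to_check_on_assign_domain(const int *adapters,
					      size_t n_adapters,
					      int new_domain,
					      struct apqn *out)
{
	size_t i;

	for (i = 0; i < n_adapters; i++) {
		out[i].apid = adapters[i];
		out[i].apqi = new_domain;
	}
	return n_adapters;
}

/* A full re-filter walks the whole Cartesian product instead. */
static size_t apqns_to_check_on_full_refilter(size_t n_adapters,
					      size_t n_domains)
{
	return n_adapters * n_domains;
}
```

With adapters 1, 2, 3 and domain 4 being added to domains 1, 2, 3, the
incremental variant examines 3 APQNs where a full re-filter examines the
complete product of adapters and domains.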

Regards,
Halil

2022-01-12 14:25:57

by Halil Pasic

Subject: Re: [PATCH v17 08/15] s390/vfio-ap: keep track of active guests

On Tue, 11 Jan 2022 16:58:13 -0500
Tony Krowiak <[email protected]> wrote:

> On 12/29/21 22:33, Halil Pasic wrote:
> > On Thu, 21 Oct 2021 11:23:25 -0400
> > Tony Krowiak <[email protected]> wrote:
> >
> >> The vfio_ap device driver registers for notification when the pointer to
> >> the KVM object for a guest is set. Let's store the KVM pointer as well as
> >> the pointer to the mediated device when the KVM pointer is set.
> > [..]
> >
> >
> >> struct ap_matrix_dev {
> >> ...
> >> struct rw_semaphore guests_lock;
> >> struct list_head guests;
> >> ...
> >> }
> >>
> >> The 'guests_lock' field is a r/w semaphore to control access to the
> >> 'guests' field. The 'guests' field is a list of ap_guest
> >> structures containing the KVM and matrix_mdev pointers for each active
> >> guest. An ap_guest structure will be stored into the list whenever the
> >> vfio_ap device driver is notified that the KVM pointer has been set and
> >> removed when notified that the KVM pointer has been cleared.
> >>
> > Is this about the field or about the list including all the nodes? This
> > reads like guests_lock only protects the head element, which makes no
> > sense to me. Because of how these lists work.
>
> It locks the list, I can rewrite the description.
>
> >
> > The narrowest scope that could make sense is all the list_head stuff
> > in the entire list. I.e. one would only need the lock to traverse or
> > manipulate the list, while the payload would still be subject to
> > the matrix_dev->lock mutex.
>
> The matrix_dev->guests lock is needed whenever the kvm->lock
> is needed because the struct ap_guest object is created and the
> struct kvm assigned to it when the kvm pointer is set
> (vfio_ap_mdev_set_kvm function).

Yes, reading the code, my impression was that this is more about
ap_guest.kvm than about the list.

My understanding is that struct ap_guest is basically about the
marriage between a matrix_mdev and a kvm. Basically a link between the
two.

But then, it probably does not make sense for this link to outlive
either kvm or matrix_mdev.

Thus I don't quite understand why we need the extra allocation. If
we want a list, why don't we just use the pointers to matrix_mdev?

We could still protect that stuff with a separate lock.

> So, in order to access the
> ap_guest object and retrieve the kvm pointer, we have to ensure
> the ap_guest_object is still available. The fact we can get the
> kvm pointer from the ap_matrix_mdev object just makes things
> more efficient - i.e., we won't have to traverse the list.

Well if the guests_lock is only protecting the list, then that should not
be true. In that case, you can only be sure about the nodes that you
reached by traversing the list with the lock held. Right?

If only the list is protected, then one could do

down_write(guests_lock)
list_del(element)
up_write(guests_lock)
fancy_free(element)


>
> Whenever the kvm->lock and matrix_dev->lock mutexes must
> be held, the order is:
>
>     matrix_dev->guests_lock
>     matrix_dev->guests->kvm->lock
>     matrix_dev->lock
>
> There are times where all three locks are not required; for example,
> the handle_pqap and vfio_ap_mdev_probe/remove functions only
> require the matrix_dev->lock because it does not need to lock kvm.
>

Yeah, that is what gets rid of the circular lock dependency. If we had
to take guests_lock there we would have guests_lock in the same role
as matrix_dev->lock before.

But the thing is you do
kvm = q->matrix_mdev->guest->kvm;
in the pqap_handler (more precisely in a function called by it).

So you do access the struct ap_guest object and its kvm member
without the guests_lock being held. That is where things become very
muddy to me.

It looks to me that the kvm pointer is changed with both the
guests_lock and the matrix_dev->lock held in write mode. And accessing
such stuff read only is safe with either of the two locks held.

Thus I do believe that the general idea is viable. I've pointed that out
in a later email.

But the information you give the unsuspecting reader to aid him in
understanding our new locking scheme is severely lacking.

> >
> > [..]
> >
> >> +struct ap_guest {
> >> + struct kvm *kvm;
> >> + struct list_head node;
> >> +};
> >> +
> >> /**
> >> * struct ap_matrix_dev - Contains the data for the matrix device.
> >> *
> >> @@ -39,6 +44,9 @@
> >> * single ap_matrix_mdev device. It's quite coarse but we don't
> >> * expect much contention.
> >> * @vfio_ap_drv: the vfio_ap device driver
> >> + * @guests_lock: r/w semaphore for protecting access to @guests
> >> + * @guests: list of guests (struct ap_guest) using AP devices bound to the
> >> + * vfio_ap device driver.
> > Please compare the above. Also if it is only about the access to the
> > list, then you could drop the lock right after create, and not keep it
> > till the very end of vfio_ap_mdev_set_kvm(). Right?
>
> That would be true if it only controlled access to the list, but as I
> explained above, that is not its sole purpose.

Well, but guests is a member of struct ap_matrix_dev and not the whole
list including all the nodes.

>
> >
> > In any case I'm skeptical about this whole struct ap_guest business. To
> > me, it looks like something that just makes things more obscure and
> > complicated without any real benefit.
>
> I'm open to other ideas, but you'll have to come up with a way
> to take the kvm->lock before the matrix_mdev->lock in the
> vfio_ap_mdev_probe_queue and vfio_ap_mdev_remove_queue
> functions where we don't have access to the ap_matrix_mdev
> object to which the APQN is assigned and has the pointer to the
> kvm object.
>
> In order to retrieve the matrix_mdev, we need the matrix_dev->lock.
> In order to hot plug/unplug the queue, we need the kvm->lock.
> There's your catch-22 that needs to be solved. This design is my
> attempt to solve that.
>

I agree that having a lock that we take before kvm->lock is taken,
and another one that we take with the kvm->lock taken is a good idea.

I was referring to having ap_guest objects which are separately
allocated, and have a decoupled lifecycle. Please see above!

Regards,
Halil
[..]

2022-01-15 10:08:50

by Anthony Krowiak

Subject: Re: [PATCH v17 08/15] s390/vfio-ap: keep track of active guests



On 1/12/22 09:25, Halil Pasic wrote:
> On Tue, 11 Jan 2022 16:58:13 -0500
> Tony Krowiak <[email protected]> wrote:
>
>> On 12/29/21 22:33, Halil Pasic wrote:
>>> On Thu, 21 Oct 2021 11:23:25 -0400
>>> Tony Krowiak <[email protected]> wrote:
>>>
>>>> The vfio_ap device driver registers for notification when the pointer to
>>>> the KVM object for a guest is set. Let's store the KVM pointer as well as
>>>> the pointer to the mediated device when the KVM pointer is set.
>>> [..]
>>>
>>>
>>>> struct ap_matrix_dev {
>>>> ...
>>>> struct rw_semaphore guests_lock;
>>>> struct list_head guests;
>>>> ...
>>>> }
>>>>
>>>> The 'guests_lock' field is a r/w semaphore to control access to the
>>>> 'guests' field. The 'guests' field is a list of ap_guest
>>>> structures containing the KVM and matrix_mdev pointers for each active
>>>> guest. An ap_guest structure will be stored into the list whenever the
>>>> vfio_ap device driver is notified that the KVM pointer has been set and
>>>> removed when notified that the KVM pointer has been cleared.
>>>>
>>> Is this about the field or about the list including all the nodes? This
>>> reads like guests_lock only protects the head element, which makes no
>>> sense to me. Because of how these lists work.
>> It locks the list, I can rewrite the description.
>>
>>> The narrowest scope that could make sense is all the list_head stuff
>>> in the entire list. I.e. one would only need the lock to traverse or
>>> manipulate the list, while the payload would still be subject to
>>> the matrix_dev->lock mutex.
>> The matrix_dev->guests lock is needed whenever the kvm->lock
>> is needed because the struct ap_guest object is created and the
>> struct kvm assigned to it when the kvm pointer is set
>> (vfio_ap_mdev_set_kvm function).
> Yes, reading the code, my impression was that this is more about
> ap_guest.kvm than about the list.
>
> My understanding is that struct ap_guest is basically about the
> marriage between a matrix_mdev and a kvm. Basically a link between the
> two.
>
> But then, it probably does not make sense for this link to outlive
> either kvm or matrix_mdev.
>
> Thus I don't quite understand why we need the extra allocation. If
> we want a list, why don't we just use the pointers to matrix_mdev?
>
> We could still protect that stuff with a separate lock.

I think this may be a good idea. We already have a list of matrix_mdev
stored in matrix_dev. I'll explore this further.

>
>> So, in order to access the
>> ap_guest object and retrieve the kvm pointer, we have to ensure
>> the ap_guest_object is still available. The fact we can get the
>> kvm pointer from the ap_matrix_mdev object just makes things
>> more efficient - i.e., we won't have to traverse the list.
> Well if the guests_lock is only protecting the list, then that should not
> be true. In that case, you can only be sure about the nodes that you
> reached by traversing the list with the lock held. Right?
>
> If only the list is protected, then one could do
>
> down_write(guests_lock)
> list_del(element)
> up_write(guests_lock)
> fancy_free(element)
>
>
>> Whenever the kvm->lock and matrix_dev->lock mutexes must
>> be held, the order is:
>>
>>     matrix_dev->guests_lock
>>     matrix_dev->guests->kvm->lock
>>     matrix_dev->lock
>>
>> There are times where all three locks are not required; for example,
>> the handle_pqap and vfio_ap_mdev_probe/remove functions only
>> require the matrix_dev->lock because it does not need to lock kvm.
>>
> Yeah, that is what gets rid of the circular lock dependency. If we had
> to take guests_lock there we would have guests_lock in the same role
> as matrix_dev->lock before.
>
> But the thing is you do
> kvm = q->matrix_mdev->guest->kvm;
> in the pqap_handler (more precisely in a function called by it).
>
> So you do access the struct ap_guest object and its kvm member
> without the guests_lock being held. That is where things become very
> muddy to me.

I was thinking about this the other day, that the kvm pointer is
needed when the IRQ is disabled to clean up the gisa stuff and
the pinned memory. I'm going to revisit this.

>
> It looks to me that the kvm pointer is changed with both the
> guests_lock and the matrix_dev->lock held in write mode. And accessing
> such stuff read only is safe with either of the two locks held.
>
> Thus I do believe that the general idea is viable. I've pointed that out
> in a later email.
>
> But the information you give the unsuspecting reader to aid him in
> understanding our new locking scheme is severely lacking.

I'll try to clear up the patch description.

>
>>> [..]
>>>
>>>> +struct ap_guest {
>>>> + struct kvm *kvm;
>>>> + struct list_head node;
>>>> +};
>>>> +
>>>> /**
>>>> * struct ap_matrix_dev - Contains the data for the matrix device.
>>>> *
>>>> @@ -39,6 +44,9 @@
>>>> * single ap_matrix_mdev device. It's quite coarse but we don't
>>>> * expect much contention.
>>>> * @vfio_ap_drv: the vfio_ap device driver
>>>> + * @guests_lock: r/w semaphore for protecting access to @guests
>>>> + * @guests: list of guests (struct ap_guest) using AP devices bound to the
>>>> + * vfio_ap device driver.
>>> Please compare the above. Also if it is only about the access to the
>>> list, then you could drop the lock right after create, and not keep it
>>> till the very end of vfio_ap_mdev_set_kvm(). Right?
>> That would be true if it only controlled access to the list, but as I
>> explained above, that is not its sole purpose.
> Well, but guests is a member of struct ap_matrix_dev and not the whole
> list including all the nodes.
>
>>> In any case I'm skeptical about this whole struct ap_guest business. To
>>> me, it looks like something that just makes things more obscure and
>>> complicated without any real benefit.
>> I'm open to other ideas, but you'll have to come up with a way
>> to take the kvm->lock before the matrix_mdev->lock in the
>> vfio_ap_mdev_probe_queue and vfio_ap_mdev_remove_queue
>> functions where we don't have access to the ap_matrix_mdev
>> object to which the APQN is assigned and has the pointer to the
>> kvm object.
>>
>> In order to retrieve the matrix_mdev, we need the matrix_dev->lock.
>> In order to hot plug/unplug the queue, we need the kvm->lock.
>> There's your catch-22 that needs to be solved. This design is my
>> attempt to solve that.
>>
> I agree that having a lock that we take before kvm->lock is taken,
> and another one that we take with the kvm->lock taken is a good idea.
>
> I was referring to having ap_guest objects which are separately
> allocated, and have a decoupled lifecycle. Please see above!

I'm thinking about looking into getting rid of the struct ap_guest and
the guests list as I said above. I think I can rework this.

>
> Regards,
> Halil
> [..]

2022-01-15 10:09:47

by Anthony Krowiak

Subject: Re: [PATCH v17 06/15] s390/vfio-ap: refresh guest's APCB by filtering APQNs assigned to mdev



On 1/12/22 06:52, Halil Pasic wrote:
> On Tue, 11 Jan 2022 16:19:06 -0500
> Tony Krowiak <[email protected]> wrote:
>
>>> Also we could probably do the filtering incrementally. In a sense that
>>> at a time only so much changes, and we know that the invariant was
>>> preserved without that change. But that would probably end up trading
>>> complexity for cycles. I will trust your judgment and your tests on this
>>> matter.
>> I am not entirely clear on what you are suggesting. I think you are
>> suggesting that there may not be a need to look at every APQN
>> assigned to the mdev when an adapter or domain is assigned or
>> unassigned or a queue is probed or removed. Maybe you can clarify
>> what you are suggesting here.
> Exactly. For example if we have the following assigned
> adapters:
> 1, 2, 3
> domains:
> 1, 2, 3
> and the operation we are trying to perform is assign domain 4, then it
> is sufficient to have a look at the queues with the APQNs (1,4), (2,4)
> and (3,4). We don't have to examine all 12 queues.
>
> When an unassign adapter is performed, there is no need to do the
> re-filtering, because there is nothing that can pop back or go away. And
> when an unassign domain is performed, all we care about are the queues
> of that domain on the filtered adapters.
>
> Similarly if after that successful assign the queue (3,4) gets removed
> (from vfio_ap) and then added back again and probed, we only have to
> look at the queues (3, 1), (3, 2), (3, 3).
>
> But I'm OK with the current design of this. It is certainly conceptually
> simpler to say we have a master-copy and we filter that master-copy based
> on the very same rules every time something changes. I'm really fine
> either way as long as it works well. :D
>
> Regards,
> Halil

I spent a day messing with this and was able to make it work, so
the next implementation will incorporate your idea here.


2022-02-07 06:09:38

by Halil Pasic

Subject: Re: [PATCH v17 14/15] s390/ap: notify drivers on config changed and scan complete callbacks

On Thu, 21 Oct 2021 11:23:31 -0400
Tony Krowiak <[email protected]> wrote:

> This patch introduces an extension to the ap bus to notify device drivers
> when the host AP configuration changes - i.e., adapters, domains or
> control domains are added or removed. When an adapter or domain is added to
> the host's AP configuration, the AP bus will create the associated queue
> devices in the linux sysfs device model. Each new type 10 (i.e., CEX4) or
> newer queue device with an APQN that is not reserved for the default device
> driver will get bound to the vfio_ap device driver. Likewise, when an
> adapter or domain is removed from the host's AP configuration, the AP bus
> will remove the associated queue devices from the sysfs device model. Each
> of the queues that is bound to the vfio_ap device driver will get unbound.
>
> With the introduction of hot plug support, binding or unbinding of a
> queue device will result in plugging or unplugging one or more queues from
> a guest that is using the queue. If there are multiple changes to the
> host's AP configuration, it could result in the probe and remove callbacks
> getting invoked multiple times. Each time queues are plugged into or
> unplugged from a guest, the guest's VCPUs must be taken out of SIE.
> If this occurs multiple times due to changes in the host's AP
> configuration, that can have an undesirable negative effect on the guest's
> performance.
>
> To alleviate this problem, this patch introduces two new callbacks: one to
> notify the vfio_ap device driver when the AP bus scan routine detects a
> change to the host's AP configuration; and, one to notify the driver when
> the AP bus is done scanning. This will allow the vfio_ap driver to do
> bulk processing of all affected adapters, domains and control domains for
> affected guests rather than plugging or unplugging them one at a time when
> the probe or remove callback is invoked. The two new callbacks are:
>
> void (*on_config_changed)(struct ap_config_info *new_config_info,
> struct ap_config_info *old_config_info);
>
> This callback is invoked at the start of the AP bus scan
> function when it determines that the host AP configuration information
> has changed since the previous scan. This is done by storing
> an old and current QCI info struct and comparing them. If there is any
> difference, the callback is invoked.
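
For illustration, the old/current comparison described in the quoted text
can be sketched in userspace C as follows. The struct is a much-reduced,
hypothetical stand-in for struct ap_config_info, and fake_fetch() replaces
the real QCI query by copying from a caller-supplied "hardware" state.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, reduced stand-in for struct ap_config_info. */
struct fake_config_info {
	unsigned int apm[8]; /* adapter mask */
	unsigned int aqm[8]; /* domain mask */
	unsigned int adm[8]; /* control domain mask */
};

/* Current and previous snapshots; zero-initialized as statics. */
static struct fake_config_info cur_info, old_info;

/* Stand-in for the QCI query: copy the simulated hardware state. */
static void fake_fetch(struct fake_config_info *dst,
		       const struct fake_config_info *hw)
{
	*dst = *hw;
}

/*
 * Save the previous snapshot, fetch a new one, and report whether
 * the two differ. The struct has no padding, so memcmp is a valid
 * equality test here.
 */
static bool get_configuration(const struct fake_config_info *hw)
{
	memcpy(&old_info, &cur_info, sizeof(cur_info));
	fake_fetch(&cur_info, hw);
	return memcmp(&cur_info, &old_info, sizeof(cur_info)) != 0;
}
```

A change is reported exactly once: on the scan after it, both snapshots
match again and the function returns false.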
>
> The vfio_ap device driver registers a callback function for this callback
> that performs the following operations:
>
> 1. Unplugs the adapters, domains and control domains removed from the
> host's AP configuration from the guests to which they are
> assigned in a single operation.
>
> 2. Disconnects the links between each queue structure representing a
> queue that was unplugged from the structure representing
> the mediated device to which the queue is assigned. Thus, when the
> vfio_ap device driver's remove callback is invoked, the unplugging of
> the queue from the guest and the unlinking of the queue structure from
> the mediated device structure will be bypassed because the queues and
> control domains will have already been unplugged in bulk.
>
> 3. Stores bitmaps identifying the adapters, domains and control domains
> added to the host's AP configuration with the structure representing
> the mediated device. When the vfio_ap device driver's probe callback is
> subsequently invoked, the probe function will recognize that the
> queue is being probed due to a change in the host's AP configuration
> and the plugging of the queue into the guest will be bypassed.
>
> void (*on_scan_complete)(struct ap_config_info *new_config_info,
> struct ap_config_info *old_config_info);
>
> The on_scan_complete callback is invoked after the ap bus scan is
> completed if the host AP configuration data has changed. The vfio_ap
> device driver registers a callback function for this callback that hot
> plugs each queue and control domain added to the AP configuration for each
> guest using them in a single hot plug operation.
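
For illustration, the bracketing of a scan by the two optional callbacks
can be sketched in userspace C. Everything here is hypothetical (the
fake_driver struct, the counters, scan_bus); the sketch only shows the call
pattern, not the AP bus code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical driver with two optional notification callbacks. */
struct fake_driver {
	void (*on_config_changed)(void);
	void (*on_scan_complete)(void);
};

/* Counters so the call pattern can be observed. */
static int changed_calls, complete_calls, per_queue_probes;

static void demo_on_config_changed(void)
{
	changed_calls++;
}

static void demo_on_scan_complete(void)
{
	complete_calls++;
}

/*
 * One bus scan: notify before the per-queue work and again after it,
 * but only when the configuration changed and the driver registered
 * the corresponding callback.
 */
static void scan_bus(const struct fake_driver *drv, bool config_changed,
		     int n_queues)
{
	int i;

	if (config_changed && drv->on_config_changed)
		drv->on_config_changed();

	for (i = 0; i < n_queues; i++)
		per_queue_probes++; /* stands in for probe/remove */

	if (config_changed && drv->on_scan_complete)
		drv->on_scan_complete();
}
```

However many queues a scan touches, each callback fires at most once per
scan, which is what allows the driver to batch its guest updates.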
>
> Signed-off-by: Harald Freudenberger <[email protected]>
> [[email protected]: implemented callback functions in vfio_ap driver]
> Signed-off-by: Tony Krowiak <[email protected]>
> ---
> drivers/s390/crypto/ap_bus.c | 81 ++++++-
> drivers/s390/crypto/ap_bus.h | 12 +
> drivers/s390/crypto/vfio_ap_drv.c | 4 +-
> drivers/s390/crypto/vfio_ap_ops.c | 332 ++++++++++++++++++++++++--
> drivers/s390/crypto/vfio_ap_private.h | 23 +-
> 5 files changed, 429 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/s390/crypto/ap_bus.c b/drivers/s390/crypto/ap_bus.c
> index 15886610f61a..b97149d02da6 100644
> --- a/drivers/s390/crypto/ap_bus.c
> +++ b/drivers/s390/crypto/ap_bus.c
> @@ -88,6 +88,7 @@ static atomic64_t ap_bindings_complete_count = ATOMIC64_INIT(0);
> static DECLARE_COMPLETION(ap_init_apqn_bindings_complete);
>
> static struct ap_config_info *ap_qci_info;
> +static struct ap_config_info *ap_qci_info_old;
>
> /*
> * AP bus related debug feature things.
> @@ -225,9 +226,14 @@ static void __init ap_init_qci_info(void)
> ap_qci_info = kzalloc(sizeof(*ap_qci_info), GFP_KERNEL);
> if (!ap_qci_info)
> return;
> + ap_qci_info_old = kzalloc(sizeof(*ap_qci_info_old), GFP_KERNEL);
> + if (!ap_qci_info_old)
> + return;
> if (ap_fetch_qci_info(ap_qci_info) != 0) {
> kfree(ap_qci_info);
> + kfree(ap_qci_info_old);
> ap_qci_info = NULL;
> + ap_qci_info_old = NULL;
> return;
> }
> AP_DBF_INFO("%s successful fetched initial qci info\n", __func__);
> @@ -244,6 +250,8 @@ static void __init ap_init_qci_info(void)
> __func__, ap_max_domain_id);
> }
> }
> +
> + memcpy(ap_qci_info_old, ap_qci_info, sizeof(*ap_qci_info));
> }
>
> /*
> @@ -1635,6 +1643,49 @@ static int __match_queue_device_with_queue_id(struct device *dev, const void *da
> && AP_QID_QUEUE(to_ap_queue(dev)->qid) == (int)(long) data;
> }
>
> +/* Helper function for notify_config_changed */
> +static int __drv_notify_config_changed(struct device_driver *drv, void *data)
> +{
> + struct ap_driver *ap_drv = to_ap_drv(drv);
> +
> + if (try_module_get(drv->owner)) {
> + if (ap_drv->on_config_changed)
> + ap_drv->on_config_changed(ap_qci_info, ap_qci_info_old);
> + module_put(drv->owner);
> + }
> +
> + return 0;
> +}
> +
> +/* Notify all drivers about an qci config change */
> +static inline void notify_config_changed(void)
> +{
> + bus_for_each_drv(&ap_bus_type, NULL, NULL,
> + __drv_notify_config_changed);
> +}
> +
> +/* Helper function for notify_scan_complete */
> +static int __drv_notify_scan_complete(struct device_driver *drv, void *data)
> +{
> + struct ap_driver *ap_drv = to_ap_drv(drv);
> +
> + if (try_module_get(drv->owner)) {
> + if (ap_drv->on_scan_complete)
> + ap_drv->on_scan_complete(ap_qci_info,
> + ap_qci_info_old);
> + module_put(drv->owner);
> + }
> +
> + return 0;
> +}
> +
> +/* Notify all drivers about bus scan complete */
> +static inline void notify_scan_complete(void)
> +{
> + bus_for_each_drv(&ap_bus_type, NULL, NULL,
> + __drv_notify_scan_complete);
> +}
> +
> /*
> * Helper function for ap_scan_bus().
> * Remove card device and associated queue devices.
> @@ -1923,6 +1974,25 @@ static inline void ap_scan_adapter(int ap)
> put_device(&ac->ap_dev.device);
> }
>
> +/**
> + * ap_get_configuration - get the host AP configuration
> + *
> + * Stores the host AP configuration information returned from the previous call
> + * to Query Configuration Information (QCI), then retrieves and stores the
> + * current AP configuration returned from QCI.
> + *
> + * Return: true if the host AP configuration changed between calls to QCI;
> + * otherwise, return false.
> + */
> +static bool ap_get_configuration(void)
> +{
> + memcpy(ap_qci_info_old, ap_qci_info, sizeof(*ap_qci_info));
> + ap_fetch_qci_info(ap_qci_info);
> +
> + return memcmp(ap_qci_info, ap_qci_info_old,
> + sizeof(struct ap_config_info)) != 0;
> +}
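
For readers following along, the snapshot-and-compare step above can be
sketched in user space roughly as follows. This is a minimal illustration
and not the kernel code: struct cfg_snapshot, cfg_hw, fetch_cfg() and
cfg_refresh() are hypothetical stand-ins for struct ap_config_info, the
machine state, ap_fetch_qci_info() and ap_get_configuration().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct ap_config_info: three 256-bit masks. */
struct cfg_snapshot {
	uint32_t apm[8];
	uint32_t aqm[8];
	uint32_t adm[8];
};

/* Current and previous snapshots, as kept by the bus code. */
static struct cfg_snapshot cfg_cur, cfg_old;
/* Hypothetical "hardware" state that fetch_cfg() reads from. */
static struct cfg_snapshot cfg_hw;

/* Assumed fetch routine: copies the simulated hardware state. */
static void fetch_cfg(struct cfg_snapshot *out)
{
	memcpy(out, &cfg_hw, sizeof(*out));
}

/* Save the previous snapshot, fetch a fresh one, report whether they differ. */
static bool cfg_refresh(void)
{
	memcpy(&cfg_old, &cfg_cur, sizeof(cfg_cur));
	fetch_cfg(&cfg_cur);
	return memcmp(&cfg_cur, &cfg_old, sizeof(cfg_cur)) != 0;
}
```

Because the old snapshot is overwritten on every call, a change is
reported exactly once, on the first call that observes it.
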
> +
> /**
> * ap_scan_bus(): Scan the AP bus for new devices
> * Runs periodically, workqueue timer (ap_config_time)
> @@ -1930,9 +2000,12 @@ static inline void ap_scan_adapter(int ap)
> */
> static void ap_scan_bus(struct work_struct *unused)
> {
> - int ap;
> + int ap, config_changed = 0;
>
> - ap_fetch_qci_info(ap_qci_info);
> + /* config change notify */
> + config_changed = ap_get_configuration();
> + if (config_changed)
> + notify_config_changed();
> ap_select_domain();
>
> AP_DBF_DBG("%s running\n", __func__);
> @@ -1941,6 +2014,10 @@ static void ap_scan_bus(struct work_struct *unused)
> for (ap = 0; ap <= ap_max_adapter_id; ap++)
> ap_scan_adapter(ap);
>
> + /* scan complete notify */
> + if (config_changed)
> + notify_scan_complete();
> +
> /* check if there is at least one queue available with default domain */
> if (ap_domain_index >= 0) {
> struct device *dev =
> diff --git a/drivers/s390/crypto/ap_bus.h b/drivers/s390/crypto/ap_bus.h
> index 67c1bef60ad5..4de062ea6b76 100644
> --- a/drivers/s390/crypto/ap_bus.h
> +++ b/drivers/s390/crypto/ap_bus.h
> @@ -143,6 +143,18 @@ struct ap_driver {
> int (*probe)(struct ap_device *);
> void (*remove)(struct ap_device *);
> int (*in_use)(unsigned long *apm, unsigned long *aqm);
> + /*
> + * Called at the start of the ap bus scan function when
> + * the crypto config information (qci) has changed.
> + */
> + void (*on_config_changed)(struct ap_config_info *new_config_info,
> + struct ap_config_info *old_config_info);
> + /*
> + * Called at the end of the ap bus scan function when
> + * the crypto config information (qci) has changed.
> + */
> + void (*on_scan_complete)(struct ap_config_info *new_config_info,
> + struct ap_config_info *old_config_info);
> };
>
> #define to_ap_drv(x) container_of((x), struct ap_driver, driver)
> diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
> index df7528dcf6ed..5edd45d4d2fc 100644
> --- a/drivers/s390/crypto/vfio_ap_drv.c
> +++ b/drivers/s390/crypto/vfio_ap_drv.c
> @@ -45,6 +45,8 @@ static struct ap_driver vfio_ap_drv = {
> .probe = vfio_ap_mdev_probe_queue,
> .remove = vfio_ap_mdev_remove_queue,
> .in_use = vfio_ap_mdev_resource_in_use,
> + .on_config_changed = vfio_ap_on_cfg_changed,
> + .on_scan_complete = vfio_ap_on_scan_complete,
> .ids = ap_queue_ids,
> };
>
> @@ -92,7 +94,7 @@ static int vfio_ap_matrix_dev_create(void)
>
> /* Fill in config info via PQAP(QCI), if available */
> if (test_facility(12)) {
> - ret = ap_qci(&matrix_dev->info);
> + ret = ap_qci(&matrix_dev->config_info);
> if (ret)
> goto matrix_alloc_err;
> }
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 8075080ef2dd..cedf491c0df4 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -330,7 +330,7 @@ static bool vfio_ap_mdev_filter_cdoms(struct ap_matrix_mdev *matrix_mdev)
>
> bitmap_copy(shadow_adm, matrix_mdev->shadow_apcb.adm, AP_DOMAINS);
> bitmap_and(matrix_mdev->shadow_apcb.adm, matrix_mdev->matrix.adm,
> - (unsigned long *)matrix_dev->info.adm, AP_DOMAINS);
> + (unsigned long *)matrix_dev->config_info.adm, AP_DOMAINS);
>
> return !bitmap_equal(shadow_adm, matrix_mdev->shadow_apcb.adm,
> AP_DOMAINS);
> @@ -349,19 +349,15 @@ static bool vfio_ap_mdev_filter_cdoms(struct ap_matrix_mdev *matrix_mdev)
> */
> static bool vfio_ap_mdev_filter_matrix(struct ap_matrix_mdev *matrix_mdev)
> {
> - int ret;
> unsigned long apid, apqi, apqn;
> DECLARE_BITMAP(shadow_apm, AP_DEVICES);
> DECLARE_BITMAP(shadow_aqm, AP_DOMAINS);
> struct vfio_ap_queue *q;
>
> - ret = ap_qci(&matrix_dev->info);
> - if (ret)
> - return false;
> -
> bitmap_copy(shadow_apm, matrix_mdev->shadow_apcb.apm, AP_DEVICES);
> bitmap_copy(shadow_aqm, matrix_mdev->shadow_apcb.aqm, AP_DOMAINS);
> - vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->shadow_apcb);
> + vfio_ap_matrix_init(&matrix_dev->config_info,
> + &matrix_mdev->shadow_apcb);
>
> /*
> * Copy the adapters, domains and control domains to the shadow_apcb
> @@ -369,9 +365,9 @@ static bool vfio_ap_mdev_filter_matrix(struct ap_matrix_mdev *matrix_mdev)
> * AP configuration.
> */
> bitmap_and(matrix_mdev->shadow_apcb.apm, matrix_mdev->matrix.apm,
> - (unsigned long *)matrix_dev->info.apm, AP_DEVICES);
> + (unsigned long *)matrix_dev->config_info.apm, AP_DEVICES);
> bitmap_and(matrix_mdev->shadow_apcb.aqm, matrix_mdev->matrix.aqm,
> - (unsigned long *)matrix_dev->info.aqm, AP_DOMAINS);
> + (unsigned long *)matrix_dev->config_info.aqm, AP_DOMAINS);
>
> for_each_set_bit_inv(apid, matrix_mdev->shadow_apcb.apm, AP_DEVICES) {
> for_each_set_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm,
> @@ -417,8 +413,9 @@ static int vfio_ap_mdev_probe(struct mdev_device *mdev)
> &vfio_ap_matrix_dev_ops);
>
> matrix_mdev->mdev = mdev;
> - vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
> - vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->shadow_apcb);
> + vfio_ap_matrix_init(&matrix_dev->config_info, &matrix_mdev->matrix);
> + vfio_ap_matrix_init(&matrix_dev->config_info,
> + &matrix_mdev->shadow_apcb);
> hash_init(matrix_mdev->qtable.queues);
> mdev_set_drvdata(mdev, matrix_mdev);
> mutex_lock(&matrix_dev->lock);
> @@ -772,13 +769,17 @@ static void vfio_ap_unlink_apqn_fr_mdev(struct ap_matrix_mdev *matrix_mdev,
>
> q = vfio_ap_mdev_get_queue(matrix_mdev, AP_MKQID(apid, apqi));
> /* If the queue is assigned to the matrix mdev, unlink it. */
> - if (q)
> + if (q) {
> vfio_ap_unlink_queue_fr_mdev(q);
>
> - /* If the queue is assigned to the APCB, store it in @qtable. */
> - if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm) &&
> - test_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm))
> - hash_add(qtable->queues, &q->mdev_qnode, q->apqn);
> + /* If the queue is assigned to the APCB, store it in @qtable. */
> + if (qtable) {
> + if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm) &&
> + test_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm))
> + hash_add(qtable->queues, &q->mdev_qnode,
> + q->apqn);
> + }
> + }
> }
>
> /**
> @@ -1702,9 +1703,31 @@ static void vfio_ap_mdev_put_qlocks(struct ap_guest *guest)
> mutex_unlock(&guest->kvm->lock);
>
> mutex_unlock(&matrix_dev->lock);
> +
> up_read(&matrix_dev->guests_lock);
> }
>
> +static bool vfio_ap_mdev_do_filter_matrix(struct ap_matrix_mdev *matrix_mdev,
> + struct vfio_ap_queue *q)
> +{
> + unsigned long apid = AP_QID_CARD(q->apqn);
> + unsigned long apqi = AP_QID_QUEUE(q->apqn);
> +
> + /*
> + * If the queue is being probed because its APID or APQI is in the
> + * process of being added to the host's AP configuration, then we don't
> + * want to filter the matrix now as the filtering will be done after
> + * the driver is notified that the AP bus scan operation has completed
> + * (see the vfio_ap_on_scan_complete callback function).
> + */
> + if (test_bit_inv(apid, matrix_mdev->apm_add) ||
> + test_bit_inv(apqi, matrix_mdev->aqm_add))
> + return false;
> +
> +
> + return true;
> +}
> +
> int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
> {
> struct vfio_ap_queue *q;
> @@ -1722,8 +1745,10 @@ int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
> if (guest) {
> vfio_ap_mdev_link_queue(guest->matrix_mdev, q);
>
> - if (vfio_ap_mdev_filter_matrix(guest->matrix_mdev))
> - vfio_ap_mdev_hotplug_apcb(guest->matrix_mdev);
> + if (vfio_ap_mdev_do_filter_matrix(guest->matrix_mdev, q)) {
> + if (vfio_ap_mdev_filter_matrix(guest->matrix_mdev))
> + vfio_ap_mdev_hotplug_apcb(guest->matrix_mdev);
> + }
> } else {
> vfio_ap_queue_link_mdev(q);
> }
> @@ -1767,3 +1792,274 @@ int vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm)
>
> return ret;
> }
> +
> +/**
> + * vfio_ap_mdev_unlink_adapters - unlinks all queues from the matrix mdev with
> + * an APID of an adapter that has been removed from
> + * the host's AP configuration.
> + *
> + * @matrix_mdev: the matrix mdev from which the queues are to be unlinked
> + * @ap_unlink: a bitmap specifying the APIDs of the adapters removed from the
> + * host's AP configuration.
> + */
> +static void vfio_ap_mdev_unlink_adapters(struct ap_matrix_mdev *matrix_mdev,
> + unsigned long *ap_unlink)
> +{
> + unsigned long apid;
> +
> + for_each_set_bit_inv(apid, ap_unlink, AP_DEVICES)
> + vfio_ap_mdev_unlink_adapter(matrix_mdev, apid, NULL);
> +}
> +
> +/**
> + * vfio_ap_mdev_unlink_domains - unlinks all queues from the matrix mdev with an
> + * APQI of a domain that has been removed from the
> + * host's AP configuration.
> + *
> + * @matrix_mdev: the matrix mdev from which the queues are to be unlinked
> + * @aq_unlink: a bitmap specifying the APQIs of the domains removed from the
> + * host's AP configuration.
> + */
> +static void vfio_ap_mdev_unlink_domains(struct ap_matrix_mdev *matrix_mdev,
> + unsigned long *aq_unlink)
> +{
> + unsigned long apqi;
> +
> + for_each_set_bit_inv(apqi, aq_unlink, AP_DOMAINS)
> + vfio_ap_mdev_unlink_domain(matrix_mdev, apqi, NULL);
> +}
> +
> +/**
> + * vfio_ap_mdev_hot_unplug_cfg - hot unplug the adapters, domains and control
> + * domains that have been removed from the host's
> + * AP configuration from a guest.
> + *
> + * @guest: the guest
> + * @aprem: the adapters that have been removed from the host's AP configuration
> + * @aqrem: the domains that have been removed from the host's AP configuration
> + */
> +static void vfio_ap_mdev_hot_unplug_cfg(struct ap_guest *guest,
> + unsigned long *aprem,
> + unsigned long *aqrem)
> +{
> + vfio_ap_mdev_unlink_adapters(guest->matrix_mdev, aprem);
> + vfio_ap_mdev_unlink_domains(guest->matrix_mdev, aqrem);
> +
> + if (vfio_ap_mdev_filter_matrix(guest->matrix_mdev) ||
> + vfio_ap_mdev_filter_cdoms(guest->matrix_mdev)) {
> + mutex_lock(&guest->kvm->lock);
> + mutex_lock(&matrix_dev->lock);
> +
> + vfio_ap_mdev_hotplug_apcb(guest->matrix_mdev);
> +
> + mutex_unlock(&guest->kvm->lock);
> + mutex_unlock(&matrix_dev->lock);
> + }
> +}
> +
> +/**
> + * vfio_ap_mdev_cfg_remove - determines which guests are using the adapters,
> + * domains and control domains that have been removed
> + * from the host AP configuration and unplugs them
> + * from those guests.
> + *
> + * @ap_remove: bitmap specifying which adapters have been removed from the host
> + * config.
> + * @aq_remove: bitmap specifying which domains have been removed from the host
> + * config.
> + * @cd_remove: bitmap specifying which control domains have been removed from
> + * the host config.
> + */
> +static void vfio_ap_mdev_cfg_remove(unsigned long *ap_remove,
> + unsigned long *aq_remove,
> + unsigned long *cd_remove)
> +{
> + struct ap_guest *guest;
> + DECLARE_BITMAP(aprem, AP_DEVICES);
> + DECLARE_BITMAP(aqrem, AP_DOMAINS);
> + int do_ap_remove, do_aq_remove, do_cd_remove;
> +
> + list_for_each_entry(guest, &matrix_dev->guests, node) {
> + do_ap_remove = bitmap_and(aprem, ap_remove,
> + guest->matrix_mdev->matrix.apm,
> + AP_DEVICES);
> + do_aq_remove = bitmap_and(aqrem, aq_remove,
> + guest->matrix_mdev->matrix.aqm,
> + AP_DOMAINS);
> + do_cd_remove = bitmap_and(aqrem, cd_remove,
> + guest->matrix_mdev->matrix.aqm,
> + AP_DOMAINS);
> +
> + if (!do_ap_remove && !do_aq_remove && !do_cd_remove)
> + continue;
> +
> + vfio_ap_mdev_hot_unplug_cfg(guest, aprem, aqrem);
> + }
> +}
> +
> +/**
> + * vfio_ap_mdev_on_cfg_remove - responds to the removal of adapters, domains and
> + * control domains from the host AP configuration
> + * by unplugging them from the guests that are
> + * using them.
> + */
> +static void vfio_ap_mdev_on_cfg_remove(void)
> +{
> + int ap_remove, aq_remove, cd_remove;
> + DECLARE_BITMAP(aprem, AP_DEVICES);
> + DECLARE_BITMAP(aqrem, AP_DOMAINS);
> + DECLARE_BITMAP(cdrem, AP_DOMAINS);
> + unsigned long *cur_apm, *cur_aqm, *cur_adm;
> + unsigned long *prev_apm, *prev_aqm, *prev_adm;
> +
> + cur_apm = (unsigned long *)matrix_dev->config_info.apm;
> + cur_aqm = (unsigned long *)matrix_dev->config_info.aqm;
> + cur_adm = (unsigned long *)matrix_dev->config_info.adm;
> + prev_apm = (unsigned long *)matrix_dev->config_info_prev.apm;
> + prev_aqm = (unsigned long *)matrix_dev->config_info_prev.aqm;
> + prev_adm = (unsigned long *)matrix_dev->config_info_prev.adm;
> +
> + ap_remove = bitmap_andnot(aprem, prev_apm, cur_apm, AP_DEVICES);
> + aq_remove = bitmap_andnot(aqrem, prev_aqm, cur_aqm, AP_DOMAINS);
> + cd_remove = bitmap_andnot(cdrem, prev_adm, cur_adm, AP_DOMAINS);
> +
> + if (ap_remove || aq_remove || cd_remove)
> + vfio_ap_mdev_cfg_remove(aprem, aqrem, cdrem);
> +}
> +
> +/**
> + * vfio_ap_mdev_cfg_add - store bitmaps specifying the adapters, domains and
> + * control domains that have been added to the host's
> + * AP configuration for each matrix mdev to which they
> + * are assigned.
> + *
> + * @apm_add: a bitmap specifying the adapters that have been added to the AP
> + * configuration.
> + * @aqm_add: a bitmap specifying the domains that have been added to the AP
> + * configuration.
> + * @adm_add: a bitmap specifying the control domains that have been added to the
> + * AP configuration.
> + */
> +static void vfio_ap_mdev_cfg_add(unsigned long *apm_add, unsigned long *aqm_add,
> + unsigned long *adm_add)
> +{
> + struct ap_guest *guest;
> +
> + list_for_each_entry(guest, &matrix_dev->guests, node) {
> + bitmap_and(guest->matrix_mdev->apm_add,
> + guest->matrix_mdev->matrix.apm, apm_add, AP_DEVICES);
> + bitmap_and(guest->matrix_mdev->aqm_add,
> + guest->matrix_mdev->matrix.aqm, aqm_add, AP_DOMAINS);
> + bitmap_and(guest->matrix_mdev->adm_add,
> + guest->matrix_mdev->matrix.adm, adm_add, AP_DEVICES);
> + }
> +}
> +
> +/**
> + * vfio_ap_mdev_on_cfg_add - responds to the addition of adapters, domains and
> + * control domains to the host AP configuration
> + * by updating the bitmaps that specify what adapters,
> + * domains and control domains have been added so they
> + * can be hot plugged into the guest when the AP bus
> + * scan completes (see vfio_ap_on_scan_complete
> + * function).
> + */
> +static void vfio_ap_mdev_on_cfg_add(void)
> +{
> + bool do_add;
> + DECLARE_BITMAP(apm_add, AP_DEVICES);
> + DECLARE_BITMAP(aqm_add, AP_DOMAINS);
> + DECLARE_BITMAP(adm_add, AP_DOMAINS);
> + unsigned long *cur_apm, *cur_aqm, *cur_adm;
> + unsigned long *prev_apm, *prev_aqm, *prev_adm;
> +
> + cur_apm = (unsigned long *)matrix_dev->config_info.apm;
> + cur_aqm = (unsigned long *)matrix_dev->config_info.aqm;
> + cur_adm = (unsigned long *)matrix_dev->config_info.adm;
> +
> + prev_apm = (unsigned long *)matrix_dev->config_info_prev.apm;
> + prev_aqm = (unsigned long *)matrix_dev->config_info_prev.aqm;
> + prev_adm = (unsigned long *)matrix_dev->config_info_prev.adm;
> +
> + do_add = bitmap_andnot(apm_add, cur_apm, prev_apm, AP_DEVICES);
> + do_add |= bitmap_andnot(aqm_add, cur_aqm, prev_aqm, AP_DOMAINS);
> + do_add |= bitmap_andnot(adm_add, cur_adm, prev_adm, AP_DOMAINS);
> +
> + if (do_add)
> + vfio_ap_mdev_cfg_add(apm_add, aqm_add, adm_add);
> +}
> +
> +/**
> + * vfio_ap_on_cfg_changed - handles notification of changes to the host AP
> + * configuration.
> + *
> + * @new_config_info: the new host AP configuration
> + * @old_config_info: the previous host AP configuration
> + */
> +void vfio_ap_on_cfg_changed(struct ap_config_info *new_config_info,
> + struct ap_config_info *old_config_info)
> +{
> + down_read(&matrix_dev->guests_lock);
> +
> + memcpy(&matrix_dev->config_info_prev, old_config_info,
> + sizeof(struct ap_config_info));
> + memcpy(&matrix_dev->config_info, new_config_info,
> + sizeof(struct ap_config_info));
> + vfio_ap_mdev_on_cfg_remove();

Back to the topic of locking: it looks to me like on this path you
do the filtering, and thus access matrix_mdev->shadow_apcb,
matrix_mdev->matrix and matrix_dev->config_info (some of those accesses
being writes), without holding matrix_dev->lock. More precisely, only
matrix_dev->guests_lock is held, and only in "read" mode.

Did I misread the code? If not, how is that OK?
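
To spell out the concern: a lock held in "read" mode admits any number of
holders at once, so it cannot serialize writes among them. The toy model
below only illustrates those admission rules. It is single-threaded, the
toy_* names are hypothetical, and it is not the kernel's rw_semaphore.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy reader/writer lock state; not the kernel's struct rw_semaphore. */
struct toy_rwlock {
	int readers;	/* number of current shared holders */
	bool writer;	/* true while held exclusively */
};

/* Shared acquisition succeeds whenever no exclusive holder exists. */
static bool toy_read_trylock(struct toy_rwlock *l)
{
	if (l->writer)
		return false;
	l->readers++;
	return true;
}

/* Exclusive acquisition requires that nobody else holds the lock. */
static bool toy_write_trylock(struct toy_rwlock *l)
{
	if (l->writer || l->readers > 0)
		return false;
	l->writer = true;
	return true;
}

/* Drop one shared hold. */
static void toy_read_unlock(struct toy_rwlock *l)
{
	l->readers--;
}

/* Drop the exclusive hold. */
static void toy_write_unlock(struct toy_rwlock *l)
{
	l->writer = false;
}
```

Two callers can both pass toy_read_trylock() at the same time, so any
data they both modify needs a separate exclusive lock.
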

BTW I got delayed on my "locking rules" writeup. Sorry for that!

Regards,
Halil

> + vfio_ap_mdev_on_cfg_add();
> +
> + up_read(&matrix_dev->guests_lock);
> +}
> +
> +static void vfio_ap_mdev_hot_plug_cfg(struct ap_guest *guest)
> +{
> + bool filter_matrix, filter_cdoms, do_hotplug = false;
> +
> + filter_matrix = bitmap_intersects(guest->matrix_mdev->matrix.apm,
> + guest->matrix_mdev->apm_add,
> + AP_DEVICES) ||
> + bitmap_intersects(guest->matrix_mdev->matrix.aqm,
> + guest->matrix_mdev->aqm_add,
> + AP_DOMAINS);
> +
> + filter_cdoms = bitmap_intersects(guest->matrix_mdev->matrix.adm,
> + guest->matrix_mdev->aqm_add,
> + AP_DOMAINS);
> +
> + mutex_lock(&guest->kvm->lock);
> + mutex_lock(&matrix_dev->lock);
> +
> + if (filter_matrix)
> + do_hotplug |= vfio_ap_mdev_filter_matrix(guest->matrix_mdev);
> +
> + if (filter_cdoms)
> + do_hotplug |= vfio_ap_mdev_filter_cdoms(guest->matrix_mdev);
> +
> + if (do_hotplug)
> + vfio_ap_mdev_hotplug_apcb(guest->matrix_mdev);
> +
> + mutex_unlock(&matrix_dev->lock);
> + mutex_unlock(&guest->kvm->lock);
> +}
> +
> +void vfio_ap_on_scan_complete(struct ap_config_info *new_config_info,
> + struct ap_config_info *old_config_info)
> +{
> + struct ap_guest *guest;
> +
> + down_read(&matrix_dev->guests_lock);
> +
> + list_for_each_entry(guest, &matrix_dev->guests, node) {
> + if (bitmap_empty(guest->matrix_mdev->apm_add, AP_DEVICES) &&
> + bitmap_empty(guest->matrix_mdev->aqm_add, AP_DOMAINS) &&
> + bitmap_empty(guest->matrix_mdev->adm_add, AP_DOMAINS))
> + continue;
> +
> + vfio_ap_mdev_hot_plug_cfg(guest);
> + bitmap_clear(guest->matrix_mdev->apm_add, 0, AP_DEVICES);
> + bitmap_clear(guest->matrix_mdev->aqm_add, 0, AP_DOMAINS);
> + bitmap_clear(guest->matrix_mdev->adm_add, 0, AP_DOMAINS);
> + }
> +
> + up_read(&matrix_dev->guests_lock);
> +}
> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
> index 97da41f87c65..affa63da7f88 100644
> --- a/drivers/s390/crypto/vfio_ap_private.h
> +++ b/drivers/s390/crypto/vfio_ap_private.h
> @@ -37,7 +37,9 @@ struct ap_guest {
> *
> * @device: generic device structure associated with the AP matrix device
> * @available_instances: number of mediated matrix devices that can be created
> - * @info: the struct containing the output from the PQAP(QCI) instruction
> + * @config_info: the struct containing the output from the PQAP(QCI) instruction
> + * @config_info_prev: the struct containing the previous output from the
> + * PQAP(QCI) instruction
> * @mdev_list: the list of mediated matrix devices created
> * @lock: mutex for locking the AP matrix device. This lock will be
> * taken every time we fiddle with state managed by the vfio_ap
> @@ -52,7 +54,8 @@ struct ap_guest {
> struct ap_matrix_dev {
> struct device device;
> atomic_t available_instances;
> - struct ap_config_info info;
> + struct ap_config_info config_info;
> + struct ap_config_info config_info_prev;
> struct list_head mdev_list;
> struct mutex lock;
> struct ap_driver *vfio_ap_drv;
> @@ -110,6 +113,13 @@ struct ap_queue_table {
> * @mdev: the mediated device
> * @qtable: table of queues (struct vfio_ap_queue) assigned to the mdev
> * @guest: the KVM guest using the matrix mdev
> + * @apm_add: adapters to be hot plugged into the guest when the vfio_ap
> + * device driver is notified that the AP bus scan has completed.
> + * @aqm_add: domains to be hot plugged into the guest when the vfio_ap
> + * device driver is notified that the AP bus scan has completed.
> + * @adm_add: control domains to be hot plugged into the guest when the
> + * vfio_ap device driver is notified that the AP bus scan has
> + * completed.
> */
> struct ap_matrix_mdev {
> struct vfio_device vdev;
> @@ -121,6 +131,9 @@ struct ap_matrix_mdev {
> struct mdev_device *mdev;
> struct ap_queue_table qtable;
> struct ap_guest *guest;
> + DECLARE_BITMAP(apm_add, AP_DEVICES);
> + DECLARE_BITMAP(aqm_add, AP_DOMAINS);
> + DECLARE_BITMAP(adm_add, AP_DOMAINS);
> };
>
> /**
> @@ -151,4 +164,10 @@ void vfio_ap_mdev_remove_queue(struct ap_device *queue);
>
> int vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm);
>
> +
> +void vfio_ap_on_cfg_changed(struct ap_config_info *new_config_info,
> + struct ap_config_info *old_config_info);
> +void vfio_ap_on_scan_complete(struct ap_config_info *new_config_info,
> + struct ap_config_info *old_config_info);
> +
> #endif /* _VFIO_AP_PRIVATE_H_ */


2022-02-09 08:27:44

by Anthony Krowiak

Subject: Re: [PATCH v17 14/15] s390/ap: notify drivers on config changed and scan complete callbacks



On 2/7/22 20:38, Halil Pasic wrote:
> On Mon, 7 Feb 2022 14:39:31 -0500
> Tony Krowiak <[email protected]> wrote:
>
>>> Back to the topic of locking: it looks to me that on this path you
>>> do the filtering and thus the accesses to matrix_mdev->shadow_apcb,
>>> matrix_mdev->matrix and matrix_dev->config_info some of which are
>>> of type write without the matrix_dev->lock held. More precisely
>>> only with the matrix_dev->guests_lock held in "read" mode.
>>>
>>> Did I misread the code? If not, how is that OK?
>> You make a valid point, a struct rw_semaphore is not adequate for the
>> purposes
>> it is used in this patch series. It needs to be a mutex.
>>
> Good we agree that v17 is racy.
>
>> For v18 which is forthcoming probably this week, I've been reworking the
>> locking
>> based on your observation that the struct ap_guest is not necessary given we
>> already have a list of the mediated devices which contain the KVM
>> pointer. On the other
> [..]
>>> BTW I got delayed on my "locking rules" writeup. Sorry for that!
>> No worries, I've been writing up a vfio-ap-locking.rst document to
>> include with the next
>> version of the patch series.
> I'm looking forward to v18 including that document. I prefer not to
> discuss what you wrote about the approach taken in v18 now. It is easier
> to me when I have both the text stating the intended design, and the
> code that is supposed to adhere to this design.
>
> Regards,
> Halil

Coming soon to a theater near you:)

>


2022-02-09 08:49:04

by Anthony Krowiak

Subject: Re: [PATCH v17 14/15] s390/ap: notify drivers on config changed and scan complete callbacks



On 2/4/22 05:43, Halil Pasic wrote:
> On Thu, 21 Oct 2021 11:23:31 -0400
> Tony Krowiak <[email protected]> wrote:
>
>> This patch introduces an extension to the ap bus to notify device drivers
>> when the host AP configuration changes - i.e., adapters, domains or
>> control domains are added or removed. When an adapter or domain is added to
>> the host's AP configuration, the AP bus will create the associated queue
>> devices in the linux sysfs device model. Each new type 10 (i.e., CEX4) or
>> newer queue device with an APQN that is not reserved for the default device
>> driver will get bound to the vfio_ap device driver. Likewise, when an
>> adapter or domain is removed from the host's AP configuration, the AP bus
>> will remove the associated queue devices from the sysfs device model. Each
>> of the queues that is bound to the vfio_ap device driver will get unbound.
>>
>> With the introduction of hot plug support, binding or unbinding of a
>> queue device will result in plugging or unplugging one or more queues from
>> a guest that is using the queue. If there are multiple changes to the
>> host's AP configuration, it could result in the probe and remove callbacks
>> getting invoked multiple times. Each time queues are plugged into or
>> unplugged from a guest, the guest's VCPUs must be taken out of SIE.
>> If this occurs multiple times due to changes in the host's AP
>> configuration, that can have an undesirable negative effect on the guest's
>> performance.
>>
>> To alleviate this problem, this patch introduces two new callbacks: one to
>> notify the vfio_ap device driver when the AP bus scan routine detects a
>> change to the host's AP configuration; and, one to notify the driver when
>> the AP bus is done scanning. This will allow the vfio_ap driver to do
>> bulk processing of all affected adapters, domains and control domains for
>> affected guests rather than plugging or unplugging them one at a time when
>> the probe or remove callback is invoked. The two new callbacks are:
>>
>> void (*on_config_changed)(struct ap_config_info *new_config_info,
>> struct ap_config_info *old_config_info);
>>
>> This callback is invoked at the start of the AP bus scan
>> function when it determines that the host AP configuration information
>> has changed since the previous scan. This is done by storing
>> an old and current QCI info struct and comparing them. If there is any
>> difference, the callback is invoked.
>>
>> The vfio_ap device driver registers a callback function for this callback
>> that performs the following operations:
>>
>> 1. Unplugs the adapters, domains and control domains removed from the
>> host's AP configuration from the guests to which they are
>> assigned in a single operation.
>>
>> 2. Disconnects the links between each queue structure representing a
>> queue that was unplugged from the structure representing
>> the mediated device to which the queue is assigned. Thus, when the
>> vfio_ap device driver's remove callback is invoked, the unplugging of
>> the queue from the guest and the unlinking of the queue structure from
>> the mediated device structure will be bypassed because the queues and
>> control domains will have already been unplugged in bulk.
>>
>> 3. Stores bitmaps identifying the adapters, domains and control domains
>> added to the host's AP configuration with the structure representing
>> the mediated device. When the vfio_ap device driver's probe callback is
>> subsequently invoked, the probe function will recognize that the
>> queue is being probed due to a change in the host's AP configuration
>> and the plugging of the queue into the guest will be bypassed.
>>
>> void (*on_scan_complete)(struct ap_config_info *new_config_info,
>> struct ap_config_info *old_config_info);
>>
>> The on_scan_complete callback is invoked after the ap bus scan is
>> completed if the host AP configuration data has changed. The vfio_ap
>> device driver registers a callback function for this callback that hot
>> plugs each queue and control domain added to the AP configuration for each
>> guest using them in a single hot plug operation.
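
The added/removed sets described above come down to two and-not
operations on the old and new masks, followed by an AND with each guest's
assignments. A minimal sketch, assuming a single 64-bit mask with ordinary
bit numbering instead of the 256-bit, inverted-bit-order bitmaps the
driver uses; cfg_removed(), cfg_added() and guest_affected() are
hypothetical names, not functions from the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Resources present before but not now: prev AND NOT cur. */
static uint64_t cfg_removed(uint64_t prev, uint64_t cur)
{
	return prev & ~cur;
}

/* Resources present now but not before: cur AND NOT prev. */
static uint64_t cfg_added(uint64_t prev, uint64_t cur)
{
	return cur & ~prev;
}

/* Restrict a host-level change to what one guest's matrix has assigned. */
static uint64_t guest_affected(uint64_t change, uint64_t guest_mask)
{
	return change & guest_mask;
}
```

With prev = 0x6 (adapters 1 and 2) and cur = 0xC (adapters 2 and 3),
cfg_removed() yields 0x2 and cfg_added() yields 0x8. A guest whose matrix
holds only adapter 2 (0x4) is unaffected by either change.
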
>>
>> Signed-off-by: Harald Freudenberger <[email protected]>
>> [[email protected]: implemented callback functions in vfio_ap driver]
>> Signed-off-by: Tony Krowiak <[email protected]>
>> ---
>> drivers/s390/crypto/ap_bus.c | 81 ++++++-
>> drivers/s390/crypto/ap_bus.h | 12 +
>> drivers/s390/crypto/vfio_ap_drv.c | 4 +-
>> drivers/s390/crypto/vfio_ap_ops.c | 332 ++++++++++++++++++++++++--
>> drivers/s390/crypto/vfio_ap_private.h | 23 +-
>> 5 files changed, 429 insertions(+), 23 deletions(-)
>>
>> diff --git a/drivers/s390/crypto/ap_bus.c b/drivers/s390/crypto/ap_bus.c
>> index 15886610f61a..b97149d02da6 100644
>> --- a/drivers/s390/crypto/ap_bus.c
>> +++ b/drivers/s390/crypto/ap_bus.c
>> @@ -88,6 +88,7 @@ static atomic64_t ap_bindings_complete_count = ATOMIC64_INIT(0);
>> static DECLARE_COMPLETION(ap_init_apqn_bindings_complete);
>>
>> static struct ap_config_info *ap_qci_info;
>> +static struct ap_config_info *ap_qci_info_old;
>>
>> /*
>> * AP bus related debug feature things.
>> @@ -225,9 +226,14 @@ static void __init ap_init_qci_info(void)
>> ap_qci_info = kzalloc(sizeof(*ap_qci_info), GFP_KERNEL);
>> if (!ap_qci_info)
>> return;
>> + ap_qci_info_old = kzalloc(sizeof(*ap_qci_info_old), GFP_KERNEL);
>> + if (!ap_qci_info_old)
>> + return;
>> if (ap_fetch_qci_info(ap_qci_info) != 0) {
>> kfree(ap_qci_info);
>> + kfree(ap_qci_info_old);
>> ap_qci_info = NULL;
>> + ap_qci_info_old = NULL;
>> return;
>> }
>> AP_DBF_INFO("%s successful fetched initial qci info\n", __func__);
>> @@ -244,6 +250,8 @@ static void __init ap_init_qci_info(void)
>> __func__, ap_max_domain_id);
>> }
>> }
>> +
>> + memcpy(ap_qci_info_old, ap_qci_info, sizeof(*ap_qci_info));
>> }
>>
>> /*
>> @@ -1635,6 +1643,49 @@ static int __match_queue_device_with_queue_id(struct device *dev, const void *da
>> && AP_QID_QUEUE(to_ap_queue(dev)->qid) == (int)(long) data;
>> }
>>
>> +/* Helper function for notify_config_changed */
>> +static int __drv_notify_config_changed(struct device_driver *drv, void *data)
>> +{
>> + struct ap_driver *ap_drv = to_ap_drv(drv);
>> +
>> + if (try_module_get(drv->owner)) {
>> + if (ap_drv->on_config_changed)
>> + ap_drv->on_config_changed(ap_qci_info, ap_qci_info_old);
>> + module_put(drv->owner);
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +/* Notify all drivers about an qci config change */
>> +static inline void notify_config_changed(void)
>> +{
>> + bus_for_each_drv(&ap_bus_type, NULL, NULL,
>> + __drv_notify_config_changed);
>> +}
>> +
>> +/* Helper function for notify_scan_complete */
>> +static int __drv_notify_scan_complete(struct device_driver *drv, void *data)
>> +{
>> + struct ap_driver *ap_drv = to_ap_drv(drv);
>> +
>> + if (try_module_get(drv->owner)) {
>> + if (ap_drv->on_scan_complete)
>> + ap_drv->on_scan_complete(ap_qci_info,
>> + ap_qci_info_old);
>> + module_put(drv->owner);
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +/* Notify all drivers about bus scan complete */
>> +static inline void notify_scan_complete(void)
>> +{
>> + bus_for_each_drv(&ap_bus_type, NULL, NULL,
>> + __drv_notify_scan_complete);
>> +}
>> +
>> /*
>> * Helper function for ap_scan_bus().
>> * Remove card device and associated queue devices.
>> @@ -1923,6 +1974,25 @@ static inline void ap_scan_adapter(int ap)
>> put_device(&ac->ap_dev.device);
>> }
>>
>> +/**
>> + * ap_get_configuration - get the host AP configuration
>> + *
>> + * Stores the host AP configuration information returned from the previous call
>> + * to Query Configuration Information (QCI), then retrieves and stores the
>> + * current AP configuration returned from QCI.
>> + *
>> + * Return: true if the host AP configuration changed between calls to QCI;
>> + * otherwise, return false.
>> + */
>> +static bool ap_get_configuration(void)
>> +{
>> + memcpy(ap_qci_info_old, ap_qci_info, sizeof(*ap_qci_info));
>> + ap_fetch_qci_info(ap_qci_info);
>> +
>> + return memcmp(ap_qci_info, ap_qci_info_old,
>> + sizeof(struct ap_config_info)) != 0;
>> +}
>> +
>> /**
>> * ap_scan_bus(): Scan the AP bus for new devices
>> * Runs periodically, workqueue timer (ap_config_time)
>> @@ -1930,9 +2000,12 @@ static inline void ap_scan_adapter(int ap)
>> */
>> static void ap_scan_bus(struct work_struct *unused)
>> {
>> - int ap;
>> + int ap, config_changed = 0;
>>
>> - ap_fetch_qci_info(ap_qci_info);
>> + /* config change notify */
>> + config_changed = ap_get_configuration();
>> + if (config_changed)
>> + notify_config_changed();
>> ap_select_domain();
>>
>> AP_DBF_DBG("%s running\n", __func__);
>> @@ -1941,6 +2014,10 @@ static void ap_scan_bus(struct work_struct *unused)
>> for (ap = 0; ap <= ap_max_adapter_id; ap++)
>> ap_scan_adapter(ap);
>>
>> + /* scan complete notify */
>> + if (config_changed)
>> + notify_scan_complete();
>> +
>> /* check if there is at least one queue available with default domain */
>> if (ap_domain_index >= 0) {
>> struct device *dev =
>> diff --git a/drivers/s390/crypto/ap_bus.h b/drivers/s390/crypto/ap_bus.h
>> index 67c1bef60ad5..4de062ea6b76 100644
>> --- a/drivers/s390/crypto/ap_bus.h
>> +++ b/drivers/s390/crypto/ap_bus.h
>> @@ -143,6 +143,18 @@ struct ap_driver {
>> int (*probe)(struct ap_device *);
>> void (*remove)(struct ap_device *);
>> int (*in_use)(unsigned long *apm, unsigned long *aqm);
>> + /*
>> + * Called at the start of the ap bus scan function when
>> + * the crypto config information (qci) has changed.
>> + */
>> + void (*on_config_changed)(struct ap_config_info *new_config_info,
>> + struct ap_config_info *old_config_info);
>> + /*
>> + * Called at the end of the ap bus scan function when
>> + * the crypto config information (qci) has changed.
>> + */
>> + void (*on_scan_complete)(struct ap_config_info *new_config_info,
>> + struct ap_config_info *old_config_info);
>> };
>>
>> #define to_ap_drv(x) container_of((x), struct ap_driver, driver)
>> diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
>> index df7528dcf6ed..5edd45d4d2fc 100644
>> --- a/drivers/s390/crypto/vfio_ap_drv.c
>> +++ b/drivers/s390/crypto/vfio_ap_drv.c
>> @@ -45,6 +45,8 @@ static struct ap_driver vfio_ap_drv = {
>> .probe = vfio_ap_mdev_probe_queue,
>> .remove = vfio_ap_mdev_remove_queue,
>> .in_use = vfio_ap_mdev_resource_in_use,
>> + .on_config_changed = vfio_ap_on_cfg_changed,
>> + .on_scan_complete = vfio_ap_on_scan_complete,
>> .ids = ap_queue_ids,
>> };
>>
>> @@ -92,7 +94,7 @@ static int vfio_ap_matrix_dev_create(void)
>>
>> /* Fill in config info via PQAP(QCI), if available */
>> if (test_facility(12)) {
>> - ret = ap_qci(&matrix_dev->info);
>> + ret = ap_qci(&matrix_dev->config_info);
>> if (ret)
>> goto matrix_alloc_err;
>> }
>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>> index 8075080ef2dd..cedf491c0df4 100644
>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -330,7 +330,7 @@ static bool vfio_ap_mdev_filter_cdoms(struct ap_matrix_mdev *matrix_mdev)
>>
>> bitmap_copy(shadow_adm, matrix_mdev->shadow_apcb.adm, AP_DOMAINS);
>> bitmap_and(matrix_mdev->shadow_apcb.adm, matrix_mdev->matrix.adm,
>> - (unsigned long *)matrix_dev->info.adm, AP_DOMAINS);
>> + (unsigned long *)matrix_dev->config_info.adm, AP_DOMAINS);
>>
>> return !bitmap_equal(shadow_adm, matrix_mdev->shadow_apcb.adm,
>> AP_DOMAINS);
>> @@ -349,19 +349,15 @@ static bool vfio_ap_mdev_filter_cdoms(struct ap_matrix_mdev *matrix_mdev)
>> */
>> static bool vfio_ap_mdev_filter_matrix(struct ap_matrix_mdev *matrix_mdev)
>> {
>> - int ret;
>> unsigned long apid, apqi, apqn;
>> DECLARE_BITMAP(shadow_apm, AP_DEVICES);
>> DECLARE_BITMAP(shadow_aqm, AP_DOMAINS);
>> struct vfio_ap_queue *q;
>>
>> - ret = ap_qci(&matrix_dev->info);
>> - if (ret)
>> - return false;
>> -
>> bitmap_copy(shadow_apm, matrix_mdev->shadow_apcb.apm, AP_DEVICES);
>> bitmap_copy(shadow_aqm, matrix_mdev->shadow_apcb.aqm, AP_DOMAINS);
>> - vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->shadow_apcb);
>> + vfio_ap_matrix_init(&matrix_dev->config_info,
>> + &matrix_mdev->shadow_apcb);
>>
>> /*
>> * Copy the adapters, domains and control domains to the shadow_apcb
>> @@ -369,9 +365,9 @@ static bool vfio_ap_mdev_filter_matrix(struct ap_matrix_mdev *matrix_mdev)
>> * AP configuration.
>> */
>> bitmap_and(matrix_mdev->shadow_apcb.apm, matrix_mdev->matrix.apm,
>> - (unsigned long *)matrix_dev->info.apm, AP_DEVICES);
>> + (unsigned long *)matrix_dev->config_info.apm, AP_DEVICES);
>> bitmap_and(matrix_mdev->shadow_apcb.aqm, matrix_mdev->matrix.aqm,
>> - (unsigned long *)matrix_dev->info.aqm, AP_DOMAINS);
>> + (unsigned long *)matrix_dev->config_info.aqm, AP_DOMAINS);
>>
>> for_each_set_bit_inv(apid, matrix_mdev->shadow_apcb.apm, AP_DEVICES) {
>> for_each_set_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm,
>> @@ -417,8 +413,9 @@ static int vfio_ap_mdev_probe(struct mdev_device *mdev)
>> &vfio_ap_matrix_dev_ops);
>>
>> matrix_mdev->mdev = mdev;
>> - vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
>> - vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->shadow_apcb);
>> + vfio_ap_matrix_init(&matrix_dev->config_info, &matrix_mdev->matrix);
>> + vfio_ap_matrix_init(&matrix_dev->config_info,
>> + &matrix_mdev->shadow_apcb);
>> hash_init(matrix_mdev->qtable.queues);
>> mdev_set_drvdata(mdev, matrix_mdev);
>> mutex_lock(&matrix_dev->lock);
>> @@ -772,13 +769,17 @@ static void vfio_ap_unlink_apqn_fr_mdev(struct ap_matrix_mdev *matrix_mdev,
>>
>> q = vfio_ap_mdev_get_queue(matrix_mdev, AP_MKQID(apid, apqi));
>> /* If the queue is assigned to the matrix mdev, unlink it. */
>> - if (q)
>> + if (q) {
>> vfio_ap_unlink_queue_fr_mdev(q);
>>
>> - /* If the queue is assigned to the APCB, store it in @qtable. */
>> - if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm) &&
>> - test_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm))
>> - hash_add(qtable->queues, &q->mdev_qnode, q->apqn);
>> + /* If the queue is assigned to the APCB, store it in @qtable. */
>> + if (qtable) {
>> + if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm) &&
>> + test_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm))
>> + hash_add(qtable->queues, &q->mdev_qnode,
>> + q->apqn);
>> + }
>> + }
>> }
>>
>> /**
>> @@ -1702,9 +1703,31 @@ static void vfio_ap_mdev_put_qlocks(struct ap_guest *guest)
>> mutex_unlock(&guest->kvm->lock);
>>
>> mutex_unlock(&matrix_dev->lock);
>> +
>> up_read(&matrix_dev->guests_lock);
>> }
>>
>> +static bool vfio_ap_mdev_do_filter_matrix(struct ap_matrix_mdev *matrix_mdev,
>> + struct vfio_ap_queue *q)
>> +{
>> + unsigned long apid = AP_QID_CARD(q->apqn);
>> + unsigned long apqi = AP_QID_QUEUE(q->apqn);
>> +
>> + /*
>> + * If the queue is being probed because its APID or APQI is in the
>> + * process of being added to the host's AP configuration, then we don't
>> + * want to filter the matrix now as the filtering will be done after
>> + * the driver is notified that the AP bus scan operation has completed
>> + * (see the vfio_ap_on_scan_complete callback function).
>> + */
>> + if (test_bit_inv(apid, matrix_mdev->apm_add) ||
>> + test_bit_inv(apqi, matrix_mdev->aqm_add))
>> + return false;
>> +
>> +
>> + return true;
>> +}
>> +
>> int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
>> {
>> struct vfio_ap_queue *q;
>> @@ -1722,8 +1745,10 @@ int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
>> if (guest) {
>> vfio_ap_mdev_link_queue(guest->matrix_mdev, q);
>>
>> - if (vfio_ap_mdev_filter_matrix(guest->matrix_mdev))
>> - vfio_ap_mdev_hotplug_apcb(guest->matrix_mdev);
>> + if (vfio_ap_mdev_do_filter_matrix(guest->matrix_mdev, q)) {
>> + if (vfio_ap_mdev_filter_matrix(guest->matrix_mdev))
>> + vfio_ap_mdev_hotplug_apcb(guest->matrix_mdev);
>> + }
>> } else {
>> vfio_ap_queue_link_mdev(q);
>> }
>> @@ -1767,3 +1792,274 @@ int vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm)
>>
>> return ret;
>> }
>> +
>> +/**
>> + * vfio_ap_mdev_unlink_adapters - unlinks all queues from the matrix mdev with
>> + * an APID of an adapter that has been removed
>> + * from the host's AP configuration.
>> + *
>> + * @matrix_mdev: the matrix mdev from which the queues are to be unlinked
>> + * @ap_unlink: a bitmap specifying the APIDs of the adapters removed from the
>> + * host's AP configuration.
>> + */
>> +static void vfio_ap_mdev_unlink_adapters(struct ap_matrix_mdev *matrix_mdev,
>> + unsigned long *ap_unlink)
>> +{
>> + unsigned long apid;
>> +
>> + for_each_set_bit_inv(apid, ap_unlink, AP_DEVICES)
>> + vfio_ap_mdev_unlink_adapter(matrix_mdev, apid, NULL);
>> +}
>> +
>> +/**
>> + * vfio_ap_mdev_unlink_domains - unlinks all queues from the matrix mdev with an
>> + * APQI of a domain that has been removed from the
>> + * host's AP configuration.
>> + *
>> + * @matrix_mdev: the matrix mdev from which the queues are to be unlinked
>> + * @aq_unlink: a bitmap specifying the APQIs of the domains removed from the
>> + * host's AP configuration.
>> + */
>> +static void vfio_ap_mdev_unlink_domains(struct ap_matrix_mdev *matrix_mdev,
>> + unsigned long *aq_unlink)
>> +{
>> + unsigned long apqi;
>> +
>> + for_each_set_bit_inv(apqi, aq_unlink, AP_DOMAINS)
>> + vfio_ap_mdev_unlink_domain(matrix_mdev, apqi, NULL);
>> +}
>> +
>> +/**
>> + * vfio_ap_mdev_hot_unplug_cfg - hot unplug the adapters, domains and control
>> + * domains that have been removed from the host's
>> + * AP configuration from a guest.
>> + *
>> + * @guest: the guest
>> + * @aprem: the adapters that have been removed from the host's AP configuration
>> + * @aqrem: the domains that have been removed from the host's AP configuration
>> + */
>> +static void vfio_ap_mdev_hot_unplug_cfg(struct ap_guest *guest,
>> + unsigned long *aprem,
>> + unsigned long *aqrem)
>> +{
>> + vfio_ap_mdev_unlink_adapters(guest->matrix_mdev, aprem);
>> + vfio_ap_mdev_unlink_domains(guest->matrix_mdev, aqrem);
>> +
>> + if (vfio_ap_mdev_filter_matrix(guest->matrix_mdev) ||
>> + vfio_ap_mdev_filter_cdoms(guest->matrix_mdev)) {
>> + mutex_lock(&guest->kvm->lock);
>> + mutex_lock(&matrix_dev->lock);
>> +
>> + vfio_ap_mdev_hotplug_apcb(guest->matrix_mdev);
>> +
>> + mutex_unlock(&guest->kvm->lock);
>> + mutex_unlock(&matrix_dev->lock);
>> + }
>> +}
>> +
>> +/**
>> + * vfio_ap_mdev_cfg_remove - determines which guests are using the adapters,
>> + * domains and control domains that have been removed
>> + * from the host AP configuration and unplugs them
>> + * from those guests.
>> + *
>> + * @ap_remove: bitmap specifying which adapters have been removed from the host
>> + * config.
>> + * @aq_remove: bitmap specifying which domains have been removed from the host
>> + * config.
>> + * @cd_remove: bitmap specifying which control domains have been removed from
>> + * the host config.
>> + */
>> +static void vfio_ap_mdev_cfg_remove(unsigned long *ap_remove,
>> + unsigned long *aq_remove,
>> + unsigned long *cd_remove)
>> +{
>> + struct ap_guest *guest;
>> + DECLARE_BITMAP(aprem, AP_DEVICES);
>> + DECLARE_BITMAP(aqrem, AP_DOMAINS);
>> + int do_ap_remove, do_aq_remove, do_cd_remove;
>> +
>> + list_for_each_entry(guest, &matrix_dev->guests, node) {
>> + do_ap_remove = bitmap_and(aprem, ap_remove,
>> + guest->matrix_mdev->matrix.apm,
>> + AP_DEVICES);
>> + do_aq_remove = bitmap_and(aqrem, aq_remove,
>> + guest->matrix_mdev->matrix.aqm,
>> + AP_DOMAINS);
>> + do_cd_remove = bitmap_and(aqrem, cd_remove,
>> + guest->matrix_mdev->matrix.aqm,
>> + AP_DOMAINS);
>> +
>> + if (!do_ap_remove && !do_aq_remove && !do_cd_remove)
>> + continue;
>> +
>> + vfio_ap_mdev_hot_unplug_cfg(guest, aprem, aqrem);
>> + }
>> +}
>> +
>> +/**
>> + * vfio_ap_mdev_on_cfg_remove - responds to the removal of adapters, domains and
>> + * control domains from the host AP configuration
>> + * by unplugging them from the guests that are
>> + * using them.
>> + */
>> +static void vfio_ap_mdev_on_cfg_remove(void)
>> +{
>> + int ap_remove, aq_remove, cd_remove;
>> + DECLARE_BITMAP(aprem, AP_DEVICES);
>> + DECLARE_BITMAP(aqrem, AP_DOMAINS);
>> + DECLARE_BITMAP(cdrem, AP_DOMAINS);
>> + unsigned long *cur_apm, *cur_aqm, *cur_adm;
>> + unsigned long *prev_apm, *prev_aqm, *prev_adm;
>> +
>> + cur_apm = (unsigned long *)matrix_dev->config_info.apm;
>> + cur_aqm = (unsigned long *)matrix_dev->config_info.aqm;
>> + cur_adm = (unsigned long *)matrix_dev->config_info.adm;
>> + prev_apm = (unsigned long *)matrix_dev->config_info_prev.apm;
>> + prev_aqm = (unsigned long *)matrix_dev->config_info_prev.aqm;
>> + prev_adm = (unsigned long *)matrix_dev->config_info_prev.adm;
>> +
>> + ap_remove = bitmap_andnot(aprem, prev_apm, cur_apm, AP_DEVICES);
>> + aq_remove = bitmap_andnot(aqrem, prev_aqm, cur_aqm, AP_DOMAINS);
>> + cd_remove = bitmap_andnot(cdrem, prev_adm, cur_adm, AP_DOMAINS);
>> +
>> + if (ap_remove || aq_remove || cd_remove)
>> + vfio_ap_mdev_cfg_remove(aprem, aqrem, cdrem);
>> +}
>> +
>> +/**
>> + * vfio_ap_mdev_cfg_add - store bitmaps specifying the adapters, domains and
>> + * control domains that have been added to the host's
>> + * AP configuration for each matrix mdev to which they
>> + * are assigned.
>> + *
>> + * @apm_add: a bitmap specifying the adapters that have been added to the AP
>> + * configuration.
>> + * @aqm_add: a bitmap specifying the domains that have been added to the AP
>> + * configuration.
>> + * @adm_add: a bitmap specifying the control domains that have been added to the
>> + * AP configuration.
>> + */
>> +static void vfio_ap_mdev_cfg_add(unsigned long *apm_add, unsigned long *aqm_add,
>> + unsigned long *adm_add)
>> +{
>> + struct ap_guest *guest;
>> +
>> + list_for_each_entry(guest, &matrix_dev->guests, node) {
>> + bitmap_and(guest->matrix_mdev->apm_add,
>> + guest->matrix_mdev->matrix.apm, apm_add, AP_DEVICES);
>> + bitmap_and(guest->matrix_mdev->aqm_add,
>> + guest->matrix_mdev->matrix.aqm, aqm_add, AP_DOMAINS);
>> + bitmap_and(guest->matrix_mdev->adm_add,
>> + guest->matrix_mdev->matrix.adm, adm_add, AP_DOMAINS);
>> + }
>> +}
>> +
>> +/**
>> + * vfio_ap_mdev_on_cfg_add - responds to the addition of adapters, domains and
>> + * control domains to the host AP configuration
>> + * by updating the bitmaps that specify what adapters,
>> + * domains and control domains have been added so they
>> + * can be hot plugged into the guest when the AP bus
>> + * scan completes (see vfio_ap_on_scan_complete
>> + * function).
>> + */
>> +static void vfio_ap_mdev_on_cfg_add(void)
>> +{
>> + bool do_add;
>> + DECLARE_BITMAP(apm_add, AP_DEVICES);
>> + DECLARE_BITMAP(aqm_add, AP_DOMAINS);
>> + DECLARE_BITMAP(adm_add, AP_DOMAINS);
>> + unsigned long *cur_apm, *cur_aqm, *cur_adm;
>> + unsigned long *prev_apm, *prev_aqm, *prev_adm;
>> +
>> + cur_apm = (unsigned long *)matrix_dev->config_info.apm;
>> + cur_aqm = (unsigned long *)matrix_dev->config_info.aqm;
>> + cur_adm = (unsigned long *)matrix_dev->config_info.adm;
>> +
>> + prev_apm = (unsigned long *)matrix_dev->config_info_prev.apm;
>> + prev_aqm = (unsigned long *)matrix_dev->config_info_prev.aqm;
>> + prev_adm = (unsigned long *)matrix_dev->config_info_prev.adm;
>> +
>> + do_add = bitmap_andnot(apm_add, cur_apm, prev_apm, AP_DEVICES);
>> + do_add |= bitmap_andnot(aqm_add, cur_aqm, prev_aqm, AP_DOMAINS);
>> + do_add |= bitmap_andnot(adm_add, cur_adm, prev_adm, AP_DOMAINS);
>> +
>> + if (do_add)
>> + vfio_ap_mdev_cfg_add(apm_add, aqm_add, adm_add);
>> +}
>> +
>> +/**
>> + * vfio_ap_on_cfg_changed - handles notification of changes to the host AP
>> + * configuration.
>> + *
>> + * @new_config_info: the new host AP configuration
>> + * @old_config_info: the previous host AP configuration
>> + */
>> +void vfio_ap_on_cfg_changed(struct ap_config_info *new_config_info,
>> + struct ap_config_info *old_config_info)
>> +{
>> + down_read(&matrix_dev->guests_lock);
>> +
>> + memcpy(&matrix_dev->config_info_prev, old_config_info,
>> + sizeof(struct ap_config_info));
>> + memcpy(&matrix_dev->config_info, new_config_info,
>> + sizeof(struct ap_config_info));
>> + vfio_ap_mdev_on_cfg_remove();
> Back to the topic of locking: it looks to me that on this path you
> do the filtering and thus the accesses to matrix_mdev->shadow_apcb,
> matrix_mdev->matrix and matrix_dev->config_info some of which are
> of type write without the matrix_dev->lock held. More precisely
> only with the matrix_dev->guests_lock held in "read" mode.
>
> Did I misread the code? If not, how is that OK?

You make a valid point: a struct rw_semaphore is not adequate for the
purposes for which it is used in this patch series. It needs to be a
mutex.


For v18, which is forthcoming, probably this week, I've been reworking
the locking based on your observation that the struct ap_guest is not
necessary given we already have a list of the mediated devices which
contain the KVM pointer. On the other hand, the matrix_dev->guests_lock
will remain, except it will have a new name, matrix_dev->kvm_lock, and
will be changed to a mutex per your observation above. So, I will
address this question based on the forthcoming patches. The purpose and
usage of the matrix_dev->kvm_lock, however, will not differ much, if at
all, from this v17 series.

Let's start with the purposes of the matrix_dev->kvm_lock:

The primary purpose of this lock is to enforce a locking order that
ensures the matrix_mdev->kvm->lock is taken before the matrix_dev->lock
to prevent a lockdep splat. Consequently, any function that dynamically
updates the guest's APCB must take this lock before any other. This
includes all mediated device assign/unassign interfaces, the vfio_ap
driver's probe/remove callbacks, the mdev remove callback and the AP bus
callbacks described herein. In all other cases, it is unnecessary to
take the matrix_dev->kvm_lock because the matrix_dev->lock is
sufficient: all fields in the matrix_mdev, including matrix_mdev->kvm,
are protected by that lock.

So, let's look at each of the objects you mentioned:

* Access to matrix_mdev->shadow_apcb:

  In every case in which the guest's APCB is updated, which includes
  filtering and updating the shadow APCB, the matrix_dev->kvm_lock will
  be taken before any other locks; so, in the context of the
  vfio_ap_cfg_changed callback, it is sufficient to operate on the
  matrix_mdev->shadow_apcb with only the matrix_dev->kvm_lock held.

* Access to matrix_mdev->matrix:

  The matrix_mdev->matrix is only changed via the mediated device's
  sysfs assign/unassign interfaces. Since these functions may update the
  guest's APCB, they take the matrix_dev->kvm_lock prior to taking any
  other lock and hold it until the operation is complete. That being the
  case, the matrix_mdev->matrix will remain stable for the duration of
  the vfio_ap_cfg_changed callback.

* Access to matrix_dev->config_info:

  The matrix_dev->config_info value is set in the vfio_ap_cfg_changed
  callback function subsequent to taking the matrix_dev->kvm_lock, so
  access to the matrix_dev->config_info is protected by that lock for
  the duration of the function. The only other place
  matrix_dev->config_info is accessed is in the filtering functions,
  which will only ever be called while the matrix_dev->kvm_lock is held.
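
To make the ordering described above concrete, here is a minimal
userspace sketch. It is not code from v17 or from the forthcoming v18:
pthread mutexes stand in for kernel mutexes, every name ending in
_sketch is hypothetical, and the one-word bitmasks are a toy
simplification of the real APM bitmaps. It only illustrates the idea
that the outer kvm_lock is taken first, then kvm->lock (if a guest is
attached), then the matrix_dev lock, and that config_info and the shadow
APCB are written only inside that window.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kvm; only the lock matters here. */
struct kvm_sketch {
	pthread_mutex_t lock;		/* stands in for kvm->lock */
};

/* Hypothetical stand-in for struct ap_matrix_dev. */
struct matrix_dev_sketch {
	pthread_mutex_t kvm_lock;	/* stands in for matrix_dev->kvm_lock */
	pthread_mutex_t lock;		/* stands in for matrix_dev->lock */
	unsigned long config_apm;	/* toy stand-in for config_info.apm */
};

/* Hypothetical stand-in for struct ap_matrix_mdev. */
struct matrix_mdev_sketch {
	struct kvm_sketch *kvm;		/* NULL when no guest is attached */
	unsigned long matrix_apm;	/* adapters assigned via sysfs */
	unsigned long shadow_apm;	/* adapters passed through to the guest */
};

static struct matrix_dev_sketch matrix_dev = {
	.kvm_lock = PTHREAD_MUTEX_INITIALIZER,
	.lock = PTHREAD_MUTEX_INITIALIZER,
};

/* Take the locks in the fixed order: kvm_lock, kvm->lock, matrix_dev lock. */
static void get_update_locks(struct matrix_mdev_sketch *m)
{
	assert(m != NULL);
	pthread_mutex_lock(&matrix_dev.kvm_lock);
	if (m->kvm)
		pthread_mutex_lock(&m->kvm->lock);
	pthread_mutex_lock(&matrix_dev.lock);
}

/* Release the locks in the reverse order of acquisition. */
static void put_update_locks(struct matrix_mdev_sketch *m)
{
	pthread_mutex_unlock(&matrix_dev.lock);
	if (m->kvm)
		pthread_mutex_unlock(&m->kvm->lock);
	pthread_mutex_unlock(&matrix_dev.kvm_lock);
}

/*
 * Toy version of a config-changed handler: store the new host
 * configuration, then filter the mdev's matrix against it.
 * Returns true if the shadow copy changed.
 */
static bool on_cfg_changed(struct matrix_mdev_sketch *m, unsigned long new_apm)
{
	unsigned long old_shadow;
	bool changed;

	get_update_locks(m);
	matrix_dev.config_apm = new_apm;
	old_shadow = m->shadow_apm;
	m->shadow_apm = m->matrix_apm & matrix_dev.config_apm;
	changed = (old_shadow != m->shadow_apm);
	put_update_locks(m);

	return changed;
}
```

Because every path that updates the shadow copy goes through
get_update_locks(), no caller can end up holding the matrix_dev lock
while waiting for kvm->lock, which is the inversion the outer lock is
meant to rule out.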





>
> BTW I got delayed on my "locking rules" writeup. Sorry for that!

No worries, I've been writing up a vfio-ap-locking.rst document to
include with the next version of the patch series.

>
> Regards,
> Halil
>
>> + vfio_ap_mdev_on_cfg_add();
>> +
>> + up_read(&matrix_dev->guests_lock);
>> +}
>> +
>> +static void vfio_ap_mdev_hot_plug_cfg(struct ap_guest *guest)
>> +{
>> + bool filter_matrix, filter_cdoms, do_hotplug = false;
>> +
>> + filter_matrix = bitmap_intersects(guest->matrix_mdev->matrix.apm,
>> + guest->matrix_mdev->apm_add,
>> + AP_DEVICES) ||
>> + bitmap_intersects(guest->matrix_mdev->matrix.aqm,
>> + guest->matrix_mdev->aqm_add,
>> + AP_DOMAINS);
>> +
>> + filter_cdoms = bitmap_intersects(guest->matrix_mdev->matrix.adm,
>> + guest->matrix_mdev->aqm_add,
>> + AP_DOMAINS);
>> +
>> + mutex_lock(&guest->kvm->lock);
>> + mutex_lock(&matrix_dev->lock);
>> +
>> + if (filter_matrix)
>> + do_hotplug |= vfio_ap_mdev_filter_matrix(guest->matrix_mdev);
>> +
>> + if (filter_cdoms)
>> + do_hotplug |= vfio_ap_mdev_filter_cdoms(guest->matrix_mdev);
>> +
>> + if (do_hotplug)
>> + vfio_ap_mdev_hotplug_apcb(guest->matrix_mdev);
>> +
>> + mutex_unlock(&matrix_dev->lock);
>> + mutex_unlock(&guest->kvm->lock);
>> +}
>> +
>> +void vfio_ap_on_scan_complete(struct ap_config_info *new_config_info,
>> + struct ap_config_info *old_config_info)
>> +{
>> + struct ap_guest *guest;
>> +
>> + down_read(&matrix_dev->guests_lock);
>> +
>> + list_for_each_entry(guest, &matrix_dev->guests, node) {
>> + if (bitmap_empty(guest->matrix_mdev->apm_add, AP_DEVICES) &&
>> + bitmap_empty(guest->matrix_mdev->aqm_add, AP_DOMAINS) &&
>> + bitmap_empty(guest->matrix_mdev->adm_add, AP_DOMAINS))
>> + continue;
>> +
>> + vfio_ap_mdev_hot_plug_cfg(guest);
>> + bitmap_clear(guest->matrix_mdev->apm_add, 0, AP_DEVICES);
>> + bitmap_clear(guest->matrix_mdev->aqm_add, 0, AP_DOMAINS);
>> + bitmap_clear(guest->matrix_mdev->adm_add, 0, AP_DOMAINS);
>> + }
>> +
>> + up_read(&matrix_dev->guests_lock);
>> +}
>> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
>> index 97da41f87c65..affa63da7f88 100644
>> --- a/drivers/s390/crypto/vfio_ap_private.h
>> +++ b/drivers/s390/crypto/vfio_ap_private.h
>> @@ -37,7 +37,9 @@ struct ap_guest {
>> *
>> * @device: generic device structure associated with the AP matrix device
>> * @available_instances: number of mediated matrix devices that can be created
>> - * @info: the struct containing the output from the PQAP(QCI) instruction
>> + * @config_info: the struct containing the output from the PQAP(QCI) instruction
>> + * @config_info_prev: the struct containing the previous output from the
>> + * PQAP(QCI) instruction
>> * @mdev_list: the list of mediated matrix devices created
>> * @lock: mutex for locking the AP matrix device. This lock will be
>> * taken every time we fiddle with state managed by the vfio_ap
>> @@ -52,7 +54,8 @@ struct ap_guest {
>> struct ap_matrix_dev {
>> struct device device;
>> atomic_t available_instances;
>> - struct ap_config_info info;
>> + struct ap_config_info config_info;
>> + struct ap_config_info config_info_prev;
>> struct list_head mdev_list;
>> struct mutex lock;
>> struct ap_driver *vfio_ap_drv;
>> @@ -110,6 +113,13 @@ struct ap_queue_table {
>> * @mdev: the mediated device
>> * @qtable: table of queues (struct vfio_ap_queue) assigned to the mdev
>> * @guest: the KVM guest using the matrix mdev
>> + * @apm_add: adapters to be hot plugged into the guest when the vfio_ap
>> + * device driver is notified that the AP bus scan has completed.
>> + * @aqm_add: domains to be hot plugged into the guest when the vfio_ap
>> + * device driver is notified that the AP bus scan has completed.
>> + * @adm_add: control domains to be hot plugged into the guest when the
>> + * vfio_ap device driver is notified that the AP bus scan has
>> + * completed.
>> */
>> struct ap_matrix_mdev {
>> struct vfio_device vdev;
>> @@ -121,6 +131,9 @@ struct ap_matrix_mdev {
>> struct mdev_device *mdev;
>> struct ap_queue_table qtable;
>> struct ap_guest *guest;
>> + DECLARE_BITMAP(apm_add, AP_DEVICES);
>> + DECLARE_BITMAP(aqm_add, AP_DOMAINS);
>> + DECLARE_BITMAP(adm_add, AP_DOMAINS);
>> };
>>
>> /**
>> @@ -151,4 +164,10 @@ void vfio_ap_mdev_remove_queue(struct ap_device *queue);
>>
>> int vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm);
>>
>> +
>> +void vfio_ap_on_cfg_changed(struct ap_config_info *new_config_info,
>> + struct ap_config_info *old_config_info);
>> +void vfio_ap_on_scan_complete(struct ap_config_info *new_config_info,
>> + struct ap_config_info *old_config_info);
>> +
>> #endif /* _VFIO_AP_PRIVATE_H_ */


2022-02-09 09:53:42

by Halil Pasic

Subject: Re: [PATCH v17 14/15] s390/ap: notify drivers on config changed and scan complete callbacks

On Mon, 7 Feb 2022 14:39:31 -0500
Tony Krowiak wrote:

> > Back to the topic of locking: it looks to me that on this path you
> > do the filtering and thus the accesses to matrix_mdev->shadow_apcb,
> > matrix_mdev->matrix and matrix_dev->config_info some of which are
> > of type write without the matrix_dev->lock held. More precisely
> > only with the matrix_dev->guests_lock held in "read" mode.
> >
> > Did I misread the code? If not, how is that OK?
>
> You make a valid point, a struct rw_semaphore is not adequate for the
> purposes
> it is used in this patch series. It needs to be a mutex.
>

Good, we agree that v17 is racy.

>
> For v18 which is forthcoming probably this week, I've been reworking the
> locking
> based on your observation that the struct ap_guest is not necessary given we
> already have a list of the mediated devices which contain the KVM
> pointer. On the other

[..]
>
> >
> > BTW I got delayed on my "locking rules" writeup. Sorry for that!
>
> No worries, I've been writing up a vfio-ap-locking.rst document to
> include with the next
> version of the patch series.

I'm looking forward to v18 including that document. I prefer not to
discuss what you wrote about the approach taken in v18 now. It is easier
for me when I have both the text stating the intended design and the
code that is supposed to adhere to this design.

Regards,
Halil