The current design for AP pass-through does not support making dynamic
changes to the AP matrix of a running guest. This results in several
deficiencies that this patch series is intended to mitigate:
1. Adapters, domains and control domains cannot be added to or removed
from a running guest. In order to modify a guest's AP configuration,
the guest must be terminated; only then can AP resources be assigned
to or unassigned from the guest's matrix mdev. The new AP
configuration becomes available to the guest when it is subsequently
restarted.
2. The AP bus's /sys/bus/ap/apmask and /sys/bus/ap/aqmask interfaces can
be modified by a root user without any restrictions. A change to
either mask can result in AP queue devices being unbound from the
vfio_ap device driver and bound to a zcrypt device driver even if a
guest is using the queues, thus giving the host access to the guest's
private crypto data and vice versa.
3. The APQNs derived from the Cartesian product of the APIDs of the
adapters and APQIs of the domains assigned to a matrix mdev must
reference an AP queue device bound to the vfio_ap device driver. The
AP architecture allows assignment of AP resources that are not
available to the system, so this artificial restriction is not
compliant with the architecture.
4. The AP configuration profile can be dynamically changed for the Linux
host after a KVM guest is started. For example, a new domain can be
dynamically added to the configuration profile via the SE or an HMC
connected to a DPM-enabled LPAR. Likewise, AP adapters can be
dynamically configured (online state) and deconfigured (standby state)
using the SE, an SCLP command or an HMC connected to a DPM-enabled
LPAR. This can result in inadvertent sharing of AP queues between the
guest and host.
5. A root user can manually unbind an AP queue device representing a
queue in use by a KVM guest via the vfio_ap device driver's sysfs
unbind attribute. In this case, the guest will be using a queue that
is not bound to the driver which violates the device model.
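To make the Cartesian product mentioned in item 3 concrete, here is a
minimal user-space sketch that enumerates the APQNs formed from a set of
APIDs and a set of APQIs. It assumes the AP bus convention that an APQN
carries the APID in its upper byte and the APQI in its lower byte. The
plain bool arrays and the names make_apqn() and enumerate_apqns() are
hypothetical stand-ins for illustration; they are not part of this
series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_ADAPTERS 256	/* APIDs 0-255 */
#define NUM_DOMAINS  256	/* APQIs 0-255 */

/* Pack an APID and an APQI into an APQN: APID in the high byte. */
static uint16_t make_apqn(unsigned int apid, unsigned int apqi)
{
	assert(apid < NUM_ADAPTERS && apqi < NUM_DOMAINS);
	return (uint16_t)((apid << 8) | apqi);
}

/*
 * Write every APQN in the cross product of the assigned adapters and
 * domains to out[] (at most max entries) and return the total count.
 */
static size_t enumerate_apqns(const bool *adapters, const bool *domains,
			      uint16_t *out, size_t max)
{
	size_t n = 0;

	for (unsigned int apid = 0; apid < NUM_ADAPTERS; apid++) {
		if (!adapters[apid])
			continue;
		for (unsigned int apqi = 0; apqi < NUM_DOMAINS; apqi++) {
			if (!domains[apqi])
				continue;
			if (n < max)
				out[n] = make_apqn(apid, apqi);
			n++;
		}
	}
	return n;
}
```

Assigning adapters 1 and 2 together with domains 0x04 and 0x47 yields
four APQNs; under the current design each of them has to reference a
queue device bound to the vfio_ap device driver.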
This patch series introduces the following changes to the current design
to alleviate the shortcomings described above as well as to implement
more of the AP architecture:
1. A root user will be prevented from making edits to the AP bus's
/sys/bus/ap/apmask or /sys/bus/ap/aqmask if the change would transfer
ownership of an APQN from the vfio_ap device driver to a zcrypt driver
while the APQN is assigned to a matrix mdev.
2. Allow a root user to hot plug/unplug AP adapters, domains and control
domains for a KVM guest using the matrix mdev via its sysfs
assign/unassign attributes.
3. Allow assignment of an AP adapter or domain to a matrix mdev even if
it results in assignment of an APQN that does not reference an AP
queue device bound to the vfio_ap device driver, as long as the APQN
is not reserved for use by the default zcrypt drivers (also known as
over-provisioning of AP resources). Allowing over-provisioning of AP
resources better models the architecture, which does not preclude
assigning AP resources that are not yet available in the system. Such
APQNs, however, will not be assigned to the guest using the matrix
mdev; only APQNs referencing AP queue devices bound to the vfio_ap
device driver will actually get assigned to the guest.
4. Handle dynamic changes to the AP device model.
1. Rationale for changes to AP bus's apmask/aqmask interfaces:
----------------------------------------------------------
Due to the extremely sensitive nature of cryptographic data, it is
imperative that great care be taken to ensure that such data is secured.
Allowing a root user, either inadvertently or maliciously, to configure
these masks such that a queue is shared between the host and a guest is
both avoidable and inadvisable. It was suggested that this scenario
is better handled in user space with management software, but that does
not preclude a malicious administrator from using the sysfs interfaces
to gain access to a guest's crypto data. It was also suggested that this
scenario could be avoided by taking access to the adapter away from the
guest and zeroing out the queues prior to the vfio_ap driver releasing the
device; however, stealing an adapter in use from a guest as a by-product
of an operation is bad and will likely cause problems for the guest
unnecessarily. It was decided that the most effective solution with the
least number of negative side effects is to prevent the situation at the
source.
2. Rationale for hot plug/unplug using matrix mdev sysfs interfaces:
----------------------------------------------------------------
Allowing a user to hot plug/unplug AP resources using the matrix mdev
sysfs interfaces circumvents the need to terminate the guest in order to
modify its AP configuration. Allowing dynamic configuration makes
reconfiguring a guest's AP matrix much less disruptive.
3. Rationale for allowing over-provisioning of AP resources:
-----------------------------------------------------------
Allowing assignment of AP resources that are not yet available to a
matrix mdev, and ultimately to a guest, better models the AP
architecture. The architecture does not preclude assignment of
unavailable AP resources. If a queue subsequently becomes available
while a guest is using the matrix mdev to which its APQN is assigned,
the guest will be given access to it. If an APQN is dynamically
unassigned from the underlying host system, it will automatically
become unavailable to the guest.
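The following user-space sketch shows one plausible reading of how an
over-provisioned matrix mdev could be reduced to the configuration a
guest actually sees, following the "filtering by APID" wording used in
the change logs: an adapter is passed to the guest only if every APQN
it forms with the assigned domains is bound to the pass-through driver.
The struct layouts and the name filter_by_apid() are assumptions made
for illustration and do not reproduce the driver's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NUM_ADAPTERS 256
#define NUM_DOMAINS  256

struct matrix {
	bool apm[NUM_ADAPTERS];	/* assigned adapters (APIDs) */
	bool aqm[NUM_DOMAINS];	/* assigned usage domains (APQIs) */
};

/* Hypothetical record of which queues are bound to the driver. */
struct bound_queues {
	bool q[NUM_ADAPTERS][NUM_DOMAINS];
};

/*
 * Build the guest matrix from the mdev matrix. Domains are copied as
 * is; an adapter is kept only if all of its queues are bound.
 */
static void filter_by_apid(const struct matrix *mdev, struct matrix *guest,
			   const struct bound_queues *bound)
{
	assert(mdev != NULL && guest != NULL && bound != NULL);

	memset(guest, 0, sizeof(*guest));
	memcpy(guest->aqm, mdev->aqm, sizeof(guest->aqm));

	for (size_t apid = 0; apid < NUM_ADAPTERS; apid++) {
		bool all_bound = true;

		if (!mdev->apm[apid])
			continue;
		for (size_t apqi = 0; apqi < NUM_DOMAINS; apqi++) {
			if (mdev->aqm[apqi] && !bound->q[apid][apqi]) {
				all_bound = false;
				break;
			}
		}
		guest->apm[apid] = all_bound;
	}
}
```

In this sketch an adapter that was filtered out appears in the guest
matrix once its missing queue is bound and the filter is run again,
which mirrors the behavior described above for queues that become
available later.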
Change log v11-v12:
------------------
* Moved matrix device lock to protect group notifier callback
* Split the 'No need to disable IRQ after queue reset' patch into
multiple patches for easier review (move probe/remove callback
functions and remove disable IRQ after queue reset)
* Added code to decrement reference count for KVM in group notifier
callback
* Using mutex_trylock() in functions implementing the sysfs assign_adapter
and assign_domain as well as the in_use callback to avoid deadlock
between the AP bus's ap_perms mutex and the matrix device lock used by
vfio_ap driver.
* The sysfs guest_matrix attribute of the vfio_ap mdev will now display
the shadow APCB regardless of whether a guest is using the mdev or not
* Replaced vfio_ap mdev filtering function with a function that initializes
the guest's APCB by filtering the vfio_ap mdev by APID.
* No longer using filtering function during adapter/domain assignment
to/from the vfio_ap mdev; replaced with new hot plug/unplug
adapter/domain functions.
* No longer using filtering function during bind/unbind; replaced with
hot plug/unplug queue functions.
* No longer using filtering function for bulk assignment of new adapters
and domains in on_scan_complete callback; replaced with new hot plug
functions.
Change log v10-v11:
------------------
* The matrix mdev's configuration is now filtered by APID so that if any
APQN assigned to the mdev is not bound to the vfio_ap device driver,
the adapter will not get plugged into the KVM guest on startup, or when
a new adapter is assigned to the mdev.
* Replaced patch 8 by squashing patches 8 (filtering patch) and 15 (handle
probe/remove).
* Added a patch 1 to remove disable IRQ after a reset because the reset
already disables a queue.
* Now using filtering code to update the KVM guest's matrix when
notified that AP bus scan has completed.
* Fixed issue with probe/remove not initiated by a configuration change
occurring within a config change.
Change log v9-v10:
-----------------
* Updated the documentation in vfio-ap.rst to include information about the
AP dynamic configuration support
Change log v8-v9:
----------------
* Fixed errors flagged by the kernel test robot
* Fixed issue with guest losing queues when a new queue is probed due to
manual bind operation.
Change log v7-v8:
----------------
* Now logging a message when an attempt to reserve APQNs for the zcrypt
drivers will result in taking a queue away from a KVM guest to provide
the sysadmin a way to ascertain why the sysfs operation failed.
* Created locked and unlocked versions of the ap_parse_mask_str() function.
* Now using new interface provided by an AP bus patch -
s390/ap: introduce new ap function ap_get_qdev() - to retrieve
struct ap_queue representing an AP queue device. This patch is not a
part of this series but is a prerequisite for this series.
Change log v6-v7:
----------------
* Added callbacks to AP bus:
- on_config_changed: Notifies implementing drivers that
the AP configuration has changed since last AP device scan.
- on_scan_complete: Notifies implementing drivers that the device scan
has completed.
- implemented on_config_changed and on_scan_complete callbacks for
vfio_ap device driver.
- updated vfio_ap device driver's probe and remove callbacks to handle
dynamic changes to the AP device model.
* Added code to filter APQNs when assigning AP resources to a KVM guest's
CRYCB
Change log v5-v6:
----------------
* Fixed a bug in ap_bus.c introduced with patch 2/7 of the v5
series. Harald Freudenberger pointed out that the mutex lock
for ap_perms_mutex in the apmask_store and aqmask_store functions
was not being released.
* Removed patch 6/7 which added logging to the vfio_ap driver
to expedite acceptance of this series. The logging will be introduced
with a separate patch series to allow more time to explore options
such as DBF logging vs. tracepoints.
* Added 3 patches related to ensuring that APQNs that do not reference
AP queue devices bound to the vfio_ap device driver are not assigned
to the guest CRYCB:
Patch 4: Filter CRYCB bits for unavailable queue devices
Patch 5: sysfs attribute to display the guest CRYCB
Patch 6: update guest CRYCB in vfio_ap probe and remove callbacks
* Added a patch (Patch 9) to version the vfio_ap module.
* Reshuffled patches to allow the in_use callback implementation to
invoke the vfio_ap_mdev_verify_no_sharing() function introduced in
patch 2.
Change log v4-v5:
----------------
* Added a patch to provide kernel s390dbf debug logs for VFIO AP
Change log v3->v4:
-----------------
* Restored patches preventing root user from changing ownership of
APQNs from zcrypt drivers to the vfio_ap driver if the APQN is
assigned to an mdev.
* No longer enforcing requirement restricting guest access to
queues represented by a queue device bound to the vfio_ap
device driver.
* Removed shadow CRYCB and now directly updating the guest CRYCB
from the matrix mdev's matrix.
* Rebased the patch series on top of 'vfio: ap: AP Queue Interrupt
Control' patches.
* Disabled bind/unbind sysfs interfaces for vfio_ap driver
Change log v2->v3:
-----------------
* Allow guest access to an AP queue only if the queue is bound to
the vfio_ap device driver.
* Removed the patch to test CRYCB masks before taking the vCPUs
out of SIE. Now checking the shadow CRYCB in the vfio_ap driver.
Change log v1->v2:
-----------------
* Removed patches preventing root user from unbinding AP queues from
the vfio_ap device driver
* Introduced a shadow CRYCB in the vfio_ap driver to manage dynamic
changes to the AP guest configuration due to root user interventions
or hardware anomalies.
Tony Krowiak (17):
s390/vfio-ap: move probe and remove callbacks to vfio_ap_ops.c
s390/vfio-ap: decrement reference count to KVM
s390/vfio-ap: use new AP bus interface to search for queue devices
s390/vfio-ap: No need to disable IRQ after queue reset
s390/vfio-ap: manage link between queue struct and matrix mdev
s390/zcrypt: driver callback to indicate resource in use
s390/vfio-ap: implement in-use callback for vfio_ap driver
s390/vfio-ap: introduce shadow APCB
s390/vfio-ap: sysfs attribute to display the guest's matrix
s390/vfio-ap: initialize the guest apcb
s390/vfio-ap: allow assignment of unavailable AP queues to mdev device
s390/vfio-ap: allow hot plug/unplug of AP resources using mdev device
s390/vfio-ap: hot plug/unplug queues on bind/unbind of queue device
s390/zcrypt: Notify driver on config changed and scan complete
callbacks
s390/vfio-ap: handle host AP config change notification
s390/vfio-ap: handle AP bus scan completed notification
s390/vfio-ap: update docs to include dynamic config support
Documentation/s390/vfio-ap.rst | 401 ++++++---
drivers/s390/crypto/ap_bus.c | 243 +++++-
drivers/s390/crypto/ap_bus.h | 16 +
drivers/s390/crypto/vfio_ap_drv.c | 52 +-
drivers/s390/crypto/vfio_ap_ops.c | 1113 +++++++++++++++++++------
drivers/s390/crypto/vfio_ap_private.h | 30 +-
6 files changed, 1418 insertions(+), 437 deletions(-)
--
2.21.1
Decrement the reference count to KVM when notified, via the vfio group
notifier, that the KVM pointer has been invalidated.
Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/vfio_ap_ops.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 66fd9784a156..31e39c1f6e56 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -1095,7 +1095,11 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
matrix_mdev = container_of(nb, struct ap_matrix_mdev, group_notifier);
if (!data) {
+ if (matrix_mdev->kvm)
+ kvm_put_kvm(matrix_mdev->kvm);
+
matrix_mdev->kvm = NULL;
+
return NOTIFY_OK;
}
--
2.21.1
This patch refactors the vfio_ap device driver to use the AP bus's
ap_get_qdev() function to retrieve the vfio_ap_queue struct containing
information about a queue that is bound to the vfio_ap device driver.
The bus's ap_get_qdev() function retrieves the queue device from a
hashtable keyed by APQN. This is several orders of magnitude more
efficient than looping over the list of devices attached to the AP
bus.
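To illustrate why a table keyed by APQN beats walking a device list,
here is a minimal user-space sketch of a chained hash table. It is not
the AP bus implementation of ap_get_qdev(); the bucket count and the
names struct qtable, qtable_add() and qtable_find() are assumptions
chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QTABLE_BUCKETS 64

struct queue {
	uint16_t apqn;
	struct queue *next;	/* next entry in the same bucket */
};

struct qtable {
	struct queue *buckets[QTABLE_BUCKETS];
};

static size_t bucket_of(uint16_t apqn)
{
	return apqn % QTABLE_BUCKETS;
}

/* Insert at the head of the bucket chain selected by the APQN. */
static void qtable_add(struct qtable *t, struct queue *q)
{
	size_t b;

	assert(t != NULL && q != NULL);
	b = bucket_of(q->apqn);
	q->next = t->buckets[b];
	t->buckets[b] = q;
}

/* Only the entries that share a bucket with the APQN are visited. */
static struct queue *qtable_find(const struct qtable *t, uint16_t apqn)
{
	assert(t != NULL);
	for (struct queue *q = t->buckets[bucket_of(apqn)]; q; q = q->next)
		if (q->apqn == apqn)
			return q;
	return NULL;
}
```

A lookup touches one bucket chain instead of every device on the bus,
so its cost depends on the chain length rather than on the total number
of queue devices.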
Signed-off-by: Tony Krowiak <[email protected]>
Reviewed-by: Halil Pasic <[email protected]>
---
drivers/s390/crypto/vfio_ap_ops.c | 29 ++++++++++++++++-------------
1 file changed, 16 insertions(+), 13 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 31e39c1f6e56..8e6972495daa 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -33,36 +33,39 @@ static int match_apqn(struct device *dev, const void *data)
return (q->apqn == *(int *)(data)) ? 1 : 0;
}
+
/**
- * vfio_ap_get_queue: Retrieve a queue with a specific APQN from a list
+ * vfio_ap_get_queue: Retrieve a queue with a specific APQN.
* @matrix_mdev: the associated mediated matrix
* @apqn: The queue APQN
*
- * Retrieve a queue with a specific APQN from the list of the
- * devices of the vfio_ap_drv.
- * Verify that the APID and the APQI are set in the matrix.
+ * Retrieve a queue with a specific APQN from the AP queue devices attached to
+ * the AP bus.
*
- * Returns the pointer to the associated vfio_ap_queue
+ * Returns the pointer to the vfio_ap_queue with the specified APQN, or NULL.
*/
static struct vfio_ap_queue *vfio_ap_get_queue(
struct ap_matrix_mdev *matrix_mdev,
int apqn)
{
- struct vfio_ap_queue *q;
- struct device *dev;
+ struct ap_queue *queue;
+ struct vfio_ap_queue *q = NULL;
if (!test_bit_inv(AP_QID_CARD(apqn), matrix_mdev->matrix.apm))
return NULL;
if (!test_bit_inv(AP_QID_QUEUE(apqn), matrix_mdev->matrix.aqm))
return NULL;
- dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL,
- &apqn, match_apqn);
- if (!dev)
+ queue = ap_get_qdev(apqn);
+ if (!queue)
return NULL;
- q = dev_get_drvdata(dev);
- q->matrix_mdev = matrix_mdev;
- put_device(dev);
+
+ put_device(&queue->ap_dev.device);
+
+ if (queue->ap_dev.device.driver == &matrix_dev->vfio_ap_drv->driver) {
+ q = dev_get_drvdata(&queue->ap_dev.device);
+ q->matrix_mdev = matrix_mdev;
+ }
return q;
}
--
2.21.1
The APCB is a field within the CRYCB that provides the AP configuration
to a KVM guest. Let's introduce a shadow copy of the KVM guest's APCB and
maintain it for the lifespan of the guest.
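As a rough user-space model of the shadow copy idea, the sketch below
keeps a second matrix next to the one configured through sysfs and
pushes it to the guest only when the guest has a CRYCB. The types
struct guest and struct mdev and the function names are simplified,
hypothetical stand-ins; in the patch itself the commit step calls
kvm_arch_crypto_set_masks().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct matrix {
	uint64_t apm[4];	/* adapter mask, 256 bits */
	uint64_t aqm[4];	/* usage domain mask, 256 bits */
	uint64_t adm[4];	/* control domain mask, 256 bits */
};

struct guest {
	bool has_crycb;		/* guest has a crypto control block */
	struct matrix apcb;	/* masks the guest actually sees */
};

struct mdev {
	struct matrix matrix;		/* what the admin assigned */
	struct matrix shadow_apcb;	/* what is staged for the guest */
	struct guest *guest;
};

/* Push the shadow copy to the guest; do nothing without a CRYCB. */
static bool commit_shadow_apcb(struct mdev *m)
{
	assert(m != NULL);
	if (!m->guest || !m->guest->has_crycb)
		return false;
	m->guest->apcb = m->shadow_apcb;
	return true;
}

/* On guest attach: snapshot the assigned matrix, then commit it. */
static bool guest_attached(struct mdev *m, struct guest *g)
{
	assert(m != NULL);
	m->guest = g;
	memcpy(&m->shadow_apcb, &m->matrix, sizeof(m->shadow_apcb));
	return commit_shadow_apcb(m);
}
```

Keeping the staged masks separate from the assigned ones lets later
code change what the guest sees without touching the administrator's
configuration.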
Signed-off-by: Tony Krowiak <[email protected]>
Reviewed-by: Halil Pasic <[email protected]>
---
drivers/s390/crypto/vfio_ap_ops.c | 25 ++++++++++++++++++-------
drivers/s390/crypto/vfio_ap_private.h | 2 ++
2 files changed, 20 insertions(+), 7 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 3c2479d7e674..89b0e81657ca 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -318,6 +318,20 @@ static void vfio_ap_matrix_init(struct ap_config_info *info,
matrix->adm_max = info->apxa ? info->Nd : 15;
}
+static bool vfio_ap_mdev_has_crycb(struct ap_matrix_mdev *matrix_mdev)
+{
+ return (matrix_mdev->kvm && matrix_mdev->kvm->arch.crypto.crycbd);
+}
+
+static void vfio_ap_mdev_commit_shadow_apcb(struct ap_matrix_mdev *matrix_mdev)
+{
+ if (vfio_ap_mdev_has_crycb(matrix_mdev))
+ kvm_arch_crypto_set_masks(matrix_mdev->kvm,
+ matrix_mdev->shadow_apcb.apm,
+ matrix_mdev->shadow_apcb.aqm,
+ matrix_mdev->shadow_apcb.adm);
+}
+
static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
{
struct ap_matrix_mdev *matrix_mdev;
@@ -333,6 +347,7 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
matrix_mdev->mdev = mdev;
vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
+ vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->shadow_apcb);
hash_init(matrix_mdev->qtable);
mdev_set_drvdata(mdev, matrix_mdev);
matrix_mdev->pqap_hook.hook = handle_pqap;
@@ -1218,13 +1233,9 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
if (ret)
return NOTIFY_DONE;
- /* If there is no CRYCB pointer, then we can't copy the masks */
- if (!matrix_mdev->kvm->arch.crypto.crycbd)
- return NOTIFY_DONE;
-
- kvm_arch_crypto_set_masks(matrix_mdev->kvm, matrix_mdev->matrix.apm,
- matrix_mdev->matrix.aqm,
- matrix_mdev->matrix.adm);
+ memcpy(&matrix_mdev->shadow_apcb, &matrix_mdev->matrix,
+ sizeof(matrix_mdev->shadow_apcb));
+ vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
return NOTIFY_OK;
}
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index 8b9b5255abfe..15b7cd74843b 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -75,6 +75,7 @@ struct ap_matrix {
* @list: allows the ap_matrix_mdev struct to be added to a list
* @matrix: the adapters, usage domains and control domains assigned to the
* mediated matrix device.
+ * @shadow_apcb: the shadow copy of the APCB field of the KVM guest's CRYCB
* @group_notifier: notifier block used for specifying callback function for
* handling the VFIO_GROUP_NOTIFY_SET_KVM event
* @kvm: the struct holding guest's state
@@ -82,6 +83,7 @@ struct ap_matrix {
struct ap_matrix_mdev {
struct list_head node;
struct ap_matrix matrix;
+ struct ap_matrix shadow_apcb;
struct notifier_block group_notifier;
struct notifier_block iommu_notifier;
struct kvm *kvm;
--
2.21.1
Let's implement the callback to indicate when an APQN
is in use by the vfio_ap device driver. The callback is
invoked whenever a change to the apmask or aqmask would
result in one or more queue devices being removed from the driver. The
vfio_ap device driver will indicate a resource is in use
if the APQN of any of the queue devices to be removed is assigned to
any of the matrix mdevs under the driver's control.
There is potential for a deadlock condition between the matrix_dev->lock
used to lock the matrix device during assignment of adapters and domains
and the ap_perms_mutex locked by the AP bus when changes are made to the
sysfs apmask/aqmask attributes.
Consider the following scenario (courtesy of Halil Pasic):
1) apmask_store() takes ap_perms_mutex
2) assign_adapter_store() takes matrix_dev->lock
3) apmask_store() calls vfio_ap_mdev_resource_in_use() which tries
to take matrix_dev->lock
4) assign_adapter_store() calls ap_apqn_in_matrix_owned_by_def_drv
which tries to take ap_perms_mutex
BANG!
To resolve this issue, instead of using the mutex_lock(&matrix_dev->lock)
function to lock the matrix device during assignment of an adapter or
domain to a matrix_mdev as well as during the in_use callback, the
mutex_trylock(&matrix_dev->lock) function will be used. If the lock is not
obtained, then the assignment and in_use functions will terminate with
-EBUSY.
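The try-lock pattern can be sketched in user space as follows. A C11
atomic_flag stands in for the kernel mutex, and the names struct
trylock, matrix_lock and assign_adapter() are hypothetical; the point
is only that a caller that cannot take the lock backs off with -EBUSY
instead of blocking while the other path may be waiting on a lock this
caller already holds.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal try-lock; a user-space stand-in for a kernel mutex. */
struct trylock {
	atomic_flag held;
};

static bool trylock_acquire(struct trylock *l)
{
	assert(l != NULL);
	/* test_and_set returns the old value: false means we got it. */
	return !atomic_flag_test_and_set(&l->held);
}

static void trylock_release(struct trylock *l)
{
	assert(l != NULL);
	atomic_flag_clear(&l->held);
}

static struct trylock matrix_lock = { .held = ATOMIC_FLAG_INIT };
static int assigned_adapters;

/* Back off with -EBUSY rather than wait for a contended lock. */
static int assign_adapter(void)
{
	if (!trylock_acquire(&matrix_lock))
		return -EBUSY;
	assigned_adapters++;
	trylock_release(&matrix_lock);
	return 0;
}
```

The cost of this choice is that user space has to be prepared to retry
a sysfs write that fails with -EBUSY.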
Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/vfio_ap_drv.c | 1 +
drivers/s390/crypto/vfio_ap_ops.c | 96 +++++++++++++++++++--------
drivers/s390/crypto/vfio_ap_private.h | 2 +
3 files changed, 71 insertions(+), 28 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
index 73bd073fd5d3..8934471b7944 100644
--- a/drivers/s390/crypto/vfio_ap_drv.c
+++ b/drivers/s390/crypto/vfio_ap_drv.c
@@ -147,6 +147,7 @@ static int __init vfio_ap_init(void)
memset(&vfio_ap_drv, 0, sizeof(vfio_ap_drv));
vfio_ap_drv.probe = vfio_ap_mdev_probe_queue;
vfio_ap_drv.remove = vfio_ap_mdev_remove_queue;
+ vfio_ap_drv.in_use = vfio_ap_mdev_resource_in_use;
vfio_ap_drv.ids = ap_queue_ids;
ret = ap_driver_register(&vfio_ap_drv, THIS_MODULE, VFIO_AP_DRV_NAME);
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 07caf871943c..3c2479d7e674 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -520,18 +520,40 @@ vfio_ap_mdev_verify_queues_reserved_for_apid(struct ap_matrix_mdev *matrix_mdev,
return 0;
}
+#define MDEV_SHARING_ERR "Userspace may not re-assign queue %02lx.%04lx " \
+ "already assigned to %s"
+
+static void vfio_ap_mdev_log_sharing_err(const char *mdev_name,
+ unsigned long *apm,
+ unsigned long *aqm)
+{
+ unsigned long apid, apqi;
+
+ for_each_set_bit_inv(apid, apm, AP_DEVICES)
+ for_each_set_bit_inv(apqi, aqm, AP_DOMAINS)
+ pr_warn(MDEV_SHARING_ERR, apid, apqi, mdev_name);
+}
+
/**
* vfio_ap_mdev_verify_no_sharing
*
- * Verifies that the APQNs derived from the cross product of the AP adapter IDs
- * and AP queue indexes comprising the AP matrix are not configured for another
+ * Verifies that each APQN derived from the cross product of the AP adapter IDs
+ * and AP queue indexes comprising an AP matrix is not assigned to a
* mediated device. AP queue sharing is not allowed.
*
- * @matrix_mdev: the mediated matrix device
+ * @matrix_mdev: the mediated matrix device to which the APQNs being verified
+ * are assigned. If the value is not NULL, then verification will
+ * proceed for all other matrix mediated devices; otherwise, all
+ * matrix mediated devices will be verified.
+ * @mdev_apm: mask indicating the APIDs of the APQNs to be verified
+ * @mdev_aqm: mask indicating the APQIs of the APQNs to be verified
*
- * Returns 0 if the APQNs are not shared, otherwise; returns -EADDRINUSE.
+ * Returns 0 if no APQNs are shared; otherwise, returns -EBUSY if one
+ * or more APQNs are shared.
*/
-static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
+static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev,
+ unsigned long *mdev_apm,
+ unsigned long *mdev_aqm)
{
struct ap_matrix_mdev *lstdev;
DECLARE_BITMAP(apm, AP_DEVICES);
@@ -548,15 +570,16 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
* We work on full longs, as we can only exclude the leftover
* bits in non-inverse order. The leftover is all zeros.
*/
- if (!bitmap_and(apm, matrix_mdev->matrix.apm,
- lstdev->matrix.apm, AP_DEVICES))
+ if (!bitmap_and(apm, mdev_apm, lstdev->matrix.apm, AP_DEVICES))
continue;
- if (!bitmap_and(aqm, matrix_mdev->matrix.aqm,
- lstdev->matrix.aqm, AP_DOMAINS))
+ if (!bitmap_and(aqm, mdev_aqm, lstdev->matrix.aqm, AP_DOMAINS))
continue;
- return -EADDRINUSE;
+ vfio_ap_mdev_log_sharing_err(dev_name(mdev_dev(lstdev->mdev)),
+ apm, aqm);
+
+ return -EBUSY;
}
return 0;
@@ -670,10 +693,10 @@ static void vfio_ap_mdev_manage_qlinks(struct ap_matrix_mdev *matrix_mdev,
* driver; or, if no APQIs have yet been assigned, the APID is not
* contained in an APQN bound to the vfio_ap device driver.
*
- * 4. -EADDRINUSE
+ * 4. -EBUSY
* An APQN derived from the cross product of the APID being assigned
* and the APQIs previously assigned is being used by another mediated
- * matrix device
+ * matrix device, or the matrix_dev->lock could not be acquired.
*/
static ssize_t assign_adapter_store(struct device *dev,
struct device_attribute *attr,
@@ -681,6 +704,7 @@ static ssize_t assign_adapter_store(struct device *dev,
{
int ret;
unsigned long apid;
+ DECLARE_BITMAP(apm, AP_DEVICES);
struct mdev_device *mdev = mdev_from_dev(dev);
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
@@ -700,24 +724,25 @@ static ssize_t assign_adapter_store(struct device *dev,
* number (APID). The bits in the mask, from most significant to least
* significant bit, correspond to APIDs 0-255.
*/
- mutex_lock(&matrix_dev->lock);
+ if (!mutex_trylock(&matrix_dev->lock))
+ return -EBUSY;
ret = vfio_ap_mdev_verify_queues_reserved_for_apid(matrix_mdev, apid);
if (ret)
goto done;
- set_bit_inv(apid, matrix_mdev->matrix.apm);
+ memset(apm, 0, sizeof(apm));
+ set_bit_inv(apid, apm);
- ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev);
+ ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev, apm,
+ matrix_mdev->matrix.aqm);
if (ret)
- goto share_err;
+ goto done;
+ set_bit_inv(apid, matrix_mdev->matrix.apm);
vfio_ap_mdev_manage_qlinks(matrix_mdev, LINK_APID, apid);
ret = count;
- goto done;
-share_err:
- clear_bit_inv(apid, matrix_mdev->matrix.apm);
done:
mutex_unlock(&matrix_dev->lock);
@@ -821,7 +846,7 @@ vfio_ap_mdev_verify_queues_reserved_for_apqi(struct ap_matrix_mdev *matrix_mdev,
* 4. -EADDRINUSE
* An APQN derived from the cross product of the APQI being assigned
* and the APIDs previously assigned is being used by another mediated
- * matrix device
+ * matrix device, or the matrix_dev->lock could not be acquired.
*/
static ssize_t assign_domain_store(struct device *dev,
struct device_attribute *attr,
@@ -829,6 +854,7 @@ static ssize_t assign_domain_store(struct device *dev,
{
int ret;
unsigned long apqi;
+ DECLARE_BITMAP(aqm, AP_DOMAINS);
struct mdev_device *mdev = mdev_from_dev(dev);
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
unsigned long max_apqi = matrix_mdev->matrix.aqm_max;
@@ -843,24 +869,25 @@ static ssize_t assign_domain_store(struct device *dev,
if (apqi > max_apqi)
return -ENODEV;
- mutex_lock(&matrix_dev->lock);
+ if (!mutex_trylock(&matrix_dev->lock))
+ return -EBUSY;
ret = vfio_ap_mdev_verify_queues_reserved_for_apqi(matrix_mdev, apqi);
if (ret)
goto done;
- set_bit_inv(apqi, matrix_mdev->matrix.aqm);
+ memset(aqm, 0, sizeof(aqm));
+ set_bit_inv(apqi, aqm);
- ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev);
+ ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev,
+ matrix_mdev->matrix.apm, aqm);
if (ret)
- goto share_err;
+ goto done;
+ set_bit_inv(apqi, matrix_mdev->matrix.aqm);
vfio_ap_mdev_manage_qlinks(matrix_mdev, LINK_APQI, apqi);
ret = count;
- goto done;
-share_err:
- clear_bit_inv(apqi, matrix_mdev->matrix.aqm);
done:
mutex_unlock(&matrix_dev->lock);
@@ -956,7 +983,8 @@ static ssize_t assign_control_domain_store(struct device *dev,
* least significant, correspond to IDs 0 up to the one less than the
* number of control domains that can be assigned.
*/
- mutex_lock(&matrix_dev->lock);
+ if (!mutex_trylock(&matrix_dev->lock))
+ return -EBUSY;
set_bit_inv(id, matrix_mdev->matrix.adm);
mutex_unlock(&matrix_dev->lock);
@@ -1457,3 +1485,15 @@ void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
kfree(q);
mutex_unlock(&matrix_dev->lock);
}
+
+int vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm)
+{
+ int ret;
+
+ if (!mutex_trylock(&matrix_dev->lock))
+ return -EBUSY;
+ ret = vfio_ap_mdev_verify_no_sharing(NULL, apm, aqm);
+ mutex_unlock(&matrix_dev->lock);
+
+ return ret;
+}
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index 4e5cc72fc0db..8b9b5255abfe 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -105,4 +105,6 @@ struct vfio_ap_queue {
int vfio_ap_mdev_probe_queue(struct ap_device *queue);
void vfio_ap_mdev_remove_queue(struct ap_device *queue);
+int vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm);
+
#endif /* _VFIO_AP_PRIVATE_H_ */
--
2.21.1
This patch introduces an extension to the AP bus to notify device drivers
when the host AP configuration changes - i.e., adapters, domains or
control domains are added or removed. To that end, two new callbacks are
introduced for AP device drivers:
void (*on_config_changed)(struct ap_config_info *new_config_info,
struct ap_config_info *old_config_info);
This callback is invoked at the start of the AP bus scan
function when it determines that the host AP configuration information
has changed since the previous scan. This is done by storing
an old and current QCI info struct and comparing them. If there is any
difference, the callback is invoked.
Note that when the AP bus scan detects that AP adapters, domains or
control domains have been removed from the host's AP configuration, it
will remove the associated devices from the AP bus subsystem's device
model. This callback gives the device driver a chance to respond to
the removal of the AP devices from the host configuration prior to
calling the device driver's remove callback. The primary purpose of
this callback is to allow the vfio_ap driver to do a bulk unplug of
all affected adapters, domains and control domains from affected
guests rather than unplugging them one at a time when the remove
callback is invoked.
void (*on_scan_complete)(struct ap_config_info *new_config_info,
struct ap_config_info *old_config_info);
The on_scan_complete callback is invoked after the ap bus scan is
complete if the host AP configuration data has changed.
Note that when the AP bus scan detects that adapters, domains or
control domains have been added to the host's configuration, it will
create new devices in the AP bus subsystem's device model. The primary
purpose of this callback is to allow the vfio_ap driver to do a bulk
plug of all affected adapters, domains and control domains into
affected guests rather than plugging them one at a time when the
probe callback is invoked.
Please note that changes to the apmask and aqmask do not trigger
these two callbacks since the bus scan function is not invoked by changes
to those masks.
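The notification scheme described above can be modeled in user space as
a snapshot-and-compare step wrapped around the scan. The sketch below
is an illustration under simplifying assumptions: struct config_info is
a made-up layout, scan() receives the "hardware" configuration as an
argument instead of querying it, and the counting callbacks exist only
to make the behavior observable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct config_info {
	uint32_t apm[8];	/* adapters in the host configuration */
	uint32_t aqm[8];	/* usage domains */
	uint32_t adm[8];	/* control domains */
};

struct driver {
	void (*on_config_changed)(const struct config_info *new_cfg,
				  const struct config_info *old_cfg);
	void (*on_scan_complete)(const struct config_info *new_cfg,
				 const struct config_info *old_cfg);
};

static struct config_info cur_cfg, old_cfg;
static int changed_calls, complete_calls;

static void count_changed(const struct config_info *new_cfg,
			  const struct config_info *old_cfg)
{
	(void)new_cfg;
	(void)old_cfg;
	changed_calls++;
}

static void count_complete(const struct config_info *new_cfg,
			   const struct config_info *old_cfg)
{
	(void)new_cfg;
	(void)old_cfg;
	complete_calls++;
}

/* One scan pass: notify before and after, but only on a change. */
static bool scan(const struct config_info *hw, const struct driver *drv)
{
	bool changed;

	assert(hw != NULL && drv != NULL);
	old_cfg = cur_cfg;	/* keep the previous snapshot */
	cur_cfg = *hw;		/* fetch the current configuration */
	changed = memcmp(&cur_cfg, &old_cfg, sizeof(cur_cfg)) != 0;

	if (changed && drv->on_config_changed)
		drv->on_config_changed(&cur_cfg, &old_cfg);
	/* Devices would be added to or removed from the bus here. */
	if (changed && drv->on_scan_complete)
		drv->on_scan_complete(&cur_cfg, &old_cfg);
	return changed;
}
```

Both callbacks receive the old and new snapshots, so a driver can
compute which adapters and domains were added or removed and handle
them in bulk.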
Signed-off-by: Harald Freudenberger <[email protected]>
Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/ap_bus.c | 83 ++++++++++++++++++++++++++-
drivers/s390/crypto/ap_bus.h | 12 ++++
drivers/s390/crypto/vfio_ap_private.h | 14 ++++-
3 files changed, 106 insertions(+), 3 deletions(-)
diff --git a/drivers/s390/crypto/ap_bus.c b/drivers/s390/crypto/ap_bus.c
index 593573740981..3a63f6b33d8a 100644
--- a/drivers/s390/crypto/ap_bus.c
+++ b/drivers/s390/crypto/ap_bus.c
@@ -75,6 +75,7 @@ DEFINE_MUTEX(ap_perms_mutex);
EXPORT_SYMBOL(ap_perms_mutex);
static struct ap_config_info *ap_qci_info;
+static struct ap_config_info *ap_qci_info_old;
/*
* AP bus related debug feature things.
@@ -1440,6 +1441,52 @@ static int __match_queue_device_with_queue_id(struct device *dev, const void *da
&& AP_QID_QUEUE(to_ap_queue(dev)->qid) == (int)(long) data;
}
+/* Helper function for notify_config_changed */
+static int __drv_notify_config_changed(struct device_driver *drv, void *data)
+{
+ struct ap_driver *ap_drv = to_ap_drv(drv);
+
+ if (try_module_get(drv->owner)) {
+ if (ap_drv->on_config_changed)
+ ap_drv->on_config_changed(ap_qci_info,
+ ap_qci_info_old);
+ module_put(drv->owner);
+ }
+
+ return 0;
+}
+
+/* Notify all drivers about a QCI config change */
+static inline void notify_config_changed(void)
+{
+ bus_for_each_drv(&ap_bus_type, NULL, NULL,
+ __drv_notify_config_changed);
+}
+
+/* Helper function for notify_scan_complete */
+static int __drv_notify_scan_complete(struct device_driver *drv, void *data)
+{
+ struct ap_driver *ap_drv = to_ap_drv(drv);
+
+ if (try_module_get(drv->owner)) {
+ if (ap_drv->on_scan_complete)
+ ap_drv->on_scan_complete(ap_qci_info,
+ ap_qci_info_old);
+ module_put(drv->owner);
+ }
+
+ return 0;
+}
+
+/* Notify all drivers about bus scan complete */
+static inline void notify_scan_complete(void)
+{
+ bus_for_each_drv(&ap_bus_type, NULL, NULL,
+ __drv_notify_scan_complete);
+}
+
+
+
/*
* Helper function for ap_scan_bus().
* Remove card device and associated queue devices.
@@ -1718,15 +1765,43 @@ static inline void ap_scan_adapter(int ap)
put_device(&ac->ap_dev.device);
}
+static int ap_get_configuration(void)
+{
+ int cfg_chg = 0;
+
+ if (ap_qci_info) {
+ if (!ap_qci_info_old) {
+ ap_qci_info_old = kzalloc(sizeof(*ap_qci_info_old),
+ GFP_KERNEL);
+ if (!ap_qci_info_old)
+ return 0;
+ } else {
+ memcpy(ap_qci_info_old, ap_qci_info,
+ sizeof(struct ap_config_info));
+ }
+ ap_fetch_qci_info(ap_qci_info);
+ cfg_chg = memcmp(ap_qci_info,
+ ap_qci_info_old,
+ sizeof(struct ap_config_info)) != 0;
+ }
+
+ return cfg_chg;
+}
+
/**
* ap_scan_bus(): Scan the AP bus for new devices
* Runs periodically, workqueue timer (ap_config_time)
*/
static void ap_scan_bus(struct work_struct *unused)
{
- int ap;
+ int ap, config_changed = 0;
- ap_fetch_qci_info(ap_qci_info);
+ /* config change notify */
+ config_changed = ap_get_configuration();
+ if (config_changed)
+ notify_config_changed();
+ memcpy(ap_qci_info_old, ap_qci_info,
+ sizeof(struct ap_config_info));
ap_select_domain();
AP_DBF_DBG("%s running\n", __func__);
@@ -1735,6 +1810,10 @@ static void ap_scan_bus(struct work_struct *unused)
for (ap = 0; ap <= ap_max_adapter_id; ap++)
ap_scan_adapter(ap);
+ /* scan complete notify */
+ if (config_changed)
+ notify_scan_complete();
+
/* check if there is at least one queue available with default domain */
if (ap_domain_index >= 0) {
struct device *dev =
diff --git a/drivers/s390/crypto/ap_bus.h b/drivers/s390/crypto/ap_bus.h
index 65edd847c65a..fbfbf6991718 100644
--- a/drivers/s390/crypto/ap_bus.h
+++ b/drivers/s390/crypto/ap_bus.h
@@ -146,6 +146,18 @@ struct ap_driver {
int (*probe)(struct ap_device *);
void (*remove)(struct ap_device *);
int (*in_use)(unsigned long *apm, unsigned long *aqm);
+ /*
+ * Called at the start of the ap bus scan function when
+ * the crypto config information (qci) has changed.
+ */
+ void (*on_config_changed)(struct ap_config_info *new_config_info,
+ struct ap_config_info *old_config_info);
+ /*
+ * Called at the end of the ap bus scan function when
+ * the crypto config information (qci) has changed.
+ */
+ void (*on_scan_complete)(struct ap_config_info *new_config_info,
+ struct ap_config_info *old_config_info);
};
#define to_ap_drv(x) container_of((x), struct ap_driver, driver)
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index 15b7cd74843b..7bd7e35eb2e0 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -36,14 +36,21 @@
* driver, be it using @mdev_list or writing the state of a
* single ap_matrix_mdev device. It's quite coarse but we don't
* expect much contention.
+ * @ap_add: a bitmap specifying the APIDs added to the host AP configuration
+ * as notified by the AP bus via the on_config_changed callback.
+ * @aq_add: a bitmap specifying the APQIs added to the host AP configuration
+ * as notified by the AP bus via the on_config_changed callback.
*/
struct ap_matrix_dev {
struct device device;
atomic_t available_instances;
- struct ap_config_info info;
+ struct ap_config_info config_info;
+ struct ap_config_info config_info_prev;
struct list_head mdev_list;
struct mutex lock;
struct ap_driver *vfio_ap_drv;
+ DECLARE_BITMAP(ap_add, AP_DEVICES);
+ DECLARE_BITMAP(aq_add, AP_DOMAINS);
};
extern struct ap_matrix_dev *matrix_dev;
@@ -90,6 +97,8 @@ struct ap_matrix_mdev {
struct kvm_s390_module_hook pqap_hook;
struct mdev_device *mdev;
DECLARE_HASHTABLE(qtable, 8);
+ DECLARE_BITMAP(ap_add, AP_DEVICES);
+ DECLARE_BITMAP(aq_add, AP_DOMAINS);
};
extern int vfio_ap_mdev_register(void);
@@ -109,4 +118,7 @@ void vfio_ap_mdev_remove_queue(struct ap_device *queue);
int vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm);
+void vfio_ap_on_cfg_changed(struct ap_config_info *new_config_info,
+ struct ap_config_info *old_config_info);
+
#endif /* _VFIO_AP_PRIVATE_H_ */
--
2.21.1
Implements the driver callback invoked by the AP bus when the AP bus
scan has completed. Since this callback is invoked after binding the newly
added devices to their respective device drivers, the vfio_ap driver will
attempt to hot plug the adapters, domains and control domains into each
guest using the matrix mdev to which they are assigned. Keep in mind that
an adapter or domain can be plugged in only if:
* Each APQN derived from the newly added APID of the adapter and the APQIs
already assigned to the guest's APCB references an AP queue device bound
to the vfio_ap driver
* Each APQN derived from the newly added APQI of the domain and the APIDs
already assigned to the guest's APCB references an AP queue device bound
to the vfio_ap driver
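
As a rough illustration of how the hot plug candidates are chosen, the sketch
below intersects the set of IDs newly added to the host configuration with
the set assigned to a matrix mdev. It is a user-space model, not the driver
code: hotplug_candidates() and test_id() are hypothetical names, and the
bitmaps only mimic the most-significant-bit-first numbering used by the
kernel's *_bit_inv() helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_IDS		256
#define NR_WORDS	(NR_IDS / 64)

/* ID 0 is the most significant bit of word 0 (inverted bit numbering). */
static inline int test_id(const uint64_t *map, unsigned int id)
{
	assert(id < NR_IDS);
	return (int)((map[id / 64] >> (63 - id % 64)) & 1);
}

/*
 * Collect the IDs set in both 'added' (new in the host configuration)
 * and 'assigned' (assigned to the mdev). Returns the number of IDs
 * written to out[], which must have room for NR_IDS entries.
 */
size_t hotplug_candidates(const uint64_t *added, const uint64_t *assigned,
			  unsigned int *out)
{
	size_t n = 0;
	unsigned int id;

	for (id = 0; id < NR_IDS; id++)
		if (test_id(added, id) && test_id(assigned, id))
			out[n++] = id;

	return n;
}
```

Each candidate still has to pass the binding check listed above before it is
actually plugged into the guest.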
Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/vfio_ap_drv.c | 1 +
drivers/s390/crypto/vfio_ap_ops.c | 54 +++++++++++++++++++++++++++
drivers/s390/crypto/vfio_ap_private.h | 2 +
3 files changed, 57 insertions(+)
diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
index d7aa5543afef..357481e80b0a 100644
--- a/drivers/s390/crypto/vfio_ap_drv.c
+++ b/drivers/s390/crypto/vfio_ap_drv.c
@@ -152,6 +152,7 @@ static int __init vfio_ap_init(void)
vfio_ap_drv.in_use = vfio_ap_mdev_resource_in_use;
vfio_ap_drv.ids = ap_queue_ids;
vfio_ap_drv.on_config_changed = vfio_ap_on_cfg_changed;
+ vfio_ap_drv.on_scan_complete = vfio_ap_on_scan_complete;
ret = ap_driver_register(&vfio_ap_drv, THIS_MODULE, VFIO_AP_DRV_NAME);
if (ret) {
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 074147fae339..7bfad92dd5e7 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -1805,3 +1805,57 @@ void vfio_ap_on_cfg_changed(struct ap_config_info *new_config_info,
vfio_ap_mdev_on_cfg_add();
mutex_unlock(&matrix_dev->lock);
}
+
+static bool vfio_ap_assign_new_adapters(struct ap_matrix_mdev *matrix_mdev)
+{
+ unsigned long apid;
+ bool assigned = false;
+ DECLARE_BITMAP(ap_add, AP_DEVICES);
+
+ if (bitmap_empty(matrix_dev->ap_add, AP_DEVICES) ||
+ !bitmap_and(ap_add, matrix_dev->ap_add, matrix_mdev->matrix.apm,
+ AP_DEVICES))
+ return false;
+
+ for_each_set_bit_inv(apid, ap_add, AP_DEVICES)
+ assigned |= vfio_ap_assign_apid_to_apcb(matrix_mdev, apid);
+
+ return assigned;
+}
+
+static bool vfio_ap_assign_new_domains(struct ap_matrix_mdev *matrix_mdev)
+{
+ unsigned long apqi;
+ bool assigned = false;
+ DECLARE_BITMAP(aq_add, AP_DOMAINS);
+
+ if (bitmap_empty(matrix_dev->aq_add, AP_DOMAINS) ||
+ !bitmap_and(aq_add, matrix_dev->aq_add, matrix_mdev->matrix.aqm,
+ AP_DOMAINS))
+ return false;
+
+ for_each_set_bit_inv(apqi, aq_add, AP_DOMAINS)
+ assigned |= vfio_ap_assign_apqi_to_apcb(matrix_mdev, apqi);
+
+ return assigned;
+}
+
+void vfio_ap_on_scan_complete(struct ap_config_info *new_config_info,
+ struct ap_config_info *old_config_info)
+{
+ bool do_hotplug;
+ struct ap_matrix_mdev *matrix_mdev;
+
+ mutex_lock(&matrix_dev->lock);
+ list_for_each_entry(matrix_mdev, &matrix_dev->mdev_list, node) {
+ do_hotplug = vfio_ap_assign_new_adapters(matrix_mdev);
+ do_hotplug |= vfio_ap_assign_new_domains(matrix_mdev);
+
+ if (do_hotplug)
+ vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
+ }
+
+ bitmap_clear(matrix_dev->ap_add, 0, AP_DEVICES);
+ bitmap_clear(matrix_dev->aq_add, 0, AP_DOMAINS);
+ mutex_unlock(&matrix_dev->lock);
+}
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index 7bd7e35eb2e0..807be361b95d 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -120,5 +120,7 @@ int vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm);
void vfio_ap_on_cfg_changed(struct ap_config_info *new_config_info,
struct ap_config_info *old_config_info);
+void vfio_ap_on_scan_complete(struct ap_config_info *new_config_info,
+ struct ap_config_info *old_config_info);
#endif /* _VFIO_AP_PRIVATE_H_ */
--
2.21.1
In response to the probe or remove of a queue device, if a KVM guest is
using the matrix mdev to which the APQN of the queue device is assigned,
the vfio_ap device driver must respond accordingly. In an ideal world, the
queue corresponding to the queue device being probed would be hot plugged
into the guest. Likewise, the queue corresponding to the queue device being
removed would be hot unplugged. Unfortunately, the AP architecture
precludes plugging or unplugging individual queues. The queues to which a
guest is granted access are specified as a matrix of adapter and domain
numbers. The Cartesian product of the adapter and domain numbers assigned
to this matrix comprises the AP queue numbers (APQNs) to which the guest will
be granted access; therefore, it becomes obvious that assigning a new
adapter or domain number to the matrix may result in multiple APQNs
getting assigned. Likewise, unassigning an adapter or domain number from
the matrix may result in multiple APQNs getting unassigned. Additionally,
in order to enforce the Linux device model requirement that a pass-through
device must be bound to the driver facilitating its pass-through, each new
APQN assigned to the guest's matrix must reference a queue device bound to
the vfio_ap device driver. The following sections articulate the design
for this patch.
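
To make the Cartesian product concrete, here is a small user-space sketch
that enumerates the APQNs implied by an adapter mask and a domain mask. The
8-bit masks, MODEL_IDS, mk_apqn(), test_id() and matrix_apqns() are all
assumptions made for illustration; mk_apqn() is merely modeled on the
kernel's AP_MKQID() layout (APID in the high byte, APQI in the low byte).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_IDS 8	/* hypothetical: the real masks hold 256 IDs */

/* Modeled on AP_MKQID(): APID in the high byte, APQI in the low byte. */
static inline uint16_t mk_apqn(unsigned int apid, unsigned int apqi)
{
	assert(apid < 256 && apqi < 256);
	return (uint16_t)((apid << 8) | apqi);
}

/* ID 0 is the leftmost (most significant) bit, as in the APM and AQM. */
static inline int test_id(uint8_t mask, unsigned int id)
{
	assert(id < MODEL_IDS);
	return (mask >> (MODEL_IDS - 1 - id)) & 1;
}

/*
 * Write every APQN in the Cartesian product of the adapters in apm and
 * the domains in aqm to out[] and return how many were written.
 * out[] must have room for MODEL_IDS * MODEL_IDS entries.
 */
size_t matrix_apqns(uint8_t apm, uint8_t aqm, uint16_t *out)
{
	size_t n = 0;
	unsigned int apid, apqi;

	for (apid = 0; apid < MODEL_IDS; apid++) {
		if (!test_id(apm, apid))
			continue;
		for (apqi = 0; apqi < MODEL_IDS; apqi++)
			if (test_id(aqm, apqi))
				out[n++] = mk_apqn(apid, apqi);
	}

	return n;
}
```

With two adapters and two domains assigned the function yields four APQNs;
setting one more adapter bit yields six, which is why a single assignment
can grant access to several queues at once.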
Probing a queue device:
----------------------
The goal here is to assign the APQN of the queue being probed to the
guest's matrix if possible by adhering to a set of rules:
* The adapter number (APID) will be assigned to the guest matrix iff:
1. The adapter is in the host's AP configuration
2. The APID is not yet assigned to the guest's matrix
3. Each APQN derived from the APID and the domain numbers (APQI) of
domains already assigned to the guest's matrix references a queue
device bound to the vfio_ap device driver
* The domain number (APQI) will be assigned to the guest matrix iff:
1. The domain is in the host's AP configuration
2. The APQI is not yet assigned to the guest's matrix
3. Each APQN derived from the APQI and the APIDs of
adapters already assigned to the guest's matrix references a queue
device bound to the vfio_ap device driver
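
The three rules for the adapter number can be modeled as shown below. This is
a simplified sketch, not the driver's implementation: struct model_state,
id_bit() and try_assign_apid() are hypothetical, the masks are reduced to
8 bits, and the bound[] array stands in for the lookup of queue devices bound
to the vfio_ap device driver. The rules for the domain number are symmetric.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_IDS 8	/* hypothetical: the real masks hold 256 IDs */

/* Hypothetical, simplified state; not the driver's ap_matrix_mdev. */
struct model_state {
	uint8_t host_apm;	/* adapters in the host AP configuration */
	uint8_t guest_apm;	/* adapters in the guest's matrix */
	uint8_t guest_aqm;	/* domains in the guest's matrix */
	/* bound[apid]: domains whose queue on that adapter is bound */
	uint8_t bound[MODEL_IDS];
};

/* ID 0 is the leftmost (most significant) bit of a mask. */
static inline uint8_t id_bit(unsigned int id)
{
	assert(id < MODEL_IDS);
	return (uint8_t)(0x80u >> id);
}

/* Assign apid to the guest's matrix if all three rules hold. */
bool try_assign_apid(struct model_state *s, unsigned int apid)
{
	uint8_t bit = id_bit(apid);

	if (!(s->host_apm & bit))	/* rule 1: in host configuration */
		return false;
	if (s->guest_apm & bit)		/* rule 2: not yet assigned */
		return false;
	/* rule 3: every guest domain must be bound on this adapter */
	if ((s->bound[apid] & s->guest_aqm) != s->guest_aqm)
		return false;

	s->guest_apm |= bit;
	return true;
}
```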
Removing a queue device:
-----------------------
Unassigning the adapter number from the guest's matrix will remove access
to all domains on the adapter from the guest. Unassigning the domain
number from the guest's matrix will remove access to that domain on all
adapters assigned to the guest matrix. If both the adapter and domain are
unassigned from the guest's matrix, the guest loses every domain on that
adapter and also loses that domain on every other adapter.
Since an AP adapter card is the actual hardware
device that gets physically plugged/unplugged, unassigning the adapter
number from the guest's matrix makes the most sense here.
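
The trade-off between the three options can be illustrated by counting how
many APQNs the guest loses in each case. The sketch below is a hypothetical
user-space model using 8-bit masks; none of these function names exist in
the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Count set bits; each set bit is one assigned adapter or domain. */
static unsigned int count_ids(uint8_t mask)
{
	unsigned int n = 0;

	for (; mask; mask &= (uint8_t)(mask - 1))
		n++;

	return n;
}

/* APQNs lost when one assigned adapter is unplugged: one per domain. */
unsigned int lost_by_unplugging_adapter(uint8_t apm, uint8_t aqm)
{
	assert(apm != 0);
	return count_ids(aqm);
}

/* APQNs lost when one assigned domain is unplugged: one per adapter. */
unsigned int lost_by_unplugging_domain(uint8_t apm, uint8_t aqm)
{
	assert(aqm != 0);
	return count_ids(apm);
}

/* Unplugging both removes one row and one column sharing one cell. */
unsigned int lost_by_unplugging_both(uint8_t apm, uint8_t aqm)
{
	assert(apm != 0 && aqm != 0);
	return count_ids(apm) + count_ids(aqm) - 1;
}
```

For a guest with three adapters and two domains, unplugging the adapter
costs two APQNs, unplugging the domain costs three, and unplugging both
costs four.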
Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/vfio_ap_ops.c | 37 +++++++++++++++++++++++++++++++
1 file changed, 37 insertions(+)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 4f96b7861607..1179c6af59c6 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -1508,6 +1508,23 @@ static void vfio_ap_queue_link_mdev(struct vfio_ap_queue *q)
}
}
+
+static void vfio_ap_mdev_hot_plug_queue(struct vfio_ap_queue *q)
+{
+ bool hot_plug = false;
+ unsigned long apid = (unsigned long)AP_QID_CARD(q->apqn);
+ unsigned long apqi = (unsigned long)AP_QID_QUEUE(q->apqn);
+
+ if (q->matrix_mdev == NULL)
+ return;
+
+ hot_plug |= vfio_ap_assign_apid_to_apcb(q->matrix_mdev, apid);
+ hot_plug |= vfio_ap_assign_apqi_to_apcb(q->matrix_mdev, apqi);
+
+ if (hot_plug)
+ vfio_ap_mdev_commit_shadow_apcb(q->matrix_mdev);
+}
+
/**
* vfio_ap_mdev_probe_queue:
*
@@ -1526,11 +1543,30 @@ int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
q->apqn = to_ap_queue(&apdev->device)->qid;
q->saved_isc = VFIO_AP_ISC_INVALID;
vfio_ap_queue_link_mdev(q);
+ vfio_ap_mdev_hot_plug_queue(q);
mutex_unlock(&matrix_dev->lock);
return 0;
}
+static void vfio_ap_mdev_hot_unplug_queue(struct vfio_ap_queue *q)
+{
+ unsigned long apid;
+ unsigned long apqi;
+
+ if (q->matrix_mdev == NULL)
+ return;
+
+ apid = AP_QID_CARD(q->apqn);
+ apqi = AP_QID_QUEUE(q->apqn);
+
+ if (test_bit_inv(apid, q->matrix_mdev->shadow_apcb.apm) &&
+ test_bit_inv(apqi, q->matrix_mdev->shadow_apcb.aqm)) {
+ clear_bit_inv(apid, q->matrix_mdev->shadow_apcb.apm);
+ vfio_ap_mdev_commit_shadow_apcb(q->matrix_mdev);
+ }
+}
+
/**
* vfio_ap_mdev_remove_queue:
*
@@ -1544,6 +1580,7 @@ void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
mutex_lock(&matrix_dev->lock);
q = dev_get_drvdata(&apdev->device);
+ vfio_ap_mdev_hot_unplug_queue(q);
dev_set_drvdata(&apdev->device, NULL);
apid = AP_QID_CARD(q->apqn);
apqi = AP_QID_QUEUE(q->apqn);
--
2.21.1
Let's hot plug/unplug adapters, domains and control domains assigned to or
unassigned from an AP matrix mdev device while it is in use by a guest per
the following rules:
* Assign an adapter to mdev's matrix:
The adapter will be hot plugged into the guest under the following
conditions:
1. The adapter is not yet assigned to the guest's matrix
2. At least one domain is assigned to the guest's matrix
3. Each APQN derived from the APID of the newly assigned adapter and
the APQIs of the domains already assigned to the guest's
matrix references a queue device bound to the vfio_ap device driver.
The adapter and each domain assigned to the mdev's matrix will be hot
plugged into the guest under the following conditions:
1. The adapter is not yet assigned to the guest's matrix
2. No domains are assigned to the guest's matrix
3. At least one domain is assigned to the mdev's matrix
4. Each APQN derived from the APID of the newly assigned adapter and
the APQIs of the domains assigned to the mdev's matrix references a
queue device bound to the vfio_ap device driver.
* Unassign an adapter from mdev's matrix:
The adapter will be hot unplugged from the KVM guest if it is
assigned to the guest's matrix.
* Assign a domain to mdev's matrix:
The domain will be hot plugged into the guest under the following
conditions:
1. The domain is not yet assigned to the guest's matrix
2. At least one adapter is assigned to the guest's matrix
3. Each APQN derived from the APQI of the newly assigned domain and
the APIDs of the adapters already assigned to the guest's
matrix references a queue device bound to the vfio_ap device driver.
The domain and each adapter assigned to the mdev's matrix will be hot
plugged into the guest under the following conditions:
1. The domain is not yet assigned to the guest's matrix
2. No adapters are assigned to the guest's matrix
3. At least one adapter is assigned to the mdev's matrix
4. Each APQN derived from the APQI of the newly assigned domain and
the APIDs of the adapters assigned to the mdev's matrix references a
queue device bound to the vfio_ap device driver.
* Unassign a domain from mdev's matrix:
The domain will be hot unplugged from the KVM guest if it is
assigned to the guest's matrix.
* Assign a control domain:
The control domain will be hot plugged into the KVM guest if it is not
assigned to the guest's APCB. The AP architecture ensures a guest will
only get access to the control domain if it is in the host's AP
configuration, so there is no risk in hot plugging it. If it is not yet in
the host's AP configuration, it will automatically become available to the
guest once it is added to the host configuration.
* Unassign a control domain:
The control domain will be hot unplugged from the KVM guest if it is
assigned to the guest's APCB.
Note: Now that hot plug/unplug is implemented, there is the possibility
that an assignment/unassignment of an adapter, domain or control
domain could be initiated while the guest is starting, so the
matrix device lock will be taken for the group notification callback
that initializes the guest's APCB when the KVM pointer is made
available to the vfio_ap device driver.
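
The two sets of conditions for assigning an adapter differ only in which
domains are checked: the guest's own domains if it has any, otherwise the
domains assigned to the mdev. A minimal user-space sketch of that fallback
follows. struct model_mdev and hot_plug_adapter() are hypothetical, the masks
are reduced to 8 bits, and bound[] stands in for the lookup of queue devices
bound to the vfio_ap device driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_IDS 8	/* hypothetical: the real masks hold 256 IDs */

/* Hypothetical, simplified state; not the driver's ap_matrix_mdev. */
struct model_mdev {
	uint8_t mdev_aqm;	/* domains assigned to the mdev's matrix */
	uint8_t guest_apm;	/* adapters in the guest's matrix */
	uint8_t guest_aqm;	/* domains in the guest's matrix */
	/* bound[apid]: domains whose queue on that adapter is bound */
	uint8_t bound[MODEL_IDS];
};

/* Returns true if the guest's matrix changed and must be committed. */
bool hot_plug_adapter(struct model_mdev *m, unsigned int apid)
{
	uint8_t bit, aqm;

	assert(apid < MODEL_IDS);
	bit = (uint8_t)(0x80u >> apid);	/* ID 0 is the leftmost bit */
	aqm = m->guest_aqm;

	if (m->guest_apm & bit)		/* already plugged in */
		return false;

	/* The guest has no domains yet: fall back to the mdev's domains. */
	if (!aqm)
		aqm = m->mdev_aqm;
	if (!aqm)			/* nothing to pair the adapter with */
		return false;

	/* Every resulting APQN must reference a bound queue device. */
	if ((m->bound[apid] & aqm) != aqm)
		return false;

	m->guest_apm |= bit;
	m->guest_aqm = aqm;	/* a no-op unless the fallback was used */
	return true;
}
```

The domain case mirrors this with the roles of the two masks swapped.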
Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/vfio_ap_ops.c | 190 +++++++++++++++++++++++++-----
1 file changed, 159 insertions(+), 31 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 586ec5776693..4f96b7861607 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -631,6 +631,60 @@ static void vfio_ap_mdev_manage_qlinks(struct ap_matrix_mdev *matrix_mdev,
}
}
+static bool vfio_ap_assign_apid_to_apcb(struct ap_matrix_mdev *matrix_mdev,
+ unsigned long apid)
+{
+ unsigned long apqi, apqn;
+ unsigned long *aqm = matrix_mdev->shadow_apcb.aqm;
+
+ /*
+ * If the APID is already assigned to the guest's shadow APCB, there is
+ * no need to assign it.
+ */
+ if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm))
+ return false;
+
+ /*
+ * If no domains have yet been assigned to the shadow APCB and one or
+ * more domains have been assigned to the matrix mdev, then use
+ * the domains assigned to the matrix mdev; otherwise, there is nothing
+ * to assign to the shadow APCB.
+ */
+ if (bitmap_empty(matrix_mdev->shadow_apcb.aqm, AP_DOMAINS)) {
+ if (bitmap_empty(matrix_mdev->matrix.aqm, AP_DOMAINS))
+ return false;
+
+ aqm = matrix_mdev->matrix.aqm;
+ }
+
+ /* Make sure all APQNs are bound to the vfio_ap driver */
+ for_each_set_bit_inv(apqi, aqm, AP_DOMAINS) {
+ apqn = AP_MKQID(apid, apqi);
+
+ if (vfio_ap_mdev_get_queue(matrix_mdev, apqn) == NULL)
+ return false;
+ }
+
+ set_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
+
+ /*
+ * If we verified APQNs using the domains assigned to the matrix mdev,
+ * then copy the APQIs of those domains into the guest's APCB
+ */
+ if (bitmap_empty(matrix_mdev->shadow_apcb.aqm, AP_DOMAINS))
+ bitmap_copy(matrix_mdev->shadow_apcb.aqm,
+ matrix_mdev->matrix.aqm, AP_DOMAINS);
+
+ return true;
+}
+
+static void vfio_ap_mdev_hot_plug_adapter(struct ap_matrix_mdev *matrix_mdev,
+ unsigned long apid)
+{
+ if (vfio_ap_assign_apid_to_apcb(matrix_mdev, apid))
+ vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
+}
+
/**
* assign_adapter_store
*
@@ -673,10 +727,6 @@ static ssize_t assign_adapter_store(struct device *dev,
struct mdev_device *mdev = mdev_from_dev(dev);
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
- /* If the guest is running, disallow assignment of adapter */
- if (matrix_mdev->kvm)
- return -EBUSY;
-
ret = kstrtoul(buf, 0, &apid);
if (ret)
return ret;
@@ -698,12 +748,22 @@ static ssize_t assign_adapter_store(struct device *dev,
}
set_bit_inv(apid, matrix_mdev->matrix.apm);
vfio_ap_mdev_manage_qlinks(matrix_mdev, LINK_APID, apid);
+ vfio_ap_mdev_hot_plug_adapter(matrix_mdev, apid);
mutex_unlock(&matrix_dev->lock);
return count;
}
static DEVICE_ATTR_WO(assign_adapter);
+static void vfio_ap_mdev_hot_unplug_adapter(struct ap_matrix_mdev *matrix_mdev,
+ unsigned long apid)
+{
+ if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm)) {
+ clear_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
+ vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
+ }
+}
+
/**
* unassign_adapter_store
*
@@ -730,10 +790,6 @@ static ssize_t unassign_adapter_store(struct device *dev,
struct mdev_device *mdev = mdev_from_dev(dev);
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
- /* If the guest is running, disallow un-assignment of adapter */
- if (matrix_mdev->kvm)
- return -EBUSY;
-
ret = kstrtoul(buf, 0, &apid);
if (ret)
return ret;
@@ -744,12 +800,67 @@ static ssize_t unassign_adapter_store(struct device *dev,
mutex_lock(&matrix_dev->lock);
clear_bit_inv((unsigned long)apid, matrix_mdev->matrix.apm);
vfio_ap_mdev_manage_qlinks(matrix_mdev, UNLINK_APID, apid);
+ vfio_ap_mdev_hot_unplug_adapter(matrix_mdev, apid);
mutex_unlock(&matrix_dev->lock);
return count;
}
static DEVICE_ATTR_WO(unassign_adapter);
+static bool vfio_ap_assign_apqi_to_apcb(struct ap_matrix_mdev *matrix_mdev,
+ unsigned long apqi)
+{
+ unsigned long apid, apqn;
+ unsigned long *apm = matrix_mdev->shadow_apcb.apm;
+
+ /*
+ * If the APQI is already assigned to the guest's shadow APCB, there is
+ * no need to assign it.
+ */
+ if (test_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm))
+ return false;
+
+ /*
+ * If no adapters have yet been assigned to the shadow APCB and one or
+ * more adapters have been assigned to the matrix mdev, then use
+ * the adapters assigned to the matrix mdev; otherwise, there is nothing
+ * to assign to the shadow APCB.
+ */
+ if (bitmap_empty(matrix_mdev->shadow_apcb.apm, AP_DEVICES)) {
+ if (bitmap_empty(matrix_mdev->matrix.apm, AP_DEVICES))
+ return false;
+
+ apm = matrix_mdev->matrix.apm;
+ }
+
+ /* Make sure all APQNs are bound to the vfio_ap driver */
+ for_each_set_bit_inv(apid, apm, AP_DEVICES) {
+ apqn = AP_MKQID(apid, apqi);
+
+ if (vfio_ap_mdev_get_queue(matrix_mdev, apqn) == NULL)
+ return false;
+ }
+
+ set_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm);
+
+ /*
+ * If we verified APQNs using the adapters assigned to the matrix mdev,
+ * then copy the APIDs of those adapters into the guest's APCB
+ */
+ if (bitmap_empty(matrix_mdev->shadow_apcb.apm, AP_DEVICES))
+ bitmap_copy(matrix_mdev->shadow_apcb.apm,
+ matrix_mdev->matrix.apm, AP_DEVICES);
+
+ return true;
+}
+
+static void vfio_ap_mdev_hot_plug_domain(struct ap_matrix_mdev *matrix_mdev,
+ unsigned long apqi)
+{
+ if (vfio_ap_assign_apqi_to_apcb(matrix_mdev, apqi))
+ vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
+}
+
/**
* assign_domain_store
*
@@ -793,10 +904,6 @@ static ssize_t assign_domain_store(struct device *dev,
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
unsigned long max_apqi = matrix_mdev->matrix.aqm_max;
- /* If the guest is running, disallow assignment of domain */
- if (matrix_mdev->kvm)
- return -EBUSY;
-
ret = kstrtoul(buf, 0, &apqi);
if (ret)
return ret;
@@ -817,12 +924,21 @@ static ssize_t assign_domain_store(struct device *dev,
}
set_bit_inv(apqi, matrix_mdev->matrix.aqm);
vfio_ap_mdev_manage_qlinks(matrix_mdev, LINK_APQI, apqi);
+ vfio_ap_mdev_hot_plug_domain(matrix_mdev, apqi);
mutex_unlock(&matrix_dev->lock);
return count;
}
static DEVICE_ATTR_WO(assign_domain);
+static void vfio_ap_mdev_hot_unplug_domain(struct ap_matrix_mdev *matrix_mdev,
+ unsigned long apqi)
+{
+ if (test_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm)) {
+ clear_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm);
+ vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
+ }
+}
/**
* unassign_domain_store
@@ -850,10 +966,6 @@ static ssize_t unassign_domain_store(struct device *dev,
struct mdev_device *mdev = mdev_from_dev(dev);
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
- /* If the guest is running, disallow un-assignment of domain */
- if (matrix_mdev->kvm)
- return -EBUSY;
-
ret = kstrtoul(buf, 0, &apqi);
if (ret)
return ret;
@@ -864,12 +976,22 @@ static ssize_t unassign_domain_store(struct device *dev,
mutex_lock(&matrix_dev->lock);
clear_bit_inv((unsigned long)apqi, matrix_mdev->matrix.aqm);
vfio_ap_mdev_manage_qlinks(matrix_mdev, UNLINK_APQI, apqi);
+ vfio_ap_mdev_hot_unplug_domain(matrix_mdev, apqi);
mutex_unlock(&matrix_dev->lock);
return count;
}
static DEVICE_ATTR_WO(unassign_domain);
+static void vfio_ap_mdev_hot_plug_ctl_domain(struct ap_matrix_mdev *matrix_mdev,
+ unsigned long domid)
+{
+ if (!test_bit_inv(domid, matrix_mdev->shadow_apcb.adm)) {
+ set_bit_inv(domid, matrix_mdev->shadow_apcb.adm);
+ vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
+ }
+}
+
/**
* assign_control_domain_store
*
@@ -895,10 +1017,6 @@ static ssize_t assign_control_domain_store(struct device *dev,
struct mdev_device *mdev = mdev_from_dev(dev);
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
- /* If the guest is running, disallow assignment of control domain */
- if (matrix_mdev->kvm)
- return -EBUSY;
-
ret = kstrtoul(buf, 0, &id);
if (ret)
return ret;
@@ -914,12 +1032,23 @@ static ssize_t assign_control_domain_store(struct device *dev,
if (!mutex_trylock(&matrix_dev->lock))
return -EBUSY;
set_bit_inv(id, matrix_mdev->matrix.adm);
+ vfio_ap_mdev_hot_plug_ctl_domain(matrix_mdev, id);
mutex_unlock(&matrix_dev->lock);
return count;
}
static DEVICE_ATTR_WO(assign_control_domain);
+static void
+vfio_ap_mdev_hot_unplug_ctl_domain(struct ap_matrix_mdev *matrix_mdev,
+ unsigned long domid)
+{
+ if (test_bit_inv(domid, matrix_mdev->shadow_apcb.adm)) {
+ clear_bit_inv(domid, matrix_mdev->shadow_apcb.adm);
+ vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
+ }
+}
+
/**
* unassign_control_domain_store
*
@@ -946,10 +1075,6 @@ static ssize_t unassign_control_domain_store(struct device *dev,
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
unsigned long max_domid = matrix_mdev->matrix.adm_max;
- /* If the guest is running, disallow un-assignment of control domain */
- if (matrix_mdev->kvm)
- return -EBUSY;
-
ret = kstrtoul(buf, 0, &domid);
if (ret)
return ret;
@@ -958,6 +1083,7 @@ static ssize_t unassign_control_domain_store(struct device *dev,
mutex_lock(&matrix_dev->lock);
clear_bit_inv(domid, matrix_mdev->matrix.adm);
+ vfio_ap_mdev_hot_unplug_ctl_domain(matrix_mdev, domid);
mutex_unlock(&matrix_dev->lock);
return count;
@@ -1099,8 +1225,6 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
{
struct ap_matrix_mdev *m;
- mutex_lock(&matrix_dev->lock);
-
list_for_each_entry(m, &matrix_dev->mdev_list, node) {
if ((m != matrix_mdev) && (m->kvm == kvm)) {
mutex_unlock(&matrix_dev->lock);
@@ -1111,7 +1235,6 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
matrix_mdev->kvm = kvm;
kvm_get_kvm(kvm);
kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
- mutex_unlock(&matrix_dev->lock);
return 0;
}
@@ -1148,7 +1271,7 @@ static int vfio_ap_mdev_iommu_notifier(struct notifier_block *nb,
static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
unsigned long action, void *data)
{
- int ret;
+ int ret = NOTIFY_DONE;
struct ap_matrix_mdev *matrix_mdev;
if (action != VFIO_GROUP_NOTIFY_SET_KVM)
@@ -1156,23 +1279,28 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
matrix_mdev = container_of(nb, struct ap_matrix_mdev, group_notifier);
+ mutex_lock(&matrix_dev->lock);
+
if (!data) {
if (matrix_mdev->kvm)
kvm_put_kvm(matrix_mdev->kvm);
matrix_mdev->kvm = NULL;
- return NOTIFY_OK;
+ ret = NOTIFY_OK;
+ goto done;
}
ret = vfio_ap_mdev_set_kvm(matrix_mdev, data);
if (ret)
- return NOTIFY_DONE;
+ goto done;
vfio_ap_mdev_init_apcb(matrix_mdev);
vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
- return NOTIFY_OK;
+done:
+ mutex_unlock(&matrix_dev->lock);
+ return ret;
}
static int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
--
2.21.1
Update the documentation in vfio-ap.rst to include information about the
AP dynamic configuration support (i.e., hot plug of adapters, domains
and control domains via the matrix mediated device's sysfs assignment
attributes).
Signed-off-by: Tony Krowiak <[email protected]>
---
Documentation/s390/vfio-ap.rst | 401 +++++++++++++++++++++++++--------
1 file changed, 302 insertions(+), 99 deletions(-)
diff --git a/Documentation/s390/vfio-ap.rst b/Documentation/s390/vfio-ap.rst
index e15436599086..09f4add1c805 100644
--- a/Documentation/s390/vfio-ap.rst
+++ b/Documentation/s390/vfio-ap.rst
@@ -123,9 +123,9 @@ Let's now take a look at how AP instructions executed on a guest are interpreted
by the hardware.
A satellite control block called the Crypto Control Block (CRYCB) is attached to
-our main hardware virtualization control block. The CRYCB contains three fields
-to identify the adapters, usage domains and control domains assigned to the KVM
-guest:
+our main hardware virtualization control block. The CRYCB contains an AP Control
+Block (APCB) that has three fields to identify the adapters, usage domains and
+control domains assigned to the KVM guest:
* The AP Mask (APM) field is a bit mask that identifies the AP adapters assigned
to the KVM guest. Each bit in the mask, from left to right (i.e. from most
@@ -192,7 +192,7 @@ The design introduces three new objects:
1. AP matrix device
2. VFIO AP device driver (vfio_ap.ko)
-3. VFIO AP mediated matrix pass-through device
+3. VFIO AP mediated pass-through device
The VFIO AP device driver
-------------------------
@@ -200,12 +200,13 @@ The VFIO AP (vfio_ap) device driver serves the following purposes:
1. Provides the interfaces to secure APQNs for exclusive use of KVM guests.
-2. Sets up the VFIO mediated device interfaces to manage a mediated matrix
+2. Sets up the VFIO mediated device interfaces to manage a vfio_ap mediated
device and creates the sysfs interfaces for assigning adapters, usage
domains, and control domains comprising the matrix for a KVM guest.
-3. Configures the APM, AQM and ADM in the CRYCB referenced by a KVM guest's
- SIE state description to grant the guest access to a matrix of AP devices
+3. Configures the APM, AQM and ADM in the APCB contained in the CRYCB referenced
+ by a KVM guest's SIE state description to grant the guest access to a matrix
+ of AP devices
Reserve APQNs for exclusive use of KVM guests
---------------------------------------------
@@ -253,7 +254,7 @@ The process for reserving an AP queue for use by a KVM guest is:
1. The administrator loads the vfio_ap device driver
2. The vfio-ap driver during its initialization will register a single 'matrix'
device with the device core. This will serve as the parent device for
- all mediated matrix devices used to configure an AP matrix for a guest.
+ all vfio_ap mediated devices used to configure an AP matrix for a guest.
3. The /sys/devices/vfio_ap/matrix device is created by the device core
4. The vfio_ap device driver will register with the AP bus for AP queue devices
of type 10 and higher (CEX4 and newer). The driver will provide the vfio_ap
@@ -269,7 +270,7 @@ The process for reserving an AP queue for use by a KVM guest is:
default zcrypt cex4queue driver.
8. The AP bus probes the vfio_ap device driver to bind the queues reserved for
it.
-9. The administrator creates a passthrough type mediated matrix device to be
+9. The administrator creates a passthrough type vfio_ap mediated device to be
used by a guest
10. The administrator assigns the adapters, usage domains and control domains
to be exclusively used by a guest.
@@ -279,14 +280,14 @@ Set up the VFIO mediated device interfaces
The VFIO AP device driver utilizes the common interface of the VFIO mediated
device core driver to:
-* Register an AP mediated bus driver to add a mediated matrix device to and
+* Register an AP mediated bus driver to add a vfio_ap mediated device to and
remove it from a VFIO group.
-* Create and destroy a mediated matrix device
-* Add a mediated matrix device to and remove it from the AP mediated bus driver
-* Add a mediated matrix device to and remove it from an IOMMU group
+* Create and destroy a vfio_ap mediated device
+* Add a vfio_ap mediated device to and remove it from the AP mediated bus driver
+* Add a vfio_ap mediated device to and remove it from an IOMMU group
The following high-level block diagram shows the main components and interfaces
-of the VFIO AP mediated matrix device driver::
+of the VFIO AP mediated device driver::
+-------------+
| |
@@ -343,7 +344,7 @@ matrix device.
* device_api:
the mediated device type's API
* available_instances:
- the number of mediated matrix passthrough devices
+ the number of vfio_ap mediated passthrough devices
that can be created
* device_api:
specifies the VFIO API
@@ -351,29 +352,37 @@ matrix device.
This attribute group identifies the user-defined sysfs attributes of the
mediated device. When a device is registered with the VFIO mediated device
framework, the sysfs attribute files identified in the 'mdev_attr_groups'
- structure will be created in the mediated matrix device's directory. The
- sysfs attributes for a mediated matrix device are:
+ structure will be created in the vfio_ap mediated device's directory. The
+ sysfs attributes for a vfio_ap mediated device are:
assign_adapter / unassign_adapter:
Write-only attributes for assigning/unassigning an AP adapter to/from the
- mediated matrix device. To assign/unassign an adapter, the APID of the
+ vfio_ap mediated device. To assign/unassign an adapter, the APID of the
adapter is echoed to the respective attribute file.
assign_domain / unassign_domain:
Write-only attributes for assigning/unassigning an AP usage domain to/from
- the mediated matrix device. To assign/unassign a domain, the domain
+ the vfio_ap mediated device. To assign/unassign a domain, the domain
number of the usage domain is echoed to the respective attribute
file.
matrix:
- A read-only file for displaying the APQNs derived from the cross product
- of the adapter and domain numbers assigned to the mediated matrix device.
+ A read-only file for displaying the APQNs derived from the Cartesian
+ product of the adapter and domain numbers assigned to the vfio_ap mediated
+ device.
+ guest_matrix:
+ A read-only file for displaying the APQNs derived from the Cartesian
+ product of the adapter and domain numbers assigned to the APM and AQM
+ fields respectively of the KVM guest's CRYCB. This may differ from the
+ the APQNs assigned to the vfio_ap mediated device if any APQN does not
+ reference a queue device bound to the vfio_ap device driver (i.e., the
+ queue is not in the host's AP configuration).
assign_control_domain / unassign_control_domain:
Write-only attributes for assigning/unassigning an AP control domain
- to/from the mediated matrix device. To assign/unassign a control domain,
+ to/from the vfio_ap mediated device. To assign/unassign a control domain,
the ID of the domain to be assigned/unassigned is echoed to the respective
attribute file.
control_domains:
A read-only file for displaying the control domain numbers assigned to the
- mediated matrix device.
+ vfio_ap mediated device.
* functions:
@@ -385,7 +394,7 @@ matrix device.
domains assigned via the corresponding sysfs attributes files
remove:
- deallocates the mediated matrix device's ap_matrix_mdev structure. This will
+ deallocates the vfio_ap mediated device's ap_matrix_mdev structure. This will
be allowed only if a running guest is not using the mdev.
* callback interfaces
@@ -397,24 +406,44 @@ matrix device.
for the mdev matrix device to the MDEV bus. Access to the KVM structure used
to configure the KVM guest is provided via this callback. The KVM structure,
is used to configure the guest's access to the AP matrix defined via the
- mediated matrix device's sysfs attribute files.
+ vfio_ap mediated device's sysfs attribute files.
release:
unregisters the VFIO_GROUP_NOTIFY_SET_KVM notifier callback function for the
mdev matrix device and deconfigures the guest's AP matrix.
-Configure the APM, AQM and ADM in the CRYCB
--------------------------------------------
-Configuring the AP matrix for a KVM guest will be performed when the
+Configure the guest's AP resources
+----------------------------------
+Configuring the AP resources for a KVM guest will be performed when the
VFIO_GROUP_NOTIFY_SET_KVM notifier callback is invoked. The notifier
-function is called when QEMU connects to KVM. The guest's AP matrix is
-configured via it's CRYCB by:
+function is called when QEMU connects to KVM. The guest's AP resources are
+configured via its APCB by:
* Setting the bits in the APM corresponding to the APIDs assigned to the
- mediated matrix device via its 'assign_adapter' interface.
+ vfio_ap mediated device via its 'assign_adapter' interface.
* Setting the bits in the AQM corresponding to the domains assigned to the
- mediated matrix device via its 'assign_domain' interface.
+ vfio_ap mediated device via its 'assign_domain' interface.
* Setting the bits in the ADM corresponding to the domain dIDs assigned to the
- mediated matrix device via its 'assign_control_domains' interface.
+ vfio_ap mediated device via its 'assign_control_domain' interface.
+
+The linux device model precludes passing a device through to a KVM guest that
+is not bound to the device driver facilitating its pass-through. Consequently,
+an APQN that does not reference a queue device bound to the vfio_ap device
+driver will not be assigned to a KVM guest's matrix. The AP architecture,
+however, does not provide a means to filter individual APQNs from the guest's
+matrix, so the adapters assigned to the vfio_ap mediated device via its
+sysfs 'assign_adapter' interface will be filtered as follows:
+
+ To filter APQNs by APID, each APQN derived from the Cartesian product of the
+ adapter numbers (APID) and domain numbers (APQI) assigned to the vfio_ap mdev
+ is examined and if any one of them does not reference a queue device bound to
+ the vfio_ap device driver, the adapter will not be plugged into the guest
+ (i.e., the bit corresponding to its APID will not be set in the APM of the
+ guest's APCB).
+
+ If at least one adapter is plugged into the guest, then all domains assigned
+ to the vfio_ap mdev will also be plugged into the guest (i.e., the bits
+ corresponding to the APQIs of the domains assigned to the vfio_ap mdev will be
+ set in the AQM field of the guest's APCB).
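The filtering rules above can be sketched in user-space C. This is only an illustration, not the driver's implementation: the `struct ap_model` type, its `bound` table and the `filter_guest_matrix()` helper are hypothetical names invented for this sketch. It also assumes that an adapter with no assigned domains to check is treated as pluggable, which the text does not state.

```c
#include <stdbool.h>

#define AP_MAX 256 /* APIDs and APQIs both range over 0-255 */

/* Hypothetical model of the state consulted by the filter. */
struct ap_model {
	bool mdev_apm[AP_MAX];      /* adapters assigned to the vfio_ap mdev */
	bool mdev_aqm[AP_MAX];      /* domains assigned to the vfio_ap mdev */
	bool bound[AP_MAX][AP_MAX]; /* queue (apid, apqi) bound to vfio_ap */
};

/*
 * Fill guest_apm/guest_aqm according to the filtering rules.
 * Returns the number of adapters plugged into the guest.
 */
static int filter_guest_matrix(const struct ap_model *m,
			       bool guest_apm[AP_MAX], bool guest_aqm[AP_MAX])
{
	int plugged = 0;

	for (int apid = 0; apid < AP_MAX; apid++) {
		bool all_bound = true;

		guest_apm[apid] = false;
		if (!m->mdev_apm[apid])
			continue;

		/* One unbound queue is enough to filter out the whole adapter. */
		for (int apqi = 0; apqi < AP_MAX; apqi++) {
			if (m->mdev_aqm[apqi] && !m->bound[apid][apqi]) {
				all_bound = false;
				break;
			}
		}

		if (all_bound) {
			guest_apm[apid] = true;
			plugged++;
		}
	}

	/* Domains are plugged only if at least one adapter survived. */
	for (int apqi = 0; apqi < AP_MAX; apqi++)
		guest_aqm[apqi] = plugged > 0 && m->mdev_aqm[apqi];

	return plugged;
}
```

With adapters 5 and 6 and domains 4 and 71 assigned, and only queue (6,71) unbound, the sketch plugs adapter 5 and both domains but leaves adapter 6 out of the guest's APM.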
The CPU model features for AP
-----------------------------
@@ -435,16 +464,20 @@ available to a KVM guest via the following CPU model features:
can be made available to the guest only if it is available on the host (i.e.,
facility bit 12 is set).
+4. apqi: Indicates AP queue interrupts are available on the guest. This facility
+ can be made available to the guest only if it is available on the host (i.e.,
+ facility bit 65 is set).
+
Note: If the user chooses to specify a CPU model different than the 'host'
model to QEMU, the CPU model features and facilities need to be turned on
explicitly; for example::
- /usr/bin/qemu-system-s390x ... -cpu z13,ap=on,apqci=on,apft=on
+ /usr/bin/qemu-system-s390x ... -cpu z13,ap=on,apqci=on,apft=on,apqi=on
A guest can be precluded from using AP features/facilities by turning them off
explicitly; for example::
- /usr/bin/qemu-system-s390x ... -cpu host,ap=off,apqci=off,apft=off
+ /usr/bin/qemu-system-s390x ... -cpu host,ap=off,apqci=off,apft=off,apqi=off
Note: If the APFT facility is turned off (apft=off) for the guest, the guest
will not see any AP devices. The zcrypt device drivers that register for type 10
@@ -530,40 +563,56 @@ These are the steps:
2. Secure the AP queues to be used by the three guests so that the host can not
access them. To secure them, there are two sysfs files that specify
- bitmasks marking a subset of the APQN range as 'usable by the default AP
- queue device drivers' or 'not usable by the default device drivers' and thus
- available for use by the vfio_ap device driver'. The location of the sysfs
- files containing the masks are::
+ bitmasks marking a subset of the APQN range as usable only by the default AP
+ queue device drivers. All remaining APQNs are available for use by
+ any other device driver. The vfio_ap device driver is currently the only
+ non-default device driver. The locations of the sysfs files containing the
+ masks are::
/sys/bus/ap/apmask
/sys/bus/ap/aqmask
The 'apmask' is a 256-bit mask that identifies a set of AP adapter IDs
- (APID). Each bit in the mask, from left to right (i.e., from most significant
- to least significant bit in big endian order), corresponds to an APID from
- 0-255. If a bit is set, the APID is marked as usable only by the default AP
- queue device drivers; otherwise, the APID is usable by the vfio_ap
- device driver.
+ (APID). Each bit in the mask, from left to right, corresponds to an APID from
+ 0-255. If a bit is set, the APID belongs to the subset of APQNs marked as
+ available only to the default AP queue device drivers.
The 'aqmask' is a 256-bit mask that identifies a set of AP queue indexes
- (APQI). Each bit in the mask, from left to right (i.e., from most significant
- to least significant bit in big endian order), corresponds to an APQI from
- 0-255. If a bit is set, the APQI is marked as usable only by the default AP
- queue device drivers; otherwise, the APQI is usable by the vfio_ap device
- driver.
+ (APQI). Each bit in the mask, from left to right, corresponds to an APQI from
+ 0-255. If a bit is set, the APQI belongs to the subset of APQNs marked as
+ available only to the default AP queue device drivers.
+
+ The Cartesian product of the APIDs corresponding to the bits set in the
+ apmask and the APQIs corresponding to the bits set in the aqmask comprises
+ the subset of APQNs that can be used only by the host default device drivers.
+ All other APQNs are available to the non-default device drivers such as the
+ vfio_ap driver.
+
+ Take, for example, the following masks::
+
+ apmask:
+ 0x7d00000000000000000000000000000000000000000000000000000000000000
+
+ aqmask:
+ 0x8000000000000000000000000000000000000000000000000000000000000000
- Take, for example, the following mask::
+ The masks indicate:
- 0x7dffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
+ * Adapters 1, 2, 3, 4, 5, and 7 are available for use by the host default
+ device drivers.
- It indicates:
+ * Domain 0 is available for use by the host default device drivers.
- 1, 2, 3, 4, 5, and 7-255 belong to the default drivers' pool, and 0 and 6
- belong to the vfio_ap device driver's pool.
+ * The subset of APQNs available for use only by the default host device
+ drivers is:
+
+ (1,0), (2,0), (3,0), (4,0), (5,0) and (7,0)
+
+ * All other APQNs are available for use by the non-default device drivers.
The APQN of each AP queue device assigned to the linux host is checked by the
- AP bus against the set of APQNs derived from the cross product of APIDs
- and APQIs marked as usable only by the default AP queue device drivers. If a
+ AP bus against the set of APQNs derived from the Cartesian product of APIDs
+ and APQIs marked as available to the default AP queue device drivers. If a
match is detected, only the default AP queue device drivers will be probed;
otherwise, the vfio_ap device driver will be probed.
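The mask lookup described above can be sketched as follows. This is a user-space illustration only: the 32-byte array representation and the helper names `ap_mask_test()` and `apqn_is_default()` are assumptions made for this sketch, not the AP bus's internal interfaces.

```c
#include <stdbool.h>
#include <stdint.h>

#define AP_MASK_BYTES 32 /* 256 bits, as written to apmask/aqmask */

/*
 * Bits are numbered left to right: bit 0 is the most significant bit
 * of byte 0, so 0x7d in byte 0 selects IDs 1, 2, 3, 4, 5 and 7.
 */
static bool ap_mask_test(const uint8_t mask[AP_MASK_BYTES], unsigned int nr)
{
	return nr < 256 && (mask[nr / 8] & (0x80u >> (nr % 8))) != 0;
}

/*
 * Returns true if the APQN (apid, apqi) is reserved for the default
 * drivers, i.e. it lies in the Cartesian product of the two masks;
 * false means it is left to non-default drivers such as vfio_ap.
 */
static bool apqn_is_default(const uint8_t apmask[AP_MASK_BYTES],
			    const uint8_t aqmask[AP_MASK_BYTES],
			    unsigned int apid, unsigned int apqi)
{
	return ap_mask_test(apmask, apid) && ap_mask_test(aqmask, apqi);
}
```

Using the example masks (first apmask byte 0x7d, first aqmask byte 0x80, all others zero), the sketch reports (1,0) and (7,0) as reserved for the default drivers, while (0,0), (6,0) and (1,1) remain available to vfio_ap.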
@@ -627,11 +676,22 @@ These are the steps:
default drivers pool: adapter 0-15, domain 1
alternate drivers pool: adapter 16-255, domains 0, 2-255
+ Note ***:
+ Changing a mask such that one or more APQNs will be taken from a vfio_ap
+ mediated device (see below) will fail with an error (EBUSY). A message
+ is logged to the kernel ring buffer which can be viewed with the 'dmesg'
+ command. The output identifies each APQN flagged as 'in use' and identifies
+ the vfio_ap mediated device to which it is assigned; for example:
+
+ Userspace may not re-assign queue 05.0054 already assigned to 62177883-f1bb-47f0-914d-32a22e3a8804
+ Userspace may not re-assign queue 04.0054 already assigned to cef03c3c-903d-4ecc-9a83-40694cb8aee4
+
Securing the APQNs for our example
----------------------------------
To secure the AP queues 05.0004, 05.0047, 05.00ab, 05.00ff, 06.0004, 06.0047,
06.00ab, and 06.00ff for use by the vfio_ap device driver, the corresponding
- APQNs can either be removed from the default masks::
+ APQNs can be removed from the default masks using either of the following
+ commands::
echo -5,-6 > /sys/bus/ap/apmask
@@ -684,7 +744,7 @@ Securing the APQNs for our example
/sys/devices/vfio_ap/matrix/
--- [mdev_supported_types]
- ------ [vfio_ap-passthrough] (passthrough mediated matrix device type)
+ ------ [vfio_ap-passthrough] (passthrough vfio_ap mediated device type)
--------- create
--------- [devices]
@@ -735,6 +795,9 @@ Securing the APQNs for our example
----------------unassign_control_domain
----------------unassign_domain
+ Note *****: The vfio_ap mdevs do not persist across reboots unless the
+ mdevctl tool is used to create and persist them.
+
4. The administrator now needs to configure the matrixes for the mediated
devices $uuid1 (for Guest1), $uuid2 (for Guest2) and $uuid3 (for Guest3).
@@ -775,17 +838,21 @@ Securing the APQNs for our example
higher than the maximum is specified, the operation will terminate with
an error (ENODEV).
- * All APQNs that can be derived from the adapter ID and the IDs of
- the previously assigned domains must be bound to the vfio_ap device
- driver. If no domains have yet been assigned, then there must be at least
- one APQN with the specified APID bound to the vfio_ap driver. If no such
- APQNs are bound to the driver, the operation will terminate with an
- error (EADDRNOTAVAIL).
+ * Each APQN derived from the Cartesian product of the APID of the adapter
+ being assigned and the APQIs of the domains previously assigned:
+
+ - Must only be available to the vfio_ap device driver as specified in the
+ sysfs /sys/bus/ap/apmask and /sys/bus/ap/aqmask attribute files. If even
+ one APQN is reserved for use by the host device driver, the operation
+ will terminate with an error (EADDRNOTAVAIL).
+
+ - Must NOT be assigned to another vfio_ap mediated device. If even one APQN
+ is assigned to another vfio_ap mediated device, the operation will
+ terminate with an error (EBUSY).
- No APQN that can be derived from the adapter ID and the IDs of the
- previously assigned domains can be assigned to another mediated matrix
- device. If an APQN is assigned to another mediated matrix device, the
- operation will terminate with an error (EADDRINUSE).
+ * The adapter must NOT be assigned while the sysfs /sys/bus/ap/apmask and
+ /sys/bus/ap/aqmask attribute files are being edited; otherwise, the
+ operation may terminate with an error (EBUSY).
In order to successfully assign a domain:
@@ -794,41 +861,47 @@ Securing the APQNs for our example
higher than the maximum is specified, the operation will terminate with
an error (ENODEV).
- * All APQNs that can be derived from the domain ID and the IDs of
- the previously assigned adapters must be bound to the vfio_ap device
- driver. If no domains have yet been assigned, then there must be at least
- one APQN with the specified APQI bound to the vfio_ap driver. If no such
- APQNs are bound to the driver, the operation will terminate with an
- error (EADDRNOTAVAIL).
+ * Each APQN derived from the Cartesian product of the APQI of the domain
+ being assigned and the APIDs of the adapters previously assigned:
- No APQN that can be derived from the domain ID and the IDs of the
- previously assigned adapters can be assigned to another mediated matrix
- device. If an APQN is assigned to another mediated matrix device, the
- operation will terminate with an error (EADDRINUSE).
+ - Must only be available to the vfio_ap device driver as specified in the
+ sysfs /sys/bus/ap/apmask and /sys/bus/ap/aqmask attribute files. If even
+ one APQN is reserved for use by the host device driver, the operation
+ will terminate with an error (EADDRNOTAVAIL).
- In order to successfully assign a control domain, the domain number
- specified must represent a value from 0 up to the maximum domain number
- configured for the system. If a control domain number higher than the maximum
- is specified, the operation will terminate with an error (ENODEV).
+ - Must NOT be assigned to another vfio_ap mediated device. If even one APQN
+ is assigned to another vfio_ap mediated device, the operation will
+ terminate with an error (EBUSY).
+
+ * The domain must NOT be assigned while the sysfs /sys/bus/ap/apmask and
+ /sys/bus/ap/aqmask attribute files are being edited; otherwise, the
+ operation may terminate with an error (EBUSY).
+
+ In order to successfully assign a control domain:
+
+ * The domain number specified must represent a value from 0 up to the maximum
+ domain number configured for the system. If a control domain number higher
+ than the maximum is specified, the operation will terminate with an
+ error (ENODEV).
5. Start Guest1::
- /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on \
+ /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on,apqi=on \
-device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid1 ...
7. Start Guest2::
- /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on \
+ /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on,apqi=on \
-device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid2 ...
7. Start Guest3::
- /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on \
+ /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on,apqi=on \
-device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid3 ...
-When the guest is shut down, the mediated matrix devices may be removed.
+When the guest is shut down, the vfio_ap mediated devices may be removed.
-Using our example again, to remove the mediated matrix device $uuid1::
+Using our example again, to remove the vfio_ap mediated device $uuid1::
/sys/devices/vfio_ap/matrix/
--- [mdev_supported_types]
@@ -844,23 +917,153 @@ Using our example again, to remove the mediated matrix device $uuid1::
This will remove all of the mdev matrix device's sysfs structures including
the mdev device itself. To recreate and reconfigure the mdev matrix device,
all of the steps starting with step 3 will have to be performed again. Note
-that the remove will fail if a guest using the mdev is still running.
+that the remove will fail if a guest using the vfio_ap mdev is still running.
-It is not necessary to remove an mdev matrix device, but one may want to
+It is not necessary to remove a vfio_ap mdev, but one may want to
remove it if no guest will use it during the remaining lifetime of the linux
-host. If the mdev matrix device is removed, one may want to also reconfigure
+host. If the vfio_ap mdev is removed, one may want to also reconfigure
the pool of adapters and queues reserved for use by the default drivers.
+Hot plug support:
+=================
+An adapter, domain or control domain may be hot plugged into a running KVM
+guest by assigning it to the vfio_ap mediated device being used by the guest.
+Control domains will always be hot plugged; however, an adapter or domain will
+be hot plugged only if each new APQN resulting from its assignment
+references a queue device bound to the vfio_ap device driver as described
+below.
+
+When an adapter is assigned to a vfio_ap mediated device in use by a KVM guest:
+
+* If no domains have yet been plugged into the KVM guest:
+
+ Hot plug the adapter and every domain previously assigned to the mdev if each
+ APQN derived from the Cartesian product of the APID of the adapter being
+ assigned and the APQIs of the domains previously assigned references a queue
+ device bound to the vfio_ap device driver.
+
+* If one or more domains have previously been plugged into the guest:
+
+ Hot plug the adapter if each APQN derived from the Cartesian product of the
+ APID of the adapter being assigned and the APQIs of the domains already
+ plugged into the guest references a queue device bound to the vfio_ap device
+ driver.
+
+When a domain is assigned to a vfio_ap mediated device in use by a KVM guest:
+
+* If no adapters have yet been plugged into the KVM guest:
+
+ Hot plug the domain and every adapter previously assigned to the mdev if each
+ APQN derived from the Cartesian product of the APIDs of the adapters
+ previously assigned and the APQI of the domain being assigned references a
+ queue device bound to the vfio_ap device driver.
+
+* If one or more adapters have previously been plugged into the guest:
+
+ Hot plug the domain if each APQN derived from the Cartesian product of the
+ APIDs of the adapters already plugged into the guest and the APQI of the
+ domain being assigned references a queue device bound to the vfio_ap device
+ driver.
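The two adapter cases above can be sketched as a single decision function. This is an illustration under stated assumptions: `struct hp_state` and `hot_plug_adapter()` are hypothetical names, and the `bound` table stands in for the driver's lookup of queue devices bound to vfio_ap. The domain case is symmetric.

```c
#include <stdbool.h>
#include <string.h>

#define AP_MAX 256 /* APIDs and APQIs both range over 0-255 */

/* Hypothetical model of an mdev in use by a running guest. */
struct hp_state {
	bool mdev_aqm[AP_MAX];      /* domains assigned to the mdev */
	bool guest_apm[AP_MAX];     /* adapters plugged into the guest */
	bool guest_aqm[AP_MAX];     /* domains plugged into the guest */
	bool bound[AP_MAX][AP_MAX]; /* queue (apid, apqi) bound to vfio_ap */
};

static bool any_domain_plugged(const struct hp_state *s)
{
	for (int apqi = 0; apqi < AP_MAX; apqi++)
		if (s->guest_aqm[apqi])
			return true;
	return false;
}

/* Returns true if the newly assigned adapter was hot plugged. */
static bool hot_plug_adapter(struct hp_state *s, int apid)
{
	/*
	 * With no domains in the guest yet, check against every domain
	 * assigned to the mdev; otherwise check only the plugged domains.
	 */
	bool first = !any_domain_plugged(s);
	const bool *domains = first ? s->mdev_aqm : s->guest_aqm;

	for (int apqi = 0; apqi < AP_MAX; apqi++)
		if (domains[apqi] && !s->bound[apid][apqi])
			return false;

	s->guest_apm[apid] = true;
	if (first)
		memcpy(s->guest_aqm, s->mdev_aqm, sizeof(s->guest_aqm));
	return true;
}
```

In this model an adapter whose queues are not all bound is simply left out of the guest; it stays assigned to the mdev and could be plugged in later once the missing queue devices are bound.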
+
+Over-provisioning of AP queues for a KVM guest:
+===============================================
+Over-provisioning is defined herein as the assignment of adapters or domains to
+a vfio_ap mediated device that do not reference AP devices in the host's AP
+configuration. The idea here is that when the adapter or domain becomes
+available, it will be automatically hot-plugged into the KVM guest using
+the vfio_ap mediated device to which it is assigned as long as each new APQN
+resulting from plugging it in references a queue device bound to the vfio_ap
+device driver.
+
Limitations
===========
-* The KVM/kernel interfaces do not provide a way to prevent restoring an APQN
- to the default drivers pool of a queue that is still assigned to a mediated
- device in use by a guest. It is incumbent upon the administrator to
- ensure there is no mediated device in use by a guest to which the APQN is
- assigned lest the host be given access to the private data of the AP queue
- device such as a private key configured specifically for the guest.
+Live guest migration is not supported for guests using AP devices without
+intervention by a system administrator. Before a KVM guest can be migrated,
+the vfio_ap mediated device must be removed. Unfortunately, it can not be
+removed manually (i.e., echo 1 > /sys/devices/vfio_ap/matrix/$UUID/remove) while
+the mdev is in use by a KVM guest. If the guest is being emulated by QEMU,
+its mdev can be hot unplugged from the guest in one of two ways:
+
+1. If the KVM guest was started with libvirt, you can hot unplug the mdev via
+ the following commands:
+
+ virsh detach-device <guestname> <path-to-device-xml>
+
+ For example, to hot unplug mdev 62177883-f1bb-47f0-914d-32a22e3a8804 from
+ the guest named 'my-guest':
+
+ virsh detach-device my-guest ~/config/my-guest-hostdev.xml
+
+ The contents of my-guest-hostdev.xml:
+
+ <hostdev mode='subsystem' type='mdev' managed='no' model='vfio-ap'>
+ <source>
+ <address uuid='62177883-f1bb-47f0-914d-32a22e3a8804'/>
+ </source>
+ </hostdev>
+
+ Alternatively, use the QEMU monitor through virsh:
+ virsh qemu-monitor-command <guest-name> --hmp "device_del <device-id>"
+
+ For example, to hot unplug the vfio_ap mediated device identified on the
+ qemu command line with 'id=hostdev0' from the guest named 'my-guest':
+
+ virsh qemu-monitor-command my-guest --hmp "device_del hostdev0"
+
+2. A vfio_ap mediated device can be hot unplugged by attaching the qemu monitor
+ to the guest and using the following qemu monitor command:
+
+ (QEMU) device_del id=<device-id>
+
+ For example, to hot unplug the vfio_ap mediated device that was specified
+ on the qemu command line with 'id=hostdev0' when the guest was started:
+
+ (QEMU) device_del id=hostdev0
+
+After live migration of the KVM guest completes, an AP configuration can be
+restored to the KVM guest by hot plugging a vfio_ap mediated device on the target
+system into the guest in one of two ways:
+
+1. If the KVM guest was started with libvirt, you can hot plug a vfio_ap mediated
+ device into the guest via the following virsh commands:
+
+ virsh attach-device <guestname> <path-to-device-xml>
+
+ For example, to hot plug mdev 62177883-f1bb-47f0-914d-32a22e3a8804 into
+ the guest named 'my-guest':
+
+ virsh attach-device my-guest ~/config/my-guest-hostdev.xml
+
+ The contents of my-guest-hostdev.xml:
+
+ <hostdev mode='subsystem' type='mdev' managed='no' model='vfio-ap'>
+ <source>
+ <address uuid='62177883-f1bb-47f0-914d-32a22e3a8804'/>
+ </source>
+ </hostdev>
+
+ Alternatively, use the QEMU monitor through virsh:
+ virsh qemu-monitor-command <guest-name> --hmp \
+ "device_add vfio-ap,sysfsdev=<path-to-mdev>,id=<device-id>"
+
+ For example, to hot plug the vfio_ap mediated device
+ 62177883-f1bb-47f0-914d-32a22e3a8804 into the guest named 'my-guest' with
+ device-id hostdev0:
+
+ virsh qemu-monitor-command my-guest --hmp \
+ "device_add vfio-ap,\
+ sysfsdev=/sys/devices/vfio_ap/matrix/62177883-f1bb-47f0-914d-32a22e3a8804,\
+ id=hostdev0"
+
+2. A vfio_ap mediated device can be hot plugged by attaching the qemu monitor
+ to the guest and using the following qemu monitor command:
+
+ (qemu) device_add "vfio-ap,sysfsdev=<path-to-mdev>,id=<device-id>"
-* Dynamically modifying the AP matrix for a running guest (which would amount to
- hot(un)plug of AP devices for the guest) is currently not supported
+ For example, to plug the vfio_ap mediated device
+ 62177883-f1bb-47f0-914d-32a22e3a8804 into the guest with the device-id
+ hostdev0:
-* Live guest migration is not supported for guests using AP devices.
+ (qemu) device_add "vfio-ap,\
+ sysfsdev=/sys/devices/vfio_ap/matrix/62177883-f1bb-47f0-914d-32a22e3a8804,\
+ id=hostdev0"
--
2.21.1
Introduces a new driver callback to prevent a root user from unbinding
an AP queue from its device driver if the queue is in use. The callback
will be invoked whenever a change to the AP bus's sysfs apmask or aqmask
attributes would result in one or more AP queues being removed from their
driver. If the callback responds in the affirmative for any driver
queried, the change to the apmask or aqmask will be rejected with a device
busy error.
For this patch, only non-default drivers will be queried. Currently,
there is only one non-default driver, the vfio_ap device driver. The
vfio_ap device driver facilitates pass-through of an AP queue to a
guest. The idea here is that a guest may be administered by a different
sysadmin than the host and we don't want AP resources to unexpectedly
disappear from a guest's AP configuration (i.e., adapters and domains
assigned to the matrix mdev). This will enforce the proper procedure for
removing AP resources intended for guest usage which is to
first unassign them from the matrix mdev, then unbind them from the
vfio_ap device driver.
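The check described above can be sketched in user-space C. This is a simplified model, not the patch itself: it uses four 64-bit words per mask, passes only the newly reserved bits to the callback, and the names `mask_commit()`, `mask_newly_set()`, `in_use_fn` and `demo_in_use_adapter5()` are hypothetical. `mask_newly_set()` plays the role that `bitmap_andnot()` plays in the patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define AP_MASK_WORDS 4 /* 256 bits as four 64-bit words */

/* Hypothetical stand-in for a driver's in_use callback. */
typedef int (*in_use_fn)(const uint64_t reserved[AP_MASK_WORDS]);

/* Bits are numbered left to right within each word (assumes nr < 256). */
static void mask_set(uint64_t mask[AP_MASK_WORDS], unsigned int nr)
{
	mask[nr / 64] |= UINT64_C(1) << (63 - nr % 64);
}

static bool mask_test(const uint64_t mask[AP_MASK_WORDS], unsigned int nr)
{
	return (mask[nr / 64] >> (63 - nr % 64)) & 1;
}

/* reserved = newmask & ~oldmask; returns true if any bit is newly set. */
static bool mask_newly_set(uint64_t reserved[AP_MASK_WORDS],
			   const uint64_t newmask[AP_MASK_WORDS],
			   const uint64_t oldmask[AP_MASK_WORDS])
{
	bool any = false;

	for (int i = 0; i < AP_MASK_WORDS; i++) {
		reserved[i] = newmask[i] & ~oldmask[i];
		any |= reserved[i] != 0;
	}
	return any;
}

/* Commit newmask unless a queried driver reports the queues as in use. */
static int mask_commit(uint64_t cur[AP_MASK_WORDS],
		       const uint64_t newmask[AP_MASK_WORDS], in_use_fn in_use)
{
	uint64_t reserved[AP_MASK_WORDS];

	if (mask_newly_set(reserved, newmask, cur) && in_use && in_use(reserved))
		return -EBUSY;

	memcpy(cur, newmask, sizeof(reserved));
	return 0;
}

/* Hypothetical callback: pretends adapter 5 is in use by a guest. */
static int demo_in_use_adapter5(const uint64_t reserved[AP_MASK_WORDS])
{
	return mask_test(reserved, 5);
}
```

Only bits that are newly set trigger the query, because setting a bit hands the resource to the default drivers. Clearing bits never takes queues away from a non-default driver, so such changes are committed without consulting the callback.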
Signed-off-by: Tony Krowiak <[email protected]>
Reviewed-by: Harald Freudenberger <[email protected]>
---
drivers/s390/crypto/ap_bus.c | 160 ++++++++++++++++++++++++++++++++---
drivers/s390/crypto/ap_bus.h | 4 +
2 files changed, 154 insertions(+), 10 deletions(-)
diff --git a/drivers/s390/crypto/ap_bus.c b/drivers/s390/crypto/ap_bus.c
index ef738b42a092..593573740981 100644
--- a/drivers/s390/crypto/ap_bus.c
+++ b/drivers/s390/crypto/ap_bus.c
@@ -35,6 +35,7 @@
#include <linux/mod_devicetable.h>
#include <linux/debugfs.h>
#include <linux/ctype.h>
+#include <linux/module.h>
#include "ap_bus.h"
#include "ap_debug.h"
@@ -901,6 +902,23 @@ static int modify_bitmap(const char *str, unsigned long *bitmap, int bits)
return 0;
}
+static int ap_parse_bitmap_str(const char *str, unsigned long *bitmap, int bits,
+ unsigned long *newmap)
+{
+ unsigned long size;
+ int rc;
+
+ size = BITS_TO_LONGS(bits)*sizeof(unsigned long);
+ if (*str == '+' || *str == '-') {
+ memcpy(newmap, bitmap, size);
+ rc = modify_bitmap(str, newmap, bits);
+ } else {
+ memset(newmap, 0, size);
+ rc = hex2bitmap(str, newmap, bits);
+ }
+ return rc;
+}
+
int ap_parse_mask_str(const char *str,
unsigned long *bitmap, int bits,
struct mutex *lock)
@@ -920,14 +938,7 @@ int ap_parse_mask_str(const char *str,
kfree(newmap);
return -ERESTARTSYS;
}
-
- if (*str == '+' || *str == '-') {
- memcpy(newmap, bitmap, size);
- rc = modify_bitmap(str, newmap, bits);
- } else {
- memset(newmap, 0, size);
- rc = hex2bitmap(str, newmap, bits);
- }
+ rc = ap_parse_bitmap_str(str, bitmap, bits, newmap);
if (rc == 0)
memcpy(bitmap, newmap, size);
mutex_unlock(lock);
@@ -1119,12 +1130,76 @@ static ssize_t apmask_show(struct bus_type *bus, char *buf)
return rc;
}
+static int __verify_card_reservations(struct device_driver *drv, void *data)
+{
+ int rc = 0;
+ struct ap_driver *ap_drv = to_ap_drv(drv);
+ unsigned long *newapm = (unsigned long *)data;
+
+ /*
+ * No need to verify whether the driver is using the queues if it is the
+ * default driver.
+ */
+ if (ap_drv->flags & AP_DRIVER_FLAG_DEFAULT)
+ return 0;
+
+ /*
+ * increase the driver's module refcounter to be sure it is not
+ * going away when we invoke the callback function.
+ */
+ if (!try_module_get(drv->owner))
+ return 0;
+
+ if (ap_drv->in_use) {
+ rc = ap_drv->in_use(newapm, ap_perms.aqm);
+ if (rc)
+ rc = -EBUSY;
+ }
+
+ /* release the driver's module */
+ module_put(drv->owner);
+
+ return rc;
+}
+
+static int apmask_commit(unsigned long *newapm)
+{
+ int rc;
+ unsigned long reserved[BITS_TO_LONGS(AP_DEVICES)];
+
+ /*
+ * Check if any bits in the apmask have been set which will
+ * result in queues being removed from non-default drivers
+ */
+ if (bitmap_andnot(reserved, newapm, ap_perms.apm, AP_DEVICES)) {
+ rc = bus_for_each_drv(&ap_bus_type, NULL, reserved,
+ __verify_card_reservations);
+ if (rc)
+ return rc;
+ }
+
+ memcpy(ap_perms.apm, newapm, APMASKSIZE);
+
+ return 0;
+}
+
static ssize_t apmask_store(struct bus_type *bus, const char *buf,
size_t count)
{
int rc;
+ DECLARE_BITMAP(newapm, AP_DEVICES);
+
+ if (mutex_lock_interruptible(&ap_perms_mutex))
+ return -ERESTARTSYS;
- rc = ap_parse_mask_str(buf, ap_perms.apm, AP_DEVICES, &ap_perms_mutex);
+ rc = ap_parse_bitmap_str(buf, ap_perms.apm, AP_DEVICES, newapm);
+ if (rc)
+ goto done;
+
+ rc = apmask_commit(newapm);
+
+done:
+ mutex_unlock(&ap_perms_mutex);
if (rc)
return rc;
@@ -1150,12 +1225,77 @@ static ssize_t aqmask_show(struct bus_type *bus, char *buf)
return rc;
}
+static int __verify_queue_reservations(struct device_driver *drv, void *data)
+{
+ int rc = 0;
+ struct ap_driver *ap_drv = to_ap_drv(drv);
+ unsigned long *newaqm = (unsigned long *)data;
+
+ /*
+ * No need to verify whether the driver is using the queues if it is
+ * the default driver; only non-default drivers are queried for queues
+ * they have in use.
+ */
+ if (ap_drv->flags & AP_DRIVER_FLAG_DEFAULT)
+ return 0;
+
+ /*
+ * increase the driver's module refcounter to be sure it is not
+ * going away when we invoke the callback function.
+ */
+ if (!try_module_get(drv->owner))
+ return 0;
+
+ if (ap_drv->in_use) {
+ rc = ap_drv->in_use(ap_perms.apm, newaqm);
+ if (rc)
+ rc = -EBUSY;
+ }
+
+ /* release the driver's module */
+ module_put(drv->owner);
+
+ return rc;
+}
+
+static int aqmask_commit(unsigned long *newaqm)
+{
+ int rc;
+ unsigned long reserved[BITS_TO_LONGS(AP_DOMAINS)];
+
+ /*
+ * Check if any bits in the aqmask have been set which will
+ * result in queues being removed from non-default drivers
+ */
+ if (bitmap_andnot(reserved, newaqm, ap_perms.aqm, AP_DOMAINS)) {
+ rc = bus_for_each_drv(&ap_bus_type, NULL, reserved,
+ __verify_queue_reservations);
+ if (rc)
+ return rc;
+ }
+
+ memcpy(ap_perms.aqm, newaqm, AQMASKSIZE);
+
+ return 0;
+}
+
static ssize_t aqmask_store(struct bus_type *bus, const char *buf,
size_t count)
{
int rc;
+ DECLARE_BITMAP(newaqm, AP_DOMAINS);
+
+ if (mutex_lock_interruptible(&ap_perms_mutex))
+ return -ERESTARTSYS;
+
+ rc = ap_parse_bitmap_str(buf, ap_perms.aqm, AP_DOMAINS, newaqm);
+ if (rc)
+ goto done;
+
+ rc = aqmask_commit(newaqm);
- rc = ap_parse_mask_str(buf, ap_perms.aqm, AP_DOMAINS, &ap_perms_mutex);
+done:
+ mutex_unlock(&ap_perms_mutex);
if (rc)
return rc;
diff --git a/drivers/s390/crypto/ap_bus.h b/drivers/s390/crypto/ap_bus.h
index 5029b80132aa..65edd847c65a 100644
--- a/drivers/s390/crypto/ap_bus.h
+++ b/drivers/s390/crypto/ap_bus.h
@@ -145,6 +145,7 @@ struct ap_driver {
int (*probe)(struct ap_device *);
void (*remove)(struct ap_device *);
+ int (*in_use)(unsigned long *apm, unsigned long *aqm);
};
#define to_ap_drv(x) container_of((x), struct ap_driver, driver)
@@ -293,6 +294,9 @@ void ap_queue_init_state(struct ap_queue *aq);
struct ap_card *ap_card_create(int id, int queue_depth, int raw_device_type,
int comp_device_type, unsigned int functions);
+#define APMASKSIZE (BITS_TO_LONGS(AP_DEVICES) * sizeof(unsigned long))
+#define AQMASKSIZE (BITS_TO_LONGS(AP_DOMAINS) * sizeof(unsigned long))
+
struct ap_perms {
unsigned long ioctlm[BITS_TO_LONGS(AP_IOCTLS)];
unsigned long apm[BITS_TO_LONGS(AP_DEVICES)];
--
2.21.1
Let's move the probe and remove callbacks into the vfio_ap_ops.c
file to keep all code related to managing queues in a single file. This
way, all functions related to queue management can be removed from the
vfio_ap_private.h header file defining the public interfaces for the
vfio_ap device driver.
Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/vfio_ap_drv.c | 45 ++-----------------------
drivers/s390/crypto/vfio_ap_ops.c | 47 +++++++++++++++++++++++++--
drivers/s390/crypto/vfio_ap_private.h | 7 ++--
3 files changed, 50 insertions(+), 49 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
index be2520cc010b..73bd073fd5d3 100644
--- a/drivers/s390/crypto/vfio_ap_drv.c
+++ b/drivers/s390/crypto/vfio_ap_drv.c
@@ -43,47 +43,6 @@ static struct ap_device_id ap_queue_ids[] = {
MODULE_DEVICE_TABLE(vfio_ap, ap_queue_ids);
-/**
- * vfio_ap_queue_dev_probe:
- *
- * Allocate a vfio_ap_queue structure and associate it
- * with the device as driver_data.
- */
-static int vfio_ap_queue_dev_probe(struct ap_device *apdev)
-{
- struct vfio_ap_queue *q;
-
- q = kzalloc(sizeof(*q), GFP_KERNEL);
- if (!q)
- return -ENOMEM;
- dev_set_drvdata(&apdev->device, q);
- q->apqn = to_ap_queue(&apdev->device)->qid;
- q->saved_isc = VFIO_AP_ISC_INVALID;
- return 0;
-}
-
-/**
- * vfio_ap_queue_dev_remove:
- *
- * Takes the matrix lock to avoid actions on this device while removing
- * Free the associated vfio_ap_queue structure
- */
-static void vfio_ap_queue_dev_remove(struct ap_device *apdev)
-{
- struct vfio_ap_queue *q;
- int apid, apqi;
-
- mutex_lock(&matrix_dev->lock);
- q = dev_get_drvdata(&apdev->device);
- dev_set_drvdata(&apdev->device, NULL);
- apid = AP_QID_CARD(q->apqn);
- apqi = AP_QID_QUEUE(q->apqn);
- vfio_ap_mdev_reset_queue(apid, apqi, 1);
- vfio_ap_irq_disable(q);
- kfree(q);
- mutex_unlock(&matrix_dev->lock);
-}
-
static void vfio_ap_matrix_dev_release(struct device *dev)
{
struct ap_matrix_dev *matrix_dev = dev_get_drvdata(dev);
@@ -186,8 +145,8 @@ static int __init vfio_ap_init(void)
return ret;
memset(&vfio_ap_drv, 0, sizeof(vfio_ap_drv));
- vfio_ap_drv.probe = vfio_ap_queue_dev_probe;
- vfio_ap_drv.remove = vfio_ap_queue_dev_remove;
+ vfio_ap_drv.probe = vfio_ap_mdev_probe_queue;
+ vfio_ap_drv.remove = vfio_ap_mdev_remove_queue;
vfio_ap_drv.ids = ap_queue_ids;
ret = ap_driver_register(&vfio_ap_drv, THIS_MODULE, VFIO_AP_DRV_NAME);
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index e0bde8518745..66fd9784a156 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -144,7 +144,7 @@ static void vfio_ap_free_aqic_resources(struct vfio_ap_queue *q)
* Returns if ap_aqic function failed with invalid, deconfigured or
* checkstopped AP.
*/
-struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q)
+static struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q)
{
struct ap_qirq_ctrl aqic_gisa = {};
struct ap_queue_status status;
@@ -1128,8 +1128,8 @@ static void vfio_ap_irq_disable_apqn(int apqn)
}
}
-int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
- unsigned int retry)
+static int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
+ unsigned int retry)
{
struct ap_queue_status status;
int retry2 = 2;
@@ -1302,3 +1302,44 @@ void vfio_ap_mdev_unregister(void)
{
mdev_unregister_device(&matrix_dev->device);
}
+
+/**
+ * vfio_ap_mdev_probe_queue:
+ *
+ * Allocate a vfio_ap_queue structure and associate it
+ * with the device as driver_data.
+ */
+int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
+{
+ struct vfio_ap_queue *q;
+
+ q = kzalloc(sizeof(*q), GFP_KERNEL);
+ if (!q)
+ return -ENOMEM;
+ dev_set_drvdata(&apdev->device, q);
+ q->apqn = to_ap_queue(&apdev->device)->qid;
+ q->saved_isc = VFIO_AP_ISC_INVALID;
+ return 0;
+}
+
+/**
+ * vfio_ap_mdev_remove_queue:
+ *
+ * Takes the matrix lock to avoid actions on this device while removing
+ * Free the associated vfio_ap_queue structure
+ */
+void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
+{
+ struct vfio_ap_queue *q;
+ int apid, apqi;
+
+ mutex_lock(&matrix_dev->lock);
+ q = dev_get_drvdata(&apdev->device);
+ dev_set_drvdata(&apdev->device, NULL);
+ apid = AP_QID_CARD(q->apqn);
+ apqi = AP_QID_QUEUE(q->apqn);
+ vfio_ap_mdev_reset_queue(apid, apqi, 1);
+ vfio_ap_irq_disable(q);
+ kfree(q);
+ mutex_unlock(&matrix_dev->lock);
+}
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index f46dde56b464..d9003de4fbad 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -90,8 +90,6 @@ struct ap_matrix_mdev {
extern int vfio_ap_mdev_register(void);
extern void vfio_ap_mdev_unregister(void);
-int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
- unsigned int retry);
struct vfio_ap_queue {
struct ap_matrix_mdev *matrix_mdev;
@@ -100,5 +98,8 @@ struct vfio_ap_queue {
#define VFIO_AP_ISC_INVALID 0xff
unsigned char saved_isc;
};
-struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q);
+
+int vfio_ap_mdev_probe_queue(struct ap_device *queue);
+void vfio_ap_mdev_remove_queue(struct ap_device *queue);
+
#endif /* _VFIO_AP_PRIVATE_H_ */
--
2.21.1
The matrix of adapters and domains configured in a guest's APCB may
differ from the matrix of adapters and domains assigned to the matrix mdev,
so this patch introduces a sysfs attribute to display the matrix of
adapters and domains that are or will be assigned to the APCB of a guest
that is or will be using the matrix mdev. For a matrix mdev denoted by
$uuid, the guest matrix can be displayed as follows:
cat /sys/devices/vfio_ap/matrix/$uuid/guest_matrix
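
As a rough userspace illustration of the output format shared by the matrix
and guest_matrix attributes, the sketch below mirrors the three cases handled
by the show routine in this patch. The names matrix_sketch and
matrix_sketch_show are hypothetical, and plain byte arrays stand in for the
driver's inverted-bit-order bitmaps:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define NUM_ADAPTERS 256	/* APIDs 0-255 */
#define NUM_DOMAINS  256	/* APQIs 0-255 */

/* Hypothetical stand-in for struct ap_matrix: one byte per adapter/domain. */
struct matrix_sketch {
	unsigned char apm[NUM_ADAPTERS];	/* nonzero: adapter assigned */
	unsigned char aqm[NUM_DOMAINS];		/* nonzero: domain assigned */
};

/*
 * Format the matrix into buf, one entry per line:
 *   "aa.dddd" for each APQN when adapters and domains are assigned,
 *   "aa."     when only adapters are assigned,
 *   ".dddd"   when only domains are assigned.
 * Returns the number of characters produced; a value >= len means the
 * output was truncated.
 */
size_t matrix_sketch_show(const struct matrix_sketch *m, char *buf, size_t len)
{
	size_t n = 0;
	int have_ap = 0, have_aq = 0;
	unsigned int apid, apqi;

	assert(m && buf && len > 0);
	buf[0] = '\0';

	for (apid = 0; apid < NUM_ADAPTERS; apid++)
		have_ap |= m->apm[apid];
	for (apqi = 0; apqi < NUM_DOMAINS; apqi++)
		have_aq |= m->aqm[apqi];

	if (have_ap && have_aq) {
		/* Cartesian product of assigned APIDs and APQIs */
		for (apid = 0; apid < NUM_ADAPTERS; apid++) {
			if (!m->apm[apid])
				continue;
			for (apqi = 0; apqi < NUM_DOMAINS; apqi++) {
				if (m->aqm[apqi] && n < len)
					n += (size_t)snprintf(buf + n, len - n,
							      "%02x.%04x\n",
							      apid, apqi);
			}
		}
	} else if (have_ap) {
		for (apid = 0; apid < NUM_ADAPTERS; apid++) {
			if (m->apm[apid] && n < len)
				n += (size_t)snprintf(buf + n, len - n,
						      "%02x.\n", apid);
		}
	} else if (have_aq) {
		for (apqi = 0; apqi < NUM_DOMAINS; apqi++) {
			if (m->aqm[apqi] && n < len)
				n += (size_t)snprintf(buf + n, len - n,
						      ".%04x\n", apqi);
		}
	}

	return n;
}
```

With adapters 5 and 6 and domains 4 and 0x47 assigned, this produces the
lines 05.0004, 05.0047, 06.0004 and 06.0047.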
Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/vfio_ap_ops.c | 51 ++++++++++++++++++++++---------
1 file changed, 37 insertions(+), 14 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 89b0e81657ca..a69422d76e7f 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -1075,29 +1075,24 @@ static ssize_t control_domains_show(struct device *dev,
}
static DEVICE_ATTR_RO(control_domains);
-static ssize_t matrix_show(struct device *dev, struct device_attribute *attr,
- char *buf)
+static ssize_t vfio_ap_mdev_matrix_show(struct ap_matrix *matrix, char *buf)
{
- struct mdev_device *mdev = mdev_from_dev(dev);
- struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
char *bufpos = buf;
unsigned long apid;
unsigned long apqi;
unsigned long apid1;
unsigned long apqi1;
- unsigned long napm_bits = matrix_mdev->matrix.apm_max + 1;
- unsigned long naqm_bits = matrix_mdev->matrix.aqm_max + 1;
+ unsigned long napm_bits = matrix->apm_max + 1;
+ unsigned long naqm_bits = matrix->aqm_max + 1;
int nchars = 0;
int n;
- apid1 = find_first_bit_inv(matrix_mdev->matrix.apm, napm_bits);
- apqi1 = find_first_bit_inv(matrix_mdev->matrix.aqm, naqm_bits);
-
- mutex_lock(&matrix_dev->lock);
+ apid1 = find_first_bit_inv(matrix->apm, napm_bits);
+ apqi1 = find_first_bit_inv(matrix->aqm, naqm_bits);
if ((apid1 < napm_bits) && (apqi1 < naqm_bits)) {
- for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, napm_bits) {
- for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm,
+ for_each_set_bit_inv(apid, matrix->apm, napm_bits) {
+ for_each_set_bit_inv(apqi, matrix->aqm,
naqm_bits) {
n = sprintf(bufpos, "%02lx.%04lx\n", apid,
apqi);
@@ -1106,25 +1101,52 @@ static ssize_t matrix_show(struct device *dev, struct device_attribute *attr,
}
}
} else if (apid1 < napm_bits) {
- for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, napm_bits) {
+ for_each_set_bit_inv(apid, matrix->apm, napm_bits) {
n = sprintf(bufpos, "%02lx.\n", apid);
bufpos += n;
nchars += n;
}
} else if (apqi1 < naqm_bits) {
- for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, naqm_bits) {
+ for_each_set_bit_inv(apqi, matrix->aqm, naqm_bits) {
n = sprintf(bufpos, ".%04lx\n", apqi);
bufpos += n;
nchars += n;
}
}
+ return nchars;
+}
+
+static ssize_t matrix_show(struct device *dev, struct device_attribute *attr,
+ char *buf)
+{
+ ssize_t nchars;
+ struct mdev_device *mdev = mdev_from_dev(dev);
+ struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
+
+ mutex_lock(&matrix_dev->lock);
+ nchars = vfio_ap_mdev_matrix_show(&matrix_mdev->matrix, buf);
mutex_unlock(&matrix_dev->lock);
return nchars;
}
static DEVICE_ATTR_RO(matrix);
+static ssize_t guest_matrix_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ ssize_t nchars;
+ struct mdev_device *mdev = mdev_from_dev(dev);
+ struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
+
+ mutex_lock(&matrix_dev->lock);
+ nchars = vfio_ap_mdev_matrix_show(&matrix_mdev->shadow_apcb, buf);
+ mutex_unlock(&matrix_dev->lock);
+
+ return nchars;
+}
+static DEVICE_ATTR_RO(guest_matrix);
+
static struct attribute *vfio_ap_mdev_attrs[] = {
&dev_attr_assign_adapter.attr,
&dev_attr_unassign_adapter.attr,
@@ -1134,6 +1156,7 @@ static struct attribute *vfio_ap_mdev_attrs[] = {
&dev_attr_unassign_control_domain.attr,
&dev_attr_control_domains.attr,
&dev_attr_matrix.attr,
+ &dev_attr_guest_matrix.attr,
NULL,
};
--
2.21.1
The current implementation allows an AP adapter or domain to be assigned
to an mdev device only if every APQN resulting from the assignment
references an AP queue device that is bound to the vfio_ap device
driver. This patch allows assignment of AP resources to the matrix mdev as
long as the APQNs resulting from the assignment:
1. Are not reserved by the AP BUS for use by the zcrypt device drivers.
2. Are not assigned to another matrix mdev.
The rationale behind this is twofold:
1. The AP architecture does not preclude assignment of APQNs to an AP
configuration that are not available to the system.
2. APQNs that do not reference a queue device bound to the vfio_ap
device driver will not be assigned to the guest's CRYCB, so the
guest will not get access to queues not bound to the vfio_ap driver.
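
The two rules above can be sketched in plain C as follows. This is only an
illustration under stated assumptions: struct masks, masks_share_apqn and
validate_masks are hypothetical names, the masks are shrunk to 8 entries,
and -EADDRINUSE is assumed as the error returned for the sharing case.
Note that nothing here checks whether a queue device exists for an APQN.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define N_AP 8	/* hypothetical; the architecture allows 256 adapters */
#define N_AQ 8	/* hypothetical; the architecture allows 256 domains */

struct masks {
	bool apm[N_AP];	/* adapters (APIDs) */
	bool aqm[N_AQ];	/* domains (APQIs) */
};

/*
 * The APQNs of a mask pair are the Cartesian product of its APIDs and
 * APQIs, so two mask pairs share an APQN exactly when they share at
 * least one APID and at least one APQI.
 */
bool masks_share_apqn(const struct masks *a, const struct masks *b)
{
	bool ap = false, aq = false;
	size_t i;

	for (i = 0; i < N_AP; i++)
		ap |= a->apm[i] && b->apm[i];
	for (i = 0; i < N_AQ; i++)
		aq |= a->aqm[i] && b->aqm[i];

	return ap && aq;
}

/*
 * Rule 1: no APQN may be reserved for the default (zcrypt) drivers.
 * Rule 2: no APQN may already be assigned to another matrix mdev.
 */
int validate_masks(const struct masks *candidate,
		   const struct masks *def_drv,
		   const struct masks *others, size_t n_others)
{
	size_t i;

	assert(candidate && def_drv);

	if (masks_share_apqn(candidate, def_drv))
		return -EADDRNOTAVAIL;

	for (i = 0; i < n_others; i++) {
		if (masks_share_apqn(candidate, &others[i]))
			return -EADDRINUSE;	/* assumed error code */
	}

	return 0;
}
```

When an adapter is assigned, the candidate would hold only the new APID
together with the domains already assigned to the mdev, and vice versa
when a domain is assigned.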
Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/vfio_ap_ops.c | 199 +++++-------------------------
1 file changed, 28 insertions(+), 171 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 633c61995891..586ec5776693 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -475,122 +475,6 @@ static struct attribute_group *vfio_ap_mdev_type_groups[] = {
NULL,
};
-struct vfio_ap_queue_reserved {
- unsigned long *apid;
- unsigned long *apqi;
- bool reserved;
-};
-
-/**
- * vfio_ap_has_queue
- *
- * @dev: an AP queue device
- * @data: a struct vfio_ap_queue_reserved reference
- *
- * Flags whether the AP queue device (@dev) has a queue ID containing the APQN,
- * apid or apqi specified in @data:
- *
- * - If @data contains both an apid and apqi value, then @data will be flagged
- * as reserved if the APID and APQI fields for the AP queue device matches
- *
- * - If @data contains only an apid value, @data will be flagged as
- * reserved if the APID field in the AP queue device matches
- *
- * - If @data contains only an apqi value, @data will be flagged as
- * reserved if the APQI field in the AP queue device matches
- *
- * Returns 0 to indicate the input to function succeeded. Returns -EINVAL if
- * @data does not contain either an apid or apqi.
- */
-static int vfio_ap_has_queue(struct device *dev, void *data)
-{
- struct vfio_ap_queue_reserved *qres = data;
- struct ap_queue *ap_queue = to_ap_queue(dev);
- ap_qid_t qid;
- unsigned long id;
-
- if (qres->apid && qres->apqi) {
- qid = AP_MKQID(*qres->apid, *qres->apqi);
- if (qid == ap_queue->qid)
- qres->reserved = true;
- } else if (qres->apid && !qres->apqi) {
- id = AP_QID_CARD(ap_queue->qid);
- if (id == *qres->apid)
- qres->reserved = true;
- } else if (!qres->apid && qres->apqi) {
- id = AP_QID_QUEUE(ap_queue->qid);
- if (id == *qres->apqi)
- qres->reserved = true;
- } else {
- return -EINVAL;
- }
-
- return 0;
-}
-
-/**
- * vfio_ap_verify_queue_reserved
- *
- * @matrix_dev: a mediated matrix device
- * @apid: an AP adapter ID
- * @apqi: an AP queue index
- *
- * Verifies that the AP queue with @apid/@apqi is reserved by the VFIO AP device
- * driver according to the following rules:
- *
- * - If both @apid and @apqi are not NULL, then there must be an AP queue
- * device bound to the vfio_ap driver with the APQN identified by @apid and
- * @apqi
- *
- * - If only @apid is not NULL, then there must be an AP queue device bound
- * to the vfio_ap driver with an APQN containing @apid
- *
- * - If only @apqi is not NULL, then there must be an AP queue device bound
- * to the vfio_ap driver with an APQN containing @apqi
- *
- * Returns 0 if the AP queue is reserved; otherwise, returns -EADDRNOTAVAIL.
- */
-static int vfio_ap_verify_queue_reserved(unsigned long *apid,
- unsigned long *apqi)
-{
- int ret;
- struct vfio_ap_queue_reserved qres;
-
- qres.apid = apid;
- qres.apqi = apqi;
- qres.reserved = false;
-
- ret = driver_for_each_device(&matrix_dev->vfio_ap_drv->driver, NULL,
- &qres, vfio_ap_has_queue);
- if (ret)
- return ret;
-
- if (qres.reserved)
- return 0;
-
- return -EADDRNOTAVAIL;
-}
-
-static int
-vfio_ap_mdev_verify_queues_reserved_for_apid(struct ap_matrix_mdev *matrix_mdev,
- unsigned long apid)
-{
- int ret;
- unsigned long apqi;
- unsigned long nbits = matrix_mdev->matrix.aqm_max + 1;
-
- if (find_first_bit_inv(matrix_mdev->matrix.aqm, nbits) >= nbits)
- return vfio_ap_verify_queue_reserved(&apid, NULL);
-
- for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, nbits) {
- ret = vfio_ap_verify_queue_reserved(&apid, &apqi);
- if (ret)
- return ret;
- }
-
- return 0;
-}
-
#define MDEV_SHARING_ERR "Userspace may not re-assign queue %02lx.%04lx " \
"already assigned to %s"
@@ -656,6 +540,16 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev,
return 0;
}
+static int vfio_ap_mdev_validate_masks(struct ap_matrix_mdev *matrix_mdev,
+ unsigned long *mdev_apm,
+ unsigned long *mdev_aqm)
+{
+ if (ap_apqn_in_matrix_owned_by_def_drv(mdev_apm, mdev_aqm))
+ return -EADDRNOTAVAIL;
+
+ return vfio_ap_mdev_verify_no_sharing(matrix_mdev, mdev_apm, mdev_aqm);
+}
+
enum qlink_action {
LINK_APID,
LINK_APQI,
@@ -790,34 +684,23 @@ static ssize_t assign_adapter_store(struct device *dev,
if (apid > matrix_mdev->matrix.apm_max)
return -ENODEV;
- /*
- * Set the bit in the AP mask (APM) corresponding to the AP adapter
- * number (APID). The bits in the mask, from most significant to least
- * significant bit, correspond to APIDs 0-255.
- */
- if (!mutex_trylock(&matrix_dev->lock))
- return -EBUSY;
-
- ret = vfio_ap_mdev_verify_queues_reserved_for_apid(matrix_mdev, apid);
- if (ret)
- goto done;
-
memset(apm, 0, sizeof(apm));
set_bit_inv(apid, apm);
- ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev, apm,
- matrix_mdev->matrix.aqm);
- if (ret)
- goto done;
+ if (!mutex_trylock(&matrix_dev->lock))
+ return -EBUSY;
+ ret = vfio_ap_mdev_validate_masks(matrix_mdev, apm,
+ matrix_mdev->matrix.aqm);
+ if (ret) {
+ mutex_unlock(&matrix_dev->lock);
+ return ret;
+ }
set_bit_inv(apid, matrix_mdev->matrix.apm);
vfio_ap_mdev_manage_qlinks(matrix_mdev, LINK_APID, apid);
- ret = count;
-
-done:
mutex_unlock(&matrix_dev->lock);
- return ret;
+ return count;
}
static DEVICE_ATTR_WO(assign_adapter);
@@ -867,26 +750,6 @@ static ssize_t unassign_adapter_store(struct device *dev,
}
static DEVICE_ATTR_WO(unassign_adapter);
-static int
-vfio_ap_mdev_verify_queues_reserved_for_apqi(struct ap_matrix_mdev *matrix_mdev,
- unsigned long apqi)
-{
- int ret;
- unsigned long apid;
- unsigned long nbits = matrix_mdev->matrix.apm_max + 1;
-
- if (find_first_bit_inv(matrix_mdev->matrix.apm, nbits) >= nbits)
- return vfio_ap_verify_queue_reserved(NULL, &apqi);
-
- for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, nbits) {
- ret = vfio_ap_verify_queue_reserved(&apid, &apqi);
- if (ret)
- return ret;
- }
-
- return 0;
-}
-
/**
* assign_domain_store
*
@@ -940,29 +803,23 @@ static ssize_t assign_domain_store(struct device *dev,
if (apqi > max_apqi)
return -ENODEV;
- if (!mutex_trylock(&matrix_dev->lock))
- return -EBUSY;
-
- ret = vfio_ap_mdev_verify_queues_reserved_for_apqi(matrix_mdev, apqi);
- if (ret)
- goto done;
-
memset(aqm, 0, sizeof(aqm));
set_bit_inv(apqi, aqm);
- ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev,
- matrix_mdev->matrix.apm, aqm);
- if (ret)
- goto done;
+ if (!mutex_trylock(&matrix_dev->lock))
+ return -EBUSY;
+ ret = vfio_ap_mdev_validate_masks(matrix_mdev, matrix_mdev->matrix.apm,
+ aqm);
+ if (ret) {
+ mutex_unlock(&matrix_dev->lock);
+ return ret;
+ }
set_bit_inv(apqi, matrix_mdev->matrix.aqm);
vfio_ap_mdev_manage_qlinks(matrix_mdev, LINK_APQI, apqi);
- ret = count;
-
-done:
mutex_unlock(&matrix_dev->lock);
- return ret;
+ return count;
}
static DEVICE_ATTR_WO(assign_domain);
--
2.21.1
The motivation for config change notification is to enable the vfio_ap
device driver to handle hot plug/unplug of AP queues for a KVM guest as a
bulk operation. For example, if a new APID is dynamically assigned to the
host configuration, then a queue device will be created for each APQN that
can be formulated from the new APID and all APQIs already assigned to the
host configuration. Each of these new queue devices will get bound to their
respective driver one at a time, as they are created. In the case of the
vfio_ap driver, if the APQN of the queue device being bound to the driver
is assigned to a matrix mdev in use by a KVM guest, it will be hot plugged
into the guest if possible. Given that the AP architecture allows for 256
adapters and 256 domains, one can see the possibility of the vfio_ap
driver's probe/remove callbacks getting invoked an inordinate number of
times when the host configuration changes. Keep in mind that in order to
plug/unplug an AP queue for a guest, the guest's VCPUs must be suspended,
then the guest's AP configuration must be updated followed by the VCPUs
being resumed. If this is done each time the probe or remove callback is
invoked and there are hundreds or thousands of queues to be probed or
removed, this would be incredibly inefficient and could have a large impact
on guest performance. What the config notification does is allow us to
make the changes to the guest in a single operation.
This patch implements the on_config_changed callback
(vfio_ap_on_cfg_changed), which notifies the AP device drivers that the
host AP configuration has changed (i.e., adapters, domains and/or control
domains are added to or removed from the host AP configuration).
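
As a minimal sketch of how the change can be derived from the previous and
current host configurations, the code below computes the added and removed
sets with and-not operations. The types cfg_sketch and cfg_delta and the
function cfg_diff are hypothetical, and 64-bit words stand in for the
256-bit masks of the real configuration:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the adapter/domain masks of the host config. */
struct cfg_sketch {
	uint64_t apm;	/* adapters in the host configuration */
	uint64_t aqm;	/* domains in the host configuration */
};

struct cfg_delta {
	uint64_t ap_add, aq_add;	/* in current, not in previous */
	uint64_t ap_rem, aq_rem;	/* in previous, not in current */
};

struct cfg_delta cfg_diff(const struct cfg_sketch *prev,
			  const struct cfg_sketch *cur)
{
	struct cfg_delta d;

	d.ap_rem = prev->apm & ~cur->apm;
	d.aq_rem = prev->aqm & ~cur->aqm;
	d.ap_add = cur->apm & ~prev->apm;
	d.aq_add = cur->aqm & ~prev->aqm;

	/* A resource can not be both added and removed in one change. */
	assert((d.ap_add & d.ap_rem) == 0);
	assert((d.aq_add & d.aq_rem) == 0);

	return d;
}
```

Removals are processed first so that resources no longer in the host
configuration are taken away from guests in one update per mdev; additions
are only recorded for use by later probe calls.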
Adapters added to host configuration:
* The APIDs of the adapters added will be stored in a bitmap contained
within the struct representing the matrix device which is the parent
device of all matrix mediated devices.
* When a queue is probed, if the APQN of the queue being probed is
assigned to an mdev in use by a guest, the queue may get hot plugged
into the guest; however, if the APID of the adapter is contained in the
bitmap of adapters added, the queue hot plug operation will be skipped
until the AP bus notifies the driver that its scan operation has
completed (another patch).
Domains added to host configuration:
* The APQIs of the domains added will be stored in a bitmap contained
within the struct representing the matrix device which is the parent
device of all matrix mediated devices.
* When a queue is probed, if the APQN of the queue being probed is
assigned to an mdev in use by a guest, the queue may get hot plugged
into the guest; however, if the APQI of the domain is contained in the
bitmap of domains added, the queue hot plug operation will be skipped
until the AP bus notifies the driver that its scan operation has
completed (another patch).
Control domains added to the host configuration:
* Since control domains are not devices in the linux device model, there is
no concern with whether they are bound to a device driver.
* The AP architecture will mask off control domains not in the host AP
configuration from the guest, so there is also no concern about a guest
changing a domain to which it is not authorized.
Adapters removed from configuration:
* Each adapter removed from the host configuration will be hot unplugged
from each guest using it.
* Each queue device with the APID identifying an adapter removed from
the host AP configuration will be unlinked from the matrix mdev to which
the queue's APQN is assigned.
* When the vfio_ap driver's remove callback is invoked, if the queue
device is not linked to the matrix mdev, the hot unplug operation will
be skipped until the vfio_ap driver is notified that the AP bus scan
has completed.
Domains removed from configuration:
* Each domain removed from the host configuration will be hot unplugged
from each guest using it.
* Each queue device with the APQI identifying a domain removed from
the host AP configuration will be unlinked from the matrix mdev to which
the queue's APQN is assigned.
* When the vfio_ap driver's remove callback is invoked, if the queue
device is not linked to the matrix mdev, the hot unplug operation will
be skipped until the vfio_ap driver is notified that the AP bus scan
has completed.
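
The deferral rules described in the lists above can be pictured with the
following sketch. It is an illustration only: queue_sketch, plug_on_probe
and unplug_on_remove are hypothetical names, and 64-bit words replace the
bitmaps of added adapters and domains.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a queue device bound to the driver. */
struct queue_sketch {
	unsigned int apid;	/* adapter part of the APQN */
	unsigned int apqi;	/* domain part of the APQN */
	bool linked;		/* true if linked to a matrix mdev */
};

/*
 * Probe path: plug the queue into the guest right away only if it is
 * linked to an mdev and neither its adapter nor its domain was just
 * added to the host configuration. Otherwise the plug is deferred
 * until the bus reports that its scan has completed.
 */
bool plug_on_probe(const struct queue_sketch *q,
		   uint64_t ap_add, uint64_t aq_add)
{
	assert(q->apid < 64 && q->apqi < 64);

	if (!q->linked)
		return false;
	if (ap_add & (UINT64_C(1) << q->apid))
		return false;
	if (aq_add & (UINT64_C(1) << q->apqi))
		return false;

	return true;
}

/*
 * Remove path: a queue already unlinked by the config change handler
 * was handled in bulk, so the per-queue unplug is skipped.
 */
bool unplug_on_remove(const struct queue_sketch *q)
{
	return q->linked;
}
```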
Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/vfio_ap_drv.c | 5 +-
drivers/s390/crypto/vfio_ap_ops.c | 213 ++++++++++++++++++++++++++++--
2 files changed, 209 insertions(+), 9 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
index 8934471b7944..d7aa5543afef 100644
--- a/drivers/s390/crypto/vfio_ap_drv.c
+++ b/drivers/s390/crypto/vfio_ap_drv.c
@@ -87,9 +87,11 @@ static int vfio_ap_matrix_dev_create(void)
/* Fill in config info via PQAP(QCI), if available */
if (test_facility(12)) {
- ret = ap_qci(&matrix_dev->info);
+ ret = ap_qci(&matrix_dev->config_info);
if (ret)
goto matrix_alloc_err;
+ memcpy(&matrix_dev->config_info_prev, &matrix_dev->config_info,
+ sizeof(struct ap_config_info));
}
mutex_init(&matrix_dev->lock);
@@ -149,6 +151,7 @@ static int __init vfio_ap_init(void)
vfio_ap_drv.remove = vfio_ap_mdev_remove_queue;
vfio_ap_drv.in_use = vfio_ap_mdev_resource_in_use;
vfio_ap_drv.ids = ap_queue_ids;
+ vfio_ap_drv.on_config_changed = vfio_ap_on_cfg_changed;
ret = ap_driver_register(&vfio_ap_drv, THIS_MODULE, VFIO_AP_DRV_NAME);
if (ret) {
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 1179c6af59c6..074147fae339 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -350,8 +350,8 @@ static void vfio_ap_mdev_init_apcb(struct ap_matrix_mdev *matrix_mdev)
* If the APID is not assigned to the host AP configuration,
* we can not assign it to the guest's AP configuration
*/
- if (!test_bit_inv(apid,
- (unsigned long *)matrix_dev->info.apm)) {
+ if (!test_bit_inv(apid, (unsigned long *)
+ matrix_dev->config_info.apm)) {
clear_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
continue;
}
@@ -364,7 +364,7 @@ static void vfio_ap_mdev_init_apcb(struct ap_matrix_mdev *matrix_mdev)
* guest's AP configuration
*/
if (!test_bit_inv(apqi, (unsigned long *)
- matrix_dev->info.aqm)) {
+ matrix_dev->config_info.aqm)) {
clear_bit_inv(apqi,
matrix_mdev->shadow_apcb.aqm);
continue;
@@ -402,8 +402,9 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
}
matrix_mdev->mdev = mdev;
- vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
- vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->shadow_apcb);
+ vfio_ap_matrix_init(&matrix_dev->config_info, &matrix_mdev->matrix);
+ vfio_ap_matrix_init(&matrix_dev->config_info,
+ &matrix_mdev->shadow_apcb);
hash_init(matrix_mdev->qtable);
mdev_set_drvdata(mdev, matrix_mdev);
matrix_mdev->pqap_hook.hook = handle_pqap;
@@ -428,8 +429,6 @@ static int vfio_ap_mdev_remove(struct mdev_device *mdev)
mutex_unlock(&matrix_dev->lock);
kfree(matrix_mdev);
- mdev_set_drvdata(mdev, NULL);
- atomic_inc(&matrix_dev->available_instances);
return 0;
}
@@ -1515,7 +1514,9 @@ static void vfio_ap_mdev_hot_plug_queue(struct vfio_ap_queue *q)
unsigned long apid = (unsigned long)AP_QID_CARD(q->apqn);
unsigned long apqi = (unsigned long)AP_QID_QUEUE(q->apqn);
- if (q->matrix_mdev == NULL)
+ if ((q->matrix_mdev == NULL) ||
+ test_bit_inv(apid, matrix_dev->ap_add) ||
+ test_bit_inv(apqi, matrix_dev->aq_add))
return;
hot_plug |= vfio_ap_assign_apid_to_apcb(q->matrix_mdev, apid);
@@ -1608,3 +1609,199 @@ int vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm)
return ret;
}
+
+/**
+ * vfio_ap_mdev_unassign_apids
+ *
+ * @matrix_mdev: The matrix mediated device
+ *
+ * @apid_rem: The bitmap specifying the APIDs of the adapters removed from
+ * the host's AP configuration
+ *
+ * Unassigns each APID specified in @apid_rem that is assigned to the
+ * shadow APCB. Returns true if at least one APID is unassigned; otherwise,
+ * returns false.
+ */
+static bool vfio_ap_mdev_unassign_apids(struct ap_matrix_mdev *matrix_mdev,
+ unsigned long *apid_rem)
+{
+ DECLARE_BITMAP(shadow_apm, AP_DEVICES);
+
+ /*
+ * Get the result of filtering the APIDs removed from the host AP
+ * configuration out of the shadow APCB
+ */
+ bitmap_andnot(shadow_apm, matrix_mdev->shadow_apcb.apm, apid_rem,
+ AP_DEVICES);
+
+ /*
+ * If filtering removed any APIDs from the shadow APCB, then let's go
+ * ahead and update the shadow APCB accordingly
+ */
+ if (!bitmap_equal(matrix_mdev->shadow_apcb.apm, shadow_apm,
+ AP_DEVICES)) {
+ bitmap_copy(matrix_mdev->shadow_apcb.apm, shadow_apm,
+ AP_DEVICES);
+
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * vfio_ap_mdev_unlink_apids
+ *
+ * @matrix_mdev: The matrix mediated device
+ *
+ * @apid_rem: The bitmap specifying the APIDs of the adapters removed from
+ * the host's AP configuration
+ *
+ * Unlinks @matrix_mdev from each queue assigned to @matrix_mdev whose APQN
+ * contains an APID specified in @apid_rem.
+ */
+static void vfio_ap_mdev_unlink_apids(struct ap_matrix_mdev *matrix_mdev,
+ unsigned long *apid_rem)
+{
+ int bkt, apid;
+ struct vfio_ap_queue *q;
+
+ hash_for_each(matrix_mdev->qtable, bkt, q, mdev_qnode) {
+ apid = AP_QID_CARD(q->apqn);
+ if (test_bit_inv(apid, apid_rem)) {
+ q->matrix_mdev = NULL;
+ hash_del(&q->mdev_qnode);
+ }
+ }
+}
+
+/**
+ * vfio_ap_mdev_unassign_apqis
+ *
+ * @matrix_mdev: The matrix mediated device
+ *
+ * @apqi_rem: The bitmap specifying the APQIs of the domains removed from
+ * the host's AP configuration
+ *
+ * Unassigns each APQI specified in @apqi_rem that is assigned to the
+ * shadow APCB. Returns true if at least one APQI is unassigned; otherwise,
+ * returns false.
+ */
+static bool vfio_ap_mdev_unassign_apqis(struct ap_matrix_mdev *matrix_mdev,
+ unsigned long *apqi_rem)
+{
+ DECLARE_BITMAP(shadow_aqm, AP_DOMAINS);
+
+ /*
+ * Get the result of filtering the APQIs removed from the host AP
+ * configuration out of the shadow APCB
+ */
+ bitmap_andnot(shadow_aqm, matrix_mdev->shadow_apcb.aqm, apqi_rem,
+ AP_DOMAINS);
+
+ /*
+ * If filtering removed any APQIs from the shadow APCB, then let's go
+ * ahead and update the shadow APCB accordingly
+ */
+ if (!bitmap_equal(matrix_mdev->shadow_apcb.aqm, shadow_aqm,
+ AP_DOMAINS)) {
+ bitmap_copy(matrix_mdev->shadow_apcb.aqm, shadow_aqm,
+ AP_DOMAINS);
+
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * vfio_ap_mdev_unlink_apqis
+ *
+ * @matrix_mdev: The matrix mediated device
+ *
+ * @apqi_rem: The bitmap specifying the APQIs of the domains removed from
+ * the host's AP configuration
+ *
+ * Unlinks @matrix_mdev from each queue assigned to @matrix_mdev whose APQN
+ * contains an APQI specified in @apqi_rem.
+ */
+static void vfio_ap_mdev_unlink_apqis(struct ap_matrix_mdev *matrix_mdev,
+ unsigned long *apqi_rem)
+{
+ int bkt, apqi;
+ struct vfio_ap_queue *q;
+
+ hash_for_each(matrix_mdev->qtable, bkt, q, mdev_qnode) {
+ apqi = AP_QID_QUEUE(q->apqn);
+ if (test_bit_inv(apqi, apqi_rem)) {
+ q->matrix_mdev = NULL;
+ hash_del(&q->mdev_qnode);
+ }
+ }
+}
+
+static void vfio_ap_mdev_on_cfg_remove(void)
+{
+ bool unassigned = false;
+ int ap_remove, aq_remove;
+ struct ap_matrix_mdev *matrix_mdev;
+ DECLARE_BITMAP(apid_rem, AP_DEVICES);
+ DECLARE_BITMAP(apqi_rem, AP_DOMAINS);
+ unsigned long *cur_apm, *cur_aqm, *prev_apm, *prev_aqm;
+
+ cur_apm = (unsigned long *)matrix_dev->config_info.apm;
+ cur_aqm = (unsigned long *)matrix_dev->config_info.aqm;
+ prev_apm = (unsigned long *)matrix_dev->config_info_prev.apm;
+ prev_aqm = (unsigned long *)matrix_dev->config_info_prev.aqm;
+
+ ap_remove = bitmap_andnot(apid_rem, prev_apm, cur_apm, AP_DEVICES);
+ aq_remove = bitmap_andnot(apqi_rem, prev_aqm, cur_aqm, AP_DOMAINS);
+
+ if (!ap_remove && !aq_remove)
+ return;
+
+ list_for_each_entry(matrix_mdev, &matrix_dev->mdev_list, node) {
+ if (ap_remove) {
+ if (vfio_ap_mdev_unassign_apids(matrix_mdev, apid_rem))
+ unassigned = true;
+ vfio_ap_mdev_unlink_apids(matrix_mdev, apid_rem);
+ }
+
+ if (aq_remove) {
+ if (vfio_ap_mdev_unassign_apqis(matrix_mdev, apqi_rem))
+ unassigned = true;
+ vfio_ap_mdev_unlink_apqis(matrix_mdev, apqi_rem);
+ }
+
+ if (unassigned)
+ vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
+ }
+}
+
+static void vfio_ap_mdev_on_cfg_add(void)
+{
+ unsigned long *cur_apm, *cur_aqm, *prev_apm, *prev_aqm;
+
+ cur_apm = (unsigned long *)matrix_dev->config_info.apm;
+ cur_aqm = (unsigned long *)matrix_dev->config_info.aqm;
+
+ prev_apm = (unsigned long *)matrix_dev->config_info_prev.apm;
+ prev_aqm = (unsigned long *)matrix_dev->config_info_prev.aqm;
+
+ bitmap_andnot(matrix_dev->ap_add, cur_apm, prev_apm, AP_DEVICES);
+ bitmap_andnot(matrix_dev->aq_add, cur_aqm, prev_aqm, AP_DOMAINS);
+}
+
+void vfio_ap_on_cfg_changed(struct ap_config_info *new_config_info,
+ struct ap_config_info *old_config_info)
+{
+ mutex_lock(&matrix_dev->lock);
+ memcpy(&matrix_dev->config_info, new_config_info,
+ sizeof(struct ap_config_info));
+ memcpy(&matrix_dev->config_info_prev, old_config_info,
+ sizeof(struct ap_config_info));
+
+ vfio_ap_mdev_on_cfg_remove();
+ vfio_ap_mdev_on_cfg_add();
+ mutex_unlock(&matrix_dev->lock);
+}
--
2.21.1
Let's create links between each queue device bound to the vfio_ap device
driver and the matrix mdev to which the queue is assigned. The idea is to
facilitate efficient retrieval of the objects representing the queue
devices and matrix mdevs as well as to verify that a queue assigned to
a matrix mdev is bound to the driver.
The links will be created as follows:
* When the queue device is probed, if its APQN is assigned to a matrix
mdev, the structures representing the queue device and the matrix mdev
will be linked.
* When an adapter or domain is assigned to a matrix mdev, for each new
APQN assigned that references a queue device bound to the vfio_ap
device driver, the structures representing the queue device and the
matrix mdev will be linked.
The links will be removed as follows:
* When the queue device is removed, if its APQN is assigned to a matrix
mdev, the structures representing the queue device and the matrix mdev
will be unlinked.
* When an adapter or domain is unassigned from a matrix mdev, for each
APQN unassigned that references a queue device bound to the vfio_ap
device driver, the structures representing the queue device and the
matrix mdev will be unlinked.
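
The linking scheme can be sketched in userspace C as a small hash table
keyed by APQN. This is a simplified illustration, not the driver code:
mdev_sketch, queue_sketch and the helper functions are hypothetical, and a
hand-rolled chained table replaces the kernel's hashtable helpers.

```c
#include <assert.h>
#include <stddef.h>

#define QTABLE_BUCKETS 16	/* hypothetical table size */

struct mdev_sketch;

struct queue_sketch {
	unsigned int apqn;
	struct mdev_sketch *mdev;	/* NULL when not linked */
	struct queue_sketch *next;	/* next queue in the same bucket */
};

struct mdev_sketch {
	struct queue_sketch *qtable[QTABLE_BUCKETS];
};

/* Compose an APQN from an adapter ID and a domain index. */
unsigned int mk_apqn(unsigned int apid, unsigned int apqi)
{
	return ((apid & 0xff) << 8) | (apqi & 0xff);
}

/* Link in both directions: queue -> mdev and mdev table -> queue. */
void link_queue(struct mdev_sketch *m, struct queue_sketch *q)
{
	unsigned int b = q->apqn % QTABLE_BUCKETS;

	assert(q->mdev == NULL);	/* a queue belongs to at most one mdev */
	q->mdev = m;
	q->next = m->qtable[b];
	m->qtable[b] = q;
}

/* Remove the queue from its mdev's table and clear the back pointer. */
void unlink_queue(struct queue_sketch *q)
{
	struct queue_sketch **pp;

	if (!q->mdev)
		return;

	pp = &q->mdev->qtable[q->apqn % QTABLE_BUCKETS];
	while (*pp && *pp != q)
		pp = &(*pp)->next;
	if (*pp)
		*pp = q->next;

	q->mdev = NULL;
	q->next = NULL;
}

/* Look up a linked queue by APQN without scanning all bound devices. */
struct queue_sketch *mdev_get_queue(struct mdev_sketch *m, unsigned int apqn)
{
	struct queue_sketch *q;

	for (q = m->qtable[apqn % QTABLE_BUCKETS]; q; q = q->next) {
		if (q->apqn == apqn)
			return q;
	}

	return NULL;
}
```

A lookup that returns NULL then also answers the second question raised
above: the APQN is either not assigned to the mdev or its queue device is
not bound to the driver.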
Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/vfio_ap_ops.c | 161 +++++++++++++++++++++++---
drivers/s390/crypto/vfio_ap_private.h | 3 +
2 files changed, 146 insertions(+), 18 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index dc699fd54505..07caf871943c 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -28,7 +28,6 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
/**
* vfio_ap_get_queue: Retrieve a queue with a specific APQN.
- * @matrix_mdev: the associated mediated matrix
* @apqn: The queue APQN
*
* Retrieve a queue with a specific APQN from the AP queue devices attached to
@@ -36,32 +35,36 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
*
* Returns the pointer to the vfio_ap_queue with the specified APQN, or NULL.
*/
-static struct vfio_ap_queue *vfio_ap_get_queue(
- struct ap_matrix_mdev *matrix_mdev,
- int apqn)
+static struct vfio_ap_queue *vfio_ap_get_queue(int apqn)
{
struct ap_queue *queue;
struct vfio_ap_queue *q = NULL;
- if (!test_bit_inv(AP_QID_CARD(apqn), matrix_mdev->matrix.apm))
- return NULL;
- if (!test_bit_inv(AP_QID_QUEUE(apqn), matrix_mdev->matrix.aqm))
- return NULL;
-
queue = ap_get_qdev(apqn);
if (!queue)
return NULL;
put_device(&queue->ap_dev.device);
- if (queue->ap_dev.device.driver == &matrix_dev->vfio_ap_drv->driver) {
+ if (queue->ap_dev.device.driver == &matrix_dev->vfio_ap_drv->driver)
q = dev_get_drvdata(&queue->ap_dev.device);
- q->matrix_mdev = matrix_mdev;
- }
return q;
}
+static struct vfio_ap_queue *
+vfio_ap_mdev_get_queue(struct ap_matrix_mdev *matrix_mdev, unsigned long apqn)
+{
+ struct vfio_ap_queue *q;
+
+ hash_for_each_possible(matrix_mdev->qtable, q, mdev_qnode, apqn) {
+ if (q && (q->apqn == apqn))
+ return q;
+ }
+
+ return NULL;
+}
+
/**
* vfio_ap_wait_for_irqclear
* @apqn: The AP Queue number
@@ -172,7 +175,6 @@ static struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q)
status.response_code);
end_free:
vfio_ap_free_aqic_resources(q);
- q->matrix_mdev = NULL;
return status;
}
@@ -288,7 +290,7 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
matrix_mdev = container_of(vcpu->kvm->arch.crypto.pqap_hook,
struct ap_matrix_mdev, pqap_hook);
- q = vfio_ap_get_queue(matrix_mdev, apqn);
+ q = vfio_ap_mdev_get_queue(matrix_mdev, apqn);
if (!q)
goto out_unlock;
@@ -331,6 +333,7 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
matrix_mdev->mdev = mdev;
vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
+ hash_init(matrix_mdev->qtable);
mdev_set_drvdata(mdev, matrix_mdev);
matrix_mdev->pqap_hook.hook = handle_pqap;
matrix_mdev->pqap_hook.owner = THIS_MODULE;
@@ -559,6 +562,87 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
return 0;
}
+enum qlink_action {
+ LINK_APID,
+ LINK_APQI,
+ UNLINK_APID,
+ UNLINK_APQI,
+};
+
+static void vfio_ap_mdev_link_queue(struct ap_matrix_mdev *matrix_mdev,
+ unsigned long apid, unsigned long apqi)
+{
+ struct vfio_ap_queue *q;
+
+ q = vfio_ap_get_queue(AP_MKQID(apid, apqi));
+ if (q) {
+ q->matrix_mdev = matrix_mdev;
+ hash_add(matrix_mdev->qtable,
+ &q->mdev_qnode, q->apqn);
+ }
+}
+
+static void vfio_ap_mdev_unlink_queue(unsigned long apid, unsigned long apqi)
+{
+ struct vfio_ap_queue *q;
+
+ q = vfio_ap_get_queue(AP_MKQID(apid, apqi));
+ if (q) {
+ q->matrix_mdev = NULL;
+ hash_del(&q->mdev_qnode);
+ }
+}
+
+/**
+ * vfio_ap_mdev_manage_qlinks
+ *
+ * @matrix_mdev: The matrix mdev to link.
+ * @action: The action to take on @qlink_id.
+ * @qlink_id: The APID or APQI of the queues to link.
+ *
+ * Sets or clears the links between the queues with the specified @qlink_id
+ * and the @matrix_mdev:
+ * @action == LINK_APID: Set the links between the @matrix_mdev and the
+ * queues with the specified @qlink_id (APID)
+ * @action == LINK_APQI: Set the links between the @matrix_mdev and the
+ * queues with the specified @qlink_id (APQI)
+ * @action == UNLINK_APID: Clear the links between the @matrix_mdev and the
+ * queues with the specified @qlink_id (APID)
+ * @action == UNLINK_APQI: Clear the links between the @matrix_mdev and the
+ * queues with the specified @qlink_id (APQI)
+ */
+static void vfio_ap_mdev_manage_qlinks(struct ap_matrix_mdev *matrix_mdev,
+ enum qlink_action action,
+ unsigned long qlink_id)
+{
+ unsigned long id;
+
+ switch (action) {
+ case LINK_APID:
+ for_each_set_bit_inv(id, matrix_mdev->matrix.aqm,
+ matrix_mdev->matrix.aqm_max + 1)
+ vfio_ap_mdev_link_queue(matrix_mdev, qlink_id, id);
+ break;
+ case UNLINK_APID:
+ for_each_set_bit_inv(id, matrix_mdev->matrix.aqm,
+ matrix_mdev->matrix.aqm_max + 1)
+ vfio_ap_mdev_unlink_queue(qlink_id, id);
+ break;
+ case LINK_APQI:
+ for_each_set_bit_inv(id, matrix_mdev->matrix.apm,
+ matrix_mdev->matrix.apm_max + 1)
+ vfio_ap_mdev_link_queue(matrix_mdev, id, qlink_id);
+ break;
+ case UNLINK_APQI:
+ for_each_set_bit_inv(id, matrix_mdev->matrix.apm,
+ matrix_mdev->matrix.apm_max + 1)
+ vfio_ap_mdev_unlink_queue(id, qlink_id);
+ break;
+ default:
+ WARN_ON_ONCE(1);
+ }
+}
+
/**
* assign_adapter_store
*
@@ -628,6 +712,7 @@ static ssize_t assign_adapter_store(struct device *dev,
if (ret)
goto share_err;
+ vfio_ap_mdev_manage_qlinks(matrix_mdev, LINK_APID, apid);
ret = count;
goto done;
@@ -679,6 +764,7 @@ static ssize_t unassign_adapter_store(struct device *dev,
mutex_lock(&matrix_dev->lock);
clear_bit_inv((unsigned long)apid, matrix_mdev->matrix.apm);
+ vfio_ap_mdev_manage_qlinks(matrix_mdev, UNLINK_APID, apid);
mutex_unlock(&matrix_dev->lock);
return count;
@@ -769,6 +855,7 @@ static ssize_t assign_domain_store(struct device *dev,
if (ret)
goto share_err;
+ vfio_ap_mdev_manage_qlinks(matrix_mdev, LINK_APQI, apqi);
ret = count;
goto done;
@@ -821,6 +908,7 @@ static ssize_t unassign_domain_store(struct device *dev,
mutex_lock(&matrix_dev->lock);
clear_bit_inv((unsigned long)apqi, matrix_mdev->matrix.aqm);
+ vfio_ap_mdev_manage_qlinks(matrix_mdev, UNLINK_APQI, apqi);
mutex_unlock(&matrix_dev->lock);
return count;
@@ -1155,6 +1243,11 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
matrix_mdev->matrix.apm_max + 1) {
for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm,
matrix_mdev->matrix.aqm_max + 1) {
+ q = vfio_ap_mdev_get_queue(matrix_mdev,
+ AP_MKQID(apid, apqi));
+ if (!q)
+ continue;
+
ret = vfio_ap_mdev_reset_queue(apid, apqi, 1);
/*
* Regardless whether a queue turns out to be busy, or
@@ -1164,9 +1257,7 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
if (ret)
rc = ret;
- q = vfio_ap_get_queue(matrix_mdev, AP_MKQID(apid, apqi));
- if (q)
- vfio_ap_free_aqic_resources(q);
+ vfio_ap_free_aqic_resources(q);
}
}
@@ -1292,6 +1383,29 @@ void vfio_ap_mdev_unregister(void)
mdev_unregister_device(&matrix_dev->device);
}
+/*
+ * vfio_ap_queue_link_mdev
+ *
+ * @q: The queue to link with the matrix mdev.
+ *
+ * Links @q with the matrix mdev to which the queue's APQN is assigned.
+ */
+static void vfio_ap_queue_link_mdev(struct vfio_ap_queue *q)
+{
+ unsigned long apid = AP_QID_CARD(q->apqn);
+ unsigned long apqi = AP_QID_QUEUE(q->apqn);
+ struct ap_matrix_mdev *matrix_mdev;
+
+ list_for_each_entry(matrix_mdev, &matrix_dev->mdev_list, node) {
+ if (test_bit_inv(apid, matrix_mdev->matrix.apm) &&
+ test_bit_inv(apqi, matrix_mdev->matrix.aqm)) {
+ q->matrix_mdev = matrix_mdev;
+ hash_add(matrix_mdev->qtable, &q->mdev_qnode, q->apqn);
+ break;
+ }
+ }
+}
+
/**
* vfio_ap_mdev_probe_queue:
*
@@ -1305,9 +1419,13 @@ int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
q = kzalloc(sizeof(*q), GFP_KERNEL);
if (!q)
return -ENOMEM;
+ mutex_lock(&matrix_dev->lock);
dev_set_drvdata(&apdev->device, q);
q->apqn = to_ap_queue(&apdev->device)->qid;
q->saved_isc = VFIO_AP_ISC_INVALID;
+ vfio_ap_queue_link_mdev(q);
+ mutex_unlock(&matrix_dev->lock);
+
return 0;
}
@@ -1328,7 +1446,14 @@ void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
apid = AP_QID_CARD(q->apqn);
apqi = AP_QID_QUEUE(q->apqn);
vfio_ap_mdev_reset_queue(apid, apqi, 1);
- vfio_ap_irq_disable(q);
+ if (q->matrix_mdev) {
+ if (q->matrix_mdev->kvm) {
+ vfio_ap_free_aqic_resources(q);
+ kvm_put_kvm(q->matrix_mdev->kvm);
+ }
+ hash_del(&q->mdev_qnode);
+ q->matrix_mdev = NULL;
+ }
kfree(q);
mutex_unlock(&matrix_dev->lock);
}
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index d9003de4fbad..4e5cc72fc0db 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -18,6 +18,7 @@
#include <linux/delay.h>
#include <linux/mutex.h>
#include <linux/kvm_host.h>
+#include <linux/hashtable.h>
#include "ap_bus.h"
@@ -86,6 +87,7 @@ struct ap_matrix_mdev {
struct kvm *kvm;
struct kvm_s390_module_hook pqap_hook;
struct mdev_device *mdev;
+ DECLARE_HASHTABLE(qtable, 8);
};
extern int vfio_ap_mdev_register(void);
@@ -97,6 +99,7 @@ struct vfio_ap_queue {
int apqn;
#define VFIO_AP_ISC_INVALID 0xff
unsigned char saved_isc;
+ struct hlist_node mdev_qnode;
};
int vfio_ap_mdev_probe_queue(struct ap_device *queue);
--
2.21.1
The queues assigned to a matrix mediated device are currently reset when:
* The VFIO_DEVICE_RESET ioctl is invoked
* The mdev fd is closed by userspace (QEMU)
* The mdev is removed from sysfs.
Immediately after the reset of a queue, a call is made to disable
interrupts for the queue. This is entirely unnecessary because the reset of
a queue disables interrupts, so this will be removed.
Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/vfio_ap_ops.c | 28 +++++-----------------------
1 file changed, 5 insertions(+), 23 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 8e6972495daa..dc699fd54505 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -26,14 +26,6 @@
static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
-static int match_apqn(struct device *dev, const void *data)
-{
- struct vfio_ap_queue *q = dev_get_drvdata(dev);
-
- return (q->apqn == *(int *)(data)) ? 1 : 0;
-}
-
-
/**
* vfio_ap_get_queue: Retrieve a queue with a specific APQN.
* @matrix_mdev: the associated mediated matrix
@@ -1121,20 +1113,6 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
return NOTIFY_OK;
}
-static void vfio_ap_irq_disable_apqn(int apqn)
-{
- struct device *dev;
- struct vfio_ap_queue *q;
-
- dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL,
- &apqn, match_apqn);
- if (dev) {
- q = dev_get_drvdata(dev);
- vfio_ap_irq_disable(q);
- put_device(dev);
- }
-}
-
static int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
unsigned int retry)
{
@@ -1169,6 +1147,7 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
{
int ret;
int rc = 0;
+ struct vfio_ap_queue *q;
unsigned long apid, apqi;
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
@@ -1184,7 +1163,10 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
*/
if (ret)
rc = ret;
- vfio_ap_irq_disable_apqn(AP_MKQID(apid, apqi));
+
+ q = vfio_ap_get_queue(matrix_mdev, AP_MKQID(apid, apqi));
+ if (q)
+ vfio_ap_free_aqic_resources(q);
}
}
--
2.21.1
The APCB is a control block containing the masks that specify the adapters,
domains and control domains to which a KVM guest is granted access. When
the vfio_ap device driver is notified that the KVM pointer has been set,
the guest's APCB is initialized from the AP configuration of adapters,
domains and control domains assigned to the matrix mdev. The Linux device
model, however, precludes passing through to a guest any devices that
are not bound to the device driver facilitating the pass-through.
Consequently, APQNs assigned to the matrix mdev that do not reference
AP queue devices must be filtered before assigning them to the KVM guest's
APCB; however, the AP architecture precludes filtering individual APQNs, so
the APQNs will be filtered by APID. That is, if a given APQN does not
reference a queue device bound to the vfio_ap driver, its APID will not
get assigned to the guest's APCB. For example:
   Queues bound to vfio_ap:
      04.0004
      04.0022
      04.0035
      05.0004
      05.0022

   Adapters/domains assigned to the matrix mdev:
      04 0004
         0022
         0035
      05 0004
         0022
         0035

   APQNs assigned to APCB:
      04.0004
      04.0022
      04.0035
APID 05 was filtered out of the guest's APCB because APQN 05.0035 does not
reference a queue device bound to the vfio_ap device driver.
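
The filtering rule in this example can be sketched in plain userspace C. This
is only an illustration under stated assumptions: struct matrix,
struct bound_queues and init_apcb() are hypothetical stand-ins for the
driver's bitmaps and queue lookup, IDs are limited to 0..255, and the
additional check against the host AP configuration is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_ID 256	/* APIDs and APQIs are modelled as 0..255 */

struct matrix {
	bool apm[MAX_ID];	/* adapters (APIDs) */
	bool aqm[MAX_ID];	/* domains (APQIs) */
};

/* Hypothetical stand-in for "is this APQN bound to vfio_ap?" */
struct bound_queues {
	bool bound[MAX_ID * MAX_ID];	/* indexed by APQN */
};

/* APQN layout assumed here: APID in the high byte, APQI in the low byte. */
static unsigned int mkqid(unsigned int apid, unsigned int apqi)
{
	assert(apid < MAX_ID && apqi < MAX_ID);
	return (apid << 8) | apqi;
}

/*
 * Start from the mdev's matrix, then drop every APID for which at least
 * one assigned APQN has no bound queue device. Individual APQNs cannot
 * be filtered, so the whole adapter goes.
 */
static void init_apcb(struct matrix *apcb, const struct matrix *mdev,
		      const struct bound_queues *bq)
{
	unsigned int apid, apqi;

	memcpy(apcb, mdev, sizeof(*apcb));

	for (apid = 0; apid < MAX_ID; apid++) {
		if (!mdev->apm[apid])
			continue;
		for (apqi = 0; apqi < MAX_ID; apqi++) {
			if (!mdev->aqm[apqi])
				continue;
			if (!bq->bound[mkqid(apid, apqi)]) {
				apcb->apm[apid] = false;
				break;
			}
		}
	}
}
```

Fed with the bound queues and assignments listed above, the sketch keeps
adapter 04 and drops adapter 05, while all three domains stay in the APCB.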
Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/vfio_ap_ops.c | 59 +++++++++++++++++++++++++++++--
1 file changed, 57 insertions(+), 2 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index a69422d76e7f..633c61995891 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -318,6 +318,13 @@ static void vfio_ap_matrix_init(struct ap_config_info *info,
matrix->adm_max = info->apxa ? info->Nd : 15;
}
+static void vfio_ap_copy_masks(struct ap_matrix *dst, struct ap_matrix *src)
+{
+ bitmap_copy(dst->apm, src->apm, AP_DEVICES);
+ bitmap_copy(dst->aqm, src->aqm, AP_DOMAINS);
+ bitmap_copy(dst->adm, src->adm, AP_DOMAINS);
+}
+
static bool vfio_ap_mdev_has_crycb(struct ap_matrix_mdev *matrix_mdev)
{
return (matrix_mdev->kvm && matrix_mdev->kvm->arch.crypto.crycbd);
@@ -332,6 +339,55 @@ static void vfio_ap_mdev_commit_shadow_apcb(struct ap_matrix_mdev *matrix_mdev)
matrix_mdev->shadow_apcb.adm);
}
+static void vfio_ap_mdev_init_apcb(struct ap_matrix_mdev *matrix_mdev)
+{
+ unsigned long apid, apqi, apqn;
+
+ vfio_ap_copy_masks(&matrix_mdev->shadow_apcb, &matrix_mdev->matrix);
+
+ for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {
+ /*
+ * If the APID is not assigned to the host AP configuration,
+ * we can not assign it to the guest's AP configuration
+ */
+ if (!test_bit_inv(apid,
+ (unsigned long *)matrix_dev->info.apm)) {
+ clear_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
+ continue;
+ }
+
+ for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm,
+ AP_DOMAINS) {
+ /*
+ * If the APQI is not assigned to the host AP
+ * configuration, then it can not be assigned to the
+ * guest's AP configuration
+ */
+ if (!test_bit_inv(apqi, (unsigned long *)
+ matrix_dev->info.aqm)) {
+ clear_bit_inv(apqi,
+ matrix_mdev->shadow_apcb.aqm);
+ continue;
+ }
+
+ /*
+ * If the APQN is not bound to the vfio_ap device
+ * driver, then we can't assign it to the guest's
+ * AP configuration. The AP architecture won't
+ * allow filtering of a single APQN, so let's filter
+ * the APID.
+ */
+ apqn = AP_MKQID(apid, apqi);
+
+ if (!vfio_ap_mdev_get_queue(matrix_mdev, apqn)) {
+ clear_bit_inv(apid,
+ matrix_mdev->shadow_apcb.apm);
+ break;
+ }
+ }
+ }
+}
+
static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
{
struct ap_matrix_mdev *matrix_mdev;
@@ -1256,8 +1312,7 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
if (ret)
return NOTIFY_DONE;
- memcpy(&matrix_mdev->shadow_apcb, &matrix_mdev->matrix,
- sizeof(matrix_mdev->shadow_apcb));
+ vfio_ap_mdev_init_apcb(matrix_mdev);
vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
return NOTIFY_OK;
--
2.21.1
On Tue, 24 Nov 2020 16:40:03 -0500
Tony Krowiak <[email protected]> wrote:
> The queues assigned to a matrix mediated device are currently reset when:
>
> * The VFIO_DEVICE_RESET ioctl is invoked
> * The mdev fd is closed by userspace (QEMU)
> * The mdev is removed from sysfs.
>
> Immediately after the reset of a queue, a call is made to disable
> interrupts for the queue. This is entirely unnecessary because the reset of
> a queue disables interrupts, so this will be removed.
>
> Signed-off-by: Tony Krowiak <[email protected]>
Reviewed-by: Halil Pasic <[email protected]>
As I said previously, I would prefer the cleanup of the airq
resources being part of reset_queue(), but I can propose that
later.
> ---
> drivers/s390/crypto/vfio_ap_ops.c | 28 +++++-----------------------
> 1 file changed, 5 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 8e6972495daa..dc699fd54505 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -26,14 +26,6 @@
>
> static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
>
> -static int match_apqn(struct device *dev, const void *data)
> -{
> - struct vfio_ap_queue *q = dev_get_drvdata(dev);
> -
> - return (q->apqn == *(int *)(data)) ? 1 : 0;
> -}
> -
> -
> /**
> * vfio_ap_get_queue: Retrieve a queue with a specific APQN.
> * @matrix_mdev: the associated mediated matrix
> @@ -1121,20 +1113,6 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
> return NOTIFY_OK;
> }
>
> -static void vfio_ap_irq_disable_apqn(int apqn)
> -{
> - struct device *dev;
> - struct vfio_ap_queue *q;
> -
> - dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL,
> - &apqn, match_apqn);
> - if (dev) {
> - q = dev_get_drvdata(dev);
> - vfio_ap_irq_disable(q);
> - put_device(dev);
> - }
> -}
> -
> static int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
> unsigned int retry)
> {
> @@ -1169,6 +1147,7 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
> {
> int ret;
> int rc = 0;
> + struct vfio_ap_queue *q;
> unsigned long apid, apqi;
> struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>
> @@ -1184,7 +1163,10 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
> */
> if (ret)
> rc = ret;
> - vfio_ap_irq_disable_apqn(AP_MKQID(apid, apqi));
> +
> + q = vfio_ap_get_queue(matrix_mdev, AP_MKQID(apid, apqi));
> + if (q)
> + vfio_ap_free_aqic_resources(q);
> }
> }
>
On Tue, 24 Nov 2020 16:40:04 -0500
Tony Krowiak <[email protected]> wrote:
> @@ -1155,6 +1243,11 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
> matrix_mdev->matrix.apm_max + 1) {
> for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm,
> matrix_mdev->matrix.aqm_max + 1) {
> + q = vfio_ap_mdev_get_queue(matrix_mdev,
> + AP_MKQID(apid, apqi));
> + if (!q)
> + continue;
> +
> ret = vfio_ap_mdev_reset_queue(apid, apqi, 1);
> /*
> * Regardless whether a queue turns out to be busy, or
> @@ -1164,9 +1257,7 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
> if (ret)
> rc = ret;
>
> - q = vfio_ap_get_queue(matrix_mdev, AP_MKQID(apid, apqi));
> - if (q)
> - vfio_ap_free_aqic_resources(q);
> + vfio_ap_free_aqic_resources(q);
> }
> }
We discussed this during the review of v11. Introducing it one way
around, only to change it in the next patch, which should deal with
something different, makes no sense to me.
BTW I've provided a ton of feedback for '[PATCH v11 03/14]
s390/vfio-ap: manage link between queue struct and matrix mdev', but I
can't find your response to that. Some of the things resurface here, and
I don't feel like repeating myself. Can you provide me an answer to
the v11 version?
On Tue, 24 Nov 2020 16:40:06 -0500
Tony Krowiak <[email protected]> wrote:
> Let's implement the callback to indicate when an APQN
> is in use by the vfio_ap device driver. The callback is
> invoked whenever a change to the apmask or aqmask would
> result in one or more queue devices being removed from the driver. The
> vfio_ap device driver will indicate a resource is in use
> if the APQN of any of the queue devices to be removed are assigned to
> any of the matrix mdevs under the driver's control.
>
> There is potential for a deadlock condition between the matrix_dev->lock
> used to lock the matrix device during assignment of adapters and domains
> and the ap_perms_mutex locked by the AP bus when changes are made to the
> sysfs apmask/aqmask attributes.
>
> Consider following scenario (courtesy of Halil Pasic):
> 1) apmask_store() takes ap_perms_mutex
> 2) assign_adapter_store() takes matrix_dev->lock
> 3) apmask_store() calls vfio_ap_mdev_resource_in_use() which tries
> to take matrix_dev->lock
> 4) assign_adapter_store() calls ap_apqn_in_matrix_owned_by_def_drv
> which tries to take ap_perms_mutex
>
> BANG!
>
> To resolve this issue, instead of using the mutex_lock(&matrix_dev->lock)
> function to lock the matrix device during assignment of an adapter or
> domain to a matrix_mdev as well as during the in_use callback, the
> mutex_trylock(&matrix_dev->lock) function will be used. If the lock is not
> obtained, then the assignment and in_use functions will terminate with
> -EBUSY.
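
For readers following along, the trylock-and-bail-out pattern described in
the quoted text can be sketched in userspace as follows. This is only a
model: matrix_lock, matrix_trylock(), matrix_unlock() and assign_adapter()
are hypothetical names, and a C11 atomic_flag stands in for the kernel
mutex.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for matrix_dev->lock. */
static atomic_flag matrix_lock = ATOMIC_FLAG_INIT;

/* Returns true if the lock was free and is now held by the caller. */
static bool matrix_trylock(void)
{
	/* test_and_set returns the previous value: false means we got it */
	return !atomic_flag_test_and_set_explicit(&matrix_lock,
						  memory_order_acquire);
}

static void matrix_unlock(void)
{
	atomic_flag_clear_explicit(&matrix_lock, memory_order_release);
}

static int adapters_assigned;

/*
 * Never block on the matrix lock: if someone else holds it, give up with
 * -EBUSY instead of waiting, so no lock-order cycle with a second mutex
 * can form. The caller is expected to retry.
 */
static int assign_adapter(void)
{
	if (!matrix_trylock())
		return -EBUSY;

	assert(adapters_assigned >= 0);
	adapters_assigned++;	/* the actual assignment work goes here */

	matrix_unlock();
	return 0;
}
```

The cost of this design is that callers may see spurious -EBUSY results
under contention and have to repeat the operation.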
Good news is: the final product is OK with regard to in_use(). Bad news
is: this patch does not do enough. At this stage we are still racy.
The problem is that the assign operations don't bother to take the
ap_perms_mutex lock under the matrix_dev->lock.
The scenario is the following:
1) apmask_store() takes ap_perms_mutex
2) apmask_store() calls vfio_ap_mdev_resource_in_use() which
takes matrix_dev->lock
3) vfio_ap_mdev_resource_in_use() releases matrix_dev->lock
and returns 0
4) assign_adapter_store() takes matrix_dev->lock, does the
assign (the queues are still bound to vfio_ap) and releases
matrix_dev->lock
5) apmask_store() carries on, does the update to apmask and releases
ap_perms_mutex
6) The queues get 'stolen' from vfio_ap while in use.
This gets fixed with "s390/vfio-ap: allow assignment of unavailable AP
queues to mdev device". Maybe we can reorder these patches. I didn't
look into that.
We could also just ignore the problem, because it is just for a couple
of commits, but I would prefer it gone.
Regards,
Halil
On Tue, 24 Nov 2020 16:40:00 -0500
Tony Krowiak <[email protected]> wrote:
> Let's move the probe and remove callbacks into the vfio_ap_ops.c
> file to keep all code related to managing queues in a single file. This
> way, all functions related to queue management can be removed from the
> vfio_ap_private.h header file defining the public interfaces for the
> vfio_ap device driver.
>
> Signed-off-by: Tony Krowiak <[email protected]>
Reviewed-by: Halil Pasic <[email protected]>
On Tue, 24 Nov 2020 16:40:01 -0500
Tony Krowiak <[email protected]> wrote:
> Decrement the reference count to KVM when notified that KVM pointer is
> invalidated via the vfio group notifier.
Can you please explain this more thoroughly? Is this a bug you found? If
yes, do we need to backport it (cc stable, Fixes tag)?
It doesn't seem related to the objective of the series. If it is not related,
why not spin it separately?
>
> Signed-off-by: Tony Krowiak <[email protected]>
This s-o-b is probably by accident.
> Signed-off-by: Tony Krowiak <[email protected]>
> ---
> drivers/s390/crypto/vfio_ap_ops.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 66fd9784a156..31e39c1f6e56 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -1095,7 +1095,11 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
> matrix_mdev = container_of(nb, struct ap_matrix_mdev, group_notifier);
>
> if (!data) {
> + if (matrix_mdev->kvm)
> + kvm_put_kvm(matrix_mdev->kvm);
> +
> matrix_mdev->kvm = NULL;
> +
> return NOTIFY_OK;
> }
>
On Tue, 24 Nov 2020 16:40:02 -0500
Tony Krowiak <[email protected]> wrote:
A nit: for all other patches the title prefix is s390/vfio-ap, here you
have 390/vfio-ap.
On Tue, 24 Nov 2020 16:40:04 -0500
Tony Krowiak <[email protected]> wrote:
> Let's create links between each queue device bound to the vfio_ap device
> driver and the matrix mdev to which the queue is assigned. The idea is to
> facilitate efficient retrieval of the objects representing the queue
> devices and matrix mdevs as well as to verify that a queue assigned to
> a matrix mdev is bound to the driver.
>
> The links will be created as follows:
>
> * When the queue device is probed, if its APQN is assigned to a matrix
> mdev, the structures representing the queue device and the matrix mdev
> will be linked.
>
> * When an adapter or domain is assigned to a matrix mdev, for each new
> APQN assigned that references a queue device bound to the vfio_ap
> device driver, the structures representing the queue device and the
> matrix mdev will be linked.
>
> The links will be removed as follows:
>
> * When the queue device is removed, if its APQN is assigned to a matrix
> mdev, the structures representing the queue device and the matrix mdev
> will be unlinked.
>
> * When an adapter or domain is unassigned from a matrix mdev, for each
> APQN unassigned that references a queue device bound to the vfio_ap
> device driver, the structures representing the queue device and the
> matrix mdev will be unlinked.
>
> Signed-off-by: Tony Krowiak <[email protected]>
Actually some aspects of this look much better than last time,
but I'm afraid there is one new issue that must be corrected -- see below.
> ---
> drivers/s390/crypto/vfio_ap_ops.c | 161 +++++++++++++++++++++++---
> drivers/s390/crypto/vfio_ap_private.h | 3 +
> 2 files changed, 146 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index dc699fd54505..07caf871943c 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -28,7 +28,6 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
>
> /**
> * vfio_ap_get_queue: Retrieve a queue with a specific APQN.
> - * @matrix_mdev: the associated mediated matrix
> * @apqn: The queue APQN
> *
> * Retrieve a queue with a specific APQN from the AP queue devices attached to
> @@ -36,32 +35,36 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
> *
> * Returns the pointer to the vfio_ap_queue with the specified APQN, or NULL.
> */
> -static struct vfio_ap_queue *vfio_ap_get_queue(
> - struct ap_matrix_mdev *matrix_mdev,
> - int apqn)
> +static struct vfio_ap_queue *vfio_ap_get_queue(int apqn)
> {
> struct ap_queue *queue;
> struct vfio_ap_queue *q = NULL;
>
> - if (!test_bit_inv(AP_QID_CARD(apqn), matrix_mdev->matrix.apm))
> - return NULL;
> - if (!test_bit_inv(AP_QID_QUEUE(apqn), matrix_mdev->matrix.aqm))
> - return NULL;
> -
> queue = ap_get_qdev(apqn);
> if (!queue)
> return NULL;
>
> put_device(&queue->ap_dev.device);
>
> - if (queue->ap_dev.device.driver == &matrix_dev->vfio_ap_drv->driver) {
> + if (queue->ap_dev.device.driver == &matrix_dev->vfio_ap_drv->driver)
> q = dev_get_drvdata(&queue->ap_dev.device);
> - q->matrix_mdev = matrix_mdev;
> - }
>
> return q;
> }
>
> +static struct vfio_ap_queue *
> +vfio_ap_mdev_get_queue(struct ap_matrix_mdev *matrix_mdev, unsigned long apqn)
> +{
> + struct vfio_ap_queue *q;
> +
> + hash_for_each_possible(matrix_mdev->qtable, q, mdev_qnode, apqn) {
> + if (q && (q->apqn == apqn))
> + return q;
> + }
> +
> + return NULL;
> +}
> +
> /**
> * vfio_ap_wait_for_irqclear
> * @apqn: The AP Queue number
> @@ -172,7 +175,6 @@ static struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q)
> status.response_code);
> end_free:
> vfio_ap_free_aqic_resources(q);
> - q->matrix_mdev = NULL;
> return status;
> }
>
> @@ -288,7 +290,7 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
> matrix_mdev = container_of(vcpu->kvm->arch.crypto.pqap_hook,
> struct ap_matrix_mdev, pqap_hook);
>
> - q = vfio_ap_get_queue(matrix_mdev, apqn);
> + q = vfio_ap_mdev_get_queue(matrix_mdev, apqn);
> if (!q)
> goto out_unlock;
>
> @@ -331,6 +333,7 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
>
> matrix_mdev->mdev = mdev;
> vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
> + hash_init(matrix_mdev->qtable);
> mdev_set_drvdata(mdev, matrix_mdev);
> matrix_mdev->pqap_hook.hook = handle_pqap;
> matrix_mdev->pqap_hook.owner = THIS_MODULE;
> @@ -559,6 +562,87 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
> return 0;
> }
>
> +enum qlink_action {
> + LINK_APID,
> + LINK_APQI,
> + UNLINK_APID,
> + UNLINK_APQI,
> +};
> +
> +static void vfio_ap_mdev_link_queue(struct ap_matrix_mdev *matrix_mdev,
> + unsigned long apid, unsigned long apqi)
> +{
> + struct vfio_ap_queue *q;
> +
> + q = vfio_ap_get_queue(AP_MKQID(apid, apqi));
> + if (q) {
> + q->matrix_mdev = matrix_mdev;
> + hash_add(matrix_mdev->qtable,
> + &q->mdev_qnode, q->apqn);
> + }
> +}
> +
> +static void vfio_ap_mdev_unlink_queue(unsigned long apid, unsigned long apqi)
> +{
> + struct vfio_ap_queue *q;
> +
> + q = vfio_ap_get_queue(AP_MKQID(apid, apqi));
> + if (q) {
> + q->matrix_mdev = NULL;
> + hash_del(&q->mdev_qnode);
> + }
> +}
I would do
+static void vfio_ap_mdev_unlink_queue(struct vfio_ap_queue *q)
+{
+ if (!q)
+ return;
+ q->matrix_mdev = NULL;
+ hash_del(&q->mdev_qnode);
+}
+
+static void vfio_ap_mdev_unlink_queue_by_id(unsigned long apid, unsigned long apqi)
+{
+ struct vfio_ap_queue *q = vfio_ap_get_queue(AP_MKQID(apid, apqi));
+
+ vfio_ap_mdev_unlink_queue(q);
+}
> +
> +/**
> + * vfio_ap_mdev_manage_qlinks
> + *
> + * @matrix_mdev: The matrix mdev to link.
> + * @action: The action to take on @qlink_id.
> + * @qlink_id: The APID or APQI of the queues to link.
> + *
> + * Sets or clears the links between the queues with the specified @qlink_id
> + * and the @matrix_mdev:
> + * @action == LINK_APID: Set the links between the @matrix_mdev and the
> + * queues with the specified @qlink_id (APID)
> + * @action == LINK_APQI: Set the links between the @matrix_mdev and the
> + * queues with the specified @qlink_id (APQI)
> + * @action == UNLINK_APID: Clear the links between the @matrix_mdev and the
> + * queues with the specified @qlink_id (APID)
> + * @action == UNLINK_APQI: Clear the links between the @matrix_mdev and the
> + * queues with the specified @qlink_id (APQI)
> + */
> +static void vfio_ap_mdev_manage_qlinks(struct ap_matrix_mdev *matrix_mdev,
> + enum qlink_action action,
> + unsigned long qlink_id)
> +{
> + unsigned long id;
> +
> + switch (action) {
> + case LINK_APID:
> + for_each_set_bit_inv(id, matrix_mdev->matrix.aqm,
> + matrix_mdev->matrix.aqm_max + 1)
> + vfio_ap_mdev_link_queue(matrix_mdev, qlink_id, id);
> + break;
> + case UNLINK_APID:
> + for_each_set_bit_inv(id, matrix_mdev->matrix.aqm,
> + matrix_mdev->matrix.aqm_max + 1)
> + vfio_ap_mdev_unlink_queue(qlink_id, id);
> + break;
> + case LINK_APQI:
> + for_each_set_bit_inv(id, matrix_mdev->matrix.apm,
> + matrix_mdev->matrix.apm_max + 1)
> + vfio_ap_mdev_link_queue(matrix_mdev, id, qlink_id);
> + break;
> + case UNLINK_APQI:
> + for_each_set_bit_inv(id, matrix_mdev->matrix.apm,
> + matrix_mdev->matrix.apm_max + 1)
> + vfio_ap_mdev_unlink_queue(id, qlink_id);
> + break;
> + default:
> + WARN_ON_ONCE(1);
> + }
> +}
> +
> /**
> * assign_adapter_store
> *
> @@ -628,6 +712,7 @@ static ssize_t assign_adapter_store(struct device *dev,
> if (ret)
> goto share_err;
>
> + vfio_ap_mdev_manage_qlinks(matrix_mdev, LINK_APID, apid);
> ret = count;
> goto done;
>
> @@ -679,6 +764,7 @@ static ssize_t unassign_adapter_store(struct device *dev,
>
> mutex_lock(&matrix_dev->lock);
> clear_bit_inv((unsigned long)apid, matrix_mdev->matrix.apm);
> + vfio_ap_mdev_manage_qlinks(matrix_mdev, UNLINK_APID, apid);
> mutex_unlock(&matrix_dev->lock);
>
> return count;
> @@ -769,6 +855,7 @@ static ssize_t assign_domain_store(struct device *dev,
> if (ret)
> goto share_err;
>
> + vfio_ap_mdev_manage_qlinks(matrix_mdev, LINK_APQI, apqi);
> ret = count;
> goto done;
>
> @@ -821,6 +908,7 @@ static ssize_t unassign_domain_store(struct device *dev,
>
> mutex_lock(&matrix_dev->lock);
> clear_bit_inv((unsigned long)apqi, matrix_mdev->matrix.aqm);
> + vfio_ap_mdev_manage_qlinks(matrix_mdev, UNLINK_APQI, apqi);
> mutex_unlock(&matrix_dev->lock);
>
> return count;
> @@ -1155,6 +1243,11 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
> matrix_mdev->matrix.apm_max + 1) {
> for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm,
> matrix_mdev->matrix.aqm_max + 1) {
> + q = vfio_ap_mdev_get_queue(matrix_mdev,
> + AP_MKQID(apid, apqi));
> + if (!q)
> + continue;
> +
> ret = vfio_ap_mdev_reset_queue(apid, apqi, 1);
> /*
> * Regardless whether a queue turns out to be busy, or
> @@ -1164,9 +1257,7 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
> if (ret)
> rc = ret;
>
> - q = vfio_ap_get_queue(matrix_mdev, AP_MKQID(apid, apqi));
> - if (q)
> - vfio_ap_free_aqic_resources(q);
> + vfio_ap_free_aqic_resources(q);
> }
> }
>
> @@ -1292,6 +1383,29 @@ void vfio_ap_mdev_unregister(void)
> mdev_unregister_device(&matrix_dev->device);
> }
>
> +/*
> + * vfio_ap_queue_link_mdev
> + *
> + * @q: The queue to link with the matrix mdev.
> + *
> + * Links @q with the matrix mdev to which the queue's APQN is assigned.
> + */
> +static void vfio_ap_queue_link_mdev(struct vfio_ap_queue *q)
> +{
> + unsigned long apid = AP_QID_CARD(q->apqn);
> + unsigned long apqi = AP_QID_QUEUE(q->apqn);
> + struct ap_matrix_mdev *matrix_mdev;
> +
> + list_for_each_entry(matrix_mdev, &matrix_dev->mdev_list, node) {
> + if (test_bit_inv(apid, matrix_mdev->matrix.apm) &&
> + test_bit_inv(apqi, matrix_mdev->matrix.aqm)) {
> + q->matrix_mdev = matrix_mdev;
> + hash_add(matrix_mdev->qtable, &q->mdev_qnode, q->apqn);
> + break;
> + }
> + }
> +}
> +
> /**
> * vfio_ap_mdev_probe_queue:
> *
> @@ -1305,9 +1419,13 @@ int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
> q = kzalloc(sizeof(*q), GFP_KERNEL);
> if (!q)
> return -ENOMEM;
> + mutex_lock(&matrix_dev->lock);
> dev_set_drvdata(&apdev->device, q);
> q->apqn = to_ap_queue(&apdev->device)->qid;
> q->saved_isc = VFIO_AP_ISC_INVALID;
> + vfio_ap_queue_link_mdev(q);
> + mutex_unlock(&matrix_dev->lock);
> +
> return 0;
> }
>
> @@ -1328,7 +1446,14 @@ void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
> apid = AP_QID_CARD(q->apqn);
> apqi = AP_QID_QUEUE(q->apqn);
> vfio_ap_mdev_reset_queue(apid, apqi, 1);
Does it make sense to reset if !q->matrix_mdev?
> - vfio_ap_irq_disable(q);
> + if (q->matrix_mdev) {
> + if (q->matrix_mdev->kvm) {
> + vfio_ap_free_aqic_resources(q);
Again this belongs to the previous patch.
> + kvm_put_kvm(q->matrix_mdev->kvm);
This kvm_put_kvm() makes no sense to me! Please explain. Where
is the corresponding kvm_get_kvm()?
> + }
> + hash_del(&q->mdev_qnode);
> + q->matrix_mdev = NULL;
This should be an unlink_queue(q).
> + }
> kfree(q);
> mutex_unlock(&matrix_dev->lock);
> }
> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
> index d9003de4fbad..4e5cc72fc0db 100644
> --- a/drivers/s390/crypto/vfio_ap_private.h
> +++ b/drivers/s390/crypto/vfio_ap_private.h
> @@ -18,6 +18,7 @@
> #include <linux/delay.h>
> #include <linux/mutex.h>
> #include <linux/kvm_host.h>
> +#include <linux/hashtable.h>
>
> #include "ap_bus.h"
>
> @@ -86,6 +87,7 @@ struct ap_matrix_mdev {
> struct kvm *kvm;
> struct kvm_s390_module_hook pqap_hook;
> struct mdev_device *mdev;
> + DECLARE_HASHTABLE(qtable, 8);
> };
>
> extern int vfio_ap_mdev_register(void);
> @@ -97,6 +99,7 @@ struct vfio_ap_queue {
> int apqn;
> #define VFIO_AP_ISC_INVALID 0xff
> unsigned char saved_isc;
> + struct hlist_node mdev_qnode;
> };
>
> int vfio_ap_mdev_probe_queue(struct ap_device *queue);
On Tue, 24 Nov 2020 16:40:07 -0500
Tony Krowiak <[email protected]> wrote:
> The APCB is a field within the CRYCB that provides the AP configuration
> to a KVM guest. Let's introduce a shadow copy of the KVM guest's APCB and
> maintain it for the lifespan of the guest.
>
> Signed-off-by: Tony Krowiak <[email protected]>
> Reviewed-by: Halil Pasic <[email protected]>
Still LGTM
On Tue, 24 Nov 2020 16:40:08 -0500
Tony Krowiak <[email protected]> wrote:
> The matrix of adapters and domains configured in a guest's APCB may
> differ from the matrix of adapters and domains assigned to the matrix mdev,
> so this patch introduces a sysfs attribute to display the matrix of
> adapters and domains that are or will be assigned to the APCB of a guest
> that is or will be using the matrix mdev. For a matrix mdev denoted by
> $uuid, the guest matrix can be displayed as follows:
>
> cat /sys/devices/vfio_ap/matrix/$uuid/guest_matrix
>
> Signed-off-by: Tony Krowiak <[email protected]>
Code looks good, but it may be a little early, since the treatment of
guest_matrix is changed by the following patches.
On Tue, 24 Nov 2020 16:40:09 -0500
Tony Krowiak <[email protected]> wrote:
> The APCB is a control block containing the masks that specify the adapters,
> domains and control domains to which a KVM guest is granted access. When
> the vfio_ap device driver is notified that the KVM pointer has been set,
> the guest's APCB is initialized from the AP configuration of adapters,
> domains and control domains assigned to the matrix mdev. The linux device
> model, however, precludes passing through to a guest any devices that
> are not bound to the device driver facilitating the pass-through.
> Consequently, APQNs assigned to the matrix mdev that do not reference
> AP queue devices must be filtered before assigning them to the KVM guest's
> APCB; however, the AP architecture precludes filtering individual APQNs, so
> the APQNs will be filtered by APID. That is, if a given APQN does not
> reference a queue device bound to the vfio_ap driver, its APID will not
> get assigned to the guest's APCB. For example:
>
> Queues bound to vfio_ap:
> 04.0004
> 04.0022
> 04.0035
> 05.0004
> 05.0022
>
> Adapters/domains assigned to the matrix mdev:
> 04 0004
> 0022
> 0035
> 05 0004
> 0022
> 0035
>
> APQNs assigned to APCB:
> 04.0004
> 04.0022
> 04.0035
>
> The APID 05 was filtered from the matrix mdev's matrix because
> queue device 05.0035 is not bound to the vfio_ap device driver.
>
> Signed-off-by: Tony Krowiak <[email protected]>
This adds filtering, so from here on guest_matrix may differ from matrix
even for an mdev that is associated with a guest. I'm still
grappling with the big picture. Have you thought about testability?
How is a testcase supposed to figure out which behavior is
to be deemed correct?
I don't like the title line. It implies that the guest's APCB was
uninitialized before, which is not the case.
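To make the filter-by-APID rule from the commit message concrete for a test case, here is a minimal user-space sketch. It is not the driver code: the names (sketch_matrix, sketch_filter_by_apid, SKETCH_BIT) are hypothetical, the masks are cut down to 64 bits with plain LSB-first numbering instead of the kernel's inverted bit order, and keeping the domain mask unchanged when no adapter survives is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX 64			/* hypothetical limit for this sketch */
#define SKETCH_BIT(n) (UINT64_C(1) << (n))

/* Hypothetical stand-in for the masks of an mdev matrix or an APCB. */
struct sketch_matrix {
	uint64_t apm;	/* bit n set: adapter (APID) n is assigned */
	uint64_t aqm;	/* bit n set: domain (APQI) n is assigned */
};

/*
 * bound[apid] is a mask of the APQIs for which queue apid.apqi is bound
 * to the pass-through driver. An adapter is kept only if every domain
 * assigned to the mdev has a bound queue on that adapter.
 */
static struct sketch_matrix
sketch_filter_by_apid(const struct sketch_matrix *mdev,
		      const uint64_t bound[SKETCH_MAX])
{
	struct sketch_matrix apcb = { 0, 0 };
	unsigned int apid;

	assert(mdev && bound);
	apcb.aqm = mdev->aqm;

	for (apid = 0; apid < SKETCH_MAX; apid++) {
		if (!(mdev->apm & SKETCH_BIT(apid)))
			continue;
		/* One missing queue filters out the whole adapter. */
		if ((bound[apid] & mdev->aqm) == mdev->aqm)
			apcb.apm |= SKETCH_BIT(apid);
	}

	return apcb;
}
```

Fed with the queues and assignments from the commit message example, this keeps adapter 04 and drops adapter 05, because queue 05.0035 is not bound. A test case could compare guest_matrix against such a reference model.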
On Tue, 24 Nov 2020 16:40:10 -0500
Tony Krowiak <[email protected]> wrote:
> The current implementation does not allow assignment of an AP adapter or
> domain to an mdev device if each APQN resulting from the assignment
> does not reference an AP queue device that is bound to the vfio_ap device
> driver. This patch allows assignment of AP resources to the matrix mdev as
> long as the APQNs resulting from the assignment:
> 1. Are not reserved by the AP BUS for use by the zcrypt device drivers.
> 2. Are not assigned to another matrix mdev.
>
> The rationale behind this is twofold:
> 1. The AP architecture does not preclude assignment of APQNs to an AP
> configuration that are not available to the system.
> 2. APQNs that do not reference a queue device bound to the vfio_ap
> device driver will not be assigned to the guest's CRYCB, so the
> guest will not get access to queues not bound to the vfio_ap driver.
>
> Signed-off-by: Tony Krowiak <[email protected]>
Again code looks good. I'm still worried about all the incremental
changes (good for review) and their testability.
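As an aid for reasoning about test cases, the two remaining assignment rules can be sketched in user space as follows. This is only an illustration: all names are hypothetical, masks are reduced to 64 bits with LSB-first numbering, and the sketch assumes that an APQN counts as reserved for zcrypt when both its APID and its APQI are set in the reserved masks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX 64			/* hypothetical limit for this sketch */
#define SKETCH_BIT(n) (UINT64_C(1) << (n))

/* Hypothetical stand-in for a pair of adapter/domain masks. */
struct sketch_matrix {
	uint64_t apm;	/* adapters (APIDs) */
	uint64_t aqm;	/* domains (APQIs) */
};

enum sketch_verdict {
	SKETCH_OK,		/* assignment may proceed */
	SKETCH_RESERVED,	/* an APQN is reserved for zcrypt */
	SKETCH_SHARED,		/* an APQN belongs to another mdev */
};

/*
 * Check whether adapter @apid may be assigned to @mdev. Note that there
 * is deliberately no check for a bound queue device here.
 */
static enum sketch_verdict
sketch_check_assign_apid(unsigned int apid,
			 const struct sketch_matrix *mdev,
			 const struct sketch_matrix *reserved,
			 const struct sketch_matrix *others, size_t n_others)
{
	uint64_t apid_bit;
	size_t i;

	assert(apid < SKETCH_MAX);
	apid_bit = SKETCH_BIT(apid);

	/* Rule 1: no resulting APQN may be reserved for zcrypt. */
	if ((reserved->apm & apid_bit) && (reserved->aqm & mdev->aqm))
		return SKETCH_RESERVED;

	/* Rule 2: no resulting APQN may be assigned to another mdev. */
	for (i = 0; i < n_others; i++) {
		if ((others[i].apm & apid_bit) &&
		    (others[i].aqm & mdev->aqm))
			return SKETCH_SHARED;
	}

	return SKETCH_OK;
}
```

The absence of any "is the queue bound" test is the point of the patch; whether a queue reaches the guest is decided later by the filtering of the APCB.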
On Tue, 24 Nov 2020 16:40:11 -0500
Tony Krowiak <[email protected]> wrote:
> Let's hot plug/unplug adapters, domains and control domains assigned to or
> unassigned from an AP matrix mdev device while it is in use by a guest per
> the following rules:
>
> * Assign an adapter to mdev's matrix:
>
> The adapter will be hot plugged into the guest under the following
> conditions:
> 1. The adapter is not yet assigned to the guest's matrix
> 2. At least one domain is assigned to the guest's matrix
> 3. Each APQN derived from the APID of the newly assigned adapter and
> the APQIs of the domains already assigned to the guest's
> matrix references a queue device bound to the vfio_ap device driver.
>
> The adapter and each domain assigned to the mdev's matrix will be hot
> plugged into the guest under the following conditions:
> 1. The adapter is not yet assigned to the guest's matrix
> 2. No domains are assigned to the guest's matrix
> 3. At least one domain is assigned to the mdev's matrix
> 4. Each APQN derived from the APID of the newly assigned adapter and
> the APQIs of the domains assigned to the mdev's matrix references a
> queue device bound to the vfio_ap device driver.
>
> * Unassign an adapter from mdev's matrix:
>
> The adapter will be hot unplugged from the KVM guest if it is
> assigned to the guest's matrix.
>
> * Assign a domain to mdev's matrix:
>
> The domain will be hot plugged into the guest under the following
> conditions:
> 1. The domain is not yet assigned to the guest's matrix
> 2. At least one adapter is assigned to the guest's matrix
> 3. Each APQN derived from the APQI of the newly assigned domain and
> the APIDs of the adapters already assigned to the guest's
> matrix references a queue device bound to the vfio_ap device driver.
>
> The domain and each adapter assigned to the mdev's matrix will be hot
> plugged into the guest under the following conditions:
> 1. The domain is not yet assigned to the guest's matrix
> 2. No adapters are assigned to the guest's matrix
> 3. At least one adapter is assigned to the mdev's matrix
> 4. Each APQN derived from the APQI of the newly assigned domain and
> the APIDs of the adapters assigned to the mdev's matrix references a
> queue device bound to the vfio_ap device driver.
>
> * Unassign a domain from mdev's matrix:
>
> The domain will be hot unplugged from the KVM guest if it is
> assigned to the guest's matrix.
>
> * Assign a control domain:
>
> The control domain will be hot plugged into the KVM guest if it is not
> assigned to the guest's APCB. The AP architecture ensures a guest will
> only get access to the control domain if it is in the host's AP
> configuration, so there is no risk in hot plugging it; however, it will
> become automatically available to the guest when it is added to the host
> configuration.
>
> * Unassign a control domain:
>
> The control domain will be hot unplugged from the KVM guest if it is
> assigned to the guest's APCB.
This is where things start getting tricky. E.g. do we need to revise
filtering after an unassign? (For example an assign_adapter X didn't
change the shadow, because queue XY was missing, but now we unplug domain
Y. Should the adapter X pop up? I guess it should.)
>
> Note: Now that hot plug/unplug is implemented, there is the possibility
> that an assignment/unassignment of an adapter, domain or control
> domain could be initiated while the guest is starting, so the
> matrix device lock will be taken for the group notification callback
> that initializes the guest's APCB when the KVM pointer is made
> available to the vfio_ap device driver.
>
> Signed-off-by: Tony Krowiak <[email protected]>
> ---
> drivers/s390/crypto/vfio_ap_ops.c | 190 +++++++++++++++++++++++++-----
> 1 file changed, 159 insertions(+), 31 deletions(-)
>
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 586ec5776693..4f96b7861607 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -631,6 +631,60 @@ static void vfio_ap_mdev_manage_qlinks(struct ap_matrix_mdev *matrix_mdev,
> }
> }
>
> +static bool vfio_ap_assign_apid_to_apcb(struct ap_matrix_mdev *matrix_mdev,
> + unsigned long apid)
> +{
> + unsigned long apqi, apqn;
> + unsigned long *aqm = matrix_mdev->shadow_apcb.aqm;
> +
> + /*
> + * If the APID is already assigned to the guest's shadow APCB, there is
> + * no need to assign it.
> + */
> + if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm))
> + return false;
> +
> + /*
> + * If no domains have yet been assigned to the shadow APCB and one or
> + * more domains have been assigned to the matrix mdev, then use
> + * the domains assigned to the matrix mdev; otherwise, there is nothing
> + * to assign to the shadow APCB.
> + */
> + if (bitmap_empty(matrix_mdev->shadow_apcb.aqm, AP_DOMAINS)) {
> + if (bitmap_empty(matrix_mdev->matrix.aqm, AP_DOMAINS))
> + return false;
> +
> + aqm = matrix_mdev->matrix.aqm;
> + }
> +
> + /* Make sure all APQNs are bound to the vfio_ap driver */
> + for_each_set_bit_inv(apqi, aqm, AP_DOMAINS) {
> + apqn = AP_MKQID(apid, apqi);
> +
> + if (vfio_ap_mdev_get_queue(matrix_mdev, apqn) == NULL)
> + return false;
> + }
> +
> + set_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
> +
> + /*
> + * If we verified APQNs using the domains assigned to the matrix mdev,
> + * then copy the APQIs of those domains into the guest's APCB
> + */
> + if (bitmap_empty(matrix_mdev->shadow_apcb.aqm, AP_DOMAINS))
> + bitmap_copy(matrix_mdev->shadow_apcb.aqm,
> + matrix_mdev->matrix.aqm, AP_DOMAINS);
> +
> + return true;
> +}
What is the rationale behind the shadow aqm empty special handling? I.e.
why not simply:
static bool vfio_ap_assign_apid_to_apcb(struct ap_matrix_mdev *matrix_mdev,
unsigned long apid)
{
unsigned long apqi, apqn;
unsigned long *aqm = matrix_mdev->shadow_apcb.aqm;
/*
* If the APID is already assigned to the guest's shadow APCB, there is
* no need to assign it.
*/
if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm))
return false;
/* Make sure all APQNs are bound to the vfio_ap driver */
for_each_set_bit_inv(apqi, aqm, AP_DOMAINS) {
apqn = AP_MKQID(apid, apqi);
if (vfio_ap_mdev_get_queue(matrix_mdev, apqn) == NULL)
return false;
}
set_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
return true;
}
Please answer the questions I've asked, and note that I will have to
return to this patch, later.
Regards,
Halil
> +
> +static void vfio_ap_mdev_hot_plug_adapter(struct ap_matrix_mdev *matrix_mdev,
> + unsigned long apid)
> +{
> + if (vfio_ap_assign_apid_to_apcb(matrix_mdev, apid))
> + vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
> +}
> +
> /**
> * assign_adapter_store
> *
> @@ -673,10 +727,6 @@ static ssize_t assign_adapter_store(struct device *dev,
> struct mdev_device *mdev = mdev_from_dev(dev);
> struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>
> - /* If the guest is running, disallow assignment of adapter */
> - if (matrix_mdev->kvm)
> - return -EBUSY;
> -
> ret = kstrtoul(buf, 0, &apid);
> if (ret)
> return ret;
> @@ -698,12 +748,22 @@ static ssize_t assign_adapter_store(struct device *dev,
> }
> set_bit_inv(apid, matrix_mdev->matrix.apm);
> vfio_ap_mdev_manage_qlinks(matrix_mdev, LINK_APID, apid);
> + vfio_ap_mdev_hot_plug_adapter(matrix_mdev, apid);
> mutex_unlock(&matrix_dev->lock);
>
> return count;
> }
> static DEVICE_ATTR_WO(assign_adapter);
>
> +static void vfio_ap_mdev_hot_unplug_adapter(struct ap_matrix_mdev *matrix_mdev,
> + unsigned long apid)
> +{
> + if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm)) {
> + clear_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
> + vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
> + }
> +}
> +
> /**
> * unassign_adapter_store
> *
> @@ -730,10 +790,6 @@ static ssize_t unassign_adapter_store(struct device *dev,
> struct mdev_device *mdev = mdev_from_dev(dev);
> struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>
> - /* If the guest is running, disallow un-assignment of adapter */
> - if (matrix_mdev->kvm)
> - return -EBUSY;
> -
> ret = kstrtoul(buf, 0, &apid);
> if (ret)
> return ret;
> @@ -744,12 +800,67 @@ static ssize_t unassign_adapter_store(struct device *dev,
> mutex_lock(&matrix_dev->lock);
> clear_bit_inv((unsigned long)apid, matrix_mdev->matrix.apm);
> vfio_ap_mdev_manage_qlinks(matrix_mdev, UNLINK_APID, apid);
> + vfio_ap_mdev_hot_unplug_adapter(matrix_mdev, apid);
> mutex_unlock(&matrix_dev->lock);
>
> return count;
> }
> static DEVICE_ATTR_WO(unassign_adapter);
>
> +static bool vfio_ap_assign_apqi_to_apcb(struct ap_matrix_mdev *matrix_mdev,
> + unsigned long apqi)
> +{
> + unsigned long apid, apqn;
> + unsigned long *apm = matrix_mdev->shadow_apcb.apm;
> +
> + /*
> + * If the APQI is already assigned to the guest's shadow APCB, there is
> + * no need to assign it.
> + */
> + if (test_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm))
> + return false;
> +
> + /*
> + * If no adapters have yet been assigned to the shadow APCB and one or
> + * more adapters have been assigned to the matrix mdev, then use
> + * the adapters assigned to the matrix mdev; otherwise, there is nothing
> + * to assign to the shadow APCB.
> + */
> + if (bitmap_empty(matrix_mdev->shadow_apcb.apm, AP_DEVICES)) {
> + if (bitmap_empty(matrix_mdev->matrix.apm, AP_DEVICES))
> + return false;
> +
> + apm = matrix_mdev->matrix.apm;
> + }
> +
> + /* Make sure all APQNs are bound to the vfio_ap driver */
> + for_each_set_bit_inv(apid, apm, AP_DEVICES) {
> + apqn = AP_MKQID(apid, apqi);
> +
> + if (vfio_ap_mdev_get_queue(matrix_mdev, apqn) == NULL)
> + return false;
> + }
> +
> + set_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm);
> +
> + /*
> + * If we verified APQNs using the adapters assigned to the matrix mdev,
> + * then copy the APIDs of those adapters into the guest's APCB
> + */
> + if (bitmap_empty(matrix_mdev->shadow_apcb.apm, AP_DEVICES))
> + bitmap_copy(matrix_mdev->shadow_apcb.apm,
> + matrix_mdev->matrix.apm, AP_DEVICES);
> +
> + return true;
> +}
> +
> +static void vfio_ap_mdev_hot_plug_domain(struct ap_matrix_mdev *matrix_mdev,
> + unsigned long apqi)
> +{
> + if (vfio_ap_assign_apqi_to_apcb(matrix_mdev, apqi))
> + vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
> +}
> +
> /**
> * assign_domain_store
> *
> @@ -793,10 +904,6 @@ static ssize_t assign_domain_store(struct device *dev,
> struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
> unsigned long max_apqi = matrix_mdev->matrix.aqm_max;
>
> - /* If the guest is running, disallow assignment of domain */
> - if (matrix_mdev->kvm)
> - return -EBUSY;
> -
> ret = kstrtoul(buf, 0, &apqi);
> if (ret)
> return ret;
> @@ -817,12 +924,21 @@ static ssize_t assign_domain_store(struct device *dev,
> }
> set_bit_inv(apqi, matrix_mdev->matrix.aqm);
> vfio_ap_mdev_manage_qlinks(matrix_mdev, LINK_APQI, apqi);
> + vfio_ap_mdev_hot_plug_domain(matrix_mdev, apqi);
> mutex_unlock(&matrix_dev->lock);
>
> return count;
> }
> static DEVICE_ATTR_WO(assign_domain);
>
> +static void vfio_ap_mdev_hot_unplug_domain(struct ap_matrix_mdev *matrix_mdev,
> + unsigned long apqi)
> +{
> + if (test_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm)) {
> + clear_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm);
> + vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
> + }
> +}
>
> /**
> * unassign_domain_store
> @@ -850,10 +966,6 @@ static ssize_t unassign_domain_store(struct device *dev,
> struct mdev_device *mdev = mdev_from_dev(dev);
> struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>
> - /* If the guest is running, disallow un-assignment of domain */
> - if (matrix_mdev->kvm)
> - return -EBUSY;
> -
> ret = kstrtoul(buf, 0, &apqi);
> if (ret)
> return ret;
> @@ -864,12 +976,22 @@ static ssize_t unassign_domain_store(struct device *dev,
> mutex_lock(&matrix_dev->lock);
> clear_bit_inv((unsigned long)apqi, matrix_mdev->matrix.aqm);
> vfio_ap_mdev_manage_qlinks(matrix_mdev, UNLINK_APQI, apqi);
> + vfio_ap_mdev_hot_unplug_domain(matrix_mdev, apqi);
> mutex_unlock(&matrix_dev->lock);
>
> return count;
> }
> static DEVICE_ATTR_WO(unassign_domain);
>
> +static void vfio_ap_mdev_hot_plug_ctl_domain(struct ap_matrix_mdev *matrix_mdev,
> + unsigned long domid)
> +{
> + if (!test_bit_inv(domid, matrix_mdev->shadow_apcb.adm)) {
> + set_bit_inv(domid, matrix_mdev->shadow_apcb.adm);
> + vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
> + }
> +}
> +
> /**
> * assign_control_domain_store
> *
> @@ -895,10 +1017,6 @@ static ssize_t assign_control_domain_store(struct device *dev,
> struct mdev_device *mdev = mdev_from_dev(dev);
> struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>
> - /* If the guest is running, disallow assignment of control domain */
> - if (matrix_mdev->kvm)
> - return -EBUSY;
> -
> ret = kstrtoul(buf, 0, &id);
> if (ret)
> return ret;
> @@ -914,12 +1032,23 @@ static ssize_t assign_control_domain_store(struct device *dev,
> if (!mutex_trylock(&matrix_dev->lock))
> return -EBUSY;
> set_bit_inv(id, matrix_mdev->matrix.adm);
> + vfio_ap_mdev_hot_plug_ctl_domain(matrix_mdev, id);
> mutex_unlock(&matrix_dev->lock);
>
> return count;
> }
> static DEVICE_ATTR_WO(assign_control_domain);
>
> +static void
> +vfio_ap_mdev_hot_unplug_ctl_domain(struct ap_matrix_mdev *matrix_mdev,
> + unsigned long domid)
> +{
> + if (test_bit_inv(domid, matrix_mdev->shadow_apcb.adm)) {
> + clear_bit_inv(domid, matrix_mdev->shadow_apcb.adm);
> + vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
> + }
> +}
> +
> /**
> * unassign_control_domain_store
> *
> @@ -946,10 +1075,6 @@ static ssize_t unassign_control_domain_store(struct device *dev,
> struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
> unsigned long max_domid = matrix_mdev->matrix.adm_max;
>
> - /* If the guest is running, disallow un-assignment of control domain */
> - if (matrix_mdev->kvm)
> - return -EBUSY;
> -
> ret = kstrtoul(buf, 0, &domid);
> if (ret)
> return ret;
> @@ -958,6 +1083,7 @@ static ssize_t unassign_control_domain_store(struct device *dev,
>
> mutex_lock(&matrix_dev->lock);
> clear_bit_inv(domid, matrix_mdev->matrix.adm);
> + vfio_ap_mdev_hot_unplug_ctl_domain(matrix_mdev, domid);
> mutex_unlock(&matrix_dev->lock);
>
> return count;
> @@ -1099,8 +1225,6 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
> {
> struct ap_matrix_mdev *m;
>
> - mutex_lock(&matrix_dev->lock);
> -
> list_for_each_entry(m, &matrix_dev->mdev_list, node) {
> if ((m != matrix_mdev) && (m->kvm == kvm)) {
> mutex_unlock(&matrix_dev->lock);
> @@ -1111,7 +1235,6 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
> matrix_mdev->kvm = kvm;
> kvm_get_kvm(kvm);
> kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
> - mutex_unlock(&matrix_dev->lock);
>
> return 0;
> }
> @@ -1148,7 +1271,7 @@ static int vfio_ap_mdev_iommu_notifier(struct notifier_block *nb,
> static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
> unsigned long action, void *data)
> {
> - int ret;
> + int ret = NOTIFY_DONE;
> struct ap_matrix_mdev *matrix_mdev;
>
> if (action != VFIO_GROUP_NOTIFY_SET_KVM)
> @@ -1156,23 +1279,28 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
>
> matrix_mdev = container_of(nb, struct ap_matrix_mdev, group_notifier);
>
> + mutex_lock(&matrix_dev->lock);
> +
> if (!data) {
> if (matrix_mdev->kvm)
> kvm_put_kvm(matrix_mdev->kvm);
>
> matrix_mdev->kvm = NULL;
>
> - return NOTIFY_OK;
> + ret = NOTIFY_OK;
> + goto done;
> }
>
> ret = vfio_ap_mdev_set_kvm(matrix_mdev, data);
> if (ret)
> - return NOTIFY_DONE;
> + goto done;
>
> vfio_ap_mdev_init_apcb(matrix_mdev);
> vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>
> - return NOTIFY_OK;
> +done:
> + mutex_unlock(&matrix_dev->lock);
> + return ret;
> }
>
> static int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
On 11/26/20 5:34 AM, Halil Pasic wrote:
> On Tue, 24 Nov 2020 16:40:02 -0500
> Tony Krowiak <[email protected]> wrote:
>
> A nit: for all other patches the title prefix is s390/vfio-ap, here you
> have 390/vfio-ap.
I'll fix that.
On 11/28/20 8:52 PM, Halil Pasic wrote:
> On Tue, 24 Nov 2020 16:40:11 -0500
> Tony Krowiak <[email protected]> wrote:
>
>> Let's hot plug/unplug adapters, domains and control domains assigned to or
>> unassigned from an AP matrix mdev device while it is in use by a guest per
>> the following rules:
>>
>> * Assign an adapter to mdev's matrix:
>>
>> The adapter will be hot plugged into the guest under the following
>> conditions:
>> 1. The adapter is not yet assigned to the guest's matrix
>> 2. At least one domain is assigned to the guest's matrix
>> 3. Each APQN derived from the APID of the newly assigned adapter and
>> the APQIs of the domains already assigned to the guest's
>> matrix references a queue device bound to the vfio_ap device driver.
>>
>> The adapter and each domain assigned to the mdev's matrix will be hot
>> plugged into the guest under the following conditions:
>> 1. The adapter is not yet assigned to the guest's matrix
>> 2. No domains are assigned to the guest's matrix
>> 3. At least one domain is assigned to the mdev's matrix
>> 4. Each APQN derived from the APID of the newly assigned adapter and
>> the APQIs of the domains assigned to the mdev's matrix references a
>> queue device bound to the vfio_ap device driver.
>>
>> * Unassign an adapter from mdev's matrix:
>>
>> The adapter will be hot unplugged from the KVM guest if it is
>> assigned to the guest's matrix.
>>
>> * Assign a domain to mdev's matrix:
>>
>> The domain will be hot plugged into the guest under the following
>> conditions:
>> 1. The domain is not yet assigned to the guest's matrix
>> 2. At least one adapter is assigned to the guest's matrix
>> 3. Each APQN derived from the APQI of the newly assigned domain and
>> the APIDs of the adapters already assigned to the guest's
>> matrix references a queue device bound to the vfio_ap device driver.
>>
>> The domain and each adapter assigned to the mdev's matrix will be hot
>> plugged into the guest under the following conditions:
>> 1. The domain is not yet assigned to the guest's matrix
>> 2. No adapters are assigned to the guest's matrix
>> 3. At least one adapter is assigned to the mdev's matrix
>> 4. Each APQN derived from the APQI of the newly assigned domain and
>> the APIDs of the adapters assigned to the mdev's matrix references a
>> queue device bound to the vfio_ap device driver.
>>
>> * Unassign a domain from mdev's matrix:
>>
>> The domain will be hot unplugged from the KVM guest if it is
>> assigned to the guest's matrix.
>>
>> * Assign a control domain:
>>
>> The control domain will be hot plugged into the KVM guest if it is not
>> assigned to the guest's APCB. The AP architecture ensures a guest will
>> only get access to the control domain if it is in the host's AP
>> configuration, so there is no risk in hot plugging it; however, it will
>> become automatically available to the guest when it is added to the host
>> configuration.
>>
>> * Unassign a control domain:
>>
>> The control domain will be hot unplugged from the KVM guest if it is
>> assigned to the guest's APCB.
> This is where things start getting tricky. E.g. do we need to revise
> filtering after an unassign? (For example an assign_adapter X didn't
> change the shadow, because queue XY was missing, but now we unplug domain
> Y. Should the adapter X pop up? I guess it should.)
I suppose that makes sense at the expense of making the code
more complex. It is essentially what we had in the prior version
which used the same filtering code for assignment as well as
host AP configuration changes.
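For illustration only, the re-filtering scenario above (adapter X was held back because queue XY is missing, then domain Y is unassigned) could be modelled in user space roughly like this. The names are hypothetical, the masks are 64 bits wide with LSB-first numbering, and recomputing the whole shadow from the mdev matrix is an assumption about how such a re-filter might be done, not what this patch does.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX 64			/* hypothetical limit for this sketch */
#define SKETCH_BIT(n) (UINT64_C(1) << (n))

/* Hypothetical model of an mdev: assigned matrix plus shadow APCB. */
struct sketch_mdev {
	uint64_t apm, aqm;			/* assigned to the mdev */
	uint64_t shadow_apm, shadow_aqm;	/* what the guest would see */
};

/* Rebuild the shadow: keep an adapter only if all its queues are bound. */
static void sketch_refilter(struct sketch_mdev *m,
			    const uint64_t bound[SKETCH_MAX])
{
	unsigned int apid;

	m->shadow_apm = 0;
	m->shadow_aqm = m->aqm;

	for (apid = 0; apid < SKETCH_MAX; apid++) {
		if ((m->apm & SKETCH_BIT(apid)) &&
		    (bound[apid] & m->aqm) == m->aqm)
			m->shadow_apm |= SKETCH_BIT(apid);
	}
}

/* Unassign a domain, then re-filter so held-back adapters can appear. */
static void sketch_unassign_domain(struct sketch_mdev *m, unsigned int apqi,
				   const uint64_t bound[SKETCH_MAX])
{
	assert(apqi < SKETCH_MAX);
	m->aqm &= ~SKETCH_BIT(apqi);
	sketch_refilter(m, bound);
}
```

With adapter 3 and domains 1 and 2 assigned, and only queue 3.1 bound, the shadow starts without adapter 3. After domain 2 is unassigned, adapter 3 pops up in the shadow. The cost is that every unassign walks the whole matrix again.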
>
>
>> Note: Now that hot plug/unplug is implemented, there is the possibility
>> that an assignment/unassignment of an adapter, domain or control
>> domain could be initiated while the guest is starting, so the
>> matrix device lock will be taken for the group notification callback
>> that initializes the guest's APCB when the KVM pointer is made
>> available to the vfio_ap device driver.
>>
>> Signed-off-by: Tony Krowiak <[email protected]>
>> ---
>> drivers/s390/crypto/vfio_ap_ops.c | 190 +++++++++++++++++++++++++-----
>> 1 file changed, 159 insertions(+), 31 deletions(-)
>>
>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>> index 586ec5776693..4f96b7861607 100644
>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -631,6 +631,60 @@ static void vfio_ap_mdev_manage_qlinks(struct ap_matrix_mdev *matrix_mdev,
>> }
>> }
>>
>> +static bool vfio_ap_assign_apid_to_apcb(struct ap_matrix_mdev *matrix_mdev,
>> + unsigned long apid)
>> +{
>> + unsigned long apqi, apqn;
>> + unsigned long *aqm = matrix_mdev->shadow_apcb.aqm;
>> +
>> + /*
>> + * If the APID is already assigned to the guest's shadow APCB, there is
>> + * no need to assign it.
>> + */
>> + if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm))
>> + return false;
>> +
>> + /*
>> + * If no domains have yet been assigned to the shadow APCB and one or
>> + * more domains have been assigned to the matrix mdev, then use
>> + * the domains assigned to the matrix mdev; otherwise, there is nothing
>> + * to assign to the shadow APCB.
>> + */
>> + if (bitmap_empty(matrix_mdev->shadow_apcb.aqm, AP_DOMAINS)) {
>> + if (bitmap_empty(matrix_mdev->matrix.aqm, AP_DOMAINS))
>> + return false;
>> +
>> + aqm = matrix_mdev->matrix.aqm;
>> + }
>> +
>> + /* Make sure all APQNs are bound to the vfio_ap driver */
>> + for_each_set_bit_inv(apqi, aqm, AP_DOMAINS) {
>> + apqn = AP_MKQID(apid, apqi);
>> +
>> + if (vfio_ap_mdev_get_queue(matrix_mdev, apqn) == NULL)
>> + return false;
>> + }
>> +
>> + set_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
>> +
>> + /*
>> + * If we verified APQNs using the domains assigned to the matrix mdev,
>> + * then copy the APQIs of those domains into the guest's APCB
>> + */
>> + if (bitmap_empty(matrix_mdev->shadow_apcb.aqm, AP_DOMAINS))
>> + bitmap_copy(matrix_mdev->shadow_apcb.aqm,
>> + matrix_mdev->matrix.aqm, AP_DOMAINS);
>> +
>> + return true;
>> +}
> What is the rationale behind the shadow aqm empty special handling?
The rationale was to avoid needlessly taking the VCPUs
out of SIE to update the guest's APCB.
For example, suppose the guest is started
without access to any APQNs (i.e., all matrix and shadow_apcb
masks are zeros). Now suppose the administrator proceeds to
start assigning AP resources to the mdev. Let's say he starts
by assigning adapters 1 through 100. The code below will return
true indicating the shadow_apcb was updated. Consequently,
the calling code will commit the changes to the guest's
APCB. The problem there is that in order to update the guest's
VCPUs, they will have to be taken out of SIE, yet the guest will
not get access to the adapter since no domains have yet been
assigned to the APCB. Doing this 100 times - once for each
adapter 1-100 - is probably a bad idea.
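The difference in commit counts can be sketched with a small user-space model. This is a simplification under stated assumptions: every name is hypothetical, masks are 64 bits wide with LSB-first numbering (so the example uses adapters 1 through 60 rather than 1 through 100), the assignment to the mdev matrix is folded into the helpers, and a "commit" is just a counter standing in for taking the VCPUs out of SIE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX 64			/* hypothetical limit for this sketch */
#define SKETCH_BIT(n) (UINT64_C(1) << (n))

struct sketch_mdev {
	uint64_t apm, aqm;			/* assigned to the mdev */
	uint64_t shadow_apm, shadow_aqm;	/* shadow APCB */
	unsigned int commits;			/* stand-in for APCB commits */
};

/* Adapter hot plug with the empty-shadow special handling. */
static bool sketch_plug_apid_patch(struct sketch_mdev *m, unsigned int apid,
				   const uint64_t bound[SKETCH_MAX])
{
	uint64_t aqm = m->shadow_aqm;

	assert(apid < SKETCH_MAX);
	m->apm |= SKETCH_BIT(apid);
	if (m->shadow_apm & SKETCH_BIT(apid))
		return false;
	if (!aqm) {
		if (!m->aqm)
			return false;	/* no domains at all: nothing to plug */
		aqm = m->aqm;
	}
	if ((bound[apid] & aqm) != aqm)
		return false;
	m->shadow_apm |= SKETCH_BIT(apid);
	if (!m->shadow_aqm)
		m->shadow_aqm = m->aqm;
	m->commits++;
	return true;
}

/* Adapter hot plug as in the simpler variant proposed in review. */
static bool sketch_plug_apid_simple(struct sketch_mdev *m, unsigned int apid,
				    const uint64_t bound[SKETCH_MAX])
{
	assert(apid < SKETCH_MAX);
	m->apm |= SKETCH_BIT(apid);
	if (m->shadow_apm & SKETCH_BIT(apid))
		return false;
	/* With an empty shadow aqm this check passes trivially. */
	if ((bound[apid] & m->shadow_aqm) != m->shadow_aqm)
		return false;
	m->shadow_apm |= SKETCH_BIT(apid);
	m->commits++;
	return true;
}

/* Domain hot plug with the empty-shadow special handling. */
static bool sketch_plug_apqi_patch(struct sketch_mdev *m, unsigned int apqi,
				   const uint64_t bound[SKETCH_MAX])
{
	uint64_t apm = m->shadow_apm;
	unsigned int apid;

	assert(apqi < SKETCH_MAX);
	m->aqm |= SKETCH_BIT(apqi);
	if (m->shadow_aqm & SKETCH_BIT(apqi))
		return false;
	if (!apm) {
		if (!m->apm)
			return false;
		apm = m->apm;
	}
	for (apid = 0; apid < SKETCH_MAX; apid++)
		if ((apm & SKETCH_BIT(apid)) &&
		    !(bound[apid] & SKETCH_BIT(apqi)))
			return false;
	m->shadow_aqm |= SKETCH_BIT(apqi);
	if (!m->shadow_apm)
		m->shadow_apm = m->apm;
	m->commits++;
	return true;
}
```

Starting from an empty guest and assigning adapters 1 through 60, the special-handling variant commits nothing until the first domain is assigned and then commits once, while the simple variant commits once per adapter even though the guest cannot use any queue yet.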
> I.e.
> why not simply:
>
>
> static bool vfio_ap_assign_apid_to_apcb(struct ap_matrix_mdev *matrix_mdev,
> unsigned long apid)
> {
> unsigned long apqi, apqn;
> unsigned long *aqm = matrix_mdev->shadow_apcb.aqm;
>
> /*
> * If the APID is already assigned to the guest's shadow APCB, there is
> * no need to assign it.
> */
> if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm))
> return false;
>
> /* Make sure all APQNs are bound to the vfio_ap driver */
> for_each_set_bit_inv(apqi, aqm, AP_DOMAINS) {
> apqn = AP_MKQID(apid, apqi);
>
> if (vfio_ap_mdev_get_queue(matrix_mdev, apqn) == NULL)
> return false;
> }
>
> set_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
>
> return true;
> }
>
> Please answer the questions I've asked, and note that I will have to
> return to this patch, later.
>
> Regards,
> Halil
>
>> +
>> +static void vfio_ap_mdev_hot_plug_adapter(struct ap_matrix_mdev *matrix_mdev,
>> + unsigned long apid)
>> +{
>> + if (vfio_ap_assign_apid_to_apcb(matrix_mdev, apid))
>> + vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>> +}
>> +
>> /**
>> * assign_adapter_store
>> *
>> @@ -673,10 +727,6 @@ static ssize_t assign_adapter_store(struct device *dev,
>> struct mdev_device *mdev = mdev_from_dev(dev);
>> struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>>
>> - /* If the guest is running, disallow assignment of adapter */
>> - if (matrix_mdev->kvm)
>> - return -EBUSY;
>> -
>> ret = kstrtoul(buf, 0, &apid);
>> if (ret)
>> return ret;
>> @@ -698,12 +748,22 @@ static ssize_t assign_adapter_store(struct device *dev,
>> }
>> set_bit_inv(apid, matrix_mdev->matrix.apm);
>> vfio_ap_mdev_manage_qlinks(matrix_mdev, LINK_APID, apid);
>> + vfio_ap_mdev_hot_plug_adapter(matrix_mdev, apid);
>> mutex_unlock(&matrix_dev->lock);
>>
>> return count;
>> }
>> static DEVICE_ATTR_WO(assign_adapter);
>>
>> +static void vfio_ap_mdev_hot_unplug_adapter(struct ap_matrix_mdev *matrix_mdev,
>> + unsigned long apid)
>> +{
>> + if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm)) {
>> + clear_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
>> + vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>> + }
>> +}
>> +
>> /**
>> * unassign_adapter_store
>> *
>> @@ -730,10 +790,6 @@ static ssize_t unassign_adapter_store(struct device *dev,
>> struct mdev_device *mdev = mdev_from_dev(dev);
>> struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>>
>> - /* If the guest is running, disallow un-assignment of adapter */
>> - if (matrix_mdev->kvm)
>> - return -EBUSY;
>> -
>> ret = kstrtoul(buf, 0, &apid);
>> if (ret)
>> return ret;
>> @@ -744,12 +800,67 @@ static ssize_t unassign_adapter_store(struct device *dev,
>> mutex_lock(&matrix_dev->lock);
>> clear_bit_inv((unsigned long)apid, matrix_mdev->matrix.apm);
>> vfio_ap_mdev_manage_qlinks(matrix_mdev, UNLINK_APID, apid);
>> + vfio_ap_mdev_hot_unplug_adapter(matrix_mdev, apid);
>> mutex_unlock(&matrix_dev->lock);
>>
>> return count;
>> }
>> static DEVICE_ATTR_WO(unassign_adapter);
>>
>> +static bool vfio_ap_assign_apqi_to_apcb(struct ap_matrix_mdev *matrix_mdev,
>> + unsigned long apqi)
>> +{
>> + unsigned long apid, apqn;
>> + unsigned long *apm = matrix_mdev->shadow_apcb.apm;
>> +
>> + /*
>> + * If the APQI is already assigned to the guest's shadow APCB, there is
>> + * no need to assign it.
>> + */
>> + if (test_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm))
>> + return false;
>> +
>> + /*
>> + * If no adapters have yet been assigned to the shadow APCB and one or
>> + * more adapters have been assigned to the matrix mdev, then use
>> + * the adapters assigned to the matrix mdev; otherwise, there is nothing
>> + * to assign to the shadow APCB.
>> + */
>> + if (bitmap_empty(matrix_mdev->shadow_apcb.apm, AP_DEVICES)) {
>> + if (bitmap_empty(matrix_mdev->matrix.apm, AP_DEVICES))
>> + return false;
>> +
>> + apm = matrix_mdev->matrix.apm;
>> + }
>> +
>> + /* Make sure all APQNs are bound to the vfio_ap driver */
>> + for_each_set_bit_inv(apid, apm, AP_DEVICES) {
>> + apqn = AP_MKQID(apid, apqi);
>> +
>> + if (vfio_ap_mdev_get_queue(matrix_mdev, apqn) == NULL)
>> + return false;
>> + }
>> +
>> + set_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm);
>> +
>> + /*
>> + * If we verified APQNs using the adapters assigned to the matrix mdev,
>> + * then copy the APIDs of those adapters into the guest's APCB
>> + */
>> + if (bitmap_empty(matrix_mdev->shadow_apcb.apm, AP_DEVICES))
>> + bitmap_copy(matrix_mdev->shadow_apcb.apm,
>> + matrix_mdev->matrix.apm, AP_DEVICES);
>> +
>> + return true;
>> +}
>> +
>> +static void vfio_ap_mdev_hot_plug_domain(struct ap_matrix_mdev *matrix_mdev,
>> + unsigned long apqi)
>> +{
>> + if (vfio_ap_assign_apqi_to_apcb(matrix_mdev, apqi))
>> + vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>> +}
>> +
>> /**
>> * assign_domain_store
>> *
>> @@ -793,10 +904,6 @@ static ssize_t assign_domain_store(struct device *dev,
>> struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>> unsigned long max_apqi = matrix_mdev->matrix.aqm_max;
>>
>> - /* If the guest is running, disallow assignment of domain */
>> - if (matrix_mdev->kvm)
>> - return -EBUSY;
>> -
>> ret = kstrtoul(buf, 0, &apqi);
>> if (ret)
>> return ret;
>> @@ -817,12 +924,21 @@ static ssize_t assign_domain_store(struct device *dev,
>> }
>> set_bit_inv(apqi, matrix_mdev->matrix.aqm);
>> vfio_ap_mdev_manage_qlinks(matrix_mdev, LINK_APQI, apqi);
>> + vfio_ap_mdev_hot_plug_domain(matrix_mdev, apqi);
>> mutex_unlock(&matrix_dev->lock);
>>
>> return count;
>> }
>> static DEVICE_ATTR_WO(assign_domain);
>>
>> +static void vfio_ap_mdev_hot_unplug_domain(struct ap_matrix_mdev *matrix_mdev,
>> + unsigned long apqi)
>> +{
>> + if (test_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm)) {
>> + clear_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm);
>> + vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>> + }
>> +}
>>
>> /**
>> * unassign_domain_store
>> @@ -850,10 +966,6 @@ static ssize_t unassign_domain_store(struct device *dev,
>> struct mdev_device *mdev = mdev_from_dev(dev);
>> struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>>
>> - /* If the guest is running, disallow un-assignment of domain */
>> - if (matrix_mdev->kvm)
>> - return -EBUSY;
>> -
>> ret = kstrtoul(buf, 0, &apqi);
>> if (ret)
>> return ret;
>> @@ -864,12 +976,22 @@ static ssize_t unassign_domain_store(struct device *dev,
>> mutex_lock(&matrix_dev->lock);
>> clear_bit_inv((unsigned long)apqi, matrix_mdev->matrix.aqm);
>> vfio_ap_mdev_manage_qlinks(matrix_mdev, UNLINK_APQI, apqi);
>> + vfio_ap_mdev_hot_unplug_domain(matrix_mdev, apqi);
>> mutex_unlock(&matrix_dev->lock);
>>
>> return count;
>> }
>> static DEVICE_ATTR_WO(unassign_domain);
>>
>> +static void vfio_ap_mdev_hot_plug_ctl_domain(struct ap_matrix_mdev *matrix_mdev,
>> + unsigned long domid)
>> +{
>> + if (!test_bit_inv(domid, matrix_mdev->shadow_apcb.adm)) {
>> + set_bit_inv(domid, matrix_mdev->shadow_apcb.adm);
>> + vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>> + }
>> +}
>> +
>> /**
>> * assign_control_domain_store
>> *
>> @@ -895,10 +1017,6 @@ static ssize_t assign_control_domain_store(struct device *dev,
>> struct mdev_device *mdev = mdev_from_dev(dev);
>> struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>>
>> - /* If the guest is running, disallow assignment of control domain */
>> - if (matrix_mdev->kvm)
>> - return -EBUSY;
>> -
>> ret = kstrtoul(buf, 0, &id);
>> if (ret)
>> return ret;
>> @@ -914,12 +1032,23 @@ static ssize_t assign_control_domain_store(struct device *dev,
>> if (!mutex_trylock(&matrix_dev->lock))
>> return -EBUSY;
>> set_bit_inv(id, matrix_mdev->matrix.adm);
>> + vfio_ap_mdev_hot_plug_ctl_domain(matrix_mdev, id);
>> mutex_unlock(&matrix_dev->lock);
>>
>> return count;
>> }
>> static DEVICE_ATTR_WO(assign_control_domain);
>>
>> +static void
>> +vfio_ap_mdev_hot_unplug_ctl_domain(struct ap_matrix_mdev *matrix_mdev,
>> + unsigned long domid)
>> +{
>> + if (test_bit_inv(domid, matrix_mdev->shadow_apcb.adm)) {
>> + clear_bit_inv(domid, matrix_mdev->shadow_apcb.adm);
>> + vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>> + }
>> +}
>> +
>> /**
>> * unassign_control_domain_store
>> *
>> @@ -946,10 +1075,6 @@ static ssize_t unassign_control_domain_store(struct device *dev,
>> struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>> unsigned long max_domid = matrix_mdev->matrix.adm_max;
>>
>> - /* If the guest is running, disallow un-assignment of control domain */
>> - if (matrix_mdev->kvm)
>> - return -EBUSY;
>> -
>> ret = kstrtoul(buf, 0, &domid);
>> if (ret)
>> return ret;
>> @@ -958,6 +1083,7 @@ static ssize_t unassign_control_domain_store(struct device *dev,
>>
>> mutex_lock(&matrix_dev->lock);
>> clear_bit_inv(domid, matrix_mdev->matrix.adm);
>> + vfio_ap_mdev_hot_unplug_ctl_domain(matrix_mdev, domid);
>> mutex_unlock(&matrix_dev->lock);
>>
>> return count;
>> @@ -1099,8 +1225,6 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
>> {
>> struct ap_matrix_mdev *m;
>>
>> - mutex_lock(&matrix_dev->lock);
>> -
>> list_for_each_entry(m, &matrix_dev->mdev_list, node) {
>> if ((m != matrix_mdev) && (m->kvm == kvm)) {
>> mutex_unlock(&matrix_dev->lock);
>> @@ -1111,7 +1235,6 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
>> matrix_mdev->kvm = kvm;
>> kvm_get_kvm(kvm);
>> kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
>> - mutex_unlock(&matrix_dev->lock);
>>
>> return 0;
>> }
>> @@ -1148,7 +1271,7 @@ static int vfio_ap_mdev_iommu_notifier(struct notifier_block *nb,
>> static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
>> unsigned long action, void *data)
>> {
>> - int ret;
>> + int ret = NOTIFY_DONE;
>> struct ap_matrix_mdev *matrix_mdev;
>>
>> if (action != VFIO_GROUP_NOTIFY_SET_KVM)
>> @@ -1156,23 +1279,28 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
>>
>> matrix_mdev = container_of(nb, struct ap_matrix_mdev, group_notifier);
>>
>> + mutex_lock(&matrix_dev->lock);
>> +
>> if (!data) {
>> if (matrix_mdev->kvm)
>> kvm_put_kvm(matrix_mdev->kvm);
>>
>> matrix_mdev->kvm = NULL;
>>
>> - return NOTIFY_OK;
>> + ret = NOTIFY_OK;
>> + goto done;
>> }
>>
>> ret = vfio_ap_mdev_set_kvm(matrix_mdev, data);
>> if (ret)
>> - return NOTIFY_DONE;
>> + goto done;
>>
>> vfio_ap_mdev_init_apcb(matrix_mdev);
>> vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>>
>> - return NOTIFY_OK;
>> +done:
>> + mutex_unlock(&matrix_dev->lock);
>> + return ret;
>> }
>>
>> static int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
On 11/30/20 6:32 PM, Halil Pasic wrote:
> On Mon, 30 Nov 2020 14:36:10 -0500
> Tony Krowiak <[email protected]> wrote:
>
>>
>> On 11/28/20 8:52 PM, Halil Pasic wrote:
> [..]
>>>> * Unassign adapter from mdev's matrix:
>>>>
>>>> The domain will be hot unplugged from the KVM guest if it is
>>>> assigned to the guest's matrix.
>>>>
>>>> * Assign a control domain:
>>>>
>>>> The control domain will be hot plugged into the KVM guest if it is not
>>>> assigned to the guest's APCB. The AP architecture ensures a guest will
>>>> only get access to the control domain if it is in the host's AP
>>>> configuration, so there is no risk in hot plugging it; however, it will
>>>> become automatically available to the guest when it is added to the host
>>>> configuration.
>>>>
>>>> * Unassign a control domain:
>>>>
>>>> The control domain will be hot unplugged from the KVM guest if it is
>>>> assigned to the guest's APCB.
>>> This is where things start getting tricky. E.g. do we need to revise
>>> filtering after an unassign? (For example an assign_adapter X didn't
>>> change the shadow, because queue XY was missing, but now we unplug domain
>>> Y. Should the adapter X pop up? I guess it should.)
>> I suppose that makes sense at the expense of making the code
>> more complex. It is essentially what we had in the prior version
>> which used the same filtering code for assignment as well as
>> host AP configuration changes.
>>
> Will have to think about it some more. Making the user unplug and
> replug an adapter because at some point it got filtered, but there
> is no need to filter it does not feel right. On the other hand, I'm
> afraid I'm complaining in circles.
>
>>>
>>>> Note: Now that hot plug/unplug is implemented, there is the possibility
>>>> that an assignment/unassignment of an adapter, domain or control
>>>> domain could be initiated while the guest is starting, so the
>>>> matrix device lock will be taken for the group notification callback
>>>> that initializes the guest's APCB when the KVM pointer is made
>>>> available to the vfio_ap device driver.
>>>>
>>>> Signed-off-by: Tony Krowiak <[email protected]>
>>>> ---
>>>> drivers/s390/crypto/vfio_ap_ops.c | 190 +++++++++++++++++++++++++-----
>>>> 1 file changed, 159 insertions(+), 31 deletions(-)
>>>>
>>>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>>>> index 586ec5776693..4f96b7861607 100644
>>>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>>>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>>>> @@ -631,6 +631,60 @@ static void vfio_ap_mdev_manage_qlinks(struct ap_matrix_mdev *matrix_mdev,
>>>> }
>>>> }
>>>>
>>>> +static bool vfio_ap_assign_apid_to_apcb(struct ap_matrix_mdev *matrix_mdev,
>>>> + unsigned long apid)
>>>> +{
>>>> + unsigned long apqi, apqn;
>>>> + unsigned long *aqm = matrix_mdev->shadow_apcb.aqm;
>>>> +
>>>> + /*
>>>> + * If the APID is already assigned to the guest's shadow APCB, there is
>>>> + * no need to assign it.
>>>> + */
>>>> + if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm))
>>>> + return false;
>>>> +
>>>> + /*
>>>> + * If no domains have yet been assigned to the shadow APCB and one or
>>>> + * more domains have been assigned to the matrix mdev, then use
>>>> + * the domains assigned to the matrix mdev; otherwise, there is nothing
>>>> + * to assign to the shadow APCB.
>>>> + */
>>>> + if (bitmap_empty(matrix_mdev->shadow_apcb.aqm, AP_DOMAINS)) {
>>>> + if (bitmap_empty(matrix_mdev->matrix.aqm, AP_DOMAINS))
>>>> + return false;
>>>> +
>>>> + aqm = matrix_mdev->matrix.aqm;
>>>> + }
>>>> +
>>>> + /* Make sure all APQNs are bound to the vfio_ap driver */
>>>> + for_each_set_bit_inv(apqi, aqm, AP_DOMAINS) {
>>>> + apqn = AP_MKQID(apid, apqi);
>>>> +
>>>> + if (vfio_ap_mdev_get_queue(matrix_mdev, apqn) == NULL)
>>>> + return false;
>>>> + }
>>>> +
>>>> + set_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
>>>> +
>>>> + /*
>>>> + * If we verified APQNs using the domains assigned to the matrix mdev,
>>>> + * then copy the APQIs of those domains into the guest's APCB
>>>> + */
>>>> + if (bitmap_empty(matrix_mdev->shadow_apcb.aqm, AP_DOMAINS))
>>>> + bitmap_copy(matrix_mdev->shadow_apcb.aqm,
>>>> + matrix_mdev->matrix.aqm, AP_DOMAINS);
>>>> +
>>>> + return true;
>>>> +}
>>> What is the rationale behind the shadow aqm empty special handling?
>> The rationale was to avoid taking the VCPUs
>> out of SIE in order to make an update to the guest's APCB
>> unnecessarily. For example, suppose the guest is started
>> without access to any APQNs (i.e., all matrix and shadow_apcb
>> masks are zeros). Now suppose the administrator proceeds to
>> start assigning AP resources to the mdev. Let's say he starts
>> by assigning adapters 1 through 100. The code below will return
>> true indicating the shadow_apcb was updated. Consequently,
>> the calling code will commit the changes to the guest's
>> APCB. The problem there is that in order to update the guest's
>> VCPUs, they will have to be taken out of SIE, yet the guest will
>> not get access to the adapter since no domains have yet been
>> assigned to the APCB. Doing this 100 times - once for each
>> adapter 1-100 - is probably a bad idea.
>>
> Not yanking the VCPUs out of SIE does make a lot of sense. At least
> I understand your motivation now. I will think some more about this,
> but in the meanwhile, please try to answer one more question (see
> below).
>
>>> I.e.
>>> why not simply:
>>>
>>>
>>> static bool vfio_ap_assign_apid_to_apcb(struct ap_matrix_mdev *matrix_mdev,
>>> unsigned long apid)
>>> {
>>> unsigned long apqi, apqn;
>>> unsigned long *aqm = matrix_mdev->shadow_apcb.aqm;
>>>
>>> /*
>>> * If the APID is already assigned to the guest's shadow APCB, there is
>>> * no need to assign it.
>>> */
>>> if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm))
>>> return false;
>>>
>>> /* Make sure all APQNs are bound to the vfio_ap driver */
>>> for_each_set_bit_inv(apqi, aqm, AP_DOMAINS) {
>>> apqn = AP_MKQID(apid, apqi);
>>>
>>> if (vfio_ap_mdev_get_queue(matrix_mdev, apqn) == NULL)
>>> return false;
>>> }
>>>
>>> set_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
>>>
>>> return true;
> Would
> s/return true/return !bitmap_empty(matrix_mdev->shadow_apcb.aqm,
> AP_DOMAINS)/
> do the trick?
>
> I mean if empty, then we would not commit the APCB, so we would
> not take the vCPUs out of SIE -- see below.
At first glance I'd say yes, it does the trick, but I need to consider
all possible scenarios. For example, that will work fine when someone
either assigns all of the adapters or all of the domains first, then assigns
the other.
>
>>>> +
>>>> +static void vfio_ap_mdev_hot_plug_adapter(struct ap_matrix_mdev *matrix_mdev,
>>>> + unsigned long apid)
>>>> +{
>>>> + if (vfio_ap_assign_apid_to_apcb(matrix_mdev, apid))
>>>> + vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>>>> +}
>>>> +
> [..]
>
> Regards,
> Halil
On Mon, 30 Nov 2020 14:36:10 -0500
Tony Krowiak <[email protected]> wrote:
>
>
> On 11/28/20 8:52 PM, Halil Pasic wrote:
[..]
> >> * Unassign adapter from mdev's matrix:
> >>
> >> The domain will be hot unplugged from the KVM guest if it is
> >> assigned to the guest's matrix.
> >>
> >> * Assign a control domain:
> >>
> >> The control domain will be hot plugged into the KVM guest if it is not
> >> assigned to the guest's APCB. The AP architecture ensures a guest will
> >> only get access to the control domain if it is in the host's AP
> >> configuration, so there is no risk in hot plugging it; however, it will
> >> become automatically available to the guest when it is added to the host
> >> configuration.
> >>
> >> * Unassign a control domain:
> >>
> >> The control domain will be hot unplugged from the KVM guest if it is
> >> assigned to the guest's APCB.
> > This is where things start getting tricky. E.g. do we need to revise
> > filtering after an unassign? (For example an assign_adapter X didn't
> > change the shadow, because queue XY was missing, but now we unplug domain
> > Y. Should the adapter X pop up? I guess it should.)
>
> I suppose that makes sense at the expense of making the code
> more complex. It is essentially what we had in the prior version
> which used the same filtering code for assignment as well as
> host AP configuration changes.
>
Will have to think about it some more. Making the user unplug and
replug an adapter because it got filtered at some point, even though
there is no longer any need to filter it, does not feel right. On the
other hand, I'm afraid I'm complaining in circles.
> >
> >
> >> Note: Now that hot plug/unplug is implemented, there is the possibility
> >> that an assignment/unassignment of an adapter, domain or control
> >> domain could be initiated while the guest is starting, so the
> >> matrix device lock will be taken for the group notification callback
> >> that initializes the guest's APCB when the KVM pointer is made
> >> available to the vfio_ap device driver.
> >>
> >> Signed-off-by: Tony Krowiak <[email protected]>
> >> ---
> >> drivers/s390/crypto/vfio_ap_ops.c | 190 +++++++++++++++++++++++++-----
> >> 1 file changed, 159 insertions(+), 31 deletions(-)
> >>
> >> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> >> index 586ec5776693..4f96b7861607 100644
> >> --- a/drivers/s390/crypto/vfio_ap_ops.c
> >> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> >> @@ -631,6 +631,60 @@ static void vfio_ap_mdev_manage_qlinks(struct ap_matrix_mdev *matrix_mdev,
> >> }
> >> }
> >>
> >> +static bool vfio_ap_assign_apid_to_apcb(struct ap_matrix_mdev *matrix_mdev,
> >> + unsigned long apid)
> >> +{
> >> + unsigned long apqi, apqn;
> >> + unsigned long *aqm = matrix_mdev->shadow_apcb.aqm;
> >> +
> >> + /*
> >> + * If the APID is already assigned to the guest's shadow APCB, there is
> >> + * no need to assign it.
> >> + */
> >> + if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm))
> >> + return false;
> >> +
> >> + /*
> >> + * If no domains have yet been assigned to the shadow APCB and one or
> >> + * more domains have been assigned to the matrix mdev, then use
> >> + * the domains assigned to the matrix mdev; otherwise, there is nothing
> >> + * to assign to the shadow APCB.
> >> + */
> >> + if (bitmap_empty(matrix_mdev->shadow_apcb.aqm, AP_DOMAINS)) {
> >> + if (bitmap_empty(matrix_mdev->matrix.aqm, AP_DOMAINS))
> >> + return false;
> >> +
> >> + aqm = matrix_mdev->matrix.aqm;
> >> + }
> >> +
> >> + /* Make sure all APQNs are bound to the vfio_ap driver */
> >> + for_each_set_bit_inv(apqi, aqm, AP_DOMAINS) {
> >> + apqn = AP_MKQID(apid, apqi);
> >> +
> >> + if (vfio_ap_mdev_get_queue(matrix_mdev, apqn) == NULL)
> >> + return false;
> >> + }
> >> +
> >> + set_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
> >> +
> >> + /*
> >> + * If we verified APQNs using the domains assigned to the matrix mdev,
> >> + * then copy the APQIs of those domains into the guest's APCB
> >> + */
> >> + if (bitmap_empty(matrix_mdev->shadow_apcb.aqm, AP_DOMAINS))
> >> + bitmap_copy(matrix_mdev->shadow_apcb.aqm,
> >> + matrix_mdev->matrix.aqm, AP_DOMAINS);
> >> +
> >> + return true;
> >> +}
> > What is the rationale behind the shadow aqm empty special handling?
>
> The rationale was to avoid taking the VCPUs
> out of SIE in order to make an update to the guest's APCB
> unnecessarily. For example, suppose the guest is started
> without access to any APQNs (i.e., all matrix and shadow_apcb
> masks are zeros). Now suppose the administrator proceeds to
> start assigning AP resources to the mdev. Let's say he starts
> by assigning adapters 1 through 100. The code below will return
> true indicating the shadow_apcb was updated. Consequently,
> the calling code will commit the changes to the guest's
> APCB. The problem there is that in order to update the guest's
> VCPUs, they will have to be taken out of SIE, yet the guest will
> not get access to the adapter since no domains have yet been
> assigned to the APCB. Doing this 100 times - once for each
> adapter 1-100 - is probably a bad idea.
>
Not yanking the VCPUs out of SIE does make a lot of sense. At least
I understand your motivation now. I will think some more about this,
but in the meanwhile, please try to answer one more question (see
below).
> > I.e.
> > why not simply:
> >
> >
> > static bool vfio_ap_assign_apid_to_apcb(struct ap_matrix_mdev *matrix_mdev,
> > unsigned long apid)
> > {
> > unsigned long apqi, apqn;
> > unsigned long *aqm = matrix_mdev->shadow_apcb.aqm;
> >
> > /*
> > * If the APID is already assigned to the guest's shadow APCB, there is
> > * no need to assign it.
> > */
> > if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm))
> > return false;
> >
> > /* Make sure all APQNs are bound to the vfio_ap driver */
> > for_each_set_bit_inv(apqi, aqm, AP_DOMAINS) {
> > apqn = AP_MKQID(apid, apqi);
> >
> > if (vfio_ap_mdev_get_queue(matrix_mdev, apqn) == NULL)
> > return false;
> > }
> >
> > set_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
> >
> > return true;
Would
s/return true/return !bitmap_empty(matrix_mdev->shadow_apcb.aqm,
AP_DOMAINS)/
do the trick?
I mean if empty, then we would not commit the APCB, so we would
not take the vCPUs out of SIE -- see below.
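
To make the suggestion concrete, here is a minimal userspace sketch, not
driver code. It assumes every queue is bound to vfio_ap (so the binding
check is left out) and uses plain bool arrays rather than the kernel's
inverted bitmaps. struct model, MODEL_MAX and the model_* helpers are
hypothetical names made up for the illustration; the commits counter
stands in for calls to vfio_ap_mdev_commit_shadow_apcb().

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX 256   /* hypothetical mask size for this model */

struct model {
    bool apm[MODEL_MAX];    /* shadow adapter mask */
    bool aqm[MODEL_MAX];    /* shadow domain mask */
    int commits;            /* simulated APCB commits (vCPUs leave SIE) */
};

static bool mask_empty(const bool *mask)
{
    for (int i = 0; i < MODEL_MAX; i++)
        if (mask[i])
            return false;
    return true;
}

/* Record the adapter; report a change only if the guest can see it. */
static bool model_assign_apid(struct model *m, unsigned int apid)
{
    assert(apid < MODEL_MAX);
    if (m->apm[apid])
        return false;
    m->apm[apid] = true;
    return !mask_empty(m->aqm);
}

/* Record the domain; report a change only if the guest can see it. */
static bool model_assign_apqi(struct model *m, unsigned int apqi)
{
    assert(apqi < MODEL_MAX);
    if (m->aqm[apqi])
        return false;
    m->aqm[apqi] = true;
    return !mask_empty(m->apm);
}

static void model_hot_plug_adapter(struct model *m, unsigned int apid)
{
    if (model_assign_apid(m, apid))
        m->commits++;
}

static void model_hot_plug_domain(struct model *m, unsigned int apqi)
{
    if (model_assign_apqi(m, apqi))
        m->commits++;
}
```

In this model, plugging adapters 1 through 100 into an empty shadow
causes no commit, and the first domain plugged afterwards causes exactly
one. Note that the shadow adapter mask is non-empty in the meantime even
though the guest has no usable queue.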
> >> +
> >> +static void vfio_ap_mdev_hot_plug_adapter(struct ap_matrix_mdev *matrix_mdev,
> >> + unsigned long apid)
> >> +{
> >> + if (vfio_ap_assign_apid_to_apcb(matrix_mdev, apid))
> >> + vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
> >> +}
> >> +
[..]
Regards,
Halil
On Mon, 30 Nov 2020 19:18:30 -0500
Tony Krowiak <[email protected]> wrote:
> >>>> +static bool vfio_ap_assign_apid_to_apcb(struct ap_matrix_mdev *matrix_mdev,
> >>>> + unsigned long apid)
> >>>> +{
> >>>> + unsigned long apqi, apqn;
> >>>> + unsigned long *aqm = matrix_mdev->shadow_apcb.aqm;
> >>>> +
> >>>> + /*
> >>>> + * If the APID is already assigned to the guest's shadow APCB, there is
> >>>> + * no need to assign it.
> >>>> + */
> >>>> + if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm))
> >>>> + return false;
> >>>> +
> >>>> + /*
> >>>> + * If no domains have yet been assigned to the shadow APCB and one or
> >>>> + * more domains have been assigned to the matrix mdev, then use
> >>>> + * the domains assigned to the matrix mdev; otherwise, there is nothing
> >>>> + * to assign to the shadow APCB.
> >>>> + */
> >>>> + if (bitmap_empty(matrix_mdev->shadow_apcb.aqm, AP_DOMAINS)) {
> >>>> + if (bitmap_empty(matrix_mdev->matrix.aqm, AP_DOMAINS))
> >>>> + return false;
> >>>> +
> >>>> + aqm = matrix_mdev->matrix.aqm;
> >>>> + }
> >>>> +
> >>>> + /* Make sure all APQNs are bound to the vfio_ap driver */
> >>>> + for_each_set_bit_inv(apqi, aqm, AP_DOMAINS) {
> >>>> + apqn = AP_MKQID(apid, apqi);
> >>>> +
> >>>> + if (vfio_ap_mdev_get_queue(matrix_mdev, apqn) == NULL)
> >>>> + return false;
> >>>> + }
> >>>> +
> >>>> + set_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
> >>>> +
> >>>> + /*
> >>>> + * If we verified APQNs using the domains assigned to the matrix mdev,
> >>>> + * then copy the APQIs of those domains into the guest's APCB
> >>>> + */
> >>>> + if (bitmap_empty(matrix_mdev->shadow_apcb.aqm, AP_DOMAINS))
> >>>> + bitmap_copy(matrix_mdev->shadow_apcb.aqm,
> >>>> + matrix_mdev->matrix.aqm, AP_DOMAINS);
> >>>> +
> >>>> + return true;
> >>>> +}
> >>> What is the rationale behind the shadow aqm empty special handling?
> >> The rationale was to avoid taking the VCPUs
> >> out of SIE in order to make an update to the guest's APCB
> >> unnecessarily. For example, suppose the guest is started
> >> without access to any APQNs (i.e., all matrix and shadow_apcb
> >> masks are zeros). Now suppose the administrator proceeds to
> >> start assigning AP resources to the mdev. Let's say he starts
> >> by assigning adapters 1 through 100. The code below will return
> >> true indicating the shadow_apcb was updated. Consequently,
> >> the calling code will commit the changes to the guest's
> >> APCB. The problem there is that in order to update the guest's
> >> VCPUs, they will have to be taken out of SIE, yet the guest will
> >> not get access to the adapter since no domains have yet been
> >> assigned to the APCB. Doing this 100 times - once for each
> >> adapter 1-100 - is probably a bad idea.
> >>
> > Not yanking the VCPUs out of SIE does make a lot of sense. At least
> > I understand your motivation now. I will think some more about this,
> > but in the meanwhile, please try to answer one more question (see
> > below).
> >
> >>> I.e.
> >>> why not simply:
> >>>
> >>>
> >>> static bool vfio_ap_assign_apid_to_apcb(struct ap_matrix_mdev *matrix_mdev,
> >>> unsigned long apid)
> >>> {
> >>> unsigned long apqi, apqn;
> >>> unsigned long *aqm = matrix_mdev->shadow_apcb.aqm;
> >>>
> >>> /*
> >>> * If the APID is already assigned to the guest's shadow APCB, there is
> >>> * no need to assign it.
> >>> */
> >>> if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm))
> >>> return false;
> >>>
> >>> /* Make sure all APQNs are bound to the vfio_ap driver */
> >>> for_each_set_bit_inv(apqi, aqm, AP_DOMAINS) {
> >>> apqn = AP_MKQID(apid, apqi);
> >>>
> >>> if (vfio_ap_mdev_get_queue(matrix_mdev, apqn) == NULL)
> >>> return false;
> >>> }
> >>>
> >>> set_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
> >>>
> >>> return true;
> > Would
> > s/return true/return !bitmap_empty(matrix_mdev->shadow_apcb.aqm,
> > AP_DOMAINS)/
> > do the trick?
> >
> > I mean if empty, then we would not commit the APCB, so we would
> > not take the vCPUs out of SIE -- see below.
>
> At first glance I'd say yes, it does the trick; but, I need to consider
> all possible scenarios. For example, that will work fine when someone
> either assigns all of the adapters or all of the domains first, then assigns
> the other.
Maybe I can help you. The only caveat I have in mind is the show of the
guest_matrix attribute. We probably don't want to display adapters
without domains and vice-versa. But that can be easily handled with
a flag.
Regards,
Halil
On Tue, 1 Dec 2020 00:32:27 +0100
Halil Pasic <[email protected]> wrote:
> >
> >
> > On 11/28/20 8:52 PM, Halil Pasic wrote:
> [..]
> > >> * Unassign adapter from mdev's matrix:
> > >>
> > >> The domain will be hot unplugged from the KVM guest if it is
> > >> assigned to the guest's matrix.
> > >>
> > >> * Assign a control domain:
> > >>
> > >> The control domain will be hot plugged into the KVM guest if it is not
> > >> assigned to the guest's APCB. The AP architecture ensures a guest will
> > >> only get access to the control domain if it is in the host's AP
> > >> configuration, so there is no risk in hot plugging it; however, it will
> > >> become automatically available to the guest when it is added to the host
> > >> configuration.
> > >>
> > >> * Unassign a control domain:
> > >>
> > >> The control domain will be hot unplugged from the KVM guest if it is
> > >> assigned to the guest's APCB.
> > > This is where things start getting tricky. E.g. do we need to revise
> > > filtering after an unassign? (For example an assign_adapter X didn't
> > > change the shadow, because queue XY was missing, but now we unplug domain
> > > Y. Should the adapter X pop up? I guess it should.)
> >
> > I suppose that makes sense at the expense of making the code
> > more complex. It is essentially what we had in the prior version
> > which used the same filtering code for assignment as well as
> > host AP configuration changes.
> >
>
> Will have to think about it some more. Making the user unplug and
> replug an adapter because at some point it got filtered, but there
> is no need to filter it does not feel right. On the other hand, I'm
> afraid I'm complaining in circles.
I did some thinking. The following statements are about the state of
affairs, when all 17 patches are applied. I'm commenting here, because
I believe this is the patch that introduces the most controversial code.
First, about low-level problems with the current code/design. The
'other mask is empty' handling in vfio_ap_assign_apid_to_apcb() (and
vfio_ap_assign_apqi_to_apcb()) is troublesome. The final product
allows for over-commitment, i.e. assignment of e.g. domains that
are not in the crypto host config. Let's assume the host LPAR
has usage domains 1 and 2, and adapters 1, 2, and 3. The apmask
and aqmask are both 0 (all in on vfio), all bound. We start with an empty
mdev that is tied to a running guest:
assign_adapter 1
assign_adapter 2
assign_adapter 3
assign_adapter 4
all of these will work. The resulting shadow_apcb is completely empty. No
commit_apcb.
assign_domain 1
assign_domain 2
assign_domain 3
all of these will work. But again the shadow_apcb is completely empty at
the end: we did get to the loop that is checking the boundness of the
queues, but please note that we are checking against matrix.apm, and
adapter 4 is not in the config of the host.
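
The check that rejects each domain here can be sketched in userspace as
follows. This is only a model under stated assumptions: masks are plain
32-bit words where bit n stands for adapter or domain n (not the
kernel's inverted bit order), a queue counts as bound exactly when both
its adapter and its domain are in the host config, and mask_has(),
model_queue_bound() and model_domain_passes() are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* In this model, bit n of a mask stands for adapter or domain n. */
static bool mask_has(uint32_t mask, unsigned int id)
{
    assert(id < 32);
    return (mask >> id) & 1u;
}

/*
 * Hypothetical stand-in for the queue lookup: a queue is treated as
 * bound to vfio_ap iff its adapter and domain are both in the host
 * config (apmask and aqmask are assumed to be 0).
 */
static bool model_queue_bound(uint32_t host_apm, uint32_t host_aqm,
                              unsigned int apid, unsigned int apqi)
{
    return mask_has(host_apm, apid) && mask_has(host_aqm, apqi);
}

/*
 * The check applied to a new domain while the shadow adapter mask is
 * still empty: every adapter in the mdev's matrix must have the queue
 * (apid, apqi) bound, otherwise the domain is not plugged.
 */
static bool model_domain_passes(uint32_t matrix_apm, uint32_t host_apm,
                                uint32_t host_aqm, unsigned int apqi)
{
    for (unsigned int apid = 0; apid < 32; apid++) {
        if (mask_has(matrix_apm, apid) &&
            !model_queue_bound(host_apm, host_aqm, apid, apqi))
            return false;
    }
    return true;
}
```

With host adapters 1-3 (0x0e) and host domains 1-2 (0x06), a matrix
adapter mask of 0x1e (adapters 1-4) fails the check for every domain,
while 0x0e passes for domains 1 and 2.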
I've hacked up a fixup patch for these problems that simplifies the
code considerably, but there are design-level issues that run deeper,
so I'm not sure the fixups are the way to go.
Now let's talk about design-level stuff. Currently the assignment
operations are designed to accommodate the FCFS (first come, first
served) principle. This is a blessing and a curse at the same time.
Consider the following scenarios. We have an empty mdev (nothing
assigned) and the following queues are bound to the vfio_ap driver:
0.0
0.1
1.0
If we then do
assign_adapter 0
assign_domain 0
assign_domain 1
assign_adapter 1
We end up with the guest_matrix
0.0
0.1
and the matrix
0.0
0.1
1.0
1.1
That is a different result compared to
assign_adapter 0
assign_domain 0
assign_adapter 1
assign_domain 1
or the situation where we have 0.0, 0.1, 1.0 and 1.1 bound to vfio_ap
and then 1.1 gets unbound.
For the same system state (bound, config, ap_perm, matrix) you get
different outcomes (guest_matrix), because the outcomes depend on
history.
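
The history dependence can be reproduced with a small userspace model of
the assignment logic in vfio_ap_assign_apid_to_apcb() and
vfio_ap_assign_apqi_to_apcb(). It is a sketch, not the driver: masks are
bool arrays over a tiny ID space, the sysfs validation steps are
ignored, and struct mdev_model, NIDS and the model_* functions are
hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NIDS 4  /* hypothetical, tiny ID space for the model */

struct mdev_model {
    bool matrix_apm[NIDS], matrix_aqm[NIDS];  /* assigned by the admin */
    bool shadow_apm[NIDS], shadow_aqm[NIDS];  /* visible to the guest */
    bool bound[NIDS][NIDS];                   /* bound[apid][apqi] */
};

static bool mask_empty(const bool *mask)
{
    for (int i = 0; i < NIDS; i++)
        if (mask[i])
            return false;
    return true;
}

/* Models assign_adapter followed by the hot plug attempt. */
static void model_assign_adapter(struct mdev_model *m, int apid)
{
    const bool *aqm = m->shadow_aqm;
    bool fallback = false;

    assert(apid >= 0 && apid < NIDS);
    m->matrix_apm[apid] = true;
    if (m->shadow_apm[apid])
        return;
    if (mask_empty(m->shadow_aqm)) {
        if (mask_empty(m->matrix_aqm))
            return;
        aqm = m->matrix_aqm;    /* the "other is empty" special case */
        fallback = true;
    }
    for (int apqi = 0; apqi < NIDS; apqi++)
        if (aqm[apqi] && !m->bound[apid][apqi])
            return;             /* a queue is missing: filter adapter */
    m->shadow_apm[apid] = true;
    if (fallback)
        memcpy(m->shadow_aqm, m->matrix_aqm, sizeof(m->shadow_aqm));
}

/* Models assign_domain followed by the hot plug attempt. */
static void model_assign_domain(struct mdev_model *m, int apqi)
{
    const bool *apm = m->shadow_apm;
    bool fallback = false;

    assert(apqi >= 0 && apqi < NIDS);
    m->matrix_aqm[apqi] = true;
    if (m->shadow_aqm[apqi])
        return;
    if (mask_empty(m->shadow_apm)) {
        if (mask_empty(m->matrix_apm))
            return;
        apm = m->matrix_apm;    /* the "other is empty" special case */
        fallback = true;
    }
    for (int apid = 0; apid < NIDS; apid++)
        if (apm[apid] && !m->bound[apid][apqi])
            return;             /* a queue is missing: filter domain */
    m->shadow_aqm[apqi] = true;
    if (fallback)
        memcpy(m->shadow_apm, m->matrix_apm, sizeof(m->shadow_apm));
}
```

With queues 0.0, 0.1 and 1.0 bound, running the two orderings above
through this model leaves the matrix masks identical but the shadow
masks different: the first ordering filters adapter 1, the second
filters domain 1.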
Another thing is recovery. I believe the main idea behind shadow_apcb
is that we should auto recover once the resources are available again.
The current design choices make recovery more difficult to think about
because we may end up having either the apid or the apqi filtered on
a 'hole' (a queue missing for reasons other than belonging to
default or not being in the host config).
I still think that for these cases filtering out the apid is the lesser
evil. Yes, a hot plug of a domain causing the hot unplug of an adapter
is ugly, but at least I can describe that. So I propose the following.
Let me hack up a fixup that morphs things in this direction. Maybe
I will run into unexpected problems, but if I don't then we will
have an alternative design you can run your testcases against. How about
that?
Regards,
Halil
On 12/1/20 12:56 PM, Halil Pasic wrote:
> On Tue, 1 Dec 2020 00:32:27 +0100
> Halil Pasic <[email protected]> wrote:
>
>>>
>>> On 11/28/20 8:52 PM, Halil Pasic wrote:
>> [..]
>>>>> * Unassign adapter from mdev's matrix:
>>>>>
>>>>> The domain will be hot unplugged from the KVM guest if it is
>>>>> assigned to the guest's matrix.
>>>>>
>>>>> * Assign a control domain:
>>>>>
>>>>> The control domain will be hot plugged into the KVM guest if it is not
>>>>> assigned to the guest's APCB. The AP architecture ensures a guest will
>>>>> only get access to the control domain if it is in the host's AP
>>>>> configuration, so there is no risk in hot plugging it; however, it will
>>>>> become automatically available to the guest when it is added to the host
>>>>> configuration.
>>>>>
>>>>> * Unassign a control domain:
>>>>>
>>>>> The control domain will be hot unplugged from the KVM guest if it is
>>>>> assigned to the guest's APCB.
>>>> This is where things start getting tricky. E.g. do we need to revise
>>>> filtering after an unassign? (For example an assign_adapter X didn't
>>>> change the shadow, because queue XY was missing, but now we unplug domain
>>>> Y. Should the adapter X pop up? I guess it should.)
>>> I suppose that makes sense at the expense of making the code
>>> more complex. It is essentially what we had in the prior version
>>> which used the same filtering code for assignment as well as
>>> host AP configuration changes.
>>>
>> Will have to think about it some more. Making the user unplug and
>> replug an adapter because at some point it got filtered, but there
>> is no need to filter it does not feel right. On the other hand, I'm
>> afraid I'm complaining in circles.
> I did some thinking. The following statements are about the state of
> affairs, when all 17 patches are applied. I'm commenting here, because
> I believe this is the patch that introduces the most controversial code.
>
> First, about low-level problems with the current code/design. The
> 'other mask is empty' handling in vfio_ap_assign_apid_to_apcb() (and
> vfio_ap_assign_apqi_to_apcb()) is troublesome. The final product
> allows for over-commitment, i.e. assignment of e.g. domains that
> are not in the crypto host config. Let's assume the host LPAR
> has usage domains 1 and 2, and adapters 1, 2, and 3. The apmask
> and aqmask are both 0 (all in on vfio), all bound. We start with an empty
> mdev that is tied to a running guest:
> assign_adapter 1
> assign_adapter 2
> assign_adapter 3
> assign_adapter 4
> all of these will work. The resulting shadow_apcb is completely empty. No
> commit_apcb.
> assign_domain 1
> assign_domain 2
> assign_domain 3
> all of these will work. But again the shadow_apcb is completely empty at
> the end: we did get to the loop that is checking the boundness of the
> queues, but please note that we are checking against matrix.apm, and
> adapter 4 is not in the config of the host.
>
> I've hacked up a fixup patch for these problems that simplifies the
> code considerably, but there are design level issues, that run deeper,
> so I'm not sure the fixups are the way to go.
>
> Now let's talk about design-level stuff. Currently the assignment
> operations are designed to accommodate the FCFS principle. This
> is a blessing and a curse at the same time.
>
> Consider the following scenarios. We have an empty mdev (nothing
> assigned) and the following queues are bound to the vfio_ap driver:
> 0.0
> 0.1
> 1.0
> If we do
> assign_adapter 0
> assign_domain 0
> assign_domain 1
> assign_adapter 1
> We end up with the guest_matrix
> 0.0
> 0.1
> and the matrix
> 0.0
> 0.1
> 1.0
> 1.1
>
> That is a different result compared to
> assign_adapter 0
> assign_domain 0
> assign_adapter 1
> assign_domain 1
> or the situation where we have 0.0, 0.1, 1.0 and 1.1 bound to vfio_ap
> and then 1.1 gets unbound.
In v11 of the patch series, the filtering code always filters
the matrix assigned to the mdev. It is invoked whenever an
adapter or domain is assigned, whenever a queue is probed, and
whenever the AP bus scan complete notification is received
after adapters and/or domains have been added to the host AP
configuration. So I made a slight modification to that
filtering function to filter only by APID and ran the above
scenarios. In each case, the resulting guest matrix was
identical. I also tested the bind/unbind and achieved the
same results.
>
> For the same system state (bound, config, ap_perm, matrix) you get
> different outcomes (guest_matrix), because the outcomes depend on
> history.
>
> Another thing is recovery. I believe the main idea behind shadow_apcb
> is that we should auto recover once the resources are available again.
> The current design choices make recovery more difficult to think about
> because we may end up having either the apid or the apqi filtered on
> a 'hole' (a queue missing for reasons other than belonging to
> default, or not being in the host config).
The filtering code from the v11 series with the tweak I
mentioned above accomplishes this. I tested this by
doing manual binds/unbinds of a queue using the
scenarios you laid out.
>
> I still think for these cases filtering out the apid is the lesser
> evil. Yes, a hot plug of a domain causing the hot unplug of an adapter
> is ugly, but at least I can describe that. So I propose the following.
> Let me hack up a fixup that morphs things in this direction. Maybe
> I will run into unexpected problems, but if I don't then we will
> have an alternative design you can run your testcases against. How about
> that?
I appreciate the offer, but I believe with the change to the v11
filtering code I described above we have a solution. One of
your objections to the filtering code was looping over all
assigned adapters/domains each time an adapter or
domain is assigned. It should also be easy to examine only
the APQNs involving the new APID or APQI being assigned.
Again, I appreciate your offer, but I don't think it is necessary
to take you away from your priorities to involve yourself in
mine.
>
> Regards,
> Halil
On Tue, 1 Dec 2020 17:12:56 -0500
Tony Krowiak <[email protected]> wrote:
> On 12/1/20 12:56 PM, Halil Pasic wrote:
> > On Tue, 1 Dec 2020 00:32:27 +0100
> > Halil Pasic <[email protected]> wrote:
> >
> >>>
> >>> On 11/28/20 8:52 PM, Halil Pasic wrote:
> >> [..]
> >>>>> * Unassign adapter from mdev's matrix:
> >>>>>
> >>>>> The adapter will be hot unplugged from the KVM guest if it is
> >>>>> assigned to the guest's matrix.
> >>>>>
> >>>>> * Assign a control domain:
> >>>>>
> >>>>> The control domain will be hot plugged into the KVM guest if it is not
> >>>>> assigned to the guest's APCB. The AP architecture ensures a guest will
> >>>>> only get access to the control domain if it is in the host's AP
> >>>>> configuration, so there is no risk in hot plugging it; however, it will
> >>>>> become automatically available to the guest when it is added to the host
> >>>>> configuration.
> >>>>>
> >>>>> * Unassign a control domain:
> >>>>>
> >>>>> The control domain will be hot unplugged from the KVM guest if it is
> >>>>> assigned to the guest's APCB.
> >>>> This is where things start getting tricky. E.g. do we need to revise
> >>>> filtering after an unassign? (For example an assign_adapter X didn't
> >>>> change the shadow, because queue XY was missing, but now we unplug domain
> >>>> Y. Should the adapter X pop up? I guess it should.)
> >>> I suppose that makes sense at the expense of making the code
> >>> more complex. It is essentially what we had in the prior version
> >>> which used the same filtering code for assignment as well as
> >>> host AP configuration changes.
> >>>
> >> Will have to think about it some more. Making the user unplug and
> >> replug an adapter because it got filtered at some point, even though
> >> there is no longer any need to filter it, does not feel right. On the
> >> other hand, I'm afraid I'm complaining in circles.
> > I did some thinking. The following statements are about the state of
> > affairs, when all 17 patches are applied. I'm commenting here, because
> > I believe this is the patch that introduces the most controversial code.
> >
> > First, about low-level problems with the current code/design. The 'other
> > is empty' handling in vfio_ap_assign_apid_to_apcb() (and
> > vfio_ap_assign_apqi_to_apcb()) is troublesome. The final product
> > allows for over-commitment, i.e. assignment of e.g. domains that
> > are not in the crypto host config. Let's assume the host LPAR
> > has usage domains 1 and 2, and adapters 1, 2, and 3. The apmask
> > and aqmask are both 0 (all in on vfio), all bound. We start with an empty
> > mdev that is tied to a running guest:
> > assign_adapter 1
> > assign_adapter 2
> > assign_adapter 3
> > assign_adapter 4
> > all of these will work. The resulting shadow_apcb is completely empty. No
> > commit_apcb.
> > assign_domain 1
> > assign_domain 2
> > assign_domain 3
> > all of these will work. But again the shadow_apcb is completely empty at
> > the end: we did get to the loop that is checking the boundness of the
> > queues, but please note that we are checking against matrix.apm, and
> > adapter 4 is not in the config of the host.
> >
> > I've hacked up a fixup patch for these problems that simplifies the
> > code considerably, but there are design level issues, that run deeper,
> > so I'm not sure the fixups are the way to go.
> >
> > Now let's talk about design-level stuff. Currently the assignment
> > operations are designed to accommodate the FCFS principle. This
> > is a blessing and a curse at the same time.
> >
> > Consider the following scenarios. We have an empty mdev (nothing
> > assigned) and the following queues are bound to the vfio_ap driver:
> > 0.0
> > 0.1
> > 1.0
> > If we do
> > assign_adapter 0
> > assign_domain 0
> > assign_domain 1
> > assign_adapter 1
> > We end up with the guest_matrix
> > 0.0
> > 0.1
> > and the matrix
> > 0.0
> > 0.1
> > 1.0
> > 1.1
> >
> > That is a different result compared to
> > assign_adapter 0
> > assign_domain 0
> > assign_adapter 1
> > assign_domain 1
> > or the situation where we have 0.0, 0.1, 1.0 and 1.1 bound to vfio_ap
> > and then 1.1 gets unbound.
>
> In v11 of the patch series, the filtering code always filters
> the matrix assigned to the mdev. It is invoked whenever an
> adapter or domain is assigned, whenever a queue is probed, and
> whenever the AP bus scan complete notification is received
> after adapters and/or domains have been added to the host AP
> configuration. So I made a slight modification to that
> filtering function to filter only by APID and ran the above
> scenarios. In each case, the resulting guest matrix was
> identical. I also tested the bind/unbind and achieved the
> same results.
>
> >
> > For the same system state (bound, config, ap_perm, matrix) you get
> > different outcomes (guest_matrix), because the outcomes depend on
> > history.
> >
> > Another thing is recovery. I believe the main idea behind shadow_apcb
> > is that we should auto recover once the resources are available again.
> > The current design choices make recovery more difficult to think about
> > because we may end up having either the apid or the apqi filtered on
> > a 'hole' (a queue missing for reasons other than belonging to
> > default, or not being in the host config).
>
> The filtering code from the v11 series with the tweak I
> mentioned above accomplishes this. I tested this by
> doing manual binds/unbinds of a queue using the
> scenarios you laid out.
>
> >
> > I still think for these cases filtering out the apid is the lesser
> > evil. Yes, a hot plug of a domain causing the hot unplug of an adapter
> > is ugly, but at least I can describe that. So I propose the following.
> > Let me hack up a fixup that morphs things in this direction. Maybe
> > I will run into unexpected problems, but if I don't then we will
> > have an alternative design you can run your testcases against. How about
> > that?
>
> I appreciate the offer, but I believe with the change to the v11
> filtering code I described above we have a solution. One of
> your objections to the filtering code was looping over all
> assigned adapters/domains each time an adapter or
> domain is assigned. It should also be easy to examine only
> the APQNs involving the new APID or APQI being assigned.
> Again, I appreciate your offer, but I don't think it is necessary
> to take you away from your priorities to involve yourself in
> mine.
Seems you have it sorted out. Unfortunately I can't really follow without
code, but I have to trust you. Can you please spin a v13 with these
improvements implemented?
Maybe I didn't comment on every patch, but I did go through all of them.
I believe we have enough material for another iteration, and further
review makes no sense at this point. I intend to come back to this
once v13 is out.
Thanks,
Halil
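
For readers following the filtering discussion without the v13 code at hand,
here is a minimal userspace sketch of one possible reading of "filter only by
APID". It is not taken from the patch series: struct matrix, struct
bound_queues, MAX_ID and filter_by_apid() are hypothetical names, and the real
driver works on inverted bitmaps rather than bool arrays. The point it tries to
show is that the guest matrix is recomputed from the current state alone, so
the order of the assign operations cannot influence the result.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_ID 4	/* hypothetical bound; real APID/APQI ranges are larger */

/* Hypothetical stand-in for an mdev matrix or a guest (shadow) matrix. */
struct matrix {
	bool apm[MAX_ID];	/* adapters (APIDs) */
	bool aqm[MAX_ID];	/* domains (APQIs) */
};

/* Hypothetical record of which queues are bound to the driver. */
struct bound_queues {
	bool q[MAX_ID][MAX_ID];	/* q[apid][apqi] */
};

/*
 * Recompute the guest matrix from the assigned matrix. Domains are copied
 * unchanged. An adapter is kept only if every queue it forms with an
 * assigned domain is bound; otherwise the whole adapter is filtered out.
 */
static void filter_by_apid(const struct matrix *assigned,
			   const struct bound_queues *bound,
			   struct matrix *guest)
{
	int apid, apqi;

	assert(assigned && bound && guest);
	memset(guest, 0, sizeof(*guest));
	memcpy(guest->aqm, assigned->aqm, sizeof(guest->aqm));

	for (apid = 0; apid < MAX_ID; apid++) {
		bool complete = assigned->apm[apid];

		for (apqi = 0; complete && apqi < MAX_ID; apqi++) {
			if (assigned->aqm[apqi] && !bound->q[apid][apqi])
				complete = false;
		}
		guest->apm[apid] = complete;
	}
}
```

With queues 0.0, 0.1 and 1.0 bound and adapters 0, 1 plus domains 0, 1
assigned, the sketch keeps adapter 0 and filters adapter 1, whatever the
assignment order was. Once queue 1.1 is bound and the filter is run again,
adapter 1 reappears, which is the auto-recovery behavior discussed above.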
On 30.11.20 10:18, [email protected] wrote:
> On Tue, 24 Nov 2020 16:40:13 -0500
> Tony Krowiak <[email protected]> wrote:
>
>> This patch introduces an extension to the ap bus to notify device drivers
>> when the host AP configuration changes - i.e., adapters, domains or
>> control domains are added or removed. To that end, two new callbacks are
>> introduced for AP device drivers:
>>
>> void (*on_config_changed)(struct ap_config_info *new_config_info,
>> struct ap_config_info *old_config_info);
>>
>> This callback is invoked at the start of the AP bus scan
>> function when it determines that the host AP configuration information
>> has changed since the previous scan. This is done by storing
>> an old and current QCI info struct and comparing them. If there is any
>> difference, the callback is invoked.
>>
>> Note that when the AP bus scan detects that AP adapters, domains or
>> control domains have been removed from the host's AP configuration, it
>> will remove the associated devices from the AP bus subsystem's device
>> model. This callback gives the device driver a chance to respond to
>> the removal of the AP devices from the host configuration prior to
>> calling the device driver's remove callback. The primary purpose of
>> this callback is to allow the vfio_ap driver to do a bulk unplug of
>> all affected adapters, domains and control domains from affected
>> guests rather than unplugging them one at a time when the remove
>> callback is invoked.
>>
>> void (*on_scan_complete)(struct ap_config_info *new_config_info,
>> struct ap_config_info *old_config_info);
>>
>> The on_scan_complete callback is invoked after the ap bus scan is
>> complete if the host AP configuration data has changed.
>>
>> Note that when the AP bus scan detects that adapters, domains or
>> control domains have been added to the host's configuration, it will
>> create new devices in the AP bus subsystem's device model. The primary
>> purpose of this callback is to allow the vfio_ap driver to do a bulk
>> plug of all affected adapters, domains and control domains into
>> affected guests rather than plugging them one at a time when the
>> probe callback is invoked.
>>
>> Please note that changes to the apmask and aqmask do not trigger
>> these two callbacks since the bus scan function is not invoked by changes
>> to those masks.
>>
>> Signed-off-by: Harald Freudenberger <[email protected]>
>> Signed-off-by: Tony Krowiak <[email protected]>
>> ---
>> drivers/s390/crypto/ap_bus.c | 83 ++++++++++++++++++++++++++-
>> drivers/s390/crypto/ap_bus.h | 12 ++++
>> drivers/s390/crypto/vfio_ap_private.h | 14 ++++-
>> 3 files changed, 106 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/s390/crypto/ap_bus.c b/drivers/s390/crypto/ap_bus.c
>> index 593573740981..3a63f6b33d8a 100644
>> --- a/drivers/s390/crypto/ap_bus.c
>> +++ b/drivers/s390/crypto/ap_bus.c
>> @@ -75,6 +75,7 @@ DEFINE_MUTEX(ap_perms_mutex);
>> EXPORT_SYMBOL(ap_perms_mutex);
>>
>> static struct ap_config_info *ap_qci_info;
>> +static struct ap_config_info *ap_qci_info_old;
>>
>> /*
>> * AP bus related debug feature things.
>> @@ -1440,6 +1441,52 @@ static int __match_queue_device_with_queue_id(struct device *dev, const void *da
>> && AP_QID_QUEUE(to_ap_queue(dev)->qid) == (int)(long) data;
>> }
>>
>> +/* Helper function for notify_config_changed */
>> +static int __drv_notify_config_changed(struct device_driver *drv, void *data)
>> +{
>> + struct ap_driver *ap_drv = to_ap_drv(drv);
>> +
>> + if (try_module_get(drv->owner)) {
>> + if (ap_drv->on_config_changed)
>> + ap_drv->on_config_changed(ap_qci_info,
>> + ap_qci_info_old);
>> + module_put(drv->owner);
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +/* Notify all drivers about a qci config change */
>> +static inline void notify_config_changed(void)
>> +{
>> + bus_for_each_drv(&ap_bus_type, NULL, NULL,
>> + __drv_notify_config_changed);
>> +}
>> +
>> +/* Helper function for notify_scan_complete */
>> +static int __drv_notify_scan_complete(struct device_driver *drv, void *data)
>> +{
>> + struct ap_driver *ap_drv = to_ap_drv(drv);
>> +
>> + if (try_module_get(drv->owner)) {
>> + if (ap_drv->on_scan_complete)
>> + ap_drv->on_scan_complete(ap_qci_info,
>> + ap_qci_info_old);
>> + module_put(drv->owner);
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +/* Notify all drivers about bus scan complete */
>> +static inline void notify_scan_complete(void)
>> +{
>> + bus_for_each_drv(&ap_bus_type, NULL, NULL,
>> + __drv_notify_scan_complete);
>> +}
>> +
>> +
>> +
>> /*
>> * Helper function for ap_scan_bus().
>> * Remove card device and associated queue devices.
>> @@ -1718,15 +1765,43 @@ static inline void ap_scan_adapter(int ap)
>> put_device(&ac->ap_dev.device);
>> }
>>
>> +static int ap_get_configuration(void)
> I believe this was Harald's request. I'm OO contaminated, but
> the signature and the semantics do not mesh well with my understanding
> of a 'getter', especially since the return value is actually a boolean
> meaning 'configuration changed/still the same'. From the signature it
> looks more like the usual C-style 'try to do something and return 0 if
> OK, otherwise an error code != 0'.
>
> Since it's Harald's dominion, I'm not asking you to change this, but we
> could at least document the return value (maybe also the behavior).
Well, no. This function comes from Tony. And you can see a mixture of
bool and int return values within the AP code. Historically there was no
bool, and using bool within the kernel was frowned upon for a very long
time. However, long term I'd like to use bool for all these true/false
functions, so Tony, if you need to touch this anyway, you could change to
bool here.
Tony, as you need to rebase here anyway (the AP code has changed
significantly in this corner), should I pull these changes within the AP
bus code from your patch series and push them into the development branch
after some adaptations to the current code?
>
>> +{
>> + int cfg_chg = 0;
>> +
>> + if (ap_qci_info) {
>> + if (!ap_qci_info_old) {
>> + ap_qci_info_old = kzalloc(sizeof(*ap_qci_info_old),
>> + GFP_KERNEL);
>> + if (!ap_qci_info_old)
>> + return 0;
>> + } else {
>> + memcpy(ap_qci_info_old, ap_qci_info,
>> + sizeof(struct ap_config_info));
>> + }
>> + ap_fetch_qci_info(ap_qci_info);
>> + cfg_chg = memcmp(ap_qci_info,
>> + ap_qci_info_old,
>> + sizeof(struct ap_config_info)) != 0;
>> + }
>> +
>> + return cfg_chg;
>> +}
>> +
>> /**
>> * ap_scan_bus(): Scan the AP bus for new devices
>> * Runs periodically, workqueue timer (ap_config_time)
>> */
>> static void ap_scan_bus(struct work_struct *unused)
>> {
>> - int ap;
>> + int ap, config_changed = 0;
>>
>> - ap_fetch_qci_info(ap_qci_info);
>> + /* config change notify */
>> + config_changed = ap_get_configuration();
>> + if (config_changed)
>> + notify_config_changed();
>> + memcpy(ap_qci_info_old, ap_qci_info,
>> + sizeof(struct ap_config_info));
> Why is this memcpy needed? Isn't that already taken care of in
> ap_get_configuration()?
>
>> ap_select_domain();
>>
>> AP_DBF_DBG("%s running\n", __func__);
>> @@ -1735,6 +1810,10 @@ static void ap_scan_bus(struct work_struct *unused)
>> for (ap = 0; ap <= ap_max_adapter_id; ap++)
>> ap_scan_adapter(ap);
>>
>> + /* scan complete notify */
>> + if (config_changed)
>> + notify_scan_complete();
>> +
>> /* check if there is at least one queue available with default domain */
>> if (ap_domain_index >= 0) {
>> struct device *dev =
>> diff --git a/drivers/s390/crypto/ap_bus.h b/drivers/s390/crypto/ap_bus.h
>> index 65edd847c65a..fbfbf6991718 100644
>> --- a/drivers/s390/crypto/ap_bus.h
>> +++ b/drivers/s390/crypto/ap_bus.h
>> @@ -146,6 +146,18 @@ struct ap_driver {
>> int (*probe)(struct ap_device *);
>> void (*remove)(struct ap_device *);
>> int (*in_use)(unsigned long *apm, unsigned long *aqm);
>> + /*
>> + * Called at the start of the ap bus scan function when
>> + * the crypto config information (qci) has changed.
>> + */
>> + void (*on_config_changed)(struct ap_config_info *new_config_info,
>> + struct ap_config_info *old_config_info);
>> + /*
>> + * Called at the end of the ap bus scan function when
>> + * the crypto config information (qci) has changed.
>> + */
>> + void (*on_scan_complete)(struct ap_config_info *new_config_info,
>> + struct ap_config_info *old_config_info);
>> };
>>
>> #define to_ap_drv(x) container_of((x), struct ap_driver, driver)
>> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
>> index 15b7cd74843b..7bd7e35eb2e0 100644
>> --- a/drivers/s390/crypto/vfio_ap_private.h
>> +++ b/drivers/s390/crypto/vfio_ap_private.h
> These changes probably belong to some next patch...
>
> With the things I just brought up clarified, you can slap a:
> Reviewed-by: Halil Pasic <[email protected]>
> over it next time.
>
>> @@ -36,14 +36,21 @@
>> * driver, be it using @mdev_list or writing the state of a
>> * single ap_matrix_mdev device. It's quite coarse but we don't
>> * expect much contention.
>> + * @ap_add: a bitmap specifying the APIDs added to the host AP configuration
>> + * as notified by the AP bus via the on_cfg_chg callback.
>> + * @aq_add: a bitmap specifying the APQIs added to the host AP configuration
>> + * as notified by the AP bus via the on_cfg_chg callback.
>> */
>> struct ap_matrix_dev {
>> struct device device;
>> atomic_t available_instances;
>> - struct ap_config_info info;
>> + struct ap_config_info config_info;
>> + struct ap_config_info config_info_prev;
>> struct list_head mdev_list;
>> struct mutex lock;
>> struct ap_driver *vfio_ap_drv;
>> + DECLARE_BITMAP(ap_add, AP_DEVICES);
>> + DECLARE_BITMAP(aq_add, AP_DEVICES);
>> };
>>
>> extern struct ap_matrix_dev *matrix_dev;
>> @@ -90,6 +97,8 @@ struct ap_matrix_mdev {
>> struct kvm_s390_module_hook pqap_hook;
>> struct mdev_device *mdev;
>> DECLARE_HASHTABLE(qtable, 8);
>> + DECLARE_BITMAP(ap_add, AP_DEVICES);
>> + DECLARE_BITMAP(aq_add, AP_DEVICES);
>> };
>>
>> extern int vfio_ap_mdev_register(void);
>> @@ -109,4 +118,7 @@ void vfio_ap_mdev_remove_queue(struct ap_device *queue);
>>
>> int vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm);
>>
>> +void vfio_ap_on_cfg_changed(struct ap_config_info *new_config_info,
>> + struct ap_config_info *old_config_info);
>> +
>> #endif /* _VFIO_AP_PRIVATE_H_ */
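
The two review points on ap_get_configuration() (a boolean hidden behind an
int, and a second memcpy in the caller) can be illustrated with a small
userspace sketch. It is only an assumption about how such a helper might look,
not the code from the series: struct cfg_info, struct cfg_state and
cfg_refresh_and_check_changed() are hypothetical, and the freshly fetched
configuration is passed in as a parameter instead of being read through
ap_fetch_qci_info().

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for struct ap_config_info; the real layout differs. */
struct cfg_info {
	unsigned int apm[8];	/* adapter mask */
	unsigned int aqm[8];	/* usage domain mask */
	unsigned int adm[8];	/* control domain mask */
};

/* Hypothetical pair of snapshots kept across bus scans. */
struct cfg_state {
	struct cfg_info cur;	/* configuration seen by the current scan */
	struct cfg_info old;	/* configuration seen by the previous scan */
};

/*
 * Save the current snapshot as the old one, store @fetched as the new
 * current snapshot and compare the two.
 *
 * Returns true if the configuration changed since the previous call,
 * false if it is still the same.
 */
static bool cfg_refresh_and_check_changed(struct cfg_state *st,
					  const struct cfg_info *fetched)
{
	assert(st && fetched);

	st->old = st->cur;
	st->cur = *fetched;

	return memcmp(&st->cur, &st->old, sizeof(st->cur)) != 0;
}
```

Because the previous snapshot is saved inside the helper in this sketch, a
caller has no need to copy it again, and both snapshots stay distinct until
the next call, so a callback invoked at the end of the scan could still see
the old and the new configuration side by side.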
On 11/26/20 9:45 AM, Halil Pasic wrote:
> On Tue, 24 Nov 2020 16:40:04 -0500
> Tony Krowiak <[email protected]> wrote:
>
>> Let's create links between each queue device bound to the vfio_ap device
>> driver and the matrix mdev to which the queue is assigned. The idea is to
>> facilitate efficient retrieval of the objects representing the queue
>> devices and matrix mdevs as well as to verify that a queue assigned to
>> a matrix mdev is bound to the driver.
>>
>> The links will be created as follows:
>>
>> * When the queue device is probed, if its APQN is assigned to a matrix
>> mdev, the structures representing the queue device and the matrix mdev
>> will be linked.
>>
>> * When an adapter or domain is assigned to a matrix mdev, for each new
>> APQN assigned that references a queue device bound to the vfio_ap
>> device driver, the structures representing the queue device and the
>> matrix mdev will be linked.
>>
>> The links will be removed as follows:
>>
>> * When the queue device is removed, if its APQN is assigned to a matrix
>> mdev, the structures representing the queue device and the matrix mdev
>> will be unlinked.
>>
>> * When an adapter or domain is unassigned from a matrix mdev, for each
>> APQN unassigned that references a queue device bound to the vfio_ap
>> device driver, the structures representing the queue device and the
>> matrix mdev will be unlinked.
>>
>> Signed-off-by: Tony Krowiak <[email protected]>
> Actually some aspects of this look much better than last time,
> but I'm afraid there is one new issue that must be corrected -- see below.
>
>> ---
>> drivers/s390/crypto/vfio_ap_ops.c | 161 +++++++++++++++++++++++---
>> drivers/s390/crypto/vfio_ap_private.h | 3 +
>> 2 files changed, 146 insertions(+), 18 deletions(-)
>>
>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>> index dc699fd54505..07caf871943c 100644
>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -28,7 +28,6 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
>>
>> /**
>> * vfio_ap_get_queue: Retrieve a queue with a specific APQN.
>> - * @matrix_mdev: the associated mediated matrix
>> * @apqn: The queue APQN
>> *
>> * Retrieve a queue with a specific APQN from the AP queue devices attached to
>> @@ -36,32 +35,36 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
>> *
>> * Returns the pointer to the vfio_ap_queue with the specified APQN, or NULL.
>> */
>> -static struct vfio_ap_queue *vfio_ap_get_queue(
>> - struct ap_matrix_mdev *matrix_mdev,
>> - int apqn)
>> +static struct vfio_ap_queue *vfio_ap_get_queue(int apqn)
>> {
>> struct ap_queue *queue;
>> struct vfio_ap_queue *q = NULL;
>>
>> - if (!test_bit_inv(AP_QID_CARD(apqn), matrix_mdev->matrix.apm))
>> - return NULL;
>> - if (!test_bit_inv(AP_QID_QUEUE(apqn), matrix_mdev->matrix.aqm))
>> - return NULL;
>> -
>> queue = ap_get_qdev(apqn);
>> if (!queue)
>> return NULL;
>>
>> put_device(&queue->ap_dev.device);
>>
>> - if (queue->ap_dev.device.driver == &matrix_dev->vfio_ap_drv->driver) {
>> + if (queue->ap_dev.device.driver == &matrix_dev->vfio_ap_drv->driver)
>> q = dev_get_drvdata(&queue->ap_dev.device);
>> - q->matrix_mdev = matrix_mdev;
>> - }
>>
>> return q;
>> }
>>
>> +static struct vfio_ap_queue *
>> +vfio_ap_mdev_get_queue(struct ap_matrix_mdev *matrix_mdev, unsigned long apqn)
>> +{
>> + struct vfio_ap_queue *q;
>> +
>> + hash_for_each_possible(matrix_mdev->qtable, q, mdev_qnode, apqn) {
>> + if (q && (q->apqn == apqn))
>> + return q;
>> + }
>> +
>> + return NULL;
>> +}
>> +
>> /**
>> * vfio_ap_wait_for_irqclear
>> * @apqn: The AP Queue number
>> @@ -172,7 +175,6 @@ static struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q)
>> status.response_code);
>> end_free:
>> vfio_ap_free_aqic_resources(q);
>> - q->matrix_mdev = NULL;
>> return status;
>> }
>>
>> @@ -288,7 +290,7 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
>> matrix_mdev = container_of(vcpu->kvm->arch.crypto.pqap_hook,
>> struct ap_matrix_mdev, pqap_hook);
>>
>> - q = vfio_ap_get_queue(matrix_mdev, apqn);
>> + q = vfio_ap_mdev_get_queue(matrix_mdev, apqn);
>> if (!q)
>> goto out_unlock;
>>
>> @@ -331,6 +333,7 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
>>
>> matrix_mdev->mdev = mdev;
>> vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
>> + hash_init(matrix_mdev->qtable);
>> mdev_set_drvdata(mdev, matrix_mdev);
>> matrix_mdev->pqap_hook.hook = handle_pqap;
>> matrix_mdev->pqap_hook.owner = THIS_MODULE;
>> @@ -559,6 +562,87 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
>> return 0;
>> }
>>
>> +enum qlink_action {
>> + LINK_APID,
>> + LINK_APQI,
>> + UNLINK_APID,
>> + UNLINK_APQI,
>> +};
>> +
>> +static void vfio_ap_mdev_link_queue(struct ap_matrix_mdev *matrix_mdev,
>> + unsigned long apid, unsigned long apqi)
>> +{
>> + struct vfio_ap_queue *q;
>> +
>> + q = vfio_ap_get_queue(AP_MKQID(apid, apqi));
>> + if (q) {
>> + q->matrix_mdev = matrix_mdev;
>> + hash_add(matrix_mdev->qtable,
>> + &q->mdev_qnode, q->apqn);
>> + }
>> +}
>> +
>> +static void vfio_ap_mdev_unlink_queue(unsigned long apid, unsigned long apqi)
>> +{
>> + struct vfio_ap_queue *q;
>> +
>> + q = vfio_ap_get_queue(AP_MKQID(apid, apqi));
>> + if (q) {
>> + q->matrix_mdev = NULL;
>> + hash_del(&q->mdev_qnode);
>> + }
>> +}
>
>
> I would do
>
> +static void vfio_ap_mdev_unlink_queue(struct vfio_ap_queue *q)
> +{
> + if (!q)
> + return;
> + q->matrix_mdev = NULL;
> + hash_del(&q->mdev_qnode);
> +}
> +
> +static void vfio_ap_mdev_unlink_queue_by_id(unsigned long apid, unsigned long apqi)
> +{
> + struct vfio_ap_queue *q = vfio_ap_get_queue(AP_MKQID(apid, apqi));
> +
> + vfio_ap_mdev_unlink_queue(q);
> +}
I agree because of the case you made below.
>
>> +
>> +/**
>> + * vfio_ap_mdev_manage_qlinks
>> + *
>> + * @matrix_mdev: The matrix mdev to link.
>> + * @action: The action to take on @qlink_id.
>> + * @qlink_id: The APID or APQI of the queues to link.
>> + *
>> + * Sets or clears the links between the queues with the specified @qlink_id
>> + * and the @matrix_mdev:
>> + * @action == LINK_APID: Set the links between the @matrix_mdev and the
>> + * queues with the specified @qlink_id (APID)
>> + * @action == LINK_APQI: Set the links between the @matrix_mdev and the
>> + * queues with the specified @qlink_id (APQI)
>> + * @action == UNLINK_APID: Clear the links between the @matrix_mdev and the
>> + * queues with the specified @qlink_id (APID)
>> + * @action == UNLINK_APQI: Clear the links between the @matrix_mdev and the
>> + * queues with the specified @qlink_id (APQI)
>> + */
>> +static void vfio_ap_mdev_manage_qlinks(struct ap_matrix_mdev *matrix_mdev,
>> + enum qlink_action action,
>> + unsigned long qlink_id)
>> +{
>> + unsigned long id;
>> +
>> + switch (action) {
>> + case LINK_APID:
>> + for_each_set_bit_inv(id, matrix_mdev->matrix.aqm,
>> + matrix_mdev->matrix.aqm_max + 1)
>> + vfio_ap_mdev_link_queue(matrix_mdev, qlink_id, id);
>> + break;
>> + case UNLINK_APID:
>> + for_each_set_bit_inv(id, matrix_mdev->matrix.aqm,
>> + matrix_mdev->matrix.aqm_max + 1)
>> + vfio_ap_mdev_unlink_queue(qlink_id, id);
>> + break;
>> + case LINK_APQI:
>> + for_each_set_bit_inv(id, matrix_mdev->matrix.apm,
>> + matrix_mdev->matrix.apm_max + 1)
>> + vfio_ap_mdev_link_queue(matrix_mdev, id, qlink_id);
>> + break;
>> + case UNLINK_APQI:
>> + for_each_set_bit_inv(id, matrix_mdev->matrix.apm,
>> + matrix_mdev->matrix.apm_max + 1)
>> + vfio_ap_mdev_unlink_queue(id, qlink_id);
>> + break;
>> + default:
>> + WARN_ON_ONCE(1);
>> + }
>> +}
>> +
>> /**
>> * assign_adapter_store
>> *
>> @@ -628,6 +712,7 @@ static ssize_t assign_adapter_store(struct device *dev,
>> if (ret)
>> goto share_err;
>>
>> + vfio_ap_mdev_manage_qlinks(matrix_mdev, LINK_APID, apid);
>> ret = count;
>> goto done;
>>
>> @@ -679,6 +764,7 @@ static ssize_t unassign_adapter_store(struct device *dev,
>>
>> mutex_lock(&matrix_dev->lock);
>> clear_bit_inv((unsigned long)apid, matrix_mdev->matrix.apm);
>> + vfio_ap_mdev_manage_qlinks(matrix_mdev, UNLINK_APID, apid);
>> mutex_unlock(&matrix_dev->lock);
>>
>> return count;
>> @@ -769,6 +855,7 @@ static ssize_t assign_domain_store(struct device *dev,
>> if (ret)
>> goto share_err;
>>
>> + vfio_ap_mdev_manage_qlinks(matrix_mdev, LINK_APQI, apqi);
>> ret = count;
>> goto done;
>>
>> @@ -821,6 +908,7 @@ static ssize_t unassign_domain_store(struct device *dev,
>>
>> mutex_lock(&matrix_dev->lock);
>> clear_bit_inv((unsigned long)apqi, matrix_mdev->matrix.aqm);
>> + vfio_ap_mdev_manage_qlinks(matrix_mdev, UNLINK_APQI, apqi);
>> mutex_unlock(&matrix_dev->lock);
>>
>> return count;
>> @@ -1155,6 +1243,11 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
>> matrix_mdev->matrix.apm_max + 1) {
>> for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm,
>> matrix_mdev->matrix.aqm_max + 1) {
>> + q = vfio_ap_mdev_get_queue(matrix_mdev,
>> + AP_MKQID(apid, apqi));
>> + if (!q)
>> + continue;
>> +
>> ret = vfio_ap_mdev_reset_queue(apid, apqi, 1);
>> /*
>> * Regardless whether a queue turns out to be busy, or
>> @@ -1164,9 +1257,7 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
>> if (ret)
>> rc = ret;
>>
>> - q = vfio_ap_get_queue(matrix_mdev, AP_MKQID(apid, apqi));
>> - if (q)
>> - vfio_ap_free_aqic_resources(q);
>> + vfio_ap_free_aqic_resources(q);
>> }
>> }
>>
>> @@ -1292,6 +1383,29 @@ void vfio_ap_mdev_unregister(void)
>> mdev_unregister_device(&matrix_dev->device);
>> }
>>
>> +/*
>> + * vfio_ap_queue_link_mdev
>> + *
>> + * @q: The queue to link with the matrix mdev.
>> + *
>> + * Links @q with the matrix mdev to which the queue's APQN is assigned.
>> + */
>> +static void vfio_ap_queue_link_mdev(struct vfio_ap_queue *q)
>> +{
>> + unsigned long apid = AP_QID_CARD(q->apqn);
>> + unsigned long apqi = AP_QID_QUEUE(q->apqn);
>> + struct ap_matrix_mdev *matrix_mdev;
>> +
>> + list_for_each_entry(matrix_mdev, &matrix_dev->mdev_list, node) {
>> + if (test_bit_inv(apid, matrix_mdev->matrix.apm) &&
>> + test_bit_inv(apqi, matrix_mdev->matrix.aqm)) {
>> + q->matrix_mdev = matrix_mdev;
>> + hash_add(matrix_mdev->qtable, &q->mdev_qnode, q->apqn);
>> + break;
>> + }
>> + }
>> +}
>> +
>> /**
>> * vfio_ap_mdev_probe_queue:
>> *
>> @@ -1305,9 +1419,13 @@ int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
>> q = kzalloc(sizeof(*q), GFP_KERNEL);
>> if (!q)
>> return -ENOMEM;
>> + mutex_lock(&matrix_dev->lock);
>> dev_set_drvdata(&apdev->device, q);
>> q->apqn = to_ap_queue(&apdev->device)->qid;
>> q->saved_isc = VFIO_AP_ISC_INVALID;
>> + vfio_ap_queue_link_mdev(q);
>> + mutex_unlock(&matrix_dev->lock);
>> +
>> return 0;
>> }
>>
>> @@ -1328,7 +1446,14 @@ void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
>> apid = AP_QID_CARD(q->apqn);
>> apqi = AP_QID_QUEUE(q->apqn);
>> vfio_ap_mdev_reset_queue(apid, apqi, 1);
> Does it make sense to reset if !q->matrix_mdev?
This line of code was not modified from what is upstream, so I don't
think this patch or even this patch series is the appropriate place to
question this. If you feel strongly that we shouldn't reset the queue
when it is unbound from the vfio_ap device driver, we can discuss
that offline and create an individual patch specifically for that
purpose.
>
>> - vfio_ap_irq_disable(q);
>> + if (q->matrix_mdev) {
>> + if (q->matrix_mdev->kvm) {
>> + vfio_ap_free_aqic_resources(q);
> Again this belongs to the previous patch.
Actually, it belongs in patch 01/14 but I agree, it does not belong
in this patch.
>
>> + kvm_put_kvm(q->matrix_mdev->kvm);
> This kvm_put_kvm() makes no sense to me! Please explain. Where
> is the corresponding kvm_get_kvm()?
The kvm_get_kvm() is in the group notifier callback, but it definitely
doesn't belong here with this patch.
>
>> + }
>> + hash_del(&q->mdev_qnode);
>> + q->matrix_mdev = NULL;
> This should be an unlink_queue(q).
Okay.
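
For illustration only, here is a minimal user-space sketch of what a paired
link/unlink helper could look like. Everything in it is hypothetical: the
struct names, the helper names, and the use of a singly linked list in place
of the driver's hashtable are assumptions made to keep the example
self-contained, and the real helper would have to run under matrix_dev->lock.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the driver structures. */
struct queue_sketch;

struct mdev_sketch {
	struct queue_sketch *queues;	/* plain list instead of a hashtable */
};

struct queue_sketch {
	int apqn;
	struct mdev_sketch *matrix_mdev;	/* NULL while unlinked */
	struct queue_sketch *next;
};

/* Link q to mdev; the caller is assumed to hold the matrix device lock. */
static void link_queue(struct mdev_sketch *mdev, struct queue_sketch *q)
{
	q->matrix_mdev = mdev;
	q->next = mdev->queues;
	mdev->queues = q;
}

/* Undo link_queue(); a no-op for a queue that is not linked. */
static void unlink_queue(struct queue_sketch *q)
{
	struct queue_sketch **pp;

	if (!q->matrix_mdev)
		return;

	/* Walk the owner's list and splice q out of it. */
	for (pp = &q->matrix_mdev->queues; *pp; pp = &(*pp)->next) {
		if (*pp == q) {
			*pp = q->next;
			break;
		}
	}
	q->next = NULL;
	q->matrix_mdev = NULL;
}
```

Keeping both directions of the link in one helper means the remove path
cannot clear q->matrix_mdev while leaving the queue reachable from the mdev,
or the other way around.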
>
>> + }
>> kfree(q);
>> mutex_unlock(&matrix_dev->lock);
>> }
>> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
>> index d9003de4fbad..4e5cc72fc0db 100644
>> --- a/drivers/s390/crypto/vfio_ap_private.h
>> +++ b/drivers/s390/crypto/vfio_ap_private.h
>> @@ -18,6 +18,7 @@
>> #include <linux/delay.h>
>> #include <linux/mutex.h>
>> #include <linux/kvm_host.h>
>> +#include <linux/hashtable.h>
>>
>> #include "ap_bus.h"
>>
>> @@ -86,6 +87,7 @@ struct ap_matrix_mdev {
>> struct kvm *kvm;
>> struct kvm_s390_module_hook pqap_hook;
>> struct mdev_device *mdev;
>> + DECLARE_HASHTABLE(qtable, 8);
>> };
>>
>> extern int vfio_ap_mdev_register(void);
>> @@ -97,6 +99,7 @@ struct vfio_ap_queue {
>> int apqn;
>> #define VFIO_AP_ISC_INVALID 0xff
>> unsigned char saved_isc;
>> + struct hlist_node mdev_qnode;
>> };
>>
>> int vfio_ap_mdev_probe_queue(struct ap_device *queue);
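
For anyone following the hunks above without the headers at hand: the link
function splits the APQN into its APID and APQI and tests each against the
mdev's adapter and domain masks. The user-space sketch below shows that check
under two assumptions that should be verified against ap_bus.h and the s390
bitops: that the APQN carries the APID in its high byte and the APQI in its
low byte, and that the masks use MSB-0 bit numbering. The SKETCH_* macros,
the struct, and the helper names are made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: APID in bits 8..15, APQI in bits 0..7 of the APQN. */
#define SKETCH_MKQID(apid, apqi)  ((((apid) & 0xff) << 8) | ((apqi) & 0xff))
#define SKETCH_QID_CARD(qid)      (((qid) >> 8) & 0xff)
#define SKETCH_QID_QUEUE(qid)     ((qid) & 0xff)

/* Hypothetical stand-in for the matrix: 256 adapter and 256 domain bits. */
struct matrix_sketch {
	uint8_t apm[32];
	uint8_t aqm[32];
};

/* MSB-0 numbering: bit 0 is the most significant bit of byte 0. */
static void set_bit_msb0(unsigned int nr, uint8_t *map)
{
	map[nr / 8] |= (uint8_t)(0x80u >> (nr % 8));
}

static bool test_bit_msb0(unsigned int nr, const uint8_t *map)
{
	return (map[nr / 8] & (0x80u >> (nr % 8))) != 0;
}

/* True only if both the adapter and the domain of apqn are assigned. */
static bool apqn_in_matrix(const struct matrix_sketch *m, int apqn)
{
	return test_bit_msb0(SKETCH_QID_CARD(apqn), m->apm) &&
	       test_bit_msb0(SKETCH_QID_QUEUE(apqn), m->aqm);
}
```

With adapter 4 and domain 0x47 assigned, the check accepts APQN 0x0447 and
rejects an APQN whose adapter or domain is missing from the masks.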
On 11/26/20 9:08 AM, Halil Pasic wrote:
> On Tue, 24 Nov 2020 16:40:04 -0500
> Tony Krowiak wrote:
>
>> @@ -1155,6 +1243,11 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
>> matrix_mdev->matrix.apm_max + 1) {
>> for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm,
>> matrix_mdev->matrix.aqm_max + 1) {
>> + q = vfio_ap_mdev_get_queue(matrix_mdev,
>> + AP_MKQID(apid, apqi));
>> + if (!q)
>> + continue;
>> +
>> ret = vfio_ap_mdev_reset_queue(apid, apqi, 1);
>> /*
>> * Regardless whether a queue turns out to be busy, or
>> @@ -1164,9 +1257,7 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
>> if (ret)
>> rc = ret;
>>
>> - q = vfio_ap_get_queue(matrix_mdev, AP_MKQID(apid, apqi));
>> - if (q)
>> - vfio_ap_free_aqic_resources(q);
>> + vfio_ap_free_aqic_resources(q);
>> }
>> }
> During the review of v11 we discussed this. Introducing this the one
> way around, just to change it in the next patch, which should deal
> with something different makes no sense to me.
This is handled by the vfio_ap_mdev_reset_queue() function in the
next version.
>
> BTW I've provided a ton of feedback for '[PATCH v11 03/14]
> s390/vfio-ap: manage link between queue struct and matrix mdev', but I
> can't find your response to that. Some of the things resurface here, and
> I don't feel like repeating myself. Can you provide me an answer to
> the v11 version?
I can.
On 11/26/20 10:54 AM, Halil Pasic wrote:
> On Tue, 24 Nov 2020 16:40:06 -0500
> Tony Krowiak wrote:
>
>> Let's implement the callback to indicate when an APQN
>> is in use by the vfio_ap device driver. The callback is
>> invoked whenever a change to the apmask or aqmask would
>> result in one or more queue devices being removed from the driver. The
>> vfio_ap device driver will indicate a resource is in use
>> if the APQN of any of the queue devices to be removed are assigned to
>> any of the matrix mdevs under the driver's control.
>>
>> There is potential for a deadlock condition between the matrix_dev->lock
>> used to lock the matrix device during assignment of adapters and domains
>> and the ap_perms_mutex locked by the AP bus when changes are made to the
>> sysfs apmask/aqmask attributes.
>>
>> Consider the following scenario (courtesy of Halil Pasic):
>> 1) apmask_store() takes ap_perms_mutex
>> 2) assign_adapter_store() takes matrix_dev->lock
>> 3) apmask_store() calls vfio_ap_mdev_resource_in_use() which tries
>> to take matrix_dev->lock
>> 4) assign_adapter_store() calls ap_apqn_in_matrix_owned_by_def_drv
>> which tries to take ap_perms_mutex
>>
>> BANG!
>>
>> To resolve this issue, instead of using the mutex_lock(&matrix_dev->lock)
>> function to lock the matrix device during assignment of an adapter or
>> domain to a matrix_mdev as well as during the in_use callback, the
>> mutex_trylock(&matrix_dev->lock) function will be used. If the lock is not
>> obtained, then the assignment and in_use functions will terminate with
>> -EBUSY.
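
The trylock approach described in the commit message can be sketched in user
space with pthreads. This is an illustration under stated assumptions, not
the driver code: matrix_lock, assigned_adapters and assign_adapter_sketch()
are hypothetical names, and pthread_mutex_trylock() returns 0 on success,
whereas the kernel's mutex_trylock() returns 1 on success and 0 on
contention.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-ins for matrix_dev->lock and the state it guards. */
static pthread_mutex_t matrix_lock = PTHREAD_MUTEX_INITIALIZER;
static int assigned_adapters;

/*
 * Never block on the lock: if it is already held, back off with -EBUSY
 * so that the caller can retry instead of sleeping while another lock
 * is held.
 */
static int assign_adapter_sketch(void)
{
	if (pthread_mutex_trylock(&matrix_lock) != 0)
		return -EBUSY;

	assigned_adapters++;		/* the protected update */

	pthread_mutex_unlock(&matrix_lock);
	return 0;
}
```

Because neither path ever sleeps on the second lock, the circular wait from
the scenario above cannot form; the cost is that user space may see -EBUSY
and has to retry the write.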
> Good news is: the final product is OK with regards to in_use(). Bad news
> is: this patch does not do enough. At this stage we are still racy.
>
> The problem is that the assign operations don't bother to take the
> ap_perms_mutex lock under the matrix_dev->lock.
>
> The scenario is the following:
> 1) apmask_store() takes ap_perms_mutex
> 2) apmask_store() calls vfio_ap_mdev_resource_in_use() which
> takes matrix_dev->lock
> 3) vfio_ap_mdev_resource_in_use() releases matrix_dev->lock
> and returns 0
> 4) assign_adapter_store() takes matrix_dev->lock, does the
> assign (the queues are still bound to vfio_ap) and releases
> matrix_dev->lock
> 5) apmask_store() carries on, does the update to apmask and releases
> ap_perms_mutex
> 6) The queues get 'stolen' from vfio_ap while used.
You're missing an interim step between 5 and 6: apmask_store() calls
device_reprobe(), which unbinds the queues that are being taken from
vfio_ap. In that case, the vfio_ap_mdev_remove_queue() function gets
called to remove the queues, resulting in unplugging
>
> This gets fixed with "s390/vfio-ap: allow assignment of unavailable AP
> queues to mdev device". Maybe we can reorder these patches. I didn't
> look into that.
>
> We could also just ignore the problem, because it is just for a couple
> of commits, but I would prefer it gone.
Reordering the patches is not a trivial task; I prefer not to do it.
>
> Regards,
> Halil
>
>
>