2020-10-22 18:17:14

by Anthony Krowiak

Subject: [PATCH v11 00/14] s390/vfio-ap: dynamic configuration support

The current design for AP pass-through does not support making dynamic
changes to the AP matrix of a running guest, resulting in a few
deficiencies that this patch series is intended to mitigate:

1. Adapters, domains and control domains cannot be added to or removed
from a running guest. In order to modify a guest's AP configuration,
the guest must be terminated; only then can AP resources be assigned
to or unassigned from the guest's matrix mdev. The new AP
configuration becomes available to the guest when it is subsequently
restarted.

2. The AP bus's /sys/bus/ap/apmask and /sys/bus/ap/aqmask interfaces can
be modified by a root user without any restrictions. A change to
either mask can result in AP queue devices being unbound from the
vfio_ap device driver and bound to a zcrypt device driver even if a
guest is using the queues, thus giving the host access to the guest's
private crypto data and vice versa.

3. The APQNs derived from the Cartesian product of the APIDs of the
adapters and APQIs of the domains assigned to a matrix mdev must
reference an AP queue device bound to the vfio_ap device driver. The
AP architecture allows assignment of AP resources that are not
available to the system, so this artificial restriction is not
compliant with the architecture.

4. The AP configuration profile can be dynamically changed for the Linux
host after a KVM guest is started. For example, a new domain can be
dynamically added to the configuration profile via the SE or an HMC
connected to a DPM enabled LPAR. Likewise, AP adapters can be
dynamically configured (online state) and deconfigured (standby state)
using the SE, an SCLP command or an HMC connected to a DPM enabled
LPAR. This can result in inadvertent sharing of AP queues between the
guest and host.

5. A root user can manually unbind an AP queue device representing a
queue in use by a KVM guest via the vfio_ap device driver's sysfs
unbind attribute. In this case, the guest will be using a queue that
is not bound to the driver which violates the device model.

This patch series introduces the following changes to the current design
to alleviate the shortcomings described above as well as to implement
more of the AP architecture:

1. A root user will be prevented from making edits to the AP bus's
/sys/bus/ap/apmask or /sys/bus/ap/aqmask if the change would transfer
ownership of an APQN from the vfio_ap device driver to a zcrypt driver
while the APQN is assigned to a matrix mdev.

2. Allow a root user to hot plug/unplug AP adapters, domains and control
domains for a KVM guest using the matrix mdev via its sysfs
assign/unassign attributes.

3. Allow assignment of an AP adapter or domain to a matrix mdev even if
it results in assignment of an APQN that does not reference an AP
queue device bound to the vfio_ap device driver, as long as the APQN
is not reserved for use by the default zcrypt drivers (also known as
over-provisioning of AP resources). Allowing over-provisioning of AP
resources better models the architecture which does not preclude
assigning AP resources that are not yet available in the system. Such
APQNs, however, will not be assigned to the guest using the matrix
mdev; only APQNs referencing AP queue devices bound to the vfio_ap
device driver will actually get assigned to the guest.

4. Handle dynamic changes to the AP device model.

1. Rationale for changes to AP bus's apmask/aqmask interfaces:
----------------------------------------------------------
Due to the extremely sensitive nature of cryptographic data, it is
imperative that great care be taken to ensure that such data is secured.
Allowing a root user, either inadvertently or maliciously, to configure
these masks such that a queue is shared between the host and a guest is
not only avoidable; preventing it is advisable. It was suggested that this scenario
is better handled in user space with management software, but that does
not preclude a malicious administrator from using the sysfs interfaces
to gain access to a guest's crypto data. It was also suggested that this
scenario could be avoided by taking access to the adapter away from the
guest and zeroing out the queues prior to the vfio_ap driver releasing the
device; however, stealing an adapter in use from a guest as a by-product
of an operation is bad and will likely cause problems for the guest
unnecessarily. It was decided that the most effective solution with the
least number of negative side effects is to prevent the situation at the
source.
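
The check that backs this restriction can be pictured with a small
userspace sketch. It is not the driver code: the struct and the sketch_*
helpers are hypothetical names, and the masks use plain LSB-first bit
order rather than the kernel's inverted bit helpers. It only illustrates
the rule that a mask change is refused when both an APID and an APQI of
the affected queues overlap with some matrix mdev.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define AP_BITS  256                /* APIDs and APQIs both range over 0..255 */
#define AP_WORDS (AP_BITS / 64)

/* Hypothetical stand-in for the apm/aqm pair of one matrix mdev. */
struct sketch_matrix {
	uint64_t apm[AP_WORDS];     /* adapters (APIDs) assigned to the mdev */
	uint64_t aqm[AP_WORDS];     /* domains (APQIs) assigned to the mdev */
};

/* Set bit nr in a 256-bit mask (plain bit order, an assumption of this sketch). */
static void sketch_set_bit(uint64_t *mask, unsigned int nr)
{
	assert(nr < AP_BITS);
	mask[nr / 64] |= UINT64_C(1) << (nr % 64);
}

/* Return true if the two masks have at least one bit in common. */
static bool sketch_intersects(const uint64_t *a, const uint64_t *b)
{
	for (size_t i = 0; i < AP_WORDS; i++)
		if (a[i] & b[i])
			return true;
	return false;
}

/*
 * Return true if any APQN in the cross product apm x aqm is assigned to
 * one of the n matrix mdevs. An APQN is shared only when both its APID
 * and its APQI overlap with the same mdev.
 */
static bool sketch_resource_in_use(const struct sketch_matrix *mdevs, size_t n,
				   const uint64_t *apm, const uint64_t *aqm)
{
	for (size_t i = 0; i < n; i++)
		if (sketch_intersects(apm, mdevs[i].apm) &&
		    sketch_intersects(aqm, mdevs[i].aqm))
			return true;
	return false;
}
```

A caller would pass the APIDs and APQIs of the queues that the mask change
would take away from the vfio_ap driver, and reject the change when the
function returns true.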

2. Rationale for hot plug/unplug using matrix mdev sysfs interfaces:
----------------------------------------------------------------
Allowing a user to hot plug/unplug AP resources using the matrix mdev
sysfs interfaces circumvents the need to terminate the guest in order to
modify its AP configuration. Allowing dynamic configuration makes
reconfiguring a guest's AP matrix much less disruptive.
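
As a rough illustration of driving these interfaces from a program, the
sketch below builds the path of an mdev attribute and writes an ID to it.
The directory /sys/devices/vfio_ap/matrix/$uuid and the assign_adapter
attribute name come from this series; the sketch_* helper names, the
buffer sizes and the "0x%lx" formatting are assumptions made for the
example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_MATRIX_DIR "/sys/devices/vfio_ap/matrix"

/*
 * Build "<matrix dir>/<uuid>/<attr>" in buf.
 * Returns 0 on success, -1 if the result does not fit in len bytes.
 */
static int sketch_attr_path(char *buf, size_t len, const char *uuid,
			    const char *attr)
{
	int n;

	assert(buf && uuid && attr);
	n = snprintf(buf, len, "%s/%s/%s", SKETCH_MATRIX_DIR, uuid, attr);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Write an adapter or domain number to a sysfs attribute.
 * Returns 0 on success, -1 if the file cannot be opened or the write
 * is rejected (sysfs errors typically surface when the stream is flushed).
 */
static int sketch_write_id(const char *path, unsigned long id)
{
	FILE *f = fopen(path, "w");
	int rc;

	if (!f)
		return -1;
	rc = fprintf(f, "0x%lx\n", id) < 0 ? -1 : 0;
	if (fclose(f) != 0)
		rc = -1;
	return rc;
}
```

For example, a caller could build the path for "assign_adapter" with the
mdev's UUID and then call sketch_write_id(path, 5) to request a hot plug
of adapter 5; the same pattern would apply to the unassign attributes.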

3. Rationale for allowing over-provisioning of AP resources:
-----------------------------------------------------------
Allowing assignment of unavailable AP resources to a matrix mdev, and
ultimately to a guest, better models the AP architecture. The architecture
does not preclude assignment of unavailable AP resources. If a queue
subsequently becomes available while a guest is using the matrix mdev to
which its APQN is assigned, the guest will be given access to it. If an APQN
is dynamically unassigned from the underlying host system, it will
automatically become unavailable to the guest.
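
The effect of over-provisioning on what a guest actually sees can be
sketched as a filter over the mdev's matrix. This is a simplified
userspace model, not the driver's implementation: the struct, the bound[]
lookup table and the sketch_* names are hypothetical, and the APQN packing
(APID in the high byte, APQI in the low byte) is assumed to match the
kernel's AP_MKQID() macro. An adapter is passed to the guest only if every
APQN it forms with the assigned domains references a bound queue device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define AP_MAX 256                  /* APIDs and APQIs both range over 0..255 */

/* Hypothetical model of an AP configuration: one flag per APID and APQI. */
struct sketch_cfg {
	bool apid[AP_MAX];
	bool apqi[AP_MAX];
};

/* Pack an APID and APQI into an APQN (assumed to mirror AP_MKQID()). */
static uint16_t sketch_mkqid(unsigned int apid, unsigned int apqi)
{
	return (uint16_t)(((apid & 0xff) << 8) | (apqi & 0xff));
}

/*
 * Derive the guest configuration from the mdev configuration.
 * bound[] has 65536 entries; bound[apqn] is true when a queue device for
 * that APQN is bound to the vfio_ap driver. An APID is dropped from the
 * guest as soon as one of its APQNs turns out to be unbound.
 */
static void sketch_filter_by_apid(const struct sketch_cfg *mdev,
				  const bool *bound,
				  struct sketch_cfg *guest)
{
	assert(mdev && bound && guest);
	memcpy(guest, mdev, sizeof(*guest));

	for (unsigned int apid = 0; apid < AP_MAX; apid++) {
		if (!mdev->apid[apid])
			continue;
		for (unsigned int apqi = 0; apqi < AP_MAX; apqi++) {
			if (!mdev->apqi[apqi])
				continue;
			if (!bound[sketch_mkqid(apid, apqi)]) {
				guest->apid[apid] = false;
				break;
			}
		}
	}
}
```

The mdev configuration itself is left untouched, so an adapter filtered
out here can be handed to the guest later, once its missing queue devices
are bound and the filter is run again.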

Change log v10-v11:
------------------
* The matrix mdev's configuration is now filtered by APID so that if any
APQN assigned to the mdev is not bound to the vfio_ap device driver,
the adapter will not get plugged into the KVM guest on startup, or when
a new adapter is assigned to the mdev.

* Replaced patch 8 by squashing patches 8 (filtering patch) and 15 (handle
probe/remove).

* Added a patch 1 to remove disable IRQ after a reset because the reset
already disables a queue.

* Now using filtering code to update the KVM guest's matrix when
notified that AP bus scan has completed.

* Fixed issue with probe/remove not initiated by a configuration change
occurring within a config change.


Change log v9-v10:
-----------------
* Updated the documentation in vfio-ap.rst to include information about the
AP dynamic configuration support

Change log v8-v9:
----------------
* Fixed errors flagged by the kernel test robot

* Fixed issue with guest losing queues when a new queue is probed due to
manual bind operation.

Change log v7-v8:
----------------
* Now logging a message when an attempt to reserve APQNs for the zcrypt
drivers will result in taking a queue away from a KVM guest to provide
the sysadmin a way to ascertain why the sysfs operation failed.

* Created locked and unlocked versions of the ap_parse_mask_str() function.

* Now using new interface provided by an AP bus patch -
s390/ap: introduce new ap function ap_get_qdev() - to retrieve
struct ap_queue representing an AP queue device. This patch is not a
part of this series but is a prerequisite for this series.

Change log v6-v7:
----------------
* Added callbacks to AP bus:
- on_config_changed: Notifies implementing drivers that
the AP configuration has changed since last AP device scan.
- on_scan_complete: Notifies implementing drivers that the device scan
has completed.
- implemented on_config_changed and on_scan_complete callbacks for
vfio_ap device driver.
- updated vfio_ap device driver's probe and remove callbacks to handle
dynamic changes to the AP device model.
* Added code to filter APQNs when assigning AP resources to a KVM guest's
CRYCB

Change log v5-v6:
----------------
* Fixed a bug in ap_bus.c introduced with patch 2/7 of the v5
series. Harald Freudenberger pointed out that the mutex lock
for ap_perms_mutex in the apmask_store and aqmask_store functions
was not being released.

* Removed patch 6/7 which added logging to the vfio_ap driver
to expedite acceptance of this series. The logging will be introduced
with a separate patch series to allow more time to explore options
such as DBF logging vs. tracepoints.

* Added 3 patches related to ensuring that APQNs that do not reference
AP queue devices bound to the vfio_ap device driver are not assigned
to the guest CRYCB:

Patch 4: Filter CRYCB bits for unavailable queue devices
Patch 5: sysfs attribute to display the guest CRYCB
Patch 6: update guest CRYCB in vfio_ap probe and remove callbacks

* Added a patch (Patch 9) to version the vfio_ap module.

* Reshuffled patches to allow the in_use callback implementation to
invoke the vfio_ap_mdev_verify_no_sharing() function introduced in
patch 2.

Change log v4-v5:
----------------
* Added a patch to provide kernel s390dbf debug logs for VFIO AP

Change log v3->v4:
-----------------
* Restored patches preventing root user from changing ownership of
APQNs from zcrypt drivers to the vfio_ap driver if the APQN is
assigned to an mdev.

* No longer enforcing requirement restricting guest access to
queues represented by a queue device bound to the vfio_ap
device driver.

* Removed shadow CRYCB and now directly updating the guest CRYCB
from the matrix mdev's matrix.

* Rebased the patch series on top of 'vfio: ap: AP Queue Interrupt
Control' patches.

* Disabled bind/unbind sysfs interfaces for vfio_ap driver

Change log v2->v3:
-----------------
* Allow guest access to an AP queue only if the queue is bound to
the vfio_ap device driver.

* Removed the patch to test CRYCB masks before taking the vCPUs
out of SIE. Now checking the shadow CRYCB in the vfio_ap driver.

Change log v1->v2:
-----------------
* Removed patches preventing root user from unbinding AP queues from
the vfio_ap device driver
* Introduced a shadow CRYCB in the vfio_ap driver to manage dynamic
changes to the AP guest configuration due to root user interventions
or hardware anomalies.


Tony Krowiak (14):
s390/vfio-ap: No need to disable IRQ after queue reset
s390/vfio-ap: use new AP bus interface to search for queue devices
s390/vfio-ap: manage link between queue struct and matrix mdev
s390/zcrypt: driver callback to indicate resource in use
s390/vfio-ap: implement in-use callback for vfio_ap driver
s390/vfio-ap: introduce shadow APCB
s390/vfio-ap: sysfs attribute to display the guest's matrix
s390/vfio-ap: hot plug/unplug queues on bind/unbind of queue device
s390/vfio-ap: allow assignment of unavailable AP queues to mdev device
s390/vfio-ap: allow hot plug/unplug of AP resources using mdev device
s390/zcrypt: Notify driver on config changed and scan complete
callbacks
s390/vfio-ap: handle host AP config change notification
s390/vfio-ap: handle AP bus scan completed notification
s390/vfio-ap: update docs to include dynamic config support

Documentation/s390/vfio-ap.rst | 362 ++++++--
drivers/s390/crypto/ap_bus.c | 236 +++++-
drivers/s390/crypto/ap_bus.h | 16 +
drivers/s390/crypto/vfio_ap_drv.c | 52 +-
drivers/s390/crypto/vfio_ap_ops.c | 1091 +++++++++++++++++++------
drivers/s390/crypto/vfio_ap_private.h | 29 +-
6 files changed, 1384 insertions(+), 402 deletions(-)

--
2.21.1


2020-10-22 18:17:16

by Anthony Krowiak

Subject: [PATCH v11 05/14] s390/vfio-ap: implement in-use callback for vfio_ap driver

Let's implement the callback to indicate when an APQN
is in use by the vfio_ap device driver. The callback is
invoked whenever a change to the apmask or aqmask would
result in one or more queue devices being removed from the driver. The
vfio_ap device driver will indicate a resource is in use
if the APQN of any of the queue devices to be removed is assigned to
any of the matrix mdevs under the driver's control.

Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/vfio_ap_drv.c | 1 +
drivers/s390/crypto/vfio_ap_ops.c | 78 +++++++++++++++++++--------
drivers/s390/crypto/vfio_ap_private.h | 2 +
3 files changed, 60 insertions(+), 21 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
index 73bd073fd5d3..8934471b7944 100644
--- a/drivers/s390/crypto/vfio_ap_drv.c
+++ b/drivers/s390/crypto/vfio_ap_drv.c
@@ -147,6 +147,7 @@ static int __init vfio_ap_init(void)
memset(&vfio_ap_drv, 0, sizeof(vfio_ap_drv));
vfio_ap_drv.probe = vfio_ap_mdev_probe_queue;
vfio_ap_drv.remove = vfio_ap_mdev_remove_queue;
+ vfio_ap_drv.in_use = vfio_ap_mdev_resource_in_use;
vfio_ap_drv.ids = ap_queue_ids;

ret = ap_driver_register(&vfio_ap_drv, THIS_MODULE, VFIO_AP_DRV_NAME);
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 1357f8f8b7e4..9e9fad560859 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -522,18 +522,40 @@ vfio_ap_mdev_verify_queues_reserved_for_apid(struct ap_matrix_mdev *matrix_mdev,
return 0;
}

+#define MDEV_SHARING_ERR "Userspace may not re-assign queue %02lx.%04lx " \
+ "already assigned to %s"
+
+static void vfio_ap_mdev_log_sharing_err(const char *mdev_name,
+ unsigned long *apm,
+ unsigned long *aqm)
+{
+ unsigned long apid, apqi;
+
+ for_each_set_bit_inv(apid, apm, AP_DEVICES)
+ for_each_set_bit_inv(apqi, aqm, AP_DOMAINS)
+ pr_err(MDEV_SHARING_ERR, apid, apqi, mdev_name);
+}
+
/**
* vfio_ap_mdev_verify_no_sharing
*
- * Verifies that the APQNs derived from the cross product of the AP adapter IDs
- * and AP queue indexes comprising the AP matrix are not configured for another
+ * Verifies that each APQN derived from the cross product of the AP adapter IDs
+ * and AP queue indexes comprising an AP matrix is not assigned to a
* mediated device. AP queue sharing is not allowed.
*
- * @matrix_mdev: the mediated matrix device
+ * @matrix_mdev: the mediated matrix device to which the APQNs being verified
+ * are assigned. If the value is not NULL, then verification will
+ * proceed for all other matrix mediated devices; otherwise, all
+ * matrix mediated devices will be verified.
+ * @mdev_apm: mask indicating the APIDs of the APQNs to be verified
+ * @mdev_aqm: mask indicating the APQIs of the APQNs to be verified
*
- * Returns 0 if the APQNs are not shared, otherwise; returns -EADDRINUSE.
+ * Returns 0 if no APQNs are shared; otherwise, returns -EADDRINUSE if one
+ * or more APQNs are shared.
*/
-static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
+static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev,
+ unsigned long *mdev_apm,
+ unsigned long *mdev_aqm)
{
struct ap_matrix_mdev *lstdev;
DECLARE_BITMAP(apm, AP_DEVICES);
@@ -550,14 +572,15 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
* We work on full longs, as we can only exclude the leftover
* bits in non-inverse order. The leftover is all zeros.
*/
- if (!bitmap_and(apm, matrix_mdev->matrix.apm,
- lstdev->matrix.apm, AP_DEVICES))
+ if (!bitmap_and(apm, mdev_apm, lstdev->matrix.apm, AP_DEVICES))
continue;

- if (!bitmap_and(aqm, matrix_mdev->matrix.aqm,
- lstdev->matrix.aqm, AP_DOMAINS))
+ if (!bitmap_and(aqm, mdev_aqm, lstdev->matrix.aqm, AP_DOMAINS))
continue;

+ vfio_ap_mdev_log_sharing_err(dev_name(mdev_dev(lstdev->mdev)),
+ apm, aqm);
+
return -EADDRINUSE;
}

@@ -683,6 +706,7 @@ static ssize_t assign_adapter_store(struct device *dev,
{
int ret;
unsigned long apid;
+ DECLARE_BITMAP(apm, AP_DEVICES);
struct mdev_device *mdev = mdev_from_dev(dev);
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);

@@ -708,18 +732,18 @@ static ssize_t assign_adapter_store(struct device *dev,
if (ret)
goto done;

- set_bit_inv(apid, matrix_mdev->matrix.apm);
+ memset(apm, 0, sizeof(apm));
+ set_bit_inv(apid, apm);

- ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev);
+ ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev, apm,
+ matrix_mdev->matrix.aqm);
if (ret)
- goto share_err;
+ goto done;

+ set_bit_inv(apid, matrix_mdev->matrix.apm);
vfio_ap_mdev_link_queues(matrix_mdev, LINK_APID, apid);
ret = count;
- goto done;

-share_err:
- clear_bit_inv(apid, matrix_mdev->matrix.apm);
done:
mutex_unlock(&matrix_dev->lock);

@@ -831,6 +855,7 @@ static ssize_t assign_domain_store(struct device *dev,
{
int ret;
unsigned long apqi;
+ DECLARE_BITMAP(aqm, AP_DOMAINS);
struct mdev_device *mdev = mdev_from_dev(dev);
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
unsigned long max_apqi = matrix_mdev->matrix.aqm_max;
@@ -851,18 +876,18 @@ static ssize_t assign_domain_store(struct device *dev,
if (ret)
goto done;

- set_bit_inv(apqi, matrix_mdev->matrix.aqm);
+ memset(aqm, 0, sizeof(aqm));
+ set_bit_inv(apqi, aqm);

- ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev);
+ ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev,
+ matrix_mdev->matrix.apm, aqm);
if (ret)
- goto share_err;
+ goto done;

+ set_bit_inv(apqi, matrix_mdev->matrix.aqm);
vfio_ap_mdev_link_queues(matrix_mdev, LINK_APQI, apqi);
ret = count;
- goto done;

-share_err:
- clear_bit_inv(apqi, matrix_mdev->matrix.aqm);
done:
mutex_unlock(&matrix_dev->lock);

@@ -1442,3 +1467,14 @@ void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
kfree(q);
mutex_unlock(&matrix_dev->lock);
}
+
+bool vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm)
+{
+ bool in_use;
+
+ mutex_lock(&matrix_dev->lock);
+ in_use = !!vfio_ap_mdev_verify_no_sharing(NULL, apm, aqm);
+ mutex_unlock(&matrix_dev->lock);
+
+ return in_use;
+}
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index 4e5cc72fc0db..c1d8b5507610 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -105,4 +105,6 @@ struct vfio_ap_queue {
int vfio_ap_mdev_probe_queue(struct ap_device *queue);
void vfio_ap_mdev_remove_queue(struct ap_device *queue);

+bool vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm);
+
#endif /* _VFIO_AP_PRIVATE_H_ */
--
2.21.1

2020-10-22 18:17:18

by Anthony Krowiak

Subject: [PATCH v11 10/14] s390/vfio-ap: allow hot plug/unplug of AP resources using mdev device

Let's hot plug/unplug adapters, domains and control domains assigned to or
unassigned from an AP matrix mdev device while it is in use by a guest per
the following:

* Hot plug AP adapter:

When the APID of an adapter is assigned to a matrix mdev in use by a KVM
guest, the adapter will be hot plugged into the KVM guest as long as each
APQN derived from the Cartesian product of the APID being assigned and
the APQIs already assigned to the matrix mdev references a queue device
bound to the vfio_ap device driver.

* Hot unplug adapter:

When the APID of an adapter is unassigned from a matrix mdev in use by a
KVM guest, the adapter will be hot unplugged from the KVM guest.

* Hot plug domain:

When the APQI of a domain is assigned to a matrix mdev in use by a KVM
guest, the domain will be hot plugged into the KVM guest as long as each
APQN derived from the Cartesian product of the APQI being assigned and
the APIDs already assigned to the matrix mdev references a queue device
bound to the vfio_ap device driver.

* Hot unplug domain:

When the APQI of a domain is unassigned from a matrix mdev in use by a
KVM guest, the domain will be hot unplugged from the KVM guest.

* Hot plug control domain:

When the domain number of a control domain is assigned to a matrix mdev
in use by a KVM guest, the control domain will be hot plugged into the
KVM guest. The AP architecture ensures a guest will only get access to
the control domain if it is in the host's AP configuration, so there is
no risk in hot plugging it; however, it will become automatically
available to the guest when it is added to the host configuration.

* Hot unplug control domain:

When the domain number of a control domain is unassigned from a matrix
mdev in use by a KVM guest, the control domain will be hot unplugged
from the KVM guest.

Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/vfio_ap_ops.c | 148 ++++++++++++++++++++++++------
1 file changed, 119 insertions(+), 29 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index c2c6dcec8829..dae1fba41941 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -517,12 +517,18 @@ static bool vfio_ap_mdev_matrixes_equal(struct ap_matrix *matrix1,
* a matrix mdev's AP configuration and stores the result in the shadow copy of
* the APCB used to supply a KVM guest's AP configuration.
*
+ * Note: Filtering is applied only to adapters and domains. Changes to control
+ * domains will always be reflected in the shadow APCB.
+ *
* @matrix_mdev: the matrix mdev whose AP configuration is to be filtered
+ * @filter_apid: indicates whether APIDs (true) or APQIs (false) shall be
+ * filtered
*
* Returns true if filtering has changed the shadow copy of the APCB used
* to supply a KVM guest's AP configuration; otherwise, returns false.
*/
-static int vfio_ap_mdev_filter_guest_matrix(struct ap_matrix_mdev *matrix_mdev)
+static int vfio_ap_mdev_filter_guest_matrix(struct ap_matrix_mdev *matrix_mdev,
+ bool filter_apid)
{
struct ap_matrix shadow_apcb;
unsigned long apid, apqi, apqn;
@@ -561,9 +567,15 @@ static int vfio_ap_mdev_filter_guest_matrix(struct ap_matrix_mdev *matrix_mdev)
* the APID.
*/
apqn = AP_MKQID(apid, apqi);
+
if (!vfio_ap_mdev_get_queue(matrix_mdev, apqn)) {
- clear_bit_inv(apid, shadow_apcb.apm);
- break;
+ if (filter_apid) {
+ clear_bit_inv(apid, shadow_apcb.apm);
+ break;
+ }
+
+ clear_bit_inv(apqi, shadow_apcb.aqm);
+ continue;
}
}

@@ -723,10 +735,6 @@ static ssize_t assign_adapter_store(struct device *dev,
struct mdev_device *mdev = mdev_from_dev(dev);
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);

- /* If the guest is running, disallow assignment of adapter */
- if (matrix_mdev->kvm)
- return -EBUSY;
-
ret = kstrtoul(buf, 0, &apid);
if (ret)
return ret;
@@ -746,12 +754,44 @@ static ssize_t assign_adapter_store(struct device *dev,
}
set_bit_inv(apid, matrix_mdev->matrix.apm);
vfio_ap_mdev_link_queues(matrix_mdev, LINK_APID, apid);
+
+ if (vfio_ap_mdev_has_crycb(matrix_mdev))
+ if (vfio_ap_mdev_filter_guest_matrix(matrix_mdev, true))
+ vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
mutex_unlock(&matrix_dev->lock);

return count;
}
static DEVICE_ATTR_WO(assign_adapter);

+static bool vfio_ap_mdev_unassign_guest_apid(struct ap_matrix_mdev *matrix_mdev,
+ unsigned long apid)
+{
+ if (vfio_ap_mdev_has_crycb(matrix_mdev)) {
+ if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm)) {
+ clear_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
+
+ /*
+ * If there are no APIDs assigned to the guest, then
+ * the guest will not have access to any queues, so
+ * let's also go ahead and unassign the APQIs. Keeping
+ * them around may yield unpredictable results during
+ * a probe that is not related to a host AP
+ * configuration change (i.e., an AP adapter is
+ * configured online).
+ */
+ if (bitmap_empty(matrix_mdev->shadow_apcb.apm,
+ AP_DEVICES))
+ bitmap_clear(matrix_mdev->shadow_apcb.aqm, 0,
+ AP_DOMAINS);
+
+ return true;
+ }
+ }
+
+ return false;
+}
+
/**
* unassign_adapter_store
*
@@ -778,10 +818,6 @@ static ssize_t unassign_adapter_store(struct device *dev,
struct mdev_device *mdev = mdev_from_dev(dev);
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);

- /* If the guest is running, disallow un-assignment of adapter */
- if (matrix_mdev->kvm)
- return -EBUSY;
-
ret = kstrtoul(buf, 0, &apid);
if (ret)
return ret;
@@ -792,6 +828,9 @@ static ssize_t unassign_adapter_store(struct device *dev,
mutex_lock(&matrix_dev->lock);
clear_bit_inv((unsigned long)apid, matrix_mdev->matrix.apm);
vfio_ap_mdev_link_queues(matrix_mdev, UNLINK_APID, apid);
+
+ if (vfio_ap_mdev_unassign_guest_apid(matrix_mdev, apid))
+ vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
mutex_unlock(&matrix_dev->lock);

return count;
@@ -841,10 +880,6 @@ static ssize_t assign_domain_store(struct device *dev,
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
unsigned long max_apqi = matrix_mdev->matrix.aqm_max;

- /* If the guest is running, disallow assignment of domain */
- if (matrix_mdev->kvm)
- return -EBUSY;
-
ret = kstrtoul(buf, 0, &apqi);
if (ret)
return ret;
@@ -863,12 +898,43 @@ static ssize_t assign_domain_store(struct device *dev,
}
set_bit_inv(apqi, matrix_mdev->matrix.aqm);
vfio_ap_mdev_link_queues(matrix_mdev, LINK_APQI, apqi);
+
+ if (vfio_ap_mdev_has_crycb(matrix_mdev))
+ if (vfio_ap_mdev_filter_guest_matrix(matrix_mdev, false))
+ vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
mutex_unlock(&matrix_dev->lock);

return count;
}
static DEVICE_ATTR_WO(assign_domain);

+static bool vfio_ap_mdev_unassign_guest_apqi(struct ap_matrix_mdev *matrix_mdev,
+ unsigned long apqi)
+{
+ if (vfio_ap_mdev_has_crycb(matrix_mdev)) {
+ if (test_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm)) {
+ clear_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm);
+
+ /*
+ * If there are no APQIs assigned to the guest, then
+ * the guest will not have access to any queues, so
+ * let's also go ahead and unassign the APIDs. Keeping
+ * them around may yield unpredictable results during
+ * a probe that is not related to a host AP
+ * configuration change (i.e., an AP adapter is
+ * configured online).
+ */
+ if (bitmap_empty(matrix_mdev->shadow_apcb.aqm,
+ AP_DOMAINS))
+ bitmap_clear(matrix_mdev->shadow_apcb.apm, 0,
+ AP_DEVICES);
+
+ return true;
+ }
+ }
+
+ return false;
+}

/**
* unassign_domain_store
@@ -896,10 +962,6 @@ static ssize_t unassign_domain_store(struct device *dev,
struct mdev_device *mdev = mdev_from_dev(dev);
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);

- /* If the guest is running, disallow un-assignment of domain */
- if (matrix_mdev->kvm)
- return -EBUSY;
-
ret = kstrtoul(buf, 0, &apqi);
if (ret)
return ret;
@@ -910,12 +972,29 @@ static ssize_t unassign_domain_store(struct device *dev,
mutex_lock(&matrix_dev->lock);
clear_bit_inv((unsigned long)apqi, matrix_mdev->matrix.aqm);
vfio_ap_mdev_link_queues(matrix_mdev, UNLINK_APQI, apqi);
+
+ if (vfio_ap_mdev_unassign_guest_apqi(matrix_mdev, apqi))
+ vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
mutex_unlock(&matrix_dev->lock);

return count;
}
static DEVICE_ATTR_WO(unassign_domain);

+static bool vfio_ap_mdev_assign_guest_cdom(struct ap_matrix_mdev *matrix_mdev,
+ unsigned long domid)
+{
+ if (vfio_ap_mdev_has_crycb(matrix_mdev)) {
+ if (!test_bit_inv(domid, matrix_mdev->shadow_apcb.adm)) {
+ set_bit_inv(domid, matrix_mdev->shadow_apcb.adm);
+
+ return true;
+ }
+ }
+
+ return false;
+}
+
/**
* assign_control_domain_store
*
@@ -941,10 +1020,6 @@ static ssize_t assign_control_domain_store(struct device *dev,
struct mdev_device *mdev = mdev_from_dev(dev);
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);

- /* If the guest is running, disallow assignment of control domain */
- if (matrix_mdev->kvm)
- return -EBUSY;
-
ret = kstrtoul(buf, 0, &id);
if (ret)
return ret;
@@ -959,12 +1034,29 @@ static ssize_t assign_control_domain_store(struct device *dev,
*/
mutex_lock(&matrix_dev->lock);
set_bit_inv(id, matrix_mdev->matrix.adm);
+ if (vfio_ap_mdev_assign_guest_cdom(matrix_mdev, id))
+ vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
mutex_unlock(&matrix_dev->lock);

return count;
}
static DEVICE_ATTR_WO(assign_control_domain);

+static bool
+vfio_ap_mdev_unassign_guest_cdom(struct ap_matrix_mdev *matrix_mdev,
+ unsigned long domid)
+{
+ if (vfio_ap_mdev_has_crycb(matrix_mdev)) {
+ if (test_bit_inv(domid, matrix_mdev->shadow_apcb.adm)) {
+ clear_bit_inv(domid, matrix_mdev->shadow_apcb.adm);
+
+ return true;
+ }
+ }
+
+ return false;
+}
+
/**
* unassign_control_domain_store
*
@@ -991,10 +1083,6 @@ static ssize_t unassign_control_domain_store(struct device *dev,
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
unsigned long max_domid = matrix_mdev->matrix.adm_max;

- /* If the guest is running, disallow un-assignment of control domain */
- if (matrix_mdev->kvm)
- return -EBUSY;
-
ret = kstrtoul(buf, 0, &domid);
if (ret)
return ret;
@@ -1003,6 +1091,8 @@ static ssize_t unassign_control_domain_store(struct device *dev,

mutex_lock(&matrix_dev->lock);
clear_bit_inv(domid, matrix_mdev->matrix.adm);
+ if (vfio_ap_mdev_unassign_guest_cdom(matrix_mdev, domid))
+ vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
mutex_unlock(&matrix_dev->lock);

return count;
@@ -1216,7 +1306,7 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
if (!vfio_ap_mdev_has_crycb(matrix_mdev))
return NOTIFY_DONE;

- if (vfio_ap_mdev_filter_guest_matrix(matrix_mdev))
+ if (vfio_ap_mdev_filter_guest_matrix(matrix_mdev, true))
vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);

return NOTIFY_OK;
@@ -1443,7 +1533,7 @@ static void vfio_ap_mdev_hot_plug_queue(struct vfio_ap_queue *q)
if ((q->matrix_mdev == NULL) || !vfio_ap_mdev_has_crycb(q->matrix_mdev))
return;

- if (vfio_ap_mdev_filter_guest_matrix(q->matrix_mdev))
+ if (vfio_ap_mdev_filter_guest_matrix(q->matrix_mdev, true))
vfio_ap_mdev_commit_shadow_apcb(q->matrix_mdev);
}

--
2.21.1

2020-10-22 18:17:28

by Anthony Krowiak

Subject: [PATCH v11 06/14] s390/vfio-ap: introduce shadow APCB

The APCB is a field within the CRYCB that provides the AP configuration
to a KVM guest. Let's introduce a shadow copy of the KVM guest's APCB and
maintain it for the lifespan of the guest.

Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/vfio_ap_ops.c | 24 +++++++++++++++++++-----
drivers/s390/crypto/vfio_ap_private.h | 2 ++
2 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 9e9fad560859..9791761aa7fd 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -320,6 +320,19 @@ static void vfio_ap_matrix_init(struct ap_config_info *info,
matrix->adm_max = info->apxa ? info->Nd : 15;
}

+static bool vfio_ap_mdev_has_crycb(struct ap_matrix_mdev *matrix_mdev)
+{
+ return (matrix_mdev->kvm && matrix_mdev->kvm->arch.crypto.crycbd);
+}
+
+static void vfio_ap_mdev_commit_shadow_apcb(struct ap_matrix_mdev *matrix_mdev)
+{
+ kvm_arch_crypto_set_masks(matrix_mdev->kvm,
+ matrix_mdev->shadow_apcb.apm,
+ matrix_mdev->shadow_apcb.aqm,
+ matrix_mdev->shadow_apcb.adm);
+}
+
static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
{
struct ap_matrix_mdev *matrix_mdev;
@@ -335,6 +348,7 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)

matrix_mdev->mdev = mdev;
vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
+ vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->shadow_apcb);
hash_init(matrix_mdev->qtable);
mdev_set_drvdata(mdev, matrix_mdev);
matrix_mdev->pqap_hook.hook = handle_pqap;
@@ -1213,13 +1227,12 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
if (ret)
return NOTIFY_DONE;

- /* If there is no CRYCB pointer, then we can't copy the masks */
- if (!matrix_mdev->kvm->arch.crypto.crycbd)
+ if (!vfio_ap_mdev_has_crycb(matrix_mdev))
return NOTIFY_DONE;

- kvm_arch_crypto_set_masks(matrix_mdev->kvm, matrix_mdev->matrix.apm,
- matrix_mdev->matrix.aqm,
- matrix_mdev->matrix.adm);
+ memcpy(&matrix_mdev->shadow_apcb, &matrix_mdev->matrix,
+ sizeof(matrix_mdev->shadow_apcb));
+ vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);

return NOTIFY_OK;
}
@@ -1329,6 +1342,7 @@ static void vfio_ap_mdev_release(struct mdev_device *mdev)
kvm_put_kvm(matrix_mdev->kvm);
matrix_mdev->kvm = NULL;
}
+
mutex_unlock(&matrix_dev->lock);

vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index c1d8b5507610..fc8634cee485 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -75,6 +75,7 @@ struct ap_matrix {
* @list: allows the ap_matrix_mdev struct to be added to a list
* @matrix: the adapters, usage domains and control domains assigned to the
* mediated matrix device.
+ * @shadow_apcb: the shadow copy of the APCB field of the KVM guest's CRYCB
* @group_notifier: notifier block used for specifying callback function for
* handling the VFIO_GROUP_NOTIFY_SET_KVM event
* @kvm: the struct holding guest's state
@@ -82,6 +83,7 @@ struct ap_matrix {
struct ap_matrix_mdev {
struct list_head node;
struct ap_matrix matrix;
+ struct ap_matrix shadow_apcb;
struct notifier_block group_notifier;
struct notifier_block iommu_notifier;
struct kvm *kvm;
--
2.21.1

2020-10-22 18:17:28

by Anthony Krowiak

[permalink] [raw]
Subject: [PATCH v11 07/14] s390/vfio-ap: sysfs attribute to display the guest's matrix

The matrix of adapters and domains configured in a guest's APCB may
differ from the matrix of adapters and domains assigned to the matrix mdev,
so this patch introduces a sysfs attribute to display the matrix of a guest
using the matrix mdev. For a matrix mdev denoted by $uuid, the crycb for a
guest using the matrix mdev can be displayed as follows:

cat /sys/devices/vfio_ap/matrix/$uuid/guest_matrix

If a guest is not using the matrix mdev at the time the crycb is displayed,
an error (ENODEV) will be returned.

Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/vfio_ap_ops.c | 54 +++++++++++++++++++++++--------
1 file changed, 40 insertions(+), 14 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 9791761aa7fd..7bad70d7bcef 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -1073,29 +1073,24 @@ static ssize_t control_domains_show(struct device *dev,
}
static DEVICE_ATTR_RO(control_domains);

-static ssize_t matrix_show(struct device *dev, struct device_attribute *attr,
- char *buf)
+static ssize_t vfio_ap_mdev_matrix_show(struct ap_matrix *matrix, char *buf)
{
- struct mdev_device *mdev = mdev_from_dev(dev);
- struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
char *bufpos = buf;
unsigned long apid;
unsigned long apqi;
unsigned long apid1;
unsigned long apqi1;
- unsigned long napm_bits = matrix_mdev->matrix.apm_max + 1;
- unsigned long naqm_bits = matrix_mdev->matrix.aqm_max + 1;
+ unsigned long napm_bits = matrix->apm_max + 1;
+ unsigned long naqm_bits = matrix->aqm_max + 1;
int nchars = 0;
int n;

- apid1 = find_first_bit_inv(matrix_mdev->matrix.apm, napm_bits);
- apqi1 = find_first_bit_inv(matrix_mdev->matrix.aqm, naqm_bits);
-
- mutex_lock(&matrix_dev->lock);
+ apid1 = find_first_bit_inv(matrix->apm, napm_bits);
+ apqi1 = find_first_bit_inv(matrix->aqm, naqm_bits);

if ((apid1 < napm_bits) && (apqi1 < naqm_bits)) {
- for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, napm_bits) {
- for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm,
+ for_each_set_bit_inv(apid, matrix->apm, napm_bits) {
+ for_each_set_bit_inv(apqi, matrix->aqm,
naqm_bits) {
n = sprintf(bufpos, "%02lx.%04lx\n", apid,
apqi);
@@ -1104,25 +1099,55 @@ static ssize_t matrix_show(struct device *dev, struct device_attribute *attr,
}
}
} else if (apid1 < napm_bits) {
- for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, napm_bits) {
+ for_each_set_bit_inv(apid, matrix->apm, napm_bits) {
n = sprintf(bufpos, "%02lx.\n", apid);
bufpos += n;
nchars += n;
}
} else if (apqi1 < naqm_bits) {
- for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, naqm_bits) {
+ for_each_set_bit_inv(apqi, matrix->aqm, naqm_bits) {
n = sprintf(bufpos, ".%04lx\n", apqi);
bufpos += n;
nchars += n;
}
}

+ return nchars;
+}
+
+static ssize_t matrix_show(struct device *dev, struct device_attribute *attr,
+ char *buf)
+{
+ ssize_t nchars;
+ struct mdev_device *mdev = mdev_from_dev(dev);
+ struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
+
+ mutex_lock(&matrix_dev->lock);
+ nchars = vfio_ap_mdev_matrix_show(&matrix_mdev->matrix, buf);
mutex_unlock(&matrix_dev->lock);

return nchars;
}
static DEVICE_ATTR_RO(matrix);

+static ssize_t guest_matrix_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ ssize_t nchars;
+ struct mdev_device *mdev = mdev_from_dev(dev);
+ struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
+
+ if (!vfio_ap_mdev_has_crycb(matrix_mdev))
+ return -ENODEV;
+
+ mutex_lock(&matrix_dev->lock);
+ nchars = vfio_ap_mdev_matrix_show(&matrix_mdev->shadow_apcb, buf);
+ mutex_unlock(&matrix_dev->lock);
+
+ return nchars;
+}
+static DEVICE_ATTR_RO(guest_matrix);
+
static struct attribute *vfio_ap_mdev_attrs[] = {
&dev_attr_assign_adapter.attr,
&dev_attr_unassign_adapter.attr,
@@ -1132,6 +1157,7 @@ static struct attribute *vfio_ap_mdev_attrs[] = {
&dev_attr_unassign_control_domain.attr,
&dev_attr_control_domains.attr,
&dev_attr_matrix.attr,
+ &dev_attr_guest_matrix.attr,
NULL,
};

--
2.21.1

2020-10-22 18:18:12

by Anthony Krowiak

[permalink] [raw]
Subject: [PATCH v11 14/14] s390/vfio-ap: update docs to include dynamic config support

Update the documentation in vfio-ap.rst to include information about the
AP dynamic configuration support (i.e., hot plug of adapters, domains
and control domains via the matrix mediated device's sysfs assignment
attributes).

Signed-off-by: Tony Krowiak <[email protected]>
---
Documentation/s390/vfio-ap.rst | 362 ++++++++++++++++++++++++++-------
1 file changed, 285 insertions(+), 77 deletions(-)

diff --git a/Documentation/s390/vfio-ap.rst b/Documentation/s390/vfio-ap.rst
index e15436599086..888e15dbefc0 100644
--- a/Documentation/s390/vfio-ap.rst
+++ b/Documentation/s390/vfio-ap.rst
@@ -253,7 +253,7 @@ The process for reserving an AP queue for use by a KVM guest is:
1. The administrator loads the vfio_ap device driver
2. The vfio-ap driver during its initialization will register a single 'matrix'
device with the device core. This will serve as the parent device for
- all mediated matrix devices used to configure an AP matrix for a guest.
+ all matrix mediated devices used to configure an AP matrix for a guest.
3. The /sys/devices/vfio_ap/matrix device is created by the device core
4. The vfio_ap device driver will register with the AP bus for AP queue devices
of type 10 and higher (CEX4 and newer). The driver will provide the vfio_ap
@@ -269,7 +269,7 @@ The process for reserving an AP queue for use by a KVM guest is:
default zcrypt cex4queue driver.
8. The AP bus probes the vfio_ap device driver to bind the queues reserved for
it.
-9. The administrator creates a passthrough type mediated matrix device to be
+9. The administrator creates a passthrough type matrix mediated device to be
used by a guest
10. The administrator assigns the adapters, usage domains and control domains
to be exclusively used by a guest.
@@ -279,14 +279,14 @@ Set up the VFIO mediated device interfaces
The VFIO AP device driver utilizes the common interface of the VFIO mediated
device core driver to:

-* Register an AP mediated bus driver to add a mediated matrix device to and
+* Register an AP mediated bus driver to add a matrix mediated device to and
remove it from a VFIO group.
-* Create and destroy a mediated matrix device
-* Add a mediated matrix device to and remove it from the AP mediated bus driver
-* Add a mediated matrix device to and remove it from an IOMMU group
+* Create and destroy a matrix mediated device
+* Add a matrix mediated device to and remove it from the AP mediated bus driver
+* Add a matrix mediated device to and remove it from an IOMMU group

The following high-level block diagram shows the main components and interfaces
-of the VFIO AP mediated matrix device driver::
+of the VFIO AP matrix mediated device driver::

+-------------+
| |
@@ -351,29 +351,37 @@ matrix device.
This attribute group identifies the user-defined sysfs attributes of the
mediated device. When a device is registered with the VFIO mediated device
framework, the sysfs attribute files identified in the 'mdev_attr_groups'
- structure will be created in the mediated matrix device's directory. The
- sysfs attributes for a mediated matrix device are:
+ structure will be created in the matrix mediated device's directory. The
+ sysfs attributes for a matrix mediated device are:

assign_adapter / unassign_adapter:
Write-only attributes for assigning/unassigning an AP adapter to/from the
- mediated matrix device. To assign/unassign an adapter, the APID of the
+ matrix mediated device. To assign/unassign an adapter, the APID of the
adapter is echoed to the respective attribute file.
assign_domain / unassign_domain:
Write-only attributes for assigning/unassigning an AP usage domain to/from
- the mediated matrix device. To assign/unassign a domain, the domain
+ the matrix mediated device. To assign/unassign a domain, the domain
number of the usage domain is echoed to the respective attribute
file.
matrix:
- A read-only file for displaying the APQNs derived from the cross product
- of the adapter and domain numbers assigned to the mediated matrix device.
+ A read-only file for displaying the APQNs derived from the Cartesian
+ product of the adapter and domain numbers assigned to the matrix mediated
+ device.
+ guest_matrix:
+ A read-only file for displaying the APQNs derived from the Cartesian
+ product of the adapter and domain numbers assigned to the APM and AQM
+ fields respectively of the KVM guest's CRYCB. This will differ from the
+ matrix if any APQNs assigned to the matrix mediated device do not
+ reference a queue device bound to the vfio_ap device driver (i.e., the
+ queue is not in the AP configuration).
assign_control_domain / unassign_control_domain:
Write-only attributes for assigning/unassigning an AP control domain
- to/from the mediated matrix device. To assign/unassign a control domain,
+ to/from the matrix mediated device. To assign/unassign a control domain,
the ID of the domain to be assigned/unassigned is echoed to the respective
attribute file.
control_domains:
A read-only file for displaying the control domain numbers assigned to the
- mediated matrix device.
+ matrix mediated device.

* functions:

@@ -385,7 +393,7 @@ matrix device.
domains assigned via the corresponding sysfs attributes files

remove:
- deallocates the mediated matrix device's ap_matrix_mdev structure. This will
+ deallocates the matrix mediated device's ap_matrix_mdev structure. This will
be allowed only if a running guest is not using the mdev.

* callback interfaces
@@ -397,7 +405,7 @@ matrix device.
for the mdev matrix device to the MDEV bus. Access to the KVM structure used
to configure the KVM guest is provided via this callback. The KVM structure,
is used to configure the guest's access to the AP matrix defined via the
- mediated matrix device's sysfs attribute files.
+ matrix mediated device's sysfs attribute files.
release:
unregisters the VFIO_GROUP_NOTIFY_SET_KVM notifier callback function for the
mdev matrix device and deconfigures the guest's AP matrix.
@@ -410,11 +418,49 @@ function is called when QEMU connects to KVM. The guest's AP matrix is
configured via it's CRYCB by:

* Setting the bits in the APM corresponding to the APIDs assigned to the
- mediated matrix device via its 'assign_adapter' interface.
+ matrix mediated device via its 'assign_adapter' interface.
* Setting the bits in the AQM corresponding to the domains assigned to the
- mediated matrix device via its 'assign_domain' interface.
+ matrix mediated device via its 'assign_domain' interface.
* Setting the bits in the ADM corresponding to the domain dIDs assigned to the
- mediated matrix device via its 'assign_control_domains' interface.
+ matrix mediated device via its 'assign_control_domains' interface.
+
+The linux device model precludes passing a device through to a KVM guest that
+is not bound to the device driver facilitating its pass-through. Consequently,
+an APQN that does not reference a queue device bound to the vfio_ap device
+driver will not be assigned to a KVM guest's CRYCB. The AP architecture,
+however, does not provide a means to filter individual APQNs from the guest's
+CRYCB, so the following logic is employed to filter them:
+
+* Filter the APQNs assigned to the matrix mediated device by APID.
+
+ To filter APQNs by APID, each APQN derived from the Cartesian product of the
+ adapter numbers (APID) and domain numbers (APQI) assigned to the mdev is
+ examined and if any one of them does not reference a queue device bound to the
+ vfio_ap device driver, the adapter will not be plugged into the guest (i.e.,
+ the bit corresponding to its APID will not be set in the APM of the guest's
+ CRYCB).
+
+ If at least one adapter is plugged into the guest, then all domains assigned
+ to the mdev will also be plugged into the guest (i.e., the bits corresponding
+ to the APQIs of the domains assigned to the mdev will be set in the AQM field
+ of the guest's CRYCB).
+
+* Filter the APQNs assigned to the matrix mediated device by APQI.
+
+ The APQNs will be filtered by APQI if filtering by APID does not result in any
+ adapters or domains getting plugged into the guest.
+
+ To filter APQNs by APQI, each APQN derived from the Cartesian product of the
+ adapter numbers (APID) and domain numbers (APQI) assigned to the mdev is
+ examined and if any one of them does not reference a queue device bound to the
+ vfio_ap device driver, the domain will not be plugged into the guest (i.e.,
+ the bit corresponding to its APQI will not be set in the AQM of the guest's
+ CRYCB).
+
+ If at least one domain is plugged into the guest, then all adapters assigned
+ to the mdev will also be plugged into the guest (i.e., the bits corresponding
+ to the APIDs of the adapters assigned to the mdev will be set in the APM field
+ of the guest's CRYCB).

The CPU model features for AP
-----------------------------
@@ -435,6 +481,10 @@ available to a KVM guest via the following CPU model features:
can be made available to the guest only if it is available on the host (i.e.,
facility bit 12 is set).

+4. apqi: Indicates AP queue interrupts are available on the guest. This facility
+ can be made available to the guest only if it is available on the host (i.e.,
+ facility bit 65 is set).
+
Note: If the user chooses to specify a CPU model different than the 'host'
model to QEMU, the CPU model features and facilities need to be turned on
explicitly; for example::
@@ -444,7 +494,7 @@ explicitly; for example::
A guest can be precluded from using AP features/facilities by turning them off
explicitly; for example::

- /usr/bin/qemu-system-s390x ... -cpu host,ap=off,apqci=off,apft=off
+ /usr/bin/qemu-system-s390x ... -cpu host,ap=off,apqci=off,apft=off,apqi=off

Note: If the APFT facility is turned off (apft=off) for the guest, the guest
will not see any AP devices. The zcrypt device drivers that register for type 10
@@ -530,40 +580,56 @@ These are the steps:

2. Secure the AP queues to be used by the three guests so that the host can not
access them. To secure them, there are two sysfs files that specify
- bitmasks marking a subset of the APQN range as 'usable by the default AP
- queue device drivers' or 'not usable by the default device drivers' and thus
- available for use by the vfio_ap device driver'. The location of the sysfs
- files containing the masks are::
+ bitmasks marking a subset of the APQN range as usable only by the default AP
+ queue device drivers. All remaining APQNs are available for use by
+ any other device driver. The vfio_ap device driver is currently the only
+ non-default device driver. The locations of the sysfs files containing the
+ masks are::

/sys/bus/ap/apmask
/sys/bus/ap/aqmask

The 'apmask' is a 256-bit mask that identifies a set of AP adapter IDs
- (APID). Each bit in the mask, from left to right (i.e., from most significant
- to least significant bit in big endian order), corresponds to an APID from
- 0-255. If a bit is set, the APID is marked as usable only by the default AP
- queue device drivers; otherwise, the APID is usable by the vfio_ap
- device driver.
+ (APID). Each bit in the mask, from left to right corresponds to an APID from
+ 0-255. If a bit is set, the APID is marked as available to the default AP
+ queue device drivers.

The 'aqmask' is a 256-bit mask that identifies a set of AP queue indexes
- (APQI). Each bit in the mask, from left to right (i.e., from most significant
- to least significant bit in big endian order), corresponds to an APQI from
- 0-255. If a bit is set, the APQI is marked as usable only by the default AP
- queue device drivers; otherwise, the APQI is usable by the vfio_ap device
- driver.
+ (APQI). Each bit in the mask, from left to right corresponds to an APQI from
+ 0-255. If a bit is set, the APQI is marked as available to the default AP
+ queue device drivers.
+
+ The Cartesian product of the APIDs corresponding to the bits set in the
+ apmask and the APQIs corresponding to the bits set in the aqmask comprises
+ the subset of APQNs that can be used only by the host default device drivers.
+ All other APQNs are available to the non-default device drivers such as the
+ vfio_ap driver.
+
+ Take, for example, the following masks::
+
+ apmask:
+ 0x7d00000000000000000000000000000000000000000000000000000000000000
+
+ aqmask:
+ 0x8000000000000000000000000000000000000000000000000000000000000000
+
+ The masks indicate:

- Take, for example, the following mask::
+ * Adapters 1, 2, 3, 4, 5, and 7 are available for use by the host default
+ device drivers.

- 0x7dffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
+ * Domain 0 is available for use by the host default device drivers

- It indicates:
+ * The subset of APQNs available for use only by the default host device
+ drivers are:

- 1, 2, 3, 4, 5, and 7-255 belong to the default drivers' pool, and 0 and 6
- belong to the vfio_ap device driver's pool.
+ (1,0), (2,0), (3,0), (4,0), (5,0) and (7,0)
+
+ * All other APQNs are available for use by the non-default device drivers.

The APQN of each AP queue device assigned to the linux host is checked by the
- AP bus against the set of APQNs derived from the cross product of APIDs
- and APQIs marked as usable only by the default AP queue device drivers. If a
+ AP bus against the set of APQNs derived from the Cartesian product of APIDs
+ and APQIs marked as available to the default AP queue device drivers. If a
match is detected, only the default AP queue device drivers will be probed;
otherwise, the vfio_ap device driver will be probed.

@@ -627,6 +693,16 @@ These are the steps:
default drivers pool: adapter 0-15, domain 1
alternate drivers pool: adapter 16-255, domains 0, 2-255

+ Note ***:
+ Changing a mask such that one or more APQNs will be taken from a matrix
+ mediated device (see below) will fail with an error (EADDRINUSE). The error
+ is logged to the kernel ring buffer which can be viewed with the 'dmesg'
+ command. The output identifies each APQN flagged as 'in use' and the matrix
+ mediated device to which it is assigned; for example:
+
+ Userspace may not re-assign queue 05.0054 already assigned to 62177883-f1bb-47f0-914d-32a22e3a8804
+ Userspace may not re-assign queue 04.0054 already assigned to cef03c3c-903d-4ecc-9a83-40694cb8aee4
+
Securing the APQNs for our example
----------------------------------
To secure the AP queues 05.0004, 05.0047, 05.00ab, 05.00ff, 06.0004, 06.0047,
@@ -684,7 +760,7 @@ Securing the APQNs for our example

/sys/devices/vfio_ap/matrix/
--- [mdev_supported_types]
- ------ [vfio_ap-passthrough] (passthrough mediated matrix device type)
+ ------ [vfio_ap-passthrough] (passthrough matrix mediated device type)
--------- create
--------- [devices]

@@ -775,17 +851,18 @@ Securing the APQNs for our example
higher than the maximum is specified, the operation will terminate with
an error (ENODEV).

- * All APQNs that can be derived from the adapter ID and the IDs of
- the previously assigned domains must be bound to the vfio_ap device
- driver. If no domains have yet been assigned, then there must be at least
- one APQN with the specified APID bound to the vfio_ap driver. If no such
- APQNs are bound to the driver, the operation will terminate with an
- error (EADDRNOTAVAIL).
+ * All APQNs that can be derived from the Cartesian product of the APID of the
+ adapter being assigned and the APQIs of the previously assigned domains
+ must be available to the vfio_ap device driver as specified in the sysfs
+ /sys/bus/ap/apmask and /sys/bus/ap/aqmask attribute files. If even one APQN
+ is reserved for use by the host device driver, the operation will terminate
+ with an error (EADDRNOTAVAIL).

- No APQN that can be derived from the adapter ID and the IDs of the
- previously assigned domains can be assigned to another mediated matrix
- device. If an APQN is assigned to another mediated matrix device, the
- operation will terminate with an error (EADDRINUSE).
+ * No APQN that can be derived from the Cartesian product of the APID of the
+ adapter being assigned and the APQIs of the previously assigned domains can
+ be assigned to another matrix mediated device. If even one APQN is assigned
+ to another matrix mediated device, the operation will terminate with an
+ error (EADDRINUSE).

In order to successfully assign a domain:

@@ -794,17 +871,18 @@ Securing the APQNs for our example
higher than the maximum is specified, the operation will terminate with
an error (ENODEV).

- * All APQNs that can be derived from the domain ID and the IDs of
- the previously assigned adapters must be bound to the vfio_ap device
- driver. If no domains have yet been assigned, then there must be at least
- one APQN with the specified APQI bound to the vfio_ap driver. If no such
- APQNs are bound to the driver, the operation will terminate with an
- error (EADDRNOTAVAIL).
+ * All APQNs that can be derived from the Cartesian product of the APQI of the
+ domain being assigned and the APIDs of the previously assigned adapters
+ must be available to the vfio_ap device driver as specified in the sysfs
+ /sys/bus/ap/apmask and /sys/bus/ap/aqmask attribute files. If even one APQN
+ is reserved for use by the host device driver, the operation will terminate
+ with an error (EADDRNOTAVAIL).

- No APQN that can be derived from the domain ID and the IDs of the
- previously assigned adapters can be assigned to another mediated matrix
- device. If an APQN is assigned to another mediated matrix device, the
- operation will terminate with an error (EADDRINUSE).
+ * No APQN that can be derived from the Cartesian product of the APQI of the
+ domain being assigned and the APIDs of the previously assigned adapters can
+ be assigned to another matrix mediated device. If even one APQN is assigned
+ to another matrix mediated device, the operation will terminate with an
+ error (EADDRINUSE).

In order to successfully assign a control domain, the domain number
specified must represent a value from 0 up to the maximum domain number
@@ -813,22 +891,22 @@ Securing the APQNs for our example

5. Start Guest1::

- /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on \
+ /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on,apqi=on \
-device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid1 ...

7. Start Guest2::

- /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on \
+ /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on,apqi=on \
-device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid2 ...

7. Start Guest3::

- /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on \
+ /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on,apqi=on \
-device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid3 ...

-When the guest is shut down, the mediated matrix devices may be removed.
+When the guest is shut down, the matrix mediated devices may be removed.

-Using our example again, to remove the mediated matrix device $uuid1::
+Using our example again, to remove the matrix mediated device $uuid1::

/sys/devices/vfio_ap/matrix/
--- [mdev_supported_types]
@@ -851,16 +929,146 @@ remove it if no guest will use it during the remaining lifetime of the linux
host. If the mdev matrix device is removed, one may want to also reconfigure
the pool of adapters and queues reserved for use by the default drivers.

+Hot plug support:
+=================
+An adapter, domain or control domain may be hot plugged into a running KVM
+guest by assigning it to the matrix mediated device being used by the guest.
+Control domains will always be hot plugged; however, an adapter or domain will
+be hot plugged only if each new APQN resulting from its assignment
+references a queue device bound to the vfio_ap device driver as described
+below.
+
+When an adapter is assigned to a matrix mediated device in use by a KVM guest:
+
+* If no domains have yet been plugged into the KVM guest:
+
+ Hot plug the adapter and every domain previously assigned to the mdev if each
+ APQN derived from the Cartesian product of the APID of the adapter being
+ assigned and the APQIs of the domains previously assigned references a queue
+ device bound to the vfio_ap device driver.
+
+* If one or more domains have previously been plugged into the guest:
+
+ Hot plug the adapter if each APQN derived from the Cartesian product of the
+ APID of the adapter being assigned and the APQIs of the domains already
+ plugged into the guest references a queue device bound to the vfio_ap device
+ driver.
+
+When a domain is assigned to a matrix mediated device in use by a KVM guest:
+
+* If no adapters have yet been plugged into the KVM guest:
+
+ Hot plug the domain and every adapter previously assigned to the mdev if each
+ APQN derived from the Cartesian product of the APIDs of the adapters
+ previously assigned and the APQI of the domain being assigned references a
+ queue device bound to the vfio_ap device driver.
+
+* If one or more adapters have previously been plugged into the guest:
+
+ Hot plug the domain if each APQN derived from the Cartesian product of the
+ APIDs of the adapters already plugged into the guest and the APQI of the
+ domain being assigned references a queue device bound to the vfio_ap device
+ driver.
+
+Over-provisioning of AP queues for a KVM guest:
+===============================================
+Over-provisioning is defined herein as the assignment of adapters or domains to
+a matrix mediated device that do not reference AP devices in the host's AP
+configuration. The idea here is that when the adapter or domain becomes
+available, it will be automatically hot-plugged into the KVM guest using
+the matrix mediated device to which it is assigned as long as each new APQN
+resulting from plugging it in references a queue device bound to the vfio_ap
+device driver.
+
Limitations
===========
-* The KVM/kernel interfaces do not provide a way to prevent restoring an APQN
- to the default drivers pool of a queue that is still assigned to a mediated
- device in use by a guest. It is incumbent upon the administrator to
- ensure there is no mediated device in use by a guest to which the APQN is
- assigned lest the host be given access to the private data of the AP queue
- device such as a private key configured specifically for the guest.
+Live guest migration is not supported for guests using AP devices without
+intervention by a system administrator. Before a KVM guest can be migrated,
+the matrix mediated device must be removed. Unfortunately, it can not be
+removed manually (i.e., echo 1 > /sys/devices/vfio_ap/matrix/$UUID/remove) while
+the mdev is in use by a KVM guest. If the guest is being emulated by QEMU,
+its mdev can be hot unplugged from the guest in one of two ways:
+
+1. If the KVM guest was started with libvirt, you can hot unplug the mdev via
+ the following commands:
+
+ virsh detach-device <guestname> <path-to-device-xml>
+
+ For example, to hot unplug mdev 62177883-f1bb-47f0-914d-32a22e3a8804 from
+ the guest named 'my-guest':
+
+ virsh detach-device my-guest ~/config/my-guest-hostdev.xml
+
+ The contents of my-guest-hostdev.xml:
+
+ <hostdev mode='subsystem' type='mdev' managed='no' model='vfio-ap'>
+ <source>
+ <address uuid='62177883-f1bb-47f0-914d-32a22e3a8804'/>
+ </source>
+ </hostdev>
+
+
+ virsh qemu-monitor-command <guest-name> --hmp "device_del <device-id>"
+
+ For example, to hot unplug the matrix mediated device identified on the
+ qemu command line with 'id=hostdev0' from the guest named 'my-guest':
+
+ virsh qemu-monitor-command my-guest --hmp "device_del hostdev0"
+
+2. A matrix mediated device can be hot unplugged by attaching the qemu monitor
+ to the guest and using the following qemu monitor command:
+
+ (QEMU) device_del id=<device-id>
+
+ For example, to hot unplug the matrix mediated device that was specified
+ on the qemu command line with 'id=hostdev0' when the guest was started:
+
+ (QEMU) device_del id=hostdev0
+
+After live migration of the KVM guest completes, an AP configuration can be
+restored to the KVM guest by hot plugging a matrix mediated device on the target
+system into the guest in one of two ways:
+
+1. If the KVM guest was started with libvirt, you can hot plug a matrix mediated
+ device into the guest via the following virsh commands:
+
+ virsh attach-device <guestname> <path-to-device-xml>
+
+ For example, to hot plug mdev 62177883-f1bb-47f0-914d-32a22e3a8804 into
+ the guest named 'my-guest':
+
+ virsh attach-device my-guest ~/config/my-guest-hostdev.xml
+
+ The contents of my-guest-hostdev.xml:
+
+ <hostdev mode='subsystem' type='mdev' managed='no' model='vfio-ap'>
+ <source>
+ <address uuid='62177883-f1bb-47f0-914d-32a22e3a8804'/>
+ </source>
+ </hostdev>
+
+
+ virsh qemu-monitor-command <guest-name> --hmp \
+ "device_add vfio-ap,sysfsdev=<path-to-mdev>,id=<device-id>"
+
+ For example, to hot plug the matrix mediated device
+ 62177883-f1bb-47f0-914d-32a22e3a8804 into the guest named 'my-guest' with
+ device-id hostdev0:
+
+ virsh qemu-monitor-command my-guest --hmp \
+ "device_add vfio-ap,\
+ sysfsdev=/sys/devices/vfio_ap/matrix/62177883-f1bb-47f0-914d-32a22e3a8804,\
+ id=hostdev0"
+
+2. A matrix mediated device can be hot plugged by attaching the qemu monitor
+ to the guest and using the following qemu monitor command:
+
+ (qemu) device_add "vfio-ap,sysfsdev=<path-to-mdev>,id=<device-id>"

-* Dynamically modifying the AP matrix for a running guest (which would amount to
- hot(un)plug of AP devices for the guest) is currently not supported
+ For example, to plug the matrix mediated device
+ 62177883-f1bb-47f0-914d-32a22e3a8804 into the guest with the device-id
+ hostdev0:

-* Live guest migration is not supported for guests using AP devices.
+ (QEMU) device_add "vfio-ap,\
+ sysfsdev=/sys/devices/vfio_ap/matrix/62177883-f1bb-47f0-914d-32a22e3a8804,\
+ id=hostdev0"
--
2.21.1

2020-10-22 18:18:19

by Anthony Krowiak

[permalink] [raw]
Subject: [PATCH v11 13/14] s390/vfio-ap: handle AP bus scan completed notification

Implements the driver callback invoked by the AP bus when the AP bus
scan has completed. Since this callback is invoked after binding the newly
added devices to their respective device drivers, the vfio_ap driver will
attempt to hot plug the adapters, domains and control domains into each
guest using the matrix mdev to which they are assigned. Keep in mind that
an adapter or domain can be plugged in only if each APQN with the APID of
the adapter or the APQI of the domain references a queue device bound
to the vfio_ap device driver. Consequently, not all newly added adapters
and domains will necessarily get hot plugged.

The same filtering operation used when the guest is started will again be
used to filter the APQNs assigned to the guest when the vfio_ap driver is
notified the AP bus scan has completed for those matrix mediated devices
to which the newly added APID(s) and/or APQI(s) are assigned.

To recap the filtering process employed:

For each APQN formulated from the Cartesian product of the APIDs and APQIs
assigned to the matrix mdev, if the APQN does not reference a queue device
bound to the vfio_ap device driver, the APID will not be hot plugged into
the guest. If any APIDs are left after filtering, all of the queues
referenced by the APQNs formulated by the remaining APIDs and the APQIs
assigned to the matrix mdev will be hot plugged into the guest.

Control domains will not be filtered and will always be hot plugged.

Example:
=======
Queue devices bound to vfio_ap device driver:
04.0004
04.0047
04.0054

05.0005
05.0047

Adapters and domains assigned to matrix mdev:
Adapters  Domains  -> Queues
04        0004        04.0004
05        0047        04.0047
          0054        04.0054
                      05.0004
                      05.0047
                      05.0054

KVM guest matrix after filtering:
Adapters  Domains  -> Queues
04        0004        04.0004
          0047        04.0047
          0054        04.0054

Adapter 05 is filtered because queues 05.0004 and 05.0054 are not bound.
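
The filtering rule recapped above can be sketched in plain userspace C. This
is only an illustration under stated assumptions, not the driver's code: the
names mk_apqn(), is_bound() and filter_apids() are hypothetical, plain arrays
stand in for the kernel bitmaps, and the APQN is assumed to pack the APID in
the upper byte and the APQI in the lower byte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: pack an adapter ID (APID) and domain index (APQI). */
static unsigned int mk_apqn(unsigned int apid, unsigned int apqi)
{
	return ((apid & 0xff) << 8) | (apqi & 0xff);
}

/* Returns true if apqn appears in the list of queues bound to the driver. */
static bool is_bound(const unsigned int *bound, size_t nbound,
		     unsigned int apqn)
{
	for (size_t i = 0; i < nbound; i++)
		if (bound[i] == apqn)
			return true;
	return false;
}

/*
 * Copies into out[] every APID for which each APQN formed with the
 * assigned APQIs references a bound queue. Returns the number of
 * APIDs kept; out[] must have room for napids entries.
 */
static size_t filter_apids(const unsigned int *apids, size_t napids,
			   const unsigned int *apqis, size_t napqis,
			   const unsigned int *bound, size_t nbound,
			   unsigned int *out)
{
	size_t kept = 0;

	assert(out != NULL);
	for (size_t i = 0; i < napids; i++) {
		bool all_bound = true;

		/* One unbound queue is enough to drop the whole adapter. */
		for (size_t j = 0; j < napqis && all_bound; j++)
			all_bound = is_bound(bound, nbound,
					     mk_apqn(apids[i], apqis[j]));
		if (all_bound)
			out[kept++] = apids[i];
	}
	return kept;
}
```

Fed the bound queues, adapters and domains from the example, the sketch keeps
only adapter 04. Control domains are not modelled because they are never
filtered.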

Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/vfio_ap_drv.c | 1 +
drivers/s390/crypto/vfio_ap_ops.c | 26 ++++++++++++++++++++++++++
drivers/s390/crypto/vfio_ap_private.h | 2 ++
3 files changed, 29 insertions(+)

diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
index d7aa5543afef..357481e80b0a 100644
--- a/drivers/s390/crypto/vfio_ap_drv.c
+++ b/drivers/s390/crypto/vfio_ap_drv.c
@@ -152,6 +152,7 @@ static int __init vfio_ap_init(void)
vfio_ap_drv.in_use = vfio_ap_mdev_resource_in_use;
vfio_ap_drv.ids = ap_queue_ids;
vfio_ap_drv.on_config_changed = vfio_ap_on_cfg_changed;
+ vfio_ap_drv.on_scan_complete = vfio_ap_on_scan_complete;

ret = ap_driver_register(&vfio_ap_drv, THIS_MODULE, VFIO_AP_DRV_NAME);
if (ret) {
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 075096adbfd3..824f936364ba 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -1837,3 +1837,29 @@ void vfio_ap_on_cfg_changed(struct ap_config_info *new_config_info,
vfio_ap_mdev_on_cfg_add();
mutex_unlock(&matrix_dev->lock);
}
+
+void vfio_ap_on_scan_complete(struct ap_config_info *new_config_info,
+ struct ap_config_info *old_config_info)
+{
+ struct ap_matrix_mdev *matrix_mdev;
+
+ mutex_lock(&matrix_dev->lock);
+ list_for_each_entry(matrix_mdev, &matrix_dev->mdev_list, node) {
+ if (!vfio_ap_mdev_has_crycb(matrix_mdev))
+ continue;
+
+ if (!bitmap_intersects(matrix_mdev->matrix.apm,
+ matrix_dev->ap_add, AP_DEVICES) &&
+ !bitmap_intersects(matrix_mdev->matrix.aqm,
+ matrix_dev->aq_add, AP_DOMAINS))
+ continue;
+
+ if (vfio_ap_mdev_filter_guest_matrix(matrix_mdev,
+ true))
+ vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
+ }
+
+ bitmap_clear(matrix_dev->ap_add, 0, AP_DEVICES);
+ bitmap_clear(matrix_dev->aq_add, 0, AP_DOMAINS);
+ mutex_unlock(&matrix_dev->lock);
+}
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index 64f1f5b820f6..d82d1e62cb2f 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -119,5 +119,7 @@ void vfio_ap_mdev_remove_queue(struct ap_device *queue);
bool vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm);
void vfio_ap_on_cfg_changed(struct ap_config_info *new_config_info,
struct ap_config_info *old_config_info);
+void vfio_ap_on_scan_complete(struct ap_config_info *new_config_info,
+ struct ap_config_info *old_config_info);

#endif /* _VFIO_AP_PRIVATE_H_ */
--
2.21.1

2020-10-22 18:19:21

by Anthony Krowiak

Subject: [PATCH v11 04/14] s390/zcrypt: driver callback to indicate resource in use

Introduces a new driver callback to prevent a root user from unbinding
an AP queue from its device driver if the queue is in use. The callback
will be invoked whenever a change to the AP bus's sysfs apmask or aqmask
attributes would result in one or more AP queues being removed from their
driver. If the callback responds in the affirmative for any driver
queried, the change to the apmask or aqmask will be rejected with a device
in use error.

For this patch, only non-default drivers will be queried. Currently,
there is only one non-default driver, the vfio_ap device driver. The
vfio_ap device driver facilitates pass-through of an AP queue to a
guest. The idea here is that a guest may be administered by a different
sysadmin than the host and we don't want AP resources to unexpectedly
disappear from a guest's AP configuration (i.e., adapters and domains
assigned to the matrix mdev). This will enforce the proper procedure for
removing AP resources intended for guest usage which is to
first unassign them from the matrix mdev, then unbind them from the
vfio_ap device driver.
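
As a rough illustration of the check described above, the following userspace
sketch reduces the adapter mask to a single 64-bit word. It rests on
assumptions: the real masks are larger kernel bitmaps with inverted bit
numbering, and the names in_use_fn, demo_in_use() and apmask_commit_sketch()
are hypothetical. Only bits that are newly set in the mask are passed to the
callback, since only those would take queues away from a non-default driver.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical callback type: returns true if any queue selected by the
 * adapter mask and the domain mask is in use by the driver.
 */
typedef bool (*in_use_fn)(uint64_t apm, uint64_t aqm);

/*
 * Hypothetical stand-in for a driver callback: the queue at adapter
 * bit 2, domain bit 0 is assigned to a guest.
 */
static bool demo_in_use(uint64_t apm, uint64_t aqm)
{
	return (apm & (UINT64_C(1) << 2)) && (aqm & UINT64_C(1));
}

/*
 * Sketch of committing a new adapter mask. A set bit means the adapter
 * belongs to the default drivers, so bits going from 0 to 1 are the
 * ones that would remove queues from the non-default driver.
 */
static int apmask_commit_sketch(uint64_t *cur_apm, uint64_t new_apm,
				uint64_t aqm, in_use_fn in_use)
{
	uint64_t reserved;

	assert(cur_apm != NULL);
	reserved = new_apm & ~*cur_apm;	/* newly set bits only */
	if (reserved && in_use && in_use(reserved, aqm))
		return -EBUSY;		/* reject; mask stays unchanged */

	*cur_apm = new_apm;
	return 0;
}
```

A rejected change leaves the current mask untouched, which matches the
"device in use" behaviour the commit message describes.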

Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/ap_bus.c | 148 ++++++++++++++++++++++++++++++++---
drivers/s390/crypto/ap_bus.h | 4 +
2 files changed, 142 insertions(+), 10 deletions(-)

diff --git a/drivers/s390/crypto/ap_bus.c b/drivers/s390/crypto/ap_bus.c
index 485cbfcbf06e..998e61cd86d9 100644
--- a/drivers/s390/crypto/ap_bus.c
+++ b/drivers/s390/crypto/ap_bus.c
@@ -35,6 +35,7 @@
#include <linux/mod_devicetable.h>
#include <linux/debugfs.h>
#include <linux/ctype.h>
+#include <linux/module.h>

#include "ap_bus.h"
#include "ap_debug.h"
@@ -893,6 +894,23 @@ static int modify_bitmap(const char *str, unsigned long *bitmap, int bits)
return 0;
}

+static int ap_parse_bitmap_str(const char *str, unsigned long *bitmap, int bits,
+ unsigned long *newmap)
+{
+ unsigned long size;
+ int rc;
+
+ size = BITS_TO_LONGS(bits)*sizeof(unsigned long);
+ if (*str == '+' || *str == '-') {
+ memcpy(newmap, bitmap, size);
+ rc = modify_bitmap(str, newmap, bits);
+ } else {
+ memset(newmap, 0, size);
+ rc = hex2bitmap(str, newmap, bits);
+ }
+ return rc;
+}
+
int ap_parse_mask_str(const char *str,
unsigned long *bitmap, int bits,
struct mutex *lock)
@@ -912,14 +930,7 @@ int ap_parse_mask_str(const char *str,
kfree(newmap);
return -ERESTARTSYS;
}
-
- if (*str == '+' || *str == '-') {
- memcpy(newmap, bitmap, size);
- rc = modify_bitmap(str, newmap, bits);
- } else {
- memset(newmap, 0, size);
- rc = hex2bitmap(str, newmap, bits);
- }
+ rc = ap_parse_bitmap_str(str, bitmap, bits, newmap);
if (rc == 0)
memcpy(bitmap, newmap, size);
mutex_unlock(lock);
@@ -1111,12 +1122,70 @@ static ssize_t apmask_show(struct bus_type *bus, char *buf)
return rc;
}

+static int __verify_card_reservations(struct device_driver *drv, void *data)
+{
+ int rc = 0;
+ struct ap_driver *ap_drv = to_ap_drv(drv);
+ unsigned long *newapm = (unsigned long *)data;
+
+ /*
+ * No need to verify whether the driver is using the queues if it is the
+ * default driver.
+ */
+ if (ap_drv->flags & AP_DRIVER_FLAG_DEFAULT)
+ return 0;
+
+ /* The non-default driver's module must be loaded */
+ if (!try_module_get(drv->owner))
+ return 0;
+
+ if (ap_drv->in_use)
+ if (ap_drv->in_use(newapm, ap_perms.aqm))
+ rc = -EBUSY;
+
+ module_put(drv->owner);
+
+ return rc;
+}
+
+static int apmask_commit(unsigned long *newapm)
+{
+ int rc;
+ unsigned long reserved[BITS_TO_LONGS(AP_DEVICES)];
+
+ /*
+ * Check if any bits in the apmask have been set which will
+ * result in queues being removed from non-default drivers
+ */
+ if (bitmap_andnot(reserved, newapm, ap_perms.apm, AP_DEVICES)) {
+ rc = bus_for_each_drv(&ap_bus_type, NULL, reserved,
+ __verify_card_reservations);
+ if (rc)
+ return rc;
+ }
+
+ memcpy(ap_perms.apm, newapm, APMASKSIZE);
+
+ return 0;
+}
+
static ssize_t apmask_store(struct bus_type *bus, const char *buf,
size_t count)
{
int rc;
+ DECLARE_BITMAP(newapm, AP_DEVICES);
+
+ if (mutex_lock_interruptible(&ap_perms_mutex))
+ return -ERESTARTSYS;
+
+ rc = ap_parse_bitmap_str(buf, ap_perms.apm, AP_DEVICES, newapm);
+ if (rc)
+ goto done;

- rc = ap_parse_mask_str(buf, ap_perms.apm, AP_DEVICES, &ap_perms_mutex);
+ rc = apmask_commit(newapm);
+
+done:
+ mutex_unlock(&ap_perms_mutex);
if (rc)
return rc;

@@ -1142,12 +1211,71 @@ static ssize_t aqmask_show(struct bus_type *bus, char *buf)
return rc;
}

+static int __verify_queue_reservations(struct device_driver *drv, void *data)
+{
+ int rc = 0;
+ struct ap_driver *ap_drv = to_ap_drv(drv);
+ unsigned long *newaqm = (unsigned long *)data;
+
+ /*
+ * No need to verify whether the driver is using the queues if it is the
+ * default driver.
+ */
+ if (ap_drv->flags & AP_DRIVER_FLAG_DEFAULT)
+ return 0;
+
+ /* The non-default driver's module must be loaded */
+ if (!try_module_get(drv->owner))
+ return 0;
+
+ if (ap_drv->in_use)
+ if (ap_drv->in_use(ap_perms.apm, newaqm))
+ rc = -EBUSY;
+
+ module_put(drv->owner);
+
+ return rc;
+}
+
+static int aqmask_commit(unsigned long *newaqm)
+{
+ int rc;
+ unsigned long reserved[BITS_TO_LONGS(AP_DOMAINS)];
+
+ /*
+ * Check if any bits in the aqmask have been set which will
+ * result in queues being removed from non-default drivers
+ */
+ if (bitmap_andnot(reserved, newaqm, ap_perms.aqm, AP_DOMAINS)) {
+ rc = bus_for_each_drv(&ap_bus_type, NULL, reserved,
+ __verify_queue_reservations);
+ if (rc)
+ return rc;
+ }
+
+ memcpy(ap_perms.aqm, newaqm, AQMASKSIZE);
+
+ return 0;
+}
+
static ssize_t aqmask_store(struct bus_type *bus, const char *buf,
size_t count)
{
int rc;
+ DECLARE_BITMAP(newaqm, AP_DOMAINS);

- rc = ap_parse_mask_str(buf, ap_perms.aqm, AP_DOMAINS, &ap_perms_mutex);
+ if (mutex_lock_interruptible(&ap_perms_mutex))
+ return -ERESTARTSYS;
+
+ rc = ap_parse_bitmap_str(buf, ap_perms.aqm, AP_DOMAINS, newaqm);
+ if (rc)
+ goto done;
+
+ rc = aqmask_commit(newaqm);
+
+done:
+ mutex_unlock(&ap_perms_mutex);
if (rc)
return rc;

diff --git a/drivers/s390/crypto/ap_bus.h b/drivers/s390/crypto/ap_bus.h
index 5029b80132aa..6ce154d924d3 100644
--- a/drivers/s390/crypto/ap_bus.h
+++ b/drivers/s390/crypto/ap_bus.h
@@ -145,6 +145,7 @@ struct ap_driver {

int (*probe)(struct ap_device *);
void (*remove)(struct ap_device *);
+ bool (*in_use)(unsigned long *apm, unsigned long *aqm);
};

#define to_ap_drv(x) container_of((x), struct ap_driver, driver)
@@ -293,6 +294,9 @@ void ap_queue_init_state(struct ap_queue *aq);
struct ap_card *ap_card_create(int id, int queue_depth, int raw_device_type,
int comp_device_type, unsigned int functions);

+#define APMASKSIZE (BITS_TO_LONGS(AP_DEVICES) * sizeof(unsigned long))
+#define AQMASKSIZE (BITS_TO_LONGS(AP_DOMAINS) * sizeof(unsigned long))
+
struct ap_perms {
unsigned long ioctlm[BITS_TO_LONGS(AP_IOCTLS)];
unsigned long apm[BITS_TO_LONGS(AP_DEVICES)];
--
2.21.1

2020-10-23 02:28:43

by Anthony Krowiak

Subject: [PATCH v11 02/14] s390/vfio-ap: use new AP bus interface to search for queue devices

This patch refactors the vfio_ap device driver to use the AP bus's
ap_get_qdev() function to retrieve the vfio_ap_queue struct containing
information about a queue that is bound to the vfio_ap device driver.
The bus's ap_get_qdev() function retrieves the queue device from a
hashtable keyed by APQN. This is much more efficient than looping over
the list of devices attached to the AP bus by several orders of
magnitude.
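
To make the lookup idea concrete, here is a minimal userspace sketch of a
chained hash table keyed by APQN. It is not the AP bus implementation: the
types struct queue and struct qtable, the functions qtable_bucket(),
qtable_add() and qtable_find(), and the hash function are all assumptions
chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define QTABLE_BITS 8
#define QTABLE_SIZE (1u << QTABLE_BITS)

/* Hypothetical queue record; next chains queues sharing a bucket. */
struct queue {
	unsigned int apqn;
	struct queue *next;
};

struct qtable {
	struct queue *buckets[QTABLE_SIZE];
};

/* Fold the upper byte into the lower byte; any reasonable hash works. */
static unsigned int qtable_bucket(unsigned int apqn)
{
	return (apqn ^ (apqn >> 8)) & (QTABLE_SIZE - 1);
}

/* Inserts q at the head of its bucket's chain. */
static void qtable_add(struct qtable *t, struct queue *q)
{
	unsigned int b;

	assert(t != NULL && q != NULL);
	b = qtable_bucket(q->apqn);
	q->next = t->buckets[b];
	t->buckets[b] = q;
}

/* Walks a single bucket instead of every device on the bus. */
static struct queue *qtable_find(const struct qtable *t, unsigned int apqn)
{
	for (struct queue *q = t->buckets[qtable_bucket(apqn)]; q; q = q->next)
		if (q->apqn == apqn)
			return q;
	return NULL;
}
```

A lookup touches one bucket's chain, whereas a list scan has to compare
against every queue device until it finds a match.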

Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/vfio_ap_ops.c | 35 +++++++++++++------------------
1 file changed, 14 insertions(+), 21 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index c471832f0a30..049b97d7444c 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -26,43 +26,36 @@

static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);

-static int match_apqn(struct device *dev, const void *data)
-{
- struct vfio_ap_queue *q = dev_get_drvdata(dev);
-
- return (q->apqn == *(int *)(data)) ? 1 : 0;
-}
-
/**
- * vfio_ap_get_queue: Retrieve a queue with a specific APQN from a list
+ * vfio_ap_get_queue: Retrieve a queue with a specific APQN.
* @matrix_mdev: the associated mediated matrix
* @apqn: The queue APQN
*
- * Retrieve a queue with a specific APQN from the list of the
- * devices of the vfio_ap_drv.
- * Verify that the APID and the APQI are set in the matrix.
+ * Retrieve a queue with a specific APQN from the AP queue devices attached to
+ * the AP bus.
*
- * Returns the pointer to the associated vfio_ap_queue
+ * Returns the pointer to the vfio_ap_queue with the specified APQN, or NULL.
*/
static struct vfio_ap_queue *vfio_ap_get_queue(
struct ap_matrix_mdev *matrix_mdev,
- int apqn)
+ unsigned long apqn)
{
- struct vfio_ap_queue *q;
- struct device *dev;
+ struct ap_queue *queue;
+ struct vfio_ap_queue *q = NULL;

if (!test_bit_inv(AP_QID_CARD(apqn), matrix_mdev->matrix.apm))
return NULL;
if (!test_bit_inv(AP_QID_QUEUE(apqn), matrix_mdev->matrix.aqm))
return NULL;

- dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL,
- &apqn, match_apqn);
- if (!dev)
+ queue = ap_get_qdev(apqn);
+ if (!queue)
return NULL;
- q = dev_get_drvdata(dev);
- q->matrix_mdev = matrix_mdev;
- put_device(dev);
+
+ if (queue->ap_dev.device.driver == &matrix_dev->vfio_ap_drv->driver)
+ q = dev_get_drvdata(&queue->ap_dev.device);
+
+ put_device(&queue->ap_dev.device);

return q;
}
--
2.21.1

2020-10-23 02:28:44

by Anthony Krowiak

Subject: [PATCH v11 03/14] s390/vfio-ap: manage link between queue struct and matrix mdev

Let's create links between each queue device bound to the vfio_ap device
driver and the matrix mdev to which the queue is assigned. The idea is to
facilitate efficient retrieval of the objects representing the queue
devices and matrix mdevs as well as to verify that a queue assigned to
a matrix mdev is bound to the driver.

The links will be created as follows:

* When the queue device is probed, if its APQN is assigned to a matrix
mdev, the structures representing the queue device and the matrix mdev
will be linked.

* When an adapter or domain is assigned to a matrix mdev, for each new
APQN assigned that references a queue device bound to the vfio_ap
device driver, the structures representing the queue device and the
matrix mdev will be linked.

The links will be removed as follows:

* When the queue device is removed, if its APQN is assigned to a matrix
mdev, the structures representing the queue device and the matrix mdev
will be unlinked.

* When an adapter or domain is unassigned from a matrix mdev, for each
APQN unassigned that references a queue device bound to the vfio_ap
device driver, the structures representing the queue device and the
matrix mdev will be unlinked.
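
The assign and unassign cases above can be sketched for an adapter as follows.
This is a simplified userspace model under stated assumptions, not the patch's
code: struct mdev, struct queue, link_adapter(), unlink_adapter() and
MAX_DOMAINS are hypothetical, and a flat array of queues replaces the driver's
device lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DOMAINS 8	/* illustrative limit, not the hardware's */

struct mdev {
	bool domains[MAX_DOMAINS];	/* assigned APQIs */
};

struct queue {
	unsigned int apid, apqi;
	bool bound;		/* bound to the pass-through driver */
	struct mdev *owner;	/* NULL while unlinked */
};

/*
 * Links every bound queue of adapter apid whose domain is assigned
 * to m. Returns the number of links created.
 */
static size_t link_adapter(struct mdev *m, unsigned int apid,
			   struct queue *queues, size_t nqueues)
{
	size_t linked = 0;

	assert(m != NULL);
	for (size_t i = 0; i < nqueues; i++) {
		struct queue *q = &queues[i];

		if (q->apid != apid || !q->bound)
			continue;
		if (q->apqi < MAX_DOMAINS && m->domains[q->apqi]) {
			q->owner = m;
			linked++;
		}
	}
	return linked;
}

/* Clears the links between m and the queues of adapter apid. */
static size_t unlink_adapter(struct mdev *m, unsigned int apid,
			     struct queue *queues, size_t nqueues)
{
	size_t unlinked = 0;

	for (size_t i = 0; i < nqueues; i++) {
		if (queues[i].apid == apid && queues[i].owner == m) {
			queues[i].owner = NULL;
			unlinked++;
		}
	}
	return unlinked;
}
```

The domain case is symmetric: iterate over the assigned adapters for a fixed
APQI instead of the assigned domains for a fixed APID.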

Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/vfio_ap_ops.c | 146 +++++++++++++++++++++++---
drivers/s390/crypto/vfio_ap_private.h | 3 +
2 files changed, 135 insertions(+), 14 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 049b97d7444c..1357f8f8b7e4 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -28,7 +28,6 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);

/**
* vfio_ap_get_queue: Retrieve a queue with a specific APQN.
- * @matrix_mdev: the associated mediated matrix
* @apqn: The queue APQN
*
* Retrieve a queue with a specific APQN from the AP queue devices attached to
@@ -36,18 +35,11 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
*
* Returns the pointer to the vfio_ap_queue with the specified APQN, or NULL.
*/
-static struct vfio_ap_queue *vfio_ap_get_queue(
- struct ap_matrix_mdev *matrix_mdev,
- unsigned long apqn)
+static struct vfio_ap_queue *vfio_ap_get_queue(unsigned long apqn)
{
struct ap_queue *queue;
struct vfio_ap_queue *q = NULL;

- if (!test_bit_inv(AP_QID_CARD(apqn), matrix_mdev->matrix.apm))
- return NULL;
- if (!test_bit_inv(AP_QID_QUEUE(apqn), matrix_mdev->matrix.aqm))
- return NULL;
-
queue = ap_get_qdev(apqn);
if (!queue)
return NULL;
@@ -60,6 +52,19 @@ static struct vfio_ap_queue *vfio_ap_get_queue(
return q;
}

+static struct vfio_ap_queue *
+vfio_ap_mdev_get_queue(struct ap_matrix_mdev *matrix_mdev, unsigned long apqn)
+{
+ struct vfio_ap_queue *q;
+
+ hash_for_each_possible(matrix_mdev->qtable, q, mdev_qnode, apqn) {
+ if (q && (q->apqn == apqn))
+ return q;
+ }
+
+ return NULL;
+}
+
/**
* vfio_ap_wait_for_irqclear
* @apqn: The AP Queue number
@@ -171,7 +176,6 @@ static struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q)
status.response_code);
end_free:
vfio_ap_free_aqic_resources(q);
- q->matrix_mdev = NULL;
return status;
}

@@ -284,14 +288,14 @@ static int handle_pqap(struct kvm_vcpu *vcpu)

if (!vcpu->kvm->arch.crypto.pqap_hook)
goto out_unlock;
+
matrix_mdev = container_of(vcpu->kvm->arch.crypto.pqap_hook,
struct ap_matrix_mdev, pqap_hook);

- q = vfio_ap_get_queue(matrix_mdev, apqn);
+ q = vfio_ap_mdev_get_queue(matrix_mdev, apqn);
if (!q)
goto out_unlock;

- q->matrix_mdev = matrix_mdev;
status = vcpu->run->s.regs.gprs[1];

/* If IR bit(16) is set we enable the interrupt */
@@ -331,6 +335,7 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)

matrix_mdev->mdev = mdev;
vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
+ hash_init(matrix_mdev->qtable);
mdev_set_drvdata(mdev, matrix_mdev);
matrix_mdev->pqap_hook.hook = handle_pqap;
matrix_mdev->pqap_hook.owner = THIS_MODULE;
@@ -559,6 +564,87 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
return 0;
}

+enum qlink_type {
+ LINK_APID,
+ LINK_APQI,
+ UNLINK_APID,
+ UNLINK_APQI,
+};
+
+static void vfio_ap_mdev_link_queue(struct ap_matrix_mdev *matrix_mdev,
+ unsigned long apid, unsigned long apqi)
+{
+ struct vfio_ap_queue *q;
+
+ q = vfio_ap_get_queue(AP_MKQID(apid, apqi));
+ if (q) {
+ q->matrix_mdev = matrix_mdev;
+ hash_add(matrix_mdev->qtable,
+ &q->mdev_qnode, q->apqn);
+ }
+}
+
+static void vfio_ap_mdev_unlink_queue(unsigned long apid, unsigned long apqi)
+{
+ struct vfio_ap_queue *q;
+
+ q = vfio_ap_get_queue(AP_MKQID(apid, apqi));
+ if (q) {
+ q->matrix_mdev = NULL;
+ hash_del(&q->mdev_qnode);
+ }
+}
+
+/**
+ * vfio_ap_mdev_link_queues
+ *
+ * @matrix_mdev: The matrix mdev to link.
+ * @type: The type of @qlink_id.
+ * @qlink_id: The APID or APQI of the queues to link.
+ *
+ * Sets or clears the links between the queues with the specified @qlink_id
+ * and the @matrix_mdev:
+ * @type == LINK_APID: Set the links between the @matrix_mdev and the
+ * queues with the specified @qlink_id (APID)
+ * @type == LINK_APQI: Set the links between the @matrix_mdev and the
+ * queues with the specified @qlink_id (APQI)
+ * @type == UNLINK_APID: Clear the links between the @matrix_mdev and the
+ * queues with the specified @qlink_id (APID)
+ * @type == UNLINK_APQI: Clear the links between the @matrix_mdev and the
+ * queues with the specified @qlink_id (APQI)
+ */
+static void vfio_ap_mdev_link_queues(struct ap_matrix_mdev *matrix_mdev,
+ enum qlink_type type,
+ unsigned long qlink_id)
+{
+ unsigned long id;
+
+ switch (type) {
+ case LINK_APID:
+ for_each_set_bit_inv(id, matrix_mdev->matrix.aqm,
+ matrix_mdev->matrix.aqm_max + 1)
+ vfio_ap_mdev_link_queue(matrix_mdev, qlink_id, id);
+ break;
+ case UNLINK_APID:
+ for_each_set_bit_inv(id, matrix_mdev->matrix.aqm,
+ matrix_mdev->matrix.aqm_max + 1)
+ vfio_ap_mdev_unlink_queue(qlink_id, id);
+ break;
+ case LINK_APQI:
+ for_each_set_bit_inv(id, matrix_mdev->matrix.apm,
+ matrix_mdev->matrix.apm_max + 1)
+ vfio_ap_mdev_link_queue(matrix_mdev, id, qlink_id);
+ break;
+ case UNLINK_APQI:
+ for_each_set_bit_inv(id, matrix_mdev->matrix.apm,
+ matrix_mdev->matrix.apm_max + 1)
+ vfio_ap_mdev_unlink_queue(id, qlink_id);
+ break;
+ default:
+ WARN_ON_ONCE(1);
+ }
+}
+
/**
* assign_adapter_store
*
@@ -628,6 +714,7 @@ static ssize_t assign_adapter_store(struct device *dev,
if (ret)
goto share_err;

+ vfio_ap_mdev_link_queues(matrix_mdev, LINK_APID, apid);
ret = count;
goto done;

@@ -679,6 +766,7 @@ static ssize_t unassign_adapter_store(struct device *dev,

mutex_lock(&matrix_dev->lock);
clear_bit_inv((unsigned long)apid, matrix_mdev->matrix.apm);
+ vfio_ap_mdev_link_queues(matrix_mdev, UNLINK_APID, apid);
mutex_unlock(&matrix_dev->lock);

return count;
@@ -769,6 +857,7 @@ static ssize_t assign_domain_store(struct device *dev,
if (ret)
goto share_err;

+ vfio_ap_mdev_link_queues(matrix_mdev, LINK_APQI, apqi);
ret = count;
goto done;

@@ -821,6 +910,7 @@ static ssize_t unassign_domain_store(struct device *dev,

mutex_lock(&matrix_dev->lock);
clear_bit_inv((unsigned long)apqi, matrix_mdev->matrix.aqm);
+ vfio_ap_mdev_link_queues(matrix_mdev, UNLINK_APQI, apqi);
mutex_unlock(&matrix_dev->lock);

return count;
@@ -1159,8 +1249,8 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
*/
if (ret)
rc = ret;
- q = vfio_ap_get_queue(matrix_mdev,
- AP_MKQID(apid, apqi));
+ q = vfio_ap_mdev_get_queue(matrix_mdev,
+ AP_MKQID(apid, apqi));
if (q)
vfio_ap_free_aqic_resources(q);
}
@@ -1288,6 +1378,29 @@ void vfio_ap_mdev_unregister(void)
mdev_unregister_device(&matrix_dev->device);
}

+/**
+ * vfio_ap_queue_link_mdev
+ *
+ * @q: The queue to link with the matrix mdev.
+ *
+ * Links @q with the matrix mdev to which the queue's APQN is assigned.
+ */
+static void vfio_ap_queue_link_mdev(struct vfio_ap_queue *q)
+{
+ unsigned long apid = AP_QID_CARD(q->apqn);
+ unsigned long apqi = AP_QID_QUEUE(q->apqn);
+ struct ap_matrix_mdev *matrix_mdev;
+
+ list_for_each_entry(matrix_mdev, &matrix_dev->mdev_list, node) {
+ if (test_bit_inv(apid, matrix_mdev->matrix.apm) &&
+ test_bit_inv(apqi, matrix_mdev->matrix.aqm)) {
+ q->matrix_mdev = matrix_mdev;
+ hash_add(matrix_mdev->qtable, &q->mdev_qnode, q->apqn);
+ break;
+ }
+ }
+}
+
int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
{
struct vfio_ap_queue *q;
@@ -1299,9 +1412,12 @@ int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
if (!q)
return -ENOMEM;

+ mutex_lock(&matrix_dev->lock);
dev_set_drvdata(&queue->ap_dev.device, q);
q->apqn = queue->qid;
q->saved_isc = VFIO_AP_ISC_INVALID;
+ vfio_ap_queue_link_mdev(q);
+ mutex_unlock(&matrix_dev->lock);

return 0;
}
@@ -1321,6 +1437,8 @@ void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
apqi = AP_QID_QUEUE(q->apqn);
vfio_ap_mdev_reset_queue(apid, apqi, 1);
vfio_ap_free_aqic_resources(q);
+ if (q->matrix_mdev)
+ hash_del(&q->mdev_qnode);
kfree(q);
mutex_unlock(&matrix_dev->lock);
}
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index d9003de4fbad..4e5cc72fc0db 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -18,6 +18,7 @@
#include <linux/delay.h>
#include <linux/mutex.h>
#include <linux/kvm_host.h>
+#include <linux/hashtable.h>

#include "ap_bus.h"

@@ -86,6 +87,7 @@ struct ap_matrix_mdev {
struct kvm *kvm;
struct kvm_s390_module_hook pqap_hook;
struct mdev_device *mdev;
+ DECLARE_HASHTABLE(qtable, 8);
};

extern int vfio_ap_mdev_register(void);
@@ -97,6 +99,7 @@ struct vfio_ap_queue {
int apqn;
#define VFIO_AP_ISC_INVALID 0xff
unsigned char saved_isc;
+ struct hlist_node mdev_qnode;
};

int vfio_ap_mdev_probe_queue(struct ap_device *queue);
--
2.21.1

2020-10-27 14:12:24

by Halil Pasic

Subject: Re: [PATCH v11 02/14] s390/vfio-ap: use new AP bus interface to search for queue devices

On Thu, 22 Oct 2020 13:11:57 -0400
Tony Krowiak <[email protected]> wrote:

> This patch refactors the vfio_ap device driver to use the AP bus's
> ap_get_qdev() function to retrieve the vfio_ap_queue struct containing
> information about a queue that is bound to the vfio_ap device driver.
> The bus's ap_get_qdev() function retrieves the queue device from a
> hashtable keyed by APQN. This is much more efficient than looping over
> the list of devices attached to the AP bus by several orders of
> magnitude.
>
> Signed-off-by: Tony Krowiak <[email protected]>

Reviewed-by: Halil Pasic <[email protected]>

> ---
> drivers/s390/crypto/vfio_ap_ops.c | 35 +++++++++++++------------------
> 1 file changed, 14 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index c471832f0a30..049b97d7444c 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -26,43 +26,36 @@
>
> static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
>
> -static int match_apqn(struct device *dev, const void *data)
> -{
> - struct vfio_ap_queue *q = dev_get_drvdata(dev);
> -
> - return (q->apqn == *(int *)(data)) ? 1 : 0;
> -}
> -
> /**
> - * vfio_ap_get_queue: Retrieve a queue with a specific APQN from a list
> + * vfio_ap_get_queue: Retrieve a queue with a specific APQN.
> * @matrix_mdev: the associated mediated matrix
> * @apqn: The queue APQN
> *
> - * Retrieve a queue with a specific APQN from the list of the
> - * devices of the vfio_ap_drv.
> - * Verify that the APID and the APQI are set in the matrix.
> + * Retrieve a queue with a specific APQN from the AP queue devices attached to
> + * the AP bus.
> *
> - * Returns the pointer to the associated vfio_ap_queue
> + * Returns the pointer to the vfio_ap_queue with the specified APQN, or NULL.
> */
> static struct vfio_ap_queue *vfio_ap_get_queue(
> struct ap_matrix_mdev *matrix_mdev,
> - int apqn)
> + unsigned long apqn)
> {
> - struct vfio_ap_queue *q;
> - struct device *dev;
> + struct ap_queue *queue;
> + struct vfio_ap_queue *q = NULL;
>
> if (!test_bit_inv(AP_QID_CARD(apqn), matrix_mdev->matrix.apm))
> return NULL;
> if (!test_bit_inv(AP_QID_QUEUE(apqn), matrix_mdev->matrix.aqm))
> return NULL;
>
> - dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL,
> - &apqn, match_apqn);
> - if (!dev)
> + queue = ap_get_qdev(apqn);
> + if (!queue)
> return NULL;
> - q = dev_get_drvdata(dev);
> - q->matrix_mdev = matrix_mdev;
> - put_device(dev);
> +
> + if (queue->ap_dev.device.driver == &matrix_dev->vfio_ap_drv->driver)
> + q = dev_get_drvdata(&queue->ap_dev.device);
> +

Needs to be called with the vfio_ap lock held, right? Otherwise the queue could
get unbound while we are working with it as a vfio_ap_queue... Nothing
new, but might be worth documenting.

> + put_device(&queue->ap_dev.device);
>
> return q;
> }

2020-10-27 14:55:36

by Halil Pasic

Subject: Re: [PATCH v11 05/14] s390/vfio-ap: implement in-use callback for vfio_ap driver

On Thu, 22 Oct 2020 13:12:00 -0400
Tony Krowiak <[email protected]> wrote:

> Let's implement the callback to indicate when an APQN
> is in use by the vfio_ap device driver. The callback is
> invoked whenever a change to the apmask or aqmask would
> result in one or more queue devices being removed from the driver. The
> vfio_ap device driver will indicate a resource is in use
> if the APQN of any of the queue devices to be removed are assigned to
> any of the matrix mdevs under the driver's control.
>
> Signed-off-by: Tony Krowiak <[email protected]>
> ---
> drivers/s390/crypto/vfio_ap_drv.c | 1 +
> drivers/s390/crypto/vfio_ap_ops.c | 78 +++++++++++++++++++--------
> drivers/s390/crypto/vfio_ap_private.h | 2 +
> 3 files changed, 60 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
> index 73bd073fd5d3..8934471b7944 100644
> --- a/drivers/s390/crypto/vfio_ap_drv.c
> +++ b/drivers/s390/crypto/vfio_ap_drv.c
> @@ -147,6 +147,7 @@ static int __init vfio_ap_init(void)
> memset(&vfio_ap_drv, 0, sizeof(vfio_ap_drv));
> vfio_ap_drv.probe = vfio_ap_mdev_probe_queue;
> vfio_ap_drv.remove = vfio_ap_mdev_remove_queue;
> + vfio_ap_drv.in_use = vfio_ap_mdev_resource_in_use;
> vfio_ap_drv.ids = ap_queue_ids;
>
> ret = ap_driver_register(&vfio_ap_drv, THIS_MODULE, VFIO_AP_DRV_NAME);
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 1357f8f8b7e4..9e9fad560859 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -522,18 +522,40 @@ vfio_ap_mdev_verify_queues_reserved_for_apid(struct ap_matrix_mdev *matrix_mdev,
> return 0;
> }
>
> +#define MDEV_SHARING_ERR "Userspace may not re-assign queue %02lx.%04lx " \
> + "already assigned to %s"
> +
> +static void vfio_ap_mdev_log_sharing_err(const char *mdev_name,
> + unsigned long *apm,
> + unsigned long *aqm)
> +{
> + unsigned long apid, apqi;
> +
> + for_each_set_bit_inv(apid, apm, AP_DEVICES)
> + for_each_set_bit_inv(apqi, aqm, AP_DOMAINS)
> + pr_err(MDEV_SHARING_ERR, apid, apqi, mdev_name);

Isn't error rather severe for this? For my taste even warning would be
severe for this.

> +}
> +
> /**
> * vfio_ap_mdev_verify_no_sharing
> *
> - * Verifies that the APQNs derived from the cross product of the AP adapter IDs
> - * and AP queue indexes comprising the AP matrix are not configured for another
> + * Verifies that each APQN derived from the cross product of the AP adapter IDs
> + * and AP queue indexes comprising an AP matrix is not assigned to a
> * mediated device. AP queue sharing is not allowed.
> *
> - * @matrix_mdev: the mediated matrix device
> + * @matrix_mdev: the mediated matrix device to which the APQNs being verified
> + * are assigned. If the value is not NULL, then verification will
> + * proceed for all other matrix mediated devices; otherwise, all
> + * matrix mediated devices will be verified.
> + * @mdev_apm: mask indicating the APIDs of the APQNs to be verified
> + * @mdev_aqm: mask indicating the APQIs of the APQNs to be verified
> *
> - * Returns 0 if the APQNs are not shared, otherwise; returns -EADDRINUSE.
> + * Returns 0 if no APQNs are not shared, otherwise; returns -EADDRINUSE if one
> + * or more APQNs are shared.
> */
> -static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
> +static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev,
> + unsigned long *mdev_apm,
> + unsigned long *mdev_aqm)
> {
> struct ap_matrix_mdev *lstdev;
> DECLARE_BITMAP(apm, AP_DEVICES);
> @@ -550,14 +572,15 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
> * We work on full longs, as we can only exclude the leftover
> * bits in non-inverse order. The leftover is all zeros.
> */
> - if (!bitmap_and(apm, matrix_mdev->matrix.apm,
> - lstdev->matrix.apm, AP_DEVICES))
> + if (!bitmap_and(apm, mdev_apm, lstdev->matrix.apm, AP_DEVICES))
> continue;
>
> - if (!bitmap_and(aqm, matrix_mdev->matrix.aqm,
> - lstdev->matrix.aqm, AP_DOMAINS))
> + if (!bitmap_and(aqm, mdev_aqm, lstdev->matrix.aqm, AP_DOMAINS))
> continue;
>
> + vfio_ap_mdev_log_sharing_err(dev_name(mdev_dev(lstdev->mdev)),
> + apm, aqm);
> +
> return -EADDRINUSE;
> }
>
> @@ -683,6 +706,7 @@ static ssize_t assign_adapter_store(struct device *dev,
> {
> int ret;
> unsigned long apid;
> + DECLARE_BITMAP(apm, AP_DEVICES);
> struct mdev_device *mdev = mdev_from_dev(dev);
> struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>
> @@ -708,18 +732,18 @@ static ssize_t assign_adapter_store(struct device *dev,
> if (ret)
> goto done;
>
> - set_bit_inv(apid, matrix_mdev->matrix.apm);
> + memset(apm, 0, sizeof(apm));
> + set_bit_inv(apid, apm);
>
> - ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev);
> + ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev, apm,
> + matrix_mdev->matrix.aqm);

What is the benefit of using a copy here? I mean we have the vfio_ap lock
so nobody can see the bit we speculatively flipped.

I've also pointed out in the previous patch that in_use() isn't
perfectly reliable (at least in theory) because of a race.

Otherwise looks good to me!

> if (ret)
> - goto share_err;
> + goto done;
>
> + set_bit_inv(apid, matrix_mdev->matrix.apm);
> vfio_ap_mdev_link_queues(matrix_mdev, LINK_APID, apid);
> ret = count;
> - goto done;
>
> -share_err:
> - clear_bit_inv(apid, matrix_mdev->matrix.apm);
> done:
> mutex_unlock(&matrix_dev->lock);
>
> @@ -831,6 +855,7 @@ static ssize_t assign_domain_store(struct device *dev,
> {
> int ret;
> unsigned long apqi;
> + DECLARE_BITMAP(aqm, AP_DOMAINS);
> struct mdev_device *mdev = mdev_from_dev(dev);
> struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
> unsigned long max_apqi = matrix_mdev->matrix.aqm_max;
> @@ -851,18 +876,18 @@ static ssize_t assign_domain_store(struct device *dev,
> if (ret)
> goto done;
>
> - set_bit_inv(apqi, matrix_mdev->matrix.aqm);
> + memset(aqm, 0, sizeof(aqm));
> + set_bit_inv(apqi, aqm);
>
> - ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev);
> + ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev,
> + matrix_mdev->matrix.apm, aqm);
> if (ret)
> - goto share_err;
> + goto done;
>
> + set_bit_inv(apqi, matrix_mdev->matrix.aqm);
> vfio_ap_mdev_link_queues(matrix_mdev, LINK_APQI, apqi);
> ret = count;
> - goto done;
>
> -share_err:
> - clear_bit_inv(apqi, matrix_mdev->matrix.aqm);
> done:
> mutex_unlock(&matrix_dev->lock);
>
> @@ -1442,3 +1467,14 @@ void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
> kfree(q);
> mutex_unlock(&matrix_dev->lock);
> }
> +
> +bool vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm)
> +{
> + bool in_use;
> +
> + mutex_lock(&matrix_dev->lock);
> + in_use = !!vfio_ap_mdev_verify_no_sharing(NULL, apm, aqm);
> + mutex_unlock(&matrix_dev->lock);
> +
> + return in_use;
> +}
> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
> index 4e5cc72fc0db..c1d8b5507610 100644
> --- a/drivers/s390/crypto/vfio_ap_private.h
> +++ b/drivers/s390/crypto/vfio_ap_private.h
> @@ -105,4 +105,6 @@ struct vfio_ap_queue {
> int vfio_ap_mdev_probe_queue(struct ap_device *queue);
> void vfio_ap_mdev_remove_queue(struct ap_device *queue);
>
> +bool vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm);
> +
> #endif /* _VFIO_AP_PRIVATE_H_ */

2020-10-27 17:33:47

by Harald Freudenberger

Subject: Re: [PATCH v11 04/14] s390/zcrypt: driver callback to indicate resource in use

On 22.10.20 19:11, Tony Krowiak wrote:
> Introduces a new driver callback to prevent a root user from unbinding
> an AP queue from its device driver if the queue is in use. The callback
> will be invoked whenever a change to the AP bus's sysfs apmask or aqmask
> attributes would result in one or more AP queues being removed from its
> driver. If the callback responds in the affirmative for any driver
> queried, the change to the apmask or aqmask will be rejected with a device
> in use error.
>
> For this patch, only non-default drivers will be queried. Currently,
> there is only one non-default driver, the vfio_ap device driver. The
> vfio_ap device driver facilitates pass-through of an AP queue to a
> guest. The idea here is that a guest may be administered by a different
> sysadmin than the host and we don't want AP resources to unexpectedly
> disappear from a guest's AP configuration (i.e., adapters and domains
> assigned to the matrix mdev). This will enforce the proper procedure for
> removing AP resources intended for guest usage which is to
> first unassign them from the matrix mdev, then unbind them from the
> vfio_ap device driver.
>
> Signed-off-by: Tony Krowiak <[email protected]>
> ---
> drivers/s390/crypto/ap_bus.c | 148 ++++++++++++++++++++++++++++++++---
> drivers/s390/crypto/ap_bus.h | 4 +
> 2 files changed, 142 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/s390/crypto/ap_bus.c b/drivers/s390/crypto/ap_bus.c
> index 485cbfcbf06e..998e61cd86d9 100644
> --- a/drivers/s390/crypto/ap_bus.c
> +++ b/drivers/s390/crypto/ap_bus.c
> @@ -35,6 +35,7 @@
> #include <linux/mod_devicetable.h>
> #include <linux/debugfs.h>
> #include <linux/ctype.h>
> +#include <linux/module.h>
>
> #include "ap_bus.h"
> #include "ap_debug.h"
> @@ -893,6 +894,23 @@ static int modify_bitmap(const char *str, unsigned long *bitmap, int bits)
> return 0;
> }
>
> +static int ap_parse_bitmap_str(const char *str, unsigned long *bitmap, int bits,
> + unsigned long *newmap)
> +{
> + unsigned long size;
> + int rc;
> +
> + size = BITS_TO_LONGS(bits)*sizeof(unsigned long);
> + if (*str == '+' || *str == '-') {
> + memcpy(newmap, bitmap, size);
> + rc = modify_bitmap(str, newmap, bits);
> + } else {
> + memset(newmap, 0, size);
> + rc = hex2bitmap(str, newmap, bits);
> + }
> + return rc;
> +}
> +
> int ap_parse_mask_str(const char *str,
> unsigned long *bitmap, int bits,
> struct mutex *lock)
> @@ -912,14 +930,7 @@ int ap_parse_mask_str(const char *str,
> kfree(newmap);
> return -ERESTARTSYS;
> }
> -
> - if (*str == '+' || *str == '-') {
> - memcpy(newmap, bitmap, size);
> - rc = modify_bitmap(str, newmap, bits);
> - } else {
> - memset(newmap, 0, size);
> - rc = hex2bitmap(str, newmap, bits);
> - }
> + rc = ap_parse_bitmap_str(str, bitmap, bits, newmap);
> if (rc == 0)
> memcpy(bitmap, newmap, size);
> mutex_unlock(lock);
> @@ -1111,12 +1122,70 @@ static ssize_t apmask_show(struct bus_type *bus, char *buf)
> return rc;
> }
>
> +static int __verify_card_reservations(struct device_driver *drv, void *data)
> +{
> + int rc = 0;
> + struct ap_driver *ap_drv = to_ap_drv(drv);
> + unsigned long *newapm = (unsigned long *)data;
> +
> + /*
> + * No need to verify whether the driver is using the queues if it is the
> + * default driver.
> + */
> + if (ap_drv->flags & AP_DRIVER_FLAG_DEFAULT)
> + return 0;
> +
> + /* The non-default driver's module must be loaded */
Can you please update this comment? It should be something like
/* increase the driver's module refcounter to be sure it is not
   going away when we invoke the callback function. */

> + if (!try_module_get(drv->owner))
> + return 0;
> +
> + if (ap_drv->in_use)
> + if (ap_drv->in_use(newapm, ap_perms.aqm))
> + rc = -EBUSY;
> +
And here: /* release driver's module */ or similar
> + module_put(drv->owner);
> +
> + return rc;
> +}
> +
> +static int apmask_commit(unsigned long *newapm)
> +{
> + int rc;
> + unsigned long reserved[BITS_TO_LONGS(AP_DEVICES)];
> +
> + /*
> + * Check if any bits in the apmask have been set which will
> + * result in queues being removed from non-default drivers
> + */
> + if (bitmap_andnot(reserved, newapm, ap_perms.apm, AP_DEVICES)) {
> + rc = bus_for_each_drv(&ap_bus_type, NULL, reserved,
> + __verify_card_reservations);
> + if (rc)
> + return rc;
> + }
> +
> + memcpy(ap_perms.apm, newapm, APMASKSIZE);
> +
> + return 0;
> +}
> +
> static ssize_t apmask_store(struct bus_type *bus, const char *buf,
> size_t count)
> {
> int rc;
> + DECLARE_BITMAP(newapm, AP_DEVICES);
> +
> + if (mutex_lock_interruptible(&ap_perms_mutex))
> + return -ERESTARTSYS;
> +
> + rc = ap_parse_bitmap_str(buf, ap_perms.apm, AP_DEVICES, newapm);
> + if (rc)
> + goto done;
>
> - rc = ap_parse_mask_str(buf, ap_perms.apm, AP_DEVICES, &ap_perms_mutex);
> + rc = apmask_commit(newapm);
> +
> +done:
> + mutex_unlock(&ap_perms_mutex);
> if (rc)
> return rc;
>
> @@ -1142,12 +1211,71 @@ static ssize_t aqmask_show(struct bus_type *bus, char *buf)
> return rc;
> }
>
> +static int __verify_queue_reservations(struct device_driver *drv, void *data)
> +{
> + int rc = 0;
> + struct ap_driver *ap_drv = to_ap_drv(drv);
> + unsigned long *newaqm = (unsigned long *)data;
> +
> + /*
> + * If the reserved bits do not identify queues reserved for use by the
> + * non-default driver, there is no need to verify the driver is using
> + * the queues.
> + */
> + if (ap_drv->flags & AP_DRIVER_FLAG_DEFAULT)
> + return 0;
> +
> + /* The non-default driver's module must be loaded */
Same here.
> + if (!try_module_get(drv->owner))
> + return 0;
> +
> + if (ap_drv->in_use)
> + if (ap_drv->in_use(ap_perms.apm, newaqm))
> + rc = -EBUSY;
> +
and here
> + module_put(drv->owner);
> +
> + return rc;
> +}
> +
> +static int aqmask_commit(unsigned long *newaqm)
> +{
> + int rc;
> + unsigned long reserved[BITS_TO_LONGS(AP_DOMAINS)];
> +
> + /*
> + * Check if any bits in the aqmask have been set which will
> + * result in queues being removed from non-default drivers
> + */
> + if (bitmap_andnot(reserved, newaqm, ap_perms.aqm, AP_DOMAINS)) {
> + rc = bus_for_each_drv(&ap_bus_type, NULL, reserved,
> + __verify_queue_reservations);
> + if (rc)
> + return rc;
> + }
> +
> + memcpy(ap_perms.aqm, newaqm, AQMASKSIZE);
> +
> + return 0;
> +}
> +
> static ssize_t aqmask_store(struct bus_type *bus, const char *buf,
> size_t count)
> {
> int rc;
> + DECLARE_BITMAP(newaqm, AP_DOMAINS);
>
> - rc = ap_parse_mask_str(buf, ap_perms.aqm, AP_DOMAINS, &ap_perms_mutex);
> + if (mutex_lock_interruptible(&ap_perms_mutex))
> + return -ERESTARTSYS;
> +
> + rc = ap_parse_bitmap_str(buf, ap_perms.aqm, AP_DOMAINS, newaqm);
> + if (rc)
> + goto done;
> +
> + rc = aqmask_commit(newaqm);
> +
> +done:
> + mutex_unlock(&ap_perms_mutex);
> if (rc)
> return rc;
>
> diff --git a/drivers/s390/crypto/ap_bus.h b/drivers/s390/crypto/ap_bus.h
> index 5029b80132aa..6ce154d924d3 100644
> --- a/drivers/s390/crypto/ap_bus.h
> +++ b/drivers/s390/crypto/ap_bus.h
> @@ -145,6 +145,7 @@ struct ap_driver {
>
> int (*probe)(struct ap_device *);
> void (*remove)(struct ap_device *);
> + bool (*in_use)(unsigned long *apm, unsigned long *aqm);
> };
>
> #define to_ap_drv(x) container_of((x), struct ap_driver, driver)
> @@ -293,6 +294,9 @@ void ap_queue_init_state(struct ap_queue *aq);
> struct ap_card *ap_card_create(int id, int queue_depth, int raw_device_type,
> int comp_device_type, unsigned int functions);
>
> +#define APMASKSIZE (BITS_TO_LONGS(AP_DEVICES) * sizeof(unsigned long))
> +#define AQMASKSIZE (BITS_TO_LONGS(AP_DOMAINS) * sizeof(unsigned long))
> +
> struct ap_perms {
> unsigned long ioctlm[BITS_TO_LONGS(AP_IOCTLS)];
> unsigned long apm[BITS_TO_LONGS(AP_DEVICES)];
I still don't like this code. That's because of what it is doing - not because of the code quality.
And Halil, you are right. It is adding more pressure to the mutex used for locking the apmask
and aqmask stuff (and the zcrypt multiple device drivers support code also).
I am very concerned about the in_use callback which is called with the ap_perms_mutex
held AND during bus_for_each_drv (so holding the overall AP BUS mutex) and then diving
into the vfio_ap ... with yet another mutex to protect the vfio structs.
Reviewed-by: Harald Freudenberger <[email protected]>
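
The nesting described above (ap_perms_mutex, then the bus-level lock held
during bus_for_each_drv(), then the vfio_ap lock inside in_use()) is only
safe if every code path takes these locks in the same order. A minimal
userspace sketch of that rule, assuming one rank per lock; enum lock_rank,
struct lock_tracker and both helpers are hypothetical and not part of the
AP bus or vfio_ap code:

```c
#include <stdbool.h>

/* Hypothetical ranks: a lock may only be taken while holding lower ranks. */
enum lock_rank {
	RANK_AP_PERMS = 1,	/* stands in for ap_perms_mutex */
	RANK_AP_BUS   = 2,	/* stands in for the bus-level lock */
	RANK_VFIO_AP  = 3,	/* stands in for the vfio_ap matrix lock */
};

#define MAX_HELD 8

/* Per-thread record of the ranks currently held, innermost last. */
struct lock_tracker {
	int held[MAX_HELD];
	int depth;
};

/* Record an acquisition; returns false if it would violate the order. */
static bool tracker_acquire(struct lock_tracker *t, enum lock_rank rank)
{
	if (t->depth == MAX_HELD)
		return false;
	if (t->depth > 0 && t->held[t->depth - 1] >= (int)rank)
		return false;
	t->held[t->depth++] = (int)rank;
	return true;
}

/* Drop the innermost lock, if any. */
static void tracker_release(struct lock_tracker *t)
{
	if (t->depth > 0)
		t->depth--;
}
```

A path that already holds the vfio_ap lock and then asks for
ap_perms_mutex would be rejected by this check; that inverse order is the
one that could deadlock against the apmask/aqmask store path.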

2020-10-28 06:17:41

by Halil Pasic

Subject: Re: [PATCH v11 03/14] s390/vfio-ap: manage link between queue struct and matrix mdev

On Thu, 22 Oct 2020 13:11:58 -0400
Tony Krowiak <[email protected]> wrote:

> Let's create links between each queue device bound to the vfio_ap device
> driver and the matrix mdev to which the queue is assigned. The idea is to
> facilitate efficient retrieval of the objects representing the queue
> devices and matrix mdevs as well as to verify that a queue assigned to
> a matrix mdev is bound to the driver.
>
> The links will be created as follows:
>
> * When the queue device is probed, if its APQN is assigned to a matrix
> mdev, the structures representing the queue device and the matrix mdev
> will be linked.
>
> * When an adapter or domain is assigned to a matrix mdev, for each new
> APQN assigned that references a queue device bound to the vfio_ap
> device driver, the structures representing the queue device and the
> matrix mdev will be linked.
>
> The links will be removed as follows:
>
> * When the queue device is removed, if its APQN is assigned to a matrix
> mdev, the structures representing the queue device and the matrix mdev
> will be unlinked.
>
> * When an adapter or domain is unassigned from a matrix mdev, for each
> APQN unassigned that references a queue device bound to the vfio_ap
> device driver, the structures representing the queue device and the
> matrix mdev will be unlinked.
>

I would prefer if the changes to the q->matrix_mdev link were restricted
to this patch. Patches 1 and 2 do some of that stuff as well. See my
comments in the code.

> Signed-off-by: Tony Krowiak <[email protected]>
> ---
> drivers/s390/crypto/vfio_ap_ops.c | 146 +++++++++++++++++++++++---
> drivers/s390/crypto/vfio_ap_private.h | 3 +
> 2 files changed, 135 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 049b97d7444c..1357f8f8b7e4 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -28,7 +28,6 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
>
> /**
> * vfio_ap_get_queue: Retrieve a queue with a specific APQN.
> - * @matrix_mdev: the associated mediated matrix
> * @apqn: The queue APQN
> *
> * Retrieve a queue with a specific APQN from the AP queue devices attached to
> @@ -36,18 +35,11 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
> *
> * Returns the pointer to the vfio_ap_queue with the specified APQN, or NULL.
> */
> -static struct vfio_ap_queue *vfio_ap_get_queue(
> - struct ap_matrix_mdev *matrix_mdev,
> - unsigned long apqn)
> +static struct vfio_ap_queue *vfio_ap_get_queue(unsigned long apqn)
> {
> struct ap_queue *queue;
> struct vfio_ap_queue *q = NULL;
>
> - if (!test_bit_inv(AP_QID_CARD(apqn), matrix_mdev->matrix.apm))
> - return NULL;
> - if (!test_bit_inv(AP_QID_QUEUE(apqn), matrix_mdev->matrix.aqm))
> - return NULL;
> -
> queue = ap_get_qdev(apqn);
> if (!queue)
> return NULL;

Patch 2 removed
q->matrix_mdev = matrix_mdev;
because patch 1 made it redundant. But patch 1 should not have made it
redundant in the first place.

It should be removed in this patch.

> @@ -60,6 +52,19 @@ static struct vfio_ap_queue *vfio_ap_get_queue(
> return q;
> }
>
> +static struct vfio_ap_queue *
> +vfio_ap_mdev_get_queue(struct ap_matrix_mdev *matrix_mdev, unsigned long apqn)
> +{
> + struct vfio_ap_queue *q;
> +
> + hash_for_each_possible(matrix_mdev->qtable, q, mdev_qnode, apqn) {
> + if (q && (q->apqn == apqn))
> + return q;
> + }
> +
> + return NULL;
> +}
> +
> /**
> * vfio_ap_wait_for_irqclear
> * @apqn: The AP Queue number
> @@ -171,7 +176,6 @@ static struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q)
> status.response_code);
> end_free:
> vfio_ap_free_aqic_resources(q);
> - q->matrix_mdev = NULL;
> return status;
> }
>
> @@ -284,14 +288,14 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
>
> if (!vcpu->kvm->arch.crypto.pqap_hook)
> goto out_unlock;
> +
> matrix_mdev = container_of(vcpu->kvm->arch.crypto.pqap_hook,
> struct ap_matrix_mdev, pqap_hook);
>
> - q = vfio_ap_get_queue(matrix_mdev, apqn);
> + q = vfio_ap_mdev_get_queue(matrix_mdev, apqn);
> if (!q)
> goto out_unlock;
>
> - q->matrix_mdev = matrix_mdev;

This was unnecessarily added in patch 1, now it's removed.

> status = vcpu->run->s.regs.gprs[1];
>
> /* If IR bit(16) is set we enable the interrupt */
> @@ -331,6 +335,7 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
>
> matrix_mdev->mdev = mdev;
> vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
> + hash_init(matrix_mdev->qtable);
> mdev_set_drvdata(mdev, matrix_mdev);
> matrix_mdev->pqap_hook.hook = handle_pqap;
> matrix_mdev->pqap_hook.owner = THIS_MODULE;
> @@ -559,6 +564,87 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
> return 0;
> }
>
> +enum qlink_type {
> + LINK_APID,
> + LINK_APQI,
> + UNLINK_APID,
> + UNLINK_APQI,
> +};
> +
> +static void vfio_ap_mdev_link_queue(struct ap_matrix_mdev *matrix_mdev,
> + unsigned long apid, unsigned long apqi)
> +{
> + struct vfio_ap_queue *q;
> +
> + q = vfio_ap_get_queue(AP_MKQID(apid, apqi));
> + if (q) {
> + q->matrix_mdev = matrix_mdev;
> + hash_add(matrix_mdev->qtable,
> + &q->mdev_qnode, q->apqn);
> + }
> +}
> +
> +static void vfio_ap_mdev_unlink_queue(unsigned long apid, unsigned long apqi)
> +{
> + struct vfio_ap_queue *q;
> +
> + q = vfio_ap_get_queue(AP_MKQID(apid, apqi));
> + if (q) {
> + q->matrix_mdev = NULL;
> + hash_del(&q->mdev_qnode);
> + }
> +}
> +
> +/**
> + * vfio_ap_mdev_link_queues
> + *
> + * @matrix_mdev: The matrix mdev to link.
> + * @type: The type of @qlink_id.
> + * @qlink_id: The APID or APQI of the queues to link.
> + *
> + * Sets or clears the links between the queues with the specified @qlink_id
> + * and the @matrix_mdev:
> + * @type == LINK_APID: Set the links between the @matrix_mdev and the
> + * queues with the specified @qlink_id (APID)
> + * @type == LINK_APQI: Set the links between the @matrix_mdev and the
> + * queues with the specified @qlink_id (APQI)
> + * @type == UNLINK_APID: Clear the links between the @matrix_mdev and the
> + * queues with the specified @qlink_id (APID)
> + * @type == UNLINK_APQI: Clear the links between the @matrix_mdev and the
> + * queues with the specified @qlink_id (APQI)
> + */
> +static void vfio_ap_mdev_link_queues(struct ap_matrix_mdev *matrix_mdev,
> + enum qlink_type type,
> + unsigned long qlink_id)

I believe Connie wanted this changed, and IMHO she is right: this does
not specify the type of link (the type of the link is always the same);
it determines what action needs to be taken. The enum name qlink_type
reads like it's the type of the qlink, but as your doc says, it just
tells you what qlink_id is.

If apids and apqis had their own type-checked distinct type, the type of qlink_id
would be the union of those two...
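
To make the naming point concrete, here is a hedged sketch of what
distinct types could look like. None of these names exist in the driver:
struct apid, struct apqi, enum qlink_action, struct qlink_id and the
helpers are all hypothetical, and the sketch is plain userspace C.

```c
#include <stdbool.h>

/* Hypothetical wrapper types so an APID cannot be passed as an APQI. */
struct apid { unsigned long val; };
struct apqi { unsigned long val; };

/* What to do with the links; independent of which ID was given. */
enum qlink_action { QLINK_SET, QLINK_CLEAR };

/* Which kind of ID selects the queues. */
enum qlink_id_kind { QLINK_BY_APID, QLINK_BY_APQI };

/* Tagged union: the "union of those two" types mentioned above. */
struct qlink_id {
	enum qlink_id_kind kind;
	union {
		struct apid apid;
		struct apqi apqi;
	} u;
};

static struct qlink_id qlink_id_from_apid(struct apid apid)
{
	struct qlink_id id = { .kind = QLINK_BY_APID, .u = { .apid = apid } };

	return id;
}

static struct qlink_id qlink_id_from_apqi(struct apqi apqi)
{
	struct qlink_id id = { .kind = QLINK_BY_APQI, .u = { .apqi = apqi } };

	return id;
}

/* True if the queue (apid, apqi) is selected by @id. */
static bool qlink_id_selects(struct qlink_id id, struct apid apid,
			     struct apqi apqi)
{
	if (id.kind == QLINK_BY_APID)
		return id.u.apid.val == apid.val;
	return id.u.apqi.val == apqi.val;
}

/* Link state a selected queue ends up in after applying @action. */
static bool qlink_new_state(enum qlink_action action)
{
	return action == QLINK_SET;
}
```

Splitting the action from the kind of ID turns the four-value enum into
two independent two-value ones, which is the distinction the comment above
is asking for.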

> +{
> + unsigned long id;
> +
> + switch (type) {

Since each of these cases is used at exactly one place, maybe it would
be simpler to just inline them where they are needed. Or are these going
to be used in other situations as well?

> + case LINK_APID:

assign_adapter

> + for_each_set_bit_inv(id, matrix_mdev->matrix.aqm,
> + matrix_mdev->matrix.aqm_max + 1)
> + vfio_ap_mdev_link_queue(matrix_mdev, qlink_id, id);
> + break;
> + case UNLINK_APID:

unassign_adapter

> + for_each_set_bit_inv(id, matrix_mdev->matrix.aqm,
> + matrix_mdev->matrix.aqm_max + 1)
> + vfio_ap_mdev_unlink_queue(qlink_id, id);
> + break;
> + case LINK_APQI:

assign_domain

> + for_each_set_bit_inv(id, matrix_mdev->matrix.apm,
> + matrix_mdev->matrix.apm_max + 1)
> + vfio_ap_mdev_link_queue(matrix_mdev, id, qlink_id);
> + break;
> + case UNLINK_APQI:

unassign_domain

> + for_each_set_bit_inv(id, matrix_mdev->matrix.apm,
> + matrix_mdev->matrix.apm_max + 1)
> + vfio_ap_mdev_link_queue(matrix_mdev, id, qlink_id);
> + break;
> + default:
> + WARN_ON_ONCE(1);
> + }
> +}
> +
> /**
> * assign_adapter_store
> *
> @@ -628,6 +714,7 @@ static ssize_t assign_adapter_store(struct device *dev,
> if (ret)
> goto share_err;
>
> + vfio_ap_mdev_link_queues(matrix_mdev, LINK_APID, apid);
> ret = count;
> goto done;
>
> @@ -679,6 +766,7 @@ static ssize_t unassign_adapter_store(struct device *dev,
>
> mutex_lock(&matrix_dev->lock);
> clear_bit_inv((unsigned long)apid, matrix_mdev->matrix.apm);
> + vfio_ap_mdev_link_queues(matrix_mdev, UNLINK_APID, apid);
> mutex_unlock(&matrix_dev->lock);
>
> return count;
> @@ -769,6 +857,7 @@ static ssize_t assign_domain_store(struct device *dev,
> if (ret)
> goto share_err;
>
> + vfio_ap_mdev_link_queues(matrix_mdev, LINK_APQI, apqi);
> ret = count;
> goto done;
>
> @@ -821,6 +910,7 @@ static ssize_t unassign_domain_store(struct device *dev,
>
> mutex_lock(&matrix_dev->lock);
> clear_bit_inv((unsigned long)apqi, matrix_mdev->matrix.aqm);
> + vfio_ap_mdev_link_queues(matrix_mdev, UNLINK_APQI, apqi);
> mutex_unlock(&matrix_dev->lock);
>
> return count;
> @@ -1159,8 +1249,8 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
> */
> if (ret)
> rc = ret;
> - q = vfio_ap_get_queue(matrix_mdev,
> - AP_MKQID(apid, apqi));
> + q = vfio_ap_mdev_get_queue(matrix_mdev,
> + AP_MKQID(apid, apqi));
> if (q)
> vfio_ap_free_aqic_resources(q);
> }
> @@ -1288,6 +1378,29 @@ void vfio_ap_mdev_unregister(void)
> mdev_unregister_device(&matrix_dev->device);
> }
>
> +/**
> + * vfio_ap_queue_link_mdev
> + *
> + * @q: The queue to link with the matrix mdev.
> + *
> + * Links @q with the matrix mdev to which the queue's APQN is assigned.
> + */
> +static void vfio_ap_queue_link_mdev(struct vfio_ap_queue *q)
> +{
> + unsigned long apid = AP_QID_CARD(q->apqn);
> + unsigned long apqi = AP_QID_QUEUE(q->apqn);
> + struct ap_matrix_mdev *matrix_mdev;
> +
> + list_for_each_entry(matrix_mdev, &matrix_dev->mdev_list, node) {
> + if (test_bit_inv(apid, matrix_mdev->matrix.apm) &&
> + test_bit_inv(apqi, matrix_mdev->matrix.aqm)) {
> + q->matrix_mdev = matrix_mdev;
> + hash_add(matrix_mdev->qtable, &q->mdev_qnode, q->apqn);
> + break;
> + }
> + }
> +}
> +
> int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
> {
> struct vfio_ap_queue *q;
> @@ -1299,9 +1412,12 @@ int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
> if (!q)
> return -ENOMEM;
>
> + mutex_lock(&matrix_dev->lock);
> dev_set_drvdata(&queue->ap_dev.device, q);
> q->apqn = queue->qid;
> q->saved_isc = VFIO_AP_ISC_INVALID;
> + vfio_ap_queue_link_mdev(q);
> + mutex_unlock(&matrix_dev->lock);
>
> return 0;
> }
> @@ -1321,6 +1437,8 @@ void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
> apqi = AP_QID_QUEUE(q->apqn);
> vfio_ap_mdev_reset_queue(apid, apqi, 1);
> vfio_ap_free_aqic_resources(q);
> + if (q->matrix_mdev)
> + hash_del(&q->mdev_qnode);
> kfree(q);
> mutex_unlock(&matrix_dev->lock);
> }
> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
> index d9003de4fbad..4e5cc72fc0db 100644
> --- a/drivers/s390/crypto/vfio_ap_private.h
> +++ b/drivers/s390/crypto/vfio_ap_private.h
> @@ -18,6 +18,7 @@
> #include <linux/delay.h>
> #include <linux/mutex.h>
> #include <linux/kvm_host.h>
> +#include <linux/hashtable.h>
>
> #include "ap_bus.h"
>
> @@ -86,6 +87,7 @@ struct ap_matrix_mdev {
> struct kvm *kvm;
> struct kvm_s390_module_hook pqap_hook;
> struct mdev_device *mdev;
> + DECLARE_HASHTABLE(qtable, 8);

I'm not sure about the benefit of this hashtable if the bus is supposed
to give us O(1) queue lookup based on APQN. I guess it's also easier to
right-size the hashtable in the bus than for each mdev.

Don't get me wrong, I'm willing to accept these hashtables.
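
For reference, a lookup in a per-mdev table with 2^8 buckets can be
sketched in userspace C as below. This is an assumption-laden stand-in:
struct queue, struct qtable and the qtable_*() helpers are hypothetical,
and the hash simply takes the low bits of the APQN, whereas the kernel's
hashtable.h hashes the key differently.

```c
#include <stddef.h>

#define QTABLE_BITS 8
#define QTABLE_SIZE (1u << QTABLE_BITS)

/* Hypothetical queue record, chained per bucket. */
struct queue {
	int apqn;
	struct queue *next;
};

struct qtable {
	struct queue *buckets[QTABLE_SIZE];
};

/* Assumed hash: low bits of the APQN. */
static unsigned int qtable_hash(int apqn)
{
	return (unsigned int)apqn & (QTABLE_SIZE - 1);
}

/* Insert @q at the head of its bucket. */
static void qtable_add(struct qtable *t, struct queue *q)
{
	unsigned int b = qtable_hash(q->apqn);

	q->next = t->buckets[b];
	t->buckets[b] = q;
}

/* Return the queue with @apqn, or NULL if it is not in the table. */
static struct queue *qtable_find(const struct qtable *t, int apqn)
{
	struct queue *q;

	for (q = t->buckets[qtable_hash(apqn)]; q; q = q->next)
		if (q->apqn == apqn)
			return q;
	return NULL;
}

/* Unlink @q from its bucket; a no-op if @q is not in the table. */
static void qtable_del(struct qtable *t, struct queue *q)
{
	struct queue **pp = &t->buckets[qtable_hash(q->apqn)];

	while (*pp && *pp != q)
		pp = &(*pp)->next;
	if (*pp)
		*pp = q->next;
}
```

With this layout a miss in the table directly means "not linked to this
mdev", at the cost of one bucket array per mdev.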

Another thing I'm thinking about is how do we want to deal later with
resources filtered because one of the required queues is missing. Does
it make sense to maintain the link for those? I will have to study the
following patches and return to this one later.

Regards,
Halil


> };
>
> extern int vfio_ap_mdev_register(void);
> @@ -97,6 +99,7 @@ struct vfio_ap_queue {
> int apqn;
> #define VFIO_AP_ISC_INVALID 0xff
> unsigned char saved_isc;
> + struct hlist_node mdev_qnode;
> };
>
> int vfio_ap_mdev_probe_queue(struct ap_device *queue);

2020-10-28 07:42:20

by Halil Pasic

Subject: Re: [PATCH v11 04/14] s390/zcrypt: driver callback to indicate resource in use

On Thu, 22 Oct 2020 13:11:59 -0400
Tony Krowiak <[email protected]> wrote:

> Introduces a new driver callback to prevent a root user from unbinding
> an AP queue from its device driver if the queue is in use. The callback
> will be invoked whenever a change to the AP bus's sysfs apmask or aqmask
> attributes would result in one or more AP queues being removed from its
> driver. If the callback responds in the affirmative for any driver
> queried, the change to the apmask or aqmask will be rejected with a device
> in use error.

As discussed last time, there seems to be nothing that would prevent
a resource from becoming in use between the in_use() callback returning
false and the resource being removed as a result of
ap_bus_revise_bindings().
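
The window can be made concrete with a deterministic, single-threaded
sketch that replays the problematic interleaving by hand. Everything here
is hypothetical: struct drv_state and the helpers only stand in for the
vfio_ap state, the in_use() callback, an assignment via sysfs and
ap_bus_revise_bindings(), and the sketch makes no claim about how likely
the interleaving is in practice.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical driver state: which adapters are assigned to a guest. */
struct drv_state {
	uint64_t assigned;	/* bit n set: adapter n is in use */
	uint64_t removed;	/* bit n set: adapter n was taken away */
};

/* Step 1 of the mask update: ask whether any of @mask is in use. */
static bool in_use(const struct drv_state *s, uint64_t mask)
{
	return (s->assigned & mask) != 0;
}

/* Concurrent actor: an administrator assigns adapters to a guest. */
static void assign(struct drv_state *s, uint64_t mask)
{
	s->assigned |= mask;
}

/* Step 2 of the mask update: take the adapters away from the driver. */
static void revise_bindings(struct drv_state *s, uint64_t mask)
{
	s->removed |= mask;
}

/* True if some adapter is both assigned to a guest and removed. */
static bool guest_lost_resource(const struct drv_state *s)
{
	return (s->assigned & s->removed) != 0;
}
```

Calling in_use(), then assign(), then revise_bindings() with the same mask
leaves guest_lost_resource() true although in_use() reported false.
Closing the window would need the check and the removal to be serialized
against assignments, for example by holding one lock across both.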

Another thing that may be of interest is that we now hold the
ap_perms_mutex for the in_use() checks. The ap_perms_mutex is used
in ap_device_probe(), and I don't quite understand some usages of it
in zcrypt_api.c. My feeling is that the extra pressure on that
lock should not be a problem, unless in_use() were to not return
because of some deadlock.

With all that said, if Harald is fine with it, so am I.

Acked-by: Halil Pasic <[email protected]>

>
> For this patch, only non-default drivers will be queried. Currently,
> there is only one non-default driver, the vfio_ap device driver. The
> vfio_ap device driver facilitates pass-through of an AP queue to a
> guest. The idea here is that a guest may be administered by a different
> sysadmin than the host and we don't want AP resources to unexpectedly
> disappear from a guest's AP configuration (i.e., adapters and domains
> assigned to the matrix mdev). This will enforce the proper procedure for
> removing AP resources intended for guest usage which is to
> first unassign them from the matrix mdev, then unbind them from the
> vfio_ap device driver.
>
> Signed-off-by: Tony Krowiak <[email protected]>
>

2020-10-28 21:52:30

by Halil Pasic

Subject: Re: [PATCH v11 06/14] s390/vfio-ap: introduce shadow APCB

On Thu, 22 Oct 2020 13:12:01 -0400
Tony Krowiak <[email protected]> wrote:

> The APCB is a field within the CRYCB that provides the AP configuration
> to a KVM guest. Let's introduce a shadow copy of the KVM guest's APCB and
> maintain it for the lifespan of the guest.
>
> Signed-off-by: Tony Krowiak <[email protected]>
> ---
> drivers/s390/crypto/vfio_ap_ops.c | 24 +++++++++++++++++++-----
> drivers/s390/crypto/vfio_ap_private.h | 2 ++
> 2 files changed, 21 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 9e9fad560859..9791761aa7fd 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -320,6 +320,19 @@ static void vfio_ap_matrix_init(struct ap_config_info *info,
> matrix->adm_max = info->apxa ? info->Nd : 15;
> }
>
> +static bool vfio_ap_mdev_has_crycb(struct ap_matrix_mdev *matrix_mdev)
> +{
> + return (matrix_mdev->kvm && matrix_mdev->kvm->arch.crypto.crycbd);
> +}
> +
> +static void vfio_ap_mdev_commit_shadow_apcb(struct ap_matrix_mdev *matrix_mdev)
> +{
> + kvm_arch_crypto_set_masks(matrix_mdev->kvm,
> + matrix_mdev->shadow_apcb.apm,
> + matrix_mdev->shadow_apcb.aqm,
> + matrix_mdev->shadow_apcb.adm);
> +}
> +
> static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
> {
> struct ap_matrix_mdev *matrix_mdev;
> @@ -335,6 +348,7 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
>
> matrix_mdev->mdev = mdev;
> vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
> + vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->shadow_apcb);
> hash_init(matrix_mdev->qtable);
> mdev_set_drvdata(mdev, matrix_mdev);
> matrix_mdev->pqap_hook.hook = handle_pqap;
> @@ -1213,13 +1227,12 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
> if (ret)
> return NOTIFY_DONE;
>
> - /* If there is no CRYCB pointer, then we can't copy the masks */
> - if (!matrix_mdev->kvm->arch.crypto.crycbd)
> + if (!vfio_ap_mdev_has_crycb(matrix_mdev))
> return NOTIFY_DONE;
>
> - kvm_arch_crypto_set_masks(matrix_mdev->kvm, matrix_mdev->matrix.apm,
> - matrix_mdev->matrix.aqm,
> - matrix_mdev->matrix.adm);
> + memcpy(&matrix_mdev->shadow_apcb, &matrix_mdev->matrix,
> + sizeof(matrix_mdev->shadow_apcb));
> + vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>
> return NOTIFY_OK;
> }
> @@ -1329,6 +1342,7 @@ static void vfio_ap_mdev_release(struct mdev_device *mdev)
> kvm_put_kvm(matrix_mdev->kvm);
> matrix_mdev->kvm = NULL;
> }
> +

Unrelated change.

Otherwise patch looks OK.

Reviewed-by: Halil Pasic <[email protected]>

> mutex_unlock(&matrix_dev->lock);
>
> vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
> index c1d8b5507610..fc8634cee485 100644
> --- a/drivers/s390/crypto/vfio_ap_private.h
> +++ b/drivers/s390/crypto/vfio_ap_private.h
> @@ -75,6 +75,7 @@ struct ap_matrix {
> * @list: allows the ap_matrix_mdev struct to be added to a list
> * @matrix: the adapters, usage domains and control domains assigned to the
> * mediated matrix device.
> + * @shadow_apcb: the shadow copy of the APCB field of the KVM guest's CRYCB
> * @group_notifier: notifier block used for specifying callback function for
> * handling the VFIO_GROUP_NOTIFY_SET_KVM event
> * @kvm: the struct holding guest's state
> @@ -82,6 +83,7 @@ struct ap_matrix {
> struct ap_matrix_mdev {
> struct list_head node;
> struct ap_matrix matrix;
> + struct ap_matrix shadow_apcb;
> struct notifier_block group_notifier;
> struct notifier_block iommu_notifier;
> struct kvm *kvm;

2020-10-29 08:48:34

by Halil Pasic

Subject: Re: [PATCH v11 07/14] s390/vfio-ap: sysfs attribute to display the guest's matrix

On Thu, 22 Oct 2020 13:12:02 -0400
Tony Krowiak <[email protected]> wrote:

> +static ssize_t guest_matrix_show(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + ssize_t nchars;
> + struct mdev_device *mdev = mdev_from_dev(dev);
> + struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
> +
> + if (!vfio_ap_mdev_has_crycb(matrix_mdev))
> + return -ENODEV;

I'm wondering, would it make sense to have guest_matrix display the
would-be guest matrix when we don't have a KVM? With the filtering in
place, the question of what guest_matrix my (assign) matrix would
result in right now, if I were to hook up my vfio_ap_mdev to a guest,
seems a legitimate one.


> +
> + mutex_lock(&matrix_dev->lock);
> + nchars = vfio_ap_mdev_matrix_show(&matrix_mdev->shadow_apcb, buf);
> + mutex_unlock(&matrix_dev->lock);
> +
> + return nchars;
> +}
> +static DEVICE_ATTR_RO(guest_matrix);

2020-11-02 22:02:22

by Anthony Krowiak

Subject: Re: [PATCH v11 02/14] 390/vfio-ap: use new AP bus interface to search for queue devices



On 10/27/20 3:01 AM, Halil Pasic wrote:
> On Thu, 22 Oct 2020 13:11:57 -0400
> Tony Krowiak <[email protected]> wrote:
>
>> This patch refactors the vfio_ap device driver to use the AP bus's
>> ap_get_qdev() function to retrieve the vfio_ap_queue struct containing
>> information about a queue that is bound to the vfio_ap device driver.
>> The bus's ap_get_qdev() function retrieves the queue device from a
>> hashtable keyed by APQN. This is much more efficient than looping over
>> the list of devices attached to the AP bus by several orders of
>> magnitude.
>>
>> Signed-off-by: Tony Krowiak <[email protected]>
> Reviewed-by: Halil Pasic <[email protected]>

Thank you for your review.

>
>> ---
>> drivers/s390/crypto/vfio_ap_ops.c | 35 +++++++++++++------------------
>> 1 file changed, 14 insertions(+), 21 deletions(-)
>>
>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>> index c471832f0a30..049b97d7444c 100644
>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -26,43 +26,36 @@
>>
>> static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
>>
>> -static int match_apqn(struct device *dev, const void *data)
>> -{
>> - struct vfio_ap_queue *q = dev_get_drvdata(dev);
>> -
>> - return (q->apqn == *(int *)(data)) ? 1 : 0;
>> -}
>> -
>> /**
>> - * vfio_ap_get_queue: Retrieve a queue with a specific APQN from a list
>> + * vfio_ap_get_queue: Retrieve a queue with a specific APQN.
>> * @matrix_mdev: the associated mediated matrix
>> * @apqn: The queue APQN
>> *
>> - * Retrieve a queue with a specific APQN from the list of the
>> - * devices of the vfio_ap_drv.
>> - * Verify that the APID and the APQI are set in the matrix.
>> + * Retrieve a queue with a specific APQN from the AP queue devices attached to
>> + * the AP bus.
>> *
>> - * Returns the pointer to the associated vfio_ap_queue
>> + * Returns the pointer to the vfio_ap_queue with the specified APQN, or NULL.
>> */
>> static struct vfio_ap_queue *vfio_ap_get_queue(
>> struct ap_matrix_mdev *matrix_mdev,
>> - int apqn)
>> + unsigned long apqn)
>> {
>> - struct vfio_ap_queue *q;
>> - struct device *dev;
>> + struct ap_queue *queue;
>> + struct vfio_ap_queue *q = NULL;
>>
>> if (!test_bit_inv(AP_QID_CARD(apqn), matrix_mdev->matrix.apm))
>> return NULL;
>> if (!test_bit_inv(AP_QID_QUEUE(apqn), matrix_mdev->matrix.aqm))
>> return NULL;
>>
>> - dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL,
>> - &apqn, match_apqn);
>> - if (!dev)
>> + queue = ap_get_qdev(apqn);
>> + if (!queue)
>> return NULL;
>> - q = dev_get_drvdata(dev);
>> - q->matrix_mdev = matrix_mdev;
>> - put_device(dev);
>> +
>> + if (queue->ap_dev.device.driver == &matrix_dev->vfio_ap_drv->driver)
>> + q = dev_get_drvdata(&queue->ap_dev.device);
>> +
> Needs to be called with the vfio_ap lock held, right? Otherwise the queue could
> get unbound while we are working with it as a vfio_ap_queue... Nothing
> new, but might be worth documenting.

This is always called with the vfio_ap lock held.

>
>> + put_device(&queue->ap_dev.device);
>>
>> return q;
>> }

2020-11-13 17:17:44

by Anthony Krowiak

Subject: Re: [PATCH v11 05/14] s390/vfio-ap: implement in-use callback for vfio_ap driver



On 10/27/20 9:27 AM, Halil Pasic wrote:
> On Thu, 22 Oct 2020 13:12:00 -0400
> Tony Krowiak <[email protected]> wrote:
>
>> Let's implement the callback to indicate when an APQN
>> is in use by the vfio_ap device driver. The callback is
>> invoked whenever a change to the apmask or aqmask would
>> result in one or more queue devices being removed from the driver. The
>> vfio_ap device driver will indicate a resource is in use
>> if the APQN of any of the queue devices to be removed are assigned to
>> any of the matrix mdevs under the driver's control.
>>
>> Signed-off-by: Tony Krowiak <[email protected]>
>> ---
>> drivers/s390/crypto/vfio_ap_drv.c | 1 +
>> drivers/s390/crypto/vfio_ap_ops.c | 78 +++++++++++++++++++--------
>> drivers/s390/crypto/vfio_ap_private.h | 2 +
>> 3 files changed, 60 insertions(+), 21 deletions(-)
>>
>> diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
>> index 73bd073fd5d3..8934471b7944 100644
>> --- a/drivers/s390/crypto/vfio_ap_drv.c
>> +++ b/drivers/s390/crypto/vfio_ap_drv.c
>> @@ -147,6 +147,7 @@ static int __init vfio_ap_init(void)
>> memset(&vfio_ap_drv, 0, sizeof(vfio_ap_drv));
>> vfio_ap_drv.probe = vfio_ap_mdev_probe_queue;
>> vfio_ap_drv.remove = vfio_ap_mdev_remove_queue;
>> + vfio_ap_drv.in_use = vfio_ap_mdev_resource_in_use;
>> vfio_ap_drv.ids = ap_queue_ids;
>>
>> ret = ap_driver_register(&vfio_ap_drv, THIS_MODULE, VFIO_AP_DRV_NAME);
>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>> index 1357f8f8b7e4..9e9fad560859 100644
>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -522,18 +522,40 @@ vfio_ap_mdev_verify_queues_reserved_for_apid(struct ap_matrix_mdev *matrix_mdev,
>> return 0;
>> }
>>
>> +#define MDEV_SHARING_ERR "Userspace may not re-assign queue %02lx.%04lx " \
>> + "already assigned to %s"
>> +
>> +static void vfio_ap_mdev_log_sharing_err(const char *mdev_name,
>> + unsigned long *apm,
>> + unsigned long *aqm)
>> +{
>> + unsigned long apid, apqi;
>> +
>> + for_each_set_bit_inv(apid, apm, AP_DEVICES)
>> + for_each_set_bit_inv(apqi, aqm, AP_DOMAINS)
>> + pr_err(MDEV_SHARING_ERR, apid, apqi, mdev_name);
> Isn't error rather severe for this? For my taste even warning would be
> severe for this.

The user only sees an EADDRINUSE returned from the sysfs interface,
so Conny asked if I could log a message to indicate which APQNs are
in use by which mdev. I can change this to an info message, but it
will be missed if the log level is set higher. Maybe Conny can put in
her two cents here since she asked for this.

>
>> +}
>> +
>> /**
>> * vfio_ap_mdev_verify_no_sharing
>> *
>> - * Verifies that the APQNs derived from the cross product of the AP adapter IDs
>> - * and AP queue indexes comprising the AP matrix are not configured for another
>> + * Verifies that each APQN derived from the cross product of the AP adapter IDs
>> + * and AP queue indexes comprising an AP matrix is not assigned to a
>> * mediated device. AP queue sharing is not allowed.
>> *
>> - * @matrix_mdev: the mediated matrix device
>> + * @matrix_mdev: the mediated matrix device to which the APQNs being verified
>> + * are assigned. If the value is not NULL, then verification will
>> + * proceed for all other matrix mediated devices; otherwise, all
>> + * matrix mediated devices will be verified.
>> + * @mdev_apm: mask indicating the APIDs of the APQNs to be verified
>> + * @mdev_aqm: mask indicating the APQIs of the APQNs to be verified
>> *
>> - * Returns 0 if the APQNs are not shared, otherwise; returns -EADDRINUSE.
>> + * Returns 0 if no APQNs are shared; otherwise, returns -EADDRINUSE if one
>> + * or more APQNs are shared.
>> */
>> -static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
>> +static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev,
>> + unsigned long *mdev_apm,
>> + unsigned long *mdev_aqm)
>> {
>> struct ap_matrix_mdev *lstdev;
>> DECLARE_BITMAP(apm, AP_DEVICES);
>> @@ -550,14 +572,15 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
>> * We work on full longs, as we can only exclude the leftover
>> * bits in non-inverse order. The leftover is all zeros.
>> */
>> - if (!bitmap_and(apm, matrix_mdev->matrix.apm,
>> - lstdev->matrix.apm, AP_DEVICES))
>> + if (!bitmap_and(apm, mdev_apm, lstdev->matrix.apm, AP_DEVICES))
>> continue;
>>
>> - if (!bitmap_and(aqm, matrix_mdev->matrix.aqm,
>> - lstdev->matrix.aqm, AP_DOMAINS))
>> + if (!bitmap_and(aqm, mdev_aqm, lstdev->matrix.aqm, AP_DOMAINS))
>> continue;
>>
>> + vfio_ap_mdev_log_sharing_err(dev_name(mdev_dev(lstdev->mdev)),
>> + apm, aqm);
>> +
>> return -EADDRINUSE;
>> }
>>
>> @@ -683,6 +706,7 @@ static ssize_t assign_adapter_store(struct device *dev,
>> {
>> int ret;
>> unsigned long apid;
>> + DECLARE_BITMAP(apm, AP_DEVICES);
>> struct mdev_device *mdev = mdev_from_dev(dev);
>> struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>>
>> @@ -708,18 +732,18 @@ static ssize_t assign_adapter_store(struct device *dev,
>> if (ret)
>> goto done;
>>
>> - set_bit_inv(apid, matrix_mdev->matrix.apm);
>> + memset(apm, 0, sizeof(apm));
>> + set_bit_inv(apid, apm);
>>
>> - ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev);
>> + ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev, apm,
>> + matrix_mdev->matrix.aqm);
> What is the benefit of using a copy here? I mean we have the vfio_ap lock
> so nobody can see the bit we speculatively flipped.

The vfio_ap_mdev_verify_no_sharing() function definition was changed
so that it can also be re-used by the vfio_ap_mdev_resource_in_use()
function rather than duplicating that code for the in_use callback. The
in-use callback is invoked by the AP bus which has no concept of
a mediated device, so I made this change to accommodate that fact.
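
To make the reuse concrete, here is a minimal userspace sketch of the idea, not code from the series. It assumes one 64-bit word per mask instead of the kernel's 256-bit bitmaps, uses plain rather than inverted bit numbering, and all sketch_* names are hypothetical. The point it shows is that passing the masks explicitly lets the same check serve both an mdev (which excludes itself) and the AP bus (which has no mdev and passes NULL).

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct ap_matrix_mdev: one 64-bit word per mask. */
struct sketch_mdev {
	uint64_t apm;	/* adapters (APIDs) assigned to this mdev */
	uint64_t aqm;	/* domains (APQIs) assigned to this mdev */
};

/*
 * Return -EADDRINUSE if any APQN in the cross product apm x aqm is already
 * assigned to an mdev other than 'self'. With self == NULL every mdev is
 * checked, which is what a bus-level query needs.
 */
static int sketch_verify_no_sharing(const struct sketch_mdev *mdevs, size_t n,
				    const struct sketch_mdev *self,
				    uint64_t apm, uint64_t aqm)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (&mdevs[i] == self)
			continue;
		/* An APQN is shared only if both an APID and an APQI overlap. */
		if ((mdevs[i].apm & apm) && (mdevs[i].aqm & aqm))
			return -EADDRINUSE;
	}
	return 0;
}

/* Bus-facing wrapper: there is no mdev to exclude, so pass NULL. */
static int sketch_resource_in_use(const struct sketch_mdev *mdevs, size_t n,
				  uint64_t apm, uint64_t aqm)
{
	return sketch_verify_no_sharing(mdevs, n, NULL, apm, aqm) != 0;
}
```

In this sketch an mdev re-checking its own APQNs sees no conflict, while the bus-level wrapper reports the same APQNs as in use.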

>
> I've also pointed out in the previous patch that in_use() isn't
> perfectly reliable (at least in theory) because of a race.

We discussed that privately and determined that the sysfs assignment
interfaces will use mutex_trylock() to avoid races.

>
> Otherwise looks good to me!
>
>> if (ret)
>> - goto share_err;
>> + goto done;
>>
>> + set_bit_inv(apid, matrix_mdev->matrix.apm);
>> vfio_ap_mdev_link_queues(matrix_mdev, LINK_APID, apid);
>> ret = count;
>> - goto done;
>>
>> -share_err:
>> - clear_bit_inv(apid, matrix_mdev->matrix.apm);
>> done:
>> mutex_unlock(&matrix_dev->lock);
>>
>> @@ -831,6 +855,7 @@ static ssize_t assign_domain_store(struct device *dev,
>> {
>> int ret;
>> unsigned long apqi;
>> + DECLARE_BITMAP(aqm, AP_DOMAINS);
>> struct mdev_device *mdev = mdev_from_dev(dev);
>> struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>> unsigned long max_apqi = matrix_mdev->matrix.aqm_max;
>> @@ -851,18 +876,18 @@ static ssize_t assign_domain_store(struct device *dev,
>> if (ret)
>> goto done;
>>
>> - set_bit_inv(apqi, matrix_mdev->matrix.aqm);
>> + memset(aqm, 0, sizeof(aqm));
>> + set_bit_inv(apqi, aqm);
>>
>> - ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev);
>> + ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev,
>> + matrix_mdev->matrix.apm, aqm);
>> if (ret)
>> - goto share_err;
>> + goto done;
>>
>> + set_bit_inv(apqi, matrix_mdev->matrix.aqm);
>> vfio_ap_mdev_link_queues(matrix_mdev, LINK_APQI, apqi);
>> ret = count;
>> - goto done;
>>
>> -share_err:
>> - clear_bit_inv(apqi, matrix_mdev->matrix.aqm);
>> done:
>> mutex_unlock(&matrix_dev->lock);
>>
>> @@ -1442,3 +1467,14 @@ void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
>> kfree(q);
>> mutex_unlock(&matrix_dev->lock);
>> }
>> +
>> +bool vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm)
>> +{
>> + bool in_use;
>> +
>> + mutex_lock(&matrix_dev->lock);
>> + in_use = !!vfio_ap_mdev_verify_no_sharing(NULL, apm, aqm);
>> + mutex_unlock(&matrix_dev->lock);
>> +
>> + return in_use;
>> +}
>> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
>> index 4e5cc72fc0db..c1d8b5507610 100644
>> --- a/drivers/s390/crypto/vfio_ap_private.h
>> +++ b/drivers/s390/crypto/vfio_ap_private.h
>> @@ -105,4 +105,6 @@ struct vfio_ap_queue {
>> int vfio_ap_mdev_probe_queue(struct ap_device *queue);
>> void vfio_ap_mdev_remove_queue(struct ap_device *queue);
>>
>> +bool vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm);
>> +
>> #endif /* _VFIO_AP_PRIVATE_H_ */

2020-11-13 17:20:22

by Anthony Krowiak

Subject: Re: [PATCH v11 06/14] s390/vfio-ap: introduce shadow APCB



On 10/28/20 4:11 AM, Halil Pasic wrote:
> On Thu, 22 Oct 2020 13:12:01 -0400
> Tony Krowiak <[email protected]> wrote:
>
>> The APCB is a field within the CRYCB that provides the AP configuration
>> to a KVM guest. Let's introduce a shadow copy of the KVM guest's APCB and
>> maintain it for the lifespan of the guest.
>>
>> Signed-off-by: Tony Krowiak <[email protected]>
>> ---
>> drivers/s390/crypto/vfio_ap_ops.c | 24 +++++++++++++++++++-----
>> drivers/s390/crypto/vfio_ap_private.h | 2 ++
>> 2 files changed, 21 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>> index 9e9fad560859..9791761aa7fd 100644
>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -320,6 +320,19 @@ static void vfio_ap_matrix_init(struct ap_config_info *info,
>> matrix->adm_max = info->apxa ? info->Nd : 15;
>> }
>>
>> +static bool vfio_ap_mdev_has_crycb(struct ap_matrix_mdev *matrix_mdev)
>> +{
>> + return (matrix_mdev->kvm && matrix_mdev->kvm->arch.crypto.crycbd);
>> +}
>> +
>> +static void vfio_ap_mdev_commit_shadow_apcb(struct ap_matrix_mdev *matrix_mdev)
>> +{
>> + kvm_arch_crypto_set_masks(matrix_mdev->kvm,
>> + matrix_mdev->shadow_apcb.apm,
>> + matrix_mdev->shadow_apcb.aqm,
>> + matrix_mdev->shadow_apcb.adm);
>> +}
>> +
>> static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
>> {
>> struct ap_matrix_mdev *matrix_mdev;
>> @@ -335,6 +348,7 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
>>
>> matrix_mdev->mdev = mdev;
>> vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
>> + vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->shadow_apcb);
>> hash_init(matrix_mdev->qtable);
>> mdev_set_drvdata(mdev, matrix_mdev);
>> matrix_mdev->pqap_hook.hook = handle_pqap;
>> @@ -1213,13 +1227,12 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
>> if (ret)
>> return NOTIFY_DONE;
>>
>> - /* If there is no CRYCB pointer, then we can't copy the masks */
>> - if (!matrix_mdev->kvm->arch.crypto.crycbd)
>> + if (!vfio_ap_mdev_has_crycb(matrix_mdev))
>> return NOTIFY_DONE;
>>
>> - kvm_arch_crypto_set_masks(matrix_mdev->kvm, matrix_mdev->matrix.apm,
>> - matrix_mdev->matrix.aqm,
>> - matrix_mdev->matrix.adm);
>> + memcpy(&matrix_mdev->shadow_apcb, &matrix_mdev->matrix,
>> + sizeof(matrix_mdev->shadow_apcb));
>> + vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>>
>> return NOTIFY_OK;
>> }
>> @@ -1329,6 +1342,7 @@ static void vfio_ap_mdev_release(struct mdev_device *mdev)
>> kvm_put_kvm(matrix_mdev->kvm);
>> matrix_mdev->kvm = NULL;
>> }
>> +
> Unrelated change.
>
> Otherwise patch looks OK.
>
> Reviewed-by: Halil Pasic <[email protected]>

I'll fix it. Thanks for your review.

>
>> mutex_unlock(&matrix_dev->lock);
>>
>> vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
>> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
>> index c1d8b5507610..fc8634cee485 100644
>> --- a/drivers/s390/crypto/vfio_ap_private.h
>> +++ b/drivers/s390/crypto/vfio_ap_private.h
>> @@ -75,6 +75,7 @@ struct ap_matrix {
>> * @list: allows the ap_matrix_mdev struct to be added to a list
>> * @matrix: the adapters, usage domains and control domains assigned to the
>> * mediated matrix device.
>> + * @shadow_apcb: the shadow copy of the APCB field of the KVM guest's CRYCB
>> * @group_notifier: notifier block used for specifying callback function for
>> * handling the VFIO_GROUP_NOTIFY_SET_KVM event
>> * @kvm: the struct holding guest's state
>> @@ -82,6 +83,7 @@ struct ap_matrix {
>> struct ap_matrix_mdev {
>> struct list_head node;
>> struct ap_matrix matrix;
>> + struct ap_matrix shadow_apcb;
>> struct notifier_block group_notifier;
>> struct notifier_block iommu_notifier;
>> struct kvm *kvm;

2020-11-13 17:29:57

by Anthony Krowiak

Subject: Re: [PATCH v11 07/14] s390/vfio-ap: sysfs attribute to display the guest's matrix



On 10/28/20 4:17 AM, Halil Pasic wrote:
> On Thu, 22 Oct 2020 13:12:02 -0400
> Tony Krowiak <[email protected]> wrote:
>
>> +static ssize_t guest_matrix_show(struct device *dev,
>> + struct device_attribute *attr, char *buf)
>> +{
>> + ssize_t nchars;
>> + struct mdev_device *mdev = mdev_from_dev(dev);
>> + struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>> +
>> + if (!vfio_ap_mdev_has_crycb(matrix_mdev))
>> + return -ENODEV;
> I'm wondering, would it make sense to have guest_matrix display the would-be
> guest matrix when we don't have a KVM? With the filtering in
> place, the question of what guest_matrix my (assigned) matrix would result
> in right now, if I were to hook up my vfio_ap_mdev to a guest, seems a
> legitimate one.

A couple of thoughts here:
* The ENODEV informs the user that there is no guest running
   which makes sense to me given this interface displays the
   guest matrix. The alternative, which I considered, was to
   display an empty matrix (i.e., nothing).
* This would be a pretty drastic change to the design because
   the shadow_apcb - which is what is displayed via this interface - is
   only updated when the guest is started and while it is running (i.e.,
   hot plug of new adapters/domains). Making this change would
   require changing that entire design concept which I am reluctant
   to do at this point in the game.


>
>
>> +
>> + mutex_lock(&matrix_dev->lock);
>> + nchars = vfio_ap_mdev_matrix_show(&matrix_mdev->shadow_apcb, buf);
>> + mutex_unlock(&matrix_dev->lock);
>> +
>> + return nchars;
>> +}
>> +static DEVICE_ATTR_RO(guest_matrix);

2020-11-13 21:32:47

by Anthony Krowiak

Subject: Re: [PATCH v11 04/14] s390/zcrypt: driver callback to indicate resource in use



On 10/27/20 12:55 PM, Harald Freudenberger wrote:
> On 22.10.20 19:11, Tony Krowiak wrote:
>> Introduces a new driver callback to prevent a root user from unbinding
>> an AP queue from its device driver if the queue is in use. The callback
>> will be invoked whenever a change to the AP bus's sysfs apmask or aqmask
>> attributes would result in one or more AP queues being removed from its
>> driver. If the callback responds in the affirmative for any driver
>> queried, the change to the apmask or aqmask will be rejected with a device
>> in use error.
>>
>> For this patch, only non-default drivers will be queried. Currently,
>> there is only one non-default driver, the vfio_ap device driver. The
>> vfio_ap device driver facilitates pass-through of an AP queue to a
>> guest. The idea here is that a guest may be administered by a different
>> sysadmin than the host and we don't want AP resources to unexpectedly
>> disappear from a guest's AP configuration (i.e., adapters and domains
>> assigned to the matrix mdev). This will enforce the proper procedure for
>> removing AP resources intended for guest usage which is to
>> first unassign them from the matrix mdev, then unbind them from the
>> vfio_ap device driver.
>>
>> Signed-off-by: Tony Krowiak <[email protected]>
>> ---
>> drivers/s390/crypto/ap_bus.c | 148 ++++++++++++++++++++++++++++++++---
>> drivers/s390/crypto/ap_bus.h | 4 +
>> 2 files changed, 142 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/s390/crypto/ap_bus.c b/drivers/s390/crypto/ap_bus.c
>> index 485cbfcbf06e..998e61cd86d9 100644
>> --- a/drivers/s390/crypto/ap_bus.c
>> +++ b/drivers/s390/crypto/ap_bus.c
>> @@ -35,6 +35,7 @@
>> #include <linux/mod_devicetable.h>
>> #include <linux/debugfs.h>
>> #include <linux/ctype.h>
>> +#include <linux/module.h>
>>
>> #include "ap_bus.h"
>> #include "ap_debug.h"
>> @@ -893,6 +894,23 @@ static int modify_bitmap(const char *str, unsigned long *bitmap, int bits)
>> return 0;
>> }
>>
>> +static int ap_parse_bitmap_str(const char *str, unsigned long *bitmap, int bits,
>> + unsigned long *newmap)
>> +{
>> + unsigned long size;
>> + int rc;
>> +
>> + size = BITS_TO_LONGS(bits)*sizeof(unsigned long);
>> + if (*str == '+' || *str == '-') {
>> + memcpy(newmap, bitmap, size);
>> + rc = modify_bitmap(str, newmap, bits);
>> + } else {
>> + memset(newmap, 0, size);
>> + rc = hex2bitmap(str, newmap, bits);
>> + }
>> + return rc;
>> +}
>> +
>> int ap_parse_mask_str(const char *str,
>> unsigned long *bitmap, int bits,
>> struct mutex *lock)
>> @@ -912,14 +930,7 @@ int ap_parse_mask_str(const char *str,
>> kfree(newmap);
>> return -ERESTARTSYS;
>> }
>> -
>> - if (*str == '+' || *str == '-') {
>> - memcpy(newmap, bitmap, size);
>> - rc = modify_bitmap(str, newmap, bits);
>> - } else {
>> - memset(newmap, 0, size);
>> - rc = hex2bitmap(str, newmap, bits);
>> - }
>> + rc = ap_parse_bitmap_str(str, bitmap, bits, newmap);
>> if (rc == 0)
>> memcpy(bitmap, newmap, size);
>> mutex_unlock(lock);
>> @@ -1111,12 +1122,70 @@ static ssize_t apmask_show(struct bus_type *bus, char *buf)
>> return rc;
>> }
>>
>> +static int __verify_card_reservations(struct device_driver *drv, void *data)
>> +{
>> + int rc = 0;
>> + struct ap_driver *ap_drv = to_ap_drv(drv);
>> + unsigned long *newapm = (unsigned long *)data;
>> +
>> + /*
>> + * No need to verify whether the driver is using the queues if it is the
>> + * default driver.
>> + */
>> + if (ap_drv->flags & AP_DRIVER_FLAG_DEFAULT)
>> + return 0;
>> +
>> + /* The non-default driver's module must be loaded */
> Can you please update this comment? It should be something like
> /* increase the driver's module refcounter to be sure it is not
>    going away when we invoke the callback function. */

Will do.

>
>> + if (!try_module_get(drv->owner))
>> + return 0;
>> +
>> + if (ap_drv->in_use)
>> + if (ap_drv->in_use(newapm, ap_perms.aqm))
>> + rc = -EBUSY;
>> +
> And here: /* release driver's module */ or similar

Okay

>> + module_put(drv->owner);
>> +
>> + return rc;
>> +}
>> +
>> +static int apmask_commit(unsigned long *newapm)
>> +{
>> + int rc;
>> + unsigned long reserved[BITS_TO_LONGS(AP_DEVICES)];
>> +
>> + /*
>> + * Check if any bits in the apmask have been set which will
>> + * result in queues being removed from non-default drivers
>> + */
>> + if (bitmap_andnot(reserved, newapm, ap_perms.apm, AP_DEVICES)) {
>> + rc = bus_for_each_drv(&ap_bus_type, NULL, reserved,
>> + __verify_card_reservations);
>> + if (rc)
>> + return rc;
>> + }
>> +
>> + memcpy(ap_perms.apm, newapm, APMASKSIZE);
>> +
>> + return 0;
>> +}
>> +
>> static ssize_t apmask_store(struct bus_type *bus, const char *buf,
>> size_t count)
>> {
>> int rc;
>> + DECLARE_BITMAP(newapm, AP_DEVICES);
>> +
>> + if (mutex_lock_interruptible(&ap_perms_mutex))
>> + return -ERESTARTSYS;
>> +
>> + rc = ap_parse_bitmap_str(buf, ap_perms.apm, AP_DEVICES, newapm);
>> + if (rc)
>> + goto done;
>>
>> - rc = ap_parse_mask_str(buf, ap_perms.apm, AP_DEVICES, &ap_perms_mutex);
>> + rc = apmask_commit(newapm);
>> +
>> +done:
>> + mutex_unlock(&ap_perms_mutex);
>> if (rc)
>> return rc;
>>
>> @@ -1142,12 +1211,71 @@ static ssize_t aqmask_show(struct bus_type *bus, char *buf)
>> return rc;
>> }
>>
>> +static int __verify_queue_reservations(struct device_driver *drv, void *data)
>> +{
>> + int rc = 0;
>> + struct ap_driver *ap_drv = to_ap_drv(drv);
>> + unsigned long *newaqm = (unsigned long *)data;
>> +
>> + /*
>> + * If the reserved bits do not identify queues reserved for use by the
>> + * non-default driver, there is no need to verify the driver is using
>> + * the queues.
>> + */
>> + if (ap_drv->flags & AP_DRIVER_FLAG_DEFAULT)
>> + return 0;
>> +
>> + /* The non-default driver's module must be loaded */
> Same here.

Okay

>> + if (!try_module_get(drv->owner))
>> + return 0;
>> +
>> + if (ap_drv->in_use)
>> + if (ap_drv->in_use(ap_perms.apm, newaqm))
>> + rc = -EBUSY;
>> +
> and here

Okay

>> + module_put(drv->owner);
>> +
>> + return rc;
>> +}
>> +
>> +static int aqmask_commit(unsigned long *newaqm)
>> +{
>> + int rc;
>> + unsigned long reserved[BITS_TO_LONGS(AP_DOMAINS)];
>> +
>> + /*
>> + * Check if any bits in the aqmask have been set which will
>> + * result in queues being removed from non-default drivers
>> + */
>> + if (bitmap_andnot(reserved, newaqm, ap_perms.aqm, AP_DOMAINS)) {
>> + rc = bus_for_each_drv(&ap_bus_type, NULL, reserved,
>> + __verify_queue_reservations);
>> + if (rc)
>> + return rc;
>> + }
>> +
>> + memcpy(ap_perms.aqm, newaqm, AQMASKSIZE);
>> +
>> + return 0;
>> +}
>> +
>> static ssize_t aqmask_store(struct bus_type *bus, const char *buf,
>> size_t count)
>> {
>> int rc;
>> + DECLARE_BITMAP(newaqm, AP_DOMAINS);
>>
>> - rc = ap_parse_mask_str(buf, ap_perms.aqm, AP_DOMAINS, &ap_perms_mutex);
>> + if (mutex_lock_interruptible(&ap_perms_mutex))
>> + return -ERESTARTSYS;
>> +
>> + rc = ap_parse_bitmap_str(buf, ap_perms.aqm, AP_DOMAINS, newaqm);
>> + if (rc)
>> + goto done;
>> +
>> + rc = aqmask_commit(newaqm);
>> +
>> +done:
>> + mutex_unlock(&ap_perms_mutex);
>> if (rc)
>> return rc;
>>
>> diff --git a/drivers/s390/crypto/ap_bus.h b/drivers/s390/crypto/ap_bus.h
>> index 5029b80132aa..6ce154d924d3 100644
>> --- a/drivers/s390/crypto/ap_bus.h
>> +++ b/drivers/s390/crypto/ap_bus.h
>> @@ -145,6 +145,7 @@ struct ap_driver {
>>
>> int (*probe)(struct ap_device *);
>> void (*remove)(struct ap_device *);
>> + bool (*in_use)(unsigned long *apm, unsigned long *aqm);
>> };
>>
>> #define to_ap_drv(x) container_of((x), struct ap_driver, driver)
>> @@ -293,6 +294,9 @@ void ap_queue_init_state(struct ap_queue *aq);
>> struct ap_card *ap_card_create(int id, int queue_depth, int raw_device_type,
>> int comp_device_type, unsigned int functions);
>>
>> +#define APMASKSIZE (BITS_TO_LONGS(AP_DEVICES) * sizeof(unsigned long))
>> +#define AQMASKSIZE (BITS_TO_LONGS(AP_DOMAINS) * sizeof(unsigned long))
>> +
>> struct ap_perms {
>> unsigned long ioctlm[BITS_TO_LONGS(AP_IOCTLS)];
>> unsigned long apm[BITS_TO_LONGS(AP_DEVICES)];
> I still don't like this code. That's because of what it is doing - not because of the code quality.
> And Halil, you are right. It is adding more pressure to the mutex used for locking the apmask
> and aqmask stuff (and the zcrypt multiple device drivers support code also).
> I am very concerned about the in_use callback which is called with the ap_perms_mutex
> held AND during bus_for_each_drv (so holding the overall AP BUS mutex) and then diving
> into the vfio_ap ... with yet another mutex to protect the vfio structs.
> Reviewed-by: Harald Freudenberger <[email protected]>

Thank you for your review. Maybe you ought to bring these concerns up with
our crypto architect. Halil came up with a solution for the potential
deadlock situation. We will be using the mutex_trylock() function in our
sysfs assignment interfaces, which make the call to the AP bus to check
permissions (which also locks ap_perms). If the mutex_trylock() fails, we
return from the assignment function with -EBUSY. This should resolve that
potential deadlock issue.
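
As a rough userspace illustration of the back-off described above (not code from the series): the sketch below uses a C11 atomic_flag as a stand-in for the kernel mutex, and sketch_trylock(), sketch_unlock(), sketch_matrix_lock and sketch_assign() are hypothetical names. It only shows the shape of the assignment path, which returns -EBUSY instead of sleeping when the lock is contended.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for matrix_dev->lock. */
static atomic_flag sketch_matrix_lock = ATOMIC_FLAG_INIT;

/* Returns true if the lock was acquired, false if it is already held. */
static bool sketch_trylock(atomic_flag *lock)
{
	return !atomic_flag_test_and_set(lock);
}

static void sketch_unlock(atomic_flag *lock)
{
	atomic_flag_clear(lock);
}

/*
 * Assignment path: if the lock is contended, back off with -EBUSY instead of
 * waiting, and leave the mask untouched so userspace can simply retry.
 */
static int sketch_assign(unsigned long *mask, unsigned int bit)
{
	if (!sketch_trylock(&sketch_matrix_lock))
		return -EBUSY;

	*mask |= 1UL << bit;

	sketch_unlock(&sketch_matrix_lock);
	return 0;
}
```

The design choice here is that a failed attempt has no side effects, so returning -EBUSY to the sysfs writer is safe.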


2020-11-13 23:16:02

by Halil Pasic

Subject: Re: [PATCH v11 07/14] s390/vfio-ap: sysfs attribute to display the guest's matrix

On Fri, 13 Nov 2020 12:27:32 -0500
Tony Krowiak <[email protected]> wrote:

>
>
> On 10/28/20 4:17 AM, Halil Pasic wrote:
> > On Thu, 22 Oct 2020 13:12:02 -0400
> > Tony Krowiak <[email protected]> wrote:
> >
> >> +static ssize_t guest_matrix_show(struct device *dev,
> >> + struct device_attribute *attr, char *buf)
> >> +{
> >> + ssize_t nchars;
> >> + struct mdev_device *mdev = mdev_from_dev(dev);
> >> + struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
> >> +
> >> + if (!vfio_ap_mdev_has_crycb(matrix_mdev))
> >> + return -ENODEV;
> > I'm wondering, would it make sense to have guest_matrix display the would-be
> > guest matrix when we don't have a KVM? With the filtering in
> > place, the question of what guest_matrix my (assigned) matrix would result
> > in right now, if I were to hook up my vfio_ap_mdev to a guest, seems a
> > legitimate one.
>
> A couple of thoughts here:
> * The ENODEV informs the user that there is no guest running
>    which makes sense to me given this interface displays the
>    guest matrix. The alternative, which I considered, was to
>    display an empty matrix (i.e., nothing).
> * This would be a pretty drastic change to the design because
>    the shadow_apcb - which is what is displayed via this interface - is
>    only updated when the guest is started and while it is running (i.e.,
>    hot plug of new adapters/domains). Making this change would
>    require changing that entire design concept which I am reluctant
>    to do at this point in the game.
>
>

No problem. My thinking was that, because we can also do the
assign/unassign ops for a running guest, we already have
the code to maintain the shadow_apcb. In this
series this code is conditional on vfio_ap_mdev_has_crycb().
E.g.

static ssize_t assign_adapter_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t count)
{
[..]
if (vfio_ap_mdev_has_crycb(matrix_mdev))
if (vfio_ap_mdev_filter_guest_matrix(matrix_mdev, true))
vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);

If one were to move the
vfio_ap_mdev_has_crycb() check into vfio_ap_mdev_commit_shadow_apcb(),
then we would have an always up-to-date shadow_apcb that we could display.
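
A minimal userspace sketch of that suggestion follows. It is an illustration under simplifying assumptions, not the series' code: masks are single words, the filtering step is omitted, and every sketch_* name is hypothetical. The shadow copy is refreshed on every assignment, and only the push to the guest is conditional on a CRYCB being present.

```c
#include <stdbool.h>
#include <stddef.h>

struct sketch_matrix {
	unsigned long apm, aqm, adm;
};

/* Stand-in for the guest side: its APCB and whether a CRYCB exists. */
struct sketch_guest {
	struct sketch_matrix apcb;
	bool has_crycb;
};

struct sketch_shadow_mdev {
	struct sketch_matrix matrix;		/* what the admin assigned */
	struct sketch_matrix shadow_apcb;	/* what a guest would see */
	struct sketch_guest *guest;		/* NULL while no guest is attached */
};

static bool sketch_has_crycb(const struct sketch_shadow_mdev *m)
{
	return m->guest && m->guest->has_crycb;
}

/* The CRYCB check lives here, so callers never need to test for a guest. */
static void sketch_commit_shadow_apcb(struct sketch_shadow_mdev *m)
{
	if (!sketch_has_crycb(m))
		return;
	m->guest->apcb = m->shadow_apcb;
}

static void sketch_assign_adapter(struct sketch_shadow_mdev *m,
				  unsigned int apid)
{
	m->matrix.apm |= 1UL << apid;
	/* Filtering is omitted: the shadow simply mirrors the matrix. */
	m->shadow_apcb = m->matrix;
	sketch_commit_shadow_apcb(m);
}
```

With this arrangement the shadow copy can be displayed even when no guest is running, because it no longer depends on the guest's state.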

I don't feel strongly about this. It was just an idea, because if the result
of the filtering is surprising, currently the only way to see it, without
knowing the algorithm, and possibly the state and the history of the
system, is to actually start a guest.

Regards,
Halil

2020-11-13 23:51:38

by Halil Pasic

Subject: Re: [PATCH v11 05/14] s390/vfio-ap: implement in-use callback for vfio_ap driver

On Fri, 13 Nov 2020 12:14:22 -0500
Tony Krowiak <[email protected]> wrote:
[..]
> >> }
> >>
> >> +#define MDEV_SHARING_ERR "Userspace may not re-assign queue %02lx.%04lx " \
> >> + "already assigned to %s"
> >> +
> >> +static void vfio_ap_mdev_log_sharing_err(const char *mdev_name,
> >> + unsigned long *apm,
> >> + unsigned long *aqm)
> >> +{
> >> + unsigned long apid, apqi;
> >> +
> >> + for_each_set_bit_inv(apid, apm, AP_DEVICES)
> >> + for_each_set_bit_inv(apqi, aqm, AP_DOMAINS)
> >> + pr_err(MDEV_SHARING_ERR, apid, apqi, mdev_name);
> > Isn't error rather severe for this? For my taste even warning would be
> > severe for this.
>
> The user only sees a EADDRINUSE returned from the sysfs interface,
> so Conny asked if I could log a message to indicate which APQNs are
> in use by which mdev. I can change this to an info message, but it
> will be missed if the log level is set higher. Maybe Conny can put in
> her two cents here since she asked for this.
>

I'm looking forward to Conny's opinion. :)

[..]
> >>
> >> @@ -708,18 +732,18 @@ static ssize_t assign_adapter_store(struct device *dev,
> >> if (ret)
> >> goto done;
> >>
> >> - set_bit_inv(apid, matrix_mdev->matrix.apm);
> >> + memset(apm, 0, sizeof(apm));
> >> + set_bit_inv(apid, apm);
> >>
> >> - ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev);
> >> + ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev, apm,
> >> + matrix_mdev->matrix.aqm);
> > What is the benefit of using a copy here? I mean we have the vfio_ap lock
> > so nobody can see the bit we speculatively flipped.
>
> The vfio_ap_mdev_verify_no_sharing() function definition was changed
> so that it can also be re-used by the vfio_ap_mdev_resource_in_use()
> function rather than duplicating that code for the in_use callback. The
> in-use callback is invoked by the AP bus which has no concept of
> a mediated device, so I made this change to accommodate that fact.

Seems I was not clear enough with my question. Here you pass a local
apm which has every bit 0 except the one corresponding to the
adapter we are trying to assign. The matrix.apm may actually have
more apm bits set. What we used to do is set the matrix.apm bit,
verify, and clear it if verification fails. I think that
would still work.

The computational complexity is currently the same. For
some reason unknown to me ap_apqn_in_matrix_owned_by_def_drv() uses loops
instead of using bitmap operations. But it won't do any less work
if the apm argument is sparse. The same is true if bitmap ops are used.

What you do here is not wrong, because if the invariants, which should
be maintained, are maintained, performing the check with the other
bits set in the apm is superfluous. But as I said before, actually
it ain't extra work, and if there was a bug, it could help us detect
it (because the assignment, that should have worked would fail).

Preparing the local apm isn't much extra work either, but I still
don't understand the change. Why can't you pass in matrix.apm
after set_bit_inv(apid, ...) like we used to do before?
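
The two flows being compared can be sketched in userspace as follows. This is a simplified illustration, not the driver's code: masks are single words with plain bit numbering, only the apm side is modelled, and the sketch_* names are hypothetical.

```c
#include <errno.h>

/* Hypothetical check: fails if any bit in apm is owned by another mdev. */
static int sketch_verify(unsigned long apm, unsigned long others_apm)
{
	return (apm & others_apm) ? -EADDRINUSE : 0;
}

/* Earlier flow: set the bit in place, verify the full mask, roll back. */
static int sketch_assign_in_place(unsigned long *apm, unsigned int apid,
				  unsigned long others_apm)
{
	int rc;

	*apm |= 1UL << apid;
	rc = sketch_verify(*apm, others_apm);
	if (rc)
		*apm &= ~(1UL << apid);
	return rc;
}

/* v11 flow: verify only the new bit through a local mask, then commit. */
static int sketch_assign_with_copy(unsigned long *apm, unsigned int apid,
				   unsigned long others_apm)
{
	unsigned long local = 1UL << apid;
	int rc = sketch_verify(local, others_apm);

	if (rc)
		return rc;
	*apm |= local;
	return 0;
}
```

When the invariants hold, both variants accept and reject the same requests. They differ only if a previously assigned bit already overlaps another mdev: the in-place variant then fails, which surfaces the broken invariant, while the local-copy variant does not notice it.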

Again, no big deal, but I just prefer to understand the whys.

>
> >
> > I've also pointed out in the previous patch that in_use() isn't
> > perfectly reliable (at least in theory) because of a race.
>
> We discussed that privately and determined that the sysfs assignment
> interfaces will use mutex_trylock() to avoid races.

I don't think what we discussed is going to fix the race I'm referring
to here. But I do look forward to v12.

Regards,
Halil

2020-11-14 00:04:21

by Halil Pasic

Subject: Re: [PATCH v11 04/14] s390/zcrypt: driver callback to indicate resource in use

On Fri, 13 Nov 2020 16:30:31 -0500
Tony Krowiak <[email protected]> wrote:

> We will be using the mutex_trylock() function in our sysfs assignment
> interfaces which make the call to the AP bus to check permissions (which
> also locks ap_perms). If the mutex_trylock() fails, we return from the
> assignment function with -EBUSY. This should resolve that potential
> deadlock issue.

It resolves the deadlock issue only if in_use() is also doing
mutex_trylock(), but then, if in_use() doesn't get the lock, it
needs to back off (and so does its client code), i.e. a boolean
return value won't do.

Regards,
Halil

2020-11-16 16:28:40

by Anthony Krowiak

Subject: Re: [PATCH v11 04/14] s390/zcrypt: driver callback to indicate resource in use



On 11/13/20 7:00 PM, Halil Pasic wrote:
> On Fri, 13 Nov 2020 16:30:31 -0500
> Tony Krowiak <[email protected]> wrote:
>
>> We will be using the mutex_trylock() function in our sysfs assignment
>> interfaces which make the call to the AP bus to check permissions (which
>> also locks ap_perms). If the mutex_trylock() fails, we return from the
>> assignment function with -EBUSY. This should resolve that potential
>> deadlock issue.
> It resolves the deadlock issue only if in_use() is also doing
> mutex_trylock(), but then, if in_use() doesn't get the lock, it
> needs to back off (and so does its client code), i.e. a boolean
> return value won't do.

Makes sense. I'll change the in_use callback to return an int and use
mutex_trylock() for the vfio_ap_mdev_resource_in_use() function. If the lock
cannot be obtained, the function will return -EBUSY.
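
A userspace sketch of why an int return is needed, under stated assumptions: an atomic_flag stands in for the vfio_ap mutex, masks are single words, the sketch_* names are hypothetical, and -EADDRINUSE is used for the "in use" answer purely so the three outcomes are distinguishable in the example (the series itself may map them differently).

```c
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical stand-ins for the vfio_ap lock and the mdev assignments. */
static atomic_flag sketch_vfio_lock = ATOMIC_FLAG_INIT;
static unsigned long sketch_assigned_apm;

/*
 * Three outcomes, which a bool cannot express:
 *   0           - none of the adapters in apm is in use
 *   -EADDRINUSE - at least one adapter in apm is in use
 *   -EBUSY      - the lock was contended, so the answer is unknown
 */
static int sketch_in_use(unsigned long apm)
{
	int rc;

	if (atomic_flag_test_and_set(&sketch_vfio_lock))
		return -EBUSY;

	rc = (sketch_assigned_apm & apm) ? -EADDRINUSE : 0;

	atomic_flag_clear(&sketch_vfio_lock);
	return rc;
}

/* Caller side: any non-zero answer rejects the mask change. */
static int sketch_apmask_commit(unsigned long *cur_apm, unsigned long new_apm)
{
	/* Only bits that are newly set can take queues away from a driver. */
	unsigned long reserved = new_apm & ~*cur_apm;
	int rc = reserved ? sketch_in_use(reserved) : 0;

	if (rc)
		return rc;

	*cur_apm = new_apm;
	return 0;
}
```

The caller treats "unknown" the same as "in use" and leaves the mask unchanged, which is the back-off behaviour asked for in the review.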

>
> Regards,
> Halil

2020-11-16 20:46:12

by Anthony Krowiak

Subject: Re: [PATCH v11 05/14] s390/vfio-ap: implement in-use callback for vfio_ap driver



On 11/13/20 6:47 PM, Halil Pasic wrote:
> On Fri, 13 Nov 2020 12:14:22 -0500
> Tony Krowiak <[email protected]> wrote:
> [..]
>>>> }
>>>>
>>>> +#define MDEV_SHARING_ERR "Userspace may not re-assign queue %02lx.%04lx " \
>>>> + "already assigned to %s"
>>>> +
>>>> +static void vfio_ap_mdev_log_sharing_err(const char *mdev_name,
>>>> + unsigned long *apm,
>>>> + unsigned long *aqm)
>>>> +{
>>>> + unsigned long apid, apqi;
>>>> +
>>>> + for_each_set_bit_inv(apid, apm, AP_DEVICES)
>>>> + for_each_set_bit_inv(apqi, aqm, AP_DOMAINS)
>>>> + pr_err(MDEV_SHARING_ERR, apid, apqi, mdev_name);
>>> Isn't error rather severe for this? For my taste even warning would be
>>> severe for this.
>> The user only sees a EADDRINUSE returned from the sysfs interface,
>> so Conny asked if I could log a message to indicate which APQNs are
>> in use by which mdev. I can change this to an info message, but it
>> will be missed if the log level is set higher. Maybe Conny can put in
>> her two cents here since she asked for this.
>>
> I'm looking forward to Conny's opinion. :)
>
> [..]
>>>>
>>>> @@ -708,18 +732,18 @@ static ssize_t assign_adapter_store(struct device *dev,
>>>> if (ret)
>>>> goto done;
>>>>
>>>> - set_bit_inv(apid, matrix_mdev->matrix.apm);
>>>> + memset(apm, 0, sizeof(apm));
>>>> + set_bit_inv(apid, apm);
>>>>
>>>> - ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev);
>>>> + ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev, apm,
>>>> + matrix_mdev->matrix.aqm);
>>> What is the benefit of using a copy here? I mean we have the vfio_ap lock
>>> so nobody can see the bit we speculatively flipped.
>> The vfio_ap_mdev_verify_no_sharing() function definition was changed
>> so that it can also be re-used by the vfio_ap_mdev_resource_in_use()
>> function rather than duplicating that code for the in_use callback. The
>> in-use callback is invoked by the AP bus which has no concept of
>> a mediated device, so I made this change to accommodate that fact.
> Seems I was not clear enough with my question. Here you pass a local
> apm which has every bit 0 except the one corresponding to the
> adapter we are trying to assign. The matrix.apm actually may have
> more apm bits set. What we used to do is set the matrix.apm bit,
> verify, and clear it if verification fails. I think that
> would still work.
>
> The computational complexity is currently the same. For
> some reason unknown to me ap_apqn_in_matrix_owned_by_def_drv() uses loops
> instead of using bitmap operations. But it won't do any less work
> if the apm argument is sparse. The same is true if bitmap ops are used.
>
> What you do here is not wrong, because if the invariants, which should
> be maintained, are maintained, performing the check with the other
> bits set in the apm is superfluous. But as I said before, actually
> it ain't extra work, and if there was a bug, it could help us detect
> it (because the assignment, that should have worked would fail).
>
> Preparing the local apm isn't much extra work either, but I still
> don't understand the change. Why can't you pass in matrix.apm
> after set_bit_inv(apid, ...) like we used to do before?
>
> Again, no big deal, but I just prefer to understand the whys.

I think you misunderstood what I was saying, probably because
I didn't explain it very thoroughly or clearly. The change was not
made to reduce the amount of work done in the
vfio_ap_mdev_verify_no_sharing() function.


If the assignment functions were the only ones to call the
vfio_ap_mdev_verify_no_sharing() function, then you'd be correct;
there would be no good reason not to set the apid in the
matrix_mdev->matrix.apm/aqm as we used to. The modification
was made to accommodate the vfio_ap_mdev_resource_in_use() function.

The vfio_ap_mdev_resource_in_use() function is invoked by the
AP bus when a change is made to the apmask/aqmask that
will result in taking queues away from vfio_ap. This function
needs to verify that the affected APQNs are not assigned to
any matrix mdev. Rather than write a new function that duplicates
the logic in the vfio_ap_mdev_verify_no_sharing() function, I merely
changed the signature to take the apm/aqm specifying the APQNs to
verify rather than obtaining them from the matrix_mdev. The
reason is that the bitmaps passed to the in_use callback are not
specific to a particular matrix_mdev, unlike those used by the
assignment interfaces. Making this change allowed the
vfio_ap_mdev_verify_no_sharing() function to be used by both the
assignment functions and the in_use callback.

I suppose another option would have been to create a phony
matrix_mdev in the in_use callback and copy the masks passed in to
the function to the phony matrix_mdev's apm/aqm. That would have
eliminated the need to change the signature of the
vfio_ap_mdev_verify_no_sharing() function, but I'm not sure it is
worth the effort at this point.
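
To make the signature change concrete, here is a simplified sketch, not
the driver code. It assumes 64-bit masks with ordinary bit numbering in
place of the real 256-bit, MSB-0 bitmaps, and every sketch_* name is
made up for illustration. The point is that one check taking explicit
apm/aqm arguments can serve both the assignment path and the in_use path.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified mdev: 64-bit masks instead of the real 256-bit bitmaps. */
struct sketch_mdev {
	uint64_t apm;	/* assigned adapters */
	uint64_t aqm;	/* assigned domains */
};

/*
 * Returns -EADDRINUSE if any APQN in the cross product apm x aqm is
 * already assigned to an mdev other than 'self' (self may be NULL).
 */
static int sketch_verify_no_sharing(const struct sketch_mdev *mdevs, size_t n,
				    const struct sketch_mdev *self,
				    uint64_t apm, uint64_t aqm)
{
	for (size_t i = 0; i < n; i++) {
		if (&mdevs[i] == self)
			continue;
		/* A shared APQN needs a common adapter AND a common domain. */
		if ((mdevs[i].apm & apm) && (mdevs[i].aqm & aqm))
			return -EADDRINUSE;
	}
	return 0;
}

/* Assignment path: check one new adapter against this mdev's domains. */
static int sketch_assign_adapter(struct sketch_mdev *mdevs, size_t n,
				 struct sketch_mdev *self, unsigned int apid)
{
	uint64_t apm;
	int ret;

	assert(apid < 64);	/* sketch limit, not the real AP_DEVICES */
	apm = UINT64_C(1) << apid;
	ret = sketch_verify_no_sharing(mdevs, n, self, apm, self->aqm);
	if (ret == 0)
		self->apm |= apm;
	return ret;
}

/* in_use path: the masks come from the bus, no mdev is involved. */
static int sketch_resource_in_use(const struct sketch_mdev *mdevs, size_t n,
				  uint64_t apm, uint64_t aqm)
{
	return sketch_verify_no_sharing(mdevs, n, NULL, apm, aqm);
}
```

Passing the mdev's full apm after setting the new bit, as was done
before, would work equally well for the assignment path in this sketch;
the explicit masks only matter for the caller that has no mdev at all.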

>>> I've also pointed out in the previous patch that in_use() isn't
>>> perfectly reliable (at least in theory) because of a race.
>> We discussed that privately and determined that the sysfs assignment
>> interfaces will use mutex_trylock() to avoid races.
> I don't think, what we discussed is going to fix the race I'm referring
> to here. But I do look forward to v12.
>
> Regards,
> Halil

2020-11-19 18:19:47

by Anthony Krowiak

Subject: Re: [PATCH v11 07/14] s390/vfio-ap: sysfs attribute to display the guest's matrix



On 11/13/20 6:12 PM, Halil Pasic wrote:
> On Fri, 13 Nov 2020 12:27:32 -0500
> Tony Krowiak <[email protected]> wrote:
>
>>
>> On 10/28/20 4:17 AM, Halil Pasic wrote:
>>> On Thu, 22 Oct 2020 13:12:02 -0400
>>> Tony Krowiak <[email protected]> wrote:
>>>
>>>> +static ssize_t guest_matrix_show(struct device *dev,
>>>> + struct device_attribute *attr, char *buf)
>>>> +{
>>>> + ssize_t nchars;
>>>> + struct mdev_device *mdev = mdev_from_dev(dev);
>>>> + struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>>>> +
>>>> + if (!vfio_ap_mdev_has_crycb(matrix_mdev))
>>>> + return -ENODEV;
>>> I'm wondering, would it make sense to have guest_matrix display the
>>> would-be guest matrix when we don't have a KVM? With the filtering in
>>> place, the question of what guest_matrix my (assign) matrix would
>>> result in right now, if I were to hook up my vfio_ap_mdev to a guest,
>>> seems a legitimate one.
>> A couple of thoughts here:
>> * The ENODEV informs the user that there is no guest running
>>    which makes sense to me given this interface displays the
>>    guest matrix. The alternative, which I considered, was to
>>    display an empty matrix (i.e., nothing).
>> * This would be a pretty drastic change to the design because
>>    the shadow_apcb - which is what is displayed via this interface - is
>>    only updated when the guest is started and while it is running (i.e.,
>>    hot plug of new adapters/domains). Making this change would
>>    require changing that entire design concept which I am reluctant
>>    to do at this point in the game.
>>
>>
> No problem. My thinking was that, because we can do the
> assign/unassign ops also for the running guest, we also have
> the code to do the maintenance on the shadow_apcb. In this
> series this code is conditional with respect to vfio_ap_mdev_has_crycb().
> E.g.
>
> static ssize_t assign_adapter_store(struct device *dev,
> struct device_attribute *attr,
> const char *buf, size_t count)
> {
> [..]
> if (vfio_ap_mdev_has_crycb(matrix_mdev))
> if (vfio_ap_mdev_filter_guest_matrix(matrix_mdev, true))
> vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>
> If one were to move the
> vfio_ap_mdev_has_crycb() check into vfio_ap_mdev_commit_shadow_apcb()
> then we would have an always up-to-date shadow_apcb we could display.
>
> I don't feel strongly about this. Was just an idea, because if the result
> of the filtering is surprising, currently the only way to see it, without
> knowing the algorithm, and possibly the state and the history of the
> system, is to actually start a guest.

Okay, I can buy this and will make the change.
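
Roughly, the change could take the following shape. This is a userspace
sketch under loud assumptions: 64-bit masks replace the real bitmaps, a
plain AND with a hypothetical host mask stands in for the actual
filtering algorithm, and all sketch_* names are invented here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct sketch_mdev {
	uint64_t matrix_apm;	/* adapters assigned via sysfs */
	uint64_t host_apm;	/* assumed filter input (hypothetical) */
	uint64_t shadow_apm;	/* filtered result, shown by guest_matrix */
	uint64_t guest_apm;	/* what a running guest actually sees */
	bool has_guest;		/* stand-in for vfio_ap_mdev_has_crycb() */
};

/* Recompute the shadow unconditionally; returns true if it changed. */
static bool sketch_filter(struct sketch_mdev *m)
{
	uint64_t filtered = m->matrix_apm & m->host_apm;
	bool changed = filtered != m->shadow_apm;

	m->shadow_apm = filtered;
	return changed;
}

/* The guest check lives here, so callers no longer need to test it. */
static void sketch_commit(struct sketch_mdev *m)
{
	if (!m->has_guest)
		return;
	m->guest_apm = m->shadow_apm;
}

static void sketch_assign_adapter(struct sketch_mdev *m, unsigned int apid)
{
	assert(apid < 64);	/* sketch limit, not the real AP_DEVICES */
	m->matrix_apm |= UINT64_C(1) << apid;
	if (sketch_filter(m))
		sketch_commit(m);
}
```

Because the shadow is maintained whether or not a guest is attached, a
show function could print it at any time rather than returning -ENODEV.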

>
> Regards,
> Halil
>

2020-11-23 17:07:16

by Cornelia Huck

Subject: Re: [PATCH v11 05/14] s390/vfio-ap: implement in-use callback for vfio_ap driver

On Sat, 14 Nov 2020 00:47:22 +0100
Halil Pasic <[email protected]> wrote:

> On Fri, 13 Nov 2020 12:14:22 -0500
> Tony Krowiak <[email protected]> wrote:
> [..]
> > >> }
> > >>
> > >> +#define MDEV_SHARING_ERR "Userspace may not re-assign queue %02lx.%04lx " \
> > >> + "already assigned to %s"
> > >> +
> > >> +static void vfio_ap_mdev_log_sharing_err(const char *mdev_name,
> > >> + unsigned long *apm,
> > >> + unsigned long *aqm)
> > >> +{
> > >> + unsigned long apid, apqi;
> > >> +
> > >> + for_each_set_bit_inv(apid, apm, AP_DEVICES)
> > >> + for_each_set_bit_inv(apqi, aqm, AP_DOMAINS)
> > >> + pr_err(MDEV_SHARING_ERR, apid, apqi, mdev_name);
> > > Isn't error rather severe for this? For my taste even warning would be
> > > severe for this.
> >
> > The user only sees a EADDRINUSE returned from the sysfs interface,
> > so Conny asked if I could log a message to indicate which APQNs are
> > in use by which mdev. I can change this to an info message, but it
> > will be missed if the log level is set higher. Maybe Conny can put in
> > her two cents here since she asked for this.
> >
>
> I'm looking forward to Conny's opinion. :)

(only just saw this; -ETOOMANYEMAILS)

It is probably not an error in the sense of "things are broken, this
cannot work"; but I'd consider this at least a warning "this does not
work as you intended".

2020-11-23 19:27:22

by Anthony Krowiak

Subject: Re: [PATCH v11 05/14] s390/vfio-ap: implement in-use callback for vfio_ap driver



On 11/23/20 12:03 PM, Cornelia Huck wrote:
> On Sat, 14 Nov 2020 00:47:22 +0100
> Halil Pasic <[email protected]> wrote:
>
>> On Fri, 13 Nov 2020 12:14:22 -0500
>> Tony Krowiak <[email protected]> wrote:
>> [..]
>>>>> }
>>>>>
>>>>> +#define MDEV_SHARING_ERR "Userspace may not re-assign queue %02lx.%04lx " \
>>>>> + "already assigned to %s"
>>>>> +
>>>>> +static void vfio_ap_mdev_log_sharing_err(const char *mdev_name,
>>>>> + unsigned long *apm,
>>>>> + unsigned long *aqm)
>>>>> +{
>>>>> + unsigned long apid, apqi;
>>>>> +
>>>>> + for_each_set_bit_inv(apid, apm, AP_DEVICES)
>>>>> + for_each_set_bit_inv(apqi, aqm, AP_DOMAINS)
>>>>> + pr_err(MDEV_SHARING_ERR, apid, apqi, mdev_name);
>>>> Isn't error rather severe for this? For my taste even warning would be
>>>> severe for this.
>>> The user only sees a EADDRINUSE returned from the sysfs interface,
>>> so Conny asked if I could log a message to indicate which APQNs are
>>> in use by which mdev. I can change this to an info message, but it
>>> will be missed if the log level is set higher. Maybe Conny can put in
>>> her two cents here since she asked for this.
>>>
>> I'm looking forward to Conny's opinion. :)
> (only just saw this; -ETOOMANYEMAILS)
>
> It is probably not an error in the sense of "things are broken, this
> cannot work"; but I'd consider this at least a warning "this does not
> work as you intended".

Okay then, I'll make it a warning.

>