2019-06-13 19:41:03

by Anthony Krowiak

[permalink] [raw]
Subject: [PATCH v4 0/7] s390: vfio-ap: dynamic configuration support

The current design for AP pass-through does not support making dynamic
changes to the AP matrix of a running guest resulting in three deficiencies
this patch series is intended to mitigate:

1. Adapters, domains and control domains can not be added to or removed
from a running guest. In order to modify a guest's AP configuration,
the guest must be terminated; only then can AP resources be assigned
to or unassigned from the guest's matrix mdev. The new AP configuration
becomes available to the guest when it is subsequently restarted.

2. The AP bus's /sys/bus/ap/apmask and /sys/bus/ap/aqmask interfaces can
be modified by a root user without any restrictions. A change to either
mask can result in AP queue devices being unbound from the vfio_ap
device driver and bound to a zcrypt device driver even if a guest is
using the queues, thus giving the host access to the guest's private
crypto data and vice versa.

3. The APQNs derived from the Cartesian product of the APIDs of the
adapters and APQIs of the domains assigned to a matrix mdev must
an AP queue device bound to the vfio_ap device driver.

This patch series introduces the following changes to the current design
to alleviate the shortcomings described above as well as to implement more
of the AP architecture:

1. A root user will be prevented from making changes to the AP bus's
/sys/bus/ap/apmask or /sys/bus/ap/aqmask if the ownership of an APQN
changes from the vfio_ap device driver to a zcrypt driver if the APQN is
assigned to a matrix mdev.

2. The sysfs bind/unbind interfaces will be disabled for the vfio_ap device
driver.

3. Allow AP resources to be assigned to or removed from a matrix mdev
while a guest is using it and hot plug the resource into or hot unplug
the resource from the running guest.

4. Allow assignment of an AP adapter or domain to a matrix mdev even if it
results in assignment of an APQN that does not reference an AP queue
device bound to the vfio_ap device driver, as long as the APQN is owned
by the vfio_ap driver. Allowing over-provisioning of AP resources
better models the architecture which does not preclude assigning AP
resources that are not yet available in the system.

1. Rationale for changes to AP bus's apmask/aqmask interfaces:
----------------------------------------------------------
Due to the extremely sensitive nature of cryptographic data, it is
imperative that great care be taken to ensure that such data is secured.
Allowing a root user, either inadvertently or maliciously, to configure
these masks such that a queue is shared between the host and a guest is
not only avoidable, it is advisable. It was suggested that this scenario
is better handled in user space with management software, but that does
not preclude a malicious administrator from using the sysfs interfaces
to gain access to a guest's crypto data. It was also suggested that this
scenario could be avoided by taking access to the adapter away from the
guest and zeroing out the queues prior to the vfio_ap driver releasing the
device; however, stealing an adapter in use from a guest as a by-product
of an operation is bad and will likely cause problems for the guest
unnecessarily. It was decided that the most effective solution with the
least number of negative side effects is to prevent the situation at the
source. It is not out of the ordinary for the kernel to prevent a root
user from performing an action under certain circumstances; for example,
a root user is prevented from removing a module until all references to it
are given up. An even more pertinent example is the device driver bind
interface. Binding a device to a driver that does not meet the match
criteria will be rejected by the kernel.

2. Rationale for disabling bind/unbind interfaces for vfio_ap driver:
-----------------------------------------------------------------
By disabling the bind/unbind interfaces for the vfio_ap device driver,
the user is forced to use the AP bus's apmask/aqmask interfaces to control
the probing and removing of AP queues. There are two primary reasons for
disabling the bind/unbind interfaces for the vfio_ap device driver:

* The device architecture does not provide a means to prevent unbinding
a device from a device driver, so an AP queue device can be unbound
from the vfio_ap driver even the queue is in use by a guest. By
disabling the unbind interface, the user is forced to use the AP bus's
apmask/aqmask interfaces which will prevent this.

* Binding of AP queues is controlled by the AP bus /sys/bus/ap/apmask and
/sys/bus/ap/aqmask interfaces. If the masks indicate that an APQN is
owned by zcrypt, trying to bind it to the vfio_ap device driver will
fail; therefore, the bind interface is somewhat redundant and certainly
unnecessary.

3. Rationale for hot plug/unplug using matrix mdev sysfs interfaces:
----------------------------------------------------------------
Allowing a user to hot plug/unplug AP resources using the matrix mdev
sysfs interfaces circumvents the need to terminate the guest in order to
modify its AP configuration. Allowing dynamic configuration makes
reconfiguring a guest's AP matrix much less disruptive.

4. Rationale for allowing over-provisioning of AP resources:
-----------------------------------------------------------
Allowing assignment of AP resources to a matrix mdev and ultimately to a
guest better models the AP architecture. The architecture does not
preclude assignment of unavailable AP resources. If a queue subsequently
becomes available while a guest using the matrix mdev to which its APQN
is assigned, the guest will automatically acquire access to it. If an APQN
is dynamically unassigned from the underlying host system, it will
automatically become unavailable to the guest.

Note: This patch series is rebased on top of the patch series for
'vfio: ap: AP Queue Interrupt Control' (v9) to make merging of the
two series simpler.


Change log v3->v4:
-----------------
* Restored patches preventing root user from changing ownership of
APQNs from zcrypt drivers to the vfio_ap driver if the APQN is
assigned to an mdev.

* No longer enforcing requirement restricting guest access to
queues represented by a queue device bound to the vfio_ap
device driver.

* Removed shadow CRYCB and now directly updating the guest CRYCB
from the matrix mdev's matrix.

* Rebased the patch series on top of 'vfio: ap: AP Queue Interrupt
Control' patches.

* Disabled bind/unbind sysfs interfaces for vfio_ap driver

Change log v2->v3:
-----------------
* Allow guest access to an AP queue only if the queue is bound to
the vfio_ap device driver.

* Removed the patch to test CRYCB masks before taking the vCPUs
out of SIE. Now checking the shadow CRYCB in the vfio_ap driver.

Change log v1->v2:
-----------------
* Removed patches preventing root user from unbinding AP queues from
the vfio_ap device driver
* Introduced a shadow CRYCB in the vfio_ap driver to manage dynamic
changes to the AP guest configuration due to root user interventions
or hardware anomalies.

Tony Krowiak (7):
s390: vfio-ap: Refactor vfio_ap driver probe and remove callbacks
s390: vfio-ap: wait for queue empty on queue reset
s390: zcrypt: driver callback to indicate resource in use
s390: vfio-ap: implement in-use callback for vfio_ap driver
s390: vfio-ap: allow assignment of unavailable AP resources to mdev
device
s390: vfio-ap: allow hot plug/unplug of AP resources using mdev device
s390: vfio-ap: update documentation

Documentation/s390/vfio-ap.txt | 292 +++++++++++++++++++--------
drivers/s390/crypto/ap_bus.c | 138 ++++++++++++-
drivers/s390/crypto/ap_bus.h | 3 +
drivers/s390/crypto/vfio_ap_drv.c | 51 +++--
drivers/s390/crypto/vfio_ap_ops.c | 370 +++++++++++++---------------------
drivers/s390/crypto/vfio_ap_private.h | 6 +-
6 files changed, 526 insertions(+), 334 deletions(-)

--
2.7.4


2019-06-13 19:41:36

by Anthony Krowiak

[permalink] [raw]
Subject: [PATCH v4 2/7] s390: vfio-ap: wait for queue empty on queue reset

Refactors the AP queue reset function to wait until the queue is empty
after the PQAP(ZAPQ) instruction is executed to zero out the queue as
required by the AP architecture.

Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/vfio_ap_ops.c | 49 +++++++++++++++++++++++++++------------
1 file changed, 34 insertions(+), 15 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index bf2ab02b9a0b..60efd3d7896d 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -1128,23 +1128,46 @@ static void vfio_ap_irq_disable_apqn(int apqn)
}
}

-int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
- unsigned int retry)
+static void vfio_ap_mdev_wait_for_qempty(ap_qid_t qid)
+{
+ struct ap_queue_status status;
+ int retry = 5;
+
+ do {
+ status = ap_tapq(qid, NULL);
+ switch (status.response_code) {
+ case AP_RESPONSE_NORMAL:
+ if (status.queue_empty)
+ return;
+ case AP_RESPONSE_RESET_IN_PROGRESS:
+ case AP_RESPONSE_BUSY:
+ msleep(20);
+ break;
+ default:
+ pr_warn("%s: tapq response %02x waiting for queue %04x.%02x empty\n",
+ __func__, status.response_code,
+ AP_QID_CARD(qid), AP_QID_QUEUE(qid));
+ return;
+ }
+ } while (--retry);
+
+ WARN_ON_ONCE(retry <= 0);
+}
+
+int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi)
{
struct ap_queue_status status;
- int retry2 = 2;
int apqn = AP_MKQID(apid, apqi);
+ int retry = 5;

do {
status = ap_zapq(apqn);
switch (status.response_code) {
case AP_RESPONSE_NORMAL:
- while (!status.queue_empty && retry2--) {
- msleep(20);
- status = ap_tapq(apqn, NULL);
- }
- WARN_ON_ONCE(retry <= 0);
+ vfio_ap_mdev_wait_for_qempty(AP_MKQID(apid, apqi));
return 0;
+ case AP_RESPONSE_DECONFIGURED:
+ return -ENODEV;
case AP_RESPONSE_RESET_IN_PROGRESS:
case AP_RESPONSE_BUSY:
msleep(20);
@@ -1169,14 +1192,10 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
matrix_mdev->matrix.apm_max + 1) {
for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm,
matrix_mdev->matrix.aqm_max + 1) {
- ret = vfio_ap_mdev_reset_queue(apid, apqi, 1);
- /*
- * Regardless whether a queue turns out to be busy, or
- * is not operational, we need to continue resetting
- * the remaining queues.
- */
+ ret = vfio_ap_mdev_reset_queue(apid, apqi);
if (ret)
rc = ret;
+
vfio_ap_irq_disable_apqn(AP_MKQID(apid, apqi));
}
}
@@ -1326,7 +1345,7 @@ void vfio_ap_mdev_remove_queue(struct ap_queue *queue)
dev_set_drvdata(&queue->ap_dev.device, NULL);
apid = AP_QID_CARD(q->apqn);
apqi = AP_QID_QUEUE(q->apqn);
- vfio_ap_mdev_reset_queue(apid, apqi, 1);
+ vfio_ap_mdev_reset_queue(apid, apqi);
vfio_ap_irq_disable(q);
kfree(q);
}
--
2.7.4

2019-06-13 19:41:37

by Anthony Krowiak

[permalink] [raw]
Subject: [PATCH v4 3/7] s390: zcrypt: driver callback to indicate resource in use

Introduces a new driver callback to prevent a root user from unbinding
an AP queue from its device driver if the queue is in use. This prevents
a root user from inadvertently taking a queue away from a guest and
giving it to the host, or vice versa. The callback will be invoked
whenever a change to the AP bus's apmask or aqmask sysfs interfaces may
result in one or more AP queues being removed from its driver. If the
callback responds in the affirmative for any driver queried, the change
to the apmask or aqmask will be rejected with a device in use error.

For this patch, only non-default drivers will be queried. Currently,
there is only one non-default driver, the vfio_ap device driver. The
vfio_ap device driver manages AP queues passed through to one or more
guests and we don't want to unexpectedly take AP resources away from
guests which are most likely independently administered.

Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/ap_bus.c | 138 +++++++++++++++++++++++++++++++++++++++++--
drivers/s390/crypto/ap_bus.h | 3 +
2 files changed, 135 insertions(+), 6 deletions(-)

diff --git a/drivers/s390/crypto/ap_bus.c b/drivers/s390/crypto/ap_bus.c
index b9fc502c58c2..1b06f47d0d1c 100644
--- a/drivers/s390/crypto/ap_bus.c
+++ b/drivers/s390/crypto/ap_bus.c
@@ -35,6 +35,7 @@
#include <linux/mod_devicetable.h>
#include <linux/debugfs.h>
#include <linux/ctype.h>
+#include <linux/module.h>

#include "ap_bus.h"
#include "ap_debug.h"
@@ -998,9 +999,11 @@ int ap_parse_mask_str(const char *str,
newmap = kmalloc(size, GFP_KERNEL);
if (!newmap)
return -ENOMEM;
- if (mutex_lock_interruptible(lock)) {
- kfree(newmap);
- return -ERESTARTSYS;
+ if (lock) {
+ if (mutex_lock_interruptible(lock)) {
+ kfree(newmap);
+ return -ERESTARTSYS;
+ }
}

if (*str == '+' || *str == '-') {
@@ -1012,7 +1015,10 @@ int ap_parse_mask_str(const char *str,
}
if (rc == 0)
memcpy(bitmap, newmap, size);
- mutex_unlock(lock);
+
+ if (lock)
+ mutex_unlock(lock);
+
kfree(newmap);
return rc;
}
@@ -1199,12 +1205,72 @@ static ssize_t apmask_show(struct bus_type *bus, char *buf)
return rc;
}

+int __verify_card_reservations(struct device_driver *drv, void *data)
+{
+ int rc = 0;
+ struct ap_driver *ap_drv = to_ap_drv(drv);
+ unsigned long *newapm = (unsigned long *)data;
+
+ /*
+ * If the reserved bits do not identify cards reserved for use by the
+ * non-default driver, there is no need to verify the driver is using
+ * the queues.
+ */
+ if (ap_drv->flags & AP_DRIVER_FLAG_DEFAULT)
+ return 0;
+
+ /* The non-default driver's module must be loaded */
+ if (!try_module_get(drv->owner))
+ return 0;
+
+ if (ap_drv->in_use)
+ if (ap_drv->in_use(newapm, ap_perms.aqm))
+ rc = -EADDRINUSE;
+
+ module_put(drv->owner);
+
+ return rc;
+}
+
+static int apmask_commit(unsigned long *newapm)
+{
+ int rc;
+ unsigned long reserved[BITS_TO_LONGS(AP_DEVICES)];
+
+ /*
+ * Check if any bits in the apmask have been set which will
+ * result in queues being removed from non-default drivers
+ */
+ if (bitmap_andnot(reserved, newapm, ap_perms.apm, AP_DEVICES)) {
+ rc = (bus_for_each_drv(&ap_bus_type, NULL, reserved,
+ __verify_card_reservations));
+ if (rc)
+ return rc;
+ }
+
+ memcpy(ap_perms.apm, newapm, APMASKSIZE);
+
+ return 0;
+}
+
static ssize_t apmask_store(struct bus_type *bus, const char *buf,
size_t count)
{
int rc;
+ unsigned long newapm[BITS_TO_LONGS(AP_DEVICES)];
+
+ memcpy(newapm, ap_perms.apm, APMASKSIZE);
+
+ rc = ap_parse_mask_str(buf, newapm, AP_DEVICES, NULL);
+ if (rc)
+ return rc;
+
+ if (mutex_lock_interruptible(&ap_perms_mutex))
+ return -ERESTARTSYS;
+
+ rc = apmask_commit(newapm);
+ mutex_unlock(&ap_perms_mutex);

- rc = ap_parse_mask_str(buf, ap_perms.apm, AP_DEVICES, &ap_perms_mutex);
if (rc)
return rc;

@@ -1230,12 +1296,72 @@ static ssize_t aqmask_show(struct bus_type *bus, char *buf)
return rc;
}

+int __verify_queue_reservations(struct device_driver *drv, void *data)
+{
+ int rc = 0;
+ struct ap_driver *ap_drv = to_ap_drv(drv);
+ unsigned long *newaqm = (unsigned long *)data;
+
+ /*
+ * If the reserved bits do not identify queues reserved for use by the
+ * non-default driver, there is no need to verify the driver is using
+ * the queues.
+ */
+ if (ap_drv->flags & AP_DRIVER_FLAG_DEFAULT)
+ return 0;
+
+ /* The non-default driver's module must be loaded */
+ if (!try_module_get(drv->owner))
+ return 0;
+
+ if (ap_drv->in_use)
+ if (ap_drv->in_use(ap_perms.apm, newaqm))
+ rc = -EADDRINUSE;
+
+ module_put(drv->owner);
+
+ return rc;
+}
+
+static int aqmask_commit(unsigned long *newaqm)
+{
+ int rc;
+ unsigned long reserved[BITS_TO_LONGS(AP_DOMAINS)];
+
+ /*
+ * Check if any bits in the aqmask have been set which will
+ * result in queues being removed from non-default drivers
+ */
+ if (bitmap_andnot(reserved, newaqm, ap_perms.aqm, AP_DOMAINS)) {
+ rc = (bus_for_each_drv(&ap_bus_type, NULL, reserved,
+ __verify_queue_reservations));
+ if (rc)
+ return rc;
+ }
+
+ memcpy(ap_perms.aqm, newaqm, APMASKSIZE);
+
+ return 0;
+}
+
static ssize_t aqmask_store(struct bus_type *bus, const char *buf,
size_t count)
{
int rc;
+ unsigned long newaqm[BITS_TO_LONGS(AP_DEVICES)];
+
+ memcpy(newaqm, ap_perms.aqm, AQMASKSIZE);
+
+ rc = ap_parse_mask_str(buf, newaqm, AP_DOMAINS, NULL);
+ if (rc)
+ return rc;
+
+ if (mutex_lock_interruptible(&ap_perms_mutex))
+ return -ERESTARTSYS;
+
+ rc = aqmask_commit(newaqm);
+ mutex_unlock(&ap_perms_mutex);

- rc = ap_parse_mask_str(buf, ap_perms.aqm, AP_DOMAINS, &ap_perms_mutex);
if (rc)
return rc;

diff --git a/drivers/s390/crypto/ap_bus.h b/drivers/s390/crypto/ap_bus.h
index 6f3cf37776ca..0f865c5545f2 100644
--- a/drivers/s390/crypto/ap_bus.h
+++ b/drivers/s390/crypto/ap_bus.h
@@ -137,6 +137,7 @@ struct ap_driver {
void (*remove)(struct ap_device *);
void (*suspend)(struct ap_device *);
void (*resume)(struct ap_device *);
+ bool (*in_use)(unsigned long *apm, unsigned long *aqm);
};

#define to_ap_drv(x) container_of((x), struct ap_driver, driver)
@@ -265,6 +266,8 @@ void ap_queue_reinit_state(struct ap_queue *aq);
struct ap_card *ap_card_create(int id, int queue_depth, int raw_device_type,
int comp_device_type, unsigned int functions);

+#define APMASKSIZE (BITS_TO_LONGS(AP_DEVICES) * sizeof(unsigned long))
+#define AQMASKSIZE (BITS_TO_LONGS(AP_DOMAINS) * sizeof(unsigned long))
struct ap_perms {
unsigned long ioctlm[BITS_TO_LONGS(AP_IOCTLS)];
unsigned long apm[BITS_TO_LONGS(AP_DEVICES)];
--
2.7.4

2019-06-13 19:42:36

by Anthony Krowiak

[permalink] [raw]
Subject: [PATCH v4 7/7] s390: vfio-ap: update documentation

This patch updates the vfio-ap documentation to include the information
below.

Changes made to the mdev matrix assignment interfaces:

* Allow assignment of APQNs that are not bound to the vfio-ap device
driver as long as they are not owned by a zcrypt driver as identified
in the AP bus sysfs apmask and aqmask interfaces.

* Allow assignment of an AP resource to a mediated device which is in use
by a guest to hot plug an adapter, domain and control domain into a
running guest.

* Allow unassignment of an AP resource from a mediated device which is in
use by a guest to hot unplug an adapter, domain and control domain from
a running guest.

This patch also:

* Clarifies the section on configuring the AP bus's apmask and aqmask.

* Adds sections on dynamic configuration using the SE or SCLP command.

Signed-off-by: Tony Krowiak <[email protected]>
---
Documentation/s390/vfio-ap.txt | 292 ++++++++++++++++++++++++++++++-----------
1 file changed, 213 insertions(+), 79 deletions(-)

diff --git a/Documentation/s390/vfio-ap.txt b/Documentation/s390/vfio-ap.txt
index 65167cfe4485..9372a6570ce1 100644
--- a/Documentation/s390/vfio-ap.txt
+++ b/Documentation/s390/vfio-ap.txt
@@ -81,10 +81,19 @@ definitions:
which the AP command is to be sent for processing.

The AP bus will create a sysfs device for each APQN that can be derived from
- the cross product of the AP adapter and usage domain numbers detected when the
- AP bus module is loaded. For example, if adapters 4 and 10 (0x0a) and usage
- domains 6 and 71 (0x47) are assigned to the LPAR, the AP bus will create the
- following sysfs entries:
+ the Cartesian product of the AP adapter and usage domain numbers detected when
+ the AP bus module is loaded. For example, if adapters 4 and 10 (0x0a) and
+ usage domains 6 and 71 (0x47) are assigned to the LPAR, the Cartesian product
+ would be defined by the following table:
+
+ 06 71
+ +-----------+-----------+
+ 04 | (04,06) | (04,47) |
+ +-----------|-----------+
+ 10 | (0a,06) | (0a,47) |
+ +-----------|-----------+
+
+ The AP bus will create the following sysfs entries:

/sys/devices/ap/card04/04.0006
/sys/devices/ap/card04/04.0047
@@ -146,10 +155,20 @@ If you recall from the description of an AP Queue, AP instructions include
an APQN to identify the AP queue to which an AP command-request message is to be
sent (NQAP and PQAP instructions), or from which a command-reply message is to
be received (DQAP instruction). The validity of an APQN is defined by the matrix
-calculated from the APM and AQM; it is the cross product of all assigned adapter
-numbers (APM) with all assigned queue indexes (AQM). For example, if adapters 1
-and 2 and usage domains 5 and 6 are assigned to a guest, the APQNs (1,5), (1,6),
-(2,5) and (2,6) will be valid for the guest.
+calculated from the APM and AQM; it is the Cartesian product of all assigned
+adapter numbers (APM) with all assigned queue indexes (AQM). For example, if
+adapters 1 and 2 and usage domains 5 and 6 are assigned to a guest:
+
+
+ 05 06
+ +-----------+-----------+
+ 01 | (01,05) | (01,06) |
+ +-----------|-----------+
+ 02 | (02,05) | (02,06) |
+ +-----------|-----------+
+
+
+APQNs (01,05), (01,06), (02,05) and (02,06) will be valid for the guest.

The APQNs can provide secure key functionality - i.e., a private key is stored
on the adapter card for each of its domains - so each APQN must be assigned to
@@ -349,8 +368,9 @@ matrix device.
number of the the usage domain is echoed to the respective attribute
file.
* matrix:
- A read-only file for displaying the APQNs derived from the cross product
- of the adapter and domain numbers assigned to the mediated matrix device.
+ A read-only file for displaying the APQNs derived from the Caresian
+ product of the adapter and domain numbers assigned to the mediated matrix
+ device.
* assign_control_domain:
* unassign_control_domain:
Write-only attributes for assigning/unassigning an AP control domain
@@ -438,9 +458,10 @@ guest use.
Example:
=======
Let's now provide an example to illustrate how KVM guests may be given
-access to AP facilities. For this example, we will show how to configure
-three guests such that executing the lszcrypt command on the guests would
-look like this:
+access to AP facilities. For this example, we will assume that adapters 4, 5
+and 6 and domains 4, 71 (0x47), 171 (0xab) and 255 (0xff) are assigned to the
+LPAR and are online. We will show how to configure three guests such that
+executing the lszcrypt command on the guests would look like this:

Guest1
------
@@ -466,7 +487,7 @@ Guest2
CARD.DOMAIN TYPE MODE
------------------------------
06 CEX5A Accelerator
-06.0047 CEX5A Accelerator
+06.0047 CEX5A AcceleratorNote that this directory may contain additional bindings depending upon what
06.00ff CEX5A Accelerator

These are the steps:
@@ -513,35 +534,44 @@ These are the steps:
/sys/bus/ap/aqmask

The 'apmask' is a 256-bit mask that identifies a set of AP adapter IDs
- (APID). Each bit in the mask, from left to right (i.e., from most significant
- to least significant bit in big endian order), corresponds to an APID from
- 0-255. If a bit is set, the APID is marked as usable only by the default AP
- queue device drivers; otherwise, the APID is usable by the vfio_ap
- device driver.
+ (APID). Each bit in the mask, from left to right, corresponds to an APID from
+ 0-255.

The 'aqmask' is a 256-bit mask that identifies a set of AP queue indexes
- (APQI). Each bit in the mask, from left to right (i.e., from most significant
- to least significant bit in big endian order), corresponds to an APQI from
- 0-255. If a bit is set, the APQI is marked as usable only by the default AP
- queue device drivers; otherwise, the APQI is usable by the vfio_ap device
- driver.
+ (APQI). Each bit in the mask, from left to right, corresponds to an APQI from
+ 0-255.
+
+ The Cartesian product of the APIDs set in the apmask and the APQIs set in
+ the aqmask identify the APQNs of AP queue devices owned by the zcrypt
+ device drivers.
+
+ Take, for example, the following masks:

- Take, for example, the following mask:
+ apmask: 0x7000000000000000000000000000000000000000000000000000000000000000

- 0x7dffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
+ aqmask: 0x0180000000000000000000000000000000000000000000000000000000000000

- It indicates:
+ The bits set in apmask are bits 1, 2 and 3. The bits set in aqmask are bits
+ 7 and 8. The Cartesian product of the bits set in the two masks is:

- 1, 2, 3, 4, 5, and 7-255 belong to the default drivers' pool, and 0 and 6
- belong to the vfio_ap device driver's pool.
+ 07 08
+ +-----------+-----------+
+ 01 | (01,07) | (01,08) |
+ +-----------|-----------+
+ 02 | (02,07) | (02,08) |
+ +-----------|-----------+
+ 03 | (03,07) | (03,08) |
+ +-----------|-----------+

- The APQN of each AP queue device assigned to the linux host is checked by the
- AP bus against the set of APQNs derived from the cross product of APIDs
- and APQIs marked as usable only by the default AP queue device drivers. If a
- match is detected, only the default AP queue device drivers will be probed;
- otherwise, the vfio_ap device driver will be probed.
+ The masks indicate that the queues with APQNs (01,07), (01,08), (02,07),
+ (02,08), (03,07) and (03,08) are owned by the zcrypt drivers. When the AP bus
+ detects an AP queue device, its APQN is checked against the set of APQNs
+ derived from the apmask and aqmask. If a match is detected, the zcrypt
+ device driver registered for the device type of the queue will be probed. If
+ a match is not detected and the device type of the queue is CEX4 or newer,
+ the vfio_ap device driver will be probed.

- By default, the two masks are set to reserve all APQNs for use by the default
+ By default, the two masks are set to reserve all APQNs for use by the zcrypt
AP queue device drivers. There are two ways the default masks can be changed:

1. The sysfs mask files can be edited by echoing a string into the
@@ -554,8 +584,7 @@ These are the steps:

0x4100000000000000000000000000000000000000000000000000000000000000

- Keep in mind that the mask reads from left to right (i.e., most
- significant to least significant bit in big endian order), so the mask
+ Keep in mind that the mask reads from left to right, so the mask
above identifies device numbers 1 and 7 (01000001).

If the string is longer than the mask, the operation is terminated with
@@ -563,7 +592,7 @@ These are the steps:

* Individual bits in the mask can be switched on and off by specifying
each bit number to be switched in a comma separated list. Each bit
- number string must be prepended with a ('+') or minus ('-') to indicate
+ number string must be prefixed with a ('+') or minus ('-') to indicate
the corresponding bit is to be switched on ('+') or off ('-'). Some
valid values are:

@@ -594,19 +623,26 @@ These are the steps:
aqmask:
0x4000000000000000000000000000000000000000000000000000000000000000

- Resulting in these two pools:
-
- default drivers pool: adapter 0-15, domain 1
- alternate drivers pool: adapter 16-255, domains 0, 2-255
+Recall that the Cartesian product of the APIDs set in the apmask and the APQIs
+set in the aqmask identify the APQNs of AP queue devices owned by the zcrypt
+device drivers. If an attempt is made to modify the apmask or aqmask such that
+one or more APQNs changes ownership from a the vfio_ap device driver to a zcrypt
+device driver and the APQN is assigned to a mediated device (see step 3 below),
+the operation will fail with an error ('Address already in use').

Securing the APQNs for our example:
----------------------------------
- To secure the AP queues 05.0004, 05.0047, 05.00ab, 05.00ff, 06.0004, 06.0047,
- 06.00ab, and 06.00ff for use by the vfio_ap device driver, the corresponding
- APQNs can either be removed from the default masks:
+ There is no way to secure the specific AP queues 05.0004, 05.0047, 05.00ab,
+ 05.00ff, 06.0004, 06.0047, 06.00ab, and 06.00ff for use by the vfio_ap device
+ driver, so we are left with either securing all queues on adapters 05 and
+ 06, or queues 0004, 0047, 00ab and 00ff can be secured on all adapters.
+
+ To secure all queues on adapters 05 and 05:

echo -5,-6 > /sys/bus/ap/apmask

+ To secure queues 0004, 0047, 00ab, and 00ff on all adapters:
+
echo -4,-0x47,-0xab,-0xff > /sys/bus/ap/aqmask

Or the masks can be set as follows:
@@ -617,14 +653,20 @@ These are the steps:
echo 0xf7fffffffffffffffeffffffffffffffffffffffffeffffffffffffffffffffe \
> aqmask

- This will result in AP queues 05.0004, 05.0047, 05.00ab, 05.00ff, 06.0004,
- 06.0047, 06.00ab, and 06.00ff getting bound to the vfio_ap device driver. The
- sysfs directory for the vfio_ap device driver will now contain symbolic links
- to the AP queue devices bound to it:
+ For this example, we will choose to secure queues 0004, 0047, 00ab, and 00ff
+ on all adapters. This will result in AP queues 04.0004, 04.0047, 04.00ab,
+ 04.00ff, 05.0004, 05.0047, 05.00ab, 05.00ff, 06.0004, 06.0047, 06.00ab and
+ 06.00ff getting bound to the vfio_ap device driver. The sysfs directory for
+ the vfio_ap device driver will now contain symbolic links to the AP queue
+ devices bound to it:

/sys/bus/ap
... [drivers]
...... [vfio_ap]
+ ......... [04.0004]
+ ......... [04.0047]
+ ......... [04.00ab]
+ ......... [04.00ff]
......... [05.0004]
......... [05.0047]
......... [05.00ab]
@@ -641,11 +683,12 @@ These are the steps:
future and for which there are few older systems on which to test.

The administrator, therefore, must take care to secure only AP queues that
- can be bound to the vfio_ap device driver. The device type for a given AP
- queue device can be read from the parent card's sysfs directory. For example,
- to see the hardware type of the queue 05.0004:
+ can be bound to the vfio_ap device driver, or those queues will not get bound
+ to any driver. The device type for a given AP queue device can be read from
+ the parent card's sysfs directory. For example, to see the hardware type of
+ the queue 05.0004:

- cat /sys/bus/ap/devices/card05/hwtype
+ cat /sys/bus/ap/devices/card05/hwtype

The hwtype must be 10 or higher (CEX4 or newer) in order to be bound to the
vfio_ap device driver.
@@ -747,37 +790,48 @@ These are the steps:
higher than the maximum is specified, the operation will terminate with
an error (ENODEV).

- * All APQNs that can be derived from the adapter ID and the IDs of
- the previously assigned domains must be bound to the vfio_ap device
- driver. If no domains have yet been assigned, then there must be at least
- one APQN with the specified APID bound to the vfio_ap driver. If no such
- APQNs are bound to the driver, the operation will terminate with an
- error (EADDRNOTAVAIL).
+ * Each APQN that can be derived from the adapter ID and the IDs of
+ the previously assigned domains must not be reserved for use by the
+ zcrypt device drivers as specified by the /sys/bus/ap/apmask and
+ /sys/bus/ap/aqmask syfs interfaces. If any APQN is reserved, the operation
+ will terminate with an error (EADDRNOTAVAIL).

- No APQN that can be derived from the adapter ID and the IDs of the
+ * No APQN that can be derived from the adapter ID and the IDs of the
previously assigned domains can be assigned to another mediated matrix
device. If an APQN is assigned to another mediated matrix device, the
operation will terminate with an error (EADDRINUSE).

+ Note that adapters that are currently not available to the host may be
+ assigned to the mediated device. If the adapter subsequently becomes
+ available while a guest is using the mediated device, it will automatically
+ become available to the guest.
+
In order to successfully assign a domain:

* The domain number specified must represent a value from 0 up to the
maximum domain number configured for the system. If a domain number
higher than the maximum is specified, the operation will terminate with
- an error (ENODEV).
+ an error (ENODEV). The maximum domain number can be determined by
+ printing the sysfs /sys/bus/ap/ap_max_domain_id attribute:

- * All APQNs that can be derived from the domain ID and the IDs of
- the previously assigned adapters must be bound to the vfio_ap device
- driver. If no domains have yet been assigned, then there must be at least
- one APQN with the specified APQI bound to the vfio_ap driver. If no such
- APQNs are bound to the driver, the operation will terminate with an
- error (EADDRNOTAVAIL).
+ cat /sys/bus/ap/ap_max_domain_id

- No APQN that can be derived from the domain ID and the IDs of the
+ * Each APQN that can be derived from the domain ID and the IDs of
+ the previously assigned adapters must not be reserved for use by the
+ zcrypt device drivers as specified by the /sys/bus/ap/apmask and
+ /sys/bus/ap/aqmask syfs interfaces. If any APQN is reserved, the operation
+ will terminate with an error (EADDRNOTAVAIL).
+
+ * No APQN that can be derived from the domain ID and the IDs of the
previously assigned adapters can be assigned to another mediated matrix
device. If an APQN is assigned to another mediated matrix device, the
operation will terminate with an error (EADDRINUSE).

+ Note that domains that are currently not available to the host may be
+ assigned to the mediated device. If the domain subsequently becomes
+ available while a guest is using the mediated device, it will automatically
+ become available to the guest.
+
In order to successfully assign a control domain, the domain number
specified must represent a value from 0 up to the maximum domain number
configured for the system. If a control domain number higher than the maximum
@@ -822,16 +876,96 @@ Using our example again, to remove the mediated matrix device $uuid1:
host. If the mdev matrix device is removed, one may want to also reconfigure
the pool of adapters and queues reserved for use by the default drivers.

-Limitations
-===========
-* The KVM/kernel interfaces do not provide a way to prevent restoring an APQN
- to the default drivers pool of a queue that is still assigned to a mediated
- device in use by a guest. It is incumbent upon the administrator to
- ensure there is no mediated device in use by a guest to which the APQN is
- assigned lest the host be given access to the private data of the AP queue
- device such as a private key configured specifically for the guest.
+Hot plug/unplug via mdev matrix device sysfs interfaces:
+=======================================================
+If an mdev matrix device is in use by a running guest, AP resources can be
+plugged into or unplugged from the guest via the mdev device's sysfs
+assignment interfaces. Below are some examples.
+
+ To plug adapter 10 into a running guest:
+
+ echo 0xa > assign_adapter
+
+ To unplug domain 5 from a running guest:
+
+ echo 5 > unassign_domain
+
+To display the matrix of a guest using the mdev matrix device:
+
+ cat guest_matrix
+
+If you attempt to display the guest matrix when a guest is not using the
+mdev matrix device, an error will be displayed (ENODEV). Note that adapters and
+domains that are not yet available or not yet assigned to the LPAR can be
+assigned and will become available to the guest as soon as they become available
+to the host.
+
+Dynamic Changes to AP Configuration using the Support Element (SE):
+==================================================================
+The SE can be used to dynamically make the following changes to the AP
+configuration for an LPAR in which a linux host is running:
+
+ * Configure one or more adapters on
+
+ Configuring an adapter on sets its state to online, thus making it
+ available to the LPAR to which it is assigned. When an adapter is
+ configured on, it immediately becomes available to the LPAR as well as to
+ any guests using a mediated device to which the adapter is assigned.
+
+ * Configure one or more adapters off
+
+ Configuring an adapter off sets its state to standby, thus making it
+ unavailable to the LPAR to which it is assigned. When an adapter is
+ configured off, it immediately becomes unavailable to the LPAR as well as
+ to any guests using a mediated device to which the adapter is assigned.
+
+ * Add adapters or domains to the LPAR configuration
+
+ Adapters and/or domains can be assigned to an LPAR using the Change LPAR
+ Cryptographic Controls task. To make dynamic changes to the AP
+ configuration for an LPAR Running a linux guest, the online adapters
+ assigned to the LPAR must first be configured off. After performing the
+ adapter and/or domain assignments, the AP resources will automatically
+ become available to the linux host running in the LPAR as well as any
+ guests using a mediated device to which the adpaters and/or domains are
+ assigned.
+
+ * Remove adapters or domains from the LPAR configuration
+
+ Adapters and/or domains can be unassigned from an LPAR using the Change
+ LPAR Cryptographic Controls task. To make dynamic changes to the AP
+ configuration for an LPAR Running a linux guest, the online adapters
+ assigned to the LPAR must first be configured off. After performing the
+ adapter and/or domain unassignments, the AP resources will automatically
+ become unavailable to the linux host running in the LPAR as well as any
+ guests using a mediated device to which the adpaters and/or domains are
+ assigned.
+
+Dynamic Changes to the AP Configuration using the SCLP command:
+==============================================================
+The following SCLP commands may be used to dynamically configure AP adapters on
+and off:
+
+* Configure Adjunct Processor
+
+ The 'Configure Adjunct Processor' command sets an AP adapter's state to
+ online, thus making it available to the LPARs to which it is assigned. It will
+ likewise become available to any linux guest using a mediated device to which
+ the adapter is assigned.
+
+* Deconfigure Adjunct Processor
+
+ The 'Deconfigure Adjunct Processor' command sets an AP adapter's state to
+ standby, thus making it unavailable to the LPARs to which it is assigned. It
+ will likewise become unavailable to any linux guest using a mediated device to
+ which the adapter is assigned.

-* Dynamically modifying the AP matrix for a running guest (which would amount to
- hot(un)plug of AP devices for the guest) is currently not supported
+Live migration:
+==============
+Live guest migration is not supported for guests using AP devices. All AP
+devices in use by the guest must be unplugged prior to initiating live
+migration (see "Hot plug/unplug via mdev matrix device sysfs interfaces" section
+above). If you are using QEMU to run your guest and it supports hot plug/unplug
+of the vfio-ap device, this would be another option (consult the QEMU
+documentation for details).

-* Live guest migration is not supported for guests using AP devices.
--
2.7.4

2019-06-13 19:42:40

by Anthony Krowiak

[permalink] [raw]
Subject: [PATCH v4 5/7] s390: vfio-ap: allow assignment of unavailable AP resources to mdev device

The AP architecture does not preclude assignment of AP resources that are
not available. Let's go ahead and implement this facet of the AP
architecture for linux guests.

The current implementation does not allow assignment of AP adapters or
domains to an mdev device if the APQNs resulting from the assignment
reference AP queue devices that are not bound to the vfio_ap device
driver. This patch allows assignment of AP resources to the mdev device as
long as the APQNs resulting from the assignment are not reserved by the AP
BUS for use by the zcrypt device drivers and the APQNs are not assigned to
another mdev device.

Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/vfio_ap_ops.c | 231 ++++++++------------------------------
1 file changed, 44 insertions(+), 187 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 60efd3d7896d..9db86c0db52e 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -406,122 +406,6 @@ static struct attribute_group *vfio_ap_mdev_type_groups[] = {
NULL,
};

-struct vfio_ap_queue_reserved {
- unsigned long *apid;
- unsigned long *apqi;
- bool reserved;
-};
-
-/**
- * vfio_ap_has_queue
- *
- * @dev: an AP queue device
- * @data: a struct vfio_ap_queue_reserved reference
- *
- * Flags whether the AP queue device (@dev) has a queue ID containing the APQN,
- * apid or apqi specified in @data:
- *
- * - If @data contains both an apid and apqi value, then @data will be flagged
- * as reserved if the APID and APQI fields for the AP queue device matches
- *
- * - If @data contains only an apid value, @data will be flagged as
- * reserved if the APID field in the AP queue device matches
- *
- * - If @data contains only an apqi value, @data will be flagged as
- * reserved if the APQI field in the AP queue device matches
- *
- * Returns 0 to indicate the input to function succeeded. Returns -EINVAL if
- * @data does not contain either an apid or apqi.
- */
-static int vfio_ap_has_queue(struct device *dev, void *data)
-{
- struct vfio_ap_queue_reserved *qres = data;
- struct ap_queue *ap_queue = to_ap_queue(dev);
- ap_qid_t qid;
- unsigned long id;
-
- if (qres->apid && qres->apqi) {
- qid = AP_MKQID(*qres->apid, *qres->apqi);
- if (qid == ap_queue->qid)
- qres->reserved = true;
- } else if (qres->apid && !qres->apqi) {
- id = AP_QID_CARD(ap_queue->qid);
- if (id == *qres->apid)
- qres->reserved = true;
- } else if (!qres->apid && qres->apqi) {
- id = AP_QID_QUEUE(ap_queue->qid);
- if (id == *qres->apqi)
- qres->reserved = true;
- } else {
- return -EINVAL;
- }
-
- return 0;
-}
-
-/**
- * vfio_ap_verify_queue_reserved
- *
- * @matrix_dev: a mediated matrix device
- * @apid: an AP adapter ID
- * @apqi: an AP queue index
- *
- * Verifies that the AP queue with @apid/@apqi is reserved by the VFIO AP device
- * driver according to the following rules:
- *
- * - If both @apid and @apqi are not NULL, then there must be an AP queue
- * device bound to the vfio_ap driver with the APQN identified by @apid and
- * @apqi
- *
- * - If only @apid is not NULL, then there must be an AP queue device bound
- * to the vfio_ap driver with an APQN containing @apid
- *
- * - If only @apqi is not NULL, then there must be an AP queue device bound
- * to the vfio_ap driver with an APQN containing @apqi
- *
- * Returns 0 if the AP queue is reserved; otherwise, returns -EADDRNOTAVAIL.
- */
-static int vfio_ap_verify_queue_reserved(unsigned long *apid,
- unsigned long *apqi)
-{
- int ret;
- struct vfio_ap_queue_reserved qres;
-
- qres.apid = apid;
- qres.apqi = apqi;
- qres.reserved = false;
-
- ret = driver_for_each_device(&matrix_dev->vfio_ap_drv->driver, NULL,
- &qres, vfio_ap_has_queue);
- if (ret)
- return ret;
-
- if (qres.reserved)
- return 0;
-
- return -EADDRNOTAVAIL;
-}
-
-static int
-vfio_ap_mdev_verify_queues_reserved_for_apid(struct ap_matrix_mdev *matrix_mdev,
- unsigned long apid)
-{
- int ret;
- unsigned long apqi;
- unsigned long nbits = matrix_mdev->matrix.aqm_max + 1;
-
- if (find_first_bit_inv(matrix_mdev->matrix.aqm, nbits) >= nbits)
- return vfio_ap_verify_queue_reserved(&apid, NULL);
-
- for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, nbits) {
- ret = vfio_ap_verify_queue_reserved(&apid, &apqi);
- if (ret)
- return ret;
- }
-
- return 0;
-}
-
/**
* vfio_ap_mdev_verify_no_sharing
*
@@ -529,18 +413,26 @@ vfio_ap_mdev_verify_queues_reserved_for_apid(struct ap_matrix_mdev *matrix_mdev,
* and AP queue indexes comprising the AP matrix are not configured for another
* mediated device. AP queue sharing is not allowed.
*
- * @matrix_mdev: the mediated matrix device
+ * @mdev_apm: the mask identifying the adapters assigned to mdev
+ * @mdev_apm: the mask identifying the adapters assigned to mdev
*
* Returns 0 if the APQNs are not shared, otherwise; returns -EADDRINUSE.
*/
-static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
+static int vfio_ap_mdev_verify_no_sharing(unsigned long *mdev_apm,
+ unsigned long *mdev_aqm)
{
struct ap_matrix_mdev *lstdev;
DECLARE_BITMAP(apm, AP_DEVICES);
DECLARE_BITMAP(aqm, AP_DOMAINS);

list_for_each_entry(lstdev, &matrix_dev->mdev_list, node) {
- if (matrix_mdev == lstdev)
+ /*
+ * If either of the input masks belongs to the mdev to which an
+ * AP resource is being assigned, then we don't need to verify
+ * that mdev's masks.
+ */
+ if ((mdev_apm == lstdev->matrix.apm) ||
+ (mdev_aqm == lstdev->matrix.aqm))
continue;

memset(apm, 0, sizeof(apm));
@@ -550,12 +442,10 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
* We work on full longs, as we can only exclude the leftover
* bits in non-inverse order. The leftover is all zeros.
*/
- if (!bitmap_and(apm, matrix_mdev->matrix.apm,
- lstdev->matrix.apm, AP_DEVICES))
+ if (!bitmap_and(apm, mdev_apm, lstdev->matrix.apm, AP_DEVICES))
continue;

- if (!bitmap_and(aqm, matrix_mdev->matrix.aqm,
- lstdev->matrix.aqm, AP_DOMAINS))
+ if (!bitmap_and(aqm, mdev_aqm, lstdev->matrix.aqm, AP_DOMAINS))
continue;

return -EADDRINUSE;
@@ -564,6 +454,17 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
return 0;
}

+static int vfio_ap_mdev_validate_masks(unsigned long *apm, unsigned long *aqm)
+{
+ int ret;
+
+ ret = ap_apqn_in_matrix_owned_by_def_drv(apm, aqm);
+ if (ret)
+ return (ret == 1) ? -EADDRNOTAVAIL : ret;
+
+ return vfio_ap_mdev_verify_no_sharing(apm, aqm);
+}
+
/**
* assign_adapter_store
*
@@ -602,6 +503,7 @@ static ssize_t assign_adapter_store(struct device *dev,
{
int ret;
unsigned long apid;
+ DECLARE_BITMAP(apm, AP_DEVICES);
struct mdev_device *mdev = mdev_from_dev(dev);
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);

@@ -616,32 +518,19 @@ static ssize_t assign_adapter_store(struct device *dev,
if (apid > matrix_mdev->matrix.apm_max)
return -ENODEV;

- /*
- * Set the bit in the AP mask (APM) corresponding to the AP adapter
- * number (APID). The bits in the mask, from most significant to least
- * significant bit, correspond to APIDs 0-255.
- */
- mutex_lock(&matrix_dev->lock);
-
- ret = vfio_ap_mdev_verify_queues_reserved_for_apid(matrix_mdev, apid);
- if (ret)
- goto done;
+ memset(apm, 0, ARRAY_SIZE(apm) * sizeof(*apm));
+ set_bit_inv(apid, apm);

+ mutex_lock(&matrix_dev->lock);
+ ret = vfio_ap_mdev_validate_masks(apm, matrix_mdev->matrix.aqm);
+ if (ret) {
+ mutex_unlock(&matrix_dev->lock);
+ return ret;
+ }
set_bit_inv(apid, matrix_mdev->matrix.apm);
-
- ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev);
- if (ret)
- goto share_err;
-
- ret = count;
- goto done;
-
-share_err:
- clear_bit_inv(apid, matrix_mdev->matrix.apm);
-done:
mutex_unlock(&matrix_dev->lock);

- return ret;
+ return count;
}
static DEVICE_ATTR_WO(assign_adapter);

@@ -690,26 +579,6 @@ static ssize_t unassign_adapter_store(struct device *dev,
}
static DEVICE_ATTR_WO(unassign_adapter);

-static int
-vfio_ap_mdev_verify_queues_reserved_for_apqi(struct ap_matrix_mdev *matrix_mdev,
- unsigned long apqi)
-{
- int ret;
- unsigned long apid;
- unsigned long nbits = matrix_mdev->matrix.apm_max + 1;
-
- if (find_first_bit_inv(matrix_mdev->matrix.apm, nbits) >= nbits)
- return vfio_ap_verify_queue_reserved(NULL, &apqi);
-
- for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, nbits) {
- ret = vfio_ap_verify_queue_reserved(&apid, &apqi);
- if (ret)
- return ret;
- }
-
- return 0;
-}
-
/**
* assign_domain_store
*
@@ -748,6 +617,7 @@ static ssize_t assign_domain_store(struct device *dev,
{
int ret;
unsigned long apqi;
+ DECLARE_BITMAP(aqm, AP_DEVICES);
struct mdev_device *mdev = mdev_from_dev(dev);
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
unsigned long max_apqi = matrix_mdev->matrix.aqm_max;
@@ -762,27 +632,19 @@ static ssize_t assign_domain_store(struct device *dev,
if (apqi > max_apqi)
return -ENODEV;

- mutex_lock(&matrix_dev->lock);
-
- ret = vfio_ap_mdev_verify_queues_reserved_for_apqi(matrix_mdev, apqi);
- if (ret)
- goto done;
+ memset(aqm, 0, ARRAY_SIZE(aqm) * sizeof(*aqm));
+ set_bit_inv(apqi, aqm);

+ mutex_lock(&matrix_dev->lock);
+ ret = vfio_ap_mdev_validate_masks(matrix_mdev->matrix.apm, aqm);
+ if (ret) {
+ mutex_unlock(&matrix_dev->lock);
+ return ret;
+ }
set_bit_inv(apqi, matrix_mdev->matrix.aqm);
-
- ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev);
- if (ret)
- goto share_err;
-
- ret = count;
- goto done;
-
-share_err:
- clear_bit_inv(apqi, matrix_mdev->matrix.aqm);
-done:
mutex_unlock(&matrix_dev->lock);

- return ret;
+ return count;
}
static DEVICE_ATTR_WO(assign_domain);

@@ -868,11 +730,6 @@ static ssize_t assign_control_domain_store(struct device *dev,
if (id > matrix_mdev->matrix.adm_max)
return -ENODEV;

- /* Set the bit in the ADM (bitmask) corresponding to the AP control
- * domain number (id). The bits in the mask, from most significant to
- * least significant, correspond to IDs 0 up to the one less than the
- * number of control domains that can be assigned.
- */
mutex_lock(&matrix_dev->lock);
set_bit_inv(id, matrix_mdev->matrix.adm);
mutex_unlock(&matrix_dev->lock);
--
2.7.4

2019-06-17 08:48:17

by Harald Freudenberger

[permalink] [raw]
Subject: Re: [PATCH v4 2/7] s390: vfio-ap: wait for queue empty on queue reset

On 13.06.19 21:39, Tony Krowiak wrote:
> Refactors the AP queue reset function to wait until the queue is empty
> after the PQAP(ZAPQ) instruction is executed to zero out the queue as
> required by the AP architecture.
>
> Signed-off-by: Tony Krowiak <[email protected]>
> ---
> drivers/s390/crypto/vfio_ap_ops.c | 49 +++++++++++++++++++++++++++------------
> 1 file changed, 34 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index bf2ab02b9a0b..60efd3d7896d 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -1128,23 +1128,46 @@ static void vfio_ap_irq_disable_apqn(int apqn)
> }
> }
>
> -int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
> - unsigned int retry)
> +static void vfio_ap_mdev_wait_for_qempty(ap_qid_t qid)
> +{
> + struct ap_queue_status status;
> + int retry = 5;
> +
> + do {
> + status = ap_tapq(qid, NULL);
> + switch (status.response_code) {
> + case AP_RESPONSE_NORMAL:
> + if (status.queue_empty)
> + return;
> + case AP_RESPONSE_RESET_IN_PROGRESS:
> + case AP_RESPONSE_BUSY:
> + msleep(20);
> + break;
> + default:
> + pr_warn("%s: tapq response %02x waiting for queue %04x.%02x empty\n",
> + __func__, status.response_code,
> + AP_QID_CARD(qid), AP_QID_QUEUE(qid));
The ap and zcrypt code uses %02x.%04x for displaying an APQN.
I would also recommend to handle 0x01 (AP_RESPONSE_Q_NOT_AVAIL) and
0x03 (AP_RESPONSE_DECONFIGURED) as this code may run when a APQN
is removed from the configuration of the LPAR. However, it's up to you if you
want to handle these with the default statement and issue an sysfs warning.
> + return;
> + }
> + } while (--retry);
> +
> + WARN_ON_ONCE(retry <= 0);
> +}
> +
> +int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi)
> {
> struct ap_queue_status status;
> - int retry2 = 2;
> int apqn = AP_MKQID(apid, apqi);
> + int retry = 5;
>
> do {
> status = ap_zapq(apqn);
> switch (status.response_code) {
> case AP_RESPONSE_NORMAL:
> - while (!status.queue_empty && retry2--) {
> - msleep(20);
> - status = ap_tapq(apqn, NULL);
> - }
> - WARN_ON_ONCE(retry <= 0);
> + vfio_ap_mdev_wait_for_qempty(AP_MKQID(apid, apqi));
> return 0;
> + case AP_RESPONSE_DECONFIGURED:
> + return -ENODEV;
> case AP_RESPONSE_RESET_IN_PROGRESS:
> case AP_RESPONSE_BUSY:
> msleep(20);
> @@ -1169,14 +1192,10 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
> matrix_mdev->matrix.apm_max + 1) {
> for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm,
> matrix_mdev->matrix.aqm_max + 1) {
> - ret = vfio_ap_mdev_reset_queue(apid, apqi, 1);
> - /*
> - * Regardless whether a queue turns out to be busy, or
> - * is not operational, we need to continue resetting
> - * the remaining queues.
> - */
> + ret = vfio_ap_mdev_reset_queue(apid, apqi);
> if (ret)
> rc = ret;
> +
> vfio_ap_irq_disable_apqn(AP_MKQID(apid, apqi));
> }
> }
> @@ -1326,7 +1345,7 @@ void vfio_ap_mdev_remove_queue(struct ap_queue *queue)
> dev_set_drvdata(&queue->ap_dev.device, NULL);
> apid = AP_QID_CARD(q->apqn);
> apqi = AP_QID_QUEUE(q->apqn);
> - vfio_ap_mdev_reset_queue(apid, apqi, 1);
> + vfio_ap_mdev_reset_queue(apid, apqi);
> vfio_ap_irq_disable(q);
> kfree(q);
> }

2019-06-17 09:30:11

by Harald Freudenberger

[permalink] [raw]
Subject: Re: [PATCH v4 3/7] s390: zcrypt: driver callback to indicate resource in use

On 13.06.19 21:39, Tony Krowiak wrote:
> Introduces a new driver callback to prevent a root user from unbinding
> an AP queue from its device driver if the queue is in use. This prevents
> a root user from inadvertently taking a queue away from a guest and
> giving it to the host, or vice versa. The callback will be invoked
> whenever a change to the AP bus's apmask or aqmask sysfs interfaces may
> result in one or more AP queues being removed from its driver. If the
> callback responds in the affirmative for any driver queried, the change
> to the apmask or aqmask will be rejected with a device in use error.
>
> For this patch, only non-default drivers will be queried. Currently,
> there is only one non-default driver, the vfio_ap device driver. The
> vfio_ap device driver manages AP queues passed through to one or more
> guests and we don't want to unexpectedly take AP resources away from
> guests which are most likely independently administered.
>
> Signed-off-by: Tony Krowiak <[email protected]>
> ---
> drivers/s390/crypto/ap_bus.c | 138 +++++++++++++++++++++++++++++++++++++++++--
> drivers/s390/crypto/ap_bus.h | 3 +
> 2 files changed, 135 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/s390/crypto/ap_bus.c b/drivers/s390/crypto/ap_bus.c
> index b9fc502c58c2..1b06f47d0d1c 100644
> --- a/drivers/s390/crypto/ap_bus.c
> +++ b/drivers/s390/crypto/ap_bus.c
> @@ -35,6 +35,7 @@
> #include <linux/mod_devicetable.h>
> #include <linux/debugfs.h>
> #include <linux/ctype.h>
> +#include <linux/module.h>
>
> #include "ap_bus.h"
> #include "ap_debug.h"
> @@ -998,9 +999,11 @@ int ap_parse_mask_str(const char *str,
> newmap = kmalloc(size, GFP_KERNEL);
> if (!newmap)
> return -ENOMEM;
> - if (mutex_lock_interruptible(lock)) {
> - kfree(newmap);
> - return -ERESTARTSYS;
> + if (lock) {
> + if (mutex_lock_interruptible(lock)) {
> + kfree(newmap);
> + return -ERESTARTSYS;
> + }
> }
>
> if (*str == '+' || *str == '-') {
> @@ -1012,7 +1015,10 @@ int ap_parse_mask_str(const char *str,
> }
> if (rc == 0)
> memcpy(bitmap, newmap, size);
> - mutex_unlock(lock);
> +
> + if (lock)
> + mutex_unlock(lock);
> +
> kfree(newmap);
> return rc;
> }
> @@ -1199,12 +1205,72 @@ static ssize_t apmask_show(struct bus_type *bus, char *buf)
> return rc;
> }
>
> +int __verify_card_reservations(struct device_driver *drv, void *data)
> +{
> + int rc = 0;
> + struct ap_driver *ap_drv = to_ap_drv(drv);
> + unsigned long *newapm = (unsigned long *)data;
> +
> + /*
> + * If the reserved bits do not identify cards reserved for use by the
> + * non-default driver, there is no need to verify the driver is using
> + * the queues.
> + */
> + if (ap_drv->flags & AP_DRIVER_FLAG_DEFAULT)
> + return 0;
> +
> + /* The non-default driver's module must be loaded */
> + if (!try_module_get(drv->owner))
> + return 0;
> +
> + if (ap_drv->in_use)
> + if (ap_drv->in_use(newapm, ap_perms.aqm))
> + rc = -EADDRINUSE;
> +
> + module_put(drv->owner);
> +
> + return rc;
> +}
> +
> +static int apmask_commit(unsigned long *newapm)
> +{
> + int rc;
> + unsigned long reserved[BITS_TO_LONGS(AP_DEVICES)];
> +
> + /*
> + * Check if any bits in the apmask have been set which will
> + * result in queues being removed from non-default drivers
> + */
> + if (bitmap_andnot(reserved, newapm, ap_perms.apm, AP_DEVICES)) {
> + rc = (bus_for_each_drv(&ap_bus_type, NULL, reserved,
> + __verify_card_reservations));
Why surround the bus_for_each_drv() with additional parenthesis ?
> + if (rc)
> + return rc;
> + }
> +
> + memcpy(ap_perms.apm, newapm, APMASKSIZE);
> +
> + return 0;
> +}
> +
> static ssize_t apmask_store(struct bus_type *bus, const char *buf,
> size_t count)
> {
> int rc;
> + unsigned long newapm[BITS_TO_LONGS(AP_DEVICES)];
> +
> + memcpy(newapm, ap_perms.apm, APMASKSIZE);
> +
> + rc = ap_parse_mask_str(buf, newapm, AP_DEVICES, NULL);
> + if (rc)
> + return rc;
> +
> + if (mutex_lock_interruptible(&ap_perms_mutex))
> + return -ERESTARTSYS;
This lock is too late. Move it before the memcpy(newapm, ap_perms.apm, APMASKSIZE);
> +
> + rc = apmask_commit(newapm);
> + mutex_unlock(&ap_perms_mutex);
>
> - rc = ap_parse_mask_str(buf, ap_perms.apm, AP_DEVICES, &ap_perms_mutex);
> if (rc)
> return rc;
>
> @@ -1230,12 +1296,72 @@ static ssize_t aqmask_show(struct bus_type *bus, char *buf)
> return rc;
> }
>
> +int __verify_queue_reservations(struct device_driver *drv, void *data)
> +{
> + int rc = 0;
> + struct ap_driver *ap_drv = to_ap_drv(drv);
> + unsigned long *newaqm = (unsigned long *)data;
> +
> + /*
> + * If the reserved bits do not identify queues reserved for use by the
> + * non-default driver, there is no need to verify the driver is using
> + * the queues.
> + */
> + if (ap_drv->flags & AP_DRIVER_FLAG_DEFAULT)
> + return 0;
> +
> + /* The non-default driver's module must be loaded */
> + if (!try_module_get(drv->owner))
> + return 0;
> +
> + if (ap_drv->in_use)
> + if (ap_drv->in_use(ap_perms.apm, newaqm))
> + rc = -EADDRINUSE;
> +
> + module_put(drv->owner);
> +
> + return rc;
> +}
> +
> +static int aqmask_commit(unsigned long *newaqm)
> +{
> + int rc;
> + unsigned long reserved[BITS_TO_LONGS(AP_DOMAINS)];
> +
> + /*
> + * Check if any bits in the aqmask have been set which will
> + * result in queues being removed from non-default drivers
> + */
> + if (bitmap_andnot(reserved, newaqm, ap_perms.aqm, AP_DOMAINS)) {
> + rc = (bus_for_each_drv(&ap_bus_type, NULL, reserved,
> + __verify_queue_reservations));
Same here: please remove the surrounding ( ... ).
> + if (rc)
> + return rc;
> + }
> +
> + memcpy(ap_perms.aqm, newaqm, APMASKSIZE);
APMASKSIZE -> AQMASKSIZE
> +
> + return 0;
> +}
> +
> static ssize_t aqmask_store(struct bus_type *bus, const char *buf,
> size_t count)
> {
> int rc;
> + unsigned long newaqm[BITS_TO_LONGS(AP_DEVICES)];
> +
> + memcpy(newaqm, ap_perms.aqm, AQMASKSIZE);
> +
> + rc = ap_parse_mask_str(buf, newaqm, AP_DOMAINS, NULL);
> + if (rc)
> + return rc;
> +
> + if (mutex_lock_interruptible(&ap_perms_mutex))
> + return -ERESTARTSYS;
Please move the lock before the memcpy(newaqm, ...)
> +
> + rc = aqmask_commit(newaqm);
> + mutex_unlock(&ap_perms_mutex);
>
> - rc = ap_parse_mask_str(buf, ap_perms.aqm, AP_DOMAINS, &ap_perms_mutex);
> if (rc)
> return rc;
>
> diff --git a/drivers/s390/crypto/ap_bus.h b/drivers/s390/crypto/ap_bus.h
> index 6f3cf37776ca..0f865c5545f2 100644
> --- a/drivers/s390/crypto/ap_bus.h
> +++ b/drivers/s390/crypto/ap_bus.h
> @@ -137,6 +137,7 @@ struct ap_driver {
> void (*remove)(struct ap_device *);
> void (*suspend)(struct ap_device *);
> void (*resume)(struct ap_device *);
> + bool (*in_use)(unsigned long *apm, unsigned long *aqm);
> };
>
> #define to_ap_drv(x) container_of((x), struct ap_driver, driver)
> @@ -265,6 +266,8 @@ void ap_queue_reinit_state(struct ap_queue *aq);
> struct ap_card *ap_card_create(int id, int queue_depth, int raw_device_type,
> int comp_device_type, unsigned int functions);
>
> +#define APMASKSIZE (BITS_TO_LONGS(AP_DEVICES) * sizeof(unsigned long))
> +#define AQMASKSIZE (BITS_TO_LONGS(AP_DOMAINS) * sizeof(unsigned long))
add one empty line here please
> struct ap_perms {
> unsigned long ioctlm[BITS_TO_LONGS(AP_IOCTLS)];
> unsigned long apm[BITS_TO_LONGS(AP_DEVICES)];

2019-06-17 10:07:53

by Harald Freudenberger

[permalink] [raw]
Subject: Re: [PATCH v4 5/7] s390: vfio-ap: allow assignment of unavailable AP resources to mdev device

On 13.06.19 21:39, Tony Krowiak wrote:
> The AP architecture does not preclude assignment of AP resources that are
> not available. Let's go ahead and implement this facet of the AP
> architecture for linux guests.
>
> The current implementation does not allow assignment of AP adapters or
> domains to an mdev device if the APQNs resulting from the assignment
> reference AP queue devices that are not bound to the vfio_ap device
> driver. This patch allows assignment of AP resources to the mdev device as
> long as the APQNs resulting from the assignment are not reserved by the AP
> BUS for use by the zcrypt device drivers and the APQNs are not assigned to
> another mdev device.
>
> Signed-off-by: Tony Krowiak <[email protected]>
> ---
> drivers/s390/crypto/vfio_ap_ops.c | 231 ++++++++------------------------------
> 1 file changed, 44 insertions(+), 187 deletions(-)
>
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 60efd3d7896d..9db86c0db52e 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -406,122 +406,6 @@ static struct attribute_group *vfio_ap_mdev_type_groups[] = {
> NULL,
> };
>
> -struct vfio_ap_queue_reserved {
> - unsigned long *apid;
> - unsigned long *apqi;
> - bool reserved;
> -};
> -
> -/**
> - * vfio_ap_has_queue
> - *
> - * @dev: an AP queue device
> - * @data: a struct vfio_ap_queue_reserved reference
> - *
> - * Flags whether the AP queue device (@dev) has a queue ID containing the APQN,
> - * apid or apqi specified in @data:
> - *
> - * - If @data contains both an apid and apqi value, then @data will be flagged
> - * as reserved if the APID and APQI fields for the AP queue device matches
> - *
> - * - If @data contains only an apid value, @data will be flagged as
> - * reserved if the APID field in the AP queue device matches
> - *
> - * - If @data contains only an apqi value, @data will be flagged as
> - * reserved if the APQI field in the AP queue device matches
> - *
> - * Returns 0 to indicate the input to function succeeded. Returns -EINVAL if
> - * @data does not contain either an apid or apqi.
> - */
> -static int vfio_ap_has_queue(struct device *dev, void *data)
> -{
> - struct vfio_ap_queue_reserved *qres = data;
> - struct ap_queue *ap_queue = to_ap_queue(dev);
> - ap_qid_t qid;
> - unsigned long id;
> -
> - if (qres->apid && qres->apqi) {
> - qid = AP_MKQID(*qres->apid, *qres->apqi);
> - if (qid == ap_queue->qid)
> - qres->reserved = true;
> - } else if (qres->apid && !qres->apqi) {
> - id = AP_QID_CARD(ap_queue->qid);
> - if (id == *qres->apid)
> - qres->reserved = true;
> - } else if (!qres->apid && qres->apqi) {
> - id = AP_QID_QUEUE(ap_queue->qid);
> - if (id == *qres->apqi)
> - qres->reserved = true;
> - } else {
> - return -EINVAL;
> - }
> -
> - return 0;
> -}
> -
> -/**
> - * vfio_ap_verify_queue_reserved
> - *
> - * @matrix_dev: a mediated matrix device
> - * @apid: an AP adapter ID
> - * @apqi: an AP queue index
> - *
> - * Verifies that the AP queue with @apid/@apqi is reserved by the VFIO AP device
> - * driver according to the following rules:
> - *
> - * - If both @apid and @apqi are not NULL, then there must be an AP queue
> - * device bound to the vfio_ap driver with the APQN identified by @apid and
> - * @apqi
> - *
> - * - If only @apid is not NULL, then there must be an AP queue device bound
> - * to the vfio_ap driver with an APQN containing @apid
> - *
> - * - If only @apqi is not NULL, then there must be an AP queue device bound
> - * to the vfio_ap driver with an APQN containing @apqi
> - *
> - * Returns 0 if the AP queue is reserved; otherwise, returns -EADDRNOTAVAIL.
> - */
> -static int vfio_ap_verify_queue_reserved(unsigned long *apid,
> - unsigned long *apqi)
> -{
> - int ret;
> - struct vfio_ap_queue_reserved qres;
> -
> - qres.apid = apid;
> - qres.apqi = apqi;
> - qres.reserved = false;
> -
> - ret = driver_for_each_device(&matrix_dev->vfio_ap_drv->driver, NULL,
> - &qres, vfio_ap_has_queue);
> - if (ret)
> - return ret;
> -
> - if (qres.reserved)
> - return 0;
> -
> - return -EADDRNOTAVAIL;
> -}
> -
> -static int
> -vfio_ap_mdev_verify_queues_reserved_for_apid(struct ap_matrix_mdev *matrix_mdev,
> - unsigned long apid)
> -{
> - int ret;
> - unsigned long apqi;
> - unsigned long nbits = matrix_mdev->matrix.aqm_max + 1;
> -
> - if (find_first_bit_inv(matrix_mdev->matrix.aqm, nbits) >= nbits)
> - return vfio_ap_verify_queue_reserved(&apid, NULL);
> -
> - for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, nbits) {
> - ret = vfio_ap_verify_queue_reserved(&apid, &apqi);
> - if (ret)
> - return ret;
> - }
> -
> - return 0;
> -}
> -
> /**
> * vfio_ap_mdev_verify_no_sharing
> *
> @@ -529,18 +413,26 @@ vfio_ap_mdev_verify_queues_reserved_for_apid(struct ap_matrix_mdev *matrix_mdev,
> * and AP queue indexes comprising the AP matrix are not configured for another
> * mediated device. AP queue sharing is not allowed.
> *
> - * @matrix_mdev: the mediated matrix device
> + * @mdev_apm: the mask identifying the adapters assigned to mdev
> + * @mdev_apm: the mask identifying the adapters assigned to mdev
I assume you wanted to write @mdev_aqm ... queues ... at the 2nd line.
> *
> * Returns 0 if the APQNs are not shared, otherwise; returns -EADDRINUSE.
> */
> -static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
> +static int vfio_ap_mdev_verify_no_sharing(unsigned long *mdev_apm,
> + unsigned long *mdev_aqm)
> {
> struct ap_matrix_mdev *lstdev;
> DECLARE_BITMAP(apm, AP_DEVICES);
> DECLARE_BITMAP(aqm, AP_DOMAINS);
>
> list_for_each_entry(lstdev, &matrix_dev->mdev_list, node) {
> - if (matrix_mdev == lstdev)
> + /*
> + * If either of the input masks belongs to the mdev to which an
> + * AP resource is being assigned, then we don't need to verify
> + * that mdev's masks.
> + */
> + if ((mdev_apm == lstdev->matrix.apm) ||
> + (mdev_aqm == lstdev->matrix.aqm))
> continue;
>
> memset(apm, 0, sizeof(apm));
> @@ -550,12 +442,10 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
> * We work on full longs, as we can only exclude the leftover
> * bits in non-inverse order. The leftover is all zeros.
> */
> - if (!bitmap_and(apm, matrix_mdev->matrix.apm,
> - lstdev->matrix.apm, AP_DEVICES))
> + if (!bitmap_and(apm, mdev_apm, lstdev->matrix.apm, AP_DEVICES))
> continue;
>
> - if (!bitmap_and(aqm, matrix_mdev->matrix.aqm,
> - lstdev->matrix.aqm, AP_DOMAINS))
> + if (!bitmap_and(aqm, mdev_aqm, lstdev->matrix.aqm, AP_DOMAINS))
> continue;
>
> return -EADDRINUSE;
> @@ -564,6 +454,17 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
> return 0;
> }
>
> +static int vfio_ap_mdev_validate_masks(unsigned long *apm, unsigned long *aqm)
> +{
> + int ret;
> +
> + ret = ap_apqn_in_matrix_owned_by_def_drv(apm, aqm);
> + if (ret)
> + return (ret == 1) ? -EADDRNOTAVAIL : ret;
> +
> + return vfio_ap_mdev_verify_no_sharing(apm, aqm);
> +}
> +
> /**
> * assign_adapter_store
> *
> @@ -602,6 +503,7 @@ static ssize_t assign_adapter_store(struct device *dev,
> {
> int ret;
> unsigned long apid;
> + DECLARE_BITMAP(apm, AP_DEVICES);
> struct mdev_device *mdev = mdev_from_dev(dev);
> struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>
> @@ -616,32 +518,19 @@ static ssize_t assign_adapter_store(struct device *dev,
> if (apid > matrix_mdev->matrix.apm_max)
> return -ENODEV;
>
> - /*
> - * Set the bit in the AP mask (APM) corresponding to the AP adapter
> - * number (APID). The bits in the mask, from most significant to least
> - * significant bit, correspond to APIDs 0-255.
> - */
> - mutex_lock(&matrix_dev->lock);
> -
> - ret = vfio_ap_mdev_verify_queues_reserved_for_apid(matrix_mdev, apid);
> - if (ret)
> - goto done;
> + memset(apm, 0, ARRAY_SIZE(apm) * sizeof(*apm));
What about just memset(apm, 0, sizeof(apm));
> + set_bit_inv(apid, apm);
>
> + mutex_lock(&matrix_dev->lock);
> + ret = vfio_ap_mdev_validate_masks(apm, matrix_mdev->matrix.aqm);
> + if (ret) {
> + mutex_unlock(&matrix_dev->lock);
> + return ret;
> + }
> set_bit_inv(apid, matrix_mdev->matrix.apm);
> -
> - ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev);
> - if (ret)
> - goto share_err;
> -
> - ret = count;
> - goto done;
> -
> -share_err:
> - clear_bit_inv(apid, matrix_mdev->matrix.apm);
> -done:
> mutex_unlock(&matrix_dev->lock);
>
> - return ret;
> + return count;
> }
> static DEVICE_ATTR_WO(assign_adapter);
>
> @@ -690,26 +579,6 @@ static ssize_t unassign_adapter_store(struct device *dev,
> }
> static DEVICE_ATTR_WO(unassign_adapter);
>
> -static int
> -vfio_ap_mdev_verify_queues_reserved_for_apqi(struct ap_matrix_mdev *matrix_mdev,
> - unsigned long apqi)
> -{
> - int ret;
> - unsigned long apid;
> - unsigned long nbits = matrix_mdev->matrix.apm_max + 1;
> -
> - if (find_first_bit_inv(matrix_mdev->matrix.apm, nbits) >= nbits)
> - return vfio_ap_verify_queue_reserved(NULL, &apqi);
> -
> - for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, nbits) {
> - ret = vfio_ap_verify_queue_reserved(&apid, &apqi);
> - if (ret)
> - return ret;
> - }
> -
> - return 0;
> -}
> -
> /**
> * assign_domain_store
> *
> @@ -748,6 +617,7 @@ static ssize_t assign_domain_store(struct device *dev,
> {
> int ret;
> unsigned long apqi;
> + DECLARE_BITMAP(aqm, AP_DEVICES);
AP_DEVICES -> AP_DOMAINS
> struct mdev_device *mdev = mdev_from_dev(dev);
> struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
> unsigned long max_apqi = matrix_mdev->matrix.aqm_max;
> @@ -762,27 +632,19 @@ static ssize_t assign_domain_store(struct device *dev,
> if (apqi > max_apqi)
> return -ENODEV;
>
> - mutex_lock(&matrix_dev->lock);
> -
> - ret = vfio_ap_mdev_verify_queues_reserved_for_apqi(matrix_mdev, apqi);
> - if (ret)
> - goto done;
> + memset(aqm, 0, ARRAY_SIZE(aqm) * sizeof(*aqm));
memset(aqm, 0, sizeof(aqm));
> + set_bit_inv(apqi, aqm);
>
> + mutex_lock(&matrix_dev->lock);
> + ret = vfio_ap_mdev_validate_masks(matrix_mdev->matrix.apm, aqm);
> + if (ret) {
> + mutex_unlock(&matrix_dev->lock);
> + return ret;
> + }
> set_bit_inv(apqi, matrix_mdev->matrix.aqm);
> -
> - ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev);
> - if (ret)
> - goto share_err;
> -
> - ret = count;
> - goto done;
> -
> -share_err:
> - clear_bit_inv(apqi, matrix_mdev->matrix.aqm);
> -done:
> mutex_unlock(&matrix_dev->lock);
>
> - return ret;
> + return count;
> }
> static DEVICE_ATTR_WO(assign_domain);
>
> @@ -868,11 +730,6 @@ static ssize_t assign_control_domain_store(struct device *dev,
> if (id > matrix_mdev->matrix.adm_max)
> return -ENODEV;
>
> - /* Set the bit in the ADM (bitmask) corresponding to the AP control
> - * domain number (id). The bits in the mask, from most significant to
> - * least significant, correspond to IDs 0 up to the one less than the
> - * number of control domains that can be assigned.
> - */
> mutex_lock(&matrix_dev->lock);
> set_bit_inv(id, matrix_mdev->matrix.adm);
> mutex_unlock(&matrix_dev->lock);

2019-06-17 11:44:29

by Harald Freudenberger

[permalink] [raw]
Subject: Re: [PATCH v4 7/7] s390: vfio-ap: update documentation

On 13.06.19 21:39, Tony Krowiak wrote:
> This patch updates the vfio-ap documentation to include the information
> below.
>
> Changes made to the mdev matrix assignment interfaces:
>
> * Allow assignment of APQNs that are not bound to the vfio-ap device
> driver as long as they are not owned by a zcrypt driver as identified
> in the AP bus sysfs apmask and aqmask interfaces.
>
> * Allow assignment of an AP resource to a mediated device which is in use
> by a guest to hot plug an adapter, domain and control domain into a
> running guest.
>
> * Allow unassignment of an AP resource from a mediated device which is in
> use by a guest to hot unplug an adapter, domain and control domain from
> a running guest.
>
> This patch also:
>
> * Clarifies the section on configuring the AP bus's apmask and aqmask.
>
> * Adds sections on dynamic configuration using the SE or SCLP command.
>
> Signed-off-by: Tony Krowiak <[email protected]>
> ---
> Documentation/s390/vfio-ap.txt | 292 ++++++++++++++++++++++++++++++-----------
> 1 file changed, 213 insertions(+), 79 deletions(-)
>
> diff --git a/Documentation/s390/vfio-ap.txt b/Documentation/s390/vfio-ap.txt
> index 65167cfe4485..9372a6570ce1 100644
> --- a/Documentation/s390/vfio-ap.txt
> +++ b/Documentation/s390/vfio-ap.txt
> @@ -81,10 +81,19 @@ definitions:
> which the AP command is to be sent for processing.
>
> The AP bus will create a sysfs device for each APQN that can be derived from
> - the cross product of the AP adapter and usage domain numbers detected when the
> - AP bus module is loaded. For example, if adapters 4 and 10 (0x0a) and usage
> - domains 6 and 71 (0x47) are assigned to the LPAR, the AP bus will create the
> - following sysfs entries:
> + the Cartesian product of the AP adapter and usage domain numbers detected when
> + the AP bus module is loaded. For example, if adapters 4 and 10 (0x0a) and
> + usage domains 6 and 71 (0x47) are assigned to the LPAR, the Cartesian product
> + would be defined by the following table:
> +
> + 06 71
> + +-----------+-----------+
> + 04 | (04,06) | (04,47) |
> + +-----------|-----------+
> + 10 | (0a,06) | (0a,47) |
> + +-----------|-----------+
> +
These ascii pictures are nice and show very lovely how this works ...
but you use the domain id as column (x) and the adapter id as row (y).
Then the APQNs are shown in (y,x) writing then following the scheme
(card,domain).
> + The AP bus will create the following sysfs entries:
>
> /sys/devices/ap/card04/04.0006
> /sys/devices/ap/card04/04.0047
> @@ -146,10 +155,20 @@ If you recall from the description of an AP Queue, AP instructions include
> an APQN to identify the AP queue to which an AP command-request message is to be
> sent (NQAP and PQAP instructions), or from which a command-reply message is to
> be received (DQAP instruction). The validity of an APQN is defined by the matrix
> -calculated from the APM and AQM; it is the cross product of all assigned adapter
> -numbers (APM) with all assigned queue indexes (AQM). For example, if adapters 1
> -and 2 and usage domains 5 and 6 are assigned to a guest, the APQNs (1,5), (1,6),
> -(2,5) and (2,6) will be valid for the guest.
> +calculated from the APM and AQM; it is the Cartesian product of all assigned
> +adapter numbers (APM) with all assigned queue indexes (AQM). For example, if
> +adapters 1 and 2 and usage domains 5 and 6 are assigned to a guest:
> +
> +
> + 05 06
> + +-----------+-----------+
> + 01 | (01,05) | (01,06) |
> + +-----------|-----------+
> + 02 | (02,05) | (02,06) |
> + +-----------|-----------+
> +
> +
> +APQNs (01,05), (01,06), (02,05) and (02,06) will be valid for the guest.
>
> The APQNs can provide secure key functionality - i.e., a private key is stored
> on the adapter card for each of its domains - so each APQN must be assigned to
> @@ -349,8 +368,9 @@ matrix device.
> number of the the usage domain is echoed to the respective attribute
> file.
> * matrix:
> - A read-only file for displaying the APQNs derived from the cross product
> - of the adapter and domain numbers assigned to the mediated matrix device.
> + A read-only file for displaying the APQNs derived from the Caresian
> + product of the adapter and domain numbers assigned to the mediated matrix
> + device.
> * assign_control_domain:
> * unassign_control_domain:
> Write-only attributes for assigning/unassigning an AP control domain
> @@ -438,9 +458,10 @@ guest use.
> Example:
> =======
> Let's now provide an example to illustrate how KVM guests may be given
> -access to AP facilities. For this example, we will show how to configure
> -three guests such that executing the lszcrypt command on the guests would
> -look like this:
> +access to AP facilities. For this example, we will assume that adapters 4, 5
> +and 6 and domains 4, 71 (0x47), 171 (0xab) and 255 (0xff) are assigned to the
> +LPAR and are online. We will show how to configure three guests such that
> +executing the lszcrypt command on the guests would look like this:
>
> Guest1
> ------
> @@ -466,7 +487,7 @@ Guest2
> CARD.DOMAIN TYPE MODE
> ------------------------------
> 06 CEX5A Accelerator
> -06.0047 CEX5A Accelerator
> +06.0047 CEX5A AcceleratorNote that this directory may contain additional bindings depending upon what
This line looks somehow weird.
> 06.00ff CEX5A Accelerator
>
> These are the steps:
> @@ -513,35 +534,44 @@ These are the steps:
> /sys/bus/ap/aqmask
>
> The 'apmask' is a 256-bit mask that identifies a set of AP adapter IDs
> - (APID). Each bit in the mask, from left to right (i.e., from most significant
> - to least significant bit in big endian order), corresponds to an APID from
> - 0-255. If a bit is set, the APID is marked as usable only by the default AP
> - queue device drivers; otherwise, the APID is usable by the vfio_ap
> - device driver.
> + (APID). Each bit in the mask, from left to right, corresponds to an APID from
> + 0-255.
>
> The 'aqmask' is a 256-bit mask that identifies a set of AP queue indexes
> - (APQI). Each bit in the mask, from left to right (i.e., from most significant
> - to least significant bit in big endian order), corresponds to an APQI from
> - 0-255. If a bit is set, the APQI is marked as usable only by the default AP
> - queue device drivers; otherwise, the APQI is usable by the vfio_ap device
> - driver.
> + (APQI). Each bit in the mask, from left to right, corresponds to an APQI from
> + 0-255.
> +
> + The Cartesian product of the APIDs set in the apmask and the APQIs set in
> + the aqmask identify the APQNs of AP queue devices owned by the zcrypt
> + device drivers.
> +
> + Take, for example, the following masks:
>
> - Take, for example, the following mask:
> + apmask: 0x7000000000000000000000000000000000000000000000000000000000000000
>
> - 0x7dffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
> + aqmask: 0x0180000000000000000000000000000000000000000000000000000000000000
>
> - It indicates:
> + The bits set in apmask are bits 1, 2 and 3. The bits set in aqmask are bits
> + 7 and 8. The Cartesian product of the bits set in the two masks is:
>
> - 1, 2, 3, 4, 5, and 7-255 belong to the default drivers' pool, and 0 and 6
> - belong to the vfio_ap device driver's pool.
> + 07 08
> + +-----------+-----------+
> + 01 | (01,07) | (01,08) |
> + +-----------|-----------+
> + 02 | (02,07) | (02,08) |
> + +-----------|-----------+
> + 03 | (03,07) | (03,08) |
> + +-----------|-----------+
>
> - The APQN of each AP queue device assigned to the linux host is checked by the
> - AP bus against the set of APQNs derived from the cross product of APIDs
> - and APQIs marked as usable only by the default AP queue device drivers. If a
> - match is detected, only the default AP queue device drivers will be probed;
> - otherwise, the vfio_ap device driver will be probed.
> + The masks indicate that the queues with APQNs (01,07), (01,08), (02,07),
> + (02,08), (03,07) and (03,08) are owned by the zcrypt drivers. When the AP bus
> + detects an AP queue device, its APQN is checked against the set of APQNs
> + derived from the apmask and aqmask. If a match is detected, the zcrypt
> + device driver registered for the device type of the queue will be probed. If
> + a match is not detected and the device type of the queue is CEX4 or newer,
> + the vfio_ap device driver will be probed.
>
> - By default, the two masks are set to reserve all APQNs for use by the default
> + By default, the two masks are set to reserve all APQNs for use by the zcrypt
> AP queue device drivers. There are two ways the default masks can be changed:
>
> 1. The sysfs mask files can be edited by echoing a string into the
> @@ -554,8 +584,7 @@ These are the steps:
>
> 0x4100000000000000000000000000000000000000000000000000000000000000
>
> - Keep in mind that the mask reads from left to right (i.e., most
> - significant to least significant bit in big endian order), so the mask
> + Keep in mind that the mask reads from left to right, so the mask
> above identifies device numbers 1 and 7 (01000001).
>
> If the string is longer than the mask, the operation is terminated with
> @@ -563,7 +592,7 @@ These are the steps:
>
> * Individual bits in the mask can be switched on and off by specifying
> each bit number to be switched in a comma separated list. Each bit
> - number string must be prepended with a ('+') or minus ('-') to indicate
> + number string must be prefixed with a ('+') or minus ('-') to indicate
> the corresponding bit is to be switched on ('+') or off ('-'). Some
> valid values are:
>
> @@ -594,19 +623,26 @@ These are the steps:
> aqmask:
> 0x4000000000000000000000000000000000000000000000000000000000000000
>
> - Resulting in these two pools:
> -
> - default drivers pool: adapter 0-15, domain 1
> - alternate drivers pool: adapter 16-255, domains 0, 2-255
> +Recall that the Cartesian product of the APIDs set in the apmask and the APQIs
> +set in the aqmask identify the APQNs of AP queue devices owned by the zcrypt
> +device drivers. If an attempt is made to modify the apmask or aqmask such that
> +one or more APQNs changes ownership from a the vfio_ap device driver to a zcrypt
> +device driver and the APQN is assigned to a mediated device (see step 3 below),
> +the operation will fail with an error ('Address already in use').
>
> Securing the APQNs for our example:
> ----------------------------------
> - To secure the AP queues 05.0004, 05.0047, 05.00ab, 05.00ff, 06.0004, 06.0047,
> - 06.00ab, and 06.00ff for use by the vfio_ap device driver, the corresponding
> - APQNs can either be removed from the default masks:
> + There is no way to secure the specific AP queues 05.0004, 05.0047, 05.00ab,
> + 05.00ff, 06.0004, 06.0047, 06.00ab, and 06.00ff for use by the vfio_ap device
> + driver, so we are left with either securing all queues on adapters 05 and
> + 06, or queues 0004, 0047, 00ab and 00ff can be secured on all adapters.
> +
> + To secure all queues on adapters 05 and 05:
>
> echo -5,-6 > /sys/bus/ap/apmask
>
> + To secure queues 0004, 0047, 00ab, and 00ff on all adapters:
> +
> echo -4,-0x47,-0xab,-0xff > /sys/bus/ap/aqmask
>
> Or the masks can be set as follows:
> @@ -617,14 +653,20 @@ These are the steps:
> echo 0xf7fffffffffffffffeffffffffffffffffffffffffeffffffffffffffffffffe \
> > aqmask
>
> - This will result in AP queues 05.0004, 05.0047, 05.00ab, 05.00ff, 06.0004,
> - 06.0047, 06.00ab, and 06.00ff getting bound to the vfio_ap device driver. The
> - sysfs directory for the vfio_ap device driver will now contain symbolic links
> - to the AP queue devices bound to it:
> + For this example, we will choose to secure queues 0004, 0047, 00ab, and 00ff
> + on all adapters. This will result in AP queues 04.0004, 04.0047, 04.00ab,
> + 04.00ff, 05.0004, 05.0047, 05.00ab, 05.00ff, 06.0004, 06.0047, 06.00ab and
> + 06.00ff getting bound to the vfio_ap device driver. The sysfs directory for
> + the vfio_ap device driver will now contain symbolic links to the AP queue
> + devices bound to it:
>
> /sys/bus/ap
> ... [drivers]
> ...... [vfio_ap]
> + ......... [04.0004]
> + ......... [04.0047]
> + ......... [04.00ab]
> + ......... [04.00ff]
> ......... [05.0004]
> ......... [05.0047]
> ......... [05.00ab]
> @@ -641,11 +683,12 @@ These are the steps:
> future and for which there are few older systems on which to test.
>
> The administrator, therefore, must take care to secure only AP queues that
> - can be bound to the vfio_ap device driver. The device type for a given AP
> - queue device can be read from the parent card's sysfs directory. For example,
> - to see the hardware type of the queue 05.0004:
> + can be bound to the vfio_ap device driver, or those queues will not get bound
> + to any driver. The device type for a given AP queue device can be read from
> + the parent card's sysfs directory. For example, to see the hardware type of
> + the queue 05.0004:
>
> - cat /sys/bus/ap/devices/card05/hwtype
> + cat /sys/bus/ap/devices/card05/hwtype
>
> The hwtype must be 10 or higher (CEX4 or newer) in order to be bound to the
> vfio_ap device driver.
> @@ -747,37 +790,48 @@ These are the steps:
> higher than the maximum is specified, the operation will terminate with
> an error (ENODEV).
>
> - * All APQNs that can be derived from the adapter ID and the IDs of
> - the previously assigned domains must be bound to the vfio_ap device
> - driver. If no domains have yet been assigned, then there must be at least
> - one APQN with the specified APID bound to the vfio_ap driver. If no such
> - APQNs are bound to the driver, the operation will terminate with an
> - error (EADDRNOTAVAIL).
> + * Each APQN that can be derived from the adapter ID and the IDs of
> + the previously assigned domains must not be reserved for use by the
> + zcrypt device drivers as specified by the /sys/bus/ap/apmask and
> + /sys/bus/ap/aqmask syfs interfaces. If any APQN is reserved, the operation
> + will terminate with an error (EADDRNOTAVAIL).
>
> - No APQN that can be derived from the adapter ID and the IDs of the
> + * No APQN that can be derived from the adapter ID and the IDs of the
> previously assigned domains can be assigned to another mediated matrix
> device. If an APQN is assigned to another mediated matrix device, the
> operation will terminate with an error (EADDRINUSE).
>
> + Note that adapters that are currently not available to the host may be
> + assigned to the mediated device. If the adapter subsequently becomes
> + available while a guest is using the mediated device, it will automatically
> + become available to the guest.
> +
> In order to successfully assign a domain:
>
> * The domain number specified must represent a value from 0 up to the
> maximum domain number configured for the system. If a domain number
> higher than the maximum is specified, the operation will terminate with
> - an error (ENODEV).
> + an error (ENODEV). The maximum domain number can be determined by
> + printing the sysfs /sys/bus/ap/ap_max_domain_id attribute:
>
> - * All APQNs that can be derived from the domain ID and the IDs of
> - the previously assigned adapters must be bound to the vfio_ap device
> - driver. If no domains have yet been assigned, then there must be at least
> - one APQN with the specified APQI bound to the vfio_ap driver. If no such
> - APQNs are bound to the driver, the operation will terminate with an
> - error (EADDRNOTAVAIL).
> + cat /sys/bus/ap/ap_max_domain_id
>
> - No APQN that can be derived from the domain ID and the IDs of the
> + * Each APQN that can be derived from the domain ID and the IDs of
> + the previously assigned adapters must not be reserved for use by the
> + zcrypt device drivers as specified by the /sys/bus/ap/apmask and
> + /sys/bus/ap/aqmask syfs interfaces. If any APQN is reserved, the operation
> + will terminate with an error (EADDRNOTAVAIL).
> +
> + * No APQN that can be derived from the domain ID and the IDs of the
> previously assigned adapters can be assigned to another mediated matrix
> device. If an APQN is assigned to another mediated matrix device, the
> operation will terminate with an error (EADDRINUSE).
>
> + Note that domains that are currently not available to the host may be
> + assigned to the mediated device. If the domain subsequently becomes
> + available while a guest is using the mediated device, it will automatically
> + become available to the guest.
> +
> In order to successfully assign a control domain, the domain number
> specified must represent a value from 0 up to the maximum domain number
> configured for the system. If a control domain number higher than the maximum
> @@ -822,16 +876,96 @@ Using our example again, to remove the mediated matrix device $uuid1:
> host. If the mdev matrix device is removed, one may want to also reconfigure
> the pool of adapters and queues reserved for use by the default drivers.
>
> -Limitations
> -===========
> -* The KVM/kernel interfaces do not provide a way to prevent restoring an APQN
> - to the default drivers pool of a queue that is still assigned to a mediated
> - device in use by a guest. It is incumbent upon the administrator to
> - ensure there is no mediated device in use by a guest to which the APQN is
> - assigned lest the host be given access to the private data of the AP queue
> - device such as a private key configured specifically for the guest.
> +Hot plug/unplug via mdev matrix device sysfs interfaces:
> +=======================================================
> +If an mdev matrix device is in use by a running guest, AP resources can be
> +plugged into or unplugged from the guest via the mdev device's sysfs
> +assignment interfaces. Below are some examples.
> +
> + To plug adapter 10 into a running guest:
> +
> + echo 0xa > assign_adapter
> +
> + To unplug domain 5 from a running guest:
> +
> + echo 5 > unassign_domain
> +
> +To display the matrix of a guest using the mdev matrix device:
> +
> + cat guest_matrix
> +
> +If you attempt to display the guest matrix when a guest is not using the
> +mdev matrix device, an error will be displayed (ENODEV). Note that adapters and
> +domains that are not yet available or not yet assigned to the LPAR can be
> +assigned and will become available to the guest as soon as they become available
> +to the host.
> +
> +Dynamic Changes to AP Configuration using the Support Element (SE):
> +==================================================================
> +The SE can be used to dynamically make the following changes to the AP
> +configuration for an LPAR in which a linux host is running:
> +
> + * Configure one or more adapters on
> +
> + Configuring an adapter on sets its state to online, thus making it
> + available to the LPAR to which it is assigned. When an adapter is
> + configured on, it immediately becomes available to the LPAR as well as to
> + any guests using a mediated device to which the adapter is assigned.
> +
> + * Configure one or more adapters off
> +
> + Configuring an adapter off sets its state to standby, thus making it
> + unavailable to the LPAR to which it is assigned. When an adapter is
> + configured off, it immediately becomes unavailable to the LPAR as well as
> + to any guests using a mediated device to which the adapter is assigned.
> +
> + * Add adapters or domains to the LPAR configuration
> +
> + Adapters and/or domains can be assigned to an LPAR using the Change LPAR
> + Cryptographic Controls task. To make dynamic changes to the AP
> + configuration for an LPAR Running a linux guest, the online adapters
> + assigned to the LPAR must first be configured off. After performing the
> + adapter and/or domain assignments, the AP resources will automatically
> + become available to the linux host running in the LPAR as well as any
> + guests using a mediated device to which the adpaters and/or domains are
> + assigned.
> +
> + * Remove adapters or domains from the LPAR configuration
> +
> + Adapters and/or domains can be unassigned from an LPAR using the Change
> + LPAR Cryptographic Controls task. To make dynamic changes to the AP
> + configuration for an LPAR Running a linux guest, the online adapters
> + assigned to the LPAR must first be configured off. After performing the
> + adapter and/or domain unassignments, the AP resources will automatically
> + become unavailable to the linux host running in the LPAR as well as any
> + guests using a mediated device to which the adpaters and/or domains are
> + assigned.
> +
> +Dynamic Changes to the AP Configuration using the SCLP command:
> +==============================================================
> +The following SCLP commands may be used to dynamically configure AP adapters on
> +and off:
> +
> +* Configure Adjunct Processor
> +
> + The 'Configure Adjunct Processor' command sets an AP adapter's state to
> + online, thus making it available to the LPARs to which it is assigned. It will
> + likewise become available to any linux guest using a mediated device to which
> + the adapter is assigned.
> +
> +* Deconfigure Adjunct Processor
> +
> + The 'Deconfigure Adjunct Processor' command sets an AP adapter's state to
> + standby, thus making it unavailable to the LPARs to which it is assigned. It
> + will likewise become unavailable to any linux guest using a mediated device to
> + which the adapter is assigned.
>
> -* Dynamically modifying the AP matrix for a running guest (which would amount to
> - hot(un)plug of AP devices for the guest) is currently not supported
> +Live migration:
> +==============
> +Live guest migration is not supported for guests using AP devices. All AP
> +devices in use by the guest must be unplugged prior to initiating live
> +migration (see "Hot plug/unplug via mdev matrix device sysfs interfaces" section
> +above). If you are using QEMU to run your guest and it supports hot plug/unplug
> +of the vfio-ap device, this would be another option (consult the QEMU
> +documentation for details).
>
> -* Live guest migration is not supported for guests using AP devices.

2019-06-17 14:30:28

by Anthony Krowiak

[permalink] [raw]
Subject: Re: [PATCH v4 2/7] s390: vfio-ap: wait for queue empty on queue reset

On 6/17/19 4:47 AM, Harald Freudenberger wrote:
> On 13.06.19 21:39, Tony Krowiak wrote:
>> Refactors the AP queue reset function to wait until the queue is empty
>> after the PQAP(ZAPQ) instruction is executed to zero out the queue as
>> required by the AP architecture.
>>
>> Signed-off-by: Tony Krowiak <[email protected]>
>> ---
>> drivers/s390/crypto/vfio_ap_ops.c | 49 +++++++++++++++++++++++++++------------
>> 1 file changed, 34 insertions(+), 15 deletions(-)
>>
>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>> index bf2ab02b9a0b..60efd3d7896d 100644
>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -1128,23 +1128,46 @@ static void vfio_ap_irq_disable_apqn(int apqn)
>> }
>> }
>>
>> -int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
>> - unsigned int retry)
>> +static void vfio_ap_mdev_wait_for_qempty(ap_qid_t qid)
>> +{
>> + struct ap_queue_status status;
>> + int retry = 5;
>> +
>> + do {
>> + status = ap_tapq(qid, NULL);
>> + switch (status.response_code) {
>> + case AP_RESPONSE_NORMAL:
>> + if (status.queue_empty)
>> + return;
>> + case AP_RESPONSE_RESET_IN_PROGRESS:
>> + case AP_RESPONSE_BUSY:
>> + msleep(20);
>> + break;
>> + default:
>> + pr_warn("%s: tapq response %02x waiting for queue %04x.%02x empty\n",
>> + __func__, status.response_code,
>> + AP_QID_CARD(qid), AP_QID_QUEUE(qid));
> The ap and zcrypt code uses %02x.%04x for displaying an APQN.

Oops, of course it does. My synapses must have been reversed.

> I would also recommend to handle 0x01 (AP_RESPONSE_Q_NOT_AVAIL) and
> 0x03 (AP_RESPONSE_DECONFIGURED) as this code may run when a APQN
> is removed from the configuration of the LPAR. However, it's up to you if you
> want to handle these with the default statement and issue an sysfs warning.

How would you handle them differently? If the queue is not in the
configuration, then there is nothing to wait for.

>> + return;
>> + }
>> + } while (--retry);
>> +
>> + WARN_ON_ONCE(retry <= 0);
>> +}
>> +
>> +int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi)
>> {
>> struct ap_queue_status status;
>> - int retry2 = 2;
>> int apqn = AP_MKQID(apid, apqi);
>> + int retry = 5;
>>
>> do {
>> status = ap_zapq(apqn);
>> switch (status.response_code) {
>> case AP_RESPONSE_NORMAL:
>> - while (!status.queue_empty && retry2--) {
>> - msleep(20);
>> - status = ap_tapq(apqn, NULL);
>> - }
>> - WARN_ON_ONCE(retry <= 0);
>> + vfio_ap_mdev_wait_for_qempty(AP_MKQID(apid, apqi));
>> return 0;
>> + case AP_RESPONSE_DECONFIGURED:
>> + return -ENODEV;
>> case AP_RESPONSE_RESET_IN_PROGRESS:
>> case AP_RESPONSE_BUSY:
>> msleep(20);
>> @@ -1169,14 +1192,10 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
>> matrix_mdev->matrix.apm_max + 1) {
>> for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm,
>> matrix_mdev->matrix.aqm_max + 1) {
>> - ret = vfio_ap_mdev_reset_queue(apid, apqi, 1);
>> - /*
>> - * Regardless whether a queue turns out to be busy, or
>> - * is not operational, we need to continue resetting
>> - * the remaining queues.
>> - */
>> + ret = vfio_ap_mdev_reset_queue(apid, apqi);
>> if (ret)
>> rc = ret;
>> +
>> vfio_ap_irq_disable_apqn(AP_MKQID(apid, apqi));
>> }
>> }
>> @@ -1326,7 +1345,7 @@ void vfio_ap_mdev_remove_queue(struct ap_queue *queue)
>> dev_set_drvdata(&queue->ap_dev.device, NULL);
>> apid = AP_QID_CARD(q->apqn);
>> apqi = AP_QID_QUEUE(q->apqn);
>> - vfio_ap_mdev_reset_queue(apid, apqi, 1);
>> + vfio_ap_mdev_reset_queue(apid, apqi);
>> vfio_ap_irq_disable(q);
>> kfree(q);
>> }
>

2019-06-17 14:38:06

by Anthony Krowiak

[permalink] [raw]
Subject: Re: [PATCH v4 3/7] s390: zcrypt: driver callback to indicate resource in use

On 6/17/19 5:28 AM, Harald Freudenberger wrote:
> On 13.06.19 21:39, Tony Krowiak wrote:
>> Introduces a new driver callback to prevent a root user from unbinding
>> an AP queue from its device driver if the queue is in use. This prevents
>> a root user from inadvertently taking a queue away from a guest and
>> giving it to the host, or vice versa. The callback will be invoked
>> whenever a change to the AP bus's apmask or aqmask sysfs interfaces may
>> result in one or more AP queues being removed from its driver. If the
>> callback responds in the affirmative for any driver queried, the change
>> to the apmask or aqmask will be rejected with a device in use error.
>>
>> For this patch, only non-default drivers will be queried. Currently,
>> there is only one non-default driver, the vfio_ap device driver. The
>> vfio_ap device driver manages AP queues passed through to one or more
>> guests and we don't want to unexpectedly take AP resources away from
>> guests which are most likely independently administered.
>>
>> Signed-off-by: Tony Krowiak <[email protected]>
>> ---
>> drivers/s390/crypto/ap_bus.c | 138 +++++++++++++++++++++++++++++++++++++++++--
>> drivers/s390/crypto/ap_bus.h | 3 +
>> 2 files changed, 135 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/s390/crypto/ap_bus.c b/drivers/s390/crypto/ap_bus.c
>> index b9fc502c58c2..1b06f47d0d1c 100644
>> --- a/drivers/s390/crypto/ap_bus.c
>> +++ b/drivers/s390/crypto/ap_bus.c
>> @@ -35,6 +35,7 @@
>> #include <linux/mod_devicetable.h>
>> #include <linux/debugfs.h>
>> #include <linux/ctype.h>
>> +#include <linux/module.h>
>>
>> #include "ap_bus.h"
>> #include "ap_debug.h"
>> @@ -998,9 +999,11 @@ int ap_parse_mask_str(const char *str,
>> newmap = kmalloc(size, GFP_KERNEL);
>> if (!newmap)
>> return -ENOMEM;
>> - if (mutex_lock_interruptible(lock)) {
>> - kfree(newmap);
>> - return -ERESTARTSYS;
>> + if (lock) {
>> + if (mutex_lock_interruptible(lock)) {
>> + kfree(newmap);
>> + return -ERESTARTSYS;
>> + }
>> }
>>
>> if (*str == '+' || *str == '-') {
>> @@ -1012,7 +1015,10 @@ int ap_parse_mask_str(const char *str,
>> }
>> if (rc == 0)
>> memcpy(bitmap, newmap, size);
>> - mutex_unlock(lock);
>> +
>> + if (lock)
>> + mutex_unlock(lock);
>> +
>> kfree(newmap);
>> return rc;
>> }
>> @@ -1199,12 +1205,72 @@ static ssize_t apmask_show(struct bus_type *bus, char *buf)
>> return rc;
>> }
>>
>> +int __verify_card_reservations(struct device_driver *drv, void *data)
>> +{
>> + int rc = 0;
>> + struct ap_driver *ap_drv = to_ap_drv(drv);
>> + unsigned long *newapm = (unsigned long *)data;
>> +
>> + /*
>> + * If the reserved bits do not identify cards reserved for use by the
>> + * non-default driver, there is no need to verify the driver is using
>> + * the queues.
>> + */
>> + if (ap_drv->flags & AP_DRIVER_FLAG_DEFAULT)
>> + return 0;
>> +
>> + /* The non-default driver's module must be loaded */
>> + if (!try_module_get(drv->owner))
>> + return 0;
>> +
>> + if (ap_drv->in_use)
>> + if (ap_drv->in_use(newapm, ap_perms.aqm))
>> + rc = -EADDRINUSE;
>> +
>> + module_put(drv->owner);
>> +
>> + return rc;
>> +}
>> +
>> +static int apmask_commit(unsigned long *newapm)
>> +{
>> + int rc;
>> + unsigned long reserved[BITS_TO_LONGS(AP_DEVICES)];
>> +
>> + /*
>> + * Check if any bits in the apmask have been set which will
>> + * result in queues being removed from non-default drivers
>> + */
>> + if (bitmap_andnot(reserved, newapm, ap_perms.apm, AP_DEVICES)) {
>> + rc = (bus_for_each_drv(&ap_bus_type, NULL, reserved,
>> + __verify_card_reservations));
> Why surround the bus_for_each_drv() with additional parenthesis ?

You are right, it is not necessary.

>> + if (rc)
>> + return rc;
>> + }
>> +
>> + memcpy(ap_perms.apm, newapm, APMASKSIZE);
>> +
>> + return 0;
>> +}
>> +
>> static ssize_t apmask_store(struct bus_type *bus, const char *buf,
>> size_t count)
>> {
>> int rc;
>> + unsigned long newapm[BITS_TO_LONGS(AP_DEVICES)];
>> +
>> + memcpy(newapm, ap_perms.apm, APMASKSIZE);
>> +
>> + rc = ap_parse_mask_str(buf, newapm, AP_DEVICES, NULL);
>> + if (rc)
>> + return rc;
>> +
>> + if (mutex_lock_interruptible(&ap_perms_mutex))
>> + return -ERESTARTSYS;
> This lock is too late. Move it before the memcpy(newapm, ap_perms.apm, APMASKSIZE);

I agree, it shall be moved.

>> +
>> + rc = apmask_commit(newapm);
>> + mutex_unlock(&ap_perms_mutex);
>>
>> - rc = ap_parse_mask_str(buf, ap_perms.apm, AP_DEVICES, &ap_perms_mutex);
>> if (rc)
>> return rc;
>>
>> @@ -1230,12 +1296,72 @@ static ssize_t aqmask_show(struct bus_type *bus, char *buf)
>> return rc;
>> }
>>
>> +int __verify_queue_reservations(struct device_driver *drv, void *data)
>> +{
>> + int rc = 0;
>> + struct ap_driver *ap_drv = to_ap_drv(drv);
>> + unsigned long *newaqm = (unsigned long *)data;
>> +
>> + /*
>> + * If the reserved bits do not identify queues reserved for use by the
>> + * non-default driver, there is no need to verify the driver is using
>> + * the queues.
>> + */
>> + if (ap_drv->flags & AP_DRIVER_FLAG_DEFAULT)
>> + return 0;
>> +
>> + /* The non-default driver's module must be loaded */
>> + if (!try_module_get(drv->owner))
>> + return 0;
>> +
>> + if (ap_drv->in_use)
>> + if (ap_drv->in_use(ap_perms.apm, newaqm))
>> + rc = -EADDRINUSE;
>> +
>> + module_put(drv->owner);
>> +
>> + return rc;
>> +}
>> +
>> +static int aqmask_commit(unsigned long *newaqm)
>> +{
>> + int rc;
>> + unsigned long reserved[BITS_TO_LONGS(AP_DOMAINS)];
>> +
>> + /*
>> + * Check if any bits in the aqmask have been set which will
>> + * result in queues being removed from non-default drivers
>> + */
>> + if (bitmap_andnot(reserved, newaqm, ap_perms.aqm, AP_DOMAINS)) {
>> + rc = (bus_for_each_drv(&ap_bus_type, NULL, reserved,
>> + __verify_queue_reservations));
> Same here: please remove the surrounding ( ... ).

Will do.

>> + if (rc)
>> + return rc;
>> + }
>> +
>> + memcpy(ap_perms.aqm, newaqm, APMASKSIZE);
> APMASKSIZE -> AQMASKSIZE
>> +
>> + return 0;
>> +}
>> +
>> static ssize_t aqmask_store(struct bus_type *bus, const char *buf,
>> size_t count)
>> {
>> int rc;
>> + unsigned long newaqm[BITS_TO_LONGS(AP_DEVICES)];
>> +
>> + memcpy(newaqm, ap_perms.aqm, AQMASKSIZE);
>> +
>> + rc = ap_parse_mask_str(buf, newaqm, AP_DOMAINS, NULL);
>> + if (rc)
>> + return rc;
>> +
>> + if (mutex_lock_interruptible(&ap_perms_mutex))
>> + return -ERESTARTSYS;
> Please move the lock before the memcpy(newaqm, ...)

Will do.

>> +
>> + rc = aqmask_commit(newaqm);
>> + mutex_unlock(&ap_perms_mutex);
>>
>> - rc = ap_parse_mask_str(buf, ap_perms.aqm, AP_DOMAINS, &ap_perms_mutex);
>> if (rc)
>> return rc;
>>
>> diff --git a/drivers/s390/crypto/ap_bus.h b/drivers/s390/crypto/ap_bus.h
>> index 6f3cf37776ca..0f865c5545f2 100644
>> --- a/drivers/s390/crypto/ap_bus.h
>> +++ b/drivers/s390/crypto/ap_bus.h
>> @@ -137,6 +137,7 @@ struct ap_driver {
>> void (*remove)(struct ap_device *);
>> void (*suspend)(struct ap_device *);
>> void (*resume)(struct ap_device *);
>> + bool (*in_use)(unsigned long *apm, unsigned long *aqm);
>> };
>>
>> #define to_ap_drv(x) container_of((x), struct ap_driver, driver)
>> @@ -265,6 +266,8 @@ void ap_queue_reinit_state(struct ap_queue *aq);
>> struct ap_card *ap_card_create(int id, int queue_depth, int raw_device_type,
>> int comp_device_type, unsigned int functions);
>>
>> +#define APMASKSIZE (BITS_TO_LONGS(AP_DEVICES) * sizeof(unsigned long))
>> +#define AQMASKSIZE (BITS_TO_LONGS(AP_DOMAINS) * sizeof(unsigned long))
> add one empty line here please
>> struct ap_perms {
>> unsigned long ioctlm[BITS_TO_LONGS(AP_IOCTLS)];
>> unsigned long apm[BITS_TO_LONGS(AP_DEVICES)];
>

2019-06-17 15:09:52

by Anthony Krowiak

[permalink] [raw]
Subject: Re: [PATCH v4 5/7] s390: vfio-ap: allow assignment of unavailable AP resources to mdev device

On 6/17/19 6:05 AM, Harald Freudenberger wrote:
> On 13.06.19 21:39, Tony Krowiak wrote:
>> The AP architecture does not preclude assignment of AP resources that are
>> not available. Let's go ahead and implement this facet of the AP
>> architecture for linux guests.
>>
>> The current implementation does not allow assignment of AP adapters or
>> domains to an mdev device if the APQNs resulting from the assignment
>> reference AP queue devices that are not bound to the vfio_ap device
>> driver. This patch allows assignment of AP resources to the mdev device as
>> long as the APQNs resulting from the assignment are not reserved by the AP
>> BUS for use by the zcrypt device drivers and the APQNs are not assigned to
>> another mdev device.
>>
>> Signed-off-by: Tony Krowiak <[email protected]>
>> ---
>> drivers/s390/crypto/vfio_ap_ops.c | 231 ++++++++------------------------------
>> 1 file changed, 44 insertions(+), 187 deletions(-)
>>
>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>> index 60efd3d7896d..9db86c0db52e 100644
>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -406,122 +406,6 @@ static struct attribute_group *vfio_ap_mdev_type_groups[] = {
>> NULL,
>> };
>>
>> -struct vfio_ap_queue_reserved {
>> - unsigned long *apid;
>> - unsigned long *apqi;
>> - bool reserved;
>> -};
>> -
>> -/**
>> - * vfio_ap_has_queue
>> - *
>> - * @dev: an AP queue device
>> - * @data: a struct vfio_ap_queue_reserved reference
>> - *
>> - * Flags whether the AP queue device (@dev) has a queue ID containing the APQN,
>> - * apid or apqi specified in @data:
>> - *
>> - * - If @data contains both an apid and apqi value, then @data will be flagged
>> - * as reserved if the APID and APQI fields for the AP queue device matches
>> - *
>> - * - If @data contains only an apid value, @data will be flagged as
>> - * reserved if the APID field in the AP queue device matches
>> - *
>> - * - If @data contains only an apqi value, @data will be flagged as
>> - * reserved if the APQI field in the AP queue device matches
>> - *
>> - * Returns 0 to indicate the input to function succeeded. Returns -EINVAL if
>> - * @data does not contain either an apid or apqi.
>> - */
>> -static int vfio_ap_has_queue(struct device *dev, void *data)
>> -{
>> - struct vfio_ap_queue_reserved *qres = data;
>> - struct ap_queue *ap_queue = to_ap_queue(dev);
>> - ap_qid_t qid;
>> - unsigned long id;
>> -
>> - if (qres->apid && qres->apqi) {
>> - qid = AP_MKQID(*qres->apid, *qres->apqi);
>> - if (qid == ap_queue->qid)
>> - qres->reserved = true;
>> - } else if (qres->apid && !qres->apqi) {
>> - id = AP_QID_CARD(ap_queue->qid);
>> - if (id == *qres->apid)
>> - qres->reserved = true;
>> - } else if (!qres->apid && qres->apqi) {
>> - id = AP_QID_QUEUE(ap_queue->qid);
>> - if (id == *qres->apqi)
>> - qres->reserved = true;
>> - } else {
>> - return -EINVAL;
>> - }
>> -
>> - return 0;
>> -}
>> -
>> -/**
>> - * vfio_ap_verify_queue_reserved
>> - *
>> - * @matrix_dev: a mediated matrix device
>> - * @apid: an AP adapter ID
>> - * @apqi: an AP queue index
>> - *
>> - * Verifies that the AP queue with @apid/@apqi is reserved by the VFIO AP device
>> - * driver according to the following rules:
>> - *
>> - * - If both @apid and @apqi are not NULL, then there must be an AP queue
>> - * device bound to the vfio_ap driver with the APQN identified by @apid and
>> - * @apqi
>> - *
>> - * - If only @apid is not NULL, then there must be an AP queue device bound
>> - * to the vfio_ap driver with an APQN containing @apid
>> - *
>> - * - If only @apqi is not NULL, then there must be an AP queue device bound
>> - * to the vfio_ap driver with an APQN containing @apqi
>> - *
>> - * Returns 0 if the AP queue is reserved; otherwise, returns -EADDRNOTAVAIL.
>> - */
>> -static int vfio_ap_verify_queue_reserved(unsigned long *apid,
>> - unsigned long *apqi)
>> -{
>> - int ret;
>> - struct vfio_ap_queue_reserved qres;
>> -
>> - qres.apid = apid;
>> - qres.apqi = apqi;
>> - qres.reserved = false;
>> -
>> - ret = driver_for_each_device(&matrix_dev->vfio_ap_drv->driver, NULL,
>> - &qres, vfio_ap_has_queue);
>> - if (ret)
>> - return ret;
>> -
>> - if (qres.reserved)
>> - return 0;
>> -
>> - return -EADDRNOTAVAIL;
>> -}
>> -
>> -static int
>> -vfio_ap_mdev_verify_queues_reserved_for_apid(struct ap_matrix_mdev *matrix_mdev,
>> - unsigned long apid)
>> -{
>> - int ret;
>> - unsigned long apqi;
>> - unsigned long nbits = matrix_mdev->matrix.aqm_max + 1;
>> -
>> - if (find_first_bit_inv(matrix_mdev->matrix.aqm, nbits) >= nbits)
>> - return vfio_ap_verify_queue_reserved(&apid, NULL);
>> -
>> - for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, nbits) {
>> - ret = vfio_ap_verify_queue_reserved(&apid, &apqi);
>> - if (ret)
>> - return ret;
>> - }
>> -
>> - return 0;
>> -}
>> -
>> /**
>> * vfio_ap_mdev_verify_no_sharing
>> *
>> @@ -529,18 +413,26 @@ vfio_ap_mdev_verify_queues_reserved_for_apid(struct ap_matrix_mdev *matrix_mdev,
>> * and AP queue indexes comprising the AP matrix are not configured for another
>> * mediated device. AP queue sharing is not allowed.
>> *
>> - * @matrix_mdev: the mediated matrix device
>> + * @mdev_apm: the mask identifying the adapters assigned to mdev
>> + * @mdev_apm: the mask identifying the adapters assigned to mdev
> I assume you wanted to write @mdev_aqm ... queues ... at the 2nd line.

You assumed correctly. I also mean to say "domains assigned to mdev".

>> *
>> * Returns 0 if the APQNs are not shared, otherwise; returns -EADDRINUSE.
>> */
>> -static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
>> +static int vfio_ap_mdev_verify_no_sharing(unsigned long *mdev_apm,
>> + unsigned long *mdev_aqm)
>> {
>> struct ap_matrix_mdev *lstdev;
>> DECLARE_BITMAP(apm, AP_DEVICES);
>> DECLARE_BITMAP(aqm, AP_DOMAINS);
>>
>> list_for_each_entry(lstdev, &matrix_dev->mdev_list, node) {
>> - if (matrix_mdev == lstdev)
>> + /*
>> + * If either of the input masks belongs to the mdev to which an
>> + * AP resource is being assigned, then we don't need to verify
>> + * that mdev's masks.
>> + */
>> + if ((mdev_apm == lstdev->matrix.apm) ||
>> + (mdev_aqm == lstdev->matrix.aqm))
>> continue;
>>
>> memset(apm, 0, sizeof(apm));
>> @@ -550,12 +442,10 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
>> * We work on full longs, as we can only exclude the leftover
>> * bits in non-inverse order. The leftover is all zeros.
>> */
>> - if (!bitmap_and(apm, matrix_mdev->matrix.apm,
>> - lstdev->matrix.apm, AP_DEVICES))
>> + if (!bitmap_and(apm, mdev_apm, lstdev->matrix.apm, AP_DEVICES))
>> continue;
>>
>> - if (!bitmap_and(aqm, matrix_mdev->matrix.aqm,
>> - lstdev->matrix.aqm, AP_DOMAINS))
>> + if (!bitmap_and(aqm, mdev_aqm, lstdev->matrix.aqm, AP_DOMAINS))
>> continue;
>>
>> return -EADDRINUSE;
>> @@ -564,6 +454,17 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
>> return 0;
>> }
>>
>> +static int vfio_ap_mdev_validate_masks(unsigned long *apm, unsigned long *aqm)
>> +{
>> + int ret;
>> +
>> + ret = ap_apqn_in_matrix_owned_by_def_drv(apm, aqm);
>> + if (ret)
>> + return (ret == 1) ? -EADDRNOTAVAIL : ret;
>> +
>> + return vfio_ap_mdev_verify_no_sharing(apm, aqm);
>> +}
>> +
>> /**
>> * assign_adapter_store
>> *
>> @@ -602,6 +503,7 @@ static ssize_t assign_adapter_store(struct device *dev,
>> {
>> int ret;
>> unsigned long apid;
>> + DECLARE_BITMAP(apm, AP_DEVICES);
>> struct mdev_device *mdev = mdev_from_dev(dev);
>> struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>>
>> @@ -616,32 +518,19 @@ static ssize_t assign_adapter_store(struct device *dev,
>> if (apid > matrix_mdev->matrix.apm_max)
>> return -ENODEV;
>>
>> - /*
>> - * Set the bit in the AP mask (APM) corresponding to the AP adapter
>> - * number (APID). The bits in the mask, from most significant to least
>> - * significant bit, correspond to APIDs 0-255.
>> - */
>> - mutex_lock(&matrix_dev->lock);
>> -
>> - ret = vfio_ap_mdev_verify_queues_reserved_for_apid(matrix_mdev, apid);
>> - if (ret)
>> - goto done;
>> + memset(apm, 0, ARRAY_SIZE(apm) * sizeof(*apm));
> What about just memset(apm, 0, sizeof(apm));

apm is a pointer to an array of unsigned long, so sizeof(apm) will
yield the number of bytes in the pointer (8), not the number of bytes in
the array (32); or am I wrong about that?

>> + set_bit_inv(apid, apm);
>>
>> + mutex_lock(&matrix_dev->lock);
>> + ret = vfio_ap_mdev_validate_masks(apm, matrix_mdev->matrix.aqm);
>> + if (ret) {
>> + mutex_unlock(&matrix_dev->lock);
>> + return ret;
>> + }
>> set_bit_inv(apid, matrix_mdev->matrix.apm);
>> -
>> - ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev);
>> - if (ret)
>> - goto share_err;
>> -
>> - ret = count;
>> - goto done;
>> -
>> -share_err:
>> - clear_bit_inv(apid, matrix_mdev->matrix.apm);
>> -done:
>> mutex_unlock(&matrix_dev->lock);
>>
>> - return ret;
>> + return count;
>> }
>> static DEVICE_ATTR_WO(assign_adapter);
>>
>> @@ -690,26 +579,6 @@ static ssize_t unassign_adapter_store(struct device *dev,
>> }
>> static DEVICE_ATTR_WO(unassign_adapter);
>>
>> -static int
>> -vfio_ap_mdev_verify_queues_reserved_for_apqi(struct ap_matrix_mdev *matrix_mdev,
>> - unsigned long apqi)
>> -{
>> - int ret;
>> - unsigned long apid;
>> - unsigned long nbits = matrix_mdev->matrix.apm_max + 1;
>> -
>> - if (find_first_bit_inv(matrix_mdev->matrix.apm, nbits) >= nbits)
>> - return vfio_ap_verify_queue_reserved(NULL, &apqi);
>> -
>> - for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, nbits) {
>> - ret = vfio_ap_verify_queue_reserved(&apid, &apqi);
>> - if (ret)
>> - return ret;
>> - }
>> -
>> - return 0;
>> -}
>> -
>> /**
>> * assign_domain_store
>> *
>> @@ -748,6 +617,7 @@ static ssize_t assign_domain_store(struct device *dev,
>> {
>> int ret;
>> unsigned long apqi;
>> + DECLARE_BITMAP(aqm, AP_DEVICES);
> AP_DEVICES -> AP_DOMAINS

Copy and paste error, it should be AP_DOMAINS.

>> struct mdev_device *mdev = mdev_from_dev(dev);
>> struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>> unsigned long max_apqi = matrix_mdev->matrix.aqm_max;
>> @@ -762,27 +632,19 @@ static ssize_t assign_domain_store(struct device *dev,
>> if (apqi > max_apqi)
>> return -ENODEV;
>>
>> - mutex_lock(&matrix_dev->lock);
>> -
>> - ret = vfio_ap_mdev_verify_queues_reserved_for_apqi(matrix_mdev, apqi);
>> - if (ret)
>> - goto done;
>> + memset(aqm, 0, ARRAY_SIZE(aqm) * sizeof(*aqm));
> memset(aqm, 0, sizeof(aqm));

See response above.

>> + set_bit_inv(apqi, aqm);
>>
>> + mutex_lock(&matrix_dev->lock);
>> + ret = vfio_ap_mdev_validate_masks(matrix_mdev->matrix.apm, aqm);
>> + if (ret) {
>> + mutex_unlock(&matrix_dev->lock);
>> + return ret;
>> + }
>> set_bit_inv(apqi, matrix_mdev->matrix.aqm);
>> -
>> - ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev);
>> - if (ret)
>> - goto share_err;
>> -
>> - ret = count;
>> - goto done;
>> -
>> -share_err:
>> - clear_bit_inv(apqi, matrix_mdev->matrix.aqm);
>> -done:
>> mutex_unlock(&matrix_dev->lock);
>>
>> - return ret;
>> + return count;
>> }
>> static DEVICE_ATTR_WO(assign_domain);
>>
>> @@ -868,11 +730,6 @@ static ssize_t assign_control_domain_store(struct device *dev,
>> if (id > matrix_mdev->matrix.adm_max)
>> return -ENODEV;
>>
>> - /* Set the bit in the ADM (bitmask) corresponding to the AP control
>> - * domain number (id). The bits in the mask, from most significant to
>> - * least significant, correspond to IDs 0 up to the one less than the
>> - * number of control domains that can be assigned.
>> - */
>> mutex_lock(&matrix_dev->lock);
>> set_bit_inv(id, matrix_mdev->matrix.adm);
>> mutex_unlock(&matrix_dev->lock);
>

2019-06-17 15:22:52

by Anthony Krowiak

[permalink] [raw]
Subject: Re: [PATCH v4 7/7] s390: vfio-ap: update documentation

On 6/17/19 7:42 AM, Harald Freudenberger wrote:
> On 13.06.19 21:39, Tony Krowiak wrote:
>> This patch updates the vfio-ap documentation to include the information
>> below.
>>
>> Changes made to the mdev matrix assignment interfaces:
>>
>> * Allow assignment of APQNs that are not bound to the vfio-ap device
>> driver as long as they are not owned by a zcrypt driver as identified
>> in the AP bus sysfs apmask and aqmask interfaces.
>>
>> * Allow assignment of an AP resource to a mediated device which is in use
>> by a guest to hot plug an adapter, domain and control domain into a
>> running guest.
>>
>> * Allow unassignment of an AP resource from a mediated device which is in
>> use by a guest to hot unplug an adapter, domain and control domain from
>> a running guest.
>>
>> This patch also:
>>
>> * Clarifies the section on configuring the AP bus's apmask and aqmask.
>>
>> * Adds sections on dynamic configuration using the SE or SCLP command.
>>
>> Signed-off-by: Tony Krowiak <[email protected]>
>> ---
>> Documentation/s390/vfio-ap.txt | 292 ++++++++++++++++++++++++++++++-----------
>> 1 file changed, 213 insertions(+), 79 deletions(-)
>>
>> diff --git a/Documentation/s390/vfio-ap.txt b/Documentation/s390/vfio-ap.txt
>> index 65167cfe4485..9372a6570ce1 100644
>> --- a/Documentation/s390/vfio-ap.txt
>> +++ b/Documentation/s390/vfio-ap.txt
>> @@ -81,10 +81,19 @@ definitions:
>> which the AP command is to be sent for processing.
>>
>> The AP bus will create a sysfs device for each APQN that can be derived from
>> - the cross product of the AP adapter and usage domain numbers detected when the
>> - AP bus module is loaded. For example, if adapters 4 and 10 (0x0a) and usage
>> - domains 6 and 71 (0x47) are assigned to the LPAR, the AP bus will create the
>> - following sysfs entries:
>> + the Cartesian product of the AP adapter and usage domain numbers detected when
>> + the AP bus module is loaded. For example, if adapters 4 and 10 (0x0a) and
>> + usage domains 6 and 71 (0x47) are assigned to the LPAR, the Cartesian product
>> + would be defined by the following table:
>> +
>> + 06 71
>> + +-----------+-----------+
>> + 04 | (04,06) | (04,47) |
>> + +-----------|-----------+
>> + 10 | (0a,06) | (0a,47) |
>> + +-----------|-----------+
>> +
> These ascii pictures are nice and show very lovely how this works ...
> but you use the domain id as column (x) and the adapter id as row (y).
> Then the APQNs are shown in (y,x) writing then following the scheme
> (card,domain).

I'll fix this.

>> + The AP bus will create the following sysfs entries:
>>
>> /sys/devices/ap/card04/04.0006
>> /sys/devices/ap/card04/04.0047
>> @@ -146,10 +155,20 @@ If you recall from the description of an AP Queue, AP instructions include
>> an APQN to identify the AP queue to which an AP command-request message is to be
>> sent (NQAP and PQAP instructions), or from which a command-reply message is to
>> be received (DQAP instruction). The validity of an APQN is defined by the matrix
>> -calculated from the APM and AQM; it is the cross product of all assigned adapter
>> -numbers (APM) with all assigned queue indexes (AQM). For example, if adapters 1
>> -and 2 and usage domains 5 and 6 are assigned to a guest, the APQNs (1,5), (1,6),
>> -(2,5) and (2,6) will be valid for the guest.
>> +calculated from the APM and AQM; it is the Cartesian product of all assigned
>> +adapter numbers (APM) with all assigned queue indexes (AQM). For example, if
>> +adapters 1 and 2 and usage domains 5 and 6 are assigned to a guest:
>> +
>> +
>> + 05 06
>> + +-----------+-----------+
>> + 01 | (01,05) | (01,06) |
>> + +-----------|-----------+
>> + 02 | (02,05) | (02,06) |
>> + +-----------|-----------+
>> +
>> +
>> +APQNs (01,05), (01,06), (02,05) and (02,06) will be valid for the guest.
>>
>> The APQNs can provide secure key functionality - i.e., a private key is stored
>> on the adapter card for each of its domains - so each APQN must be assigned to
>> @@ -349,8 +368,9 @@ matrix device.
>> number of the the usage domain is echoed to the respective attribute
>> file.
>> * matrix:
>> - A read-only file for displaying the APQNs derived from the cross product
>> - of the adapter and domain numbers assigned to the mediated matrix device.
>> + A read-only file for displaying the APQNs derived from the Caresian
>> + product of the adapter and domain numbers assigned to the mediated matrix
>> + device.
>> * assign_control_domain:
>> * unassign_control_domain:
>> Write-only attributes for assigning/unassigning an AP control domain
>> @@ -438,9 +458,10 @@ guest use.
>> Example:
>> =======
>> Let's now provide an example to illustrate how KVM guests may be given
>> -access to AP facilities. For this example, we will show how to configure
>> -three guests such that executing the lszcrypt command on the guests would
>> -look like this:
>> +access to AP facilities. For this example, we will assume that adapters 4, 5
>> +and 6 and domains 4, 71 (0x47), 171 (0xab) and 255 (0xff) are assigned to the
>> +LPAR and are online. We will show how to configure three guests such that
>> +executing the lszcrypt command on the guests would look like this:
>>
>> Guest1
>> ------
>> @@ -466,7 +487,7 @@ Guest2
>> CARD.DOMAIN TYPE MODE
>> ------------------------------
>> 06 CEX5A Accelerator
>> -06.0047 CEX5A Accelerator
>> +06.0047 CEX5A AcceleratorNote that this directory may contain additional bindings depending upon what
> This line looks somehow weird.

Sometimes my editor does weird things like this. I'll fix it.

>> 06.00ff CEX5A Accelerator
>>
>> These are the steps:
>> @@ -513,35 +534,44 @@ These are the steps:
>> /sys/bus/ap/aqmask
>>
>> The 'apmask' is a 256-bit mask that identifies a set of AP adapter IDs
>> - (APID). Each bit in the mask, from left to right (i.e., from most significant
>> - to least significant bit in big endian order), corresponds to an APID from
>> - 0-255. If a bit is set, the APID is marked as usable only by the default AP
>> - queue device drivers; otherwise, the APID is usable by the vfio_ap
>> - device driver.
>> + (APID). Each bit in the mask, from left to right, corresponds to an APID from
>> + 0-255.
>>
>> The 'aqmask' is a 256-bit mask that identifies a set of AP queue indexes
>> - (APQI). Each bit in the mask, from left to right (i.e., from most significant
>> - to least significant bit in big endian order), corresponds to an APQI from
>> - 0-255. If a bit is set, the APQI is marked as usable only by the default AP
>> - queue device drivers; otherwise, the APQI is usable by the vfio_ap device
>> - driver.
>> + (APQI). Each bit in the mask, from left to right, corresponds to an APQI from
>> + 0-255.
>> +
>> + The Cartesian product of the APIDs set in the apmask and the APQIs set in
>> + the aqmask identify the APQNs of AP queue devices owned by the zcrypt
>> + device drivers.
>> +
>> + Take, for example, the following masks:
>>
>> - Take, for example, the following mask:
>> + apmask: 0x7000000000000000000000000000000000000000000000000000000000000000
>>
>> - 0x7dffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
>> + aqmask: 0x0180000000000000000000000000000000000000000000000000000000000000
>>
>> - It indicates:
>> + The bits set in apmask are bits 1, 2 and 3. The bits set in aqmask are bits
>> + 7 and 8. The Cartesian product of the bits set in the two masks is:
>>
>> - 1, 2, 3, 4, 5, and 7-255 belong to the default drivers' pool, and 0 and 6
>> - belong to the vfio_ap device driver's pool.
>> + 07 08
>> + +-----------+-----------+
>> + 01 | (01,07) | (01,08) |
>> + +-----------|-----------+
>> + 02 | (02,07) | (02,08) |
>> + +-----------|-----------+
>> + 03 | (03,07) | (03,08) |
>> + +-----------|-----------+
>>
>> - The APQN of each AP queue device assigned to the linux host is checked by the
>> - AP bus against the set of APQNs derived from the cross product of APIDs
>> - and APQIs marked as usable only by the default AP queue device drivers. If a
>> - match is detected, only the default AP queue device drivers will be probed;
>> - otherwise, the vfio_ap device driver will be probed.
>> + The masks indicate that the queues with APQNs (01,07), (01,08), (02,07),
>> + (02,08), (03,07) and (03,08) are owned by the zcrypt drivers. When the AP bus
>> + detects an AP queue device, its APQN is checked against the set of APQNs
>> + derived from the apmask and aqmask. If a match is detected, the zcrypt
>> + device driver registered for the device type of the queue will be probed. If
>> + a match is not detected and the device type of the queue is CEX4 or newer,
>> + the vfio_ap device driver will be probed.
>>
>> - By default, the two masks are set to reserve all APQNs for use by the default
>> + By default, the two masks are set to reserve all APQNs for use by the zcrypt
>> AP queue device drivers. There are two ways the default masks can be changed:
>>
>> 1. The sysfs mask files can be edited by echoing a string into the
>> @@ -554,8 +584,7 @@ These are the steps:
>>
>> 0x4100000000000000000000000000000000000000000000000000000000000000
>>
>> - Keep in mind that the mask reads from left to right (i.e., most
>> - significant to least significant bit in big endian order), so the mask
>> + Keep in mind that the mask reads from left to right, so the mask
>> above identifies device numbers 1 and 7 (01000001).
>>
>> If the string is longer than the mask, the operation is terminated with
>> @@ -563,7 +592,7 @@ These are the steps:
>>
>> * Individual bits in the mask can be switched on and off by specifying
>> each bit number to be switched in a comma separated list. Each bit
>> - number string must be prepended with a ('+') or minus ('-') to indicate
>> + number string must be prefixed with a ('+') or minus ('-') to indicate
>> the corresponding bit is to be switched on ('+') or off ('-'). Some
>> valid values are:
>>
>> @@ -594,19 +623,26 @@ These are the steps:
>> aqmask:
>> 0x4000000000000000000000000000000000000000000000000000000000000000
>>
>> - Resulting in these two pools:
>> -
>> - default drivers pool: adapter 0-15, domain 1
>> - alternate drivers pool: adapter 16-255, domains 0, 2-255
>> +Recall that the Cartesian product of the APIDs set in the apmask and the APQIs
>> +set in the aqmask identify the APQNs of AP queue devices owned by the zcrypt
>> +device drivers. If an attempt is made to modify the apmask or aqmask such that
>> +one or more APQNs changes ownership from a the vfio_ap device driver to a zcrypt
>> +device driver and the APQN is assigned to a mediated device (see step 3 below),
>> +the operation will fail with an error ('Address already in use').
>>
>> Securing the APQNs for our example:
>> ----------------------------------
>> - To secure the AP queues 05.0004, 05.0047, 05.00ab, 05.00ff, 06.0004, 06.0047,
>> - 06.00ab, and 06.00ff for use by the vfio_ap device driver, the corresponding
>> - APQNs can either be removed from the default masks:
>> + There is no way to secure the specific AP queues 05.0004, 05.0047, 05.00ab,
>> + 05.00ff, 06.0004, 06.0047, 06.00ab, and 06.00ff for use by the vfio_ap device
>> + driver, so we are left with either securing all queues on adapters 05 and
>> + 06, or queues 0004, 0047, 00ab and 00ff can be secured on all adapters.
>> +
>> + To secure all queues on adapters 05 and 05:
>>
>> echo -5,-6 > /sys/bus/ap/apmask
>>
>> + To secure queues 0004, 0047, 00ab, and 00ff on all adapters:
>> +
>> echo -4,-0x47,-0xab,-0xff > /sys/bus/ap/aqmask
>>
>> Or the masks can be set as follows:
>> @@ -617,14 +653,20 @@ These are the steps:
>> echo 0xf7fffffffffffffffeffffffffffffffffffffffffeffffffffffffffffffffe \
>> > aqmask
>>
>> - This will result in AP queues 05.0004, 05.0047, 05.00ab, 05.00ff, 06.0004,
>> - 06.0047, 06.00ab, and 06.00ff getting bound to the vfio_ap device driver. The
>> - sysfs directory for the vfio_ap device driver will now contain symbolic links
>> - to the AP queue devices bound to it:
>> + For this example, we will choose to secure queues 0004, 0047, 00ab, and 00ff
>> + on all adapters. This will result in AP queues 04.0004, 04.0047, 04.00ab,
>> + 04.00ff, 05.0004, 05.0047, 05.00ab, 05.00ff, 06.0004, 06.0047, 06.00ab and
>> + 06.00ff getting bound to the vfio_ap device driver. The sysfs directory for
>> + the vfio_ap device driver will now contain symbolic links to the AP queue
>> + devices bound to it:
>>
>> /sys/bus/ap
>> ... [drivers]
>> ...... [vfio_ap]
>> + ......... [04.0004]
>> + ......... [04.0047]
>> + ......... [04.00ab]
>> + ......... [04.00ff]
>> ......... [05.0004]
>> ......... [05.0047]
>> ......... [05.00ab]
>> @@ -641,11 +683,12 @@ These are the steps:
>> future and for which there are few older systems on which to test.
>>
>> The administrator, therefore, must take care to secure only AP queues that
>> - can be bound to the vfio_ap device driver. The device type for a given AP
>> - queue device can be read from the parent card's sysfs directory. For example,
>> - to see the hardware type of the queue 05.0004:
>> + can be bound to the vfio_ap device driver, or those queues will not get bound
>> + to any driver. The device type for a given AP queue device can be read from
>> + the parent card's sysfs directory. For example, to see the hardware type of
>> + the queue 05.0004:
>>
>> - cat /sys/bus/ap/devices/card05/hwtype
>> + cat /sys/bus/ap/devices/card05/hwtype
>>
>> The hwtype must be 10 or higher (CEX4 or newer) in order to be bound to the
>> vfio_ap device driver.
>> @@ -747,37 +790,48 @@ These are the steps:
>> higher than the maximum is specified, the operation will terminate with
>> an error (ENODEV).
>>
>> - * All APQNs that can be derived from the adapter ID and the IDs of
>> - the previously assigned domains must be bound to the vfio_ap device
>> - driver. If no domains have yet been assigned, then there must be at least
>> - one APQN with the specified APID bound to the vfio_ap driver. If no such
>> - APQNs are bound to the driver, the operation will terminate with an
>> - error (EADDRNOTAVAIL).
>> + * Each APQN that can be derived from the adapter ID and the IDs of
>> + the previously assigned domains must not be reserved for use by the
>> + zcrypt device drivers as specified by the /sys/bus/ap/apmask and
>> + /sys/bus/ap/aqmask syfs interfaces. If any APQN is reserved, the operation
>> + will terminate with an error (EADDRNOTAVAIL).
>>
>> - No APQN that can be derived from the adapter ID and the IDs of the
>> + * No APQN that can be derived from the adapter ID and the IDs of the
>> previously assigned domains can be assigned to another mediated matrix
>> device. If an APQN is assigned to another mediated matrix device, the
>> operation will terminate with an error (EADDRINUSE).
>>
>> + Note that adapters that are currently not available to the host may be
>> + assigned to the mediated device. If the adapter subsequently becomes
>> + available while a guest is using the mediated device, it will automatically
>> + become available to the guest.
>> +
>> In order to successfully assign a domain:
>>
>> * The domain number specified must represent a value from 0 up to the
>> maximum domain number configured for the system. If a domain number
>> higher than the maximum is specified, the operation will terminate with
>> - an error (ENODEV).
>> + an error (ENODEV). The maximum domain number can be determined by
>> + printing the sysfs /sys/bus/ap/ap_max_domain_id attribute:
>>
>> - * All APQNs that can be derived from the domain ID and the IDs of
>> - the previously assigned adapters must be bound to the vfio_ap device
>> - driver. If no domains have yet been assigned, then there must be at least
>> - one APQN with the specified APQI bound to the vfio_ap driver. If no such
>> - APQNs are bound to the driver, the operation will terminate with an
>> - error (EADDRNOTAVAIL).
>> + cat /sys/bus/ap/ap_max_domain_id
>>
>> - No APQN that can be derived from the domain ID and the IDs of the
>> + * Each APQN that can be derived from the domain ID and the IDs of
>> + the previously assigned adapters must not be reserved for use by the
>> + zcrypt device drivers as specified by the /sys/bus/ap/apmask and
>> + /sys/bus/ap/aqmask syfs interfaces. If any APQN is reserved, the operation
>> + will terminate with an error (EADDRNOTAVAIL).
>> +
>> + * No APQN that can be derived from the domain ID and the IDs of the
>> previously assigned adapters can be assigned to another mediated matrix
>> device. If an APQN is assigned to another mediated matrix device, the
>> operation will terminate with an error (EADDRINUSE).
>>
>> + Note that domains that are currently not available to the host may be
>> + assigned to the mediated device. If the domain subsequently becomes
>> + available while a guest is using the mediated device, it will automatically
>> + become available to the guest.
>> +
>> In order to successfully assign a control domain, the domain number
>> specified must represent a value from 0 up to the maximum domain number
>> configured for the system. If a control domain number higher than the maximum
>> @@ -822,16 +876,96 @@ Using our example again, to remove the mediated matrix device $uuid1:
>> host. If the mdev matrix device is removed, one may want to also reconfigure
>> the pool of adapters and queues reserved for use by the default drivers.
>>
>> -Limitations
>> -===========
>> -* The KVM/kernel interfaces do not provide a way to prevent restoring an APQN
>> - to the default drivers pool of a queue that is still assigned to a mediated
>> - device in use by a guest. It is incumbent upon the administrator to
>> - ensure there is no mediated device in use by a guest to which the APQN is
>> - assigned lest the host be given access to the private data of the AP queue
>> - device such as a private key configured specifically for the guest.
>> +Hot plug/unplug via mdev matrix device sysfs interfaces:
>> +=======================================================
>> +If an mdev matrix device is in use by a running guest, AP resources can be
>> +plugged into or unplugged from the guest via the mdev device's sysfs
>> +assignment interfaces. Below are some examples.
>> +
>> + To plug adapter 10 into a running guest:
>> +
>> + echo 0xa > assign_adapter
>> +
>> + To unplug domain 5 from a running guest:
>> +
>> + echo 5 > unassign_domain
>> +
>> +To display the matrix of a guest using the mdev matrix device:
>> +
>> + cat guest_matrix
>> +
>> +If you attempt to display the guest matrix when a guest is not using the
>> +mdev matrix device, an error will be displayed (ENODEV). Note that adapters and
>> +domains that are not yet available or not yet assigned to the LPAR can be
>> +assigned and will become available to the guest as soon as they become available
>> +to the host.
>> +
>> +Dynamic Changes to AP Configuration using the Support Element (SE):
>> +==================================================================
>> +The SE can be used to dynamically make the following changes to the AP
>> +configuration for an LPAR in which a linux host is running:
>> +
>> + * Configure one or more adapters on
>> +
>> + Configuring an adapter on sets its state to online, thus making it
>> + available to the LPAR to which it is assigned. When an adapter is
>> + configured on, it immediately becomes available to the LPAR as well as to
>> + any guests using a mediated device to which the adapter is assigned.
>> +
>> + * Configure one or more adapters off
>> +
>> + Configuring an adapter off sets its state to standby, thus making it
>> + unavailable to the LPAR to which it is assigned. When an adapter is
>> + configured off, it immediately becomes unavailable to the LPAR as well as
>> + to any guests using a mediated device to which the adapter is assigned.
>> +
>> + * Add adapters or domains to the LPAR configuration
>> +
>> + Adapters and/or domains can be assigned to an LPAR using the Change LPAR
>> + Cryptographic Controls task. To make dynamic changes to the AP
>> + configuration for an LPAR Running a linux guest, the online adapters
>> + assigned to the LPAR must first be configured off. After performing the
>> + adapter and/or domain assignments, the AP resources will automatically
>> + become available to the linux host running in the LPAR as well as any
>> + guests using a mediated device to which the adpaters and/or domains are
>> + assigned.
>> +
>> + * Remove adapters or domains from the LPAR configuration
>> +
>> + Adapters and/or domains can be unassigned from an LPAR using the Change
>> + LPAR Cryptographic Controls task. To make dynamic changes to the AP
>> + configuration for an LPAR Running a linux guest, the online adapters
>> + assigned to the LPAR must first be configured off. After performing the
>> + adapter and/or domain unassignments, the AP resources will automatically
>> + become unavailable to the linux host running in the LPAR as well as any
>> + guests using a mediated device to which the adpaters and/or domains are
>> + assigned.
>> +
>> +Dynamic Changes to the AP Configuration using the SCLP command:
>> +==============================================================
>> +The following SCLP commands may be used to dynamically configure AP adapters on
>> +and off:
>> +
>> +* Configure Adjunct Processor
>> +
>> + The 'Configure Adjunct Processor' command sets an AP adapter's state to
>> + online, thus making it available to the LPARs to which it is assigned. It will
>> + likewise become available to any linux guest using a mediated device to which
>> + the adapter is assigned.
>> +
>> +* Deconfigure Adjunct Processor
>> +
>> + The 'Deconfigure Adjunct Processor' command sets an AP adapter's state to
>> + standby, thus making it unavailable to the LPARs to which it is assigned. It
>> + will likewise become unavailable to any linux guest using a mediated device to
>> + which the adapter is assigned.
>>
>> -* Dynamically modifying the AP matrix for a running guest (which would amount to
>> - hot(un)plug of AP devices for the guest) is currently not supported
>> +Live migration:
>> +==============
>> +Live guest migration is not supported for guests using AP devices. All AP
>> +devices in use by the guest must be unplugged prior to initiating live
>> +migration (see "Hot plug/unplug via mdev matrix device sysfs interfaces" section
>> +above). If you are using QEMU to run your guest and it supports hot plug/unplug
>> +of the vfio-ap device, this would be another option (consult the QEMU
>> +documentation for details).
>>
>> -* Live guest migration is not supported for guests using AP devices.
>

2019-06-18 06:50:09

by Harald Freudenberger

[permalink] [raw]
Subject: Re: [PATCH v4 5/7] s390: vfio-ap: allow assignment of unavailable AP resources to mdev device

On 17.06.19 17:07, Tony Krowiak wrote:
> On 6/17/19 6:05 AM, Harald Freudenberger wrote:
>> On 13.06.19 21:39, Tony Krowiak wrote:
>>> The AP architecture does not preclude assignment of AP resources that are
>>> not available. Let's go ahead and implement this facet of the AP
>>> architecture for linux guests.
>>>
>>> The current implementation does not allow assignment of AP adapters or
>>> domains to an mdev device if the APQNs resulting from the assignment
>>> reference AP queue devices that are not bound to the vfio_ap device
>>> driver. This patch allows assignment of AP resources to the mdev device as
>>> long as the APQNs resulting from the assignment are not reserved by the AP
>>> BUS for use by the zcrypt device drivers and the APQNs are not assigned to
>>> another mdev device.
>>>
>>> Signed-off-by: Tony Krowiak <[email protected]>
>>> ---
>>>   drivers/s390/crypto/vfio_ap_ops.c | 231 ++++++++------------------------------
>>>   1 file changed, 44 insertions(+), 187 deletions(-)
>>>
>>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>>> index 60efd3d7896d..9db86c0db52e 100644
>>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>>> @@ -406,122 +406,6 @@ static struct attribute_group *vfio_ap_mdev_type_groups[] = {
>>>       NULL,
>>>   };
>>>   -struct vfio_ap_queue_reserved {
>>> -    unsigned long *apid;
>>> -    unsigned long *apqi;
>>> -    bool reserved;
>>> -};
>>> -
>>> -/**
>>> - * vfio_ap_has_queue
>>> - *
>>> - * @dev: an AP queue device
>>> - * @data: a struct vfio_ap_queue_reserved reference
>>> - *
>>> - * Flags whether the AP queue device (@dev) has a queue ID containing the APQN,
>>> - * apid or apqi specified in @data:
>>> - *
>>> - * - If @data contains both an apid and apqi value, then @data will be flagged
>>> - *   as reserved if the APID and APQI fields for the AP queue device matches
>>> - *
>>> - * - If @data contains only an apid value, @data will be flagged as
>>> - *   reserved if the APID field in the AP queue device matches
>>> - *
>>> - * - If @data contains only an apqi value, @data will be flagged as
>>> - *   reserved if the APQI field in the AP queue device matches
>>> - *
>>> - * Returns 0 to indicate the input to function succeeded. Returns -EINVAL if
>>> - * @data does not contain either an apid or apqi.
>>> - */
>>> -static int vfio_ap_has_queue(struct device *dev, void *data)
>>> -{
>>> -    struct vfio_ap_queue_reserved *qres = data;
>>> -    struct ap_queue *ap_queue = to_ap_queue(dev);
>>> -    ap_qid_t qid;
>>> -    unsigned long id;
>>> -
>>> -    if (qres->apid && qres->apqi) {
>>> -        qid = AP_MKQID(*qres->apid, *qres->apqi);
>>> -        if (qid == ap_queue->qid)
>>> -            qres->reserved = true;
>>> -    } else if (qres->apid && !qres->apqi) {
>>> -        id = AP_QID_CARD(ap_queue->qid);
>>> -        if (id == *qres->apid)
>>> -            qres->reserved = true;
>>> -    } else if (!qres->apid && qres->apqi) {
>>> -        id = AP_QID_QUEUE(ap_queue->qid);
>>> -        if (id == *qres->apqi)
>>> -            qres->reserved = true;
>>> -    } else {
>>> -        return -EINVAL;
>>> -    }
>>> -
>>> -    return 0;
>>> -}
>>> -
>>> -/**
>>> - * vfio_ap_verify_queue_reserved
>>> - *
>>> - * @matrix_dev: a mediated matrix device
>>> - * @apid: an AP adapter ID
>>> - * @apqi: an AP queue index
>>> - *
>>> - * Verifies that the AP queue with @apid/@apqi is reserved by the VFIO AP device
>>> - * driver according to the following rules:
>>> - *
>>> - * - If both @apid and @apqi are not NULL, then there must be an AP queue
>>> - *   device bound to the vfio_ap driver with the APQN identified by @apid and
>>> - *   @apqi
>>> - *
>>> - * - If only @apid is not NULL, then there must be an AP queue device bound
>>> - *   to the vfio_ap driver with an APQN containing @apid
>>> - *
>>> - * - If only @apqi is not NULL, then there must be an AP queue device bound
>>> - *   to the vfio_ap driver with an APQN containing @apqi
>>> - *
>>> - * Returns 0 if the AP queue is reserved; otherwise, returns -EADDRNOTAVAIL.
>>> - */
>>> -static int vfio_ap_verify_queue_reserved(unsigned long *apid,
>>> -                     unsigned long *apqi)
>>> -{
>>> -    int ret;
>>> -    struct vfio_ap_queue_reserved qres;
>>> -
>>> -    qres.apid = apid;
>>> -    qres.apqi = apqi;
>>> -    qres.reserved = false;
>>> -
>>> -    ret = driver_for_each_device(&matrix_dev->vfio_ap_drv->driver, NULL,
>>> -                     &qres, vfio_ap_has_queue);
>>> -    if (ret)
>>> -        return ret;
>>> -
>>> -    if (qres.reserved)
>>> -        return 0;
>>> -
>>> -    return -EADDRNOTAVAIL;
>>> -}
>>> -
>>> -static int
>>> -vfio_ap_mdev_verify_queues_reserved_for_apid(struct ap_matrix_mdev *matrix_mdev,
>>> -                         unsigned long apid)
>>> -{
>>> -    int ret;
>>> -    unsigned long apqi;
>>> -    unsigned long nbits = matrix_mdev->matrix.aqm_max + 1;
>>> -
>>> -    if (find_first_bit_inv(matrix_mdev->matrix.aqm, nbits) >= nbits)
>>> -        return vfio_ap_verify_queue_reserved(&apid, NULL);
>>> -
>>> -    for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, nbits) {
>>> -        ret = vfio_ap_verify_queue_reserved(&apid, &apqi);
>>> -        if (ret)
>>> -            return ret;
>>> -    }
>>> -
>>> -    return 0;
>>> -}
>>> -
>>>   /**
>>>    * vfio_ap_mdev_verify_no_sharing
>>>    *
>>> @@ -529,18 +413,26 @@ vfio_ap_mdev_verify_queues_reserved_for_apid(struct ap_matrix_mdev *matrix_mdev,
>>>    * and AP queue indexes comprising the AP matrix are not configured for another
>>>    * mediated device. AP queue sharing is not allowed.
>>>    *
>>> - * @matrix_mdev: the mediated matrix device
>>> + * @mdev_apm: the mask identifying the adapters assigned to mdev
>>> + * @mdev_apm: the mask identifying the adapters assigned to mdev
>> I assume you wanted to write @mdev_aqm ... queues ... at the 2nd line.
>
> You assumed correctly. I also mean to say "domains assigned to mdev".
>
>>>    *
>>>    * Returns 0 if the APQNs are not shared, otherwise; returns -EADDRINUSE.
>>>    */
>>> -static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
>>> +static int vfio_ap_mdev_verify_no_sharing(unsigned long *mdev_apm,
>>> +                      unsigned long *mdev_aqm)
>>>   {
>>>       struct ap_matrix_mdev *lstdev;
>>>       DECLARE_BITMAP(apm, AP_DEVICES);
>>>       DECLARE_BITMAP(aqm, AP_DOMAINS);
>>>         list_for_each_entry(lstdev, &matrix_dev->mdev_list, node) {
>>> -        if (matrix_mdev == lstdev)
>>> +        /*
>>> +         * If either of the input masks belongs to the mdev to which an
>>> +         * AP resource is being assigned, then we don't need to verify
>>> +         * that mdev's masks.
>>> +         */
>>> +        if ((mdev_apm == lstdev->matrix.apm) ||
>>> +            (mdev_aqm == lstdev->matrix.aqm))
>>>               continue;
>>>             memset(apm, 0, sizeof(apm));
>>> @@ -550,12 +442,10 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
>>>            * We work on full longs, as we can only exclude the leftover
>>>            * bits in non-inverse order. The leftover is all zeros.
>>>            */
>>> -        if (!bitmap_and(apm, matrix_mdev->matrix.apm,
>>> -                lstdev->matrix.apm, AP_DEVICES))
>>> +        if (!bitmap_and(apm, mdev_apm, lstdev->matrix.apm, AP_DEVICES))
>>>               continue;
>>>   -        if (!bitmap_and(aqm, matrix_mdev->matrix.aqm,
>>> -                lstdev->matrix.aqm, AP_DOMAINS))
>>> +        if (!bitmap_and(aqm, mdev_aqm, lstdev->matrix.aqm, AP_DOMAINS))
>>>               continue;
>>>             return -EADDRINUSE;
>>> @@ -564,6 +454,17 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
>>>       return 0;
>>>   }
>>>   +static int vfio_ap_mdev_validate_masks(unsigned long *apm, unsigned long *aqm)
>>> +{
>>> +    int ret;
>>> +
>>> +    ret = ap_apqn_in_matrix_owned_by_def_drv(apm, aqm);
>>> +    if (ret)
>>> +        return (ret == 1) ? -EADDRNOTAVAIL : ret;
>>> +
>>> +    return vfio_ap_mdev_verify_no_sharing(apm, aqm);
>>> +}
>>> +
>>>   /**
>>>    * assign_adapter_store
>>>    *
>>> @@ -602,6 +503,7 @@ static ssize_t assign_adapter_store(struct device *dev,
>>>   {
>>>       int ret;
>>>       unsigned long apid;
>>> +    DECLARE_BITMAP(apm, AP_DEVICES);
>>>       struct mdev_device *mdev = mdev_from_dev(dev);
>>>       struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>>>   @@ -616,32 +518,19 @@ static ssize_t assign_adapter_store(struct device *dev,
>>>       if (apid > matrix_mdev->matrix.apm_max)
>>>           return -ENODEV;
>>>   -    /*
>>> -     * Set the bit in the AP mask (APM) corresponding to the AP adapter
>>> -     * number (APID). The bits in the mask, from most significant to least
>>> -     * significant bit, correspond to APIDs 0-255.
>>> -     */
>>> -    mutex_lock(&matrix_dev->lock);
>>> -
>>> -    ret = vfio_ap_mdev_verify_queues_reserved_for_apid(matrix_mdev, apid);
>>> -    if (ret)
>>> -        goto done;
>>> +    memset(apm, 0, ARRAY_SIZE(apm) * sizeof(*apm));
>> What about just memset(apm, 0, sizeof(apm));
>
> apm is a pointer to an array of unsigned long, so sizeof(apm) will
> yield the number of bytes in the pointer (8), not the number of bytes in
> the array (32); or am I wrong about that?
The
  DECLARE_BITMAP(apm, AP_DEVICES);
with the macro definition in types.h:
  #define DECLARE_BITMAP(name,bits) \
    unsigned long name[BITS_TO_LONGS(bits)]
gives this expanded code:
  unsigned long apm[BITS_TO_LONGS(AP_DEVICES)];
which is an ordinary unsigned long array.
So sizeof(apm) gives the size of the array in bytes:
  sizeof(apm) = 32
I know this is weird, but it is common practice.

Funnily look at this code:
  int blubber[10];
  int *ptr = blubber;
sizeof(blubber) gives 10*sizeof(int) but
sizeof(ptr) gives the size of the pointer pointing to
the blubber array.

>
>>> +    set_bit_inv(apid, apm);
>>>   +    mutex_lock(&matrix_dev->lock);
>>> +    ret = vfio_ap_mdev_validate_masks(apm, matrix_mdev->matrix.aqm);
>>> +    if (ret) {
>>> +        mutex_unlock(&matrix_dev->lock);
>>> +        return ret;
>>> +    }
>>>       set_bit_inv(apid, matrix_mdev->matrix.apm);
>>> -
>>> -    ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev);
>>> -    if (ret)
>>> -        goto share_err;
>>> -
>>> -    ret = count;
>>> -    goto done;
>>> -
>>> -share_err:
>>> -    clear_bit_inv(apid, matrix_mdev->matrix.apm);
>>> -done:
>>>       mutex_unlock(&matrix_dev->lock);
>>>   -    return ret;
>>> +    return count;
>>>   }
>>>   static DEVICE_ATTR_WO(assign_adapter);
>>>   @@ -690,26 +579,6 @@ static ssize_t unassign_adapter_store(struct device *dev,
>>>   }
>>>   static DEVICE_ATTR_WO(unassign_adapter);
>>>   -static int
>>> -vfio_ap_mdev_verify_queues_reserved_for_apqi(struct ap_matrix_mdev *matrix_mdev,
>>> -                         unsigned long apqi)
>>> -{
>>> -    int ret;
>>> -    unsigned long apid;
>>> -    unsigned long nbits = matrix_mdev->matrix.apm_max + 1;
>>> -
>>> -    if (find_first_bit_inv(matrix_mdev->matrix.apm, nbits) >= nbits)
>>> -        return vfio_ap_verify_queue_reserved(NULL, &apqi);
>>> -
>>> -    for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, nbits) {
>>> -        ret = vfio_ap_verify_queue_reserved(&apid, &apqi);
>>> -        if (ret)
>>> -            return ret;
>>> -    }
>>> -
>>> -    return 0;
>>> -}
>>> -
>>>   /**
>>>    * assign_domain_store
>>>    *
>>> @@ -748,6 +617,7 @@ static ssize_t assign_domain_store(struct device *dev,
>>>   {
>>>       int ret;
>>>       unsigned long apqi;
>>> +    DECLARE_BITMAP(aqm, AP_DEVICES);
>> AP_DEVICES -> AP_DOMAINS
>
> Copy and paste error, it should be AP_DOMAINS.
>
>>>       struct mdev_device *mdev = mdev_from_dev(dev);
>>>       struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>>>       unsigned long max_apqi = matrix_mdev->matrix.aqm_max;
>>> @@ -762,27 +632,19 @@ static ssize_t assign_domain_store(struct device *dev,
>>>       if (apqi > max_apqi)
>>>           return -ENODEV;
>>>   -    mutex_lock(&matrix_dev->lock);
>>> -
>>> -    ret = vfio_ap_mdev_verify_queues_reserved_for_apqi(matrix_mdev, apqi);
>>> -    if (ret)
>>> -        goto done;
>>> +    memset(aqm, 0, ARRAY_SIZE(aqm) * sizeof(*aqm));
>> memset(aqm, 0, sizeof(aqm));
>
> See response above.
>
>>> +    set_bit_inv(apqi, aqm);
>>>   +    mutex_lock(&matrix_dev->lock);
>>> +    ret = vfio_ap_mdev_validate_masks(matrix_mdev->matrix.apm, aqm);
>>> +    if (ret) {
>>> +        mutex_unlock(&matrix_dev->lock);
>>> +        return ret;
>>> +    }
>>>       set_bit_inv(apqi, matrix_mdev->matrix.aqm);
>>> -
>>> -    ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev);
>>> -    if (ret)
>>> -        goto share_err;
>>> -
>>> -    ret = count;
>>> -    goto done;
>>> -
>>> -share_err:
>>> -    clear_bit_inv(apqi, matrix_mdev->matrix.aqm);
>>> -done:
>>>       mutex_unlock(&matrix_dev->lock);
>>>   -    return ret;
>>> +    return count;
>>>   }
>>>   static DEVICE_ATTR_WO(assign_domain);
>>>   @@ -868,11 +730,6 @@ static ssize_t assign_control_domain_store(struct device *dev,
>>>       if (id > matrix_mdev->matrix.adm_max)
>>>           return -ENODEV;
>>>   -    /* Set the bit in the ADM (bitmask) corresponding to the AP control
>>> -     * domain number (id). The bits in the mask, from most significant to
>>> -     * least significant, correspond to IDs 0 up to the one less than the
>>> -     * number of control domains that can be assigned.
>>> -     */
>>>       mutex_lock(&matrix_dev->lock);
>>>       set_bit_inv(id, matrix_mdev->matrix.adm);
>>>       mutex_unlock(&matrix_dev->lock);
>>
>

2019-06-18 16:28:28

by Cornelia Huck

[permalink] [raw]
Subject: Re: [PATCH v4 3/7] s390: zcrypt: driver callback to indicate resource in use

On Thu, 13 Jun 2019 15:39:36 -0400
Tony Krowiak <[email protected]> wrote:

> Introduces a new driver callback to prevent a root user from unbinding
> an AP queue from its device driver if the queue is in use. This prevents
> a root user from inadvertently taking a queue away from a guest and
> giving it to the host, or vice versa. The callback will be invoked
> whenever a change to the AP bus's apmask or aqmask sysfs interfaces may
> result in one or more AP queues being removed from its driver. If the
> callback responds in the affirmative for any driver queried, the change
> to the apmask or aqmask will be rejected with a device in use error.
>
> For this patch, only non-default drivers will be queried. Currently,
> there is only one non-default driver, the vfio_ap device driver. The
> vfio_ap device driver manages AP queues passed through to one or more
> guests and we don't want to unexpectedly take AP resources away from
> guests which are most likely independently administered.
>
> Signed-off-by: Tony Krowiak <[email protected]>
> ---
> drivers/s390/crypto/ap_bus.c | 138 +++++++++++++++++++++++++++++++++++++++++--
> drivers/s390/crypto/ap_bus.h | 3 +
> 2 files changed, 135 insertions(+), 6 deletions(-)

Hm... I recall objecting to this patch before, fearing that it makes it
possible for a bad actor to hog resources that can't be removed by
root, even forcefully. (I have not had time to look at the intervening
versions, so I might be missing something.)

Is there a way for root to forcefully override this?

2019-06-19 13:05:24

by Anthony Krowiak

[permalink] [raw]
Subject: Re: [PATCH v4 3/7] s390: zcrypt: driver callback to indicate resource in use

On 6/18/19 12:25 PM, Cornelia Huck wrote:
> On Thu, 13 Jun 2019 15:39:36 -0400
> Tony Krowiak <[email protected]> wrote:
>
>> Introduces a new driver callback to prevent a root user from unbinding
>> an AP queue from its device driver if the queue is in use. This prevents
>> a root user from inadvertently taking a queue away from a guest and
>> giving it to the host, or vice versa. The callback will be invoked
>> whenever a change to the AP bus's apmask or aqmask sysfs interfaces may
>> result in one or more AP queues being removed from its driver. If the
>> callback responds in the affirmative for any driver queried, the change
>> to the apmask or aqmask will be rejected with a device in use error.
>>
>> For this patch, only non-default drivers will be queried. Currently,
>> there is only one non-default driver, the vfio_ap device driver. The
>> vfio_ap device driver manages AP queues passed through to one or more
>> guests and we don't want to unexpectedly take AP resources away from
>> guests which are most likely independently administered.
>>
>> Signed-off-by: Tony Krowiak <[email protected]>
>> ---
>> drivers/s390/crypto/ap_bus.c | 138 +++++++++++++++++++++++++++++++++++++++++--
>> drivers/s390/crypto/ap_bus.h | 3 +
>> 2 files changed, 135 insertions(+), 6 deletions(-)
>
> Hm... I recall objecting to this patch before, fearing that it makes it
> possible for a bad actor to hog resources that can't be removed by
> root, even forcefully. (I have not had time to look at the intervening
> versions, so I might be missing something.)
>
> Is there a way for root to forcefully override this?

You recall correctly; however, after many internal crypto team
discussions, it was decided that this feature was important
and should be kept.

Allow me to first address your fear that a bad actor can hog
resources that can't be removed by root. With this enhancement,
there is nothing preventing a root user from taking resources
from a matrix mdev, it simply forces him/her to follow the
proper procedure. The resources to be removed must first be
unassigned from the matrix mdev to which they are assigned.
The AP bus's /sys/bus/ap/apmask and /sys/bus/ap/aqmask
sysfs attributes can then be edited to transfer ownership
of the resources to zcrypt.

The rationale for keeping this feature is:

* It is a bad idea to steal an adapter in use from a guest. In the worst
case, the guest could end up without access to any crypto adapters
without knowing why. This could lead to performance issues on guests
that rely heavily on crypto such as guests used for blockchain
transactions.

* There are plenty of examples in linux of the kernel preventing a root
user from performing a task. For example, a module can't be removed
if references are still held for it. Another example would be trying
to bind a CEX4 adapter to a device driver not registered for CEX4;
this action will also be rejected.

* The semantics are much cleaner and the logic is far less complicated.

* It forces the use of the proper procedure to change ownership of AP
queues.


>

2019-06-19 13:40:35

by Anthony Krowiak

[permalink] [raw]
Subject: Re: [PATCH v4 5/7] s390: vfio-ap: allow assignment of unavailable AP resources to mdev device

On 6/18/19 2:49 AM, Harald Freudenberger wrote:
> On 17.06.19 17:07, Tony Krowiak wrote:
>> On 6/17/19 6:05 AM, Harald Freudenberger wrote:
>>> On 13.06.19 21:39, Tony Krowiak wrote:
>>>> The AP architecture does not preclude assignment of AP resources that are
>>>> not available. Let's go ahead and implement this facet of the AP
>>>> architecture for linux guests.
>>>>
>>>> The current implementation does not allow assignment of AP adapters or
>>>> domains to an mdev device if the APQNs resulting from the assignment
>>>> reference AP queue devices that are not bound to the vfio_ap device
>>>> driver. This patch allows assignment of AP resources to the mdev device as
>>>> long as the APQNs resulting from the assignment are not reserved by the AP
>>>> BUS for use by the zcrypt device drivers and the APQNs are not assigned to
>>>> another mdev device.
>>>>
>>>> Signed-off-by: Tony Krowiak <[email protected]>
>>>> ---
>>>>   drivers/s390/crypto/vfio_ap_ops.c | 231 ++++++++------------------------------
>>>>   1 file changed, 44 insertions(+), 187 deletions(-)
>>>>
>>>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>>>> index 60efd3d7896d..9db86c0db52e 100644
>>>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>>>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>>>> @@ -406,122 +406,6 @@ static struct attribute_group *vfio_ap_mdev_type_groups[] = {
>>>>       NULL,
>>>>   };
>>>>   -struct vfio_ap_queue_reserved {
>>>> -    unsigned long *apid;
>>>> -    unsigned long *apqi;
>>>> -    bool reserved;
>>>> -};
>>>> -
>>>> -/**
>>>> - * vfio_ap_has_queue
>>>> - *
>>>> - * @dev: an AP queue device
>>>> - * @data: a struct vfio_ap_queue_reserved reference
>>>> - *
>>>> - * Flags whether the AP queue device (@dev) has a queue ID containing the APQN,
>>>> - * apid or apqi specified in @data:
>>>> - *
>>>> - * - If @data contains both an apid and apqi value, then @data will be flagged
>>>> - *   as reserved if the APID and APQI fields for the AP queue device matches
>>>> - *
>>>> - * - If @data contains only an apid value, @data will be flagged as
>>>> - *   reserved if the APID field in the AP queue device matches
>>>> - *
>>>> - * - If @data contains only an apqi value, @data will be flagged as
>>>> - *   reserved if the APQI field in the AP queue device matches
>>>> - *
>>>> - * Returns 0 to indicate the input to function succeeded. Returns -EINVAL if
>>>> - * @data does not contain either an apid or apqi.
>>>> - */
>>>> -static int vfio_ap_has_queue(struct device *dev, void *data)
>>>> -{
>>>> -    struct vfio_ap_queue_reserved *qres = data;
>>>> -    struct ap_queue *ap_queue = to_ap_queue(dev);
>>>> -    ap_qid_t qid;
>>>> -    unsigned long id;
>>>> -
>>>> -    if (qres->apid && qres->apqi) {
>>>> -        qid = AP_MKQID(*qres->apid, *qres->apqi);
>>>> -        if (qid == ap_queue->qid)
>>>> -            qres->reserved = true;
>>>> -    } else if (qres->apid && !qres->apqi) {
>>>> -        id = AP_QID_CARD(ap_queue->qid);
>>>> -        if (id == *qres->apid)
>>>> -            qres->reserved = true;
>>>> -    } else if (!qres->apid && qres->apqi) {
>>>> -        id = AP_QID_QUEUE(ap_queue->qid);
>>>> -        if (id == *qres->apqi)
>>>> -            qres->reserved = true;
>>>> -    } else {
>>>> -        return -EINVAL;
>>>> -    }
>>>> -
>>>> -    return 0;
>>>> -}
>>>> -
>>>> -/**
>>>> - * vfio_ap_verify_queue_reserved
>>>> - *
>>>> - * @matrix_dev: a mediated matrix device
>>>> - * @apid: an AP adapter ID
>>>> - * @apqi: an AP queue index
>>>> - *
>>>> - * Verifies that the AP queue with @apid/@apqi is reserved by the VFIO AP device
>>>> - * driver according to the following rules:
>>>> - *
>>>> - * - If both @apid and @apqi are not NULL, then there must be an AP queue
>>>> - *   device bound to the vfio_ap driver with the APQN identified by @apid and
>>>> - *   @apqi
>>>> - *
>>>> - * - If only @apid is not NULL, then there must be an AP queue device bound
>>>> - *   to the vfio_ap driver with an APQN containing @apid
>>>> - *
>>>> - * - If only @apqi is not NULL, then there must be an AP queue device bound
>>>> - *   to the vfio_ap driver with an APQN containing @apqi
>>>> - *
>>>> - * Returns 0 if the AP queue is reserved; otherwise, returns -EADDRNOTAVAIL.
>>>> - */
>>>> -static int vfio_ap_verify_queue_reserved(unsigned long *apid,
>>>> -                     unsigned long *apqi)
>>>> -{
>>>> -    int ret;
>>>> -    struct vfio_ap_queue_reserved qres;
>>>> -
>>>> -    qres.apid = apid;
>>>> -    qres.apqi = apqi;
>>>> -    qres.reserved = false;
>>>> -
>>>> -    ret = driver_for_each_device(&matrix_dev->vfio_ap_drv->driver, NULL,
>>>> -                     &qres, vfio_ap_has_queue);
>>>> -    if (ret)
>>>> -        return ret;
>>>> -
>>>> -    if (qres.reserved)
>>>> -        return 0;
>>>> -
>>>> -    return -EADDRNOTAVAIL;
>>>> -}
>>>> -
>>>> -static int
>>>> -vfio_ap_mdev_verify_queues_reserved_for_apid(struct ap_matrix_mdev *matrix_mdev,
>>>> -                         unsigned long apid)
>>>> -{
>>>> -    int ret;
>>>> -    unsigned long apqi;
>>>> -    unsigned long nbits = matrix_mdev->matrix.aqm_max + 1;
>>>> -
>>>> -    if (find_first_bit_inv(matrix_mdev->matrix.aqm, nbits) >= nbits)
>>>> -        return vfio_ap_verify_queue_reserved(&apid, NULL);
>>>> -
>>>> -    for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, nbits) {
>>>> -        ret = vfio_ap_verify_queue_reserved(&apid, &apqi);
>>>> -        if (ret)
>>>> -            return ret;
>>>> -    }
>>>> -
>>>> -    return 0;
>>>> -}
>>>> -
>>>>   /**
>>>>    * vfio_ap_mdev_verify_no_sharing
>>>>    *
>>>> @@ -529,18 +413,26 @@ vfio_ap_mdev_verify_queues_reserved_for_apid(struct ap_matrix_mdev *matrix_mdev,
>>>>    * and AP queue indexes comprising the AP matrix are not configured for another
>>>>    * mediated device. AP queue sharing is not allowed.
>>>>    *
>>>> - * @matrix_mdev: the mediated matrix device
>>>> + * @mdev_apm: the mask identifying the adapters assigned to mdev
>>>> + * @mdev_apm: the mask identifying the adapters assigned to mdev
>>> I assume you wanted to write @mdev_aqm ... queues ... at the 2nd line.
>>
>> You assumed correctly. I also mean to say "domains assigned to mdev".
>>
>>>>    *
>>>>    * Returns 0 if the APQNs are not shared, otherwise; returns -EADDRINUSE.
>>>>    */
>>>> -static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
>>>> +static int vfio_ap_mdev_verify_no_sharing(unsigned long *mdev_apm,
>>>> +                      unsigned long *mdev_aqm)
>>>>   {
>>>>       struct ap_matrix_mdev *lstdev;
>>>>       DECLARE_BITMAP(apm, AP_DEVICES);
>>>>       DECLARE_BITMAP(aqm, AP_DOMAINS);
>>>>         list_for_each_entry(lstdev, &matrix_dev->mdev_list, node) {
>>>> -        if (matrix_mdev == lstdev)
>>>> +        /*
>>>> +         * If either of the input masks belongs to the mdev to which an
>>>> +         * AP resource is being assigned, then we don't need to verify
>>>> +         * that mdev's masks.
>>>> +         */
>>>> +        if ((mdev_apm == lstdev->matrix.apm) ||
>>>> +            (mdev_aqm == lstdev->matrix.aqm))
>>>>               continue;
>>>>             memset(apm, 0, sizeof(apm));
>>>> @@ -550,12 +442,10 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
>>>>            * We work on full longs, as we can only exclude the leftover
>>>>            * bits in non-inverse order. The leftover is all zeros.
>>>>            */
>>>> -        if (!bitmap_and(apm, matrix_mdev->matrix.apm,
>>>> -                lstdev->matrix.apm, AP_DEVICES))
>>>> +        if (!bitmap_and(apm, mdev_apm, lstdev->matrix.apm, AP_DEVICES))
>>>>               continue;
>>>>   -        if (!bitmap_and(aqm, matrix_mdev->matrix.aqm,
>>>> -                lstdev->matrix.aqm, AP_DOMAINS))
>>>> +        if (!bitmap_and(aqm, mdev_aqm, lstdev->matrix.aqm, AP_DOMAINS))
>>>>               continue;
>>>>             return -EADDRINUSE;
>>>> @@ -564,6 +454,17 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
>>>>       return 0;
>>>>   }
>>>>   +static int vfio_ap_mdev_validate_masks(unsigned long *apm, unsigned long *aqm)
>>>> +{
>>>> +    int ret;
>>>> +
>>>> +    ret = ap_apqn_in_matrix_owned_by_def_drv(apm, aqm);
>>>> +    if (ret)
>>>> +        return (ret == 1) ? -EADDRNOTAVAIL : ret;
>>>> +
>>>> +    return vfio_ap_mdev_verify_no_sharing(apm, aqm);
>>>> +}
>>>> +
>>>>   /**
>>>>    * assign_adapter_store
>>>>    *
>>>> @@ -602,6 +503,7 @@ static ssize_t assign_adapter_store(struct device *dev,
>>>>   {
>>>>       int ret;
>>>>       unsigned long apid;
>>>> +    DECLARE_BITMAP(apm, AP_DEVICES);
>>>>       struct mdev_device *mdev = mdev_from_dev(dev);
>>>>       struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>>>>   @@ -616,32 +518,19 @@ static ssize_t assign_adapter_store(struct device *dev,
>>>>       if (apid > matrix_mdev->matrix.apm_max)
>>>>           return -ENODEV;
>>>>   -    /*
>>>> -     * Set the bit in the AP mask (APM) corresponding to the AP adapter
>>>> -     * number (APID). The bits in the mask, from most significant to least
>>>> -     * significant bit, correspond to APIDs 0-255.
>>>> -     */
>>>> -    mutex_lock(&matrix_dev->lock);
>>>> -
>>>> -    ret = vfio_ap_mdev_verify_queues_reserved_for_apid(matrix_mdev, apid);
>>>> -    if (ret)
>>>> -        goto done;
>>>> +    memset(apm, 0, ARRAY_SIZE(apm) * sizeof(*apm));
>>> What about just memset(apm, 0, sizeof(apm));
>>
>> apm is a pointer to an array of unsigned long, so sizeof(apm) will
>> yield the number of bytes in the pointer (8), not the number of bytes in
>> the array (32); or am I wrong about that?
> The
>   DECLARE_BITMAP(apm, AP_DEVICES);
> with the macro definition in types.h:
>   #define DECLARE_BITMAP(name,bits) \
>     unsigned long name[BITS_TO_LONGS(bits)]
> gives this expanded code:
>   unsigned long apm[BITS_TO_LONGS(AP_DEVICES)];
> which is an ordinary unsigned long array.
> So sizeof(apm) gives the size of the array in bytes:
>   sizeof(apm) = 32
> I know this is weird, but it is common practice.

It must be the compiler figures out that apm points to
an array. Although probably not necessary, I will make
the change.

>
> Funnily look at this code:
>   int blubber[10];
>   int *ptr = blubber;
> sizeof(blubber) gives 10*sizeof(int) but
> sizeof(ptr) gives the size of the pointer pointing to
> the blubber array.
>
>>
>>>> +    set_bit_inv(apid, apm);
>>>>   +    mutex_lock(&matrix_dev->lock);
>>>> +    ret = vfio_ap_mdev_validate_masks(apm, matrix_mdev->matrix.aqm);
>>>> +    if (ret) {
>>>> +        mutex_unlock(&matrix_dev->lock);
>>>> +        return ret;
>>>> +    }
>>>>       set_bit_inv(apid, matrix_mdev->matrix.apm);
>>>> -
>>>> -    ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev);
>>>> -    if (ret)
>>>> -        goto share_err;
>>>> -
>>>> -    ret = count;
>>>> -    goto done;
>>>> -
>>>> -share_err:
>>>> -    clear_bit_inv(apid, matrix_mdev->matrix.apm);
>>>> -done:
>>>>       mutex_unlock(&matrix_dev->lock);
>>>>   -    return ret;
>>>> +    return count;
>>>>   }
>>>>   static DEVICE_ATTR_WO(assign_adapter);
>>>>   @@ -690,26 +579,6 @@ static ssize_t unassign_adapter_store(struct device *dev,
>>>>   }
>>>>   static DEVICE_ATTR_WO(unassign_adapter);
>>>>   -static int
>>>> -vfio_ap_mdev_verify_queues_reserved_for_apqi(struct ap_matrix_mdev *matrix_mdev,
>>>> -                         unsigned long apqi)
>>>> -{
>>>> -    int ret;
>>>> -    unsigned long apid;
>>>> -    unsigned long nbits = matrix_mdev->matrix.apm_max + 1;
>>>> -
>>>> -    if (find_first_bit_inv(matrix_mdev->matrix.apm, nbits) >= nbits)
>>>> -        return vfio_ap_verify_queue_reserved(NULL, &apqi);
>>>> -
>>>> -    for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, nbits) {
>>>> -        ret = vfio_ap_verify_queue_reserved(&apid, &apqi);
>>>> -        if (ret)
>>>> -            return ret;
>>>> -    }
>>>> -
>>>> -    return 0;
>>>> -}
>>>> -
>>>>   /**
>>>>    * assign_domain_store
>>>>    *
>>>> @@ -748,6 +617,7 @@ static ssize_t assign_domain_store(struct device *dev,
>>>>   {
>>>>       int ret;
>>>>       unsigned long apqi;
>>>> +    DECLARE_BITMAP(aqm, AP_DEVICES);
>>> AP_DEVICES -> AP_DOMAINS
>>
>> Copy and paste error, it should be AP_DOMAINS.
>>
>>>>       struct mdev_device *mdev = mdev_from_dev(dev);
>>>>       struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>>>>       unsigned long max_apqi = matrix_mdev->matrix.aqm_max;
>>>> @@ -762,27 +632,19 @@ static ssize_t assign_domain_store(struct device *dev,
>>>>       if (apqi > max_apqi)
>>>>           return -ENODEV;
>>>>   -    mutex_lock(&matrix_dev->lock);
>>>> -
>>>> -    ret = vfio_ap_mdev_verify_queues_reserved_for_apqi(matrix_mdev, apqi);
>>>> -    if (ret)
>>>> -        goto done;
>>>> +    memset(aqm, 0, ARRAY_SIZE(aqm) * sizeof(*aqm));
>>> memset(aqm, 0, sizeof(aqm));
>>
>> See response above.
>>
>>>> +    set_bit_inv(apqi, aqm);
>>>>   +    mutex_lock(&matrix_dev->lock);
>>>> +    ret = vfio_ap_mdev_validate_masks(matrix_mdev->matrix.apm, aqm);
>>>> +    if (ret) {
>>>> +        mutex_unlock(&matrix_dev->lock);
>>>> +        return ret;
>>>> +    }
>>>>       set_bit_inv(apqi, matrix_mdev->matrix.aqm);
>>>> -
>>>> -    ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev);
>>>> -    if (ret)
>>>> -        goto share_err;
>>>> -
>>>> -    ret = count;
>>>> -    goto done;
>>>> -
>>>> -share_err:
>>>> -    clear_bit_inv(apqi, matrix_mdev->matrix.aqm);
>>>> -done:
>>>>       mutex_unlock(&matrix_dev->lock);
>>>>   -    return ret;
>>>> +    return count;
>>>>   }
>>>>   static DEVICE_ATTR_WO(assign_domain);
>>>>   @@ -868,11 +730,6 @@ static ssize_t assign_control_domain_store(struct device *dev,
>>>>       if (id > matrix_mdev->matrix.adm_max)
>>>>           return -ENODEV;
>>>>   -    /* Set the bit in the ADM (bitmask) corresponding to the AP control
>>>> -     * domain number (id). The bits in the mask, from most significant to
>>>> -     * least significant, correspond to IDs 0 up to the one less than the
>>>> -     * number of control domains that can be assigned.
>>>> -     */
>>>>       mutex_lock(&matrix_dev->lock);
>>>>       set_bit_inv(id, matrix_mdev->matrix.adm);
>>>>       mutex_unlock(&matrix_dev->lock);
>>>
>>
>

2019-06-26 21:15:05

by Anthony Krowiak

[permalink] [raw]
Subject: Re: [PATCH v4 3/7] s390: zcrypt: driver callback to indicate resource in use

On 6/19/19 9:04 AM, Tony Krowiak wrote:
> On 6/18/19 12:25 PM, Cornelia Huck wrote:
>> On Thu, 13 Jun 2019 15:39:36 -0400
>> Tony Krowiak <[email protected]> wrote:
>>
>>> Introduces a new driver callback to prevent a root user from unbinding
>>> an AP queue from its device driver if the queue is in use. This prevents
>>> a root user from inadvertently taking a queue away from a guest and
>>> giving it to the host, or vice versa. The callback will be invoked
>>> whenever a change to the AP bus's apmask or aqmask sysfs interfaces may
>>> result in one or more AP queues being removed from its driver. If the
>>> callback responds in the affirmative for any driver queried, the change
>>> to the apmask or aqmask will be rejected with a device in use error.
>>>
>>> For this patch, only non-default drivers will be queried. Currently,
>>> there is only one non-default driver, the vfio_ap device driver. The
>>> vfio_ap device driver manages AP queues passed through to one or more
>>> guests and we don't want to unexpectedly take AP resources away from
>>> guests which are most likely independently administered.
>>>
>>> Signed-off-by: Tony Krowiak <[email protected]>
>>> ---
>>>   drivers/s390/crypto/ap_bus.c | 138
>>> +++++++++++++++++++++++++++++++++++++++++--
>>>   drivers/s390/crypto/ap_bus.h |   3 +
>>>   2 files changed, 135 insertions(+), 6 deletions(-)
>>
>> Hm... I recall objecting to this patch before, fearing that it makes it
>> possible for a bad actor to hog resources that can't be removed by
>> root, even forcefully. (I have not had time to look at the intervening
>> versions, so I might be missing something.)
>>
>> Is there a way for root to forcefully override this?
>
> You recall correctly; however, after many internal crypto team
> discussions, it was decided that this feature was important
> and should be kept.
>
> Allow me to first address your fear that a bad actor can hog
> resources that can't be removed by root. With this enhancement,
> there is nothing preventing a root user from taking resources
> from a matrix mdev, it simply forces him/her to follow the
> proper procedure. The resources to be removed must first be
> unassigned from the matrix mdev to which they are assigned.
> The AP bus's /sys/bus/ap/apmask and /sys/bus/ap/aqmask
> sysfs attributes can then be edited to transfer ownership
> of the resources to zcrypt.
>
> The rationale for keeping this feature is:
>
> * It is a bad idea to steal an adapter in use from a guest. In the worst
>   case, the guest could end up without access to any crypto adapters
>   without knowing why. This could lead to performance issues on guests
>   that rely heavily on crypto such as guests used for blockchain
>   transactions.
>
> * There are plenty of examples in linux of the kernel preventing a root
>   user from performing a task. For example, a module can't be removed
>   if references are still held for it. Another example would be trying
>   to bind a CEX4 adapter to a device driver not registered for CEX4;
>   this action will also be rejected.
>
> * The semantics are much cleaner and the logic is far less complicated.
>
> * It forces the use of the proper procedure to change ownership of AP
>   queues.
>

Any feedback on this?

Tony K

>
>>
>

2019-06-27 07:25:57

by Cornelia Huck

[permalink] [raw]
Subject: Re: [PATCH v4 3/7] s390: zcrypt: driver callback to indicate resource in use

On Wed, 26 Jun 2019 17:13:50 -0400
Tony Krowiak <[email protected]> wrote:

> On 6/19/19 9:04 AM, Tony Krowiak wrote:
> > On 6/18/19 12:25 PM, Cornelia Huck wrote:
> >> On Thu, 13 Jun 2019 15:39:36 -0400
> >> Tony Krowiak <[email protected]> wrote:
> >>
> >>> Introduces a new driver callback to prevent a root user from unbinding
> >>> an AP queue from its device driver if the queue is in use. This prevents
> >>> a root user from inadvertently taking a queue away from a guest and
> >>> giving it to the host, or vice versa. The callback will be invoked
> >>> whenever a change to the AP bus's apmask or aqmask sysfs interfaces may
> >>> result in one or more AP queues being removed from its driver. If the
> >>> callback responds in the affirmative for any driver queried, the change
> >>> to the apmask or aqmask will be rejected with a device in use error.
> >>>
> >>> For this patch, only non-default drivers will be queried. Currently,
> >>> there is only one non-default driver, the vfio_ap device driver. The
> >>> vfio_ap device driver manages AP queues passed through to one or more
> >>> guests and we don't want to unexpectedly take AP resources away from
> >>> guests which are most likely independently administered.
> >>>
> >>> Signed-off-by: Tony Krowiak <[email protected]>
> >>> ---
> >>>   drivers/s390/crypto/ap_bus.c | 138
> >>> +++++++++++++++++++++++++++++++++++++++++--
> >>>   drivers/s390/crypto/ap_bus.h |   3 +
> >>>   2 files changed, 135 insertions(+), 6 deletions(-)
> >>
> >> Hm... I recall objecting to this patch before, fearing that it makes it
> >> possible for a bad actor to hog resources that can't be removed by
> >> root, even forcefully. (I have not had time to look at the intervening
> >> versions, so I might be missing something.)
> >>
> >> Is there a way for root to forcefully override this?
> >
> > You recall correctly; however, after many internal crypto team
> > discussions, it was decided that this feature was important
> > and should be kept.
> >
> > Allow me to first address your fear that a bad actor can hog
> > resources that can't be removed by root. With this enhancement,
> > there is nothing preventing a root user from taking resources
> > from a matrix mdev, it simply forces him/her to follow the
> > proper procedure. The resources to be removed must first be
> > unassigned from the matrix mdev to which they are assigned.
> > The AP bus's /sys/bus/ap/apmask and /sys/bus/ap/aqmask
> > sysfs attributes can then be edited to transfer ownership
> > of the resources to zcrypt.
> >
> > The rationale for keeping this feature is:
> >
> > * It is a bad idea to steal an adapter in use from a guest. In the worst
> >   case, the guest could end up without access to any crypto adapters
> >   without knowing why. This could lead to performance issues on guests
> >   that rely heavily on crypto such as guests used for blockchain
> >   transactions.
> >
> > * There are plenty of examples in linux of the kernel preventing a root
> >   user from performing a task. For example, a module can't be removed
> >   if references are still held for it. Another example would be trying
> >   to bind a CEX4 adapter to a device driver not registered for CEX4;
> >   this action will also be rejected.
> >
> > * The semantics are much cleaner and the logic is far less complicated.
> >
> > * It forces the use of the proper procedure to change ownership of AP
> >   queues.
> >
>
> Any feedback on this?

Had not yet time to look at this, sorry.


>
> Tony K
>
> >
> >>
> >
>

2019-06-27 13:00:47

by Anthony Krowiak

[permalink] [raw]
Subject: Re: [PATCH v4 3/7] s390: zcrypt: driver callback to indicate resource in use

On 6/27/19 3:25 AM, Cornelia Huck wrote:
> On Wed, 26 Jun 2019 17:13:50 -0400
> Tony Krowiak <[email protected]> wrote:
>
>> On 6/19/19 9:04 AM, Tony Krowiak wrote:
>>> On 6/18/19 12:25 PM, Cornelia Huck wrote:
>>>> On Thu, 13 Jun 2019 15:39:36 -0400
>>>> Tony Krowiak <[email protected]> wrote:
>>>>
>>>>> Introduces a new driver callback to prevent a root user from unbinding
>>>>> an AP queue from its device driver if the queue is in use. This prevents
>>>>> a root user from inadvertently taking a queue away from a guest and
>>>>> giving it to the host, or vice versa. The callback will be invoked
>>>>> whenever a change to the AP bus's apmask or aqmask sysfs interfaces may
>>>>> result in one or more AP queues being removed from its driver. If the
>>>>> callback responds in the affirmative for any driver queried, the change
>>>>> to the apmask or aqmask will be rejected with a device in use error.
>>>>>
>>>>> For this patch, only non-default drivers will be queried. Currently,
>>>>> there is only one non-default driver, the vfio_ap device driver. The
>>>>> vfio_ap device driver manages AP queues passed through to one or more
>>>>> guests and we don't want to unexpectedly take AP resources away from
>>>>> guests which are most likely independently administered.
>>>>>
>>>>> Signed-off-by: Tony Krowiak <[email protected]>
>>>>> ---
>>>>>   drivers/s390/crypto/ap_bus.c | 138
>>>>> +++++++++++++++++++++++++++++++++++++++++--
>>>>>   drivers/s390/crypto/ap_bus.h |   3 +
>>>>>   2 files changed, 135 insertions(+), 6 deletions(-)
>>>>
>>>> Hm... I recall objecting to this patch before, fearing that it makes it
>>>> possible for a bad actor to hog resources that can't be removed by
>>>> root, even forcefully. (I have not had time to look at the intervening
>>>> versions, so I might be missing something.)
>>>>
>>>> Is there a way for root to forcefully override this?
>>>
>>> You recall correctly; however, after many internal crypto team
>>> discussions, it was decided that this feature was important
>>> and should be kept.
>>>
>>> Allow me to first address your fear that a bad actor can hog
>>> resources that can't be removed by root. With this enhancement,
>>> there is nothing preventing a root user from taking resources
>>> from a matrix mdev, it simply forces him/her to follow the
>>> proper procedure. The resources to be removed must first be
>>> unassigned from the matrix mdev to which they are assigned.
>>> The AP bus's /sys/bus/ap/apmask and /sys/bus/ap/aqmask
>>> sysfs attributes can then be edited to transfer ownership
>>> of the resources to zcrypt.
>>>
>>> The rationale for keeping this feature is:
>>>
>>> * It is a bad idea to steal an adapter in use from a guest. In the worst
>>>   case, the guest could end up without access to any crypto adapters
>>>   without knowing why. This could lead to performance issues on guests
>>>   that rely heavily on crypto such as guests used for blockchain
>>>   transactions.
>>>
>>> * There are plenty of examples in linux of the kernel preventing a root
>>>   user from performing a task. For example, a module can't be removed
>>>   if references are still held for it. Another example would be trying
>>>   to bind a CEX4 adapter to a device driver not registered for CEX4;
>>>   this action will also be rejected.
>>>
>>> * The semantics are much cleaner and the logic is far less complicated.
>>>
>>> * It forces the use of the proper procedure to change ownership of AP
>>>   queues.
>>>
>>
>> Any feedback on this?
>
> Had not yet time to look at this, sorry.

No problem, just wanted to make sure it didn't get lost in the shuffle.

>
>
>>
>> Tony K
>>
>>>
>>>>
>>>
>>
>

2019-07-01 19:27:50

by Cornelia Huck

[permalink] [raw]
Subject: Re: [PATCH v4 3/7] s390: zcrypt: driver callback to indicate resource in use

On Wed, 19 Jun 2019 09:04:18 -0400
Tony Krowiak <[email protected]> wrote:

> On 6/18/19 12:25 PM, Cornelia Huck wrote:
> > On Thu, 13 Jun 2019 15:39:36 -0400
> > Tony Krowiak <[email protected]> wrote:
> >
> >> Introduces a new driver callback to prevent a root user from unbinding
> >> an AP queue from its device driver if the queue is in use. This prevents
> >> a root user from inadvertently taking a queue away from a guest and
> >> giving it to the host, or vice versa. The callback will be invoked
> >> whenever a change to the AP bus's apmask or aqmask sysfs interfaces may
> >> result in one or more AP queues being removed from its driver. If the
> >> callback responds in the affirmative for any driver queried, the change
> >> to the apmask or aqmask will be rejected with a device in use error.
> >>
> >> For this patch, only non-default drivers will be queried. Currently,
> >> there is only one non-default driver, the vfio_ap device driver. The
> >> vfio_ap device driver manages AP queues passed through to one or more
> >> guests and we don't want to unexpectedly take AP resources away from
> >> guests which are most likely independently administered.
> >>
> >> Signed-off-by: Tony Krowiak <[email protected]>
> >> ---
> >> drivers/s390/crypto/ap_bus.c | 138 +++++++++++++++++++++++++++++++++++++++++--
> >> drivers/s390/crypto/ap_bus.h | 3 +
> >> 2 files changed, 135 insertions(+), 6 deletions(-)
> >
> > Hm... I recall objecting to this patch before, fearing that it makes it
> > possible for a bad actor to hog resources that can't be removed by
> > root, even forcefully. (I have not had time to look at the intervening
> > versions, so I might be missing something.)
> >
> > Is there a way for root to forcefully override this?
>
> You recall correctly; however, after many internal crypto team
> discussions, it was decided that this feature was important
> and should be kept.

That's the problem with internal discussions: they're internal :( Would
have been nice to have some summary of this in the changelog.

>
> Allow me to first address your fear that a bad actor can hog
> resources that can't be removed by root. With this enhancement,
> there is nothing preventing a root user from taking resources
> from a matrix mdev, it simply forces him/her to follow the
> proper procedure. The resources to be removed must first be
> unassigned from the matrix mdev to which they are assigned.
> The AP bus's /sys/bus/ap/apmask and /sys/bus/ap/aqmask
> sysfs attributes can then be edited to transfer ownership
> of the resources to zcrypt.

What is the suggested procedure when root wants to unbind a queue
device? Find the mdev using the queue (is that easy enough?), unassign
it, then unbind? Failing to unbind is a bit unexpected; can we point
the admin to the correct mdev from which the queue has to be removed
first?

>
> The rationale for keeping this feature is:
>
> * It is a bad idea to steal an adapter in use from a guest. In the worst
> case, the guest could end up without access to any crypto adapters
> without knowing why. This could lead to performance issues on guests
> that rely heavily on crypto such as guests used for blockchain
> transactions.

Yanking adapters out from a running guest is not a good idea, yes; but
I see it more as a problem of the management layer. Performance issues
etc. are not something we want, obviously; but is removing access to
the adapter deadly, or can the guest keep limping along? (Does the
guest have any chance to find out that the adapter is gone? What
happens on an LPAR if an adapter is gone, maybe due to a hardware
failure?)

>
> * There are plenty of examples in linux of the kernel preventing a root
> user from performing a task. For example, a module can't be removed
> if references are still held for it.

In this case, removing the module would actively break/crash anything
relying on it; I'm not sure how analogous the situation is here (i.e.
can we limp on without the device?)

> Another example would be trying
> to bind a CEX4 adapter to a device driver not registered for CEX4;
> this action will also be rejected.

I don't think this one is analogous at all: The device driver can't
drive the device, so why should it be able to bind to it?

>
> * The semantics are much cleaner and the logic is far less complicated.

This is actually the most convincing of the arguments, I think :) If we
need some really byzantine logic to allow unbinding, it's probably not
worth it.

>
> * It forces the use of the proper procedure to change ownership of AP
> queues.

This needs to be properly documented, and the admin needs to have a
chance to find out why unbinding didn't work and what needs to be done
(see my comments above).

2019-07-08 19:26:51

by Anthony Krowiak

[permalink] [raw]
Subject: Re: [PATCH v4 3/7] s390: zcrypt: driver callback to indicate resource in use

On 7/1/19 3:26 PM, Cornelia Huck wrote:
> On Wed, 19 Jun 2019 09:04:18 -0400
> Tony Krowiak <[email protected]> wrote:
>
>> On 6/18/19 12:25 PM, Cornelia Huck wrote:
>>> On Thu, 13 Jun 2019 15:39:36 -0400
>>> Tony Krowiak <[email protected]> wrote:
>>>
>>>> Introduces a new driver callback to prevent a root user from unbinding
>>>> an AP queue from its device driver if the queue is in use. This prevents
>>>> a root user from inadvertently taking a queue away from a guest and
>>>> giving it to the host, or vice versa. The callback will be invoked
>>>> whenever a change to the AP bus's apmask or aqmask sysfs interfaces may
>>>> result in one or more AP queues being removed from its driver. If the
>>>> callback responds in the affirmative for any driver queried, the change
>>>> to the apmask or aqmask will be rejected with a device in use error.
>>>>
>>>> For this patch, only non-default drivers will be queried. Currently,
>>>> there is only one non-default driver, the vfio_ap device driver. The
>>>> vfio_ap device driver manages AP queues passed through to one or more
>>>> guests and we don't want to unexpectedly take AP resources away from
>>>> guests which are most likely independently administered.
>>>>
>>>> Signed-off-by: Tony Krowiak <[email protected]>
>>>> ---
>>>> drivers/s390/crypto/ap_bus.c | 138 +++++++++++++++++++++++++++++++++++++++++--
>>>> drivers/s390/crypto/ap_bus.h | 3 +
>>>> 2 files changed, 135 insertions(+), 6 deletions(-)
>>>
>>> Hm... I recall objecting to this patch before, fearing that it makes it
>>> possible for a bad actor to hog resources that can't be removed by
>>> root, even forcefully. (I have not had time to look at the intervening
>>> versions, so I might be missing something.)
>>>
>>> Is there a way for root to forcefully override this?
>>
>> You recall correctly; however, after many internal crypto team
>> discussions, it was decided that this feature was important
>> and should be kept.
>
> That's the problem with internal discussions: they're internal :( Would
> have been nice to have some summary of this in the changelog.

I tried to summarize everything here.

>
>>
>> Allow me to first address your fear that a bad actor can hog
>> resources that can't be removed by root. With this enhancement,
>> there is nothing preventing a root user from taking resources
>> from a matrix mdev, it simply forces him/her to follow the
>> proper procedure. The resources to be removed must first be
>> unassigned from the matrix mdev to which they are assigned.
>> The AP bus's /sys/bus/ap/apmask and /sys/bus/ap/aqmask
>> sysfs attributes can then be edited to transfer ownership
>> of the resources to zcrypt.
>
> What is the suggested procedure when root wants to unbind a queue
> device? Find the mdev using the queue (is that easy enough?), unassign
> it, then unbind? Failing to unbind is a bit unexpected; can we point
> the admin to the correct mdev from which the queue has to be removed
> first?

The proper procedure is to first unassign the adapter, domain, or both
from the mdev to which the APQN is assigned. The difficulty in finding
the queue depends upon how many mdevs have been created. I would expect
that an admin would keep records of who owns what, but in the case he or
she doesn't, it would be a matter of printing out the matrix attribute
of each mdev until you find the mdev to which the APQN is assigned.
The only means I know of for informing the admin to which mdev a given
APQN is assigned is to log the error when it occurs. I think Matt is
also looking to provide query functions in the management tool on which
he is currently working.

>
>>
>> The rationale for keeping this feature is:
>>
>> * It is a bad idea to steal an adapter in use from a guest. In the worst
>> case, the guest could end up without access to any crypto adapters
>> without knowing why. This could lead to performance issues on guests
>> that rely heavily on crypto such as guests used for blockchain
>> transactions.
>
> Yanking adapters out from a running guest is not a good idea, yes; but
> I see it more as a problem of the management layer. Performance issues
> etc. are not something we want, obviously; but is removing access to
> the adapter deadly, or can the guest keep limping along? (Does the
> guest have any chance to find out that the adapter is gone? What
> happens on an LPAR if an adapter is gone, maybe due to a hardware
> failure?)

I don't think anybody is going to die if an adapter is yanked out;), but
if the guest has only one adapter, then other avenues for crypto
services would have to be used.

>
>>
>> * There are plenty of examples in linux of the kernel preventing a root
>> user from performing a task. For example, a module can't be removed
>> if references are still held for it.
>
> In this case, removing the module would actively break/crash anything
> relying on it; I'm not sure how analogous the situation is here (i.e.
> can we limp on without the device?)

I believe crypto libraries like libica revert to using other means -
such as crypto software or CPACF functions? - if no adapters are
available, so I don't think the guest is dead in the water as far as
crypto goes, but I'm certainly no expert in what happens above the AP
layer.

>
>> Another example would be trying
>> to bind a CEX4 adapter to a device driver not registered for CEX4;
>> this action will also be rejected.
>
> I don't think this one is analogous at all: The device driver can't
> drive the device, so why should it be able to bind to it?

Yes, probably a bad example

>
>>
>> * The semantics are much cleaner and the logic is far less complicated.
>
> This is actually the most convincing of the arguments, I think :) If we
> need some really byzantine logic to allow unbinding, it's probably not
> worth it.
>
>>
>> * It forces the use of the proper procedure to change ownership of AP
>> queues.
>
> This needs to be properly documented, and the admin needs to have a
> chance to find out why unbinding didn't work and what needs to be done
> (see my comments above).

I will create a section in the vfio-ap.txt document that comes with this
patch set describing the proper procedure for unbinding queues. Of
course, we'll make sure the official IBM doc also more thoroughly
describes this.

>

2019-07-09 11:03:41

by Cornelia Huck

[permalink] [raw]
Subject: Re: [PATCH v4 3/7] s390: zcrypt: driver callback to indicate resource in use

On Mon, 8 Jul 2019 10:27:11 -0400
Tony Krowiak <[email protected]> wrote:

> On 7/1/19 3:26 PM, Cornelia Huck wrote:
> > On Wed, 19 Jun 2019 09:04:18 -0400
> > Tony Krowiak <[email protected]> wrote:

> >> Allow me to first address your fear that a bad actor can hog
> >> resources that can't be removed by root. With this enhancement,
> >> there is nothing preventing a root user from taking resources
> >> from a matrix mdev, it simply forces him/her to follow the
> >> proper procedure. The resources to be removed must first be
> >> unassigned from the matrix mdev to which they are assigned.
> >> The AP bus's /sys/bus/ap/apmask and /sys/bus/ap/aqmask
> >> sysfs attributes can then be edited to transfer ownership
> >> of the resources to zcrypt.
> >
> > What is the suggested procedure when root wants to unbind a queue
> > device? Find the mdev using the queue (is that easy enough?), unassign
> > it, then unbind? Failing to unbind is a bit unexpected; can we point
> > the admin to the correct mdev from which the queue has to be removed
> > first?
>
> The proper procedure is to first unassign the adapter, domain, or both
> from the mdev to which the APQN is assigned. The difficulty in finding
> the queue depends upon how many mdevs have been created. I would expect
> that an admin would keep records of who owns what, but in the case he or
> she doesn't, it would be a matter of printing out the matrix attribute
> of each mdev until you find the mdev to which the APQN is assigned.

Ok, so the information is basically available, if needed.

> The only means I know of for informing the admin to which mdev a given
> APQN is assigned is to log the error when it occurs.

That might be helpful, if it's easy to do.

> I think Matt is
> also looking to provide query functions in the management tool on which
> he is currently working.

That also sounds helpful.

(...)

> >> * It forces the use of the proper procedure to change ownership of AP
> >> queues.
> >
> > This needs to be properly documented, and the admin needs to have a
> > chance to find out why unbinding didn't work and what needs to be done
> > (see my comments above).
>
> I will create a section in the vfio-ap.txt document that comes with this
> patch set describing the proper procedure for unbinding queues. Of
> course, we'll make sure the official IBM doc also more thoroughly
> describes this.

+1 for good documentation.

With that, I don't really object to this change.

2019-07-09 15:32:51

by Halil Pasic

[permalink] [raw]
Subject: Re: [PATCH v4 0/7] s390: vfio-ap: dynamic configuration support

On Thu, 13 Jun 2019 15:39:33 -0400
Tony Krowiak <[email protected]> wrote:

> The current design for AP pass-through does not support making dynamic
> changes to the AP matrix of a running guest

Sorry guys, I currently don't have the bandwidth to participate in this
discussion. Please proceed without me ;)

Regards,
Halil

2019-07-09 21:16:46

by Anthony Krowiak

[permalink] [raw]
Subject: Re: [PATCH v4 3/7] s390: zcrypt: driver callback to indicate resource in use

On 7/9/19 6:49 AM, Cornelia Huck wrote:
> On Mon, 8 Jul 2019 10:27:11 -0400
> Tony Krowiak <[email protected]> wrote:
>
>> On 7/1/19 3:26 PM, Cornelia Huck wrote:
>>> On Wed, 19 Jun 2019 09:04:18 -0400
>>> Tony Krowiak <[email protected]> wrote:
>
>>>> Allow me to first address your fear that a bad actor can hog
>>>> resources that can't be removed by root. With this enhancement,
>>>> there is nothing preventing a root user from taking resources
>>>> from a matrix mdev, it simply forces him/her to follow the
>>>> proper procedure. The resources to be removed must first be
>>>> unassigned from the matrix mdev to which they are assigned.
>>>> The AP bus's /sys/bus/ap/apmask and /sys/bus/ap/aqmask
>>>> sysfs attributes can then be edited to transfer ownership
>>>> of the resources to zcrypt.
>>>
>>> What is the suggested procedure when root wants to unbind a queue
>>> device? Find the mdev using the queue (is that easy enough?), unassign
>>> it, then unbind? Failing to unbind is a bit unexpected; can we point
>>> the admin to the correct mdev from which the queue has to be removed
>>> first?
>>
>> The proper procedure is to first unassign the adapter, domain, or both
>> from the mdev to which the APQN is assigned. The difficulty in finding
>> the queue depends upon how many mdevs have been created. I would expect
>> that an admin would keep records of who owns what, but in the case he or
>> she doesn't, it would be a matter of printing out the matrix attribute
>> of each mdev until you find the mdev to which the APQN is assigned.
>
> Ok, so the information is basically available, if needed.
>
>> The only means I know of for informing the admin to which mdev a given
>> APQN is assigned is to log the error when it occurs.
>
> That might be helpful, if it's easy to do.
>
>> I think Matt is
>> also looking to provide query functions in the management tool on which
>> he is currently working.
>
> That also sounds helpful.
>
> (...)
>
>>>> * It forces the use of the proper procedure to change ownership of AP
>>>> queues.
>>>
>>> This needs to be properly documented, and the admin needs to have a
>>> chance to find out why unbinding didn't work and what needs to be done
>>> (see my comments above).
>>
>> I will create a section in the vfio-ap.txt document that comes with this
>> patch set describing the proper procedure for unbinding queues. Of
>> course, we'll make sure the official IBM doc also more thoroughly
>> describes this.
>
> +1 for good documentation.
>
> With that, I don't really object to this change.

Then I will make the suggested changes and post v5 to the list.

>