Note: Patch 1 - s390/ap: introduce new ap function ap_get_qdev() - is not
a part of this series. It is a forthcoming patch that is a
prerequisite to this series and is being provided so this series
will compile.
The current design for AP pass-through does not support making dynamic
changes to the AP matrix of a running guest resulting in a few
deficiencies this patch series is intended to mitigate:
1. Adapters, domains and control domains can not be added to or removed
from a running guest. In order to modify a guest's AP configuration,
the guest must be terminated; only then can AP resources be assigned
to or unassigned from the guest's matrix mdev. The new AP
configuration becomes available to the guest when it is subsequently
restarted.
2. The AP bus's /sys/bus/ap/apmask and /sys/bus/ap/aqmask interfaces can
be modified by a root user without any restrictions. A change to
either mask can result in AP queue devices being unbound from the
vfio_ap device driver and bound to a zcrypt device driver even if a
guest is using the queues, thus giving the host access to the guest's
private crypto data and vice versa.
3. The APQNs derived from the Cartesian product of the APIDs of the
adapters and APQIs of the domains assigned to a matrix mdev must
reference an AP queue device bound to the vfio_ap device driver. The
AP architecture allows assignment of AP resources that are not
available to the system, so this artificial restriction is not
compliant with the architecture.
4. The AP configuration profile can be dynamically changed for the linux
host after a KVM guest is started. For example, a new domain can be
dynamically added to the configuration profile via the SE or an HMC
connected to a DPM enabled lpar. Likewise, AP adapters can be
dynamically configured (online state) and deconfigured (standby state)
using the SE, an SCLP command or an HMC connected to a DPM enabled
lpar. This can result in inadvertent sharing of AP queues between the
guest and host.
5. A root user can manually unbind an AP queue device representing a
queue in use by a KVM guest via the vfio_ap device driver's sysfs
unbind attribute. In this case, the guest will be using a queue that
is not bound to the driver which violates the device model.
This patch series introduces the following changes to the current design
to alleviate the shortcomings described above as well as to implement
more of the AP architecture:
1. A root user will be prevented from making changes to the AP bus's
/sys/bus/ap/apmask or /sys/bus/ap/aqmask if the ownership of an APQN
changes from the vfio_ap device driver to a zcrypt driver when the
APQN is assigned to a matrix mdev.
2. Allow a root user to hot plug/unplug AP adapters, domains and control
domains using the matrix mdev's assign/unassign attributes.
4. Allow assignment of an AP adapter or domain to a matrix mdev even if
it results in assignment of an APQN that does not reference an AP
queue device bound to the vfio_ap device driver, as long as the APQN
is not reserved for use by the default zcrypt drivers (also known as
over-provisioning of AP resources). Allowing over-provisioning of AP
resources better models the architecture which does not preclude
assigning AP resources that are not yet available in the system. Such
APQNs, however, will not be assigned to the guest using the matrix
mdev; only APQNs referencing AP queue devices bound to the vfio_ap
device driver will actually get assigned to the guest.
5. Handle dynamic changes to the AP device model.
1. Rationale for changes to AP bus's apmask/aqmask interfaces:
----------------------------------------------------------
Due to the extremely sensitive nature of cryptographic data, it is
imperative that great care be taken to ensure that such data is secured.
Allowing a root user, either inadvertently or maliciously, to configure
these masks such that a queue is shared between the host and a guest is
not only avoidable, it is advisable. It was suggested that this scenario
is better handled in user space with management software, but that does
not preclude a malicious administrator from using the sysfs interfaces
to gain access to a guest's crypto data. It was also suggested that this
scenario could be avoided by taking access to the adapter away from the
guest and zeroing out the queues prior to the vfio_ap driver releasing the
device; however, stealing an adapter in use from a guest as a by-product
of an operation is bad and will likely cause problems for the guest
unnecessarily. It was decided that the most effective solution with the
least number of negative side effects is to prevent the situation at the
source.
2. Rationale for hot plug/unplug using matrix mdev sysfs interfaces:
----------------------------------------------------------------
Allowing a user to hot plug/unplug AP resources using the matrix mdev
sysfs interfaces circumvents the need to terminate the guest in order to
modify its AP configuration. Allowing dynamic configuration makes
reconfiguring a guest's AP matrix much less disruptive.
3. Rationale for allowing over-provisioning of AP resources:
-----------------------------------------------------------
Allowing assignment of AP resources to a matrix mdev and ultimately to a
guest better models the AP architecture. The architecture does not
preclude assignment of unavailable AP resources. If a queue subsequently
becomes available while a guest using the matrix mdev to which its APQN
is assigned, the guest will be given access to it. If an APQN
is dynamically unassigned from the underlying host system, it will
automatically become unavailable to the guest.
Change log v6-v7:
----------------
* Added callbacks to AP bus:
- on_config_changed: Notifies implementing drivers that
the AP configuration has changed since last AP device scan.
- on_scan_complete: Notifies implementing drivers that the device scan
has completed.
- implemented on_config_changed and on_scan_complete callbacks for
vfio_ap device driver.
- updated vfio_ap device driver's probe and remove callbacks to handle
dynamic changes to the AP device model.
* Added code to filter APQNs when assigning AP resources to a KVM guest's
CRYCB
Change log v7-v8:
----------------
* Now logging a message when an attempt to reserve APQNs for the zcrypt
drivers will result in taking a queue away from a KVM guest to provide
the sysadmin a way to ascertain why the sysfs operation failed.
* Created locked and unlocked versions of the ap_parse_mask_str() function.
* Now using new interface provided by an AP bus patch -
s390/ap: introduce new ap function ap_get_qdev() - to retrieve
struct ap_queue representing an AP queue device. This patch is not a
part of this series but is a prerequisite for this series.
Change log v6-v7:
----------------
Change log v5-v6:
----------------
* Fixed a bug in ap_bus.c introduced with patch 2/7 of the v5
series. Harald Freudenberer pointed out that the mutex lock
for ap_perms_mutex in the apmask_store and aqmask_store functions
was not being freed.
* Removed patch 6/7 which added logging to the vfio_ap driver
to expedite acceptance of this series. The logging will be introduced
with a separate patch series to allow more time to explore options
such as DBF logging vs. tracepoints.
* Added 3 patches related to ensuring that APQNs that do not reference
AP queue devices bound to the vfio_ap device driver are not assigned
to the guest CRYCB:
Patch 4: Filter CRYCB bits for unavailable queue devices
Patch 5: sysfs attribute to display the guest CRYCB
Patch 6: update guest CRYCB in vfio_ap probe and remove callbacks
* Added a patch (Patch 9) to version the vfio_ap module.
* Reshuffled patches to allow the in_use callback implementation to
invoke the vfio_ap_mdev_verify_no_sharing() function introduced in
patch 2.
Change log v4-v5:
----------------
* Added a patch to provide kernel s390dbf debug logs for VFIO AP
Change log v3->v4:
-----------------
* Restored patches preventing root user from changing ownership of
APQNs from zcrypt drivers to the vfio_ap driver if the APQN is
assigned to an mdev.
* No longer enforcing requirement restricting guest access to
queues represented by a queue device bound to the vfio_ap
device driver.
* Removed shadow CRYCB and now directly updating the guest CRYCB
from the matrix mdev's matrix.
* Rebased the patch series on top of 'vfio: ap: AP Queue Interrupt
Control' patches.
* Disabled bind/unbind sysfs interfaces for vfio_ap driver
Change log v2->v3:
-----------------
* Allow guest access to an AP queue only if the queue is bound to
the vfio_ap device driver.
* Removed the patch to test CRYCB masks before taking the vCPUs
out of SIE. Now checking the shadow CRYCB in the vfio_ap driver.
Change log v1->v2:
-----------------
* Removed patches preventing root user from unbinding AP queues from
the vfio_ap device driver
* Introduced a shadow CRYCB in the vfio_ap driver to manage dynamic
changes to the AP guest configuration due to root user interventions
or hardware anomalies.
Harald Freudenberger (2):
s390/ap: introduce new ap function ap_get_qdev()
s390/zcrypt: Notify driver on config changed and scan complete
callbacks
Tony Krowiak (14):
s390/vfio-ap: use new AP bus interface to search for queue devices
s390/vfio-ap: manage link between queue struct and matrix mdev
s390/zcrypt: driver callback to indicate resource in use
s390/vfio-ap: implement in-use callback for vfio_ap driver
s390/vfio-ap: introduce shadow APCB
s390/vfio-ap: sysfs attribute to display the guest's matrix
s390/vfio-ap: filter matrix for unavailable queue devices
s390/vfio_ap: add qlink from ap_matrix_mdev struct to vfio_ap_queue
struct
s390/vfio-ap: allow assignment of unavailable AP queues to mdev device
s390/vfio-ap: allow configuration of matrix mdev in use by a KVM guest
s390/vfio-ap: allow hot plug/unplug of AP resources using mdev device
s390/vfio-ap: handle host AP config change notification
s390/vfio-ap: handle AP bus scan completed notification
s390/vfio-ap: handle probe/remove not due to host AP config changes
drivers/s390/crypto/ap_bus.c | 417 +++++++--
drivers/s390/crypto/ap_bus.h | 41 +-
drivers/s390/crypto/ap_card.c | 47 +-
drivers/s390/crypto/ap_queue.c | 10 +-
drivers/s390/crypto/vfio_ap_drv.c | 34 +-
drivers/s390/crypto/vfio_ap_ops.c | 1165 ++++++++++++++++++++-----
drivers/s390/crypto/vfio_ap_private.h | 23 +-
7 files changed, 1339 insertions(+), 398 deletions(-)
--
2.21.1
The matrix of adapters and domains configured in a guest's CRYCB may
differ from the matrix of adapters and domains assigned to the matrix mdev,
so this patch introduces a sysfs attribute to display the matrix of a guest
using the matrix mdev. For a matrix mdev denoted by $uuid, the crycb for a
guest using the matrix mdev can be displayed as follows:
cat /sys/devices/vfio_ap/matrix/$uuid/guest_matrix
If a guest is not using the matrix mdev at the time the crycb is displayed,
an error (ENODEV) will be returned.
Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/vfio_ap_ops.c | 58 +++++++++++++++++++++++++++++++
1 file changed, 58 insertions(+)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index b5ed36e2c948..779659074776 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -1086,6 +1086,63 @@ static ssize_t matrix_show(struct device *dev, struct device_attribute *attr,
}
static DEVICE_ATTR_RO(matrix);
+static ssize_t guest_matrix_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct mdev_device *mdev = mdev_from_dev(dev);
+ struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
+ char *bufpos = buf;
+ unsigned long apid;
+ unsigned long apqi;
+ unsigned long apid1;
+ unsigned long apqi1;
+ unsigned long napm_bits = matrix_mdev->shadow_apcb.apm_max + 1;
+ unsigned long naqm_bits = matrix_mdev->shadow_apcb.aqm_max + 1;
+ int nchars = 0;
+ int n;
+
+ if (!vfio_ap_mdev_has_crycb(matrix_mdev))
+ return -ENODEV;
+
+ apid1 = find_first_bit_inv(matrix_mdev->shadow_apcb.apm, napm_bits);
+ apqi1 = find_first_bit_inv(matrix_mdev->shadow_apcb.aqm, naqm_bits);
+
+ mutex_lock(&matrix_dev->lock);
+
+ if ((apid1 < napm_bits) && (apqi1 < naqm_bits)) {
+ for_each_set_bit_inv(apid, matrix_mdev->shadow_apcb.apm,
+ napm_bits) {
+ for_each_set_bit_inv(apqi,
+ matrix_mdev->shadow_apcb.aqm,
+ naqm_bits) {
+ n = sprintf(bufpos, "%02lx.%04lx\n", apid,
+ apqi);
+ bufpos += n;
+ nchars += n;
+ }
+ }
+ } else if (apid1 < napm_bits) {
+ for_each_set_bit_inv(apid, matrix_mdev->shadow_apcb.apm,
+ napm_bits) {
+ n = sprintf(bufpos, "%02lx.\n", apid);
+ bufpos += n;
+ nchars += n;
+ }
+ } else if (apqi1 < naqm_bits) {
+ for_each_set_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm,
+ naqm_bits) {
+ n = sprintf(bufpos, ".%04lx\n", apqi);
+ bufpos += n;
+ nchars += n;
+ }
+ }
+
+ mutex_unlock(&matrix_dev->lock);
+
+ return nchars;
+}
+static DEVICE_ATTR_RO(guest_matrix);
+
static struct attribute *vfio_ap_mdev_attrs[] = {
&dev_attr_assign_adapter.attr,
&dev_attr_unassign_adapter.attr,
@@ -1095,6 +1152,7 @@ static struct attribute *vfio_ap_mdev_attrs[] = {
&dev_attr_unassign_control_domain.attr,
&dev_attr_control_domains.attr,
&dev_attr_matrix.attr,
+ &dev_attr_guest_matrix.attr,
NULL,
};
--
2.21.1
The current support for pass-through crypto adapters does not allow
configuration of a matrix mdev when it is in use by a KVM guest. Let's
allow AP resources - i.e., adapters, domains and control domains - to be
assigned to or unassigned from a matrix mdev while it is in use by a guest.
This is in preparation for the introduction of support for dynamic
configuration of the AP matrix for a running KVM guest.
Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/vfio_ap_ops.c | 24 ------------------------
1 file changed, 24 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 68bdf80807c6..4f59f471b4d3 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -773,10 +773,6 @@ static ssize_t assign_adapter_store(struct device *dev,
struct mdev_device *mdev = mdev_from_dev(dev);
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
- /* If the guest is running, disallow assignment of adapter */
- if (matrix_mdev->kvm)
- return -EBUSY;
-
ret = kstrtoul(buf, 0, &apid);
if (ret)
return ret;
@@ -828,10 +824,6 @@ static ssize_t unassign_adapter_store(struct device *dev,
struct mdev_device *mdev = mdev_from_dev(dev);
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
- /* If the guest is running, disallow un-assignment of adapter */
- if (matrix_mdev->kvm)
- return -EBUSY;
-
ret = kstrtoul(buf, 0, &apid);
if (ret)
return ret;
@@ -891,10 +883,6 @@ static ssize_t assign_domain_store(struct device *dev,
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
unsigned long max_apqi = matrix_mdev->matrix.aqm_max;
- /* If the guest is running, disallow assignment of domain */
- if (matrix_mdev->kvm)
- return -EBUSY;
-
ret = kstrtoul(buf, 0, &apqi);
if (ret)
return ret;
@@ -946,10 +934,6 @@ static ssize_t unassign_domain_store(struct device *dev,
struct mdev_device *mdev = mdev_from_dev(dev);
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
- /* If the guest is running, disallow un-assignment of domain */
- if (matrix_mdev->kvm)
- return -EBUSY;
-
ret = kstrtoul(buf, 0, &apqi);
if (ret)
return ret;
@@ -991,10 +975,6 @@ static ssize_t assign_control_domain_store(struct device *dev,
struct mdev_device *mdev = mdev_from_dev(dev);
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
- /* If the guest is running, disallow assignment of control domain */
- if (matrix_mdev->kvm)
- return -EBUSY;
-
ret = kstrtoul(buf, 0, &id);
if (ret)
return ret;
@@ -1036,10 +1016,6 @@ static ssize_t unassign_control_domain_store(struct device *dev,
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
unsigned long max_domid = matrix_mdev->matrix.adm_max;
- /* If the guest is running, disallow un-assignment of control domain */
- if (matrix_mdev->kvm)
- return -EBUSY;
-
ret = kstrtoul(buf, 0, &domid);
if (ret)
return ret;
--
2.21.1
AP queue devices are probed or removed for reasons other than changes
to the host AP configuration:
* Each queue device associated with a card device will get created and
probed when the state of the AP adapter represented by the card device
dynamically changes from standby to online.
* Each queue device associated with a card device will get removed
when the state of the AP adapter to which the queue represented by the
queue device dynamically changes from online to standby.
* Each queue device associated with a card device will get removed
when the type of the AP adapter to which the queue represented by the
queue device dynamically changes.
* Each queue device associated with a card device will get removed
when the status of the queue represented by the queue device changes
from operating to check stop.
* AP queue devices can be manually bound to or unbound from the vfio_ap
device driver by a root user via the sysfs bind/unbind attributes of the
driver.
In response to a queue device probe or remove that is not the result of a
change to the host's AP configuration, if a KVM guest is using the matrix
mdev to which the APQN of the queue device is assigned, the vfio_ap device
driver must respond accordingly. In an ideal world, the queue corresponding
to the queue device being probed would be hot plugged into the guest.
Likewise, the queue corresponding to the queue device being removed would
be hot unplugged from the guest. Unfortunately, the AP architecture
precludes plugging or unplugging individual queues, so let's handle
the probe or remove of an AP queue device as follows:
Handling Probe
--------------
There are two requirements that must be met in order to give a
guest access to the queue corresponding to the queue device being probed:
* Each APQN derived from the APID of the queue device and the APQIs of the
domains already assigned to the guest's AP configuration must reference
a queue device bound to the vfio_ap device driver.
* Each APQN derived from the APQI of the queue device and the APIDs of the
adapters assigned to the guest's AP configuration must reference a queue
device bound to the vfio_ap device driver.
If the above conditions are met, the APQN will be assigned to the guest's
AP configuration and the guest will be given access to the queue.
Handling Remove
---------------
Since the AP architecture precludes us from taking access to an individual
queue from a guest, we are left with the choice of taking access away from
either the adapter or the domain to which the queue is connected. Access to
the adapter will be taken away because it is likely that most of the time,
the remove callback will be invoked because the adapter state has
transitioned from online to standby. In such a case, no queue connected
to the adapter will be available to access.
Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/vfio_ap_ops.c | 38 +++++++++++++++++++++++++++++++
1 file changed, 38 insertions(+)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index cfe93ff9cc8c..5ee60dac7ad1 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -1681,6 +1681,15 @@ static void vfio_ap_queue_link_mdev(struct vfio_ap_queue *q)
}
}
+void vfio_ap_mdev_hot_plug_queue(struct vfio_ap_queue *q)
+{
+ if ((q->matrix_mdev == NULL) || !vfio_ap_mdev_has_crycb(q->matrix_mdev))
+ return;
+
+ if (vfio_ap_mdev_config_shadow_apcb(q->matrix_mdev))
+ vfio_ap_mdev_commit_shadow_apcb(q->matrix_mdev);
+}
+
int vfio_ap_mdev_probe_queue(struct ap_queue *queue)
{
struct vfio_ap_queue *q;
@@ -1694,11 +1703,35 @@ int vfio_ap_mdev_probe_queue(struct ap_queue *queue)
q->apqn = queue->qid;
q->saved_isc = VFIO_AP_ISC_INVALID;
vfio_ap_queue_link_mdev(q);
+ /* Make sure we're not in the middle of an AP configuration change. */
+ if (!(matrix_dev->flags & AP_MATRIX_CFG_CHG))
+ vfio_ap_mdev_hot_plug_queue(q);
mutex_unlock(&matrix_dev->lock);
return 0;
}
+void vfio_ap_mdev_hot_unplug_queue(struct vfio_ap_queue *q)
+{
+ unsigned long apid = AP_QID_CARD(q->apqn);
+ unsigned long apqi = AP_QID_QUEUE(q->apqn);
+
+ if ((q->matrix_mdev == NULL) || !vfio_ap_mdev_has_crycb(q->matrix_mdev))
+ return;
+
+ /*
+ * If the APQN is assigned to the guest, then let's
+ * go ahead and unplug the adapter since the
+ * architecture does not provide a means to unplug
+ * an individual queue.
+ */
+ if (test_bit_inv(apid, q->matrix_mdev->shadow_apcb.apm) &&
+ test_bit_inv(apqi, q->matrix_mdev->shadow_apcb.aqm)) {
+ if (vfio_ap_mdev_unassign_guest_apid(q->matrix_mdev, apid))
+ vfio_ap_mdev_commit_shadow_apcb(q->matrix_mdev);
+ }
+}
+
void vfio_ap_mdev_remove_queue(struct ap_queue *queue)
{
struct vfio_ap_queue *q;
@@ -1706,6 +1739,11 @@ void vfio_ap_mdev_remove_queue(struct ap_queue *queue)
mutex_lock(&matrix_dev->lock);
q = dev_get_drvdata(&queue->ap_dev.device);
+
+ /* Make sure we're not in the middle of an AP configuration change. */
+ if (!(matrix_dev->flags & AP_MATRIX_CFG_CHG))
+ vfio_ap_mdev_hot_unplug_queue(q);
+
dev_set_drvdata(&queue->ap_dev.device, NULL);
apid = AP_QID_CARD(q->apqn);
apqi = AP_QID_QUEUE(q->apqn);
--
2.21.1
The APCB is a field within the CRYCB that provides the AP configuration
to a KVM guest. Let's introduce a shadow copy of the KVM guest's APCB and
maintain it for the lifespan of the guest.
Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/vfio_ap_ops.c | 33 +++++++++++++++++++++++----
drivers/s390/crypto/vfio_ap_private.h | 2 ++
2 files changed, 30 insertions(+), 5 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 2eebb2b6d2d4..b5ed36e2c948 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -292,14 +292,35 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
return 0;
}
+static void vfio_ap_matrix_clear_masks(struct ap_matrix *matrix)
+{
+ bitmap_clear(matrix->apm, 0, AP_DEVICES);
+ bitmap_clear(matrix->aqm, 0, AP_DOMAINS);
+ bitmap_clear(matrix->adm, 0, AP_DOMAINS);
+}
+
static void vfio_ap_matrix_init(struct ap_config_info *info,
struct ap_matrix *matrix)
{
+ vfio_ap_matrix_clear_masks(matrix);
matrix->apm_max = info->apxa ? info->Na : 63;
matrix->aqm_max = info->apxa ? info->Nd : 15;
matrix->adm_max = info->apxa ? info->Nd : 15;
}
+static bool vfio_ap_mdev_has_crycb(struct ap_matrix_mdev *matrix_mdev)
+{
+ return (matrix_mdev->kvm && matrix_mdev->kvm->arch.crypto.crycbd);
+}
+
+static void vfio_ap_mdev_commit_crycb(struct ap_matrix_mdev *matrix_mdev)
+{
+ kvm_arch_crypto_set_masks(matrix_mdev->kvm,
+ matrix_mdev->shadow_apcb.apm,
+ matrix_mdev->shadow_apcb.aqm,
+ matrix_mdev->shadow_apcb.adm);
+}
+
static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
{
struct ap_matrix_mdev *matrix_mdev;
@@ -315,6 +336,7 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
matrix_mdev->mdev = mdev;
vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
+ vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->shadow_apcb);
mdev_set_drvdata(mdev, matrix_mdev);
matrix_mdev->pqap_hook.hook = handle_pqap;
matrix_mdev->pqap_hook.owner = THIS_MODULE;
@@ -1168,13 +1190,12 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
if (ret)
return NOTIFY_DONE;
- /* If there is no CRYCB pointer, then we can't copy the masks */
- if (!matrix_mdev->kvm->arch.crypto.crycbd)
+ if (!vfio_ap_mdev_has_crycb(matrix_mdev))
return NOTIFY_DONE;
- kvm_arch_crypto_set_masks(matrix_mdev->kvm, matrix_mdev->matrix.apm,
- matrix_mdev->matrix.aqm,
- matrix_mdev->matrix.adm);
+ memcpy(&matrix_mdev->shadow_apcb, &matrix_mdev->matrix,
+ sizeof(matrix_mdev->shadow_apcb));
+ vfio_ap_mdev_commit_crycb(matrix_mdev);
return NOTIFY_OK;
}
@@ -1289,6 +1310,8 @@ static void vfio_ap_mdev_release(struct mdev_device *mdev)
kvm_put_kvm(matrix_mdev->kvm);
matrix_mdev->kvm = NULL;
}
+
+ vfio_ap_matrix_clear_masks(&matrix_mdev->shadow_apcb);
mutex_unlock(&matrix_dev->lock);
vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index ad2d5b6a2851..8e24a073166b 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -75,6 +75,7 @@ struct ap_matrix {
* @list: allows the ap_matrix_mdev struct to be added to a list
* @matrix: the adapters, usage domains and control domains assigned to the
* mediated matrix device.
+ * @shadow_apcb: the shadow copy of the APCB field of the KVM guest's CRYCB
* @group_notifier: notifier block used for specifying callback function for
* handling the VFIO_GROUP_NOTIFY_SET_KVM event
* @kvm: the struct holding guest's state
@@ -82,6 +83,7 @@ struct ap_matrix {
struct ap_matrix_mdev {
struct list_head node;
struct ap_matrix matrix;
+ struct ap_matrix shadow_apcb;
struct notifier_block group_notifier;
struct notifier_block iommu_notifier;
struct kvm *kvm;
--
2.21.1
Even though APQNs for queues that are not in the host's AP configuration
may be assigned to a matrix mdev, we do not want to set bits in the guest's
APCB for APQNs that do not reference AP queue devices bound to the vfio_ap
device driver. Ideally, it would be great if such APQNs could be filtered
out before setting the bits in the guest's APCB; however, the architecture
precludes filtering individual APQNs. Consequently, either the APID or APQI
must be filtered.
This patch introduces code to filter the APIDs or APQIs assigned to the
matrix mdev's AP configuration before assigning them to the guest's AP
configuration (i.e., APCB). We'll start by filtering the APIDs:
If an APQN assigned to the matrix mdev's AP configuration does not
reference a queue device bound to the vfio_ap device driver, the APID
will be filtered out (i.e., not assigned to the guest's APCB).
If every APID assigned to the matrix mdev is filtered out, then we'll try
filtering the APQI's:
If an APQN assigned to the matrix mdev's AP configuration does not
reference a queue device bound to the vfio_ap device driver, the APQI
will be filtered out (i.e., not assigned to the guest's APCB).
In any case, if after filtering either the APIDs or APQIs there are any
APQNs that can be assigned to the guest's APCB, they will be assigned and
the CRYCB will be hot plugged into the guest.
Example
=======
APQNs bound to vfio_ap device driver:
04.0004
04.0047
04.0054
05.0005
05.0047
05.0054
Assignments to matrix mdev:
APIDs APQIs -> APQNs
04 0004 04.0004
05 0005 04.0005
0047 04.0047
0054 04.0054
05.0004
05.0005
05.0047
04.0054
Filter APIDs:
APID 04 will be filtered because APQN 04.0005 is not bound.
APID 05 will be filtered because APQN 05.0004 is not bound.
APQNs remaining: None
Filter APQIs:
APQI 04 will be filtered because APQN 05.0004 is not bound.
APQI 05 will be filtered because APQN 04.0005 is not bound.
APQNs remaining: 04.0047, 04.0054, 05.0047, 05.0054
APQNs 04.0047, 04.0054, 05.0047, 05.0054 will be assigned to the CRYCB and
hot plugged into the KVM guest.
Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/vfio_ap_ops.c | 159 +++++++++++++++++++++++++++++-
1 file changed, 155 insertions(+), 4 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 779659074776..add442977b9a 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -313,7 +313,7 @@ static bool vfio_ap_mdev_has_crycb(struct ap_matrix_mdev *matrix_mdev)
return (matrix_mdev->kvm && matrix_mdev->kvm->arch.crypto.crycbd);
}
-static void vfio_ap_mdev_commit_crycb(struct ap_matrix_mdev *matrix_mdev)
+static void vfio_ap_mdev_commit_shadow_apcb(struct ap_matrix_mdev *matrix_mdev)
{
kvm_arch_crypto_set_masks(matrix_mdev->kvm,
matrix_mdev->shadow_apcb.apm,
@@ -584,6 +584,157 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev,
return 0;
}
+/**
+ * vfio_ap_mdev_filter_matrix
+ *
+ * Filter APQNs assigned to the matrix mdev that do not reference an AP queue
+ * device bound to the vfio_ap device driver.
+ *
+ * @matrix_mdev: the matrix mdev whose AP configuration is to be filtered
+ * @shadow_apcb: the shadow of the KVM guest's APCB (contains AP configuration
+ * for guest)
+ * @filter_apids: boolean value indicating whether the APQNs shall be filtered
+ * by APID (true) or by APQI (false).
+ *
+ * Returns the number of APQNs remaining after filtering is complete.
+ */
+static int vfio_ap_mdev_filter_matrix(struct ap_matrix_mdev *matrix_mdev,
+ struct ap_matrix *shadow_apcb,
+ bool filter_apids)
+{
+ unsigned long apid, apqi, apqn;
+
+ memcpy(shadow_apcb, &matrix_mdev->matrix, sizeof(*shadow_apcb));
+
+ for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {
+ /*
+ * If the APID is not assigned to the host AP configuration,
+ * we can not assign it to the guest's AP configuration
+ */
+ if (!test_bit_inv(apid,
+ (unsigned long *)matrix_dev->info.apm)) {
+ clear_bit_inv(apid, shadow_apcb->apm);
+ continue;
+ }
+
+ for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm,
+ AP_DOMAINS) {
+ /*
+ * If the APQI is not assigned to the host AP
+ * configuration, then it can not be assigned to the
+ * guest's AP configuration
+ */
+ if (!test_bit_inv(apqi, (unsigned long *)
+ matrix_dev->info.aqm)) {
+ clear_bit_inv(apqi, shadow_apcb->aqm);
+ continue;
+ }
+
+ /*
+ * If the APQN is not bound to the vfio_ap device
+ * driver, then we can't assign it to the guest's
+ * AP configuration. The AP architecture won't
+ * allow filtering of a single APQN, so if we're
+ * filtering APIDs, then filter the APID; otherwise,
+ * filter the APQI.
+ */
+ apqn = AP_MKQID(apid, apqi);
+ if (!vfio_ap_get_queue(apqn)) {
+ if (filter_apids)
+ clear_bit_inv(apid, shadow_apcb->apm);
+ else
+ clear_bit_inv(apqi, shadow_apcb->aqm);
+ break;
+ }
+ }
+
+ /*
+ * If we're filtering APQIs and all of them have been filtered,
+ * there's no need to continue filtering.
+ */
+ if (!filter_apids)
+ if (bitmap_empty(shadow_apcb->aqm, AP_DOMAINS))
+ break;
+ }
+
+ return bitmap_weight(shadow_apcb->apm, AP_DEVICES) *
+ bitmap_weight(shadow_apcb->aqm, AP_DOMAINS);
+}
+
+/**
+ * vfio_ap_mdev_config_shadow_apcb
+ *
+ * Configure the shadow of a KVM guest's APCB specifying the adapters, domains
+ * and control domains to be assigned to the guest. The shadow APCB will be
+ * configured after filtering the APQNs assigned to the matrix mdev that do not
+ * reference a queue device bound to the vfio_ap device driver.
+ *
+ * @matrix_mdev: the matrix mdev whose shadow APCB is to be configured.
+ *
+ * Returns true if the shadow APCB contents have been changed; otherwise,
+ * returns false.
+ */
+static bool vfio_ap_mdev_config_shadow_apcb(struct ap_matrix_mdev *matrix_mdev)
+{
+ int napm, naqm;
+ struct ap_matrix shadow_apcb;
+
+ vfio_ap_matrix_init(&matrix_dev->info, &shadow_apcb);
+ napm = bitmap_weight(matrix_mdev->matrix.apm, AP_DEVICES);
+ naqm = bitmap_weight(matrix_mdev->matrix.aqm, AP_DOMAINS);
+
+ /*
+ * If there are no APIDs or no APQIs assigned to the matrix mdev,
+ * then no APQNs shall be assigned to the guest CRYCB.
+ */
+ if ((napm != 0) || (naqm != 0)) {
+ /*
+ * Filter the APIDs assigned to the matrix mdev for APQNs that
+ * do not reference an AP queue device bound to the driver.
+ */
+ napm = vfio_ap_mdev_filter_matrix(matrix_mdev, &shadow_apcb,
+ true);
+ /*
+ * If there are no APQNs that can be assigned to the guest's
+ * CRYCB after filtering, then try filtering the APQIs.
+ */
+ if (napm == 0) {
+ naqm = vfio_ap_mdev_filter_matrix(matrix_mdev,
+ &shadow_apcb, false);
+
+ /*
+ * If there are no APQNs that can be assigned to the
+ * matrix mdev after filtering the APQIs, then no APQNs
+ * shall be assigned to the guest's CRYCB.
+ */
+ if (naqm == 0) {
+ bitmap_clear(shadow_apcb.apm, 0, AP_DEVICES);
+ bitmap_clear(shadow_apcb.aqm, 0, AP_DOMAINS);
+ }
+ }
+ }
+
+ /*
+ * If the guest's AP configuration has not changed, then return
+ * indicating such.
+ */
+ if (bitmap_equal(matrix_mdev->shadow_apcb.apm, shadow_apcb.apm,
+ AP_DEVICES) &&
+ bitmap_equal(matrix_mdev->shadow_apcb.aqm, shadow_apcb.aqm,
+ AP_DOMAINS) &&
+ bitmap_equal(matrix_mdev->shadow_apcb.adm, shadow_apcb.adm,
+ AP_DOMAINS))
+ return false;
+
+ /*
+ * Copy the changes to the guest's CRYCB, then return indicating that
+ * the guest's AP configuration has changed.
+ */
+ memcpy(&matrix_mdev->shadow_apcb, &shadow_apcb, sizeof(shadow_apcb));
+
+ return true;
+}
+
enum qlink_type {
LINK_APID,
LINK_APQI,
@@ -1251,9 +1402,8 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
if (!vfio_ap_mdev_has_crycb(matrix_mdev))
return NOTIFY_DONE;
- memcpy(&matrix_mdev->shadow_apcb, &matrix_mdev->matrix,
- sizeof(matrix_mdev->shadow_apcb));
- vfio_ap_mdev_commit_crycb(matrix_mdev);
+ if (vfio_ap_mdev_config_shadow_apcb(matrix_mdev))
+ vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
return NOTIFY_OK;
}
@@ -1363,6 +1513,7 @@ static void vfio_ap_mdev_release(struct mdev_device *mdev)
mutex_lock(&matrix_dev->lock);
if (matrix_mdev->kvm) {
kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
+ vfio_ap_matrix_clear_masks(&matrix_mdev->shadow_apcb);
matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
vfio_ap_mdev_reset_queues(mdev);
kvm_put_kvm(matrix_mdev->kvm);
--
2.21.1
I would greatly appreciate some attention to this patch series ... Please?
On 6/5/20 5:39 PM, Tony Krowiak wrote:
> Note: Patch 1 - s390/ap: introduce new ap function ap_get_qdev() - is not
> a part of this series. It is a forthcoming patch that is a
> prerequisite to this series and is being provided so this series
> will compile.
>
> The current design for AP pass-through does not support making dynamic
> changes to the AP matrix of a running guest resulting in a few
> deficiencies this patch series is intended to mitigate:
>
> 1. Adapters, domains and control domains can not be added to or removed
> from a running guest. In order to modify a guest's AP configuration,
> the guest must be terminated; only then can AP resources be assigned
> to or unassigned from the guest's matrix mdev. The new AP
> configuration becomes available to the guest when it is subsequently
> restarted.
>
> 2. The AP bus's /sys/bus/ap/apmask and /sys/bus/ap/aqmask interfaces can
> be modified by a root user without any restrictions. A change to
> either mask can result in AP queue devices being unbound from the
> vfio_ap device driver and bound to a zcrypt device driver even if a
> guest is using the queues, thus giving the host access to the guest's
> private crypto data and vice versa.
>
> 3. The APQNs derived from the Cartesian product of the APIDs of the
> adapters and APQIs of the domains assigned to a matrix mdev must
> reference an AP queue device bound to the vfio_ap device driver. The
> AP architecture allows assignment of AP resources that are not
> available to the system, so this artificial restriction is not
> compliant with the architecture.
>
> 4. The AP configuration profile can be dynamically changed for the linux
> host after a KVM guest is started. For example, a new domain can be
> dynamically added to the configuration profile via the SE or an HMC
> connected to a DPM enabled lpar. Likewise, AP adapters can be
> dynamically configured (online state) and deconfigured (standby state)
> using the SE, an SCLP command or an HMC connected to a DPM enabled
> lpar. This can result in inadvertent sharing of AP queues between the
> guest and host.
>
> 5. A root user can manually unbind an AP queue device representing a
> queue in use by a KVM guest via the vfio_ap device driver's sysfs
> unbind attribute. In this case, the guest will be using a queue that
> is not bound to the driver which violates the device model.
>
> This patch series introduces the following changes to the current design
> to alleviate the shortcomings described above as well as to implement
> more of the AP architecture:
>
> 1. A root user will be prevented from making changes to the AP bus's
> /sys/bus/ap/apmask or /sys/bus/ap/aqmask if the ownership of an APQN
> changes from the vfio_ap device driver to a zcrypt driver when the
> APQN is assigned to a matrix mdev.
>
> 2. Allow a root user to hot plug/unplug AP adapters, domains and control
> domains using the matrix mdev's assign/unassign attributes.
>
> 4. Allow assignment of an AP adapter or domain to a matrix mdev even if
> it results in assignment of an APQN that does not reference an AP
> queue device bound to the vfio_ap device driver, as long as the APQN
> is not reserved for use by the default zcrypt drivers (also known as
> over-provisioning of AP resources). Allowing over-provisioning of AP
> resources better models the architecture which does not preclude
> assigning AP resources that are not yet available in the system. Such
> APQNs, however, will not be assigned to the guest using the matrix
> mdev; only APQNs referencing AP queue devices bound to the vfio_ap
> device driver will actually get assigned to the guest.
>
> 5. Handle dynamic changes to the AP device model.
>
> 1. Rationale for changes to AP bus's apmask/aqmask interfaces:
> ----------------------------------------------------------
> Due to the extremely sensitive nature of cryptographic data, it is
> imperative that great care be taken to ensure that such data is secured.
> Allowing a root user, either inadvertently or maliciously, to configure
> these masks such that a queue is shared between the host and a guest is
> not only avoidable, it is advisable. It was suggested that this scenario
> is better handled in user space with management software, but that does
> not preclude a malicious administrator from using the sysfs interfaces
> to gain access to a guest's crypto data. It was also suggested that this
> scenario could be avoided by taking access to the adapter away from the
> guest and zeroing out the queues prior to the vfio_ap driver releasing the
> device; however, stealing an adapter in use from a guest as a by-product
> of an operation is bad and will likely cause problems for the guest
> unnecessarily. It was decided that the most effective solution with the
> least number of negative side effects is to prevent the situation at the
> source.
>
> 2. Rationale for hot plug/unplug using matrix mdev sysfs interfaces:
> ----------------------------------------------------------------
> Allowing a user to hot plug/unplug AP resources using the matrix mdev
> sysfs interfaces circumvents the need to terminate the guest in order to
> modify its AP configuration. Allowing dynamic configuration makes
> reconfiguring a guest's AP matrix much less disruptive.
>
> 3. Rationale for allowing over-provisioning of AP resources:
> -----------------------------------------------------------
> Allowing assignment of AP resources to a matrix mdev and ultimately to a
> guest better models the AP architecture. The architecture does not
> preclude assignment of unavailable AP resources. If a queue subsequently
> becomes available while a guest using the matrix mdev to which its APQN
> is assigned, the guest will be given access to it. If an APQN
> is dynamically unassigned from the underlying host system, it will
> automatically become unavailable to the guest.
>
> Change log v6-v7:
> ----------------
> * Added callbacks to AP bus:
> - on_config_changed: Notifies implementing drivers that
> the AP configuration has changed since last AP device scan.
> - on_scan_complete: Notifies implementing drivers that the device scan
> has completed.
> - implemented on_config_changed and on_scan_complete callbacks for
> vfio_ap device driver.
> - updated vfio_ap device driver's probe and remove callbacks to handle
> dynamic changes to the AP device model.
> * Added code to filter APQNs when assigning AP resources to a KVM guest's
> CRYCB
>
> Change log v7-v8:
> ----------------
> * Now logging a message when an attempt to reserve APQNs for the zcrypt
> drivers will result in taking a queue away from a KVM guest to provide
> the sysadmin a way to ascertain why the sysfs operation failed.
>
> * Created locked and unlocked versions of the ap_parse_mask_str() function.
>
> * Now using new interface provided by an AP bus patch -
> s390/ap: introduce new ap function ap_get_qdev() - to retrieve
> struct ap_queue representing an AP queue device. This patch is not a
> part of this series but is a prerequisite for this series.
>
> Change log v6-v7:
> ----------------
>
> Change log v5-v6:
> ----------------
> * Fixed a bug in ap_bus.c introduced with patch 2/7 of the v5
> series. Harald Freudenberer pointed out that the mutex lock
> for ap_perms_mutex in the apmask_store and aqmask_store functions
> was not being freed.
>
> * Removed patch 6/7 which added logging to the vfio_ap driver
> to expedite acceptance of this series. The logging will be introduced
> with a separate patch series to allow more time to explore options
> such as DBF logging vs. tracepoints.
>
> * Added 3 patches related to ensuring that APQNs that do not reference
> AP queue devices bound to the vfio_ap device driver are not assigned
> to the guest CRYCB:
>
> Patch 4: Filter CRYCB bits for unavailable queue devices
> Patch 5: sysfs attribute to display the guest CRYCB
> Patch 6: update guest CRYCB in vfio_ap probe and remove callbacks
>
> * Added a patch (Patch 9) to version the vfio_ap module.
>
> * Reshuffled patches to allow the in_use callback implementation to
> invoke the vfio_ap_mdev_verify_no_sharing() function introduced in
> patch 2.
>
> Change log v4-v5:
> ----------------
> * Added a patch to provide kernel s390dbf debug logs for VFIO AP
>
> Change log v3->v4:
> -----------------
> * Restored patches preventing root user from changing ownership of
> APQNs from zcrypt drivers to the vfio_ap driver if the APQN is
> assigned to an mdev.
>
> * No longer enforcing requirement restricting guest access to
> queues represented by a queue device bound to the vfio_ap
> device driver.
>
> * Removed shadow CRYCB and now directly updating the guest CRYCB
> from the matrix mdev's matrix.
>
> * Rebased the patch series on top of 'vfio: ap: AP Queue Interrupt
> Control' patches.
>
> * Disabled bind/unbind sysfs interfaces for vfio_ap driver
>
> Change log v2->v3:
> -----------------
> * Allow guest access to an AP queue only if the queue is bound to
> the vfio_ap device driver.
>
> * Removed the patch to test CRYCB masks before taking the vCPUs
> out of SIE. Now checking the shadow CRYCB in the vfio_ap driver.
>
> Change log v1->v2:
> -----------------
> * Removed patches preventing root user from unbinding AP queues from
> the vfio_ap device driver
> * Introduced a shadow CRYCB in the vfio_ap driver to manage dynamic
> changes to the AP guest configuration due to root user interventions
> or hardware anomalies.
>
> Harald Freudenberger (2):
> s390/ap: introduce new ap function ap_get_qdev()
> s390/zcrypt: Notify driver on config changed and scan complete
> callbacks
>
> Tony Krowiak (14):
> s390/vfio-ap: use new AP bus interface to search for queue devices
> s390/vfio-ap: manage link between queue struct and matrix mdev
> s390/zcrypt: driver callback to indicate resource in use
> s390/vfio-ap: implement in-use callback for vfio_ap driver
> s390/vfio-ap: introduce shadow APCB
> s390/vfio-ap: sysfs attribute to display the guest's matrix
> s390/vfio-ap: filter matrix for unavailable queue devices
> s390/vfio_ap: add qlink from ap_matrix_mdev struct to vfio_ap_queue
> struct
> s390/vfio-ap: allow assignment of unavailable AP queues to mdev device
> s390/vfio-ap: allow configuration of matrix mdev in use by a KVM guest
> s390/vfio-ap: allow hot plug/unplug of AP resources using mdev device
> s390/vfio-ap: handle host AP config change notification
> s390/vfio-ap: handle AP bus scan completed notification
> s390/vfio-ap: handle probe/remove not due to host AP config changes
>
> drivers/s390/crypto/ap_bus.c | 417 +++++++--
> drivers/s390/crypto/ap_bus.h | 41 +-
> drivers/s390/crypto/ap_card.c | 47 +-
> drivers/s390/crypto/ap_queue.c | 10 +-
> drivers/s390/crypto/vfio_ap_drv.c | 34 +-
> drivers/s390/crypto/vfio_ap_ops.c | 1165 ++++++++++++++++++++-----
> drivers/s390/crypto/vfio_ap_private.h | 23 +-
> 7 files changed, 1339 insertions(+), 398 deletions(-)
>
On 16.06.20 16:26, Tony Krowiak wrote:
> I would greatly appreciate some attention to this patch series ... Please?
Any idea about the kernel test build mails? Are these patches maybe against
a wrong tree?
On 6/16/20 11:31 AM, Christian Borntraeger wrote:
> On 16.06.20 16:26, Tony Krowiak wrote:
>> I would greatly appreciate some attention to this patch series ... Please?
> Any idea about the kernel test build mails? Are these patches maybe against
> a wrong tree?
I'm not sure why I don't see any of those warning messages; maybe I
need to set some build flag. In any case, I fixed them all.
>
>
Ping
On 6/5/20 5:39 PM, Tony Krowiak wrote:
> Note: Patch 1 - s390/ap: introduce new ap function ap_get_qdev() - is not
> a part of this series. It is a forthcoming patch that is a
> prerequisite to this series and is being provided so this series
> will compile.
>
> The current design for AP pass-through does not support making dynamic
> changes to the AP matrix of a running guest resulting in a few
> deficiencies this patch series is intended to mitigate:
>
> 1. Adapters, domains and control domains can not be added to or removed
> from a running guest. In order to modify a guest's AP configuration,
> the guest must be terminated; only then can AP resources be assigned
> to or unassigned from the guest's matrix mdev. The new AP
> configuration becomes available to the guest when it is subsequently
> restarted.
>
> 2. The AP bus's /sys/bus/ap/apmask and /sys/bus/ap/aqmask interfaces can
> be modified by a root user without any restrictions. A change to
> either mask can result in AP queue devices being unbound from the
> vfio_ap device driver and bound to a zcrypt device driver even if a
> guest is using the queues, thus giving the host access to the guest's
> private crypto data and vice versa.
>
> 3. The APQNs derived from the Cartesian product of the APIDs of the
> adapters and APQIs of the domains assigned to a matrix mdev must
> reference an AP queue device bound to the vfio_ap device driver. The
> AP architecture allows assignment of AP resources that are not
> available to the system, so this artificial restriction is not
> compliant with the architecture.
>
> 4. The AP configuration profile can be dynamically changed for the linux
> host after a KVM guest is started. For example, a new domain can be
> dynamically added to the configuration profile via the SE or an HMC
> connected to a DPM enabled lpar. Likewise, AP adapters can be
> dynamically configured (online state) and deconfigured (standby state)
> using the SE, an SCLP command or an HMC connected to a DPM enabled
> lpar. This can result in inadvertent sharing of AP queues between the
> guest and host.
>
> 5. A root user can manually unbind an AP queue device representing a
> queue in use by a KVM guest via the vfio_ap device driver's sysfs
> unbind attribute. In this case, the guest will be using a queue that
> is not bound to the driver which violates the device model.
>
> This patch series introduces the following changes to the current design
> to alleviate the shortcomings described above as well as to implement
> more of the AP architecture:
>
> 1. A root user will be prevented from making changes to the AP bus's
> /sys/bus/ap/apmask or /sys/bus/ap/aqmask if the ownership of an APQN
> changes from the vfio_ap device driver to a zcrypt driver when the
> APQN is assigned to a matrix mdev.
>
> 2. Allow a root user to hot plug/unplug AP adapters, domains and control
> domains using the matrix mdev's assign/unassign attributes.
>
> 4. Allow assignment of an AP adapter or domain to a matrix mdev even if
> it results in assignment of an APQN that does not reference an AP
> queue device bound to the vfio_ap device driver, as long as the APQN
> is not reserved for use by the default zcrypt drivers (also known as
> over-provisioning of AP resources). Allowing over-provisioning of AP
> resources better models the architecture which does not preclude
> assigning AP resources that are not yet available in the system. Such
> APQNs, however, will not be assigned to the guest using the matrix
> mdev; only APQNs referencing AP queue devices bound to the vfio_ap
> device driver will actually get assigned to the guest.
>
> 5. Handle dynamic changes to the AP device model.
>
> 1. Rationale for changes to AP bus's apmask/aqmask interfaces:
> ----------------------------------------------------------
> Due to the extremely sensitive nature of cryptographic data, it is
> imperative that great care be taken to ensure that such data is secured.
> Allowing a root user, either inadvertently or maliciously, to configure
> these masks such that a queue is shared between the host and a guest is
> not only avoidable, it is advisable. It was suggested that this scenario
> is better handled in user space with management software, but that does
> not preclude a malicious administrator from using the sysfs interfaces
> to gain access to a guest's crypto data. It was also suggested that this
> scenario could be avoided by taking access to the adapter away from the
> guest and zeroing out the queues prior to the vfio_ap driver releasing the
> device; however, stealing an adapter in use from a guest as a by-product
> of an operation is bad and will likely cause problems for the guest
> unnecessarily. It was decided that the most effective solution with the
> least number of negative side effects is to prevent the situation at the
> source.
>
> 2. Rationale for hot plug/unplug using matrix mdev sysfs interfaces:
> ----------------------------------------------------------------
> Allowing a user to hot plug/unplug AP resources using the matrix mdev
> sysfs interfaces circumvents the need to terminate the guest in order to
> modify its AP configuration. Allowing dynamic configuration makes
> reconfiguring a guest's AP matrix much less disruptive.
>
> 3. Rationale for allowing over-provisioning of AP resources:
> -----------------------------------------------------------
> Allowing assignment of AP resources to a matrix mdev and ultimately to a
> guest better models the AP architecture. The architecture does not
> preclude assignment of unavailable AP resources. If a queue subsequently
> becomes available while a guest using the matrix mdev to which its APQN
> is assigned, the guest will be given access to it. If an APQN
> is dynamically unassigned from the underlying host system, it will
> automatically become unavailable to the guest.
>
> Change log v6-v7:
> ----------------
> * Added callbacks to AP bus:
> - on_config_changed: Notifies implementing drivers that
> the AP configuration has changed since last AP device scan.
> - on_scan_complete: Notifies implementing drivers that the device scan
> has completed.
> - implemented on_config_changed and on_scan_complete callbacks for
> vfio_ap device driver.
> - updated vfio_ap device driver's probe and remove callbacks to handle
> dynamic changes to the AP device model.
> * Added code to filter APQNs when assigning AP resources to a KVM guest's
> CRYCB
>
> Change log v7-v8:
> ----------------
> * Now logging a message when an attempt to reserve APQNs for the zcrypt
> drivers will result in taking a queue away from a KVM guest to provide
> the sysadmin a way to ascertain why the sysfs operation failed.
>
> * Created locked and unlocked versions of the ap_parse_mask_str() function.
>
> * Now using new interface provided by an AP bus patch -
> s390/ap: introduce new ap function ap_get_qdev() - to retrieve
> struct ap_queue representing an AP queue device. This patch is not a
> part of this series but is a prerequisite for this series.
>
> Change log v6-v7:
> ----------------
>
> Change log v5-v6:
> ----------------
> * Fixed a bug in ap_bus.c introduced with patch 2/7 of the v5
> series. Harald Freudenberer pointed out that the mutex lock
> for ap_perms_mutex in the apmask_store and aqmask_store functions
> was not being freed.
>
> * Removed patch 6/7 which added logging to the vfio_ap driver
> to expedite acceptance of this series. The logging will be introduced
> with a separate patch series to allow more time to explore options
> such as DBF logging vs. tracepoints.
>
> * Added 3 patches related to ensuring that APQNs that do not reference
> AP queue devices bound to the vfio_ap device driver are not assigned
> to the guest CRYCB:
>
> Patch 4: Filter CRYCB bits for unavailable queue devices
> Patch 5: sysfs attribute to display the guest CRYCB
> Patch 6: update guest CRYCB in vfio_ap probe and remove callbacks
>
> * Added a patch (Patch 9) to version the vfio_ap module.
>
> * Reshuffled patches to allow the in_use callback implementation to
> invoke the vfio_ap_mdev_verify_no_sharing() function introduced in
> patch 2.
>
> Change log v4-v5:
> ----------------
> * Added a patch to provide kernel s390dbf debug logs for VFIO AP
>
> Change log v3->v4:
> -----------------
> * Restored patches preventing root user from changing ownership of
> APQNs from zcrypt drivers to the vfio_ap driver if the APQN is
> assigned to an mdev.
>
> * No longer enforcing requirement restricting guest access to
> queues represented by a queue device bound to the vfio_ap
> device driver.
>
> * Removed shadow CRYCB and now directly updating the guest CRYCB
> from the matrix mdev's matrix.
>
> * Rebased the patch series on top of 'vfio: ap: AP Queue Interrupt
> Control' patches.
>
> * Disabled bind/unbind sysfs interfaces for vfio_ap driver
>
> Change log v2->v3:
> -----------------
> * Allow guest access to an AP queue only if the queue is bound to
> the vfio_ap device driver.
>
> * Removed the patch to test CRYCB masks before taking the vCPUs
> out of SIE. Now checking the shadow CRYCB in the vfio_ap driver.
>
> Change log v1->v2:
> -----------------
> * Removed patches preventing root user from unbinding AP queues from
> the vfio_ap device driver
> * Introduced a shadow CRYCB in the vfio_ap driver to manage dynamic
> changes to the AP guest configuration due to root user interventions
> or hardware anomalies.
>
> Harald Freudenberger (2):
> s390/ap: introduce new ap function ap_get_qdev()
> s390/zcrypt: Notify driver on config changed and scan complete
> callbacks
>
> Tony Krowiak (14):
> s390/vfio-ap: use new AP bus interface to search for queue devices
> s390/vfio-ap: manage link between queue struct and matrix mdev
> s390/zcrypt: driver callback to indicate resource in use
> s390/vfio-ap: implement in-use callback for vfio_ap driver
> s390/vfio-ap: introduce shadow APCB
> s390/vfio-ap: sysfs attribute to display the guest's matrix
> s390/vfio-ap: filter matrix for unavailable queue devices
> s390/vfio_ap: add qlink from ap_matrix_mdev struct to vfio_ap_queue
> struct
> s390/vfio-ap: allow assignment of unavailable AP queues to mdev device
> s390/vfio-ap: allow configuration of matrix mdev in use by a KVM guest
> s390/vfio-ap: allow hot plug/unplug of AP resources using mdev device
> s390/vfio-ap: handle host AP config change notification
> s390/vfio-ap: handle AP bus scan completed notification
> s390/vfio-ap: handle probe/remove not due to host AP config changes
>
> drivers/s390/crypto/ap_bus.c | 417 +++++++--
> drivers/s390/crypto/ap_bus.h | 41 +-
> drivers/s390/crypto/ap_card.c | 47 +-
> drivers/s390/crypto/ap_queue.c | 10 +-
> drivers/s390/crypto/vfio_ap_drv.c | 34 +-
> drivers/s390/crypto/vfio_ap_ops.c | 1165 ++++++++++++++++++++-----
> drivers/s390/crypto/vfio_ap_private.h | 23 +-
> 7 files changed, 1339 insertions(+), 398 deletions(-)
>
This series has been on the mailing list since June 5th. It would be GREATLY
appreciated if these patches can get some attention so we can move
forward with providing dynamic Adjunct Processor configuration support for
our customers. Thanks in advance for your time.
On 6/5/20 5:39 PM, Tony Krowiak wrote:
> Note: Patch 1 - s390/ap: introduce new ap function ap_get_qdev() - is not
> a part of this series. It is a forthcoming patch that is a
> prerequisite to this series and is being provided so this series
> will compile.
>
> The current design for AP pass-through does not support making dynamic
> changes to the AP matrix of a running guest resulting in a few
> deficiencies this patch series is intended to mitigate:
>
> 1. Adapters, domains and control domains can not be added to or removed
> from a running guest. In order to modify a guest's AP configuration,
> the guest must be terminated; only then can AP resources be assigned
> to or unassigned from the guest's matrix mdev. The new AP
> configuration becomes available to the guest when it is subsequently
> restarted.
>
> 2. The AP bus's /sys/bus/ap/apmask and /sys/bus/ap/aqmask interfaces can
> be modified by a root user without any restrictions. A change to
> either mask can result in AP queue devices being unbound from the
> vfio_ap device driver and bound to a zcrypt device driver even if a
> guest is using the queues, thus giving the host access to the guest's
> private crypto data and vice versa.
>
> 3. The APQNs derived from the Cartesian product of the APIDs of the
> adapters and APQIs of the domains assigned to a matrix mdev must
> reference an AP queue device bound to the vfio_ap device driver. The
> AP architecture allows assignment of AP resources that are not
> available to the system, so this artificial restriction is not
> compliant with the architecture.
>
> 4. The AP configuration profile can be dynamically changed for the linux
> host after a KVM guest is started. For example, a new domain can be
> dynamically added to the configuration profile via the SE or an HMC
> connected to a DPM enabled lpar. Likewise, AP adapters can be
> dynamically configured (online state) and deconfigured (standby state)
> using the SE, an SCLP command or an HMC connected to a DPM enabled
> lpar. This can result in inadvertent sharing of AP queues between the
> guest and host.
>
> 5. A root user can manually unbind an AP queue device representing a
> queue in use by a KVM guest via the vfio_ap device driver's sysfs
> unbind attribute. In this case, the guest will be using a queue that
> is not bound to the driver which violates the device model.
>
> This patch series introduces the following changes to the current design
> to alleviate the shortcomings described above as well as to implement
> more of the AP architecture:
>
> 1. A root user will be prevented from making changes to the AP bus's
> /sys/bus/ap/apmask or /sys/bus/ap/aqmask if the ownership of an APQN
> changes from the vfio_ap device driver to a zcrypt driver when the
> APQN is assigned to a matrix mdev.
>
> 2. Allow a root user to hot plug/unplug AP adapters, domains and control
> domains using the matrix mdev's assign/unassign attributes.
>
> 4. Allow assignment of an AP adapter or domain to a matrix mdev even if
> it results in assignment of an APQN that does not reference an AP
> queue device bound to the vfio_ap device driver, as long as the APQN
> is not reserved for use by the default zcrypt drivers (also known as
> over-provisioning of AP resources). Allowing over-provisioning of AP
> resources better models the architecture which does not preclude
> assigning AP resources that are not yet available in the system. Such
> APQNs, however, will not be assigned to the guest using the matrix
> mdev; only APQNs referencing AP queue devices bound to the vfio_ap
> device driver will actually get assigned to the guest.
>
> 5. Handle dynamic changes to the AP device model.
>
> 1. Rationale for changes to AP bus's apmask/aqmask interfaces:
> ----------------------------------------------------------
> Due to the extremely sensitive nature of cryptographic data, it is
> imperative that great care be taken to ensure that such data is secured.
> Allowing a root user, either inadvertently or maliciously, to configure
> these masks such that a queue is shared between the host and a guest is
> not only avoidable, it is advisable. It was suggested that this scenario
> is better handled in user space with management software, but that does
> not preclude a malicious administrator from using the sysfs interfaces
> to gain access to a guest's crypto data. It was also suggested that this
> scenario could be avoided by taking access to the adapter away from the
> guest and zeroing out the queues prior to the vfio_ap driver releasing the
> device; however, stealing an adapter in use from a guest as a by-product
> of an operation is bad and will likely cause problems for the guest
> unnecessarily. It was decided that the most effective solution with the
> least number of negative side effects is to prevent the situation at the
> source.
>
> 2. Rationale for hot plug/unplug using matrix mdev sysfs interfaces:
> ----------------------------------------------------------------
> Allowing a user to hot plug/unplug AP resources using the matrix mdev
> sysfs interfaces circumvents the need to terminate the guest in order to
> modify its AP configuration. Allowing dynamic configuration makes
> reconfiguring a guest's AP matrix much less disruptive.
>
> 3. Rationale for allowing over-provisioning of AP resources:
> -----------------------------------------------------------
> Allowing assignment of AP resources to a matrix mdev and ultimately to a
> guest better models the AP architecture. The architecture does not
> preclude assignment of unavailable AP resources. If a queue subsequently
> becomes available while a guest using the matrix mdev to which its APQN
> is assigned, the guest will be given access to it. If an APQN
> is dynamically unassigned from the underlying host system, it will
> automatically become unavailable to the guest.
>
> Change log v6-v7:
> ----------------
> * Added callbacks to AP bus:
> - on_config_changed: Notifies implementing drivers that
> the AP configuration has changed since last AP device scan.
> - on_scan_complete: Notifies implementing drivers that the device scan
> has completed.
> - implemented on_config_changed and on_scan_complete callbacks for
> vfio_ap device driver.
> - updated vfio_ap device driver's probe and remove callbacks to handle
> dynamic changes to the AP device model.
> * Added code to filter APQNs when assigning AP resources to a KVM guest's
> CRYCB
>
> Change log v7-v8:
> ----------------
> * Now logging a message when an attempt to reserve APQNs for the zcrypt
> drivers will result in taking a queue away from a KVM guest to provide
> the sysadmin a way to ascertain why the sysfs operation failed.
>
> * Created locked and unlocked versions of the ap_parse_mask_str() function.
>
> * Now using new interface provided by an AP bus patch -
> s390/ap: introduce new ap function ap_get_qdev() - to retrieve
> struct ap_queue representing an AP queue device. This patch is not a
> part of this series but is a prerequisite for this series.
>
> Change log v6-v7:
> ----------------
>
> Change log v5-v6:
> ----------------
> * Fixed a bug in ap_bus.c introduced with patch 2/7 of the v5
> series. Harald Freudenberer pointed out that the mutex lock
> for ap_perms_mutex in the apmask_store and aqmask_store functions
> was not being freed.
>
> * Removed patch 6/7 which added logging to the vfio_ap driver
> to expedite acceptance of this series. The logging will be introduced
> with a separate patch series to allow more time to explore options
> such as DBF logging vs. tracepoints.
>
> * Added 3 patches related to ensuring that APQNs that do not reference
> AP queue devices bound to the vfio_ap device driver are not assigned
> to the guest CRYCB:
>
> Patch 4: Filter CRYCB bits for unavailable queue devices
> Patch 5: sysfs attribute to display the guest CRYCB
> Patch 6: update guest CRYCB in vfio_ap probe and remove callbacks
>
> * Added a patch (Patch 9) to version the vfio_ap module.
>
> * Reshuffled patches to allow the in_use callback implementation to
> invoke the vfio_ap_mdev_verify_no_sharing() function introduced in
> patch 2.
>
> Change log v4-v5:
> ----------------
> * Added a patch to provide kernel s390dbf debug logs for VFIO AP
>
> Change log v3->v4:
> -----------------
> * Restored patches preventing root user from changing ownership of
> APQNs from zcrypt drivers to the vfio_ap driver if the APQN is
> assigned to an mdev.
>
> * No longer enforcing requirement restricting guest access to
> queues represented by a queue device bound to the vfio_ap
> device driver.
>
> * Removed shadow CRYCB and now directly updating the guest CRYCB
> from the matrix mdev's matrix.
>
> * Rebased the patch series on top of 'vfio: ap: AP Queue Interrupt
> Control' patches.
>
> * Disabled bind/unbind sysfs interfaces for vfio_ap driver
>
> Change log v2->v3:
> -----------------
> * Allow guest access to an AP queue only if the queue is bound to
> the vfio_ap device driver.
>
> * Removed the patch to test CRYCB masks before taking the vCPUs
> out of SIE. Now checking the shadow CRYCB in the vfio_ap driver.
>
> Change log v1->v2:
> -----------------
> * Removed patches preventing root user from unbinding AP queues from
> the vfio_ap device driver
> * Introduced a shadow CRYCB in the vfio_ap driver to manage dynamic
> changes to the AP guest configuration due to root user interventions
> or hardware anomalies.
>
> Harald Freudenberger (2):
> s390/ap: introduce new ap function ap_get_qdev()
> s390/zcrypt: Notify driver on config changed and scan complete
> callbacks
>
> Tony Krowiak (14):
> s390/vfio-ap: use new AP bus interface to search for queue devices
> s390/vfio-ap: manage link between queue struct and matrix mdev
> s390/zcrypt: driver callback to indicate resource in use
> s390/vfio-ap: implement in-use callback for vfio_ap driver
> s390/vfio-ap: introduce shadow APCB
> s390/vfio-ap: sysfs attribute to display the guest's matrix
> s390/vfio-ap: filter matrix for unavailable queue devices
> s390/vfio_ap: add qlink from ap_matrix_mdev struct to vfio_ap_queue
> struct
> s390/vfio-ap: allow assignment of unavailable AP queues to mdev device
> s390/vfio-ap: allow configuration of matrix mdev in use by a KVM guest
> s390/vfio-ap: allow hot plug/unplug of AP resources using mdev device
> s390/vfio-ap: handle host AP config change notification
> s390/vfio-ap: handle AP bus scan completed notification
> s390/vfio-ap: handle probe/remove not due to host AP config changes
>
> drivers/s390/crypto/ap_bus.c | 417 +++++++--
> drivers/s390/crypto/ap_bus.h | 41 +-
> drivers/s390/crypto/ap_card.c | 47 +-
> drivers/s390/crypto/ap_queue.c | 10 +-
> drivers/s390/crypto/vfio_ap_drv.c | 34 +-
> drivers/s390/crypto/vfio_ap_ops.c | 1165 ++++++++++++++++++++-----
> drivers/s390/crypto/vfio_ap_private.h | 23 +-
> 7 files changed, 1339 insertions(+), 398 deletions(-)
>