From: Tony Krowiak <[email protected]>
Notes:
=====
* Patches 1-4 posted with this series are forthcoming via another patch
series and should not be reviewed with this series. They have been
included here because some of the functions in this patch series are
dependent upon them.
* In the previous review, Pierre Morel suggested that the remove callback
for the VFIO AP device driver check to see if a guest is using the
device being removed and if so, remove it from the guest. This would
require a hot unplug implementation. Due to the design complexity,
implementation of hot plug/unplug of AP devices is being deferred until
the next patch series.
=====
On s390, we have cryptographic coprocessor cards, which are modeled on
Linux as devices on the AP bus. Each card can be partitioned into domains
which can be thought of as a set of hardware registers for processing
crypto commands. Crypto commands are sent to a specific domain within a
card via a queue, which is identified as a (card,domain) tuple. We model
this as follows (assuming we have access to cards 3 and
4 and domains 1 and 2):
AP -> card3 -> queue (3,1)
            -> queue (3,2)
   -> card4 -> queue (4,1)
            -> queue (4,2)
If we want to virtualize this, we can use a feature provided by the
hardware. We basically attach a satellite control block to our main
hardware virtualization control block and the hardware takes care of
most of the rest.
For this control block, we don't specify explicit tuples, but a list of
cards and a list of domains. The guest will get access to the cross
product.
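As a rough illustration of the cross product, the following userspace C sketch (not code from this series) takes a card list and a domain list as 64-bit masks and enumerates the (card,domain) tuples a guest would reach. The struct apqn type, the cross_product() helper and the LSB-first bit numbering are assumptions made for the example; the real CRYCB masks are wider and are numbered from the leftmost bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One (card,domain) tuple, i.e. one AP queue. */
struct apqn {
	unsigned int card;
	unsigned int domain;
};

/*
 * Hypothetical helper: bit n of 'cards' set means card n is assigned and
 * bit n of 'domains' set means domain n is assigned (LSB-first numbering,
 * IDs 0-63 only). Stores up to 'max' reachable tuples in 'out' and returns
 * the total count, which is always (number of cards) x (number of domains).
 */
static size_t cross_product(uint64_t cards, uint64_t domains,
			    struct apqn *out, size_t max)
{
	size_t n = 0;

	assert(out != NULL || max == 0);
	for (unsigned int c = 0; c < 64; c++) {
		if (!(cards & (UINT64_C(1) << c)))
			continue;
		for (unsigned int d = 0; d < 64; d++) {
			if (!(domains & (UINT64_C(1) << d)))
				continue;
			if (n < max)
				out[n] = (struct apqn){ .card = c, .domain = d };
			n++;
		}
	}
	return n;
}
```

With cards 3 and 4 and domains 1 and 2 it yields (3,1), (3,2), (4,1) and (4,2), matching the example above.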
Because of this, we need to take care that the lists provided to
different guests don't overlap; i.e., we need to enforce sane
configurations. Otherwise, one guest may get access to things like
secret keys for another guest.
The idea of this patch set is to introduce a new device, the matrix
device. This matrix device hangs off a different root and acts as the
parent node for mdev devices.
If you now want to give the tuples (4,1) and (4,2) to a guest, you need
to do the following:
- Unbind the (4,1) and (4,2) tuples from their AP bus driver.
- Bind the (4,1) and (4,2) tuples to the vfio_ap driver.
- Create the mediated device.
- Assign card 4 and domains 1 and 2 to the mediated device.
QEMU will now simply consume the mediated device and things should work.
For a complete description of the architecture and concepts underlying the
design, see the Documentation/s390/vfio-ap.txt file included with this
patch set.
--
v5 => v6 Change log:
===================
* Added VSIE support - thanks to Pierre Morel
* Added VFIO_DEVICE_RESET ioctl
* Zeroize AP queues when the mediated device is released and when the
VFIO_DEVICE_RESET ioctl is invoked
* Removed arch/s390/kvm/kvm-ap.c and arch/s390/include/asm/kvm-ap.h and
moved guest matrix configuration into vfio driver
* Removed temporary interfaces to be supplied by AP bus
* Made the variable that keeps track of mdev instance count an atomic_t
type
* Removed code iterating through vm_list to determine if another guest has
a queue; instead, keep a list of matrix_mdev devices and verify against
that list. This removes the need for the kvm_lock.
* Added a sysfs attribute for the mediated matrix device to display the
matrix contained in the guest's CRYCB if a guest is using the mdev.
v4 => v5 Change log:
===================
* Verify AP queues bound to driver in mediated device open callback, prior
to configuring the matrix in the CRYCB
* Implement VFIO_DEVICE_RESET ioctl
* Zeroize queues on guest entry and exit
* Removed vnet from all referenced IBM email addresses
* Add synchronization in mdev create/remove and open/release.
* Added code to mdev open callback to ensure that a guest cannot open more
than one vfio-ap device.
* Interpret AP instructions by default
* Removed patch implementing interface to enable/disable AP interpretation,
since that will now be done by default
* Removed patch to reset crypto attributes for ALL vcpus. That will be
submitted as a single patch since it will not be needed in this series -
i.e., it was called from the interface to enable/disable AP instructions
* All code for initializing crypto for a guest has been moved back to
kvm-s390.c, kvm_s390_crypto_init(kvm) function
* Maintaining a module reference count for the vfio_ap module so it is not
removed while a guest with AP devices is running.
v3 => v4 Change log:
===================
* Resolved issue with enabling ZCRYPT when KVM is enabled by using
#ifdef ZCRYPT in relevant functions
* Added patch with a new function for resetting the crypto attributes
for all vcpus to resolve the issue raised with running vcpus getting out
of sync.
* Removed KVM_S390_VM_CRYPTO_INTERPRET_AP: Setting interpretive exec mode
from vfio_ap driver when mdev device is opened.
v2 => v3 Change log:
===================
* Set APIE in VCPU setup function
* Renamed patch 13/15 from
  "KVM: s390: Configure the guest's CRYCB" to
  "KVM: s390: Configure the guest's AP devices"
* Fixed problem with building arch/s390/kvm/kvm-ap.c when CONFIG_ZCRYPT
is not selected
* Removed patch introducing VSIE support for AP pending further
investigation
* Initialized AP maximum mask sizes - i.e., APM, AQM and ADM - from info
returned from PQAP(QCI) function
* Introduced a new device attribute to the KVM_S390_VM_CRYPTO attribute
group for setting a flag via the KVM_SET_DEVICE_ATTR ioctl to indicate
whether ECA_APIE should be set or not. The flag is used in the
kvm_s390_vcpu_crypto_setup() function to set ECA_APIE in the SIE block.
v1 => v2 Change log:
===================
* Added documentation vfio-ap.txt
* Renamed vfio_ap_matrix module and device driver to vfio_ap
* Use device core device list instead of maintaining list of matrix
devices in driver
* Added VSIE support for AP
* Create matrix device before registering VFIO AP device driver with the
AP bus
* Renamed the following files in drivers/s390/crypto:
  * vfio_ap_matrix_drv.c -> vfio_ap_drv.c
  * vfio_ap_matrix_private.h -> vfio_ap_private.h
  * vfio_ap_matrix_ops.c -> vfio_ap_ops.c
* arch/s390/include/asm/kvm/ap-matrix-config.h
  * Renamed to kvm-ap.h
  * Changed the data type of the bit mask fields for the matrix structure
    to unsigned long and create them with DECLARE_BITMAP
  * Changed #define prefixes from AP_MATRIX to KVM_AP
  * Changed function and structure prefixes from ap_matrix to kvm_ap
  * Added function interface to check if AP Extended Addressing (APXA)
    facility is installed
  * Added function interface to get the maximum ID for AP mask type
  * Added function interface to set the AP execution mode
* arch/s390/kvm/ap-matrix-config.c
  * Renamed to kvm-ap.c
  * Changed function prefixes from ap_matrix to kvm_ap
  * Added function to check if AP Extended Addressing (APXA) facility is
    installed
  * Added function to get the maximum ID for AP mask type
  * Added function to set the AP execution mode
  * Added a boolean parameter to the functions that retrieve the APM, AQM
    and ADM bit mask fields from the CRYCB. If true, then the function
    will clear the bits in the mask before returning a reference to it
  * Added validation to verify that APM, AQM and ADM bits that are set do
    not exceed the maximum ID value allowed
* arch/s390/include/asm/kvm_host.h
  * Changed define for ECA_AP to ECA_APIE - interpretive execution mode
  * Added a flag to struct kvm_s390_crypto to indicate whether the
    KVM_S390_VM_CPU_FEAT_AP CPU model feature for AP facilities is set
  * Added two CPU facility features to set STFLE.12 and STFLE.15
* arch/s390/kvm/kvm-s390.c
  * Added initialization for new KVM_S390_VM_CPU_FEAT_AP CPU model feature
  * Removed kvm_s390_apxa_installed() function
  * Replaced the call to the removed kvm_s390_apxa_installed() function
    with a call to the new kvm_ap_apxa_installed() function
  * Added code to kvm_s390_vcpu_crypto_setup() to set the new CPU model
    feature flag in the kvm_s390_crypto structure
  * Added CRYCB_FORMAT_MASK to mask CRYCBD
* arch/s390/tools/gen_facilities.c
  * Added STFLE.12 and STFLE.15 to struct facility_def
* drivers/s390/crypto/vfio_ap_matrix_private.h
  * Changed name of file to vfio_ap_private.h
  * Changed #define prefixes from VFIO_AP_MATRIX to VFIO_AP
  * struct ap_matrix: removed list fields and locks
  * struct vfio_ap_queue: removed list field
  * Renamed functions ap_matrix_mdev_register and ap_matrix_mdev_unregister
    to vfio_ap_mdev_register and vfio_ap_mdev_unregister respectively
* drivers/s390/crypto/vfio_ap_matrix_drv.c
  * Renamed file to drivers/s390/crypto/vfio_ap_drv.c
  * Changed all #define, structure and function prefixes to vfio_ap
  * probe function
    * Changed root device name for the matrix device to vfio_ap:
      i.e., /sys/devices/vfio_ap/matrix
    * No longer storing the AP queue device in a list; it is retrievable
      via the device core
    * Removed unnecessary check whether matrix device exists
    * Store the vfio_ap_queue structure in the private field of the
      ap_queue structure rather than using list interface
  * remove function
    * Retrieve vfio_ap_queue structure from the struct ap_queue private
      data rather than from a list
    * Removed unnecessary check
* drivers/s390/crypto/vfio_ap_matrix_ops.c
  * Renamed file to vfio_ap_ops.c
  * Changed #define prefixes from AP_MATRIX to VFIO_AP
  * Changed function name prefixes from ap_matrix to vfio_ap
  * Removed ioctl to configure the CRYCB
  * create function
    * Removed ap_matrix_mdev_find_by_uuid() function - function is provided
      by mdev core
    * Removed available_instances verification, provided by mdev core
    * Removed check to see if mediated device exists, handled by mdev core
  * notifier function
    * Configuring matrix here instead of via ioctl
    * Set interpretive execution mode for all VCPUs
  * Removed R/O attributes to display adapters and domains
  * Added an R/O attribute to display the matrix
  * assign_control_domain mdev attribute:
    * Removed check to see if the domain is installed on the Linux host
    * Added check to verify the control domain ID does not exceed the max
      value
  * assign_adapter mdev attribute:
    * Added check to verify the adapter ID does not exceed the max
      value
    * If any APQNs configured for the mediated matrix device that have an
      APID matching the adapter ID being assigned are not bound to the
      vfio_ap device driver, then it is assumed that the APQN is bound to
      another driver and the assignment will fail
  * assign_domain mdev attribute:
    * Added check to verify the domain ID does not exceed the max
      value
    * If any APQNs configured for the mediated matrix device that have an
      APQI matching the domain ID being assigned are not bound to the
      vfio_ap device driver, then it is assumed that the APQN is bound to
      another driver and the assignment will fail
* tools/arch/s390/include/uapi/asm/kvm.h
  * removed KVM_S390_VM_CPU_FEAT_AP feature definition
Harald Freudenberger (4):
s390/zcrypt: Add ZAPQ inline function.
s390/zcrypt: Review inline assembler constraints.
s390/zcrypt: Show load of cards and queues in sysfs
s390/zcrypt: Integrate ap_asm.h into include/asm/ap.h.
Pierre Morel (1):
KVM: s390: Handling of Crypto control block in VSIE
Tony Krowiak (16):
KVM: s390: CPU model support for AP virtualization
KVM: s390: refactor crypto initialization
s390: vfio-ap: base implementation of VFIO AP device driver
s390: vfio-ap: register matrix device with VFIO mdev framework
s390: vfio-ap: structure for storing mdev matrix
s390: vfio-ap: sysfs interfaces to configure adapters
s390: vfio-ap: sysfs interfaces to configure domains
s390: vfio-ap: sysfs interfaces to configure control domains
s390: vfio-ap: sysfs interface to view matrix mdev matrix
s390: vfio-ap: implement mediated device open callback
s390: vfio-ap: configure the guest's AP matrix
s390: vfio-ap: sysfs interface to view guest matrix
s390: vfio-ap: implement VFIO_DEVICE_GET_INFO ioctl
s390: vfio-ap: zeroize the AP queues.
s390: vfio-ap: implement VFIO_DEVICE_RESET ioctl
s390: doc: detailed specifications for AP virtualization
Documentation/s390/vfio-ap.txt | 575 ++++++++++++++++
MAINTAINERS | 12 +
arch/s390/Kconfig | 11 +
arch/s390/include/asm/ap.h | 284 +++++++-
arch/s390/include/asm/kvm_host.h | 4 +
arch/s390/include/uapi/asm/kvm.h | 1 +
arch/s390/kvm/kvm-s390.c | 93 ++--
arch/s390/kvm/vsie.c | 224 ++++++-
arch/s390/tools/gen_facilities.c | 3 +
drivers/s390/crypto/Makefile | 4 +
drivers/s390/crypto/ap_asm.h | 236 -------
drivers/s390/crypto/ap_bus.c | 21 +-
drivers/s390/crypto/ap_bus.h | 1 +
drivers/s390/crypto/ap_card.c | 1 -
drivers/s390/crypto/ap_queue.c | 1 -
drivers/s390/crypto/vfio_ap_drv.c | 159 +++++
drivers/s390/crypto/vfio_ap_ops.c | 1221 +++++++++++++++++++++++++++++++++
drivers/s390/crypto/vfio_ap_private.h | 97 +++
drivers/s390/crypto/zcrypt_card.c | 12 +
drivers/s390/crypto/zcrypt_queue.c | 12 +
include/uapi/linux/vfio.h | 2 +
samples/bpf/bpf_load.c | 62 ++
22 files changed, 2686 insertions(+), 350 deletions(-)
create mode 100644 Documentation/s390/vfio-ap.txt
delete mode 100644 drivers/s390/crypto/ap_asm.h
create mode 100644 drivers/s390/crypto/vfio_ap_drv.c
create mode 100644 drivers/s390/crypto/vfio_ap_ops.c
create mode 100644 drivers/s390/crypto/vfio_ap_private.h
Introduces a new CPU model feature and two CPU model
facilities to support AP virtualization for KVM guests.
CPU model feature:
The KVM_S390_VM_CPU_FEAT_AP feature indicates that
AP instructions are available on the guest. This
feature will be enabled by the kernel only if the AP
instructions are installed on the linux host. This feature
must be specifically turned on for the KVM guest from
userspace to use the VFIO AP device driver for guest
access to AP devices.
CPU model facilities:
1. AP Query Configuration Information (QCI) facility is installed.
This is indicated by setting facilities bit 12 for
the guest. The kernel will not enable this facility
for the guest if it is not set on the host. This facility
must not be set by userspace if the KVM_S390_VM_CPU_FEAT_AP
feature is not installed.
If this facility is not set for the KVM guest, then only
APQNs with an APQI less than 16 will be available to the
guest regardless of the guest's matrix configuration. This
is a limitation of the AP bus running on the guest.
2. AP Facilities Test facility (APFT) is installed.
This is indicated by setting facilities bit 15 for
the guest. The kernel will not enable this facility for
the guest if it is not set on the host. This facility
must not be set by userspace if the KVM_S390_VM_CPU_FEAT_AP
feature is not installed.
If this facility is not set for the KVM guest, then no
AP devices will be available to the guest regardless of
the guest's matrix configuration. This is a limitation
of the AP bus running on the guest.
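For reference, a small sketch of how these facility bits can be tested, assuming the usual s390 STFLE numbering in which bit 0 is the most significant bit of the first byte (so bit 12 is mask 0x08 and bit 15 is mask 0x01 of byte 1). facility_set(), facility_enable() and ap_facilities_consistent() are hypothetical helpers; the last one only encodes the rule stated above that neither facility may be set without KVM_S390_VM_CPU_FEAT_AP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Bit 0 is the leftmost (0x80) bit of list[0], as in the STFLE list. */
static bool facility_set(const unsigned char *list, size_t len,
			 unsigned int nr)
{
	if (nr / 8 >= len)
		return false;
	return (list[nr / 8] & (0x80 >> (nr % 8))) != 0;
}

static void facility_enable(unsigned char *list, size_t len, unsigned int nr)
{
	assert(nr / 8 < len);
	list[nr / 8] |= (unsigned char)(0x80 >> (nr % 8));
}

/*
 * Hypothetical CPU model sanity check: facilities 12 (QCI) and 15 (APFT)
 * may only be offered to the guest together with the AP feature.
 */
static bool ap_facilities_consistent(const unsigned char *list, size_t len,
				     bool feat_ap)
{
	return feat_ap || (!facility_set(list, len, 12) &&
			   !facility_set(list, len, 15));
}
```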
Reviewed-by: Christian Borntraeger <[email protected]>
Reviewed-by: Halil Pasic <[email protected]>
Signed-off-by: Tony Krowiak <[email protected]>
---
arch/s390/include/uapi/asm/kvm.h | 1 +
arch/s390/kvm/kvm-s390.c | 8 ++++++++
arch/s390/tools/gen_facilities.c | 3 +++
3 files changed, 12 insertions(+), 0 deletions(-)
diff --git a/arch/s390/include/uapi/asm/kvm.h b/arch/s390/include/uapi/asm/kvm.h
index 4cdaa55..a580dec 100644
--- a/arch/s390/include/uapi/asm/kvm.h
+++ b/arch/s390/include/uapi/asm/kvm.h
@@ -130,6 +130,7 @@ struct kvm_s390_vm_cpu_machine {
#define KVM_S390_VM_CPU_FEAT_PFMFI 11
#define KVM_S390_VM_CPU_FEAT_SIGPIF 12
#define KVM_S390_VM_CPU_FEAT_KSS 13
+#define KVM_S390_VM_CPU_FEAT_AP 14
struct kvm_s390_vm_cpu_feat {
__u64 feat[16];
};
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 3b7a515..d2208d4 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -40,6 +40,7 @@
#include <asm/sclp.h>
#include <asm/cpacf.h>
#include <asm/timex.h>
+#include <asm/ap.h>
#include "kvm-s390.h"
#include "gaccess.h"
@@ -366,6 +367,13 @@ static void kvm_s390_cpu_feat_init(void)
if (MACHINE_HAS_ESOP)
allow_cpu_feat(KVM_S390_VM_CPU_FEAT_ESOP);
+
+ /*
+ * Check if AP instructions installed on host
+ */
+ if (ap_instructions_available() == 0)
+ allow_cpu_feat(KVM_S390_VM_CPU_FEAT_AP);
+
/*
* We need SIE support, ESOP (PROT_READ protection for gmap_shadow),
* 64bit SCAO (SCA passthrough) and IDTE (for gmap_shadow unshadowing).
diff --git a/arch/s390/tools/gen_facilities.c b/arch/s390/tools/gen_facilities.c
index 90a8c9e..e0e2c19 100644
--- a/arch/s390/tools/gen_facilities.c
+++ b/arch/s390/tools/gen_facilities.c
@@ -106,6 +106,9 @@ struct facility_def {
.name = "FACILITIES_KVM_CPUMODEL",
.bits = (int[]){
+ 12, /* AP Query Configuration Information */
+ 15, /* AP Facilities Test */
+ 156, /* Execution Token facility */
-1 /* END */
}
},
--
1.7.1
From: Tony Krowiak <[email protected]>
Provides a sysfs interface to view the AP matrix configured for the
guest that is using the mdev matrix device.
The relevant sysfs structures are:
/sys/devices/vfio_ap
... [matrix]
...... [mdev_supported_types]
......... [vfio_ap-passthrough]
............ [devices]
............... [$uuid]
.................. guest_matrix
To view the matrix configured for the guest,
print the guest_matrix file:
cat guest_matrix
If no guest is using the device, then the output will be an empty
string.
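As a sketch of what the output looks like, the following userspace C code walks an adapter mask and a domain mask using leftmost-bit-first numbering (the numbering for_each_set_bit_inv() uses) and formats one %02lx.%04lx line per APQN, the same format guest_matrix_show() uses. It prints only the APQN lines; the patch below additionally emits a line with the bare adapter ID before each adapter's APQNs. The helper names are hypothetical, and the size bound reflects the assumption that a sysfs show buffer is one page, which the sprintf() calls in the patch do not check.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * MSB-first bit test: ID 0 is the leftmost bit of mask[0]. This mimics the
 * numbering walked by for_each_set_bit_inv(); it is a userspace sketch.
 */
static int test_bit_inv(unsigned long nr, const uint64_t *mask)
{
	return (int)((mask[nr / 64] >> (63 - nr % 64)) & 1);
}

/*
 * Hypothetical formatter: appends one "AA.QQQQ\n" line per (adapter,
 * domain) pair to buf, never writing more than size bytes (including the
 * terminating NUL), and returns the number of characters stored.
 */
static size_t format_apqns(char *buf, size_t size,
			   const uint64_t *apm, unsigned long napm,
			   const uint64_t *aqm, unsigned long naqm)
{
	size_t len = 0;

	assert(buf != NULL && size > 0);
	buf[0] = '\0';
	for (unsigned long apid = 0; apid < napm; apid++) {
		if (!test_bit_inv(apid, apm))
			continue;
		for (unsigned long apqi = 0; apqi < naqm; apqi++) {
			size_t room = size - len;
			int n;

			if (!test_bit_inv(apqi, aqm))
				continue;
			n = snprintf(buf + len, room, "%02lx.%04lx\n",
				     apid, apqi);
			if (n < 0 || (size_t)n >= room) {
				buf[len] = '\0'; /* drop the partial line */
				return len;
			}
			len += (size_t)n;
		}
	}
	assert(strlen(buf) == len);
	return len;
}
```

With adapter 4 and domains 1 and 2 assigned, the output is 04.0001 and 04.0002, one per line.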
Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/vfio_ap_ops.c | 39 +++++++++++++++++++++++++++++++++++++
1 files changed, 39 insertions(+), 0 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 98bd0a1..bc05d40 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -730,6 +730,44 @@ static ssize_t matrix_show(struct device *dev, struct device_attribute *attr,
}
DEVICE_ATTR_RO(matrix);
+static unsigned long *kvm_ap_get_crycb_apm(struct ap_matrix_mdev *matrix_mdev);
+
+static unsigned long *kvm_ap_get_crycb_aqm(struct ap_matrix_mdev *matrix_mdev);
+
+static ssize_t guest_matrix_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct mdev_device *mdev = mdev_from_dev(dev);
+ struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
+ char *bufpos = buf;
+ unsigned long apid;
+ unsigned long apqi;
+ unsigned long *apm, *aqm;
+ unsigned long napm = matrix_mdev->matrix.apm_max + 1;
+ unsigned long naqm = matrix_mdev->matrix.aqm_max + 1;
+ int nchars = 0;
+ int n;
+
+ if (!matrix_mdev->kvm)
+ return nchars;
+
+ apm = kvm_ap_get_crycb_apm(matrix_mdev);
+ for_each_set_bit_inv(apid, apm, napm) {
+ n = sprintf(bufpos, "%02lx\n", apid);
+ bufpos += n;
+ nchars += n;
+
+ aqm = kvm_ap_get_crycb_aqm(matrix_mdev);
+ for_each_set_bit_inv(apqi, aqm, naqm) {
+ n = sprintf(bufpos, "%02lx.%04lx\n", apid, apqi);
+ bufpos += n;
+ nchars += n;
+ }
+ }
+
+ return nchars;
+}
+DEVICE_ATTR_RO(guest_matrix);
static struct attribute *vfio_ap_mdev_attrs[] = {
&dev_attr_assign_adapter.attr,
@@ -740,6 +778,7 @@ static ssize_t matrix_show(struct device *dev, struct device_attribute *attr,
&dev_attr_unassign_control_domain.attr,
&dev_attr_control_domains.attr,
&dev_attr_matrix.attr,
+ &dev_attr_guest_matrix.attr,
NULL,
};
--
1.7.1
From: Tony Krowiak <[email protected]>
Implements the VFIO_DEVICE_RESET ioctl. This ioctl zeroizes
all of the AP queues assigned to the guest.
Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/vfio_ap_ops.c | 5 ++++-
1 files changed, 4 insertions(+), 1 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index e247491..4ff75c7 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -1161,7 +1161,7 @@ static int vfio_ap_mdev_get_device_info(unsigned long arg)
return -EINVAL;
}
- info.flags = VFIO_DEVICE_FLAGS_AP;
+ info.flags = VFIO_DEVICE_FLAGS_AP | VFIO_DEVICE_FLAGS_RESET;
info.num_regions = 0;
info.num_irqs = 0;
@@ -1177,6 +1177,9 @@ static ssize_t vfio_ap_mdev_ioctl(struct mdev_device *mdev,
case VFIO_DEVICE_GET_INFO:
ret = vfio_ap_mdev_get_device_info(arg);
break;
+ case VFIO_DEVICE_RESET:
+ ret = vfio_ap_mdev_reset_queues(mdev, true);
+ break;
default:
pr_err("%s: ioctl command %d is not a supported command",
VFIO_AP_MODULE_NAME, cmd);
--
1.7.1
Let's call PQAP(ZAPQ) to zeroize a queue:
* For each queue configured for a mediated matrix device
when it is released.
* When an AP queue is unbound from the VFIO AP device driver.
Zeroizing a queue resets the queue, clears all pending
messages for the queue entries and disables adapter interruptions
associated with the queue.
Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/vfio_ap_drv.c | 12 +++++++++++-
drivers/s390/crypto/vfio_ap_ops.c | 33 ++++++++++++++++++++++++++++++++-
drivers/s390/crypto/vfio_ap_private.h | 26 ++++++++++++++++++++++++++
3 files changed, 69 insertions(+), 2 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
index b6ff7a4..d09ffdc 100644
--- a/drivers/s390/crypto/vfio_ap_drv.c
+++ b/drivers/s390/crypto/vfio_ap_drv.c
@@ -53,7 +53,17 @@ static int vfio_ap_queue_dev_probe(struct ap_device *apdev)
static void vfio_ap_queue_dev_remove(struct ap_device *apdev)
{
- /* Nothing to do yet */
+ struct ap_queue *ap_queue = to_ap_queue(&apdev->device);
+
+ vfio_ap_reset_queue(AP_QID_CARD(ap_queue->qid),
+ AP_QID_QUEUE(ap_queue->qid));
+
+ /*
+ * TODO: Ensure that no guest is using the queue and handle it
+ * accordingly. The domain or possibly the adapter may have to
+ * be removed from the guest's configuration which would require
+ * hot unplug support which is forthcoming.
+ */
}
static void vfio_ap_matrix_dev_release(struct device *dev)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 79ac0d4..e247491 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -1054,6 +1054,37 @@ static int vfio_ap_mdev_open_once(struct ap_matrix_mdev *matrix_mdev)
return ret;
}
+static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev, bool force)
+{
+ int ret;
+ int rc = 0;
+ unsigned long apid, apqi;
+ struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
+ struct ap_matrix_dev *matrix_dev =
+ to_ap_matrix_dev(mdev_parent_dev(mdev));
+
+ ret = vfio_ap_verify_queues_reserved(matrix_dev, matrix_mdev->name,
+ &matrix_mdev->matrix);
+ if (ret)
+ return ret;
+
+ for_each_set_bit_inv(apid, matrix_mdev->matrix.apm,
+ matrix_mdev->matrix.apm_max + 1) {
+ for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm,
+ matrix_mdev->matrix.aqm_max + 1) {
+ ret = vfio_ap_reset_queue(apid, apqi);
+ if (ret) {
+ if (force)
+ rc = ret;
+ else
+ return ret;
+ }
+ }
+ }
+
+ return rc;
+}
+
static int vfio_ap_mdev_open(struct mdev_device *mdev)
{
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
@@ -1107,7 +1138,7 @@ static void vfio_ap_mdev_release(struct mdev_device *mdev)
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
kvm_ap_deconfigure_matrix(matrix_mdev);
-
+ vfio_ap_mdev_reset_queues(mdev, true);
vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
&matrix_mdev->group_notifier);
matrix_mdev->kvm = NULL;
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index 7792b45..97d80f3 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -12,6 +12,7 @@
#include <linux/types.h>
#include <linux/device.h>
#include <linux/mdev.h>
+#include <linux/delay.h>
#include "ap_bus.h"
@@ -68,4 +69,29 @@ struct ap_matrix_mdev {
extern int vfio_ap_mdev_register(struct ap_matrix_dev *matrix_dev);
extern void vfio_ap_mdev_unregister(struct ap_matrix_dev *matrix_dev);
+static inline int vfio_ap_reset_queue(unsigned long apid, unsigned long apqi)
+{
+ int count = 50;
+ struct ap_queue_status status;
+
+ while (count--) {
+ status = ap_zapq(AP_MKQID(apid, apqi));
+ switch (status.response_code) {
+ case AP_RESPONSE_NORMAL:
+ return 0;
+ case AP_RESPONSE_RESET_IN_PROGRESS:
+ case AP_RESPONSE_BUSY:
+ msleep(20);
+ break;
+ default:
+ pr_err("%s: error zeroizing %02lx.%04lx: response code %d\n",
+ VFIO_AP_MODULE_NAME, apid, apqi,
+ status.response_code);
+ return -EIO;
+ }
+ }
+
+ return -EBUSY;
+}
+
#endif /* _VFIO_AP_PRIVATE_H_ */
--
1.7.1
From: Pierre Morel <[email protected]>
Shadowing the crypto control block now supports APCB shadowing.
AP instruction interpretation for guest 3 through ECA.28 is shadowed when
guest 2 ECA.28 is set.
CRYCB is shadowed for APCB and wrapping keys.
CRYCB format 0 is now supported for both guests 2 and 3.
Shadow CRYCB always uses the guest 2 CRYCB format and it
follows that:
* Guest 3 CRYCB format 0 is supported with guest 2 CRYCB format 0,1 or 2
* Guest 3 CRYCB format 1 is supported with guest 2 CRYCB format 1 or 2
* Guest 3 CRYCB format 2 is supported with guest 2 CRYCB format 2
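The list above amounts to "guest 3 may not use a newer CRYCB format than guest 2". The sketch below restates the dispatch in copy_apcb() as a pure function so the combinations can be checked in isolation; the enum and function names are hypothetical. Unlike the inner switch for guest 2 format 2 in the patch, it also rejects an out-of-range guest 3 format.

```c
#include <assert.h>

/* Which shadowing routine applies; names follow the copy_apcb*() helpers. */
enum apcb_copy { APCB_INVALID, APCB_COPY0, APCB_COPY01, APCB_COPY1 };

/*
 * Sketch of the dispatch done by copy_apcb(): g2_fmt and g3_fmt are the
 * CRYCB formats (0, 1 or 2) of guest 2 and guest 3. A guest 3 format newer
 * than the guest 2 format cannot be shadowed (validity intercept).
 */
static enum apcb_copy select_apcb_copy(int g2_fmt, int g3_fmt)
{
	assert(g2_fmt >= 0 && g2_fmt <= 2); /* guest 2 format already valid */
	if (g3_fmt < 0 || g3_fmt > g2_fmt)
		return APCB_INVALID;
	if (g2_fmt < 2)
		return APCB_COPY0;  /* 64-bit APCB0 masks in and out */
	return g3_fmt == 2 ? APCB_COPY1 : APCB_COPY01;
}
```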
Signed-off-by: Pierre Morel <[email protected]>
Signed-off-by: Tony Krowiak <[email protected]>
---
arch/s390/kvm/vsie.c | 224 ++++++++++++++++++++++++++++++++++++++++++++------
1 files changed, 200 insertions(+), 24 deletions(-)
diff --git a/arch/s390/kvm/vsie.c b/arch/s390/kvm/vsie.c
index 84c89cb..25c8ccc 100644
--- a/arch/s390/kvm/vsie.c
+++ b/arch/s390/kvm/vsie.c
@@ -136,17 +136,8 @@ static int prepare_cpuflags(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
return 0;
}
-/*
- * Create a shadow copy of the crycb block and setup key wrapping, if
- * requested for guest 3 and enabled for guest 2.
- *
- * We only accept format-1 (no AP in g2), but convert it into format-2
- * There is nothing to do for format-0.
- *
- * Returns: - 0 if shadowed or nothing to do
- * - > 0 if control has to be given to guest 2
- */
-static int shadow_crycb(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
+/* Copy keys into the shadow CRYCB; only called if MSA3 is available. */
+static int copy_key_masks(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
{
struct kvm_s390_sie_block *scb_s = &vsie_page->scb_s;
struct kvm_s390_sie_block *scb_o = vsie_page->scb_o;
@@ -155,30 +146,17 @@ static int shadow_crycb(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
unsigned long *b1, *b2;
u8 ecb3_flags;
- scb_s->crycbd = 0;
- if (!(crycbd_o & vcpu->arch.sie_block->crycbd & CRYCB_FORMAT1))
- return 0;
- /* format-1 is supported with message-security-assist extension 3 */
- if (!test_kvm_facility(vcpu->kvm, 76))
- return 0;
/* we may only allow it if enabled for guest 2 */
ecb3_flags = scb_o->ecb3 & vcpu->arch.sie_block->ecb3 &
(ECB3_AES | ECB3_DEA);
if (!ecb3_flags)
return 0;
- if ((crycb_addr & PAGE_MASK) != ((crycb_addr + 128) & PAGE_MASK))
- return set_validity_icpt(scb_s, 0x003CU);
- else if (!crycb_addr)
- return set_validity_icpt(scb_s, 0x0039U);
-
/* copy only the wrapping keys */
if (read_guest_real(vcpu, crycb_addr + 72, &vsie_page->crycb, 56))
return set_validity_icpt(scb_s, 0x0035U);
scb_s->ecb3 |= ecb3_flags;
- scb_s->crycbd = ((__u32)(__u64) &vsie_page->crycb) | CRYCB_FORMAT1 |
- CRYCB_FORMAT2;
/* xor both blocks in one run */
b1 = (unsigned long *) vsie_page->crycb.dea_wrapping_key_mask;
@@ -189,6 +167,204 @@ static int shadow_crycb(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
return 0;
}
+/* Copy masks into APCB1 when g2 and g3 both use CRYCB format 2 */
+static int copy_apcb1(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
+{
+ struct kvm_s390_sie_block *scb_o = vsie_page->scb_o;
+ struct kvm_s390_sie_block *scb_s = &vsie_page->scb_s;
+ const uint32_t crycbd_o = READ_ONCE(scb_o->crycbd);
+ const u32 crycb_o = crycbd_o & 0x7ffffff8U;
+ struct kvm_s390_crypto_cb *crycb_h = &vcpu->kvm->arch.sie_page2->crycb;
+ struct kvm_s390_crypto_cb *crycb_s = &vsie_page->crycb;
+ unsigned long *apcb_s = (unsigned long *) &crycb_s->apcb1;
+ unsigned long *apcb_h = (unsigned long *) &crycb_h->apcb1;
+ int i;
+ u32 src;
+
+ src = crycb_o + offsetof(struct kvm_s390_crypto_cb, apcb1);
+ if (read_guest_real(vcpu, src, apcb_s, sizeof(struct kvm_s390_apcb1)))
+ return set_validity_icpt(scb_s, 0x0035U);
+
+ for (i = 0; i < sizeof(struct kvm_s390_apcb1) / sizeof(*apcb_s); i++)
+ apcb_s[i] &= apcb_h[i];
+
+ return 0;
+}
+
+/*
+ * Copy masks into the APCB when g2 uses CRYCB format 2 and g3 uses format 0
+ * or 1. In this case the shadow APCB uses the format 1 (APCB1) layout.
+ */
+static int copy_apcb01(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
+{
+ struct kvm_s390_sie_block *scb_o = vsie_page->scb_o;
+ struct kvm_s390_sie_block *scb_s = &vsie_page->scb_s;
+ const uint32_t crycbd_o = READ_ONCE(scb_o->crycbd);
+ const u32 crycb_o = crycbd_o & 0x7ffffff8U;
+ struct kvm_s390_apcb1 *apcb_h = &vcpu->kvm->arch.sie_page2->crycb.apcb1;
+ struct kvm_s390_apcb1 *apcb_s = &vsie_page->crycb.apcb1;
+ u32 src;
+
+ memset(apcb_s, 0, sizeof(*apcb_s));
+
+ src = crycb_o + offsetof(struct kvm_s390_crypto_cb, apcb0.apm[0]);
+ if (read_guest_real(vcpu, src, &apcb_s->apm[0], sizeof(__u64)))
+ return set_validity_icpt(scb_s, 0x0035U);
+
+ src = crycb_o + offsetof(struct kvm_s390_crypto_cb, apcb0.aqm[0]);
+ if (read_guest_real(vcpu, src, &apcb_s->aqm[0], sizeof(__u64)))
+ return set_validity_icpt(scb_s, 0x0035U);
+
+ src = crycb_o + offsetof(struct kvm_s390_crypto_cb, apcb0.adm[0]);
+ if (read_guest_real(vcpu, src, &apcb_s->adm[0], sizeof(__u64)))
+ return set_validity_icpt(scb_s, 0x0035U);
+
+ apcb_s->apm[0] &= apcb_h->apm[0];
+ apcb_s->aqm[0] &= apcb_h->aqm[0];
+ apcb_s->adm[0] &= apcb_h->adm[0];
+
+ return 0;
+}
+
+/* Copy masks into APCB0 when g2 uses CRYCB format 0 or 1 */
+static int copy_apcb0(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
+{
+ struct kvm_s390_sie_block *scb_o = vsie_page->scb_o;
+ struct kvm_s390_sie_block *scb_s = &vsie_page->scb_s;
+ const uint32_t crycbd_o = READ_ONCE(scb_o->crycbd);
+ const u32 crycb_o = crycbd_o & 0x7ffffff8U;
+ struct kvm_s390_apcb0 *apcb_h = &vcpu->kvm->arch.sie_page2->crycb.apcb0;
+ struct kvm_s390_apcb0 *apcb_s = &vsie_page->crycb.apcb0;
+ u32 src;
+
+ src = crycb_o + offsetof(struct kvm_s390_crypto_cb, apcb0.apm[0]);
+ if (read_guest_real(vcpu, src, &apcb_s->apm[0], sizeof(__u64)))
+ return set_validity_icpt(scb_s, 0x0035U);
+
+ src = crycb_o + offsetof(struct kvm_s390_crypto_cb, apcb0.aqm[0]);
+ if (read_guest_real(vcpu, src, &apcb_s->aqm[0], sizeof(__u64)))
+ return set_validity_icpt(scb_s, 0x0035U);
+
+ src = crycb_o + offsetof(struct kvm_s390_crypto_cb, apcb0.adm[0]);
+ if (read_guest_real(vcpu, src, &apcb_s->adm[0], sizeof(__u64)))
+ return set_validity_icpt(scb_s, 0x0035U);
+
+ apcb_s->apm[0] &= apcb_h->apm[0];
+ apcb_s->aqm[0] &= apcb_h->aqm[0];
+ apcb_s->adm[0] &= apcb_h->adm[0];
+
+ return 0;
+}
+
+/* Shadowing APCB depends on G2 and G3 CRYCB format */
+static int copy_apcb(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page,
+ int g2_fmt, int g3_fmt)
+{
+ struct kvm_s390_sie_block *scb_s = &vsie_page->scb_s;
+ int ret = 0;
+
+ switch (g2_fmt) {
+ case CRYCB_FORMAT0:
+ switch (g3_fmt) {
+ case CRYCB_FORMAT0:
+ ret = copy_apcb0(vcpu, vsie_page);
+ break;
+ default:
+ return set_validity_icpt(scb_s, 0x0020U);
+ }
+ break;
+ case CRYCB_FORMAT1:
+ switch (g3_fmt) {
+ case CRYCB_FORMAT1:
+ case CRYCB_FORMAT0: /* Fall through to copy APCB */
+ ret = copy_apcb0(vcpu, vsie_page);
+ break;
+ default:
+ return set_validity_icpt(scb_s, 0x0020U);
+ }
+ break;
+ case CRYCB_FORMAT2:
+ switch (g3_fmt) {
+ case CRYCB_FORMAT0:
+ case CRYCB_FORMAT1:
+ ret = copy_apcb01(vcpu, vsie_page);
+ break;
+ case CRYCB_FORMAT2:
+ ret = copy_apcb1(vcpu, vsie_page);
+ break;
+ }
+ break;
+ default:
+ /*
+ * The guest 2 format is valid, otherwise we could not get here.
+ */
+ break;
+ }
+
+ return ret;
+}
+
+/*
+ * Create a shadow copy of the crycb block.
+ * - Setup key wrapping, if requested for guest 3 and enabled for guest 2.
+ * - Shadow APCB if requested by guest 3 and enabled for guest 2 through
+ * ECA_APIE.
+ *
+ * The shadow CRYCB always uses the guest 2 CRYCB format; see copy_apcb()
+ * for the guest 3 formats that can be mapped onto it.
+ *
+ * Returns: - 0 if shadowed or nothing to do
+ * - > 0 if control has to be given to guest 2
+ * - < 0 if something went wrong on copy
+ */
+#define ECA_APIE 0x00000008
+static int shadow_crycb(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
+{
+ struct kvm_s390_sie_block *scb_s = &vsie_page->scb_s;
+ struct kvm_s390_sie_block *scb_o = vsie_page->scb_o;
+ const uint32_t crycbd_o = READ_ONCE(scb_o->crycbd);
+ const u32 crycb_addr = crycbd_o & 0x7ffffff8U;
+ int g2_fmt = vcpu->arch.sie_block->crycbd & CRYCB_FORMAT_MASK;
+ int g3_fmt = crycbd_o & CRYCB_FORMAT_MASK;
+ int g2_apie, g2_msa3, g3_apie, g3_msa3;
+ int size, ret;
+
+ /* crycb should not cross a page boundary */
+ size = (g3_fmt == CRYCB_FORMAT2) ? 0x100 : 0x80;
+ if ((crycb_addr & PAGE_MASK) != ((crycb_addr + size) & PAGE_MASK))
+ return set_validity_icpt(scb_s, 0x003CU);
+
+ g2_apie = vcpu->arch.sie_block->eca & ECA_APIE;
+ g3_apie = scb_o->eca & g2_apie;
+
+ g2_msa3 = test_kvm_facility(vcpu->kvm, 76);
+ g3_msa3 = (g3_fmt != CRYCB_FORMAT0) & g2_msa3;
+
+ scb_s->crycbd = 0;
+ /* If no AP instructions and no keys we just set crycbd to 0 */
+ if (!(g3_apie || g3_msa3))
+ return 0;
+
+ if (!crycb_addr)
+ return set_validity_icpt(scb_s, 0x0039U);
+
+ if (g3_apie) {
+ ret = copy_apcb(vcpu, vsie_page, g2_fmt, g3_fmt);
+ if (ret)
+ goto out;
+ scb_s->eca |= g3_apie;
+ }
+
+ if (g3_msa3)
+ ret = copy_key_masks(vcpu, vsie_page);
+
+ if (!ret)
+ scb_s->crycbd = ((__u32)(__u64) &vsie_page->crycb) | g2_fmt;
+
+out:
+ return ret;
+}
+
/* shadow (round up/down) the ibc to avoid validity icpt */
static void prepare_ibc(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
{
--
1.7.1
Introduces ioctl access to the VFIO AP matrix device driver
by implementing the VFIO_DEVICE_GET_INFO ioctl. This ioctl
provides information about the VFIO AP matrix device to
userspace (for example, QEMU).
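The argsz/minsz handshake performed by vfio_ap_mdev_get_device_info() can be illustrated with a small userspace-style sketch. It assumes a local stand-in, `struct demo_device_info`, that mirrors the four leading fields of `struct vfio_device_info`; all `demo_*` names are hypothetical and this is not the driver code itself.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the leading fields of struct vfio_device_info. */
struct demo_device_info {
	uint32_t argsz;
	uint32_t flags;
	uint32_t num_regions;
	uint32_t num_irqs;
};

/* Same value this series adds as VFIO_DEVICE_FLAGS_AP. */
#define DEMO_DEVICE_FLAGS_AP (1u << 5)

/* Userspace equivalent of the kernel's offsetofend() helper. */
#define DEMO_OFFSETOFEND(type, member) \
	(offsetof(type, member) + sizeof(((type *)0)->member))

/*
 * Reject callers whose argsz does not cover every field up to and
 * including num_irqs; otherwise report an AP device with no regions
 * and no IRQs, as the ioctl handler in this patch does.
 */
static int demo_get_device_info(struct demo_device_info *info)
{
	size_t minsz = DEMO_OFFSETOFEND(struct demo_device_info, num_irqs);

	if (info->argsz < minsz)
		return -EINVAL;

	info->flags = DEMO_DEVICE_FLAGS_AP;
	info->num_regions = 0;
	info->num_irqs = 0;
	return 0;
}
```

A caller sets argsz to sizeof(info) before issuing the request; checking only up to minsz lets the structure grow later without breaking older callers.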
Reviewed-by: Pierre Morel <[email protected]>
Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/vfio_ap_ops.c | 43 +++++++++++++++++++++++++++++++++++++
1 files changed, 43 insertions(+), 0 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index bc05d40..79ac0d4 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -1114,6 +1114,48 @@ static void vfio_ap_mdev_release(struct mdev_device *mdev)
module_put(THIS_MODULE);
}
+static int vfio_ap_mdev_get_device_info(unsigned long arg)
+{
+ unsigned long minsz;
+ struct vfio_device_info info;
+
+ minsz = offsetofend(struct vfio_device_info, num_irqs);
+
+ if (copy_from_user(&info, (void __user *)arg, minsz))
+ return -EFAULT;
+
+ if (info.argsz < minsz) {
+ pr_err("%s: Argument size %u less than min size %lu\n",
+ VFIO_AP_MODULE_NAME, info.argsz, minsz);
+ return -EINVAL;
+ }
+
+ info.flags = VFIO_DEVICE_FLAGS_AP;
+ info.num_regions = 0;
+ info.num_irqs = 0;
+
+ return copy_to_user((void __user *)arg, &info, minsz) ? -EFAULT : 0;
+}
+
+static long vfio_ap_mdev_ioctl(struct mdev_device *mdev,
+ unsigned int cmd, unsigned long arg)
+{
+ int ret;
+
+ switch (cmd) {
+ case VFIO_DEVICE_GET_INFO:
+ ret = vfio_ap_mdev_get_device_info(arg);
+ break;
+ default:
+ pr_err("%s: ioctl command %u is not a supported command\n",
+ VFIO_AP_MODULE_NAME, cmd);
+ ret = -EOPNOTSUPP;
+ break;
+ }
+
+ return ret;
+}
+
static const struct mdev_parent_ops vfio_ap_matrix_ops = {
.owner = THIS_MODULE,
.supported_type_groups = vfio_ap_mdev_type_groups,
@@ -1122,6 +1164,7 @@ static void vfio_ap_mdev_release(struct mdev_device *mdev)
.remove = vfio_ap_mdev_remove,
.open = vfio_ap_mdev_open,
.release = vfio_ap_mdev_release,
+ .ioctl = vfio_ap_mdev_ioctl,
};
int vfio_ap_mdev_register(struct ap_matrix_dev *matrix_dev)
--
1.7.1
Provides a sysfs interface to view the AP matrix configured for the
mediated matrix device.
The relevant sysfs structures are:
/sys/devices/vfio_ap
... [matrix]
...... [mdev_supported_types]
......... [vfio_ap-passthrough]
............ [devices]
...............[$uuid]
.................. matrix
To view the matrix configured for the mediated matrix device,
print the matrix file:
cat matrix
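The matrix file lists each assigned adapter ID followed by one APQN line (adapter.domain) for every assigned domain. The sketch below reproduces that output format in plain C, assuming the masks use the same MSB-0 bit numbering as the APM and AQM (bit 0 is the most significant bit of the first 64-bit word). `format_matrix()`, `put_line()` and `mask_test_inv()` are hypothetical helpers, not the driver's code.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Test bit nr using MSB-0 numbering: bit 0 is the MSB of word 0. */
static int mask_test_inv(const uint64_t *mask, unsigned int nr)
{
	return (mask[nr / 64] >> (63 - nr % 64)) & 1;
}

/* Append "aa\n" (apqi < 0) or "aa.dddd\n"; returns -1 if buf is full. */
static int put_line(char *buf, size_t len, size_t *pos,
		    unsigned int apid, int apqi)
{
	int n;

	if (apqi < 0)
		n = snprintf(buf + *pos, len - *pos, "%02x\n", apid);
	else
		n = snprintf(buf + *pos, len - *pos, "%02x.%04x\n",
			     apid, (unsigned int)apqi);
	if (n < 0 || (size_t)n >= len - *pos)
		return -1;
	*pos += (size_t)n;
	return 0;
}

/*
 * Mirror the layout of matrix_show(): one line per assigned adapter,
 * then one line per APQN formed with each assigned domain.
 * Returns the number of characters written, or -1 on truncation.
 */
static int format_matrix(char *buf, size_t len,
			 const uint64_t *apm, unsigned int napm,
			 const uint64_t *aqm, unsigned int naqm)
{
	size_t pos = 0;

	for (unsigned int apid = 0; apid < napm; apid++) {
		if (!mask_test_inv(apm, apid))
			continue;
		if (put_line(buf, len, &pos, apid, -1))
			return -1;
		for (unsigned int apqi = 0; apqi < naqm; apqi++)
			if (mask_test_inv(aqm, apqi) &&
			    put_line(buf, len, &pos, apid, (int)apqi))
				return -1;
	}
	return (int)pos;
}
```

For adapters 3 and 4 with domains 1 and 2 this yields `03`, `03.0001`, `03.0002`, `04`, `04.0001` and `04.0002`, one per line.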
Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/vfio_ap_ops.c | 31 +++++++++++++++++++++++++++++++
1 files changed, 31 insertions(+), 0 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index c8f31f3..bc7398d 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -697,6 +697,36 @@ static ssize_t control_domains_show(struct device *dev,
}
DEVICE_ATTR_RO(control_domains);
+static ssize_t matrix_show(struct device *dev, struct device_attribute *attr,
+ char *buf)
+{
+ struct mdev_device *mdev = mdev_from_dev(dev);
+ struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
+ char *bufpos = buf;
+ unsigned long apid;
+ unsigned long apqi;
+ unsigned long napm = matrix_mdev->matrix.apm_max + 1;
+ unsigned long naqm = matrix_mdev->matrix.aqm_max + 1;
+ int nchars = 0;
+ int n;
+
+ for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, napm) {
+ n = sprintf(bufpos, "%02lx\n", apid);
+ bufpos += n;
+ nchars += n;
+
+ for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, naqm) {
+ n = sprintf(bufpos, "%02lx.%04lx\n", apid, apqi);
+ bufpos += n;
+ nchars += n;
+ }
+ }
+
+ return nchars;
+}
+DEVICE_ATTR_RO(matrix);
+
+
static struct attribute *vfio_ap_mdev_attrs[] = {
&dev_attr_assign_adapter.attr,
&dev_attr_unassign_adapter.attr,
@@ -705,6 +735,7 @@ static ssize_t control_domains_show(struct device *dev,
&dev_attr_assign_control_domain.attr,
&dev_attr_unassign_control_domain.attr,
&dev_attr_control_domains.attr,
+ &dev_attr_matrix.attr,
NULL,
};
--
1.7.1
Provides the sysfs interfaces for assigning AP control domains
to and unassigning AP control domains from a mediated matrix device.
The IDs of the AP control domains assigned to the mediated matrix
device are stored in an AP domain mask (ADM). The bits in the ADM,
from most significant to least significant bit, correspond to
AP domain numbers 0 to 255. When a control domain is assigned,
the bit corresponding to its domain ID will be set in the ADM.
Likewise, when a domain is unassigned, the bit corresponding
to its domain ID will be cleared in the ADM.
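The MSB-first numbering differs from the usual Linux bitmap convention, where bit 0 is the least significant bit. The following sketch shows one way to map a domain number onto such a mask, assuming 64-bit words; `adm_set()`, `adm_clear()` and `adm_test()` are hypothetical helpers that play the role of set_bit_inv(), clear_bit_inv() and test_bit_inv() in the driver.

```c
#include <stdbool.h>
#include <stdint.h>

#define ADM_BITS 256
#define ADM_WORDS (ADM_BITS / 64)

/*
 * Mask selecting domain id within its word: id 0 is the MSB of word 0.
 * Callers must ensure id < ADM_BITS.
 */
static uint64_t adm_bit(unsigned int id)
{
	return 1ULL << (63 - id % 64);
}

static void adm_set(uint64_t adm[ADM_WORDS], unsigned int id)
{
	adm[id / 64] |= adm_bit(id);
}

static void adm_clear(uint64_t adm[ADM_WORDS], unsigned int id)
{
	adm[id / 64] &= ~adm_bit(id);
}

static bool adm_test(const uint64_t adm[ADM_WORDS], unsigned int id)
{
	return adm[id / 64] & adm_bit(id);
}
```

Domain 173 (0xad), for example, lands in word 2 at bit 18 counted from the least significant end, because 173 = 2 * 64 + 45 and 63 - 45 = 18.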
The relevant sysfs structures are:
/sys/devices/vfio_ap
... [matrix]
...... [mdev_supported_types]
......... [vfio_ap-passthrough]
............ [devices]
...............[$uuid]
.................. assign_control_domain
.................. unassign_control_domain
To assign a control domain to the $uuid mediated matrix device's
ADM, write its domain number to the assign_control_domain file.
To unassign a domain, write its domain number to the
unassign_control_domain file. The domain number is specified
using conventional semantics: If it begins with 0x the number
will be parsed as a hexadecimal (case insensitive) number;
if it begins with 0, it is parsed as an octal number;
otherwise, it will be parsed as a decimal number.
For example, to assign control domain 173 (0xad) to the mediated
matrix device $uuid:
echo 173 > assign_control_domain
or
echo 0255 > assign_control_domain
or
echo 0xad > assign_control_domain
To unassign control domain 173 (0xad):
echo 173 > unassign_control_domain
or
echo 0255 > unassign_control_domain
or
echo 0xad > unassign_control_domain
The assignment will be rejected if the domain number exceeds the
maximum value for an AP domain:
* If the AP Extended Addressing (APXA) facility is installed,
the max value is 255
* Else the max value is 15
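The parsing rules and the range check can be approximated in userspace with strtoul() and base 0, which applies the same 0x/0/decimal prefix rules. This is a sketch, not the driver's code: `parse_control_domain()` and its `apxa` flag are hypothetical, and the kernel's kstrtoul() differs in details (for example, it rejects a leading '-' and tolerates only a single trailing newline), which the sketch imitates.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Parse a control domain number as written to assign_control_domain.
 * Accepts hex (0x...), octal (0...) or decimal, optionally followed by
 * one newline, and rejects values above 255 (APXA installed) or 15.
 */
static int parse_control_domain(const char *buf, bool apxa,
				unsigned long *id)
{
	unsigned long max_id = apxa ? 255 : 15;
	unsigned long val;
	char *end;

	/* strtoul() would also accept leading blanks and a '-' sign. */
	if (*buf < '0' || *buf > '9')
		return -EINVAL;

	errno = 0;
	val = strtoul(buf, &end, 0);
	if (errno == ERANGE)
		return -ERANGE;
	if (*end == '\n')
		end++;
	if (*end != '\0' || val > max_id)
		return -EINVAL;

	*id = val;
	return 0;
}
```

All three spellings used in the examples above ("173", "0255" and "0xad") parse to the same value, 173, which is accepted only when APXA is installed.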
Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/vfio_ap_ops.c | 114 +++++++++++++++++++++++++++++++++++++
1 files changed, 114 insertions(+), 0 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index a5b06e7..c8f31f3 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -586,11 +586,125 @@ static ssize_t unassign_domain_store(struct device *dev,
}
DEVICE_ATTR_WO(unassign_domain);
+
+/**
+ * assign_control_domain_store
+ *
+ * @dev: the matrix device
+ * @attr: a mediated matrix device attribute
+ * @buf: a buffer containing the control domain ID to be assigned
+ * @count: the number of bytes in @buf
+ *
+ * Parses the domain ID from @buf and assigns it to the mediated matrix device.
+ *
+ * Returns the number of bytes processed if the domain ID is valid; otherwise
+ * returns an error.
+ */
+static ssize_t assign_control_domain_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ int ret;
+ unsigned long id;
+ struct mdev_device *mdev = mdev_from_dev(dev);
+ struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
+ unsigned long maxid = matrix_mdev->matrix.adm_max;
+
+ ret = kstrtoul(buf, 0, &id);
+ if (ret || (id > maxid)) {
+ pr_err("%s: %s: control domain id '%s' not a value from 0 to %02lu(%#04lx)",
+ VFIO_AP_MODULE_NAME, __func__, buf, maxid, maxid);
+
+ return ret ? ret : -EINVAL;
+ }
+
+ /* Set the bit in the ADM (bitmask) corresponding to the AP control
+ * domain number (id). The bits in the mask, from most significant to
+ * least significant, correspond to IDs 0 up to one less than the
+ * number of control domains that can be assigned.
+ */
+ set_bit_inv(id, matrix_mdev->matrix.adm);
+
+ return count;
+}
+DEVICE_ATTR_WO(assign_control_domain);
+
+/**
+ * unassign_control_domain_store
+ *
+ * @dev: the matrix device
+ * @attr: a mediated matrix device attribute
+ * @buf: a buffer containing the control domain ID to be unassigned
+ * @count: the number of bytes in @buf
+ *
+ * Parses the domain ID from @buf and unassigns it from the mediated matrix
+ * device.
+ *
+ * Returns the number of bytes processed if the domain ID is valid; otherwise
+ * returns an error.
+ */
+static ssize_t unassign_control_domain_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ int ret;
+ unsigned long domid;
+ struct mdev_device *mdev = mdev_from_dev(dev);
+ struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
+ unsigned long max_domid = matrix_mdev->matrix.adm_max;
+
+ ret = kstrtoul(buf, 0, &domid);
+ if (ret || (domid > max_domid)) {
+ pr_err("%s: %s: control domain id '%s' not a value from 0 to %02lu(%#04lx)",
+ VFIO_AP_MODULE_NAME, __func__, buf,
+ max_domid, max_domid);
+
+ return ret ? ret : -EINVAL;
+ }
+
+ if (!test_bit_inv(domid, matrix_mdev->matrix.adm)) {
+ pr_err("%s: %s: control domain id %02lu(%#04lx) is not assigned",
+ VFIO_AP_MODULE_NAME, __func__, domid, domid);
+
+ return -ENODEV;
+ }
+
+ clear_bit_inv(domid, matrix_mdev->matrix.adm);
+
+ return count;
+}
+DEVICE_ATTR_WO(unassign_control_domain);
+
+static ssize_t control_domains_show(struct device *dev,
+ struct device_attribute *dev_attr,
+ char *buf)
+{
+ unsigned long id;
+ int nchars = 0;
+ int n;
+ char *bufpos = buf;
+ struct mdev_device *mdev = mdev_from_dev(dev);
+ struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
+ unsigned long max_domid = matrix_mdev->matrix.adm_max;
+
+ for_each_set_bit_inv(id, matrix_mdev->matrix.adm, max_domid + 1) {
+ n = sprintf(bufpos, "%04lx\n", id);
+ bufpos += n;
+ nchars += n;
+ }
+
+ return nchars;
+}
+DEVICE_ATTR_RO(control_domains);
+
static struct attribute *vfio_ap_mdev_attrs[] = {
&dev_attr_assign_adapter.attr,
&dev_attr_unassign_adapter.attr,
&dev_attr_assign_domain.attr,
&dev_attr_unassign_domain.attr,
+ &dev_attr_assign_control_domain.attr,
+ &dev_attr_unassign_control_domain.attr,
+ &dev_attr_control_domains.attr,
NULL,
};
--
1.7.1
From: Tony Krowiak <[email protected]>
Configures the AP adapters, usage domains and control domains for the
KVM guest from the matrix configured via the mediated matrix device's
sysfs attribute files.
The guest's SIE state description has a satellite structure called the
Crypto Control Block (CRYCB) containing three bitmask fields
identifying the adapters, queues (domains) and control domains
assigned to the KVM guest:
* The AP Adapter Mask (APM) field identifies the AP adapters assigned to
the KVM guest
* The AP Queue Mask (AQM) field identifies the AP queues assigned to
the KVM guest. Each AP queue is connected to a usage domain within
an AP adapter.
* The AP Domain Mask (ADM) field identifies the control domains
assigned to the KVM guest.
Each adapter, queue (usage domain) and control domain is identified by
a number from 0 to 255. The bits in each mask, from most significant to
least significant bit, correspond to the numbers 0-255. When a bit is
set, the corresponding adapter, queue (usage domain) or control domain
is assigned to the KVM guest.
This patch will set the bits in the APM, AQM and ADM fields of the
CRYCB referenced by the KVM guest's SIE state description. The process
used is:
1. Verify that the bits to be set do not exceed the maximum bit
number for the given mask.
2. Verify that the APQNs that can be derived from the cross product
of the bits set in the APM and AQM fields of the KVM guest's CRYCB
are not assigned to any other KVM guest running on the same Linux
host.
3. Set the APM, AQM and ADM in the CRYCB according to the matrix
configured for the mediated matrix device via its sysfs
assign_adapter, assign_domain and assign_control_domain attribute
files respectively.
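Step 2 amounts to a cross-product test: two matrices share an APQN exactly when they have at least one adapter in common and at least one domain in common. A minimal sketch of that test, assuming 256-bit masks stored as four 64-bit words; `struct demo_matrix`, `masks_intersect()` and `matrices_share_apqn()` are hypothetical names, not the driver's.

```c
#include <stdbool.h>
#include <stdint.h>

#define MASK_WORDS 4	/* 256 bits: IDs 0-255 */

/* Hypothetical stand-in for the APM and AQM of one mediated device. */
struct demo_matrix {
	uint64_t apm[MASK_WORDS];
	uint64_t aqm[MASK_WORDS];
};

static bool masks_intersect(const uint64_t *a, const uint64_t *b)
{
	for (int i = 0; i < MASK_WORDS; i++)
		if (a[i] & b[i])
			return true;
	return false;
}

/*
 * The APQNs of a matrix are the cross product APM x AQM, so two
 * matrices overlap only if both their APMs and their AQMs intersect.
 * Bit numbering does not matter here as long as both sides agree.
 */
static bool matrices_share_apqn(const struct demo_matrix *m1,
				const struct demo_matrix *m2)
{
	return masks_intersect(m1->apm, m2->apm) &&
	       masks_intersect(m1->aqm, m2->aqm);
}
```

Like the check in this patch, the sketch only decides whether a conflict exists; listing the conflicting APQNs means iterating over the intersected APM and AQM, as kvm_ap_validate_queue_sharing() does for its error messages.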
Signed-off-by: Tony Krowiak <[email protected]>
---
arch/s390/include/asm/kvm_host.h | 1 +
drivers/s390/crypto/vfio_ap_ops.c | 201 +++++++++++++++++++++++++++++++++++++
2 files changed, 202 insertions(+), 0 deletions(-)
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index d44e0d5..79b2ccf 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -257,6 +257,7 @@ struct kvm_s390_sie_block {
__u64 tecmc; /* 0x00e8 */
__u8 reservedf0[12]; /* 0x00f0 */
#define CRYCB_FORMAT_MASK 0x00000003
+#define CRYCB_FORMAT0 0x00000000
#define CRYCB_FORMAT1 0x00000001
#define CRYCB_FORMAT2 0x00000003
__u32 crycbd; /* 0x00fc */
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 58be495..98bd0a1 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -770,6 +770,201 @@ static int kvm_ap_validate_crypto_setup(struct kvm *kvm)
return -EOPNOTSUPP;
}
+static inline unsigned long *
+kvm_ap_get_crycb_apm(struct ap_matrix_mdev *matrix_mdev)
+{
+ unsigned long *apm;
+ struct kvm *kvm = matrix_mdev->kvm;
+
+ switch (kvm->arch.crypto.crycbd & CRYCB_FORMAT_MASK) {
+ case CRYCB_FORMAT2:
+ apm = (unsigned long *)kvm->arch.crypto.crycb->apcb1.apm;
+ break;
+ case CRYCB_FORMAT1:
+ case CRYCB_FORMAT0:
+ default:
+ apm = (unsigned long *)kvm->arch.crypto.crycb->apcb0.apm;
+ break;
+ }
+
+ return apm;
+}
+
+static inline unsigned long *
+kvm_ap_get_crycb_aqm(struct ap_matrix_mdev *matrix_mdev)
+{
+ unsigned long *aqm;
+ struct kvm *kvm = matrix_mdev->kvm;
+
+ switch (kvm->arch.crypto.crycbd & CRYCB_FORMAT_MASK) {
+ case CRYCB_FORMAT2:
+ aqm = (unsigned long *)kvm->arch.crypto.crycb->apcb1.aqm;
+ break;
+ case CRYCB_FORMAT1:
+ case CRYCB_FORMAT0:
+ default:
+ aqm = (unsigned long *)kvm->arch.crypto.crycb->apcb0.aqm;
+ break;
+ }
+
+ return aqm;
+}
+
+static inline unsigned long *
+kvm_ap_get_crycb_adm(struct ap_matrix_mdev *matrix_mdev)
+{
+ unsigned long *adm;
+ struct kvm *kvm = matrix_mdev->kvm;
+
+ switch (kvm->arch.crypto.crycbd & CRYCB_FORMAT_MASK) {
+ case CRYCB_FORMAT2:
+ adm = (unsigned long *)kvm->arch.crypto.crycb->apcb1.adm;
+ break;
+ case CRYCB_FORMAT1:
+ case CRYCB_FORMAT0:
+ default:
+ adm = (unsigned long *)kvm->arch.crypto.crycb->apcb0.adm;
+ break;
+ }
+
+ return adm;
+}
+
+static inline void kvm_ap_clear_crycb_masks(struct ap_matrix_mdev *matrix_mdev)
+{
+ memset(&matrix_mdev->kvm->arch.crypto.crycb->apcb0, 0,
+ sizeof(matrix_mdev->kvm->arch.crypto.crycb->apcb0));
+ memset(&matrix_mdev->kvm->arch.crypto.crycb->apcb1, 0,
+ sizeof(matrix_mdev->kvm->arch.crypto.crycb->apcb1));
+}
+
+static void kvm_ap_set_crycb_masks(struct ap_matrix_mdev *matrix_mdev)
+{
+ int nbytes;
+ unsigned long *apm, *aqm, *adm;
+
+ kvm_ap_clear_crycb_masks(matrix_mdev);
+
+ apm = kvm_ap_get_crycb_apm(matrix_mdev);
+ aqm = kvm_ap_get_crycb_aqm(matrix_mdev);
+ adm = kvm_ap_get_crycb_adm(matrix_mdev);
+
+ nbytes = KVM_AP_MASK_BYTES(matrix_mdev->matrix.apm_max + 1);
+ memcpy(apm, matrix_mdev->matrix.apm, nbytes);
+
+ nbytes = KVM_AP_MASK_BYTES(matrix_mdev->matrix.aqm_max + 1);
+ memcpy(aqm, matrix_mdev->matrix.aqm, nbytes);
+
+ /*
+ * Merge the AQM and ADM since the ADM is a superset of the
+ * AQM by agreed-upon convention.
+ */
+ bitmap_or(adm, matrix_mdev->matrix.adm, matrix_mdev->matrix.aqm,
+ matrix_mdev->matrix.adm_max + 1);
+}
+
+static void kvm_ap_log_sharing_err(struct ap_matrix_mdev *matrix_mdev,
+ unsigned long apid, unsigned long apqi)
+{
+ pr_err("%s: AP queue %02lx.%04lx is assigned to %s device", __func__,
+ apid, apqi, matrix_mdev->name);
+}
+
+static int kvm_ap_find_matching_bits(unsigned long *dst, unsigned long *src1,
+ unsigned long *src2, unsigned long nbits)
+{
+ unsigned long nbit;
+
+ for_each_set_bit_inv(nbit, src1, nbits) {
+ if (test_bit_inv(nbit, src2))
+ set_bit_inv(nbit, dst);
+ }
+
+ return find_first_bit_inv(dst, nbits) < nbits;
+}
+
+/**
+ * kvm_ap_validate_queue_sharing
+ *
+ * Verifies that the APQNs derived from the cross product of the AP adapter IDs
+ * and AP queue indexes comprising the AP matrix are not configured for
+ * another guest. AP queue sharing is not allowed.
+ *
+ * @matrix_mdev: the mediated matrix device whose AP matrix is to be
+ *               validated
+ *
+ * Returns 0 if the APQNs are valid; otherwise, returns -EBUSY.
+ */
+static int kvm_ap_validate_queue_sharing(struct ap_matrix_mdev *matrix_mdev)
+{
+ int ret;
+ struct ap_matrix_mdev *lstdev;
+ unsigned long apid, apqi;
+ DECLARE_BITMAP(apm, 256);
+ DECLARE_BITMAP(aqm, 256);
+
+ spin_lock_bh(&mdev_list_lock);
+
+ list_for_each_entry(lstdev, &mdev_list, list) {
+ if (matrix_mdev == lstdev)
+ continue;
+
+ memset(apm, 0, BITS_TO_LONGS(matrix_mdev->matrix.apm_max + 1) *
+ sizeof(unsigned long));
+ memset(aqm, 0, BITS_TO_LONGS(matrix_mdev->matrix.aqm_max + 1) *
+ sizeof(unsigned long));
+
+ if (!kvm_ap_find_matching_bits(apm, matrix_mdev->matrix.apm,
+ lstdev->matrix.apm,
+ matrix_mdev->matrix.apm_max + 1))
+ continue;
+
+ if (!kvm_ap_find_matching_bits(aqm, matrix_mdev->matrix.aqm,
+ lstdev->matrix.aqm,
+ matrix_mdev->matrix.aqm_max + 1))
+ continue;
+
+ for_each_set_bit_inv(apid, apm, matrix_mdev->matrix.apm_max + 1)
+ for_each_set_bit_inv(apqi, aqm,
+ matrix_mdev->matrix.aqm_max + 1)
+ kvm_ap_log_sharing_err(lstdev, apid, apqi);
+
+ ret = -EBUSY;
+ goto done;
+ }
+
+ ret = 0;
+
+done:
+ spin_unlock_bh(&mdev_list_lock);
+ return ret;
+}
+
+static int kvm_ap_configure_matrix(struct ap_matrix_mdev *matrix_mdev)
+{
+ int ret = 0;
+
+ mutex_lock(&matrix_mdev->kvm->lock);
+
+ ret = kvm_ap_validate_queue_sharing(matrix_mdev);
+ if (ret)
+ goto done;
+
+ kvm_ap_set_crycb_masks(matrix_mdev);
+
+done:
+ mutex_unlock(&matrix_mdev->kvm->lock);
+
+ return ret;
+}
+
+static void kvm_ap_deconfigure_matrix(struct ap_matrix_mdev *matrix_mdev)
+{
+ mutex_lock(&matrix_mdev->kvm->lock);
+ kvm_ap_clear_crycb_masks(matrix_mdev);
+ mutex_unlock(&matrix_mdev->kvm->lock);
+}
+
static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
unsigned long action, void *data)
{
@@ -852,6 +1047,10 @@ static int vfio_ap_mdev_open(struct mdev_device *mdev)
if (ret)
goto out_kvm_err;
+ ret = kvm_ap_configure_matrix(matrix_mdev);
+ if (ret)
+ goto out_kvm_err;
+
return 0;
out_kvm_err:
@@ -868,6 +1067,8 @@ static void vfio_ap_mdev_release(struct mdev_device *mdev)
{
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
+ kvm_ap_deconfigure_matrix(matrix_mdev);
+
vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
&matrix_mdev->group_notifier);
matrix_mdev->kvm = NULL;
--
1.7.1
From: Tony Krowiak <[email protected]>
Introduces a new structure for storing the AP matrix configured
for the mediated matrix device via its sysfs attribute files.
Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/vfio_ap_ops.c | 12 ++++++++++++
drivers/s390/crypto/vfio_ap_private.h | 24 ++++++++++++++++++++++++
2 files changed, 36 insertions(+), 0 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 4e61e33..bf7ed9f 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -20,6 +20,17 @@
DEFINE_SPINLOCK(mdev_list_lock);
LIST_HEAD(mdev_list);
+static void vfio_ap_matrix_init(struct ap_matrix *matrix)
+{
+ /* Test if PQAP(QCI) instruction is available */
+ if (test_facility(12))
+ ap_qci(&matrix->info);
+
+ matrix->apm_max = matrix->info.apxa ? matrix->info.Na : 63;
+ matrix->aqm_max = matrix->info.apxa ? matrix->info.Nd : 15;
+ matrix->adm_max = matrix->info.apxa ? matrix->info.Nd : 15;
+}
+
static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
{
struct ap_matrix_dev *matrix_dev =
@@ -31,6 +42,7 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
return -ENOMEM;
matrix_mdev->name = dev_name(mdev_dev(mdev));
+ vfio_ap_matrix_init(&matrix_mdev->matrix);
mdev_set_drvdata(mdev, matrix_mdev);
if (atomic_dec_if_positive(&matrix_dev->available_instances) < 0) {
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index 3de1275..ae771f5 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -29,9 +29,33 @@ struct ap_matrix_dev {
atomic_t available_instances;
};
+/**
+ * The AP matrix consists of three bit masks identifying the adapters,
+ * queues (domains) and control domains that belong to an AP matrix. The bits
+ * in each mask, from most significant to least significant bit, correspond to
+ * IDs 0 to 255. When a bit is set, the corresponding ID belongs to the matrix.
+ *
+ * @apm: identifies the AP adapters in the matrix
+ * @apm_max: max adapter number in @apm
+ * @aqm: identifies the AP queues (domains) in the matrix
+ * @aqm_max: max domain number in @aqm
+ * @adm: identifies the AP control domains in the matrix
+ * @adm_max: max domain number in @adm
+ */
+struct ap_matrix {
+ unsigned long apm_max;
+ DECLARE_BITMAP(apm, 256);
+ unsigned long aqm_max;
+ DECLARE_BITMAP(aqm, 256);
+ unsigned long adm_max;
+ DECLARE_BITMAP(adm, 256);
+ struct ap_config_info info;
+};
+
struct ap_matrix_mdev {
const char *name;
struct list_head list;
+ struct ap_matrix matrix;
};
static struct ap_matrix_dev *to_ap_matrix_dev(struct device *dev)
--
1.7.1
From: Harald Freudenberger <[email protected]>
Show the current load value of cards and queues in sysfs.
The load value for each card and queue is maintained by
the zcrypt device driver for dispatching and load
balancing requests over the available devices.
This patch provides the load value to userspace via a
new read-only sysfs attribute 'load' per card and queue.
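From userspace, the new attribute reads like any other single-value sysfs file: one decimal integer followed by a newline. Below is a small sketch of a reader for that content, written against a FILE * so it can be exercised without AP hardware. The function names `read_ap_load()` and `read_ap_load_path()` are hypothetical, and the sysfs path to open (for example a card or queue directory under /sys/bus/ap/devices) is an assumption to verify on the target system.

```c
#include <stdio.h>

/*
 * Parse the content of a zcrypt 'load' attribute ("%d\n").
 * Returns 0 and stores the value in *load, or -1 on a parse error.
 */
static int read_ap_load(FILE *f, int *load)
{
	int value;

	if (fscanf(f, "%d", &value) != 1)
		return -1;
	*load = value;
	return 0;
}

/* Convenience wrapper: open a (sysfs) path and parse its load value. */
static int read_ap_load_path(const char *path, int *load)
{
	FILE *f = fopen(path, "r");
	int ret;

	if (!f)
		return -1;
	ret = read_ap_load(f, load);
	fclose(f);
	return ret;
}
```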
Signed-off-by: Harald Freudenberger <[email protected]>
Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/zcrypt_card.c | 12 ++++++++++++
drivers/s390/crypto/zcrypt_queue.c | 12 ++++++++++++
2 files changed, 24 insertions(+), 0 deletions(-)
diff --git a/drivers/s390/crypto/zcrypt_card.c b/drivers/s390/crypto/zcrypt_card.c
index 233e1e6..da2c8df 100644
--- a/drivers/s390/crypto/zcrypt_card.c
+++ b/drivers/s390/crypto/zcrypt_card.c
@@ -83,9 +83,21 @@ static ssize_t zcrypt_card_online_store(struct device *dev,
static DEVICE_ATTR(online, 0644, zcrypt_card_online_show,
zcrypt_card_online_store);
+static ssize_t zcrypt_card_load_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct zcrypt_card *zc = to_ap_card(dev)->private;
+
+ return snprintf(buf, PAGE_SIZE, "%d\n", atomic_read(&zc->load));
+}
+
+static DEVICE_ATTR(load, 0444, zcrypt_card_load_show, NULL);
+
static struct attribute *zcrypt_card_attrs[] = {
&dev_attr_type.attr,
&dev_attr_online.attr,
+ &dev_attr_load.attr,
NULL,
};
diff --git a/drivers/s390/crypto/zcrypt_queue.c b/drivers/s390/crypto/zcrypt_queue.c
index 720434e..91a52f2 100644
--- a/drivers/s390/crypto/zcrypt_queue.c
+++ b/drivers/s390/crypto/zcrypt_queue.c
@@ -75,8 +75,20 @@ static ssize_t zcrypt_queue_online_store(struct device *dev,
static DEVICE_ATTR(online, 0644, zcrypt_queue_online_show,
zcrypt_queue_online_store);
+static ssize_t zcrypt_queue_load_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct zcrypt_queue *zq = to_ap_queue(dev)->private;
+
+ return snprintf(buf, PAGE_SIZE, "%d\n", atomic_read(&zq->load));
+}
+
+static DEVICE_ATTR(load, 0444, zcrypt_queue_load_show, NULL);
+
static struct attribute *zcrypt_queue_attrs[] = {
&dev_attr_online.attr,
+ &dev_attr_load.attr,
NULL,
};
--
1.7.1
Introduces a new AP device driver. This device driver
is built on the VFIO mediated device framework. The framework
provides sysfs interfaces that facilitate passthrough
access by guests to devices installed on the Linux host.
The VFIO AP device driver will serve two purposes:
1. Provide the interfaces to reserve AP devices for exclusive
use by KVM guests. This is accomplished by unbinding the
devices to be reserved for guest usage from the default AP
device driver and binding them to the VFIO AP device driver.
2. Implement the functions, callbacks and sysfs attribute
interfaces required to create one or more VFIO mediated
devices each of which will be used to configure the AP
matrix for a guest and serve as a file descriptor
for facilitating communication between QEMU and the
VFIO AP device driver.
When the VFIO AP device driver is initialized:
* Registers with the AP bus for control of type 10 (CEX4
and newer) AP queue devices. This limitation was imposed
due to:
1. A lack of access to older systems needed to test the
older AP device models;
2. A desire to keep the code as simple as possible;
3. Some older models are no longer supported by the kernel
and others are getting close to end of service.
The probe and remove callbacks will be provided to support
the binding/unbinding of AP queue devices to/from the VFIO
AP device driver.
* Creates a /sys/devices/vfio_ap/matrix device to hold
the APQNs of the AP devices bound to the VFIO
AP device driver and serves as the parent of the
mediated devices created for each guest.
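Restricting the driver to CEX4 and newer amounts to a lookup in a zero-terminated device ID table. The sketch below models that lookup in plain C. It assumes the AP device type codes 10, 11 and 12 for CEX4, CEX5 and CEX6 (ap_bus.h holds the authoritative values), and `struct demo_ap_device_id` and `vfio_ap_type_supported()` are hypothetical names rather than the AP bus API.

```c
#include <stdbool.h>

/* Assumed type codes; the kernel defines the real ones in ap_bus.h. */
enum {
	DEMO_AP_DEVICE_TYPE_CEX4 = 10,
	DEMO_AP_DEVICE_TYPE_CEX5 = 11,
	DEMO_AP_DEVICE_TYPE_CEX6 = 12,
};

/* Hypothetical, simplified form of an AP device ID table entry. */
struct demo_ap_device_id {
	int dev_type;	/* 0 terminates the table */
};

static const struct demo_ap_device_id demo_queue_ids[] = {
	{ DEMO_AP_DEVICE_TYPE_CEX4 },
	{ DEMO_AP_DEVICE_TYPE_CEX5 },
	{ DEMO_AP_DEVICE_TYPE_CEX6 },
	{ 0 },		/* end of table */
};

/* Return true if a queue of this type would be offered to the driver. */
static bool vfio_ap_type_supported(int dev_type)
{
	for (const struct demo_ap_device_id *id = demo_queue_ids;
	     id->dev_type; id++)
		if (id->dev_type == dev_type)
			return true;
	return false;
}
```

Queues of older types (for example a CEX3 card) never match, so they remain with the default zcrypt drivers.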
Signed-off-by: Tony Krowiak <[email protected]>
---
MAINTAINERS | 10 +++
arch/s390/Kconfig | 11 +++
drivers/s390/crypto/Makefile | 4 +
drivers/s390/crypto/vfio_ap_drv.c | 140 +++++++++++++++++++++++++++++++++
drivers/s390/crypto/vfio_ap_private.h | 29 +++++++
include/uapi/linux/vfio.h | 2 +
samples/bpf/bpf_load.c | 62 +++++++++++++++
7 files changed, 258 insertions(+), 0 deletions(-)
create mode 100644 drivers/s390/crypto/vfio_ap_drv.c
create mode 100644 drivers/s390/crypto/vfio_ap_private.h
diff --git a/MAINTAINERS b/MAINTAINERS
index e19ec6d..0515dae 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12401,6 +12401,16 @@ W: http://www.ibm.com/developerworks/linux/linux390/
S: Supported
F: drivers/s390/crypto/
+S390 VFIO AP DRIVER
+M: Tony Krowiak <[email protected]>
+M: Christian Borntraeger <[email protected]>
+M: Martin Schwidefsky <[email protected]>
+L: [email protected]
+W: http://www.ibm.com/developerworks/linux/linux390/
+S: Supported
+F: drivers/s390/crypto/vfio_ap_drv.c
+F: drivers/s390/crypto/vfio_ap_private.h
+
S390 ZFCP DRIVER
M: Steffen Maier <[email protected]>
M: Benjamin Block <[email protected]>
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index baed397..1a534c6 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -778,6 +778,17 @@ config VFIO_CCW
To compile this driver as a module, choose M here: the
module will be called vfio_ccw.
+config VFIO_AP
+ def_tristate n
+ prompt "VFIO support for AP devices"
+ depends on ZCRYPT && VFIO_MDEV_DEVICE && KVM
+ help
+ This driver grants access to Adjunct Processor (AP) devices
+ via the VFIO mediated device interface.
+
+ To compile this driver as a module, choose M here: the module
+ will be called vfio_ap.
+
endmenu
menu "Dump support"
diff --git a/drivers/s390/crypto/Makefile b/drivers/s390/crypto/Makefile
index b59af54..48e466e 100644
--- a/drivers/s390/crypto/Makefile
+++ b/drivers/s390/crypto/Makefile
@@ -15,3 +15,7 @@ obj-$(CONFIG_ZCRYPT) += zcrypt_pcixcc.o zcrypt_cex2a.o zcrypt_cex4.o
# pkey kernel module
pkey-objs := pkey_api.o
obj-$(CONFIG_PKEY) += pkey.o
+
+# adjunct processor matrix
+vfio_ap-objs := vfio_ap_drv.o
+obj-$(CONFIG_VFIO_AP) += vfio_ap.o
diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
new file mode 100644
index 0000000..93db312
--- /dev/null
+++ b/drivers/s390/crypto/vfio_ap_drv.c
@@ -0,0 +1,140 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * VFIO based AP device driver
+ *
+ * Copyright IBM Corp. 2018
+ *
+ * Author(s): Tony Krowiak <[email protected]>
+ */
+
+#include <linux/module.h>
+#include <linux/mod_devicetable.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+#include "vfio_ap_private.h"
+
+#define VFIO_AP_ROOT_NAME "vfio_ap"
+#define VFIO_AP_DEV_TYPE_NAME "ap_matrix"
+#define VFIO_AP_DEV_NAME "matrix"
+
+MODULE_AUTHOR("IBM Corporation");
+MODULE_DESCRIPTION("VFIO AP device driver, Copyright IBM Corp. 2018");
+MODULE_LICENSE("GPL v2");
+
+static struct device *vfio_ap_root_device;
+
+static struct ap_driver vfio_ap_drv;
+
+static struct ap_matrix_dev *matrix_dev;
+
+static struct device_type vfio_ap_dev_type = {
+ .name = VFIO_AP_DEV_TYPE_NAME,
+};
+
+/* Only type 10 adapters (CEX4 and later) are supported
+ * by the AP matrix device driver
+ */
+static struct ap_device_id ap_queue_ids[] = {
+ { .dev_type = AP_DEVICE_TYPE_CEX4,
+ .match_flags = AP_DEVICE_ID_MATCH_QUEUE_TYPE },
+ { .dev_type = AP_DEVICE_TYPE_CEX5,
+ .match_flags = AP_DEVICE_ID_MATCH_QUEUE_TYPE },
+ { .dev_type = AP_DEVICE_TYPE_CEX6,
+ .match_flags = AP_DEVICE_ID_MATCH_QUEUE_TYPE },
+ { /* end of sibling */ },
+};
+
+MODULE_DEVICE_TABLE(vfio_ap, ap_queue_ids);
+
+static int vfio_ap_queue_dev_probe(struct ap_device *apdev)
+{
+ return 0;
+}
+
+static void vfio_ap_queue_dev_remove(struct ap_device *apdev)
+{
+ /* Nothing to do yet */
+}
+
+static void vfio_ap_matrix_dev_release(struct device *dev)
+{
+ struct ap_matrix_dev *matrix_dev = container_of(dev, struct ap_matrix_dev, device);
+
+ kfree(matrix_dev);
+}
+
+static int vfio_ap_matrix_dev_create(void)
+{
+ int ret;
+
+ vfio_ap_root_device = root_device_register(VFIO_AP_ROOT_NAME);
+
+ if (IS_ERR(vfio_ap_root_device)) {
+ ret = PTR_ERR(vfio_ap_root_device);
+ goto done;
+ }
+
+ matrix_dev = kzalloc(sizeof(*matrix_dev), GFP_KERNEL);
+ if (!matrix_dev) {
+ ret = -ENOMEM;
+ goto matrix_alloc_err;
+ }
+
+ matrix_dev->device.type = &vfio_ap_dev_type;
+ dev_set_name(&matrix_dev->device, "%s", VFIO_AP_DEV_NAME);
+ matrix_dev->device.parent = vfio_ap_root_device;
+ matrix_dev->device.release = vfio_ap_matrix_dev_release;
+ matrix_dev->device.driver = &vfio_ap_drv.driver;
+
+ ret = device_register(&matrix_dev->device);
+ if (ret)
+ goto matrix_reg_err;
+
+ goto done;
+
+matrix_reg_err:
+ put_device(&matrix_dev->device);
+
+matrix_alloc_err:
+ root_device_unregister(vfio_ap_root_device);
+
+done:
+ return ret;
+}
+
+static void vfio_ap_matrix_dev_destroy(struct ap_matrix_dev *matrix_dev)
+{
+ device_unregister(&matrix_dev->device);
+ root_device_unregister(vfio_ap_root_device);
+}
+
+static int __init vfio_ap_init(void)
+{
+ int ret;
+
+ ret = vfio_ap_matrix_dev_create();
+ if (ret)
+ return ret;
+
+ memset(&vfio_ap_drv, 0, sizeof(vfio_ap_drv));
+ vfio_ap_drv.probe = vfio_ap_queue_dev_probe;
+ vfio_ap_drv.remove = vfio_ap_queue_dev_remove;
+ vfio_ap_drv.ids = ap_queue_ids;
+
+ ret = ap_driver_register(&vfio_ap_drv, THIS_MODULE, VFIO_AP_DRV_NAME);
+ if (ret) {
+ vfio_ap_matrix_dev_destroy(matrix_dev);
+ return ret;
+ }
+
+ return 0;
+}
+
+static void __exit vfio_ap_exit(void)
+{
+ ap_driver_unregister(&vfio_ap_drv);
+ vfio_ap_matrix_dev_destroy(matrix_dev);
+}
+
+module_init(vfio_ap_init);
+module_exit(vfio_ap_exit);
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
new file mode 100644
index 0000000..19c0b60
--- /dev/null
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Private data and functions for adjunct processor VFIO matrix driver.
+ *
+ * Copyright IBM Corp. 2018
+ * Author(s): Tony Krowiak <[email protected]>
+ */
+
+#ifndef _VFIO_AP_PRIVATE_H_
+#define _VFIO_AP_PRIVATE_H_
+
+#include <linux/types.h>
+
+#include "ap_bus.h"
+
+#define VFIO_AP_MODULE_NAME "vfio_ap"
+#define VFIO_AP_DRV_NAME "vfio_ap"
+
+struct ap_matrix_dev {
+ struct device device;
+};
+
+static inline struct ap_matrix_dev
+*to_ap_matrix_dev(struct device *dev)
+{
+ return container_of(dev, struct ap_matrix_dev, device);
+}
+
+#endif /* _VFIO_AP_PRIVATE_H_ */
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 1aa7b82..f378b98 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -200,6 +200,7 @@ struct vfio_device_info {
#define VFIO_DEVICE_FLAGS_PLATFORM (1 << 2) /* vfio-platform device */
#define VFIO_DEVICE_FLAGS_AMBA (1 << 3) /* vfio-amba device */
#define VFIO_DEVICE_FLAGS_CCW (1 << 4) /* vfio-ccw device */
+#define VFIO_DEVICE_FLAGS_AP (1 << 5) /* vfio-ap device */
__u32 num_regions; /* Max region index + 1 */
__u32 num_irqs; /* Max IRQ index + 1 */
};
@@ -215,6 +216,7 @@ struct vfio_device_info {
#define VFIO_DEVICE_API_PLATFORM_STRING "vfio-platform"
#define VFIO_DEVICE_API_AMBA_STRING "vfio-amba"
#define VFIO_DEVICE_API_CCW_STRING "vfio-ccw"
+#define VFIO_DEVICE_API_AP_STRING "vfio-ap"
/**
* VFIO_DEVICE_GET_REGION_INFO - _IOWR(VFIO_TYPE, VFIO_BASE + 8,
--
1.7.1
Provides the sysfs interfaces for assigning AP adapters to
and unassigning AP adapters from a mediated matrix device.
The IDs of the AP adapters assigned to the mediated matrix
device are stored in an AP mask (APM). The bits in the APM,
from most significant to least significant bit, correspond to
AP adapter ID (APID) 0 to 255. When an adapter is assigned, the
bit corresponding to the APID will be set in the APM.
Likewise, when an adapter is unassigned, the bit corresponding
to the APID will be cleared from the APM.
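The inverted bit numbering is easy to get backwards. The following is a
minimal user-space sketch, not the driver's code: it assumes the 256-bit
APM is held in four 64-bit words with APID 0 in the most significant bit
of the first word, and the helpers apm_set(), apm_clear() and apm_test()
are hypothetical stand-ins for the kernel's set_bit_inv(),
clear_bit_inv() and test_bit_inv().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a 256-bit AP mask: APID 0 is the MSB of word 0. */
#define APM_BITS  256
#define APM_WORDS (APM_BITS / 64)

static uint64_t apm_mask_for(unsigned long apid)
{
	/* MSB-first within a word: APID 0 -> 1ULL << 63, APID 63 -> 1ULL << 0 */
	return 1ULL << (63 - (apid % 64));
}

/* Assign an adapter: set the APM bit for apid */
static void apm_set(uint64_t apm[APM_WORDS], unsigned long apid)
{
	apm[apid / 64] |= apm_mask_for(apid);
}

/* Unassign an adapter: clear the APM bit for apid */
static void apm_clear(uint64_t apm[APM_WORDS], unsigned long apid)
{
	apm[apid / 64] &= ~apm_mask_for(apid);
}

/* Report whether the adapter with apid is currently assigned */
static bool apm_test(const uint64_t apm[APM_WORDS], unsigned long apid)
{
	return (apm[apid / 64] & apm_mask_for(apid)) != 0;
}
```

With this layout, assigning APID 173 (173 = 2 * 64 + 45) sets bit
63 - 45 = 18 of word 2.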
The relevant sysfs structures are:
/sys/devices/vfio_ap
... [matrix]
...... [mdev_supported_types]
......... [vfio_ap-passthrough]
............ [devices]
...............[$uuid]
.................. assign_adapter
.................. unassign_adapter
To assign an adapter to the $uuid mediated matrix device's APM,
write the APID to the assign_adapter file. To unassign an adapter,
write the APID to the unassign_adapter file. The APID is specified
using conventional semantics: If it begins with 0x the number will
be parsed as a hexadecimal number; if it begins with a 0 the number
will be parsed as an octal number; otherwise, it will be parsed as a
decimal number.
For example, to assign adapter 173 (0xad) to the mediated matrix
device $uuid:
echo 173 > assign_adapter
or
echo 0xad > assign_adapter
or
echo 0255 > assign_adapter
To unassign adapter 173 (0xad):
echo 173 > unassign_adapter
or
echo 0xad > unassign_adapter
or
echo 0255 > unassign_adapter
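kstrtoul() with a base of 0, which the sysfs store functions use, follows
the same prefix rules as the C library's strtoul(). A hypothetical
user-space analogue of the parsing step, assuming a parse_apid() helper
that returns -1 on any error (the kernel functions return -EINVAL or the
kstrtoul() error code instead):

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical user-space analogue of kstrtoul(buf, 0, &apid): base 0
 * selects hex for a "0x" prefix, octal for a leading "0", decimal
 * otherwise. A single trailing newline (as written by echo) is accepted.
 * Returns 0 and stores the value in *apid, or -1 if the input is not a
 * number or exceeds max_apid.
 */
static int parse_apid(const char *buf, unsigned long max_apid,
		      unsigned long *apid)
{
	char *end;
	unsigned long val;

	if (*buf < '0' || *buf > '9')
		return -1;	/* reject empty input, signs and whitespace */

	errno = 0;
	val = strtoul(buf, &end, 0);
	if (errno != 0 || end == buf)
		return -1;

	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -1;	/* trailing garbage, e.g. "12abc" */

	if (val > max_apid)
		return -1;

	*apid = val;
	return 0;
}
```

All three spellings shown above, 173, 0xad and 0255, yield 173.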
The assignment will be rejected:
* If the APID exceeds the maximum value for an AP adapter:
* If the AP Extended Addressing (APXA) facility is
installed, the max value is 255
* Otherwise, the max value is 63
* If no AP domains have yet been assigned and there are
no AP queues bound to the VFIO AP driver that have an APQN
with an APID matching that of the AP adapter being assigned.
* If any of the APQNs that can be derived from the cross product
of the APID being assigned and the AP queue index (APQI) of
each of the AP domains previously assigned cannot be matched
with an APQN of an AP queue device reserved by the VFIO AP
driver.
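The two rejection rules that depend on the AP domain mask can be
modelled in a few lines. This is a sketch under stated assumptions, not
the driver's implementation: the APQN is assumed to carry the APID in
its high byte and the APQI in its low byte, the assigned domains are a
plain bool array, and the reserved queues are a flat list of APQNs. The
names mk_apqn(), apqn_reserved() and apid_assignable() are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical APQN layout: APID in the high byte, APQI in the low byte */
static uint16_t mk_apqn(unsigned int apid, unsigned int apqi)
{
	return (uint16_t)(((apid & 0xff) << 8) | (apqi & 0xff));
}

/* Is this APQN in the list of queues reserved by the driver? */
static bool apqn_reserved(const uint16_t *reserved, size_t n, uint16_t apqn)
{
	for (size_t i = 0; i < n; i++)
		if (reserved[i] == apqn)
			return true;
	return false;
}

/*
 * Returns true if 'apid' may be assigned, given the assigned domains
 * (aqm[apqi] is true) and the APQNs of the queues reserved by the driver.
 */
static bool apid_assignable(unsigned int apid, unsigned int max_apid,
			    const bool aqm[256],
			    const uint16_t *reserved, size_t n)
{
	bool any_domain = false;

	if (apid > max_apid)
		return false;

	for (unsigned int apqi = 0; apqi < 256; apqi++) {
		if (!aqm[apqi])
			continue;
		any_domain = true;
		/* every (apid, apqi) pair of the cross product must be reserved */
		if (!apqn_reserved(reserved, n, mk_apqn(apid, apqi)))
			return false;
	}
	if (any_domain)
		return true;

	/* no domains assigned yet: some reserved queue must use this adapter */
	for (size_t i = 0; i < n; i++)
		if ((reserved[i] >> 8) == apid)
			return true;
	return false;
}
```

Using the queues (4,1), (4,2) and (3,1) as the reserved set: with no
domains assigned, adapter 4 is accepted; once domains 1 and 2 are
assigned, adapter 3 is rejected because (3,2) is not reserved.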
Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/vfio_ap_ops.c | 317 +++++++++++++++++++++++++++++++++++++
1 files changed, 317 insertions(+), 0 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index bf7ed9f..a4351bd 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -16,6 +16,7 @@
#define VFOP_AP_MDEV_TYPE_HWVIRT "passthrough"
#define VFIO_AP_MDEV_NAME_HWVIRT "VFIO AP Passthrough Device"
+#define KVM_AP_MASK_BYTES(n) DIV_ROUND_UP(n, BITS_PER_BYTE)
DEFINE_SPINLOCK(mdev_list_lock);
LIST_HEAD(mdev_list);
@@ -116,9 +117,325 @@ static ssize_t device_api_show(struct kobject *kobj, struct device *dev,
NULL,
};
+struct vfio_ap_qid_reserved {
+ ap_qid_t qid;
+ bool reserved;
+};
+
+struct vfio_id_reserved {
+ unsigned long id;
+ bool reserved;
+};
+
+/**
+ * vfio_ap_queue_has_qid
+ *
+ * @dev: an AP queue device
+ * @data: a struct vfio_ap_qid_reserved holding the qid to match
+ *
+ * Sets the reserved flag in @data if @dev has the qid being sought
+ *
+ * Returns 0 so that the iteration over all devices continues
+ */
+static int vfio_ap_queue_has_qid(struct device *dev, void *data)
+{
+ struct vfio_ap_qid_reserved *qid_res = data;
+ struct ap_queue *ap_queue = to_ap_queue(dev);
+
+ if (qid_res->qid == ap_queue->qid)
+ qid_res->reserved = true;
+
+ return 0;
+}
+
+/**
+ * vfio_ap_queue_has_apid
+ *
+ * @dev: an AP queue device
+ * @data: a struct vfio_id_reserved holding the AP adapter ID to match
+ *
+ * Sets the reserved flag in @data if the qid of @dev contains the adapter ID
+ *
+ * Returns 0 so that the iteration over all devices continues
+ */
+static int vfio_ap_queue_has_apid(struct device *dev, void *data)
+{
+ struct vfio_id_reserved *id_res = data;
+ struct ap_queue *ap_queue = to_ap_queue(dev);
+
+ if (id_res->id == AP_QID_CARD(ap_queue->qid))
+ id_res->reserved = true;
+
+ return 0;
+}
+
+/**
+ * vfio_ap_verify_qid_reserved
+ *
+ * @matrix_dev: a mediated matrix device
+ * @qid: a qid (i.e., APQN)
+ *
+ * Verifies that the AP queue with @qid is reserved by the VFIO AP device
+ * driver.
+ *
+ * Returns 0 if the AP queue with @qid is reserved; otherwise, returns -EPERM.
+ */
+static int vfio_ap_verify_qid_reserved(struct ap_matrix_dev *matrix_dev,
+ ap_qid_t qid)
+{
+ int ret;
+ struct vfio_ap_qid_reserved qid_res;
+
+ qid_res.qid = qid;
+ qid_res.reserved = false;
+
+ ret = driver_for_each_device(matrix_dev->device.driver, NULL, &qid_res,
+ vfio_ap_queue_has_qid);
+ if (ret)
+ return ret;
+
+ if (qid_res.reserved)
+ return 0;
+
+ return -EPERM;
+}
+
+/**
+ * vfio_ap_verify_apid_reserved
+ *
+ * @matrix_dev: a mediated matrix device
+ * @mdev_name: the name of the mediated device, used for logging
+ * @apid: an AP adapter ID
+ *
+ * Verifies that an AP queue with @apid is reserved by the VFIO AP device
+ * driver.
+ *
+ * Returns 0 if an AP queue with @apid is reserved; otherwise, returns -EPERM.
+ */
+static int vfio_ap_verify_apid_reserved(struct ap_matrix_dev *matrix_dev,
+ const char *mdev_name,
+ unsigned long apid)
+{
+ int ret;
+ struct vfio_id_reserved id_res;
+
+ id_res.id = apid;
+ id_res.reserved = false;
+
+ ret = driver_for_each_device(matrix_dev->device.driver, NULL, &id_res,
+ vfio_ap_queue_has_apid);
+ if (ret)
+ return ret;
+
+ if (id_res.reserved)
+ return 0;
+
+ pr_err("%s: mdev %s using adapter %02lx not reserved by %s driver",
+ VFIO_AP_MODULE_NAME, mdev_name, apid,
+ VFIO_AP_DRV_NAME);
+
+ return -EPERM;
+}
+
+static int vfio_ap_verify_queues_reserved(struct ap_matrix_dev *matrix_dev,
+ const char *mdev_name,
+ struct ap_matrix *matrix)
+{
+ unsigned long apid, apqi;
+ int ret;
+ int rc = 0;
+
+ for_each_set_bit_inv(apid, matrix->apm, matrix->apm_max + 1) {
+ for_each_set_bit_inv(apqi, matrix->aqm, matrix->aqm_max + 1) {
+ ret = vfio_ap_verify_qid_reserved(matrix_dev,
+ AP_MKQID(apid, apqi));
+ if (ret == 0)
+ continue;
+
+ /*
+ * We want to log every APQN that is not reserved by
+ * the driver, so record the return code, log a message
+ * and allow the loop to continue
+ */
+ rc = ret;
+ pr_err("%s: mdev %s using queue %02lx.%04lx not reserved by %s driver",
+ VFIO_AP_MODULE_NAME, mdev_name, apid,
+ apqi, VFIO_AP_DRV_NAME);
+ }
+ }
+
+ return rc;
+}
+
+/**
+ * vfio_ap_validate_apid
+ *
+ * @mdev: the mediated device
+ * @matrix_mdev: the mediated matrix device
+ * @apid: the APID to validate
+ *
+ * Validates the value of @apid:
+ * * If there are no AP domains assigned, then there must be at least
+ * one AP queue device reserved by the VFIO AP device driver with an
+ * APQN containing @apid.
+ *
+ * * Else each APQN that can be derived from the cross product of @apid and
+ * the IDs of the AP domains already assigned must identify an AP queue
+ * that has been reserved by the VFIO AP device driver.
+ *
+ * Returns 0 if the value of @apid is valid; otherwise, returns an error.
+ */
+static int vfio_ap_validate_apid(struct mdev_device *mdev,
+ struct ap_matrix_mdev *matrix_mdev,
+ unsigned long apid)
+{
+ int ret;
+ unsigned long aqmsz = matrix_mdev->matrix.aqm_max + 1;
+ struct device *dev = mdev_parent_dev(mdev);
+ struct ap_matrix_dev *matrix_dev = to_ap_matrix_dev(dev);
+ struct ap_matrix matrix = matrix_mdev->matrix;
+
+ /* If any AP domains are assigned to the mediated device */
+ if (find_first_bit_inv(matrix.aqm, aqmsz) < aqmsz) {
+ matrix.apm_max = matrix_mdev->matrix.apm_max;
+ memset(matrix.apm, 0,
+ ARRAY_SIZE(matrix.apm) * sizeof(matrix.apm[0]));
+ set_bit_inv(apid, matrix.apm);
+ matrix.aqm_max = matrix_mdev->matrix.aqm_max;
+ memcpy(matrix.aqm, matrix_mdev->matrix.aqm,
+ ARRAY_SIZE(matrix.aqm) * sizeof(matrix.aqm[0]));
+ ret = vfio_ap_verify_queues_reserved(matrix_dev,
+ matrix_mdev->name,
+ &matrix);
+ } else {
+ ret = vfio_ap_verify_apid_reserved(matrix_dev,
+ matrix_mdev->name, apid);
+ }
+
+ if (ret)
+ return ret;
+
+ return 0;
+}
+
+/**
+ * assign_adapter_store
+ *
+ * @dev: the matrix device
+ * @attr: a mediated matrix device attribute
+ * @buf: a buffer containing the adapter ID (APID) to be assigned
+ * @count: the number of bytes in @buf
+ *
+ * Parses the APID from @buf and assigns it to the mediated matrix device. The
+ * APID must be a valid value:
+ * * The APID value must not exceed the maximum allowable AP adapter ID
+ *
+ * * If there are no AP domains assigned, then there must be at least
+ * one AP queue device reserved by the VFIO AP device driver with an
+ * APQN containing the APID.
+ *
+ * * Else each APQN that can be derived from the cross product of the APID
+ * and the IDs of the AP domains already assigned must identify an AP queue
+ * that has been reserved by the VFIO AP device driver.
+ *
+ * Returns the number of bytes processed if the APID is valid; otherwise returns
+ * an error.
+ */
+static ssize_t assign_adapter_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ int ret;
+ unsigned long apid;
+ struct mdev_device *mdev = mdev_from_dev(dev);
+ struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
+ unsigned long max_apid = matrix_mdev->matrix.apm_max;
+
+ ret = kstrtoul(buf, 0, &apid);
+ if (ret || (apid > max_apid)) {
+ pr_err("%s: %s: adapter id '%s' not a value from 0 to %02lu(%#04lx)",
+ VFIO_AP_MODULE_NAME, __func__, buf, max_apid, max_apid);
+
+ return ret ? ret : -EINVAL;
+ }
+
+ ret = vfio_ap_validate_apid(mdev, matrix_mdev, apid);
+ if (ret)
+ return ret;
+
+ /* Set the bit in the AP mask (APM) corresponding to the AP adapter
+ * number (APID). The bits in the mask, from most significant to least
+ * significant bit, correspond to APIDs 0-255.
+ */
+ set_bit_inv(apid, matrix_mdev->matrix.apm);
+
+ return count;
+}
+static DEVICE_ATTR_WO(assign_adapter);
+
+/**
+ * unassign_adapter_store
+ *
+ * @dev: the matrix device
+ * @attr: a mediated matrix device attribute
+ * @buf: a buffer containing the adapter ID (APID) to be unassigned
+ * @count: the number of bytes in @buf
+ *
+ * Parses the APID from @buf and unassigns it from the mediated matrix device.
+ * The APID must not exceed the maximum AP adapter ID and must be assigned.
+ *
+ * Returns the number of bytes processed if the APID is valid; otherwise returns
+ * an error.
+ */
+static ssize_t unassign_adapter_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ int ret;
+ unsigned long apid;
+ struct mdev_device *mdev = mdev_from_dev(dev);
+ struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
+ unsigned long max_apid = matrix_mdev->matrix.apm_max;
+
+ ret = kstrtoul(buf, 0, &apid);
+ if (ret || (apid > max_apid)) {
+ pr_err("%s: %s: adapter id '%s' must be a value from 0 to %02lu(%#04lx)",
+ VFIO_AP_MODULE_NAME, __func__, buf, max_apid, max_apid);
+
+ return ret ? ret : -EINVAL;
+ }
+
+ if (!test_bit_inv(apid, matrix_mdev->matrix.apm)) {
+ pr_err("%s: %s: adapter id %02lu(%#04lx) not assigned",
+ VFIO_AP_MODULE_NAME, __func__, apid, apid);
+
+ return -ENODEV;
+ }
+
+ clear_bit_inv(apid, matrix_mdev->matrix.apm);
+
+ return count;
+}
+static DEVICE_ATTR_WO(unassign_adapter);
+
+static struct attribute *vfio_ap_mdev_attrs[] = {
+ &dev_attr_assign_adapter.attr,
+ &dev_attr_unassign_adapter.attr,
+ NULL
+};
+
+static struct attribute_group vfio_ap_mdev_attr_group = {
+ .attrs = vfio_ap_mdev_attrs
+};
+
+static const struct attribute_group *vfio_ap_mdev_attr_groups[] = {
+ &vfio_ap_mdev_attr_group,
+ NULL
+};
+
static const struct mdev_parent_ops vfio_ap_matrix_ops = {
.owner = THIS_MODULE,
.supported_type_groups = vfio_ap_mdev_type_groups,
+ .mdev_attr_groups = vfio_ap_mdev_attr_groups,
.create = vfio_ap_mdev_create,
.remove = vfio_ap_mdev_remove,
};
--
1.7.1
From: Tony Krowiak <[email protected]>
This patch refactors the code that initializes and sets up the
crypto configuration for a guest. The following changes are
implemented via this patch:
1. Prior to the introduction of AP device virtualization, it
was not necessary to provide guest access to the CRYCB
unless the MSA extension 3 (MSAX3) facility was installed
on the host system. With the introduction of AP device
virtualization, the CRYCB must be made accessible to the
guest as long as the AP instructions are installed on the
host.
2. Introduces a flag indicating AP instructions executed on
the guest shall be interpreted by the firmware. It is
initialized to indicate AP instructions are to be
interpreted and is used to set the APIE bit in the ECA field
of each vcpu's SIE control block during vcpu setup.
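The two decisions made in kvm_s390_vcpu_crypto_setup() below can be
summarised as a pair of pure functions. This is an illustrative sketch
only: struct crypto_cfg and its fields are hypothetical stand-ins for the
host facility checks, kvm->arch.crypto.apie and the
KVM_S390_VM_CPU_FEAT_AP CPU feature, and ECA_APIE is copied from the
kvm_host.h hunk.

```c
#include <stdbool.h>
#include <stdint.h>

#define ECA_APIE 0x00000008u	/* value copied from the kvm_host.h hunk */

/* Hypothetical bundle of the inputs the vcpu setup code consults */
struct crypto_cfg {
	bool host_ap_instr;	/* AP instructions installed on the host */
	bool host_msax3;	/* MSAX3 (facility 76) installed on the host */
	bool apie;		/* stands in for kvm->arch.crypto.apie */
	bool cpu_feat_ap;	/* AP CPU feature offered to the guest */
};

/* The SIE block needs a CRYCB designation if either facility is present */
static bool needs_crycb(const struct crypto_cfg *c)
{
	return c->host_ap_instr || c->host_msax3;
}

/* Clear APIE, then set it only if interpretation is wanted and offered */
static uint32_t eca_with_apie(uint32_t eca, const struct crypto_cfg *c)
{
	eca &= ~ECA_APIE;
	if (c->apie && c->cpu_feat_ap)
		eca |= ECA_APIE;
	return eca;
}
```

Clearing the bit before conditionally setting it mirrors the patch and
keeps a stale APIE bit from surviving a change of configuration.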
Signed-off-by: Tony Krowiak <[email protected]>
---
arch/s390/include/asm/kvm_host.h | 3 +
arch/s390/kvm/kvm-s390.c | 85 +++++++++++++++++++------------------
2 files changed, 47 insertions(+), 41 deletions(-)
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index a2188e3..d44e0d5 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -186,6 +186,7 @@ struct kvm_s390_sie_block {
#define ECA_AIV 0x00200000
#define ECA_VX 0x00020000
#define ECA_PROTEXCI 0x00002000
+#define ECA_APIE 0x00000008
#define ECA_SII 0x00000001
__u32 eca; /* 0x004c */
#define ICPT_INST 0x04
@@ -255,6 +256,7 @@ struct kvm_s390_sie_block {
__u8 reservede4[4]; /* 0x00e4 */
__u64 tecmc; /* 0x00e8 */
__u8 reservedf0[12]; /* 0x00f0 */
+#define CRYCB_FORMAT_MASK 0x00000003
#define CRYCB_FORMAT1 0x00000001
#define CRYCB_FORMAT2 0x00000003
__u32 crycbd; /* 0x00fc */
@@ -713,6 +715,7 @@ struct kvm_s390_crypto {
__u32 crycbd;
__u8 aes_kw;
__u8 dea_kw;
+ __u8 apie;
};
#define APCB0_MASK_SIZE 1
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index d2208d4..3aa16df 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -1886,49 +1886,37 @@ long kvm_arch_vm_ioctl(struct file *filp,
return r;
}
-static int kvm_s390_query_ap_config(u8 *config)
-{
- u32 fcn_code = 0x04000000UL;
- u32 cc = 0;
-
- memset(config, 0, 128);
- asm volatile(
- "lgr 0,%1\n"
- "lgr 2,%2\n"
- ".long 0xb2af0000\n" /* PQAP(QCI) */
- "0: ipm %0\n"
- "srl %0,28\n"
- "1:\n"
- EX_TABLE(0b, 1b)
- : "+r" (cc)
- : "r" (fcn_code), "r" (config)
- : "cc", "0", "2", "memory"
- );
-
- return cc;
-}
-
static int kvm_s390_apxa_installed(void)
{
- u8 config[128];
- int cc;
+ struct ap_config_info info;
- if (test_facility(12)) {
- cc = kvm_s390_query_ap_config(config);
-
- if (cc)
- pr_err("PQAP(QCI) failed with cc=%d", cc);
- else
- return config[0] & 0x40;
+ if (ap_instructions_available() == 0) {
+ if (ap_qci(&info) == 0)
+ return info.apxa;
}
return 0;
}
+/*
+ * The format of the crypto control block (CRYCB) is specified in the 3 low
+ * order bits of the CRYCB designation (CRYCBD) field as follows:
+ * Format 0: Neither the message security assist extension 3 (MSAX3) nor the
+ * AP extended addressing (APXA) facility is installed.
+ * Format 1: The APXA facility is not installed but the MSAX3 facility is.
+ * Format 2: Both the APXA and MSAX3 facilities are installed.
+ */
static void kvm_s390_set_crycb_format(struct kvm *kvm)
{
kvm->arch.crypto.crycbd = (__u32)(unsigned long) kvm->arch.crypto.crycb;
+ /* Clear the CRYCB format bits - i.e., set format 0 by default */
+ kvm->arch.crypto.crycbd &= ~(CRYCB_FORMAT_MASK);
+
+ /* Check whether MSAX3 is installed */
+ if (!test_kvm_facility(kvm, 76))
+ return;
+
if (kvm_s390_apxa_installed())
kvm->arch.crypto.crycbd |= CRYCB_FORMAT2;
else
@@ -1946,11 +1934,13 @@ static u64 kvm_s390_get_initial_cpuid(void)
static void kvm_s390_crypto_init(struct kvm *kvm)
{
- if (!test_kvm_facility(kvm, 76))
- return;
-
kvm->arch.crypto.crycb = &kvm->arch.sie_page2->crycb;
kvm_s390_set_crycb_format(kvm);
+ /* Default setting indicating SIE shall interpret AP instructions */
+ kvm->arch.crypto.apie = 1;
+
+ if (!test_kvm_facility(kvm, 76))
+ return;
/* Enable AES/DEA protected key functions by default */
kvm->arch.crypto.aes_kw = 1;
@@ -2479,17 +2469,30 @@ void kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu)
static void kvm_s390_vcpu_crypto_setup(struct kvm_vcpu *vcpu)
{
- if (!test_kvm_facility(vcpu->kvm, 76))
+ /*
+ * If neither the AP instructions nor the MSAX3 facility is installed on
+ * the host, the guest will not have them either and SIE needs no CRYCB.
+ * (ap_instructions_available() returns 0 if AP instructions are installed.)
+ */
+ if (ap_instructions_available() && !test_facility(76))
return;
- vcpu->arch.sie_block->ecb3 &= ~(ECB3_AES | ECB3_DEA);
+ vcpu->arch.sie_block->crycbd = vcpu->kvm->arch.crypto.crycbd;
+
+ vcpu->arch.sie_block->eca &= ~ECA_APIE;
+ if (vcpu->kvm->arch.crypto.apie &&
+ test_kvm_cpu_feat(vcpu->kvm, KVM_S390_VM_CPU_FEAT_AP))
+ vcpu->arch.sie_block->eca |= ECA_APIE;
- if (vcpu->kvm->arch.crypto.aes_kw)
- vcpu->arch.sie_block->ecb3 |= ECB3_AES;
- if (vcpu->kvm->arch.crypto.dea_kw)
- vcpu->arch.sie_block->ecb3 |= ECB3_DEA;
+ /* If MSAX3 is installed on the guest, set up protected key support */
+ if (test_kvm_facility(vcpu->kvm, 76)) {
+ vcpu->arch.sie_block->ecb3 &= ~(ECB3_AES | ECB3_DEA);
- vcpu->arch.sie_block->crycbd = vcpu->kvm->arch.crypto.crycbd;
+ if (vcpu->kvm->arch.crypto.aes_kw)
+ vcpu->arch.sie_block->ecb3 |= ECB3_AES;
+ if (vcpu->kvm->arch.crypto.dea_kw)
+ vcpu->arch.sie_block->ecb3 |= ECB3_DEA;
+ }
}
void kvm_s390_vcpu_unsetup_cmma(struct kvm_vcpu *vcpu)
--
1.7.1
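kvm_s390_set_crycb_format() in the patch above computes the CRYCB
designation by clearing the format bits of the CRYCB address and then
ORing in a format chosen from the MSAX3 and APXA facilities. A user-space
sketch of that computation, assuming the facilities are passed in as
plain flags (crycbd_for() is a hypothetical name; the constants are
copied from the kvm_host.h hunk):

```c
#include <stdbool.h>
#include <stdint.h>

#define CRYCB_FORMAT_MASK 0x00000003u
#define CRYCB_FORMAT1     0x00000001u
#define CRYCB_FORMAT2     0x00000003u

/* crycb_addr: CRYCB origin; its low-order bits carry the format */
static uint32_t crycbd_for(uint32_t crycb_addr, bool msax3, bool apxa)
{
	/* format 0 by default: clear the format bits */
	uint32_t crycbd = crycb_addr & ~CRYCB_FORMAT_MASK;

	if (!msax3)
		return crycbd;
	return crycbd | (apxa ? CRYCB_FORMAT2 : CRYCB_FORMAT1);
}
```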
From: Harald Freudenberger <[email protected]>
Move all the inline functions from the ap bus header
file ap_asm.h into the in-kernel api header file
arch/s390/include/asm/ap.h so that KVM can make use
of all the low level AP functions.
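Most of the moved PQAP helpers differ only in how they load general
register 0. As a sketch of that encoding, derived from the inline
functions in this patch rather than from the architecture documentation:
the PQAP function code sits at bit 24 and above, the TAPQ T bit is
1 << 23, and the APQN occupies the low-order bits; NQAP and DQAP carry
the 64-bit psmid as two 32-bit halves. pqap_gr0(), psmid_split() and
psmid_join() are hypothetical helpers, and ap_qid_t is assumed to be an
unsigned int.

```c
#include <stdint.h>

typedef unsigned int ap_qid_t;	/* assumption: mirrors the kernel typedef */

/* PQAP function codes as used by the inline functions in this patch */
enum pqap_fc { PQAP_TAPQ = 0, PQAP_RAPQ = 1, PQAP_ZAPQ = 2,
	       PQAP_AQIC = 3, PQAP_QCI = 4, PQAP_QACT = 5 };

/* GR0: function code << 24, optional T bit (1 << 23), APQN in low bits */
static unsigned long pqap_gr0(enum pqap_fc fc, ap_qid_t qid, int tbit)
{
	unsigned long gr0 = qid | ((unsigned long)fc << 24);

	if (tbit)
		gr0 |= 1UL << 23;
	return gr0;
}

/* ap_nqap() passes the psmid as high half in GR4, low half in GR5 */
static void psmid_split(unsigned long long psmid, unsigned long *hi,
			unsigned long *lo)
{
	*hi = (unsigned int)(psmid >> 32);
	*lo = psmid & 0xffffffff;
}

/* ap_dqap() reassembles the psmid from GR6 (high) and GR7 (low) */
static unsigned long long psmid_join(unsigned long hi, unsigned long lo)
{
	return ((unsigned long long)hi << 32) + lo;
}
```

For QCI, pqap_gr0(PQAP_QCI, 0, 0) gives 0x04000000, the same function
code the removed kvm_s390_query_ap_config() loaded by hand.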
Signed-off-by: Harald Freudenberger <[email protected]>
Signed-off-by: Tony Krowiak <[email protected]>
---
arch/s390/include/asm/ap.h | 284 ++++++++++++++++++++++++++++++++++++----
drivers/s390/crypto/ap_asm.h | 261 ------------------------------------
drivers/s390/crypto/ap_bus.c | 21 +---
drivers/s390/crypto/ap_bus.h | 1 +
drivers/s390/crypto/ap_card.c | 1 -
drivers/s390/crypto/ap_queue.c | 1 -
6 files changed, 259 insertions(+), 310 deletions(-)
delete mode 100644 drivers/s390/crypto/ap_asm.h
diff --git a/arch/s390/include/asm/ap.h b/arch/s390/include/asm/ap.h
index c1bedb4..046e044 100644
--- a/arch/s390/include/asm/ap.h
+++ b/arch/s390/include/asm/ap.h
@@ -47,6 +47,50 @@ struct ap_queue_status {
};
/**
+ * ap_instructions_available() - Test if AP instructions are available.
+ *
+ * Returns 0 if the AP instructions are installed.
+ */
+static inline int ap_instructions_available(void)
+{
+ register unsigned long reg0 asm ("0") = AP_MKQID(0, 0);
+ register unsigned long reg1 asm ("1") = -ENODEV;
+ register unsigned long reg2 asm ("2");
+
+ asm volatile(
+ " .long 0xb2af0000\n" /* PQAP(TAPQ) */
+ "0: la %0,0\n"
+ "1:\n"
+ EX_TABLE(0b, 1b)
+ : "+d" (reg1), "=d" (reg2)
+ : "d" (reg0)
+ : "cc");
+ return reg1;
+}
+
+/**
+ * ap_tapq(): Test adjunct processor queue.
+ * @qid: The AP queue number
+ * @info: Pointer to queue descriptor
+ *
+ * Returns AP queue status structure.
+ */
+static inline struct ap_queue_status ap_tapq(ap_qid_t qid, unsigned long *info)
+{
+ register unsigned long reg0 asm ("0") = qid;
+ register struct ap_queue_status reg1 asm ("1");
+ register unsigned long reg2 asm ("2");
+
+ asm volatile(".long 0xb2af0000" /* PQAP(TAPQ) */
+ : "=d" (reg1), "=d" (reg2)
+ : "d" (reg0)
+ : "cc");
+ if (info)
+ *info = reg2;
+ return reg1;
+}
+
+/**
* ap_test_queue(): Test adjunct processor queue.
* @qid: The AP queue number
* @tbit: Test facilities bit
@@ -54,10 +98,57 @@ struct ap_queue_status {
*
* Returns AP queue status structure.
*/
-struct ap_queue_status ap_test_queue(ap_qid_t qid,
- int tbit,
- unsigned long *info);
+static inline struct ap_queue_status ap_test_queue(ap_qid_t qid,
+ int tbit,
+ unsigned long *info)
+{
+ if (tbit)
+ qid |= 1UL << 23; /* set T bit */
+ return ap_tapq(qid, info);
+}
+
+/**
+ * ap_rapq(): Reset adjunct processor queue.
+ * @qid: The AP queue number
+ *
+ * Returns AP queue status structure.
+ */
+static inline struct ap_queue_status ap_rapq(ap_qid_t qid)
+{
+ register unsigned long reg0 asm ("0") = qid | (1UL << 24);
+ register struct ap_queue_status reg1 asm ("1");
+
+ asm volatile(
+ ".long 0xb2af0000" /* PQAP(RAPQ) */
+ : "=d" (reg1)
+ : "d" (reg0)
+ : "cc");
+ return reg1;
+}
+
+/**
+ * ap_zapq(): Reset and zeroize adjunct processor queue.
+ * @qid: The AP queue number
+ *
+ * Returns AP queue status structure.
+ */
+static inline struct ap_queue_status ap_zapq(ap_qid_t qid)
+{
+ register unsigned long reg0 asm ("0") = qid | (2UL << 24);
+ register struct ap_queue_status reg1 asm ("1");
+
+ asm volatile(
+ ".long 0xb2af0000" /* PQAP(ZAPQ) */
+ : "=d" (reg1)
+ : "d" (reg0)
+ : "cc");
+ return reg1;
+}
+
+/**
+ * struct ap_config_info - convenience struct for AP crypto
+ * config info as returned by the ap_qci() function.
+ */
struct ap_config_info {
unsigned int apsc : 1; /* S bit */
unsigned int apxa : 1; /* N bit */
@@ -74,50 +165,189 @@ struct ap_config_info {
unsigned char _reserved4[16];
} __aligned(8);
-/*
- * ap_query_configuration(): Fetch cryptographic config info
+/**
+ * ap_qci(): Get AP configuration data
+ * @config: Pointer to the struct ap_config_info to be filled in
*
- * Returns the ap configuration info fetched via PQAP(QCI).
- * On success 0 is returned, on failure a negative errno
- * is returned, e.g. if the PQAP(QCI) instruction is not
- * available, the return value will be -EOPNOTSUPP.
+ * Returns 0 on success, or -EOPNOTSUPP.
*/
-int ap_query_configuration(struct ap_config_info *info);
+static inline int ap_qci(struct ap_config_info *config)
+{
+ register unsigned long reg0 asm ("0") = 4UL << 24;
+ register unsigned long reg1 asm ("1") = -EOPNOTSUPP;
+ register struct ap_config_info *reg2 asm ("2") = config;
+
+ asm volatile(
+ ".long 0xb2af0000\n" /* PQAP(QCI) */
+ "0: la %0,0\n"
+ "1:\n"
+ EX_TABLE(0b, 1b)
+ : "+d" (reg1)
+ : "d" (reg0), "d" (reg2)
+ : "cc", "memory");
+
+ return reg1;
+}
/*
* struct ap_qirq_ctrl - convenient struct for easy invocation
- * of the ap_queue_irq_ctrl() function. This struct is passed
- * as GR1 parameter to the PQAP(AQIC) instruction. For details
- * please see the AR documentation.
+ * of the ap_aqic() function. This struct is passed as GR1
+ * parameter to the PQAP(AQIC) instruction. For details please
+ * see the AR documentation.
*/
struct ap_qirq_ctrl {
unsigned int _res1 : 8;
- unsigned int zone : 8; /* zone info */
- unsigned int ir : 1; /* ir flag: enable (1) or disable (0) irq */
+ unsigned int zone : 8; /* zone info */
+ unsigned int ir : 1; /* ir flag: enable (1) or disable (0) irq */
unsigned int _res2 : 4;
- unsigned int gisc : 3; /* guest isc field */
+ unsigned int gisc : 3; /* guest isc field */
unsigned int _res3 : 6;
- unsigned int gf : 2; /* gisa format */
+ unsigned int gf : 2; /* gisa format */
unsigned int _res4 : 1;
- unsigned int gisa : 27; /* gisa origin */
+ unsigned int gisa : 27; /* gisa origin */
unsigned int _res5 : 1;
- unsigned int isc : 3; /* irq sub class */
+ unsigned int isc : 3; /* irq sub class */
};
/**
- * ap_queue_irq_ctrl(): Control interruption on a AP queue.
+ * ap_aqic(): Control interruption for a specific AP.
* @qid: The AP queue number
- * @qirqctrl: struct ap_qirq_ctrl, see above
+ * @qirqctrl: struct ap_qirq_ctrl (64 bit value)
* @ind: The notification indicator byte
*
* Returns AP queue status.
+ */
+static inline struct ap_queue_status ap_aqic(ap_qid_t qid,
+ struct ap_qirq_ctrl qirqctrl,
+ void *ind)
+{
+ register unsigned long reg0 asm ("0") = qid | (3UL << 24);
+ register struct ap_qirq_ctrl reg1_in asm ("1") = qirqctrl;
+ register struct ap_queue_status reg1_out asm ("1");
+ register void *reg2 asm ("2") = ind;
+
+ asm volatile(
+ ".long 0xb2af0000" /* PQAP(AQIC) */
+ : "=d" (reg1_out)
+ : "d" (reg0), "d" (reg1_in), "d" (reg2)
+ : "cc");
+ return reg1_out;
+}
+
+/*
+ * union ap_qact_ap_info - used together with the
+ * ap_qact() function to provide a convenient way
+ * to handle the ap info needed by the qact function.
+ */
+union ap_qact_ap_info {
+ unsigned long val;
+ struct {
+ unsigned int : 3;
+ unsigned int mode : 3;
+ unsigned int : 26;
+ unsigned int cat : 8;
+ unsigned int : 8;
+ unsigned char ver[2];
+ };
+};
+
+/**
+ * ap_qact(): Query AP compatibility type.
+ * @qid: The AP queue number
+ * @ifbit: Only the low-order bit is used; it is passed in GR0 as 1 << 22
+ * @apinfo: On input the info about the AP queue. On output the
+ * alternate AP queue info provided by the qact function
+ * in GR2.
*
- * Control interruption on the given AP queue.
- * Just a simple wrapper function for the low level PQAP(AQIC)
- * instruction available for other kernel modules.
+ * Returns AP queue status. Check response_code field for failures.
*/
-struct ap_queue_status ap_queue_irq_ctrl(ap_qid_t qid,
- struct ap_qirq_ctrl qirqctrl,
- void *ind);
+static inline struct ap_queue_status ap_qact(ap_qid_t qid, int ifbit,
+ union ap_qact_ap_info *apinfo)
+{
+ register unsigned long reg0 asm ("0") = qid | (5UL << 24)
+ | ((ifbit & 0x01) << 22);
+ register unsigned long reg1_in asm ("1") = apinfo->val;
+ register struct ap_queue_status reg1_out asm ("1");
+ register unsigned long reg2 asm ("2");
+
+ asm volatile(
+ ".long 0xb2af0000" /* PQAP(QACT) */
+ : "+d" (reg1_in), "=d" (reg1_out), "=d" (reg2)
+ : "d" (reg0)
+ : "cc");
+ apinfo->val = reg2;
+ return reg1_out;
+}
+
+/**
+ * ap_nqap(): Send message to adjunct processor queue.
+ * @qid: The AP queue number
+ * @psmid: The program supplied message identifier
+ * @msg: The message text
+ * @length: The message length
+ *
+ * Returns AP queue status structure.
+ * Condition code 1 on NQAP can't happen because the L bit is 1.
+ * Condition code 2 on NQAP also means the send is incomplete,
+ * because a segment boundary was reached. The NQAP is repeated.
+ */
+static inline struct ap_queue_status ap_nqap(ap_qid_t qid,
+ unsigned long long psmid,
+ void *msg, size_t length)
+{
+ register unsigned long reg0 asm ("0") = qid | 0x40000000UL;
+ register struct ap_queue_status reg1 asm ("1");
+ register unsigned long reg2 asm ("2") = (unsigned long) msg;
+ register unsigned long reg3 asm ("3") = (unsigned long) length;
+ register unsigned long reg4 asm ("4") = (unsigned int) (psmid >> 32);
+ register unsigned long reg5 asm ("5") = psmid & 0xffffffff;
+
+ asm volatile (
+ "0: .long 0xb2ad0042\n" /* NQAP */
+ " brc 2,0b"
+ : "+d" (reg0), "=d" (reg1), "+d" (reg2), "+d" (reg3)
+ : "d" (reg4), "d" (reg5)
+ : "cc", "memory");
+ return reg1;
+}
+
+/**
+ * ap_dqap(): Receive message from adjunct processor queue.
+ * @qid: The AP queue number
+ * @psmid: Pointer to program supplied message identifier
+ * @msg: The message text
+ * @length: The message length
+ *
+ * Returns AP queue status structure.
+ * Condition code 1 on DQAP means the receive has taken place
+ * but only partially. The response is incomplete, hence the
+ * DQAP is repeated.
+ * Condition code 2 on DQAP also means the receive is incomplete,
+ * this time because a segment boundary was reached. Again, the
+ * DQAP is repeated.
+ * Note that gpr2 is used by the DQAP instruction to keep track of
+ * any 'residual' length, in case the instruction gets interrupted.
+ * Hence it gets zeroed before the instruction.
+ */
+static inline struct ap_queue_status ap_dqap(ap_qid_t qid,
+ unsigned long long *psmid,
+ void *msg, size_t length)
+{
+ register unsigned long reg0 asm("0") = qid | 0x80000000UL;
+ register struct ap_queue_status reg1 asm ("1");
+ register unsigned long reg2 asm("2") = 0UL;
+ register unsigned long reg4 asm("4") = (unsigned long) msg;
+ register unsigned long reg5 asm("5") = (unsigned long) length;
+ register unsigned long reg6 asm("6") = 0UL;
+ register unsigned long reg7 asm("7") = 0UL;
+
+ asm volatile(
+ "0: .long 0xb2ae0064\n" /* DQAP */
+ " brc 6,0b\n"
+ : "+d" (reg0), "=d" (reg1), "+d" (reg2),
+ "+d" (reg4), "+d" (reg5), "+d" (reg6), "+d" (reg7)
+ : : "cc", "memory");
+ *psmid = (((unsigned long long) reg6) << 32) + reg7;
+ return reg1;
+}
#endif /* _ASM_S390_AP_H_ */
diff --git a/drivers/s390/crypto/ap_asm.h b/drivers/s390/crypto/ap_asm.h
deleted file mode 100644
index e22ee12..0000000
--- a/drivers/s390/crypto/ap_asm.h
+++ /dev/null
@@ -1,261 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-/*
- * Copyright IBM Corp. 2016
- * Author(s): Martin Schwidefsky <[email protected]>
- *
- * Adjunct processor bus inline assemblies.
- */
-
-#ifndef _AP_ASM_H_
-#define _AP_ASM_H_
-
-#include <asm/isc.h>
-
-/**
- * ap_intructions_available() - Test if AP instructions are available.
- *
- * Returns 0 if the AP instructions are installed.
- */
-static inline int ap_instructions_available(void)
-{
- register unsigned long reg0 asm ("0") = AP_MKQID(0, 0);
- register unsigned long reg1 asm ("1") = -ENODEV;
- register unsigned long reg2 asm ("2");
-
- asm volatile(
- " .long 0xb2af0000\n" /* PQAP(TAPQ) */
- "0: la %0,0\n"
- "1:\n"
- EX_TABLE(0b, 1b)
- : "+d" (reg1), "=d" (reg2)
- : "d" (reg0)
- : "cc");
- return reg1;
-}
-
-/**
- * ap_tapq(): Test adjunct processor queue.
- * @qid: The AP queue number
- * @info: Pointer to queue descriptor
- *
- * Returns AP queue status structure.
- */
-static inline struct ap_queue_status ap_tapq(ap_qid_t qid, unsigned long *info)
-{
- register unsigned long reg0 asm ("0") = qid;
- register struct ap_queue_status reg1 asm ("1");
- register unsigned long reg2 asm ("2");
-
- asm volatile(".long 0xb2af0000" /* PQAP(TAPQ) */
- : "=d" (reg1), "=d" (reg2)
- : "d" (reg0)
- : "cc");
- if (info)
- *info = reg2;
- return reg1;
-}
-
-/**
- * ap_pqap_rapq(): Reset adjunct processor queue.
- * @qid: The AP queue number
- *
- * Returns AP queue status structure.
- */
-static inline struct ap_queue_status ap_rapq(ap_qid_t qid)
-{
- register unsigned long reg0 asm ("0") = qid | (1UL << 24);
- register struct ap_queue_status reg1 asm ("1");
-
- asm volatile(
- ".long 0xb2af0000" /* PQAP(RAPQ) */
- : "=d" (reg1)
- : "d" (reg0)
- : "cc");
- return reg1;
-}
-
-/**
- * ap_pqap_zapq(): Reset and zeroize adjunct processor queue.
- * @qid: The AP queue number
- *
- * Returns AP queue status structure.
- */
-static inline struct ap_queue_status ap_zapq(ap_qid_t qid)
-{
- register unsigned long reg0 asm ("0") = qid | (2UL << 24);
- register struct ap_queue_status reg1 asm ("1");
-
- asm volatile(
- ".long 0xb2af0000" /* PQAP(ZAPQ) */
- : "=d" (reg1)
- : "d" (reg0)
- : "cc");
- return reg1;
-}
-
-/**
- * ap_aqic(): Control interruption for a specific AP.
- * @qid: The AP queue number
- * @qirqctrl: struct ap_qirq_ctrl (64 bit value)
- * @ind: The notification indicator byte
- *
- * Returns AP queue status.
- */
-static inline struct ap_queue_status ap_aqic(ap_qid_t qid,
- struct ap_qirq_ctrl qirqctrl,
- void *ind)
-{
- register unsigned long reg0 asm ("0") = qid | (3UL << 24);
- register struct ap_qirq_ctrl reg1_in asm ("1") = qirqctrl;
- register struct ap_queue_status reg1_out asm ("1");
- register void *reg2 asm ("2") = ind;
-
- asm volatile(
- ".long 0xb2af0000" /* PQAP(AQIC) */
- : "=d" (reg1_out)
- : "d" (reg0), "d" (reg1_in), "d" (reg2)
- : "cc");
- return reg1_out;
-}
-
-/**
- * ap_qci(): Get AP configuration data
- *
- * Returns 0 on success, or -EOPNOTSUPP.
- */
-static inline int ap_qci(void *config)
-{
- register unsigned long reg0 asm ("0") = 4UL << 24;
- register unsigned long reg1 asm ("1") = -EINVAL;
- register void *reg2 asm ("2") = (void *) config;
-
- asm volatile(
- ".long 0xb2af0000\n" /* PQAP(QCI) */
- "0: la %0,0\n"
- "1:\n"
- EX_TABLE(0b, 1b)
- : "+d" (reg1)
- : "d" (reg0), "d" (reg2)
- : "cc", "memory");
-
- return reg1;
-}
-
-/*
- * union ap_qact_ap_info - used together with the
- * ap_qact() function to provide a convenient way
- * to handle the ap info needed by the qact function.
- */
-union ap_qact_ap_info {
- unsigned long val;
- struct {
- unsigned int : 3;
- unsigned int mode : 3;
- unsigned int : 26;
- unsigned int cat : 8;
- unsigned int : 8;
- unsigned char ver[2];
- };
-};
-
-/**
- * ap_qact(): Query AP compatibility type.
- * @qid: The AP queue number
- * @apinfo: On input, the info about the AP queue. On output, the
- * alternate AP queue info returned by the qact function
- * in GR2.
- *
- * Returns AP queue status. Check response_code field for failures.
- */
-static inline struct ap_queue_status ap_qact(ap_qid_t qid, int ifbit,
- union ap_qact_ap_info *apinfo)
-{
- register unsigned long reg0 asm ("0") = qid | (5UL << 24)
- | ((ifbit & 0x01) << 22);
- register unsigned long reg1_in asm ("1") = apinfo->val;
- register struct ap_queue_status reg1_out asm ("1");
- register unsigned long reg2 asm ("2");
-
- asm volatile(
- ".long 0xb2af0000" /* PQAP(QACT) */
- : "+d" (reg1_in), "=d" (reg1_out), "=d" (reg2)
- : "d" (reg0)
- : "cc");
- apinfo->val = reg2;
- return reg1_out;
-}
-
-/**
- * ap_nqap(): Send message to adjunct processor queue.
- * @qid: The AP queue number
- * @psmid: The program supplied message identifier
- * @msg: The message text
- * @length: The message length
- *
- * Returns AP queue status structure.
- * Condition code 1 on NQAP can't happen because the L bit is 1.
- * Condition code 2 on NQAP also means the send is incomplete,
- * because a segment boundary was reached. The NQAP is repeated.
- */
-static inline struct ap_queue_status ap_nqap(ap_qid_t qid,
- unsigned long long psmid,
- void *msg, size_t length)
-{
- register unsigned long reg0 asm ("0") = qid | 0x40000000UL;
- register struct ap_queue_status reg1 asm ("1");
- register unsigned long reg2 asm ("2") = (unsigned long) msg;
- register unsigned long reg3 asm ("3") = (unsigned long) length;
- register unsigned long reg4 asm ("4") = (unsigned int) (psmid >> 32);
- register unsigned long reg5 asm ("5") = psmid & 0xffffffff;
-
- asm volatile (
- "0: .long 0xb2ad0042\n" /* NQAP */
- " brc 2,0b"
- : "+d" (reg0), "=d" (reg1), "+d" (reg2), "+d" (reg3)
- : "d" (reg4), "d" (reg5)
- : "cc", "memory");
- return reg1;
-}
-
-/**
- * ap_dqap(): Receive message from adjunct processor queue.
- * @qid: The AP queue number
- * @psmid: Pointer to program supplied message identifier
- * @msg: The message text
- * @length: The message length
- *
- * Returns AP queue status structure.
- * Condition code 1 on DQAP means the receive has taken place
- * but only partially. The response is incomplete, hence the
- * DQAP is repeated.
- * Condition code 2 on DQAP also means the receive is incomplete,
- * this time because a segment boundary was reached. Again, the
- * DQAP is repeated.
- * Note that gpr2 is used by the DQAP instruction to keep track of
- * any 'residual' length, in case the instruction gets interrupted.
- * Hence it gets zeroed before the instruction.
- */
-static inline struct ap_queue_status ap_dqap(ap_qid_t qid,
- unsigned long long *psmid,
- void *msg, size_t length)
-{
- register unsigned long reg0 asm("0") = qid | 0x80000000UL;
- register struct ap_queue_status reg1 asm ("1");
- register unsigned long reg2 asm("2") = 0UL;
- register unsigned long reg4 asm("4") = (unsigned long) msg;
- register unsigned long reg5 asm("5") = (unsigned long) length;
- register unsigned long reg6 asm("6") = 0UL;
- register unsigned long reg7 asm("7") = 0UL;
-
-
- asm volatile(
- "0: .long 0xb2ae0064\n" /* DQAP */
- " brc 6,0b\n"
- : "+d" (reg0), "=d" (reg1), "+d" (reg2),
- "+d" (reg4), "+d" (reg5), "+d" (reg6), "+d" (reg7)
- : : "cc", "memory");
- *psmid = (((unsigned long long) reg6) << 32) + reg7;
- return reg1;
-}
-
-#endif /* _AP_ASM_H_ */
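The ap_nqap() and ap_dqap() wrappers removed above pass the 64-bit program supplied message identifier (psmid) through a register pair: NQAP takes the high 32 bits in GR4 and the low 32 bits in GR5, and DQAP returns them in GR6 and GR7, which the wrapper recombines. The following plain-C sketch mirrors only that split and reassembly (no AP instructions are issued); psmid_split() and psmid_join() are hypothetical names, not kernel functions.

```c
#include <stdint.h>

/* Hypothetical helper: split a psmid like ap_nqap() loads reg4 and reg5. */
static void psmid_split(uint64_t psmid, uint64_t *hi, uint64_t *lo)
{
	*hi = (uint32_t)(psmid >> 32);	/* high word, as (unsigned int)(psmid >> 32) */
	*lo = psmid & 0xffffffffULL;	/* low word, as psmid & 0xffffffff */
}

/* Hypothetical helper: rebuild a psmid like ap_dqap() combines reg6 and reg7. */
static uint64_t psmid_join(uint64_t hi, uint64_t lo)
{
	return (hi << 32) + lo;		/* lo < 2^32, so '+' acts like '|' */
}
```

Keeping the two halves in separate 64-bit registers matches how the wrappers declare reg4/reg5 and reg6/reg7 as unsigned long.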
diff --git a/drivers/s390/crypto/ap_bus.c b/drivers/s390/crypto/ap_bus.c
index 35a0c2b..c0a6723 100644
--- a/drivers/s390/crypto/ap_bus.c
+++ b/drivers/s390/crypto/ap_bus.c
@@ -36,7 +36,6 @@
#include <linux/debugfs.h>
#include "ap_bus.h"
-#include "ap_asm.h"
#include "ap_debug.h"
/*
@@ -174,24 +173,6 @@ static inline int ap_qact_available(void)
return 0;
}
-/**
- * ap_test_queue(): Test adjunct processor queue.
- * @qid: The AP queue number
- * @tbit: Test facilities bit
- * @info: Pointer to queue descriptor
- *
- * Returns AP queue status structure.
- */
-struct ap_queue_status ap_test_queue(ap_qid_t qid,
- int tbit,
- unsigned long *info)
-{
- if (tbit)
- qid |= 1UL << 23; /* set T bit*/
- return ap_tapq(qid, info);
-}
-EXPORT_SYMBOL(ap_test_queue);
-
/*
* ap_query_configuration(): Fetch cryptographic config info
*
@@ -200,7 +181,7 @@ struct ap_queue_status ap_test_queue(ap_qid_t qid,
* is returned, e.g. if the PQAP(QCI) instruction is not
* available, the return value will be -EOPNOTSUPP.
*/
-int ap_query_configuration(struct ap_config_info *info)
+static inline int ap_query_configuration(struct ap_config_info *info)
{
if (!ap_configuration_available())
return -EOPNOTSUPP;
diff --git a/drivers/s390/crypto/ap_bus.h b/drivers/s390/crypto/ap_bus.h
index 6a273c5..9365419 100644
--- a/drivers/s390/crypto/ap_bus.h
+++ b/drivers/s390/crypto/ap_bus.h
@@ -15,6 +15,7 @@
#include <linux/device.h>
#include <linux/types.h>
+#include <asm/isc.h>
#include <asm/ap.h>
#define AP_DEVICES 256 /* Number of AP devices. */
diff --git a/drivers/s390/crypto/ap_card.c b/drivers/s390/crypto/ap_card.c
index 2c726df..c13e432 100644
--- a/drivers/s390/crypto/ap_card.c
+++ b/drivers/s390/crypto/ap_card.c
@@ -14,7 +14,6 @@
#include <asm/facility.h>
#include "ap_bus.h"
-#include "ap_asm.h"
/*
* AP card related attributes.
diff --git a/drivers/s390/crypto/ap_queue.c b/drivers/s390/crypto/ap_queue.c
index ba3a2e1..d83c1fa 100644
--- a/drivers/s390/crypto/ap_queue.c
+++ b/drivers/s390/crypto/ap_queue.c
@@ -14,7 +14,6 @@
#include <asm/facility.h>
#include "ap_bus.h"
-#include "ap_asm.h"
/**
 * ap_queue_irq_ctrl(): Control interruption on an AP queue.
--
1.7.1
From: Harald Freudenberger <[email protected]>
Reviewed and adapted the register use and asm constraints
of the C inline assembler functions in accordance with
the AP instruction specifications.
Signed-off-by: Harald Freudenberger <[email protected]>
Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/ap_asm.h | 40 +++++++++++++++++++++++-----------------
1 files changed, 23 insertions(+), 17 deletions(-)
diff --git a/drivers/s390/crypto/ap_asm.h b/drivers/s390/crypto/ap_asm.h
index b22d30a..e22ee12 100644
--- a/drivers/s390/crypto/ap_asm.h
+++ b/drivers/s390/crypto/ap_asm.h
@@ -20,14 +20,16 @@ static inline int ap_instructions_available(void)
{
register unsigned long reg0 asm ("0") = AP_MKQID(0, 0);
register unsigned long reg1 asm ("1") = -ENODEV;
- register unsigned long reg2 asm ("2") = 0UL;
+ register unsigned long reg2 asm ("2");
asm volatile(
" .long 0xb2af0000\n" /* PQAP(TAPQ) */
- "0: la %1,0\n"
+ "0: la %0,0\n"
"1:\n"
EX_TABLE(0b, 1b)
- : "+d" (reg0), "+d" (reg1), "+d" (reg2) : : "cc");
+ : "+d" (reg1), "=d" (reg2)
+ : "d" (reg0)
+ : "cc");
return reg1;
}
@@ -42,10 +44,12 @@ static inline struct ap_queue_status ap_tapq(ap_qid_t qid, unsigned long *info)
{
register unsigned long reg0 asm ("0") = qid;
register struct ap_queue_status reg1 asm ("1");
- register unsigned long reg2 asm ("2") = 0UL;
+ register unsigned long reg2 asm ("2");
asm volatile(".long 0xb2af0000" /* PQAP(TAPQ) */
- : "+d" (reg0), "=d" (reg1), "+d" (reg2) : : "cc");
+ : "=d" (reg1), "=d" (reg2)
+ : "d" (reg0)
+ : "cc");
if (info)
*info = reg2;
return reg1;
@@ -59,13 +63,14 @@ static inline struct ap_queue_status ap_tapq(ap_qid_t qid, unsigned long *info)
*/
static inline struct ap_queue_status ap_rapq(ap_qid_t qid)
{
- register unsigned long reg0 asm ("0") = qid | 0x01000000UL;
+ register unsigned long reg0 asm ("0") = qid | (1UL << 24);
register struct ap_queue_status reg1 asm ("1");
- register unsigned long reg2 asm ("2") = 0UL;
asm volatile(
".long 0xb2af0000" /* PQAP(RAPQ) */
- : "+d" (reg0), "=d" (reg1), "+d" (reg2) : : "cc");
+ : "=d" (reg1)
+ : "d" (reg0)
+ : "cc");
return reg1;
}
@@ -107,8 +112,8 @@ static inline struct ap_queue_status ap_aqic(ap_qid_t qid,
asm volatile(
".long 0xb2af0000" /* PQAP(AQIC) */
- : "+d" (reg0), "+d" (reg1_in), "=d" (reg1_out), "+d" (reg2)
- :
+ : "=d" (reg1_out)
+ : "d" (reg0), "d" (reg1_in), "d" (reg2)
: "cc");
return reg1_out;
}
@@ -120,17 +125,17 @@ static inline struct ap_queue_status ap_aqic(ap_qid_t qid,
*/
static inline int ap_qci(void *config)
{
- register unsigned long reg0 asm ("0") = 0x04000000UL;
+ register unsigned long reg0 asm ("0") = 4UL << 24;
register unsigned long reg1 asm ("1") = -EINVAL;
register void *reg2 asm ("2") = (void *) config;
asm volatile(
".long 0xb2af0000\n" /* PQAP(QCI) */
- "0: la %1,0\n"
+ "0: la %0,0\n"
"1:\n"
EX_TABLE(0b, 1b)
- : "+d" (reg0), "+d" (reg1), "+d" (reg2)
- :
+ : "+d" (reg1)
+ : "d" (reg0), "d" (reg2)
: "cc", "memory");
return reg1;
@@ -169,12 +174,13 @@ static inline struct ap_queue_status ap_qact(ap_qid_t qid, int ifbit,
| ((ifbit & 0x01) << 22);
register unsigned long reg1_in asm ("1") = apinfo->val;
register struct ap_queue_status reg1_out asm ("1");
- register unsigned long reg2 asm ("2") = 0;
+ register unsigned long reg2 asm ("2");
asm volatile(
".long 0xb2af0000" /* PQAP(QACT) */
- : "+d" (reg0), "+d" (reg1_in), "=d" (reg1_out), "+d" (reg2)
- : : "cc");
+ : "+d" (reg1_in), "=d" (reg1_out), "=d" (reg2)
+ : "d" (reg0)
+ : "cc");
apinfo->val = reg2;
return reg1_out;
}
--
1.7.1
From: Harald Freudenberger <[email protected]>
Added new inline function ap_zapq(),
which is a C inline function wrapper for
the AP PQAP(ZAPQ) instruction.
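For reference, each PQAP wrapper in ap_asm.h selects its function by OR-ing a function code, shifted left by 24, into the queue number in GR0: TAPQ 0, RAPQ 1, ZAPQ 2, AQIC 3, QCI 4 and QACT 5, as read from the wrappers in this series. The sketch below (plain C, no inline assembly; enum pqap_fc and pqap_gr0() are hypothetical names) reproduces that encoding so it can be checked against the old hex literals such as 0x01000000UL for RAPQ.

```c
/* PQAP function codes, as encoded by the wrappers in ap_asm.h. */
enum pqap_fc {
	PQAP_TAPQ = 0,
	PQAP_RAPQ = 1,
	PQAP_ZAPQ = 2,
	PQAP_AQIC = 3,
	PQAP_QCI  = 4,
	PQAP_QACT = 5,
};

/* Hypothetical helper: GR0 value for a PQAP call on queue 'qid'. */
static unsigned long pqap_gr0(unsigned long qid, enum pqap_fc fc)
{
	return qid | ((unsigned long)fc << 24);
}
```

Writing the shift explicitly, as the reworked wrappers do with (2UL << 24), keeps the function code visible instead of hiding it in a 32-bit literal.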
Signed-off-by: Harald Freudenberger <[email protected]>
Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/ap_asm.h | 19 +++++++++++++++++++
1 files changed, 19 insertions(+), 0 deletions(-)
diff --git a/drivers/s390/crypto/ap_asm.h b/drivers/s390/crypto/ap_asm.h
index 16b59ce..b22d30a 100644
--- a/drivers/s390/crypto/ap_asm.h
+++ b/drivers/s390/crypto/ap_asm.h
@@ -70,6 +70,25 @@ static inline struct ap_queue_status ap_rapq(ap_qid_t qid)
}
/**
+ * ap_zapq(): Reset and zeroize adjunct processor queue.
+ * @qid: The AP queue number
+ *
+ * Returns AP queue status structure.
+ */
+static inline struct ap_queue_status ap_zapq(ap_qid_t qid)
+{
+ register unsigned long reg0 asm ("0") = qid | (2UL << 24);
+ register struct ap_queue_status reg1 asm ("1");
+
+ asm volatile(
+ ".long 0xb2af0000" /* PQAP(ZAPQ) */
+ : "=d" (reg1)
+ : "d" (reg0)
+ : "cc");
+ return reg1;
+}
+
+/**
* ap_aqic(): Control interruption for a specific AP.
* @qid: The AP queue number
* @qirqctrl: struct ap_qirq_ctrl (64 bit value)
--
1.7.1
This patch provides documentation describing the AP architecture and
design concepts behind the virtualization of AP devices. It also
includes an example of how to configure AP devices for exclusive
use of KVM guests.
Signed-off-by: Tony Krowiak <[email protected]>
---
Documentation/s390/vfio-ap.txt | 575 ++++++++++++++++++++++++++++++++++++++++
MAINTAINERS | 1 +
2 files changed, 576 insertions(+), 0 deletions(-)
create mode 100644 Documentation/s390/vfio-ap.txt
diff --git a/Documentation/s390/vfio-ap.txt b/Documentation/s390/vfio-ap.txt
new file mode 100644
index 0000000..79f3d43
--- /dev/null
+++ b/Documentation/s390/vfio-ap.txt
@@ -0,0 +1,575 @@
+Introduction:
+============
+The Adjunct Processor (AP) facility is an IBM Z cryptographic facility consisting
+of three AP instructions and from 1 up to 256 PCIe cryptographic adapter cards.
+The AP devices provide cryptographic functions to all CPUs assigned to a
+Linux system running in an IBM Z system LPAR.
+
+The AP adapter cards are exposed via the AP bus. The motivation for vfio-ap
+is to make AP cards available to KVM guests using the VFIO mediated device
+framework. This implementation relies considerably on the s390 virtualization
+facilities, which do most of the hard work of providing direct access to AP
+devices.
+
+AP Architectural Overview:
+=========================
+To facilitate the comprehension of the design, let's start with some
+definitions:
+
+* AP adapter
+
+ An AP adapter is an IBM Z adapter card that can perform cryptographic
+ functions. There can be from 0 to 256 adapters assigned to an LPAR. Adapters
+ assigned to the LPAR in which a Linux host is running will be available to
+ the Linux host. Each adapter is identified by a number from 0 to 255. When
+ installed, an AP adapter is accessed by AP instructions executed by any CPU.
+
+ The AP adapter cards are assigned to a given LPAR via the system's Activation
+ Profile which can be edited via the HMC. When the system is IPL'd, the AP bus
+ module is loaded and detects the AP adapter cards assigned to the LPAR. The AP
+ bus creates a sysfs device for each adapter as it is detected. For example,
+ if AP adapters 4 and 10 (0x0a) are assigned to the LPAR, the AP bus will
+ create the following sysfs entries:
+
+ /sys/devices/ap/card04
+ /sys/devices/ap/card0a
+
+ Symbolic links to these devices will also be created in the AP bus devices
+ sub-directory:
+
+ /sys/bus/ap/devices/[card04]
+ /sys/bus/ap/devices/[card0a]
+
+* AP domain
+
+ An adapter is partitioned into domains. Each domain can be thought of as
+ a set of hardware registers for processing AP instructions. An adapter can
+ hold up to 256 domains. Each domain is identified by a number from 0 to 255.
+ Domains can be further classified into two types:
+
+ * Usage domains are domains that can be accessed directly to process AP
+ commands.
+
+ * Control domains are domains that are accessed indirectly by AP
+ commands sent to a usage domain to control or change the domain, for
+ example, to set a secure private key for the domain.
+
+ The AP usage and control domains are assigned to a given LPAR via the system's
+ Activation Profile which can be edited via the HMC. When the system is IPL'd,
+ the AP bus module is loaded and detects the AP usage and control domains
+ assigned to the LPAR. The domain number of each usage domain will be coupled
+ with the adapter number of each AP adapter assigned to the LPAR to identify
+ the AP queues (see AP Queue section below). The domain number of each control
+ domain will be represented in a bitmask and stored in a sysfs file
+ /sys/bus/ap/ap_control_domain_mask created by the bus. The bits in the mask,
+ from most to least significant bit, correspond to domains 0-255.
+
+ A domain may be assigned to a system as both a usage and control domain, or
+ as a control domain only. Consequently, a domain assigned as both a usage and
+ control domain can process AP commands and can also be changed by an AP
+ command sent to any usage domain assigned to the same system. A domain
+ assigned only as a control domain cannot process AP commands, but it can be
+ changed by AP commands sent to any usage domain assigned to the system.
+
+* AP Queue
+
+ An AP queue is the means by which an AP command-request message is sent to a
+ usage domain inside a specific adapter. An AP queue is identified by a tuple
+ comprised of an AP adapter ID (APID) and an AP queue index (APQI). The
+ APQI corresponds to a given usage domain number within the adapter. This tuple
+ forms an AP Queue Number (APQN) uniquely identifying an AP queue. AP
+ instructions include a field containing the APQN to identify the AP queue to
+ which the AP command-request message is to be sent for processing.
+
+ The AP bus will create a sysfs device for each APQN that can be derived from
+ the cross product of the AP adapter and usage domain numbers detected when the
+ AP bus module is loaded. For example, if adapters 4 and 10 (0x0a) and usage
+ domains 6 and 71 (0x47) are assigned to the LPAR, the AP bus will create the
+ following sysfs entries:
+
+ /sys/devices/ap/card04/04.0006
+ /sys/devices/ap/card04/04.0047
+ /sys/devices/ap/card0a/0a.0006
+ /sys/devices/ap/card0a/0a.0047
+
+ The following symbolic links to these devices will be created in the AP bus
+ devices subdirectory:
+
+ /sys/bus/ap/devices/[04.0006]
+ /sys/bus/ap/devices/[04.0047]
+ /sys/bus/ap/devices/[0a.0006]
+ /sys/bus/ap/devices/[0a.0047]
+
+* AP instructions
+
+ There are three AP instructions:
+
+ * NQAP: to enqueue an AP command-request message to a queue
+ * DQAP: to dequeue an AP command-reply message from a queue
+ * PQAP: to administer the queues
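As a sketch of the control-domain bit ordering described under AP domain, the helper below tests one domain in the text read from /sys/bus/ap/ap_control_domain_mask. It assumes the file holds "0x" followed by 64 hex digits (256 bits), with domain 0 as the most significant bit of the first digit; that exact output format is an assumption, and control_domain_set() is a hypothetical name.

```c
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if control domain 'dom' is set in 'mask',
 * 0 if it is clear, or -1 if 'dom' is out of range or the prefix, the
 * length or the hex digit holding 'dom' does not match the assumed format.
 */
static int control_domain_set(const char *mask, unsigned int dom)
{
	size_t len;
	int c, nibble;

	if (dom > 255 || strncmp(mask, "0x", 2) != 0)
		return -1;
	mask += 2;
	len = strcspn(mask, "\n");		/* ignore a trailing newline */
	if (len != 64)
		return -1;
	c = (unsigned char)mask[dom / 4];	/* each hex digit covers 4 domains */
	if (!isxdigit(c))
		return -1;
	nibble = isdigit(c) ? c - '0' : tolower(c) - 'a' + 10;
	return (nibble >> (3 - dom % 4)) & 1;	/* digit MSB = lowest domain */
}
```

With this ordering, a mask beginning "0x02" has domain 6 set, because the second digit covers domains 4-7 from its most significant bit down.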
+
+AP and SIE:
+==========
+Let's now see how AP instructions are interpreted by the hardware.
+
+A satellite control block called the Crypto Control Block (CRYCB) is attached
+to our main hardware virtualization control block. The CRYCB contains three
+fields to identify the adapters, usage domains and control domains assigned to
+the KVM guest:
+
+* The AP Mask (APM) field is a bit mask that identifies the AP adapters assigned
+ to the KVM guest. Each bit in the mask, from most significant to least
+ significant bit, corresponds to an APID from 0-255. If a bit is set, the
+ corresponding adapter is valid for use by the KVM guest.
+
+* The AP Queue Mask (AQM) field is a bit mask identifying the AP usage domains
+ assigned to the KVM guest. Each bit in the mask, from most significant to
+ least significant bit, corresponds to an AP queue index (APQI) from 0-255. If
+ a bit is set, the corresponding queue is valid for use by the KVM guest.
+
+* The AP Domain Mask (ADM) field is a bit mask that identifies the AP control
+ domains assigned to the KVM guest. The ADM bit mask controls which domains can
+ be changed by an AP command-request message sent to a usage domain from the
+ guest. Each bit in the mask, from most significant to least significant bit,
+ corresponds to a domain from 0-255. If a bit is set, the corresponding domain
+ can be modified by an AP command-request message sent to a usage domain
+ configured for the KVM guest.
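The APM, AQM and ADM all number their bits starting from the most significant bit, so bit 0 is the leftmost bit of the first byte. A minimal sketch of that convention on a plain 256-bit byte array (struct ap_mask and its helpers are hypothetical, not the kernel's bitmap API):

```c
#include <string.h>

#define AP_MASK_BITS 256

/* Hypothetical 256-bit mask laid out like the APM, AQM and ADM fields. */
struct ap_mask {
	unsigned char bytes[AP_MASK_BITS / 8];
};

/* Set bit 'nr', counting from the most significant bit of byte 0. */
static void ap_mask_set(struct ap_mask *m, unsigned int nr)
{
	if (nr >= AP_MASK_BITS)
		return;
	m->bytes[nr / 8] |= 0x80 >> (nr % 8);
}

/* Return nonzero if bit 'nr' is set, using the same numbering. */
static int ap_mask_test(const struct ap_mask *m, unsigned int nr)
{
	if (nr >= AP_MASK_BITS)
		return 0;
	return (m->bytes[nr / 8] & (0x80 >> (nr % 8))) != 0;
}
```

Setting APID 10 with this numbering produces 0x20 in the second byte, not 0x04 as LSB-first numbering would.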
+
+Recall from the description of an AP queue that AP instructions include
+an APQN to identify the AP adapter and AP queue to which an AP command-request
+message is to be sent (NQAP and PQAP instructions), or from which a
+command-reply message is to be received (DQAP instruction). The validity of an
+APQN is defined by the matrix calculated from the APM and AQM; it is the
+cross product of all assigned adapter numbers (APM) with all assigned queue
+indexes (AQM). For example, if adapters 1 and 2 and usage domains 5 and 6 are
+assigned to a guest, the APQNs (1,5), (1,6), (2,5) and (2,6) will be valid for
+the guest.
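The APQN matrix described above is a plain cross product. The sketch below enumerates it for a guest and formats each APQN the way the AP bus names its queue devices in sysfs (two hex digits for the adapter, four for the domain, as in 04.0047). list_apqns() is a hypothetical helper operating on simple arrays rather than on the real masks.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: store the cross product of adapter numbers (APM)
 * and queue indexes (AQM) in 'out' as "cc.qqqq" strings. Returns the
 * number of APQNs written, or -1 if 'out' has room for fewer.
 */
static int list_apqns(const unsigned int *apids, size_t n_apids,
		      const unsigned int *apqis, size_t n_apqis,
		      char out[][8], size_t out_len)
{
	size_t i, j, n = 0;

	if (n_apids * n_apqis > out_len)
		return -1;
	for (i = 0; i < n_apids; i++)
		for (j = 0; j < n_apqis; j++)
			snprintf(out[n++], 8, "%02x.%04x", apids[i], apqis[j]);
	return (int)n;
}
```

With adapters 4 and 10 and domains 6 and 0x47 it yields 04.0006, 04.0047, 0a.0006 and 0a.0047, matching the AP queue sysfs entries listed earlier.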
+
+The APQNs can provide secure key functionality - i.e., a private key is stored
+on the adapter card for each of its domains - so each APQN must be assigned to
+at most one guest or the Linux host.
+
+ Example 1: Valid configuration:
+ ------------------------------
+ Guest1: adapters 1,2 domains 5,6
+ Guest2: adapters 1,2 domain 7
+
+ This is valid because both guests have a unique set of APQNs: Guest1 has
+ APQNs (1,5), (1,6), (2,5) and (2,6); Guest2 has APQNs (1,7) and (2,7).
+
+ Example 2: Invalid configuration:
+ --------------------------------
+ Guest1: adapters 1,2 domains 5,6
+ Guest2: adapter 1 domains 6,7
+
+ This is an invalid configuration because both guests have access to
+ APQN (1,6).
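Because each guest's APQNs form a cross product, two configurations share an APQN exactly when they share at least one adapter and at least one usage domain, so there is no need to enumerate the tuples. A hypothetical checker built on that observation (struct guest_matrix and matrices_conflict() are illustrative only):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one guest's matrix as two sets of numbers. */
struct guest_matrix {
	bool adapter[256];	/* adapter[n] is true if APID n is assigned */
	bool domain[256];	/* domain[n] is true if APQI n is assigned */
};

/* Two 256-entry sets intersect if any index is set in both. */
static bool sets_intersect(const bool *a, const bool *b)
{
	for (size_t i = 0; i < 256; i++)
		if (a[i] && b[i])
			return true;
	return false;
}

/* Guests conflict iff they share an adapter AND a usage domain. */
static bool matrices_conflict(const struct guest_matrix *g1,
			      const struct guest_matrix *g2)
{
	return sets_intersect(g1->adapter, g2->adapter) &&
	       sets_intersect(g1->domain, g2->domain);
}
```

In Example 1 the guests share adapters but no domain, so they do not conflict; in Example 2 they share adapter 1 and domain 6, so they do.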
+
+The Design:
+===========
+The design introduces three new objects:
+
+1. AP matrix device
+2. VFIO AP device driver (vfio_ap.ko)
+3. AP mediated matrix passthrough device
+
+The VFIO AP device driver
+-------------------------
+The VFIO AP (vfio_ap) device driver serves the following purposes:
+
+1. Provides the interfaces to reserve APQNs for exclusive use of KVM guests.
+
+2. Sets up the VFIO mediated device interfaces to manage the mediated matrix
+ device and create the sysfs interfaces for assigning adapters, usage domains,
+ and control domains comprising the matrix for a KVM guest.
+
+3. Configures the APM, AQM and ADM in the CRYCB referenced by a KVM guest's
+ SIE state description to grant the guest access to AP devices.
+
+4. Initializes the CPU model feature indicating that a KVM guest may use
+ AP facilities installed on the Linux host.
+
+5. Enables interpretive execution mode for the KVM guest.
+
+Reserve APQNs for exclusive use of KVM guests
+---------------------------------------------
+The following block diagram illustrates the mechanism by which APQNs are
+reserved:
+
+ +------------------+
+ remove | | unbind
+ +------------------->+ cex4queue driver +<-----------+
+ | | | |
+ | +------------------+ |
+ | |
+ | |
+ | |
++--------+---------+ register +------------------+ +-----+------+
+| +<---------+ | bind | |
+| ap_bus | | vfio_ap driver +<-----+ admin |
+| +--------->+ | | |
++------------------+ probe +---+--------+-----+ +------------+
+ | |
+ create | | store APQN
+ | |
+ v v
+ +---+--------+-----+
+ | |
+ | matrix device |
+ | |
+ +------------------+
+
+The process for reserving an AP queue for use by a KVM guest is:
+
+* The vfio_ap driver, during its initialization, will perform the following:
+ * Create the 'vfio_ap' root device - /sys/devices/vfio_ap
+ * Create the 'matrix' device in the 'vfio_ap' root
+ * Register the matrix device with the device core
+* Register with the ap_bus for AP queue devices of type 10 (CEX4 and newer),
+ providing the vfio_ap driver's probe and remove callback interfaces. Older
+ devices are not supported because no systems with them are available for
+ testing.
+* The admin unbinds queue cc.qqqq from the cex4queue device driver. This results
+ in the ap_bus calling the cex4queue driver's remove interface, which
+ unbinds the cc.qqqq queue device from the driver.
+* The admin binds the cc.qqqq queue to the vfio_ap device driver. This results
+ in the ap_bus calling the vfio_ap device driver's probe interface to bind
+ queue cc.qqqq to the driver. The vfio_ap device driver will store the APQN for
+ the queue in the matrix device.
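The admin's unbind and bind steps are ordinary writes of the queue name to driver attributes under /sys/bus/ap/drivers. A minimal user-space sketch, assuming the cex4queue and vfio_ap driver directories shown in the example section of this document; reserve_queue() and sysfs_write() are hypothetical helpers, and the root path is a parameter so the logic can be exercised outside sysfs (on a real host it would be "/sys/bus/ap" and require root privileges).

```c
#include <stdio.h>

/* Hypothetical helper: write 'value' to 'root/rel'; 0 on success, -1 on error. */
static int sysfs_write(const char *root, const char *rel, const char *value)
{
	char path[512];
	FILE *f;
	int n, ret = 0;

	n = snprintf(path, sizeof(path), "%s/%s", root, rel);
	if (n < 0 || n >= (int)sizeof(path))
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fputs(value, f) == EOF)
		ret = -1;
	if (fclose(f) == EOF)	/* stdio may defer the write() until here */
		ret = -1;
	return ret;
}

/* Move queue 'qname' (e.g. "05.0004") from cex4queue to vfio_ap. */
static int reserve_queue(const char *root, const char *qname)
{
	if (sysfs_write(root, "drivers/cex4queue/unbind", qname))
		return -1;
	return sysfs_write(root, "drivers/vfio_ap/bind", qname);
}
```

Unbinding first matters: a queue device can be bound to only one driver at a time, so the bind to vfio_ap would fail while cex4queue still owns it.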
+
+Set up the VFIO mediated device interfaces
+------------------------------------------
+The VFIO AP device driver utilizes the common interface of the VFIO mediated
+device core driver to:
+* Register an AP mediated bus driver to add a mediated matrix device to and
+ remove it from a VFIO group.
+* Create and destroy a mediated matrix device
+* Add a mediated matrix device to and remove it from the AP mediated bus driver
+* Add a mediated matrix device to and remove it from an IOMMU group
+
+The following high-level block diagram shows the main components and interfaces
+of the VFIO AP mediated matrix device driver:
+
+ +-------------+
+ | |
+ | +---------+ | mdev_register_driver() +--------------+
+ | | Mdev | +<-----------------------+ |
+ | | bus | | | vfio_mdev.ko |
+ | | driver | +----------------------->+ |<-> VFIO user
+ | +---------+ | probe()/remove() +--------------+ APIs
+ | |
+ | MDEV CORE |
+ | MODULE |
+ | mdev.ko |
+ | +---------+ | mdev_register_device() +--------------+
+ | |Physical | +<-----------------------+ |
+ | | device | | | vfio_ap.ko |<-> matrix
+ | |interface| +----------------------->+ | device
+ | +---------+ | callback +--------------+
+ +-------------+
+
+During initialization of the vfio_ap module, the matrix device is registered
+with an 'mdev_parent_ops' structure that provides the sysfs attribute
+structures, mdev functions and callback interfaces for managing the mediated
+matrix device.
+
+* sysfs attribute structures:
+ * supported_type_groups
+ The VFIO mediated device framework supports creation of user-defined
+ mediated device types. These mediated device types are specified
+ via the 'supported_type_groups' structure when a device is registered
+ with the mediated device framework. The registration process creates the
+ sysfs structures for each mediated device type specified in the
+ 'mdev_supported_types' sub-directory of the device being registered. Along
+ with the device type, the sysfs attributes of the mediated device type are
+ provided.
+
+ The VFIO AP device driver will register one mediated device type for
+ passthrough devices:
+ /sys/devices/vfio_ap/matrix/mdev_supported_types/vfio_ap-passthrough
+ Only the three read-only attributes required by the VFIO mdev framework will
+ be provided:
+ /sys/devices/vfio_ap/matrix/mdev_supported_types/vfio_ap-passthrough
+ ... name
+ ... device_api
+ ... available_instances
+ Where:
+ * name: specifies the name of the mediated device type
+ * device_api: the mediated device type's API
+ * available_instances: the number of mediated matrix passthrough devices
+ that can be created
+ * mdev_attr_groups
+ This attribute group identifies the user-defined sysfs attributes of the
+ mediated device. When a device is registered with the VFIO mediated device
+ framework, the sysfs attribute files identified in the 'mdev_attr_groups'
+ structure will be created in the mediated matrix device's directory. The
+ sysfs attributes for a mediated matrix device are:
+ * assign_adapter:
+ A write-only file for assigning an AP adapter to the mediated matrix
+ device. To assign an adapter, the APID of the adapter is written to the
+ file.
+ * assign_domain:
+ A write-only file for assigning an AP usage domain to the mediated matrix
+ device. To assign a domain, the APQI of the AP queue corresponding to a
+ usage domain is written to the file.
+ * matrix:
+ A read-only file for displaying the APQNs derived from the adapters and
+ domains assigned to the mediated matrix device.
+ * assign_control_domain:
+ A write-only file for assigning an AP control domain to the mediated
+ matrix device. To assign a control domain, the ID of a domain to be
+ controlled is written to the file. For the initial implementation, the set
+ of control domains will always include the set of usage domains, so it is
+ only necessary to assign control domains that are not also assigned as
+ usage domains.
+ * control_domains:
+ A read-only file for displaying the control domain numbers assigned to the
+ mediated matrix device.
+
+* functions:
+ * create:
+ allocates the ap_matrix_mdev structure used by the vfio_ap driver to:
+ * Keep track of the available instances
+ * Store the reference to the struct kvm for the KVM guest
+ * Provide the notifier callback that will get invoked to handle the
+ VFIO_GROUP_NOTIFY_SET_KVM event. When received, the vfio_ap driver will
+ store the reference in the mediated matrix device's ap_matrix_mdev
+ structure and enable the interpretive execution mode for the KVM guest.
+ * remove:
+ deallocates the mediated matrix device's ap_matrix_mdev structure.
+
+* callback interfaces
+ * open:
+ The vfio_ap driver uses this callback to register a
+ VFIO_GROUP_NOTIFY_SET_KVM notifier callback function for the mdev matrix
+ device. The notifier is invoked when QEMU connects the VFIO IOMMU group
+ for the mdev matrix device to the MDEV bus. Access to the KVM structure used
+ to configure the KVM guest is provided via this callback. The KVM structure
+ is used to configure the guest's access to the AP matrix defined via the
+ mediated matrix device's sysfs attribute files.
+ * release:
+ unregisters the VFIO_GROUP_NOTIFY_SET_KVM notifier callback function for the
+ mdev matrix device and deconfigures the guest's AP matrix.
+
+Configure the APM, AQM and ADM in the CRYCB:
+-------------------------------------------
+Configuring the AP matrix for a KVM guest will be performed when the
+VFIO_GROUP_NOTIFY_SET_KVM notifier callback is invoked. The notifier
+function is called when QEMU connects the VFIO IOMMU group for the mdev matrix
+device to the MDEV bus. The CRYCB is configured by:
+* Setting the bits in the APM corresponding to the APIDs assigned to the
+ mediated matrix device via its 'assign_adapter' interface.
+* Setting the bits in the AQM corresponding to the APQIs assigned to the
+ mediated matrix device via its 'assign_domain' interface.
+* Setting the bits in the ADM corresponding to the domain IDs assigned to the
+ mediated matrix device via its 'assign_control_domain' interface.
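A sketch of how the three CRYCB masks could be filled from the numbers written to the mediated device's attributes, assuming the initial behavior described earlier in which every usage domain is also set in the ADM. struct crycb_masks and build_crycb_masks() are hypothetical; the real masks live in the CRYCB and are set by the vfio_ap driver.

```c
#include <string.h>

/* Hypothetical stand-in for the three 256-bit CRYCB masks. */
struct crycb_masks {
	unsigned char apm[32];
	unsigned char aqm[32];
	unsigned char adm[32];
};

/* Set bit 'nr', counting from the most significant bit of byte 0. */
static void mask_set(unsigned char *mask, unsigned int nr)
{
	if (nr < 256)
		mask[nr / 8] |= 0x80 >> (nr % 8);
}

/* Build APM/AQM/ADM from assigned adapters, usage and control domains. */
static void build_crycb_masks(struct crycb_masks *m,
			      const unsigned int *apids, size_t n_apids,
			      const unsigned int *apqis, size_t n_apqis,
			      const unsigned int *ctl, size_t n_ctl)
{
	size_t i;

	memset(m, 0, sizeof(*m));
	for (i = 0; i < n_apids; i++)
		mask_set(m->apm, apids[i]);
	for (i = 0; i < n_apqis; i++) {
		mask_set(m->aqm, apqis[i]);
		mask_set(m->adm, apqis[i]);	/* usage domains are also controlled */
	}
	for (i = 0; i < n_ctl; i++)
		mask_set(m->adm, ctl[i]);	/* extra control-only domains */
}
```

For Guest1 in the example later in this document (adapters 5 and 6, domains 4 and 0xab), apm[0] becomes 0x06 and the ADM equals the AQM unless extra control domains are assigned.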
+
+Initialize the CPU model feature for AP
+---------------------------------------
+A new CPU model feature, KVM_S390_VM_CPU_FEAT_AP, is introduced to indicate that
+AP instructions are available to the KVM guest. This feature will be enabled by
+KVM only if the AP instructions are installed on the Linux host. The feature
+must be turned on for the guest in order to access AP devices from the guest.
+For example, to turn the AP facilities on from the QEMU command line:
+
+ /usr/bin/qemu-system-s390x ... -cpu xxx,ap=on
+
+ Where xxx is the CPU model being used.
+
+ If the CPU model feature is not enabled by the kernel, QEMU will fail and
+ report that the feature is not supported.
+
+Example:
+=======
+Let's now provide an example to illustrate how KVM guests may be given
+access to AP facilities. For this example, we will show how to configure
+two guests such that executing the lszcrypt command on the guests would
+look like this:
+
+Guest1
+------
+CARD.DOMAIN TYPE MODE
+------------------------------
+05 CEX5C CCA-Coproc
+05.0004 CEX5C CCA-Coproc
+05.00ab CEX5C CCA-Coproc
+06 CEX5A Accelerator
+06.0004 CEX5A Accelerator
+06.00ab CEX5A Accelerator
+
+Guest2
+------
+CARD.DOMAIN TYPE MODE
+------------------------------
+05 CEX5C CCA-Coproc
+05.0047 CEX5C CCA-Coproc
+05.00ff CEX5C CCA-Coproc
+
+These are the steps:
+
+1. Install the vfio_ap module on the Linux host. The dependency chain for the
+ vfio_ap module is:
+ * vfio
+ * mdev
+ * vfio_mdev
+ * KVM
+ * vfio_ap
+
+2. Secure the AP queues to be used by the two guests so that the host cannot
+ access them. Only type 10 adapters (i.e., CEX4 and later) are supported
+ because no systems with older card types are available for testing.
+
+ To secure the AP queues, each AP queue device must first be unbound from
+ the cex4queue device driver. The sysfs location of the driver is:
+
+ /sys/bus/ap
+ --- [drivers]
+ ------ [cex4queue]
+ --------- [05.0004]
+ --------- [05.0047]
+ --------- [05.00ab]
+ --------- [05.00ff]
+ --------- [06.0004]
+ --------- [06.00ab]
+ --------- unbind
+
+ To unbind AP queue 05.0004 from the cex4queue device driver:
+
+ echo 05.0004 > unbind
+
+ This must also be done for AP queues 05.00ab, 05.0047, 05.00ff, 06.0004,
+ and 06.00ab.
+
+ The AP queues that were unbound must then be reserved for use by the two KVM
+ guests. This is accomplished by binding them to the vfio_ap device driver.
+ The sysfs location of the driver is:
+
+ /sys/bus/ap
+ ---[drivers]
+ ------ [vfio_ap]
+ ---------- bind
+
+ To bind queue 05.0004 to the vfio_ap driver:
+
+ echo 05.0004 > bind
+
+ This must also be done for AP queues 05.00ab, 05.0047, 05.00ff, 06.0004,
+ and 06.00ab.
+
+ Take note that the AP queues bound to the vfio_ap driver will be available
+ for guest usage until they are unbound from the driver, the vfio_ap module
+ is unloaded, or the host system is shut down.
+
+3. Create the mediated devices needed to configure the AP matrixes for the
+ two guests and to provide an interface to the vfio_ap driver for
+ use by the guests:
+
+ /sys/devices/
+ --- [vfio_ap]
+ ------ [matrix] (this is the matrix device)
+ --------- [mdev_supported_types]
+ ------------ [vfio_ap-passthrough] (passthrough mediated matrix device type)
+ --------------- create
+ --------------- [devices]
+
+ To create the mediated devices for the two guests:
+
+ uuidgen > create
+ uuidgen > create
+
+ This creates two mediated devices in the [devices] subdirectory, each named
+ after the UUID written to the create attribute file. We call them $uuid1
+ and $uuid2:
+
+ /sys/devices/
+ --- [vfio_ap]
+ ------ [matrix]
+ --------- [mdev_supported_types]
+ ------------ [vfio_ap-passthrough]
+ --------------- [devices]
+ ------------------ [$uuid1]
+ --------------------- assign_adapter
+ --------------------- assign_control_domain
+ --------------------- assign_domain
+ --------------------- matrix
+ --------------------- unassign_adapter
+ --------------------- unassign_control_domain
+ --------------------- unassign_domain
+
+ ------------------ [$uuid2]
+ --------------------- assign_adapter
+ --------------------- assign_control_domain
+ --------------------- assign_domain
+ --------------------- matrix
+ --------------------- unassign_adapter
+ --------------------- unassign_control_domain
+ --------------------- unassign_domain
+
+4. The administrator now needs to configure the matrixes for mediated
+ devices $uuid1 (for Guest1) and $uuid2 (for Guest2).
+
+ This is how the matrix is configured for Guest1 (in the $uuid1 directory):
+
+ echo 5 > assign_adapter
+ echo 6 > assign_adapter
+ echo 4 > assign_domain
+ echo 0xab > assign_domain
+
+ For this implementation, all usage domains (i.e., domains assigned
+ via the assign_domain attribute file) will also be configured in the ADM
+ field of the KVM guest's CRYCB, so control domains only need to be
+ assigned here if they are not also assigned as usage domains.
+
+ If a mistake is made configuring an adapter, domain or control domain,
+ you can use the unassign_xxx files to unassign the adapter, domain or
+ control domain.
+
+ To display the matrix configuration for Guest1:
+
+ cat matrix
+
+ This is how the matrix is configured for Guest2 (in the $uuid2 directory):
+
+ echo 5 > assign_adapter
+ echo 0x47 > assign_domain
+ echo 0xff > assign_domain
+
+5. Start Guest1:
+
+ /usr/bin/qemu-system-s390x ... -cpu xxx,ap=on \
+ -device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid1 ...
+
+6. Start Guest2:
+
+ /usr/bin/qemu-system-s390x ... -cpu xxx,ap=on \
+ -device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid2 ...
+
+When the guest is shut down, the mediated matrix device may be removed.
+
+Using our example again, to remove the mediated matrix device $uuid1:
+
+ /sys/devices/
+ --- [vfio_ap]
+ ------ [matrix]
+ --------- [mdev_supported_types]
+ ------------ [vfio_ap-passthrough]
+ --------------- [devices]
+ ------------------ [$uuid1]
+ --------------------- remove
+
+ echo 1 > remove
+
+ This removes all of the mdev matrix device's sysfs structures. To
+ recreate and reconfigure the mdev matrix device, all of the steps starting
+ with step 3 must be performed again.
+
+ It is not necessary to remove an mdev matrix device, but one may want to
+ remove it if no guest will use it during the lifetime of the Linux host. If
+ the mdev matrix device is removed, one may want to unbind the AP queues the
+ guest was using from the vfio_ap device driver and bind them back to the
+ default driver. Alternatively, the AP queues can be configured for another
+ mdev matrix device (i.e., guest). In either case, one must take care to change
+ the secure key configured for the domain to which the queue is connected.
\ No newline at end of file
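As a cross-check on step 4 of the documentation above: each guest gets the cross product of its assigned adapters and domains, and the two products must not share an APQN. The following userspace sketch enumerates the APQNs in the same card.domain form used by the queue devices under /sys/bus/ap and checks two configurations for overlap. The names `guest_matrix`, `list_apqns` and `matrices_overlap` are hypothetical and are not part of the driver.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of one guest's matrix: assigned APIDs and APQIs. */
struct guest_matrix {
	const unsigned int *apids;
	size_t n_apids;
	const unsigned int *apqis;
	size_t n_apqis;
};

/* Print every APQN in the cross product as "cc.dddd" (e.g. "05.0004"),
 * matching the AP queue device names in sysfs. Returns the APQN count;
 * pass out == NULL to count without printing. */
static size_t list_apqns(const struct guest_matrix *m, FILE *out)
{
	size_t count = 0;

	for (size_t i = 0; i < m->n_apids; i++)
		for (size_t j = 0; j < m->n_apqis; j++, count++)
			if (out)
				fprintf(out, "%02x.%04x\n", m->apids[i], m->apqis[j]);
	return count;
}

static bool contains(const unsigned int *v, size_t n, unsigned int x)
{
	for (size_t i = 0; i < n; i++)
		if (v[i] == x)
			return true;
	return false;
}

/* Two cross products share an APQN only if the guests share at least one
 * adapter AND at least one domain; sharing only an adapter is fine. */
static bool matrices_overlap(const struct guest_matrix *a,
			     const struct guest_matrix *b)
{
	bool apid_shared = false, apqi_shared = false;

	for (size_t i = 0; i < a->n_apids; i++)
		apid_shared |= contains(b->apids, b->n_apids, a->apids[i]);
	for (size_t i = 0; i < a->n_apqis; i++)
		apqi_shared |= contains(b->apqis, b->n_apqis, a->apqis[i]);
	return apid_shared && apqi_shared;
}
```

With the example configuration, Guest1 (adapters 5 and 6, domains 0x04 and 0xab) gets four APQNs and Guest2 (adapter 5, domains 0x47 and 0xff) gets two; they share adapter 5 but no domain, so `matrices_overlap()` reports no conflict.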
diff --git a/MAINTAINERS b/MAINTAINERS
index 3217803..c693a23 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12411,6 +12411,7 @@ S: Supported
F: drivers/s390/crypto/vfio_ap_drv.c
F: drivers/s390/crypto/vfio_ap_private.h
F: drivers/s390/crypto/vfio_ap_ops.c
+F: Documentation/s390/vfio-ap.txt
S390 ZFCP DRIVER
M: Steffen Maier <[email protected]>
--
1.7.1
Registers the matrix device created by the VFIO AP device
driver with the VFIO mediated device framework.
Registering the matrix device will create the sysfs
structures needed to create mediated matrix devices,
each of which will be used to configure the AP matrix
for a guest and connect it to the VFIO AP device driver.
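This patch also limits the number of mediated devices with an `available_instances` counter: `vfio_ap_mdev_create()` fails with -EPERM when `atomic_dec_if_positive()` would take the counter below zero, and `vfio_ap_mdev_remove()` increments it again. A userspace model of that helper using C11 atomics might look like this; `dec_if_positive` is a hypothetical name, and, like the kernel helper, it returns the old value minus one even when it does not decrement.

```c
#include <stdatomic.h>

/* Decrement *v only if the result stays >= 0. Returns the old value minus
 * one whether or not the decrement happened, so a negative return means
 * "no instance available". */
static int dec_if_positive(atomic_int *v)
{
	int old = atomic_load(v);

	while (old > 0) {
		if (atomic_compare_exchange_weak(v, &old, old - 1))
			return old - 1;
		/* a failed CAS reloaded old with the current value; retry */
	}
	return old - 1;
}
```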
Registering the matrix device with the VFIO mediated device
framework will create the following sysfs structures:
/sys/devices/vfio_ap
... [matrix]
...... [mdev_supported_types]
......... [vfio_ap-passthrough]
............ create
To create a mediated device for the AP matrix device, write a UUID
to the create file:
uuidgen > create
A symbolic link to the mediated device's directory will be created in the
devices subdirectory named after the generated $uuid:
/sys/devices/vfio_ap
... [matrix]
...... [mdev_supported_types]
......... [vfio_ap-passthrough]
............ [devices]
............... [$uuid]
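The commit message uses `uuidgen > create`. For tooling that does not shell out, a minimal sketch that builds a random (version 4) UUID string and writes it to a `create` attribute might look like the following. The function names are hypothetical, and it assumes /dev/urandom is readable.

```c
#include <stdio.h>

/* Format 16 bytes as a version-4, RFC 4122 variant UUID string
 * (36 characters plus NUL). Modifies b[6] and b[8] in place. */
static void format_uuid_v4(unsigned char b[16], char out[37])
{
	b[6] = (b[6] & 0x0f) | 0x40;	/* version 4 */
	b[8] = (b[8] & 0x3f) | 0x80;	/* variant 10xx */
	snprintf(out, 37,
		 "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
		 "%02x%02x%02x%02x%02x%02x",
		 b[0], b[1], b[2], b[3], b[4], b[5], b[6], b[7],
		 b[8], b[9], b[10], b[11], b[12], b[13], b[14], b[15]);
}

/* Generate a UUID and write it to create_path, with the same effect as
 * "uuidgen > create". On success the UUID is left in uuid_out. */
static int create_mdev(const char *create_path, char uuid_out[37])
{
	unsigned char b[16];
	FILE *rnd = fopen("/dev/urandom", "rb");
	FILE *f;
	int ok;

	if (!rnd)
		return -1;
	ok = fread(b, 1, sizeof(b), rnd) == sizeof(b);
	fclose(rnd);
	if (!ok)
		return -1;
	format_uuid_v4(b, uuid_out);

	f = fopen(create_path, "w");
	if (!f)
		return -1;
	ok = fprintf(f, "%s\n", uuid_out) > 0;
	/* stdio buffers the write, so a sysfs store error surfaces at fclose() */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

The returned UUID is the name of the new directory under [devices], which is what later steps refer to as $uuid.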
Signed-off-by: Tony Krowiak <[email protected]>
---
MAINTAINERS | 1 +
drivers/s390/crypto/Makefile | 2 +-
drivers/s390/crypto/vfio_ap_drv.c | 9 ++
drivers/s390/crypto/vfio_ap_ops.c | 131 +++++++++++++++++++++++++++++++++
drivers/s390/crypto/vfio_ap_private.h | 22 +++++-
5 files changed, 161 insertions(+), 4 deletions(-)
create mode 100644 drivers/s390/crypto/vfio_ap_ops.c
diff --git a/MAINTAINERS b/MAINTAINERS
index 0515dae..3217803 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12410,6 +12410,7 @@ W: http://www.ibm.com/developerworks/linux/linux390/
S: Supported
F: drivers/s390/crypto/vfio_ap_drv.c
F: drivers/s390/crypto/vfio_ap_private.h
+F: drivers/s390/crypto/vfio_ap_ops.c
S390 ZFCP DRIVER
M: Steffen Maier <[email protected]>
diff --git a/drivers/s390/crypto/Makefile b/drivers/s390/crypto/Makefile
index 48e466e..8d36b05 100644
--- a/drivers/s390/crypto/Makefile
+++ b/drivers/s390/crypto/Makefile
@@ -17,5 +17,5 @@ pkey-objs := pkey_api.o
obj-$(CONFIG_PKEY) += pkey.o
# adjunct processor matrix
-vfio_ap-objs := vfio_ap_drv.o
+vfio_ap-objs := vfio_ap_drv.o vfio_ap_ops.o
obj-$(CONFIG_VFIO_AP) += vfio_ap.o
diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
index 93db312..b6ff7a4 100644
--- a/drivers/s390/crypto/vfio_ap_drv.c
+++ b/drivers/s390/crypto/vfio_ap_drv.c
@@ -127,11 +127,20 @@ int __init vfio_ap_init(void)
return ret;
}
+ ret = vfio_ap_mdev_register(matrix_dev);
+ if (ret) {
+ ap_driver_unregister(&vfio_ap_drv);
+ vfio_ap_matrix_dev_destroy(matrix_dev);
+
+ return ret;
+ }
+
return 0;
}
void __exit vfio_ap_exit(void)
{
+ vfio_ap_mdev_unregister(matrix_dev);
ap_driver_unregister(&vfio_ap_drv);
vfio_ap_matrix_dev_destroy(matrix_dev);
}
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
new file mode 100644
index 0000000..4e61e33
--- /dev/null
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -0,0 +1,131 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Adjunct processor matrix VFIO device driver callbacks.
+ *
+ * Copyright IBM Corp. 2018
+ * Author(s): Tony Krowiak <[email protected]>
+ *
+ */
+#include <linux/string.h>
+#include <linux/vfio.h>
+#include <linux/device.h>
+#include <linux/list.h>
+#include <linux/ctype.h>
+
+#include "vfio_ap_private.h"
+
+#define VFOP_AP_MDEV_TYPE_HWVIRT "passthrough"
+#define VFIO_AP_MDEV_NAME_HWVIRT "VFIO AP Passthrough Device"
+
+static DEFINE_SPINLOCK(mdev_list_lock);
+static LIST_HEAD(mdev_list);
+
+static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
+{
+ struct ap_matrix_dev *matrix_dev =
+ to_ap_matrix_dev(mdev_parent_dev(mdev));
+ struct ap_matrix_mdev *matrix_mdev;
+
+ if (atomic_dec_if_positive(&matrix_dev->available_instances) < 0)
+ return -EPERM;
+
+ matrix_mdev = kzalloc(sizeof(*matrix_mdev), GFP_KERNEL);
+ if (!matrix_mdev) {
+ atomic_inc(&matrix_dev->available_instances);
+ return -ENOMEM;
+ }
+
+ matrix_mdev->name = dev_name(mdev_dev(mdev));
+ mdev_set_drvdata(mdev, matrix_mdev);
+
+ spin_lock_bh(&mdev_list_lock);
+ list_add(&matrix_mdev->list, &mdev_list);
+ spin_unlock_bh(&mdev_list_lock);
+
+ return 0;
+}
+
+static int vfio_ap_mdev_remove(struct mdev_device *mdev)
+{
+ struct ap_matrix_dev *matrix_dev =
+ to_ap_matrix_dev(mdev_parent_dev(mdev));
+ struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
+
+ spin_lock_bh(&mdev_list_lock);
+ list_del(&matrix_mdev->list);
+ spin_unlock_bh(&mdev_list_lock);
+ kfree(matrix_mdev);
+ mdev_set_drvdata(mdev, NULL);
+ atomic_inc(&matrix_dev->available_instances);
+
+ return 0;
+}
+
+static ssize_t name_show(struct kobject *kobj, struct device *dev, char *buf)
+{
+ return sprintf(buf, "%s\n", VFIO_AP_MDEV_NAME_HWVIRT);
+}
+
+MDEV_TYPE_ATTR_RO(name);
+
+static ssize_t available_instances_show(struct kobject *kobj,
+ struct device *dev, char *buf)
+{
+ struct ap_matrix_dev *matrix_dev = to_ap_matrix_dev(dev);
+
+ return sprintf(buf, "%d\n",
+ atomic_read(&matrix_dev->available_instances));
+}
+
+MDEV_TYPE_ATTR_RO(available_instances);
+
+static ssize_t device_api_show(struct kobject *kobj, struct device *dev,
+ char *buf)
+{
+ return sprintf(buf, "%s\n", VFIO_DEVICE_API_AP_STRING);
+}
+
+MDEV_TYPE_ATTR_RO(device_api);
+
+static struct attribute *vfio_ap_mdev_type_attrs[] = {
+ &mdev_type_attr_name.attr,
+ &mdev_type_attr_device_api.attr,
+ &mdev_type_attr_available_instances.attr,
+ NULL,
+};
+
+static struct attribute_group vfio_ap_mdev_hwvirt_type_group = {
+ .name = VFOP_AP_MDEV_TYPE_HWVIRT,
+ .attrs = vfio_ap_mdev_type_attrs,
+};
+
+static struct attribute_group *vfio_ap_mdev_type_groups[] = {
+ &vfio_ap_mdev_hwvirt_type_group,
+ NULL,
+};
+
+static const struct mdev_parent_ops vfio_ap_matrix_ops = {
+ .owner = THIS_MODULE,
+ .supported_type_groups = vfio_ap_mdev_type_groups,
+ .create = vfio_ap_mdev_create,
+ .remove = vfio_ap_mdev_remove,
+};
+
+int vfio_ap_mdev_register(struct ap_matrix_dev *matrix_dev)
+{
+ int ret;
+
+ atomic_set(&matrix_dev->available_instances,
+ AP_MATRIX_MAX_AVAILABLE_INSTANCES);
+
+ ret = mdev_register_device(&matrix_dev->device, &vfio_ap_matrix_ops);
+ if (ret)
+ return ret;
+
+ return 0;
+}
+
+void vfio_ap_mdev_unregister(struct ap_matrix_dev *matrix_dev)
+{
+ mdev_unregister_device(&matrix_dev->device);
+}
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index 19c0b60..3de1275 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -10,20 +10,36 @@
#define _VFIO_AP_PRIVATE_H_
#include <linux/types.h>
+#include <linux/device.h>
+#include <linux/mdev.h>
#include "ap_bus.h"
#define VFIO_AP_MODULE_NAME "vfio_ap"
#define VFIO_AP_DRV_NAME "vfio_ap"
+/*
+ * There must be one mediated matrix device for every guest using AP devices.
+ * If each guest is assigned a unique APQN, then at most 256 adapters x 256
+ * domains = 65536 guests can be configured.
+ */
+#define AP_MATRIX_MAX_AVAILABLE_INSTANCES 65536
struct ap_matrix_dev {
struct device device;
+ atomic_t available_instances;
+};
+
+struct ap_matrix_mdev {
+ const char *name;
+ struct list_head list;
};
-static inline struct ap_matrix_dev
-*to_ap_matrix_parent_dev(struct device *dev)
+static inline struct ap_matrix_dev *to_ap_matrix_dev(struct device *dev)
{
- return container_of(dev, struct ap_matrix_dev, device.parent);
+ return container_of(dev, struct ap_matrix_dev, device);
}
+extern int vfio_ap_mdev_register(struct ap_matrix_dev *matrix_dev);
+extern void vfio_ap_mdev_unregister(struct ap_matrix_dev *matrix_dev);
+
#endif /* _VFIO_AP_PRIVATE_H_ */
--
1.7.1
Provides the sysfs interfaces for assigning AP domains to
and unassigning AP domains from a mediated matrix device.
An AP domain ID corresponds to an AP queue index (APQI). For
each domain assigned to the mediated matrix device, its
corresponding APQI is stored in an AP queue mask (AQM).
The bits in the AQM, from most significant to least
significant bit, correspond to AP domain numbers 0 to 255.
When a domain is assigned, the bit corresponding to its
APQI will be set in the AQM. Likewise, when a domain is
unassigned, the bit corresponding to its APQI will be
cleared from the AQM.
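The most-significant-bit-first numbering described above is what the s390 `*_bit_inv()` helpers used in this patch provide: they flip the bit number with `nr ^ (BITS_PER_LONG - 1)` before calling the normal bitops. The userspace sketch below mimics that for a 256-bit AQM held as four 64-bit words; the `aqm_*` names are hypothetical, and the word values shown are logical values, not a claim about byte order in memory.

```c
#include <stdbool.h>
#include <stdint.h>

#define AQM_WORDS 4	/* 256 domains / 64 bits per word */

/* Domain 0 is the most significant bit of word 0 and domain 255 the least
 * significant bit of word 3, i.e. bit (apqi % 64) ^ 63 of word apqi / 64. */
static inline uint64_t aqm_mask(unsigned int apqi)
{
	return UINT64_C(1) << (63 - (apqi % 64));
}

static void aqm_set(uint64_t aqm[AQM_WORDS], unsigned int apqi)
{
	aqm[apqi / 64] |= aqm_mask(apqi);
}

static void aqm_clear(uint64_t aqm[AQM_WORDS], unsigned int apqi)
{
	aqm[apqi / 64] &= ~aqm_mask(apqi);
}

static bool aqm_test(const uint64_t aqm[AQM_WORDS], unsigned int apqi)
{
	return (aqm[apqi / 64] & aqm_mask(apqi)) != 0;
}
```

For example, assigning domain 173 (0xad) sets bit 18 (counting from the least significant bit) of word 2 in this model, because 173 = 2 * 64 + 45 and 63 - 45 = 18.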
The relevant sysfs structures are:
/sys/devices/vfio_ap
... [matrix]
...... [mdev_supported_types]
......... [vfio_ap-passthrough]
............ [devices]
...............[$uuid]
.................. assign_domain
.................. unassign_domain
To assign a domain to the $uuid mediated matrix device,
write the domain's ID to the assign_domain file. To
unassign a domain, write the domain's ID to the
unassign_domain file. The ID is specified using
conventional semantics: If it begins with 0x, the number
will be parsed as a hexadecimal (case insensitive) number;
if it begins with 0, it will be parsed as an octal number;
otherwise, it will be parsed as a decimal number.
For example, to assign domain 173 (0xad) to the mediated matrix
device $uuid:
echo 173 > assign_domain
or
echo 0255 > assign_domain
or
echo 0xad > assign_domain
To unassign domain 173 (0xad):
echo 173 > unassign_domain
or
echo 0255 > unassign_domain
or
echo 0xad > unassign_domain
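The store functions parse the ID with `kstrtoul(buf, 0, &apqi)`. Userspace has no `kstrtoul`, so the sketch below approximates it with `strtoul(..., 0)` plus the range check done in `assign_domain_store()`. `parse_domain_id` is a hypothetical name; like `kstrtoul`, it is meant to accept one trailing newline (as left by `echo`) and reject a sign or any other trailing characters, but it is an approximation, not a byte-for-byte reimplementation.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Parse a domain ID using base-0 rules ("0x" hex, leading "0" octal,
 * otherwise decimal) and check it against max_apqi (255 with APXA,
 * otherwise 15). Returns 0 and stores the value, or a negative errno. */
static int parse_domain_id(const char *buf, unsigned long max_apqi,
			   unsigned long *apqi)
{
	char *end;
	unsigned long val;

	/* reject signs and leading blanks, which plain strtoul would accept */
	if (!isdigit((unsigned char)buf[0]))
		return -EINVAL;
	errno = 0;
	val = strtoul(buf, &end, 0);
	if (errno == ERANGE)
		return -ERANGE;
	if (*end != '\0' && !(end[0] == '\n' && end[1] == '\0'))
		return -EINVAL;
	if (val > max_apqi)
		return -EINVAL;
	*apqi = val;
	return 0;
}
```

All three spellings in the examples above (173, 0255 and 0xad) yield 173.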
The assignment will be rejected:
* If the domain ID exceeds the maximum value for an AP domain:
* If the AP Extended Addressing (APXA) facility is installed,
the max value is 255
* Else the max value is 15
* If no AP adapters have yet been assigned and there are
no AP queues reserved by the VFIO AP driver that have an APQN
with an APQI matching that of the AP domain number being
assigned.
* If any of the APQNs that can be derived from the cross product
of the APQI being assigned and the AP adapter IDs (APIDs) of
the AP adapters previously assigned cannot be matched
with an APQN of an AP queue device reserved by the VFIO AP
driver.
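The rejection rules above can be modeled in userspace as a sketch. Here the set of queues bound to vfio_ap is a hypothetical boolean table `reserved[apid][apqi]`, and `validate_domain` is an illustrative name that mirrors, but does not reproduce, `vfio_ap_validate_apqi()` together with the range check in `assign_domain_store()`. It assumes every assigned APID is below 256.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define AP_IDS 256

/* Hypothetical view of which APQNs are bound to the vfio_ap driver. */
static bool reserved[AP_IDS][AP_IDS];

static int validate_domain(unsigned int apqi, bool apxa,
			   const unsigned int *apids, size_t n_apids)
{
	unsigned int max_apqi = apxa ? 255 : 15;

	if (apqi > max_apqi)
		return -EINVAL;

	if (n_apids == 0) {
		/* No adapters yet: some reserved queue must use this domain. */
		for (unsigned int apid = 0; apid < AP_IDS; apid++)
			if (reserved[apid][apqi])
				return 0;
		return -EPERM;
	}

	/* Otherwise every (assigned adapter, apqi) pair must be reserved. */
	for (size_t i = 0; i < n_apids; i++)
		if (!reserved[apids[i]][apqi])
			return -EPERM;
	return 0;
}
```

With the queues from the documentation example reserved, domain 0x47 is accepted for a matrix holding only adapter 5 but rejected once adapter 6 is also assigned, because queue 06.0047 is not reserved.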
Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/vfio_ap_ops.c | 173 ++++++++++++++++++++++++++++++++++++-
1 files changed, 172 insertions(+), 1 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index a4351bd..a5b06e7 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -170,6 +170,27 @@ static int vfio_ap_queue_has_apid(struct device *dev, void *data)
}
/**
+ * vfio_ap_queue_has_apqi
+ *
+ * @dev: an AP queue device
+ * @data: an AP queue index
+ *
+ * Flags whether any AP queue device has a particular AP queue index
+ *
+ * Returns 0 to indicate the function succeeded
+ */
+static int vfio_ap_queue_has_apqi(struct device *dev, void *data)
+{
+ struct vfio_id_reserved *id_res = data;
+ struct ap_queue *ap_queue = to_ap_queue(dev);
+
+ if (id_res->id == AP_QID_QUEUE(ap_queue->qid))
+ id_res->reserved = true;
+
+ return 0;
+}
+
+/**
* vfio_ap_verify_qid_reserved
*
* @matrix_dev: a mediated matrix device
@@ -236,6 +257,42 @@ static int vfio_ap_verify_apid_reserved(struct ap_matrix_dev *matrix_dev,
return -EPERM;
}
+/**
+ * vfio_ap_verify_apqi_reserved
+ *
+ * @matrix_dev: the matrix device
+ * @mdev_name: the name of the mediated matrix device, used in error messages
+ * @apqi: an AP queue index
+ *
+ * Verifies that an AP queue with @apqi is reserved by the VFIO AP device
+ * driver.
+ *
+ * Returns 0 if an AP queue with @apqi is reserved; otherwise, returns -EPERM.
+ */
+static int vfio_ap_verify_apqi_reserved(struct ap_matrix_dev *matrix_dev,
+ const char *mdev_name,
+ unsigned long apqi)
+{
+ int ret;
+ struct vfio_id_reserved id_res;
+
+ id_res.id = apqi;
+ id_res.reserved = false;
+
+ ret = driver_for_each_device(matrix_dev->device.driver, NULL, &id_res,
+ vfio_ap_queue_has_apqi);
+ if (ret)
+ return ret;
+
+ if (id_res.reserved)
+ return 0;
+
+ pr_err("%s: mdev %s using domain %02lx not reserved by %s driver",
+ VFIO_AP_MODULE_NAME, mdev_name, apqi,
+ VFIO_AP_DRV_NAME);
+
+ return -EPERM;
+}
+
static int vfio_ap_verify_queues_reserved(struct ap_matrix_dev *matrix_dev,
const char *mdev_name,
struct ap_matrix *matrix)
@@ -417,10 +474,124 @@ static ssize_t unassign_adapter_store(struct device *dev,
}
DEVICE_ATTR_WO(unassign_adapter);
+/**
+ * vfio_ap_validate_apqi
+ *
+ * @mdev: the mediated device
+ * @matrix_mdev: the mediated matrix device
+ * @apqi: the APQI (domain ID) to validate
+ *
+ * Validates the value of @apqi:
+ * * If there are no AP adapters assigned, then there must be at least
+ * one AP queue device reserved by the VFIO AP device driver with an
+ * APQN containing @apqi.
+ *
+ * * Else each APQN that can be derived from the cross product of @apqi and
+ * the IDs of the AP adapters already assigned must identify an AP queue
+ * that has been reserved by the VFIO AP device driver.
+ *
+ * Returns 0 if the value of @apqi is valid; otherwise, returns an error.
+ */
+static int vfio_ap_validate_apqi(struct mdev_device *mdev,
+ struct ap_matrix_mdev *matrix_mdev,
+ unsigned long apqi)
+{
+ int ret;
+ unsigned long apmsz = matrix_mdev->matrix.apm_max + 1;
+ struct device *dev = mdev_parent_dev(mdev);
+ struct ap_matrix_dev *matrix_dev = to_ap_matrix_dev(dev);
+ struct ap_matrix matrix = matrix_mdev->matrix;
+
+ /* If there are any adapters assigned to the mediated device */
+ if (find_first_bit_inv(matrix.apm, apmsz) < apmsz) {
+ matrix.apm_max = matrix_mdev->matrix.apm_max;
+ memcpy(matrix.apm, matrix_mdev->matrix.apm,
+ ARRAY_SIZE(matrix.apm) * sizeof(matrix.apm[0]));
+ matrix.aqm_max = matrix_mdev->matrix.aqm_max;
+ memset(matrix.aqm, 0,
+ ARRAY_SIZE(matrix.aqm) * sizeof(matrix.aqm[0]));
+ set_bit_inv(apqi, matrix.aqm);
+ ret = vfio_ap_verify_queues_reserved(matrix_dev,
+ matrix_mdev->name,
+ &matrix);
+ } else {
+ ret = vfio_ap_verify_apqi_reserved(matrix_dev,
+ matrix_mdev->name, apqi);
+ }
+
+ if (ret)
+ return ret;
+
+ return 0;
+}
+
+static ssize_t assign_domain_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ int ret;
+ unsigned long apqi;
+ struct mdev_device *mdev = mdev_from_dev(dev);
+ struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
+ unsigned long max_apqi = matrix_mdev->matrix.aqm_max;
+
+ ret = kstrtoul(buf, 0, &apqi);
+ if (ret || (apqi > max_apqi)) {
+ pr_err("%s: %s: domain id '%s' not a value from 0 to %02lu(%#04lx)",
+ VFIO_AP_MODULE_NAME, __func__, buf, max_apqi, max_apqi);
+
+ return ret ? ret : -EINVAL;
+ }
+
+ ret = vfio_ap_validate_apqi(mdev, matrix_mdev, apqi);
+ if (ret)
+ return ret;
+
+ /* Set the bit in the AQM (bitmask) corresponding to the AP domain
+ * number (APQI). The bits in the mask, from most significant to least
+ * significant, correspond to numbers 0-255.
+ */
+ set_bit_inv(apqi, matrix_mdev->matrix.aqm);
+
+ return count;
+}
+DEVICE_ATTR_WO(assign_domain);
+
+static ssize_t unassign_domain_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ int ret;
+ unsigned long apqi;
+ struct mdev_device *mdev = mdev_from_dev(dev);
+ struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
+ unsigned long max_apqi = matrix_mdev->matrix.aqm_max;
+
+ ret = kstrtoul(buf, 0, &apqi);
+ if (ret || (apqi > max_apqi)) {
+ pr_err("%s: %s: domain id '%s' not a value from 0 to %02lu(%#04lx)",
+ VFIO_AP_MODULE_NAME, __func__, buf, max_apqi, max_apqi);
+
+ return ret ? ret : -EINVAL;
+ }
+
+ if (!test_bit_inv(apqi, matrix_mdev->matrix.aqm)) {
+ pr_err("%s: %s: domain %02lu(%#04lx) not assigned",
+ VFIO_AP_MODULE_NAME, __func__, apqi, apqi);
+ return -ENODEV;
+ }
+
+ clear_bit_inv(apqi, matrix_mdev->matrix.aqm);
+
+ return count;
+}
+DEVICE_ATTR_WO(unassign_domain);
+
static struct attribute *vfio_ap_mdev_attrs[] = {
&dev_attr_assign_adapter.attr,
&dev_attr_unassign_adapter.attr,
- NULL
+ &dev_attr_assign_domain.attr,
+ &dev_attr_unassign_domain.attr,
+ NULL,
};
static struct attribute_group vfio_ap_mdev_attr_group = {
--
1.7.1
Implements the open callback on the mediated matrix device.
The function registers a group notifier to receive notification
of the VFIO_GROUP_NOTIFY_SET_KVM event. When notified,
the vfio_ap device driver stores a pointer to the guest's
kvm structure. The open callback must also ensure that only
one mediated device is opened per guest.
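The one-open-per-guest rule reduces to scanning all mediated devices for another entry bound to the same `kvm`. The sketch below models that with a plain array and an opaque guest pointer. `struct mdev_model` and `open_once` are hypothetical names, the locking that `vfio_ap_mdev_open_once()` does with `mdev_list_lock` is omitted, and rejecting a NULL guest pointer is an extra check added by this sketch.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ap_matrix_mdev: only the guest pointer
 * (a struct kvm * in the driver) matters for this check. */
struct mdev_model {
	const char *name;
	const void *kvm;
};

/* Return -ENODEV if self has no guest yet, -EPERM if any other device in
 * devs[] is already associated with the same guest, and 0 otherwise. */
static int open_once(const struct mdev_model *devs, size_t n,
		     const struct mdev_model *self)
{
	if (!self->kvm)
		return -ENODEV;
	for (size_t i = 0; i < n; i++)
		if (&devs[i] != self && devs[i].kvm == self->kvm)
			return -EPERM;
	return 0;
}
```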
Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/vfio_ap_ops.c | 128 +++++++++++++++++++++++++++++++++
drivers/s390/crypto/vfio_ap_private.h | 2 +
2 files changed, 130 insertions(+), 0 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index bc7398d..58be495 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -11,6 +11,10 @@
#include <linux/device.h>
#include <linux/list.h>
#include <linux/ctype.h>
+#include <linux/bitops.h>
+#include <linux/kvm_host.h>
+#include <linux/module.h>
+#include <asm/kvm.h>
#include "vfio_ap_private.h"
@@ -748,12 +752,136 @@ static ssize_t matrix_show(struct device *dev, struct device_attribute *attr,
NULL
};
+/**
+ * Verify that the AP instructions are available on the guest and are to be
+ * interpreted by the firmware. The former is indicated via the
+ * KVM_S390_VM_CPU_FEAT_AP CPU model feature and the latter by the apie
+ * crypto flag.
+ */
+static int kvm_ap_validate_crypto_setup(struct kvm *kvm)
+{
+ if (kvm && test_bit_inv(KVM_S390_VM_CPU_FEAT_AP, kvm->arch.cpu_feat) &&
+ kvm->arch.crypto.apie)
+ return 0;
+
+ pr_err("%s: interpretation of AP instructions not available",
+ VFIO_AP_MODULE_NAME);
+
+ return -EOPNOTSUPP;
+}
+
+static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
+ unsigned long action, void *data)
+{
+ struct ap_matrix_mdev *matrix_mdev;
+
+ if (action == VFIO_GROUP_NOTIFY_SET_KVM) {
+ matrix_mdev = container_of(nb, struct ap_matrix_mdev,
+ group_notifier);
+ matrix_mdev->kvm = data;
+ }
+
+ return NOTIFY_OK;
+}
+
+/**
+ * vfio_ap_mdev_open_once
+ *
+ * @matrix_mdev: a mediated matrix device
+ *
+ * Return 0 if no other mediated matrix device has been opened for the
+ * KVM guest assigned to @matrix_mdev; otherwise, returns an error.
+ */
+static int vfio_ap_mdev_open_once(struct ap_matrix_mdev *matrix_mdev)
+{
+ int ret = 0;
+ struct ap_matrix_mdev *lstdev;
+
+ spin_lock_bh(&mdev_list_lock);
+
+ list_for_each_entry(lstdev, &mdev_list, list) {
+ if ((lstdev->kvm == matrix_mdev->kvm) &&
+ (lstdev != matrix_mdev)) {
+ ret = -EPERM;
+ break;
+ }
+ }
+
+ if (ret) {
+ pr_err("%s: mdev %s open failed for guest %s",
+ VFIO_AP_MODULE_NAME, matrix_mdev->name,
+ matrix_mdev->kvm->arch.dbf->name);
+ pr_err("%s: mdev %s already opened for guest %s",
+ VFIO_AP_MODULE_NAME, lstdev->name,
+ lstdev->kvm->arch.dbf->name);
+ }
+
+ spin_unlock_bh(&mdev_list_lock);
+ return ret;
+}
+
+static int vfio_ap_mdev_open(struct mdev_device *mdev)
+{
+ struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
+ struct ap_matrix_dev *matrix_dev =
+ to_ap_matrix_dev(mdev_parent_dev(mdev));
+ unsigned long events;
+ int ret;
+
+ if (!try_module_get(THIS_MODULE))
+ return -ENODEV;
+
+ ret = vfio_ap_verify_queues_reserved(matrix_dev, matrix_mdev->name,
+ &matrix_mdev->matrix);
+ if (ret)
+ goto out_err;
+
+ matrix_mdev->group_notifier.notifier_call = vfio_ap_mdev_group_notifier;
+ events = VFIO_GROUP_NOTIFY_SET_KVM;
+
+ ret = vfio_register_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
+ &events, &matrix_mdev->group_notifier);
+ if (ret)
+ goto out_err;
+
+ ret = kvm_ap_validate_crypto_setup(matrix_mdev->kvm);
+ if (ret)
+ goto out_kvm_err;
+
+ ret = vfio_ap_mdev_open_once(matrix_mdev);
+ if (ret)
+ goto out_kvm_err;
+
+ return 0;
+
+out_kvm_err:
+ vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
+ &matrix_mdev->group_notifier);
+ matrix_mdev->kvm = NULL;
+out_err:
+ module_put(THIS_MODULE);
+
+ return ret;
+}
+
+static void vfio_ap_mdev_release(struct mdev_device *mdev)
+{
+ struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
+
+ vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
+ &matrix_mdev->group_notifier);
+ matrix_mdev->kvm = NULL;
+ module_put(THIS_MODULE);
+}
+
static const struct mdev_parent_ops vfio_ap_matrix_ops = {
.owner = THIS_MODULE,
.supported_type_groups = vfio_ap_mdev_type_groups,
.mdev_attr_groups = vfio_ap_mdev_attr_groups,
.create = vfio_ap_mdev_create,
.remove = vfio_ap_mdev_remove,
+ .open = vfio_ap_mdev_open,
+ .release = vfio_ap_mdev_release,
};
int vfio_ap_mdev_register(struct ap_matrix_dev *matrix_dev)
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index ae771f5..7792b45 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -56,6 +56,8 @@ struct ap_matrix_mdev {
const char *name;
struct list_head list;
struct ap_matrix matrix;
+ struct notifier_block group_notifier;
+ struct kvm *kvm;
};
static struct ap_matrix_dev *to_ap_matrix_dev(struct device *dev)
--
1.7.1
On 06/29/2018 11:11 PM, Tony Krowiak wrote:
> Introduces a new AP device driver. This device driver
> is built on the VFIO mediated device framework. The framework
> provides sysfs interfaces that facilitate passthrough
> access by guests to devices installed on the linux host.
>
> The VFIO AP device driver will serve two purposes:
>
> 1. Provide the interfaces to reserve AP devices for exclusive
> use by KVM guests. This is accomplished by unbinding the
> devices to be reserved for guest usage from the default AP
> device driver and binding them to the VFIO AP device driver.
>
> 2. Implements the functions, callbacks and sysfs attribute
> interfaces required to create one or more VFIO mediated
> devices each of which will be used to configure the AP
> matrix for a guest and serve as a file descriptor
> for facilitating communication between QEMU and the
> VFIO AP device driver.
>
> When the VFIO AP device driver is initialized:
>
> * It registers with the AP bus for control of type 10 (CEX4
> and newer) AP queue devices. This limitation was imposed
> due to:
>
> 1. A lack of access to older systems needed to test the
> older AP device models;
>
> 2. A desire to keep the code as simple as possible;
>
> 3. Some older models are no longer supported by the kernel
> and others are getting close to end of service.
>
> The probe and remove callbacks will be provided to support
> the binding/unbinding of AP queue devices to/from the VFIO
> AP device driver.
>
> * Creates a /sys/devices/vfio-ap/matrix device to hold
> the APQNs of the AP devices bound to the VFIO
> AP device driver and serves as the parent of the
> mediated devices created for each guest.
>
> Signed-off-by: Tony Krowiak <[email protected]>
> ---
> MAINTAINERS | 10 +++
> arch/s390/Kconfig | 11 +++
> drivers/s390/crypto/Makefile | 4 +
> drivers/s390/crypto/vfio_ap_drv.c | 140 +++++++++++++++++++++++++++++++++
> drivers/s390/crypto/vfio_ap_private.h | 29 +++++++
> include/uapi/linux/vfio.h | 2 +
> samples/bpf/bpf_load.c | 62 +++++++++++++++
You have probably touched the last one by accident.
Regards,
Halil
On 06/29/2018 11:11 PM, Tony Krowiak wrote:
> Introduces a new CPU model feature and two CPU model
> facilities to support AP virtualization for KVM guests.
>
> CPU model feature:
>
> The KVM_S390_VM_CPU_FEAT_AP feature indicates that
> AP instructions are available on the guest. This
> feature will be enabled by the kernel only if the AP
> instructions are installed on the linux host. This feature
> must be specifically turned on for the KVM guest from
> userspace to use the VFIO AP device driver for guest
> access to AP devices.
>
> CPU model facilities:
>
> 1. AP Query Configuration Information (QCI) facility is installed.
>
> This is indicated by setting facilities bit 12 for
> the guest. The kernel will not enable this facility
> for the guest if it is not set on the host. This facility
> must not be set by userspace if the KVM_S390_VM_CPU_FEAT_AP
> feature is not installed.
>
> If this facility is not set for the KVM guest, then only
> APQNs with an APQI less than 16 will be available to the
> guest regardless of the guest's matrix configuration. This
> is a limitation of the AP bus running on the guest.
>
> 2. AP Facilities Test facility (APFT) is installed.
>
> This is indicated by setting facilities bit 15 for
> the guest. The kernel will not enable this facility for
> the guest if it is not set on the host. This facility
> must not be set by userspace if the KVM_S390_VM_CPU_FEAT_AP
> feature is not installed.
>
> If this facility is not set for the KVM guest, then no
> AP devices will be available to the guest regardless of
> the guest's matrix configuration. This is a limitation
> of the AP bus running under the guest.
>
> Reviewed-by: Christian Borntraeger <[email protected]>
> Reviewed-by: Halil Pasic <[email protected]>
> Signed-off-by: Tony Krowiak <[email protected]>
I think it probably should be at the end of the series. Other than that, it's good.
> ---
> arch/s390/include/uapi/asm/kvm.h | 1 +
> arch/s390/kvm/kvm-s390.c | 8 ++++++++
> arch/s390/tools/gen_facilities.c | 3 +++
> 3 files changed, 12 insertions(+), 0 deletions(-)
>
> diff --git a/arch/s390/include/uapi/asm/kvm.h b/arch/s390/include/uapi/asm/kvm.h
> index 4cdaa55..a580dec 100644
> --- a/arch/s390/include/uapi/asm/kvm.h
> +++ b/arch/s390/include/uapi/asm/kvm.h
> @@ -130,6 +130,7 @@ struct kvm_s390_vm_cpu_machine {
> #define KVM_S390_VM_CPU_FEAT_PFMFI 11
> #define KVM_S390_VM_CPU_FEAT_SIGPIF 12
> #define KVM_S390_VM_CPU_FEAT_KSS 13
> +#define KVM_S390_VM_CPU_FEAT_AP 14
> struct kvm_s390_vm_cpu_feat {
> __u64 feat[16];
> };
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index 3b7a515..d2208d4 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -40,6 +40,7 @@
> #include <asm/sclp.h>
> #include <asm/cpacf.h>
> #include <asm/timex.h>
> +#include <asm/ap.h>
> #include "kvm-s390.h"
> #include "gaccess.h"
>
> @@ -366,6 +367,13 @@ static void kvm_s390_cpu_feat_init(void)
>
> if (MACHINE_HAS_ESOP)
> allow_cpu_feat(KVM_S390_VM_CPU_FEAT_ESOP);
> +
> + /*
> + * Check if AP instructions installed on host
> + */
> + if (ap_instructions_available() == 0)
> + allow_cpu_feat(KVM_S390_VM_CPU_FEAT_AP);
> +
> /*
> * We need SIE support, ESOP (PROT_READ protection for gmap_shadow),
> * 64bit SCAO (SCA passthrough) and IDTE (for gmap_shadow unshadowing).
> diff --git a/arch/s390/tools/gen_facilities.c b/arch/s390/tools/gen_facilities.c
> index 90a8c9e..e0e2c19 100644
> --- a/arch/s390/tools/gen_facilities.c
> +++ b/arch/s390/tools/gen_facilities.c
> @@ -106,6 +106,9 @@ struct facility_def {
>
> .name = "FACILITIES_KVM_CPUMODEL",
> .bits = (int[]){
> + 12, /* AP Query Configuration Information */
> + 15, /* AP Facilities Test */
> -1 /* END */
> }
> },
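Facility bits 12 and 15 in the hunk above use the s390 convention that bit 0 is the most significant bit of the first byte of the facility list. The kernel's `__test_facility()` locates a bit as byte `nr >> 3` with mask `0x80 >> (nr & 7)`; the standalone sketch below mirrors that logic on a byte array. `facility_set` and `facility_enable` are hypothetical names, and the two-byte list is made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Test facility bit nr: byte nr / 8, mask 0x80 >> (nr % 8), so bit 0 is
 * the MSB of byte 0. Bits beyond the list are reported as not installed. */
static bool facility_set(const unsigned char *fac, size_t len_bytes,
			 unsigned int nr)
{
	if (nr / 8 >= len_bytes)
		return false;
	return (fac[nr / 8] & (0x80 >> (nr % 8))) != 0;
}

static void facility_enable(unsigned char *fac, size_t len_bytes,
			    unsigned int nr)
{
	if (nr / 8 < len_bytes)
		fac[nr / 8] |= 0x80 >> (nr % 8);
}
```

So a list with both AP facilities (QCI, bit 12, and APFT, bit 15) enabled has 0x09 in byte 1.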
On 07/02/2018 10:38 AM, Christian Borntraeger wrote:
>
> On 06/29/2018 11:11 PM, Tony Krowiak wrote:
>> Introduces a new CPU model feature and two CPU model
>> facilities to support AP virtualization for KVM guests.
>>
>> CPU model feature:
>>
>> The KVM_S390_VM_CPU_FEAT_AP feature indicates that
>> AP instructions are available on the guest. This
>> feature will be enabled by the kernel only if the AP
>> instructions are installed on the linux host. This feature
>> must be specifically turned on for the KVM guest from
>> userspace to use the VFIO AP device driver for guest
>> access to AP devices.
>>
>> CPU model facilities:
>>
>> 1. AP Query Configuration Information (QCI) facility is installed.
>>
>> This is indicated by setting facilities bit 12 for
>> the guest. The kernel will not enable this facility
>> for the guest if it is not set on the host. This facility
>> must not be set by userspace if the KVM_S390_VM_CPU_FEAT_AP
>> feature is not installed.
>>
>> If this facility is not set for the KVM guest, then only
>> APQNs with an APQI less than 16 will be available to the
>> guest regardless of the guest's matrix configuration. This
>> is a limitation of the AP bus running on the guest.
>>
>> 2. AP Facilities Test facility (APFT) is installed.
>>
>> This is indicated by setting facilities bit 15 for
>> the guest. The kernel will not enable this facility for
>> the guest if it is not set on the host. This facility
>> must not be set by userspace if the KVM_S390_VM_CPU_FEAT_AP
>> feature is not installed.
>>
>> If this facility is not set for the KVM guest, then no
>> AP devices will be available to the guest regardless of
>> the guest's matrix configuration. This is a limitation
>> of the AP bus running under the guest.
>>
>> Reviewed-by: Christian Borntraeger <[email protected]>
>> Reviewed-by: Halil Pasic <[email protected]>
>> Signed-off-by: Tony Krowiak <[email protected]>
> I think it probably should be at the end of the series, other than that it's good.
If I move this to the end of the series, the very next patch checks the
KVM_S390_VM_CPU_FEAT_AP feature?
>
>
>
>> ---
>> arch/s390/include/uapi/asm/kvm.h | 1 +
>> arch/s390/kvm/kvm-s390.c | 8 ++++++++
>> arch/s390/tools/gen_facilities.c | 3 +++
>> 3 files changed, 12 insertions(+), 0 deletions(-)
>>
>> diff --git a/arch/s390/include/uapi/asm/kvm.h b/arch/s390/include/uapi/asm/kvm.h
>> index 4cdaa55..a580dec 100644
>> --- a/arch/s390/include/uapi/asm/kvm.h
>> +++ b/arch/s390/include/uapi/asm/kvm.h
>> @@ -130,6 +130,7 @@ struct kvm_s390_vm_cpu_machine {
>> #define KVM_S390_VM_CPU_FEAT_PFMFI 11
>> #define KVM_S390_VM_CPU_FEAT_SIGPIF 12
>> #define KVM_S390_VM_CPU_FEAT_KSS 13
>> +#define KVM_S390_VM_CPU_FEAT_AP 14
>> struct kvm_s390_vm_cpu_feat {
>> __u64 feat[16];
>> };
>> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
>> index 3b7a515..d2208d4 100644
>> --- a/arch/s390/kvm/kvm-s390.c
>> +++ b/arch/s390/kvm/kvm-s390.c
>> @@ -40,6 +40,7 @@
>> #include <asm/sclp.h>
>> #include <asm/cpacf.h>
>> #include <asm/timex.h>
>> +#include <asm/ap.h>
>> #include "kvm-s390.h"
>> #include "gaccess.h"
>>
>> @@ -366,6 +367,13 @@ static void kvm_s390_cpu_feat_init(void)
>>
>> if (MACHINE_HAS_ESOP)
>> allow_cpu_feat(KVM_S390_VM_CPU_FEAT_ESOP);
>> +
>> + /*
>> + * Check if AP instructions installed on host
>> + */
>> + if (ap_instructions_available() == 0)
>> + allow_cpu_feat(KVM_S390_VM_CPU_FEAT_AP);
>> +
>> /*
>> * We need SIE support, ESOP (PROT_READ protection for gmap_shadow),
>> * 64bit SCAO (SCA passthrough) and IDTE (for gmap_shadow unshadowing).
>> diff --git a/arch/s390/tools/gen_facilities.c b/arch/s390/tools/gen_facilities.c
>> index 90a8c9e..e0e2c19 100644
>> --- a/arch/s390/tools/gen_facilities.c
>> +++ b/arch/s390/tools/gen_facilities.c
>> @@ -106,6 +106,9 @@ struct facility_def {
>>
>> .name = "FACILITIES_KVM_CPUMODEL",
>> .bits = (int[]){
>> + 12, /* AP Query Configuration Information */
>> + 15, /* AP Facilities Test */
>> -1 /* END */
>> }
>> },
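The `.bits` member of FACILITIES_KVM_CPUMODEL is a -1-terminated list of facility numbers. Below is a minimal sketch of how such a list could be folded into 64-bit facility words. It assumes MSB-0 bit numbering, where facility 0 is the most significant bit of word 0, which is the STFLE facility-list convention. `fold_facility_bits()` is a hypothetical helper and not the code in gen_facilities.c.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: fold a -1-terminated list of facility numbers
 * into @nwords 64-bit words using MSB-0 numbering, so facility n lives
 * in word n / 64 at (LSB-0) bit position 63 - (n % 64).
 * Returns 0 on success, or -1 if a facility number is negative or does
 * not fit into @nwords words (@words is then only partially filled).
 */
static int fold_facility_bits(const int *bits, uint64_t *words, size_t nwords)
{
	for (size_t i = 0; i < nwords; i++)
		words[i] = 0;

	for (; *bits != -1; bits++) {
		unsigned int nr = (unsigned int)*bits;

		if (*bits < 0 || nr / 64 >= nwords)
			return -1;
		words[nr / 64] |= UINT64_C(1) << (63 - nr % 64);
	}
	return 0;
}
```

With the list { 12, 15, -1 } from the hunk above, word 0 ends up with LSB-0 bits 51 and 48 set, i.e. 0x0009000000000000.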
On 07/02/2018 09:53 AM, Halil Pasic wrote:
>
>
> On 06/29/2018 11:11 PM, Tony Krowiak wrote:
>> Introduces a new AP device driver. This device driver
>> is built on the VFIO mediated device framework. The framework
>> provides sysfs interfaces that facilitate passthrough
>> access by guests to devices installed on the linux host.
>>
>> The VFIO AP device driver will serve two purposes:
>>
>> 1. Provide the interfaces to reserve AP devices for exclusive
>> use by KVM guests. This is accomplished by unbinding the
>> devices to be reserved for guest usage from the default AP
>> device driver and binding them to the VFIO AP device driver.
>>
>> 2. Implements the functions, callbacks and sysfs attribute
>> interfaces required to create one or more VFIO mediated
>> devices each of which will be used to configure the AP
>> matrix for a guest and serve as a file descriptor
>> for facilitating communication between QEMU and the
>> VFIO AP device driver.
>>
>> When the VFIO AP device driver is initialized:
>>
>> * It registers with the AP bus for control of type 10 (CEX4
>> and newer) AP queue devices. This limitation was imposed
>> due to:
>>
>> 1. A lack of access to older systems needed to test the
>> older AP device models;
>>
>> 2. A desire to keep the code as simple as possible;
>>
>> 3. Some older models are no longer supported by the kernel
>> and others are getting close to end of service.
>>
>> The probe and remove callbacks will be provided to support
>> the binding/unbinding of AP queue devices to/from the VFIO
>> AP device driver.
>>
>> * Creates a /sys/devices/vfio-ap/matrix device to hold
>> the APQNs of the AP devices bound to the VFIO
>> AP device driver and serves as the parent of the
>> mediated devices created for each guest.
>>
>> Signed-off-by: Tony Krowiak <[email protected]>
>> ---
>> MAINTAINERS | 10 +++
>> arch/s390/Kconfig | 11 +++
>> drivers/s390/crypto/Makefile | 4 +
>> drivers/s390/crypto/vfio_ap_drv.c | 140 +++++++++++++++++++++++++++++++++
>> drivers/s390/crypto/vfio_ap_private.h | 29 +++++++
>> include/uapi/linux/vfio.h | 2 +
>> samples/bpf/bpf_load.c | 62 +++++++++++++++
>
> You have probably touched the last one by accident.
I'll have to figure out what happened here.
>
>
> Regards,
> Halil
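The driver restricts itself to type 10 (CEX4 and newer) queue devices. The user-space sketch below shows how a zero-terminated device-id table with a queue-type match flag can limit the devices a driver accepts. The structure, flag and table names are hypothetical stand-ins loosely modeled on the AP bus's device-id tables, not its real definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; not the AP bus's real definitions. */
#define SKETCH_MATCH_QUEUE_TYPE 0x01
#define SKETCH_DEVICE_TYPE_CEX4 10

struct sketch_ap_device_id {
	uint16_t match_flags;	/* 0 terminates the table */
	uint8_t dev_type;
};

/* Only type 10 (CEX4 and newer) queue devices are accepted. */
static const struct sketch_ap_device_id vfio_ap_ids_sketch[] = {
	{ .match_flags = SKETCH_MATCH_QUEUE_TYPE,
	  .dev_type = SKETCH_DEVICE_TYPE_CEX4 },
	{ 0 }
};

/* Return true if a queue device of type @dev_type matches an entry in @ids. */
static bool sketch_ap_match(const struct sketch_ap_device_id *ids,
			    uint8_t dev_type)
{
	for (; ids->match_flags != 0; ids++)
		if ((ids->match_flags & SKETCH_MATCH_QUEUE_TYPE) &&
		    ids->dev_type == dev_type)
			return true;
	return false;
}
```

On a match, the bus would go on to call the driver's probe callback. Queues of older types never match and are left to other drivers.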
On Mon, 2 Jul 2018 11:37:11 -0400
Tony Krowiak <[email protected]> wrote:
> On 07/02/2018 10:38 AM, Christian Borntraeger wrote:
> >
> > On 06/29/2018 11:11 PM, Tony Krowiak wrote:
> >> Introduces a new CPU model feature and two CPU model
> >> facilities to support AP virtualization for KVM guests.
> >>
> >> CPU model feature:
> >>
> >> The KVM_S390_VM_CPU_FEAT_AP feature indicates that
> >> AP instructions are available on the guest. This
> >> feature will be enabled by the kernel only if the AP
> >> instructions are installed on the linux host. This feature
> >> must be specifically turned on for the KVM guest from
> >> userspace to use the VFIO AP device driver for guest
> >> access to AP devices.
> >>
> >> CPU model facilities:
> >>
> >> 1. AP Query Configuration Information (QCI) facility is installed.
> >>
> >> This is indicated by setting facilities bit 12 for
> >> the guest. The kernel will not enable this facility
> >> for the guest if it is not set on the host. This facility
> >> must not be set by userspace if the KVM_S390_VM_CPU_FEAT_AP
> >> feature is not installed.
> >>
> >> If this facility is not set for the KVM guest, then only
> >> APQNs with an APQI less than 16 will be available to the
> >> guest regardless of the guest's matrix configuration. This
> >> is a limitation of the AP bus running on the guest.
> >>
> >> 2. AP Facilities Test facility (APFT) is installed.
> >>
> >> This is indicated by setting facilities bit 15 for
> >> the guest. The kernel will not enable this facility for
> >> the guest if it is not set on the host. This facility
> >> must not be set by userspace if the KVM_S390_VM_CPU_FEAT_AP
> >> feature is not installed.
> >>
> >> If this facility is not set for the KVM guest, then no
> >> AP devices will be available to the guest regardless of
> >> the guest's matrix configuration. This is a limitation
> >> of the AP bus running under the guest.
> >>
> >> Reviewed-by: Christian Borntraeger <[email protected]>
> >> Reviewed-by: Halil Pasic <[email protected]>
> >> Signed-off-by: Tony Krowiak <[email protected]>
> > I think it probably should be at the end of the series, other than that it's good.
>
> If I move this to the end of the series, the very next patch checks the
> KVM_S390_VM_CPU_FEAT_AP feature?
Introduce it here, offer it only with the last patch?
On 07/02/2018 11:41 AM, Cornelia Huck wrote:
> On Mon, 2 Jul 2018 11:37:11 -0400
> Tony Krowiak <[email protected]> wrote:
>
>> On 07/02/2018 10:38 AM, Christian Borntraeger wrote:
>>> On 06/29/2018 11:11 PM, Tony Krowiak wrote:
>>>> Introduces a new CPU model feature and two CPU model
>>>> facilities to support AP virtualization for KVM guests.
>>>>
>>>> CPU model feature:
>>>>
>>>> The KVM_S390_VM_CPU_FEAT_AP feature indicates that
>>>> AP instructions are available on the guest. This
>>>> feature will be enabled by the kernel only if the AP
>>>> instructions are installed on the linux host. This feature
>>>> must be specifically turned on for the KVM guest from
>>>> userspace to use the VFIO AP device driver for guest
>>>> access to AP devices.
>>>>
>>>> CPU model facilities:
>>>>
>>>> 1. AP Query Configuration Information (QCI) facility is installed.
>>>>
>>>> This is indicated by setting facilities bit 12 for
>>>> the guest. The kernel will not enable this facility
>>>> for the guest if it is not set on the host. This facility
>>>> must not be set by userspace if the KVM_S390_VM_CPU_FEAT_AP
>>>> feature is not installed.
>>>>
>>>> If this facility is not set for the KVM guest, then only
>>>> APQNs with an APQI less than 16 will be available to the
>>>> guest regardless of the guest's matrix configuration. This
>>>> is a limitation of the AP bus running on the guest.
>>>>
>>>> 2. AP Facilities Test facility (APFT) is installed.
>>>>
>>>> This is indicated by setting facilities bit 15 for
>>>> the guest. The kernel will not enable this facility for
>>>> the guest if it is not set on the host. This facility
>>>> must not be set by userspace if the KVM_S390_VM_CPU_FEAT_AP
>>>> feature is not installed.
>>>>
>>>> If this facility is not set for the KVM guest, then no
>>>> AP devices will be available to the guest regardless of
>>>> the guest's matrix configuration. This is a limitation
>>>> of the AP bus running under the guest.
>>>>
>>>> Reviewed-by: Christian Borntraeger <[email protected]>
>>>> Reviewed-by: Halil Pasic <[email protected]>
>>>> Signed-off-by: Tony Krowiak <[email protected]>
>>> I think it probably should be at the end of the series, other than that it's good.
>> If I move this to the end of the series, the very next patch checks the
>> KVM_S390_VM_CPU_FEAT_AP feature?
> Introduce it here, offer it only with the last patch?
I apologize, but I don't know what you mean by this. Are you suggesting
this patch should only include the #define for KVM_S390_VM_CPU_FEAT_AP?
>
On 06/29/2018 05:11 PM, Tony Krowiak wrote:
> Introduces a new CPU model feature and two CPU model
> facilities to support AP virtualization for KVM guests.
>
> CPU model feature:
>
> The KVM_S390_VM_CPU_FEAT_AP feature indicates that
> AP instructions are available on the guest. This
> feature will be enabled by the kernel only if the AP
> instructions are installed on the linux host. This feature
> must be specifically turned on for the KVM guest from
> userspace to use the VFIO AP device driver for guest
> access to AP devices.
>
> CPU model facilities:
>
> 1. AP Query Configuration Information (QCI) facility is installed.
>
> This is indicated by setting facilities bit 12 for
> the guest. The kernel will not enable this facility
> for the guest if it is not set on the host. This facility
> must not be set by userspace if the KVM_S390_VM_CPU_FEAT_AP
> feature is not installed.
>
> If this facility is not set for the KVM guest, then only
> APQNs with an APQI less than 16 will be available to the
> guest regardless of the guest's matrix configuration. This
> is a limitation of the AP bus running on the guest.
>
> 2. AP Facilities Test facility (APFT) is installed.
>
> This is indicated by setting facilities bit 15 for
> the guest. The kernel will not enable this facility for
> the guest if it is not set on the host. This facility
> must not be set by userspace if the KVM_S390_VM_CPU_FEAT_AP
> feature is not installed.
>
> If this facility is not set for the KVM guest, then no
> AP devices will be available to the guest regardless of
> the guest's matrix configuration. This is a limitation
> of the AP bus running under the guest.
>
> Reviewed-by: Christian Borntraeger <[email protected]>
> Reviewed-by: Halil Pasic <[email protected]>
> Signed-off-by: Tony Krowiak <[email protected]>
> ---
> arch/s390/include/uapi/asm/kvm.h | 1 +
> arch/s390/kvm/kvm-s390.c | 8 ++++++++
> arch/s390/tools/gen_facilities.c | 3 +++
> 3 files changed, 12 insertions(+), 0 deletions(-)
>
> diff --git a/arch/s390/include/uapi/asm/kvm.h b/arch/s390/include/uapi/asm/kvm.h
> index 4cdaa55..a580dec 100644
> --- a/arch/s390/include/uapi/asm/kvm.h
> +++ b/arch/s390/include/uapi/asm/kvm.h
> @@ -130,6 +130,7 @@ struct kvm_s390_vm_cpu_machine {
> #define KVM_S390_VM_CPU_FEAT_PFMFI 11
> #define KVM_S390_VM_CPU_FEAT_SIGPIF 12
> #define KVM_S390_VM_CPU_FEAT_KSS 13
> +#define KVM_S390_VM_CPU_FEAT_AP 14
> struct kvm_s390_vm_cpu_feat {
> __u64 feat[16];
> };
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index 3b7a515..d2208d4 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -40,6 +40,7 @@
> #include <asm/sclp.h>
> #include <asm/cpacf.h>
> #include <asm/timex.h>
> +#include <asm/ap.h>
> #include "kvm-s390.h"
> #include "gaccess.h"
>
> @@ -366,6 +367,13 @@ static void kvm_s390_cpu_feat_init(void)
>
> if (MACHINE_HAS_ESOP)
> allow_cpu_feat(KVM_S390_VM_CPU_FEAT_ESOP);
> +
> + /*
> + * Check if AP instructions installed on host
> + */
> + if (ap_instructions_available() == 0)
> + allow_cpu_feat(KVM_S390_VM_CPU_FEAT_AP);
> +
> /*
> * We need SIE support, ESOP (PROT_READ protection for gmap_shadow),
> * 64bit SCAO (SCA passthrough) and IDTE (for gmap_shadow unshadowing).
> diff --git a/arch/s390/tools/gen_facilities.c b/arch/s390/tools/gen_facilities.c
> index 90a8c9e..e0e2c19 100644
> --- a/arch/s390/tools/gen_facilities.c
> +++ b/arch/s390/tools/gen_facilities.c
> @@ -106,6 +106,9 @@ struct facility_def {
>
> .name = "FACILITIES_KVM_CPUMODEL",
> .bits = (int[]){
> + 12, /* AP Query Configuration Information */
> + 15, /* AP Facilities Test */
> + 156, /* Execution Token facility */
Oops. This last one is an error, the result of a faulty merge with the
latest code base.
> -1 /* END */
> }
> },
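The commit message states that facilities 12 (QCI) and 15 (APFT) must not be set by userspace unless KVM_S390_VM_CPU_FEAT_AP is also set. Below is a hypothetical userspace sketch of that consistency check. It assumes that struct kvm_s390_vm_cpu_feat and the facility lists both use MSB-0 bit numbering within each 64-bit word, so bit 0 is the most significant bit of word 0. That is what the kernel's set_bit_inv()-based allow_cpu_feat() produces on s390. The helper names are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value taken from the uapi hunk above. */
#define KVM_S390_VM_CPU_FEAT_AP 14

/* Assumed layout: MSB-0 numbering within each 64-bit word. */
static bool test_bit_msb0(const uint64_t *words, unsigned int nr)
{
	return (words[nr / 64] >> (63 - nr % 64)) & 1;
}

static void set_bit_msb0(uint64_t *words, unsigned int nr)
{
	words[nr / 64] |= UINT64_C(1) << (63 - nr % 64);
}

/*
 * Hypothetical check of the rule in the commit message: facilities
 * 12 (QCI) and 15 (APFT) may only be set for the guest if the
 * KVM_S390_VM_CPU_FEAT_AP feature is set as well.
 */
static bool ap_cpu_model_consistent(const uint64_t *feat, const uint64_t *fac)
{
	if (test_bit_msb0(feat, KVM_S390_VM_CPU_FEAT_AP))
		return true;
	return !test_bit_msb0(fac, 12) && !test_bit_msb0(fac, 15);
}
```

Under this layout, feature 14 corresponds to the value 1 << 49 in feat[0].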
On 07/02/2018 05:37 PM, Tony Krowiak wrote:
> On 07/02/2018 10:38 AM, Christian Borntraeger wrote:
>>
>> On 06/29/2018 11:11 PM, Tony Krowiak wrote:
>>> Introduces a new CPU model feature and two CPU model
>>> facilities to support AP virtualization for KVM guests.
>>>
>>> CPU model feature:
>>>
>>> The KVM_S390_VM_CPU_FEAT_AP feature indicates that
>>> AP instructions are available on the guest. This
>>> feature will be enabled by the kernel only if the AP
>>> instructions are installed on the linux host. This feature
>>> must be specifically turned on for the KVM guest from
>>> userspace to use the VFIO AP device driver for guest
>>> access to AP devices.
>>>
>>> CPU model facilities:
>>>
>>> 1. AP Query Configuration Information (QCI) facility is installed.
>>>
>>> This is indicated by setting facilities bit 12 for
>>> the guest. The kernel will not enable this facility
>>> for the guest if it is not set on the host. This facility
>>> must not be set by userspace if the KVM_S390_VM_CPU_FEAT_AP
>>> feature is not installed.
>>>
>>> If this facility is not set for the KVM guest, then only
>>> APQNs with an APQI less than 16 will be available to the
>>> guest regardless of the guest's matrix configuration. This
>>> is a limitation of the AP bus running on the guest.
>>>
>>> 2. AP Facilities Test facility (APFT) is installed.
>>>
>>> This is indicated by setting facilities bit 15 for
>>> the guest. The kernel will not enable this facility for
>>> the guest if it is not set on the host. This facility
>>> must not be set by userspace if the KVM_S390_VM_CPU_FEAT_AP
>>> feature is not installed.
>>>
>>> If this facility is not set for the KVM guest, then no
>>> AP devices will be available to the guest regardless of
>>> the guest's matrix configuration. This is a limitation
>>> of the AP bus running under the guest.
>>>
>>> Reviewed-by: Christian Borntraeger <[email protected]>
>>> Reviewed-by: Halil Pasic <[email protected]>
>>> Signed-off-by: Tony Krowiak <[email protected]>
>> I think it probably should be at the end of the series, other than that it's good.
>
> If I move this to the end of the series, the very next patch checks the
> KVM_S390_VM_CPU_FEAT_AP feature?
>
The point is the following: never expose a feature *before* it
is actually provided. And this is exactly what you are doing here.
AFAIU userspace can happily negotiate KVM_S390_VM_CPU_FEAT_AP,
do its part of the job and still not have AP instructions in the guest
if patches [0..5] are applied but [6..21] are not. This is wrong.
AFAIR I requested that this one be squashed with the next one for exactly
this reason. That would be OK, since with patch 6 applied we
do satisfy what the features require. It's only that the interfaces
required to set up the resources are not there yet, and thus the features
can't really be used meaningfully.
Usually we expose the features at the end of a series, as such a series
often just implements support for the given feature(s).
In this special case we can, IMHO, get away with not doing so, but not
exposing the features until the end of the series could still have some
merit.
Anyway, we should avoid exposing half-assed stuff. In that sense, the
resource cleanup (zapq) logic must not be introduced in a later patch than
the one that makes it possible to acquire and utilize resources.
Regards,
Halil
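Userspace cannot tell whether an offered feature is actually backed by the rest of the series. It can only check that what it requests is a subset of what the kernel offers. Below is a hypothetical sketch of that subset check. The names are illustrative and are not QEMU's or KVM's API. Once the AP bit appears in `offered`, the request succeeds whether or not the later patches that make the feature usable are present.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical negotiation: enable exactly the requested features if
 * all of them are offered. Returns 0 on success, or -1 if any requested
 * bit is not offered, in which case @enabled is left untouched.
 */
static int negotiate_features(const uint64_t *offered,
			      const uint64_t *requested,
			      uint64_t *enabled, size_t nwords)
{
	/* First pass: reject any request outside the offered set. */
	for (size_t i = 0; i < nwords; i++)
		if (requested[i] & ~offered[i])
			return -1;

	/* Second pass: commit the requested set. */
	for (size_t i = 0; i < nwords; i++)
		enabled[i] = requested[i];
	return 0;
}
```

Deferring allow_cpu_feat() to the last patch keeps the AP bit out of `offered`, so such a request fails instead of silently succeeding.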
On Mon, 2 Jul 2018 11:54:28 -0400
Tony Krowiak <[email protected]> wrote:
> On 07/02/2018 11:41 AM, Cornelia Huck wrote:
> > On Mon, 2 Jul 2018 11:37:11 -0400
> > Tony Krowiak <[email protected]> wrote:
> >
> >> On 07/02/2018 10:38 AM, Christian Borntraeger wrote:
> >>> On 06/29/2018 11:11 PM, Tony Krowiak wrote:
> >>>> Introduces a new CPU model feature and two CPU model
> >>>> facilities to support AP virtualization for KVM guests.
> >>>>
> >>>> CPU model feature:
> >>>>
> >>>> The KVM_S390_VM_CPU_FEAT_AP feature indicates that
> >>>> AP instructions are available on the guest. This
> >>>> feature will be enabled by the kernel only if the AP
> >>>> instructions are installed on the linux host. This feature
> >>>> must be specifically turned on for the KVM guest from
> >>>> userspace to use the VFIO AP device driver for guest
> >>>> access to AP devices.
> >>>>
> >>>> CPU model facilities:
> >>>>
> >>>> 1. AP Query Configuration Information (QCI) facility is installed.
> >>>>
> >>>> This is indicated by setting facilities bit 12 for
> >>>> the guest. The kernel will not enable this facility
> >>>> for the guest if it is not set on the host. This facility
> >>>> must not be set by userspace if the KVM_S390_VM_CPU_FEAT_AP
> >>>> feature is not installed.
> >>>>
> >>>> If this facility is not set for the KVM guest, then only
> >>>> APQNs with an APQI less than 16 will be available to the
> >>>> guest regardless of the guest's matrix configuration. This
> >>>> is a limitation of the AP bus running on the guest.
> >>>>
> >>>> 2. AP Facilities Test facility (APFT) is installed.
> >>>>
> >>>> This is indicated by setting facilities bit 15 for
> >>>> the guest. The kernel will not enable this facility for
> >>>> the guest if it is not set on the host. This facility
> >>>> must not be set by userspace if the KVM_S390_VM_CPU_FEAT_AP
> >>>> feature is not installed.
> >>>>
> >>>> If this facility is not set for the KVM guest, then no
> >>>> AP devices will be available to the guest regardless of
> >>>> the guest's matrix configuration. This is a limitation
> >>>> of the AP bus running under the guest.
> >>>>
> >>>> Reviewed-by: Christian Borntraeger <[email protected]>
> >>>> Reviewed-by: Halil Pasic <[email protected]>
> >>>> Signed-off-by: Tony Krowiak <[email protected]>
> >>> I think it probably should be at the end of the series, other than that it's good.
> >> If I move this to the end of the series, the very next patch checks the
> >> KVM_S390_VM_CPU_FEAT_AP feature?
> > Introduce it here, offer it only with the last patch?
>
> I apologize, but I don't know what you mean by this. Are you suggesting
> this patch should only include the #define for KVM_S390_VM_CPU_FEAT_AP?
Yes, just introduce the definition here (so code later in the series
can refer to it) and flip the switch (offer the bit) as the final
patch.
On 07/02/2018 06:11 PM, Cornelia Huck wrote:
> On Mon, 2 Jul 2018 11:54:28 -0400
> Tony Krowiak <[email protected]> wrote:
>
>> On 07/02/2018 11:41 AM, Cornelia Huck wrote:
>>> On Mon, 2 Jul 2018 11:37:11 -0400
>>> Tony Krowiak <[email protected]> wrote:
>>>
>>>> On 07/02/2018 10:38 AM, Christian Borntraeger wrote:
>>>>> On 06/29/2018 11:11 PM, Tony Krowiak wrote:
>>>>>> Introduces a new CPU model feature and two CPU model
>>>>>> facilities to support AP virtualization for KVM guests.
>>>>>>
>>>>>> CPU model feature:
>>>>>>
>>>>>> The KVM_S390_VM_CPU_FEAT_AP feature indicates that
>>>>>> AP instructions are available on the guest. This
>>>>>> feature will be enabled by the kernel only if the AP
>>>>>> instructions are installed on the linux host. This feature
>>>>>> must be specifically turned on for the KVM guest from
>>>>>> userspace to use the VFIO AP device driver for guest
>>>>>> access to AP devices.
>>>>>>
>>>>>> CPU model facilities:
>>>>>>
>>>>>> 1. AP Query Configuration Information (QCI) facility is installed.
>>>>>>
>>>>>> This is indicated by setting facilities bit 12 for
>>>>>> the guest. The kernel will not enable this facility
>>>>>> for the guest if it is not set on the host. This facility
>>>>>> must not be set by userspace if the KVM_S390_VM_CPU_FEAT_AP
>>>>>> feature is not installed.
>>>>>>
>>>>>> If this facility is not set for the KVM guest, then only
>>>>>> APQNs with an APQI less than 16 will be available to the
>>>>>> guest regardless of the guest's matrix configuration. This
>>>>>> is a limitation of the AP bus running on the guest.
>>>>>>
>>>>>> 2. AP Facilities Test facility (APFT) is installed.
>>>>>>
>>>>>> This is indicated by setting facilities bit 15 for
>>>>>> the guest. The kernel will not enable this facility for
>>>>>> the guest if it is not set on the host. This facility
>>>>>> must not be set by userspace if the KVM_S390_VM_CPU_FEAT_AP
>>>>>> feature is not installed.
>>>>>>
>>>>>> If this facility is not set for the KVM guest, then no
>>>>>> AP devices will be available to the guest regardless of
>>>>>> the guest's matrix configuration. This is a limitation
>>>>>> of the AP bus running under the guest.
>>>>>>
>>>>>> Reviewed-by: Christian Borntraeger <[email protected]>
>>>>>> Reviewed-by: Halil Pasic <[email protected]>
>>>>>> Signed-off-by: Tony Krowiak <[email protected]>
>>>>> I think it probably should be at the end of the series, other than that it's good.
>>>> If I move this to the end of the series, the very next patch checks the
>>>> KVM_S390_VM_CPU_FEAT_AP feature?
>>> Introduce it here, offer it only with the last patch?
>>
>> I apologize, but I don't know what you mean by this. Are you suggesting
>> this patch should only include the #define for KVM_S390_VM_CPU_FEAT_AP?
>
> Yes, just introduce the definition here (so code later in the series
> can refer to it) and flip the switch (offer the bit) as the final
> patch.
>
The other features introduced and exposed here are no different. For
KVM_S390_VM_CPU_FEAT_AP, deferring exposure means deferring allow_cpu_feat();
for the STFLE facilities, it means deferring their addition to
FACILITIES_KVM_CPUMODEL.
Anyway, I think the definition should be squashed into #6. Whether to expose
the features once patch #6 is in place or at the end of the series is, IMHO,
a matter of taste; I lean towards exposing them at the end of the series.
Regards,
Halil
On Mon, 2 Jul 2018 18:20:55 +0200
Halil Pasic <[email protected]> wrote:
> On 07/02/2018 06:11 PM, Cornelia Huck wrote:
> > On Mon, 2 Jul 2018 11:54:28 -0400
> > Tony Krowiak <[email protected]> wrote:
> >
> >> On 07/02/2018 11:41 AM, Cornelia Huck wrote:
> >>> On Mon, 2 Jul 2018 11:37:11 -0400
> >>> Tony Krowiak <[email protected]> wrote:
> >>>
> >>>> On 07/02/2018 10:38 AM, Christian Borntraeger wrote:
> >>>>> On 06/29/2018 11:11 PM, Tony Krowiak wrote:
> >>>>>> Introduces a new CPU model feature and two CPU model
> >>>>>> facilities to support AP virtualization for KVM guests.
> >>>>>>
> >>>>>> CPU model feature:
> >>>>>>
> >>>>>> The KVM_S390_VM_CPU_FEAT_AP feature indicates that
> >>>>>> AP instructions are available on the guest. This
> >>>>>> feature will be enabled by the kernel only if the AP
> >>>>>> instructions are installed on the linux host. This feature
> >>>>>> must be specifically turned on for the KVM guest from
> >>>>>> userspace to use the VFIO AP device driver for guest
> >>>>>> access to AP devices.
> >>>>>>
> >>>>>> CPU model facilities:
> >>>>>>
> >>>>>> 1. AP Query Configuration Information (QCI) facility is installed.
> >>>>>>
> >>>>>> This is indicated by setting facilities bit 12 for
> >>>>>> the guest. The kernel will not enable this facility
> >>>>>> for the guest if it is not set on the host. This facility
> >>>>>> must not be set by userspace if the KVM_S390_VM_CPU_FEAT_AP
> >>>>>> feature is not installed.
> >>>>>>
> >>>>>> If this facility is not set for the KVM guest, then only
> >>>>>> APQNs with an APQI less than 16 will be available to the
> >>>>>> guest regardless of the guest's matrix configuration. This
> >>>>>> is a limitation of the AP bus running on the guest.
> >>>>>>
> >>>>>> 2. AP Facilities Test facility (APFT) is installed.
> >>>>>>
> >>>>>> This is indicated by setting facilities bit 15 for
> >>>>>> the guest. The kernel will not enable this facility for
> >>>>>> the guest if it is not set on the host. This facility
> >>>>>> must not be set by userspace if the KVM_S390_VM_CPU_FEAT_AP
> >>>>>> feature is not installed.
> >>>>>>
> >>>>>> If this facility is not set for the KVM guest, then no
> >>>>>> AP devices will be available to the guest regardless of
> >>>>>> the guest's matrix configuration. This is a limitation
> >>>>>> of the AP bus running under the guest.
> >>>>>>
> >>>>>> Reviewed-by: Christian Borntraeger <[email protected]>
> >>>>>> Reviewed-by: Halil Pasic <[email protected]>
> >>>>>> Signed-off-by: Tony Krowiak <[email protected]>
> >>>>> I think it probably should be at the end of the series, other than that it's good.
> >>>> If I move this to the end of the series, the very next patch checks the
> >>>> KVM_S390_VM_CPU_FEAT_AP feature?
> >>> Introduce it here, offer it only with the last patch?
> >>
> >> I apologize, but I don't know what you mean by this. Are you suggesting
> >> this patch should only include the #define for KVM_S390_VM_CPU_FEAT_AP?
> >
> > Yes, just introduce the definition here (so code later in the series
> > can refer to it) and flip the switch (offer the bit) as the final
> > patch.
> >
>
> The other features introduced and exposed here are no different. For
> KVM_S390_VM_CPU_FEAT_AP defer exposing means defer allow_cpu_feat();
> for the STFLE features, defer adding to FACILITIES_KVM_CPUMODEL.
>
> Anyway, I think the definition should be squashed into #6. Expose the
> features after patch #6 is in place or expose them at the end of the
> series is IMHO a matter of taste -- and I lean towards expose at the
> end of the series.
Squashing with patch 6 and enabling at the end of the series sounds
good to me as well.
On 06/29/2018 11:11 PM, Tony Krowiak wrote:
> This patch provides documentation describing the AP architecture and
> design concepts behind the virtualization of AP devices. It also
> includes an example of how to configure AP devices for exclusive
> use of KVM guests.
>
> Signed-off-by: Tony Krowiak <[email protected]>
> ---
[..]
> +
> +Reserve APQNs for exclusive use of KVM guests
> +---------------------------------------------
> +The following block diagram illustrates the mechanism by which APQNs are
> +reserved:
> +
> + +------------------+
> + remove | | unbind
> + +------------------->+ cex4queue driver +<-----------+
> + | | | |
> + | +------------------+ |
> + | |
> + | |
> + | |
> ++--------+---------+ register +------------------+ +-----+------+
> +| +<---------+ | bind | |
> +| ap_bus | | vfio_ap driver +<-----+ admin |
> +| +--------->+ | | |
> ++------------------+ probe +---+--------+-----+ +------------+
> + | |
> + create | | store APQN
> + | |
> + v v
> + +---+--------+-----+
> + | |
> + | matrix device |
> + | |
> + +------------------+
> +
> +The process for reserving an AP queue for use by a KVM guest is:
> +
> +* The vfio-ap driver during its initialization will perform the following:
> + * Create the 'vfio_ap' root device - /sys/devices/vfio_ap
> + * Create the 'matrix' device in the 'vfio_ap' root
> + * Register the matrix device with the device core
> +* Register with the ap_bus for AP queue devices of type 10 (CEX4 and
> + newer) and provide the vfio_ap driver's probe and remove callback
> + interfaces. Older devices are not supported because there are no
> + systems available on which to test them.
> +* The admin unbinds queue cc.qqqq from the cex4queue device driver. This results
> + in the ap_bus calling the device driver's remove interface, which
> + unbinds the cc.qqqq queue device from the driver.
What if the queue cc.qqqq is already in use? AFAIU unbind is almost as radical as
pulling a cable. What is the proper procedure an admin should follow before doing
the unbind?
> +* The admin binds the cc.qqqq queue to the vfio_ap device driver. This results
> + in the ap_bus calling the vfio_ap device driver's probe interface to bind
> + queue cc.qqqq to the driver. The vfio_ap device driver will store the APQN for
> + the queue in the matrix device.
> +
[..]
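The steps above refer to a queue by its name cc.qqqq. This assumes the card (APID) is written as two hex digits and the domain (APQI) as four hex digits. Below is a minimal sketch of converting between that name and an (APID, APQI) pair under this format assumption. The struct and helper names are hypothetical.

```c
#include <stdio.h>

/* Hypothetical representation of an APQN as (card, domain). */
struct apqn_sketch {
	unsigned int apid;	/* card (adapter) number, 0..255 */
	unsigned int apqi;	/* domain (queue index), 0..255 */
};

/* Parse "cc.qqqq" (hex). Returns 0 on success, -1 on malformed input. */
static int apqn_parse(const char *name, struct apqn_sketch *q)
{
	unsigned int card, dom;
	char extra;

	/* Exactly two conversions: a trailing character makes it 3. */
	if (sscanf(name, "%2x.%4x%c", &card, &dom, &extra) != 2)
		return -1;
	if (card > 255 || dom > 255)
		return -1;
	q->apid = card;
	q->apqi = dom;
	return 0;
}

/* Format @q as "cc.qqqq" into @buf. Returns 0, or -1 if it does not fit. */
static int apqn_format(const struct apqn_sketch *q, char *buf, size_t len)
{
	int n = snprintf(buf, len, "%02x.%04x", q->apid, q->apqi);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For example, "04.0002" parses to card 4, domain 2. Formatting that pair yields the same string again.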
On 06/29/2018 11:11 PM, Tony Krowiak wrote:
> This patch provides documentation describing the AP architecture and
> design concepts behind the virtualization of AP devices. It also
> includes an example of how to configure AP devices for exclusive
> use of KVM guests.
>
> Signed-off-by: Tony Krowiak <[email protected]>
I don't like the design of external interfaces except for:
* cpu model features, and
* reset handling.
In particular:
1) The architecture allows authorizing access (via APM, AQM and ADM) to an AP
queue that is currently not configured (e.g. the card is not physically
plugged in, or is just configured off). That seems to be a perfectly normal use
case.
Your assign operations, however, require that the resource be bound to your
driver, and thus that the resource exists in the host.
It is clear: we need to avoid passing through resources to guests that are not
dedicated for this purpose (e.g. a queue utilized by zcrypt). But IMHO
we need a different mechanism.
2) I see no benefit in deferring the exclusivity check to vfio_ap_mdev_open().
The downside, however, is pretty obvious: management software is notified about
a 'bad configuration' only when it attempts to start the guest. And your current
QEMU patches are not very helpful in conveying this piece of information.
I've talked with Boris, and AFAIR he said this is not acceptable to him (@Boris,
can you confirm?).
3) We report the reason for a failure caused by a configuration problem (exclusivity
or resource allocation) via pr_err(), that is, via kernel messages. I don't think
this is very friendly to tooling or management software, and I hope we don't expect
admins to work with the sysfs interface long term. I mean, the effects of the admin
actions are not very persistent. Thus if the interface is a painful one, we are
talking about potentially frequent pain.
4) If I were to act out the role of the administrator, I would prefer to think of
specifying or changing the access controls of a guest with respect to AP (that is,
setting the AP matrix) as a single atomic operation, which either succeeds or fails.
The operation should succeed for any valid configuration, and fail for any invalid
one.
The current piecemeal approach seems even less fitting if we consider changing the
access controls of a running guest. AFAIK changing access controls for a running
guest is possible, and I don't see why we should artificially prohibit this.
I think the current sysfs interface for manipulating the matrix is good for
manual playing around, but I would prefer having an interface that is better
suited for programs (e.g. ioctl).
Regards,
Halil
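Points 1, 2 and 4 above describe a matrix update that is validated as a whole at configuration time and then applied either completely or not at all. Below is a hypothetical sketch of that idea. It assumes 256-bit APM/AQM masks. It treats two guests as conflicting when their card/domain cross products share at least one APQN, which happens exactly when both their adapter masks and their domain masks intersect. All names and the -EBUSY error code are illustrative and are not the vfio_ap interface.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MASK_WORDS 4	/* 256 bits per mask */

/* Hypothetical guest matrix; bit order does not matter for intersection. */
struct ap_matrix_sketch {
	uint64_t apm[MASK_WORDS];	/* adapters (cards) */
	uint64_t aqm[MASK_WORDS];	/* usage domains */
};

static bool masks_intersect(const uint64_t *a, const uint64_t *b)
{
	for (size_t i = 0; i < MASK_WORDS; i++)
		if (a[i] & b[i])
			return true;
	return false;
}

/* Two cross products share an APQN iff both APM and AQM overlap. */
static bool matrices_overlap(const struct ap_matrix_sketch *a,
			     const struct ap_matrix_sketch *b)
{
	return masks_intersect(a->apm, b->apm) &&
	       masks_intersect(a->aqm, b->aqm);
}

/*
 * Validate @new_m against the matrices of all @n other guests, then
 * commit it in one step. On conflict, @cur is left unchanged and the
 * caller gets a distinct error code rather than only a kernel message.
 */
static int matrix_set_atomic(struct ap_matrix_sketch *cur,
			     const struct ap_matrix_sketch *new_m,
			     const struct ap_matrix_sketch *others, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (matrices_overlap(new_m, &others[i]))
			return -EBUSY;
	*cur = *new_m;
	return 0;
}
```

Because the check works only on masks, it does not care whether the authorized queues currently exist on the host, which fits point 1. Whether a queue is dedicated to guest use (rather than to zcrypt) would still need a separate mechanism.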
On 02.07.2018 18:28, Halil Pasic wrote:
>
>
> On 06/29/2018 11:11 PM, Tony Krowiak wrote:
>> This patch provides documentation describing the AP architecture and
>> design concepts behind the virtualization of AP devices. It also
>> includes an example of how to configure AP devices for exclusive
>> use of KVM guests.
>>
>> Signed-off-by: Tony Krowiak <[email protected]>
>> ---
> [..]
>> +
>> +Reserve APQNs for exclusive use of KVM guests
>> +---------------------------------------------
>> +The following block diagram illustrates the mechanism by which APQNs are
>> +reserved:
>> +
>> + +------------------+
>> + remove | | unbind
>> + +------------------->+ cex4queue driver +<-----------+
>> + | | | |
>> + | +------------------+ |
>> + | |
>> + | |
>> + | |
>> ++--------+---------+ register +------------------+ +-----+------+
>> +| +<---------+ | bind | |
>> +| ap_bus | | vfio_ap driver +<-----+ admin |
>> +| +--------->+ | | |
>> ++------------------+ probe +---+--------+-----+ +------------+
>> + | |
>> + create | | store APQN
>> + | |
>> + v v
>> + +---+--------+-----+
>> + | |
>> + | matrix device |
>> + | |
>> + +------------------+
>> +
>> +The process for reserving an AP queue for use by a KVM guest is:
>> +
>> +* The vfio-ap driver during its initialization will perform the following:
>> + * Create the 'vfio_ap' root device - /sys/devices/vfio_ap
>> + * Create the 'matrix' device in the 'vfio_ap' root
>> + * Register the matrix device with the device core
>> +* Register with the ap_bus for AP queue devices of type 10 devices (CEX4 and
>> + newer) and to provide the vfio_ap driver's probe and remove callback
>> + interfaces. The reason why older devices are not supported is because there
>> + are no systems available on which to test.
>> +* The admin unbinds queue cc.qqqq from the cex4queue device driver. This results
>> + in the ap_bus calling the the device driver's remove interface which
>> + unbinds the cc.qqqq queue device from the driver.
>
> What if the queue cc.qqqq is already in use? AFAIU unbind is almost as radical as
> pulling a cable. What is the proper procedure an admin should follow before doing
> the unbind?
What do you mean by 'in use' at this level? An unbind destroys the association
between device and driver. There is no awareness of 'in use' or 'not in use' at this
level. This is a hard unbind.
>
>> +* The admin binds the cc.qqqq queue to the vfio_ap device driver. This results
>> + in the ap_bus calling the device vfio_ap driver's probe interface to bind
>> + queue cc.qqqq to the driver. The vfio_ap device driver will store the APQN for
>> + the queue in the matrix device
>> +
> [..]
>
On 29.06.2018 23:11, Tony Krowiak wrote:
> This patch provides documentation describing the AP architecture and
> design concepts behind the virtualization of AP devices. It also
> includes an example of how to configure AP devices for exclusive
> use of KVM guests.
>
> Signed-off-by: Tony Krowiak <[email protected]>
> ---
> Documentation/s390/vfio-ap.txt | 575 ++++++++++++++++++++++++++++++++++++++++
> MAINTAINERS | 1 +
> 2 files changed, 576 insertions(+), 0 deletions(-)
> create mode 100644 Documentation/s390/vfio-ap.txt
>
> diff --git a/Documentation/s390/vfio-ap.txt b/Documentation/s390/vfio-ap.txt
> new file mode 100644
> index 0000000..79f3d43
> --- /dev/null
> +++ b/Documentation/s390/vfio-ap.txt
> @@ -0,0 +1,575 @@
> +Introduction:
> +============
> +The Adjunct Processor (AP) facility is an IBM Z cryptographic facility comprised
> +of three AP instructions and from 1 up to 256 PCIe cryptographic adapter cards.
> +The AP devices provide cryptographic functions to all CPUs assigned to a
> +linux system running in an IBM Z system LPAR.
> +
> +The AP adapter cards are exposed via the AP bus. The motivation for vfio-ap
> +is to make AP cards available to KVM guests using the VFIO mediated device
> +framework. This implementation relies considerably on the s390 virtualization
> +facilities which do most of the hard work of providing direct access to AP
> +devices.
> +
> +AP Architectural Overview:
> +=========================
> +To facilitate the comprehension of the design, let's start with some
> +definitions:
> +
> +* AP adapter
> +
> + An AP adapter is an IBM Z adapter card that can perform cryptographic
> + functions. There can be from 0 to 256 adapters assigned to an LPAR. Adapters
> + assigned to the LPAR in which a linux host is running will be available to
> + the linux host. Each adapter is identified by a number from 0 to 255. When
> + installed, an AP adapter is accessed by AP instructions executed by any CPU.
> +
> + The AP adapter cards are assigned to a given LPAR via the system's Activation
> + Profile which can be edited via the HMC. When the system is IPL'd, the AP bus
> + module is loaded and detects the AP adapter cards assigned to the LPAR. The AP
> + bus creates a sysfs device for each adapter as they are detected. For example,
> + if AP adapters 4 and 10 (0x0a) are assigned to the LPAR, the AP bus will
> + create the following sysfs entries:
> +
> + /sys/devices/ap/card04
> + /sys/devices/ap/card0a
> +
> + Symbolic links to these devices will also be created in the AP bus devices
> + sub-directory:
> +
> + /sys/bus/ap/devices/[card04]
> + /sys/bus/ap/devices/[card0a]
> +
> +* AP domain
> +
> + An adapter is partitioned into domains. Each domain can be thought of as
> + a set of hardware registers for processing AP instructions. An adapter can
> + hold up to 256 domains. Each domain is identified by a number from 0 to 255.
> + Domains can be further classified into two types:
> +
> + * Usage domains are domains that can be accessed directly to process AP
> + commands.
> +
> + * Control domains are domains that are accessed indirectly by AP
> + commands sent to a usage domain to control or change the domain, for
> + example, to set a secure private key for the domain.
> +
> + The AP usage and control domains are assigned to a given LPAR via the system's
> + Activation Profile which can be edited via the HMC. When the system is IPL'd,
> + the AP bus module is loaded and detects the AP usage and control domains
> + assigned to the LPAR. The domain number of each usage domain will be coupled
> + with the adapter number of each AP adapter assigned to the LPAR to identify
> + the AP queues (see AP Queue section below). The domain number of each control
> + domain will be represented in a bitmask and stored in a sysfs file
> + /sys/bus/ap/ap_control_domain_mask created by the bus. The bits in the mask,
> + from most to least significant bit, correspond to domains 0-255.
> +
> + A domain may be assigned to a system as both a usage and control domain, or
> + as a control domain only. Consequently, all domains assigned as both a usage
> + and control domain can both process AP commands as well as be changed by an AP
> + command sent to any usage domain assigned to the same system. Domains assigned
> + only as control domains cannot process AP commands but can be changed by AP
> + commands sent to any usage domain assigned to the system.
> +
> +* AP Queue
> +
> + An AP queue is the means by which an AP command-request message is sent to a
> + usage domain inside a specific adapter. An AP queue is identified by a tuple
> + comprised of an AP adapter ID (APID) and an AP queue index (APQI). The
> + APQI corresponds to a given usage domain number within the adapter. This tuple
> + forms an AP Queue Number (APQN) uniquely identifying an AP queue. AP
> + instructions include a field containing the APQN to identify the AP queue to
> + which the AP command-request message is to be sent for processing.
> +
> + The AP bus will create a sysfs device for each APQN that can be derived from
> + the intersection of the AP adapter and usage domain numbers detected when the
> + AP bus module is loaded. For example, if adapters 4 and 10 (0x0a) and usage
> + domains 6 and 71 (0x47) are assigned to the LPAR, the AP bus will create the
> + following sysfs entries:
> +
> + /sys/devices/ap/card04/04.0006
> + /sys/devices/ap/card04/04.0047
> + /sys/devices/ap/card0a/0a.0006
> + /sys/devices/ap/card0a/0a.0047
> +
> + The following symbolic links to these devices will be created in the AP bus
> + devices subdirectory:
> +
> + /sys/bus/ap/devices/[04.0006]
> + /sys/bus/ap/devices/[04.0047]
> + /sys/bus/ap/devices/[0a.0006]
> + /sys/bus/ap/devices/[0a.0047]
> +
> +* AP Instructions:
> +
> + There are three AP instructions:
> +
> + * NQAP: to enqueue an AP command-request message to a queue
> + * DQAP: to dequeue an AP command-reply message from a queue
> + * PQAP: to administer the queues
> +
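The AP Queue section names each queue device "cc.qqqq" (for example 04.0006 and 0a.0047) and creates one device per adapter/usage-domain pair. A minimal C sketch of that naming and enumeration, assuming the zero-padded lowercase hex format in the examples holds in general; apqn_to_name() and list_queue_devices() are hypothetical helpers, not AP bus functions:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format the AP bus device name for the queue
 * identified by (apid, apqi), e.g. (0x0a, 0x47) -> "0a.0047".
 * Returns the number of characters written (excluding the NUL),
 * or -1 if the buffer is too small.
 */
int apqn_to_name(unsigned int apid, unsigned int apqi, char *buf, size_t len)
{
	int n;

	assert(apid <= 255 && apqi <= 255);
	n = snprintf(buf, len, "%02x.%04x", apid, apqi);
	return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/*
 * Print the queue devices the AP bus would create for the cross product
 * of the given adapters and usage domains; returns how many were printed.
 */
int list_queue_devices(const unsigned int *apids, size_t napids,
		       const unsigned int *apqis, size_t napqis)
{
	char name[8];	/* "cc.qqqq" plus the terminating NUL */
	int count = 0;

	for (size_t i = 0; i < napids; i++) {
		for (size_t j = 0; j < napqis; j++) {
			if (apqn_to_name(apids[i], apqis[j], name, sizeof(name)) < 0)
				return -1;
			printf("/sys/devices/ap/card%02x/%s\n", apids[i], name);
			count++;
		}
	}
	return count;
}
```

With adapters {0x04, 0x0a} and domains {0x06, 0x47} this prints the same four paths as the example, in the same order.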
> +AP and SIE:
> +==========
> +Let's now see how AP instructions are interpreted by the hardware.
> +
> +A satellite control block called the Crypto Control Block (CRYCB) is attached
> +to our main hardware virtualization control block, the SIE state description.
> +The CRYCB contains three fields to identify the adapters, usage domains and
> +control domains assigned to the KVM guest:
> +
> +* The AP Mask (APM) field is a bit mask that identifies the AP adapters assigned
> + to the KVM guest. Each bit in the mask, from most significant to least
> + significant bit, corresponds to an APID from 0-255. If a bit is set, the
> + corresponding adapter is valid for use by the KVM guest.
> +
> +* The AP Queue Mask (AQM) field is a bit mask identifying the AP usage domains
> + assigned to the KVM guest. Each bit in the mask, from most significant to
> + least significant bit, corresponds to an AP queue index (APQI) from 0-255. If
> + a bit is set, the corresponding queue is valid for use by the KVM guest.
> +
> +* The AP Domain Mask (ADM) field is a bit mask that identifies the AP control
> + domains assigned to the KVM guest. The ADM bit mask controls which domains can
> + be changed by an AP command-request message sent to a usage domain from the
> + guest. Each bit in the mask, from most significant to least significant bit,
> + corresponds to a domain from 0-255. If a bit is set, the corresponding domain
> + can be modified by an AP command-request message sent to a usage domain
> + configured for the KVM guest.
> +
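The APM and AQM bullets number mask bits starting from the most significant bit, so bit 0 is the leftmost bit of the first byte. A minimal C sketch of set/test helpers for such a 256-bit mask, assuming a plain 32-byte array; struct ap_mask and the ap_mask_* names are hypothetical, and the s390 kernel uses its own inverted-bit helpers (such as test_bit_inv()) that this sketch does not reproduce:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define AP_MASK_BITS  256
#define AP_MASK_BYTES (AP_MASK_BITS / 8)

/* A 256-bit APM/AQM-style mask; bit 0 is the leftmost (most significant) bit. */
struct ap_mask {
	unsigned char bytes[AP_MASK_BYTES];
};

void ap_mask_clear(struct ap_mask *m)
{
	memset(m->bytes, 0, sizeof(m->bytes));
}

void ap_mask_set(struct ap_mask *m, unsigned int nr)
{
	assert(nr < AP_MASK_BITS);
	/* Byte nr / 8 holds bit nr; within that byte, number 0 maps to 0x80. */
	m->bytes[nr / 8] |= (unsigned char)(0x80u >> (nr % 8));
}

bool ap_mask_test(const struct ap_mask *m, unsigned int nr)
{
	assert(nr < AP_MASK_BITS);
	return (m->bytes[nr / 8] & (0x80u >> (nr % 8))) != 0;
}
```

Setting APID 10, for example, sets 0x20 in the second byte rather than 0x04, which is what a conventional LSB-first bitmap would produce.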
> +If you recall from the description of an AP Queue, AP instructions include
> +an APQN to identify the AP adapter and AP queue to which an AP command-request
> +message is to be sent (NQAP and PQAP instructions), or from which a
> +command-reply message is to be received (DQAP instruction). The validity of an
> +APQN is defined by the matrix calculated from the APM and AQM; it is the
> +cross product of all assigned adapter numbers (APM) with all assigned queue
> +indexes (AQM). For example, if adapters 1 and 2 and usage domains 5 and 6 are
> +assigned to a guest, the APQNs (1,5), (1,6), (2,5) and (2,6) will be valid for
> +the guest.
> +
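The validity rule above (an APQN is valid if its adapter bit is set in the APM and its queue index bit is set in the AQM) can be sketched in C as an enumeration of the cross product. This assumes the MSB-first bit numbering described for the APM and AQM; struct apqn, mask_set() and apqn_matrix() are hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define AP_MASK_BYTES 32	/* 256 bits */

struct apqn {
	unsigned int apid;	/* adapter number, 0-255 */
	unsigned int apqi;	/* queue index (usage domain), 0-255 */
};

/* Test bit nr of a 256-bit mask, numbered from the most significant bit. */
static bool mask_test(const unsigned char *mask, unsigned int nr)
{
	return (mask[nr / 8] & (0x80u >> (nr % 8))) != 0;
}

/* Set bit nr of a 256-bit mask, numbered from the most significant bit. */
void mask_set(unsigned char *mask, unsigned int nr)
{
	mask[nr / 8] |= (unsigned char)(0x80u >> (nr % 8));
}

/*
 * Store every APQN in the cross product APM x AQM into out[] (at most max
 * entries) and return the total count, so a caller can detect truncation.
 */
size_t apqn_matrix(const unsigned char *apm, const unsigned char *aqm,
		   struct apqn *out, size_t max)
{
	size_t n = 0;

	for (unsigned int apid = 0; apid < 256; apid++) {
		if (!mask_test(apm, apid))
			continue;
		for (unsigned int apqi = 0; apqi < 256; apqi++) {
			if (!mask_test(aqm, apqi))
				continue;
			if (n < max)
				out[n] = (struct apqn){ apid, apqi };
			n++;
		}
	}
	return n;
}
```

For adapters 1 and 2 and usage domains 5 and 6 this yields (1,5), (1,6), (2,5) and (2,6), matching the text.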
> +The APQNs can provide secure key functionality - i.e., a private key is stored
> +on the adapter card for each of its domains - so each APQN must be assigned to
> +at most one guest or the linux host.
> +
> + Example 1: Valid configuration:
> + ------------------------------
> + Guest1: adapters 1,2 domains 5,6
> + Guest2: adapters 1,2 domain 7
> +
> + This is valid because both guests have a unique set of APQNs: Guest1 has
> + APQNs (1,5), (1,6), (2,5) and (2,6); Guest2 has APQNs (1,7) and (2,7).
> +
> + Example 2: Invalid configuration:
> + --------------------------------
> + Guest1: adapters 1,2 domains 5,6
> + Guest2: adapter 1 domains 6,7
> +
> + This is an invalid configuration because both guests have access to
> + APQN (1,6).
> +
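Checking the two examples does not require enumerating APQNs. Since (A1 x D1) and (A2 x D2) intersect exactly when A1 and A2 share an adapter and D1 and D2 share a usage domain, two mask intersections are enough. A minimal C sketch, assuming each guest is described only by its APM and AQM (control domains are not part of this check); struct guest_matrix and matrices_overlap() are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

#define AP_MASK_BYTES 32	/* 256 bits, bit 0 = most significant bit */

/* Hypothetical per-guest view: an adapter mask and a usage-domain mask. */
struct guest_matrix {
	unsigned char apm[AP_MASK_BYTES];
	unsigned char aqm[AP_MASK_BYTES];
};

void mask_set(unsigned char *mask, unsigned int nr)
{
	mask[nr / 8] |= (unsigned char)(0x80u >> (nr % 8));
}

/* True if the two masks have at least one bit in common. */
static bool masks_intersect(const unsigned char *a, const unsigned char *b)
{
	for (int i = 0; i < AP_MASK_BYTES; i++)
		if (a[i] & b[i])
			return true;
	return false;
}

/* Two guests share an APQN iff they share an adapter AND a usage domain. */
bool matrices_overlap(const struct guest_matrix *g1,
		      const struct guest_matrix *g2)
{
	return masks_intersect(g1->apm, g2->apm) &&
	       masks_intersect(g1->aqm, g2->aqm);
}
```

Example 1 shares both adapters but no domain, so it is accepted; Example 2 shares adapter 1 and domain 6, so it is rejected because of APQN (1,6).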
> +The Design:
> +===========
> +The design introduces three new objects:
> +
> +1. AP matrix device
> +2. VFIO AP device driver (vfio_ap.ko)
> +3. AP mediated matrix passthrough device
> +
> +The VFIO AP device driver
> +-------------------------
> +The VFIO AP (vfio_ap) device driver serves the following purposes:
> +
> +1. Provides the interfaces to reserve APQNs for exclusive use of KVM guests.
> +
> +2. Sets up the VFIO mediated device interfaces to manage the mediated matrix
> + device and create the sysfs interfaces for assigning adapters, usage domains,
> + and control domains comprising the matrix for a KVM guest.
> +
> +3. Configures the APM, AQM and ADM in the CRYCB referenced by a KVM guest's
> + SIE state description to grant the guest access to AP devices.
> +
> +4. Initializes the CPU model feature indicating that a KVM guest may use
> + AP facilities installed on the linux host.
> +
> +5. Enables interpretive execution mode for the KVM guest.
> +
> +Reserve APQNs for exclusive use of KVM guests
> +---------------------------------------------
> +The following block diagram illustrates the mechanism by which APQNs are
> +reserved:
> +
> + +------------------+
> + remove | | unbind
> + +------------------->+ cex4queue driver +<-----------+
> + | | | |
> + | +------------------+ |
> + | |
> + | |
> + | |
> ++--------+---------+ register +------------------+ +-----+------+
> +| +<---------+ | bind | |
> +| ap_bus | | vfio_ap driver +<-----+ admin |
> +| +--------->+ | | |
> ++------------------+ probe +---+--------+-----+ +------------+
> + | |
> + create | | store APQN
> + | |
> + v v
> + +---+--------+-----+
> + | |
> + | matrix device |
> + | |
> + +------------------+
> +
> +The process for reserving an AP queue for use by a KVM guest is:
> +
> +* The vfio-ap driver during its initialization will perform the following:
> + * Create the 'vfio_ap' root device - /sys/devices/vfio_ap
> + * Create the 'matrix' device in the 'vfio_ap' root
> + * Register the matrix device with the device core
> +* Register with the ap_bus for AP queue devices of type 10 devices (CEX4 and
> + newer) and to provide the vfio_ap driver's probe and remove callback
> + interfaces. The reason why older devices are not supported is because there
> + are no systems available on which to test.
This is simply not true. The reason is a design decision. The older
cards are simply somewhat more complicated, and we don't want to
add even more complexity to the AP virtualization implementation.
We also said several times that APXA is a requirement, not a feature.
> +* The admin unbinds queue cc.qqqq from the cex4queue device driver. This results
> + in the ap_bus calling the device driver's remove interface, which
> + unbinds the cc.qqqq queue device from the driver.
> +* The admin binds the cc.qqqq queue to the vfio_ap device driver. This results
> + in the ap_bus calling the vfio_ap device driver's probe interface to bind
> + queue cc.qqqq to the driver. The vfio_ap device driver will store the APQN for
> + the queue in the matrix device.
> +
> +Set up the VFIO mediated device interfaces
> +------------------------------------------
> +The VFIO AP device driver utilizes the common interface of the VFIO mediated
> +device core driver to:
> +* Register an AP mediated bus driver to add a mediated matrix device to and
> + remove it from a VFIO group.
> +* Create and destroy a mediated matrix device
> +* Add a mediated matrix device to and remove it from the AP mediated bus driver
> +* Add a mediated matrix device to and remove it from an IOMMU group
> +
> +The following high-level block diagram shows the main components and interfaces
> +of the VFIO AP mediated matrix device driver:
> +
> + +-------------+
> + | |
> + | +---------+ | mdev_register_driver() +--------------+
> + | | Mdev | +<-----------------------+ |
> + | | bus | | | vfio_mdev.ko |
> + | | driver | +----------------------->+ |<-> VFIO user
> + | +---------+ | probe()/remove() +--------------+ APIs
> + | |
> + | MDEV CORE |
> + | MODULE |
> + | mdev.ko |
> + | +---------+ | mdev_register_device() +--------------+
> + | |Physical | +<-----------------------+ |
> + | | device | | | vfio_ap.ko |<-> matrix
> + | |interface| +----------------------->+ | device
> + | +---------+ | callback +--------------+
> + +-------------+
> +
> +During initialization of the vfio_ap module, the matrix device is registered
> +with an 'mdev_parent_ops' structure that provides the sysfs attribute
> +structures, mdev functions and callback interfaces for managing the mediated
> +matrix device.
> +
> +* sysfs attribute structures:
> + * supported_type_groups
> + The VFIO mediated device framework supports creation of user-defined
> + mediated device types. These mediated device types are specified
> + via the 'supported_type_groups' structure when a device is registered
> + with the mediated device framework. The registration process creates the
> + sysfs structures for each mediated device type specified in the
> + 'mdev_supported_types' sub-directory of the device being registered. Along
> + with the device type, the sysfs attributes of the mediated device type are
> + provided.
> +
> + The VFIO AP device driver will register one mediated device type for
> + passthrough devices:
> + /sys/devices/vfio_ap/matrix/mdev_supported_types/vfio_ap-passthrough
> + Only the three read-only attributes required by the VFIO mdev framework will
> + be provided:
> + /sys/devices/vfio_ap/matrix/mdev_supported_types/vfio_ap-passthrough
> + ... name
> + ... device_api
> + ... available_instances
> + Where:
> + * name: specifies the name of the mediated device type
> + * device_api: the mediated device type's API
> + * available_instances: the number of mediated matrix passthrough devices
> + that can be created
> + * mdev_attr_groups
> + This attribute group identifies the user-defined sysfs attributes of the
> + mediated device. When a device is registered with the VFIO mediated device
> + framework, the sysfs attribute files identified in the 'mdev_attr_groups'
> + structure will be created in the mediated matrix device's directory. The
> + sysfs attributes for a mediated matrix device are:
> + * assign_adapter:
> + A write-only file for assigning an AP adapter to the mediated matrix
> + device. To assign an adapter, the APID of the adapter is written to the
> + file.
> + * assign_domain:
> + A write-only file for assigning an AP usage domain to the mediated matrix
> + device. To assign a domain, the APQI of the AP queue corresponding to a
> + usage domain is written to the file.
> + * matrix:
> + A read-only file for displaying the APQNs derived from the adapters and
> + domains assigned to the mediated matrix device.
> + * assign_control_domain:
> + A write-only file for assigning an AP control domain to the mediated
> + matrix device. To assign a control domain, the ID of a domain to be
> + controlled is written to the file. For the initial implementation, the set
> + of control domains will always include the set of usage domains, so it is
> + only necessary to assign control domains that are not also assigned as
> + usage domains.
> + * control_domains:
> + A read-only file for displaying the control domain numbers assigned to the
> + mediated matrix device.
> +
> +* functions:
> + * create:
> + allocates the ap_matrix_mdev structure used by the vfio_ap driver to:
> + * Keep track of the available instances
> + * Store the reference to the struct kvm for the KVM guest
> + * Provide the notifier callback that will get invoked to handle the
> + VFIO_GROUP_NOTIFY_SET_KVM event. When received, the vfio_ap driver will
> + store the reference in the mediated matrix device's ap_matrix_mdev
> + structure and enable the interpretive execution mode for the KVM guest.
> + * remove:
> + deallocates the mediated matrix device's ap_matrix_mdev structure.
> +
> +* callback interfaces
> + * open:
> + The vfio_ap driver uses this callback to register a
> + VFIO_GROUP_NOTIFY_SET_KVM notifier callback function for the mdev matrix
> + device. The notifier is invoked when QEMU connects the VFIO iommu group
> + for the mdev matrix device to the MDEV bus. Access to the KVM structure used
> + to configure the KVM guest is provided via this callback. The KVM structure
> + is used to configure the guest's access to the AP matrix defined via the
> + mediated matrix device's sysfs attribute files.
> + * release:
> + unregisters the VFIO_GROUP_NOTIFY_SET_KVM notifier callback function for the
> + mdev matrix device and deconfigures the guest's AP matrix.
> +
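The assign_adapter and assign_domain attributes accept a number written by the admin; the example later writes both decimal ("5") and hex ("0xab") values, and echo appends a newline. A userspace C sketch of how such a store handler might parse and range-check the value, assuming base auto-detection like strtoul(..., 0); the patch does not show the handlers, the kernel would presumably use kstrtoul() instead, and parse_ap_id() is a hypothetical name:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical model of an assign_adapter/assign_domain store handler:
 * parse the written number ("0x" prefix => hex, leading "0" => octal,
 * otherwise decimal) and reject anything outside 0-255.
 * Returns 0 and stores the ID in *id, or -EINVAL.
 */
int parse_ap_id(const char *buf, unsigned int *id)
{
	char *end;
	unsigned long val;

	if (buf == NULL)
		return -EINVAL;
	errno = 0;
	val = strtoul(buf, &end, 0);
	if (errno != 0 || end == buf)	/* overflow, or no digits at all */
		return -EINVAL;
	if (*end == '\n')		/* "echo" appends a newline */
		end++;
	if (*end != '\0' || val > 255)
		return -EINVAL;
	*id = (unsigned int)val;
	return 0;
}
```

One consequence of base auto-detection is that a zero-padded value such as "010" is read as octal 8, which is easy to trip over when copying IDs like "0047" from the queue device names.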
> +Configure the APM, AQM and ADM in the CRYCB:
> +-------------------------------------------
> +Configuring the AP matrix for a KVM guest will be performed when the
> +VFIO_GROUP_NOTIFY_SET_KVM notifier callback is invoked. The notifier
> +function is called when QEMU connects the VFIO iommu group for the mdev matrix
> +device to the MDEV bus. The CRYCB is configured by:
> +* Setting the bits in the APM corresponding to the APIDs assigned to the
> + mediated matrix device via its 'assign_adapter' interface.
> +* Setting the bits in the AQM corresponding to the APQIs assigned to the
> + mediated matrix device via its 'assign_domain' interface.
> +* Setting the bits in the ADM corresponding to the domain IDs assigned to the
> + mediated matrix device via its 'assign_control_domain' interface.
> +
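Combining the three bullets with the rule that every usage domain is also a control domain, the masks for the example's Guest1 (adapters 5 and 6, domains 4 and 0xab) can be sketched in C as follows. struct crycb_masks is a hypothetical stand-in that does not model the real CRYCB layout, and build_crycb_masks() is a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define AP_MASK_BYTES 32	/* 256 bits, bit 0 = most significant bit */

/* Hypothetical stand-in for the three CRYCB mask fields. */
struct crycb_masks {
	unsigned char apm[AP_MASK_BYTES];
	unsigned char aqm[AP_MASK_BYTES];
	unsigned char adm[AP_MASK_BYTES];
};

static void mask_set(unsigned char *mask, unsigned int nr)
{
	mask[nr / 8] |= (unsigned char)(0x80u >> (nr % 8));
}

/*
 * Build the masks from the IDs written to assign_adapter, assign_domain
 * and assign_control_domain. Every usage domain is also set in the ADM.
 */
void build_crycb_masks(struct crycb_masks *m,
		       const unsigned int *apids, size_t napids,
		       const unsigned int *apqis, size_t napqis,
		       const unsigned int *adids, size_t nadids)
{
	memset(m, 0, sizeof(*m));
	for (size_t i = 0; i < napids; i++)
		mask_set(m->apm, apids[i]);
	for (size_t i = 0; i < napqis; i++) {
		mask_set(m->aqm, apqis[i]);
		mask_set(m->adm, apqis[i]);	/* usage domains are control domains too */
	}
	for (size_t i = 0; i < nadids; i++)
		mask_set(m->adm, adids[i]);
}
```

With no extra control domains assigned, the resulting ADM is identical to the AQM, which is why the example never needs to write to assign_control_domain.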
> +Initialize the CPU model feature for AP
> +---------------------------------------
> +A new CPU model feature, KVM_S390_VM_CPU_FEAT_AP, is introduced to indicate that
> +AP instructions are available to the KVM guest. This feature will be enabled by
> +KVM only if the AP instructions are installed on the linux host. The feature
> +must be turned on for the guest in order to access AP devices from the guest.
> +For example, to turn the AP facilities on from the QEMU command line:
> +
> + /usr/bin/qemu-system-s390x ... -cpu xxx,ap=on
> +
> + Where xxx is the CPU model being used.
> +
> + If the CPU model feature is not enabled by the kernel, QEMU will fail and
> + report that the feature is not supported.
> +
> +Example:
> +=======
> +Let's now provide an example to illustrate how KVM guests may be given
> +access to AP facilities. For this example, we will show how to configure
> +two guests such that executing the lszcrypt command on the guests would
> +look like this:
> +
> +Guest1
> +------
> +CARD.DOMAIN TYPE MODE
> +------------------------------
> +05 CEX5C CCA-Coproc
> +05.0004 CEX5C CCA-Coproc
> +05.00ab CEX5C CCA-Coproc
> +06 CEX5A Accelerator
> +06.0004 CEX5A Accelerator
> +06.00ab CEX5C CCA-Coproc
typo: change the mode of the last line to Accelerator please
> +
> +Guest2
> +------
> +CARD.DOMAIN TYPE MODE
> +------------------------------
> +05 CEX5A Accelerator
> +05.0047 CEX5A Accelerator
> +05.00ff CEX5A Accelerator
Btw: this is an excellent example for thinking beyond the current design.
We don't want to dedicate Accelerators to guests. Accelerators should be
shared; CCA and EP11 Coprocessors should be dedicated. So maybe
change the example to use EP11 and CCA Coprocessors, and think
about how shared Accelerators could be handled.
> +
> +These are the steps:
> +
> +1. Install the vfio_ap module on the linux host. The dependency chain for the
> + vfio_ap module is:
> + * vfio
> + * mdev
> + * vfio_mdev
> + * KVM
> + * vfio_ap
> +
> +2. Secure the AP queues to be used by the two guests so that the host can not
> + access them. Only type 10 adapters (i.e., CEX4 and later) are supported
> + due to the fact that no test systems with older card types are available
> + for testing.
> +
> + To secure the AP queues, each AP queue device must first be unbound from
> + the cex4queue device driver. The sysfs location of the driver is:
> +
> + /sys/bus/ap
> + --- [drivers]
> + ------ [cex4queue]
> + --------- [05.0004]
> + --------- [05.0047]
> + --------- [05.00ab]
> + --------- [05.00ff]
> + --------- [06.0004]
> + --------- [06.00ab]
> + --------- unbind
> +
> + To unbind AP queue 05.0004 from the cex4queue device driver:
> +
> + echo 05.0004 > unbind
> +
> + This must also be done for AP queues 05.00ab, 05.0047, 05.00ff, 06.0004,
> + and 06.00ab.
> +
> + The AP queues that were unbound must then be reserved for use by the two KVM
> + guests. This is accomplished by binding them to the vfio_ap device driver.
> + The sysfs location of the driver is:
> +
> + /sys/bus/ap
> + ---[drivers]
> + ------ [vfio_ap]
> + ---------- bind
> +
> + To bind queue 05.0004 to the vfio_ap driver:
> +
> + echo 05.0004 > bind
> +
> + This must also be done for AP queues 05.00ab, 05.0047, 05.00ff, 06.0004,
> + and 06.00ab.
> +
> + Take note that the AP queues bound to the vfio_ap driver will be available
> + for guest usage until they are unbound from the driver, the vfio_ap module
> + is unloaded, or the host system is shut down.
> +
> +3. Create the mediated devices needed to configure the AP matrices for the
> + two guests and to provide an interface to the vfio_ap driver for
> + use by the guests:
> +
> + /sys/devices/
> + --- [vfio_ap]
> + ------ [matrix] (this is the matrix device)
> + --------- [mdev_supported_types]
> + ------------ [vfio_ap-passthrough] (passthrough mediated matrix device type)
> + --------------- create
> + --------------- [devices]
> +
> + To create the mediated devices for the two guests:
> +
> + uuidgen > create
> + uuidgen > create
> +
> + This will create two mediated devices in the [devices] subdirectory named
> + with the UUID written to the create attribute file. We call them $uuid1
> + and $uuid2:
> +
> + /sys/devices/
> + --- [vfio_ap]
> + ------ [matrix]
> + --------- [mdev_supported_types]
> + ------------ [vfio_ap-passthrough]
> + --------------- [devices]
> + ------------------ [$uuid1]
> + --------------------- assign_adapter
> + --------------------- assign_control_domain
> + --------------------- assign_domain
> + --------------------- matrix
> + --------------------- unassign_adapter
> + --------------------- unassign_control_domain
> + --------------------- unassign_domain
> +
> + ------------------ [$uuid2]
> + --------------------- assign_adapter
> + --------------------- assign_cTo assign an adapter, the APID of the adapter is written to the
> + file. ontrol_domain
Here something seems to be mixed up.
> + --------------------- assign_domain
> + --------------------- matrix
> + --------------------- unassign_adapter
> + --------------------- unassign_control_domain
> + --------------------- unassign_domain
> +
> +4. The administrator now needs to configure the matrices for mediated
> + devices $uuid1 (for Guest1) and $uuid2 (for Guest2).
> +
> + This is how the matrix is configured for Guest1:
> +
> + echo 5 > assign_adapter
> + echo 6 > assign_adapter
> + echo 4 > assign_domain
> + echo 0xab > assign_domain
> +
> + For this implementation, all usage domains - i.e., domains assigned
> + via the assign_domain attribute file - will also be configured in the ADM
> + field of the KVM guest's CRYCB, so there is no need to assign control
> + domains here unless you want to assign control domains that are not
> + assigned as usage domains.
> +
> + If a mistake is made configuring an adapter, domain or control domain,
> + you can use the unassign_xxx files to unassign the adapter, domain or
> + control domain.
> +
> + To display the matrix configuration for Guest1:
> +
> + cat matrix
> +
> + This is how the matrix is configured for Guest2:
> +
> + echo 5 > assign_adapter
> + echo 0x47 > assign_domain
> + echo 0xff > assign_domain
> +
> +5. Start Guest1:
> +
> + /usr/bin/qemu-system-s390x ... -cpu xxx,ap=on \
> + -device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid1 ...
> +
> +6. Start Guest2:
> +
> + /usr/bin/qemu-system-s390x ... -cpu xxx,ap=on \
> + -device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid2 ...
> +
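The host-side sysfs steps above (reserve each queue, create the mdev, assign adapters and domains) are written as echo commands. A C transcription for tooling, assuming the sysfs paths exactly as given in this document; all function names are hypothetical. With dry_run set it only prints the equivalent shell commands; otherwise it must run as root on an s390 host with vfio_ap loaded:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define VFIO_AP_TYPE_DIR \
	"/sys/devices/vfio_ap/matrix/mdev_supported_types/vfio_ap-passthrough"

/* Like "echo val > path"; with dry_run set, only print that command. */
int sysfs_write(const char *path, const char *val, int dry_run)
{
	FILE *f;
	int rc = 0;

	if (dry_run) {
		printf("echo %s > %s\n", val, path);
		return 0;
	}
	f = fopen(path, "w");
	if (f == NULL)
		return -1;
	if (fputs(val, f) == EOF)
		rc = -1;
	/* stdio buffers the write, so a store error may only surface here */
	if (fclose(f) != 0)
		rc = -1;
	return rc;
}

/* Build "/sys/bus/ap/drivers/<driver>/<attr>"; -1 if it does not fit. */
int ap_driver_attr(char *buf, size_t len, const char *driver, const char *attr)
{
	int n = snprintf(buf, len, "/sys/bus/ap/drivers/%s/%s", driver, attr);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Step 2 for one queue: unbind from cex4queue, then bind to vfio_ap. */
int reserve_queue(const char *apqn, int dry_run)
{
	char path[128];

	if (ap_driver_attr(path, sizeof(path), "cex4queue", "unbind") ||
	    sysfs_write(path, apqn, dry_run))
		return -1;
	if (ap_driver_attr(path, sizeof(path), "vfio_ap", "bind") ||
	    sysfs_write(path, apqn, dry_run))
		return -1;
	return 0;
}

/* Step 3: create a mediated matrix device named by uuid (e.g. from uuidgen). */
int create_mdev(const char *uuid, int dry_run)
{
	return sysfs_write(VFIO_AP_TYPE_DIR "/create", uuid, dry_run);
}

/* Step 4: write one ID to an mdev attribute, e.g. ("assign_domain", "0xab"). */
int mdev_assign(const char *uuid, const char *attr, const char *id, int dry_run)
{
	char path[192];
	int n = snprintf(path, sizeof(path), "/sys/devices/vfio_ap/matrix/%s/%s",
			 uuid, attr);

	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	return sysfs_write(path, id, dry_run);
}
```

Each call performs one write, so a failure midway leaves the configuration partially applied, the same as with the shell commands. The sketch also does nothing to make sure a queue is idle before the unbind.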
> +When the guest is shut down, the mediated matrix device may be removed.
> +
> +Using our example again, to remove the mediated matrix device $uuid1:
> +
> + /sys/devices/
> + --- [vfio_ap]
> + ------ [matrix]
> + --------- [mdev_supported_types]
> + ------------ [vfio_ap-passthrough]
> + --------------- [devices]
> + ------------------ [$uuid1]
> + --------------------- remove
> +
> + echo 1 > remove
> +
> + This will remove all of the mdev matrix device's sysfs structures. To
> + recreate and reconfigure the mdev matrix device, all of the steps starting
> + with step 3 will have to be performed again.
> +
> + It is not necessary to remove an mdev matrix device, but one may want to
> + remove it if no guest will use it during the lifetime of the linux host. If
> + the mdev matrix device is removed, one may want to unbind the AP queues the
> + guest was using from the vfio_ap device driver and bind them back to the
> + default driver. Alternatively, the AP queues can be configured for another
Please note: you can't just 'bind them back to the default driver'. You need
to unbind and then call device_reprobe(), which triggers the default way of
assigning a driver to a device and gives the ap bus a chance to handle this.
> + mdev matrix (i.e., guest). In either case, one must take care to change the
> + secure key configured for the domain to which the queue is connected.
> \ No newline at end of file
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 3217803..c693a23 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -12411,6 +12411,7 @@ S: Supported
> F: drivers/s390/crypto/vfio_ap_drv.c
> F: drivers/s390/crypto/vfio_ap_private.h
> F: drivers/s390/crypto/vfio_ap_ops.c
> +F: Documentation/s390/vfio-ap.txt
>
> S390 ZFCP DRIVER
> M: Steffen Maier <[email protected]>
On 07/03/2018 09:46 AM, Harald Freudenberger wrote:
> On 02.07.2018 18:28, Halil Pasic wrote:
>>
>>
>> On 06/29/2018 11:11 PM, Tony Krowiak wrote:
>>> This patch provides documentation describing the AP architecture and
>>> design concepts behind the virtualization of AP devices. It also
>>> includes an example of how to configure AP devices for exclusive
>>> use of KVM guests.
>>>
>>> Signed-off-by: Tony Krowiak <[email protected]>
>>> ---
>> [..]
>>> +
>>> +Reserve APQNs for exclusive use of KVM guests
>>> +---------------------------------------------
>>> +The following block diagram illustrates the mechanism by which APQNs are
>>> +reserved:
>>> +
>>> + +------------------+
>>> + remove | | unbind
>>> + +------------------->+ cex4queue driver +<-----------+
>>> + | | | |
>>> + | +------------------+ |
>>> + | |
>>> + | |
>>> + | |
>>> ++--------+---------+ register +------------------+ +-----+------+
>>> +| +<---------+ | bind | |
>>> +| ap_bus | | vfio_ap driver +<-----+ admin |
>>> +| +--------->+ | | |
>>> ++------------------+ probe +---+--------+-----+ +------------+
>>> + | |
>>> + create | | store APQN
>>> + | |
>>> + v v
>>> + +---+--------+-----+
>>> + | |
>>> + | matrix device |
>>> + | |
>>> + +------------------+
>>> +
>>> +The process for reserving an AP queue for use by a KVM guest is:
>>> +
>>> +* The vfio-ap driver during its initialization will perform the following:
>>> + * Create the 'vfio_ap' root device - /sys/devices/vfio_ap
>>> + * Create the 'matrix' device in the 'vfio_ap' root
>>> + * Register the matrix device with the device core
>>> +* Register with the ap_bus for AP queue devices of type 10 devices (CEX4 and
>>> + newer) and to provide the vfio_ap driver's probe and remove callback
>>> + interfaces. The reason why older devices are not supported is because there
>>> + are no systems available on which to test.
>>> +* The admin unbinds queue cc.qqqq from the cex4queue device driver. This results
>>> + in the ap_bus calling the the device driver's remove interface which
>>> + unbinds the cc.qqqq queue device from the driver.
>>
>> What if the queue cc.qqqq is already in use? AFAIU unbind is almost as radical as
>> pulling a cable. What is the proper procedure an admin should follow before doing
>> the unbind?
> What do you mean on this level with 'in use'? A unbind destroys the association
> between device and driver. There is no awareness of 'in use' or 'not in use' on this
> level. This is a hard unbind.
>>
Let me try to invoke the DASD analogy. If one for some reason wants to detach
a DASD the procedure to follow seems to be (see
https://www.ibm.com/support/knowledgecenter/en/linuxonibm/com.ibm.linux.z.lgdd/lgdd_t_dasd_online.html)
the following:
1) Unmount.
2) Offline possibly using safe_offline.
3) Detach.
Detaching a disk that is currently doing I/O asks for trouble, so the admin is encouraged
to make sure there is no pending I/O.
In the case of AP, you can interpret my 'in use' as 'the queue is not empty'. In my understanding,
unbind is supposed to be hard (I used the word radical). That's why I compared it to pulling
a cable, and that's why I ask whether there is anything the admin is supposed to do before doing
the unbind.
Does that answer your question?
Regards,
Halil
On Tue, 3 Jul 2018 11:22:10 +0200
Halil Pasic <[email protected]> wrote:
> On 07/03/2018 09:46 AM, Harald Freudenberger wrote:
> > On 02.07.2018 18:28, Halil Pasic wrote:
> >>
> >>
> >> On 06/29/2018 11:11 PM, Tony Krowiak wrote:
> >>> This patch provides documentation describing the AP architecture and
> >>> design concepts behind the virtualization of AP devices. It also
> >>> includes an example of how to configure AP devices for exclusive
> >>> use of KVM guests.
> >>>
> >>> Signed-off-by: Tony Krowiak <[email protected]>
> >>> ---
> >> [..]
> >>> +
> >>> +Reserve APQNs for exclusive use of KVM guests
> >>> +---------------------------------------------
> >>> +The following block diagram illustrates the mechanism by which APQNs are
> >>> +reserved:
> >>> +
> >>> + +------------------+
> >>> + remove | | unbind
> >>> + +------------------->+ cex4queue driver +<-----------+
> >>> + | | | |
> >>> + | +------------------+ |
> >>> + | |
> >>> + | |
> >>> + | |
> >>> ++--------+---------+ register +------------------+ +-----+------+
> >>> +| +<---------+ | bind | |
> >>> +| ap_bus | | vfio_ap driver +<-----+ admin |
> >>> +| +--------->+ | | |
> >>> ++------------------+ probe +---+--------+-----+ +------------+
> >>> + | |
> >>> + create | | store APQN
> >>> + | |
> >>> + v v
> >>> + +---+--------+-----+
> >>> + | |
> >>> + | matrix device |
> >>> + | |
> >>> + +------------------+
> >>> +
> >>> +The process for reserving an AP queue for use by a KVM guest is:
> >>> +
> >>> +* The vfio-ap driver during its initialization will perform the following:
> >>> + * Create the 'vfio_ap' root device - /sys/devices/vfio_ap
> >>> + * Create the 'matrix' device in the 'vfio_ap' root
> >>> + * Register the matrix device with the device core
> >>> +* Register with the ap_bus for AP queue devices of type 10 devices (CEX4 and
> >>> + newer) and to provide the vfio_ap driver's probe and remove callback
> >>> + interfaces. The reason why older devices are not supported is because there
> >>> + are no systems available on which to test.
> >>> +* The admin unbinds queue cc.qqqq from the cex4queue device driver. This results
> >>> + in the ap_bus calling the the device driver's remove interface which
> >>> + unbinds the cc.qqqq queue device from the driver.
> >>
> >> What if the queue cc.qqqq is already in use? AFAIU unbind is almost as radical as
> >> pulling a cable. What is the proper procedure an admin should follow before doing
> >> the unbind?
> > What do you mean on this level with 'in use'? A unbind destroys the association
> > between device and driver. There is no awareness of 'in use' or 'not in use' on this
> > level. This is a hard unbind.
> >>
>
>
> Let me try to invoke the DASD analogy. If one for some reason wants to detach
> a DASD the procedure to follow seems to be (see
> https://www.ibm.com/support/knowledgecenter/en/linuxonibm/com.ibm.linux.z.lgdd/lgdd_t_dasd_online.html)
> the following:
> 1) Unmount.
> 2) Offline possibly using safe_offline.
> 3) Detach.
>
> Detaching a disk that is currently doing I/O asks for trouble, so the admin is encouraged
> to make sure there is no pending I/O.
I don't think we can use dasd (block devices) as a good analogy for
every kind of device (for starters, consider network devices).
> In case of AP you can interpret my 'in use' as the queue is not empty. In my understanding
> unbind is supposed to be hard (I used the word radical). That's why I compared it to pulling
> a cable. So that's why I ask is there stuff the admin is supposed to do before doing the
> unbind.
Are you asking for a kind of 'quiescing' operation? I would hope that
the crypto drivers already can deal with that via flushing the queue,
not allowing new requests, or whatever. This is not the block device
case.
Anyway, this is an administrative issue. If you don't have a clear
concept which devices are for host usage and which for guest usage, you
already have problems.
Speaking of administrative issues, is there libvirt support for vfio-ap
under development? It would be helpful to validate the approach.
On 07/03/2018 01:52 PM, Cornelia Huck wrote:
> On Tue, 3 Jul 2018 11:22:10 +0200
> Halil Pasic <[email protected]> wrote:
>
[..]
>>
>> Let me try to invoke the DASD analogy. If one for some reason wants to detach
>> a DASD the procedure to follow seems to be (see
>> https://www.ibm.com/support/knowledgecenter/en/linuxonibm/com.ibm.linux.z.lgdd/lgdd_t_dasd_online.html)
>> the following:
>> 1) Unmount.
>> 2) Offline possibly using safe_offline.
>> 3) Detach.
>>
>> Detaching a disk that is currently doing I/O asks for trouble, so the admin is encouraged
>> to make sure there is no pending I/O.
>
> I don't think we can use dasd (block devices) as a good analogy for
> every kind of device (for starters, consider network devices).
>
I did not use it for every kind of device. I used it for AP. I'm
under the impression you find the analogy inappropriate. If so, could
you please explain why?
>> In case of AP you can interpret my 'in use' as the queue is not empty. In my understanding
>> unbind is supposed to be hard (I used the word radical). That's why I compared it to pulling
>> a cable. So that's why I ask is there stuff the admin is supposed to do before doing the
>> unbind.
>
> Are you asking for a kind of 'quiescing' operation? I would hope that
> the crypto drivers already can deal with that via flushing the queue,
> not allowing new requests, or whatever. This is not the block device
> case.
>
The current implementation of vfio-ap, which is a crypto driver too, certainly
cannot deal 'with that'. Whether the rest of the drivers can, I don't
know. Maybe Tony can tell.
I'm aware that AP adapters are not block devices, but
as stated above, I don't understand what the big difference is regarding
the unbind operation.
> Anyway, this is an administrative issue. If you don't have a clear
> concept which devices are for host usage and which for guest usage, you
> already have problems.
I'm trying to understand the whole solution. I agree, this is an administrative
issue. But the document is trying to address such administrative issues.
>
> Speaking of administrative issues, is there libvirt support for vfio-ap
> under development? It would be helpful to validate the approach.
I wholeheartedly agree. I guess Tony will have to answer this one too.
Regards,
Halil
On Tue, 3 Jul 2018 14:20:11 +0200
Halil Pasic <[email protected]> wrote:
> On 07/03/2018 01:52 PM, Cornelia Huck wrote:
> > On Tue, 3 Jul 2018 11:22:10 +0200
> > Halil Pasic <[email protected]> wrote:
> >
> [..]
> >>
> >> Let me try to invoke the DASD analogy. If one for some reason wants to detach
> >> a DASD the procedure to follow seems to be (see
> >> https://www.ibm.com/support/knowledgecenter/en/linuxonibm/com.ibm.linux.z.lgdd/lgdd_t_dasd_online.html)
> >> the following:
> >> 1) Unmount.
> >> 2) Offline possibly using safe_offline.
> >> 3) Detach.
> >>
> >> Detaching a disk that is currently doing I/O asks for trouble, so the admin is encouraged
> >> to make sure there is no pending I/O.
> >
> > I don't think we can use dasd (block devices) as a good analogy for
> > every kind of device (for starters, consider network devices).
> >
>
> I did not use it for every kind of device. I used it for AP. I'm
> under the impression you find the analogy inappropriate. If, could
> you please explain why?
I don't think block devices (which are designed to be more or less
permanently accessed, e.g. by mounting a file system) have the same
semantics as ap devices (which exist as a backend for crypto requests).
Not everything that makes sense for a block device makes sense for
other devices as well, and I don't think it makes sense here.
>
> >> In case of AP you can interpret my 'in use' as the queue is not empty. In my understanding
> >> unbind is supposed to be hard (I used the word radical). That's why I compared it to pulling
> >> a cable. So that's why I ask is there stuff the admin is supposed to do before doing the
> >> unbind.
> >
> > Are you asking for a kind of 'quiescing' operation? I would hope that
> > the crypto drivers already can deal with that via flushing the queue,
> > not allowing new requests, or whatever. This is not the block device
> > case.
> >
>
> The current implementation of vfio-ap which is a crypto driver too certainly
> can not deal 'with that'. Whether the rest of the drivers can, I don't
> know. Maybe Tony can tell.
If the current implementation of vfio-ap cannot deal with it (by
cleaning up, blocking, etc.), it needs, at the very least, to be documented
so that it can be implemented later. I do not know what the SIE will or
won't do to assist here (e.g., if you're removing it from some masks,
the device will already be inaccessible to the guest). But the part you
were referring to was talking about the existing host driver anyway,
wasn't it?
>
> I'm aware of the fact that AP adapters are not block devices. But
> as stated above I don't understand what is the big difference regarding
> the unbind operation.
>
> > Anyway, this is an administrative issue. If you don't have a clear
> > concept which devices are for host usage and which for guest usage, you
> > already have problems.
>
> I'm trying to understand the whole solution. I agree, this is an administrative
> issue. But the document is trying to address such administrative issues.
I'd assume "know which devices are for the host and which devices are
for the guests" to be a given, no?
> >
> > Speaking of administrative issues, is there libvirt support for vfio-ap
> > under development? It would be helpful to validate the approach.
>
> I full-heartedly agree. I guess Tony will have to answer this one too.
>
> Regards,
> Halil
>
On 07/03/2018 03:25 PM, Cornelia Huck wrote:
> On Tue, 3 Jul 2018 14:20:11 +0200
> Halil Pasic <[email protected]> wrote:
>
>> On 07/03/2018 01:52 PM, Cornelia Huck wrote:
>>> On Tue, 3 Jul 2018 11:22:10 +0200
>>> Halil Pasic <[email protected]> wrote:
>>>
>> [..]
>>>>
>>>> Let me try to invoke the DASD analogy. If one for some reason wants to detach
>>>> a DASD the procedure to follow seems to be (see
>>>> https://www.ibm.com/support/knowledgecenter/en/linuxonibm/com.ibm.linux.z.lgdd/lgdd_t_dasd_online.html)
>>>> the following:
>>>> 1) Unmount.
>>>> 2) Offline possibly using safe_offline.
>>>> 3) Detach.
>>>>
>>>> Detaching a disk that is currently doing I/O asks for trouble, so the admin is encouraged
>>>> to make sure there is no pending I/O.
>>>
>>> I don't think we can use dasd (block devices) as a good analogy for
>>> every kind of device (for starters, consider network devices).
>>>
>>
>> I did not use it for every kind of device. I used it for AP. I'm
>> under the impression you find the analogy inappropriate. If, could
>> you please explain why?
>
> I don't think block devices (which are designed to be more or less
> permanently accessed, e.g. by mounting a file system) have the same
> semantics as ap devices (which exist as a backend for crypto requests).
> Not everything that makes sense for a block device makes sense for
> other devices as well, and I don't think it makes sense here.
>
I'm still confused. If it's about frequency of access (as hinted
by block devices being accessed more or less permanently), I'm not sure
there is a substantial difference. I guess there are scenarios where
the AP domain is used very seldom (e.g. protected keys: most of the
crypto operations are done by CPACF, and the AP only unwraps the key at
the beginning), but there are such scenarios for block too.
If it's about (persistent) state, I guess it again depends on the
scenario and on the type of the card. But I may be wrong.
>>
>>>> In case of AP you can interpret my 'in use' as the queue is not empty. In my understanding
>>>> unbind is supposed to be hard (I used the word radical). That's why I compared it to pulling
>>>> a cable. So that's why I ask is there stuff the admin is supposed to do before doing the
>>>> unbind.
>>>
>>> Are you asking for a kind of 'quiescing' operation? I would hope that
>>> the crypto drivers already can deal with that via flushing the queue,
>>> not allowing new requests, or whatever. This is not the block device
>>> case.
>>>
>>
>> The current implementation of vfio-ap which is a crypto driver too certainly
>> can not deal 'with that'. Whether the rest of the drivers can, I don't
>> know. Maybe Tony can tell.
>
> If the current implementation of vfio-ap cannot deal with it (by
> cleaning up, blocking, etc.), it needs at the very least be documented
> so that it can be implemented later. I do not know what the SIE will or
> won't do to assist here (e.g., if you're removing it from some masks,
> the device will already be inaccessible to the guest). But the part you
> were referring to was talking about the existing host driver anyway,
> wasn't it?
>
I was thinking about both directions. Re-classifying a device from
pass-through to normal should also be possible, but the document only
talks about one direction.
I'm not familiar with the existing host drivers. If we can say 'Hey,
unbind is perfectly safe at any time: no precautions need to be considered!'
I'm very happy with that, although I would find it a bit surprising.
I just wanted to make sure this is not something we forget.
>>
>> I'm aware of the fact that AP adapters are not block devices. But
>> as stated above I don't understand what is the big difference regarding
>> the unbind operation.
>>
>>> Anyway, this is an administrative issue. If you don't have a clear
>>> concept which devices are for host usage and which for guest usage, you
>>> already have problems.
>>
>> I'm trying to understand the whole solution. I agree, this is an administrative
>> issue. But the document is trying to address such administrative issues.
>
> I'd assume "know which devices are for the host and which devices are
> for the guests" to be a given, no?
>
My other email touches on this topic. AFAIK we don't have a solution for
that yet, nor do we have a good understanding of what is statically given
and to what extent. E.g., if one wants to re-partition the AP
resources (and at some point one will have to do at least the initial
re-partitioning), is a reboot needed for the changes to take effect? Or
can this 'known' assignment change during the uptime of an OS?
@Tony: Please feel free to fill the gaps in my understanding.
Regards,
Halil
On Tue, 3 Jul 2018 15:58:37 +0200
Halil Pasic <[email protected]> wrote:
> On 07/03/2018 03:25 PM, Cornelia Huck wrote:
> > On Tue, 3 Jul 2018 14:20:11 +0200
> > Halil Pasic <[email protected]> wrote:
> >
> >> On 07/03/2018 01:52 PM, Cornelia Huck wrote:
> >>> On Tue, 3 Jul 2018 11:22:10 +0200
> >>> Halil Pasic <[email protected]> wrote:
> >>>
> >> [..]
> >>>>
> >>>> Let me try to invoke the DASD analogy. If one for some reason wants to detach
> >>>> a DASD the procedure to follow seems to be (see
> >>>> https://www.ibm.com/support/knowledgecenter/en/linuxonibm/com.ibm.linux.z.lgdd/lgdd_t_dasd_online.html)
> >>>> the following:
> >>>> 1) Unmount.
> >>>> 2) Offline possibly using safe_offline.
> >>>> 3) Detach.
> >>>>
> >>>> Detaching a disk that is currently doing I/O asks for trouble, so the admin is encouraged
> >>>> to make sure there is no pending I/O.
> >>>
> >>> I don't think we can use dasd (block devices) as a good analogy for
> >>> every kind of device (for starters, consider network devices).
> >>>
> >>
> >> I did not use it for every kind of device. I used it for AP. I'm
> >> under the impression you find the analogy inappropriate. If, could
> >> you please explain why?
> >
> > I don't think block devices (which are designed to be more or less
> > permanently accessed, e.g. by mounting a file system) have the same
> > semantics as ap devices (which exist as a backend for crypto requests).
> > Not everything that makes sense for a block device makes sense for
> > other devices as well, and I don't think it makes sense here.
> >
>
> I'm still confused. If it's about frequency of access (as hinted
> by block devices accessed more or less permanently) I'm not sure
> there is a substantial difference. I guess there are scenarios where
> the AP domain is used very seldom (e.g. protected keys --> most of
> the crypto ops done by CPACF but AP unwraps at the beginning), but
> there are such scenarios for block too.
>
> If it's about (persistent) state, I guess it again depends on the
> scenario and on the type of the card. But I may be wrong.
So, let's turn this around: Why do you think that dasd (and not qeth or
whatever) is a good model for ap device unbinding? Because I really
fail to get it... maybe the ap driver maintainers can chime in.
>
> >>
> >>>> In case of AP you can interpret my 'in use' as the queue is not empty. In my understanding
> >>>> unbind is supposed to be hard (I used the word radical). That's why I compared it to pulling
> >>>> a cable. So that's why I ask is there stuff the admin is supposed to do before doing the
> >>>> unbind.
> >>>
> >>> Are you asking for a kind of 'quiescing' operation? I would hope that
> >>> the crypto drivers already can deal with that via flushing the queue,
> >>> not allowing new requests, or whatever. This is not the block device
> >>> case.
> >>>
> >>
> >> The current implementation of vfio-ap which is a crypto driver too certainly
> >> can not deal 'with that'. Whether the rest of the drivers can, I don't
> >> know. Maybe Tony can tell.
> >
> > If the current implementation of vfio-ap cannot deal with it (by
> > cleaning up, blocking, etc.), it needs at the very least be documented
> > so that it can be implemented later. I do not know what the SIE will or
> > won't do to assist here (e.g., if you're removing it from some masks,
> > the device will already be inaccessible to the guest). But the part you
> > were referring to was talking about the existing host driver anyway,
> > wasn't it?
> >
>
> I was thinking about both directions. Re-classifying a device form
> pass-through to normal should also be possible. But the document only
> talks about one direction.
Presumably because it (rightfully) focuses on setting up vfio-ap?
>
> I'm not familiar with the existing host drivers. If we can say 'Hey,
> unbind is perfectly safe at any time: no per-cautions need to be considered!'
> I'm very happy with that. Although I would find it a bit surprising.
>
> I just wanted to make sure this is not something we forget.
>
> >>
> >> I'm aware of the fact that AP adapters are not block devices. But
> >> as stated above I don't understand what is the big difference regarding
> >> the unbind operation.
> >>
> >>> Anyway, this is an administrative issue. If you don't have a clear
> >>> concept which devices are for host usage and which for guest usage, you
> >>> already have problems.
> >>
> >> I'm trying to understand the whole solution. I agree, this is an administrative
> >> issue. But the document is trying to address such administrative issues.
> >
> > I'd assume "know which devices are for the host and which devices are
> > for the guests" to be a given, no?
> >
>
> My other email scratches this topic. AFAIK we don't have a solution for
> that yet. Nor we have a good understanding of how and to what extent
> is statically given what is given. E.g. if one wants to re-partition my AP
> resources (and at some point one will have to at least do the initial
> re-partitioning) do I need a reboot for the changes to take effect? Or
> is this 'known' variable during the uptime of an OS.
I think that is really out of scope for this file, which I'd expect to
explain how vfio-ap basically works and which incantations I need to
give crypto devices to a guest. It should NOT focus on administrative
tasks; this should either be delegated to the likes of libvirt or
documented in a "how to use crypto cards with kvm" kind of technical
writeup. If there's a limitation (e.g. you can't easily unbind again),
write a line here.
On 07/02/2018 12:28 PM, Cornelia Huck wrote:
> On Mon, 2 Jul 2018 18:20:55 +0200
> Halil Pasic <[email protected]> wrote:
>
>> On 07/02/2018 06:11 PM, Cornelia Huck wrote:
>>> On Mon, 2 Jul 2018 11:54:28 -0400
>>> Tony Krowiak <[email protected]> wrote:
>>>
>>>> On 07/02/2018 11:41 AM, Cornelia Huck wrote:
>>>>> On Mon, 2 Jul 2018 11:37:11 -0400
>>>>> Tony Krowiak <[email protected]> wrote:
>>>>>
>>>>>> On 07/02/2018 10:38 AM, Christian Borntraeger wrote:
>>>>>>> On 06/29/2018 11:11 PM, Tony Krowiak wrote:
>>>>>>>> Introduces a new CPU model feature and two CPU model
>>>>>>>> facilities to support AP virtualization for KVM guests.
>>>>>>>>
>>>>>>>> CPU model feature:
>>>>>>>>
>>>>>>>> The KVM_S390_VM_CPU_FEAT_AP feature indicates that
>>>>>>>> AP instructions are available on the guest. This
>>>>>>>> feature will be enabled by the kernel only if the AP
>>>>>>>> instructions are installed on the linux host. This feature
>>>>>>>> must be specifically turned on for the KVM guest from
>>>>>>>> userspace to use the VFIO AP device driver for guest
>>>>>>>> access to AP devices.
>>>>>>>>
>>>>>>>> CPU model facilities:
>>>>>>>>
>>>>>>>> 1. AP Query Configuration Information (QCI) facility is installed.
>>>>>>>>
>>>>>>>> This is indicated by setting facilities bit 12 for
>>>>>>>> the guest. The kernel will not enable this facility
>>>>>>>> for the guest if it is not set on the host. This facility
>>>>>>>> must not be set by userspace if the KVM_S390_VM_CPU_FEAT_AP
>>>>>>>> feature is not installed.
>>>>>>>>
>>>>>>>> If this facility is not set for the KVM guest, then only
>>>>>>>> APQNs with an APQI less than 16 will be available to the
>>>>>>>> guest regardless of the guest's matrix configuration. This
>>>>>>>> is a limitation of the AP bus running on the guest.
>>>>>>>>
>>>>>>>> 2. AP Facilities Test facility (APFT) is installed.
>>>>>>>>
>>>>>>>> This is indicated by setting facilities bit 15 for
>>>>>>>> the guest. The kernel will not enable this facility for
>>>>>>>> the guest if it is not set on the host. This facility
>>>>>>>> must not be set by userspace if the KVM_S390_VM_CPU_FEAT_AP
>>>>>>>> feature is not installed.
>>>>>>>>
>>>>>>>> If this facility is not set for the KVM guest, then no
>>>>>>>> AP devices will be available to the guest regardless of
>>>>>>>> the guest's matrix configuration. This is a limitation
>>>>>>>> of the AP bus running under the guest.
>>>>>>>>
>>>>>>>> Reviewed-by: Christian Borntraeger <[email protected]>
>>>>>>>> Reviewed-by: Halil Pasic <[email protected]>
>>>>>>>> Signed-off-by: Tony Krowiak <[email protected]>
>>>>>>> I think it probably should be at the end of the series, other than that its good.
>>>>>> If I move this to the end of the series, the very next patch checks the
>>>>>> KVM_S390_VM_CPU_FEAT_AP feature?
>>>>> Introduce it here, offer it only with the last patch?
>>>> I apologize, but I don't know what you mean by this. Are you suggesting
>>>> this patch should only include the #define for KVM_S390_VM_CPU_FEAT_AP?
>>> Yes, just introduce the definition here (so code later in the series
>>> can refer to it) and flip the switch (offer the bit) as the final
>>> patch.
>>>
>> The other features introduced and exposed here are no different. For
>> KVM_S390_VM_CPU_FEAT_AP defer exposing means defer allow_cpu_feat();
>> for the STFLE features, defer adding to FACILITIES_KVM_CPUMODEL.
>>
>> Anyway, I think the definition should be squashed into #6. Expose the
>> features after patch #6 is in place or expose them at the end of the
>> series is IMHO a matter of taste -- and I lean towards expose at the
>> end of the series.
> Squashing with patch 6 and enabling at the end of the series sounds
> good to me as well.
Consider it done.
>
On 07/03/2018 03:46 AM, Harald Freudenberger wrote:
> On 02.07.2018 18:28, Halil Pasic wrote:
>>
>> On 06/29/2018 11:11 PM, Tony Krowiak wrote:
>>> This patch provides documentation describing the AP architecture and
>>> design concepts behind the virtualization of AP devices. It also
>>> includes an example of how to configure AP devices for exclusive
>>> use of KVM guests.
>>>
>>> Signed-off-by: Tony Krowiak <[email protected]>
>>> ---
>> [..]
>>> +
>>> +Reserve APQNs for exclusive use of KVM guests
>>> +---------------------------------------------
>>> +The following block diagram illustrates the mechanism by which APQNs are
>>> +reserved:
>>> +
>>> + +------------------+
>>> + remove | | unbind
>>> + +------------------->+ cex4queue driver +<-----------+
>>> + | | | |
>>> + | +------------------+ |
>>> + | |
>>> + | |
>>> + | |
>>> ++--------+---------+ register +------------------+ +-----+------+
>>> +| +<---------+ | bind | |
>>> +| ap_bus | | vfio_ap driver +<-----+ admin |
>>> +| +--------->+ | | |
>>> ++------------------+ probe +---+--------+-----+ +------------+
>>> + | |
>>> + create | | store APQN
>>> + | |
>>> + v v
>>> + +---+--------+-----+
>>> + | |
>>> + | matrix device |
>>> + | |
>>> + +------------------+
>>> +
>>> +The process for reserving an AP queue for use by a KVM guest is:
>>> +
>>> +* The vfio-ap driver during its initialization will perform the following:
>>> + * Create the 'vfio_ap' root device - /sys/devices/vfio_ap
>>> + * Create the 'matrix' device in the 'vfio_ap' root
>>> + * Register the matrix device with the device core
>>> +* Register with the ap_bus for AP queue devices of type 10 devices (CEX4 and
>>> + newer) and to provide the vfio_ap driver's probe and remove callback
>>> + interfaces. The reason why older devices are not supported is because there
>>> + are no systems available on which to test.
>>> +* The admin unbinds queue cc.qqqq from the cex4queue device driver. This results
>>> + in the ap_bus calling the the device driver's remove interface which
>>> + unbinds the cc.qqqq queue device from the driver.
>> What if the queue cc.qqqq is already in use? AFAIU unbind is almost as radical as
>> pulling a cable. What is the proper procedure an admin should follow before doing
>> the unbind?
> What do you mean on this level with 'in use'? A unbind destroys the association
> between device and driver. There is no awareness of 'in use' or 'not in use' on this
> level. This is a hard unbind.
According to my reading of the code, the remove callback for the AP queue
drivers flushes the queue before it is disconnected from the driver. Do you
concur, Harald?
>>> +* The admin binds the cc.qqqq queue to the vfio_ap device driver. This results
>>> + in the ap_bus calling the device vfio_ap driver's probe interface to bind
>>> + queue cc.qqqq to the driver. The vfio_ap device driver will store the APQN for
>>> + the queue in the matrix device
>>> +
>> [..]
>>
>>
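The unbind and bind steps in the quoted patch text map onto the generic driver-core `unbind` and `bind` sysfs attributes. A minimal sketch, assuming the two drivers appear as `cex4queue` and `vfio_ap` under `/sys/bus/ap/drivers/` and that queue devices are named `cc.qqqq` in lowercase hex (for example `04.0006`); `ap_queue_name` and `move_queue` are hypothetical helpers. Swapping the driver arguments sketches the reverse direction, keeping in mind that, per this thread, unbinding from vfio_ap while a guest uses the queue is not handled yet.

```python
import os


def ap_queue_name(card: int, domain: int) -> str:
    """Format an AP queue device name as cc.qqqq (2 and 4 hex digits)."""
    return f"{card:02x}.{domain:04x}"


def move_queue(card, domain, from_drv="cex4queue", to_drv="vfio_ap",
               sysfs_root="/sys"):
    """Unbind an AP queue from one driver and bind it to another.

    ASSUMPTION: driver directories live under <sysfs_root>/bus/ap/drivers.
    """
    name = ap_queue_name(card, domain)
    drivers = os.path.join(sysfs_root, "bus", "ap", "drivers")
    # Writing the device name to 'unbind' makes the bus call the old
    # driver's remove callback.
    with open(os.path.join(drivers, from_drv, "unbind"), "w") as f:
        f.write(name)
    # Writing it to 'bind' makes the bus call the new driver's probe
    # callback, which (for vfio_ap) stores the APQN in the matrix device.
    with open(os.path.join(drivers, to_drv, "bind"), "w") as f:
        f.write(name)
    return name
```

Again, `sysfs_root` is only there so the helper can be tried against a scratch directory.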
On 07/03/2018 07:52 AM, Cornelia Huck wrote:
> On Tue, 3 Jul 2018 11:22:10 +0200
> Halil Pasic <[email protected]> wrote:
>
>> On 07/03/2018 09:46 AM, Harald Freudenberger wrote:
>>> On 02.07.2018 18:28, Halil Pasic wrote:
>>>>
>>>> On 06/29/2018 11:11 PM, Tony Krowiak wrote:
>>>>> This patch provides documentation describing the AP architecture and
>>>>> design concepts behind the virtualization of AP devices. It also
>>>>> includes an example of how to configure AP devices for exclusive
>>>>> use of KVM guests.
>>>>>
>>>>> Signed-off-by: Tony Krowiak <[email protected]>
>>>>> ---
>>>> [..]
>>>>> +
>>>>> +Reserve APQNs for exclusive use of KVM guests
>>>>> +---------------------------------------------
>>>>> +The following block diagram illustrates the mechanism by which APQNs are
>>>>> +reserved:
>>>>> +
>>>>> + +------------------+
>>>>> + remove | | unbind
>>>>> + +------------------->+ cex4queue driver +<-----------+
>>>>> + | | | |
>>>>> + | +------------------+ |
>>>>> + | |
>>>>> + | |
>>>>> + | |
>>>>> ++--------+---------+ register +------------------+ +-----+------+
>>>>> +| +<---------+ | bind | |
>>>>> +| ap_bus | | vfio_ap driver +<-----+ admin |
>>>>> +| +--------->+ | | |
>>>>> ++------------------+ probe +---+--------+-----+ +------------+
>>>>> + | |
>>>>> + create | | store APQN
>>>>> + | |
>>>>> + v v
>>>>> + +---+--------+-----+
>>>>> + | |
>>>>> + | matrix device |
>>>>> + | |
>>>>> + +------------------+
>>>>> +
>>>>> +The process for reserving an AP queue for use by a KVM guest is:
>>>>> +
>>>>> +* The vfio-ap driver during its initialization will perform the following:
>>>>> + * Create the 'vfio_ap' root device - /sys/devices/vfio_ap
>>>>> + * Create the 'matrix' device in the 'vfio_ap' root
>>>>> + * Register the matrix device with the device core
>>>>> +* Register with the ap_bus for AP queue devices of type 10 devices (CEX4 and
>>>>> + newer) and to provide the vfio_ap driver's probe and remove callback
>>>>> + interfaces. The reason why older devices are not supported is because there
>>>>> + are no systems available on which to test.
>>>>> +* The admin unbinds queue cc.qqqq from the cex4queue device driver. This results
>>>>> + in the ap_bus calling the the device driver's remove interface which
>>>>> + unbinds the cc.qqqq queue device from the driver.
>>>> What if the queue cc.qqqq is already in use? AFAIU unbind is almost as radical as
>>>> pulling a cable. What is the proper procedure an admin should follow before doing
>>>> the unbind?
>>> What do you mean on this level with 'in use'? A unbind destroys the association
>>> between device and driver. There is no awareness of 'in use' or 'not in use' on this
>>> level. This is a hard unbind.
>>>>
>>
>> Let me try to invoke the DASD analogy. If one for some reason wants to detach
>> a DASD the procedure to follow seems to be (see
>> https://www.ibm.com/support/knowledgecenter/en/linuxonibm/com.ibm.linux.z.lgdd/lgdd_t_dasd_online.html)
>> the following:
>> 1) Unmount.
>> 2) Offline possibly using safe_offline.
>> 3) Detach.
>>
>> Detaching a disk that is currently doing I/O asks for trouble, so the admin is encouraged
>> to make sure there is no pending I/O.
> I don't think we can use dasd (block devices) as a good analogy for
> every kind of device (for starters, consider network devices).
>
>> In case of AP you can interpret my 'in use' as the queue is not empty. In my understanding
>> unbind is supposed to be hard (I used the word radical). That's why I compared it to pulling
>> a cable. So that's why I ask is there stuff the admin is supposed to do before doing the
>> unbind.
> Are you asking for a kind of 'quiescing' operation? I would hope that
> the crypto drivers already can deal with that via flushing the queue,
> not allowing new requests, or whatever. This is not the block device
> case.
As I stated in Message ID:
<[email protected]>,
I believe the queue is flushed when the remove callback is invoked on the
driver.
>
> Anyway, this is an administrative issue. If you don't have a clear
> concept which devices are for host usage and which for guest usage, you
> already have problems.
>
> Speaking of administrative issues, is there libvirt support for vfio-ap
> under development? It would be helpful to validate the approach.
There is libvirt support under development although it is not very far
along at this point.
>
On 07/03/2018 08:20 AM, Halil Pasic wrote:
>
>
> On 07/03/2018 01:52 PM, Cornelia Huck wrote:
>> On Tue, 3 Jul 2018 11:22:10 +0200
>> Halil Pasic <[email protected]> wrote:
>>
> [..]
>>>
>>> Let me try to invoke the DASD analogy. If one for some reason wants to detach
>>> a DASD the procedure to follow seems to be (see
>>> https://www.ibm.com/support/knowledgecenter/en/linuxonibm/com.ibm.linux.z.lgdd/lgdd_t_dasd_online.html)
>>> the following:
>>> 1) Unmount.
>>> 2) Offline possibly using safe_offline.
>>> 3) Detach.
>>>
>>> Detaching a disk that is currently doing I/O asks for trouble, so the admin is encouraged
>>> to make sure there is no pending I/O.
>>
>> I don't think we can use dasd (block devices) as a good analogy for
>> every kind of device (for starters, consider network devices).
>>
>
> I did not use it for every kind of device. I used it for AP. I'm
> under the impression you find the analogy inappropriate. If so, could
> you please explain why?
>
>>> In case of AP you can interpret my 'in use' as the queue is not
>>> empty. In my understanding unbind is supposed to be hard (I used
>>> the word radical). That's why I compared it to pulling a cable. So
>>> that's why I ask is there stuff the admin is supposed to do before
>>> doing the unbind.
>>
>> Are you asking for a kind of 'quiescing' operation? I would hope that
>> the crypto drivers already can deal with that via flushing the queue,
>> not allowing new requests, or whatever. This is not the block device
>> case.
>>
>
> The current implementation of vfio-ap which is a crypto driver too
> certainly can not deal 'with that'. Whether the rest of the drivers
> can, I don't know. Maybe Tony can tell.
As stated in the cover letter, unbinding a queue from the vfio-ap device
driver is akin to a hot unplug. Hot plug/unplug is one of the goals of
the next patch series.
>
>
> I'm aware of the fact that AP adapters are not block devices. But
> as stated above I don't understand what is the big difference regarding
> the unbind operation.
>
>> Anyway, this is an administrative issue. If you don't have a clear
>> concept which devices are for host usage and which for guest usage, you
>> already have problems.
>
> I'm trying to understand the whole solution. I agree, this is an
> administrative issue. But the document is trying to address such
> administrative issues.
This section of the document is intended to describe how to provision AP
queues for dedicated guest usage and to show the relationship between the
various objects involved. While it describes administrative actions, it is
not intended to be an administrator's guide.
>
>>
>> Speaking of administrative issues, is there libvirt support for vfio-ap
>> under development? It would be helpful to validate the approach.
>
> I wholeheartedly agree. I guess Tony will have to answer this one too.
>
> Regards,
> Halil
On 07/03/2018 09:25 AM, Cornelia Huck wrote:
> On Tue, 3 Jul 2018 14:20:11 +0200
> Halil Pasic <[email protected]> wrote:
>
>> On 07/03/2018 01:52 PM, Cornelia Huck wrote:
>>> On Tue, 3 Jul 2018 11:22:10 +0200
>>> Halil Pasic <[email protected]> wrote:
>>>
>> [..]
>>>> Let me try to invoke the DASD analogy. If one for some reason wants to detach
>>>> a DASD the procedure to follow seems to be (see
>>>> https://www.ibm.com/support/knowledgecenter/en/linuxonibm/com.ibm.linux.z.lgdd/lgdd_t_dasd_online.html)
>>>> the following:
>>>> 1) Unmount.
>>>> 2) Offline possibly using safe_offline.
>>>> 3) Detach.
>>>>
>>>> Detaching a disk that is currently doing I/O asks for trouble, so the admin is encouraged
>>>> to make sure there is no pending I/O.
>>> I don't think we can use dasd (block devices) as a good analogy for
>>> every kind of device (for starters, consider network devices).
>>>
>> I did not use it for every kind of device. I used it for AP. I'm
>> under the impression you find the analogy inappropriate. If so, could
>> you please explain why?
> I don't think block devices (which are designed to be more or less
> permanently accessed, e.g. by mounting a file system) have the same
> semantics as ap devices (which exist as a backend for crypto requests).
> Not everything that makes sense for a block device makes sense for
> other devices as well, and I don't think it makes sense here.
>
>>>> In case of AP you can interpret my 'in use' as the queue is not empty. In my understanding
>>>> unbind is supposed to be hard (I used the word radical). That's why I compared it to pulling
>>>> a cable. So that's why I ask is there stuff the admin is supposed to do before doing the
>>>> unbind.
>>> Are you asking for a kind of 'quiescing' operation? I would hope that
>>> the crypto drivers already can deal with that via flushing the queue,
>>> not allowing new requests, or whatever. This is not the block device
>>> case.
>>>
>> The current implementation of vfio-ap which is a crypto driver too certainly
>> can not deal 'with that'. Whether the rest of the drivers can, I don't
>> know. Maybe Tony can tell.
> If the current implementation of vfio-ap cannot deal with it (by
> cleaning up, blocking, etc.), it at the very least needs to be documented
> so that it can be implemented later. I do not know what the SIE will or
> won't do to assist here (e.g., if you're removing it from some masks,
> the device will already be inaccessible to the guest). But the part you
> were referring to was talking about the existing host driver anyway,
> wasn't it?
I addressed this in the cover letter and included a comment in the remove
callback for the vfio_ap driver. The goal is to provide this in the next
patch series.
>
>> I'm aware of the fact that AP adapters are not block devices. But
>> as stated above I don't understand what is the big difference regarding
>> the unbind operation.
>>
>>> Anyway, this is an administrative issue. If you don't have a clear
>>> concept which devices are for host usage and which for guest usage, you
>>> already have problems.
>> I'm trying to understand the whole solution. I agree, this is an administrative
>> issue. But the document is trying to address such administrative issues.
> I'd assume "know which devices are for the host and which devices are
> for the guests" to be a given, no?
>
>>> Speaking of administrative issues, is there libvirt support for vfio-ap
>>> under development? It would be helpful to validate the approach.
>> I wholeheartedly agree. I guess Tony will have to answer this one too.
>>
>> Regards,
>> Halil
>>
On 07/03/2018 10:30 AM, Cornelia Huck wrote:
> On Tue, 3 Jul 2018 15:58:37 +0200
> Halil Pasic <[email protected]> wrote:
>
>> On 07/03/2018 03:25 PM, Cornelia Huck wrote:
>>> On Tue, 3 Jul 2018 14:20:11 +0200
>>> Halil Pasic <[email protected]> wrote:
>>>
>>>> On 07/03/2018 01:52 PM, Cornelia Huck wrote:
>>>>> On Tue, 3 Jul 2018 11:22:10 +0200
>>>>> Halil Pasic <[email protected]> wrote:
>>>>>
>>>> [..]
>>>>>> Let me try to invoke the DASD analogy. If one for some reason wants to detach
>>>>>> a DASD the procedure to follow seems to be (see
>>>>>> https://www.ibm.com/support/knowledgecenter/en/linuxonibm/com.ibm.linux.z.lgdd/lgdd_t_dasd_online.html)
>>>>>> the following:
>>>>>> 1) Unmount.
>>>>>> 2) Offline possibly using safe_offline.
>>>>>> 3) Detach.
>>>>>>
>>>>>> Detaching a disk that is currently doing I/O asks for trouble, so the admin is encouraged
>>>>>> to make sure there is no pending I/O.
>>>>> I don't think we can use dasd (block devices) as a good analogy for
>>>>> every kind of device (for starters, consider network devices).
>>>>>
>>>> I did not use it for every kind of device. I used it for AP. I'm
>>>> under the impression you find the analogy inappropriate. If so, could
>>>> you please explain why?
>>> I don't think block devices (which are designed to be more or less
>>> permanently accessed, e.g. by mounting a file system) have the same
>>> semantics as ap devices (which exist as a backend for crypto requests).
>>> Not everything that makes sense for a block device makes sense for
>>> other devices as well, and I don't think it makes sense here.
>>>
>> I'm still confused. If it's about frequency of access (as hinted
>> by block devices accessed more or less permanently) I'm not sure
>> there is a substantial difference. I guess there are scenarios where
>> the AP domain is used very seldom (e.g. protected keys --> most of
>> the crypto ops done by CPACF but AP unwraps at the beginning), but
>> there are such scenarios for block too.
>>
>> If it's about (persistent) state, I guess it again depends on the
>> scenario and on the type of the card. But I may be wrong.
> So, let's turn this around: Why do you think that dasd (and not qeth or
> whatever) is a good model for ap device unbinding? Because I really
> fail to get it... maybe the ap driver maintainers can chime in.
>
>>>>
>>>>>> In case of AP you can interpret my 'in use' as the queue is not empty. In my understanding
>>>>>> unbind is supposed to be hard (I used the word radical). That's why I compared it to pulling
>>>>>> a cable. So that's why I ask is there stuff the admin is supposed to do before doing the
>>>>>> unbind.
>>>>> Are you asking for a kind of 'quiescing' operation? I would hope that
>>>>> the crypto drivers already can deal with that via flushing the queue,
>>>>> not allowing new requests, or whatever. This is not the block device
>>>>> case.
>>>>>
>>>> The current implementation of vfio-ap which is a crypto driver too certainly
>>>> can not deal 'with that'. Whether the rest of the drivers can, I don't
>>>> know. Maybe Tony can tell.
>>> If the current implementation of vfio-ap cannot deal with it (by
>>> cleaning up, blocking, etc.), it at the very least needs to be documented
>>> so that it can be implemented later. I do not know what the SIE will or
>>> won't do to assist here (e.g., if you're removing it from some masks,
>>> the device will already be inaccessible to the guest). But the part you
>>> were referring to was talking about the existing host driver anyway,
>>> wasn't it?
>>>
>> I was thinking about both directions. Re-classifying a device from
>> pass-through to normal should also be possible. But the document only
>> talks about one direction.
> Presumably because it (rightfully) focuses on setting up vfio-ap?
This is a true statement. The doc is not intended to be a comprehensive
administration guide; it is intended to be more of a design spec. The
goal here is to show the relationship between the objects involved in
AP queue pass-through.
>
>> I'm not familiar with the existing host drivers. If we can say 'Hey,
>> unbind is perfectly safe at any time: no precautions need to be considered!'
>> I'm very happy with that. Although I would find it a bit surprising.
>>
>> I just wanted to make sure this is not something we forget.
>>
>>>> I'm aware of the fact that AP adapters are not block devices. But
>>>> as stated above I don't understand what is the big difference regarding
>>>> the unbind operation.
>>>>
>>>>> Anyway, this is an administrative issue. If you don't have a clear
>>>>> concept which devices are for host usage and which for guest usage, you
>>>>> already have problems.
>>>> I'm trying to understand the whole solution. I agree, this is an administrative
>>>> issue. But the document is trying to address such administrative issues.
>>> I'd assume "know which devices are for the host and which devices are
>>> for the guests" to be a given, no?
>>>
>> My other email scratches this topic. AFAIK we don't have a solution for
>> that yet. Nor we have a good understanding of how and to what extent
>> is statically given what is given. E.g. if one wants to re-partition my AP
>> resources (and at some point one will have to at least do the initial
>> re-partitioning) do I need a reboot for the changes to take effect? Or
>> is this 'known' variable during the uptime of an OS.
> I think that is really out of scope for this file, which I'd expect to
> explain how vfio-ap basically works and which incantations I need to
> give crypto devices to a guest. It should NOT focus on administrative
> tasks; this should either be delegated to the likes of libvirt or
> documented in a "how to use crypto cards with kvm" kind of technical
> writeup. If there's a limitation (e.g. you can't easily unbind again),
> write a line here.
On this we can agree.
>
On 07/03/2018 04:30 PM, Cornelia Huck wrote:
> On Tue, 3 Jul 2018 15:58:37 +0200
> Halil Pasic <[email protected]> wrote:
>
>> On 07/03/2018 03:25 PM, Cornelia Huck wrote:
>>> On Tue, 3 Jul 2018 14:20:11 +0200
>>> Halil Pasic <[email protected]> wrote:
>>>
>>>> On 07/03/2018 01:52 PM, Cornelia Huck wrote:
>>>>> On Tue, 3 Jul 2018 11:22:10 +0200
>>>>> Halil Pasic <[email protected]> wrote:
>>>>>
>>>> [..]
>>>>>>
>>>>>> Let me try to invoke the DASD analogy. If one for some reason wants to detach
>>>>>> a DASD the procedure to follow seems to be (see
>>>>>> https://www.ibm.com/support/knowledgecenter/en/linuxonibm/com.ibm.linux.z.lgdd/lgdd_t_dasd_online.html)
>>>>>> the following:
>>>>>> 1) Unmount.
>>>>>> 2) Offline possibly using safe_offline.
>>>>>> 3) Detach.
>>>>>>
>>>>>> Detaching a disk that is currently doing I/O asks for trouble, so the admin is encouraged
>>>>>> to make sure there is no pending I/O.
>>>>>
>>>>> I don't think we can use dasd (block devices) as a good analogy for
>>>>> every kind of device (for starters, consider network devices).
>>>>>
>>>>
>>>> I did not use it for every kind of device. I used it for AP. I'm
>>>> under the impression you find the analogy inappropriate. If so, could
>>>> you please explain why?
>>>
>>> I don't think block devices (which are designed to be more or less
>>> permanently accessed, e.g. by mounting a file system) have the same
>>> semantics as ap devices (which exist as a backend for crypto requests).
>>> Not everything that makes sense for a block device makes sense for
>>> other devices as well, and I don't think it makes sense here.
>>>
>>
>> I'm still confused. If it's about frequency of access (as hinted
>> by block devices accessed more or less permanently) I'm not sure
>> there is a substantial difference. I guess there are scenarios where
>> the AP domain is used very seldom (e.g. protected keys --> most of
>> the crypto ops done by CPACF but AP unwraps at the beginning), but
>> there are such scenarios for block too.
>>
>> If it's about (persistent) state, I guess it again depends on the
>> scenario and on the type of the card. But I may be wrong.
>
> So, let's turn this around: Why do you think that dasd (and not qeth or
> whatever) is a good model for ap device unbinding? Because I really
> fail to get it... maybe the ap driver maintainers can chime in.
>
Let's do it! But let me clarify one thing first: I never stated that
DASD is the only good model.
What speaks for DASD as a model for unbinding:
* DASD is currently the only device for which we have vfio-mdev
  passthrough on s390x.
* DASD is comparatively simple and familiar. I'm less confident talking
  about qeth or anything else than about DASD.
* DASD has persistent state. A NIC is much more stateless.
* DASD has offline and safe_offline. This demonstrates that the stock
  operation may trade 'safety' for other properties (e.g. a guarantee
  to terminate). Since the queue reset implemented by Tony has a limited
  wait built in, this seemed relevant.
* DASD can be seen as request-response with some local-ish stuff, as
  opposed to sending and receiving packets in a probably largish
  network. The idea of outstanding operations is easy to grasp.
* From the point of view of what upper-layer entities expect, a block
  device seems to be a better fit than a network interface. Fault
  recovery is less of a concern for an application that writes to a
  file than for an application that tries to talk to another
  application over the net. In my experience, connections break more
  often than disks or, I suppose, AP domains.
What is so wrong with asking the question: is unbind really all the
admin has to do?
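To make the 'offline' versus 'safe_offline' trade-off concrete, here is a toy model in Python. It is a hypothetical simulation, not the vfio_ap queue reset: `FakeQueue`, its `pending` counter and the `max_polls` bound are illustrative assumptions. The bounded variant always terminates but may give up with replies still outstanding; the 'safe' variant waits for the queue to drain and would hang if the queue never made progress, so the sketch refuses that case explicitly.

```python
from dataclasses import dataclass


@dataclass
class FakeQueue:
    """Hypothetical stand-in for an AP queue with outstanding replies."""
    pending: int         # replies still outstanding on the queue
    drain_per_poll: int  # replies that complete between two polls

    def poll(self) -> None:
        # Simulate some outstanding requests completing.
        self.pending = max(0, self.pending - self.drain_per_poll)


def reset_with_bounded_wait(q: FakeQueue, max_polls: int) -> bool:
    """'offline'-style reset: always terminates, may abandon work.

    Returns True if the queue drained within max_polls polls,
    False if it gave up with replies still outstanding.
    """
    for _ in range(max_polls):
        if q.pending == 0:
            return True
        q.poll()
    return q.pending == 0


def safe_reset(q: FakeQueue) -> bool:
    """'safe_offline'-style reset: waits until nothing is outstanding."""
    if q.pending > 0 and q.drain_per_poll <= 0:
        # A real safe operation would block forever here.
        raise RuntimeError("queue makes no progress; safe reset would hang")
    while q.pending > 0:
        q.poll()
    return True
```

For example, a queue with 10 outstanding replies that drains 3 per poll is still not empty after 2 polls, so the bounded reset reports failure, while 4 polls are enough.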
>>
>>>>
>>>>>> In case of AP you can interpret my 'in use' as the queue is not empty. In my understanding
>>>>>> unbind is supposed to be hard (I used the word radical). That's why I compared it to pulling
>>>>>> a cable. So that's why I ask is there stuff the admin is supposed to do before doing the
>>>>>> unbind.
>>>>>
>>>>> Are you asking for a kind of 'quiescing' operation? I would hope that
>>>>> the crypto drivers already can deal with that via flushing the queue,
>>>>> not allowing new requests, or whatever. This is not the block device
>>>>> case.
>>>>>
>>>>
>>>> The current implementation of vfio-ap which is a crypto driver too certainly
>>>> can not deal 'with that'. Whether the rest of the drivers can, I don't
>>>> know. Maybe Tony can tell.
>>>
>>> If the current implementation of vfio-ap cannot deal with it (by
>>> cleaning up, blocking, etc.), it at the very least needs to be documented
>>> so that it can be implemented later. I do not know what the SIE will or
>>> won't do to assist here (e.g., if you're removing it from some masks,
>>> the device will already be inaccessible to the guest). But the part you
>>> were referring to was talking about the existing host driver anyway,
>>> wasn't it?
>>>
>>
>> I was thinking about both directions. Re-classifying a device from
>> pass-through to normal should also be possible. But the document only
>> talks about one direction.
>
> Presumably because it (rightfully) focuses on setting up vfio-ap?
>
I'm afraid we have a misunderstanding here. I did not propose including
the other direction. Again, I'm reasoning about the solution.
>>
>> I'm not familiar with the existing host drivers. If we can say 'Hey,
>> unbind is perfectly safe at any time: no precautions need to be considered!'
>> I'm very happy with that. Although I would find it a bit surprising.
>>
>> I just wanted to make sure this is not something we forget.
>>
>>>>
>>>> I'm aware of the fact that AP adapters are not block devices. But
>>>> as stated above I don't understand what is the big difference regarding
>>>> the unbind operation.
>>>>
>>>>> Anyway, this is an administrative issue. If you don't have a clear
>>>>> concept which devices are for host usage and which for guest usage, you
>>>>> already have problems.
>>>>
>>>> I'm trying to understand the whole solution. I agree, this is an administrative
>>>> issue. But the document is trying to address such administrative issues.
>>>
>>> I'd assume "know which devices are for the host and which devices are
>>> for the guests" to be a given, no?
>>>
>>
>> My other email scratches this topic. AFAIK we don't have a solution for
>> that yet. Nor we have a good understanding of how and to what extent
>> is statically given what is given. E.g. if one wants to re-partition my AP
>> resources (and at some point one will have to at least do the initial
>> re-partitioning) do I need a reboot for the changes to take effect? Or
>> is this 'known' variable during the uptime of an OS.
>
> I think that is really out of scope for this file, which I'd expect to
> explain how vfio-ap basically works and which incantations I need to
> give crypto devices to a guest. It should NOT focus on administrative
> tasks; this should either be delegated to the likes of libvirt or
> documented in a "how to use crypto cards with kvm" kind of technical
> writeup. If there's a limitation (e.g. you can't easily unbind again),
> write a line here.
Again, the misunderstanding. I'm trying to understand the design, not to
put stuff in this document. I'm not aware of the existence of such a
"how to use crypto cards with kvm" writeup, nor have I seen libvirt
patches that take care of this. The stated purpose of this patch is
"This patch provides documentation describing the AP architecture and
design concepts behind the virtualization of AP devices". This was the
best place I could find to ask my question. My question was motivated
by my understanding of unbind as a *not inherently safe* operation, and
by not knowing what happens in that case.
On 07/02/2018 07:10 PM, Halil Pasic wrote:
>
>
> On 06/29/2018 11:11 PM, Tony Krowiak wrote:
>> This patch provides documentation describing the AP architecture and
>> design concepts behind the virtualization of AP devices. It also
>> includes an example of how to configure AP devices for exclusive
>> use of KVM guests.
>>
>> Signed-off-by: Tony Krowiak <[email protected]>
>
> I don't like the design of external interfaces except for:
> * cpu model features, and
> * reset handling.
>
> In particular:
>
> 1) The architecture is such that authorizing access (via APM, AQM and
> ADM) to an AP queue that is currently not configured (e.g. the card
> not physically plugged, or just configured off) is possible. That
> seems to be a perfectly normal use case.
>
> Your assign operations however enforce that the resource is bound to your
> driver, and thus the existence of the resource in the host.
>
> It is clear: we need to avoid passing through resources to guests that
> are not dedicated for this purpose (e.g. a queue utilized by zcrypt).
> But IMHO we need a different mechanism.
Interesting that you wait until v6 to bring this up. I agree, this is a
normal use case, but there is currently no mechanism in the AP bus for
drivers to reserve devices that are not yet configured. There is a
proposed solution in the works, but until it is available, the only
choice is to disallow assignment to a guest of AP queues that are not
bound to the vfio_ap device driver.
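A small Python sketch may help separate what the masks authorize from what the current assign rule requires. The masks select a cross product of adapters and domains (the example below reuses the cards 3 and 4, domains 1 and 2 from the cover letter). `matrix_apqns()`, `unbound_apqns()` and `assignment_allowed()` are hypothetical helpers, and the rule they model (every APQN in the matrix must be bound to vfio_ap) is an assumption paraphrasing this discussion, not the driver's code.

```python
from itertools import product


def matrix_apqns(adapters, domains):
    """APQNs authorized by an adapter selection and a domain selection:
    the cross product of the two sets, as (card, domain) tuples."""
    return set(product(adapters, domains))


def unbound_apqns(adapters, domains, bound_to_vfio_ap):
    """APQNs in the matrix that are not currently bound to vfio_ap.

    The architecture allows authorizing such queues (e.g. a card that is
    not plugged), but a rule that requires binding rejects them.
    """
    return matrix_apqns(adapters, domains) - set(bound_to_vfio_ap)


def assignment_allowed(adapters, domains, bound_to_vfio_ap):
    # Hypothetical restrictive rule: every authorized APQN must be bound.
    return not unbound_apqns(adapters, domains, bound_to_vfio_ap)
```

A reservation mechanism for not-yet-configured queues, as mentioned above, would amount to relaxing `assignment_allowed()` so that APQNs reserved but not present are also accepted.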
>
>
> 2) I see no benefit in deferring the exclusivity check to
> vfio_ap_mdev_open(). The downside is however pretty obvious: management
> software is notified about a 'bad configuration' only at an attempted
> guest start-up. And your current QEMU patches are not very helpful in
> conveying this piece of information.
It only becomes a 'bad configuration' if the two guests are started
concurrently. Is there value in being able to configure two mediated
devices with the same queue if the intent is never to run two guests
using those mediated devices simultaneously? If so, then the only time
the exclusivity check can be done is when the guest opens the mediated
device. If not, then we can certainly prevent multiple mediated devices
from being assigned the same queue.

In my view, although a mediated device is used by a guest, it is not
itself a guest and can be configured any way an administrator prefers.
If we get agreement that the exclusivity check should be done when an
adapter or domain is assigned to the mediated device, I'll make that
change.
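The two options differ only in when the overlap test runs. The Python sketch below is hypothetical: `MdevMatrix`, `shares_apqn()`, `check_on_assign()` and `check_on_open()` are illustrative names, not vfio_ap functions. Because each mediated device authorizes a cross product, two of them share an APQN exactly when their adapter sets intersect and their domain sets intersect, so the test never needs to enumerate APQNs.

```python
from dataclasses import dataclass, field


@dataclass
class MdevMatrix:
    """Hypothetical model of one mediated matrix device."""
    name: str
    adapters: set = field(default_factory=set)
    domains: set = field(default_factory=set)
    in_use: bool = False  # True while a guest has the mdev open


def shares_apqn(a: MdevMatrix, b: MdevMatrix) -> bool:
    # Cross products intersect iff both factor sets intersect.
    return bool(a.adapters & b.adapters) and bool(a.domains & b.domains)


def check_on_assign(mdevs, target, adapter=None, domain=None):
    """Policy 1: reject an assignment that would make `target` share an
    APQN with any other mdev, whether or not that mdev is in use."""
    trial = MdevMatrix(
        target.name,
        target.adapters | ({adapter} if adapter is not None else set()),
        target.domains | ({domain} if domain is not None else set()),
    )
    return not any(shares_apqn(trial, m) for m in mdevs if m is not target)


def check_on_open(mdevs, target):
    """Policy 2 (deferred to open time): reject only if another mdev that
    shares an APQN is currently in use by a guest."""
    return not any(shares_apqn(target, m) for m in mdevs
                   if m is not target and m.in_use)
```

Policy 1 reports a conflict while the configuration is being made; policy 2 accepts overlapping definitions and reports the conflict only when a second guest tries to start, which is exactly the trade-off under debate.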
>
>
> I've talked with Boris, and AFAIR he said this is not acceptable to
> him (@Boris can you confirm).
Then I suggest Boris participate in the review and explain why.
>
>
> 3) We indicate the reason for failure due to a configuration problem
> (exclusivity or resource allocation) via pr_err() that is via kernel
> messages. I don't think this is very tooling/management software
> friendly, and I hope we don't expect admins to work with the sysfs
> interface long term. I mean the effects of the admin actions are not
> very persistent. Thus if the interface is a painful one, we are
> talking about potentially frequent pain.
We have multiple layers of software, each with its own logging
facilities. Figuring out what went wrong when a guest fails to start is
always a painful process IMHO. Typically, one has to view the log for
each component in the stack to figure out what went wrong, and
oftentimes one still can't figure it out. Of course, we can help out
here by having QEMU put out a better message when this problem occurs.
But the bottom line is: does the community think that allowing an
administrator to configure multiple mediated devices with the same
queues has value? In other words, are there potential use cases that
would require this?
>
>
> 4) If I were to act out the role of the administrator, I would prefer
> to think of specifying or changing the access controls of a guest in
> respect to AP (that is setting the AP matrix) as a single atomic
> operation -- which either succeeds or fails.
I don't understand what you are describing here. How would this be done?
Are you suggesting the admin somehow provides the masks en masse?
>
>
> The operation should succeed for any valid configuration, and fail for
> any invalid one.
>
> The current piecemeal approach seems even less fitting if we consider
> changing the access controls of a running guest. AFAIK changing access
> controls for a running guest is possible, and I don't see a reason why
> we should artificially prohibit this.
Setting and clearing bits in the APM/AQM/ADM of a guest's CRYCB is
certainly possible, but there is a lot more to it than merely setting
and clearing bits. What you seem to be describing here is hot
plug/unplug, which I stated in the cover letter is forthcoming. It is
currently prohibited for good reason.
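One way to read the 'single atomic operation' request: validate the complete proposed matrix, then replace the old one in one step, so a failure leaves the previous configuration untouched. The sketch below illustrates only that all-or-nothing shape. `Matrix`, `ApMatrixConfig` and `ids_in_range()` are hypothetical, the 0..255 ID range in the example validator is an assumption for illustration, and the sketch deliberately ignores the hot plug/unplug work needed to apply a change to a running guest.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Matrix:
    """Hypothetical immutable snapshot of a guest's AP matrix."""
    adapters: frozenset = frozenset()
    domains: frozenset = frozenset()
    control_domains: frozenset = frozenset()


class ApMatrixConfig:
    """Hypothetical holder for one mdev's matrix, updated all-or-nothing."""

    def __init__(self, validate):
        self._matrix = Matrix()
        self._validate = validate  # returns an error message or None

    @property
    def matrix(self) -> Matrix:
        return self._matrix

    def set_matrix(self, proposed: Matrix) -> None:
        """Validate the complete matrix first, then swap it in with a
        single assignment; on error the old matrix is kept."""
        error = self._validate(proposed)
        if error is not None:
            raise ValueError(error)
        self._matrix = proposed


def ids_in_range(m: Matrix, limit: int = 256):
    """Example validator; the 0..limit-1 range is an assumption."""
    for ids in (m.adapters, m.domains, m.control_domains):
        if any(not 0 <= i < limit for i in ids):
            return "ID out of range"
    return None
```

An exclusivity check against other mediated devices would simply be another validator run on the whole proposed matrix.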
>
>
> I think the current sysfs interface for manipulating the matrix is
> good for manual playing around, but I would prefer having an interface
> that is better suited for programs (e.g. ioctl).
That wouldn't be a problem, but do we have a use case for it?
>
>
> Regards,
> Halil
On 03.07.2018 16:56, Tony Krowiak wrote:
> On 07/03/2018 03:46 AM, Harald Freudenberger wrote:
>> On 02.07.2018 18:28, Halil Pasic wrote:
>>>
>>> On 06/29/2018 11:11 PM, Tony Krowiak wrote:
>>>> This patch provides documentation describing the AP architecture and
>>>> design concepts behind the virtualization of AP devices. It also
>>>> includes an example of how to configure AP devices for exclusive
>>>> use of KVM guests.
>>>>
>>>> Signed-off-by: Tony Krowiak <[email protected]>
>>>> ---
>>> [..]
>>>> +
>>>> +Reserve APQNs for exclusive use of KVM guests
>>>> +---------------------------------------------
>>>> +The following block diagram illustrates the mechanism by which APQNs are
>>>> +reserved:
>>>> +
>>>> + +------------------+
>>>> + remove | | unbind
>>>> + +------------------->+ cex4queue driver +<-----------+
>>>> + | | | |
>>>> + | +------------------+ |
>>>> + | |
>>>> + | |
>>>> + | |
>>>> ++--------+---------+ register +------------------+ +-----+------+
>>>> +| +<---------+ | bind | |
>>>> +| ap_bus | | vfio_ap driver +<-----+ admin |
>>>> +| +--------->+ | | |
>>>> ++------------------+ probe +---+--------+-----+ +------------+
>>>> + | |
>>>> + create | | store APQN
>>>> + | |
>>>> + v v
>>>> + +---+--------+-----+
>>>> + | |
>>>> + | matrix device |
>>>> + | |
>>>> + +------------------+
>>>> +
>>>> +The process for reserving an AP queue for use by a KVM guest is:
>>>> +
>>>> +* The vfio-ap driver during its initialization will perform the following:
>>>> + * Create the 'vfio_ap' root device - /sys/devices/vfio_ap
>>>> + * Create the 'matrix' device in the 'vfio_ap' root
>>>> + * Register the matrix device with the device core
>>>> +* Register with the ap_bus for AP queue devices of type 10 devices (CEX4 and
>>>> + newer) and to provide the vfio_ap driver's probe and remove callback
>>>> + interfaces. The reason why older devices are not supported is because there
>>>> + are no systems available on which to test.
>>>> +* The admin unbinds queue cc.qqqq from the cex4queue device driver. This results
>>>> + in the ap_bus calling the device driver's remove interface which
>>>> + unbinds the cc.qqqq queue device from the driver.
>>> What if the queue cc.qqqq is already in use? AFAIU unbind is almost as radical as
>>> pulling a cable. What is the proper procedure an admin should follow before doing
>>> the unbind?
>> What do you mean on this level with 'in use'? An unbind destroys the association
>> between device and driver. There is no awareness of 'in use' or 'not in use' on this
>> level. This is a hard unbind.
>
> According to my reading of the code, the remove callback for the AP queue drivers
> flushes the queue before it is disconnected from the driver. Do you concur Harald?
Yes, you are right. I checked this. The unbind triggers
zcrypt_cex4_queue_remove(), which calls ap_queue_remove(), which calls
ap_flush_queue(). The ap_flush_queue() function does the following:
- All requests that are queued are 'received' with -EAGAIN, and thus
  the zcrypt API tries to reschedule these requests on another APQN.
- All requests that have been sent to the AP but have no answer yet are
  also 'received' with -EAGAIN, and the zcrypt API tries to reschedule
  these requests. [Well, this may in the end lead to some requests being
  sent twice...]
Looks like the unbind is handled in a smooth way :-)
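The flush behaviour described above can be modelled in a few lines of Python. This is a hypothetical simulation of the described semantics, not zcrypt code: `Request`, `flush_queue()` and `reschedule()` are illustrative names, and the round-robin choice of the replacement APQN is an arbitrary assumption. It shows both classes of request being completed with -EAGAIN and retried on another APQN, and why a request that had already been sent may end up executed twice.

```python
import errno
from dataclasses import dataclass, field


@dataclass
class Request:
    """Hypothetical crypto request as seen by the queue."""
    rid: int
    sent: bool = False                            # already handed to the card?
    attempts: list = field(default_factory=list)  # APQNs tried, in order


def flush_queue(pending):
    """Complete every pending request with -EAGAIN, whether it was still
    queued or already sent but unanswered."""
    return [(req, -errno.EAGAIN) for req in pending]


def reschedule(results, other_apqns):
    """Hypothetical retry step: move each -EAGAIN request to another APQN.

    Returns the IDs of requests that had already been sent, i.e. the
    ones that may end up being executed twice."""
    maybe_twice = []
    for i, (req, rc) in enumerate(results):
        if rc != -errno.EAGAIN:
            continue
        if req.sent:
            maybe_twice.append(req.rid)
        req.attempts.append(other_apqns[i % len(other_apqns)])
    return maybe_twice
```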
>
>>>> +* The admin binds the cc.qqqq queue to the vfio_ap device driver. This results
>>>> + in the ap_bus calling the device vfio_ap driver's probe interface to bind
>>>> + queue cc.qqqq to the driver. The vfio_ap device driver will store the APQN for
>>>> + the queue in the matrix device
>>>> +
>>> [..]
>>>
>
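The unbind/bind step in the quoted documentation boils down to two writes to driver-core sysfs attributes. The sketch below only builds the list of writes and does not touch sysfs; the paths `/sys/bus/ap/drivers/cex4queue/unbind` and `/sys/bus/ap/drivers/vfio_ap/bind`, and the helper names `apqn_name()` and `reserve_plan()`, are assumptions for illustration. Queue device names use the cc.qqqq hexadecimal form from the document.

```python
from pathlib import PurePosixPath

AP_DRIVERS = PurePosixPath("/sys/bus/ap/drivers")  # assumed sysfs layout


def apqn_name(card: int, domain: int) -> str:
    """Format an APQN as a queue device name: cc.qqqq in hex."""
    return f"{card:02x}.{domain:04x}"


def reserve_plan(card: int, domain: int, host_driver: str = "cex4queue"):
    """Return, without executing them, the (path, value) writes that
    move one queue from the host crypto driver to vfio_ap."""
    name = apqn_name(card, domain)
    return [
        # Unbind: the AP bus calls the host driver's remove callback,
        # which (per the discussion above) flushes the queue.
        (str(AP_DRIVERS / host_driver / "unbind"), name),
        # Bind: the AP bus calls vfio_ap's probe callback, which records
        # the APQN in the matrix device.
        (str(AP_DRIVERS / "vfio_ap" / "bind"), name),
    ]
```

Keeping the plan separate from its execution makes it easy to review, or to log, exactly which queues an admin script would take away from the host.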
On 07/03/2018 06:36 PM, Tony Krowiak wrote:
> On 07/02/2018 07:10 PM, Halil Pasic wrote:
>>
>>
>> On 06/29/2018 11:11 PM, Tony Krowiak wrote:
>>> This patch provides documentation describing the AP architecture and
>>> design concepts behind the virtualization of AP devices. It also
>>> includes an example of how to configure AP devices for exclusive
>>> use of KVM guests.
>>>
>>> Signed-off-by: Tony Krowiak <[email protected]>
>>
>> I don't like the design of external interfaces except for:
>> * cpu model features, and
>> * reset handling.
>>
>> In particular:
>>
>> 1) The architecture is such that authorizing access (via APM, AQM and
>> ADM) to an AP queue that is currently not configured (e.g. the card
>> not physically plugged, or just configured off) is possible. That
>> seems to be a perfectly normal use case.
>>
>> Your assign operations however enforce that the resource is bound to your
>> driver, and thus the existence of the resource in the host.
>>
>> It is clear: we need to avoid passing through resources to guests that
>> are not dedicated for this purpose (e.g. a queue utilized by zcrypt).
>> But IMHO we need a different mechanism.
>
> Interesting that you wait until v6 to bring this up. I agree, this is a
> normal use case, but there is currently no mechanism in the AP bus for
> drivers to reserve devices that are not yet configured. There is a
> proposed solution in the works, but until it is available, the only
> choice is to disallow assignment to a guest of AP queues that are not
> bound to the vfio_ap device driver.
>
>>
>>
>> 2) I see no benefit in deferring the exclusivity check to
>> vfio_ap_mdev_open(). The downside is however pretty obvious:
>> management software is notified about a 'bad configuration' only at
>> an attempted guest start-up. And your current QEMU patches are not
>> very helpful in conveying this piece of information.
>
> It only becomes a 'bad configuration' if the two guests are started
> concurrently. Is there value in being able to configure two mediated
> devices with the same queue if the intent is never to run two guests
> using those mediated devices simultaneously? If so, then the only time
> the exclusivity check can be done is when the guest opens the mediated
> device. If not, then we can certainly prevent multiple mediated devices
> from being assigned the same queue.
>
> In my view, although a mediated device is used by a guest, it is not
> itself a guest and can be configured any way an administrator prefers.
> If we get agreement that the exclusivity check should be done when an
> adapter or domain is assigned to the mediated device, I'll make that
> change.
>
>>
>>
>> I've talked with Boris, and AFAIR he said this is not acceptable to
>> him (@Boris can you confirm).
>
> Then I suggest Boris participate in the review and explain why.
[To make things a bit easier, I am not going to address the aspect of
not-currently-existing host resources.]
Your current implementation does provide active configurations that work
with existing host resources. These need to be bound to the vfio_ap driver.
Libvirt allows one to define objects (e.g. domains or networks). These are
just definitions and do NOT bind any resources. The defined resources
are bound once the definition is started.
Currently I am assuming that an AP matrix device is defined in libvirt
outside of a libvirt domain (an ap definition). The mediated device of
the AP matrix device is used in a libvirt domain by referencing it via
its UUID.
When a libvirt domain is started, the mediated device should exist and be
configured correctly, like every other host resource.
Therefore, there needs to be something new in libvirt that allows one to
define, start, stop and undefine an AP matrix device. After a define, the
ap definition for an AP matrix device would exist in libvirt only.
Once you start the ap definition, the result should be a well-configured,
ready-to-use mediated device representing the ap definition, which
can be used by a libvirt domain without configuration errors. Please note
that the start of an ap definition is independent of the start of a
libvirt domain using the ap definition.
Can you explain to me how that can be accomplished?
>>
>>
>> 3) We indicate the reason for failure due to a configuration problem
>> (exclusivity or resource allocation) via pr_err(), that is, via kernel
>> messages. I don't think this is very tooling/management software
>> friendly, and I hope we don't expect admins to work with the sysfs
>> interface long term. I mean the effects of the admin actions are not
>> very persistent. Thus if the interface is a painful one, we are
>> talking about potentially frequent pain.
>
> We have multiple layers of software, each with its own logging
> facilities. Figuring out what went wrong when a guest fails to start is
> always a painful process IMHO. Typically, one has to view the log for
> each component in the stack to figure out what went wrong and
> oftentimes still can't figure it out. Of course, we can help out here
> by having QEMU put out a better message when this problem occurs. But
> the bottom line is: does the community think that allowing an
> administrator to configure multiple mediated devices with the same
> queues has value? In other words, are there potential use cases that
> would require this?
>
>>
>>
>> 4) If I were to act out the role of the administrator, I would prefer
>> to think of specifying or changing the access controls of a guest in
>> respect to AP (that is, setting the AP matrix) as a single atomic
>> operation -- which either succeeds or fails.
>
> I don't understand what you are describing here. How would this be done?
> Are you suggesting the admin somehow provides the masks en masse?
>
>>
>>
>> The operation should succeed for any valid configuration, and fail for
>> any invalid one.
>>
>> The current piecemeal approach seems even less fitting if we consider
>> changing the access controls of a running guest. AFAIK changing access
>> controls for a running guest is possible, and I don't see a reason why
>> we should artificially prohibit this.
>
> Setting and clearing bits in the APM/AQM/ADM of a guest's CRYCB is
> certainly possible, but there is a lot more to it than merely setting
> and clearing bits. What you seem to be describing here is hot
> plug/unplug, which I stated in the cover letter is forthcoming. It is
> currently prohibited for good reason.
>
>>
>>
>> I think the current sysfs interface for manipulating the matrix is
>> good for manual playing around, but I would prefer having an interface
>> that is better suited for programs (e.g. ioctl).
>
> That wouldn't be a problem, but do we have a use case for it?
>
>>
>>
>> Regards,
>> Halil
>
>
--
Mit freundlichen Grüßen/Kind regards
Boris Fiuczynski
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martina Köderitz
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294
On 07/04/2018 12:31 PM, Boris Fiuczynski wrote:
> On 07/03/2018 06:36 PM, Tony Krowiak wrote:
>> On 07/02/2018 07:10 PM, Halil Pasic wrote:
>>>
>>>
>>> On 06/29/2018 11:11 PM, Tony Krowiak wrote:
>>>> This patch provides documentation describing the AP architecture and
>>>> design concepts behind the virtualization of AP devices. It also
>>>> includes an example of how to configure AP devices for exclusive
>>>> use of KVM guests.
>>>>
>>>> Signed-off-by: Tony Krowiak <[email protected]>
>>>
>>> I don't like the design of external interfaces except for:
>>> * cpu model features, and
>>> * reset handling.
>>>
>>> In particular:
>>>
>>> 1) The architecture allows authorizing access (via APM, AQM and ADM)
>>> to an AP queue that is currently not configured (e.g. the card is not
>>> physically plugged, or just configured off). That seems to be a
>>> perfectly normal use case.
>>>
>>> Your assign operations, however, enforce that the resource is bound to
>>> your driver, and thus the existence of the resource in the host.
>>>
>>> It is clear: we need to avoid passing through resources to guests
>>> that are not dedicated for this purpose (e.g. a queue utilized by
>>> zcrypt). But IMHO we need a different mechanism.
>>
>> Interesting that you wait until v6 to bring this up. I agree, this is
>> a normal use case, but there is currently no mechanism in the AP bus
>> for drivers to reserve devices that are not yet configured. There is a
>> proposed solution in the works, but until it is available the only
>> choice is to disallow assignment of AP queues to a guest that are not
>> bound to the vfio_ap device driver.
>>
>>>
>>>
>>> 2) I see no benefit in deferring the exclusivity check to
>>> vfio_ap_mdev_open(). The downside is however pretty obvious:
>>> management software is notified about a 'bad configuration' only at
>>> an attempted guest start-up. And your current QEMU patches are not
>>> very helpful in conveying this piece of information.
>>
>> It only becomes a 'bad configuration' if the two guests are started
>> concurrently. Is there value in being able to configure two mediated
>> devices with the same queue if the intent is to never run two guests
>> using those mediated devices simultaneously? If so, then the only time
>> the exclusivity check can be done is when the guest opens the mediated
>> device. If not, then we can certainly prevent multiple mediated devices
>> from being assigned the same queue.
>>
>> In my view, until a mediated device is used by a guest, it is not part
>> of a guest configuration and can be configured any way an administrator
>> prefers. If we get concurrence that the exclusivity check should be done
>> when an adapter or domain is assigned to the mediated device, I'll make
>> that change.
>>
>>>
>>>
>>> I've talked with Boris, and AFAIR he said this is not acceptable to
>>> him (@Boris can you confirm).
>>
>> Then I suggest Boris participate in the review and explain why.
>
> [To make things a bit easier I am not going to address the aspect of
> not-currently-existing host resources.]
> Your current implementation does provide active configurations that
> work with existing host resources. These need to be bound to the
> vfio_ap driver.
> Libvirt allows one to define objects (e.g. domains or networks). These
> are just definitions and do NOT bind any resources. The defined
> resources are bound once the definition is started.
> Currently I am assuming that an ap matrix device is defined in libvirt
> outside of a libvirt domain (an ap definition). The mediated device of
> the ap matrix device is used in a libvirt domain by referencing it via
> its UUID.
> When a libvirt domain is started, the mediated device should exist and
> be configured correctly, like every other host resource.
> Therefore there needs to be something new in libvirt that allows one
> to define, start, stop and undefine an ap matrix device. After a
> define, the ap definition for an ap matrix device would exist in
> libvirt only.
> Once you start the ap definition, the result should be a
> well-configured, ready-to-use mediated device representing the ap
> definition, which can be used configuration-error free by a libvirt
> domain. Please note that the start of an ap definition is independent
> from the start of a libvirt domain using the ap definition.
> Can you explain to me how that can be accomplished?
I can make a similar case for the mediated devices. Mediated devices play
no role in guest configuration until a vfio-ap device is specified on the
QEMU command line when starting a guest. In other words, a mediated device
configuration is independent from the start of a guest using the mediated
device. To answer your question, then: if there are two or more mediated
devices with the same APQN(s) assigned, start only one libvirt domain that
uses one of these mediated devices. This raises the question: Does libvirt
preclude one from defining a domain that uses a host device (of any kind)
that must be dedicated to a single guest? If not, then isn't it incumbent
upon the administrator to ensure he doesn't start two guests with the same
dedicated host device? Wouldn't that same logic apply to AP devices?
Having said that, I have no problem disallowing assignment of an AP queue
to more than one mediated device. However, suppose an administrator - for
whatever reason - wants to create multiple mediated devices with the same
APQN(s) assigned, but never intends to run more than one guest using one
of those mediated devices concurrently. The question is - as I have asked
in another response - is there a use case for allowing an administrator
to configure multiple mediated devices with the same APQN assigned?
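For illustration only, and independent of whether the check ends up at assignment time or in vfio_ap_mdev_open(): because the queues granted by a matrix are the cross product of its adapters and domains, two mediated devices share an APQN exactly when their APMs intersect and their AQMs intersect. A minimal userspace sketch, assuming 256-bit masks as in struct ap_matrix with MSB0 bit numbering; struct matrix_masks, set_id() and matrices_share_apqn() are hypothetical names, not part of the patch series.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of the APM/AQM in struct ap_matrix. */
#define AP_MASK_BITS 256
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define AP_MASK_WORDS (AP_MASK_BITS / BITS_PER_WORD)

struct matrix_masks {
	unsigned long apm[AP_MASK_WORDS]; /* adapters */
	unsigned long aqm[AP_MASK_WORDS]; /* usage domains */
};

/* Set ID @id using MSB0 numbering (ID 0 = most significant bit of word 0). */
static void set_id(unsigned long *mask, unsigned int id)
{
	assert(id < AP_MASK_BITS);
	mask[id / BITS_PER_WORD] |= 1UL << (BITS_PER_WORD - 1 - id % BITS_PER_WORD);
}

static bool masks_intersect(const unsigned long *a, const unsigned long *b)
{
	for (size_t i = 0; i < AP_MASK_WORDS; i++)
		if (a[i] & b[i])
			return true;
	return false;
}

/*
 * A matrix grants the APQNs APM x AQM, and (A1 x Q1) overlaps (A2 x Q2)
 * iff A1 overlaps A2 and Q1 overlaps Q2.
 */
static bool matrices_share_apqn(const struct matrix_masks *m1,
				const struct matrix_masks *m2)
{
	return masks_intersect(m1->apm, m2->apm) &&
	       masks_intersect(m1->aqm, m2->aqm);
}
```

With the cover letter's numbers, a device holding card 4 and domains 1 and 2 conflicts with one holding card 4 and domain 2 (both grant queue (4,2)), but not with one holding card 3 and domains 1 and 2.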
>
>>>
>>>
>>> 3) We indicate the reason for failure due to a configuration problem
>>> (exclusivity or resource allocation) via pr_err(), that is, via
>>> kernel messages. I don't think this is very tooling/management
>>> software friendly, and I hope we don't expect admins to work with the
>>> sysfs interface long term. I mean the effects of the admin actions
>>> are not very persistent. Thus if the interface is a painful one, we
>>> are talking about potentially frequent pain.
>>
>> We have multiple layers of software, each with its own logging
>> facilities. Figuring out what went wrong when a guest fails to start
>> is always a painful process IMHO. Typically, one has to view the log
>> for each component in the stack to figure out what went wrong and
>> oftentimes still can't figure it out. Of course, we can help out here
>> by having QEMU put out a better message when this problem occurs. But
>> the bottom line is: does the community think that allowing an
>> administrator to configure multiple mediated devices with the same
>> queues has value? In other words, are there potential use cases that
>> would require this?
>>
>>>
>>>
>>> 4) If I were to act out the role of the administrator, I would
>>> prefer to think of specifying or changing the access controls of a
>>> guest in respect to AP (that is, setting the AP matrix) as a single
>>> atomic operation -- which either succeeds or fails.
>>
>> I don't understand what you are describing here. How would this be
>> done? Are you suggesting the admin somehow provides the masks en masse?
>>
>>>
>>>
>>> The operation should succeed for any valid configuration, and fail
>>> for any invalid one.
>>>
>>> The current piecemeal approach seems even less fitting if we
>>> consider changing the access controls of a running guest. AFAIK
>>> changing access controls for a running guest is possible, and I
>>> don't see a reason why we should artificially prohibit this.
>>
>> Setting and clearing bits in the APM/AQM/ADM of a guest's CRYCB is
>> certainly possible, but there is a lot more to it than merely setting
>> and clearing bits. What you seem to be describing here is hot
>> plug/unplug, which I stated in the cover letter is forthcoming. It is
>> currently prohibited for good reason.
>>
>>>
>>>
>>> I think the current sysfs interface for manipulating the matrix is
>>> good for manual playing around, but I would prefer having an
>>> interface that is better suited for programs (e.g. ioctl).
>>
>> That wouldn't be a problem, but do we have a use case for it?
>>
>>>
>>>
>>> Regards,
>>> Halil
>>
>>
>
>
On 06/29/2018 11:11 PM, Tony Krowiak wrote:
> From: Tony Krowiak <[email protected]>
>
> Introduces a new structure for storing the AP matrix configured
> for the mediated matrix device via its sysfs attributes files.
>
> Signed-off-by: Tony Krowiak <[email protected]>
> ---
> drivers/s390/crypto/vfio_ap_ops.c | 12 ++++++++++++
> drivers/s390/crypto/vfio_ap_private.h | 24 ++++++++++++++++++++++++
> 2 files changed, 36 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 4e61e33..bf7ed9f 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -20,6 +20,17 @@
> DEFINE_SPINLOCK(mdev_list_lock);
> LIST_HEAD(mdev_list);
>
> +static void vfio_ap_matrix_init(struct ap_matrix *matrix)
> +{
> + /* Test if PQAP(QCI) instruction is available */
> + if (test_facility(12))
> + ap_qci(&matrix->info);
> +
> + matrix->apm_max = matrix->info.apxa ? matrix->info.Na : 63;
> + matrix->aqm_max = matrix->info.apxa ? matrix->info.Nd : 15;
> + matrix->adm_max = matrix->info.apxa ? matrix->info.Nd : 15;
> +}
> +
> static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
> {
> struct ap_matrix_dev *matrix_dev =
> @@ -31,6 +42,7 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
> return -ENOMEM;
>
> matrix_mdev->name = dev_name(mdev_dev(mdev));
> + vfio_ap_matrix_init(&matrix_mdev->matrix);
> mdev_set_drvdata(mdev, matrix_mdev);
>
> if (atomic_dec_if_positive(&matrix_dev->available_instances) < 0) {
> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
> index 3de1275..ae771f5 100644
> --- a/drivers/s390/crypto/vfio_ap_private.h
> +++ b/drivers/s390/crypto/vfio_ap_private.h
> @@ -29,9 +29,33 @@ struct ap_matrix_dev {
> atomic_t available_instances;
> };
>
> +/**
> + * The AP matrix is comprised of three bit masks identifying the adapters,
> + * queues (domains) and control domains that belong to an AP matrix. The bits in
> + * each mask, from most significant to least significant bit, correspond to IDs
> + * 0 to 255. When a bit is set, the corresponding ID belongs to the matrix.
> + *
> + * @apm: identifies the AP adapters in the matrix
> + * @apm_max: max adapter number in @apm
> + * @aqm: identifies the AP queues (domains) in the matrix
> + * @aqm_max: max domain number in @aqm
> + * @adm: identifies the AP control domains in the matrix
> + * @adm_max: max domain number in @adm
> + */
> +struct ap_matrix {
> + unsigned long apm_max;
> + DECLARE_BITMAP(apm, 256);
> + unsigned long aqm_max;
> + DECLARE_BITMAP(aqm, 256);
> + unsigned long adm_max;
> + DECLARE_BITMAP(adm, 256);
> + struct ap_config_info info;
Why do we maintain (and populate by doing a QCI) the info member on a
per mdev device basis?
> +};
> +
> struct ap_matrix_mdev {
> const char *name;
> struct list_head list;
> + struct ap_matrix matrix;
> };
>
> static struct ap_matrix_dev *to_ap_matrix_dev(struct device *dev)
>
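Regarding the bit order in the struct ap_matrix comment: the masks are manipulated with set_bit_inv()/test_bit_inv(), which use MSB0 numbering (bit nr maps to nr ^ (BITS_PER_LONG - 1)), so ID 0 is the most significant bit of the first word. A minimal userspace sketch of that mapping and of the maxima chosen in vfio_ap_matrix_init(); struct qci_info and the helper names are hypothetical stand-ins, and only the apxa/Na/Nd selection and the 63/15 defaults come from the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define AP_MASK_BITS 256

/* Userspace model of s390's set_bit_inv(): flip the bit index within the word. */
static void set_id_inv(unsigned long *mask, unsigned long nr)
{
	unsigned long bit = nr ^ (BITS_PER_WORD - 1);

	assert(nr < AP_MASK_BITS);
	mask[bit / BITS_PER_WORD] |= 1UL << (bit % BITS_PER_WORD);
}

/* Userspace model of s390's test_bit_inv(). */
static bool test_id_inv(const unsigned long *mask, unsigned long nr)
{
	unsigned long bit = nr ^ (BITS_PER_WORD - 1);

	assert(nr < AP_MASK_BITS);
	return (mask[bit / BITS_PER_WORD] >> (bit % BITS_PER_WORD)) & 1UL;
}

/* Hypothetical subset of the QCI data used by vfio_ap_matrix_init(). */
struct qci_info {
	bool apxa;        /* AP extended addressing installed */
	unsigned char na; /* highest adapter number (used only if apxa) */
	unsigned char nd; /* highest domain number (used only if apxa) */
};

/* Same selection as the patch: QCI values with APXA, else 63 and 15. */
static unsigned long max_apid(const struct qci_info *info)
{
	return info->apxa ? info->na : 63;
}

static unsigned long max_apqi(const struct qci_info *info)
{
	return info->apxa ? info->nd : 15;
}
```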
On 03/07/2018 10:10, Harald Freudenberger wrote:
> On 29.06.2018 23:11, Tony Krowiak wrote:
...snip...
> +The process for reserving an AP queue for use by a KVM guest is:
> +
> +* The vfio-ap driver during its initialization will perform the following:
> + * Create the 'vfio_ap' root device - /sys/devices/vfio_ap
> + * Create the 'matrix' device in the 'vfio_ap' root
> + * Register the matrix device with the device core
> +* Register with the ap_bus for AP queue devices of type 10 devices (CEX4 and
> + newer) and to provide the vfio_ap driver's probe and remove callback
> + interfaces. The reason why older devices are not supported is because there
> + are no systems available on which to test.
> This is simply not true. The reason is that this is a design decision. The
> older cards are simply somewhat more complicated and we don't want to
> add even more complexity to the ap virtualization implementation.
> We also said several times that APXA is a requirement, not a feature.
I understand your point of view as maintainer of the cryptographic driver,
but I do not see the point concerning virtualization.
The SIE works fine without APXA.
Is there any reason to add a restriction here?
If there is a good reason, then the problem should be handled when
detecting the presence of APXA. AFAIR we do not do this.
>> +* The admin unbinds queue cc.qqqq from the cex4queue device driver. This results
>> + in the ap_bus calling the device driver's remove interface which
>> + unbinds the cc.qqqq queue device from the driver.
>> +* The admin binds the cc.qqqq queue to the vfio_ap device driver. This results
>> + in the ap_bus calling the vfio_ap device driver's probe interface to bind
>> + queue cc.qqqq to the driver. The vfio_ap device driver will store the APQN for
>> + the queue in the matrix device.
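The unbind/bind steps quoted above correspond to writes to the driver core's generic unbind and bind attributes. A hedged userspace sketch, assuming the AP bus exposes them as /sys/bus/ap/drivers/<driver>/{unbind,bind} and that queue devices are named in the cc.qqqq form used in the examples (e.g. 05.0047); apqn_name() and ap_driver_attr_write() are hypothetical helpers. Note Harald's caveat in this thread that returning a queue to its default driver is not simply the reverse bind.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format an AP queue device name in the cc.qqqq form, e.g. "05.0047". */
static int apqn_name(char *buf, size_t len, unsigned int card,
		     unsigned int domain)
{
	assert(card <= 0xff && domain <= 0xff);
	return snprintf(buf, len, "%02x.%04x", card, domain);
}

/* Write @qname to /sys/bus/ap/drivers/@driver/@attr (assumed layout). */
static int ap_driver_attr_write(const char *driver, const char *attr,
				const char *qname)
{
	char path[256];
	size_t n = strlen(qname);
	FILE *f;
	int rc = 0;

	if (snprintf(path, sizeof(path), "/sys/bus/ap/drivers/%s/%s",
		     driver, attr) >= (int)sizeof(path))
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fwrite(qname, 1, n, f) != n)
		rc = -1;
	/* The buffered write reaches sysfs at fclose(), so a rejected
	 * bind/unbind is reported here. */
	if (fclose(f) != 0)
		rc = -1;
	return rc;
}
```

Under these assumptions, reserving queue 05.0047 would be apqn_name(name, sizeof(name), 0x05, 0x47), then ap_driver_attr_write("cex4queue", "unbind", name), then ap_driver_attr_write("vfio_ap", "bind", name), run as root.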
...snip...
>> +
>> +Guest2
>> +------
>> +CARD.DOMAIN TYPE MODE
>> +------------------------------
>> +05 CEX5A Accelerator
>> +05.0047 CEX5A Accelerator
>> +05.00ff CEX5A Accelerator
> Btw: this is an excellent example of thinking beyond the current design.
> We don't want to dedicate Accelerators to guests. Accelerators should be
> shared, CCA and EP11 Coprocessors should be dedicated. So maybe
> change the example to use EP11 and CCA Coprocessors .... and think
> about how shared Accelerators could be handled.
Shouldn't this problem be left to the administrator?
Using the SIE for virtualization is independent of the kind of
card.
Why, again (see above), should we take the type of card into account
at this level?
>> +
>> +These are the steps:
...snip...
>> + echo 1 > remove
>> +
>> + This will remove all of the mdev matrix device's sysfs structures. To
>> + recreate and reconfigure the mdev matrix device, all of the steps starting
>> + with step 4 will have to be performed again.
>> +
>> + It is not necessary to remove an mdev matrix device, but one may want to
>> + remove it if no guest will use it during the lifetime of the Linux host. If
>> + the mdev matrix device is removed, one may want to unbind the AP queues the
>> + guest was using from the vfio_ap device driver and bind them back to the
>> + default driver. Alternatively, the AP queues can be configured for another
> Please note: you can't just 'bind them back to the default driver'. You need
> to unbind and then call dev_reprobe() which triggers the default way of
> assigning a driver to a device and give the ap bus a chance to handle this.
Are you saying that the administrator cannot unbind an AP device
and bind it to another AP driver?
I am surprised. Can you explain?
Best regards,
Pierre
--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany
On 03/07/2018 01:10, Halil Pasic wrote:
>
>
> On 06/29/2018 11:11 PM, Tony Krowiak wrote:
>> This patch provides documentation describing the AP architecture and
>> design concepts behind the virtualization of AP devices. It also
>> includes an example of how to configure AP devices for exclusive
>> use of KVM guests.
>>
>> Signed-off-by: Tony Krowiak <[email protected]>
>
> I don't like the design of external interfaces except for:
> * cpu model features, and
> * reset handling.
>
> In particular:
>
>
...snip...
> 4) If I were to act out the role of the administrator, I would prefer
> to think of specifying or changing the access controls of a guest in
> respect to AP (that is, setting the AP matrix) as a single atomic
> operation -- which either succeeds or fails.
>
> The operation should succeed for any valid configuration, and fail for
> any invalid one.
>
> The current piecemeal approach seems even less fitting if we consider
> changing the access controls of a running guest. AFAIK changing access
> controls for a running guest is possible, and I don't see a reason why
> we should artificially prohibit this.
>
> I think the current sysfs interface for manipulating the matrix is
> good for manual playing around, but I would prefer having an interface
> that is better suited for programs (e.g. ioctl).
I disagree with using ioctl.
I agree that the current implementation is not right.
The configuration of APM and AQM should always be guaranteed to be
coherent within the host, but that can be done by doing the right
checks when using sysfs.
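The "single atomic operation" and the "right checks" are compatible in principle: validate a complete candidate matrix first, then commit it in one step, so the mdev never holds a half-applied configuration. A minimal, single-threaded userspace sketch; struct matrix, matrix_replace() and the example validator are hypothetical, the validator's rule is only a placeholder, and in the driver the commit would additionally need to run under whatever lock protects the mediated device.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define AP_MASK_WORDS (256 / BITS_PER_WORD)

struct matrix {
	unsigned long apm[AP_MASK_WORDS];
	unsigned long aqm[AP_MASK_WORDS];
	unsigned long adm[AP_MASK_WORDS];
};

typedef bool (*matrix_validator)(const struct matrix *m);

/* Placeholder rule: a usable matrix needs at least one adapter and one domain. */
static bool has_adapter_and_domain(const struct matrix *m)
{
	bool apm = false, aqm = false;

	for (size_t i = 0; i < AP_MASK_WORDS; i++) {
		apm |= m->apm[i] != 0;
		aqm |= m->aqm[i] != 0;
	}
	return apm && aqm;
}

/* All-or-nothing update: @cur changes only if the whole @proposed is valid. */
static int matrix_replace(struct matrix *cur, const struct matrix *proposed,
			  matrix_validator valid)
{
	assert(cur && proposed && valid);
	if (!valid(proposed))
		return -EINVAL;
	memcpy(cur, proposed, sizeof(*cur));
	return 0;
}
```

The same validate-then-commit shape would work whether the proposed matrix arrives through an ioctl or through a single sysfs attribute that accepts all three masks at once.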
Regards,
Pierre
--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany
On 29/06/2018 23:11, Tony Krowiak wrote:
> Provides the sysfs interfaces for assigning AP adapters to
> and unassigning AP adapters from a mediated matrix device.
>
> The IDs of the AP adapters assigned to the mediated matrix
> device are stored in an AP mask (APM). The bits in the APM,
> from most significant to least significant bit, correspond to
> AP adapter ID (APID) 0 to 255. When an adapter is assigned, the
> bit corresponding to the APID will be set in the APM.
> Likewise, when an adapter is unassigned, the bit corresponding
> to the APID will be cleared from the APM.
>
> The relevant sysfs structures are:
>
> /sys/devices/vfio_ap
> ... [matrix]
> ...... [mdev_supported_types]
> ......... [vfio_ap-passthrough]
> ............ [devices]
> ...............[$uuid]
> .................. assign_adapter
> .................. unassign_adapter
>
> To assign an adapter to the $uuid mediated matrix device's APM,
> write the APID to the assign_adapter file. To unassign an adapter,
> write the APID to the unassign_adapter file. The APID is specified
> using conventional semantics: If it begins with 0x the number will
> be parsed as a hexadecimal number; if it begins with a 0 the number
> will be parsed as an octal number; otherwise, it will be parsed as a
> decimal number.
>
> For example, to assign adapter 173 (0xad) to the mediated matrix
> device $uuid:
>
> echo 173 > assign_adapter
>
> or
>
> echo 0xad > assign_adapter
>
> or
>
> echo 0255 > assign_adapter
>
> To unassign adapter 173 (0xad):
>
> echo 173 > unassign_adapter
>
> or
>
> echo 0xad > unassign_adapter
>
> or
>
> echo 0255 > unassign_adapter
>
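The three spellings above are the usual base-0 conversion rules, which assign_adapter_store() gets from kstrtoul(buf, 0, &apid). A userspace approximation using strtoul() with base 0 plus the same range check; parse_apid() is a hypothetical name and only mimics the kernel helper. Like kstrtoul(), it tolerates one trailing newline, which matters because echo appends one.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Parse an APID: base 0 means a "0x" prefix selects hex, a leading "0"
 * selects octal, anything else is decimal. Reject values above @max_apid.
 */
static int parse_apid(const char *buf, unsigned long max_apid,
		      unsigned long *apid)
{
	char *end;
	unsigned long val;

	assert(buf && apid);
	/* strtoul() would skip leading blanks and accept "-1"; refuse both. */
	if (buf[0] < '0' || buf[0] > '9')
		return -EINVAL;

	errno = 0;
	val = strtoul(buf, &end, 0);
	if (errno == ERANGE)
		return -ERANGE;
	if (*end == '\n') /* allow the newline appended by echo */
		end++;
	if (*end != '\0' || val > max_apid)
		return -EINVAL;

	*apid = val;
	return 0;
}
```

For example, "173", "0xad" and "0255" all yield 173 when the maximum is 255, while the same input is rejected with a maximum of 63 (no APXA).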
> The assignment will be rejected:
>
> * If the APID exceeds the maximum value for an AP adapter:
> * If the AP Extended Addressing (APXA) facility is
> installed, the max value is 255
> * Else the max value is 63
>
> * If no AP domains have yet been assigned and there are
> no AP queues bound to the VFIO AP driver that have an APQN
> with an APID matching that of the AP adapter being assigned.
>
> * If any of the APQNs that can be derived from the cross product
> of the APID being assigned and the AP queue index (APQI) of
> each of the AP domains previously assigned can not be matched
> with an APQN of an AP queue device reserved by the VFIO AP
> driver.
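The last two rejection rules can be stated compactly. A userspace sketch, assuming the queues bound to vfio_ap are available as a plain array (the patch instead walks the driver's devices with driver_for_each_device()); struct bound_queue, apqn_bound() and adapter_assignable() are hypothetical names. As Pierre notes in his review, rules of this kind prevent assigning adapters that are not currently present in the host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flat view of the AP queues currently bound to vfio_ap. */
struct bound_queue {
	unsigned int apid;
	unsigned int apqi;
};

static bool apqn_bound(const struct bound_queue *q, size_t nq,
		       unsigned int apid, unsigned int apqi)
{
	for (size_t i = 0; i < nq; i++)
		if (q[i].apid == apid && q[i].apqi == apqi)
			return true;
	return false;
}

/*
 * No domains assigned yet: some bound queue must have adapter @apid.
 * Otherwise: every APQN (@apid, apqi) for the assigned domains must be bound.
 */
static bool adapter_assignable(const struct bound_queue *q, size_t nq,
			       unsigned int apid,
			       const unsigned int *apqis, size_t napqis)
{
	assert(q || nq == 0);
	assert(apqis || napqis == 0);

	if (napqis == 0) {
		for (size_t i = 0; i < nq; i++)
			if (q[i].apid == apid)
				return true;
		return false;
	}
	for (size_t j = 0; j < napqis; j++)
		if (!apqn_bound(q, nq, apid, apqis[j]))
			return false;
	return true;
}
```

With queues 05.0047, 05.00ff and 06.0047 bound and domains 0x47 and 0xff already assigned, adapter 5 is accepted but adapter 6 is rejected because 06.00ff is not bound.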
>
> Signed-off-by: Tony Krowiak <[email protected]>
> ---
> drivers/s390/crypto/vfio_ap_ops.c | 317 +++++++++++++++++++++++++++++++++++++
> 1 files changed, 317 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index bf7ed9f..a4351bd 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -16,6 +16,7 @@
>
> #define VFOP_AP_MDEV_TYPE_HWVIRT "passthrough"
> #define VFIO_AP_MDEV_NAME_HWVIRT "VFIO AP Passthrough Device"
> +#define KVM_AP_MASK_BYTES(n) DIV_ROUND_UP(n, BITS_PER_BYTE)
>
> DEFINE_SPINLOCK(mdev_list_lock);
> LIST_HEAD(mdev_list);
> @@ -116,9 +117,325 @@ static ssize_t device_api_show(struct kobject *kobj, struct device *dev,
> NULL,
> };
>
> +struct vfio_ap_qid_reserved {
> + ap_qid_t qid;
> + bool reserved;
> +};
> +
> +struct vfio_id_reserved {
> + unsigned long id;
> + bool reserved;
> +};
> +
> +/**
> + * vfio_ap_queue_has_qid
> + *
> + * @dev: an AP queue device
> + * @data: a queue ID
> + *
> + * Flags whether any AP queue device has a particular qid
> + *
> + * Returns 0 to indicate the function succeeded
> + */
> +static int vfio_ap_queue_has_qid(struct device *dev, void *data)
> +{
> + struct vfio_ap_qid_reserved *qid_res = data;
> + struct ap_queue *ap_queue = to_ap_queue(dev);
> +
> + if (qid_res->qid == ap_queue->qid)
> + qid_res->reserved = true;
> +
> + return 0;
> +}
> +
> +/**
> + * vfio_ap_queue_has_apid
> + *
> + * @dev: an AP queue device
> + * @data: an AP adapter ID
> + *
> + * Flags whether any AP queue device has a particular AP adapter ID
> + *
> + * Returns 0 to indicate the function succeeded
> + */
> +static int vfio_ap_queue_has_apid(struct device *dev, void *data)
> +{
> + struct vfio_id_reserved *id_res = data;
> + struct ap_queue *ap_queue = to_ap_queue(dev);
> +
> + if (id_res->id == AP_QID_CARD(ap_queue->qid))
> + id_res->reserved = true;
> +
> + return 0;
> +}
> +
> +/**
> + * vfio_ap_verify_qid_reserved
> + *
> + * @matrix_dev: the matrix device
> + * @qid: a qid (i.e., APQN)
> + *
> + * Verifies that the AP queue with @qid is reserved by the VFIO AP device
> + * driver.
> + *
> + * Returns 0 if the AP queue with @qid is reserved; otherwise, returns -EPERM.
> + */
> +static int vfio_ap_verify_qid_reserved(struct ap_matrix_dev *matrix_dev,
> + ap_qid_t qid)
> +{
> + int ret;
> + struct vfio_ap_qid_reserved qid_res;
> +
> + qid_res.qid = qid;
> + qid_res.reserved = false;
> +
> + ret = driver_for_each_device(matrix_dev->device.driver, NULL, &qid_res,
> + vfio_ap_queue_has_qid);
> + if (ret)
> + return ret;
> +
> + if (qid_res.reserved)
> + return 0;
> +
> + return -EPERM;
> +}
> +
> +/**
> + * vfio_ap_verify_apid_reserved
> + *
> + * @matrix_dev: the matrix device
> + * @apid: an AP adapter ID
> + *
> + * Verifies that an AP queue with @apid is reserved by the VFIO AP device
> + * driver.
> + *
> + * Returns 0 if an AP queue with @apid is reserved; otherwise, returns -EPERM.
> + */
> +static int vfio_ap_verify_apid_reserved(struct ap_matrix_dev *matrix_dev,
> + const char *mdev_name,
> + unsigned long apid)
> +{
> + int ret;
> + struct vfio_id_reserved id_res;
> +
> + id_res.id = apid;
> + id_res.reserved = false;
> +
> + ret = driver_for_each_device(matrix_dev->device.driver, NULL, &id_res,
> + vfio_ap_queue_has_apid);
> + if (ret)
> + return ret;
> +
> + if (id_res.reserved)
> + return 0;
> +
> + pr_err("%s: mdev %s using adapter %02lx not reserved by %s driver",
> + VFIO_AP_MODULE_NAME, mdev_name, apid,
> + VFIO_AP_DRV_NAME);
> +
> + return -EPERM;
> +}
> +
> +static int vfio_ap_verify_queues_reserved(struct ap_matrix_dev *matrix_dev,
> + const char *mdev_name,
> + struct ap_matrix *matrix)
> +{
> + unsigned long apid, apqi;
> + int ret;
> + int rc = 0;
> +
> + for_each_set_bit_inv(apid, matrix->apm, matrix->apm_max + 1) {
> + for_each_set_bit_inv(apqi, matrix->aqm, matrix->aqm_max + 1) {
> + ret = vfio_ap_verify_qid_reserved(matrix_dev,
> + AP_MKQID(apid, apqi));
> + if (ret == 0)
> + continue;
> +
> + /*
> + * We want to log every APQN that is not reserved by
> + * the driver, so record the return code, log a message
> + * and allow the loop to continue
> + */
> + rc = ret;
> + pr_err("%s: mdev %s using queue %02lx.%04lx not reserved by %s driver",
> + VFIO_AP_MODULE_NAME, mdev_name, apid,
> + apqi, VFIO_AP_DRV_NAME);
> + }
> + }
> +
> + return rc;
> +}
> +
> +/**
> + * vfio_ap_validate_apid
> + *
> + * @mdev: the mediated device
> + * @matrix_mdev: the mediated matrix device
> + * @apid: the APID to validate
> + *
> + * Validates the value of @apid:
> + * * If there are no AP domains assigned, then there must be at least
> + * one AP queue device reserved by the VFIO AP device driver with an
> + * APQN containing @apid.
> + *
> + * * Else each APQN that can be derived from the cross product of @apid and
> + * the IDs of the AP domains already assigned must identify an AP queue
> + * that has been reserved by the VFIO AP device driver.
> + *
> + * Returns 0 if the value of @apid is valid; otherwise, returns an error.
> + */
> +static int vfio_ap_validate_apid(struct mdev_device *mdev,
> + struct ap_matrix_mdev *matrix_mdev,
> + unsigned long apid)
> +{
> + int ret;
> + unsigned long aqmsz = matrix_mdev->matrix.aqm_max + 1;
> + struct device *dev = mdev_parent_dev(mdev);
> + struct ap_matrix_dev *matrix_dev = to_ap_matrix_dev(dev);
> + struct ap_matrix matrix = matrix_mdev->matrix;
> +
> + /* If there are any queues assigned to the mediated device */
> + if (find_first_bit_inv(matrix.aqm, aqmsz) < aqmsz) {
> + matrix.apm_max = matrix_mdev->matrix.apm_max;
> + memset(matrix.apm, 0,
> + ARRAY_SIZE(matrix.apm) * sizeof(matrix.apm[0]));
> + set_bit_inv(apid, matrix.apm);
> + matrix.aqm_max = matrix_mdev->matrix.aqm_max;
> + memcpy(matrix.aqm, matrix_mdev->matrix.aqm,
> + ARRAY_SIZE(matrix.aqm) * sizeof(matrix.aqm[0]));
> + ret = vfio_ap_verify_queues_reserved(matrix_dev,
> + matrix_mdev->name,
> + &matrix);
> + } else {
> + ret = vfio_ap_verify_apid_reserved(matrix_dev,
> + matrix_mdev->name, apid);
> + }
> +
> + if (ret)
> + return ret;
> +
> + return 0;
> +}
> +
> +/**
> + * assign_adapter_store
> + *
> + * @dev: the matrix device
> + * @attr: a mediated matrix device attribute
> + * @buf: a buffer containing the adapter ID (APID) to be assigned
> + * @count: the number of bytes in @buf
> + *
> + * Parses the APID from @buf and assigns it to the mediated matrix device. The
> + * APID must be a valid value:
> + * * The APID value must not exceed the maximum allowable AP adapter ID
> + *
> + * * If there are no AP domains assigned, then there must be at least
> + * one AP queue device reserved by the VFIO AP device driver with an
> + * APQN containing @apid.
I do not understand the reason here.
Can you elaborate?
I suppose that by reserved you mean bound. (If so, then use "bound".)
But I still cannot understand the reason why.
Besides, if I understand correctly, what you do forbids the automatic
assignment of a new card plugged into the host.
> + *
> + * * Else each APQN that can be derived from the cross product of @apid and
> + * the IDs of the AP domains already assigned must identify an AP queue
> + * that has been reserved by the VFIO AP device driver.
> + *
> + * Returns the number of bytes processed if the APID is valid; otherwise returns
> + * an error.
> + */
> +static ssize_t assign_adapter_store(struct device *dev,
> + struct device_attribute *attr,
> + const char *buf, size_t count)
> +{
> + int ret;
> + unsigned long apid;
> + struct mdev_device *mdev = mdev_from_dev(dev);
> + struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
> + unsigned long max_apid = matrix_mdev->matrix.apm_max;
> +
> + ret = kstrtoul(buf, 0, &apid);
> + if (ret || (apid > max_apid)) {
> + pr_err("%s: %s: adapter id '%s' not a value from 0 to %02lu(%#04lx)",
> + VFIO_AP_MODULE_NAME, __func__, buf, max_apid, max_apid);
> +
> + return ret ? ret : -EINVAL;
> + }
> +
> + ret = vfio_ap_validate_apid(mdev, matrix_mdev, apid);
> + if (ret)
> + return ret;
> +
> + /* Set the bit in the AP mask (APM) corresponding to the AP adapter
> + * number (APID). The bits in the mask, from most significant to least
> + * significant bit, correspond to APIDs 0-255.
> + */
> + set_bit_inv(apid, matrix_mdev->matrix.apm);
> +
> + return count;
> +}
> +static DEVICE_ATTR_WO(assign_adapter);
> +
> +/**
> + * unassign_adapter_store
> + *
> + * @dev: the matrix device
> + * @attr: a mediated matrix device attribute
> + * @buf: a buffer containing the adapter ID (APID) to be unassigned
> + * @count: the number of bytes in @buf
> + *
> + * Parses the APID from @buf and unassigns it from the mediated matrix device.
> + * The APID must be a valid value.
> + *
> + * Returns the number of bytes processed if the APID is valid; otherwise returns
> + * an error.
> + */
> +static ssize_t unassign_adapter_store(struct device *dev,
> + struct device_attribute *attr,
> + const char *buf, size_t count)
> +{
> + int ret;
> + unsigned long apid;
> + struct mdev_device *mdev = mdev_from_dev(dev);
> + struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
> + unsigned long max_apid = matrix_mdev->matrix.apm_max;
> +
> + ret = kstrtoul(buf, 0, &apid);
> + if (ret || (apid > max_apid)) {
> + pr_err("%s: %s: adapter id '%s' must be a value from 0 to %02lu(%#04lx)",
> + VFIO_AP_MODULE_NAME, __func__, buf, max_apid, max_apid);
> +
> + return ret ? ret : -EINVAL;
> + }
> +
> + if (!test_bit_inv(apid, matrix_mdev->matrix.apm)) {
> + pr_err("%s: %s: adapter id %02lu(%#04lx) not assigned",
> + VFIO_AP_MODULE_NAME, __func__, apid, apid);
> +
> + return -ENODEV;
> + }
> +
> + clear_bit_inv((unsigned long)apid, matrix_mdev->matrix.apm);
> +
> + return count;
> +}
> +DEVICE_ATTR_WO(unassign_adapter);
> +
> +static struct attribute *vfio_ap_mdev_attrs[] = {
> + &dev_attr_assign_adapter.attr,
> + &dev_attr_unassign_adapter.attr,
> + NULL
> +};
> +
> +static struct attribute_group vfio_ap_mdev_attr_group = {
> + .attrs = vfio_ap_mdev_attrs
> +};
> +
> +static const struct attribute_group *vfio_ap_mdev_attr_groups[] = {
> + &vfio_ap_mdev_attr_group,
> + NULL
> +};
> +
> static const struct mdev_parent_ops vfio_ap_matrix_ops = {
> .owner = THIS_MODULE,
> .supported_type_groups = vfio_ap_mdev_type_groups,
> + .mdev_attr_groups = vfio_ap_mdev_attr_groups,
> .create = vfio_ap_mdev_create,
> .remove = vfio_ap_mdev_remove,
> };
--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany
On 29/06/2018 23:11, Tony Krowiak wrote:
> Provides the sysfs interfaces for assigning AP domains to
> and unassigning AP domains from a mediated matrix device.
>
> An AP domain ID corresponds to an AP queue index (APQI). For
> each domain assigned to the mediated matrix device, its
> corresponding APQI is stored in an AP queue mask (AQM).
> The bits in the AQM, from most significant to least
> significant bit, correspond to AP domain numbers 0 to 255.
> When a domain is assigned, the bit corresponding to its
> APQI will be set in the AQM. Likewise, when a domain is
> unassigned, the bit corresponding to its APQI will be
> cleared from the AQM.
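The MSB-first numbering described above means that domain 0 is the leftmost bit of the 256-bit mask. In the driver this is handled by the s390 set_bit_inv()/test_bit_inv()/clear_bit_inv() helpers. The following is a minimal, endian-independent sketch of the same layout using a plain byte array; the ap_mask_t type and the mask_* helpers are hypothetical names, not part of the driver. It assumes the mask is stored as 32 consecutive bytes, which should correspond to the in-memory layout of the kernel's unsigned long bitmaps on big-endian s390.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical 256-bit mask stored as 32 bytes; ID 0 is the MSB of byte 0. */
#define AP_MASK_BITS 256
typedef unsigned char ap_mask_t[AP_MASK_BITS / 8];

/* Set the bit for id (0..255) using MSB-first ("inverted") numbering. */
static void mask_set(ap_mask_t mask, unsigned int id)
{
	mask[id / 8] |= (unsigned char)(0x80u >> (id % 8));
}

/* Clear the bit for id, leaving all other bits untouched. */
static void mask_clear(ap_mask_t mask, unsigned int id)
{
	mask[id / 8] &= (unsigned char)~(0x80u >> (id % 8));
}

/* Report whether the bit for id is set. */
static bool mask_test(const ap_mask_t mask, unsigned int id)
{
	return (mask[id / 8] & (0x80u >> (id % 8))) != 0;
}
```

For example, assigning domain 173 (0xad) sets bit 0x04 in byte 21, since 173 = 21 x 8 + 5.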
>
> The relevant sysfs structures are:
>
> /sys/devices/vfio_ap
> ... [matrix]
> ...... [mdev_supported_types]
> ......... [vfio_ap-passthrough]
> ............ [devices]
> ...............[$uuid]
> .................. assign_domain
> .................. unassign_domain
>
> To assign a domain to the $uuid mediated matrix device,
> write the domain's ID to the assign_domain file. To
> unassign a domain, write the domain's ID to the
> unassign_domain file. The ID is specified using
> conventional semantics: If it begins with 0x, the number
> will be parsed as a hexadecimal (case insensitive) number;
> if it begins with 0, it will be parsed as an octal number;
> otherwise, it will be parsed as a decimal number.
>
> For example, to assign domain 173 (0xad) to the mediated matrix
> device $uuid:
>
> echo 173 > assign_domain
>
> or
>
> echo 0255 > assign_domain
>
> or
>
> echo 0xad > assign_domain
>
> To unassign domain 173 (0xad):
>
> echo 173 > unassign_domain
>
> or
>
> echo 0255 > unassign_domain
>
> or
>
> echo 0xad > unassign_domain
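The parsing rules above are those of kstrtoul() with base 0, which the store callbacks use. A rough user-space approximation with strtoul() follows, combined with the range check the callbacks apply. parse_ap_id() is a hypothetical helper; it simplifies the error handling (kstrtoul() can also return -ERANGE, for instance) and rejects a leading sign outright.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse an adapter or domain ID with base 0
 * (0x.. hex, 0.. octal, otherwise decimal), allow one trailing newline
 * (as written by echo), and reject values above max_id.
 * Returns 0 and stores the ID in *id on success, -EINVAL otherwise.
 */
static int parse_ap_id(const char *buf, unsigned long max_id,
		       unsigned long *id)
{
	char *end;
	unsigned long val;

	if (*buf < '0' || *buf > '9')
		return -EINVAL;	/* empty string, sign or leading blank */

	errno = 0;
	val = strtoul(buf, &end, 0);
	if (errno != 0 || end == buf)
		return -EINVAL;

	if (*end == '\n')
		end++;		/* tolerate the newline added by echo */
	if (*end != '\0' || val > max_id)
		return -EINVAL;	/* trailing garbage or out of range */

	*id = val;
	return 0;
}
```

All three spellings of domain 173 shown above parse to the same value, and a trailing newline from echo is accepted.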
>
> The assignment will be rejected:
>
> * If the domain ID exceeds the maximum value for an AP domain:
>
> * If the AP Extended Addressing (APXA) facility is installed,
> the max value is 255
>
> * Else the max value is 15
>
> * If no AP adapters have yet been assigned and there are
> no AP queues reserved by the VFIO AP driver that have an APQN
> with an APQI matching that of the AP domain number being
> assigned.
>
> * If any of the APQNs that can be derived from the intersection
> of the APQI being assigned and the AP adapter ID (APID) of
> each of the AP adapters previously assigned cannot be matched
> with an APQN of an AP queue device reserved by the VFIO AP
> driver.
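The rejection rules above can be sketched as a single check. This is an illustration under assumptions, not the driver's vfio_ap_validate_apqi(): the reserved queues are passed in as a hypothetical array of (adapter, domain) pairs, the assigned adapters as a plain array, and the -EINVAL/-EPERM return values are chosen to match those used elsewhere in the patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define AP_MAX_ID_APXA   255u	/* APXA facility installed */
#define AP_MAX_ID_NOAPXA 15u	/* APXA facility not installed */

/* Hypothetical: a queue reserved by the driver, as an (adapter, domain) pair. */
struct apqn {
	unsigned int apid;
	unsigned int apqi;
};

/* Is the exact queue (apid, apqi) reserved? */
static bool apqn_reserved(const struct apqn *res, size_t nres,
			  unsigned int apid, unsigned int apqi)
{
	for (size_t i = 0; i < nres; i++)
		if (res[i].apid == apid && res[i].apqi == apqi)
			return true;
	return false;
}

/* Is any reserved queue using this domain, on any adapter? */
static bool apqi_reserved(const struct apqn *res, size_t nres,
			  unsigned int apqi)
{
	for (size_t i = 0; i < nres; i++)
		if (res[i].apqi == apqi)
			return true;
	return false;
}

/* Validate a domain (APQI) before assigning it to a mediated device. */
static int validate_apqi(unsigned int apqi, bool apxa,
			 const unsigned int *assigned_apids, size_t napids,
			 const struct apqn *res, size_t nres)
{
	unsigned int max = apxa ? AP_MAX_ID_APXA : AP_MAX_ID_NOAPXA;

	if (apqi > max)
		return -EINVAL;

	/* No adapters yet: some reserved queue must use this domain. */
	if (napids == 0)
		return apqi_reserved(res, nres, apqi) ? 0 : -EPERM;

	/* Otherwise every (assigned adapter, apqi) pair must be reserved. */
	for (size_t i = 0; i < napids; i++)
		if (!apqn_reserved(res, nres, assigned_apids[i], apqi))
			return -EPERM;

	return 0;
}
```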
>
> Signed-off-by: Tony Krowiak <[email protected]>
> ---
> drivers/s390/crypto/vfio_ap_ops.c | 173 ++++++++++++++++++++++++++++++++++++-
> 1 files changed, 172 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index a4351bd..a5b06e7 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -170,6 +170,27 @@ static int vfio_ap_queue_has_apid(struct device *dev, void *data)
> }
>
> /**
> + * vfio_ap_queue_has_apqi
> + *
> + * @dev: an AP queue device
> + * @data: an AP queue index
> + *
> + * Flags whether any AP queue device has a particular AP queue index
> + *
> + * Returns 0 to indicate the function succeeded
> + */
> +static int vfio_ap_queue_has_apqi(struct device *dev, void *data)
> +{
> + struct vfio_id_reserved *id_res = data;
> + struct ap_queue *ap_queue = to_ap_queue(dev);
> +
> + if (id_res->id == AP_QID_QUEUE(ap_queue->qid))
> + id_res->reserved = true;
> +
> + return 0;
> +}
> +
> +/**
> * vfio_ap_verify_qid_reserved
> *
> * @matrix_dev: a mediated matrix device
> @@ -236,6 +257,42 @@ static int vfio_ap_verify_apid_reserved(struct ap_matrix_dev *matrix_dev,
> return -EPERM;
> }
>
> +/**
> + * vfio_ap_verify_apqi_reserved
> + *
> + * @matrix_dev: a mediated matrix device
> + * @apqi: an AP queue index
> + *
> + * Verifies that an AP queue with @apqi is reserved by the VFIO AP device
> + * driver.
> + *
> + * Returns 0 if an AP queue with @apqi is reserved; otherwise, returns -EPERM.
> + */
> +static int vfio_ap_verify_apqi_reserved(struct ap_matrix_dev *matrix_dev,
> + const char *mdev_name,
> + unsigned long apqi)
> +{
> + int ret;
> + struct vfio_id_reserved id_res;
> +
> + id_res.id = apqi;
> + id_res.reserved = false;
> +
> + ret = driver_for_each_device(matrix_dev->device.driver, NULL, &id_res,
> + vfio_ap_queue_has_apqi);
> + if (ret)
> + return ret;
> +
> + if (id_res.reserved)
> + return 0;
> +
> + pr_err("%s: mdev %s using queue %04lx not reserved by %s driver",
> + VFIO_AP_MODULE_NAME, mdev_name, apqi,
> + VFIO_AP_DRV_NAME);
> +
> + return -EPERM;
> +}
> +
> static int vfio_ap_verify_queues_reserved(struct ap_matrix_dev *matrix_dev,
> const char *mdev_name,
> struct ap_matrix *matrix)
> @@ -417,10 +474,124 @@ static ssize_t unassign_adapter_store(struct device *dev,
> }
> DEVICE_ATTR_WO(unassign_adapter);
>
> +/**
> + * vfio_ap_validate_apqi
> + *
> + * @matrix_mdev: the mediated matrix device
> + * @apqi: the APQI (domain ID) to validate
> + *
> + * Validates the value of @apqi:
> + * * If there are no AP adapters assigned, then there must be at least
> + * one AP queue device reserved by the VFIO AP device driver with an
> + * APQN containing @apqi.
Same as in preceding patch, I do not understand why you need this.
Pierre
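For reference, the reserved-queue lookup in vfio_ap_verify_apqi_reserved() relies on the driver_for_each_device() callback contract: the callback returns 0 to continue, and any nonzero value stops the walk and is returned to the caller. A user-space sketch of that pattern follows. The qid layout (adapter in bits 15..8, queue index in bits 7..0) and the QID_* macros are assumptions standing in for the AP bus's AP_QID_CARD()/AP_QID_QUEUE(), and for_each_queue() is a hypothetical stand-in for driver_for_each_device().

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed qid layout: adapter in bits 15..8, queue index in bits 7..0. */
#define QID_CARD(qid)  (((qid) >> 8) & 0xffu)
#define QID_QUEUE(qid) ((qid) & 0xffu)

/* Mirrors the shape of struct vfio_id_reserved in the patch. */
struct id_reserved {
	unsigned long id;
	bool reserved;
};

typedef int (*queue_fn)(unsigned int qid, void *data);

/* Hypothetical iterator: a nonzero callback return stops the walk. */
static int for_each_queue(const unsigned int *qids, size_t n, void *data,
			  queue_fn fn)
{
	for (size_t i = 0; i < n; i++) {
		int ret = fn(qids[i], data);

		if (ret)
			return ret;
	}
	return 0;
}

/* Flag a match on the queue index; always continue, as in the patch. */
static int queue_has_apqi(unsigned int qid, void *data)
{
	struct id_reserved *res = data;

	if (res->id == QID_QUEUE(qid))
		res->reserved = true;
	return 0;
}

static bool apqi_is_reserved(const unsigned int *qids, size_t n,
			     unsigned long apqi)
{
	struct id_reserved res = { .id = apqi, .reserved = false };

	for_each_queue(qids, n, &res, queue_has_apqi);
	return res.reserved;
}
```

Because the patch's callback always returns 0, every queue is visited even after a match. Returning a nonzero sentinel would stop early, but the caller would then have to distinguish that sentinel from a real error.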
--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany
On 29/06/2018 23:11, Tony Krowiak wrote:
> Provides a sysfs interface to view the AP matrix configured for the
> mediated matrix device.
>
> The relevant sysfs structures are:
>
> /sys/devices/vfio_ap
> ... [matrix]
> ...... [mdev_supported_types]
> ......... [vfio_ap-passthrough]
> ............ [devices]
> ...............[$uuid]
> .................. matrix
>
> To view the matrix configured for the mediated matrix device,
> print the matrix file:
>
> cat matrix
>
> Signed-off-by: Tony Krowiak <[email protected]>
> ---
> drivers/s390/crypto/vfio_ap_ops.c | 31 +++++++++++++++++++++++++++++++
> 1 files changed, 31 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index c8f31f3..bc7398d 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -697,6 +697,36 @@ static ssize_t control_domains_show(struct device *dev,
> }
> DEVICE_ATTR_RO(control_domains);
>
> +static ssize_t matrix_show(struct device *dev, struct device_attribute *attr,
> + char *buf)
> +{
> + struct mdev_device *mdev = mdev_from_dev(dev);
> + struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
> + char *bufpos = buf;
> + unsigned long apid;
> + unsigned long apqi;
> + unsigned long napm = matrix_mdev->matrix.apm_max + 1;
> + unsigned long naqm = matrix_mdev->matrix.aqm_max + 1;
> + int nchars = 0;
> + int n;
> +
> + for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, napm) {
> + n = sprintf(bufpos, "%02lx\n", apid);
> + bufpos += n;
> + nchars += n;
> +
> + for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, naqm) {
> + n = sprintf(bufpos, "%02lx.%04lx\n", apid, apqi);
> + bufpos += n;
> + nchars += n;
> + }
> + }
> +
> + return nchars;
> +}
> +DEVICE_ATTR_RO(matrix);
> +
> +
> static struct attribute *vfio_ap_mdev_attrs[] = {
> &dev_attr_assign_adapter.attr,
> &dev_attr_unassign_adapter.attr,
> @@ -705,6 +735,7 @@ static ssize_t control_domains_show(struct device *dev,
> &dev_attr_assign_control_domain.attr,
> &dev_attr_unassign_control_domain.attr,
> &dev_attr_control_domains.attr,
> + &dev_attr_matrix.attr,
> NULL,
> };
>
I still have the same remark: what you show here is not what is currently
used by the SIE.
It is not irrelevant, but what the guest really uses may be more interesting
for the admin.
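For reference, the view produced by matrix_show() lists each assigned adapter as a two-digit hex APID, followed by one APID.APQI line per assigned domain. The sketch below reproduces that format in user space with a bounded buffer; format_matrix() and append_line() are hypothetical helpers. The bound matters because a sysfs show() callback gets a single page: a fully populated 256 x 256 matrix in this format would need 256 x 3 + 65536 x 8 = 525,056 bytes, far more than a 4 KiB page, while the patch writes with unbounded sprintf(). In the kernel, scnprintf() against PAGE_SIZE is one way to bound it.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Append one line: "aa\n" when apqi < 0, else "aa.qqqq\n".
 * Requires *len < size. Returns 0, or -1 (dropping the partial
 * line) if the line does not fit.
 */
static int append_line(char *buf, size_t size, size_t *len,
		       unsigned int apid, int apqi)
{
	int n;

	if (apqi < 0)
		n = snprintf(buf + *len, size - *len, "%02x\n", apid);
	else
		n = snprintf(buf + *len, size - *len, "%02x.%04x\n",
			     apid, (unsigned int)apqi);

	if (n < 0 || (size_t)n >= size - *len) {
		buf[*len] = '\0';	/* keep only complete lines */
		return -1;
	}
	*len += (size_t)n;
	return 0;
}

/* Format the adapter/queue view; returns the length written (no NUL). */
static size_t format_matrix(char *buf, size_t size,
			    const unsigned int *apids, size_t napids,
			    const unsigned int *apqis, size_t napqis)
{
	size_t len = 0;

	if (size == 0)
		return 0;
	buf[0] = '\0';

	for (size_t i = 0; i < napids; i++) {
		if (append_line(buf, size, &len, apids[i], -1))
			return len;
		for (size_t j = 0; j < napqis; j++)
			if (append_line(buf, size, &len, apids[i],
					(int)apqis[j]))
				return len;
	}
	return len;
}
```

With adapters 3 and 4 and domains 1 and 2 assigned, the output is 03, 03.0001, 03.0002, 04, 04.0001 and 04.0002, one per line.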
--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany
On 29/06/2018 23:11, Tony Krowiak wrote:
> Registers the matrix device created by the VFIO AP device
> driver with the VFIO mediated device framework.
> Registering the matrix device will create the sysfs
> structures needed to create mediated matrix devices
> each of which will be used to configure the AP matrix
> for a guest and connect it to the VFIO AP device driver.
>
> Registering the matrix device with the VFIO mediated device
> framework will create the following sysfs structures:
>
> /sys/devices/vfio_ap
> ... [matrix]
> ...... [mdev_supported_types]
> ......... [vfio_ap-passthrough]
> ............ create
>
> To create a mediated device for the AP matrix device, write a UUID
> to the create file:
>
> uuidgen > create
>
> A symbolic link to the mediated device's directory will be created in the
> devices subdirectory named after the generated $uuid:
>
> /sys/devices/vfio_ap
> ... [matrix]
> ...... [mdev_supported_types]
> ......... [vfio_ap-passthrough]
> ............ [devices]
> ............... [$uuid]
>
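The diff below makes mdev registration the last step of vfio_ap_init(); its error path and vfio_ap_exit() undo the earlier steps in reverse order (unregister the mdev parent, then the AP driver, then destroy the matrix device). A sketch of the same goto-based unwinding pattern follows, using hypothetical stubs that record the order of calls; the step names and the fail_at test hook are illustrative only.

```c
#include <string.h>

static int fail_at;	/* hypothetical test hook: step that fails, 0 = none */
static char trace[16];	/* call order: upper case = setup, lower case = undo */

static void record(char c)
{
	size_t n = strlen(trace);

	if (n + 1 < sizeof(trace)) {
		trace[n] = c;
		trace[n + 1] = '\0';
	}
}

/* Stub for one init step: fails if it is the step selected by fail_at. */
static int step(int nr, char c)
{
	if (nr == fail_at)
		return -1;
	record(c);
	return 0;
}

static int sketch_init(void)
{
	int ret;

	ret = step(1, 'C');		/* create the matrix device */
	if (ret)
		return ret;
	ret = step(2, 'D');		/* register the AP queue driver */
	if (ret)
		goto err_destroy;
	ret = step(3, 'M');		/* register with the mdev framework */
	if (ret)
		goto err_unregister_drv;
	return 0;

err_unregister_drv:
	record('d');			/* undo step 2 */
err_destroy:
	record('c');			/* undo step 1 */
	return ret;
}

static void sketch_exit(void)
{
	record('m');	/* unregister the mdev parent first, */
	record('d');	/* then the AP queue driver, */
	record('c');	/* and destroy the matrix device last */
}
```

If step 3 fails, the trace is CDdc: both earlier steps are undone, newest first.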
> Signed-off-by: Tony Krowiak <[email protected]>
> ---
> MAINTAINERS | 1 +
> drivers/s390/crypto/Makefile | 2 +-
> drivers/s390/crypto/vfio_ap_drv.c | 9 ++
> drivers/s390/crypto/vfio_ap_ops.c | 131 +++++++++++++++++++++++++++++++++
> drivers/s390/crypto/vfio_ap_private.h | 22 +++++-
> 5 files changed, 161 insertions(+), 4 deletions(-)
> create mode 100644 drivers/s390/crypto/vfio_ap_ops.c
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 0515dae..3217803 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -12410,6 +12410,7 @@ W: http://www.ibm.com/developerworks/linux/linux390/
> S: Supported
> F: drivers/s390/crypto/vfio_ap_drv.c
> F: drivers/s390/crypto/vfio_ap_private.h
> +F: drivers/s390/crypto/vfio_ap_ops.c
>
> S390 ZFCP DRIVER
> M: Steffen Maier <[email protected]>
> diff --git a/drivers/s390/crypto/Makefile b/drivers/s390/crypto/Makefile
> index 48e466e..8d36b05 100644
> --- a/drivers/s390/crypto/Makefile
> +++ b/drivers/s390/crypto/Makefile
> @@ -17,5 +17,5 @@ pkey-objs := pkey_api.o
> obj-$(CONFIG_PKEY) += pkey.o
>
> # adjunct processor matrix
> -vfio_ap-objs := vfio_ap_drv.o
> +vfio_ap-objs := vfio_ap_drv.o vfio_ap_ops.o
> obj-$(CONFIG_VFIO_AP) += vfio_ap.o
> diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
> index 93db312..b6ff7a4 100644
> --- a/drivers/s390/crypto/vfio_ap_drv.c
> +++ b/drivers/s390/crypto/vfio_ap_drv.c
> @@ -127,11 +127,20 @@ int __init vfio_ap_init(void)
> return ret;
> }
>
> + ret = vfio_ap_mdev_register(matrix_dev);
> + if (ret) {
> + ap_driver_unregister(&vfio_ap_drv);
> + vfio_ap_matrix_dev_destroy(matrix_dev);
> +
> + return ret;
> + }
> +
> return 0;
> }
>
> void __exit vfio_ap_exit(void)
> {
> + vfio_ap_mdev_unregister(matrix_dev);
> ap_driver_unregister(&vfio_ap_drv);
> vfio_ap_matrix_dev_destroy(matrix_dev);
> }
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> new file mode 100644
> index 0000000..4e61e33
> --- /dev/null
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -0,0 +1,131 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +/*
> + * Adjunct processor matrix VFIO device driver callbacks.
> + *
> + * Copyright IBM Corp. 2018
> + * Author(s): Tony Krowiak <[email protected]>
> + *
> + */
> +#include <linux/string.h>
> +#include <linux/vfio.h>
> +#include <linux/device.h>
> +#include <linux/list.h>
> +#include <linux/ctype.h>
> +
> +#include "vfio_ap_private.h"
> +
> +#define VFOP_AP_MDEV_TYPE_HWVIRT "passthrough"
> +#define VFIO_AP_MDEV_NAME_HWVIRT "VFIO AP Passthrough Device"
> +
> +DEFINE_SPINLOCK(mdev_list_lock);
> +LIST_HEAD(mdev_list);
> +
> +static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
> +{
> + struct ap_matrix_dev *matrix_dev =
> + to_ap_matrix_dev(mdev_parent_dev(mdev));
> + struct ap_matrix_mdev *matrix_mdev;
> +
> + matrix_mdev = kzalloc(sizeof(*matrix_mdev), GFP_KERNEL);
> + if (!matrix_mdev)
> + return -ENOMEM;
> +
> + matrix_mdev->name = dev_name(mdev_dev(mdev));
> + mdev_set_drvdata(mdev, matrix_mdev);
> +
> + if (atomic_dec_if_positive(&matrix_dev->available_instances) < 0) {
> + kfree(matrix_mdev);
> + return -EPERM;
> + }
> +
> + spin_lock_bh(&mdev_list_lock);
> + list_add(&matrix_mdev->list, &mdev_list);
> + spin_unlock_bh(&mdev_list_lock);
> +
> + return 0;
> +}
> +
> +static int vfio_ap_mdev_remove(struct mdev_device *mdev)
> +{
> + struct ap_matrix_dev *matrix_dev =
> + to_ap_matrix_dev(mdev_parent_dev(mdev));
> + struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
> +
> + spin_lock_bh(&mdev_list_lock);
> + list_del(&matrix_mdev->list);
> + spin_unlock_bh(&mdev_list_lock);
> + kfree(matrix_mdev);
> + mdev_set_drvdata(mdev, NULL);
> + atomic_inc(&matrix_dev->available_instances);
> +
> + return 0;
> +}
> +
> +static ssize_t name_show(struct kobject *kobj, struct device *dev, char *buf)
> +{
> + return sprintf(buf, "%s\n", VFIO_AP_MDEV_NAME_HWVIRT);
> +}
> +
> +MDEV_TYPE_ATTR_RO(name);
> +
> +static ssize_t available_instances_show(struct kobject *kobj,
> + struct device *dev, char *buf)
> +{
> + struct ap_matrix_dev *matrix_dev = to_ap_matrix_dev(dev);
> +
> + return sprintf(buf, "%d\n",
> + atomic_read(&matrix_dev->available_instances));
> +}
> +
> +MDEV_TYPE_ATTR_RO(available_instances);
> +
> +static ssize_t device_api_show(struct kobject *kobj, struct device *dev,
> + char *buf)
> +{
> + return sprintf(buf, "%s\n", VFIO_DEVICE_API_AP_STRING);
> +}
> +
> +MDEV_TYPE_ATTR_RO(device_api);
> +
> +static struct attribute *vfio_ap_mdev_type_attrs[] = {
> + &mdev_type_attr_name.attr,
> + &mdev_type_attr_device_api.attr,
> + &mdev_type_attr_available_instances.attr,
> + NULL,
> +};
> +
> +static struct attribute_group vfio_ap_mdev_hwvirt_type_group = {
> + .name = VFOP_AP_MDEV_TYPE_HWVIRT,
> + .attrs = vfio_ap_mdev_type_attrs,
> +};
> +
> +static struct attribute_group *vfio_ap_mdev_type_groups[] = {
> + &vfio_ap_mdev_hwvirt_type_group,
> + NULL,
> +};
> +
> +static const struct mdev_parent_ops vfio_ap_matrix_ops = {
> + .owner = THIS_MODULE,
> + .supported_type_groups = vfio_ap_mdev_type_groups,
> + .create = vfio_ap_mdev_create,
> + .remove = vfio_ap_mdev_remove,
> +};
> +
> +int vfio_ap_mdev_register(struct ap_matrix_dev *matrix_dev)
> +{
> + int ret;
> +
> + ret = mdev_register_device(&matrix_dev->device, &vfio_ap_matrix_ops);
> + if (ret)
> + return ret;
> +
> + atomic_set(&matrix_dev->available_instances,
> + AP_MATRIX_MAX_AVAILABLE_INSTANCES);
> +
> + return 0;
> +}
> +
> +void vfio_ap_mdev_unregister(struct ap_matrix_dev *matrix_dev)
> +{
> + mdev_unregister_device(&matrix_dev->device);
> +}
> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
> index 19c0b60..3de1275 100644
> --- a/drivers/s390/crypto/vfio_ap_private.h
> +++ b/drivers/s390/crypto/vfio_ap_private.h
> @@ -10,20 +10,36 @@
> #define _VFIO_AP_PRIVATE_H_
>
> #include <linux/types.h>
> +#include <linux/device.h>
> +#include <linux/mdev.h>
>
> #include "ap_bus.h"
>
> #define VFIO_AP_MODULE_NAME "vfio_ap"
> #define VFIO_AP_DRV_NAME "vfio_ap"
> +/**
> + * There must be one mediated matrix device for every guest using AP devices.
> + * If every APQN is assigned to a guest, then the maximum number of guests with
> + * a unique APQN assigned would be 255 adapters x 255 domains = 72351 guests.
> + */
> +#define AP_MATRIX_MAX_AVAILABLE_INSTANCES 72351
Why isn't it 256 x 256?
>
> struct ap_matrix_dev {
> struct device device;
> + atomic_t available_instances;
> +};
> +
> +struct ap_matrix_mdev {
> + const char *name;
> + struct list_head list;
> };
>
> -static inline struct ap_matrix_dev
> -*to_ap_matrix_parent_dev(struct device *dev)
> +static struct ap_matrix_dev *to_ap_matrix_dev(struct device *dev)
> {
> - return container_of(dev, struct ap_matrix_dev, device.parent);
> + return container_of(dev, struct ap_matrix_dev, device);
> }
>
> +extern int vfio_ap_mdev_register(struct ap_matrix_dev *matrix_dev);
> +extern void vfio_ap_mdev_unregister(struct ap_matrix_dev *matrix_dev);
> +
> #endif /* _VFIO_AP_PRIVATE_H_ */
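For reference on the instance limit questioned above: adapter and domain IDs each run from 0 to 255 when APXA is installed, so there are 256 x 256 = 65,536 distinct APQNs. 255 x 255 would be 65,025, and neither product equals the 72351 in the patch comment. The helper names below are hypothetical; the block only checks the arithmetic.

```c
/* Number of distinct IDs in one dimension, given the maximum ID. */
static unsigned long ap_ids(unsigned long max_id)
{
	return max_id + 1;	/* IDs run from 0 to max_id inclusive */
}

/* Number of distinct APQNs, i.e. (adapter, domain) pairs. */
static unsigned long ap_max_apqns(unsigned long max_apid,
				  unsigned long max_apqi)
{
	return ap_ids(max_apid) * ap_ids(max_apqi);
}
```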
--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany
On 09/07/2018 14:20, Pierre Morel wrote:
> On 29/06/2018 23:11, Tony Krowiak wrote:
>> Provides a sysfs interface to view the AP matrix configured for the
>> mediated matrix device.
>>
>> The relevant sysfs structures are:
>>
>> /sys/devices/vfio_ap
>> ... [matrix]
>> ...... [mdev_supported_types]
>> ......... [vfio_ap-passthrough]
>> ............ [devices]
>> ...............[$uuid]
>> .................. matrix
>>
>> To view the matrix configured for the mediated matrix device,
>> print the matrix file:
>>
>> cat matrix
>>
>> Signed-off-by: Tony Krowiak <[email protected]>
>> ---
>> drivers/s390/crypto/vfio_ap_ops.c | 31 +++++++++++++++++++++++++++++++
>> 1 files changed, 31 insertions(+), 0 deletions(-)
>>
>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>> index c8f31f3..bc7398d 100644
>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -697,6 +697,36 @@ static ssize_t control_domains_show(struct device *dev,
>> }
>> DEVICE_ATTR_RO(control_domains);
>>
>> +static ssize_t matrix_show(struct device *dev, struct device_attribute *attr,
>> + char *buf)
>> +{
>> + struct mdev_device *mdev = mdev_from_dev(dev);
>> + struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>> + char *bufpos = buf;
>> + unsigned long apid;
>> + unsigned long apqi;
>> + unsigned long napm = matrix_mdev->matrix.apm_max + 1;
>> + unsigned long naqm = matrix_mdev->matrix.aqm_max + 1;
>> + int nchars = 0;
>> + int n;
>> +
>> + for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, napm) {
>> + n = sprintf(bufpos, "%02lx\n", apid);
>> + bufpos += n;
>> + nchars += n;
>> +
>> + for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, naqm) {
>> + n = sprintf(bufpos, "%02lx.%04lx\n", apid, apqi);
>> + bufpos += n;
>> + nchars += n;
>> + }
>> + }
>> +
>> + return nchars;
>> +}
>> +DEVICE_ATTR_RO(matrix);
>> +
>> +
>> static struct attribute *vfio_ap_mdev_attrs[] = {
>> &dev_attr_assign_adapter.attr,
>> &dev_attr_unassign_adapter.attr,
>> @@ -705,6 +735,7 @@ static ssize_t control_domains_show(struct device *dev,
>> &dev_attr_assign_control_domain.attr,
>> &dev_attr_unassign_control_domain.attr,
>> &dev_attr_control_domains.attr,
>> + &dev_attr_matrix.attr,
>> NULL,
>> };
>>
>
> I still have the same remark: what you show here is not what is currently
> used by the SIE.
> It is not irrelevant, but what the guest really uses may be more interesting
> for the admin.
>
>
OK, you implement the right view in patch 16/21.
Still, what is the purpose of showing this view?
--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany
On 29/06/2018 23:11, Tony Krowiak wrote:
> Registers the matrix device created by the VFIO AP device
> driver with the VFIO mediated device framework.
> Registering the matrix device will create the sysfs
> structures needed to create mediated matrix devices
> each of which will be used to configure the AP matrix
> for a guest and connect it to the VFIO AP device driver.
>
> Registering the matrix device with the VFIO mediated device
> framework will create the following sysfs structures:
>
> /sys/devices/vfio_ap
> ... [matrix]
> ...... [mdev_supported_types]
> ......... [vfio_ap-passthrough]
> ............ create
>
> To create a mediated device for the AP matrix device, write a UUID
> to the create file:
>
> uuidgen > create
>
> A symbolic link to the mediated device's directory will be created in the
> devices subdirectory named after the generated $uuid:
>
> /sys/devices/vfio_ap
> ... [matrix]
> ...... [mdev_supported_types]
> ......... [vfio_ap-passthrough]
> ............ [devices]
> ............... [$uuid]
>
> Signed-off-by: Tony Krowiak <[email protected]>
> ---
> MAINTAINERS | 1 +
> drivers/s390/crypto/Makefile | 2 +-
> drivers/s390/crypto/vfio_ap_drv.c | 9 ++
> drivers/s390/crypto/vfio_ap_ops.c | 131 +++++++++++++++++++++++++++++++++
> drivers/s390/crypto/vfio_ap_private.h | 22 +++++-
> 5 files changed, 161 insertions(+), 4 deletions(-)
> create mode 100644 drivers/s390/crypto/vfio_ap_ops.c
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 0515dae..3217803 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -12410,6 +12410,7 @@ W: http://www.ibm.com/developerworks/linux/linux390/
> S: Supported
> F: drivers/s390/crypto/vfio_ap_drv.c
> F: drivers/s390/crypto/vfio_ap_private.h
> +F: drivers/s390/crypto/vfio_ap_ops.c
>
> S390 ZFCP DRIVER
> M: Steffen Maier <[email protected]>
> diff --git a/drivers/s390/crypto/Makefile b/drivers/s390/crypto/Makefile
> index 48e466e..8d36b05 100644
> --- a/drivers/s390/crypto/Makefile
> +++ b/drivers/s390/crypto/Makefile
> @@ -17,5 +17,5 @@ pkey-objs := pkey_api.o
> obj-$(CONFIG_PKEY) += pkey.o
>
> # adjunct processor matrix
> -vfio_ap-objs := vfio_ap_drv.o
> +vfio_ap-objs := vfio_ap_drv.o vfio_ap_ops.o
> obj-$(CONFIG_VFIO_AP) += vfio_ap.o
> diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
> index 93db312..b6ff7a4 100644
> --- a/drivers/s390/crypto/vfio_ap_drv.c
> +++ b/drivers/s390/crypto/vfio_ap_drv.c
> @@ -127,11 +127,20 @@ int __init vfio_ap_init(void)
> return ret;
> }
>
> + ret = vfio_ap_mdev_register(matrix_dev);
> + if (ret) {
> + ap_driver_unregister(&vfio_ap_drv);
> + vfio_ap_matrix_dev_destroy(matrix_dev);
> +
> + return ret;
> + }
> +
> return 0;
> }
>
> void __exit vfio_ap_exit(void)
> {
> + vfio_ap_mdev_unregister(matrix_dev);
> ap_driver_unregister(&vfio_ap_drv);
> vfio_ap_matrix_dev_destroy(matrix_dev);
> }
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> new file mode 100644
> index 0000000..4e61e33
> --- /dev/null
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -0,0 +1,131 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +/*
> + * Adjunct processor matrix VFIO device driver callbacks.
> + *
> + * Copyright IBM Corp. 2018
> + * Author(s): Tony Krowiak <[email protected]>
> + *
> + */
> +#include <linux/string.h>
> +#include <linux/vfio.h>
> +#include <linux/device.h>
> +#include <linux/list.h>
> +#include <linux/ctype.h>
> +
> +#include "vfio_ap_private.h"
> +
> +#define VFOP_AP_MDEV_TYPE_HWVIRT "passthrough"
Isn't this a typo? VFIO != VFOP
> +#define VFIO_AP_MDEV_NAME_HWVIRT "VFIO AP Passthrough Device"
> +
> +DEFINE_SPINLOCK(mdev_list_lock);
> +LIST_HEAD(mdev_list);
> +
> +static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
> +{
> + struct ap_matrix_dev *matrix_dev =
> + to_ap_matrix_dev(mdev_parent_dev(mdev));
> + struct ap_matrix_mdev *matrix_mdev;
> +
> + matrix_mdev = kzalloc(sizeof(*matrix_mdev), GFP_KERNEL);
> + if (!matrix_mdev)
> + return -ENOMEM;
> +
> + matrix_mdev->name = dev_name(mdev_dev(mdev));
> + mdev_set_drvdata(mdev, matrix_mdev);
> +
> + if (atomic_dec_if_positive(&matrix_dev->available_instances) < 0) {
> + kfree(matrix_mdev);
> + return -EPERM;
> + }
> +
> + spin_lock_bh(&mdev_list_lock);
> + list_add(&matrix_mdev->list, &mdev_list);
> + spin_unlock_bh(&mdev_list_lock);
> +
> + return 0;
> +}
> +
> +static int vfio_ap_mdev_remove(struct mdev_device *mdev)
> +{
> + struct ap_matrix_dev *matrix_dev =
> + to_ap_matrix_dev(mdev_parent_dev(mdev));
> + struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
> +
> + spin_lock_bh(&mdev_list_lock);
> + list_del(&matrix_mdev->list);
> + spin_unlock_bh(&mdev_list_lock);
> + kfree(matrix_mdev);
> + mdev_set_drvdata(mdev, NULL);
> + atomic_inc(&matrix_dev->available_instances);
> +
> + return 0;
> +}
> +
> +static ssize_t name_show(struct kobject *kobj, struct device *dev, char *buf)
> +{
> + return sprintf(buf, "%s\n", VFIO_AP_MDEV_NAME_HWVIRT);
> +}
> +
> +MDEV_TYPE_ATTR_RO(name);
> +
> +static ssize_t available_instances_show(struct kobject *kobj,
> + struct device *dev, char *buf)
> +{
> + struct ap_matrix_dev *matrix_dev = to_ap_matrix_dev(dev);
> +
> + return sprintf(buf, "%d\n",
> + atomic_read(&matrix_dev->available_instances));
> +}
> +
> +MDEV_TYPE_ATTR_RO(available_instances);
> +
> +static ssize_t device_api_show(struct kobject *kobj, struct device *dev,
> + char *buf)
> +{
> + return sprintf(buf, "%s\n", VFIO_DEVICE_API_AP_STRING);
> +}
> +
> +MDEV_TYPE_ATTR_RO(device_api);
> +
> +static struct attribute *vfio_ap_mdev_type_attrs[] = {
> + &mdev_type_attr_name.attr,
> + &mdev_type_attr_device_api.attr,
> + &mdev_type_attr_available_instances.attr,
> + NULL,
> +};
> +
> +static struct attribute_group vfio_ap_mdev_hwvirt_type_group = {
> + .name = VFOP_AP_MDEV_TYPE_HWVIRT,
> + .attrs = vfio_ap_mdev_type_attrs,
> +};
> +
> +static struct attribute_group *vfio_ap_mdev_type_groups[] = {
> + &vfio_ap_mdev_hwvirt_type_group,
> + NULL,
> +};
> +
> +static const struct mdev_parent_ops vfio_ap_matrix_ops = {
> + .owner = THIS_MODULE,
> + .supported_type_groups = vfio_ap_mdev_type_groups,
> + .create = vfio_ap_mdev_create,
> + .remove = vfio_ap_mdev_remove,
> +};
> +
> +int vfio_ap_mdev_register(struct ap_matrix_dev *matrix_dev)
> +{
> + int ret;
> +
> + ret = mdev_register_device(&matrix_dev->device, &vfio_ap_matrix_ops);
> + if (ret)
> + return ret;
> +
> + atomic_set(&matrix_dev->available_instances,
> + AP_MATRIX_MAX_AVAILABLE_INSTANCES);
> +
> + return 0;
> +}
> +
> +void vfio_ap_mdev_unregister(struct ap_matrix_dev *matrix_dev)
> +{
> + mdev_unregister_device(&matrix_dev->device);
> +}
> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
> index 19c0b60..3de1275 100644
> --- a/drivers/s390/crypto/vfio_ap_private.h
> +++ b/drivers/s390/crypto/vfio_ap_private.h
> @@ -10,20 +10,36 @@
> #define _VFIO_AP_PRIVATE_H_
>
> #include <linux/types.h>
> +#include <linux/device.h>
> +#include <linux/mdev.h>
>
> #include "ap_bus.h"
>
> #define VFIO_AP_MODULE_NAME "vfio_ap"
> #define VFIO_AP_DRV_NAME "vfio_ap"
> +/**
> + * There must be one mediated matrix device for every guest using AP devices.
> + * If every APQN is assigned to a guest, then the maximum number of guests with
> + * a unique APQN assigned would be 255 adapters x 255 domains = 72351 guests.
> + */
> +#define AP_MATRIX_MAX_AVAILABLE_INSTANCES 72351
>
> struct ap_matrix_dev {
> struct device device;
> + atomic_t available_instances;
> +};
> +
> +struct ap_matrix_mdev {
> + const char *name;
> + struct list_head list;
> };
>
> -static inline struct ap_matrix_dev
> -*to_ap_matrix_parent_dev(struct device *dev)
> +static struct ap_matrix_dev *to_ap_matrix_dev(struct device *dev)
> {
> - return container_of(dev, struct ap_matrix_dev, device.parent);
> + return container_of(dev, struct ap_matrix_dev, device);
> }
>
> +extern int vfio_ap_mdev_register(struct ap_matrix_dev *matrix_dev);
> +extern void vfio_ap_mdev_unregister(struct ap_matrix_dev *matrix_dev);
> +
> #endif /* _VFIO_AP_PRIVATE_H_ */
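vfio_ap_mdev_create() claims one of the available_instances with atomic_dec_if_positive(), which decrements the counter only if the result stays non-negative and returns the old value minus one either way; vfio_ap_mdev_remove() gives the slot back with atomic_inc(). A user-space sketch of that accounting with C11 atomics follows; dec_if_positive(), reserve_instance() and release_instance() are hypothetical names.

```c
#include <errno.h>
#include <stdatomic.h>

/*
 * User-space analogue of atomic_dec_if_positive(): decrement *v only
 * if the result stays >= 0; return the old value minus one either way.
 */
static int dec_if_positive(atomic_int *v)
{
	int old = atomic_load(v);
	int dec;

	do {
		dec = old - 1;
		if (dec < 0)
			break;	/* would go negative: leave *v alone */
		/* on failure, old is reloaded with the current value */
	} while (!atomic_compare_exchange_weak(v, &old, dec));

	return dec;
}

/* Claim one instance slot; -EPERM when none are left, as in the patch. */
static int reserve_instance(atomic_int *available)
{
	return dec_if_positive(available) < 0 ? -EPERM : 0;
}

/* Return a slot when a mediated device is removed. */
static void release_instance(atomic_int *available)
{
	atomic_fetch_add(available, 1);
}
```

One ordering worth considering: in the patch, mdev_set_drvdata() runs before the availability check, so on the -EPERM path the drvdata is left pointing at the freed matrix_mdev. Reserving the slot first, then allocating and setting drvdata (and releasing the slot if kzalloc() fails), avoids that.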
--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany
On 07/09/2018 05:21 AM, Pierre Morel wrote:
> On 03/07/2018 01:10, Halil Pasic wrote:
>>
>>
>> On 06/29/2018 11:11 PM, Tony Krowiak wrote:
>>> This patch provides documentation describing the AP architecture and
>>> design concepts behind the virtualization of AP devices. It also
>>> includes an example of how to configure AP devices for exclusive
>>> use of KVM guests.
>>>
>>> Signed-off-by: Tony Krowiak <[email protected]>
>>
>> I don't like the design of external interfaces except for:
>> * cpu model features, and
>> * reset handling.
>>
>> In particular:
>>
>>
> ...snip...
>
>> 4) If I were to act out the role of the administrator, I would prefer to think of
>> specifying or changing the access controls of a guest in respect to AP (that is
>> setting the AP matrix) as a single atomic operation -- which either succeeds or fails.
>>
>> The operation should succeed for any valid configuration, and fail for any invalid
>> one.
>>
>> The current piecemeal approach seems even less fitting if we consider changing the
>> access controls of a running guest. AFAIK changing access controls for a running
>> guest is possible, and I don't see why we should artificially prohibit this.
>>
>> I think the current sysfs interface for manipulating the matrix is good for
>> manual playing around, but I would prefer having an interface that is better
>> suited for programs (e.g. ioctl).
>
> I disagree with using ioctl.
Why? What speaks against ioctl?
> I agree that the current implementation is not right.
> The configuration of APM and AQM should always be guaranteed to be coherent
> within the host, but this can be done by performing the right checks when using sysfs.
>
I'm glad we agree on this one at least.
Regards,
Halil
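Point 4 above asks that a matrix change either fully succeed or fail. One way to get that, sketched here with hypothetical types and helpers (not the driver's code), is to validate the complete cross product of the proposed APM and AQM first and only then replace the stored matrix. A real implementation would also need to hold a lock across validation and commit, and to check for overlap with other mediated devices; both are omitted here.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define AP_MASK_BYTES 32	/* 256 bits per mask */

/* Hypothetical matrix: adapter mask and domain mask, MSB-first. */
struct ap_matrix_sketch {
	unsigned char apm[AP_MASK_BYTES];
	unsigned char aqm[AP_MASK_BYTES];
};

static bool bit_test(const unsigned char *mask, unsigned int nr)
{
	return mask[nr / 8] & (0x80u >> (nr % 8));
}

/* Caller-supplied check: is queue (apid, apqi) reserved for passthrough? */
typedef bool (*reserved_fn)(unsigned int apid, unsigned int apqi);

/* Example policy for illustration: adapters 3-4 x domains 1-2 reserved. */
static bool example_reserved(unsigned int apid, unsigned int apqi)
{
	return apid >= 3 && apid <= 4 && apqi >= 1 && apqi <= 2;
}

/* Every APQN in the cross product of the proposal must be reserved. */
static bool matrix_valid(const struct ap_matrix_sketch *m,
			 reserved_fn reserved)
{
	for (unsigned int a = 0; a < AP_MASK_BYTES * 8; a++) {
		if (!bit_test(m->apm, a))
			continue;
		for (unsigned int q = 0; q < AP_MASK_BYTES * 8; q++)
			if (bit_test(m->aqm, q) && !reserved(a, q))
				return false;
	}
	return true;
}

/* Replace the whole matrix, or change nothing if the proposal is invalid. */
static int matrix_update(struct ap_matrix_sketch *cur,
			 const struct ap_matrix_sketch *proposed,
			 reserved_fn reserved)
{
	if (!matrix_valid(proposed, reserved))
		return -EPERM;
	memcpy(cur, proposed, sizeof(*cur));
	return 0;
}
```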
On 09.07.2018 16:17, Pierre Morel wrote:
> On 29/06/2018 23:11, Tony Krowiak wrote:
>> Registers the matrix device created by the VFIO AP device
>> driver with the VFIO mediated device framework.
>> Registering the matrix device will create the sysfs
>> structures needed to create mediated matrix devices
>> each of which will be used to configure the AP matrix
>> for a guest and connect it to the VFIO AP device driver.
>>
>> Registering the matrix device with the VFIO mediated device
>> framework will create the following sysfs structures:
>>
>> /sys/devices/vfio_ap
>> ... [matrix]
>> ...... [mdev_supported_types]
>> ......... [vfio_ap-passthrough]
>> ............ create
>>
>> To create a mediated device for the AP matrix device, write a UUID
>> to the create file:
>>
>> uuidgen > create
>>
>> A symbolic link to the mediated device's directory will be created in the
>> devices subdirectory named after the generated $uuid:
>>
>> /sys/devices/vfio_ap
>> ... [matrix]
>> ...... [mdev_supported_types]
>> ......... [vfio_ap-passthrough]
>> ............ [devices]
>> ............... [$uuid]
>>
>> Signed-off-by: Tony Krowiak <[email protected]>
>> ---
>> MAINTAINERS | 1 +
>> drivers/s390/crypto/Makefile | 2 +-
>> drivers/s390/crypto/vfio_ap_drv.c | 9 ++
>> drivers/s390/crypto/vfio_ap_ops.c | 131 +++++++++++++++++++++++++++++++++
>> drivers/s390/crypto/vfio_ap_private.h | 22 +++++-
>> 5 files changed, 161 insertions(+), 4 deletions(-)
>> create mode 100644 drivers/s390/crypto/vfio_ap_ops.c
>>
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index 0515dae..3217803 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -12410,6 +12410,7 @@ W: http://www.ibm.com/developerworks/linux/linux390/
>> S: Supported
>> F: drivers/s390/crypto/vfio_ap_drv.c
>> F: drivers/s390/crypto/vfio_ap_private.h
>> +F: drivers/s390/crypto/vfio_ap_ops.c
>>
>> S390 ZFCP DRIVER
>> M: Steffen Maier <[email protected]>
>> diff --git a/drivers/s390/crypto/Makefile b/drivers/s390/crypto/Makefile
>> index 48e466e..8d36b05 100644
>> --- a/drivers/s390/crypto/Makefile
>> +++ b/drivers/s390/crypto/Makefile
>> @@ -17,5 +17,5 @@ pkey-objs := pkey_api.o
>> obj-$(CONFIG_PKEY) += pkey.o
>>
>> # adjunct processor matrix
>> -vfio_ap-objs := vfio_ap_drv.o
>> +vfio_ap-objs := vfio_ap_drv.o vfio_ap_ops.o
>> obj-$(CONFIG_VFIO_AP) += vfio_ap.o
>> diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
>> index 93db312..b6ff7a4 100644
>> --- a/drivers/s390/crypto/vfio_ap_drv.c
>> +++ b/drivers/s390/crypto/vfio_ap_drv.c
>> @@ -127,11 +127,20 @@ int __init vfio_ap_init(void)
>> return ret;
>> }
>>
>> + ret = vfio_ap_mdev_register(matrix_dev);
>> + if (ret) {
>> + ap_driver_unregister(&vfio_ap_drv);
>> + vfio_ap_matrix_dev_destroy(matrix_dev);
>> +
>> + return ret;
>> + }
>> +
>> return 0;
>> }
>>
>> void __exit vfio_ap_exit(void)
>> {
>> + vfio_ap_mdev_unregister(matrix_dev);
>> ap_driver_unregister(&vfio_ap_drv);
>> vfio_ap_matrix_dev_destroy(matrix_dev);
>> }
>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>> new file mode 100644
>> index 0000000..4e61e33
>> --- /dev/null
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -0,0 +1,131 @@
>> +// SPDX-License-Identifier: GPL-2.0+
>> +/*
>> + * Adjunct processor matrix VFIO device driver callbacks.
>> + *
>> + * Copyright IBM Corp. 2018
>> + * Author(s): Tony Krowiak <[email protected]>
>> + *
>> + */
>> +#include <linux/string.h>
>> +#include <linux/vfio.h>
>> +#include <linux/device.h>
>> +#include <linux/list.h>
>> +#include <linux/ctype.h>
>> +
>> +#include "vfio_ap_private.h"
>> +
>> +#define VFOP_AP_MDEV_TYPE_HWVIRT "passthrough"
>> +#define VFIO_AP_MDEV_NAME_HWVIRT "VFIO AP Passthrough Device"
>> +
>> +DEFINE_SPINLOCK(mdev_list_lock);
>> +LIST_HEAD(mdev_list);
>> +
>> +static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
>> +{
>> + struct ap_matrix_dev *matrix_dev =
>> + to_ap_matrix_dev(mdev_parent_dev(mdev));
>> + struct ap_matrix_mdev *matrix_mdev;
>> +
>> + matrix_mdev = kzalloc(sizeof(*matrix_mdev), GFP_KERNEL);
>> + if (!matrix_mdev)
>> + return -ENOMEM;
>> +
>> + matrix_mdev->name = dev_name(mdev_dev(mdev));
>> + mdev_set_drvdata(mdev, matrix_mdev);
>> +
>> + if (atomic_dec_if_positive(&matrix_dev->available_instances) < 0) {
>> + kfree(matrix_mdev);
>> + return -EPERM;
>> + }
>> +
>> + spin_lock_bh(&mdev_list_lock);
>> + list_add(&matrix_mdev->list, &mdev_list);
>> + spin_unlock_bh(&mdev_list_lock);
>> +
>> + return 0;
>> +}
>> +
>> +static int vfio_ap_mdev_remove(struct mdev_device *mdev)
>> +{
>> + struct ap_matrix_dev *matrix_dev =
>> + to_ap_matrix_dev(mdev_parent_dev(mdev));
>> + struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>> +
>> + spin_lock_bh(&mdev_list_lock);
>> + list_del(&matrix_mdev->list);
>> + spin_unlock_bh(&mdev_list_lock);
>> + kfree(matrix_mdev);
>> + mdev_set_drvdata(mdev, NULL);
>> + atomic_inc(&matrix_dev->available_instances);
>> +
>> + return 0;
>> +}
>> +
>> +static ssize_t name_show(struct kobject *kobj, struct device *dev, char *buf)
>> +{
>> + return sprintf(buf, "%s\n", VFIO_AP_MDEV_NAME_HWVIRT);
>> +}
>> +
>> +MDEV_TYPE_ATTR_RO(name);
>> +
>> +static ssize_t available_instances_show(struct kobject *kobj,
>> + struct device *dev, char *buf)
>> +{
>> + struct ap_matrix_dev *matrix_dev = to_ap_matrix_dev(dev);
>> +
>> + return sprintf(buf, "%d\n",
>> + atomic_read(&matrix_dev->available_instances));
>> +}
>> +
>> +MDEV_TYPE_ATTR_RO(available_instances);
>> +
>> +static ssize_t device_api_show(struct kobject *kobj, struct device *dev,
>> + char *buf)
>> +{
>> + return sprintf(buf, "%s\n", VFIO_DEVICE_API_AP_STRING);
>> +}
>> +
>> +MDEV_TYPE_ATTR_RO(device_api);
>> +
>> +static struct attribute *vfio_ap_mdev_type_attrs[] = {
>> + &mdev_type_attr_name.attr,
>> + &mdev_type_attr_device_api.attr,
>> + &mdev_type_attr_available_instances.attr,
>> + NULL,
>> +};
>> +
>> +static struct attribute_group vfio_ap_mdev_hwvirt_type_group = {
>> + .name = VFOP_AP_MDEV_TYPE_HWVIRT,
>> + .attrs = vfio_ap_mdev_type_attrs,
>> +};
>> +
>> +static struct attribute_group *vfio_ap_mdev_type_groups[] = {
>> + &vfio_ap_mdev_hwvirt_type_group,
>> + NULL,
>> +};
>> +
>> +static const struct mdev_parent_ops vfio_ap_matrix_ops = {
>> + .owner = THIS_MODULE,
>> + .supported_type_groups = vfio_ap_mdev_type_groups,
>> + .create = vfio_ap_mdev_create,
>> + .remove = vfio_ap_mdev_remove,
>> +};
>> +
>> +int vfio_ap_mdev_register(struct ap_matrix_dev *matrix_dev)
>> +{
>> + int ret;
>> +
>> + ret = mdev_register_device(&matrix_dev->device, &vfio_ap_matrix_ops);
>> + if (ret)
>> + return ret;
>> +
>> + atomic_set(&matrix_dev->available_instances,
>> + AP_MATRIX_MAX_AVAILABLE_INSTANCES);
>> +
>> + return 0;
>> +}
>> +
>> +void vfio_ap_mdev_unregister(struct ap_matrix_dev *matrix_dev)
>> +{
>> + mdev_unregister_device(&matrix_dev->device);
>> +}
>> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
>> index 19c0b60..3de1275 100644
>> --- a/drivers/s390/crypto/vfio_ap_private.h
>> +++ b/drivers/s390/crypto/vfio_ap_private.h
>> @@ -10,20 +10,36 @@
>> #define _VFIO_AP_PRIVATE_H_
>>
>> #include <linux/types.h>
>> +#include <linux/device.h>
>> +#include <linux/mdev.h>
>>
>> #include "ap_bus.h"
>>
>> #define VFIO_AP_MODULE_NAME "vfio_ap"
>> #define VFIO_AP_DRV_NAME "vfio_ap"
>> +/**
>> + * There must be one mediated matrix device for every guest using AP devices.
>> + * If every APQN is assigned to a guest, then the maximum number of guests with
>> + * a unique APQN assigned would be 255 adapters x 255 domains = 72351 guests.
>> + */
>> +#define AP_MATRIX_MAX_AVAILABLE_INSTANCES 72351
>
> Why isn't it 256 x 256 ?
In zcrypt.h there are defines for these:
#define MAX_ZDEV_CARDIDS_EXT 256
#define MAX_ZDEV_DOMAINS_EXT 256
/* Maximum number of zcrypt devices */
#define MAX_ZDEV_ENTRIES_EXT (MAX_ZDEV_CARDIDS_EXT * MAX_ZDEV_DOMAINS_EXT)
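Using the zcrypt.h limits, the bound Harald suggests works out as follows. This is a sketch only: the defines are copied locally so the fragment compiles on its own, and `apqn_count()` is a hypothetical helper. Card and domain IDs each run from 0 to 255, which is 256 values per dimension, so the bound is 256 x 256 = 65536. Note also that 255 x 255 would be 65025, not the 72351 stated in the patch comment.

```c
#include <assert.h>

/* Copied from zcrypt.h so the fragment compiles on its own; the driver would
 * use the header's definitions instead of redefining them. */
#define MAX_ZDEV_CARDIDS_EXT 256
#define MAX_ZDEV_DOMAINS_EXT 256
#define MAX_ZDEV_ENTRIES_EXT (MAX_ZDEV_CARDIDS_EXT * MAX_ZDEV_DOMAINS_EXT)

/* One mdev per guest and, at most, one guest per APQN (card, domain). */
#define AP_MATRIX_MAX_AVAILABLE_INSTANCES MAX_ZDEV_ENTRIES_EXT

/* Hypothetical helper: number of APQNs for card IDs 0..max_card and
 * domain IDs 0..max_dom (both ranges are inclusive). */
static unsigned long apqn_count(unsigned long max_card, unsigned long max_dom)
{
	assert(max_card < MAX_ZDEV_CARDIDS_EXT && max_dom < MAX_ZDEV_DOMAINS_EXT);
	return (max_card + 1) * (max_dom + 1);
}
```

Deriving the constant from the zcrypt.h defines keeps the two in sync if the architectural limits ever change.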
>> struct ap_matrix_dev {
>> struct device device;
>> + atomic_t available_instances;
>> +};
>> +
>> +struct ap_matrix_mdev {
>> + const char *name;
>> + struct list_head list;
>> };
>>
>> -static inline struct ap_matrix_dev
>> -*to_ap_matrix_parent_dev(struct device *dev)
>> +static struct ap_matrix_dev *to_ap_matrix_dev(struct device *dev)
>> {
>> - return container_of(dev, struct ap_matrix_dev, device.parent);
>> + return container_of(dev, struct ap_matrix_dev, device);
>> }
>>
>> +extern int vfio_ap_mdev_register(struct ap_matrix_dev *matrix_dev);
>> +extern void vfio_ap_mdev_unregister(struct ap_matrix_dev *matrix_dev);
>> +
>> #endif /* _VFIO_AP_PRIVATE_H_ */
>
>
On 09/07/2018 17:50, Halil Pasic wrote:
>
>
> On 07/09/2018 05:21 AM, Pierre Morel wrote:
>> On 03/07/2018 01:10, Halil Pasic wrote:
>>>
>>>
>>> On 06/29/2018 11:11 PM, Tony Krowiak wrote:
>>>> This patch provides documentation describing the AP architecture and
>>>> design concepts behind the virtualization of AP devices. It also
>>>> includes an example of how to configure AP devices for exclusive
>>>> use of KVM guests.
>>>>
>>>> Signed-off-by: Tony Krowiak <[email protected]>
>>>
>>> I don't like the design of external interfaces except for:
>>> * cpu model features, and
>>> * reset handling.
>>>
>>> In particular:
>>>
>>>
>> ...snip...
>>
>>> 4) If I were to act out the role of the administrator, I would
>>> prefer to think of specifying or changing the access controls of a
>>> guest with respect to AP (that is, setting the AP matrix) as a single
>>> atomic operation, which either succeeds or fails.
>>>
>>> The operation should succeed for any valid configuration, and fail
>>> for any invalid one.
>>>
>>> The current piecemeal approach seems even less fitting if we
>>> consider changing the access controls of a running guest. AFAIK
>>> changing access controls for a running guest is possible, and I don't
>>> see a reason why we should artificially prohibit this.
>>>
>>> I think the current sysfs interface for manipulating the matrix is
>>> good for manual playing around, but I would prefer having an
>>> interface that is better suited for programs (e.g. ioctl).
>>
>> I disagree with using ioctl.
>
> Why? What speaks against ioctl?
A sysfs interface is easy to use and can be driven from any interpreted
language. It has become the standard way to configure drivers.
Even though the existing interface must be consolidated, it exists and is
functional.
From what I understood, the problem is to have an atomic update of the
matrix to avoid a possible deadlock when two admin tasks try to configure
a matrix for a guest.
That is a userland problem, not a kernel problem.
There are several ways to avoid it while still keeping a sysfs interface:
- the admin tasks may use a lock
- the admin tasks may follow a policy such as releasing the resources they
  own if they cannot get all of them.
A step-by-step configuration also makes it easy to identify the missing
resource in case of failure.
Regards,
Pierre
>
>> I agree that the current implementation is not right.
>> The configuration of APM and AQM should always be guaranteed to be coherent
>> within the host, but that can be done by doing the right checks when using
>> the sysfs.
>>
>
> I'm glad we agree on this one at least.
>
> Regards,
> Halil
--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany
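Pierre's first option, where the admin tasks serialize on a lock and configure the matrix step by step, could look roughly like the userspace sketch below. It rests on assumptions: the lock-file location and the attribute files written are chosen by the caller and are hypothetical, `flock(2)` provides the mutual exclusion, and the function reports which step failed, which is the diagnostic benefit Pierre mentions. The lock only serializes admin tools that cooperate by taking the same lock; it does not make the kernel-side update atomic.

```c
#define _DEFAULT_SOURCE		/* flock() and mkstemp() under strict -std */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/file.h>
#include <unistd.h>

/* One configuration step: write `value` to the attribute file at `path`. */
struct cfg_step {
	const char *path;
	const char *value;
};

/* Write a string to an attribute file; returns 0 or a negative errno. */
static int write_attr(const char *path, const char *value)
{
	size_t len = strlen(value);
	ssize_t n;
	int fd = open(path, O_WRONLY);

	if (fd < 0)
		return -errno;
	n = write(fd, value, len);	/* sysfs store errors surface here */
	if (n < 0 || (size_t)n != len) {
		int err = n < 0 ? -errno : -EIO;

		close(fd);
		return err;
	}
	return close(fd) < 0 ? -errno : 0;
}

/*
 * Apply all steps while holding an exclusive flock() on `lock_path`, so that
 * admin tools taking the same lock cannot interleave their updates.
 * Returns 0 on success or a negative errno; *failed_step is set to the index
 * of the step that failed, or -1 if no step failed.
 */
static int apply_steps_locked(const char *lock_path,
			      const struct cfg_step *steps, int nsteps,
			      int *failed_step)
{
	int ret = 0;
	int lock_fd;

	assert(failed_step != NULL);
	*failed_step = -1;
	lock_fd = open(lock_path, O_RDWR | O_CREAT, 0600);
	if (lock_fd < 0)
		return -errno;
	if (flock(lock_fd, LOCK_EX) < 0) {
		ret = -errno;
		close(lock_fd);
		return ret;
	}
	for (int i = 0; i < nsteps; i++) {
		ret = write_attr(steps[i].path, steps[i].value);
		if (ret < 0) {
			*failed_step = i;	/* identifies the missing resource */
			break;
		}
	}
	flock(lock_fd, LOCK_UN);
	close(lock_fd);
	return ret;
}
```

Pierre's second option, releasing already-acquired resources when a later step fails, would add an undo write per completed step; that is not shown here.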
On 06/29/2018 11:11 PM, Tony Krowiak wrote:
> From: Tony Krowiak <[email protected]>
>
> Configures the AP adapters, usage domains and control domains for the
> KVM guest from the matrix configured via the mediated matrix device's
> sysfs attribute files.
>
[..]
> +
> +static void kvm_ap_set_crycb_masks(struct ap_matrix_mdev *matrix_mdev)
> +{
> + int nbytes;
> + unsigned long *apm, *aqm, *adm;
> +
> + kvm_ap_clear_crycb_masks(matrix_mdev);
> +
> + apm = kvm_ap_get_crycb_apm(matrix_mdev);
> + aqm = kvm_ap_get_crycb_aqm(matrix_mdev);
> + adm = kvm_ap_get_crycb_adm(matrix_mdev);
> +
> + nbytes = KVM_AP_MASK_BYTES(matrix_mdev->matrix.apm_max + 1);
> + memcpy(apm, matrix_mdev->matrix.apm, nbytes);
> +
> + nbytes = KVM_AP_MASK_BYTES(matrix_mdev->matrix.aqm_max + 1);
> + memcpy(aqm, matrix_mdev->matrix.aqm, nbytes);
> +
> + /*
> + * Merge the AQM and ADM since the ADM is a superset of the
> + * AQM by agreed-upon convention.
> + */
> + bitmap_or(adm, matrix_mdev->matrix.adm, matrix_mdev->matrix.aqm,
> + matrix_mdev->matrix.adm_max + 1);
Are you sure this OR works as expected? E.g. if adm_max == 15, the bitmap_or()
covers the least significant 2 bytes, but you want the other two.
> +}
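To make the concern concrete: the CRYCB masks use MSB-0 bit numbering (the `test_bit_inv()` call in this series reflects that), so on big-endian s390 domains 0..15 live in the first two bytes of the mask. The generic bitmap helpers, by contrast, count `nbits` from the least significant bit of each `unsigned long`. The sketch below only computes which byte of a 64-bit big-endian word each numbering scheme refers to; both helper names are hypothetical.

```c
#include <assert.h>

/* Byte offset, within a 64-bit big-endian word, holding MSB-0 bit n:
 * the numbering the AP masks use (bit 0 is the leftmost bit). */
static unsigned int msb0_bit_to_byte(unsigned int n)
{
	assert(n < 64);
	return n / 8;
}

/* Byte offset, within a 64-bit big-endian word, holding LSB-0 bit k:
 * the numbering the generic bitmap API uses inside an unsigned long
 * (bit 0 is the least significant bit, stored in the last byte on BE). */
static unsigned int lsb0_bit_to_be_byte(unsigned int k)
{
	assert(k < 64);
	return 7 - k / 8;
}
```

With adm_max == 15, domains 0..15 sit in bytes 0 and 1, while bits 0..15 in the bitmap API's sense sit in bytes 6 and 7: the "other two" bytes.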
On 07/09/2018 11:21 AM, Pierre Morel wrote:
> On 03/07/2018 01:10, Halil Pasic wrote:
>>
>>
>> On 06/29/2018 11:11 PM, Tony Krowiak wrote:
>>> This patch provides documentation describing the AP architecture and
>>> design concepts behind the virtualization of AP devices. It also
>>> includes an example of how to configure AP devices for exclusive
>>> use of KVM guests.
>>>
>>> Signed-off-by: Tony Krowiak <[email protected]>
>>
>> I don't like the design of external interfaces except for:
>> * cpu model features, and
>> * reset handling.
>>
>> In particular:
>>
>>
> ...snip...
>
>> 4) If I were to act out the role of the administrator, I would prefer
>> to think of specifying or changing the access controls of a guest with
>> respect to AP (that is, setting the AP matrix) as a single atomic
>> operation, which either succeeds or fails.
>>
>> The operation should succeed for any valid configuration, and fail
>> for any invalid one.
>>
>> The current piecemeal approach seems even less fitting if we consider
>> changing the access controls of a running guest. AFAIK changing access
>> controls for a running guest is possible, and I don't see a reason why
>> we should artificially prohibit this.
>>
>> I think the current sysfs interface for manipulating the matrix is
>> good for manual playing around, but I would prefer having an interface
>> that is better suited for programs (e.g. ioctl).
>
> I disagree with using ioctl.
> I agree that the current implementation is not right.
> The configuration of APM and AQM should always be guaranteed to be coherent
> within the host, but that can be done by doing the right checks when using
> the sysfs.
What sysfs interfaces do you suggest?
>
>
> Regards,
>
> Pierre
>
On 07/10/2018 09:03 AM, Harald Freudenberger wrote:
> On 09.07.2018 16:17, Pierre Morel wrote:
>> On 29/06/2018 23:11, Tony Krowiak wrote:
>>> Registers the matrix device created by the VFIO AP device
>>> driver with the VFIO mediated device framework.
>>> Registering the matrix device will create the sysfs
>>> structures needed to create mediated matrix devices
>>> each of which will be used to configure the AP matrix
>>> for a guest and connect it to the VFIO AP device driver.
>>>
>>> Registering the matrix device with the VFIO mediated device
>>> framework will create the following sysfs structures:
>>>
>>> /sys/devices/vfio_ap
>>> ... [matrix]
>>> ...... [mdev_supported_types]
>>> ......... [vfio_ap-passthrough]
>>> ............ create
>>>
>>> To create a mediated device for the AP matrix device, write a UUID
>>> to the create file:
>>>
>>> uuidgen > create
>>>
>>> A symbolic link to the mediated device's directory will be created in the
>>> devices subdirectory named after the generated $uuid:
>>>
>>> /sys/devices/vfio_ap
>>> ... [matrix]
>>> ...... [mdev_supported_types]
>>> ......... [vfio_ap-passthrough]
>>> ............ [devices]
>>> ............... [$uuid]
>>>
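For tooling that does not shell out to `uuidgen`, the create step above could be sketched in C as follows. The path is the one shown in the commit message. `mdev_create()` and `uuid_is_wellformed()` are hypothetical helper names, and the UUID is supplied by the caller. Because a sysfs store error is returned from `write()`, an error from the create callback (such as the -EPERM from `vfio_ap_mdev_create()` when no instances are left) would be expected to surface there.

```c
#define _DEFAULT_SOURCE		/* mkstemp() under strict -std, for testing */
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Path as given in the commit message above. */
#define VFIO_AP_CREATE_PATH \
	"/sys/devices/vfio_ap/matrix/mdev_supported_types/vfio_ap-passthrough/create"

/* Accept only the canonical 8-4-4-4-12 hexadecimal UUID form. */
static int uuid_is_wellformed(const char *s)
{
	if (strlen(s) != 36)
		return 0;
	for (int i = 0; i < 36; i++) {
		if (i == 8 || i == 13 || i == 18 || i == 23) {
			if (s[i] != '-')
				return 0;
		} else if (!isxdigit((unsigned char)s[i])) {
			return 0;
		}
	}
	return 1;
}

/* Like `uuidgen > create`, but with a caller-supplied UUID.
 * Returns 0 or a negative errno value. */
static int mdev_create(const char *create_path, const char *uuid)
{
	size_t len;
	ssize_t n;
	int fd;

	assert(create_path != NULL && uuid != NULL);
	if (!uuid_is_wellformed(uuid))
		return -EINVAL;
	len = strlen(uuid);
	fd = open(create_path, O_WRONLY);
	if (fd < 0)
		return -errno;
	n = write(fd, uuid, len);	/* create callback errors land here */
	if (n < 0) {
		int err = -errno;

		close(fd);
		return err;
	}
	close(fd);
	return (size_t)n == len ? 0 : -EIO;
}
```

A caller would pass `VFIO_AP_CREATE_PATH` and a freshly generated UUID.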
>>> Signed-off-by: Tony Krowiak <[email protected]>
>>> ---
>>> MAINTAINERS | 1 +
>>> drivers/s390/crypto/Makefile | 2 +-
>>> drivers/s390/crypto/vfio_ap_drv.c | 9 ++
>>> drivers/s390/crypto/vfio_ap_ops.c | 131 +++++++++++++++++++++++++++++++++
>>> drivers/s390/crypto/vfio_ap_private.h | 22 +++++-
>>> 5 files changed, 161 insertions(+), 4 deletions(-)
>>> create mode 100644 drivers/s390/crypto/vfio_ap_ops.c
>>>
>>> diff --git a/MAINTAINERS b/MAINTAINERS
>>> index 0515dae..3217803 100644
>>> --- a/MAINTAINERS
>>> +++ b/MAINTAINERS
>>> @@ -12410,6 +12410,7 @@ W: http://www.ibm.com/developerworks/linux/linux390/
>>> S: Supported
>>> F: drivers/s390/crypto/vfio_ap_drv.c
>>> F: drivers/s390/crypto/vfio_ap_private.h
>>> +F: drivers/s390/crypto/vfio_ap_ops.c
>>>
>>> S390 ZFCP DRIVER
>>> M: Steffen Maier <[email protected]>
>>> diff --git a/drivers/s390/crypto/Makefile b/drivers/s390/crypto/Makefile
>>> index 48e466e..8d36b05 100644
>>> --- a/drivers/s390/crypto/Makefile
>>> +++ b/drivers/s390/crypto/Makefile
>>> @@ -17,5 +17,5 @@ pkey-objs := pkey_api.o
>>> obj-$(CONFIG_PKEY) += pkey.o
>>>
>>> # adjunct processor matrix
>>> -vfio_ap-objs := vfio_ap_drv.o
>>> +vfio_ap-objs := vfio_ap_drv.o vfio_ap_ops.o
>>> obj-$(CONFIG_VFIO_AP) += vfio_ap.o
>>> diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
>>> index 93db312..b6ff7a4 100644
>>> --- a/drivers/s390/crypto/vfio_ap_drv.c
>>> +++ b/drivers/s390/crypto/vfio_ap_drv.c
>>> @@ -127,11 +127,20 @@ int __init vfio_ap_init(void)
>>> return ret;
>>> }
>>>
>>> + ret = vfio_ap_mdev_register(matrix_dev);
>>> + if (ret) {
>>> + ap_driver_unregister(&vfio_ap_drv);
>>> + vfio_ap_matrix_dev_destroy(matrix_dev);
>>> +
>>> + return ret;
>>> + }
>>> +
>>> return 0;
>>> }
>>>
>>> void __exit vfio_ap_exit(void)
>>> {
>>> + vfio_ap_mdev_unregister(matrix_dev);
>>> ap_driver_unregister(&vfio_ap_drv);
>>> vfio_ap_matrix_dev_destroy(matrix_dev);
>>> }
>>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>>> new file mode 100644
>>> index 0000000..4e61e33
>>> --- /dev/null
>>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>>> @@ -0,0 +1,131 @@
>>> +// SPDX-License-Identifier: GPL-2.0+
>>> +/*
>>> + * Adjunct processor matrix VFIO device driver callbacks.
>>> + *
>>> + * Copyright IBM Corp. 2018
>>> + * Author(s): Tony Krowiak <[email protected]>
>>> + *
>>> + */
>>> +#include <linux/string.h>
>>> +#include <linux/vfio.h>
>>> +#include <linux/device.h>
>>> +#include <linux/list.h>
>>> +#include <linux/ctype.h>
>>> +
>>> +#include "vfio_ap_private.h"
>>> +
>>> +#define VFOP_AP_MDEV_TYPE_HWVIRT "passthrough"
>>> +#define VFIO_AP_MDEV_NAME_HWVIRT "VFIO AP Passthrough Device"
>>> +
>>> +DEFINE_SPINLOCK(mdev_list_lock);
>>> +LIST_HEAD(mdev_list);
>>> +
>>> +static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
>>> +{
>>> + struct ap_matrix_dev *matrix_dev =
>>> + to_ap_matrix_dev(mdev_parent_dev(mdev));
>>> + struct ap_matrix_mdev *matrix_mdev;
>>> +
>>> + matrix_mdev = kzalloc(sizeof(*matrix_mdev), GFP_KERNEL);
>>> + if (!matrix_mdev)
>>> + return -ENOMEM;
>>> +
>>> + matrix_mdev->name = dev_name(mdev_dev(mdev));
>>> + mdev_set_drvdata(mdev, matrix_mdev);
>>> +
>>> + if (atomic_dec_if_positive(&matrix_dev->available_instances) < 0) {
>>> + kfree(matrix_mdev);
>>> + return -EPERM;
>>> + }
>>> +
>>> + spin_lock_bh(&mdev_list_lock);
>>> + list_add(&matrix_mdev->list, &mdev_list);
>>> + spin_unlock_bh(&mdev_list_lock);
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +static int vfio_ap_mdev_remove(struct mdev_device *mdev)
>>> +{
>>> + struct ap_matrix_dev *matrix_dev =
>>> + to_ap_matrix_dev(mdev_parent_dev(mdev));
>>> + struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>>> +
>>> + spin_lock_bh(&mdev_list_lock);
>>> + list_del(&matrix_mdev->list);
>>> + spin_unlock_bh(&mdev_list_lock);
>>> + kfree(matrix_mdev);
>>> + mdev_set_drvdata(mdev, NULL);
>>> + atomic_inc(&matrix_dev->available_instances);
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +static ssize_t name_show(struct kobject *kobj, struct device *dev, char *buf)
>>> +{
>>> + return sprintf(buf, "%s\n", VFIO_AP_MDEV_NAME_HWVIRT);
>>> +}
>>> +
>>> +MDEV_TYPE_ATTR_RO(name);
>>> +
>>> +static ssize_t available_instances_show(struct kobject *kobj,
>>> + struct device *dev, char *buf)
>>> +{
>>> + struct ap_matrix_dev *matrix_dev = to_ap_matrix_dev(dev);
>>> +
>>> + return sprintf(buf, "%d\n",
>>> + atomic_read(&matrix_dev->available_instances));
>>> +}
>>> +
>>> +MDEV_TYPE_ATTR_RO(available_instances);
>>> +
>>> +static ssize_t device_api_show(struct kobject *kobj, struct device *dev,
>>> + char *buf)
>>> +{
>>> + return sprintf(buf, "%s\n", VFIO_DEVICE_API_AP_STRING);
>>> +}
>>> +
>>> +MDEV_TYPE_ATTR_RO(device_api);
>>> +
>>> +static struct attribute *vfio_ap_mdev_type_attrs[] = {
>>> + &mdev_type_attr_name.attr,
>>> + &mdev_type_attr_device_api.attr,
>>> + &mdev_type_attr_available_instances.attr,
>>> + NULL,
>>> +};
>>> +
>>> +static struct attribute_group vfio_ap_mdev_hwvirt_type_group = {
>>> + .name = VFOP_AP_MDEV_TYPE_HWVIRT,
>>> + .attrs = vfio_ap_mdev_type_attrs,
>>> +};
>>> +
>>> +static struct attribute_group *vfio_ap_mdev_type_groups[] = {
>>> + &vfio_ap_mdev_hwvirt_type_group,
>>> + NULL,
>>> +};
>>> +
>>> +static const struct mdev_parent_ops vfio_ap_matrix_ops = {
>>> + .owner = THIS_MODULE,
>>> + .supported_type_groups = vfio_ap_mdev_type_groups,
>>> + .create = vfio_ap_mdev_create,
>>> + .remove = vfio_ap_mdev_remove,
>>> +};
>>> +
>>> +int vfio_ap_mdev_register(struct ap_matrix_dev *matrix_dev)
>>> +{
>>> + int ret;
>>> +
>>> + ret = mdev_register_device(&matrix_dev->device, &vfio_ap_matrix_ops);
>>> + if (ret)
>>> + return ret;
>>> +
>>> + atomic_set(&matrix_dev->available_instances,
>>> + AP_MATRIX_MAX_AVAILABLE_INSTANCES);
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +void vfio_ap_mdev_unregister(struct ap_matrix_dev *matrix_dev)
>>> +{
>>> + mdev_unregister_device(&matrix_dev->device);
>>> +}
>>> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
>>> index 19c0b60..3de1275 100644
>>> --- a/drivers/s390/crypto/vfio_ap_private.h
>>> +++ b/drivers/s390/crypto/vfio_ap_private.h
>>> @@ -10,20 +10,36 @@
>>> #define _VFIO_AP_PRIVATE_H_
>>>
>>> #include <linux/types.h>
>>> +#include <linux/device.h>
>>> +#include <linux/mdev.h>
>>>
>>> #include "ap_bus.h"
>>>
>>> #define VFIO_AP_MODULE_NAME "vfio_ap"
>>> #define VFIO_AP_DRV_NAME "vfio_ap"
>>> +/**
>>> + * There must be one mediated matrix device for every guest using AP devices.
>>> + * If every APQN is assigned to a guest, then the maximum number of guests with
>>> + * a unique APQN assigned would be 255 adapters x 255 domains = 72351 guests.
>>> + */
>>> +#define AP_MATRIX_MAX_AVAILABLE_INSTANCES 72351
>> Why isn't it 256 x 256 ?
> In zcrypt.h there are defines for these:
>
> #define MAX_ZDEV_CARDIDS_EXT 256
> #define MAX_ZDEV_DOMAINS_EXT 256
>
> /* Maximum number of zcrypt devices */
> #define MAX_ZDEV_ENTRIES_EXT (MAX_ZDEV_CARDIDS_EXT * MAX_ZDEV_DOMAINS_EXT)
Okay, will do.
>>> struct ap_matrix_dev {
>>> struct device device;
>>> + atomic_t available_instances;
>>> +};
>>> +
>>> +struct ap_matrix_mdev {
>>> + const char *name;
>>> + struct list_head list;
>>> };
>>>
>>> -static inline struct ap_matrix_dev
>>> -*to_ap_matrix_parent_dev(struct device *dev)
>>> +static struct ap_matrix_dev *to_ap_matrix_dev(struct device *dev)
>>> {
>>> - return container_of(dev, struct ap_matrix_dev, device.parent);
>>> + return container_of(dev, struct ap_matrix_dev, device);
>>> }
>>>
>>> +extern int vfio_ap_mdev_register(struct ap_matrix_dev *matrix_dev);
>>> +extern void vfio_ap_mdev_unregister(struct ap_matrix_dev *matrix_dev);
>>> +
>>> #endif /* _VFIO_AP_PRIVATE_H_ */
>>
On 06/29/2018 11:11 PM, Tony Krowiak wrote:
> Implements the open callback on the mediated matrix device.
> The function registers a group notifier to receive notification
> of the VFIO_GROUP_NOTIFY_SET_KVM event. When notified,
> the vfio_ap device driver will get access to the guest's
> kvm structure. The open callback must ensure that only one
> mediated device shall be opened per guest.
>
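One way to enforce the "only one mediated device per guest" rule from the description is sketched below. It assumes each mediated matrix device records the guest's `kvm` pointer once known, as the patch does in `matrix_mdev->kvm`. The body of `vfio_ap_mdev_open_once()` is elided from the quoted patch, so this is a userspace model with a plain array standing in for the driver's list. All names and the -EPERM error code are illustrative assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct kvm { int id; };		/* stand-in for the guest's struct kvm */

/* Simplified model of struct ap_matrix_mdev: only the field used here. */
struct matrix_mdev_model {
	struct kvm *kvm;	/* guest this mdev is opened for, or NULL */
};

/*
 * Bind `self` to `kvm` unless another mdev in `all` is already bound to the
 * same guest. The driver would do this walk under mdev_list_lock.
 */
static int open_once(struct matrix_mdev_model *all, size_t n,
		     struct matrix_mdev_model *self, struct kvm *kvm)
{
	assert(self != NULL && kvm != NULL);
	for (size_t i = 0; i < n; i++) {
		if (&all[i] != self && all[i].kvm == kvm)
			return -EPERM;	/* error code chosen for illustration */
	}
	self->kvm = kvm;
	return 0;
}
```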
> Signed-off-by: Tony Krowiak <[email protected]>
> ---
> drivers/s390/crypto/vfio_ap_ops.c | 128 +++++++++++++++++++++++++++++++++
> drivers/s390/crypto/vfio_ap_private.h | 2 +
> 2 files changed, 130 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index bc7398d..58be495 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -11,6 +11,10 @@
> #include <linux/device.h>
> #include <linux/list.h>
> #include <linux/ctype.h>
> +#include <linux/bitops.h>
> +#include <linux/kvm_host.h>
> +#include <linux/module.h>
> +#include <asm/kvm.h>
>
> #include "vfio_ap_private.h"
>
> @@ -748,12 +752,136 @@ static ssize_t matrix_show(struct device *dev, struct device_attribute *attr,
> NULL
> };
>
> +/**
> + * Verify that the AP instructions are available on the guest and are to be
> + * interpreted by the firmware. The former is indicated via the
> + * KVM_S390_VM_CPU_FEAT_AP CPU model feature and the latter by apie crypto
> + * flag.
> + */
> +static int kvm_ap_validate_crypto_setup(struct kvm *kvm)
> +{
> + if (test_bit_inv(KVM_S390_VM_CPU_FEAT_AP, kvm->arch.cpu_feat) &&
> + kvm->arch.crypto.apie)
> + return 0;
> +
> + pr_err("%s: interpretation of AP instructions not available",
> + VFIO_AP_MODULE_NAME);
> +
> + return -EOPNOTSUPP;
> +}
> +
> +static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
> + unsigned long action, void *data)
> +{
> + struct ap_matrix_mdev *matrix_mdev;
> +
> + if (action == VFIO_GROUP_NOTIFY_SET_KVM) {
> + matrix_mdev = container_of(nb, struct ap_matrix_mdev,
> + group_notifier);
> + matrix_mdev->kvm = data;
> + }
> +
> + return NOTIFY_OK;
> +}
> +
[..]
> +
> +static int vfio_ap_mdev_open(struct mdev_device *mdev)
> +{
> + struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
> + struct ap_matrix_dev *matrix_dev =
> + to_ap_matrix_dev(mdev_parent_dev(mdev));
> + unsigned long events;
> + int ret;
> +
> + if (!try_module_get(THIS_MODULE))
> + return -ENODEV;
> +
> + ret = vfio_ap_verify_queues_reserved(matrix_dev, matrix_mdev->name,
> + &matrix_mdev->matrix);
> + if (ret)
> + goto out_err;
> +
> + matrix_mdev->group_notifier.notifier_call = vfio_ap_mdev_group_notifier;
> + events = VFIO_GROUP_NOTIFY_SET_KVM;
> +
> + ret = vfio_register_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
> + &events, &matrix_mdev->group_notifier);
> + if (ret)
> + goto out_err;
> +
> + ret = kvm_ap_validate_crypto_setup(matrix_mdev->kvm);
At this point you assume that your vfio_ap_mdev_group_notifier callback
has already been called with VFIO_GROUP_NOTIFY_SET_KVM and that
matrix_mdev->kvm is set up properly.
Given how callbacks usually work, this seems rather strange. It's
probably cleaner to set up the CRYCB (including all the validation
that needs to be done immediately before) in the callback
(vfio_ap_mdev_group_notifier).
If that is not viable, I think we at least need a comment here explaining
why this is OK.
Regards,
Halil
> + if (ret)
> + goto out_kvm_err;
> +
> + ret = vfio_ap_mdev_open_once(matrix_mdev);
> + if (ret)
> + goto out_kvm_err;
> +
> + return 0;
> +
> +out_kvm_err:
> + vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
> + &matrix_mdev->group_notifier);
> + matrix_mdev->kvm = NULL;
> +out_err:
> + module_put(THIS_MODULE);
> +
> + return ret;
> +}
> +
>
[..]
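Halil's suggestion, doing the kvm-dependent validation and CRYCB setup when the VFIO_GROUP_NOTIFY_SET_KVM notification arrives instead of assuming in open() that it already has, can be modeled as below. This is a userspace sketch: `struct notifier_block`, `container_of`, `NOTIFY_OK`, the action constant and `configure_guest()` are local stand-ins, not the kernel definitions. How a setup failure would be reported back to the caller is left open; the model just records it in `setup_err`.

```c
#include <assert.h>
#include <stddef.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define NOTIFY_OK 0x0001		/* stand-in value */
#define SET_KVM_ACTION 1UL		/* stand-in for VFIO_GROUP_NOTIFY_SET_KVM */

struct kvm { int crypto_ok; };		/* stand-in: only what the check needs */

struct notifier_block {
	int (*notifier_call)(struct notifier_block *nb, unsigned long action,
			     void *data);
};

struct matrix_mdev_model {
	struct kvm *kvm;
	int crycb_configured;
	int setup_err;			/* result of the last setup attempt */
	struct notifier_block group_notifier;
};

/* Stand-in for kvm_ap_validate_crypto_setup() + kvm_ap_configure_matrix(). */
static int configure_guest(struct matrix_mdev_model *m)
{
	if (!m->kvm->crypto_ok)		/* e.g. AP interpretation unavailable */
		return -1;
	m->crycb_configured = 1;
	return 0;
}

static int group_notifier(struct notifier_block *nb, unsigned long action,
			  void *data)
{
	struct matrix_mdev_model *m =
		container_of(nb, struct matrix_mdev_model, group_notifier);

	if (action != SET_KVM_ACTION)
		return NOTIFY_OK;
	m->kvm = data;
	if (!m->kvm) {			/* guest detached: undo the setup */
		m->crycb_configured = 0;
		m->setup_err = 0;
		return NOTIFY_OK;
	}
	/* kvm is known to be valid here, so validate and configure now. */
	m->setup_err = configure_guest(m);
	return NOTIFY_OK;
}
```

In this shape, open() only registers the notifier and never dereferences `kvm` itself.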
On 06/29/2018 11:11 PM, Tony Krowiak wrote:
> From: Tony Krowiak <[email protected]>
>
> Configures the AP adapters, usage domains and control domains for the
[..]
> +static inline void kvm_ap_clear_crycb_masks(struct ap_matrix_mdev *matrix_mdev)
> +{
> + memset(&matrix_mdev->kvm->arch.crypto.crycb->apcb0, 0,
> + sizeof(matrix_mdev->kvm->arch.crypto.crycb->apcb0));
> + memset(&matrix_mdev->kvm->arch.crypto.crycb->apcb1, 0,
> + sizeof(matrix_mdev->kvm->arch.crypto.crycb->apcb1));
> +}
> +
> +static void kvm_ap_set_crycb_masks(struct ap_matrix_mdev *matrix_mdev)
> +{
> + int nbytes;
> + unsigned long *apm, *aqm, *adm;
> +
> + kvm_ap_clear_crycb_masks(matrix_mdev);
> +
> + apm = kvm_ap_get_crycb_apm(matrix_mdev);
> + aqm = kvm_ap_get_crycb_aqm(matrix_mdev);
> + adm = kvm_ap_get_crycb_adm(matrix_mdev);
> +
> + nbytes = KVM_AP_MASK_BYTES(matrix_mdev->matrix.apm_max + 1);
> + memcpy(apm, matrix_mdev->matrix.apm, nbytes);
> +
> + nbytes = KVM_AP_MASK_BYTES(matrix_mdev->matrix.aqm_max + 1);
> + memcpy(aqm, matrix_mdev->matrix.aqm, nbytes);
> +
> + /*
> + * Merge the AQM and ADM since the ADM is a superset of the
> + * AQM by agreed-upon convention.
> + */
> + bitmap_or(adm, matrix_mdev->matrix.adm, matrix_mdev->matrix.aqm,
> + matrix_mdev->matrix.adm_max + 1);
> +}
> +
[..]
> +
> +static int kvm_ap_configure_matrix(struct ap_matrix_mdev *matrix_mdev)
> +{
> + int ret = 0;
> +
> + mutex_lock(&matrix_mdev->kvm->lock);
> +
> + ret = kvm_ap_validate_queue_sharing(matrix_mdev);
> + if (ret)
> + goto done;
> +
> + kvm_ap_set_crycb_masks(matrix_mdev);
> +
> +done:
> + mutex_unlock(&matrix_mdev->kvm->lock);
> +
> + return ret;
> +}
> +
> +void kvm_ap_deconfigure_matrix(struct ap_matrix_mdev *matrix_mdev)
> +{
> + mutex_lock(&matrix_mdev->kvm->lock);
> + kvm_ap_clear_crycb_masks(matrix_mdev);
The guest may be running at this point in time, right?
I think you need the safe update operation that we used for the
initial setup too, but then somebody argued that it isn't necessary because
we don't support hotplug (yet).
Regards,
Halil
> + mutex_unlock(&matrix_mdev->kvm->lock);
> +}
> +
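The "safe update operation" Halil refers to is not shown in this thread. Purely to illustrate the ordering it implies, the sketch below clears the masks only between hypothetical "block all vCPUs" and "unblock all vCPUs" steps, so that no running vCPU can observe a half-cleared CRYCB. The block/unblock functions here only set a flag; a real implementation would have to actually stop the vCPUs, and every name is a stand-in. The 32-byte arrays correspond to 256-bit masks.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for the guest's CRYCB AP masks (256 bits each). */
struct crycb_model {
	unsigned char apm[32], aqm[32], adm[32];
};

struct vm_model {
	int vcpus_blocked;	/* stand-in for "no vCPU is running" */
	struct crycb_model crycb;
};

/* Hypothetical stand-ins: a real version must actually stop the vCPUs. */
static void block_all_vcpus(struct vm_model *vm)   { vm->vcpus_blocked = 1; }
static void unblock_all_vcpus(struct vm_model *vm) { vm->vcpus_blocked = 0; }

/* Clear the masks only while no vCPU can be using them. */
static void deconfigure_masks(struct vm_model *vm)
{
	block_all_vcpus(vm);
	assert(vm->vcpus_blocked);
	memset(&vm->crycb, 0, sizeof(vm->crycb));
	unblock_all_vcpus(vm);
}
```

The same bracket would apply to setting the masks, which is why Halil suggests using it for the initial configuration as well.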
On 07/06/2018 04:26 PM, Halil Pasic wrote:
>
>
> On 06/29/2018 11:11 PM, Tony Krowiak wrote:
>> From: Tony Krowiak <[email protected]>
>>
>> Introduces a new structure for storing the AP matrix configured
>> for the mediated matrix device via its sysfs attributes files.
>>
>> Signed-off-by: Tony Krowiak <[email protected]>
>> ---
>> drivers/s390/crypto/vfio_ap_ops.c | 12 ++++++++++++
>> drivers/s390/crypto/vfio_ap_private.h | 24 ++++++++++++++++++++++++
>> 2 files changed, 36 insertions(+), 0 deletions(-)
>>
>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>> index 4e61e33..bf7ed9f 100644
>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -20,6 +20,17 @@
>> DEFINE_SPINLOCK(mdev_list_lock);
>> LIST_HEAD(mdev_list);
>>
>> +static void vfio_ap_matrix_init(struct ap_matrix *matrix)
>> +{
>> + /* Test if PQAP(QCI) instruction is available */
>> + if (test_facility(12))
>> + ap_qci(&matrix->info);
>> +
>> + matrix->apm_max = matrix->info.apxa ? matrix->info.Na : 63;
>> + matrix->aqm_max = matrix->info.apxa ? matrix->info.Nd : 15;
>> + matrix->adm_max = matrix->info.apxa ? matrix->info.Nd : 15;
>> +}
>> +
>> static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
>> {
>> struct ap_matrix_dev *matrix_dev =
>> @@ -31,6 +42,7 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
>> return -ENOMEM;
>>
>> matrix_mdev->name = dev_name(mdev_dev(mdev));
>> + vfio_ap_matrix_init(&matrix_mdev->matrix);
>> mdev_set_drvdata(mdev, matrix_mdev);
>>
>> if (atomic_dec_if_positive(&matrix_dev->available_instances) < 0) {
>> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
>> index 3de1275..ae771f5 100644
>> --- a/drivers/s390/crypto/vfio_ap_private.h
>> +++ b/drivers/s390/crypto/vfio_ap_private.h
>> @@ -29,9 +29,33 @@ struct ap_matrix_dev {
>> atomic_t available_instances;
>> };
>>
>> +/**
>> + * The AP matrix is comprised of three bit masks identifying the adapters,
>> + * queues (domains) and control domains that belong to an AP matrix. The bits in
>> + * each mask, from least significant to most significant bit, correspond to IDs
>> + * 0 to 255. When a bit is set, the corresponding ID belongs to the matrix.
>> + *
>> + * @apm identifies the AP adapters in the matrix
>> + * @apm_max: max adapter number in @apm
>> + * @aqm identifies the AP queues (domains) in the matrix
>> + * @aqm_max: max domain number in @aqm
>> + * @adm identifies the AP control domains in the matrix
>> + * @adm_max: max domain number in @adm
>> + */
>> +struct ap_matrix {
>> + unsigned long apm_max;
>> + DECLARE_BITMAP(apm, 256);
>> + unsigned long aqm_max;
>> + DECLARE_BITMAP(aqm, 256);
>> + unsigned long adm_max;
>> + DECLARE_BITMAP(adm, 256);
>> + struct ap_config_info info;
>
> Why do we maintain (and populate by doing a QCI) the info member on a
> per mdev device basis?
That is a mistake. I am going to move it to struct ap_matrix_dev and
execute the QCI instruction once, when the matrix device is registered
with mdev during driver initialization.
>
>
>> +};
>> +
>> struct ap_matrix_mdev {
>> const char *name;
>> struct list_head list;
>> + struct ap_matrix matrix;
>> };
>>
>> static struct ap_matrix_dev *to_ap_matrix_dev(struct device *dev)
>>
>
On 07/12/2018 01:22 AM, Halil Pasic wrote:
>
>
> On 06/29/2018 11:11 PM, Tony Krowiak wrote:
>> From: Tony Krowiak <[email protected]>
>>
>> Configures the AP adapters, usage domains and control domains for the
>> KVM guest from the matrix configured via the mediated matrix device's
>> sysfs attribute files.
>>
> [..]
>> +
>> +static void kvm_ap_set_crycb_masks(struct ap_matrix_mdev *matrix_mdev)
>> +{
>> + int nbytes;
>> + unsigned long *apm, *aqm, *adm;
>> +
>> + kvm_ap_clear_crycb_masks(matrix_mdev);
>> +
>> + apm = kvm_ap_get_crycb_apm(matrix_mdev);
>> + aqm = kvm_ap_get_crycb_aqm(matrix_mdev);
>> + adm = kvm_ap_get_crycb_adm(matrix_mdev);
>> +
>> + nbytes = KVM_AP_MASK_BYTES(matrix_mdev->matrix.apm_max + 1);
>> + memcpy(apm, matrix_mdev->matrix.apm, nbytes);
>> +
>> + nbytes = KVM_AP_MASK_BYTES(matrix_mdev->matrix.aqm_max + 1);
>> + memcpy(aqm, matrix_mdev->matrix.aqm, nbytes);
>> +
>> + /*
>> + * Merge the AQM and ADM since the ADM is a superset of the
>> + * AQM by agreed-upon convention.
>> + */
>> + bitmap_or(adm, matrix_mdev->matrix.adm, matrix_mdev->matrix.aqm,
>> + matrix_mdev->matrix.adm_max + 1);
>
> Are you sure this OR works as expected? E.g. if adm_max == 15, the bitmap_or()
> covers the least significant 2 bytes, but you want the other two.
Since our test system has only 15 domains defined, this has never been a
problem. I'll write a function rather than use bitmap_or().
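A minimal sketch of such a function, assuming the masks are handled as byte arrays in the architecture's MSB-0 layout (ID n in byte n / 8, bit 0x80 >> (n % 8)). With that layout the first N bytes always cover IDs 0..8N-1, independent of word size. `AP_MASK_BYTES` and the `ap_mask_*()` helpers are hypothetical names; `AP_MASK_BYTES` plays the role of the patch's `KVM_AP_MASK_BYTES`, whose definition is not quoted here. As in the patch, the result is ADM | AQM.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: bytes needed to hold `nbits` mask bits. */
#define AP_MASK_BYTES(nbits) (((nbits) + 7) / 8)

/* MSB-0: ID n is bit 0x80 >> (n % 8) of byte n / 8. */
static int ap_mask_test(const unsigned char *mask, unsigned int n)
{
	return (mask[n / 8] & (0x80u >> (n % 8))) != 0;
}

static void ap_mask_set(unsigned char *mask, unsigned int n)
{
	mask[n / 8] |= (unsigned char)(0x80u >> (n % 8));
}

/*
 * dst = a | b for IDs 0..max_id. Working byte by byte on the MSB-0 layout
 * covers every byte that holds those IDs, unlike bitmap_or(), whose nbits
 * counts from the least significant bit of each unsigned long.
 */
static void ap_mask_or(unsigned char *dst, const unsigned char *a,
		       const unsigned char *b, unsigned long max_id)
{
	size_t nbytes = AP_MASK_BYTES(max_id + 1);

	assert(max_id < 256);
	for (size_t i = 0; i < nbytes; i++)
		dst[i] = a[i] | b[i];
}
```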
>
>
>> +}
On 07/12/2018 03:28 PM, Halil Pasic wrote:
>
>
> On 06/29/2018 11:11 PM, Tony Krowiak wrote:
>> From: Tony Krowiak <[email protected]>
>>
>> Configures the AP adapters, usage domains and control domains for the
>
> [..]
>
>> +static inline void kvm_ap_clear_crycb_masks(struct ap_matrix_mdev *matrix_mdev)
>> +{
>> + memset(&matrix_mdev->kvm->arch.crypto.crycb->apcb0, 0,
>> + sizeof(matrix_mdev->kvm->arch.crypto.crycb->apcb0));
>> + memset(&matrix_mdev->kvm->arch.crypto.crycb->apcb1, 0,
>> + sizeof(matrix_mdev->kvm->arch.crypto.crycb->apcb1));
>> +}
>> +
>> +static void kvm_ap_set_crycb_masks(struct ap_matrix_mdev *matrix_mdev)
>> +{
>> + int nbytes;
>> + unsigned long *apm, *aqm, *adm;
>> +
>> + kvm_ap_clear_crycb_masks(matrix_mdev);
>> +
>> + apm = kvm_ap_get_crycb_apm(matrix_mdev);
>> + aqm = kvm_ap_get_crycb_aqm(matrix_mdev);
>> + adm = kvm_ap_get_crycb_adm(matrix_mdev);
>> +
>> + nbytes = KVM_AP_MASK_BYTES(matrix_mdev->matrix.apm_max + 1);
>> + memcpy(apm, matrix_mdev->matrix.apm, nbytes);
>> +
>> + nbytes = KVM_AP_MASK_BYTES(matrix_mdev->matrix.aqm_max + 1);
>> + memcpy(aqm, matrix_mdev->matrix.aqm, nbytes);
>> +
>> + /*
>> + * Merge the AQM and ADM since the ADM is a superset of the
>> + * AQM by agreed-upon convention.
>> + */
>> + bitmap_or(adm, matrix_mdev->matrix.adm, matrix_mdev->matrix.aqm,
>> + matrix_mdev->matrix.adm_max + 1);
>> +}
>> +
>
> [..]
>
>> +
>> +static int kvm_ap_configure_matrix(struct ap_matrix_mdev *matrix_mdev)
>> +{
>> + int ret = 0;
>> +
>> + mutex_lock(&matrix_mdev->kvm->lock);
>> +
>> + ret = kvm_ap_validate_queue_sharing(matrix_mdev);
>> + if (ret)
>> + goto done;
>> +
>> + kvm_ap_set_crycb_masks(matrix_mdev);
>> +
>> +done:
>> + mutex_unlock(&matrix_mdev->kvm->lock);
>> +
>> + return ret;
>> +}
>> +
>> +void kvm_ap_deconfigure_matrix(struct ap_matrix_mdev *matrix_mdev)
>> +{
>> + mutex_lock(&matrix_mdev->kvm->lock);
>> + kvm_ap_clear_crycb_masks(matrix_mdev);
>
> The guest may be running at this point in time, right?
>
> I think you need the safe update operation that we used to use for the
> initial setup too; it was dropped because somebody argued it isn't
> necessary since we don't support hotplug (yet).
I agree.
>
>
>
> Regards,
> Halil
>
>> + mutex_unlock(&matrix_mdev->kvm->lock);
>> +}
>> +
On 07/12/2018 02:47 PM, Halil Pasic wrote:
>
>
> On 06/29/2018 11:11 PM, Tony Krowiak wrote:
>> Implements the open callback on the mediated matrix device.
>> The function registers a group notifier to receive notification
>> of the VFIO_GROUP_NOTIFY_SET_KVM event. When notified,
>> the vfio_ap device driver will get access to the guest's
>> kvm structure. The open callback must ensure that only one
>> mediated device shall be opened per guest.
>>
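The "only one mediated device per guest" rule described above can be sketched as a scan over the existing mediated devices that refuses to attach a second one to the same guest. This is a hypothetical stand-in for vfio_ap_mdev_open_once(), whose body is not quoted in this thread; struct sketch_mdev, sketch_open_once() and the choice of -EPERM are assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified mediated device list entry. */
struct sketch_mdev {
	const void *kvm;		/* guest this mdev is attached to, or NULL */
	struct sketch_mdev *next;
};

/* Return -EPERM if an mdev other than @self is already attached to @kvm. */
static int sketch_open_once(const struct sketch_mdev *list,
			    const struct sketch_mdev *self, const void *kvm)
{
	const struct sketch_mdev *m;

	for (m = list; m; m = m->next)
		if (m != self && kvm && m->kvm == kvm)
			return -EPERM;
	return 0;
}
```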
>> Signed-off-by: Tony Krowiak <[email protected]>
>> ---
>> drivers/s390/crypto/vfio_ap_ops.c | 128 +++++++++++++++++++++++++++++++++
>> drivers/s390/crypto/vfio_ap_private.h | 2 +
>> 2 files changed, 130 insertions(+), 0 deletions(-)
>>
>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>> index bc7398d..58be495 100644
>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -11,6 +11,10 @@
>> #include <linux/device.h>
>> #include <linux/list.h>
>> #include <linux/ctype.h>
>> +#include <linux/bitops.h>
>> +#include <linux/kvm_host.h>
>> +#include <linux/module.h>
>> +#include <asm/kvm.h>
>> #include "vfio_ap_private.h"
>> @@ -748,12 +752,136 @@ static ssize_t matrix_show(struct device *dev, struct device_attribute *attr,
>> NULL
>> };
>> +/**
>> + * Verify that the AP instructions are available on the guest and are to be
>> + * interpreted by the firmware. The former is indicated via the
>> + * KVM_S390_VM_CPU_FEAT_AP CPU model feature and the latter by the apie
>> + * crypto flag.
>> + */
>> +static int kvm_ap_validate_crypto_setup(struct kvm *kvm)
>> +{
>> + if (test_bit_inv(KVM_S390_VM_CPU_FEAT_AP, kvm->arch.cpu_feat) &&
>> + kvm->arch.crypto.apie)
>> + return 0;
>> +
>> + pr_err("%s: interpretation of AP instructions not available",
>> + VFIO_AP_MODULE_NAME);
>> +
>> + return -EOPNOTSUPP;
>> +}
>> +
>> +static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
>> + unsigned long action, void *data)
>> +{
>> + struct ap_matrix_mdev *matrix_mdev;
>> +
>> + if (action == VFIO_GROUP_NOTIFY_SET_KVM) {
>> + matrix_mdev = container_of(nb, struct ap_matrix_mdev,
>> + group_notifier);
>> + matrix_mdev->kvm = data;
>> + }
>> +
>> + return NOTIFY_OK;
>> +}
>> +
>
> [..]
>
>> +
>> +static int vfio_ap_mdev_open(struct mdev_device *mdev)
>> +{
>> + struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>> + struct ap_matrix_dev *matrix_dev =
>> + to_ap_matrix_dev(mdev_parent_dev(mdev));
>> + unsigned long events;
>> + int ret;
>> +
>> + if (!try_module_get(THIS_MODULE))
>> + return -ENODEV;
>> +
>> + ret = vfio_ap_verify_queues_reserved(matrix_dev, matrix_mdev->name,
>> + &matrix_mdev->matrix);
>> + if (ret)
>> + goto out_err;
>> +
>> + matrix_mdev->group_notifier.notifier_call = vfio_ap_mdev_group_notifier;
>> + events = VFIO_GROUP_NOTIFY_SET_KVM;
>> +
>> + ret = vfio_register_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
>> + &events, &matrix_mdev->group_notifier);
>> + if (ret)
>> + goto out_err;
>> +
>> + ret = kvm_ap_validate_crypto_setup(matrix_mdev->kvm);
>
> At this point you assume that your vfio_ap_mdev_group_notifier callback
> was called with VFIO_GROUP_NOTIFY_SET_KVM and that you do have
> matrix_mdev->kvm set up properly.
>
> Based on how callbacks usually work, this seems rather strange. It's
> probably cleaner to set up the CRYCB (including all the validation
> that needs to be done immediately before) in the callback
> (vfio_ap_mdev_group_notifier).
>
> If that is not viable, I think we at least need a comment here
> explaining why this is OK.
This was originally in the callback and moved out, to the best of my
recollection, because the validation at that time was done on the CRYCB,
and if that validation failed, there was no way to notify the client (QEMU)
from the notification callback that configuration of the guest's CRYCB
failed. This works, at least as far as I can tell from testing, because
registering the notifier invokes the notification callback if KVM has
already been set, and that appears to be the case here.
You are correct, however; we probably shouldn't bank on that, given that
I don't think we can guarantee it to be the case 100% of the time.
Consequently, I will move this back into the notification callback. Since
consistency checking is now done on the mdev devices instead of the CRYCB,
we don't need KVM at open time.
>
>
> Regards,
> Halil
>
>> + if (ret)
>> + goto out_kvm_err;
>> +
>> + ret = vfio_ap_mdev_open_once(matrix_mdev);
>> + if (ret)
>> + goto out_kvm_err;
>> +
>> + return 0;
>> +
>> +out_kvm_err:
>> + vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
>> + &matrix_mdev->group_notifier);
>> + matrix_mdev->kvm = NULL;
>> +out_err:
>> + module_put(THIS_MODULE);
>> +
>> + return ret;
>> +}
>> +
>>
>
> [..]
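Moving the work into the callback could look roughly like the following user-space sketch: validation and (de)configuration happen only when the notifier actually delivers a KVM pointer, and open no longer touches KVM at all. All types, constants and names are simplified, hypothetical stand-ins for the VFIO/KVM ones. Treating a NULL pointer as KVM going away, and returning a "done" value on a failed validation, are assumptions of this sketch; how the real callback should report that failure is still open in this thread.

```c
#include <assert.h>
#include <stddef.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for the notifier action and return values. */
#define SKETCH_NOTIFY_SET_KVM	1UL
#define SKETCH_NOTIFY_DONE	0
#define SKETCH_NOTIFY_OK	1

struct sketch_kvm {
	int ap_feature;		/* stands in for KVM_S390_VM_CPU_FEAT_AP */
	int apie;		/* stands in for kvm->arch.crypto.apie */
};

struct sketch_notifier {
	int (*notifier_call)(struct sketch_notifier *nb, unsigned long action,
			     void *data);
};

struct sketch_matrix_mdev {
	struct sketch_kvm *kvm;
	int crycb_configured;
	struct sketch_notifier group_notifier;
};

static int sketch_group_notifier(struct sketch_notifier *nb,
				 unsigned long action, void *data)
{
	struct sketch_matrix_mdev *m =
		container_of(nb, struct sketch_matrix_mdev, group_notifier);
	struct sketch_kvm *kvm = data;

	if (action != SKETCH_NOTIFY_SET_KVM)
		return SKETCH_NOTIFY_OK;

	if (!kvm) {
		/* KVM is going away: tear down what was configured. */
		m->crycb_configured = 0;
		m->kvm = NULL;
		return SKETCH_NOTIFY_OK;
	}

	/* Validate here, where the KVM pointer is known to be valid. */
	if (!(kvm->ap_feature && kvm->apie))
		return SKETCH_NOTIFY_DONE;

	m->kvm = kvm;
	m->crycb_configured = 1;	/* i.e. set up the CRYCB masks */
	return SKETCH_NOTIFY_OK;
}
```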
On 07/12/2018 06:03 PM, Tony Krowiak wrote:
>>> +static int vfio_ap_mdev_open(struct mdev_device *mdev)
>>> +{
>>> + struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>>> + struct ap_matrix_dev *matrix_dev =
>>> + to_ap_matrix_dev(mdev_parent_dev(mdev));
>>> + unsigned long events;
>>> + int ret;
>>> +
>>> + if (!try_module_get(THIS_MODULE))
>>> + return -ENODEV;
>>> +
>>> + ret = vfio_ap_verify_queues_reserved(matrix_dev, matrix_mdev->name,
>>> + &matrix_mdev->matrix);
>>> + if (ret)
>>> + goto out_err;
>>> +
>>> + matrix_mdev->group_notifier.notifier_call = vfio_ap_mdev_group_notifier;
>>> + events = VFIO_GROUP_NOTIFY_SET_KVM;
>>> +
>>> + ret = vfio_register_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
>>> + &events, &matrix_mdev->group_notifier);
>>> + if (ret)
>>> + goto out_err;
>>> +
>>> + ret = kvm_ap_validate_crypto_setup(matrix_mdev->kvm);
>>
>> At this point you assume that your vfio_ap_mdev_group_notifier callback
>> was called with VFIO_GROUP_NOTIFY_SET_KVM and that you do have
>> matrix_mdev->kvm set up properly.
>>
>> Based on how callbacks usually work, this seems rather strange. It's
>> probably cleaner to set up the CRYCB (including all the validation
>> that needs to be done immediately before) in the callback
>> (vfio_ap_mdev_group_notifier).
>>
>> If that is not viable, I think we at least need a comment here explaining
>> why this is OK.
>
> This was originally in the callback and moved out, to the best of my
> recollection, because the validation at that time was done on the CRYCB,
> and if that validation failed, there was no way to notify the client (QEMU)
> from the notification callback that configuration of the guest's CRYCB
> failed. This works, at least as far as I can tell from testing, because
> registering the notifier invokes the notification callback if KVM has
> already been set, and that appears to be the case here.
> You are correct, however; we probably shouldn't bank on that, given that
> I don't think we can guarantee it to be the case 100% of the time.
> Consequently, I will move this back into the notification callback. Since
> consistency checking is now done on the mdev devices instead of the CRYCB,
> we don't need KVM at open time.
Sounds good to me. Making the open fail was not a good way to communicate this
error condition to userspace anyway.
Regards,
Halil
On 07/09/2018 02:11 PM, Pierre Morel wrote:
> On 29/06/2018 23:11, Tony Krowiak wrote:
>> Provides the sysfs interfaces for assigning AP adapters to
>> and unassigning AP adapters from a mediated matrix device.
>>
>> The IDs of the AP adapters assigned to the mediated matrix
>> device are stored in an AP mask (APM). The bits in the APM,
>> from most significant to least significant bit, correspond to
>> AP adapter ID (APID) 0 to 255. When an adapter is assigned, the
>> bit corresponding to the APID will be set in the APM.
>> Likewise, when an adapter is unassigned, the bit corresponding
>> to the APID will be cleared from the APM.
>>
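The bit ordering described above can be made concrete with a small sketch that treats the 256-bit APM as 32 bytes, APID 0 being the most significant bit of byte 0 and APID 255 the least significant bit of byte 31. On big-endian s390 this byte view matches the word-based set_bit_inv()/clear_bit_inv() numbering used in the patch; the helper names and APM_BYTES are hypothetical.

```c
#include <assert.h>

#define APM_BYTES 32	/* 256 APIDs, one bit each */

/* Hypothetical helper: set the bit for @apid (assign the adapter). */
static void apm_assign(unsigned char apm[APM_BYTES], unsigned int apid)
{
	assert(apid < APM_BYTES * 8);
	apm[apid / 8] |= (unsigned char)(0x80u >> (apid % 8));
}

/* Hypothetical helper: clear the bit for @apid (unassign the adapter). */
static void apm_unassign(unsigned char apm[APM_BYTES], unsigned int apid)
{
	assert(apid < APM_BYTES * 8);
	apm[apid / 8] &= (unsigned char)~(0x80u >> (apid % 8));
}

/* Hypothetical helper: is the adapter @apid assigned? */
static int apm_is_assigned(const unsigned char apm[APM_BYTES],
			   unsigned int apid)
{
	assert(apid < APM_BYTES * 8);
	return (apm[apid / 8] & (0x80u >> (apid % 8))) != 0;
}
```

For example, assigning adapter 173 (0xad) sets the 0x04 bit of byte 21, since 173 = 21 * 8 + 5.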
>> The relevant sysfs structures are:
>>
>> /sys/devices/vfio_ap
>> ... [matrix]
>> ...... [mdev_supported_types]
>> ......... [vfio_ap-passthrough]
>> ............ [devices]
>> ...............[$uuid]
>> .................. assign_adapter
>> .................. unassign_adapter
>>
>> To assign an adapter to the $uuid mediated matrix device's APM,
>> write the APID to the assign_adapter file. To unassign an adapter,
>> write the APID to the unassign_adapter file. The APID is specified
>> using conventional semantics: If it begins with 0x the number will
>> be parsed as a hexadecimal number; if it begins with a 0 the number
>> will be parsed as an octal number; otherwise, it will be parsed as a
>> decimal number.
>>
>> For example, to assign adapter 173 (0xad) to the mediated matrix
>> device $uuid:
>>
>> echo 173 > assign_adapter
>>
>> or
>>
>> echo 0xad > assign_adapter
>>
>> or
>>
>> echo 0255 > assign_adapter
>>
>> To unassign adapter 173 (0xad):
>>
>> echo 173 > unassign_adapter
>>
>> or
>>
>> echo 0xad > unassign_adapter
>>
>> or
>>
>> echo 0255 > unassign_adapter
>>
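The parsing rules above follow the usual C convention for base 0. A user-space sketch, using strtoul() as a stand-in for the kernel's kstrtoul(buf, 0, ...), accepts all three spellings of adapter 173 and enforces the upper bound. parse_apid() is a hypothetical name, and because strtoul() is more permissive than kstrtoul() (it skips leading whitespace and accepts a sign), the sketch adds its own checks for those cases.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical analogue of kstrtoul(buf, 0, &apid) plus the range check in
 * assign_adapter_store(): "0x" prefix -> hex, leading "0" -> octal,
 * otherwise decimal.
 */
static int parse_apid(const char *buf, unsigned long max_apid,
		      unsigned long *apid)
{
	unsigned long val;
	char *end;

	/* strtoul() would skip whitespace and accept a sign; refuse both. */
	if (!isdigit((unsigned char)buf[0]))
		return -EINVAL;

	errno = 0;
	val = strtoul(buf, &end, 0);
	if (errno)
		return -ERANGE;
	/* Accept one trailing newline, as "echo 173 > assign_adapter" sends. */
	if (*end == '\n')
		end++;
	if (*end != '\0' || val > max_apid)
		return -EINVAL;

	*apid = val;
	return 0;
}
```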
>> The assignment will be rejected:
>>
>> * If the APID exceeds the maximum value for an AP adapter:
>> * If the AP Extended Addressing (APXA) facility is
>> installed, the max value is 255
>> * Else the max value is 63
>>
>> * If no AP domains have yet been assigned and there are
>> no AP queues bound to the VFIO AP driver that have an APQN
>> with an APID matching that of the AP adapter being assigned.
>>
>> * If any of the APQNs that can be derived from the cross product
>> of the APID being assigned and the AP queue index (APQI) of
>> each of the AP domains previously assigned cannot be matched
>> with an APQN of an AP queue device reserved by the VFIO AP
>> driver.
>>
>> Signed-off-by: Tony Krowiak <[email protected]>
>> ---
>>
snip [...]
>> +/**
>> + * assign_adapter_store
>> + *
>> + * @dev: the matrix device
>> + * @attr: a mediated matrix device attribute
>> + * @buf: a buffer containing the adapter ID (APID) to be assigned
>> + * @count: the number of bytes in @buf
>> + *
>> + * Parses the APID from @buf and assigns it to the mediated matrix device. The
>> + * APID must be a valid value:
>> + * * The APID value must not exceed the maximum allowable AP adapter ID
>> + *
>> + * * If there are no AP domains assigned, then there must be at least
>> + * one AP queue device reserved by the VFIO AP device driver with an
>> + * APQN containing @apid.
>
> I do not understand the reason here.
> Can you elaborate?
We forbid assignment of an adapter:
* If any APQN that can be derived from the cross product of the input
adapter number and the domain numbers already assigned is not bound
to the VFIO AP driver.
* Or, if no domains have been assigned yet and no APQN bound to the
VFIO AP device driver has an APID matching the input adapter number.
>
>
> I suppose that by "reserved" you mean "bound" (if so, use "bound").
> But I still do not understand the reason why.
Yes, I mean bound. The reason is that we don't want to allow a guest
to use an AP queue that is not bound to the VFIO AP driver, because
the queue would then be shared with the zcrypt driver, which would be
a breach of security. That may, however, not be what you are really
asking. Are you suggesting that we should allow assignment of any APQN
since this same check is done in the mdev open callback?
>
>
> Besides, if I understand correctly, what you do forbids the automatic
> assignment of a new card plugged into the host.
Not necessarily; the same logic I described in my first answer above will
be applied if a new card is assigned.
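The two rejection rules can be sketched as follows. The APQN packing (APID in the high byte, APQI in the low byte) mirrors the kernel's AP queue ID layout; SKETCH_APQN(), validate_apid() and the -1 error convention are hypothetical, and the real check is vfio_ap_validate_apid(), whose body is not quoted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: APID in the high byte, APQI in the low byte. */
#define SKETCH_APQN(apid, apqi) ((((apid) & 0xffu) << 8) | ((apqi) & 0xffu))

static bool apqn_is_bound(const unsigned int *bound, size_t nbound,
			  unsigned int apqn)
{
	size_t i;

	for (i = 0; i < nbound; i++)
		if (bound[i] == apqn)
			return true;
	return false;
}

/*
 * Return 0 if adapter @apid may be assigned, -1 otherwise. @apqis lists
 * the domains already assigned; @bound lists APQNs bound to the driver.
 */
static int validate_apid(unsigned int apid, const unsigned int *apqis,
			 size_t napqis, const unsigned int *bound,
			 size_t nbound)
{
	size_t i;

	if (napqis == 0) {
		/* No domains yet: some bound queue must be on @apid. */
		for (i = 0; i < nbound; i++)
			if (((bound[i] >> 8) & 0xffu) == apid)
				return 0;
		return -1;
	}

	/* Every APQN in {apid} x {assigned APQIs} must be bound. */
	for (i = 0; i < napqis; i++)
		if (!apqn_is_bound(bound, nbound, SKETCH_APQN(apid, apqis[i])))
			return -1;
	return 0;
}
```

Using the cards and domains from the cover letter: with queues (3,1), (3,2) and (4,1) bound and domains 1 and 2 assigned, adapter 3 is accepted, but adapter 4 is rejected because (4,2) is not bound.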
>
>
>> + *
>> + * * Else each APQN that can be derived from the cross product of @apid and
>> + * the IDs of the AP domains already assigned must identify an AP queue
>> + * that has been reserved by the VFIO AP device driver.
>> + *
>> + * Returns the number of bytes processed if the APID is valid; otherwise returns
>> + * an error.
>> + */
>> +static ssize_t assign_adapter_store(struct device *dev,
>> + struct device_attribute *attr,
>> + const char *buf, size_t count)
>> +{
>> + int ret;
>> + unsigned long apid;
>> + struct mdev_device *mdev = mdev_from_dev(dev);
>> + struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>> + unsigned long max_apid = matrix_mdev->matrix.apm_max;
>> +
>> + ret = kstrtoul(buf, 0, &apid);
>> + if (ret || (apid > max_apid)) {
>> + pr_err("%s: %s: adapter id '%s' not a value from 0 to %02lu(%#04lx)",
>> + VFIO_AP_MODULE_NAME, __func__, buf, max_apid, max_apid);
>> +
>> + return ret ? ret : -EINVAL;
>> + }
>> +
>> + ret = vfio_ap_validate_apid(mdev, matrix_mdev, apid);
>> + if (ret)
>> + return ret;
>> +
>> + /* Set the bit in the AP mask (APM) corresponding to the AP adapter
>> + * number (APID). The bits in the mask, from most significant to least
>> + * significant bit, correspond to APIDs 0-255.
>> + */
>> + set_bit_inv(apid, matrix_mdev->matrix.apm);
>> +
>> + return count;
>> +}
>> +static DEVICE_ATTR_WO(assign_adapter);
>> +
>> +/**
>> + * unassign_adapter_store
>> + *
>> + * @dev: the matrix device
>> + * @attr: a mediated matrix device attribute
>> + * @buf: a buffer containing the adapter ID (APID) to be unassigned
>> + * @count: the number of bytes in @buf
>> + *
>> + * Parses the APID from @buf and unassigns it from the mediated matrix device.
>> + * The APID must be a valid value
>> + *
>> + * Returns the number of bytes processed if the APID is valid; otherwise returns
>> + * an error.
>> + */
>> +static ssize_t unassign_adapter_store(struct device *dev,
>> + struct device_attribute *attr,
>> + const char *buf, size_t count)
>> +{
>> + int ret;
>> + unsigned long apid;
>> + struct mdev_device *mdev = mdev_from_dev(dev);
>> + struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>> + unsigned long max_apid = matrix_mdev->matrix.apm_max;
>> +
>> + ret = kstrtoul(buf, 0, &apid);
>> + if (ret || (apid > max_apid)) {
>> + pr_err("%s: %s: adapter id '%s' must be a value from 0 to %02lu(%#04lx)",
>> + VFIO_AP_MODULE_NAME, __func__, buf, max_apid, max_apid);
>> +
>> + return ret ? ret : -EINVAL;
>> + }
>> +
>> + if (!test_bit_inv(apid, matrix_mdev->matrix.apm)) {
>> + pr_err("%s: %s: adapter id %02lu(%#04lx) not assigned",
>> + VFIO_AP_MODULE_NAME, __func__, apid, apid);
>> +
>> + return -ENODEV;
>> + }
>> +
>> + clear_bit_inv((unsigned long)apid, matrix_mdev->matrix.apm);
>> +
>> + return count;
>> +}
>> +DEVICE_ATTR_WO(unassign_adapter);
>> +
>> +static struct attribute *vfio_ap_mdev_attrs[] = {
>> + &dev_attr_assign_adapter.attr,
>> + &dev_attr_unassign_adapter.attr,
>> + NULL
>> +};
>> +
>> +static struct attribute_group vfio_ap_mdev_attr_group = {
>> + .attrs = vfio_ap_mdev_attrs
>> +};
>> +
>> +static const struct attribute_group *vfio_ap_mdev_attr_groups[] = {
>> + &vfio_ap_mdev_attr_group,
>> + NULL
>> +};
>> +
>> static const struct mdev_parent_ops vfio_ap_matrix_ops = {
>> .owner = THIS_MODULE,
>> .supported_type_groups = vfio_ap_mdev_type_groups,
>> + .mdev_attr_groups = vfio_ap_mdev_attr_groups,
>> .create = vfio_ap_mdev_create,
>> .remove = vfio_ap_mdev_remove,
>> };
>
>
On 07/09/2018 04:38 PM, Pierre Morel wrote:
> On 09/07/2018 14:20, Pierre Morel wrote:
>> On 29/06/2018 23:11, Tony Krowiak wrote:
>>> Provides a sysfs interface to view the AP matrix configured for the
>>> mediated matrix device.
>>>
>>> The relevant sysfs structures are:
>>>
>>> /sys/devices/vfio_ap
>>> ... [matrix]
>>> ...... [mdev_supported_types]
>>> ......... [vfio_ap-passthrough]
>>> ............ [devices]
>>> ...............[$uuid]
>>> .................. matrix
>>>
>>> To view the matrix configured for the mediated matrix device,
>>> print the matrix file:
>>>
>>> cat matrix
>>>
>>> Signed-off-by: Tony Krowiak <[email protected]>
>>> ---
>>> drivers/s390/crypto/vfio_ap_ops.c | 31 +++++++++++++++++++++++++++++++
>>> 1 files changed, 31 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>>> index c8f31f3..bc7398d 100644
>>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>>> @@ -697,6 +697,36 @@ static ssize_t control_domains_show(struct device *dev,
>>> }
>>> DEVICE_ATTR_RO(control_domains);
>>>
>>> +static ssize_t matrix_show(struct device *dev, struct device_attribute *attr,
>>> + char *buf)
>>> +{
>>> + struct mdev_device *mdev = mdev_from_dev(dev);
>>> + struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>>> + char *bufpos = buf;
>>> + unsigned long apid;
>>> + unsigned long apqi;
>>> + unsigned long napm = matrix_mdev->matrix.apm_max + 1;
>>> + unsigned long naqm = matrix_mdev->matrix.aqm_max + 1;
>>> + int nchars = 0;
>>> + int n;
>>> +
>>> + for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, napm) {
>>> + n = sprintf(bufpos, "%02lx\n", apid);
>>> + bufpos += n;
>>> + nchars += n;
>>> +
>>> + for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, naqm) {
>>> + n = sprintf(bufpos, "%02lx.%04lx\n", apid, apqi);
>>> + bufpos += n;
>>> + nchars += n;
>>> + }
>>> + }
>>> +
>>> + return nchars;
>>> +}
>>> +DEVICE_ATTR_RO(matrix);
>>> +
>>> +
>>> static struct attribute *vfio_ap_mdev_attrs[] = {
>>> &dev_attr_assign_adapter.attr,
>>> &dev_attr_unassign_adapter.attr,
>>> @@ -705,6 +735,7 @@ static ssize_t control_domains_show(struct device *dev,
>>> &dev_attr_assign_control_domain.attr,
>>> &dev_attr_unassign_control_domain.attr,
>>> &dev_attr_control_domains.attr,
>>> + &dev_attr_matrix.attr,
>>> NULL,
>>> };
>>>
>>
>> I still have the same remark: what you show here is not what is
>> currently used by the SIE.
>> It is not irrelevant, but what the guest really uses may be more
>> interesting for the admin.
>>
>>
> OK, you implement the right view in patch 16/21.
>
> Still, what is the purpose of showing this view?
I find it to have great value when configuring the mdev. It provides a
view of what has been configured thus far.
>
>
>
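The output of the matrix_show() loop quoted above can be reproduced with a user-space sketch. One difference is deliberate: a sysfs show() buffer is a single page, and the quoted loop uses unbounded sprintf(), which could overrun it for a large matrix (the kernel provides bounded helpers such as scnprintf() for this). The sketch therefore writes with an explicit size and drops lines that would not fit. inv_test_bit(), append() and format_matrix() are hypothetical names, and the masks are modeled as 64-bit words with MSB-first ("inverted") bit numbering.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: MSB-first bit numbering, as with s390 test_bit_inv(). */
static int inv_test_bit(const uint64_t *mask, unsigned long nr)
{
	unsigned long bit = nr ^ 63;

	return (int)((mask[bit / 64] >> (bit % 64)) & 1);
}

/* Append one formatted line; return -1 (and drop it) if it does not fit. */
static int append(char *buf, size_t size, size_t *len, const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(buf + *len, size - *len, fmt, ap);
	va_end(ap);
	if (n < 0 || (size_t)n >= size - *len) {
		buf[*len] = '\0';	/* discard the truncated partial line */
		return -1;
	}
	*len += (size_t)n;
	return 0;
}

/*
 * Same format as matrix_show(): an "apid" line per assigned adapter,
 * followed by an "apid.apqi" line per assigned domain.
 */
static size_t format_matrix(char *buf, size_t size,
			    const uint64_t *apm, unsigned long napm,
			    const uint64_t *aqm, unsigned long naqm)
{
	unsigned long apid, apqi;
	size_t len = 0;

	assert(size > 0);
	buf[0] = '\0';
	for (apid = 0; apid < napm; apid++) {
		if (!inv_test_bit(apm, apid))
			continue;
		if (append(buf, size, &len, "%02lx\n", apid))
			return len;
		for (apqi = 0; apqi < naqm; apqi++)
			if (inv_test_bit(aqm, apqi) &&
			    append(buf, size, &len, "%02lx.%04lx\n", apid, apqi))
				return len;
	}
	return len;
}
```

For adapter 3 with domains 1 and 2 assigned, this produces "03", "03.0001" and "03.0002" on separate lines, the same cross-product view discussed above.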
On 07/13/2018 02:24 PM, Tony Krowiak wrote:
> On 07/09/2018 04:38 PM, Pierre Morel wrote:
>> On 09/07/2018 14:20, Pierre Morel wrote:
>>> On 29/06/2018 23:11, Tony Krowiak wrote:
>>>> Provides a sysfs interface to view the AP matrix configured for the
>>>> mediated matrix device.
>>>>
>>>> The relevant sysfs structures are:
>>>>
>>>> /sys/devices/vfio_ap
>>>> ... [matrix]
>>>> ...... [mdev_supported_types]
>>>> ......... [vfio_ap-passthrough]
>>>> ............ [devices]
>>>> ...............[$uuid]
>>>> .................. matrix
>>>>
>>>> To view the matrix configured for the mediated matrix device,
>>>> print the matrix file:
>>>>
>>>> cat matrix
>>>>
>>>> Signed-off-by: Tony Krowiak <[email protected]>
>>>> ---
>>>> drivers/s390/crypto/vfio_ap_ops.c | 31 +++++++++++++++++++++++++++++++
>>>> 1 files changed, 31 insertions(+), 0 deletions(-)
>>>>
>>>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>>>> index c8f31f3..bc7398d 100644
>>>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>>>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>>>> @@ -697,6 +697,36 @@ static ssize_t control_domains_show(struct device *dev,
>>>> }
>>>> DEVICE_ATTR_RO(control_domains);
>>>>
>>>> +static ssize_t matrix_show(struct device *dev, struct device_attribute *attr,
>>>> + char *buf)
>>>> +{
>>>> + struct mdev_device *mdev = mdev_from_dev(dev);
>>>> + struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>>>> + char *bufpos = buf;
>>>> + unsigned long apid;
>>>> + unsigned long apqi;
>>>> + unsigned long napm = matrix_mdev->matrix.apm_max + 1;
>>>> + unsigned long naqm = matrix_mdev->matrix.aqm_max + 1;
>>>> + int nchars = 0;
>>>> + int n;
>>>> +
>>>> + for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, napm) {
>>>> + n = sprintf(bufpos, "%02lx\n", apid);
>>>> + bufpos += n;
>>>> + nchars += n;
>>>> +
>>>> + for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, naqm) {
>>>> + n = sprintf(bufpos, "%02lx.%04lx\n", apid, apqi);
>>>> + bufpos += n;
>>>> + nchars += n;
>>>> + }
>>>> + }
>>>> +
>>>> + return nchars;
>>>> +}
>>>> +DEVICE_ATTR_RO(matrix);
>>>> +
>>>> +
>>>> static struct attribute *vfio_ap_mdev_attrs[] = {
>>>> &dev_attr_assign_adapter.attr,
>>>> &dev_attr_unassign_adapter.attr,
>>>> @@ -705,6 +735,7 @@ static ssize_t control_domains_show(struct device *dev,
>>>> &dev_attr_assign_control_domain.attr,
>>>> &dev_attr_unassign_control_domain.attr,
>>>> &dev_attr_control_domains.attr,
>>>> + &dev_attr_matrix.attr,
>>>> NULL,
>>>> };
>>>>
>>>
>>> I still have the same remark: what you show here is not what is currently
>>> used by the SIE.
>>> It is not irrelevant, but what the guest really uses may be more interesting
>>> for the admin.
>>>
>>>
>> OK, you implement the right view in patch 16/21.
>>
>> Still, what is the purpose of showing this view?
>
> I find it to have great value when configuring the mdev. It provides a view of
> what has been configured thus far.
>
IMHO we need to keep this view for the reason stated by Tony.
Halil