On s390, we have cryptographic coprocessor cards, which are modeled on
Linux as devices on the AP bus. Each card can be partitioned into domains,
each of which can be thought of as a set of hardware registers for
processing crypto commands. Crypto commands are sent to a specific domain
within a card via a queue, which is identified by a (card,domain) tuple. We model
this something like the following (assuming we have access to cards 3 and
4 and domains 1 and 2):
AP -> card3 -> queue (3,1)
            -> queue (3,2)
   -> card4 -> queue (4,1)
            -> queue (4,2)
If we want to virtualize this, we can use a feature provided by the
hardware. We basically attach a satellite control block to our main
hardware virtualization control block and the hardware takes care of
most of the rest.
For this control block, we don't specify explicit tuples, but a list of
cards and a list of domains. The guest will get access to the cross
product.
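To illustrate the cross product, here is a minimal userspace sketch. It
is not code from this patch set: the ap_queue_id type and the
ap_cross_product() helper are hypothetical names chosen for the example.
Given cards {3, 4} and domains {1, 2}, it yields the four tuples shown in
the diagram above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type for illustration: one (card, domain) queue tuple. */
struct ap_queue_id {
	unsigned int card;
	unsigned int domain;
};

/*
 * Fill @out with the cross product of @cards and @domains and return the
 * number of tuples written. @out must have room for ncards * ndomains
 * entries.
 */
static size_t ap_cross_product(const unsigned int *cards, size_t ncards,
			       const unsigned int *domains, size_t ndomains,
			       struct ap_queue_id *out)
{
	size_t i, j, n = 0;

	assert(out != NULL);

	for (i = 0; i < ncards; i++) {
		for (j = 0; j < ndomains; j++) {
			out[n].card = cards[i];
			out[n].domain = domains[j];
			n++;
		}
	}

	return n;
}
```

Note that the guest cannot be given, say, only (3,1) and (4,2): listing
cards 3 and 4 together with domains 1 and 2 always grants all four queues.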
Because of this, we need to take care that the lists provided to
different guests don't overlap; i.e., we need to enforce sane
configurations. Otherwise, one guest may get access to things like
secret keys for another guest.
The idea of this patch set is to introduce a new device, the matrix
device. This matrix device hangs off a different root and acts as the
parent node for mdev devices.
To give the tuples (4,1) and (4,2) to a guest, you need to do the
following:
- Unbind the (4,1) and (4,2) tuples from their ap bus driver.
- Bind the (4,1) and (4,2) tuples to the vfio_ap driver.
- Create the mediated device.
- Assign card 4 and domains 1 and 2 to the mediated device.
QEMU will now simply consume the mediated device and things should work.
For a complete description of the architecture and concepts underlying the
design, see the Documentation/s390/vfio-ap.txt file included with this
patch set.
v4 => v5 Change log:
===================
* Added code to mdev open callback to ensure not more than one vfio-ap
device can be opened by a guest.
* Interpret AP instructions by default
* Removed patch implementing interface to enable/disable AP interpretation,
since that will now be done by default
* Removed patch to reset crypto attributes for ALL vcpus. That will be
submitted as a single patch since it will not be needed in this series -
i.e., it was called from the interface to enable/disable AP instructions
* All code for initializing crypto for a guest has been moved back to
kvm-s390.c, kvm_s390_crypto_init(kvm) function
* Maintaining a module reference count for the vfio_ap module so it is not
removed while a guest with AP devices is running.
* AP bus interfaces needed by KVM that are unavailable if CONFIG_ZCRYPT=n
are temporarily embedded in KVM until available statically via future
patch.
Tony Krowiak (13):
KVM: s390: Interface to test whether APXA installed
KVM: s390: refactor crypto initialization
KVM: s390: CPU model support for AP virtualization
s390: vfio-ap: base implementation of VFIO AP device driver
s390: vfio-ap: register matrix device with VFIO mdev framework
KVM: s390: interfaces to manage guest's AP matrix
s390: vfio-ap: sysfs interfaces to configure adapters
s390: vfio-ap: sysfs interfaces to configure domains
s390: vfio-ap: sysfs interfaces to configure control domains
s390: vfio-ap: sysfs interface to view matrix mdev matrix
KVM: s390: implement mediated device open callback
s390: vfio-ap: implement VFIO_DEVICE_GET_INFO ioctl
s390: doc: detailed specifications for AP virtualization
Documentation/s390/vfio-ap.txt | 575 +++++++++++++++++++++
MAINTAINERS | 13 +
arch/s390/Kconfig | 11 +
arch/s390/include/asm/kvm-ap.h | 133 +++++
arch/s390/include/asm/kvm_host.h | 5 +
arch/s390/include/uapi/asm/kvm.h | 1 +
arch/s390/kvm/Makefile | 2 +-
arch/s390/kvm/kvm-ap.c | 263 ++++++++++
arch/s390/kvm/kvm-s390.c | 116 +++---
arch/s390/tools/gen_facilities.c | 3 +
drivers/s390/crypto/Makefile | 4 +
drivers/s390/crypto/vfio_ap_drv.c | 143 ++++++
drivers/s390/crypto/vfio_ap_ops.c | 906 +++++++++++++++++++++++++++++++++
drivers/s390/crypto/vfio_ap_private.h | 76 +++
include/uapi/linux/vfio.h | 2 +
15 files changed, 2194 insertions(+), 59 deletions(-)
create mode 100644 Documentation/s390/vfio-ap.txt
create mode 100644 arch/s390/include/asm/kvm-ap.h
create mode 100644 arch/s390/kvm/kvm-ap.c
create mode 100644 drivers/s390/crypto/vfio_ap_drv.c
create mode 100644 drivers/s390/crypto/vfio_ap_ops.c
create mode 100644 drivers/s390/crypto/vfio_ap_private.h
Introduces a new CPU model feature and two CPU model
facilities to support AP virtualization for KVM guests.
CPU model feature:
The KVM_S390_VM_CPU_FEAT_AP feature indicates that
AP instructions are available on the guest. This
feature will be enabled by the kernel only if the AP
instructions are installed on the linux host. This feature
must be specifically turned on for the KVM guest from
userspace to use the VFIO AP device driver for guest
access to AP devices.
By default, AP instructions will be interpreted if this
feature is turned on for the KVM guest. This guarantees
that AP instructions executed on the guest will not be
met with an operation exception due to the fact that there
are no handlers to process intercepted AP instructions.
CPU model facilities:
1. AP Query Configuration Information (QCI) facility is installed.
This is indicated by setting facilities bit 12 for
the guest. The kernel will not enable this facility
for the guest if it is not set on the host. This facility
must not be set by userspace if the KVM_S390_VM_CPU_FEAT_AP
feature is not installed.
If this facility is not set for the KVM guest, then only
APQNs with an APQI less than 16 will be available to the
guest regardless of the guest's matrix configuration. This
is a limitation of the AP bus running on the guest.
2. AP Facilities Test facility (APFT) is installed.
This is indicated by setting facilities bit 15 for
the guest. The kernel will not enable this facility for
the guest if it is not set on the host. This facility
must not be set by userspace if the KVM_S390_VM_CPU_FEAT_AP
feature is not installed.
If this facility is not set for the KVM guest, then no
AP devices will be available to the guest regardless of
the guest's matrix configuration. This is a limitation
of the AP bus running under the guest.
Reviewed-by: Christian Borntraeger <[email protected]>
Reviewed-by: Halil Pasic <[email protected]>
Signed-off-by: Tony Krowiak <[email protected]>
---
arch/s390/include/asm/kvm_host.h | 2 ++
arch/s390/include/uapi/asm/kvm.h | 1 +
arch/s390/kvm/kvm-s390.c | 12 ++++++++++++
arch/s390/tools/gen_facilities.c | 3 +++
4 files changed, 18 insertions(+), 0 deletions(-)
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 5393c4d..ef4b237 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -186,6 +186,7 @@ struct kvm_s390_sie_block {
#define ECA_AIV 0x00200000
#define ECA_VX 0x00020000
#define ECA_PROTEXCI 0x00002000
+#define ECA_APIE 0x00000008
#define ECA_SII 0x00000001
__u32 eca; /* 0x004c */
#define ICPT_INST 0x04
@@ -714,6 +715,7 @@ struct kvm_s390_crypto {
__u32 crycbd;
__u8 aes_kw;
__u8 dea_kw;
+ __u8 apie;
};
#define APCB0_MASK_SIZE 1
diff --git a/arch/s390/include/uapi/asm/kvm.h b/arch/s390/include/uapi/asm/kvm.h
index 4cdaa55..a580dec 100644
--- a/arch/s390/include/uapi/asm/kvm.h
+++ b/arch/s390/include/uapi/asm/kvm.h
@@ -130,6 +130,7 @@ struct kvm_s390_vm_cpu_machine {
#define KVM_S390_VM_CPU_FEAT_PFMFI 11
#define KVM_S390_VM_CPU_FEAT_SIGPIF 12
#define KVM_S390_VM_CPU_FEAT_KSS 13
+#define KVM_S390_VM_CPU_FEAT_AP 14
struct kvm_s390_vm_cpu_feat {
__u64 feat[16];
};
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 99779a6..81fbb0d 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -367,6 +367,11 @@ static void kvm_s390_cpu_feat_init(void)
if (MACHINE_HAS_ESOP)
allow_cpu_feat(KVM_S390_VM_CPU_FEAT_ESOP);
+
+ /* Check if AP instructions installed on host */
+ if (kvm_ap_instructions_available())
+ allow_cpu_feat(KVM_S390_VM_CPU_FEAT_AP);
+
/*
* We need SIE support, ESOP (PROT_READ protection for gmap_shadow),
* 64bit SCAO (SCA passthrough) and IDTE (for gmap_shadow unshadowing).
@@ -1928,6 +1933,8 @@ static void kvm_s390_crypto_init(struct kvm *kvm)
kvm->arch.crypto.crycb = &kvm->arch.sie_page2->crycb;
kvm->arch.crypto.crycbd = (__u32)(unsigned long) kvm->arch.crypto.crycb;
kvm_s390_format_crycb(kvm);
+ /* Default setting indicating SIE shall interpret AP instructions */
+ kvm->arch.crypto.apie = 1;
}
static void sca_dispose(struct kvm *kvm)
@@ -2458,6 +2465,11 @@ static void kvm_s390_vcpu_crypto_setup(struct kvm_vcpu *vcpu)
vcpu->arch.sie_block->crycbd = vcpu->kvm->arch.crypto.crycbd;
+ vcpu->arch.sie_block->eca &= ~ECA_APIE;
+ if (vcpu->kvm->arch.crypto.apie &&
+ test_kvm_cpu_feat(vcpu->kvm, KVM_S390_VM_CPU_FEAT_AP))
+ vcpu->arch.sie_block->eca |= ECA_APIE;
+
/* If MSAX3 is installed, set up protected key support */
if (test_kvm_facility(vcpu->kvm, 76)) {
vcpu->arch.sie_block->ecb3 &= ~(ECB3_AES | ECB3_DEA);
diff --git a/arch/s390/tools/gen_facilities.c b/arch/s390/tools/gen_facilities.c
index 90a8c9e..e0e2c19 100644
--- a/arch/s390/tools/gen_facilities.c
+++ b/arch/s390/tools/gen_facilities.c
@@ -106,6 +106,9 @@ struct facility_def {
.name = "FACILITIES_KVM_CPUMODEL",
.bits = (int[]){
+ 12, /* AP Query Configuration Information */
+ 15, /* AP Facilities Test */
+ 156, /* Execution Token facility */
-1 /* END */
}
},
--
1.7.1
Provides interfaces to manage the AP adapters, usage domains
and control domains assigned to a KVM guest.
The guest's SIE state description has a satellite structure called the
Crypto Control Block (CRYCB) containing three bitmask fields
identifying the adapters, queues (domains) and control domains
assigned to the KVM guest:
* The AP Adapter Mask (APM) field identifies the AP adapters assigned to
the KVM guest
* The AP Queue Mask (AQM) field identifies the AP queues assigned to
the KVM guest. Each AP queue is connected to a usage domain within
an AP adapter.
* The AP Domain Mask (ADM) field identifies the control domains
assigned to the KVM guest.
Each adapter, queue (usage domain) and control domain is identified by
a number from 0 to 255. The bits in each mask, from most significant to
least significant bit, correspond to the numbers 0-255. When a bit is
set, the corresponding adapter, queue (usage domain) or control domain
is assigned to the KVM guest.
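The most-significant-bit-first numbering described above can be sketched
in plain C as follows. This is an illustrative userspace model that treats
a mask as an array of bytes; the helper names ap_mask_set() and
ap_mask_test() are hypothetical. The kernel code in this series uses the
s390 set_bit_inv()/test_bit_inv() helpers on unsigned long bitmaps instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define AP_MAX_ID	255
#define AP_MASK_BYTES	((AP_MAX_ID + 1) / 8)

/* Set the bit for @id; ID 0 is the most significant bit of byte 0. */
static void ap_mask_set(uint8_t *mask, unsigned int id)
{
	assert(id <= AP_MAX_ID);
	mask[id / 8] |= (uint8_t)(0x80u >> (id % 8));
}

/* Test the bit for @id using the same MSB-first numbering. */
static bool ap_mask_test(const uint8_t *mask, unsigned int id)
{
	assert(id <= AP_MAX_ID);
	return (mask[id / 8] & (0x80u >> (id % 8))) != 0;
}
```

With this numbering, ID 0 maps to value 0x80 in the first byte, and ID 173
maps to value 0x04 in byte 21.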
This patch will set the bits in the APM, AQM and ADM fields of the
CRYCB referenced by the KVM guest's SIE state description. The process
used is:
1. Verify that the bits to be set do not exceed the maximum bit
number for the given mask.
2. Verify that the APQNs that can be derived from the cross product
of the bits set in the APM and AQM fields of the KVM guest's CRYCB
are not assigned to any other KVM guest running on the same linux
host.
3. Set the APM, AQM and ADM in the CRYCB according to the matrix
configured for the mediated matrix device via its sysfs
assign_adapter, assign_domain and assign_control_domain attribute
files respectively.
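The three steps can be modeled in userspace roughly as follows. This is a
simplified sketch, not the kernel implementation: struct matrix,
highest_id(), intersects() and configure_matrix() are hypothetical names,
the -EINVAL return for step 1 is an assumption, and the sketch omits the
merge of the AQM into the ADM that the patch performs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MASK_BYTES 32	/* 256 IDs, one bit each, MSB first */

/* Hypothetical model of the three CRYCB masks. */
struct matrix {
	uint8_t apm[MASK_BYTES];
	uint8_t aqm[MASK_BYTES];
	uint8_t adm[MASK_BYTES];
};

/* Return the highest ID set in an MSB-first mask, or -1 if none is set. */
static int highest_id(const uint8_t *mask)
{
	int id;

	for (id = MASK_BYTES * 8 - 1; id >= 0; id--)
		if (mask[id / 8] & (0x80u >> (id % 8)))
			return id;
	return -1;
}

/* Return 1 if the two masks have at least one bit in common. */
static int intersects(const uint8_t *a, const uint8_t *b)
{
	size_t i;

	for (i = 0; i < MASK_BYTES; i++)
		if (a[i] & b[i])
			return 1;
	return 0;
}

static int configure_matrix(struct matrix *crycb, const struct matrix *req,
			    const struct matrix *others, size_t nothers,
			    int max_id)
{
	size_t i;

	assert(crycb != NULL && req != NULL);

	/* Step 1: no requested bit may exceed the maximum bit number. */
	if (highest_id(req->apm) > max_id || highest_id(req->aqm) > max_id ||
	    highest_id(req->adm) > max_id)
		return -EINVAL;

	/* Step 2: a shared APQN needs a common adapter AND a common queue. */
	for (i = 0; i < nothers; i++)
		if (intersects(req->apm, others[i].apm) &&
		    intersects(req->aqm, others[i].aqm))
			return -EBUSY;

	/* Step 3: only now copy the masks into the guest's CRYCB. */
	memcpy(crycb, req, sizeof(*crycb));
	return 0;
}
```

Because validation completes before anything is copied, a rejected request
leaves the guest's existing masks untouched.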
Signed-off-by: Tony Krowiak <[email protected]>
---
arch/s390/include/asm/kvm-ap.h | 52 ++++++++++++
arch/s390/include/asm/kvm_host.h | 1 +
arch/s390/kvm/kvm-ap.c | 161 ++++++++++++++++++++++++++++++++++++++
3 files changed, 214 insertions(+), 0 deletions(-)
diff --git a/arch/s390/include/asm/kvm-ap.h b/arch/s390/include/asm/kvm-ap.h
index 6af1ff8..21fe9f2 100644
--- a/arch/s390/include/asm/kvm-ap.h
+++ b/arch/s390/include/asm/kvm-ap.h
@@ -12,8 +12,33 @@
#include <linux/types.h>
#include <linux/kvm_host.h>
+#include <linux/bitops.h>
#include <asm/ap.h>
+#define KVM_AP_MASK_BYTES(n) DIV_ROUND_UP(n, BITS_PER_BYTE)
+
+/**
+ * The AP matrix is comprised of three bit masks identifying the adapters,
+ * queues (domains) and control domains that belong to an AP matrix. The bits in
+ * each mask, from most significant to least significant bit, correspond to IDs
+ * 0 to 255. When a bit is set, the corresponding ID belongs to the matrix.
+ *
+ * @apm identifies the AP adapters in the matrix
+ * @apm_max: max adapter number in @apm
+ * @aqm identifies the AP queues (domains) in the matrix
+ * @aqm_max: max domain number in @aqm
+ * @adm identifies the AP control domains in the matrix
+ * @adm_max: max domain number in @adm
+ */
+struct kvm_ap_matrix {
+ unsigned long apm_max;
+ DECLARE_BITMAP(apm, 256);
+ unsigned long aqm_max;
+ DECLARE_BITMAP(aqm, 256);
+ unsigned long adm_max;
+ DECLARE_BITMAP(adm, 256);
+};
+
/**
* kvm_ap_apxa_installed
*
@@ -57,4 +82,31 @@
*/
bool kvm_ap_instructions_available(void);
+/**
+ * kvm_ap_configure_matrix
+ *
+ * Configure the AP matrix for a KVM guest.
+ *
+ * @kvm: the KVM guest
+ * @matrix: the matrix configuration information
+ *
+ * Returns 0 if:
+ * 1. The AP instructions are installed on the guest
+ * 2. The APQNs derived from the intersection of the set of adapter
+ * IDs (APM) and queue indexes (AQM) in @matrix are not configured for
+ * any other KVM guest running on the same linux host.
+ * Otherwise returns an error code.
+ */
+int kvm_ap_configure_matrix(struct kvm *kvm, struct kvm_ap_matrix *matrix);
+
+/**
+ * kvm_ap_deconfigure_matrix
+ *
+ * Deconfigure the AP matrix for a KVM guest. Clears all of the bits in the
+ * APM, AQM and ADM in the guest's CRYCB.
+ *
+ * @kvm: the KVM guest
+ */
+void kvm_ap_deconfigure_matrix(struct kvm *kvm);
+
#endif /* _ASM_KVM_AP */
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index ef4b237..8736cde 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -257,6 +257,7 @@ struct kvm_s390_sie_block {
__u64 tecmc; /* 0x00e8 */
__u8 reservedf0[12]; /* 0x00f0 */
#define CRYCB_FORMAT_MASK 0x00000003
+#define CRYCB_FORMAT0 0x00000000
#define CRYCB_FORMAT1 0x00000001
#define CRYCB_FORMAT2 0x00000003
__u32 crycbd; /* 0x00fc */
diff --git a/arch/s390/kvm/kvm-ap.c b/arch/s390/kvm/kvm-ap.c
index 00bcfb0..98b53c7 100644
--- a/arch/s390/kvm/kvm-ap.c
+++ b/arch/s390/kvm/kvm-ap.c
@@ -7,6 +7,7 @@
* Author(s): Tony Krowiak <[email protected]>
*/
#include <linux/kernel.h>
+#include <linux/bitops.h>
#include <asm/kvm-ap.h>
#include "kvm-s390.h"
@@ -81,3 +82,163 @@ int kvm_ap_apxa_installed(void)
return 0;
}
EXPORT_SYMBOL(kvm_ap_apxa_installed);
+
+static inline void kvm_ap_clear_crycb_masks(struct kvm *kvm)
+{
+ memset(&kvm->arch.crypto.crycb->apcb0, 0,
+ sizeof(kvm->arch.crypto.crycb->apcb0));
+ memset(&kvm->arch.crypto.crycb->apcb1, 0,
+ sizeof(kvm->arch.crypto.crycb->apcb1));
+}
+
+static inline unsigned long *kvm_ap_get_crycb_apm(struct kvm *kvm)
+{
+ unsigned long *apm;
+
+ switch (kvm->arch.crypto.crycbd & CRYCB_FORMAT_MASK) {
+ case CRYCB_FORMAT2:
+ apm = (unsigned long *)kvm->arch.crypto.crycb->apcb1.apm;
+ break;
+ case CRYCB_FORMAT1:
+ case CRYCB_FORMAT0:
+ default:
+ apm = (unsigned long *)kvm->arch.crypto.crycb->apcb0.apm;
+ break;
+ }
+
+ return apm;
+}
+
+static inline unsigned long *kvm_ap_get_crycb_aqm(struct kvm *kvm)
+{
+ unsigned long *aqm;
+
+ switch (kvm->arch.crypto.crycbd & CRYCB_FORMAT_MASK) {
+ case CRYCB_FORMAT2:
+ aqm = (unsigned long *)kvm->arch.crypto.crycb->apcb1.aqm;
+ break;
+ case CRYCB_FORMAT1:
+ case CRYCB_FORMAT0:
+ default:
+ aqm = (unsigned long *)kvm->arch.crypto.crycb->apcb0.aqm;
+ break;
+ }
+
+ return aqm;
+}
+
+static inline unsigned long *kvm_ap_get_crycb_adm(struct kvm *kvm)
+{
+ unsigned long *adm;
+
+ switch (kvm->arch.crypto.crycbd & CRYCB_FORMAT_MASK) {
+ case CRYCB_FORMAT2:
+ adm = (unsigned long *)kvm->arch.crypto.crycb->apcb1.adm;
+ break;
+ case CRYCB_FORMAT1:
+ case CRYCB_FORMAT0:
+ default:
+ adm = (unsigned long *)kvm->arch.crypto.crycb->apcb0.adm;
+ break;
+ }
+
+ return adm;
+}
+
+static void kvm_ap_set_crycb_masks(struct kvm *kvm,
+ struct kvm_ap_matrix *matrix)
+{
+ int nbytes;
+ unsigned long *apm = kvm_ap_get_crycb_apm(kvm);
+ unsigned long *aqm = kvm_ap_get_crycb_aqm(kvm);
+ unsigned long *adm = kvm_ap_get_crycb_adm(kvm);
+
+ kvm_ap_clear_crycb_masks(kvm);
+
+ nbytes = KVM_AP_MASK_BYTES(matrix->apm_max + 1);
+ memcpy(apm, matrix->apm, nbytes);
+
+ nbytes = KVM_AP_MASK_BYTES(matrix->aqm_max + 1);
+ memcpy(aqm, matrix->aqm, nbytes);
+
+ /*
+ * Merge the AQM and ADM since the ADM is a superset of the
+ * AQM by agreed-upon convention.
+ */
+ bitmap_or(adm, matrix->adm, matrix->aqm, matrix->adm_max + 1);
+}
+
+static void kvm_ap_log_sharing_err(struct kvm *kvm, unsigned long apid,
+ unsigned long apqi)
+{
+ pr_err("%s: AP queue %02lx.%04lx is registered to guest %s", __func__,
+ apid, apqi, kvm->arch.dbf->name);
+}
+
+/**
+ * kvm_ap_validate_queue_sharing
+ *
+ * Verifies that the APQNs derived from the cross product of the AP adapter IDs
+ * and AP queue indexes comprising the AP matrix are not configured for
+ * another guest. AP queue sharing is not allowed.
+ *
+ * @kvm: the KVM guest
+ * @matrix: the AP matrix
+ *
+ * Returns 0 if the APQNs are valid; otherwise, returns -EBUSY.
+ */
+static int kvm_ap_validate_queue_sharing(struct kvm *kvm,
+ struct kvm_ap_matrix *matrix)
+{
+ struct kvm *vm;
+ DECLARE_BITMAP(apm, 256);
+ DECLARE_BITMAP(aqm, 256);
+ unsigned long apid, apqi;
+
+ /* No other VM may share an AP Queue with the input VM */
+ list_for_each_entry(vm, &vm_list, vm_list) {
+ if (kvm == vm)
+ continue;
+
+ /* AND into local bitmaps; never write to another guest's CRYCB */
+ if (!bitmap_and(apm, kvm_ap_get_crycb_apm(vm), matrix->apm,
+ matrix->apm_max + 1))
+ continue;
+ if (!bitmap_and(aqm, kvm_ap_get_crycb_aqm(vm), matrix->aqm,
+ matrix->aqm_max + 1))
+ continue;
+
+ for_each_set_bit_inv(apid, apm, matrix->apm_max + 1)
+ for_each_set_bit_inv(apqi, aqm, matrix->aqm_max + 1)
+ kvm_ap_log_sharing_err(vm, apid, apqi);
+
+ return -EBUSY;
+ }
+
+ return 0;
+}
+
+int kvm_ap_configure_matrix(struct kvm *kvm, struct kvm_ap_matrix *matrix)
+{
+ int ret = 0;
+
+ mutex_lock(&kvm->lock);
+
+ ret = kvm_ap_validate_queue_sharing(kvm, matrix);
+ if (ret)
+ goto done;
+
+ kvm_ap_set_crycb_masks(kvm, matrix);
+
+done:
+ mutex_unlock(&kvm->lock);
+
+ return ret;
+}
+EXPORT_SYMBOL(kvm_ap_configure_matrix);
+
+void kvm_ap_deconfigure_matrix(struct kvm *kvm)
+{
+ kvm_ap_clear_crycb_masks(kvm);
+}
+EXPORT_SYMBOL(kvm_ap_deconfigure_matrix);
--
1.7.1
Implements the open callback on the mediated matrix device.
The function registers a group notifier to receive notification
of the VFIO_GROUP_NOTIFY_SET_KVM event. When notified,
the vfio_ap device driver will get access to the guest's
kvm structure. With access to this structure the driver will:
1. Ensure that only one mediated device is opened for the guest
2. Configure access to the AP devices for the guest.
Access to AP adapters, usage domains and control domains
is controlled by three bit masks contained in the Crypto Control
Block (CRYCB) referenced from the guest's SIE state description:
* The AP Mask (APM) controls access to the AP adapters. Each bit
in the APM represents an adapter number - from most significant
to least significant bit - from 0 to 255. The bits in the APM
are set according to the adapter numbers assigned to the mediated
matrix device via its 'assign_adapter' sysfs attribute file.
* The AP Queue Mask (AQM) controls access to the AP queues. Each bit
in the AQM represents an AP queue index - from most significant
to least significant bit - from 0 to 255. A queue index references
a specific domain and is synonymous with the domain number. The
bits in the AQM are set according to the domain numbers assigned
to the mediated matrix device via its 'assign_domain' sysfs
attribute file.
* The AP Domain Mask (ADM) controls access to the AP control domains.
Each bit in the ADM represents a control domain - from most
significant to least significant bit - from 0-255. The
bits in the ADM are set according to the domain numbers assigned
to the mediated matrix device via its 'assign_control_domain'
sysfs attribute file.
Signed-off-by: Tony Krowiak <[email protected]>
---
arch/s390/include/asm/kvm-ap.h | 21 ++++++++++
arch/s390/include/asm/kvm_host.h | 1 +
arch/s390/kvm/kvm-ap.c | 19 +++++++++
drivers/s390/crypto/vfio_ap_ops.c | 68 +++++++++++++++++++++++++++++++++
drivers/s390/crypto/vfio_ap_private.h | 2 +
5 files changed, 111 insertions(+), 0 deletions(-)
diff --git a/arch/s390/include/asm/kvm-ap.h b/arch/s390/include/asm/kvm-ap.h
index 21fe9f2..68c5a67 100644
--- a/arch/s390/include/asm/kvm-ap.h
+++ b/arch/s390/include/asm/kvm-ap.h
@@ -83,6 +83,27 @@ struct kvm_ap_matrix {
bool kvm_ap_instructions_available(void);
/**
+ * kvm_ap_refcount_read
+ *
+ * Read the AP reference count and return it.
+ */
+int kvm_ap_refcount_read(struct kvm *kvm);
+
+/**
+ * kvm_ap_refcount_inc
+ *
+ * Increment the AP reference count.
+ */
+void kvm_ap_refcount_inc(struct kvm *kvm);
+
+/**
+ * kvm_ap_refcount_dec
+ *
+ * Decrement the AP reference count
+ */
+void kvm_ap_refcount_dec(struct kvm *kvm);
+
+/**
* kvm_ap_configure_matrix
*
* Configure the AP matrix for a KVM guest.
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 8736cde..5f1ad02 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -717,6 +717,7 @@ struct kvm_s390_crypto {
__u8 aes_kw;
__u8 dea_kw;
__u8 apie;
+ atomic_t aprefs;
};
#define APCB0_MASK_SIZE 1
diff --git a/arch/s390/kvm/kvm-ap.c b/arch/s390/kvm/kvm-ap.c
index 98b53c7..848fb37 100644
--- a/arch/s390/kvm/kvm-ap.c
+++ b/arch/s390/kvm/kvm-ap.c
@@ -9,6 +9,7 @@
#include <linux/kernel.h>
#include <linux/bitops.h>
#include <asm/kvm-ap.h>
+#include <asm/atomic.h>
#include "kvm-s390.h"
@@ -218,6 +219,24 @@ static int kvm_ap_validate_queue_sharing(struct kvm *kvm,
return 0;
}
+int kvm_ap_refcount_read(struct kvm *kvm)
+{
+ return atomic_read(&kvm->arch.crypto.aprefs);
+}
+EXPORT_SYMBOL(kvm_ap_refcount_read);
+
+void kvm_ap_refcount_inc(struct kvm *kvm)
+{
+ atomic_inc(&kvm->arch.crypto.aprefs);
+}
+EXPORT_SYMBOL(kvm_ap_refcount_inc);
+
+void kvm_ap_refcount_dec(struct kvm *kvm)
+{
+ atomic_dec(&kvm->arch.crypto.aprefs);
+}
+EXPORT_SYMBOL(kvm_ap_refcount_dec);
+
int kvm_ap_configure_matrix(struct kvm *kvm, struct kvm_ap_matrix *matrix)
{
int ret = 0;
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 81e03b8..8866b0e 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -11,6 +11,8 @@
#include <linux/device.h>
#include <linux/list.h>
#include <linux/ctype.h>
+#include <linux/module.h>
+#include <asm/kvm-ap.h>
#include "vfio_ap_private.h"
@@ -47,6 +49,70 @@ static int vfio_ap_mdev_remove(struct mdev_device *mdev)
return 0;
}
+static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
+ unsigned long action, void *data)
+{
+ struct ap_matrix_mdev *matrix_mdev;
+
+ if (action == VFIO_GROUP_NOTIFY_SET_KVM) {
+ matrix_mdev = container_of(nb, struct ap_matrix_mdev,
+ group_notifier);
+ matrix_mdev->kvm = data;
+ }
+
+ return NOTIFY_OK;
+}
+
+static int vfio_ap_mdev_open(struct mdev_device *mdev)
+{
+ struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
+ unsigned long events;
+ int ret;
+
+ if (!try_module_get(THIS_MODULE))
+ return -ENODEV;
+
+ matrix_mdev->group_notifier.notifier_call = vfio_ap_mdev_group_notifier;
+ events = VFIO_GROUP_NOTIFY_SET_KVM;
+
+ ret = vfio_register_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
+ &events, &matrix_mdev->group_notifier);
+ if (ret)
+ goto out_err;
+
+ /* Only one mediated device allowed per guest */
+ if (kvm_ap_refcount_read(matrix_mdev->kvm) != 0) {
+ ret = -EEXIST;
+ goto out_err;
+ }
+
+ kvm_ap_refcount_inc(matrix_mdev->kvm);
+
+ ret = kvm_ap_configure_matrix(matrix_mdev->kvm, &matrix_mdev->matrix);
+ if (ret)
+ goto config_err;
+
+ return 0;
+
+config_err:
+ kvm_ap_refcount_dec(matrix_mdev->kvm);
+out_err:
+ module_put(THIS_MODULE);
+
+ return ret;
+}
+
+static void vfio_ap_mdev_release(struct mdev_device *mdev)
+{
+ struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
+
+ kvm_ap_deconfigure_matrix(matrix_mdev->kvm);
+ vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
+ &matrix_mdev->group_notifier);
+ kvm_ap_refcount_dec(matrix_mdev->kvm);
+ module_put(THIS_MODULE);
+}
+
static ssize_t name_show(struct kobject *kobj, struct device *dev, char *buf)
{
return sprintf(buf, "%s\n", VFIO_AP_MDEV_NAME_HWVIRT);
@@ -773,6 +839,8 @@ static ssize_t matrix_show(struct device *dev, struct device_attribute *attr,
.mdev_attr_groups = vfio_ap_mdev_attr_groups,
.create = vfio_ap_mdev_create,
.remove = vfio_ap_mdev_remove,
+ .open = vfio_ap_mdev_open,
+ .release = vfio_ap_mdev_release,
};
int vfio_ap_mdev_register(struct ap_matrix *ap_matrix)
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index 8b6ad66..ab072e9 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -32,6 +32,8 @@ struct ap_matrix {
struct ap_matrix_mdev {
struct kvm_ap_matrix matrix;
+ struct notifier_block group_notifier;
+ struct kvm *kvm;
};
static inline struct ap_matrix *to_ap_matrix(struct device *dev)
--
1.7.1
Provides the sysfs interfaces for assigning AP domains to
and unassigning AP domains from a mediated matrix device.
An AP domain ID corresponds to an AP queue index (APQI). For
each domain assigned to the mediated matrix device, its
corresponding APQI is stored in an AP queue mask (AQM).
The bits in the AQM, from most significant to least
significant bit, correspond to AP domain numbers 0 to 255.
When a domain is assigned, the bit corresponding to its
APQI will be set in the AQM. Likewise, when a domain is
unassigned, the bit corresponding to its APQI will be
cleared from the AQM.
The relevant sysfs structures are:
/sys/devices/vfio_ap
... [matrix]
...... [mdev_supported_types]
......... [vfio_ap-passthrough]
............ [devices]
...............[$uuid]
.................. assign_domain
.................. unassign_domain
To assign a domain to the $uuid mediated matrix device,
write the domain's ID to the assign_domain file. To
unassign a domain, write the domain's ID to the
unassign_domain file. The ID is specified using
conventional semantics: If it begins with 0x, the number
will be parsed as a hexadecimal (case insensitive) number;
if it begins with 0, it will be parsed as an octal number;
otherwise, it will be parsed as a decimal number.
For example, to assign domain 173 (0xad) to the mediated matrix
device $uuid:
echo 173 > assign_domain
or
echo 0255 > assign_domain
or
echo 0xad > assign_domain
To unassign domain 173 (0xad):
echo 173 > unassign_domain
or
echo 0255 > unassign_domain
or
echo 0xad > unassign_domain
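The parsing convention described above is the usual base-0 conversion (the
store functions in this patch call kstrtoul() with base 0). A userspace
sketch of the same behaviour using the C library's strtoul() is shown
below; the helper name parse_domain_id() and its error handling are
assumptions made for this example only.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Parse @str as a domain ID using base-0 conversion: a "0x" prefix means
 * hexadecimal, a leading "0" means octal, anything else is decimal.
 * Returns 0 and stores the value in @id on success, or -EINVAL if the
 * input is malformed or larger than @max_id.
 */
static int parse_domain_id(const char *str, unsigned long max_id,
			   unsigned long *id)
{
	char *end;
	unsigned long val;

	assert(str != NULL && id != NULL);

	errno = 0;
	val = strtoul(str, &end, 0);
	if (errno || end == str)
		return -EINVAL;

	/* Accept one trailing newline, as written by echo. */
	if (*end == '\n')
		end++;
	if (*end != '\0' || val > max_id)
		return -EINVAL;

	*id = val;
	return 0;
}
```

All three spellings from the examples above ("173", "0255" and "0xad")
produce the same value with this conversion.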
The assignment will be rejected:
* If the domain ID exceeds the maximum value for an AP domain:
* If the AP Extended Addressing (APXA) facility is installed,
the max value is 255
* Else the max value is 15
* If no AP adapters have yet been assigned and there are
no AP queues reserved by the VFIO AP driver that have an APQN
with an APQI matching that of the AP domain number being
assigned.
* If any of the APQNs that can be derived from the intersection
of the APQI being assigned and the AP adapter ID (APID) of
each of the AP adapters previously assigned can not be matched
with an APQN of an AP queue device reserved by the VFIO AP
driver.
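The rejection rules above can be summarized by the following userspace
sketch. It models the queues reserved by the VFIO AP driver as a plain
array rather than walking the driver's device list, so struct apqn,
queue_reserved() and validate_domain() are hypothetical names used only
for illustration. The error codes mirror those returned by the code in
this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one reserved AP queue. */
struct apqn {
	unsigned int apid;	/* adapter ID */
	unsigned int apqi;	/* queue index (domain ID) */
};

static bool queue_reserved(const struct apqn *reserved, size_t n,
			   unsigned int apid, unsigned int apqi)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (reserved[i].apid == apid && reserved[i].apqi == apqi)
			return true;
	return false;
}

static int validate_domain(unsigned int apqi, bool apxa,
			   const unsigned int *adapters, size_t nadapters,
			   const struct apqn *reserved, size_t nreserved)
{
	unsigned int max_apqi = apxa ? 255 : 15;
	size_t i;

	assert(reserved != NULL || nreserved == 0);

	/* Rule 1: the domain ID must not exceed the maximum. */
	if (apqi > max_apqi)
		return -EINVAL;

	/* Rule 2: with no adapters assigned, any reserved queue will do. */
	if (nadapters == 0) {
		for (i = 0; i < nreserved; i++)
			if (reserved[i].apqi == apqi)
				return 0;
		return -ENODEV;
	}

	/* Rule 3: every (adapter, domain) combination must be reserved. */
	for (i = 0; i < nadapters; i++)
		if (!queue_reserved(reserved, nreserved, adapters[i], apqi))
			return -ENXIO;

	return 0;
}
```

For instance, with queues (4,1) and (4,2) reserved, assigning domain 2 is
accepted when only adapter 4 is assigned, but rejected when adapter 3 is
also assigned, because queue (3,2) is not reserved.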
Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/vfio_ap_ops.c | 228 ++++++++++++++++++++++++++++++++++++-
1 files changed, 227 insertions(+), 1 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 914274d..5c232de 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -381,10 +381,236 @@ static ssize_t unassign_adapter_store(struct device *dev,
}
DEVICE_ATTR_WO(unassign_adapter);
+/**
+ * vfio_ap_validate_queues_for_apqi
+ *
+ * @ap_matrix: the matrix device
+ * @matrix_mdev: the mediated matrix device
+ * @apqi: an AP queue index (APQI) - corresponds to a domain ID
+ *
+ * Verifies that each APQN that is derived from the intersection of @apqi and
+ * each AP adapter ID (APID) corresponding to an AP adapter assigned to the
+ * @matrix_mdev matches the APQN of an AP queue reserved by the VFIO AP device
+ * driver.
+ *
+ * Returns 0 if validation succeeds; otherwise, returns an error.
+ */
+static int vfio_ap_validate_queues_for_apqi(struct ap_matrix *ap_matrix,
+ struct ap_matrix_mdev *matrix_mdev,
+ unsigned long apqi)
+{
+ int ret;
+ struct vfio_ap_qid_match qid_match;
+ unsigned long apid;
+ unsigned long nbits = matrix_mdev->matrix.apm_max + 1;
+ struct device_driver *drv = ap_matrix->device.driver;
+
+ /**
+ * Examine each APQN with the specified APQI
+ */
+ for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, nbits) {
+ qid_match.qid = AP_MKQID(apid, apqi);
+ qid_match.dev = NULL;
+
+ ret = driver_for_each_device(drv, NULL, &qid_match,
+ vfio_ap_queue_match);
+ if (ret) {
+ pr_err("%s: %s: error %d validating AP queue %02lx.%04lx reservation",
+ VFIO_AP_MODULE_NAME, __func__, ret, apid, apqi);
+
+ return ret;
+ }
+
+ /*
+ * If the APQN identifies an AP queue that is reserved by the
+ * VFIO AP device driver, continue processing.
+ */
+ if (qid_match.dev)
+ continue;
+
+ pr_err("%s: %s: AP queue %02lx.%04lx not reserved by %s driver",
+ VFIO_AP_MODULE_NAME, __func__, apid, apqi,
+ VFIO_AP_DRV_NAME);
+
+ return -ENXIO;
+ }
+
+ return 0;
+}
+
+struct vfio_ap_apqi_reserved {
+ unsigned long apqi;
+ bool reserved;
+};
+
+/**
+ * vfio_ap_queue_id_contains_apqi
+ *
+ * @dev: an AP queue device
+ * @data: an AP queue index (APQI)
+ *
+ * Sets the reserved flag in @data if the AP queue's identifier contains the
+ * APQI stored in @data. Always returns 0 so that iteration continues.
+ */
+static int vfio_ap_queue_id_contains_apqi(struct device *dev, void *data)
+{
+ struct vfio_ap_apqi_reserved *apqi_res = data;
+ struct ap_queue *ap_queue = to_ap_queue(dev);
+
+ if (apqi_res->apqi == AP_QID_QUEUE(ap_queue->qid))
+ apqi_res->reserved = true;
+
+ return 0;
+}
+
+/**
+ * vfio_ap_verify_apqi_reserved
+ *
+ * @ap_matrix: the AP matrix configured for the mediated matrix device
+ * @apqi: the AP queue index (APQI) - corresponds to domain ID
+ *
+ * Verifies that at least one AP queue reserved by the VFIO AP device driver
+ * has an APQN containing @apqi.
+ *
+ * Returns 0 if the APQI is reserved; otherwise, returns -ENODEV.
+ */
+static int vfio_ap_verify_apqi_reserved(struct ap_matrix *ap_matrix,
+ unsigned long apqi)
+{
+ int ret;
+ struct vfio_ap_apqi_reserved apqi_res;
+
+ apqi_res.apqi = apqi;
+ apqi_res.reserved = false;
+
+ ret = driver_for_each_device(ap_matrix->device.driver, NULL,
+ &apqi_res,
+ vfio_ap_queue_id_contains_apqi);
+ if (ret) {
+ pr_err("%s: %s: error %d validating AP queue index %04lx reservation",
+ VFIO_AP_MODULE_NAME, __func__, ret, apqi);
+ return ret;
+ }
+
+ if (apqi_res.reserved)
+ return 0;
+
+ pr_err("%s: %s: no APQNs with domain ID %lu(%02lx) are reserved by %s driver",
+ VFIO_AP_MODULE_NAME, __func__, apqi, apqi, VFIO_AP_DRV_NAME);
+
+ return -ENODEV;
+}
+
+/**
+ * vfio_ap_validate_apqi
+ *
+ * @matrix_mdev: the mediated matrix device
+ * @apqi: the APQI (domain ID) to validate
+ *
+ * Validates the value of @apqi:
+ * * If there are no AP adapters assigned, then there must be at least
+ * one AP queue device reserved by the VFIO AP device driver with an
+ * APQN containing @apqi.
+ *
+ * * Else each APQN that can be derived from the intersection of @apqi and
+ * the IDs of the AP adapters already assigned must identify an AP queue
+ * that has been reserved by the VFIO AP device driver.
+ *
+ * Returns 0 if the value of @apqi is valid; otherwise, returns an error.
+ */
+static int vfio_ap_validate_apqi(struct mdev_device *mdev,
+ struct ap_matrix_mdev *matrix_mdev,
+ unsigned long apqi)
+{
+ int ret;
+ struct device *dev = mdev_parent_dev(mdev);
+ struct ap_matrix *ap_matrix = to_ap_matrix(dev);
+ unsigned long apid;
+ unsigned long max_apid = matrix_mdev->matrix.apm_max;
+
+ apid = find_first_bit_inv(matrix_mdev->matrix.apm, max_apid + 1);
+ /* If there are no adapters assigned */
+ if (apid > max_apid) {
+ ret = vfio_ap_verify_apqi_reserved(ap_matrix, apqi);
+ } else {
+ ret = vfio_ap_validate_queues_for_apqi(ap_matrix, matrix_mdev,
+ apqi);
+ }
+
+ return ret;
+}
+
+static ssize_t assign_domain_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ int ret;
+ unsigned long apqi;
+ struct mdev_device *mdev = mdev_from_dev(dev);
+ struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
+ unsigned long max_apqi = matrix_mdev->matrix.aqm_max;
+
+ ret = kstrtoul(buf, 0, &apqi);
+ if (ret || (apqi > max_apqi)) {
+ pr_err("%s: %s: domain id '%s' not a value from 0 to %02lu(%#04lx)",
+ VFIO_AP_MODULE_NAME, __func__, buf, max_apqi, max_apqi);
+
+ return ret ? ret : -EINVAL;
+ }
+
+ ret = vfio_ap_validate_apqi(mdev, matrix_mdev, apqi);
+ if (ret)
+ return ret;
+
+ /* Set the bit in the AQM (bitmask) corresponding to the AP domain
+ * number (APQI). The bits in the mask, from most significant to least
+ * significant, correspond to numbers 0-255.
+ */
+ set_bit_inv(apqi, matrix_mdev->matrix.aqm);
+
+ return count;
+}
+DEVICE_ATTR_WO(assign_domain);
+
+static ssize_t unassign_domain_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ int ret;
+ unsigned long apqi;
+ struct mdev_device *mdev = mdev_from_dev(dev);
+ struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
+ unsigned long max_apqi = matrix_mdev->matrix.aqm_max;
+
+ ret = kstrtoul(buf, 0, &apqi);
+ if (ret || (apqi > max_apqi)) {
+ pr_err("%s: %s: domain id '%s' not a value from 0 to %02lu(%#04lx)",
+ VFIO_AP_MODULE_NAME, __func__, buf, max_apqi, max_apqi);
+
+ return ret ? ret : -EINVAL;
+ }
+
+ if (!test_bit_inv(apqi, matrix_mdev->matrix.aqm)) {
+ pr_err("%s: %s: domain %02lu(%#04lx) not assigned",
+ VFIO_AP_MODULE_NAME, __func__, apqi, apqi);
+ return -ENODEV;
+ }
+
+ clear_bit_inv((unsigned long)apqi, matrix_mdev->matrix.aqm);
+
+ return count;
+}
+DEVICE_ATTR_WO(unassign_domain);
+
static struct attribute *vfio_ap_mdev_attrs[] = {
&dev_attr_assign_adapter.attr,
&dev_attr_unassign_adapter.attr,
- NULL
+ &dev_attr_assign_domain.attr,
+ &dev_attr_unassign_domain.attr,
+ NULL,
};
static struct attribute_group vfio_ap_mdev_attr_group = {
--
1.7.1
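For readers following the assign_domain and unassign_domain handlers in
the patch above, the MSB-first bit numbering used for the AQM can be
sketched in plain userspace C. This is only an illustration under stated
assumptions: struct aqm_sketch and the aqm_*() helpers are hypothetical
stand-ins for the kernel's set_bit_inv(), test_bit_inv() and
clear_bit_inv(), and the fixed limit of 256 domains is assumed here
rather than read from aqm_max.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>

#define AP_MAX_DOMAINS 256 /* assumed fixed limit for this sketch */
#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)
#define AQM_WORDS (AP_MAX_DOMAINS / WORD_BITS)

/* Hypothetical userspace model of the AQM bitmask. */
struct aqm_sketch {
	unsigned long bits[AQM_WORDS];
};

/* Bit 0 is the most significant bit of word 0 (MSB-first numbering). */
static unsigned long inv_mask(unsigned long nr)
{
	return 1UL << (WORD_BITS - 1 - (nr % WORD_BITS));
}

static bool aqm_test(const struct aqm_sketch *m, unsigned long apqi)
{
	return (m->bits[apqi / WORD_BITS] & inv_mask(apqi)) != 0;
}

/* Mirrors the range check and bit set done by assign_domain_store(). */
static int aqm_assign(struct aqm_sketch *m, unsigned long apqi)
{
	if (apqi >= AP_MAX_DOMAINS)
		return -EINVAL;
	m->bits[apqi / WORD_BITS] |= inv_mask(apqi);
	return 0;
}

/* Mirrors unassign_domain_store(): fails if the domain is not assigned. */
static int aqm_unassign(struct aqm_sketch *m, unsigned long apqi)
{
	if (apqi >= AP_MAX_DOMAINS)
		return -EINVAL;
	if (!aqm_test(m, apqi))
		return -ENODEV;
	m->bits[apqi / WORD_BITS] &= ~inv_mask(apqi);
	return 0;
}
```

With this numbering, assigning domain 0 sets the top bit of the first
word, which is the opposite of what the generic set_bit() would do.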
This patch refactors the code that initializes the crypto
configuration for a guest. The crypto configuration is contained in
a crypto control block (CRYCB) which is a satellite control block to
our main hardware virtualization control block. The CRYCB is
attached to the main virtualization control block via a CRYCB
designation (CRYCBD) field containing the address of
the CRYCB as well as its format.
Prior to the introduction of AP device virtualization, there was
no need to provide access to or specify the format of the CRYCB for
a guest unless the MSA extension 3 (MSAX3) facility was installed
on the host system. With the introduction of AP device virtualization,
the CRYCB and its format must be made accessible to the guest
regardless of the presence of the MSAX3 facility as long as the
AP instructions are installed on the host.
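As a rough userspace sketch of the decision described above (not the
kernel code itself): the helper names crycb_needed() and
crycbd_with_format() are hypothetical, and the boolean parameters stand
in for the facility checks the patch performs via
kvm_ap_instructions_available(), test_kvm_facility(kvm, 76) and
kvm_ap_apxa_installed().

```c
#include <stdbool.h>
#include <stdint.h>

#define CRYCB_FORMAT_MASK 0x00000003u
#define CRYCB_FORMAT1     0x00000001u
#define CRYCB_FORMAT2     0x00000003u

/* A CRYCB is set up if either AP instructions or MSAX3 are available. */
static bool crycb_needed(bool ap_instructions, bool msax3)
{
	return ap_instructions || msax3;
}

/*
 * Return the CRYCBD with its format bits recomputed:
 * format 0 without MSAX3, otherwise format 2 with APXA, else format 1.
 */
static uint32_t crycbd_with_format(uint32_t crycbd, bool msax3, bool apxa)
{
	crycbd &= ~CRYCB_FORMAT_MASK; /* format 0 by default */
	if (!msax3)
		return crycbd;
	return crycbd | (apxa ? CRYCB_FORMAT2 : CRYCB_FORMAT1);
}
```

The point of the split is that the CRYCB address is stored whenever
crycb_needed() holds, while the format bits depend only on MSAX3 and
APXA.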
Signed-off-by: Tony Krowiak <[email protected]>
---
arch/s390/include/asm/kvm_host.h | 1 +
arch/s390/kvm/kvm-s390.c | 64 ++++++++++++++++++++++++++-----------
2 files changed, 46 insertions(+), 19 deletions(-)
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 81cdb6b..5393c4d 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -255,6 +255,7 @@ struct kvm_s390_sie_block {
__u8 reservede4[4]; /* 0x00e4 */
__u64 tecmc; /* 0x00e8 */
__u8 reservedf0[12]; /* 0x00f0 */
+#define CRYCB_FORMAT_MASK 0x00000003
#define CRYCB_FORMAT1 0x00000001
#define CRYCB_FORMAT2 0x00000003
__u32 crycbd; /* 0x00fc */
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 1f50de7..99779a6 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -1875,14 +1875,35 @@ long kvm_arch_vm_ioctl(struct file *filp,
return r;
}
-static void kvm_s390_set_crycb_format(struct kvm *kvm)
+/*
+ * The format of the crypto control block (CRYCB) is specified in the 2 low
+ * order bits of the CRYCB designation (CRYCBD) field as follows:
+ * Format 0: Neither the message security assist extension 3 (MSAX3) nor the
+ * AP extended addressing (APXA) facility is installed.
+ * Format 1: The APXA facility is not installed but the MSAX3 facility is.
+ * Format 2: Both the APXA and MSAX3 facilities are installed.
+ */
+static void kvm_s390_format_crycb(struct kvm *kvm)
{
- kvm->arch.crypto.crycbd = (__u32)(unsigned long) kvm->arch.crypto.crycb;
+ /* Clear the CRYCB format bits - i.e., set format 0 by default */
+ kvm->arch.crypto.crycbd &= ~(CRYCB_FORMAT_MASK);
+
+ /* Check whether MSAX3 is installed */
+ if (!test_kvm_facility(kvm, 76))
+ return;
if (kvm_ap_apxa_installed())
kvm->arch.crypto.crycbd |= CRYCB_FORMAT2;
else
kvm->arch.crypto.crycbd |= CRYCB_FORMAT1;
+
+ /* Enable AES/DEA protected key functions by default */
+ kvm->arch.crypto.aes_kw = 1;
+ kvm->arch.crypto.dea_kw = 1;
+ get_random_bytes(kvm->arch.crypto.crycb->aes_wrapping_key_mask,
+ sizeof(kvm->arch.crypto.crycb->aes_wrapping_key_mask));
+ get_random_bytes(kvm->arch.crypto.crycb->dea_wrapping_key_mask,
+ sizeof(kvm->arch.crypto.crycb->dea_wrapping_key_mask));
}
static u64 kvm_s390_get_initial_cpuid(void)
@@ -1896,19 +1917,17 @@ static u64 kvm_s390_get_initial_cpuid(void)
static void kvm_s390_crypto_init(struct kvm *kvm)
{
- if (!test_kvm_facility(kvm, 76))
+ /*
+ * If neither the AP instructions nor the message security assist
+ * extension 3 (MSAX3) are installed, there is no need to initialize a
+ * crypto control block (CRYCB) for the guest.
+ */
+ if (!kvm_ap_instructions_available() && !test_kvm_facility(kvm, 76))
return;
kvm->arch.crypto.crycb = &kvm->arch.sie_page2->crycb;
- kvm_s390_set_crycb_format(kvm);
-
- /* Enable AES/DEA protected key functions by default */
- kvm->arch.crypto.aes_kw = 1;
- kvm->arch.crypto.dea_kw = 1;
- get_random_bytes(kvm->arch.crypto.crycb->aes_wrapping_key_mask,
- sizeof(kvm->arch.crypto.crycb->aes_wrapping_key_mask));
- get_random_bytes(kvm->arch.crypto.crycb->dea_wrapping_key_mask,
- sizeof(kvm->arch.crypto.crycb->dea_wrapping_key_mask));
+ kvm->arch.crypto.crycbd = (__u32)(unsigned long) kvm->arch.crypto.crycb;
+ kvm_s390_format_crycb(kvm);
}
static void sca_dispose(struct kvm *kvm)
@@ -2430,17 +2449,24 @@ void kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu)
static void kvm_s390_vcpu_crypto_setup(struct kvm_vcpu *vcpu)
{
- if (!test_kvm_facility(vcpu->kvm, 76))
+ /*
+ * If a crypto control block designation (CRYCBD) has not been
+ * initialized
+ */
+ if (vcpu->kvm->arch.crypto.crycbd == 0)
return;
- vcpu->arch.sie_block->ecb3 &= ~(ECB3_AES | ECB3_DEA);
+ vcpu->arch.sie_block->crycbd = vcpu->kvm->arch.crypto.crycbd;
- if (vcpu->kvm->arch.crypto.aes_kw)
- vcpu->arch.sie_block->ecb3 |= ECB3_AES;
- if (vcpu->kvm->arch.crypto.dea_kw)
- vcpu->arch.sie_block->ecb3 |= ECB3_DEA;
+ /* If MSAX3 is installed, set up protected key support */
+ if (test_kvm_facility(vcpu->kvm, 76)) {
+ vcpu->arch.sie_block->ecb3 &= ~(ECB3_AES | ECB3_DEA);
- vcpu->arch.sie_block->crycbd = vcpu->kvm->arch.crypto.crycbd;
+ if (vcpu->kvm->arch.crypto.aes_kw)
+ vcpu->arch.sie_block->ecb3 |= ECB3_AES;
+ if (vcpu->kvm->arch.crypto.dea_kw)
+ vcpu->arch.sie_block->ecb3 |= ECB3_DEA;
+ }
}
void kvm_s390_vcpu_unsetup_cmma(struct kvm_vcpu *vcpu)
--
1.7.1
This patch provides documentation describing the AP architecture and
design concepts behind the virtualization of AP devices. It also
includes an example of how to configure AP devices for exclusive
use of KVM guests.
Signed-off-by: Tony Krowiak <[email protected]>
---
Documentation/s390/vfio-ap.txt | 575 ++++++++++++++++++++++++++++++++++++++++
MAINTAINERS | 1 +
2 files changed, 576 insertions(+), 0 deletions(-)
create mode 100644 Documentation/s390/vfio-ap.txt
diff --git a/Documentation/s390/vfio-ap.txt b/Documentation/s390/vfio-ap.txt
new file mode 100644
index 0000000..79f3d43
--- /dev/null
+++ b/Documentation/s390/vfio-ap.txt
@@ -0,0 +1,575 @@
+Introduction:
+============
+The Adjunct Processor (AP) facility is an IBM Z cryptographic facility comprised
+of three AP instructions and from 1 up to 256 PCIe cryptographic adapter cards.
+The AP devices provide cryptographic functions to all CPUs assigned to a
+linux system running in an IBM Z system LPAR.
+
+The AP adapter cards are exposed via the AP bus. The motivation for vfio-ap
+is to make AP cards available to KVM guests using the VFIO mediated device
+framework. This implementation relies considerably on the s390 virtualization
+facilities which do most of the hard work of providing direct access to AP
+devices.
+
+AP Architectural Overview:
+=========================
+To facilitate the comprehension of the design, let's start with some
+definitions:
+
+* AP adapter
+
+ An AP adapter is an IBM Z adapter card that can perform cryptographic
+ functions. There can be from 0 to 256 adapters assigned to an LPAR. Adapters
+ assigned to the LPAR in which a linux host is running will be available to
+ the linux host. Each adapter is identified by a number from 0 to 255. When
+ installed, an AP adapter is accessed by AP instructions executed by any CPU.
+
+ The AP adapter cards are assigned to a given LPAR via the system's Activation
+ Profile which can be edited via the HMC. When the system is IPL'd, the AP bus
+ module is loaded and detects the AP adapter cards assigned to the LPAR. The AP
+ bus creates a sysfs device for each adapter as they are detected. For example,
+ if AP adapters 4 and 10 (0x0a) are assigned to the LPAR, the AP bus will
+ create the following sysfs entries:
+
+ /sys/devices/ap/card04
+ /sys/devices/ap/card0a
+
+ Symbolic links to these devices will also be created in the AP bus devices
+ sub-directory:
+
+ /sys/bus/ap/devices/[card04]
+ /sys/bus/ap/devices/[card0a]
+
+* AP domain
+
+ An adapter is partitioned into domains. Each domain can be thought of as
+ a set of hardware registers for processing AP instructions. An adapter can
+ hold up to 256 domains. Each domain is identified by a number from 0 to 255.
+ Domains can be further classified into two types:
+
+ * Usage domains are domains that can be accessed directly to process AP
+ commands.
+
+ * Control domains are domains that are accessed indirectly by AP
+ commands sent to a usage domain to control or change the domain, for
+ example, to set a secure private key for the domain.
+
+ The AP usage and control domains are assigned to a given LPAR via the system's
+ Activation Profile which can be edited via the HMC. When the system is IPL'd,
+ the AP bus module is loaded and detects the AP usage and control domains
+ assigned to the LPAR. The domain number of each usage domain will be coupled
+ with the adapter number of each AP adapter assigned to the LPAR to identify
+ the AP queues (see AP Queue section below). The domain number of each control
+ domain will be represented in a bitmask and stored in a sysfs file
+ /sys/bus/ap/ap_control_domain_mask created by the bus. The bits in the mask,
+ from most to least significant bit, correspond to domains 0-255.
+
+ A domain may be assigned to a system as both a usage and control domain, or
+ as a control domain only. Consequently, all domains assigned as both a usage
+ and control domain can both process AP commands as well as be changed by an AP
+ command sent to any usage domain assigned to the same system. Domains assigned
+ only as control domains can not process AP commands but can be changed by AP
+ commands sent to any usage domain assigned to the system.
+
+* AP Queue
+
+ An AP queue is the means by which an AP command-request message is sent to a
+ usage domain inside a specific adapter. An AP queue is identified by a tuple
+ comprised of an AP adapter ID (APID) and an AP queue index (APQI). The
+ APQI corresponds to a given usage domain number within the adapter. This tuple
+ forms an AP Queue Number (APQN) uniquely identifying an AP queue. AP
+ instructions include a field containing the APQN to identify the AP queue to
+ which the AP command-request message is to be sent for processing.
+
+ The AP bus will create a sysfs device for each APQN that can be derived from
+ the intersection of the AP adapter and usage domain numbers detected when the
+ AP bus module is loaded. For example, if adapters 4 and 10 (0x0a) and usage
+ domains 6 and 71 (0x47) are assigned to the LPAR, the AP bus will create the
+ following sysfs entries:
+
+ /sys/devices/ap/card04/04.0006
+ /sys/devices/ap/card04/04.0047
+ /sys/devices/ap/card0a/0a.0006
+ /sys/devices/ap/card0a/0a.0047
+
+ The following symbolic links to these devices will be created in the AP bus
+ devices subdirectory:
+
+ /sys/bus/ap/devices/[04.0006]
+ /sys/bus/ap/devices/[04.0047]
+ /sys/bus/ap/devices/[0a.0006]
+ /sys/bus/ap/devices/[0a.0047]
+
+* AP Instructions:
+
+ There are three AP instructions:
+
+ * NQAP: to enqueue an AP command-request message to a queue
+ * DQAP: to dequeue an AP command-reply message from a queue
+ * PQAP: to administer the queues
+
+AP and SIE:
+==========
+Let's now see how AP instructions are interpreted by the hardware.
+
+A satellite control block called the Crypto Control Block (CRYCB) is attached
+to our main hardware virtualization control block. The CRYCB contains three
+fields to identify the adapters, usage domains and control domains assigned to
+the KVM guest:
+
+* The AP Mask (APM) field is a bit mask that identifies the AP adapters assigned
+ to the KVM guest. Each bit in the mask, from most significant to least
+ significant bit, corresponds to an APID from 0-255. If a bit is set, the
+ corresponding adapter is valid for use by the KVM guest.
+
+* The AP Queue Mask (AQM) field is a bit mask identifying the AP usage domains
+ assigned to the KVM guest. Each bit in the mask, from most significant to
+ least significant bit, corresponds to an AP queue index (APQI) from 0-255. If
+ a bit is set, the corresponding queue is valid for use by the KVM guest.
+
+* The AP Domain Mask (ADM) field is a bit mask that identifies the AP control
+ domains assigned to the KVM guest. The ADM bit mask controls which domains can
+ be changed by an AP command-request message sent to a usage domain from the
+ guest. Each bit in the mask, from most significant to least significant bit,
+ corresponds to a domain from 0-255. If a bit is set, the corresponding domain
+ can be modified by an AP command-request message sent to a usage domain
+ configured for the KVM guest.
+
+If you recall from the description of an AP Queue, AP instructions include
+an APQN to identify the AP adapter and AP queue to which an AP command-request
+message is to be sent (NQAP and PQAP instructions), or from which a
+command-reply message is to be received (DQAP instruction). The validity of an
+APQN is defined by the matrix calculated from the APM and AQM; it is the
+cross product of all assigned adapter numbers (APM) with all assigned queue
+indexes (AQM). For example, if adapters 1 and 2 and usage domains 5 and 6 are
+assigned to a guest, the APQNs (1,5), (1,6), (2,5) and (2,6) will be valid for
+the guest.
+
+The APQNs can provide secure key functionality - i.e., a private key is stored
+on the adapter card for each of its domains - so each APQN must be assigned to
+at most one guest or the linux host.
+
+ Example 1: Valid configuration:
+ ------------------------------
+ Guest1: adapters 1,2 domains 5,6
+ Guest2: adapters 1,2 domain 7
+
+ This is valid because both guests have a unique set of APQNs: Guest1 has
+ APQNs (1,5), (1,6), (2,5) and (2,6); Guest2 has APQNs (1,7) and (2,7).
+
+ Example 2: Invalid configuration:
+ --------------------------------
+ Guest1: adapters 1,2 domains 5,6
+ Guest2: adapter 1 domains 6,7
+
+ This is an invalid configuration because both guests have access to
+ APQN (1,6).
+
+The Design:
+===========
+The design introduces three new objects:
+
+1. AP matrix device
+2. VFIO AP device driver (vfio_ap.ko)
+3. AP mediated matrix passthrough device
+
+The VFIO AP device driver
+-------------------------
+The VFIO AP (vfio_ap) device driver serves the following purposes:
+
+1. Provides the interfaces to reserve APQNs for exclusive use of KVM guests.
+
+2. Sets up the VFIO mediated device interfaces to manage the mediated matrix
+ device and create the sysfs interfaces for assigning adapters, usage domains,
+ and control domains comprising the matrix for a KVM guest.
+
+3. Configures the APM, AQM and ADM in the CRYCB referenced by a KVM guest's
+ SIE state description to grant the guest access to AP devices.
+
+4. Initializes the CPU model feature indicating that a KVM guest may use
+ AP facilities installed on the linux host.
+
+5. Enables interpretive execution mode for the KVM guest.
+
+Reserve APQNs for exclusive use of KVM guests
+---------------------------------------------
+The following block diagram illustrates the mechanism by which APQNs are
+reserved:
+
+ +------------------+
+ remove | | unbind
+ +------------------->+ cex4queue driver +<-----------+
+ | | | |
+ | +------------------+ |
+ | |
+ | |
+ | |
++--------+---------+ register +------------------+ +-----+------+
+| +<---------+ | bind | |
+| ap_bus | | vfio_ap driver +<-----+ admin |
+| +--------->+ | | |
++------------------+ probe +---+--------+-----+ +------------+
+ | |
+ create | | store APQN
+ | |
+ v v
+ +---+--------+-----+
+ | |
+ | matrix device |
+ | |
+ +------------------+
+
+The process for reserving an AP queue for use by a KVM guest is:
+
+* The vfio_ap driver during its initialization will perform the following:
+ * Create the 'vfio_ap' root device - /sys/devices/vfio_ap
+ * Create the 'matrix' device in the 'vfio_ap' root
+ * Register the matrix device with the device core
+ * Register with the ap_bus for AP queue devices of type 10 (CEX4 and
+ newer) and provide the vfio_ap driver's probe and remove callback
+ interfaces. Older devices are not supported because there are no
+ systems available on which to test them.
+* The admin unbinds queue cc.qqqq from the cex4queue device driver. This results
+ in the ap_bus calling the device driver's remove interface, which
+ unbinds the cc.qqqq queue device from the driver.
+* The admin binds the cc.qqqq queue to the vfio_ap device driver. This results
+ in the ap_bus calling the vfio_ap device driver's probe interface to bind
+ queue cc.qqqq to the driver. The vfio_ap device driver will store the APQN for
+ the queue in the matrix device.
+
+Set up the VFIO mediated device interfaces
+------------------------------------------
+The VFIO AP device driver utilizes the common interface of the VFIO mediated
+device core driver to:
+* Register an AP mediated bus driver to add a mediated matrix device to and
+ remove it from a VFIO group.
+* Create and destroy a mediated matrix device
+* Add a mediated matrix device to and remove it from the AP mediated bus driver
+* Add a mediated matrix device to and remove it from an IOMMU group
+
+The following high-level block diagram shows the main components and interfaces
+of the VFIO AP mediated matrix device driver:
+
+ +-------------+
+ | |
+ | +---------+ | mdev_register_driver() +--------------+
+ | | Mdev | +<-----------------------+ |
+ | | bus | | | vfio_mdev.ko |
+ | | driver | +----------------------->+ |<-> VFIO user
+ | +---------+ | probe()/remove() +--------------+ APIs
+ | |
+ | MDEV CORE |
+ | MODULE |
+ | mdev.ko |
+ | +---------+ | mdev_register_device() +--------------+
+ | |Physical | +<-----------------------+ |
+ | | device | | | vfio_ap.ko |<-> matrix
+ | |interface| +----------------------->+ | device
+ | +---------+ | callback +--------------+
+ +-------------+
+
+During initialization of the vfio_ap module, the matrix device is registered
+with an 'mdev_parent_ops' structure that provides the sysfs attribute
+structures, mdev functions and callback interfaces for managing the mediated
+matrix device.
+
+* sysfs attribute structures:
+ * supported_type_groups
+ The VFIO mediated device framework supports creation of user-defined
+ mediated device types. These mediated device types are specified
+ via the 'supported_type_groups' structure when a device is registered
+ with the mediated device framework. The registration process creates the
+ sysfs structures for each mediated device type specified in the
+ 'mdev_supported_types' sub-directory of the device being registered. Along
+ with the device type, the sysfs attributes of the mediated device type are
+ provided.
+
+ The VFIO AP device driver will register one mediated device type for
+ passthrough devices:
+ /sys/devices/vfio_ap/matrix/mdev_supported_types/vfio_ap-passthrough
+ Only the three read-only attributes required by the VFIO mdev framework will
+ be provided:
+ /sys/devices/vfio_ap/matrix/mdev_supported_types
+ ... name
+ ... device_api
+ ... available_instances
+ Where:
+ * name: specifies the name of the mediated device type
+ * device_api: the mediated device type's API
+ * available_instances: the number of mediated matrix passthrough devices
+ that can be created
+ * mdev_attr_groups
+ This attribute group identifies the user-defined sysfs attributes of the
+ mediated device. When a device is registered with the VFIO mediated device
+ framework, the sysfs attributes files identified in the 'mdev_attr_groups'
+ structure will be created in the mediated matrix device's directory. The
+ sysfs attributes for a mediated matrix device are:
+ * assign_adapter:
+ A write-only file for assigning an AP adapter to the mediated matrix
+ device. To assign an adapter, the APID of the adapter is written to the
+ file.
+ * assign_domain:
+ A write-only file for assigning an AP usage domain to the mediated matrix
+ device. To assign a domain, the APQI of the AP queue corresponding to a
+ usage domain is written to the file.
+ * matrix:
+ A read-only file for displaying the APQNs derived from the adapters and
+ domains assigned to the mediated matrix device.
+ * assign_control_domain:
+ A write-only file for assigning an AP control domain to the mediated
+ matrix device. To assign a control domain, the ID of a domain to be
+ controlled is written to the file. For the initial implementation, the set
+ of control domains will always include the set of usage domains, so it is
+ only necessary to assign control domains that are not also assigned as
+ usage domains.
+ * control_domains:
+ A read-only file for displaying the control domain numbers assigned to the
+ mediated matrix device.
+
+* functions:
+ * create:
+ allocates the ap_matrix_mdev structure used by the vfio_ap driver to:
+ * Keep track of the available instances
+ * Store the reference to the struct kvm for the KVM guest
+ * Provide the notifier callback that will get invoked to handle the
+ VFIO_GROUP_NOTIFY_SET_KVM event. When received, the vfio_ap driver will
+ store the reference in the mediated matrix device's ap_matrix_mdev
+ structure and enable the interpretive execution mode for the KVM guest.
+ * remove:
+ deallocates the mediated matrix device's ap_matrix_mdev structure.
+
+* callback interfaces
+ * open:
+ The vfio_ap driver uses this callback to register a
+ VFIO_GROUP_NOTIFY_SET_KVM notifier callback function for the mdev matrix
+ device. The notifier is invoked when QEMU connects the VFIO iommu group
+ for the mdev matrix device to the MDEV bus. Access to the KVM structure used
+ to configure the KVM guest is provided via this callback. The KVM structure
+ is used to configure the guest's access to the AP matrix defined via the
+ mediated matrix device's sysfs attribute files.
+ * release:
+ unregisters the VFIO_GROUP_NOTIFY_SET_KVM notifier callback function for the
+ mdev matrix device and deconfigures the guest's AP matrix.
+
+Configure the APM, AQM and ADM in the CRYCB:
+-------------------------------------------
+Configuring the AP matrix for a KVM guest will be performed when the
+VFIO_GROUP_NOTIFY_SET_KVM notifier callback is invoked. The notifier
+function is called when QEMU connects the VFIO iommu group for the mdev matrix
+device to the MDEV bus. The CRYCB is configured by:
+* Setting the bits in the APM corresponding to the APIDs assigned to the
+ mediated matrix device via its 'assign_adapter' interface.
+* Setting the bits in the AQM corresponding to the APQIs assigned to the
+ mediated matrix device via its 'assign_domain' interface.
+* Setting the bits in the ADM corresponding to the domain IDs assigned to the
+ mediated matrix device via its 'assign_control_domain' interface.
+
+Initialize the CPU model feature for AP
+---------------------------------------
+A new CPU model feature, KVM_S390_VM_CPU_FEAT_AP, is introduced to indicate that
+AP instructions are available to the KVM guest. This feature will be enabled by
+KVM only if the AP instructions are installed on the linux host. The feature
+must be turned on for the guest in order to access AP devices from the guest.
+For example, to turn the AP facilities on from the QEMU command line:
+
+ /usr/bin/qemu-system-s390x ... -cpu xxx,ap=on
+
+ Where xxx is the CPU model being used.
+
+ If the CPU model feature is not enabled by the kernel, QEMU will fail and
+ report that the feature is not supported.
+
+Example:
+=======
+Let's now provide an example to illustrate how KVM guests may be given
+access to AP facilities. For this example, we will show how to configure
+two guests such that executing the lszcrypt command on the guests would
+look like this:
+
+Guest1
+------
+CARD.DOMAIN TYPE MODE
+------------------------------
+05 CEX5C CCA-Coproc
+05.0004 CEX5C CCA-Coproc
+05.00ab CEX5C CCA-Coproc
+06 CEX5A Accelerator
+06.0004 CEX5A Accelerator
+06.00ab CEX5A Accelerator
+
+Guest2
+------
+CARD.DOMAIN TYPE MODE
+------------------------------
+05 CEX5A Accelerator
+05.0047 CEX5A Accelerator
+05.00ff CEX5A Accelerator
+
+These are the steps:
+
+1. Install the vfio_ap module on the linux host. The dependency chain for the
+ vfio_ap module is:
+ * vfio
+ * mdev
+ * vfio_mdev
+ * KVM
+ * vfio_ap
+
+2. Secure the AP queues to be used by the two guests so that the host can not
+ access them. Only type 10 adapters (i.e., CEX4 and later) are supported
+ due to the fact that no test systems with older card types are available
+ for testing.
+
+ To secure the AP queues, each AP queue device must first be unbound from
+ the cex4queue device driver. The sysfs location of the driver is:
+
+ /sys/bus/ap
+ --- [drivers]
+ ------ [cex4queue]
+ --------- [05.0004]
+ --------- [05.0047]
+ --------- [05.00ab]
+ --------- [05.00ff]
+ --------- [06.0004]
+ --------- [06.00ab]
+ --------- unbind
+
+ To unbind AP queue 05.0004 from the cex4queue device driver:
+
+ echo 05.0004 > unbind
+
+ This must also be done for AP queues 05.00ab, 05.0047, 05.00ff, 06.0004,
+ and 06.00ab.
+
+ The AP queues that were unbound must then be reserved for use by the two KVM
+ guests. This is accomplished by binding them to the vfio_ap device driver.
+ The sysfs location of the driver is:
+
+ /sys/bus/ap
+ --- [drivers]
+ ------ [vfio_ap]
+ --------- bind
+
+ To bind queue 05.0004 to the vfio_ap driver:
+
+ echo 05.0004 > bind
+
+ This must also be done for AP queues 05.00ab, 05.0047, 05.00ff, 06.0004,
+ and 06.00ab.
+
+ Take note that the AP queues bound to the vfio_ap driver will be available
+ for guest usage until they are unbound from the driver, the vfio_ap module
+ is unloaded, or the host system is shut down.
+
+3. Create the mediated devices needed to configure the AP matrixes for the
+ two guests and to provide an interface to the vfio_ap driver for
+ use by the guests:
+
+ /sys/devices/
+ --- [vfio_ap]
+ ------ [matrix] (this is the matrix device)
+ --------- [mdev_supported_types]
+ ------------ [vfio_ap-passthrough] (passthrough mediated matrix device type)
+ --------------- create
+ --------------- [devices]
+
+ To create the mediated devices for the two guests:
+
+ uuidgen > create
+ uuidgen > create
+
+ This will create two mediated devices in the [devices] subdirectory named
+ with the UUID written to the create attribute file. We call them $uuid1
+ and $uuid2:
+
+ /sys/devices/
+ --- [vfio_ap]
+ ------ [matrix]
+ --------- [mdev_supported_types]
+ ------------ [vfio_ap-passthrough]
+ --------------- [devices]
+ ------------------ [$uuid1]
+ --------------------- assign_adapter
+ --------------------- assign_control_domain
+ --------------------- assign_domain
+ --------------------- matrix
+ --------------------- unassign_adapter
+ --------------------- unassign_control_domain
+ --------------------- unassign_domain
+
+ ------------------ [$uuid2]
+ --------------------- assign_adapter
+ --------------------- assign_control_domain
+ --------------------- assign_domain
+ --------------------- matrix
+ --------------------- unassign_adapter
+ --------------------- unassign_control_domain
+ --------------------- unassign_domain
+
+4. The administrator now needs to configure the matrixes for mediated
+ devices $uuid1 (for Guest1) and $uuid2 (for Guest2).
+
+ This is how the matrix is configured for Guest1:
+
+ echo 5 > assign_adapter
+ echo 6 > assign_adapter
+ echo 4 > assign_domain
+ echo 0xab > assign_domain
+
+ For this implementation, all usage domains - i.e., domains assigned
+ via the assign_domain attribute file - will also be configured in the ADM
+ field of the KVM guest's CRYCB, so there is no need to assign control
+ domains here unless you want to assign control domains that are not
+ assigned as usage domains.
+
+ If a mistake is made configuring an adapter, domain or control domain,
+ you can use the unassign_xxx files to unassign the adapter, domain or
+ control domain.
+
+ To display the matrix configuration for Guest1:
+
+ cat matrix
+
+ This is how the matrix is configured for Guest2:
+
+ echo 5 > assign_adapter
+ echo 0x47 > assign_domain
+ echo 0xff > assign_domain
+
+5. Start Guest1:
+
+ /usr/bin/qemu-system-s390x ... -cpu xxx,ap=on \
+ -device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid1 ...
+
+6. Start Guest2:
+
+ /usr/bin/qemu-system-s390x ... -cpu xxx,ap=on \
+ -device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid2 ...
+
+When the guest is shut down, the mediated matrix device may be removed.
+
+Using our example again, to remove the mediated matrix device $uuid1:
+
+ /sys/devices/
+ --- [vfio_ap]
+ ------ [matrix]
+ --------- [mdev_supported_types]
+ ------------ [vfio_ap-passthrough]
+ --------------- [devices]
+ ------------------ [$uuid1]
+ --------------------- remove
+
+ echo 1 > remove
+
+ This will remove all of the mdev matrix device's sysfs structures. To
+ recreate and reconfigure the mdev matrix device, all of the steps starting
+ with step 3 will have to be performed again.
+
+ It is not necessary to remove an mdev matrix device, but one may want to
+ remove it if no guest will use it during the lifetime of the linux host. If
+ the mdev matrix device is removed, one may want to unbind the AP queues the
+ guest was using from the vfio_ap device driver and bind them back to the
+ default driver. Alternatively, the AP queues can be configured for another
+ mdev matrix (i.e., guest). In either case, one must take care to change the
+ secure key configured for the domain to which the queue is connected.
\ No newline at end of file
diff --git a/MAINTAINERS b/MAINTAINERS
index cecadf2..f61f714 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12247,6 +12247,7 @@ S: Supported
F: drivers/s390/crypto/vfio_ap_drv.c
F: drivers/s390/crypto/vfio_ap_private.h
F: drivers/s390/crypto/vfio_ap_ops.c
+F: Documentation/s390/vfio-ap.txt
S390 ZFCP DRIVER
M: Steffen Maier <[email protected]>
--
1.7.1
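The documentation patch above defines the APQNs valid for a guest as the
cross product of its assigned adapters and usage domains, and requires
that no APQN be shared between guests. A minimal userspace sketch of
that rule follows. It is an illustration only: struct matrix_sketch and
the matrix_*() helpers are hypothetical names, and plain bool arrays are
used in place of the real APM/AQM bitmasks.

```c
#include <stdbool.h>
#include <stddef.h>

#define AP_IDS 256 /* adapters and domains are both numbered 0-255 */

/* Hypothetical model of one guest's matrix: adapter and domain lists. */
struct matrix_sketch {
	bool apm[AP_IDS]; /* apm[i] is true if adapter i is assigned */
	bool aqm[AP_IDS]; /* aqm[i] is true if usage domain i is assigned */
};

/* An APQN (apid, apqi) is valid only if both halves are assigned. */
static bool matrix_has_apqn(const struct matrix_sketch *m,
			    unsigned int apid, unsigned int apqi)
{
	return apid < AP_IDS && apqi < AP_IDS && m->apm[apid] && m->aqm[apqi];
}

/* Size of the cross product: number of adapters times number of domains. */
static size_t matrix_count_apqns(const struct matrix_sketch *m)
{
	size_t adapters = 0, domains = 0;

	for (size_t i = 0; i < AP_IDS; i++) {
		adapters += m->apm[i];
		domains += m->aqm[i];
	}
	return adapters * domains;
}

/*
 * Two matrices share an APQN exactly when they have at least one adapter
 * in common and at least one usage domain in common.
 */
static bool matrices_overlap(const struct matrix_sketch *a,
			     const struct matrix_sketch *b)
{
	bool common_adapter = false, common_domain = false;

	for (size_t i = 0; i < AP_IDS; i++) {
		common_adapter |= a->apm[i] && b->apm[i];
		common_domain |= a->aqm[i] && b->aqm[i];
	}
	return common_adapter && common_domain;
}
```

Applied to the two examples in the documentation, Guest1 (adapters 1,2
and domains 5,6) does not overlap with a guest holding adapters 1,2 and
domain 7, but does overlap with a guest holding adapter 1 and domains
6,7, because both would reach APQN (1,6).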
Introduces ioctl access to the VFIO AP Matrix device driver
by implementing the VFIO_DEVICE_GET_INFO ioctl. This ioctl
provides the VFIO AP Matrix device driver information to the
guest machine.
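The argsz/minsz handshake this ioctl relies on can be sketched in
userspace as follows. This is a simplified model, not the driver code:
struct dev_info_sketch, SKETCH_FLAGS_AP and get_device_info_sketch() are
hypothetical names, the flag value is a placeholder, and a NULL check
stands in for the faults that copy_from_user() and copy_to_user() would
report. In the kernel, copy_to_user() returns the number of bytes left
uncopied, so a nonzero result is normally converted to -EFAULT.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the leading fields of struct vfio_device_info. */
struct dev_info_sketch {
	uint32_t argsz;       /* size of the caller's structure */
	uint32_t flags;
	uint32_t num_regions;
	uint32_t num_irqs;
};

#define SKETCH_FLAGS_AP (1u << 5) /* placeholder value for this sketch */

static int get_device_info_sketch(struct dev_info_sketch *arg)
{
	/* Equivalent of offsetofend(struct ..., num_irqs). */
	const size_t minsz = offsetof(struct dev_info_sketch, num_irqs) +
			     sizeof(arg->num_irqs);

	if (!arg)
		return -EFAULT; /* stands in for a failed user copy */

	if (arg->argsz < minsz)
		return -EINVAL; /* caller's structure is too small */

	arg->flags = SKETCH_FLAGS_AP;
	arg->num_regions = 0; /* the AP matrix device exposes no regions */
	arg->num_irqs = 0;    /* and no interrupts */

	return 0;
}
```

A caller would set argsz to sizeof() of its structure before the call,
which lets the structure grow in later versions without breaking older
binaries.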
Reviewed-by: Pierre Morel <[email protected]>
Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/vfio_ap_ops.c | 43 +++++++++++++++++++++++++++++++++++++
1 files changed, 43 insertions(+), 0 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 8866b0e..01a036f 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -113,6 +113,48 @@ static void vfio_ap_mdev_release(struct mdev_device *mdev)
module_put(THIS_MODULE);
}
+static int vfio_ap_mdev_get_device_info(unsigned long arg)
+{
+ unsigned long minsz;
+ struct vfio_device_info info;
+
+ minsz = offsetofend(struct vfio_device_info, num_irqs);
+
+ if (copy_from_user(&info, (void __user *)arg, minsz))
+ return -EFAULT;
+
+ if (info.argsz < minsz) {
+ pr_err("%s: Argument size %u less than min size %lu\n",
+ VFIO_AP_MODULE_NAME, info.argsz, minsz);
+ return -EINVAL;
+ }
+
+ info.flags = VFIO_DEVICE_FLAGS_AP;
+ info.num_regions = 0;
+ info.num_irqs = 0;
+
+ return copy_to_user((void __user *)arg, &info, minsz) ? -EFAULT : 0;
+}
+
+static ssize_t vfio_ap_mdev_ioctl(struct mdev_device *mdev,
+ unsigned int cmd, unsigned long arg)
+{
+ int ret;
+
+ switch (cmd) {
+ case VFIO_DEVICE_GET_INFO:
+ ret = vfio_ap_mdev_get_device_info(arg);
+ break;
+ default:
+ pr_err("%s: ioctl command %u is not a supported command\n",
+ VFIO_AP_MODULE_NAME, cmd);
+ ret = -EOPNOTSUPP;
+ break;
+ }
+
+ return ret;
+}
+
static ssize_t name_show(struct kobject *kobj, struct device *dev, char *buf)
{
return sprintf(buf, "%s\n", VFIO_AP_MDEV_NAME_HWVIRT);
@@ -841,6 +883,7 @@ static ssize_t matrix_show(struct device *dev, struct device_attribute *attr,
.remove = vfio_ap_mdev_remove,
.open = vfio_ap_mdev_open,
.release = vfio_ap_mdev_release,
+ .ioctl = vfio_ap_mdev_ioctl,
};
int vfio_ap_mdev_register(struct ap_matrix *ap_matrix)
--
1.7.1
Registers the matrix device created by the VFIO AP device
driver with the VFIO mediated device framework.
Registering the matrix device will create the sysfs
structures needed to create mediated matrix devices
each of which will be used to configure the AP matrix
for a guest and connect it to the VFIO AP device driver.
Registering the matrix device with the VFIO mediated device
framework will create the following sysfs structures:
/sys/devices/vfio_ap
... [matrix]
...... [mdev_supported_types]
......... [vfio_ap-passthrough]
............ create
To create a mediated device for the AP matrix device, write a UUID
to the create file:
uuidgen > create
A symbolic link to the mediated device's directory will be created in the
devices subdirectory named after the generated $uuid:
/sys/devices/vfio_ap
... [matrix]
...... [mdev_supported_types]
......... [vfio_ap-passthrough]
............ [devices]
............... [$uuid]
Signed-off-by: Tony Krowiak <[email protected]>
---
MAINTAINERS | 1 +
drivers/s390/crypto/Makefile | 2 +-
drivers/s390/crypto/vfio_ap_drv.c | 9 +++
drivers/s390/crypto/vfio_ap_ops.c | 106 +++++++++++++++++++++++++++++++++
drivers/s390/crypto/vfio_ap_private.h | 17 +++++
5 files changed, 134 insertions(+), 1 deletions(-)
create mode 100644 drivers/s390/crypto/vfio_ap_ops.c
diff --git a/MAINTAINERS b/MAINTAINERS
index 2792c81..cecadf2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12246,6 +12246,7 @@ W: http://www.ibm.com/developerworks/linux/linux390/
S: Supported
F: drivers/s390/crypto/vfio_ap_drv.c
F: drivers/s390/crypto/vfio_ap_private.h
+F: drivers/s390/crypto/vfio_ap_ops.c
S390 ZFCP DRIVER
M: Steffen Maier <[email protected]>
diff --git a/drivers/s390/crypto/Makefile b/drivers/s390/crypto/Makefile
index 48e466e..8d36b05 100644
--- a/drivers/s390/crypto/Makefile
+++ b/drivers/s390/crypto/Makefile
@@ -17,5 +17,5 @@ pkey-objs := pkey_api.o
obj-$(CONFIG_PKEY) += pkey.o
# adjunct processor matrix
-vfio_ap-objs := vfio_ap_drv.o
+vfio_ap-objs := vfio_ap_drv.o vfio_ap_ops.o
obj-$(CONFIG_VFIO_AP) += vfio_ap.o
diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
index 014d70f..cc7fbd7 100644
--- a/drivers/s390/crypto/vfio_ap_drv.c
+++ b/drivers/s390/crypto/vfio_ap_drv.c
@@ -121,11 +121,20 @@ int __init vfio_ap_init(void)
return ret;
}
+ ret = vfio_ap_mdev_register(ap_matrix);
+ if (ret) {
+ ap_driver_unregister(&vfio_ap_drv);
+ vfio_ap_matrix_dev_destroy(ap_matrix);
+
+ return ret;
+ }
+
return 0;
}
void __exit vfio_ap_exit(void)
{
+ vfio_ap_mdev_unregister(ap_matrix);
ap_driver_unregister(&vfio_ap_drv);
vfio_ap_matrix_dev_destroy(ap_matrix);
}
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
new file mode 100644
index 0000000..d7d36fb
--- /dev/null
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -0,0 +1,106 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Adjunct processor matrix VFIO device driver callbacks.
+ *
+ * Copyright IBM Corp. 2017
+ * Author(s): Tony Krowiak <[email protected]>
+ *
+ */
+#include <linux/string.h>
+#include <linux/vfio.h>
+#include <linux/device.h>
+#include <linux/list.h>
+#include <linux/ctype.h>
+
+#include "vfio_ap_private.h"
+
+#define VFOP_AP_MDEV_TYPE_HWVIRT "passthrough"
+#define VFIO_AP_MDEV_NAME_HWVIRT "VFIO AP Passthrough Device"
+
+static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
+{
+ struct ap_matrix *ap_matrix = to_ap_matrix(mdev_parent_dev(mdev));
+
+ ap_matrix->available_instances--;
+
+ return 0;
+}
+
+static int vfio_ap_mdev_remove(struct mdev_device *mdev)
+{
+ struct ap_matrix *ap_matrix = to_ap_matrix(mdev_parent_dev(mdev));
+
+ ap_matrix->available_instances++;
+
+ return 0;
+}
+
+static ssize_t name_show(struct kobject *kobj, struct device *dev, char *buf)
+{
+ return sprintf(buf, "%s\n", VFIO_AP_MDEV_NAME_HWVIRT);
+}
+
+MDEV_TYPE_ATTR_RO(name);
+
+static ssize_t available_instances_show(struct kobject *kobj,
+ struct device *dev, char *buf)
+{
+ struct ap_matrix *ap_matrix;
+
+ ap_matrix = to_ap_matrix(dev);
+
+ return sprintf(buf, "%d\n", ap_matrix->available_instances);
+}
+
+MDEV_TYPE_ATTR_RO(available_instances);
+
+static ssize_t device_api_show(struct kobject *kobj, struct device *dev,
+ char *buf)
+{
+ return sprintf(buf, "%s\n", VFIO_DEVICE_API_AP_STRING);
+}
+
+MDEV_TYPE_ATTR_RO(device_api);
+
+static struct attribute *vfio_ap_mdev_type_attrs[] = {
+ &mdev_type_attr_name.attr,
+ &mdev_type_attr_device_api.attr,
+ &mdev_type_attr_available_instances.attr,
+ NULL,
+};
+
+static struct attribute_group vfio_ap_mdev_hwvirt_type_group = {
+ .name = VFOP_AP_MDEV_TYPE_HWVIRT,
+ .attrs = vfio_ap_mdev_type_attrs,
+};
+
+static struct attribute_group *vfio_ap_mdev_type_groups[] = {
+ &vfio_ap_mdev_hwvirt_type_group,
+ NULL,
+};
+
+static const struct mdev_parent_ops vfio_ap_matrix_ops = {
+ .owner = THIS_MODULE,
+ .supported_type_groups = vfio_ap_mdev_type_groups,
+ .create = vfio_ap_mdev_create,
+ .remove = vfio_ap_mdev_remove,
+};
+
+int vfio_ap_mdev_register(struct ap_matrix *ap_matrix)
+{
+ int ret;
+
+ ret = mdev_register_device(&ap_matrix->device, &vfio_ap_matrix_ops);
+ if (ret)
+ return ret;
+
+ ap_matrix->available_instances = AP_MATRIX_MAX_AVAILABLE_INSTANCES;
+
+ return 0;
+}
+
+void vfio_ap_mdev_unregister(struct ap_matrix *ap_matrix)
+{
+ ap_matrix->available_instances--;
+ mdev_unregister_device(&ap_matrix->device);
+}
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index cf23675..afd8dbc 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -10,14 +10,31 @@
#define _VFIO_AP_PRIVATE_H_
#include <linux/types.h>
+#include <linux/device.h>
+#include <linux/mdev.h>
#include "ap_bus.h"
#define VFIO_AP_MODULE_NAME "vfio_ap"
#define VFIO_AP_DRV_NAME "vfio_ap"
+/**
+ * There must be one mediated matrix device per guest. If every APQN is assigned
+ * to a guest, then the maximum number of guests with a unique APQN assigned
+ * would be 256 adapters x 256 domains = 65536 guests.
+ */
+#define AP_MATRIX_MAX_AVAILABLE_INSTANCES 65536
struct ap_matrix {
struct device device;
+ int available_instances;
};
+static inline struct ap_matrix *to_ap_matrix(struct device *dev)
+{
+ return container_of(dev, struct ap_matrix, device);
+}
+
+extern int vfio_ap_mdev_register(struct ap_matrix *ap_matrix);
+extern void vfio_ap_mdev_unregister(struct ap_matrix *ap_matrix);
+
#endif /* _VFIO_AP_PRIVATE_H_ */
--
1.7.1
Provides the sysfs interfaces for assigning AP control domains
to and unassigning AP control domains from a mediated matrix device.
The IDs of the AP control domains assigned to the mediated matrix
device are stored in an AP domain mask (ADM). The bits in the ADM,
from most significant to least significant bit, correspond to
AP domain numbers 0 to 255. When a control domain is assigned,
the bit corresponding to its domain ID will be set in the ADM.
Likewise, when a domain is unassigned, the bit corresponding
to its domain ID will be cleared in the ADM.
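The bit ordering described above (domain 0 is the most significant bit)
can be illustrated with a small user-space sketch. The names adm_mask,
adm_assign, adm_unassign and adm_is_assigned are hypothetical and exist
only for this illustration; the driver itself relies on the s390
set_bit_inv()/clear_bit_inv()/test_bit_inv() helpers.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define ADM_BITS	256
#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)
#define ADM_WORDS	(ADM_BITS / BITS_PER_WORD)

/* Hypothetical stand-in for the ADM: 256 bits, domain 0 is the MSB of word 0 */
struct adm_mask {
	unsigned long word[ADM_WORDS];
};

/* Bit within a word for domain @id, counted from the most significant bit */
static unsigned long adm_bit(unsigned int id)
{
	return 1UL << (BITS_PER_WORD - 1 - (id % BITS_PER_WORD));
}

/* Set the bit for control domain @id (0..255) */
static void adm_assign(struct adm_mask *m, unsigned int id)
{
	m->word[id / BITS_PER_WORD] |= adm_bit(id);
}

/* Clear the bit for control domain @id (0..255) */
static void adm_unassign(struct adm_mask *m, unsigned int id)
{
	m->word[id / BITS_PER_WORD] &= ~adm_bit(id);
}

/* Test whether control domain @id (0..255) is assigned */
static bool adm_is_assigned(const struct adm_mask *m, unsigned int id)
{
	return (m->word[id / BITS_PER_WORD] & adm_bit(id)) != 0;
}
```

Assigning domain 0 sets the top bit of the first word, which is the
opposite of the usual least-significant-bit-first numbering.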
The relevant sysfs structures are:
/sys/devices/vfio_ap
... [matrix]
...... [mdev_supported_types]
......... [vfio_ap-passthrough]
............ [devices]
...............[$uuid]
.................. assign_control_domain
.................. unassign_control_domain
To assign a control domain to the $uuid mediated matrix device's
ADM, write its domain number to the assign_control_domain file.
To unassign a domain, write its domain number to the
unassign_control_domain file. The domain number is specified
using conventional semantics: If it begins with 0x the number
will be parsed as a hexadecimal (case insensitive) number;
if it begins with 0, it is parsed as an octal number;
otherwise, it will be parsed as a decimal number.
For example, to assign control domain 173 (0xad) to the mediated
matrix device $uuid:
echo 173 > assign_control_domain
or
echo 0255 > assign_control_domain
or
echo 0xad > assign_control_domain
To unassign control domain 173 (0xad):
echo 173 > unassign_control_domain
or
echo 0255 > unassign_control_domain
or
echo 0xad > unassign_control_domain
The assignment will be rejected if the APQI exceeds the maximum
value for an AP domain:
* If the AP Extended Addressing (APXA) facility is installed,
the max value is 255
* Else the max value is 15
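The parsing and range check described above can be sketched in user
space as follows. This is only an approximation under stated
assumptions: parse_ap_id is a hypothetical name, and strtoul() with
base 0 stands in for the kstrtoul() call used by the driver.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse @s as hex (0x prefix), octal (0 prefix) or
 * decimal, allow one trailing newline as produced by echo, and reject
 * values greater than @max_id.
 *
 * Returns 0 and stores the value in @id, or -EINVAL on any error.
 */
static int parse_ap_id(const char *s, unsigned long max_id, unsigned long *id)
{
	char *end;
	unsigned long val;

	errno = 0;
	val = strtoul(s, &end, 0);
	if (errno || end == s)
		return -EINVAL;

	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;

	if (val > max_id)
		return -EINVAL;

	*id = val;
	return 0;
}
```

With a maximum of 255, the strings "173", "0255" and "0xad" all yield
the same domain number; with a maximum of 15, all three are rejected.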
Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/vfio_ap_ops.c | 114 +++++++++++++++++++++++++++++++++++++
1 files changed, 114 insertions(+), 0 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 5c232de..755be1d 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -605,11 +605,125 @@ static ssize_t unassign_domain_store(struct device *dev,
}
DEVICE_ATTR_WO(unassign_domain);
+
+/**
+ * assign_control_domain_store
+ *
+ * @dev: the matrix device
+ * @attr: a mediated matrix device attribute
+ * @buf: a buffer containing the control domain ID to be assigned
+ * @count: the number of bytes in @buf
+ *
+ * Parses the domain ID from @buf and assigns it to the mediated matrix device.
+ *
+ * Returns the number of bytes processed if the domain ID is valid; otherwise
+ * returns an error.
+ */
+static ssize_t assign_control_domain_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ int ret;
+ unsigned long id;
+ struct mdev_device *mdev = mdev_from_dev(dev);
+ struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
+ unsigned long maxid = vfio_ap_max_domain_id();
+
+ ret = kstrtoul(buf, 0, &id);
+ if (ret || (id > maxid)) {
+ pr_err("%s: %s: control domain id '%s' not a value from 0 to %02lu(%#04lx)",
+ VFIO_AP_MODULE_NAME, __func__, buf, maxid, maxid);
+
+ return ret ? ret : -EINVAL;
+ }
+
+ /* Set the bit in the ADM (bitmask) corresponding to the AP control
+ * domain number (id). The bits in the mask, from most significant to
+ * least significant, correspond to IDs 0 up to one less than the
+ * number of control domains that can be assigned.
+ */
+ set_bit_inv(id, matrix_mdev->matrix.adm);
+
+ return count;
+}
+DEVICE_ATTR_WO(assign_control_domain);
+
+/**
+ * unassign_control_domain_store
+ *
+ * @dev: the matrix device
+ * @attr: a mediated matrix device attribute
+ * @buf: a buffer containing the control domain ID to be unassigned
+ * @count: the number of bytes in @buf
+ *
+ * Parses the domain ID from @buf and unassigns it from the mediated matrix
+ * device.
+ *
+ * Returns the number of bytes processed if the domain ID is valid; otherwise
+ * returns an error.
+ */
+static ssize_t unassign_control_domain_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ int ret;
+ unsigned long domid;
+ struct mdev_device *mdev = mdev_from_dev(dev);
+ struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
+ unsigned long max_domid = matrix_mdev->matrix.adm_max;
+
+ ret = kstrtoul(buf, 0, &domid);
+ if (ret || (domid > max_domid)) {
+ pr_err("%s: %s: control domain id '%s' not a value from 0 to %02lu(%#04lx)",
+ VFIO_AP_MODULE_NAME, __func__, buf,
+ max_domid, max_domid);
+
+ return ret ? ret : -EINVAL;
+ }
+
+ if (!test_bit_inv(domid, matrix_mdev->matrix.adm)) {
+ pr_err("%s: %s: control domain id %02lu(%#04lx) is not assigned",
+ VFIO_AP_MODULE_NAME, __func__, domid, domid);
+
+ return -ENODEV;
+ }
+
+ clear_bit_inv(domid, matrix_mdev->matrix.adm);
+
+ return count;
+}
+DEVICE_ATTR_WO(unassign_control_domain);
+
+static ssize_t control_domains_show(struct device *dev,
+ struct device_attribute *dev_attr,
+ char *buf)
+{
+ unsigned long id;
+ int nchars = 0;
+ int n;
+ char *bufpos = buf;
+ struct mdev_device *mdev = mdev_from_dev(dev);
+ struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
+ unsigned long max_apqi = matrix_mdev->matrix.adm_max;
+
+ for_each_set_bit_inv(id, matrix_mdev->matrix.adm, max_apqi + 1) {
+ n = sprintf(bufpos, "%04lx\n", id);
+ bufpos += n;
+ nchars += n;
+ }
+
+ return nchars;
+}
+DEVICE_ATTR_RO(control_domains);
+
static struct attribute *vfio_ap_mdev_attrs[] = {
&dev_attr_assign_adapter.attr,
&dev_attr_unassign_adapter.attr,
&dev_attr_assign_domain.attr,
&dev_attr_unassign_domain.attr,
+ &dev_attr_assign_control_domain.attr,
+ &dev_attr_unassign_control_domain.attr,
+ &dev_attr_control_domains.attr,
NULL,
};
--
1.7.1
Provides a sysfs interface to view the AP matrix configured for the
mediated matrix device.
The relevant sysfs structures are:
/sys/devices/vfio_ap
... [matrix]
...... [mdev_supported_types]
......... [vfio_ap-passthrough]
............ [devices]
...............[$uuid]
.................. matrix
To view the matrix configured for the mediated matrix device,
print the matrix file:
cat matrix
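As a rough illustration of what the matrix file contains, the sketch
below formats one line per assigned adapter followed by one line per
adapter/domain pair, using the same format strings as matrix_show() in
this patch. The function name format_matrix and its array parameters
are hypothetical, and snprintf() with explicit bounds is used here
instead of the sprintf() calls in the driver.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical user-space model of the matrix attribute: @apids and
 * @apqis list the assigned adapter and domain numbers.
 *
 * Returns the number of characters written, excluding the terminating
 * NUL. Output is cut at the last complete line if @buf is too small.
 */
static size_t format_matrix(char *buf, size_t size,
			    const unsigned long *apids, size_t napids,
			    const unsigned long *apqis, size_t napqis)
{
	size_t used = 0;
	size_t i, j;
	int n;

	if (size == 0)
		return 0;
	buf[0] = '\0';

	for (i = 0; i < napids; i++) {
		/* One line for the adapter itself */
		n = snprintf(buf + used, size - used, "%02lx\n", apids[i]);
		if (n < 0 || (size_t)n >= size - used)
			goto truncated;
		used += (size_t)n;

		/* One line per queue: adapter.domain */
		for (j = 0; j < napqis; j++) {
			n = snprintf(buf + used, size - used, "%02lx.%04lx\n",
				     apids[i], apqis[j]);
			if (n < 0 || (size_t)n >= size - used)
				goto truncated;
			used += (size_t)n;
		}
	}
	return used;

truncated:
	buf[used] = '\0';
	return used;
}
```

For adapters 3 and 4 with domains 1 and 2, the sketch produces the
lines 03, 03.0001, 03.0002, 04, 04.0001 and 04.0002.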
Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/vfio_ap_ops.c | 31 +++++++++++++++++++++++++++++++
1 files changed, 31 insertions(+), 0 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 755be1d..81e03b8 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -716,6 +716,36 @@ static ssize_t control_domains_show(struct device *dev,
}
DEVICE_ATTR_RO(control_domains);
+static ssize_t matrix_show(struct device *dev, struct device_attribute *attr,
+ char *buf)
+{
+ struct mdev_device *mdev = mdev_from_dev(dev);
+ struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
+ char *bufpos = buf;
+ unsigned long apid;
+ unsigned long apqi;
+ unsigned long napm = matrix_mdev->matrix.apm_max + 1;
+ unsigned long naqm = matrix_mdev->matrix.aqm_max + 1;
+ int nchars = 0;
+ int n;
+
+ for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, napm) {
+ n = sprintf(bufpos, "%02lx\n", apid);
+ bufpos += n;
+ nchars += n;
+
+ for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, naqm) {
+ n = sprintf(bufpos, "%02lx.%04lx\n", apid, apqi);
+ bufpos += n;
+ nchars += n;
+ }
+ }
+
+ return nchars;
+}
+DEVICE_ATTR_RO(matrix);
+
+
static struct attribute *vfio_ap_mdev_attrs[] = {
&dev_attr_assign_adapter.attr,
&dev_attr_unassign_adapter.attr,
@@ -724,6 +754,7 @@ static ssize_t control_domains_show(struct device *dev,
&dev_attr_assign_control_domain.attr,
&dev_attr_unassign_control_domain.attr,
&dev_attr_control_domains.attr,
+ &dev_attr_matrix.attr,
NULL,
};
--
1.7.1
Provides the sysfs interfaces for assigning AP adapters to
and unassigning AP adapters from a mediated matrix device.
The IDs of the AP adapters assigned to the mediated matrix
device are stored in an AP mask (APM). The bits in the APM,
from most significant to least significant bit, correspond to
AP adapter ID (APID) 0 to 255. When an adapter is assigned, the
bit corresponding to the APID will be set in the APM.
Likewise, when an adapter is unassigned, the bit corresponding
to the APID will be cleared from the APM.
The relevant sysfs structures are:
/sys/devices/vfio_ap
... [matrix]
...... [mdev_supported_types]
......... [vfio_ap-passthrough]
............ [devices]
...............[$uuid]
.................. assign_adapter
.................. unassign_adapter
To assign an adapter to the $uuid mediated matrix device's APM,
write the APID to the assign_adapter file. To unassign an adapter,
write the APID to the unassign_adapter file. The APID is specified
using conventional semantics: If it begins with 0x the number will
be parsed as a hexadecimal number; if it begins with a 0 the number
will be parsed as an octal number; otherwise, it will be parsed as a
decimal number.
For example, to assign adapter 173 (0xad) to the mediated matrix
device $uuid:
echo 173 > assign_adapter
or
echo 0xad > assign_adapter
or
echo 0255 > assign_adapter
To unassign adapter 173 (0xad):
echo 173 > unassign_adapter
or
echo 0xad > unassign_adapter
or
echo 0255 > unassign_adapter
The assignment will be rejected:
* If the APID exceeds the maximum value for an AP adapter:
* If the AP Extended Addressing (APXA) facility is
installed, the max value is 255
* Else the max value is 63
* If no AP domains have yet been assigned and there are
no AP queues bound to the VFIO AP driver that have an APQN
with an APID matching that of the AP adapter being assigned.
* If any of the APQNs that can be derived from the cross product
of the APID being assigned and the AP queue index (APQI) of
each of the AP domains previously assigned can not be matched
with an APQN of an AP queue device reserved by the VFIO AP
driver.
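The rejection rules above can be modeled in user space as follows. This
is a simplified sketch, not the driver code: struct apqn,
queue_reserved and validate_adapter are hypothetical names, and plain
arrays stand in for the bitmasks and for the list of AP queue devices
bound to the vfio_ap driver. The error codes mirror the ones returned
by the functions in this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an AP queue number: (adapter, domain) */
struct apqn {
	unsigned int apid;
	unsigned int apqi;
};

/* True if (@apid, @apqi) is among the queues reserved by the driver */
static bool queue_reserved(const struct apqn *reserved, size_t nreserved,
			   unsigned int apid, unsigned int apqi)
{
	size_t i;

	for (i = 0; i < nreserved; i++)
		if (reserved[i].apid == apid && reserved[i].apqi == apqi)
			return true;
	return false;
}

/* Returns 0 if @apid may be assigned, otherwise a negative errno value */
static int validate_adapter(unsigned int apid, unsigned int max_apid,
			    const struct apqn *reserved, size_t nreserved,
			    const unsigned int *domains, size_t ndomains)
{
	size_t i;

	if (apid > max_apid)
		return -EINVAL;

	/* No domains assigned yet: one reserved queue on the adapter suffices */
	if (ndomains == 0) {
		for (i = 0; i < nreserved; i++)
			if (reserved[i].apid == apid)
				return 0;
		return -ENODEV;
	}

	/* Otherwise every (apid, domain) pair must be a reserved queue */
	for (i = 0; i < ndomains; i++)
		if (!queue_reserved(reserved, nreserved, apid, domains[i]))
			return -ENXIO;

	return 0;
}
```

With queues (3,1), (4,1) and (4,2) reserved and domains 1 and 2 already
assigned, adapter 4 is accepted while adapter 3 is rejected because
queue (3,2) is not reserved.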
Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/vfio_ap_ops.c | 318 +++++++++++++++++++++++++++++++++
drivers/s390/crypto/vfio_ap_private.h | 34 ++++
2 files changed, 352 insertions(+), 0 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index d7d36fb..914274d 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -20,7 +20,16 @@
static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
{
struct ap_matrix *ap_matrix = to_ap_matrix(mdev_parent_dev(mdev));
+ struct ap_matrix_mdev *matrix_mdev;
+ matrix_mdev = kzalloc(sizeof(*matrix_mdev), GFP_KERNEL);
+ if (!matrix_mdev)
+ return -ENOMEM;
+
+ matrix_mdev->matrix.apm_max = vfio_ap_max_adapter_id();
+ matrix_mdev->matrix.aqm_max = vfio_ap_max_domain_id();
+ matrix_mdev->matrix.adm_max = matrix_mdev->matrix.aqm_max;
+ mdev_set_drvdata(mdev, matrix_mdev);
ap_matrix->available_instances--;
return 0;
@@ -29,7 +38,10 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
static int vfio_ap_mdev_remove(struct mdev_device *mdev)
{
struct ap_matrix *ap_matrix = to_ap_matrix(mdev_parent_dev(mdev));
+ struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
+ kfree(matrix_mdev);
+ mdev_set_drvdata(mdev, NULL);
ap_matrix->available_instances++;
return 0;
@@ -79,9 +91,315 @@ static ssize_t device_api_show(struct kobject *kobj, struct device *dev,
NULL,
};
+struct vfio_apid_reserved {
+ unsigned long apid;
+ int reserved;
+};
+
+struct vfio_ap_qid_match {
+ qid_t qid;
+ struct device *dev;
+};
+
+/**
+ * vfio_ap_queue_match
+ *
+ * @dev: an AP queue device that has been reserved by the VFIO AP device
+ * driver
+ * @data: a struct vfio_ap_qid_match holding the AP queue identifier to find
+ *
+ * Stores @dev in @data->dev if the identifier of @dev matches @data->qid.
+ * Always returns 0 so that iteration continues over the remaining devices.
+ */
+static int vfio_ap_queue_match(struct device *dev, void *data)
+{
+ struct vfio_ap_qid_match *qid_match = data;
+ struct ap_queue *ap_queue;
+
+ ap_queue = to_ap_queue(dev);
+
+ if (ap_queue->qid == qid_match->qid)
+ qid_match->dev = dev;
+
+ return 0;
+}
+
+/**
+ * vfio_ap_validate_queues_for_apid
+ *
+ * @ap_matrix: the matrix device
+ * @matrix_mdev: the mediated matrix device
+ * @apid: an AP adapter ID (APID)
+ *
+ * Verifies that each APQN that is derived from the cross product of @apid and
+ * each AP queue index (APQI) corresponding to an AP domain assigned to the
+ * @matrix_mdev matches the APQN of an AP queue reserved by the VFIO AP device
+ * driver.
+ *
+ * Returns 0 if validation succeeds; otherwise, returns an error.
+ */
+static int vfio_ap_validate_queues_for_apid(struct ap_matrix *ap_matrix,
+ struct ap_matrix_mdev *matrix_mdev,
+ unsigned long apid)
+{
+ int ret;
+ struct vfio_ap_qid_match qid_match;
+ unsigned long apqi;
+ unsigned long nbits = matrix_mdev->matrix.aqm_max + 1;
+ struct device_driver *drv = ap_matrix->device.driver;
+
+ /**
+ * Examine each APQN with the specified APID
+ */
+ for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, nbits) {
+ qid_match.qid = AP_MKQID(apid, apqi);
+ qid_match.dev = NULL;
+
+ ret = driver_for_each_device(drv, NULL, &qid_match,
+ vfio_ap_queue_match);
+ if (ret) {
+ pr_err("%s: %s: Error %d validating AP queue %02lx.%04lx reservation",
+ VFIO_AP_MODULE_NAME, __func__, ret, apid, apqi);
+ return ret;
+ }
+
+ /*
+ * If the APQN identifies an AP queue that is reserved by the
+ * VFIO AP device driver, continue processing.
+ */
+ if (qid_match.dev)
+ continue;
+
+ pr_err("%s: %s: AP queue %02lx.%04lx not reserved by %s driver",
+ VFIO_AP_MODULE_NAME, __func__, apid, apqi,
+ VFIO_AP_DRV_NAME);
+
+ return -ENXIO;
+ }
+
+ return 0;
+}
+
+struct vfio_ap_apid_reserved {
+ unsigned long apid;
+ bool reserved;
+};
+
+/**
+ * vfio_ap_queue_id_contains_apid
+ *
+ * @dev: an AP queue device
+ * @data: a struct vfio_ap_apid_reserved holding the AP adapter ID (APID)
+ *
+ * Marks @data as reserved if the identifier of the AP queue (@dev) contains
+ * the APID in @data. Always returns 0 so that iteration continues.
+ */
+static int vfio_ap_queue_id_contains_apid(struct device *dev, void *data)
+{
+ struct vfio_ap_apid_reserved *apid_res = data;
+ struct ap_queue *ap_queue = to_ap_queue(dev);
+
+ if (apid_res->apid == AP_QID_CARD(ap_queue->qid))
+ apid_res->reserved = true;
+
+ return 0;
+}
+
+/**
+ * vfio_ap_verify_apid_reserved
+ *
+ * @ap_matrix: the matrix device
+ * @apid: the AP adapter ID
+ *
+ * Verifies that at least one AP queue reserved by the VFIO AP device driver
+ * has an APQN containing @apid.
+ *
+ * Returns 0 if the APID is reserved; otherwise, returns -ENODEV.
+ */
+static int vfio_ap_verify_apid_reserved(struct ap_matrix *ap_matrix,
+ unsigned long apid)
+{
+ int ret;
+ struct vfio_ap_apid_reserved apid_res;
+
+ apid_res.apid = apid;
+ apid_res.reserved = false;
+
+ ret = driver_for_each_device(ap_matrix->device.driver, NULL, &apid_res,
+ vfio_ap_queue_id_contains_apid);
+ if (ret)
+ return ret;
+
+ if (apid_res.reserved)
+ return 0;
+
+ pr_err("%s: %s: no APQNs with adapter ID %02lx are reserved by %s driver",
+ VFIO_AP_MODULE_NAME, __func__, apid, VFIO_AP_DRV_NAME);
+
+ return -ENODEV;
+}
+
+/**
+ * vfio_ap_validate_apid
+ *
+ * @mdev: the mediated device
+ * @matrix_mdev: the mediated matrix device
+ * @apid: the APID to validate
+ *
+ * Validates the value of @apid:
+ * * If there are no AP domains assigned, then there must be at least
+ * one AP queue device reserved by the VFIO AP device driver with an
+ * APQN containing @apid.
+ *
+ * * Else each APQN that can be derived from the intersection of @apid and
+ * the IDs of the AP domains already assigned must identify an AP queue
+ * that has been reserved by the VFIO AP device driver.
+ *
+ * Returns 0 if the value of @apid is valid; otherwise, returns an error.
+ */
+static int vfio_ap_validate_apid(struct mdev_device *mdev,
+ struct ap_matrix_mdev *matrix_mdev,
+ unsigned long apid)
+{
+ int ret;
+ struct device *dev = mdev_parent_dev(mdev);
+ struct ap_matrix *ap_matrix = to_ap_matrix(dev);
+ unsigned long max_apqi = matrix_mdev->matrix.aqm_max;
+ unsigned long apqi;
+
+ apqi = find_first_bit_inv(matrix_mdev->matrix.aqm, max_apqi + 1);
+ if (apqi > max_apqi) {
+ ret = vfio_ap_verify_apid_reserved(ap_matrix, apid);
+ } else {
+ ret = vfio_ap_validate_queues_for_apid(ap_matrix, matrix_mdev,
+ apid);
+ }
+
+ if (ret)
+ return ret;
+
+ return 0;
+}
+
+/**
+ * assign_adapter_store
+ *
+ * @dev: the matrix device
+ * @attr: a mediated matrix device attribute
+ * @buf: a buffer containing the adapter ID (APID) to be assigned
+ * @count: the number of bytes in @buf
+ *
+ * Parses the APID from @buf and assigns it to the mediated matrix device. The
+ * APID must be a valid value:
+ * * The APID value must not exceed the maximum allowable AP adapter ID
+ *
+ * * If there are no AP domains assigned, then there must be at least
+ * one AP queue device reserved by the VFIO AP device driver with an
+ * APQN containing @apid.
+ *
+ * * Else each APQN that can be derived from the intersection of @apid and
+ * the IDs of the AP domains already assigned must identify an AP queue
+ * that has been reserved by the VFIO AP device driver.
+ *
+ * Returns the number of bytes processed if the APID is valid; otherwise returns
+ * an error.
+ */
+static ssize_t assign_adapter_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ int ret;
+ unsigned long apid;
+ struct mdev_device *mdev = mdev_from_dev(dev);
+ struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
+ unsigned long max_apid = matrix_mdev->matrix.apm_max;
+
+ ret = kstrtoul(buf, 0, &apid);
+ if (ret || (apid > max_apid)) {
+ pr_err("%s: %s: adapter id '%s' not a value from 0 to %02lu(%#04lx)",
+ VFIO_AP_MODULE_NAME, __func__, buf, max_apid, max_apid);
+
+ return ret ? ret : -EINVAL;
+ }
+
+ ret = vfio_ap_validate_apid(mdev, matrix_mdev, apid);
+ if (ret)
+ return ret;
+
+ /* Set the bit in the AP mask (APM) corresponding to the AP adapter
+ * number (APID). The bits in the mask, from most significant to least
+ * significant bit, correspond to APIDs 0-255.
+ */
+ set_bit_inv(apid, matrix_mdev->matrix.apm);
+
+ return count;
+}
+static DEVICE_ATTR_WO(assign_adapter);
+
+/**
+ * unassign_adapter_store
+ *
+ * @dev: the matrix device
+ * @attr: a mediated matrix device attribute
+ * @buf: a buffer containing the adapter ID (APID) to be unassigned
+ * @count: the number of bytes in @buf
+ *
+ * Parses the APID from @buf and unassigns it from the mediated matrix device.
+ * The APID must be a valid value
+ *
+ * Returns the number of bytes processed if the APID is valid; otherwise returns
+ * an error.
+ */
+static ssize_t unassign_adapter_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ int ret;
+ unsigned long apid;
+ struct mdev_device *mdev = mdev_from_dev(dev);
+ struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
+ unsigned long max_apid = matrix_mdev->matrix.apm_max;
+
+ ret = kstrtoul(buf, 0, &apid);
+ if (ret || (apid > max_apid)) {
+ pr_err("%s: %s: adapter id '%s' must be a value from 0 to %02lu(%#04lx)",
+ VFIO_AP_MODULE_NAME, __func__, buf, max_apid, max_apid);
+
+ return ret ? ret : -EINVAL;
+ }
+
+ if (!test_bit_inv(apid, matrix_mdev->matrix.apm)) {
+ pr_err("%s: %s: adapter id %02lu(%#04lx) not assigned",
+ VFIO_AP_MODULE_NAME, __func__, apid, apid);
+
+ return -ENODEV;
+ }
+
+ clear_bit_inv((unsigned long)apid, matrix_mdev->matrix.apm);
+
+ return count;
+}
+DEVICE_ATTR_WO(unassign_adapter);
+
+static struct attribute *vfio_ap_mdev_attrs[] = {
+ &dev_attr_assign_adapter.attr,
+ &dev_attr_unassign_adapter.attr,
+ NULL
+};
+
+static struct attribute_group vfio_ap_mdev_attr_group = {
+ .attrs = vfio_ap_mdev_attrs
+};
+
+static const struct attribute_group *vfio_ap_mdev_attr_groups[] = {
+ &vfio_ap_mdev_attr_group,
+ NULL
+};
+
static const struct mdev_parent_ops vfio_ap_matrix_ops = {
.owner = THIS_MODULE,
.supported_type_groups = vfio_ap_mdev_type_groups,
+ .mdev_attr_groups = vfio_ap_mdev_attr_groups,
.create = vfio_ap_mdev_create,
.remove = vfio_ap_mdev_remove,
};
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index afd8dbc..8b6ad66 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -12,6 +12,7 @@
#include <linux/types.h>
#include <linux/device.h>
#include <linux/mdev.h>
+#include <asm/kvm-ap.h>
#include "ap_bus.h"
@@ -29,11 +30,44 @@ struct ap_matrix {
int available_instances;
};
+struct ap_matrix_mdev {
+ struct kvm_ap_matrix matrix;
+};
+
static inline struct ap_matrix *to_ap_matrix(struct device *dev)
{
return container_of(dev, struct ap_matrix, device);
}
+static inline unsigned long vfio_ap_max_adapter_id(void)
+{
+ struct ap_config_info info;
+ /*
+ * TODO:
+ * Replace with call to ap_query_configuration() when that function is
+ * made static in the AP bus code.
+ */
+ if (kvm_ap_query_configuration(&info))
+ return 15;
+
+ return info.apxa ? info.Na : 63;
+}
+
+static inline unsigned long vfio_ap_max_domain_id(void)
+{
+ struct ap_config_info info;
+
+ /*
+ * TODO:
+ * Replace with call to ap_query_configuration() when that function is
+ * made static in the AP bus code.
+ */
+ if (kvm_ap_query_configuration(&info))
+ return 15;
+
+ return info.apxa ? info.Nd : 15;
+}
+
extern int vfio_ap_mdev_register(struct ap_matrix *ap_matrix);
extern void vfio_ap_mdev_unregister(struct ap_matrix *ap_matrix);
--
1.7.1
Relocates an existing static function that tests whether
the AP extended addressing facility (APXA) is installed on
the Linux host. The primary reason for relocating this
function is that a new compilation unit (arch/s390/kvm/kvm-ap.c)
is being created to contain all of the interfaces and logic
for configuring an AP matrix for a KVM guest. Some of its
functions will also need to determine whether APXA is installed,
so this static function is relocated to kvm-ap.c as a
public interface.
Notes:
----
1. The interface to determine whether APXA is installed on the Linux
host uses the information returned from the AP Query Configuration
Information (QCI) function. This function will not be available
if the AP instructions are not installed on the Linux host, so a check
will be included to verify that.
2. Currently, the AP bus interfaces accessing the AP instructions will
not be accessible if CONFIG_ZCRYPT=n, so the relevant code will be
temporarily contained in the new arch/s390/kvm/kvm-ap.c file until
the patch(es) to statically build the required AP bus interfaces are
available.
Signed-off-by: Tony Krowiak <[email protected]>
---
MAINTAINERS | 1 +
arch/s390/include/asm/kvm-ap.h | 60 +++++++++++++++++++++++++++++
arch/s390/kvm/Makefile | 2 +-
arch/s390/kvm/kvm-ap.c | 83 ++++++++++++++++++++++++++++++++++++++++
arch/s390/kvm/kvm-s390.c | 42 +-------------------
5 files changed, 147 insertions(+), 41 deletions(-)
create mode 100644 arch/s390/include/asm/kvm-ap.h
create mode 100644 arch/s390/kvm/kvm-ap.c
diff --git a/MAINTAINERS b/MAINTAINERS
index eab763f..224e97b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7792,6 +7792,7 @@ M: Christian Borntraeger <[email protected]>
M: Janosch Frank <[email protected]>
R: David Hildenbrand <[email protected]>
R: Cornelia Huck <[email protected]>
+R: Tony Krowiak <[email protected]>
L: [email protected]
W: http://www.ibm.com/developerworks/linux/linux390/
T: git git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux.git
diff --git a/arch/s390/include/asm/kvm-ap.h b/arch/s390/include/asm/kvm-ap.h
new file mode 100644
index 0000000..6af1ff8
--- /dev/null
+++ b/arch/s390/include/asm/kvm-ap.h
@@ -0,0 +1,60 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Adjunct Processor (AP) configuration management for KVM guests
+ *
+ * Copyright IBM Corp. 2018
+ *
+ * Author(s): Tony Krowiak <[email protected]>
+ */
+
+#ifndef _ASM_KVM_AP
+#define _ASM_KVM_AP
+
+#include <linux/types.h>
+#include <linux/kvm_host.h>
+#include <asm/ap.h>
+
+/**
+ * kvm_ap_apxa_installed
+ *
+ * Returns 1 if the AP extended addressing facility (APXA) is installed on the
+ * linux host; otherwise, returns 0.
+ */
+int kvm_ap_apxa_installed(void);
+
+/**
+ * kvm_ap_query_configuration
+ *
+ * @info: stores the AP configuration information
+ *
+ * Executes the AP Query Configuration Information (QCI) function and stores
+ * the configuration information in @info.
+ *
+ * Returns 0 if the operation succeeds; otherwise returns an error. If the
+ * QCI facility is not installed, returns -EOPNOTSUPP.
+ *
+ * TODO:
+ * This interface is temporary until the ap_query_configuration() interface
+ * implemented in the AP bus becomes statically available. Currently, the
+ * bus interface will not be available if CONFIG_ZCRYPT or CONFIG_ZCRYPT_MODULE
+ * is not selected. Calls to this function should be replaced by a call to
+ * the AP bus ap_query_configuration() interface at that time.
+ */
+int kvm_ap_query_configuration(struct ap_config_info *info);
+
+/**
+ * kvm_ap_instructions_available
+ *
+ * Returns 1 if the AP instructions are installed on the linux host; otherwise,
+ * returns 0.
+ *
+ * TODO:
+ * This interface is temporary until the ap_instructions_available() interface
+ * implemented in the AP bus becomes statically available. Currently, the
+ * bus interface will not be available if CONFIG_ZCRYPT or CONFIG_ZCRYPT_MODULE
+ * is not selected. Calls to this function should be replaced by a call to
+ * the AP bus ap_instructions_available() interface at that time.
+ */
+bool kvm_ap_instructions_available(void);
+
+#endif /* _ASM_KVM_AP */
diff --git a/arch/s390/kvm/Makefile b/arch/s390/kvm/Makefile
index 05ee90a..1876bfe 100644
--- a/arch/s390/kvm/Makefile
+++ b/arch/s390/kvm/Makefile
@@ -9,6 +9,6 @@ common-objs = $(KVM)/kvm_main.o $(KVM)/eventfd.o $(KVM)/async_pf.o $(KVM)/irqch
ccflags-y := -Ivirt/kvm -Iarch/s390/kvm
kvm-objs := $(common-objs) kvm-s390.o intercept.o interrupt.o priv.o sigp.o
-kvm-objs += diag.o gaccess.o guestdbg.o vsie.o
+kvm-objs += diag.o gaccess.o guestdbg.o vsie.o kvm-ap.o
obj-$(CONFIG_KVM) += kvm.o
diff --git a/arch/s390/kvm/kvm-ap.c b/arch/s390/kvm/kvm-ap.c
new file mode 100644
index 0000000..00bcfb0
--- /dev/null
+++ b/arch/s390/kvm/kvm-ap.c
@@ -0,0 +1,83 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Adjunct Processor (AP) configuration management for KVM guests
+ *
+ * Copyright IBM Corp. 2018
+ *
+ * Author(s): Tony Krowiak <[email protected]>
+ */
+#include <linux/kernel.h>
+#include <asm/kvm-ap.h>
+
+#include "kvm-s390.h"
+
+static int kvm_ap_qci(struct ap_config_info *info)
+{
+ register unsigned long reg0 asm ("0") = 0x04000000UL;
+ register unsigned long reg1 asm ("1") = -EINVAL;
+ register void *reg2 asm ("2") = (void *) info;
+
+ asm volatile(
+ ".long 0xb2af0000\n" /* PQAP(QCI) */
+ "0: la %1,0\n"
+ "1:\n"
+ EX_TABLE(0b, 1b)
+ : "+d" (reg0), "+d" (reg1), "+d" (reg2)
+ :
+ : "cc", "memory");
+
+ return reg1;
+}
+
+
+/**
+ * TODO:
+ * This interface is temporary until the ap_instructions_available() interface
+ * implemented in the AP bus becomes statically available. Currently, the
+ * bus interface will not be available if CONFIG_ZCRYPT or CONFIG_ZCRYPT_MODULE
+ * is not selected. Calls to this function should be replaced by a call to
+ * the AP bus ap_instructions_available() interface at that time.
+ */
+bool kvm_ap_instructions_available(void)
+{
+ register unsigned long reg0 asm ("0") = AP_MKQID(0, 0);
+ register unsigned long reg1 asm ("1") = -ENODEV;
+ register unsigned long reg2 asm ("2") = 0UL;
+
+ asm volatile(
+ " .long 0xb2af0000\n" /* PQAP(TAPQ) */
+ "0: la %1,0\n"
+ "1:\n"
+ EX_TABLE(0b, 1b)
+ : "+d" (reg0), "+d" (reg1), "+d" (reg2) : : "cc");
+ return reg1 == 0;
+}
+EXPORT_SYMBOL(kvm_ap_instructions_available);
+
+/**
+ * TODO:
+ * This interface is temporary until the ap_query_configuration() interface
+ * implemented in the AP bus becomes statically available. Currently, the AP
+ * bus interface will not be available if CONFIG_ZCRYPT or CONFIG_ZCRYPT_MODULE
+ * is not selected. Calls to this function should be replaced by a call to
+ * the AP bus ap_query_configuration() interface at that time.
+ */
+int kvm_ap_query_configuration(struct ap_config_info *info)
+{
+ if (kvm_ap_instructions_available() && test_facility(12))
+ return kvm_ap_qci(info);
+
+ return -EOPNOTSUPP;
+}
+EXPORT_SYMBOL(kvm_ap_query_configuration);
+
+int kvm_ap_apxa_installed(void)
+{
+ struct ap_config_info info;
+
+ if (kvm_ap_query_configuration(&info) == 0)
+ return (info.apxa == 1);
+
+ return 0;
+}
+EXPORT_SYMBOL(kvm_ap_apxa_installed);
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 64c9862..1f50de7 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -40,6 +40,7 @@
#include <asm/sclp.h>
#include <asm/cpacf.h>
#include <asm/timex.h>
+#include <asm/kvm-ap.h>
#include "kvm-s390.h"
#include "gaccess.h"
@@ -1874,50 +1875,11 @@ long kvm_arch_vm_ioctl(struct file *filp,
return r;
}
-static int kvm_s390_query_ap_config(u8 *config)
-{
- u32 fcn_code = 0x04000000UL;
- u32 cc = 0;
-
- memset(config, 0, 128);
- asm volatile(
- "lgr 0,%1\n"
- "lgr 2,%2\n"
- ".long 0xb2af0000\n" /* PQAP(QCI) */
- "0: ipm %0\n"
- "srl %0,28\n"
- "1:\n"
- EX_TABLE(0b, 1b)
- : "+r" (cc)
- : "r" (fcn_code), "r" (config)
- : "cc", "0", "2", "memory"
- );
-
- return cc;
-}
-
-static int kvm_s390_apxa_installed(void)
-{
- u8 config[128];
- int cc;
-
- if (test_facility(12)) {
- cc = kvm_s390_query_ap_config(config);
-
- if (cc)
- pr_err("PQAP(QCI) failed with cc=%d", cc);
- else
- return config[0] & 0x40;
- }
-
- return 0;
-}
-
static void kvm_s390_set_crycb_format(struct kvm *kvm)
{
kvm->arch.crypto.crycbd = (__u32)(unsigned long) kvm->arch.crypto.crycb;
- if (kvm_s390_apxa_installed())
+ if (kvm_ap_apxa_installed())
kvm->arch.crypto.crycbd |= CRYCB_FORMAT2;
else
kvm->arch.crypto.crycbd |= CRYCB_FORMAT1;
--
1.7.1
Introduces a new AP device driver. This device driver
is built on the VFIO mediated device framework. The framework
provides sysfs interfaces that facilitate passthrough
access by guests to devices installed on the linux host.
The VFIO AP device driver will serve two purposes:
1. Provide the interfaces to reserve AP devices for exclusive
use by KVM guests. This is accomplished by unbinding the
devices to be reserved for guest usage from the default AP
device driver and binding them to the VFIO AP device driver.
2. Implement the functions, callbacks and sysfs attribute
interfaces required to create one or more VFIO mediated
devices each of which will be used to configure the AP
matrix for a guest and serve as a file descriptor
for facilitating communication between QEMU and the
VFIO AP device driver.
When the VFIO AP device driver is initialized:
* It registers with the AP bus for control of type 10 (CEX4
and newer) AP queue devices. This limitation was imposed
due to:
1. A lack of access to older systems needed to test the
older AP device models;
2. A desire to keep the code as simple as possible;
3. Some older models are no longer supported by the kernel
and others are getting close to end of service.
The probe and remove callbacks will be provided to support
the binding/unbinding of AP queue devices to/from the VFIO
AP device driver.
* It creates a /sys/devices/vfio_ap/matrix device, which holds
the APQNs of the AP devices bound to the VFIO
AP device driver and serves as the parent of the
mediated devices created for each guest.
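
The device-type restriction described above amounts to a lookup in an ID
table. A minimal userspace sketch follows; the model_* names are
hypothetical, type 10 for CEX4 comes from the text above, and the values
11 and 12 for CEX5 and CEX6 are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Type 10 is CEX4 per the description; 11 and 12 are assumed values. */
enum model_ap_type {
	MODEL_TYPE_CEX4 = 10,
	MODEL_TYPE_CEX5 = 11,
	MODEL_TYPE_CEX6 = 12,
};

/* Stand-in for the ap_queue_ids[] table registered with the AP bus. */
static const int model_queue_ids[] = {
	MODEL_TYPE_CEX4,
	MODEL_TYPE_CEX5,
	MODEL_TYPE_CEX6,
};

/* Returns true if a queue of this type could be bound to the driver. */
static bool model_queue_matches(int dev_type)
{
	size_t i;

	for (i = 0; i < sizeof(model_queue_ids) / sizeof(model_queue_ids[0]); i++)
		if (model_queue_ids[i] == dev_type)
			return true;

	return false;
}
```

Queues of any type not listed in the table are never offered to the probe
callback and so stay with the default AP device driver.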
Signed-off-by: Tony Krowiak <[email protected]>
---
MAINTAINERS | 10 +++
arch/s390/Kconfig | 11 +++
drivers/s390/crypto/Makefile | 4 +
drivers/s390/crypto/vfio_ap_drv.c | 134 +++++++++++++++++++++++++++++++++
drivers/s390/crypto/vfio_ap_private.h | 23 ++++++
include/uapi/linux/vfio.h | 2 +
6 files changed, 184 insertions(+), 0 deletions(-)
create mode 100644 drivers/s390/crypto/vfio_ap_drv.c
create mode 100644 drivers/s390/crypto/vfio_ap_private.h
diff --git a/MAINTAINERS b/MAINTAINERS
index 224e97b..2792c81 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12237,6 +12237,16 @@ W: http://www.ibm.com/developerworks/linux/linux390/
S: Supported
F: drivers/s390/crypto/
+S390 VFIO AP DRIVER
+M: Tony Krowiak <[email protected]>
+M: Christian Borntraeger <[email protected]>
+M: Martin Schwidefsky <[email protected]>
+L: [email protected]
+W: http://www.ibm.com/developerworks/linux/linux390/
+S: Supported
+F: drivers/s390/crypto/vfio_ap_drv.c
+F: drivers/s390/crypto/vfio_ap_private.h
+
S390 ZFCP DRIVER
M: Steffen Maier <[email protected]>
M: Benjamin Block <[email protected]>
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 199ac3e..8d833be 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -786,6 +786,17 @@ config VFIO_CCW
To compile this driver as a module, choose M here: the
module will be called vfio_ccw.
+config VFIO_AP
+ def_tristate n
+ prompt "VFIO support for AP devices"
+ depends on ZCRYPT && VFIO_MDEV_DEVICE && KVM
+ help
+ This driver grants access to Adjunct Processor (AP) devices
+ via the VFIO mediated device interface.
+
+ To compile this driver as a module, choose M here: the module
+ will be called vfio_ap.
+
endmenu
menu "Dump support"
diff --git a/drivers/s390/crypto/Makefile b/drivers/s390/crypto/Makefile
index b59af54..48e466e 100644
--- a/drivers/s390/crypto/Makefile
+++ b/drivers/s390/crypto/Makefile
@@ -15,3 +15,7 @@ obj-$(CONFIG_ZCRYPT) += zcrypt_pcixcc.o zcrypt_cex2a.o zcrypt_cex4.o
# pkey kernel module
pkey-objs := pkey_api.o
obj-$(CONFIG_PKEY) += pkey.o
+
+# adjunct processor matrix
+vfio_ap-objs := vfio_ap_drv.o
+obj-$(CONFIG_VFIO_AP) += vfio_ap.o
diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
new file mode 100644
index 0000000..014d70f
--- /dev/null
+++ b/drivers/s390/crypto/vfio_ap_drv.c
@@ -0,0 +1,134 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * VFIO based AP device driver
+ *
+ * Copyright IBM Corp. 2018
+ *
+ * Author(s): Tony Krowiak <[email protected]>
+ */
+
+#include <linux/module.h>
+#include <linux/mod_devicetable.h>
+#include <linux/slab.h>
+
+#include "vfio_ap_private.h"
+
+#define VFIO_AP_ROOT_NAME "vfio_ap"
+#define VFIO_AP_DEV_TYPE_NAME "ap_matrix"
+#define VFIO_AP_DEV_NAME "matrix"
+
+MODULE_AUTHOR("IBM Corporation");
+MODULE_DESCRIPTION("VFIO AP device driver, Copyright IBM Corp. 2017");
+MODULE_LICENSE("GPL v2");
+
+static struct device *vfio_ap_root_device;
+
+static struct ap_driver vfio_ap_drv;
+
+static struct ap_matrix *ap_matrix;
+
+static struct device_type vfio_ap_dev_type = {
+ .name = VFIO_AP_DEV_TYPE_NAME,
+};
+
+/* Only type 10 adapters (CEX4 and later) are supported
+ * by the AP matrix device driver
+ */
+static struct ap_device_id ap_queue_ids[] = {
+ { .dev_type = AP_DEVICE_TYPE_CEX4,
+ .match_flags = AP_DEVICE_ID_MATCH_QUEUE_TYPE },
+ { .dev_type = AP_DEVICE_TYPE_CEX5,
+ .match_flags = AP_DEVICE_ID_MATCH_QUEUE_TYPE },
+ { .dev_type = AP_DEVICE_TYPE_CEX6,
+ .match_flags = AP_DEVICE_ID_MATCH_QUEUE_TYPE },
+ { /* end of sibling */ },
+};
+
+MODULE_DEVICE_TABLE(vfio_ap, ap_queue_ids);
+
+static int vfio_ap_queue_dev_probe(struct ap_device *apdev)
+{
+ return 0;
+}
+
+static void vfio_ap_matrix_dev_release(struct device *dev)
+{
+ struct ap_matrix *ap_matrix = dev_get_drvdata(dev);
+
+ kfree(ap_matrix);
+}
+
+static int vfio_ap_matrix_dev_create(void)
+{
+ int ret;
+
+ vfio_ap_root_device = root_device_register(VFIO_AP_ROOT_NAME);
+
+ if (IS_ERR(vfio_ap_root_device)) {
+ ret = PTR_ERR(vfio_ap_root_device);
+ goto done;
+ }
+
+ ap_matrix = kzalloc(sizeof(*ap_matrix), GFP_KERNEL);
+ if (!ap_matrix) {
+ ret = -ENOMEM;
+ goto matrix_alloc_err;
+ }
+
+ ap_matrix->device.type = &vfio_ap_dev_type;
+ dev_set_name(&ap_matrix->device, "%s", VFIO_AP_DEV_NAME);
+ ap_matrix->device.parent = vfio_ap_root_device;
+ ap_matrix->device.release = vfio_ap_matrix_dev_release;
+ ap_matrix->device.driver = &vfio_ap_drv.driver;
+
+ ret = device_register(&ap_matrix->device);
+ if (ret)
+ goto matrix_reg_err;
+
+ goto done;
+
+matrix_reg_err:
+ put_device(&ap_matrix->device);
+
+matrix_alloc_err:
+ root_device_unregister(vfio_ap_root_device);
+
+done:
+ return ret;
+}
+
+static void vfio_ap_matrix_dev_destroy(struct ap_matrix *ap_matrix)
+{
+ device_unregister(&ap_matrix->device);
+ root_device_unregister(vfio_ap_root_device);
+}
+
+int __init vfio_ap_init(void)
+{
+ int ret;
+
+ ret = vfio_ap_matrix_dev_create();
+ if (ret)
+ return ret;
+
+ memset(&vfio_ap_drv, 0, sizeof(vfio_ap_drv));
+ vfio_ap_drv.probe = vfio_ap_queue_dev_probe;
+ vfio_ap_drv.ids = ap_queue_ids;
+
+ ret = ap_driver_register(&vfio_ap_drv, THIS_MODULE, VFIO_AP_DRV_NAME);
+ if (ret) {
+ vfio_ap_matrix_dev_destroy(ap_matrix);
+ return ret;
+ }
+
+ return 0;
+}
+
+void __exit vfio_ap_exit(void)
+{
+ ap_driver_unregister(&vfio_ap_drv);
+ vfio_ap_matrix_dev_destroy(ap_matrix);
+}
+
+module_init(vfio_ap_init);
+module_exit(vfio_ap_exit);
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
new file mode 100644
index 0000000..cf23675
--- /dev/null
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -0,0 +1,23 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Private data and functions for adjunct processor VFIO matrix driver.
+ *
+ * Copyright IBM Corp. 2017
+ * Author(s): Tony Krowiak <[email protected]>
+ */
+
+#ifndef _VFIO_AP_PRIVATE_H_
+#define _VFIO_AP_PRIVATE_H_
+
+#include <linux/types.h>
+
+#include "ap_bus.h"
+
+#define VFIO_AP_MODULE_NAME "vfio_ap"
+#define VFIO_AP_DRV_NAME "vfio_ap"
+
+struct ap_matrix {
+ struct device device;
+};
+
+#endif /* _VFIO_AP_PRIVATE_H_ */
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 1aa7b82..f378b98 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -200,6 +200,7 @@ struct vfio_device_info {
#define VFIO_DEVICE_FLAGS_PLATFORM (1 << 2) /* vfio-platform device */
#define VFIO_DEVICE_FLAGS_AMBA (1 << 3) /* vfio-amba device */
#define VFIO_DEVICE_FLAGS_CCW (1 << 4) /* vfio-ccw device */
+#define VFIO_DEVICE_FLAGS_AP (1 << 5) /* vfio-ap device */
__u32 num_regions; /* Max region index + 1 */
__u32 num_irqs; /* Max IRQ index + 1 */
};
@@ -215,6 +216,7 @@ struct vfio_device_info {
#define VFIO_DEVICE_API_PLATFORM_STRING "vfio-platform"
#define VFIO_DEVICE_API_AMBA_STRING "vfio-amba"
#define VFIO_DEVICE_API_CCW_STRING "vfio-ccw"
+#define VFIO_DEVICE_API_AP_STRING "vfio-ap"
/**
* VFIO_DEVICE_GET_REGION_INFO - _IOWR(VFIO_TYPE, VFIO_BASE + 8,
--
1.7.1
On 05/07/2018 05:11 PM, Tony Krowiak wrote:
> Provides interfaces to manage the AP adapters, usage domains
> and control domains assigned to a KVM guest.
>
> The guest's SIE state description has a satellite structure called the
> Crypto Control Block (CRYCB) containing three bitmask fields
> identifying the adapters, queues (domains) and control domains
> assigned to the KVM guest:
[..]
> index 00bcfb0..98b53c7 100644
> --- a/arch/s390/kvm/kvm-ap.c
> +++ b/arch/s390/kvm/kvm-ap.c
> @@ -7,6 +7,7 @@
[..]
> +
> +/**
> + * kvm_ap_validate_queue_sharing
> + *
> + * Verifies that the APQNs derived from the cross product of the AP adapter IDs
> + * and AP queue indexes comprising the AP matrix are not configured for
> + * another guest. AP queue sharing is not allowed.
> + *
> + * @kvm: the KVM guest
> + * @matrix: the AP matrix
> + *
> + * Returns 0 if the APQNs are valid, otherwise; returns -EBUSY.
> + */
> +static int kvm_ap_validate_queue_sharing(struct kvm *kvm,
> + struct kvm_ap_matrix *matrix)
> +{
> + struct kvm *vm;
> + unsigned long *apm, *aqm;
> + unsigned long apid, apqi;
> +
> +
> + /* No other VM may share an AP Queue with the input VM */
> + list_for_each_entry(vm, &vm_list, vm_list) {
> + if (kvm == vm)
> + continue;
> +
> + apm = kvm_ap_get_crycb_apm(vm);
> + if (!bitmap_and(apm, apm, matrix->apm, matrix->apm_max + 1))
> + continue;
> +
> + aqm = kvm_ap_get_crycb_aqm(vm);
> + if (!bitmap_and(aqm, aqm, matrix->aqm, matrix->aqm_max + 1))
> + continue;
> +
> + for_each_set_bit_inv(apid, apm, matrix->apm_max + 1)
> + for_each_set_bit_inv(apqi, aqm, matrix->aqm_max + 1)
> + kvm_ap_log_sharing_err(vm, apid, apqi);
> +
> + return -EBUSY;
> + }
> +
> + return 0;
> +}
> +
> +int kvm_ap_configure_matrix(struct kvm *kvm, struct kvm_ap_matrix *matrix)
> +{
> + int ret = 0;
> +
> + mutex_lock(&kvm->lock);
You seem to take only kvm->lock; vm_list, however (used in
kvm_ap_validate_queue_sharing()), seems to be protected by
kvm_lock.
Can you tell me why this is supposed to be safe?
What is supposed to prevent an execution like
vm1: call kvm_ap_configure_matrix(m1)
vm2: call kvm_ap_configure_matrix(m2)
vm1: call kvm_ap_validate_queue_sharing(m1)
vm2: call kvm_ap_validate_queue_sharing(m2)
vm1: call kvm_ap_set_crycb_masks(m1)
vm2: call kvm_ap_set_crycb_masks(m2)
where, let's say, m1 and m2 are equal in the sense that the
mask values are the same?
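
The interleaving above can be replayed deterministically in userspace.
This is only a sketch: struct vm_model and the helpers are hypothetical
stand-ins, the masks are shrunk to 64 bits, and no real locking is
involved. The serialized variant simply runs check and set as one step,
which in the kernel would need a lock that covers all VMs.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define NR_VMS 2

/* Hypothetical stand-in for a guest: one adapter and one domain mask. */
struct vm_model {
	uint64_t apm;
	uint64_t aqm;
};

static struct vm_model vms[NR_VMS];

static void reset_vms(void)
{
	for (int i = 0; i < NR_VMS; i++)
		vms[i].apm = vms[i].aqm = 0;
}

/* Step 1: fail if another VM already owns an overlapping queue. */
static int validate_sharing(int self, uint64_t apm, uint64_t aqm)
{
	for (int i = 0; i < NR_VMS; i++) {
		if (i == self)
			continue;
		if ((vms[i].apm & apm) && (vms[i].aqm & aqm))
			return -EBUSY;
	}
	return 0;
}

/* Step 2: publish the masks for this VM. */
static void set_masks(int self, uint64_t apm, uint64_t aqm)
{
	vms[self].apm = apm;
	vms[self].aqm = aqm;
}

/* Both VMs validate before either one sets; returns how many got configured. */
static int replay_interleaved(uint64_t apm, uint64_t aqm)
{
	int r0, r1, configured = 0;

	reset_vms();
	r0 = validate_sharing(0, apm, aqm);
	r1 = validate_sharing(1, apm, aqm);
	if (!r0) {
		set_masks(0, apm, aqm);
		configured++;
	}
	if (!r1) {
		set_masks(1, apm, aqm);
		configured++;
	}
	return configured;
}

/* Check and set run back to back for each VM in turn. */
static int replay_serialized(uint64_t apm, uint64_t aqm)
{
	int configured = 0;

	reset_vms();
	for (int self = 0; self < NR_VMS; self++) {
		if (validate_sharing(self, apm, aqm))
			continue;
		set_masks(self, apm, aqm);
		configured++;
	}
	return configured;
}
```

With identical masks the interleaved replay configures both VMs, while
the serialized replay rejects the second one with -EBUSY.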
Regards,
Halil
> +
> + ret = kvm_ap_validate_queue_sharing(kvm, matrix);
> + if (ret)
> + goto done;
> +
> + kvm_ap_set_crycb_masks(kvm, matrix);
> +
> +done:
> + mutex_unlock(&kvm->lock);
> +
> + return ret;
> +}
> +EXPORT_SYMBOL(kvm_ap_configure_matrix);
> +
On 05/07/2018 05:11 PM, Tony Krowiak wrote:
> Registers the matrix device created by the VFIO AP device
> driver with the VFIO mediated device framework.
> Registering the matrix device will create the sysfs
> structures needed to create mediated matrix devices
> each of which will be used to configure the AP matrix
> for a guest and connect it to the VFIO AP device driver.
>
[..]
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> new file mode 100644
> index 0000000..d7d36fb
> --- /dev/null
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -0,0 +1,106 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +/*
> + * Adjunct processor matrix VFIO device driver callbacks.
> + *
> + * Copyright IBM Corp. 2017
> + * Author(s): Tony Krowiak <[email protected]>
> + *
> + */
> +#include <linux/string.h>
> +#include <linux/vfio.h>
> +#include <linux/device.h>
> +#include <linux/list.h>
> +#include <linux/ctype.h>
> +
> +#include "vfio_ap_private.h"
> +
> +#define VFOP_AP_MDEV_TYPE_HWVIRT "passthrough"
> +#define VFIO_AP_MDEV_NAME_HWVIRT "VFIO AP Passthrough Device"
> +
> +static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
> +{
> + struct ap_matrix *ap_matrix = to_ap_matrix(mdev_parent_dev(mdev));
> +
> + ap_matrix->available_instances--;
> +
> + return 0;
> +}
> +
> +static int vfio_ap_mdev_remove(struct mdev_device *mdev)
> +{
> + struct ap_matrix *ap_matrix = to_ap_matrix(mdev_parent_dev(mdev));
> +
> + ap_matrix->available_instances++;
> +
> + return 0;
> +}
> +
The above functions seem to be called with the lock of this auto-generated
mdev parent device held. That's why we don't have to care about synchronization
ourselves, right?
A small comment in the code could be helpful for mdev non-experts. Hell, I would
even consider documenting it for all mdev -- took me some time to figure out.
[..]
> +int vfio_ap_mdev_register(struct ap_matrix *ap_matrix)
> +{
> + int ret;
> +
> + ret = mdev_register_device(&ap_matrix->device, &vfio_ap_matrix_ops);
> + if (ret)
> + return ret;
> +
> + ap_matrix->available_instances = AP_MATRIX_MAX_AVAILABLE_INSTANCES;
> +
> + return 0;
> +}
> +
> +void vfio_ap_mdev_unregister(struct ap_matrix *ap_matrix)
> +{
> + ap_matrix->available_instances--;
What is this for? I don't understand.
Regards,
Halil
> + mdev_unregister_device(&ap_matrix->device);
> +}
On 05/11/2018 01:18 PM, Halil Pasic wrote:
>
>
> On 05/07/2018 05:11 PM, Tony Krowiak wrote:
>> Registers the matrix device created by the VFIO AP device
>> driver with the VFIO mediated device framework.
>> Registering the matrix device will create the sysfs
>> structures needed to create mediated matrix devices
>> each of which will be used to configure the AP matrix
>> for a guest and connect it to the VFIO AP device driver.
>>
> [..]
>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c
>> b/drivers/s390/crypto/vfio_ap_ops.c
>> new file mode 100644
>> index 0000000..d7d36fb
>> --- /dev/null
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -0,0 +1,106 @@
>> +// SPDX-License-Identifier: GPL-2.0+
>> +/*
>> + * Adjunct processor matrix VFIO device driver callbacks.
>> + *
>> + * Copyright IBM Corp. 2017
>> + * Author(s): Tony Krowiak <[email protected]>
>> + *
>> + */
>> +#include <linux/string.h>
>> +#include <linux/vfio.h>
>> +#include <linux/device.h>
>> +#include <linux/list.h>
>> +#include <linux/ctype.h>
>> +
>> +#include "vfio_ap_private.h"
>> +
>> +#define VFOP_AP_MDEV_TYPE_HWVIRT "passthrough"
>> +#define VFIO_AP_MDEV_NAME_HWVIRT "VFIO AP Passthrough Device"
>> +
>> +static int vfio_ap_mdev_create(struct kobject *kobj, struct
>> mdev_device *mdev)
>> +{
>> + struct ap_matrix *ap_matrix = to_ap_matrix(mdev_parent_dev(mdev));
>> +
>> + ap_matrix->available_instances--;
>> +
>> + return 0;
>> +}
>> +
>> +static int vfio_ap_mdev_remove(struct mdev_device *mdev)
>> +{
>> + struct ap_matrix *ap_matrix = to_ap_matrix(mdev_parent_dev(mdev));
>> +
>> + ap_matrix->available_instances++;
>> +
>> + return 0;
>> +}
>> +
>
> The above functions seem to be called with the lock of this
> auto-generated
> mdev parent device held. That's why we don't have to care about
> synchronization
> ourselves, right?
I would assume as much. The comments for the 'struct mdev_parent_ops' in
include/linux/mdev.h do not mention anything about synchronization, nor did I
see any locking or synchronization in the vfio_ccw implementation after which
I modeled my code, so frankly it is something I did not consider.
>
>
> A small comment in the code could be helpful for mdev non-experts.
> Hell, I would
> even consider documenting it for all mdev -- took me some time to
> figure out.
You may want to bring this up with the VFIO mdev maintainers, but I'd be happy to
include a comment in the functions in question if you think it important.
>
>
> [..]
>
>
>> +int vfio_ap_mdev_register(struct ap_matrix *ap_matrix)
>> +{
>> + int ret;
>> +
>> + ret = mdev_register_device(&ap_matrix->device,
>> &vfio_ap_matrix_ops);
>> + if (ret)
>> + return ret;
>> +
>> + ap_matrix->available_instances = AP_MATRIX_MAX_AVAILABLE_INSTANCES;
>> +
>> + return 0;
>> +}
>> +
>> +void vfio_ap_mdev_unregister(struct ap_matrix *ap_matrix)
>> +{
>> + ap_matrix->available_instances--;
>
> What is this for? I don't understand.
To control the number of mediated devices one can create for the matrix device.
Once the max is reached, the mdev framework will not allow creation of another
mediated device until one is removed. This counter keeps track of the number
of instances that can be created. This is documented with the mediated
framework. You may want to take a look at:
Documentation/vfio-mediated-device.txt
Documentation/vfio.txt
Documentation/virtual/kvm/devices/vfio.txt
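
A small userspace model of the counter described above. The names and
the limit of 2 are hypothetical (the thread does not give the value of
AP_MATRIX_MAX_AVAILABLE_INSTANCES), and the explicit guard in
model_create() is an assumption added for illustration: the posted
vfio_ap_mdev_create() decrements unconditionally.

```c
#include <assert.h>
#include <errno.h>

#define MODEL_MAX_INSTANCES 2	/* hypothetical limit */

struct matrix_model {
	int available_instances;
};

/* Corresponds to the assignment done at registration time. */
static void model_register(struct matrix_model *m)
{
	m->available_instances = MODEL_MAX_INSTANCES;
}

/* Creating a mediated device consumes one instance. */
static int model_create(struct matrix_model *m)
{
	if (m->available_instances == 0)
		return -ENOSPC;	/* assumed guard, not in the posted patch */

	m->available_instances--;
	return 0;
}

/* Removing a mediated device gives the instance back. */
static int model_remove(struct matrix_model *m)
{
	m->available_instances++;
	return 0;
}
```

In this model only create and remove touch the counter after
registration, so the count of available instances plus existing devices
always equals the limit.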
>
>
> Regards,
> Halil
>
>> + mdev_unregister_device(&ap_matrix->device);
>> +}
On 14/05/2018 21:42, Tony Krowiak wrote:
> On 05/11/2018 01:18 PM, Halil Pasic wrote:
>>
>>
>> On 05/07/2018 05:11 PM, Tony Krowiak wrote:
>>> Registers the matrix device created by the VFIO AP device
>>> driver with the VFIO mediated device framework.
>>> Registering the matrix device will create the sysfs
>>> structures needed to create mediated matrix devices
>>> each of which will be used to configure the AP matrix
>>> for a guest and connect it to the VFIO AP device driver.
>>>
>> [..]
>>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c
>>> b/drivers/s390/crypto/vfio_ap_ops.c
>>> new file mode 100644
>>> index 0000000..d7d36fb
>>> --- /dev/null
>>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>>> @@ -0,0 +1,106 @@
>>> +// SPDX-License-Identifier: GPL-2.0+
>>> +/*
>>> + * Adjunct processor matrix VFIO device driver callbacks.
>>> + *
>>> + * Copyright IBM Corp. 2017
>>> + * Author(s): Tony Krowiak <[email protected]>
>>> + *
>>> + */
>>> +#include <linux/string.h>
>>> +#include <linux/vfio.h>
>>> +#include <linux/device.h>
>>> +#include <linux/list.h>
>>> +#include <linux/ctype.h>
>>> +
>>> +#include "vfio_ap_private.h"
>>> +
>>> +#define VFOP_AP_MDEV_TYPE_HWVIRT "passthrough"
>>> +#define VFIO_AP_MDEV_NAME_HWVIRT "VFIO AP Passthrough Device"
>>> +
>>> +static int vfio_ap_mdev_create(struct kobject *kobj, struct
>>> mdev_device *mdev)
>>> +{
>>> + struct ap_matrix *ap_matrix = to_ap_matrix(mdev_parent_dev(mdev));
>>> +
>>> + ap_matrix->available_instances--;
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +static int vfio_ap_mdev_remove(struct mdev_device *mdev)
>>> +{
>>> + struct ap_matrix *ap_matrix = to_ap_matrix(mdev_parent_dev(mdev));
>>> +
>>> + ap_matrix->available_instances++;
>>> +
>>> + return 0;
>>> +}
>>> +
>>
>> The above functions seem to be called with the lock of this
>> auto-generated
>> mdev parent device held. That's why we don't have to care about
>> synchronization
>> ourselves, right?
>
> I would assume as much. The comments for the 'struct mdev_parent_ops' in
> include/linux/mdev.h do not mention anything about synchronization,
> nor did I
> see any locking or synchronization in the vfio_ccw implementation
> after which
> I modeled my code, so frankly it is something I did not consider.
>
>>
>>
>> A small comment in the code could be helpful for mdev non-experts.
>> Hell, I would
>> even consider documenting it for all mdev -- took me some time to
>> figure out.
>
> You may want to bring this up with the VFIO mdev maintainers, but I'd
> be happy to
> include a comment in the functions in question if you think it important.
>
>>
>>
>> [..]
>>
>>
>>> +int vfio_ap_mdev_register(struct ap_matrix *ap_matrix)
>>> +{
>>> + int ret;
>>> +
>>> + ret = mdev_register_device(&ap_matrix->device,
>>> &vfio_ap_matrix_ops);
>>> + if (ret)
>>> + return ret;
>>> +
>>> + ap_matrix->available_instances =
>>> AP_MATRIX_MAX_AVAILABLE_INSTANCES;
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +void vfio_ap_mdev_unregister(struct ap_matrix *ap_matrix)
>>> +{
>>> + ap_matrix->available_instances--;
>>
>> What is this for? I don't understand.
>
> To control the number of mediated devices one can create for the
> matrix device.
> Once the max is reached, the mdev framework will not allow creation of
> another
> mediated device until one is removed. This counter keeps track of the
> number
> of instances that can be created. This is documented with the mediated
> framework. You may want to take a look at:
>
> Documentation/vfio-mediated-device.txt
> Documentation/vfio.txt
> Documentation/virtual/kvm/devices/vfio.txt
This is what you do in create/remove.
But here in unregister, I agree with Halil: it does not seem to be useful.
>
>>
>>
>> Regards,
>> Halil
>>
>>> + mdev_unregister_device(&ap_matrix->device);
>>> +}
>
>
--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany
On 07/05/2018 17:11, Tony Krowiak wrote:
> Provides interfaces to manage the AP adapters, usage domains
> and control domains assigned to a KVM guest.
>
> The guest's SIE state description has a satellite structure called the
> Crypto Control Block (CRYCB) containing three bitmask fields
> identifying the adapters, queues (domains) and control domains
> assigned to the KVM guest:
>
> * The AP Adapter Mask (APM) field identifies the AP adapters assigned to
> the KVM guest
>
> * The AP Queue Mask (AQM) field identifies the AP queues assigned to
> the KVM guest. Each AP queue is connected to a usage domain within
> an AP adapter.
>
> * The AP Domain Mask (ADM) field identifies the control domains
> assigned to the KVM guest.
>
> Each adapter, queue (usage domain) and control domain are identified by
> a number from 0 to 255. The bits in each mask, from most significant to
> least significant bit, correspond to the numbers 0-255. When a bit is
> set, the corresponding adapter, queue (usage domain) or control domain
> is assigned to the KVM guest.
>
> This patch will set the bits in the APM, AQM and ADM fields of the
> CRYCB referenced by the KVM guest's SIE state description. The process
> used is:
>
> 1. Verify that the bits to be set do not exceed the maximum bit
> number for the given mask.
>
> 2. Verify that the APQNs that can be derived from the cross product
> of the bits set in the APM and AQM fields of the KVM guest's CRYCB
> are not assigned to any other KVM guest running on the same linux
> host.
>
> 3. Set the APM, AQM and ADM in the CRYCB according to the matrix
> configured for the mediated matrix device via its sysfs
> assign_adapter, assign_domain and assign_control domain attribute
> files respectively.
>
> Signed-off-by: Tony Krowiak <[email protected]>
> ---
> arch/s390/include/asm/kvm-ap.h | 52 ++++++++++++
> arch/s390/include/asm/kvm_host.h | 1 +
> arch/s390/kvm/kvm-ap.c | 161 ++++++++++++++++++++++++++++++++++++++
> 3 files changed, 214 insertions(+), 0 deletions(-)
>
> diff --git a/arch/s390/include/asm/kvm-ap.h b/arch/s390/include/asm/kvm-ap.h
> index 6af1ff8..21fe9f2 100644
> --- a/arch/s390/include/asm/kvm-ap.h
> +++ b/arch/s390/include/asm/kvm-ap.h
> @@ -12,8 +12,33 @@
>
> #include <linux/types.h>
> #include <linux/kvm_host.h>
> +#include <linux/bitops.h>
> #include <asm/ap.h>
>
> +#define KVM_AP_MASK_BYTES(n) DIV_ROUND_UP(n, BITS_PER_BYTE)
> +
> +/**
> + * The AP matrix is comprised of three bit masks identifying the adapters,
> + * queues (domains) and control domains that belong to an AP matrix. The bits in
> + * each mask, from most significant to least significant bit, correspond to IDs
> + * 0 to 255. When a bit is set, the corresponding ID belongs to the matrix.
> + *
> + * @apm identifies the AP adapters in the matrix
> + * @apm_max: max adapter number in @apm
> + * @aqm identifies the AP queues (domains) in the matrix
> + * @aqm_max: max domain number in @aqm
> + * @adm identifies the AP control domains in the matrix
> + * @adm_max: max domain number in @adm
> + */
> +struct kvm_ap_matrix {
> + unsigned long apm_max;
> + DECLARE_BITMAP(apm, 256);
> + unsigned long aqm_max;
> + DECLARE_BITMAP(aqm, 256);
> + unsigned long adm_max;
> + DECLARE_BITMAP(adm, 256);
Just a note on a possible performance impact:
you may want to put all the bitmaps first to take advantage
of quadword handling (if the bitmap operations use it) and put the
unsigned longs at the end.
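
To make the layout difference concrete, here is a userspace sketch that
compares member offsets for both orderings. The struct names are
hypothetical and uint64_t[4] stands in for DECLARE_BITMAP(x, 256) on a
64-bit host. Whether the bitmap helpers actually benefit from 16-byte
alignment is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MASK_WORDS 4	/* 256 bits as four 64-bit words */

/* Ordering as posted: each bitmap follows its max field. */
struct matrix_interleaved {
	uint64_t apm_max;
	uint64_t apm[MASK_WORDS];	/* offset 8 */
	uint64_t aqm_max;
	uint64_t aqm[MASK_WORDS];	/* offset 48 */
	uint64_t adm_max;
	uint64_t adm[MASK_WORDS];	/* offset 88 */
};

/* Suggested ordering: bitmaps first, max fields last. */
struct matrix_bitmaps_first {
	uint64_t apm[MASK_WORDS];	/* offset 0 */
	uint64_t aqm[MASK_WORDS];	/* offset 32 */
	uint64_t adm[MASK_WORDS];	/* offset 64 */
	uint64_t apm_max;
	uint64_t aqm_max;
	uint64_t adm_max;
};

/* Non-zero if the member starts on a 16-byte (quadword) boundary. */
#define QUADWORD_ALIGNED(type, member) (offsetof(type, member) % 16 == 0)

static_assert(sizeof(struct matrix_interleaved) ==
	      sizeof(struct matrix_bitmaps_first),
	      "reordering does not change the structure size");
```

In the posted ordering apm and adm start at offsets 8 and 88, so they
are not quadword aligned relative to the start of the structure; with
the bitmaps first, all three are. The alignment of the structure itself
is a separate matter.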
> +};
> +
> /**
> * kvm_ap_apxa_installed
> *
> @@ -57,4 +82,31 @@
> */
> bool kvm_ap_instructions_available(void);
>
> +/**
> + * kvm_ap_configure_matrix
> + *
> + * Configure the AP matrix for a KVM guest.
> + *
> + * @kvm: the KVM guest
> + * @matrix: the matrix configuration information
> + *
> + * Returns 0 if:
> + * 1. The AP instructions are installed on the guest
> + * 2. The APQNs derived from the intersection of the set of adapter
> + * IDs (APM) and queue indexes (AQM) in @matrix are not configured for
> + * any other KVM guest running on the same linux host.
> + * Otherwise returns an error code.
> + */
> +int kvm_ap_configure_matrix(struct kvm *kvm, struct kvm_ap_matrix *matrix);
> +
> +/**
> + * kvm_ap_deconfigure_matrix
> + *
> + * Deconfigure the AP matrix for a KVM guest. Clears all of the bits in the
> + * APM, AQM and ADM in the guest's CRYCB.
> + *
> + * @kvm: the KVM guest
> + */
> +void kvm_ap_deconfigure_matrix(struct kvm *kvm);
> +
> #endif /* _ASM_KVM_AP */
> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
> index ef4b237..8736cde 100644
> --- a/arch/s390/include/asm/kvm_host.h
> +++ b/arch/s390/include/asm/kvm_host.h
> @@ -257,6 +257,7 @@ struct kvm_s390_sie_block {
> __u64 tecmc; /* 0x00e8 */
> __u8 reservedf0[12]; /* 0x00f0 */
> #define CRYCB_FORMAT_MASK 0x00000003
> +#define CRYCB_FORMAT0 0x00000000
> #define CRYCB_FORMAT1 0x00000001
> #define CRYCB_FORMAT2 0x00000003
> __u32 crycbd; /* 0x00fc */
> diff --git a/arch/s390/kvm/kvm-ap.c b/arch/s390/kvm/kvm-ap.c
> index 00bcfb0..98b53c7 100644
> --- a/arch/s390/kvm/kvm-ap.c
> +++ b/arch/s390/kvm/kvm-ap.c
> @@ -7,6 +7,7 @@
> * Author(s): Tony Krowiak <[email protected]>
> */
> #include <linux/kernel.h>
> +#include <linux/bitops.h>
> #include <asm/kvm-ap.h>
>
> #include "kvm-s390.h"
> @@ -81,3 +82,163 @@ int kvm_ap_apxa_installed(void)
> return 0;
> }
> EXPORT_SYMBOL(kvm_ap_apxa_installed);
> +
> +static inline void kvm_ap_clear_crycb_masks(struct kvm *kvm)
> +{
> + memset(&kvm->arch.crypto.crycb->apcb0, 0,
> + sizeof(kvm->arch.crypto.crycb->apcb0));
Here you prefer to zero both structures instead of testing which
structure needs to be erased.
> + memset(&kvm->arch.crypto.crycb->apcb1, 0,
> + sizeof(kvm->arch.crypto.crycb->apcb1));
> +}
> +
...snip...
> +/**
> + * kvm_ap_validate_queue_sharing
> + *
> + * Verifies that the APQNs derived from the cross product of the AP adapter IDs
> + * and AP queue indexes comprising the AP matrix are not configured for
> + * another guest. AP queue sharing is not allowed.
> + *
> + * @kvm: the KVM guest
> + * @matrix: the AP matrix
> + *
> + * Returns 0 if the APQNs are valid, otherwise; returns -EBUSY.
> + */
> +static int kvm_ap_validate_queue_sharing(struct kvm *kvm,
> + struct kvm_ap_matrix *matrix)
> +{
> + struct kvm *vm;
> + unsigned long *apm, *aqm;
> + unsigned long apid, apqi;
> +
> +
> + /* No other VM may share an AP Queue with the input VM */
> + list_for_each_entry(vm, &vm_list, vm_list) {
> + if (kvm == vm)
> + continue;
> +
> + apm = kvm_ap_get_crycb_apm(vm);
> + if (!bitmap_and(apm, apm, matrix->apm, matrix->apm_max + 1))
> + continue;
> +
> + aqm = kvm_ap_get_crycb_aqm(vm);
> + if (!bitmap_and(aqm, aqm, matrix->aqm, matrix->aqm_max + 1))
> + continue;
> +
> + for_each_set_bit_inv(apid, apm, matrix->apm_max + 1)
> + for_each_set_bit_inv(apqi, aqm, matrix->aqm_max + 1)
> + kvm_ap_log_sharing_err(vm, apid, apqi);
> +
> + return -EBUSY;
> + }
> +
> + return 0;
> +}
This function (kvm_ap_validate_queue_sharing) only verifies that VMs don't
share queues.
What about the queues used by a host application?
I understand that you want to implement these checks within KVM, but this is
related to which queue devices are bound to the matrix and which ones are
not.
I think that this should be related somehow to the bound queue devices and
therefore implemented inside the matrix driver.
Regards,
Pierre
--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany
On 05/15/2018 10:17 AM, Pierre Morel wrote:
> On 14/05/2018 21:42, Tony Krowiak wrote:
>> On 05/11/2018 01:18 PM, Halil Pasic wrote:
>>>
>>>
>>> On 05/07/2018 05:11 PM, Tony Krowiak wrote:
>>>> Registers the matrix device created by the VFIO AP device
>>>> driver with the VFIO mediated device framework.
>>>> Registering the matrix device will create the sysfs
>>>> structures needed to create mediated matrix devices
>>>> each of which will be used to configure the AP matrix
>>>> for a guest and connect it to the VFIO AP device driver.
>>>>
>>> [..]
>>>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c
>>>> b/drivers/s390/crypto/vfio_ap_ops.c
>>>> new file mode 100644
>>>> index 0000000..d7d36fb
>>>> --- /dev/null
>>>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>>>> @@ -0,0 +1,106 @@
>>>> +// SPDX-License-Identifier: GPL-2.0+
>>>> +/*
>>>> + * Adjunct processor matrix VFIO device driver callbacks.
>>>> + *
>>>> + * Copyright IBM Corp. 2017
>>>> + * Author(s): Tony Krowiak <[email protected]>
>>>> + *
>>>> + */
>>>> +#include <linux/string.h>
>>>> +#include <linux/vfio.h>
>>>> +#include <linux/device.h>
>>>> +#include <linux/list.h>
>>>> +#include <linux/ctype.h>
>>>> +
>>>> +#include "vfio_ap_private.h"
>>>> +
>>>> +#define VFOP_AP_MDEV_TYPE_HWVIRT "passthrough"
>>>> +#define VFIO_AP_MDEV_NAME_HWVIRT "VFIO AP Passthrough Device"
>>>> +
>>>> +static int vfio_ap_mdev_create(struct kobject *kobj, struct
>>>> mdev_device *mdev)
>>>> +{
>>>> + struct ap_matrix *ap_matrix =
>>>> to_ap_matrix(mdev_parent_dev(mdev));
>>>> +
>>>> + ap_matrix->available_instances--;
>>>> +
>>>> + return 0;
>>>> +}
>>>> +
>>>> +static int vfio_ap_mdev_remove(struct mdev_device *mdev)
>>>> +{
>>>> + struct ap_matrix *ap_matrix =
>>>> to_ap_matrix(mdev_parent_dev(mdev));
>>>> +
>>>> + ap_matrix->available_instances++;
>>>> +
>>>> + return 0;
>>>> +}
>>>> +
>>>
>>> The above functions seem to be called with the lock of this
>>> auto-generated
>>> mdev parent device held. That's why we don't have to care about
>>> synchronization
>>> ourselves, right?
>>
>> I would assume as much. The comments for the 'struct mdev_parent_ops' in
>> include/linux/mdev.h do not mention anything about synchronization,
>> nor did I
>> see any locking or synchronization in the vfio_ccw implementation
>> after which
>> I modeled my code, so frankly it is something I did not consider.
>>
>>>
>>>
>>> A small comment in the code could be helpful for mdev non-experts.
>>> Hell, I would
>>> even consider documenting it for all mdev -- took me some time to
>>> figure out.
>>
>> You may want to bring this up with the VFIO mdev maintainers, but I'd
>> be happy to
>> include a comment in the functions in question if you think it
>> important.
>>
>>>
>>>
>>> [..]
>>>
>>>
>>>> +int vfio_ap_mdev_register(struct ap_matrix *ap_matrix)
>>>> +{
>>>> + int ret;
>>>> +
>>>> + ret = mdev_register_device(&ap_matrix->device,
>>>> &vfio_ap_matrix_ops);
>>>> + if (ret)
>>>> + return ret;
>>>> +
>>>> + ap_matrix->available_instances =
>>>> AP_MATRIX_MAX_AVAILABLE_INSTANCES;
>>>> +
>>>> + return 0;
>>>> +}
>>>> +
>>>> +void vfio_ap_mdev_unregister(struct ap_matrix *ap_matrix)
>>>> +{
>>>> + ap_matrix->available_instances--;
>>>
>>> What is this for? I don't understand.
>>
>> To control the number of mediated devices one can create for the
>> matrix device.
>> Once the max is reached, the mdev framework will not allow creation
>> of another
>> mediated device until one is removed. This counter keeps track of the
>> number
>> of instances that can be created. This is documented with the mediated
>> framework. You may want to take a look at:
>>
>> Documentation/vfio-mediated-device.txt
>> Documentation/vfio.txt
>> Documentation/virtual/kvm/devices/vfio.txt
>
> This is what you do in create/remove.
> But here in unregister I agree with Halil, it does not seem to be useful.
If that is in fact what Halil was asking, then I misinterpreted his question;
I thought he was asking what the available_instances was used for. You are
correct, this does not belong here although it makes little difference given
this is called only when the driver, which creates the matrix device, is
unloaded. It is necessary in the register function to initialize its value,
but I'll remove it from here.
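
The counter handling discussed above can be sketched in userspace as follows.
This is only a model under stated assumptions, not the driver code: the names
matrix_model, model_* and MODEL_MAX_INSTANCES are hypothetical, the limit of 2
is arbitrary, and the bounds check in model_create() is an assumption of the
sketch (the create callback quoted above decrements unconditionally).

```c
#include <assert.h>

/* Hypothetical stand-in for AP_MATRIX_MAX_AVAILABLE_INSTANCES. */
#define MODEL_MAX_INSTANCES 2

/* Hypothetical userspace model of the matrix device's counter. */
struct matrix_model {
	int available_instances;
};

/* Register: the only place where the counter is initialized. */
static void model_register(struct matrix_model *m)
{
	m->available_instances = MODEL_MAX_INSTANCES;
}

/* Create: consume one instance; returns -1 when none are left. */
static int model_create(struct matrix_model *m)
{
	if (m->available_instances <= 0)
		return -1;
	m->available_instances--;
	return 0;
}

/* Remove: give the instance back. */
static void model_remove(struct matrix_model *m)
{
	assert(m->available_instances < MODEL_MAX_INSTANCES);
	m->available_instances++;
}

/* Unregister: the counter is deliberately left untouched. */
static void model_unregister(struct matrix_model *m)
{
	(void)m;
}
```

In this model the counter changes only in create and remove, which is the
behavior that remains once the decrement is dropped from unregister.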
>
>
>>
>>>
>>>
>>> Regards,
>>> Halil
>>>
>>>> + mdev_unregister_device(&ap_matrix->device);
>>>> +}
>>
>>
>
On 05/15/2018 05:16 PM, Tony Krowiak wrote:
> On 05/15/2018 10:17 AM, Pierre Morel wrote:
>> On 14/05/2018 21:42, Tony Krowiak wrote:
>>> On 05/11/2018 01:18 PM, Halil Pasic wrote:
>>>>
>>>>
>>>> On 05/07/2018 05:11 PM, Tony Krowiak wrote:
>>>>> Registers the matrix device created by the VFIO AP device
>>>>> driver with the VFIO mediated device framework.
>>>>> Registering the matrix device will create the sysfs
>>>>> structures needed to create mediated matrix devices
>>>>> each of which will be used to configure the AP matrix
>>>>> for a guest and connect it to the VFIO AP device driver.
>>>>>
>>>> [..]
>>>>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>>>>> new file mode 100644
>>>>> index 0000000..d7d36fb
>>>>> --- /dev/null
>>>>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>>>>> @@ -0,0 +1,106 @@
>>>>> +// SPDX-License-Identifier: GPL-2.0+
>>>>> +/*
>>>>> + * Adjunct processor matrix VFIO device driver callbacks.
>>>>> + *
>>>>> + * Copyright IBM Corp. 2017
>>>>> + * Author(s): Tony Krowiak <[email protected]>
>>>>> + *
>>>>> + */
>>>>> +#include <linux/string.h>
>>>>> +#include <linux/vfio.h>
>>>>> +#include <linux/device.h>
>>>>> +#include <linux/list.h>
>>>>> +#include <linux/ctype.h>
>>>>> +
>>>>> +#include "vfio_ap_private.h"
>>>>> +
>>>>> +#define VFOP_AP_MDEV_TYPE_HWVIRT "passthrough"
>>>>> +#define VFIO_AP_MDEV_NAME_HWVIRT "VFIO AP Passthrough Device"
>>>>> +
>>>>> +static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
>>>>> +{
>>>>> + struct ap_matrix *ap_matrix = to_ap_matrix(mdev_parent_dev(mdev));
>>>>> +
>>>>> + ap_matrix->available_instances--;
>>>>> +
>>>>> + return 0;
>>>>> +}
>>>>> +
>>>>> +static int vfio_ap_mdev_remove(struct mdev_device *mdev)
>>>>> +{
>>>>> + struct ap_matrix *ap_matrix = to_ap_matrix(mdev_parent_dev(mdev));
>>>>> +
>>>>> + ap_matrix->available_instances++;
>>>>> +
>>>>> + return 0;
>>>>> +}
>>>>> +
>>>>
>>>> The above functions seem to be called with the lock of this auto-generated
>>>> mdev parent device held. That's why we don't have to care about synchronization
>>>> ourselves, right?
>>>
>>> I would assume as much. The comments for the 'struct mdev_parent_ops' in
>>> include/linux/mdev.h do not mention anything about synchronization, nor did I
>>> see any locking or synchronization in the vfio_ccw implementation after which
>>> I modeled my code, so frankly it is something I did not consider.
>>>
>>>>
>>>>
>>>> A small comment in the code could be helpful for mdev non-experts. Hell, I would
>>>> even consider documenting it for all mdev -- took me some time to figure out.
>>>
>>> You may want to bring this up with the VFIO mdev maintainers, but I'd be happy to
>>> include a comment in the functions in question if you think it important.
>>>
>>>>
>>>>
>>>> [..]
>>>>
>>>>
>>>>> +int vfio_ap_mdev_register(struct ap_matrix *ap_matrix)
>>>>> +{
>>>>> + int ret;
>>>>> +
>>>>> + ret = mdev_register_device(&ap_matrix->device, &vfio_ap_matrix_ops);
>>>>> + if (ret)
>>>>> + return ret;
>>>>> +
>>>>> + ap_matrix->available_instances = AP_MATRIX_MAX_AVAILABLE_INSTANCES;
>>>>> +
>>>>> + return 0;
>>>>> +}
>>>>> +
>>>>> +void vfio_ap_mdev_unregister(struct ap_matrix *ap_matrix)
>>>>> +{
>>>>> + ap_matrix->available_instances--;
>>>>
>>>> What is this for? I don't understand.
>>>
>>> To control the number of mediated devices one can create for the matrix device.
>>> Once the max is reached, the mdev framework will not allow creation of another
>>> mediated device until one is removed. This counter keeps track of the number
>>> of instances that can be created. This is documented with the mediated
>>> framework. You may want to take a look at:
>>>
>>> Documentation/vfio-mediated-device.txt
>>> Documentation/vfio.txt
>>> Documentation/virtual/kvm/devices/vfio.txt
>>
>> This is what you do in create/remove.
>> But here in unregister I agree with Halil, it does not seem to be useful.
>
> If that is in fact what Halil was asking, then I misinterpreted his question; I
> thought he was asking what the available_instances was used for. You are
> correct, this does not belong here although it makes little difference given
> this is called only when the driver, which creates the matrix device, is unloaded.
> It is necessary in the register function to initialize its value, but I'll
> remove it from here.
>
I questioned the dubious usage of ap_matrix->available_instances rather than
asking what the variable is for.
Had I deemed this damaging, I would have asked whether it is damaging in the
way I think it is. For an example, take my comment on 'KVM: s390: interfaces
to manage guest's AP matrix'.
Regards,
Halil
>>
>>
>>>
>>>>
>>>>
>>>> Regards,
>>>> Halil
>>>>
>>>>> + mdev_unregister_device(&ap_matrix->device);
>>>>> +}
>>>
>>>
>>
>
On 05/15/2018 10:55 AM, Pierre Morel wrote:
> On 07/05/2018 17:11, Tony Krowiak wrote:
>> Provides interfaces to manage the AP adapters, usage domains
>> and control domains assigned to a KVM guest.
>>
>> The guest's SIE state description has a satellite structure called the
>> Crypto Control Block (CRYCB) containing three bitmask fields
>> identifying the adapters, queues (domains) and control domains
>> assigned to the KVM guest:
>>
>> * The AP Adapter Mask (APM) field identifies the AP adapters assigned to
>> the KVM guest
>>
>> * The AP Queue Mask (AQM) field identifies the AP queues assigned to
>> the KVM guest. Each AP queue is connected to a usage domain within
>> an AP adapter.
>>
>> * The AP Domain Mask (ADM) field identifies the control domains
>> assigned to the KVM guest.
>>
>> Each adapter, queue (usage domain) and control domain are identified by
>> a number from 0 to 255. The bits in each mask, from most significant to
>> least significant bit, correspond to the numbers 0-255. When a bit is
>> set, the corresponding adapter, queue (usage domain) or control domain
>> is assigned to the KVM guest.
>>
>> This patch will set the bits in the APM, AQM and ADM fields of the
>> CRYCB referenced by the KVM guest's SIE state description. The process
>> used is:
>>
>> 1. Verify that the bits to be set do not exceed the maximum bit
>> number for the given mask.
>>
>> 2. Verify that the APQNs that can be derived from the cross product
>> of the bits set in the APM and AQM fields of the KVM guest's CRYCB
>> are not assigned to any other KVM guest running on the same linux
>> host.
>>
>> 3. Set the APM, AQM and ADM in the CRYCB according to the matrix
>> configured for the mediated matrix device via its sysfs
>> assign_adapter, assign_domain and assign_control domain attribute
>> files respectively.
>>
>> Signed-off-by: Tony Krowiak <[email protected]>
>> ---
>> arch/s390/include/asm/kvm-ap.h | 52 ++++++++++++
>> arch/s390/include/asm/kvm_host.h | 1 +
>> arch/s390/kvm/kvm-ap.c | 161
>> ++++++++++++++++++++++++++++++++++++++
>> 3 files changed, 214 insertions(+), 0 deletions(-)
>>
>> diff --git a/arch/s390/include/asm/kvm-ap.h
>> b/arch/s390/include/asm/kvm-ap.h
>> index 6af1ff8..21fe9f2 100644
>> --- a/arch/s390/include/asm/kvm-ap.h
>> +++ b/arch/s390/include/asm/kvm-ap.h
>> @@ -12,8 +12,33 @@
>>
>> #include <linux/types.h>
>> #include <linux/kvm_host.h>
>> +#include <linux/bitops.h>
>> #include <asm/ap.h>
>>
>> +#define KVM_AP_MASK_BYTES(n) DIV_ROUND_UP(n, BITS_PER_BYTE)
>> +
>> +/**
>> + * The AP matrix is comprised of three bit masks identifying the
>> adapters,
>> + * queues (domains) and control domains that belong to an AP matrix.
>> The bits in
>> + * each mask, from least significant to most significant bit,
>> correspond to IDs
>> + * 0 to 255. When a bit is set, the corresponding ID belongs to the
>> matrix.
>> + *
>> + * @apm identifies the AP adapters in the matrix
>> + * @apm_max: max adapter number in @apm
>> + * @aqm identifies the AP queues (domains) in the matrix
>> + * @aqm_max: max domain number in @aqm
>> + * @adm identifies the AP control domains in the matrix
>> + * @adm_max: max domain number in @adm
>> + */
>> +struct kvm_ap_matrix {
>> + unsigned long apm_max;
>> + DECLARE_BITMAP(apm, 256);
>> + unsigned long aqm_max;
>> + DECLARE_BITMAP(aqm, 256);
>> + unsigned long adm_max;
>> + DECLARE_BITMAP(adm, 256);
>
> Just a possible performance impact:
> you may have interest to put all bitmaps first to take advantage
> of quadword handling (If bitmaps use it) and put unsigned longs
> at the end.
The DECLARE_BITMAP macros declare the first operand as an
array of unsigned long, so each of the fields falls on a
natural alignment boundary which I believe means the
structure or any of its fields require only one memory
access. I don't see how use of this structure will cause
performance impacts. Even if that were the case, the impact
would be negligible and completely unnoticeable by a human
IMHO. I prefer to keep the related fields together.
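
The alignment claim can be checked with a small userspace sketch. The three
macros below are stand-ins written for this example (assumptions, not the
kernel headers), and the check only covers field offsets and padding; it says
nothing about how many memory accesses a given operation needs or about
measured performance.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel macros (assumptions of this sketch). */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)
#define DECLARE_BITMAP(name, bits) unsigned long name[BITS_TO_LONGS(bits)]

/* Same field order as in the patch. */
struct kvm_ap_matrix {
	unsigned long apm_max;
	DECLARE_BITMAP(apm, 256);
	unsigned long aqm_max;
	DECLARE_BITMAP(aqm, 256);
	unsigned long adm_max;
	DECLARE_BITMAP(adm, 256);
};

/* Asserts that every field starts on an unsigned long boundary. */
static void check_matrix_layout(void)
{
	const size_t a = sizeof(unsigned long);

	assert(offsetof(struct kvm_ap_matrix, apm_max) % a == 0);
	assert(offsetof(struct kvm_ap_matrix, apm) % a == 0);
	assert(offsetof(struct kvm_ap_matrix, aqm_max) % a == 0);
	assert(offsetof(struct kvm_ap_matrix, aqm) % a == 0);
	assert(offsetof(struct kvm_ap_matrix, adm_max) % a == 0);
	assert(offsetof(struct kvm_ap_matrix, adm) % a == 0);
}
```

Because every member is an unsigned long or an array of unsigned long, the
compiler inserts no padding in either field order.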
>
>
>> +};
>> +
>> /**
>> * kvm_ap_apxa_installed
>> *
>> @@ -57,4 +82,31 @@
>> */
>> bool kvm_ap_instructions_available(void);
>>
>> +/**
>> + * kvm_ap_configure_matrix
>> + *
>> + * Configure the AP matrix for a KVM guest.
>> + *
>> + * @kvm: the KVM guest
>> + * @matrix: the matrix configuration information
>> + *
>> + * Returns 0 if:
>> + * 1. The AP instructions are installed on the guest
>> + * 2. The APQNs derived from the intersection of the set of adapter
>> + * IDs (APM) and queue indexes (AQM) in @matrix are not
>> configured for
>> + * any other KVM guest running on the same linux host.
>> + * Otherwise returns an error code.
>> + */
>> +int kvm_ap_configure_matrix(struct kvm *kvm, struct kvm_ap_matrix
>> *matrix);
>> +
>> +/**
>> + * kvm_ap_deconfigure_matrix
>> + *
>> + * Deconfigure the AP matrix for a KVM guest. Clears all of the bits
>> in the
>> + * APM, AQM and ADM in the guest's CRYCB.
>> + *
>> + * @kvm: the KVM guest
>> + */
>> +void kvm_ap_deconfigure_matrix(struct kvm *kvm);
>> +
>> #endif /* _ASM_KVM_AP */
>> diff --git a/arch/s390/include/asm/kvm_host.h
>> b/arch/s390/include/asm/kvm_host.h
>> index ef4b237..8736cde 100644
>> --- a/arch/s390/include/asm/kvm_host.h
>> +++ b/arch/s390/include/asm/kvm_host.h
>> @@ -257,6 +257,7 @@ struct kvm_s390_sie_block {
>> __u64 tecmc; /* 0x00e8 */
>> __u8 reservedf0[12]; /* 0x00f0 */
>> #define CRYCB_FORMAT_MASK 0x00000003
>> +#define CRYCB_FORMAT0 0x00000000
>> #define CRYCB_FORMAT1 0x00000001
>> #define CRYCB_FORMAT2 0x00000003
>> __u32 crycbd; /* 0x00fc */
>> diff --git a/arch/s390/kvm/kvm-ap.c b/arch/s390/kvm/kvm-ap.c
>> index 00bcfb0..98b53c7 100644
>> --- a/arch/s390/kvm/kvm-ap.c
>> +++ b/arch/s390/kvm/kvm-ap.c
>> @@ -7,6 +7,7 @@
>> * Author(s): Tony Krowiak <[email protected]>
>> */
>> #include <linux/kernel.h>
>> +#include <linux/bitops.h>
>> #include <asm/kvm-ap.h>
>>
>> #include "kvm-s390.h"
>> @@ -81,3 +82,163 @@ int kvm_ap_apxa_installed(void)
>> return 0;
>> }
>> EXPORT_SYMBOL(kvm_ap_apxa_installed);
>> +
>> +static inline void kvm_ap_clear_crycb_masks(struct kvm *kvm)
>> +{
>> + memset(&kvm->arch.crypto.crycb->apcb0, 0,
>> + sizeof(kvm->arch.crypto.crycb->apcb0));
>
> Here you prefer to set both structure to 0 instead of testing which
> structure to erase.
The function performs the task described by its name ... it clears the
CRYCB masks (plural). This function will most likely be called only
once when the CRYCB masks are configured. I see no good reason to
change this.
>
>
>> + memset(&kvm->arch.crypto.crycb->apcb1, 0,
>> + sizeof(kvm->arch.crypto.crycb->apcb1));
>> +}
>> +
>
> ...snip...
>
>> +/**
>> + * kvm_ap_validate_queue_sharing
>> + *
>> + * Verifies that the APQNs derived from the cross product of the AP
>> adapter IDs
>> + * and AP queue indexes comprising the AP matrix are not configured for
>> + * another guest. AP queue sharing is not allowed.
>> + *
>> + * @kvm: the KVM guest
>> + * @matrix: the AP matrix
>> + *
>> + * Returns 0 if the APQNs are valid, otherwise; returns -EBUSY.
>> + */
>> +static int kvm_ap_validate_queue_sharing(struct kvm *kvm,
>> + struct kvm_ap_matrix *matrix)
>> +{
>> + struct kvm *vm;
>> + unsigned long *apm, *aqm;
>> + unsigned long apid, apqi;
>> +
>> +
>> + /* No other VM may share an AP Queue with the input VM */
>> + list_for_each_entry(vm, &vm_list, vm_list) {
>> + if (kvm == vm)
>> + continue;
>> +
>> + apm = kvm_ap_get_crycb_apm(vm);
>> + if (!bitmap_and(apm, apm, matrix->apm, matrix->apm_max + 1))
>> + continue;
>> +
>> + aqm = kvm_ap_get_crycb_aqm(vm);
>> + if (!bitmap_and(aqm, aqm, matrix->aqm, matrix->aqm_max + 1))
>> + continue;
>> +
>> + for_each_set_bit_inv(apid, apm, matrix->apm_max + 1)
>> + for_each_set_bit_inv(apqi, aqm, matrix->aqm_max + 1)
>> + kvm_ap_log_sharing_err(vm, apid, apqi);
>> +
>> + return -EBUSY;
>> + }
>> +
>> + return 0;
>> +}
>
> This function (ap_validate_queue_sharing) only verifies that VM don't
> share queues.
> What about the queues used by a host application?
How can that be verified from this function? I suppose I could put a check
in here to verify that the queues are reserved by the vfio_ap device driver,
but that would be redundant because an AP queue cannot be assigned to a
mediated matrix device via its sysfs attributes unless it is reserved by the
vfio_ap device driver (see patches 7, 8 and 9).
>
>
> I understand that you want to implement these checks within KVM but
> this is
> related to which queue devices are bound to the matrix and which one
> are not.
See my comments above and below about AP queue assignment to the mediated
matrix device. The one verification we can't do when the devices are assigned
is whether another guest is using the queue, because assignment occurs before
the guest using the queue is started, in which case we have no access to KVM.
It makes no sense to do so at assignment time anyway, because it doesn't
matter until the guest using the mediated matrix device is started, so that
check is done in KVM.
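
The guest-versus-guest check can be modeled in userspace as below. This is a
sketch under stated assumptions, not the KVM code: ap_matrix_model, mask_set,
masks_intersect and matrices_share_queue are hypothetical names, and the masks
are plain byte arrays numbered from the most significant bit as described in
the commit message. Two matrices have an APQN in common exactly when they
share at least one adapter and at least one queue index, since each guest gets
the full cross product of its two masks. The sketch only reads both inputs, so
neither guest's masks are modified by the comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define AP_IDS 256
#define AP_MASK_BYTES (AP_IDS / 8)

/* Hypothetical userspace model of one guest's adapter and queue masks. */
struct ap_matrix_model {
	unsigned char apm[AP_MASK_BYTES];	/* adapter IDs */
	unsigned char aqm[AP_MASK_BYTES];	/* queue indexes (domains) */
};

/* ID 0 is the most significant bit of byte 0 (MSB-first numbering). */
static void mask_set(unsigned char *mask, unsigned int id)
{
	assert(id < AP_IDS);
	mask[id / 8] |= (unsigned char)(0x80u >> (id % 8));
}

/* True if the two masks have at least one ID in common. */
static bool masks_intersect(const unsigned char *a, const unsigned char *b)
{
	for (size_t i = 0; i < AP_MASK_BYTES; i++)
		if (a[i] & b[i])
			return true;
	return false;
}

/* True if the cross products of the two matrices share an APQN. */
static bool matrices_share_queue(const struct ap_matrix_model *x,
				 const struct ap_matrix_model *y)
{
	return masks_intersect(x->apm, y->apm) &&
	       masks_intersect(x->aqm, y->aqm);
}
```

With guest 1 holding (3,1) and guest 2 holding (4,1), the domains overlap but
the adapters do not, so no queue is shared; adding adapter 3 to guest 2 makes
(3,1) a shared queue.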
>
>
> I think that this should be related somehow to the bounded queue
> devices and
> therefore implemented inside the matrix driver.
As I stated above, when an AP queue is assigned to the mediated matrix device
via its sysfs attributes, a check is done to verify that it is bound to the
vfio_ap device driver (see patches 7, 8 and 9). If not, then assignment will
be rejected; therefore, it will not be possible to configure a CRYCB with AP
queues that are not bound to the device driver.
>
>
> Regards,
>
> Pierre
>
On 05/15/2018 11:48 AM, Halil Pasic wrote:
>
>
> On 05/15/2018 05:16 PM, Tony Krowiak wrote:
>> On 05/15/2018 10:17 AM, Pierre Morel wrote:
>>> On 14/05/2018 21:42, Tony Krowiak wrote:
>>>> On 05/11/2018 01:18 PM, Halil Pasic wrote:
>>>>>
>>>>>
>>>>> On 05/07/2018 05:11 PM, Tony Krowiak wrote:
>>>>>> Registers the matrix device created by the VFIO AP device
>>>>>> driver with the VFIO mediated device framework.
>>>>>> Registering the matrix device will create the sysfs
>>>>>> structures needed to create mediated matrix devices
>>>>>> each of which will be used to configure the AP matrix
>>>>>> for a guest and connect it to the VFIO AP device driver.
>>>>>>
>>>>> [..]
>>>>>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c
>>>>>> b/drivers/s390/crypto/vfio_ap_ops.c
>>>>>> new file mode 100644
>>>>>> index 0000000..d7d36fb
>>>>>> --- /dev/null
>>>>>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>>>>>> @@ -0,0 +1,106 @@
>>>>>> +// SPDX-License-Identifier: GPL-2.0+
>>>>>> +/*
>>>>>> + * Adjunct processor matrix VFIO device driver callbacks.
>>>>>> + *
>>>>>> + * Copyright IBM Corp. 2017
>>>>>> + * Author(s): Tony Krowiak <[email protected]>
>>>>>> + *
>>>>>> + */
>>>>>> +#include <linux/string.h>
>>>>>> +#include <linux/vfio.h>
>>>>>> +#include <linux/device.h>
>>>>>> +#include <linux/list.h>
>>>>>> +#include <linux/ctype.h>
>>>>>> +
>>>>>> +#include "vfio_ap_private.h"
>>>>>> +
>>>>>> +#define VFOP_AP_MDEV_TYPE_HWVIRT "passthrough"
>>>>>> +#define VFIO_AP_MDEV_NAME_HWVIRT "VFIO AP Passthrough Device"
>>>>>> +
>>>>>> +static int vfio_ap_mdev_create(struct kobject *kobj, struct
>>>>>> mdev_device *mdev)
>>>>>> +{
>>>>>> + struct ap_matrix *ap_matrix =
>>>>>> to_ap_matrix(mdev_parent_dev(mdev));
>>>>>> +
>>>>>> + ap_matrix->available_instances--;
>>>>>> +
>>>>>> + return 0;
>>>>>> +}
>>>>>> +
>>>>>> +static int vfio_ap_mdev_remove(struct mdev_device *mdev)
>>>>>> +{
>>>>>> + struct ap_matrix *ap_matrix =
>>>>>> to_ap_matrix(mdev_parent_dev(mdev));
>>>>>> +
>>>>>> + ap_matrix->available_instances++;
>>>>>> +
>>>>>> + return 0;
>>>>>> +}
>>>>>> +
>>>>>
>>>>> The above functions seem to be called with the lock of this
>>>>> auto-generated
>>>>> mdev parent device held. That's why we don't have to care about
>>>>> synchronization
>>>>> ourselves, right?
>>>>
>>>> I would assume as much. The comments for the 'struct
>>>> mdev_parent_ops' in
>>>> include/linux/mdev.h do not mention anything about synchronization,
>>>> nor did I
>>>> see any locking or synchronization in the vfio_ccw implementation
>>>> after which
>>>> I modeled my code, so frankly it is something I did not consider.
>>>>
>>>>>
>>>>>
>>>>> A small comment in the code could be helpful for mdev non-experts.
>>>>> Hell, I would
>>>>> even consider documenting it for all mdev -- took me some time to
>>>>> figure out.
>>>>
>>>> You may want to bring this up with the VFIO mdev maintainers, but
>>>> I'd be happy to
>>>> include a comment in the functions in question if you think it
>>>> important.
>>>>
>>>>>
>>>>>
>>>>> [..]
>>>>>
>>>>>
>>>>>> +int vfio_ap_mdev_register(struct ap_matrix *ap_matrix)
>>>>>> +{
>>>>>> + int ret;
>>>>>> +
>>>>>> + ret = mdev_register_device(&ap_matrix->device,
>>>>>> &vfio_ap_matrix_ops);
>>>>>> + if (ret)
>>>>>> + return ret;
>>>>>> +
>>>>>> + ap_matrix->available_instances =
>>>>>> AP_MATRIX_MAX_AVAILABLE_INSTANCES;
>>>>>> +
>>>>>> + return 0;
>>>>>> +}
>>>>>> +
>>>>>> +void vfio_ap_mdev_unregister(struct ap_matrix *ap_matrix)
>>>>>> +{
>>>>>> + ap_matrix->available_instances--;
>>>>>
>>>>> What is this for? I don't understand.
>>>>
>>>> To control the number of mediated devices one can create for the
>>>> matrix device.
>>>> Once the max is reached, the mdev framework will not allow creation
>>>> of another
>>>> mediated device until one is removed. This counter keeps track of
>>>> the number
>>>> of instances that can be created. This is documented with the mediated
>>>> framework. You may want to take a look at:
>>>>
>>>> Documentation/vfio-mediated-device.txt
>>>> Documentation/vfio.txt
>>>> Documentation/virtual/kvm/devices/vfio.txt
>>>
>>> This is what you do in create/remove.
>>> But here in unregister I agree with Halil, it does not seem to be
>>> useful.
>>
>> If that is in fact what Halil was asking, then I misinterpreted his
>> question; I
>> thought he was asking what the available_instances was used for. You are
>> correct, this does not belong here although it makes little
>> difference given
>> this is called only when the driver, which creates the matrix device,
>> is unloaded.
>> It is necessary in the register function to initialize its value, but
>> I'll
>> remove it from here.
>>
>
> I questioned the dubious usage of ap_matrix->available_instances
> rather than
> asking what is the variable for.
I said I'd remove it.
>
>
> If I've had this deemed damaging I would have asked if it's damaging
> in a way
> I think it is. For example take my comment on 'KVM: s390: interfaces
> to manage
> guest's AP matrix'.
I apologize for not being able to read your mind. While the decrement is not
strictly necessary, it is not damaging because this is called only when the
driver is being unloaded. The point is moot, however, because I am removing it.
>
>
> Regards,
> Halil
>
>>>
>>>
>>>>
>>>>>
>>>>>
>>>>> Regards,
>>>>> Halil
>>>>>
>>>>>> + mdev_unregister_device(&ap_matrix->device);
>>>>>> +}
>>>>
>>>>
>>>
>>
On 15/05/2018 18:07, Tony Krowiak wrote:
> On 05/15/2018 10:55 AM, Pierre Morel wrote:
>> On 07/05/2018 17:11, Tony Krowiak wrote:
>>> Provides interfaces to manage the AP adapters, usage domains
>>> and control domains assigned to a KVM guest.
>>>
>>> The guest's SIE state description has a satellite structure called the
>>> Crypto Control Block (CRYCB) containing three bitmask fields
>>> identifying the adapters, queues (domains) and control domains
>>> assigned to the KVM guest:
>>>
...snip...
>>> +}
>>
>> This function (ap_validate_queue_sharing) only verifies that VM don't
>> share queues.
>> What about the queues used by a host application?
>
> How can that be verified from this function? I suppose I could put a
> check in here to
> verify that the queues are reserved by the vfio_ap device driver, but
> that would
> be redundant because an AP queue can not be assigned to a mediated
> matrix device
> via its sysfs attributes unless it is reserved by the vfio_ap device
> driver (see
> patches 7, 8 and 9).
>
>>
>>
>> I understand that you want to implement these checks within KVM but
>> this is
>> related to which queue devices are bound to the matrix and which one
>> are not.
>
> See my comments above and below about AP queue assignment to the
> mediated matrix
> device. The one verification we can't do when the devices are assigned
> is whether
> another guest is using the queue because assignment occurs before the
> guest using
> the queue is started in which case we have no access to KVM. It makes
> no sense to
> do so at assignment time anyway because it doesn't matter until the
> guest using
> the mediated matrix device is started, so that check is done in KVM.
>
>>
>>
>> I think that this should be related somehow to the bounded queue
>> devices and
>> therefore implemented inside the matrix driver.
>
> As I stated above, when an AP queue is assigned to the mediated matrix
> device via
> its sysfs attributes, a check is done to verify that it is bound to
> the vfio_ap
> device driver (see patches 7, 8 and 9). If not, then assignment will
> be rejected;
> therefore, it will not be possible to configure a CRYCB with AP queues
> that are
> not bound to the device driver.
This patch and the following patches make sure that the queues are bound to
the matrix driver when they are assigned to the matrix using the sysfs
entries. But they do not prevent a queue from being unbound before you start
the guest, and they are not in the path if the admin decides to unbind a
queue at some later time.
>
>>
>>
>> Regards,
>>
>> Pierre
>>
>
--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany
On 07/05/2018 17:11, Tony Krowiak wrote:
> Implements the open callback on the mediated matrix device.
> The function registers a group notifier to receive notification
> of the VFIO_GROUP_NOTIFY_SET_KVM event. When notified,
> the vfio_ap device driver will get access to the guest's
> kvm structure. With access to this structure the driver will:
>
> 1. Ensure that only one mediated device is opened for the guest
>
> 2. Configure access to the AP devices for the guest.
>
> Access to AP adapters, usage domains and control domains
> is controlled by three bit masks contained in the Crypto Control
> Block (CRYCB) referenced from the guest's SIE state description:
>
> * The AP Mask (APM) controls access to the AP adapters. Each bit
> in the APM represents an adapter number - from most significant
> to least significant bit - from 0 to 255. The bits in the APM
> are set according to the adapter numbers assigned to the mediated
> matrix device via its 'assign_adapter' sysfs attribute file.
>
> * The AP Queue Mask (AQM) controls access to the AP queues. Each bit
> in the AQM represents an AP queue index - from most significant
> to least significant bit - from 0 to 255. A queue index references
> a specific domain and is synonymous with the domain number. The
> bits in the AQM are set according to the domain numbers assigned
> to the mediated matrix device via its 'assign_domain' sysfs
> attribute file.
>
> * The AP Domain Mask (ADM) controls access to the AP control domains.
> Each bit in the ADM represents a control domain - from most
> significant to least significant bit - from 0-255. The
> bits in the ADM are set according to the domain numbers assigned
> to the mediated matrix device via its 'assign_control_domain'
> sysfs attribute file.
>
> Signed-off-by: Tony Krowiak <[email protected]>
> ---
> arch/s390/include/asm/kvm-ap.h | 21 ++++++++++
> arch/s390/include/asm/kvm_host.h | 1 +
> arch/s390/kvm/kvm-ap.c | 19 +++++++++
> drivers/s390/crypto/vfio_ap_ops.c | 68 +++++++++++++++++++++++++++++++++
> drivers/s390/crypto/vfio_ap_private.h | 2 +
> 5 files changed, 111 insertions(+), 0 deletions(-)
>
> diff --git a/arch/s390/include/asm/kvm-ap.h b/arch/s390/include/asm/kvm-ap.h
> index 21fe9f2..68c5a67 100644
> --- a/arch/s390/include/asm/kvm-ap.h
> +++ b/arch/s390/include/asm/kvm-ap.h
> @@ -83,6 +83,27 @@ struct kvm_ap_matrix {
> bool kvm_ap_instructions_available(void);
>
> /**
> + * kvm_ap_refcount_read
> + *
> + * Read the AP reference count and return it.
> + */
> +int kvm_ap_refcount_read(struct kvm *kvm);
> +
> +/**
> + * kvm_ap_refcount_inc
> + *
> + * Increment the AP reference count.
> + */
> +void kvm_ap_refcount_inc(struct kvm *kvm);
> +
> +/**
> + * kvm_ap_refcount_dec
> + *
> + * Decrement the AP reference count
> + */
> +void kvm_ap_refcount_dec(struct kvm *kvm);
> +
> +/**
> * kvm_ap_configure_matrix
> *
> * Configure the AP matrix for a KVM guest.
> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
> index 8736cde..5f1ad02 100644
> --- a/arch/s390/include/asm/kvm_host.h
> +++ b/arch/s390/include/asm/kvm_host.h
> @@ -717,6 +717,7 @@ struct kvm_s390_crypto {
> __u8 aes_kw;
> __u8 dea_kw;
> __u8 apie;
> + atomic_t aprefs;
> };
>
> #define APCB0_MASK_SIZE 1
> diff --git a/arch/s390/kvm/kvm-ap.c b/arch/s390/kvm/kvm-ap.c
> index 98b53c7..848fb37 100644
> --- a/arch/s390/kvm/kvm-ap.c
> +++ b/arch/s390/kvm/kvm-ap.c
> @@ -9,6 +9,7 @@
> #include <linux/kernel.h>
> #include <linux/bitops.h>
> #include <asm/kvm-ap.h>
> +#include <asm/atomic.h>
>
> #include "kvm-s390.h"
>
> @@ -218,6 +219,24 @@ static int kvm_ap_validate_queue_sharing(struct kvm *kvm,
> return 0;
> }
>
> +int kvm_ap_refcount_read(struct kvm *kvm)
> +{
> + return atomic_read(&kvm->arch.crypto.aprefs);
> +}
> +EXPORT_SYMBOL(kvm_ap_refcount_read);
> +
> +void kvm_ap_refcount_inc(struct kvm *kvm)
> +{
> + atomic_inc(&kvm->arch.crypto.aprefs);
> +}
> +EXPORT_SYMBOL(kvm_ap_refcount_inc);
> +
> +void kvm_ap_refcount_dec(struct kvm *kvm)
> +{
> + atomic_dec(&kvm->arch.crypto.aprefs);
> +}
> +EXPORT_SYMBOL(kvm_ap_refcount_dec);
Why are these functions inside kvm-ap?
Will anything outside of vfio-ap use them?
> +
> int kvm_ap_configure_matrix(struct kvm *kvm, struct kvm_ap_matrix *matrix)
> {
> int ret = 0;
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 81e03b8..8866b0e 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -11,6 +11,8 @@
> #include <linux/device.h>
> #include <linux/list.h>
> #include <linux/ctype.h>
> +#include <linux/module.h>
> +#include <asm/kvm-ap.h>
>
> #include "vfio_ap_private.h"
>
> @@ -47,6 +49,70 @@ static int vfio_ap_mdev_remove(struct mdev_device *mdev)
> return 0;
> }
>
> +static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
> + unsigned long action, void *data)
> +{
> + struct ap_matrix_mdev *matrix_mdev;
> +
> + if (action == VFIO_GROUP_NOTIFY_SET_KVM) {
> + matrix_mdev = container_of(nb, struct ap_matrix_mdev,
> + group_notifier);
> + matrix_mdev->kvm = data;
> + }
> +
> + return NOTIFY_OK;
> +}
> +
> +static int vfio_ap_mdev_open(struct mdev_device *mdev)
> +{
> + struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
> + unsigned long events;
> + int ret;
> +
> + if (!try_module_get(THIS_MODULE))
> + return -ENODEV;
> +
> + matrix_mdev->group_notifier.notifier_call = vfio_ap_mdev_group_notifier;
> + events = VFIO_GROUP_NOTIFY_SET_KVM;
> +
> + ret = vfio_register_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
> + &events, &matrix_mdev->group_notifier);
> + if (ret)
> + goto out_err;
> +
> + /* Only one mediated device allowed per guest */
> + if (kvm_ap_refcount_read(matrix_mdev->kvm) != 0) {
> + ret = -EEXIST;
> + goto out_err;
> + }
Testing the existence should be the first thing to do.
> +
> + kvm_ap_refcount_inc(matrix_mdev->kvm);
> +
> + ret = kvm_ap_configure_matrix(matrix_mdev->kvm, &matrix_mdev->matrix);
> + if (ret)
> + goto config_err;
> +
> + return 0;
> +
> +config_err:
> + kvm_ap_refcount_dec(matrix_mdev->kvm);
> +out_err:
> + module_put(THIS_MODULE);
> +
> + return ret;
> +}
> +
> +static void vfio_ap_mdev_release(struct mdev_device *mdev)
> +{
> + struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
> +
> + kvm_ap_deconfigure_matrix(matrix_mdev->kvm);
> + vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
> + &matrix_mdev->group_notifier);
> + kvm_ap_refcount_dec(matrix_mdev->kvm);
> + module_put(THIS_MODULE);
> +}
> +
> static ssize_t name_show(struct kobject *kobj, struct device *dev, char *buf)
> {
> return sprintf(buf, "%s\n", VFIO_AP_MDEV_NAME_HWVIRT);
> @@ -773,6 +839,8 @@ static ssize_t matrix_show(struct device *dev, struct device_attribute *attr,
> .mdev_attr_groups = vfio_ap_mdev_attr_groups,
> .create = vfio_ap_mdev_create,
> .remove = vfio_ap_mdev_remove,
> + .open = vfio_ap_mdev_open,
> + .release = vfio_ap_mdev_release,
> };
>
> int vfio_ap_mdev_register(struct ap_matrix *ap_matrix)
> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
> index 8b6ad66..ab072e9 100644
> --- a/drivers/s390/crypto/vfio_ap_private.h
> +++ b/drivers/s390/crypto/vfio_ap_private.h
> @@ -32,6 +32,8 @@ struct ap_matrix {
>
> struct ap_matrix_mdev {
> struct kvm_ap_matrix matrix;
> + struct notifier_block group_notifier;
> + struct kvm *kvm;
> };
>
> static inline struct ap_matrix *to_ap_matrix(struct device *dev)
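On the "only one mediated device per guest" check in vfio_ap_mdev_open():
one possible way to do the test first is to fold the read and the
increment into a single step, so that the claim either succeeds or fails
before anything else is set up. The following is only a userspace sketch
using C11 atomics; struct guest, guest_ap_claim() and guest_ap_release()
are hypothetical stand-ins for the kvm->arch.crypto.aprefs handling and
are not part of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the per-guest AP reference count. */
struct guest {
	atomic_int aprefs;
};

/* Claim the guest's single AP slot: 0 on success, -EEXIST if taken. */
static int guest_ap_claim(struct guest *g)
{
	int expected = 0;

	/* Test and increment in one step; only one caller can move 0 -> 1. */
	if (!atomic_compare_exchange_strong(&g->aprefs, &expected, 1))
		return -EEXIST;
	return 0;
}

/* Give the slot back; only valid after a successful claim. */
static void guest_ap_release(struct guest *g)
{
	assert(atomic_load(&g->aprefs) == 1);
	atomic_store(&g->aprefs, 0);
}
```

A caller would claim the slot before registering anything else and would
release it on every later error path, which keeps the unwinding to a
single call.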
--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany
On 07/05/2018 17:11, Tony Krowiak wrote:
> Provides a sysfs interface to view the AP matrix configured for the
> mediated matrix device.
>
> The relevant sysfs structures are:
>
> /sys/devices/vfio_ap
> ... [matrix]
> ...... [mdev_supported_types]
> ......... [vfio_ap-passthrough]
> ............ [devices]
> ...............[$uuid]
> .................. matrix
>
> To view the matrix configured for the mediated matrix device,
> print the matrix file:
This is the configured matrix, not the one used by the guest.
Nothing in the patches protects against binding a queue and assigning
a new AP while the guest is running.
The card and queue will then be shown by this entry.
>
> cat matrix
>
> Signed-off-by: Tony Krowiak <[email protected]>
> ---
> drivers/s390/crypto/vfio_ap_ops.c | 31 +++++++++++++++++++++++++++++++
> 1 files changed, 31 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 755be1d..81e03b8 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -716,6 +716,36 @@ static ssize_t control_domains_show(struct device *dev,
> }
> DEVICE_ATTR_RO(control_domains);
>
> +static ssize_t matrix_show(struct device *dev, struct device_attribute *attr,
> + char *buf)
> +{
> + struct mdev_device *mdev = mdev_from_dev(dev);
> + struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
> + char *bufpos = buf;
> + unsigned long apid;
> + unsigned long apqi;
> + unsigned long napm = matrix_mdev->matrix.apm_max + 1;
> + unsigned long naqm = matrix_mdev->matrix.aqm_max + 1;
> + int nchars = 0;
> + int n;
> +
> + for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, napm) {
> + n = sprintf(bufpos, "%02lx\n", apid);
> + bufpos += n;
> + nchars += n;
> +
> + for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, naqm) {
> + n = sprintf(bufpos, "%02lx.%04lx\n", apid, apqi);
> + bufpos += n;
> + nchars += n;
> + }
> + }
> +
> + return nchars;
> +}
> +DEVICE_ATTR_RO(matrix);
> +
> +
> static struct attribute *vfio_ap_mdev_attrs[] = {
> &dev_attr_assign_adapter.attr,
> &dev_attr_unassign_adapter.attr,
> @@ -724,6 +754,7 @@ static ssize_t control_domains_show(struct device *dev,
> &dev_attr_assign_control_domain.attr,
> &dev_attr_unassign_control_domain.attr,
> &dev_attr_control_domains.attr,
> + &dev_attr_matrix.attr,
> NULL,
> };
>
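For reference, the cross product that the matrix file reports can be
sketched in userspace as follows: every adapter bit set in the APM is
paired with every domain bit set in the AQM. This is an illustration only.
matrix_format() and mask_test() are hypothetical names, the sketch prints
only the adapter.domain pairs (the patch also prints one line per
adapter), and the bounds checking is an addition of the sketch, not
something taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define AP_MASK_BITS 256

/* MSB-first bit test: bit 0 is the top bit of mask[0]. */
static int mask_test(const uint8_t *mask, unsigned int nr)
{
	assert(nr < AP_MASK_BITS);
	return (mask[nr / 8] & (0x80u >> (nr % 8))) != 0;
}

/*
 * Write one "apid.apqi" line per (adapter, domain) pair into buf.
 * Stops before a line that would not fit. Returns the bytes written,
 * not counting the terminating NUL.
 */
static size_t matrix_format(char *buf, size_t size,
			    const uint8_t *apm, const uint8_t *aqm)
{
	size_t pos = 0;

	assert(size > 0);
	buf[0] = '\0';
	for (unsigned int apid = 0; apid < AP_MASK_BITS; apid++) {
		if (!mask_test(apm, apid))
			continue;
		for (unsigned int apqi = 0; apqi < AP_MASK_BITS; apqi++) {
			int n;

			if (!mask_test(aqm, apqi))
				continue;
			n = snprintf(buf + pos, size - pos, "%02x.%04x\n",
				     apid, apqi);
			if (n < 0 || (size_t)n >= size - pos) {
				buf[pos] = '\0'; /* drop the partial line */
				return pos;
			}
			pos += (size_t)n;
		}
	}
	return pos;
}
```

With adapter 4 and domains 1 and 2 assigned, this produces the lines
04.0001 and 04.0002, i.e. the configured matrix and not necessarily what a
running guest is using.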
--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany
On 07/05/2018 17:11, Tony Krowiak wrote:
> Introduces a new AP device driver. This device driver
> is built on the VFIO mediated device framework. The framework
> provides sysfs interfaces that facilitate passthrough
> access by guests to devices installed on the linux host.
...snip...
> +static int vfio_ap_queue_dev_probe(struct ap_device *apdev)
> +{
You should take care of the ap devices when they are added or removed
from the matrix.
I suggest you add a remove callback to avoid unbinding a queue while
it is assigned to a guest.
> + return 0;
> +}
> +
> +static void vfio_ap_matrix_dev_release(struct device *dev)
> +{
> + struct ap_matrix *ap_matrix = dev_get_drvdata(dev);
> +
> + kfree(ap_matrix);
> +}
...snip...
--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany
On 07/05/2018 17:11, Tony Krowiak wrote:
> This patch refactors the code that initializes the crypto
> configuration for a guest. The crypto configuration is contained in
> a crypto control block (CRYCB) which is a satellite control block to
> our main hardware virtualization control block. The CRYCB is
> attached to the main virtualization control block via a CRYCB
> designation (CRYCBD) field containing the address of
> the CRYCB as well as its format.
>
> Prior to the introduction of AP device virtualization, there was
> no need to provide access to or specify the format of the CRYCB for
> a guest unless the MSA extension 3 (MSAX3) facility was installed
> on the host system. With the introduction of AP device virtualization,
> the CRYCB and its format must be made accessible to the guest
> regardless of the presence of the MSAX3 facility as long as the
> AP instructions are installed on the host.
>
> Signed-off-by: Tony Krowiak <[email protected]>
> ---
> arch/s390/include/asm/kvm_host.h | 1 +
> arch/s390/kvm/kvm-s390.c | 64 ++++++++++++++++++++++++++-----------
> 2 files changed, 46 insertions(+), 19 deletions(-)
>
> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
> index 81cdb6b..5393c4d 100644
> --- a/arch/s390/include/asm/kvm_host.h
> +++ b/arch/s390/include/asm/kvm_host.h
> @@ -255,6 +255,7 @@ struct kvm_s390_sie_block {
> __u8 reservede4[4]; /* 0x00e4 */
> __u64 tecmc; /* 0x00e8 */
> __u8 reservedf0[12]; /* 0x00f0 */
> +#define CRYCB_FORMAT_MASK 0x00000003
> #define CRYCB_FORMAT1 0x00000001
> #define CRYCB_FORMAT2 0x00000003
> __u32 crycbd; /* 0x00fc */
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index 1f50de7..99779a6 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -1875,14 +1875,35 @@ long kvm_arch_vm_ioctl(struct file *filp,
> return r;
> }
>
> -static void kvm_s390_set_crycb_format(struct kvm *kvm)
> +/*
> + * The format of the crypto control block (CRYCB) is specified in the 3 low
> + * order bits of the CRYCB designation (CRYCBD) field as follows:
> + * Format 0: Neither the message security assist extension 3 (MSAX3) nor the
> + * AP extended addressing (APXA) facility are installed.
> + * Format 1: The APXA facility is not installed but the MSAX3 facility is.
> + * Format 2: Both the APXA and MSAX3 facilities are installed
> + */
> +static void kvm_s390_format_crycb(struct kvm *kvm)
> {
> - kvm->arch.crypto.crycbd = (__u32)(unsigned long) kvm->arch.crypto.crycb;
> + /* Clear the CRYCB format bits - i.e., set format 0 by default */
> + kvm->arch.crypto.crycbd &= ~(CRYCB_FORMAT_MASK);
> +
> + /* Check whether MSAX3 is installed */
> + if (!test_kvm_facility(kvm, 76))
> + return;
>
> if (kvm_ap_apxa_installed())
> kvm->arch.crypto.crycbd |= CRYCB_FORMAT2;
> else
> kvm->arch.crypto.crycbd |= CRYCB_FORMAT1;
> +
> + /* Enable AES/DEA protected key functions by default */
> + kvm->arch.crypto.aes_kw = 1;
> + kvm->arch.crypto.dea_kw = 1;
> + get_random_bytes(kvm->arch.crypto.crycb->aes_wrapping_key_mask,
> + sizeof(kvm->arch.crypto.crycb->aes_wrapping_key_mask));
> + get_random_bytes(kvm->arch.crypto.crycb->dea_wrapping_key_mask,
> + sizeof(kvm->arch.crypto.crycb->dea_wrapping_key_mask));
> }
>
> static u64 kvm_s390_get_initial_cpuid(void)
> @@ -1896,19 +1917,17 @@ static u64 kvm_s390_get_initial_cpuid(void)
>
> static void kvm_s390_crypto_init(struct kvm *kvm)
> {
> - if (!test_kvm_facility(kvm, 76))
> + /*
> + * If neither the AP instructions nor the message security assist
> + * extension 3 (MSAX3) are installed, there is no need to initialize a
> + * crypto control block (CRYCB) for the guest.
> + */
> + if (!kvm_ap_instructions_available() && !test_kvm_facility(kvm, 76))
> return;
>
> kvm->arch.crypto.crycb = &kvm->arch.sie_page2->crycb;
> - kvm_s390_set_crycb_format(kvm);
From my point of view, the whole patch can be reduced to moving this
call (kvm_s390_set_crycb_format(kvm);) before the test for facility 76
(and setting the format correctly in kvm_s390_set_crycb_format(kvm)).
> -
> - /* Enable AES/DEA protected key functions by default */
> - kvm->arch.crypto.aes_kw = 1;
> - kvm->arch.crypto.dea_kw = 1;
> - get_random_bytes(kvm->arch.crypto.crycb->aes_wrapping_key_mask,
> - sizeof(kvm->arch.crypto.crycb->aes_wrapping_key_mask));
> - get_random_bytes(kvm->arch.crypto.crycb->dea_wrapping_key_mask,
> - sizeof(kvm->arch.crypto.crycb->dea_wrapping_key_mask));
> + kvm->arch.crypto.crycbd = (__u32)(unsigned long) kvm->arch.crypto.crycb;
> + kvm_s390_format_crycb(kvm);
> }
>
> static void sca_dispose(struct kvm *kvm)
> @@ -2430,17 +2449,24 @@ void kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu)
>
> static void kvm_s390_vcpu_crypto_setup(struct kvm_vcpu *vcpu)
> {
> - if (!test_kvm_facility(vcpu->kvm, 76))
> + /*
> + * If a crypto control block designation (CRYCBD) has not been
> + * initialized
> + */
> + if (vcpu->kvm->arch.crypto.crycbd == 0)
> return;
>
> - vcpu->arch.sie_block->ecb3 &= ~(ECB3_AES | ECB3_DEA);
> + vcpu->arch.sie_block->crycbd = vcpu->kvm->arch.crypto.crycbd;
>
> - if (vcpu->kvm->arch.crypto.aes_kw)
> - vcpu->arch.sie_block->ecb3 |= ECB3_AES;
> - if (vcpu->kvm->arch.crypto.dea_kw)
> - vcpu->arch.sie_block->ecb3 |= ECB3_DEA;
> + /* If MSAX3 is installed, set up protected key support */
> + if (test_kvm_facility(vcpu->kvm, 76)) {
> + vcpu->arch.sie_block->ecb3 &= ~(ECB3_AES | ECB3_DEA);
>
> - vcpu->arch.sie_block->crycbd = vcpu->kvm->arch.crypto.crycbd;
> + if (vcpu->kvm->arch.crypto.aes_kw)
> + vcpu->arch.sie_block->ecb3 |= ECB3_AES;
> + if (vcpu->kvm->arch.crypto.dea_kw)
> + vcpu->arch.sie_block->ecb3 |= ECB3_DEA;
> + }
> }
>
> void kvm_s390_vcpu_unsetup_cmma(struct kvm_vcpu *vcpu)
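To spell out the format selection described in the comment above
kvm_s390_format_crycb(), here is a small sketch of it as a pure function.
The CRYCB_FORMAT* values are copied from the patch; crycbd_with_format()
and its msax3/apxa parameters are hypothetical and stand in for
test_kvm_facility(kvm, 76) and kvm_ap_apxa_installed().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CRYCB_FORMAT_MASK 0x00000003u
#define CRYCB_FORMAT1     0x00000001u
#define CRYCB_FORMAT2     0x00000003u

static_assert((CRYCB_FORMAT1 | CRYCB_FORMAT2) == CRYCB_FORMAT_MASK,
	      "format values must fit in the format mask");

/*
 * Return crycbd with its format bits set as the patch comment describes:
 * format 0 without MSAX3, format 1 with MSAX3 only, format 2 with both
 * MSAX3 and APXA.
 */
static uint32_t crycbd_with_format(uint32_t crycbd, bool msax3, bool apxa)
{
	crycbd &= ~CRYCB_FORMAT_MASK;	/* format 0 by default */
	if (!msax3)
		return crycbd;
	return crycbd | (apxa ? CRYCB_FORMAT2 : CRYCB_FORMAT1);
}
```

Note that with this logic APXA without MSAX3 still yields format 0, which
is what the patch does as posted.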
--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany
On Mon, 7 May 2018 11:11:40 -0400
Tony Krowiak <[email protected]> wrote:
> Relocates an existing static function that tests whether
> the AP extended addressing facility (APXA) is installed on
> the linux host. The primary reason for relocating this
> function is because a new compilation unit (arch/s390/kvm/kvm-ap.c)
> is being created to contain all of the interfaces and logic
> for configuring an AP matrix for a KVM guest. Some of its
> functions will also need to determine whether APXA is installed,
> so, let's go ahead and relocate this static function as a
> public interface in kvm-ap.c.
>
> Notes:
> ----
> 1. The interface to determine whether APXA is installed on the linux
> host uses the information returned from the AP Query Configuration
> Information (QCI) function. This function will not be available
> if the AP instructions are not installed on the linux host, so a check
> will be included to verify that.
>
> 2. Currently, the AP bus interfaces accessing the AP instructions will
> not be accessible if CONFIG_ZCRYPT=n, so the relevant code will be
> temporarily contained in the new arch/s390/kvm/kvm-ap.c file until
> the patch(es) to statically build the required AP bus interfaces are
> available.
Any ETA for those interfaces? Would be nice if we could avoid
introducing temporary interfaces (but I'm certainly not opposing this
patch).
>
> Signed-off-by: Tony Krowiak <[email protected]>
> ---
> MAINTAINERS | 1 +
> arch/s390/include/asm/kvm-ap.h | 60 +++++++++++++++++++++++++++++
> arch/s390/kvm/Makefile | 2 +-
> arch/s390/kvm/kvm-ap.c | 83 ++++++++++++++++++++++++++++++++++++++++
> arch/s390/kvm/kvm-s390.c | 42 +-------------------
> 5 files changed, 147 insertions(+), 41 deletions(-)
> create mode 100644 arch/s390/include/asm/kvm-ap.h
> create mode 100644 arch/s390/kvm/kvm-ap.c
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index eab763f..224e97b 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -7792,6 +7792,7 @@ M: Christian Borntraeger <[email protected]>
> M: Janosch Frank <[email protected]>
> R: David Hildenbrand <[email protected]>
> R: Cornelia Huck <[email protected]>
> +R: Tony Krowiak <[email protected]>
Don't you want to drop the 'vnet' from your address, as the vnet-less
form seems to be the one that will continue working from what I've
heard?
> L: [email protected]
> W: http://www.ibm.com/developerworks/linux/linux390/
> T: git git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux.git
No objection against this patch, although I still hope it will not be
needed :)
On Mon, 7 May 2018 11:11:44 -0400
Tony Krowiak <[email protected]> wrote:
> Registers the matrix device created by the VFIO AP device
> driver with the VFIO mediated device framework.
> Registering the matrix device will create the sysfs
> structures needed to create mediated matrix devices
> each of which will be used to configure the AP matrix
> for a guest and connect it to the VFIO AP device driver.
>
> Registering the matrix device with the VFIO mediated device
> framework will create the following sysfs structures:
>
> /sys/devices/vfio_ap
> ... [matrix]
> ...... [mdev_supported_types]
> ......... [vfio_ap-passthrough]
> ............ create
>
> To create a mediated device for the AP matrix device, write a UUID
> to the create file:
>
> uuidgen > create
>
> A symbolic link to the mediated device's directory will be created in the
> devices subdirectory named after the generated $uuid:
>
> /sys/devices/vfio_ap
> ... [matrix]
> ...... [mdev_supported_types]
> ......... [vfio_ap-passthrough]
> ............ [devices]
> ............... [$uuid]
>
> Signed-off-by: Tony Krowiak <[email protected]>
> ---
> MAINTAINERS | 1 +
> drivers/s390/crypto/Makefile | 2 +-
> drivers/s390/crypto/vfio_ap_drv.c | 9 +++
> drivers/s390/crypto/vfio_ap_ops.c | 106 +++++++++++++++++++++++++++++++++
> drivers/s390/crypto/vfio_ap_private.h | 17 +++++
> 5 files changed, 134 insertions(+), 1 deletions(-)
> create mode 100644 drivers/s390/crypto/vfio_ap_ops.c
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> new file mode 100644
> index 0000000..d7d36fb
> --- /dev/null
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -0,0 +1,106 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +/*
> + * Adjunct processor matrix VFIO device driver callbacks.
> + *
> + * Copyright IBM Corp. 2017
Should be '2018' (also in some other files in this series; please
double check.)
> + * Author(s): Tony Krowiak <[email protected]>
> + *
> + */
> +#include <linux/string.h>
> +#include <linux/vfio.h>
> +#include <linux/device.h>
> +#include <linux/list.h>
> +#include <linux/ctype.h>
> +
> +#include "vfio_ap_private.h"
> +
> +#define VFOP_AP_MDEV_TYPE_HWVIRT "passthrough"
> +#define VFIO_AP_MDEV_NAME_HWVIRT "VFIO AP Passthrough Device"
> +
> +static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
> +{
> + struct ap_matrix *ap_matrix = to_ap_matrix(mdev_parent_dev(mdev));
> +
> + ap_matrix->available_instances--;
Shouldn't the code check whether available_instances is actually > 0?
> +
> + return 0;
> +}
> +
> +static int vfio_ap_mdev_remove(struct mdev_device *mdev)
> +{
> + struct ap_matrix *ap_matrix = to_ap_matrix(mdev_parent_dev(mdev));
> +
> + ap_matrix->available_instances++;
> +
> + return 0;
> +}
> +
> +static ssize_t name_show(struct kobject *kobj, struct device *dev, char *buf)
> +{
> + return sprintf(buf, "%s\n", VFIO_AP_MDEV_NAME_HWVIRT);
> +}
> +
> +MDEV_TYPE_ATTR_RO(name);
> +
> +static ssize_t available_instances_show(struct kobject *kobj,
> + struct device *dev, char *buf)
> +{
> + struct ap_matrix *ap_matrix;
> +
> + ap_matrix = to_ap_matrix(dev);
Move this with the declaration?
> +
> + return sprintf(buf, "%d\n", ap_matrix->available_instances);
> +}
> +
> +MDEV_TYPE_ATTR_RO(available_instances);
> +
> +static ssize_t device_api_show(struct kobject *kobj, struct device *dev,
> + char *buf)
> +{
> + return sprintf(buf, "%s\n", VFIO_DEVICE_API_AP_STRING);
> +}
> +
> +MDEV_TYPE_ATTR_RO(device_api);
> +
> +static struct attribute *vfio_ap_mdev_type_attrs[] = {
> + &mdev_type_attr_name.attr,
> + &mdev_type_attr_device_api.attr,
> + &mdev_type_attr_available_instances.attr,
> + NULL,
> +};
> +
> +static struct attribute_group vfio_ap_mdev_hwvirt_type_group = {
> + .name = VFOP_AP_MDEV_TYPE_HWVIRT,
> + .attrs = vfio_ap_mdev_type_attrs,
> +};
> +
> +static struct attribute_group *vfio_ap_mdev_type_groups[] = {
> + &vfio_ap_mdev_hwvirt_type_group,
> + NULL,
> +};
> +
> +static const struct mdev_parent_ops vfio_ap_matrix_ops = {
> + .owner = THIS_MODULE,
> + .supported_type_groups = vfio_ap_mdev_type_groups,
> + .create = vfio_ap_mdev_create,
> + .remove = vfio_ap_mdev_remove,
> +};
> +
> +int vfio_ap_mdev_register(struct ap_matrix *ap_matrix)
> +{
> + int ret;
> +
> + ret = mdev_register_device(&ap_matrix->device, &vfio_ap_matrix_ops);
> + if (ret)
> + return ret;
> +
> + ap_matrix->available_instances = AP_MATRIX_MAX_AVAILABLE_INSTANCES;
> +
> + return 0;
> +}
> +
> +void vfio_ap_mdev_unregister(struct ap_matrix *ap_matrix)
> +{
> + ap_matrix->available_instances--;
> + mdev_unregister_device(&ap_matrix->device);
> +}
> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
> index cf23675..afd8dbc 100644
> --- a/drivers/s390/crypto/vfio_ap_private.h
> +++ b/drivers/s390/crypto/vfio_ap_private.h
> @@ -10,14 +10,31 @@
> #define _VFIO_AP_PRIVATE_H_
>
> #include <linux/types.h>
> +#include <linux/device.h>
> +#include <linux/mdev.h>
>
> #include "ap_bus.h"
>
> #define VFIO_AP_MODULE_NAME "vfio_ap"
> #define VFIO_AP_DRV_NAME "vfio_ap"
> +/**
> + * There must be one mediated matrix device per guest. If every APQN is assigned
One, or at most one? Or one for every guest using ap devices?
> + * to a guest, then the maximum number of guests with a unique APQN assigned
> + * would be 255 adapters x 255 domains = 72351 guests.
> + */
> +#define AP_MATRIX_MAX_AVAILABLE_INSTANCES 72351
>
> struct ap_matrix {
> struct device device;
> + int available_instances;
> };
>
> +static inline struct ap_matrix *to_ap_matrix(struct device *dev)
> +{
> + return container_of(dev, struct ap_matrix, device);
> +}
> +
> +extern int vfio_ap_mdev_register(struct ap_matrix *ap_matrix);
> +extern void vfio_ap_mdev_unregister(struct ap_matrix *ap_matrix);
> +
> #endif /* _VFIO_AP_PRIVATE_H_ */
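Regarding the available_instances question above, a guarded version of
the create/remove pair could look roughly like the following. This is a
userspace sketch under stated assumptions: struct matrix_parent and both
functions are hypothetical, the -ENOSPC return value is just one plausible
choice, and any locking needed around the counter is left out.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for struct ap_matrix. */
struct matrix_parent {
	int available_instances;
};

/* Refuse to create a mediated device once no instances are left. */
static int matrix_mdev_create(struct matrix_parent *p)
{
	if (p->available_instances <= 0)
		return -ENOSPC;	/* assumption: error code chosen for the sketch */
	p->available_instances--;
	return 0;
}

/* Return an instance; max is the value the counter started with. */
static void matrix_mdev_remove(struct matrix_parent *p, int max)
{
	assert(p->available_instances < max);
	p->available_instances++;
}
```

This keeps the counter from going negative, so available_instances_show()
never reports a value below zero.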
On 05/16/2018 06:21 AM, Cornelia Huck wrote:
> On Mon, 7 May 2018 11:11:40 -0400
> Tony Krowiak <[email protected]> wrote:
>
>> Relocates an existing static function that tests whether
>> the AP extended addressing facility (APXA) is installed on
>> the linux host. The primary reason for relocating this
>> function is because a new compilation unit (arch/s390/kvm/kvm-ap.c)
>> is being created to contain all of the interfaces and logic
>> for configuring an AP matrix for a KVM guest. Some of its
>> functions will also need to determine whether APXA is installed,
>> so, let's go ahead and relocate this static function as a
>> public interface in kvm-ap.c.
>>
>> Notes:
>> ----
>> 1. The interface to determine whether APXA is installed on the linux
>> host uses the information returned from the AP Query Configuration
>> Information (QCI) function. This function will not be available
>> if the AP instructions are not installed on the linux host, so a check
>> will be included to verify that.
>>
>> 2. Currently, the AP bus interfaces accessing the AP instructions will
>> not be accessible if CONFIG_ZCRYPT=n, so the relevant code will be
>> temporarily contained in the new arch/s390/kvm/kvm-ap.c file until
>> the patch(es) to statically build the required AP bus interfaces are
>> available.
> Any ETA for those interfaces? Would be nice if we could avoid
> introducing temporary interfaces (but I'm certainly not opposing this
> patch).
I'll check with the developer.
>
>> Signed-off-by: Tony Krowiak <[email protected]>
>> ---
>> MAINTAINERS | 1 +
>> arch/s390/include/asm/kvm-ap.h | 60 +++++++++++++++++++++++++++++
>> arch/s390/kvm/Makefile | 2 +-
>> arch/s390/kvm/kvm-ap.c | 83 ++++++++++++++++++++++++++++++++++++++++
>> arch/s390/kvm/kvm-s390.c | 42 +-------------------
>> 5 files changed, 147 insertions(+), 41 deletions(-)
>> create mode 100644 arch/s390/include/asm/kvm-ap.h
>> create mode 100644 arch/s390/kvm/kvm-ap.c
>>
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index eab763f..224e97b 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -7792,6 +7792,7 @@ M: Christian Borntraeger <[email protected]>
>> M: Janosch Frank <[email protected]>
>> R: David Hildenbrand <[email protected]>
>> R: Cornelia Huck <[email protected]>
>> +R: Tony Krowiak <[email protected]>
> Don't you want to drop the 'vnet' from your address, as the vnet-less
> form seems to be the one that will continue working from what I've
> heard?
>
>> L: [email protected]
>> W: http://www.ibm.com/developerworks/linux/linux390/
>> T: git git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux.git
> No objection against this patch, although I still hope it will not be
> needed :)
>
On 05/16/2018 04:51 AM, Pierre Morel wrote:
> On 07/05/2018 17:11, Tony Krowiak wrote:
>> This patch refactors the code that initializes the crypto
>> configuration for a guest. The crypto configuration is contained in
>> a crypto control block (CRYCB) which is a satellite control block to
>> our main hardware virtualization control block. The CRYCB is
>> attached to the main virtualization control block via a CRYCB
>> designation (CRYCBD) field containing the address of
>> the CRYCB as well as its format.
>>
>> Prior to the introduction of AP device virtualization, there was
>> no need to provide access to or specify the format of the CRYCB for
>> a guest unless the MSA extension 3 (MSAX3) facility was installed
>> on the host system. With the introduction of AP device virtualization,
>> the CRYCB and its format must be made accessible to the guest
>> regardless of the presence of the MSAX3 facility as long as the
>> AP instructions are installed on the host.
>>
>> Signed-off-by: Tony Krowiak <[email protected]>
>> ---
>> arch/s390/include/asm/kvm_host.h | 1 +
>> arch/s390/kvm/kvm-s390.c | 64
>> ++++++++++++++++++++++++++-----------
>> 2 files changed, 46 insertions(+), 19 deletions(-)
>>
>> diff --git a/arch/s390/include/asm/kvm_host.h
>> b/arch/s390/include/asm/kvm_host.h
>> index 81cdb6b..5393c4d 100644
>> --- a/arch/s390/include/asm/kvm_host.h
>> +++ b/arch/s390/include/asm/kvm_host.h
>> @@ -255,6 +255,7 @@ struct kvm_s390_sie_block {
>> __u8 reservede4[4]; /* 0x00e4 */
>> __u64 tecmc; /* 0x00e8 */
>> __u8 reservedf0[12]; /* 0x00f0 */
>> +#define CRYCB_FORMAT_MASK 0x00000003
>> #define CRYCB_FORMAT1 0x00000001
>> #define CRYCB_FORMAT2 0x00000003
>> __u32 crycbd; /* 0x00fc */
>> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
>> index 1f50de7..99779a6 100644
>> --- a/arch/s390/kvm/kvm-s390.c
>> +++ b/arch/s390/kvm/kvm-s390.c
>> @@ -1875,14 +1875,35 @@ long kvm_arch_vm_ioctl(struct file *filp,
>> return r;
>> }
>>
>> -static void kvm_s390_set_crycb_format(struct kvm *kvm)
>> +/*
>> + * The format of the crypto control block (CRYCB) is specified in
>> the 3 low
>> + * order bits of the CRYCB designation (CRYCBD) field as follows:
>> + * Format 0: Neither the message security assist extension 3 (MSAX3)
>> nor the
>> + * AP extended addressing (APXA) facility are installed.
>> + * Format 1: The APXA facility is not installed but the MSAX3
>> facility is.
>> + * Format 2: Both the APXA and MSAX3 facilities are installed
>> + */
>> +static void kvm_s390_format_crycb(struct kvm *kvm)
>> {
>> - kvm->arch.crypto.crycbd = (__u32)(unsigned long)
>> kvm->arch.crypto.crycb;
>> + /* Clear the CRYCB format bits - i.e., set format 0 by default */
>> + kvm->arch.crypto.crycbd &= ~(CRYCB_FORMAT_MASK);
>> +
>> + /* Check whether MSAX3 is installed */
>> + if (!test_kvm_facility(kvm, 76))
>> + return;
>>
>> if (kvm_ap_apxa_installed())
>> kvm->arch.crypto.crycbd |= CRYCB_FORMAT2;
>> else
>> kvm->arch.crypto.crycbd |= CRYCB_FORMAT1;
>> +
>> + /* Enable AES/DEA protected key functions by default */
>> + kvm->arch.crypto.aes_kw = 1;
>> + kvm->arch.crypto.dea_kw = 1;
>> + get_random_bytes(kvm->arch.crypto.crycb->aes_wrapping_key_mask,
>> + sizeof(kvm->arch.crypto.crycb->aes_wrapping_key_mask));
>> + get_random_bytes(kvm->arch.crypto.crycb->dea_wrapping_key_mask,
>> + sizeof(kvm->arch.crypto.crycb->dea_wrapping_key_mask));
>> }
>>
>> static u64 kvm_s390_get_initial_cpuid(void)
>> @@ -1896,19 +1917,17 @@ static u64 kvm_s390_get_initial_cpuid(void)
>>
>> static void kvm_s390_crypto_init(struct kvm *kvm)
>> {
>> - if (!test_kvm_facility(kvm, 76))
>> + /*
>> + * If neither the AP instructions nor the message security assist
>> + * extension 3 (MSAX3) are installed, there is no need to
>> initialize a
>> + * crypto control block (CRYCB) for the guest.
>> + */
>> + if (!kvm_ap_instructions_available() && !test_kvm_facility(kvm,
>> 76))
>> return;
>>
>> kvm->arch.crypto.crycb = &kvm->arch.sie_page2->crycb;
>> - kvm_s390_set_crycb_format(kvm);
>
>
> From my point of view, the whole patch can be reduced to moving this
> call (kvm_s390_set_crycb_format(kvm);) before the test for facility 76
>
> (and setting the format correctly in kvm_s390_set_crycb_format(kvm)).
I don't see what that buys us; it would just reshuffle the logic.
The idea here is that all of the code related to formatting the CRYCB for
use by the guest is contained in the kvm_s390_format_crycb(kvm) function.
We don't need a CRYCB, however, if the AP instructions are not installed
and the MSAX3 facility is not installed, so why even call
kvm_s390_format_crycb(kvm) in that case?
>
>
>
>> -
>> - /* Enable AES/DEA protected key functions by default */
>> - kvm->arch.crypto.aes_kw = 1;
>> - kvm->arch.crypto.dea_kw = 1;
>> - get_random_bytes(kvm->arch.crypto.crycb->aes_wrapping_key_mask,
>> - sizeof(kvm->arch.crypto.crycb->aes_wrapping_key_mask));
>> - get_random_bytes(kvm->arch.crypto.crycb->dea_wrapping_key_mask,
>> - sizeof(kvm->arch.crypto.crycb->dea_wrapping_key_mask));
>> + kvm->arch.crypto.crycbd = (__u32)(unsigned long)
>> kvm->arch.crypto.crycb;
>> + kvm_s390_format_crycb(kvm);
>> }
>>
>> static void sca_dispose(struct kvm *kvm)
>> @@ -2430,17 +2449,24 @@ void kvm_arch_vcpu_postcreate(struct kvm_vcpu
>> *vcpu)
>>
>> static void kvm_s390_vcpu_crypto_setup(struct kvm_vcpu *vcpu)
>> {
>> - if (!test_kvm_facility(vcpu->kvm, 76))
>> + /*
>> + * If a crypto control block designation (CRYCBD) has not been
>> + * initialized
>> + */
>> + if (vcpu->kvm->arch.crypto.crycbd == 0)
>> return;
>>
>> - vcpu->arch.sie_block->ecb3 &= ~(ECB3_AES | ECB3_DEA);
>> + vcpu->arch.sie_block->crycbd = vcpu->kvm->arch.crypto.crycbd;
>>
>> - if (vcpu->kvm->arch.crypto.aes_kw)
>> - vcpu->arch.sie_block->ecb3 |= ECB3_AES;
>> - if (vcpu->kvm->arch.crypto.dea_kw)
>> - vcpu->arch.sie_block->ecb3 |= ECB3_DEA;
>> + /* If MSAX3 is installed, set up protected key support */
>> + if (test_kvm_facility(vcpu->kvm, 76)) {
>> + vcpu->arch.sie_block->ecb3 &= ~(ECB3_AES | ECB3_DEA);
>>
>> - vcpu->arch.sie_block->crycbd = vcpu->kvm->arch.crypto.crycbd;
>> + if (vcpu->kvm->arch.crypto.aes_kw)
>> + vcpu->arch.sie_block->ecb3 |= ECB3_AES;
>> + if (vcpu->kvm->arch.crypto.dea_kw)
>> + vcpu->arch.sie_block->ecb3 |= ECB3_DEA;
>> + }
>> }
>>
>> void kvm_s390_vcpu_unsetup_cmma(struct kvm_vcpu *vcpu)
>
>
On 05/16/2018 04:21 AM, Pierre Morel wrote:
> On 07/05/2018 17:11, Tony Krowiak wrote:
>> Introduces a new AP device driver. This device driver
>> is built on the VFIO mediated device framework. The framework
>> provides sysfs interfaces that facilitate passthrough
>> access by guests to devices installed on the linux host.
> ...snip...
>> +static int vfio_ap_queue_dev_probe(struct ap_device *apdev)
>> +{
>
> You should take care of the ap devices when they are added or removed
> from the matrix.
> I suggest you add a remove callback to avoid unbinding a queue while
> it is assigned to a guest.
That makes sense, will do.
>
>
>> + return 0;
>> +}
>> +
>> +static void vfio_ap_matrix_dev_release(struct device *dev)
>> +{
>> + struct ap_matrix *ap_matrix = dev_get_drvdata(dev);
>> +
>> + kfree(ap_matrix);
>> +}
> ...snip...
>
On 05/16/2018 04:21 AM, Pierre Morel wrote:
> On 07/05/2018 17:11, Tony Krowiak wrote:
>> Introduces a new AP device driver. This device driver
>> is built on the VFIO mediated device framework. The framework
>> provides sysfs interfaces that facilitate passthrough
>> access by guests to devices installed on the linux host.
> ...snip...
>> +static int vfio_ap_queue_dev_probe(struct ap_device *apdev)
>> +{
>
> You should take care of the ap devices when they are added or removed
> from the matrix.
> I suggest you add a remove callback to avoid unbinding a queue while
> it is assigned to a guest.
This is not possible without a change to the AP bus. The remove callback
returns void, so there is no way to indicate to the AP bus not to remove
the queue device. I'll talk to Harald about this.
>
>
>> + return 0;
>> +}
>> +
>> +static void vfio_ap_matrix_dev_release(struct device *dev)
>> +{
>> + struct ap_matrix *ap_matrix = dev_get_drvdata(dev);
>> +
>> + kfree(ap_matrix);
>> +}
> ...snip...
>
On 16/05/2018 13:14, Tony Krowiak wrote:
> On 05/16/2018 04:51 AM, Pierre Morel wrote:
>> On 07/05/2018 17:11, Tony Krowiak wrote:
>>> This patch refactors the code that initializes the crypto
>>> configuration for a guest. The crypto configuration is contained in
>>> a crypto control block (CRYCB) which is a satellite control block to
>>> our main hardware virtualization control block. The CRYCB is
>>> attached to the main virtualization control block via a CRYCB
>>> designation (CRYCBD) field containing the address of
>>> the CRYCB as well as its format.
>>>
>>> Prior to the introduction of AP device virtualization, there was
>>> no need to provide access to or specify the format of the CRYCB for
>>> a guest unless the MSA extension 3 (MSAX3) facility was installed
>>> on the host system. With the introduction of AP device virtualization,
>>> the CRYCB and its format must be made accessible to the guest
>>> regardless of the presence of the MSAX3 facility as long as the
>>> AP instructions are installed on the host.
>>>
>>> Signed-off-by: Tony Krowiak <[email protected]>
>>> ---
>>> arch/s390/include/asm/kvm_host.h | 1 +
>>> arch/s390/kvm/kvm-s390.c | 64
>>> ++++++++++++++++++++++++++-----------
>>> 2 files changed, 46 insertions(+), 19 deletions(-)
>>>
>>> diff --git a/arch/s390/include/asm/kvm_host.h
>>> b/arch/s390/include/asm/kvm_host.h
>>> index 81cdb6b..5393c4d 100644
>>> --- a/arch/s390/include/asm/kvm_host.h
>>> +++ b/arch/s390/include/asm/kvm_host.h
>>> @@ -255,6 +255,7 @@ struct kvm_s390_sie_block {
>>> __u8 reservede4[4]; /* 0x00e4 */
>>> __u64 tecmc; /* 0x00e8 */
>>> __u8 reservedf0[12]; /* 0x00f0 */
>>> +#define CRYCB_FORMAT_MASK 0x00000003
>>> #define CRYCB_FORMAT1 0x00000001
>>> #define CRYCB_FORMAT2 0x00000003
>>> __u32 crycbd; /* 0x00fc */
>>> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
>>> index 1f50de7..99779a6 100644
>>> --- a/arch/s390/kvm/kvm-s390.c
>>> +++ b/arch/s390/kvm/kvm-s390.c
>>> @@ -1875,14 +1875,35 @@ long kvm_arch_vm_ioctl(struct file *filp,
>>> return r;
>>> }
>>>
>>> -static void kvm_s390_set_crycb_format(struct kvm *kvm)
>>> +/*
>>> + * The format of the crypto control block (CRYCB) is specified in
>>> the 3 low
>>> + * order bits of the CRYCB designation (CRYCBD) field as follows:
>>> + * Format 0: Neither the message security assist extension 3
>>> (MSAX3) nor the
>>> + * AP extended addressing (APXA) facility are installed.
>>> + * Format 1: The APXA facility is not installed but the MSAX3
>>> facility is.
>>> + * Format 2: Both the APXA and MSAX3 facilities are installed
>>> + */
>>> +static void kvm_s390_format_crycb(struct kvm *kvm)
>>> {
>>> - kvm->arch.crypto.crycbd = (__u32)(unsigned long)
>>> kvm->arch.crypto.crycb;
>>> + /* Clear the CRYCB format bits - i.e., set format 0 by default */
>>> + kvm->arch.crypto.crycbd &= ~(CRYCB_FORMAT_MASK);
>>> +
>>> + /* Check whether MSAX3 is installed */
>>> + if (!test_kvm_facility(kvm, 76))
>>> + return;
>>>
>>> if (kvm_ap_apxa_installed())
>>> kvm->arch.crypto.crycbd |= CRYCB_FORMAT2;
>>> else
>>> kvm->arch.crypto.crycbd |= CRYCB_FORMAT1;
>>> +
>>> + /* Enable AES/DEA protected key functions by default */
>>> + kvm->arch.crypto.aes_kw = 1;
>>> + kvm->arch.crypto.dea_kw = 1;
>>> + get_random_bytes(kvm->arch.crypto.crycb->aes_wrapping_key_mask,
>>> + sizeof(kvm->arch.crypto.crycb->aes_wrapping_key_mask));
>>> + get_random_bytes(kvm->arch.crypto.crycb->dea_wrapping_key_mask,
>>> + sizeof(kvm->arch.crypto.crycb->dea_wrapping_key_mask));
>>> }
>>>
>>> static u64 kvm_s390_get_initial_cpuid(void)
>>> @@ -1896,19 +1917,17 @@ static u64 kvm_s390_get_initial_cpuid(void)
>>>
>>> static void kvm_s390_crypto_init(struct kvm *kvm)
>>> {
>>> - if (!test_kvm_facility(kvm, 76))
>>> + /*
>>> + * If neither the AP instructions nor the message security assist
>>> + * extension 3 (MSAX3) are installed, there is no need to
>>> initialize a
>>> + * crypto control block (CRYCB) for the guest.
>>> + */
>>> + if (!kvm_ap_instructions_available() && !test_kvm_facility(kvm,
>>> 76))
>>> return;
>>>
>>> kvm->arch.crypto.crycb = &kvm->arch.sie_page2->crycb;
>>> - kvm_s390_set_crycb_format(kvm);
>>
>>
>> From my point of view the whole patch can be reduced to putting this
>> call (kvm_s390_set_crycb_format(kvm);) before testing for facility 76.
>>
>> (and setting the format correctly in kvm_s390_set_crycb_format(kvm))
>
> I don't see what that buys us; it will just be reshuffling of the logic.
> The idea here is that all of the code related to formatting the CRYCB for
> use by the guest is contained in the kvm_s390_format_crycb(kvm) function.
> We don't need a CRYCB, however, if the AP instructions are not installed
> and the MSAX3 facility is not installed, so why even call
> kvm_s390_format_crycb(kvm) in that case?
It buys a lot of lines.
I mean that you can do exactly the same with only 3 inserted lines
instead of 65 changes.
No logic change.
>
>>
>>
>>
>>> -
>>> - /* Enable AES/DEA protected key functions by default */
>>> - kvm->arch.crypto.aes_kw = 1;
>>> - kvm->arch.crypto.dea_kw = 1;
>>> - get_random_bytes(kvm->arch.crypto.crycb->aes_wrapping_key_mask,
>>> - sizeof(kvm->arch.crypto.crycb->aes_wrapping_key_mask));
>>> - get_random_bytes(kvm->arch.crypto.crycb->dea_wrapping_key_mask,
>>> - sizeof(kvm->arch.crypto.crycb->dea_wrapping_key_mask));
>>> + kvm->arch.crypto.crycbd = (__u32)(unsigned long)
>>> kvm->arch.crypto.crycb;
>>> + kvm_s390_format_crycb(kvm);
>>> }
>>>
>>> static void sca_dispose(struct kvm *kvm)
>>> @@ -2430,17 +2449,24 @@ void kvm_arch_vcpu_postcreate(struct
>>> kvm_vcpu *vcpu)
>>>
>>> static void kvm_s390_vcpu_crypto_setup(struct kvm_vcpu *vcpu)
>>> {
>>> - if (!test_kvm_facility(vcpu->kvm, 76))
>>> + /*
>>> + * If a crypto control block designation (CRYCBD) has not been
>>> + * initialized
>>> + */
>>> + if (vcpu->kvm->arch.crypto.crycbd == 0)
>>> return;
>>>
>>> - vcpu->arch.sie_block->ecb3 &= ~(ECB3_AES | ECB3_DEA);
>>> + vcpu->arch.sie_block->crycbd = vcpu->kvm->arch.crypto.crycbd;
>>>
>>> - if (vcpu->kvm->arch.crypto.aes_kw)
>>> - vcpu->arch.sie_block->ecb3 |= ECB3_AES;
>>> - if (vcpu->kvm->arch.crypto.dea_kw)
>>> - vcpu->arch.sie_block->ecb3 |= ECB3_DEA;
>>> + /* If MSAX3 is installed, set up protected key support */
>>> + if (test_kvm_facility(vcpu->kvm, 76)) {
>>> + vcpu->arch.sie_block->ecb3 &= ~(ECB3_AES | ECB3_DEA);
>>>
>>> - vcpu->arch.sie_block->crycbd = vcpu->kvm->arch.crypto.crycbd;
>>> + if (vcpu->kvm->arch.crypto.aes_kw)
>>> + vcpu->arch.sie_block->ecb3 |= ECB3_AES;
>>> + if (vcpu->kvm->arch.crypto.dea_kw)
>>> + vcpu->arch.sie_block->ecb3 |= ECB3_DEA;
>>> + }
>>> }
>>>
>>> void kvm_s390_vcpu_unsetup_cmma(struct kvm_vcpu *vcpu)
>>
>>
>
--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany
On Wed, 16 May 2018 14:17:52 +0200
Pierre Morel <[email protected]> wrote:
> On 16/05/2018 13:14, Tony Krowiak wrote:
> > On 05/16/2018 04:51 AM, Pierre Morel wrote:
> >> On 07/05/2018 17:11, Tony Krowiak wrote:
> >>> @@ -1896,19 +1917,17 @@ static u64 kvm_s390_get_initial_cpuid(void)
> >>>
> >>> static void kvm_s390_crypto_init(struct kvm *kvm)
> >>> {
> >>> - if (!test_kvm_facility(kvm, 76))
> >>> + /*
> >>> + * If neither the AP instructions nor the message security assist
> >>> + * extension 3 (MSAX3) are installed, there is no need to
> >>> initialize a
> >>> + * crypto control block (CRYCB) for the guest.
> >>> + */
> >>> + if (!kvm_ap_instructions_available() && !test_kvm_facility(kvm,
> >>> 76))
> >>> return;
> >>>
> >>> kvm->arch.crypto.crycb = &kvm->arch.sie_page2->crycb;
> >>> - kvm_s390_set_crycb_format(kvm);
> >>
> >>
> >> From my point of view the whole patch can be reduced to putting this
> >> call (kvm_s390_set_crycb_format(kvm);) before testing for facility 76.
> >>
> >> (and setting the format correctly in kvm_s390_set_crycb_format(kvm))
> >
> > I don't see what that buys us; it will just be reshuffling of the logic.
> > The idea here is that all of the code related to formatting the CRYCB for
> > use by the guest is contained in the kvm_s390_format_crycb(kvm) function.
> > We don't need a CRYCB, however, if the AP instructions are not installed
> > and the MSAX3 facility is not installed, so why even call
> > kvm_s390_format_crycb(kvm) in that case?
>
> It buys a lot of lines.
> I mean that you can do exactly the same with only 3 inserted lines
> instead of 65 changes.
> No logic change.
Sounds like a winner from my POV :)
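
For reference, the format selection performed by the quoted
kvm_s390_format_crycb() hunk can be modelled in userspace roughly as
follows. This is only a sketch: model_crycbd() and its boolean parameters
are hypothetical stand-ins for the kernel's facility tests, and the
alignment assertion is an assumption of the model, not something taken
from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the quoted kvm_host.h hunk. */
#define CRYCB_FORMAT_MASK 0x00000003u
#define CRYCB_FORMAT1     0x00000001u
#define CRYCB_FORMAT2     0x00000003u

/*
 * Hypothetical model: build a CRYCB designation from the CRYCB origin and
 * two facility flags. msax3 stands in for test_kvm_facility(kvm, 76) and
 * apxa for kvm_ap_apxa_installed().
 */
static uint32_t model_crycbd(uint32_t crycb_origin, bool msax3, bool apxa)
{
	uint32_t crycbd = crycb_origin;

	/* Model assumption: the origin leaves the format bits free. */
	assert((crycb_origin & CRYCB_FORMAT_MASK) == 0);

	/* Format 0 by default: clear the format bits. */
	crycbd &= ~CRYCB_FORMAT_MASK;

	/* Without MSAX3 the quoted code stays at format 0. */
	if (!msax3)
		return crycbd;

	/* With MSAX3: format 2 if APXA is installed, otherwise format 1. */
	crycbd |= apxa ? CRYCB_FORMAT2 : CRYCB_FORMAT1;
	return crycbd;
}
```

Note that in this model, as in the quoted hunk, APXA without MSAX3 still
yields format 0.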
On 05/16/2018 06:42 AM, Cornelia Huck wrote:
> On Mon, 7 May 2018 11:11:44 -0400
> Tony Krowiak <[email protected]> wrote:
>
>> Registers the matrix device created by the VFIO AP device
>> driver with the VFIO mediated device framework.
>> Registering the matrix device will create the sysfs
>> structures needed to create mediated matrix devices
>> each of which will be used to configure the AP matrix
>> for a guest and connect it to the VFIO AP device driver.
>>
>> Registering the matrix device with the VFIO mediated device
>> framework will create the following sysfs structures:
>>
>> /sys/devices/vfio_ap
>> ... [matrix]
>> ...... [mdev_supported_types]
>> ......... [vfio_ap-passthrough]
>> ............ create
>>
>> To create a mediated device for the AP matrix device, write a UUID
>> to the create file:
>>
>> uuidgen > create
>>
>> A symbolic link to the mediated device's directory will be created in the
>> devices subdirectory named after the generated $uuid:
>>
>> /sys/devices/vfio_ap
>> ... [matrix]
>> ...... [mdev_supported_types]
>> ......... [vfio_ap-passthrough]
>> ............ [devices]
>> ............... [$uuid]
>>
>> Signed-off-by: Tony Krowiak <[email protected]>
>> ---
>> MAINTAINERS | 1 +
>> drivers/s390/crypto/Makefile | 2 +-
>> drivers/s390/crypto/vfio_ap_drv.c | 9 +++
>> drivers/s390/crypto/vfio_ap_ops.c | 106 +++++++++++++++++++++++++++++++++
>> drivers/s390/crypto/vfio_ap_private.h | 17 +++++
>> 5 files changed, 134 insertions(+), 1 deletions(-)
>> create mode 100644 drivers/s390/crypto/vfio_ap_ops.c
>
>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>> new file mode 100644
>> index 0000000..d7d36fb
>> --- /dev/null
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -0,0 +1,106 @@
>> +// SPDX-License-Identifier: GPL-2.0+
>> +/*
>> + * Adjunct processor matrix VFIO device driver callbacks.
>> + *
>> + * Copyright IBM Corp. 2017
> Should be '2018' (also in some other files in this series; please
> double check.)
Gee, I thought I got all of these fixed. I must have a gremlin in my
laptop.
>
>> + * Author(s): Tony Krowiak <[email protected]>
>> + *
>> + */
>> +#include <linux/string.h>
>> +#include <linux/vfio.h>
>> +#include <linux/device.h>
>> +#include <linux/list.h>
>> +#include <linux/ctype.h>
>> +
>> +#include "vfio_ap_private.h"
>> +
>> +#define VFOP_AP_MDEV_TYPE_HWVIRT "passthrough"
>> +#define VFIO_AP_MDEV_NAME_HWVIRT "VFIO AP Passthrough Device"
>> +
>> +static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
>> +{
>> + struct ap_matrix *ap_matrix = to_ap_matrix(mdev_parent_dev(mdev));
>> +
>> + ap_matrix->available_instances--;
> Shouldn't the code check whether available_instances is actually > 0?
It is my understanding that once the available_instances hits zero, the
mdev framework will not allow any more mediated devices to be created, so
the value should always be greater than zero when this function is invoked.
I did an experiment to verify my understanding. I initialized the
available_instances to 1. I was able to create 2 mediated devices. It seems
that the framework refuses to create a mediated device only after the
available_instances sysfs attribute is a negative number, so I have two
choices: initialize available_instances to one less than desired, or add
the check you suggested. I think I'll go with the latter.
>
>> +
>> + return 0;
>> +}
>> +
>> +static int vfio_ap_mdev_remove(struct mdev_device *mdev)
>> +{
>> + struct ap_matrix *ap_matrix = to_ap_matrix(mdev_parent_dev(mdev));
>> +
>> + ap_matrix->available_instances++;
>> +
>> + return 0;
>> +}
>> +
>> +static ssize_t name_show(struct kobject *kobj, struct device *dev, char *buf)
>> +{
>> + return sprintf(buf, "%s\n", VFIO_AP_MDEV_NAME_HWVIRT);
>> +}
>> +
>> +MDEV_TYPE_ATTR_RO(name);
>> +
>> +static ssize_t available_instances_show(struct kobject *kobj,
>> + struct device *dev, char *buf)
>> +{
>> + struct ap_matrix *ap_matrix;
>> +
>> + ap_matrix = to_ap_matrix(dev);
> Move this with the declaration?
>
>> +
>> + return sprintf(buf, "%d\n", ap_matrix->available_instances);
>> +}
>> +
>> +MDEV_TYPE_ATTR_RO(available_instances);
>> +
>> +static ssize_t device_api_show(struct kobject *kobj, struct device *dev,
>> + char *buf)
>> +{
>> + return sprintf(buf, "%s\n", VFIO_DEVICE_API_AP_STRING);
>> +}
>> +
>> +MDEV_TYPE_ATTR_RO(device_api);
>> +
>> +static struct attribute *vfio_ap_mdev_type_attrs[] = {
>> + &mdev_type_attr_name.attr,
>> + &mdev_type_attr_device_api.attr,
>> + &mdev_type_attr_available_instances.attr,
>> + NULL,
>> +};
>> +
>> +static struct attribute_group vfio_ap_mdev_hwvirt_type_group = {
>> + .name = VFOP_AP_MDEV_TYPE_HWVIRT,
>> + .attrs = vfio_ap_mdev_type_attrs,
>> +};
>> +
>> +static struct attribute_group *vfio_ap_mdev_type_groups[] = {
>> + &vfio_ap_mdev_hwvirt_type_group,
>> + NULL,
>> +};
>> +
>> +static const struct mdev_parent_ops vfio_ap_matrix_ops = {
>> + .owner = THIS_MODULE,
>> + .supported_type_groups = vfio_ap_mdev_type_groups,
>> + .create = vfio_ap_mdev_create,
>> + .remove = vfio_ap_mdev_remove,
>> +};
>> +
>> +int vfio_ap_mdev_register(struct ap_matrix *ap_matrix)
>> +{
>> + int ret;
>> +
>> + ret = mdev_register_device(&ap_matrix->device, &vfio_ap_matrix_ops);
>> + if (ret)
>> + return ret;
>> +
>> + ap_matrix->available_instances = AP_MATRIX_MAX_AVAILABLE_INSTANCES;
>> +
>> + return 0;
>> +}
>> +
>> +void vfio_ap_mdev_unregister(struct ap_matrix *ap_matrix)
>> +{
>> + ap_matrix->available_instances--;
>> + mdev_unregister_device(&ap_matrix->device);
>> +}
>> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
>> index cf23675..afd8dbc 100644
>> --- a/drivers/s390/crypto/vfio_ap_private.h
>> +++ b/drivers/s390/crypto/vfio_ap_private.h
>> @@ -10,14 +10,31 @@
>> #define _VFIO_AP_PRIVATE_H_
>>
>> #include <linux/types.h>
>> +#include <linux/device.h>
>> +#include <linux/mdev.h>
>>
>> #include "ap_bus.h"
>>
>> #define VFIO_AP_MODULE_NAME "vfio_ap"
>> #define VFIO_AP_DRV_NAME "vfio_ap"
>> +/**
>> + * There must be one mediated matrix device per guest. If every APQN is assigned
> One, or at most one? Or one for every guest using ap devices?
>
>> + * to a guest, then the maximum number of guests with a unique APQN assigned
>> + * would be 255 adapters x 255 domains = 72351 guests.
>> + */
>> +#define AP_MATRIX_MAX_AVAILABLE_INSTANCES 72351
>>
>> struct ap_matrix {
>> struct device device;
>> + int available_instances;
>> };
>>
>> +static inline struct ap_matrix *to_ap_matrix(struct device *dev)
>> +{
>> + return container_of(dev, struct ap_matrix, device);
>> +}
>> +
>> +extern int vfio_ap_mdev_register(struct ap_matrix *ap_matrix);
>> +extern void vfio_ap_mdev_unregister(struct ap_matrix *ap_matrix);
>> +
>> #endif /* _VFIO_AP_PRIVATE_H_ */
On 05/16/2018 06:42 AM, Cornelia Huck wrote:
> On Mon, 7 May 2018 11:11:44 -0400
> Tony Krowiak <[email protected]> wrote:
>
>> Registers the matrix device created by the VFIO AP device
>> driver with the VFIO mediated device framework.
>> Registering the matrix device will create the sysfs
>> structures needed to create mediated matrix devices
>> each of which will be used to configure the AP matrix
>> for a guest and connect it to the VFIO AP device driver.
>>
>> Registering the matrix device with the VFIO mediated device
>> framework will create the following sysfs structures:
>>
>> /sys/devices/vfio_ap
>> ... [matrix]
>> ...... [mdev_supported_types]
>> ......... [vfio_ap-passthrough]
>> ............ create
>>
>> To create a mediated device for the AP matrix device, write a UUID
>> to the create file:
>>
>> uuidgen > create
>>
>> A symbolic link to the mediated device's directory will be created in the
>> devices subdirectory named after the generated $uuid:
>>
>> /sys/devices/vfio_ap
>> ... [matrix]
>> ...... [mdev_supported_types]
>> ......... [vfio_ap-passthrough]
>> ............ [devices]
>> ............... [$uuid]
>>
>> Signed-off-by: Tony Krowiak <[email protected]>
>> ---
>> MAINTAINERS | 1 +
>> drivers/s390/crypto/Makefile | 2 +-
>> drivers/s390/crypto/vfio_ap_drv.c | 9 +++
>> drivers/s390/crypto/vfio_ap_ops.c | 106 +++++++++++++++++++++++++++++++++
>> drivers/s390/crypto/vfio_ap_private.h | 17 +++++
>> 5 files changed, 134 insertions(+), 1 deletions(-)
>> create mode 100644 drivers/s390/crypto/vfio_ap_ops.c
>
>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>> new file mode 100644
>> index 0000000..d7d36fb
>> --- /dev/null
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -0,0 +1,106 @@
>> +// SPDX-License-Identifier: GPL-2.0+
>> +/*
>> + * Adjunct processor matrix VFIO device driver callbacks.
>> + *
>> + * Copyright IBM Corp. 2017
> Should be '2018' (also in some other files in this series; please
> double check.)
>
>> + * Author(s): Tony Krowiak <[email protected]>
>> + *
>> + */
>> +#include <linux/string.h>
>> +#include <linux/vfio.h>
>> +#include <linux/device.h>
>> +#include <linux/list.h>
>> +#include <linux/ctype.h>
>> +
>> +#include "vfio_ap_private.h"
>> +
>> +#define VFOP_AP_MDEV_TYPE_HWVIRT "passthrough"
>> +#define VFIO_AP_MDEV_NAME_HWVIRT "VFIO AP Passthrough Device"
>> +
>> +static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
>> +{
>> + struct ap_matrix *ap_matrix = to_ap_matrix(mdev_parent_dev(mdev));
>> +
>> + ap_matrix->available_instances--;
> Shouldn't the code check whether available_instances is actually > 0?
Disregard my last response to this, I'm an idiot; it is the return of a
negative number from this function that causes the create to fail.
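
A minimal userspace sketch of that check, assuming a simplified stand-in
for struct ap_matrix. The names model_mdev_create() and
model_mdev_remove() and the -EPERM error code are illustrative only and
are not taken from the patch:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the patch's struct ap_matrix. */
struct ap_matrix_model {
	int available_instances;
};

/*
 * Model of the create callback: the negative return value is what makes
 * the create fail; the counter itself is only bookkeeping.
 */
static int model_mdev_create(struct ap_matrix_model *m)
{
	assert(m != NULL);

	if (m->available_instances <= 0)
		return -EPERM;	/* error code chosen for illustration */

	m->available_instances--;
	return 0;
}

/* Model of the remove callback: hand the instance back. */
static int model_mdev_remove(struct ap_matrix_model *m)
{
	assert(m != NULL);

	m->available_instances++;
	return 0;
}
```

With available_instances initialized to 1, the first create succeeds and
the second one is refused without driving the counter negative.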
>
>> +
>> + return 0;
>> +}
>> +
>> +static int vfio_ap_mdev_remove(struct mdev_device *mdev)
>> +{
>> + struct ap_matrix *ap_matrix = to_ap_matrix(mdev_parent_dev(mdev));
>> +
>> + ap_matrix->available_instances++;
>> +
>> + return 0;
>> +}
>> +
>> +static ssize_t name_show(struct kobject *kobj, struct device *dev, char *buf)
>> +{
>> + return sprintf(buf, "%s\n", VFIO_AP_MDEV_NAME_HWVIRT);
>> +}
>> +
>> +MDEV_TYPE_ATTR_RO(name);
>> +
>> +static ssize_t available_instances_show(struct kobject *kobj,
>> + struct device *dev, char *buf)
>> +{
>> + struct ap_matrix *ap_matrix;
>> +
>> + ap_matrix = to_ap_matrix(dev);
> Move this with the declaration?
Okay, will do.
>
>> +
>> + return sprintf(buf, "%d\n", ap_matrix->available_instances);
>> +}
>> +
>> +MDEV_TYPE_ATTR_RO(available_instances);
>> +
>> +static ssize_t device_api_show(struct kobject *kobj, struct device *dev,
>> + char *buf)
>> +{
>> + return sprintf(buf, "%s\n", VFIO_DEVICE_API_AP_STRING);
>> +}
>> +
>> +MDEV_TYPE_ATTR_RO(device_api);
>> +
>> +static struct attribute *vfio_ap_mdev_type_attrs[] = {
>> + &mdev_type_attr_name.attr,
>> + &mdev_type_attr_device_api.attr,
>> + &mdev_type_attr_available_instances.attr,
>> + NULL,
>> +};
>> +
>> +static struct attribute_group vfio_ap_mdev_hwvirt_type_group = {
>> + .name = VFOP_AP_MDEV_TYPE_HWVIRT,
>> + .attrs = vfio_ap_mdev_type_attrs,
>> +};
>> +
>> +static struct attribute_group *vfio_ap_mdev_type_groups[] = {
>> + &vfio_ap_mdev_hwvirt_type_group,
>> + NULL,
>> +};
>> +
>> +static const struct mdev_parent_ops vfio_ap_matrix_ops = {
>> + .owner = THIS_MODULE,
>> + .supported_type_groups = vfio_ap_mdev_type_groups,
>> + .create = vfio_ap_mdev_create,
>> + .remove = vfio_ap_mdev_remove,
>> +};
>> +
>> +int vfio_ap_mdev_register(struct ap_matrix *ap_matrix)
>> +{
>> + int ret;
>> +
>> + ret = mdev_register_device(&ap_matrix->device, &vfio_ap_matrix_ops);
>> + if (ret)
>> + return ret;
>> +
>> + ap_matrix->available_instances = AP_MATRIX_MAX_AVAILABLE_INSTANCES;
>> +
>> + return 0;
>> +}
>> +
>> +void vfio_ap_mdev_unregister(struct ap_matrix *ap_matrix)
>> +{
>> + ap_matrix->available_instances--;
>> + mdev_unregister_device(&ap_matrix->device);
>> +}
>> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
>> index cf23675..afd8dbc 100644
>> --- a/drivers/s390/crypto/vfio_ap_private.h
>> +++ b/drivers/s390/crypto/vfio_ap_private.h
>> @@ -10,14 +10,31 @@
>> #define _VFIO_AP_PRIVATE_H_
>>
>> #include <linux/types.h>
>> +#include <linux/device.h>
>> +#include <linux/mdev.h>
>>
>> #include "ap_bus.h"
>>
>> #define VFIO_AP_MODULE_NAME "vfio_ap"
>> #define VFIO_AP_DRV_NAME "vfio_ap"
>> +/**
>> + * There must be one mediated matrix device per guest. If every APQN is assigned
> One, or at most one? Or one for every guest using ap devices?
'One mediated matrix device for every guest using AP devices' is most
accurate. I'll make the change.
>
>> + * to a guest, then the maximum number of guests with a unique APQN assigned
>> + * would be 255 adapters x 255 domains = 72351 guests.
>> + */
>> +#define AP_MATRIX_MAX_AVAILABLE_INSTANCES 72351
>>
>> struct ap_matrix {
>> struct device device;
>> + int available_instances;
>> };
>>
>> +static inline struct ap_matrix *to_ap_matrix(struct device *dev)
>> +{
>> + return container_of(dev, struct ap_matrix, device);
>> +}
>> +
>> +extern int vfio_ap_mdev_register(struct ap_matrix *ap_matrix);
>> +extern void vfio_ap_mdev_unregister(struct ap_matrix *ap_matrix);
>> +
>> #endif /* _VFIO_AP_PRIVATE_H_ */
On 05/16/2018 03:48 AM, Pierre Morel wrote:
> On 15/05/2018 18:07, Tony Krowiak wrote:
>> On 05/15/2018 10:55 AM, Pierre Morel wrote:
>>> On 07/05/2018 17:11, Tony Krowiak wrote:
>>>> Provides interfaces to manage the AP adapters, usage domains
>>>> and control domains assigned to a KVM guest.
>>>>
>>>> The guest's SIE state description has a satellite structure called the
>>>> Crypto Control Block (CRYCB) containing three bitmask fields
>>>> identifying the adapters, queues (domains) and control domains
>>>> assigned to the KVM guest:
>>>>
> ...snip...
>>>> +}
>>>
>>> This function (ap_validate_queue_sharing) only verifies that VMs
>>> don't share queues.
>>> What about the queues used by a host application?
>>
>> How can that be verified from this function? I suppose I could put a
>> check in here to
>> verify that the queues are reserved by the vfio_ap device driver, but
>> that would
>> be redundant because an AP queue can not be assigned to a mediated
>> matrix device
>> via its sysfs attributes unless it is reserved by the vfio_ap device
>> driver (see
>> patches 7, 8 and 9).
>>
>>>
>>>
>>> I understand that you want to implement these checks within KVM but
>>> this is
>>> related to which queue devices are bound to the matrix and which one
>>> are not.
>>
>> See my comments above and below about AP queue assignment to the
>> mediated matrix
>> device. The one verification we can't do when the devices are
>> assigned is whether
>> another guest is using the queue because assignment occurs before the
>> guest using
>> the queue is started in which case we have no access to KVM. It makes
>> no sense to
>> do so at assignment time anyway because it doesn't matter until the
>> guest using
>> the mediated matrix device is started, so that check is done in KVM.
>>
>>>
>>>
>>> I think that this should be related somehow to the bound queue devices
>>> and therefore implemented inside the matrix driver.
>>
>> As I stated above, when an AP queue is assigned to the mediated
>> matrix device via
>> its sysfs attributes, a check is done to verify that it is bound to
>> the vfio_ap
>> device driver (see patches 7, 8 and 9). If not, then assignment will
>> be rejected;
>> therefore, it will not be possible to configure a CRYCB with AP
>> queues that are
>> not bound to the device driver.
>
> This patch and the following patches make sure that the queues are bound
> to the matrix driver when they are assigned to the matrix using the sysfs
> entries.
>
> But they do not ensure that a queue cannot be unbound before you start
> the guest, and they are not in the path if the admin decides to unbind
> a queue at some later time.
That is a good point. I need to put a check in the device driver at the time
the mediated device fd is opened to verify that the queues being configured
in the guest's CRYCB are bound to the driver.
>
>
>>
>>>
>>>
>>> Regards,
>>>
>>> Pierre
>>>
>>
>
On 16/05/2018 15:12, Tony Krowiak wrote:
> On 05/16/2018 03:48 AM, Pierre Morel wrote:
>> On 15/05/2018 18:07, Tony Krowiak wrote:
>>> On 05/15/2018 10:55 AM, Pierre Morel wrote:
>>>> On 07/05/2018 17:11, Tony Krowiak wrote:
>>>>> Provides interfaces to manage the AP adapters, usage domains
>>>>> and control domains assigned to a KVM guest.
>>>>>
>>>>> The guest's SIE state description has a satellite structure called
>>>>> the
>>>>> Crypto Control Block (CRYCB) containing three bitmask fields
>>>>> identifying the adapters, queues (domains) and control domains
>>>>> assigned to the KVM guest:
>>>>>
>> ...snip...
>>>>> +}
>>>>
>>>> This function (ap_validate_queue_sharing) only verifies that VMs
>>>> don't share queues.
>>>> What about the queues used by a host application?
>>>
>>> How can that be verified from this function? I suppose I could put a
>>> check in here to
>>> verify that the queues are reserved by the vfio_ap device driver,
>>> but that would
>>> be redundant because an AP queue can not be assigned to a mediated
>>> matrix device
>>> via its sysfs attributes unless it is reserved by the vfio_ap device
>>> driver (see
>>> patches 7, 8 and 9).
>>>
>>>>
>>>>
>>>> I understand that you want to implement these checks within KVM
>>>> but this is
>>>> related to which queue devices are bound to the matrix and which
>>>> one are not.
>>>
>>> See my comments above and below about AP queue assignment to the
>>> mediated matrix
>>> device. The one verification we can't do when the devices are
>>> assigned is whether
>>> another guest is using the queue because assignment occurs before
>>> the guest using
>>> the queue is started in which case we have no access to KVM. It
>>> makes no sense to
>>> do so at assignment time anyway because it doesn't matter until the
>>> guest using
>>> the mediated matrix device is started, so that check is done in KVM.
>>>
>>>>
>>>>
>>>> I think that this should be related somehow to the bound queue devices
>>>> and therefore implemented inside the matrix driver.
>>>
>>> As I stated above, when an AP queue is assigned to the mediated
>>> matrix device via
>>> its sysfs attributes, a check is done to verify that it is bound to
>>> the vfio_ap
>>> device driver (see patches 7, 8 and 9). If not, then assignment will
>>> be rejected;
>>> therefore, it will not be possible to configure a CRYCB with AP
>>> queues that are
>>> not bound to the device driver.
>>
>> This patch and the following patches make sure that the queues are
>> bound to the matrix driver when they are assigned to the matrix using
>> the sysfs entries.
>>
>> But they do not ensure that a queue cannot be unbound before you start
>> the guest, and they are not in the path if the admin decides to unbind
>> a queue at some later time.
>
> That is a good point. I need to put a check in the device driver at
> the time
> the mediated device fd is opened to verify that the queues being
> configured in
> the guest's CRYCB are bound to the driver.
Not only that: you also need to prevent the device from being unbound.
For this you need to use the remove callback of the driver.
>
>>
>>
>>>
>>>>
>>>>
>>>> Regards,
>>>>
>>>> Pierre
>>>>
>>>
>>
>
--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany
On 05/16/2018 09:15 AM, Pierre Morel wrote:
> On 16/05/2018 15:12, Tony Krowiak wrote:
>> On 05/16/2018 03:48 AM, Pierre Morel wrote:
>>> On 15/05/2018 18:07, Tony Krowiak wrote:
>>>> On 05/15/2018 10:55 AM, Pierre Morel wrote:
>>>>> On 07/05/2018 17:11, Tony Krowiak wrote:
>>>>>> Provides interfaces to manage the AP adapters, usage domains
>>>>>> and control domains assigned to a KVM guest.
>>>>>>
>>>>>> The guest's SIE state description has a satellite structure
>>>>>> called the
>>>>>> Crypto Control Block (CRYCB) containing three bitmask fields
>>>>>> identifying the adapters, queues (domains) and control domains
>>>>>> assigned to the KVM guest:
>>>>>>
>>> ...snip...
>>>>>> +}
>>>>>
>>>>> This function (ap_validate_queue_sharing) only verifies that VMs
>>>>> don't share queues.
>>>>> What about the queues used by a host application?
>>>>
>>>> How can that be verified from this function? I suppose I could put
>>>> a check in here to
>>>> verify that the queues are reserved by the vfio_ap device driver,
>>>> but that would
>>>> be redundant because an AP queue can not be assigned to a mediated
>>>> matrix device
>>>> via its sysfs attributes unless it is reserved by the vfio_ap
>>>> device driver (see
>>>> patches 7, 8 and 9).
>>>>
>>>>>
>>>>>
>>>>> I understand that you want to implement these checks within KVM
>>>>> but this is
>>>>> related to which queue devices are bound to the matrix and which
>>>>> one are not.
>>>>
>>>> See my comments above and below about AP queue assignment to the
>>>> mediated matrix
>>>> device. The one verification we can't do when the devices are
>>>> assigned is whether
>>>> another guest is using the queue because assignment occurs before
>>>> the guest using
>>>> the queue is started in which case we have no access to KVM. It
>>>> makes no sense to
>>>> do so at assignment time anyway because it doesn't matter until the
>>>> guest using
>>>> the mediated matrix device is started, so that check is done in KVM.
>>>>
>>>>>
>>>>>
>>>>> I think that this should be related somehow to the bound queue
>>>>> devices and therefore implemented inside the matrix driver.
>>>>
>>>> As I stated above, when an AP queue is assigned to the mediated
>>>> matrix device via
>>>> its sysfs attributes, a check is done to verify that it is bound to
>>>> the vfio_ap
>>>> device driver (see patches 7, 8 and 9). If not, then assignment
>>>> will be rejected;
>>>> therefore, it will not be possible to configure a CRYCB with AP
>>>> queues that are
>>>> not bound to the device driver.
>>>
>>> This patch and the following patches make sure that the queues are
>>> bound to the matrix driver when they are assigned to the matrix using
>>> the sysfs entries.
>>>
>>> But they do not ensure that a queue cannot be unbound before you start
>>> the guest, and they are not in the path if the admin decides to unbind
>>> a queue at some later time.
>>
>> That is a good point. I need to put a check in the device driver at
>> the time
>> the mediated device fd is opened to verify that the queues being
>> configured in
>> the guest's CRYCB are bound to the driver.
>
> Not only that: you also need to prevent the device from being unbound.
> For this you need to use the remove callback of the driver.
I thought I addressed this already. The definition of the remove callback
does not specify a return value, so there is currently no way to prevent the
AP bus from removing the queue device on unbind. I sent an email to Harald
to discuss adding a return value to the callback.
>
>
>>
>>>
>>>
>>>>
>>>>>
>>>>>
>>>>> Regards,
>>>>>
>>>>> Pierre
>>>>>
>>>>
>>>
>>
>
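
The sharing rule discussed in this subthread follows from the guest
getting the cross product of its adapter list and its domain list: two
guests share an APQN only if both their adapter masks and their domain
masks intersect. A minimal userspace sketch of that rule, assuming a
simplified 64-bit matrix with plain (not inverted) bit numbering; struct
matrix_model and model_matrices_share_queue() are hypothetical names, not
part of the patch set:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified matrix: one bit per adapter and per domain. */
struct matrix_model {
	uint64_t apm;	/* adapter mask */
	uint64_t aqm;	/* domain (queue index) mask */
};

/*
 * Returns true if the cross products of the two matrices have at least
 * one (adapter, domain) tuple in common. The intersections go into
 * temporaries, so neither input matrix is modified.
 */
static bool model_matrices_share_queue(const struct matrix_model *a,
				       const struct matrix_model *b)
{
	uint64_t common_ap, common_aq;

	assert(a != NULL && b != NULL);

	common_ap = a->apm & b->apm;
	common_aq = a->aqm & b->aqm;

	return common_ap != 0 && common_aq != 0;
}
```

For example, a guest with card 4 and domains 1 and 2 conflicts with a
guest that has card 4 and domain 2, but not with one that has card 4 and
domain 3, or card 3 and domain 1.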
On 05/11/2018 12:08 PM, Halil Pasic wrote:
>
>
> On 05/07/2018 05:11 PM, Tony Krowiak wrote:
>> Provides interfaces to manage the AP adapters, usage domains
>> and control domains assigned to a KVM guest.
>>
>> The guest's SIE state description has a satellite structure called the
>> Crypto Control Block (CRYCB) containing three bitmask fields
>> identifying the adapters, queues (domains) and control domains
>> assigned to the KVM guest:
>
> [..]
>
>> index 00bcfb0..98b53c7 100644
>> --- a/arch/s390/kvm/kvm-ap.c
>> +++ b/arch/s390/kvm/kvm-ap.c
>> @@ -7,6 +7,7 @@
>
> [..]
>
>> +
>> +/**
>> + * kvm_ap_validate_queue_sharing
>> + *
>> + * Verifies that the APQNs derived from the cross product of the AP
>> adapter IDs
>> + * and AP queue indexes comprising the AP matrix are not configured for
>> + * another guest. AP queue sharing is not allowed.
>> + *
>> + * @kvm: the KVM guest
>> + * @matrix: the AP matrix
>> + *
>> + * Returns 0 if the APQNs are valid, otherwise; returns -EBUSY.
>> + */
>> +static int kvm_ap_validate_queue_sharing(struct kvm *kvm,
>> + struct kvm_ap_matrix *matrix)
>> +{
>> + struct kvm *vm;
>> + unsigned long *apm, *aqm;
>> + unsigned long apid, apqi;
>> +
>> +
>> + /* No other VM may share an AP Queue with the input VM */
>> + list_for_each_entry(vm, &vm_list, vm_list) {
>> + if (kvm == vm)
>> + continue;
>> +
>> + apm = kvm_ap_get_crycb_apm(vm);
>> + if (!bitmap_and(apm, apm, matrix->apm, matrix->apm_max + 1))
>> + continue;
>> +
>> + aqm = kvm_ap_get_crycb_aqm(vm);
>> + if (!bitmap_and(aqm, aqm, matrix->aqm, matrix->aqm_max + 1))
>> + continue;
>> +
>> + for_each_set_bit_inv(apid, apm, matrix->apm_max + 1)
>> + for_each_set_bit_inv(apqi, aqm, matrix->aqm_max + 1)
>> + kvm_ap_log_sharing_err(vm, apid, apqi);
>> +
>> + return -EBUSY;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +int kvm_ap_configure_matrix(struct kvm *kvm, struct kvm_ap_matrix
>> *matrix)
>> +{
>> + int ret = 0;
>> +
>> + mutex_lock(&kvm->lock);
>
> You seem to take only kvm->lock, vm_list however (used in
> kvm_ap_validate_queue_sharing()) seems to be protected by
> kvm_lock.
>
> Can you tell me why is this supposed to be safe?
>
> What is supposed to prevent an execution like
> vm1: call kvm_ap_configure_matrix(m1)
> vm2: call kvm_ap_configure_matrix(m2)
> vm1: call kvm_ap_validate_queue_sharing(m1)
> vm2: call kvm_ap_validate_queue_sharing(m2)
> vm1: call kvm_ap_set_crycb_masks(m1)
> vm2: call kvm_ap_set_crycb_masks(m2)
>
> where, let's say, m1 and m2 are equal in the sense that the
> mask values are the same?
vm1 will get the kvm->lock first in your scenario when
kvm_ap_configure_matrix(m1) is invoked. Since the other
functions - i.e., kvm_ap_validate_queue_sharing(m1) and
kvm_ap_set_crycb_masks(m1) - are static and only called
from the kvm_ap_configure_matrix(m1), your scenario
can never happen because vm2 will not get the lock until
kvm_ap_configure_matrix(m1) has completed. I see your
point, however, and maybe I should also acquire the kvm_lock.
>
>
> Regards,
> Halil
>
>> +
>> + ret = kvm_ap_validate_queue_sharing(kvm, matrix);
>> + if (ret)
>> + goto done;
>> +
>> + kvm_ap_set_crycb_masks(kvm, matrix);
>> +
>> +done:
>> + mutex_unlock(&kvm->lock);
>> +
>> + return ret;
>> +}
>> +EXPORT_SYMBOL(kvm_ap_configure_matrix);
>> +
On 16/05/2018 16:29, Tony Krowiak wrote:
> On 05/11/2018 12:08 PM, Halil Pasic wrote:
>>
>>
>> On 05/07/2018 05:11 PM, Tony Krowiak wrote:
>>> Provides interfaces to manage the AP adapters, usage domains
>>> and control domains assigned to a KVM guest.
>>>
>>> The guest's SIE state description has a satellite structure called the
>>> Crypto Control Block (CRYCB) containing three bitmask fields
>>> identifying the adapters, queues (domains) and control domains
>>> assigned to the KVM guest:
>>
>> [..]
>>
>>> index 00bcfb0..98b53c7 100644
>>> --- a/arch/s390/kvm/kvm-ap.c
>>> +++ b/arch/s390/kvm/kvm-ap.c
>>> @@ -7,6 +7,7 @@
>>
>> [..]
>>
>>> +
>>> +/**
>>> + * kvm_ap_validate_queue_sharing
>>> + *
>>> + * Verifies that the APQNs derived from the cross product of the AP
>>> adapter IDs
>>> + * and AP queue indexes comprising the AP matrix are not configured
>>> for
>>> + * another guest. AP queue sharing is not allowed.
>>> + *
>>> + * @kvm: the KVM guest
>>> + * @matrix: the AP matrix
>>> + *
>>> + * Returns 0 if the APQNs are valid, otherwise; returns -EBUSY.
>>> + */
>>> +static int kvm_ap_validate_queue_sharing(struct kvm *kvm,
>>> + struct kvm_ap_matrix *matrix)
>>> +{
>>> + struct kvm *vm;
>>> + unsigned long *apm, *aqm;
>>> + unsigned long apid, apqi;
>>> +
>>> +
>>> + /* No other VM may share an AP Queue with the input VM */
>>> + list_for_each_entry(vm, &vm_list, vm_list) {
>>> + if (kvm == vm)
>>> + continue;
>>> +
>>> + apm = kvm_ap_get_crycb_apm(vm);
>>> + if (!bitmap_and(apm, apm, matrix->apm, matrix->apm_max + 1))
>>> + continue;
>>> +
>>> + aqm = kvm_ap_get_crycb_aqm(vm);
>>> + if (!bitmap_and(aqm, aqm, matrix->aqm, matrix->aqm_max + 1))
>>> + continue;
>>> +
>>> + for_each_set_bit_inv(apid, apm, matrix->apm_max + 1)
>>> + for_each_set_bit_inv(apqi, aqm, matrix->aqm_max + 1)
>>> + kvm_ap_log_sharing_err(vm, apid, apqi);
>>> +
>>> + return -EBUSY;
>>> + }
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +int kvm_ap_configure_matrix(struct kvm *kvm, struct kvm_ap_matrix
>>> *matrix)
>>> +{
>>> + int ret = 0;
>>> +
>>> + mutex_lock(&kvm->lock);
>>
>> You seem to take only kvm->lock, vm_list however (used in
>> kvm_ap_validate_queue_sharing()) seems to be protected by
>> kvm_lock.
>>
>> Can you tell me why is this supposed to be safe?
>>
>> What is supposed to prevent an execution like
>> vm1: call kvm_ap_configure_matrix(m1)
>> vm2: call kvm_ap_configure_matrix(m2)
>> vm1: call kvm_ap_validate_queue_sharing(m1)
>> vm2: call kvm_ap_validate_queue_sharing(m2)
>> vm1: call kvm_ap_set_crycb_masks(m1)
>> vm2: call kvm_ap_set_crycb_masks(m2)
>>
>> where, let's say, m1 and m2 are equal in the sense that the
>> mask values are the same?
>
> vm1 will get the kvm->lock first in your scenario when
> kvm_ap_configure_matrix(m1) is invoked. Since the other
> functions - i.e., kvm_ap_validate_queue_sharing(m1) and
> kvm_ap_set_crycb_masks(m1) - are static and only called
> from the kvm_ap_configure_matrix(m1), your scenario
> can never happen because vm2 will not get the lock until
> kvm_ap_configure_matrix(m1) has completed. I see your
> point, however, and maybe I should also acquire the kvm_lock.
AFAIU the lock you are talking about is specific to each KVM instance,
but the example from Halil uses two different VMs, i.e. two different
locks are used and vm2 never waits for vm1.
>
>>
>>
>> Regards,
>> Halil
>>
>>> +
>>> + ret = kvm_ap_validate_queue_sharing(kvm, matrix);
>>> + if (ret)
>>> + goto done;
>>> +
>>> + kvm_ap_set_crycb_masks(kvm, matrix);
>>> +
>>> +done:
>>> + mutex_unlock(&kvm->lock);
>>> +
>>> + return ret;
>>> +}
>>> +EXPORT_SYMBOL(kvm_ap_configure_matrix);
>>> +
>
>
--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany
On Mon, 14 May 2018 15:42:18 -0400
Tony Krowiak <[email protected]> wrote:
> On 05/11/2018 01:18 PM, Halil Pasic wrote:
> >
> >
> > On 05/07/2018 05:11 PM, Tony Krowiak wrote:
> >> Registers the matrix device created by the VFIO AP device
> >> driver with the VFIO mediated device framework.
> >> Registering the matrix device will create the sysfs
> >> structures needed to create mediated matrix devices
> >> each of which will be used to configure the AP matrix
> >> for a guest and connect it to the VFIO AP device driver.
> >> +static int vfio_ap_mdev_create(struct kobject *kobj, struct
> >> mdev_device *mdev)
> >> +{
> >> + struct ap_matrix *ap_matrix = to_ap_matrix(mdev_parent_dev(mdev));
> >> +
> >> + ap_matrix->available_instances--;
> >> +
> >> + return 0;
> >> +}
> >> +
> >> +static int vfio_ap_mdev_remove(struct mdev_device *mdev)
> >> +{
> >> + struct ap_matrix *ap_matrix = to_ap_matrix(mdev_parent_dev(mdev));
> >> +
> >> + ap_matrix->available_instances++;
> >> +
> >> + return 0;
> >> +}
> >> +
> >
> > The above functions seem to be called with the lock of this
> > auto-generated
> > mdev parent device held. That's why we don't have to care about
> > synchronization
> > ourselves, right?
>
> I would assume as much. The comments for the 'struct mdev_parent_ops' in
> include/linux/mdev.h do not mention anything about synchronization, nor
> did I
> see any locking or synchronization in the vfio_ccw implementation after
> which
> I modeled my code, so frankly it is something I did not consider.
>
> >
> >
> > A small comment in the code could be helpful for mdev non-experts.
> > Hell, I would
> > even consider documenting it for all mdev -- took me some time to
> > figure out.
>
> You may want to bring this up with the VFIO mdev maintainers, but I'd be
> happy to
> include a comment in the functions in question if you think it important.
Important note: There's currently a patch on list that removes the mdev
parent mutex, and it seems there was never intended to be any
serialization in that place by the mdev core. (Look for "vfio/mdev:
Check globally for duplicate devices".)
On 16.05.2018 12:45, Tony Krowiak wrote:
> On 05/16/2018 06:21 AM, Cornelia Huck wrote:
>> On Mon, 7 May 2018 11:11:40 -0400
>> Tony Krowiak <[email protected]> wrote:
>>
>>> Relocates an existing static function that tests whether
>>> the AP extended addressing facility (APXA) is installed on
>>> the linux host. The primary reason for relocating this
>>> function is because a new compilation unit (arch/s390/kvm/kvm-ap.c)
>>> is being created to contain all of the interfaces and logic
>>> for configuring an AP matrix for a KVM guest. Some of its
>>> functions will also need to determine whether APXA is installed,
>>> so, let's go ahead and relocate this static function as a
>>> public interface in kvm-ap.c.
>>>
>>> Notes:
>>> ----
>>> 1. The interface to determine whether APXA is installed on the linux
>>> host uses the information returned from the AP Query Configuration
>>> Information (QCI) function. This function will not be available
>>> if the AP instructions are not installed on the linux host, so a check
>>> will be included to verify that.
>>>
>>> 2. Currently, the AP bus interfaces accessing the AP instructions will
>>> not be accessible if CONFIG_ZCRYPT=n, so the relevant code will be
>>> temporarily contained in the new arch/s390/kvm/kvm-ap.c file until
>>> the patch(es) to statically build the required AP bus interfaces are
>>> available.
>> Any ETA for those interfaces? Would be nice if we could avoid
>> introducing temporary interfaces (but I'm certainly not opposing this
>> patch).
>
> I'll check with the developer.
The proposal is out on the internal mailing list. I'd like to release it
(internally) tomorrow or early next week. I have already talked with
Christian about this because we need to align it with KVM and s390
kernel development.
regards
Harald Freudenberger
>
>>
>>> Signed-off-by: Tony Krowiak <[email protected]>
>>> ---
>>> MAINTAINERS | 1 +
>>> arch/s390/include/asm/kvm-ap.h | 60 +++++++++++++++++++++++++++++
>>> arch/s390/kvm/Makefile | 2 +-
>>> arch/s390/kvm/kvm-ap.c | 83 ++++++++++++++++++++++++++++++++++++++++
>>> arch/s390/kvm/kvm-s390.c | 42 +-------------------
>>> 5 files changed, 147 insertions(+), 41 deletions(-)
>>> create mode 100644 arch/s390/include/asm/kvm-ap.h
>>> create mode 100644 arch/s390/kvm/kvm-ap.c
>>>
>>> diff --git a/MAINTAINERS b/MAINTAINERS
>>> index eab763f..224e97b 100644
>>> --- a/MAINTAINERS
>>> +++ b/MAINTAINERS
>>> @@ -7792,6 +7792,7 @@ M: Christian Borntraeger <[email protected]>
>>> M: Janosch Frank <[email protected]>
>>> R: David Hildenbrand <[email protected]>
>>> R: Cornelia Huck <[email protected]>
>>> +R: Tony Krowiak <[email protected]>
>> Don't you want to drop the 'vnet' from your address, as the vnet-less
>> form seems to be the one that will continue working from what I've
>> heard?
>>
>>> L: [email protected]
>>> W: http://www.ibm.com/developerworks/linux/linux390/
>>> T: git git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux.git
>> No objection against this patch, although I still hope it will not be
>> needed :)
>>
>
On Thu, 17 May 2018 11:11:41 +0200
Harald Freudenberger <[email protected]> wrote:
> On 16.05.2018 12:45, Tony Krowiak wrote:
> > On 05/16/2018 06:21 AM, Cornelia Huck wrote:
> >> On Mon, 7 May 2018 11:11:40 -0400
> >> Tony Krowiak <[email protected]> wrote:
> >>
> >>> Relocates an existing static function that tests whether
> >>> the AP extended addressing facility (APXA) is installed on
> >>> the linux host. The primary reason for relocating this
> >>> function is because a new compilation unit (arch/s390/kvm/kvm-ap.c)
> >>> is being created to contain all of the interfaces and logic
> >>> for configuring an AP matrix for a KVM guest. Some of its
> >>> functions will also need to determine whether APXA is installed,
> >>> so, let's go ahead and relocate this static function as a
> >>> public interface in kvm-ap.c.
> >>>
> >>> Notes:
> >>> ----
> >>> 1. The interface to determine whether APXA is installed on the linux
> >>> host uses the information returned from the AP Query Configuration
> >>> Information (QCI) function. This function will not be available
> >>> if the AP instructions are not installed on the linux host, so a check
> >>> will be included to verify that.
> >>>
> >>> 2. Currently, the AP bus interfaces accessing the AP instructions will
> >>> not be accessible if CONFIG_ZCRYPT=n, so the relevant code will be
> >>> temporarily contained in the new arch/s390/kvm/kvm-ap.c file until
> >>> the patch(es) to statically build the required AP bus interfaces are
> >>> available.
> >> Any ETA for those interfaces? Would be nice if we could avoid
> >> introducing temporary interfaces (but I'm certainly not opposing this
> >> patch).
> >
> > I'll check with the developer.
> The proposal is out on the internal mailing list. I'd like to release it (internal) tomorrow or start
> next week. Already talked with Christian about this because we need to align it somehow with
> kvm and s390 kernel development.
Great, thanks for the update!
On 16/05/2018 15:48, Tony Krowiak wrote:
> On 05/16/2018 09:15 AM, Pierre Morel wrote:
>> On 16/05/2018 15:12, Tony Krowiak wrote:
>>> On 05/16/2018 03:48 AM, Pierre Morel wrote:
>>>> On 15/05/2018 18:07, Tony Krowiak wrote:
>>>>> On 05/15/2018 10:55 AM, Pierre Morel wrote:
>>>>>> On 07/05/2018 17:11, Tony Krowiak wrote:
>>>>>>> Provides interfaces to manage the AP adapters, usage domains
>>>>>>> and control domains assigned to a KVM guest.
>>>>>>>
>>>>>>> The guest's SIE state description has a satellite structure
>>>>>>> called the
>>>>>>> Crypto Control Block (CRYCB) containing three bitmask fields
>>>>>>> identifying the adapters, queues (domains) and control domains
>>>>>>> assigned to the KVM guest:
>>>>>>>
>>>> ...snip...
>>>>>>> +}
>>>>>>
>>>>>> This function (ap_validate_queue_sharing) only verifies that VM
>>>>>> don't share queues.
>>>>>> What about the queues used by a host application?
>>>>>
>>>>> How can that be verified from this function? I suppose I could put
>>>>> a check in here to
>>>>> verify that the queues are reserved by the vfio_ap device driver,
>>>>> but that would
>>>>> be redundant because an AP queue can not be assigned to a mediated
>>>>> matrix device
>>>>> via its sysfs attributes unless it is reserved by the vfio_ap
>>>>> device driver (see
>>>>> patches 7, 8 and 9).
>>>>>
>>>>>>
>>>>>>
>>>>>> I understand that you want to implement these checks within KVM
>>>>>> but this is
>>>>>> related to which queue devices are bound to the matrix and which
>>>>>> one are not.
>>>>>
>>>>> See my comments above and below about AP queue assignment to the
>>>>> mediated matrix
>>>>> device. The one verification we can't do when the devices are
>>>>> assigned is whether
>>>>> another guest is using the queue because assignment occurs before
>>>>> the guest using
>>>>> the queue is started in which case we have no access to KVM. It
>>>>> makes no sense to
>>>>> do so at assignment time anyway because it doesn't matter until
>>>>> the guest using
>>>>> the mediated matrix device is started, so that check is done in KVM.
>>>>>
>>>>>>
>>>>>>
>>>>>> I think that this should be related somehow to the bound queue
>>>>>> devices and therefore implemented inside the matrix driver.
>>>>>
>>>>> As I stated above, when an AP queue is assigned to the mediated
>>>>> matrix device via
>>>>> its sysfs attributes, a check is done to verify that it is bound
>>>>> to the vfio_ap
>>>>> device driver (see patches 7, 8 and 9). If not, then assignment
>>>>> will be rejected;
>>>>> therefore, it will not be possible to configure a CRYCB with AP
>>>>> queues that are
>>>>> not bound to the device driver.
>>>>
>>>> This patch and the following patches take care that the queues are
>>>> bound to the matrix driver when they are assigned to the matrix
>>>> using the sysfs entries.
>>>>
>>>> But they do not take care that the queue can not be unbound before
>>>> you start
>>>> the guest, and they are not in the path if the admin decide to
>>>> unbind a queue
>>>> at some later time.
>>>
>>> That is a good point. I need to put a check in the device driver at
>>> the time
>>> the mediated device fd is opened to verify that the queues being
>>> configured in
>>> the guest's CRYCB are bound to the driver.
>>
>> not only, you also need to avoid the possibility of unbinding the
>> device.
>> For this you need to use the remove callback from the driver.
>
> I thought I addressed this already. The definition of the remove
> callback does
> not specify a return value, so there is currently no way to prevent
> the AP bus
> from removing the queue device on unbind. I sent an email to Harald to
> discuss
> adding a return value to the callback.
If you cannot prevent the unbinding, you must remove
the corresponding bits from the matrix.
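
To make the suggestion concrete, here is a minimal userspace sketch, not
kernel code. It assumes 64-bit masks and MSB-0 bit numbering (bit 0 is the
leftmost bit, as with the *_inv bit helpers). The names matrix_masks,
inv_bit, matrix_has_queue and matrix_revoke_adapter are hypothetical and
do not come from the patch set. Because the masks describe a cross product,
a single (adapter, domain) tuple cannot be removed on its own; the sketch
simply drops the whole adapter, which is only one possible policy.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 64-bit stand-ins for the adapter (APM) and domain (AQM) masks. */
struct matrix_masks {
	uint64_t apm;
	uint64_t aqm;
};

/* MSB-0 numbering: bit number 0 is the leftmost bit of the mask. */
static uint64_t inv_bit(unsigned int nr)
{
	assert(nr < 64);
	return UINT64_C(1) << (63 - nr);
}

/* A queue (apid, apqi) is usable only if both of its bits are set. */
static int matrix_has_queue(const struct matrix_masks *m,
			    unsigned int apid, unsigned int apqi)
{
	return (m->apm & inv_bit(apid)) && (m->aqm & inv_bit(apqi));
}

/*
 * Assumed policy for "a queue on adapter apid was unbound": clear the
 * adapter bit, which revokes every queue on that adapter at once.
 */
static void matrix_revoke_adapter(struct matrix_masks *m, unsigned int apid)
{
	m->apm &= ~inv_bit(apid);
}
```

With adapters 3 and 4 and domains 1 and 2 configured, revoking adapter 4
removes both (4,1) and (4,2) while leaving (3,1) and (3,2) in place.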
>
>>
>>
>>>
>>>>
>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Pierre
>>>>>>
>>>>>
>>>>
>>>
>>
>
--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany
On 05/17/2018 03:44 AM, Cornelia Huck wrote:
> On Mon, 14 May 2018 15:42:18 -0400
> Tony Krowiak <[email protected]> wrote:
>
>> On 05/11/2018 01:18 PM, Halil Pasic wrote:
>>>
>>> On 05/07/2018 05:11 PM, Tony Krowiak wrote:
>>>> Registers the matrix device created by the VFIO AP device
>>>> driver with the VFIO mediated device framework.
>>>> Registering the matrix device will create the sysfs
>>>> structures needed to create mediated matrix devices
>>>> each of which will be used to configure the AP matrix
>>>> for a guest and connect it to the VFIO AP device driver.
>>>> +static int vfio_ap_mdev_create(struct kobject *kobj, struct
>>>> mdev_device *mdev)
>>>> +{
>>>> + struct ap_matrix *ap_matrix = to_ap_matrix(mdev_parent_dev(mdev));
>>>> +
>>>> + ap_matrix->available_instances--;
>>>> +
>>>> + return 0;
>>>> +}
>>>> +
>>>> +static int vfio_ap_mdev_remove(struct mdev_device *mdev)
>>>> +{
>>>> + struct ap_matrix *ap_matrix = to_ap_matrix(mdev_parent_dev(mdev));
>>>> +
>>>> + ap_matrix->available_instances++;
>>>> +
>>>> + return 0;
>>>> +}
>>>> +
>>> The above functions seem to be called with the lock of this
>>> auto-generated
>>> mdev parent device held. That's why we don't have to care about
>>> synchronization
>>> ourselves, right?
>> I would assume as much. The comments for the 'struct mdev_parent_ops' in
>> include/linux/mdev.h do not mention anything about synchronization, nor
>> did I
>> see any locking or synchronization in the vfio_ccw implementation after
>> which
>> I modeled my code, so frankly it is something I did not consider.
>>
>>>
>>> A small comment in the code could be helpful for mdev non-experts.
>>> Hell, I would
>>> even consider documenting it for all mdev -- took me some time to
>>> figure out.
>> You may want to bring this up with the VFIO mdev maintainers, but I'd be
>> happy to
>> include a comment in the functions in question if you think it important.
> Important note: There's currently a patch on list that removes the mdev
> parent mutex, and it seems there was never intended to be any
> serialization in that place by the mdev core. (Look for "vfio/mdev:
> Check globally for duplicate devices".)
The patch on the list holds the mdev_list_lock during create and remove
of an mdev device, so it looks like no synchronization is necessary on the
part of the vendor code in the create/remove callbacks; does that sound
about right?
>
On 05/16/2018 10:41 AM, Pierre Morel wrote:
> On 16/05/2018 16:29, Tony Krowiak wrote:
>> On 05/11/2018 12:08 PM, Halil Pasic wrote:
>>>
>>>
>>> On 05/07/2018 05:11 PM, Tony Krowiak wrote:
>>>> Provides interfaces to manage the AP adapters, usage domains
>>>> and control domains assigned to a KVM guest.
>>>>
>>>> The guest's SIE state description has a satellite structure called the
>>>> Crypto Control Block (CRYCB) containing three bitmask fields
>>>> identifying the adapters, queues (domains) and control domains
>>>> assigned to the KVM guest:
>>>
>>> [..]
>>>
>>>> index 00bcfb0..98b53c7 100644
>>>> --- a/arch/s390/kvm/kvm-ap.c
>>>> +++ b/arch/s390/kvm/kvm-ap.c
>>>> @@ -7,6 +7,7 @@
>>>
>>> [..]
>>>
>>>> +
>>>> +/**
>>>> + * kvm_ap_validate_queue_sharing
>>>> + *
>>>> + * Verifies that the APQNs derived from the cross product of the
>>>> AP adapter IDs
>>>> + * and AP queue indexes comprising the AP matrix are not
>>>> configured for
>>>> + * another guest. AP queue sharing is not allowed.
>>>> + *
>>>> + * @kvm: the KVM guest
>>>> + * @matrix: the AP matrix
>>>> + *
>>>> + * Returns 0 if the APQNs are valid, otherwise; returns -EBUSY.
>>>> + */
>>>> +static int kvm_ap_validate_queue_sharing(struct kvm *kvm,
>>>> + struct kvm_ap_matrix *matrix)
>>>> +{
>>>> + struct kvm *vm;
>>>> + unsigned long *apm, *aqm;
>>>> + unsigned long apid, apqi;
>>>> +
>>>> +
>>>> + /* No other VM may share an AP Queue with the input VM */
>>>> + list_for_each_entry(vm, &vm_list, vm_list) {
>>>> + if (kvm == vm)
>>>> + continue;
>>>> +
>>>> + apm = kvm_ap_get_crycb_apm(vm);
>>>> + if (!bitmap_and(apm, apm, matrix->apm, matrix->apm_max + 1))
>>>> + continue;
>>>> +
>>>> + aqm = kvm_ap_get_crycb_aqm(vm);
>>>> + if (!bitmap_and(aqm, aqm, matrix->aqm, matrix->aqm_max + 1))
>>>> + continue;
>>>> +
>>>> + for_each_set_bit_inv(apid, apm, matrix->apm_max + 1)
>>>> + for_each_set_bit_inv(apqi, aqm, matrix->aqm_max + 1)
>>>> + kvm_ap_log_sharing_err(vm, apid, apqi);
>>>> +
>>>> + return -EBUSY;
>>>> + }
>>>> +
>>>> + return 0;
>>>> +}
>>>> +
>>>> +int kvm_ap_configure_matrix(struct kvm *kvm, struct kvm_ap_matrix
>>>> *matrix)
>>>> +{
>>>> + int ret = 0;
>>>> +
>>>> + mutex_lock(&kvm->lock);
>>>
>>> You seem to take only kvm->lock, vm_list however (used in
>>> kvm_ap_validate_queue_sharing()) seems to be protected by
>>> kvm_lock.
>>>
>>> Can you tell me why is this supposed to be safe?
>>>
>>> What is supposed to prevent an execution like
>>> vm1: call kvm_ap_configure_matrix(m1)
>>> vm2: call kvm_ap_configure_matrix(m2)
>>> vm1: call kvm_ap_validate_queue_sharing(m1)
>>> vm2: call kvm_ap_validate_queue_sharing(m2)
>>> vm1: call kvm_ap_set_crycb_masks(m1)
>>> vm2: call kvm_ap_set_crycb_masks(m2)
>>>
>>> where, let's say, m1 and m2 are equal in the sense that the
>>> mask values are the same?
>>
>> vm1 will get the kvm->lock first in your scenario when
>> kvm_ap_configure_matrix(m1) is invoked. Since the other
>> functions - i.e., kvm_ap_validate_queue_sharing(m1) and
>> kvm_ap_set_crycb_masks(m1) - are static and only called
>> from the kvm_ap_configure_matrix(m1), your scenario
>> can never happen because vm2 will not get the lock until
>> kvm_ap_configure_matrix(m1) has completed. I see your
>> point, however, and maybe I should also acquire the kvm_lock.
>
> AFAIU the lock you are talking about is specific to each KVM instance,
> but the example from Halil uses two different VMs, i.e. two different
> locks are used and vm2 never waits for vm1.
Right you are! Perhaps I need to hold the kvm_lock, at least
in the kvm_ap_validate_queue_sharing() function while looping
over vm_list.
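
Roughly what I have in mind, as a userspace sketch only. It assumes 64-bit
masks and a pthread mutex playing the role of kvm_lock; struct vm,
global_vm_lock, vm_register and vm_configure_matrix are made-up names for
illustration, not the code in the patch. The point is that the overlap
check and the mask update happen under one global lock, so two VMs cannot
both pass the check and then both set the same masks.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for a VM and its CRYCB masks. */
struct vm {
	uint64_t apm;		/* adapter mask */
	uint64_t aqm;		/* domain (queue index) mask */
	struct vm *next;
};

/* Plays the role of the global kvm_lock that protects the VM list. */
static pthread_mutex_t global_vm_lock = PTHREAD_MUTEX_INITIALIZER;
static struct vm *vm_list_head;

static void vm_register(struct vm *vm)
{
	assert(vm);
	pthread_mutex_lock(&global_vm_lock);
	vm->next = vm_list_head;
	vm_list_head = vm;
	pthread_mutex_unlock(&global_vm_lock);
}

/*
 * Validate and set the masks in one critical section.
 * Returns 0 on success, -EBUSY if another VM already owns a queue
 * in the cross product of apm and aqm.
 */
static int vm_configure_matrix(struct vm *self, uint64_t apm, uint64_t aqm)
{
	struct vm *other;
	int ret = 0;

	pthread_mutex_lock(&global_vm_lock);
	for (other = vm_list_head; other; other = other->next) {
		if (other == self)
			continue;
		/* A shared queue needs a common adapter AND a common domain. */
		if ((other->apm & apm) && (other->aqm & aqm)) {
			ret = -EBUSY;
			break;
		}
	}
	if (!ret) {
		self->apm = apm;
		self->aqm = aqm;
	}
	pthread_mutex_unlock(&global_vm_lock);

	return ret;
}
```

Holding the lock only while walking the list would protect the list
itself, but the check-then-set race Halil described would remain, which
is why the sketch keeps the update inside the same critical section.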
>
>
>>
>>>
>>>
>>> Regards,
>>> Halil
>>>
>>>> +
>>>> + ret = kvm_ap_validate_queue_sharing(kvm, matrix);
>>>> + if (ret)
>>>> + goto done;
>>>> +
>>>> + kvm_ap_set_crycb_masks(kvm, matrix);
>>>> +
>>>> +done:
>>>> + mutex_unlock(&kvm->lock);
>>>> +
>>>> + return ret;
>>>> +}
>>>> +EXPORT_SYMBOL(kvm_ap_configure_matrix);
>>>> +
>>
>>
>
On Mon, 21 May 2018 11:13:58 -0400
Tony Krowiak <[email protected]> wrote:
> On 05/17/2018 03:44 AM, Cornelia Huck wrote:
> > On Mon, 14 May 2018 15:42:18 -0400
> > Tony Krowiak <[email protected]> wrote:
> >
> >> On 05/11/2018 01:18 PM, Halil Pasic wrote:
> >>>
> >>> On 05/07/2018 05:11 PM, Tony Krowiak wrote:
> >>>> Registers the matrix device created by the VFIO AP device
> >>>> driver with the VFIO mediated device framework.
> >>>> Registering the matrix device will create the sysfs
> >>>> structures needed to create mediated matrix devices
> >>>> each of which will be used to configure the AP matrix
> >>>> for a guest and connect it to the VFIO AP device driver.
> >>>> +static int vfio_ap_mdev_create(struct kobject *kobj, struct
> >>>> mdev_device *mdev)
> >>>> +{
> >>>> + struct ap_matrix *ap_matrix = to_ap_matrix(mdev_parent_dev(mdev));
> >>>> +
> >>>> + ap_matrix->available_instances--;
> >>>> +
> >>>> + return 0;
> >>>> +}
> >>>> +
> >>>> +static int vfio_ap_mdev_remove(struct mdev_device *mdev)
> >>>> +{
> >>>> + struct ap_matrix *ap_matrix = to_ap_matrix(mdev_parent_dev(mdev));
> >>>> +
> >>>> + ap_matrix->available_instances++;
> >>>> +
> >>>> + return 0;
> >>>> +}
> >>>> +
> >>> The above functions seem to be called with the lock of this
> >>> auto-generated
> >>> mdev parent device held. That's why we don't have to care about
> >>> synchronization
> >>> ourselves, right?
> >> I would assume as much. The comments for the 'struct mdev_parent_ops' in
> >> include/linux/mdev.h do not mention anything about synchronization, nor
> >> did I
> >> see any locking or synchronization in the vfio_ccw implementation after
> >> which
> >> I modeled my code, so frankly it is something I did not consider.
> >>
> >>>
> >>> A small comment in the code could be helpful for mdev non-experts.
> >>> Hell, I would
> >>> even consider documenting it for all mdev -- took me some time to
> >>> figure out.
> >> You may want to bring this up with the VFIO mdev maintainers, but I'd be
> >> happy to
> >> include a comment in the functions in question if you think it important.
> > Important note: There's currently a patch on list that removes the mdev
> > parent mutex, and it seems there was never intended to be any
> > serialization in that place by the mdev core. (Look for "vfio/mdev:
> > Check globally for duplicate devices".)
>
> The patch on the list holds the mdev_list_lock during create and remove
> of an mdev device, so it looks like no synchronization is necessary on the
> part of the vendor code in the create/remove callbacks; does that sound
> about right?
v1/v2 did that; v3/v4 hold the list lock only while the device is added
to the mdev list. v4 also adds a note regarding locking to the
documentation.
On 05/22/2018 04:19 AM, Cornelia Huck wrote:
> On Mon, 21 May 2018 11:13:58 -0400
> Tony Krowiak <[email protected]> wrote:
>
>> On 05/17/2018 03:44 AM, Cornelia Huck wrote:
>>> On Mon, 14 May 2018 15:42:18 -0400
>>> Tony Krowiak <[email protected]> wrote:
>>>
>>>> On 05/11/2018 01:18 PM, Halil Pasic wrote:
>>>>> On 05/07/2018 05:11 PM, Tony Krowiak wrote:
>>>>>> Registers the matrix device created by the VFIO AP device
>>>>>> driver with the VFIO mediated device framework.
>>>>>> Registering the matrix device will create the sysfs
>>>>>> structures needed to create mediated matrix devices
>>>>>> each of which will be used to configure the AP matrix
>>>>>> for a guest and connect it to the VFIO AP device driver.
>>>>>> +static int vfio_ap_mdev_create(struct kobject *kobj, struct
>>>>>> mdev_device *mdev)
>>>>>> +{
>>>>>> + struct ap_matrix *ap_matrix = to_ap_matrix(mdev_parent_dev(mdev));
>>>>>> +
>>>>>> + ap_matrix->available_instances--;
>>>>>> +
>>>>>> + return 0;
>>>>>> +}
>>>>>> +
>>>>>> +static int vfio_ap_mdev_remove(struct mdev_device *mdev)
>>>>>> +{
>>>>>> + struct ap_matrix *ap_matrix = to_ap_matrix(mdev_parent_dev(mdev));
>>>>>> +
>>>>>> + ap_matrix->available_instances++;
>>>>>> +
>>>>>> + return 0;
>>>>>> +}
>>>>>> +
>>>>> The above functions seem to be called with the lock of this
>>>>> auto-generated
>>>>> mdev parent device held. That's why we don't have to care about
>>>>> synchronization
>>>>> ourselves, right?
>>>> I would assume as much. The comments for the 'struct mdev_parent_ops' in
>>>> include/linux/mdev.h do not mention anything about synchronization, nor
>>>> did I
>>>> see any locking or synchronization in the vfio_ccw implementation after
>>>> which
>>>> I modeled my code, so frankly it is something I did not consider.
>>>>
>>>>> A small comment in the code could be helpful for mdev non-experts.
>>>>> Hell, I would
>>>>> even consider documenting it for all mdev -- took me some time to
>>>>> figure out.
>>>> You may want to bring this up with the VFIO mdev maintainers, but I'd be
>>>> happy to
>>>> include a comment in the functions in question if you think it important.
>>> Important note: There's currently a patch on list that removes the mdev
>>> parent mutex, and it seems there was never intended to be any
>>> serialization in that place by the mdev core. (Look for "vfio/mdev:
>>> Check globally for duplicate devices".)
>> The patch on the list holds the mdev_list_lock during create and remove
>> of an mdev device, so it looks like no synchronization is necessary on the
>> part of the vendor code in the create/remove callbacks; does that sound
>> about right?
> v1/v2 did that; v3/v4 hold the list lock only while the device is added
> to the mdev list. v4 also adds a note regarding locking to the
> documentation.
I'll add some synchronization to the read/update of available instances.
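
Something along these lines, sketched in userspace with a pthread mutex
standing in for whatever lock the driver ends up using. The names
matrix_dev, MATRIX_DEV_INIT, matrix_mdev_create, matrix_mdev_remove and
matrix_available_instances are placeholders, and the -ENOSPC return on
exhaustion is an assumption that is not in the posted patch.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the matrix device and its instance counter. */
struct matrix_dev {
	pthread_mutex_t lock;
	int available_instances;
};

#define MATRIX_DEV_INIT(n) { PTHREAD_MUTEX_INITIALIZER, (n) }

/* Take one instance; fail rather than let the counter go negative. */
static int matrix_mdev_create(struct matrix_dev *m)
{
	int ret = 0;

	assert(m);
	pthread_mutex_lock(&m->lock);
	if (m->available_instances > 0)
		m->available_instances--;
	else
		ret = -ENOSPC;	/* assumed error code */
	pthread_mutex_unlock(&m->lock);

	return ret;
}

/* Give the instance back. */
static void matrix_mdev_remove(struct matrix_dev *m)
{
	assert(m);
	pthread_mutex_lock(&m->lock);
	m->available_instances++;
	pthread_mutex_unlock(&m->lock);
}

/* Readers (e.g. a sysfs show routine) take the same lock. */
static int matrix_available_instances(struct matrix_dev *m)
{
	int n;

	assert(m);
	pthread_mutex_lock(&m->lock);
	n = m->available_instances;
	pthread_mutex_unlock(&m->lock);

	return n;
}
```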
>
On 05/18/2018 04:55 AM, Pierre Morel wrote:
> On 16/05/2018 15:48, Tony Krowiak wrote:
>> On 05/16/2018 09:15 AM, Pierre Morel wrote:
>>> On 16/05/2018 15:12, Tony Krowiak wrote:
>>>> On 05/16/2018 03:48 AM, Pierre Morel wrote:
>>>>> On 15/05/2018 18:07, Tony Krowiak wrote:
>>>>>> On 05/15/2018 10:55 AM, Pierre Morel wrote:
>>>>>>> On 07/05/2018 17:11, Tony Krowiak wrote:
>>>>>>>> Provides interfaces to manage the AP adapters, usage domains
>>>>>>>> and control domains assigned to a KVM guest.
>>>>>>>>
>>>>>>>> The guest's SIE state description has a satellite structure
>>>>>>>> called the
>>>>>>>> Crypto Control Block (CRYCB) containing three bitmask fields
>>>>>>>> identifying the adapters, queues (domains) and control domains
>>>>>>>> assigned to the KVM guest:
>>>>>>>>
>>>>> ...snip...
>>>>>>>> +}
>>>>>>>
>>>>>>> This function (ap_validate_queue_sharing) only verifies that VM
>>>>>>> don't share queues.
>>>>>>> What about the queues used by a host application?
>>>>>>
>>>>>> How can that be verified from this function? I suppose I could
>>>>>> put a check in here to
>>>>>> verify that the queues are reserved by the vfio_ap device driver,
>>>>>> but that would
>>>>>> be redundant because an AP queue can not be assigned to a
>>>>>> mediated matrix device
>>>>>> via its sysfs attributes unless it is reserved by the vfio_ap
>>>>>> device driver (see
>>>>>> patches 7, 8 and 9).
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I understand that you want to implement these checks within KVM
>>>>>>> but this is
>>>>>>> related to which queue devices are bound to the matrix and which
>>>>>>> one are not.
>>>>>>
>>>>>> See my comments above and below about AP queue assignment to the
>>>>>> mediated matrix
>>>>>> device. The one verification we can't do when the devices are
>>>>>> assigned is whether
>>>>>> another guest is using the queue because assignment occurs before
>>>>>> the guest using
>>>>>> the queue is started in which case we have no access to KVM. It
>>>>>> makes no sense to
>>>>>> do so at assignment time anyway because it doesn't matter until
>>>>>> the guest using
>>>>>> the mediated matrix device is started, so that check is done in KVM.
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I think that this should be related somehow to the bound queue
>>>>>>> devices and therefore be implemented inside the matrix driver.
>>>>>>
>>>>>> As I stated above, when an AP queue is assigned to the mediated
>>>>>> matrix device via
>>>>>> its sysfs attributes, a check is done to verify that it is bound
>>>>>> to the vfio_ap
>>>>>> device driver (see patches 7, 8 and 9). If not, then assignment
>>>>>> will be rejected;
>>>>>> therefore, it will not be possible to configure a CRYCB with AP
>>>>>> queues that are
>>>>>> not bound to the device driver.
>>>>>
>>>>> This patch and the following patches take care that the queues are
>>>>> bound to the matrix driver when they are assigned to the matrix
>>>>> using the sysfs entries.
>>>>>
>>>>> But they do not take care that the queue cannot be unbound before
>>>>> you start the guest, and they are not in the path if the admin
>>>>> decides to unbind a queue at some later time.
>>>>
>>>> That is a good point. I need to put a check in the device driver at
>>>> the time
>>>> the mediated device fd is opened to verify that the queues being
>>>> configured in
>>>> the guest's CRYCB are bound to the driver.
>>>
>>> not only, you also need to avoid the possibility of unbinding the
>>> device.
>>> For this you need to use the remove callback from the driver.
>>
>> I thought I addressed this already. The definition of the remove
>> callback does
>> not specify a return value, so there is currently no way to prevent
>> the AP bus
>> from removing the queue device on unbind. I sent an email to Harald
>> to discuss
>> adding a return value to the callback.
>
> If you can not prevent the unbinding you must remove
> the according bits in the matrix.
In which matrix? The bits in the matrix configured via the mediated matrix
device's sysfs attribute files? The bits in the guest's CRYCB? If the
latter, then what happens to in-progress crypto transactions on the guest?
Wouldn't this essentially be like a hot unplug of the device from the guest?
On 05/16/2018 03:55 AM, Pierre Morel wrote:
> On 07/05/2018 17:11, Tony Krowiak wrote:
>> Provides a sysfs interface to view the AP matrix configured for the
>> mediated matrix device.
>>
>> The relevant sysfs structures are:
>>
>> /sys/devices/vfio_ap
>> ... [matrix]
>> ...... [mdev_supported_types]
>> ......... [vfio_ap-passthrough]
>> ............ [devices]
>> ...............[$uuid]
>> .................. matrix
>>
>> To view the matrix configured for the mediated matrix device,
>> print the matrix file:
>
> This is the configured matrix, not the one used by the guest.
> Nothing in the patches protects against binding a queue and assigning
> a new AP while the guest runs.
> The card and queue will be shown by this entry.
Of course, as stated above, this is the matrix configured for the
mediated matrix device. Are you suggesting here that the driver
should prevent assigning a new adapter or domain while a guest is
running? Couldn't this be a step in the process for hot (un)plugging
AP queues?
>
>
>
>>
>> cat matrix
>>
>> Signed-off-by: Tony Krowiak <[email protected]>
>> ---
>> drivers/s390/crypto/vfio_ap_ops.c | 31 +++++++++++++++++++++++++++++++
>> 1 files changed, 31 insertions(+), 0 deletions(-)
>>
>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>> index 755be1d..81e03b8 100644
>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -716,6 +716,36 @@ static ssize_t control_domains_show(struct device *dev,
>>  }
>>  DEVICE_ATTR_RO(control_domains);
>>
>> +static ssize_t matrix_show(struct device *dev, struct device_attribute *attr,
>> +                           char *buf)
>> +{
>> +    struct mdev_device *mdev = mdev_from_dev(dev);
>> +    struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>> +    char *bufpos = buf;
>> +    unsigned long apid;
>> +    unsigned long apqi;
>> +    unsigned long napm = matrix_mdev->matrix.apm_max + 1;
>> +    unsigned long naqm = matrix_mdev->matrix.aqm_max + 1;
>> +    int nchars = 0;
>> +    int n;
>> +
>> +    for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, napm) {
>> +        n = sprintf(bufpos, "%02lx\n", apid);
>> +        bufpos += n;
>> +        nchars += n;
>> +
>> +        for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, naqm) {
>> +            n = sprintf(bufpos, "%02lx.%04lx\n", apid, apqi);
>> +            bufpos += n;
>> +            nchars += n;
>> +        }
>> +    }
>> +
>> +    return nchars;
>> +}
>> +DEVICE_ATTR_RO(matrix);
>> +
>> +
>>  static struct attribute *vfio_ap_mdev_attrs[] = {
>>      &dev_attr_assign_adapter.attr,
>>      &dev_attr_unassign_adapter.attr,
>> @@ -724,6 +754,7 @@ static ssize_t control_domains_show(struct device *dev,
>>      &dev_attr_assign_control_domain.attr,
>>      &dev_attr_unassign_control_domain.attr,
>>      &dev_attr_control_domains.attr,
>> +    &dev_attr_matrix.attr,
>>      NULL,
>>  };
>>
>
On 05/16/2018 04:03 AM, Pierre Morel wrote:
> On 07/05/2018 17:11, Tony Krowiak wrote:
>> Implements the open callback on the mediated matrix device.
>> The function registers a group notifier to receive notification
>> of the VFIO_GROUP_NOTIFY_SET_KVM event. When notified,
>> the vfio_ap device driver will get access to the guest's
>> kvm structure. With access to this structure the driver will:
>>
>> 1. Ensure that only one mediated device is opened for the guest
>>
>> 2. Configure access to the AP devices for the guest.
>>
>> Access to AP adapters, usage domains and control domains
>> is controlled by three bit masks contained in the Crypto Control
>> Block (CRYCB) referenced from the guest's SIE state description:
>>
>> * The AP Mask (APM) controls access to the AP adapters. Each bit
>> in the APM represents an adapter number - from most significant
>> to least significant bit - from 0 to 255. The bits in the APM
>> are set according to the adapter numbers assigned to the mediated
>> matrix device via its 'assign_adapter' sysfs attribute file.
>>
>> * The AP Queue Mask (AQM) controls access to the AP queues. Each bit
>> in the AQM represents an AP queue index - from most significant
>> to least significant bit - from 0 to 255. A queue index references
>> a specific domain and is synonymous with the domain number. The
>> bits in the AQM are set according to the domain numbers assigned
>> to the mediated matrix device via its 'assign_domain' sysfs
>> attribute file.
>>
>> * The AP Domain Mask (ADM) controls access to the AP control
>> domains.
>> Each bit in the ADM represents a control domain - from most
>> significant to least significant bit - from 0-255. The
>> bits in the ADM are set according to the domain numbers assigned
>> to the mediated matrix device via its 'assign_control_domain'
>> sysfs attribute file.
>>
>> Signed-off-by: Tony Krowiak <[email protected]>
>> ---
>> arch/s390/include/asm/kvm-ap.h | 21 ++++++++++
>> arch/s390/include/asm/kvm_host.h | 1 +
>> arch/s390/kvm/kvm-ap.c | 19 +++++++++
>> drivers/s390/crypto/vfio_ap_ops.c | 68 +++++++++++++++++++++++++++++++++
>> drivers/s390/crypto/vfio_ap_private.h | 2 +
>> 5 files changed, 111 insertions(+), 0 deletions(-)
>>
>> diff --git a/arch/s390/include/asm/kvm-ap.h b/arch/s390/include/asm/kvm-ap.h
>> index 21fe9f2..68c5a67 100644
>> --- a/arch/s390/include/asm/kvm-ap.h
>> +++ b/arch/s390/include/asm/kvm-ap.h
>> @@ -83,6 +83,27 @@ struct kvm_ap_matrix {
>> bool kvm_ap_instructions_available(void);
>>
>> /**
>> + * kvm_ap_refcount_read
>> + *
>> + * Read the AP reference count and return it.
>> + */
>> +int kvm_ap_refcount_read(struct kvm *kvm);
>> +
>> +/**
>> + * kvm_ap_refcount_inc
>> + *
>> + * Increment the AP reference count.
>> + */
>> +void kvm_ap_refcount_inc(struct kvm *kvm);
>> +
>> +/**
>> + * kvm_ap_refcount_dec
>> + *
>> + * Decrement the AP reference count
>> + */
>> +void kvm_ap_refcount_dec(struct kvm *kvm);
>> +
>> +/**
>> * kvm_ap_configure_matrix
>> *
>> * Configure the AP matrix for a KVM guest.
>> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
>> index 8736cde..5f1ad02 100644
>> --- a/arch/s390/include/asm/kvm_host.h
>> +++ b/arch/s390/include/asm/kvm_host.h
>> @@ -717,6 +717,7 @@ struct kvm_s390_crypto {
>>      __u8 aes_kw;
>>      __u8 dea_kw;
>>      __u8 apie;
>> +    atomic_t aprefs;
>>  };
>>
>> #define APCB0_MASK_SIZE 1
>> diff --git a/arch/s390/kvm/kvm-ap.c b/arch/s390/kvm/kvm-ap.c
>> index 98b53c7..848fb37 100644
>> --- a/arch/s390/kvm/kvm-ap.c
>> +++ b/arch/s390/kvm/kvm-ap.c
>> @@ -9,6 +9,7 @@
>> #include <linux/kernel.h>
>> #include <linux/bitops.h>
>> #include <asm/kvm-ap.h>
>> +#include <asm/atomic.h>
>>
>> #include "kvm-s390.h"
>>
>> @@ -218,6 +219,24 @@ static int kvm_ap_validate_queue_sharing(struct kvm *kvm,
>>      return 0;
>>  }
>>
>> +int kvm_ap_refcount_read(struct kvm *kvm)
>> +{
>> +    return atomic_read(&kvm->arch.crypto.aprefs);
>> +}
>> +EXPORT_SYMBOL(kvm_ap_refcount_read);
>> +
>> +void kvm_ap_refcount_inc(struct kvm *kvm)
>> +{
>> +    atomic_inc(&kvm->arch.crypto.aprefs);
>> +}
>> +EXPORT_SYMBOL(kvm_ap_refcount_inc);
>> +
>> +void kvm_ap_refcount_dec(struct kvm *kvm)
>> +{
>> +    atomic_dec(&kvm->arch.crypto.aprefs);
>> +}
>> +EXPORT_SYMBOL(kvm_ap_refcount_dec);
>
> Why are these functions inside kvm-ap?
> Will anyone use them outside of vfio-ap?
As I've stated before, I made the choice to contain all interfaces that
access KVM in kvm-ap because I don't think it is appropriate for the device
driver to need "knowledge" of the inner workings of KVM. Why does it matter
whether any entity outside of the vfio_ap device driver calls these
functions? I could ask a similar question if the interfaces were contained
in vfio-ap: what if another device driver needs access to these interfaces?
>
>
>> +
>>  int kvm_ap_configure_matrix(struct kvm *kvm, struct kvm_ap_matrix *matrix)
>>  {
>>      int ret = 0;
>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>> index 81e03b8..8866b0e 100644
>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -11,6 +11,8 @@
>> #include <linux/device.h>
>> #include <linux/list.h>
>> #include <linux/ctype.h>
>> +#include <linux/module.h>
>> +#include <asm/kvm-ap.h>
>>
>> #include "vfio_ap_private.h"
>>
>> @@ -47,6 +49,70 @@ static int vfio_ap_mdev_remove(struct mdev_device *mdev)
>>      return 0;
>>  }
>>
>> +static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
>> +                                       unsigned long action, void *data)
>> +{
>> +    struct ap_matrix_mdev *matrix_mdev;
>> +
>> +    if (action == VFIO_GROUP_NOTIFY_SET_KVM) {
>> +        matrix_mdev = container_of(nb, struct ap_matrix_mdev,
>> +                                   group_notifier);
>> +        matrix_mdev->kvm = data;
>> +    }
>> +
>> +    return NOTIFY_OK;
>> +}
>> +
>> +static int vfio_ap_mdev_open(struct mdev_device *mdev)
>> +{
>> +    struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>> +    unsigned long events;
>> +    int ret;
>> +
>> +    if (!try_module_get(THIS_MODULE))
>> +        return -ENODEV;
>> +
>> +    matrix_mdev->group_notifier.notifier_call = vfio_ap_mdev_group_notifier;
>> +    events = VFIO_GROUP_NOTIFY_SET_KVM;
>> +
>> +    ret = vfio_register_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
>> +                                 &events, &matrix_mdev->group_notifier);
>> +    if (ret)
>> +        goto out_err;
>> +
>> +    /* Only one mediated device allowed per guest */
>> +    if (kvm_ap_refcount_read(matrix_mdev->kvm) != 0) {
>> +        ret = -EEXIST;
>> +        goto out_err;
>> +    }
>
> Testing the existence should be the first thing to do.
That would be better but access to KVM is not available until the
notifier runs.
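Once the notifier has supplied the kvm pointer, the "already opened?" test
and the increment could be folded into a single atomic step, so that two
concurrent opens cannot both observe a count of zero. The following is only
a user-space sketch of that idea, assuming C11 atomics as a stand-in for the
kernel's atomic_t; struct guest, guest_claim_ap() and guest_release_ap() are
hypothetical names and are not part of the patch.

```c
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the per-guest state holding the AP refcount. */
struct guest {
    atomic_int aprefs;
};

/*
 * Claim the guest for one mediated device.
 * Returns 0 on success, or -EEXIST if another device already holds the claim.
 * The compare-and-exchange makes "check for zero" and "set to one" one step.
 */
static int guest_claim_ap(struct guest *g)
{
    int expected = 0;

    if (!atomic_compare_exchange_strong(&g->aprefs, &expected, 1))
        return -EEXIST;
    return 0;
}

/* Drop the claim so that a later open can succeed again. */
static void guest_release_ap(struct guest *g)
{
    atomic_store(&g->aprefs, 0);
}
```

In kernel code the analogous primitive would presumably be atomic_cmpxchg()
on the aprefs field; whether that fits the rest of the series is a separate
question.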
>
>
>> +
>> +    kvm_ap_refcount_inc(matrix_mdev->kvm);
>> +
>> +    ret = kvm_ap_configure_matrix(matrix_mdev->kvm, &matrix_mdev->matrix);
>> +    if (ret)
>> +        goto config_err;
>> +
>> +    return 0;
>> +
>> +config_err:
>> +    kvm_ap_refcount_dec(matrix_mdev->kvm);
>> +out_err:
>> +    module_put(THIS_MODULE);
>> +
>> +    return ret;
>> +}
>> +
>> +static void vfio_ap_mdev_release(struct mdev_device *mdev)
>> +{
>> +    struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>> +
>> +    kvm_ap_deconfigure_matrix(matrix_mdev->kvm);
>> +    vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
>> +                             &matrix_mdev->group_notifier);
>> +    kvm_ap_refcount_dec(matrix_mdev->kvm);
>> +    module_put(THIS_MODULE);
>> +}
>> +
>>  static ssize_t name_show(struct kobject *kobj, struct device *dev,
>>                           char *buf)
>>  {
>>      return sprintf(buf, "%s\n", VFIO_AP_MDEV_NAME_HWVIRT);
>> @@ -773,6 +839,8 @@ static ssize_t matrix_show(struct device *dev, struct device_attribute *attr,
>>      .mdev_attr_groups = vfio_ap_mdev_attr_groups,
>>      .create = vfio_ap_mdev_create,
>>      .remove = vfio_ap_mdev_remove,
>> +    .open = vfio_ap_mdev_open,
>> +    .release = vfio_ap_mdev_release,
>>  };
>>
>> int vfio_ap_mdev_register(struct ap_matrix *ap_matrix)
>> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
>> index 8b6ad66..ab072e9 100644
>> --- a/drivers/s390/crypto/vfio_ap_private.h
>> +++ b/drivers/s390/crypto/vfio_ap_private.h
>> @@ -32,6 +32,8 @@ struct ap_matrix {
>>
>>  struct ap_matrix_mdev {
>>      struct kvm_ap_matrix matrix;
>> +    struct notifier_block group_notifier;
>> +    struct kvm *kvm;
>>  };
>>
>> static inline struct ap_matrix *to_ap_matrix(struct device *dev)
>
>
On 23/05/2018 16:29, Tony Krowiak wrote:
> On 05/18/2018 04:55 AM, Pierre Morel wrote:
>> On 16/05/2018 15:48, Tony Krowiak wrote:
>>> On 05/16/2018 09:15 AM, Pierre Morel wrote:
>>>> On 16/05/2018 15:12, Tony Krowiak wrote:
>>>>> On 05/16/2018 03:48 AM, Pierre Morel wrote:
>>>>>> On 15/05/2018 18:07, Tony Krowiak wrote:
>>>>>>> On 05/15/2018 10:55 AM, Pierre Morel wrote:
>>>>>>>> On 07/05/2018 17:11, Tony Krowiak wrote:
>>>>>>>>> Provides interfaces to manage the AP adapters, usage domains
>>>>>>>>> and control domains assigned to a KVM guest.
>>>>>>>>>
>>>>>>>>> The guest's SIE state description has a satellite structure
>>>>>>>>> called the
>>>>>>>>> Crypto Control Block (CRYCB) containing three bitmask fields
>>>>>>>>> identifying the adapters, queues (domains) and control domains
>>>>>>>>> assigned to the KVM guest:
>>>>>>>>>
>>>>>> ...snip...
>>>>>>>>> +}
>>>>>>>>
>>>>>>>> This function (ap_validate_queue_sharing) only verifies that VMs
>>>>>>>> don't share queues.
>>>>>>>> What about the queues used by a host application?
>>>>>>>
>>>>>>> How can that be verified from this function? I suppose I could
>>>>>>> put a check in here to
>>>>>>> verify that the queues are reserved by the vfio_ap device
>>>>>>> driver, but that would
>>>>>>> be redundant because an AP queue can not be assigned to a
>>>>>>> mediated matrix device
>>>>>>> via its sysfs attributes unless it is reserved by the vfio_ap
>>>>>>> device driver (see
>>>>>>> patches 7, 8 and 9).
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> I understand that you want to implement these checks within
>>>>>>>> KVM but this is
>>>>>>>> related to which queue devices are bound to the matrix and
>>>>>>>> which one are not.
>>>>>>>
>>>>>>> See my comments above and below about AP queue assignment to the
>>>>>>> mediated matrix
>>>>>>> device. The one verification we can't do when the devices are
>>>>>>> assigned is whether
>>>>>>> another guest is using the queue because assignment occurs
>>>>>>> before the guest using
>>>>>>> the queue is started in which case we have no access to KVM. It
>>>>>>> makes no sense to
>>>>>>> do so at assignment time anyway because it doesn't matter until
>>>>>>> the guest using
>>>>>>> the mediated matrix device is started, so that check is done in
>>>>>>> KVM.
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> I think that this should be related somehow to the bound queue
>>>>>>>> devices and therefore be implemented inside the matrix driver.
>>>>>>>
>>>>>>> As I stated above, when an AP queue is assigned to the mediated
>>>>>>> matrix device via
>>>>>>> its sysfs attributes, a check is done to verify that it is bound
>>>>>>> to the vfio_ap
>>>>>>> device driver (see patches 7, 8 and 9). If not, then assignment
>>>>>>> will be rejected;
>>>>>>> therefore, it will not be possible to configure a CRYCB with AP
>>>>>>> queues that are
>>>>>>> not bound to the device driver.
>>>>>>
>>>>>> This patch and the following patches take care that the queues are
>>>>>> bound to the matrix driver when they are assigned to the matrix
>>>>>> using the sysfs entries.
>>>>>>
>>>>>> But they do not take care that the queue cannot be unbound before
>>>>>> you start the guest, and they are not in the path if the admin
>>>>>> decides to unbind a queue at some later time.
>>>>>
>>>>> That is a good point. I need to put a check in the device driver
>>>>> at the time
>>>>> the mediated device fd is opened to verify that the queues being
>>>>> configured in
>>>>> the guest's CRYCB are bound to the driver.
>>>>
>>>> not only, you also need to avoid the possibility of unbinding the
>>>> device.
>>>> For this you need to use the remove callback from the driver.
>>>
>>> I thought I addressed this already. The definition of the remove
>>> callback does
>>> not specify a return value, so there is currently no way to prevent
>>> the AP bus
>>> from removing the queue device on unbind. I sent an email to Harald
>>> to discuss
>>> adding a return value to the callback.
>>
>> If you can not prevent the unbinding you must remove
>> the according bits in the matrix.
>
> In which matrix? The bits in the matrix configured via the mediated
> matrix device's
> sysfs attributes files? The bits in the guest's CRYCB? If the latter,
> then what happens
> to in-process crypto transactions on the guest? Wouldn't this
> essentially be like a hot
> unplug of the device from the guest?
From the CRYCB, obviously, so that the guest can no longer access the AP
queues belonging to the AP card.
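As a rough illustration of what "removing the according bits" means for a
mask that is numbered from the most significant bit, here is a user-space
sketch. It assumes a plain 256-bit byte array rather than the real CRYCB
layout; struct ap_mask and the ap_mask_*() helpers are hypothetical names
used only for this example, and the sketch says nothing about how requests
already in progress on the guest would be handled.

```c
#include <stdbool.h>
#include <stdint.h>

#define AP_MASK_BITS 256

/* Hypothetical 256-bit mask; bit 0 is the most significant bit of byte 0. */
struct ap_mask {
    uint8_t bytes[AP_MASK_BITS / 8];
};

/* Grant access to adapter/domain number nr (caller ensures nr < 256). */
static void ap_mask_set(struct ap_mask *m, unsigned int nr)
{
    m->bytes[nr / 8] |= (uint8_t)(0x80u >> (nr % 8));
}

/* Revoke access to adapter/domain number nr (caller ensures nr < 256). */
static void ap_mask_clear(struct ap_mask *m, unsigned int nr)
{
    m->bytes[nr / 8] &= (uint8_t)~(0x80u >> (nr % 8));
}

/* Report whether adapter/domain number nr is currently granted. */
static bool ap_mask_test(const struct ap_mask *m, unsigned int nr)
{
    return (m->bytes[nr / 8] & (0x80u >> (nr % 8))) != 0;
}
```

With cards 3 and 4 set, clearing bit 4 leaves card 3 and every domain bit
untouched, so the guest keeps the queues (3,x) and loses all queues (4,x).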
--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany
On 23/05/2018 16:45, Tony Krowiak wrote:
> On 05/16/2018 04:03 AM, Pierre Morel wrote:
>> On 07/05/2018 17:11, Tony Krowiak wrote:
>>> Implements the open callback on the mediated matrix device.
>>> The function registers a group notifier to receive notification
>>> of the VFIO_GROUP_NOTIFY_SET_KVM event. When notified,
>>> the vfio_ap device driver will get access to the guest's
>>> kvm structure. With access to this structure the driver will:
>>>
>>> 1. Ensure that only one mediated device is opened for the guest
You should explain why.
>>>
>>> 2. Configure access to the AP devices for the guest.
>>>
...snip...
>>> +void kvm_ap_refcount_inc(struct kvm *kvm)
>>> +{
>>> +    atomic_inc(&kvm->arch.crypto.aprefs);
>>> +}
>>> +EXPORT_SYMBOL(kvm_ap_refcount_inc);
>>> +
>>> +void kvm_ap_refcount_dec(struct kvm *kvm)
>>> +{
>>> +    atomic_dec(&kvm->arch.crypto.aprefs);
>>> +}
>>> +EXPORT_SYMBOL(kvm_ap_refcount_dec);
>>
>> Why are these functions inside kvm-ap?
>> Will anyone use them outside of vfio-ap?
>
> As I've stated before, I made the choice to contain all interfaces that
> access KVM in kvm-ap because I don't think it is appropriate for the
> device
> driver to have to have "knowledge" of the inner workings of KVM. Why does
> it matter whether any entity outside of the vfio_ap device driver calls
> these functions? I could ask a similar question if the interfaces were
> contained in vfio-ap; what if another device driver needs access to these
> interfaces?
This is very driver specific and only used during initialization.
It is not a common property of the cryptographic interface.
I really think you should handle this inside the driver.
Pierre
...snip...
--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany
On 23/05/2018 16:38, Tony Krowiak wrote:
> On 05/16/2018 03:55 AM, Pierre Morel wrote:
>> On 07/05/2018 17:11, Tony Krowiak wrote:
>>> Provides a sysfs interface to view the AP matrix configured for the
>>> mediated matrix device.
>>>
>>> The relevant sysfs structures are:
>>>
>>> /sys/devices/vfio_ap
>>> ... [matrix]
>>> ...... [mdev_supported_types]
>>> ......... [vfio_ap-passthrough]
>>> ............ [devices]
>>> ...............[$uuid]
>>> .................. matrix
>>>
>>> To view the matrix configured for the mediated matrix device,
>>> print the matrix file:
>>
>> This is the configured matrix, not the one used by the guest.
>> Nothing in the patches protects against binding a queue and assigning
>> a new AP while the guest runs.
>> The card and queue will be shown by this entry.
>
> Of course, as stated above, this is the matrix configured for the
> mediated matrix device. Are you suggesting here that the driver
> should prevent assigning a new adapter or domain while a guest is
> running? Couldn't this be a step in the process for hot (un)plugging
> AP queues?
No, I mean: what is the point of showing this?
It is not what the guest sees.
Does it have any use case?
>
>>
>>
>>
>>>
>>> cat matrix
>>>
>>> Signed-off-by: Tony Krowiak <[email protected]>
>>> ---
>>> drivers/s390/crypto/vfio_ap_ops.c | 31 +++++++++++++++++++++++++++++++
>>> 1 files changed, 31 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>>> index 755be1d..81e03b8 100644
>>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>>> @@ -716,6 +716,36 @@ static ssize_t control_domains_show(struct device *dev,
>>>  }
>>>  DEVICE_ATTR_RO(control_domains);
>>>
>>> +static ssize_t matrix_show(struct device *dev, struct device_attribute *attr,
>>> +                           char *buf)
>>> +{
>>> +    struct mdev_device *mdev = mdev_from_dev(dev);
>>> +    struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>>> +    char *bufpos = buf;
>>> +    unsigned long apid;
>>> +    unsigned long apqi;
>>> +    unsigned long napm = matrix_mdev->matrix.apm_max + 1;
>>> +    unsigned long naqm = matrix_mdev->matrix.aqm_max + 1;
>>> +    int nchars = 0;
>>> +    int n;
>>> +
>>> +    for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, napm) {
>>> +        n = sprintf(bufpos, "%02lx\n", apid);
>>> +        bufpos += n;
>>> +        nchars += n;
>>> +
>>> +        for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, naqm) {
>>> +            n = sprintf(bufpos, "%02lx.%04lx\n", apid, apqi);
>>> +            bufpos += n;
>>> +            nchars += n;
>>> +        }
>>> +    }
>>> +
>>> +    return nchars;
>>> +}
>>> +DEVICE_ATTR_RO(matrix);
>>> +
>>> +
>>>  static struct attribute *vfio_ap_mdev_attrs[] = {
>>>      &dev_attr_assign_adapter.attr,
>>>      &dev_attr_unassign_adapter.attr,
>>> @@ -724,6 +754,7 @@ static ssize_t control_domains_show(struct device *dev,
>>>      &dev_attr_assign_control_domain.attr,
>>>      &dev_attr_unassign_control_domain.attr,
>>>      &dev_attr_control_domains.attr,
>>> +    &dev_attr_matrix.attr,
>>>      NULL,
>>>  };
>>>
>>
>
--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany
On 05/24/2018 05:10 AM, Pierre Morel wrote:
> On 23/05/2018 16:38, Tony Krowiak wrote:
>> On 05/16/2018 03:55 AM, Pierre Morel wrote:
>>> On 07/05/2018 17:11, Tony Krowiak wrote:
>>>> Provides a sysfs interface to view the AP matrix configured for the
>>>> mediated matrix device.
>>>>
>>>> The relevant sysfs structures are:
>>>>
>>>> /sys/devices/vfio_ap
>>>> ... [matrix]
>>>> ...... [mdev_supported_types]
>>>> ......... [vfio_ap-passthrough]
>>>> ............ [devices]
>>>> ...............[$uuid]
>>>> .................. matrix
>>>>
>>>> To view the matrix configured for the mediated matrix device,
>>>> print the matrix file:
>>>
>>> This is the configured matrix, not the one used by the guest.
>>> Nothing in the patches protects against binding a queue and assigning
>>> a new AP while the guest runs.
>>> The card and queue will be shown by this entry.
>>
>> Of course, as stated above, this is the matrix configured for the
>> mediated matrix device. Are you suggesting here that the driver
>> should prevent assigning a new adapter or domain while a guest is
>> running? Couldn't this be a step in the process for hot (un)plugging
>> AP queues?
>
> No, I mean what is the point to show this?
> It is not what the guest sees.
> Has it any use case?
The point is to display the matrix so one can view the AP queues that
have been assigned to the mediated matrix device. This is the only way
to view the matrix. Do you not find value in being able to see what
has been assigned to the mediated matrix device?
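For readers who want to see what such a listing amounts to, here is a
user-space sketch that walks the cross product of an adapter mask and a
domain mask and emits one "adapter.domain" line per queue. It is not the
patch's matrix_show(): it prints only the pairs (not the bare adapter
lines), it uses byte-array masks with bit 0 as the most significant bit,
and mask_test() and format_matrix() are hypothetical names. The write is
bounded with snprintf() because a sysfs show buffer is a single page, and a
fully populated 256 x 256 matrix at 8 bytes per line would not fit in one.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Test bit nr in a mask where bit 0 is the most significant bit of byte 0. */
static int mask_test(const uint8_t *mask, unsigned int nr)
{
    return (mask[nr / 8] >> (7 - nr % 8)) & 1;
}

/*
 * Write one "adapter.domain" line per assigned queue into buf.
 * Returns the number of bytes written, excluding the terminating NUL.
 * Output stops at the last complete line that fits into the buffer.
 */
static size_t format_matrix(char *buf, size_t size,
                            const uint8_t *apm, const uint8_t *aqm,
                            unsigned int nbits)
{
    size_t used = 0;

    if (size == 0)
        return 0;
    buf[0] = '\0';

    for (unsigned int apid = 0; apid < nbits; apid++) {
        if (!mask_test(apm, apid))
            continue;
        for (unsigned int apqi = 0; apqi < nbits; apqi++) {
            int n;

            if (!mask_test(aqm, apqi))
                continue;
            n = snprintf(buf + used, size - used, "%02x.%04x\n", apid, apqi);
            if (n < 0 || (size_t)n >= size - used) {
                buf[used] = '\0'; /* drop the partial line */
                return used;
            }
            used += (size_t)n;
        }
    }
    return used;
}
```

With adapter 4 and domains 1 and 2 assigned, the buffer holds the two lines
04.0001 and 04.0002.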
>
>
>>
>>>
>>>
>>>
>>>>
>>>> cat matrix
>>>>
>>>> Signed-off-by: Tony Krowiak <[email protected]>
>>>> ---
>>>> drivers/s390/crypto/vfio_ap_ops.c | 31 +++++++++++++++++++++++++++++++
>>>> 1 files changed, 31 insertions(+), 0 deletions(-)
>>>>
>>>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>>>> index 755be1d..81e03b8 100644
>>>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>>>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>>>> @@ -716,6 +716,36 @@ static ssize_t control_domains_show(struct device *dev,
>>>>  }
>>>>  DEVICE_ATTR_RO(control_domains);
>>>>
>>>> +static ssize_t matrix_show(struct device *dev, struct device_attribute *attr,
>>>> +                           char *buf)
>>>> +{
>>>> +    struct mdev_device *mdev = mdev_from_dev(dev);
>>>> +    struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>>>> +    char *bufpos = buf;
>>>> +    unsigned long apid;
>>>> +    unsigned long apqi;
>>>> +    unsigned long napm = matrix_mdev->matrix.apm_max + 1;
>>>> +    unsigned long naqm = matrix_mdev->matrix.aqm_max + 1;
>>>> +    int nchars = 0;
>>>> +    int n;
>>>> +
>>>> +    for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, napm) {
>>>> +        n = sprintf(bufpos, "%02lx\n", apid);
>>>> +        bufpos += n;
>>>> +        nchars += n;
>>>> +
>>>> +        for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, naqm) {
>>>> +            n = sprintf(bufpos, "%02lx.%04lx\n", apid, apqi);
>>>> +            bufpos += n;
>>>> +            nchars += n;
>>>> +        }
>>>> +    }
>>>> +
>>>> +    return nchars;
>>>> +}
>>>> +DEVICE_ATTR_RO(matrix);
>>>> +
>>>> +
>>>>  static struct attribute *vfio_ap_mdev_attrs[] = {
>>>>      &dev_attr_assign_adapter.attr,
>>>>      &dev_attr_unassign_adapter.attr,
>>>> @@ -724,6 +754,7 @@ static ssize_t control_domains_show(struct device *dev,
>>>>      &dev_attr_assign_control_domain.attr,
>>>>      &dev_attr_unassign_control_domain.attr,
>>>>      &dev_attr_control_domains.attr,
>>>> +    &dev_attr_matrix.attr,
>>>>      NULL,
>>>>  };
>>>>
>>>
>>
>
On 05/24/2018 05:08 AM, Pierre Morel wrote:
> On 23/05/2018 16:45, Tony Krowiak wrote:
>> On 05/16/2018 04:03 AM, Pierre Morel wrote:
>>> On 07/05/2018 17:11, Tony Krowiak wrote:
>>>> Implements the open callback on the mediated matrix device.
>>>> The function registers a group notifier to receive notification
>>>> of the VFIO_GROUP_NOTIFY_SET_KVM event. When notified,
>>>> the vfio_ap device driver will get access to the guest's
>>>> kvm structure. With access to this structure the driver will:
>>>>
>>>> 1. Ensure that only one mediated device is opened for the guest
>
> You should explain why.
>
>>>>
>>>> 2. Configure access to the AP devices for the guest.
>>>>
> ...snip...
>>>> +void kvm_ap_refcount_inc(struct kvm *kvm)
>>>> +{
>>>> +    atomic_inc(&kvm->arch.crypto.aprefs);
>>>> +}
>>>> +EXPORT_SYMBOL(kvm_ap_refcount_inc);
>>>> +
>>>> +void kvm_ap_refcount_dec(struct kvm *kvm)
>>>> +{
>>>> +    atomic_dec(&kvm->arch.crypto.aprefs);
>>>> +}
>>>> +EXPORT_SYMBOL(kvm_ap_refcount_dec);
>>>
>>> Why are these functions inside kvm-ap?
>>> Will anyone use them outside of vfio-ap?
>>
>> As I've stated before, I made the choice to contain all interfaces that
>> access KVM in kvm-ap because I don't think it is appropriate for the
>> device
>> driver to have to have "knowledge" of the inner workings of KVM. Why
>> does
>> it matter whether any entity outside of the vfio_ap device driver calls
>> these functions? I could ask a similar question if the interfaces were
>> contained in vfio-ap; what if another device driver needs access to
>> these
>> interfaces?
>
> This is very driver specific and only used during initialization.
> It is not a common property of the cryptographic interface.
>
> I really think you should handle this inside the driver.
We are going to have to agree to disagree on this one. Is it not possible
that future drivers - e.g., when full virtualization is implemented - will
require access to KVM?
>
> Pierre
>
>
> ...snip...
>
>
On 30/05/2018 16:33, Tony Krowiak wrote:
> On 05/24/2018 05:08 AM, Pierre Morel wrote:
>> On 23/05/2018 16:45, Tony Krowiak wrote:
>>> On 05/16/2018 04:03 AM, Pierre Morel wrote:
>>>> On 07/05/2018 17:11, Tony Krowiak wrote:
>>>>> Implements the open callback on the mediated matrix device.
>>>>> The function registers a group notifier to receive notification
>>>>> of the VFIO_GROUP_NOTIFY_SET_KVM event. When notified,
>>>>> the vfio_ap device driver will get access to the guest's
>>>>> kvm structure. With access to this structure the driver will:
>>>>>
>>>>> 1. Ensure that only one mediated device is opened for the guest
>>
>> You should explain why.
>>
>>>>>
>>>>> 2. Configure access to the AP devices for the guest.
>>>>>
>> ...snip...
>>>>> +void kvm_ap_refcount_inc(struct kvm *kvm)
>>>>> +{
>>>>> +    atomic_inc(&kvm->arch.crypto.aprefs);
>>>>> +}
>>>>> +EXPORT_SYMBOL(kvm_ap_refcount_inc);
>>>>> +
>>>>> +void kvm_ap_refcount_dec(struct kvm *kvm)
>>>>> +{
>>>>> +    atomic_dec(&kvm->arch.crypto.aprefs);
>>>>> +}
>>>>> +EXPORT_SYMBOL(kvm_ap_refcount_dec);
>>>>
>>>> Why are these functions inside kvm-ap?
>>>> Will anyone use them outside of vfio-ap?
>>>
>>> As I've stated before, I made the choice to contain all interfaces that
>>> access KVM in kvm-ap because I don't think it is appropriate for the
>>> device
>>> driver to have to have "knowledge" of the inner workings of KVM. Why
>>> does
>>> it matter whether any entity outside of the vfio_ap device driver calls
>>> these functions? I could ask a similar question if the interfaces were
>>> contained in vfio-ap; what if another device driver needs access to
>>> these
>>> interfaces?
>>
>> This is very driver specific and only used during initialization.
>> It is not a common property of the cryptographic interface.
>>
>> I really think you should handle this inside the driver.
>
> We are going to have to agree to disagree on this one. Is it not possible
> that future drivers - e.g., when full virtualization is implemented -
> will
> require access to KVM?
I do not think that access to KVM is required for full virtualization.
--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany
On 30/05/2018 16:28, Tony Krowiak wrote:
> On 05/24/2018 05:10 AM, Pierre Morel wrote:
>> On 23/05/2018 16:38, Tony Krowiak wrote:
>>> On 05/16/2018 03:55 AM, Pierre Morel wrote:
>>>> On 07/05/2018 17:11, Tony Krowiak wrote:
>>>>> Provides a sysfs interface to view the AP matrix configured for the
>>>>> mediated matrix device.
>>>>>
>>>>> The relevant sysfs structures are:
>>>>>
>>>>> /sys/devices/vfio_ap
>>>>> ... [matrix]
>>>>> ...... [mdev_supported_types]
>>>>> ......... [vfio_ap-passthrough]
>>>>> ............ [devices]
>>>>> ...............[$uuid]
>>>>> .................. matrix
>>>>>
>>>>> To view the matrix configured for the mediated matrix device,
>>>>> print the matrix file:
>>>>
>>>> This is the configured matrix, not the one used by the guest.
>>>> Nothing in the patches protects against binding a queue and assigning
>>>> a new AP while the guest runs.
>>>> The card and queue will be shown by this entry.
>>>
>>> Of course, as stated above, this is the matrix configured for the
>>> mediated matrix device. Are you suggesting here that the driver
>>> should prevent assigning a new adapter or domain while a guest is
>>> running? Couldn't this be a step in the process for hot (un)plugging
>>> AP queues?
>>
>> No, I mean what is the point to show this?
>> It is not what the guest sees.
>> Has it any use case?
>
> The point is to display the matrix so one can view the AP queues that
> have been assigned to the mediated matrix device. This is the only way
> to view the matrix. Do you not find value in being able to see what
> has been assigned to the mediated matrix device?
Two things:
1) I think it is better to retrieve the individual masks
2) As I said above, what you show is not the effective mask used by the
guest
--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany
On 06/05/2018 08:40 AM, Pierre Morel wrote:
> On 30/05/2018 16:28, Tony Krowiak wrote:
>> On 05/24/2018 05:10 AM, Pierre Morel wrote:
>>> On 23/05/2018 16:38, Tony Krowiak wrote:
>>>> On 05/16/2018 03:55 AM, Pierre Morel wrote:
>>>>> On 07/05/2018 17:11, Tony Krowiak wrote:
>>>>>> Provides a sysfs interface to view the AP matrix configured for the
>>>>>> mediated matrix device.
>>>>>>
>>>>>> The relevant sysfs structures are:
>>>>>>
>>>>>> /sys/devices/vfio_ap
>>>>>> ... [matrix]
>>>>>> ...... [mdev_supported_types]
>>>>>> ......... [vfio_ap-passthrough]
>>>>>> ............ [devices]
>>>>>> ...............[$uuid]
>>>>>> .................. matrix
>>>>>>
>>>>>> To view the matrix configured for the mediated matrix device,
>>>>>> print the matrix file:
>>>>>
>>>>> This is the configured matrix, not the one used by the guest.
>>>>> Nothing in the patches protect against binding a queue and assigning
>>>>> a new AP when the guest runs.
>>>>> The card and queue will be showed by this entry.
>>>>
>>>> Of course, as stated above, this is the matrix configured for the
>>>> mediated matrix device. Are you suggesting here that the driver
>>>> should prevent assigning a new adapter or domain while a guest is
>>>> running? Couldn't this be a step in the process for hot (un)plugging
>>>> AP queues?
>>>
>>> No, I mean what is the point to show this?
>>> It is not what the guest sees.
>>> Has it any use case?
>>
>> The point is to display the matrix so one can view the AP queues that
>> have been assigned to the mediated matrix device. This is the only way
>> to view the matrix. Do you not find value in being able to see what
>> has been assigned to the mediated matrix device?
>
> Two things:
> 1) I think it is better to retrieve the individual masks
I am not certain what you mean by this. Are you suggesting we display the
actual mask? For example, the APM:
08000000000000001000000000000c0000000030000000000800000000000001
If that is the case, I completely disagree as that would be worthless from
a user perspective. Trying to figure out which APs are configured would be
ridiculously complicated.
Or, are you suggesting something like this:
4,67,116,117,154,155,255
Personally, I found viewing the queues to be much more valuable when
configuring the mediated device's matrix. I originally displayed the
individual adapter and domain attributes and found it cumbersome to
mentally configure what the matrix looked like. If you think of the
lszcrypt command, it outputs the adapters and queues which is the model
I used for this.
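
To make the queue view concrete: the matrix is the cross product of the
assigned adapters and domains. A minimal userland sketch follows, assuming a
"cc.dddd" hex name per queue modeled on the AP bus queue names; the function
names format_queue() and print_matrix() are hypothetical and are not part of
the patch set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format one queue as "cc.dddd": adapter and domain numbers in hex.
 * The format is an assumption modeled on the AP bus queue names.
 * Returns the number of characters written, as snprintf() does.
 */
static int format_queue(char *buf, size_t len,
                        unsigned int adapter, unsigned int domain)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%02x.%04x", adapter, domain);
}

/*
 * Print every (adapter, domain) pair, i.e. the cross product of the
 * two lists. Returns the number of queues printed.
 */
static size_t print_matrix(FILE *out,
                           const unsigned int *adapters, size_t n_adapters,
                           const unsigned int *domains, size_t n_domains)
{
    char name[16];
    size_t count = 0;

    for (size_t i = 0; i < n_adapters; i++) {
        for (size_t j = 0; j < n_domains; j++) {
            format_queue(name, sizeof(name), adapters[i], domains[j]);
            fprintf(out, "%s\n", name);
            count++;
        }
    }
    return count;
}
```

With adapters {3, 4} and domains {1, 2} this prints four queues, one per
line, which is the view the matrix attribute is meant to give.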
> 2) As I said above, what you show is not the effective mask used by
> the guest
Why would a sysfs attribute for the mediated matrix device show the
effective mask used by the guest?
On 06/05/2018 08:19 AM, Pierre Morel wrote:
> On 30/05/2018 16:33, Tony Krowiak wrote:
>> On 05/24/2018 05:08 AM, Pierre Morel wrote:
>>> On 23/05/2018 16:45, Tony Krowiak wrote:
>>>> On 05/16/2018 04:03 AM, Pierre Morel wrote:
>>>>> On 07/05/2018 17:11, Tony Krowiak wrote:
>>>>>> Implements the open callback on the mediated matrix device.
>>>>>> The function registers a group notifier to receive notification
>>>>>> of the VFIO_GROUP_NOTIFY_SET_KVM event. When notified,
>>>>>> the vfio_ap device driver will get access to the guest's
>>>>>> kvm structure. With access to this structure the driver will:
>>>>>>
>>>>>> 1. Ensure that only one mediated device is opened for the guest
>>>
>>> You should explain why.
>>>
>>>>>>
>>>>>> 2. Configure access to the AP devices for the guest.
>>>>>>
>>> ...snip...
>>>>>> +void kvm_ap_refcount_inc(struct kvm *kvm)
>>>>>> +{
>>>>>> + atomic_inc(&kvm->arch.crypto.aprefs);
>>>>>> +}
>>>>>> +EXPORT_SYMBOL(kvm_ap_refcount_inc);
>>>>>> +
>>>>>> +void kvm_ap_refcount_dec(struct kvm *kvm)
>>>>>> +{
>>>>>> + atomic_dec(&kvm->arch.crypto.aprefs);
>>>>>> +}
>>>>>> +EXPORT_SYMBOL(kvm_ap_refcount_dec);
>>>>>
>>>>> Why are these functions inside kvm-ap ?
>>>>> Will anyone use this outer of vfio-ap ?
>>>>
>>>> As I've stated before, I made the choice to contain all interfaces
>>>> that
>>>> access KVM in kvm-ap because I don't think it is appropriate for
>>>> the device
>>>> driver to have to have "knowledge" of the inner workings of KVM.
>>>> Why does
>>>> it matter whether any entity outside of the vfio_ap device driver
>>>> calls
>>>> these functions? I could ask a similar question if the interfaces were
>>>> contained in vfio-ap; what if another device driver needs access to
>>>> these
>>>> interfaces?
>>>
>>> This is very driver specific and only used during initialization.
>>> It is not a common property of the cryptographic interface.
>>>
>>> I really think you should handle this inside the driver.
>>
>> We are going to have to agree to disagree on this one. Is it not
>> possible
>> that future drivers - e.g., when full virtualization is implemented -
>> will
>> require access to KVM?
>
> I do not think that an access to KVM is required for full virtualization.
You may be right, but at this point, there is no guarantee. I stand by my
design on this one.
On 06/06/2018 16:24, Tony Krowiak wrote:
> On 06/05/2018 08:40 AM, Pierre Morel wrote:
>> On 30/05/2018 16:28, Tony Krowiak wrote:
>>> On 05/24/2018 05:10 AM, Pierre Morel wrote:
>>>> On 23/05/2018 16:38, Tony Krowiak wrote:
>>>>> On 05/16/2018 03:55 AM, Pierre Morel wrote:
>>>>>> On 07/05/2018 17:11, Tony Krowiak wrote:
>>>>>>> Provides a sysfs interface to view the AP matrix configured for the
>>>>>>> mediated matrix device.
>>>>>>>
>>>>>>> The relevant sysfs structures are:
>>>>>>>
>>>>>>> /sys/devices/vfio_ap
>>>>>>> ... [matrix]
>>>>>>> ...... [mdev_supported_types]
>>>>>>> ......... [vfio_ap-passthrough]
>>>>>>> ............ [devices]
>>>>>>> ...............[$uuid]
>>>>>>> .................. matrix
>>>>>>>
>>>>>>> To view the matrix configured for the mediated matrix device,
>>>>>>> print the matrix file:
>>>>>>
>>>>>> This is the configured matrix, not the one used by the guest.
>>>>>> Nothing in the patches protect against binding a queue and assigning
>>>>>> a new AP when the guest runs.
>>>>>> The card and queue will be showed by this entry.
>>>>>
>>>>> Of course, as stated above, this is the matrix configured for the
>>>>> mediated matrix device. Are you suggesting here that the driver
>>>>> should prevent assigning a new adapter or domain while a guest is
>>>>> running? Couldn't this be a step in the process for hot (un)plugging
>>>>> AP queues?
>>>>
>>>> No, I mean what is the point to show this?
>>>> It is not what the guest sees.
>>>> Has it any use case?
>>>
>>> The point is to display the matrix so one can view the AP queues that
>>> have been assigned to the mediated matrix device. This is the only way
>>> to view the matrix. Do you not find value in being able to see what
>>> has been assigned to the mediated matrix device?
>>
>> Two things:
>> 1) I think it is better to retrieve the individual masks
>
> I am not certain what you mean by this. Are you suggesting we display the
> actual mask? For example, the APM:
>
> 08000000000000001000000000000c0000000030000000000800000000000001
>
> If that is the case, I completely disagree as that would be worthless
> from
> a user perspective. Trying to figure out which APs are configured
> would be
> ridiculously complicated.
- It is compatible with what the AP bus shows
- Cut and paste is easy
- You can use a userland script to translate to another format
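
A minimal sketch of such a userland translation, written in C rather than as
a script. It assumes the mask is printed as a string of hex digits with bit 0
as the most significant bit of the first digit; the function name
mask_to_bits() is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Decode a hex mask string into the list of set bit numbers.
 * Assumption: bit 0 is the most significant bit of the first hex digit.
 * Returns the number of set bits found, or -1 on an invalid character.
 * At most max entries are stored in bits[]; counting continues past that.
 */
static int mask_to_bits(const char *hex, unsigned int *bits, size_t max)
{
    int count = 0;

    assert(hex != NULL);
    for (size_t i = 0; hex[i] != '\0'; i++) {
        unsigned char c = (unsigned char)hex[i];
        unsigned int nibble;

        if (!isxdigit(c))
            return -1;
        nibble = isdigit(c) ? (unsigned int)(c - '0')
                            : (unsigned int)(tolower(c) - 'a') + 10u;

        /* Walk the four bits of this digit, most significant first. */
        for (unsigned int b = 0; b < 4; b++) {
            if (nibble & (8u >> b)) {
                if ((size_t)count < max)
                    bits[count] = (unsigned int)(i * 4 + b);
                count++;
            }
        }
    }
    return count;
}
```

Under that bit-numbering assumption, passing the mask string quoted above
yields the adapter numbers that the comma-separated format would list
directly.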
>
> Or, are you suggesting something like this:
>
> 4,67,116,117,154,155,255
- This is not compatible with what the AP bus shows
- As in the first case, this is easy to parse
Both proposals look better to me.
>
> Personally, I found viewing the queues to be much more valuable when
> configuring the mediated device's matrix. I originally displayed the
> individual adapter and domain attributes and found it cumbersome to
> mentally configure what the matrix looked like. If you think of the
> lszcrypt command, it outputs the adapters and queues which is the model
> I used for this.
What is the point of seeing what the matrix looks like?
It is interesting for the developer, not for the administrator.
What the administrator needs is:
- To assign APs and to see what has been assigned
- To assign domains and to see what has been assigned
>
>> 2) As I said above, what you show is not the effective mask used by
>> the guest
>
> Why would a sysfs attribute for the mediated matrix device show the
> effective
> mask used by the guest?
OK, "effective" was a bad word; replace it with "really".
We do not implement any kind of provisioning, nor do we implement an update
of the CRYCB at any point after the first mediated device open.
Binding a queue and updating the mask can be done at any time (maybe we
should change this?).
What is the point of showing a matrix which will never be used by the guest?
--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany
On 06/06/2018 16:28, Tony Krowiak wrote:
> On 06/05/2018 08:19 AM, Pierre Morel wrote:
>> On 30/05/2018 16:33, Tony Krowiak wrote:
>>> On 05/24/2018 05:08 AM, Pierre Morel wrote:
>>>> On 23/05/2018 16:45, Tony Krowiak wrote:
>>>>> On 05/16/2018 04:03 AM, Pierre Morel wrote:
>>>>>> On 07/05/2018 17:11, Tony Krowiak wrote:
>>>>>>> Implements the open callback on the mediated matrix device.
>>>>>>> The function registers a group notifier to receive notification
>>>>>>> of the VFIO_GROUP_NOTIFY_SET_KVM event. When notified,
>>>>>>> the vfio_ap device driver will get access to the guest's
>>>>>>> kvm structure. With access to this structure the driver will:
>>>>>>>
>>>>>>> 1. Ensure that only one mediated device is opened for the guest
>>>>
>>>> You should explain why.
>>>>
>>>>>>>
>>>>>>> 2. Configure access to the AP devices for the guest.
>>>>>>>
>>>> ...snip...
>>>>>>> +void kvm_ap_refcount_inc(struct kvm *kvm)
>>>>>>> +{
>>>>>>> + atomic_inc(&kvm->arch.crypto.aprefs);
>>>>>>> +}
>>>>>>> +EXPORT_SYMBOL(kvm_ap_refcount_inc);
>>>>>>> +
>>>>>>> +void kvm_ap_refcount_dec(struct kvm *kvm)
>>>>>>> +{
>>>>>>> + atomic_dec(&kvm->arch.crypto.aprefs);
>>>>>>> +}
>>>>>>> +EXPORT_SYMBOL(kvm_ap_refcount_dec);
>>>>>>
>>>>>> Why are these functions inside kvm-ap ?
>>>>>> Will anyone use this outer of vfio-ap ?
>>>>>
>>>>> As I've stated before, I made the choice to contain all interfaces
>>>>> that
>>>>> access KVM in kvm-ap because I don't think it is appropriate for
>>>>> the device
>>>>> driver to have to have "knowledge" of the inner workings of KVM.
>>>>> Why does
>>>>> it matter whether any entity outside of the vfio_ap device driver
>>>>> calls
>>>>> these functions? I could ask a similar question if the interfaces
>>>>> were
>>>>> contained in vfio-ap; what if another device driver needs access
>>>>> to these
>>>>> interfaces?
>>>>
>>>> This is very driver specific and only used during initialization.
>>>> It is not a common property of the cryptographic interface.
>>>>
>>>> I really think you should handle this inside the driver.
>>>
>>> We are going to have to agree to disagree on this one. Is it not
>>> possible
>>> that future drivers - e.g., when full virtualization is implemented
>>> - will
>>> require access to KVM?
>>
>> I do not think that an access to KVM is required for full
>> virtualization.
>
> You may be right, but at this point, there is no guarantee. I stand by my
> design on this one.
I really regret that we abandoned the initial design with the matrix bus
and one single parent matrix device per guest.
We would not have the problem of these KVM dependencies.
It had the advantage of ensuring only one device per guest
(available_instances = 1), and could take care of provisioning, as you have
sysfs entries available for a matrix without having a guest and a mediated
device.
It also had the advantage, for virtualization, of keeping the host-side and
guest-side matrices separate: inside the parent (host side) and the
mediated device (guest side).
Shouldn't we treat this problem with a design using standard interfaces
instead of adding new dedicated interfaces?
Regards,
Pierre
--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany
On 06/06/2018 18:08, Pierre Morel wrote:
> On 06/06/2018 16:28, Tony Krowiak wrote:
>> On 06/05/2018 08:19 AM, Pierre Morel wrote:
>>> On 30/05/2018 16:33, Tony Krowiak wrote:
>>>> On 05/24/2018 05:08 AM, Pierre Morel wrote:
>>>>> On 23/05/2018 16:45, Tony Krowiak wrote:
>>>>>> On 05/16/2018 04:03 AM, Pierre Morel wrote:
>>>>>>> On 07/05/2018 17:11, Tony Krowiak wrote:
>>>>>>>> Implements the open callback on the mediated matrix device.
>>>>>>>> The function registers a group notifier to receive notification
>>>>>>>> of the VFIO_GROUP_NOTIFY_SET_KVM event. When notified,
>>>>>>>> the vfio_ap device driver will get access to the guest's
>>>>>>>> kvm structure. With access to this structure the driver will:
>>>>>>>>
>>>>>>>> 1. Ensure that only one mediated device is opened for the guest
>>>>>
>>>>> You should explain why.
>>>>>
>>>>>>>>
>>>>>>>> 2. Configure access to the AP devices for the guest.
>>>>>>>>
>>>>> ...snip...
>>>>>>>> +void kvm_ap_refcount_inc(struct kvm *kvm)
>>>>>>>> +{
>>>>>>>> + atomic_inc(&kvm->arch.crypto.aprefs);
>>>>>>>> +}
>>>>>>>> +EXPORT_SYMBOL(kvm_ap_refcount_inc);
>>>>>>>> +
>>>>>>>> +void kvm_ap_refcount_dec(struct kvm *kvm)
>>>>>>>> +{
>>>>>>>> + atomic_dec(&kvm->arch.crypto.aprefs);
>>>>>>>> +}
>>>>>>>> +EXPORT_SYMBOL(kvm_ap_refcount_dec);
>>>>>>>
>>>>>>> Why are these functions inside kvm-ap ?
>>>>>>> Will anyone use this outer of vfio-ap ?
>>>>>>
>>>>>> As I've stated before, I made the choice to contain all
>>>>>> interfaces that
>>>>>> access KVM in kvm-ap because I don't think it is appropriate for
>>>>>> the device
>>>>>> driver to have to have "knowledge" of the inner workings of KVM.
>>>>>> Why does
>>>>>> it matter whether any entity outside of the vfio_ap device driver
>>>>>> calls
>>>>>> these functions? I could ask a similar question if the interfaces
>>>>>> were
>>>>>> contained in vfio-ap; what if another device driver needs access
>>>>>> to these
>>>>>> interfaces?
>>>>>
>>>>> This is very driver specific and only used during initialization.
>>>>> It is not a common property of the cryptographic interface.
>>>>>
>>>>> I really think you should handle this inside the driver.
>>>>
>>>> We are going to have to agree to disagree on this one. Is it not
>>>> possible
>>>> that future drivers - e.g., when full virtualization is implemented
>>>> - will
>>>> require access to KVM?
>>>
>>> I do not think that an access to KVM is required for full
>>> virtualization.
>>
>> You may be right, but at this point, there is no guarantee. I stand
>> by my
>> design on this one.
>
> I really regret that we abandoned the initial design with the matrix
> bus and one
> single parent matrix device per guest.
> We would not have the problem of these KVM dependencies.
>
> It had the advantage of taking care of having only one device per guest
> (available_instance = 1), could take care of provisioning as you have
> sysfs entries available for a matrix without having a guest and a
> mediated
> device.
>
> it also had advantage for virtualization to keep host side and guest
> side matrix
> separate inside parent (host side) and mediated device (guest side).
>
> Shouldn't we treat this problem with a design using standard interfaces
> Instead of adding new dedicated interfaces?
>
> Regards,
>
> Pierre
>
>
Forget it.
I am not happy with the design but the design I was speaking of may not
be the solution either.
Sorry for the noise.
Regards,
Pierre
--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany
On 07/05/2018 17:11, Tony Krowiak wrote:
> Introduces a new AP device driver. This device driver
> is built on the VFIO mediated device framework. The framework
> provides sysfs interfaces that facilitate passthrough
> access by guests to devices installed on the linux host.
>
> The VFIO AP device driver will serve two purposes:
>
> 1. Provide the interfaces to reserve AP devices for exclusive
> use by KVM guests. This is accomplished by unbinding the
> devices to be reserved for guest usage from the default AP
> device driver and binding them to the VFIO AP device driver.
>
> 2. Implements the functions, callbacks and sysfs attribute
> interfaces required to create one or more VFIO mediated
> devices each of which will be used to configure the AP
> matrix for a guest and serve as a file descriptor
> for facilitating communication between QEMU and the
> VFIO AP device driver.
>
> When the VFIO AP device driver is initialized:
>
> * It registers with the AP bus for control of type 10 (CEX4
> and newer) AP queue devices. This limitation was imposed
> due to:
>
> 1. A lack of access to older systems needed to test the
> older AP device models;
>
> 2. A desire to keep the code as simple as possible;
>
> 3. Some older models are no longer supported by the kernel
> and others are getting close to end of service.
>
> The probe and remove callbacks will be provided to support
> the binding/unbinding of AP queue devices to/from the VFIO
> AP device driver.
>
> * Creates a /sys/devices/vfio-ap/matrix device to hold
> the APQNs of the AP devices bound to the VFIO
> AP device driver and serves as the parent of the
> mediated devices created for each guest.
>
> Signed-off-by: Tony Krowiak <[email protected]>
> ---
> MAINTAINERS | 10 +++
> arch/s390/Kconfig | 11 +++
> drivers/s390/crypto/Makefile | 4 +
> drivers/s390/crypto/vfio_ap_drv.c | 134 +++++++++++++++++++++++++++++++++
> drivers/s390/crypto/vfio_ap_private.h | 23 ++++++
> include/uapi/linux/vfio.h | 2 +
> 6 files changed, 184 insertions(+), 0 deletions(-)
> create mode 100644 drivers/s390/crypto/vfio_ap_drv.c
> create mode 100644 drivers/s390/crypto/vfio_ap_private.h
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 224e97b..2792c81 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -12237,6 +12237,16 @@ W: http://www.ibm.com/developerworks/linux/linux390/
> S: Supported
> F: drivers/s390/crypto/
>
> +S390 VFIO AP DRIVER
> +M: Tony Krowiak <[email protected]>
> +M: Christian Borntraeger <[email protected]>
> +M: Martin Schwidefsky <[email protected]>
> +L: [email protected]
> +W: http://www.ibm.com/developerworks/linux/linux390/
> +S: Supported
> +F: drivers/s390/crypto/vfio_ap_drv.c
> +F: drivers/s390/crypto/vfio_ap_private.h
> +
> S390 ZFCP DRIVER
> M: Steffen Maier <[email protected]>
> M: Benjamin Block <[email protected]>
> diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
> index 199ac3e..8d833be 100644
> --- a/arch/s390/Kconfig
> +++ b/arch/s390/Kconfig
> @@ -786,6 +786,17 @@ config VFIO_CCW
> To compile this driver as a module, choose M here: the
> module will be called vfio_ccw.
>
> +config VFIO_AP
> + def_tristate n
> + prompt "VFIO support for AP devices"
> + depends on ZCRYPT && VFIO_MDEV_DEVICE && KVM
> + help
> + This driver grants access to Adjunct Processor (AP) devices
> + via the VFIO mediated device interface.
> +
> + To compile this driver as a module, choose M here: the module
> + will be called vfio_ap.
> +
> endmenu
>
> menu "Dump support"
> diff --git a/drivers/s390/crypto/Makefile b/drivers/s390/crypto/Makefile
> index b59af54..48e466e 100644
> --- a/drivers/s390/crypto/Makefile
> +++ b/drivers/s390/crypto/Makefile
> @@ -15,3 +15,7 @@ obj-$(CONFIG_ZCRYPT) += zcrypt_pcixcc.o zcrypt_cex2a.o zcrypt_cex4.o
> # pkey kernel module
> pkey-objs := pkey_api.o
> obj-$(CONFIG_PKEY) += pkey.o
> +
> +# adjunct processor matrix
> +vfio_ap-objs := vfio_ap_drv.o
> +obj-$(CONFIG_VFIO_AP) += vfio_ap.o
> diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
> new file mode 100644
> index 0000000..014d70f
> --- /dev/null
> +++ b/drivers/s390/crypto/vfio_ap_drv.c
> @@ -0,0 +1,134 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +/*
> + * VFIO based AP device driver
> + *
> + * Copyright IBM Corp. 2018
> + *
> + * Author(s): Tony Krowiak <[email protected]>
> + */
> +
> +#include <linux/module.h>
> +#include <linux/mod_devicetable.h>
> +#include <linux/slab.h>
> +
> +#include "vfio_ap_private.h"
> +
> +#define VFIO_AP_ROOT_NAME "vfio_ap"
> +#define VFIO_AP_DEV_TYPE_NAME "ap_matrix"
> +#define VFIO_AP_DEV_NAME "matrix"
> +
> +MODULE_AUTHOR("IBM Corporation");
> +MODULE_DESCRIPTION("VFIO AP device driver, Copyright IBM Corp. 2017");
> +MODULE_LICENSE("GPL v2");
> +
> +static struct device *vfio_ap_root_device;
> +
> +static struct ap_driver vfio_ap_drv;
> +
> +static struct ap_matrix *ap_matrix;
> +
> +static struct device_type vfio_ap_dev_type = {
> + .name = VFIO_AP_DEV_TYPE_NAME,
> +};
> +
> +/* Only type 10 adapters (CEX4 and later) are supported
> + * by the AP matrix device driver
> + */
> +static struct ap_device_id ap_queue_ids[] = {
> + { .dev_type = AP_DEVICE_TYPE_CEX4,
> + .match_flags = AP_DEVICE_ID_MATCH_QUEUE_TYPE },
> + { .dev_type = AP_DEVICE_TYPE_CEX5,
> + .match_flags = AP_DEVICE_ID_MATCH_QUEUE_TYPE },
> + { .dev_type = AP_DEVICE_TYPE_CEX6,
> + .match_flags = AP_DEVICE_ID_MATCH_QUEUE_TYPE },
> + { /* end of sibling */ },
> +};
> +
> +MODULE_DEVICE_TABLE(vfio_ap, ap_queue_ids);
> +
> +static int vfio_ap_queue_dev_probe(struct ap_device *apdev)
> +{
> + return 0;
> +}
> +
> +static void vfio_ap_matrix_dev_release(struct device *dev)
> +{
> + struct ap_matrix *ap_matrix = dev_get_drvdata(dev);
> +
> + kfree(ap_matrix);
> +}
> +
> +static int vfio_ap_matrix_dev_create(void)
> +{
> + int ret;
> +
> + vfio_ap_root_device = root_device_register(VFIO_AP_ROOT_NAME);
> +
> + if (IS_ERR(vfio_ap_root_device)) {
> + ret = PTR_ERR(vfio_ap_root_device);
> + goto done;
> + }
> +
> + ap_matrix = kzalloc(sizeof(*ap_matrix), GFP_KERNEL);
Since you always need this, why not use a static structure?
> + if (!ap_matrix) {
> + ret = -ENOMEM;
> + goto matrix_alloc_err;
> + }
> +
> + ap_matrix->device.type = &vfio_ap_dev_type;
> + dev_set_name(&ap_matrix->device, "%s", VFIO_AP_DEV_NAME);
> + ap_matrix->device.parent = vfio_ap_root_device;
> + ap_matrix->device.release = vfio_ap_matrix_dev_release;
> + ap_matrix->device.driver = &vfio_ap_drv.driver;
> +
> + ret = device_register(&ap_matrix->device);
> + if (ret)
> + goto matrix_reg_err;
> +
> + goto done;
> +
> +matrix_reg_err:
> + put_device(&ap_matrix->device);
> +
> +matrix_alloc_err:
> + root_device_unregister(vfio_ap_root_device);
> +
> +done:
> + return ret;
> +}
> +
> +static void vfio_ap_matrix_dev_destroy(struct ap_matrix *ap_matrix)
> +{
> + device_unregister(&ap_matrix->device);
> + root_device_unregister(vfio_ap_root_device);
> +}
> +
> +int __init vfio_ap_init(void)
> +{
> + int ret;
> +
> + ret = vfio_ap_matrix_dev_create();
> + if (ret)
> + return ret;
> +
> + memset(&vfio_ap_drv, 0, sizeof(vfio_ap_drv));
Your structure is static, so there is no need to initialize it to 0.
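
To illustrate the point: C zero-initializes objects with static storage
duration before program start, so an explicit memset() on them is redundant.
A minimal userland sketch, where struct demo_driver and its members are
hypothetical stand-ins and not the real struct ap_driver:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a driver structure. */
struct demo_driver {
    int (*probe)(void *dev);
    const int *ids;
    unsigned int flags;
};

/* Static storage duration: zero-initialized without any memset(). */
static struct demo_driver demo_drv;

/* Returns 1 if every member still has its zero/NULL initial value. */
static int demo_driver_is_zeroed(const struct demo_driver *drv)
{
    assert(drv != NULL);
    return drv->probe == NULL && drv->ids == NULL && drv->flags == 0;
}
```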
> + vfio_ap_drv.probe = vfio_ap_queue_dev_probe;
> + vfio_ap_drv.ids = ap_queue_ids;
> +
> + ret = ap_driver_register(&vfio_ap_drv, THIS_MODULE, VFIO_AP_DRV_NAME);
> + if (ret) {
> + vfio_ap_matrix_dev_destroy(ap_matrix);
> + return ret;
> + }
> +
> + return 0;
> +}
> +
> +void __exit vfio_ap_exit(void)
> +{
> + ap_driver_unregister(&vfio_ap_drv);
> + vfio_ap_matrix_dev_destroy(ap_matrix);
> +}
> +
> +module_init(vfio_ap_init);
> +module_exit(vfio_ap_exit);
> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
> new file mode 100644
> index 0000000..cf23675
> --- /dev/null
> +++ b/drivers/s390/crypto/vfio_ap_private.h
> @@ -0,0 +1,23 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Private data and functions for adjunct processor VFIO matrix driver.
> + *
> + * Copyright IBM Corp. 2017
> + * Author(s): Tony Krowiak <[email protected]>
> + */
> +
> +#ifndef _VFIO_AP_PRIVATE_H_
> +#define _VFIO_AP_PRIVATE_H_
> +
> +#include <linux/types.h>
> +
> +#include "ap_bus.h"
> +
> +#define VFIO_AP_MODULE_NAME "vfio_ap"
> +#define VFIO_AP_DRV_NAME "vfio_ap"
> +
> +struct ap_matrix {
> + struct device device;
> +};
> +
> +#endif /* _VFIO_AP_PRIVATE_H_ */
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 1aa7b82..f378b98 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -200,6 +200,7 @@ struct vfio_device_info {
> #define VFIO_DEVICE_FLAGS_PLATFORM (1 << 2) /* vfio-platform device */
> #define VFIO_DEVICE_FLAGS_AMBA (1 << 3) /* vfio-amba device */
> #define VFIO_DEVICE_FLAGS_CCW (1 << 4) /* vfio-ccw device */
> +#define VFIO_DEVICE_FLAGS_AP (1 << 5) /* vfio-ap device */
> __u32 num_regions; /* Max region index + 1 */
> __u32 num_irqs; /* Max IRQ index + 1 */
> };
> @@ -215,6 +216,7 @@ struct vfio_device_info {
> #define VFIO_DEVICE_API_PLATFORM_STRING "vfio-platform"
> #define VFIO_DEVICE_API_AMBA_STRING "vfio-amba"
> #define VFIO_DEVICE_API_CCW_STRING "vfio-ccw"
> +#define VFIO_DEVICE_API_AP_STRING "vfio-ap"
>
> /**
> * VFIO_DEVICE_GET_REGION_INFO - _IOWR(VFIO_TYPE, VFIO_BASE + 8,
--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany
On 06/06/2018 11:10 AM, Pierre Morel wrote:
> On 06/06/2018 16:24, Tony Krowiak wrote:
>> On 06/05/2018 08:40 AM, Pierre Morel wrote:
>>> On 30/05/2018 16:28, Tony Krowiak wrote:
>>>> On 05/24/2018 05:10 AM, Pierre Morel wrote:
>>>>> On 23/05/2018 16:38, Tony Krowiak wrote:
>>>>>> On 05/16/2018 03:55 AM, Pierre Morel wrote:
>>>>>>> On 07/05/2018 17:11, Tony Krowiak wrote:
>>>>>>>> Provides a sysfs interface to view the AP matrix configured for
>>>>>>>> the
>>>>>>>> mediated matrix device.
>>>>>>>>
>>>>>>>> The relevant sysfs structures are:
>>>>>>>>
>>>>>>>> /sys/devices/vfio_ap
>>>>>>>> ... [matrix]
>>>>>>>> ...... [mdev_supported_types]
>>>>>>>> ......... [vfio_ap-passthrough]
>>>>>>>> ............ [devices]
>>>>>>>> ...............[$uuid]
>>>>>>>> .................. matrix
>>>>>>>>
>>>>>>>> To view the matrix configured for the mediated matrix device,
>>>>>>>> print the matrix file:
>>>>>>>
>>>>>>> This is the configured matrix, not the one used by the guest.
>>>>>>> Nothing in the patches protect against binding a queue and
>>>>>>> assigning
>>>>>>> a new AP when the guest runs.
>>>>>>> The card and queue will be showed by this entry.
>>>>>>
>>>>>> Of course, as stated above, this is the matrix configured for the
>>>>>> mediated matrix device. Are you suggesting here that the driver
>>>>>> should prevent assigning a new adapter or domain while a guest is
>>>>>> running? Couldn't this be a step in the process for hot (un)plugging
>>>>>> AP queues?
>>>>>
>>>>> No, I mean what is the point to show this?
>>>>> It is not what the guest sees.
>>>>> Has it any use case?
>>>>
>>>> The point is to display the matrix so one can view the AP queues that
>>>> have been assigned to the mediated matrix device. This is the only way
>>>> to view the matrix. Do you not find value in being able to see what
>>>> has been assigned to the mediated matrix device?
>>>
>>> Two things:
>>> 1) I think it is better to retrieve the individual masks
>>
>> I am not certain what you mean by this. Are you suggesting we display
>> the
>> actual mask? For example, the APM:
>>
>> 08000000000000001000000000000c0000000030000000000800000000000001
>>
>> If that is the case, I completely disagree as that would be worthless
>> from
>> a user perspective. Trying to figure out which APs are configured
>> would be
>> ridiculously complicated.
>
> - It is compatible with what the AP BUS shows
> - a cut and past is easy
> - you can use a userland script to translate to another format
>
>>
>> Or, are you suggesting something like this:
>>
>> 4,67,116,117,154,155,255
>
> - this is not compatible with what the AP BUS shows
> - as in the first case this is easy to parse
>
> Both propositions look better to me.
>
>>
>> Personally, I found viewing the queues to be much more valuable when
>> configuring the mediated device's matrix. I originally displayed the
>> individual adapter and domain attributes and found it cumbersome to
>> mentally configure what the matrix looked like. If you think of the
>> lszcrypt command, it outputs the adapters and queues which is the model
>> I used for this.
>
> what is the point of seeing what the matrix looks like ?
> It is interesting for the developer not for the administrator.
> What the administrator needs is:
> - To assign AP and to see what has been assigned
> - To assign domains and to see what has been assigned
>
>>
>>> 2) As I said above, what you show is not the effective mask used by
>>> the guest
>>
>> Why would a sysfs attribute for the mediated matrix device show the
>> effective
>> mask used by the guest?
>
> OK, bad word, "effective", replace with "really".
>
> We do not implement any kind of provisioning nor do we implement update
> of the CRYCB at any point after the first mediated device open.
I think this is a way we might be able to hot plug/unplug devices.
>
>
> Binding a queue and updating the mask can be done at any time (may be
> we should change this ?)
As I said above, I think we can utilize this as a means of hot
plugging/unplugging AP adapters and domains. If the guest is running when
an adapter or domain is assigned, we can update the guest's CRYCB at that
time.
>
>
> What is the point of showing a matrix which will never be used by the
> guest?
That is simply not true. The matrix WILL be used by a guest the next time a
guest is configured with a vfio-ap device referencing the path to the
mediated matrix device - i.e., -device vfio-ap,sysfsdev=$PATH. The point
is to show the matrix assigned to the mediated matrix device. In my mind,
the mediated matrix device is a separate object from the guest. Sure, it is
used to configure a guest's matrix when the guest is started, but it could
be used to configure the matrix for any guest; it has no direct connection
to a particular guest until a guest using the device is started. IMHO the
sysfs attributes for the mediated matrix device reflect only the attributes
of the device, not the attributes of a guest.
On 06/06/2018 12:08 PM, Pierre Morel wrote:
> On 06/06/2018 16:28, Tony Krowiak wrote:
>> On 06/05/2018 08:19 AM, Pierre Morel wrote:
>>> On 30/05/2018 16:33, Tony Krowiak wrote:
>>>> On 05/24/2018 05:08 AM, Pierre Morel wrote:
>>>>> On 23/05/2018 16:45, Tony Krowiak wrote:
>>>>>> On 05/16/2018 04:03 AM, Pierre Morel wrote:
>>>>>>> On 07/05/2018 17:11, Tony Krowiak wrote:
>>>>>>>> Implements the open callback on the mediated matrix device.
>>>>>>>> The function registers a group notifier to receive notification
>>>>>>>> of the VFIO_GROUP_NOTIFY_SET_KVM event. When notified,
>>>>>>>> the vfio_ap device driver will get access to the guest's
>>>>>>>> kvm structure. With access to this structure the driver will:
>>>>>>>>
>>>>>>>> 1. Ensure that only one mediated device is opened for the guest
>>>>>
>>>>> You should explain why.
>>>>>
>>>>>>>>
>>>>>>>> 2. Configure access to the AP devices for the guest.
>>>>>>>>
>>>>> ...snip...
>>>>>>>> +void kvm_ap_refcount_inc(struct kvm *kvm)
>>>>>>>> +{
>>>>>>>> + atomic_inc(&kvm->arch.crypto.aprefs);
>>>>>>>> +}
>>>>>>>> +EXPORT_SYMBOL(kvm_ap_refcount_inc);
>>>>>>>> +
>>>>>>>> +void kvm_ap_refcount_dec(struct kvm *kvm)
>>>>>>>> +{
>>>>>>>> + atomic_dec(&kvm->arch.crypto.aprefs);
>>>>>>>> +}
>>>>>>>> +EXPORT_SYMBOL(kvm_ap_refcount_dec);
>>>>>>>
>>>>>>> Why are these functions inside kvm-ap ?
>>>>>>> Will anyone use this outer of vfio-ap ?
>>>>>>
>>>>>> As I've stated before, I made the choice to contain all
>>>>>> interfaces that
>>>>>> access KVM in kvm-ap because I don't think it is appropriate for
>>>>>> the device
>>>>>> driver to have to have "knowledge" of the inner workings of KVM.
>>>>>> Why does
>>>>>> it matter whether any entity outside of the vfio_ap device driver
>>>>>> calls
>>>>>> these functions? I could ask a similar question if the interfaces
>>>>>> were
>>>>>> contained in vfio-ap; what if another device driver needs access
>>>>>> to these
>>>>>> interfaces?
>>>>>
>>>>> This is very driver specific and only used during initialization.
>>>>> It is not a common property of the cryptographic interface.
>>>>>
>>>>> I really think you should handle this inside the driver.
>>>>
>>>> We are going to have to agree to disagree on this one. Is it not
>>>> possible
>>>> that future drivers - e.g., when full virtualization is implemented
>>>> - will
>>>> require access to KVM?
>>>
>>> I do not think that an access to KVM is required for full
>>> virtualization.
>>
>> You may be right, but at this point, there is no guarantee. I stand
>> by my
>> design on this one.
>
> I really regret that we abandoned the initial design with the matrix
> bus and one
> single parent matrix device per guest.
This is an interesting time to be bringing this up.
> We would not have the problem of these KVM dependencies.
How does that eliminate these KVM dependencies? We would still have to
configure the guest's SIE state description - i.e., ECA.28 and the CRYCB -
regardless of the number or purpose of the matrix devices. To what KVM
dependencies are you referring?
>
>
> It had the advantage of taking care of having only one device per guest
> (available_instance = 1),
Maybe you didn't state this as you intended, but when you refer to
available_instances, you are referring to mediated devices. We allow
only one mediated device per guest in the current design. I suspect
that is not what you meant here.
> could take care of provisioning as you have
> sysfs entries available for a matrix without having a guest and a
> mediated
> device.
I assume here that you are saying that the matrix configuration would be
done via sysfs files for the matrix device as opposed to the mediated
device?
>
>
> it also had advantage for virtualization to keep host side and guest
> side matrix
> separate inside parent (host side) and mediated device (guest side).
In my opinion, since the AP devices assigned to the matrix device are
used only by a guest (i.e., pass-through) and never by the host, it is all
guest side configuration. Even if we map virtual AP devices to real AP
devices, the mapping is still guest side configuration from my perspective.
I think this can all be handled by using differing mediated device types
for pass-through, virtualized and emulated devices. In fact, early on I
prototyped the mediated device sysfs structures for configuring all three
mediated device types, if you recall. I see no advantage to keeping
separate configurations for host and guest sides and in fact think it
complicates things.
>
>
> Shouldn't we treat this problem with a design using standard interfaces
> Instead of adding new dedicated interfaces?
I do not understand this question. I believe we are using standard
interfaces. We use the bind/unbind interface to reserve queues for use by
guests and have sysfs attributes for the mediated devices that map directly
to the APM, AQM and ADM. What do you mean by dedicated interfaces?
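To make the APM/AQM mapping concrete, here is a minimal user-space sketch (not the vfio_ap code) of how an adapter mask and a domain mask describe a cross product of queues, and how two such matrices can be checked for overlap. The names ap_matrix_sketch, mask_set, matrix_has_queue and matrix_overlaps are hypothetical, the size of 256 adapters and 256 domains is an assumption, and the bit numbering is simplified compared to the hardware masks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define AP_SKETCH_IDS   256                  /* assumed number of adapters/domains */
#define AP_SKETCH_WORDS (AP_SKETCH_IDS / 64) /* 64-bit words per mask */

/* Hypothetical stand-in for a matrix: one adapter mask, one domain mask. */
struct ap_matrix_sketch {
	uint64_t apm[AP_SKETCH_WORDS]; /* adapters (cards) */
	uint64_t aqm[AP_SKETCH_WORDS]; /* usage domains */
};

/* Set the bit for one adapter or domain id (simplified bit order). */
static void mask_set(uint64_t *mask, unsigned int id)
{
	assert(id < AP_SKETCH_IDS);
	mask[id / 64] |= 1ULL << (id % 64);
}

static bool mask_test(const uint64_t *mask, unsigned int id)
{
	assert(id < AP_SKETCH_IDS);
	return (mask[id / 64] >> (id % 64)) & 1;
}

/* Queue (card, domain) is in the matrix only if both bits are set. */
static bool matrix_has_queue(const struct ap_matrix_sketch *m,
			     unsigned int card, unsigned int domain)
{
	return mask_test(m->apm, card) && mask_test(m->aqm, domain);
}

/*
 * Two cross products share a queue exactly when the matrices have at
 * least one adapter in common AND at least one domain in common.
 */
static bool matrix_overlaps(const struct ap_matrix_sketch *a,
			    const struct ap_matrix_sketch *b)
{
	bool shared_card = false, shared_domain = false;

	for (int i = 0; i < AP_SKETCH_WORDS; i++) {
		if (a->apm[i] & b->apm[i])
			shared_card = true;
		if (a->aqm[i] & b->aqm[i])
			shared_domain = true;
	}
	return shared_card && shared_domain;
}
```

With card 4 and domains 1 and 2 assigned, the matrix contains queues (4,1) and (4,2); a second matrix with card 4 and domain 3 does not overlap it, while one with card 4 and domain 2 does.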
In fact, I think the design about which you speak introduces a need for
non-standard and confusing interfaces. For example, think about securing
AP queues; you'd have to unbind the queues from a device driver on the
AP bus and bind them to a driver on a different bus, the matrix bus.
This would require radical design changes to and/or introduction of
non-standard interfaces on the AP bus. It would also introduce some
unusual sysfs interfaces on the matrix driver to validate and commit
the matrix - i.e., APM, AQM - created from the queues bound to it.
>
> Regards,
>
> Pierre
>
>
On 06/06/2018 01:40 PM, Pierre Morel wrote:
> On 06/06/2018 18:08, Pierre Morel wrote:
>> On 06/06/2018 16:28, Tony Krowiak wrote:
>>> On 06/05/2018 08:19 AM, Pierre Morel wrote:
>>>> On 30/05/2018 16:33, Tony Krowiak wrote:
>>>>> On 05/24/2018 05:08 AM, Pierre Morel wrote:
>>>>>> On 23/05/2018 16:45, Tony Krowiak wrote:
>>>>>>> On 05/16/2018 04:03 AM, Pierre Morel wrote:
>>>>>>>> On 07/05/2018 17:11, Tony Krowiak wrote:
>>>>>>>>> Implements the open callback on the mediated matrix device.
>>>>>>>>> The function registers a group notifier to receive notification
>>>>>>>>> of the VFIO_GROUP_NOTIFY_SET_KVM event. When notified,
>>>>>>>>> the vfio_ap device driver will get access to the guest's
>>>>>>>>> kvm structure. With access to this structure the driver will:
>>>>>>>>>
>>>>>>>>> 1. Ensure that only one mediated device is opened for the guest
>>>>>>
>>>>>> You should explain why.
>>>>>>
>>>>>>>>>
>>>>>>>>> 2. Configure access to the AP devices for the guest.
>>>>>>>>>
>>>>>> ...snip...
>>>>>>>>> +void kvm_ap_refcount_inc(struct kvm *kvm)
>>>>>>>>> +{
>>>>>>>>> + atomic_inc(&kvm->arch.crypto.aprefs);
>>>>>>>>> +}
>>>>>>>>> +EXPORT_SYMBOL(kvm_ap_refcount_inc);
>>>>>>>>> +
>>>>>>>>> +void kvm_ap_refcount_dec(struct kvm *kvm)
>>>>>>>>> +{
>>>>>>>>> + atomic_dec(&kvm->arch.crypto.aprefs);
>>>>>>>>> +}
>>>>>>>>> +EXPORT_SYMBOL(kvm_ap_refcount_dec);
>>>>>>>>
>>>>>>>> Why are these functions inside kvm-ap ?
>>>>>>>> Will anyone use this outer of vfio-ap ?
>>>>>>>
>>>>>>> As I've stated before, I made the choice to contain all
>>>>>>> interfaces that
>>>>>>> access KVM in kvm-ap because I don't think it is appropriate for
>>>>>>> the device
>>>>>>> driver to have to have "knowledge" of the inner workings of KVM.
>>>>>>> Why does
>>>>>>> it matter whether any entity outside of the vfio_ap device
>>>>>>> driver calls
>>>>>>> these functions? I could ask a similar question if the
>>>>>>> interfaces were
>>>>>>> contained in vfio-ap; what if another device driver needs access
>>>>>>> to these
>>>>>>> interfaces?
>>>>>>
>>>>>> This is very driver specific and only used during initialization.
>>>>>> It is not a common property of the cryptographic interface.
>>>>>>
>>>>>> I really think you should handle this inside the driver.
>>>>>
>>>>> We are going to have to agree to disagree on this one. Is it not
>>>>> possible
>>>>> that future drivers - e.g., when full virtualization is
>>>>> implemented - will
>>>>> require access to KVM?
>>>>
>>>> I do not think that an access to KVM is required for full
>>>> virtualization.
>>>
>>> You may be right, but at this point, there is no guarantee. I stand
>>> by my
>>> design on this one.
>>
>> I really regret that we abandoned the initial design with the matrix
>> bus and one
>> single parent matrix device per guest.
>> We would not have the problem of these KVM dependencies.
>>
>> It had the advantage of taking care of having only one device per guest
>> (available_instance = 1), could take care of provisioning as you have
>> sysfs entries available for a matrix without having a guest and a
>> mediated
>> device.
>>
>> it also had advantage for virtualization to keep host side and guest
>> side matrix
>> separate inside parent (host side) and mediated device (guest side).
>>
>> Shouldn't we treat this problem with a design using standard interfaces
>> Instead of adding new dedicated interfaces?
>>
>> Regards,
>>
>> Pierre
>>
>>
>
> Forget it.
>
> I am not happy with the design but the design I was speaking of may
> not be the solution either.
The AP architecture makes virtualization of AP devices complex. We tried
the solution you described and found it to be sorely lacking, which is why
we ended up where we are now.
>
>
> Sorry for the noise.
>
> Regards,
>
> Pierre
>
>
On 06/07/2018 02:53 PM, Tony Krowiak wrote:
>>>> 2) As I said above, what you show is not the effective mask used by the guest
>>>
>>> Why would a sysfs attribute for the mediated matrix device show the effective
>>> mask used by the guest?
>>
>> OK, bad word, "effective", replace with "really".
>>
>> We do not implement any kind of provisioning nor do we implement update
>> of the CRYCB at any point after the first mediated device open.
>
> I think this is a way we might be able to hot plug/unplug devices.
>
>>
>>
>> Binding a queue and updating the mask can be done at any time (may be we should change this ?)
>
> As I said above, I think we can utilize this as a means of hot plugging/unplugging AP
> adapters and domains. If the guest is running when an adapter or domain is assigned,
> we can update the guest's CRYCB at that time.
>
>>
>>
>> What is the point of showing a matrix which will never be used by the guest?
>
> That is simply not true. The matrix WILL be used by a guest the next time a
> guest is configured with a vfio-ap device referencing the path to the
> mediated matrix device - i.e., -device vfio-ap,sysfsdev=$PATH. The point
> is to show the matrix assigned to the mediated matrix device. In my mind, the
> mediated matrix device is a separate object from the guest. Sure it is used
> to configure a guest's matrix when the guest is started, but it could be used
> to configure the matrix for any guest; it has no direct connection to a
> particular guest until a guest using the device is started. IMHO the sysfs
> attributes for the mediated matrix device reflect only the attributes of
> the device, not the attributes of a guest.
So bottom line is what? Is the interface going to change so that modifications
to the mdev's matrix will be reflected immediately -- to support hotplug of
domains and ap cards?
Or are you intending to keep the interface as is?
If the matrix assigned to the mediated device can differ from the matrix
of the guest (that is, the masks in the CRYCB, and I'm talking about a running
guest), do you provide a way for the host admin to examine the matrix of the
guest? If not, why do you think that information is irrelevant to the host
admin?
Regards,
Halil
On 06/07/2018 09:16 AM, Halil Pasic wrote:
>
>
> On 06/07/2018 02:53 PM, Tony Krowiak wrote:
>>>>> 2) As I said above, what you show is not the effective mask used
>>>>> by the guest
>>>>
>>>> Why would a sysfs attribute for the mediated matrix device show the
>>>> effective
>>>> mask used by the guest?
>>>
>>> OK, bad word, "effective", replace with "really".
>>>
>>> We do not implement any kind of provisioning nor do we implement update
>>> of the CRYCB at any point after the first mediated device open.
>>
>> I think this is a way we might be able to hot plug/unplug devices.
>>
>>>
>>>
>>> Binding a queue and updating the mask can be done at any time (may
>>> be we should change this ?)
>>
>> As I said above, I think we can utilize this as a means of hot
>> plugging/unplugging AP
>> adapters and domains. If the guest is running when an adapter or
>> domain is assigned,
>> we can update the guest's CRYCB at that time.
>>
>>>
>>>
>>> What is the point of showing a matrix which will never be used by
>>> the guest?
>>
>> That is simply not true. The matrix WILL be used by a guest the next time a
>> guest is configured with a vfio-ap device referencing the path to the
>> mediated matrix device - i.e., -device vfio-ap,sysfsdev=$PATH. The point
>> is to show the matrix assigned to the mediated matrix device. In my mind, the
>> mediated matrix device is a separate object from the guest. Sure it is used
>> to configure a guest's matrix when the guest is started, but it could be used
>> to configure the matrix for any guest; it has no direct connection to a
>> particular guest until a guest using the device is started. IMHO the sysfs
>> attributes for the mediated matrix device reflect only the attributes of
>> the device, not the attributes of a guest.
>
> So bottom line is what? Is the interface going to change so that
> modifications
> to the mdev's matrix will be reflected immediately -- to support
> hotplug of
> domains and ap cards?
>
>
> Or are you intending to keep the interface as is?
I have been looking into hot plug/unplug. I am in the exploratory phase and
have not yet come up with a concrete plan, but I believe we will be able to
hot plug/unplug adapters and domains using the sysfs attribute interfaces
for the mediated matrix device.
>
>
> If the matrix assigned to the mediated device can differ from the matrix
> of the guest (that is the masks in the CRYCB, and I'm talking about a
> running
> guest) do you provide a way for the host admin to examine the matrix
> of the
> guest? If not, why do you think that information is irrelevant to the
> host
> admin?
I never said the information contained in the CRYCB is irrelevant to the
host admin. What I said was that the sysfs attributes apply to the mediated
device, not the guest. There may or may not be a guest using the mediated
device at any given point in time. I do not currently provide a way to
examine the matrix of the guest. I'm not sure whether a sysfs attribute of
the mediated device is the appropriate venue for displaying what's in the
guest's CRYCB. I suppose we could use the mediated device matrix attribute
for this purpose if you and Pierre insist, but I think we still need to
display the matrix or devices configured for the mediated device. It does
raise the question: are there interfaces an admin can use to display all of
the other devices being used by a guest? The only surefire way to see which
AP devices are actually used by the guest is to execute lszcrypt on the
guest itself.
>
>
> Regards,
> Halil
On 07/06/2018 15:54, Tony Krowiak wrote:
> On 06/06/2018 01:40 PM, Pierre Morel wrote:
>> On 06/06/2018 18:08, Pierre Morel wrote:
>>> On 06/06/2018 16:28, Tony Krowiak wrote:
>>>> On 06/05/2018 08:19 AM, Pierre Morel wrote:
>>>>> On 30/05/2018 16:33, Tony Krowiak wrote:
>>>>>> On 05/24/2018 05:08 AM, Pierre Morel wrote:
>>>>>>> On 23/05/2018 16:45, Tony Krowiak wrote:
>>>>>>>> On 05/16/2018 04:03 AM, Pierre Morel wrote:
>>>>>>>>> On 07/05/2018 17:11, Tony Krowiak wrote:
>>>>>>>>>> Implements the open callback on the mediated matrix device.
>>>>>>>>>> The function registers a group notifier to receive notification
>>>>>>>>>> of the VFIO_GROUP_NOTIFY_SET_KVM event. When notified,
>>>>>>>>>> the vfio_ap device driver will get access to the guest's
>>>>>>>>>> kvm structure. With access to this structure the driver will:
>>>>>>>>>>
>>>>>>>>>> 1. Ensure that only one mediated device is opened for the guest
>>>>>>>
>>>>>>> You should explain why.
>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 2. Configure access to the AP devices for the guest.
>>>>>>>>>>
>>>>>>> ...snip...
>>>>>>>>>> +void kvm_ap_refcount_inc(struct kvm *kvm)
>>>>>>>>>> +{
>>>>>>>>>> + atomic_inc(&kvm->arch.crypto.aprefs);
>>>>>>>>>> +}
>>>>>>>>>> +EXPORT_SYMBOL(kvm_ap_refcount_inc);
>>>>>>>>>> +
>>>>>>>>>> +void kvm_ap_refcount_dec(struct kvm *kvm)
>>>>>>>>>> +{
>>>>>>>>>> + atomic_dec(&kvm->arch.crypto.aprefs);
>>>>>>>>>> +}
>>>>>>>>>> +EXPORT_SYMBOL(kvm_ap_refcount_dec);
>>>>>>>>>
>>>>>>>>> Why are these functions inside kvm-ap ?
>>>>>>>>> Will anyone use this outer of vfio-ap ?
>>>>>>>>
>>>>>>>> As I've stated before, I made the choice to contain all
>>>>>>>> interfaces that
>>>>>>>> access KVM in kvm-ap because I don't think it is appropriate
>>>>>>>> for the device
>>>>>>>> driver to have to have "knowledge" of the inner workings of
>>>>>>>> KVM. Why does
>>>>>>>> it matter whether any entity outside of the vfio_ap device
>>>>>>>> driver calls
>>>>>>>> these functions? I could ask a similar question if the
>>>>>>>> interfaces were
>>>>>>>> contained in vfio-ap; what if another device driver needs
>>>>>>>> access to these
>>>>>>>> interfaces?
>>>>>>>
>>>>>>> This is very driver specific and only used during initialization.
>>>>>>> It is not a common property of the cryptographic interface.
>>>>>>>
>>>>>>> I really think you should handle this inside the driver.
>>>>>>
>>>>>> We are going to have to agree to disagree on this one. Is it not
>>>>>> possible
>>>>>> that future drivers - e.g., when full virtualization is
>>>>>> implemented - will
>>>>>> require access to KVM?
>>>>>
>>>>> I do not think that an access to KVM is required for full
>>>>> virtualization.
>>>>
>>>> You may be right, but at this point, there is no guarantee. I stand
>>>> by my
>>>> design on this one.
>>>
>>> I really regret that we abandoned the initial design with the matrix
>>> bus and one
>>> single parent matrix device per guest.
>>> We would not have the problem of these KVM dependencies.
>>>
>>> It had the advantage of taking care of having only one device per guest
>>> (available_instance = 1), could take care of provisioning as you have
>>> sysfs entries available for a matrix without having a guest and a
>>> mediated
>>> device.
>>>
>>> it also had advantage for virtualization to keep host side and guest
>>> side matrix
>>> separate inside parent (host side) and mediated device (guest side).
>>>
>>> Shouldn't we treat this problem with a design using standard interfaces
>>> Instead of adding new dedicated interfaces?
>>>
>>> Regards,
>>>
>>> Pierre
>>>
>>>
>>
>> Forget it.
>>
>> I am not happy with the design but the design I was speaking of may
>> not be the solution either.
>
> The AP architecture makes virtualization of AP devices complex. We
> tried the solution you
> described and found it to be sorely lacking which is why we ended up
> where we are now.
I did not see any explanation of why it was abandoned between v1 and v2.
We have internal structures like the ap_matrix and kvm_ap_matrix
which look like the bus/devices we had previously but are integrated
differently, or not at all, with the LDD.
Also I think that with a little data structure refactoring you can avoid
most of the code in arch/s390/kvm.
For example, storing the kvm pointer inside the kvm_ap_matrix and
maintaining a list of the kvm_ap_matrix structures makes it easy to tell
whether a guest already has an associated mediated device.
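A minimal user-space sketch of that idea follows; it is not the patch's code. The names kvm_ap_matrix_sketch, matrix_register, guest_has_matrix, matrix_attach_guest and matrix_detach_guest are hypothetical, struct kvm is reduced to a placeholder, and a kernel version would presumably use list_head plus a lock instead of a bare singly linked list.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder for the real struct kvm; only its address matters here. */
struct kvm {
	int id;
};

/* Hypothetical matrix structure that remembers which guest uses it. */
struct kvm_ap_matrix_sketch {
	struct kvm *kvm;                   /* NULL while no guest is attached */
	struct kvm_ap_matrix_sketch *next; /* link in the driver-owned list */
};

/* List owned by the driver, so no walk over KVM's vm_list is needed. */
static struct kvm_ap_matrix_sketch *matrix_list;

static void matrix_register(struct kvm_ap_matrix_sketch *m)
{
	assert(m != NULL);
	m->next = matrix_list;
	matrix_list = m;
}

/* Does any registered matrix already belong to this guest? */
static bool guest_has_matrix(const struct kvm *kvm)
{
	for (const struct kvm_ap_matrix_sketch *m = matrix_list; m; m = m->next)
		if (m->kvm == kvm)
			return true;
	return false;
}

/* Attach a guest; refuse if the matrix is in use or the guest has one. */
static int matrix_attach_guest(struct kvm_ap_matrix_sketch *m, struct kvm *kvm)
{
	assert(m != NULL && kvm != NULL);
	if (m->kvm != NULL || guest_has_matrix(kvm))
		return -EBUSY;
	m->kvm = kvm;
	return 0;
}

static void matrix_detach_guest(struct kvm_ap_matrix_sketch *m)
{
	assert(m != NULL);
	m->kvm = NULL;
}
```

The "only one mediated device per guest" check then becomes a walk over the driver's own list at open time.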
Pierre
>
>>
>>
>> Sorry for the noise.
>>
>> Regards,
>>
>> Pierre
>>
>>
>
--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany
On 06/07/2018 11:20 AM, Pierre Morel wrote:
> On 07/06/2018 15:54, Tony Krowiak wrote:
>> On 06/06/2018 01:40 PM, Pierre Morel wrote:
>>> On 06/06/2018 18:08, Pierre Morel wrote:
>>>> On 06/06/2018 16:28, Tony Krowiak wrote:
>>>>> On 06/05/2018 08:19 AM, Pierre Morel wrote:
>>>>>> On 30/05/2018 16:33, Tony Krowiak wrote:
>>>>>>> On 05/24/2018 05:08 AM, Pierre Morel wrote:
>>>>>>>> On 23/05/2018 16:45, Tony Krowiak wrote:
>>>>>>>>> On 05/16/2018 04:03 AM, Pierre Morel wrote:
>>>>>>>>>> On 07/05/2018 17:11, Tony Krowiak wrote:
>>>>>>>>>>> Implements the open callback on the mediated matrix device.
>>>>>>>>>>> The function registers a group notifier to receive notification
>>>>>>>>>>> of the VFIO_GROUP_NOTIFY_SET_KVM event. When notified,
>>>>>>>>>>> the vfio_ap device driver will get access to the guest's
>>>>>>>>>>> kvm structure. With access to this structure the driver will:
>>>>>>>>>>>
>>>>>>>>>>> 1. Ensure that only one mediated device is opened for the guest
>>>>>>>>
>>>>>>>> You should explain why.
>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 2. Configure access to the AP devices for the guest.
>>>>>>>>>>>
>>>>>>>> ...snip...
>>>>>>>>>>> +void kvm_ap_refcount_inc(struct kvm *kvm)
>>>>>>>>>>> +{
>>>>>>>>>>> + atomic_inc(&kvm->arch.crypto.aprefs);
>>>>>>>>>>> +}
>>>>>>>>>>> +EXPORT_SYMBOL(kvm_ap_refcount_inc);
>>>>>>>>>>> +
>>>>>>>>>>> +void kvm_ap_refcount_dec(struct kvm *kvm)
>>>>>>>>>>> +{
>>>>>>>>>>> + atomic_dec(&kvm->arch.crypto.aprefs);
>>>>>>>>>>> +}
>>>>>>>>>>> +EXPORT_SYMBOL(kvm_ap_refcount_dec);
>>>>>>>>>>
>>>>>>>>>> Why are these functions inside kvm-ap ?
>>>>>>>>>> Will anyone use this outer of vfio-ap ?
>>>>>>>>>
>>>>>>>>> As I've stated before, I made the choice to contain all
>>>>>>>>> interfaces that
>>>>>>>>> access KVM in kvm-ap because I don't think it is appropriate
>>>>>>>>> for the device
>>>>>>>>> driver to have to have "knowledge" of the inner workings of
>>>>>>>>> KVM. Why does
>>>>>>>>> it matter whether any entity outside of the vfio_ap device
>>>>>>>>> driver calls
>>>>>>>>> these functions? I could ask a similar question if the
>>>>>>>>> interfaces were
>>>>>>>>> contained in vfio-ap; what if another device driver needs
>>>>>>>>> access to these
>>>>>>>>> interfaces?
>>>>>>>>
>>>>>>>> This is very driver specific and only used during initialization.
>>>>>>>> It is not a common property of the cryptographic interface.
>>>>>>>>
>>>>>>>> I really think you should handle this inside the driver.
>>>>>>>
>>>>>>> We are going to have to agree to disagree on this one. Is it not
>>>>>>> possible
>>>>>>> that future drivers - e.g., when full virtualization is
>>>>>>> implemented - will
>>>>>>> require access to KVM?
>>>>>>
>>>>>> I do not think that an access to KVM is required for full
>>>>>> virtualization.
>>>>>
>>>>> You may be right, but at this point, there is no guarantee. I
>>>>> stand by my
>>>>> design on this one.
>>>>
>>>> I really regret that we abandoned the initial design with the
>>>> matrix bus and one
>>>> single parent matrix device per guest.
>>>> We would not have the problem of these KVM dependencies.
>>>>
>>>> It had the advantage of taking care of having only one device per
>>>> guest
>>>> (available_instance = 1), could take care of provisioning as you have
>>>> sysfs entries available for a matrix without having a guest and a
>>>> mediated
>>>> device.
>>>>
>>>> it also had advantage for virtualization to keep host side and
>>>> guest side matrix
>>>> separate inside parent (host side) and mediated device (guest side).
>>>>
>>>> Shouldn't we treat this problem with a design using standard
>>>> interfaces
>>>> Instead of adding new dedicated interfaces?
>>>>
>>>> Regards,
>>>>
>>>> Pierre
>>>>
>>>>
>>>
>>> Forget it.
>>>
>>> I am not happy with the design but the design I was speaking of may
>>> not be the solution either.
>>
>> The AP architecture makes virtualization of AP devices complex. We
>> tried the solution you
>> described and found it to be sorely lacking which is why we ended up
>> where we are now.
>
> I did not see any explanation on why between v1 and v2 as it was
> abandoned.
>
>
> We have internal structures like the ap_matrix and kvm_ap_matrix
> which look like the bus/devices we had previously but are differently
> or not at all integrated with the LDD.
What is LDD? Are you talking about dependencies between the vfio_ap device
driver and KVM? If so, see my arguments below.
>
>
> Also I think that with a little data structure refactoring you can
> avoid most of
> the code in the arch/s390/kvm.
How will structure refactoring help us avoid the code for updating the CRYCB
in the guest's SIE state description?
>
>
> For example, storing the kvm pointer inside the kvm_ap_matrix and
> maintaining a list of the kvm_ap_matrix structures allows to easily know
> if a guest already has an associated mediated device.
How is that easier than storing the kvm pointer inside of the mediated
matrix device (i.e., struct ap_matrix_mdev), which also contains the struct
kvm_ap_matrix?
How does that allow us to avoid the code in arch/s390/kvm? We still need
the code to update the CRYCB in the SIE block. I can obviously avoid placing
that code in kvm-ap.c and move it to the driver, but I already explained my
reasoning for keeping it in KVM. Let's face it, there is no way around the
dependency between the vfio_ap device driver and KVM unless guest matrix
configuration is managed solely by KVM through KVM interfaces.
Why maintain a list of kvm_ap_matrix structures if we don't have to? It
is stored with the mediated matrix device, which is passed in to all of the
vfio_ap driver callbacks.
>
>
> Pierre
>
>>
>>>
>>>
>>> Sorry for the noise.
>>>
>>> Regards,
>>>
>>> Pierre
>>>
>>>
>>
>
On 07/06/2018 18:30, Tony Krowiak wrote:
> On 06/07/2018 11:20 AM, Pierre Morel wrote:
>> On 07/06/2018 15:54, Tony Krowiak wrote:
>>> On 06/06/2018 01:40 PM, Pierre Morel wrote:
>>>> On 06/06/2018 18:08, Pierre Morel wrote:
>>>>> On 06/06/2018 16:28, Tony Krowiak wrote:
>>>>>> On 06/05/2018 08:19 AM, Pierre Morel wrote:
>>>>>>> On 30/05/2018 16:33, Tony Krowiak wrote:
>>>>>>>> On 05/24/2018 05:08 AM, Pierre Morel wrote:
>>>>>>>>> On 23/05/2018 16:45, Tony Krowiak wrote:
>>>>>>>>>> On 05/16/2018 04:03 AM, Pierre Morel wrote:
>>>>>>>>>>> On 07/05/2018 17:11, Tony Krowiak wrote:
>>>>>>>>>>>> Implements the open callback on the mediated matrix device.
>>>>>>>>>>>> The function registers a group notifier to receive
>>>>>>>>>>>> notification
>>>>>>>>>>>> of the VFIO_GROUP_NOTIFY_SET_KVM event. When notified,
>>>>>>>>>>>> the vfio_ap device driver will get access to the guest's
>>>>>>>>>>>> kvm structure. With access to this structure the driver will:
>>>>>>>>>>>>
>>>>>>>>>>>> 1. Ensure that only one mediated device is opened for the
>>>>>>>>>>>> guest
>>>>>>>>>
>>>>>>>>> You should explain why.
>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> 2. Configure access to the AP devices for the guest.
>>>>>>>>>>>>
>>>>>>>>> ...snip...
>>>>>>>>>>>> +void kvm_ap_refcount_inc(struct kvm *kvm)
>>>>>>>>>>>> +{
>>>>>>>>>>>> + atomic_inc(&kvm->arch.crypto.aprefs);
>>>>>>>>>>>> +}
>>>>>>>>>>>> +EXPORT_SYMBOL(kvm_ap_refcount_inc);
>>>>>>>>>>>> +
>>>>>>>>>>>> +void kvm_ap_refcount_dec(struct kvm *kvm)
>>>>>>>>>>>> +{
>>>>>>>>>>>> + atomic_dec(&kvm->arch.crypto.aprefs);
>>>>>>>>>>>> +}
>>>>>>>>>>>> +EXPORT_SYMBOL(kvm_ap_refcount_dec);
>>>>>>>>>>>
>>>>>>>>>>> Why are these functions inside kvm-ap ?
>>>>>>>>>>> Will anyone use this outer of vfio-ap ?
>>>>>>>>>>
>>>>>>>>>> As I've stated before, I made the choice to contain all
>>>>>>>>>> interfaces that
>>>>>>>>>> access KVM in kvm-ap because I don't think it is appropriate
>>>>>>>>>> for the device
>>>>>>>>>> driver to have to have "knowledge" of the inner workings of
>>>>>>>>>> KVM. Why does
>>>>>>>>>> it matter whether any entity outside of the vfio_ap device
>>>>>>>>>> driver calls
>>>>>>>>>> these functions? I could ask a similar question if the
>>>>>>>>>> interfaces were
>>>>>>>>>> contained in vfio-ap; what if another device driver needs
>>>>>>>>>> access to these
>>>>>>>>>> interfaces?
>>>>>>>>>
>>>>>>>>> This is very driver specific and only used during initialization.
>>>>>>>>> It is not a common property of the cryptographic interface.
>>>>>>>>>
>>>>>>>>> I really think you should handle this inside the driver.
>>>>>>>>
>>>>>>>> We are going to have to agree to disagree on this one. Is it
>>>>>>>> not possible
>>>>>>>> that future drivers - e.g., when full virtualization is
>>>>>>>> implemented - will
>>>>>>>> require access to KVM?
>>>>>>>
>>>>>>> I do not think that an access to KVM is required for full
>>>>>>> virtualization.
>>>>>>
>>>>>> You may be right, but at this point, there is no guarantee. I
>>>>>> stand by my
>>>>>> design on this one.
>>>>>
>>>>> I really regret that we abandoned the initial design with the
>>>>> matrix bus and one
>>>>> single parent matrix device per guest.
>>>>> We would not have the problem of these KVM dependencies.
>>>>>
>>>>> It had the advantage of taking care of having only one device per
>>>>> guest
>>>>> (available_instance = 1), could take care of provisioning as you have
>>>>> sysfs entries available for a matrix without having a guest and a
>>>>> mediated
>>>>> device.
>>>>>
>>>>> it also had advantage for virtualization to keep host side and
>>>>> guest side matrix
>>>>> separate inside parent (host side) and mediated device (guest side).
>>>>>
>>>>> Shouldn't we treat this problem with a design using standard
>>>>> interfaces
>>>>> Instead of adding new dedicated interfaces?
>>>>>
>>>>> Regards,
>>>>>
>>>>> Pierre
>>>>>
>>>>>
>>>>
>>>> Forget it.
>>>>
>>>> I am not happy with the design but the design I was speaking of may
>>>> not be the solution either.
>>>
>>> The AP architecture makes virtualization of AP devices complex. We
>>> tried the solution you
>>> described and found it to be sorely lacking which is why we ended up
>>> where we are now.
>>
>> I did not see any explanation on why between v1 and v2 as it was
>> abandoned.
>>
>>
>> We have internal structures like the ap_matrix and kvm_ap_matrix
>> which look like the bus/devices we had previously but are differently
>> or not at all integrated with the LDD.
>
> What is LDD? Are you talking about dependencies between the vfio_ap
> device
> driver and KVM? If so, see my arguments below.
>
>>
>>
>> Also I think that with a little data structure refactoring you can
>> avoid most of
>> the code in the arch/s390/kvm.
>
> How will structure refactoring help us avoid the code for updating the
> CRYCB
> in the guest's SIE state description.
>
>>
>>
>> For example, storing the kvm pointer inside the kvm_ap_matrix and
>> maintaining a list of the kvm_ap_matrix structures allows to easily know
>> if a guest already has an associated mediated device.
>
> How is that easier than storing the kvm pointer inside of the mediated
> matrix
> device (i.e., struct ap_matrix_mdev) which also contains the struct
> kvm_ap_matrix?
You can put it in ap_matrix_mdev, but the name "kvm_ap_matrix" alone makes
the latter a better candidate in my opinion.
> How does that allow us to avoid the code in arch/s390/kvm?
This alone does not.
> We still need the code
> to update the CRYCB in the SIE block. I can obviously avoid placing
> that code in
> kvm-ap.c and move it to the driver, but I already explained my
> reasoning for
> keeping it in KVM. Let's face it, there is no way around the
> dependency between
> the vfio_ap device driver and KVM unless guest matrix configuration is
> managed
> solely by KVM through KVM interfaces.
We get the pointer to KVM from the VFIO interface.
The two of us arguing about this is sterile.
The only one who could say what is right is an s390 KVM maintainer.
This would end the discussion.
My point was just to say that we have an alternative.
>
> Why maintain a list of kvm_ap_matrix structures if we don't have to;
> it is stored
> with the mediated matrix device which is passed in to all of the
> vfio_ap driver
> callbacks.
Because using the vm_list, which is static in KVM, forces you to stay
inside the KVM code.
--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany
On 06/07/2018 01:15 PM, Pierre Morel wrote:
>
>>>
>>>
>>> We have internal structures like the ap_matrix and kvm_ap_matrix
>>> which look like the bus/devices we had previously but are differently
>>> or not at all integrated with the LDD.
>>
>> What is LDD? Are you talking about dependencies between the vfio_ap
>> device
>> driver and KVM? If so, see my arguments below.
>>
>>>
>>>
>>> Also I think that with a little data structure refactoring you can
>>> avoid most of
>>> the code in the arch/s390/kvm.
>>
>> How will structure refactoring help us avoid the code for updating
>> the CRYCB
>> in the guest's SIE state description.
>>
>>>
>>>
>>> For example, storing the kvm pointer inside the kvm_ap_matrix and
>>> maintaining a list of the kvm_ap_matrix structures allows to easily
>>> know
>>> if a guest already has an associated mediated device.
>>
>> How is that easier than storing the kvm pointer inside of the
>> mediated matrix
>> device (i.e., struct ap_matrix_mdev) which also contains the struct
>> kvm_ap_matrix?
>
> you can put it in ap_matrix_mdev but just the name "kvm_ap_matrix"
> make the
> last one a better candidate for my opinion.
It's been in ap_matrix_mdev since v2, but I'll consider moving it to
kvm_ap_matrix.
>
>
>> How does that allow us to avoid the code in arch/s390/kvm?
>
> This alone does not.
>
>> We still need the code
>> to update the CRYCB in the SIE block. I can obviously avoid placing
>> that code in
>> kvm-ap.c and move it to the driver, but I already explained my
>> reasoning for
>> keeping it in KVM. Let's face it, there is no way around the
>> dependency between
>> the vfio_ap device driver and KVM unless guest matrix configuration
>> is managed
>> solely by KVM through KVM interfaces.
>
> We get the pointer to KVM from the VFIO interface.
> That we both discuss on this is sterile.
> The only one who could say what is right is a S390 KVM maintainer.
> This would end the discussion.
> My point was just to say that we have an alternative.
>
>
>>
>> Why maintain a list of kvm_ap_matrix structures if we don't have to;
>> it is stored
>> with the mediated matrix device which is passed in to all of the
>> vfio_ap driver
>> callbacks.
>
> Because using the vm_list which is a static in kvm makes you stick
> inside the kvm code.
I understand your point here, but even if we did maintain a list of
kvm_ap_matrix structures, we still need the kvm code to configure the
guest's CRYCB and eventually ECA.28. There is also code in kvm-ap.c that
is called from KVM. The idea behind kvm-ap.c is that all code related to
configuration of AP structures in KVM is in this one spot.
>
>
>
On 08/06/2018 23:59, Tony Krowiak wrote:
> On 06/07/2018 01:15 PM, Pierre Morel wrote:
>>
...snip...
>>>>
>>>>>
>>>>> Why maintain a list of kvm_ap_matrix structures if we don't have
>>>>> to; it is stored
>>>>> with the mediated matrix device which is passed in to all of the
>>>>> vfio_ap driver
>>>>> callbacks.
>>>>
>>>> Because using the vm_list which is a static in kvm makes you stick
>>>> inside the kvm code.
>
> I understand your point here, but even if we did maintain a list of
> kvm_ap_matrix structures,
> we still need the kvm code to configure the guest's CRYCB and
> eventually ECA.28. There is
> also code in kvm-ap.c that is called from KVM.
The only code from kvm-ap which is called from KVM is temporary code
waiting for Harald to offer the clean interface to AP instructions.
> The idea behind kvm-ap.c is that all code
> related to configuration of AP structures in KVM is in this one spot.
This I understand, but the code can be in one spot inside VFIO_AP instead
of inside KVM.
Putting the code inside KVM induces dependencies between KVM and AP,
while the kvm/vfio interface allows us to avoid this dependency.
The purpose of VFIO_AP is to handle the CRYCB, so all get/clear/set CRYCB
mask functions should be in VFIO AP.
If we use wrappers in KVM (which is legitimate, since the CRYCB is a SIE
extension), the KVM interface to the CRYCB should only
handle bitmaps and be unaware of the vfio_ap internal structures.
Another concern: kvm_ap_validate_queue_sharing() should not be
inside KVM, because it is a decision of the current VFIO_AP driver
not to share queues between level 2 guests.
The Z architecture does not allow sharing AP queues between
level 1 guests, but we could re-engineer the AP bus and
VFIO AP to offer queue sharing for level 2 guests.
This would be a new VFIO_AP driver (and an AP bus extension).
We should not have to change KVM for this.
Regards,
Pierre
>
>>
>>
>>
>
--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany
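The sharing rule discussed in the message above can be sketched in a few lines of userspace C. This is only an illustration, assuming each guest's configuration is kept as one card bitmap and one domain bitmap: since a guest gets the cross product, two configurations share a queue exactly when they have both a card and a domain in common. The names struct matrix, matrix_set_bit() and matrix_shares_queue() are hypothetical and not taken from the patch set, and the bit numbering is simplified (LSB-first within each word).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MATRIX_WORDS 4	/* 4 x 64 bits = 256 cards or domains (assumed size) */

/* Hypothetical per-guest bookkeeping: one bit per card, one bit per domain. */
struct matrix {
	uint64_t apm[MATRIX_WORDS];	/* card mask */
	uint64_t aqm[MATRIX_WORDS];	/* domain mask */
};

/* Set bit nr in a mask (simplified LSB-first numbering, an assumption). */
static void matrix_set_bit(uint64_t *mask, unsigned int nr)
{
	assert(nr < MATRIX_WORDS * 64);
	mask[nr / 64] |= UINT64_C(1) << (nr % 64);
}

/* True if the two masks have at least one bit in common. */
static bool masks_intersect(const uint64_t *a, const uint64_t *b)
{
	for (size_t i = 0; i < MATRIX_WORDS; i++)
		if (a[i] & b[i])
			return true;
	return false;
}

/*
 * Two guests share a queue only if they have a common card AND a common
 * domain, because each guest is given the cross product of its two lists.
 */
static bool matrix_shares_queue(const struct matrix *a, const struct matrix *b)
{
	return masks_intersect(a->apm, b->apm) &&
	       masks_intersect(a->aqm, b->aqm);
}
```

A check of this shape needs nothing from KVM, which is the point made above: it only looks at the driver's own bookkeeping.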
On 06/11/2018 11:23 AM, Pierre Morel wrote:
> On 08/06/2018 23:59, Tony Krowiak wrote:
>> On 06/07/2018 01:15 PM, Pierre Morel wrote:
>>>
>
> ...snip...
>
>>>>>
>>>>>>
>>>>>> Why maintain a list of kvm_ap_matrix structures if we don't have to; it is stored
>>>>>> with the mediated matrix device which is passed in to all of the vfio_ap driver
>>>>>> callbacks.
>>>>>
>>>>> Because using the vm_list which is a static in kvm makes you stick inside the kvm code.
>>
>> I understand your point here, but even if we did maintain a list of kvm_ap_matrix structures,
>> we still need the kvm code to configure the guest's CRYCB and eventually ECA.28. There is
>> also code in kvm-ap.c that is called from KVM.
>
> The only code from kvm-ap which is called from KVM is temporary code
> waiting for Harald to offer the clean interface to AP instructions.
>
>> The idea behind kvm-ap.c is that all code
>> related to configuration of AP structures in KVM is in this one spot.
>
> This I understand, but the code can be in one spot inside VFIO_AP instead
> of inside KVM.
> Putting the code inside KVM induce dependencies between KVM and AP
> while the kvm/vfio interface allows to avoid this dependency.
>
> The purpose of VFIO_AP is to handle the CRYCB, all get/clear/set crycb masks
> functions should be in VFIO AP.
>
> If we use wrappers in KVM, since the CRYCB is an a SIE extension,
> it is legitimate, the KVM interface to the CRYCB should only
> handle bitmaps and be unaware of the vfio_ap internal structures.
>
>
> Another concern, the kvm_ap_validate_queue_sharing() should not be
> inside KVM because it is a decision of current VFIO_AP driver
> to not share the queues between guest of level 2.
>
> The Z architecture does not allow to share AP queues between
> guests of level 1 but we could re-engineer the AP bus and the '
> VFIO AP to offer queue sharing for guest level 2.
>
> This would be a new VFIO_AP driver (and an AP bus extension).
> We should not have to change KVM for this.
>
Pierre's proposal makes a lot of sense to me. We would not need to take
the kvm_lock (which we need to traverse the vm_list safely) for the
validation, and we could have immediate validation (which is in my opinion
better).
Also your refcount (which is not a refcount) could go away. You simply
traverse your list and check for duplicates when hooking up the mdev
with KVM.
And my opinion is that if we don't have to add code to the kvm module,
we had better not.
@Janosch: Does core KVM share my opinion?
Regards,
Halil
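The "traverse your list and check for duplicates" idea from the message above could look roughly like the following userspace sketch. It assumes the driver keeps its mediated devices on a simple singly linked list and records the guest's kvm pointer in each entry. struct mdev_entry, kvm_already_attached() and mdev_attach_kvm() are hypothetical names, struct kvm is a dummy stand-in, and a real driver would also need to hold its own lock around the traversal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Dummy stand-in for the kernel's struct kvm; only pointer identity matters. */
struct kvm {
	int id;
};

/* Hypothetical driver-side bookkeeping entry, one per mediated device. */
struct mdev_entry {
	struct kvm *kvm;		/* NULL while no guest is attached */
	struct mdev_entry *next;
};

/* Walk the driver's own list; no KVM-internal list or lock is involved. */
static bool kvm_already_attached(const struct mdev_entry *head,
				 const struct kvm *kvm)
{
	for (const struct mdev_entry *e = head; e; e = e->next)
		if (e->kvm == kvm)
			return true;
	return false;
}

/* Returns 0 on success, -1 if the guest already has a mediated device. */
static int mdev_attach_kvm(struct mdev_entry *head, struct mdev_entry *mdev,
			   struct kvm *kvm)
{
	assert(kvm != NULL);
	if (kvm_already_attached(head, kvm))
		return -1;
	mdev->kvm = kvm;
	return 0;
}
```

Because the check runs at the moment the mdev is hooked up with the guest, validation is immediate and no separate counter is needed.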
On 11.06.2018 13:32, Halil Pasic wrote:
>
>
> On 06/11/2018 11:23 AM, Pierre Morel wrote:
>> On 08/06/2018 23:59, Tony Krowiak wrote:
>>> On 06/07/2018 01:15 PM, Pierre Morel wrote:
>>>>
>>
>> ...snip...
>>
>>>>>>
>>>>>>>
>>>>>>> Why maintain a list of kvm_ap_matrix structures if we don't have to; it is stored
>>>>>>> with the mediated matrix device which is passed in to all of the vfio_ap driver
>>>>>>> callbacks.
>>>>>>
>>>>>> Because using the vm_list which is a static in kvm makes you stick inside the kvm code.
>>>
>>> I understand your point here, but even if we did maintain a list of kvm_ap_matrix structures,
>>> we still need the kvm code to configure the guest's CRYCB and eventually ECA.28. There is
>>> also code in kvm-ap.c that is called from KVM.
>>
>> The only code from kvm-ap which is called from KVM is temporary code
>> waiting for Harald to offer the clean interface to AP instructions.
>>
>>> The idea behind kvm-ap.c is that all code
>>> related to configuration of AP structures in KVM is in this one spot.
>>
>> This I understand, but the code can be in one spot inside VFIO_AP instead
>> of inside KVM.
>> Putting the code inside KVM induce dependencies between KVM and AP
>> while the kvm/vfio interface allows to avoid this dependency.
>>
>> The purpose of VFIO_AP is to handle the CRYCB, all get/clear/set crycb masks
>> functions should be in VFIO AP.
>>
>> If we use wrappers in KVM, since the CRYCB is an a SIE extension,
>> it is legitimate, the KVM interface to the CRYCB should only
>> handle bitmaps and be unaware of the vfio_ap internal structures.
Yes, please!
>>
>> Another concern, the kvm_ap_validate_queue_sharing() should not be
>> inside KVM because it is a decision of current VFIO_AP driver
>> to not share the queues between guest of level 2.
>>
>> The Z architecture does not allow to share AP queues between
>> guests of level 1 but we could re-engineer the AP bus and the '
>> VFIO AP to offer queue sharing for guest level 2.
>>
>> This would be a new VFIO_AP driver (and an AP bus extension).
>> We should not have to change KVM for this.
>>
>
>
> Pierre's proposal makes a lot of sense to me. We would not need to take
> the kvm_lock (which we need to traverse the vm_list safely) for the
> validation, and we could have immediate validation (which is in my opinion
> better).
Please do not use the kvm_lock if possible.
>
> Also your refcount (which is not a refcout) could go away. You simply
> traverse your list and check for duplicates when hooking up the mdev
> with KVM.
>
> And my opinion is if we don't have to add code to the kvm module we
> better not.
>
> @Janosch: Does core KVM share my opinion?
At least I do.
KVM does not care about who has which crypto queue/card.
I'd like to have a driver that does internal bookkeeping and then
registers the crycb with KVM, so the VM can use it.
>
> Regards,
> Halil
>
On 06/11/2018 07:32 AM, Halil Pasic wrote:
>
>
> On 06/11/2018 11:23 AM, Pierre Morel wrote:
>> On 08/06/2018 23:59, Tony Krowiak wrote:
>>> On 06/07/2018 01:15 PM, Pierre Morel wrote:
>>>>
>>
>> ...snip...
>>
>>>>>>
>>>>>>>
>>>>>>> Why maintain a list of kvm_ap_matrix structures if we don't have
>>>>>>> to; it is stored
>>>>>>> with the mediated matrix device which is passed in to all of the
>>>>>>> vfio_ap driver
>>>>>>> callbacks.
>>>>>>
>>>>>> Because using the vm_list which is a static in kvm makes you
>>>>>> stick inside the kvm code.
>>>
>>> I understand your point here, but even if we did maintain a list of
>>> kvm_ap_matrix structures,
>>> we still need the kvm code to configure the guest's CRYCB and
>>> eventually ECA.28. There is
>>> also code in kvm-ap.c that is called from KVM.
>>
>> The only code from kvm-ap which is called from KVM is temporary code
>> waiting for Harald to offer the clean interface to AP instructions.
>>
>>> The idea behind kvm-ap.c is that all code
>>> related to configuration of AP structures in KVM is in this one spot.
>>
>> This I understand, but the code can be in one spot inside VFIO_AP
>> instead
>> of inside KVM.
>> Putting the code inside KVM induce dependencies between KVM and AP
>> while the kvm/vfio interface allows to avoid this dependency.
>>
>> The purpose of VFIO_AP is to handle the CRYCB, all get/clear/set
>> crycb masks
>> functions should be in VFIO AP.
>>
>> If we use wrappers in KVM, since the CRYCB is an a SIE extension,
>> it is legitimate, the KVM interface to the CRYCB should only
>> handle bitmaps and be unaware of the vfio_ap internal structures.
>>
>>
>> Another concern, the kvm_ap_validate_queue_sharing() should not be
>> inside KVM because it is a decision of current VFIO_AP driver
>> to not share the queues between guest of level 2.
>>
>> The Z architecture does not allow to share AP queues between
>> guests of level 1 but we could re-engineer the AP bus and the '
>> VFIO AP to offer queue sharing for guest level 2.
>>
>> This would be a new VFIO_AP driver (and an AP bus extension).
>> We should not have to change KVM for this.
>>
>
>
> Pierre's proposal makes a lot of sense to me. We would not need to take
> the kvm_lock (which we need to traverse the vm_list safely) for the
> validation, and we could have immediate validation (which is in my
> opinion
> better).
>
> Also your refcount (which is not a refcout) could go away. You simply
> traverse your list and check for duplicates when hooking up the mdev
> with KVM.
>
> And my opinion is if we don't have to add code to the kvm module we
> better not.
>
> @Janosch: Does core KVM share my opinion?
Okay, I'll make the change.
>
>
> Regards,
> Halil
On 06/11/2018 05:23 AM, Pierre Morel wrote:
> On 08/06/2018 23:59, Tony Krowiak wrote:
>> On 06/07/2018 01:15 PM, Pierre Morel wrote:
>>>
>
> ...snip...
>
>>>>>
>>>>>>
>>>>>> Why maintain a list of kvm_ap_matrix structures if we don't have
>>>>>> to; it is stored
>>>>>> with the mediated matrix device which is passed in to all of the
>>>>>> vfio_ap driver
>>>>>> callbacks.
>>>>>
>>>>> Because using the vm_list which is a static in kvm makes you stick
>>>>> inside the kvm code.
>>
>> I understand your point here, but even if we did maintain a list of
>> kvm_ap_matrix structures,
>> we still need the kvm code to configure the guest's CRYCB and
>> eventually ECA.28. There is
>> also code in kvm-ap.c that is called from KVM.
>
> The only code from kvm-ap which is called from KVM is temporary code
> waiting for Harald to offer the clean interface to AP instructions.
>
>> The idea behind kvm-ap.c is that all code
>> related to configuration of AP structures in KVM is in this one spot.
>
> This I understand, but the code can be in one spot inside VFIO_AP instead
> of inside KVM.
> Putting the code inside KVM induce dependencies between KVM and AP
> while the kvm/vfio interface allows to avoid this dependency.
>
>
> The purpose of VFIO_AP is to handle the CRYCB, all get/clear/set crycb
> masks
> functions should be in VFIO AP.
>
> If we use wrappers in KVM, since the CRYCB is an a SIE extension,
> it is legitimate, the KVM interface to the CRYCB should only
> handle bitmaps and be unaware of the vfio_ap internal structures.
>
>
> Another concern, the kvm_ap_validate_queue_sharing() should not be
> inside KVM because it is a decision of current VFIO_AP driver
> to not share the queues between guest of level 2.
>
> The Z architecture does not allow to share AP queues between
> guests of level 1 but we could re-engineer the AP bus and the '
> VFIO AP to offer queue sharing for guest level 2.
>
> This would be a new VFIO_AP driver (and an AP bus extension).
> We should not have to change KVM for this.
Based on your, Halil's and Janosch's comments, I will make the changes.
>
>
>
> Regards,
>
> Pierre
>
>
>>
>>>
>>>
>>>
>>
>
On 06/11/2018 07:49 AM, Janosch Frank wrote:
> On 11.06.2018 13:32, Halil Pasic wrote:
>>
>> On 06/11/2018 11:23 AM, Pierre Morel wrote:
>>> On 08/06/2018 23:59, Tony Krowiak wrote:
>>>> On 06/07/2018 01:15 PM, Pierre Morel wrote:
>>> ...snip...
>>>
>>>>>>>> Why maintain a list of kvm_ap_matrix structures if we don't have to; it is stored
>>>>>>>> with the mediated matrix device which is passed in to all of the vfio_ap driver
>>>>>>>> callbacks.
>>>>>>> Because using the vm_list which is a static in kvm makes you stick inside the kvm code.
>>>> I understand your point here, but even if we did maintain a list of kvm_ap_matrix structures,
>>>> we still need the kvm code to configure the guest's CRYCB and eventually ECA.28. There is
>>>> also code in kvm-ap.c that is called from KVM.
>>> The only code from kvm-ap which is called from KVM is temporary code
>>> waiting for Harald to offer the clean interface to AP instructions.
>>>
>>>> The idea behind kvm-ap.c is that all code
>>>> related to configuration of AP structures in KVM is in this one spot.
>>> This I understand, but the code can be in one spot inside VFIO_AP instead
>>> of inside KVM.
>>> Putting the code inside KVM induce dependencies between KVM and AP
>>> while the kvm/vfio interface allows to avoid this dependency.
>>>
>>> The purpose of VFIO_AP is to handle the CRYCB, all get/clear/set crycb masks
>>> functions should be in VFIO AP.
>>>
>>> If we use wrappers in KVM, since the CRYCB is an a SIE extension,
>>> it is legitimate, the KVM interface to the CRYCB should only
>>> handle bitmaps and be unaware of the vfio_ap internal structures.
> Yes, please!
>
>>> Another concern, the kvm_ap_validate_queue_sharing() should not be
>>> inside KVM because it is a decision of current VFIO_AP driver
>>> to not share the queues between guest of level 2.
>>>
>>> The Z architecture does not allow to share AP queues between
>>> guests of level 1 but we could re-engineer the AP bus and the '
>>> VFIO AP to offer queue sharing for guest level 2.
>>>
>>> This would be a new VFIO_AP driver (and an AP bus extension).
>>> We should not have to change KVM for this.
>>>
>>
>> Pierre's proposal makes a lot of sense to me. We would not need to take
>> the kvm_lock (which we need to traverse the vm_list safely) for the
>> validation, and we could have immediate validation (which is in my opinion
>> better).
> Please do not use the kvm_lock if possible.
>
>> Also your refcount (which is not a refcout) could go away. You simply
>> traverse your list and check for duplicates when hooking up the mdev
>> with KVM.
>>
>> And my opinion is if we don't have to add code to the kvm module we
>> better not.
>>
>> @Janosch: Does core KVM share my opinion?
> At least I do.
>
> KVM does not care about who has which crypto queue/card.
> I'd like to have a driver that does internal bookkeeping and then
> registers the crycb with KVM, so the VM can use it.
I am not sure what you mean by "registers the crycb with KVM".
Can you provide more detail?
>
>> Regards,
>> Halil
>>
>
On 06/11/2018 12:50 PM, Halil Pasic wrote:
>
>
> On 06/11/2018 06:26 PM, Tony Krowiak wrote:
>>>> @Janosch: Does core KVM share my opinion?
>>> At least I do.
>>>
>>> KVM does not care about who has which crypto queue/card.
>>> I'd like to have a driver that does internal bookkeeping and then
>>> registers the crycb with KVM, so the VM can use it.
>>
>> I am not sure what you mean by "registers the crycb with KVM".
>> Can you provide more detail?
>>
>
> I'm pretty sure he means copy the masks from the internal bookkeeping
> in the mdev device to the CRYCB fields. Please work on this assumption.
Will do.
>
>
> Halil
On 06/11/2018 06:26 PM, Tony Krowiak wrote:
>>> @Janosch: Does core KVM share my opinion?
>> At least I do.
>>
>> KVM does not care about who has which crypto queue/card.
>> I'd like to have a driver that does internal bookkeeping and then
>> registers the crycb with KVM, so the VM can use it.
>
> I am not sure what you mean by "registers the crycb with KVM".
> Can you provide more detail?
>
I'm pretty sure he means copy the masks from the internal bookkeeping
in the mdev device to the CRYCB fields. Please work on this assumption.
Halil
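Under the working assumption stated above, "registering the crycb with KVM" reduces to copying bitmaps. A minimal userspace sketch follows; the structure layouts, the 256-bit mask size, and the names struct mdev_matrix, struct crycb_masks, crycb_set_masks() and mdev_register_crycb() are all assumptions for illustration, not the layout or API used by the patch set.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MASK_BYTES 32	/* 256 bits per mask (assumed size) */

/* Hypothetical driver-side bookkeeping kept with the mediated device. */
struct mdev_matrix {
	uint8_t apm[MASK_BYTES];	/* card mask */
	uint8_t aqm[MASK_BYTES];	/* domain mask */
};

/* Hypothetical stand-in for the mask fields of the guest's CRYCB. */
struct crycb_masks {
	uint8_t apm[MASK_BYTES];
	uint8_t aqm[MASK_BYTES];
};

/*
 * KVM-side wrapper: it sees only raw bitmaps and knows nothing about the
 * driver's internal structures.
 */
static void crycb_set_masks(struct crycb_masks *crycb,
			    const uint8_t *apm, const uint8_t *aqm)
{
	assert(crycb && apm && aqm);
	memcpy(crycb->apm, apm, MASK_BYTES);
	memcpy(crycb->aqm, aqm, MASK_BYTES);
}

/* Driver side: hand the bookkeeping masks to the KVM-side wrapper. */
static void mdev_register_crycb(const struct mdev_matrix *m,
				struct crycb_masks *crycb)
{
	crycb_set_masks(crycb, m->apm, m->aqm);
}
```

Keeping the wrapper's signature down to plain bitmaps is what lets the driver change its internal bookkeeping later without touching KVM.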
On 07/05/2018 17:11, Tony Krowiak wrote:
> Introduces a new AP device driver. This device driver
> is built on the VFIO mediated device framework. The framework
> provides sysfs interfaces that facilitate passthrough
> access by guests to devices installed on the linux host.
...snip...
> +static int vfio_ap_matrix_dev_create(void)
> +{
> + int ret;
> +
> + vfio_ap_root_device = root_device_register(VFIO_AP_ROOT_NAME);
> +
> + if (IS_ERR(vfio_ap_root_device)) {
> + ret = PTR_ERR(vfio_ap_root_device);
> + goto done;
> + }
> +
> + ap_matrix = kzalloc(sizeof(*ap_matrix), GFP_KERNEL);
> + if (!ap_matrix) {
> + ret = -ENOMEM;
> + goto matrix_alloc_err;
> + }
> +
> + ap_matrix->device.type = &vfio_ap_dev_type;
> + dev_set_name(&ap_matrix->device, "%s", VFIO_AP_DEV_NAME);
> + ap_matrix->device.parent = vfio_ap_root_device;
> + ap_matrix->device.release = vfio_ap_matrix_dev_release;
> + ap_matrix->device.driver = &vfio_ap_drv.driver;
> +
> + ret = device_register(&ap_matrix->device);
> + if (ret)
> + goto matrix_reg_err;
> +
> + goto done;
> +
> +matrix_reg_err:
> + put_device(&ap_matrix->device);
Did not see this before but here you certainly want to
do a kfree and not a put_device.
> +
> +matrix_alloc_err:
> + root_device_unregister(vfio_ap_root_device);
> +
> +done:
> + return ret;
> +}
> +
> +static void vfio_ap_matrix_dev_destroy(struct ap_matrix *ap_matrix)
> +{
> + device_unregister(&ap_matrix->device);
> + root_device_unregister(vfio_ap_root_device);
Here you need a kfree(ap_matrix) too.
Pierre
--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany
On Wed, 13 Jun 2018 09:41:16 +0200
Pierre Morel <[email protected]> wrote:
> On 07/05/2018 17:11, Tony Krowiak wrote:
> > Introduces a new AP device driver. This device driver
> > is built on the VFIO mediated device framework. The framework
> > provides sysfs interfaces that facilitate passthrough
> > access by guests to devices installed on the linux host.
> ...snip...
>
> > +static int vfio_ap_matrix_dev_create(void)
> > +{
> > + int ret;
> > +
> > + vfio_ap_root_device = root_device_register(VFIO_AP_ROOT_NAME);
> > +
> > + if (IS_ERR(vfio_ap_root_device)) {
> > + ret = PTR_ERR(vfio_ap_root_device);
> > + goto done;
> > + }
> > +
> > + ap_matrix = kzalloc(sizeof(*ap_matrix), GFP_KERNEL);
> > + if (!ap_matrix) {
> > + ret = -ENOMEM;
> > + goto matrix_alloc_err;
> > + }
> > +
> > + ap_matrix->device.type = &vfio_ap_dev_type;
> > + dev_set_name(&ap_matrix->device, "%s", VFIO_AP_DEV_NAME);
> > + ap_matrix->device.parent = vfio_ap_root_device;
> > + ap_matrix->device.release = vfio_ap_matrix_dev_release;
> > + ap_matrix->device.driver = &vfio_ap_drv.driver;
> > +
> > + ret = device_register(&ap_matrix->device);
> > + if (ret)
> > + goto matrix_reg_err;
> > +
> > + goto done;
> > +
> > +matrix_reg_err:
> > + put_device(&ap_matrix->device);
>
> Did not see this before but here you certainly want to
> do a kfree and not a put_device.
No, this must not be a kfree. Once you've tried to register something
embedding a struct device with the driver core, you need to use
put_device, as another path may have acquired a reference, even if
registering ultimately failed. See the comment for device_register().
IOW, the code is correct.
>
>
>
> > +
> > +matrix_alloc_err:
> > + root_device_unregister(vfio_ap_root_device);
> > +
> > +done:
> > + return ret;
> > +}
> > +
> > +static void vfio_ap_matrix_dev_destroy(struct ap_matrix *ap_matrix)
> > +{
> > + device_unregister(&ap_matrix->device);
> > + root_device_unregister(vfio_ap_root_device);
>
> Also here you need a kfree(ap_matrix) too .
Same here.
On 13/06/2018 09:48, Cornelia Huck wrote:
> On Wed, 13 Jun 2018 09:41:16 +0200
> Pierre Morel <[email protected]> wrote:
>
>> On 07/05/2018 17:11, Tony Krowiak wrote:
>>> Introduces a new AP device driver. This device driver
>>> is built on the VFIO mediated device framework. The framework
>>> provides sysfs interfaces that facilitate passthrough
>>> access by guests to devices installed on the linux host.
>> ...snip...
>>
>>> +static int vfio_ap_matrix_dev_create(void)
>>> +{
>>> + int ret;
>>> +
>>> + vfio_ap_root_device = root_device_register(VFIO_AP_ROOT_NAME);
>>> +
>>> + if (IS_ERR(vfio_ap_root_device)) {
>>> + ret = PTR_ERR(vfio_ap_root_device);
>>> + goto done;
>>> + }
>>> +
>>> + ap_matrix = kzalloc(sizeof(*ap_matrix), GFP_KERNEL);
>>> + if (!ap_matrix) {
>>> + ret = -ENOMEM;
>>> + goto matrix_alloc_err;
>>> + }
>>> +
>>> + ap_matrix->device.type = &vfio_ap_dev_type;
>>> + dev_set_name(&ap_matrix->device, "%s", VFIO_AP_DEV_NAME);
>>> + ap_matrix->device.parent = vfio_ap_root_device;
>>> + ap_matrix->device.release = vfio_ap_matrix_dev_release;
>>> + ap_matrix->device.driver = &vfio_ap_drv.driver;
>>> +
>>> + ret = device_register(&ap_matrix->device);
>>> + if (ret)
>>> + goto matrix_reg_err;
>>> +
>>> + goto done;
>>> +
>>> +matrix_reg_err:
>>> + put_device(&ap_matrix->device);
>> Did not see this before but here you certainly want to
>> do a kfree and not a put_device.
> No, this must not be a kfree. Once you've tried to register something
> embedding a struct device with the driver core, you need to use
> put_device, as another path may have acquired a reference, even if
> registering ultimately failed. See the comment for device_register().
> IOW, the code is correct.
Learned something again :),
but still, a kfree is needed for the kzalloc,
isn't it?
>
>>
>>
>>> +
>>> +matrix_alloc_err:
>>> + root_device_unregister(vfio_ap_root_device);
>>> +
>>> +done:
>>> + return ret;
>>> +}
>>> +
>>> +static void vfio_ap_matrix_dev_destroy(struct ap_matrix *ap_matrix)
>>> +{
>>> + device_unregister(&ap_matrix->device);
>>> + root_device_unregister(vfio_ap_root_device);
>> Also here you need a kfree(ap_matrix) too .
> Same here.
>
--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany
On Wed, 13 Jun 2018 12:54:40 +0200
Pierre Morel <[email protected]> wrote:
> On 13/06/2018 09:48, Cornelia Huck wrote:
> > On Wed, 13 Jun 2018 09:41:16 +0200
> > Pierre Morel <[email protected]> wrote:
> >
> >> On 07/05/2018 17:11, Tony Krowiak wrote:
> >>> Introduces a new AP device driver. This device driver
> >>> is built on the VFIO mediated device framework. The framework
> >>> provides sysfs interfaces that facilitate passthrough
> >>> access by guests to devices installed on the linux host.
> >> ...snip...
> >>
> >>> +static int vfio_ap_matrix_dev_create(void)
> >>> +{
> >>> + int ret;
> >>> +
> >>> + vfio_ap_root_device = root_device_register(VFIO_AP_ROOT_NAME);
> >>> +
> >>> + if (IS_ERR(vfio_ap_root_device)) {
> >>> + ret = PTR_ERR(vfio_ap_root_device);
> >>> + goto done;
> >>> + }
> >>> +
> >>> + ap_matrix = kzalloc(sizeof(*ap_matrix), GFP_KERNEL);
> >>> + if (!ap_matrix) {
> >>> + ret = -ENOMEM;
> >>> + goto matrix_alloc_err;
> >>> + }
> >>> +
> >>> + ap_matrix->device.type = &vfio_ap_dev_type;
> >>> + dev_set_name(&ap_matrix->device, "%s", VFIO_AP_DEV_NAME);
> >>> + ap_matrix->device.parent = vfio_ap_root_device;
> >>> + ap_matrix->device.release = vfio_ap_matrix_dev_release;
> >>> + ap_matrix->device.driver = &vfio_ap_drv.driver;
> >>> +
> >>> + ret = device_register(&ap_matrix->device);
> >>> + if (ret)
> >>> + goto matrix_reg_err;
> >>> +
> >>> + goto done;
> >>> +
> >>> +matrix_reg_err:
> >>> + put_device(&ap_matrix->device);
> >> Did not see this before but here you certainly want to
> >> do a kfree and not a put_device.
> > No, this must not be a kfree. Once you've tried to register something
> > embedding a struct device with the driver core, you need to use
> > put_device, as another path may have acquired a reference, even if
> > registering ultimately failed. See the comment for device_register().
> > IOW, the code is correct.
>
> learned something again :) ,
> but still, a kfree is needed for the kzalloc.
> Does'nt it?
No, the release callback for the embedding structure (invoked on the
final put) needs to take care of freeing things. Otherwise it is buggy.
>
> >
> >>
> >>
> >>> +
> >>> +matrix_alloc_err:
> >>> + root_device_unregister(vfio_ap_root_device);
> >>> +
> >>> +done:
> >>> + return ret;
> >>> +}
> >>> +
> >>> +static void vfio_ap_matrix_dev_destroy(struct ap_matrix *ap_matrix)
> >>> +{
> >>> + device_unregister(&ap_matrix->device);
> >>> + root_device_unregister(vfio_ap_root_device);
> >> Also here you need a kfree(ap_matrix) too .
> > Same here.
> >
>
On 13/06/2018 13:14, Cornelia Huck wrote:
> On Wed, 13 Jun 2018 12:54:40 +0200
> Pierre Morel <[email protected]> wrote:
>
>> On 13/06/2018 09:48, Cornelia Huck wrote:
>>> On Wed, 13 Jun 2018 09:41:16 +0200
>>> Pierre Morel <[email protected]> wrote:
>>>
>>>> On 07/05/2018 17:11, Tony Krowiak wrote:
>>>>> Introduces a new AP device driver. This device driver
>>>>> is built on the VFIO mediated device framework. The framework
>>>>> provides sysfs interfaces that facilitate passthrough
>>>>> access by guests to devices installed on the linux host.
>>>> ...snip...
>>>>
>>>>> +static int vfio_ap_matrix_dev_create(void)
>>>>> +{
>>>>> + int ret;
>>>>> +
>>>>> + vfio_ap_root_device = root_device_register(VFIO_AP_ROOT_NAME);
>>>>> +
>>>>> + if (IS_ERR(vfio_ap_root_device)) {
>>>>> + ret = PTR_ERR(vfio_ap_root_device);
>>>>> + goto done;
>>>>> + }
>>>>> +
>>>>> + ap_matrix = kzalloc(sizeof(*ap_matrix), GFP_KERNEL);
>>>>> + if (!ap_matrix) {
>>>>> + ret = -ENOMEM;
>>>>> + goto matrix_alloc_err;
>>>>> + }
>>>>> +
>>>>> + ap_matrix->device.type = &vfio_ap_dev_type;
>>>>> + dev_set_name(&ap_matrix->device, "%s", VFIO_AP_DEV_NAME);
>>>>> + ap_matrix->device.parent = vfio_ap_root_device;
>>>>> + ap_matrix->device.release = vfio_ap_matrix_dev_release;
>>>>> + ap_matrix->device.driver = &vfio_ap_drv.driver;
>>>>> +
>>>>> + ret = device_register(&ap_matrix->device);
>>>>> + if (ret)
>>>>> + goto matrix_reg_err;
>>>>> +
>>>>> + goto done;
>>>>> +
>>>>> +matrix_reg_err:
>>>>> + put_device(&ap_matrix->device);
>>>> Did not see this before but here you certainly want to
>>>> do a kfree and not a put_device.
>>> No, this must not be a kfree. Once you've tried to register something
>>> embedding a struct device with the driver core, you need to use
>>> put_device, as another path may have acquired a reference, even if
>>> registering ultimately failed. See the comment for device_register().
>>> IOW, the code is correct.
>> learned something again :) ,
>> but still, a kfree is needed for the kzalloc.
>> Does'nt it?
> No, the put callback for the embedding structure needs to take care of
> freeing things. Otherwise it is buggy.
Seems buggy to me.
How does put_device know whether it is embedded, and in what it is
embedded?
>
>>>
>>>>
>>>>> +
>>>>> +matrix_alloc_err:
>>>>> + root_device_unregister(vfio_ap_root_device);
>>>>> +
>>>>> +done:
>>>>> + return ret;
>>>>> +}
>>>>> +
>>>>> +static void vfio_ap_matrix_dev_destroy(struct ap_matrix *ap_matrix)
>>>>> +{
>>>>> + device_unregister(&ap_matrix->device);
>>>>> + root_device_unregister(vfio_ap_root_device);
>>>> Also here you need a kfree(ap_matrix) too .
>>> Same here.
>>>
--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany
On Wed, 13 Jun 2018 14:01:54 +0200
Pierre Morel <[email protected]> wrote:
> On 13/06/2018 13:14, Cornelia Huck wrote:
> > On Wed, 13 Jun 2018 12:54:40 +0200
> > Pierre Morel <[email protected]> wrote:
> >
> >> On 13/06/2018 09:48, Cornelia Huck wrote:
> >>> On Wed, 13 Jun 2018 09:41:16 +0200
> >>> Pierre Morel <[email protected]> wrote:
> >>>
> >>>> On 07/05/2018 17:11, Tony Krowiak wrote:
> >>>>> Introduces a new AP device driver. This device driver
> >>>>> is built on the VFIO mediated device framework. The framework
> >>>>> provides sysfs interfaces that facilitate passthrough
> >>>>> access by guests to devices installed on the linux host.
> >>>> ...snip...
> >>>>
> >>>>> +static int vfio_ap_matrix_dev_create(void)
> >>>>> +{
> >>>>> + int ret;
> >>>>> +
> >>>>> + vfio_ap_root_device = root_device_register(VFIO_AP_ROOT_NAME);
> >>>>> +
> >>>>> + if (IS_ERR(vfio_ap_root_device)) {
> >>>>> + ret = PTR_ERR(vfio_ap_root_device);
> >>>>> + goto done;
> >>>>> + }
> >>>>> +
> >>>>> + ap_matrix = kzalloc(sizeof(*ap_matrix), GFP_KERNEL);
> >>>>> + if (!ap_matrix) {
> >>>>> + ret = -ENOMEM;
> >>>>> + goto matrix_alloc_err;
> >>>>> + }
> >>>>> +
> >>>>> + ap_matrix->device.type = &vfio_ap_dev_type;
> >>>>> + dev_set_name(&ap_matrix->device, "%s", VFIO_AP_DEV_NAME);
> >>>>> + ap_matrix->device.parent = vfio_ap_root_device;
> >>>>> + ap_matrix->device.release = vfio_ap_matrix_dev_release;
> >>>>> + ap_matrix->device.driver = &vfio_ap_drv.driver;
> >>>>> +
> >>>>> + ret = device_register(&ap_matrix->device);
> >>>>> + if (ret)
> >>>>> + goto matrix_reg_err;
> >>>>> +
> >>>>> + goto done;
> >>>>> +
> >>>>> +matrix_reg_err:
> >>>>> + put_device(&ap_matrix->device);
> >>>> Did not see this before but here you certainly want to
> >>>> do a kfree and not a put_device.
> >>> No, this must not be a kfree. Once you've tried to register something
> >>> embedding a struct device with the driver core, you need to use
> >>> put_device, as another path may have acquired a reference, even if
> >>> registering ultimately failed. See the comment for device_register().
> >>> IOW, the code is correct.
> >> learned something again :) ,
> >> but still, a kfree is needed for the kzalloc.
> >> Does'nt it?
> > No, the put callback for the embedding structure needs to take care of
> > freeing things. Otherwise it is buggy.
>
> Seems buggy to me.
> How does the put_device knows if it is embedded and then in what it is
> embedded ?
It does not need to know; the code registering the structure needs to
set up device->release correctly.
>
> >
> >>>
> >>>>
> >>>>> +
> >>>>> +matrix_alloc_err:
> >>>>> + root_device_unregister(vfio_ap_root_device);
> >>>>> +
> >>>>> +done:
> >>>>> + return ret;
> >>>>> +}
> >>>>> +
> >>>>> +static void vfio_ap_matrix_dev_destroy(struct ap_matrix *ap_matrix)
> >>>>> +{
> >>>>> + device_unregister(&ap_matrix->device);
> >>>>> + root_device_unregister(vfio_ap_root_device);
> >>>> Also here you need a kfree(ap_matrix) too .
> >>> Same here.
> >>>
>
>
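The mechanism explained in the message above can be modeled in userspace C. This is a sketch only: struct device here is a tiny stand-in for the kernel type, get_device()/put_device() are simplified reimplementations, and ap_matrix_release() is a hypothetical release callback (the body of the real vfio_ap_matrix_dev_release is not shown in this thread). The point is that put_device() never needs to know what embeds the device; the release callback, set up by the code that allocated the structure, recovers the container and frees it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for the kernel's struct device; not the real type. */
struct device {
	int refcount;
	void (*release)(struct device *dev);
};

/* Same idea as the kernel's container_of(), spelled out for userspace. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Embedding structure, allocated dynamically like ap_matrix in the patch. */
struct ap_matrix {
	int placeholder;
	struct device device;
};

static int release_calls;	/* instrumentation for this sketch only */

static void get_device(struct device *dev)
{
	dev->refcount++;
}

/* Drop one reference; the last put invokes the release callback. */
static void put_device(struct device *dev)
{
	assert(dev->refcount > 0);
	if (--dev->refcount == 0)
		dev->release(dev);
}

/* Release callback: recovers the embedding structure and frees it. */
static void ap_matrix_release(struct device *dev)
{
	struct ap_matrix *m = container_of(dev, struct ap_matrix, device);

	release_calls++;
	free(m);
}

static struct ap_matrix *ap_matrix_alloc(void)
{
	struct ap_matrix *m = calloc(1, sizeof(*m));

	if (!m)
		return NULL;
	m->device.refcount = 1;		/* models the initial reference */
	m->device.release = ap_matrix_release;
	return m;
}
```

If another path has taken a reference with get_device(), a put_device() on the error path only drops the count; the memory is freed when the last holder puts it. An explicit free next to the put would free the structure a second time or while it is still in use.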
On 13/06/2018 14:12, Cornelia Huck wrote:
> On Wed, 13 Jun 2018 14:01:54 +0200
> Pierre Morel <[email protected]> wrote:
>
>> On 13/06/2018 13:14, Cornelia Huck wrote:
>>> On Wed, 13 Jun 2018 12:54:40 +0200
>>> Pierre Morel <[email protected]> wrote:
>>>
>>>> On 13/06/2018 09:48, Cornelia Huck wrote:
>>>>> On Wed, 13 Jun 2018 09:41:16 +0200
>>>>> Pierre Morel <[email protected]> wrote:
>>>>>
>>>>>> On 07/05/2018 17:11, Tony Krowiak wrote:
>>>>>>> Introduces a new AP device driver. This device driver
>>>>>>> is built on the VFIO mediated device framework. The framework
>>>>>>> provides sysfs interfaces that facilitate passthrough
>>>>>>> access by guests to devices installed on the linux host.
>>>>>> ...snip...
>>>>>>
>>>>>>> +static int vfio_ap_matrix_dev_create(void)
>>>>>>> +{
>>>>>>> + int ret;
>>>>>>> +
>>>>>>> + vfio_ap_root_device = root_device_register(VFIO_AP_ROOT_NAME);
>>>>>>> +
>>>>>>> + if (IS_ERR(vfio_ap_root_device)) {
>>>>>>> + ret = PTR_ERR(vfio_ap_root_device);
>>>>>>> + goto done;
>>>>>>> + }
>>>>>>> +
>>>>>>> + ap_matrix = kzalloc(sizeof(*ap_matrix), GFP_KERNEL);
>>>>>>> + if (!ap_matrix) {
>>>>>>> + ret = -ENOMEM;
>>>>>>> + goto matrix_alloc_err;
>>>>>>> + }
>>>>>>> +
>>>>>>> + ap_matrix->device.type = &vfio_ap_dev_type;
>>>>>>> + dev_set_name(&ap_matrix->device, "%s", VFIO_AP_DEV_NAME);
>>>>>>> + ap_matrix->device.parent = vfio_ap_root_device;
>>>>>>> + ap_matrix->device.release = vfio_ap_matrix_dev_release;
>>>>>>> + ap_matrix->device.driver = &vfio_ap_drv.driver;
>>>>>>> +
>>>>>>> + ret = device_register(&ap_matrix->device);
>>>>>>> + if (ret)
>>>>>>> + goto matrix_reg_err;
>>>>>>> +
>>>>>>> + goto done;
>>>>>>> +
>>>>>>> +matrix_reg_err:
>>>>>>> + put_device(&ap_matrix->device);
>>>>>> Did not see this before but here you certainly want to
>>>>>> do a kfree and not a put_device.
>>>>> No, this must not be a kfree. Once you've tried to register something
>>>>> embedding a struct device with the driver core, you need to use
>>>>> put_device, as another path may have acquired a reference, even if
>>>>> registering ultimately failed. See the comment for device_register().
>>>>> IOW, the code is correct.
>>>> learned something again :) ,
>>>> but still, a kfree is needed for the kzalloc.
>>>> Isn't it?
>>> No, the put callback for the embedding structure needs to take care of
>>> freeing things. Otherwise it is buggy.
>> Seems buggy to me.
>> How does put_device know whether it is embedded, and in what it is
>> embedded?
> It does not need to know; the code registering the structure needs to
> set up device->release correctly.
Yes, right, thanks.
>
>>>
>>>>>
>>>>>>
>>>>>>> +
>>>>>>> +matrix_alloc_err:
>>>>>>> + root_device_unregister(vfio_ap_root_device);
>>>>>>> +
>>>>>>> +done:
>>>>>>> + return ret;
>>>>>>> +}
>>>>>>> +
>>>>>>> +static void vfio_ap_matrix_dev_destroy(struct ap_matrix *ap_matrix)
>>>>>>> +{
>>>>>>> + device_unregister(&ap_matrix->device);
>>>>>>> + root_device_unregister(vfio_ap_root_device);
>>>>>> Here you need a kfree(ap_matrix) too.
>>>>> Same here.
>>>>>
>>
--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany
On 06/13/2018 08:16 AM, Pierre Morel wrote:
> On 13/06/2018 14:12, Cornelia Huck wrote:
>> On Wed, 13 Jun 2018 14:01:54 +0200
>> Pierre Morel <[email protected]> wrote:
>>
>>> On 13/06/2018 13:14, Cornelia Huck wrote:
>>>> On Wed, 13 Jun 2018 12:54:40 +0200
>>>> Pierre Morel <[email protected]> wrote:
>>>>> On 13/06/2018 09:48, Cornelia Huck wrote:
>>>>>> On Wed, 13 Jun 2018 09:41:16 +0200
>>>>>> Pierre Morel <[email protected]> wrote:
>>>>>>> On 07/05/2018 17:11, Tony Krowiak wrote:
>>>>>>>> Introduces a new AP device driver. This device driver
>>>>>>>> is built on the VFIO mediated device framework. The framework
>>>>>>>> provides sysfs interfaces that facilitate passthrough
>>>>>>>> access by guests to devices installed on the Linux host.
>>>>>>> ...snip...
>>>>>>>> +static int vfio_ap_matrix_dev_create(void)
>>>>>>>> +{
>>>>>>>> + int ret;
>>>>>>>> +
>>>>>>>> + vfio_ap_root_device = root_device_register(VFIO_AP_ROOT_NAME);
>>>>>>>> +
>>>>>>>> + if (IS_ERR(vfio_ap_root_device)) {
>>>>>>>> + ret = PTR_ERR(vfio_ap_root_device);
>>>>>>>> + goto done;
>>>>>>>> + }
>>>>>>>> +
>>>>>>>> + ap_matrix = kzalloc(sizeof(*ap_matrix), GFP_KERNEL);
>>>>>>>> + if (!ap_matrix) {
>>>>>>>> + ret = -ENOMEM;
>>>>>>>> + goto matrix_alloc_err;
>>>>>>>> + }
>>>>>>>> +
>>>>>>>> + ap_matrix->device.type = &vfio_ap_dev_type;
>>>>>>>> + dev_set_name(&ap_matrix->device, "%s", VFIO_AP_DEV_NAME);
>>>>>>>> + ap_matrix->device.parent = vfio_ap_root_device;
>>>>>>>> + ap_matrix->device.release = vfio_ap_matrix_dev_release;
>>>>>>>> + ap_matrix->device.driver = &vfio_ap_drv.driver;
>>>>>>>> +
>>>>>>>> + ret = device_register(&ap_matrix->device);
>>>>>>>> + if (ret)
>>>>>>>> + goto matrix_reg_err;
>>>>>>>> +
>>>>>>>> + goto done;
>>>>>>>> +
>>>>>>>> +matrix_reg_err:
>>>>>>>> + put_device(&ap_matrix->device);
>>>>>>> Did not see this before but here you certainly want to
>>>>>>> do a kfree and not a put_device.
>>>>>> No, this must not be a kfree. Once you've tried to register something
>>>>>> embedding a struct device with the driver core, you need to use
>>>>>> put_device, as another path may have acquired a reference, even if
>>>>>> registering ultimately failed. See the comment for device_register().
>>>>>> IOW, the code is correct.
>>>>> learned something again :) ,
>>>>> but still, a kfree is needed for the kzalloc.
>>>>> Isn't it?
>>>> No, the put callback for the embedding structure needs to take care of
>>>> freeing things. Otherwise it is buggy.
>>> Seems buggy to me.
>>> How does put_device know whether it is embedded, and in what it is
>>> embedded?
>> It does not need to know; the code registering the structure needs to
>> set up device->release correctly.
>
> Yes, right, thanks.
See the vfio_ap_matrix_dev_release() callback.
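
To illustrate the point discussed above, here is a minimal userspace
sketch of the pattern, not the actual driver code. The struct device
here is a simplified stand-in for the kernel's (just a refcount and a
release hook), the placeholder field in struct ap_matrix and the
function demo_failed_register() are hypothetical, and device_initialize()
and put_device() are reduced to the reference counting only. It shows
why put_device() does not need to know what embeds the device: the
release callback recovers the embedding structure with container_of()
and frees it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-in for the kernel's struct device (assumption). */
struct device {
	int refcount;
	void (*release)(struct device *dev);
};

/* Hypothetical embedding structure, mirroring how ap_matrix embeds a device. */
struct ap_matrix {
	int placeholder;	/* stands in for the real matrix state */
	struct device device;
};

/* Recover a pointer to the embedding structure from a member pointer. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static int release_calls;	/* counts how often the release hook ran */

/* Only this callback knows the embedding type; it frees the allocation. */
static void matrix_dev_release(struct device *dev)
{
	struct ap_matrix *m = container_of(dev, struct ap_matrix, device);

	release_calls++;
	free(m);
}

/* Reduced model: the initial reference belongs to the caller. */
static void device_initialize(struct device *dev)
{
	dev->refcount = 1;
}

/* Drop one reference; the last put invokes the release hook. */
static void put_device(struct device *dev)
{
	assert(dev->refcount > 0);
	if (--dev->refcount == 0)
		dev->release(dev);
}

/*
 * Models the error path after a failed registration: no kfree() by the
 * caller, only put_device(). Returns how many times release ran.
 */
static int demo_failed_register(void)
{
	int before = release_calls;
	struct ap_matrix *m = calloc(1, sizeof(*m));

	if (!m)
		return -1;

	m->device.release = matrix_dev_release;
	device_initialize(&m->device);

	/* Pretend registration failed here; drop our reference. */
	put_device(&m->device);

	return release_calls - before;
}
```

In this model the memory is freed exactly once, by the release hook, so
an additional kfree() after put_device() would be a double free.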
>
>
>>
>>>>>>>> +
>>>>>>>> +matrix_alloc_err:
>>>>>>>> + root_device_unregister(vfio_ap_root_device);
>>>>>>>> +
>>>>>>>> +done:
>>>>>>>> + return ret;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static void vfio_ap_matrix_dev_destroy(struct ap_matrix *ap_matrix)
>>>>>>>> +{
>>>>>>>> + device_unregister(&ap_matrix->device);
>>>>>>>> + root_device_unregister(vfio_ap_root_device);
>>>>>>> Here you need a kfree(ap_matrix) too.
>>>>>> Same here.
>>>
>