In response to the probe or remove of a queue device, if a KVM guest is
using the matrix mdev to which the APQN of the queue device is assigned,
the vfio_ap device driver must respond accordingly. In an ideal world, the
queue device being probed would be hot plugged into the guest. Likewise,
the queue corresponding to the queue device being removed would
be hot unplugged from the guest. Unfortunately, the AP architecture
precludes plugging or unplugging individual queues. We must also
consider the fact that the linux device model precludes us from passing a
queue device through to a KVM guest that is not bound to the driver
facilitating the pass-through. Consequently, we are left with the choice of
plugging/unplugging the adapter or the domain. In the latter case, this
would result in taking access to the domain away for each adapter the
guest is using. In either case, the operation will alter a KVM guest's
access to one or more queues, so let's plug/unplug the adapter on
bind/unbind of the queue device since this corresponds to the hardware
entity that may be physically plugged/unplugged - i.e., a domain is not
a piece of hardware.
Example:
=======
Queue devices bound to vfio_ap device driver:
04.0004
04.0047
04.0054
05.0005
05.0047
Adapters and domains assigned to matrix mdev:
Adapters Domains -> Queues
04 0004 04.0004
05 0047 04.0047
0054 04.0054
05.0004
05.0047
05.0054
KVM guest matrix at is startup:
Adapters Domains -> Queues
04 0004 04.0004
0047 04.0047
0054 04.0054
Adapter 05 is filtered because queue 05.0054 is not bound.
KVM guest matrix after queue 05.0054 is bound to the vfio_ap driver:
Adapters Domains -> Queues
04 0004 04.0004
05 0047 04.0047
0054 04.0054
05.0004
05.0047
05.0054
All queues assigned to the matrix mdev are now bound.
KVM guest matrix after queue 04.0004 is unbound:
Adapters Domains -> Queues
05 0004 05.0004
0047 05.0047
0054 05.0054
Adapter 04 is filtered because 04.0004 is no longer bound.
Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/vfio_ap_ops.c | 158 +++++++++++++++++++++++++++++-
1 file changed, 155 insertions(+), 3 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 7bad70d7bcef..5b34bc8fca31 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -312,6 +312,13 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
return 0;
}
+static void vfio_ap_matrix_clear_masks(struct ap_matrix *matrix)
+{
+ bitmap_clear(matrix->apm, 0, AP_DEVICES);
+ bitmap_clear(matrix->aqm, 0, AP_DOMAINS);
+ bitmap_clear(matrix->adm, 0, AP_DOMAINS);
+}
+
static void vfio_ap_matrix_init(struct ap_config_info *info,
struct ap_matrix *matrix)
{
@@ -601,6 +608,104 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev,
return 0;
}
+static bool vfio_ap_mdev_matrixes_equal(struct ap_matrix *matrix1,
+ struct ap_matrix *matrix2)
+{
+ return (bitmap_equal(matrix1->apm, matrix2->apm, AP_DEVICES) &&
+ bitmap_equal(matrix1->aqm, matrix2->aqm, AP_DOMAINS) &&
+ bitmap_equal(matrix1->adm, matrix2->adm, AP_DOMAINS));
+}
+
+/**
+ * vfio_ap_mdev_filter_matrix
+ *
+ * Filters the matrix of adapters, domains, and control domains assigned to
+ * a matrix mdev's AP configuration and stores the result in the shadow copy of
+ * the APCB used to supply a KVM guest's AP configuration.
+ *
+ * @matrix_mdev: the matrix mdev whose AP configuration is to be filtered
+ *
+ * Returns true if filtering has changed the shadow copy of the APCB used
+ * to supply a KVM guest's AP configuration; otherwise, returns false.
+ */
+static int vfio_ap_mdev_filter_guest_matrix(struct ap_matrix_mdev *matrix_mdev)
+{
+ struct ap_matrix shadow_apcb;
+ unsigned long apid, apqi, apqn;
+
+ memcpy(&shadow_apcb, &matrix_mdev->matrix, sizeof(struct ap_matrix));
+
+ for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {
+ /*
+ * If the APID is not assigned to the host AP configuration,
+ * we can not assign it to the guest's AP configuration
+ */
+ if (!test_bit_inv(apid,
+ (unsigned long *)matrix_dev->info.apm)) {
+ clear_bit_inv(apid, shadow_apcb.apm);
+ continue;
+ }
+
+ for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm,
+ AP_DOMAINS) {
+ /*
+ * If the APQI is not assigned to the host AP
+ * configuration, then it can not be assigned to the
+ * guest's AP configuration
+ */
+ if (!test_bit_inv(apqi, (unsigned long *)
+ matrix_dev->info.aqm)) {
+ clear_bit_inv(apqi, shadow_apcb.aqm);
+ continue;
+ }
+
+ /*
+ * If the APQN is not bound to the vfio_ap device
+ * driver, then we can't assign it to the guest's
+ * AP configuration. The AP architecture won't
+ * allow filtering of a single APQN, so let's filter
+ * the APID.
+ */
+ apqn = AP_MKQID(apid, apqi);
+ if (!vfio_ap_mdev_get_queue(matrix_mdev, apqn)) {
+ clear_bit_inv(apid, shadow_apcb.apm);
+ break;
+ }
+ }
+
+ /*
+ * If all APIDs have been cleared, then clear the APQIs from the
+ * shadow APCB and quit filtering.
+ */
+ if (bitmap_empty(shadow_apcb.apm, AP_DEVICES)) {
+ if (!bitmap_empty(shadow_apcb.aqm, AP_DOMAINS))
+ bitmap_clear(shadow_apcb.aqm, 0, AP_DOMAINS);
+
+ break;
+ }
+
+ /*
+ * If all APQIs have been cleared, then clear the APIDs from the
+ * shadow APCB and quit filtering.
+ */
+ if (bitmap_empty(shadow_apcb.aqm, AP_DOMAINS)) {
+ if (!bitmap_empty(shadow_apcb.apm, AP_DEVICES))
+ bitmap_clear(shadow_apcb.apm, 0, AP_DEVICES);
+
+ break;
+ }
+ }
+
+ if (vfio_ap_mdev_matrixes_equal(&matrix_mdev->shadow_apcb,
+ &shadow_apcb))
+ return false;
+
+ memcpy(&matrix_mdev->shadow_apcb, &shadow_apcb,
+ sizeof(struct ap_matrix));
+
+ return true;
+}
+
enum qlink_type {
LINK_APID,
LINK_APQI,
@@ -1256,9 +1361,8 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
if (!vfio_ap_mdev_has_crycb(matrix_mdev))
return NOTIFY_DONE;
- memcpy(&matrix_mdev->shadow_apcb, &matrix_mdev->matrix,
- sizeof(matrix_mdev->shadow_apcb));
- vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
+ if (vfio_ap_mdev_filter_guest_matrix(matrix_mdev))
+ vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
return NOTIFY_OK;
}
@@ -1369,6 +1473,18 @@ static void vfio_ap_mdev_release(struct mdev_device *mdev)
matrix_mdev->kvm = NULL;
}
+ /*
+ * The shadow_apcb must be cleared.
+ *
+ * The shadow_apcb is committed to the guest only if the masks resulting
+ * from filtering the matrix_mdev->matrix differs from the masks in the
+ * shadow_apcb. Consequently, if we don't clear the masks here and a
+ * guest is subsequently started, the filtering may not result in a
+ * change to the shadow_apcb which will not get committed to the guest;
+ * in that case, the guest will be left without any queues.
+ */
+ vfio_ap_matrix_clear_masks(&matrix_mdev->shadow_apcb);
+
mutex_unlock(&matrix_dev->lock);
vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
@@ -1466,6 +1582,16 @@ static void vfio_ap_queue_link_mdev(struct vfio_ap_queue *q)
}
}
+static void vfio_ap_mdev_hot_plug_queue(struct vfio_ap_queue *q)
+{
+
+ if ((q->matrix_mdev == NULL) || !vfio_ap_mdev_has_crycb(q->matrix_mdev))
+ return;
+
+ if (vfio_ap_mdev_filter_guest_matrix(q->matrix_mdev))
+ vfio_ap_mdev_commit_shadow_apcb(q->matrix_mdev);
+}
+
int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
{
struct vfio_ap_queue *q;
@@ -1482,11 +1608,36 @@ int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
q->apqn = queue->qid;
q->saved_isc = VFIO_AP_ISC_INVALID;
vfio_ap_queue_link_mdev(q);
+ vfio_ap_mdev_hot_plug_queue(q);
mutex_unlock(&matrix_dev->lock);
return 0;
}
+void vfio_ap_mdev_hot_unplug_queue(struct vfio_ap_queue *q)
+{
+ unsigned long apid = AP_QID_CARD(q->apqn);
+
+ if ((q->matrix_mdev == NULL) || !vfio_ap_mdev_has_crycb(q->matrix_mdev))
+ return;
+
+ /*
+ * If the APID is assigned to the guest, then let's
+ * go ahead and unplug the adapter since the
+ * architecture does not provide a means to unplug
+ * an individual queue.
+ */
+ if (test_bit_inv(apid, q->matrix_mdev->shadow_apcb.apm)) {
+ clear_bit_inv(apid, q->matrix_mdev->shadow_apcb.apm);
+
+ if (bitmap_empty(q->matrix_mdev->shadow_apcb.apm, AP_DEVICES))
+ bitmap_clear(q->matrix_mdev->shadow_apcb.aqm, 0,
+ AP_DOMAINS);
+
+ vfio_ap_mdev_commit_shadow_apcb(q->matrix_mdev);
+ }
+}
+
void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
{
struct vfio_ap_queue *q;
@@ -1497,6 +1648,7 @@ void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
mutex_lock(&matrix_dev->lock);
q = dev_get_drvdata(&queue->ap_dev.device);
+ vfio_ap_mdev_hot_unplug_queue(q);
dev_set_drvdata(&queue->ap_dev.device, NULL);
apid = AP_QID_CARD(q->apqn);
apqi = AP_QID_QUEUE(q->apqn);
--
2.21.1
Hi Tony,
I love your patch! Perhaps something to improve:
[auto build test WARNING on s390/features]
[also build test WARNING on linus/master kvms390/next linux/master v5.9 next-20201022]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Tony-Krowiak/s390-vfio-ap-dynamic-configuration-support/20201023-011543
base: https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git features
config: s390-allyesconfig (attached as .config)
compiler: s390-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/0day-ci/linux/commit/aea9ab29b77facc3bb09415ebe464fd6a22ec22e
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Tony-Krowiak/s390-vfio-ap-dynamic-configuration-support/20201023-011543
git checkout aea9ab29b77facc3bb09415ebe464fd6a22ec22e
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=s390
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>
All warnings (new ones prefixed by >>):
drivers/s390/crypto/vfio_ap_ops.c:1370:5: warning: no previous prototype for 'vfio_ap_mdev_reset_queue' [-Wmissing-prototypes]
1370 | int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
| ^~~~~~~~~~~~~~~~~~~~~~~~
>> drivers/s390/crypto/vfio_ap_ops.c:1617:6: warning: no previous prototype for 'vfio_ap_mdev_hot_unplug_queue' [-Wmissing-prototypes]
1617 | void vfio_ap_mdev_hot_unplug_queue(struct vfio_ap_queue *q)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
vim +/vfio_ap_mdev_hot_unplug_queue +1617 drivers/s390/crypto/vfio_ap_ops.c
1616
> 1617 void vfio_ap_mdev_hot_unplug_queue(struct vfio_ap_queue *q)
1618 {
1619 unsigned long apid = AP_QID_CARD(q->apqn);
1620
1621 if ((q->matrix_mdev == NULL) || !vfio_ap_mdev_has_crycb(q->matrix_mdev))
1622 return;
1623
1624 /*
1625 * If the APID is assigned to the guest, then let's
1626 * go ahead and unplug the adapter since the
1627 * architecture does not provide a means to unplug
1628 * an individual queue.
1629 */
1630 if (test_bit_inv(apid, q->matrix_mdev->shadow_apcb.apm)) {
1631 clear_bit_inv(apid, q->matrix_mdev->shadow_apcb.apm);
1632
1633 if (bitmap_empty(q->matrix_mdev->shadow_apcb.apm, AP_DEVICES))
1634 bitmap_clear(q->matrix_mdev->shadow_apcb.aqm, 0,
1635 AP_DOMAINS);
1636
1637 vfio_ap_mdev_commit_shadow_apcb(q->matrix_mdev);
1638 }
1639 }
1640
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]
On 10/22/20 4:30 PM, kernel test robot wrote:
> Hi Tony,
>
> I love your patch! Perhaps something to improve:
>
> [auto build test WARNING on s390/features]
> [also build test WARNING on linus/master kvms390/next linux/master v5.9 next-20201022]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch]
>
> url: https://github.com/0day-ci/linux/commits/Tony-Krowiak/s390-vfio-ap-dynamic-configuration-support/20201023-011543
> base: https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git features
> config: s390-allyesconfig (attached as .config)
> compiler: s390-linux-gcc (GCC) 9.3.0
> reproduce (this is a W=1 build):
> wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # https://github.com/0day-ci/linux/commit/aea9ab29b77facc3bb09415ebe464fd6a22ec22e
> git remote add linux-review https://github.com/0day-ci/linux
> git fetch --no-tags linux-review Tony-Krowiak/s390-vfio-ap-dynamic-configuration-support/20201023-011543
> git checkout aea9ab29b77facc3bb09415ebe464fd6a22ec22e
> # save the attached .config to linux build tree
> COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=s390
>
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <[email protected]>
>
> All warnings (new ones prefixed by >>):
>
> drivers/s390/crypto/vfio_ap_ops.c:1370:5: warning: no previous prototype for 'vfio_ap_mdev_reset_queue' [-Wmissing-prototypes]
> 1370 | int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
> | ^~~~~~~~~~~~~~~~~~~~~~~~
My mistake, need to be a static function.
>>> drivers/s390/crypto/vfio_ap_ops.c:1617:6: warning: no previous prototype for 'vfio_ap_mdev_hot_unplug_queue' [-Wmissing-prototypes]
> 1617 | void vfio_ap_mdev_hot_unplug_queue(struct vfio_ap_queue *q)
> | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Ditto here, but this was reported for patch 01/14 and will be fixed there.
>
> vim +/vfio_ap_mdev_hot_unplug_queue +1617 drivers/s390/crypto/vfio_ap_ops.c
>
> 1616
>> 1617 void vfio_ap_mdev_hot_unplug_queue(struct vfio_ap_queue *q)
> 1618 {
> 1619 unsigned long apid = AP_QID_CARD(q->apqn);
> 1620
> 1621 if ((q->matrix_mdev == NULL) || !vfio_ap_mdev_has_crycb(q->matrix_mdev))
> 1622 return;
> 1623
> 1624 /*
> 1625 * If the APID is assigned to the guest, then let's
> 1626 * go ahead and unplug the adapter since the
> 1627 * architecture does not provide a means to unplug
> 1628 * an individual queue.
> 1629 */
> 1630 if (test_bit_inv(apid, q->matrix_mdev->shadow_apcb.apm)) {
> 1631 clear_bit_inv(apid, q->matrix_mdev->shadow_apcb.apm);
> 1632
> 1633 if (bitmap_empty(q->matrix_mdev->shadow_apcb.apm, AP_DEVICES))
> 1634 bitmap_clear(q->matrix_mdev->shadow_apcb.aqm, 0,
> 1635 AP_DOMAINS);
> 1636
> 1637 vfio_ap_mdev_commit_shadow_apcb(q->matrix_mdev);
> 1638 }
> 1639 }
> 1640
>
> ---
> 0-DAY CI Kernel Test Service, Intel Corporation
> https://lists.01.org/hyperkitty/list/[email protected]
On Thu, 22 Oct 2020 13:12:03 -0400
Tony Krowiak <[email protected]> wrote:
> In response to the probe or remove of a queue device, if a KVM guest is
> using the matrix mdev to which the APQN of the queue device is assigned,
> the vfio_ap device driver must respond accordingly. In an ideal world, the
> queue device being probed would be hot plugged into the guest. Likewise,
> the queue corresponding to the queue device being removed would
> be hot unplugged from the guest. Unfortunately, the AP architecture
> precludes plugging or unplugging individual queues. We must also
> consider the fact that the linux device model precludes us from passing a
> queue device through to a KVM guest that is not bound to the driver
> facilitating the pass-through. Consequently, we are left with the choice of
> plugging/unplugging the adapter or the domain. In the latter case, this
> would result in taking access to the domain away for each adapter the
> guest is using. In either case, the operation will alter a KVM guest's
> access to one or more queues, so let's plug/unplug the adapter on
> bind/unbind of the queue device since this corresponds to the hardware
> entity that may be physically plugged/unplugged - i.e., a domain is not
> a piece of hardware.
>
> Example:
> =======
> Queue devices bound to vfio_ap device driver:
> 04.0004
> 04.0047
> 04.0054
>
> 05.0005
> 05.0047
>
> Adapters and domains assigned to matrix mdev:
> Adapters Domains -> Queues
> 04 0004 04.0004
> 05 0047 04.0047
> 0054 04.0054
> 05.0004
> 05.0047
> 05.0054
>
> KVM guest matrix at is startup:
> Adapters Domains -> Queues
> 04 0004 04.0004
> 0047 04.0047
> 0054 04.0054
>
> Adapter 05 is filtered because queue 05.0054 is not bound.
>
> KVM guest matrix after queue 05.0054 is bound to the vfio_ap driver:
> Adapters Domains -> Queues
> 04 0004 04.0004
> 05 0047 04.0047
> 0054 04.0054
> 05.0004
> 05.0047
> 05.0054
>
> All queues assigned to the matrix mdev are now bound.
>
> KVM guest matrix after queue 04.0004 is unbound:
>
> Adapters Domains -> Queues
> 05 0004 05.0004
> 0047 05.0047
> 0054 05.0054
>
> Adapter 04 is filtered because 04.0004 is no longer bound.
>
> Signed-off-by: Tony Krowiak <[email protected]>
> ---
> drivers/s390/crypto/vfio_ap_ops.c | 158 +++++++++++++++++++++++++++++-
> 1 file changed, 155 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 7bad70d7bcef..5b34bc8fca31 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -312,6 +312,13 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
> return 0;
> }
>
> +static void vfio_ap_matrix_clear_masks(struct ap_matrix *matrix)
> +{
> + bitmap_clear(matrix->apm, 0, AP_DEVICES);
> + bitmap_clear(matrix->aqm, 0, AP_DOMAINS);
> + bitmap_clear(matrix->adm, 0, AP_DOMAINS);
> +}
> +
> static void vfio_ap_matrix_init(struct ap_config_info *info,
> struct ap_matrix *matrix)
> {
> @@ -601,6 +608,104 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev,
> return 0;
> }
>
> +static bool vfio_ap_mdev_matrixes_equal(struct ap_matrix *matrix1,
> + struct ap_matrix *matrix2)
> +{
> + return (bitmap_equal(matrix1->apm, matrix2->apm, AP_DEVICES) &&
> + bitmap_equal(matrix1->aqm, matrix2->aqm, AP_DOMAINS) &&
> + bitmap_equal(matrix1->adm, matrix2->adm, AP_DOMAINS));
> +}
> +
> +/**
> + * vfio_ap_mdev_filter_matrix
> + *
> + * Filters the matrix of adapters, domains, and control domains assigned to
> + * a matrix mdev's AP configuration and stores the result in the shadow copy of
> + * the APCB used to supply a KVM guest's AP configuration.
> + *
> + * @matrix_mdev: the matrix mdev whose AP configuration is to be filtered
> + *
> + * Returns true if filtering has changed the shadow copy of the APCB used
> + * to supply a KVM guest's AP configuration; otherwise, returns false.
> + */
> +static int vfio_ap_mdev_filter_guest_matrix(struct ap_matrix_mdev *matrix_mdev)
> +{
> + struct ap_matrix shadow_apcb;
> + unsigned long apid, apqi, apqn;
> +
> + memcpy(&shadow_apcb, &matrix_mdev->matrix, sizeof(struct ap_matrix));
> +
> + for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {
> + /*
> + * If the APID is not assigned to the host AP configuration,
> + * we can not assign it to the guest's AP configuration
> + */
> + if (!test_bit_inv(apid,
> + (unsigned long *)matrix_dev->info.apm)) {
> + clear_bit_inv(apid, shadow_apcb.apm);
> + continue;
> + }
> +
> + for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm,
> + AP_DOMAINS) {
> + /*
> + * If the APQI is not assigned to the host AP
> + * configuration, then it can not be assigned to the
> + * guest's AP configuration
> + */
> + if (!test_bit_inv(apqi, (unsigned long *)
> + matrix_dev->info.aqm)) {
> + clear_bit_inv(apqi, shadow_apcb.aqm);
> + continue;
> + }
> +
> + /*
> + * If the APQN is not bound to the vfio_ap device
> + * driver, then we can't assign it to the guest's
> + * AP configuration. The AP architecture won't
> + * allow filtering of a single APQN, so let's filter
> + * the APID.
> + */
> + apqn = AP_MKQID(apid, apqi);
> + if (!vfio_ap_mdev_get_queue(matrix_mdev, apqn)) {
> + clear_bit_inv(apid, shadow_apcb.apm);
> + break;
> + }
> + }
> +
> + /*
> + * If all APIDs have been cleared, then clear the APQIs from the
> + * shadow APCB and quit filtering.
> + */
> + if (bitmap_empty(shadow_apcb.apm, AP_DEVICES)) {
> + if (!bitmap_empty(shadow_apcb.aqm, AP_DOMAINS))
> + bitmap_clear(shadow_apcb.aqm, 0, AP_DOMAINS);
> +
> + break;
> + }
> +
> + /*
> + * If all APQIs have been cleared, then clear the APIDs from the
> + * shadow APCB and quit filtering.
> + */
> + if (bitmap_empty(shadow_apcb.aqm, AP_DOMAINS)) {
> + if (!bitmap_empty(shadow_apcb.apm, AP_DEVICES))
> + bitmap_clear(shadow_apcb.apm, 0, AP_DEVICES);
> +
> + break;
> + }
We do this to show the no queues but bits set output in show? We could
get rid of some code if we were to not z
> + }
> +
> + if (vfio_ap_mdev_matrixes_equal(&matrix_mdev->shadow_apcb,
> + &shadow_apcb))
> + return false;
> +
> + memcpy(&matrix_mdev->shadow_apcb, &shadow_apcb,
> + sizeof(struct ap_matrix));
> +
> + return true;
> +}
> +
> enum qlink_type {
> LINK_APID,
> LINK_APQI,
> @@ -1256,9 +1361,8 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
> if (!vfio_ap_mdev_has_crycb(matrix_mdev))
> return NOTIFY_DONE;
>
> - memcpy(&matrix_mdev->shadow_apcb, &matrix_mdev->matrix,
> - sizeof(matrix_mdev->shadow_apcb));
> - vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
> + if (vfio_ap_mdev_filter_guest_matrix(matrix_mdev))
> + vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>
> return NOTIFY_OK;
> }
> @@ -1369,6 +1473,18 @@ static void vfio_ap_mdev_release(struct mdev_device *mdev)
> matrix_mdev->kvm = NULL;
> }
>
> + /*
> + * The shadow_apcb must be cleared.
> + *
> + * The shadow_apcb is committed to the guest only if the masks resulting
> + * from filtering the matrix_mdev->matrix differs from the masks in the
> + * shadow_apcb. Consequently, if we don't clear the masks here and a
> + * guest is subsequently started, the filtering may not result in a
> + * change to the shadow_apcb which will not get committed to the guest;
> + * in that case, the guest will be left without any queues.
> + */
> + vfio_ap_matrix_clear_masks(&matrix_mdev->shadow_apcb);
> +
> mutex_unlock(&matrix_dev->lock);
>
> vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
> @@ -1466,6 +1582,16 @@ static void vfio_ap_queue_link_mdev(struct vfio_ap_queue *q)
> }
> }
>
> +static void vfio_ap_mdev_hot_plug_queue(struct vfio_ap_queue *q)
> +{
> +
> + if ((q->matrix_mdev == NULL) || !vfio_ap_mdev_has_crycb(q->matrix_mdev))
> + return;
> +
> + if (vfio_ap_mdev_filter_guest_matrix(q->matrix_mdev))
> + vfio_ap_mdev_commit_shadow_apcb(q->matrix_mdev);
Here we do more work than necessary. At this point we now, that
we either put the APID of the queue in the shadow_apcb or do nothing. To
decide if we have to put the APID in the shadow apcb we need to
check for the cartesian product of shadow_apcb.aqm with the APID, if the
queues identified by those APQNs are bound to the vfio_ap driver. The
vfio_ap_mdev_filter_guest_matrix() is going to do a lookup for each
assigned APQN.
> +}
> +
> int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
> {
> struct vfio_ap_queue *q;
> @@ -1482,11 +1608,36 @@ int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
> q->apqn = queue->qid;
> q->saved_isc = VFIO_AP_ISC_INVALID;
> vfio_ap_queue_link_mdev(q);
> + vfio_ap_mdev_hot_plug_queue(q);
> mutex_unlock(&matrix_dev->lock);
>
> return 0;
> }
>
> +void vfio_ap_mdev_hot_unplug_queue(struct vfio_ap_queue *q)
> +{
> + unsigned long apid = AP_QID_CARD(q->apqn);
> +
> + if ((q->matrix_mdev == NULL) || !vfio_ap_mdev_has_crycb(q->matrix_mdev))
> + return;
> +
> + /*
> + * If the APID is assigned to the guest, then let's
> + * go ahead and unplug the adapter since the
> + * architecture does not provide a means to unplug
> + * an individual queue.
> + */
> + if (test_bit_inv(apid, q->matrix_mdev->shadow_apcb.apm)) {
> + clear_bit_inv(apid, q->matrix_mdev->shadow_apcb.apm);
Shouldn't we check aqm as well? I mean it may be clear at this point
bacause of info->aqm. If the bit is clear, we don't have to remove
the apm bit.
> +
> + if (bitmap_empty(q->matrix_mdev->shadow_apcb.apm, AP_DEVICES))
> + bitmap_clear(q->matrix_mdev->shadow_apcb.aqm, 0,
> + AP_DOMAINS);
> +
> + vfio_ap_mdev_commit_shadow_apcb(q->matrix_mdev);
> + }
> +}
> +
> void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
> {
> struct vfio_ap_queue *q;
> @@ -1497,6 +1648,7 @@ void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
>
> mutex_lock(&matrix_dev->lock);
> q = dev_get_drvdata(&queue->ap_dev.device);
> + vfio_ap_mdev_hot_unplug_queue(q);
Puh this is ugly. In an ideal world the guest would be guaranteed to not
get any writes to the notifier byte after it has seen that the queue is
gone (or the interrupts were disabled).
The reset below might too late as the vcpus may go back immediately.
I don't have a good solution for this with the tools currently at
our disposal. We could simulate an external reset for the queue before
the update do the APCB, or just disable the interrupts. These are ugly
in their own way.
Switching to emulation mode might be something for the future, but right
now it is also ugly.
Any thoughts? Am I just dreaming up a problem here?
Regards,
Halil
> dev_set_drvdata(&queue->ap_dev.device, NULL);
> apid = AP_QID_CARD(q->apqn);
> apqi = AP_QID_QUEUE(q->apqn);
On 10/28/20 9:57 AM, Halil Pasic wrote:
> On Thu, 22 Oct 2020 13:12:03 -0400
> Tony Krowiak <[email protected]> wrote:
>
>> In response to the probe or remove of a queue device, if a KVM guest is
>> using the matrix mdev to which the APQN of the queue device is assigned,
>> the vfio_ap device driver must respond accordingly. In an ideal world, the
>> queue device being probed would be hot plugged into the guest. Likewise,
>> the queue corresponding to the queue device being removed would
>> be hot unplugged from the guest. Unfortunately, the AP architecture
>> precludes plugging or unplugging individual queues. We must also
>> consider the fact that the linux device model precludes us from passing a
>> queue device through to a KVM guest that is not bound to the driver
>> facilitating the pass-through. Consequently, we are left with the choice of
>> plugging/unplugging the adapter or the domain. In the latter case, this
>> would result in taking access to the domain away for each adapter the
>> guest is using. In either case, the operation will alter a KVM guest's
>> access to one or more queues, so let's plug/unplug the adapter on
>> bind/unbind of the queue device since this corresponds to the hardware
>> entity that may be physically plugged/unplugged - i.e., a domain is not
>> a piece of hardware.
>>
>> Example:
>> =======
>> Queue devices bound to vfio_ap device driver:
>> 04.0004
>> 04.0047
>> 04.0054
>>
>> 05.0005
>> 05.0047
>>
>> Adapters and domains assigned to matrix mdev:
>> Adapters Domains -> Queues
>> 04 0004 04.0004
>> 05 0047 04.0047
>> 0054 04.0054
>> 05.0004
>> 05.0047
>> 05.0054
>>
>> KVM guest matrix at is startup:
>> Adapters Domains -> Queues
>> 04 0004 04.0004
>> 0047 04.0047
>> 0054 04.0054
>>
>> Adapter 05 is filtered because queue 05.0054 is not bound.
>>
>> KVM guest matrix after queue 05.0054 is bound to the vfio_ap driver:
>> Adapters Domains -> Queues
>> 04 0004 04.0004
>> 05 0047 04.0047
>> 0054 04.0054
>> 05.0004
>> 05.0047
>> 05.0054
>>
>> All queues assigned to the matrix mdev are now bound.
>>
>> KVM guest matrix after queue 04.0004 is unbound:
>>
>> Adapters Domains -> Queues
>> 05 0004 05.0004
>> 0047 05.0047
>> 0054 05.0054
>>
>> Adapter 04 is filtered because 04.0004 is no longer bound.
>>
>> Signed-off-by: Tony Krowiak <[email protected]>
>> ---
>> drivers/s390/crypto/vfio_ap_ops.c | 158 +++++++++++++++++++++++++++++-
>> 1 file changed, 155 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>> index 7bad70d7bcef..5b34bc8fca31 100644
>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -312,6 +312,13 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
>> return 0;
>> }
>>
>> +static void vfio_ap_matrix_clear_masks(struct ap_matrix *matrix)
>> +{
>> + bitmap_clear(matrix->apm, 0, AP_DEVICES);
>> + bitmap_clear(matrix->aqm, 0, AP_DOMAINS);
>> + bitmap_clear(matrix->adm, 0, AP_DOMAINS);
>> +}
>> +
>> static void vfio_ap_matrix_init(struct ap_config_info *info,
>> struct ap_matrix *matrix)
>> {
>> @@ -601,6 +608,104 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev,
>> return 0;
>> }
>>
>> +static bool vfio_ap_mdev_matrixes_equal(struct ap_matrix *matrix1,
>> + struct ap_matrix *matrix2)
>> +{
>> + return (bitmap_equal(matrix1->apm, matrix2->apm, AP_DEVICES) &&
>> + bitmap_equal(matrix1->aqm, matrix2->aqm, AP_DOMAINS) &&
>> + bitmap_equal(matrix1->adm, matrix2->adm, AP_DOMAINS));
>> +}
>> +
>> +/**
>> + * vfio_ap_mdev_filter_matrix
>> + *
>> + * Filters the matrix of adapters, domains, and control domains assigned to
>> + * a matrix mdev's AP configuration and stores the result in the shadow copy of
>> + * the APCB used to supply a KVM guest's AP configuration.
>> + *
>> + * @matrix_mdev: the matrix mdev whose AP configuration is to be filtered
>> + *
>> + * Returns true if filtering has changed the shadow copy of the APCB used
>> + * to supply a KVM guest's AP configuration; otherwise, returns false.
>> + */
>> +static int vfio_ap_mdev_filter_guest_matrix(struct ap_matrix_mdev *matrix_mdev)
>> +{
>> + struct ap_matrix shadow_apcb;
>> + unsigned long apid, apqi, apqn;
>> +
>> + memcpy(&shadow_apcb, &matrix_mdev->matrix, sizeof(struct ap_matrix));
>> +
>> + for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {
>> + /*
>> + * If the APID is not assigned to the host AP configuration,
>> + * we can not assign it to the guest's AP configuration
>> + */
>> + if (!test_bit_inv(apid,
>> + (unsigned long *)matrix_dev->info.apm)) {
>> + clear_bit_inv(apid, shadow_apcb.apm);
>> + continue;
>> + }
>> +
>> + for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm,
>> + AP_DOMAINS) {
>> + /*
>> + * If the APQI is not assigned to the host AP
>> + * configuration, then it can not be assigned to the
>> + * guest's AP configuration
>> + */
>> + if (!test_bit_inv(apqi, (unsigned long *)
>> + matrix_dev->info.aqm)) {
>> + clear_bit_inv(apqi, shadow_apcb.aqm);
>> + continue;
>> + }
>> +
>> + /*
>> + * If the APQN is not bound to the vfio_ap device
>> + * driver, then we can't assign it to the guest's
>> + * AP configuration. The AP architecture won't
>> + * allow filtering of a single APQN, so let's filter
>> + * the APID.
>> + */
>> + apqn = AP_MKQID(apid, apqi);
>> + if (!vfio_ap_mdev_get_queue(matrix_mdev, apqn)) {
>> + clear_bit_inv(apid, shadow_apcb.apm);
>> + break;
>> + }
>> + }
>> +
>> + /*
>> + * If all APIDs have been cleared, then clear the APQIs from the
>> + * shadow APCB and quit filtering.
>> + */
>> + if (bitmap_empty(shadow_apcb.apm, AP_DEVICES)) {
>> + if (!bitmap_empty(shadow_apcb.aqm, AP_DOMAINS))
>> + bitmap_clear(shadow_apcb.aqm, 0, AP_DOMAINS);
>> +
>> + break;
>> + }
>> +
>> + /*
>> + * If all APQIs have been cleared, then clear the APIDs from the
>> + * shadow APCB and quit filtering.
>> + */
>> + if (bitmap_empty(shadow_apcb.aqm, AP_DOMAINS)) {
>> + if (!bitmap_empty(shadow_apcb.apm, AP_DEVICES))
>> + bitmap_clear(shadow_apcb.apm, 0, AP_DEVICES);
>> +
>> + break;
>> + }
> We do this to show the no queues but bits set output in show? We could
> get rid of some code if we were to not z
I'm not sure what you are saying/asking here. The reason for this
is because there is no point in setting bits in the APCB if no queues
will be made available to the guest which is the case if the APM or
AQM are cleared.
>
>> + }
>> +
>> + if (vfio_ap_mdev_matrixes_equal(&matrix_mdev->shadow_apcb,
>> + &shadow_apcb))
>> + return false;
>> +
>> + memcpy(&matrix_mdev->shadow_apcb, &shadow_apcb,
>> + sizeof(struct ap_matrix));
>> +
>> + return true;
>> +}
>> +
>> enum qlink_type {
>> LINK_APID,
>> LINK_APQI,
>> @@ -1256,9 +1361,8 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
>> if (!vfio_ap_mdev_has_crycb(matrix_mdev))
>> return NOTIFY_DONE;
>>
>> - memcpy(&matrix_mdev->shadow_apcb, &matrix_mdev->matrix,
>> - sizeof(matrix_mdev->shadow_apcb));
>> - vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>> + if (vfio_ap_mdev_filter_guest_matrix(matrix_mdev))
>> + vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>>
>> return NOTIFY_OK;
>> }
>> @@ -1369,6 +1473,18 @@ static void vfio_ap_mdev_release(struct mdev_device *mdev)
>> matrix_mdev->kvm = NULL;
>> }
>>
>> + /*
>> + * The shadow_apcb must be cleared.
>> + *
>> + * The shadow_apcb is committed to the guest only if the masks resulting
>> + * from filtering the matrix_mdev->matrix differs from the masks in the
>> + * shadow_apcb. Consequently, if we don't clear the masks here and a
>> + * guest is subsequently started, the filtering may not result in a
>> + * change to the shadow_apcb which will not get committed to the guest;
>> + * in that case, the guest will be left without any queues.
>> + */
>> + vfio_ap_matrix_clear_masks(&matrix_mdev->shadow_apcb);
>> +
>> mutex_unlock(&matrix_dev->lock);
>>
>> vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
>> @@ -1466,6 +1582,16 @@ static void vfio_ap_queue_link_mdev(struct vfio_ap_queue *q)
>> }
>> }
>>
>> +static void vfio_ap_mdev_hot_plug_queue(struct vfio_ap_queue *q)
>> +{
>> +
>> + if ((q->matrix_mdev == NULL) || !vfio_ap_mdev_has_crycb(q->matrix_mdev))
>> + return;
>> +
>> + if (vfio_ap_mdev_filter_guest_matrix(q->matrix_mdev))
>> + vfio_ap_mdev_commit_shadow_apcb(q->matrix_mdev);
> Here we do more work than necessary. At this point we now, that
> we either put the APID of the queue in the shadow_apcb or do nothing. To
> decide if we have to put the APID in the shadow apcb we need to
> check for the cartesian product of shadow_apcb.aqm with the APID, if the
> queues identified by those APQNs are bound to the vfio_ap driver. The
> vfio_ap_mdev_filter_guest_matrix() is going to do a lookup for each
> assigned APQN.
That is true and I believe in the previous iteration that is what was
done. In the interest of keeping things simple and consistent across
the various interfaces that require filtering, I decided to use one
function instead of duplicating function in multiple places. Let me think
on this and maybe I can come up with a way to kill many birds with one
stone so. The question is, how often is type of thing going to happen
(i.e.,
is performance really an issue here?).
>
>> +}
>> +
>> int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
>> {
>> struct vfio_ap_queue *q;
>> @@ -1482,11 +1608,36 @@ int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
>> q->apqn = queue->qid;
>> q->saved_isc = VFIO_AP_ISC_INVALID;
>> vfio_ap_queue_link_mdev(q);
>> + vfio_ap_mdev_hot_plug_queue(q);
>> mutex_unlock(&matrix_dev->lock);
>>
>> return 0;
>> }
>>
>> +void vfio_ap_mdev_hot_unplug_queue(struct vfio_ap_queue *q)
>> +{
>> + unsigned long apid = AP_QID_CARD(q->apqn);
>> +
>> + if ((q->matrix_mdev == NULL) || !vfio_ap_mdev_has_crycb(q->matrix_mdev))
>> + return;
>> +
>> + /*
>> + * If the APID is assigned to the guest, then let's
>> + * go ahead and unplug the adapter since the
>> + * architecture does not provide a means to unplug
>> + * an individual queue.
>> + */
>> + if (test_bit_inv(apid, q->matrix_mdev->shadow_apcb.apm)) {
>> + clear_bit_inv(apid, q->matrix_mdev->shadow_apcb.apm);
> Shouldn't we check aqm as well? I mean it may be clear at this point
> bacause of info->aqm. If the bit is clear, we don't have to remove
> the apm bit.
The rule we agreed upon is that if a queue is removed, we unplug
the card because we can't unplug an individual queue, so this code
is consistent with the stated rule. Typically, a queue is unplugged
because the adapter has been deconfigured or is broken which means
that all queues for that adapter will be removed in succession. On the
other hand, that situation would be handled when the last queue is
removed if we check the AQM, so I'm not adverse to making that
check if you insist. Of course, if the queue is manually unbound from
the vfio driver, what you are asking for makes sense I suppose. I'll have
to think about this one some more, but feel free to respond to this.
>
>> +
>> + if (bitmap_empty(q->matrix_mdev->shadow_apcb.apm, AP_DEVICES))
>> + bitmap_clear(q->matrix_mdev->shadow_apcb.aqm, 0,
>> + AP_DOMAINS);
>> +
>> + vfio_ap_mdev_commit_shadow_apcb(q->matrix_mdev);
>> + }
>> +}
>> +
>> void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
>> {
>> struct vfio_ap_queue *q;
>> @@ -1497,6 +1648,7 @@ void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
>>
>> mutex_lock(&matrix_dev->lock);
>> q = dev_get_drvdata(&queue->ap_dev.device);
>> + vfio_ap_mdev_hot_unplug_queue(q);
> Puh this is ugly. In an ideal world the guest would be guaranteed to not
> get any writes to the notifier byte after it has seen that the queue is
> gone (or the interrupts were disabled).
>
> The reset below might too late as the vcpus may go back immediately.
>
> I don't have a good solution for this with the tools currently at
> our disposal. We could simulate an external reset for the queue before
> the update do the APCB, or just disable the interrupts. These are ugly
> in their own way.
>
> Switching to emulation mode might be something for the future, but right
> now it is also ugly.
>
> Any thoughts? Am I just dreaming up a problem here?
I realize that we shouldn't make any assumptions about the OS
running on the guest, but in the world as it exists today the only
OS supported as a guest of linux is linux. Consequently, when
the adapter is unplugged from the guest, the zcrypt driver will
reset the queues associated with it thus disabling interrupts.
Unfortuately we can't control the flow between the host and the
guest, so there is a possibility the vfio driver could be first to
reset the queue in question. I'm not sure this is a problem
because either the vfio driver or the zcrypt driver on the guest
will get a response code indicating a reset is in progress at which
time it will wait for completion before trying again.
The bottom line is, in my opinion you are dreaming up a problem
here. Ultimately, when an adapter is removed from the guest,
the guest will no longer access any queue on that adapter and
the adapter will be reset before giving it back to the host.
>
> Regards,
> Halil
>
>
>> dev_set_drvdata(&queue->ap_dev.device, NULL);
>> apid = AP_QID_CARD(q->apqn);
>> apqi = AP_QID_QUEUE(q->apqn);
On Tue, 3 Nov 2020 17:49:21 -0500
Tony Krowiak <[email protected]> wrote:
> >>
> >> +void vfio_ap_mdev_hot_unplug_queue(struct vfio_ap_queue *q)
> >> +{
> >> + unsigned long apid = AP_QID_CARD(q->apqn);
> >> +
> >> + if ((q->matrix_mdev == NULL) || !vfio_ap_mdev_has_crycb(q->matrix_mdev))
> >> + return;
> >> +
> >> + /*
> >> + * If the APID is assigned to the guest, then let's
> >> + * go ahead and unplug the adapter since the
> >> + * architecture does not provide a means to unplug
> >> + * an individual queue.
> >> + */
> >> + if (test_bit_inv(apid, q->matrix_mdev->shadow_apcb.apm)) {
> >> + clear_bit_inv(apid, q->matrix_mdev->shadow_apcb.apm);
> > Shouldn't we check aqm as well? I mean it may be clear at this point
> > bacause of info->aqm. If the bit is clear, we don't have to remove
> > the apm bit.
>
> The rule we agreed upon is that if a queue is removed, we unplug
> the card because we can't unplug an individual queue, so this code
> is consistent with the stated rule.
All I'm asking for is to verify that the queue is actually plugged. The
queue is actually plugged iff
test_bit_inv(apid, q->matrix_mdev->shadow_apcb.apm) && test_bit_inv(apqi,
q->matrix_mdev->shadow_apcb.aqm).
There is no point in unplugging the whole card, if the queue removed is
unplugged in the first place.
> Typically, a queue is unplugged
> because the adapter has been deconfigured or is broken which means
> that all queues for that adapter will be removed in succession. On the
> other hand, that situation would be handled when the last queue is
> removed if we check the AQM, so I'm not adverse to making that
> check if you insist.
I don't agree. Let's detail your scenario. We have a nicely
operating card which is as a whole passed trough to our guest. It
goes broken, and the ap bus decides to deconstruct the queues.
Already the first queue removed would unplug the the card, because
both the apm and the aqm bits are set at this point. Subsequent removals
then see that the apm bit is removed. Actually IMHO everything works
like without the extra check on aqm (in this scenario).
Would make reasoning about the code much easier to me, so sorry I do
insist.
> Of course, if the queue is manually unbound from
> the vfio driver, what you are asking for makes sense I suppose. I'll have
> to think about this one some more, but feel free to respond to this.
I'm not sure the situation where the queues ->mdev_matrix pointer is set
but the apqi is not in the shadow_apcb can actually happen (races not
considered). But I'm sure the code is suggesting it can, because
vfio_ap_mdev_filter_guest_matrix() has a third parameter called filter_apid,
which governs whether the apm or the aqm bit should be removed. And
vfio_ap_mdev_filter_guest_matrix() does get called with filter_apid=false in
assign_domain_store() and I don't see subsequent unlink operations that would
severe q->mdev_matrix.
Another case where the aqm may get filtered in
vfio_ap_mdev_filter_guest_matrix() is the info->aqm bit not set, as I've
mentioned in my previous mail. If that can not happen, we should turn
that into an assert.
Actually if you are convinced that apqi bit is always set in the
q->matrix_mdev->shadow_apcb.aqm, I would agree to turning that into an
assertion instead of condition. Then if not completely convinced, I
could at least try to trigger the assert :).
Regards,
Halil
On Tue, 3 Nov 2020 17:49:21 -0500
Tony Krowiak <[email protected]> wrote:
> > We do this to show the no queues but bits set output in show? We could
> > get rid of some code if we were to not z
Managed to delete "eroize" fro "zeroize"
>
> I'm not sure what you are saying/asking here. The reason for this
> is because there is no point in setting bits in the APCB if no queues
> will be made available to the guest which is the case if the APM or
> AQM are cleared.
Exactly my train of thought! There is no point doing work (here
zeroizing) that has no effect.
Also I'm leaning towards incremental updates to the shadow_apcb (instead
of basically recomputing it from the scratch each time). One thing I'm
particularly worried abut is that because of the third argument of
vfio_ap_mdev_filter_guest_matrix() called filter_apid, we could end up
with different filtering decision than previously. E.g. we decided to
filter the card on e.g. removal of a single queueu, but then somebody
does an assign domain, and suddenly we unplug the domain and plug the
card. With incremental changes the shadow_apcb, we could do less work
(revise only what needs to be), and it would be more straight forward
to reason about the absence of inconsistent filtering.
Regards,
Halil
On 11/4/20 7:52 AM, Halil Pasic wrote:
> On Tue, 3 Nov 2020 17:49:21 -0500
> Tony Krowiak <[email protected]> wrote:
>
>>>>
>>>> +void vfio_ap_mdev_hot_unplug_queue(struct vfio_ap_queue *q)
>>>> +{
>>>> + unsigned long apid = AP_QID_CARD(q->apqn);
>>>> +
>>>> + if ((q->matrix_mdev == NULL) || !vfio_ap_mdev_has_crycb(q->matrix_mdev))
>>>> + return;
>>>> +
>>>> + /*
>>>> + * If the APID is assigned to the guest, then let's
>>>> + * go ahead and unplug the adapter since the
>>>> + * architecture does not provide a means to unplug
>>>> + * an individual queue.
>>>> + */
>>>> + if (test_bit_inv(apid, q->matrix_mdev->shadow_apcb.apm)) {
>>>> + clear_bit_inv(apid, q->matrix_mdev->shadow_apcb.apm);
>>> Shouldn't we check aqm as well? I mean it may be clear at this point
>>> bacause of info->aqm. If the bit is clear, we don't have to remove
>>> the apm bit.
>> The rule we agreed upon is that if a queue is removed, we unplug
>> the card because we can't unplug an individual queue, so this code
>> is consistent with the stated rule.
> All I'm asking for is to verify that the queue is actually plugged. The
> queue is actually plugged iff
> test_bit_inv(apid, q->matrix_mdev->shadow_apcb.apm) && test_bit_inv(apqi,
> q->matrix_mdev->shadow_apcb.aqm).
>
> There is no point in unplugging the whole card, if the queue removed is
> unplugged in the first place.
No problem, I can make that change.
>
>> Typically, a queue is unplugged
>> because the adapter has been deconfigured or is broken which means
>> that all queues for that adapter will be removed in succession. On the
>> other hand, that situation would be handled when the last queue is
>> removed if we check the AQM, so I'm not adverse to making that
>> check if you insist.
> I don't agree. Let's detail your scenario. We have a nicely
> operating card which is as a whole passed trough to our guest. It
> goes broken, and the ap bus decides to deconstruct the queues.
> Already the first queue removed would unplug the the card, because
> both the apm and the aqm bits are set at this point. Subsequent removals
> then see that the apm bit is removed. Actually IMHO everything works
> like without the extra check on aqm (in this scenario).
>
> Would make reasoning about the code much easier to me, so sorry I do
> insist.
As you said, it works as-is in the scenario you pointed out:)
Whether it makes it any easier to understand the code is in
the eyes of the beholder (for example, I disagree),
but I'm willing to make the change, it's not a big deal.
>
>> Of course, if the queue is manually unbound from
>> the vfio driver, what you are asking for makes sense I suppose. I'll have
>> to think about this one some more, but feel free to respond to this.
> I'm not sure the situation where the queues ->mdev_matrix pointer is set
> but the apqi is not in the shadow_apcb can actually happen (races not
> considered).
Of course it can, for example:
1. No queues bound to vfio driver
2. APQN 04.0004 assigned to matrix mdev
3. Guest started:
a. No bits set in shadow_apcb because no queues are bound to vfio
4. queue device 04.0004 is bound to the driver
a. q->matrix_mdev is set because 04.0004 is assigned to matrix mdev
b. apqi 0004 is not in shadow_apcb (see 3a.)
> But I'm sure the code is suggesting it can, because
> vfio_ap_mdev_filter_guest_matrix() has a third parameter called filter_apid,
> which governs whether the apm or the aqm bit should be removed. And
> vfio_ap_mdev_filter_guest_matrix() does get called with filter_apid=false in
> assign_domain_store() and I don't see subsequent unlink operations that would
> severe q->mdev_matrix.
I think you may be conflating two different things. The q in q->matrix_mdev
represents a queue device bound to the driver. The link to matrix_mdev
indicates the APQN of the queue device is assigned to the matrix_mdev.
When a new domain is assigned to matrix_mdev, we know that
all APQNS currently assigned to the shadow_apcb are bound to the vfio
driver
because of previous filtering, so we are only concerned with those APQNs
with the APQI of the new domain being assigned.
1. Queues bound to vfio_ap:
04.0004
04.0047
2. APQNs assigned to matrix_mdev:
04.0004
04.0047
3. shadow_apcb:
04.0004
04.0047
4. Assign domain 0054 to matrix_mdev
5. APQI 0054 gets filtered because 04.0054 not bound to vfio_ap
6. no change to shadow_apcb:
04.0004
04.0047
Or:
1. Queues bound to vfio_ap:
04.0004
04.0047
04.0054
2. APQNs assigned to matrix_mdev:
04.0004
04.0047
3. shadow_apcb:
04.0004
04.0047
4. Assign domain 0054 to matrix_mdev
5. APQNs assigned to matrix_mdev
04.0004
04.0047
04.0054
5. APQI 0054 does not get filtered because 04.0054 is bound to vfio_ap
6. shadow_apcb after filtering:
04.0004
04.0047
04.0054
I'm not sure why you are bringing up unlinking in the context of assigning
a new domain. Unlinking only occurs when an APID or APQI is unassigned.
>
> Another case where the aqm may get filtered in
> vfio_ap_mdev_filter_guest_matrix() is the info->aqm bit not set, as I've
> mentioned in my previous mail. If that can not happen, we should turn
> that into an assert.
In an earlier email of yours, you brought up the scenario whereby
a queue is probed not because of a change in the QCI info,
but because an unbound queue is bound; for instance manually.
I made a change to account for that so consider the following
scenario:
1. APQI 0004 removed from info->aqm
2. AP bus notifies vfio_ap that AP configuration has changed
3. vfio_ap removes APQI 0004 from shadow_apcb
4. Userspace binds queue 04.0004 to vfio_ap
5. Filtering code filters 0004 because it has been removed
from info->aqm
6. AP bus notifies vfio_ap scan is over
>
> Actually if you are convinced that apqi bit is always set in the
> q->matrix_mdev->shadow_apcb.aqm, I would agree to turning that into an
> assertion instead of condition. Then if not completely convinced, I
> could at least try to trigger the assert :).
>
> Regards,
> Halil
On Wed, 4 Nov 2020 16:20:26 -0500
Tony Krowiak <[email protected]> wrote:
> > But I'm sure the code is suggesting it can, because
> > vfio_ap_mdev_filter_guest_matrix() has a third parameter called filter_apid,
> > which governs whether the apm or the aqm bit should be removed. And
> > vfio_ap_mdev_filter_guest_matrix() does get called with filter_apid=false in
> > assign_domain_store() and I don't see subsequent unlink operations that would
> > severe q->mdev_matrix.
>
> I think you may be conflating two different things. The q in q->matrix_mdev
> represents a queue device bound to the driver. The link to matrix_mdev
> indicates the APQN of the queue device is assigned to the matrix_mdev.
> When a new domain is assigned to matrix_mdev, we know that
> all APQNS currently assigned to the shadow_apcb are bound to the vfio
> driver
> because of previous filtering, so we are only concerned with those APQNs
> with the APQI of the new domain being assigned.
>
> 1. Queues bound to vfio_ap:
> 04.0004
> 04.0047
> 2. APQNs assigned to matrix_mdev:
> 04.0004
> 04.0047
> 3. shadow_apcb:
> 04.0004
> 04.0047
> 4. Assign domain 0054 to matrix_mdev
> 5. APQI 0054 gets filtered because 04.0054 not bound to vfio_ap
> 6. no change to shadow_apcb:
> 04.0004
> 04.0047
Let me please expand on your example. For reference see the filtering
code after the example.
1. Queues bound to vfio_ap:
04.0004
04.0047
05.0004
05.0047
05.0054
2. APQNs assigned to matrix_mdev:
04.0004
04.0047
3. shadow_apcb:
04.0004
04.0047
4. Assign domain 0054 to matrix_mdev
5. APQNs assigned to matrix_mdev:
04.0004
04.0047
04.0054
5. APQI 0054 gets filtered because 04.0054 not bound to vfio_ap
6. no change to shadow_apcb:
04.0004
04.0047
7. assign adapter 05
8. APQNs assigned to matrix_mdev:
04.0004
04.0047
04.0054
05.0004
05.0047
05.0054
9. shadow_apcb changes to:
05.0004
05.0047
05.0054
because now vfio_ap_mdev_filter_guest_matrix() is called with filter_apid=true
10. assign domain 0052
11. APQNs assigned to matrix_mdev:
04.0004
04.0047
04.0053
04.0054
05.0004
05.0047
05.0053
05.0054
11. shadow_apcb changes to
04.0004
04.0047
05.0004
05.0047
because now filter_guest_matrix() is called with filter_apid=false
and apqis 0053 and 0054 get filtered
12. 05.0054 gets removed (unbound)
13. with your current code we unplug adapter 05 from shadow_apcb
despite the fact that 05.0054 was not in the shadow_apcb in
the first place
14. unassign adapter 05
15. unassign domain 0053
16. APQNs assigned to matrix_mdev:
04.0004
04.0047
04.0054
17. shadow apcb is
04.0004
04.0047
16. assign adapter 05
15. APQNs assigned to matrix_mdev:
04.0004
04.0047
04.0054
05.0004
05.0047
05.0054
16. shadow_apcb changes to
<empty>
because now filter_guest_matrix() is called with filter_apid=true
and apqn 04 gets filtered because queues 04.0053 are not bound
and apqn 05 gets filtered because queues 05.0053 are not bound
static int vfio_ap_mdev_filter_guest_matrix(struct ap_matrix_mdev *matrix_mdev,
bool filter_apid)
{
struct ap_matrix shadow_apcb;
unsigned long apid, apqi, apqn;
memcpy(&shadow_apcb, &matrix_mdev->matrix, sizeof(struct ap_matrix));
for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {
/*
* If the APID is not assigned to the host AP configuration,
* we can not assign it to the guest's AP configuration
*/
if (!test_bit_inv(apid, (unsigned long *)
matrix_dev->config_info.apm)) {
clear_bit_inv(apid, shadow_apcb.apm);
continue;
}
for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm,
AP_DOMAINS) {
/*
* If the APQI is not assigned to the host AP
* configuration, then it can not be assigned to the
* guest's AP configuration
*/
if (!test_bit_inv(apqi, (unsigned long *)
matrix_dev->config_info.aqm)) {
clear_bit_inv(apqi, shadow_apcb.aqm);
continue;
}
/*
* If the APQN is not bound to the vfio_ap device
* driver, then we can't assign it to the guest's
* AP configuration. The AP architecture won't
* allow filtering of a single APQN, so let's filter
* the APID.
*/
apqn = AP_MKQID(apid, apqi);
if (!vfio_ap_mdev_get_queue(matrix_mdev, apqn)) {
if (filter_apid) {
clear_bit_inv(apid, shadow_apcb.apm);
break;
}
clear_bit_inv(apqi, shadow_apcb.aqm);
continue;
}
}
I realize this scenario (to play through to the end) requires
manually unbound queue (more precisely queue missing not because
of host ap config or because of a[pq]mask), but just one 'hole' suffices.
I'm afraid, that I might be bitching around, because last time it was me
who downplayed the effects of such 'holes'.
Nevertheless, I would like to ask you to verify the scenario I've
sketched, or complain if I've gotten something wrong.
Regarding solutions to the problem. It makes no sense to talk about a
solution, before agreeing on the existence of the problem. Nevertheless
I will write down two sentences, mostly as a reminder to myself, for the
case we do agree on the existence of the problem. The simplest approach
is to always filter by apid. That way we get a quirky adapter unplug
right at steps 4, but it won't create the complicated mess we have in
the rest of the points. Another idea is to restrict the overprovisioning
of domains. Basically we would make the step 4 fail because we detected
a 'hole'. But this idea has its own problems, and in some scenarios
it does boil down to the unplug the adapter rule.
[..]
>
> I'm not sure why you are bringing up unlinking in the context of assigning
> a new domain. Unlinking only occurs when an APID or APQI is unassigned.
Are you certain? What about vfio_ap_mdev_on_cfg_remove()? I believe it
unplugs from the shadow_apcb, but it does not change the
assignments to the matrix_mdev. We do that so we know in remove that the
queue was already cleaned up, and does not need more cleanup.
Regards,
Halil
On 11/5/20 7:27 AM, Halil Pasic wrote:
> On Wed, 4 Nov 2020 16:20:26 -0500
> Tony Krowiak <[email protected]> wrote:
>
>>> But I'm sure the code is suggesting it can, because
>>> vfio_ap_mdev_filter_guest_matrix() has a third parameter called filter_apid,
>>> which governs whether the apm or the aqm bit should be removed. And
>>> vfio_ap_mdev_filter_guest_matrix() does get called with filter_apid=false in
>>> assign_domain_store() and I don't see subsequent unlink operations that would
>>> severe q->mdev_matrix.
>> I think you may be conflating two different things. The q in q->matrix_mdev
>> represents a queue device bound to the driver. The link to matrix_mdev
>> indicates the APQN of the queue device is assigned to the matrix_mdev.
>> When a new domain is assigned to matrix_mdev, we know that
>> all APQNS currently assigned to the shadow_apcb are bound to the vfio
>> driver
>> because of previous filtering, so we are only concerned with those APQNs
>> with the APQI of the new domain being assigned.
>>
>> 1. Queues bound to vfio_ap:
>> 04.0004
>> 04.0047
>> 2. APQNs assigned to matrix_mdev:
>> 04.0004
>> 04.0047
>> 3. shadow_apcb:
>> 04.0004
>> 04.0047
>> 4. Assign domain 0054 to matrix_mdev
>> 5. APQI 0054 gets filtered because 04.0054 not bound to vfio_ap
>> 6. no change to shadow_apcb:
>> 04.0004
>> 04.0047
> Let me please expand on your example. For reference see the filtering
> code after the example.
Since our face to face discussion, I've made changes which
affect the scenario you laid out. The following shows the difference
in results using your scenario. Let me know what you think.
1. Queues bound to vfio_ap:
04.0004
04.0047
05.0004
05.0047
05.0054
2. APQNs assigned to matrix_mdev:
04.0004
04.0047
3. shadow_apcb:
04.0004
04.0047
4. Assign domain 0054 to matrix_mdev
5. APQNs assigned to matrix_mdev:
04.0004
04.0047
04.0054
5. APQI 0054 gets filtered because 04.0054 not bound to vfio_ap
6. no change to shadow_apcb:
04.0004
04.0047
7. assign adapter 05
8. APQNs assigned to matrix_mdev:
04.0004
04.0047
04.0054
05.0004
05.0047
05.0054
9. shadow_apcb changes to:
04.0004
04.0047
05.0004
05.0047
because adapter 05 is checked against the APQIs in the
shadow_apcb (0004, 0047) and since 05.0004 and
05.0047 are bound to the driver, adapter 05 is
hot plugged.
10. assign domain 0052
11. APQNs assigned to matrix_mdev:
04.0004
04.0047
04.0053
04.0054
05.0004
05.0047
05.0053
05.0054
11. shadow_apcb remains
04.0004
04.0047
05.0004
05.0047
because domain 0052 is checked against adapters assigned to
shadow_apcb and rejected because neither 04.0052 nor 05.0052
is bound to the vfio_ap driver.
12. 05.0054 gets removed (unbound)
13. Nothing is removed because 05.0054 is not assigned to shadow_apcb
14. unassign adapter 05
15. unassign domain 0053
16. APQNs assigned to matrix_mdev:
04.0004
04.0047
04.0054
17. shadow apcb is
04.0004
04.0047
16. assign adapter 05
15. APQNs assigned to matrix_mdev:
04.0004
04.0047
04.0054
05.0004
05.0047
05.0054
16. shadow_apcb changes to:
04.0004
04.0047
05.0004
05.0047
because adapter 05 is checked against APQIs (0004, 0047)
in shadow_apcb and since queues 05.0004 and 05.0047
are bound to vfio_ap, the adapter is hot plugged
>
> 1. Queues bound to vfio_ap:
> 04.0004
> 04.0047
> 05.0004
> 05.0047
> 05.0054
> 2. APQNs assigned to matrix_mdev:
> 04.0004
> 04.0047
> 3. shadow_apcb:
> 04.0004
> 04.0047
> 4. Assign domain 0054 to matrix_mdev
> 5. APQNs assigned to matrix_mdev:
> 04.0004
> 04.0047
> 04.0054
> 5. APQI 0054 gets filtered because 04.0054 not bound to vfio_ap
> 6. no change to shadow_apcb:
> 04.0004
> 04.0047
> 7. assign adapter 05
> 8. APQNs assigned to matrix_mdev:
> 04.0004
> 04.0047
> 04.0054
> 05.0004
> 05.0047
> 05.0054
> 9. shadow_apcb changes to:
> 05.0004
> 05.0047
> 05.0054
> because now vfio_ap_mdev_filter_guest_matrix() is called with filter_apid=true
> 10. assign domain 0052
> 11. APQNs assigned to matrix_mdev:
> 04.0004
> 04.0047
> 04.0053
> 04.0054
> 05.0004
> 05.0047
> 05.0053
> 05.0054
> 11. shadow_apcb changes to
> 04.0004
> 04.0047
> 05.0004
> 05.0047
> because now filter_guest_matrix() is called with filter_apid=false
> and apqis 0053 and 0054 get filtered
> 12. 05.0054 gets removed (unbound)
> 13. with your current code we unplug adapter 05 from shadow_apcb
> despite the fact that 05.0054 was not in the shadow_apcb in
> the first place
> 14. unassign adapter 05
> 15. unassign domain 0053
> 16. APQNs assigned to matrix_mdev:
> 04.0004
> 04.0047
> 04.0054
> 17. shadow apcb is
> 04.0004
> 04.0047
> 16. assign adapter 05
> 15. APQNs assigned to matrix_mdev:
> 04.0004
> 04.0047
> 04.0054
> 05.0004
> 05.0047
> 05.0054
> 16. shadow_apcb changes to
> <empty>
> because now filter_guest_matrix() is called with filter_apid=true
> and apqn 04 gets filtered because queues 04.0053 are not bound
> and apqn 05 gets filtered because queues 05.0053 are not bound
>
> static int vfio_ap_mdev_filter_guest_matrix(struct ap_matrix_mdev *matrix_mdev,
> bool filter_apid)
> {
> struct ap_matrix shadow_apcb;
> unsigned long apid, apqi, apqn;
>
> memcpy(&shadow_apcb, &matrix_mdev->matrix, sizeof(struct ap_matrix));
>
> for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {
> /*
> * If the APID is not assigned to the host AP configuration,
> * we can not assign it to the guest's AP configuration
> */
> if (!test_bit_inv(apid, (unsigned long *)
> matrix_dev->config_info.apm)) {
> clear_bit_inv(apid, shadow_apcb.apm);
> continue;
> }
>
> for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm,
> AP_DOMAINS) {
> /*
> * If the APQI is not assigned to the host AP
> * configuration, then it can not be assigned to the
> * guest's AP configuration
> */
> if (!test_bit_inv(apqi, (unsigned long *)
> matrix_dev->config_info.aqm)) {
> clear_bit_inv(apqi, shadow_apcb.aqm);
> continue;
> }
>
> /*
> * If the APQN is not bound to the vfio_ap device
> * driver, then we can't assign it to the guest's
> * AP configuration. The AP architecture won't
> * allow filtering of a single APQN, so let's filter
> * the APID.
> */
> apqn = AP_MKQID(apid, apqi);
>
> if (!vfio_ap_mdev_get_queue(matrix_mdev, apqn)) {
> if (filter_apid) {
> clear_bit_inv(apid, shadow_apcb.apm);
> break;
> }
>
> clear_bit_inv(apqi, shadow_apcb.aqm);
> continue;
> }
> }
>
> I realize this scenario (to play through to the end) requires
> manually unbound queue (more precisely queue missing not because
> of host ap config or because of a[pq]mask), but just one 'hole' suffices.
>
> I'm afraid, that I might be bitching around, because last time it was me
> who downplayed the effects of such 'holes'.
>
> Nevertheless, I would like to ask you to verify the scenario I've
> sketched, or complain if I've gotten something wrong.
Your scenario looks correct and convinced me to change
the filtering logic giving the results I pointed out above.
It was a good catch on your part, so I thank you for
the review.
>
> Regarding solutions to the problem. It makes no sense to talk about a
> solution, before agreeing on the existence of the problem. Nevertheless
> I will write down two sentences, mostly as a reminder to myself, for the
> case we do agree on the existence of the problem. The simplest approach
> is to always filter by apid. That way we get a quirky adapter unplug
> right at steps 4, but it won't create the complicated mess we have in
> the rest of the points. Another idea is to restrict the overprovisioning
> of domains. Basically we would make the step 4 fail because we detected
> a 'hole'. But this idea has its own problems, and in some scenarios
> it does boil down to the unplug the adapter rule.
I made the following changes that I believe rectify this problem:
1. On guest startup, the shadow_apcb is initialized by filtering the
mdev's matrix by APID (i.e., if an APQN derived from a particular
APID and the APQIs assigned to the mdev's matrix does not
reference a queue device bound to vfio_ap, that APID is not
assigned to the shadow_apcb).
2. On adapter assignment, if each APQN derived from the APID
being assigned and the APQIs assigned to the shadow_apcb
does not reference a queue bound to vfio_ap, the adapter
will not be hot plugged.
3. On adapter unassignment, if the APID is set in the shadow_apcb,
the adapter will be hot unplugged.
4. On domain assignment, if each APQN derived from the APQI
being assigned and the APIDs assigned to the shadow_apcb
does not reference a queue bound to vfio_ap, the domain
will not be hot plugged.
5. On domain unassignment, if the APQI is set in the shadow_apcb,
the domain will be hot unplugged.
6. On probe:
a. For the queue's APID, the same logic as #2 will be used.
b. For the queue's APQI, the same logic as #4 will be used.
7. On remove, if the APQN of the queue being unbound is assigned
to the shadow_apcb, the adapter will be hot unplugged.
>
>
> [..]
>
>> I'm not sure why you are bringing up unlinking in the context of assigning
>> a new domain. Unlinking only occurs when an APID or APQI is unassigned.
> Are you certain? What about vfio_ap_mdev_on_cfg_remove()? I believe it
> unplugs from the shadow_apcb, but it does not change the
> assignments to the matrix_mdev. We do that so we know in remove that the
> queue was already cleaned up, and does not need more cleanup.
What you say is true; however, that is not related to my comment.
I asked why you were bringing up unlinking in the context of
assigning a new domain. The point you just made above has to
do with unassigning adapters/domains.
>
> Regards,
> Halil
>