2021-07-07 15:45:53

by Anthony Krowiak

[permalink] [raw]
Subject: [PATCH] s390/vfio-ap: do not open code locks for VFIO_GROUP_NOTIFY_SET_KVM notification

The fix to resolve a lockdep splat while handling the
VFIO_GROUP_NOTIFY_SET_KVM event introduced a kvm_busy flag indicating that
the vfio_ap device driver is busy setting or unsetting the KVM pointer.
A wait queue was employed to allow functions requiring access to the KVM
pointer to wait for the kvm_busy flag to be cleared. For the duration of
the wait period, the mdev lock was unlocked then acquired again after the
kvm_busy flag was cleared.

It was pointed out during an unrelated patch review that locks should not
be open coded - i.e., writing the algorithm of a standard lock in a
function instead of using a lock from the standard library. The setting and
testing of the kvm_busy flag and sleeping on a wait_event is the same thing
a lock does. Whatever potential deadlock was found and reported via the
lockdep splat was not magically removed by going to a wait_queue; it just
removed the lockdep annotations that would identify the issue early.

To remedy the problem introduced with the open coded locks, this patch
introduces the following changes:

1. Removes the the kvm_busy flag and wait queue. These were introduced to
prevent functions from accessing the KVM pointer while it was being
set because the matrix_dev->lock mutex had to be given up while
updating the guest's AP configuration in order to resolve the lockdep
splat. Since the functions that set the KVM pointer as well as those
that need access to it do so while holding the matrix_dev->lock mutex,
it is not necessary to wait for the KVM pointer to be set.

2. Introduces an rwsem to protect the hook (i.e., function pointer) to the
handler that processes interception of the PQAP instruction. A read
lock will be taken when the PQAP instruction is intercepted, before
calling the handler. A write lock will be taken whenever the KVM
pointer is set since the functions that set the KVM pointer also set
the hook.

3. Removes the lock of the matrix_dev->lock mutex from the function that
handles interception of the PQAP instruction. Since the functions that
set the KVM pointer and the PQAP interception handler hook as well as
the function that calls the hook lock the rwsem, it is not necessary
to lock the matrix_dev->lock mutex in the handler.

Fixes: 0cc00c8d4050 ("s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks")
Cc: [email protected]
Signed-off-by: Tony Krowiak <[email protected]>
Reported-by: Jason Gunthorpe <[email protected]>
---
arch/s390/include/asm/kvm_host.h | 8 +-
arch/s390/kvm/kvm-s390.c | 1 +
arch/s390/kvm/priv.c | 10 +-
drivers/s390/crypto/vfio_ap_ops.c | 129 +++++++++++---------------
drivers/s390/crypto/vfio_ap_private.h | 4 +-
5 files changed, 67 insertions(+), 85 deletions(-)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 9b4473f76e56..f18849d259e6 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -798,14 +798,12 @@ struct kvm_s390_cpu_model {
unsigned short ibc;
};

-struct kvm_s390_module_hook {
- int (*hook)(struct kvm_vcpu *vcpu);
- struct module *owner;
-};
+typedef int (*crypto_hook)(struct kvm_vcpu *vcpu);

struct kvm_s390_crypto {
struct kvm_s390_crypto_cb *crycb;
- struct kvm_s390_module_hook *pqap_hook;
+ struct rw_semaphore pqap_hook_rwsem;
+ crypto_hook *pqap_hook;
__u32 crycbd;
__u8 aes_kw;
__u8 dea_kw;
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index b655a7d82bf0..339534a0c5a5 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -2641,6 +2641,7 @@ static void kvm_s390_crypto_init(struct kvm *kvm)
sizeof(kvm->arch.crypto.crycb->aes_wrapping_key_mask));
get_random_bytes(kvm->arch.crypto.crycb->dea_wrapping_key_mask,
sizeof(kvm->arch.crypto.crycb->dea_wrapping_key_mask));
+ init_rwsem(&kvm->arch.crypto.pqap_hook_rwsem);
}

static void sca_dispose(struct kvm *kvm)
diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
index 9928f785c677..ec16c2facf7c 100644
--- a/arch/s390/kvm/priv.c
+++ b/arch/s390/kvm/priv.c
@@ -610,6 +610,7 @@ static int handle_io_inst(struct kvm_vcpu *vcpu)
static int handle_pqap(struct kvm_vcpu *vcpu)
{
struct ap_queue_status status = {};
+ crypto_hook handle_pqap;
unsigned long reg0;
int ret;
uint8_t fc;
@@ -657,15 +658,16 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
* Verify that the hook callback is registered, lock the owner
* and call the hook.
*/
+ down_read(&vcpu->kvm->arch.crypto.pqap_hook_rwsem);
if (vcpu->kvm->arch.crypto.pqap_hook) {
- if (!try_module_get(vcpu->kvm->arch.crypto.pqap_hook->owner))
- return -EOPNOTSUPP;
- ret = vcpu->kvm->arch.crypto.pqap_hook->hook(vcpu);
- module_put(vcpu->kvm->arch.crypto.pqap_hook->owner);
+ handle_pqap = *vcpu->kvm->arch.crypto.pqap_hook;
+ ret = handle_pqap(vcpu);
if (!ret && vcpu->run->s.regs.gprs[1] & 0x00ff0000)
kvm_s390_set_psw_cc(vcpu, 3);
+ up_read(&vcpu->kvm->arch.crypto.pqap_hook_rwsem);
return ret;
}
+ up_read(&vcpu->kvm->arch.crypto.pqap_hook_rwsem);
/*
* A vfio_driver must register a hook.
* No hook means no driver to enable the SIE CRYCB and no queues.
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 122c85c22469..d3447f6d83f1 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -270,6 +270,9 @@ static struct ap_queue_status vfio_ap_irq_enable(struct vfio_ap_queue *q,
* We take the matrix_dev lock to ensure serialization on queues and
* mediated device access.
*
+ * Note: This function must be called with a read lock held on
+ * vcpu->kvm->arch.crypto.pqap_hook_rwsem.
+ *
* Return 0 if we could handle the request inside KVM.
* otherwise, returns -EOPNOTSUPP to let QEMU handle the fault.
*/
@@ -287,22 +290,12 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
return -EOPNOTSUPP;

apqn = vcpu->run->s.regs.gprs[0] & 0xffff;
- mutex_lock(&matrix_dev->lock);

if (!vcpu->kvm->arch.crypto.pqap_hook)
goto out_unlock;
matrix_mdev = container_of(vcpu->kvm->arch.crypto.pqap_hook,
struct ap_matrix_mdev, pqap_hook);

- /*
- * If the KVM pointer is in the process of being set, wait until the
- * process has completed.
- */
- wait_event_cmd(matrix_mdev->wait_for_kvm,
- !matrix_mdev->kvm_busy,
- mutex_unlock(&matrix_dev->lock),
- mutex_lock(&matrix_dev->lock));
-
/* If the there is no guest using the mdev, there is nothing to do */
if (!matrix_mdev->kvm)
goto out_unlock;
@@ -323,7 +316,6 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
out_unlock:
memcpy(&vcpu->run->s.regs.gprs[1], &qstatus, sizeof(qstatus));
vcpu->run->s.regs.gprs[1] >>= 32;
- mutex_unlock(&matrix_dev->lock);
return 0;
}

@@ -350,10 +342,8 @@ static int vfio_ap_mdev_create(struct mdev_device *mdev)

matrix_mdev->mdev = mdev;
vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
- init_waitqueue_head(&matrix_mdev->wait_for_kvm);
mdev_set_drvdata(mdev, matrix_mdev);
- matrix_mdev->pqap_hook.hook = handle_pqap;
- matrix_mdev->pqap_hook.owner = THIS_MODULE;
+ matrix_mdev->pqap_hook = handle_pqap;
mutex_lock(&matrix_dev->lock);
list_add(&matrix_mdev->node, &matrix_dev->mdev_list);
mutex_unlock(&matrix_dev->lock);
@@ -624,7 +614,7 @@ static ssize_t assign_adapter_store(struct device *dev,
* If the KVM pointer is in flux or the guest is running, disallow
* un-assignment of adapter
*/
- if (matrix_mdev->kvm_busy || matrix_mdev->kvm) {
+ if (matrix_mdev->kvm) {
ret = -EBUSY;
goto done;
}
@@ -697,7 +687,7 @@ static ssize_t unassign_adapter_store(struct device *dev,
* If the KVM pointer is in flux or the guest is running, disallow
* un-assignment of adapter
*/
- if (matrix_mdev->kvm_busy || matrix_mdev->kvm) {
+ if (matrix_mdev->kvm) {
ret = -EBUSY;
goto done;
}
@@ -787,7 +777,7 @@ static ssize_t assign_domain_store(struct device *dev,
* If the KVM pointer is in flux or the guest is running, disallow
* assignment of domain
*/
- if (matrix_mdev->kvm_busy || matrix_mdev->kvm) {
+ if (matrix_mdev->kvm) {
ret = -EBUSY;
goto done;
}
@@ -855,7 +845,7 @@ static ssize_t unassign_domain_store(struct device *dev,
* If the KVM pointer is in flux or the guest is running, disallow
* un-assignment of domain
*/
- if (matrix_mdev->kvm_busy || matrix_mdev->kvm) {
+ if (matrix_mdev->kvm) {
ret = -EBUSY;
goto done;
}
@@ -909,7 +899,7 @@ static ssize_t assign_control_domain_store(struct device *dev,
* If the KVM pointer is in flux or the guest is running, disallow
* assignment of control domain.
*/
- if (matrix_mdev->kvm_busy || matrix_mdev->kvm) {
+ if (matrix_mdev->kvm) {
ret = -EBUSY;
goto done;
}
@@ -968,7 +958,7 @@ static ssize_t unassign_control_domain_store(struct device *dev,
* If the KVM pointer is in flux or the guest is running, disallow
* un-assignment of control domain.
*/
- if (matrix_mdev->kvm_busy || matrix_mdev->kvm) {
+ if (matrix_mdev->kvm) {
ret = -EBUSY;
goto done;
}
@@ -1108,26 +1098,31 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
{
struct ap_matrix_mdev *m;

- if (kvm->arch.crypto.crycbd) {
- list_for_each_entry(m, &matrix_dev->mdev_list, node) {
- if (m != matrix_mdev && m->kvm == kvm)
- return -EPERM;
- }
+ if (!kvm->arch.crypto.crycbd)
+ return 0;

- kvm_get_kvm(kvm);
- matrix_mdev->kvm_busy = true;
- mutex_unlock(&matrix_dev->lock);
- kvm_arch_crypto_set_masks(kvm,
- matrix_mdev->matrix.apm,
- matrix_mdev->matrix.aqm,
- matrix_mdev->matrix.adm);
- mutex_lock(&matrix_dev->lock);
- kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
- matrix_mdev->kvm = kvm;
- matrix_mdev->kvm_busy = false;
- wake_up_all(&matrix_mdev->wait_for_kvm);
+ down_write(&kvm->arch.crypto.pqap_hook_rwsem);
+ mutex_lock(&matrix_dev->lock);
+
+ list_for_each_entry(m, &matrix_dev->mdev_list, node) {
+ if (m != matrix_mdev && m->kvm == kvm) {
+ up_read(&kvm->arch.crypto.pqap_hook_rwsem);
+ mutex_unlock(&matrix_dev->lock);
+ return -EPERM;
+ }
}

+ kvm_get_kvm(kvm);
+ matrix_mdev->kvm = kvm;
+ kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
+ mutex_unlock(&matrix_dev->lock);
+
+ kvm_arch_crypto_set_masks(kvm,
+ matrix_mdev->matrix.apm,
+ matrix_mdev->matrix.aqm,
+ matrix_mdev->matrix.adm);
+ up_write(&kvm->arch.crypto.pqap_hook_rwsem);
+
return 0;
}

@@ -1164,6 +1159,7 @@ static int vfio_ap_mdev_iommu_notifier(struct notifier_block *nb,
* vfio_ap_mdev_unset_kvm
*
* @matrix_mdev: a matrix mediated device
+ * @kvm: the KVM guest state object
*
* Performs clean-up of resources no longer needed by @matrix_mdev.
*
@@ -1175,29 +1171,30 @@ static int vfio_ap_mdev_iommu_notifier(struct notifier_block *nb,
* done under the @matrix_mdev->lock.
*
*/
-static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev)
+static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev,
+ struct kvm *kvm)
{
- /*
- * If the KVM pointer is in the process of being set, wait until the
- * process has completed.
- */
- wait_event_cmd(matrix_mdev->wait_for_kvm,
- !matrix_mdev->kvm_busy,
- mutex_unlock(&matrix_dev->lock),
- mutex_lock(&matrix_dev->lock));
+ if (!kvm)
+ return;

- if (matrix_mdev->kvm) {
- matrix_mdev->kvm_busy = true;
+ down_write(&kvm->arch.crypto.pqap_hook_rwsem);
+ mutex_lock(&matrix_dev->lock);
+ if ((!matrix_mdev->kvm) || (!matrix_mdev->kvm->arch.crypto.crycbd)) {
+ up_write(&kvm->arch.crypto.pqap_hook_rwsem);
mutex_unlock(&matrix_dev->lock);
- kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
- mutex_lock(&matrix_dev->lock);
- vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
- matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
- kvm_put_kvm(matrix_mdev->kvm);
- matrix_mdev->kvm = NULL;
- matrix_mdev->kvm_busy = false;
- wake_up_all(&matrix_mdev->wait_for_kvm);
+ return;
}
+ mutex_unlock(&matrix_dev->lock);
+
+ kvm_arch_crypto_clear_masks(kvm);
+
+ mutex_lock(&matrix_dev->lock);
+ vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
+ kvm_put_kvm(kvm);
+ kvm->arch.crypto.pqap_hook = NULL;
+ matrix_mdev->kvm = NULL;
+ mutex_unlock(&matrix_dev->lock);
+ up_write(&kvm->arch.crypto.pqap_hook_rwsem);
}

static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
@@ -1209,16 +1206,13 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
if (action != VFIO_GROUP_NOTIFY_SET_KVM)
return NOTIFY_OK;

- mutex_lock(&matrix_dev->lock);
matrix_mdev = container_of(nb, struct ap_matrix_mdev, group_notifier);

if (!data)
- vfio_ap_mdev_unset_kvm(matrix_mdev);
+ vfio_ap_mdev_unset_kvm(matrix_mdev, matrix_mdev->kvm);
else if (vfio_ap_mdev_set_kvm(matrix_mdev, data))
notify_rc = NOTIFY_DONE;

- mutex_unlock(&matrix_dev->lock);
-
return notify_rc;
}

@@ -1352,14 +1346,12 @@ static void vfio_ap_mdev_release(struct mdev_device *mdev)
{
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);

- mutex_lock(&matrix_dev->lock);
- vfio_ap_mdev_unset_kvm(matrix_mdev);
- mutex_unlock(&matrix_dev->lock);
-
vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
&matrix_mdev->iommu_notifier);
vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
&matrix_mdev->group_notifier);
+
+ vfio_ap_mdev_unset_kvm(matrix_mdev, matrix_mdev->kvm);
module_put(THIS_MODULE);
}

@@ -1401,15 +1393,6 @@ static ssize_t vfio_ap_mdev_ioctl(struct mdev_device *mdev,
break;
}

- /*
- * If the KVM pointer is in the process of being set, wait until
- * the process has completed.
- */
- wait_event_cmd(matrix_mdev->wait_for_kvm,
- !matrix_mdev->kvm_busy,
- mutex_unlock(&matrix_dev->lock),
- mutex_lock(&matrix_dev->lock));
-
ret = vfio_ap_mdev_reset_queues(mdev);
break;
default:
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index f82a6396acae..22d2e0ca3ae5 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -83,10 +83,8 @@ struct ap_matrix_mdev {
struct ap_matrix matrix;
struct notifier_block group_notifier;
struct notifier_block iommu_notifier;
- bool kvm_busy;
- wait_queue_head_t wait_for_kvm;
struct kvm *kvm;
- struct kvm_s390_module_hook pqap_hook;
+ crypto_hook pqap_hook;
struct mdev_device *mdev;
};

--
2.30.2


2021-07-12 13:43:39

by Anthony Krowiak

[permalink] [raw]
Subject: Re: [PATCH] s390/vfio-ap: do not open code locks for VFIO_GROUP_NOTIFY_SET_KVM notification

Ping!

On 7/7/21 11:41 AM, Tony Krowiak wrote:
> The fix to resolve a lockdep splat while handling the
> VFIO_GROUP_NOTIFY_SET_KVM event introduced a kvm_busy flag indicating that
> the vfio_ap device driver is busy setting or unsetting the KVM pointer.
> A wait queue was employed to allow functions requiring access to the KVM
> pointer to wait for the kvm_busy flag to be cleared. For the duration of
> the wait period, the mdev lock was unlocked then acquired again after the
> kvm_busy flag was cleared.
>
> It was pointed out during an unrelated patch review that locks should not
> be open coded - i.e., writing the algorithm of a standard lock in a
> function instead of using a lock from the standard library. The setting and
> testing of the kvm_busy flag and sleeping on a wait_event is the same thing
> a lock does. Whatever potential deadlock was found and reported via the
> lockdep splat was not magically removed by going to a wait_queue; it just
> removed the lockdep annotations that would identify the issue early.
>
> To remedy the problem introduced with the open coded locks, this patch
> introduces the following changes:
>
> 1. Removes the the kvm_busy flag and wait queue. These were introduced to
> prevent functions from accessing the KVM pointer while it was being
> set because the matrix_dev->lock mutex had to be given up while
> updating the guest's AP configuration in order to resolve the lockdep
> splat. Since the functions that set the KVM pointer as well as those
> that need access to it do so while holding the matrix_dev->lock mutex,
> it is not necessary to wait for the KVM pointer to be set.
>
> 2. Introduces an rwsem to protect the hook (i.e., function pointer) to the
> handler that processes interception of the PQAP instruction. A read
> lock will be taken when the PQAP instruction is intercepted, before
> calling the handler. A write lock will be taken whenever the KVM
> pointer is set since the functions that set the KVM pointer also set
> the hook.
>
> 3. Removes the lock of the matrix_dev->lock mutex from the function that
> handles interception of the PQAP instruction. Since the functions that
> set the KVM pointer and the PQAP interception handler hook as well as
> the function that calls the hook lock the rwsem, it is not necessary
> to lock the matrix_dev->lock mutex in the handler.
>
> Fixes: 0cc00c8d4050 ("s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks")
> Cc: [email protected]
> Signed-off-by: Tony Krowiak <[email protected]>
> Reported-by: Jason Gunthorpe <[email protected]>
> ---
> arch/s390/include/asm/kvm_host.h | 8 +-
> arch/s390/kvm/kvm-s390.c | 1 +
> arch/s390/kvm/priv.c | 10 +-
> drivers/s390/crypto/vfio_ap_ops.c | 129 +++++++++++---------------
> drivers/s390/crypto/vfio_ap_private.h | 4 +-
> 5 files changed, 67 insertions(+), 85 deletions(-)
>
> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
> index 9b4473f76e56..f18849d259e6 100644
> --- a/arch/s390/include/asm/kvm_host.h
> +++ b/arch/s390/include/asm/kvm_host.h
> @@ -798,14 +798,12 @@ struct kvm_s390_cpu_model {
> unsigned short ibc;
> };
>
> -struct kvm_s390_module_hook {
> - int (*hook)(struct kvm_vcpu *vcpu);
> - struct module *owner;
> -};
> +typedef int (*crypto_hook)(struct kvm_vcpu *vcpu);
>
> struct kvm_s390_crypto {
> struct kvm_s390_crypto_cb *crycb;
> - struct kvm_s390_module_hook *pqap_hook;
> + struct rw_semaphore pqap_hook_rwsem;
> + crypto_hook *pqap_hook;
> __u32 crycbd;
> __u8 aes_kw;
> __u8 dea_kw;
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index b655a7d82bf0..339534a0c5a5 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -2641,6 +2641,7 @@ static void kvm_s390_crypto_init(struct kvm *kvm)
> sizeof(kvm->arch.crypto.crycb->aes_wrapping_key_mask));
> get_random_bytes(kvm->arch.crypto.crycb->dea_wrapping_key_mask,
> sizeof(kvm->arch.crypto.crycb->dea_wrapping_key_mask));
> + init_rwsem(&kvm->arch.crypto.pqap_hook_rwsem);
> }
>
> static void sca_dispose(struct kvm *kvm)
> diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
> index 9928f785c677..ec16c2facf7c 100644
> --- a/arch/s390/kvm/priv.c
> +++ b/arch/s390/kvm/priv.c
> @@ -610,6 +610,7 @@ static int handle_io_inst(struct kvm_vcpu *vcpu)
> static int handle_pqap(struct kvm_vcpu *vcpu)
> {
> struct ap_queue_status status = {};
> + crypto_hook handle_pqap;
> unsigned long reg0;
> int ret;
> uint8_t fc;
> @@ -657,15 +658,16 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
> * Verify that the hook callback is registered, lock the owner
> * and call the hook.
> */
> + down_read(&vcpu->kvm->arch.crypto.pqap_hook_rwsem);
> if (vcpu->kvm->arch.crypto.pqap_hook) {
> - if (!try_module_get(vcpu->kvm->arch.crypto.pqap_hook->owner))
> - return -EOPNOTSUPP;
> - ret = vcpu->kvm->arch.crypto.pqap_hook->hook(vcpu);
> - module_put(vcpu->kvm->arch.crypto.pqap_hook->owner);
> + handle_pqap = *vcpu->kvm->arch.crypto.pqap_hook;
> + ret = handle_pqap(vcpu);
> if (!ret && vcpu->run->s.regs.gprs[1] & 0x00ff0000)
> kvm_s390_set_psw_cc(vcpu, 3);
> + up_read(&vcpu->kvm->arch.crypto.pqap_hook_rwsem);
> return ret;
> }
> + up_read(&vcpu->kvm->arch.crypto.pqap_hook_rwsem);
> /*
> * A vfio_driver must register a hook.
> * No hook means no driver to enable the SIE CRYCB and no queues.
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 122c85c22469..d3447f6d83f1 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -270,6 +270,9 @@ static struct ap_queue_status vfio_ap_irq_enable(struct vfio_ap_queue *q,
> * We take the matrix_dev lock to ensure serialization on queues and
> * mediated device access.
> *
> + * Note: This function must be called with a read lock held on
> + * vcpu->kvm->arch.crypto.pqap_hook_rwsem.
> + *
> * Return 0 if we could handle the request inside KVM.
> * otherwise, returns -EOPNOTSUPP to let QEMU handle the fault.
> */
> @@ -287,22 +290,12 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
> return -EOPNOTSUPP;
>
> apqn = vcpu->run->s.regs.gprs[0] & 0xffff;
> - mutex_lock(&matrix_dev->lock);
>
> if (!vcpu->kvm->arch.crypto.pqap_hook)
> goto out_unlock;
> matrix_mdev = container_of(vcpu->kvm->arch.crypto.pqap_hook,
> struct ap_matrix_mdev, pqap_hook);
>
> - /*
> - * If the KVM pointer is in the process of being set, wait until the
> - * process has completed.
> - */
> - wait_event_cmd(matrix_mdev->wait_for_kvm,
> - !matrix_mdev->kvm_busy,
> - mutex_unlock(&matrix_dev->lock),
> - mutex_lock(&matrix_dev->lock));
> -
> /* If the there is no guest using the mdev, there is nothing to do */
> if (!matrix_mdev->kvm)
> goto out_unlock;
> @@ -323,7 +316,6 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
> out_unlock:
> memcpy(&vcpu->run->s.regs.gprs[1], &qstatus, sizeof(qstatus));
> vcpu->run->s.regs.gprs[1] >>= 32;
> - mutex_unlock(&matrix_dev->lock);
> return 0;
> }
>
> @@ -350,10 +342,8 @@ static int vfio_ap_mdev_create(struct mdev_device *mdev)
>
> matrix_mdev->mdev = mdev;
> vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
> - init_waitqueue_head(&matrix_mdev->wait_for_kvm);
> mdev_set_drvdata(mdev, matrix_mdev);
> - matrix_mdev->pqap_hook.hook = handle_pqap;
> - matrix_mdev->pqap_hook.owner = THIS_MODULE;
> + matrix_mdev->pqap_hook = handle_pqap;
> mutex_lock(&matrix_dev->lock);
> list_add(&matrix_mdev->node, &matrix_dev->mdev_list);
> mutex_unlock(&matrix_dev->lock);
> @@ -624,7 +614,7 @@ static ssize_t assign_adapter_store(struct device *dev,
> * If the KVM pointer is in flux or the guest is running, disallow
> * un-assignment of adapter
> */
> - if (matrix_mdev->kvm_busy || matrix_mdev->kvm) {
> + if (matrix_mdev->kvm) {
> ret = -EBUSY;
> goto done;
> }
> @@ -697,7 +687,7 @@ static ssize_t unassign_adapter_store(struct device *dev,
> * If the KVM pointer is in flux or the guest is running, disallow
> * un-assignment of adapter
> */
> - if (matrix_mdev->kvm_busy || matrix_mdev->kvm) {
> + if (matrix_mdev->kvm) {
> ret = -EBUSY;
> goto done;
> }
> @@ -787,7 +777,7 @@ static ssize_t assign_domain_store(struct device *dev,
> * If the KVM pointer is in flux or the guest is running, disallow
> * assignment of domain
> */
> - if (matrix_mdev->kvm_busy || matrix_mdev->kvm) {
> + if (matrix_mdev->kvm) {
> ret = -EBUSY;
> goto done;
> }
> @@ -855,7 +845,7 @@ static ssize_t unassign_domain_store(struct device *dev,
> * If the KVM pointer is in flux or the guest is running, disallow
> * un-assignment of domain
> */
> - if (matrix_mdev->kvm_busy || matrix_mdev->kvm) {
> + if (matrix_mdev->kvm) {
> ret = -EBUSY;
> goto done;
> }
> @@ -909,7 +899,7 @@ static ssize_t assign_control_domain_store(struct device *dev,
> * If the KVM pointer is in flux or the guest is running, disallow
> * assignment of control domain.
> */
> - if (matrix_mdev->kvm_busy || matrix_mdev->kvm) {
> + if (matrix_mdev->kvm) {
> ret = -EBUSY;
> goto done;
> }
> @@ -968,7 +958,7 @@ static ssize_t unassign_control_domain_store(struct device *dev,
> * If the KVM pointer is in flux or the guest is running, disallow
> * un-assignment of control domain.
> */
> - if (matrix_mdev->kvm_busy || matrix_mdev->kvm) {
> + if (matrix_mdev->kvm) {
> ret = -EBUSY;
> goto done;
> }
> @@ -1108,26 +1098,31 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
> {
> struct ap_matrix_mdev *m;
>
> - if (kvm->arch.crypto.crycbd) {
> - list_for_each_entry(m, &matrix_dev->mdev_list, node) {
> - if (m != matrix_mdev && m->kvm == kvm)
> - return -EPERM;
> - }
> + if (!kvm->arch.crypto.crycbd)
> + return 0;
>
> - kvm_get_kvm(kvm);
> - matrix_mdev->kvm_busy = true;
> - mutex_unlock(&matrix_dev->lock);
> - kvm_arch_crypto_set_masks(kvm,
> - matrix_mdev->matrix.apm,
> - matrix_mdev->matrix.aqm,
> - matrix_mdev->matrix.adm);
> - mutex_lock(&matrix_dev->lock);
> - kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
> - matrix_mdev->kvm = kvm;
> - matrix_mdev->kvm_busy = false;
> - wake_up_all(&matrix_mdev->wait_for_kvm);
> + down_write(&kvm->arch.crypto.pqap_hook_rwsem);
> + mutex_lock(&matrix_dev->lock);
> +
> + list_for_each_entry(m, &matrix_dev->mdev_list, node) {
> + if (m != matrix_mdev && m->kvm == kvm) {
> + up_read(&kvm->arch.crypto.pqap_hook_rwsem);
> + mutex_unlock(&matrix_dev->lock);
> + return -EPERM;
> + }
> }
>
> + kvm_get_kvm(kvm);
> + matrix_mdev->kvm = kvm;
> + kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
> + mutex_unlock(&matrix_dev->lock);
> +
> + kvm_arch_crypto_set_masks(kvm,
> + matrix_mdev->matrix.apm,
> + matrix_mdev->matrix.aqm,
> + matrix_mdev->matrix.adm);
> + up_write(&kvm->arch.crypto.pqap_hook_rwsem);
> +
> return 0;
> }
>
> @@ -1164,6 +1159,7 @@ static int vfio_ap_mdev_iommu_notifier(struct notifier_block *nb,
> * vfio_ap_mdev_unset_kvm
> *
> * @matrix_mdev: a matrix mediated device
> + * @kvm: the KVM guest state object
> *
> * Performs clean-up of resources no longer needed by @matrix_mdev.
> *
> @@ -1175,29 +1171,30 @@ static int vfio_ap_mdev_iommu_notifier(struct notifier_block *nb,
> * done under the @matrix_mdev->lock.
> *
> */
> -static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev)
> +static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev,
> + struct kvm *kvm)
> {
> - /*
> - * If the KVM pointer is in the process of being set, wait until the
> - * process has completed.
> - */
> - wait_event_cmd(matrix_mdev->wait_for_kvm,
> - !matrix_mdev->kvm_busy,
> - mutex_unlock(&matrix_dev->lock),
> - mutex_lock(&matrix_dev->lock));
> + if (!kvm)
> + return;
>
> - if (matrix_mdev->kvm) {
> - matrix_mdev->kvm_busy = true;
> + down_write(&kvm->arch.crypto.pqap_hook_rwsem);
> + mutex_lock(&matrix_dev->lock);
> + if ((!matrix_mdev->kvm) || (!matrix_mdev->kvm->arch.crypto.crycbd)) {
> + up_write(&kvm->arch.crypto.pqap_hook_rwsem);
> mutex_unlock(&matrix_dev->lock);
> - kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
> - mutex_lock(&matrix_dev->lock);
> - vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
> - matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
> - kvm_put_kvm(matrix_mdev->kvm);
> - matrix_mdev->kvm = NULL;
> - matrix_mdev->kvm_busy = false;
> - wake_up_all(&matrix_mdev->wait_for_kvm);
> + return;
> }
> + mutex_unlock(&matrix_dev->lock);
> +
> + kvm_arch_crypto_clear_masks(kvm);
> +
> + mutex_lock(&matrix_dev->lock);
> + vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
> + kvm_put_kvm(kvm);
> + kvm->arch.crypto.pqap_hook = NULL;
> + matrix_mdev->kvm = NULL;
> + mutex_unlock(&matrix_dev->lock);
> + up_write(&kvm->arch.crypto.pqap_hook_rwsem);
> }
>
> static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
> @@ -1209,16 +1206,13 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
> if (action != VFIO_GROUP_NOTIFY_SET_KVM)
> return NOTIFY_OK;
>
> - mutex_lock(&matrix_dev->lock);
> matrix_mdev = container_of(nb, struct ap_matrix_mdev, group_notifier);
>
> if (!data)
> - vfio_ap_mdev_unset_kvm(matrix_mdev);
> + vfio_ap_mdev_unset_kvm(matrix_mdev, matrix_mdev->kvm);
> else if (vfio_ap_mdev_set_kvm(matrix_mdev, data))
> notify_rc = NOTIFY_DONE;
>
> - mutex_unlock(&matrix_dev->lock);
> -
> return notify_rc;
> }
>
> @@ -1352,14 +1346,12 @@ static void vfio_ap_mdev_release(struct mdev_device *mdev)
> {
> struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>
> - mutex_lock(&matrix_dev->lock);
> - vfio_ap_mdev_unset_kvm(matrix_mdev);
> - mutex_unlock(&matrix_dev->lock);
> -
> vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
> &matrix_mdev->iommu_notifier);
> vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
> &matrix_mdev->group_notifier);
> +
> + vfio_ap_mdev_unset_kvm(matrix_mdev, matrix_mdev->kvm);
> module_put(THIS_MODULE);
> }
>
> @@ -1401,15 +1393,6 @@ static ssize_t vfio_ap_mdev_ioctl(struct mdev_device *mdev,
> break;
> }
>
> - /*
> - * If the KVM pointer is in the process of being set, wait until
> - * the process has completed.
> - */
> - wait_event_cmd(matrix_mdev->wait_for_kvm,
> - !matrix_mdev->kvm_busy,
> - mutex_unlock(&matrix_dev->lock),
> - mutex_lock(&matrix_dev->lock));
> -
> ret = vfio_ap_mdev_reset_queues(mdev);
> break;
> default:
> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
> index f82a6396acae..22d2e0ca3ae5 100644
> --- a/drivers/s390/crypto/vfio_ap_private.h
> +++ b/drivers/s390/crypto/vfio_ap_private.h
> @@ -83,10 +83,8 @@ struct ap_matrix_mdev {
> struct ap_matrix matrix;
> struct notifier_block group_notifier;
> struct notifier_block iommu_notifier;
> - bool kvm_busy;
> - wait_queue_head_t wait_for_kvm;
> struct kvm *kvm;
> - struct kvm_s390_module_hook pqap_hook;
> + crypto_hook pqap_hook;
> struct mdev_device *mdev;
> };
>

2021-07-12 23:39:52

by Halil Pasic

[permalink] [raw]
Subject: Re: [PATCH] s390/vfio-ap: do not open code locks for VFIO_GROUP_NOTIFY_SET_KVM notification

On Wed, 7 Jul 2021 11:41:56 -0400
Tony Krowiak <[email protected]> wrote:

> It was pointed out during an unrelated patch review that locks should not
> be open coded - i.e., writing the algorithm of a standard lock in a
> function instead of using a lock from the standard library. The setting and
> testing of the kvm_busy flag and sleeping on a wait_event is the same thing
> a lock does. Whatever potential deadlock was found and reported via the
> lockdep splat was not magically removed by going to a wait_queue; it just
> removed the lockdep annotations that would identify the issue early

Did you change your opinion since we last talked about it? This reads to
me like we are deadlocky without this patch, because of the last
sentence.

Regards,
Halil

2021-07-13 13:49:12

by Anthony Krowiak

[permalink] [raw]
Subject: Re: [PATCH] s390/vfio-ap: do not open code locks for VFIO_GROUP_NOTIFY_SET_KVM notification



On 7/12/21 7:38 PM, Halil Pasic wrote:
> On Wed, 7 Jul 2021 11:41:56 -0400
> Tony Krowiak <[email protected]> wrote:
>
>> It was pointed out during an unrelated patch review that locks should not
>> be open coded - i.e., writing the algorithm of a standard lock in a
>> function instead of using a lock from the standard library. The setting and
>> testing of the kvm_busy flag and sleeping on a wait_event is the same thing
>> a lock does. Whatever potential deadlock was found and reported via the
>> lockdep splat was not magically removed by going to a wait_queue; it just
>> removed the lockdep annotations that would identify the issue early
> Did you change your opinion since we last talked about it? This reads to
> me like we are deadlocky without this patch, because of the last
> sentence.

The words are a direct paraphrase of Jason G's responses to my
query regarding what he meant by open coding locks. I
am choosing to take his word on the subject and remove the
open coded locks.

Having said that, we do not have a deadlock problem without
this patch. If you recall, the lockdep splat occurred ONLY when
running a Secure Execution guest in a CI environment. Since
AP is not yet supported for SE guests, there is no danger of
a lockdep splat occurring in a customer environment. Given
Jason's objections to the original solution (i.e., kvm_busy flag
and wait queue), I decided to replace the so-called open
coded locks.

>
> Regards,
> Halil

2021-07-13 16:46:49

by Halil Pasic

[permalink] [raw]
Subject: Re: [PATCH] s390/vfio-ap: do not open code locks for VFIO_GROUP_NOTIFY_SET_KVM notification

On Tue, 13 Jul 2021 09:48:01 -0400
Tony Krowiak <[email protected]> wrote:

> On 7/12/21 7:38 PM, Halil Pasic wrote:
> > On Wed, 7 Jul 2021 11:41:56 -0400
> > Tony Krowiak <[email protected]> wrote:
> >
> >> It was pointed out during an unrelated patch review that locks should not
> >> be open coded - i.e., writing the algorithm of a standard lock in a
> >> function instead of using a lock from the standard library. The setting and
> >> testing of the kvm_busy flag and sleeping on a wait_event is the same thing
> >> a lock does. Whatever potential deadlock was found and reported via the
> >> lockdep splat was not magically removed by going to a wait_queue; it just
> >> removed the lockdep annotations that would identify the issue early
> > Did you change your opinion since we last talked about it? This reads to
> > me like we are deadlocky without this patch, because of the last
> > sentence.
>
> The words are a direct paraphrase of Jason G's responses to my
> query regarding what he meant by open coding locks. I
> am choosing to take his word on the subject and remove the
> open coded locks.
>
> Having said that, we do not have a deadlock problem without
> this patch. If you recall, the lockdep splat occurred ONLY when
> running a Secure Execution guest in a CI environment. Since
> AP is not yet supported for SE guests, there is no danger of
> a lockdep splat occurring in a customer environment. Given
> Jason's objections to the original solution (i.e., kvm_busy flag
> and wait queue), I decided to replace the so-called open
> coded locks.

I'm in favor of doing that. But if ("s390/vfio-ap: fix
circular lockdep when setting/clearing crypto masks") ain't buggy,
then this patch does not qualify for stable. For a complete set of
rules consult:
https://github.com/torvalds/linux/blob/master/Documentation/process/stable-kernel-rules.rst

Here the most relevant points:
* It must fix a real bug that bothers people (not a, "This could be a
problem..." type thing).
* t must fix a problem that causes a build error (but not for things
marked CONFIG_BROKEN), an oops, a hang, data corruption, a real security
issue, or some "oh, that's not good" issue. In short, something critical.
* No "theoretical race condition" issues, unless an explanation of how
the race can be exploited is also provided.

Jason may give it another try to convince us that 0cc00c8d4050 only
silenced lockdep, but vfio_ap remained prone to deadlocks. To my best
knowledge using condition variable and a mutex is one of the well known
ways to implement an rwlock.

In my opinion, you should drop the fixes tag, drop the cc stable, and
provide a patch description that corresponds to *your* understanding
of the situation.

Neither the Fixes tag or the stable process is (IMHO) meant for these
types of (style) issues. And if you don't think the alleged problem is
real, don't make the description of your patch say it is real.

Regards,
Halil

>
> >
> > Regards,
> > Halil
>

2021-07-13 17:07:16

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH] s390/vfio-ap: do not open code locks for VFIO_GROUP_NOTIFY_SET_KVM notification

On Tue, Jul 13, 2021 at 06:45:17PM +0200, Halil Pasic wrote:

> Jason may give it another try to convince us that 0cc00c8d4050 only
> silenced lockdep, but vfio_ap remained prone to deadlocks. To my best
> knowledge using condition variable and a mutex is one of the well known
> ways to implement an rwlock.

The well known pattern is to use a rwsem.

This:
wait_event_cmd(matrix_mdev->wait_for_kvm,
!matrix_mdev->kvm_busy,
mutex_unlock(&matrix_dev->lock),
mutex_lock(&matrix_dev->lock));


Is not really a rwsem, and is invsible to lockdep.

Jason

2021-07-13 18:48:12

by Anthony Krowiak

[permalink] [raw]
Subject: Re: [PATCH] s390/vfio-ap: do not open code locks for VFIO_GROUP_NOTIFY_SET_KVM notification



On 7/13/21 12:45 PM, Halil Pasic wrote:
> On Tue, 13 Jul 2021 09:48:01 -0400
> Tony Krowiak <[email protected]> wrote:
>
>> On 7/12/21 7:38 PM, Halil Pasic wrote:
>>> On Wed, 7 Jul 2021 11:41:56 -0400
>>> Tony Krowiak <[email protected]> wrote:
>>>
>>>> It was pointed out during an unrelated patch review that locks should not
>>>> be open coded - i.e., writing the algorithm of a standard lock in a
>>>> function instead of using a lock from the standard library. The setting and
>>>> testing of the kvm_busy flag and sleeping on a wait_event is the same thing
>>>> a lock does. Whatever potential deadlock was found and reported via the
>>>> lockdep splat was not magically removed by going to a wait_queue; it just
>>>> removed the lockdep annotations that would identify the issue early
>>> Did you change your opinion since we last talked about it? This reads to
>>> me like we are deadlocky without this patch, because of the last
>>> sentence.
>> The words are a direct paraphrase of Jason G's responses to my
>> query regarding what he meant by open coding locks. I
>> am choosing to take his word on the subject and remove the
>> open coded locks.
>>
>> Having said that, we do not have a deadlock problem without
>> this patch. If you recall, the lockdep splat occurred ONLY when
>> running a Secure Execution guest in a CI environment. Since
>> AP is not yet supported for SE guests, there is no danger of
>> a lockdep splat occurring in a customer environment. Given
>> Jason's objections to the original solution (i.e., kvm_busy flag
>> and wait queue), I decided to replace the so-called open
>> coded locks.
> I'm in favor of doing that. But if ("s390/vfio-ap: fix
> circular lockdep when setting/clearing crypto masks") ain't buggy,
> then this patch does not qualify for stable. For a complete set of
> rules consult:
> https://github.com/torvalds/linux/blob/master/Documentation/process/stable-kernel-rules.rst
>
> Here the most relevant points:
> * It must fix a real bug that bothers people (not a, "This could be a
> problem..." type thing).
> * t must fix a problem that causes a build error (but not for things
> marked CONFIG_BROKEN), an oops, a hang, data corruption, a real security
> issue, or some "oh, that's not good" issue. In short, something critical.
> * No "theoretical race condition" issues, unless an explanation of how
> the race can be exploited is also provided.
>
> Jason may give it another try to convince us that 0cc00c8d4050 only
> silenced lockdep, but vfio_ap remained prone to deadlocks. To my best
> knowledge using condition variable and a mutex is one of the well known
> ways to implement an rwlock.
>
> In my opinion, you should drop the fixes tag, drop the cc stable, and
> provide a patch description that corresponds to *your* understanding
> of the situation.

I'll drop the fixes and cc stable. Given the patch was created in
response to Jason G's comments - which are paraphrased in
the patch description - the patch description corresponds
directly to my understanding of the situation. It is precisely why
I created the patch.

>
> Neither the Fixes tag or the stable process is (IMHO) meant for these
> types of (style) issues. And if you don't think the alleged problem is
> real, don't make the description of your patch say it is real.
> Regards,
> Halil
>
>>> Regards,
>>> Halil

2021-07-13 19:06:11

by Anthony Krowiak

[permalink] [raw]
Subject: Re: [PATCH] s390/vfio-ap: do not open code locks for VFIO_GROUP_NOTIFY_SET_KVM notification



On 7/13/21 1:05 PM, Jason Gunthorpe wrote:
> On Tue, Jul 13, 2021 at 06:45:17PM +0200, Halil Pasic wrote:
>
>> Jason may give it another try to convince us that 0cc00c8d4050 only
>> silenced lockdep, but vfio_ap remained prone to deadlocks. To my best
>> knowledge using condition variable and a mutex is one of the well known
>> ways to implement an rwlock.
> The well known pattern is to use a rwsem.
>
> This:
> wait_event_cmd(matrix_mdev->wait_for_kvm,
> !matrix_mdev->kvm_busy,
> mutex_unlock(&matrix_dev->lock),
> mutex_lock(&matrix_dev->lock));
>
>
> Is not really a rwsem, and is invsible to lockdep.

The lockdep splat was due to holding the matrix_dev->lock
mutex while the kvm->lock was taken to plug the AP devices
into the guest. The same problem would occur whether an
rwsem or the mutex was used.

The lockdep splat was resolved by setting the
matrix_mdev->kvm_busy flag and unlocking the matrix_dev->lock
mutex while the AP devices were being plugged into the guest.
All other functions needing the matrix_dev->lock mutex would wait
on a queue until the matrix_mdev->kvm_busy flag is cleared before
locking the matrix_dev->lock mutex.

Now, I understand that this technique is invisible to lockdep,
but I don't see how we can ever end up in a deadlock with
this design since the matrix_dev->lock mutex will never get
locked as long as the matrix_mdev->kvm_busy flag is set.

>
> Jason

2021-07-13 19:22:33

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH] s390/vfio-ap: do not open code locks for VFIO_GROUP_NOTIFY_SET_KVM notification

On Tue, Jul 13, 2021 at 03:04:28PM -0400, Tony Krowiak wrote:
>
>
> On 7/13/21 1:05 PM, Jason Gunthorpe wrote:
> > On Tue, Jul 13, 2021 at 06:45:17PM +0200, Halil Pasic wrote:
> >
> > > Jason may give it another try to convince us that 0cc00c8d4050 only
> > > silenced lockdep, but vfio_ap remained prone to deadlocks. To my best
> > > knowledge using condition variable and a mutex is one of the well known
> > > ways to implement an rwlock.
> > The well known pattern is to use a rwsem.
> >
> > This:
> > wait_event_cmd(matrix_mdev->wait_for_kvm,
> > !matrix_mdev->kvm_busy,
> > mutex_unlock(&matrix_dev->lock),
> > mutex_lock(&matrix_dev->lock));
> >
> >
> > Is not really a rwsem, and is invsible to lockdep.
>
> The lockdep splat was due to holding the matrix_dev->lock
> mutex while the kvm->lock was taken to plug the AP devices
> into the guest. The same problem would occur whether an
> rwsem or the mutex was used.

See, everytime you say 'the same problem would occur with a rwsem' you
don't get to go on and say everthing is fine if we open code a rwsem
as kvm_busy

This patch shows it works as a rwsem - look at what had to be changed
to make it so and you will find some clue to where the problems are in
kvm_busy version.

In any event, I don't care anymore - please just get this merged, to
AlexW's tree so I don't have conflicts with the rest of the ap changes
for VFIO I've got queued up.

Jason

2021-07-14 13:27:47

by Anthony Krowiak

[permalink] [raw]
Subject: Re: [PATCH] s390/vfio-ap: do not open code locks for VFIO_GROUP_NOTIFY_SET_KVM notification


> This patch shows it works as a rwsem - look at what had to be changed
> to make it so and you will find some clue to where the problems are in
> kvm_busy version.
>
> In any event, I don't care anymore - please just get this merged, to
> AlexW's tree so I don't have conflicts with the rest of the ap changes
> for VFIO I've got queued up.

Christian, can you merge this with AlexW's tree? Halil suggested
the 'fixes' and 'cc stable' tags ought to be removed, do I need
to post another version or can you take those out when merging?

>
> Jason

2021-07-14 18:00:09

by Christian Borntraeger

[permalink] [raw]
Subject: Re: [PATCH] s390/vfio-ap: do not open code locks for VFIO_GROUP_NOTIFY_SET_KVM notification



On 14.07.21 15:25, Tony Krowiak wrote:
>
>> This patch shows it works as a rwsem - look at what had to be changed
>> to make it so and you will find some clue to where the problems are in
>> kvm_busy version.
>>
>> In any event, I don't care anymore - please just get this merged, to
>> AlexW's tree so I don't have conflicts with the rest of the ap changes
>> for VFIO I've got queued up.
>
> Christian, can you merge this with AlexW's tree? Halil suggested
> the 'fixes' and 'cc stable' tags ought to be removed, do I need
> to post another version or can you take those out when merging?

Normally this would go via the KVM/s390 or s390 tree (as Alex did
not want to handle s390 vfio devices).

Alex, as Jason is waiting for this to be in your tree could you take
this patch via your tree ?(please remove cc stable and Fixes for now
I want this to settle a bit).

To carry this patch in your tree with my kvm/s390 and s390 maintainer hat
Acked-by: Christian Borntraeger <[email protected]>


2021-07-14 18:45:22

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH] s390/vfio-ap: do not open code locks for VFIO_GROUP_NOTIFY_SET_KVM notification

On Wed, Jul 14, 2021 at 07:56:42PM +0200, Christian Borntraeger wrote:
> > > In any event, I don't care anymore - please just get this merged, to
> > > AlexW's tree so I don't have conflicts with the rest of the ap changes
> > > for VFIO I've got queued up.
> >
> > Christian, can you merge this with AlexW's tree? Halil suggested
> > the 'fixes' and 'cc stable' tags ought to be removed, do I need
> > to post another version or can you take those out when merging?
>
> Normally this would go via the KVM/s390 or s390 tree (as Alex did
> not want to handle s390 vfio devices).

I would like to move the vfio driver(s) out of driver/s390 and over to
vfio. This is the normal kernel process and after some discussion we
decided with Thomas Gleixner that this is the right thing to do for
things like the new Intel mdev.

But that can be another discussion, thanks

> Alex, as Jason is waiting for this to be in your tree could you take
> this patch via your tree ?(please remove cc stable and Fixes for now
> I want this to settle a bit).
>
> To carry this patch in your tree with my kvm/s390 and s390 maintainer hat
> Acked-by: Christian Borntraeger <[email protected]>

Great!

Jason

2021-07-14 19:04:22

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH] s390/vfio-ap: do not open code locks for VFIO_GROUP_NOTIFY_SET_KVM notification

On Wed, 14 Jul 2021 19:56:42 +0200
Christian Borntraeger <[email protected]> wrote:

> On 14.07.21 15:25, Tony Krowiak wrote:
> >
> >> This patch shows it works as a rwsem - look at what had to be changed
> >> to make it so and you will find some clue to where the problems are in
> >> kvm_busy version.
> >>
> >> In any event, I don't care anymore - please just get this merged, to
> >> AlexW's tree so I don't have conflicts with the rest of the ap changes
> >> for VFIO I've got queued up.
> >
> > Christian, can you merge this with AlexW's tree? Halil suggested
> > the 'fixes' and 'cc stable' tags ought to be removed, do I need
> > to post another version or can you take those out when merging?
>
> Normally this would go via the KVM/s390 or s390 tree (as Alex did
> not want to handle s390 vfio devices).
>
> Alex, as Jason is waiting for this to be in your tree could you take
> this patch via your tree ?(please remove cc stable and Fixes for now
> I want this to settle a bit).
>
> To carry this patch in your tree with my kvm/s390 and s390 maintainer hat
> Acked-by: Christian Borntraeger <[email protected]>

Is this intended for v5.14 or v5.15? I don't have a v5.15 next branch
yet, so if this were to get into v5.14 before I create that, it'd be
there for Jason as well. Posting a branch that we both merge into our
linux-next branches would be another option. I can take it either way,
just trying to understand what we're wanting to achieve. Thanks,

Alex

2021-07-15 12:25:08

by Halil Pasic

[permalink] [raw]
Subject: Re: [PATCH] s390/vfio-ap: do not open code locks for VFIO_GROUP_NOTIFY_SET_KVM notification

On Tue, 13 Jul 2021 14:05:33 -0300
Jason Gunthorpe <[email protected]> wrote:

> On Tue, Jul 13, 2021 at 06:45:17PM +0200, Halil Pasic wrote:
>
> > Jason may give it another try to convince us that 0cc00c8d4050 only
> > silenced lockdep, but vfio_ap remained prone to deadlocks. To my best
> > knowledge using condition variable and a mutex is one of the well known
> > ways to implement an rwlock.
>
> The well known pattern is to use a rwsem.

I think you are missing the point. We are discussing whether
this qualifies for stable, i.e. if 0cc00c8d4050 is really broken
like the patch description says.

Using a readers-writers lock (as a primitive) to implement a
a readers-writers lock is a fallacy, so I guess you wanted to
say that when a readers-writers lock is needed in the kernel the
obvious choices are rw_semaphore and/or rwlock_t (depending on the
spin).

What I wanted to say is using a condition variable and a mutex is
not per-see wrong, because one can even implement an readers-writers
lock with it. For reference see https://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock


>
> This:
> wait_event_cmd(matrix_mdev->wait_for_kvm,
> !matrix_mdev->kvm_busy,
> mutex_unlock(&matrix_dev->lock),
> mutex_lock(&matrix_dev->lock));
>
>
> Is not really a rwsem, and is invsible to lockdep.
>

I agree. But this is not a proof of a problem that qualifies to be fixed
using the stable process as documented in
https://github.com/torvalds/linux/blob/master/Documentation/process/stable-kernel-rules.rst

I'm in favor of rewriting this to use rw_semaphore. I'm not in favor
of proclaiming this a fix for stable, because for that you first have
to prove that you fix a real problem.

I hope we are on the same page.

Regards,
Halil

2021-07-15 14:19:19

by Halil Pasic

[permalink] [raw]
Subject: Re: [PATCH] s390/vfio-ap: do not open code locks for VFIO_GROUP_NOTIFY_SET_KVM notification

On Wed, 7 Jul 2021 11:41:56 -0400
Tony Krowiak <[email protected]> wrote:

First sorry for being this late with having a more serious look at the
code.


> @@ -270,6 +270,9 @@ static struct ap_queue_status vfio_ap_irq_enable(struct vfio_ap_queue *q,
> * We take the matrix_dev lock to ensure serialization on queues and
> * mediated device access.
> *
> + * Note: This function must be called with a read lock held on
> + * vcpu->kvm->arch.crypto.pqap_hook_rwsem.
> + *


That is a fine synchronization for the pqap_hook, but I don't think it
is sufficient for everything.


> * Return 0 if we could handle the request inside KVM.
> * otherwise, returns -EOPNOTSUPP to let QEMU handle the fault.
> */
> @@ -287,22 +290,12 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
> return -EOPNOTSUPP;
>
> apqn = vcpu->run->s.regs.gprs[0] & 0xffff;
> - mutex_lock(&matrix_dev->lock);

Here you drop a matrix_dev->lock critical section. And then
you do all the interesting stuff. E.g.
q = vfio_ap_get_queue(matrix_mdev, apqn);
and
vfio_ap_irq_enable(q, status & 0x07, vcpu->run->s.regs.gprs[2]);.
Since in vfio_ap_get_queue() we do the check if the queue belongs
to the given guest, and examine the matrix (apm, aqm) I suppose
that needs to be done holding a lock that protects the matrix,
and to my best knowledge this is still matrix_dev->lock. It would
probably make sense to convert matrix_dev->lock into an rw_semaphore,
or to introduce a some new rwlock which protects less state in the
future, but right now AFAICT it is still matrix_dev->lock.

So I don't think this patch should pass review.

Regards,
Halil

>
> if (!vcpu->kvm->arch.crypto.pqap_hook)
> goto out_unlock;
> matrix_mdev = container_of(vcpu->kvm->arch.crypto.pqap_hook,
> struct ap_matrix_mdev, pqap_hook);
>
> - /*
> - * If the KVM pointer is in the process of being set, wait until the
> - * process has completed.
> - */
> - wait_event_cmd(matrix_mdev->wait_for_kvm,
> - !matrix_mdev->kvm_busy,
> - mutex_unlock(&matrix_dev->lock),
> - mutex_lock(&matrix_dev->lock));
> -
> /* If the there is no guest using the mdev, there is nothing to do */
> if (!matrix_mdev->kvm)
> goto out_unlock;

2021-07-15 14:42:53

by Anthony Krowiak

[permalink] [raw]
Subject: Re: [PATCH] s390/vfio-ap: do not open code locks for VFIO_GROUP_NOTIFY_SET_KVM notification



On 7/15/21 9:44 AM, Halil Pasic wrote:
> On Wed, 7 Jul 2021 11:41:56 -0400
> Tony Krowiak <[email protected]> wrote:
>
> First sorry for being this late with having a more serious look at the
> code.
>
>
>> @@ -270,6 +270,9 @@ static struct ap_queue_status vfio_ap_irq_enable(struct vfio_ap_queue *q,
>> * We take the matrix_dev lock to ensure serialization on queues and
>> * mediated device access.
>> *
>> + * Note: This function must be called with a read lock held on
>> + * vcpu->kvm->arch.crypto.pqap_hook_rwsem.
>> + *
>
> That is a fine synchronization for the pqap_hook, but I don't think it
> is sufficient for everything.
>
>
>> * Return 0 if we could handle the request inside KVM.
>> * otherwise, returns -EOPNOTSUPP to let QEMU handle the fault.
>> */
>> @@ -287,22 +290,12 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
>> return -EOPNOTSUPP;
>>
>> apqn = vcpu->run->s.regs.gprs[0] & 0xffff;
>> - mutex_lock(&matrix_dev->lock);
> Here you drop a matrix_dev->lock critical section. And then
> you do all the interesting stuff. E.g.
> q = vfio_ap_get_queue(matrix_mdev, apqn);
> and
> vfio_ap_irq_enable(q, status & 0x07, vcpu->run->s.regs.gprs[2]);.
> Since in vfio_ap_get_queue() we do the check if the queue belongs
> to the given guest, and examine the matrix (apm, aqm) I suppose
> that needs to be done holding a lock that protects the matrix,
> and to my best knowledge this is still matrix_dev->lock. It would
> probably make sense to convert matrix_dev->lock into an rw_semaphore,
> or to introduce a some new rwlock which protects less state in the
> future, but right now AFAICT it is still matrix_dev->lock.
>
> So I don't think this patch should pass review.

Good catch. In an earlier patch review, Jason G suggested locking the
kvm->lock
mutex outside of the kvm_arch_crypto_set_masks() and
kvm_arch_crypto_clear_masks()
functions to resolve the lockdep splat resulting from locking the
matrix_dev->lock
mutex prior to the kvm->lock mutex. I believe this will allow me to remove
the kvm_busy/wait queue scenario without introducing a new rwsem.

>
> Regards,
> Halil
>
>> if (!vcpu->kvm->arch.crypto.pqap_hook)
>> goto out_unlock;
>> matrix_mdev = container_of(vcpu->kvm->arch.crypto.pqap_hook,
>> struct ap_matrix_mdev, pqap_hook);
>>
>> - /*
>> - * If the KVM pointer is in the process of being set, wait until the
>> - * process has completed.
>> - */
>> - wait_event_cmd(matrix_mdev->wait_for_kvm,
>> - !matrix_mdev->kvm_busy,
>> - mutex_unlock(&matrix_dev->lock),
>> - mutex_lock(&matrix_dev->lock));
>> -
>> /* If the there is no guest using the mdev, there is nothing to do */
>> if (!matrix_mdev->kvm)
>> goto out_unlock;

2021-07-15 14:59:20

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH] s390/vfio-ap: do not open code locks for VFIO_GROUP_NOTIFY_SET_KVM notification

On Thu, Jul 15, 2021 at 01:31:36PM +0200, Halil Pasic wrote:

> I'm in favor of rewriting this to use rw_semaphore. I'm not in favor
> of proclaiming this a fix for stable, because for that you first have
> to prove that you fix a real problem.

So Tony keeps saying that a rwsem couldn't work for what kvm_busy
does, so it should be trivial enough to put in the rwsem and capture
the lockdep spew.

> I hope we are on the same page.

Sure

Jason