This series introduces several improvements to the function that performs
AP queue resets:
* Breaks up reset processing into multiple smaller, more concise functions.
* Use TAPQ to verify completion of a reset in progress rather than mulitple
invocations of ZAPQ.
* Check TAPQ response codes when verifying successful completion of ZAPQ.
* Fix erroneous handling of some error response codes.
* Increase the maximum amount of time to wait for successful completion of
ZAPQ.
* Always clean up IRQ resources when the ZAPQ response code indicates an
error.
* Consider reset complete when ZAPQ response code indicates the adapter to
which a queue is connected is deconfigured. All queues associated with an
adapter are reset when it is deconfigured.
Tony Krowiak (7):
s390/vfio-ap: verify reset complete in separate function
s390/vfio_ap: check TAPQ response code when waiting for queue reset
s390/vfio_ap: use TAPQ to verify reset in progress completes
s390/vfio_ap: verify ZAPQ completion after return of response code
zero
s390/vfio_ap: fix handling of error response codes
s390/vfio_ap: increase max wait time for reset verification
s390/vfio_ap: always clean up IRQ resources
drivers/s390/crypto/vfio_ap_ops.c | 106 ++++++++++++++++++++----------
1 file changed, 73 insertions(+), 33 deletions(-)
--
2.31.1
The vfio_ap_mdev_reset_queue() function does not check the status
response code returned form the PQAP(TAPQ) function when verifying the
queue's status; consequently, there is no way of knowing whether
verification failed because the wait time was exceeded, or because the
PQAP(TAPQ) failed.
This patch adds a function to check the status response code from the
PQAP(TAPQ) instruction and logs an appropriate message if it fails.
Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/vfio_ap_ops.c | 36 ++++++++++++++++++++++++++-----
1 file changed, 31 insertions(+), 5 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 83ff94a38102..a5530a46cf31 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -1587,23 +1587,49 @@ static struct vfio_ap_queue *vfio_ap_find_queue(int apqn)
return q;
}
+static int apq_status_check(int apqn, struct ap_queue_status *status)
+{
+ switch (status->response_code) {
+ case AP_RESPONSE_NORMAL:
+ case AP_RESPONSE_RESET_IN_PROGRESS:
+ if (status->queue_empty && !status->irq_enabled)
+ return 0;
+ return -EBUSY;
+ case AP_RESPONSE_DECONFIGURED:
+ /*
+ * If the AP queue is deconfigured, any subsequent AP command
+ * targeting the queue will fail with the same response code. On the
+ * other hand, when an AP adapter is deconfigured, the associated
+ * queues are reset, so let's return a value indicating the reset
+ * for which we're waiting completed successfully.
+ */
+ return 0;
+ default:
+ WARN(true,
+ "failed to verify reset of queue %02x.%04x: TAPQ rc=%u\n",
+ AP_QID_CARD(apqn), AP_QID_QUEUE(apqn),
+ status->response_code);
+ return -EIO;
+ }
+}
+
static int apq_reset_check(struct vfio_ap_queue *q)
{
- int iters = 2;
+ int iters = 2, ret;
struct ap_queue_status status;
while (iters--) {
msleep(20);
status = ap_tapq(q->apqn, NULL);
- if (status.queue_empty && !status.irq_enabled)
- return 0;
+ ret = apq_status_check(q->apqn, &status);
+ if (ret != -EBUSY)
+ return ret;
}
WARN_ONCE(iters <= 0,
"timeout verifying reset of queue %02x.%04x (%u, %u, %u)",
AP_QID_CARD(q->apqn), AP_QID_QUEUE(q->apqn),
status.queue_empty, status.irq_enabled, status.response_code);
-
- return -EBUSY;
+ return ret;
}
static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q,
--
2.31.1
Increase the maximum time to wait for verification of a queue reset
operation to 200ms.
Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/vfio_ap_ops.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index dbf681715a6d..e80c5a6b91be 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -30,6 +30,9 @@
#define AP_QUEUE_UNASSIGNED "unassigned"
#define AP_QUEUE_IN_USE "in use"
+#define MAX_RESET_CHECK_WAIT 200 /* Sleep max 200ms for reset check */
+#define AP_RESET_INTERVAL 20 /* Reset sleep interval (20ms) */
+
static int vfio_ap_mdev_reset_queues(struct ap_queue_table *qtable);
static struct vfio_ap_queue *vfio_ap_find_queue(int apqn);
static const struct vfio_device_ops vfio_ap_matrix_dev_ops;
@@ -1615,11 +1618,12 @@ static int apq_status_check(int apqn, struct ap_queue_status *status)
static int apq_reset_check(struct vfio_ap_queue *q)
{
- int iters = 2, ret;
+ int ret;
+ int iters = MAX_RESET_CHECK_WAIT / AP_RESET_INTERVAL;
struct ap_queue_status status;
- while (iters--) {
- msleep(20);
+ for (; iters > 0; iters--) {
+ msleep(AP_RESET_INTERVAL);
status = ap_tapq(q->apqn, NULL);
ret = apq_status_check(q->apqn, &status);
if (ret != -EBUSY)
--
2.31.1
The vfio_ap_mdev_reset_queue() function contains a loop to verify that the
reset successfully completes within 40ms. This patch moves that loop into
a separate function.
Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/vfio_ap_ops.c | 30 +++++++++++++++++++++---------
1 file changed, 21 insertions(+), 9 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 0b4cc8c597ae..83ff94a38102 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -1587,12 +1587,30 @@ static struct vfio_ap_queue *vfio_ap_find_queue(int apqn)
return q;
}
+static int apq_reset_check(struct vfio_ap_queue *q)
+{
+ int iters = 2;
+ struct ap_queue_status status;
+
+ while (iters--) {
+ msleep(20);
+ status = ap_tapq(q->apqn, NULL);
+ if (status.queue_empty && !status.irq_enabled)
+ return 0;
+ }
+ WARN_ONCE(iters <= 0,
+ "timeout verifying reset of queue %02x.%04x (%u, %u, %u)",
+ AP_QID_CARD(q->apqn), AP_QID_QUEUE(q->apqn),
+ status.queue_empty, status.irq_enabled, status.response_code);
+
+ return -EBUSY;
+}
+
static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q,
unsigned int retry)
{
struct ap_queue_status status;
int ret;
- int retry2 = 2;
if (!q)
return 0;
@@ -1629,14 +1647,8 @@ static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q,
}
/* wait for the reset to take effect */
- while (retry2--) {
- if (status.queue_empty && !status.irq_enabled)
- break;
- msleep(20);
- status = ap_tapq(q->apqn, NULL);
- }
- WARN_ONCE(retry2 <= 0, "unable to verify reset of queue %02x.%04x",
- AP_QID_CARD(q->apqn), AP_QID_QUEUE(q->apqn));
+ if (!status.queue_empty && status.irq_enabled)
+ ret = apq_reset_check(q);
free_resources:
vfio_ap_free_aqic_resources(q);
--
2.31.1
Clean up IRQ resources even when a PQAP(ZAPQ) function fails with an error
not handled by a case statement.
Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/vfio_ap_ops.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index e80c5a6b91be..2dd8db9ddb39 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -1676,7 +1676,7 @@ static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q)
"PQAP/ZAPQ for %02x.%04x failed with invalid rc=%u\n",
AP_QID_CARD(q->apqn), AP_QID_QUEUE(q->apqn),
status.response_code);
- return -EIO;
+ break;
}
vfio_ap_free_aqic_resources(q);
--
2.31.1
To eliminate the repeated calls to the PQAP(ZAPQ) function to verify that
a reset in progress completed successfully and ensure that error response
codes get appropriately logged, let's call the apq_reset_check() function
when the ZAPQ response code indicates that a reset that is already in
progress.
Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/vfio_ap_ops.c | 24 +++++++++++++-----------
1 file changed, 13 insertions(+), 11 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index a5530a46cf31..5bf2d93ae8af 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -33,7 +33,7 @@
static int vfio_ap_mdev_reset_queues(struct ap_queue_table *qtable);
static struct vfio_ap_queue *vfio_ap_find_queue(int apqn);
static const struct vfio_device_ops vfio_ap_matrix_dev_ops;
-static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q, unsigned int retry);
+static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q);
/**
* get_update_locks_for_kvm: Acquire the locks required to dynamically update a
@@ -1632,8 +1632,7 @@ static int apq_reset_check(struct vfio_ap_queue *q)
return ret;
}
-static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q,
- unsigned int retry)
+static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q)
{
struct ap_queue_status status;
int ret;
@@ -1648,12 +1647,15 @@ static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q,
ret = 0;
break;
case AP_RESPONSE_RESET_IN_PROGRESS:
- if (retry--) {
- msleep(20);
- goto retry_zapq;
- }
- ret = -EBUSY;
- break;
+ /*
+ * There is a reset issued by another process in progress. Let's wait
+ * for that to complete. Since we have no idea whether it was a RAPQ or
+ * ZAPQ, then if it completes successfully, let's issue the ZAPQ.
+ */
+ ret = apq_reset_check(q);
+ if (ret)
+ break;
+ goto retry_zapq;
case AP_RESPONSE_Q_NOT_AVAIL:
case AP_RESPONSE_DECONFIGURED:
case AP_RESPONSE_CHECKSTOPPED:
@@ -1688,7 +1690,7 @@ static int vfio_ap_mdev_reset_queues(struct ap_queue_table *qtable)
struct vfio_ap_queue *q;
hash_for_each(qtable->queues, loop_cursor, q, mdev_qnode) {
- ret = vfio_ap_mdev_reset_queue(q, 1);
+ ret = vfio_ap_mdev_reset_queue(q);
/*
* Regardless whether a queue turns out to be busy, or
* is not operational, we need to continue resetting
@@ -1931,7 +1933,7 @@ void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
}
}
- vfio_ap_mdev_reset_queue(q, 1);
+ vfio_ap_mdev_reset_queue(q);
dev_set_drvdata(&apdev->device, NULL);
kfree(q);
release_update_locks_for_mdev(matrix_mdev);
--
2.31.1
Some response codes returned from the queue reset function are not being
handled correctly; this patch fixes them:
1. Response code 3, AP queue deconfigured: Deconfiguring an AP adapter
resets all of its queues, so this is handled by indicating the reset
verification completed successfully.
2. For all response codes other than 0 (normal reset completion), 2
(queue reset in progress) and 3 (AP deconfigured), the -EIO error will
be returned from the vfio_ap_mdev_reset_queue() function. In all cases,
all fields of the status word other than the response code will be
set to zero, so it makes no sense to check status bits.
Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/vfio_ap_ops.c | 17 +++++++----------
1 file changed, 7 insertions(+), 10 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index c0cf5050be59..dbf681715a6d 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -1659,17 +1659,15 @@ static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q)
if (ret)
break;
goto retry_zapq;
- case AP_RESPONSE_Q_NOT_AVAIL:
case AP_RESPONSE_DECONFIGURED:
- case AP_RESPONSE_CHECKSTOPPED:
- WARN_ONCE(status.irq_enabled,
- "PQAP/ZAPQ for %02x.%04x failed with rc=%u while IRQ enabled",
- AP_QID_CARD(q->apqn), AP_QID_QUEUE(q->apqn),
- status.response_code);
- ret = -EBUSY;
- goto free_resources;
+ /*
+ * When an AP adapter is deconfigured, the associated
+ * queues are reset, so let's return a value indicating the reset
+ * completed successfully.
+ */
+ ret = 0;
+ break;
default:
- /* things are really broken, give up */
WARN(true,
"PQAP/ZAPQ for %02x.%04x failed with invalid rc=%u\n",
AP_QID_CARD(q->apqn), AP_QID_QUEUE(q->apqn),
@@ -1677,7 +1675,6 @@ static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q)
return -EIO;
}
-free_resources:
vfio_ap_free_aqic_resources(q);
return ret;
--
2.31.1
Verification that the asynchronous ZAPQ function has completed only needs
to be done when the response code indicates the function was successfully
initiated; so, let's call the apq_reset_check function immediately after
the response code zero is returned from the ZAPQ.
Signed-off-by: Tony Krowiak <[email protected]>
---
drivers/s390/crypto/vfio_ap_ops.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 5bf2d93ae8af..c0cf5050be59 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -1645,6 +1645,9 @@ static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q)
switch (status.response_code) {
case AP_RESPONSE_NORMAL:
ret = 0;
+ /* if the reset has not completed, wait for it to take effect */
+ if (!status.queue_empty || status.irq_enabled)
+ ret = apq_reset_check(q);
break;
case AP_RESPONSE_RESET_IN_PROGRESS:
/*
@@ -1674,10 +1677,6 @@ static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q)
return -EIO;
}
- /* wait for the reset to take effect */
- if (!status.queue_empty && status.irq_enabled)
- ret = apq_reset_check(q);
-
free_resources:
vfio_ap_free_aqic_resources(q);
--
2.31.1
On 12/13/22 10:44 AM, Tony Krowiak wrote:
> This series introduces several improvements to the function that performs
> AP queue resets:
>
> * Breaks up reset processing into multiple smaller, more concise functions.
>
> * Use TAPQ to verify completion of a reset in progress rather than mulitple
> invocations of ZAPQ.
>
> * Check TAPQ response codes when verifying successful completion of ZAPQ.
>
> * Fix erroneous handling of some error response codes.
>
> * Increase the maximum amount of time to wait for successful completion of
> ZAPQ.
>
> * Always clean up IRQ resources when the ZAPQ response code indicates an
> error.
>
> * Consider reset complete when ZAPQ response code indicates the adapter to
> which a queue is connected is deconfigured. All queues associated with an
> adapter are reset when it is deconfigured.
>
> Tony Krowiak (7):
> s390/vfio-ap: verify reset complete in separate function
> s390/vfio_ap: check TAPQ response code when waiting for queue reset
> s390/vfio_ap: use TAPQ to verify reset in progress completes
> s390/vfio_ap: verify ZAPQ completion after return of response code
> zero
> s390/vfio_ap: fix handling of error response codes
> s390/vfio_ap: increase max wait time for reset verification
> s390/vfio_ap: always clean up IRQ resources
>
> drivers/s390/crypto/vfio_ap_ops.c | 106 ++++++++++++++++++++----------
> 1 file changed, 73 insertions(+), 33 deletions(-)
This series largely matches what I've already reviewed. I like the way
you broke this up, it does a better job telling the story.
Here's my R-b for the entire series.
Reviewed-by: Jason J. Herne <[email protected]>
On 12/14/22 10:16 AM, Jason J. Herne wrote:
>
> On 12/13/22 10:44 AM, Tony Krowiak wrote:
>> This series introduces several improvements to the function that
>> performs
>> AP queue resets:
>>
>> * Breaks up reset processing into multiple smaller, more concise
>> functions.
>>
>> * Use TAPQ to verify completion of a reset in progress rather than
>> mulitple
>> invocations of ZAPQ.
>>
>> * Check TAPQ response codes when verifying successful completion of
>> ZAPQ.
>>
>> * Fix erroneous handling of some error response codes.
>>
>> * Increase the maximum amount of time to wait for successful
>> completion of
>> ZAPQ.
>>
>> * Always clean up IRQ resources when the ZAPQ response code indicates an
>> error.
>>
>> * Consider reset complete when ZAPQ response code indicates the
>> adapter to
>> which a queue is connected is deconfigured. All queues associated
>> with an
>> adapter are reset when it is deconfigured.
>>
>> Tony Krowiak (7):
>> s390/vfio-ap: verify reset complete in separate function
>> s390/vfio_ap: check TAPQ response code when waiting for queue reset
>> s390/vfio_ap: use TAPQ to verify reset in progress completes
>> s390/vfio_ap: verify ZAPQ completion after return of response code
>> zero
>> s390/vfio_ap: fix handling of error response codes
>> s390/vfio_ap: increase max wait time for reset verification
>> s390/vfio_ap: always clean up IRQ resources
>>
>> drivers/s390/crypto/vfio_ap_ops.c | 106 ++++++++++++++++++++----------
>> 1 file changed, 73 insertions(+), 33 deletions(-)
>
>
> This series largely matches what I've already reviewed. I like the way
> you broke this up, it does a better job telling the story.
>
> Here's my R-b for the entire series.
> Reviewed-by: Jason J. Herne <[email protected]>
Thanks for the review Jason.
On 2022-12-13 16:44, Tony Krowiak wrote:
> The vfio_ap_mdev_reset_queue() function contains a loop to verify that
> the
> reset successfully completes within 40ms. This patch moves that loop
> into
> a separate function.
>
> Signed-off-by: Tony Krowiak <[email protected]>
> ---
> drivers/s390/crypto/vfio_ap_ops.c | 30 +++++++++++++++++++++---------
> 1 file changed, 21 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c
> b/drivers/s390/crypto/vfio_ap_ops.c
> index 0b4cc8c597ae..83ff94a38102 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -1587,12 +1587,30 @@ static struct vfio_ap_queue
> *vfio_ap_find_queue(int apqn)
> return q;
> }
>
> +static int apq_reset_check(struct vfio_ap_queue *q)
> +{
> + int iters = 2;
> + struct ap_queue_status status;
> +
> + while (iters--) {
> + msleep(20);
> + status = ap_tapq(q->apqn, NULL);
> + if (status.queue_empty && !status.irq_enabled)
> + return 0;
> + }
> + WARN_ONCE(iters <= 0,
> + "timeout verifying reset of queue %02x.%04x (%u, %u, %u)",
> + AP_QID_CARD(q->apqn), AP_QID_QUEUE(q->apqn),
> + status.queue_empty, status.irq_enabled, status.response_code);
> +
> + return -EBUSY;
> +}
> +
> static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q,
> unsigned int retry)
> {
> struct ap_queue_status status;
> int ret;
> - int retry2 = 2;
>
> if (!q)
> return 0;
> @@ -1629,14 +1647,8 @@ static int vfio_ap_mdev_reset_queue(struct
> vfio_ap_queue *q,
> }
>
> /* wait for the reset to take effect */
> - while (retry2--) {
> - if (status.queue_empty && !status.irq_enabled)
> - break;
> - msleep(20);
> - status = ap_tapq(q->apqn, NULL);
> - }
> - WARN_ONCE(retry2 <= 0, "unable to verify reset of queue %02x.%04x",
> - AP_QID_CARD(q->apqn), AP_QID_QUEUE(q->apqn));
> + if (!status.queue_empty && status.irq_enabled)
Hm, you check for
a) status.queue_empty and
b) !status.irq_enabled
which is (status.queue_empty && !status.irq_enabled)
and now let's negate this to:
!(status.queue_empty && !status.irq_enabled)
with de-morgan this gives:
(!status.queue_empty || status.irq_enabled)
> + ret = apq_reset_check(q);
>
> free_resources:
> vfio_ap_free_aqic_resources(q);
Reviewed-by: Harald Freudenberger <[email protected]>
On 2022-12-13 16:44, Tony Krowiak wrote:
> Some response codes returned from the queue reset function are not
> being
> handled correctly; this patch fixes them:
>
> 1. Response code 3, AP queue deconfigured: Deconfiguring an AP adapter
> resets all of its queues, so this is handled by indicating the reset
> verification completed successfully.
>
> 2. For all response codes other than 0 (normal reset completion), 2
> (queue reset in progress) and 3 (AP deconfigured), the -EIO error
> will
> be returned from the vfio_ap_mdev_reset_queue() function. In all
> cases,
> all fields of the status word other than the response code will be
> set to zero, so it makes no sense to check status bits.
>
> Signed-off-by: Tony Krowiak <[email protected]>
> ---
> drivers/s390/crypto/vfio_ap_ops.c | 17 +++++++----------
> 1 file changed, 7 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c
> b/drivers/s390/crypto/vfio_ap_ops.c
> index c0cf5050be59..dbf681715a6d 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -1659,17 +1659,15 @@ static int vfio_ap_mdev_reset_queue(struct
> vfio_ap_queue *q)
> if (ret)
> break;
> goto retry_zapq;
> - case AP_RESPONSE_Q_NOT_AVAIL:
> case AP_RESPONSE_DECONFIGURED:
> - case AP_RESPONSE_CHECKSTOPPED:
> - WARN_ONCE(status.irq_enabled,
> - "PQAP/ZAPQ for %02x.%04x failed with rc=%u while IRQ enabled",
> - AP_QID_CARD(q->apqn), AP_QID_QUEUE(q->apqn),
> - status.response_code);
> - ret = -EBUSY;
> - goto free_resources;
> + /*
> + * When an AP adapter is deconfigured, the associated
> + * queues are reset, so let's return a value indicating the reset
> + * completed successfully.
> + */
> + ret = 0;
> + break;
> default:
> - /* things are really broken, give up */
> WARN(true,
> "PQAP/ZAPQ for %02x.%04x failed with invalid rc=%u\n",
> AP_QID_CARD(q->apqn), AP_QID_QUEUE(q->apqn),
> @@ -1677,7 +1675,6 @@ static int vfio_ap_mdev_reset_queue(struct
> vfio_ap_queue *q)
> return -EIO;
> }
>
> -free_resources:
> vfio_ap_free_aqic_resources(q);
>
> return ret;
Reviewed-by: Harald Freudenberger <[email protected]>
On 2022-12-13 16:44, Tony Krowiak wrote:
> The vfio_ap_mdev_reset_queue() function does not check the status
> response code returned form the PQAP(TAPQ) function when verifying the
> queue's status; consequently, there is no way of knowing whether
> verification failed because the wait time was exceeded, or because the
> PQAP(TAPQ) failed.
>
> This patch adds a function to check the status response code from the
> PQAP(TAPQ) instruction and logs an appropriate message if it fails.
>
> Signed-off-by: Tony Krowiak <[email protected]>
> ---
> drivers/s390/crypto/vfio_ap_ops.c | 36 ++++++++++++++++++++++++++-----
> 1 file changed, 31 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c
> b/drivers/s390/crypto/vfio_ap_ops.c
> index 83ff94a38102..a5530a46cf31 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -1587,23 +1587,49 @@ static struct vfio_ap_queue
> *vfio_ap_find_queue(int apqn)
> return q;
> }
>
> +static int apq_status_check(int apqn, struct ap_queue_status *status)
> +{
> + switch (status->response_code) {
> + case AP_RESPONSE_NORMAL:
> + case AP_RESPONSE_RESET_IN_PROGRESS:
> + if (status->queue_empty && !status->irq_enabled)
> + return 0;
> + return -EBUSY;
> + case AP_RESPONSE_DECONFIGURED:
> + /*
> + * If the AP queue is deconfigured, any subsequent AP command
> + * targeting the queue will fail with the same response code. On the
> + * other hand, when an AP adapter is deconfigured, the associated
> + * queues are reset, so let's return a value indicating the reset
> + * for which we're waiting completed successfully.
> + */
> + return 0;
> + default:
> + WARN(true,
> + "failed to verify reset of queue %02x.%04x: TAPQ rc=%u\n",
> + AP_QID_CARD(apqn), AP_QID_QUEUE(apqn),
> + status->response_code);
> + return -EIO;
> + }
> +}
> +
> static int apq_reset_check(struct vfio_ap_queue *q)
> {
> - int iters = 2;
> + int iters = 2, ret;
> struct ap_queue_status status;
>
> while (iters--) {
> msleep(20);
> status = ap_tapq(q->apqn, NULL);
> - if (status.queue_empty && !status.irq_enabled)
> - return 0;
> + ret = apq_status_check(q->apqn, &status);
> + if (ret != -EBUSY)
> + return ret;
> }
> WARN_ONCE(iters <= 0,
> "timeout verifying reset of queue %02x.%04x (%u, %u, %u)",
> AP_QID_CARD(q->apqn), AP_QID_QUEUE(q->apqn),
> status.queue_empty, status.irq_enabled, status.response_code);
> -
> - return -EBUSY;
> + return ret;
> }
>
> static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q,
Reviewed-by: Harald Freudenberger <[email protected]>
Just one word here: this function is only called once and it is very
very special
to just check the status after RAPQ/ZAPQ. I would merge this function
into the
calling code or rename the function to reflect the special condition
under which
it is called. However - this is not my code and I don't need to maintain
it, so
maybe simple ignore my words.
On 2022-12-13 16:44, Tony Krowiak wrote:
> Verification that the asynchronous ZAPQ function has completed only
> needs
> to be done when the response code indicates the function was
> successfully
> initiated; so, let's call the apq_reset_check function immediately
> after
> the response code zero is returned from the ZAPQ.
>
> Signed-off-by: Tony Krowiak <[email protected]>
> ---
> drivers/s390/crypto/vfio_ap_ops.c | 7 +++----
> 1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c
> b/drivers/s390/crypto/vfio_ap_ops.c
> index 5bf2d93ae8af..c0cf5050be59 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -1645,6 +1645,9 @@ static int vfio_ap_mdev_reset_queue(struct
> vfio_ap_queue *q)
> switch (status.response_code) {
> case AP_RESPONSE_NORMAL:
> ret = 0;
> + /* if the reset has not completed, wait for it to take effect */
> + if (!status.queue_empty || status.irq_enabled)
> + ret = apq_reset_check(q);
> break;
> case AP_RESPONSE_RESET_IN_PROGRESS:
> /*
> @@ -1674,10 +1677,6 @@ static int vfio_ap_mdev_reset_queue(struct
> vfio_ap_queue *q)
> return -EIO;
> }
>
> - /* wait for the reset to take effect */
> - if (!status.queue_empty && status.irq_enabled)
> - ret = apq_reset_check(q);
> -
> free_resources:
> vfio_ap_free_aqic_resources(q);
Reviewed-by: Harald Freudenberger <[email protected]>
On 2022-12-13 16:44, Tony Krowiak wrote:
> To eliminate the repeated calls to the PQAP(ZAPQ) function to verify
> that
> a reset in progress completed successfully and ensure that error
> response
> codes get appropriately logged, let's call the apq_reset_check()
> function
> when the ZAPQ response code indicates that a reset that is already in
> progress.
>
> Signed-off-by: Tony Krowiak <[email protected]>
> ---
> drivers/s390/crypto/vfio_ap_ops.c | 24 +++++++++++++-----------
> 1 file changed, 13 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c
> b/drivers/s390/crypto/vfio_ap_ops.c
> index a5530a46cf31..5bf2d93ae8af 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -33,7 +33,7 @@
> static int vfio_ap_mdev_reset_queues(struct ap_queue_table *qtable);
> static struct vfio_ap_queue *vfio_ap_find_queue(int apqn);
> static const struct vfio_device_ops vfio_ap_matrix_dev_ops;
> -static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q, unsigned
> int retry);
> +static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q);
>
> /**
> * get_update_locks_for_kvm: Acquire the locks required to dynamically
> update a
> @@ -1632,8 +1632,7 @@ static int apq_reset_check(struct vfio_ap_queue
> *q)
> return ret;
> }
>
> -static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q,
> - unsigned int retry)
> +static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q)
> {
> struct ap_queue_status status;
> int ret;
> @@ -1648,12 +1647,15 @@ static int vfio_ap_mdev_reset_queue(struct
> vfio_ap_queue *q,
> ret = 0;
> break;
> case AP_RESPONSE_RESET_IN_PROGRESS:
> - if (retry--) {
> - msleep(20);
> - goto retry_zapq;
> - }
> - ret = -EBUSY;
> - break;
> + /*
> + * There is a reset issued by another process in progress. Let's
> wait
> + * for that to complete. Since we have no idea whether it was a RAPQ
> or
> + * ZAPQ, then if it completes successfully, let's issue the ZAPQ.
> + */
> + ret = apq_reset_check(q);
> + if (ret)
> + break;
> + goto retry_zapq;
> case AP_RESPONSE_Q_NOT_AVAIL:
> case AP_RESPONSE_DECONFIGURED:
> case AP_RESPONSE_CHECKSTOPPED:
> @@ -1688,7 +1690,7 @@ static int vfio_ap_mdev_reset_queues(struct
> ap_queue_table *qtable)
> struct vfio_ap_queue *q;
>
> hash_for_each(qtable->queues, loop_cursor, q, mdev_qnode) {
> - ret = vfio_ap_mdev_reset_queue(q, 1);
> + ret = vfio_ap_mdev_reset_queue(q);
> /*
> * Regardless whether a queue turns out to be busy, or
> * is not operational, we need to continue resetting
> @@ -1931,7 +1933,7 @@ void vfio_ap_mdev_remove_queue(struct ap_device
> *apdev)
> }
> }
>
> - vfio_ap_mdev_reset_queue(q, 1);
> + vfio_ap_mdev_reset_queue(q);
> dev_set_drvdata(&apdev->device, NULL);
> kfree(q);
> release_update_locks_for_mdev(matrix_mdev);
Reviewed-by: Harald Freudenberger <[email protected]>
On 2022-12-13 16:44, Tony Krowiak wrote:
> Increase the maximum time to wait for verification of a queue reset
> operation to 200ms.
>
> Signed-off-by: Tony Krowiak <[email protected]>
> ---
> drivers/s390/crypto/vfio_ap_ops.c | 10 +++++++---
> 1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c
> b/drivers/s390/crypto/vfio_ap_ops.c
> index dbf681715a6d..e80c5a6b91be 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -30,6 +30,9 @@
> #define AP_QUEUE_UNASSIGNED "unassigned"
> #define AP_QUEUE_IN_USE "in use"
>
> +#define MAX_RESET_CHECK_WAIT 200 /* Sleep max 200ms for reset check */
> +#define AP_RESET_INTERVAL 20 /* Reset sleep interval (20ms) */
> +
> static int vfio_ap_mdev_reset_queues(struct ap_queue_table *qtable);
> static struct vfio_ap_queue *vfio_ap_find_queue(int apqn);
> static const struct vfio_device_ops vfio_ap_matrix_dev_ops;
> @@ -1615,11 +1618,12 @@ static int apq_status_check(int apqn, struct
> ap_queue_status *status)
>
> static int apq_reset_check(struct vfio_ap_queue *q)
> {
> - int iters = 2, ret;
> + int ret;
> + int iters = MAX_RESET_CHECK_WAIT / AP_RESET_INTERVAL;
> struct ap_queue_status status;
>
> - while (iters--) {
> - msleep(20);
> + for (; iters > 0; iters--) {
> + msleep(AP_RESET_INTERVAL);
> status = ap_tapq(q->apqn, NULL);
> ret = apq_status_check(q->apqn, &status);
> if (ret != -EBUSY)
Reviewed-by: Harald Freudenberger <[email protected]>
On 2022-12-13 16:44, Tony Krowiak wrote:
> Clean up IRQ resources even when a PQAP(ZAPQ) function fails with an
> error
> not handled by a case statement.
>
> Signed-off-by: Tony Krowiak <[email protected]>
> ---
> drivers/s390/crypto/vfio_ap_ops.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c
> b/drivers/s390/crypto/vfio_ap_ops.c
> index e80c5a6b91be..2dd8db9ddb39 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -1676,7 +1676,7 @@ static int vfio_ap_mdev_reset_queue(struct
> vfio_ap_queue *q)
> "PQAP/ZAPQ for %02x.%04x failed with invalid rc=%u\n",
> AP_QID_CARD(q->apqn), AP_QID_QUEUE(q->apqn),
> status.response_code);
> - return -EIO;
> + break;
> }
>
> vfio_ap_free_aqic_resources(q);
Reviewed-by: Harald Freudenberger <[email protected]>
On Tue, 13 Dec 2022 10:44:37 -0500
Tony Krowiak <[email protected]> wrote:
> Clean up IRQ resources even when a PQAP(ZAPQ) function fails with an error
> not handled by a case statement.
Why?
I'm afraid this is a step in the wrong direction...
>
> Signed-off-by: Tony Krowiak <[email protected]>
> ---
> drivers/s390/crypto/vfio_ap_ops.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index e80c5a6b91be..2dd8db9ddb39 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -1676,7 +1676,7 @@ static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q)
> "PQAP/ZAPQ for %02x.%04x failed with invalid rc=%u\n",
> AP_QID_CARD(q->apqn), AP_QID_QUEUE(q->apqn),
> status.response_code);
> - return -EIO;
> + break;
> }
>
> vfio_ap_free_aqic_resources(q);
On 12/19/22 9:10 AM, Halil Pasic wrote:
> On Tue, 13 Dec 2022 10:44:37 -0500
> Tony Krowiak <[email protected]> wrote:
>
>> Clean up IRQ resources even when a PQAP(ZAPQ) function fails with an error
>> not handled by a case statement.
> Why?
If the ZAPQ failed, then instructions submitted to the same queue will
likewise fail. Are you saying it's not safe to assume, therefore, that
interrupts will not be occurring?
>
> I'm afraid this is a step in the wrong direction...
Please explain why.
>
>> Signed-off-by: Tony Krowiak <[email protected]>
>> ---
>> drivers/s390/crypto/vfio_ap_ops.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>> index e80c5a6b91be..2dd8db9ddb39 100644
>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -1676,7 +1676,7 @@ static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q)
>> "PQAP/ZAPQ for %02x.%04x failed with invalid rc=%u\n",
>> AP_QID_CARD(q->apqn), AP_QID_QUEUE(q->apqn),
>> status.response_code);
>> - return -EIO;
>> + break;
>> }
>>
>> vfio_ap_free_aqic_resources(q);
On 12/15/22 5:51 AM, Harald Freudenberger wrote:
> On 2022-12-13 16:44, Tony Krowiak wrote:
>> To eliminate the repeated calls to the PQAP(ZAPQ) function to verify
>> that
>> a reset in progress completed successfully and ensure that error
>> response
>> codes get appropriately logged, let's call the apq_reset_check()
>> function
>> when the ZAPQ response code indicates that a reset that is already in
>> progress.
>>
>> Signed-off-by: Tony Krowiak <[email protected]>
>> ---
>> drivers/s390/crypto/vfio_ap_ops.c | 24 +++++++++++++-----------
>> 1 file changed, 13 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c
>> b/drivers/s390/crypto/vfio_ap_ops.c
>> index a5530a46cf31..5bf2d93ae8af 100644
>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -33,7 +33,7 @@
>> static int vfio_ap_mdev_reset_queues(struct ap_queue_table *qtable);
>> static struct vfio_ap_queue *vfio_ap_find_queue(int apqn);
>> static const struct vfio_device_ops vfio_ap_matrix_dev_ops;
>> -static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q, unsigned
>> int retry);
>> +static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q);
>>
>> /**
>> * get_update_locks_for_kvm: Acquire the locks required to
>> dynamically update a
>> @@ -1632,8 +1632,7 @@ static int apq_reset_check(struct vfio_ap_queue
>> *q)
>> return ret;
>> }
>>
>> -static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q,
>> - unsigned int retry)
>> +static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q)
>> {
>> struct ap_queue_status status;
>> int ret;
>> @@ -1648,12 +1647,15 @@ static int vfio_ap_mdev_reset_queue(struct
>> vfio_ap_queue *q,
>> ret = 0;
>> break;
>> case AP_RESPONSE_RESET_IN_PROGRESS:
>> - if (retry--) {
>> - msleep(20);
>> - goto retry_zapq;
>> - }
>> - ret = -EBUSY;
>> - break;
>> + /*
>> + * There is a reset issued by another process in progress.
>> Let's wait
>> + * for that to complete. Since we have no idea whether it
>> was a RAPQ or
>> + * ZAPQ, then if it completes successfully, let's issue the
>> ZAPQ.
>> + */
>> + ret = apq_reset_check(q);
>> + if (ret)
>> + break;
>> + goto retry_zapq;
>> case AP_RESPONSE_Q_NOT_AVAIL:
>> case AP_RESPONSE_DECONFIGURED:
>> case AP_RESPONSE_CHECKSTOPPED:
>> @@ -1688,7 +1690,7 @@ static int vfio_ap_mdev_reset_queues(struct
>> ap_queue_table *qtable)
>> struct vfio_ap_queue *q;
>>
>> hash_for_each(qtable->queues, loop_cursor, q, mdev_qnode) {
>> - ret = vfio_ap_mdev_reset_queue(q, 1);
>> + ret = vfio_ap_mdev_reset_queue(q);
>> /*
>> * Regardless whether a queue turns out to be busy, or
>> * is not operational, we need to continue resetting
>> @@ -1931,7 +1933,7 @@ void vfio_ap_mdev_remove_queue(struct ap_device
>> *apdev)
>> }
>> }
>>
>> - vfio_ap_mdev_reset_queue(q, 1);
>> + vfio_ap_mdev_reset_queue(q);
>> dev_set_drvdata(&apdev->device, NULL);
>> kfree(q);
>> release_update_locks_for_mdev(matrix_mdev);
>
> Reviewed-by: Harald Freudenberger <[email protected]>
Thanks for the review.
On 12/15/22 5:58 AM, Harald Freudenberger wrote:
> On 2022-12-13 16:44, Tony Krowiak wrote:
>> Increase the maximum time to wait for verification of a queue reset
>> operation to 200ms.
>>
>> Signed-off-by: Tony Krowiak <[email protected]>
>> ---
>> drivers/s390/crypto/vfio_ap_ops.c | 10 +++++++---
>> 1 file changed, 7 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c
>> b/drivers/s390/crypto/vfio_ap_ops.c
>> index dbf681715a6d..e80c5a6b91be 100644
>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -30,6 +30,9 @@
>> #define AP_QUEUE_UNASSIGNED "unassigned"
>> #define AP_QUEUE_IN_USE "in use"
>>
>> +#define MAX_RESET_CHECK_WAIT 200 /* Sleep max 200ms for reset
>> check */
>> +#define AP_RESET_INTERVAL 20 /* Reset sleep interval
>> (20ms) */
>> +
>> static int vfio_ap_mdev_reset_queues(struct ap_queue_table *qtable);
>> static struct vfio_ap_queue *vfio_ap_find_queue(int apqn);
>> static const struct vfio_device_ops vfio_ap_matrix_dev_ops;
>> @@ -1615,11 +1618,12 @@ static int apq_status_check(int apqn, struct
>> ap_queue_status *status)
>>
>> static int apq_reset_check(struct vfio_ap_queue *q)
>> {
>> - int iters = 2, ret;
>> + int ret;
>> + int iters = MAX_RESET_CHECK_WAIT / AP_RESET_INTERVAL;
>> struct ap_queue_status status;
>>
>> - while (iters--) {
>> - msleep(20);
>> + for (; iters > 0; iters--) {
>> + msleep(AP_RESET_INTERVAL);
>> status = ap_tapq(q->apqn, NULL);
>> ret = apq_status_check(q->apqn, &status);
>> if (ret != -EBUSY)
>
> Reviewed-by: Harald Freudenberger <[email protected]>
Thanks for the review.
On 12/15/22 5:56 AM, Harald Freudenberger wrote:
> On 2022-12-13 16:44, Tony Krowiak wrote:
>> Some response codes returned from the queue reset function are not being
>> handled correctly; this patch fixes them:
>>
>> 1. Response code 3, AP queue deconfigured: Deconfiguring an AP adapter
>> resets all of its queues, so this is handled by indicating the reset
>> verification completed successfully.
>>
>> 2. For all response codes other than 0 (normal reset completion), 2
>> (queue reset in progress) and 3 (AP deconfigured), the -EIO error
>> will
>> be returned from the vfio_ap_mdev_reset_queue() function. In all
>> cases,
>> all fields of the status word other than the response code will be
>> set to zero, so it makes no sense to check status bits.
>>
>> Signed-off-by: Tony Krowiak <[email protected]>
>> ---
>> drivers/s390/crypto/vfio_ap_ops.c | 17 +++++++----------
>> 1 file changed, 7 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c
>> b/drivers/s390/crypto/vfio_ap_ops.c
>> index c0cf5050be59..dbf681715a6d 100644
>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -1659,17 +1659,15 @@ static int vfio_ap_mdev_reset_queue(struct
>> vfio_ap_queue *q)
>> if (ret)
>> break;
>> goto retry_zapq;
>> - case AP_RESPONSE_Q_NOT_AVAIL:
>> case AP_RESPONSE_DECONFIGURED:
>> - case AP_RESPONSE_CHECKSTOPPED:
>> - WARN_ONCE(status.irq_enabled,
>> - "PQAP/ZAPQ for %02x.%04x failed with rc=%u while IRQ
>> enabled",
>> - AP_QID_CARD(q->apqn), AP_QID_QUEUE(q->apqn),
>> - status.response_code);
>> - ret = -EBUSY;
>> - goto free_resources;
>> + /*
>> + * When an AP adapter is deconfigured, the associated
>> + * queues are reset, so let's return a value indicating the
>> reset
>> + * completed successfully.
>> + */
>> + ret = 0;
>> + break;
>> default:
>> - /* things are really broken, give up */
>> WARN(true,
>> "PQAP/ZAPQ for %02x.%04x failed with invalid rc=%u\n",
>> AP_QID_CARD(q->apqn), AP_QID_QUEUE(q->apqn),
>> @@ -1677,7 +1675,6 @@ static int vfio_ap_mdev_reset_queue(struct
>> vfio_ap_queue *q)
>> return -EIO;
>> }
>>
>> -free_resources:
>> vfio_ap_free_aqic_resources(q);
>>
>> return ret;
>
> Reviewed-by: Harald Freudenberger <[email protected]>
Thanks for the review.
On 12/15/22 5:54 AM, Harald Freudenberger wrote:
> On 2022-12-13 16:44, Tony Krowiak wrote:
>> Verification that the asynchronous ZAPQ function has completed only
>> needs
>> to be done when the response code indicates the function was
>> successfully
>> initiated; so, let's call the apq_reset_check function immediately after
>> the response code zero is returned from the ZAPQ.
>>
>> Signed-off-by: Tony Krowiak <[email protected]>
>> ---
>> drivers/s390/crypto/vfio_ap_ops.c | 7 +++----
>> 1 file changed, 3 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c
>> b/drivers/s390/crypto/vfio_ap_ops.c
>> index 5bf2d93ae8af..c0cf5050be59 100644
>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -1645,6 +1645,9 @@ static int vfio_ap_mdev_reset_queue(struct
>> vfio_ap_queue *q)
>> switch (status.response_code) {
>> case AP_RESPONSE_NORMAL:
>> ret = 0;
>> + /* if the reset has not completed, wait for it to take
>> effect */
>> + if (!status.queue_empty || status.irq_enabled)
>> + ret = apq_reset_check(q);
>> break;
>> case AP_RESPONSE_RESET_IN_PROGRESS:
>> /*
>> @@ -1674,10 +1677,6 @@ static int vfio_ap_mdev_reset_queue(struct
>> vfio_ap_queue *q)
>> return -EIO;
>> }
>>
>> - /* wait for the reset to take effect */
>> - if (!status.queue_empty && status.irq_enabled)
>> - ret = apq_reset_check(q);
>> -
>> free_resources:
>> vfio_ap_free_aqic_resources(q);
>
> Reviewed-by: Harald Freudenberger <[email protected]>
Thanks for the review.
On Tue, 20 Dec 2022 09:33:03 -0500
Anthony Krowiak <[email protected]> wrote:
> On 12/19/22 9:10 AM, Halil Pasic wrote:
> > On Tue, 13 Dec 2022 10:44:37 -0500
> > Tony Krowiak <[email protected]> wrote:
> >
> >> Clean up IRQ resources even when a PQAP(ZAPQ) function fails with an error
> >> not handled by a case statement.
> > Why?
>
>
> If the ZAPQ failed, then instructions submitted to the same queue will
> likewise fail. Are you saying it's not safe to assume, therefore, that
> interrupts will not be occurring?
Right. We are talking about the default branch here, and I suppose, the
codes where we know that it is safe to assume that no reset is needed
handled separately (AP_RESPONSE_DECONFIGURED).
I'm not convinced that if we take the default branch we can safely
assume, that we won't see any interrupts.
For example consider hot-unplug as done by KVM. We modify the
CRYCB/APCB with all vCPUS take out of SIE, but we don't keep
the vCPUs out of SIE until the resets of the unpugged queues
are done, and we don't do any extra interrupt disablement
with all vCPUs keept out of SIE. So I believe currently there
may be a window where the guest can observe a 01 but the
interrupts are still live. That may be a bug, but IMHO it ain't clear
cut.
But it is not just about interrupts. Before we returned an error
code, which gets propagated to the userspace if this reset was
triggered via the ioctl.
With this change, ret seems to be uninitialized when returned
if we take the code path which you change here. So we would
end up logging a warning and returning garbage?
One could also debate, whether RCs introduced down the road
can affect the logic here (even if the statement "if we
see an RC other that 00 and 02, we don't need to pursue a
reset any further, and interrpts are disabled" were to be
guaranteed to be true now, new RCs could theoretically mess
this up).
>
>
> >
> > I'm afraid this is a step in the wrong direction...
>
>
> Please explain why.
>
Sorry, I kept this brief because IMHO it is your job to tell us why
this needs to be changed. But I gave in, as you see.
Regards,
Halil
On 12/20/22 12:24 PM, Halil Pasic wrote:
> On Tue, 20 Dec 2022 09:33:03 -0500
> Anthony Krowiak <[email protected]> wrote:
>
>> On 12/19/22 9:10 AM, Halil Pasic wrote:
>>> On Tue, 13 Dec 2022 10:44:37 -0500
>>> Tony Krowiak <[email protected]> wrote:
>>>
>>>> Clean up IRQ resources even when a PQAP(ZAPQ) function fails with an error
>>>> not handled by a case statement.
>>> Why?
>>
>> If the ZAPQ failed, then instructions submitted to the same queue will
>> likewise fail. Are you saying it's not safe to assume, therefore, that
>> interrupts will not be occurring?
> Right. We are talking about the default branch here, and I suppose, the
> codes where we know that it is safe to assume that no reset is needed
> handled separately (AP_RESPONSE_DECONFIGURED).
>
> I'm not convinced that if we take the default branch we can safely
> assume, that we won't see any interrupts.
>
> For example consider hot-unplug as done by KVM. We modify the
> CRYCB/APCB with all vCPUS take out of SIE, but we don't keep
> the vCPUs out of SIE until the resets of the unpugged queues
> are done, and we don't do any extra interrupt disablement
> with all vCPUs keept out of SIE. So I believe currently there
> may be a window where the guest can observe a 01 but the
> interrupts are still live. That may be a bug, but IMHO it ain't clear
> cut.
>
> But it is not just about interrupts. Before we returned an error
> code, which gets propagated to the userspace if this reset was
> triggered via the ioctl.
>
> With this change, ret seems to be uninitialized when returned
> if we take the code path which you change here. So we would
> end up logging a warning and returning garbage?
That was an oversight. The -EIO value was returned previously, so the
ret = -EIO should be set in the default case.
>
> One could also debate, whether RCs introduced down the road
> can affect the logic here (even if the statement "if we
> see an RC other that 00 and 02, we don't need to pursue a
> reset any further, and interrpts are disabled" were to be
> guaranteed to be true now, new RCs could theoretically mess
> this up).
I think that would be the case regardless of this change. If new RCs are
introduced, this function ought to be revisited anyway and appropriate
changes made.
>
>
>>
>>> I'm afraid this is a step in the wrong direction...
>>
>> Please explain why.
>>
> Sorry, I kept this brief because IMHO it is your job to tell us why
> this needs to be changed. But I gave in, as you see.
>
> Regards,
> Halil