LinuxLists.cc - [PATCH V4 0/9] rework on the IRQ hardening of virtio

2022-05-09 06:47:45

[permalink] [raw]

Subject: [PATCH V4 0/9] rework on the IRQ hardening of virtio

Hi All:

This is a rework on the IRQ hardening for virtio which is done
previously by the following commits are reverted:

9e35276a5344 ("virtio_pci: harden MSI-X interrupts")
080cd7c3ac87 ("virtio-pci: harden INTX interrupts")

The reason is that it depends on the IRQF_NO_AUTOEN which may conflict
with the assumption of the affinity managed IRQ that is used by some
virtio drivers. And what's more, it is only done for virtio-pci but
not other transports.

In this rework, I try to implement a general virtio solution which
borrows the idea of the INTX hardening by re-using per virtqueue
boolean vq->broken and toggle it in virtio_device_ready() and
virtio_reset_device(). Then we can simply reuse the existing checks in
the vring_interrupt() and return early if the driver is not ready.

Note that, I only did compile test on ccw and MMIO transport.

Please review.

Changes since v1:

- Use transport specific irq synchronization method when possible
- Drop the module parameter and enable the hardening unconditonally
- Tweak the barrier/ordering facilities used in the code
- Reanme irq_soft_enabled to driver_ready
- Avoid unnecssary IRQ synchornization (e.g during boot)

Changes since V2:

- add ccw and MMIO support
- rename synchronize_vqs() to synchronize_cbs()
- switch to re-use vq->broken instead of introducing new device
attributes for the future virtqueue reset support
- remove unnecssary READ_ONCE()/WRITE_ONCE()
- a new patch to remove device triggerable BUG_ON()
- more tweaks on the comments

Changes since V3:

- Rename synchornize_vqs() to synchronize_cbs()
- tweak the comment for synchronize_cbs()
- switch to use a dedicated helper __virtio_unbreak_device() and
document it should be only used for probing
- switch to use rwlick to synchornize the non airq for ccw

Jason Wang (8):
virtio: use virtio_reset_device() when possible
virtio: introduce config op to synchronize vring callbacks
virtio-pci: implement synchronize_cbs()
virtio-mmio: implement synchronize_cbs()
virtio-ccw: implement synchronize_cbs()
virtio: allow to unbreak virtqueue
virtio: harden vring IRQ
virtio: use WARN_ON() to warning illegal status value

Stefano Garzarella (1):
virtio: use virtio_device_ready() in virtio_device_restore()

drivers/s390/virtio/virtio_ccw.c | 27 +++++++++++++++++++++
drivers/virtio/virtio.c | 24 ++++++++++++------
drivers/virtio/virtio_mmio.c | 9 +++++++
drivers/virtio/virtio_pci_legacy.c | 1 +
drivers/virtio/virtio_pci_modern.c | 2 ++
drivers/virtio/virtio_ring.c | 32 +++++++++++++++++++++---
include/linux/virtio.h | 1 +
include/linux/virtio_config.h | 39 +++++++++++++++++++++++++++++-
8 files changed, 123 insertions(+), 12 deletions(-)

--
2.25.1

2022-05-09 07:32:12

[permalink] [raw]

Subject: [PATCH V4 9/9] virtio: use WARN_ON() to warning illegal status value

We used to use BUG_ON() in virtio_device_ready() to detect illegal
status value, this seems sub-optimal since the value is under the
control of the device. Switch to use WARN_ON() instead.

Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Halil Pasic <[email protected]>
Cc: Cornelia Huck <[email protected]>
Signed-off-by: Jason Wang <[email protected]>
---
include/linux/virtio_config.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index 23f1694cdbd5..5d539f39f7ee 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -255,7 +255,7 @@ void virtio_device_ready(struct virtio_device *dev)
{
unsigned status = dev->config->get_status(dev);

- BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
+ WARN_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);

/*
* The virtio_synchronize_cbs() makes sure vring_interrupt()
--
2.25.1

2022-05-09 08:42:30

[permalink] [raw]

Subject: [PATCH V4 8/9] virtio: harden vring IRQ

This is a rework on the previous IRQ hardening that is done for
virtio-pci where several drawbacks were found and were reverted:

1) try to use IRQF_NO_AUTOEN which is not friendly to affinity managed IRQ
that is used by some device such as virtio-blk
2) done only for PCI transport

The vq->broken is re-used in this patch for implementing the IRQ
hardening. The vq->broken is set to true during both initialization
and reset. And the vq->broken is set to false in
virtio_device_ready(). Then vring_interrupt can check and return when
vq->broken is true. And in this case, switch to return IRQ_NONE to let
the interrupt core aware of such invalid interrupt to prevent IRQ
storm.

The reason of using a per queue variable instead of a per device one
is that we may need it for per queue reset hardening in the future.

Note that the hardening is only done for vring interrupt since the
config interrupt hardening is already done in commit 22b7050a024d7
("virtio: defer config changed notifications"). But the method that is
used by config interrupt can't be reused by the vring interrupt
handler because it uses spinlock to do the synchronization which is
expensive.

Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Halil Pasic <[email protected]>
Cc: Cornelia Huck <[email protected]>
Signed-off-by: Jason Wang <[email protected]>
---
drivers/virtio/virtio.c | 15 ++++++++++++---
drivers/virtio/virtio_ring.c | 11 +++++++----
include/linux/virtio_config.h | 12 ++++++++++++
3 files changed, 31 insertions(+), 7 deletions(-)

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index 8dde44ea044a..696f5ba4f38e 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -220,6 +220,15 @@ static int virtio_features_ok(struct virtio_device *dev)
* */
void virtio_reset_device(struct virtio_device *dev)
{
+ /*
+ * The below virtio_synchronize_cbs() guarantees that any
+ * interrupt for this line arriving after
+ * virtio_synchronize_vqs() has completed is guaranteed to see
+ * driver_ready == false.
+ */
+ virtio_break_device(dev);
+ virtio_synchronize_cbs(dev);
+
dev->config->reset(dev);
}
EXPORT_SYMBOL_GPL(virtio_reset_device);
@@ -428,6 +437,9 @@ int register_virtio_device(struct virtio_device *dev)
dev->config_enabled = false;
dev->config_change_pending = false;

+ INIT_LIST_HEAD(&dev->vqs);
+ spin_lock_init(&dev->vqs_list_lock);
+
/* We always start by resetting the device, in case a previous
* driver messed it up. This also tests that code path a little. */
virtio_reset_device(dev);
@@ -435,9 +447,6 @@ int register_virtio_device(struct virtio_device *dev)
/* Acknowledge that we've seen the device. */
virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);

- INIT_LIST_HEAD(&dev->vqs);
- spin_lock_init(&dev->vqs_list_lock);
-
/*
* device_add() causes the bus infrastructure to look for a matching
* driver.
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 5b7df7c455f0..9dfad2890d7a 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1690,7 +1690,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
vq->we_own_ring = true;
vq->notify = notify;
vq->weak_barriers = weak_barriers;
- vq->broken = false;
+ vq->broken = true;
vq->last_used_idx = 0;
vq->event_triggered = false;
vq->num_added = 0;
@@ -2136,8 +2136,11 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
return IRQ_NONE;
}

- if (unlikely(vq->broken))
- return IRQ_HANDLED;
+ if (unlikely(vq->broken)) {
+ dev_warn_once(&vq->vq.vdev->dev,
+ "virtio vring IRQ raised before DRIVER_OK");
+ return IRQ_NONE;
+ }

/* Just a hint for performance: so it's ok that this can be racy! */
if (vq->event)
@@ -2179,7 +2182,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
vq->we_own_ring = false;
vq->notify = notify;
vq->weak_barriers = weak_barriers;
- vq->broken = false;
+ vq->broken = true;
vq->last_used_idx = 0;
vq->event_triggered = false;
vq->num_added = 0;
diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index d8a2340f928e..23f1694cdbd5 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -256,6 +256,18 @@ void virtio_device_ready(struct virtio_device *dev)
unsigned status = dev->config->get_status(dev);

BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
+
+ /*
+ * The virtio_synchronize_cbs() makes sure vring_interrupt()
+ * will see the driver specific setup if it sees vq->broken
+ * as false.
+ */
+ virtio_synchronize_cbs(dev);
+ __virtio_unbreak_device(dev);
+ /*
+ * The transport is expected ensure the visibility of
+ * vq->broken before setting VIRTIO_CONFIG_S_DRIVER_OK.
+ */
dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
}

--
2.25.1

2022-05-09 08:48:31

[permalink] [raw]

Subject: [PATCH V4 2/9] virtio: use virtio_reset_device() when possible

This allows us to do common extension without duplicating code.

Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Halil Pasic <[email protected]>
Cc: Cornelia Huck <[email protected]>
Signed-off-by: Jason Wang <[email protected]>
---
drivers/virtio/virtio.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index 75c8d560bbd3..8dde44ea044a 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -430,7 +430,7 @@ int register_virtio_device(struct virtio_device *dev)

/* We always start by resetting the device, in case a previous
* driver messed it up. This also tests that code path a little. */
- dev->config->reset(dev);
+ virtio_reset_device(dev);

/* Acknowledge that we've seen the device. */
virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
@@ -496,7 +496,7 @@ int virtio_device_restore(struct virtio_device *dev)

/* We always start by resetting the device, in case a previous
* driver messed it up. */
- dev->config->reset(dev);
+ virtio_reset_device(dev);

/* Acknowledge that we've seen the device. */
virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
--
2.25.1

2022-05-09 09:45:38

[permalink] [raw]

Subject: [PATCH V4 7/9] virtio: allow to unbreak virtqueue

This patch allows the new introduced __virtio_break_device() to
unbreak the virtqueue.

Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Halil Pasic <[email protected]>
Cc: Cornelia Huck <[email protected]>
Signed-off-by: Jason Wang <[email protected]>
---
drivers/virtio/virtio_ring.c | 21 +++++++++++++++++++++
include/linux/virtio.h | 1 +
2 files changed, 22 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index cfb028ca238e..5b7df7c455f0 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2397,6 +2397,27 @@ void virtio_break_device(struct virtio_device *dev)
}
EXPORT_SYMBOL_GPL(virtio_break_device);

+/*
+ * This should allow the device to be used by the driver. You may
+ * need to grab appropriate locks to flush. This should only be used
+ * in some specific case e.g (probing and restoring). Driver should
+ * not call this directly.
+ */
+void __virtio_unbreak_device(struct virtio_device *dev)
+{
+ struct virtqueue *_vq;
+
+ spin_lock(&dev->vqs_list_lock);
+ list_for_each_entry(_vq, &dev->vqs, list) {
+ struct vring_virtqueue *vq = to_vvq(_vq);
+
+ /* Pairs with READ_ONCE() in virtqueue_is_broken(). */
+ WRITE_ONCE(vq->broken, false);
+ }
+ spin_unlock(&dev->vqs_list_lock);
+}
+EXPORT_SYMBOL_GPL(__virtio_unbreak_device);
+
dma_addr_t virtqueue_get_desc_addr(struct virtqueue *_vq)
{
struct vring_virtqueue *vq = to_vvq(_vq);
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 5464f398912a..d8fdf170637c 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -131,6 +131,7 @@ void unregister_virtio_device(struct virtio_device *dev);
bool is_virtio_device(struct device *dev);

void virtio_break_device(struct virtio_device *dev);
+void __virtio_unbreak_device(struct virtio_device *dev);

void virtio_config_changed(struct virtio_device *dev);
#ifdef CONFIG_PM_SLEEP
--
2.25.1

2022-05-09 11:17:33

[permalink] [raw]

Subject: [PATCH V4 3/9] virtio: introduce config op to synchronize vring callbacks

This patch introduces new virtio config op to vring
callbacks. Transport specific method is required to make sure the
write before this function is visible to the vring_interrupt() that is
called after the return of this function. For the transport that
doesn't provide synchronize_vqs(), use synchornize_rcu() which
synchronize with IRQ implicitly as a fallback.

Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Halil Pasic <[email protected]>
Cc: Cornelia Huck <[email protected]>
Signed-off-by: Jason Wang <[email protected]>
---
include/linux/virtio_config.h | 25 +++++++++++++++++++++++++
1 file changed, 25 insertions(+)

diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index b341dd62aa4d..d8a2340f928e 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -57,6 +57,11 @@ struct virtio_shm_region {
* include a NULL entry for vqs unused by driver
* Returns 0 on success or error status
* @del_vqs: free virtqueues found by find_vqs().
+ * @synchronize_cbs: synchronize with the virtqueue callbacks (optional)
+ * The function guarantees that all memory operations on the
+ * queue before it are visible to the vring_interrupt() that is
+ * called after it.
+ * vdev: the virtio_device
* @get_features: get the array of feature bits for this device.
* vdev: the virtio_device
* Returns the first 64 feature bits (all we currently need).
@@ -89,6 +94,7 @@ struct virtio_config_ops {
const char * const names[], const bool *ctx,
struct irq_affinity *desc);
void (*del_vqs)(struct virtio_device *);
+ void (*synchronize_cbs)(struct virtio_device *);
u64 (*get_features)(struct virtio_device *vdev);
int (*finalize_features)(struct virtio_device *vdev);
const char *(*bus_name)(struct virtio_device *vdev);
@@ -217,6 +223,25 @@ int virtio_find_vqs_ctx(struct virtio_device *vdev, unsigned nvqs,
desc);
}

+/**
+ * virtio_synchronize_cbs - synchronize with virtqueue callbacks
+ * @vdev: the device
+ */
+static inline
+void virtio_synchronize_cbs(struct virtio_device *dev)
+{
+ if (dev->config->synchronize_cbs) {
+ dev->config->synchronize_cbs(dev);
+ } else {
+ /*
+ * A best effort fallback to synchronize with
+ * interrupts, preemption and softirq. See comment
+ * above synchronize_rcu().
+ */
+ synchronize_rcu();
+ }
+}
+
/**
* virtio_device_ready - enable vq use in probe function
* @vdev: the device
--
2.25.1

2022-05-09 15:30:52

by Cornelia Huck

[permalink] [raw]

Subject: Re: [PATCH V4 3/9] virtio: introduce config op to synchronize vring callbacks

On Sat, May 07 2022, Jason Wang <[email protected]> wrote:

> This patch introduces new virtio config op to vring
> callbacks. Transport specific method is required to make sure the
> write before this function is visible to the vring_interrupt() that is
> called after the return of this function. For the transport that
> doesn't provide synchronize_vqs(), use synchornize_rcu() which
> synchronize with IRQ implicitly as a fallback.
>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: "Paul E. McKenney" <[email protected]>
> Cc: Marc Zyngier <[email protected]>
> Cc: Halil Pasic <[email protected]>
> Cc: Cornelia Huck <[email protected]>
> Signed-off-by: Jason Wang <[email protected]>
> ---
> include/linux/virtio_config.h | 25 +++++++++++++++++++++++++
> 1 file changed, 25 insertions(+)

Reviewed-by: Cornelia Huck <[email protected]>

2022-05-10 15:33:38

by Michael S. Tsirkin

[permalink] [raw]

Subject: Re: [PATCH V4 8/9] virtio: harden vring IRQ

On Sat, May 07, 2022 at 03:19:53PM +0800, Jason Wang wrote:
> This is a rework on the previous IRQ hardening that is done for
> virtio-pci where several drawbacks were found and were reverted:
>
> 1) try to use IRQF_NO_AUTOEN which is not friendly to affinity managed IRQ
> that is used by some device such as virtio-blk
> 2) done only for PCI transport
>
> The vq->broken is re-used in this patch for implementing the IRQ
> hardening. The vq->broken is set to true during both initialization
> and reset. And the vq->broken is set to false in
> virtio_device_ready(). Then vring_interrupt can check and return when
> vq->broken is true. And in this case, switch to return IRQ_NONE to let
> the interrupt core aware of such invalid interrupt to prevent IRQ
> storm.
>
> The reason of using a per queue variable instead of a per device one
> is that we may need it for per queue reset hardening in the future.
>
> Note that the hardening is only done for vring interrupt since the
> config interrupt hardening is already done in commit 22b7050a024d7
> ("virtio: defer config changed notifications"). But the method that is
> used by config interrupt can't be reused by the vring interrupt
> handler because it uses spinlock to do the synchronization which is
> expensive.
>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: "Paul E. McKenney" <[email protected]>
> Cc: Marc Zyngier <[email protected]>
> Cc: Halil Pasic <[email protected]>
> Cc: Cornelia Huck <[email protected]>
> Signed-off-by: Jason Wang <[email protected]>
> ---
> drivers/virtio/virtio.c | 15 ++++++++++++---
> drivers/virtio/virtio_ring.c | 11 +++++++----
> include/linux/virtio_config.h | 12 ++++++++++++
> 3 files changed, 31 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index 8dde44ea044a..696f5ba4f38e 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -220,6 +220,15 @@ static int virtio_features_ok(struct virtio_device *dev)
> * */
> void virtio_reset_device(struct virtio_device *dev)
> {
> + /*
> + * The below virtio_synchronize_cbs() guarantees that any
> + * interrupt for this line arriving after
> + * virtio_synchronize_vqs() has completed is guaranteed to see
> + * driver_ready == false.
> + */
> + virtio_break_device(dev);
> + virtio_synchronize_cbs(dev);
> +
> dev->config->reset(dev);
> }
> EXPORT_SYMBOL_GPL(virtio_reset_device);
> @@ -428,6 +437,9 @@ int register_virtio_device(struct virtio_device *dev)
> dev->config_enabled = false;
> dev->config_change_pending = false;
>
> + INIT_LIST_HEAD(&dev->vqs);
> + spin_lock_init(&dev->vqs_list_lock);
> +
> /* We always start by resetting the device, in case a previous
> * driver messed it up. This also tests that code path a little. */
> virtio_reset_device(dev);
> @@ -435,9 +447,6 @@ int register_virtio_device(struct virtio_device *dev)
> /* Acknowledge that we've seen the device. */
> virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
>
> - INIT_LIST_HEAD(&dev->vqs);
> - spin_lock_init(&dev->vqs_list_lock);
> -
> /*
> * device_add() causes the bus infrastructure to look for a matching
> * driver.
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 5b7df7c455f0..9dfad2890d7a 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -1690,7 +1690,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> vq->we_own_ring = true;
> vq->notify = notify;
> vq->weak_barriers = weak_barriers;
> - vq->broken = false;
> + vq->broken = true;
> vq->last_used_idx = 0;
> vq->event_triggered = false;
> vq->num_added = 0;
> @@ -2136,8 +2136,11 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
> return IRQ_NONE;
> }
>
> - if (unlikely(vq->broken))
> - return IRQ_HANDLED;
> + if (unlikely(vq->broken)) {
> + dev_warn_once(&vq->vq.vdev->dev,
> + "virtio vring IRQ raised before DRIVER_OK");
> + return IRQ_NONE;
> + }
>
> /* Just a hint for performance: so it's ok that this can be racy! */
> if (vq->event)
> @@ -2179,7 +2182,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
> vq->we_own_ring = false;
> vq->notify = notify;
> vq->weak_barriers = weak_barriers;
> - vq->broken = false;
> + vq->broken = true;
> vq->last_used_idx = 0;
> vq->event_triggered = false;
> vq->num_added = 0;
> diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> index d8a2340f928e..23f1694cdbd5 100644
> --- a/include/linux/virtio_config.h
> +++ b/include/linux/virtio_config.h
> @@ -256,6 +256,18 @@ void virtio_device_ready(struct virtio_device *dev)
> unsigned status = dev->config->get_status(dev);
>
> BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
> +
> + /*
> + * The virtio_synchronize_cbs() makes sure vring_interrupt()
> + * will see the driver specific setup if it sees vq->broken
> + * as false.
> + */
> + virtio_synchronize_cbs(dev);

since you mention vq->broken above, maybe add
"set vq->broken to false"

> + __virtio_unbreak_device(dev);
> + /*
> + * The transport is expected ensure the visibility of

to ensure

> + * vq->broken

let's add: "visibility by vq callbacks"

> before setting VIRTIO_CONFIG_S_DRIVER_OK.
> + */

Can I see some analysis of existing transports showing
this is actually the case for them?
And maybe add a comment near set_status to document the
requirement.

> dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
> }
>
> --
> 2.25.1

2022-05-10 17:28:07

by Cornelia Huck

[permalink] [raw]

Subject: Re: [PATCH V4 0/9] rework on the IRQ hardening of virtio

On Sat, May 07 2022, Jason Wang <[email protected]> wrote:

> Hi All:
>
> This is a rework on the IRQ hardening for virtio which is done
> previously by the following commits are reverted:
>
> 9e35276a5344 ("virtio_pci: harden MSI-X interrupts")
> 080cd7c3ac87 ("virtio-pci: harden INTX interrupts")
>
> The reason is that it depends on the IRQF_NO_AUTOEN which may conflict
> with the assumption of the affinity managed IRQ that is used by some
> virtio drivers. And what's more, it is only done for virtio-pci but
> not other transports.
>
> In this rework, I try to implement a general virtio solution which
> borrows the idea of the INTX hardening by re-using per virtqueue
> boolean vq->broken and toggle it in virtio_device_ready() and
> virtio_reset_device(). Then we can simply reuse the existing checks in
> the vring_interrupt() and return early if the driver is not ready.
>
> Note that, I only did compile test on ccw and MMIO transport.

Lockdep is unhappy with the ccw parts:

================================
WARNING: inconsistent lock state
5.18.0-rc6+ #191 Not tainted
--------------------------------
inconsistent {IN-HARDIRQ-R} -> {HARDIRQ-ON-W} usage.
kworker/u4:0/9 [HC0[0]:SC0[0]:HE1:SE1] takes:
00000000058e9618 (&vcdev->irq_lock){+-..}-{2:2}, at: virtio_ccw_synchronize_cbs+0x4e/0x60
{IN-HARDIRQ-R} state was registered at:
__lock_acquire+0x442/0xc20
lock_acquire.part.0+0xdc/0x228
lock_acquire+0xa6/0x1b0
_raw_read_lock_irqsave+0x72/0x100
virtio_ccw_int_handler+0x84/0x238
ccw_device_call_handler+0x72/0xd0
ccw_device_irq+0x7a/0x198
do_cio_interrupt+0x11c/0x1d0
__handle_irq_event_percpu+0xc2/0x318
handle_irq_event_percpu+0x26/0x68
handle_percpu_irq+0x64/0x88
generic_handle_irq+0x40/0x58
do_irq_async+0x56/0xb0
do_io_irq+0x82/0x160
io_int_handler+0xe6/0x120
rcu_read_lock_sched_held+0x3e/0xb0
lock_acquired+0x12e/0x208
new_inode+0x3e/0xd0
debugfs_get_inode+0x22/0x68
__debugfs_create_file+0x78/0x1c0
debugfs_create_file_unsafe+0x36/0x58
debugfs_create_u32+0x38/0x68
sched_init_debug+0xb0/0x1c0
do_one_initcall+0x108/0x280
do_initcalls+0x124/0x148
kernel_init_freeable+0x242/0x280
kernel_init+0x2e/0x158
__ret_from_fork+0x3c/0x50
ret_from_fork+0xa/0x40
irq event stamp: 539789
hardirqs last enabled at (539789): [<0000000000d9c632>] _raw_spin_unlock_irqrestore+0x72/0x88
hardirqs last disabled at (539788): [<0000000000d9c2b6>] _raw_spin_lock_irqsave+0x96/0xd0
softirqs last enabled at (539568): [<0000000000d9e0d4>] __do_softirq+0x434/0x588
softirqs last disabled at (539503): [<000000000018cd66>] __irq_exit_rcu+0x146/0x170

other info that might help us debug this:
Possible unsafe locking scenario:

CPU0
----
lock(&vcdev->irq_lock);
<Interrupt>
lock(&vcdev->irq_lock);

*** DEADLOCK ***

2 locks held by kworker/u4:0/9:
#0: 000000000288d948 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x1ea/0x658
#1: 000003800004bdc8 ((work_completion)(&entry->work)){+.+.}-{0:0}, at: process_one_work+0x1ea/0x658

stack backtrace:
CPU: 1 PID: 9 Comm: kworker/u4:0 Not tainted 5.18.0-rc6+ #191
Hardware name: QEMU 8561 QEMU (KVM/Linux)
Workqueue: events_unbound async_run_entry_fn
Call Trace:
[<0000000000d8af22>] dump_stack_lvl+0x92/0xd0
[<00000000002032ac>] mark_lock_irq+0x864/0x968
[<0000000000203670>] mark_lock.part.0+0x2c0/0x790
[<0000000000203cea>] mark_usage+0x10a/0x178
[<000000000020692a>] __lock_acquire+0x442/0xc20
[<0000000000207cc4>] lock_acquire.part.0+0xdc/0x228
[<0000000000207eb6>] lock_acquire+0xa6/0x1b0
[<0000000000d9c774>] _raw_write_lock+0x54/0xa8
[<0000000000d5a1f6>] virtio_ccw_synchronize_cbs+0x4e/0x60
[<00000000008eec04>] register_virtio_device+0xdc/0x1b0
[<0000000000d5aabe>] virtio_ccw_online+0x246/0x2e8
[<0000000000c9fecc>] ccw_device_set_online+0x1c4/0x540
[<0000000000d5a05e>] virtio_ccw_auto_online+0x26/0x50
[<00000000001ba2b0>] async_run_entry_fn+0x40/0x108
[<00000000001ab9b4>] process_one_work+0x2a4/0x658
[<00000000001abdd0>] worker_thread+0x68/0x440
[<00000000001b4668>] kthread+0x128/0x130
[<0000000000102fac>] __ret_from_fork+0x3c/0x50
[<0000000000d9d3aa>] ret_from_fork+0xa/0x40
INFO: lockdep is turned off.

2022-05-11 05:55:40

[permalink] [raw]

Subject: Re: [PATCH V4 0/9] rework on the IRQ hardening of virtio

On Tue, May 10, 2022 at 5:29 PM Cornelia Huck <[email protected]> wrote:
>
> On Sat, May 07 2022, Jason Wang <[email protected]> wrote:
>
> > Hi All:
> >
> > This is a rework on the IRQ hardening for virtio which is done
> > previously by the following commits are reverted:
> >
> > 9e35276a5344 ("virtio_pci: harden MSI-X interrupts")
> > 080cd7c3ac87 ("virtio-pci: harden INTX interrupts")
> >
> > The reason is that it depends on the IRQF_NO_AUTOEN which may conflict
> > with the assumption of the affinity managed IRQ that is used by some
> > virtio drivers. And what's more, it is only done for virtio-pci but
> > not other transports.
> >
> > In this rework, I try to implement a general virtio solution which
> > borrows the idea of the INTX hardening by re-using per virtqueue
> > boolean vq->broken and toggle it in virtio_device_ready() and
> > virtio_reset_device(). Then we can simply reuse the existing checks in
> > the vring_interrupt() and return early if the driver is not ready.
> >
> > Note that, I only did compile test on ccw and MMIO transport.
>
> Lockdep is unhappy with the ccw parts:
>
> ================================
> WARNING: inconsistent lock state
> 5.18.0-rc6+ #191 Not tainted
> --------------------------------
> inconsistent {IN-HARDIRQ-R} -> {HARDIRQ-ON-W} usage.
> kworker/u4:0/9 [HC0[0]:SC0[0]:HE1:SE1] takes:
> 00000000058e9618 (&vcdev->irq_lock){+-..}-{2:2}, at: virtio_ccw_synchronize_cbs+0x4e/0x60
> {IN-HARDIRQ-R} state was registered at:
> __lock_acquire+0x442/0xc20
> lock_acquire.part.0+0xdc/0x228
> lock_acquire+0xa6/0x1b0
> _raw_read_lock_irqsave+0x72/0x100
> virtio_ccw_int_handler+0x84/0x238
> ccw_device_call_handler+0x72/0xd0
> ccw_device_irq+0x7a/0x198
> do_cio_interrupt+0x11c/0x1d0
> __handle_irq_event_percpu+0xc2/0x318
> handle_irq_event_percpu+0x26/0x68
> handle_percpu_irq+0x64/0x88
> generic_handle_irq+0x40/0x58
> do_irq_async+0x56/0xb0
> do_io_irq+0x82/0x160
> io_int_handler+0xe6/0x120
> rcu_read_lock_sched_held+0x3e/0xb0
> lock_acquired+0x12e/0x208
> new_inode+0x3e/0xd0
> debugfs_get_inode+0x22/0x68
> __debugfs_create_file+0x78/0x1c0
> debugfs_create_file_unsafe+0x36/0x58
> debugfs_create_u32+0x38/0x68
> sched_init_debug+0xb0/0x1c0
> do_one_initcall+0x108/0x280
> do_initcalls+0x124/0x148
> kernel_init_freeable+0x242/0x280
> kernel_init+0x2e/0x158
> __ret_from_fork+0x3c/0x50
> ret_from_fork+0xa/0x40
> irq event stamp: 539789
> hardirqs last enabled at (539789): [<0000000000d9c632>] _raw_spin_unlock_irqrestore+0x72/0x88
> hardirqs last disabled at (539788): [<0000000000d9c2b6>] _raw_spin_lock_irqsave+0x96/0xd0
> softirqs last enabled at (539568): [<0000000000d9e0d4>] __do_softirq+0x434/0x588
> softirqs last disabled at (539503): [<000000000018cd66>] __irq_exit_rcu+0x146/0x170
>
> other info that might help us debug this:
> Possible unsafe locking scenario:
>
> CPU0
> ----
> lock(&vcdev->irq_lock);
> <Interrupt>
> lock(&vcdev->irq_lock);
>
> *** DEADLOCK ***

It looks to me we need to use write_lock_irq()/write_unlock_irq() to
do the synchronization.

And we probably need to keep the
read_lock_irqsave()/read_lock_irqrestore() logic since I can see the
virtio_ccw_int_handler() to be called from process context (e.g from
the io_subchannel_quiesce()).

Thanks

>
> 2 locks held by kworker/u4:0/9:
> #0: 000000000288d948 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x1ea/0x658
> #1: 000003800004bdc8 ((work_completion)(&entry->work)){+.+.}-{0:0}, at: process_one_work+0x1ea/0x658
>
> stack backtrace:
> CPU: 1 PID: 9 Comm: kworker/u4:0 Not tainted 5.18.0-rc6+ #191
> Hardware name: QEMU 8561 QEMU (KVM/Linux)
> Workqueue: events_unbound async_run_entry_fn
> Call Trace:
> [<0000000000d8af22>] dump_stack_lvl+0x92/0xd0
> [<00000000002032ac>] mark_lock_irq+0x864/0x968
> [<0000000000203670>] mark_lock.part.0+0x2c0/0x790
> [<0000000000203cea>] mark_usage+0x10a/0x178
> [<000000000020692a>] __lock_acquire+0x442/0xc20
> [<0000000000207cc4>] lock_acquire.part.0+0xdc/0x228
> [<0000000000207eb6>] lock_acquire+0xa6/0x1b0
> [<0000000000d9c774>] _raw_write_lock+0x54/0xa8
> [<0000000000d5a1f6>] virtio_ccw_synchronize_cbs+0x4e/0x60
> [<00000000008eec04>] register_virtio_device+0xdc/0x1b0
> [<0000000000d5aabe>] virtio_ccw_online+0x246/0x2e8
> [<0000000000c9fecc>] ccw_device_set_online+0x1c4/0x540
> [<0000000000d5a05e>] virtio_ccw_auto_online+0x26/0x50
> [<00000000001ba2b0>] async_run_entry_fn+0x40/0x108
> [<00000000001ab9b4>] process_one_work+0x2a4/0x658
> [<00000000001abdd0>] worker_thread+0x68/0x440
> [<00000000001b4668>] kthread+0x128/0x130
> [<0000000000102fac>] __ret_from_fork+0x3c/0x50
> [<0000000000d9d3aa>] ret_from_fork+0xa/0x40
> INFO: lockdep is turned off.
>

2022-05-11 09:40:42

[permalink] [raw]

Subject: Re: [PATCH V4 8/9] virtio: harden vring IRQ

On Tue, May 10, 2022 at 7:32 PM Michael S. Tsirkin <[email protected]> wrote:
>
> On Sat, May 07, 2022 at 03:19:53PM +0800, Jason Wang wrote:
> > This is a rework on the previous IRQ hardening that is done for
> > virtio-pci where several drawbacks were found and were reverted:
> >
> > 1) try to use IRQF_NO_AUTOEN which is not friendly to affinity managed IRQ
> > that is used by some device such as virtio-blk
> > 2) done only for PCI transport
> >
> > The vq->broken is re-used in this patch for implementing the IRQ
> > hardening. The vq->broken is set to true during both initialization
> > and reset. And the vq->broken is set to false in
> > virtio_device_ready(). Then vring_interrupt can check and return when
> > vq->broken is true. And in this case, switch to return IRQ_NONE to let
> > the interrupt core aware of such invalid interrupt to prevent IRQ
> > storm.
> >
> > The reason of using a per queue variable instead of a per device one
> > is that we may need it for per queue reset hardening in the future.
> >
> > Note that the hardening is only done for vring interrupt since the
> > config interrupt hardening is already done in commit 22b7050a024d7
> > ("virtio: defer config changed notifications"). But the method that is
> > used by config interrupt can't be reused by the vring interrupt
> > handler because it uses spinlock to do the synchronization which is
> > expensive.
> >
> > Cc: Thomas Gleixner <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > Cc: "Paul E. McKenney" <[email protected]>
> > Cc: Marc Zyngier <[email protected]>
> > Cc: Halil Pasic <[email protected]>
> > Cc: Cornelia Huck <[email protected]>
> > Signed-off-by: Jason Wang <[email protected]>
> > ---
> > drivers/virtio/virtio.c | 15 ++++++++++++---
> > drivers/virtio/virtio_ring.c | 11 +++++++----
> > include/linux/virtio_config.h | 12 ++++++++++++
> > 3 files changed, 31 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > index 8dde44ea044a..696f5ba4f38e 100644
> > --- a/drivers/virtio/virtio.c
> > +++ b/drivers/virtio/virtio.c
> > @@ -220,6 +220,15 @@ static int virtio_features_ok(struct virtio_device *dev)
> > * */
> > void virtio_reset_device(struct virtio_device *dev)
> > {
> > + /*
> > + * The below virtio_synchronize_cbs() guarantees that any
> > + * interrupt for this line arriving after
> > + * virtio_synchronize_vqs() has completed is guaranteed to see
> > + * driver_ready == false.
> > + */
> > + virtio_break_device(dev);
> > + virtio_synchronize_cbs(dev);
> > +
> > dev->config->reset(dev);
> > }
> > EXPORT_SYMBOL_GPL(virtio_reset_device);
> > @@ -428,6 +437,9 @@ int register_virtio_device(struct virtio_device *dev)
> > dev->config_enabled = false;
> > dev->config_change_pending = false;
> >
> > + INIT_LIST_HEAD(&dev->vqs);
> > + spin_lock_init(&dev->vqs_list_lock);
> > +
> > /* We always start by resetting the device, in case a previous
> > * driver messed it up. This also tests that code path a little. */
> > virtio_reset_device(dev);
> > @@ -435,9 +447,6 @@ int register_virtio_device(struct virtio_device *dev)
> > /* Acknowledge that we've seen the device. */
> > virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> >
> > - INIT_LIST_HEAD(&dev->vqs);
> > - spin_lock_init(&dev->vqs_list_lock);
> > -
> > /*
> > * device_add() causes the bus infrastructure to look for a matching
> > * driver.
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index 5b7df7c455f0..9dfad2890d7a 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -1690,7 +1690,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> > vq->we_own_ring = true;
> > vq->notify = notify;
> > vq->weak_barriers = weak_barriers;
> > - vq->broken = false;
> > + vq->broken = true;
> > vq->last_used_idx = 0;
> > vq->event_triggered = false;
> > vq->num_added = 0;
> > @@ -2136,8 +2136,11 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
> > return IRQ_NONE;
> > }
> >
> > - if (unlikely(vq->broken))
> > - return IRQ_HANDLED;
> > + if (unlikely(vq->broken)) {
> > + dev_warn_once(&vq->vq.vdev->dev,
> > + "virtio vring IRQ raised before DRIVER_OK");
> > + return IRQ_NONE;
> > + }
> >
> > /* Just a hint for performance: so it's ok that this can be racy! */
> > if (vq->event)
> > @@ -2179,7 +2182,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
> > vq->we_own_ring = false;
> > vq->notify = notify;
> > vq->weak_barriers = weak_barriers;
> > - vq->broken = false;
> > + vq->broken = true;
> > vq->last_used_idx = 0;
> > vq->event_triggered = false;
> > vq->num_added = 0;
> > diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> > index d8a2340f928e..23f1694cdbd5 100644
> > --- a/include/linux/virtio_config.h
> > +++ b/include/linux/virtio_config.h
> > @@ -256,6 +256,18 @@ void virtio_device_ready(struct virtio_device *dev)
> > unsigned status = dev->config->get_status(dev);
> >
> > BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
> > +
> > + /*
> > + * The virtio_synchronize_cbs() makes sure vring_interrupt()
> > + * will see the driver specific setup if it sees vq->broken
> > + * as false.
> > + */
> > + virtio_synchronize_cbs(dev);
>
> since you mention vq->broken above, maybe add
> "set vq->broken to false"

Ok.

>
> > + __virtio_unbreak_device(dev);
> > + /*
> > + * The transport is expected ensure the visibility of
>
> to ensure

Will fix.

>
> > + * vq->broken
>
> let's add: "visibility by vq callbacks"

Sure.

>
> > before setting VIRTIO_CONFIG_S_DRIVER_OK.
> > + */
>
>
> Can I see some analysis of existing transports showing
> this is actually the case for them?

Yes.

> And maybe add a comment near set_status to document the
> requirement.

For PCI and MMIO, we can quote the memory-barriers.txt or explain that
wmb() is not needed before the MMIO writel().
For CCW, it looks not obvious, it looks to me the IO was submitted via
__ssch() which has an inline assembly. Cornelia and Hali, could you
help me to understand if and how did virtio_ccw_set_status() can
ensure the visibility of the previous driver setup and vq->broken
here?

Thanks

>
> > dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
> > }
> >
> > --
> > 2.25.1
>

2022-05-11 12:20:42

[permalink] [raw]

Subject: Re: [PATCH V4 8/9] virtio: harden vring IRQ

On Wed, May 11, 2022 at 4:44 PM Cornelia Huck <[email protected]> wrote:
>
> On Wed, May 11 2022, Jason Wang <[email protected]> wrote:
>
> > On Tue, May 10, 2022 at 7:32 PM Michael S. Tsirkin <[email protected]> wrote:
> >>
> >> On Sat, May 07, 2022 at 03:19:53PM +0800, Jason Wang wrote:
> >> > diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> >> > index d8a2340f928e..23f1694cdbd5 100644
> >> > --- a/include/linux/virtio_config.h
> >> > +++ b/include/linux/virtio_config.h
> >> > @@ -256,6 +256,18 @@ void virtio_device_ready(struct virtio_device *dev)
> >> > unsigned status = dev->config->get_status(dev);
> >> >
> >> > BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
> >> > +
> >> > + /*
> >> > + * The virtio_synchronize_cbs() makes sure vring_interrupt()
> >> > + * will see the driver specific setup if it sees vq->broken
> >> > + * as false.
> >> > + */
> >> > + virtio_synchronize_cbs(dev);
> >>
> >> since you mention vq->broken above, maybe add
> >> "set vq->broken to false"
> >
> > Ok.
> >
> >>
> >> > + __virtio_unbreak_device(dev);
> >> > + /*
> >> > + * The transport is expected ensure the visibility of
> >>
> >> to ensure
> >
> > Will fix.
> >
> >>
> >> > + * vq->broken
> >>
> >> let's add: "visibility by vq callbacks"
> >
> > Sure.
> >
> >>
> >> > before setting VIRTIO_CONFIG_S_DRIVER_OK.
> >> > + */
> >>
> >>
> >> Can I see some analysis of existing transports showing
> >> this is actually the case for them?
> >
> > Yes.
> >
> >> And maybe add a comment near set_status to document the
> >> requirement.
> >
> > For PCI and MMIO, we can quote the memory-barriers.txt or explain that
> > wmb() is not needed before the MMIO writel().
> > For CCW, it looks not obvious, it looks to me the IO was submitted via
> > __ssch() which has an inline assembly. Cornelia and Hali, could you
> > help me to understand if and how did virtio_ccw_set_status() can
> > ensure the visibility of the previous driver setup and vq->broken
> > here?
>
> I'm not sure I completely understand the question here, but let me try:

It's something like the following case:

CPU 0: vq->broken = false
CPU 0: set_status(DRIVER_OK)
CPU 1: vring_interrupt() { if (vq->broken) return IRQ_NONE; }

We need to make sure the CPU 1 sees the vq->broken if the interrupt is
raised after DRVER_OK.

For PCI, we use MMIO of writel() for set_status(), a wmb() is not
needed in this case according to memory-barriers.txt.

"
Note that, when using writel(), a prior
wmb() is not needed to guarantee that the cache coherent memory writes
have completed before writing to the MMIO region.
"

So CPU 1 will see the broken as false.

>
> virtio_ccw_set_status() uses a channel command to set the status, with
> the interesting stuff done inside ccw_io_helper(). That function
> - takes the subchannel lock, disabling interrupts

Then it is, for x86 the operation to disable interrupt is a full
barrier. I guess this should apply to other architecture like s390. I
see a stnsm is used in this case but a quick google doesn't tell me if
it's a barrier.
If this is true. The vring_interrupt will see broken as false.

> - does the ssch; this instruction will fail if there's already another
> I/O in progress, or an interrupt is pending for the subchannel; on
> success, it is guaranteed that we'll get an interrupt eventually

I guess ssch might imply a barrier as well, otherwise we may need a
lot of barriers before this.

Thanks

> - unlock the subchannel, and wait for the interupt handler to eventually
> process the interrupt, so I guess it should see the vq->broken value?
>
> If the I/O fails, virtio_ccw_set_status() will revert its internal
> status to the old value.
>
>
> >
> > Thanks
> >
> >>
> >> > dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
> >> > }
>

2022-05-12 02:23:04

by Cornelia Huck

[permalink] [raw]

Subject: Re: [PATCH V4 8/9] virtio: harden vring IRQ

On Wed, May 11 2022, Jason Wang <[email protected]> wrote:

> On Tue, May 10, 2022 at 7:32 PM Michael S. Tsirkin <[email protected]> wrote:
>>
>> On Sat, May 07, 2022 at 03:19:53PM +0800, Jason Wang wrote:
>> > diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
>> > index d8a2340f928e..23f1694cdbd5 100644
>> > --- a/include/linux/virtio_config.h
>> > +++ b/include/linux/virtio_config.h
>> > @@ -256,6 +256,18 @@ void virtio_device_ready(struct virtio_device *dev)
>> > unsigned status = dev->config->get_status(dev);
>> >
>> > BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
>> > +
>> > + /*
>> > + * The virtio_synchronize_cbs() makes sure vring_interrupt()
>> > + * will see the driver specific setup if it sees vq->broken
>> > + * as false.
>> > + */
>> > + virtio_synchronize_cbs(dev);
>>
>> since you mention vq->broken above, maybe add
>> "set vq->broken to false"
>
> Ok.
>
>>
>> > + __virtio_unbreak_device(dev);
>> > + /*
>> > + * The transport is expected ensure the visibility of
>>
>> to ensure
>
> Will fix.
>
>>
>> > + * vq->broken
>>
>> let's add: "visibility by vq callbacks"
>
> Sure.
>
>>
>> > before setting VIRTIO_CONFIG_S_DRIVER_OK.
>> > + */
>>
>>
>> Can I see some analysis of existing transports showing
>> this is actually the case for them?
>
> Yes.
>
>> And maybe add a comment near set_status to document the
>> requirement.
>
> For PCI and MMIO, we can quote the memory-barriers.txt or explain that
> wmb() is not needed before the MMIO writel().
> For CCW, it looks not obvious, it looks to me the IO was submitted via
> __ssch() which has an inline assembly. Cornelia and Hali, could you
> help me to understand if and how did virtio_ccw_set_status() can
> ensure the visibility of the previous driver setup and vq->broken
> here?

I'm not sure I completely understand the question here, but let me try:

virtio_ccw_set_status() uses a channel command to set the status, with
the interesting stuff done inside ccw_io_helper(). That function
- takes the subchannel lock, disabling interrupts
- does the ssch; this instruction will fail if there's already another
I/O in progress, or an interrupt is pending for the subchannel; on
success, it is guaranteed that we'll get an interrupt eventually
- unlock the subchannel, and wait for the interupt handler to eventually
process the interrupt, so I guess it should see the vq->broken value?

If the I/O fails, virtio_ccw_set_status() will revert its internal
status to the old value.

>
> Thanks
>
>>
>> > dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
>> > }

2022-05-12 07:28:34

[permalink] [raw]

Subject: Re: [PATCH V4 8/9] virtio: harden vring IRQ

On Wed, 11 May 2022 17:27:44 +0800
Jason Wang <[email protected]> wrote:

> On Wed, May 11, 2022 at 4:44 PM Cornelia Huck <[email protected]> wrote:
> >
> > On Wed, May 11 2022, Jason Wang <[email protected]> wrote:
> >
> > > On Tue, May 10, 2022 at 7:32 PM Michael S. Tsirkin <[email protected]> wrote:
> > >>
> > >> On Sat, May 07, 2022 at 03:19:53PM +0800, Jason Wang wrote:
> > >> > diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> > >> > index d8a2340f928e..23f1694cdbd5 100644
> > >> > --- a/include/linux/virtio_config.h
> > >> > +++ b/include/linux/virtio_config.h
> > >> > @@ -256,6 +256,18 @@ void virtio_device_ready(struct virtio_device *dev)
> > >> > unsigned status = dev->config->get_status(dev);
> > >> >
> > >> > BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
> > >> > +
> > >> > + /*
> > >> > + * The virtio_synchronize_cbs() makes sure vring_interrupt()
> > >> > + * will see the driver specific setup if it sees vq->broken
> > >> > + * as false.
> > >> > + */
> > >> > + virtio_synchronize_cbs(dev);
> > >>
> > >> since you mention vq->broken above, maybe add
> > >> "set vq->broken to false"
> > >
> > > Ok.
> > >
> > >>
> > >> > + __virtio_unbreak_device(dev);
> > >> > + /*
> > >> > + * The transport is expected ensure the visibility of
> > >>
> > >> to ensure
> > >
> > > Will fix.
> > >
> > >>
> > >> > + * vq->broken
> > >>
> > >> let's add: "visibility by vq callbacks"
> > >
> > > Sure.
> > >
> > >>
> > >> > before setting VIRTIO_CONFIG_S_DRIVER_OK.
> > >> > + */
> > >>
> > >>
> > >> Can I see some analysis of existing transports showing
> > >> this is actually the case for them?
> > >
> > > Yes.
> > >
> > >> And maybe add a comment near set_status to document the
> > >> requirement.
> > >
> > > For PCI and MMIO, we can quote the memory-barriers.txt or explain that
> > > wmb() is not needed before the MMIO writel().
> > > For CCW, it looks not obvious, it looks to me the IO was submitted via
> > > __ssch() which has an inline assembly. Cornelia and Hali, could you
> > > help me to understand if and how did virtio_ccw_set_status() can
> > > ensure the visibility of the previous driver setup and vq->broken
> > > here?
> >
> > I'm not sure I completely understand the question here, but let me try:
>
> It's something like the following case:
>
> CPU 0: vq->broken = false
> CPU 0: set_status(DRIVER_OK)
> CPU 1: vring_interrupt() { if (vq->broken) return IRQ_NONE; }
>
> We need to make sure the CPU 1 sees the vq->broken if the interrupt is
> raised after DRVER_OK.
>
> For PCI, we use MMIO of writel() for set_status(), a wmb() is not
> needed in this case according to memory-barriers.txt.
>
> "
> Note that, when using writel(), a prior
> wmb() is not needed to guarantee that the cache coherent memory writes
> have completed before writing to the MMIO region.
> "

IMHO the key facts here are the following:
* ssch and all other I/O instructions are serializing instructions
* all interruptions are serializing operations

For reference see
https://www.ibm.com/resources/publications/OutputPubsDetails?PubID=SA22783213
page 5-138.

Maybe we should add that to the linux documentation somewhere if
not already mentioned.

So IMHO we don't need CPU0 to do a wmb() because of the ssch.

>
> So CPU 1 will see the broken as false.

But barriers need to be paired. And in my understanding the ssch
doesn't really ensure that CPU1 is about to see the change, unless
there is a suitable barrier that pairs with the barrier implied
the ssch instruction.

Assumed vring_interrupt() is always done in hard-irq context, AFAIU,
we should be fine. Is that assumption correct?

Why are we fine:
* Either the ssch was performed before the interrupt for
vring_interrupt() got delivered on CPU1, and then we are guaranteed to
see the updated value for vq->broken,
* or the interrupt that triggered vring_interrupt() was delivered before
the ssch instruction got executed. But in this case it is fine to
ignore the notification, because this is actually the bad case
we want to guard against: we got a notification when
notifications are not allowed.

We may end up with !vq->broken and !DEVICE_OK as well, but that should
be fine because, although that notification would be a should not happen
one, I understand it would not catch us with our pants down.

Regards,
Halil

>
> >
> > virtio_ccw_set_status() uses a channel command to set the status, with
> > the interesting stuff done inside ccw_io_helper(). That function
> > - takes the subchannel lock, disabling interrupts
>
> Then it is, for x86 the operation to disable interrupt is a full
> barrier. I guess this should apply to other architecture like s390. I
> see a stnsm is used in this case but a quick google doesn't tell me if
> it's a barrier.
> If this is true. The vring_interrupt will see broken as false.
>
> > - does the ssch; this instruction will fail if there's already another
> > I/O in progress, or an interrupt is pending for the subchannel; on
> > success, it is guaranteed that we'll get an interrupt eventually
>
> I guess ssch might imply a barrier as well, otherwise we may need a
> lot of barriers before this.
>
> Thanks
>
> > - unlock the subchannel, and wait for the interupt handler to eventually
> > process the interrupt, so I guess it should see the vq->broken value?
> >
> > If the I/O fails, virtio_ccw_set_status() will revert its internal
> > status to the old value.
> >
> >
> > >
> > > Thanks
> > >
> > >>
> > >> > dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
> > >> > }
> >
>

2022-05-12 09:36:55

[permalink] [raw]

Subject: Re: [PATCH V4 8/9] virtio: harden vring IRQ

On Wed, May 11, 2022 at 8:49 PM Halil Pasic <[email protected]> wrote:
>
> On Wed, 11 May 2022 17:27:44 +0800
> Jason Wang <[email protected]> wrote:
>
> > On Wed, May 11, 2022 at 4:44 PM Cornelia Huck <[email protected]> wrote:
> > >
> > > On Wed, May 11 2022, Jason Wang <[email protected]> wrote:
> > >
> > > > On Tue, May 10, 2022 at 7:32 PM Michael S. Tsirkin <[email protected]> wrote:
> > > >>
> > > >> On Sat, May 07, 2022 at 03:19:53PM +0800, Jason Wang wrote:
> > > >> > diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> > > >> > index d8a2340f928e..23f1694cdbd5 100644
> > > >> > --- a/include/linux/virtio_config.h
> > > >> > +++ b/include/linux/virtio_config.h
> > > >> > @@ -256,6 +256,18 @@ void virtio_device_ready(struct virtio_device *dev)
> > > >> > unsigned status = dev->config->get_status(dev);
> > > >> >
> > > >> > BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
> > > >> > +
> > > >> > + /*
> > > >> > + * The virtio_synchronize_cbs() makes sure vring_interrupt()
> > > >> > + * will see the driver specific setup if it sees vq->broken
> > > >> > + * as false.
> > > >> > + */
> > > >> > + virtio_synchronize_cbs(dev);
> > > >>
> > > >> since you mention vq->broken above, maybe add
> > > >> "set vq->broken to false"
> > > >
> > > > Ok.
> > > >
> > > >>
> > > >> > + __virtio_unbreak_device(dev);
> > > >> > + /*
> > > >> > + * The transport is expected ensure the visibility of
> > > >>
> > > >> to ensure
> > > >
> > > > Will fix.
> > > >
> > > >>
> > > >> > + * vq->broken
> > > >>
> > > >> let's add: "visibility by vq callbacks"
> > > >
> > > > Sure.
> > > >
> > > >>
> > > >> > before setting VIRTIO_CONFIG_S_DRIVER_OK.
> > > >> > + */
> > > >>
> > > >>
> > > >> Can I see some analysis of existing transports showing
> > > >> this is actually the case for them?
> > > >
> > > > Yes.
> > > >
> > > >> And maybe add a comment near set_status to document the
> > > >> requirement.
> > > >
> > > > For PCI and MMIO, we can quote the memory-barriers.txt or explain that
> > > > wmb() is not needed before the MMIO writel().
> > > > For CCW, it looks not obvious, it looks to me the IO was submitted via
> > > > __ssch() which has an inline assembly. Cornelia and Hali, could you
> > > > help me to understand if and how did virtio_ccw_set_status() can
> > > > ensure the visibility of the previous driver setup and vq->broken
> > > > here?
> > >
> > > I'm not sure I completely understand the question here, but let me try:
> >
> > It's something like the following case:
> >
> > CPU 0: vq->broken = false
> > CPU 0: set_status(DRIVER_OK)
> > CPU 1: vring_interrupt() { if (vq->broken) return IRQ_NONE; }
> >
> > We need to make sure the CPU 1 sees the vq->broken if the interrupt is
> > raised after DRVER_OK.
> >
> > For PCI, we use MMIO of writel() for set_status(), a wmb() is not
> > needed in this case according to memory-barriers.txt.
> >
> > "
> > Note that, when using writel(), a prior
> > wmb() is not needed to guarantee that the cache coherent memory writes
> > have completed before writing to the MMIO region.
> > "
>
>
> IMHO the key facts here are the following:
> * ssch and all other I/O instructions are serializing instructions
> * all interruptions are serializing operations
>
> For reference see
> https://www.ibm.com/resources/publications/OutputPubsDetails?PubID=SA22783213
> page 5-138.

I see thanks for the pointer.

>
>
> Maybe we should add that to the linux documentation somewhere if
> not already mentioned.

Maybe somewhere in memory-barriers.txt.

>
> So IMHO we don't need CPU0 to do a wmb() because of the ssch.
>

Right.

> >
> > So CPU 1 will see the broken as false.
>
> But barriers need to be paired.

Yes, actually the pairing is done by the device where it need something like:

if (get_status(DRIVER_OK)) {
rmb();
start_device_logic();
raise_interrupt();
}

> And in my understanding the ssch
> doesn't really ensure that CPU1 is about to see the change, unless
> there is a suitable barrier that pairs with the barrier implied
> the ssch instruction.
>
> Assumed vring_interrupt() is always done in hard-irq context, AFAIU,
> we should be fine. Is that assumption correct?
>
> Why are we fine:
> * Either the ssch was performed before the interrupt for
> vring_interrupt() got delivered on CPU1, and then we are guaranteed to
> see the updated value for vq->broken,

Yes, for a well behaved device, the device will raise the interrupt
after it sees DRIVER_OK and the ssch guarantees that when the device
sees DRIVER_OK vq->broken is false.

> * or the interrupt that triggered vring_interrupt() was delivered before
> the ssch instruction got executed. But in this case it is fine to
> ignore the notification, because this is actually the bad case
> we want to guard against: we got a notification when
> notifications are not allowed.

Exactly.

>
> We may end up with !vq->broken and !DEVICE_OK as well, but that should
> be fine because, although that notification would be a should not happen
> one, I understand it would not catch us with our pants down.

Right.

>
> Regards,
> Halil
>
>
> >
> > >
> > > virtio_ccw_set_status() uses a channel command to set the status, with
> > > the interesting stuff done inside ccw_io_helper(). That function
> > > - takes the subchannel lock, disabling interrupts
> >
> > Then it is, for x86 the operation to disable interrupt is a full
> > barrier. I guess this should apply to other architecture like s390. I
> > see a stnsm is used in this case but a quick google doesn't tell me if
> > it's a barrier.

Looks like it's not a serialization instruction and this
memory-barriers.rst told me irq-disabling is only a compiler barrier:

"""
Functions that disable interrupts (ACQUIRE equivalent) and enable interrupts
(RELEASE equivalent) will act as compiler barriers only. So if memory or I/O
barriers are required in such a situation, they must be provided from some
other means.
"""

Thanks

> > If this is true. The vring_interrupt will see broken as false.
> >
> > > - does the ssch; this instruction will fail if there's already another
> > > I/O in progress, or an interrupt is pending for the subchannel; on
> > > success, it is guaranteed that we'll get an interrupt eventually
> >
> > I guess ssch might imply a barrier as well, otherwise we may need a
> > lot of barriers before this.
> >
> > Thanks
> >
> > > - unlock the subchannel, and wait for the interupt handler to eventually
> > > process the interrupt, so I guess it should see the vq->broken value?
> > >
> > > If the I/O fails, virtio_ccw_set_status() will revert its internal
> > > status to the old value.
> > >
> > >
> > > >
> > > > Thanks
> > > >
> > > >>
> > > >> > dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
> > > >> > }
> > >
> >
>

2022-05-12 13:43:41

[permalink] [raw]

Subject: Re: [PATCH V4 0/9] rework on the IRQ hardening of virtio

On Wed, 11 May 2022 10:22:59 +0800
Jason Wang <[email protected]> wrote:

> > CPU0
> > ----
> > lock(&vcdev->irq_lock);
> > <Interrupt>
> > lock(&vcdev->irq_lock);
> >
> > *** DEADLOCK ***
>
> It looks to me we need to use write_lock_irq()/write_unlock_irq() to
> do the synchronization.
>
> And we probably need to keep the
> read_lock_irqsave()/read_lock_irqrestore() logic since I can see the
> virtio_ccw_int_handler() to be called from process context (e.g from
> the io_subchannel_quiesce()).
>

Sounds correct.

Regards,
Halil

2022-05-13 13:02:31

[permalink] [raw]

Subject: Re: [PATCH V4 0/9] rework on the IRQ hardening of virtio

On Wed, May 11, 2022 at 10:02 PM Halil Pasic <[email protected]> wrote:
>
> On Wed, 11 May 2022 10:22:59 +0800
> Jason Wang <[email protected]> wrote:
>
> > > CPU0
> > > ----
> > > lock(&vcdev->irq_lock);
> > > <Interrupt>
> > > lock(&vcdev->irq_lock);
> > >
> > > *** DEADLOCK ***
> >
> > It looks to me we need to use write_lock_irq()/write_unlock_irq() to
> > do the synchronization.
> >
> > And we probably need to keep the
> > read_lock_irqsave()/read_lock_irqrestore() logic since I can see the
> > virtio_ccw_int_handler() to be called from process context (e.g from
> > the io_subchannel_quiesce()).
> >
>
> Sounds correct.

As Cornelia and Vineeth pointed out, all the paths the vring_interrupt
is called with irq disabled.

So I will use spin_lock()/spin_unlock() in the next version.

Thanks

>
> Regards,
> Halil
>

2022-05-16 18:45:34

by Michael S. Tsirkin

[permalink] [raw]

Subject: Re: [PATCH V4 0/9] rework on the IRQ hardening of virtio

On Mon, May 16, 2022 at 01:20:06PM +0200, Halil Pasic wrote:
> On Thu, 12 May 2022 11:31:08 +0800
> Jason Wang <[email protected]> wrote:
>
> > > > It looks to me we need to use write_lock_irq()/write_unlock_irq() to
> > > > do the synchronization.
> > > >
> > > > And we probably need to keep the
> > > > read_lock_irqsave()/read_lock_irqrestore() logic since I can see the
> > > > virtio_ccw_int_handler() to be called from process context (e.g from
> > > > the io_subchannel_quiesce()).
> > > >
> > >
> > > Sounds correct.
> >
> > As Cornelia and Vineeth pointed out, all the paths the vring_interrupt
> > is called with irq disabled.
> >
> > So I will use spin_lock()/spin_unlock() in the next version.
>
> Can we do some sort of an assertion that if the kernel is built with
> the corresponding debug features will make sure this assumption holds
> (and warn if it does not)? That assertion would also document the fact.

Lockdep will do this automatically if you get it wrong, just like it
did here.

> If an assertion is not possible, I think we should at least place a
> strategic comment that documents our assumption.

That can't hurt.

> Regards,
> Halil
>
> >
> > Thanks

2022-05-17 01:57:58

[permalink] [raw]

Subject: Re: [PATCH V4 0/9] rework on the IRQ hardening of virtio

On Thu, 12 May 2022 11:31:08 +0800
Jason Wang <[email protected]> wrote:

> > > It looks to me we need to use write_lock_irq()/write_unlock_irq() to
> > > do the synchronization.
> > >
> > > And we probably need to keep the
> > > read_lock_irqsave()/read_lock_irqrestore() logic since I can see the
> > > virtio_ccw_int_handler() to be called from process context (e.g from
> > > the io_subchannel_quiesce()).
> > >
> >
> > Sounds correct.
>
> As Cornelia and Vineeth pointed out, all the paths the vring_interrupt
> is called with irq disabled.
>
> So I will use spin_lock()/spin_unlock() in the next version.

Can we do some sort of an assertion that if the kernel is built with
the corresponding debug features will make sure this assumption holds
(and warn if it does not)? That assertion would also document the fact.

If an assertion is not possible, I think we should at least place a
strategic comment that documents our assumption.

Regards,
Halil

>
> Thanks

2022-05-17 12:58:37

[permalink] [raw]

Subject: Re: [PATCH V4 0/9] rework on the IRQ hardening of virtio

On Mon, May 16, 2022 at 10:25 PM Michael S. Tsirkin <[email protected]> wrote:
>
> On Mon, May 16, 2022 at 01:20:06PM +0200, Halil Pasic wrote:
> > On Thu, 12 May 2022 11:31:08 +0800
> > Jason Wang <[email protected]> wrote:
> >
> > > > > It looks to me we need to use write_lock_irq()/write_unlock_irq() to
> > > > > do the synchronization.
> > > > >
> > > > > And we probably need to keep the
> > > > > read_lock_irqsave()/read_lock_irqrestore() logic since I can see the
> > > > > virtio_ccw_int_handler() to be called from process context (e.g from
> > > > > the io_subchannel_quiesce()).
> > > > >
> > > >
> > > > Sounds correct.
> > >
> > > As Cornelia and Vineeth pointed out, all the paths the vring_interrupt
> > > is called with irq disabled.
> > >
> > > So I will use spin_lock()/spin_unlock() in the next version.
> >
> > Can we do some sort of an assertion that if the kernel is built with
> > the corresponding debug features will make sure this assumption holds
> > (and warn if it does not)? That assertion would also document the fact.
>
> Lockdep will do this automatically if you get it wrong, just like it
> did here.
>
> > If an assertion is not possible, I think we should at least place a
> > strategic comment that documents our assumption.
>
> That can't hurt.

I will add some comments here.

Thanks

>
> > Regards,
> > Halil
> >
> > >
> > > Thanks
>