2021-08-24 11:01:10

by Stefan Hajnoczi

[permalink] [raw]
Subject: [RFC PATCH 0/1] virtio: false unhandled irqs from vring_interrupt()

While investigating an unhandled irq from vring_interrupt() with virtiofs I

stumbled onto a possible race that also affects virtio_gpu. This theory is

based on code inspection and hopefully you can point out something that makes

this a non-issue in practice :).



The vring_interrupt() function returns IRQ_NONE when an MSI-X interrupt is

taken with no used (completed) buffers in the virtqueue. The kernel disables

the irq and the driver is no longer receives irqs when this happens:



irq 77: nobody cared (try booting with the "irqpoll" option)

...

handlers:

[<00000000a40a49bb>] vring_interrupt

Disabling IRQ #77



Consider the following:



1. An virtiofs irq is handled and the virtio_fs_requests_done_work() work

function is scheduled to dequeue used buffers:

vring_interrupt() -> virtio_fs_vq_done() -> schedule_work()



2. The device adds more used requests and just before...



3. ...virtio_fs_requests_done_work() empties the virtqueue with

virtqueue_get_buf().



4. The device raises the irq and vring_interrupt() is called after

virtio_fs_requests_done_work emptied the virtqueue:



irqreturn_t vring_interrupt(int irq, void *_vq)

{

struct vring_virtqueue *vq = to_vvq(_vq);



if (!more_used(vq)) {

pr_debug("virtqueue interrupt with no work for %p\n", vq);

return IRQ_NONE;

^^^^^^^^^^^^^^^^



I have included a patch that switches virtiofs from spin_lock() to

spin_lock_irqsave() to prevent vring_interrupt() from running while the

virtqueue is processed from a work function.



virtio_gpu has a similar case where virtio_gpu_dequeue_ctrl_func() and

virtio_gpu_dequeue_cursor_func() work functions only use spin_lock().

I think this can result in the same false unhandled irq problem as virtiofs.



This race condition could in theory affect all drivers. The VIRTIO

specification says:



Neither of these notification suppression methods are reliable, as they are

not synchronized with the device, but they serve as useful optimizations.



If virtqueue_disable_cb() is just a hint and might not disable virtqueue irqs

then virtio_net and other drivers have a problem because because an irq could

be raised while the driver is dequeuing used buffers. I think we haven't seen

this because software VIRTIO devices honor virtqueue_disable_cb(). Hardware

devices might cache the value and not disable notifications for some time...



Have I missed something?



The virtiofs patch I attached is being stress tested to see if the unhandled

irqs still occur.



Stefan Hajnoczi (1):

fuse: disable local irqs when processing vq completions



fs/fuse/virtio_fs.c | 15 ++++++++++-----

1 file changed, 10 insertions(+), 5 deletions(-)



--

2.31.1





2021-08-24 11:01:40

by Stefan Hajnoczi

[permalink] [raw]
Subject: [RFC PATCH 1/1] fuse: disable local irqs when processing vq completions

The virtqueue completion handler function runs on a work queue and local
irqs are still enabled. There is a race where the completion handler
function grabs the next completed request just before vring_interrupt()
runs. vring_interrupt() sees an empty virtqueue and returns IRQ_NONE,
falsely declaring this interrupt unhandled.

The unhandled irq causes the kernel to disable the irq:

irq 77: nobody cared (try booting with the "irqpoll" option)
...
handlers:
[<00000000d33eeed7>] vring_interrupt
Disabling IRQ #77

The driver hangs afterwards since virtqueue irqs are now ignored.

Disable local irqs before calling virtqueue_get_buf() and re-enable them
afterwards so that vring_interrupt() doesn't run during the race window.

Reported-by: Xiaoling Gao <[email protected]>
Cc: Michael Tsirkin <[email protected]>
Cc: Jason Wang <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>
---
I'm not 100% convinced this fixes everything because vring_interrupt()
can still run after our critical section and find the virtqueue empty.
virtqueue_disable_cb() should minimize that but it's only a hint and
there is a small window when the race condition can happen before it's
called.
---
fs/fuse/virtio_fs.c | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
index 8f52cdaa8445..57e1f264b0a8 100644
--- a/fs/fuse/virtio_fs.c
+++ b/fs/fuse/virtio_fs.c
@@ -319,9 +319,10 @@ static void virtio_fs_hiprio_done_work(struct work_struct *work)
struct virtio_fs_vq *fsvq = container_of(work, struct virtio_fs_vq,
done_work);
struct virtqueue *vq = fsvq->vq;
+ unsigned long flags;

/* Free completed FUSE_FORGET requests */
- spin_lock(&fsvq->lock);
+ spin_lock_irqsave(&fsvq->lock, flags);
do {
unsigned int len;
void *req;
@@ -333,7 +334,7 @@ static void virtio_fs_hiprio_done_work(struct work_struct *work)
dec_in_flight_req(fsvq);
}
} while (!virtqueue_enable_cb(vq) && likely(!virtqueue_is_broken(vq)));
- spin_unlock(&fsvq->lock);
+ spin_unlock_irqrestore(&fsvq->lock, flags);
}

static void virtio_fs_request_dispatch_work(struct work_struct *work)
@@ -601,11 +602,15 @@ static void virtio_fs_requests_done_work(struct work_struct *work)
struct virtqueue *vq = fsvq->vq;
struct fuse_req *req;
struct fuse_req *next;
+ unsigned long flags;
unsigned int len;
LIST_HEAD(reqs);

- /* Collect completed requests off the virtqueue */
- spin_lock(&fsvq->lock);
+ /*
+ * Collect completed requests off the virtqueue with irqs disabled to
+ * prevent races with vring_interrupt().
+ */
+ spin_lock_irqsave(&fsvq->lock, flags);
do {
virtqueue_disable_cb(vq);

@@ -615,7 +620,7 @@ static void virtio_fs_requests_done_work(struct work_struct *work)
spin_unlock(&fpq->lock);
}
} while (!virtqueue_enable_cb(vq) && likely(!virtqueue_is_broken(vq)));
- spin_unlock(&fsvq->lock);
+ spin_unlock_irqrestore(&fsvq->lock, flags);

/* End requests */
list_for_each_entry_safe(req, next, &reqs, list) {
--
2.31.1

2021-08-24 11:35:44

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [RFC PATCH 0/1] virtio: false unhandled irqs from vring_interrupt()

On Tue, Aug 24, 2021 at 11:59:43AM +0100, Stefan Hajnoczi wrote:
> While investigating an unhandled irq from vring_interrupt() with virtiofs I
> stumbled onto a possible race that also affects virtio_gpu. This theory is
> based on code inspection and hopefully you can point out something that makes
> this a non-issue in practice :).
>
> The vring_interrupt() function returns IRQ_NONE when an MSI-X interrupt is
> taken with no used (completed) buffers in the virtqueue. The kernel disables
> the irq and the driver is no longer receives irqs when this happens:
>
> irq 77: nobody cared (try booting with the "irqpoll" option)
> ...
> handlers:
> [<00000000a40a49bb>] vring_interrupt
> Disabling IRQ #77
>
> Consider the following:
>
> 1. An virtiofs irq is handled and the virtio_fs_requests_done_work() work
> function is scheduled to dequeue used buffers:
> vring_interrupt() -> virtio_fs_vq_done() -> schedule_work()
>
> 2. The device adds more used requests and just before...
>
> 3. ...virtio_fs_requests_done_work() empties the virtqueue with
> virtqueue_get_buf().
>
> 4. The device raises the irq and vring_interrupt() is called after
> virtio_fs_requests_done_work emptied the virtqueue:
>
> irqreturn_t vring_interrupt(int irq, void *_vq)
> {
> struct vring_virtqueue *vq = to_vvq(_vq);
>
> if (!more_used(vq)) {
> pr_debug("virtqueue interrupt with no work for %p\n", vq);
> return IRQ_NONE;
> ^^^^^^^^^^^^^^^^
>
> I have included a patch that switches virtiofs from spin_lock() to
> spin_lock_irqsave() to prevent vring_interrupt() from running while the
> virtqueue is processed from a work function.
>
> virtio_gpu has a similar case where virtio_gpu_dequeue_ctrl_func() and
> virtio_gpu_dequeue_cursor_func() work functions only use spin_lock().
> I think this can result in the same false unhandled irq problem as virtiofs.
>
> This race condition could in theory affect all drivers. The VIRTIO
> specification says:
>
> Neither of these notification suppression methods are reliable, as they are
> not synchronized with the device, but they serve as useful optimizations.
>
> If virtqueue_disable_cb() is just a hint and might not disable virtqueue irqs
> then virtio_net and other drivers have a problem because because an irq could
> be raised while the driver is dequeuing used buffers. I think we haven't seen
> this because software VIRTIO devices honor virtqueue_disable_cb(). Hardware
> devices might cache the value and not disable notifications for some time...
>
> Have I missed something?
>
> The virtiofs patch I attached is being stress tested to see if the unhandled
> irqs still occur.
>
> Stefan Hajnoczi (1):
> fuse: disable local irqs when processing vq completions
>
> fs/fuse/virtio_fs.c | 15 ++++++++++-----
> 1 file changed, 10 insertions(+), 5 deletions(-)

Fundamentally it is not a problem to have an unhandled IRQ
once in a while. It's only a problem if this happens time
after time.


Does the kernel you are testing include
commit 8d622d21d24803408b256d96463eac4574dcf067
Author: Michael S. Tsirkin <[email protected]>
Date: Tue Apr 13 01:19:16 2021 -0400

virtio: fix up virtio_disable_cb

?

If not it's worth checking whether the latest kernel still
has the issue.

Note cherry picking that commit isn't trivial there are
a bunch of dependencies.

If you want to try on an old kernel, disable event index.

> --
> 2.31.1
>

2021-08-24 16:41:06

by Stefan Hajnoczi

[permalink] [raw]
Subject: Re: [RFC PATCH 0/1] virtio: false unhandled irqs from vring_interrupt()

On Tue, Aug 24, 2021 at 07:31:29AM -0400, Michael S. Tsirkin wrote:
> On Tue, Aug 24, 2021 at 11:59:43AM +0100, Stefan Hajnoczi wrote:
> > While investigating an unhandled irq from vring_interrupt() with virtiofs I
> > stumbled onto a possible race that also affects virtio_gpu. This theory is
> > based on code inspection and hopefully you can point out something that makes
> > this a non-issue in practice :).
> >
> > The vring_interrupt() function returns IRQ_NONE when an MSI-X interrupt is
> > taken with no used (completed) buffers in the virtqueue. The kernel disables
> > the irq and the driver is no longer receives irqs when this happens:
> >
> > irq 77: nobody cared (try booting with the "irqpoll" option)
> > ...
> > handlers:
> > [<00000000a40a49bb>] vring_interrupt
> > Disabling IRQ #77
> >
> > Consider the following:
> >
> > 1. An virtiofs irq is handled and the virtio_fs_requests_done_work() work
> > function is scheduled to dequeue used buffers:
> > vring_interrupt() -> virtio_fs_vq_done() -> schedule_work()
> >
> > 2. The device adds more used requests and just before...
> >
> > 3. ...virtio_fs_requests_done_work() empties the virtqueue with
> > virtqueue_get_buf().
> >
> > 4. The device raises the irq and vring_interrupt() is called after
> > virtio_fs_requests_done_work emptied the virtqueue:
> >
> > irqreturn_t vring_interrupt(int irq, void *_vq)
> > {
> > struct vring_virtqueue *vq = to_vvq(_vq);
> >
> > if (!more_used(vq)) {
> > pr_debug("virtqueue interrupt with no work for %p\n", vq);
> > return IRQ_NONE;
> > ^^^^^^^^^^^^^^^^
> >
> > I have included a patch that switches virtiofs from spin_lock() to
> > spin_lock_irqsave() to prevent vring_interrupt() from running while the
> > virtqueue is processed from a work function.
> >
> > virtio_gpu has a similar case where virtio_gpu_dequeue_ctrl_func() and
> > virtio_gpu_dequeue_cursor_func() work functions only use spin_lock().
> > I think this can result in the same false unhandled irq problem as virtiofs.
> >
> > This race condition could in theory affect all drivers. The VIRTIO
> > specification says:
> >
> > Neither of these notification suppression methods are reliable, as they are
> > not synchronized with the device, but they serve as useful optimizations.
> >
> > If virtqueue_disable_cb() is just a hint and might not disable virtqueue irqs
> > then virtio_net and other drivers have a problem because because an irq could
> > be raised while the driver is dequeuing used buffers. I think we haven't seen
> > this because software VIRTIO devices honor virtqueue_disable_cb(). Hardware
> > devices might cache the value and not disable notifications for some time...
> >
> > Have I missed something?
> >
> > The virtiofs patch I attached is being stress tested to see if the unhandled
> > irqs still occur.
> >
> > Stefan Hajnoczi (1):
> > fuse: disable local irqs when processing vq completions
> >
> > fs/fuse/virtio_fs.c | 15 ++++++++++-----
> > 1 file changed, 10 insertions(+), 5 deletions(-)
>
> Fundamentally it is not a problem to have an unhandled IRQ
> once in a while. It's only a problem if this happens time
> after time.
>
>
> Does the kernel you are testing include
> commit 8d622d21d24803408b256d96463eac4574dcf067
> Author: Michael S. Tsirkin <[email protected]>
> Date: Tue Apr 13 01:19:16 2021 -0400
>
> virtio: fix up virtio_disable_cb
>
> ?
>
> If not it's worth checking whether the latest kernel still
> has the issue.
>
> Note cherry picking that commit isn't trivial there are
> a bunch of dependencies.
>
> If you want to try on an old kernel, disable event index.

Thanks for pointing out this commit. My kernel does not have it. The
device is using the split vring layout (probably with EVENT_IDX) so
virtqueue_disable_cb() has no effect.

kernel/irq/spurious.c:note_interrupt() disables the irq after 99.9k
unhandled irqs. It's surprising that virtiofs is hitting this limit
through the scenario I described, but maybe.

I'll try different kernel versions and/or disable EVENT_IDX as you
suggested. Thanks!

Stefan


Attachments:
(No filename) (4.17 kB)
signature.asc (499.00 B)
Download all attachments

2021-08-26 15:44:55

by Stefan Hajnoczi

[permalink] [raw]
Subject: Re: [RFC PATCH 0/1] virtio: false unhandled irqs from vring_interrupt()

On Tue, Aug 24, 2021 at 07:31:29AM -0400, Michael S. Tsirkin wrote:
> On Tue, Aug 24, 2021 at 11:59:43AM +0100, Stefan Hajnoczi wrote:
> > While investigating an unhandled irq from vring_interrupt() with virtiofs I
> > stumbled onto a possible race that also affects virtio_gpu. This theory is
> > based on code inspection and hopefully you can point out something that makes
> > this a non-issue in practice :).
> >
> > The vring_interrupt() function returns IRQ_NONE when an MSI-X interrupt is
> > taken with no used (completed) buffers in the virtqueue. The kernel disables
> > the irq and the driver is no longer receives irqs when this happens:
> >
> > irq 77: nobody cared (try booting with the "irqpoll" option)
> > ...
> > handlers:
> > [<00000000a40a49bb>] vring_interrupt
> > Disabling IRQ #77
> >
> > Consider the following:
> >
> > 1. An virtiofs irq is handled and the virtio_fs_requests_done_work() work
> > function is scheduled to dequeue used buffers:
> > vring_interrupt() -> virtio_fs_vq_done() -> schedule_work()
> >
> > 2. The device adds more used requests and just before...
> >
> > 3. ...virtio_fs_requests_done_work() empties the virtqueue with
> > virtqueue_get_buf().
> >
> > 4. The device raises the irq and vring_interrupt() is called after
> > virtio_fs_requests_done_work emptied the virtqueue:
> >
> > irqreturn_t vring_interrupt(int irq, void *_vq)
> > {
> > struct vring_virtqueue *vq = to_vvq(_vq);
> >
> > if (!more_used(vq)) {
> > pr_debug("virtqueue interrupt with no work for %p\n", vq);
> > return IRQ_NONE;
> > ^^^^^^^^^^^^^^^^
> >
> > I have included a patch that switches virtiofs from spin_lock() to
> > spin_lock_irqsave() to prevent vring_interrupt() from running while the
> > virtqueue is processed from a work function.
> >
> > virtio_gpu has a similar case where virtio_gpu_dequeue_ctrl_func() and
> > virtio_gpu_dequeue_cursor_func() work functions only use spin_lock().
> > I think this can result in the same false unhandled irq problem as virtiofs.
> >
> > This race condition could in theory affect all drivers. The VIRTIO
> > specification says:
> >
> > Neither of these notification suppression methods are reliable, as they are
> > not synchronized with the device, but they serve as useful optimizations.
> >
> > If virtqueue_disable_cb() is just a hint and might not disable virtqueue irqs
> > then virtio_net and other drivers have a problem because because an irq could
> > be raised while the driver is dequeuing used buffers. I think we haven't seen
> > this because software VIRTIO devices honor virtqueue_disable_cb(). Hardware
> > devices might cache the value and not disable notifications for some time...
> >
> > Have I missed something?
> >
> > The virtiofs patch I attached is being stress tested to see if the unhandled
> > irqs still occur.
> >
> > Stefan Hajnoczi (1):
> > fuse: disable local irqs when processing vq completions
> >
> > fs/fuse/virtio_fs.c | 15 ++++++++++-----
> > 1 file changed, 10 insertions(+), 5 deletions(-)
>
> Fundamentally it is not a problem to have an unhandled IRQ
> once in a while. It's only a problem if this happens time
> after time.

That virtiofs patch in this series passed the stress test. We must have
been getting too many unhandled interrupts due to processing the
virtqueue from a work function that's not synchronized with
vring_interrupt() irq handler.

> Does the kernel you are testing include
> commit 8d622d21d24803408b256d96463eac4574dcf067
> Author: Michael S. Tsirkin <[email protected]>
> Date: Tue Apr 13 01:19:16 2021 -0400
>
> virtio: fix up virtio_disable_cb
>
> ?
>
> If not it's worth checking whether the latest kernel still
> has the issue.
>
> Note cherry picking that commit isn't trivial there are
> a bunch of dependencies.

We'll test a recent kernel with just your patch to see if it solves the
issue.

Stefan


Attachments:
(No filename) (3.96 kB)
signature.asc (499.00 B)
Download all attachments

2021-09-06 15:41:09

by Stefan Hajnoczi

[permalink] [raw]
Subject: Re: [RFC PATCH 0/1] virtio: false unhandled irqs from vring_interrupt()

On Tue, Aug 24, 2021 at 07:31:29AM -0400, Michael S. Tsirkin wrote:
> On Tue, Aug 24, 2021 at 11:59:43AM +0100, Stefan Hajnoczi wrote:
> > While investigating an unhandled irq from vring_interrupt() with virtiofs I
> > stumbled onto a possible race that also affects virtio_gpu. This theory is
> > based on code inspection and hopefully you can point out something that makes
> > this a non-issue in practice :).
> >
> > The vring_interrupt() function returns IRQ_NONE when an MSI-X interrupt is
> > taken with no used (completed) buffers in the virtqueue. The kernel disables
> > the irq and the driver is no longer receives irqs when this happens:
> >
> > irq 77: nobody cared (try booting with the "irqpoll" option)
> > ...
> > handlers:
> > [<00000000a40a49bb>] vring_interrupt
> > Disabling IRQ #77
> >
> > Consider the following:
> >
> > 1. An virtiofs irq is handled and the virtio_fs_requests_done_work() work
> > function is scheduled to dequeue used buffers:
> > vring_interrupt() -> virtio_fs_vq_done() -> schedule_work()
> >
> > 2. The device adds more used requests and just before...
> >
> > 3. ...virtio_fs_requests_done_work() empties the virtqueue with
> > virtqueue_get_buf().
> >
> > 4. The device raises the irq and vring_interrupt() is called after
> > virtio_fs_requests_done_work emptied the virtqueue:
> >
> > irqreturn_t vring_interrupt(int irq, void *_vq)
> > {
> > struct vring_virtqueue *vq = to_vvq(_vq);
> >
> > if (!more_used(vq)) {
> > pr_debug("virtqueue interrupt with no work for %p\n", vq);
> > return IRQ_NONE;
> > ^^^^^^^^^^^^^^^^
> >
> > I have included a patch that switches virtiofs from spin_lock() to
> > spin_lock_irqsave() to prevent vring_interrupt() from running while the
> > virtqueue is processed from a work function.
> >
> > virtio_gpu has a similar case where virtio_gpu_dequeue_ctrl_func() and
> > virtio_gpu_dequeue_cursor_func() work functions only use spin_lock().
> > I think this can result in the same false unhandled irq problem as virtiofs.
> >
> > This race condition could in theory affect all drivers. The VIRTIO
> > specification says:
> >
> > Neither of these notification suppression methods are reliable, as they are
> > not synchronized with the device, but they serve as useful optimizations.
> >
> > If virtqueue_disable_cb() is just a hint and might not disable virtqueue irqs
> > then virtio_net and other drivers have a problem because because an irq could
> > be raised while the driver is dequeuing used buffers. I think we haven't seen
> > this because software VIRTIO devices honor virtqueue_disable_cb(). Hardware
> > devices might cache the value and not disable notifications for some time...
> >
> > Have I missed something?
> >
> > The virtiofs patch I attached is being stress tested to see if the unhandled
> > irqs still occur.
> >
> > Stefan Hajnoczi (1):
> > fuse: disable local irqs when processing vq completions
> >
> > fs/fuse/virtio_fs.c | 15 ++++++++++-----
> > 1 file changed, 10 insertions(+), 5 deletions(-)
>
> Fundamentally it is not a problem to have an unhandled IRQ
> once in a while. It's only a problem if this happens time
> after time.
>
>
> Does the kernel you are testing include
> commit 8d622d21d24803408b256d96463eac4574dcf067
> Author: Michael S. Tsirkin <[email protected]>
> Date: Tue Apr 13 01:19:16 2021 -0400
>
> virtio: fix up virtio_disable_cb
>
> ?
>
> If not it's worth checking whether the latest kernel still
> has the issue.

A new kernel with your patch doesn't have this issue.

Please disregard the patch I posted, your patch seems to be enough.

Stefan


Attachments:
(No filename) (3.70 kB)
signature.asc (499.00 B)
Download all attachments