2023-01-02 15:28:27

by Ricardo Ribalda

[permalink] [raw]
Subject: [PATCH v6] media: uvcvideo: Fix race condition with usb_kill_urb

usb_kill_urb warranties that all the handlers are finished when it
returns, but does not protect against threads that might be handling
asynchronously the urb.

For UVC, the function uvc_ctrl_status_event_async() takes care of
control changes asynchronously.

If the code is executed in the following order:

CPU 0 CPU 1
===== =====
uvc_status_complete()
uvc_status_stop()
uvc_ctrl_status_event_work()
uvc_status_start() -> FAIL

Then uvc_status_start will keep failing and this error will be shown:

<4>[ 5.540139] URB 0000000000000000 submitted while active
drivers/usb/core/urb.c:378 usb_submit_urb+0x4c3/0x528

Let's improve the current situation, by not re-submiting the urb if
we are stopping the status event. Also process the queued work
(if any) during stop.

CPU 0 CPU 1
===== =====
uvc_status_complete()
uvc_status_stop()
uvc_status_start()
uvc_ctrl_status_event_work() -> FAIL

Hopefully, with the usb layer protection this should be enough to cover
all the cases.

Cc: [email protected]
Fixes: e5225c820c05 ("media: uvcvideo: Send a control event when a Control Change interrupt arrives")
Reviewed-by: Yunke Cao <[email protected]>
Signed-off-by: Ricardo Ribalda <[email protected]>
---
uvc: Fix race condition on uvc

Make sure that all the async work is finished when we stop the status urb.

To: Yunke Cao <[email protected]>
To: Sergey Senozhatsky <[email protected]>
To: Max Staudt <[email protected]>
To: Laurent Pinchart <[email protected]>
To: Mauro Carvalho Chehab <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
Changes in v6:
- Improve comments. (Thanks Laurent).
- Use true/false instead of 1/0 (Thanks Laurent).
- Link to v5: https://lore.kernel.org/r/[email protected]

Changes in v5:
- atomic_t do not impose barriers, use smp_mb() instead. (Thanks Laurent)
- Add an extra cancel_work_sync().
- Link to v4: https://lore.kernel.org/r/[email protected]

Changes in v4:
- Replace bool with atomic_t to avoid compiler reordering.
- First complete the async work and then kill the urb to avoid race (Thanks Laurent!)
- Link to v3: https://lore.kernel.org/r/[email protected]

Changes in v3:
- Remove the patch for dev->status, makes more sense in another series, and makes
the zero day less nervous.
- Update reviewed-by (thanks Yunke!).
- Link to v2: https://lore.kernel.org/r/[email protected]

Changes in v2:
- Add a patch for not kalloc dev->status
- Redo the logic mechanism, so it also works with suspend (Thanks Yunke!)
- Link to v1: https://lore.kernel.org/r/[email protected]
---
drivers/media/usb/uvc/uvc_ctrl.c | 3 +++
drivers/media/usb/uvc/uvc_status.c | 40 ++++++++++++++++++++++++++++++++++++++
drivers/media/usb/uvc/uvcvideo.h | 1 +
3 files changed, 44 insertions(+)

diff --git a/drivers/media/usb/uvc/uvc_ctrl.c b/drivers/media/usb/uvc/uvc_ctrl.c
index c95a2229f4fa..5160facc8e20 100644
--- a/drivers/media/usb/uvc/uvc_ctrl.c
+++ b/drivers/media/usb/uvc/uvc_ctrl.c
@@ -1442,6 +1442,9 @@ static void uvc_ctrl_status_event_work(struct work_struct *work)

uvc_ctrl_status_event(w->chain, w->ctrl, w->data);

+ if (dev->flush_status)
+ return;
+
/* Resubmit the URB. */
w->urb->interval = dev->int_ep->desc.bInterval;
ret = usb_submit_urb(w->urb, GFP_KERNEL);
diff --git a/drivers/media/usb/uvc/uvc_status.c b/drivers/media/usb/uvc/uvc_status.c
index 7518ffce22ed..e457889345a3 100644
--- a/drivers/media/usb/uvc/uvc_status.c
+++ b/drivers/media/usb/uvc/uvc_status.c
@@ -6,6 +6,7 @@
* Laurent Pinchart ([email protected])
*/

+#include <asm/barrier.h>
#include <linux/kernel.h>
#include <linux/input.h>
#include <linux/slab.h>
@@ -309,5 +310,44 @@ int uvc_status_start(struct uvc_device *dev, gfp_t flags)

void uvc_status_stop(struct uvc_device *dev)
{
+ struct uvc_ctrl_work *w = &dev->async_ctrl;
+
+ /* Prevent the asynchronous control handler from requeing the URB */
+ dev->flush_status = true;
+
+ /*
+ * The barrier is needed so the flush_status change is visible to other
+ * CPUs running the asynchronous handler before usb_kill_urb() is
+ * called below.
+ */
+ smp_mb();
+
+ /* If there is any status event on the queue, process it. */
+ if (cancel_work_sync(&w->work))
+ uvc_ctrl_status_event(w->chain, w->ctrl, w->data);
+
+ /* Kill the urb. */
usb_kill_urb(dev->int_urb);
+
+ /*
+ * The URB completion handler may have queued asynchronous work. This
+ * won't resubmit the URB as flush_status is set, but it needs to be
+ * cancelled before returning or it could then race with a future
+ * uvc_status_start() call.
+ */
+ if (cancel_work_sync(&w->work))
+ uvc_ctrl_status_event(w->chain, w->ctrl, w->data);
+
+ /*
+ * From this point, there are no events on the queue and the status URB
+ * is dead, this is, no events will be queued until uvc_status_start()
+ * is called.
+ */
+ dev->flush_status = false;
+
+ /*
+ * Write to memory the value of flush_status before uvc_status_start()
+ * is called again.
+ */
+ smp_mb();
}
diff --git a/drivers/media/usb/uvc/uvcvideo.h b/drivers/media/usb/uvc/uvcvideo.h
index df93db259312..6a9b72d6789e 100644
--- a/drivers/media/usb/uvc/uvcvideo.h
+++ b/drivers/media/usb/uvc/uvcvideo.h
@@ -560,6 +560,7 @@ struct uvc_device {
struct usb_host_endpoint *int_ep;
struct urb *int_urb;
u8 *status;
+ bool flush_status;
struct input_dev *input;
char input_phys[64];


---
base-commit: 0ec5a38bf8499f403f81cb81a0e3a60887d1993c
change-id: 20221212-uvc-race-09276ea68bf8

Best regards,
--
Ricardo Ribalda <[email protected]>


2023-01-03 09:53:04

by Ricardo Ribalda

[permalink] [raw]
Subject: Re: [PATCH v6] media: uvcvideo: Fix race condition with usb_kill_urb

Hi Hillf

Thanks for the heads up

On Tue, 3 Jan 2023 at 03:25, Hillf Danton <[email protected]> wrote:
>
> On 02 Jan 2023 15:48:01 +0100 Ricardo Ribalda <[email protected]>
> > --- a/drivers/media/usb/uvc/uvc_ctrl.c
> > +++ b/drivers/media/usb/uvc/uvc_ctrl.c
> > @@ -1442,6 +1442,9 @@ static void uvc_ctrl_status_event_work(struct work_struct *work)
> >
> > uvc_ctrl_status_event(w->chain, w->ctrl, w->data);
> >
> > + if (dev->flush_status)
> > + return;
> > +
> > /* Resubmit the URB. */
> > w->urb->interval = dev->int_ep->desc.bInterval;
> > ret = usb_submit_urb(w->urb, GFP_KERNEL);
> > diff --git a/drivers/media/usb/uvc/uvc_status.c b/drivers/media/usb/uvc/uvc_status.c
> > index 7518ffce22ed..e457889345a3 100644
> > --- a/drivers/media/usb/uvc/uvc_status.c
> > +++ b/drivers/media/usb/uvc/uvc_status.c
> > @@ -6,6 +6,7 @@
> > * Laurent Pinchart ([email protected])
> > */
> >
> > +#include <asm/barrier.h>
> > #include <linux/kernel.h>
> > #include <linux/input.h>
> > #include <linux/slab.h>
> > @@ -309,5 +310,44 @@ int uvc_status_start(struct uvc_device *dev, gfp_t flags)
> >
> > void uvc_status_stop(struct uvc_device *dev)
> > {
> > + struct uvc_ctrl_work *w = &dev->async_ctrl;
> > +
> > + /* Prevent the asynchronous control handler from requeing the URB */
> > + dev->flush_status = true;
> > +
> > + /*
> > + * The barrier is needed so the flush_status change is visible to other
> > + * CPUs running the asynchronous handler before usb_kill_urb() is
> > + * called below.
> > + */
> > + smp_mb();
>
> Given unpaired mb, take a look at the release/acquire memory barrier pairing
> in c5b2cbdbdac5 ("ipc/mqueue.c: update/document memory barriers")

Would it work? to replace:

dev->flush_status = true;
smp_mb();

with:
smp_store_release(&dev->flush_status, 1);

and then read it always with:

smp_load_acquire(&dev->flush_status);

Thanks!

>
> > +
> > + /* If there is any status event on the queue, process it. */
> > + if (cancel_work_sync(&w->work))
> > + uvc_ctrl_status_event(w->chain, w->ctrl, w->data);
> > +
> > + /* Kill the urb. */
> > usb_kill_urb(dev->int_urb);
> > +
> > + /*
> > + * The URB completion handler may have queued asynchronous work. This
> > + * won't resubmit the URB as flush_status is set, but it needs to be
> > + * cancelled before returning or it could then race with a future
> > + * uvc_status_start() call.
> > + */
> > + if (cancel_work_sync(&w->work))
> > + uvc_ctrl_status_event(w->chain, w->ctrl, w->data);
> > +
> > + /*
> > + * From this point, there are no events on the queue and the status URB
> > + * is dead, this is, no events will be queued until uvc_status_start()
> > + * is called.
> > + */
> > + dev->flush_status = false;
> > +
> > + /*
> > + * Write to memory the value of flush_status before uvc_status_start()
> > + * is called again.
> > + */
> > + smp_mb();
> > }



--
Ricardo Ribalda

2023-01-04 21:30:21

by Ricardo Ribalda

[permalink] [raw]
Subject: Re: [PATCH v6] media: uvcvideo: Fix race condition with usb_kill_urb

Hi Hillf

On Wed, 4 Jan 2023 at 21:54, Hillf Danton <[email protected]> wrote:
>
> On 3 Jan 2023 10:08:56 +0100 Ricardo Ribalda <[email protected]>
> >
> > Would it work?
>
> How did you test your patch v6?

I have not tested v6 besides the compile test. The last one that I
device tested was v4 (with atomics) and it didn't crash on x86. I
assumed that since this is more restrictive it was less likely to
fail.

I put this patch on top of my "granular power saving" patch, because
it is more likely to run into race conditions, and then try to
exercise the driver.

So you have a suggestion to trigger the race conditions more effectively?

To be clear, what I mean by would it work with smp_store_release and
smp_load_aqcuire, is that based on the doc it seems like the right
choe, but I am sure that you have more experience than me here :). I
usually rely on mutexes for this.

Thanks!



--
Ricardo Ribalda