From: Shigeru Yoshida <[email protected]>
Running io_watchdog_func() while ohci_urb_enqueue() is running can
cause a race condition where ohci->prev_frame_no is corrupted and the
watchdog can mis-detect following error:
ohci-platform 664a0800.usb: frame counter not updating; disabled
ohci-platform 664a0800.usb: HC died; cleaning up
Specifically, following scenario causes a race condition:
1. ohci_urb_enqueue() calls spin_lock_irqsave(&ohci->lock, flags)
and enters the critical section
2. ohci_urb_enqueue() calls timer_pending(&ohci->io_watchdog) and it
returns false
3. ohci_urb_enqueue() sets ohci->prev_frame_no to a frame number
read by ohci_frame_no(ohci)
4. ohci_urb_enqueue() schedules io_watchdog_func() with mod_timer()
5. ohci_urb_enqueue() calls spin_unlock_irqrestore(&ohci->lock,
flags) and exits the critical section
6. Later, ohci_urb_enqueue() is called
7. ohci_urb_enqueue() calls spin_lock_irqsave(&ohci->lock, flags)
and enters the critical section
8. The timer scheduled on step 4 expires and io_watchdog_func() runs
9. io_watchdog_func() calls spin_lock_irqsave(&ohci->lock, flags)
and waits on it because ohci_urb_enqueue() is already in the
critical section on step 7
10. ohci_urb_enqueue() calls timer_pending(&ohci->io_watchdog) and it
returns false
11. ohci_urb_enqueue() sets ohci->prev_frame_no to new frame number
read by ohci_frame_no(ohci) because the frame number proceeded
between step 3 and 6
12. ohci_urb_enqueue() schedules io_watchdog_func() with mod_timer()
13. ohci_urb_enqueue() calls spin_unlock_irqrestore(&ohci->lock,
flags) and exits the critical section, then wake up
io_watchdog_func() which is waiting on step 9
14. io_watchdog_func() enters the critical section
15. io_watchdog_func() calls ohci_frame_no(ohci) and set frame_no
variable to the frame number
16. io_watchdog_func() compares frame_no and ohci->prev_frame_no
On step 16, because this calling of io_watchdog_func() is scheduled on
step 4, the frame number set in ohci->prev_frame_no is expected to the
number set on step 3. However, ohci->prev_frame_no is overwritten on
step 11. Because step 16 is executed soon after step 11, the frame
number might not proceed, so ohci->prev_frame_no must equals to
frame_no.
To address above scenario, this patch introduces a special sentinel
value IO_WATCHDOG_OFF and set this value to ohci->prev_frame_no when
the watchdog is not pending or running. When ohci_urb_enqueue()
schedules the watchdog (step 4 and 12 above), it compares
ohci->prev_frame_no to IO_WATCHDOG_OFF so that ohci->prev_frame_no is
not overwritten while io_watchdog_func() is running.
v2: Instead of adding an extra flag variable, defining IO_WATCHDOG_OFF
as a special sentinel value for prev_frame_no.
Signed-off-by: Shigeru Yoshida <[email protected]>
Signed-off-by: Haiqing Bai <[email protected]>
---
drivers/usb/host/ohci-hcd.c | 10 +++++++---
drivers/usb/host/ohci-hub.c | 4 +++-
2 files changed, 10 insertions(+), 4 deletions(-)
diff --git a/drivers/usb/host/ohci-hcd.c b/drivers/usb/host/ohci-hcd.c
index ee96763..84f88fa 100644
--- a/drivers/usb/host/ohci-hcd.c
+++ b/drivers/usb/host/ohci-hcd.c
@@ -74,6 +74,7 @@
#define STATECHANGE_DELAY msecs_to_jiffies(300)
#define IO_WATCHDOG_DELAY msecs_to_jiffies(275)
+#define IO_WATCHDOG_OFF 0xffffff00
#include "ohci.h"
#include "pci-quirks.h"
@@ -231,7 +232,7 @@ static int ohci_urb_enqueue (
}
/* Start up the I/O watchdog timer, if it's not running */
- if (!timer_pending(&ohci->io_watchdog) &&
+ if (ohci->prev_frame_no == IO_WATCHDOG_OFF &&
list_empty(&ohci->eds_in_use) &&
!(ohci->flags & OHCI_QUIRK_QEMU)) {
ohci->prev_frame_no = ohci_frame_no(ohci);
@@ -501,6 +502,7 @@ static int ohci_init (struct ohci_hcd *ohci)
return 0;
timer_setup(&ohci->io_watchdog, io_watchdog_func, 0);
+ ohci->prev_frame_no = IO_WATCHDOG_OFF;
ohci->hcca = dma_alloc_coherent (hcd->self.controller,
sizeof(*ohci->hcca), &ohci->hcca_dma, GFP_KERNEL);
@@ -730,7 +732,7 @@ static void io_watchdog_func(struct timer_list *t)
u32 head;
struct ed *ed;
struct td *td, *td_start, *td_next;
- unsigned frame_no;
+ unsigned frame_no, prev_frame_no = IO_WATCHDOG_OFF;
unsigned long flags;
spin_lock_irqsave(&ohci->lock, flags);
@@ -835,7 +837,7 @@ static void io_watchdog_func(struct timer_list *t)
}
}
if (!list_empty(&ohci->eds_in_use)) {
- ohci->prev_frame_no = frame_no;
+ prev_frame_no = frame_no;
ohci->prev_wdh_cnt = ohci->wdh_cnt;
ohci->prev_donehead = ohci_readl(ohci,
&ohci->regs->donehead);
@@ -845,6 +847,7 @@ static void io_watchdog_func(struct timer_list *t)
}
done:
+ ohci->prev_frame_no = prev_frame_no;
spin_unlock_irqrestore(&ohci->lock, flags);
}
@@ -973,6 +976,7 @@ static void ohci_stop (struct usb_hcd *hcd)
if (quirk_nec(ohci))
flush_work(&ohci->nec_work);
del_timer_sync(&ohci->io_watchdog);
+ ohci->prev_frame_no = IO_WATCHDOG_OFF;
ohci_writel (ohci, OHCI_INTR_MIE, &ohci->regs->intrdisable);
ohci_usb_reset(ohci);
diff --git a/drivers/usb/host/ohci-hub.c b/drivers/usb/host/ohci-hub.c
index fb7aaa3..634f3c7 100644
--- a/drivers/usb/host/ohci-hub.c
+++ b/drivers/usb/host/ohci-hub.c
@@ -311,8 +311,10 @@ static int ohci_bus_suspend (struct usb_hcd *hcd)
rc = ohci_rh_suspend (ohci, 0);
spin_unlock_irq (&ohci->lock);
- if (rc == 0)
+ if (rc == 0) {
del_timer_sync(&ohci->io_watchdog);
+ ohci->prev_frame_no = IO_WATCHDOG_OFF;
+ }
return rc;
}
--
1.9.1
On Fri, 2 Feb 2018, Haiqing Bai wrote:
> From: Shigeru Yoshida <[email protected]>
>
> Running io_watchdog_func() while ohci_urb_enqueue() is running can
> cause a race condition where ohci->prev_frame_no is corrupted and the
> watchdog can mis-detect following error:
>
> ohci-platform 664a0800.usb: frame counter not updating; disabled
> ohci-platform 664a0800.usb: HC died; cleaning up
>
> Specifically, following scenario causes a race condition:
>
> 1. ohci_urb_enqueue() calls spin_lock_irqsave(&ohci->lock, flags)
> and enters the critical section
> 2. ohci_urb_enqueue() calls timer_pending(&ohci->io_watchdog) and it
> returns false
> 3. ohci_urb_enqueue() sets ohci->prev_frame_no to a frame number
> read by ohci_frame_no(ohci)
> 4. ohci_urb_enqueue() schedules io_watchdog_func() with mod_timer()
> 5. ohci_urb_enqueue() calls spin_unlock_irqrestore(&ohci->lock,
> flags) and exits the critical section
> 6. Later, ohci_urb_enqueue() is called
> 7. ohci_urb_enqueue() calls spin_lock_irqsave(&ohci->lock, flags)
> and enters the critical section
> 8. The timer scheduled on step 4 expires and io_watchdog_func() runs
> 9. io_watchdog_func() calls spin_lock_irqsave(&ohci->lock, flags)
> and waits on it because ohci_urb_enqueue() is already in the
> critical section on step 7
> 10. ohci_urb_enqueue() calls timer_pending(&ohci->io_watchdog) and it
> returns false
> 11. ohci_urb_enqueue() sets ohci->prev_frame_no to new frame number
> read by ohci_frame_no(ohci) because the frame number proceeded
> between step 3 and 6
> 12. ohci_urb_enqueue() schedules io_watchdog_func() with mod_timer()
> 13. ohci_urb_enqueue() calls spin_unlock_irqrestore(&ohci->lock,
> flags) and exits the critical section, then wake up
> io_watchdog_func() which is waiting on step 9
> 14. io_watchdog_func() enters the critical section
> 15. io_watchdog_func() calls ohci_frame_no(ohci) and set frame_no
> variable to the frame number
> 16. io_watchdog_func() compares frame_no and ohci->prev_frame_no
>
> On step 16, because this calling of io_watchdog_func() is scheduled on
> step 4, the frame number set in ohci->prev_frame_no is expected to the
> number set on step 3. However, ohci->prev_frame_no is overwritten on
> step 11. Because step 16 is executed soon after step 11, the frame
> number might not proceed, so ohci->prev_frame_no must equals to
> frame_no.
>
> To address above scenario, this patch introduces a special sentinel
> value IO_WATCHDOG_OFF and set this value to ohci->prev_frame_no when
> the watchdog is not pending or running. When ohci_urb_enqueue()
> schedules the watchdog (step 4 and 12 above), it compares
> ohci->prev_frame_no to IO_WATCHDOG_OFF so that ohci->prev_frame_no is
> not overwritten while io_watchdog_func() is running.
>
> v2: Instead of adding an extra flag variable, defining IO_WATCHDOG_OFF
> as a special sentinel value for prev_frame_no.
>
> Signed-off-by: Shigeru Yoshida <[email protected]>
> Signed-off-by: Haiqing Bai <[email protected]>
The "v2" explanation is supposed to go below the "---" tear line, but
that doesn't matter much. In any case,
Acked-by: Alan Stern <[email protected]>
You know, this could have been fixed a lot more easily if the core
kernel had a timer_pending_or_executing() routine. I bet there are
other places in the kernel that suffer from the same bug and need the
same fix.
Alan Stern