2018-01-31 09:44:42

by Haiqing Bai

[permalink] [raw]
Subject: [PATCH] ohci-hcd: Fix race condition caused by ohci_urb_enqueue() and io_watchdog_func()

Running io_watchdog_func() while ohci_urb_enqueue() is running can
cause a race condition where ohci->prev_frame_no is corrupted and the
watchdog can mis-detect following error:

ohci-platform 664a0800.usb: frame counter not updating; disabled
ohci-platform 664a0800.usb: HC died; cleaning up

Specifically, following scenario causes a race condition:

1. ohci_urb_enqueue() calls spin_lock_irqsave(&ohci->lock, flags)
and enters the critical section
2. ohci_urb_enqueue() calls timer_pending(&ohci->io_watchdog) and it
returns false
3. ohci_urb_enqueue() sets ohci->prev_frame_no to a frame number
read by ohci_frame_no(ohci)
4. ohci_urb_enqueue() schedules io_watchdog_func() with mod_timer()
5. ohci_urb_enqueue() calls spin_unlock_irqrestore(&ohci->lock,
flags) and exits the critical section
6. Later, ohci_urb_enqueue() is called
7. ohci_urb_enqueue() calls spin_lock_irqsave(&ohci->lock, flags)
and enters the critical section
8. The timer scheduled on step 4 expires and io_watchdog_func() runs
9. io_watchdog_func() calls spin_lock_irqsave(&ohci->lock, flags)
and waits on it because ohci_urb_enqueue() is already in the
critical section on step 7
10. ohci_urb_enqueue() calls timer_pending(&ohci->io_watchdog) and it
returns false
11. ohci_urb_enqueue() sets ohci->prev_frame_no to new frame number
read by ohci_frame_no(ohci) because the frame number proceeded
between step 3 and 6
12. ohci_urb_enqueue() schedules io_watchdog_func() with mod_timer()
13. ohci_urb_enqueue() calls spin_unlock_irqrestore(&ohci->lock,
flags) and exits the critical section, then wake up
io_watchdog_func() which is waiting on step 9
14. io_watchdog_func() enters the critical section
15. io_watchdog_func() calls ohci_frame_no(ohci) and set frame_no
variable to the frame number
16. io_watchdog_func() compares frame_no and ohci->prev_frame_no

On step 16, because this calling of io_watchdog_func() is scheduled on
step 4, the frame number set in ohci->prev_frame_no is expected to the
number set on step 3. However, ohci->prev_frame_no is overwritten on
step 11. Because step 16 is executed soon after step 11, the frame
number might not proceed, so ohci->prev_frame_no must equals to
frame_no.

To address above scenario, this patch introduces timer_running flag to
ohci_hcd structure. Setting true to ohci->timer_running indicates
io_watchdog_func() is scheduled or is running. ohci_urb_enqueue()
checks the flag when it schedules the watchdog (step 4 and 12 above),
so ohci->prev_frame_no is not overwritten while io_watchdog_func() is
running.

Author: Yoshida, Shigeru <[email protected]>
Signed-off-by: Haiqing Bai <[email protected]>
---
drivers/usb/host/ohci-hcd.c | 7 +++++++
drivers/usb/host/ohci-hub.c | 4 +++-
drivers/usb/host/ohci.h | 1 +
3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/usb/host/ohci-hcd.c b/drivers/usb/host/ohci-hcd.c
index ee9676349333..2c7fa0c05854 100644
--- a/drivers/usb/host/ohci-hcd.c
+++ b/drivers/usb/host/ohci-hcd.c
@@ -233,10 +233,12 @@ static int ohci_urb_enqueue (
/* Start up the I/O watchdog timer, if it's not running */
if (!timer_pending(&ohci->io_watchdog) &&
list_empty(&ohci->eds_in_use) &&
+ !ohci->timer_running &&
!(ohci->flags & OHCI_QUIRK_QEMU)) {
ohci->prev_frame_no = ohci_frame_no(ohci);
mod_timer(&ohci->io_watchdog,
jiffies + IO_WATCHDOG_DELAY);
+ ohci->timer_running = 1;
}
list_add(&ed->in_use_list, &ohci->eds_in_use);

@@ -501,6 +503,7 @@ static int ohci_init (struct ohci_hcd *ohci)
return 0;

timer_setup(&ohci->io_watchdog, io_watchdog_func, 0);
+ ohci->timer_running = 0;

ohci->hcca = dma_alloc_coherent (hcd->self.controller,
sizeof(*ohci->hcca), &ohci->hcca_dma, GFP_KERNEL);
@@ -732,6 +735,7 @@ static void io_watchdog_func(struct timer_list *t)
struct td *td, *td_start, *td_next;
unsigned frame_no;
unsigned long flags;
+ int timer_running = 0;

spin_lock_irqsave(&ohci->lock, flags);

@@ -841,10 +845,12 @@ static void io_watchdog_func(struct timer_list *t)
&ohci->regs->donehead);
mod_timer(&ohci->io_watchdog,
jiffies + IO_WATCHDOG_DELAY);
+ timer_running = 1;
}
}

done:
+ ohci->timer_running = timer_running;
spin_unlock_irqrestore(&ohci->lock, flags);
}

@@ -973,6 +979,7 @@ static void ohci_stop (struct usb_hcd *hcd)
if (quirk_nec(ohci))
flush_work(&ohci->nec_work);
del_timer_sync(&ohci->io_watchdog);
+ ohci->timer_running = 0;

ohci_writel (ohci, OHCI_INTR_MIE, &ohci->regs->intrdisable);
ohci_usb_reset(ohci);
diff --git a/drivers/usb/host/ohci-hub.c b/drivers/usb/host/ohci-hub.c
index fb7aaa3b9d06..2aa21ee39cc2 100644
--- a/drivers/usb/host/ohci-hub.c
+++ b/drivers/usb/host/ohci-hub.c
@@ -311,8 +311,10 @@ static int ohci_bus_suspend (struct usb_hcd *hcd)
rc = ohci_rh_suspend (ohci, 0);
spin_unlock_irq (&ohci->lock);

- if (rc == 0)
+ if (rc == 0) {
del_timer_sync(&ohci->io_watchdog);
+ ohci->timer_running = 0;
+ }
return rc;
}

diff --git a/drivers/usb/host/ohci.h b/drivers/usb/host/ohci.h
index 508a803139dd..84f96558d667 100644
--- a/drivers/usb/host/ohci.h
+++ b/drivers/usb/host/ohci.h
@@ -427,6 +427,7 @@ struct ohci_hcd {
unsigned wdh_cnt, prev_wdh_cnt;
u32 prev_donehead;
struct timer_list io_watchdog;
+ int timer_running;

struct work_struct nec_work; /* Worker for NEC quirk */

--
2.11.0



2018-01-31 16:36:12

by Alan Stern

[permalink] [raw]
Subject: Re: [PATCH] ohci-hcd: Fix race condition caused by ohci_urb_enqueue() and io_watchdog_func()

On Wed, 31 Jan 2018, Haiqing Bai wrote:

> Running io_watchdog_func() while ohci_urb_enqueue() is running can
> cause a race condition where ohci->prev_frame_no is corrupted and the
> watchdog can mis-detect following error:
>
> ohci-platform 664a0800.usb: frame counter not updating; disabled
> ohci-platform 664a0800.usb: HC died; cleaning up
>
> Specifically, following scenario causes a race condition:
>
> 1. ohci_urb_enqueue() calls spin_lock_irqsave(&ohci->lock, flags)
> and enters the critical section
> 2. ohci_urb_enqueue() calls timer_pending(&ohci->io_watchdog) and it
> returns false
> 3. ohci_urb_enqueue() sets ohci->prev_frame_no to a frame number
> read by ohci_frame_no(ohci)
> 4. ohci_urb_enqueue() schedules io_watchdog_func() with mod_timer()
> 5. ohci_urb_enqueue() calls spin_unlock_irqrestore(&ohci->lock,
> flags) and exits the critical section
> 6. Later, ohci_urb_enqueue() is called
> 7. ohci_urb_enqueue() calls spin_lock_irqsave(&ohci->lock, flags)
> and enters the critical section
> 8. The timer scheduled on step 4 expires and io_watchdog_func() runs
> 9. io_watchdog_func() calls spin_lock_irqsave(&ohci->lock, flags)
> and waits on it because ohci_urb_enqueue() is already in the
> critical section on step 7
> 10. ohci_urb_enqueue() calls timer_pending(&ohci->io_watchdog) and it
> returns false
> 11. ohci_urb_enqueue() sets ohci->prev_frame_no to new frame number
> read by ohci_frame_no(ohci) because the frame number proceeded
> between step 3 and 6
> 12. ohci_urb_enqueue() schedules io_watchdog_func() with mod_timer()
> 13. ohci_urb_enqueue() calls spin_unlock_irqrestore(&ohci->lock,
> flags) and exits the critical section, then wake up
> io_watchdog_func() which is waiting on step 9
> 14. io_watchdog_func() enters the critical section
> 15. io_watchdog_func() calls ohci_frame_no(ohci) and set frame_no
> variable to the frame number
> 16. io_watchdog_func() compares frame_no and ohci->prev_frame_no
>
> On step 16, because this calling of io_watchdog_func() is scheduled on
> step 4, the frame number set in ohci->prev_frame_no is expected to the
> number set on step 3. However, ohci->prev_frame_no is overwritten on
> step 11. Because step 16 is executed soon after step 11, the frame
> number might not proceed, so ohci->prev_frame_no must equals to
> frame_no.

That is a nasty bug!

> To address above scenario, this patch introduces timer_running flag to
> ohci_hcd structure. Setting true to ohci->timer_running indicates
> io_watchdog_func() is scheduled or is running. ohci_urb_enqueue()
> checks the flag when it schedules the watchdog (step 4 and 12 above),
> so ohci->prev_frame_no is not overwritten while io_watchdog_func() is
> running.

Instead of adding an extra flag variable, which has to be kept in sync
with the timer routine, how about defining a special sentinel value for
prev_frame_no? For example:

#define IO_WATCHDOG_OFF 0xffffff00

Then whenever the timer isn't scheduled or running, set
ohci->prev_frame_no to IO_WATCHDOG_OFF. And instead of testing
timer_pending(), compare prev_frame_no to this special value.

I think that approach will be slightly more robust.

Alan Stern


2018-02-01 05:49:29

by Yoshida, Shigeru

[permalink] [raw]
Subject: Re: [PATCH] ohci-hcd: Fix race condition caused by ohci_urb_enqueue() and io_watchdog_func()

Hi Alan,

Thank you for your commenting.

On Wed, 31 Jan 2018 11:02:47 -0500, Alan Stern wrote:
>> To address above scenario, this patch introduces timer_running flag to
>> ohci_hcd structure. Setting true to ohci->timer_running indicates
>> io_watchdog_func() is scheduled or is running. ohci_urb_enqueue()
>> checks the flag when it schedules the watchdog (step 4 and 12 above),
>> so ohci->prev_frame_no is not overwritten while io_watchdog_func() is
>> running.
>
> Instead of adding an extra flag variable, which has to be kept in sync
> with the timer routine, how about defining a special sentinel value for
> prev_frame_no? For example:
>
> #define IO_WATCHDOG_OFF 0xffffff00
>
> Then whenever the timer isn't scheduled or running, set
> ohci->prev_frame_no to IO_WATCHDOG_OFF. And instead of testing
> timer_pending(), compare prev_frame_no to this special value.
>
> I think that approach will be slightly more robust.

It's reasonable since ohci->prev_frame_no is not used while the
watchdog timer is stopped.

I think we must choose an invalid frame number for the special
sentinel value, but I'm not sure which value is adequate for it.
Is 0xffffff00 an invalid frame number, otherwise how about simply
-1(0xffffffff)?

Thanks,
Shigeru

2018-02-01 15:35:23

by Alan Stern

[permalink] [raw]
Subject: Re: [PATCH] ohci-hcd: Fix race condition caused by ohci_urb_enqueue() and io_watchdog_func()

On Thu, 1 Feb 2018, Yoshida, Shigeru wrote:

> Hi Alan,
>
> Thank you for your commenting.
>
> On Wed, 31 Jan 2018 11:02:47 -0500, Alan Stern wrote:
> >> To address above scenario, this patch introduces timer_running flag to
> >> ohci_hcd structure. Setting true to ohci->timer_running indicates
> >> io_watchdog_func() is scheduled or is running. ohci_urb_enqueue()
> >> checks the flag when it schedules the watchdog (step 4 and 12 above),
> >> so ohci->prev_frame_no is not overwritten while io_watchdog_func() is
> >> running.
> >
> > Instead of adding an extra flag variable, which has to be kept in sync
> > with the timer routine, how about defining a special sentinel value for
> > prev_frame_no? For example:
> >
> > #define IO_WATCHDOG_OFF 0xffffff00
> >
> > Then whenever the timer isn't scheduled or running, set
> > ohci->prev_frame_no to IO_WATCHDOG_OFF. And instead of testing
> > timer_pending(), compare prev_frame_no to this special value.
> >
> > I think that approach will be slightly more robust.
>
> It's reasonable since ohci->prev_frame_no is not used while the
> watchdog timer is stopped.
>
> I think we must choose an invalid frame number for the special
> sentinel value, but I'm not sure which value is adequate for it.
> Is 0xffffff00 an invalid frame number, otherwise how about simply
> -1(0xffffffff)?

Well, the frame_no register is 32 bits wide, but only the 16 low-order
bits are meaningful. ohci_frame_no() strips off the high-order 16
bits, so any value with one of those bits set would be acceptable.
(Besides, valid frame numbers only go up to 2047.)

I chose 0xffffff00 because PCI reads from a non-working device
generally get a value with all the bits set. But since the upper 16
bits are masked away anyhow, it doesn't matter. -1u would be fine.

Alan Stern


2018-02-02 01:48:13

by Yoshida, Shigeru

[permalink] [raw]
Subject: Re: [PATCH] ohci-hcd: Fix race condition caused by ohci_urb_enqueue() and io_watchdog_func()

Hi Alan,

On Thu, 1 Feb 2018 10:33:33 -0500, Alan Stern wrote:
>> I think we must choose an invalid frame number for the special
>> sentinel value, but I'm not sure which value is adequate for it.
>> Is 0xffffff00 an invalid frame number, otherwise how about simply
>> -1(0xffffffff)?
>
> Well, the frame_no register is 32 bits wide, but only the 16 low-order
> bits are meaningful. ohci_frame_no() strips off the high-order 16
> bits, so any value with one of those bits set would be acceptable.
> (Besides, valid frame numbers only go up to 2047.)
>
> I chose 0xffffff00 because PCI reads from a non-working device
> generally get a value with all the bits set. But since the upper 16
> bits are masked away anyhow, it doesn't matter. -1u would be fine.

Thanks, I choose 0xffffff00 as you suggested, and will prepare v2 patch.

Thanks,
Shigeru