2010-01-18 18:49:46

by Andreas Mohr

[permalink] [raw]
Subject: mcs7830 usb net: "scheduling while atomic" danger?

Hi,

drivers/net/usb/mcs7830.c does several:

mutex_lock(&dev->phy_mutex);
/* write the MII command */
ret = mcs7830_set_reg(dev, HIF_REG_PHY_CMD1, 2, cmd);
if (ret < 0)
goto out;

/* wait for the data to become valid, should be within < 1ms */
for (i = 0; i < 10; i++) {
ret = mcs7830_get_reg(dev, HIF_REG_PHY_CMD1, 2, cmd);
if ((ret < 0) || (cmd[1] &
HIF_REG_PHY_CMD2_READY_FLAG_BIT))
break;
ret = -EIO;
msleep(1);
}


Forgive me, but doesn't that mutex_lock()/msleep() (ab)use mean
risking a "scheduling while atomic"?
(such as discussed in e.g.
http://search.luky.org/linux-kernel.2004/msg92817.html )


And, if that is the case, shouldn't all such cases simply be killed for
good via a capable semantic patch?

Thanks,

Andreas Mohr


2010-01-18 20:25:13

by Julia Lawall

[permalink] [raw]
Subject: Re: mcs7830 usb net: "scheduling while atomic" danger?

On Mon, 18 Jan 2010, Andreas Mohr wrote:

> Hi,
>
> drivers/net/usb/mcs7830.c does several:
>
> mutex_lock(&dev->phy_mutex);
> /* write the MII command */
> ret = mcs7830_set_reg(dev, HIF_REG_PHY_CMD1, 2, cmd);
> if (ret < 0)
> goto out;
>
> /* wait for the data to become valid, should be within < 1ms */
> for (i = 0; i < 10; i++) {
> ret = mcs7830_get_reg(dev, HIF_REG_PHY_CMD1, 2, cmd);
> if ((ret < 0) || (cmd[1] &
> HIF_REG_PHY_CMD2_READY_FLAG_BIT))
> break;
> ret = -EIO;
> msleep(1);
> }
>
>
> Forgive me, but doesn't that mutex_lock()/msleep() (ab)use mean
> risking a "scheduling while atomic"?
> (such as discussed in e.g.
> http://search.luky.org/linux-kernel.2004/msg92817.html )
>
>
> And, if that is the case, shouldn't all such cases simply be killed for
> good via a capable semantic patch?

This could easily be done, if the diagnosis is correct.

julia

2010-01-18 20:48:10

by Thomas Gleixner

[permalink] [raw]
Subject: Re: mcs7830 usb net: "scheduling while atomic" danger?

On Mon, 18 Jan 2010, Andreas Mohr wrote:

> Hi,
>
> drivers/net/usb/mcs7830.c does several:
>
> mutex_lock(&dev->phy_mutex);
> /* write the MII command */
> ret = mcs7830_set_reg(dev, HIF_REG_PHY_CMD1, 2, cmd);
> if (ret < 0)
> goto out;
>
> /* wait for the data to become valid, should be within < 1ms */
> for (i = 0; i < 10; i++) {
> ret = mcs7830_get_reg(dev, HIF_REG_PHY_CMD1, 2, cmd);
> if ((ret < 0) || (cmd[1] &
> HIF_REG_PHY_CMD2_READY_FLAG_BIT))
> break;
> ret = -EIO;
> msleep(1);
> }
>
>
> Forgive me, but doesn't that mutex_lock()/msleep() (ab)use mean
> risking a "scheduling while atomic"?
> (such as discussed in e.g.
> http://search.luky.org/linux-kernel.2004/msg92817.html )

No, that's different. You are allowed to sleep with a mutex held. The
thread is about spin_lock()/msleep().

spin_lock() is implicitly disabling preemption, mutex_lock() does not.

Thanks,

tglx

2010-01-18 21:10:48

by Julia Lawall

[permalink] [raw]
Subject: Re: mcs7830 usb net: "scheduling while atomic" danger?

On Mon, 18 Jan 2010, Andreas Mohr wrote:

> Hi,
>
> drivers/net/usb/mcs7830.c does several:
>
> mutex_lock(&dev->phy_mutex);
> /* write the MII command */
> ret = mcs7830_set_reg(dev, HIF_REG_PHY_CMD1, 2, cmd);
> if (ret < 0)
> goto out;
>
> /* wait for the data to become valid, should be within < 1ms */
> for (i = 0; i < 10; i++) {
> ret = mcs7830_get_reg(dev, HIF_REG_PHY_CMD1, 2, cmd);
> if ((ret < 0) || (cmd[1] &
> HIF_REG_PHY_CMD2_READY_FLAG_BIT))
> break;
> ret = -EIO;
> msleep(1);
> }
>
>
> Forgive me, but doesn't that mutex_lock()/msleep() (ab)use mean
> risking a "scheduling while atomic"?
> (such as discussed in e.g.
> http://search.luky.org/linux-kernel.2004/msg92817.html )
>
>
> And, if that is the case, shouldn't all such cases simply be killed for
> good via a capable semantic patch?

The semantic match shown below finds 55 matches. All but two involve
mutex_lock. Those are in the file
/var/linuxes/linux-next/drivers/usb/host/ehci-hub.c
in the functions ehci_bus_suspend and ehci_hub_control.

julia


@@
@@

(
*mutex_lock(...)
|
*spin_lock(...)
|
*spin_lock_irq(...)
|
*spin_lock_irqsave(...)
)
... when != mutex_unlock(...)
when != spin_unlock(...)
when != spin_unlock_irq(...)
when != spin_unlock_irqrestore(...)
*msleep(...)

2010-01-18 21:24:23

by Thomas Gleixner

[permalink] [raw]
Subject: Re: mcs7830 usb net: "scheduling while atomic" danger?

Julia,

On Mon, 18 Jan 2010, Julia Lawall wrote:
> On Mon, 18 Jan 2010, Andreas Mohr wrote:
>
> The semantic match shown below finds 55 matches. All but two involve
> mutex_lock. Those are in the file

As I said before the mutex_lock()/msleep() ones are fine.

> /var/linuxes/linux-next/drivers/usb/host/ehci-hub.c
> in the functions ehci_bus_suspend and ehci_hub_control.

The msleep in ehci_hub_control() which happens with ehci->lock held
and irqs disabled is definitely buggy.

I can't see anything wrong wth the msleep in ehci_bus_suspend() as it
is _before_ the spin_lock_irq(&ehci->lock) region.

Thanks,

tglx

2010-01-18 21:28:20

by Arnd Bergmann

[permalink] [raw]
Subject: Re: mcs7830 usb net: "scheduling while atomic" danger?

On Monday 18 January 2010, Julia Lawall wrote:
> On Mon, 18 Jan 2010, Andreas Mohr wrote:
> >
> > Forgive me, but doesn't that mutex_lock()/msleep() (ab)use mean
> > risking a "scheduling while atomic"?
> > (such as discussed in e.g.
> > http://search.luky.org/linux-kernel.2004/msg92817.html )
> >
> >
> > And, if that is the case, shouldn't all such cases simply be killed for
> > good via a capable semantic patch?
>
> The semantic match shown below finds 55 matches. All but two involve
> mutex_lock. Those are in the file
> /var/linuxes/linux-next/drivers/usb/host/ehci-hub.c
> in the functions ehci_bus_suspend and ehci_hub_control.

That code looks indeed broken as was added las July as part of 331ac6b288d9
"USB: EHCI: Add Intel Moorestown EHCI controller HOSTPCx extensions and
support phy low power mode". The reason that this hasn't triggered is
probably the lack of Moorestown machines in the field.

Arnd

2010-01-18 21:32:53

by Julia Lawall

[permalink] [raw]
Subject: Re: mcs7830 usb net: "scheduling while atomic" danger?

On Mon, 18 Jan 2010, Thomas Gleixner wrote:

> Julia,
>
> On Mon, 18 Jan 2010, Julia Lawall wrote:
> > On Mon, 18 Jan 2010, Andreas Mohr wrote:
> >
> > The semantic match shown below finds 55 matches. All but two involve
> > mutex_lock. Those are in the file
>
> As I said before the mutex_lock()/msleep() ones are fine.

OK, I sem to have missed that message.

> > /var/linuxes/linux-next/drivers/usb/host/ehci-hub.c
> > in the functions ehci_bus_suspend and ehci_hub_control.
>
> The msleep in ehci_hub_control() which happens with ehci->lock held
> and irqs disabled is definitely buggy.
>
> I can't see anything wrong wth the msleep in ehci_bus_suspend() as it
> is _before_ the spin_lock_irq(&ehci->lock) region.

There is another one later in the function:

msleep(5);/* 5ms for HCD enter low pwr mode */

In my linux-next, it is on line 181.

julia

2010-01-18 21:35:27

by Julia Lawall

[permalink] [raw]
Subject: Re: mcs7830 usb net: "scheduling while atomic" danger?

On Mon, 18 Jan 2010, Arnd Bergmann wrote:

> On Monday 18 January 2010, Julia Lawall wrote:
> > On Mon, 18 Jan 2010, Andreas Mohr wrote:
> > >
> > > Forgive me, but doesn't that mutex_lock()/msleep() (ab)use mean
> > > risking a "scheduling while atomic"?
> > > (such as discussed in e.g.
> > > http://search.luky.org/linux-kernel.2004/msg92817.html )
> > >
> > >
> > > And, if that is the case, shouldn't all such cases simply be killed for
> > > good via a capable semantic patch?
> >
> > The semantic match shown below finds 55 matches. All but two involve
> > mutex_lock. Those are in the file
> > /var/linuxes/linux-next/drivers/usb/host/ehci-hub.c
> > in the functions ehci_bus_suspend and ehci_hub_control.
>
> That code looks indeed broken as was added las July as part of 331ac6b288d9
> "USB: EHCI: Add Intel Moorestown EHCI controller HOSTPCx extensions and
> support phy low power mode". The reason that this hasn't triggered is
> probably the lack of Moorestown machines in the field.

The fix is just msleep -> mdelay?

julia

2010-01-18 21:38:44

by Thomas Gleixner

[permalink] [raw]
Subject: Re: mcs7830 usb net: "scheduling while atomic" danger?

On Mon, 18 Jan 2010, Julia Lawall wrote:
> On Mon, 18 Jan 2010, Thomas Gleixner wrote:
>
> > Julia,
> >
> > On Mon, 18 Jan 2010, Julia Lawall wrote:
> > > On Mon, 18 Jan 2010, Andreas Mohr wrote:
> > >
> > > The semantic match shown below finds 55 matches. All but two involve
> > > mutex_lock. Those are in the file
> >
> > As I said before the mutex_lock()/msleep() ones are fine.
>
> OK, I sem to have missed that message.
>
> > > /var/linuxes/linux-next/drivers/usb/host/ehci-hub.c
> > > in the functions ehci_bus_suspend and ehci_hub_control.
> >
> > The msleep in ehci_hub_control() which happens with ehci->lock held
> > and irqs disabled is definitely buggy.
> >
> > I can't see anything wrong wth the msleep in ehci_bus_suspend() as it
> > is _before_ the spin_lock_irq(&ehci->lock) region.
>
> There is another one later in the function:
>
> msleep(5);/* 5ms for HCD enter low pwr mode */
>
> In my linux-next, it is on line 181.

Ooops, missed that one. Right, that's buggy as well.

Thanks,

tglx

2010-01-18 22:25:34

by Arnd Bergmann

[permalink] [raw]
Subject: Re: mcs7830 usb net: "scheduling while atomic" danger?

On Monday 18 January 2010, Julia Lawall wrote:
> > That code looks indeed broken as was added las July as part of 331ac6b288d9
> > "USB: EHCI: Add Intel Moorestown EHCI controller HOSTPCx extensions and
> > support phy low power mode". The reason that this hasn't triggered is
> > probably the lack of Moorestown machines in the field.
>
> The fix is just msleep -> mdelay?

No, that would just kill the warning but still leave horrible code. There
should really be some way to move the sleeping operation outside of the
spinlock.

>From a brief look at the code, I think the sequence could be converted
from

lock();
suspend();
sleep();
clock_disable();
unlock();

to

lock();
again:
suspend();
unlock();
sleep();
lock();
if (!suspended())
goto again;
clock_disable();
unlock();

Arnd

2010-01-19 02:53:48

by Du, Alek

[permalink] [raw]
Subject: RE: mcs7830 usb net: "scheduling while atomic" danger?

I confirm that thing is bad, I need to prepare another patch to fix that.

Thanks,
Alek
>-----Original Message-----
>From: Arnd Bergmann [mailto:[email protected]]
>Sent: Tuesday, January 19, 2010 5:24 AM
>To: Julia Lawall
>Cc: Andreas Mohr; [email protected]; Greg KH; Du, Alek; Pan, Jacob
>jun; Alan Stern
>Subject: Re: mcs7830 usb net: "scheduling while atomic" danger?
>
>On Monday 18 January 2010, Julia Lawall wrote:
>> On Mon, 18 Jan 2010, Andreas Mohr wrote:
>> >
>> > Forgive me, but doesn't that mutex_lock()/msleep() (ab)use mean
>> > risking a "scheduling while atomic"?
>> > (such as discussed in e.g.
>> > http://search.luky.org/linux-kernel.2004/msg92817.html )
>> >
>> >
>> > And, if that is the case, shouldn't all such cases simply be killed for
>> > good via a capable semantic patch?
>>
>> The semantic match shown below finds 55 matches. All but two involve
>> mutex_lock. Those are in the file
>> /var/linuxes/linux-next/drivers/usb/host/ehci-hub.c
>> in the functions ehci_bus_suspend and ehci_hub_control.
>
>That code looks indeed broken as was added las July as part of 331ac6b288d9
>"USB: EHCI: Add Intel Moorestown EHCI controller HOSTPCx extensions and
>support phy low power mode". The reason that this hasn't triggered is
>probably the lack of Moorestown machines in the field.
>
> Arnd

2010-01-19 06:26:00

by Julia Lawall

[permalink] [raw]
Subject: RE: mcs7830 usb net: "scheduling while atomic" danger?

So you will fix /var/linuxes/linux-next/drivers/usb/host/ehci-hub.c?

I think it is a bit out of my expertise...

julia


On Tue, 19 Jan 2010, Du, Alek wrote:

> I confirm that thing is bad, I need to prepare another patch to fix that.
>
> Thanks,
> Alek
> >-----Original Message-----
> >From: Arnd Bergmann [mailto:[email protected]]
> >Sent: Tuesday, January 19, 2010 5:24 AM
> >To: Julia Lawall
> >Cc: Andreas Mohr; [email protected]; Greg KH; Du, Alek; Pan, Jacob
> >jun; Alan Stern
> >Subject: Re: mcs7830 usb net: "scheduling while atomic" danger?
> >
> >On Monday 18 January 2010, Julia Lawall wrote:
> >> On Mon, 18 Jan 2010, Andreas Mohr wrote:
> >> >
> >> > Forgive me, but doesn't that mutex_lock()/msleep() (ab)use mean
> >> > risking a "scheduling while atomic"?
> >> > (such as discussed in e.g.
> >> > http://search.luky.org/linux-kernel.2004/msg92817.html )
> >> >
> >> >
> >> > And, if that is the case, shouldn't all such cases simply be killed for
> >> > good via a capable semantic patch?
> >>
> >> The semantic match shown below finds 55 matches. All but two involve
> >> mutex_lock. Those are in the file
> >> /var/linuxes/linux-next/drivers/usb/host/ehci-hub.c
> >> in the functions ehci_bus_suspend and ehci_hub_control.
> >
> >That code looks indeed broken as was added las July as part of 331ac6b288d9
> >"USB: EHCI: Add Intel Moorestown EHCI controller HOSTPCx extensions and
> >support phy low power mode". The reason that this hasn't triggered is
> >probably the lack of Moorestown machines in the field.
> >
> > Arnd
>

2010-01-19 08:34:34

by Du, Alek

[permalink] [raw]
Subject: [PATCH] ehci: phy low power mode bug fixing

This patch should fix the bug and I tested it with Moorestown platform...
But the strange thing is that even the original code does not trigger the
"scheduling while atomic" issue...

>From ca7833d854867e91d6aa6bccf1f3224862b3a25c Mon Sep 17 00:00:00 2001
From: Alek Du <[email protected]>
Date: Tue, 19 Jan 2010 16:05:15 +0800
Subject: [PATCH] ehci: phy low power mode bug fixing

1. There are two msleep calls inside two spin lock sections, need to unlock
and lock again after msleep.
2. Save a extra status reg setting.

Signed-off-by: Alek Du <[email protected]>
---
drivers/usb/host/ehci-hub.c | 13 ++++++++-----
1 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/usb/host/ehci-hub.c b/drivers/usb/host/ehci-hub.c
index 2c6571c..7d77fbb 100644
--- a/drivers/usb/host/ehci-hub.c
+++ b/drivers/usb/host/ehci-hub.c
@@ -178,7 +178,9 @@ static int ehci_bus_suspend (struct usb_hcd *hcd)
if (hostpc_reg) {
u32 t3;

+ spin_unlock_irq(&ehci->lock);
msleep(5);/* 5ms for HCD enter low pwr mode */
+ spin_lock_irq(&ehci->lock);
t3 = ehci_readl(ehci, hostpc_reg);
ehci_writel(ehci, t3 | HOSTPC_PHCD, hostpc_reg);
t3 = ehci_readl(ehci, hostpc_reg);
@@ -886,17 +888,18 @@ static int ehci_hub_control (
if ((temp & PORT_PE) == 0
|| (temp & PORT_RESET) != 0)
goto error;
- ehci_writel(ehci, temp | PORT_SUSPEND, status_reg);
+
/* After above check the port must be connected.
* Set appropriate bit thus could put phy into low power
* mode if we have hostpc feature
*/
+ temp &= ~PORT_WKCONN_E;
+ temp |= PORT_WKDISC_E | PORT_WKOC_E;
+ ehci_writel(ehci, temp | PORT_SUSPEND, status_reg);
if (hostpc_reg) {
- temp &= ~PORT_WKCONN_E;
- temp |= (PORT_WKDISC_E | PORT_WKOC_E);
- ehci_writel(ehci, temp | PORT_SUSPEND,
- status_reg);
+ spin_unlock_irqrestore(&ehci->lock, flags);
msleep(5);/* 5ms for HCD enter low pwr mode */
+ spin_lock_irqsave(&ehci->lock, flags);
temp1 = ehci_readl(ehci, hostpc_reg);
ehci_writel(ehci, temp1 | HOSTPC_PHCD,
hostpc_reg);
--
1.6.3.3


On Tue, 19 Jan 2010 06:25:10 +0800
Arnd Bergmann <[email protected]> wrote:

> On Monday 18 January 2010, Julia Lawall wrote:
> > > That code looks indeed broken as was added las July as part of
> > > 331ac6b288d9 "USB: EHCI: Add Intel Moorestown EHCI controller
> > > HOSTPCx extensions and support phy low power mode". The reason
> > > that this hasn't triggered is probably the lack of Moorestown
> > > machines in the field.
> >
> > The fix is just msleep -> mdelay?
>
> No, that would just kill the warning but still leave horrible code.
> There should really be some way to move the sleeping operation
> outside of the spinlock.
>
> From a brief look at the code, I think the sequence could be converted
> from
>
> lock();
> suspend();
> sleep();
> clock_disable();
> unlock();
>
> to
>
> lock();
> again:
> suspend();
> unlock();
> sleep();
> lock();
> if (!suspended())
> goto again;
> clock_disable();
> unlock();
>
> Arnd