2012-01-25 08:17:59

by Arend van Spriel

[permalink] [raw]
Subject: Re: kernel 3.2.1: BUG: scheduling while atomic: swapper/0/0/0x10000100

On 01/24/2012 04:59 PM, Francis Moreau wrote:
> Hi,
>
> Just in case someone is interested, I've been hit by this:
>
> [ 236.734640] BUG: scheduling while atomic: swapper/0/0/0x10000100

Hi Francis

I hit that problem yesterday as well on 3.3.0-rc1. I suspected it came
from my development changes on brcmfmac, but you are not using that.
Still I am looking into it trying some kernel hacking options in Kconfig.

Gr. AvS



2012-01-26 11:34:46

by Stanislaw Gruszka

[permalink] [raw]
Subject: Re: kernel 3.2.1: BUG: scheduling while atomic: swapper/0/0/0x10000100

On Thu, Jan 26, 2012 at 10:49:47AM +0100, Arend van Spriel wrote:
> >>> backtrace:
> >>> :BUG: sleeping function called from invalid context at
> >>> drivers/base/power/runtime.c:785
> >>> :in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/0
> >>> :1 lock held by swapper/0/0:
> >>> : #0: (&chip->rts51x_suspend_timer){+.-...}, at: [<ffffffff8108d711>]
> >>> run_timer_softirq+0x111/0x600
> >>> :Pid: 0, comm: swapper/0 Tainted: G W 3.2.0-2.fc17.x86_64.debug #1
> >>> :Call Trace:
> >>> : <IRQ> [<ffffffff8105ee7a>] __might_sleep+0x13a/0x1f0
> >>> : [<ffffffffa00180b0>] ? rts51x_modi_suspend_timer+0x50/0x50 [ums_realtek]
> >>> : [<ffffffff813fdee3>] __pm_runtime_idle+0xa3/0xb0
> >>> : [<ffffffffa00180b0>] ? rts51x_modi_suspend_timer+0x50/0x50 [ums_realtek]
> >>> : [<ffffffff8146ed30>] usb_autopm_put_interface+0x30/0x40
> >>
> >> This trace seems to indicate what is wrong. At least for this one user.
> >> The usb_autopm_put_interface() call seems result in a sleep and the
> >> rts51x_suspend_timer_fn is holding a lock. Below the code indicated in
> >> the trace.
> >>
> >> runtime.c:
> >> 768 /**
> >> 769 * __pm_runtime_idle - Entry point for runtime idle operations.
> >> 770 * @dev: Device to send idle notification for.
> >> 771 * @rpmflags: Flag bits.
> >> 772 *
> >> 773 * If the RPM_GET_PUT flag is set, decrement the device's usage
> >> count and
> >> 774 * return immediately if it is larger than zero. Then carry out an idle
> >> 775 * notification, either synchronous or asynchronous.
> >> 776 *
> >> 777 * This routine may be called in atomic context if the RPM_ASYNC
> >> flag is set,
> >> 778 * or if pm_runtime_irq_safe() has been called.
> >> 779 */
> >> 780 int __pm_runtime_idle(struct device *dev, int rpmflags)
> >> 781 {
> >> 782 unsigned long flags;
> >> 783 int retval;
> >> 784
> >> 785 might_sleep_if(!(rpmflags & RPM_ASYNC) && !dev->power.irq_safe);
> >>
> >> This issue can either be caused in ums_realtek module or the usb autopm
> >> code. Have not looked into that.
> >>
> >
> > Just a wild guess: it seems that we're affected using different HW
> > configs with different USB devices, so it would indicate that usb
> > autopm is the culprit.

Except Arend, all other users reporting that have ums_realtek, so they
hit that driver issue. I just posted ums_realtek fix.

Arend issue is different, it happens on different kernel version 3.3-rc1 vs 3.2.

Stanislaw


2012-01-26 09:16:37

by Francis Moreau

[permalink] [raw]
Subject: Re: kernel 3.2.1: BUG: scheduling while atomic: swapper/0/0/0x10000100

On Thu, Jan 26, 2012 at 10:04 AM, Arend van Spriel <[email protected]> wrote:
> On 01/25/2012 05:49 PM, Josh Boyer wrote:
>> On Wed, Jan 25, 2012 at 11:37 AM, Francis Moreau <[email protected]> wrote:
>>> Hello,
>>>
>>> On Wed, Jan 25, 2012 at 1:11 PM, Josh Boyer <[email protected]> wrote:
>>>>
>>>> We've had a few similar issues reported in Fedora yesterday after we
>>>> rebased to 3.2.1.
>>>
>>> From which kernel did you rebase to 3.2.1 ?
>>
>> Fedora 16 went from 3.1.9 to 3.2.1 earlier this week. ?We hadn't been getting
>> reports like this until 3.2.1 showed up. ?I asked one of the reporters to try
>> 3.2.0 and they say it recreates there still, so it seems the issue showed up
>> with the 3.2 release. ?One user managed to capture the trace below using a
>> kernel with a variety of debug options turned on. ?It isn't the same backtrace
>> but it might be related.
>>
>> backtrace:
>> :BUG: sleeping function called from invalid context at
>> drivers/base/power/runtime.c:785
>> :in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/0
>> :1 lock held by swapper/0/0:
>> : #0: ?(&chip->rts51x_suspend_timer){+.-...}, at: [<ffffffff8108d711>]
>> run_timer_softirq+0x111/0x600
>> :Pid: 0, comm: swapper/0 Tainted: G ? ? ? ?W ? ?3.2.0-2.fc17.x86_64.debug #1
>> :Call Trace:
>> : <IRQ> ?[<ffffffff8105ee7a>] __might_sleep+0x13a/0x1f0
>> : [<ffffffffa00180b0>] ? rts51x_modi_suspend_timer+0x50/0x50 [ums_realtek]
>> : [<ffffffff813fdee3>] __pm_runtime_idle+0xa3/0xb0
>> : [<ffffffffa00180b0>] ? rts51x_modi_suspend_timer+0x50/0x50 [ums_realtek]
>> : [<ffffffff8146ed30>] usb_autopm_put_interface+0x30/0x40
>
> This trace seems to indicate what is wrong. At least for this one user.
> The usb_autopm_put_interface() call seems result in a sleep and the
> rts51x_suspend_timer_fn is holding a lock. Below the code indicated in
> the trace.
>
> runtime.c:
> 768 /**
> 769 ?* __pm_runtime_idle - Entry point for runtime idle operations.
> 770 ?* @dev: Device to send idle notification for.
> 771 ?* @rpmflags: Flag bits.
> 772 ?*
> 773 ?* If the RPM_GET_PUT flag is set, decrement the device's usage
> count and
> 774 ?* return immediately if it is larger than zero. ?Then carry out an idle
> 775 ?* notification, either synchronous or asynchronous.
> 776 ?*
> 777 ?* This routine may be called in atomic context if the RPM_ASYNC
> flag is set,
> 778 ?* or if pm_runtime_irq_safe() has been called.
> 779 ?*/
> 780 int __pm_runtime_idle(struct device *dev, int rpmflags)
> 781 {
> 782 ? ? ? ? unsigned long flags;
> 783 ? ? ? ? int retval;
> 784
> 785 ? ? ? ? might_sleep_if(!(rpmflags & RPM_ASYNC) && !dev->power.irq_safe);
>
> This issue can either be caused in ums_realtek module or the usb autopm
> code. Have not looked into that.
>

Just a wild guess: it seems that we're affected using different HW
configs with different USB devices, so it would indicate that usb
autopm is the culprit.

--
Francis

2012-01-25 16:37:19

by Francis Moreau

[permalink] [raw]
Subject: Re: kernel 3.2.1: BUG: scheduling while atomic: swapper/0/0/0x10000100

Hello,

On Wed, Jan 25, 2012 at 1:11 PM, Josh Boyer <[email protected]> wrote:
>
> We've had a few similar issues reported in Fedora yesterday after we
> rebased to 3.2.1.

>From which kernel did you rebase to 3.2.1 ?

--
Francis

2012-01-25 08:35:57

by Francis Moreau

[permalink] [raw]
Subject: Re: kernel 3.2.1: BUG: scheduling while atomic: swapper/0/0/0x10000100

Hi Arend,

On Wed, Jan 25, 2012 at 9:17 AM, Arend van Spriel <[email protected]> wrote:
> On 01/24/2012 04:59 PM, Francis Moreau wrote:
>> Hi,
>>
>> Just in case someone is interested, I've been hit by this:
>>
>> [ ?236.734640] BUG: scheduling while atomic: swapper/0/0/0x10000100
>
> I hit that problem yesterday as well on 3.3.0-rc1. I suspected it came
> from my development changes on brcmfmac, but you are not using that.

I'm not sure it's related to the brcmfmac, because the RFKILL switch
was off when it happened.

Were you writting/reading from/to a SD card by any chances when it did trigger ?

--
Francis

2012-01-25 12:11:48

by Josh Boyer

[permalink] [raw]
Subject: Re: kernel 3.2.1: BUG: scheduling while atomic: swapper/0/0/0x10000100

On Wed, Jan 25, 2012 at 5:00 AM, Arend van Spriel <[email protected]> wrote:
> On 01/25/2012 09:35 AM, Francis Moreau wrote:
>> Hi Arend,
>>
>> On Wed, Jan 25, 2012 at 9:17 AM, Arend van Spriel <[email protected]> wrote:
>>> On 01/24/2012 04:59 PM, Francis Moreau wrote:
>>>> Hi,
>>>>
>>>> Just in case someone is interested, I've been hit by this:
>>>>
>>>> [ ?236.734640] BUG: scheduling while atomic: swapper/0/0/0x10000100
>>>
>>> I hit that problem yesterday as well on 3.3.0-rc1. I suspected it came
>>> from my development changes on brcmfmac, but you are not using that.
>>
>> I'm not sure it's related to the brcmfmac, because the RFKILL switch
>> was off when it happened.
>>
>> Were you writting/reading from/to a SD card by any chances when it did trigger ?
>>
>
> Nope. It was a USB device driver. Maybe it was conincidental that it
> happened shortly after loading the driver, but I tend to mistrust my own
> stuff ;-). I posted a question in my issue yesterday (see [1]).

We've had a few similar issues reported in Fedora yesterday after we
rebased to 3.2.1.
I've not found a common trend yet, but there is definitely an issue on
some hardware in
3.2.1

josh

2012-01-26 09:50:00

by Arend van Spriel

[permalink] [raw]
Subject: Re: kernel 3.2.1: BUG: scheduling while atomic: swapper/0/0/0x10000100

On 01/26/2012 10:16 AM, Francis Moreau wrote:
> On Thu, Jan 26, 2012 at 10:04 AM, Arend van Spriel <[email protected]> wrote:
>> On 01/25/2012 05:49 PM, Josh Boyer wrote:
>>> On Wed, Jan 25, 2012 at 11:37 AM, Francis Moreau <[email protected]> wrote:
>>>> Hello,
>>>>
>>>> On Wed, Jan 25, 2012 at 1:11 PM, Josh Boyer <[email protected]> wrote:
>>>>>
>>>>> We've had a few similar issues reported in Fedora yesterday after we
>>>>> rebased to 3.2.1.
>>>>
>>>> From which kernel did you rebase to 3.2.1 ?
>>>
>>> Fedora 16 went from 3.1.9 to 3.2.1 earlier this week. We hadn't been getting
>>> reports like this until 3.2.1 showed up. I asked one of the reporters to try
>>> 3.2.0 and they say it recreates there still, so it seems the issue showed up
>>> with the 3.2 release. One user managed to capture the trace below using a
>>> kernel with a variety of debug options turned on. It isn't the same backtrace
>>> but it might be related.
>>>
>>> backtrace:
>>> :BUG: sleeping function called from invalid context at
>>> drivers/base/power/runtime.c:785
>>> :in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/0
>>> :1 lock held by swapper/0/0:
>>> : #0: (&chip->rts51x_suspend_timer){+.-...}, at: [<ffffffff8108d711>]
>>> run_timer_softirq+0x111/0x600
>>> :Pid: 0, comm: swapper/0 Tainted: G W 3.2.0-2.fc17.x86_64.debug #1
>>> :Call Trace:
>>> : <IRQ> [<ffffffff8105ee7a>] __might_sleep+0x13a/0x1f0
>>> : [<ffffffffa00180b0>] ? rts51x_modi_suspend_timer+0x50/0x50 [ums_realtek]
>>> : [<ffffffff813fdee3>] __pm_runtime_idle+0xa3/0xb0
>>> : [<ffffffffa00180b0>] ? rts51x_modi_suspend_timer+0x50/0x50 [ums_realtek]
>>> : [<ffffffff8146ed30>] usb_autopm_put_interface+0x30/0x40
>>
>> This trace seems to indicate what is wrong. At least for this one user.
>> The usb_autopm_put_interface() call seems result in a sleep and the
>> rts51x_suspend_timer_fn is holding a lock. Below the code indicated in
>> the trace.
>>
>> runtime.c:
>> 768 /**
>> 769 * __pm_runtime_idle - Entry point for runtime idle operations.
>> 770 * @dev: Device to send idle notification for.
>> 771 * @rpmflags: Flag bits.
>> 772 *
>> 773 * If the RPM_GET_PUT flag is set, decrement the device's usage
>> count and
>> 774 * return immediately if it is larger than zero. Then carry out an idle
>> 775 * notification, either synchronous or asynchronous.
>> 776 *
>> 777 * This routine may be called in atomic context if the RPM_ASYNC
>> flag is set,
>> 778 * or if pm_runtime_irq_safe() has been called.
>> 779 */
>> 780 int __pm_runtime_idle(struct device *dev, int rpmflags)
>> 781 {
>> 782 unsigned long flags;
>> 783 int retval;
>> 784
>> 785 might_sleep_if(!(rpmflags & RPM_ASYNC) && !dev->power.irq_safe);
>>
>> This issue can either be caused in ums_realtek module or the usb autopm
>> code. Have not looked into that.
>>
>
> Just a wild guess: it seems that we're affected using different HW
> configs with different USB devices, so it would indicate that usb
> autopm is the culprit.
>

Did a git log between the versions (git log v3.1.9..v3.2.1 --
drivers/usb/core). One commit caught my eye:

commit d727b60659a1173eb4142a5fc521ce67c28b34e1
Merge: 3ee72ca 2a5306c
Author: Rafael J. Wysocki <[email protected]>
Date: Fri Oct 7 23:16:55 2011 +0200

Merge branch 'pm-runtime' into pm-for-linus

* pm-runtime:
PM / Tracing: build rpm-traces.c only if CONFIG_PM_RUNTIME is set
PM / Runtime: Replace dev_dbg() with trace_rpm_*()
PM / Runtime: Introduce trace points for tracing rpm_* functions
PM / Runtime: Don't run callbacks under lock for power.irq_safe set
USB: Add wakeup info to debugging messages
PM / Runtime: pm_runtime_idle() can be called in atomic context
PM / Runtime: Add macro to test for runtime PM events
PM / Runtime: Add might_sleep() to runtime PM functions

Might be worth to try with CONFIG_PM_RUNTIME disabled. However, I have
not yet found how to reproduce the problem.

Gr. AvS


2012-01-26 11:30:00

by Stanislaw Gruszka

[permalink] [raw]
Subject: [PATCH] USB: Realtek cr: fix autopm scheduling while atomic

Resolves:
https://bugzilla.redhat.com/show_bug.cgi?id=784345

Reported-by: Francis Moreau <[email protected]>
Reported-and-tested-by: Christian D <[email protected]>
Reported-and-tested-by: Jimmy Dorff <[email protected]>
Reported-and-tested-by: [email protected]
Cc: [email protected] # 3.2+
Signed-off-by: Stanislaw Gruszka <[email protected]>

diff --git a/drivers/usb/storage/realtek_cr.c b/drivers/usb/storage/realtek_cr.c
index 1f62723..d32f720 100644
--- a/drivers/usb/storage/realtek_cr.c
+++ b/drivers/usb/storage/realtek_cr.c
@@ -789,7 +789,7 @@ static void rts51x_suspend_timer_fn(unsigned long data)
rts51x_set_stat(chip, RTS51X_STAT_SS);
/* ignore mass storage interface's children */
pm_suspend_ignore_children(&us->pusb_intf->dev, true);
- usb_autopm_put_interface(us->pusb_intf);
+ usb_autopm_put_interface_async(us->pusb_intf);
US_DEBUGP("%s: RTS51X_STAT_SS 01,"
"intf->pm_usage_cnt:%d, power.usage:%d\n",
__func__,

2012-01-26 09:05:15

by Arend van Spriel

[permalink] [raw]
Subject: Re: kernel 3.2.1: BUG: scheduling while atomic: swapper/0/0/0x10000100

On 01/25/2012 05:49 PM, Josh Boyer wrote:
> On Wed, Jan 25, 2012 at 11:37 AM, Francis Moreau <[email protected]> wrote:
>> Hello,
>>
>> On Wed, Jan 25, 2012 at 1:11 PM, Josh Boyer <[email protected]> wrote:
>>>
>>> We've had a few similar issues reported in Fedora yesterday after we
>>> rebased to 3.2.1.
>>
>> From which kernel did you rebase to 3.2.1 ?
>
> Fedora 16 went from 3.1.9 to 3.2.1 earlier this week. We hadn't been getting
> reports like this until 3.2.1 showed up. I asked one of the reporters to try
> 3.2.0 and they say it recreates there still, so it seems the issue showed up
> with the 3.2 release. One user managed to capture the trace below using a
> kernel with a variety of debug options turned on. It isn't the same backtrace
> but it might be related.
>
> backtrace:
> :BUG: sleeping function called from invalid context at
> drivers/base/power/runtime.c:785
> :in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/0
> :1 lock held by swapper/0/0:
> : #0: (&chip->rts51x_suspend_timer){+.-...}, at: [<ffffffff8108d711>]
> run_timer_softirq+0x111/0x600
> :Pid: 0, comm: swapper/0 Tainted: G W 3.2.0-2.fc17.x86_64.debug #1
> :Call Trace:
> : <IRQ> [<ffffffff8105ee7a>] __might_sleep+0x13a/0x1f0
> : [<ffffffffa00180b0>] ? rts51x_modi_suspend_timer+0x50/0x50 [ums_realtek]
> : [<ffffffff813fdee3>] __pm_runtime_idle+0xa3/0xb0
> : [<ffffffffa00180b0>] ? rts51x_modi_suspend_timer+0x50/0x50 [ums_realtek]
> : [<ffffffff8146ed30>] usb_autopm_put_interface+0x30/0x40

This trace seems to indicate what is wrong. At least for this one user.
The usb_autopm_put_interface() call seems result in a sleep and the
rts51x_suspend_timer_fn is holding a lock. Below the code indicated in
the trace.

runtime.c:
768 /**
769 * __pm_runtime_idle - Entry point for runtime idle operations.
770 * @dev: Device to send idle notification for.
771 * @rpmflags: Flag bits.
772 *
773 * If the RPM_GET_PUT flag is set, decrement the device's usage
count and
774 * return immediately if it is larger than zero. Then carry out an idle
775 * notification, either synchronous or asynchronous.
776 *
777 * This routine may be called in atomic context if the RPM_ASYNC
flag is set,
778 * or if pm_runtime_irq_safe() has been called.
779 */
780 int __pm_runtime_idle(struct device *dev, int rpmflags)
781 {
782 unsigned long flags;
783 int retval;
784
785 might_sleep_if(!(rpmflags & RPM_ASYNC) && !dev->power.irq_safe);

This issue can either be caused in ums_realtek module or the usb autopm
code. Have not looked into that.

Gr. AvS


2012-01-25 10:00:34

by Arend van Spriel

[permalink] [raw]
Subject: Re: kernel 3.2.1: BUG: scheduling while atomic: swapper/0/0/0x10000100

On 01/25/2012 09:35 AM, Francis Moreau wrote:
> Hi Arend,
>
> On Wed, Jan 25, 2012 at 9:17 AM, Arend van Spriel <[email protected]> wrote:
>> On 01/24/2012 04:59 PM, Francis Moreau wrote:
>>> Hi,
>>>
>>> Just in case someone is interested, I've been hit by this:
>>>
>>> [ 236.734640] BUG: scheduling while atomic: swapper/0/0/0x10000100
>>
>> I hit that problem yesterday as well on 3.3.0-rc1. I suspected it came
>> from my development changes on brcmfmac, but you are not using that.
>
> I'm not sure it's related to the brcmfmac, because the RFKILL switch
> was off when it happened.
>
> Were you writting/reading from/to a SD card by any chances when it did trigger ?
>

Nope. It was a USB device driver. Maybe it was conincidental that it
happened shortly after loading the driver, but I tend to mistrust my own
stuff ;-). I posted a question in my issue yesterday (see [1]).

Gr. AvS

[1] https://lkml.org/lkml/2012/1/24/83


2012-01-25 16:49:15

by Josh Boyer

[permalink] [raw]
Subject: Re: kernel 3.2.1: BUG: scheduling while atomic: swapper/0/0/0x10000100

On Wed, Jan 25, 2012 at 11:37 AM, Francis Moreau <[email protected]> wrote:
> Hello,
>
> On Wed, Jan 25, 2012 at 1:11 PM, Josh Boyer <[email protected]> wrote:
>>
>> We've had a few similar issues reported in Fedora yesterday after we
>> rebased to 3.2.1.
>
> From which kernel did you rebase to 3.2.1 ?

Fedora 16 went from 3.1.9 to 3.2.1 earlier this week. We hadn't been getting
reports like this until 3.2.1 showed up. I asked one of the reporters to try
3.2.0 and they say it recreates there still, so it seems the issue showed up
with the 3.2 release. One user managed to capture the trace below using a
kernel with a variety of debug options turned on. It isn't the same backtrace
but it might be related.

backtrace:
:BUG: sleeping function called from invalid context at
drivers/base/power/runtime.c:785
:in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/0
:1 lock held by swapper/0/0:
: #0: (&chip->rts51x_suspend_timer){+.-...}, at: [<ffffffff8108d711>]
run_timer_softirq+0x111/0x600
:Pid: 0, comm: swapper/0 Tainted: G W 3.2.0-2.fc17.x86_64.debug #1
:Call Trace:
: <IRQ> [<ffffffff8105ee7a>] __might_sleep+0x13a/0x1f0
: [<ffffffffa00180b0>] ? rts51x_modi_suspend_timer+0x50/0x50 [ums_realtek]
: [<ffffffff813fdee3>] __pm_runtime_idle+0xa3/0xb0
: [<ffffffffa00180b0>] ? rts51x_modi_suspend_timer+0x50/0x50 [ums_realtek]
: [<ffffffff8146ed30>] usb_autopm_put_interface+0x30/0x40
: [<ffffffffa0018125>] rts51x_suspend_timer_fn+0x75/0x80 [ums_realtek]
: [<ffffffff8108d7a6>] run_timer_softirq+0x1a6/0x600
: [<ffffffff8108d711>] ? run_timer_softirq+0x111/0x600
: [<ffffffff81084700>] __do_softirq+0xe0/0x370
: [<ffffffff8101fcc3>] ? native_sched_clock+0x13/0x80
: [<ffffffff8167f27c>] call_softirq+0x1c/0x30
: [<ffffffff8101a3e5>] do_softirq+0xa5/0xe0
: [<ffffffff81084d15>] irq_exit+0xe5/0xf0
: [<ffffffff8167fc3e>] smp_apic_timer_interrupt+0x6e/0x99
: [<ffffffff8167daf3>] apic_timer_interrupt+0x73/0x80
: <EOI> [<ffffffff8101fd39>] ? sched_clock+0x9/0x10
: [<ffffffff81363f9b>] ? intel_idle+0xfb/0x160
: [<ffffffff81363f97>] ? intel_idle+0xf7/0x160
: [<ffffffff814fa7f3>] cpuidle_idle_call+0xb3/0x520
: [<ffffffff81017265>] cpu_idle+0xe5/0x130
: [<ffffffff8164aff3>] rest_init+0xd7/0xe4
: [<ffffffff8164af1c>] ? csum_partial_copy_generic+0x16c/0x16c
: [<ffffffff81f60c39>] start_kernel+0x3ec/0x3f9
: [<ffffffff81f60347>] x86_64_start_reservations+0x132/0x136
: [<ffffffff81f6044b>] x86_64_start_kernel+0x100/0x10f

Not sure if that is a common trend yet or not.

josh

https://bugzilla.redhat.com/show_bug.cgi?id=784345
https://bugzilla.redhat.com/show_bug.cgi?id=784414