2008-10-26 17:34:18

by Alan Stern

Subject: Bug in workqueues [was: Usb devices randomly aren't detected with 2.6.28-rc1-git1]

This is no longer a USB issue, so I'm asking for help from the general
kernel community. Background can be found in Bugzilla #11853.

On Sun, 26 Oct 2008, Mikko C. wrote:

> Mikko C. wrote:
> > Alan Stern wrote:
> >>
> >> I don't see anything in there that looks particularly suspicious.
> >> The nature of your errors suggests that the default workqueue has
> >> crashed or hung, but it shows up okay in the dump. Are you sure this
> >> dump was made when the devices failed to appear?
> >>
> >>
> >
> > Yes, I'm 100% sure.
> >
> >> What happens when you try to rmmod the ALSA modules? Does rmmod
> >> crash with an error or does it hang? If it hangs, can you get
> >> another task dump showing the hanging process?
> >>
> >
> > Whenever I try to rmmod something, it hangs (no crash) and I'm not
> > able to do anything besides moving the mouse.
> > I posted here: http://marc.info/?l=linux-kernel&m=122502213503239&w=4
> > but that's probably not enough, so I will try getting a full dump.
> >
> Here it is: http://bugzilla.kernel.org/attachment.cgi?id=18452&action=view

The task dump shows rmmod waiting for flush_workqueue(). But the
events/0 task doesn't appear to be hung, and the task dump taken before
running rmmod shows events/0 doing something different.

So apparently flush_workqueue() isn't working.

Alan Stern


2008-10-26 17:51:21

by Rafael J. Wysocki

Subject: Re: Bug in workqueues [was: Usb devices randomly aren't detected with 2.6.28-rc1-git1]

On Sunday, 26 of October 2008, Alan Stern wrote:
> This is no longer a USB issue, so I'm asking for help from the general
> kernel community. Background can be found in Bugzilla #11853.
>
> On Sun, 26 Oct 2008, Mikko C. wrote:
>
> > Mikko C. wrote:
> > > Alan Stern wrote:
> > >>
> > >> I don't see anything in there that looks particularly suspicious.
> > >> The nature of your errors suggests that the default workqueue has
> > >> crashed or hung, but it shows up okay in the dump. Are you sure this
> > >> dump was made when the devices failed to appear?
> > >>
> > >>
> > >
> > > Yes, I'm 100% sure.
> > >
> > >> What happens when you try to rmmod the ALSA modules? Does rmmod
> > >> crash with an error or does it hang? If it hangs, can you get
> > >> another task dump showing the hanging process?
> > >>
> > >
> > > Whenever I try to rmmod something, it hangs (no crash) and I'm not
> > > able to do anything besides moving the mouse.
> > > I posted here: http://marc.info/?l=linux-kernel&m=122502213503239&w=4
> > > but that's probably not enough, so I will try getting a full dump.
> > >
> > Here it is: http://bugzilla.kernel.org/attachment.cgi?id=18452&action=view
>
> The task dump shows rmmod waiting for flush_workqueue(). But the
> events/0 task doesn't appear to be hung, and the task dump taken before
> running rmmod shows events/0 doing something different.
>
> So apparently flush_workqueue() isn't working.

Let's make that more visible (adding CCs). :-)

Thanks,
Rafael

2008-10-26 18:03:02

by Linus Torvalds

Subject: Re: Bug in workqueues [was: Usb devices randomly aren't detected with 2.6.28-rc1-git1]



On Sun, 26 Oct 2008, Rafael J. Wysocki wrote:
> >
> > So apparently flush_workqueue() isn't working.
>
> Let's make that more visible (adding CCs). :-)

Isn't this the same thing that was fixed by commit
4403b406d4369a275d483ece6ddee0088cc0d592, aka 'Revert "Call
init_workqueues before pre smp initcalls."'?

The bug was that init_workqueues was called too early, causing it to have
wrong initialization of its CPU masks, which caused various random
problems since it wouldn't run workqueues on anything but the boot CPU.

It is hidden by various config options (e.g. if you enable suspend),
which is probably why it took some time for people to notice.

Linus
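
The mechanism described above can be illustrated with a small user-space
sketch. This is a toy model, not kernel code: every name in it (toy_wq,
toy_wq_init, toy_queue_work, toy_run_workers, toy_boot) is hypothetical, and
it assumes a configuration without CPU hotplug, so nothing adds workers for
CPUs that come online after the workqueue was initialized.

```c
#include <stdbool.h>

#define NR_CPUS 4

/* Toy model only: illustrative names, not kernel APIs. */
struct toy_wq {
    bool has_worker[NR_CPUS]; /* was a worker created for this CPU? */
    int pending[NR_CPUS];     /* items queued on this CPU, not yet run */
};

/* Create workers only for the CPUs that are online at init time. */
static void toy_wq_init(struct toy_wq *wq, const bool online[NR_CPUS])
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        wq->has_worker[cpu] = online[cpu];
        wq->pending[cpu] = 0;
    }
}

/* Queue one work item on the given CPU. */
static void toy_queue_work(struct toy_wq *wq, int cpu)
{
    wq->pending[cpu]++;
}

/* Let every existing worker drain its queue; return the items left over. */
static int toy_run_workers(struct toy_wq *wq)
{
    int stuck = 0;

    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (wq->has_worker[cpu])
            wq->pending[cpu] = 0;
        stuck += wq->pending[cpu];
    }
    return stuck;
}

/* Simulate a boot and return how many work items never get run. */
static int toy_boot(bool init_before_smp)
{
    bool online[NR_CPUS] = { true, false, false, false }; /* boot CPU only */
    struct toy_wq wq;

    if (init_before_smp)
        toy_wq_init(&wq, online); /* too early: only CPU 0 is visible */

    for (int cpu = 1; cpu < NR_CPUS; cpu++)
        online[cpu] = true;       /* secondary CPUs come up */

    if (!init_before_smp)
        toy_wq_init(&wq, online); /* late enough: all CPUs are visible */

    toy_queue_work(&wq, 0);
    toy_queue_work(&wq, 1);
    return toy_run_workers(&wq);
}
```

In this model toy_boot(true) leaves the item queued on CPU 1 pending forever,
which is the kind of item a flush would end up waiting on, while
toy_boot(false) drains everything.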

2008-10-26 18:10:24

by Rafael J. Wysocki

Subject: Re: Bug in workqueues [was: Usb devices randomly aren't detected with 2.6.28-rc1-git1]

On Sunday, 26 of October 2008, Linus Torvalds wrote:
>
> On Sun, 26 Oct 2008, Rafael J. Wysocki wrote:
> > >
> > > So apparently flush_workqueue() isn't working.
> >
> > Let's make that more visible (adding CCs). :-)
>
> Isn't this the same thing that was fixed by commit
> 4403b406d4369a275d483ece6ddee0088cc0d592, aka 'Revert "Call
> init_workqueues before pre smp initcalls."'?

OK, I've closed bug #11853 with this assumption. Mikko, if it happens again
with 2.6.28-rc1-git3 or later, please reopen.

Thanks,
Rafael

2008-10-26 18:22:01

by Linus Torvalds

Subject: Re: Bug in workqueues [was: Usb devices randomly aren't detected with 2.6.28-rc1-git1]



On Sun, 26 Oct 2008, Rafael J. Wysocki wrote:
>
> OK, I've closed bug #11853 with this assumption. Mikko, if it happens again
> with 2.6.28-rc1-git3 or later, please reopen.

Note: I tried to find Mikko's config, but zlin.dk times out. So I cannot
check that he actually has CONFIG_HOTPLUG_CPU off, which is probably
required to actually trigger the bug.

But it does sound likely.

However, the "Thunderbird crashes out for me" thing in that same thread
may be totally unrelated.

Linus

2008-10-26 18:28:44

by Mikko C.

Subject: Re: Bug in workqueues [was: Usb devices randomly aren't detected with 2.6.28-rc1-git1]

Linus Torvalds wrote:
> On Sun, 26 Oct 2008, Rafael J. Wysocki wrote:
>
>> OK, I've closed bug #11853 with this assumption. Mikko, if it happens again
>> with 2.6.28-rc1-git3 or later, please reopen.
>>
>
> Note: I tried to find Mikko's config, but zlin.dk times out. So I cannot
> check that he actually has CONFIG_HOTPLUG_CPU off, which is probably
> required to actually trigger the bug.
>
I have:
# CONFIG_HOTPLUG_CPU is not set

Here's the full config: http://pastebin.com/d1481da76
Does -rc1-git2 not have the fix yet?
If not, I'll wait for -git3 and report back.

> But it does sound likely.
>
> However, the "Thunderbird crashes out for me" thing in that same thread
> may be totally unrelated.
>
My Thunderbird freezes when I receive a message (because it plays a
sound, I guess).
So it's probably related to the sound issues.
Thanks

2008-10-26 18:37:18

by Linus Torvalds

Subject: Re: Bug in workqueues [was: Usb devices randomly aren't detected with 2.6.28-rc1-git1]



On Sun, 26 Oct 2008, Mikko C. wrote:
>
> Here's the full config: http://pastebin.com/d1481da76

Ok, you do indeed not have HOTPLUG_CPU, so I'd be willing to bet this is
it.

> Does -rc1-git2 not have the fix yet?

It should indeed be in -rc1-git2, so you can test it that way.

Linus

2008-10-26 19:01:20

by Mikko C.

Subject: Re: Bug in workqueues [was: Usb devices randomly aren't detected with 2.6.28-rc1-git1]

Linus Torvalds wrote:
> On Sun, 26 Oct 2008, Mikko C. wrote:
>
>> Here's the full config: http://pastebin.com/d1481da76
>>
>
> Ok, you do indeed not have HOTPLUG_CPU, so I'd be willing to bet this is
> it.
>
>> Does -rc1-git2 not have the fix yet?
>>
>
> It should indeed be in -rc1-git2, so you can test it that way.
>

-git2 seems to fix all the issues I had:

- rmmod works fine
- Sound is back and apps using it don't freeze
- USB devices seem to be detected correctly
- I will test wifi, but the first boot was OK (if it isn't, I guess it's
not related to this).

I guess next time I'll be more patient and wait a little more before
trying a new kernel as soon as it's out :P

Thanks!!!

Mikko

2008-10-26 19:08:04

by Linus Torvalds

Subject: Re: Bug in workqueues [was: Usb devices randomly aren't detected with 2.6.28-rc1-git1]



On Sun, 26 Oct 2008, Mikko C. wrote:
>
> I guess next time I'll be more patient and wait a little more before trying a
> new kernel as soon as it's out :P

No, no, please do test. It's how we find these things. It got missed on
most machines just because if you happened to have suspend enabled (or
just hotplugging cpus in general), the cpu hotplug code would hide the
problem.

So having people test things with their odd configs is all good. Even if
it inevitably causes some wasted effort too...

Linus