Hi,
I have an Eking I1 UMPC that features a Marvell 8686 SDIO wireless card. I
am currently using vanilla kernel 2.6.30, but the bug is there on
Debian 2.26 and 2.29 as well as vanilla 2.29 kernels I tried previously.
The initial symptom of the bug was system lockup when executing 'ifdown
eth0' with 'BUG: scheduling while atomic'. Trying to understand the
cause of the bug and find a workaround I was able to find an easier way
to reproduce it. By executing 'cat /proc/net/wireless' I get the same
bug:
BUG: scheduling while atomic: cat/1885/0x00000002
Pid: 1885, comm: cat Not tainted 2.6.30.090704 #2
Call Trace:
[<c042eb7f>] ? __schedule+0x37f/0x8c0
[<c01237cd>] ? try_to_wake_up+0x8d/0x1e0
[<c011ee0e>] ? __wake_up+0x3e/0x60
[<c02e882c>] ? __lbs_cmd_async+0x11c/0x280
[<c018ddbe>] ? d_rehash+0x2e/0x50
[<c042f0d0>] ? schedule+0x10/0x30
[<c02e8be4>] ? __lbs_cmd+0xa4/0x1a0
[<c02ea3f0>] ? lbs_cmd_copyback+0x0/0x40
[<c013c460>] ? autoremove_wake_function+0x0/0x50
[<c02e67d5>] ? lbs_get_wireless_stats+0xf5/0x3c0
[<c02ea3f0>] ? lbs_cmd_copyback+0x0/0x40
[<c03fb1e7>] ? wireless_seq_show+0x47/0x180
[<c03965bf>] ? dev_seq_start+0x1f/0xb0
[<c0196c6d>] ? seq_read+0x1fd/0x360
[<c0196a70>] ? seq_read+0x0/0x360
[<c01b4774>] ? proc_reg_read+0x64/0xa0
[<c01b4710>] ? proc_reg_read+0x0/0xa0
[<c017f9ab>] ? vfs_read+0x9b/0x120
[<c017fb01>] ? sys_read+0x41/0x80
[<c0102f21>] ? syscall_call+0x7/0xb
Please let me know if I should provide any additional details regarding
the bug.
--
Alex
On Mon, 06 Jul 2009 13:29:55 -0400
Dan Williams <[email protected]> wrote:
> On Sun, 2009-07-05 at 12:59 +0400, Alexander Barinov wrote:
> > On Sat, 04 Jul 2009 12:41:42 +0200
> > Johannes Berg <[email protected]> wrote:
> > > On Sat, 2009-07-04 at 13:03 +0400, Alexander Barinov wrote:
> > > > The initial symptom of the bug was system lockup when executing
> > > > 'ifdown eth0' with 'BUG: scheduling while atomic'. Trying to
> > > > understand the cause of the bug and find a workaround I was
> > > > able to find an easier way to reproduce it. By executing
> > > > 'cat /proc/net/wireless' I get the same bug:
> > > This should have been fixed by
> > > 87057825824973f29cf2f37cff1e549170b2d7e6. For some reason
> > > everybody seems to have assumed that get_wireless_stats can
> > > sleep, which before that commit it could _not_.
> >
> > The patch you have mentioned was not directly applicable to 2.6.30
> > kernel so I pulled wireless-testing git tree and compiled it. This
> > deteriorated the situation further - now after executing
> > 'cat /proc/net/wireless' I get kernel lockup without any further
> > messages.
>
> Can you get anything at all out of the kernel on that? 2.6.29.5 with
> quite recent wireless-testing libertas driver works fine with sd8686
> on my machine (hp 2530p laptop with a Ricoh controller). Can you
> post your backport of that patch too?
The problem is that I get no kernel messages at all, just a lockup. Wireless
debug (CONFIG_MAC80211_*_DEBUG) and libertas debug
(CONFIG_LIBERTAS_DEBUG) are on, and there is no "quiet" option on boot. Is
there any other way I can increase the level of debug output so that I can
give you some meaningful information about my problem?
BTW, my case is an sd8686 on some SDHCI PCI controller; I am not sure how
to find the vendor.
I am not sure which backport you are talking about: I am using
wireless-testing pulled from git, which should be something around
2.6.31-rc1.
Regards,
Alexander
> Yes, but the patch that I quoted makes it allowable to sleep
> there, so it must be something else. Is it maybe using the
> RTNL there? Or using schedule_work() and then waiting for it
> or something that the work triggers, which will deadlock on
> the RTNL if there's something in front of it on the queue that
> needs the RTNL, because get_wireless_stats is executed under
> RTNL? (lockdep couldn't find that particular case because it
> knows nothing about completions)
Again, this is all from memory, from around the 2.6.25
time. "iwconfig" or "cat /proc/net/wireless" ended up in
drivers/net/wireless/libertas/wext.c, AFAIK in
lbs_get_wireless_stats(). This calls
lbs_cmd_with_response(priv, CMD_802_11_GET_LOG, &log);
which is a macro calling lbs_cmd(). This thingy then does
__lbs_cmd_async(), which creates a "command node", queues it,
and calls
wake_up_interruptible(&priv->waitq);
(cmd.c, around line 2050) to get the queue handled (e.g. sending
the command to the firmware). And I think that this wake_up call
calls __schedule() now.
Later, lbs_cmd() does this:
might_sleep();
wait_event_interruptible(cmdnode->cmdwait_q,
cmdnode->cmdwaitqwoken);
But AFAIK this isn't problematic.
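Whether a wait like that is a problem depends on the calling context. For
illustration only, here is a toy user-space model of what the "scheduling
while atomic" check complains about: a path that may sleep being entered
with a non-zero preempt depth (the 0x00000002 at the end of the BUG line is
that count). All names below are made up for the sketch; none of this is
libertas or kernel code.

```c
#include <assert.h>

/* Toy model, hypothetical names: how deep we are inside "atomic" code. */
static int model_preempt_depth;
/* How many times a sleeping path was entered while atomic. */
static int model_atomic_sleep_bugs;

/* Stand-in for taking a spinlock-like lock. */
void model_atomic_enter(void)
{
    model_preempt_depth++;
}

/* Stand-in for dropping that lock again. */
void model_atomic_exit(void)
{
    assert(model_preempt_depth > 0);   /* an unbalanced exit is a caller bug */
    model_preempt_depth--;
}

/* Stand-in for a blocking wait: legal only when the depth is zero. */
void model_wait_for_response(void)
{
    if (model_preempt_depth != 0)
        model_atomic_sleep_bugs++;     /* the kernel would print the BUG here */
}

/* Stand-in for a synchronous stats query: queue a command, then wait. */
void model_get_stats(void)
{
    /* Queueing the command would happen here; that part need not sleep. */
    model_wait_for_response();
}
```

Calling model_get_stats() between model_atomic_enter() and
model_atomic_exit() bumps the bug counter; calling it outside does not.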
--
http://www.holgerschurig.de
On Tue, 7 Jul 2009 08:59:39 +0200
Holger Schurig <[email protected]> wrote:
> > By executing 'cat /proc/net/wireless' I get the same bug:
>
> I haven't tested your exact bug, but I have the following in
> mind, from about 1 year ago when I did lots of libertas and
> libertas_cs work:
>
> Accessing wireless stats made the libertas driver (in wext.c) issue
> a command towards the firmware in the hardware. I think it was
> the command to get the current SNR/RSSI/whatever.
>
> The libertas command-response cycle does sleep, so you could
> trigger a bug. I once had a patch (and I think I posted that
> patch). The patch lied, by providing the last SNR/RSSI/whatever
> and issuing a "get SNR/RSSI/whatever" in the background, just
> storing the result. So a single "cat /proc/net/wireless" returns
> bogus data, but a repeated command returns "valid" info, e.g. when
> doing "watch iwconfig eth1".
Sorry for asking a potentially stupid question, as I am in no way a
wireless hardware expert, but can this bug be caused by a wrong firmware
version?
BTW, it is really hard to find the patch you are talking about in the
archive. Could you please give some clues on the keywords to search for
(besides your name and "patch", of course)?
On Sun, 2009-07-05 at 12:59 +0400, Alexander Barinov wrote:
> On Sat, 04 Jul 2009 12:41:42 +0200
> Johannes Berg <[email protected]> wrote:
> > On Sat, 2009-07-04 at 13:03 +0400, Alexander Barinov wrote:
> > > The initial symptom of the bug was system lockup when executing
> > > 'ifdown eth0' with 'BUG: scheduling while atomic'. Trying to
> > > understand the cause of the bug and find a workaround I was able to
> > > find an easier way to reproduce it. By executing
> > > 'cat /proc/net/wireless' I get the same bug:
> > This should have been fixed by
> > 87057825824973f29cf2f37cff1e549170b2d7e6. For some reason everybody
> > seems to have assumed that get_wireless_stats can sleep, which before
> > that commit it could _not_.
>
> The patch you have mentioned was not directly applicable to 2.6.30
> kernel so I pulled wireless-testing git tree and compiled it. This
> deteriorated the situation further - now after executing
> 'cat /proc/net/wireless' I get kernel lockup without any further
> messages.
Can you get anything at all out of the kernel on that? 2.6.29.5 with
quite recent wireless-testing libertas driver works fine with sd8686 on
my machine (hp 2530p laptop with a Ricoh controller). Can you post your
backport of that patch too?
Dan
> Sorry for asking a potentially stupid question, as I am in no way
> a wireless hardware expert, but can this bug be caused by
> a wrong firmware version?
AFAIK not.
> BTW, it is really hard to find the patch you are talking about
> in the archive.
It's dead easy:
$ cd linux-git
$ git show 87057825824973f29cf2f37cff1e549170b2d7e6
Or you can go via the web-interface: start at
http://git.kernel.org, search Linus' tree and you end up at
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=summary.
Now select any commitdiff, and substitute in the URL the
hex-thingy from above. And voila, you're at
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=87057825824973f29cf2f37cff1e549170b2d7e6
--
http://www.holgerschurig.de
On Tue, 2009-07-07 at 08:59 +0200, Holger Schurig wrote:
> > By executing 'cat /proc/net/wireless' I get the same bug:
>
> I haven't tested your exact bug, but I have the following in
> mind, from about 1 year ago when I did lots of libertas and
> libertas_cs work:
>
> Accessing wireless stats made the libertas driver (in wext.c) issue
> a command towards the firmware in the hardware. I think it was
> the command to get the current SNR/RSSI/whatever.
Yes, but the patch that I quoted makes it allowable to sleep there, so
it must be something else. Is it maybe using the RTNL there? Or using
schedule_work() and then waiting for it or something that the work
triggers, which will deadlock on the RTNL if there's something in front
of it on the queue that needs the RTNL, because get_wireless_stats is
executed under RTNL? (lockdep couldn't find that particular case because
it knows nothing about completions)
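
As a rough way to picture that ordering problem, the sketch below models a
single worker that runs queued items strictly in order (an assumption of
the sketch) while a caller waits for one of them. If the caller holds the
RTNL and any item at or ahead of the one it waits for also needs the RTNL,
neither side can make progress. All names are hypothetical, including the
item names in the usage note; this is a toy model, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one queued work item. */
struct work_item {
    const char *name;
    bool needs_rtnl;    /* the item takes the RTNL when it runs */
};

/*
 * Returns true if waiting for queue[target] can never finish:
 * the worker runs items in order, so it blocks on the first item
 * that needs the RTNL while the waiting caller still holds it.
 */
bool wait_would_deadlock(const struct work_item *queue, size_t n,
                         size_t target, bool caller_holds_rtnl)
{
    assert(queue != NULL || n == 0);

    if (!caller_holds_rtnl || target >= n)
        return false;   /* nothing blocks the worker, or nothing to wait for */

    for (size_t i = 0; i <= target; i++) {
        if (queue[i].needs_rtnl)
            return true;
    }
    return false;
}
```

With a queue of { "linkwatch" (needs RTNL), "stats_cmd" } and a caller that
holds the RTNL while waiting for "stats_cmd", the function reports a
deadlock; with the two items swapped, or without the RTNL held, it does not.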
johannes
> By executing 'cat /proc/net/wireless' I get the same bug:
I haven't tested your exact bug, but I have the following in
mind, from about 1 year ago when I did lots of libertas and
libertas_cs work:
Accessing wireless stats made the libertas driver (in wext.c) issue
a command towards the firmware in the hardware. I think it was
the command to get the current SNR/RSSI/whatever.
The libertas command-response cycle does sleep, so you could
trigger a bug. I once had a patch (and I think I posted that
patch). The patch lied, by providing the last SNR/RSSI/whatever
and issuing a "get SNR/RSSI/whatever" in the background, just
storing the result. So a single "cat /proc/net/wireless" returns
bogus data, but a repeated command returns "valid" info, e.g. when
doing "watch iwconfig eth1".
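
As a rough illustration of that cache-and-refresh idea (a user-space
sketch, not the patch itself; every name here is made up): the read path
hands back whatever was stored last and only marks that a refresh is
wanted, and a separate completion path stores the fresh values later.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these names come from the driver. */
struct link_stats {
    int rssi;
    int snr;
};

struct stats_cache {
    struct link_stats last;   /* what readers are given */
    bool refresh_pending;     /* a background query has been requested */
};

/* Read path: never waits. Returns the cached values and asks for a refresh. */
struct link_stats stats_read(struct stats_cache *c)
{
    assert(c);
    c->refresh_pending = true;   /* a driver would queue an async command here */
    return c->last;
}

/* Completion path: runs later, once the "firmware" has answered. */
void stats_refresh_done(struct stats_cache *c, struct link_stats fresh)
{
    assert(c);
    if (!c->refresh_pending)
        return;                  /* nobody asked; ignore the stray result */
    c->last = fresh;
    c->refresh_pending = false;
}
```

The first stats_read() on an empty cache returns the stale (here zeroed)
values; a read after stats_refresh_done() returns the stored ones, which
matches the "first read is bogus, repeated reads are valid" behaviour
described above.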
--
http://www.holgerschurig.de
On Sat, 04 Jul 2009 12:41:42 +0200
Johannes Berg <[email protected]> wrote:
> On Sat, 2009-07-04 at 13:03 +0400, Alexander Barinov wrote:
> > The initial symptom of the bug was system lockup when executing
> > 'ifdown eth0' with 'BUG: scheduling while atomic'. Trying to
> > understand the cause of the bug and find a workaround I was able to
> > find an easier way to reproduce it. By executing
> > 'cat /proc/net/wireless' I get the same bug:
> This should have been fixed by
> 87057825824973f29cf2f37cff1e549170b2d7e6. For some reason everybody
> seems to have assumed that get_wireless_stats can sleep, which before
> that commit it could _not_.
The patch you have mentioned was not directly applicable to 2.6.30
kernel so I pulled wireless-testing git tree and compiled it. This
deteriorated the situation further - now after executing
'cat /proc/net/wireless' I get kernel lockup without any further
messages.
On Sat, 2009-07-04 at 13:03 +0400, Alexander Barinov wrote:
> Hi,
>
> I have an Eking I1 UMPC that features a Marvell 8686 SDIO wireless card. I
> am currently using vanilla kernel 2.6.30, but the bug is there on
> Debian 2.26 and 2.29 as well as vanilla 2.29 kernels I tried previously.
>
> The initial symptom of the bug was system lockup when executing 'ifdown
> eth0' with 'BUG: scheduling while atomic'. Trying to understand the
> cause of the bug and find a workaround I was able to find an easier way
> to reproduce it. By executing 'cat /proc/net/wireless' I get the same
> bug:
>
> BUG: scheduling while atomic: cat/1885/0x00000002
> Pid: 1885, comm: cat Not tainted 2.6.30.090704 #2
> Call Trace:
> [<c042eb7f>] ? __schedule+0x37f/0x8c0
> [<c01237cd>] ? try_to_wake_up+0x8d/0x1e0
> [<c011ee0e>] ? __wake_up+0x3e/0x60
> [<c02e882c>] ? __lbs_cmd_async+0x11c/0x280
> [<c018ddbe>] ? d_rehash+0x2e/0x50
> [<c042f0d0>] ? schedule+0x10/0x30
> [<c02e8be4>] ? __lbs_cmd+0xa4/0x1a0
> [<c02ea3f0>] ? lbs_cmd_copyback+0x0/0x40
> [<c013c460>] ? autoremove_wake_function+0x0/0x50
> [<c02e67d5>] ? lbs_get_wireless_stats+0xf5/0x3c0
> [<c02ea3f0>] ? lbs_cmd_copyback+0x0/0x40
> [<c03fb1e7>] ? wireless_seq_show+0x47/0x180
> [<c03965bf>] ? dev_seq_start+0x1f/0xb0
> [<c0196c6d>] ? seq_read+0x1fd/0x360
> [<c0196a70>] ? seq_read+0x0/0x360
> [<c01b4774>] ? proc_reg_read+0x64/0xa0
> [<c01b4710>] ? proc_reg_read+0x0/0xa0
> [<c017f9ab>] ? vfs_read+0x9b/0x120
> [<c017fb01>] ? sys_read+0x41/0x80
> [<c0102f21>] ? syscall_call+0x7/0xb
>
> Please let me know if I should provide any additional details regarding
> the bug.
This should have been fixed by 87057825824973f29cf2f37cff1e549170b2d7e6.
For some reason everybody seems to have assumed that get_wireless_stats
can sleep, which before that commit it could _not_.
johannes