2008-07-23 21:36:52

by Theodore Ts'o

[permalink] [raw]
Subject: Potential iwlwifi locking problem in 2.6.26-git9


Is this a new one? I don't think I've seen this on LKML....

- Ted

[ 728.456792]
[ 728.456795] =============================================
[ 728.456810] [ INFO: possible recursive locking detected ]
[ 728.456818] 2.6.26-05781-g44ea66a #25
[ 728.456823] ---------------------------------------------
[ 728.456830] NetworkManager/8287 is trying to acquire lock:
[ 728.456836] (&dev->addr_list_lock){-...}, at: [<c028c81e>] dev_mc_sync+0x1c/0x59
[ 728.456867]
[ 728.456869] but task is already holding lock:
[ 728.456875] (&dev->addr_list_lock){-...}, at: [<c0287ffe>] dev_set_rx_mode+0x19/0x2b
[ 728.456894]
[ 728.456895] other info that might help us debug this:
[ 728.456902] 2 locks held by NetworkManager/8287:
[ 728.456908] #0: (rtnl_mutex){--..}, at: [<c029150a>] rtnl_lock+0x14/0x16
[ 728.456929] #1: (&dev->addr_list_lock){-...}, at: [<c0287ffe>] dev_set_rx_mode+0x19/0x2b
[ 728.456949]
[ 728.456951] stack backtrace:
[ 728.456961] Pid: 8287, comm: NetworkManager Not tainted 2.6.26-05781-g44ea66a #25
[ 728.456999] [<c014568e>] __lock_acquire+0x99f/0xcc5
[ 728.457037] [<c0144924>] ? trace_hardirqs_on+0xb/0xd
[ 728.457054] [<c01448ce>] ? trace_hardirqs_on_caller+0xd2/0x11d
[ 728.457076] [<c0145da6>] lock_acquire+0x6a/0x90
[ 728.457091] [<c028c81e>] ? dev_mc_sync+0x1c/0x59
[ 728.457109] [<c02f7189>] _spin_lock_bh+0x26/0x53
[ 728.457126] [<c028c81e>] ? dev_mc_sync+0x1c/0x59
[ 728.457143] [<c028c81e>] dev_mc_sync+0x1c/0x59
[ 728.457159] [<f90bebd6>] ieee80211_set_multicast_list+0xca/0xd2 [mac80211]
[ 728.457223] [<c0287f6c>] __dev_set_rx_mode+0x7e/0x81
[ 728.457239] [<c0288005>] dev_set_rx_mode+0x20/0x2b
[ 728.457255] [<c028ad54>] dev_open+0x84/0x9e
[ 728.457269] [<c0288e9d>] dev_change_flags+0xa1/0x14f
[ 728.457286] [<c0290af3>] do_setlink+0x250/0x300
[ 728.457308] [<c02f7000>] ? _read_unlock+0x22/0x25
[ 728.457327] [<c0291a8f>] rtnl_setlink+0xa9/0xc1
[ 728.457364] [<c01448e4>] ? trace_hardirqs_on_caller+0xe8/0x11d
[ 728.457384] [<c02919e6>] ? rtnl_setlink+0x0/0xc1
[ 728.457401] [<c02916c7>] rtnetlink_rcv_msg+0x197/0x1b1
[ 728.457424] [<c0291530>] ? rtnetlink_rcv_msg+0x0/0x1b1
[ 728.457441] [<c029e55d>] netlink_rcv_skb+0x35/0x7a
[ 728.457460] [<c0291528>] rtnetlink_rcv+0x1c/0x24
[ 728.457476] [<c029e381>] netlink_unicast+0x1ad/0x20e
[ 728.457499] [<c029eaca>] netlink_sendmsg+0x248/0x255
[ 728.457530] [<c027e91d>] sock_sendmsg+0xe3/0xfe
[ 728.457562] [<c0138a71>] ? autoremove_wake_function+0x0/0x35
[ 728.457583] [<c010878a>] ? native_sched_clock+0xd5/0xf4
[ 728.457609] [<c010857f>] ? sched_clock+0xb/0x1c
[ 728.457628] [<c0142ca9>] ? lock_release_holdtime+0x3f/0x44
[ 728.457652] [<c0208670>] ? copy_from_user+0x45/0x68
[ 728.457672] [<c02852d6>] ? verify_iovec+0x45/0x73
[ 728.457693] [<c027ea8a>] sys_sendmsg+0x152/0x1ad
[ 728.457713] [<c027f541>] ? sys_recvmsg+0x177/0x184
[ 728.457734] [<c019136e>] ? fget_light+0x95/0xb5
[ 728.457757] [<c02088db>] ? copy_to_user+0x47/0x50
[ 728.457776] [<c027f3b2>] ? move_addr_to_user+0x5b/0x73
[ 728.457796] [<c027f717>] ? sys_getsockname+0x70/0x7b
[ 728.457814] [<c02f702d>] ? _spin_unlock_bh+0x2a/0x2d
[ 728.457831] [<c028045d>] ? release_sock+0xb3/0xbb
[ 728.457851] [<c0281909>] ? sock_setsockopt+0x49d/0x4a8
[ 728.457874] [<c0191384>] ? fget_light+0xab/0xb5
[ 728.457909] [<c027fa05>] sys_socketcall+0x168/0x183
[ 728.457935] [<c0103a0b>] sysenter_past_esp+0x78/0xc5
[ 728.457969] =======================


2008-07-23 22:02:16

by Tomas Winkler

[permalink] [raw]
Subject: Re: Potential iwlwifi locking problem in 2.6.26-git9

On Thu, Jul 24, 2008 at 12:49 AM, Theodore Tso <[email protected]> wrote:
> On Wed, Jul 23, 2008 at 05:36:30PM -0400, Theodore Ts'o wrote:
>>
>> Is this a new one? I don't think I've seen this on LKML....
>
> The context of this was I had just booted to 2.6.26-git9, noted with
> annoyance that that the iwl4965 driver needed a new microcode file, so
> I popped in my EVDO card, downloaded the firmware, moved the firmware
> into place, and then got the above error message.
>
> I've since noted that it seems to be utterly refusing to associate
> with my wireless access point. This was working just fine with
> 2.6.26-git6 kernel, so it looks like a recent regression? I just
> noticed someone else had recently complained about a similar "refusal
> to associate" with the iwl3945 driver.

I'm not sure but this can be connected to what is described in thread:
'New: dhcp doesn't work with iwl4965'
Tomas

2008-07-23 22:20:05

by David Miller

[permalink] [raw]
Subject: Re: Potential iwlwifi locking problem in 2.6.26-git9

From: "Theodore Ts'o" <[email protected]>
Date: Wed, 23 Jul 2008 17:36:30 -0400

>
> Is this a new one? I don't think I've seen this on LKML....

Yes, we have, we're trying to work it out on netdev.

2008-07-23 23:11:00

by David Miller

[permalink] [raw]
Subject: Re: Potential iwlwifi locking problem in 2.6.26-git9

From: "Carlos R. Mafra" <[email protected]>
Date: Wed, 23 Jul 2008 17:59:06 -0500

> I am trying to bisect it, but I am facing some difficulties with compilation
> problems and non-booting kernels.

We know what the problem is and are discussing it on netdev
and linux-wireless already.

2008-07-23 21:49:53

by Theodore Ts'o

[permalink] [raw]
Subject: Re: Potential iwlwifi locking problem in 2.6.26-git9

On Wed, Jul 23, 2008 at 05:36:30PM -0400, Theodore Ts'o wrote:
>
> Is this a new one? I don't think I've seen this on LKML....

The context of this was I had just booted to 2.6.26-git9, noted with
annoyance that that the iwl4965 driver needed a new microcode file, so
I popped in my EVDO card, downloaded the firmware, moved the firmware
into place, and then got the above error message.

I've since noted that it seems to be utterly refusing to associate
with my wireless access point. This was working just fine with
2.6.26-git6 kernel, so it looks like a recent regression? I just
noticed someone else had recently complained about a similar "refusal
to associate" with the iwl3945 driver.

- Ted

>
> [ 728.456792]
> [ 728.456795] =============================================
> [ 728.456810] [ INFO: possible recursive locking detected ]
> [ 728.456818] 2.6.26-05781-g44ea66a #25
> [ 728.456823] ---------------------------------------------
> [ 728.456830] NetworkManager/8287 is trying to acquire lock:
> [ 728.456836] (&dev->addr_list_lock){-...}, at: [<c028c81e>] dev_mc_sync+0x1c/0x59
> [ 728.456867]
> [ 728.456869] but task is already holding lock:
> [ 728.456875] (&dev->addr_list_lock){-...}, at: [<c0287ffe>] dev_set_rx_mode+0x19/0x2b
> [ 728.456894]
> [ 728.456895] other info that might help us debug this:
> [ 728.456902] 2 locks held by NetworkManager/8287:
> [ 728.456908] #0: (rtnl_mutex){--..}, at: [<c029150a>] rtnl_lock+0x14/0x16
> [ 728.456929] #1: (&dev->addr_list_lock){-...}, at: [<c0287ffe>] dev_set_rx_mode+0x19/0x2b
> [ 728.456949]
> [ 728.456951] stack backtrace:
> [ 728.456961] Pid: 8287, comm: NetworkManager Not tainted 2.6.26-05781-g44ea66a #25
> [ 728.456999] [<c014568e>] __lock_acquire+0x99f/0xcc5
> [ 728.457037] [<c0144924>] ? trace_hardirqs_on+0xb/0xd
> [ 728.457054] [<c01448ce>] ? trace_hardirqs_on_caller+0xd2/0x11d
> [ 728.457076] [<c0145da6>] lock_acquire+0x6a/0x90
> [ 728.457091] [<c028c81e>] ? dev_mc_sync+0x1c/0x59
> [ 728.457109] [<c02f7189>] _spin_lock_bh+0x26/0x53
> [ 728.457126] [<c028c81e>] ? dev_mc_sync+0x1c/0x59
> [ 728.457143] [<c028c81e>] dev_mc_sync+0x1c/0x59
> [ 728.457159] [<f90bebd6>] ieee80211_set_multicast_list+0xca/0xd2 [mac80211]
> [ 728.457223] [<c0287f6c>] __dev_set_rx_mode+0x7e/0x81
> [ 728.457239] [<c0288005>] dev_set_rx_mode+0x20/0x2b
> [ 728.457255] [<c028ad54>] dev_open+0x84/0x9e
> [ 728.457269] [<c0288e9d>] dev_change_flags+0xa1/0x14f
> [ 728.457286] [<c0290af3>] do_setlink+0x250/0x300
> [ 728.457308] [<c02f7000>] ? _read_unlock+0x22/0x25
> [ 728.457327] [<c0291a8f>] rtnl_setlink+0xa9/0xc1
> [ 728.457364] [<c01448e4>] ? trace_hardirqs_on_caller+0xe8/0x11d
> [ 728.457384] [<c02919e6>] ? rtnl_setlink+0x0/0xc1
> [ 728.457401] [<c02916c7>] rtnetlink_rcv_msg+0x197/0x1b1
> [ 728.457424] [<c0291530>] ? rtnetlink_rcv_msg+0x0/0x1b1
> [ 728.457441] [<c029e55d>] netlink_rcv_skb+0x35/0x7a
> [ 728.457460] [<c0291528>] rtnetlink_rcv+0x1c/0x24
> [ 728.457476] [<c029e381>] netlink_unicast+0x1ad/0x20e
> [ 728.457499] [<c029eaca>] netlink_sendmsg+0x248/0x255
> [ 728.457530] [<c027e91d>] sock_sendmsg+0xe3/0xfe
> [ 728.457562] [<c0138a71>] ? autoremove_wake_function+0x0/0x35
> [ 728.457583] [<c010878a>] ? native_sched_clock+0xd5/0xf4
> [ 728.457609] [<c010857f>] ? sched_clock+0xb/0x1c
> [ 728.457628] [<c0142ca9>] ? lock_release_holdtime+0x3f/0x44
> [ 728.457652] [<c0208670>] ? copy_from_user+0x45/0x68
> [ 728.457672] [<c02852d6>] ? verify_iovec+0x45/0x73
> [ 728.457693] [<c027ea8a>] sys_sendmsg+0x152/0x1ad
> [ 728.457713] [<c027f541>] ? sys_recvmsg+0x177/0x184
> [ 728.457734] [<c019136e>] ? fget_light+0x95/0xb5
> [ 728.457757] [<c02088db>] ? copy_to_user+0x47/0x50
> [ 728.457776] [<c027f3b2>] ? move_addr_to_user+0x5b/0x73
> [ 728.457796] [<c027f717>] ? sys_getsockname+0x70/0x7b
> [ 728.457814] [<c02f702d>] ? _spin_unlock_bh+0x2a/0x2d
> [ 728.457831] [<c028045d>] ? release_sock+0xb3/0xbb
> [ 728.457851] [<c0281909>] ? sock_setsockopt+0x49d/0x4a8
> [ 728.457874] [<c0191384>] ? fget_light+0xab/0xb5
> [ 728.457909] [<c027fa05>] sys_socketcall+0x168/0x183
> [ 728.457935] [<c0103a0b>] sysenter_past_esp+0x78/0xc5
> [ 728.457969] =======================

2008-07-23 22:24:58

by Jeremy Fitzhardinge

[permalink] [raw]
Subject: Re: Potential iwlwifi locking problem in 2.6.26-git9

Theodore Tso wrote:
> On Wed, Jul 23, 2008 at 05:36:30PM -0400, Theodore Ts'o wrote:
>
>> Is this a new one? I don't think I've seen this on LKML....
>>
>
> The context of this was I had just booted to 2.6.26-git9, noted with
> annoyance that that the iwl4965 driver needed a new microcode file, so
> I popped in my EVDO card, downloaded the firmware, moved the firmware
> into place, and then got the above error message.
>
> I've since noted that it seems to be utterly refusing to associate
> with my wireless access point. This was working just fine with
> 2.6.26-git6 kernel, so it looks like a recent regression? I just
> noticed someone else had recently complained about a similar "refusal
> to associate" with the iwl3945 driver.
>

I noticed the same thing last night; a current git build goes through
all the motions, but then fails to associate.

J

2008-07-23 22:57:49

by Carlos Mafra

[permalink] [raw]
Subject: Re: Potential iwlwifi locking problem in 2.6.26-git9

On Thu 24.Jul'08 at 1:02:15 +0300, Tomas Winkler wrote:
> On Thu, Jul 24, 2008 at 12:49 AM, Theodore Tso <[email protected]> wrote:
> > On Wed, Jul 23, 2008 at 05:36:30PM -0400, Theodore Ts'o wrote:
> >>
> >> Is this a new one? I don't think I've seen this on LKML....
> >
> > The context of this was I had just booted to 2.6.26-git9, noted with
> > annoyance that that the iwl4965 driver needed a new microcode file, so
> > I popped in my EVDO card, downloaded the firmware, moved the firmware
> > into place, and then got the above error message.
> >
> > I've since noted that it seems to be utterly refusing to associate
> > with my wireless access point. This was working just fine with
> > 2.6.26-git6 kernel, so it looks like a recent regression? I just
> > noticed someone else had recently complained about a similar "refusal
> > to associate" with the iwl3945 driver.
>
> I'm not sure but this can be connected to what is described in thread:
> 'New: dhcp doesn't work with iwl4965'
> Tomas

I can't associate with the access point anymore, and it is a very
recent regression for me too (two or three days ago).

I have the new firmware iwlwifi-4965-2.ucode inside /lib/firmware
and when I try to connect (using the connect button in drakroam from
Mandriva) I get this (digged from the syslog):

Jul 23 13:46:01 localhost drakroam[3801]: running: /sbin/ifdown wlan0 daemon
Jul 23 13:46:01 localhost kernel: wlan0: associate with AP 00:1b:11:3c:f6:db
Jul 23 13:46:01 localhost ifplugd(wlan0)[3170]: Exiting.
Jul 23 13:46:04 localhost kernel: iwl4965 0000:06:00.0: PCI INT A disabled
Jul 23 13:46:04 localhost drakroam[3801]: running: /sbin/ifup wlan0 daemon
Jul 23 13:46:04 localhost kernel: iwl4965 0000:06:00.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
Jul 23 13:46:04 localhost kernel: iwl4965 0000:06:00.0: restoring config space at offset 0x1 (was 0x100102, writing 0x100106)
Jul 23 13:46:04 localhost kernel: ------------[ cut here ]------------
Jul 23 13:46:04 localhost kernel: WARNING: at net/core/dev.c:1344 __netif_schedule+0x6f/0x80()
Jul 23 13:46:04 localhost kernel: Modules linked in: snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss loop nvram snd_hda_intel snd_pcm snd_timer ohci1394 sg snd_page_alloc ieee1394 sky2 iwl4965 snd_hwdep sr_mod i2c_i801 evdev ide_disk ide_core ata_piix ahci libata dock sd_mod scsi_mod uhci_hcd ohci_hcd ehci_hcd usbcore [last unloaded: scsi_wait_scan]
Jul 23 13:46:04 localhost kernel: Pid: 1613, comm: iwl4965/0 Not tainted 2.6.26-06077-gc010b2f #147
Jul 23 13:46:04 localhost kernel:
Jul 23 13:46:04 localhost kernel: Call Trace:
Jul 23 13:46:04 localhost kernel: [<ffffffff80237caf>] warn_on_slowpath+0x5f/0x80
Jul 23 13:46:04 localhost kernel: [<ffffffff8042109f>] __netif_schedule+0x6f/0x80
Jul 23 13:46:04 localhost kernel: [<ffffffff804a040a>] ieee80211_wake_queue+0x5a/0x70
Jul 23 13:46:04 localhost kernel: [<ffffffff804a044f>] ieee80211_wake_queues+0x2f/0x50
Jul 23 13:46:04 localhost kernel: [<ffffffffa00d2506>] iwl_bg_alive_start+0x216/0x4b0 [iwl4965]
Jul 23 13:46:04 localhost kernel: [<ffffffffa00d22f0>] ? iwl_bg_alive_start+0x0/0x4b0 [iwl4965]
Jul 23 13:46:04 localhost kernel: [<ffffffff802498d8>] run_workqueue+0x98/0x140
Jul 23 13:46:04 localhost kernel: [<ffffffff8024a3e3>] worker_thread+0xa3/0x110
Jul 23 13:46:04 localhost kernel: [<ffffffff8024d850>] ? autoremove_wake_function+0x0/0x40
Jul 23 13:46:04 localhost kernel: [<ffffffff8024a340>] ? worker_thread+0x0/0x110
Jul 23 13:46:04 localhost kernel: [<ffffffff8024d41d>] kthread+0x4d/0x80
Jul 23 13:46:04 localhost kernel: [<ffffffff8020d249>] child_rip+0xa/0x11
Jul 23 13:46:04 localhost kernel: [<ffffffff8024d3d0>] ? kthread+0x0/0x80
Jul 23 13:46:04 localhost kernel: [<ffffffff8020d23f>] ? child_rip+0x0/0x11
Jul 23 13:46:04 localhost kernel:
Jul 23 13:46:04 localhost kernel: ---[ end trace 9c04ce6cf2b007a2 ]---
Jul 23 13:46:04 localhost kernel: Registered led device: iwl-phy0:radio
Jul 23 13:46:04 localhost kernel: Registered led device: iwl-phy0:assoc
Jul 23 13:46:04 localhost kernel: Registered led device: iwl-phy0:RX
Jul 23 13:46:04 localhost kernel: Registered led device: iwl-phy0:TX
Jul 23 13:46:04 localhost kernel: iwl4965: TX Power requested while scanning!

I am trying to bisect it, but I am facing some difficulties with compilation
problems and non-booting kernels.

In the thread Thomas Winkler pointed out the reporter says that using dhclient instead
of dhcp makes it work. However Mandriva uses dhclient and it does not work here.



2008-07-23 23:34:09

by Carlos Mafra

[permalink] [raw]
Subject: Re: Potential iwlwifi locking problem in 2.6.26-git9

On Wed 23.Jul'08 at 16:11:00 -0700, David Miller wrote:
> From: "Carlos R. Mafra" <[email protected]>
> Date: Wed, 23 Jul 2008 17:59:06 -0500
>
> > I am trying to bisect it, but I am facing some difficulties with compilation
> > problems and non-booting kernels.
>
> We know what the problem is and are discussing it on netdev
> and linux-wireless already.

Thank you for letting me know, David!
I appreciate that deeply.