2008-10-15 19:39:01

by Frederik Himpe

Subject: iwlagn: associating with AP causes kernel hiccup

When I associate with an AP, Linux 2.6.27 seems to "hang" for a few
seconds. During that time, all sound stops playing and keyboard and mouse
input is impossible. In the log I see this:

wlan0: authenticate with AP 00:15:f2:0a:ab:43
wlan0: authenticate with AP 00:15:f2:0a:ab:43
wlan0: authenticate with AP 00:15:f2:0a:ab:43
wlan0: authenticated
wlan0: associate with AP 00:15:f2:0a:ab:43
wlan0: RX AssocResp from 00:15:f2:0a:ab:43 (capab=0x431 status=0 aid=2)
wlan0: associated
ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
iwlagn 0000:0c:00.0: PCI INT A disabled
iwlagn 0000:0c:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
iwlagn 0000:0c:00.0: restoring config space at offset 0x1 (was 0x100102,
writing 0x100106)
Registered led device: iwl-phy0:radio
Registered led device: iwl-phy0:assoc
Registered led device: iwl-phy0:RX
Registered led device: iwl-phy0:TX
ADDRCONF(NETDEV_UP): wlan0: link is not ready
iwlagn 0000:0c:00.0: PCI INT A disabled
iwlagn 0000:0c:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
iwlagn 0000:0c:00.0: restoring config space at offset 0x1 (was 0x100102,
writing 0x100106)
Registered led device: iwl-phy0:radio
Registered led device: iwl-phy0:assoc
Registered led device: iwl-phy0:RX
Registered led device: iwl-phy0:TX
ADDRCONF(NETDEV_UP): wlan0: link is not ready
CPU0 attaching NULL sched-domain.
CPU1 attaching NULL sched-domain.
CPU0 attaching sched-domain:
domain 0: span 0-1 level MC
groups: 0 1
domain 1: span 0-1 level NODE
groups: 0-1
CPU1 attaching sched-domain:
domain 0: span 0-1 level MC
groups: 1 0
domain 1: span 0-1 level NODE
groups: 0-1
wlan0: deauthenticating by local choice (reason=3)
psmouse.c: DualPoint TouchPad at isa0060/serio1/input0 lost
synchronization, throwing 1 bytes away.
iwlagn 0000:0c:00.0: PCI INT A disabled
iwlagn 0000:0c:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
iwlagn 0000:0c:00.0: restoring config space at offset 0x1 (was 0x100102,
writing 0x100106)
Registered led device: iwl-phy0:radio
Registered led device: iwl-phy0:assoc
Registered led device: iwl-phy0:RX
Registered led device: iwl-phy0:TX
ADDRCONF(NETDEV_UP): wlan0: link is not ready
psmouse.c: resync failed, issuing reconnect request
input: PS/2 Generic Mouse as /class/input/input13
wlan0: authenticate with AP 00:15:f2:0a:ab:43
wlan0: authenticated
wlan0: associate with AP 00:15:f2:0a:ab:43
wlan0: RX AssocResp from 00:15:f2:0a:ab:43 (capab=0x431 status=0 aid=2)
wlan0: associated

This is a Dell Latitude E6400.

$ lspci -nn
00:00.0 Host bridge [0600]: Intel Corporation Mobile 4 Series Chipset
Memory Controller Hub [8086:2a40] (rev 07)
00:02.0 VGA compatible controller [0300]: Intel Corporation Mobile 4
Series Chipset Integrated Graphics Controller [8086:2a42] (rev 07)
00:02.1 Display controller [0380]: Intel Corporation Mobile 4 Series
Chipset Integrated Graphics Controller [8086:2a43] (rev 07)
00:19.0 Ethernet controller [0200]: Intel Corporation 82567LM Gigabit
Network Connection [8086:10f5] (rev 03)
00:1a.0 USB Controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB
UHCI Controller #4 [8086:2937] (rev 03)
00:1a.1 USB Controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB
UHCI Controller #5 [8086:2938] (rev 03)
00:1a.2 USB Controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB
UHCI Controller #6 [8086:2939] (rev 03)
00:1a.7 USB Controller [0c03]: Intel Corporation 82801I (ICH9 Family)
USB2 EHCI Controller #2 [8086:293c] (rev 03)
00:1b.0 Audio device [0403]: Intel Corporation 82801I (ICH9 Family) HD
Audio Controller [8086:293e] (rev 03)
00:1c.0 PCI bridge [0604]: Intel Corporation 82801I (ICH9 Family) PCI
Express Port 1 [8086:2940] (rev 03)
00:1c.1 PCI bridge [0604]: Intel Corporation 82801I (ICH9 Family) PCI
Express Port 2 [8086:2942] (rev 03)
00:1c.2 PCI bridge [0604]: Intel Corporation 82801I (ICH9 Family) PCI
Express Port 3 [8086:2944] (rev 03)
00:1d.0 USB Controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB
UHCI Controller #1 [8086:2934] (rev 03)
00:1d.1 USB Controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB
UHCI Controller #2 [8086:2935] (rev 03)
00:1d.2 USB Controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB
UHCI Controller #3 [8086:2936] (rev 03)
00:1d.7 USB Controller [0c03]: Intel Corporation 82801I (ICH9 Family)
USB2 EHCI Controller #1 [8086:293a] (rev 03)
00:1e.0 PCI bridge [0604]: Intel Corporation 82801 Mobile PCI Bridge
[8086:2448] (rev 93)
00:1f.0 ISA bridge [0601]: Intel Corporation ICH9M-E LPC Interface
Controller [8086:2917] (rev 03)
00:1f.2 SATA controller [0106]: Intel Corporation ICH9M/M-E SATA AHCI
Controller [8086:2929] (rev 03)
00:1f.3 SMBus [0c05]: Intel Corporation 82801I (ICH9 Family) SMBus
Controller [8086:2930] (rev 03)
03:01.0 CardBus bridge [0607]: Ricoh Co Ltd RL5c476 II [1180:0476] (rev
ba)
03:01.1 FireWire (IEEE 1394) [0c00]: Ricoh Co Ltd R5C832 IEEE 1394
Controller [1180:0832] (rev 04)
03:01.2 SD Host controller [0805]: Ricoh Co Ltd R5C822 SD/SDIO/MMC/MS/
MSPro Host Adapter [1180:0822] (rev 21)
03:01.3 System peripheral [0880]: Ricoh Co Ltd R5C843 MMC Host Controller
[1180:0843] (rev ff)
0c:00.0 Network controller [0280]: Intel Corporation Device [8086:4235]




2008-10-15 21:03:26

by Reinette Chatre

Subject: Re: iwlagn: associating with AP causes kernel hiccup

On Wed, 2008-10-15 at 19:38 +0000, Frederik Himpe wrote:
> When I associate with an AP, Linux 2.6.27 seems to "hang" for a few
> seconds. During that time, all sound stops playing and keyboard and mouse
> input is impossible. In the log I see this:
>
> wlan0: authenticate with AP 00:15:f2:0a:ab:43
> wlan0: authenticate with AP 00:15:f2:0a:ab:43
> wlan0: authenticate with AP 00:15:f2:0a:ab:43
> wlan0: authenticated
> wlan0: associate with AP 00:15:f2:0a:ab:43
> wlan0: RX AssocResp from 00:15:f2:0a:ab:43 (capab=0x431 status=0 aid=2)
> wlan0: associated
> ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
> iwlagn 0000:0c:00.0: PCI INT A disabled
> iwlagn 0000:0c:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
> iwlagn 0000:0c:00.0: restoring config space at offset 0x1 (was 0x100102,
> writing 0x100106)

Are you running wpa_supplicant and/or network manager?

Can you try testing without them?

Reinette


2008-10-17 20:02:03

by Tomas Winkler

Subject: Re: iwlagn: associating with AP causes kernel hiccup

On Fri, Oct 17, 2008 at 5:27 PM, Richard Scherping wrote:
>>>>> When I associate with an AP, Linux 2.6.27 seems to "hang" for a few
>>>>> seconds. During that time, all sound stops playing and keyboard and
>>>>> mouse input is impossible.
>>>
>>> I'm using Mandriva 2009.0 x86_64 with wpa_supplicant and Mandriva's
>>> wireless network configuration tool (drakroam).
>>>
>>> Actually I just found out that running # ifconfig wlan0 down
>>> is enough to trigger the sound and mouse hanging for a few seconds.
>>
>> And shortly after I wrote that, while associating while getting an IP
>> with dhclient when associating with a WPA encrypted AP, I got this
>> backtrace in my logs:
>> [...]
>
> I have a similar problem here. No crash up to now, but the very same "hang" for a few seconds on "ifconfig wlan0 down". Interestingly this does only happen after a normal boot - once I did a suspend and resume (S3), there is no hang anymore.
>
> Hardware: Thinkpad T61p with Intel 4965 agn
> Software: Debian Lenny x86_64 with vanilla 2.6.27 kernel
>
The driver in 2.6.27 is not stable; please try to reproduce this in
the current wireless-testing.git.
Thanks

2008-10-17 15:27:47

by Richard Scherping

Subject: Re: iwlagn: associating with AP causes kernel hiccup

>>>> When I associate with an AP, Linux 2.6.27 seems to "hang" for a few
>>>> seconds. During that time, all sound stops playing and keyboard and
>>>> mouse input is impossible.
>>
>> I'm using Mandriva 2009.0 x86_64 with wpa_supplicant and Mandriva's
>> wireless network configuration tool (drakroam).
>>
>> Actually I just found out that running # ifconfig wlan0 down
>> is enough to trigger the sound and mouse hanging for a few seconds.
>
> And shortly after I wrote that, while associating while getting an IP
> with dhclient when associating with a WPA encrypted AP, I got this
> backtrace in my logs:
> [...]

I have a similar problem here. No crash so far, but the very same "hang" for a few seconds on "ifconfig wlan0 down". Interestingly, this only happens after a normal boot; once I have done a suspend and resume (S3), the hang no longer occurs.

Hardware: Thinkpad T61p with Intel 4965 agn
Software: Debian Lenny x86_64 with vanilla 2.6.27 kernel

I remember having the same "hang" with older kernels, too.

Richard

2008-10-19 22:12:51

by Tomas Winkler

Subject: Re: iwlagn: associating with AP causes kernel hiccup

On Sun, Oct 19, 2008 at 5:18 PM, Andy Lutomirski wrote:
> Richard Scherping wrote:
>>
>> Tomas Winkler schrieb:
>>>
>>> On Fri, Oct 17, 2008 at 5:27 PM, Richard Scherping wrote:
>>>>>>>>
>>>>>>>> When I associate with an AP, Linux 2.6.27 seems to "hang" for a few
>>>>>>>> seconds. During that time, all sound stops playing and keyboard and
>>>>>>>> mouse input is impossible.
>>>>>>
>>>>>> I'm using Mandriva 2009.0 x86_64 with wpa_supplicant and Mandriva's
>>>>>> wireless network configuration tool (drakroam).
>>>>>>
>>>>>> Actually I just found out that running # ifconfig wlan0 down
>>>>>> is enough to trigger the sound and mouse hanging for a few seconds.
>>>>>
>>>>> And shortly after I wrote that, while associating while getting an IP
>>>>> with dhclient when associating with a WPA encrypted AP, I got this
>>>>> backtrace in my logs:
>>>>> [...]
>>>>
>>>> I have a similar problem here. No crash up to now, but the very same
>>>> "hang" for a few seconds on "ifconfig wlan0 down". Interestingly this does
>>>> only happen after a normal boot - once I did a suspend and resume (S3),
>>>> there is no hang anymore.
>>>>
>>>> Hardware: Thinkpad T61p with Intel 4965 agn
>>>> Software: Debian Lenny x86_64 with vanilla 2.6.27 kernel
>>>>
>>> Driver in 2.6.27 is not stable, please try to reproduce this in
>>> current wireless-testing.git.
>>
>> I do not have the time to compile and test wireless-testing ATM, sorry.
>>
>> In fact I am annoyed by the fact that iwlagn is "known to be unstable" in
>> a stable kernel release and that this even seems to be a totally normal
>> thing...

>
> Amen.
Stable doesn't mean all components are stable. To quote Linus's blog:
"It doesn't have to be perfect (and obviously no release ever is), but
it needs to be in reasonable shape"

The fact is that some critical patches were rejected during the rc
cycle as not being regression fixes, and they probably need to be
pushed to the stable version now, or distributions will merge them.
We gave higher priority to testing the 32-bit version, so it is more
stable than the 64-bit version, which got much less in-house testing,
and we've missed many issues there. The driver doesn't get full
exposure until it reaches the public in a stable version, so no bugs
are opened during the rc cycle, and they are also not fixed in the
stable version. Unfortunately, there is not much system testing at all
of what gets into the merge window.
Second, the whole mac80211 stack didn't fully address the MQ rewrite, so
it's a bit shaky as well, and this will also be the case in 2.6.28.

> This driver has been available and more-or-less working for ages.
> What kernel am I supposed to run if I just want a stable system? Haven't
> found one yet, other than distro kernels...
>
> In any case, I've seen these complete system hiccups with iwl4965 and iwlagn
> since at least 2.6.25 and through quite a few wireless-testing versions. I
> bet that this, along with things like it, is the culprit:

I haven't seen that you've filed a bug for it.

>
> In many, many functions:
> spin_lock_irqsave(&priv->lock, flags);
> ...
> ret = iwl_grab_nic_access(priv);
>
> In iwl-io.h (2.6.26.something):

This code has been there since at least version 2.6.18; it was just moved around.

> static inline int _iwl_grab_nic_access(struct iwl_priv *priv)
> {
> ...
> ret = _iwl_poll_bit(priv, CSR_GP_CNTRL,
> CSR_GP_CNTRL_REG_VAL_MAC_ACCESS_EN,
> (CSR_GP_CNTRL_REG_FLAG_MAC_CLOCK_READY |
> CSR_GP_CNTRL_REG_FLAG_GOING_TO_SLEEP), 50);
> ...
> }
>
> static inline int _iwl_poll_bit(struct iwl_priv *priv, u32 addr,
> u32 bits, u32 mask, int timeout)
> {
> int i = 0;
>
> do {
> if ((_iwl_read32(priv, addr) & mask) == (bits & mask))
> return i;
> mdelay(10);
> i += 10;
> } while (i < timeout);
>
> return -ETIMEDOUT;
> }
>
> Polling the hardware waiting for firmware to do something *with IRQs
> disabled*? I'd really rather the drivers on my system didn't do this.
>
> I'd attempt to fix this myself, but I have no clue what the locking rules
> are supposed to be.

The locking really needs to be revised, but until now I haven't seen
any show-stopper issues, so it didn't get priority.

> Would I be out of line for wishing the iwlwifi developers would fix
> longstanding issues (latency and maybe horkage after resume, although the
> latter seems much improved lately) before adding fancy new things?

Patches are always welcome.

There are also problems in mac80211 itself, and we have also done some
work to improve it a bit.
Tomas

2008-10-17 12:06:19

by Frederik Himpe

Subject: Re: iwlagn: associating with AP causes kernel hiccup

On Wed, 15 Oct 2008 13:59:23 -0700, reinette chatre wrote:

> On Wed, 2008-10-15 at 19:38 +0000, Frederik Himpe wrote:
>> When I associate with an AP, Linux 2.6.27 seems to "hang" for a few
>> seconds. During that time, all sound stops playing and keyboard and
>> mouse input is impossible. In the log I see this:
>>
>> wlan0: authenticate with AP 00:15:f2:0a:ab:43
>> wlan0: authenticate with AP 00:15:f2:0a:ab:43
>> wlan0: authenticate with AP 00:15:f2:0a:ab:43
>> wlan0: authenticated
>> wlan0: associate with AP 00:15:f2:0a:ab:43
>> wlan0: RX AssocResp from 00:15:f2:0a:ab:43 (capab=0x431 status=0 aid=2)
>> wlan0: associated
>> ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
>> iwlagn 0000:0c:00.0: PCI INT A disabled
>> iwlagn 0000:0c:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
>> iwlagn 0000:0c:00.0: restoring config space at offset 0x1 (was 0x100102,
>> writing 0x100106)
>
> Are you running wpa_supplicant and/or network manager?

I'm using Mandriva 2009.0 x86_64 with wpa_supplicant and Mandriva's
wireless network configuration tool (drakroam).

> Can you try testing without them?

Actually I just found out that running
# ifconfig wlan0 down
is enough to trigger the sound and mouse hanging for a few seconds.

This is what appears in the kernel logs when running that command:

Oct 17 14:04:51 defected kernel: iwlagn 0000:0c:00.0: PCI INT A disabled
Oct 17 14:04:52 defected kernel: iwlagn 0000:0c:00.0: PCI INT A -> GSI 17
(level, low) -> IRQ 17
Oct 17 14:04:52 defected kernel: iwlagn 0000:0c:00.0: restoring config
space at offset 0x1 (was 0x100102, writing 0x100106)
Oct 17 14:04:52 defected kernel: Registered led device: iwl-phy0:radio
Oct 17 14:04:52 defected kernel: Registered led device: iwl-phy0:assoc
Oct 17 14:04:52 defected kernel: Registered led device: iwl-phy0:RX
Oct 17 14:04:52 defected kernel: Registered led device: iwl-phy0:TX
Oct 17 14:04:52 defected kernel: ADDRCONF(NETDEV_UP): wlan0: link is not
ready

--
Frederik Himpe



2008-10-19 15:27:51

by Andy Lutomirski

Subject: Re: iwlagn: associating with AP causes kernel hiccup

Richard Scherping wrote:
> Tomas Winkler schrieb:
>> On Fri, Oct 17, 2008 at 5:27 PM, Richard Scherping wrote:
>>>>>>> When I associate with an AP, Linux 2.6.27 seems to "hang" for a few
>>>>>>> seconds. During that time, all sound stops playing and keyboard and
>>>>>>> mouse input is impossible.
>>>>> I'm using Mandriva 2009.0 x86_64 with wpa_supplicant and Mandriva's
>>>>> wireless network configuration tool (drakroam).
>>>>>
>>>>> Actually I just found out that running # ifconfig wlan0 down
>>>>> is enough to trigger the sound and mouse hanging for a few seconds.
>>>> And shortly after I wrote that, while associating while getting an IP
>>>> with dhclient when associating with a WPA encrypted AP, I got this
>>>> backtrace in my logs:
>>>> [...]
>>> I have a similar problem here. No crash up to now, but the very same "hang" for a few seconds on "ifconfig wlan0 down". Interestingly this does only happen after a normal boot - once I did a suspend and resume (S3), there is no hang anymore.
>>>
>>> Hardware: Thinkpad T61p with Intel 4965 agn
>>> Software: Debian Lenny x86_64 with vanilla 2.6.27 kernel
>>>
>> Driver in 2.6.27 is not stable, please try to reproduce this in
>> current wireless-testing.git.
>
> I do not have the time to compile and test wireless-testing ATM, sorry.
>
> In fact I am annoyed by the fact that iwlagn is "known to be unstable" in a stable kernel release and that this even seems to be a totally normal thing...

Amen. This driver has been available and more-or-less working for ages.
What kernel am I supposed to run if I just want a stable system?
Haven't found one yet, other than distro kernels...

In any case, I've seen these complete system hiccups with iwl4965 and
iwlagn since at least 2.6.25 and through quite a few wireless-testing
versions. I bet that this, along with things like it, is the culprit:

In many, many functions:
spin_lock_irqsave(&priv->lock, flags);
...
ret = iwl_grab_nic_access(priv);

In iwl-io.h (2.6.26.something):
static inline int _iwl_grab_nic_access(struct iwl_priv *priv)
{
...
ret = _iwl_poll_bit(priv, CSR_GP_CNTRL,
CSR_GP_CNTRL_REG_VAL_MAC_ACCESS_EN,
(CSR_GP_CNTRL_REG_FLAG_MAC_CLOCK_READY |
CSR_GP_CNTRL_REG_FLAG_GOING_TO_SLEEP), 50);
...
}

static inline int _iwl_poll_bit(struct iwl_priv *priv, u32 addr,
u32 bits, u32 mask, int timeout)
{
int i = 0;

do {
if ((_iwl_read32(priv, addr) & mask) == (bits & mask))
return i;
mdelay(10);
i += 10;
} while (i < timeout);

return -ETIMEDOUT;
}

Polling the hardware waiting for firmware to do something *with IRQs
disabled*? I'd really rather the drivers on my system didn't do this.
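
To put a rough number on that: with the values quoted above (mdelay(10) per iteration, timeout of 50), a single failed poll busy-waits for about 50 ms. Here is a minimal userspace sketch of the same loop, assuming a fake register in place of the hardware. struct fake_nic, fake_read32() and fake_mdelay() are made up for illustration and are not part of iwlwifi; only the control flow of poll_bit() follows the quoted _iwl_poll_bit().

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the device; not the real iwlwifi structures. */
struct fake_nic {
    uint32_t reg;           /* value returned by register reads */
    int ready_after_reads;  /* read count at which reg becomes ready_val; <0 = never */
    uint32_t ready_val;     /* value the register takes once "ready" */
    int reads;              /* number of register reads so far */
    int busy_wait_ms;       /* total simulated mdelay() time */
};

/* Simulated register read: flips to ready_val once enough reads happened. */
static uint32_t fake_read32(struct fake_nic *nic)
{
    nic->reads++;
    if (nic->ready_after_reads >= 0 && nic->reads >= nic->ready_after_reads)
        nic->reg = nic->ready_val;
    return nic->reg;
}

/* Simulated mdelay(): only accounts for the time, does not actually wait. */
static void fake_mdelay(struct fake_nic *nic, int ms)
{
    nic->busy_wait_ms += ms;
}

/* Same loop shape as the quoted _iwl_poll_bit(): 10 ms steps up to timeout. */
static int poll_bit(struct fake_nic *nic, uint32_t bits, uint32_t mask,
                    int timeout)
{
    int i = 0;

    do {
        if ((fake_read32(nic) & mask) == (bits & mask))
            return i;
        fake_mdelay(nic, 10);
        i += 10;
    } while (i < timeout);

    return -ETIMEDOUT;
}
```

With a register that never becomes ready, poll_bit(&nic, 1, 1, 50) returns -ETIMEDOUT after five reads and 50 ms of simulated delay. In the driver, that time would be spent with the spinlock held and IRQs off.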

I'd attempt to fix this myself, but I have no clue what the locking
rules are supposed to be.

Would I be out of line for wishing the iwlwifi developers would fix
longstanding issues (latency and maybe horkage after resume, although
the latter seems much improved lately) before adding fancy new things?

--Andy

2008-10-19 23:12:03

by Tomas Winkler

Subject: Re: iwlagn: associating with AP causes kernel hiccup

On Mon, Oct 20, 2008 at 12:52 AM, Andrew Lutomirski wrote:
> On Sun, Oct 19, 2008 at 6:12 PM, Tomas Winkler wrote:
>> On Sun, Oct 19, 2008 at 5:18 PM, Andy Lutomirski wrote:
>>> Richard Scherping wrote:
>>>>
>>>> Tomas Winkler schrieb:
>>
>>>
>>> Amen.
>> Stable doesn't mean all components are stable. To quote Linus's blog:
>> "It doesn't have to be perfect (and obviously no release ever is), but
>> it needs to be in reasonable shape"
>>
>> The fact is that some critical patches were rejected during the rc
>> cycle as not being regression fixes, and they probably need to be
>> pushed to the stable version now, or distributions will merge them.
>> We gave higher priority to testing the 32-bit version, so it is more
>> stable than the 64-bit version, which got much less in-house testing,
>> and we've missed many issues there. The driver doesn't get full
>> exposure until it reaches the public in a stable version, so no bugs
>> are opened during the rc cycle, and they are also not fixed in the
>> stable version. Unfortunately, there is not much system testing at all
>> of what gets into the merge window.
>> Second, the whole mac80211 stack didn't fully address the MQ rewrite, so
>> it's a bit shaky as well, and this will also be the case in 2.6.28.
>
> OK.
>
>>
>> This driver has been available and more-or-less working for ages.
>>> What kernel am I supposed to run if I just want a stable system? Haven't
>>> found one yet, other than distro kernels...
>>>
>>> In any case, I've seen these complete system hiccups with iwl4965 and iwlagn
>>> since at least 2.6.25 and through quite a few wireless-testing versions. I
>>> bet that this, along with things like it, is the culprit:
>>
>> I haven't seen that you've filed a bug for it.
>
> Fair enough. #1790.
>
Appreciated.

>>
>> The locking really needs to be revised, but until now I haven't seen
>> any show-stopper issues, so it didn't get priority.
>>
>>> Would I be out of line for wishing the iwlwifi developers
>> Patches are always welcome
>
> I can write a patch to add a mutex and change it to:
>
> take mutex
> grab_nic
> spinlock
>
> but I bet that would break all kinds of things. :)
>
I'm far from being a locking master, but I think a mutex just won't
work here: it can be used only in a sleepable context. Also, if I'm not
mistaken, if you use the lock in IRQ context, we must use the IRQ-safe
version of the spinlock (spin_lock_irqsave).
Thanks
Tomas

2008-10-19 22:52:05

by Andy Lutomirski

Subject: Re: iwlagn: associating with AP causes kernel hiccup

On Sun, Oct 19, 2008 at 6:12 PM, Tomas Winkler wrote:
> On Sun, Oct 19, 2008 at 5:18 PM, Andy Lutomirski wrote:
>> Richard Scherping wrote:
>>>
>>> Tomas Winkler schrieb:
>
>>
>> Amen.
> Stable doesn't mean all components are stable. To quote Linus's blog:
> "It doesn't have to be perfect (and obviously no release ever is), but
> it needs to be in reasonable shape"
>
> The fact is that some critical patches were rejected during the rc
> cycle as not being regression fixes, and they probably need to be
> pushed to the stable version now, or distributions will merge them.
> We gave higher priority to testing the 32-bit version, so it is more
> stable than the 64-bit version, which got much less in-house testing,
> and we've missed many issues there. The driver doesn't get full
> exposure until it reaches the public in a stable version, so no bugs
> are opened during the rc cycle, and they are also not fixed in the
> stable version. Unfortunately, there is not much system testing at all
> of what gets into the merge window.
> Second, the whole mac80211 stack didn't fully address the MQ rewrite, so
> it's a bit shaky as well, and this will also be the case in 2.6.28.

OK.

>
> This driver has been available and more-or-less working for ages.
>> What kernel am I supposed to run if I just want a stable system? Haven't
>> found one yet, other than distro kernels...
>>
>> In any case, I've seen these complete system hiccups with iwl4965 and iwlagn
>> since at least 2.6.25 and through quite a few wireless-testing versions. I
>> bet that this, along with things like it, is the culprit:
>
> I haven't seen that you've filed a bug for it.

Fair enough. #1790.

>
> The locking really needs to be revised, but until now I haven't seen
> any show-stopper issues, so it didn't get priority.
>
>> Would I be out of line for wishing the iwlwifi developers
> Patches are always welcome

I can write a patch to add a mutex and change it to:

take mutex
grab_nic
spinlock

but I bet that would break all kinds of things. :)
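
To make the intent concrete, here is a rough userspace model of the two orderings, assuming the slow part is only the wait for NIC access. It merely counts how many milliseconds pass while interrupts would be disabled. struct sim and its helpers are hypothetical bookkeeping, not kernel code, and the model says nothing about whether the driver would still be correct with the new ordering.

```c
#include <assert.h>

/* Hypothetical bookkeeping, not kernel code: tracks simulated time. */
struct sim {
    int irqs_off;     /* nonzero while the simulated spinlock is held */
    int ms_irqs_off;  /* simulated ms spent with IRQs disabled */
    int ms_total;     /* simulated ms overall */
};

/* Account for a delay; charge it to the IRQs-off total if the lock is held. */
static void sim_delay(struct sim *s, int ms)
{
    s->ms_total += ms;
    if (s->irqs_off)
        s->ms_irqs_off += ms;
}

/* Stand-in for spin_lock_irqsave(): just marks IRQs as disabled. */
static void sim_lock_irqsave(struct sim *s)
{
    assert(!s->irqs_off);
    s->irqs_off = 1;
}

/* Stand-in for spin_unlock_irqrestore(): marks IRQs as enabled again. */
static void sim_unlock_irqrestore(struct sim *s)
{
    assert(s->irqs_off);
    s->irqs_off = 0;
}

/* Current pattern: take the spinlock first, then wait for the NIC. */
static void current_order(struct sim *s, int wait_ms)
{
    sim_lock_irqsave(s);
    sim_delay(s, wait_ms);   /* stands in for the grab_nic poll */
    /* ... short register access would go here ... */
    sim_unlock_irqrestore(s);
}

/* Proposed pattern: wait first (under a mutex, not modelled), then spinlock. */
static void proposed_order(struct sim *s, int wait_ms)
{
    sim_delay(s, wait_ms);   /* IRQs are still enabled during the wait */
    sim_lock_irqsave(s);
    /* ... short register access would go here ... */
    sim_unlock_irqrestore(s);
}
```

For a 50 ms wait, current_order() charges all 50 ms to the IRQs-off total, while proposed_order() charges none of it; the total elapsed time is the same in both cases.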

--Andy

2008-10-17 12:18:17

by Frederik Himpe

Subject: Re: iwlagn: associating with AP causes kernel hiccup

On Fri, 17 Oct 2008 12:06:08 +0000, Frederik Himpe wrote:

> On Wed, 15 Oct 2008 13:59:23 -0700, reinette chatre wrote:
>
>> On Wed, 2008-10-15 at 19:38 +0000, Frederik Himpe wrote:
>>> When I associate with an AP, Linux 2.6.27 seems to "hang" for a few
>>> seconds. During that time, all sound stops playing and keyboard and
>>> mouse input is impossible. In the log I see this:
>>>
>>> wlan0: authenticate with AP 00:15:f2:0a:ab:43
>>> wlan0: authenticate with AP 00:15:f2:0a:ab:43
>>> wlan0: authenticate with AP 00:15:f2:0a:ab:43
>>> wlan0: authenticated
>>> wlan0: associate with AP 00:15:f2:0a:ab:43
>>> wlan0: RX AssocResp from 00:15:f2:0a:ab:43 (capab=0x431 status=0 aid=2)
>>> wlan0: associated
>>> ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
>>> iwlagn 0000:0c:00.0: PCI INT A disabled
>>> iwlagn 0000:0c:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
>>> iwlagn 0000:0c:00.0: restoring config space at offset 0x1 (was 0x100102,
>>> writing 0x100106)
>>
>> Are you running wpa_supplicant and/or network manager?
>
> I'm using Mandriva 2009.0 x86_64 with wpa_supplicant and Mandriva's
> wireless network configuration tool (drakroam).
>
>> Can you try testing without them?
>
> Actually I just found out that running # ifconfig wlan0 down
> is enough to trigger the sound and mouse hanging for a few seconds.

And shortly after I wrote that, while associating with a WPA-encrypted
AP and getting an IP address with dhclient, I got this backtrace in my
logs:

ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
ADDRCONF(NETDEV_UP): eth0: link is not ready
0000:00:19.0: eth0: Link is Up 10 Mbps Full Duplex, Flow Control: None
0000:00:19.0: eth0: 10/100 speed: disabling TSO
ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
martian source 255.255.255.255 from 134.184.26.55, on dev eth0
ll header: ff:ff:ff:ff:ff:ff:00:16:cb:06:4b:54:08:00
wlan0: disassociating by local choice (reason=3)
wlan0: no IPv6 routers present
wlan0: authenticate with AP 00:0d:54:a0:07:e3
wlan0: authenticate with AP 00:0d:54:a0:07:e3
wlan0: authenticated
wlan0: associate with AP 00:0d:54:a0:07:e3
wlan0: RX ReassocResp from 00:0d:54:a0:07:e3 (capab=0x431 status=0 aid=7)
wlan0: associated
eth0: no IPv6 routers present
iwlagn: index 255 not used in uCode key table.
iwlagn 0000:0c:00.0: PCI INT A disabled
iwlagn 0000:0c:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
iwlagn 0000:0c:00.0: restoring config space at offset 0x1 (was 0x100102,
writing 0x100106)
general protection fault: 0000 [1] SMP
CPU 1
Modules linked in: sit tunnel4 i915 drm af_packet ipt_MASQUERADE
iptable_nat nf_nat hidp nf_conntrack_ipv4 xt_state nf_conntrack
ipt_REJECT bnep xt_tcpudp btusb iptable_filter ip_tables rfcomm x_tables
bridge l2cap bluetooth stp ipv6 kvm_intel kvm vboxdrv binfmt_misc fuse
nls_utf8 nls_cp437 vfat fat ext3 jbd loop cpufreq_ondemand
cpufreq_conservative cpufreq_powersave acpi_cpufreq freq_table nvram i8k
snd_hda_intel arc4 firewire_ohci ecb firewire_core snd_hwdep crc_itu_t
snd_seq_dummy crypto_blkcipher snd_seq_oss thermal snd_seq_midi_event
i2c_i801 pcmcia video wmi snd_seq i2c_core ac button battery processor sg
snd_seq_device iwlagn snd_pcm_oss sr_mod snd_pcm iwlcore usbhid sdhci_pci
ohci1394 ricoh_mmc sdhci ieee1394 rfkill led_class snd_timer output
e1000e snd_page_alloc mmc_core yenta_socket rsrc_nonstatic pcmcia_core
snd_mixer_oss rtc_cmos joydev dcdbas snd mac80211 soundcore ff_memless
cfg80211 evdev dm_snapshot dm_zero dm_mirror dm_log dm_mod ata_piix ahci
libata dock sd_mod scsi_mod crc_t10dif xfs uhci_hcd ohci_hcd ehci_hcd
usbcore [last unloaded: scsi_wait_scan]
Pid: 29422, comm: ifconfig Not tainted 2.6.27-tmb-desktop-2mdv #1
RIP: 0010:[<ffffffffa027b25b>] [<ffffffffa027b25b>] iwl_eeprom_query16
+0xb/0x20 [iwlcore]
RSP: 0018:ffff8800b193fb48 EFLAGS: 00010002
RAX: 7fff88011a87b800 RBX: 0000000000000286 RCX: 00000000800c00f0
RDX: ffffc200050b403c RSI: 0000000000000090 RDI: ffff88011a0c1900
RBP: ffff8800b193fb48 R08: 0000000000000001 R09: ffff8800b193fb1c
R10: 0000000000000000 R11: 0000000000000000 R12: ffff88011a0c1900
R13: ffff88011a0c1900 R14: ffff88011a0c2348 R15: 0000000000000282
FS: 00007f75811606f0(0000) GS:ffff88011fc02980(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000016df108 CR3: 000000007c005000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ifconfig (pid: 29422, threadinfo ffff8800b193e000, task
ffff8800bbccd700)
Stack: ffff8800b193fb98 ffffffffa02c81fb 0000000000000282
ffffffffffffff04
42ffffff804dd893 0000000000000286 ffff88011a0c1900 ffff88011a0c2348
ffff88011a0c2a50 ffff88011a0c2350 ffff8800b193fbc8 ffffffffa0279a66
Call Trace:
[<ffffffffa02c81fb>] iwl5000_nic_config+0x6b/0x200 [iwlagn]
[<ffffffffa0279a66>] iwl_hw_nic_init+0x96/0x160 [iwlcore]
[<ffffffffa02be795>] __iwl4965_up+0xb5/0x2d0 [iwlagn]
[<ffffffffa02bf1c8>] iwl4965_mac_start+0x408/0x710 [iwlagn]
[<ffffffff8046efe3>] ? rt_cache_flush+0x23/0x130
[<ffffffff804a5e02>] ? fib_inetaddr_event+0x62/0x2b0
[<ffffffff804dd731>] ? _spin_lock_irq+0x11/0x20
[<ffffffffa0185632>] ieee80211_open+0x152/0x690 [mac80211]
[<ffffffff80261889>] ? up_read+0x9/0x10
[<ffffffff802625f3>] ? __blocking_notifier_call_chain+0x63/0x80
[<ffffffff8044b7da>] dev_open+0xaa/0xf0
[<ffffffff8044ad06>] dev_change_flags+0x96/0x1e0
[<ffffffff8049e3ae>] devinet_ioctl+0x6fe/0x760
[<ffffffff8049f284>] inet_ioctl+0x94/0xc0
[<ffffffff8043a646>] sock_ioctl+0x66/0x270
[<ffffffff802d59c1>] vfs_ioctl+0x31/0xa0
[<ffffffff802d5aa4>] do_vfs_ioctl+0x74/0x480
[<ffffffff804dd829>] ? _spin_lock+0x9/0x10
[<ffffffff8024f531>] ? cap_set_effective+0x61/0x90
[<ffffffff802d5f49>] sys_ioctl+0x99/0xa0
[<ffffffff8020c6ba>] system_call_fastpath+0x16/0x1b


Code: 48 8b 47 18 55 48 89 e5 48 8b 40 18 48 8b 00 ff 90 00 01 00 00 c9
c3 66 0f 1f 84 00 00 00 00 00 48 8b 87 30 23 01 00 55 48 89 e5 <0f> b6 54
30 01 0f b6 04 30 c9 c1 e2 08 09 d0 c3 0f 1f 44 00 00
RIP [<ffffffffa027b25b>] iwl_eeprom_query16+0xb/0x20 [iwlcore]
RSP <ffff8800b193fb48>
---[ end trace 186c7f9fa5d4b374 ]---

While rebooting the machine after that, it completely hung and I had to
do a hard reset.
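
One possible reading of the oops, offered as an interpretation rather than something established in this thread: RAX holds 7fff88011a87b800, which is not a canonical x86-64 address if 48-bit virtual addressing is assumed. Dereferencing a non-canonical address raises a general protection fault rather than a page fault. The sketch below only checks that property; is_canonical48() is a made-up helper name.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumes 48-bit virtual addresses (4-level paging): bits 63..47 must
 * all be copies of bit 47 for the address to be canonical.
 */
static bool is_canonical48(uint64_t addr)
{
    uint64_t top = addr >> 47;   /* the 17 bits 63..47 */

    return top == 0 || top == 0x1ffff;
}
```

is_canonical48(0x7fff88011a87b800ULL) is false, while the RDI value ffff88011a0c1900 passes. The RAX value differs from an ordinary-looking kernel pointer, ffff88011a87b800, only in the top bit, which would be consistent with a corrupted pointer, though the trace alone does not prove it.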

--
Frederik Himpe


2008-10-18 16:17:39

by Richard Scherping

Subject: Re: iwlagn: associating with AP causes kernel hiccup

Tomas Winkler schrieb:
> On Fri, Oct 17, 2008 at 5:27 PM, Richard Scherping wrote:
>>>>>> When I associate with an AP, Linux 2.6.27 seems to "hang" for a few
>>>>>> seconds. During that time, all sound stops playing and keyboard and
>>>>>> mouse input is impossible.
>>>> I'm using Mandriva 2009.0 x86_64 with wpa_supplicant and Mandriva's
>>>> wireless network configuration tool (drakroam).
>>>>
>>>> Actually I just found out that running # ifconfig wlan0 down
>>>> is enough to trigger the sound and mouse hanging for a few seconds.
>>> And shortly after I wrote that, while associating while getting an IP
>>> with dhclient when associating with a WPA encrypted AP, I got this
>>> backtrace in my logs:
>>> [...]
>> I have a similar problem here. No crash up to now, but the very same "hang" for a few seconds on "ifconfig wlan0 down". Interestingly this does only happen after a normal boot - once I did a suspend and resume (S3), there is no hang anymore.
>>
>> Hardware: Thinkpad T61p with Intel 4965 agn
>> Software: Debian Lenny x86_64 with vanilla 2.6.27 kernel
>>
> Driver in 2.6.27 is not stable, please try to reproduce this in
> current wireless-testing.git.

I do not have the time to compile and test wireless-testing ATM, sorry.

In fact I am annoyed by the fact that iwlagn is "known to be unstable" in a stable kernel release and that this even seems to be a totally normal thing...

Once compat-wireless is working again on my system I will try that and check whether the hang still occurs. Current compat-wireless does not work for me; compilation says

make[1]: Entering directory `/usr/src/linux-2.6.27'
/home/richard/Desktop/compat-wireless-2008-10-17/config.mk:44: "WARNING: You are running a kernel >= 2.6.23, you should enable in it CONFIG_NETDEVICES_MULTIQUEUE for 802.11[ne] support"

and I did not find CONFIG_NETDEVICES_MULTIQUEUE using menuconfig. Loading the modules fails with tons of unknown symbols and disagreed versions (already at cfg80211 and mac80211 modules).

Perhaps I should go back to Debian stock 2.6.26 kernel - but there Ad-Hoc is broken on iwl4965, although it was working without problems earlier in 2.6.24.

Richard