Starting with 2.6.27-rc5, my T60p sometimes fails to associate
with an access point after suspend to ram/resume.
This does not seem to ever happen on 2.6.27-rc4.
When this happens, I observe the following messages in dmesg:
(AP mac masked out with XX):
[16433.458084] iwl3945 0000:03:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
[16433.502490] PM: Adding info for No Bus:iwl-phy4:radio
[16433.502666] Registered led device: iwl-phy4:radio
[16433.502764] PM: Adding info for No Bus:iwl-phy4:assoc
[16433.502822] Registered led device: iwl-phy4:assoc
[16433.502870] PM: Adding info for No Bus:iwl-phy4:RX
[16433.503006] Registered led device: iwl-phy4:RX
[16433.503055] PM: Adding info for No Bus:iwl-phy4:TX
[16433.503111] Registered led device: iwl-phy4:TX
[16433.505268] eth1: authenticate with AP XX:XX:XX:XX:XX:XX
[16433.522344] ADDRCONF(NETDEV_UP): eth1: link is not ready
[16433.704125] eth1: authenticate with AP XX:XX:XX:XX:XX:XX
[16433.705986] eth1: authenticated
[16433.705997] eth1: associate with AP XX:XX:XX:XX:XX:XX
[16433.708406] eth1: RX AssocResp from XX:XX:XX:XX:XX:XX (capab=0x411 status=0 aid=1)
[16433.708412] eth1: associated
[16433.709659] ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
[16443.712280] eth1: disassociating by local choice (reason=3)
[16443.712902] eth1: disassociating by local choice (reason=3)
[16444.692059] eth1: no IPv6 routers present
[16445.997384] eth1: authenticate with AP XX:XX:XX:XX:XX:XX
[16445.999304] eth1: authenticated
[16445.999312] eth1: associate with AP XX:XX:XX:XX:XX:XX
[16446.001784] eth1: RX ReassocResp from XX:XX:XX:XX:XX:XX (capab=0x411 status=0 aid=1)
[16446.001790] eth1: associated
[16446.008222] eth1: authenticate with AP XX:XX:XX:XX:XX:XX
[16446.009539] eth1: authenticate with AP XX:XX:XX:XX:XX:XX
[16446.012433] eth1: authenticated
[16446.012440] eth1: associate with AP XX:XX:XX:XX:XX:XX
[16446.015149] eth1: RX ReassocResp from XX:XX:XX:XX:XX:XX (capab=0x411 status=0 aid=1)
[16446.015155] eth1: associated
[16456.017223] eth1: disassociating by local choice (reason=3)
[16456.019751] eth1: disassociating by local choice (reason=3)
[16458.298346] eth1: authenticate with AP XX:XX:XX:XX:XX:XX
[16458.300319] eth1: authenticated
[16458.300339] eth1: associate with AP XX:XX:XX:XX:XX:XX
[16458.305557] eth1: authenticate with AP XX:XX:XX:XX:XX:XX
[16458.314765] eth1: authenticated
[16458.314781] eth1: associate with AP XX:XX:XX:XX:XX:XX
[16458.318666] eth1: RX ReassocResp from XX:XX:XX:XX:XX:XX (capab=0x411 status=0 aid=1)
[16458.318673] eth1: associated
[16468.321210] eth1: disassociating by local choice (reason=3)
[16470.604081] eth1: authenticate with AP XX:XX:XX:XX:XX:XX
[16470.606122] eth1: authenticate with AP XX:XX:XX:XX:XX:XX
[16470.608330] eth1: authenticate with AP XX:XX:XX:XX:XX:XX
[16470.608367] eth1: authenticated
[16470.608374] eth1: associate with AP XX:XX:XX:XX:XX:XX
[16470.614629] eth1: RX ReassocResp from XX:XX:XX:XX:XX:XX (capab=0x411 status=0 aid=1)
[16470.614648] eth1: associated
[16480.617268] eth1: disassociating by local choice (reason=3)
[16480.619939] eth1: disassociating by local choice (reason=3)
[16482.896840] eth1: authenticate with AP XX:XX:XX:XX:XX:XX
[16482.899593] eth1: authenticate with AP XX:XX:XX:XX:XX:XX
[16482.899626] eth1: authenticated
[16482.899633] eth1: associate with AP XX:XX:XX:XX:XX:XX
[16482.906006] eth1: authenticate with AP XX:XX:XX:XX:XX:XX
[16482.906048] eth1: authenticated
[16482.906055] eth1: associate with AP XX:XX:XX:XX:XX:XX
[16482.907625] eth1: authenticate with AP XX:XX:XX:XX:XX:XX
[16482.909446] eth1: authenticated
[16482.909453] eth1: associate with AP XX:XX:XX:XX:XX:XX
[16482.918553] eth1: RX ReassocResp from XX:XX:XX:XX:XX:XX (capab=0x411 status=0 aid=1)
[16482.918564] eth1: associated
[16492.920224] eth1: disassociating by local choice (reason=3)
[16492.920986] eth1: disassociating by local choice (reason=3)
Reverting 8ab65b03b7893da4a49009e7e356e36e27b0c407 seems to make
the card associate properly after suspend, for me.
--
MST
On Mon, 2008-09-01 at 19:07 +0300, Michael S. Tsirkin wrote:
> [16482.909453] eth1: associate with AP XX:XX:XX:XX:XX:XX
> [16482.918553] eth1: RX ReassocResp from XX:XX:XX:XX:XX:XX
> (capab=0x411 status=0 aid=1)
> [16482.918564] eth1: associated
> [16492.920224] eth1: disassociating by local choice (reason=3)
> [16492.920986] eth1: disassociating by local choice (reason=3)
It is exactly 10 seconds the local STA sends the deauth_leaving frame
before associated everytime. I wonder if it is the timeout for some
wireless config tools. NM?
What did you do to make it reassociate after resume? If you are using
some automatic config tools, can you try iwconfig manually instead?
Thanks,
-yi
On Tue, 2008-09-02 at 17:58 +0300, Michael S. Tsirkin wrote:
> On Tue, Sep 02, 2008 at 09:30:01AM -0400, Dan Williams wrote:
> > On Tue, 2008-09-02 at 16:04 +0300, Michael S. Tsirkin wrote:
> > > On Tue, Sep 02, 2008 at 08:10:09AM +0200, Johannes Berg wrote:
> > > > Zhu Yi wrote:
> > > > > On Mon, 2008-09-01 at 19:07 +0300, Michael S. Tsirkin wrote:
> > > > >> [16482.909453] eth1: associate with AP XX:XX:XX:XX:XX:XX
> > > > >> [16482.918553] eth1: RX ReassocResp from XX:XX:XX:XX:XX:XX
> > > > >> (capab=0x411 status=0 aid=1)
> > > > >> [16482.918564] eth1: associated
> > > > >> [16492.920224] eth1: disassociating by local choice (reason=3)
> > > > >> [16492.920986] eth1: disassociating by local choice (reason=3)
> > > > >
> > > > > It is exactly 10 seconds the local STA sends the deauth_leaving frame
> > > > > before associated everytime. I wonder if it is the timeout for some
> > > > > wireless config tools. NM?
> > > >
> > > > "by local choice" means wext requested this, not a kernel bug.
> > > >
> > > > johannes
> > >
> > > Sure, it's wpa_supplicant in ubuntu gutsy doing it.
> > >
> > > $wpa_supplicant -version
> > > wpa_supplicant v0.5.8
> > > Copyright (c) 2003-2007, Jouni Malinen <[email protected]> and contributors
> > >
> > > But it happens to work fine without 8ab65b03b7893da4a49009e7e356e36e27b0c407.
> >
> > That's really odd; that patch (the don't-send-empty-extended-rates-IE
> > patch) shouldn't have anything to do with this behavior, if it did you
> > wouldn't get associated to the AP in the first place I think. You're
> > 100% sure that this patch is causing the problem?
>
> No, not sure. But reverting it seems to have helped.
> And there does not seem to be anything else relevant between rc4 and
> rc5.
>
> > Can you post the
> > output of running wpa_supplicant with the "-dddt" flags
>
> I put up a (sanitized) -dd output here
> http://bugzilla.kernel.org/show_bug.cgi?id=11476
Logs seem to indicate a normal failure of the 4 way handshake; are you
sure your PSK is correct? It actually looks like the AP doesn't ever
get message #2 from your machine, and just retransmits message #1 of the
handshake. I'm pretty sure that if
8ab65b03b7893da4a49009e7e356e36e27b0c407 had anything to do with it, you
wouldn't be getting this far and the AP would have denied association in
the first place.
To me this looks like the 4-way handshake is going wrong and either the
supplicant is killing the connection or the AP is terminating the
connection due to that. This doesn't rule out driver problems but we'd
need better analysis of the wpa_supplicant logs (be smarter people than
I) to figure out whether I'm on the right track or not.
Dan
On Tue, Sep 02, 2008 at 09:38:56AM +0800, Zhu Yi wrote:
> On Mon, 2008-09-01 at 19:07 +0300, Michael S. Tsirkin wrote:
> > [16482.909453] eth1: associate with AP XX:XX:XX:XX:XX:XX
> > [16482.918553] eth1: RX ReassocResp from XX:XX:XX:XX:XX:XX
> > (capab=0x411 status=0 aid=1)
> > [16482.918564] eth1: associated
> > [16492.920224] eth1: disassociating by local choice (reason=3)
> > [16492.920986] eth1: disassociating by local choice (reason=3)
>
> It is exactly 10 seconds the local STA sends the deauth_leaving frame
> before associated everytime. I wonder if it is the timeout for some
> wireless config tools. NM?
Not network manager, I'm using wpa_supplicant.
> What did you do to make it reassociate after resume? If you are using
> some automatic config tools,
I think that resume scripts try to do is up the interface, and
network scripts run wpa supplicant (or rather, wpa_cli which talks to
wpa_supplicant).
Same thing happens if I run wpa_supplicant manually as well.
> can you try iwconfig manually instead?
>
> Thanks,
> -yi
I could try, but I have wpa ...
--
MST
On Tue, Sep 02, 2008 at 11:50:23PM +0300, Michael S. Tsirkin wrote:
> Could you tell me which bits are needed, and don't include
> security info? I wouldn't want to post the PSK, etc.
As long as you don't include -K on the command line, wpa_supplicant
should not include PSK or password or any other keys, etc. in the debug
log. It will include MAC addresses, though (and they are useful for
understanding what is happening).
> Hmm, since it manages to reassociate with -rc4 I would guess not.
> In any case killing wpa_supplicant after reboot and restarting it does not
> seem to help.
OK. If wpa_supplicant restart does not resolve the issue, this points
towards something in the kernel getting into an unexpected state. And
yes, that would indeed mean that it has nothing to do with the fix I
mentioned.
--
Jouni Malinen PGP id EFC895FA
On Tue, Sep 02, 2008 at 08:10:09AM +0200, Johannes Berg wrote:
> Zhu Yi wrote:
> > On Mon, 2008-09-01 at 19:07 +0300, Michael S. Tsirkin wrote:
> >> [16482.909453] eth1: associate with AP XX:XX:XX:XX:XX:XX
> >> [16482.918553] eth1: RX ReassocResp from XX:XX:XX:XX:XX:XX
> >> (capab=0x411 status=0 aid=1)
> >> [16482.918564] eth1: associated
> >> [16492.920224] eth1: disassociating by local choice (reason=3)
> >> [16492.920986] eth1: disassociating by local choice (reason=3)
> >
> > It is exactly 10 seconds the local STA sends the deauth_leaving frame
> > before associated everytime. I wonder if it is the timeout for some
> > wireless config tools. NM?
>
> "by local choice" means wext requested this, not a kernel bug.
>
> johannes
Sure, it's wpa_supplicant in ubuntu gutsy doing it.
$wpa_supplicant -version
wpa_supplicant v0.5.8
Copyright (c) 2003-2007, Jouni Malinen <[email protected]> and contributors
But it happens to work fine without 8ab65b03b7893da4a49009e7e356e36e27b0c407.
So yes, this could be some assumption that wpa_supplicant makes and that
fails now. I guess I will try the latest version from source and report.
On Tue, Sep 02, 2008 at 08:24:09PM +0300, Jouni Malinen wrote:
> On Tue, Sep 02, 2008 at 11:10:38AM -0400, Dan Williams wrote:
> > On Tue, 2008-09-02 at 17:58 +0300, Michael S. Tsirkin wrote:
> > > On Tue, Sep 02, 2008 at 09:30:01AM -0400, Dan Williams wrote:
> > > > On Tue, 2008-09-02 at 16:04 +0300, Michael S. Tsirkin wrote:
>
> > > > > On Tue, Sep 02, 2008 at 08:10:09AM +0200, Johannes Berg wrote:
> > > > > > "by local choice" means wext requested this, not a kernel bug.
>
> Well.. The disassociation part not, but what is trickering that
> (association/authentication timeout) may still be ;-).
>
>
> > > I put up a (sanitized) -dd output here
> > > http://bugzilla.kernel.org/show_bug.cgi?id=11476
>
> Would it be possible to get unsanitized version and with -t (i.e.,
> timestamps)? If you don't want to post it publicly, feel free to send
> one directly to me. The one attached to that bug removed quite a bit of
> information that would have been useful in figuring out what happened.
Could you tell me which bits are needed, and don't include
security info? I wouldn't want to post the PSK, etc.
> > Logs seem to indicate a normal failure of the 4 way handshake; are you
> > sure your PSK is correct? It actually looks like the AP doesn't ever
> > get message #2 from your machine, and just retransmits message #1 of the
> > handshake. I'm pretty sure that if
> > 8ab65b03b7893da4a49009e7e356e36e27b0c407 had anything to do with it, you
> > wouldn't be getting this far and the AP would have denied association in
> > the first place.
>
> There are more than one different types of issues shown in the log. I'm
> not sure which one would be the key one here. Anyway, I would agree that
> the commit that removed an empty ext. supp rates IE has nothing to do
> with this.
>
>
> As far as suspend is concerned, which distro is used here?
ubuntu gutsy
> Does it
> unload the driver (i.e., remove the netdev) on suspend? If yes, you will
> need to update wpa_supplicant 0.6.4 which handles that disappearing and
> reappearing interface.
Hmm, since it manages to reassociate with -rc4 I would guess not.
In any case killing wpa_supplicant after reboot and restarting it does not
seem to help.
> If this is something else, I would need to get
> more complete debug log from wpa_supplicant..
>
> --
> Jouni Malinen PGP id EFC895FA
On Tue, 2008-09-02 at 16:04 +0300, Michael S. Tsirkin wrote:
> On Tue, Sep 02, 2008 at 08:10:09AM +0200, Johannes Berg wrote:
> > Zhu Yi wrote:
> > > On Mon, 2008-09-01 at 19:07 +0300, Michael S. Tsirkin wrote:
> > >> [16482.909453] eth1: associate with AP XX:XX:XX:XX:XX:XX
> > >> [16482.918553] eth1: RX ReassocResp from XX:XX:XX:XX:XX:XX
> > >> (capab=0x411 status=0 aid=1)
> > >> [16482.918564] eth1: associated
> > >> [16492.920224] eth1: disassociating by local choice (reason=3)
> > >> [16492.920986] eth1: disassociating by local choice (reason=3)
> > >
> > > It is exactly 10 seconds the local STA sends the deauth_leaving frame
> > > before associated everytime. I wonder if it is the timeout for some
> > > wireless config tools. NM?
> >
> > "by local choice" means wext requested this, not a kernel bug.
> >
> > johannes
>
> Sure, it's wpa_supplicant in ubuntu gutsy doing it.
>
> $wpa_supplicant -version
> wpa_supplicant v0.5.8
> Copyright (c) 2003-2007, Jouni Malinen <[email protected]> and contributors
>
> But it happens to work fine without 8ab65b03b7893da4a49009e7e356e36e27b0c407.
That's really odd; that patch (the don't-send-empty-extended-rates-IE
patch) shouldn't have anything to do with this behavior, if it did you
wouldn't get associated to the AP in the first place I think. You're
100% sure that this patch is causing the problem? Can you post the
output of running wpa_supplicant with the "-dddt" flags and also post
the 'iwlist scan' output for your AP?
Dan
On Mon, Sep 01, 2008 at 07:06:58PM +0300, Michael S. Tsirkin wrote:
> Starting with 2.6.27-rc5, my T60p sometimes fails to associate
> with an access point after suspend to ram/resume.
>
> This does not seem to ever happen on 2.6.27-rc4.
>
> When this happens, I observe the following messages in dmesg:
> (AP mac masked out with XX):
>
> ...
>
> Reverting 8ab65b03b7893da4a49009e7e356e36e27b0c407 seems to make
> the card associate properly after suspend, for me.
FYI I created a bugzilla entry to track this:
http://bugzilla.kernel.org/show_bug.cgi?id=11476
--
MST
Zhu Yi wrote:
> On Mon, 2008-09-01 at 19:07 +0300, Michael S. Tsirkin wrote:
>> [16482.909453] eth1: associate with AP XX:XX:XX:XX:XX:XX
>> [16482.918553] eth1: RX ReassocResp from XX:XX:XX:XX:XX:XX
>> (capab=0x411 status=0 aid=1)
>> [16482.918564] eth1: associated
>> [16492.920224] eth1: disassociating by local choice (reason=3)
>> [16492.920986] eth1: disassociating by local choice (reason=3)
>
> It is exactly 10 seconds the local STA sends the deauth_leaving frame
> before associated everytime. I wonder if it is the timeout for some
> wireless config tools. NM?
"by local choice" means wext requested this, not a kernel bug.
johannes
On Tue, Sep 02, 2008 at 05:58:41PM +0300, Michael S. Tsirkin wrote:
> On Tue, Sep 02, 2008 at 09:30:01AM -0400, Dan Williams wrote:
> > On Tue, 2008-09-02 at 16:04 +0300, Michael S. Tsirkin wrote:
> > > But it happens to work fine without 8ab65b03b7893da4a49009e7e356e36e27b0c407.
> >
> > That's really odd; that patch (the don't-send-empty-extended-rates-IE
> > patch) shouldn't have anything to do with this behavior, if it did you
> > wouldn't get associated to the AP in the first place I think. You're
> > 100% sure that this patch is causing the problem?
>
> No, not sure. But reverting it seems to have helped.
> And there does not seem to be anything else relevant between rc4 and
> rc5.
So you determined that commit by eye rather than by git bisect?
Any chance I could talk you into confirming that using git bisect?
Thanks,
John
--
John W. Linville
[email protected]
On Tue, Sep 02, 2008 at 11:06:25AM -0400, John W. Linville wrote:
> On Tue, Sep 02, 2008 at 05:58:41PM +0300, Michael S. Tsirkin wrote:
> > On Tue, Sep 02, 2008 at 09:30:01AM -0400, Dan Williams wrote:
> > > On Tue, 2008-09-02 at 16:04 +0300, Michael S. Tsirkin wrote:
>
> > > > But it happens to work fine without 8ab65b03b7893da4a49009e7e356e36e27b0c407.
> > >
> > > That's really odd; that patch (the don't-send-empty-extended-rates-IE
> > > patch) shouldn't have anything to do with this behavior, if it did you
> > > wouldn't get associated to the AP in the first place I think. You're
> > > 100% sure that this patch is causing the problem?
> >
> > No, not sure. But reverting it seems to have helped.
> > And there does not seem to be anything else relevant between rc4 and
> > rc5.
>
> So you determined that commit by eye rather than by git bisect?
Yes.
> Any chance I could talk you into confirming that using git bisect?
It's a bit time consuming because the bug triggers often, but not in
100% of suspends. So you need to fiddle with it a bit after each
bisect to figure out whether it's there.
> Thanks,
>
> John
> --
> John W. Linville
> [email protected]
On Tue, Sep 02, 2008 at 11:10:38AM -0400, Dan Williams wrote:
> On Tue, 2008-09-02 at 17:58 +0300, Michael S. Tsirkin wrote:
> > On Tue, Sep 02, 2008 at 09:30:01AM -0400, Dan Williams wrote:
> > > On Tue, 2008-09-02 at 16:04 +0300, Michael S. Tsirkin wrote:
> > > > On Tue, Sep 02, 2008 at 08:10:09AM +0200, Johannes Berg wrote:
> > > > > "by local choice" means wext requested this, not a kernel bug.
Well.. The disassociation part not, but what is trickering that
(association/authentication timeout) may still be ;-).
> > I put up a (sanitized) -dd output here
> > http://bugzilla.kernel.org/show_bug.cgi?id=11476
Would it be possible to get unsanitized version and with -t (i.e.,
timestamps)? If you don't want to post it publicly, feel free to send
one directly to me. The one attached to that bug removed quite a bit of
information that would have been useful in figuring out what happened.
> Logs seem to indicate a normal failure of the 4 way handshake; are you
> sure your PSK is correct? It actually looks like the AP doesn't ever
> get message #2 from your machine, and just retransmits message #1 of the
> handshake. I'm pretty sure that if
> 8ab65b03b7893da4a49009e7e356e36e27b0c407 had anything to do with it, you
> wouldn't be getting this far and the AP would have denied association in
> the first place.
There are more than one different types of issues shown in the log. I'm
not sure which one would be the key one here. Anyway, I would agree that
the commit that removed an empty ext. supp rates IE has nothing to do
with this.
As far as suspend is concerned, which distro is used here? Does it
unload the driver (i.e., remove the netdev) on suspend? If yes, you will
need to update wpa_supplicant 0.6.4 which handles that disappearing and
reappearing interface. If this is something else, I would need to get
more complete debug log from wpa_supplicant..
--
Jouni Malinen PGP id EFC895FA
On Tue, Sep 02, 2008 at 09:30:01AM -0400, Dan Williams wrote:
> On Tue, 2008-09-02 at 16:04 +0300, Michael S. Tsirkin wrote:
> > On Tue, Sep 02, 2008 at 08:10:09AM +0200, Johannes Berg wrote:
> > > Zhu Yi wrote:
> > > > On Mon, 2008-09-01 at 19:07 +0300, Michael S. Tsirkin wrote:
> > > >> [16482.909453] eth1: associate with AP XX:XX:XX:XX:XX:XX
> > > >> [16482.918553] eth1: RX ReassocResp from XX:XX:XX:XX:XX:XX
> > > >> (capab=0x411 status=0 aid=1)
> > > >> [16482.918564] eth1: associated
> > > >> [16492.920224] eth1: disassociating by local choice (reason=3)
> > > >> [16492.920986] eth1: disassociating by local choice (reason=3)
> > > >
> > > > It is exactly 10 seconds the local STA sends the deauth_leaving frame
> > > > before associated everytime. I wonder if it is the timeout for some
> > > > wireless config tools. NM?
> > >
> > > "by local choice" means wext requested this, not a kernel bug.
> > >
> > > johannes
> >
> > Sure, it's wpa_supplicant in ubuntu gutsy doing it.
> >
> > $wpa_supplicant -version
> > wpa_supplicant v0.5.8
> > Copyright (c) 2003-2007, Jouni Malinen <[email protected]> and contributors
> >
> > But it happens to work fine without 8ab65b03b7893da4a49009e7e356e36e27b0c407.
>
> That's really odd; that patch (the don't-send-empty-extended-rates-IE
> patch) shouldn't have anything to do with this behavior, if it did you
> wouldn't get associated to the AP in the first place I think. You're
> 100% sure that this patch is causing the problem?
No, not sure. But reverting it seems to have helped.
And there does not seem to be anything else relevant between rc4 and
rc5.
> Can you post the
> output of running wpa_supplicant with the "-dddt" flags
I put up a (sanitized) -dd output here
http://bugzilla.kernel.org/show_bug.cgi?id=11476
> and also post
> the 'iwlist scan' output for your AP?
>
> Dan
OK, I will look into collecting this data.