Hi,
I already reported this problem [1] and got no feedback whatsoever.
The issue keeps happening and I've tried many things.
First of all, when Linux is failing, my phone connects fine, other
computers connect fine, and this machine with Windows 7 as well. I've
tried reloading the driver, rebooting, different module parameters,
nothing works.
I logged a session [2] where I waited 30 minutes for the link to come
up, but it never did. You can see the same thing happening over and
over:
1382936199.582861: nl80211: Association request send successfully
1382936199.797063: nl80211: Event message available
1382936199.797097: nl80211: Delete station e0:1d:3b:46:82:a0
1382936199.797881: nl80211: Event message available
1382936199.797905: nl80211: MLME event 38; timeout with e0:1d:3b:46:82:a0
1382936199.797915: wlan0: Event ASSOC_TIMED_OUT (15) received
1382936199.797920: wlan0: SME: Association timed out
1382936199.797924: Added BSSID e0:1d:3b:46:82:a0 into blacklist
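The repeating pattern in a log like this can be summarized mechanically instead of by eye. A minimal POSIX sh sketch, assuming the log was written with wpa_supplicant's -f option (as in the command shown later in this thread); the function names are hypothetical:

```shell
# Count association timeouts in a wpa_supplicant debug log.
# The pattern matches the "Event ASSOC_TIMED_OUT" lines shown above.
count_assoc_timeouts() {
    # grep -c prints 0 but exits 1 when nothing matches; "|| true" keeps
    # callers running under "set -e" from aborting.
    grep -c 'Event ASSOC_TIMED_OUT' "$1" || true
}

# Print the BSSIDs wpa_supplicant blacklisted, one per line, de-duplicated.
blacklisted_bssids() {
    sed -n 's/.*Added BSSID \([0-9a-f:]*\) into blacklist.*/\1/p' "$1" | sort -u
}
```

For example, `count_assoc_timeouts /tmp/wpa.log` gives the number of timed-out attempts in a session, and `blacklisted_bssids /tmp/wpa.log` shows which AP(s) were blacklisted as a result.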
If I reboot the router, it works immediately. Another thing that works
is connecting with ad-hoc mode (mode=1), and then switching back to
normal mode (mode=0).
Here's a log [3] where I tried reloading the module multiple times and
finally tried ad-hoc mode, after which the link came up.
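The mode switch described above amounts to regenerating the network block with mode=1 and then with mode=0. A minimal sketch of a helper that prints that block for either mode, assuming the same fields as the configuration shown later in the thread (network_block is a hypothetical name, and 0/1 are the only modes used here):

```shell
# Hypothetical helper: print a wpa_supplicant network block for the given
# SSID, passphrase and mode (0 = infrastructure, 1 = ad-hoc/IBSS).
network_block() {
    ssid=$1 psk=$2 mode=$3
    case $mode in
        0|1) ;;
        *) echo "network_block: mode must be 0 or 1" >&2; return 1 ;;
    esac
    cat <<EOF
network={
ssid="$ssid"
proto=WPA2
scan_ssid=1
key_mgmt=WPA-PSK
psk="$psk"
mode=$mode
}
EOF
}
```

To reproduce the workaround, one would write `network_block "$ssid" "$psk" 1` into the config file, start wpa_supplicant until it associates, then repeat with mode 0.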
Clearly the Linux kernel has a bug. Can somebody point out what needs
to be done to get this fixed?
Cheers.
[1] http://article.gmane.org/gmane.linux.kernel.wireless.general/108004
[2] http://people.freedesktop.org/~felipec/wpa/wpa-bad-30-min-wait.log
[3] http://people.freedesktop.org/~felipec/wpa/wpa-good-nothing-worked-except-mode-switch.log
--
Felipe Contreras
On Tue, Oct 29, 2013 at 12:13 AM, Felipe Contreras
<[email protected]> wrote:
> On Mon, Oct 28, 2013 at 12:28 PM, Krishna Chaitanya
> <[email protected]> wrote:
>> On Mon, Oct 28, 2013 at 11:30 PM, Felipe Contreras
>> <[email protected]> wrote:
>>
>>> The authentication response does come back in both cases though, it's
>>> just the acknowledgement that is missing. Unfortunately I cannot
>>> figure out for which message it's the ack.
>>>
>>> Also, I notice the sequence number received from the router doesn't
>>> seem to change. All the authentication requests received have the same
>>> number (256). Another peculiar thing is that in the failed case the SN
>>> we send starts with 0.
>>>
>>> I suppose since the authentication ack never arrives, the next steps
>>> are never completed.
>>>
>>> Does that help?
>> From the supplicant logs, we have successfully received the
>> authentication response and sent out the association request. So are
>> you referring to not receiving an ACK for the association request?
>
> No, from the capture there's no association request in the bad case,
> only in the good one.
>
>> Could you get a capture without any filters?
>
> http://people.freedesktop.org/~felipec/wpa/wpa-bad.pcapng
> http://people.freedesktop.org/~felipec/wpa/wpa-good.pcapng
>
From the logs, we can see that we have received the authentication
response, so is the association request getting dropped somewhere? We
might need the mac80211 and iwlwifi trace-cmd logs to check for the drop.
http://wireless.kernel.org/en/developers/Documentation/mac80211/tracing
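A sketch of recording those traces while reproducing the failure, following the trace-cmd approach on that page. The event system names (mac80211, iwlwifi) are assumptions to verify with `trace-cmd list -e`; recording needs root and runs until interrupted with Ctrl-C. record_wifi_trace and the TRACE_CMD override are hypothetical, the latter only to allow a dry run:

```shell
# Record mac80211 and iwlwifi tracepoints into an output file (default
# trace.dat) while the failed association is reproduced.
# TRACE_CMD can be overridden (e.g. TRACE_CMD=echo) to print the command
# instead of running it.
record_wifi_trace() {
    out=${1:-trace.dat}
    ${TRACE_CMD:-trace-cmd} record -e mac80211 -e iwlwifi -o "$out"
}
```

Afterwards the result can be read with `trace-cmd report -i trace.dat`.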
Another side point: all the beacons from that AP are malformed, but
probe responses are fine. Weird?
On Mon, Oct 28, 2013 at 11:30 PM, Felipe Contreras
<[email protected]> wrote:
> The authentication response does come back in both cases though, it's
> just the acknowledgement that is missing. Unfortunately I cannot
> figure out for which message it's the ack.
>
> Also, I notice the sequence number received from the router doesn't
> seem to change. All the authentication requests received have the same
> number (256). Another peculiar thing is that in the failed case the SN
> we send starts with 0.
>
> I suppose since the authentication ack never arrives, the next steps
> are never completed.
>
> Does that help?
From the supplicant logs, we have successfully received the
authentication response and sent out the association request. So are
you referring to not receiving an ACK for the association request?
Could you get a capture without any filters?
--
Thanks,
Regards,
Chaitanya T K.
Mobile:+91-9963910010.
On Mon, Oct 28, 2013 at 10:28 AM, Oleksij Rempel <[email protected]> wrote:
> On 28.10.2013 16:44, Felipe Contreras wrote:
>> And BTW, the devices we are talking about are very varied: Wii U,
>> Windows 7, Windows 8.1, Nokia N9, Android phone, iPhone, iMac, Amazon
>> Kindle. Yet the only device that seems to have a problem is my Linux
>> machine, I think it's pretty clear where the problem is.
>
> Do any of the listed devices use mac80211?
Yes, my Nokia N900, which works flawlessly. I didn't list that, but I
just tried it.
> If not, you still have the following
> options: wpa_supplicant, mac80211, the iwlwifi driver, the iwlwifi firmware. In
> your arguments you didn't even try to eliminate any of them. So, no, it
> is not clear where the problem is.
All of these things run in the Linux laptop, so to me it is clearly
not in the router. Sure, the router might be buggy, but it's very
conspicuous that only this machine reproduces that bug.
>> And sure, it does not necessarily mean that other people have the
>> same problem, but it is very unlikely that I'm the only one.
>
> What is your hardware?
Intel Corporation Centrino Advanced-N 6235
> Did you try disabling power save mode?
modinfo shows this:
parm: power_save:enable WiFi power management (default: disable) (bool)
So I guessed power saving was disabled by default.
But yeah, to be sure I did try it:
sudo modprobe iwlwifi power_save=0 power_level=1
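Rather than inferring the default from modinfo, the value the loaded driver is actually using can usually be read from sysfs, assuming the parameter is exported as readable under /sys/module/iwlwifi/parameters. A minimal sketch (read_mod_param is a hypothetical helper; the directory argument exists only to make it testable):

```shell
# Print a loaded module's parameter value from sysfs. Parameters appear
# there only if the driver exports them as readable; the optional third
# argument overrides the directory (useful for testing).
read_mod_param() {
    mod=$1 param=$2
    dir=${3:-/sys/module/$mod/parameters}
    if [ -r "$dir/$param" ]; then
        cat "$dir/$param"
    else
        echo "read_mod_param: $dir/$param not readable" >&2
        return 1
    fi
}
```

For example, `read_mod_param iwlwifi power_save` should print N for a boolean parameter that is off. Note that mac80211 also has its own per-interface power-save setting, which can be queried with `iw dev wlan0 get power_save`.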
> I have "Intel Centrino Advanced-N 6235", it use same driver and works
> with three of my APs. I say just to show, that list of working hardware
> is not helpful.
Of course it is helpful, it tells us that the problem is very likely
on this machine, not on the AP.
> What is your AP?
ZNID-GPON-2516
> What mode is it running in?
I don't know, it's a wireless router.
> Does changing the settings of this AP make any difference?
I don't have access to the settings. I'm going to ask for access, but
I'm not too hopeful.
--
Felipe Contreras
On Mon, Oct 28, 2013 at 3:07 PM, Felipe Contreras
<[email protected]> wrote:
>> Looks like there is "maximum clients" feature is enabled in the AP, most AP's
>> send deauth for Auth/Assoc Request but some AP's silently discard the
>> Assoc Request.
>>
>> That explains why it works after reboot/adhoc mode. Try increasing the
>> number if thats the case.
>
> Then why am I able to connect from my phone, other machines, and in
> this machine with Windows 7?
>
So lets say "max clients=5" and first all of your devices except the
linux connec
to the AP, then they have no issues connecting. Now if the linux is
the 6th device
then it might have trouble connecting to the AP?? Its possible.
On Mon, Oct 28, 2013 at 5:00 AM, Krishna Chaitanya
<[email protected]> wrote:
> On Mon, Oct 28, 2013 at 3:07 PM, Felipe Contreras
> <[email protected]> wrote:
>> Then why am I able to connect from my phone, other machines, and in
>> this machine with Windows 7?
>>
> So lets say "max clients=5" and first all of your devices except the
> linux connec
> to the AP, then they have no issues connecting. Now if the linux is
> the 6th device
> then it might have trouble connecting to the AP?? Its possible.
Yeah, but if max-clients = 5 and clients = 5, nothing would change
when I reboot into Windows, yet it works. Also, if it's working on Linux,
I reboot my machine, and then I cannot connect again. Plus, I
disconnect my phone, I try to connect in Linux, it keeps failing, I
connect my phone, and my phone works.
I don't think this theory matches the evidence at all.
--
Felipe Contreras
On Mon, Oct 28, 2013 at 2:31 AM, Oleksij Rempel <[email protected]> wrote:
> Take Wireshark and capture a working and a non-working association request.
All right, so what I did is capture a working and a non-working
association on the same Linux machine. I had never done that before, and
didn't find much information about how to do it, so I used kismet.
I don't know exactly how I should export these logs for you to browse
them, but I took a shot at interpreting them.
What I can see is that in both cases the authentication request is
exactly the same, except for the sequence number. Yet in the good case
there's an acknowledgement, and in the bad case there isn't.
The authentication response does come back in both cases though, it's
just the acknowledgement that is missing. Unfortunately I cannot
figure out for which message it's the ack.
Also, I notice the sequence number received from the router doesn't
seem to change. All the authentication requests received have the same
number (256). Another peculiar thing is that in the failed case the SN
we send starts with 0.
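One thing worth checking is whether the 256 above is the raw 16-bit Sequence Control field or an already-decoded sequence number; the thread does not say which. In 802.11, the upper 12 bits of that field carry the sequence number and the lower 4 bits the fragment number, so a raw value of 256 would mean sequence number 16, fragment 0. A small sketch of the decoding (decode_seq_ctl is a hypothetical name):

```shell
# Split a raw 16-bit 802.11 Sequence Control value (decimal or 0x hex)
# into its sequence number (upper 12 bits) and fragment number (lower 4 bits).
decode_seq_ctl() {
    sc=$(( $1 ))
    echo "seq=$(( (sc >> 4) & 0xfff )) frag=$(( sc & 0xf ))"
}
```

For example, `decode_seq_ctl 0x0100` prints `seq=16 frag=0`.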
I suppose since the authentication ack never arrives, the next steps
are never completed.
Does that help?
--
Felipe Contreras
On Mon, Oct 28, 2013 at 1:27 PM, Krishna Chaitanya
<[email protected]> wrote:
> On Tue, Oct 29, 2013 at 12:13 AM, Felipe Contreras
> <[email protected]> wrote:
>> On Mon, Oct 28, 2013 at 12:28 PM, Krishna Chaitanya
>> <[email protected]> wrote:
>>> On Mon, Oct 28, 2013 at 11:30 PM, Felipe Contreras
>>> <[email protected]> wrote:
>>>
>>>> The authentication response does come back in both cases though, it's
>>>> just the acknowledgement that is missing. Unfortunately I cannot
>>>> figure out for which message it's the ack.
>>>>
>>>> Also, I notice the sequence number received from the router doesn't
>>>> seem to change. All the authentication requests received have the same
>>>> number (256). Another peculiar thing is that in the failed case the SN
>>>> we send starts with 0.
>>>>
>>>> I suppose since the authentication ack never arrives, the next steps
>>>> are never completed.
>>>>
>>>> Does that help?
>>> From the supplicant logs, we have successfully received the
>>> authentication response and sent out the association request. So are
>>> you referring to not receiving an ACK for the association request?
>>
>> No, from the capture there's no association request in the bad case,
>> only in the good one.
>>
>>> Could you get a capture without any filters?
>>
>> http://people.freedesktop.org/~felipec/wpa/wpa-bad.pcapng
>> http://people.freedesktop.org/~felipec/wpa/wpa-good.pcapng
>>
>
> From the logs, we can see that we have received the authentication
> response, so is the association request getting dropped somewhere? We
> might need the mac80211 and iwlwifi trace-cmd logs to check for the drop.
>
> http://wireless.kernel.org/en/developers/Documentation/mac80211/tracing
There you go:
http://people.freedesktop.org/~felipec/wpa/trace.dat.xz
--
Felipe Contreras
On Tue, Oct 29, 2013 at 3:02 AM, Felipe Contreras
<[email protected]> wrote:
> On Mon, Oct 28, 2013 at 2:44 PM, Krishna Chaitanya
> <[email protected]> wrote:
>> On Tue, Oct 29, 2013 at 1:36 AM, Felipe Contreras
>> <[email protected]> wrote:
>>> On Mon, Oct 28, 2013 at 1:27 PM, Krishna Chaitanya
>>> <[email protected]> wrote:
>>
>>>> From the logs, we can see that we have received the authentication
>>>> response, so is the association request getting dropped somewhere? We
>>>> might need the mac80211 and iwlwifi trace-cmd logs to check for the drop.
>>>>
>>>> http://wireless.kernel.org/en/developers/Documentation/mac80211/tracing
>>>
>>> There you go:
>>> http://people.freedesktop.org/~felipec/wpa/trace.dat.xz
>>>
>> Hmm.."trace-cmd report -i trace.dat" returned lots of errors, i have even
>> tried with the trace-cmd from git (ubuntu). Did it worked fro you?
>
> Yes, but maybe I overwrote the file. I've pushed a new one. The
> SHA-1 is 36c260d8d8c171a24eb1aa7b2ea736b06c9b55b7.
>
Thanks, I'm able to decode it now. I am not familiar with the
iwlwifi code, but let me give it a try.
--
Thanks,
Regards,
Chaitanya T K.
On 28.10.2013 07:24, Felipe Contreras wrote:
> Hi,
>
> I already reported this problem [1] and got no feedback whatsoever.
> The issue keeps happening and I've tried many things.
>
> First of all, when Linux is failing, my phone connects fine, other
> computers connect fine, and this machine with Windows 7 as well. I've
> tried reloading the driver, rebooting, different module parameters,
> nothing works.
>
> I logged a session [2] where I waited 30 minutes for the link to come
> up, but it never did. You can see the same thing happening over and
> over:
>
> 1382936199.582861: nl80211: Association request send successfully
> 1382936199.797063: nl80211: Event message available
> 1382936199.797097: nl80211: Delete station e0:1d:3b:46:82:a0
> 1382936199.797881: nl80211: Event message available
> 1382936199.797905: nl80211: MLME event 38; timeout with e0:1d:3b:46:82:a0
> 1382936199.797915: wlan0: Event ASSOC_TIMED_OUT (15) received
> 1382936199.797920: wlan0: SME: Association timed out
> 1382936199.797924: Added BSSID e0:1d:3b:46:82:a0 into blacklist
>
> If I reboot the router, it works immediately. Another thing that works
> is connecting with ad-hoc mode (mode=1), and then switching back to
> normal mode (mode=0).
>
> Here's a log [3] where I tried reloading the module multiple times and
> finally tried ad-hoc mode, after which the link came up.
>
> Clearly the Linux kernel has a bug. Can somebody point out what needs
> to be done to get this fixed?
>
> Cheers.
>
> [1] http://article.gmane.org/gmane.linux.kernel.wireless.general/108004
> [2] http://people.freedesktop.org/~felipec/wpa/wpa-bad-30-min-wait.log
> [3] http://people.freedesktop.org/~felipec/wpa/wpa-good-nothing-worked-except-mode-switch.log
>
Heh... these logs look like a miracle :)
My first assumption would be a buggy router. There is no response in the
wpa_supplicant log.
Take Wireshark and capture a working and a non-working association request.
--
Regards,
Oleksij
On Mon, Oct 28, 2013 at 2:44 PM, Krishna Chaitanya
<[email protected]> wrote:
> On Tue, Oct 29, 2013 at 1:36 AM, Felipe Contreras
> <[email protected]> wrote:
>> On Mon, Oct 28, 2013 at 1:27 PM, Krishna Chaitanya
>> <[email protected]> wrote:
>
>>> From the logs, we can see that we have received the authentication
>>> response, so is the association request getting dropped somewhere? We
>>> might need the mac80211 and iwlwifi trace-cmd logs to check for the drop.
>>>
>>> http://wireless.kernel.org/en/developers/Documentation/mac80211/tracing
>>
>> There you go:
>> http://people.freedesktop.org/~felipec/wpa/trace.dat.xz
>>
> Hmm.."trace-cmd report -i trace.dat" returned lots of errors, i have even
> tried with the trace-cmd from git (ubuntu). Did it worked fro you?
Yes, but maybe I overwrote the file. I've pushed a new one. The
SHA-1 is 36c260d8d8c171a24eb1aa7b2ea736b06c9b55b7.
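Before running `trace-cmd report`, the download can be checked against that SHA-1. The thread doesn't say whether the hash is of trace.dat.xz or of the decompressed trace.dat, so which file to check is an assumption. A minimal sketch (verify_sha1 is a hypothetical helper built on coreutils sha1sum):

```shell
# Succeed only if the file's SHA-1 digest equals the expected hex string.
# Reading via stdin avoids sha1sum's escaping of unusual file names; if the
# file is missing, the digest is empty and the comparison fails.
verify_sha1() {
    file=$1 expected=$2
    actual=$(sha1sum < "$file" | cut -d' ' -f1)
    [ "$actual" = "$expected" ]
}
```

Usage: `verify_sha1 trace.dat.xz 36c260d8d8c171a24eb1aa7b2ea736b06c9b55b7 && xz -dk trace.dat.xz && trace-cmd report -i trace.dat`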
--
Felipe Contreras
On Mon, Oct 28, 2013 at 2:01 PM, Oleksij Rempel <[email protected]> wrote:
>> I logged a session [2] where I waited 30 minutes for the link to come
>> up, but it never did. You can see the same thing happening over and
>> over:
>>
>> 1382936199.582861: nl80211: Association request send successfully
>> 1382936199.797063: nl80211: Event message available
>> 1382936199.797097: nl80211: Delete station e0:1d:3b:46:82:a0
>> 1382936199.797881: nl80211: Event message available
>> 1382936199.797905: nl80211: MLME event 38; timeout with e0:1d:3b:46:82:a0
>> 1382936199.797915: wlan0: Event ASSOC_TIMED_OUT (15) received
>> 1382936199.797920: wlan0: SME: Association timed out
>> 1382936199.797924: Added BSSID e0:1d:3b:46:82:a0 into blacklist
>>
>> If I reboot the router, it works immediately. Another thing that works
>> is connecting with ad-hoc mode (mode=1), and then switching back to
>> normal mode (mode=0).
>>
Looks like there is "maximum clients" feature is enabled in the AP, most AP's
send deauth for Auth/Assoc Request but some AP's silently discard the
Assoc Request.
That explains why it works after reboot/adhoc mode. Try increasing the
number if thats the case.
On Mon, Oct 28, 2013 at 3:23 AM, Krishna Chaitanya
<[email protected]> wrote:
> On Mon, Oct 28, 2013 at 2:01 PM, Oleksij Rempel <[email protected]> wrote:
>>> I logged a session [2] where I waited 30 minutes for the link to come
>>> up, but it never did. You can see the same thing happening over and
>>> over:
>>>
>>> 1382936199.582861: nl80211: Association request send successfully
>>> 1382936199.797063: nl80211: Event message available
>>> 1382936199.797097: nl80211: Delete station e0:1d:3b:46:82:a0
>>> 1382936199.797881: nl80211: Event message available
>>> 1382936199.797905: nl80211: MLME event 38; timeout with e0:1d:3b:46:82:a0
>>> 1382936199.797915: wlan0: Event ASSOC_TIMED_OUT (15) received
>>> 1382936199.797920: wlan0: SME: Association timed out
>>> 1382936199.797924: Added BSSID e0:1d:3b:46:82:a0 into blacklist
>>>
>>> If I reboot the router, it works immediately. Another thing that works
>>> is connecting with ad-hoc mode (mode=1), and then switching back to
>>> normal mode (mode=0).
>>>
> Looks like there is "maximum clients" feature is enabled in the AP, most AP's
> send deauth for Auth/Assoc Request but some AP's silently discard the
> Assoc Request.
>
> That explains why it works after reboot/adhoc mode. Try increasing the
> number if thats the case.
Then why am I able to connect from my phone, other machines, and in
this machine with Windows 7?
--
Felipe Contreras
On 28.10.2013 16:44, Felipe Contreras wrote:
> On Mon, Oct 28, 2013 at 3:52 AM, Oleksij Rempel <[email protected]> wrote:
>> On 28.10.2013 10:38, Felipe Contreras wrote:
>>> On Mon, Oct 28, 2013 at 2:31 AM, Oleksij Rempel <[email protected]> wrote:
>
>>>> Heh... these logs look like a miracle :)
>>>> My first assumption would be a buggy router. There is no response in the
>>>> wpa_supplicant log.
>>>
>>> Yeah, I bet the router is buggy; which router isn't? But why does
>>> Windows 7 connect fine?
>>
>> Maybe it includes some workaround?
>
>> You do not need to fight with the devs; I think they agree that
>> something is wrong. But believe me, there are so many access points that
>> cause problems or do weird things. Just because you have a bug does not
>> mean other users have it.
>
> I understand the problems of making wireless drivers work on different
> kinds of access points, and I'm not fighting with devs, I'm just
> saying that if it works in all other devices the problem is most
> likely on this one.
>
> And BTW, the devices we are talking about are very varied: Wii U,
> Windows 7, Windows 8.1, Nokia N9, Android phone, iPhone, iMac, Amazon
> Kindle. Yet the only device that seems to have a problem is my Linux
> machine, I think it's pretty clear where the problem is.
Do any of the listed devices use mac80211? If not, you still have the following
options: wpa_supplicant, mac80211, the iwlwifi driver, the iwlwifi firmware. In
your arguments you didn't even try to eliminate any of them. So, no, it
is not clear where the problem is.
> And sure, it does not necessarily mean that other people have the
> same problem, but it is very unlikely that I'm the only one.
What is your hardware? Did you try disabling power save mode?
I have an "Intel Centrino Advanced-N 6235"; it uses the same driver and works
with three of my APs. I mention this just to show that a list of working
hardware is not helpful.
What is your AP? What mode is it running in? Does changing the settings of
this AP make any difference? (No, asking you to change your AP settings is
not a way to get rid of you.)
>> Besides, how many clients use this AP?
>
> Probably around a dozen.
>
>> What is the distance?
>
> Probably around 10m.
>
>> What do you configure in ad-hoc mode?
>
> Nothing, I don't think it even works, but it associates. I just add mode=1.
>
> network={
> ssid="AXTEL-XXX"
> proto=WPA2
> scan_ssid=1
> key_mgmt=WPA-PSK
> psk="XXX"
> mode=1
> }
>
>>>> Take Wireshark and capture a working and a non-working association request.
>>>
>>> I'll try that. If only it was that easy to get a working association.
>>
>> Compare it with Windows.
>
> Right, I forgot I can use wireshark in Windows.
>
>> And please read this; it will help you provide more information.
>> http://wireless.kernel.org/en/users/Documentation/Reporting_bugs
>
> I don't understand what exactly you need from that list. I'm using
> 3.11.6, but as I said in the original report, the kernel version makes
> no difference, even as far back as 3.6.0. I've put the dmesg log in
> the original mail and there's nothing of value there. I could try
> running with CONFIG_MAC80211_*_DEBUG stuff enabled, but do you really
> think that will help?
>
> If you really must know, this is exactly what I'm running:
>
> config=$(mktemp)
> pid="/run/wpa_supplicant_wlan0.pid"
>
> cat > "$config" <<-EOF
> country=MX
> ap_scan=1
> eapol_version=2
> device_name=Nysa
> device_type=1-0050F204-1
>
> network={
> ssid="AXTEL-7111"
> proto=WPA2
> scan_ssid=1
> key_mgmt=WPA-PSK
> psk="C3657111"
> mode=0
> }
> EOF
>
> wpa_supplicant -B -t -d -P "$pid" -i wlan0 -D nl80211,wext -c "$config" \
>   -f /tmp/wpa.log
>
> rm -f "$config"
>
> I tried with wext, but that doesn't work at all. A curious fact is
> that I need to enable CONFIG_CFG80211_WEXT=y for AP scanning to work,
> even though I'm not using the wext driver.
>
> I will try to get the wireshark logs.
>
--
Regards,
Oleksij
On Mon, Oct 28, 2013 at 3:52 AM, Oleksij Rempel <[email protected]> wrote:
> On 28.10.2013 10:38, Felipe Contreras wrote:
>> On Mon, Oct 28, 2013 at 2:31 AM, Oleksij Rempel <[email protected]> wrote:
>>> Heh... these logs look like a miracle :)
>>> My first assumption would be a buggy router. There is no response in the
>>> wpa_supplicant log.
>>
>> Yeah, I bet the router is buggy; which router isn't? But why does
>> Windows 7 connect fine?
>
> Maybe it includes some workaround?
> You do not need to fight with the devs; I think they agree that
> something is wrong. But believe me, there are so many access points that
> cause problems or do weird things. Just because you have a bug does not
> mean other users have it.
I understand the problems of making wireless drivers work on different
kinds of access points, and I'm not fighting with devs, I'm just
saying that if it works in all other devices the problem is most
likely on this one.
And BTW, the devices we are talking about are very varied: Wii U,
Windows 7, Windows 8.1, Nokia N9, Android phone, iPhone, iMac, Amazon
Kindle. Yet the only device that seems to have a problem is my Linux
machine, I think it's pretty clear where the problem is.
And sure, it does not necessarily mean that other people have the
same problem, but it is very unlikely that I'm the only one.
> Besides, how many clients use this AP?
Probably around a dozen.
> What is the distance?
Probably around 10m.
> What do you configure in ad-hoc mode?
Nothing, I don't think it even works, but it associates. I just add mode=1.
network={
ssid="AXTEL-XXX"
proto=WPA2
scan_ssid=1
key_mgmt=WPA-PSK
psk="XXX"
mode=1
}
>>> Take Wireshark and capture a working and a non-working association request.
>>
>> I'll try that. If only it was that easy to get a working association.
>
> Compare it with Windows.
Right, I forgot I can use wireshark in Windows.
> And please read this; it will help you provide more information.
> http://wireless.kernel.org/en/users/Documentation/Reporting_bugs
I don't understand what exactly you need from that list. I'm using
3.11.6, but as I said in the original report, the kernel version makes
no difference, even as far back as 3.6.0. I've put the dmesg log in
the original mail and there's nothing of value there. I could try
running with CONFIG_MAC80211_*_DEBUG stuff enabled, but do you really
think that will help?
If you really must know, this is exactly what I'm running:
config=$(mktemp)
pid="/run/wpa_supplicant_wlan0.pid"
cat > "$config" <<-EOF
country=MX
ap_scan=1
eapol_version=2
device_name=Nysa
device_type=1-0050F204-1
network={
ssid="AXTEL-7111"
proto=WPA2
scan_ssid=1
key_mgmt=WPA-PSK
psk="C3657111"
mode=0
}
EOF
wpa_supplicant -B -t -d -P "$pid" -i wlan0 -D nl80211,wext -c "$config" \
  -f /tmp/wpa.log
rm -f "$config"
I tried with wext, but that doesn't work at all. A curious fact is
that I need to enable CONFIG_CFG80211_WEXT=y for AP scanning to work,
even though I'm not using the wext driver.
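Whether CONFIG_CFG80211_WEXT is set in the running kernel can be checked from its configuration. A minimal sketch, assuming the kernel exposes its config as /proc/config.gz (which requires CONFIG_IKCONFIG_PROC; many distributions ship /boot/config-$(uname -r) instead); kconfig_get is a hypothetical name:

```shell
# Print the value of a kernel config option from a config file, handling
# gzip-compressed files such as /proc/config.gz. Prints "n" when the file
# says "# OPTION is not set", and nothing when the option is absent.
kconfig_get() {
    opt=$1 file=${2:-/proc/config.gz}
    case $file in
        *.gz) gzip -dc "$file" ;;
        *) cat "$file" ;;
    esac | sed -n "s/^$opt=//p; s/^# $opt is not set\$/n/p"
}
```

For example, `kconfig_get CONFIG_CFG80211_WEXT` prints y, m, or n for the running kernel, and `kconfig_get CONFIG_CFG80211_WEXT /boot/config-$(uname -r)` reads a plain-text config instead.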
I will try to get the wireshark logs.
--
Felipe Contreras
On Mon, Oct 28, 2013 at 1:27 PM, Krishna Chaitanya
<[email protected]> wrote:
> Another side point: all the beacons from that AP are malformed, but
> probe responses are fine. Weird?
Also, here's a message I get sometimes. I don't think it's related to
the issue at hand, but who knows:
Oct 28 14:28:47 nysa kernel: iwlwifi 0000:02:00.0: fail to flush all tx fifo queues Q 11
Oct 28 14:28:47 nysa kernel: iwlwifi 0000:02:00.0: Current SW read_ptr 137 write_ptr 154
Oct 28 14:28:47 nysa kernel: iwl data: 00000000: 00 fe ff 01 00 00 00 00 00 00 00 00 00 00 00 00  ................
Oct 28 14:28:47 nysa kernel: iwlwifi 0000:02:00.0: FH TRBs(0) = 0x00000000
Oct 28 14:28:47 nysa kernel: iwlwifi 0000:02:00.0: FH TRBs(1) = 0xc010b098
Oct 28 14:28:47 nysa kernel: iwlwifi 0000:02:00.0: FH TRBs(2) = 0x00000000
Oct 28 14:28:47 nysa kernel: iwlwifi 0000:02:00.0: FH TRBs(3) = 0x803000e3
Oct 28 14:28:47 nysa kernel: iwlwifi 0000:02:00.0: FH TRBs(4) = 0x00000000
Oct 28 14:28:47 nysa kernel: iwlwifi 0000:02:00.0: FH TRBs(5) = 0x00000000
Oct 28 14:28:47 nysa kernel: iwlwifi 0000:02:00.0: FH TRBs(6) = 0x00000000
Oct 28 14:28:47 nysa kernel: iwlwifi 0000:02:00.0: FH TRBs(7) = 0x00709041
Oct 28 14:28:47 nysa kernel: iwlwifi 0000:02:00.0: Q 0 is active and mapped to fifo 3 ra_tid 0x0000 [228,228]
Oct 28 14:28:47 nysa kernel: iwlwifi 0000:02:00.0: Q 1 is active and mapped to fifo 2 ra_tid 0x0000 [0,0]
Oct 28 14:28:47 nysa kernel: iwlwifi 0000:02:00.0: Q 2 is active and mapped to fifo 1 ra_tid 0x0000 [105,105]
Oct 28 14:28:47 nysa kernel: iwlwifi 0000:02:00.0: Q 3 is active and mapped to fifo 0 ra_tid 0x0000 [0,0]
Oct 28 14:28:47 nysa kernel: iwlwifi 0000:02:00.0: Q 4 is active and mapped to fifo 0 ra_tid 0x0000 [0,0]
Oct 28 14:28:47 nysa kernel: iwlwifi 0000:02:00.0: Q 5 is active and mapped to fifo 4 ra_tid 0x0000 [0,0]
Oct 28 14:28:47 nysa kernel: iwlwifi 0000:02:00.0: Q 6 is active and mapped to fifo 2 ra_tid 0x0000 [0,0]
Oct 28 14:28:47 nysa kernel: iwlwifi 0000:02:00.0: Q 7 is active and mapped to fifo 5 ra_tid 0x0000 [0,0]
Oct 28 14:28:47 nysa kernel: iwlwifi 0000:02:00.0: Q 8 is active and mapped to fifo 4 ra_tid 0x0000 [0,0]
Oct 28 14:28:47 nysa kernel: iwlwifi 0000:02:00.0: Q 9 is active and mapped to fifo 7 ra_tid 0x0000 [66,66]
Oct 28 14:28:47 nysa kernel: iwlwifi 0000:02:00.0: Q 10 is active and mapped to fifo 5 ra_tid 0x0000 [0,0]
Oct 28 14:28:47 nysa kernel: iwlwifi 0000:02:00.0: Q 11 is active and mapped to fifo 1 ra_tid 0x0000 [137,154]
Oct 28 14:28:47 nysa kernel: iwlwifi 0000:02:00.0: Q 12 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
Oct 28 14:28:47 nysa kernel: iwlwifi 0000:02:00.0: Q 13 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
Oct 28 14:28:47 nysa kernel: iwlwifi 0000:02:00.0: Q 14 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
Oct 28 14:28:47 nysa kernel: iwlwifi 0000:02:00.0: Q 15 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
Oct 28 14:28:47 nysa kernel: iwlwifi 0000:02:00.0: Q 16 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
Oct 28 14:28:47 nysa kernel: iwlwifi 0000:02:00.0: Q 17 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
Oct 28 14:28:47 nysa kernel: iwlwifi 0000:02:00.0: Q 18 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
Oct 28 14:28:47 nysa kernel: iwlwifi 0000:02:00.0: Q 19 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
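In this dump, the bracketed pair on each queue line appears to be the software read and write pointers: Q 11 shows [137,154], matching the "read_ptr 137 write_ptr 154" line above. That reading is an inference from this single dump, not something the driver documents here. Under that assumption, a sketch that lists the active queues whose pointers differ, i.e. that still hold frames (stuck_tx_queues is a hypothetical name):

```shell
# Read kernel log lines on stdin and print each active iwlwifi TX queue
# whose bracketed [read,write] pointers differ (assumed: unsent frames).
stuck_tx_queues() {
    sed -n 's/.*: Q \([0-9]*\) is active and mapped to fifo \([0-9]*\) ra_tid [^ ]* \[\([0-9]*\),\([0-9]*\)].*/\1 \2 \3 \4/p' |
    while read -r q fifo rd wr; do
        if [ "$rd" != "$wr" ]; then
            echo "Q$q fifo=$fifo read_ptr=$rd write_ptr=$wr"
        fi
    done
}
```

For example, `dmesg | stuck_tx_queues` would print one line per such queue; for the dump above, only Q11.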
--
Felipe Contreras
On Mon, Oct 28, 2013 at 12:28 PM, Krishna Chaitanya
<[email protected]> wrote:
> On Mon, Oct 28, 2013 at 11:30 PM, Felipe Contreras
> <[email protected]> wrote:
>
>> The authentication response does come back in both cases though, it's
>> just the acknowledgement that is missing. Unfortunately I cannot
>> figure out for which message it's the ack.
>>
>> Also, I notice the sequence number received from the router doesn't
>> seem to change. All the authentication requests received have the same
>> number (256). Another peculiar thing is that in the failed case the SN
>> we send starts with 0.
>>
>> I suppose since the authentication ack never arrives, the next steps
>> are never completed.
>>
>> Does that help?
> From the supplicant logs, we have successfully received the
> authentication response and sent out the association request. So are
> you referring to not receiving an ACK for the association request?
No, from the capture there's no association request in the bad case,
only in the good one.
> Could you get a capture without any filters?
http://people.freedesktop.org/~felipec/wpa/wpa-bad.pcapng
http://people.freedesktop.org/~felipec/wpa/wpa-good.pcapng
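To check captures like these for the frames under discussion, a Wireshark display filter on wlan.fc.type_subtype can isolate each management frame type. Management frames have type 0, so the value is just the subtype number (association request 0, association response 1, probe response 5, beacon 8, authentication 11, deauthentication 12). A minimal sketch of a helper that builds those filters (mgmt_filter is a hypothetical name):

```shell
# Map a management frame name to a Wireshark display filter expression.
mgmt_filter() {
    case $1 in
        assoc_req)  echo 'wlan.fc.type_subtype == 0x00' ;;
        assoc_resp) echo 'wlan.fc.type_subtype == 0x01' ;;
        probe_resp) echo 'wlan.fc.type_subtype == 0x05' ;;
        beacon)     echo 'wlan.fc.type_subtype == 0x08' ;;
        auth)       echo 'wlan.fc.type_subtype == 0x0b' ;;
        deauth)     echo 'wlan.fc.type_subtype == 0x0c' ;;
        *) echo "mgmt_filter: unknown frame '$1'" >&2; return 1 ;;
    esac
}
```

For example, `tshark -r wpa-bad.pcapng -Y "$(mgmt_filter assoc_req)"` lists the association requests in the bad capture, and combining the beacon filter with `&& _ws.malformed` would show the malformed beacons mentioned in this thread.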
--
Felipe Contreras
On Tue, Oct 29, 2013 at 1:36 AM, Felipe Contreras
<[email protected]> wrote:
> On Mon, Oct 28, 2013 at 1:27 PM, Krishna Chaitanya
> <[email protected]> wrote:
>> From the logs, we can see that we have received the authentication
>> response, so is the association request getting dropped somewhere? We
>> might need the mac80211 and iwlwifi trace-cmd logs to check for the drop.
>>
>> http://wireless.kernel.org/en/developers/Documentation/mac80211/tracing
>
> There you go:
> http://people.freedesktop.org/~felipec/wpa/trace.dat.xz
>
Hmm.."trace-cmd report -i trace.dat" returned lots of errors, i have even
tried with the trace-cmd from git (ubuntu). Did it worked fro you?
===
# trace-cmd report -i trace.dat
version = 6
trace-cmd: No such file or directory ??????????????
function scsi_trace_parse_cdb not defined
function scsi_trace_parse_cdb not defined
function scsi_trace_parse_cdb not defined
function scsi_trace_parse_cdb not defined
function is_writable_pte not defined
function __le16_to_cpup not defined
function __le16_to_cpup not defined
function __le16_to_cpup not defined
CPU 0 is empty
CPU 1 is empty
CPU 2 is empty
CPU 3 is empty
cpus=4
--
Thanks,
Regards,
Chaitanya T K.
Mobile:+91-9963910010.
On 28.10.2013 10:38, Felipe Contreras wrote:
> On Mon, Oct 28, 2013 at 2:31 AM, Oleksij Rempel <[email protected]> wrote:
>> On 28.10.2013 07:24, Felipe Contreras wrote:
>>> Hi,
>>>
>>> I already reported this problem [1] and got no feedback whatsoever.
>>> The issue keeps happening and I've tried many things.
>>>
>>> First of all, when Linux is failing, my phone connects fine, other
>>> computers connect fine, and this machine with Windows 7 as well. I've
>>> tried reloading the driver, rebooting, different module parameters,
>>> nothing works.
>>>
>>> I logged a session [2] where I waited 30 minutes for the link to come
>>> up, but it never did. You can see the same thing happening over and
>>> over:
>>>
>>> 1382936199.582861: nl80211: Association request send successfully
>>> 1382936199.797063: nl80211: Event message available
>>> 1382936199.797097: nl80211: Delete station e0:1d:3b:46:82:a0
>>> 1382936199.797881: nl80211: Event message available
>>> 1382936199.797905: nl80211: MLME event 38; timeout with e0:1d:3b:46:82:a0
>>> 1382936199.797915: wlan0: Event ASSOC_TIMED_OUT (15) received
>>> 1382936199.797920: wlan0: SME: Association timed out
>>> 1382936199.797924: Added BSSID e0:1d:3b:46:82:a0 into blacklist
>>>
>>> If I reboot the router, it works immediately. Another thing that works
>>> is connecting with ad-hoc mode (mode=1), and then switching back to
>>> normal mode (mode=0).
>>>
>>> Here's a log [3] where I tried reloading the module multiple times and
>>> finally tried ad-hoc mode, after which the link came up.
>>>
>>> Clearly the Linux kernel has a bug. Can somebody point out what needs
>>> to be done to get this fixed?
>>>
>>> Cheers.
>>>
>>> [1] http://article.gmane.org/gmane.linux.kernel.wireless.general/108004
>>> [2] http://people.freedesktop.org/~felipec/wpa/wpa-bad-30-min-wait.log
>>> [3] http://people.freedesktop.org/~felipec/wpa/wpa-good-nothing-worked-except-mode-switch.log
>>>
>>
>> Heh... these logs look like a miracle :)
>> My first assumption would be a buggy router. There is no response in the
>> wpa_supplicant log.
>
> Yeah, I bet the router is buggy; which router isn't? But why does
> Windows 7 connect fine?
Maybe it includes some workaround?
You do not need to fight with the devs; I think they agree that something
is wrong. But believe me, there are so many access points which cause
problems or do weird things. Just because you have a bug does not mean
other users have it.
If it depends on some AP option, it would be good to know which one.
Then it will be easier to find a fix.
Besides, how many clients use this AP? How big is the distance? What do
you configure in ad-hoc mode?
>> Take Wireshark and capture working and non-working association requests.
>
> I'll try that. If only it was that easy to get a working association.
Compare it with Windows.
And please read this; it will help you provide more information:
http://wireless.kernel.org/en/users/Documentation/Reporting_bugs
--
Regards,
Oleksij
On Mon, Oct 28, 2013 at 2:31 AM, Oleksij Rempel <[email protected]> wrote:
> On 28.10.2013 07:24, Felipe Contreras wrote:
>> Hi,
>>
>> I already reported this problem [1] and I got no feedback whatsoever
>> at all. The issue keeps happening and I've tried many things.
>>
>> First of all, when Linux is failing, my phone connects fine, other
>> computers connect fine, and this machine with Windows 7 as well. I've
>> tried reloading the driver, rebooting, different module parameters,
>> nothing works.
>>
>> I logged a session [2] where I waited 30 minutes for the link to come
>> up, but it never did. You can see the same thing happening over and
>> over:
>>
>> 1382936199.582861: nl80211: Association request send successfully
>> 1382936199.797063: nl80211: Event message available
>> 1382936199.797097: nl80211: Delete station e0:1d:3b:46:82:a0
>> 1382936199.797881: nl80211: Event message available
>> 1382936199.797905: nl80211: MLME event 38; timeout with e0:1d:3b:46:82:a0
>> 1382936199.797915: wlan0: Event ASSOC_TIMED_OUT (15) received
>> 1382936199.797920: wlan0: SME: Association timed out
>> 1382936199.797924: Added BSSID e0:1d:3b:46:82:a0 into blacklist
>>
>> If I reboot the router, it works immediately, another thing that works
>> is connecting with ad-hoc mode (mode=1), and then back to normal mode
>> (mode=0).
>>
>> Here's a log [3] where I tried reloading the module multiple times and
>> finally tried ad-hoc mode, after which the link came up.
>>
>> Clearly the Linux kernel has a bug. Can somebody point out what needs
>> to be done to get this fixed?
>>
>> Cheers.
>>
>> [1] http://article.gmane.org/gmane.linux.kernel.wireless.general/108004
>> [2] http://people.freedesktop.org/~felipec/wpa/wpa-bad-30-min-wait.log
>> [3] http://people.freedesktop.org/~felipec/wpa/wpa-good-nothing-worked-except-mode-switch.log
>>
>
> Heh... these logs look like a miracle :)
> My first assumption would be buggy router. There is no answer in
> wpa_supplicant log.
Yeah, I bet the router is buggy; which router isn't? But why does
Windows 7 connect fine?
> Take Wireshark and capture working and non-working association requests.
I'll try that. If only it was that easy to get a working association.
--
Felipe Contreras
On Sat, Nov 9, 2013 at 7:10 AM, Krishna Chaitanya
<[email protected]> wrote:
> On Sat, Nov 9, 2013 at 2:22 AM, Felipe Contreras
> <[email protected]> wrote:
>> On Fri, Nov 8, 2013 at 2:30 PM, Krishna Chaitanya
>> <[email protected]> wrote:
>>>> But we are receiving 0 beacons, waiting for more than 1 won't help.
>>>> BTW, why NEED_DTIM_BEFORE_ASSOC if the device doesn't *need* the DTIM
>>>> before the association?
>>>>
> This is not just for your case but rather a generic note. Regarding
> the flag, even I am not too sure, but I guess some hardware needs to
> know the DTIM to set the wakeup schedule after the association?
But not this hardware? Because everything works fine.
>>> Oops... you just missed it. Right after your print there is a check to
>>> drop frames with a bad CRC :-).
>>
>> That's why I put the print before that check. Since I don't see the
>> print, that means the check was never executed. iwlagn_rx_reply_rx()
>> was never called for the beacon frame.
>>
> Ok. So when we disable advertising of that flag in the driver you said things
> are working fine.
Yes, everything works perfectly.
> So in that scenario after the connection are you
> seeing the beacons?
No, there are no beacons ever, at least from this AP.
It seems to me that all the beacon frames are dropped by the firmware
before they are passed to the driver. The driver never gets them, so it
cannot parse them and do something sensible even though they are
corrupted.
> I just want to understand whether the problem is throughout or just
> before association. If the driver itself is not getting the beacons,
> then our debugging ends there; someone from Intel should guide you
> through the FW debugging.
Not really, part of the debugging ends there, but we can still do something.
What is the meaning of NEED_DTIM_BEFORE_ASSOC, if the driver doesn't
*need* this? Why fail the association completely, if we don't need to?
Also, I realized that after rebooting the router the beacon frames are
no longer corrupted, so it's a compound problem. Yet even in the
corrupted case the driver can work just fine, if only it didn't
*require* the DTIM unnecessarily; apparently all other hardware, and
even other OSes on this same hardware, get by without it.
Cheers.
--
Felipe Contreras
On Sat, Nov 2, 2013 at 11:16 AM, Felipe Contreras
<[email protected]> wrote:
>
> On Tue, Oct 29, 2013 at 8:23 AM, Krishna Chaitanya
> <[email protected]> wrote:
> > On Tue, Oct 29, 2013 at 3:02 AM, Felipe Contreras
> > <[email protected]> wrote:
>
> >> Yes, but maybe I overwrote the file. I've pushed a new one again. The
> >> sha-1 is 36c260d8d8c171a24eb1aa7b2ea736b06c9b55b7.
> >>
> > Thanks, able to decode now. I am not familiar with the
> > iwlwifi code, but let me give it a try.
>
> Did you find anything?
Not much. I could see the pattern below repeating:
wpa_supplicant-6915 [003] 5610.786592: drv_return_void: phy1
wpa_supplicant-6915 [003] 5610.786594: drv_sta_state: phy1 vif:wlan0(2) sta:e0:1d:3b:46:82:a0 state: 0->1
wpa_supplicant-6915 [003] 5610.786597: iwlwifi_dev_hcmd: [0000:02:00.0] hcmd 0x18 (sync)
irq/44-iwlwifi-6803 [001] 5610.786632: iwlwifi_dev_rx: [0000:02:00.0] RX cmd 0x18
wpa_supplicant-6915 [003] 5610.786639: drv_return_int: phy1 - 0
wpa_supplicant-6915 [003] 5610.786645: drv_mgd_prepare_tx: phy1 vif:wlan0(2)
wpa_supplicant-6915 [003] 5610.786645: drv_return_void: phy1
wpa_supplicant-6915 [003] 5610.786663: iwlwifi_dev_tx: [0000:02:00.0] TX 1c (90 bytes)
irq/44-iwlwifi-6803 [001] 5610.788677: iwlwifi_dev_rx: [0000:02:00.0] RX cmd 0x1c
irq/44-iwlwifi-6803 [001] 5610.788709: iwlwifi_dev_rx: [0000:02:00.0] RX cmd 0xc0
irq/44-iwlwifi-6803 [001] 5610.788710: iwlwifi_dev_rx: [0000:02:00.0] RX cmd 0xc1
kworker/u8:0-1459 [003] 5610.788730: drv_sta_state: phy1 vif:wlan0(2) sta:e0:1d:3b:46:82:a0 state: 1->2
kworker/u8:0-1459 [003] 5610.788732: drv_return_int: phy1 - 0
kworker/u8:0-1459 [003] 5610.999549: drv_sta_state: phy1 vif:wlan0(2) sta:e0:1d:3b:46:82:a0 state: 2->1
kworker/u8:0-1459 [003] 5610.999553: drv_return_int: phy1 - 0
kworker/u8:0-1459 [003] 5610.999553: drv_sta_state: phy1 vif:wlan0(2) sta:e0:1d:3b:46:82:a0 state: 1->0
kworker/u8:0-1459 [003] 5610.999554: drv_return_int: phy1 - 0
So we move from NONE to AUTH and then from AUTH back to NONE.
iwlwifi_dev_tx is transmitting 90-byte packets, but the auth request is
only 43 bytes; could that be the association request? If so, where is it
getting dropped?
My guess is that we are sending the association request (CMD_TX), but it
is never seen over the air; it's lost somewhere in between.
The ball goes to the FW :-).
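To make the drv_sta_state lines above easier to read, here is a small C sketch that decodes the "state: A->B" numbers into names and counts AUTH -> NONE teardowns. It assumes the numbers follow the order of mac80211's enum ieee80211_sta_state (NOTEXIST, NONE, AUTH, ASSOC, AUTHORIZED); the enum and helper names are hypothetical, and the parser only looks at the "state: " suffix, not the rest of the trace format.

```c
#include <stdio.h>
#include <string.h>

/* Local mirror (hypothetical names) of the numbering of mac80211's
 * enum ieee80211_sta_state, as printed by drv_sta_state "state: A->B". */
enum sta_state { ST_NOTEXIST, ST_NONE, ST_AUTH, ST_ASSOC, ST_AUTHORIZED };

static const char *sta_state_name(int state)
{
	static const char *const names[] = {
		"NOTEXIST", "NONE", "AUTH", "ASSOC", "AUTHORIZED",
	};

	if (state < ST_NOTEXIST || state > ST_AUTHORIZED)
		return "UNKNOWN";
	return names[state];
}

/* Extract A and B from the "state: A->B" part of one trace line.
 * Returns 0 on success, -1 if the line carries no state transition. */
static int parse_sta_transition(const char *line, int *from, int *to)
{
	const char *p = strstr(line, "state: ");

	if (p == NULL || sscanf(p, "state: %d->%d", from, to) != 2)
		return -1;
	return 0;
}

/* Print every transition by name and count AUTH -> NONE steps, i.e.
 * stations that were authenticated but torn down before reaching ASSOC. */
static int count_auth_teardowns(const char *const *lines, size_t n)
{
	int teardowns = 0;

	for (size_t i = 0; i < n; i++) {
		int from, to;

		if (parse_sta_transition(lines[i], &from, &to) != 0)
			continue;
		printf("%s -> %s\n", sta_state_name(from), sta_state_name(to));
		if (from == ST_AUTH && to == ST_NONE)
			teardowns++;
	}
	return teardowns;
}
```

Fed the four drv_sta_state lines from the trace, it prints NOTEXIST -> NONE, NONE -> AUTH, AUTH -> NONE, NONE -> NOTEXIST and counts one teardown, i.e. the station never reaches ASSOC.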
On Fri, Nov 8, 2013 at 2:30 PM, Krishna Chaitanya
<[email protected]> wrote:
> On Fri, Nov 8, 2013 at 7:48 PM, Felipe Contreras
> <[email protected]> wrote:
>> On Fri, Nov 8, 2013 at 8:06 AM, Krishna Chaitanya
>> <[email protected]> wrote:
>>> On Fri, Nov 8, 2013 at 6:44 PM, Felipe Contreras
>>> <[email protected]> wrote:
>>>> On Fri, Nov 8, 2013 at 2:35 AM, Felipe Contreras
>>>> <[email protected]> wrote:
>>>>> On Sat, Nov 2, 2013 at 2:05 PM, Krishna Chaitanya
>>>>> <[email protected]> wrote:
>>>>>
>>>>>> Also, one more thing: you said the N900 uses mac80211 and has no issues,
>>>>>> but as it's an embedded device it might be running an older kernel where
>>>>>> the handling might be different. So we need to try with the same kernel
>>>>>> you are facing the issue with and a driver which advertises
>>>>>> IEEE80211_HW_NEED_DTIM_BEFORE_ASSOC.
>>>>>
>>>>> Yes it was running an older kernel, but I just compiled v3.12 and ran
>>>>> it on the N900, and still everything works fine.
>>>>>
>>>>>> Or, if you have a compilation environment, try commenting out the
>>>>>> advertisement of IEEE80211_HW_NEED_DTIM_BEFORE_ASSOC in the iwlwifi DVM
>>>>>> driver and try to reproduce the issue.
>>>>>
>>>>> After commenting that flag everything works fine :)
>>>
>>> Oh, great. That was just to corner the problem: it means we are not getting
>>> the required beacon before the association. But we only wait for one beacon
>>> here; maybe we need to wait for some number of beacons before giving up on
>>> the association??
>>>
>>> Johannes??
>>
>> But we are receiving 0 beacons, waiting for more than 1 won't help.
>> BTW, why NEED_DTIM_BEFORE_ASSOC if the device doesn't *need* the DTIM
>> before the association?
>>
>>>>> What are the next steps?
>>>>
>>>> I tried to add some debugging to see what's going on, and indeed the
>>>> beacon packets are lost, I added debugging as low in the chain as I
>>>> could (iwlagn_rx_reply_rx()), and I don't see them there. However,
>>>> when I enable the monitor mode, I see them. What's going on?
>>>
>>> In the captures you shared all the beacons are malformed, so
>>> probably they failed the CRC check. iwlwifi drops all the CRC-failed
>>> packets (both MVM and DVM).
>>
>> Before iwlagn_rx_reply_rx()?
>>
>>> Not sure how you are receiving the beacons in the monitor mode.
>>
>> I don't know what kismet does, but I can see my debugging is printing them.
>>
>>> BTW, did you try capturing the beacons with other devices to see if they
>>> are really malformed, or is it just iwlwifi interpreting them wrongly?
>>
>> I haven't managed to do that yet.
>>
>> This is what I'm doing:
>>
>> --- a/drivers/net/wireless/iwlwifi/dvm/rx.c
>> +++ b/drivers/net/wireless/iwlwifi/dvm/rx.c
>> @@ -919,6 +919,11 @@ static int iwlagn_rx_reply_rx(struct iwl_priv *priv,
>>  	ampdu_status = iwlagn_translate_rx_status(priv,
>>  				le32_to_cpu(rx_pkt_status));
>>
>> +	if (ieee80211_is_beacon(header->frame_control)) {
>> +		print_hex_dump(KERN_INFO, "iwlwifi: dump: ", DUMP_PREFIX_OFFSET,
>> +				16, 1, header, len, true);
>> +	}
>> +
>>  	if ((unlikely(phy_res->cfg_phy_cnt > 20))) {
>>  		IWL_DEBUG_DROP(priv, "dsp size out of range [0,20]: %d\n",
>>  				phy_res->cfg_phy_cnt);
>>
> Oops... you just missed it. Right after your print there is a check to
> drop frames with a bad CRC :-).
That's why I put the print before that check. Since I don't see the
print, that means the check was never executed. iwlagn_rx_reply_rx()
was never called for the beacon frame.
--
Felipe Contreras
On Fri, Nov 8, 2013 at 2:35 AM, Felipe Contreras
<[email protected]> wrote:
> On Sat, Nov 2, 2013 at 2:05 PM, Krishna Chaitanya
> <[email protected]> wrote:
>
>> Also, one more thing: you said the N900 uses mac80211 and has no issues,
>> but as it's an embedded device it might be running an older kernel where
>> the handling might be different. So we need to try with the same kernel
>> you are facing the issue with and a driver which advertises
>> IEEE80211_HW_NEED_DTIM_BEFORE_ASSOC.
>
> Yes it was running an older kernel, but I just compiled v3.12 and ran
> it on the N900, and still everything works fine.
>
>> Or, if you have a compilation environment, try commenting out the
>> advertisement of IEEE80211_HW_NEED_DTIM_BEFORE_ASSOC in the iwlwifi DVM
>> driver and try to reproduce the issue.
>
> After commenting that flag everything works fine :)
>
> What are the next steps?
I tried to add some debugging to see what's going on, and indeed the
beacon packets are lost. I added debugging as low in the chain as I
could (iwlagn_rx_reply_rx()), and I don't see them there. However,
when I enable monitor mode, I see them. What's going on?
--
Felipe Contreras
On Sun, Nov 3, 2013 at 12:43 AM, Krishna Chaitanya
<[email protected]> wrote:
> On Sat, Nov 2, 2013 at 10:36 PM, Felipe Contreras
> <[email protected]> wrote:
>> Sorry for the crappy format (laptop is stuck).
>>
>> If the problem is the firmware why does it start working when I
>> restart the router?
>
> I was referring to the FW on iwlwifi, not the AP's. Maybe restarting the
> router triggered some kind of event, like beacon loss, which might have
> helped the STA to connect. Just guessing.
>
>> Plus, I should be seeing this problem with other APs.
>> And what could Windows be doing differently that doesn't exhibit this problem?
>
> Without knowing the root cause it's tough to comment on these questions :-)
Almost forgot: as mentioned previously, if we can figure out why all the
beacons are corrupted (but probes are fine), that might shed some light.
The corruption is in the Extended Supported Rates element and in the
length of the OBSS Scan Parameters element. Other stations might not have
a problem with that, but Linux might.
Can you also capture the beacons using some other wireless card,
just to make sure the corruption is not happening on the AP side?
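As an illustration of what "malformed" can mean here, a minimal C sketch of a bounds-checked walk over a beacon's tagged elements follows. It assumes the buffer starts after the 12 octets of fixed fields, uses element IDs 50 (Extended Supported Rates) and 74 (OBSS Scan Parameters, fixed 14-octet body) from IEEE 802.11, and only checks lengths; the function name is hypothetical and this is not the parser that mac80211 or the iwlwifi firmware actually uses.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Element IDs and lengths from IEEE 802.11. */
#define EID_EXT_SUPP_RATES   50
#define EID_OBSS_SCAN_PARAMS 74
#define OBSS_SCAN_PARAMS_LEN 14 /* seven 2-octet fields */

/* Walk the tagged elements of a beacon body, i.e. everything after the
 * 12 octets of fixed fields (timestamp, beacon interval, capability).
 * Returns true if the list is well formed. On failure *bad_id holds the
 * offending element ID, or -1 if an element header itself is cut off. */
static bool beacon_ies_ok(const uint8_t *ies, size_t len, int *bad_id)
{
	size_t pos = 0;

	while (pos < len) {
		if (len - pos < 2) {
			*bad_id = -1;           /* no room for ID + length */
			return false;
		}
		uint8_t id = ies[pos];
		uint8_t elen = ies[pos + 1];

		if ((size_t)elen > len - pos - 2 ||            /* body overruns frame */
		    (id == EID_EXT_SUPP_RATES && elen == 0) ||  /* needs >= 1 rate */
		    (id == EID_OBSS_SCAN_PARAMS && elen != OBSS_SCAN_PARAMS_LEN)) {
			*bad_id = id;
			return false;
		}
		pos += 2 + (size_t)elen;
	}
	return true;
}
```

A length byte that overruns the frame is the kind of error that makes a capture tool flag the whole beacon as malformed, even when the FCS itself is valid.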
Failure case:
As we can see in Link #1, in the first case we haven't received a beacon,
so the association didn't go through (need_beacon=1).
Successful case:
After associating with the IBSS and disconnecting, we started receiving
beacons (even though they are corrupted) and the connection went through
(need_beacon=0 and have_beacon=1).
The problem appears to be with the reception of the beacons, which in turn
might depend on the corruption issue.
If no beacons are received after auth and before assoc, we return with an
association timeout:
if (ifmgd->assoc_data && ifmgd->assoc_data->timeout_started &&
    time_after(jiffies, ifmgd->assoc_data->timeout)) {
	if ((ifmgd->assoc_data->need_beacon &&
	     !ifmgd->assoc_data->have_beacon) ||
	    ieee80211_do_assoc(sdata)) {
		u8 bssid[ETH_ALEN];
		memcpy(bssid, ifmgd->assoc_data->bss->bssid, ETH_ALEN);
		ieee80211_destroy_assoc_data(sdata, false);
		mutex_unlock(&ifmgd->mtx);
		cfg80211_send_assoc_timeout(sdata->dev, bssid);
		mutex_lock(&ifmgd->mtx);
	}
}
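The decision in the excerpt above can be restated as a small, self-contained C model. The struct, enum and function names are hypothetical stand-ins for the assoc_data fields, and timed_out replaces time_after(jiffies, ...). It only captures the short-circuit: with need_beacon set and no beacon, ieee80211_do_assoc() is never called and the attempt ends as an association timeout.

```c
#include <stdbool.h>

/* Hypothetical, simplified mirror of the assoc_data fields used in the
 * mac80211 excerpt; these are not the real kernel structures. */
struct assoc_state {
	bool timeout_started;
	bool timed_out;    /* stands in for time_after(jiffies, timeout) */
	bool need_beacon;  /* set when the driver needs the DTIM before assoc */
	bool have_beacon;  /* a beacon from the target BSS was received */
};

enum assoc_action {
	ASSOC_WAIT,         /* timer not expired yet */
	ASSOC_SEND_REQUEST, /* ieee80211_do_assoc() runs and succeeds */
	ASSOC_GIVE_UP,      /* destroy assoc data, report ASSOC_TIMED_OUT */
};

/* do_assoc_fails models a non-zero return from ieee80211_do_assoc(),
 * e.g. when the retries are exhausted. */
static enum assoc_action assoc_on_timer(const struct assoc_state *s,
					bool do_assoc_fails)
{
	if (!s->timeout_started || !s->timed_out)
		return ASSOC_WAIT;
	/* Same short-circuit as the || in the kernel code: the request
	 * is never attempted when the required beacon is missing. */
	if (s->need_beacon && !s->have_beacon)
		return ASSOC_GIVE_UP;
	if (do_assoc_fails)
		return ASSOC_GIVE_UP;
	return ASSOC_SEND_REQUEST;
}
```

With need_beacon cleared (the flag commented out in the driver), the same timer path sends the request instead, which matches what was observed when the flag was removed.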
Also, one more thing: you said the N900 uses mac80211 and has no issues,
but as it's an embedded device it might be running an older kernel where
the handling might be different. So we need to try with the same kernel
you are facing the issue with and a driver which advertises
IEEE80211_HW_NEED_DTIM_BEFORE_ASSOC.
Or, if you have a compilation environment, try commenting out the
advertisement of IEEE80211_HW_NEED_DTIM_BEFORE_ASSOC in the iwlwifi DVM
driver and try to reproduce the issue.
Let's see what results you get :-).
On Sat, Nov 2, 2013 at 2:05 PM, Krishna Chaitanya
<[email protected]> wrote:
> Also, one more thing: you said the N900 uses mac80211 and has no issues,
> but as it's an embedded device it might be running an older kernel where
> the handling might be different. So we need to try with the same kernel
> you are facing the issue with and a driver which advertises
> IEEE80211_HW_NEED_DTIM_BEFORE_ASSOC.
Yes it was running an older kernel, but I just compiled v3.12 and ran
it on the N900, and still everything works fine.
> Or, if you have a compilation environment, try commenting out the
> advertisement of IEEE80211_HW_NEED_DTIM_BEFORE_ASSOC in the iwlwifi DVM
> driver and try to reproduce the issue.
After commenting that flag everything works fine :)
What are the next steps?
--
Felipe Contreras
On Sun, Nov 10, 2013 at 2:54 AM, Felipe Contreras
<[email protected]> wrote:
> On Sat, Nov 9, 2013 at 1:10 PM, Krishna Chaitanya
> <[email protected]> wrote:
>> On Sat, Nov 9, 2013 at 10:22 PM, Felipe Contreras
>> <[email protected]> wrote:
>>> On Sat, Nov 9, 2013 at 7:10 AM, Krishna Chaitanya
>>> <[email protected]> wrote:
>>>> On Sat, Nov 9, 2013 at 2:22 AM, Felipe Contreras
>>>> <[email protected]> wrote:
>>>>> On Fri, Nov 8, 2013 at 2:30 PM, Krishna Chaitanya
>>>>> <[email protected]> wrote:
>>>
>>>>>>> But we are receiving 0 beacons, waiting for more than 1 won't help.
>>>>>>> BTW, why NEED_DTIM_BEFORE_ASSOC if the device doesn't *need* the DTIM
>>>>>>> before the association?
>>>>>>>
>>>> This is not just for your case but rather a generic note. Regarding
>>>> the flag, even I am not too sure, but I guess some hardware needs to
>>>> know the DTIM to set the wakeup schedule after the association?
>>>
>>> But not this hardware? Because everything works fine.
>>>
>>>>>> Oops... you just missed it. Right after your print there is a check to
>>>>>> drop frames with a bad CRC :-).
>>>>>
>>>>> That's why I put the print before that check. Since I don't see the
>>>>> print, that means the check was never executed. iwlagn_rx_reply_rx()
>>>>> was never called for the beacon frame.
>>>>>
>>>> Ok. So when we disable advertising of that flag in the driver you said things
>>>> are working fine.
>>>
>>> Yes, everything works perfectly.
>>>
>>>> So in that scenario after the connection are you
>>>> seeing the beacons?
>>>
>>> No, there are no beacons ever, at least from this AP
>
>> Oh ok, that's interesting. Are you not seeing any disconnects due
>> to beacon loss triggers?
>
> I see some disconnects now and then, but I don't know why. Before
> trying to tackle those problems I would like to be able to connect
> reliably.
It's probably the beacon loss that is triggering the disconnects, so
both problems have the same cause. It's the beacon reception we need
to figure out.
Adding some Intel guys explicitly.
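For context on the beacon-loss theory, a rough C sketch of a beacon-loss check follows. The threshold of 7 beacon intervals and the function name are assumptions for illustration, not values taken from this thread; mac80211's real connection monitoring is more involved (it typically probes the AP before dropping the link). A station that never hears beacons keeps tripping such a check and depends entirely on that probing, which could plausibly explain intermittent disconnects.

```c
#include <stdbool.h>
#include <stdint.h>

#define TU_USEC 1024u         /* one 802.11 time unit, in microseconds */
#define BEACON_LOSS_COUNT 7u  /* hypothetical threshold, in beacon intervals */

/* Returns true once no beacon has been seen for BEACON_LOSS_COUNT beacon
 * intervals. Times are microseconds since an arbitrary common epoch. */
static bool beacon_lost(uint64_t now_us, uint64_t last_beacon_us,
			uint16_t beacon_int_tu)
{
	uint64_t window_us = (uint64_t)beacon_int_tu * TU_USEC * BEACON_LOSS_COUNT;

	if (now_us < last_beacon_us)
		return false;         /* clock went backwards; don't trigger */
	return now_us - last_beacon_us > window_us;
}
```

With a 100 TU beacon interval the window is 7 x 102400 µs = 716.8 ms.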
>> Also can you add some debugging to the iwlagn_rx_beacon_notif
>> (the beacon RX handler)?
>
> All right, I've added debugging there, but so far I see nothing.
>
Hmm...dead end this side too.
>>> It seems to me all the beacon frames are dropped by the firmware
>>> before passing them to the driver, so the driver cannot parse them and
>>> do something sensible even though they are corrupted, the driver never
>>> gets them.
>>>
>>>> I just want to understand whether the problem is throughout or just
>>>> before association. If the driver itself is not getting the beacons,
>>>> then our debugging ends there; someone from Intel should guide you
>>>> through the FW debugging.
>>>
>>> Not really, part of the debugging ends there, but we can still do something.
>>>
>>> What is the meaning of NEED_DTIM_BEFORE_ASSOC, if the driver doesn't
>>> *need* this? Why fail the association completely, if we don't need to?
>>>
>>> Also, I realized that after rebooting the router, the beacon frames
>>> are not corrupted any more, so it's a compound problem, yet even in
>>> the corrupted case, the driver can work just fine, if only it didn't
>>> *require* the DTIM unnecessarily,
>>
>> Yeah, that's more of a design query, with the problem being that we
>> are not able to Rx the beacons. We need to understand the reason for
>> this flag being set by the iwlwifi driver.
>
> Indeed.
>
>>>as apparently all hardware and even
>>> other OS'es on this hardware do.
>>
>> That's the reason this flag is a _HW_ flag: not all hardware requires
>> this, but Intel does.
>
> But it doesn't, my hardware is Intel, and it works fine without it.
>
Yeah, so far so good. But there should be a reason why they are
specifically advertising this flag. Also, DTIM is about multicast and
power save, so it's a rare thing; we might not hit it too often.
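A short worked example of why the DTIM matters for power save: the beacon interval is given in time units (1 TU = 1024 µs), and the DTIM period, carried in the TIM element of beacons, says how many beacon intervals pass between DTIM beacons. The C sketch below only does that arithmetic; the function name is hypothetical.

```c
#include <stdint.h>

#define TU_USEC 1024u /* one 802.11 time unit (TU), in microseconds */

/* Time between DTIM beacons = beacon interval (in TU) x DTIM period.
 * A dozing station must wake at least this often to catch buffered
 * multicast/broadcast traffic. */
static uint64_t dtim_interval_us(uint16_t beacon_int_tu, uint8_t dtim_period)
{
	return (uint64_t)beacon_int_tu * TU_USEC * dtim_period;
}
```

With a common 100 TU beacon interval and a DTIM period of 3, a dozing station has to wake about every 307.2 ms. Since the DTIM period is only learned from a beacon, a device that programs this schedule before association has nothing to program in the failure case discussed here.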
On Sun, Nov 10, 2013 at 12:31 AM, Emmanuel Grumbach <[email protected]> wrote:
>>>>>>>>> But we are receiving 0 beacons, waiting for more than 1 won't help.
>>>>>>>>> BTW, why NEED_DTIM_BEFORE_ASSOC if the device doesn't *need* the DTIM
>>>>>>>>> before the association?
>>>>>>>>>
>>>>>> This is not just for your case but rather a generic note. Regarding
>>>>>> the flag, even I am not too sure, but I guess some hardware needs to
>>>>>> know the DTIM to set the wakeup schedule after the association?
>>>>>
>
> Right - we need to send the beacon interval to the device *before* we
> configure the device to be associated.
But what do you mean "need"? If I remove the flag the association works fine.
>>>>> But not this hardware? Because everything works fine.
>>>>>
>>>>>>>> Oops... you just missed it. Right after your print there is a check to
>>>>>>>> drop frames with a bad CRC :-).
>>>>>>>
>>>>>>> That's why I put the print before that check. Since I don't see the
>>>>>>> print, that means the check was never executed. iwlagn_rx_reply_rx()
>>>>>>> was never called for the beacon frame.
>>>>>>>
>
> That won't help since the firmware will drop frames with bad CRC,
> unless you are in monitor mode.
And apparently ad-hoc mode too.
Either way, that's not helping. Ideally those corrupted beacons should
be parsed by the driver, which would see that they are corrupted but
still do something sensible.
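For reference on the "bad CRC" discussion: every 802.11 frame ends with a 4-octet FCS, a CRC-32 over the rest of the frame. The C sketch below shows the check a receiver performs, assuming the FCS is stored least significant byte first as in typical captures; the function names are hypothetical, and the firmware's actual implementation is not visible from the driver.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (IEEE 802.3 polynomial, reflected form 0xEDB88320),
 * the same CRC that 802.11 uses for its frame check sequence. */
static uint32_t crc32_ieee(const uint8_t *buf, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	for (size_t i = 0; i < len; i++) {
		crc ^= buf[i];
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/* The last 4 octets of the frame are the FCS, least significant byte
 * first. Returns true if they match the CRC of everything before them. */
static bool fcs_ok(const uint8_t *frame, size_t len)
{
	if (len < 4)
		return false;
	uint32_t fcs = (uint32_t)frame[len - 4] |
		       (uint32_t)frame[len - 3] << 8 |
		       (uint32_t)frame[len - 2] << 16 |
		       (uint32_t)frame[len - 1] << 24;
	return crc32_ieee(frame, len - 4) == fcs;
}
```

A single flipped bit anywhere in the frame makes fcs_ok() fail, which is why a receiver cannot tell which part of a bad-CRC beacon is still trustworthy.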
>>>>>> Ok. So when we disable advertising of that flag in the driver you said things
>>>>>> are working fine.
>>>>>
>>>>> Yes, everything works perfectly.
>>>>>
>>>>>> So in that scenario after the connection are you
>>>>>> seeing the beacons?
>>>>>
>>>>> No, there are no beacons ever, at least from this AP
>>>
>>>> Oh ok, that's interesting. Are you not seeing any disconnects due
>>>> to beacon loss triggers?
>>>
>>> I see some disconnects now and then, but I don't know why. Before
>>> trying to tackle those problems I would like to be able to connect
>>> reliably.
>
> No wonder. If we can't receive any beacons you can expect issues....
> PS will be completely broken and that is only the first on the list...
That's OK, it's better to connect with issues rather than not connect at all.
>> It's probably the beacon loss that is triggering the disconnects, so
>> both problems have the same cause. It's the beacon reception we need
>> to figure out.
>>
>> Adding some Intel guys explicitly.
>>
>>>> Also can you add some debugging to the iwlagn_rx_beacon_notif
>>>> (the beacon RX handler)?
>>>
>>> All right, I've added debugging there, but so far I see nothing.
>>>
>>
>> Hmm...dead end this side too.
>>
>>>>> It seems to me all the beacon frames are dropped by the firmware
>>>>> before passing them to the driver, so the driver cannot parse them and
>>>>> do something sensible even though they are corrupted, the driver never
>>>>> gets them.
>>>>>
>>>>>> I just want to understand whether the problem is throughout or just
>>>>>> before association. If the driver itself is not getting the beacons,
>>>>>> then our debugging ends there; someone from Intel should guide you
>>>>>> through the FW debugging.
>>>>>
>>>>> Not really, part of the debugging ends there, but we can still do something.
>>>>>
>>>>> What is the meaning of NEED_DTIM_BEFORE_ASSOC, if the driver doesn't
>>>>> *need* this? Why fail the association completely, if we don't need to?
>>>>>
>
> As I explained, the firmware needs it. This is for configuring the PS
> state machine. But since your AP is completely broken, PS is likely not
> to work at all anyway....
I don't use powersave anyway.
> And my small experience in WiFi leads me to the conclusion that if a
> driver cannot rely on the AP sending beacon, it is really in trouble.
Somehow no other device in this house seems to have a problem.
Even this device works fine in Windows.
> We can cope with buggy AP, but not associate to microwaves.
> Other devices will work, granted. But they can't go to sleep then, and
> need to poke the AP from time to time to make sure it hasn't
> disappeared.
That's better than not associating at all, ever.
> Note that this is true regardless of the design / HW, whatever. OK, the
> Windows driver on the same device works with this "AP". Fine. But it
> theoretically can't work well. Nor can any other WiFi device
> that can't hear the beacon. Now - maybe we have an issue in the Linux
> driver that mangles the beacons (PHY stuff) - that's possible. But
> since you haven't sent a sniffer capture of the AP with another
> device, we can't know.
That's right, I tried to do that with an N900 but the monitor mode
doesn't work. I'll keep trying.
>>>>> Also, I realized that after rebooting the router, the beacon frames
>>>>> are not corrupted any more, so it's a compound problem, yet even in
>>>>> the corrupted case, the driver can work just fine, if only it didn't
>>>>> *require* the DTIM unnecessarily,
>>>>
>>>> Yeah, that's more of a design query, with the problem being that we
>>>> are not able to Rx the beacons. We need to understand the reason for
>>>> this flag being set by the iwlwifi driver.
>>>
>>> Indeed.
>>>
>>>>>as apparently all hardware and even
>>>>> other OS'es on this hardware do.
>>>>
>>>> That's the reason this flag is a _HW_ flag: not all hardware requires
>>>> this, but Intel does.
>>>
>>> But it doesn't, my hardware is Intel, and it works fine without it.
>>>
>> Yeah, so far so good. But there should be a reason why they are
>> specifically advertising this flag. Also, DTIM is about multicast and
>> power save, so it's a rare thing; we might not hit it too often.
>
> Hmm... well... N/M.
Wouldn't it make sense to time out if there's no DTIM, and still
associate? It's better than never associating. Plus, if you already
know that power saving wouldn't work in this case, simply disable
power save.
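The proposal above can be sketched as a policy function. Everything here is hypothetical (the struct, the function and the fallback_enabled switch are invented; this is not mac80211 behaviour); it only shows that such a fallback would change the outcome solely when no beacon was received, leaving APs that do beacon on the existing path. Whether the iwlwifi firmware accepts an association without the beacon information is the open question in this thread, and the sketch does not answer it.

```c
#include <stdbool.h>

/* Hypothetical outcome of the beacon wait before association. */
struct assoc_plan {
	bool associate;       /* go ahead and send the association request */
	bool allow_powersave; /* PS needs the DTIM period from a beacon */
};

/* need_dtim: driver advertises NEED_DTIM_BEFORE_ASSOC.
 * have_beacon: a beacon arrived before the wait timed out.
 * fallback_enabled: hypothetical switch for the proposed behaviour. */
static struct assoc_plan plan_after_beacon_wait(bool need_dtim,
						bool have_beacon,
						bool fallback_enabled)
{
	struct assoc_plan p;

	/* Without a beacon the DTIM period is unknown, so PS stays off. */
	p.allow_powersave = have_beacon;
	/* Current behaviour: need_dtim && !have_beacon aborts. With the
	 * fallback that case associates anyway; beaconing APs are unaffected. */
	p.associate = have_beacon || !need_dtim || fallback_enabled;
	return p;
}
```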
--
Felipe Contreras
On 11/10/2013 06:26 PM, Felipe Contreras wrote:
> On Sun, Nov 10, 2013 at 12:31 AM, Emmanuel Grumbach <[email protected]> wrote:
>>>>>>>>>> But we are receiving 0 beacons, waiting for more than 1 won't help.
>>>>>>>>>> BTW, why NEED_DTIM_BEFORE_ASSOC if the device doesn't *need* the DTIM
>>>>>>>>>> before the association?
>>>>>>>>>>
>>>>>>> This is not just for your case but rather a generic note. Regarding
>>>>>>> the flag, even I am not too sure, but I guess some hardware needs to
>>>>>>> know the DTIM to set the wakeup schedule after the association?
>>>>>>
>>
>> Right - we need to send the beacon interval to the device *before* we
>> configure the device to be associated.
>
> But what do you mean "need"? If I remove the flag the association works fine.
>
>>>>>> But not this hardware? Because everything works fine.
>>>>>>
>>>>>>>>> Oops... you just missed it. Right after your print there is a check to
>>>>>>>>> drop frames with a bad CRC :-).
>>>>>>>>
>>>>>>>> That's why I put the print before that check. Since I don't see the
>>>>>>>> print, that means the check was never executed. iwlagn_rx_reply_rx()
>>>>>>>> was never called for the beacon frame.
>>>>>>>>
>>
>> That won't help since the firmware will drop frames with bad CRC,
>> unless you are in monitor mode.
>
> And apparently ad-hoc mode too.
>
> Either way that's not helping, ideally those corrupted beacons should
> be parsed by the driver, it will see they are corrupted, but still do
> something sensible.
>
>>>>>>> Ok. So when we disable advertising of that flag in the driver you said things
>>>>>>> are working fine.
>>>>>>
>>>>>> Yes, everything works perfectly.
>>>>>>
>>>>>>> So in that scenario after the connection are you
>>>>>>> seeing the beacons?
>>>>>>
>>>>>> No, there are no beacons ever, at least from this AP
>>>>
>>>>> Oh ok, that's interesting. Are you not seeing any disconnects due
>>>>> to beacon loss triggers?
>>>>
>>>> I see some disconnects now and then, but I don't know why. Before
>>>> trying to tackle those problems I would like to be able to connect
>>>> reliably.
>>
>> No wonder. If we can't receive any beacons you can expect issues....
>> PS will be completely broken and that is only the first on the list...
>
> That's OK, it's better to connect with issues rather than not connect at all.
>
>>> It's probably the beacon loss that is triggering the disconnects, so
>>> both problems have the same cause. It's the beacon reception we need
>>> to figure out.
>>>
>>> Adding some Intel guys explicitly.
>>>
>>>>> Also can you add some debugging to the iwlagn_rx_beacon_notif
>>>>> (the beacon RX handler)?
>>>>
>>>> All right, I've added debugging there, but so far I see nothing.
>>>>
>>>
>>> Hmm...dead end this side too.
>>>
>>>>>> It seems to me all the beacon frames are dropped by the firmware
>>>>>> before passing them to the driver, so the driver cannot parse them and
>>>>>> do something sensible even though they are corrupted, the driver never
>>>>>> gets them.
>>>>>>
>>>>>>> I just want to understand whether the problem is throughout or just
>>>>>>> before association. If the driver itself is not getting the beacons,
>>>>>>> then our debugging ends there; someone from Intel should guide you
>>>>>>> through the FW debugging.
>>>>>>
>>>>>> Not really, part of the debugging ends there, but we can still do something.
>>>>>>
>>>>>> What is the meaning of NEED_DTIM_BEFORE_ASSOC, if the driver doesn't
>>>>>> *need* this? Why fail the association completely, if we don't need to?
>>>>>>
>>
>> As I explained, the firmware needs it. This is for configuring the PS
>> state machine. But since your AP is completely broken, PS is likely not
>> to work at all anyway....
>
> I don't use powersave anyway.
>
>> And my small experience in WiFi leads me to the conclusion that if a
>> driver cannot rely on the AP sending beacon, it is really in trouble.
>
> Somehow every device in this house doesn't seem to have a problem.
> Even this device in Windows works fine.
>
>> We can cope with buggy AP, but not associate to microwaves.
>> Other devices will work, granted. But they can't go to sleep then, and
>> need to poke the AP from time to time to make sure it hasn't
>> disappeared.
>
> That's better than not associating at all, ever.
No, because it would break the driver with all the working APs, which
are, fortunately enough, more common. Maybe you can rewrite mac80211 /
iwlwifi to make things work differently so that PS would still work with
good APs and association would work with yours. Fair enough. Go ahead.
>
>> Note that this is true regardless of the design / HW, whatever. OK, the
>> Windows driver on the same device works with this "AP". Fine. But it
>> theoretically can't work well. Nor can any other WiFi device
>> that can't hear the beacon. Now - maybe we have an issue in the Linux
>> driver that mangles the beacons (PHY stuff) - that's possible. But
>> since you haven't sent a sniffer capture of the AP with another
>> device, we can't know.
>
> That's right, I tried to do that with an N900 but the monitor mode
> doesn't work. I'll keep trying.
>
>>>>>> Also, I realized that after rebooting the router, the beacon frames
>>>>>> are not corrupted any more, so it's a compound problem, yet even in
>>>>>> the corrupted case, the driver can work just fine, if only it didn't
>>>>>> *require* the DTIM unnecessarily,
>>>>>
>>>>> Yeah, that's more of a design query, with the problem being that we
>>>>> are not able to Rx the beacons. We need to understand the reason for
>>>>> this flag being set by the iwlwifi driver.
>>>>
>>>> Indeed.
>>>>
>>>>>> as apparently all hardware and even
>>>>>> other OS'es on this hardware do.
>>>>>
>>>>> That's the reason this flag is a _HW_ flag: not all hardware requires
>>>>> this, but Intel does.
>>>>
>>>> But it doesn't, my hardware is Intel, and it works fine without it.
>>>>
>>> Yeah, so far so good. But there should be a reason why they are
>>> specifically advertising this flag. Also, DTIM is about multicast and
>>> power save, so it's a rare thing; we might not hit it too often.
>>
>> Hmm... well... N/M.
>
> Wouldn't it make sense to time out if there's no DTIM, and still
> associate? It's better than never associating. Plus, if you already
> know that power saving wouldn't work in this case, simply disable
> power save.
>
I can't wait for your patch.
On Fri, Nov 8, 2013 at 7:48 PM, Felipe Contreras
<[email protected]> wrote:
> On Fri, Nov 8, 2013 at 8:06 AM, Krishna Chaitanya
> <[email protected]> wrote:
>> On Fri, Nov 8, 2013 at 6:44 PM, Felipe Contreras
>> <[email protected]> wrote:
>>> On Fri, Nov 8, 2013 at 2:35 AM, Felipe Contreras
>>> <[email protected]> wrote:
>>>> On Sat, Nov 2, 2013 at 2:05 PM, Krishna Chaitanya
>>>> <[email protected]> wrote:
>>>>
>>>>> Also, one more thing: you said the N900 uses mac80211 and has no issues,
>>>>> but as it's an embedded device it might be running an older kernel where
>>>>> the handling might be different. So we need to try with the same kernel
>>>>> you are facing the issue with and a driver which advertises
>>>>> IEEE80211_HW_NEED_DTIM_BEFORE_ASSOC.
>>>>
>>>> Yes it was running an older kernel, but I just compiled v3.12 and ran
>>>> it on the N900, and still everything works fine.
>>>>
>>>>> (or) if you a have a compilation environment try commenting the advertisement of
>>>>> IEEE80211_HW_NEED_DTIM_BEFORE_ASSOC in the iwlwifi DVM driver and
>>>>> try to reproduce the issue.
>>>>
>>>> After commenting that flag everything works fine :)
>>
>> Oh, great. That was just to corner the problem, that means we are not getting
>> the required beacon before the association, but we only wait for 1 beacon here
>> may be we to wait for some number of beacons before giving up the association??
>>
>> Johannes??
>
> But we are receiving 0 beacons, waiting for more than 1 won't help.
> BTW, why NEED_DTIM_BEFORE_ASSOC if the device doesn't *need* the DTIM
> before the association?
>
>>>> What are the next steps?
>>>
>>> I tried to add some debugging to see what's going on, and indeed the
>>> beacon packets are lost, I added debugging as low in the chain as I
>>> could (iwlagn_rx_reply_rx()), and I don't see them there. However,
>>> when I enable the monitor mode, I see them. What's going on?
>>
>> In the captures you shared all the beacons are malformed, so
>> probably they failed the CRC check. iwlwifi drops all the CRC failed
>> packets. (doth MVM and DVM)
>
> Before iwlagn_rx_reply_rx()?
>
>> Not sure how you are receiving the beacons in the monitor mode.
>
> I don't know what kismet does, but I can see my debugging is printing them.
>
>> BTW did you tried capturing the beacons in other devices and see if they
>> are really malformed, or is it just iwlwifi interpreting them wrongly.?
>
> I haven't managed to do that yet.
>
> This is what I'm doing:
>
> --- a/drivers/net/wireless/iwlwifi/dvm/rx.c
> +++ b/drivers/net/wireless/iwlwifi/dvm/rx.c
> @@ -919,6 +919,11 @@ static int iwlagn_rx_reply_rx(struct iwl_priv *priv,
> ampdu_status = iwlagn_translate_rx_status(priv,
> le32_to_cpu(rx_pkt_status));
>
> + if (ieee80211_is_beacon(header->frame_control)) {
> + print_hex_dump(KERN_INFO, "iwlwifi: dump: ", DUMP_PREFIX_OFFSET,
> + 16, 1, header, len, true);
> + }
> +
> if ((unlikely(phy_res->cfg_phy_cnt > 20))) {
> IWL_DEBUG_DROP(priv, "dsp size out of range [0,20]: %d\n",
> phy_res->cfg_phy_cnt);
>
Oops... you just missed it. Right after your print there is a check that
drops frames with a bad CRC :-).
Line 928:
	ampdu_status = iwlagn_translate_rx_status(priv,
						  le32_to_cpu(rx_pkt_status));

	if ((unlikely(phy_res->cfg_phy_cnt > 20))) {
		IWL_DEBUG_DROP(priv, "dsp size out of range [0,20]: %d\n",
				phy_res->cfg_phy_cnt);
		return 0;
	}

	if (!(rx_pkt_status & RX_RES_STATUS_NO_CRC32_ERROR) ||
	    !(rx_pkt_status & RX_RES_STATUS_NO_RXE_OVERFLOW)) {
		IWL_DEBUG_RX(priv, "Bad CRC or FIFO: 0x%08X.\n",
			     le32_to_cpu(rx_pkt_status));
		return 0;
	}
On Sat, Nov 9, 2013 at 2:22 AM, Felipe Contreras
<[email protected]> wrote:
> On Fri, Nov 8, 2013 at 2:30 PM, Krishna Chaitanya
> <[email protected]> wrote:
>> On Fri, Nov 8, 2013 at 7:48 PM, Felipe Contreras
>> <[email protected]> wrote:
>>> On Fri, Nov 8, 2013 at 8:06 AM, Krishna Chaitanya
>>> <[email protected]> wrote:
>>>> On Fri, Nov 8, 2013 at 6:44 PM, Felipe Contreras
>>>> <[email protected]> wrote:
>>>>> On Fri, Nov 8, 2013 at 2:35 AM, Felipe Contreras
>>>>> <[email protected]> wrote:
>>>>>> On Sat, Nov 2, 2013 at 2:05 PM, Krishna Chaitanya
>>>>>> <[email protected]> wrote:
>>>>>>
>>>>>>> Also one more thing you said N900 uses mac80211 and it has no issues, but as
>>>>>>> its a embedded device it might running an older kernel where the
>>>>>>> handling might be
>>>>>>> different, so we need to try with the same kernel you are facing an
>>>>>>> issue with the
>>>>>>> a driver which advertises IEEE80211_HW_NEED_DTIM_BEFORE_ASSOC.
>>>>>>
>>>>>> Yes it was running an older kernel, but I just compiled v3.12 and ran
>>>>>> it on the N900, and still everything works fine.
>>>>>>
>>>>>>> (or) if you a have a compilation environment try commenting the advertisement of
>>>>>>> IEEE80211_HW_NEED_DTIM_BEFORE_ASSOC in the iwlwifi DVM driver and
>>>>>>> try to reproduce the issue.
>>>>>>
>>>>>> After commenting that flag everything works fine :)
>>>>
>>>> Oh, great. That was just to corner the problem, that means we are not getting
>>>> the required beacon before the association, but we only wait for 1 beacon here
>>>> may be we to wait for some number of beacons before giving up the association??
>>>>
>>>> Johannes??
>>>
>>> But we are receiving 0 beacons, waiting for more than 1 won't help.
>>> BTW, why NEED_DTIM_BEFORE_ASSOC if the device doesn't *need* the DTIM
>>> before the association?
>>>
This is not specific to your case; it's a general point. I'm not too
sure about the flag myself, but I guess some hardware needs to know the
DTIM period to set up its wakeup schedule after association?
>>>>>> What are the next steps?
>>>>>
>>>>> I tried to add some debugging to see what's going on, and indeed the
>>>>> beacon packets are lost, I added debugging as low in the chain as I
>>>>> could (iwlagn_rx_reply_rx()), and I don't see them there. However,
>>>>> when I enable the monitor mode, I see them. What's going on?
>>>>
>>>> In the captures you shared all the beacons are malformed, so
>>>> probably they failed the CRC check. iwlwifi drops all the CRC failed
>>>> packets. (doth MVM and DVM)
>>>
>>> Before iwlagn_rx_reply_rx()?
>>>
>>>> Not sure how you are receiving the beacons in the monitor mode.
>>>
>>> I don't know what kismet does, but I can see my debugging is printing them.
>>>
>>>> BTW did you tried capturing the beacons in other devices and see if they
>>>> are really malformed, or is it just iwlwifi interpreting them wrongly.?
>>>
>>> I haven't managed to do that yet.
>>>
>>> This is what I'm doing:
>>>
>>> --- a/drivers/net/wireless/iwlwifi/dvm/rx.c
>>> +++ b/drivers/net/wireless/iwlwifi/dvm/rx.c
>>> @@ -919,6 +919,11 @@ static int iwlagn_rx_reply_rx(struct iwl_priv *priv,
>>> ampdu_status = iwlagn_translate_rx_status(priv,
>>> le32_to_cpu(rx_pkt_status));
>>>
>>> + if (ieee80211_is_beacon(header->frame_control)) {
>>> + print_hex_dump(KERN_INFO, "iwlwifi: dump: ", DUMP_PREFIX_OFFSET,
>>> + 16, 1, header, len, true);
>>> + }
>>> +
>>> if ((unlikely(phy_res->cfg_phy_cnt > 20))) {
>>> IWL_DEBUG_DROP(priv, "dsp size out of range [0,20]: %d\n",
>>> phy_res->cfg_phy_cnt);
>>>
>> Oops...you just missed, Right after your print there is a check to
>> drop frames with BAD CRC :-).
>
> That's why I put the print before that check. Since I don't see the
> print, that means the check was never executed. iwlagn_rx_reply_rx()
> was never called for the beacon frame.
>
OK. So you said things work fine when we disable advertising that flag in
the driver. In that scenario, are you seeing the beacons after the
connection is established? I just want to understand whether the problem
exists throughout, or only before association.
If the driver itself is not getting the beacons, then our debugging ends
there, and someone from Intel should guide you through the FW debugging.
On Fri, Nov 8, 2013 at 8:06 AM, Krishna Chaitanya
<[email protected]> wrote:
> On Fri, Nov 8, 2013 at 6:44 PM, Felipe Contreras
> <[email protected]> wrote:
>> On Fri, Nov 8, 2013 at 2:35 AM, Felipe Contreras
>> <[email protected]> wrote:
>>> On Sat, Nov 2, 2013 at 2:05 PM, Krishna Chaitanya
>>> <[email protected]> wrote:
>>>
>>>> Also one more thing you said N900 uses mac80211 and it has no issues, but as
>>>> its a embedded device it might running an older kernel where the
>>>> handling might be
>>>> different, so we need to try with the same kernel you are facing an
>>>> issue with the
>>>> a driver which advertises IEEE80211_HW_NEED_DTIM_BEFORE_ASSOC.
>>>
>>> Yes it was running an older kernel, but I just compiled v3.12 and ran
>>> it on the N900, and still everything works fine.
>>>
>>>> (or) if you a have a compilation environment try commenting the advertisement of
>>>> IEEE80211_HW_NEED_DTIM_BEFORE_ASSOC in the iwlwifi DVM driver and
>>>> try to reproduce the issue.
>>>
>>> After commenting that flag everything works fine :)
>
> Oh, great. That was just to corner the problem, that means we are not getting
> the required beacon before the association, but we only wait for 1 beacon here
> may be we to wait for some number of beacons before giving up the association??
>
> Johannes??
But we are receiving 0 beacons, waiting for more than 1 won't help.
BTW, why NEED_DTIM_BEFORE_ASSOC if the device doesn't *need* the DTIM
before the association?
>>> What are the next steps?
>>
>> I tried to add some debugging to see what's going on, and indeed the
>> beacon packets are lost, I added debugging as low in the chain as I
>> could (iwlagn_rx_reply_rx()), and I don't see them there. However,
>> when I enable the monitor mode, I see them. What's going on?
>
> In the captures you shared all the beacons are malformed, so
> probably they failed the CRC check. iwlwifi drops all the CRC failed
> packets. (doth MVM and DVM)
Before iwlagn_rx_reply_rx()?
> Not sure how you are receiving the beacons in the monitor mode.
I don't know what kismet does, but I can see my debugging is printing them.
> BTW did you tried capturing the beacons in other devices and see if they
> are really malformed, or is it just iwlwifi interpreting them wrongly.?
I haven't managed to do that yet.
This is what I'm doing:
--- a/drivers/net/wireless/iwlwifi/dvm/rx.c
+++ b/drivers/net/wireless/iwlwifi/dvm/rx.c
@@ -919,6 +919,11 @@ static int iwlagn_rx_reply_rx(struct iwl_priv *priv,
 	ampdu_status = iwlagn_translate_rx_status(priv,
 						  le32_to_cpu(rx_pkt_status));
 
+	if (ieee80211_is_beacon(header->frame_control)) {
+		print_hex_dump(KERN_INFO, "iwlwifi: dump: ", DUMP_PREFIX_OFFSET,
+			       16, 1, header, len, true);
+	}
+
 	if ((unlikely(phy_res->cfg_phy_cnt > 20))) {
 		IWL_DEBUG_DROP(priv, "dsp size out of range [0,20]: %d\n",
 				phy_res->cfg_phy_cnt);
--
Felipe Contreras
On Fri, Nov 8, 2013 at 6:44 PM, Felipe Contreras
<[email protected]> wrote:
> On Fri, Nov 8, 2013 at 2:35 AM, Felipe Contreras
> <[email protected]> wrote:
>> On Sat, Nov 2, 2013 at 2:05 PM, Krishna Chaitanya
>> <[email protected]> wrote:
>>
>>> Also one more thing you said N900 uses mac80211 and it has no issues, but as
>>> its a embedded device it might running an older kernel where the
>>> handling might be
>>> different, so we need to try with the same kernel you are facing an
>>> issue with the
>>> a driver which advertises IEEE80211_HW_NEED_DTIM_BEFORE_ASSOC.
>>
>> Yes it was running an older kernel, but I just compiled v3.12 and ran
>> it on the N900, and still everything works fine.
>>
>>> (or) if you a have a compilation environment try commenting the advertisement of
>>> IEEE80211_HW_NEED_DTIM_BEFORE_ASSOC in the iwlwifi DVM driver and
>>> try to reproduce the issue.
>>
>> After commenting that flag everything works fine :)
Oh, great. That was just to narrow down the problem: it means we are not
getting the required beacon before association. But we only wait for one
beacon here; maybe we need to wait for some number of beacons before
giving up on the association?
Johannes??
>> What are the next steps?
>
> I tried to add some debugging to see what's going on, and indeed the
> beacon packets are lost, I added debugging as low in the chain as I
> could (iwlagn_rx_reply_rx()), and I don't see them there. However,
> when I enable the monitor mode, I see them. What's going on?
In the captures you shared, all the beacons are malformed, so they
probably failed the CRC check. iwlwifi drops all packets that fail the
CRC check (both MVM and DVM).
I'm not sure how you are receiving the beacons in monitor mode.
BTW, did you try capturing the beacons on other devices to see whether
they are really malformed, or whether it's just iwlwifi interpreting them
wrongly?
On Sat, Nov 2, 2013 at 10:36 PM, Felipe Contreras
<[email protected]> wrote:
> Sorry for the crappy format (laptop is stuck).
>
> If the problem is the firmware why does it start working when I
> restart the router?
I was referring to the FW on iwlwifi, not the AP's. Maybe restarting the
router triggered some kind of event, like Beacon Loss, which might have
helped the STA to connect. Just guessing.
> Plus, I should be seeing this problem with other APs.
> And what could Windows be doing differently that doesn't exhibit this problem?
Without knowing the root cause, it's tough to answer these questions :-)
Emmanuel Grumbach wrote:
> On 11/10/2013 06:26 PM, Felipe Contreras wrote:
> > That's better than not associating at all, ever.
>
> No because it would break the driver against all the working APs which
> are fortunately enough more common. Maybe you can rewrite mac80211 /
> iwlwifi to make things work differently so that PS would still work with
> good APs and association would work with yours. Fair enough. Go ahead.
Challenge accepted:
http://article.gmane.org/gmane.linux.network/290256
> > Wouldn't it make sense to timeout if there's no DTIM, and still
> > associate? It's better than not associating ever. Plus, if you already
> > know that power saving wouldn't work in this case, merely disable
> > powersave.
>
> I can't wait for your patch.
Good, because I already sent it.
With my patch, if the AP sends the beacon correctly, power saving is
enabled; if not, association still works, but power saving is disabled.
How you could not imagine such a patch is beyond me.
--
Felipe Contreras
On Tue, Oct 29, 2013 at 8:23 AM, Krishna Chaitanya
<[email protected]> wrote:
> On Tue, Oct 29, 2013 at 3:02 AM, Felipe Contreras
> <[email protected]> wrote:
>> Yes, but maybe I overrode the file. I've pushed a new one again. The
>> sha-1 is 36c260d8d8c171a24eb1aa7b2ea736b06c9b55b7.
>>
> Thanks, able to decode now. I am not familiar with the
> iwlwifi code, but let me give it a try.
Did you find anything?
--
Felipe Contreras
>>>>>>>> But we are receiving 0 beacons, waiting for more than 1 won't help.
>>>>>>>> BTW, why NEED_DTIM_BEFORE_ASSOC if the device doesn't *need* the DTIM
>>>>>>>> before the association?
>>>>>>>>
>>>>> This is not just for your case but rather on a generic note. Regarding
>>>>> the flag even i am not
>>>>> too sure but i guess some hardware need to know the DTIM to set the
>>>>> wakeup schedule
>>>>> after the association?
>>>>
Right: we need to send the beacon interval to the device *before* we
configure it as associated.
>>>> But not this hardware? Because everything works fine.
>>>>
>>>>>>> Oops...you just missed, Right after your print there is a check to
>>>>>>> drop frames with BAD CRC :-).
>>>>>>
>>>>>> That's why I put the print before that check. Since I don't see the
>>>>>> print, that means the check was never executed. iwlagn_rx_reply_rx()
>>>>>> was never called for the beacon frame.
>>>>>>
That won't help since the firmware will drop frames with bad CRC,
unless you are in monitor mode.
>>>>> Ok. So when we disable advertising of that flag in the driver you said things
>>>>> are working fine.
>>>>
>>>> Yes, everything works perfectly.
>>>>
>>>>> So in that scenario after the connection are you
>>>>> seeing the beacons?
>>>>
>>>> No, there are no beacons ever, at least from this AP
>>
>>> Oh ok, thats interesting. Are you not seeing any disconnects due
>>> to beacon loss triggers?
>>
>> I see some disconnects now and then, but I don't know why. Before
>> trying to tackle those problems I would like to be able to connect
>> reliably.
No wonder. If we can't receive any beacons, you can expect issues...
PS will be completely broken, and that is only the first item on the list...
>
> Its probably the beacons loss that triggering the disconnects, so
> both the problem have the same cause. Its the beacon reception
> we need to figure it out.
>
> Adding some intel guys explicitly.
>
>>> Also can you add some debugging to the iwlagn_rx_beacon_notif
>>> (the beacon RX handler)?
>>
>> All right, I've added debugging there, but so far I see nothing.
>>
>
> Hmm...dead end this side too.
>
>>>> It seems to me all the beacon frames are dropped by the firmware
>>>> before passing them to the driver, so the driver cannot parse them and
>>>> do something sensible even though they are corrupted, the driver never
>>>> gets them.
>>>>
>>>>> Just want to understand the problem is throughout or just before association.
>>>>> If the driver itself it not getting the beacons then our debugging ends there,
>>>>> some one from intel should guide you through the FW debugging.
>>>>
>>>> Not really, part of the debugging ends there, but we can still do something.
>>>>
>>>> What is the meaning of NEED_DTIM_BEFORE_ASSOC, if the driver doesn't
>>>> *need* this? Why fail the association completely, if we don't need to?
>>>>
As I explained, the firmware needs it. This is for configuring the PS
state machine. But since your AP is completely broken, PS is unlikely
to work at all anyway...
And my small experience in WiFi leads me to the conclusion that if a
driver cannot rely on the AP sending beacons, it is really in trouble.
We can cope with a buggy AP, but we can't associate with microwaves.
Other devices will work, granted. But then they can't go to sleep, and
they need to poke the AP from time to time to make sure it hasn't
disappeared.
Note that this is true regardless of the design, HW, or whatever. OK, the
Windows driver on the same device works with this "AP". Fine. But it
theoretically can't work well. Nor can any other WiFi device that can't
hear the beacons. Now, maybe we have an issue in the Linux driver that
mangles the beacons (PHY stuff); that's possible. But since you haven't
sent a sniffer capture of the AP taken with another device, we can't
know.
>>>> Also, I realized that after rebooting the router, the beacon frames
>>>> are not corrupted any more, so it's a compound problem, yet even in
>>>> the corrupted case, the driver can work just fine, if only it didn't
>>>> *require* the DTIM unnecessarily,
>>>
>>> Yeah, that's more of design query with the problem being not able to
>>> Rx the beacons? We need to understand the reason for this flag being
>>> set by the iwlwifi driver.
>>
>> Indeed.
>>
>>>>as apparently all hardware and even
>>>> other OS'es on this hardware do.
>>>
>>> Thats the reason this flag is a _HW_ not all hardwares requrie this
>>> but intel does.
>>
>> But it doesn't, my hardware is Intel, and it works fine without it.
>>
> Yeah, so far so good. But there should be a reason why they are
> specifically advertising this flag? Also DTIM is Multicast+Powersave
> so a rare thing, we might no hit that too often.
Hmm... well... N/M.
On Sat, Nov 9, 2013 at 1:10 PM, Krishna Chaitanya
<[email protected]> wrote:
> On Sat, Nov 9, 2013 at 10:22 PM, Felipe Contreras
> <[email protected]> wrote:
>> On Sat, Nov 9, 2013 at 7:10 AM, Krishna Chaitanya
>> <[email protected]> wrote:
>>> On Sat, Nov 9, 2013 at 2:22 AM, Felipe Contreras
>>> <[email protected]> wrote:
>>>> On Fri, Nov 8, 2013 at 2:30 PM, Krishna Chaitanya
>>>> <[email protected]> wrote:
>>
>>>>>> But we are receiving 0 beacons, waiting for more than 1 won't help.
>>>>>> BTW, why NEED_DTIM_BEFORE_ASSOC if the device doesn't *need* the DTIM
>>>>>> before the association?
>>>>>>
>>> This is not just for your case but rather on a generic note. Regarding
>>> the flag even i am not
>>> too sure but i guess some hardware need to know the DTIM to set the
>>> wakeup schedule
>>> after the association?
>>
>> But not this hardware? Because everything works fine.
>>
>>>>> Oops...you just missed, Right after your print there is a check to
>>>>> drop frames with BAD CRC :-).
>>>>
>>>> That's why I put the print before that check. Since I don't see the
>>>> print, that means the check was never executed. iwlagn_rx_reply_rx()
>>>> was never called for the beacon frame.
>>>>
>>> Ok. So when we disable advertising of that flag in the driver you said things
>>> are working fine.
>>
>> Yes, everything works perfectly.
>>
>>> So in that scenario after the connection are you
>>> seeing the beacons?
>>
>> No, there are no beacons ever, at least from this AP
> Oh ok, thats interesting. Are you not seeing any disconnects due
> to beacon loss triggers?
I see some disconnects now and then, but I don't know why. Before
trying to tackle those problems I would like to be able to connect
reliably.
> Also can you add some debugging to the iwlagn_rx_beacon_notif
> (the beacon RX handler)?
All right, I've added debugging there, but so far I see nothing.
>> It seems to me all the beacon frames are dropped by the firmware
>> before passing them to the driver, so the driver cannot parse them and
>> do something sensible even though they are corrupted, the driver never
>> gets them.
>>
>>> Just want to understand the problem is throughout or just before association.
>>> If the driver itself it not getting the beacons then our debugging ends there,
>>> some one from intel should guide you through the FW debugging.
>>
>> Not really, part of the debugging ends there, but we can still do something.
>>
>> What is the meaning of NEED_DTIM_BEFORE_ASSOC, if the driver doesn't
>> *need* this? Why fail the association completely, if we don't need to?
>>
>> Also, I realized that after rebooting the router, the beacon frames
>> are not corrupted any more, so it's a compound problem, yet even in
>> the corrupted case, the driver can work just fine, if only it didn't
>> *require* the DTIM unnecessarily,
>
> Yeah, that's more of design query with the problem being not able to
> Rx the beacons? We need to understand the reason for this flag being
> set by the iwlwifi driver.
Indeed.
>>as apparently all hardware and even
>> other OS'es on this hardware do.
>
> Thats the reason this flag is a _HW_ not all hardwares requrie this
> but intel does.
But it doesn't, my hardware is Intel, and it works fine without it.
--
Felipe Contreras
Sorry for the crappy format (laptop is stuck).
If the problem is the firmware why does it start working when I
restart the router?
Plus, I should be seeing this problem with other APs.
And what could Windows be doing differently that doesn't exhibit this problem?
On 11/2/13, Krishna Chaitanya <[email protected]> wrote:
> On Sat, Nov 2, 2013 at 11:16 AM, Felipe Contreras
> <[email protected]> wrote:
>>
>> On Tue, Oct 29, 2013 at 8:23 AM, Krishna Chaitanya
>> <[email protected]> wrote:
>> > On Tue, Oct 29, 2013 at 3:02 AM, Felipe Contreras
>> > <[email protected]> wrote:
>>
>> >> Yes, but maybe I overrode the file. I've pushed a new one again. The
>> >> sha-1 is 36c260d8d8c171a24eb1aa7b2ea736b06c9b55b7.
>> >>
>> > Thanks, able to decode now. I am not familiar with the
>> > iwlwifi code, but let me give it a try.
>>
>> Did you find anything?
>
> Not Much, i could see the below pattern repeating
>
> wpa_supplicant-6915 [003] 5610.786592: drv_return_void: phy1
> wpa_supplicant-6915 [003] 5610.786594: drv_sta_state: phy1
> vif:wlan0(2) sta:e0:1d:3b:46:82:a0 state: 0->1
> wpa_supplicant-6915 [003] 5610.786597: iwlwifi_dev_hcmd:
> [0000:02:00.0] hcmd 0x18 (sync)
> irq/44-iwlwifi-6803 [001] 5610.786632: iwlwifi_dev_rx:
> [0000:02:00.0] RX cmd 0x18
> wpa_supplicant-6915 [003] 5610.786639: drv_return_int: phy1 - 0
> wpa_supplicant-6915 [003] 5610.786645: drv_mgd_prepare_tx: phy1
> vif:wlan0(2)
> wpa_supplicant-6915 [003] 5610.786645: drv_return_void: phy1
> wpa_supplicant-6915 [003] 5610.786663: iwlwifi_dev_tx:
> [0000:02:00.0] TX 1c (90 bytes)
> irq/44-iwlwifi-6803 [001] 5610.788677: iwlwifi_dev_rx:
> [0000:02:00.0] RX cmd 0x1c
> irq/44-iwlwifi-6803 [001] 5610.788709: iwlwifi_dev_rx:
> [0000:02:00.0] RX cmd 0xc0
> irq/44-iwlwifi-6803 [001] 5610.788710: iwlwifi_dev_rx:
> [0000:02:00.0] RX cmd 0xc1
> kworker/u8:0-1459 [003] 5610.788730: drv_sta_state: phy1
> vif:wlan0(2) sta:e0:1d:3b:46:82:a0 state: 1->2
> kworker/u8:0-1459 [003] 5610.788732: drv_return_int: phy1 - 0
> kworker/u8:0-1459 [003] 5610.999549: drv_sta_state: phy1
> vif:wlan0(2) sta:e0:1d:3b:46:82:a0 state: 2->1
> kworker/u8:0-1459 [003] 5610.999553: drv_return_int: phy1 - 0
> kworker/u8:0-1459 [003] 5610.999553: drv_sta_state: phy1
> vif:wlan0(2) sta:e0:1d:3b:46:82:a0 state: 1->0
> kworker/u8:0-1459 [003] 5610.999554: drv_return_int: phy1 - 0
>
> So We move from NONE to AUTH and then AUTH to NONE. iwlwifi_dev_tx is
> transmistting 90 bytes packets
> but auth request is only 43 bytes, could that be association request??
> If yes where is that getting dropped.
>
> So my guess is that we are sending the association request (CMD_TX),
> but its never seen OTA, its lost some where in between.
> Ball goes to the FW :-).
>
--
Felipe Contreras
On Sat, Nov 9, 2013 at 10:22 PM, Felipe Contreras
<[email protected]> wrote:
> On Sat, Nov 9, 2013 at 7:10 AM, Krishna Chaitanya
> <[email protected]> wrote:
>> On Sat, Nov 9, 2013 at 2:22 AM, Felipe Contreras
>> <[email protected]> wrote:
>>> On Fri, Nov 8, 2013 at 2:30 PM, Krishna Chaitanya
>>> <[email protected]> wrote:
>
>>>>> But we are receiving 0 beacons, waiting for more than 1 won't help.
>>>>> BTW, why NEED_DTIM_BEFORE_ASSOC if the device doesn't *need* the DTIM
>>>>> before the association?
>>>>>
>> This is not just for your case but rather on a generic note. Regarding
>> the flag even i am not
>> too sure but i guess some hardware need to know the DTIM to set the
>> wakeup schedule
>> after the association?
>
> But not this hardware? Because everything works fine.
>
>>>> Oops...you just missed, Right after your print there is a check to
>>>> drop frames with BAD CRC :-).
>>>
>>> That's why I put the print before that check. Since I don't see the
>>> print, that means the check was never executed. iwlagn_rx_reply_rx()
>>> was never called for the beacon frame.
>>>
>> Ok. So when we disable advertising of that flag in the driver you said things
>> are working fine.
>
> Yes, everything works perfectly.
>
>> So in that scenario after the connection are you
>> seeing the beacons?
>
> No, there are no beacons ever, at least from this AP
Oh OK, that's interesting. Aren't you seeing any disconnects due
to beacon loss triggers?
Also, can you add some debugging to iwlagn_rx_beacon_notif
(the beacon RX handler)?
>
> It seems to me all the beacon frames are dropped by the firmware
> before passing them to the driver, so the driver cannot parse them and
> do something sensible even though they are corrupted, the driver never
> gets them.
>
>> Just want to understand the problem is throughout or just before association.
>> If the driver itself it not getting the beacons then our debugging ends there,
>> some one from intel should guide you through the FW debugging.
>
> Not really, part of the debugging ends there, but we can still do something.
>
> What is the meaning of NEED_DTIM_BEFORE_ASSOC, if the driver doesn't
> *need* this? Why fail the association completely, if we don't need to?
>
> Also, I realized that after rebooting the router, the beacon frames
> are not corrupted any more, so it's a compound problem, yet even in
> the corrupted case, the driver can work just fine, if only it didn't
> *require* the DTIM unnecessarily,
Yeah, that's more of a design question, with the underlying problem being
that we can't Rx the beacons. We need to understand why the iwlwifi
driver sets this flag.
>as apparently all hardware and even
> other OS'es on this hardware do.
That's the reason this is a _HW_ flag: not all hardware requires this,
but Intel does.
I don't know the background of this flag; Johannes is the right person
to answer this.