2016-01-07 16:19:46

by David Mosberger-Tang

Subject: lost connectivity until "wpa_cli reassociate" is issued

We are seeing a curious issue where WLAN connectivity sometimes
gets stuck until a "wpa_cli reassociate" command is issued.

At the WPA level, everything appears to be working fine
(see thread starting at
http://lists.infradead.org/pipermail/hostap/2016-January/034454.html).

What's truly odd is that when connectivity is stuck, you can often "ping" the
network's gateway exactly once, and then you either get no further responses
or, after a long time (like 40-60 seconds), the floodgate opens and the
responses to (almost) all previously sent pings arrive at the same time
(in the latter case, connectivity is fine again afterwards).

This makes me wonder if there may be a packet queue that somehow gets
stuck.
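
To make the "floodgate" pattern easier to spot in a saved ping log, here is a minimal sketch. It assumes ping prints one reply per line with a `time=<ms>` field, as iputils and BusyBox ping usually do; the function name `late_replies` and the default 1000 ms threshold are made up for illustration.

```shell
# late_replies [THRESHOLD_MS]
# Reads ping output on stdin, prints every reply whose round-trip time
# exceeds the threshold, then prints a one-line summary.
late_replies() {
    threshold_ms=${1:-1000}
    awk -v limit="$threshold_ms" '
        {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^time=/) {
                    rtt = substr($i, 6) + 0      # numeric part after "time="
                    if (rtt > limit) { late++; print }
                }
        }
        END { printf "%d late replies (> %s ms)\n", late, limit }
    '
}

# Example (not run here): ping -c 120 <gateway> | late_replies 1000
```

A burst of replies whose times step down by roughly the ping interval (e.g. 41 s, 40 s, 39 s, ...) would be consistent with packets having been held in a queue and released together.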

I have turned on mac80211 debugging but have not found anything
that seems to correlate with the loss of connectivity. All I know is that
wpa_cli reassociate so far is the least intrusive workaround that seems
to reliably bring back connectivity.
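
As a stop-gap, the workaround could be automated along these lines. This is only a sketch under stated assumptions: the interface defaults to wlan0, ping is the iputils variant (where combining -c and -w makes the exit status non-zero unless all replies arrive before the deadline, which matters because a single ping often still gets through), and `PING` / `WPA_CLI` are hypothetical override variables added here for dry runs.

```shell
# watchdog_once GATEWAY [IFACE]
# Pings the gateway and issues "wpa_cli reassociate" if the check fails.
watchdog_once() {
    gateway=$1
    iface=${2:-wlan0}
    # Assumes iputils ping: with -c 3 -w 5 the exit status is non-zero
    # unless all 3 replies arrive within 5 seconds.
    if "${PING:-ping}" -c 3 -w 5 "$gateway" >/dev/null 2>&1; then
        echo "ok"
    else
        "${WPA_CLI:-wpa_cli}" -i "$iface" reassociate >/dev/null 2>&1
        echo "reassociated"
    fi
}

# Example (not run here): watchdog_once <gateway> wlan0
```

This only hides the symptom; it does not explain why the data path stalls.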

We are based on kernel 3.7.0 but have back-ported the mac80211 fixes
from the linux-3.7.y stable branch.

I'm not very familiar with the wireless stack, so any tips or hints on how to
further debug this would be greatly appreciated.

Thanks and best regards,

--david
--
eGauge Systems LLC, http://egauge.net/, 1.877-EGAUGE1, fax 720.545.9768


2016-01-07 16:33:37

by Ben Greear

Subject: Re: lost connectivity until "wpa_cli reassociate" is issued

On 01/07/2016 08:29 AM, David Mosberger wrote:
> Ben,
>
> On Thu, Jan 7, 2016 at 9:24 AM, Ben Greear <[email protected]> wrote:
>> On 01/07/2016 08:19 AM, David Mosberger wrote:
>>>
>>> We are seeing a curious issue where WLAN connectivity sometimes
>>> gets stuck until a "wpa_cli reassociate" command is issued.
>>>
>>> At the WPA level, everything appears to be working fine
>>> (see thread starting at
>>> http://lists.infradead.org/pipermail/hostap/2016-January/034454.html).
>>
>> I don't remember seeing you mention the driver and NIC you are using.
>>
>> I think this is likely a driver bug, so please provide that info.
>
> Sure, we're using rtl8192cu. I started out suspecting a driver bug as
> well, but since we're processing management frames during those
> "stuck" periods just fine (see
> debug output in
> http://lists.infradead.org/pipermail/hostap/2016-January/034459.html),
> I'm not so sure anymore. Like mac80211, we have patched rtl8192cu
> driver with current
> bug-fixes already.

I have no experience with that chip, but wifi is a tricky beast. Could be
a power-save issue perhaps. I assume you have sniffed to see if any frames
are going out on the air during the time of trouble?

If correct packets go out on the air and AP doesn't answer, then likely AP problem.

If pkts don't get on the air, then check to see if they at least get to
the driver.

If they don't get to the driver, then probably it is a kernel/stack issue.

If they get to the driver but not on the air, then the NIC and/or its firmware
and/or the driver is likely the culprit.
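
The checklist above can be written down as a small decision function. This is just a sketch: the function name `triage` and its yes/no arguments are made up for illustration, and the answers still have to come from a sniffer and from driver-level debugging.

```shell
# triage ON_AIR AP_ANSWERS REACHES_DRIVER   (each "yes" or "no")
triage() {
    on_air=$1
    ap_answers=$2
    reaches_driver=$3
    if [ "$on_air" = yes ]; then
        if [ "$ap_answers" = yes ]; then
            echo "link looks healthy; not covered by the checklist"
        else
            echo "suspect: AP"
        fi
    elif [ "$reaches_driver" = yes ]; then
        echo "suspect: NIC, firmware, or driver"
    else
        echo "suspect: kernel/stack"
    fi
}

# Example: frames reach the driver but never show up on the air.
triage no no yes
```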

Thanks,
Ben

>
> --david
>


--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com


2016-01-07 17:32:58

by David Mosberger-Tang

Subject: Re: lost connectivity until "wpa_cli reassociate" is issued

On Thu, Jan 7, 2016 at 9:59 AM, Bruno Randolf <[email protected]> wrote:

> Although I have not checked whether the "wpa_cli reassociate" command helps,
> I have seen rtl8192cu get "stuck" in similar ways. Maybe you can give
> the new rtl8xxxu driver from Jes Sorensen a try? It has fewer features, but
> works more reliably, IMHO...

Oh, boy, switching to a completely different driver is not what I'd
like to do at this stage, but perhaps it's the right thing to do.
Thanks for the suggestion!

I'll check with Jes.

--david
--
eGauge Systems LLC, http://egauge.net/, 1.877-EGAUGE1, fax 720.545.9768

2016-01-07 16:59:44

by Bruno Randolf

Subject: Re: lost connectivity until "wpa_cli reassociate" is issued

On 01/07/2016 04:29 PM, David Mosberger wrote:
> On Thu, Jan 7, 2016 at 9:24 AM, Ben Greear <[email protected]> wrote:
>> On 01/07/2016 08:19 AM, David Mosberger wrote:
>>>
>>> We are seeing a curious issue where WLAN connectivity sometimes
>>> gets stuck until a "wpa_cli reassociate" command is issued.
>>>
>>> At the WPA level, everything appears to be working fine
>>> (see thread starting at
>>> http://lists.infradead.org/pipermail/hostap/2016-January/034454.html).
>>
>> I don't remember seeing you mention the driver and NIC you are using.
>>
>> I think this is likely a driver bug, so please provide that info.
>
> Sure, we're using rtl8192cu. I started out suspecting a driver bug as
> well, but since we're processing management frames during those
> "stuck" periods just fine (see

Although I have not checked whether the "wpa_cli reassociate" command helps,
I have seen rtl8192cu get "stuck" in similar ways. Maybe you can give
the new rtl8xxxu driver from Jes Sorensen a try? It has fewer features, but
works more reliably, IMHO...

bruno

2016-01-07 16:32:51

by changename

Subject: Re: lost connectivity until "wpa_cli reassociate" is issued

On Thu, Jan 7, 2016 at 9:59 PM, David Mosberger <[email protected]> wrote:
>
> Ben,
>
> On Thu, Jan 7, 2016 at 9:24 AM, Ben Greear <[email protected]> wrote:
> > On 01/07/2016 08:19 AM, David Mosberger wrote:
> >>
> >> We are seeing a curious issue where WLAN connectivity sometimes
> >> gets stuck until a "wpa_cli reassociate" command is issued.
> >>
> >> At the WPA level, everything appears to be working fine
> >> (see thread starting at
> >> http://lists.infradead.org/pipermail/hostap/2016-January/034454.html).
> >
> > I don't remember seeing you mention the driver and NIC you are using.
> >
> > I think this is likely a driver bug, so please provide that info.
>
> Sure, we're using rtl8192cu. I started out suspecting a driver bug as
> well, but since we're processing management frames during those
> "stuck" periods just fine (see
> debug output in
> http://lists.infradead.org/pipermail/hostap/2016-January/034459.html),
> I'm not so sure anymore. Like mac80211, we have patched rtl8192cu
> driver with current
> bug-fixes already.

Management frames use a different queue (VO). From the logs, it looks
like a data path issue in the driver/FW.

The re-association might be clearing/triggering TX in the driver,
which would resolve the issue.

2016-01-07 17:00:07

by David Mosberger-Tang

Subject: Re: lost connectivity until "wpa_cli reassociate" is issued

On Thu, Jan 7, 2016 at 9:58 AM, Krishna Chaitanya
<[email protected]> wrote:

> We can check pending packets per queue at mac80211, but first
> we need some info from driver level. Someone familiar with RTL
> should help.
>
> cat /sys/kernel/debug/ieee80211/phy*/queues
>
> If you see non-zero packets here (or) if the queue is stopped
> that might explain this behavior...

Cool. I'll check that next time it happens, thanks!

--david
--
eGauge Systems LLC, http://egauge.net/, 1.877-EGAUGE1, fax 720.545.9768

2016-01-07 16:45:41

by David Mosberger-Tang

Subject: Re: lost connectivity until "wpa_cli reassociate" is issued

Ben,

On Thu, Jan 7, 2016 at 9:33 AM, Ben Greear <[email protected]> wrote:

> Could be
> a power-save issue perhaps.

That was our thought, too, particularly since the problem does not
appear to occur if there is steady traffic (at least a ping every
8 seconds).

However, the rtl8192cu doesn't support power-saving mode and in any
case, we made sure it's off:

# iw wlan0 get power_save
Power save: off

> I assume you have sniffed to see if any frames are going out on the air during the time of trouble?

No, I'm not actually sure how I'd do that. The site where we see this
most frequently is remote and we don't have any special WiFi packet
sniffer.

> If correct packets go out on the air and AP doesn't answer, then likely AP
> problem.

Not likely an AP problem, since we're seeing this with multiple APs of
different brands etc.

> If pkts don't get on the air, then check to see if they at least get to
> the driver.

Yeah, that should be easy to do, I suppose. We had rtl8192cu
debugging turned on before, but at that time it didn't help.

> If they don't get to the driver, then probably it is a kernel/stack issue.
>
> If they get to the driver but not on the air, then the NIC and/or its firmware
> and/or the driver is likely the culprit.

Thanks for your thoughts!

--david
--
eGauge Systems LLC, http://egauge.net/, 1.877-EGAUGE1, fax 720.545.9768

2016-01-07 16:47:42

by David Mosberger-Tang

Subject: Re: lost connectivity until "wpa_cli reassociate" is issued

On Thu, Jan 7, 2016 at 9:32 AM, Krishna Chaitanya
<[email protected]> wrote:

> Management frames use a different queue (VO)
> from the logs it looks like a data path issue in driver/FW.
>
> The re-association might be clearing/triggering TX in driver
> solving the issue.

That sounds plausible to me. Is there an easy way to see the queues?
I haven't tried debugfs yet.

--david
--
eGauge Systems LLC, http://egauge.net/, 1.877-EGAUGE1, fax 720.545.9768

2016-01-07 16:59:07

by changename

Subject: Re: lost connectivity until "wpa_cli reassociate" is issued

On Thu, Jan 7, 2016 at 10:17 PM, David Mosberger <[email protected]> wrote:
> On Thu, Jan 7, 2016 at 9:32 AM, Krishna Chaitanya
> <[email protected]> wrote:
>
>> Management frames use a different queue (VO)
>> from the logs it looks like a data path issue in driver/FW.
>>
>> The re-association might be clearing/triggering TX in driver
>> solving the issue.
>
> That sounds plausible to me. Is there an easy way to see the queues?
> I haven't tried debugfs yet.

We can check pending packets per queue at mac80211, but first
we need some info from the driver level. Someone familiar with RTL
should help.

cat /sys/kernel/debug/ieee80211/phy*/queues

If you see non-zero packets here, or if the queue is stopped,
that might explain this behavior...
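
A rough filter for that output, so only suspicious queues are shown. It assumes each line has the form `NN: <stop-reason mask in hex>/<pending count>`, which should be checked against the kernel in use; the function name `check_queues` is made up for illustration.

```shell
# check_queues
# Reads the "queues" debugfs output on stdin and prints only the queues
# that are stopped (non-zero mask) or have pending packets.
check_queues() {
    awk -F'[:/ ]+' '
        NF >= 3 {
            reasons = $2
            sub(/^0[xX]/, "", reasons)   # drop an optional 0x prefix
            gsub(/0/, "", reasons)       # anything left means a non-zero mask
            if (reasons != "" || $3 + 0 > 0)
                printf "queue %s: stop_reasons=%s pending=%s\n", $1, $2, $3
        }
    '
}

# Example (not run here; needs root and debugfs mounted):
#   cat /sys/kernel/debug/ieee80211/phy*/queues | check_queues
```

No output means every queue was running and empty at the moment of the read, so it is worth sampling repeatedly while connectivity is stuck.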

2016-01-07 16:24:30

by Ben Greear

Subject: Re: lost connectivity until "wpa_cli reassociate" is issued

On 01/07/2016 08:19 AM, David Mosberger wrote:
> We are seeing a curious issue where WLAN connectivity sometimes
> gets stuck until a "wpa_cli reassociate" command is issued.
>
> At the WPA level, everything appears to be working fine
> (see thread starting at
> http://lists.infradead.org/pipermail/hostap/2016-January/034454.html).

I don't remember seeing you mention the driver and NIC you are using.

I think this is likely a driver bug, so please provide that info.

Thanks,
Ben

>
> What's truly odd is when connectivity is stuck, you can often "ping" the
> network's gateway exactly once and then you either get no response anymore
> or, after a long time (like 40-60 seconds), the floodgate opens and
> (almost) all previously sent ping responses arrive all at the same time
> (in the latter case, connectivity is fine afterwards again).
>
> This makes me wonder if there may be a packet queue that somehow gets
> stuck.
>
> I have turned on mac80211 debugging but have not found anything
> that seems to correlate with the loss of connectivity. All I know is that
> wpa_cli reassociate so far is the least intrusive workaround that seems
> to reliably bring back connectivity.
>
> We are based on kernel 3.7.0 but have back-ported the mac80211 fixes
> from the linux-3.7.y stable branch.
>
> I'm not very familiar with the wireless stack, so any tips or hints on how to
> further debug this would be greatly appreciated.
>
> Thanks and best regards,
>
> --david
>


--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com


2016-01-07 16:29:40

by David Mosberger-Tang

Subject: Re: lost connectivity until "wpa_cli reassociate" is issued

Ben,

On Thu, Jan 7, 2016 at 9:24 AM, Ben Greear <[email protected]> wrote:
> On 01/07/2016 08:19 AM, David Mosberger wrote:
>>
>> We are seeing a curious issue where WLAN connectivity sometimes
>> gets stuck until a "wpa_cli reassociate" command is issued.
>>
>> At the WPA level, everything appears to be working fine
>> (see thread starting at
>> http://lists.infradead.org/pipermail/hostap/2016-January/034454.html).
>
> I don't remember seeing you mention the driver and NIC you are using.
>
> I think this is likely a driver bug, so please provide that info.

Sure, we're using rtl8192cu. I started out suspecting a driver bug as
well, but since we're processing management frames during those
"stuck" periods just fine (see debug output in
http://lists.infradead.org/pipermail/hostap/2016-January/034459.html),
I'm not so sure anymore. As with mac80211, we have already patched the
rtl8192cu driver with current bug fixes.

--david
--
eGauge Systems LLC, http://egauge.net/, 1.877-EGAUGE1, fax 720.545.9768