2016-01-28 05:34:37

by Linus Torvalds

Subject: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]

Hmm. So my daughter has a little Gigabyte Brix that has rtl8821ae
wireless in it. Yeah, nasty, I know, but it has actually worked
reasonably well.

.. except now I upgraded the nearest access point, and now wireless on
that machine no longer works.

Or rather, it actually *does* work in the sense that it authenticates,
it associates, and it actually gets a DHCP lease etc. So the darn
thing has an IP address and everything, but then nothing else seems to
go through after that. Very odd. My guess is that the auth/assoc/dhcp
thing happens at low rates, then it starts trying to up the rates, and
things go to hell.

But clearly several packets have gotten through. And then absolutely
nothing. Everything else is happy with the new AP, so this is not a
problem with the wireless network itself.

I'm appending the warning that gets printed, which may or may not be relevant.

This is with a clean and up-to-date Fedora 23 install, so that line 513 is the

512         /* RC is busted */
513         if (WARN_ON_ONCE(rates[i].idx >= sband->n_bitrates)) {
514                 rates[i].idx = -1;
515                 continue;
516         }

thing, which still exists in the same form in current kernels (except
in current -git it's line 625).

I do note that that rate_fixup_ratelist() function is a bit odd wrt
those rate indexes: it has code to make sure that there are no valid
rates following an invalid one:

                /*
                 * make sure there's no valid rate following
                 * an invalid one, just in case drivers don't
                 * take the API seriously to stop at -1.
                 */
                if (inval) {
                        rates[i].idx = -1;
                        continue;
                }
                if (rates[i].idx < 0) {
                        inval = true;
                        continue;
                }

but then that "RC is busted" case that generates a warning will add
one of those invalid rates in the middle anyway. So I get the feeling
that if that warning ever triggers, it will basically be screwing up
that whole rate table. I dunno.
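
A minimal sketch of that concern, written as a standalone model rather than the kernel function. The struct, the function name and the ordering of the checks are assumptions for illustration, and the MCS/VHT handling of the real code is left out entirely:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the idx field of a mac80211 rate entry. */
struct demo_rate {
	int idx;
};

/*
 * Simplified model of the two checks quoted above. Returns the number
 * of entries rejected by the "RC is busted" range check.
 */
static int demo_fixup_ratelist(struct demo_rate *rates, size_t n, int n_bitrates)
{
	bool inval = false;
	int busted = 0;
	size_t i;

	assert(rates != NULL);

	for (i = 0; i < n; i++) {
		/* once a -1 was seen, everything after it is invalidated */
		if (inval) {
			rates[i].idx = -1;
			continue;
		}
		if (rates[i].idx < 0) {
			inval = true;
			continue;
		}
		/* "RC is busted": the entry is dropped, but inval stays false */
		if (rates[i].idx >= n_bitrates) {
			rates[i].idx = -1;
			busted++;
			continue;
		}
	}
	return busted;
}
```

In this model, an out-of-range entry at the front of the list is turned into -1 while the valid entries behind it are kept, which is exactly the "invalid rate followed by valid ones" shape that the first check is meant to prevent.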

Is there anything sane I can do to help debug this case?

Linus

--- snip snip, relevant (?) wireless warning ---

IPv6: ADDRCONF(NETDEV_UP): enp3s0: link is not ready
r8169 0000:03:00.0 enp3s0: link down
IPv6: ADDRCONF(NETDEV_UP): enp3s0: link is not ready
IPv6: ADDRCONF(NETDEV_UP): wlp2s0: link is not ready
IPv6: ADDRCONF(NETDEV_UP): wlp2s0: link is not ready
IPv6: ADDRCONF(NETDEV_UP): wlp2s0: link is not ready
tun: Universal TUN/TAP device driver, 1.6
tun: (C) 1999-2004 Max Krasnyansky <[email protected]>
device virbr0-nic entered promiscuous mode
virbr0: port 1(virbr0-nic) entered listening state
virbr0: port 1(virbr0-nic) entered listening state
virbr0: port 1(virbr0-nic) entered disabled state
wlp2s0: authenticate with 46:d9:e7:92:bf:29
wlp2s0: send auth to 46:d9:e7:92:bf:29 (try 1/3)
wlp2s0: authenticated
wlp2s0: associate with 46:d9:e7:92:bf:29 (try 1/3)
wlp2s0: associate with 46:d9:e7:92:bf:29 (try 2/3)
wlp2s0: RX AssocResp from 46:d9:e7:92:bf:29 (capab=0x411 status=0 aid=1)
wlp2s0: associated
IPv6: ADDRCONF(NETDEV_CHANGE): wlp2s0: link becomes ready
------------[ cut here ]------------
WARNING: CPU: 2 PID: 0 at net/mac80211/rate.c:513
ieee80211_get_tx_rates+0x243/0x7d0 [mac80211]()
Modules linked in: ccm cmac xt_CHECKSUM ipt_MASQUERADE
nf_nat_masquerade_ipv4 tun nf_conntrack_netbios_ns
nf_conntrack_broadcast ip6t_rpfilter ip6t_REJECT nf_reject_ipv6
xt_conntrack ebtable_filter ebtable_nat ebtable_broute bridge ebtables
ip6table_raw ip6table_security ip6table_nat nf_conntrack_ipv6
nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_filter ip6_tables
iptable_raw iptable_security iptable_nat nf_conntrack_ipv4
nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle bnep
arc4 rtl8821ae vfat fat btcoexist rtl_pci rtlwifi mac80211
x86_pkg_temp_thermal coretemp snd_hda_codec_realtek snd_hda_codec_hdmi
snd_hda_codec_generic kvm_intel snd_soc_rt5640 kvm snd_soc_rl6231
snd_hda_intel snd_soc_core iTCO_wdt snd_hda_codec snd_compress btusb
snd_pcm_dmaengine snd_hda_core
iTCO_vendor_support cfg80211 ac97_bus btrtl snd_hwdep
crct10dif_pclmul btbcm snd_seq crc32_pclmul btintel crc32c_intel
bluetooth snd_seq_device joydev snd_pcm mei_me mei shpchp dw_dmac
tpm_tis lpc_ich i2c_i801 snd_timer rfkill snd tpm soundcore
snd_soc_sst_acpi dw_dmac_core i2c_designware_platform
i2c_designware_core nfsd auth_rpcgss nfs_acl lockd grace sunrpc
hid_logitech_hidpp hid_logitech_dj i915 i2c_algo_bit drm_kms_helper
8021q garp drm stp llc mrp r8169 sdhci_acpi mii sdhci mmc_core video
i2c_hid
CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.2.8-300.fc23.x86_64 #1
Hardware name: GIGABYTE M4HM87P-00/M4HM87P-00, BIOS F2 12/11/2013
0000000000000000 aad0aff724c0ea01 ffff88021ea83648 ffffffff817738ca
0000000000000000 0000000000000000 ffff88021ea83688 ffffffff8109e4c6
0000000000000000 ffff8800d1309630 ffff8800d1309600 ffff8800d130963c
Call Trace:
<IRQ> [<ffffffff817738ca>] dump_stack+0x45/0x57
[<ffffffff8109e4c6>] warn_slowpath_common+0x86/0xc0
[<ffffffff8109e5fa>] warn_slowpath_null+0x1a/0x20
[<ffffffffa07a4f93>] ieee80211_get_tx_rates+0x243/0x7d0 [mac80211]
[<ffffffff8164d6fb>] ? __alloc_skb+0x5b/0x210
[<ffffffffa07a5740>] rate_control_get_rate+0x120/0x150 [mac80211]
[<ffffffffa07b4c6d>] ieee80211_tx_h_rate_ctrl+0x1dd/0x420 [mac80211]
[<ffffffffa07b684c>] invoke_tx_handlers+0x2ec/0xe50 [mac80211]
[<ffffffff811c342c>] ? zone_statistics+0x7c/0xa0
[<ffffffffa07b7585>] ieee80211_tx+0x85/0x110 [mac80211]
[<ffffffffa07b84cb>] ieee80211_xmit+0x9b/0xf0 [mac80211]
[<ffffffffa07b92b4>] __ieee80211_subif_start_xmit+0x514/0x740 [mac80211]
[<ffffffff810d3d91>] ? enqueue_entity+0x441/0xc50
[<ffffffff810d3d91>] ? enqueue_entity+0x441/0xc50
[<ffffffff8101df79>] ? sched_clock+0x9/0x10
[<ffffffffa07b94f0>] ieee80211_subif_start_xmit+0x10/0x20 [mac80211]
[<ffffffff816634ed>] dev_hard_start_xmit+0x24d/0x3b0
[<ffffffff816867e9>] sch_direct_xmit+0x129/0x200
[<ffffffff816639cd>] __dev_queue_xmit+0x23d/0x550
[<ffffffff81663cf3>] dev_queue_xmit_sk+0x13/0x20
[<ffffffff8166c240>] neigh_resolve_output+0x120/0x1d0
[<ffffffff81718752>] ip6_finish_output2+0x192/0x4a0
[<ffffffff81698237>] ? nf_iterate+0x97/0xb0
[<ffffffff8171b0af>] ip6_finish_output+0x8f/0xf0
[<ffffffff8171b163>] ip6_output+0x53/0x100
[<ffffffff8171b020>] ? ip6_fragment+0xa70/0xa70
[<ffffffff8173d286>] NF_HOOK_THRESH.constprop.37+0x36/0xa0
[<ffffffff8173ba60>] ? ipv6_icmp_sysctl_init+0x40/0x40
[<ffffffff8173d44b>] mld_sendpack+0x15b/0x200
[<ffffffff8173e7af>] mld_ifc_timer_expire+0x17f/0x280
[<ffffffff8173e630>] ? igmp6_timer_handler+0x80/0x80
[<ffffffff81105ab9>] call_timer_fn+0x39/0xf0
[<ffffffff8173e630>] ? igmp6_timer_handler+0x80/0x80
[<ffffffff811060ef>] run_timer_softirq+0x20f/0x2c0
[<ffffffff810a287b>] __do_softirq+0xfb/0x290
[<ffffffff810a2c29>] irq_exit+0x119/0x120
[<ffffffff8177cfb6>] smp_apic_timer_interrupt+0x46/0x60
[<ffffffff8177b14b>] apic_timer_interrupt+0x6b/0x70
<EOI> [<ffffffff81616920>] ? cpuidle_enter_state+0x130/0x270
[<ffffffff816168fb>] ? cpuidle_enter_state+0x10b/0x270
[<ffffffff81616a97>] cpuidle_enter+0x17/0x20
[<ffffffff810dfd02>] call_cpuidle+0x32/0x60
[<ffffffff81616a73>] ? cpuidle_select+0x13/0x20
[<ffffffff810dff98>] cpu_startup_entry+0x268/0x320
[<ffffffff8104cd76>] start_secondary+0x186/0x1c0
---[ end trace b8b82c9c5f4318b8 ]---


2016-01-28 22:12:47

by Johannes Berg

Subject: Re: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]

On Thu, 2016-01-28 at 14:04 -0800, Linus Torvalds wrote:

> Well, it "makes a difference" in the sense that the warning goes
> away.
> But it doesn't make things work. In fact, it might be making things
> worse.

Heh, ok.

> Because with that patch, the wireless still authenticates and
> associates, but then it doesn't even get an IP address, so now even
> dhcp doesn't work. Of course, I was surprised that it worked last
> time, and I'm not 100% sure it did work consistently. I'll re-test
> without the patch, just to make sure, but it doesn't really seem to
> improve on anything.
>

It makes some sense, here's some speculation:

VHT rates are MCS 0-9. If the rate scaling decides to use only VHT
MCSes with a VHT-capable peer, then it stands to reason it might still
start at 0, but forget to set the VHT_MCS flag, so it would really use
rate index 0 from the table, which is 6 Mbps. Then, it would see that
"working" (since it's not the right thing) and scale up until it hits
MCS 8 or 9, which is no longer a valid rate (those are only 0-7).

Since the suggested changes make it worse, we can assume that this is
not the only place where VHT is simply completely broken, and fixing
VHT here will instead uncover a bug elsewhere, that was previously not
happening because we never got to real VHT rates.
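
The speculation above can be sketched as a small standalone model. The bitrate values are the ones printed for the 5 GHz table elsewhere in this thread (units of 100 kbit/s); the helper names are hypothetical and nothing here is driver code:

```c
#include <assert.h>
#include <stddef.h>

/* 5 GHz legacy bitrates in units of 100 kbit/s, as printed in this thread. */
static const int demo_bitrates_5g[] = { 60, 90, 120, 180, 240, 360, 480, 540 };
#define DEMO_N_BITRATES (sizeof(demo_bitrates_5g) / sizeof(demo_bitrates_5g[0]))

/*
 * Hypothetical helper: interpret idx as a legacy table index, which is
 * what happens when neither MCS flag is set. Returns -1 if out of range.
 */
static int demo_legacy_bitrate(int idx)
{
	if (idx < 0 || (size_t)idx >= DEMO_N_BITRATES)
		return -1;
	return demo_bitrates_5g[idx];
}

/*
 * Walk up through VHT MCS 0..9 and return the first value that has no
 * entry in the legacy table, or -1 if all of them fit.
 */
static int demo_first_invalid_mcs(void)
{
	int mcs;

	assert(DEMO_N_BITRATES == 8);
	for (mcs = 0; mcs <= 9; mcs++)
		if (demo_legacy_bitrate(mcs) < 0)
			return mcs;
	return -1;
}
```

Index 0 maps to 60 (6 Mbps) and index 7 to 540 (54 Mbps); the walk runs out of table at 8, which matches the idea that things appear to work until the scaling reaches MCS 8 or 9.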

Your best workaround may just be to ignore VHT for now - clearly it's
broken so using "just" HT (which is likely not that much of a penalty
anyway since you're apparently not using 80 MHz) will be much better.

Go into

_rtl_init_hw_vht_capab()

and just remove or stub out the entire contents of that (or you could
just remove the "vht_supported=true" if you feel like it.)

That should get it to HT only, which is likely tested and working
better.

johannes

2016-01-29 01:54:56

by Larry Finger

Subject: Re: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]

On 01/28/2016 05:01 PM, Linus Torvalds wrote:
> On Thu, Jan 28, 2016 at 2:12 PM, Johannes Berg
> <[email protected]> wrote:
>>
>> Your best workaround may just be to ignore VHT for now - clearly it's
>> broken so using "just" HT (which is likely not that much of a penalty
>> anyway since you're apparently not using 80 MHz) will be much better.
>>
>> Go into
>>
>> _rtl_init_hw_vht_capab()
>>
>> and just remove or stub out the entire contents of that (or you could
>> just remove the "vht_supported=true" if you feel like it.)
>>
>> That should get it to HT only, which is likely tested and working
>> better.
>
> Bingo. That indeed gets me working wireless. It's not super-fast, but
> I don't think it ever has been..
>
> If somebody has a suggested patch to actually *fix* VHT on this
> chipset, that would obviously be better. And maybe it works on some
> other chipsets, but not on mine. I'll happily test patches now that
> the merge window is over and I have some time again (and I can also
> make my AP do 80MHz channels if that matters, although as Johannes
> noted it's not enabled by default).
>
> For the realtek driver people, here is what lspci says:
>
> 02:00.0 Network controller: Realtek Semiconductor Co., Ltd. RTL8821AE
> 802.11ac PCIe Wireless Network Adapter
> Subsystem: AzureWave Device 2161
> Kernel driver in use: rtl8821ae
>
> (Numeric PCI ID: 10ec:8821, subsystem 1a3b:2161)
>
> Thanks,

Linus,

I have been running an RTL8821AE since kernel 3.18 without hitting this problem
using a TRENDnet AC1750 dual-band AP. The UniFi may be doing something that the
driver is not expecting.

There have also been some problems with the regdom in some models of these chips
that I also fail to see. It appears that some vendors are not coding the EEPROM
correctly. That should not affect your system.

Attached is a minimal patch that comments out the "vht_cap->vht_supported =
true;" statement for both RTL8821AE and RTL8812AE in _rtl_init_hw_vht_capab().
Does that allow your system to work? The patch also logs some information
regarding the channelplan and the country code. Please let me know the values
for those.

I apparently missed a previous complaint about this issue. If you still have the
reference, please send it to me.

Larry



Attachments:
rtl8821ae_test.patch (1.68 kB)

2016-01-28 20:40:31

by Johannes Berg

Subject: Re: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]

On Thu, 2016-01-28 at 11:01 -0800, Linus Torvalds wrote:

> I used to have the basic original UniFi UAP. I've replaced them with
> the newer "AC Lite" version:
>
>     https://www.ubnt.com/unifi/unifi-ap-ac-lite/
>
> so it's a fairly big jump from a 2.4GHz-only network to a dual-band
> one.
>
> The old 2.4GHz-only AP's showed the problem with minstrel-ht
> incorrectly starting off at the highest rate (on a totally different
> machine). So the Unifi AP's have shown problems in the kernel
> wireless before, but so far it's always been the fault of the kernel
> wireless, not the AP.

Yeah; I wasn't trying to blame it on this change, I was just trying to
understand the change in the environment. Seems likely that it's simply
the switch to 5 GHz, which is strange, I'd have thought that even that
rtlwifi driver would've been tested with that :)

> > Could you print out the entire table there when the warning
> > happens?
>
> This is the best I can come up with: printing out the index, and the
> rate and bitrate tables:
>
>   rates[i].idx (9) >= sband->n_bitrates (8)
>   Rates:
>       0: idx 9 count 1 flags a0
>       1: idx 8 count 1 flags a0
>       2: idx 7 count 2 flags a0
>       3: idx 6 count 3 flags a0

Yeah, perfect. See, this is already evidently not making any sense:

flags a0 is
IEEE80211_TX_RC_40_MHZ_WIDTH | IEEE80211_TX_RC_SHORT_GI

both of those options *require* IEEE80211_TX_RC_MCS or
IEEE80211_TX_RC_VHT_MCS as well, so the flags really should be a8 or
1a0.
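
As a rough illustration of that flag arithmetic: the bit positions below are what I recall from include/net/mac80211.h and should be treated as an assumption, and the DEMO_ names and the helper are hypothetical:

```c
#include <assert.h>

/* Assumed bit positions, mirroring the IEEE80211_TX_RC_* flags. */
#define DEMO_TX_RC_MCS          (1u << 3)
#define DEMO_TX_RC_40_MHZ_WIDTH (1u << 5)
#define DEMO_TX_RC_SHORT_GI     (1u << 7)
#define DEMO_TX_RC_VHT_MCS      (1u << 8)

/*
 * A 40 MHz or short-GI rate only makes sense as an HT or VHT MCS.
 * Returns 1 if the flags are self-consistent in that sense, else 0.
 */
static int demo_flags_consistent(unsigned int flags)
{
	const unsigned int needs_mcs = DEMO_TX_RC_40_MHZ_WIDTH | DEMO_TX_RC_SHORT_GI;
	const unsigned int any_mcs = DEMO_TX_RC_MCS | DEMO_TX_RC_VHT_MCS;

	/* the two masks must not share any bit */
	assert((needs_mcs & any_mcs) == 0);

	if (!(flags & needs_mcs))
		return 1;
	return (flags & any_mcs) != 0;
}
```

Under those assumed values, 0xa0 is rejected, while 0xa8 (with the HT MCS bit) and 0x1a0 (with the VHT MCS bit) pass.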

>   Bitrates:
>       0: flags 00000002 bitrate 60 (hw: 0004 0000)
>       1: flags 00000000 bitrate 90 (hw: 0005 0000)
>       2: flags 00000002 bitrate 120 (hw: 0006 0000)
>       3: flags 00000000 bitrate 180 (hw: 0007 0000)
>       4: flags 00000002 bitrate 240 (hw: 0008 0000)
>       5: flags 00000000 bitrate 360 (hw: 0009 0000)
>       6: flags 00000000 bitrate 480 (hw: 000a 0000)
>       7: flags 00000000 bitrate 540 (hw: 000b 0000)
>
> So it's the very first rate that has index 9, but the bitrate table
> only goes from 0-7.
>
> So I suspect that once the first index has been marked invalid, it
> now will never even look at the later indices, so it has no transmit
> rates at all.  Or something.

Indeed.

> That bitrate table does seem to match:
>
>    static struct ieee80211_rate rtl_ratetable_5g[] = {
>
> in drivers/net/wireless/realtek/rtlwifi/base.c
>

Yeah, it would, but it's irrelevant since the rate table isn't actually
used with MCS rates.

I'm not familiar with this code at all, but looking at it suggests that
perhaps the switch to 5 GHz wasn't at fault, but instead the switch to
VHT (802.11ac) - that's more plausible too, not testing with VHT seems
like something that could have happened for this driver.

And as I figured, the code in _rtl_rc_rate_set_series() is obviously
not handling VHT correctly: it has

                if (sgi_20 || sgi_40 || sgi_80)
                        rate->flags |= IEEE80211_TX_RC_SHORT_GI;
                if (sta && sta->ht_cap.ht_supported &&
                    ((wireless_mode == WIRELESS_MODE_N_5G) ||
                     (wireless_mode == WIRELESS_MODE_N_24G)))
                        rate->flags |= IEEE80211_TX_RC_MCS;

but can never set IEEE80211_TX_RC_VHT_MCS. Seems like there should be
something like

                if (sta && sta->vht_cap.vht_supported &&
                    (wireless_mode == WIRELESS_MODE_AC_5G ||
                     wireless_mode == WIRELESS_MODE_AC_24G ||
                     wireless_mode == WIRELESS_MODE_AC_ONLY))
                        rate->flags |= IEEE80211_TX_RC_VHT_MCS;

just after/before the above block.

But I'm not familiar with this code at all, so that may not really be
the right fix or even work.
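
To make the suggested flag logic concrete, here is a standalone model of it. Every name below is a stand-in (the enum values and flag bits are placeholders, and struct demo_sta is a hypothetical reduction of the peer capabilities), so this only shows the branching and says nothing about whether the change is sufficient in the driver:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder values; the real constants live in the rtlwifi and mac80211 headers. */
enum demo_wireless_mode {
	DEMO_MODE_N_24G   = 0x10,
	DEMO_MODE_N_5G    = 0x20,
	DEMO_MODE_AC_5G   = 0x40,
	DEMO_MODE_AC_24G  = 0x80,
	DEMO_MODE_AC_ONLY = 0x100,
};

#define DEMO_TX_RC_MCS      (1u << 3)
#define DEMO_TX_RC_SHORT_GI (1u << 7)
#define DEMO_TX_RC_VHT_MCS  (1u << 8)

/* Hypothetical reduced view of the peer's capabilities. */
struct demo_sta {
	bool ht_supported;
	bool vht_supported;
};

static unsigned int demo_rate_flags(const struct demo_sta *sta,
				    enum demo_wireless_mode mode, bool sgi)
{
	unsigned int flags = 0;

	if (sgi)
		flags |= DEMO_TX_RC_SHORT_GI;
	/* existing HT branch, as quoted above */
	if (sta && sta->ht_supported &&
	    (mode == DEMO_MODE_N_5G || mode == DEMO_MODE_N_24G))
		flags |= DEMO_TX_RC_MCS;
	/* suggested additional VHT branch */
	if (sta && sta->vht_supported &&
	    (mode == DEMO_MODE_AC_5G || mode == DEMO_MODE_AC_24G ||
	     mode == DEMO_MODE_AC_ONLY))
		flags |= DEMO_TX_RC_VHT_MCS;

	/* a rate is either an HT MCS or a VHT MCS, never both */
	assert(!((flags & DEMO_TX_RC_MCS) && (flags & DEMO_TX_RC_VHT_MCS)));
	return flags;
}
```

With only the HT branch, a peer in one of the AC modes ends up with neither MCS flag set; with the extra branch it gets the VHT one.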

johannes

2016-01-28 19:01:36

by Linus Torvalds

Subject: Re: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]

On Thu, Jan 28, 2016 at 4:13 AM, Johannes Berg
<[email protected]> wrote:
> On Wed, 2016-01-27 at 21:34 -0800, Linus Torvalds wrote:
>
>> .. except now I upgraded the nearest access point, and now wireless
>> on that machine no longer works.
>
> Can you describe the upgrade a bit more, just for background?

I used to have the basic original UniFi UAP. I've replaced them with
the newer "AC Lite" version:

https://www.ubnt.com/unifi/unifi-ap-ac-lite/

so it's a fairly big jump from a 2.4GHz-only network to a dual-band one.

The old 2.4GHz-only AP's showed the problem with minstrel-ht
incorrectly starting off at the highest rate (on a totally different
machine). So the Unifi AP's have shown problems in the kernel wireless
before, but so far it's always been the fault of the kernel wireless,
not the AP.

> Could you print out the entire table there when the warning happens?

This is the best I can come up with: printing out the index, and the
rate and bitrate tables:

rates[i].idx (9) >= sband->n_bitrates (8)
Rates:
0: idx 9 count 1 flags a0
1: idx 8 count 1 flags a0
2: idx 7 count 2 flags a0
3: idx 6 count 3 flags a0
Bitrates:
0: flags 00000002 bitrate 60 (hw: 0004 0000)
1: flags 00000000 bitrate 90 (hw: 0005 0000)
2: flags 00000002 bitrate 120 (hw: 0006 0000)
3: flags 00000000 bitrate 180 (hw: 0007 0000)
4: flags 00000002 bitrate 240 (hw: 0008 0000)
5: flags 00000000 bitrate 360 (hw: 0009 0000)
6: flags 00000000 bitrate 480 (hw: 000a 0000)
7: flags 00000000 bitrate 540 (hw: 000b 0000)

So it's the very first rate that has index 9, but the bitrate table
only goes from 0-7.

So I suspect that once the first index has been marked invalid, it now
will never even look at the later indices, so it has no transmit rates
at all. Or something.
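
That suspicion can be modelled in a few lines. This assumes a consumer that walks the table and stops at the first negative index; the function is hypothetical and is not taken from the driver:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of a driver that walks the rate table and stops
 * at the first negative index. Returns how many entries it would try.
 */
static size_t demo_usable_rates(const int *idx, size_t n)
{
	size_t i;

	assert(idx != NULL);
	for (i = 0; i < n; i++)
		if (idx[i] < 0)
			break;
	return i;
}
```

If indexes 9 and 8 are replaced by -1 because they exceed the 8-entry table, the list becomes -1, -1, 7, 6, and a consumer like this one would see zero usable rates even though two valid entries remain.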

That bitrate table does seem to match:

static struct ieee80211_rate rtl_ratetable_5g[] = {

in drivers/net/wireless/realtek/rtlwifi/base.c

Does this give you any ideas?

Linus

2016-01-28 21:44:07

by Linus Torvalds

Subject: Re: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]

Adding the RTL people to the cc, and leaving the whole thing quoted at
the bottom..

I will try Johannes' suggestion on that machine to see if it makes a
difference, but somebody who knows the rtlwifi rate control code
should take a double- or triple-look at this.

Please? Some googling shows that this is not a new issue. Or at least
I seem to find reports that look very much like this from over a year
ago.

Linus

On Thu, Jan 28, 2016 at 12:40 PM, Johannes Berg
<[email protected]> wrote:
> On Thu, 2016-01-28 at 11:01 -0800, Linus Torvalds wrote:
>>
>> I used to have the basic original UniFi UAP. I've replaced them with
>> the newer "AC Lite" version:
>>
>> https://www.ubnt.com/unifi/unifi-ap-ac-lite/
>>
>> so it's a fairly big jump from a 2.4GHz-only network to a dual-band
>> one.
>>
>> The old 2.4GHz-only AP's showed the problem with minstrel-ht
>> incorrectly starting off at the highest rate (on a totally different
>> machine). So the Unifi AP's have shown problems in the kernel
>> wireless before, but so far it's always been the fault of the kernel
>> wireless, not the AP.
>
> Yeah; I wasn't trying to blame it on this change, I was just trying to
> understand the change in the environment. Seems likely that it's simply
> the switch to 5 GHz, which is strange, I'd have thought that even that
> rtlwifi driver would've been tested with that :)
>
>> > Could you print out the entire table there when the warning
>> > happens?
>>
>> This is the best I can come up with: printing out the index, and the
>> rate and bitrate tables:
>>
>> rates[i].idx (9) >= sband->n_bitrates (8)
>> Rates:
>> 0: idx 9 count 1 flags a0
>> 1: idx 8 count 1 flags a0
>> 2: idx 7 count 2 flags a0
>> 3: idx 6 count 3 flags a0
>
> Yeah, perfect. See, this is already evidently not making any sense:
>
> flags a0 is
> IEEE80211_TX_RC_40_MHZ_WIDTH | IEEE80211_TX_RC_SHORT_GI
>
> both of those options *require* IEEE80211_TX_RC_MCS or
> IEEE80211_TX_RC_VHT_MCS as well, so the flags really should be a8 or
> 1a0.
>
>> Bitrates:
>> 0: flags 00000002 bitrate 60 (hw: 0004 0000)
>> 1: flags 00000000 bitrate 90 (hw: 0005 0000)
>> 2: flags 00000002 bitrate 120 (hw: 0006 0000)
>> 3: flags 00000000 bitrate 180 (hw: 0007 0000)
>> 4: flags 00000002 bitrate 240 (hw: 0008 0000)
>> 5: flags 00000000 bitrate 360 (hw: 0009 0000)
>> 6: flags 00000000 bitrate 480 (hw: 000a 0000)
>> 7: flags 00000000 bitrate 540 (hw: 000b 0000)
>>
>> So it's the very first rate that has index 9, but the bitrate table
>> only goes from 0-7.
>>
>> So I suspect that once the first index has been marked invalid, it
>> now will never even look at the later indices, so it has no transmit
>> rates at all. Or something.
>
> Indeed.
>
>> That bitrate table does seem to match:
>>
>> static struct ieee80211_rate rtl_ratetable_5g[] = {
>>
>> in drivers/net/wireless/realtek/rtlwifi/base.c
>>
>
> Yeah, it would, but it's irrelevant since the rate table isn't actually
> used with MCS rates.
>
> I'm not familiar with this code at all, but looking at it suggests that
> perhaps the switch to 5 GHz wasn't at fault, but instead the switch to
> VHT (802.11ac) - that's more plausible too, not testing with VHT seems
> like something that could have happened for this driver.
>
> And as I figured, the code in _rtl_rc_rate_set_series() is obviously
> not handling VHT correctly: it has
>
>                 if (sgi_20 || sgi_40 || sgi_80)
>                         rate->flags |= IEEE80211_TX_RC_SHORT_GI;
>                 if (sta && sta->ht_cap.ht_supported &&
>                     ((wireless_mode == WIRELESS_MODE_N_5G) ||
>                      (wireless_mode == WIRELESS_MODE_N_24G)))
>                         rate->flags |= IEEE80211_TX_RC_MCS;
>
> but can never set IEEE80211_TX_RC_VHT_MCS. Seems like there should be
> something like
>
>                 if (sta && sta->vht_cap.vht_supported &&
>                     (wireless_mode == WIRELESS_MODE_AC_5G ||
>                      wireless_mode == WIRELESS_MODE_AC_24G ||
>                      wireless_mode == WIRELESS_MODE_AC_ONLY))
>                         rate->flags |= IEEE80211_TX_RC_VHT_MCS;
>
> just after/before the above block.
>
> But I'm not familiar with this code at all, so that may not really be
> the right fix or even work.
>
> johannes

2016-01-28 12:13:10

by Johannes Berg

Subject: Re: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]

On Wed, 2016-01-27 at 21:34 -0800, Linus Torvalds wrote:

> .. except now I upgraded the nearest access point, and now wireless
> on that machine no longer works.

Can you describe the upgrade a bit more, just for background?

> Or rather, it actually *does* work in the sense that it
> authenticates, it associates, and it actually gets a DHCP lease etc.
> So the darn thing has an IP address and everything, but then nothing
> else seems to go through after that. Very odd. My guess is that the
> auth/assoc/dhcp thing happens at low rates, then it starts trying to
> up the rates, and things go to hell.

That's usually the case, yes. Auth/assoc/etc. management frames use low
rates anyway, and the first few data frames usually also do until it
scales up.

The code involved is drivers/net/wireless/realtek/rtlwifi/rc.c

> I do note that that rate_fixup_ratelist() function is a bit odd wrt
> those rate indexes: it has code to make sure that there are no valid
> rates following an invalid one:
>
>                 /*
>                  * make sure there's no valid rate following
>                  * an invalid one, just in case drivers don't
>                  * take the API seriously to stop at -1.
>                  */
>                 if (inval) {
>                         rates[i].idx = -1;
>                         continue;
>                 }
>                 if (rates[i].idx < 0) {
>                         inval = true;
>                         continue;
>                 }
>
> but then that "RC is busted" case that generates a warning will add
> one of those invalid rates in the middle anyway. So I get the feeling
> that if that warning ever triggers, it will basically be screwing up
> that whole rate table. I dunno.

This should be OK, it's more of a sanity check. The driver is supposed
to stop transmission attempts at the first -1 it seems, but the rate
control algorithm shouldn't generate useless attempts that will never
really get used, since that indicates a bug in the rate scaling.

> Is there anything sane I can do to help debug this case?

Could you print out the entire table there when the warning happens? Or
at least, it'd help to figure out at which index the invalid actually
happens. It seems that if that perhaps happens on the very first index,
the driver might get completely confused and perhaps not even send the
frame, which would lead to symptoms like the one you describe.

It seems plausible that there's a path somewhere in the rate scaling
code that forgets to set IEEE80211_TX_RC_MCS or so.

johannes

2016-01-29 08:33:18

by Johannes Berg

Subject: Re: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]

On Thu, 2016-01-28 at 19:54 -0600, Larry Finger wrote:

> I have been running an RTL8821AE since kernel 3.18 without hitting
> this problem
> using a TRENDnet AC1750 dual-band AP. The UniFi may be doing
> something that the
> driver is not expecting.

Are you quite sure you're actually using VHT though, perhaps the AP
somehow turned it off? It seems unlikely that you could successfully
use it in any way given that RATE_INFO_FLAGS_VHT_MCS doesn't show up in
the driver or rate scaling at all.


johannes

2016-01-28 22:04:54

by Linus Torvalds

Subject: Re: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]

On Thu, Jan 28, 2016 at 1:44 PM, Linus Torvalds
<[email protected]> wrote:
>
> I will try Johannes' suggestion on that machine to see if it makes a
> difference

Well, it "makes a difference" in the sense that the warning goes away.
But it doesn't make things work. In fact, it might be making things
worse.

Because with that patch, the wireless still authenticates and
associates, but then it doesn't even get an IP address, so now even
dhcp doesn't work. Of course, I was surprised that it worked last
time, and I'm not 100% sure it did work consistently. I'll re-test
without the patch, just to make sure, but it doesn't really seem to
improve on anything.

Linus

2016-01-29 04:19:50

by Linus Torvalds

Subject: Re: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]

On Thu, Jan 28, 2016 at 5:54 PM, Larry Finger <[email protected]> wrote:
>
> I have been running an RTL8821AE since kernel 3.18 without hitting this
> problem using a TRENDnet AC1750 dual-band AP. The UniFi may be doing
> something that the driver is not expecting.

I've had issues with unifi ap's before, but to be honest, I've had
issues with lots of hotel and airport wifi too. I don't think the
Unifi APs are outside of the normal spectrum..

> Attached is a minimal patch that comments out the "vht_cap->vht_supported =
> true;" statement for both RTL8821AE and RTL8812AE in
> _rtl_init_hw_vht_capab(). Does that allow your system to work?

That works too, yes.

> The patch
> also logs some information regarding the channelplan and the country code.
> Please let me know the values for those.

rtlwifi: **** channelplan 127
rtlwifi: **** country code 13

> I apparently missed a previous complaint about this issue. If you still have
> the reference, please send it to me.

So googling for similar issues, I found

https://bugzilla.redhat.com/show_bug.cgi?id=1168467
https://bugzilla.redhat.com/show_bug.cgi?id=1293136

where that second one in particular looks very like my issue
("Association succeeds, and ARP/DHCP work, but no IP frames can be
transmitted").

(In both cases you have to go into the dmesg attachment to see that
it's rtlwifi.)

And there's an ubuntuforum thread

http://ubuntuforums.org/showthread.php?t=2226009&page=2

where if you follow the thing, it's an rtl chip on a PCI card, and it
has very similar "connected but no internet" behavior, along with the
"net/mac80211/rate.c:526" warning (different line numbers, different
kernel version, but it smells similar).

Or this one:

http://forums.debian.net/viewtopic.php?f=5&t=111781

which is also rtl-wifi, and also has the "associated, connected, got
an IP, but no data, not even a ping" behavior. It also has the
warning, but it looks different in other ways (2.4GHz only and
actually says it's not doing HT/VHT).

So I don't know. The warning in net/mac80211/rate.c does seem to be
associated with the realtek driver.

Linus

2016-01-28 23:01:27

by Linus Torvalds

Subject: Re: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]

On Thu, Jan 28, 2016 at 2:12 PM, Johannes Berg
<[email protected]> wrote:
>
> Your best workaround may just be to ignore VHT for now - clearly it's
> broken so using "just" HT (which is likely not that much of a penalty
> anyway since you're apparently not using 80 MHz) will be much better.
>
> Go into
>
> _rtl_init_hw_vht_capab()
>
> and just remove or stub out the entire contents of that (or you could
> just remove the "vht_supported=true" if you feel like it.)
>
> That should get it to HT only, which is likely tested and working
> better.

Bingo. That indeed gets me working wireless. It's not super-fast, but
I don't think it ever has been..

If somebody has a suggested patch to actually *fix* VHT on this
chipset, that would obviously be better. And maybe it works on some
other chipsets, but not on mine. I'll happily test patches now that
the merge window is over and I have some time again (and I can also
make my AP do 80MHz channels if that matters, although as Johannes
noted it's not enabled by default).

For the realtek driver people, here is what lspci says:

02:00.0 Network controller: Realtek Semiconductor Co., Ltd. RTL8821AE
802.11ac PCIe Wireless Network Adapter
Subsystem: AzureWave Device 2161
Kernel driver in use: rtl8821ae

(Numeric PCI ID: 10ec:8821, subsystem 1a3b:2161)

Thanks,

Linus