2024-01-15 14:43:39

by coldolt

Subject: [REGRESSION] 6.7 broke wifi "AP is in CSA process, reject auth"

I'm on Arch linux, updated the kernel from 6.6.10 -> 6.7.

Now it doesn't connect to my 5GHz wifi, to 2.4GHz it still connects.
Also the earlier kernel version still works. Output from "sudo dmesg |
grep -i wlp2s0":

> [ 6.049600] iwlwifi 0000:02:00.0 wlp2s0: renamed from wlan0
> [ 131.095861] wlp2s0: AP is in CSA process, reject auth
> [ 132.143170] wlp2s0: AP is in CSA process, reject auth
> [ 133.599906] wlp2s0: AP is in CSA process, reject auth
> [ 135.549325] wlp2s0: AP is in CSA process, reject auth
> [ 145.510438] wlp2s0: AP is in CSA process, reject auth

I notice that the commit c09c4f31998bac, which was added to kernel
6.7, introduced rejecting a connection with that error message "AP is
in CSA process, reject auth".

My guess is that commit is the cause of the regression.

I have a Dell E5550 laptop, lspci -k shows:

> 02:00.0 Network controller: Intel Corporation Wireless 7265 (rev 59)
> Subsystem: Intel Corporation Dual Band Wireless-AC 7265 [Stone Peak 2 AC]
> Kernel driver in use: iwlwifi
> Kernel modules: iwlwifi

Output from "sudo dmesg | grep -i wifi":

> [ 5.198655] Intel(R) Wireless WiFi driver for Linux
> [ 5.221823] iwlwifi 0000:02:00.0: enabling device (0000 -> 0002)
> [ 5.230357] iwlwifi 0000:02:00.0: Detected crf-id 0x0, cnv-id 0x0 wfpm id 0x0
> [ 5.230513] iwlwifi 0000:02:00.0: PCI dev 095a/5410, rev=0x210, rfid=0xd55555d5
> [ 5.272339] iwlwifi 0000:02:00.0: Found debug destination: EXTERNAL_DRAM
> [ 5.272344] iwlwifi 0000:02:00.0: Found debug configuration: 0
> [ 5.273573] iwlwifi 0000:02:00.0: loaded firmware version 29.4063824552.0 7265D-29.ucode op_mode iwlmvm
> [ 5.551806] iwlwifi 0000:02:00.0: Detected Intel(R) Dual Band Wireless AC 7265, REV=0x210
> [ 5.565689] iwlwifi 0000:02:00.0: Applying debug destination EXTERNAL_DRAM
> [ 5.566606] iwlwifi 0000:02:00.0: Allocated 0x00400000 bytes for firmware monitor.
> [ 5.577802] iwlwifi 0000:02:00.0: base HW address: 34:02:86:17:53:27, OTP minor version: 0x0
> [ 6.049600] iwlwifi 0000:02:00.0 wlp2s0: renamed from wlan0
> [ 6.559212] iwlwifi 0000:02:00.0: Applying debug destination EXTERNAL_DRAM
> [ 6.638617] iwlwifi 0000:02:00.0: Applying debug destination EXTERNAL_DRAM
> [ 6.640695] iwlwifi 0000:02:00.0: FW already configured (0) - re-configuring
> [ 6.657163] iwlwifi 0000:02:00.0: Registered PHC clock: iwlwifi-PTP, with index: 1
> [ 6.710776] iwlwifi 0000:02:00.0: Applying debug destination EXTERNAL_DRAM
> [ 6.790826] iwlwifi 0000:02:00.0: Applying debug destination EXTERNAL_DRAM
> [ 6.792667] iwlwifi 0000:02:00.0: FW already configured (0) - re-configuring

#regzbot introduced: c09c4f31998bac


2024-01-15 20:00:40

by Johannes Berg

Subject: Re: [REGRESSION] 6.7 broke wifi "AP is in CSA process, reject auth"

On Mon, 2024-01-15 at 16:39 +0200, coldolt wrote:
> I'm on Arch linux, updated the kernel from 6.6.10 -> 6.7.
>
> Now it doesn't connect to my 5GHz wifi, to 2.4GHz it still connects.
> Also the earlier kernel version still works. Output from "sudo dmesg |
> grep -i wlp2s0":
>
> > [ 6.049600] iwlwifi 0000:02:00.0 wlp2s0: renamed from wlan0
> > [ 131.095861] wlp2s0: AP is in CSA process, reject auth
> > [ 132.143170] wlp2s0: AP is in CSA process, reject auth
> > [ 133.599906] wlp2s0: AP is in CSA process, reject auth
> > [ 135.549325] wlp2s0: AP is in CSA process, reject auth
> > [ 145.510438] wlp2s0: AP is in CSA process, reject auth
>
> I notice that the commit c09c4f31998bac, which was added to kernel
> 6.7, introduced rejecting a connection with that error message "AP is
> in CSA process, reject auth".
>
> My guess is that commit is the cause of the regression.

I guess? But that was quite intentional - we don't handle connecting
well while the AP is switching channels.

This really shouldn't persist for longer than a few seconds though, even
the 15 seconds sounds pretty excessive.

Could you show the output of

$ sudo iw wlp2s0 scan -u

for this AP?

johannes

2024-01-15 23:56:20

by coldolt

Subject: Re: [REGRESSION] 6.7 broke wifi "AP is in CSA process, reject auth"

I can try to keep connecting for over 5 minutes, it never connects,
keeps outputting the same dmesg message. The kernel before 6.7
connects immediately.

The router is an Asus RT-AC53. Output of "sudo iw wlp2s0 scan -u" for it is:

BSS b0:6e:bf:76:0a:3c(on wlp2s0)
    last seen: 752.658s [boottime]
    TSF: 177237341 usec (0d, 00:02:57)
    freq: 5180.0
    beacon interval: 200 TUs
    capability: ESS Privacy ShortPreamble SpectrumMgmt APSD (0x0931)
    signal: -61.00 dBm
    last seen: 90 ms ago
    Information elements from Probe Response frame:
    SSID: internet5
    Supported rates: 6.0* 9.0 12.0* 18.0 24.0* 36.0 48.0 54.0
    DS Parameter set: channel 36
    Unknown IE (60): 01 16 24 09
    HT capabilities:
        Capabilities: 0x16e
            HT20/HT40
            SM Power Save disabled
            RX HT20 SGI
            RX HT40 SGI
            RX STBC 1-stream
            Max AMSDU length: 3839 bytes
            No DSSS/CCK HT40
        Maximum RX AMPDU length 32767 bytes (exponent: 0x002)
        Minimum RX AMPDU time spacing: 4 usec (0x05)
        HT RX MCS rate indexes supported: 0-7, 32
        HT TX MCS rate indexes are undefined
    HT operation:
         * primary channel: 36
         * secondary channel offset: above
         * STA channel width: any
         * RIFS: 0
         * HT protection: no
         * non-GF present: 0
         * OBSS non-GF present: 0
         * dual beacon: 0
         * dual CTS protection: 0
         * STBC beacon: 0
         * L-SIG TXOP Prot: 0
         * PCO active: 0
         * PCO phase: 0
    RSN:     * Version: 1
         * Group cipher: CCMP
         * Pairwise ciphers: CCMP
         * Authentication suites: PSK
         * Capabilities: 1-PTKSA-RC 1-GTKSA-RC (0x0000)
    WMM:     * Parameter version 1
         * u-APSD
         * BE: CW 15-1023, AIFSN 3
         * BK: CW 15-1023, AIFSN 7
         * VI: CW 7-15, AIFSN 2, TXOP 3008 usec
         * VO: CW 3-7, AIFSN 2, TXOP 1504 usec
    BSS Load:
         * station count: 0
         * channel utilisation: 9/255
         * available admission capacity: 31250 [*32us]
    Vendor specific: OUI 00:0c:43, data: 03 00 00 00
    Power constraint: 3 dB
    Country: FR Environment: Indoor/Outdoor
        Channels [36 - 96] @ 16 dBm
    VHT capabilities:
        VHT Capabilities (0x31c00120):
            Max MPDU length: 3895
            Supported Channel Width: neither 160 nor 80+80
            short GI (80 MHz)
            +HTC-VHT
            RX antenna pattern consistency
            TX antenna pattern consistency
        VHT RX MCS set:
            1 streams: MCS 0-9
            2 streams: not supported
            3 streams: not supported
            4 streams: not supported
            5 streams: not supported
            6 streams: not supported
            7 streams: not supported
            8 streams: not supported
        VHT RX highest supported: 292 Mbps
        VHT TX MCS set:
            1 streams: MCS 0-9
            2 streams: not supported
            3 streams: not supported
            4 streams: not supported
            5 streams: not supported
            6 streams: not supported
            7 streams: not supported
            8 streams: not supported
        VHT TX highest supported: 292 Mbps
        VHT extended NSS: not supported
    VHT operation:
         * channel width: 1 (80 MHz)
         * center freq segment 1: 42
         * center freq segment 2: 0
         * VHT basic MCS set: 0xfffe



2024-01-16 11:44:13

by Johannes Berg

Subject: Re: [REGRESSION] 6.7 broke wifi "AP is in CSA process, reject auth"

On Tue, 2024-01-16 at 01:56 +0200, coldolt wrote:
> I can try to keep connecting for over 5 minutes, it never connects,
> keeps outputting the same dmesg message. The kernel before 6.7
> connects immediately.


> DS Parameter set: channel 36
> Unknown IE (60): 01 16 24 09
>

So it is indeed - as expected of course - advertising that it's about to
switch channel and even saying everyone else needs to be *quiet*. This is

struct ieee80211_ext_chansw_ie {
	u8 mode;
	u8 new_operating_class;
	u8 new_ch_num;
	u8 count;
} __packed;


so mode==1 indicates quiet, new_operating_class/new_ch_num are actually
the channel it's currently on, and count is 9.
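
As a minimal user-space sketch (not kernel code), the four bytes from the
scan output, 01 16 24 09, can be decoded against that layout. Here
uint8_t stands in for the kernel's u8, and the names ext_chansw_ie and
parse_ext_chansw are made up for illustration:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* User-space copy of the layout quoted above; four single-byte fields,
 * so there is no padding to worry about. */
struct ext_chansw_ie {
	uint8_t mode;
	uint8_t new_operating_class;
	uint8_t new_ch_num;
	uint8_t count;
};

/* Parse the body of element ID 60 (the bytes after the id/length header).
 * Returns 0 on success, -1 if the body does not have the expected length. */
int parse_ext_chansw(const uint8_t *body, size_t len,
		     struct ext_chansw_ie *out)
{
	if (len != sizeof(*out))
		return -1;
	memcpy(out, body, sizeof(*out));
	return 0;
}
```

Fed the bytes { 0x01, 0x16, 0x24, 0x09 }, this yields mode 1, operating
class 0x16 (22), channel 0x24 (36, matching the "DS Parameter set" line)
and count 9.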

Can you say if it actually changes the count? Maybe capture on channel
36 using the NIC as a sniffer what it does over time:
https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi/debugging#air_sniffing


Initially I'd say though that if this situation persists, then your AP
is having some problems and we'd not have stayed connected without the
patch in question either. If you want, maybe revert and see what the
symptom is then?

johannes

2024-01-16 22:14:54

by coldolt

Subject: Re: [REGRESSION] 6.7 broke wifi "AP is in CSA process, reject auth"

On Tue, Jan 16, 2024 at 1:43 PM Johannes Berg <[email protected]> wrote:
> so mode==1 indicates quiet, new_operating_class/new_ch_num are actually
> the channel it's currently on, and count is 9.
>
> Can you say if it actually changes the count? Maybe capture on channel
> 36 using the NIC as a sniffer what it does over time:
> https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi/debugging#air_sniffing

If I keep checking it the line "Unknown IE (60): 01 16 24 09" seems to
always stay the same, the 09 doesn't change, it's the same today as it
was yesterday.

I captured the channel 36 for 15 mins, here is the 34MB file in
gdrive: https://drive.google.com/file/d/1yqDb3g3Cfttm4W-Jb5AA51nLQ7OglWhl/view?usp=sharing

> Initially I'd say though that if this situation persists, then your AP
> is having some problems and we'd not have stayed connected without the
> patch in question either. If you want, maybe revert and see what the
> symptom is then?

I now compiled and installed commit c09c4f3 and its parent 2bf57b0.
The 2bf57b0 works great, connects immediately and I used it for 30+
mins, also tried to connect/disconnect 5+ times smoothly, no symptoms.
The c09c4f3 has the problem described originally, never connects even
if trying 10+ times in 5+ minutes, keeps outputting the same dmesg
message "AP is in CSA process, reject auth".

2024-01-16 22:45:54

by Johannes Berg

Subject: Re: [REGRESSION] 6.7 broke wifi "AP is in CSA process, reject auth"

On Tue, 2024-01-16 at 22:37 +0200, coldolt wrote:
> On Tue, Jan 16, 2024 at 1:43 PM Johannes Berg <[email protected]> wrote:
> > so mode==1 indicates quiet, new_operating_class/new_ch_num are actually
> > the channel it's currently on, and count is 9.
> >
> > Can you say if it actually changes the count? Maybe capture on channel
> > 36 using the NIC as a sniffer what it does over time:
> > https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi/debugging#air_sniffing
>
> If I keep checking it the line "Unknown IE (60): 01 16 24 09" seems to
> always stay the same, the 09 doesn't change, it's the same today as it
> was yesterday.

Interesting.

> I captured the channel 36 for 15 mins, here is the 34MB file in
> gdrive: https://drive.google.com/file/d/1yqDb3g3Cfttm4W-Jb5AA51nLQ7OglWhl/view?usp=sharing

Thanks! I got it, feel free to delete.

It's interesting, the AP is not including the element in the beacon, but
is unconditionally and always including it in the probe responses.
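
A rough sketch of how one might check a captured frame for this, assuming
the usual id/length/value encoding of information elements. The function
name find_element and the macro EID_EXT_CHANSWITCH_ANN are illustrative,
not taken from the kernel:

```c
#include <stddef.h>
#include <stdint.h>

/* Element ID 60, shown as "Unknown IE (60)" in the scan output. */
#define EID_EXT_CHANSWITCH_ANN 60

/* Walk a buffer of id/length/value elements and return a pointer to the
 * header of the first element with the given id, or NULL if it is absent
 * or the buffer is truncated. */
const uint8_t *find_element(const uint8_t *ies, size_t len, uint8_t id)
{
	size_t pos = 0;

	while (len - pos >= 2) {
		size_t elen = ies[pos + 1];

		if (elen > len - pos - 2)
			return NULL;	/* element runs past the buffer */
		if (ies[pos] == id)
			return &ies[pos];
		pos += 2 + elen;
	}
	return NULL;
}
```

Running this over the element portion of a beacon and of a probe response
from the same AP would show the mismatch described here: NULL for the
beacon, a match for the probe response.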

> > Initially I'd say though that if this situation persists, then your AP
> > is having some problems and we'd not have stayed connected without the
> > patch in question either. If you want, maybe revert and see what the
> > symptom is then?
>
> I now compiled and installed commit c09c4f3 and its parent 2bf57b0.
> The 2bf57b0 works great, connects immediately and I used it for 30+
> mins, also tried to connect/disconnect 5+ times smoothly, no symptoms.
> The c09c4f3 has the problem described originally, never connects even
> if trying 10+ times in 5+ minutes, keeps outputting the same dmesg
> message "AP is in CSA process, reject auth".

Yes, I didn't think of it being broken enough to include it in probe
responses but not beacons ... So it makes sense that w/o that we could
connect and not even disconnect, because while connected we ignore
(E)CSA in probe responses...

I'll think about it for a bit and ask some folks, but I guess we'll just
remove the check for probe responses from this ... there's little
guarantee anyway that this works well. Or maybe if we find it in probe
response check if we have a recent beacon? But that's tricky, we may
not, if we got a probe response on the channel but no beacon.
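
Purely as an illustration of another option, and not the kernel's actual
logic: since the count in this AP's element never moves, a hypothetical
heuristic could treat the element as stuck when the same count is seen
again after more time than a real switch would need. With a 200 TU beacon
interval (about 204.8 ms) and a count of 9, a real switch should be over
in roughly 1.8 seconds. All names below (ecsa_obs, ecsa_looks_stuck,
min_gap_ms) are assumptions for the sketch:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-AP record of when a given ECSA count was first seen. */
struct ecsa_obs {
	bool valid;
	uint8_t count;
	uint64_t first_seen_ms;
};

/* Returns true if the same count has been observed for at least
 * min_gap_ms, i.e. the countdown is not progressing. A changed count
 * restarts the observation window. */
bool ecsa_looks_stuck(struct ecsa_obs *obs, uint8_t count,
		      uint64_t now_ms, uint64_t min_gap_ms)
{
	if (obs->valid && obs->count == count)
		return now_ms - obs->first_seen_ms >= min_gap_ms;

	/* First observation, or the count moved: start a new window. */
	obs->valid = true;
	obs->count = count;
	obs->first_seen_ms = now_ms;
	return false;
}
```

A caller would only skip the "AP is in CSA process" rejection once this
returns true; a count that is actually decrementing never trips it.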

johannes


2024-01-19 10:54:59

by Thorsten Leemhuis

Subject: Re: [REGRESSION] 6.7 broke wifi "AP is in CSA process, reject auth"

[TLDR: I'm adding this report to the list of tracked Linux kernel
regressions; the text you find below is based on a few template
paragraphs you might have encountered already in similar form.
See link in footer if these mails annoy you.]

On 15.01.24 15:39, coldolt wrote:
> I'm on Arch linux, updated the kernel from 6.6.10 -> 6.7.
>
> Now it doesn't connect to my 5GHz wifi, to 2.4GHz it still connects.
> Also the earlier kernel version still works. Output from "sudo dmesg |
> grep -i wlp2s0":
>
>> [ 6.049600] iwlwifi 0000:02:00.0 wlp2s0: renamed from wlan0
>> [ 131.095861] wlp2s0: AP is in CSA process, reject auth
>> [ 132.143170] wlp2s0: AP is in CSA process, reject auth
>> [ 133.599906] wlp2s0: AP is in CSA process, reject auth
>> [ 135.549325] wlp2s0: AP is in CSA process, reject auth
>> [ 145.510438] wlp2s0: AP is in CSA process, reject auth
>
> I notice that the commit c09c4f31998bac, which was added to kernel
> 6.7, introduced rejecting a connection with that error message "AP is
> in CSA process, reject auth".

#regzbot ^introduced c09c4f31998bac
#regzbot title wifi: cfg80211: can't connect to my 5GHz wifi anymore
#regzbot monitor:
https://lore.kernel.org/all/[email protected]/
#regzbot fix: wifi: cfg80211: detect stuck ECSA element in probe resp
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.