2021-11-29 10:16:27

by Wen Gong

Subject: [PATCH] ath11k: enable IEEE80211_HW_SINGLE_SCAN_ON_ALL_BANDS for WCN6855

Currently mac80211 sends three scan requests for each scan on WCN6855,
one each for the 2.4 GHz, 5 GHz and 6 GHz bands. WCN6855 firmware
caches the RNR IE (Reduced Neighbor Report element) found in the
2.4 GHz/5 GHz beacons of an AP that is co-located with a 6 GHz AP,
and uses that cache for the 6 GHz scan if the 6 GHz scan is part of
the same scan as the 2.4 GHz/5 GHz bands. This helps to find more
6 GHz APs. It also reduces the scan time, because the firmware uses
dual-band scan for 2.4 GHz/5 GHz, which means the 2.4 GHz and 5 GHz
scans run simultaneously.

Set the flag IEEE80211_HW_SINGLE_SCAN_ON_ALL_BANDS for WCN6855, since
it supports 2.4 GHz/5 GHz/6 GHz and is single pdev, which means that
all of the 2.4 GHz/5 GHz/6 GHz bands exist in the same wiphy/ieee80211_hw.

Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-01720.1-QCAHSPSWPL_V1_V2_SILICONZ_LITE-1

Signed-off-by: Wen Gong <[email protected]>
---
drivers/net/wireless/ath/ath11k/mac.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/drivers/net/wireless/ath/ath11k/mac.c b/drivers/net/wireless/ath/ath11k/mac.c
index 06d20658586a..8218ea52f285 100644
--- a/drivers/net/wireless/ath/ath11k/mac.c
+++ b/drivers/net/wireless/ath/ath11k/mac.c
@@ -7915,6 +7915,9 @@ static int __ath11k_mac_register(struct ath11k *ar)

 	ar->hw->wiphy->interface_modes = ab->hw_params.interface_modes;
 
+	if (ab->hw_params.single_pdev_only && ar->supports_6ghz)
+		ieee80211_hw_set(ar->hw, SINGLE_SCAN_ON_ALL_BANDS);
+
 	ieee80211_hw_set(ar->hw, SIGNAL_DBM);
 	ieee80211_hw_set(ar->hw, SUPPORTS_PS);
 	ieee80211_hw_set(ar->hw, SUPPORTS_DYNAMIC_PS);
--
2.31.1
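
As a rough illustration of the effect described in the commit message, the following user-space sketch counts how many scan requests one full scan would need with and without a "single scan on all bands" capability. It is only a model under stated assumptions: the enum and the function scan_requests_needed() are hypothetical names for this example and are not mac80211 or ath11k code.

```c
#include <stddef.h>

/* Hypothetical band identifiers for this sketch (not the kernel's enum). */
enum band { BAND_2GHZ, BAND_5GHZ, BAND_6GHZ, NUM_BANDS };

/*
 * Count how many scan requests one full scan would need.
 * band_has_channels[b] is nonzero when band b has channels to scan.
 * With single_scan_on_all_bands set, all bands share one request;
 * otherwise every populated band gets its own request.
 */
static int scan_requests_needed(const int band_has_channels[NUM_BANDS],
				int single_scan_on_all_bands)
{
	int populated = 0;
	size_t b;

	for (b = 0; b < NUM_BANDS; b++)
		if (band_has_channels[b])
			populated++;

	if (populated == 0)
		return 0;

	return single_scan_on_all_bands ? 1 : populated;
}
```

For a device with channels in all three bands, this model gives three requests without the flag and one request with it, which matches the behaviour the commit message describes.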



2021-12-03 14:09:39

by Sven Eckelmann

Subject: Re: [PATCH] ath11k: enable IEEE80211_HW_SINGLE_SCAN_ON_ALL_BANDS for WCN6855

On Monday, 29 November 2021 11:13:09 CET Wen Gong wrote:
> Currently mac80211 will send 3 scan request for each scan of WCN6855,
> they are 2.4 GHz/5 GHz/6 GHz band scan. Firmware of WCN6855 will
> cache the RNR IE(Reduced Neighbor Report element) which exist in the
> beacon of 2.4 GHz/5 GHz of the AP which is co-located with 6 GHz,
> and then use the cache to scan in 6 GHz band scan if the 6 GHz scan
> is in the same scan with the 2.4 GHz/5 GHz band, this will helpful to
> search more AP of 6 GHz. Also it will decrease the time cost of scan
> because firmware will use dual-band scan for the 2.4 GHz/5 GHz, it
> means the 2.4 GHz and 5 GHz scans are doing simultaneously.
>
> Set the flag IEEE80211_HW_SINGLE_SCAN_ON_ALL_BANDS for WCN6855 since
> it supports 2.4 GHz/5 GHz/6 GHz and it is single pdev which means
> all the 2.4 GHz/5 GHz/6 GHz exist in the same wiphy/ieee80211_hw.
>
> Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-01720.1-QCAHSPSWPL_V1_V2_SILICONZ_LITE-1

I've tested this on ath-next on commit a93789ae541c ("ath11k: Avoid NULL ptr
access during mgmt tx cleanup") with a WCN6856 card (EmWicon/jjplus WMX7205)
with firmware WLAN.HSP.1.1-02892.1-QCAHSPSWPL_V1_V2_SILICONZ_LITE-1. ath-next
was required for me because 32 MSI vectors are not available on the
used system.

Without this patch, it works fine. With the patch, I just have to connect to an
AP via wpa_supplicant to crash the system. See the attached x86-64 .config, the
stacktrace and the decoded stacktrace.

Kind regards,
Sven


Attachments:
station_connect_crash_decoded.txt (9.68 kB)
station_connect_crash.txt (4.68 kB)
.config.xz (49.82 kB)

2021-12-06 03:29:47

by Wen Gong

Subject: Re: [PATCH] ath11k: enable IEEE80211_HW_SINGLE_SCAN_ON_ALL_BANDS for WCN6855

On 12/3/2021 10:09 PM, Sven Eckelmann wrote:
> On Monday, 29 November 2021 11:13:09 CET Wen Gong wrote:
...
> I've tested this on ath-next on commit a93789ae541c ("ath11k: Avoid NULL ptr
> access during mgmt tx cleanup") with a WCN6856 card (EmWicon/jjplus WMX7205)
> with firmware WLAN.HSP.1.1-02892.1-QCAHSPSWPL_V1_V2_SILICONZ_LITE-1. ath-next
> was required for me because 32 MSI vectors are not available on the
> used system.
>
> Without this patch, it works fine. With patch, I just have to connect to an AP
> via wpa_supplicant to crash the system. See the attached x86-64 .config, the
> stacktrace and the decoded stacktrace.

I tested this in my setup and did not see the crash.

I am afraid you also need this patch ("ath11k: change to use dynamic
memory for channel list of scan",
https://patchwork.kernel.org/project/linux-wireless/patch/[email protected]).

Could you apply this patch and try again?

> Kind regards,
> Sven

2021-12-06 06:56:21

by Sven Eckelmann

Subject: Re: [PATCH] ath11k: enable IEEE80211_HW_SINGLE_SCAN_ON_ALL_BANDS for WCN6855

On Monday, 6 December 2021 04:29:39 CET Wen Gong wrote:
[...]
> I did test in my setup, not see the crash.
>
> I am afraid you also need this patch("ath11k: change to use dynamic
> memory for channel list of scan",
>
> https://patchwork.kernel.org/project/linux-wireless/patch/[email protected]
> )
>
> Could you apply this patch and try again?

Tried it and I see the same problem.

Kind regards,
Sven


Attachments:
signature.asc (833.00 B)
This is a digitally signed message part.

2021-12-06 07:10:45

by Wen Gong

Subject: Re: [PATCH] ath11k: enable IEEE80211_HW_SINGLE_SCAN_ON_ALL_BANDS for WCN6855

On 12/6/2021 2:56 PM, Sven Eckelmann wrote:
> On Monday, 6 December 2021 04:29:39 CET Wen Gong wrote:
> [...]
>> I did test in my setup, not see the crash.
>>
>> I am afraid you also need this patch("ath11k: change to use dynamic
>> memory for channel list of scan",
>>
>> https://patchwork.kernel.org/project/linux-wireless/patch/[email protected]
>> )
>>
>> Could you apply this patch and try again?
> Tried it and I see the same problem.
Could you tell me what your test steps are?
>
> Kind regards,
> Sven

2021-12-06 20:03:12

by Sven Eckelmann

Subject: Re: [PATCH] ath11k: enable IEEE80211_HW_SINGLE_SCAN_ON_ALL_BANDS for WCN6855

On Monday, 6 December 2021 08:10:40 CET Wen Gong wrote:
> > On Monday, 6 December 2021 04:29:39 CET Wen Gong wrote:
> > [...]
> >> I did test in my setup, not see the crash.
> >>
> >> I am afraid you also need this patch("ath11k: change to use dynamic
> >> memory for channel list of scan",
> >>
> >> https://patchwork.kernel.org/project/linux-wireless/patch/[email protected]
> >> )
> >>
> >> Could you apply this patch and try again?
> > Tried it and I see the same problem.
> Could you tell what is your test steps?

Start kernel with commit a93789ae541c ("ath11k: Avoid NULL ptr
access during mgmt tx cleanup") + patches:

* ath11k: enable IEEE80211_HW_SINGLE_SCAN_ON_ALL_BANDS for WCN6855
* ath11k: change to use dynamic memory for channel list of scan

You can find the config in the first mail. But I have now enabled KASAN inline
to hopefully create some better error messages.

The firmware + board data (see mail "ath11k: incorrect board_id retrieval")
was prepared like this:

git clone https://github.com/kvalo/ath11k-firmware /root/ath11k-firmware
mkdir -p /lib/firmware/ath11k/WCN6855/hw2.0/
cp /root/ath11k-firmware/WCN6855/hw2.0/*.bin /lib/firmware/ath11k/WCN6855/hw2.0/
cp /root/ath11k-firmware/WCN6855/hw2.0/1.1/WLAN.HSP.1.1-02892.1-QCAHSPSWPL_V1_V2_SILICONZ_LITE-1/*.bin /lib/firmware/ath11k/WCN6855/hw2.0/

git clone https://github.com/qca/qca-swiss-army-knife /root/qca-swiss-army-knife
apt install python2
python2 /root/qca-swiss-army-knife/tools/scripts/ath11k/ath11k-bdencoder -e /lib/firmware/ath11k/WCN6855/hw2.0/board-2.bin
rm /lib/firmware/ath11k/WCN6855/hw2.0/board-2.bin
cp 'bus=pci,vendor=17cb,device=1103,subsystem-vendor=17cb,subsystem-device=3374,qmi-chip-id=2,qmi-board-id=266.bin' /lib/firmware/ath11k/WCN6855/hw2.0/board.bin

Then I am just starting up the device as usual, and start wpa_supplicant (with
defconfig + CONFIG_MESH=y) from commit 14ab4a816c68 ("Reject
ap_vendor_elements if its length is odd")

cat << "EOF" > station_test.cfg
network={
ssid="MyTestAP"
key_mgmt=WPA-PSK FT-PSK
proto=RSN
psk="testtest"
}
EOF
ip link set up dev wlp6s0
~/hostap/wpa_supplicant/wpa_supplicant -D nl80211 -i wlp6s0 -c station_test.cfg

The actual SSID + PSK are valid, and multiple access points (4) have this BSS on
2.4 GHz + 5 GHz.

So you are basically always calling dev_kfree_skb_any in ath11k_ce_tx_process_cb
because wcn6855 hw2.0 has credit_flow set. But it seems like one of the
entries returned by ath11k_ce_completed_send_next is bogus and causes these
problems in ath11k_ce_tx_process_cb. And for some reason, this is
triggered here by this firmware feature.

./scripts/faddr2line --list vmlinux consume_skb+0x9f/0x1c0
consume_skb+0x9f/0x1c0:

__kfree_skb at net/core/skbuff.c:757
752 */
753
754 void __kfree_skb(struct sk_buff *skb)
755 {
756 skb_release_all(skb);
>757< kfree_skbmem(skb);
758 }
759 EXPORT_SYMBOL(__kfree_skb);
760
761 /**
762 * kfree_skb - free an sk_buff

(inlined by) consume_skb at net/core/skbuff.c:912
907 {
908 if (!skb_unref(skb))
909 return;
910
911 trace_consume_skb(skb);
>912< __kfree_skb(skb);
913 }
914 EXPORT_SYMBOL(consume_skb);
915 #endif
916
917 /**

(inlined by) consume_skb at net/core/skbuff.c:906
901 *
902 * Drop a ref to the buffer and free it if the usage count has hit zero
903 * Functions identically to kfree_skb, but kfree_skb assumes that the frame
904 * is being dropped after a failure and notes that
905 */
>906< void consume_skb(struct sk_buff *skb)
907 {
908 if (!skb_unref(skb))
909 return;
910
911 trace_consume_skb(skb);


./scripts/faddr2line --list vmlinux skb_release_data+0x1b0/0x5c0
skb_release_data+0x1b0/0x5c0:

skb_zcopy_clear at include/linux/skbuff.h:1549
1544 {
1545 struct ubuf_info *uarg = skb_zcopy(skb);
1546
1547 if (uarg) {
1548 if (!skb_zcopy_is_nouarg(skb))
>1549< uarg->callback(skb, uarg, zerocopy_success);
1550
1551 skb_shinfo(skb)->flags &= ~SKBFL_ALL_ZEROCOPY;
1552 }
1553 }
1554

(inlined by) skb_release_data at net/core/skbuff.c:669
664 if (skb->cloned &&
665 atomic_sub_return(skb->nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1,
666 &shinfo->dataref))
667 goto exit;
668
>669< skb_zcopy_clear(skb, true);
670
671 for (i = 0; i < shinfo->nr_frags; i++)
672 __skb_frag_unref(&shinfo->frags[i], skb->pp_recycle);
673
674 if (shinfo->frag_list)

But I didn't like the inlined code. So I've changed the compilation flags
slightly:

diff --git a/net/core/Makefile b/net/core/Makefile
index 6bdcb2cafed8..5eda226c5f27 100644
--- a/net/core/Makefile
+++ b/net/core/Makefile
@@ -37,3 +37,4 @@ obj-$(CONFIG_NET_SOCK_MSG) += skmsg.o
obj-$(CONFIG_BPF_SYSCALL) += sock_map.o
obj-$(CONFIG_BPF_SYSCALL) += bpf_sk_storage.o
obj-$(CONFIG_OF) += of_net.o
+ccflags-y += -fno-inline -O1 -fno-optimize-sibling-calls

Now the stacktrace is a lot more readable. And the returned
crash location makes a lot more sense:

./scripts/faddr2line --list vmlinux 'skb_zcopy_clear+0x34/0x8f'
skb_zcopy_clear+0x34/0x8f:

skb_zcopy_clear at include/linux/skbuff.h:1549
1544 {
1545 struct ubuf_info *uarg = skb_zcopy(skb);
1546
1547 if (uarg) {
1548 if (!skb_zcopy_is_nouarg(skb))
>1549< uarg->callback(skb, uarg, zerocopy_success);
1550
1551 skb_shinfo(skb)->flags &= ~SKBFL_ALL_ZEROCOPY;
1552 }
1553 }
1554

Or with the assembler:

(gdb) disassemble /m *(skb_zcopy_clear+0x34/0x8f)
Dump of assembler code for function skb_zcopy_clear:
1544 {
0x000000000000072a <+0>: push %r12
0x000000000000072c <+2>: push %rbp
0x000000000000072d <+3>: push %rbx
0x000000000000072e <+4>: mov %rdi,%rbx
0x0000000000000731 <+7>: mov %esi,%r12d

1545 struct ubuf_info *uarg = skb_zcopy(skb);
0x0000000000000734 <+10>: call 0x5d3 <skb_zcopy>

1546
1547 if (uarg) {
0x0000000000000739 <+15>: test %rax,%rax
0x000000000000073c <+18>: je 0x7a0 <skb_zcopy_clear+118>
0x000000000000073e <+20>: mov %rax,%rbp

1548 if (!skb_zcopy_is_nouarg(skb))
0x0000000000000741 <+23>: mov %rbx,%rdi
0x0000000000000744 <+26>: call 0x6f6 <skb_zcopy_is_nouarg>
0x0000000000000749 <+31>: test %al,%al
0x000000000000074b <+33>: jne 0x777 <skb_zcopy_clear+77>

1549 uarg->callback(skb, uarg, zerocopy_success);
0x000000000000074d <+35>: mov %rbp,%rdx
0x0000000000000750 <+38>: shr $0x3,%rdx
0x0000000000000754 <+42>: movabs $0xdffffc0000000000,%rax
0x000000000000075e <+52>: cmpb $0x0,(%rdx,%rax,1)
0x0000000000000762 <+56>: jne 0x7a5 <skb_zcopy_clear+123>
0x0000000000000764 <+58>: movzbl %r12b,%edx
0x0000000000000768 <+62>: mov 0x0(%rbp),%rax
0x000000000000076c <+66>: mov %rbp,%rsi
0x000000000000076f <+69>: mov %rbx,%rdi
0x0000000000000772 <+72>: call 0x777 <skb_zcopy_clear+77>
0x00000000000007a5 <+123>: mov %rbp,%rdi
0x00000000000007a8 <+126>: call 0x7ad <skb_zcopy_clear+131>
0x00000000000007ad <+131>: jmp 0x764 <skb_zcopy_clear+58>

1550
1551 skb_shinfo(skb)->flags &= ~SKBFL_ALL_ZEROCOPY;
0x0000000000000777 <+77>: mov %rbx,%rdi
0x000000000000077a <+80>: call 0x518 <skb_end_pointer>
0x000000000000077f <+85>: mov %rax,%rbx
0x0000000000000782 <+88>: mov %rax,%rdx
0x0000000000000785 <+91>: shr $0x3,%rdx
0x0000000000000789 <+95>: movabs $0xdffffc0000000000,%rax
0x0000000000000793 <+105>: movzbl (%rdx,%rax,1),%eax
0x0000000000000797 <+109>: test %al,%al
0x0000000000000799 <+111>: je 0x79d <skb_zcopy_clear+115>
0x000000000000079b <+113>: jle 0x7af <skb_zcopy_clear+133>
0x000000000000079d <+115>: andb $0xf8,(%rbx)
0x00000000000007af <+133>: mov %rbx,%rdi
0x00000000000007b2 <+136>: call 0x7b7 <skb_zcopy_clear+141>
0x00000000000007b7 <+141>: jmp 0x79d <skb_zcopy_clear+115>

1552 }
1553 }
0x00000000000007a0 <+118>: pop %rbx
0x00000000000007a1 <+119>: pop %rbp
0x00000000000007a2 <+120>: pop %r12
0x00000000000007a4 <+122>: ret

End of assembler dump.
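
The `shr $0x3` / `movabs $0xdffffc0000000000` / `cmpb` sequence in front of the load in this listing looks like the inline KASAN shadow check: generic KASAN on x86-64 keeps one shadow byte per 8 bytes of memory at a fixed offset. A minimal sketch of that mapping, assuming exactly the shift and offset visible in the listing (the macro and function names are made up for this example):

```c
#include <stdint.h>

/* Values visible in the disassembly: shift by 3, add 0xdffffc0000000000. */
#define SHADOW_SCALE_SHIFT 3
#define SHADOW_OFFSET 0xdffffc0000000000ULL

/* Map an address to the address of the shadow byte that describes it. */
static uint64_t shadow_addr(uint64_t addr)
{
	return (addr >> SHADOW_SCALE_SHIFT) + SHADOW_OFFSET;
}
```

With this formula, shadow_addr(0x408210000b231a) gives 0xe0080c4200016463. These are the two invalid addresses reported for the crashes later in this thread, which would suggest that the second one is simply the shadow lookup for the first (bogus) uarg pointer.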

To make it even easier to read, just disable the inline KASAN and reduce the
optimization level for this function:

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 059b6266dcd7..819cc58ab051 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1540,6 +1540,8 @@ static inline void net_zcopy_put_abort(struct ubuf_info *uarg, bool have_uref)
 }
 
 /* Release a reference on a zerocopy structure */
+#pragma GCC push_options
+#pragma GCC optimize ("O0")
 static inline void skb_zcopy_clear(struct sk_buff *skb, bool zerocopy_success)
 {
 	struct ubuf_info *uarg = skb_zcopy(skb);
@@ -1551,6 +1553,7 @@ static inline void skb_zcopy_clear(struct sk_buff *skb, bool zerocopy_success)
 		skb_shinfo(skb)->flags &= ~SKBFL_ALL_ZEROCOPY;
 	}
 }
+#pragma GCC pop_options
 
 static inline void skb_mark_not_on_list(struct sk_buff *skb)
 {

This produces a nice, unoptimized function, which crashes at +63:

$ gdb net/core/skbuff.o -q
Reading symbols from net/core/skbuff.o...
(gdb) disassemble /m *(skb_zcopy_clear+0x3f/0x70)
Dump of assembler code for function skb_zcopy_clear:
1546 {
0x0000000000000000 <+0>: push %rbp
0x0000000000000001 <+1>: mov %rsp,%rbp
0x0000000000000004 <+4>: sub $0x18,%rsp
0x0000000000000008 <+8>: mov %rdi,-0x10(%rbp)
0x000000000000000c <+12>: mov %esi,%eax
0x000000000000000e <+14>: mov %al,-0x14(%rbp)

1547 struct ubuf_info *uarg = skb_zcopy(skb);
0x0000000000000011 <+17>: mov -0x10(%rbp),%rax
0x0000000000000015 <+21>: mov %rax,%rdi
0x0000000000000018 <+24>: call 0x29e <skb_zcopy>
0x000000000000001d <+29>: mov %rax,-0x8(%rbp)

1548
1549 if (uarg) {
0x0000000000000021 <+33>: cmpq $0x0,-0x8(%rbp)
0x0000000000000026 <+38>: je 0x6d <skb_zcopy_clear+109>

1550 if (!skb_zcopy_is_nouarg(skb))
0x0000000000000028 <+40>: mov -0x10(%rbp),%rax
0x000000000000002c <+44>: mov %rax,%rdi
0x000000000000002f <+47>: call 0x2df <skb_zcopy_is_nouarg>
0x0000000000000034 <+52>: xor $0x1,%eax
0x0000000000000037 <+55>: test %al,%al
0x0000000000000039 <+57>: je 0x59 <skb_zcopy_clear+89>

1551 uarg->callback(skb, uarg, zerocopy_success);
0x000000000000003b <+59>: mov -0x8(%rbp),%rax
0x000000000000003f <+63>: mov (%rax),%r8
0x0000000000000042 <+66>: movzbl -0x14(%rbp),%edx
0x0000000000000046 <+70>: mov -0x8(%rbp),%rcx
0x000000000000004a <+74>: mov -0x10(%rbp),%rax
0x000000000000004e <+78>: mov %rcx,%rsi
0x0000000000000051 <+81>: mov %rax,%rdi
0x0000000000000054 <+84>: call 0x59 <skb_zcopy_clear+89>

1552
1553 skb_shinfo(skb)->flags &= ~SKBFL_ALL_ZEROCOPY;
0x0000000000000059 <+89>: mov -0x10(%rbp),%rax
0x000000000000005d <+93>: mov %rax,%rdi
0x0000000000000060 <+96>: call 0x27f <skb_end_pointer>
0x0000000000000065 <+101>: movzbl (%rax),%edx
0x0000000000000068 <+104>: and $0xfffffff8,%edx
0x000000000000006b <+107>: mov %dl,(%rax)

1554 }
1555 }
0x000000000000006d <+109>: nop
0x000000000000006e <+110>: leave
0x000000000000006f <+111>: ret

End of assembler dump.

The question now: what is causing the unclean state of the skb, so that it
is not rejected by skb_zcopy_is_nouarg before the uarg callback is tried?

Kind regards,
Sven


Attachments:
screenlog.0.zip (19.63 kB)
station_connect_crash_decoded.txt (9.08 kB)
station_connect_crash2_decoded.txt (9.67 kB)
station_connect_crash2.txt (4.43 kB)
station_connect_crash3.txt (3.93 kB)
station_connect_crash3_decoded.txt (8.75 kB)
signature.asc (833.00 B)
This is a digitally signed message part.

2021-12-07 04:35:11

by Wen Gong

Subject: Re: [PATCH] ath11k: enable IEEE80211_HW_SINGLE_SCAN_ON_ALL_BANDS for WCN6855

On 12/7/2021 4:03 AM, Sven Eckelmann wrote:
> On Monday, 6 December 2021 08:10:40 CET Wen Gong wrote:
>>> On Monday, 6 December 2021 04:29:39 CET Wen Gong wrote:
>>> [...]
>>>> I did test in my setup, not see the crash.
>>>>
>>>> I am afraid you also need this patch("ath11k: change to use dynamic
>>>> memory for channel list of scan",
>>>>
>>>> https://patchwork.kernel.org/project/linux-wireless/patch/[email protected]
>>>> )
>>>>
>>>> Could you apply this patch and try again?
>>> Tried it and I see the same problem.
>> Could you tell what is your test steps?
> [...]

Thanks a lot for your analysis, Sven.

I still cannot reproduce it.

I think it is caused by a write beyond skb->tail during the scan, because the
invalid address is the same for each crash (0x408210000b231a/0xe0080c4200016463),
and the crash is triggered by this instruction:

"0x000000000000003f <+63>: mov (%rax),%r8"

which loads the value of uarg->callback into %r8.

Could you add the change below? It prints logs that should help us find the bug.

diff --git a/drivers/net/wireless/ath/ath11k/mac.c b/drivers/net/wireless/ath/ath11k/mac.c
index 26181f237e23..2147f74f5ebf 100644
--- a/drivers/net/wireless/ath/ath11k/mac.c
+++ b/drivers/net/wireless/ath/ath11k/mac.c
@@ -3421,12 +3421,15 @@ static int ath11k_mac_op_hw_scan(struct ieee80211_hw *hw,
 		memcpy(arg.extraie.ptr, req->ie, req->ie_len);
 	}
 
+	ath11k_info(ar->ab, "n_ssids %d\n", req->n_ssids);
+
 	if (req->n_ssids) {
 		arg.num_ssids = req->n_ssids;
 		for (i = 0; i < arg.num_ssids; i++) {
 			arg.ssid[i].length  = req->ssids[i].ssid_len;
 			memcpy(&arg.ssid[i].ssid, req->ssids[i].ssid,
 			       req->ssids[i].ssid_len);
+			ath11k_info(ar->ab, "ssid[%d] len %d\n", i, arg.ssid[i].length);
 		}
 	} else {
 		arg.scan_flags |= WMI_SCAN_FLAG_PASSIVE;
diff --git a/drivers/net/wireless/ath/ath11k/wmi.c b/drivers/net/wireless/ath/ath11k/wmi.c
index 7d7f76d4bf1f..e42a64251799 100644
--- a/drivers/net/wireless/ath/ath11k/wmi.c
+++ b/drivers/net/wireless/ath/ath11k/wmi.c
@@ -2271,6 +2271,7 @@ int ath11k_wmi_send_scan_start_cmd(struct ath11k *ar,
 		}
 	}
 
+	ath11k_info(ar->ab, "%s ptr %px skb data %px len %d over %d", __func__, ptr, skb->data, skb->len, ((unsigned char *)ptr)-skb->data-skb->len);
 	ret = ath11k_wmi_cmd_send(wmi, skb,
 				  WMI_START_SCAN_CMDID);
 	if (ret) {


2021-12-07 14:30:32

by Sven Eckelmann

Subject: Re: [PATCH] ath11k: enable IEEE80211_HW_SINGLE_SCAN_ON_ALL_BANDS for WCN6855

On Tuesday, 7 December 2021 05:35:04 CET Wen Gong wrote:
> Thanks Sven a lot for your analyze.
>
> I still can not reproduce it.
>
> I think it is because the write over skb->tail in scan, because the
> invalid address

Yes, I thought I had written about this, but it might have gone into
another draft of the mail. What I wanted to write was something like:

The information which is used in skb_zcopy_clear/skb_zcopy/skb_zcopy_is_nouarg
comes from skb_shinfo, and skb_end_pointer is just a pointer to a region
at the end of the skb buffer (skb->end). This region got corrupted by something.
Unfortunately, this is correctly allocated memory and thus KASAN cannot help
us with it.
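
To illustrate the layout argument above, here is a small user-space model of a buffer whose shared info sits directly behind the data area in the same allocation. It is only a sketch: struct mini_skb, struct mini_shinfo and the helper functions are simplified, hypothetical stand-ins and not the kernel's struct sk_buff or struct skb_shared_info.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for the shared info (hypothetical layout). */
struct mini_shinfo {
	uint8_t flags;            /* nonzero would be read as "zerocopy" */
	uintptr_t destructor_arg; /* would be read as the uarg pointer */
};

/* Simplified stand-in for an skb: one allocation, shared info at end. */
struct mini_skb {
	unsigned char *head; /* start of the allocation */
	unsigned char *tail; /* end of the payload written so far */
	unsigned char *end;  /* end of data area, start of shared info */
};

/* Allocate size bytes of data area plus the trailing shared info. */
static int mini_skb_alloc(struct mini_skb *skb, size_t size)
{
	skb->head = calloc(1, size + sizeof(struct mini_shinfo));
	if (!skb->head)
		return -1;
	skb->tail = skb->head;
	skb->end = skb->head + size;
	return 0;
}

/* Copy the shared info out of the buffer (memcpy avoids alignment issues). */
static struct mini_shinfo mini_shinfo_read(const struct mini_skb *skb)
{
	struct mini_shinfo si;

	memcpy(&si, skb->end, sizeof(si));
	return si;
}

/* Unchecked append, like a bare memcpy() to a cursor inside the buffer. */
static void mini_put_unchecked(struct mini_skb *skb, const void *src,
			       size_t len)
{
	memcpy(skb->tail, src, len);
	skb->tail += len;
}
```

If a buffer with a 12-byte data area receives a 16-byte unchecked append, the last four bytes land in the shared info and flags no longer reads as zero. The write stays inside the allocation, which is the reason a tool that only tracks allocation boundaries would not flag it.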



[...]
> --- a/drivers/net/wireless/ath/ath11k/wmi.c
> +++ b/drivers/net/wireless/ath/ath11k/wmi.c
> @@ -2271,6 +2271,7 @@ int ath11k_wmi_send_scan_start_cmd(struct ath11k *ar,
> }
> }
>
> + ath11k_info(ar->ab, "%s ptr %px skb data %px len %d over %d",
> __func__, ptr, skb->data, skb->len, ((unsigned char
> *)ptr)-skb->data-skb->len);
> ret = ath11k_wmi_cmd_send(wmi, skb,
> WMI_START_SCAN_CMDID);
> if (ret) {

Changed the last part to:

ath11k_err(ar->ab, "%s ptr %px skb data %px len %d over %ld\n", __func__, ptr, skb->data, skb->len, ((unsigned char *)ptr) - skb->data - skb->len);


The output is:

ath11k_pci 0000:01:00.0: n_ssids 1
ath11k_pci 0000:01:00.0: ssid[0] len 0
ath11k_pci 0000:01:00.0: ath11k_wmi_send_scan_start_cmd ptr ffff9217101e82b4 skb data ffff9217101e804c len 616 over 0
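
The numbers can be checked by hand: 0xffff9217101e82b4 - 0xffff9217101e804c =
0x268 = 616, which is exactly skb->len, hence "over 0". A small userspace
sketch of the same expression (the function name is made up; the kernel code
works on pointers, here plain 64 bit integers are used):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Same expression as in the debug print, on plain integers:
 * how far ptr is beyond the end of the used data (skb->data + skb->len).
 */
static int64_t bytes_over(uint64_t ptr, uint64_t skb_data, uint32_t skb_len)
{
	/* the write cursor never precedes the start of the data */
	assert(ptr >= skb_data);
	return (int64_t)(ptr - skb_data) - (int64_t)skb_len;
}
```

Note that this only compares the final position of ptr with skb->len. It says
nothing about how many bytes a memcpy before that point actually wrote.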

But we are looking at the ath11k_ce_tx_process_cb function, so I would have
expected that it is related to something which was sent out. The first thing
I did was to add some skb_dump calls in the send path (ath11k_htc_send) and in
the cleanup path (skb_zcopy_clear). Something like this (just the cleanup path,
because otherwise I would have to post a rather large diff):

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 819cc58ab051..c15512e2f30c 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1547,8 +1547,10 @@ static inline void skb_zcopy_clear(struct sk_buff *skb, bool zerocopy_success)
 	struct ubuf_info *uarg = skb_zcopy(skb);

 	if (uarg) {
-		if (!skb_zcopy_is_nouarg(skb))
+		if (!skb_zcopy_is_nouarg(skb)) {
+			skb_dump(KERN_ERR, skb, true);
 			uarg->callback(skb, uarg, zerocopy_success);
+		}

 		skb_shinfo(skb)->flags &= ~SKBFL_ALL_ZEROCOPY;
 	}


But interestingly, it already crashes while parsing the fraglist in
ath11k_htc_send. So I've added some more dumps to figure out where it breaks,
and I've noticed that it breaks after the following section in
ath11k_wmi_send_scan_start_cmd:

	if (params->extraie.len)
		memcpy(ptr, params->extraie.ptr,
		       params->extraie.len);

Here is the full output:

[ 30.641297] ath11k_wmi_send_scan_start_cmd:2357
[ 30.645873] skb len=616 headroom=76 headlen=616 tailroom=12
[ 30.645873] mac=(-1,-1) net=(0,-1) trans=-1
[ 30.645873] shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
[ 30.645873] csum(0x0 ip_summed=0 complete_sw=0 valid=0 level=0)
[ 30.645873] hash(0x0 sw=0 l4=0) proto=0x0000 pkttype=0 iif=0
[ 30.673381] skb headroom: 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 30.681073] skb headroom: 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 30.688758] skb headroom: 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 30.696465] skb headroom: 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 30.704197] skb headroom: 00000040: 00 00 00 00 00 00 00 00 00 00 00 00
[ 30.710852] skb linear: 00000000: 9c 00 4d 00 00 a0 00 00 01 00 00 00 00 00 00 00
[ 30.718538] skb linear: 00000010: 01 00 00 00 1f 00 00 00 32 00 00 00 96 00 00 00
[ 30.726271] skb linear: 00000020: 32 00 00 00 f4 01 00 00 00 00 00 00 00 00 00 00
[ 30.733954] skb linear: 00000030: 00 00 00 00 20 4e 00 00 05 00 00 00 10 00 00 00
[ 30.741636] skb linear: 00000040: 00 00 00 00 61 00 00 00 01 00 00 00 01 00 00 00
[ 30.749346] skb linear: 00000050: 08 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 30.757092] skb linear: 00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 30.764795] skb linear: 00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 30.772483] skb linear: 00000080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 30.780170] skb linear: 00000090: 00 00 00 00 28 00 00 00 1e 00 00 00 00 00 00 00
[ 30.787854] skb linear: 000000a0: 84 01 10 00 6c 09 00 00 71 09 00 00 76 09 00 00
[ 30.795541] skb linear: 000000b0: 7b 09 00 00 80 09 00 00 85 09 00 00 8a 09 00 00
[ 30.803236] skb linear: 000000c0: 8f 09 00 00 94 09 00 00 99 09 00 00 9e 09 00 00
[ 30.810933] skb linear: 000000d0: a3 09 00 00 a8 09 00 00 3c 14 00 00 50 14 00 00
[ 30.818620] skb linear: 000000e0: 64 14 00 00 78 14 00 00 8c 14 00 00 a0 14 00 00
[ 30.826322] skb linear: 000000f0: b4 14 00 00 c8 14 00 00 7c 15 00 00 90 15 00 00
[ 30.834018] skb linear: 00000100: a4 15 00 00 b8 15 00 00 cc 15 00 00 e0 15 00 00
[ 30.841712] skb linear: 00000110: f4 15 00 00 08 16 00 00 1c 16 00 00 30 16 00 00
[ 30.849402] skb linear: 00000120: 44 16 00 00 58 16 00 00 71 16 00 00 85 16 00 00
[ 30.857094] skb linear: 00000130: 99 16 00 00 ad 16 00 00 c1 16 00 00 43 17 00 00
[ 30.864776] skb linear: 00000140: 57 17 00 00 6b 17 00 00 7f 17 00 00 93 17 00 00
[ 30.872490] skb linear: 00000150: a7 17 00 00 bb 17 00 00 cf 17 00 00 e3 17 00 00
[ 30.880182] skb linear: 00000160: f7 17 00 00 0b 18 00 00 1f 18 00 00 33 18 00 00
[ 30.887882] skb linear: 00000170: 47 18 00 00 5b 18 00 00 6f 18 00 00 83 18 00 00
[ 30.895581] skb linear: 00000180: 97 18 00 00 ab 18 00 00 bf 18 00 00 d3 18 00 00
[ 30.903265] skb linear: 00000190: e7 18 00 00 fb 18 00 00 0f 19 00 00 23 19 00 00
[ 30.910974] skb linear: 000001a0: 37 19 00 00 4b 19 00 00 5f 19 00 00 73 19 00 00
[ 30.918675] skb linear: 000001b0: 87 19 00 00 9b 19 00 00 af 19 00 00 c3 19 00 00
[ 30.926418] skb linear: 000001c0: d7 19 00 00 eb 19 00 00 ff 19 00 00 13 1a 00 00
[ 30.934118] skb linear: 000001d0: 27 1a 00 00 3b 1a 00 00 4f 1a 00 00 63 1a 00 00
[ 30.941842] skb linear: 000001e0: 77 1a 00 00 8b 1a 00 00 9f 1a 00 00 b3 1a 00 00
[ 30.949537] skb linear: 000001f0: c7 1a 00 00 db 1a 00 00 ef 1a 00 00 03 1b 00 00
[ 30.957221] skb linear: 00000200: 17 1b 00 00 2b 1b 00 00 3f 1b 00 00 53 1b 00 00
[ 30.964912] skb linear: 00000210: 67 1b 00 00 7b 1b 00 00 8f 1b 00 00 a3 1b 00 00
[ 30.972614] skb linear: 00000220: b7 1b 00 00 cb 1b 00 00 24 00 13 00 00 00 00 00
[ 30.980315] skb linear: 00000230: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 30.988010] skb linear: 00000240: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 30.995696] skb linear: 00000250: 08 00 13 00 ff ff ff ff ff ff 00 00 08 00 11 00
[ 31.003394] skb linear: 00000260: 00 00 00 00 00 00 00 00
[ 31.009002] skb tailroom: 00000000: 00 00 00 00 00 00 00 00 00 00 00 00
[ 31.015646] ath11k_wmi_send_scan_start_cmd:2362
[ 31.020217] skb len=616 headroom=76 headlen=616 tailroom=12
[ 31.020217] mac=(-1,-1) net=(0,-1) trans=-1
[ 31.020217] shinfo(txflags=0 nr_frags=255 gso(size=0 type=265087 segs=0))
[ 31.020217] csum(0x0 ip_summed=0 complete_sw=0 valid=0 level=0)
[ 31.020217] hash(0x0 sw=0 l4=0) proto=0x0000 pkttype=0 iif=0
[ 31.048289] skb headroom: 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 31.056015] skb headroom: 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 31.063714] skb headroom: 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 31.071425] skb headroom: 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 31.079141] skb headroom: 00000040: 00 00 00 00 00 00 00 00 00 00 00 00
[ 31.085787] skb linear: 00000000: 9c 00 4d 00 00 a0 00 00 01 00 00 00 00 00 00 00
[ 31.093518] skb linear: 00000010: 01 00 00 00 1f 00 00 00 32 00 00 00 96 00 00 00
[ 31.101239] skb linear: 00000020: 32 00 00 00 f4 01 00 00 00 00 00 00 00 00 00 00
[ 31.108947] skb linear: 00000030: 00 00 00 00 20 4e 00 00 05 00 00 00 10 00 00 00
[ 31.116630] skb linear: 00000040: 00 00 00 00 61 00 00 00 01 00 00 00 01 00 00 00
[ 31.124326] skb linear: 00000050: 08 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 31.132007] skb linear: 00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 31.139708] skb linear: 00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 31.147420] skb linear: 00000080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 31.155118] skb linear: 00000090: 00 00 00 00 28 00 00 00 1e 00 00 00 00 00 00 00
[ 31.162798] skb linear: 000000a0: 84 01 10 00 6c 09 00 00 71 09 00 00 76 09 00 00
[ 31.170486] skb linear: 000000b0: 7b 09 00 00 80 09 00 00 85 09 00 00 8a 09 00 00
[ 31.178175] skb linear: 000000c0: 8f 09 00 00 94 09 00 00 99 09 00 00 9e 09 00 00
[ 31.185876] skb linear: 000000d0: a3 09 00 00 a8 09 00 00 3c 14 00 00 50 14 00 00
[ 31.193593] skb linear: 000000e0: 64 14 00 00 78 14 00 00 8c 14 00 00 a0 14 00 00
[ 31.201278] skb linear: 000000f0: b4 14 00 00 c8 14 00 00 7c 15 00 00 90 15 00 00
[ 31.208969] skb linear: 00000100: a4 15 00 00 b8 15 00 00 cc 15 00 00 e0 15 00 00
[ 31.216655] skb linear: 00000110: f4 15 00 00 08 16 00 00 1c 16 00 00 30 16 00 00
[ 31.224346] skb linear: 00000120: 44 16 00 00 58 16 00 00 71 16 00 00 85 16 00 00
[ 31.232030] skb linear: 00000130: 99 16 00 00 ad 16 00 00 c1 16 00 00 43 17 00 00
[ 31.239739] skb linear: 00000140: 57 17 00 00 6b 17 00 00 7f 17 00 00 93 17 00 00
[ 31.247428] skb linear: 00000150: a7 17 00 00 bb 17 00 00 cf 17 00 00 e3 17 00 00
[ 31.255141] skb linear: 00000160: f7 17 00 00 0b 18 00 00 1f 18 00 00 33 18 00 00
[ 31.262840] skb linear: 00000170: 47 18 00 00 5b 18 00 00 6f 18 00 00 83 18 00 00
[ 31.270591] skb linear: 00000180: 97 18 00 00 ab 18 00 00 bf 18 00 00 d3 18 00 00
[ 31.278282] skb linear: 00000190: e7 18 00 00 fb 18 00 00 0f 19 00 00 23 19 00 00
[ 31.285965] skb linear: 000001a0: 37 19 00 00 4b 19 00 00 5f 19 00 00 73 19 00 00
[ 31.293675] skb linear: 000001b0: 87 19 00 00 9b 19 00 00 af 19 00 00 c3 19 00 00
[ 31.301361] skb linear: 000001c0: d7 19 00 00 eb 19 00 00 ff 19 00 00 13 1a 00 00
[ 31.309056] skb linear: 000001d0: 27 1a 00 00 3b 1a 00 00 4f 1a 00 00 63 1a 00 00
[ 31.316753] skb linear: 000001e0: 77 1a 00 00 8b 1a 00 00 9f 1a 00 00 b3 1a 00 00
[ 31.324441] skb linear: 000001f0: c7 1a 00 00 db 1a 00 00 ef 1a 00 00 03 1b 00 00
[ 31.332138] skb linear: 00000200: 17 1b 00 00 2b 1b 00 00 3f 1b 00 00 53 1b 00 00
[ 31.339840] skb linear: 00000210: 67 1b 00 00 7b 1b 00 00 8f 1b 00 00 a3 1b 00 00
[ 31.347520] skb linear: 00000220: b7 1b 00 00 cb 1b 00 00 24 00 13 00 00 00 00 00
[ 31.355232] skb linear: 00000230: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 31.362920] skb linear: 00000240: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 31.370607] skb linear: 00000250: 08 00 13 00 ff ff ff ff ff ff 00 00 08 00 11 00
[ 31.378331] skb linear: 00000260: 01 08 02 04 0b 16 0c 12
[ 31.383972] skb tailroom: 00000000: 18 24 32 04 30 48 60 6c 2d 1a e3 19
[ 31.390651] skb fraglist:
[ 31.393348] BUG: unable to handle page fault for address: 00000100000000bc
[ 31.400317] #PF: supervisor read access in kernel mode
[ 31.405624] #PF: error_code(0x0000) - not-present page
[ 31.410832] PGD 0 P4D 0
[ 31.413422] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 31.417881] CPU: 0 PID: 520 Comm: wpa_supplicant Not tainted 5.16.0-rc1+ #5
[ 31.424862] Hardware name: PC Engines apu2/apu2, BIOS v4.15.0.1 11/23/2021
[ 31.431750] RIP: 0010:skb_end_pointer+0x0/0xe
[ 31.436129] Code: dc ff ff ff c3 48 8b 07 48 83 e0 fe 48 83 c8 02 48 89 07 c3 8b 47 08 c3 89 77 08 c3 01 77 08 c3 29 77 08 c3 b8 00 00 00 00 c3 <8b> 87 bc 00 00 00 48 03 87 c0 00 00 00 c3 8b 87 bc 00 00 00 c3 e8
[ 31.454883] RSP: 0018:ffffb0edc0427490 EFLAGS: 00010282
[ 31.460116] RAX: ffff8ff9818b1ec0 RBX: 0000010000000000 RCX: 0000000000000000
[ 31.467267] RDX: 0000000000000001 RSI: 0000010000000000 RDI: 0000010000000000
[ 31.474408] RBP: ffff8ff9891a07c8 R08: 0000000000000000 R09: ffffb0edc0427370
[ 31.481549] R10: ffffb0edc0427368 R11: ffffffffa5cd22e8 R12: 0000000000000268
[ 31.488689] R13: 00000000ffffffff R14: 0000010000000000 R15: ffffffffc0f2198b
[ 31.495823] FS: 00007f2725a55c00(0000) GS:ffff8ff9aac00000(0000) knlGS:0000000000000000
[ 31.503936] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 31.509706] CR2: 00000100000000bc CR3: 00000001090bc000 CR4: 00000000000406f0
[ 31.516868] Call Trace:
[ 31.519325] <TASK>
[ 31.521433] skb_dump+0x24/0x53a
[ 31.524688] ? _printk+0x58/0x6f
[ 31.527938] skb_dump+0x532/0x53a
[ 31.531267] ath11k_wmi_send_scan_start_cmd.cold+0x5f2/0x793 [ath11k]
[ 31.537785] ath11k_mac_op_hw_scan+0x173/0x3f0 [ath11k]
[ 31.543086] drv_hw_scan+0x43/0x130 [mac80211]
[ 31.547690] __ieee80211_start_scan+0x152/0x6d0 [mac80211]
[ 31.553306] ieee80211_request_scan+0x2c/0x50 [mac80211]
[ 31.558738] rdev_scan+0x28/0xd0 [cfg80211]
[ 31.563117] nl80211_trigger_scan+0x3fe/0x680 [cfg80211]
[ 31.568584] genl_family_rcv_msg_doit+0xea/0x150
[ 31.573223] genl_rcv_msg+0xde/0x1d0
[ 31.576816] ? nl80211_send_scan_start+0x90/0x90 [cfg80211]
[ 31.582520] ? genl_get_cmd+0xd0/0xd0
[ 31.586191] netlink_rcv_skb+0x50/0xf0
[ 31.589958] genl_rcv+0x24/0x40
[ 31.593109] netlink_unicast+0x239/0x340
[ 31.597045] netlink_sendmsg+0x245/0x480
[ 31.600981] sock_sendmsg+0x5e/0x60
[ 31.604487] ____sys_sendmsg+0x22e/0x270
[ 31.608440] ? import_iovec+0x2d/0x30
[ 31.612123] ? sendmsg_copy_msghdr+0x7c/0xa0
[ 31.616406] ___sys_sendmsg+0x75/0xb0
[ 31.620081] ? __mod_lruvec_page_state+0x7d/0xc0
[ 31.624714] ? folio_add_lru+0x5c/0xa0
[ 31.628476] ? _raw_spin_unlock+0x16/0x30
[ 31.632506] ? __handle_mm_fault+0x1261/0x1520
[ 31.636965] __sys_sendmsg+0x59/0xa0
[ 31.640552] do_syscall_64+0x3b/0xc0
[ 31.644148] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 31.649208] RIP: 0033:0x7f2725ef6f33
[ 31.652797] Code: 64 89 02 48 c7 c0 ff ff ff ff eb b7 66 2e 0f 1f 84 00 00 00 00 00 90 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 55 c3 0f 1f 40 00 48 83 ec 28 89 54 24 1c 48
[ 31.671547] RSP: 002b:00007fff1b5f1668 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 31.679122] RAX: ffffffffffffffda RBX: 0000564919260760 RCX: 00007f2725ef6f33
[ 31.686264] RDX: 0000000000000000 RSI: 00007fff1b5f16a0 RDI: 0000000000000005
[ 31.693406] RBP: 000056491928f6c0 R08: 0000000000000004 R09: 00007f2725fb6c00
[ 31.700547] R10: 00007fff1b5f178c R11: 0000000000000246 R12: 0000564919260670
[ 31.707689] R13: 00007fff1b5f16a0 R14: 0000000000000000 R15: 00007fff1b5f178c
[ 31.714834] </TASK>
[ 31.717031] Modules linked in: qrtr_mhi btusb btrtl btbcm btintel bluetooth jitterentropy_rng sha512_ssse3 sha512_generic drbg ansi_cprng amd64_edac ecdh_generic edac_mce_amd ecc kvm_amd kvm irqbypass qrtr crc32_pclmul ghash_clmulni_intel ath11k_pci mhi ath11k evdev pcengines_apuv2 qmi_helpers gpio_keys_polled gpio_amd_fch aesni_intel snd_pcm crypto_simd snd_timer sdhci_pci xhci_pci snd cqhci mac80211 soundcore ehci_pci sp5100_tco cryptd libarc4 xhci_hcd sdhci ehci_hcd pcspkr igb watchdog ptp cfg80211 mmc_core k10temp i2c_piix4 fam15h_power usbcore ccp pps_core sg dca rng_core i2c_algo_bit usb_common rfkill leds_gpio gpio_keys acpi_cpufreq button fuse drm configfs ip_tables x_tables autofs4 ext4 crc32c_generic crc16 mbcache jbd2 sd_mod t10_pi crc_t10dif crct10dif_generic ahci libahci libata crct10dif_pclmul scsi_mod crct10dif_common crc32c_intel scsi_common
[ 31.793074] CR2: 00000100000000bc
[ 31.796498] ---[ end trace 07252723010a83e6 ]---
[ 31.801261] RIP: 0010:skb_end_pointer+0x0/0xe
[ 31.805824] Code: dc ff ff ff c3 48 8b 07 48 83 e0 fe 48 83 c8 02 48 89 07 c3 8b 47 08 c3 89 77 08 c3 01 77 08 c3 29 77 08 c3 b8 00 00 00 00 c3 <8b> 87 bc 00 00 00 48 03 87 c0 00 00 00 c3 8b 87 bc 00 00 00 c3 e8
[ 31.824842] RSP: 0018:ffffb0edc0427490 EFLAGS: 00010282
[ 31.830105] RAX: ffff8ff9818b1ec0 RBX: 0000010000000000 RCX: 0000000000000000
[ 31.837270] RDX: 0000000000000001 RSI: 0000010000000000 RDI: 0000010000000000
[ 31.844441] RBP: ffff8ff9891a07c8 R08: 0000000000000000 R09: ffffb0edc0427370
[ 31.851614] R10: ffffb0edc0427368 R11: ffffffffa5cd22e8 R12: 0000000000000268
[ 31.858781] R13: 00000000ffffffff R14: 0000010000000000 R15: ffffffffc0f2198b
[ 31.866020] FS: 00007f2725a55c00(0000) GS:ffff8ff9aac00000(0000) knlGS:0000000000000000
[ 31.874141] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 31.879920] CR2: 00000100000000bc CR3: 00000001090bc000 CR4: 00000000000406f0
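
Reading the bytes which are new in the second dump (from linear offset 0x260
into the tailroom) as 802.11 information elements (id, length, payload)
suggests that they are the start of the probe request IEs, i.e. of the extraie
blob: element 1 (Supported Rates, 8 bytes), element 50 (Extended Supported
Rates, 4 bytes) and the header of element 45 (HT Capabilities, 26 bytes),
whose payload continues behind the dumped area. A small userspace sketch to
walk them (the struct and function names are made up for this mail):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Last 8 "skb linear" bytes plus the 12 "skb tailroom" bytes of the 2nd dump. */
static const uint8_t dumped[] = {
	0x01, 0x08, 0x02, 0x04, 0x0b, 0x16, 0x0c, 0x12,
	0x18, 0x24, 0x32, 0x04, 0x30, 0x48, 0x60, 0x6c,
	0x2d, 0x1a, 0xe3, 0x19,
};

struct ie_hdr {
	uint8_t id;
	uint8_t len;
};

/*
 * Walk id/len/payload elements. Stores the headers of up to max elements
 * which fit completely into buf and returns their count. *next_off is set
 * to the offset at which the walk stopped.
 */
static size_t walk_ies(const uint8_t *buf, size_t n, struct ie_hdr *out,
		       size_t max, size_t *next_off)
{
	size_t off = 0, count = 0;

	assert(buf && out && next_off);

	while (count < max && off + 2 <= n &&
	       off + 2 + (size_t)buf[off + 1] <= n) {
		out[count].id = buf[off];
		out[count].len = buf[off + 1];
		off += 2 + (size_t)buf[off + 1];
		count++;
	}

	*next_off = off;
	return count;
}
```

For the bytes above, the walk finds two complete elements and stops at offset
16, where the header 2d 1a announces 26 more bytes than the dump shows.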


So the length calculated for ath11k_wmi_alloc_skb is just wrong. The reason
for this is extraie_len_with_pad, which is only a u8. But
params->extraie.len with IEEE80211_HW_SINGLE_SCAN_ON_ALL_BANDS is for me
already 264. So the stored length ends up as 8 (264 truncated to 8 bits), but
the data which has to fit is still 264 bytes long.
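
The truncation is easy to reproduce outside the kernel. A minimal sketch,
assuming that the padded length is the length rounded up to a multiple of 4
(the helper names are made up, this is not the driver code):

```c
#include <assert.h>
#include <stdint.h>

/* Round len up to the next multiple of 4 (assumed TLV padding rule). */
static uint32_t roundup4(uint32_t len)
{
	assert(len <= UINT32_MAX - 3);
	return (len + 3) & ~UINT32_C(3);
}

/* The padded length as it survives in a u8 variable. */
static uint8_t padded_len_as_u8(uint32_t len)
{
	return (uint8_t)roundup4(len);
}

/* Bytes of a len byte copy for which no space was accounted. */
static uint32_t unaccounted_bytes(uint32_t len)
{
	uint32_t reserved = padded_len_as_u8(len);

	return len > reserved ? len - reserved : 0;
}
```

For len = 264 this reserves 8 bytes and leaves 256 bytes unaccounted. With the
12 bytes of tailroom from the dump, 244 of them go beyond skb->end. Lengths
from 253 to 256 would even end up with a reserved size of 0.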

But there is another problem: the size of the WMI_TLV_LEN field. The
params->extraie.len can be up to 32 bits wide and WMI_TLV_LEN only has 16 bits.
So params->extraie.len must also be size limited or we might run into a
different problem.
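
The header words in the dump above (for example 08 00 11 00, i.e. 0x00110008
in little endian) are consistent with a 16 bit length in the lower half and a
16 bit tag in the upper half. Assuming that layout, a length check could look
like the following sketch (the macro and function names are invented here and
are not the ath11k definitions):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TLV_LEN_MASK  UINT32_C(0x0000ffff) /* 16 bit length, lower half */
#define TLV_TAG_SHIFT 16                   /* tag in the upper half */

/* True if len can be represented in the 16 bit length field. */
static bool tlv_len_fits(uint32_t len)
{
	return len <= TLV_LEN_MASK;
}

/* Build the 32 bit header word; the caller must have checked the length. */
static uint32_t tlv_header(uint16_t tag, uint32_t len)
{
	assert(tlv_len_fits(len));
	return ((uint32_t)tag << TLV_TAG_SHIFT) | (len & TLV_LEN_MASK);
}
```

A length of 264 fits without problems, but anything from 0x10000 upwards
would silently lose its upper bits if it were only masked, so such a request
has to be rejected or shortened before the header is written.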

Kind regards,
Sven



2021-12-08 03:43:57

by Wen Gong

Subject: Re: [PATCH] ath11k: enable IEEE80211_HW_SINGLE_SCAN_ON_ALL_BANDS for WCN6855

Thanks, Sven, for the analysis and debugging.

I have seen your patch "ath11k: Fix buffer overflow when scanning with extraie".

On 12/7/2021 10:30 PM, Sven Eckelmann wrote:
> On Tuesday, 7 December 2021 05:35:04 CET Wen Gong wrote:
>> Thanks Sven a lot for your analyze.
>>
>> I still can not reproduce it.
>>
>> I think it is because the write over skb->tail in scan, because the
>> invalid address
> Yes, I thought that I wanted to write about it but it might have gone into
> another draft of the mail. So what I wanted to write was something like:
>
> The information which is used in skb_zcopy_clear/skb_zcopy/skb_zcopy_is_nouarg
> is coming from skb_shinfo. And skb_end_pointer is just a pointer to a region
> at the end of the skb buffer (skb->end). And this got corrupted by something
> Unfortunately this is correctly allocated memory and thus kasan cannot help
> us with it.
>
>
>
> [...]
>> --- a/drivers/net/wireless/ath/ath11k/wmi.c
>> +++ b/drivers/net/wireless/ath/ath11k/wmi.c
>> @@ -2271,6 +2271,7 @@ int ath11k_wmi_send_scan_start_cmd(struct ath11k *ar,
>> }
>> }
>>
>> + ath11k_info(ar->ab, "%s ptr %px skb data %px len %d over %d",
>> __func__, ptr, skb->data, skb->len, ((unsigned char
>> *)ptr)-skb->data-skb->len);
>> ret = ath11k_wmi_cmd_send(wmi, skb,
>> WMI_START_SCAN_CMDID);
>> if (ret) {
> Changed the last part to:
>
> ath11k_err(ar->ab, "%s ptr %px skb data %px len %d over %ld\n", __func__, ptr, skb->data, skb->len, ((unsigned char *)ptr) - skb->data - skb->len);
>
>
> The output is:
>
> ath11k_pci 0000:01:00.0: n_ssids 1
> ath11k_pci 0000:01:00.0: ssid[0] len 0
> ath11k_pci 0000:01:00.0: ath11k_wmi_send_scan_start_cmd ptr ffff9217101e82b4 skb data ffff9217101e804c len 616 over 0
>
> But we are looking at the ath11k_ce_tx_process_cb function. So I would have
> expected that it is related to something which as sent out. So the first thing
> I did was to add some skb_dumps in the sent path (ath11k_htc_send) and in the
> cleanup path (skb_zcopy_clear). Something like this (just the cleanup path
> because otherwise I have to post a rather large diff):
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 819cc58ab051..c15512e2f30c 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -1547,8 +1547,10 @@ static inline void skb_zcopy_clear(struct sk_buff *skb, bool zerocopy_success)
> struct ubuf_info *uarg = skb_zcopy(skb);
>
> if (uarg) {
> - if (!skb_zcopy_is_nouarg(skb))
> + if (!skb_zcopy_is_nouarg(skb)) {
> + skb_dump(KERN_ERR, skb, true);
> uarg->callback(skb, uarg, zerocopy_success);
> + }
>
> skb_shinfo(skb)->flags &= ~SKBFL_ALL_ZEROCOPY;
> }
>
>
> But interestingly, it already crashes to parse the fraglist in
> ath11k_htc_send. So I've added some more dump to figure out where is breaks.
> And I've noticed that it breaks after following section in
> ath11k_wmi_send_scan_start_cmd
>
> if (params->extraie.len)
> memcpy(ptr, params->extraie.ptr,
> params->extraie.len);
>
> Here is the full output:
>
> [ 30.641297] ath11k_wmi_send_scan_start_cmd:2357
> [ 30.645873] skb len=616 headroom=76 headlen=616 tailroom=12
> [ 30.645873] mac=(-1,-1) net=(0,-1) trans=-1
> [ 30.645873] shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
> [ 30.645873] csum(0x0 ip_summed=0 complete_sw=0 valid=0 level=0)
> [ 30.645873] hash(0x0 sw=0 l4=0) proto=0x0000 pkttype=0 iif=0
> [ 30.673381] skb headroom: 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 30.681073] skb headroom: 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 30.688758] skb headroom: 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 30.696465] skb headroom: 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 30.704197] skb headroom: 00000040: 00 00 00 00 00 00 00 00 00 00 00 00
> [ 30.710852] skb linear: 00000000: 9c 00 4d 00 00 a0 00 00 01 00 00 00 00 00 00 00
> [ 30.718538] skb linear: 00000010: 01 00 00 00 1f 00 00 00 32 00 00 00 96 00 00 00
> [ 30.726271] skb linear: 00000020: 32 00 00 00 f4 01 00 00 00 00 00 00 00 00 00 00
> [ 30.733954] skb linear: 00000030: 00 00 00 00 20 4e 00 00 05 00 00 00 10 00 00 00
> [ 30.741636] skb linear: 00000040: 00 00 00 00 61 00 00 00 01 00 00 00 01 00 00 00
> [ 30.749346] skb linear: 00000050: 08 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 30.757092] skb linear: 00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 30.764795] skb linear: 00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 30.772483] skb linear: 00000080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 30.780170] skb linear: 00000090: 00 00 00 00 28 00 00 00 1e 00 00 00 00 00 00 00
> [ 30.787854] skb linear: 000000a0: 84 01 10 00 6c 09 00 00 71 09 00 00 76 09 00 00
> [ 30.795541] skb linear: 000000b0: 7b 09 00 00 80 09 00 00 85 09 00 00 8a 09 00 00
> [ 30.803236] skb linear: 000000c0: 8f 09 00 00 94 09 00 00 99 09 00 00 9e 09 00 00
> [ 30.810933] skb linear: 000000d0: a3 09 00 00 a8 09 00 00 3c 14 00 00 50 14 00 00
> [ 30.818620] skb linear: 000000e0: 64 14 00 00 78 14 00 00 8c 14 00 00 a0 14 00 00
> [ 30.826322] skb linear: 000000f0: b4 14 00 00 c8 14 00 00 7c 15 00 00 90 15 00 00
> [ 30.834018] skb linear: 00000100: a4 15 00 00 b8 15 00 00 cc 15 00 00 e0 15 00 00
> [ 30.841712] skb linear: 00000110: f4 15 00 00 08 16 00 00 1c 16 00 00 30 16 00 00
> [ 30.849402] skb linear: 00000120: 44 16 00 00 58 16 00 00 71 16 00 00 85 16 00 00
> [ 30.857094] skb linear: 00000130: 99 16 00 00 ad 16 00 00 c1 16 00 00 43 17 00 00
> [ 30.864776] skb linear: 00000140: 57 17 00 00 6b 17 00 00 7f 17 00 00 93 17 00 00
> [ 30.872490] skb linear: 00000150: a7 17 00 00 bb 17 00 00 cf 17 00 00 e3 17 00 00
> [ 30.880182] skb linear: 00000160: f7 17 00 00 0b 18 00 00 1f 18 00 00 33 18 00 00
> [ 30.887882] skb linear: 00000170: 47 18 00 00 5b 18 00 00 6f 18 00 00 83 18 00 00
> [ 30.895581] skb linear: 00000180: 97 18 00 00 ab 18 00 00 bf 18 00 00 d3 18 00 00
> [ 30.903265] skb linear: 00000190: e7 18 00 00 fb 18 00 00 0f 19 00 00 23 19 00 00
> [ 30.910974] skb linear: 000001a0: 37 19 00 00 4b 19 00 00 5f 19 00 00 73 19 00 00
> [ 30.918675] skb linear: 000001b0: 87 19 00 00 9b 19 00 00 af 19 00 00 c3 19 00 00
> [ 30.926418] skb linear: 000001c0: d7 19 00 00 eb 19 00 00 ff 19 00 00 13 1a 00 00
> [ 30.934118] skb linear: 000001d0: 27 1a 00 00 3b 1a 00 00 4f 1a 00 00 63 1a 00 00
> [ 30.941842] skb linear: 000001e0: 77 1a 00 00 8b 1a 00 00 9f 1a 00 00 b3 1a 00 00
> [ 30.949537] skb linear: 000001f0: c7 1a 00 00 db 1a 00 00 ef 1a 00 00 03 1b 00 00
> [ 30.957221] skb linear: 00000200: 17 1b 00 00 2b 1b 00 00 3f 1b 00 00 53 1b 00 00
> [ 30.964912] skb linear: 00000210: 67 1b 00 00 7b 1b 00 00 8f 1b 00 00 a3 1b 00 00
> [ 30.972614] skb linear: 00000220: b7 1b 00 00 cb 1b 00 00 24 00 13 00 00 00 00 00
> [ 30.980315] skb linear: 00000230: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 30.988010] skb linear: 00000240: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 30.995696] skb linear: 00000250: 08 00 13 00 ff ff ff ff ff ff 00 00 08 00 11 00
> [ 31.003394] skb linear: 00000260: 00 00 00 00 00 00 00 00
> [ 31.009002] skb tailroom: 00000000: 00 00 00 00 00 00 00 00 00 00 00 00
> [ 31.015646] ath11k_wmi_send_scan_start_cmd:2362
> [ 31.020217] skb len=616 headroom=76 headlen=616 tailroom=12
> [ 31.020217] mac=(-1,-1) net=(0,-1) trans=-1
> [ 31.020217] shinfo(txflags=0 nr_frags=255 gso(size=0 type=265087 segs=0))
> [ 31.020217] csum(0x0 ip_summed=0 complete_sw=0 valid=0 level=0)
> [ 31.020217] hash(0x0 sw=0 l4=0) proto=0x0000 pkttype=0 iif=0
> [ 31.048289] skb headroom: 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 31.056015] skb headroom: 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 31.063714] skb headroom: 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 31.071425] skb headroom: 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 31.079141] skb headroom: 00000040: 00 00 00 00 00 00 00 00 00 00 00 00
> [ 31.085787] skb linear: 00000000: 9c 00 4d 00 00 a0 00 00 01 00 00 00 00 00 00 00
> [ 31.093518] skb linear: 00000010: 01 00 00 00 1f 00 00 00 32 00 00 00 96 00 00 00
> [ 31.101239] skb linear: 00000020: 32 00 00 00 f4 01 00 00 00 00 00 00 00 00 00 00
> [ 31.108947] skb linear: 00000030: 00 00 00 00 20 4e 00 00 05 00 00 00 10 00 00 00
> [ 31.116630] skb linear: 00000040: 00 00 00 00 61 00 00 00 01 00 00 00 01 00 00 00
> [ 31.124326] skb linear: 00000050: 08 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 31.132007] skb linear: 00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 31.139708] skb linear: 00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 31.147420] skb linear: 00000080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 31.155118] skb linear: 00000090: 00 00 00 00 28 00 00 00 1e 00 00 00 00 00 00 00
> [ 31.162798] skb linear: 000000a0: 84 01 10 00 6c 09 00 00 71 09 00 00 76 09 00 00
> [ 31.170486] skb linear: 000000b0: 7b 09 00 00 80 09 00 00 85 09 00 00 8a 09 00 00
> [ 31.178175] skb linear: 000000c0: 8f 09 00 00 94 09 00 00 99 09 00 00 9e 09 00 00
> [ 31.185876] skb linear: 000000d0: a3 09 00 00 a8 09 00 00 3c 14 00 00 50 14 00 00
> [ 31.193593] skb linear: 000000e0: 64 14 00 00 78 14 00 00 8c 14 00 00 a0 14 00 00
> [ 31.201278] skb linear: 000000f0: b4 14 00 00 c8 14 00 00 7c 15 00 00 90 15 00 00
> [ 31.208969] skb linear: 00000100: a4 15 00 00 b8 15 00 00 cc 15 00 00 e0 15 00 00
> [ 31.216655] skb linear: 00000110: f4 15 00 00 08 16 00 00 1c 16 00 00 30 16 00 00
> [ 31.224346] skb linear: 00000120: 44 16 00 00 58 16 00 00 71 16 00 00 85 16 00 00
> [ 31.232030] skb linear: 00000130: 99 16 00 00 ad 16 00 00 c1 16 00 00 43 17 00 00
> [ 31.239739] skb linear: 00000140: 57 17 00 00 6b 17 00 00 7f 17 00 00 93 17 00 00
> [ 31.247428] skb linear: 00000150: a7 17 00 00 bb 17 00 00 cf 17 00 00 e3 17 00 00
> [ 31.255141] skb linear: 00000160: f7 17 00 00 0b 18 00 00 1f 18 00 00 33 18 00 00
> [ 31.262840] skb linear: 00000170: 47 18 00 00 5b 18 00 00 6f 18 00 00 83 18 00 00
> [ 31.270591] skb linear: 00000180: 97 18 00 00 ab 18 00 00 bf 18 00 00 d3 18 00 00
> [ 31.278282] skb linear: 00000190: e7 18 00 00 fb 18 00 00 0f 19 00 00 23 19 00 00
> [ 31.285965] skb linear: 000001a0: 37 19 00 00 4b 19 00 00 5f 19 00 00 73 19 00 00
> [ 31.293675] skb linear: 000001b0: 87 19 00 00 9b 19 00 00 af 19 00 00 c3 19 00 00
> [ 31.301361] skb linear: 000001c0: d7 19 00 00 eb 19 00 00 ff 19 00 00 13 1a 00 00
> [ 31.309056] skb linear: 000001d0: 27 1a 00 00 3b 1a 00 00 4f 1a 00 00 63 1a 00 00
> [ 31.316753] skb linear: 000001e0: 77 1a 00 00 8b 1a 00 00 9f 1a 00 00 b3 1a 00 00
> [ 31.324441] skb linear: 000001f0: c7 1a 00 00 db 1a 00 00 ef 1a 00 00 03 1b 00 00
> [ 31.332138] skb linear: 00000200: 17 1b 00 00 2b 1b 00 00 3f 1b 00 00 53 1b 00 00
> [ 31.339840] skb linear: 00000210: 67 1b 00 00 7b 1b 00 00 8f 1b 00 00 a3 1b 00 00
> [ 31.347520] skb linear: 00000220: b7 1b 00 00 cb 1b 00 00 24 00 13 00 00 00 00 00
> [ 31.355232] skb linear: 00000230: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 31.362920] skb linear: 00000240: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 31.370607] skb linear: 00000250: 08 00 13 00 ff ff ff ff ff ff 00 00 08 00 11 00
> [ 31.378331] skb linear: 00000260: 01 08 02 04 0b 16 0c 12
> [ 31.383972] skb tailroom: 00000000: 18 24 32 04 30 48 60 6c 2d 1a e3 19
> [ 31.390651] skb fraglist:
> [ 31.393348] BUG: unable to handle page fault for address: 00000100000000bc
> [ 31.400317] #PF: supervisor read access in kernel mode
> [ 31.405624] #PF: error_code(0x0000) - not-present page
> [ 31.410832] PGD 0 P4D 0
> [ 31.413422] Oops: 0000 [#1] PREEMPT SMP NOPTI
> [ 31.417881] CPU: 0 PID: 520 Comm: wpa_supplicant Not tainted 5.16.0-rc1+ #5
> [ 31.424862] Hardware name: PC Engines apu2/apu2, BIOS v4.15.0.1 11/23/2021
> [ 31.431750] RIP: 0010:skb_end_pointer+0x0/0xe
> [ 31.436129] Code: dc ff ff ff c3 48 8b 07 48 83 e0 fe 48 83 c8 02 48 89 07 c3 8b 47 08 c3 89 77 08 c3 01 77 08 c3 29 77 08 c3 b8 00 00 00 00 c3 <8b> 87 bc 00 00 00 48 03 87 c0 00 00 00 c3 8b 87 bc 00 00 00 c3 e8
> [ 31.454883] RSP: 0018:ffffb0edc0427490 EFLAGS: 00010282
> [ 31.460116] RAX: ffff8ff9818b1ec0 RBX: 0000010000000000 RCX: 0000000000000000
> [ 31.467267] RDX: 0000000000000001 RSI: 0000010000000000 RDI: 0000010000000000
> [ 31.474408] RBP: ffff8ff9891a07c8 R08: 0000000000000000 R09: ffffb0edc0427370
> [ 31.481549] R10: ffffb0edc0427368 R11: ffffffffa5cd22e8 R12: 0000000000000268
> [ 31.488689] R13: 00000000ffffffff R14: 0000010000000000 R15: ffffffffc0f2198b
> [ 31.495823] FS: 00007f2725a55c00(0000) GS:ffff8ff9aac00000(0000) knlGS:0000000000000000
> [ 31.503936] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 31.509706] CR2: 00000100000000bc CR3: 00000001090bc000 CR4: 00000000000406f0
> [ 31.516868] Call Trace:
> [ 31.519325] <TASK>
> [ 31.521433] skb_dump+0x24/0x53a
> [ 31.524688] ? _printk+0x58/0x6f
> [ 31.527938] skb_dump+0x532/0x53a
> [ 31.531267] ath11k_wmi_send_scan_start_cmd.cold+0x5f2/0x793 [ath11k]
> [ 31.537785] ath11k_mac_op_hw_scan+0x173/0x3f0 [ath11k]
> [ 31.543086] drv_hw_scan+0x43/0x130 [mac80211]
> [ 31.547690] __ieee80211_start_scan+0x152/0x6d0 [mac80211]
> [ 31.553306] ieee80211_request_scan+0x2c/0x50 [mac80211]
> [ 31.558738] rdev_scan+0x28/0xd0 [cfg80211]
> [ 31.563117] nl80211_trigger_scan+0x3fe/0x680 [cfg80211]
> [ 31.568584] genl_family_rcv_msg_doit+0xea/0x150
> [ 31.573223] genl_rcv_msg+0xde/0x1d0
> [ 31.576816] ? nl80211_send_scan_start+0x90/0x90 [cfg80211]
> [ 31.582520] ? genl_get_cmd+0xd0/0xd0
> [ 31.586191] netlink_rcv_skb+0x50/0xf0
> [ 31.589958] genl_rcv+0x24/0x40
> [ 31.593109] netlink_unicast+0x239/0x340
> [ 31.597045] netlink_sendmsg+0x245/0x480
> [ 31.600981] sock_sendmsg+0x5e/0x60
> [ 31.604487] ____sys_sendmsg+0x22e/0x270
> [ 31.608440] ? import_iovec+0x2d/0x30
> [ 31.612123] ? sendmsg_copy_msghdr+0x7c/0xa0
> [ 31.616406] ___sys_sendmsg+0x75/0xb0
> [ 31.620081] ? __mod_lruvec_page_state+0x7d/0xc0
> [ 31.624714] ? folio_add_lru+0x5c/0xa0
> [ 31.628476] ? _raw_spin_unlock+0x16/0x30
> [ 31.632506] ? __handle_mm_fault+0x1261/0x1520
> [ 31.636965] __sys_sendmsg+0x59/0xa0
> [ 31.640552] do_syscall_64+0x3b/0xc0
> [ 31.644148] entry_SYSCALL_64_after_hwframe+0x44/0xae
> [ 31.649208] RIP: 0033:0x7f2725ef6f33
> [ 31.652797] Code: 64 89 02 48 c7 c0 ff ff ff ff eb b7 66 2e 0f 1f 84 00 00 00 00 00 90 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 55 c3 0f 1f 40 00 48 83 ec 28 89 54 24 1c 48
> [ 31.671547] RSP: 002b:00007fff1b5f1668 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
> [ 31.679122] RAX: ffffffffffffffda RBX: 0000564919260760 RCX: 00007f2725ef6f33
> [ 31.686264] RDX: 0000000000000000 RSI: 00007fff1b5f16a0 RDI: 0000000000000005
> [ 31.693406] RBP: 000056491928f6c0 R08: 0000000000000004 R09: 00007f2725fb6c00
> [ 31.700547] R10: 00007fff1b5f178c R11: 0000000000000246 R12: 0000564919260670
> [ 31.707689] R13: 00007fff1b5f16a0 R14: 0000000000000000 R15: 00007fff1b5f178c
> [ 31.714834] </TASK>
> [ 31.717031] Modules linked in: qrtr_mhi btusb btrtl btbcm btintel bluetooth jitterentropy_rng sha512_ssse3 sha512_generic drbg ansi_cprng amd64_edac ecdh_generic edac_mce_amd ecc kvm_amd kvm irqbypass qrtr crc32_pclmul ghash_clmulni_intel ath11k_pci mhi ath11k evdev pcengines_apuv2 qmi_helpers gpio_keys_polled gpio_amd_fch aesni_intel snd_pcm crypto_simd snd_timer sdhci_pci xhci_pci snd cqhci mac80211 soundcore ehci_pci sp5100_tco cryptd libarc4 xhci_hcd sdhci ehci_hcd pcspkr igb watchdog ptp cfg80211 mmc_core k10temp i2c_piix4 fam15h_power usbcore ccp pps_core sg dca rng_core i2c_algo_bit usb_common rfkill leds_gpio gpio_keys acpi_cpufreq button fuse drm configfs ip_tables x_tables autofs4 ext4 crc32c_generic crc16 mbcache jbd2 sd_mod t10_pi crc_t10dif crct10dif_generic ahci libahci libata crct10dif_pclmul scsi_mod crct10dif_common crc32c_intel scsi_common
> [ 31.793074] CR2: 00000100000000bc
> [ 31.796498] ---[ end trace 07252723010a83e6 ]---
> [ 31.801261] RIP: 0010:skb_end_pointer+0x0/0xe
> [ 31.805824] Code: dc ff ff ff c3 48 8b 07 48 83 e0 fe 48 83 c8 02 48 89 07 c3 8b 47 08 c3 89 77 08 c3 01 77 08 c3 29 77 08 c3 b8 00 00 00 00 c3 <8b> 87 bc 00 00 00 48 03 87 c0 00 00 00 c3 8b 87 bc 00 00 00 c3 e8
> [ 31.824842] RSP: 0018:ffffb0edc0427490 EFLAGS: 00010282
> [ 31.830105] RAX: ffff8ff9818b1ec0 RBX: 0000010000000000 RCX: 0000000000000000
> [ 31.837270] RDX: 0000000000000001 RSI: 0000010000000000 RDI: 0000010000000000
> [ 31.844441] RBP: ffff8ff9891a07c8 R08: 0000000000000000 R09: ffffb0edc0427370
> [ 31.851614] R10: ffffb0edc0427368 R11: ffffffffa5cd22e8 R12: 0000000000000268
> [ 31.858781] R13: 00000000ffffffff R14: 0000010000000000 R15: ffffffffc0f2198b
> [ 31.866020] FS: 00007f2725a55c00(0000) GS:ffff8ff9aac00000(0000) knlGS:0000000000000000
> [ 31.874141] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 31.879920] CR2: 00000100000000bc CR3: 00000001090bc000 CR4: 00000000000406f0
>
>
> So the length calculated for ath11k_wmi_alloc_skb is simply wrong. The
> reason is extraie_len_with_pad, which is only a u8. But with
> IEEE80211_HW_SINGLE_SCAN_ON_ALL_BANDS, params->extraie.len is already
> 264 for me. So the length ends up as 8, but the data still occupies
> 264 bytes.
>
> Another problem is the length field of WMI_TLV_LEN. params->extraie.len
> can be up to 32 bits wide, while WMI_TLV_LEN only has 16 bits. So
> params->extraie.len must also be size limited or we might run into a
> different problem.
>
> Kind regards,
> Sven
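
The truncation described in the quoted report can be reproduced outside the
driver. The following is only a minimal standalone sketch, not the ath11k
code: roundup4(), padded_len_u8() and fits_tlv_len() are hypothetical helpers
invented for this illustration. Only the numbers come from the report: an
extraie length of 264, a length variable of type u8, and a 16-bit TLV length
field.

```c
#include <stdint.h>

/* Round a byte length up to the next multiple of 4 (TLV-style padding). */
static uint32_t roundup4(uint32_t len)
{
	return (len + 3u) & ~3u;
}

/*
 * Hypothetical helper: store the padded length in a u8, as the report says
 * extraie_len_with_pad did. Values above 255 silently wrap modulo 256.
 */
static uint8_t padded_len_u8(uint32_t extraie_len)
{
	return (uint8_t)roundup4(extraie_len);
}

/*
 * Hypothetical helper: check whether the padded length still fits into a
 * 16-bit length field, the second limit mentioned in the report.
 */
static int fits_tlv_len(uint32_t extraie_len)
{
	return roundup4(extraie_len) <= UINT16_MAX;
}
```

With these helpers, padded_len_u8(264) evaluates to 8 while roundup4(264) is
still 264, so a buffer sized from the u8 value is 256 bytes too small for the
data copied into it. fits_tlv_len() shows the kind of bound a caller would
have to enforce before encoding the length into 16 bits.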

2021-12-08 08:16:07

by Kalle Valo

Subject: Re: [PATCH] ath11k: enable IEEE80211_HW_SINGLE_SCAN_ON_ALL_BANDS for WCN6855

Wen Gong <[email protected]> wrote:

> Currently mac80211 sends 3 scan requests for each scan on WCN6855,
> one each for the 2.4 GHz, 5 GHz and 6 GHz bands. The WCN6855 firmware
> caches the RNR IE (Reduced Neighbor Report element) found in the
> 2.4 GHz/5 GHz beacons of an AP that is co-located with a 6 GHz AP,
> and uses that cache for the 6 GHz scan, but only if the 6 GHz scan
> is part of the same scan as the 2.4 GHz/5 GHz bands. This helps to
> find more 6 GHz APs. It also reduces the scan time because the
> firmware uses a dual-band scan for 2.4 GHz/5 GHz, which means the
> 2.4 GHz and 5 GHz scans run simultaneously.
>
> Set the flag IEEE80211_HW_SINGLE_SCAN_ON_ALL_BANDS for WCN6855 since
> it supports 2.4 GHz/5 GHz/6 GHz and is a single pdev, which means
> all of the 2.4 GHz/5 GHz/6 GHz bands exist in the same wiphy/ieee80211_hw.
>
> Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-01720.1-QCAHSPSWPL_V1_V2_SILICONZ_LITE-1
>
> Signed-off-by: Wen Gong <[email protected]>
> Signed-off-by: Kalle Valo <[email protected]>

Sven, after your memory corruption fix is this good to take?

--
https://patchwork.kernel.org/project/linux-wireless/patch/[email protected]/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches


2021-12-08 08:19:34

by Wen Gong

Subject: Re: [PATCH] ath11k: enable IEEE80211_HW_SINGLE_SCAN_ON_ALL_BANDS for WCN6855

On 12/8/2021 4:16 PM, Kalle Valo wrote:
> Wen Gong <[email protected]> wrote:
...
> Sven, after your memory corruption fix is this good to take?

With Sven's fix "ath11k: Fix buffer overflow when scanning with
extraie" applied, the kernel crash no longer happens.

But it needs Sven's confirmation.


2021-12-08 09:13:12

by Sven Eckelmann

Subject: Re: [PATCH] ath11k: enable IEEE80211_HW_SINGLE_SCAN_ON_ALL_BANDS for WCN6855

On Wednesday, 8 December 2021 09:19:28 CET Wen Gong wrote:
> On 12/8/2021 4:16 PM, Kalle Valo wrote:
> > Wen Gong <[email protected]> wrote:
> ...
> > Sven, after your memory corruption fix is this good to take?
>
> With Sven's fix "ath11k: Fix buffer overflow when scanning with
> extraie" applied, the kernel crash no longer happens.
>
> But it needs Sven's confirmation.

Correct, it no longer causes any problems when the other fix is applied
before this change.

Tested-by: Sven Eckelmann <[email protected]>

Kind regards,
Sven


2021-12-08 09:49:03

by Kalle Valo

Subject: Re: [PATCH] ath11k: enable IEEE80211_HW_SINGLE_SCAN_ON_ALL_BANDS for WCN6855

Sven Eckelmann <[email protected]> writes:

> On Wednesday, 8 December 2021 09:19:28 CET Wen Gong wrote:
>> On 12/8/2021 4:16 PM, Kalle Valo wrote:
>> > Wen Gong <[email protected]> wrote:
>> ...
>> > Sven, after your memory corruption fix is this good to take?
>>
>> With Sven's fix "ath11k: Fix buffer overflow when scanning with
>> extraie" applied, the kernel crash no longer happens.
>>
>> But it needs Sven's confirmation.
>
> Correct, it no longer causes any problems when the other fix is applied
> before this change.
>
> Tested-by: Sven Eckelmann <[email protected]>

Very good, thanks. I included your Tested-by.

--
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

2021-12-09 07:59:46

by Kalle Valo

Subject: Re: [PATCH] ath11k: enable IEEE80211_HW_SINGLE_SCAN_ON_ALL_BANDS for WCN6855

Wen Gong <[email protected]> wrote:

> Currently mac80211 sends 3 scan requests for each scan on WCN6855,
> one each for the 2.4 GHz, 5 GHz and 6 GHz bands. The WCN6855 firmware
> caches the RNR IE (Reduced Neighbor Report element) found in the
> 2.4 GHz/5 GHz beacons of an AP that is co-located with a 6 GHz AP,
> and uses that cache for the 6 GHz scan, but only if the 6 GHz scan
> is part of the same scan as the 2.4 GHz/5 GHz bands. This helps to
> find more 6 GHz APs. It also reduces the scan time because the
> firmware uses a dual-band scan for 2.4 GHz/5 GHz, which means the
> 2.4 GHz and 5 GHz scans run simultaneously.
>
> Set the flag IEEE80211_HW_SINGLE_SCAN_ON_ALL_BANDS for WCN6855 since
> it supports 2.4 GHz/5 GHz/6 GHz and is a single pdev, which means
> all of the 2.4 GHz/5 GHz/6 GHz bands exist in the same wiphy/ieee80211_hw.
>
> Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-01720.1-QCAHSPSWPL_V1_V2_SILICONZ_LITE-1
>
> Tested-by: Sven Eckelmann <[email protected]>
> Signed-off-by: Wen Gong <[email protected]>
> Signed-off-by: Kalle Valo <[email protected]>

Patch applied to ath-next branch of ath.git, thanks.

9f6da09a5f6a ath11k: enable IEEE80211_HW_SINGLE_SCAN_ON_ALL_BANDS for WCN6855

--
https://patchwork.kernel.org/project/linux-wireless/patch/[email protected]/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches