2023-08-24 23:27:30

by Peter Astrand

Subject: wl18xx: firmware security, very unstable with multiple roles


Hi. We have been working with the TI WL1807MOD for quite some time now, in
a custom i.MX6 system. We are using a configuration with both Mesh and
AP/WiFi roles, based on hostapd and wpa_supplicant 2.10. The mesh size is
~10 nodes.

Unfortunately, in our experience, the wl18xx (WiLink) solution has
numerous issues. One major problem is that TI only supports the 4.19.38
kernel. Although 4.19 is an LTS/SLTS release, 4.19.38 has hundreds of
vulnerabilities that were fixed in later 4.19 versions. WiLink also requires
a number of patches on top of 4.19.38. Many of these are of low quality:
some lack a description, and some change code that was added by earlier
patches. Most of them have either been rejected upstream or never been
submitted for inclusion (see
https://github.com/astrand/wilink8-wlan-build-utilites/wiki).

For us, 4.19 is too old; we need at least Linux 5.x. This has proved to be
a major challenge. Currently, we are using 5.10.72 with a single
modification: adding .beacon_int_min_gcd = 1 to wl18xx_iface_combinations
(corresponding to TI patch 0018). We have tried 5.4.x kernels as well, with
different combinations of TI patches. Unfortunately, the problems are still
present:


1)
The firmware policy is problematic and causes security issues: the kernel
requires firmware 8.9.x.x.58 or later. The repository and changelog are
incomplete (https://git.ti.com/cgit/wilink8-wlan/wl18xx_fw/), but since
that version there have been several security-related bugfixes for the
firmware. Compatibility between modern kernels and modern firmware
versions is "unknown". The latest firmware release (which includes the
fragattack fix) does not work at all with upstream kernels due to an API
change, unless
https://git.ti.com/cgit/wilink8-wlan/build-utilites/tree/patches/kernel_patches/4.19.38/0023-wlcore-Fixing-PN-drift-on-encrypted-link-after-recov.patch?h=r8.8
is applied. Again, this patch is only provided for 4.19.38, which means
that many users are more or less excluded from deploying the bug and
security fixes in the latest firmware. Ideally, I think the "0023" patch
above should be rewritten and applied, so that the kernel supports both
new and old firmware versions. What do you think?
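To make the "support both new and old firmware" idea concrete, here is a minimal userspace sketch of selecting a code path from the firmware version. It assumes the five-component format shown above (8.9.x.x.58) and treats each "x" as a component that is simply not compared. The names fw_ver_parse, fw_ver_at_least, fw_select_api and FW_VER_ANY are hypothetical. The version that introduced the API change is not stated here, so it is passed in as a parameter. This is not wlcore's actual version check, which may behave differently.

```c
#include <stdbool.h>
#include <stdio.h>

#define FW_VER_FIELDS 5   /* e.g. "8.9.0.0.58" */
#define FW_VER_ANY    (-1) /* hypothetical wildcard: the "x" in 8.9.x.x.58 */

/* Hypothetical outcome of the version check. */
enum fw_api { FW_API_TOO_OLD, FW_API_LEGACY, FW_API_NEW };

/* Parse "a.b.c.d.e" into five integers; reject trailing garbage. */
bool fw_ver_parse(const char *s, int out[FW_VER_FIELDS])
{
    int consumed = 0;
    if (sscanf(s, "%d.%d.%d.%d.%d%n", &out[0], &out[1], &out[2],
               &out[3], &out[4], &consumed) != FW_VER_FIELDS)
        return false;
    return s[consumed] == '\0';
}

/* Lexicographic ">=" that skips wildcard components of the minimum. */
bool fw_ver_at_least(const int ver[FW_VER_FIELDS],
                     const int min[FW_VER_FIELDS])
{
    for (int i = 0; i < FW_VER_FIELDS; i++) {
        if (min[i] == FW_VER_ANY)
            continue;
        if (ver[i] != min[i])
            return ver[i] > min[i];
    }
    return true; /* all compared components equal */
}

/* Refuse too-old firmware, otherwise pick the legacy or the new code path
 * depending on the (assumed, caller-supplied) API-change version. */
enum fw_api fw_select_api(const int ver[FW_VER_FIELDS],
                          const int min_ver[FW_VER_FIELDS],
                          const int new_api_ver[FW_VER_FIELDS])
{
    if (!fw_ver_at_least(ver, min_ver))
        return FW_API_TOO_OLD;
    return fw_ver_at_least(ver, new_api_ver) ? FW_API_NEW : FW_API_LEGACY;
}
```

With the minimum expressed as {8, 9, FW_VER_ANY, FW_VER_ANY, 58}, firmware older than .58 is refused, and newer firmware gets the new-API handling only from the caller-supplied threshold onwards. The old behaviour stays intact for existing firmware.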


2)
It is very common that communication stops and
/sys/kernel/debug/ieee80211/phy0/wlcore/tx_queue_len climbs to 100-500. This has been reported at
https://e2e.ti.com/support/wireless-connectivity/wi-fi-group/wifi/f/wi-fi-forum/957174/wl1835mod-wl1835-firmware-gets-stuck-and-does-not-recover-when-bringing-up-large-mesh-10-peers-within-range/.
Unfortunately, the "solution" was to run 4.19.38 with the WiLink patches.
Currently, we are monitoring tx_queue_len and restarting the network
interfaces if tx_queue_len stays too high for too long, but this is
error-prone and causes downtime.
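The monitoring workaround can be sketched roughly as follows. This is a minimal userspace sketch, assuming debugfs is mounted, phy0 is the wl18xx, and the tx_queue_len file contains a single decimal integer. The threshold and hold time are hypothetical tuning values, and txq_watchdog_update is a hypothetical helper name. The "too high for too long" test is kept as a pure function of (queue length, timestamp) so that it does not depend on the clock or on the file.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

/* Path from the report above. */
#define TX_QUEUE_LEN_PATH "/sys/kernel/debug/ieee80211/phy0/wlcore/tx_queue_len"

struct txq_watchdog {
    unsigned long threshold; /* hypothetical: queue length seen as "too high" */
    double hold_secs;        /* hypothetical: how long it may stay too high */
    bool high;               /* currently above the threshold? */
    double high_since;       /* monotonic time when it went above */
};

/* Read one decimal integer from the debugfs file (assumed format). */
bool read_tx_queue_len(const char *path, unsigned long *out)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return false;
    bool ok = fscanf(f, "%lu", out) == 1;
    fclose(f);
    return ok;
}

/* Monotonic clock, so wall-clock jumps do not trigger a restart. */
double monotonic_secs(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Feed one sample. Returns true once the queue has stayed above the
 * threshold for at least hold_secs; then re-arms so that the caller
 * restarts the interfaces only once per stall. */
bool txq_watchdog_update(struct txq_watchdog *w, unsigned long qlen, double now)
{
    if (qlen <= w->threshold) {
        w->high = false; /* queue drained: reset */
        return false;
    }
    if (!w->high) {
        w->high = true;  /* first sample above threshold */
        w->high_since = now;
        return false;
    }
    if (now - w->high_since >= w->hold_secs) {
        w->high = false; /* re-arm after firing */
        return true;
    }
    return false;
}
```

A caller would poll about once per second with read_tx_queue_len() and monotonic_secs(), and restart the interfaces when txq_watchdog_update() returns true. This only limits the damage; it does not address why the firmware stops transmitting.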


3)
Warnings such as:
[ 3536.621345] wlcore: WARNING corrupted packet in RX: status: 0x1 len: 88
...are printed "all the time". Is this expected?


4)
We have suffered from panics of this type:
skbuff: skb_under_panic: text:c7703318 len:158 put:16 head:cf7a8af6 data:10cc3c43 tail:0x9c9d269c end:0x9c9d2740 dev:wlan0_mesh

This has been reported here:
https://e2e.ti.com/support/wireless-connectivity/wi-fi-group/wifi/f/wi-fi-forum/1190025/wl1807mod-mesh-and-ap-incompatible-with-5-4-56-and-occasional-skbuff-skb_under_panic

We were able to reproduce this even with 4.19, but TI has not provided
any fix. In the end, we wrote a patch ourselves to fix it; it is attached.
Does this make sense? If so, we can submit it properly with
Signed-off-by etc.
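As background for reviewing the attached patch (which is not reproduced here), the following userspace model, not kernel code, illustrates what skb_under_panic reports: skb_push() was asked to prepend more bytes than the buffer's headroom (put:16 in the log above). In the kernel, skb_push() moves skb->data back and panics if it ends up below skb->head. A common remedy in drivers is to ensure enough headroom, for example with skb_cow_head(), before pushing. Whether the attached patch takes exactly this approach is not shown here. The names toy_skb, toy_push and toy_ensure_headroom are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy model of the sk_buff pointer layout: head <= data <= tail <= end.
 * This is NOT the kernel's struct sk_buff. */
struct toy_skb {
    unsigned char *head, *data, *tail, *end;
};

/* Bytes available in front of the payload. */
size_t toy_headroom(const struct toy_skb *skb)
{
    return (size_t)(skb->data - skb->head);
}

/* Like skb_push(): prepend len bytes. Where the kernel would call
 * skb_under_panic(), the model returns NULL instead. */
unsigned char *toy_push(struct toy_skb *skb, size_t len)
{
    if (len > toy_headroom(skb))
        return NULL; /* kernel: skb_under_panic() */
    skb->data -= len;
    return skb->data;
}

/* Rough analogue of skb_cow_head(): reallocate so that at least `needed`
 * bytes of headroom exist, preserving payload and tailroom. */
bool toy_ensure_headroom(struct toy_skb *skb, size_t needed)
{
    if (toy_headroom(skb) >= needed)
        return true;
    size_t len = (size_t)(skb->tail - skb->data);
    size_t tailroom = (size_t)(skb->end - skb->tail);
    size_t size = needed + len + tailroom;
    unsigned char *buf = malloc(size);
    if (!buf)
        return false;
    memcpy(buf + needed, skb->data, len); /* move payload behind new headroom */
    free(skb->head);
    skb->head = buf;
    skb->data = buf + needed;
    skb->tail = skb->data + len;
    skb->end = buf + size;
    return true;
}
```

In this model, pushing 16 bytes onto a buffer with only 8 bytes of headroom fails, while calling toy_ensure_headroom() first makes the same push succeed with the payload intact.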


5)
Lost DHCPOFFERs. We run hostapd+dnsmasq to provide AP+DHCP service on top
of wl18xx. The AP system sees the DHCPDISCOVER and sends a DHCPOFFER, but
the offer never reaches the client. This has been verified with both a
Windows laptop and another Linux system as the client, so it seems that
wl18xx does not transmit properly. This exact problem has been described
here:

https://e2e.ti.com/support/wireless-connectivity/wi-fi-group/wifi/f/wi-fi-forum/1158120/wl1807mod-dhcp-server-not-working-in-wifi-ap-mode-when-sdio-clock-frequency-is-50mhz/

There, the problem was solved by lowering the SDIO clock below 50 MHz.
Unfortunately, this does not help on our system.


6) Occasionally, we get kernel warnings like:
[55467.283136] WARNING: CPU: 0 PID: 6929 at drivers/net/wireless/ti/wlcore/main.c:5244 wlcore_pending_auth_complete_work+0x74/0xe8 [wlcore]



Grateful for any comments.


Br,
Peter Astrand


Attachments:
skb_push.patch (1.93 kB)