2023-10-01 22:14:56

by James Dutton

[permalink] [raw]
Subject: mac80211 bugs

Hi,

I recently saw a report of a kernel bug that was caused by a null
pointer reference in:
mac80211.c: sta_set_sinfo()

Now, looking at the kernel source code, I can see sta_set_sinfo()
dereferencing pointers.
The function is a void function, so there is no defensive programming
going on in the wifi drivers here.
It would seem sensible to try to log a message and return an -EINVAL,
rather than crashing/halting the entire kernel, but a void function
cannot return anything.

Why is there no defensive/security-by-design programming in the Linux
wifi drivers?

Kind Regards

James

P.S. The stack trace is here:
https://github.com/openwrt/openwrt/issues/13198


2023-10-09 15:57:22

by James Dutton

[permalink] [raw]
Subject: Re: mac80211 bugs

On Sun, 1 Oct 2023 at 11:27, James Dutton <[email protected]> wrote:
>
> Hi,
>
> I recently saw a report of a kernel bug that was caused by a null
> pointer reference in:
> mac80211.c: sta_set_sinfo()
>

The kernel Oops has this:
Code: d3441c63 12000c00 8b030ca3 f9409c63 (f9400465)
Extract of the code:

/usr/src/linux/net/mac80211/st
a_info.c:2451
d118: 17fffee6 b ccb0 <sta_set_sinfo+0x6b0>
/usr/src/linux/net/mac80211/sta_info.c:2422
d11c: f9402107 ldr x7, [x8, #64]
d120: d3441c42 ubfx x2, x2, #4, #4
d124: 12000c00 and w0, w0, #0xf
d128: 8b020ce2 add x2, x7, x2, lsl #3
/usr/src/linux/net/mac80211/sta_info.c:2424
d12c: f9409c42 ldr x2, [x2, #312]
d130: f9400447 ldr x7, [x2, #8] <-----It_crashes_here
d134: b40004c7 cbz x7, d1cc <sta_set_sinfo+0xbcc>
/usr/src/linux/net/mac80211/sta_info.c:2427
d138: 52800188 mov w8, #0xc // #12
d13c: 52800082 mov w2, #0x4 // #4
d140: 9ba81c00 umaddl x0, w0, w8, x7
d144: 79400800 ldrh w0, [x0, #4]
/usr/src/linux/net/mac80211/sta_info.c:2428
d148: f10004df cmp x6, #0x1
d14c: 54000080 b.eq d15c <sta_set_sinfo+0xb5c> // b.none


/usr/src/linux/net/mac80211/sta_info.c:2424
case STA_STATS_RATE_TYPE_LEGACY: {
struct ieee80211_supported_band *sband;
u16 brate;
unsigned int shift;
int band = STA_STATS_GET(LEGACY_BAND, rate);
int rate_idx = STA_STATS_GET(LEGACY_IDX, rate);

sband = local->hw.wiphy->bands[band];

if (WARN_ON_ONCE(!sband->bitrates)) <------It_crashes_here
break;

brate = sband->bitrates[rate_idx].bitrate;
if (rinfo->bw == RATE_INFO_BW_5)
shift = 2;
else if (rinfo->bw == RATE_INFO_BW_10)
shift = 1;
else
shift = 0;
rinfo->legacy = DIV_ROUND_UP(brate, 1 << shift);
break;
}

Looking at this, it can be one of two things:
1) local->hw.wiphy->bands[band]; is NULL
2) bands is an array of 6 items, making band valid for values 0-5.
If band >= 6, it would cause problems.
So maybe something along these lines might help:

Signed-off-by: James Courtier-Dutton <[email protected]>
--- sta_info.c.org 2023-10-08 19:52:13.578270007 +0100
+++ sta_info.c.new2 2023-10-08 19:52:09.450214070 +0100
@@ -2420,7 +2420,26 @@
int band = STA_STATS_GET(LEGACY_BAND, rate);
int rate_idx = STA_STATS_GET(LEGACY_IDX, rate);

+ if (band >= NUM_NL80211_BANDS) {
+ printk("ERROR: band=%d is too large.
Returning\n", band);
+ break;
+ }
+
sband = local->hw.wiphy->bands[band];
+ if (!sband) {
+ printk("ERROR: sband NULL. Returning\n");
+ break;
+ }

if (WARN_ON_ONCE(!sband->bitrates))
break;

2023-10-10 14:22:55

by Johannes Berg

[permalink] [raw]
Subject: Re: mac80211 bugs

On Mon, 2023-10-09 at 16:56 +0100, James Dutton wrote:
> /usr/src/linux/net/mac80211/sta_info.c:2424
> case STA_STATS_RATE_TYPE_LEGACY: {
> struct ieee80211_supported_band *sband;
> u16 brate;
> unsigned int shift;
> int band = STA_STATS_GET(LEGACY_BAND, rate);
> int rate_idx = STA_STATS_GET(LEGACY_IDX, rate);
>
> sband = local->hw.wiphy->bands[band];
>
> if (WARN_ON_ONCE(!sband->bitrates)) <------It_crashes_here
> break;
>
> brate = sband->bitrates[rate_idx].bitrate;
> if (rinfo->bw == RATE_INFO_BW_5)
> shift = 2;
> else if (rinfo->bw == RATE_INFO_BW_10)
> shift = 1;
> else
> shift = 0;
> rinfo->legacy = DIV_ROUND_UP(brate, 1 << shift);
> break;
> }
>
> Looking at this, it can be one of two things:
> 1) local->hw.wiphy->bands[band]; is NULL

Yes, I think that's it.

> 2) bands is an array of 6 items, making band valid for values 0-5.
> If band >= 6, it would cause problems.

Highly unlikely.

> So maybe something along these lines might help:
>
> Signed-off-by: James Courtier-Dutton <[email protected]>
> --- sta_info.c.org 2023-10-08 19:52:13.578270007 +0100
> +++ sta_info.c.new2 2023-10-08 19:52:09.450214070 +0100
> @@ -2420,7 +2420,26 @@
> int band = STA_STATS_GET(LEGACY_BAND, rate);
> int rate_idx = STA_STATS_GET(LEGACY_IDX, rate);
>
> + if (band >= NUM_NL80211_BANDS) {
> + printk("ERROR: band=%d is too large.
> Returning\n", band);
> + break;
> + }
> +
> sband = local->hw.wiphy->bands[band];
> + if (!sband) {
> + printk("ERROR: sband NULL. Returning\n");
> + break;
> + }


You'd really never want a plain printk, and anyway, that printk is
malformed (no severity string macro).


_Maybe_ change it to WARN_ON_ONCE(!sband || !sband->bitrates) there, but
really I think we should prevent this in the first place.


Is this, by any chance, a device without 2.4 GHz?

johannes

2023-10-10 20:43:14

by James Dutton

[permalink] [raw]
Subject: Re: mac80211 bugs

On Tue, 10 Oct 2023 at 15:22, Johannes Berg <[email protected]> wrote:
>
> On Mon, 2023-10-09 at 16:56 +0100, James Dutton wrote:
> >
> > Looking at this, it can be one of two things:
> > 1) local->hw.wiphy->bands[band]; is NULL
>
> Yes, I think that's it.
>
> > 2) bands is an array of 6 items, making band valid for values 0-5.
> > If band >= 6, it would cause problems.
>
> Highly unlikely.
>
[snip]
>
> _Maybe_ change it to WARN_ON_ONCE(!sband || !sband->bitrates) there, but
> really I think we should prevent this in the first place.
>
>
> Is this, by any chance, a device without 2.4 GHz?
>

Hi,
I have the same device that is reported to be crashing. My device does
not actually crash, so I personally have not seen the problem.
My device is what I would call unstable (not crashing, just not
forwarding packets) though. My work around is to switch on/off
airplane mode on the client, and it continues forwarding packets
again. Some of the wifi device drivers have some not very portable use
of bit fields that looks suspicious to me, but again no proof yet of
what causes my stability problems. It is why a lot of access to bit
fields in the kernel correctly uses portable accessors like
STA_STATS_GET() and friends.

I have seen this though:
https://github.com/openwrt/openwrt/issues/13198
Which has reports of a few other people seeing the crash.
The device has both 2.4 and 5G wifi.
From what I can see, the band information originates from what the
wifi card received over the RF.
So, theoretically, it might be caused by a bogus wifi packet being received.
I agree that it is unlikely for "band" to get to 6 or above, but until
one of the users who are seeing the problem runs a kernel with the
extra printk or WARN statements in it, I don't think we are going to
know the cause for certain.
I think it prudent to put the if statements in to catch both edge cases.