2023-10-23 17:57:51

by Ben Greear

[permalink] [raw]
Subject: [PATCH] wifi: mac80211: work around crash in mlme.c

From: Ben Greear <[email protected]>

Protect from NULL ifmgd->assoc_data in ieee80211_mgd_deauth, crash
was seen here fairly often in a 32-station test case utilizing
mtk7922 and be200 radios. I'm not sure if radio types matters
though.

Signed-off-by: Ben Greear <[email protected]>
---

Patch is for wireless-next tree, bug was likely introduced in
this release since this crash was not seen in earlier 6.6-rc testing
nor in 6.5 or earlier.

There may be a better way to fix this...

net/mac80211/mlme.c | 37 ++++++++++++++++++++++++-------------
1 file changed, 24 insertions(+), 13 deletions(-)

diff --git a/net/mac80211/mlme.c b/net/mac80211/mlme.c
index 7695531de611..d2a44a13625c 100644
--- a/net/mac80211/mlme.c
+++ b/net/mac80211/mlme.c
@@ -8185,13 +8185,18 @@ int ieee80211_mgd_deauth(struct ieee80211_sub_if_data *sdata,
"aborting authentication with %pM by local choice (Reason: %u=%s)\n",
req->bssid, req->reason_code,
ieee80211_get_reason_code_string(req->reason_code));
-
- info.link_id = ifmgd->assoc_data->assoc_link_id;
- drv_mgd_prepare_tx(sdata->local, sdata, &info);
- ieee80211_send_deauth_disassoc(sdata, req->bssid, req->bssid,
- IEEE80211_STYPE_DEAUTH,
- req->reason_code, tx,
- frame_buf);
+ if (WARN_ON_ONCE((unsigned long)(ifmgd) < 4000 ||
+ (unsigned long)(ifmgd->assoc_data) < 4000)) {
+ sdata_err(sdata, "ieee80211-mgd-auth abort auth, bad memory: ifmgd: %p ifmgd->assoc_data: %p\n",
+ ifmgd, ifmgd->assoc_data);
+ } else {
+ info.link_id = ifmgd->assoc_data->assoc_link_id;
+ drv_mgd_prepare_tx(sdata->local, sdata, &info);
+ ieee80211_send_deauth_disassoc(sdata, req->bssid, req->bssid,
+ IEEE80211_STYPE_DEAUTH,
+ req->reason_code, tx,
+ frame_buf);
+ }
ieee80211_destroy_auth_data(sdata, false);
ieee80211_report_disconnect(sdata, frame_buf,
sizeof(frame_buf), true,
@@ -8207,12 +8212,18 @@ int ieee80211_mgd_deauth(struct ieee80211_sub_if_data *sdata,
req->bssid, req->reason_code,
ieee80211_get_reason_code_string(req->reason_code));

- info.link_id = ifmgd->auth_data->link_id;
- drv_mgd_prepare_tx(sdata->local, sdata, &info);
- ieee80211_send_deauth_disassoc(sdata, req->bssid, req->bssid,
- IEEE80211_STYPE_DEAUTH,
- req->reason_code, tx,
- frame_buf);
+ if (WARN_ON_ONCE((unsigned long)(ifmgd) < 4000 ||
+ (unsigned long)(ifmgd->assoc_data) < 4000)) {
+ sdata_err(sdata, "ieee80211-mgd-auth abort assoc, bad memory: ifmgd: %p ifmgd->assoc_data: %p\n",
+ ifmgd, ifmgd->assoc_data);
+ } else {
+ info.link_id = ifmgd->auth_data->link_id;
+ drv_mgd_prepare_tx(sdata->local, sdata, &info);
+ ieee80211_send_deauth_disassoc(sdata, req->bssid, req->bssid,
+ IEEE80211_STYPE_DEAUTH,
+ req->reason_code, tx,
+ frame_buf);
+ }
ieee80211_destroy_assoc_data(sdata, ASSOC_ABANDON);
ieee80211_report_disconnect(sdata, frame_buf,
sizeof(frame_buf), true,
--
2.40.0


2023-10-23 18:18:16

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH] wifi: mac80211: work around crash in mlme.c

On Mon, 2023-10-23 at 10:57 -0700, [email protected] wrote:
> From: Ben Greear <[email protected]>
>
> Protect from NULL ifmgd->assoc_data in ieee80211_mgd_deauth, crash
> was seen here fairly often in a 32-station test case utilizing
> mtk7922 and be200 radios. I'm not sure if radio types matters
> though.
>
> Signed-off-by: Ben Greear <[email protected]>
> ---
>
> Patch is for wireless-next tree, bug was likely introduced in
> this release since this crash was not seen in earlier 6.6-rc testing
> nor in 6.5 or earlier.
>
> There may be a better way to fix this...

I mean, you're not *actually* suggesting we merge this patch, right?
Right?!

> +++ b/net/mac80211/mlme.c
> @@ -8185,13 +8185,18 @@ int ieee80211_mgd_deauth(struct ieee80211_sub_if_data *sdata,

I don't know what you did there, but that file doesn't even have 8k
lines for me.

> "aborting authentication with %pM by local choice (Reason: %u=%s)\n",
> req->bssid, req->reason_code,
> ieee80211_get_reason_code_string(req->reason_code));

So let me get this straight ... this is the "aborting authentication"
(!) case, but

> -
> - info.link_id = ifmgd->assoc_data->assoc_link_id;

your code is accessing the assoc_data? I'm not surprised that crashes,
but that's in no way what the upstream code looks like?


> + if (WARN_ON_ONCE((unsigned long)(ifmgd) < 4000 ||
> + (unsigned long)(ifmgd->assoc_data) < 4000)) {

You complain that it takes effort to get stuff upstream, but at the same
time this is what you post - you can't have really bad patches and a
fast track into upstream at the same time...

johannes

2023-10-23 20:06:07

by Ben Greear

[permalink] [raw]
Subject: Re: [PATCH] wifi: mac80211: work around crash in mlme.c

On 10/23/23 11:17, Johannes Berg wrote:
> On Mon, 2023-10-23 at 10:57 -0700, [email protected] wrote:
>> From: Ben Greear <[email protected]>
>>
>> Protect from NULL ifmgd->assoc_data in ieee80211_mgd_deauth, crash
>> was seen here fairly often in a 32-station test case utilizing
>> mtk7922 and be200 radios. I'm not sure if radio types matters
>> though.
>>
>> Signed-off-by: Ben Greear <[email protected]>
>> ---
>>
>> Patch is for wireless-next tree, bug was likely introduced in
>> this release since this crash was not seen in earlier 6.6-rc testing
>> nor in 6.5 or earlier.
>>
>> There may be a better way to fix this...
>
> I mean, you're not *actually* suggesting we merge this patch, right?
> Right?!

No, but it is easier to explain backtraces when you can see the code that
generated it.

>
>> +++ b/net/mac80211/mlme.c
>> @@ -8185,13 +8185,18 @@ int ieee80211_mgd_deauth(struct ieee80211_sub_if_data *sdata,
>
> I don't know what you did there, but that file doesn't even have 8k
> lines for me.

The bug appears to have come in with this patch that I grabbed from linux-wireless
mailing list:

[greearb@ben-dt5 linux-6.6-wn.dev.y]$ git show 4600547c01ef7
commit 4600547c01ef728113253c6df9367eb4ed75193c
Author: Miri Korenblit <[email protected]>
Date: Thu Sep 28 17:35:34 2023 +0300

wifi: mac80211: add link id to mgd_prepare_tx()

As we are moving to MLO and links terms, also the airtime protection
will be done for a link rather than for a vif. Thus, some
drivers will need to know for which link to protect airtime.
Add link id as a parameter to the mgd_prepare_tx() callback.

Signed-off-by: Miri Korenblit <[email protected]>
Signed-off-by: Gregory Greenman <[email protected]>


I see no response to it on linux-wireless mailing list. I applied the
series locally since it preceded other iwlwifi related patches that
I wanted to test.

>> "aborting authentication with %pM by local choice (Reason: %u=%s)\n",
>> req->bssid, req->reason_code,
>> ieee80211_get_reason_code_string(req->reason_code));
>
> So let me get this straight ... this is the "aborting authentication"
> (!) case, but
>
>> -
>> - info.link_id = ifmgd->assoc_data->assoc_link_id;
>
> your code is accessing the assoc_data? I'm not surprised that crashes,
> but that's in no way what the upstream code looks like?

So proper fix (assuming Miri's patch is applied at all) is to just not
assign link-id in this specific case?

Also, there was a WARN_ON from net/wireless/mlme.c that was triggered just
after my splat, from the method below.

eca7b7121795 (Johannes Berg 2013-05-16 00:55:45 +0200 128) static void cfg80211_process_disassoc(struct wireless_dev *wdev,
3bb02143ff55f (Johannes Berg 2020-12-06 14:54:42 +0200 129) const u8 *buf, size_t len,
3bb02143ff55f (Johannes Berg 2020-12-06 14:54:42 +0200 130) bool reconnect)
6039f6d23fe79 (Jouni Malinen 2009-03-19 13:39:21 +0200 131) {
f26cbf401be93 (Zhao, Gang 2014-04-21 12:53:03 +0800 132) struct cfg80211_registered_device *rdev = wiphy_to_rdev(wdev->wiphy);
6829c878ecd24 (Johannes Berg 2009-07-02 09:13:27 +0200 133) struct ieee80211_mgmt *mgmt = (struct ieee80211_mgmt *)buf;
19957bb399e27 (Johannes Berg 2009-07-02 17:20:43 +0200 134) const u8 *bssid = mgmt->bssid;
ceca7b7121795 (Johannes Berg 2013-05-16 00:55:45 +0200 135) u16 reason_code = le16_to_cpu(mgmt->u.disassoc.reason_code);
ceca7b7121795 (Johannes Berg 2013-05-16 00:55:45 +0200 136) bool from_ap = !ether_addr_equal(mgmt->sa, wdev->netdev->dev_addr);
6829c878ecd24 (Johannes Berg 2009-07-02 09:13:27 +0200 137)
3bb02143ff55f (Johannes Berg 2020-12-06 14:54:42 +0200 138) nl80211_send_disassoc(rdev, wdev->netdev, buf, len, reconnect,
3bb02143ff55f (Johannes Berg 2020-12-06 14:54:42 +0200 139) GFP_KERNEL);
a3b8b0569fbef (Jouni Malinen 2009-03-27 21:59:49 +0200 140)
7b0a0e3c3a882 (Johannes Berg 2022-04-14 16:50:57 +0200 141) if (WARN_ON(!wdev->connected ||
7b0a0e3c3a882 (Johannes Berg 2022-04-14 16:50:57 +0200 142) !ether_addr_equal(wdev->u.client.connected_addr, bssid)))
596a07c18b35c (Johannes Berg 2009-07-11 00:17:32 +0200 143) return;
6829c878ecd24 (Johannes Berg 2009-07-02 09:13:27 +0200 144)
ceca7b7121795 (Johannes Berg 2013-05-16 00:55:45 +0200 145) __cfg80211_disconnected(wdev->netdev, NULL, 0, reason_code, from_ap);
ceca7b7121795 (Johannes Berg 2013-05-16 00:55:45 +0200 146) cfg80211_sme_disassoc(wdev);
667503ddcb96f (Johannes Berg 2009-07-07 03:56:11 +0200 147) }

And of course it is always possible some other patch I've applied or created is actually
triggering this problem.

>> + if (WARN_ON_ONCE((unsigned long)(ifmgd) < 4000 ||
>> + (unsigned long)(ifmgd->assoc_data) < 4000)) {
>
> You complain that it takes effort to get stuff upstream, but at the same
> time this is what you post - you can't have really bad patches and a
> fast track into upstream at the same time...

Of course I don't expect every patch to go upstream effortlessly.

But there is also the case where a patch may be technically OK, and useful
to me, but it is not an API or feature that the driver/stack maintainer
cares about, so it is ignored. Regarding my previous patch to fix a
crash, I'm not going to spend my time renaming
variables on the off chance that you'd like the patch vs just fixing the
specific broken code and moving on to other tasks. Since you known your
own mind, you could rename variables in 2 minutes, post the patch, and
you'd be done.

Thanks,
Ben

--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com


2023-10-24 08:29:11

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH] wifi: mac80211: work around crash in mlme.c

On Mon, 2023-10-23 at 13:05 -0700, Ben Greear wrote:
> On 10/23/23 11:17, Johannes Berg wrote:
> > On Mon, 2023-10-23 at 10:57 -0700, [email protected] wrote:
> > > From: Ben Greear <[email protected]>
> > >
> > > Protect from NULL ifmgd->assoc_data in ieee80211_mgd_deauth, crash
> > > was seen here fairly often in a 32-station test case utilizing
> > > mtk7922 and be200 radios. I'm not sure if radio types matters
> > > though.
> > >
> > > Signed-off-by: Ben Greear <[email protected]>
> > > ---
> > >
> > > Patch is for wireless-next tree, bug was likely introduced in
> > > this release since this crash was not seen in earlier 6.6-rc testing
> > > nor in 6.5 or earlier.
> > >
> > > There may be a better way to fix this...
> >
> > I mean, you're not *actually* suggesting we merge this patch, right?
> > Right?!
>
> No, but it is easier to explain backtraces when you can see the code that
> generated it.

Sure, but why actually post it as a [PATCH] then rather than just part
of the bug report or something? :)

Anyway ...

> The bug appears to have come in with this patch that I grabbed from linux-wireless
> mailing list:
>
> [greearb@ben-dt5 linux-6.6-wn.dev.y]$ git show 4600547c01ef7
> commit 4600547c01ef728113253c6df9367eb4ed75193c
> Author: Miri Korenblit <[email protected]>
> Date: Thu Sep 28 17:35:34 2023 +0300
>
> wifi: mac80211: add link id to mgd_prepare_tx()
>
> As we are moving to MLO and links terms, also the airtime protection
> will be done for a link rather than for a vif. Thus, some
> drivers will need to know for which link to protect airtime.
> Add link id as a parameter to the mgd_prepare_tx() callback.
>
> Signed-off-by: Miri Korenblit <[email protected]>
> Signed-off-by: Gregory Greenman <[email protected]>
>
>
> I see no response to it on linux-wireless mailing list. I applied the
> series locally since it preceded other iwlwifi related patches that
> I wanted to test.

I applied the same patch as commit
e76f3b4a73ea60ef098c5762b2aef4d11e094a04
Author: Miri Korenblit <[email protected]>
Date: Thu Sep 28 17:35:34 2023 +0300

wifi: mac80211: add link id to mgd_prepare_tx()

from

https://lore.kernel.org/r/20230928172905.c7fc59a6780b.Ic88a5037d31e184a2dce0b031ece1a0a93a3a9da@changeid


It doesn't contain that bug, neither in my version nor in the list
version.

Maybe you had some conflicts due to other changes and resolved them
incorrectly by accident?

> So proper fix (assuming Miri's patch is applied at all) is to just not
> assign link-id in this specific case?

No, it should assign the link ID from the correct place, as Miri's patch
does :)

> Also, there was a WARN_ON from net/wireless/mlme.c that was triggered just
> after my splat, from the method below.
>
> eca7b7121795 (Johannes Berg 2013-05-16 00:55:45 +0200 128) static void cfg80211_process_disassoc(struct wireless_dev *wdev,
> 3bb02143ff55f (Johannes Berg 2020-12-06 14:54:42 +0200 129) const u8 *buf, size_t len,
> 3bb02143ff55f (Johannes Berg 2020-12-06 14:54:42 +0200 130) bool reconnect)
> 6039f6d23fe79 (Jouni Malinen 2009-03-19 13:39:21 +0200 131) {
> f26cbf401be93 (Zhao, Gang 2014-04-21 12:53:03 +0800 132) struct cfg80211_registered_device *rdev = wiphy_to_rdev(wdev->wiphy);
> 6829c878ecd24 (Johannes Berg 2009-07-02 09:13:27 +0200 133) struct ieee80211_mgmt *mgmt = (struct ieee80211_mgmt *)buf;
> 19957bb399e27 (Johannes Berg 2009-07-02 17:20:43 +0200 134) const u8 *bssid = mgmt->bssid;
> ceca7b7121795 (Johannes Berg 2013-05-16 00:55:45 +0200 135) u16 reason_code = le16_to_cpu(mgmt->u.disassoc.reason_code);
> ceca7b7121795 (Johannes Berg 2013-05-16 00:55:45 +0200 136) bool from_ap = !ether_addr_equal(mgmt->sa, wdev->netdev->dev_addr);
> 6829c878ecd24 (Johannes Berg 2009-07-02 09:13:27 +0200 137)
> 3bb02143ff55f (Johannes Berg 2020-12-06 14:54:42 +0200 138) nl80211_send_disassoc(rdev, wdev->netdev, buf, len, reconnect,
> 3bb02143ff55f (Johannes Berg 2020-12-06 14:54:42 +0200 139) GFP_KERNEL);
>

> And of course it is always possible some other patch I've applied or created is actually
> triggering this problem.


I think there's a good chance that's just a consequence of this
particular workaround, but I'm not really sure. If you can see this with
a less patched kernel I can take a look, but as is I don't even know
what you're running.


> Of course I don't expect every patch to go upstream effortlessly.

:)

> But there is also the case where a patch may be technically OK, and useful
> to me, but it is not an API or feature that the driver/stack maintainer
> cares about, so it is ignored.

Well, it does raise the question of whether we (or often really just
me?) should maintain something in upstream that's not generally useful,
or already solved in another way (like ethtool stuff).

Yes, I tend to not want to commit to saying no and let stuff linger, and
that's really a bad pattern that I have.

> Regarding my previous patch to fix a
> crash,
>

If you mean this:

https://patchwork.kernel.org/project/linux-wireless/patch/[email protected]/

> I'm not going to spend my time renaming
> variables on the off chance that you'd like the patch vs just fixing the
> specific broken code and moving on to other tasks.

I actually applied it. Someone else was replying there :)

> Since you known your
> own mind, you could rename variables in 2 minutes, post the patch, and
> you'd be done.

Agree. I think I do that, but maybe not often enough.

johannes

2023-10-24 16:35:22

by Ben Greear

[permalink] [raw]
Subject: Re: [PATCH] wifi: mac80211: work around crash in mlme.c

On 10/24/23 01:28, Johannes Berg wrote:
> On Mon, 2023-10-23 at 13:05 -0700, Ben Greear wrote:
>> On 10/23/23 11:17, Johannes Berg wrote:
>>> On Mon, 2023-10-23 at 10:57 -0700, [email protected] wrote:
>>>> From: Ben Greear <[email protected]>
>>>>
>>>> Protect from NULL ifmgd->assoc_data in ieee80211_mgd_deauth, crash
>>>> was seen here fairly often in a 32-station test case utilizing
>>>> mtk7922 and be200 radios. I'm not sure if radio types matters
>>>> though.
>>>>
>>>> Signed-off-by: Ben Greear <[email protected]>
>>>> ---
>>>>
>>>> Patch is for wireless-next tree, bug was likely introduced in
>>>> this release since this crash was not seen in earlier 6.6-rc testing
>>>> nor in 6.5 or earlier.
>>>>
>>>> There may be a better way to fix this...
>>>
>>> I mean, you're not *actually* suggesting we merge this patch, right?
>>> Right?!
>>
>> No, but it is easier to explain backtraces when you can see the code that
>> generated it.
>
> Sure, but why actually post it as a [PATCH] then rather than just part
> of the bug report or something? :)
>
> Anyway ...
>
>> The bug appears to have come in with this patch that I grabbed from linux-wireless
>> mailing list:
>>
>> [greearb@ben-dt5 linux-6.6-wn.dev.y]$ git show 4600547c01ef7
>> commit 4600547c01ef728113253c6df9367eb4ed75193c
>> Author: Miri Korenblit <[email protected]>
>> Date: Thu Sep 28 17:35:34 2023 +0300
>>
>> wifi: mac80211: add link id to mgd_prepare_tx()
>>
>> As we are moving to MLO and links terms, also the airtime protection
>> will be done for a link rather than for a vif. Thus, some
>> drivers will need to know for which link to protect airtime.
>> Add link id as a parameter to the mgd_prepare_tx() callback.
>>
>> Signed-off-by: Miri Korenblit <[email protected]>
>> Signed-off-by: Gregory Greenman <[email protected]>
>>
>>
>> I see no response to it on linux-wireless mailing list. I applied the
>> series locally since it preceded other iwlwifi related patches that
>> I wanted to test.
>
> I applied the same patch as commit
> e76f3b4a73ea60ef098c5762b2aef4d11e094a04
> Author: Miri Korenblit <[email protected]>
> Date: Thu Sep 28 17:35:34 2023 +0300
>
> wifi: mac80211: add link id to mgd_prepare_tx()
>
> from
>
> https://lore.kernel.org/r/20230928172905.c7fc59a6780b.Ic88a5037d31e184a2dce0b031ece1a0a93a3a9da@changeid
>
>
> It doesn't contain that bug, neither in my version nor in the list
> version.
>
> Maybe you had some conflicts due to other changes and resolved them
> incorrectly by accident?
>
>> So proper fix (assuming Miri's patch is applied at all) is to just not
>> assign link-id in this specific case?
>
> No, it should assign the link ID from the correct place, as Miri's patch
> does :)

My hack/patch did have a typo in it on the second clause, and checked the
wrong thing being NULL. But, before I got the patch into my kernel, I managed
to hit NPE in both the clauses. Not immediately...but sometimes.

I noticed afterwards that mtk7922 has a regression where it will not assoc on
6e, and it was 6e test case that reliably triggered the bug, so I suspect
it is something about deauthing a sta that never managed to auth in the first
place.

I guess just keep it in mind if someone else reports the problem. I'll run locally
with an 'if null, skip assigning link-id' patch.

>> But there is also the case where a patch may be technically OK, and useful
>> to me, but it is not an API or feature that the driver/stack maintainer
>> cares about, so it is ignored.
>
> Well, it does raise the question of whether we (or often really just
> me?) should maintain something in upstream that's not generally useful,
> or already solved in another way (like ethtool stuff).

I'm not going likely going anywhere. I have been maintaining my various patches
out-of-tree for a great many years, and that is even worse than maintaining them
in the tree. So whatever of my stuff makes it into the tree, if you ever find it
needs some work and I don't notice right away, dump it on me and I or someone else
at my company will work on it.

Thanks,
Ben

>
> Yes, I tend to not want to commit to saying no and let stuff linger, and
> that's really a bad pattern that I have.
>
>> Regarding my previous patch to fix a
>> crash,
>>
>
> If you mean this:
>
> https://patchwork.kernel.org/project/linux-wireless/patch/[email protected]/
>
>> I'm not going to spend my time renaming
>> variables on the off chance that you'd like the patch vs just fixing the
>> specific broken code and moving on to other tasks.
>
> I actually applied it. Someone else was replying there :)
>
>> Since you known your
>> own mind, you could rename variables in 2 minutes, post the patch, and
>> you'd be done.
>
> Agree. I think I do that, but maybe not often enough.
>
> johannes
>

--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com