2020-06-25 23:32:06

by Nick Owens

[permalink] [raw]
Subject: [PATCH] rtw88: fix skb_under_panic in tx path

hello :)

this change fixes a reliable crash on my thinkpad A485.

please note i have no prior experience doing kernel development or
sending patches, and i'm not sure if this is a correct approach.

--

From aa589182d30a0f99e1b3201ed4f3830e8af71dac Mon Sep 17 00:00:00 2001
From: Nick Owens <[email protected]>
Date: Thu, 25 Jun 2020 12:55:41 -0700
Subject: [PATCH] rtw88: fix skb_under_panic in tx path

fixes the following panic on my thinkpad A485

Oops#1 Part3
<0>[ 3743.881656] skbuff: skb_under_panic: text:000000005f69fd98 len:208 put:48 head:000000009e2719e8 data:00000000bd3795e0 tail:0xc2 end:0x2c0 dev:wlp2s0
<4>[ 3743.881675] ------------[ cut here ]------------
<2>[ 3743.881677] kernel BUG at net/core/skbuff.c:109!
<4>[ 3743.881688] invalid opcode: 0000 [#1] SMP NOPTI
<4>[ 3743.881693] CPU: 7 PID: 665 Comm: irq/85-rtwpci Tainted: G E 5.7.5 #31
<4>[ 3743.881695] Hardware name: LENOVO 20MU000TUS/20MU000TUS, BIOS R0WET56W (1.24 ) 06/28/2019
<4>[ 3743.881703] RIP: 0010:skb_panic+0x48/0x4a
<4>[ 3743.881706] Code: 4f 70 50 8b 87 bc 00 00 00 50 8b 87 b8 00 00 00 50 ff b7 c8 00 00 00 4c 8b 8f c0 00 00 00 48 c7 c7 68 14 51 bd e8 d5 22 ab ff <0f> 0b 48 8b 14 24 48 c7 c1 00 c7 2c bd e8 a6 ff ff ff 48 c7 c6 40
<4>[ 3743.881708] RSP: 0018:ffffb354002fce00 EFLAGS: 00010246
<4>[ 3743.881711] RAX: 0000000000000088 RBX: ffff954377fe1e80 RCX: 0000000000000000
<4>[ 3743.881713] RDX: 0000000000000000 RSI: ffff95437ffd8968 RDI: ffff95437ffd8968
<4>[ 3743.881714] RBP: ffff954362d7d000 R08: 0000000000000485 R09: 0000000000000097
<4>[ 3743.881716] R10: 0000000000000000 R11: ffffb354002fccb0 R12: 0000000000000030
<4>[ 3743.881717] R13: 0000000000000001 R14: ffffb354002fcf08 R15: ffffffffc163aba0
<4>[ 3743.881720] FS: 0000000000000000(0000) GS:ffff95437ffc0000(0000) knlGS:0000000000000000
<4>[ 3743.881721] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 3743.881723] CR2: 00007f63e45e8cb0 CR3: 00000003c7fb0000 CR4: 00000000003406e0
<4>[ 3743.881724] Call Trace:
<4>[ 3743.881728] <IRQ>
<4>[ 3743.881733] skb_push.cold.98+0x10/0x10
<4>[ 3743.881741] rtw_pci_tx_write_data+0xb1/0x4e0 [rtwpci]
<4>[ 3743.881746] rtw_pci_tx_write+0x59/0xe7 [rtwpci]
Panic#2 Part3
<4>[ 3743.881755] rtw_tx_tasklet+0xfd/0x1f0 [rtw88]
<4>[ 3743.881763] tasklet_action_common.isra.20+0x4e/0xf0
<4>[ 3743.881769] __do_softirq+0xd9/0x2d9
<4>[ 3743.881773] do_softirq_own_stack+0x2a/0x40
<4>[ 3743.881775] </IRQ>
<4>[ 3743.881778] do_softirq.part.18+0x2b/0x30
<4>[ 3743.881780] __local_bh_enable_ip+0x4b/0x50
<4>[ 3743.881784] rtw_pci_interrupt_threadfn+0x154/0x230 [rtwpci]
<4>[ 3743.881789] ? irq_forced_thread_fn+0x70/0x70
<4>[ 3743.881791] irq_thread_fn+0x1f/0x50
<4>[ 3743.881794] irq_thread+0xe7/0x160
<4>[ 3743.881797] ? wake_threads_waitq+0x30/0x30
<4>[ 3743.881800] ? irq_thread_check_affinity+0x80/0x80
<4>[ 3743.881804] kthread+0x112/0x130
<4>[ 3743.881807] ? kthread_park+0x80/0x80
<4>[ 3743.881810] ret_from_fork+0x22/0x40
<4>[ 3743.881813] Modules linked in: ctr(E) ccm(E) xt_MASQUERADE(E) nf_conntrack_netlink(E) xfrm_user(E) xfrm_algo(E) nft_counter(E) nft_chain_nat(E) xt_addrtype(E) nft_compat(E) nf_tables(E) nfnetlink(E) xt_conntrack(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) br_netfilter(E) bridge(E) stp(E) llc(E) overlay(E) cmac(E) bnep(E) nls_ascii(E) nls_cp437(E) vfat(E) snd_hda_codec_realtek(E) fat(E) btusb(E) btrtl(E) snd_hda_codec_generic(E) btbcm(E) rtwpci(E) btintel(E) uvcvideo(E) snd_hda_codec_hdmi(E) bluetooth(E) rtw88(E) videobuf2_vmalloc(E) edac_mce_amd(E) snd_hda_intel(E) videobuf2_memops(E) snd_intel_dspcfg(E) videobuf2_v4l2(E) mac80211(E) videobuf2_common(E) snd_hda_codec(E) drbg(E) efi_pstore(E) kvm_amd(E) ansi_cprng(E) joydev(E) snd_hda_core(E) kvm(E) snd_hwdep(E) videodev(E) ecdh_generic(E) snd_pcm(E) irqbypass(E) pcspkr(E) serio_raw(E) efivars(E) ftdi_sio(E) wmi_bmof(E) k10temp(E) mc(E) sp5100_tco(E) ecc(E) tpm_crb(E) usbserial(E) snd_timer(E) ccp(E) cfg80211(E)

Signed-off-by: Nick Owens <[email protected]>
Cc: [email protected]
---
drivers/net/wireless/realtek/rtw88/pci.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/drivers/net/wireless/realtek/rtw88/pci.c b/drivers/net/wireless/realtek/rtw88/pci.c
index d735f3127fe8..21b3b268cb25 100644
--- a/drivers/net/wireless/realtek/rtw88/pci.c
+++ b/drivers/net/wireless/realtek/rtw88/pci.c
@@ -741,6 +741,12 @@ static int rtw_pci_tx_write_data(struct rtw_dev *rtwdev,
else if (!avail_desc(ring->r.wp, ring->r.rp, ring->r.len))
return -ENOSPC;

+ if (skb_headroom(skb) < chip->tx_pkt_desc_sz &&
+ pskb_expand_head(skb, chip->tx_pkt_desc_sz - skb_headroom(skb), 0, GFP_ATOMIC)) {
+ dev_err(rtwdev->dev, "no headroom available");
+ return -ENOMEM;
+ }
+
pkt_desc = skb_push(skb, chip->tx_pkt_desc_sz);
memset(pkt_desc, 0, tx_pkt_desc_sz);
pkt_info->qsel = rtw_pci_get_tx_qsel(skb, queue);
--
2.20.1


2020-06-26 00:13:33

by Thomas Pedersen

[permalink] [raw]
Subject: Re: [PATCH] rtw88: fix skb_under_panic in tx path

On 2020-06-25 13:18, Nick Owens wrote:
> hello :)

Hi Nick :)

> this change fixes a reliable crash on my thinkpad A485.
>
> please note i have no prior experience doing kernel development or
> sending patches, and i'm not sure if this is a correct approach.

You probably want to submit patches with git-send-email. See
https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

> From aa589182d30a0f99e1b3201ed4f3830e8af71dac Mon Sep 17 00:00:00 2001
> From: Nick Owens <[email protected]>
> Date: Thu, 25 Jun 2020 12:55:41 -0700
> Subject: [PATCH] rtw88: fix skb_under_panic in tx path
>
> fixes the following panic on my thinkpad A485
>
> Oops#1 Part3
> <0>[ 3743.881656] skbuff: skb_under_panic: text:000000005f69fd98
> len:208 put:48 head:000000009e2719e8 data:00000000bd3795e0 tail:0xc2
> end:0x2c0 dev:wlp2s0

skb->head and skb->data here are really far (0.5GB) apart. Maybe
skb->data actually got corrupted earlier?

> diff --git a/drivers/net/wireless/realtek/rtw88/pci.c
> b/drivers/net/wireless/realtek/rtw88/pci.c
> index d735f3127fe8..21b3b268cb25 100644
> --- a/drivers/net/wireless/realtek/rtw88/pci.c
> +++ b/drivers/net/wireless/realtek/rtw88/pci.c
> @@ -741,6 +741,12 @@ static int rtw_pci_tx_write_data(struct rtw_dev
> *rtwdev,
> else if (!avail_desc(ring->r.wp, ring->r.rp, ring->r.len))
> return -ENOSPC;
>
> + if (skb_headroom(skb) < chip->tx_pkt_desc_sz &&
> + pskb_expand_head(skb, chip->tx_pkt_desc_sz - skb_headroom(skb),
> 0, GFP_ATOMIC)) {
> + dev_err(rtwdev->dev, "no headroom available");
> + return -ENOMEM;
> + }
> +

If it is a headroom issue, you can actually express the needed headroom
needed by the driver in hw->extra_tx_headroom during init and avoid the
pskb_expand_head() here.

--
thomas

2020-06-29 18:50:55

by Tony Chuang

[permalink] [raw]
Subject: RE: [PATCH] rtw88: fix skb_under_panic in tx path

> On 2020-06-25 13:18, Nick Owens wrote:
> > hello :)
>
> Hi Nick :)
>
> > this change fixes a reliable crash on my thinkpad A485.
> >
> > please note i have no prior experience doing kernel development or
> > sending patches, and i'm not sure if this is a correct approach.
>
> You probably want to submit patches with git-send-email. See
> https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatch
> es
>
> > From aa589182d30a0f99e1b3201ed4f3830e8af71dac Mon Sep 17 00:00:00
> 2001
> > From: Nick Owens <[email protected]>
> > Date: Thu, 25 Jun 2020 12:55:41 -0700
> > Subject: [PATCH] rtw88: fix skb_under_panic in tx path
> >
> > fixes the following panic on my thinkpad A485
> >
> > Oops#1 Part3
> > <0>[ 3743.881656] skbuff: skb_under_panic: text:000000005f69fd98
> > len:208 put:48 head:000000009e2719e8 data:00000000bd3795e0 tail:0xc2
> > end:0x2c0 dev:wlp2s0
>
> skb->head and skb->data here are really far (0.5GB) apart. Maybe
> skb->data actually got corrupted earlier?
>
> > diff --git a/drivers/net/wireless/realtek/rtw88/pci.c
> > b/drivers/net/wireless/realtek/rtw88/pci.c
> > index d735f3127fe8..21b3b268cb25 100644
> > --- a/drivers/net/wireless/realtek/rtw88/pci.c
> > +++ b/drivers/net/wireless/realtek/rtw88/pci.c
> > @@ -741,6 +741,12 @@ static int rtw_pci_tx_write_data(struct rtw_dev
> > *rtwdev,
> > else if (!avail_desc(ring->r.wp, ring->r.rp, ring->r.len))
> > return -ENOSPC;
> >
> > + if (skb_headroom(skb) < chip->tx_pkt_desc_sz &&
> > + pskb_expand_head(skb, chip->tx_pkt_desc_sz - skb_headroom(skb),
> > 0, GFP_ATOMIC)) {
> > + dev_err(rtwdev->dev, "no headroom available");
> > + return -ENOMEM;
> > + }
> > +
>
> If it is a headroom issue, you can actually express the needed headroom
> needed by the driver in hw->extra_tx_headroom during init and avoid the
> pskb_expand_head() here.
>

Looks like a headroom issue, but the driver already assigned headroom.
max_tx_headroom = rtwdev->chip->tx_pkt_desc_sz;
hw->extra_tx_headroom = max_tx_headroom;

Then I am not sure why this happens. Nick, can you help to dump_stack()
so we can see where is the skb from?

Yen-Hsuan

2020-11-21 01:45:28

by Brian Norris

[permalink] [raw]
Subject: Re: [PATCH] rtw88: fix skb_under_panic in tx path

(Swapping out Yen-Hsuan's new email)

Necromancing an old thread, since it's still relevant, and I had it
sitting in my inbox to deal with. Now I have something useful to say!

On Mon, Jun 29, 2020 at 11:50 AM Tony Chuang <[email protected]> wrote:
> > On 2020-06-25 13:18, Nick Owens wrote:

> > > fixes the following panic on my thinkpad A485
> > >
> > > Oops#1 Part3
> > > <0>[ 3743.881656] skbuff: skb_under_panic: text:000000005f69fd98
> > > len:208 put:48 head:000000009e2719e8 data:00000000bd3795e0 tail:0xc2
> > > end:0x2c0 dev:wlp2s0
...
> > skb->head and skb->data here are really far (0.5GB) apart. Maybe
> > skb->data actually got corrupted earlier?

For the record, I've reproduced similar issues myself, and the problem
occurs when
(a) the initial SKB starts with minimal headroom (I find that many
SKBs come into mac80211 with plenty of headroom)
(b) the SKB participates in AMSDU aggregation
I've spotted a specific bug, which I'll point out below. But I remain
confused why in many cases the SKB ends up looking so corrupted.

...
> > If it is a headroom issue, you can actually express the needed headroom
> > needed by the driver in hw->extra_tx_headroom during init and avoid the
> > pskb_expand_head() here.
>
> Looks like a headroom issue, but the driver already assigned headroom.
> max_tx_headroom = rtwdev->chip->tx_pkt_desc_sz;
> hw->extra_tx_headroom = max_tx_headroom;
>
> Then I am not sure why this happens. Nick, can you help to dump_stack()
> so we can see where is the skb from?

That's not so easy, because of the layers and queueing involved, but
per the above, I've blamed mac80211's AMSDU aggregation. Specifically:

ieee80211_amsdu_aggregate() -> ieee80211_amsdu_prepare_head(), where
we pad out / expand the SKB to fit some additional AMSDU headers (and
later append additional data). But the padding function
(ieee80211_amsdu_realloc_pad()) accounts only for the 802.11 protocol
headroom, and not for the driver-specific headroom. So it chooses not
to expand the headroom, and instead eats into rtw88's space.

For such SKBs, they end up in the driver without sufficient headroom
-- thus, Nick's bug report.

NB: the seemingly-obvious fix (changing the headroom checks in
ieee80211_amsdu_realloc_pad()) does not seem to work, as I hit other
bugs along the way. Unfortunately, I haven't had the time to fix this
all myself properly, nor have I convinced Realtek to fix this
themselves. So in the meantime, Chrome OS is running with this:

https://chromium.googlesource.com/chromiumos/third_party/kernel/+/260a7d4939c323aebe80efc73610682ad2cb187a%5E%21/#F0

It's a similar idea.

We should of course fix the mac80211 bug, but I wonder if we also
deserve some patch similar to either the Chromium patch or Nick's
somewhere (perhaps with a loud warning, etc.), because it's much more
user friendly (in the face of similar future bugs) to do some
suboptimal memcpy()'s, etc., than to crash their systems

Brian