2023-03-17 18:12:53

by Tim K

[permalink] [raw]
Subject: rtw88: kernel NULL pointer dereference

Hello again all,

I've been running a few rtw8822bu dongles in monitor mode doing a
packet capture on an STM32, running Linux 5.15. It's been fairly
stable over a few days, but I checked back in on it and found this in
the log.

Any thoughts on what could have caused this? This is from lwfinger's
GitHub repo at 4bede29 (I believe.)

[406207.728602] 8<--- cut here ---
[406207.730533] Unable to handle kernel NULL pointer dereference at
virtual address 00000004
[406207.740137] pgd = 40e434d7
[406207.741665] [00000004] *pgd=00000000
[406207.745300] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
[406207.801313] CPU: 0 PID: 23145 Comm: kworker/u4:2 Tainted: G
O 5.15.24 #13
[406207.809516] Hardware name: STM32 (Device Tree Support)
[406207.814783] Workqueue: rtw88_usb: rx wq rtw_usb_rx_handler [rtw88_usb]
[406207.821397] PC is at rtw_rx_fill_rx_status+0x40/0x278 [rtw_core]
[406207.827604] LR is at rtw_rx_fill_rx_status+0x3c/0x278 [rtw_core]
[406207.833666] pc : [<bf324cac>] lr : [<bf324ca8>] psr: 400f0113
[406207.840043] sp : c5379e20 ip : c5379eb8 fp : 00000001
...
[406208.084918] [<bf324cac>] (rtw_rx_fill_rx_status [rtw_core]) from
[<bf3c6b44>] (rtw8822b_query_rx_desc+0x10c/0x19c [rtw_8822b])
[406208.096524] [<bf3c6b44>] (rtw8822b_query_rx_desc [rtw_8822b]) from
[<bf416920>] (rtw_usb_rx_handler+0x6c/0x16c [rtw88_usb])
[406208.107820] [<bf416920>] (rtw_usb_rx_handler [rtw88_usb]) from
[<c01455cc>] (process_one_work+0x1dc/0x588)
[406208.117570] [<c01455cc>] (process_one_work) from [<c0145e04>]
(worker_thread+0x5c/0x568)
[406208.125790] [<c0145e04>] (worker_thread) from [<c014c230>]
(kthread+0x144/0x160)
[406208.133310] [<c014c230>] (kthread) from [<c0100130>]
(ret_from_fork+0x14/0x24)
...
[406208.186520] ---[ end trace 140c234232a74c11 ]---

Kind regards,

Tim


Attachments:
kernel.log (4.99 kB)

2023-03-17 18:32:24

by Larry Finger

[permalink] [raw]
Subject: Re: rtw88: kernel NULL pointer dereference

On 3/17/23 13:12, Tim K wrote:
> Hello again all,
>
> I've been running a few rtw8822bu dongles in monitor mode doing a
> packet capture on an STM32, running Linux 5.15. It's been fairly
> stable over a few days, but I checked back in on it and found this in
> the log.
>
> Any thoughts on what could have caused this? This is from lwfinger's
> GitHub repo at 4bede29 (I believe.)
>
> [406207.728602] 8<--- cut here ---
> [406207.730533] Unable to handle kernel NULL pointer dereference at
> virtual address 00000004
> [406207.740137] pgd = 40e434d7
> [406207.741665] [00000004] *pgd=00000000
> [406207.745300] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
> [406207.801313] CPU: 0 PID: 23145 Comm: kworker/u4:2 Tainted: G
> O 5.15.24 #13
> [406207.809516] Hardware name: STM32 (Device Tree Support)
> [406207.814783] Workqueue: rtw88_usb: rx wq rtw_usb_rx_handler [rtw88_usb]
> [406207.821397] PC is at rtw_rx_fill_rx_status+0x40/0x278 [rtw_core]
> [406207.827604] LR is at rtw_rx_fill_rx_status+0x3c/0x278 [rtw_core]
> [406207.833666] pc : [<bf324cac>] lr : [<bf324ca8>] psr: 400f0113
> [406207.840043] sp : c5379e20 ip : c5379eb8 fp : 00000001
> ...
> [406208.084918] [<bf324cac>] (rtw_rx_fill_rx_status [rtw_core]) from
> [<bf3c6b44>] (rtw8822b_query_rx_desc+0x10c/0x19c [rtw_8822b])
> [406208.096524] [<bf3c6b44>] (rtw8822b_query_rx_desc [rtw_8822b]) from
> [<bf416920>] (rtw_usb_rx_handler+0x6c/0x16c [rtw88_usb])
> [406208.107820] [<bf416920>] (rtw_usb_rx_handler [rtw88_usb]) from
> [<c01455cc>] (process_one_work+0x1dc/0x588)
> [406208.117570] [<c01455cc>] (process_one_work) from [<c0145e04>]
> (worker_thread+0x5c/0x568)
> [406208.125790] [<c0145e04>] (worker_thread) from [<c014c230>]
> (kthread+0x144/0x160)
> [406208.133310] [<c014c230>] (kthread) from [<c0100130>]
> (ret_from_fork+0x14/0x24)
> ...
> [406208.186520] ---[ end trace 140c234232a74c11 ]---

Tim,

I can tell from your dump that you are not running x86_64 hardware. Is it ARM?

In any case, please run the command 'gdb rtw_core.ko' while in the rtw88
directory, and once gdb gets started, enter the command
l *rtw_rx_fill_rx_status+0x40

Report the output. That will show the line of code that is failing. We should be
able to figure it out from there.

Larry



2023-03-17 19:12:21

by Tim K

[permalink] [raw]
Subject: Re: rtw88: kernel NULL pointer dereference

Larry,

Yes, it's an arm system. (ARMv7 rev5, if it's of use.) Here is the
output from GDB:

Reading symbols from rtw88/rtw_core.ko...done.
>>> l *rtw_rx_fill_rx_status+0x40
0x8cac is in rtw_rx_fill_rx_status (/home/tkuester/code/rtw88/rx.c:159).
154 {
155 struct ieee80211_hw *hw = rtwdev->hw;
156 u8 path;
157
158 memset(rx_status, 0, sizeof(*rx_status));
159 rx_status->freq = hw->conf.chandef.chan->center_freq;
160 rx_status->band = hw->conf.chandef.chan->band;
161 if (rtw_fw_feature_check(&rtwdev->fw, FW_FEATURE_SCAN_OFFLOAD) &&
162 test_bit(RTW_FLAG_SCANNING, rtwdev->flags))
163 rtw_set_rx_freq_by_pktstat(pkt_stat, rx_status);

Please let me know if I can be of any more assistance!

- Tim

On Fri, Mar 17, 2023 at 2:32 PM Larry Finger <[email protected]> wrote:
>
> On 3/17/23 13:12, Tim K wrote:
> > Hello again all,
> >
> > I've been running a few rtw8822bu dongles in monitor mode doing a
> > packet capture on an STM32, running Linux 5.15. It's been fairly
> > stable over a few days, but I checked back in on it and found this in
> > the log.
> >
> > Any thoughts on what could have caused this? This is from lwfinger's
> > GitHub repo at 4bede29 (I believe.)
> >
> > [406207.728602] 8<--- cut here ---
> > [406207.730533] Unable to handle kernel NULL pointer dereference at
> > virtual address 00000004
> > [406207.740137] pgd = 40e434d7
> > [406207.741665] [00000004] *pgd=00000000
> > [406207.745300] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
> > [406207.801313] CPU: 0 PID: 23145 Comm: kworker/u4:2 Tainted: G
> > O 5.15.24 #13
> > [406207.809516] Hardware name: STM32 (Device Tree Support)
> > [406207.814783] Workqueue: rtw88_usb: rx wq rtw_usb_rx_handler [rtw88_usb]
> > [406207.821397] PC is at rtw_rx_fill_rx_status+0x40/0x278 [rtw_core]
> > [406207.827604] LR is at rtw_rx_fill_rx_status+0x3c/0x278 [rtw_core]
> > [406207.833666] pc : [<bf324cac>] lr : [<bf324ca8>] psr: 400f0113
> > [406207.840043] sp : c5379e20 ip : c5379eb8 fp : 00000001
> > ...
> > [406208.084918] [<bf324cac>] (rtw_rx_fill_rx_status [rtw_core]) from
> > [<bf3c6b44>] (rtw8822b_query_rx_desc+0x10c/0x19c [rtw_8822b])
> > [406208.096524] [<bf3c6b44>] (rtw8822b_query_rx_desc [rtw_8822b]) from
> > [<bf416920>] (rtw_usb_rx_handler+0x6c/0x16c [rtw88_usb])
> > [406208.107820] [<bf416920>] (rtw_usb_rx_handler [rtw88_usb]) from
> > [<c01455cc>] (process_one_work+0x1dc/0x588)
> > [406208.117570] [<c01455cc>] (process_one_work) from [<c0145e04>]
> > (worker_thread+0x5c/0x568)
> > [406208.125790] [<c0145e04>] (worker_thread) from [<c014c230>]
> > (kthread+0x144/0x160)
> > [406208.133310] [<c014c230>] (kthread) from [<c0100130>]
> > (ret_from_fork+0x14/0x24)
> > ...
> > [406208.186520] ---[ end trace 140c234232a74c11 ]---
>
> Tim,
>
> I can tell from your dump that you are not running x86_64 hardware. Is it ARM?
>
> In any case, please run the command 'gdb rtw_core.ko' while in the rtw88
> directory, and once gdb gets started, enter the command
> l *rtw_rx_fill_rx_status+0x40
>
> Report the output. That will show the line of code that is failing. We should be
> able to figure it out from there.
>
> Larry
>
>

2023-03-20 02:14:55

by Larry Finger

[permalink] [raw]
Subject: Re: rtw88: kernel NULL pointer dereference

On 3/17/23 14:11, Tim K wrote:
> Larry,
>
> Yes, it's an arm system. (ARMv7 rev5, if it's of use.) Here is the
> output from GDB:
>
> Reading symbols from rtw88/rtw_core.ko...done.
>>>> l *rtw_rx_fill_rx_status+0x40
> 0x8cac is in rtw_rx_fill_rx_status (/home/tkuester/code/rtw88/rx.c:159).
> 154 {
> 155 struct ieee80211_hw *hw = rtwdev->hw;
> 156 u8 path;
> 157
> 158 memset(rx_status, 0, sizeof(*rx_status));
> 159 rx_status->freq = hw->conf.chandef.chan->center_freq;
> 160 rx_status->band = hw->conf.chandef.chan->band;
> 161 if (rtw_fw_feature_check(&rtwdev->fw, FW_FEATURE_SCAN_OFFLOAD) &&
> 162 test_bit(RTW_FLAG_SCANNING, rtwdev->flags))
> 163 rtw_set_rx_freq_by_pktstat(pkt_stat, rx_status);
>
> Please let me know if I can be of any more assistance!
>
> - Tim
>
> On Fri, Mar 17, 2023 at 2:32 PM Larry Finger <[email protected]> wrote:
>>
>> On 3/17/23 13:12, Tim K wrote:
>>> Hello again all,
>>>
>>> I've been running a few rtw8822bu dongles in monitor mode doing a
>>> packet capture on an STM32, running Linux 5.15. It's been fairly
>>> stable over a few days, but I checked back in on it and found this in
>>> the log.
>>>
>>> Any thoughts on what could have caused this? This is from lwfinger's
>>> GitHub repo at 4bede29 (I believe.)
>>>
>>> [406207.728602] 8<--- cut here ---
>>> [406207.730533] Unable to handle kernel NULL pointer dereference at
>>> virtual address 00000004
>>> [406207.740137] pgd = 40e434d7
>>> [406207.741665] [00000004] *pgd=00000000
>>> [406207.745300] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
>>> [406207.801313] CPU: 0 PID: 23145 Comm: kworker/u4:2 Tainted: G
>>> O 5.15.24 #13
>>> [406207.809516] Hardware name: STM32 (Device Tree Support)
>>> [406207.814783] Workqueue: rtw88_usb: rx wq rtw_usb_rx_handler [rtw88_usb]
>>> [406207.821397] PC is at rtw_rx_fill_rx_status+0x40/0x278 [rtw_core]
>>> [406207.827604] LR is at rtw_rx_fill_rx_status+0x3c/0x278 [rtw_core]
>>> [406207.833666] pc : [<bf324cac>] lr : [<bf324ca8>] psr: 400f0113
>>> [406207.840043] sp : c5379e20 ip : c5379eb8 fp : 00000001
>>> ...
>>> [406208.084918] [<bf324cac>] (rtw_rx_fill_rx_status [rtw_core]) from
>>> [<bf3c6b44>] (rtw8822b_query_rx_desc+0x10c/0x19c [rtw_8822b])
>>> [406208.096524] [<bf3c6b44>] (rtw8822b_query_rx_desc [rtw_8822b]) from
>>> [<bf416920>] (rtw_usb_rx_handler+0x6c/0x16c [rtw88_usb])
>>> [406208.107820] [<bf416920>] (rtw_usb_rx_handler [rtw88_usb]) from
>>> [<c01455cc>] (process_one_work+0x1dc/0x588)
>>> [406208.117570] [<c01455cc>] (process_one_work) from [<c0145e04>]
>>> (worker_thread+0x5c/0x568)
>>> [406208.125790] [<c0145e04>] (worker_thread) from [<c014c230>]
>>> (kthread+0x144/0x160)
>>> [406208.133310] [<c014c230>] (kthread) from [<c0100130>]
>>> (ret_from_fork+0x14/0x24)
>>> ...
>>> [406208.186520] ---[ end trace 140c234232a74c11 ]---
>>
>> Tim,

Tim,

I happen to have an RTW8822BU device that I have been beating up all weekend
without triggering your kernel BUG. I did find one in my back journalctl logs,
but no new ones. The test was ping-ponging 6.22GB files from/to my nfs server
using rsync without compression. The download speeds were between 24 and 31
MB/s, and upload speeds were about 50 MB/s. My AP is ax1500 with a 1Gb/s
connection between the AP and the server.

Your traceback shows the NULL pointer is in routine rtw_rx_fill_rx_status(),
which is only a couple of routines removed from rtw_usb_rx_handler(). The error
is in the statement
rx_status->freq = hw->conf.chandef.chan->center_freq;

As rx_status is local to the routine, the error has to be in the right-hand side
of the statement. As hw->conf.....->center_freq is apparently OK most of the
time, I think you are getting a spurious RX interrupt. I have pushed a patch
that will show if hw is NULL, and quit the routine. Do a 'git pull', make... and
let me know what you see in the log.

Larry

2023-04-04 15:29:28

by Tim K

[permalink] [raw]
Subject: Re: rtw88: kernel NULL pointer dereference

> As rx_status is local to the routine, the error has to be in the right-hand side
> of the statement. As hw->conf.....->center_freq is apparently OK most of the
> time, I think you are getting a spurious RX interrupt. I have pushed a patch
> that will show if hw is NULL, and quit the routine. Do a 'git pull', make... and
> let me know what you see in the log.

Updated the drivers to e6d63e1 on your github repo, and ran for about
16 hours happily. Came back this morning to an error. I've attached
the stack trace, and the line of code that caused the error.

>>> l *rtw_rx_fill_rx_status+0x48
0x8cd8 is in rtw_rx_fill_rx_status (/home/tkuester/code/rtw88/rx.c:163).
163 rx_status->freq = hw->conf.chandef.chan->center_freq;

Much to my chagrin, it seems I have an intermittent USB cable causing
spurious disconnects. However, while this does appear to be associated
with the issue, I also discovered this issue 20-some odd seconds into
boot on one boot cycle with no mention of USB disconnects.


Attachments:
dmesg.log (4.99 kB)

2023-04-04 16:50:51

by Larry Finger

[permalink] [raw]
Subject: Re: rtw88: kernel NULL pointer dereference

On 4/4/23 10:07, Tim K wrote:
>> As rx_status is local to the routine, the error has to be in the right-hand side
>> of the statement. As hw->conf.....->center_freq is apparently OK most of the
>> time, I think you are getting a spurious RX interrupt. I have pushed a patch
>> that will show if hw is NULL, and quit the routine. Do a 'git pull', make... and
>> let me know what you see in the log.
>
> Updated the drivers to e6d63e1 on your github repo, and ran for about
> 16 hours happily. Came back this morning to an error. I've attached
> the stack trace, and the line of code that caused the error.
>
>>>> l *rtw_rx_fill_rx_status+0x48
> 0x8cd8 is in rtw_rx_fill_rx_status (/home/tkuester/code/rtw88/rx.c:163).
> 163 rx_status->freq = hw->conf.chandef.chan->center_freq;
>
> Much to my chagrin, it seems I have an intermittent USB cable causing
> spurious disconnects. However, while this does appear to be associated
> with the issue, I also discovered this issue 20-some odd seconds into
> boot on one boot cycle with no mention of USB disconnects.

Tim,

I suspect that your intermittent USB cable could also produce spurious
interrupts, which I thought might be the cause of the problem.

I had done some updates against wireless-next yesterday. To them, I added an
additional check in this routine. The driver should survive things like bad
cables without crashing the kernel. Do a pull, and let me know what you learn. I
would appreciate learning what happens with the faulty cable.

Larry