Hello,
syzbot found the following issue on:
HEAD commit: a6bd6c933339 Add linux-next specific files for 20240328
git tree: linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=15c85eb1180000
kernel config: https://syzkaller.appspot.com/x/.config?x=b0058bda1436e073
dashboard link: https://syzkaller.appspot.com/bug?extid=0438378d6f157baae1a2
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
Unfortunately, I don't have any reproducer for this issue yet.
Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/7c1618ff7d25/disk-a6bd6c93.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/875519f620fe/vmlinux-a6bd6c93.xz
kernel image: https://storage.googleapis.com/syzbot-assets/ad92b057fb96/bzImage-a6bd6c93.xz
IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: [email protected]
------------[ cut here ]------------
WARNING: CPU: 1 PID: 2400 at kernel/kcov.c:860 kcov_remote_start+0x549/0x7e0 kernel/kcov.c:860
Modules linked in:
CPU: 1 PID: 2400 Comm: kworker/u8:7 Not tainted 6.9.0-rc1-next-20240328-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/29/2024
Workqueue: events_unbound cfg80211_wiphy_work
RIP: 0010:kcov_remote_start+0x549/0x7e0 kernel/kcov.c:860
Code: 4c 89 ff be 03 00 00 00 e8 14 99 16 03 e9 fd fa ff ff e8 8a 26 ea 09 41 f7 c6 00 02 00 00 0f 84 eb fa ff ff e9 7f fc ff ff 90 <0f> 0b 90 e8 8f 43 ea 09 89 c0 48 c7 c7 c8 d4 02 00 48 03 3c c5 d0
RSP: 0018:ffffc90009b17aa8 EFLAGS: 00010002
RAX: 0000000080000000 RBX: ffff888029649e00 RCX: 0000000000000002
RDX: dffffc0000000000 RSI: ffffffff8bcae740 RDI: ffffffff8c1f77c0
RBP: 0000000000000000 R08: ffffffff92f3358f R09: 1ffffffff25e66b1
R10: dffffc0000000000 R11: fffffbfff25e66b2 R12: ffffffff8195747e
R13: ffff88807c8cd748 R14: 0000000000000246 R15: ffff8880b952d4c8
FS: 0000000000000000(0000) GS:ffff8880b9500000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fc66d258d58 CR3: 00000000222ca000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
kcov_remote_start_common include/linux/kcov.h:48 [inline]
ieee80211_iface_work+0x21f/0xf10 net/mac80211/iface.c:1654
cfg80211_wiphy_work+0x221/0x260 net/wireless/core.c:437
process_one_work kernel/workqueue.c:3218 [inline]
process_scheduled_works+0xa2c/0x1830 kernel/workqueue.c:3299
worker_thread+0x86d/0xd70 kernel/workqueue.c:3380
kthread+0x2f0/0x390 kernel/kthread.c:388
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243
</TASK>
---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at [email protected].
syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title
If you want to overwrite the report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)
If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report
If you want to undo deduplication, reply with:
#syz undup
On Thu, 2024-03-28 at 04:00 -0700, syzbot wrote:
>
> ------------[ cut here ]------------
> WARNING: CPU: 1 PID: 2400 at kernel/kcov.c:860 kcov_remote_start+0x549/0x7e0 kernel/kcov.c:860
This is
	/*
	 * Check that kcov_remote_start() is not called twice in background
	 * threads nor called by user tasks (with enabled kcov).
	 */
	mode = READ_ONCE(t->kcov_mode);
	if (WARN_ON(in_task() && kcov_mode_enabled(mode))) {
		local_unlock_irqrestore(&kcov_percpu_data.lock, flags);
		return;
	}
but I have no idea what that even means?
> Workqueue: events_unbound cfg80211_wiphy_work
> RIP: 0010:kcov_remote_start+0x549/0x7e0 kernel/kcov.c:860
...
> Call Trace:
> <TASK>
> kcov_remote_start_common include/linux/kcov.h:48 [inline]
> ieee80211_iface_work+0x21f/0xf10 net/mac80211/iface.c:1654
> cfg80211_wiphy_work+0x221/0x260 net/wireless/core.c:437
> process_one_work kernel/workqueue.c:3218 [inline]
> process_scheduled_works+0xa2c/0x1830 kernel/workqueue.c:3299
> worker_thread+0x86d/0xd70 kernel/workqueue.c:3380
It's a worker thread. Was this not intended to be called in threads?
johannes
On Thu, Mar 28, 2024 at 12:45 PM Johannes Berg
<[email protected]> wrote:
> It's a worker thread. Was this not intended to be called in threads?
I think the problem is that the KCOV annotations in the NFC code are
buggy: kcov_remote_stop() is never called if the loop in nci_rx_work()
exits on one of the breaks. With the recent addition of the nci_plen()
check, this started happening often. But breaks existed in the loop
before that too.
We need to move kcov_remote_stop() into the loop and call it every
time the loop exits.
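
To illustrate the imbalance described above, here is a minimal user-space sketch in C. It is not the kernel code: kcov_remote_start_stub(), kcov_remote_stop_stub(), rx_work_buggy(), rx_work_fixed() and the pkt_ok array are hypothetical stand-ins, and the loop shape is only an assumption about how a break can bypass the stop call. The point it shows is that a section left open by one work item makes the next start call on the same thread look like a nested start, which is the condition the WARN_ON checks.

```c
#include <stdio.h>

/* Hypothetical user-space stand-ins for the kernel helpers; they only
 * track whether a remote coverage section is currently open. */
static int kcov_active;   /* 1 while a section is open */
static int kcov_warnings; /* counts nested starts, like the WARN_ON */

static void kcov_remote_start_stub(void)
{
	if (kcov_active) {
		/* A section is already open on this "thread": warn and bail. */
		kcov_warnings++;
		return;
	}
	kcov_active = 1;
}

static void kcov_remote_stop_stub(void)
{
	kcov_active = 0;
}

/* Buggy shape (assumed): the early break skips the stop call. */
static void rx_work_buggy(const int *pkt_ok, int n)
{
	for (int i = 0; i < n; i++) {
		kcov_remote_start_stub();
		if (!pkt_ok[i])
			break; /* leaves the section open */
		kcov_remote_stop_stub();
	}
}

/* Fixed shape: stop is called on every path that leaves the loop body. */
static void rx_work_fixed(const int *pkt_ok, int n)
{
	for (int i = 0; i < n; i++) {
		kcov_remote_start_stub();
		if (!pkt_ok[i]) {
			kcov_remote_stop_stub();
			break;
		}
		kcov_remote_stop_stub();
	}
}
```

In this sketch, running rx_work_buggy() on input that contains a rejected packet leaves kcov_active set, so the next caller of kcov_remote_start_stub() on the same thread bumps kcov_warnings even though that caller did nothing wrong. That would match the report, where the warning fires in an unrelated wireless work item on the same kworker.
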
Dmitry, could you PTAL and confirm this? You added the annotation for
NFC, AFAICS.
Thanks!
On Wed, 10 Apr 2024 at 12:56, Andrey Konovalov <[email protected]> wrote:
> I think the problem is that the KCOV annotations in the NFC code are
> buggy: kcov_remote_stop() is never called if the loop in nci_rx_work()
> exits on one of the breaks. With the recent addition of the nci_plen()
> check, this started happening often. But breaks existed in the loop
> before that too.
>
> We need to move kcov_remote_stop() into the loop and call it every
> time the loop exits.
>
> Dmitry, could you PTAL and confirm this? You added the annotation for
> NFC, AFAICS.
Missed this before somehow.
The other breaks seem to belong to the switch statement, so they should be fine:
https://elixir.bootlin.com/linux/v6.9-rc6/source/net/nfc/nci/core.c#L1528
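
For reference, a small standalone C sketch of why a break inside a switch is harmless here: it leaves only the switch, not the enclosing loop, so code after the switch still runs on every iteration. count_iterations() and the ops array are made up for illustration and are not taken from net/nfc/nci/core.c.

```c
/* Returns how many loop iterations reached the code after the switch. */
static int count_iterations(const int *ops, int n)
{
	int iterations = 0;

	for (int i = 0; i < n; i++) {
		switch (ops[i]) {
		case 0:
			break; /* leaves the switch only, not the for loop */
		default:
			break;
		}
		iterations++; /* still reached after a switch-level break */
	}
	return iterations;
}
```
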
Tetsuo, thanks for fixing it.