2021-04-01 18:32:31

by syzbot

Subject: [syzbot] WARNING in bpf_test_run

Hello,

syzbot found the following issue on:

HEAD commit: 36e79851 libbpf: Preserve empty DATASEC BTFs during static..
git tree: bpf-next
console output: https://syzkaller.appspot.com/x/log.txt?x=1569bb06d00000
kernel config: https://syzkaller.appspot.com/x/.config?x=7eff0f22b8563a5f
dashboard link: https://syzkaller.appspot.com/bug?extid=774c590240616eaa3423
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=17556b7cd00000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1772be26d00000

The issue was bisected to:

commit 997acaf6b4b59c6a9c259740312a69ea549cc684
Author: Mark Rutland <[email protected]>
Date: Mon Jan 11 15:37:07 2021 +0000

lockdep: report broken irq restoration

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=10197016d00000
final oops: https://syzkaller.appspot.com/x/report.txt?x=12197016d00000
console output: https://syzkaller.appspot.com/x/log.txt?x=14197016d00000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: [email protected]
Fixes: 997acaf6b4b5 ("lockdep: report broken irq restoration")

------------[ cut here ]------------
WARNING: CPU: 0 PID: 8725 at include/linux/bpf-cgroup.h:193 bpf_cgroup_storage_set include/linux/bpf-cgroup.h:193 [inline]
WARNING: CPU: 0 PID: 8725 at include/linux/bpf-cgroup.h:193 bpf_test_run+0x65e/0xaa0 net/bpf/test_run.c:109
Modules linked in:
CPU: 0 PID: 8725 Comm: syz-executor927 Not tainted 5.12.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:bpf_cgroup_storage_set include/linux/bpf-cgroup.h:193 [inline]
RIP: 0010:bpf_test_run+0x65e/0xaa0 net/bpf/test_run.c:109
Code: e9 29 fe ff ff e8 b2 9d 3a fa 41 83 c6 01 bf 08 00 00 00 44 89 f6 e8 51 a5 3a fa 41 83 fe 08 0f 85 74 fc ff ff e8 92 9d 3a fa <0f> 0b bd f0 ff ff ff e9 5c fd ff ff e8 81 9d 3a fa 83 c5 01 bf 08
RSP: 0018:ffffc900017bfaf0 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffffc90000f29000 RCX: 0000000000000000
RDX: ffff88801bc68000 RSI: ffffffff8739543e RDI: 0000000000000003
RBP: 0000000000000007 R08: 0000000000000008 R09: 0000000000000001
R10: ffffffff8739542f R11: 0000000000000000 R12: dffffc0000000000
R13: ffff888021dd54c0 R14: 0000000000000008 R15: 0000000000000000
FS: 00007f00157d7700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f0015795718 CR3: 00000000157ae000 CR4: 00000000001506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
bpf_prog_test_run_skb+0xabc/0x1c70 net/bpf/test_run.c:628
bpf_prog_test_run kernel/bpf/syscall.c:3132 [inline]
__do_sys_bpf+0x218b/0x4f40 kernel/bpf/syscall.c:4411
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x446199
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 11 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f00157d72f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
RAX: ffffffffffffffda RBX: 00000000004cb440 RCX: 0000000000446199
RDX: 0000000000000028 RSI: 0000000020000080 RDI: 000000000000000a
RBP: 000000000049b074 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: f9abde7200f522cd
R13: 3952ddf3af240c07 R14: 1631e0d82d3fa99d R15: 00000000004cb448


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at [email protected].

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches


2021-04-01 22:08:33

by Yonghong Song

Subject: Re: [syzbot] WARNING in bpf_test_run



On 4/1/21 4:29 AM, syzbot wrote:
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: 36e79851 libbpf: Preserve empty DATASEC BTFs during static..
> git tree: bpf-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=1569bb06d00000
> kernel config: https://syzkaller.appspot.com/x/.config?x=7eff0f22b8563a5f
> dashboard link: https://syzkaller.appspot.com/bug?extid=774c590240616eaa3423
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=17556b7cd00000
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1772be26d00000
>
> The issue was bisected to:
>
> commit 997acaf6b4b59c6a9c259740312a69ea549cc684
> Author: Mark Rutland <[email protected]>
> Date: Mon Jan 11 15:37:07 2021 +0000
>
> lockdep: report broken irq restoration
>
> bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=10197016d00000
> final oops: https://syzkaller.appspot.com/x/report.txt?x=12197016d00000
> console output: https://syzkaller.appspot.com/x/log.txt?x=14197016d00000
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: [email protected]
> Fixes: 997acaf6b4b5 ("lockdep: report broken irq restoration")
>
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 8725 at include/linux/bpf-cgroup.h:193 bpf_cgroup_storage_set include/linux/bpf-cgroup.h:193 [inline]
> WARNING: CPU: 0 PID: 8725 at include/linux/bpf-cgroup.h:193 bpf_test_run+0x65e/0xaa0 net/bpf/test_run.c:109

I will look at this issue. Thanks!

> Modules linked in:
> CPU: 0 PID: 8725 Comm: syz-executor927 Not tainted 5.12.0-rc4-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> RIP: 0010:bpf_cgroup_storage_set include/linux/bpf-cgroup.h:193 [inline]
> RIP: 0010:bpf_test_run+0x65e/0xaa0 net/bpf/test_run.c:109
> Code: e9 29 fe ff ff e8 b2 9d 3a fa 41 83 c6 01 bf 08 00 00 00 44 89 f6 e8 51 a5 3a fa 41 83 fe 08 0f 85 74 fc ff ff e8 92 9d 3a fa <0f> 0b bd f0 ff ff ff e9 5c fd ff ff e8 81 9d 3a fa 83 c5 01 bf 08
> RSP: 0018:ffffc900017bfaf0 EFLAGS: 00010293
> RAX: 0000000000000000 RBX: ffffc90000f29000 RCX: 0000000000000000
> RDX: ffff88801bc68000 RSI: ffffffff8739543e RDI: 0000000000000003
> RBP: 0000000000000007 R08: 0000000000000008 R09: 0000000000000001
> R10: ffffffff8739542f R11: 0000000000000000 R12: dffffc0000000000
> R13: ffff888021dd54c0 R14: 0000000000000008 R15: 0000000000000000
> FS: 00007f00157d7700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f0015795718 CR3: 00000000157ae000 CR4: 00000000001506f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> bpf_prog_test_run_skb+0xabc/0x1c70 net/bpf/test_run.c:628
> bpf_prog_test_run kernel/bpf/syscall.c:3132 [inline]
> __do_sys_bpf+0x218b/0x4f40 kernel/bpf/syscall.c:4411
> do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
> entry_SYSCALL_64_after_hwframe+0x44/0xae
> RIP: 0033:0x446199
> Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 11 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
> RSP: 002b:00007f00157d72f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
> RAX: ffffffffffffffda RBX: 00000000004cb440 RCX: 0000000000446199
> RDX: 0000000000000028 RSI: 0000000020000080 RDI: 000000000000000a
> RBP: 000000000049b074 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: f9abde7200f522cd
> R13: 3952ddf3af240c07 R14: 1631e0d82d3fa99d R15: 00000000004cb448
>
>
> ---
> This report is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at [email protected].
>
> syzbot will keep track of this issue. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
> For information about bisection process see: https://goo.gl/tpsmEJ#bisection
> syzbot can test patches for this issue, for details see:
> https://goo.gl/tpsmEJ#testing-patches
>

2021-04-02 00:43:52

by Yonghong Song

Subject: Re: [syzbot] WARNING in bpf_test_run



On 4/1/21 3:05 PM, Yonghong Song wrote:
>
>
> On 4/1/21 4:29 AM, syzbot wrote:
>> Hello,
>>
>> syzbot found the following issue on:
>>
>> HEAD commit:    36e79851 libbpf: Preserve empty DATASEC BTFs during
>> static..
>> git tree:       bpf-next
>> console output:
>> https://syzkaller.appspot.com/x/log.txt?x=1569bb06d00000
>> kernel config:
>> https://syzkaller.appspot.com/x/.config?x=7eff0f22b8563a5f
>> dashboard link:
>> https://syzkaller.appspot.com/bug?extid=774c590240616eaa3423
>> syz repro:
>> https://syzkaller.appspot.com/x/repro.syz?x=17556b7cd00000
>> C reproducer:
>> https://syzkaller.appspot.com/x/repro.c?x=1772be26d00000
>>
>> The issue was bisected to:
>>
>> commit 997acaf6b4b59c6a9c259740312a69ea549cc684
>> Author: Mark Rutland <[email protected]>
>> Date:   Mon Jan 11 15:37:07 2021 +0000
>>
>>      lockdep: report broken irq restoration
>>
>> bisection log:
>> https://syzkaller.appspot.com/x/bisect.txt?x=10197016d00000
>> final oops:
>> https://syzkaller.appspot.com/x/report.txt?x=12197016d00000
>> console output:
>> https://syzkaller.appspot.com/x/log.txt?x=14197016d00000
>>
>> IMPORTANT: if you fix the issue, please add the following tag to the
>> commit:
>> Reported-by: [email protected]
>> Fixes: 997acaf6b4b5 ("lockdep: report broken irq restoration")
>>
>> ------------[ cut here ]------------
>> WARNING: CPU: 0 PID: 8725 at include/linux/bpf-cgroup.h:193
>> bpf_cgroup_storage_set include/linux/bpf-cgroup.h:193 [inline]
>> WARNING: CPU: 0 PID: 8725 at include/linux/bpf-cgroup.h:193
>> bpf_test_run+0x65e/0xaa0 net/bpf/test_run.c:109
>
> I will look at this issue. Thanks!
>
>> Modules linked in:
>> CPU: 0 PID: 8725 Comm: syz-executor927 Not tainted
>> 5.12.0-rc4-syzkaller #0
>> Hardware name: Google Google Compute Engine/Google Compute Engine,
>> BIOS Google 01/01/2011
>> RIP: 0010:bpf_cgroup_storage_set include/linux/bpf-cgroup.h:193 [inline]
>> RIP: 0010:bpf_test_run+0x65e/0xaa0 net/bpf/test_run.c:109
>> Code: e9 29 fe ff ff e8 b2 9d 3a fa 41 83 c6 01 bf 08 00 00 00 44 89
>> f6 e8 51 a5 3a fa 41 83 fe 08 0f 85 74 fc ff ff e8 92 9d 3a fa <0f> 0b
>> bd f0 ff ff ff e9 5c fd ff ff e8 81 9d 3a fa 83 c5 01 bf 08
>> RSP: 0018:ffffc900017bfaf0 EFLAGS: 00010293
>> RAX: 0000000000000000 RBX: ffffc90000f29000 RCX: 0000000000000000
>> RDX: ffff88801bc68000 RSI: ffffffff8739543e RDI: 0000000000000003
>> RBP: 0000000000000007 R08: 0000000000000008 R09: 0000000000000001
>> R10: ffffffff8739542f R11: 0000000000000000 R12: dffffc0000000000
>> R13: ffff888021dd54c0 R14: 0000000000000008 R15: 0000000000000000
>> FS:  00007f00157d7700(0000) GS:ffff8880b9c00000(0000)
>> knlGS:0000000000000000
>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 00007f0015795718 CR3: 00000000157ae000 CR4: 00000000001506f0
>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> Call Trace:
>>   bpf_prog_test_run_skb+0xabc/0x1c70 net/bpf/test_run.c:628
>>   bpf_prog_test_run kernel/bpf/syscall.c:3132 [inline]
>>   __do_sys_bpf+0x218b/0x4f40 kernel/bpf/syscall.c:4411
>>   do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46

I ran the C reproducer on my qemu (4 cpus) and cannot reproduce the
issue; it has already been running for 30 minutes. Looking at the code,
the reproducer is just doing a lot of parallel bpf_prog_test_run's.

The failure is at the WARN_ON_ONCE() in the code below:

175 static inline int bpf_cgroup_storage_set(struct bpf_cgroup_storage
176                                          *storage[MAX_BPF_CGROUP_STORAGE_TYPE])
177 {
178         enum bpf_cgroup_storage_type stype;
179         int i, err = 0;
180
181         preempt_disable();
182         for (i = 0; i < BPF_CGROUP_STORAGE_NEST_MAX; i++) {
183                 if (unlikely(this_cpu_read(bpf_cgroup_storage_info[i].task) != NULL))
184                         continue;
185
186                 this_cpu_write(bpf_cgroup_storage_info[i].task, current);
187                 for_each_cgroup_storage_type(stype)
188                         this_cpu_write(bpf_cgroup_storage_info[i].storage[stype],
189                                        storage[stype]);
190                 goto out;
191         }
192         err = -EBUSY;
193         WARN_ON_ONCE(1);
194
195 out:
196         preempt_enable();
197         return err;
198 }

Basically, the stress test triggered the warning by exhausting a limited
kernel resource: all BPF_CGROUP_STORAGE_NEST_MAX per-CPU nesting slots
were already in use.
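
To give a sense of the kind of load involved, here is a rough, hypothetical
sketch (not the actual syzkaller reproducer; the names below are made up for
illustration): several threads hammering BPF_PROG_TEST_RUN on a trivial
socket-filter program through the raw bpf(2) syscall. The warning fires when
a test run finds all BPF_CGROUP_STORAGE_NEST_MAX per-CPU slots already taken.

#define _GNU_SOURCE
#include <linux/bpf.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

static long sys_bpf(int cmd, union bpf_attr *attr, unsigned int size)
{
        return syscall(__NR_bpf, cmd, attr, size);
}

/* Load a minimal socket filter: "r0 = 0; exit;". */
static int load_trivial_prog(void)
{
        struct bpf_insn insns[2] = {
                { .code = BPF_ALU64 | BPF_MOV | BPF_K, .dst_reg = BPF_REG_0, .imm = 0 },
                { .code = BPF_JMP | BPF_EXIT },
        };
        union bpf_attr attr = {};

        attr.prog_type = BPF_PROG_TYPE_SOCKET_FILTER;
        attr.insns = (uintptr_t)insns;
        attr.insn_cnt = 2;
        attr.license = (uintptr_t)"GPL";
        return sys_bpf(BPF_PROG_LOAD, &attr, sizeof(attr));
}

/* Each thread keeps issuing BPF_PROG_TEST_RUN on the same program. */
static void *hammer(void *arg)
{
        int prog_fd = *(int *)arg;
        char pkt[64] = {};      /* dummy packet payload */

        for (;;) {
                union bpf_attr attr = {};

                attr.test.prog_fd = prog_fd;
                attr.test.data_in = (uintptr_t)pkt;
                attr.test.data_size_in = sizeof(pkt);
                attr.test.repeat = 1000;
                sys_bpf(BPF_PROG_TEST_RUN, &attr, sizeof(attr));
        }
        return NULL;
}

int main(void)
{
        pthread_t th[16];
        int prog_fd = load_trivial_prog();
        int i;

        if (prog_fd < 0)
                return 1;
        for (i = 0; i < 16; i++)
                pthread_create(&th[i], NULL, hammer, &prog_fd);
        pause();
        return 0;
}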

>>   entry_SYSCALL_64_after_hwframe+0x44/0xae
>> RIP: 0033:0x446199
>> Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 11 15 00 00 90 48 89 f8 48
>> 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d
>> 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
>> RSP: 002b:00007f00157d72f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
>> RAX: ffffffffffffffda RBX: 00000000004cb440 RCX: 0000000000446199
>> RDX: 0000000000000028 RSI: 0000000020000080 RDI: 000000000000000a
>> RBP: 000000000049b074 R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000000000000000 R11: 0000000000000246 R12: f9abde7200f522cd
>> R13: 3952ddf3af240c07 R14: 1631e0d82d3fa99d R15: 00000000004cb448
>>
>>
>> ---
>> This report is generated by a bot. It may contain errors.
>> See
>> https://goo.gl/tpsmEJ
>> for more information about syzbot.
>> syzbot engineers can be reached at [email protected].
>>
>> syzbot will keep track of this issue. See:
>> https://goo.gl/tpsmEJ#status
>> for how to communicate with syzbot.
>> For information about bisection process see:
>> https://goo.gl/tpsmEJ#bisection
>> syzbot can test patches for this issue, for details see:
>> https://goo.gl/tpsmEJ#testing-patches
>>

2021-04-13 14:05:27

by Dmitry Vyukov

Subject: Re: [syzbot] WARNING in bpf_test_run

On Fri, Apr 2, 2021 at 2:41 AM 'Yonghong Song' via syzkaller-bugs
<[email protected]> wrote:
> > On 4/1/21 4:29 AM, syzbot wrote:
> >> Hello,
> >>
> >> syzbot found the following issue on:
> >>
> >> HEAD commit: 36e79851 libbpf: Preserve empty DATASEC BTFs during
> >> static..
> >> git tree: bpf-next
> >> console output:
> >> https://syzkaller.appspot.com/x/log.txt?x=1569bb06d00000
> >> kernel config:
> >> https://syzkaller.appspot.com/x/.config?x=7eff0f22b8563a5f
> >> dashboard link:
> >> https://syzkaller.appspot.com/bug?extid=774c590240616eaa3423
> >> syz repro:
> >> https://syzkaller.appspot.com/x/repro.syz?x=17556b7cd00000
> >> C reproducer:
> >> https://syzkaller.appspot.com/x/repro.c?x=1772be26d00000
> >>
> >> The issue was bisected to:
> >>
> >> commit 997acaf6b4b59c6a9c259740312a69ea549cc684
> >> Author: Mark Rutland <[email protected]>
> >> Date: Mon Jan 11 15:37:07 2021 +0000
> >>
> >> lockdep: report broken irq restoration
> >>
> >> bisection log:
> >> https://syzkaller.appspot.com/x/bisect.txt?x=10197016d00000
> >> final oops:
> >> https://syzkaller.appspot.com/x/report.txt?x=12197016d00000
> >> console output:
> >> https://syzkaller.appspot.com/x/log.txt?x=14197016d00000
> >>
> >> IMPORTANT: if you fix the issue, please add the following tag to the
> >> commit:
> >> Reported-by: [email protected]
> >> Fixes: 997acaf6b4b5 ("lockdep: report broken irq restoration")
> >>
> >> ------------[ cut here ]------------
> >> WARNING: CPU: 0 PID: 8725 at include/linux/bpf-cgroup.h:193
> >> bpf_cgroup_storage_set include/linux/bpf-cgroup.h:193 [inline]
> >> WARNING: CPU: 0 PID: 8725 at include/linux/bpf-cgroup.h:193
> >> bpf_test_run+0x65e/0xaa0 net/bpf/test_run.c:109
> >
> > I will look at this issue. Thanks!
> >
> >> Modules linked in:
> >> CPU: 0 PID: 8725 Comm: syz-executor927 Not tainted
> >> 5.12.0-rc4-syzkaller #0
> >> Hardware name: Google Google Compute Engine/Google Compute Engine,
> >> BIOS Google 01/01/2011
> >> RIP: 0010:bpf_cgroup_storage_set include/linux/bpf-cgroup.h:193 [inline]
> >> RIP: 0010:bpf_test_run+0x65e/0xaa0 net/bpf/test_run.c:109
> >> Code: e9 29 fe ff ff e8 b2 9d 3a fa 41 83 c6 01 bf 08 00 00 00 44 89
> >> f6 e8 51 a5 3a fa 41 83 fe 08 0f 85 74 fc ff ff e8 92 9d 3a fa <0f> 0b
> >> bd f0 ff ff ff e9 5c fd ff ff e8 81 9d 3a fa 83 c5 01 bf 08
> >> RSP: 0018:ffffc900017bfaf0 EFLAGS: 00010293
> >> RAX: 0000000000000000 RBX: ffffc90000f29000 RCX: 0000000000000000
> >> RDX: ffff88801bc68000 RSI: ffffffff8739543e RDI: 0000000000000003
> >> RBP: 0000000000000007 R08: 0000000000000008 R09: 0000000000000001
> >> R10: ffffffff8739542f R11: 0000000000000000 R12: dffffc0000000000
> >> R13: ffff888021dd54c0 R14: 0000000000000008 R15: 0000000000000000
> >> FS: 00007f00157d7700(0000) GS:ffff8880b9c00000(0000)
> >> knlGS:0000000000000000
> >> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> CR2: 00007f0015795718 CR3: 00000000157ae000 CR4: 00000000001506f0
> >> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >> Call Trace:
> >> bpf_prog_test_run_skb+0xabc/0x1c70 net/bpf/test_run.c:628
> >> bpf_prog_test_run kernel/bpf/syscall.c:3132 [inline]
> >> __do_sys_bpf+0x218b/0x4f40 kernel/bpf/syscall.c:4411
> >> do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
>
> I ran the C reproducer on my qemu (4 cpus) and cannot reproduce the
> issue; it has already been running for 30 minutes. Looking at the code,
> the reproducer is just doing a lot of parallel bpf_prog_test_run's.
>
> The failure is at the WARN_ON_ONCE() in the code below:
>
> 175 static inline int bpf_cgroup_storage_set(struct bpf_cgroup_storage
> 176                                          *storage[MAX_BPF_CGROUP_STORAGE_TYPE])
> 177 {
> 178         enum bpf_cgroup_storage_type stype;
> 179         int i, err = 0;
> 180
> 181         preempt_disable();
> 182         for (i = 0; i < BPF_CGROUP_STORAGE_NEST_MAX; i++) {
> 183                 if (unlikely(this_cpu_read(bpf_cgroup_storage_info[i].task) != NULL))
> 184                         continue;
> 185
> 186                 this_cpu_write(bpf_cgroup_storage_info[i].task, current);
> 187                 for_each_cgroup_storage_type(stype)
> 188                         this_cpu_write(bpf_cgroup_storage_info[i].storage[stype],
> 189                                        storage[stype]);
> 190                 goto out;
> 191         }
> 192         err = -EBUSY;
> 193         WARN_ON_ONCE(1);
> 194
> 195 out:
> 196         preempt_enable();
> 197         return err;
> 198 }
>
> Basically, the stress test triggered the warning by exhausting a limited
> kernel resource: all BPF_CGROUP_STORAGE_NEST_MAX per-CPU nesting slots
> were already in use.

Hi Yonghong,

Thanks for looking into this.
If this is not a kernel bug, then it must not use WARN_ON[_ONCE]. It
makes the kernel untestable for both automated systems and humans:

https://lwn.net/Articles/769365/

<quote>
Greg Kroah-Hartman raised the problem of core kernel API code that
will use WARN_ON_ONCE() to complain about bad usage; that will not
generate the desired result if WARN_ON_ONCE() is configured to crash
the machine. He was told that the code should just call pr_warn()
instead, and that the called function should return an error in such
situations. It was generally agreed that any WARN_ON() or
WARN_ON_ONCE() calls that can be triggered from user space need to be
fixed.
</quote>
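
Concretely, something along these lines would follow that guidance for the
helper quoted earlier in the thread (a sketch only, not a tested or proposed
patch): drop the user-triggerable WARN_ON_ONCE(), keep returning -EBUSY, and
report the condition at most once via pr_warn_once():

static inline int bpf_cgroup_storage_set(struct bpf_cgroup_storage
                                         *storage[MAX_BPF_CGROUP_STORAGE_TYPE])
{
        enum bpf_cgroup_storage_type stype;
        int i, err = 0;

        preempt_disable();
        for (i = 0; i < BPF_CGROUP_STORAGE_NEST_MAX; i++) {
                if (unlikely(this_cpu_read(bpf_cgroup_storage_info[i].task) != NULL))
                        continue;

                this_cpu_write(bpf_cgroup_storage_info[i].task, current);
                for_each_cgroup_storage_type(stype)
                        this_cpu_write(bpf_cgroup_storage_info[i].storage[stype],
                                       storage[stype]);
                goto out;
        }
        /* All per-CPU nesting slots are busy; userspace can provoke this by
         * stressing prog_test_run, so do not WARN, just report and fail. */
        pr_warn_once("bpf: no free cgroup storage nesting slot\n");
        err = -EBUSY;
out:
        preempt_enable();
        return err;
}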



> >> entry_SYSCALL_64_after_hwframe+0x44/0xae
> >> RIP: 0033:0x446199
> >> Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 11 15 00 00 90 48 89 f8 48
> >> 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d
> >> 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
> >> RSP: 002b:00007f00157d72f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
> >> RAX: ffffffffffffffda RBX: 00000000004cb440 RCX: 0000000000446199
> >> RDX: 0000000000000028 RSI: 0000000020000080 RDI: 000000000000000a
> >> RBP: 000000000049b074 R08: 0000000000000000 R09: 0000000000000000
> >> R10: 0000000000000000 R11: 0000000000000246 R12: f9abde7200f522cd
> >> R13: 3952ddf3af240c07 R14: 1631e0d82d3fa99d R15: 00000000004cb448
> >>
> >>
> >> ---
> >> This report is generated by a bot. It may contain errors.
> >> See
> >> https://goo.gl/tpsmEJ
> >> for more information about syzbot.
> >> syzbot engineers can be reached at [email protected].
> >>
> >> syzbot will keep track of this issue. See:
> >> https://goo.gl/tpsmEJ#status
> >> for how to communicate with syzbot.
> >> For information about bisection process see:
> >> https://goo.gl/tpsmEJ#bisection
> >> syzbot can test patches for this issue, for details see:
> >> https://goo.gl/tpsmEJ#testing-patches

2021-04-13 17:53:05

by Steven Rostedt

Subject: Re: [syzbot] WARNING in bpf_test_run

On Tue, 13 Apr 2021 09:56:40 +0200
Dmitry Vyukov <[email protected]> wrote:

> Thanks for looking into this.
> If this is not a kernel bug, then it must not use WARN_ON[_ONCE]. It
> makes the kernel untestable for both automated systems and humans:
>
> https://lwn.net/Articles/769365/
>
> <quote>
> Greg Kroah-Hartman raised the problem of core kernel API code that
> will use WARN_ON_ONCE() to complain about bad usage; that will not
> generate the desired result if WARN_ON_ONCE() is configured to crash
> the machine. He was told that the code should just call pr_warn()
> instead, and that the called function should return an error in such
> situations. It was generally agreed that any WARN_ON() or
> WARN_ON_ONCE() calls that can be triggered from user space need to be
> fixed.
> </quote>

I agree. WARN_ON(_ONCE) should be reserved for anomalies that should never
happen. Anything that the user could trigger should not trigger a
WARN_ON.

A WARN_ON is perfectly fine for detecting an accounting error inside the
kernel. I have them scattered all over my code, but they should never be
hit, even if something in user space tries to hit them. (One exception is
an interface I want to deprecate, where I want to know if it's still being
used ;-) Of course, that wouldn't help bots testing the code, and I haven't
done that in years.)

Any anomaly that can be triggered by user space doing something it should
not be doing really needs a pr_warn().
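
To make the distinction concrete, here is a purely hypothetical sketch (the
struct, functions and limit below are made up for illustration, not taken
from any real subsystem):

struct foo {
        atomic_t refcnt;
        unsigned int len;
};

#define FOO_MAX_LEN 4096

/* Internal invariant: a refcount of zero or less here means the kernel's
 * own accounting is broken. Userspace cannot cause it, so WARN_ON_ONCE()
 * is appropriate. */
static void foo_put(struct foo *f)
{
        WARN_ON_ONCE(atomic_read(&f->refcnt) <= 0);
        if (atomic_dec_and_test(&f->refcnt))
                kfree(f);
}

/* User-controlled input: a bad length arriving from a syscall is not a
 * kernel bug, so report it (rate-limited) and return an error instead of
 * warning. */
static int foo_set_len(struct foo *f, unsigned int len)
{
        if (len > FOO_MAX_LEN) {
                pr_warn_ratelimited("foo: rejecting oversized len %u\n", len);
                return -EINVAL;
        }
        f->len = len;
        return 0;
}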

Thanks,

-- Steve