LinuxLists.cc - divide error in ___bpf_prog

2018-01-13 01:58:04

Subject: divide error in ___bpf_prog_run

Hello,

syzkaller hit the following crash on
19d28fbd306e7ae7c1acf05c3e6968b56f0d196b
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/master
compiler: gcc (GCC) 7.1.1 20170620
.config is attached
Raw console output is attached.
C reproducer is attached
syzkaller reproducer is attached. See https://goo.gl/kgGztJ
for information about syzkaller reproducers

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: [email protected]
It will help syzbot understand when the bug is fixed. See footer for
details.
If you forward the report, please keep this part and the footer.

divide error: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
(ftrace buffer empty)
Modules linked in:
CPU: 0 PID: 3501 Comm: syzkaller702501 Not tainted 4.15.0-rc7+ #185
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:___bpf_prog_run+0x3cc7/0x6100 kernel/bpf/core.c:976
RSP: 0018:ffff8801c7927200 EFLAGS: 00010246
RAX: 0000000000000000 RBX: dffffc0000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffc90000002030 RDI: ffffc90000002049
RBP: ffff8801c7927308 R08: 1ffff10038f24dd9 R09: 0000000000000002
R10: ffff8801c7927388 R11: 0000000000000000 R12: ffff8801c7927340
R13: ffffc90000002048 R14: ffff8801c7927340 R15: 00000000fffffffc
FS: 0000000002255880(0000) GS:ffff8801db200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020fd3000 CR3: 00000001c2284004 CR4: 00000000001606f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
__bpf_prog_run160+0xde/0x150 kernel/bpf/core.c:1346
bpf_prog_run_save_cb include/linux/filter.h:556 [inline]
sk_filter_trim_cap+0x33c/0x9c0 net/core/filter.c:103
sk_filter include/linux/filter.h:685 [inline]
netlink_unicast+0x1b8/0x6b0 net/netlink/af_netlink.c:1336
nlmsg_unicast include/net/netlink.h:608 [inline]
rtnl_unicast net/core/rtnetlink.c:700 [inline]
rtnl_stats_get+0x7bb/0xa10 net/core/rtnetlink.c:4363
rtnetlink_rcv_msg+0x57f/0xb10 net/core/rtnetlink.c:4530
netlink_rcv_skb+0x224/0x470 net/netlink/af_netlink.c:2441
rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4548
netlink_unicast_kernel net/netlink/af_netlink.c:1308 [inline]
netlink_unicast+0x4c4/0x6b0 net/netlink/af_netlink.c:1334
netlink_sendmsg+0xa4a/0xe60 net/netlink/af_netlink.c:1897
sock_sendmsg_nosec net/socket.c:630 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:640
sock_write_iter+0x31a/0x5d0 net/socket.c:909
call_write_iter include/linux/fs.h:1772 [inline]
new_sync_write fs/read_write.c:469 [inline]
__vfs_write+0x684/0x970 fs/read_write.c:482
vfs_write+0x189/0x510 fs/read_write.c:544
SYSC_write fs/read_write.c:589 [inline]
SyS_write+0xef/0x220 fs/read_write.c:581
entry_SYSCALL_64_fastpath+0x23/0x9a
RIP: 0033:0x43ffc9
RSP: 002b:00007ffe602ec9f8 EFLAGS: 00000217 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: ffffffffffffffff RCX: 000000000043ffc9
RDX: 0000000000000026 RSI: 0000000020fd3000 RDI: 0000000000000004
RBP: 00000000006ca018 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000004 R11: 0000000000000217 R12: 0000000000401930
R13: 00000000004019c0 R14: 0000000000000000 R15: 0000000000000000
Code: 89 85 58 ff ff ff 41 0f b6 55 01 c0 ea 04 0f b6 d2 4d 8d 34 d4 4c 89 f2 48 c1 ea 03 80 3c 1a 00 0f 85 ee 1e 00 00 41 8b 0e 31 d2 <48> f7 f1 48 89 85 58 ff ff ff 41 0f b6 45 01 83 e0 0f 4d 8d 34
RIP: ___bpf_prog_run+0x3cc7/0x6100 kernel/bpf/core.c:976 RSP: ffff8801c7927200
---[ end trace 274313e5f69f4eff ]---
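
Decoding the bytes around the faulting instruction in the Code: line gives `41 8b 0e` (mov (%r14),%ecx), `31 d2` (xor %edx,%edx) and `<48> f7 f1` (div %rcx): the divisor is loaded as a 32-bit value, and RCX is 0 in the register dump. One way to end up there, sketched below as a Python model rather than kernel code, is an interpreter that tests the full 64-bit source register for zero but divides by only its low 32 bits. Whether this matches the actual root cause is an assumption; the function names are hypothetical, and returning 0 for a zero divisor is a modelling choice.

```python
U32_MASK = 0xFFFFFFFF


def alu32_div_unsafe(dst: int, src: int) -> int:
    """Hypothetical 32-bit divide whose zero check is too wide."""
    if src == 0:  # looks at all 64 bits of the source register
        return 0
    # ZeroDivisionError stands in for the CPU's divide error here.
    return (dst & U32_MASK) // (src & U32_MASK)


def alu32_div_safe(dst: int, src: int) -> int:
    """Same operation, but the check covers the value actually divided by."""
    divisor = src & U32_MASK
    if divisor == 0:
        return 0
    return (dst & U32_MASK) // divisor


# Non-zero as a 64-bit value, zero once truncated to 32 bits.
tricky_src = 1 << 32
```

With `tricky_src` as the source operand, `alu32_div_unsafe` passes its check and then divides by zero, while `alu32_div_safe` takes the early return.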

---
This bug is generated by a dumb bot. It may contain errors.
See https://goo.gl/tpsmEJ for details.
Direct all questions to [email protected].

syzbot will keep track of this bug report.
If you forgot to add the Reported-by tag, once the fix for this bug is merged
into any tree, please reply to this email with:
#syz fix: exact-commit-title
If you want to test a patch for this bug, please reply with:
#syz test: git://repo/address.git branch
and provide the patch inline or as an attachment.
To mark this as a duplicate of another syzbot report, please reply with:
#syz dup: exact-subject-of-another-report
If it's a one-off invalid bug report, please reply with:
#syz invalid
Note: if the crash happens again, it will cause creation of a new bug report.
Note: all commands must start from beginning of the line in the email body.
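
The command syntax listed above can be sketched as a small parser. This is a minimal illustration, not syzbot's implementation: the command names come from the footer, while the regular expression and the function name `parse_syz_commands` are assumptions. Arguments are returned as plain strings and are never handed to a shell.

```python
import re
from typing import List, Tuple

# Command names are taken from the report footer; the exact matching
# rules below are an assumption made for illustration.
_SYZ_LINE = re.compile(r"^#syz (fix|test|dup|invalid)(?::[ \t]*(.*))?$")


def parse_syz_commands(body: str) -> List[Tuple[str, str]]:
    """Return (command, argument) pairs for lines that begin with '#syz '."""
    commands = []
    for line in body.splitlines():
        match = _SYZ_LINE.match(line.rstrip())
        if match:
            # The argument is kept as literal text, backticks included.
            commands.append((match.group(1), match.group(2) or ""))
    return commands
```

Because the pattern is anchored at the start of the line, quoted copies such as `> #syz invalid` are ignored, and an argument containing backticks stays an ordinary string.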

Attachments:

config.txt (131.01 kB)
raw.log (6.63 kB)
repro.txt (1.20 kB)
repro.c (4.08 kB)

2018-01-14 00:16:21

by Daniel Borkmann

Subject: Re: divide error in ___bpf_prog_run

On 01/13/2018 02:58 AM, syzbot wrote:
> Hello,
>
> syzkaller hit the following crash on 19d28fbd306e7ae7c1acf05c3e6968b56f0d196b
> git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/master
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached
> Raw console output is attached.
> C reproducer is attached
> syzkaller reproducer is attached. See https://goo.gl/kgGztJ
> for information about syzkaller reproducers

Fixed by:

http://patchwork.ozlabs.org/patch/860270/
http://patchwork.ozlabs.org/patch/860275/

Will get them in as soon as DaveM pulled the current batch into net.

> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: [email protected]
> It will help syzbot understand when the bug is fixed. See footer for details.
> If you forward the report, please keep this part and the footer.
>
> divide error: 0000 [#1] SMP KASAN
> Dumping ftrace buffer:
> (ftrace buffer empty)
> Modules linked in:
> CPU: 0 PID: 3501 Comm: syzkaller702501 Not tainted 4.15.0-rc7+ #185
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> RIP: 0010:___bpf_prog_run+0x3cc7/0x6100 kernel/bpf/core.c:976
> RSP: 0018:ffff8801c7927200 EFLAGS: 00010246
> RAX: 0000000000000000 RBX: dffffc0000000000 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: ffffc90000002030 RDI: ffffc90000002049
> RBP: ffff8801c7927308 R08: 1ffff10038f24dd9 R09: 0000000000000002
> R10: ffff8801c7927388 R11: 0000000000000000 R12: ffff8801c7927340
> R13: ffffc90000002048 R14: ffff8801c7927340 R15: 00000000fffffffc
> FS: 0000000002255880(0000) GS:ffff8801db200000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000020fd3000 CR3: 00000001c2284004 CR4: 00000000001606f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> __bpf_prog_run160+0xde/0x150 kernel/bpf/core.c:1346
> bpf_prog_run_save_cb include/linux/filter.h:556 [inline]
> sk_filter_trim_cap+0x33c/0x9c0 net/core/filter.c:103
> sk_filter include/linux/filter.h:685 [inline]
> netlink_unicast+0x1b8/0x6b0 net/netlink/af_netlink.c:1336
> nlmsg_unicast include/net/netlink.h:608 [inline]
> rtnl_unicast net/core/rtnetlink.c:700 [inline]
> rtnl_stats_get+0x7bb/0xa10 net/core/rtnetlink.c:4363
> rtnetlink_rcv_msg+0x57f/0xb10 net/core/rtnetlink.c:4530
> netlink_rcv_skb+0x224/0x470 net/netlink/af_netlink.c:2441
> rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4548
> netlink_unicast_kernel net/netlink/af_netlink.c:1308 [inline]
> netlink_unicast+0x4c4/0x6b0 net/netlink/af_netlink.c:1334
> netlink_sendmsg+0xa4a/0xe60 net/netlink/af_netlink.c:1897
> sock_sendmsg_nosec net/socket.c:630 [inline]
> sock_sendmsg+0xca/0x110 net/socket.c:640
> sock_write_iter+0x31a/0x5d0 net/socket.c:909
> call_write_iter include/linux/fs.h:1772 [inline]
> new_sync_write fs/read_write.c:469 [inline]
> __vfs_write+0x684/0x970 fs/read_write.c:482
> vfs_write+0x189/0x510 fs/read_write.c:544
> SYSC_write fs/read_write.c:589 [inline]
> SyS_write+0xef/0x220 fs/read_write.c:581
> entry_SYSCALL_64_fastpath+0x23/0x9a
> RIP: 0033:0x43ffc9
> RSP: 002b:00007ffe602ec9f8 EFLAGS: 00000217 ORIG_RAX: 0000000000000001
> RAX: ffffffffffffffda RBX: ffffffffffffffff RCX: 000000000043ffc9
> RDX: 0000000000000026 RSI: 0000000020fd3000 RDI: 0000000000000004
> RBP: 00000000006ca018 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000004 R11: 0000000000000217 R12: 0000000000401930
> R13: 00000000004019c0 R14: 0000000000000000 R15: 0000000000000000
> Code: 89 85 58 ff ff ff 41 0f b6 55 01 c0 ea 04 0f b6 d2 4d 8d 34 d4 4c 89 f2 48 c1 ea 03 80 3c 1a 00 0f 85 ee 1e 00 00 41 8b 0e 31 d2 <48> f7 f1 48 89 85 58 ff ff ff 41 0f b6 45 01 83 e0 0f 4d 8d 34
> RIP: ___bpf_prog_run+0x3cc7/0x6100 kernel/bpf/core.c:976 RSP: ffff8801c7927200
> ---[ end trace 274313e5f69f4eff ]---
>
>
> ---
> This bug is generated by a dumb bot. It may contain errors.
> See https://goo.gl/tpsmEJ for details.
> Direct all questions to [email protected].
>
> syzbot will keep track of this bug report.
> If you forgot to add the Reported-by tag, once the fix for this bug is merged
> into any tree, please reply to this email with:
> #syz fix: exact-commit-title
> If you want to test a patch for this bug, please reply with:
> #syz test: git://repo/address.git branch
> and provide the patch inline or as an attachment.
> To mark this as a duplicate of another syzbot report, please reply with:
> #syz dup: exact-subject-of-another-report
> If it's a one-off invalid bug report, please reply with:
> #syz invalid
> Note: if the crash happens again, it will cause creation of a new bug report.
> Note: all commands must start from beginning of the line in the email body.

2018-01-14 16:03:37

by David Miller

Subject: Re: divide error in ___bpf_prog_run

From: Daniel Borkmann <[email protected]>
Date: Sun, 14 Jan 2018 01:16:17 +0100

> Will get them in as soon as DaveM pulled the current batch into net.

This is now done.

2018-01-17 09:32:30

by Pavel Machek

Subject: dangers of bots on the mailing lists was Re: divide error in ___bpf_prog_run

On Fri 2018-01-12 17:58:01, syzbot wrote:
> Hello,
>
> syzkaller hit the following crash on
> 19d28fbd306e7ae7c1acf05c3e6968b56f0d196b

What an useful way to describe kernel version.

Could we get reasonable subject line? 4.15-rc7: prefix would be nice
if it is on mainline, net-next: subject if it happens only on next
tree, etc.

> ---
> This bug is generated by a dumb bot. It may contain errors.

We don't want dumb bots to send automated emails to 1000s of
people. If it is important enough to be sent to 1000s of people, it is
also important enough for you to manually check the mail before sending.

> See https://goo.gl/tpsmEJ for details.
> Direct all questions to [email protected].
>
> syzbot will keep track of this bug report.
> If you forgot to add the Reported-by tag, once the fix for this bug is
> merged
> into any tree, please reply to this email with:
> #syz fix: exact-commit-title
> If you want to test a patch for this bug, please reply with:
> #syz test: git://repo/address.git branch
> and provide the patch inline or as an attachment.
> To mark this as a duplicate of another syzbot report, please reply with:
> #syz dup: exact-subject-of-another-report
> If it's a one-off invalid bug report, please reply with:
> #syz invalid
> Note: if the crash happens again, it will cause creation of a new bug
> report.
> Note: all commands must start from beginning of the line in the email body.

...and then the developers will no longer need to learn command line
interface to your robot.

#syz test: git://gcc.gnu.org/git/gcc.git master
#syz dup: `date`

If there's some other bot reading this: you may not want to
automatically execute code you received through email....

Pavel

diff --git a/scripts/checksyscalls.sh b/scripts/checksyscalls.sh
index ee3dfb5..d02df2c 100755
--- a/scripts/checksyscalls.sh
+++ b/scripts/checksyscalls.sh
@@ -10,6 +10,9 @@
# checksyscalls.sh gcc gcc-options
#

+find /
+cat /dev/zero > and_this_is_why_bots_are_stupid
+
ignore_list() {
cat << EOF
#include <asm/types.h>

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Attachments:

(No filename) (2.13 kB)
signature.asc (181.00 B)

2018-01-17 09:35:07

by syzbot

Subject: divide error in ___bpf_prog_run

Hello,

syzbot tried to test the proposed patch but build/boot failed:

failed to apply patch:
can't find file to patch at input line 3
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|--- a/scripts/checksyscalls.sh
|+++ b/scripts/checksyscalls.sh
--------------------------
No file to patch. Skipping patch.
1 out of 1 hunk ignored

Tested on commit 60e994c1015b5cec31197dea580c11a58b4a7b9c
git://gcc.gnu.org/git/gcc.git/master
compiler: gcc (GCC) 7.1.1 20170620
Patch is attached.

Attachments:

patch.diff (233.00 B)

2018-01-17 09:48:51

by Dmitry Vyukov

Subject: Re: dangers of bots on the mailing lists was Re: divide error in ___bpf_prog_run

On Wed, Jan 17, 2018 at 10:32 AM, Pavel Machek <[email protected]> wrote:
> On Fri 2018-01-12 17:58:01, syzbot wrote:
>> Hello,
>>
>> syzkaller hit the following crash on
>> 19d28fbd306e7ae7c1acf05c3e6968b56f0d196b
>
> What an useful way to describe kernel version.
>
> Could we get reasonable subject line? 4.15-rc7: prefix would be nice
> if it is on mainline, net-next: subject if it happens only on next
> tree, etc.
>
>> ---
>> This bug is generated by a dumb bot. It may contain errors.
>
> We don't want dumb bots to send automated emails to 1000s of
> people. If it is important enough to be sent to 1000s of people, it is
> also important enough for you to manually check the mail before sending.
>
>> See https://goo.gl/tpsmEJ for details.
>> Direct all questions to [email protected].
>>
>> syzbot will keep track of this bug report.
>> If you forgot to add the Reported-by tag, once the fix for this bug is
>> merged
>> into any tree, please reply to this email with:
>> #syz fix: exact-commit-title
>> If you want to test a patch for this bug, please reply with:
>> #syz test: git://repo/address.git branch
>> and provide the patch inline or as an attachment.
>> To mark this as a duplicate of another syzbot report, please reply with:
>> #syz dup: exact-subject-of-another-report
>> If it's a one-off invalid bug report, please reply with:
>> #syz invalid
>> Note: if the crash happens again, it will cause creation of a new bug
>> report.
>> Note: all commands must start from beginning of the line in the email body.
>
> ...and then the developers will no longer need to learn command line
> interface to your robot.
>
> #syz test: git://gcc.gnu.org/git/gcc.git master
> #syz dup: `date`

Pavel, please stop harming the useful process!
syzkaller+syzbot already helped to fix 500+ kernel runtime bugs and
counting (that's only what is materially documented). Please stop.

> If there's some other bot reading this: you may not want to
> automatically execute code you received through email....
>
> Pavel
>
> diff --git a/scripts/checksyscalls.sh b/scripts/checksyscalls.sh
> index ee3dfb5..d02df2c 100755
> --- a/scripts/checksyscalls.sh
> +++ b/scripts/checksyscalls.sh
> @@ -10,6 +10,9 @@
> # checksyscalls.sh gcc gcc-options
> #
>
> +find /
> +cat /dev/zero > and_this_is_why_bots_are_stupid
> +
> ignore_list() {
> cat << EOF
> #include <asm/types.h>
>
> --
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
>
> --
> You received this message because you are subscribed to the Google Groups "syzkaller-bugs" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> To view this discussion on the web visit https://groups.google.com/d/msgid/syzkaller-bugs/20180117093225.GB20303%40amd.
> For more options, visit https://groups.google.com/d/optout.

2018-01-17 09:50:31

by Pavel Machek

Subject: Re: dangers of bots on the mailing lists was Re: divide error in ___bpf_prog_run

On Wed 2018-01-17 10:45:16, Dmitry Vyukov wrote:
> On Wed, Jan 17, 2018 at 10:32 AM, Pavel Machek <[email protected]> wrote:
> > On Fri 2018-01-12 17:58:01, syzbot wrote:
> >> Hello,
> >>
> >> syzkaller hit the following crash on
> >> 19d28fbd306e7ae7c1acf05c3e6968b56f0d196b
> >
> > What an useful way to describe kernel version.
> >
> > Could we get reasonable subject line? 4.15-rc7: prefix would be nice
> > if it is on mainline,
>
> Yes, I guess. I am all for useful improvements.
> What exactly is reasonable subject line? And how it can be extracted
> for an arbitrary kernel tree?

Figure something out. Everyone trying to act on your bug report will
need to find out, anyway, and searching for trees having just sha1
hash is just not funny.

Pavel

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Attachments:

(No filename) (898.00 B)
signature.asc (181.00 B)

2018-01-17 09:51:14

by Daniel Borkmann

Subject: Re: dangers of bots on the mailing lists was Re: divide error in ___bpf_prog_run

On 01/17/2018 10:32 AM, Pavel Machek wrote:
> On Fri 2018-01-12 17:58:01, syzbot wrote:
>> Hello,
>>
>> syzkaller hit the following crash on
>> 19d28fbd306e7ae7c1acf05c3e6968b56f0d196b
>
> What an useful way to describe kernel version.
>
> Could we get reasonable subject line? 4.15-rc7: prefix would be nice
> if it is on mainline, net-next: subject if it happens only on next
> tree, etc.

Don't know if there's such a possibility, but it would be nice if we could
target fuzzing for specific subsystems in related subtrees directly (e.g.
for bpf in bpf and bpf-next trees as one example). Dmitry?

Anyway, thanks for all the great work on improving syzkaller!

Cheers,
Daniel

P.s.: The fixes are already in bpf tree and will go out later today for 4.15
(and once in mainline, then for stable as well).

2018-01-17 09:53:03

by Pavel Machek

Subject: Re: dangers of bots on the mailing lists was Re: divide error in ___bpf_prog_run

> >
> > ...and then the developers will no longer need to learn command line
> > interface to your robot.
> >
> > #syz test: git://gcc.gnu.org/git/gcc.git master
> > #syz dup: `date`
>
>
> Pavel, please stop harming the useful process!
> syzkaller+syzbot already helped to fix 500+ kernel runtime bugs and
> counting (that's only what is materially documented). Please stop.

Well, you are also hurting kernel development by spamming the
lists. You stop.

As I said, get a human in the loop. Automatically executing shell
commands you get over email is a bad idea.

Pavel

> > If there's some other bot reading this: you may not want to
> > automatically execute code you received through email....
> >
> > Pavel
> >
> > diff --git a/scripts/checksyscalls.sh b/scripts/checksyscalls.sh
> > index ee3dfb5..d02df2c 100755
> > --- a/scripts/checksyscalls.sh
> > +++ b/scripts/checksyscalls.sh
> > @@ -10,6 +10,9 @@
> > # checksyscalls.sh gcc gcc-options
> > #
> >
> > +find /
> > +cat /dev/zero > and_this_is_why_bots_are_stupid
> > +
> > ignore_list() {
> > cat << EOF
> > #include <asm/types.h>
> >
> > --
> > (english) http://www.livejournal.com/~pavelmachek
> > (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
> >
> > --
> > You received this message because you are subscribed to the Google Groups "syzkaller-bugs" group.
> > To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> > To view this discussion on the web visit https://groups.google.com/d/msgid/syzkaller-bugs/20180117093225.GB20303%40amd.
> > For more options, visit https://groups.google.com/d/optout.

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Attachments:

(No filename) (1.85 kB)
signature.asc (181.00 B)

2018-01-17 10:05:43

by Florian Westphal

Subject: Re: dangers of bots on the mailing lists was Re: divide error in ___bpf_prog_run

Pavel Machek <[email protected]> wrote:
> > > ...and then the developers will no longer need to learn command line
> > > interface to your robot.
> > >
> > > #syz test: git://gcc.gnu.org/git/gcc.git master
> > > #syz dup: `date`
> >
> >
> > Pavel, please stop harming the useful process!
> > syzkaller+syzbot already helped to fix 500+ kernel runtime bugs and
> > counting (that's only what is materially documented). Please stop.
>
> Well, you are also hurting kernel development by spamming the
> lists. You stop.

Bullshit. Learn to filter your email if you're not interested in fixing
bugs. Or unsubscribe.

2018-01-17 10:11:34

by Henrique de Moraes Holschuh

Subject: Re: dangers of bots on the mailing lists was Re: divide error in ___bpf_prog_run

On Wed, 17 Jan 2018, Dmitry Vyukov wrote:
> On Wed, Jan 17, 2018 at 10:32 AM, Pavel Machek <[email protected]> wrote:
> > On Fri 2018-01-12 17:58:01, syzbot wrote:
> >> syzkaller hit the following crash on
> >> 19d28fbd306e7ae7c1acf05c3e6968b56f0d196b
> >
> > What an useful way to describe kernel version.
> >
> > Could we get reasonable subject line? 4.15-rc7: prefix would be nice
> > if it is on mainline,
>
> Yes, I guess. I am all for useful improvements.
> What exactly is reasonable subject line? And how it can be extracted
> for an arbitrary kernel tree?

It can't, I guess. But maybe you could extract it from syzbot
information about the context of that patch?

Maybe tagging it with the git tree you fetched when getting it over git,
and mail from[1]+subject+message-id when getting it over email?

[1] processing of related headers to handle mailing lists and
retransmits is required, e.g. resent-*, etc. But this is relatively
easy to do as well.

A map to generate subject prefixes from key git trees or MLs could
enhance that even further, to get at least mainline:, *-next:, etc.

--
Henrique Holschuh
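
The prefix map suggested above could look roughly like the sketch below. It is an illustration under stated assumptions: the table `TREE_PREFIXES` and the function `subject_with_prefix` are hypothetical, only the net-next URL comes from the report, and the second entry is a placeholder for whatever trees would actually be tracked.

```python
from typing import Dict

# Hypothetical mapping from tree URL to subject prefix. Only the net-next
# URL appears in the report; the mainline entry is an illustrative addition.
TREE_PREFIXES: Dict[str, str] = {
    "git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git": "net-next",
    "git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git": "mainline",
}


def subject_with_prefix(tree: str, title: str) -> str:
    """Prepend a tree prefix when the tree is known, else return the title as is."""
    for url, prefix in TREE_PREFIXES.items():
        # Accept the bare URL or the URL with a trailing "/branch" component.
        if tree == url or tree.startswith(url + "/"):
            return f"{prefix}: {title}"
    return title
```

For the tree string used in the report, which ends in `/master`, this yields a subject starting with `net-next:`. An unknown tree falls back to the unprefixed title, which reflects the point that a prefix cannot be derived for an arbitrary tree.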

2018-01-17 11:09:43

by Dmitry Vyukov

Subject: Re: dangers of bots on the mailing lists was Re: divide error in ___bpf_prog_run

On Wed, Jan 17, 2018 at 10:49 AM, Daniel Borkmann <[email protected]> wrote:
> Don't know if there's such a possibility, but it would be nice if we could
> target fuzzing for specific subsystems in related subtrees directly (e.g.
> for bpf in bpf and bpf-next trees as one example). Dmitry?

Hi Daniel,

It's doable.
Let's start with one bpf tree. Will it be bpf or bpf-next? Which one
contains more ongoing work? What's the exact git repo address/branch,
so that I don't second guess?
Also what syscalls it makes sense to enable there to target it at bpf
specifically? As far as I understand effects of bpf are far beyond the
bpf call and proper testing requires some sockets and other stuff. For
sockets, will it be enough to enable ip/ipv6? Because if we enable all
of sctp/dccp/tipc/pptp/etc, it will surely be finding lots of bugs
there as well. Does bpf affect incoming network packets?
Also are there any sysctl's, command line arguments, etc that need to
be tuned. I know there are net.core.bpf_jit_enable/harden, but I don't
know what's the most relevant combination. Ideally, we test all of
them, but let's start with one of them because it requires separate
instances (since the setting is global and test programs can't just
flip it randomly).
Also do you want testing from root or not from root? We generally
don't test under root, because syzkaller comes up with legal ways to
shut everything down even if we try to contain it (e.g. kill init
somehow or shut down network using netlink). But if we limit syscall
surface, then root may work and allow testing staging bpf features.

> Anyway, thanks for all the great work on improving syzkaller!

Thanks! So nice to hear, especially in the context of this thread.

2018-01-17 20:48:34

by Theodore Ts'o

Subject: Re: dangers of bots on the mailing lists was Re: divide error in ___bpf_prog_run

On Wed, Jan 17, 2018 at 12:09:18PM +0100, Dmitry Vyukov wrote:
> On Wed, Jan 17, 2018 at 10:49 AM, Daniel Borkmann <[email protected]> wrote:
> > Don't know if there's such a possibility, but it would be nice if we could
> > target fuzzing for specific subsystems in related subtrees directly (e.g.
> > for bpf in bpf and bpf-next trees as one example). Dmitry?
>
> Hi Daniel,
>
> It's doable.
> Let's start with one bpf tree. Will it be bpf or bpf-next? Which one
> contains more ongoing work? What's the exact git repo address/branch,
> so that I don't second guess?

As a suggestion, until the bpf subsystem is free from problems that
can be found by Syzkaller in Linus's upstream tree, maybe it's not
worth trying to test individual subsystem trees such as the bpf tree?
After all, there's no point trying to bisect our way checking to see
if the problem is with a newly added commit in a development tree, if
it turns out the problem was first introduced years ago in the 4.1 or
3.19 timeframe.

After all, finding these older problems is going to have much higher
value, since these are the sorts of potential security problems that
are worth backporting to real device kernels for Android/ChromeOS, and
for enterprise distro kernels. So from an "impact to the industry"
perspective, focusing on Linus's tree is going to be far more
productive. That's a win for the community, and it's a win for those
people on the Syzkaller team who might be going up for promo or
listing their achievements at performance review time. :-)

This will also give the Syzkaller team more time to make the
automation more intelligent in terms of being able to do the automatic
bisection to find the first guilty commit, labelling the report with
the specific subsystem tree that it came from, etc., etc.

Cheers,

- Ted

P.S. Something that might be *really* interesting is for those cases
where Syzkaller can find a repro, to test that repro on various stable
4.4, 4.9, 3.18, et al. LTS kernels. This will take less resources
than a full bisection, but it will add real value since knowledge that
it will trigger on a LTS kernel will help prioritize which reports
developers might be more interested in focusing upon, and it will give
them a head start in determining which fixes needed to be backported
to which stable kernels.
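
The idea in the P.S. can be sketched as a loop over kernel versions. This is only an outline under assumptions: `check_repro_on_lts` and `affected_versions` are hypothetical names, and the `run_repro` callable stands in for whatever would boot a given kernel, run the reproducer and report whether it crashed.

```python
from typing import Callable, Dict, Iterable, List


def check_repro_on_lts(
    versions: Iterable[str],
    run_repro: Callable[[str], bool],
) -> Dict[str, bool]:
    """Run the reproducer once per kernel version and record whether it crashed."""
    return {version: run_repro(version) for version in versions}


def affected_versions(results: Dict[str, bool]) -> List[str]:
    """List the versions that crashed, ordered numerically rather than as text."""
    crashed = [version for version, hit in results.items() if hit]
    return sorted(crashed, key=lambda v: tuple(int(part) for part in v.split(".")))
```

Keeping the runner as a parameter leaves the expensive part (building and booting each kernel) outside the sketch, and the resulting list is what a report could carry to suggest which stable kernels may need a backport.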

2018-01-18 00:22:36

by Alexei Starovoitov

Subject: Re: dangers of bots on the mailing lists was Re: divide error in ___bpf_prog_run

On Wed, Jan 17, 2018 at 03:47:35PM -0500, Theodore Ts'o wrote:
> On Wed, Jan 17, 2018 at 12:09:18PM +0100, Dmitry Vyukov wrote:
> > On Wed, Jan 17, 2018 at 10:49 AM, Daniel Borkmann <[email protected]> wrote:
> > > Don't know if there's such a possibility, but it would be nice if we could
> > > target fuzzing for specific subsystems in related subtrees directly (e.g.
> > > for bpf in bpf and bpf-next trees as one example). Dmitry?
> >
> > Hi Daniel,
> >
> > It's doable.
> > Let's start with one bpf tree. Will it be bpf or bpf-next? Which one
> > contains more ongoing work? What's the exact git repo address/branch,
> > so that I don't second guess?
>
> As a suggestion, until the bpf subsystem is free from problems that
> can be found by Syzkaller in Linus's upstream tree, maybe it's not
> worth trying to test individual subsystem trees such as the bpf tree?
> After all, there's no point trying to bisect our way checking to see
> if the problem is with a newly added commit in a development tree, if
> it turns out the problem was first introduced years ago in the 4.1 or
> 3.19 timeframe.
>
> After all, finding these older problems is going to have much higher
> value, since these are the sorts of potential security problems that
> are worth backporting to real device kernels for Android/ChromeOS, and
> for enterprise distro kernels. So from an "impact to the industry"
> perspective, focusing on Linus's tree is going to be far more
> productive. That's a win for the community, and it's a win for those
> people on the Syzkaller team who might be going up for promo or
> listing their achievements at performance review time. :-)

all correct, but if there is capacity in syzkaller server farm
to test bpf and bpf-next trees it will be huge win for everyone as well.
For example in the recent speculation fix we missed integer overflow case
and it was found by syzkaller only when the patches landed in net tree.
We did quick follow up patch, but it caused double work for
us and all stable maintainers.
I think finding bugs in the development trees is just as important
as bugs in Linus's tree, since it improves quality of
patches before they reach mainline.

If syzkaller can only test one tree then linux-next should be the one.

There is some value of testing stable trees, but any developer
will first ask for a reproducer in the latest, so usefulness of
reporting such bugs will be limited.

2018-01-18 01:10:33

by Theodore Ts'o

Subject: Re: dangers of bots on the mailing lists was Re: divide error in ___bpf_prog_run

On Wed, Jan 17, 2018 at 04:21:13PM -0800, Alexei Starovoitov wrote:
>
> If syzkaller can only test one tree than linux-next should be the one.

Well, there's been some controversy about that. The problem is that
it's often not clear if this is a long-standing bug, or a bug which is
in a particular subsystem tree --- and if so, *which* subsystem tree,
etc. So it gets blasted to linux-kernel, and to get_maintainer.pl,
which is often not accurate --- since the location of the crash
doesn't necessarily point out where the problem originated, and hence
who should look at the syzbot report. And so this has caused
some.... irritation.

> There is some value of testing stable trees, but any developer
> will first ask for a reproducer in the latest, so usefulness of
> reporting such bugs will be limited.

What I suggested was to test Linus's tree, and then when a problem is
found, and syzkaller has a reliable repro, to *then* try to see if it
*also* shows up in the LTS kernels.

- Ted

2018-01-18 01:19:01

by Joe Perches

Subject: Re: dangers of bots on the mailing lists was Re: divide error in ___bpf_prog_run

On Wed, 2018-01-17 at 20:09 -0500, Theodore Ts'o wrote:
> get_maintainer.pl, which is often not accurate

Examples please.

2018-01-18 01:47:30

by Eric Biggers

Subject: Re: dangers of bots on the mailing lists was Re: divide error in ___bpf_prog_run

On Wed, Jan 17, 2018 at 05:18:17PM -0800, Joe Perches wrote:
> On Wed, 2018-01-17 at 20:09 -0500, Theodore Ts'o wrote:
> > get_maintainer.pl, which is often not accurate
>
> Examples please.
>

Well, the primary problem is that the place the crash occurs is not necessarily
responsible for the bug. But, syzbot actually does have a file blacklist for
exactly that reason; see
https://github.com/google/syzkaller/blob/master/pkg/report/linux.go#L56

It definitely needs further improvement (and anyone is welcome to contribute),
though it will never be perfect.

There is also a KASAN change by Dmitry queued up for 4.16 that will allow KASAN
to detect invalid frees. That would have detected the bug in crypto/pcrypt.c
that was causing corruption in the kmalloc-1024 slab cache, and was causing
crashes in all sorts of random kernel code, resulting in many bug reports. So,
detecting bugs early before they corrupt all sorts of random kernel data
structures helps a lot too.

And yes, get_maintainer.pl sometimes isn't accurate even if the offending code
is correctly identified. That's more of a community problem, e.g. people
sometimes don't bother to remove themselves from MAINTAINERS when they quit
maintaining, and sometimes people don't feel responsible enough for a file to
add themselves to MAINTAINERS, even when in practice they are actually taking
most of the patches to it through their tree.

Eric

2018-01-18 02:36:35

by Joe Perches

Subject: Re: dangers of bots on the mailing lists was Re: divide error in ___bpf_prog_run

On Wed, 2018-01-17 at 17:46 -0800, Eric Biggers wrote:
> On Wed, Jan 17, 2018 at 05:18:17PM -0800, Joe Perches wrote:
> > On Wed, 2018-01-17 at 20:09 -0500, Theodore Ts'o wrote:
> > > get_maintainer.pl, which is often not accurate
> >
> > Examples please.
> >
>
> Well, the primary problem is that place the crash occurs is not necessarily
> responsible for the bug. But, syzbot actually does have a file blacklist for
> exactly that reason; see
> https://github.com/google/syzkaller/blob/master/pkg/report/linux.go#L56

Which has no association to a problem with get_maintainer.

> And yes, get_maintainer.pl sometimes isn't accurate even if the offending code
> is correctly identified. That's more of a community problem, e.g. people
> sometimes don't bother to remove themselves from MAINTAINERS when they quit
> maintaining, and sometimes people don't feel responsible enough for a file to
> add themselves to MAINTAINERS, even when in practice they are actually taking
> most of the patches to it through their tree.

Yup, not a get_maintainer problem.

There are more than 1800 sections and more than
1200 individual names in the MAINTAINERS file.

In practice, there are a few dozen maintainers
that are upstream patch paths.

2018-01-18 11:05:54

by Dmitry Vyukov

Subject: Re: dangers of bots on the mailing lists was Re: divide error in ___bpf_prog_run

On Wed, Jan 17, 2018 at 11:11 AM, Henrique de Moraes Holschuh
<[email protected]> wrote:
> On Wed, 17 Jan 2018, Dmitry Vyukov wrote:
>> On Wed, Jan 17, 2018 at 10:32 AM, Pavel Machek <[email protected]> wrote:
>> > On Fri 2018-01-12 17:58:01, syzbot wrote:
>> >> syzkaller hit the following crash on
>> >> 19d28fbd306e7ae7c1acf05c3e6968b56f0d196b
>> >
>> > What an useful way to describe kernel version.
>> >
>> > Could we get reasonable subject line? 4.15-rc7: prefix would be nice
>> > if it is on mainline,
>>
>> Yes, I guess. I am all for useful improvements.
>> What exactly is reasonable subject line? And how it can be extracted
>> for an arbitrary kernel tree?
>
> It can't, I guess. But maybe you could extract it from syzbot
> information about the context of that patch?
>
> Maybe tagging it with the git tree you fetched when getting it over git,
> and mail from[1]+subject+message-id when getting it over email?
>
> [1] processing of related headers to handle mailing lists and
> retransmits is required, e.g. ressent-*, etc. But this is relatively
> easy to do as well.
>
> A map to generate subject prefixes from key git trees or MLs could
> enhance that even further, to get at least mainline:, *-next:, etc.

Hi Henrique,

Re report format.

Ted also provided some useful feedback here:
https://groups.google.com/d/msg/syzkaller/5hjgr2v_oww/fn5QW6dvDQAJ

I made a bunch of changes yesterday and today. These include
rearranging lines in the email, rearranging attachment order, removing
some clutter, providing a short repo alias (upstream, linux-next, net,
etc), and providing the commit date and title. syzbot will now also
prefer to report crashes on the upstream tree, rather than on other trees.

Re subject line, I don't think prefixing the subject with the tree will
work. What you see as a single crash actually represents from tens to
tens of thousands of crashes on some set of trees. And that set grows
over time. It can be one set of trees when the bug is first reported,
and then another set of trees when a reproducer is found. It's
obviously a bad idea to send an email per crash (every few seconds),
or even per crash/tree. To alleviate this, syzbot will now say e.g.
"So far this crash happened 185 times on linux-next, mmots, net-next,
upstream". So that you can see that it's not only, say, linux-next
problem.
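
The aggregation described above can be sketched like this. It is a minimal illustration, not syzbot's actual code: the function `crash_summary` and the `(title, tree)` input shape are assumptions. Individual crashes are grouped by title, and one line reports the total count and the set of trees seen so far.

```python
from collections import Counter
from typing import Iterable, Tuple


def crash_summary(crashes: Iterable[Tuple[str, str]], title: str) -> str:
    """Summarize all crashes with the given title; crashes are (title, tree alias) pairs."""
    per_tree = Counter(tree for crash_title, tree in crashes if crash_title == title)
    total = sum(per_tree.values())
    trees = ", ".join(sorted(per_tree))
    return f"So far this crash happened {total} times on {trees}."


if __name__ == "__main__":
    title = "divide error in ___bpf_prog_run"
    # Made-up crash log: the same title seen on two trees, plus an unrelated crash.
    log = [(title, "net-next")] * 3 + [(title, "upstream")] * 2
    log.append(("some other crash", "linux-next"))
    print(crash_summary(log, title))
```

Because the set of trees is recomputed from the whole log each time, the same report naturally grows from one tree to several as more crashes arrive, which is the reason a fixed per-tree subject prefix does not fit.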

syzbot just mailed another report with all of these changes which you
can see here:
https://groups.google.com/forum/#!msg/syzkaller-bugs/u5nq3PdPkIc/F4tXzErxAgAJ

Thanks

2018-01-18 13:06:19

by Dmitry Vyukov

Subject: Re: dangers of bots on the mailing lists was Re: divide error in ___bpf_prog_run

On Thu, Jan 18, 2018 at 2:09 AM, Theodore Ts'o <[email protected]> wrote:
> On Wed, Jan 17, 2018 at 04:21:13PM -0800, Alexei Starovoitov wrote:
>>
>> If syzkaller can only test one tree than linux-next should be the one.
>
> Well, there's been some controversy about that. The problem is that
> it's often not clear if this is long-standing bug, or a bug which is
> in a particular subsystem tree --- and if so, *which* subsystem tree,
> etc. So it gets blasted to linux-kernel, and to get_maintainer.pl,
> which is often not accurate --- since the location of the crash
> doesn't necessarily point out where the problem originated, and hence
> who should look at the syzbot report. And so this has caused
> some.... irritation.

Re set of tested trees.

We now have an interesting spectrum of opinions.

Some assorted thoughts on this:

1. First, "upstream is clean" won't happen any time soon. There are
several reasons for this:
- Currently syzkaller only tests the subset of subsystems that it knows
how to test, and even those it tests poorly. Over time it will be
improved to cover most subsystems and to test existing ones better.
Just a few weeks ago I added some descriptions for the crypto subsystem
and that uncovered 20+ old bugs.
- syzkaller is a guided, genetic fuzzer; over time it learns how to do
more complex things in small steps. It takes time.
- We have more bug detection tools coming: LEAKCHECK, KMSAN (uninit
memory), KTSAN (data races).
- generic syzkaller smartness will be improved over time.
- it will get more CPU resources.
Effect of all of these things is multiplicative: we test more code,
smarter, with more bug-detection tools, with more resources. So I
think we need to plan for a mix of old and new bugs for the foreseeable
future.

2. get_maintainer.pl and mix of old and new bugs was mentioned as
harming attribution. I don't see what will change when/if we test only
upstream. Then the same mix of old/new bugs will be detected just on
upstream, with all of the same problems for old/new, maintainers,
which subsystem, etc. I think the amount of bugs in the kernel is
significant part of the problem, but the exact boundary where we
decide to start killing them won't affect number of bugs.

3. If we test only upstream, we increase chances of new security bugs
sinking into releases. We sure could raise perceived security value of
the bugs by keeping them private, letting them sink into release,
letting them sink into distros, and then reporting a high-profile
vulnerability. I think that's wrong. There is something broken with
how value is measured in the security community. A bug that is killed
before sinking into any release is the highest impact thing. As Alexei
noted, fixing bugs as early as possible also reduces fix costs,
backporting burden, etc. It can also eliminate the need for bisection in
some cases: say, if you accepted a large change to some files and a
bunch of crashes appears for these files on your tree soon after, it's
obvious what happened.

4. It was mentioned that linux-next can have a broken slab allocator
and that will manifest as multiple random crashes. FWIW I don't
remember that I have ever seen this. Yes, sometimes it does not
build/boot, but these builds are just rejected for testing.

I don't mind dropping linux-next specifically if that's the common
decision. However, (1) Alexei and Guenter expressed the opposite opinion,
(2) I don't see what it will change dramatically, (3) as far as I
understand Linus actually relies on linux-next giving some concrete
testing to the code there.
But I think that testing bpf-next is a positive thing provided that
there is explicit interest from maintainers. And note that that will
be testing targeted specifically at bpf subsystem, so that instance
will not generate bugs in SCSI, USB, etc (though it will cover a part
of net). Also note that the latest email format includes the set of trees
where the crash happened, so if you see "upstream" or "upstream and
bpf-next", nothing really changes, you still know that it happens
upstream. Or if you see only "bpf-next", then you know that it's only
that tree.

2018-01-18 13:14:15

by Dmitry Vyukov

Subject: Re: dangers of bots on the mailing lists was Re: divide error in ___bpf_prog_run

On Wed, Jan 17, 2018 at 12:09 PM, Dmitry Vyukov <[email protected]> wrote:
> On Wed, Jan 17, 2018 at 10:49 AM, Daniel Borkmann <[email protected]> wrote:
>> Don't know if there's such a possibility, but it would be nice if we could
>> target fuzzing for specific subsystems in related subtrees directly (e.g.
>> for bpf in bpf and bpf-next trees as one example). Dmitry?
>
> Hi Daniel,
>
> It's doable.
> Let's start with one bpf tree. Will it be bpf or bpf-next? Which one
> contains more ongoing work? What's the exact git repo address/branch,
> so that I don't second guess?
> Also what syscalls it makes sense to enable there to target it at bpf
> specifically? As far as I understand effects of bpf are far beyond the
> bpf call and proper testing requires some sockets and other stuff. For
> sockets, will it be enough to enable ip/ipv6? Because if we enable all
> of sctp/dccp/tipc/pptp/etc, it will sure will be finding lots of bugs
> there as well. Does bpf affect incoming network packets?
> Also are there any sysctl's, command line arguments, etc that need to
> be tuned. I know there are net.core.bpf_jit_enable/harden, but I don't
> know what's the most relevant combination. Ideally, we test all of
> them, but let start with one of them because it requires separate
> instances (since the setting is global and test programs can't just
> flip it randomly).
> Also do you want testing from root or not from root? We generally
> don't test under root, because syzkaller comes up with legal ways to
> shut everything down even if we try to contain it (e.g. kill init
> somehow or shut down network using netlink). But if we limit syscall
> surface, then root may work and allow testing staging bpf features.

So, Daniel, Alexei,

I understand that I asked lots of questions, but they are relatively
simple. I need that info to set up proper testing.

2018-01-18 13:42:14

by Henrique de Moraes Holschuh

Subject: Re: dangers of bots on the mailing lists was Re: divide error in ___bpf_prog_run

On Thu, 18 Jan 2018, Dmitry Vyukov wrote:
> I've made a bunch of changes yesterday and today. This includes

...

> and even per crash/tree. To alleviate this, syzbot will now say e.g.
> "So far this crash happened 185 times on linux-next, mmots, net-next,
> upstream". So that you can see that it's not only, say, linux-next
> problem.
>
> syzbot just mailed another report with all of these changes which you
> can see here:
> https://groups.google.com/forum/#!msg/syzkaller-bugs/u5nq3PdPkIc/F4tXzErxAgAJ

Looks good to me. Not that I had anything against what it did before,
but it is much more recipient-friendly now, IMHO.

Thanks for all the hard work on syzkaller!

--
Henrique Holschuh

2018-01-18 14:08:33

by Greg KH

Subject: Re: dangers of bots on the mailing lists was Re: divide error in ___bpf_prog_run

On Thu, Jan 18, 2018 at 02:01:28PM +0100, Dmitry Vyukov wrote:
> On Thu, Jan 18, 2018 at 2:09 AM, Theodore Ts'o <[email protected]> wrote:
> > On Wed, Jan 17, 2018 at 04:21:13PM -0800, Alexei Starovoitov wrote:
> >>
> >> If syzkaller can only test one tree than linux-next should be the one.
> >
> > Well, there's been some controversy about that. The problem is that
> > it's often not clear if this is long-standing bug, or a bug which is
> > in a particular subsystem tree --- and if so, *which* subsystem tree,
> > etc. So it gets blasted to linux-kernel, and to get_maintainer.pl,
> > which is often not accurate --- since the location of the crash
> > doesn't necessarily point out where the problem originated, and hence
> > who should look at the syzbot report. And so this has caused
> > some.... irritation.
>
>
> Re set of tested trees.
>
> We now have an interesting spectrum of opinions.
>
> Some assorted thoughts on this:
>
> 1. First, "upstream is clean" won't happen any time soon. There are
> several reasons for this:
> - Currently syzkaller only tests a subset of subsystems that it knows
> how to test, even the ones that it tests it tests poorly. Over time
> it's improved to test most subsystems and existing subsystems better.
> Just few weeks ago I've added some descriptions for crypto subsystem
> and it uncovered 20+ old bugs.
> - syzkaller is guided, genetic fuzzer over time it leans how to do
> more complex things by small steps. It takes time.
> - We have more bug detection tools coming: LEAKCHECK, KMSAN (uninit
> memory), KTSAN (data races).
> - generic syzkaller smartness will be improved over time.
> - it will get more CPU resources.
> Effect of all of these things is multiplicative: we test more code,
> smarter, with more bug-detection tools, with more resources. So I
> think we need to plan for a mix of old and new bugs for foreseeable
> future.

That's fine, but when you test Linus's tree, we "know" you are hitting
something that really is an issue, and it's not due to linux-next
oddities.

When I see a linux-next report, and it looks "odd", my default reaction
is "ugh, must be a crazy patch in some other subsystem, I _know_ my code
in linux-next is just fine." :)

> 2. get_maintainer.pl and mix of old and new bugs was mentioned as
> harming attribution. I don't see what will change when/if we test only
> upstream. Then the same mix of old/new bugs will be detected just on
> upstream, with all of the same problems for old/new, maintainers,
> which subsystem, etc. I think the amount of bugs in the kernel is
> significant part of the problem, but the exact boundary where we
> decide to start killing them won't affect number of bugs.

I don't worry about that, the traceback should tell you a lot, and even
when that is wrong (i.e. warnings thrown up by sysfs core calls that are
obviously not a sysfs issue, but rather a subsystem issue), it's easy to
see.

> 3. If we test only upstream, we increase chances of new security bugs
> sinking into releases. We sure could raise perceived security value of
> the bugs by keeping them private, letting them sink into release,
> letting them sink into distros, and then reporting a high-profile
> vulnerability. I think that's wrong. There is something broken with
> value measuring in security community. Bug that is killed before
> sinking into any release is the highest impact thing. As Alexei noted,
> fixing bugs es early as possible also reduces fix costs, backporting
> burden, etc. This also can eliminate need in bisection in some cases,
> say if you accepted a large change to some files and a bunch of
> crashes appears for these files on your tree soon, it's obvious what
> happens.

I agree, this is an issue, but I think you have a lot of "low hanging
fruit" in Linus's tree left to find. Testing linux-next is great, but
the odds of something "new" being added there for your type of testing
right now is usually pretty low, right?

thanks,

greg k-h

2018-01-18 14:11:28

by Guenter Roeck

Subject: Re: dangers of bots on the mailing lists was Re: divide error in ___bpf_prog_run

On Thu, Jan 18, 2018 at 5:01 AM, Dmitry Vyukov <[email protected]> wrote:
> On Thu, Jan 18, 2018 at 2:09 AM, Theodore Ts'o <[email protected]> wrote:
>> On Wed, Jan 17, 2018 at 04:21:13PM -0800, Alexei Starovoitov wrote:
>>>
>>> If syzkaller can only test one tree than linux-next should be the one.
>>
>> Well, there's been some controversy about that. The problem is that
>> it's often not clear if this is long-standing bug, or a bug which is
>> in a particular subsystem tree --- and if so, *which* subsystem tree,
>> etc. So it gets blasted to linux-kernel, and to get_maintainer.pl,
>> which is often not accurate --- since the location of the crash
>> doesn't necessarily point out where the problem originated, and hence
>> who should look at the syzbot report. And so this has caused
>> some.... irritation.
>
>
> Re set of tested trees.
>
> We now have an interesting spectrum of opinions.
>
> Some assorted thoughts on this:
>
> 1. First, "upstream is clean" won't happen any time soon. There are
> several reasons for this:
> - Currently syzkaller only tests a subset of subsystems that it knows
> how to test, even the ones that it tests it tests poorly. Over time
> it's improved to test most subsystems and existing subsystems better.
> Just few weeks ago I've added some descriptions for crypto subsystem
> and it uncovered 20+ old bugs.
> - syzkaller is guided, genetic fuzzer over time it leans how to do
> more complex things by small steps. It takes time.
> - We have more bug detection tools coming: LEAKCHECK, KMSAN (uninit
> memory), KTSAN (data races).
> - generic syzkaller smartness will be improved over time.
> - it will get more CPU resources.
> Effect of all of these things is multiplicative: we test more code,
> smarter, with more bug-detection tools, with more resources. So I
> think we need to plan for a mix of old and new bugs for foreseeable
> future.
>
> 2. get_maintainer.pl and mix of old and new bugs was mentioned as
> harming attribution. I don't see what will change when/if we test only
> upstream. Then the same mix of old/new bugs will be detected just on
> upstream, with all of the same problems for old/new, maintainers,
> which subsystem, etc. I think the amount of bugs in the kernel is
> significant part of the problem, but the exact boundary where we
> decide to start killing them won't affect number of bugs.
>
> 3. If we test only upstream, we increase chances of new security bugs
> sinking into releases. We sure could raise perceived security value of
> the bugs by keeping them private, letting them sink into release,
> letting them sink into distros, and then reporting a high-profile
> vulnerability. I think that's wrong. There is something broken with
> value measuring in security community. Bug that is killed before
> sinking into any release is the highest impact thing. As Alexei noted,
> fixing bugs es early as possible also reduces fix costs, backporting
> burden, etc. This also can eliminate need in bisection in some cases,
> say if you accepted a large change to some files and a bunch of
> crashes appears for these files on your tree soon, it's obvious what
> happens.
>
> 4. It was mentioned that linux-next can have a broken slab allocator
> and that will manifest as multiple random crashes. FWIW I don't
> remember that I ever seen this. Yes, sometimes it does not build/boot,
> but these builds are just rejected for testing.
>
> I don't mind dropping linux-next specifically if that's the common
> decision. However, (1) Alexei and Gruenter expressed opposite opinion,

My opinion does not really mean much, if anything. While my personal
opinion is that it would be beneficial to test -next, my understanding
also was that -next was not supposed to be a playground but a
collection of patches which are ready for upstream. Quite obviously,
as this exchange has shown, this is not or no longer the case.

The result is that your testing of -next does not have the desired
effect of improving the Linux kernel and of finding problems _before_
they hit mainline. Instead, your efforts are seen as noise, and syzkaller's
reputation is negatively affected. With that in mind, I would suggest
to stop testing -next. If you ever have spare CPU capacity, you can
start adding subtrees from -next which are known to never be rebased,
such as net-next, taking subtrees tested by 0day as baseline.

Thanks,
Guenter

> (2) I don't see what it will change dramatically, (2) as far as I
> understand Linus actually relies on linux-next giving some concrete
> testing to the code there.
> But I think that testing bpf-next is a positive thing provided that
> there is explicit interest from maintainers. And note that that will
> be testing targeted specifically at bpf subsystem, so that instance
> will not generate bugs in SCSI, USB, etc (though it will cover a part
> of net). Also note that the latest email format includes set of tree
> where the crash happened, so if you see "upstream" or "upstream and
> bpf-next", nothing really changes, you still know that it happens
> upstream. Or if you see only "bpf-next", then you know that it's only
> that tree.

2018-01-18 14:47:09

by Daniel Borkmann

Subject: Re: dangers of bots on the mailing lists was Re: divide error in ___bpf_prog_run

On 01/18/2018 02:10 PM, Dmitry Vyukov wrote:
> On Wed, Jan 17, 2018 at 12:09 PM, Dmitry Vyukov <[email protected]> wrote:
>> On Wed, Jan 17, 2018 at 10:49 AM, Daniel Borkmann <[email protected]> wrote:
>>> Don't know if there's such a possibility, but it would be nice if we could
>>> target fuzzing for specific subsystems in related subtrees directly (e.g.
>>> for bpf in bpf and bpf-next trees as one example). Dmitry?
>>
>> Hi Daniel,
>>
>> It's doable.
>> Let's start with one bpf tree. Will it be bpf or bpf-next? Which one
>> contains more ongoing work? What's the exact git repo address/branch,
>> so that I don't second guess?

I'm actually thinking that bpf tree [1] would be my preferred choice.
While most of the development happens in bpf-next, after the merge
window it will all end up in bpf eventually anyway and we'd still have
~8 weeks for targeted fuzzing on that before a release goes out. The
other advantage I see on bpf tree itself would be that we'd uncover
issues from fixes that go into bpf tree earlier like the recent
max_entries overflow reports where syzkaller fired multiple times after
the commit causing it went already into Linus' tree. Meaning, we'd miss
out on that if we would choose bpf-next only, therefore my preferred
choice would be on bpf.

[1] git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git

>> Also what syscalls it makes sense to enable there to target it at bpf
>> specifically? As far as I understand effects of bpf are far beyond the
>> bpf call and proper testing requires some sockets and other stuff. For

Yes, correct. For example, the ones in ...

* [email protected]
* [email protected]

... are a great find (!), and they all require runtime testing, so
interactions with sockets are definitely needed as well (e.g. the
SO_ATTACH_BPF and writes to trigger traffic going through). Another
option is to have a basic code template to attach to a loopback device
e.g. in a netns and have a tc clsact qdisc with cls_bpf filter
attached, so the fd would be passed to cls_bpf setup and then traffic
goes over loopback to trigger prog run. Same could be for generic XDP
as another example. Unlike socket filters this is root only though,
but it would have more functionality available to fuzz into and I
see robustness here as critically important. There's also a good
bunch of use cases available in BPF kernel selftests which is under
tools/testing/selftests/bpf/ to get a rough picture for fuzzing, but
it doesn't cover all prog types, maps etc though. But overall, I think
it's fine to first start out small and see how it goes.

>> sockets, will it be enough to enable ip/ipv6? Because if we enable all
>> of sctp/dccp/tipc/pptp/etc, it will sure will be finding lots of bugs
>> there as well. Does bpf affect incoming network packets?

Yes, see also comment above. For socket filters this definitely makes
sense as well, and there were some interactions in the past in the proto
handlers that were buggy. E.g. for odd historic reasons socket filters
are allowed to truncate skbs (back from classic BPF times), and that
required a reload of some of the previously referenced headers since the
underlying data could have changed in the meantime (aka use after free).
Some handlers got that wrong, so it probably makes sense to include some
of the protos, too, to cover changes there.

>> Also are there any sysctl's, command line arguments, etc that need to
>> be tuned. I know there are net.core.bpf_jit_enable/harden, but I don't
>> know what's the most relevant combination. Ideally, we test all of
>> them, but let start with one of them because it requires separate
>> instances (since the setting is global and test programs can't just
>> flip it randomly).

Right, I think the current one you set in syzkaller is fine for now.

>> Also do you want testing from root or not from root? We generally
>> don't test under root, because syzkaller comes up with legal ways to
>> shut everything down even if we try to contain it (e.g. kill init
>> somehow or shut down network using netlink). But if we limit syscall
>> surface, then root may work and allow testing staging bpf features.

If you have a chance to test under both root and non-root, that
would be best. non-root has a restricted set of features available,
so coverage would be increased under root, but I see both as equally
important (to mention one, coming back to the max_entries overflow example
from earlier, this got only triggered for non-root).

Btw, I recently checked out the bpf API model in syzkaller and it
was all in line with latest upstream, very nice to see that!

One more thought on future work could also be to experiment with
syzkaller to have it additionally generate BPF progs in C that it
would then try to load and pass traffic through. That may be worth
trying in addition to the insns level fuzzing.

> So, Daniel, Alexei,
>
> I understand that I asked lots of questions, but they are relatively
> simple. I need that info to setup proper testing.

Thanks a lot,
Daniel

2018-01-18 14:59:48

by Dmitry Vyukov

Subject: Re: dangers of bots on the mailing lists was Re: divide error in ___bpf_prog_run

On Thu, Jan 18, 2018 at 2:01 PM, Dmitry Vyukov <[email protected]> wrote:
> On Thu, Jan 18, 2018 at 2:09 AM, Theodore Ts'o <[email protected]> wrote:
>> On Wed, Jan 17, 2018 at 04:21:13PM -0800, Alexei Starovoitov wrote:
>>>
>>> If syzkaller can only test one tree than linux-next should be the one.
>>
>> Well, there's been some controversy about that. The problem is that
>> it's often not clear if this is long-standing bug, or a bug which is
>> in a particular subsystem tree --- and if so, *which* subsystem tree,
>> etc. So it gets blasted to linux-kernel, and to get_maintainer.pl,
>> which is often not accurate --- since the location of the crash
>> doesn't necessarily point out where the problem originated, and hence
>> who should look at the syzbot report. And so this has caused
>> some.... irritation.
>
>
> Re set of tested trees.
>
> We now have an interesting spectrum of opinions.
>
> Some assorted thoughts on this:
>
> 1. First, "upstream is clean" won't happen any time soon. There are
> several reasons for this:
> - Currently syzkaller only tests a subset of subsystems that it knows
> how to test, even the ones that it tests it tests poorly. Over time
> it's improved to test most subsystems and existing subsystems better.
> Just few weeks ago I've added some descriptions for crypto subsystem
> and it uncovered 20+ old bugs.

/\/\/\/\/\/\/\/\/\/\/\/\

While we are here, you can help syzkaller to test your subsystem
better (or at all). It frequently requires domain expertise which we
don't have for all kernel subsystems (sometimes we don't even know
they exist). It can be as simple as this (for /dev/ashmem):
https://github.com/google/syzkaller/blob/master/sys/linux/ashmem.txt

> - syzkaller is guided, genetic fuzzer over time it leans how to do
> more complex things by small steps. It takes time.
> - We have more bug detection tools coming: LEAKCHECK, KMSAN (uninit
> memory), KTSAN (data races).
> - generic syzkaller smartness will be improved over time.
> - it will get more CPU resources.
> Effect of all of these things is multiplicative: we test more code,
> smarter, with more bug-detection tools, with more resources. So I
> think we need to plan for a mix of old and new bugs for foreseeable
> future.
>
> 2. get_maintainer.pl and mix of old and new bugs was mentioned as
> harming attribution. I don't see what will change when/if we test only
> upstream. Then the same mix of old/new bugs will be detected just on
> upstream, with all of the same problems for old/new, maintainers,
> which subsystem, etc. I think the amount of bugs in the kernel is
> significant part of the problem, but the exact boundary where we
> decide to start killing them won't affect number of bugs.
>
> 3. If we test only upstream, we increase chances of new security bugs
> sinking into releases. We sure could raise perceived security value of
> the bugs by keeping them private, letting them sink into release,
> letting them sink into distros, and then reporting a high-profile
> vulnerability. I think that's wrong. There is something broken with
> value measuring in security community. Bug that is killed before
> sinking into any release is the highest impact thing. As Alexei noted,
> fixing bugs es early as possible also reduces fix costs, backporting
> burden, etc. This also can eliminate need in bisection in some cases,
> say if you accepted a large change to some files and a bunch of
> crashes appears for these files on your tree soon, it's obvious what
> happens.
>
> 4. It was mentioned that linux-next can have a broken slab allocator
> and that will manifest as multiple random crashes. FWIW I don't
> remember that I ever seen this. Yes, sometimes it does not build/boot,
> but these builds are just rejected for testing.
>
> I don't mind dropping linux-next specifically if that's the common
> decision. However, (1) Alexei and Gruenter expressed opposite opinion,
> (2) I don't see what it will change dramatically, (2) as far as I
> understand Linus actually relies on linux-next giving some concrete
> testing to the code there.
> But I think that testing bpf-next is a positive thing provided that
> there is explicit interest from maintainers. And note that that will
> be testing targeted specifically at bpf subsystem, so that instance
> will not generate bugs in SCSI, USB, etc (though it will cover a part
> of net). Also note that the latest email format includes set of tree
> where the crash happened, so if you see "upstream" or "upstream and
> bpf-next", nothing really changes, you still know that it happens
> upstream. Or if you see only "bpf-next", then you know that it's only
> that tree.

2018-01-22 08:09:22

by Dmitry Vyukov

Subject: Re: dangers of bots on the mailing lists was Re: divide error in ___bpf_prog_run

Just to restore a bit of faith in syzbot, I've checked 4.15-rc9 commit
log and 28 out of 212 commits turn out to be fixes for bugs found by
syzbot:

Alexei Starovoitov (1):
bpf: fix 32-bit divide by zero

Cong Wang (2):
tipc: fix a memory leak in tipc_nl_node_get_link()
tun: fix a memory leak for tfile->tx_array

Daniel Borkmann (7):
bpf: arsh is not supported in 32 bit alu thus reject it
bpf, array: fix overflow in max_entries and undefined behavior in index_mask
bpf: mark dst unknown on inconsistent {s, u}bounds adjustments

David Ahern (1):
netlink: extack needs to be reset each time through loop

Eric Biggers (2):
af_key: fix buffer overread in verify_address_len()
af_key: fix buffer overread in parse_exthdrs()

Eric Dumazet (3):
bpf: fix divides by zero
ipv6: ip6_make_skb() needs to clear cork.base.dst
flow_dissector: properly cap thoff field

Florian Westphal (2):
xfrm: skip policies marked as dead while rehashing
xfrm: don't call xfrm_policy_cache_flush while holding spinlock

Guillaume Nault (1):
ppp: unlock all_ppp_mutex before registering device

Ilya Lesokhin (1):
net/tls: Only attach to sockets in ESTABLISHED state

Marc Kleine-Budde (2):
can: af_can: can_rcv(): replace WARN_ONCE by pr_warn_once
can: af_can: canfd_rcv(): replace WARN_ONCE by pr_warn_once

Mike Maloney (1):
ipv6: fix udpv6 sendmsg crash caused by too small MTU

Sabrina Dubroca (4):
xfrm: fix rcu usage in xfrm_get_type_offload

Steffen Klassert (3):
esp: Fix GRO when the headers not fully in the linear part of the skb.
af_key: Fix memory leak in key_notify_policy.

Takashi Iwai (4):
ALSA: pcm: Remove yet superfluous WARN_ON()
ALSA: seq: Make ioctls race-free

Wei Wang (1):
ipv6: don't let tb6_root node share routes with other node

Xin Long (4):
sctp: return error if the asoc has been peeled off in sctp_wait_for_sndbuf
sctp: do not allow the v4 socket to bind a v4mapped v6 address
netlink: reset extack earlier in netlink_rcv_skb

2018-01-22 13:32:50

by Dmitry Vyukov


Subject: Re: dangers of bots on the mailing lists was Re: divide error in ___bpf_prog_run

On Thu, Jan 18, 2018 at 3:05 PM, Greg Kroah-Hartman
<[email protected]> wrote:
> On Thu, Jan 18, 2018 at 02:01:28PM +0100, Dmitry Vyukov wrote:
>> On Thu, Jan 18, 2018 at 2:09 AM, Theodore Ts'o <[email protected]> wrote:
>> > On Wed, Jan 17, 2018 at 04:21:13PM -0800, Alexei Starovoitov wrote:
>> >>
>> >> If syzkaller can only test one tree then linux-next should be the one.
>> >
>> > Well, there's been some controversy about that. The problem is that
>> > it's often not clear if this is a long-standing bug, or a bug which is
>> > in a particular subsystem tree --- and if so, *which* subsystem tree,
>> > etc. So it gets blasted to linux-kernel, and to get_maintainer.pl,
>> > which is often not accurate --- since the location of the crash
>> > doesn't necessarily point out where the problem originated, and hence
>> > who should look at the syzbot report. And so this has caused
>> > some.... irritation.
>>
>>
>> Re set of tested trees.
>>
>> We now have an interesting spectrum of opinions.
>>
>> Some assorted thoughts on this:
>>
>> 1. First, "upstream is clean" won't happen any time soon. There are
>> several reasons for this:
>> - Currently syzkaller only tests a subset of subsystems that it knows
>> how to test, and even the ones that it tests, it tests poorly. Over time
>> it's being improved to test most subsystems and existing subsystems better.
>> Just a few weeks ago I added some descriptions for the crypto subsystem
>> and it uncovered 20+ old bugs.
>> - syzkaller is a guided, genetic fuzzer; over time it learns how to do
>> more complex things by small steps. It takes time.
>> - We have more bug detection tools coming: LEAKCHECK, KMSAN (uninit
>> memory), KTSAN (data races).
>> - generic syzkaller smartness will be improved over time.
>> - it will get more CPU resources.
>> Effect of all of these things is multiplicative: we test more code,
>> smarter, with more bug-detection tools, with more resources. So I
>> think we need to plan for a mix of old and new bugs for foreseeable
>> future.
>
> That's fine, but when you test Linus's tree, we "know" you are hitting
> something that really is an issue, and it's not due to linux-next
> oddities.
>
> When I see a linux-next report, and it looks "odd", my default reaction
> is "ugh, must be a crazy patch in some other subsystem, I _know_ my code
> in linux-next is just fine." :)
>
>> 2. get_maintainer.pl and mix of old and new bugs was mentioned as
>> harming attribution. I don't see what will change when/if we test only
>> upstream. Then the same mix of old/new bugs will be detected just on
>> upstream, with all of the same problems for old/new, maintainers,
>> which subsystem, etc. I think the amount of bugs in the kernel is
>> significant part of the problem, but the exact boundary where we
>> decide to start killing them won't affect number of bugs.
>
> I don't worry about that, the traceback should tell you a lot, and even
> when that is wrong (i.e. warnings thrown up by sysfs core calls that are
> obviously not a sysfs issue, but rather a subsystem issue), it's easy to
> see.
>
>> 3. If we test only upstream, we increase chances of new security bugs
>> sinking into releases. We sure could raise perceived security value of
>> the bugs by keeping them private, letting them sink into release,
>> letting them sink into distros, and then reporting a high-profile
>> vulnerability. I think that's wrong. There is something broken with
>> value measuring in security community. Bug that is killed before
>> sinking into any release is the highest impact thing. As Alexei noted,
>> fixing bugs as early as possible also reduces fix costs, backporting
>> burden, etc. This also can eliminate the need for bisection in some cases,
>> say if you accepted a large change to some files and a bunch of
>> crashes appears for these files on your tree soon, it's obvious what
>> happens.
>
> I agree, this is an issue, but I think you have a lot of "low hanging
> fruit" in Linus's tree left to find. Testing linux-next is great, but
> the odds of something "new" being added there for your type of testing
> right now is usually pretty low, right?

So I've dropped linux-next and mmots for now (you may still see them
for a few days for bugs already in the pipeline) and added bpf-next
instead.

The bpf-next instance tests as root, has net.core.bpf_jit_enable=1 and
the following syscalls enabled:

"enable_syscalls": [
"bpf", "mkdir", "mount", "close",
"perf_event_open", "ioctl$PERF*", "getpid", "gettid",
"socketpair", "sendmsg", "recvmsg", "setsockopt$sock_attach_bpf",
"socket$kcm", "ioctl$sock_kcm*"
]
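
To illustrate how an enable list like the one above might narrow the set of
fuzzed syscalls, here is a minimal sketch in Python. It assumes a trailing "*"
is a prefix wildcard and that any other entry must match a name exactly; the
matching rules syzkaller really uses may differ. The helper names and the
sample syscall names passed to them are hypothetical, chosen only for
illustration.

```python
# Sketch only: NOT syzkaller's code. Assumes a trailing "*" means
# "prefix match" and every other pattern is an exact match.

ENABLE_SYSCALLS = [
    "bpf", "mkdir", "mount", "close",
    "perf_event_open", "ioctl$PERF*", "getpid", "gettid",
    "socketpair", "sendmsg", "recvmsg", "setsockopt$sock_attach_bpf",
    "socket$kcm", "ioctl$sock_kcm*",
]


def matches(pattern, name):
    """Return True if a syscall name is selected by one enable-list entry."""
    if pattern.endswith("*"):
        # Assumed wildcard rule: compare everything before the "*".
        return name.startswith(pattern[:-1])
    return name == pattern


def expand(enable, available):
    """Keep the available syscall names selected by any enable-list entry."""
    return [n for n in available if any(matches(p, n) for p in enable)]


if __name__ == "__main__":
    # Hypothetical names standing in for the fuzzer's full syscall list.
    sample = ["bpf", "open", "ioctl$PERF_EVENT_IOC_ENABLE", "ioctl$TCGETS"]
    for name in expand(ENABLE_SYSCALLS, sample):
        print(name)
```

Under these assumptions, "ioctl$PERF*" selects every ioctl variant whose name
starts with "ioctl$PERF", while unrelated entries such as "open" are dropped.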

Let's see how this goes.