Hello,
syzbot found the following crash on:
HEAD commit: 5e2204832b20 Merge tag 'powerpc-4.18-2' of git://git.kerne..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=148b5a90400000
kernel config: https://syzkaller.appspot.com/x/.config?x=befbcd7305e41bb0
dashboard link: https://syzkaller.appspot.com/bug?extid=a4eb8c7766952a1ca872
compiler: gcc (GCC) 8.0.1 20180413 (experimental)
syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=10ee22d4400000
IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: [email protected]
RAX: ffffffffffffffda RBX: 0000000001429914 RCX: 0000000000455a99
RDX: 0000000000000048 RSI: 0000000020000240 RDI: 0000000000000005
RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000005
R13: 00000000004bb7d5 R14: 00000000004c8508 R15: 0000000000000023
BUG: unable to handle kernel paging request at ffffffffa0008002
PGD 8e6d067 P4D 8e6d067 PUD 8e6e063 PMD 1b4528067 PTE 1d433d161
Oops: 0003 [#1] SMP KASAN
CPU: 1 PID: 4811 Comm: syz-executor0 Not tainted 4.18.0-rc1+ #114
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
RIP: 0010:bpf_jit_binary_lock_ro include/linux/filter.h:703 [inline]
RIP: 0010:bpf_int_jit_compile+0xc36/0xf30 arch/x86/net/bpf_jit_comp.c:1168
Code: b8 00 00 00 00 00 fc ff df 48 c1 ea 03 0f b6 04 02 4c 89 f2 83 e2 07
38 d0 7f 08 84 c0 0f 85 a0 02 00 00 48 8b 85 00 ff ff ff <80> 60 02 fe e9
c7 fb ff ff e8 ac 00 36 00 48 8b 8d 30 ff ff ff 48
RSP: 0018:ffff8801cfca7998 EFLAGS: 00010246
RAX: ffffffffa0008000 RBX: 0000000000000046 RCX: ffffffff81460e4a
RDX: 0000000000000002 RSI: ffffffff81460e58 RDI: 0000000000000005
RBP: ffff8801cfca7ab8 R08: ffff8801aa2121c0 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffc90001938002
R13: ffff8801cfca7a90 R14: ffffffffa0008002 R15: 00000000fffffff4
FS: 0000000001429940(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffa0008002 CR3: 00000001d2c40000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
bpf_prog_select_runtime+0x7db/0xa60 kernel/bpf/core.c:1505
bpf_prog_load+0x1194/0x1c60 kernel/bpf/syscall.c:1356
__do_sys_bpf kernel/bpf/syscall.c:2360 [inline]
__se_sys_bpf kernel/bpf/syscall.c:2322 [inline]
__x64_sys_bpf+0x36c/0x510 kernel/bpf/syscall.c:2322
do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x455a99
Code: 1d ba fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
ff 0f 83 eb b9 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007ffd396676f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
RAX: ffffffffffffffda RBX: 0000000001429914 RCX: 0000000000455a99
RDX: 0000000000000048 RSI: 0000000020000240 RDI: 0000000000000005
RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000005
R13: 00000000004bb7d5 R14: 00000000004c8508 R15: 0000000000000023
Modules linked in:
Dumping ftrace buffer:
(ftrace buffer empty)
CR2: ffffffffa0008002
---[ end trace fa548fc30dca8c15 ]---
RIP: 0010:bpf_jit_binary_lock_ro include/linux/filter.h:703 [inline]
RIP: 0010:bpf_int_jit_compile+0xc36/0xf30 arch/x86/net/bpf_jit_comp.c:1168
Code: b8 00 00 00 00 00 fc ff df 48 c1 ea 03 0f b6 04 02 4c 89 f2 83 e2 07
38 d0 7f 08 84 c0 0f 85 a0 02 00 00 48 8b 85 00 ff ff ff <80> 60 02 fe e9
c7 fb ff ff e8 ac 00 36 00 48 8b 8d 30 ff ff ff 48
RSP: 0018:ffff8801cfca7998 EFLAGS: 00010246
RAX: ffffffffa0008000 RBX: 0000000000000046 RCX: ffffffff81460e4a
RDX: 0000000000000002 RSI: ffffffff81460e58 RDI: 0000000000000005
RBP: ffff8801cfca7ab8 R08: ffff8801aa2121c0 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffc90001938002
R13: ffff8801cfca7a90 R14: ffffffffa0008002 R15: 00000000fffffff4
FS: 0000000001429940(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffa0008002 CR3: 00000001d2c40000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at [email protected].
syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with
syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches
On Sat, 23 Jun 2018, syzbot wrote:
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: [email protected]
>
> RAX: ffffffffffffffda RBX: 0000000001429914 RCX: 0000000000455a99
> RDX: 0000000000000048 RSI: 0000000020000240 RDI: 0000000000000005
> RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000005
> R13: 00000000004bb7d5 R14: 00000000004c8508 R15: 0000000000000023
> BUG: unable to handle kernel paging request at ffffffffa0008002
> PGD 8e6d067 P4D 8e6d067 PUD 8e6e063 PMD 1b4528067 PTE 1d433d161
> Oops: 0003 [#1] SMP KASAN
> CPU: 1 PID: 4811 Comm: syz-executor0 Not tainted 4.18.0-rc1+ #114
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google
> 01/01/2011
> RIP: 0010:bpf_jit_binary_lock_ro include/linux/filter.h:703 [inline]
> RIP: 0010:bpf_int_jit_compile+0xc36/0xf30 arch/x86/net/bpf_jit_comp.c:1168
static inline void bpf_jit_binary_lock_ro(struct bpf_binary_header *hdr)
{
WARN_ON_ONCE(set_memory_ro((unsigned long)hdr, hdr->pages));
}
Qualitee. set_memory_ro() has legitimate reasons to fail, but sure it does
not most of the time.
So instead of implementing proper error handling, this adds complete bogus
wrappers. Hell, set_memory_*() have stub functions which return 0 for the
CONFIG_ARCH_HAS_SET_MEMORY=n case.
The unlock function is even more hilarious:
static inline void bpf_prog_unlock_ro(struct bpf_prog *fp)
{
if (fp->locked) {
WARN_ON_ONCE(set_memory_rw((unsigned long)fp, fp->pages));
/* In case set_memory_rw() fails, we want to be the first
* to crash here instead of some random place later on.
*/
fp->locked = 0;
}
}
Great approach for a facility, which deals with untrusted user space
stuff. Yeah. I know. The BPF mantra is: "Performance first"
I'm really tempted to make the BPF config switch depend on BROKEN.
Thanks,
tglx
From: Thomas Gleixner <[email protected]>
Date: Sun, 24 Jun 2018 09:09:09 +0200 (CEST)
> I'm really tempted to make the BPF config switch depend on BROKEN.
This really isn't necessary Thomas.
Whoever wrote the code didn't understand that set ro can legitimately
fail.
So let's correct that instead of flaming a feature.
Thank you.
* David Miller <[email protected]> wrote:
> From: Thomas Gleixner <[email protected]>
> Date: Sun, 24 Jun 2018 09:09:09 +0200 (CEST)
>
> > I'm really tempted to make the BPF config switch depend on BROKEN.
>
> This really isn't necessary Thomas.
>
> Whoever wrote the code didn't understand that set ro can legitimately
> fail.
No, that's *NOT* the only thing that happened, according to the Git history.
The first use of set_memory_ro() in include/linux/filter.h was added by
this commit almost four years ago:
# 2014/09
60a3b2253c41 ("net: bpf: make eBPF interpreter images read-only")
... and yes, that commit didn't anticipate the (in hindsight) obvious property of
a function that changes global kernel mappings that if it is used after bootup
without locking it 'may fail'. So that commit slipping through is 'shit happens'
and I don't think we ever complained about such things slipping through.
But what happened after that is not so good:
A bit over two years later a crash was found:
Eric and Willem reported that they recently saw random crashes when
JIT was in use and bisected this to 74451e66d516 ("bpf: make jited
programs visible in traces"). Issue was that the consolidation part
added bpf_jit_binary_unlock_ro() that would unlock previously made
read-only memory back to read-write.
... but instead of fixing it for real, it was only tinkered with:
# 2017//02
9d876e79df6a ("bpf: fix unlocking of jited image when module ronx not set")
... but the problems persisted:
Improve bpf_{prog,jit_binary}_{un,}lock_ro() by throwing a
one-time warning in case of an error when the image couldn't
be set read-only, and also mark struct bpf_prog as locked when
bpf_prog_lock_ro() was called.
... so the warnings Thomas complained about here were then added a month later:
# 2017/03
65869a47f348 ("bpf: improve read-only handling")
It 'improved' nothing of the sort, and the warnings and 'debug code' shows that
the author was aware that these functions could actually fail. To quote the fine
code, introduced a year ago:
WARN_ON_ONCE(set_memory_rw((unsigned long)fp, fp->pages));
/* In case set_memory_rw() fails, we want to be the first
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* to crash here instead of some random place later on.
*/
fp->locked = 0;
... and then, this month, it was tweaked *YET ANOTHER TIME*:
bpf: reject any prog that failed read-only lock
We currently lock any JITed image as read-only via bpf_jit_binary_lock_ro()
as well as the BPF image as read-only through bpf_prog_lock_ro(). In
the case any of these would fail we throw a WARN_ON_ONCE() in order to
yell loudly to the log. Perhaps, to some extend, this may be comparable
to an allocation where __GFP_NOWARN is explicitly not set.
# 2018/06
9facc336876f ("bpf: reject any prog that failed read-only lock")
The tone of uncertainty of the changelog, combined with the unfixed typo in it,
suggests that this commit too was just waved through to upstream without any real
review and without much design thinking behind it.
And yes, this was still not the right fix, as the fuzzer crash reported in this
thread outlines - we'll probably need a 5th commit?
> So let's correct that instead of flaming a feature.
So accusing Thomas of 'flaming a feature' is a really unfair attack in light of
all the details above.
Thanks,
Ingo
On 06/24/2018 12:02 PM, Ingo Molnar wrote:
> * David Miller <[email protected]> wrote:
>> From: Thomas Gleixner <[email protected]>
>> Date: Sun, 24 Jun 2018 09:09:09 +0200 (CEST)
>>
>>> I'm really tempted to make the BPF config switch depend on BROKEN.
>>
>> This really isn't necessary Thomas.
>>
>> Whoever wrote the code didn't understand that set ro can legitimately
>> fail.
>
> No, that's *NOT* the only thing that happened, according to the Git history.
>
> The first use of set_memory_ro() in include/linux/filter.h was added by
> this commit almost four years ago:
>
> # 2014/09
> 60a3b2253c41 ("net: bpf: make eBPF interpreter images read-only")
>
> ... and yes, that commit didn't anticipate the (in hindsight) obvious property of
> a function that changes global kernel mappings that if it is used after bootup
> without locking it 'may fail'. So that commit slipping through is 'shit happens'
> and I don't think we ever complained about such things slipping through.
Hmm, back then I adapted the code similar from 314beb9bcabf ("x86: bpf_jit_comp:
secure bpf jit against spraying attacks") for interpreter images as well, and
from grepping through the kernel code none of the callers of set_memory_{ro,rw}()
at that time (& now except bpf) did check for the return code (e.g. module_enable_ro()
and module_disable_ro() as one example which could happen late after bootup has
finished when pulling in modules on the fly).
I did made the mistake in 9facc336876f ("bpf: reject any prog that failed read-only
lock") assuming that after the set_memory_ro() call it would either succeed or
it would not, but not leaving us in a state in the middle. That was silly assumption
and I'll fix this up in bpf, very sorry about that! I've been debugging the syzkaller
BUG at [1] and noticed that even though set_memory_ro() failed with an error, doing
a probe_kernel_write() on it afterwards failed with EFAULT, meaning the module_alloc()
memory was however set to read-only at that point triggering later the BUG when
attempting to change its memory (at least on the virtual mem). From debugging output,
it was a single 4k page and on x86_64 in the __change_page_attr_set_clr() we failed
in the cpa_process_alias() where the syzkaller fault injection happened. So latter
failure from cpa_process_alias() came from call to __change_page_attr_set_clr() with
primary to 0, where it tried to split a large page in __change_page_attr() but failed
in alloc_pages() thus returning the -ENOMEM from there. Testing subsequent undoing
via set_memory_rw() made it writable again, though.
In any case, for pairs like set_memory_ro() + set_memory_rw() that are also used
outside of bpf e.g. STRICT_MODULE_RWX and friends which are mostly default these
days for some archs, is the choice to not check errors from there by design or from
historical context that it originated from 'debugging code' in that sense (DEBUG_RODATA /
DEBUG_SET_MODULE_RONX) earlier? Also if no-one checks for errors (and if that would
infact be the recommendation it is agreed upon) should the API be changed to void,
or generally should actual error checking occur on these + potential rollback; but
then question is what about restoring part from prior set_memory_ro() via set_memory_rw()?
Kees/others, do you happen to have some more context on recommended use around this
by any chance? (Would probably also help if we add some doc around assumptions into
include/linux/set_memory.h for future users.)
Thanks a lot,
Daniel
[1] https://syzkaller.appspot.com/bug?extid=a4eb8c7766952a1ca872
On Tue, Jun 26, 2018 at 3:53 PM, Daniel Borkmann <[email protected]> wrote:
> In any case, for pairs like set_memory_ro() + set_memory_rw() that are also used
> outside of bpf e.g. STRICT_MODULE_RWX and friends which are mostly default these
> days for some archs, is the choice to not check errors from there by design or from
> historical context that it originated from 'debugging code' in that sense (DEBUG_RODATA /
> DEBUG_SET_MODULE_RONX) earlier? Also if no-one checks for errors (and if that would
> infact be the recommendation it is agreed upon) should the API be changed to void,
> or generally should actual error checking occur on these + potential rollback; but
> then question is what about restoring part from prior set_memory_ro() via set_memory_rw()?
> Kees/others, do you happen to have some more context on recommended use around this
> by any chance? (Would probably also help if we add some doc around assumptions into
> include/linux/set_memory.h for future users.)
If set_memory_* can fail, I think it needs to be __must_check, and all
the callers need to deal with it gracefully. Those markings aren't
"advisory": they're expected to actually do what they say.
-Kees
--
Kees Cook
Pixel Security
* Kees Cook <[email protected]> wrote:
> On Tue, Jun 26, 2018 at 3:53 PM, Daniel Borkmann <[email protected]> wrote:
> > In any case, for pairs like set_memory_ro() + set_memory_rw() that are also used
> > outside of bpf e.g. STRICT_MODULE_RWX and friends which are mostly default these
> > days for some archs, is the choice to not check errors from there by design or from
> > historical context that it originated from 'debugging code' in that sense (DEBUG_RODATA /
> > DEBUG_SET_MODULE_RONX) earlier? Also if no-one checks for errors (and if that would
> > infact be the recommendation it is agreed upon) should the API be changed to void,
> > or generally should actual error checking occur on these + potential rollback; but
> > then question is what about restoring part from prior set_memory_ro() via set_memory_rw()?
> > Kees/others, do you happen to have some more context on recommended use around this
> > by any chance? (Would probably also help if we add some doc around assumptions into
> > include/linux/set_memory.h for future users.)
>
> If set_memory_* can fail, I think it needs to be __must_check, and all
> the callers need to deal with it gracefully. Those markings aren't
> "advisory": they're expected to actually do what they say.
Yes - but there's probably a few exceptions like early init code where the calls
not succeeding are signs of bugs - so any error return should probably be
WARN_ON()ed about.
Thanks,
Ingo