2021-05-31 07:16:49

by syzbot

[permalink] [raw]
Subject: [syzbot] KASAN: stack-out-of-bounds Read in profile_pc

Hello,

syzbot found the following issue on:

HEAD commit: 7ac3a1c1 Merge tag 'mtd/fixes-for-5.13-rc4' of git://git.k..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1246d43dd00000
kernel config: https://syzkaller.appspot.com/x/.config?x=f9f3fc7daa178986
dashboard link: https://syzkaller.appspot.com/bug?extid=84fe685c02cd112a2ac3
compiler: Debian clang version 11.0.1-2

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: [email protected]

==================================================================
BUG: KASAN: stack-out-of-bounds in profile_pc+0xa4/0xe0 arch/x86/kernel/time.c:42
Read of size 8 at addr ffffc90001c0f7a0 by task systemd-udevd/12323

CPU: 1 PID: 12323 Comm: systemd-udevd Not tainted 5.13.0-rc3-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x202/0x31e lib/dump_stack.c:120
print_address_description+0x5f/0x3b0 mm/kasan/report.c:233
__kasan_report mm/kasan/report.c:419 [inline]
kasan_report+0x15c/0x200 mm/kasan/report.c:436
profile_pc+0xa4/0xe0 arch/x86/kernel/time.c:42
profile_tick+0xcd/0x120 kernel/profile.c:408
tick_sched_handle kernel/time/tick-sched.c:227 [inline]
tick_sched_timer+0x287/0x420 kernel/time/tick-sched.c:1373
__run_hrtimer kernel/time/hrtimer.c:1537 [inline]
__hrtimer_run_queues+0x4cb/0xa60 kernel/time/hrtimer.c:1601
hrtimer_interrupt+0x3b3/0x1040 kernel/time/hrtimer.c:1663
local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1089 [inline]
__sysvec_apic_timer_interrupt+0xf9/0x270 arch/x86/kernel/apic/apic.c:1106
sysvec_apic_timer_interrupt+0x8c/0xb0 arch/x86/kernel/apic/apic.c:1100
</IRQ>
asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:647
RIP: 0010:__raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:161 [inline]
RIP: 0010:_raw_spin_unlock_irqrestore+0xbc/0x120 kernel/locking/spinlock.c:191
Code: f0 48 c1 e8 03 42 80 3c 20 00 74 08 4c 89 f7 e8 5a 57 04 f8 f6 44 24 21 02 75 4e 41 f7 c7 00 02 00 00 74 01 fb bf 01 00 00 00 <e8> 0f 3f 94 f7 65 8b 05 50 54 3f 76 85 c0 74 3f 48 c7 04 24 0e 36
RSP: 0018:ffffc90001c0f7a0 EFLAGS: 00000206
RAX: 1ffff92000381ef8 RBX: ffff88802e529540 RCX: ffffffff90e84703
RDX: dffffc0000000000 RSI: 0000000000000001 RDI: 0000000000000001
RBP: ffffc90001c0f830 R08: ffffffff81855cb0 R09: ffffed1005ca52a9
R10: ffffed1005ca52a9 R11: 0000000000000000 R12: dffffc0000000000
R13: 1ffff92000381ef4 R14: ffffc90001c0f7c0 R15: 0000000000000a02
spin_unlock_irqrestore include/linux/spinlock.h:409 [inline]
__wake_up_common_lock kernel/sched/wait.c:140 [inline]
__wake_up_sync_key+0x134/0x1f0 kernel/sched/wait.c:205
sock_def_readable+0x141/0x210 net/core/sock.c:2910
unix_dgram_sendmsg+0x1a6e/0x2aa0 net/unix/af_unix.c:1800
sock_sendmsg_nosec net/socket.c:654 [inline]
sock_sendmsg net/socket.c:674 [inline]
sock_write_iter+0x398/0x520 net/socket.c:1001
call_write_iter include/linux/fs.h:2114 [inline]
new_sync_write fs/read_write.c:518 [inline]
vfs_write+0xa39/0xc90 fs/read_write.c:605
ksys_write+0x171/0x2a0 fs/read_write.c:658
do_syscall_64+0x3f/0xb0 arch/x86/entry/common.c:47
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f8354df21b0
Code: 2e 0f 1f 84 00 00 00 00 00 90 48 8b 05 19 7e 20 00 c3 0f 1f 84 00 00 00 00 00 83 3d 19 c2 20 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ae fc ff ff 48 89 04 24
RSP: 002b:00007fff1f53e558 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 000055f37f7b13c0 RCX: 00007f8354df21b0
RDX: 0000000000000000 RSI: 00007fff1f53e610 RDI: 0000000000000008
RBP: 00007fff1f53e6d0 R08: 000055f37f7a0a04 R09: 0000000000000000
R10: 0000000000000004 R11: 0000000000000246 R12: 00007fff1f53e620
R13: 000055f37f78c880 R14: 0000000000000003 R15: 000000000000000e


addr ffffc90001c0f7a0 is located in stack of task systemd-udevd/12323 at offset 0 in frame:
_raw_spin_unlock_irqrestore+0x0/0x120 kernel/locking/spinlock.c:184

this frame has 1 object:
[32, 40) 'flags.i.i.i.i'

Memory state around the buggy address:
ffffc90001c0f680: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffffc90001c0f700: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffffc90001c0f780: 00 00 00 00 f1 f1 f1 f1 00 f3 f3 f3 00 00 00 00
^
ffffc90001c0f800: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
ffffc90001c0f880: 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3 00 00 00 00
==================================================================


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at [email protected].

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.


2021-06-02 23:02:57

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [syzbot] KASAN: stack-out-of-bounds Read in profile_pc

On Mon, May 31, 2021 at 12:15:23AM -0700, syzbot wrote:
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: 7ac3a1c1 Merge tag 'mtd/fixes-for-5.13-rc4' of git://git.k..
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=1246d43dd00000
> kernel config: https://syzkaller.appspot.com/x/.config?x=f9f3fc7daa178986
> dashboard link: https://syzkaller.appspot.com/bug?extid=84fe685c02cd112a2ac3
> compiler: Debian clang version 11.0.1-2
>
> Unfortunately, I don't have any reproducer for this issue yet.
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: [email protected]
>
> ==================================================================
> BUG: KASAN: stack-out-of-bounds in profile_pc+0xa4/0xe0 arch/x86/kernel/time.c:42
> Read of size 8 at addr ffffc90001c0f7a0 by task systemd-udevd/12323

This looks like a valid bug in profile_pc(). With !FRAME_POINTER, it
has an ancient (2006) hack for unwinding a single frame, for when
regs->ip is in a lock function.

I guess the point is to put lock functions' callees in the profile,
rather than the lock functions themselves.

profile_pc() assumes the return address is either directly at regs->sp,
or one word adjacent to it due to saved flags, both of which are just
completely wrong. This code has probably never worked with ORC, and
nobody noticed apparently.

We could just use ORC to unwind to the next frame. Though, isn't
/proc/profile redundant, compared to all the more sophisticated options
nowadays? Is there still a distinct use case for it or can we just
remove it?

--
Josh

2021-06-02 23:39:19

by Andi Kleen

[permalink] [raw]
Subject: Re: [syzbot] KASAN: stack-out-of-bounds Read in profile_pc


> profile_pc() assumes the return address is either directly at regs->sp,
> or one word adjacent to it due to saved flags, both of which are just
> completely wrong. This code has probably never worked with ORC, and
> nobody noticed apparently.

I presume it used to work because the lock functions were really simple,
but that's not true anymore.

>
> We could just use ORC to unwind to the next frame. Though, isn't
> /proc/profile redundant, compared to all the more sophisticated options
> nowadays? Is there still a distinct use case for it or can we just
> remove it?

It's still needed for some special cases. For example there is no other
viable way to profile early boot without a VM

I would just drop the hack to unwind, at least for the early boot
profile use case locking profiling is usually not needed.

-Andi

2021-06-03 08:04:43

by syzbot

[permalink] [raw]
Subject: Re: [syzbot] KASAN: stack-out-of-bounds Read in profile_pc

syzbot has found a reproducer for the following issue on:

HEAD commit: 324c92e5 Merge tag 'efi-urgent-2021-06-02' of git://git.ke..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1653122fd00000
kernel config: https://syzkaller.appspot.com/x/.config?x=ad5040c83f09b8e4
dashboard link: https://syzkaller.appspot.com/bug?extid=84fe685c02cd112a2ac3
compiler: Debian clang version 11.0.1-2
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=171e0683d00000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1723cc47d00000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: [email protected]

==================================================================
BUG: KASAN: stack-out-of-bounds in profile_pc+0xa4/0xe0 arch/x86/kernel/time.c:42
Read of size 8 at addr ffffc9000163f620 by task syz-executor815/8426

CPU: 1 PID: 8426 Comm: syz-executor815 Not tainted 5.13.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x202/0x31e lib/dump_stack.c:120
print_address_description+0x5f/0x3b0 mm/kasan/report.c:233
__kasan_report mm/kasan/report.c:419 [inline]
kasan_report+0x15c/0x200 mm/kasan/report.c:436
profile_pc+0xa4/0xe0 arch/x86/kernel/time.c:42
profile_tick+0xcd/0x120 kernel/profile.c:408
tick_sched_handle kernel/time/tick-sched.c:227 [inline]
tick_sched_timer+0x287/0x420 kernel/time/tick-sched.c:1373
__run_hrtimer kernel/time/hrtimer.c:1537 [inline]
__hrtimer_run_queues+0x4cb/0xa60 kernel/time/hrtimer.c:1601
hrtimer_interrupt+0x3b3/0x1040 kernel/time/hrtimer.c:1663
local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1089 [inline]
__sysvec_apic_timer_interrupt+0xf9/0x270 arch/x86/kernel/apic/apic.c:1106
sysvec_apic_timer_interrupt+0x8c/0xb0 arch/x86/kernel/apic/apic.c:1100
</IRQ>
asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:647
RIP: 0010:__raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:161 [inline]
RIP: 0010:_raw_spin_unlock_irqrestore+0xbc/0x120 kernel/locking/spinlock.c:191
Code: f0 48 c1 e8 03 42 80 3c 20 00 74 08 4c 89 f7 e8 ea e7 03 f8 f6 44 24 21 02 75 4e 41 f7 c7 00 02 00 00 74 01 fb bf 01 00 00 00 <e8> 1f b3 93 f7 65 8b 05 50 c4 3e 76 85 c0 74 3f 48 c7 04 24 0e 36
RSP: 0018:ffffc9000163f620 EFLAGS: 00000206
RAX: 1ffff920002c7ec8 RBX: ffffffff9117f258 RCX: ffffffff90e85703
RDX: dffffc0000000000 RSI: 0000000000000001 RDI: 0000000000000001
RBP: ffffc9000163f6b8 R08: ffffffff818560c0 R09: fffffbfff222fe4c
R10: fffffbfff222fe4c R11: 0000000000000000 R12: dffffc0000000000
R13: 1ffff920002c7ec4 R14: ffffc9000163f640 R15: 0000000000000a02
__debug_check_no_obj_freed lib/debugobjects.c:997 [inline]
debug_check_no_obj_freed+0x5a2/0x650 lib/debugobjects.c:1018
free_pages_prepare mm/page_alloc.c:1303 [inline]
__free_pages_ok+0x2f5/0x1180 mm/page_alloc.c:1572
destroy_compound_page include/linux/mm.h:939 [inline]
__put_compound_page mm/swap.c:111 [inline]
release_pages+0x600/0x1b80 mm/swap.c:948
tlb_batch_pages_flush mm/mmu_gather.c:49 [inline]
tlb_flush_mmu_free mm/mmu_gather.c:242 [inline]
tlb_flush_mmu+0x780/0x910 mm/mmu_gather.c:249
tlb_finish_mmu+0xcb/0x200 mm/mmu_gather.c:340
exit_mmap+0x2c6/0x5f0 mm/mmap.c:3210
__mmput+0x111/0x370 kernel/fork.c:1096
exit_mm+0x67e/0x7d0 kernel/exit.c:502
do_exit+0x6b9/0x23d0 kernel/exit.c:813
do_group_exit+0x168/0x2d0 kernel/exit.c:923
__do_sys_exit_group+0x13/0x20 kernel/exit.c:934
__se_sys_exit_group+0x10/0x10 kernel/exit.c:932
__x64_sys_exit_group+0x37/0x40 kernel/exit.c:932
do_syscall_64+0x3f/0xb0 arch/x86/entry/common.c:47
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x446bc9
Code: Unable to access opcode bytes at RIP 0x446b9f.
RSP: 002b:00007ffdae409208 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 00000000004b8390 RCX: 0000000000446bc9
RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000001
RBP: 0000000000000001 R08: ffffffffffffffc4 R09: 0000000000000004
R10: 00000000004004a0 R11: 0000000000000246 R12: 00000000004b8390
R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000001


addr ffffc9000163f620 is located in stack of task syz-executor815/8426 at offset 0 in frame:
_raw_spin_unlock_irqrestore+0x0/0x120 kernel/locking/spinlock.c:184

this frame has 1 object:
[32, 40) 'flags.i.i.i.i'

Memory state around the buggy address:
ffffc9000163f500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffffc9000163f580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffffc9000163f600: 00 00 00 00 f1 f1 f1 f1 00 f3 f3 f3 00 00 00 00
^
ffffc9000163f680: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffffc9000163f700: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================

2021-06-03 13:31:53

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [syzbot] KASAN: stack-out-of-bounds Read in profile_pc

On Wed, Jun 02, 2021 at 04:35:11PM -0700, Andi Kleen wrote:
>
> > profile_pc() assumes the return address is either directly at regs->sp,
> > or one word adjacent to it due to saved flags, both of which are just
> > completely wrong. This code has probably never worked with ORC, and
> > nobody noticed apparently.
>
> I presume it used to work because the lock functions were really simple, but
> that's not true anymore.

Yeah, I figured as much.

> > We could just use ORC to unwind to the next frame. Though, isn't
> > /proc/profile redundant, compared to all the more sophisticated options
> > nowadays? Is there still a distinct use case for it or can we just
> > remove it?
>
> It's still needed for some special cases. For example there is no other
> viable way to profile early boot without a VM
>
> I would just drop the hack to unwind, at least for the early boot profile
> use case locking profiling is usually not needed.

Ok, I'll just get rid of the hack then.

--
Josh

2021-06-03 13:32:14

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [syzbot] KASAN: stack-out-of-bounds Read in profile_pc

On Wed, Jun 02, 2021 at 04:35:11PM -0700, Andi Kleen wrote:

> > We could just use ORC to unwind to the next frame. Though, isn't
> > /proc/profile redundant, compared to all the more sophisticated options
> > nowadays? Is there still a distinct use case for it or can we just
> > remove it?
>
> It's still needed for some special cases. For example there is no other
> viable way to profile early boot without a VM
>
> I would just drop the hack to unwind, at least for the early boot profile
> use case locking profiling is usually not needed.

Surely we can cook up something else there and delete this thing? ftrace
buffers are available really early, it shouldn't be hard to dump some
data in there during boot.

2021-06-03 13:40:34

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [syzbot] KASAN: stack-out-of-bounds Read in profile_pc

On Thu, Jun 03, 2021 at 03:30:10PM +0200, Peter Zijlstra wrote:
> On Wed, Jun 02, 2021 at 04:35:11PM -0700, Andi Kleen wrote:
>
> > > We could just use ORC to unwind to the next frame. Though, isn't
> > > /proc/profile redundant, compared to all the more sophisticated options
> > > nowadays? Is there still a distinct use case for it or can we just
> > > remove it?
> >
> > It's still needed for some special cases. For example there is no other
> > viable way to profile early boot without a VM
> >
> > I would just drop the hack to unwind, at least for the early boot profile
> > use case locking profiling is usually not needed.
>
> Surely we can cook up something else there and delete this thing? ftrace
> buffers are available really early, it shouldn't be hard to dump some
> data in there during boot.

True, ftrace does have function profiling (function_profile_enabled).

Steve, is there a way to enable that on the kernel cmdline?

--
Josh

2021-06-03 13:55:51

by Andi Kleen

[permalink] [raw]
Subject: Re: [syzbot] KASAN: stack-out-of-bounds Read in profile_pc


> True, ftrace does have function profiling (function_profile_enabled).
>
> Steve, is there a way to enable that on the kernel cmdline?

That's not really comparable. function profiling has a lot more
overhead. Also there is various code which has ftrace instrumentation
disabled.

I don't think why you want to kill the old profiler. It's rarely used,
but when you need it usually works. It's always good to have simple fall
backs. And it's not that it's a lot of difficult code.

-Andi

2021-10-11 16:27:03

by Lee Jones

[permalink] [raw]
Subject: Re: [syzbot] KASAN: stack-out-of-bounds Read in profile_pc

On Thu, 03 Jun 2021, Andi Kleen wrote:

>
> > True, ftrace does have function profiling (function_profile_enabled).
> >
> > Steve, is there a way to enable that on the kernel cmdline?
>
> That's not really comparable. function profiling has a lot more overhead.
> Also there is various code which has ftrace instrumentation disabled.
>
> I don't think why you want to kill the old profiler. It's rarely used, but
> when you need it usually works. It's always good to have simple fall backs.
> And it's not that it's a lot of difficult code.

sysbot is still sending out reports on this:

https://syzkaller.appspot.com/bug?id=00c965d957410afc0d40cac5343064e0a98b9ecd

Are you guys still planning on sending out a fix?

Is there anything I can do to help?

--
Lee Jones [李琼斯]
Senior Technical Lead - Developer Services
Linaro.org │ Open source software for Arm SoCs
Follow Linaro: Facebook | Twitter | Blog

2021-10-11 16:40:56

by Steven Rostedt

[permalink] [raw]
Subject: Re: [syzbot] KASAN: stack-out-of-bounds Read in profile_pc

On Mon, 11 Oct 2021 14:07:15 +0100
Lee Jones <[email protected]> wrote:

> On Thu, 03 Jun 2021, Andi Kleen wrote:
>
> >
> > > True, ftrace does have function profiling (function_profile_enabled).
> > >
> > > Steve, is there a way to enable that on the kernel cmdline?
> >
> > That's not really comparable. function profiling has a lot more overhead.
> > Also there is various code which has ftrace instrumentation disabled.
> >
> > I don't think why you want to kill the old profiler. It's rarely used, but
> > when you need it usually works. It's always good to have simple fall backs.
> > And it's not that it's a lot of difficult code.
>
> sysbot is still sending out reports on this:
>
> https://syzkaller.appspot.com/bug?id=00c965d957410afc0d40cac5343064e0a98b9ecd
>
> Are you guys still planning on sending out a fix?
>
> Is there anything I can do to help?
>

According to the above:

==================================================================
BUG: KASAN: stack-out-of-bounds in profile_pc+0xa4/0xe0 arch/x86/kernel/time.c:42
Read of size 8 at addr ffffc90001c0f7a0 by task systemd-udevd/12323

CPU: 1 PID: 12323 Comm: systemd-udevd Not tainted 5.13.0-rc3-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x202/0x31e lib/dump_stack.c:120
print_address_description+0x5f/0x3b0 mm/kasan/report.c:233
__kasan_report mm/kasan/report.c:419 [inline]
kasan_report+0x15c/0x200 mm/kasan/report.c:436
profile_pc+0xa4/0xe0 arch/x86/kernel/time.c:42
profile_tick+0xcd/0x120 kernel/profile.c:408
tick_sched_handle kernel/time/tick-sched.c:227 [inline]
tick_sched_timer+0x287/0x420 kernel/time/tick-sched.c:1373
__run_hrtimer kernel/time/hrtimer.c:1537 [inline]
__hrtimer_run_queues+0x4cb/0xa60 kernel/time/hrtimer.c:1601
hrtimer_interrupt+0x3b3/0x1040 kernel/time/hrtimer.c:1663
local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1089 [inline]
__sysvec_apic_timer_interrupt+0xf9/0x270 arch/x86/kernel/apic/apic.c:1106
sysvec_apic_timer_interrupt+0x8c/0xb0 arch/x86/kernel/apic/apic.c:1100

And the code has:

profile_pc+0xa4/0xe0 arch/x86/kernel/time.c:42

unsigned long profile_pc(struct pt_regs *regs)
{
unsigned long pc = instruction_pointer(regs);

if (!user_mode(regs) && in_lock_functions(pc)) {
#ifdef CONFIG_FRAME_POINTER
return *(unsigned long *)(regs->bp + sizeof(long));
#else
unsigned long *sp = (unsigned long *)regs->sp;
/*
* Return address is either directly at stack pointer
* or above a saved flags. Eflags has bits 22-31 zero,
* kernel addresses don't.
*/
if (sp[0] >> 22)
return sp[0]; <== line 42
if (sp[1] >> 22)
return sp[1];
#endif
}
return pc;
}
EXPORT_SYMBOL(profile_pc);


It looks to me that the profiler is doing a trick to read the contents of
the stack when the interrupt went off, but this triggers the KASAN
instrumentation to think it's a mistake when it's not. aka "false positive".

How does one tell KASAN that it wants to go outside the stack, because it
knows what its doing?

Should that just be converted to a "copy_from_kernel_nofault()"? That is,
does this fix it? (not even compiled tested)

-- Steve


diff --git a/arch/x86/kernel/time.c b/arch/x86/kernel/time.c
index e42faa792c07..cc6ec29aa14d 100644
--- a/arch/x86/kernel/time.c
+++ b/arch/x86/kernel/time.c
@@ -34,15 +34,18 @@ unsigned long profile_pc(struct pt_regs *regs)
return *(unsigned long *)(regs->bp + sizeof(long));
#else
unsigned long *sp = (unsigned long *)regs->sp;
+ unsigned long retaddr;
/*
* Return address is either directly at stack pointer
* or above a saved flags. Eflags has bits 22-31 zero,
* kernel addresses don't.
*/
- if (sp[0] >> 22)
- return sp[0];
- if (sp[1] >> 22)
- return sp[1];
+ if (!copy_from_kernel_nofault(&retaddr, sp, sizeof(long)) &&
+ retaddr >> 22)
+ return retaddr;
+ if (!copy_from_kernel_nofault(&retaddr, sp + 1, sizeof(long)) &&
+ retaddr >> 22)
+ return retaddr;
#endif
}
return pc;

2021-10-11 17:15:26

by Dmitry Vyukov

[permalink] [raw]
Subject: Re: [syzbot] KASAN: stack-out-of-bounds Read in profile_pc

On Mon, 11 Oct 2021 at 16:43, Steven Rostedt <[email protected]> wrote:
>
> On Mon, 11 Oct 2021 14:07:15 +0100
> Lee Jones <[email protected]> wrote:
>
> > On Thu, 03 Jun 2021, Andi Kleen wrote:
> >
> > >
> > > > True, ftrace does have function profiling (function_profile_enabled).
> > > >
> > > > Steve, is there a way to enable that on the kernel cmdline?
> > >
> > > That's not really comparable. function profiling has a lot more overhead.
> > > Also there is various code which has ftrace instrumentation disabled.
> > >
> > > I don't think why you want to kill the old profiler. It's rarely used, but
> > > when you need it usually works. It's always good to have simple fall backs.
> > > And it's not that it's a lot of difficult code.
> >
> > sysbot is still sending out reports on this:
> >
> > https://syzkaller.appspot.com/bug?id=00c965d957410afc0d40cac5343064e0a98b9ecd
> >
> > Are you guys still planning on sending out a fix?
> >
> > Is there anything I can do to help?
> >
>
> According to the above:
>
> ==================================================================
> BUG: KASAN: stack-out-of-bounds in profile_pc+0xa4/0xe0 arch/x86/kernel/time.c:42
> Read of size 8 at addr ffffc90001c0f7a0 by task systemd-udevd/12323
>
> CPU: 1 PID: 12323 Comm: systemd-udevd Not tainted 5.13.0-rc3-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Call Trace:
> <IRQ>
> __dump_stack lib/dump_stack.c:79 [inline]
> dump_stack+0x202/0x31e lib/dump_stack.c:120
> print_address_description+0x5f/0x3b0 mm/kasan/report.c:233
> __kasan_report mm/kasan/report.c:419 [inline]
> kasan_report+0x15c/0x200 mm/kasan/report.c:436
> profile_pc+0xa4/0xe0 arch/x86/kernel/time.c:42
> profile_tick+0xcd/0x120 kernel/profile.c:408
> tick_sched_handle kernel/time/tick-sched.c:227 [inline]
> tick_sched_timer+0x287/0x420 kernel/time/tick-sched.c:1373
> __run_hrtimer kernel/time/hrtimer.c:1537 [inline]
> __hrtimer_run_queues+0x4cb/0xa60 kernel/time/hrtimer.c:1601
> hrtimer_interrupt+0x3b3/0x1040 kernel/time/hrtimer.c:1663
> local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1089 [inline]
> __sysvec_apic_timer_interrupt+0xf9/0x270 arch/x86/kernel/apic/apic.c:1106
> sysvec_apic_timer_interrupt+0x8c/0xb0 arch/x86/kernel/apic/apic.c:1100
>
> And the code has:
>
> profile_pc+0xa4/0xe0 arch/x86/kernel/time.c:42
>
> unsigned long profile_pc(struct pt_regs *regs)
> {
> unsigned long pc = instruction_pointer(regs);
>
> if (!user_mode(regs) && in_lock_functions(pc)) {
> #ifdef CONFIG_FRAME_POINTER
> return *(unsigned long *)(regs->bp + sizeof(long));
> #else
> unsigned long *sp = (unsigned long *)regs->sp;
> /*
> * Return address is either directly at stack pointer
> * or above a saved flags. Eflags has bits 22-31 zero,
> * kernel addresses don't.
> */
> if (sp[0] >> 22)
> return sp[0]; <== line 42
> if (sp[1] >> 22)
> return sp[1];
> #endif
> }
> return pc;
> }
> EXPORT_SYMBOL(profile_pc);
>
>
> It looks to me that the profiler is doing a trick to read the contents of
> the stack when the interrupt went off, but this triggers the KASAN
> instrumentation to think it's a mistake when it's not. aka "false positive".
>
> How does one tell KASAN that it wants to go outside the stack, because it
> knows what its doing?
>
> Should that just be converted to a "copy_from_kernel_nofault()"? That is,
> does this fix it? (not even compiled tested)
>
> -- Steve
>
>
> diff --git a/arch/x86/kernel/time.c b/arch/x86/kernel/time.c
> index e42faa792c07..cc6ec29aa14d 100644
> --- a/arch/x86/kernel/time.c
> +++ b/arch/x86/kernel/time.c
> @@ -34,15 +34,18 @@ unsigned long profile_pc(struct pt_regs *regs)
> return *(unsigned long *)(regs->bp + sizeof(long));
> #else
> unsigned long *sp = (unsigned long *)regs->sp;
> + unsigned long retaddr;
> /*
> * Return address is either directly at stack pointer
> * or above a saved flags. Eflags has bits 22-31 zero,
> * kernel addresses don't.
> */
> - if (sp[0] >> 22)
> - return sp[0];
> - if (sp[1] >> 22)
> - return sp[1];
> + if (!copy_from_kernel_nofault(&retaddr, sp, sizeof(long)) &&
> + retaddr >> 22)
> + return retaddr;
> + if (!copy_from_kernel_nofault(&retaddr, sp + 1, sizeof(long)) &&
> + retaddr >> 22)
> + return retaddr;
> #endif
> }
> return pc;



This looks like a fit for READ_ONCE_NOCHECK(). It's already used in a
number of similar places like unwinders:
E.g.:
https://elixir.bootlin.com/linux/v5.15-rc4/source/arch/x86/kernel/process.c#L984
https://elixir.bootlin.com/linux/v5.15-rc4/A/ident/READ_ONCE_NOCHECK

2021-10-11 17:35:36

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [syzbot] KASAN: stack-out-of-bounds Read in profile_pc

On Mon, Oct 11, 2021 at 10:43:19AM -0400, Steven Rostedt wrote:
> ==================================================================
> BUG: KASAN: stack-out-of-bounds in profile_pc+0xa4/0xe0 arch/x86/kernel/time.c:42
> Read of size 8 at addr ffffc90001c0f7a0 by task systemd-udevd/12323
>
> CPU: 1 PID: 12323 Comm: systemd-udevd Not tainted 5.13.0-rc3-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Call Trace:
> <IRQ>
> __dump_stack lib/dump_stack.c:79 [inline]
> dump_stack+0x202/0x31e lib/dump_stack.c:120
> print_address_description+0x5f/0x3b0 mm/kasan/report.c:233
> __kasan_report mm/kasan/report.c:419 [inline]
> kasan_report+0x15c/0x200 mm/kasan/report.c:436
> profile_pc+0xa4/0xe0 arch/x86/kernel/time.c:42
> profile_tick+0xcd/0x120 kernel/profile.c:408
> tick_sched_handle kernel/time/tick-sched.c:227 [inline]
> tick_sched_timer+0x287/0x420 kernel/time/tick-sched.c:1373
> __run_hrtimer kernel/time/hrtimer.c:1537 [inline]
> __hrtimer_run_queues+0x4cb/0xa60 kernel/time/hrtimer.c:1601
> hrtimer_interrupt+0x3b3/0x1040 kernel/time/hrtimer.c:1663
> local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1089 [inline]
> __sysvec_apic_timer_interrupt+0xf9/0x270 arch/x86/kernel/apic/apic.c:1106
> sysvec_apic_timer_interrupt+0x8c/0xb0 arch/x86/kernel/apic/apic.c:1100
>
> And the code has:
>
> profile_pc+0xa4/0xe0 arch/x86/kernel/time.c:42
>
> unsigned long profile_pc(struct pt_regs *regs)
> {
> unsigned long pc = instruction_pointer(regs);
>
> if (!user_mode(regs) && in_lock_functions(pc)) {
> #ifdef CONFIG_FRAME_POINTER
> return *(unsigned long *)(regs->bp + sizeof(long));
> #else
> unsigned long *sp = (unsigned long *)regs->sp;
> /*
> * Return address is either directly at stack pointer
> * or above a saved flags. Eflags has bits 22-31 zero,
> * kernel addresses don't.
> */
> if (sp[0] >> 22)
> return sp[0]; <== line 42
> if (sp[1] >> 22)
> return sp[1];
> #endif
> }
> return pc;
> }
> EXPORT_SYMBOL(profile_pc);
>
>
> It looks to me that the profiler is doing a trick to read the contents of
> the stack when the interrupt went off, but this triggers the KASAN
> instrumentation to think it's a mistake when it's not. aka "false positive".
>
> How does one tell KASAN that it wants to go outside the stack, because it
> knows what its doing?

*If* the code knew what it were doing, it could use READ_ONCE_NOCHECK()
to skip KASAN checking. But this code is horribly broken and dangerous.

--
Josh