Greetings,
0day kernel testing robot got the below dmesg and the first bad commit is
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
commit afaef01c001537fa97a25092d7f54d764dc7d8c1
Author: Alexander Popov <[email protected]>
AuthorDate: Fri Aug 17 01:16:58 2018 +0300
Commit: Kees Cook <[email protected]>
CommitDate: Tue Sep 4 10:35:47 2018 -0700
x86/entry: Add STACKLEAK erasing the kernel stack at the end of syscalls
The STACKLEAK feature (initially developed by PaX Team) has the following
benefits:
1. Reduces the information that can be revealed through kernel stack leak
bugs. The idea of erasing the thread stack at the end of syscalls is
similar to CONFIG_PAGE_POISONING and memzero_explicit() in kernel
crypto, which all comply with FDP_RIP.2 (Full Residual Information
Protection) of the Common Criteria standard.
2. Blocks some uninitialized stack variable attacks (e.g. CVE-2017-17712,
CVE-2010-2963). That kind of bugs should be killed by improving C
compilers in future, which might take a long time.
This commit introduces the code filling the used part of the kernel
stack with a poison value before returning to userspace. Full
STACKLEAK feature also contains the gcc plugin which comes in a
separate commit.
The STACKLEAK feature is ported from grsecurity/PaX. More information at:
https://grsecurity.net/
https://pax.grsecurity.net/
This code is modified from Brad Spengler/PaX Team's code in the last
public patch of grsecurity/PaX based on our understanding of the code.
Changes or omissions from the original code are ours and don't reflect
the original grsecurity/PaX code.
Performance impact:
Hardware: Intel Core i7-4770, 16 GB RAM
Test #1: building the Linux kernel on a single core
0.91% slowdown
Test #2: hackbench -s 4096 -l 2000 -g 15 -f 25 -P
4.2% slowdown
So the STACKLEAK description in Kconfig includes: "The tradeoff is the
performance impact: on a single CPU system kernel compilation sees a 1%
slowdown, other systems and workloads may vary and you are advised to
test this feature on your expected workload before deploying it".
Signed-off-by: Alexander Popov <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Reviewed-by: Dave Hansen <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
57361846b5 Linux 4.19-rc2
afaef01c00 x86/entry: Add STACKLEAK erasing the kernel stack at the end of syscalls
24ccea7e10 Merge tag 'xfs-4.20-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
442b8cea24 Add linux-next specific files for 20181109
+---------------------------------------------------------------+-----------+------------+------------+---------------+
| | v4.19-rc2 | afaef01c00 | 24ccea7e10 | next-20181109 |
+---------------------------------------------------------------+-----------+------------+------------+---------------+
| boot_successes | 498 | 171 | 165 | 26 |
| boot_failures | 0 | 2 | 8 | 1 |
| double_fault:#[##] | 0 | 2 | 8 | |
| RIP:ftrace_ops_test | 0 | 2 | 8 | 1 |
| WARNING:stack_recursion | 0 | 2 | 8 | 1 |
| WARNING:at(____ptrval____)for_ip_syscall_return_via_sysret/0x | 0 | 2 | 8 | 1 |
| Kernel_panic-not_syncing:Fatal_exception | 0 | 2 | 8 | 1 |
+---------------------------------------------------------------+-----------+------------+------------+---------------+
[main] Setsockopt(0 8 68b000 4) on fd 376 [2:1:0]
[main] Setsockopt(29 1a 68b000 2f) on fd 377 [10:2:0]
[main] Setsockopt(1 2c 68b000 4) on fd 379 [2:1:0]
[main] Setsockopt(0 13 68b000 1) on fd 380 [10:1:0]
[main] 375 sockets created based on info from socket cachefile.
[ 127.808225] double fault: 0000 [#1]
[ 127.808695] CPU: 0 PID: 414 Comm: trinity-main Tainted: G T 4.19.0-rc2-00001-gafaef01 #1
[ 127.809799] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[ 127.810760] RIP: 0010:ftrace_ops_test+0x27/0xa0
[ 127.811289] Code: eb 9a 90 41 54 55 49 89 f4 53 48 89 d3 48 89 fd 48 81 ec b0 00 00 00 65 48 8b 04 25 28 00 00 00 48 89 84 24 a8 00 00 00 31 c0 <e8> 54 df ff ff 48 85 db 74 57 e8 4a df ff ff 48 8b 85 d0 00 00 00
[ 127.813385] RSP: 0018:fffffe0000001fb8 EFLAGS: 00010046
[ 127.813991] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000800
[ 127.814802] RDX: 0000000000000000 RSI: ffffffff811c4560 RDI: ffff8800158c2d20
[ 127.815652] RBP: ffff8800158c2d20 R08: 0000000000000000 R09: 0000000000000000
[ 127.816494] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff811c4560
[ 127.817357] R13: 0000000000000000 R14: ffffffff82400160 R15: 0000000000000800
[ 127.818178] FS: 00007fac9f0de700(0000) GS:ffffffff83044000(0000) knlGS:0000000000000000
[ 127.819099] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 127.819762] CR2: fffffe0000001fa8 CR3: 000000001579a000 CR4: 00000000000006b0
[ 127.820583] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 127.821406] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 127.822234] Call Trace:
[ 127.822530] <ENTRY_TRAMPOLINE>
[ 127.822914] ? __ia32_sys_rseq+0x2f0/0x2f0
[ 127.823395] ftrace_ops_list_func+0xa5/0x1b0
[ 127.823922] ftrace_call+0x5/0x34
[ 127.824318] ? stackleak_erase+0x5/0xf0
[ 127.824789] ? stackleak_erase+0x43/0xf0
[ 127.825260] stackleak_erase+0x5/0xf0
[ 127.825699] syscall_return_via_sysret+0x61/0x81
[ 127.826238] WARNING: stack recursion on stack type 4
[ 127.826243] WARNING: can't dereference registers at (____ptrval____) for ip syscall_return_via_sysret+0x61/0x81
[ 127.826246] </ENTRY_TRAMPOLINE>
[ 127.828342] ---[ end trace e9f96d3f45575499 ]---
[ 127.828911] RIP: 0010:ftrace_ops_test+0x27/0xa0
# HH:MM RESULT GOOD BAD GOOD_BUT_DIRTY DIRTY_NOT_BAD
git bisect start 651022382c7f8da46cb4872a545ee1da6d097d2a v4.19 --
git bisect good 685f7e4f161425b137056abe35ba8ef7b669d83d # 14:15 G 167 0 0 0 Merge tag 'powerpc-4.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
git bisect good 519f64bf15dccb4f64af34b74ed186c32363ab59 # 14:45 G 157 0 0 0 Merge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
git bisect bad 63c6e188f639b5828bf744e675270bb5e2adc139 # 14:59 B 0 1 15 0 Merge tag 'riscv-for-linus-4.20-mw3' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux
git bisect good 82aa4671516a3203261c835e98c3eecab10c994d # 16:03 G 158 0 0 0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
git bisect bad 34c7685a177a7bc98066f7e5daa42eef621d0bdb # 16:31 B 27 1 0 0 Merge tag 'devicetree-fixes-for-4.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
git bisect bad 2d6bb6adb714b133db92ccd4bfc9c20f75f71f3f # 17:08 B 32 1 0 0 Merge tag 'stackleak-v4.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
git bisect good 6444ccfd699cda8db5edaac7fa469d6a29aa9a47 # 18:02 G 162 0 0 0 Merge branch 'for-4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu
git bisect good 7c6c54b505b8aea1782ce6a6e8f3b8297d179937 # 18:32 G 161 0 0 0 Merge branch 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
git bisect bad c8d126275a5fa59394fe17109bdb9812fed296b8 # 18:55 B 10 3 0 0 fs/proc: Show STACKLEAK metrics in the /proc file system
git bisect bad 10e9ae9fabaf96c8e5227c1cd4827d58b3aa406d # 19:15 B 13 3 0 0 gcc-plugins: Add STACKLEAK plugin for tracking the kernel stack
git bisect bad afaef01c001537fa97a25092d7f54d764dc7d8c1 # 19:33 B 15 1 0 0 x86/entry: Add STACKLEAK erasing the kernel stack at the end of syscalls
# first bad commit: [afaef01c001537fa97a25092d7f54d764dc7d8c1] x86/entry: Add STACKLEAK erasing the kernel stack at the end of syscalls
git bisect good 57361846b52bc686112da6ca5368d11210796804 # 20:29 G 475 0 0 0 Linux 4.19-rc2
# extra tests with debug options
git bisect bad afaef01c001537fa97a25092d7f54d764dc7d8c1 # 20:47 B 44 2 0 0 x86/entry: Add STACKLEAK erasing the kernel stack at the end of syscalls
# extra tests on HEAD of linux-devel/devel-spot-201811090824
git bisect bad 8992397c6e1a0adf719e7263a0c965fce4629b15 # 20:47 B 59 2 0 0 0day head guard for 'devel-spot-201811090824'
# extra tests on tree/branch linus/master
git bisect bad 24ccea7e102de8cbc93ab3befb123bbd18532be9 # 21:06 B 10 1 0 0 Merge tag 'xfs-4.20-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
# extra tests on tree/branch linux-next/master
git bisect bad 442b8cea2477fa95c22f28ca982addb5bc6b0845 # 21:22 B 22 1 0 0 Add linux-next specific files for 20181109
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/lkp Intel Corporation
+Andy, Thomas, Ingo
On Fri, Nov 9, 2018 at 2:24 PM kernel test robot <[email protected]> wrote:
> 0day kernel testing robot got the below dmesg and the first bad commit is
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
>
> commit afaef01c001537fa97a25092d7f54d764dc7d8c1
> Author: Alexander Popov <[email protected]>
> AuthorDate: Fri Aug 17 01:16:58 2018 +0300
> Commit: Kees Cook <[email protected]>
> CommitDate: Tue Sep 4 10:35:47 2018 -0700
>
> x86/entry: Add STACKLEAK erasing the kernel stack at the end of syscalls
[...]
> [ 127.808225] double fault: 0000 [#1]
> [ 127.808695] CPU: 0 PID: 414 Comm: trinity-main Tainted: G T 4.19.0-rc2-00001-gafaef01 #1
> [ 127.809799] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
> [ 127.810760] RIP: 0010:ftrace_ops_test+0x27/0xa0
> [ 127.811289] Code: eb 9a 90 41 54 55 49 89 f4 53 48 89 d3 48 89 fd 48 81 ec b0 00 00 00 65 48 8b 04 25 28 00 00 00 48 89 84 24 a8 00 00 00 31 c0 <e8> 54 df ff ff 48 85 db 74 57 e8 4a df ff ff 48 8b 85 d0 00 00 00
> [ 127.813385] RSP: 0018:fffffe0000001fb8 EFLAGS: 00010046
[...]
> [ 127.819762] CR2: fffffe0000001fa8 CR3: 000000001579a000 CR4: 00000000000006b0
[...]
> [ 127.822234] Call Trace:
> [ 127.822530] <ENTRY_TRAMPOLINE>
> [ 127.822914] ? __ia32_sys_rseq+0x2f0/0x2f0
> [ 127.823395] ftrace_ops_list_func+0xa5/0x1b0
> [ 127.823922] ftrace_call+0x5/0x34
> [ 127.824318] ? stackleak_erase+0x5/0xf0
> [ 127.824789] ? stackleak_erase+0x43/0xf0
> [ 127.825260] stackleak_erase+0x5/0xf0
> [ 127.825699] syscall_return_via_sysret+0x61/0x81
> [ 127.826238] WARNING: stack recursion on stack type 4
> [ 127.826243] WARNING: can't dereference registers at (____ptrval____) for ip syscall_return_via_sysret+0x61/0x81
> [ 127.826246] </ENTRY_TRAMPOLINE>
> [ 127.828342] ---[ end trace e9f96d3f45575499 ]---
> [ 127.828911] RIP: 0010:ftrace_ops_test+0x27/0xa0
CR2: fffffe0000001fa8, RSP: 0018:fffffe0000001fb8; this is a pagefault
on the stack. fffffe0000000000 is CPU_ENTRY_AREA_RO_IDT;
fffffe0000001000 is CPU_ENTRY_AREA_PER_CPU; so fffffe0000002000 is the
page with the entry stack for cpu 0, and you overflowed from that into
the readonly gdt at fffffe0000001000, which doubles as a guard page
for the entry stack:
struct cpu_entry_area {
char gdt[PAGE_SIZE];
/*
* The GDT is just below entry_stack and thus serves (on x86_64) as
* a a read-only guard page.
*/
struct entry_stack_page entry_stack_page;
[...]
};
In other words: You're calling C code on the entry trampoline stack;
this C code can call into ftrace; and the entry trampoline stack isn't
big enough for ftrace shenanigans. I think you probably shouldn't be
calling C code on the entry stack, but maybe one of the X86 folks has
a different opinion?
> On Nov 9, 2018, at 12:06 PM, Jann Horn <[email protected]> wrote:
>
> +Andy, Thomas, Ingo
>
>> On Fri, Nov 9, 2018 at 2:24 PM kernel test robot <[email protected]> wrote:
>> 0day kernel testing robot got the below dmesg and the first bad commit is
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
>>
>> commit afaef01c001537fa97a25092d7f54d764dc7d8c1
>> Author: Alexander Popov <[email protected]>
>> AuthorDate: Fri Aug 17 01:16:58 2018 +0300
>> Commit: Kees Cook <[email protected]>
>> CommitDate: Tue Sep 4 10:35:47 2018 -0700
>>
>> x86/entry: Add STACKLEAK erasing the kernel stack at the end of syscalls
> [...]
>> [ 127.808225] double fault: 0000 [#1]
>> [ 127.808695] CPU: 0 PID: 414 Comm: trinity-main Tainted: G T 4.19.0-rc2-00001-gafaef01 #1
>> [ 127.809799] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
>> [ 127.810760] RIP: 0010:ftrace_ops_test+0x27/0xa0
>> [ 127.811289] Code: eb 9a 90 41 54 55 49 89 f4 53 48 89 d3 48 89 fd 48 81 ec b0 00 00 00 65 48 8b 04 25 28 00 00 00 48 89 84 24 a8 00 00 00 31 c0 <e8> 54 df ff ff 48 85 db 74 57 e8 4a df ff ff 48 8b 85 d0 00 00 00
>> [ 127.813385] RSP: 0018:fffffe0000001fb8 EFLAGS: 00010046
> [...]
>> [ 127.819762] CR2: fffffe0000001fa8 CR3: 000000001579a000 CR4: 00000000000006b0
> [...]
>> [ 127.822234] Call Trace:
>> [ 127.822530] <ENTRY_TRAMPOLINE>
>> [ 127.822914] ? __ia32_sys_rseq+0x2f0/0x2f0
>> [ 127.823395] ftrace_ops_list_func+0xa5/0x1b0
>> [ 127.823922] ftrace_call+0x5/0x34
>> [ 127.824318] ? stackleak_erase+0x5/0xf0
>> [ 127.824789] ? stackleak_erase+0x43/0xf0
>> [ 127.825260] stackleak_erase+0x5/0xf0
>> [ 127.825699] syscall_return_via_sysret+0x61/0x81
>> [ 127.826238] WARNING: stack recursion on stack type 4
>> [ 127.826243] WARNING: can't dereference registers at (____ptrval____) for ip syscall_return_via_sysret+0x61/0x81
>> [ 127.826246] </ENTRY_TRAMPOLINE>
>> [ 127.828342] ---[ end trace e9f96d3f45575499 ]---
>> [ 127.828911] RIP: 0010:ftrace_ops_test+0x27/0xa0
>
> CR2: fffffe0000001fa8, RSP: 0018:fffffe0000001fb8; this is a pagefault
> on the stack. fffffe0000000000 is CPU_ENTRY_AREA_RO_IDT;
> fffffe0000001000 is CPU_ENTRY_AREA_PER_CPU; so fffffe0000002000 is the
> page with the entry stack for cpu 0, and you overflowed from that into
> the readonly gdt at fffffe0000001000, which doubles as a guard page
> for the entry stack:
>
> struct cpu_entry_area {
> char gdt[PAGE_SIZE];
>
> /*
> * The GDT is just below entry_stack and thus serves (on x86_64) as
> * a a read-only guard page.
> */
> struct entry_stack_page entry_stack_page;
> [...]
> };
>
> In other words: You're calling C code on the entry trampoline stack;
> this C code can call into ftrace; and the entry trampoline stack isn't
> big enough for ftrace shenanigans. I think you probably shouldn't be
> calling C code on the entry stack, but maybe one of the X86 folks has
> a different opinion?
My opinion was that, on x86_32, the entry stack ought to be fairly large so that NMIs could execute on the entry stack. I don’t remember what the code actually does, though.
But stackleak_erase should probably not run on the entry stack. That seems like it’s just asking for trouble.
On 09.11.2018 23:46, Andy Lutomirski wrote:
>> On Nov 9, 2018, at 12:06 PM, Jann Horn <[email protected]> wrote:
>>
>> +Andy, Thomas, Ingo
>>
>>> On Fri, Nov 9, 2018 at 2:24 PM kernel test robot <[email protected]> wrote:
>>> 0day kernel testing robot got the below dmesg and the first bad commit is
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
>>>
>>> commit afaef01c001537fa97a25092d7f54d764dc7d8c1
>>> Author: Alexander Popov <[email protected]>
>>> AuthorDate: Fri Aug 17 01:16:58 2018 +0300
>>> Commit: Kees Cook <[email protected]>
>>> CommitDate: Tue Sep 4 10:35:47 2018 -0700
>>>
>>> x86/entry: Add STACKLEAK erasing the kernel stack at the end of syscalls
>> [...]
>>> [ 127.808225] double fault: 0000 [#1]
>>> [ 127.808695] CPU: 0 PID: 414 Comm: trinity-main Tainted: G T 4.19.0-rc2-00001-gafaef01 #1
>>> [ 127.809799] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
>>> [ 127.810760] RIP: 0010:ftrace_ops_test+0x27/0xa0
>>> [ 127.811289] Code: eb 9a 90 41 54 55 49 89 f4 53 48 89 d3 48 89 fd 48 81 ec b0 00 00 00 65 48 8b 04 25 28 00 00 00 48 89 84 24 a8 00 00 00 31 c0 <e8> 54 df ff ff 48 85 db 74 57 e8 4a df ff ff 48 8b 85 d0 00 00 00
>>> [ 127.813385] RSP: 0018:fffffe0000001fb8 EFLAGS: 00010046
>> [...]
>>> [ 127.819762] CR2: fffffe0000001fa8 CR3: 000000001579a000 CR4: 00000000000006b0
>> [...]
>>> [ 127.822234] Call Trace:
>>> [ 127.822530] <ENTRY_TRAMPOLINE>
>>> [ 127.822914] ? __ia32_sys_rseq+0x2f0/0x2f0
>>> [ 127.823395] ftrace_ops_list_func+0xa5/0x1b0
>>> [ 127.823922] ftrace_call+0x5/0x34
>>> [ 127.824318] ? stackleak_erase+0x5/0xf0
>>> [ 127.824789] ? stackleak_erase+0x43/0xf0
>>> [ 127.825260] stackleak_erase+0x5/0xf0
>>> [ 127.825699] syscall_return_via_sysret+0x61/0x81
>>> [ 127.826238] WARNING: stack recursion on stack type 4
>>> [ 127.826243] WARNING: can't dereference registers at (____ptrval____) for ip syscall_return_via_sysret+0x61/0x81
>>> [ 127.826246] </ENTRY_TRAMPOLINE>
>>> [ 127.828342] ---[ end trace e9f96d3f45575499 ]---
>>> [ 127.828911] RIP: 0010:ftrace_ops_test+0x27/0xa0
>>
>> CR2: fffffe0000001fa8, RSP: 0018:fffffe0000001fb8; this is a pagefault
>> on the stack. fffffe0000000000 is CPU_ENTRY_AREA_RO_IDT;
>> fffffe0000001000 is CPU_ENTRY_AREA_PER_CPU; so fffffe0000002000 is the
>> page with the entry stack for cpu 0, and you overflowed from that into
>> the readonly gdt at fffffe0000001000, which doubles as a guard page
>> for the entry stack:
>>
>> struct cpu_entry_area {
>> char gdt[PAGE_SIZE];
>>
>> /*
>> * The GDT is just below entry_stack and thus serves (on x86_64) as
>> * a a read-only guard page.
>> */
>> struct entry_stack_page entry_stack_page;
>> [...]
>> };
>>
>> In other words: You're calling C code on the entry trampoline stack;
>> this C code can call into ftrace; and the entry trampoline stack isn't
>> big enough for ftrace shenanigans. I think you probably shouldn't be
>> calling C code on the entry stack, but maybe one of the X86 folks has
>> a different opinion?
>
> My opinion was that, on x86_32, the entry stack ought to be fairly large so
> that NMIs could execute on the entry stack. I don’t remember what the code
> actually does, though.
>
> But stackleak_erase should probably not run on the entry stack. That seems
> like it’s just asking for trouble.
Hello Jann and Andy,
The stackleak_erase() function is called on the trampoline stack at the end of
syscall, it erases the used part of the kernel thread stack after the syscall is
handled.
I've reproduced such a double fault with function tracing for stackleak_erase():
# mount -t tracefs nodev /sys/kernel/tracing
# echo 'p:myprobe stackleak_erase' > /sys/kernel/debug/tracing/kprobe_events
# echo 1 > /sys/kernel/debug/tracing/events/kprobes/myprobe/enable
I think we should simply not allow function tracing for stackleak_*() functions:
diff --git a/kernel/Makefile b/kernel/Makefile
index 7343b3a..0906f6d 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -18,6 +18,7 @@ obj-$(CONFIG_MULTIUSER) += groups.o
ifdef CONFIG_FUNCTION_TRACER
# Do not trace internal ftrace files
CFLAGS_REMOVE_irq_work.o = $(CC_FLAGS_FTRACE)
+CFLAGS_REMOVE_stackleak.o = $(CC_FLAGS_FTRACE)
endif
With this patch setting kprobe event for stackleak_erase() is not allowed. This
is the corresponding dmesg output:
[ 75.660478] trace_kprobe: Could not probe notrace function stackleak_erase
If you agree, I'll prepare the patch for LKML.
Best regards,
Alexander
On Fri, Nov 9, 2018 at 5:09 PM, Alexander Popov <[email protected]> wrote:
>
> On 09.11.2018 23:46, Andy Lutomirski wrote:
>>> On Nov 9, 2018, at 12:06 PM, Jann Horn <[email protected]> wrote:
>>>
>>> +Andy, Thomas, Ingo
>>>
>>>> On Fri, Nov 9, 2018 at 2:24 PM kernel test robot <[email protected]> wrote:
>>>> 0day kernel testing robot got the below dmesg and the first bad commit is
>>>>
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
>>>>
>>>> commit afaef01c001537fa97a25092d7f54d764dc7d8c1
>>>> Author: Alexander Popov <[email protected]>
>>>> AuthorDate: Fri Aug 17 01:16:58 2018 +0300
>>>> Commit: Kees Cook <[email protected]>
>>>> CommitDate: Tue Sep 4 10:35:47 2018 -0700
>>>>
>>>> x86/entry: Add STACKLEAK erasing the kernel stack at the end of syscalls
>>> [...]
>>>> [ 127.808225] double fault: 0000 [#1]
>>>> [ 127.808695] CPU: 0 PID: 414 Comm: trinity-main Tainted: G T 4.19.0-rc2-00001-gafaef01 #1
>>>> [ 127.809799] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
>>>> [ 127.810760] RIP: 0010:ftrace_ops_test+0x27/0xa0
>>>> [ 127.811289] Code: eb 9a 90 41 54 55 49 89 f4 53 48 89 d3 48 89 fd 48 81 ec b0 00 00 00 65 48 8b 04 25 28 00 00 00 48 89 84 24 a8 00 00 00 31 c0 <e8> 54 df ff ff 48 85 db 74 57 e8 4a df ff ff 48 8b 85 d0 00 00 00
>>>> [ 127.813385] RSP: 0018:fffffe0000001fb8 EFLAGS: 00010046
>>> [...]
>>>> [ 127.819762] CR2: fffffe0000001fa8 CR3: 000000001579a000 CR4: 00000000000006b0
>>> [...]
>>>> [ 127.822234] Call Trace:
>>>> [ 127.822530] <ENTRY_TRAMPOLINE>
>>>> [ 127.822914] ? __ia32_sys_rseq+0x2f0/0x2f0
>>>> [ 127.823395] ftrace_ops_list_func+0xa5/0x1b0
>>>> [ 127.823922] ftrace_call+0x5/0x34
>>>> [ 127.824318] ? stackleak_erase+0x5/0xf0
>>>> [ 127.824789] ? stackleak_erase+0x43/0xf0
>>>> [ 127.825260] stackleak_erase+0x5/0xf0
>>>> [ 127.825699] syscall_return_via_sysret+0x61/0x81
>>>> [ 127.826238] WARNING: stack recursion on stack type 4
>>>> [ 127.826243] WARNING: can't dereference registers at (____ptrval____) for ip syscall_return_via_sysret+0x61/0x81
>>>> [ 127.826246] </ENTRY_TRAMPOLINE>
>>>> [ 127.828342] ---[ end trace e9f96d3f45575499 ]---
>>>> [ 127.828911] RIP: 0010:ftrace_ops_test+0x27/0xa0
>>>
>>> CR2: fffffe0000001fa8, RSP: 0018:fffffe0000001fb8; this is a pagefault
>>> on the stack. fffffe0000000000 is CPU_ENTRY_AREA_RO_IDT;
>>> fffffe0000001000 is CPU_ENTRY_AREA_PER_CPU; so fffffe0000002000 is the
>>> page with the entry stack for cpu 0, and you overflowed from that into
>>> the readonly gdt at fffffe0000001000, which doubles as a guard page
>>> for the entry stack:
>>>
>>> struct cpu_entry_area {
>>> char gdt[PAGE_SIZE];
>>>
>>> /*
>>> * The GDT is just below entry_stack and thus serves (on x86_64) as
>>> * a a read-only guard page.
>>> */
>>> struct entry_stack_page entry_stack_page;
>>> [...]
>>> };
>>>
>>> In other words: You're calling C code on the entry trampoline stack;
>>> this C code can call into ftrace; and the entry trampoline stack isn't
>>> big enough for ftrace shenanigans. I think you probably shouldn't be
>>> calling C code on the entry stack, but maybe one of the X86 folks has
>>> a different opinion?
>>
>> My opinion was that, on x86_32, the entry stack ought to be fairly large so
>> that NMIs could execute on the entry stack. I don’t remember what the code
>> actually does, though.
>>
>> But stackleak_erase should probably not run on the entry stack. That seems
>> like it’s just asking for trouble.
>
> Hello Jann and Andy,
>
>
> The stackleak_erase() function is called on the trampoline stack at the end of
> syscall, it erases the used part of the kernel thread stack after the syscall is
> handled.
>
>
> I've reproduced such a double fault with function tracing for stackleak_erase():
>
> # mount -t tracefs nodev /sys/kernel/tracing
> # echo 'p:myprobe stackleak_erase' > /sys/kernel/debug/tracing/kprobe_events
> # echo 1 > /sys/kernel/debug/tracing/events/kprobes/myprobe/enable
>
>
> I think we should simply not allow function tracing for stackleak_*() functions:
>
> diff --git a/kernel/Makefile b/kernel/Makefile
> index 7343b3a..0906f6d 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -18,6 +18,7 @@ obj-$(CONFIG_MULTIUSER) += groups.o
> ifdef CONFIG_FUNCTION_TRACER
> # Do not trace internal ftrace files
> CFLAGS_REMOVE_irq_work.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_stackleak.o = $(CC_FLAGS_FTRACE)
> endif
Yeah, that's what I was suspecting on IRC. This looks like the right
fix. Can you send that to me as a "regular" patch with changelog, etc,
and I'll send it up to Linus.
Reviewed-by: Kees Cook <[email protected]>
Thanks for everyone's attention on this! I've been travelling this
week, so I've been a little slow. :)
-Kees
>
>
> With this patch setting kprobe event for stackleak_erase() is not allowed. This
> is the corresponding dmesg output:
> [ 75.660478] trace_kprobe: Could not probe notrace function stackleak_erase
>
>
> If you agree, I'll prepare the patch for LKML.
>
> Best regards,
> Alexander
--
Kees Cook