From: Borislav Petkov <[email protected]>
I found out recently that this is very helpful when trying to look at
opcodes around the rIP when the segfault happens, and also poke at
architectural registers. When enabled, it looks something like this:
strsep[3702]: segfault at 40066b ip 00007ffff7abe22b sp 00007fffffffea70 error 7 in libc-2.19.so[7ffff7a33000+19f000]
RIP: 0033:[<00007ffff7abe22b>] [<00007ffff7abe22b>] 0x7ffff7abe22b
RSP: 002b:00007fffffffea70 EFLAGS: 00010202
RAX: 000000000040066b RBX: 0000000000400664 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 000000000000003d RDI: 0000000000400665
RBP: 00007fffffffea90 R08: 00007ffff7dd7c60 R09: 00007ffff7deae20
R10: 00007fffffffe850 R11: 00007ffff7abe200 R12: 0000000000400460
R13: 00007fffffffeb80 R14: 0000000000000000 R15: 0000000000000000
FS: 00007ffff7fdc700(0000) GS:ffff88007ed40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000040066b CR3: 0000000079a39000 CR4: 00000000000406e0
Code: 74 33 80 7e 01 00 74 22 48 89 df e8 5a 8a ff ff 48 85 c0 74 20 <c6> 00 00 48 83 c0
01 48 89 45 00 48 89 d8 48 83 c4 08 5b 5d c3 0f b6 13 38 d0 74 29 84 d2 75 15 48 c7 45 00 00 00 00 00 48 83 c4
I know, this info can be collected with core dumps but in constrained
environments or when writing out core dumps is impossible (too early in
the boot, fs is fscked), getting some more detailed info might be really
helpful.
Oh, and the size of the change is small enough.
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
---
arch/x86/Kconfig.debug | 9 +++++++++
arch/x86/mm/fault.c | 24 +++++++++++++++++++++++-
2 files changed, 32 insertions(+), 1 deletion(-)
diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
index 67eec55093a5..514bbae2f4c6 100644
--- a/arch/x86/Kconfig.debug
+++ b/arch/x86/Kconfig.debug
@@ -361,4 +361,13 @@ config PUNIT_ATOM_DEBUG
The current power state can be read from
/sys/kernel/debug/punit_atom/dev_power_state
+config SEGFAULT_DETAILED_DEBUG
+ bool "Dump detailed information for fatal segfaults"
+ ---help---
+ Enable this option if you want to see more debug info in
+ the kernel log when fatal segfaults get reported. This
+ option might be useful in constrained environments when
+ core dumps might not be possible and/or filesystems are
+ not ready for a core dump writeout.
+
endmenu
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 9f72ca3b2669..120f126f5b54 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -847,8 +847,30 @@ show_signal_msg(struct pt_regs *regs, unsigned long error_code,
(void *)regs->ip, (void *)regs->sp, error_code);
print_vma_addr(KERN_CONT " in ", regs->ip);
-
printk(KERN_CONT "\n");
+
+ if (IS_ENABLED(CONFIG_SEGFAULT_DETAILED_DEBUG)) {
+#define PREAMBLE_LEN 21
+#define OPC_BUF_LEN 64
+ u8 code[OPC_BUF_LEN];
+ int len, i;
+
+ len = __copy_from_user_inatomic(code,
+ (void *)regs->ip - PREAMBLE_LEN,
+ OPC_BUF_LEN);
+
+ __show_regs(regs, 1);
+
+ printk(KERN_DEFAULT "Code: ");
+ for (i = 0; i < OPC_BUF_LEN - len; i++) {
+ if (i == PREAMBLE_LEN)
+ pr_cont("<%02x> ", code[i]);
+ else
+ pr_cont("%02x ", code[i]);
+ }
+
+ printk(KERN_CONT "\n");
+ }
}
static void
--
2.10.0
* Borislav Petkov <[email protected]> wrote:
> From: Borislav Petkov <[email protected]>
>
> I found out recently that this is very helpful when trying to look at
> opcodes around the rIP when the segfault happens, and also poke at
> architectural registers. When enabled, it looks something like this:
>
> strsep[3702]: segfault at 40066b ip 00007ffff7abe22b sp 00007fffffffea70 error 7 in libc-2.19.so[7ffff7a33000+19f000]
> RIP: 0033:[<00007ffff7abe22b>] [<00007ffff7abe22b>] 0x7ffff7abe22b
> RSP: 002b:00007fffffffea70 EFLAGS: 00010202
> RAX: 000000000040066b RBX: 0000000000400664 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: 000000000000003d RDI: 0000000000400665
> RBP: 00007fffffffea90 R08: 00007ffff7dd7c60 R09: 00007ffff7deae20
> R10: 00007fffffffe850 R11: 00007ffff7abe200 R12: 0000000000400460
> R13: 00007fffffffeb80 R14: 0000000000000000 R15: 0000000000000000
> FS: 00007ffff7fdc700(0000) GS:ffff88007ed40000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000000000040066b CR3: 0000000079a39000 CR4: 00000000000406e0
> Code: 74 33 80 7e 01 00 74 22 48 89 df e8 5a 8a ff ff 48 85 c0 74 20 <c6> 00 00 48 83 c0
> 01 48 89 45 00 48 89 d8 48 83 c4 08 5b 5d c3 0f b6 13 38 d0 74 29 84 d2 75 15 48 c7 45 00 00 00 00 00 48 83 c4
Note that on recent kernels, with printk log timestamping enabled, this looks
like:
[ 206.721243] CR2: 0000000000000000 CR3: 000000042ab75000 CR4: 00000000001406e0
[ 206.729217] Code:
[ 206.731271] 55
[ 206.733046] 48
[ 206.733348] 89
[ 206.733665] e5
[ 206.733982] ff
[ 206.734292] d0 5d e9
[ 206.735209] 7a
[ 206.735519] ff ff
[ 206.736133] ff
[ 206.736444] 55 48
[ 206.737057] 89
[ 206.737367] e5 b8
[ 206.737992] 00
[ 206.738303] 00 00 00
[ 206.739216] <c6>
[ 206.739728] 00
[ 206.740031] 00
> I know, this info can be collected with core dumps but in constrained
> environments or when writing out core dumps is impossible (too early in
> the boot, fs is fscked), getting some more detailed info might be really
> helpful.
>
> Oh, and the size of the change is small enough.
>
> Signed-off-by: Borislav Petkov <[email protected]>
> Cc: Andy Lutomirski <[email protected]>
> Cc: Linus Torvalds <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> ---
> arch/x86/Kconfig.debug | 9 +++++++++
> arch/x86/mm/fault.c | 24 +++++++++++++++++++++++-
> 2 files changed, 32 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
> index 67eec55093a5..514bbae2f4c6 100644
> --- a/arch/x86/Kconfig.debug
> +++ b/arch/x86/Kconfig.debug
> @@ -361,4 +361,13 @@ config PUNIT_ATOM_DEBUG
> The current power state can be read from
> /sys/kernel/debug/punit_atom/dev_power_state
>
> +config SEGFAULT_DETAILED_DEBUG
> + bool "Dump detailed information for fatal segfaults"
> + ---help---
> + Enable this option if you want to see more debug info in
> + the kernel log when fatal segfaults get reported. This
> + option might be useful in constrained environments when
> + core dumps might not be possible and/or filesystems are
> + not ready for a core dump writeout.
> +
> endmenu
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index 9f72ca3b2669..120f126f5b54 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -847,8 +847,30 @@ show_signal_msg(struct pt_regs *regs, unsigned long error_code,
> (void *)regs->ip, (void *)regs->sp, error_code);
>
> print_vma_addr(KERN_CONT " in ", regs->ip);
> -
> printk(KERN_CONT "\n");
> +
> + if (IS_ENABLED(CONFIG_SEGFAULT_DETAILED_DEBUG)) {
> +#define PREAMBLE_LEN 21
> +#define OPC_BUF_LEN 64
> + u8 code[OPC_BUF_LEN];
> + int len, i;
> +
> + len = __copy_from_user_inatomic(code,
> + (void *)regs->ip - PREAMBLE_LEN,
> + OPC_BUF_LEN);
> +
> + __show_regs(regs, 1);
> +
> + printk(KERN_DEFAULT "Code: ");
> + for (i = 0; i < OPC_BUF_LEN - len; i++) {
> + if (i == PREAMBLE_LEN)
> + pr_cont("<%02x> ", code[i]);
> + else
> + pr_cont("%02x ", code[i]);
> + }
> +
> + printk(KERN_CONT "\n");
> + }
The main problem I have with this is that it's a big information leak: if an admin
leaves this enabled then any user can use segfaults to read any kernel address.
Something like this will set RIP to arbitrary value:
void main(void)
{
void (*fn)(void) = (void *)0xffff8800000fcda0;
fn();
}
and the segfault info dumper then prints the user-inaccessible contents of the RIP
address.
So I don't mind the feature, but this should only dump code that is user-readable.
Thanks,
Ingo
On Sat, Nov 12, 2016 at 12:06:49PM +0100, Ingo Molnar wrote:
> Note that on recent kernels, with printk log timestamping enabled, this looks
> like:
>
> [ 206.721243] CR2: 0000000000000000 CR3: 000000042ab75000 CR4: 00000000001406e0
> [ 206.729217] Code:
> [ 206.731271] 55
> [ 206.733046] 48
> [ 206.733348] 89
> [ 206.733665] e5
> [ 206.733982] ff
Hmm, this would then be no different with the "normal" Code: line as
I simply stole it from there. And I have CONFIG_PRINTK_TIME=y. And it
looks ok in my guest:
[ 56.005550] strsep[3674]: segfault at 40066b ip 00007ffff7abe22b sp 00007fffffffeb40 error 7 in libc-2.19.so[7ffff7a33000+19f000]
[ 56.009876] RIP: 0033:[<00007ffff7abe22b>] [<00007ffff7abe22b>] 0x7ffff7abe22b
[ 56.011917] RSP: 002b:00007fffffffeb40 EFLAGS: 00010202
[ 56.020225] RAX: 000000000040066b RBX: 0000000000400664 RCX: 0000000000000000
[ 56.021387] RDX: 0000000000000000 RSI: 000000000000003d RDI: 0000000000400665
[ 56.022373] RBP: 00007fffffffeb60 R08: 00007ffff7dd7c60 R09: 00007ffff7deae20
[ 56.023348] R10: 00007fffffffe920 R11: 00007ffff7abe200 R12: 0000000000400460
[ 56.024467] R13: 00007fffffffec50 R14: 0000000000000000 R15: 0000000000000000
[ 56.025560] FS: 00007ffff7fdc700(0000) GS:ffff88007ec40000(0000) knlGS:0000000000000000
[ 56.026665] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 56.027458] CR2: 000000000040066b CR3: 000000007aabe000 CR4: 00000000000406e0
[ 56.028597] Code: 74 33 80 7e 01 00 74 22 48 89 df e8 5a 8a ff ff 48 85 c0 74 20 <c6> 00 00 48 83 c0 01 48 89 45 00 48 89 d8 48 83 c4 08 5b 5d c3 0f b6 13 38 d0 74 29 84 d2 75 15 48 c7 45 00 00 00 00 00 48 83 c4
So, theoretically, show_regs() would generate the same thing on your
machine. Normal splats look the ok here too:
[ 228.093462] sysrq: SysRq : Trigger a crash
[ 228.095306] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 228.096955] IP: [<ffffffff81369d5b>] sysrq_handle_crash+0x1b/0x30
[ 228.096955] PGD 7abc8067 [ 228.096955] PUD 79a26067
PMD 0 [ 228.096955]
[ 228.096955] Oops: 0002 [#1] PREEMPT SMP
[ 228.096955] Modules linked in:
[ 228.096955] CPU: 3 PID: 3692 Comm: bash Not tainted 4.9.0-rc4+ #32
[ 228.096955] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014
[ 228.096955] task: ffff88007936c800 task.stack: ffffc90002e38000
[ 228.096955] RIP: 0010:[<ffffffff81369d5b>] [<ffffffff81369d5b>] sysrq_handle_crash+0x1b/0x30
[ 228.096955] RSP: 0018:ffffc90002e3bde8 EFLAGS: 00010246
[ 228.096955] RAX: 0000000000000000 RBX: 0000000000000063 RCX: 0000000000000000
[ 228.096955] RDX: 0000000000000001 RSI: ffffffff810a3e13 RDI: 0000000000000063
[ 228.096955] RBP: ffffc90002e3bde8 R08: 0000000000000001 R09: 0000000000000006
[ 228.096955] R10: 0000000000000001 R11: 000000000000018f R12: 000000000000000a
[ 228.096955] R13: ffffffff81c569c0 R14: 0000000000000000 R15: 0000000000000000
[ 228.096955] FS: 00007ffff7fdb700(0000) GS:ffff88007ecc0000(0000) knlGS:0000000000000000
[ 228.096955] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 228.096955] CR2: 0000000000000000 CR3: 0000000079831000 CR4: 00000000000406e0
[ 228.096955] Stack:
[ 228.096955] ffffc90002e3be18 ffffffff8136a233 0000000000000002 fffffffffffffffb
[ 228.096955] ffff88007a883d00 0000000000705408 ffffc90002e3be30 ffffffff8136a66f
[ 228.096955] ffff88007b9c6540 ffffc90002e3be50 ffffffff811dbcf2 ffff88007a883d00
[ 228.096955] Call Trace:
[ 228.096955] [<ffffffff8136a233>] __handle_sysrq+0x103/0x160
[ 228.096955] [<ffffffff8136a66f>] write_sysrq_trigger+0x2f/0x40
[ 228.096955] [<ffffffff811dbcf2>] proc_reg_write+0x42/0x70
[ 228.096955] [<ffffffff8117abd8>] __vfs_write+0x28/0x120
[ 228.096955] [<ffffffff8107b5bf>] ? preempt_count_sub+0xaf/0x120
[ 228.096955] [<ffffffff8107b5bf>] ? preempt_count_sub+0xaf/0x120
[ 228.096955] [<ffffffff8117e182>] ? __sb_start_write+0x52/0xe0
[ 228.096955] [<ffffffff8117b930>] vfs_write+0xc0/0x180
[ 228.096955] [<ffffffff8117cbef>] SyS_write+0x4f/0xb0
[ 228.096955] [<ffffffff816e5e2e>] entry_SYSCALL_64_fastpath+0x1c/0xac
[ 228.096955] Code: 6e bf da ff eb e4 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 e8 a2 4a d4 ff c7 05 f0 25 a9 00 01 00 00 00 0f ae f8 <c6> 04 25 00 00 00 00 00 0f 0b 90 66 2e 0f 1f 84 00 00 00 00 00
[ 228.096955] RIP [<ffffffff81369d5b>] sysrq_handle_crash+0x1b/0x30
[ 228.096955] RSP <ffffc90002e3bde8>
[ 228.096955] CR2: 0000000000000000
[ 228.137948] ---[ end trace cfc5457f348eda2e ]---
[ 228.138698] Kernel panic - not syncing: Fatal exception
[ 228.140137] Kernel Offset: disabled
[ 228.140639] ---[ end Kernel panic - not syncing: Fatal exception
...
> So I don't mind the feature, but this should only dump code that is user-readable.
Yeah, this is purely a debug feature so how about I stick it behind a
switch in debugfs which is root-only and it is disabled by default? When
you boot, you do:
# echo 1 > /sys/kernel/debug/x86/detailed_segfault
Hmmm?
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
* Borislav Petkov <[email protected]> wrote:
> On Sat, Nov 12, 2016 at 12:06:49PM +0100, Ingo Molnar wrote:
> > Note that on recent kernels, with printk log timestamping enabled, this looks
> > like:
> >
> > [ 206.721243] CR2: 0000000000000000 CR3: 000000042ab75000 CR4: 00000000001406e0
> > [ 206.729217] Code:
> > [ 206.731271] 55
> > [ 206.733046] 48
> > [ 206.733348] 89
> > [ 206.733665] e5
> > [ 206.733982] ff
>
> Hmm, this would then be no different with the "normal" Code: line as
> I simply stole it from there. And I have CONFIG_PRINTK_TIME=y. And it
> looks ok in my guest:
That's definitely how it came out of 'dmesg' - I've attached my .config.
> > So I don't mind the feature, but this should only dump code that is user-readable.
>
> Yeah, this is purely a debug feature so how about I stick it behind a
> switch in debugfs which is root-only and it is disabled by default? When
> you boot, you do:
>
> # echo 1 > /sys/kernel/debug/x86/detailed_segfault
>
> Hmmm?
That runtime flag is definitely useful, but I think we should also do the proper
permission checks - for 'defense in depth' reasons and all that.
There might be situations where an admin might want to use this on a production
system - I had some trouble in the past getting/seeing segfaults that trigger in
SystemD for example.
Thanks,
Ingo
On Sun, Nov 13, 2016 at 08:39:03AM +0100, Ingo Molnar wrote:
> That's definitely how it came out of 'dmesg' - I've attached my .config.
Hmm, enabling all *PRINTK* options from your .config doesn't change
anything for my qemu guest here. Lemme try with your full config.
> That runtime flag is definitely useful, but I think we should also do
> the proper permission checks - for 'defense in depth' reasons and all
> that.
>
> There might be situations where an admin might want to use this on
> a production system - I had some trouble in the past getting/seeing
> segfaults that trigger in SystemD for example.
Well, we have to pay attention not to cripple a nice debugging feature
with security. For example, come to think of it, having a debugfs flag
doesn't help if you want to see early boot segfaults, before even being
able to log in and set that flag to 1.
So the better solution would be a kernel command line flag:
detailed_segfault=1
This is where we could also control which segfaults to see:
detailed_segfault=<comma_separated_uid_list>
where uid comes from task_uid() (or should it be task_euid()?). No need
for the "1" - the mere presence of the command line option is enough.
With a special option of
detailed_segfault=*
which enables it for all.
I think that's a good compromise with security.
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On Sun, Nov 13, 2016 at 12:25:52PM +0100, Borislav Petkov wrote:
> Hmm, enabling all *PRINTK* options from your .config doesn't change
> anything for my qemu guest here. Lemme try with your full config.
Same with your .config:
[ 115.694717] strsep[3027]: segfault at 40066b ip 00007ffff7abe22b sp 00007fffffffe990 error 7 in libc-2.19.so[7ffff7a33000+19f000]
[ 115.700181] RIP: 0033:[<00007ffff7abe22b>] [<00007ffff7abe22b>] 0x7ffff7abe22b
[ 115.704843] RSP: 002b:00007fffffffe990 EFLAGS: 00010202
[ 115.707183] RAX: 000000000040066b RBX: 0000000000400664 RCX: 0000000000000000
[ 115.709189] RDX: 0000000000000000 RSI: 000000000000003d RDI: 0000000000400665
[ 115.711207] RBP: 00007fffffffe9b0 R08: 00007ffff7dd7c60 R09: 00007ffff7deae20
[ 115.713630] R10: 00007fffffffe770 R11: 00007ffff7abe200 R12: 0000000000400460
[ 115.715653] R13: 00007fffffffeaa0 R14: 0000000000000000 R15: 0000000000000000
[ 115.717651] FS: 00007ffff7fdc700(0000) GS:ffff88007ed00000(0000) knlGS:0000000000000000
[ 115.719554] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 115.720393] CR2: 000000000040066b CR3: 0000000079f4f000 CR4: 00000000000406e0
[ 115.721409] Code: [ 115.721692] 74 33 80 7e 01 00 74 22 48 89 df e8 5a 8a ff ff 48 85 c0 74 20 <c6> 00 00 48 83 c0 01 48 89 45 00 48 89 d8 48 83 c4 08 5b 5d c3 0f b6 13 38 d0 74 29 84 d2 75 15 48 c7 45 00 00 00 00 00 48 83 c4
Is this a real hw issue? I.e., maybe I should not be doing this in a
guest?
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On Nov 12, 2016 4:27 AM, "Borislav Petkov" <[email protected]> wrote:
>
> On Sat, Nov 12, 2016 at 12:06:49PM +0100, Ingo Molnar wrote:
> > Note that on recent kernels, with printk log timestamping enabled, this looks
> > like:
> >
> > [ 206.721243] CR2: 0000000000000000 CR3: 000000042ab75000 CR4: 00000000001406e0
> > [ 206.729217] Code:
> > [ 206.731271] 55
> > [ 206.733046] 48
> > [ 206.733348] 89
> > [ 206.733665] e5
> > [ 206.733982] ff
>
> Hmm, this would then be no different with the "normal" Code: line as
> I simply stole it from there. And I have CONFIG_PRINTK_TIME=y. And it
> looks ok in my guest:
>
> [ 56.005550] strsep[3674]: segfault at 40066b ip 00007ffff7abe22b sp 00007fffffffeb40 error 7 in libc-2.19.so[7ffff7a33000+19f000]
> [ 56.009876] RIP: 0033:[<00007ffff7abe22b>] [<00007ffff7abe22b>] 0x7ffff7abe22b
> [ 56.011917] RSP: 002b:00007fffffffeb40 EFLAGS: 00010202
> [ 56.020225] RAX: 000000000040066b RBX: 0000000000400664 RCX: 0000000000000000
> [ 56.021387] RDX: 0000000000000000 RSI: 000000000000003d RDI: 0000000000400665
> [ 56.022373] RBP: 00007fffffffeb60 R08: 00007ffff7dd7c60 R09: 00007ffff7deae20
> [ 56.023348] R10: 00007fffffffe920 R11: 00007ffff7abe200 R12: 0000000000400460
> [ 56.024467] R13: 00007fffffffec50 R14: 0000000000000000 R15: 0000000000000000
> [ 56.025560] FS: 00007ffff7fdc700(0000) GS:ffff88007ec40000(0000) knlGS:0000000000000000
> [ 56.026665] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 56.027458] CR2: 000000000040066b CR3: 000000007aabe000 CR4: 00000000000406e0
> [ 56.028597] Code: 74 33 80 7e 01 00 74 22 48 89 df e8 5a 8a ff ff 48 85 c0 74 20 <c6> 00 00 48 83 c0 01 48 89 45 00 48 89 d8 48 83 c4 08 5b 5d c3 0f b6 13 38 d0 74 29 84 d2 75 15 48 c7 45 00 00 00 00 00 48 83 c4
>
> So, theoretically, show_regs() would generate the same thing on your
> machine. Normal splats look the ok here too:
>
> [ 228.093462] sysrq: SysRq : Trigger a crash
> [ 228.095306] BUG: unable to handle kernel NULL pointer dereference at (null)
> [ 228.096955] IP: [<ffffffff81369d5b>] sysrq_handle_crash+0x1b/0x30
> [ 228.096955] PGD 7abc8067 [ 228.096955] PUD 79a26067
> PMD 0 [ 228.096955]
> [ 228.096955] Oops: 0002 [#1] PREEMPT SMP
> [ 228.096955] Modules linked in:
> [ 228.096955] CPU: 3 PID: 3692 Comm: bash Not tainted 4.9.0-rc4+ #32
> [ 228.096955] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014
> [ 228.096955] task: ffff88007936c800 task.stack: ffffc90002e38000
> [ 228.096955] RIP: 0010:[<ffffffff81369d5b>] [<ffffffff81369d5b>] sysrq_handle_crash+0x1b/0x30
> [ 228.096955] RSP: 0018:ffffc90002e3bde8 EFLAGS: 00010246
> [ 228.096955] RAX: 0000000000000000 RBX: 0000000000000063 RCX: 0000000000000000
> [ 228.096955] RDX: 0000000000000001 RSI: ffffffff810a3e13 RDI: 0000000000000063
> [ 228.096955] RBP: ffffc90002e3bde8 R08: 0000000000000001 R09: 0000000000000006
> [ 228.096955] R10: 0000000000000001 R11: 000000000000018f R12: 000000000000000a
> [ 228.096955] R13: ffffffff81c569c0 R14: 0000000000000000 R15: 0000000000000000
> [ 228.096955] FS: 00007ffff7fdb700(0000) GS:ffff88007ecc0000(0000) knlGS:0000000000000000
> [ 228.096955] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 228.096955] CR2: 0000000000000000 CR3: 0000000079831000 CR4: 00000000000406e0
> [ 228.096955] Stack:
> [ 228.096955] ffffc90002e3be18 ffffffff8136a233 0000000000000002 fffffffffffffffb
> [ 228.096955] ffff88007a883d00 0000000000705408 ffffc90002e3be30 ffffffff8136a66f
> [ 228.096955] ffff88007b9c6540 ffffc90002e3be50 ffffffff811dbcf2 ffff88007a883d00
> [ 228.096955] Call Trace:
> [ 228.096955] [<ffffffff8136a233>] __handle_sysrq+0x103/0x160
> [ 228.096955] [<ffffffff8136a66f>] write_sysrq_trigger+0x2f/0x40
> [ 228.096955] [<ffffffff811dbcf2>] proc_reg_write+0x42/0x70
> [ 228.096955] [<ffffffff8117abd8>] __vfs_write+0x28/0x120
> [ 228.096955] [<ffffffff8107b5bf>] ? preempt_count_sub+0xaf/0x120
> [ 228.096955] [<ffffffff8107b5bf>] ? preempt_count_sub+0xaf/0x120
> [ 228.096955] [<ffffffff8117e182>] ? __sb_start_write+0x52/0xe0
> [ 228.096955] [<ffffffff8117b930>] vfs_write+0xc0/0x180
> [ 228.096955] [<ffffffff8117cbef>] SyS_write+0x4f/0xb0
> [ 228.096955] [<ffffffff816e5e2e>] entry_SYSCALL_64_fastpath+0x1c/0xac
> [ 228.096955] Code: 6e bf da ff eb e4 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 e8 a2 4a d4 ff c7 05 f0 25 a9 00 01 00 00 00 0f ae f8 <c6> 04 25 00 00 00 00 00 0f 0b 90 66 2e 0f 1f 84 00 00 00 00 00
> [ 228.096955] RIP [<ffffffff81369d5b>] sysrq_handle_crash+0x1b/0x30
> [ 228.096955] RSP <ffffc90002e3bde8>
> [ 228.096955] CR2: 0000000000000000
> [ 228.137948] ---[ end trace cfc5457f348eda2e ]---
> [ 228.138698] Kernel panic - not syncing: Fatal exception
> [ 228.140137] Kernel Offset: disabled
> [ 228.140639] ---[ end Kernel panic - not syncing: Fatal exception
>
> ...
> > So I don't mind the feature, but this should only dump code that is user-readable.
>
> Yeah, this is purely a debug feature so how about I stick it behind a
> switch in debugfs which is root-only and it is disabled by default? When
> you boot, you do:
>
> # echo 1 > /sys/kernel/debug/x86/detailed_segfault
How about dropping the __ in front of the copy?
On Sun, Nov 13, 2016 at 08:15:01AM -0800, Andy Lutomirski wrote:
> How about dropping the __ in front of the copy?
In front of __copy_from_user_inatomic()?
If so, is there even a global helper by that name?
$ git grep -E "[^_]copy_from_user_inatomic"
drivers/gpu/drm/msm/msm_gem_submit.c:98: ret = copy_from_user_inatomic(&submit_bo, userptr, sizeof(submit_bo));
is all I can find.
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
* Borislav Petkov <[email protected]> wrote:
> On Sun, Nov 13, 2016 at 12:25:52PM +0100, Borislav Petkov wrote:
> > Hmm, enabling all *PRINTK* options from your .config doesn't change
> > anything for my qemu guest here. Lemme try with your full config.
>
> Same with your .config:
>
> [ 115.694717] strsep[3027]: segfault at 40066b ip 00007ffff7abe22b sp 00007fffffffe990 error 7 in libc-2.19.so[7ffff7a33000+19f000]
> [ 115.700181] RIP: 0033:[<00007ffff7abe22b>] [<00007ffff7abe22b>] 0x7ffff7abe22b
> [ 115.704843] RSP: 002b:00007fffffffe990 EFLAGS: 00010202
> [ 115.707183] RAX: 000000000040066b RBX: 0000000000400664 RCX: 0000000000000000
> [ 115.709189] RDX: 0000000000000000 RSI: 000000000000003d RDI: 0000000000400665
> [ 115.711207] RBP: 00007fffffffe9b0 R08: 00007ffff7dd7c60 R09: 00007ffff7deae20
> [ 115.713630] R10: 00007fffffffe770 R11: 00007ffff7abe200 R12: 0000000000400460
> [ 115.715653] R13: 00007fffffffeaa0 R14: 0000000000000000 R15: 0000000000000000
> [ 115.717651] FS: 00007ffff7fdc700(0000) GS:ffff88007ed00000(0000) knlGS:0000000000000000
> [ 115.719554] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 115.720393] CR2: 000000000040066b CR3: 0000000079f4f000 CR4: 00000000000406e0
> [ 115.721409] Code: [ 115.721692] 74 33 80 7e 01 00 74 22 48 89 df e8 5a 8a ff ff 48 85 c0 74 20 <c6> 00 00 48 83 c0 01 48 89 45 00 48 89 d8 48 83 c4 08 5b 5d c3 0f b6 13 38 d0 74 29 84 d2 75 15 48 c7 45 00 00 00 00 00 48 83 c4
>
> Is this a real hw issue? I.e., maybe I should not be doing this in a
> guest?
So I think the line breaking artifact might be due to the following commit:
bfd8d3f23b51 ("printk: make reading the kernel log flush pending lines")
... which Linus reverted upstream a few hours ago:
commit f5c9f9c72395c3291c2e35c905dedae2b98475a4
Author: Linus Torvalds <[email protected]>
Date: Mon Nov 14 09:31:52 2016 -0800
Revert "printk: make reading the kernel log flush pending lines"
This reverts commit bfd8d3f23b51018388be0411ccbc2d56277fe294.
It turns out that this flushes things much too aggressiverly, and causes
lines to break up when the system logger races with new continuation
lines being printed.
...
Thanks,
Ingo
On Tue, Nov 15, 2016 at 07:14:47AM +0100, Ingo Molnar wrote:
> So I think the line breaking artifact might be due to the following commit:
>
> bfd8d3f23b51 ("printk: make reading the kernel log flush pending lines")
>
> ... which Linus reverted upstream a few hours ago:
Probably. I guess this doesn't happen when running my guest here because
I have the segfault logging change ontop of -rc4.
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
* Borislav Petkov <[email protected]> wrote:
> On Tue, Nov 15, 2016 at 07:14:47AM +0100, Ingo Molnar wrote:
> > So I think the line breaking artifact might be due to the following commit:
> >
> > bfd8d3f23b51 ("printk: make reading the kernel log flush pending lines")
> >
> > ... which Linus reverted upstream a few hours ago:
>
> Probably. I guess this doesn't happen when running my guest here because
> I have the segfault logging change ontop of -rc4.
Well, bfd8d3f23b51 was part of -rc1 already, so you have it too.
What might have made the difference is that I tested it on real hardware with
serial logging, which delays printk() and might have created the race with the
systemd kernel logger. (Which is impossible to turn off :-( )
Thanks,
Ingo
On Tue, Nov 15, 2016 at 11:19:03AM +0100, Ingo Molnar wrote:
> Well, bfd8d3f23b51 was part of -rc1 already, so you have it too.
Right.
> What might have made the difference is that I tested it on real hardware with
> serial logging, which delays printk() and might have created the race with the
> systemd kernel logger. (Which is impossible to turn off :-( )
Of course it is!
printk.devkmsg=off
:-)
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
* Borislav Petkov <[email protected]> wrote:
> On Tue, Nov 15, 2016 at 11:19:03AM +0100, Ingo Molnar wrote:
> > Well, bfd8d3f23b51 was part of -rc1 already, so you have it too.
>
> Right.
>
> > What might have made the difference is that I tested it on real hardware with
> > serial logging, which delays printk() and might have created the race with the
> > systemd kernel logger. (Which is impossible to turn off :-( )
>
> Of course it is!
>
> printk.devkmsg=off
>
> :-)
LOL, indeed ;-)
Thanks,
Ingo
On Nov 13, 2016 10:18 PM, "Borislav Petkov" <[email protected]> wrote:
>
> On Sun, Nov 13, 2016 at 08:15:01AM -0800, Andy Lutomirski wrote:
> > How about dropping the __ in front of the copy?
>
> In front of __copy_from_user_inatomic()?
>
> If so, is there even a global helper by that name?
>
> $ git grep -E "[^_]copy_from_user_inatomic"
> drivers/gpu/drm/msm/msm_gem_submit.c:98: ret = copy_from_user_inatomic(&submit_bo, userptr, sizeof(submit_bo));
>
> is all I can find.
Huh, go figure. How about access_ok() instead? There's also
pagefault_disable() + copy_from_user().
On Thu, Nov 17, 2016 at 03:33:21PM -0800, Andy Lutomirski wrote:
> Huh, go figure. How about access_ok() instead? There's also
> pagefault_disable() + copy_from_user().
Ah, you don't want to have another #PF during the first one.
How about copy_from_user_nmi()?
It already does __range_not_ok() which is basically access_ok() and
disables #PF around __copy_from_user_inatomic().
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.