2017-07-20 01:43:51

by kernel test robot

[permalink] [raw]
Subject: [lkp-robot] [include/linux/string.h] 6974f0c455: kernel_BUG_at_lib/string.c


FYI, we noticed the following commit:

commit: 6974f0c4555e285ab217cee58b6e874f776ff409 ("include/linux/string.h: add the option of fortified string.h functions")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: boot

on test machine: qemu-system-i386 -enable-kvm -m 256M

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):



[ 8.134860] kernel BUG at lib/string.c:985!
[ 8.134860] kernel BUG at lib/string.c:985!
[ 8.134863] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 8.134863] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 8.134864] Modules linked in:
[ 8.134864] Modules linked in:
[ 8.134867] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.12.0-10998-g6974f0c #2
[ 8.134867] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.12.0-10998-g6974f0c #2
[ 8.134869] task: 8005c000 task.stack: 8005e000
[ 8.134869] task: 8005c000 task.stack: 8005e000
[ 8.134872] EIP: fortify_panic+0xe/0x10
[ 8.134872] EIP: fortify_panic+0xe/0x10
[ 8.134873] EFLAGS: 00210246 CPU: 0
[ 8.134873] EFLAGS: 00210246 CPU: 0
[ 8.134875] EAX: 00000022 EBX: 89d33e08 ECX: 00000003 EDX: 784c6e40
[ 8.134875] EAX: 00000022 EBX: 89d33e08 ECX: 00000003 EDX: 784c6e40
[ 8.134876] ESI: 00000006 EDI: 8be97ffc EBP: 8005fe3c ESP: 8005fe34
[ 8.134876] ESI: 00000006 EDI: 8be97ffc EBP: 8005fe3c ESP: 8005fe34
[ 8.134878] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[ 8.134878] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[ 8.134879] CR0: 80050033 CR2: 00000000 CR3: 0c67a000 CR4: 00000690
[ 8.134879] CR0: 80050033 CR2: 00000000 CR3: 0c67a000 CR4: 00000690
[ 8.134882] Call Trace:
[ 8.134882] Call Trace:
[ 8.134886] arch_prepare_optimized_kprobe+0xd5/0x171
[ 8.134886] arch_prepare_optimized_kprobe+0xd5/0x171
[ 8.134887] ? 0x90a2c000
[ 8.134887] ? 0x90a2c000
[ 8.134889] ? 0x90a2c000
[ 8.134889] ? 0x90a2c000
[ 8.134893] ? 0x90a2c02a
[ 8.134893] ? 0x90a2c02a
[ 8.134894] ? 0x90a2c039
[ 8.134894] ? 0x90a2c039
[ 8.134895] ? 0x90a2c02f
[ 8.134895] ? 0x90a2c02f
[ 8.134898] alloc_aggr_kprobe+0x3a/0x44
[ 8.134898] alloc_aggr_kprobe+0x3a/0x44
[ 8.134900] register_kprobe+0x366/0x3ca
[ 8.134900] register_kprobe+0x366/0x3ca
[ 8.134903] init_test_probes+0x43/0x41e
[ 8.134903] init_test_probes+0x43/0x41e
[ 8.134905] ? register_module_notifier+0xf/0x11
[ 8.134905] ? register_module_notifier+0xf/0x11
[ 8.134909] init_kprobes+0x25d/0x284
[ 8.134909] init_kprobes+0x25d/0x284
[ 8.134912] ? debugfs_kprobe_init+0xee/0xee
[ 8.134912] ? debugfs_kprobe_init+0xee/0xee
[ 8.134914] do_one_initcall+0x7e/0x121
[ 8.134914] do_one_initcall+0x7e/0x121
[ 8.134916] ? parse_args+0x1a8/0x29b
[ 8.134916] ? parse_args+0x1a8/0x29b
[ 8.134921] ? kernel_init_freeable+0xfd/0x1f8
[ 8.134921] ? kernel_init_freeable+0xfd/0x1f8
[ 8.134923] kernel_init_freeable+0x131/0x1f8
[ 8.134923] kernel_init_freeable+0x131/0x1f8
[ 8.134926] ? rest_init+0xe6/0xe6
[ 8.134926] ? rest_init+0xe6/0xe6
[ 8.134928] kernel_init+0x8/0xd0
[ 8.134928] kernel_init+0x8/0xd0
[ 8.134932] ret_from_fork+0x19/0x24
[ 8.134932] ret_from_fork+0x19/0x24
[ 8.134936] Code: e0 83 e1 07 01 d1 39 cb 74 09 89 f0 3a 03 75 05 43 eb f3 31 db 5a 89 d8 5b 5e 5f 5d c3 55 89 e5 50 68 03 df 3e 8c e8 71 8f c9 ff <0f> 0b 55 31 c9 89 e5 56 53 89 d3 89 c6 89 c2 3b 1b 74 16 0f ff
[ 8.134936] Code: e0 83 e1 07 01 d1 39 cb 74 09 89 f0 3a 03 75 05 43 eb f3 31 db 5a 89 d8 5b 5e 5f 5d c3 55 89 e5 50 68 03 df 3e 8c e8 71 8f c9 ff <0f> 0b 55 31 c9 89 e5 56 53 89 d3 89 c6 89 c2 3b 1b 74 16 0f ff
[ 8.134973] EIP: fortify_panic+0xe/0x10 SS:ESP: 0068:8005fe34
[ 8.134973] EIP: fortify_panic+0xe/0x10 SS:ESP: 0068:8005fe34
[ 8.134976] ---[ end trace 897c8f39c79b92e0 ]---


To reproduce:

git clone https://github.com/01org/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> job-script # job-script is attached in this email



Thanks,
Xiaolong


Attachments:
(No filename) (3.98 kB)
config-4.12.0-10998-g6974f0c (90.45 kB)
dmesg.xz (7.95 kB)
Download all attachments

2017-07-20 04:02:27

by Daniel Micay

[permalink] [raw]
Subject: Re: [lkp-robot] [include/linux/string.h] 6974f0c455: kernel_BUG_at_lib/string.c

> [ 8.134886] arch_prepare_optimized_kprobe+0xd5/0x171
> [ 8.134886] arch_prepare_optimized_kprobe+0xd5/0x171

Probably this:

/* Copy arch-dep-instance from template */
memcpy(buf, &optprobe_template_entry, TMPL_END_IDX);

Not a real bug, just technically undefined because these are u32:

typedef u32 kprobe_opcode_t;

extern __visible kprobe_opcode_t optprobe_template_entry;
extern __visible kprobe_opcode_t optprobe_template_val;
extern __visible kprobe_opcode_t optprobe_template_call;
extern __visible kprobe_opcode_t optprobe_template_end;
extern __visible kprobe_opcode_t optprobe_template_sub_sp;
extern __visible kprobe_opcode_t optprobe_template_add_sp;
extern __visible kprobe_opcode_t optprobe_template_restore_begin;
extern __visible kprobe_opcode_t optprobe_template_restore_orig_insn;
extern __visible kprobe_opcode_t optprobe_template_restore_end;

Could be switched to unknown size arrays like optprobe_template_entry[]
but it might be best to just mark the kprobe code with #define
__NO_FORTIFY.

2017-07-20 04:04:27

by Linus Torvalds

[permalink] [raw]
Subject: Re: [lkp-robot] [include/linux/string.h] 6974f0c455: kernel_BUG_at_lib/string.c

Hmm. I wonder why the kernel test robot ends up having that annoying
line doubling for the dmesg.

On Wed, Jul 19, 2017 at 6:42 PM, kernel test robot
<[email protected]> wrote:
>
> FYI, we noticed the following commit:
>
> commit: 6974f0c4555e285ab217cee58b6e874f776ff409 ("include/linux/string.h: add the option of fortified string.h functions")
>
> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):

It does strike me that the fortify_panic() code once again makes life
unnecessarily hard for everybody by using "BUG()"

What the hell is wrong with people? I feel; like I have to say this
multiple times for every single merge window, and sometimes in
between.

BUG() and BUG_ON() are not acceptable debugging things. They kill the
machine. They make for bad debugging.

> [ 8.134860] kernel BUG at lib/string.c:985!

This is basically an entirely useless piece of completely
information-less garbage.

It would have been much nicer if all the fortify_panic() calls had
instead used WARN_ONCE() with helpful pointers to what is going on.

As it is, the full dmesg does show that

detected buffer overflow in memcpy

but since it was printed out separately, if comes before the "-- cut
here --" part and didn't get reported in the summary.

> [ 8.134886] arch_prepare_optimized_kprobe+0xd5/0x171

It's apparently this:

/* Copy arch-dep-instance from template */
memcpy(buf, &optprobe_template_entry, TMPL_END_IDX);

and looking at the code generation, the conditional in the fortify case is

# ./include/linux/string.h:307: if (p_size < size || q_size < size)
cmpq $1, %r13 #, _14
jbe .L109 #,

but it's hard to tell whether that is p_size or q_size. One of them
must be ~0ul (or maybe both are 1) for it to have simplified to that
single conditional.

So the fortify_string code has decided that only a single-byte (or
empty) memcpy is ok.

And that, in turn, seems to be because we're copying from
optprobe_template_entry, which is declared as

extern __visible kprobe_opcode_t optprobe_template_entry;

so the fortify code decides it's a single character.

Does just changing all those things to be declared as arrays fix things?

IOW, a patch something like this white-space damaged mess:

diff --git a/arch/x86/include/asm/kprobes.h b/arch/x86/include/asm/kprobes.h
index 34b984c60790..6cf65437b5e5 100644
--- a/arch/x86/include/asm/kprobes.h
+++ b/arch/x86/include/asm/kprobes.h
@@ -52,10 +52,10 @@ typedef u8 kprobe_opcode_t;
#define flush_insn_slot(p) do { } while (0)

/* optinsn template addresses */
-extern __visible kprobe_opcode_t optprobe_template_entry;
-extern __visible kprobe_opcode_t optprobe_template_val;
-extern __visible kprobe_opcode_t optprobe_template_call;
-extern __visible kprobe_opcode_t optprobe_template_end;
+extern __visible kprobe_opcode_t optprobe_template_entry[];
+extern __visible kprobe_opcode_t optprobe_template_val[];
+extern __visible kprobe_opcode_t optprobe_template_call[];
+extern __visible kprobe_opcode_t optprobe_template_end[];
#define MAX_OPTIMIZED_LENGTH (MAX_INSN_SIZE + RELATIVE_ADDR_SIZE)
#define MAX_OPTINSN_SIZE \
(((unsigned long)&optprobe_template_end - \

Hmm?

Linus

2017-07-20 04:41:41

by Daniel Micay

[permalink] [raw]
Subject: Re: [lkp-robot] [include/linux/string.h] 6974f0c455: kernel_BUG_at_lib/string.c

> So the fortify_string code has decided that only a single-byte (or
> empty) memcpy is ok.
>
> And that, in turn, seems to be because we're copying from
> optprobe_template_entry, which is declared as
>
> extern __visible kprobe_opcode_t optprobe_template_entry;
>
> so the fortify code decides it's a single character.
>
> Does just changing all those things to be declared as arrays fix
> things?

Yeah, that fixes it because GCC will consider the size of 'char foo[]'
unknown (i.e. (size_t)-1 from __builtin_object_size).

GCC doesn't know this essentially constant value at compile-time so it
wasn't a compile-time error:

#define TMPL_END_IDX \
((long)&optprobe_template_end - (long)&optprobe_template_entry)

-fsanitize=object-size works the same way for pointer dereferences so
replacing might fix some issues for CONFIG_UBSAN_SANITIZE_ALL. I guess
that's way too noisy at the moment thus the !COMPILE_TEST.

2017-07-20 13:46:30

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [lkp-robot] [include/linux/string.h] 6974f0c455: kernel_BUG_at_lib/string.c

Hi Linus,

On Wed, 19 Jul 2017 21:04:25 -0700
Linus Torvalds <[email protected]> wrote:

> Hmm. I wonder why the kernel test robot ends up having that annoying
> line doubling for the dmesg.
>
> On Wed, Jul 19, 2017 at 6:42 PM, kernel test robot
> <[email protected]> wrote:
> >
> > FYI, we noticed the following commit:
> >
> > commit: 6974f0c4555e285ab217cee58b6e874f776ff409 ("include/linux/string.h: add the option of fortified string.h functions")
> >
> > caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
>
> It does strike me that the fortify_panic() code once again makes life
> unnecessarily hard for everybody by using "BUG()"
>
> What the hell is wrong with people? I feel; like I have to say this
> multiple times for every single merge window, and sometimes in
> between.
>
> BUG() and BUG_ON() are not acceptable debugging things. They kill the
> machine. They make for bad debugging.
>
> > [ 8.134860] kernel BUG at lib/string.c:985!
>
> This is basically an entirely useless piece of completely
> information-less garbage.
>
> It would have been much nicer if all the fortify_panic() calls had
> instead used WARN_ONCE() with helpful pointers to what is going on.
>
> As it is, the full dmesg does show that
>
> detected buffer overflow in memcpy
>
> but since it was printed out separately, if comes before the "-- cut
> here --" part and didn't get reported in the summary.
>
> > [ 8.134886] arch_prepare_optimized_kprobe+0xd5/0x171
>
> It's apparently this:
>
> /* Copy arch-dep-instance from template */
> memcpy(buf, &optprobe_template_entry, TMPL_END_IDX);
>
> and looking at the code generation, the conditional in the fortify case is
>
> # ./include/linux/string.h:307: if (p_size < size || q_size < size)
> cmpq $1, %r13 #, _14
> jbe .L109 #,
>
> but it's hard to tell whether that is p_size or q_size. One of them
> must be ~0ul (or maybe both are 1) for it to have simplified to that
> single conditional.
>
> So the fortify_string code has decided that only a single-byte (or
> empty) memcpy is ok.
>
> And that, in turn, seems to be because we're copying from
> optprobe_template_entry, which is declared as
>
> extern __visible kprobe_opcode_t optprobe_template_entry;
>
> so the fortify code decides it's a single character.

Ah, right, that seems a single character, yes.

> Does just changing all those things to be declared as arrays fix things?

OK, and I also consider to replace the memcpy with probe_kernel_read()
as I did it in commit ea1e34fc366b ("kprobes/x86: Use probe_kernel_read()
instead of memcpy()"), since it reads memory in kernel-text area.
(Hmm, why I didn't change this memcpy too ...?)
It also fix that as a side effect.

Thank you,

>
> IOW, a patch something like this white-space damaged mess:
>
> diff --git a/arch/x86/include/asm/kprobes.h b/arch/x86/include/asm/kprobes.h
> index 34b984c60790..6cf65437b5e5 100644
> --- a/arch/x86/include/asm/kprobes.h
> +++ b/arch/x86/include/asm/kprobes.h
> @@ -52,10 +52,10 @@ typedef u8 kprobe_opcode_t;
> #define flush_insn_slot(p) do { } while (0)
>
> /* optinsn template addresses */
> -extern __visible kprobe_opcode_t optprobe_template_entry;
> -extern __visible kprobe_opcode_t optprobe_template_val;
> -extern __visible kprobe_opcode_t optprobe_template_call;
> -extern __visible kprobe_opcode_t optprobe_template_end;
> +extern __visible kprobe_opcode_t optprobe_template_entry[];
> +extern __visible kprobe_opcode_t optprobe_template_val[];
> +extern __visible kprobe_opcode_t optprobe_template_call[];
> +extern __visible kprobe_opcode_t optprobe_template_end[];
> #define MAX_OPTIMIZED_LENGTH (MAX_INSN_SIZE + RELATIVE_ADDR_SIZE)
> #define MAX_OPTINSN_SIZE \
> (((unsigned long)&optprobe_template_end - \
>
> Hmm?
>
> Linus


--
Masami Hiramatsu <[email protected]>

2017-07-20 15:51:46

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [lkp-robot] [include/linux/string.h] 6974f0c455: kernel_BUG_at_lib/string.c

On Wed, 19 Jul 2017 21:04:25 -0700
Linus Torvalds <[email protected]> wrote:

> Hmm. I wonder why the kernel test robot ends up having that annoying
> line doubling for the dmesg.
>
> On Wed, Jul 19, 2017 at 6:42 PM, kernel test robot
> <[email protected]> wrote:
> >
> > FYI, we noticed the following commit:
> >
> > commit: 6974f0c4555e285ab217cee58b6e874f776ff409 ("include/linux/string.h: add the option of fortified string.h functions")
> >
> > caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
>
> It does strike me that the fortify_panic() code once again makes life
> unnecessarily hard for everybody by using "BUG()"
>
> What the hell is wrong with people? I feel; like I have to say this
> multiple times for every single merge window, and sometimes in
> between.
>
> BUG() and BUG_ON() are not acceptable debugging things. They kill the
> machine. They make for bad debugging.
>
> > [ 8.134860] kernel BUG at lib/string.c:985!
>
> This is basically an entirely useless piece of completely
> information-less garbage.
>
> It would have been much nicer if all the fortify_panic() calls had
> instead used WARN_ONCE() with helpful pointers to what is going on.
>
> As it is, the full dmesg does show that
>
> detected buffer overflow in memcpy
>
> but since it was printed out separately, if comes before the "-- cut
> here --" part and didn't get reported in the summary.
>
> > [ 8.134886] arch_prepare_optimized_kprobe+0xd5/0x171
>
> It's apparently this:
>
> /* Copy arch-dep-instance from template */
> memcpy(buf, &optprobe_template_entry, TMPL_END_IDX);
>
> and looking at the code generation, the conditional in the fortify case is
>
> # ./include/linux/string.h:307: if (p_size < size || q_size < size)
> cmpq $1, %r13 #, _14
> jbe .L109 #,
>
> but it's hard to tell whether that is p_size or q_size. One of them
> must be ~0ul (or maybe both are 1) for it to have simplified to that
> single conditional.
>
> So the fortify_string code has decided that only a single-byte (or
> empty) memcpy is ok.
>
> And that, in turn, seems to be because we're copying from
> optprobe_template_entry, which is declared as
>
> extern __visible kprobe_opcode_t optprobe_template_entry;
>
> so the fortify code decides it's a single character.
>
> Does just changing all those things to be declared as arrays fix things?
>

BTW, I've confirmed this works.

Acked-by: Masami Hiramatsu <[email protected]>
Tested-by: Masami Hiramatsu <[email protected]>

Thank you!

> IOW, a patch something like this white-space damaged mess:
>
> diff --git a/arch/x86/include/asm/kprobes.h b/arch/x86/include/asm/kprobes.h
> index 34b984c60790..6cf65437b5e5 100644
> --- a/arch/x86/include/asm/kprobes.h
> +++ b/arch/x86/include/asm/kprobes.h
> @@ -52,10 +52,10 @@ typedef u8 kprobe_opcode_t;
> #define flush_insn_slot(p) do { } while (0)
>
> /* optinsn template addresses */
> -extern __visible kprobe_opcode_t optprobe_template_entry;
> -extern __visible kprobe_opcode_t optprobe_template_val;
> -extern __visible kprobe_opcode_t optprobe_template_call;
> -extern __visible kprobe_opcode_t optprobe_template_end;
> +extern __visible kprobe_opcode_t optprobe_template_entry[];
> +extern __visible kprobe_opcode_t optprobe_template_val[];
> +extern __visible kprobe_opcode_t optprobe_template_call[];
> +extern __visible kprobe_opcode_t optprobe_template_end[];
> #define MAX_OPTIMIZED_LENGTH (MAX_INSN_SIZE + RELATIVE_ADDR_SIZE)
> #define MAX_OPTINSN_SIZE \
> (((unsigned long)&optprobe_template_end - \
>
> Hmm?
>
> Linus


--
Masami Hiramatsu <[email protected]>

2017-07-20 18:41:41

by Linus Torvalds

[permalink] [raw]
Subject: Re: [lkp-robot] [include/linux/string.h] 6974f0c455: kernel_BUG_at_lib/string.c

On Thu, Jul 20, 2017 at 8:51 AM, Masami Hiramatsu <[email protected]> wrote:
>
> Acked-by: Masami Hiramatsu <[email protected]>
> Tested-by: Masami Hiramatsu <[email protected]>

Ok, I committed that patch as-is.

Other architectures may end up with the same issue, unless they really
happen to have just a single instruction that fits in their
kprobe_opcode_t type in their templates. But I didn't touch any of
their (similar) code, since it will need testing.

Linus

2017-07-20 22:20:12

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [lkp-robot] [include/linux/string.h] 6974f0c455: kernel_BUG_at_lib/string.c

On Thu, 20 Jul 2017 11:41:38 -0700
Linus Torvalds <[email protected]> wrote:

> On Thu, Jul 20, 2017 at 8:51 AM, Masami Hiramatsu <[email protected]> wrote:
> >
> > Acked-by: Masami Hiramatsu <[email protected]>
> > Tested-by: Masami Hiramatsu <[email protected]>
>
> Ok, I committed that patch as-is.
>
> Other architectures may end up with the same issue, unless they really
> happen to have just a single instruction that fits in their
> kprobe_opcode_t type in their templates. But I didn't touch any of
> their (similar) code, since it will need testing.

OK, I'll check and fix those.

Thank you!

--
Masami Hiramatsu <[email protected]>

2017-07-21 02:00:28

by kernel test robot

[permalink] [raw]
Subject: Re: [lkp-robot] [include/linux/string.h] 6974f0c455: kernel_BUG_at_lib/string.c

Hi,

On 07/19, Linus Torvalds wrote:
>Hmm. I wonder why the kernel test robot ends up having that annoying
>line doubling for the dmesg.
>

Hmm, this line doubling issue should be caused by we set both
'earlyprintk=ttyS0,115200' and 'console=ttyS0,115200' in cmdline, after I
remove any of it, this issue is gone, is it an inappropriate setting?

FYI, the whole kernel command line is:

ip=::::vm-lkp-nhm-dp1-yocto-i386-7::dhcp
root=/dev/ram0
user=lkp
job=/lkp/scheduled/vm-lkp-nhm-dp1-yocto-i386-7/boot-1-yocto-tiny-i386-2016-04-22.cgz-6974f0c4555e285ab217cee58b6e874f776ff409-20170717-28286-131pyzw-0.yaml
ARCH=i386
kconfig=i386-randconfig-w0-07171351
branch=linus/master
commit=6974f0c4555e285ab217cee58b6e874f776ff409
BOOT_IMAGE=/pkg/linux/i386-randconfig-w0-07171351/gcc-6/6974f0c4555e285ab217cee58b6e874f776ff409/vmlinuz-4.12.0-10998-g6974f0c
max_uptime=600
RESULT_ROOT=/result/boot/1/vm-lkp-nhm-dp1-yocto-i386/yocto-tiny-i386-2016-04-22.cgz/i386-randconfig-w0-07171351/gcc-6/6974f0c4555e285ab217cee58b6e874f776ff409/0
LKP_SERVER=inn
debug
apic=debug
sysrq_always_enabled
rcupdate.rcu_cpu_stall_timeout=100
net.ifnames=0
printk.devkmsg=on
panic=-1
softlockup_panic=1
nmi_watchdog=panic
oops=panic
load_ramdisk=2
prompt_ramdisk=0
drbd.minor_count=8
systemd.log_level=err
ignore_loglevel
earlyprintk=ttyS0,115200
console=ttyS0,115200
console=tty0
vga=

Thanks,
Xiaolong

2017-07-21 03:29:56

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [lkp-robot] [include/linux/string.h] 6974f0c455: kernel_BUG_at_lib/string.c

On Thu, 20 Jul 2017 11:41:38 -0700
Linus Torvalds <[email protected]> wrote:

> On Thu, Jul 20, 2017 at 8:51 AM, Masami Hiramatsu <[email protected]> wrote:
> >
> > Acked-by: Masami Hiramatsu <[email protected]>
> > Tested-by: Masami Hiramatsu <[email protected]>
>
> Ok, I committed that patch as-is.
>
> Other architectures may end up with the same issue, unless they really
> happen to have just a single instruction that fits in their
> kprobe_opcode_t type in their templates. But I didn't touch any of
> their (similar) code, since it will need testing.

Similar optprobe code are in arm and powerpc, arm does not support
FORTIFY_SOURCE (yet), and powerpc already fixed that same way.
So I'll make a same fix on arm before traped to the same issue.

Thank you,

--
Masami Hiramatsu <[email protected]>

2017-07-21 05:48:04

by Kees Cook

[permalink] [raw]
Subject: Re: [lkp-robot] [include/linux/string.h] 6974f0c455: kernel_BUG_at_lib/string.c

On Thu, Jul 20, 2017 at 6:59 PM, Ye Xiaolong <[email protected]> wrote:
> On 07/19, Linus Torvalds wrote:
>>Hmm. I wonder why the kernel test robot ends up having that annoying
>>line doubling for the dmesg.
>>
>
> Hmm, this line doubling issue should be caused by we set both
> 'earlyprintk=ttyS0,115200' and 'console=ttyS0,115200' in cmdline, after I
> remove any of it, this issue is gone, is it an inappropriate setting?

This is a weird behavior. Normally the boot console (in this case
"earlyser0") should be disabled when the regular console starts. In
your dmesg I see:

[ 0.000000] bootconsole [earlyser0] enabled
...
[ 0.003333] console [ttyS0] enabled
[ 0.003333] console [ttyS0] enabled

This is where repeating starts, and I would have expected to see:

[ 0.000000] bootconsole [earlyser0] enabled
...
[ 0.000000] console [ttyS0] enabled
[ 0.000000] console [ttyS0] enabled
[ 0.000000] bootconsole [earlyser0] disabled
[ 0.000000] bootconsole [earlyser0] disabled

And that would be the end of doubling.

Actually, with your command-line ("earlyprintk=ttyS0,115200
console=ttyS0,115200 console=tty0") I would expect:

[ 0.000000] console [tty0] enabled
[ 0.000000] bootconsole [earlyser0] disabled
[ 0.000000] console [ttyS0] enabled

I don't see any mention of tty0, though. I wonder if that is confusing
the disable code. The only way I would normally expect the boot
console not to get unregistered is if ",keep" was appended to it...
The code is in kernel/printk/printk.c, FWIW.

Also, it looks like your kernel command line is cut off:

... earlyprintk=ttyS0,115200 console=ttyS0,115200 console=tty0 vga=

With your script showing "... vga=normal ..."

That's only 968 characters, well shy of x86's 2048 limit. Is this a
line length issue with qemu output, or does /proc/cmdline report the
same truncation?

-Kees

--
Kees Cook
Pixel Security

2017-07-21 09:15:52

by Andy Shevchenko

[permalink] [raw]
Subject: Re: [lkp-robot] [include/linux/string.h] 6974f0c455: kernel_BUG_at_lib/string.c

On Fri, 2017-07-21 at 09:59 +0800, Ye Xiaolong wrote:
> Hi,
>
> On 07/19, Linus Torvalds wrote:
> > Hmm. I wonder why the kernel test robot ends up having that annoying
> > line doubling for the dmesg.
> >
>
> Hmm, this line doubling issue should be caused by we set both
> 'earlyprintk=ttyS0,115200' and 'console=ttyS0,115200' in cmdline,
> after I
> remove any of it, this issue is gone, is it an inappropriate setting?


I think this problem is being discussed for a long time.

+Cc: Petr and Sergey

>
> FYI, the whole kernel command line is:
>
> ip=::::vm-lkp-nhm-dp1-yocto-i386-7::dhcp
> root=/dev/ram0
> user=lkp
> job=/lkp/scheduled/vm-lkp-nhm-dp1-yocto-i386-7/boot-1-yocto-tiny-i386-
> 2016-04-22.cgz-6974f0c4555e285ab217cee58b6e874f776ff409-20170717-
> 28286-131pyzw-0.yaml
> ARCH=i386
> kconfig=i386-randconfig-w0-07171351
> branch=linus/master
> commit=6974f0c4555e285ab217cee58b6e874f776ff409
> BOOT_IMAGE=/pkg/linux/i386-randconfig-w0-07171351/gcc-
> 6/6974f0c4555e285ab217cee58b6e874f776ff409/vmlinuz-4.12.0-10998-
> g6974f0c
> max_uptime=600
> RESULT_ROOT=/result/boot/1/vm-lkp-nhm-dp1-yocto-i386/yocto-tiny-i386-
> 2016-04-22.cgz/i386-randconfig-w0-07171351/gcc-
> 6/6974f0c4555e285ab217cee58b6e874f776ff409/0
> LKP_SERVER=inn
> debug
> apic=debug
> sysrq_always_enabled
> rcupdate.rcu_cpu_stall_timeout=100
> net.ifnames=0
> printk.devkmsg=on
> panic=-1
> softlockup_panic=1
> nmi_watchdog=panic
> oops=panic
> load_ramdisk=2
> prompt_ramdisk=0
> drbd.minor_count=8
> systemd.log_level=err
> ignore_loglevel
> earlyprintk=ttyS0,115200
> console=ttyS0,115200
> console=tty0
> vga=
>
> Thanks,
> Xiaolong

--
Andy Shevchenko <[email protected]>
Intel Finland Oy

2017-07-21 10:03:46

by Petr Mladek

[permalink] [raw]
Subject: Re: [lkp-robot] [include/linux/string.h] 6974f0c455: kernel_BUG_at_lib/string.c

On Fri 2017-07-21 12:15:09, Andy Shevchenko wrote:
> On Fri, 2017-07-21 at 09:59 +0800, Ye Xiaolong wrote:
> > Hi,
> >
> > On 07/19, Linus Torvalds wrote:
> > > Hmm. I wonder why the kernel test robot ends up having that annoying
> > > line doubling for the dmesg.
> > >
> >
> > Hmm, this line doubling issue should be caused by we set both
> > 'earlyprintk=ttyS0,115200' and 'console=ttyS0,115200' in cmdline,
> > after I
> > remove any of it, this issue is gone, is it an inappropriate setting?
> >
> > FYI, the whole kernel command line is:
> >
> > earlyprintk=ttyS0,115200
> > console=ttyS0,115200
> > console=tty0

The order is important here. Early consoles are disabled when
the preferred console (last on the command line) is registered.
Therefore the following order should fix it:

console=tty0
earlyprintk=ttyS0,115200
console=ttyS0,115200


Best Regards,
Petr

PS: I have recently spent a lot of time trying to understad this
part of the code. Unfortunately, it is not easy to improve.
The same console might be registered by various aliases.
Therefore it is not easy to check which early console match
a real one. There are match() functions but they have side
effects. It would require some non-trivial redesign.

I am working on some partial clean that would make the
code easier but it is really hard to avoid regressions.

2017-07-25 23:35:46

by Kees Cook

[permalink] [raw]
Subject: Re: [lkp-robot] [include/linux/string.h] 6974f0c455: kernel_BUG_at_lib/string.c

(Finally getting back around to this thread, sorry for the delay...)

On Wed, Jul 19, 2017 at 9:04 PM, Linus Torvalds
<[email protected]> wrote:
> As it is, the full dmesg does show that
>
> detected buffer overflow in memcpy
>
> but since it was printed out separately, if comes before the "-- cut
> here --" part and didn't get reported in the summary.

Replying to this first: this is actually a long-standing problem, and
this reporting actually matches how WARNs appear. All the text from
WARN ends up above the --cut-- line too, which is frustrating to say
the least. For example, here's the kind of thing I saw while testing
refcount overflows earlier today:

[ 11.971447] refcount_t: saturated; leaking memory.
[ 11.972362] ------------[ cut here ]------------
[ 11.973222] WARNING: CPU: 3 PID: 2181 at lib/refcount.c:132
refcount_inc_not_zero+0x48/0x60

Which is from:

WARN_ONCE(new == UINT_MAX, "refcount_t: saturated; leaking memory.\n");

Following include/asm-generic/bug.h, it's because WARN(1, "eek\n") becomes:

printk("eek\n"); __WARN();

Which, IIUC, is mostly because WARN (and BUG) is handled as a trap. We
don't plumb the printk into a deeper level (or rather, we don't raise
the --cut-- line to the top).

The "cut here" comes from lib/bug.c's report_bug and gets duplicated
for the WARN path in kernel/panic.c's __warn():

enum bug_trap_type report_bug(unsigned long bugaddr, struct pt_regs *regs)
...
if (warning) {
/* this is a WARN_ON rather than BUG/BUG_ON */
__warn(file, line, (void *)bugaddr, BUG_GET_TAINT(bug), regs,
NULL);
return BUG_TRAP_TYPE_WARN;
}

printk(KERN_DEFAULT "------------[ cut here ]------------\n");
...

void __warn(const char *file, int line, void *caller, unsigned taint,
struct pt_regs *regs, struct warn_args *args)
{
disable_trace_on_warning();

pr_warn("------------[ cut here ]------------\n");
...

BUG doesn't even have a way to report text, and the warn paths don't
know if it came through WARN_ON (without text) or WARN (with text).
I've never been able to see a sensible way out of this. BUG entirely
lacks text, and WARN puts it in the wrong place. >_<

And, as an aside, I notice that the internal printk in WARN doesn't
include a KERN_WARN by default, and I think I remember having this
discussion a while back, where I see nearly everything doesn't set it,
but some do:

$ git grep -E 'WARN(_ON)?(_RATELIMIT)?(_ONCE)?\(' | grep -v KERN | wc -l
14465
$ git grep -E 'WARN(_ON)?(_RATELIMIT)?(_ONCE)?\(' | grep KERN | wc -l
87

> It would have been much nicer if all the fortify_panic() calls had
> instead used WARN_ONCE() with helpful pointers to what is going on.

In this case, there isn't a sensible way to continue. For other
things, like refcount hardening, we can WARN and then saturate the
refcount and continue running, fully avoiding the vulnerable condition
(use-after-free). Here, we're dealing with a runtime check that knows
the buffer size and we've been handed something that has turned out to
be too large, but it's too late (e.g. memcpy, strcpy, etc don't have a
way to return an error). If this was switched to a WARN, we'd be in a
situation where we can't truncate, skip the copy, or any other partial
action, as those each might lead to some other bad situation.

All that said, looking through the fortified functions again, it is
arguable that we could let kmemdup, memscan, memchr, and memchr_inv
each return NULL and WARN. I'm nervous about the scan/chr functions
being switched to "didn't find it" if that leads to logic errors,
though. Having kmemdup fail should already be a well-handled error
path...

-Kees

--
Kees Cook
Pixel Security

2017-07-26 00:11:45

by Linus Torvalds

[permalink] [raw]
Subject: Re: [lkp-robot] [include/linux/string.h] 6974f0c455: kernel_BUG_at_lib/string.c

On Tue, Jul 25, 2017 at 4:35 PM, Kees Cook <[email protected]> wrote:
>
> In this case, there isn't a sensible way to continue.

Kees, stop this idiocy already.

These have been FALSE POSITIVES. They haven't actually been bugs in
the code, they have been bugs in the *checking* code.

In two years, when this code is actually trusted, that would be one thing.

But right now, it's a f*cking disgrace that you are in denial about
the fact that it's the *checking* that is broken, not the code, and
are making excuses for shit.

That BUG() is broken. Claiming that there was no sane way to continue
is complete and utter bullshit.

Seriously, this is the kind of utter garbage that drives me bonkers.
Introducing new code that kills a machine, and then not owning the
fact that it was *your* code that was broken, and instead saying "but
but we HAD to kill the machine".

So get rid of the BUG(), and get rid of the excuses.

We *know* this code is likely to find these kinds of "not really a
bug, but the checker code does something we didn't used to do"
situations.

And even *if* it is a bug, right now we're so much better off having
it *reported* even if you don't have a serial console, that it's not
even funny. We want things like abortd (or whatever that thing is
called) gather reports and sending them to vendors. We want people to
be able to just run "dmesg" and get the information. We do *not* want
to possibly have a dead machine that took a BUG in an interrupt, and
as a result nothing works any more.

So instead of killing the machine, the code should damn well *warn*
(and be rate-limited about it too!). That way we're going to actually
get reports that can improve the code, instead of having people

(a) with dead machines and no idea why

(b) nervous about enabling the debug code that can make things so much worse.

Comprende?

None of this "there is no way to continue" bullshit. Because it is
pure and utter SHIT.

Linus

2017-07-26 00:38:14

by Kees Cook

[permalink] [raw]
Subject: Re: [lkp-robot] [include/linux/string.h] 6974f0c455: kernel_BUG_at_lib/string.c

On Tue, Jul 25, 2017 at 5:11 PM, Linus Torvalds
<[email protected]> wrote:
> On Tue, Jul 25, 2017 at 4:35 PM, Kees Cook <[email protected]> wrote:
>>
>> In this case, there isn't a sensible way to continue.
>
> Kees, stop this idiocy already.
>
> These have been FALSE POSITIVES. They haven't actually been bugs in
> the code, they have been bugs in the *checking* code.
>
> In two years, when this code is actually trusted, that would be one thing.

Okay, fair enough. As long as there is a time horizon where this can
operate as an actual runtime protection, I can accept that.

> But right now, it's a f*cking disgrace that you are in denial about
> the fact that it's the *checking* that is broken, not the code, and
> are making excuses for shit.

I'm not in denial about that -- I think we just have very different
perspectives on this. I sent a bunch of patches to fix all the places
where there were false positives being found before this landed; I'm
not making excuses and I'm obviously interested in fixing it.

> So get rid of the BUG(), and get rid of the excuses.

I'll get it fixed up.

> We *know* this code is likely to find these kinds of "not really a
> bug, but the checker code does something we didn't used to do"
> situations.

Yup, agreed. We've already fixed a bunch of these, and while this
checking would catch some actual security vulnerabilities from the
past, I can see your point about its "newness" being too great a risk.

-Kees

--
Kees Cook
Pixel Security

2017-07-26 02:37:57

by Daniel Micay

[permalink] [raw]
Subject: Re: [lkp-robot] [include/linux/string.h] 6974f0c455: kernel_BUG_at_lib/string.c

It was known that there are going to be bugs to work through, many of
them relatively benign like the leaks of data near string constants
(probably other string constants) in rodata. It makes sense to have it
default to WARN with BUG / noreturn as a non-default configuration
option for it, I guess with !COMPILE_TEST like UBSAN_SANITIZE_ALL. I
don't think there's any sane way to bound the length of either reads /
writes. It needs to either WARN + continue on into doing the overflow or
use BUG. Trying to correct it might make things worse and would make
this more complicated / bug-prone. It already has enough subtle edge
cases to deal with.

I think 'benign' is a better term than 'false positive' because there
hasn't been a non-bug found yet. They're mostly not security vulns but
they're undefined behavior.