2022-05-17 08:49:12

by Kees Cook

Subject: [PATCH] sched/core: Do not treat class list boundary markers as arrays

GCC 12 is very sensitive about array checking, and views all negative
array accesses as unsafe (a not unreasonable position). Avoid the
warnings about __begin_sched_classes being accessed via negative bounds
by converting the class list boundary markers to the pointers they
actually are. Silences this warning:

In file included from kernel/sched/core.c:81:
kernel/sched/core.c: In function ‘set_rq_online.part.0’:
kernel/sched/sched.h:2197:52: error: array subscript -1 is outside array bounds of ‘struct sched_class[44343134792571037]’ [-Werror=array-bounds]
2197 | #define sched_class_lowest (__begin_sched_classes - 1)
| ^
kernel/sched/sched.h:2200:41: note: in definition of macro ‘for_class_range’
2200 | for (class = (_from); class != (_to); class--)
| ^~~
kernel/sched/sched.h:2203:53: note: in expansion of macro ‘sched_class_lowest’
2203 | for_class_range(class, sched_class_highest, sched_class_lowest)
| ^~~~~~~~~~~~~~~~~~
kernel/sched/core.c:9115:17: note: in expansion of macro ‘for_each_class’
9115 | for_each_class(class) {
| ^~~~~~~~~~~~~~
kernel/sched/sched.h:2193:27: note: at offset -208 into object ‘__begin_sched_classes’ of size [0, 9223372036854775807]
2193 | extern struct sched_class __begin_sched_classes[];
| ^~~~~~~~~~~~~~~~~~~~~

Reported-by: Christophe de Dinechin <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]/
Cc: Ingo Molnar <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Vincent Guittot <[email protected]>
Cc: Dietmar Eggemann <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Ben Segall <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Daniel Bristot de Oliveira <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
---
kernel/sched/sched.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 8dccb34eb190..3d31ed9d33fa 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2190,8 +2190,8 @@ const struct sched_class name##_sched_class \
__section("__" #name "_sched_class")

/* Defined in include/asm-generic/vmlinux.lds.h */
-extern struct sched_class __begin_sched_classes[];
-extern struct sched_class __end_sched_classes[];
+extern struct sched_class *__begin_sched_classes;
+extern struct sched_class *__end_sched_classes;

#define sched_class_highest (__end_sched_classes - 1)
#define sched_class_lowest (__begin_sched_classes - 1)
--
2.32.0
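
A note on the declaration change, as a minimal sketch rather than kernel code: a linker-script boundary symbol has an address but no storage of its own. Declared as an array, the name evaluates to that address; declared as a pointer, the compiler instead loads a pointer-sized value from that address. The example below imitates both readings in ordinary C. The names demo_class, demo_classes, symbol_as_array and symbol_as_pointer are hypothetical, and the static array only stands in for the section the linker would lay out.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct sched_class; NOT the kernel's layout. */
struct demo_class {
	uintptr_t tag;	/* first pointer-sized field of the entry */
	int prio;
};

/* Stand-in for the contents of the linker-placed class section. */
struct demo_class demo_classes[3] = {
	{ 0x1234, 0 },
	{ 0x5678, 1 },
	{ 0x9abc, 2 },
};

/*
 * Reading 1: `extern struct demo_class sym[];`
 * The name decays to the symbol's address, i.e. the start of the table.
 */
const struct demo_class *symbol_as_array(void)
{
	return demo_classes;
}

/*
 * Reading 2: `extern struct demo_class *sym;`
 * The compiler treats the symbol's address as the location of a pointer
 * variable and loads sizeof(void *) bytes from it. Here that load is
 * imitated with memcpy() so the sketch stays well-defined.
 */
uintptr_t symbol_as_pointer(void)
{
	uintptr_t loaded;

	memcpy(&loaded, demo_classes, sizeof(loaded));
	return loaded;
}
```

Under these assumptions symbol_as_pointer() returns the first field of the first entry (0x1234 here) rather than the table's address, so arithmetic such as `sym - 1` on the pointer-declared form starts from whatever bytes happen to sit at the start of the section.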



2022-05-20 07:28:42

by kernel test robot

Subject: [sched/core] 4eb47d360b: BUG:unable_to_handle_page_fault_for_address



Greetings,

FYI, we noticed the following commit (built with gcc-11):

commit: 4eb47d360bbd379fc8f51fb5a00281bcb6e83e5a ("[PATCH] sched/core: Do not treat class list boundary markers as arrays")
url: https://github.com/intel-lab-lkp/linux/commits/Kees-Cook/sched-core-Do-not-treat-class-list-boundary-markers-as-arrays/20220517-035158
base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git 734387ec2f9d77b00276042b1fa7c95f48ee879d
patch link: https://lore.kernel.org/lkml/[email protected]

in testcase: boot

on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

caused the changes below (please refer to the attached dmesg/kmsg for the entire log/backtrace):


+---------------------------------------------+------------+------------+
|                                             | 734387ec2f | 4eb47d360b |
+---------------------------------------------+------------+------------+
| boot_successes                              | 102        | 0          |
| boot_failures                               | 0          | 104        |
| BUG:unable_to_handle_page_fault_for_address | 0          | 104        |
| Oops:#[##]                                  | 0          | 104        |
| RIP:set_rq_online                           | 0          | 104        |
| Kernel_panic-not_syncing:Fatal_exception    | 0          | 104        |
+---------------------------------------------+------------+------------+


If you fix the issue, kindly add the following tag
Reported-by: kernel test robot <[email protected]>


[ 0.236912][ T0] BUG: unable to handle page fault for address: ffffcd3a3fffffa0
[ 0.237849][ T0] #PF: supervisor read access in kernel mode
[ 0.238589][ T0] #PF: error_code(0x0000) - not-present page
[ 0.239306][ T0] PGD 43ffc1067 P4D 43ffc1067 PUD 0
[ 0.239970][ T0] Oops: 0000 [#1] SMP PTI
[ 0.240499][ T0] CPU: 0 PID: 0 Comm: swapper Not tainted 5.18.0-rc5-00021-g4eb47d360bbd #1
[ 0.241574][ T0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
[ 0.242854][ T0] RIP: 0010:set_rq_online (kernel/sched/core.c:9139)
[ 0.243582][ T0] Code: 23 51 01 48 8b 15 30 1f 51 01 c7 87 64 0a 00 00 01 00 00 00 48 8d 98 30 ff ff ff 48 8d 82 30 ff ff ff 48 39 c3 74 2e 49 89 fc <48> 8b 43 70 48 85 c0 74 0f 4c 89 e7 e8 9f 1e ef 00 48 8b 15 f8 1e
All code
========
0: 23 51 01 and 0x1(%rcx),%edx
3: 48 8b 15 30 1f 51 01 mov 0x1511f30(%rip),%rdx # 0x1511f3a
a: c7 87 64 0a 00 00 01 movl $0x1,0xa64(%rdi)
11: 00 00 00
14: 48 8d 98 30 ff ff ff lea -0xd0(%rax),%rbx
1b: 48 8d 82 30 ff ff ff lea -0xd0(%rdx),%rax
22: 48 39 c3 cmp %rax,%rbx
25: 74 2e je 0x55
27: 49 89 fc mov %rdi,%r12
2a:* 48 8b 43 70 mov 0x70(%rbx),%rax <-- trapping instruction
2e: 48 85 c0 test %rax,%rax
31: 74 0f je 0x42
33: 4c 89 e7 mov %r12,%rdi
36: e8 9f 1e ef 00 callq 0xef1eda
3b: 48 rex.W
3c: 8b .byte 0x8b
3d: 15 .byte 0x15
3e: f8 clc
3f: 1e (bad)

Code starting with the faulting instruction
===========================================
0: 48 8b 43 70 mov 0x70(%rbx),%rax
4: 48 85 c0 test %rax,%rax
7: 74 0f je 0x18
9: 4c 89 e7 mov %r12,%rdi
c: e8 9f 1e ef 00 callq 0xef1eb0
11: 48 rex.W
12: 8b .byte 0x8b
13: 15 .byte 0x15
14: f8 clc
15: 1e (bad)
[ 0.246022][ T0] RSP: 0000:ffffffffa5203e98 EFLAGS: 00010087
[ 0.246764][ T0] RAX: ffffffffffffff30 RBX: ffffcd3a3fffff30 RCX: 00000000fffb6c20
[ 0.247655][ T0] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff89f6afc2b540
[ 0.248642][ T0] RBP: ffffffffa5203ea8 R08: 0000000000000000 R09: ffff89f380058770
[ 0.249633][ T0] R10: 0000000000000000 R11: 000000000000009c R12: ffff89f6afc2b540
[ 0.250639][ T0] R13: 0000000000000046 R14: 000000000002b540 R15: ffff89f6afc2b780
[ 0.251602][ T0] FS: 0000000000000000(0000) GS:ffff89f6afc00000(0000) knlGS:0000000000000000
[ 0.252698][ T0] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.253505][ T0] CR2: ffffcd3a3fffffa0 CR3: 000000010560a000 CR4: 00000000000406b0
[ 0.254483][ T0] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 0.255421][ T0] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 0.256369][ T0] Call Trace:
[ 0.256763][ T0] <TASK>
[ 0.257113][ T0] rq_attach_root (kernel/sched/topology.c:493)
[ 0.257677][ T0] sched_init (kernel/sched/core.c:9601)
[ 0.258210][ T0] start_kernel (arch/x86/include/asm/irqflags.h:29 arch/x86/include/asm/irqflags.h:70 init/main.c:1000)
[ 0.258737][ T0] secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:303)
[ 0.259432][ T0] </TASK>
[ 0.259783][ T0] Modules linked in:
[ 0.260245][ T0] CR2: ffffcd3a3fffffa0
[ 0.260732][ T0] ---[ end trace 0000000000000000 ]---
[ 0.261373][ T0] RIP: 0010:set_rq_online (kernel/sched/core.c:9139)
[ 0.262072][ T0] Code: 23 51 01 48 8b 15 30 1f 51 01 c7 87 64 0a 00 00 01 00 00 00 48 8d 98 30 ff ff ff 48 8d 82 30 ff ff ff 48 39 c3 74 2e 49 89 fc <48> 8b 43 70 48 85 c0 74 0f 4c 89 e7 e8 9f 1e ef 00 48 8b 15 f8 1e
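
The register dump can be cross-checked with a little arithmetic. This sketch assumes, from the disassembly above, that 0xd0 (208 bytes, matching the "offset -208" in the GCC 12 diagnostic) is sizeof(struct sched_class) and that 0x70 is the offset of the field being tested by the trapping instruction. The helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed from `lea -0xd0(%rax),%rbx`: one class entry is 208 bytes. */
#define CLASS_SIZE	UINT64_C(0xd0)
/* Assumed from `mov 0x70(%rbx),%rax`: offset of the field being read. */
#define FIELD_OFF	UINT64_C(0x70)

/* `boundary - 1` in units of one class entry, with 64-bit wraparound. */
uint64_t class_before(uint64_t boundary)
{
	return boundary - CLASS_SIZE;
}

/* Inverse of class_before(): recover the boundary value from RBX. */
uint64_t boundary_from_class(uint64_t class_ptr)
{
	return class_ptr + CLASS_SIZE;
}

/* Address touched by the trapping load for a given class pointer. */
uint64_t field_address(uint64_t class_ptr)
{
	return class_ptr + FIELD_OFF;
}
```

Feeding RBX (ffffcd3a3fffff30) to field_address() gives ffffcd3a3fffffa0, which is the CR2 value, and boundary_from_class() gives ffffcd3a40000000 as the value the loop started from. class_before(0) gives ffffffffffffff30, which is the RAX value, with RDX being 0. Both would be consistent with the boundary symbols being read as stored pointer values rather than taken as addresses.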


To reproduce:

# build kernel
cd linux
cp config-5.18.0-rc5-00021-g4eb47d360bbd .config
make HOSTCC=gcc-11 CC=gcc-11 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage modules
make HOSTCC=gcc-11 CC=gcc-11 ARCH=x86_64 INSTALL_MOD_PATH=<mod-install-dir> modules_install
cd <mod-install-dir>
find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz


git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email

# if you come across any failure that blocks the test,
# please remove ~/.lkp and /lkp dir to run from a clean state.



--
0-DAY CI Kernel Test Service
https://01.org/lkp



Attachments:
(No filename) (7.66 kB)
config-5.18.0-rc5-00021-g4eb47d360bbd (165.11 kB)
job-script (4.96 kB)
dmesg.xz (4.87 kB)