2021-04-06 15:47:17

by Oliver Sang

Subject: [ACPI] 1a1c130ab7: BUG:kernel_NULL_pointer_dereference,address



Greetings,

FYI, we noticed the following commit (built with gcc-9):

commit: 1a1c130ab7575498eed5bcf7220037ae09cd1f8a ("ACPI: tables: x86: Reserve memory occupied by ACPI tables")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master


in testcase: trinity
version: trinity-i386-4d2343bd-1_20200320
with the following parameters:

number: 99999
group: group-00

test-description: Trinity is a linux system call fuzz tester.
test-url: http://codemonkey.org.uk/projects/trinity/


on test machine: qemu-system-i386 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

caused the changes below (please refer to the attached dmesg/kmsg for the entire log/backtrace):


+---------------------------------------------------------+-----------+------------+
|                                                         | v5.12-rc5 | 1a1c130ab7 |
+---------------------------------------------------------+-----------+------------+
| boot_successes                                          | 16        | 0          |
| BUG:kernel_NULL_pointer_dereference,address             | 0         | 24         |
| Oops:#[##]                                              | 0         | 24         |
| EIP:get_pfnblock_flags_mask                             | 0         | 24         |
| Kernel_panic-not_syncing:Fatal_exception                | 0         | 24         |
+---------------------------------------------------------+-----------+------------+


If you fix the issue, kindly add the following tag
Reported-by: kernel test robot <[email protected]>


[ 0.136065] BUG: kernel NULL pointer dereference, address: 00000004
[ 0.136596] #PF: supervisor read access in kernel mode
[ 0.137003] #PF: error_code(0x0000) - not-present page
[ 0.137431] *pde = 00000000
[ 0.137671] Oops: 0000 [#1]
[ 0.137908] CPU: 0 PID: 0 Comm: swapper Not tainted 5.12.0-rc5-00001-g1a1c130ab757 #2
[ 0.138567] EIP: get_pfnblock_flags_mask (kbuild/src/consumer/mm/page_alloc.c:490 kbuild/src/consumer/mm/page_alloc.c:504)
[ 0.138971] Code: 55 89 e5 83 ec 08 89 5d f8 89 cb 89 d1 89 75 fc c1 ea 0e c1 e2 04 8b 82 84 77 95 c2 c1 e9 08 83 e1 3c 89 ce 83 e1 1f c1 ee 05 <8b> 04 b0 8b 75 fc d3 e8 21 d8 8b 5d f8 89 ec 5d c3 e8 27 dc e4 ff
All code
========
0: 55 push %rbp
1: 89 e5 mov %esp,%ebp
3: 83 ec 08 sub $0x8,%esp
6: 89 5d f8 mov %ebx,-0x8(%rbp)
9: 89 cb mov %ecx,%ebx
b: 89 d1 mov %edx,%ecx
d: 89 75 fc mov %esi,-0x4(%rbp)
10: c1 ea 0e shr $0xe,%edx
13: c1 e2 04 shl $0x4,%edx
16: 8b 82 84 77 95 c2 mov -0x3d6a887c(%rdx),%eax
1c: c1 e9 08 shr $0x8,%ecx
1f: 83 e1 3c and $0x3c,%ecx
22: 89 ce mov %ecx,%esi
24: 83 e1 1f and $0x1f,%ecx
27: c1 ee 05 shr $0x5,%esi
2a:* 8b 04 b0 mov (%rax,%rsi,4),%eax <-- trapping instruction
2d: 8b 75 fc mov -0x4(%rbp),%esi
30: d3 e8 shr %cl,%eax
32: 21 d8 and %ebx,%eax
34: 8b 5d f8 mov -0x8(%rbp),%ebx
37: 89 ec mov %ebp,%esp
39: 5d pop %rbp
3a: c3 retq
3b: e8 27 dc e4 ff callq 0xffffffffffe4dc67

Code starting with the faulting instruction
===========================================
0: 8b 04 b0 mov (%rax,%rsi,4),%eax
3: 8b 75 fc mov -0x4(%rbp),%esi
6: d3 e8 shr %cl,%eax
8: 21 d8 and %ebx,%eax
a: 8b 5d f8 mov -0x8(%rbp),%ebx
d: 89 ec mov %ebp,%esp
f: 5d pop %rbp
10: c3 retq
11: e8 27 dc e4 ff callq 0xffffffffffe4dc3d
[ 0.140504] EAX: 00000000 EBX: 00000007 ECX: 0000001c EDX: 003faaa0
[ 0.141025] ESI: 00000001 EDI: ffffffff EBP: c1c29e18 ESP: c1c29e10
[ 0.141544] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 EFLAGS: 00210003
[ 0.142096] CR0: 80050033 CR2: 00000004 CR3: 02226000 CR4: 00040690
[ 0.142620] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 0.143146] DR6: fffe0ff0 DR7: 00000400
[ 0.143473] Call Trace:
[ 0.143698] __dump_page (kbuild/src/consumer/mm/debug.c:66)
[ 0.143984] ? find_held_lock (kbuild/src/consumer/kernel/locking/lockdep.c:5003)
[ 0.144300] ? _raw_spin_unlock (kbuild/src/consumer/kernel/locking/spinlock.c:184)
[ 0.144633] ? console_unlock (kbuild/src/consumer/kernel/printk/printk.c:2561)
[ 0.144960] ? __next_mem_range_rev (kbuild/src/consumer/mm/memblock.c:1106)
[ 0.145339] ? memblock_insert_region+0x2a/0x45
[ 0.145762] ? memblock_add_range+0x12d/0x137
[ 0.146242] ? memblock_reserve (kbuild/src/consumer/mm/memblock.c:818 (discriminator 3))
[ 0.146570] ? should_skip_region (kbuild/src/consumer/mm/memblock.c:935)
[ 0.146905] ? __next_mem_range (kbuild/src/consumer/mm/memblock.c:1002)
[ 0.147245] dump_page (kbuild/src/consumer/arch/x86/include/asm/jump_label.h:25 kbuild/src/consumer/include/linux/page_owner.h:51 kbuild/src/consumer/mm/debug.c:193)
[ 0.147536] reserve_bootmem_region (kbuild/src/consumer/include/linux/page-flags.h:356 kbuild/src/consumer/mm/page_alloc.c:1528)
[ 0.147953] memblock_free_all (kbuild/src/consumer/mm/memblock.c:2013 kbuild/src/consumer/mm/memblock.c:2061)
[ 0.148376] ? memblock_alloc_try_nid (kbuild/src/consumer/mm/memblock.c:1567 (discriminator 3))
[ 0.148857] mem_init (kbuild/src/consumer/arch/x86/mm/init_32.c:756)
[ 0.149119] start_kernel (kbuild/src/consumer/init/main.c:835 kbuild/src/consumer/init/main.c:908)
[ 0.149421] i386_start_kernel (kbuild/src/consumer/arch/x86/kernel/head32.c:57)
[ 0.149744] startup_32_smp (kbuild/src/consumer/arch/x86/kernel/head_32.S:328)
[ 0.150071] Modules linked in:
[ 0.150330] CR2: 0000000000000004
[ 0.150589] random: get_random_bytes called from print_oops_end_marker+0x2f/0x50 with crng_init=0
[ 0.150596] ---[ end trace 0000000000000000 ]---
[ 0.151694] EIP: get_pfnblock_flags_mask (kbuild/src/consumer/mm/page_alloc.c:490 kbuild/src/consumer/mm/page_alloc.c:504)
[ 0.152069] Code: 55 89 e5 83 ec 08 89 5d f8 89 cb 89 d1 89 75 fc c1 ea 0e c1 e2 04 8b 82 84 77 95 c2 c1 e9 08 83 e1 3c 89 ce 83 e1 1f c1 ee 05 <8b> 04 b0 8b 75 fc d3 e8 21 d8 8b 5d f8 89 ec 5d c3 e8 27 dc e4 ff
All code
========
0: 55 push %rbp
1: 89 e5 mov %esp,%ebp
3: 83 ec 08 sub $0x8,%esp
6: 89 5d f8 mov %ebx,-0x8(%rbp)
9: 89 cb mov %ecx,%ebx
b: 89 d1 mov %edx,%ecx
d: 89 75 fc mov %esi,-0x4(%rbp)
10: c1 ea 0e shr $0xe,%edx
13: c1 e2 04 shl $0x4,%edx
16: 8b 82 84 77 95 c2 mov -0x3d6a887c(%rdx),%eax
1c: c1 e9 08 shr $0x8,%ecx
1f: 83 e1 3c and $0x3c,%ecx
22: 89 ce mov %ecx,%esi
24: 83 e1 1f and $0x1f,%ecx
27: c1 ee 05 shr $0x5,%esi
2a:* 8b 04 b0 mov (%rax,%rsi,4),%eax <-- trapping instruction
2d: 8b 75 fc mov -0x4(%rbp),%esi
30: d3 e8 shr %cl,%eax
32: 21 d8 and %ebx,%eax
34: 8b 5d f8 mov -0x8(%rbp),%ebx
37: 89 ec mov %ebp,%esp
39: 5d pop %rbp
3a: c3 retq
3b: e8 27 dc e4 ff callq 0xffffffffffe4dc67

Code starting with the faulting instruction
===========================================
0: 8b 04 b0 mov (%rax,%rsi,4),%eax
3: 8b 75 fc mov -0x4(%rbp),%esi
6: d3 e8 shr %cl,%eax
8: 21 d8 and %ebx,%eax
a: 8b 5d f8 mov -0x8(%rbp),%ebx
d: 89 ec mov %ebp,%esp
f: 5d pop %rbp
10: c3 retq
11: e8 27 dc e4 ff callq 0xffffffffffe4dc3d
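
The faulting address can be cross-checked against the register dump. The sketch below is user-space C, not kernel code, and assumes the trapping instruction is the 32-bit form mov (%eax,%esi,4),%eax (the decoder output above prints 64-bit register names for a 32-bit kernel). effective_address() and oops_fault_address() are illustrative names only.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Effective address of an x86 memory operand of the form
 * (base,index,scale). Illustrative helper, not a kernel function.
 */
static uint32_t effective_address(uint32_t base, uint32_t index, uint32_t scale)
{
	/* x86 addressing only encodes scale factors of 1, 2, 4 and 8 */
	assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
	return base + index * scale;
}

/* Register values copied from the oops: EAX: 00000000, ESI: 00000001 */
static uint32_t oops_fault_address(void)
{
	const uint32_t eax = 0x00000000; /* base: the NULL pointer */
	const uint32_t esi = 0x00000001; /* index into an array of 32-bit words */

	return effective_address(eax, esi, 4);
}
```

With EAX = 0 and ESI = 1 the result is 0x00000004, which matches both CR2 and the address in the BUG line, so the load went through a NULL base pointer at word index 1.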


To reproduce:

# build kernel
cd linux
cp config-5.12.0-rc5-00001-g1a1c130ab757 .config
make HOSTCC=gcc-9 CC=gcc-9 ARCH=i386 olddefconfig prepare modules_prepare bzImage

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> job-script # job-script is attached in this email



---
0DAY/LKP+ Test Infrastructure Open Source Technology Center
https://lists.01.org/hyperkitty/list/[email protected] Intel Corporation

Thanks,
Oliver Sang


Attachments:
(No filename) (8.65 kB)
config-5.12.0-rc5-00001-g1a1c130ab757 (131.88 kB)
job-script (4.24 kB)
dmesg.xz (4.49 kB)

2021-04-07 06:32:57

by Mike Rapoport

Subject: Re: [ACPI] 1a1c130ab7: BUG:kernel_NULL_pointer_dereference,address

Hi,

On Tue, Apr 06, 2021 at 12:55:28PM +0800, kernel test robot wrote:
>
> Greetings,
>
> FYI, we noticed the following commit (built with gcc-9):
>
> commit: 1a1c130ab7575498eed5bcf7220037ae09cd1f8a ("ACPI: tables: x86: Reserve memory occupied by ACPI tables")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
>
> in testcase: trinity
> version: trinity-i386-4d2343bd-1_20200320
> with the following parameters:
>
> number: 99999
> group: group-00
>
> test-description: Trinity is a linux system call fuzz tester.
> test-url: http://codemonkey.org.uk/projects/trinity/
>
>
> on test machine: qemu-system-i386 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
>
> caused the changes below (please refer to the attached dmesg/kmsg for the entire log/backtrace):
>
>
> +---------------------------------------------------------+-----------+------------+
> |                                                         | v5.12-rc5 | 1a1c130ab7 |
> +---------------------------------------------------------+-----------+------------+
> | boot_successes                                          | 16        | 0          |
> | BUG:kernel_NULL_pointer_dereference,address             | 0         | 24         |
> | Oops:#[##]                                              | 0         | 24         |
> | EIP:get_pfnblock_flags_mask                             | 0         | 24         |
> | Kernel_panic-not_syncing:Fatal_exception                | 0         | 24         |
> +---------------------------------------------------------+-----------+------------+
>
>
> If you fix the issue, kindly add the following tag
> Reported-by: kernel test robot <[email protected]>
>
>
> [ 0.136065] BUG: kernel NULL pointer dereference, address: 00000004
> [ 0.136596] #PF: supervisor read access in kernel mode
> [ 0.137003] #PF: error_code(0x0000) - not-present page
> [ 0.137431] *pde = 00000000
> [ 0.137671] Oops: 0000 [#1]
> [ 0.137908] CPU: 0 PID: 0 Comm: swapper Not tainted 5.12.0-rc5-00001-g1a1c130ab757 #2
> [ 0.138567] EIP: get_pfnblock_flags_mask (kbuild/src/consumer/mm/page_alloc.c:490 kbuild/src/consumer/mm/page_alloc.c:504)
> [ 0.138971] Code: 55 89 e5 83 ec 08 89 5d f8 89 cb 89 d1 89 75 fc c1 ea 0e c1 e2 04 8b 82 84 77 95 c2 c1 e9 08 83 e1 3c 89 ce 83 e1 1f c1 ee 05 <8b> 04 b0 8b 75 fc d3 e8 21 d8 8b 5d f8 89 ec 5d c3 e8 27 dc e4 ff
> [ 0.140504] EAX: 00000000 EBX: 00000007 ECX: 0000001c EDX: 003faaa0
> [ 0.141025] ESI: 00000001 EDI: ffffffff EBP: c1c29e18 ESP: c1c29e10
> [ 0.141544] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 EFLAGS: 00210003
> [ 0.142096] CR0: 80050033 CR2: 00000004 CR3: 02226000 CR4: 00040690
> [ 0.142620] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> [ 0.143146] DR6: fffe0ff0 DR7: 00000400
> [ 0.143473] Call Trace:
> [ 0.143698] __dump_page (kbuild/src/consumer/mm/debug.c:66)
> [ 0.143984] ? find_held_lock (kbuild/src/consumer/kernel/locking/lockdep.c:5003)
> [ 0.144300] ? _raw_spin_unlock (kbuild/src/consumer/kernel/locking/spinlock.c:184)
> [ 0.144633] ? console_unlock (kbuild/src/consumer/kernel/printk/printk.c:2561)
> [ 0.144960] ? __next_mem_range_rev (kbuild/src/consumer/mm/memblock.c:1106)
> [ 0.145339] ? memblock_insert_region+0x2a/0x45
> [ 0.145762] ? memblock_add_range+0x12d/0x137
> [ 0.146242] ? memblock_reserve (kbuild/src/consumer/mm/memblock.c:818 (discriminator 3))
> [ 0.146570] ? should_skip_region (kbuild/src/consumer/mm/memblock.c:935)
> [ 0.146905] ? __next_mem_range (kbuild/src/consumer/mm/memblock.c:1002)
> [ 0.147245] dump_page (kbuild/src/consumer/arch/x86/include/asm/jump_label.h:25 kbuild/src/consumer/include/linux/page_owner.h:51 kbuild/src/consumer/mm/debug.c:193)
> [ 0.147536] reserve_bootmem_region (kbuild/src/consumer/include/linux/page-flags.h:356 kbuild/src/consumer/mm/page_alloc.c:1528)
> [ 0.147953] memblock_free_all (kbuild/src/consumer/mm/memblock.c:2013 kbuild/src/consumer/mm/memblock.c:2061)
> [ 0.148376] ? memblock_alloc_try_nid (kbuild/src/consumer/mm/memblock.c:1567 (discriminator 3))
> [ 0.148857] mem_init (kbuild/src/consumer/arch/x86/mm/init_32.c:756)
> [ 0.149119] start_kernel (kbuild/src/consumer/init/main.c:835 kbuild/src/consumer/init/main.c:908)
> [ 0.149421] i386_start_kernel (kbuild/src/consumer/arch/x86/kernel/head32.c:57)
> [ 0.149744] startup_32_smp (kbuild/src/consumer/arch/x86/kernel/head_32.S:328)
> [ 0.150071] Modules linked in:
> [ 0.150330] CR2: 0000000000000004
> [ 0.150589] random: get_random_bytes called from print_oops_end_marker+0x2f/0x50 with crng_init=0
> [ 0.150596] ---[ end trace 0000000000000000 ]---
> [ 0.151694] EIP: get_pfnblock_flags_mask (kbuild/src/consumer/mm/page_alloc.c:490 kbuild/src/consumer/mm/page_alloc.c:504)
> [ 0.152069] Code: 55 89 e5 83 ec 08 89 5d f8 89 cb 89 d1 89 75 fc c1 ea 0e c1 e2 04 8b 82 84 77 95 c2 c1 e9 08 83 e1 3c 89 ce 83 e1 1f c1 ee 05 <8b> 04 b0 8b 75 fc d3 e8 21 d8 8b 5d f8 89 ec 5d c3 e8 27 dc e4 ff

We hit PF_POISONED_CHECK(page) at __SetPageReserved() called from
reserve_bootmem_region().
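
For illustration, the kind of check that fires here can be modelled in user space. This is a rough sketch under assumptions: struct page_model and the page_model_*() helpers are hypothetical stand-ins, and the all-ones pattern is assumed to correspond to the poison value the kernel compares page->flags against. It is not the kernel's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct page; the real structure is much larger */
struct page_model {
	unsigned long flags;
	unsigned long private_data;
};

/* Assumed poison byte: every byte 0xff, so flags reads back as all ones */
#define MODEL_POISON_BYTE 0xff

/* What an allocated but never initialized memory map entry looks like */
static void page_model_poison(struct page_model *page)
{
	assert(page != NULL);
	memset(page, MODEL_POISON_BYTE, sizeof(*page));
}

/* What proper initialization of the entry would do, reduced to a memset */
static void page_model_init(struct page_model *page)
{
	assert(page != NULL);
	memset(page, 0, sizeof(*page));
}

/* A page counts as poisoned while its flags word is still all ones */
static int page_model_poisoned(const struct page_model *page)
{
	assert(page != NULL);
	return page->flags == ~0UL;
}
```

In this model, setting a flag on a page for which page_model_poisoned() still returns true corresponds to the situation in the trace: the page was reached through the reserved ranges, but its memory map entry was never initialized.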

The machine has 16G of RAM, but it runs a 32-bit kernel without HIGHMEM, so
max_pfn is trimmed in highmem_pfn_init(). The rest of the memory
representation is never updated, though: neither the e820 tables nor
memblock reflect the trimmed limit:

[ 0.006005] Warning only 495MB will be used.
[ 0.006328] Use a HIGHMEM enabled kernel.

So when we get to sparse_init(), the sections above 495M are still
initialized, and pfns in that range are considered valid by the rest of the
system.

I believe something along the lines of the patch below should fix this:

diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index da31c2635ee4..a41a64314ce2 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -606,6 +606,7 @@ static void __init lowmem_pfn_init(void)
*/
static void __init highmem_pfn_init(void)
{
+ unsigned long orig_max_pfn = max_pfn;
max_low_pfn = MAXMEM_PFN;

if (highmem_pages == -1)
@@ -636,6 +637,13 @@ static void __init highmem_pfn_init(void)
}
#endif /* !CONFIG_HIGHMEM64G */
#endif /* !CONFIG_HIGHMEM */
+
+ if (orig_max_pfn > max_pfn) {
+ u64 start = PFN_PHYS(max_pfn);
+ u64 size = ULLONG_MAX - start;
+
+ e820__range_remove(start, size, E820_TYPE_RAM, 1);
+ }
}

/*
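
The intent of the second hunk, dropping every RAM range above the trimmed max_pfn from the memory map, can be modelled in user space. This is a minimal sketch under assumptions: struct mem_range, trim_ranges_above(), highest_pfn() and MODEL_PAGE_SHIFT are hypothetical names, and the array is a simplified stand-in for the e820 table. It does not reproduce what e820__range_remove() does internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12 /* assumes 4 KiB pages */

/* Hypothetical, simplified memory map entry */
struct mem_range {
	uint64_t start; /* physical start address, inclusive */
	uint64_t end;   /* physical end address, exclusive */
};

/*
 * Clip every range to [0, max_pfn << MODEL_PAGE_SHIFT) and drop the
 * ranges that lie entirely above that limit.
 * Returns the number of ranges left in the map.
 */
static size_t trim_ranges_above(struct mem_range *map, size_t nr, uint64_t max_pfn)
{
	const uint64_t limit = max_pfn << MODEL_PAGE_SHIFT;
	size_t kept = 0;

	assert(map != NULL || nr == 0);

	for (size_t i = 0; i < nr; i++) {
		struct mem_range r = map[i];

		if (r.start >= limit)
			continue;      /* entirely above the limit: drop it */
		if (r.end > limit)
			r.end = limit; /* straddles the limit: clip it */
		map[kept++] = r;
	}
	return kept;
}

/* Highest pfn (exclusive) that the map still describes */
static uint64_t highest_pfn(const struct mem_range *map, size_t nr)
{
	uint64_t end = 0;

	for (size_t i = 0; i < nr; i++)
		if (map[i].end > end)
			end = map[i].end;
	return end >> MODEL_PAGE_SHIFT;
}
```

Before trimming, highest_pfn() of a 16G map is far above a max_pfn that corresponds to roughly 495MB, which is the mismatch described above. After trim_ranges_above() the two agree, so nothing later in boot would treat the pfns above the limit as valid RAM.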


--
Sincerely yours,
Mike.