2021-06-20 14:55:29

by kernel test robot

Subject: [sched] 944be1796b: BUG:Bad_rss-counter_state_mm:#type:MM_FILEPAGES_val



Greetings,

FYI, we noticed the following commit (built with gcc-9):

commit: 944be1796bc1da08d98ef6f41a9b97e39450f356 ("sched: Use lightweight hazard pointers to grab lazy mms")
https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git sched/lazymm


in testcase: boot

on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

caused the changes below (please refer to the attached dmesg/kmsg for the entire log/backtrace):


+---------------------------------------------------------------------------------------------+------------+------------+
| | 0e57afe67a | 944be1796b |
+---------------------------------------------------------------------------------------------+------------+------------+
| boot_successes | 48 | 2 |
| boot_failures | 0 | 68 |
| BUG:Bad_rss-counter_state_mm:#type:MM_FILEPAGES_val | 0 | 68 |
| BUG:Bad_rss-counter_state_mm:#type:MM_ANONPAGES_val | 0 | 68 |
| BUG:non-zero_pgtables_bytes_on_freeing_mm | 0 | 68 |
| Kernel_panic-not_syncing:Attempted_to_kill_init!exitcode= | 0 | 8 |
| WARNING:at_kernel/sched/core.c:#__schedule | 0 | 51 |
| RIP:__schedule | 0 | 51 |
| WARNING:at_kernel/fork.c:#__mmdrop | 0 | 43 |
| RIP:__mmdrop | 0 | 43 |
| WARNING:at_arch/x86/mm/tlb.c:#switch_mm_irqs_off | 0 | 17 |
| RIP:switch_mm_irqs_off | 0 | 17 |
| kernel_BUG_at_include/linux/mm.h | 0 | 40 |
| invalid_opcode:#[##] | 0 | 43 |
| RIP:put_page_testzero | 0 | 40 |
| Kernel_panic-not_syncing:Fatal_exception | 0 | 47 |
| kernel_BUG_at_mm/khugepaged.c | 0 | 3 |
| RIP:__khugepaged_enter | 0 | 3 |
| RIP:__put_user_nocheck_1 | 0 | 2 |
| BUG:Bad_rss-counter_state_mm:#type:MM_SHMEMPAGES_val | 0 | 7 |
| WARNING:at_kernel/sched/core.c:#mm_unlazy_mm_count | 0 | 3 |
| RIP:mm_unlazy_mm_count | 0 | 3 |
| canonical_address#:#[##] | 0 | 4 |
| RIP:pgd_free | 0 | 3 |
| RIP:__handle_mm_fault | 0 | 1 |
| RIP:copy_user_generic_string | 0 | 1 |
| BUG:kernel_NULL_pointer_dereference,address | 0 | 1 |
| Oops:#[##] | 0 | 2 |
| WARNING:at_arch/x86/mm/tlb.c:#nmi_uaccess_okay | 0 | 8 |
| RIP:nmi_uaccess_okay | 0 | 8 |
| BUG:unable_to_handle_page_fault_for_address | 0 | 1 |
| RIP:vma_interval_tree_augment_compute_max | 0 | 1 |
| BUG:Bad_page_map_in_process | 0 | 1 |
| BUG:Bad_page_state_in_process | 0 | 1 |
| RIP:native_machine_emergency_restart | 0 | 1 |
| BUG:stack_guard_page_was_hit_at(____ptrval____)(stack_is(____ptrval____)..(____ptrval____)) | 0 | 1 |
| RIP:number | 0 | 1 |
| RIP:ia32_setup_frame | 0 | 1 |
| WARNING:at_fs/coredump.c:#dump_vma_snapshot | 0 | 2 |
| RIP:dump_vma_snapshot | 0 | 2 |
+---------------------------------------------------------------------------------------------+------------+------------+
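To put the two boot rows in proportion: the baseline column (0e57afe67a) shows 48 successful boots and no failures, while 944be1796b shows 2 successes against 68 failures. A minimal Python sketch of that arithmetic follows. The helper name `failure_rate` is hypothetical and is not part of lkp-tests; the counts are copied from the boot_successes and boot_failures rows.

```python
def failure_rate(successes: int, failures: int) -> float:
    """Return the fraction of boots that failed (0.0 if nothing was run)."""
    total = successes + failures
    return failures / total if total else 0.0


# (boot_successes, boot_failures) per column of the table above.
runs = {
    "0e57afe67a": (48, 0),   # baseline column
    "944be1796b": (2, 68),   # commit under test
}

for commit, (ok, bad) in runs.items():
    rate = failure_rate(ok, bad)
    print(f"{commit}: {bad}/{ok + bad} boots failed ({rate:.1%})")
```

The remaining rows are per-symptom counts within those 68 failing boots, so they overlap and should not be summed.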


If you fix the issue, kindly add the following tag:
Reported-by: kernel test robot <lkp@intel.com>


[ 6.304063] BUG: Bad rss-counter state mm:(____ptrval____) type:MM_FILEPAGES val:104
[ 6.304586] BUG: Bad rss-counter state mm:(____ptrval____) type:MM_ANONPAGES val:7
[ 6.305063] BUG: non-zero pgtables_bytes on freeing mm: 24576
[ 6.305415] ------------[ cut here ]------------
[ 6.305706] rq->lazy_mm
[ 6.305710] WARNING: CPU: 0 PID: 2577 at kernel/sched/core.c:4392 __schedule (kbuild/src/consumer/kernel/sched/core.c:4392 kbuild/src/consumer/kernel/sched/core.c:5234)
[ 6.306389] Modules linked in:
[ 6.306604] CPU: 0 PID: 2577 Comm: modprobe Not tainted 5.13.0-rc3-00009-g944be1796bc1 #1
[ 6.307110] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[ 6.307622] RIP: 0010:__schedule (kbuild/src/consumer/kernel/sched/core.c:4392 kbuild/src/consumer/kernel/sched/core.c:5234)
[ 6.307900] Code: 00 00 74 38 49 83 bd 68 09 00 00 00 74 1e 80 3d db 6c 0e 01 00 75 15 48 c7 c7 3f 09 2b 82 c6 05 cb 6c 0e 01 01 e8 41 f0 fd ff <0f> 0b 49 8b 87 f0 03 00 00 49 89 85 68 09 00 00 eb 76 49 c7 84 24
All code
========
0: 00 00 add %al,(%rax)
2: 74 38 je 0x3c
4: 49 83 bd 68 09 00 00 cmpq $0x0,0x968(%r13)
b: 00
c: 74 1e je 0x2c
e: 80 3d db 6c 0e 01 00 cmpb $0x0,0x10e6cdb(%rip) # 0x10e6cf0
15: 75 15 jne 0x2c
17: 48 c7 c7 3f 09 2b 82 mov $0xffffffff822b093f,%rdi
1e: c6 05 cb 6c 0e 01 01 movb $0x1,0x10e6ccb(%rip) # 0x10e6cf0
25: e8 41 f0 fd ff callq 0xfffffffffffdf06b
2a:* 0f 0b ud2 <-- trapping instruction
2c: 49 8b 87 f0 03 00 00 mov 0x3f0(%r15),%rax
33: 49 89 85 68 09 00 00 mov %rax,0x968(%r13)
3a: eb 76 jmp 0xb2
3c: 49 rex.WB
3d: c7 .byte 0xc7
3e: 84 .byte 0x84
3f: 24 .byte 0x24

Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: 49 8b 87 f0 03 00 00 mov 0x3f0(%r15),%rax
9: 49 89 85 68 09 00 00 mov %rax,0x968(%r13)
10: eb 76 jmp 0x88
12: 49 rex.WB
13: c7 .byte 0xc7
14: 84 .byte 0x84
15: 24 .byte 0x24
[ 6.308958] RSP: 0000:ffffc900017fbcf8 EFLAGS: 00010082
[ 6.309279] RAX: 0000000000000000 RBX: ffff88811a2241c8 RCX: 00000000ffff7fff
[ 6.309697] RDX: 0000000000000252 RSI: 0000000000000001 RDI: 0000000000000001
[ 6.310114] RBP: ffffc900017fbd40 R08: 0000000000000003 R09: 0000000000000000
[ 6.310531] R10: 0000000000000001 R11: 000000002d2d2d2d R12: ffff88811a223c00
[ 6.310949] R13: ffff88842fc2a9c0 R14: 0000000000000001 R15: ffff88810caa9e00
[ 6.311369] FS: 0000000000000000(0000) GS:ffff88842fc00000(0000) knlGS:0000000000000000
[ 6.311861] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6.312209] CR2: 000000000809d8a0 CR3: 0000000110296000 CR4: 00000000000406f0
[ 6.312628] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 6.313048] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 6.313466] Call Trace:
[ 6.313655] preempt_schedule_common (kbuild/src/consumer/arch/x86/include/asm/preempt.h:85 (discriminator 1) kbuild/src/consumer/kernel/sched/core.c:5396 (discriminator 1))
[ 6.313939] __cond_resched (kbuild/src/consumer/kernel/sched/core.c:7076)
[ 6.314185] down_write_killable (kbuild/src/consumer/kernel/locking/rwsem.c:1260 kbuild/src/consumer/kernel/locking/rwsem.c:1275 kbuild/src/consumer/kernel/locking/rwsem.c:1419)
[ 6.314453] mmap_write_lock_killable (kbuild/src/consumer/include/linux/mmap_lock.h:87)
[ 6.314740] setup_arg_pages (kbuild/src/consumer/fs/exec.c:793)
[ 6.314992] load_elf_binary (kbuild/src/consumer/fs/binfmt_elf.c:1033)
[ 6.315252] ? __kernel_read (kbuild/src/consumer/arch/x86/include/asm/current.h:15 kbuild/src/consumer/fs/read_write.c:459)
[ 6.315512] ? __kernel_read (kbuild/src/consumer/arch/x86/include/asm/current.h:15 kbuild/src/consumer/fs/read_write.c:459)
[ 6.315767] bprm_execve (kbuild/src/consumer/fs/exec.c:1722 kbuild/src/consumer/fs/exec.c:1761 kbuild/src/consumer/fs/exec.c:1830 kbuild/src/consumer/fs/exec.c:1792)
[ 6.316010] kernel_execve (kbuild/src/consumer/fs/exec.c:1975)
[ 6.316256] call_usermodehelper_exec_async (kbuild/src/consumer/kernel/umh.c:112)
[ 6.316573] ? call_usermodehelper (kbuild/src/consumer/kernel/umh.c:67)
[ 6.316846] ret_from_fork (kbuild/src/consumer/arch/x86/entry/entry_64.S:300)
[ 6.317085] ---[ end trace 80dc07957d67d052 ]---
[ 6.317399] sh[2576]: segfault at 0 ip 0000000000000000 sp 00000000ffb5c300 error 14 in busybox[8048000+54000]
[ 6.317993] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
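A note on reading the first "Code:" line above (the one for the __schedule warning): the byte in angle brackets marks the instruction at RIP, and counting the bytes before it gives the offset flagged as "2a:*" in the decoded listing. The sketch below redoes that count in Python. The function name `trap_offset` is hypothetical and the snippet is only an illustration, not the decoder the robot uses; the byte string is copied from the log.

```python
import re


def trap_offset(code_bytes: str) -> int:
    """Return the index of the <xx>-marked byte in an oops 'Code:' byte string."""
    for offset, token in enumerate(code_bytes.split()):
        if re.fullmatch(r"<[0-9a-f]{2}>", token):
            return offset
    raise ValueError("no <xx> marker found in Code: bytes")


# Bytes copied from the 'Code:' line of the __schedule warning.
code = (
    "00 00 74 38 49 83 bd 68 09 00 00 00 74 1e 80 3d db 6c 0e 01 00 75 15 "
    "48 c7 c7 3f 09 2b 82 c6 05 cb 6c 0e 01 01 e8 41 f0 fd ff <0f> 0b 49 8b "
    "87 f0 03 00 00 49 89 85 68 09 00 00 eb 76 49 c7 84 24"
)

offset = trap_offset(code)
print(f"trapping byte at offset {offset:#x}")
```

The marked bytes `0f 0b` are the `ud2` shown as the trapping instruction in the listing, which fits the report being a WARNING at kernel/sched/core.c:4392.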



To reproduce:

# build kernel
cd linux
cp config-5.13.0-rc3-00009-g944be1796bc1 .config
make HOSTCC=gcc-9 CC=gcc-9 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> job-script # job-script is attached in this email



---
0DAY/LKP+ Test Infrastructure                   Open Source Technology Center
https://lists.01.org/hyperkitty/list/lkp@lists.01.org       Intel Corporation

Thanks,
Oliver Sang


Attachments:
config-5.13.0-rc3-00009-g944be1796bc1 (121.78 kB)
job-script (4.30 kB)
dmesg.xz (15.75 kB)