2022-03-17 08:34:52

by kernel test robot

Subject: [mm] f886cdb769: kernel_BUG_at_include/linux/swapops.h



Greeting,

FYI, we noticed the following commit (built with gcc-9):

commit: f886cdb76920131b030ffae13e752d8d0ff440f0 ("mm: pvmw: add support for walking devmap pages")
url: https://github.com/0day-ci/linux/commits/Petr-Mladek/kthread-Make-it-clear-that-kthread_create_on_node-might-be-terminated-by-any-fatal-signal/20220315-182614

in testcase: will-it-scale
version: will-it-scale-x86_64-a34a85c-1_20220312
with following parameters:

nr_task: 100%
mode: process
test: lock1
cpufreq_governor: performance
ucode: 0x2006c0a

test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale


on test machine: 104 threads 2 sockets Skylake with 192G memory

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):



If you fix the issue, kindly add following tag
Reported-by: kernel test robot <[email protected]>


[ 163.838480][ T4483] kernel BUG at include/linux/swapops.h:258!
[ 163.844868][ T4483] invalid opcode: 0000 [#1] SMP PTI
[ 163.850544][ T4483] CPU: 103 PID: 4483 Comm: perf Not tainted 5.17.0-rc7-mm1-00347-gf886cdb76920 #1
[ 163.860216][ T4483] RIP: 0010:migration_entry_wait_on_locked (include/linux/swapops.h:258 mm/filemap.c:1412)
[ 163.867512][ T4483] Code: 66 90 e9 73 fe ff ff 66 90 e9 3c ff ff ff 48 8b 43 08 a8 01 0f 85 84 00 00 00 66 90 48 89 d8 48 8b 00 a8 01 0f 85 10 fe ff ff <0f> 0b 48 8d 58 ff e9 16 fe ff ff 65 48 8b 04 25 00 ad 01 00 48 83
All code
========
0: 66 90 xchg %ax,%ax
2: e9 73 fe ff ff jmpq 0xfffffffffffffe7a
7: 66 90 xchg %ax,%ax
9: e9 3c ff ff ff jmpq 0xffffffffffffff4a
e: 48 8b 43 08 mov 0x8(%rbx),%rax
12: a8 01 test $0x1,%al
14: 0f 85 84 00 00 00 jne 0x9e
1a: 66 90 xchg %ax,%ax
1c: 48 89 d8 mov %rbx,%rax
1f: 48 8b 00 mov (%rax),%rax
22: a8 01 test $0x1,%al
24: 0f 85 10 fe ff ff jne 0xfffffffffffffe3a
2a:* 0f 0b ud2 <-- trapping instruction
2c: 48 8d 58 ff lea -0x1(%rax),%rbx
30: e9 16 fe ff ff jmpq 0xfffffffffffffe4b
35: 65 48 8b 04 25 00 ad mov %gs:0x1ad00,%rax
3c: 01 00
3e: 48 rex.W
3f: 83 .byte 0x83

Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: 48 8d 58 ff lea -0x1(%rax),%rbx
6: e9 16 fe ff ff jmpq 0xfffffffffffffe21
b: 65 48 8b 04 25 00 ad mov %gs:0x1ad00,%rax
12: 01 00
14: 48 rex.W
15: 83 .byte 0x83
[ 163.888105][ T4483] RSP: 0000:ffffc900233afd60 EFLAGS: 00010246
[ 163.894594][ T4483] RAX: 0017ffffc0000000 RBX: ffffea000b000000 RCX: 000000000000001b
[ 163.903001][ T4483] RDX: ffffea0004ae2468 RSI: 0000000000000000 RDI: 6c000000002c0000
[ 163.911401][ T4483] RBP: 0400000000000080 R08: 0000000000100073 R09: ffff88812b890f50
[ 163.919811][ T4483] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000e28
[ 163.928219][ T4483] R13: 0400000000000000 R14: ffff8881ecf57450 R15: fff000003fffffff
[ 163.936625][ T4483] FS: 00007f7ab9199d40(0000) GS:ffff88afa62c0000(0000) knlGS:0000000000000000
[ 163.946000][ T4483] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 163.953026][ T4483] CR2: 00007f7ab8a06950 CR3: 000000016e806001 CR4: 00000000007706e0
[ 163.961451][ T4483] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 163.969879][ T4483] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 163.978305][ T4483] PKRU: 55555554
[ 163.982310][ T4483] Call Trace:
[ 163.986055][ T4483] <TASK>
[ 163.989441][ T4483] ? do_huge_pmd_numa_page (mm/huge_memory.c:1460)
[ 163.995343][ T4483] __handle_mm_fault (mm/memory.c:4608 mm/memory.c:4704)
[ 164.000726][ T4483] handle_mm_fault (mm/memory.c:4802)
[ 164.005854][ T4483] do_user_addr_fault (arch/x86/mm/fault.c:1397)
[ 164.011326][ T4483] exc_page_fault (arch/x86/include/asm/irqflags.h:40 arch/x86/include/asm/irqflags.h:75 arch/x86/mm/fault.c:1492 arch/x86/mm/fault.c:1540)
[ 164.016375][ T4483] ? asm_exc_page_fault (arch/x86/include/asm/idtentry.h:568)
[ 164.021757][ T4483] asm_exc_page_fault (arch/x86/include/asm/idtentry.h:568)
[ 164.027042][ T4483] RIP: 0033:0x5565db69fcde
[ 164.031894][ T4483] Code: f8 31 c0 48 8b 45 f8 64 48 33 04 25 28 00 00 00 75 02 c9 c3 e8 83 49 f3 ff 0f 1f 00 55 48 89 e5 41 54 53 48 89 fb 48 83 ec 10 <48> 8b bf a8 00 01 00 64 48 8b 04 25 28 00 00 00 48 89 45 e8 31 c0
All code
========
0: f8 clc
1: 31 c0 xor %eax,%eax
3: 48 8b 45 f8 mov -0x8(%rbp),%rax
7: 64 48 33 04 25 28 00 xor %fs:0x28,%rax
e: 00 00
10: 75 02 jne 0x14
12: c9 leaveq
13: c3 retq
14: e8 83 49 f3 ff callq 0xfffffffffff3499c
19: 0f 1f 00 nopl (%rax)
1c: 55 push %rbp
1d: 48 89 e5 mov %rsp,%rbp
20: 41 54 push %r12
22: 53 push %rbx
23: 48 89 fb mov %rdi,%rbx
26: 48 83 ec 10 sub $0x10,%rsp
2a:* 48 8b bf a8 00 01 00 mov 0x100a8(%rdi),%rdi <-- trapping instruction
31: 64 48 8b 04 25 28 00 mov %fs:0x28,%rax
38: 00 00
3a: 48 89 45 e8 mov %rax,-0x18(%rbp)
3e: 31 c0 xor %eax,%eax

Code starting with the faulting instruction
===========================================
0: 48 8b bf a8 00 01 00 mov 0x100a8(%rdi),%rdi
7: 64 48 8b 04 25 28 00 mov %fs:0x28,%rax
e: 00 00
10: 48 89 45 e8 mov %rax,-0x18(%rbp)
14: 31 c0 xor %eax,%eax
[ 164.052568][ T4483] RSP: 002b:00007fff1d7d6b40 EFLAGS: 00010202
[ 164.059122][ T4483] RAX: 0000000000000000 RBX: 00007f7ab89f68a8 RCX: 0000000000000000
[ 164.067592][ T4483] RDX: 0000000000001000 RSI: 0000000000401000 RDI: 00007f7ab89f68a8
[ 164.076078][ T4483] RBP: 00007fff1d7d6b60 R08: 0000000000000a9f R09: 00005565dc5248a0
[ 164.084546][ T4483] R10: 0000000000000000 R11: 0000000000000206 R12: 000000000000000c
[ 164.093005][ T4483] R13: 00000000000c0960 R14: 00007fff1d7d9c20 R15: 0000000000000000
[ 164.101455][ T4483] </TASK>
[ 164.104956][ T4483] Modules linked in: binfmt_misc btrfs blake2b_generic xor raid6_pq intel_rapl_msr intel_rapl_common zstd_compress libcrc32c ast drm_vram_helper drm_ttm_helper sd_mod ttm sg nvme skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ipmi_ssif drm_kms_helper rapl nvme_core syscopyarea sysfillrect t10_pi intel_cstate ahci sysimgblt acpi_ipmi libahci fb_sys_fops crc64_rocksoft_generic mei_me intel_uncore ipmi_si crc64_rocksoft drm crc64 ioatdma libata mei joydev ipmi_devintf intel_pch_thermal wmi dca ipmi_msghandler acpi_pad acpi_power_meter ip_tables
[ 164.168156][ T4483] ---[ end trace 0000000000000000 ]---
[ 164.200833][ T4483] RIP: 0010:migration_entry_wait_on_locked (include/linux/swapops.h:258 mm/filemap.c:1412)
[ 164.208245][ T4483] Code: 66 90 e9 73 fe ff ff 66 90 e9 3c ff ff ff 48 8b 43 08 a8 01 0f 85 84 00 00 00 66 90 48 89 d8 48 8b 00 a8 01 0f 85 10 fe ff ff <0f> 0b 48 8d 58 ff e9 16 fe ff ff 65 48 8b 04 25 00 ad 01 00 48 83
All code
========
0: 66 90 xchg %ax,%ax
2: e9 73 fe ff ff jmpq 0xfffffffffffffe7a
7: 66 90 xchg %ax,%ax
9: e9 3c ff ff ff jmpq 0xffffffffffffff4a
e: 48 8b 43 08 mov 0x8(%rbx),%rax
12: a8 01 test $0x1,%al
14: 0f 85 84 00 00 00 jne 0x9e
1a: 66 90 xchg %ax,%ax
1c: 48 89 d8 mov %rbx,%rax
1f: 48 8b 00 mov (%rax),%rax
22: a8 01 test $0x1,%al
24: 0f 85 10 fe ff ff jne 0xfffffffffffffe3a
2a:* 0f 0b ud2 <-- trapping instruction
2c: 48 8d 58 ff lea -0x1(%rax),%rbx
30: e9 16 fe ff ff jmpq 0xfffffffffffffe4b
35: 65 48 8b 04 25 00 ad mov %gs:0x1ad00,%rax
3c: 01 00
3e: 48 rex.W
3f: 83 .byte 0x83

Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: 48 8d 58 ff lea -0x1(%rax),%rbx
6: e9 16 fe ff ff jmpq 0xfffffffffffffe21
b: 65 48 8b 04 25 00 ad mov %gs:0x1ad00,%rax
12: 01 00
14: 48 rex.W
15: 83 .byte 0x83


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file

# if you come across any failure that blocks the test,
# please remove the ~/.lkp and /lkp directories to run from a clean state.



---
0-DAY CI Kernel Test Service
https://lists.01.org/hyperkitty/list/[email protected]

Thanks,
Oliver Sang


Attachments:
(No filename) (9.42 kB)
config-5.17.0-rc7-mm1-00347-gf886cdb76920 (142.63 kB)
job-script (7.69 kB)
dmesg.xz (35.75 kB)
job.yaml (5.26 kB)
reproduce (352.00 B)

2022-03-17 19:41:14

by Muchun Song

Subject: Re: [mm] f886cdb769: kernel_BUG_at_include/linux/swapops.h

On Thu, Mar 17, 2022 at 4:05 PM kernel test robot <[email protected]> wrote:
>
>
>
> Greeting,
>
> FYI, we noticed the following commit (built with gcc-9):
>
> commit: f886cdb76920131b030ffae13e752d8d0ff440f0 ("mm: pvmw: add support for walking devmap pages")
> url: https://github.com/0day-ci/linux/commits/Petr-Mladek/kthread-Make-it-clear-that-kthread_create_on_node-might-be-terminated-by-any-fatal-signal/20220315-182614
>
> in testcase: will-it-scale
> version: will-it-scale-x86_64-a34a85c-1_20220312
> with following parameters:
>
> nr_task: 100%
> mode: process
> test: lock1
> cpufreq_governor: performance
> ucode: 0x2006c0a
>
> test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
> test-url: https://github.com/antonblanchard/will-it-scale
>
>
> on test machine: 104 threads 2 sockets Skylake with 192G memory
>
> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
>
>
>
> If you fix the issue, kindly add following tag
> Reported-by: kernel test robot <[email protected]>
>
>

Thanks for your report. I know the reason: pmd_devmap() is only
reliable when pmd_present() returns true, and pmd_devmap() can
return true for a PMD swap entry. I should test pmd_present() before
calling pmd_devmap(). This will be fixed in the next version.
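
To illustrate the ordering issue described above, here is a minimal user-space sketch. It is a toy model, not kernel code and not the actual fix: the type toy_pmd_t, the TOY_* bit positions, and all toy_* helpers are hypothetical names made up for illustration. It only assumes what the paragraph above states, namely that a non-present (swap/migration) PMD can carry arbitrary bits, including the one a devmap test looks at.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: the bit positions are illustrative assumptions,
 * not the real x86 page table layout. */
typedef struct { uint64_t val; } toy_pmd_t;

#define TOY_PRESENT (UINT64_C(1) << 0)
#define TOY_DEVMAP  (UINT64_C(1) << 58)

static inline bool toy_pmd_present(toy_pmd_t pmd)
{
	return (pmd.val & TOY_PRESENT) != 0;
}

/* Only meaningful when the entry is present. */
static inline bool toy_pmd_devmap(toy_pmd_t pmd)
{
	return (pmd.val & TOY_DEVMAP) != 0;
}

/* Build a non-present entry whose payload bits are arbitrary. */
static inline toy_pmd_t toy_make_swap_entry(uint64_t raw)
{
	toy_pmd_t pmd = { .val = raw & ~TOY_PRESENT };

	assert(!toy_pmd_present(pmd));
	return pmd;
}

/* Unsafe ordering: trusts the devmap bit even for non-present entries. */
static inline bool toy_is_devmap_unchecked(toy_pmd_t pmd)
{
	return toy_pmd_devmap(pmd);
}

/* Safe ordering: test presence first, then the devmap bit. */
static inline bool toy_is_devmap_checked(toy_pmd_t pmd)
{
	return toy_pmd_present(pmd) && toy_pmd_devmap(pmd);
}
```

In this model, a swap entry whose payload happens to overlap TOY_DEVMAP is misclassified by toy_is_devmap_unchecked() but correctly rejected by toy_is_devmap_checked().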

Thanks.