2022-10-05 15:53:16

by kernel test robot

Subject: [mm] 763ecb0350: kernel_BUG_at_mm/mmap.c


Greetings,

FYI, we noticed the following commit (built with gcc-11):

commit: 763ecb035029f500d7e6dc99acd1ad299b7726a1 ("mm: remove the vma linked list")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master

in testcase: trinity
version: trinity-static-i386-x86_64-1c734c75-1_2020-01-06
with following parameters:

runtime: 300s
group: group-03

test-description: Trinity is a linux system call fuzz tester.
test-url: http://codemonkey.org.uk/projects/trinity/


on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

caused the changes below (please refer to the attached dmesg/kmsg for the entire log/backtrace):



If you fix the issue, kindly add the following tags
| Reported-by: kernel test robot <[email protected]>
| Link: https://lore.kernel.org/r/[email protected]


[ 63.390267][ T5018] ------------[ cut here ]------------
[ 63.391875][ T5018] kernel BUG at mm/mmap.c:3167!
[ 63.393264][ T5018] invalid opcode: 0000 [#1] SMP PTI
[ 63.394501][ T5018] CPU: 1 PID: 5018 Comm: trinity-c1 Not tainted 6.0.0-rc3-00284-g763ecb035029 #1
[ 63.396050][ T5018] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
[ 63.397726][ T5018] RIP: 0010:exit_mmap (mm/mmap.c:3167 (discriminator 1))
[ 63.398927][ T5018] Code: 41 80 8c 24 7a 03 00 00 20 e9 17 fe ff ff 4c 89 e7 e8 60 86 04 00 e9 f7 fd ff ff be 01 00 00 00 4c 89 e7 e8 8e ba fe ff eb c0 <0f> 0b e8 c5 c3 a1 00 0f 1f 44 00 00 0f 1f 44 00 00 41 55 41 54 49
All code
========
   0:   41 80 8c 24 7a 03 00    orb    $0x20,0x37a(%r12)
   7:   00 20
   9:   e9 17 fe ff ff          jmpq   0xfffffffffffffe25
   e:   4c 89 e7                mov    %r12,%rdi
  11:   e8 60 86 04 00          callq  0x48676
  16:   e9 f7 fd ff ff          jmpq   0xfffffffffffffe12
  1b:   be 01 00 00 00          mov    $0x1,%esi
  20:   4c 89 e7                mov    %r12,%rdi
  23:   e8 8e ba fe ff          callq  0xfffffffffffebab6
  28:   eb c0                   jmp    0xffffffffffffffea
  2a:*  0f 0b                   ud2             <-- trapping instruction
  2c:   e8 c5 c3 a1 00          callq  0xa1c3f6
  31:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
  36:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
  3b:   41 55                   push   %r13
  3d:   41 54                   push   %r12
  3f:   49                      rex.WB

Code starting with the faulting instruction
===========================================
   0:   0f 0b                   ud2
   2:   e8 c5 c3 a1 00          callq  0xa1c3cc
   7:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
   c:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
  11:   41 55                   push   %r13
  13:   41 54                   push   %r12
  15:   49                      rex.WB
[ 63.402184][ T5018] RSP: 0000:ffffc90002087cb8 EFLAGS: 00010212
[ 63.403493][ T5018] RAX: 0000000000000000 RBX: 0000000000000008 RCX: 0000000000000001
[ 63.404964][ T5018] RDX: ffff88811ed70c00 RSI: ffff88842fc35b00 RDI: ffffc90002087cb8
[ 63.406608][ T5018] RBP: 0000000000000000 R08: 0000000000000009 R09: ffffffff812d6500
[ 63.408192][ T5018] R10: ffffffffffffffff R11: 0000000000000001 R12: ffff88811ed70cc0
[ 63.409754][ T5018] R13: ffff88811ed70d30 R14: 00000000000ffd06 R15: ffff88811cb2cf80
[ 63.411287][ T5018] FS: 0000000000000000(0000) GS:ffff88842fd00000(0000) knlGS:0000000000000000
[ 63.413024][ T5018] CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
[ 63.414537][ T5018] CR2: 00000000f7fe4549 CR3: 000000000280a000 CR4: 00000000000406e0
[ 63.416170][ T5018] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 63.417836][ T5018] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 63.419472][ T5018] Call Trace:
[ 63.420674][ T5018] <TASK>
[ 63.421821][ T5018] __mmput (kernel/fork.c:1187)
[ 63.423055][ T5018] exit_mm (kernel/exit.c:512)
[ 63.424271][ T5018] do_exit (kernel/exit.c:786)
[ 63.425510][ T5018] do_group_exit (kernel/exit.c:908)
[ 63.426763][ T5018] get_signal (kernel/signal.c:2857)
[ 63.427992][ T5018] arch_do_signal_or_restart (arch/x86/kernel/signal.c:869)
[ 63.429337][ T5018] ? force_sig_fault (kernel/signal.c:1727)
[ 63.430593][ T5018] exit_to_user_mode_loop (kernel/entry/common.c:168)
[ 63.431911][ T5018] exit_to_user_mode_prepare (kernel/entry/common.c:201)
[ 63.433259][ T5018] irqentry_exit_to_user_mode (arch/x86/include/asm/jump_label.h:27 include/linux/context_tracking_state.h:106 include/linux/context_tracking.h:41 kernel/entry/common.c:132 kernel/entry/common.c:309)
[ 63.434460][ T5018] asm_exc_page_fault (arch/x86/include/asm/idtentry.h:570)
[ 63.435617][ T5018] RIP: 0023:0xf7fe4549
[ 63.436620][ T5018] Code: Unable to access opcode bytes at RIP 0xf7fe451f.
[ 63.437862][ T5018] RSP: 002b:00000000fff56a2c EFLAGS: 00010206
[ 63.439090][ T5018] RAX: 0000000000000000 RBX: 0000000000000003 RCX: 00000000f7fe4549
[ 63.440514][ T5018] RDX: 0000000000000077 RSI: 0000000000000000 RDI: 0000000049fd6bf3
[ 63.441751][ T5018] RBP: 000000000000ff5a R08: 0000000000000000 R09: 0000000000000000
[ 63.443169][ T5018] R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
[ 63.444621][ T5018] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 63.446119][ T5018] </TASK>
[ 63.447194][ T5018] Modules linked in: bridge stp llc can_bcm can_raw can crypto_user nfnetlink scsi_transport_iscsi atm sctp ip6_udp_tunnel udp_tunnel libcrc32c sr_mod cdrom bochs intel_rapl_msr drm_vram_helper drm_ttm_helper intel_rapl_common ata_generic crct10dif_pclmul ttm crc32_pclmul drm_kms_helper ppdev crc32c_intel ghash_clmulni_intel ata_piix syscopyarea sysfillrect sysimgblt fb_sys_fops rapl libata drm joydev i2c_piix4 serio_raw parport_pc parport
[ 63.464556][ T5018] ---[ end trace 0000000000000000 ]---
[ 63.466679][ T5018] RIP: 0010:exit_mmap (mm/mmap.c:3167 (discriminator 1))
[ 63.468562][ T5018] Code: 41 80 8c 24 7a 03 00 00 20 e9 17 fe ff ff 4c 89 e7 e8 60 86 04 00 e9 f7 fd ff ff be 01 00 00 00 4c 89 e7 e8 8e ba fe ff eb c0 <0f> 0b e8 c5 c3 a1 00 0f 1f 44 00 00 0f 1f 44 00 00 41 55 41 54 49
All code
========
   0:   41 80 8c 24 7a 03 00    orb    $0x20,0x37a(%r12)
   7:   00 20
   9:   e9 17 fe ff ff          jmpq   0xfffffffffffffe25
   e:   4c 89 e7                mov    %r12,%rdi
  11:   e8 60 86 04 00          callq  0x48676
  16:   e9 f7 fd ff ff          jmpq   0xfffffffffffffe12
  1b:   be 01 00 00 00          mov    $0x1,%esi
  20:   4c 89 e7                mov    %r12,%rdi
  23:   e8 8e ba fe ff          callq  0xfffffffffffebab6
  28:   eb c0                   jmp    0xffffffffffffffea
  2a:*  0f 0b                   ud2             <-- trapping instruction
  2c:   e8 c5 c3 a1 00          callq  0xa1c3f6
  31:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
  36:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
  3b:   41 55                   push   %r13
  3d:   41 54                   push   %r12
  3f:   49                      rex.WB

Code starting with the faulting instruction
===========================================
   0:   0f 0b                   ud2
   2:   e8 c5 c3 a1 00          callq  0xa1c3cc
   7:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
   c:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
  11:   41 55                   push   %r13
  13:   41 54                   push   %r12
  15:   49                      rex.WB


To reproduce:

# build kernel
cd linux
cp config-6.0.0-rc3-00284-g763ecb035029 .config
make HOSTCC=gcc-11 CC=gcc-11 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage modules
make HOSTCC=gcc-11 CC=gcc-11 ARCH=x86_64 INSTALL_MOD_PATH=<mod-install-dir> modules_install
cd <mod-install-dir>
find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz


git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email

# If you come across any failure that blocks the test,
# please remove the ~/.lkp and /lkp directories to run from a clean state.



--
0-DAY CI Kernel Test Service
https://01.org/lkp



Attachments:
(No filename) (8.06 kB)
config-6.0.0-rc3-00284-g763ecb035029 (166.50 kB)
job-script (4.53 kB)
dmesg.xz (28.00 kB)

2022-10-07 01:45:07

by Yu Zhao

Subject: Re: [mm] 763ecb0350: kernel_BUG_at_mm/mmap.c

On Wed, Oct 5, 2022 at 9:30 AM kernel test robot <[email protected]> wrote:
>
> [ 63.390267][ T5018] ------------[ cut here ]------------
> [ 63.391875][ T5018] kernel BUG at mm/mmap.c:3167!
> [ 63.393264][ T5018] invalid opcode: 0000 [#1] SMP PTI
> [ 63.394501][ T5018] CPU: 1 PID: 5018 Comm: trinity-c1 Not tainted 6.0.0-rc3-00284-g763ecb035029 #1
> [ 63.396050][ T5018] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
> [ 63.397726][ T5018] RIP: 0010:exit_mmap (mm/mmap.c:3167 (discriminator 1))

Thanks, Oliver.

The attached dmesg doesn't say much. My guess is the oom reaper jumped
in between:

	mmap_read_unlock(mm);

	/*
	 * Set MMF_OOM_SKIP to hide this task from the oom killer/reaper
	 * because the memory has been already freed.
	 */
	set_bit(MMF_OOM_SKIP, &mm->flags);
	mmap_write_lock(mm);

It seems to me we need to hold the lock for write all the time. But
there is probably a reason we didn't do it in the first place.

2022-10-07 08:51:10

by Yu Zhao

Subject: Re: [mm] 763ecb0350: kernel_BUG_at_mm/mmap.c

On Thu, Oct 6, 2022 at 6:47 PM Yu Zhao <[email protected]> wrote:
>
> Thanks, Oliver.
>
> The attached dmesg doesn't say much. My guess is the oom reaper jumped
> in between:
>
> 	mmap_read_unlock(mm);
>
> 	/*
> 	 * Set MMF_OOM_SKIP to hide this task from the oom killer/reaper
> 	 * because the memory has been already freed.
> 	 */
> 	set_bit(MMF_OOM_SKIP, &mm->flags);
> 	mmap_write_lock(mm);
>
> It seems to me we need to hold the lock for write all the time. But
> there is probably a reason we didn't do it in the first place.

Apparently this is safe: I checked all places that change VMAs and
none of them can race with the above (the oom reaper was a red herring).