2023-02-27 01:33:23

by kernel test robot

[permalink] [raw]
Subject: [linus:master] [mm/mmap] 0503ea8f5b: kernel_BUG_at_mm/filemap.c


Greeting,

FYI, we noticed kernel_BUG_at_mm/filemap.c due to commit (built with gcc-11):

commit: 0503ea8f5ba73eb3ab13a81c1eefbaf51405385a ("mm/mmap: remove __vma_adjust()")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

[test failed on linux-next/master 0222aa9800b25ff171d6dcabcabcd5c42c6ffc3f]

in testcase: trinity
version: trinity-static-i386-x86_64-1c734c75-1_2020-01-06
with following parameters:

runtime: 300s
group: group-04

test-description: Trinity is a linux system call fuzz tester.
test-url: http://codemonkey.org.uk/projects/trinity/


on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):


please be noted, as below table, parent also has other type issues, and we
found they happen almost at same position of kernel_BUG_at_mm/filemap.c for
this commit if looking into dmesg (attached two parent dmesgs as well)

we don't have knowledge if this commit fixes some problem in parent then
run further until further issues, but since this commit touches
mm/filemap.c, we just made out this report FYI

BTW, we also noticed there is a fix commit
07dc4b1862035 (" mm/mremap: fix dup_anon_vma() in vma_merge() case 4")
by further testing, BUG_at_mm/filemap.c is still existing there.

+---------------------------------------------+------------+------------+
| | 287051b185 | 0503ea8f5b |
+---------------------------------------------+------------+------------+
| BUG:kernel_NULL_pointer_dereference,address | 11 | |
| Oops:#[##] | 11 | |
| RIP:dup_anon_vma | 11 | |
| Kernel_panic-not_syncing:Fatal_exception | 20 | 9 |
| canonical_address#:#[##] | 9 | |
| RIP:anon_vma_clone | 9 | |
| kernel_BUG_at_mm/filemap.c | 0 | 9 |
| invalid_opcode:#[##] | 0 | 9 |
| RIP:filemap_unaccount_folio | 0 | 9 |
+---------------------------------------------+------------+------------+


If you fix the issue, kindly add following tag
| Reported-by: kernel test robot <[email protected]>
| Link: https://lore.kernel.org/oe-lkp/[email protected]


[ 28.065728][ T4983] ------------[ cut here ]------------
[ 28.066480][ T4983] kernel BUG at mm/filemap.c:153!
[ 28.067153][ T4983] invalid opcode: 0000 [#1] SMP PTI
[ 28.067868][ T4983] CPU: 0 PID: 4983 Comm: trinity-c3 Not tainted 6.2.0-rc4-00443-g0503ea8f5ba7 #1
[ 28.069001][ T4983] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-5 04/01/2014
[ 28.072145][ T4983] RIP: 0010:filemap_unaccount_folio (filemap.c:?)
[ 28.072927][ T4983] Code: 89 fb 0f ba e0 10 72 05 8b 46 30 eb 0a 8b 46 58 85 c0 7f 07 8b 46 54 85 c0 78 11 48 c7 c6 a0 aa 24 82 48 89 ef e8 0b d2 02 00 <0f> 0b 48 89 ef e8 01 e7 ff ff be 13 00 00 00 48 89 ef 41 89 c4 41
All code
========
0: 89 fb mov %edi,%ebx
2: 0f ba e0 10 bt $0x10,%eax
6: 72 05 jb 0xd
8: 8b 46 30 mov 0x30(%rsi),%eax
b: eb 0a jmp 0x17
d: 8b 46 58 mov 0x58(%rsi),%eax
10: 85 c0 test %eax,%eax
12: 7f 07 jg 0x1b
14: 8b 46 54 mov 0x54(%rsi),%eax
17: 85 c0 test %eax,%eax
19: 78 11 js 0x2c
1b: 48 c7 c6 a0 aa 24 82 mov $0xffffffff8224aaa0,%rsi
22: 48 89 ef mov %rbp,%rdi
25: e8 0b d2 02 00 callq 0x2d235
2a:* 0f 0b ud2 <-- trapping instruction
2c: 48 89 ef mov %rbp,%rdi
2f: e8 01 e7 ff ff callq 0xffffffffffffe735
34: be 13 00 00 00 mov $0x13,%esi
39: 48 89 ef mov %rbp,%rdi
3c: 41 89 c4 mov %eax,%r12d
3f: 41 rex.B

Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: 48 89 ef mov %rbp,%rdi
5: e8 01 e7 ff ff callq 0xffffffffffffe70b
a: be 13 00 00 00 mov $0x13,%esi
f: 48 89 ef mov %rbp,%rdi
12: 41 89 c4 mov %eax,%r12d
15: 41 rex.B
[ 28.075337][ T4983] RSP: 0000:ffffc90000223b08 EFLAGS: 00010046
[ 28.076117][ T4983] RAX: 0000000000000039 RBX: ffff8881195e4dd8 RCX: 0000000000000027
[ 28.077144][ T4983] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff88842fc1c680
[ 28.078211][ T4983] RBP: ffffea0005fa0b00 R08: 0000000000000000 R09: 0000000000000019
[ 28.079264][ T4983] R10: 0000000000000000 R11: 6d75642065676170 R12: ffffea0005fa0b00
[ 28.080312][ T4983] R13: 0000000000000000 R14: ffff8881195e4dd8 R15: 000000000000000c
[ 28.081380][ T4983] FS: 0000000000000000(0000) GS:ffff88842fc00000(0063) knlGS:0000000008acb840
[ 28.082525][ T4983] CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
[ 28.083399][ T4983] CR2: 0000000000200000 CR3: 0000000118c36000 CR4: 00000000000406f0
[ 28.084497][ T4983] DR0: fffffffff68cc000 DR1: 0000000000000000 DR2: 0000000000000000
[ 28.085589][ T4983] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[ 28.086685][ T4983] Call Trace:
[ 28.087222][ T4983] <TASK>
[ 28.087701][ T4983] __filemap_remove_folio (??:?)
[ 28.088418][ T4983] ? unmap_mapping_range_tree (memory.c:?)
[ 28.089168][ T4983] ? mapping_can_writeback+0x5/0xc
[ 28.089940][ T4983] filemap_remove_folio (??:?)
[ 28.090627][ T4983] truncate_inode_folio (??:?)
[ 28.091342][ T4983] shmem_undo_range (shmem.c:?)
[ 28.092036][ T4983] shmem_truncate_range (??:?)
[ 28.092753][ T4983] shmem_fallocate (shmem.c:?)
[ 28.093444][ T4983] vfs_fallocate (??:?)
[ 28.094128][ T4983] madvise_vma_behavior (madvise.c:?)
[ 28.094874][ T4983] do_madvise (??:?)
[ 28.095491][ T4983] __ia32_sys_madvise (??:?)
[ 28.096166][ T4983] do_int80_syscall_32 (??:?)
[ 28.096885][ T4983] entry_INT80_compat (??:?)
[ 28.097538][ T4983] RIP: 0023:0x80a3392
[ 28.098133][ T4983] Code: 89 c8 c3 90 8d 74 26 00 85 c0 c7 01 01 00 00 00 75 d8 a1 c8 a9 ac 08 eb d1 66 90 66 90 66 90 66 90 66 90 66 90 66 90 90 cd 80 <c3> 8d b6 00 00 00 00 8d bc 27 00 00 00 00 8b 10 a3 f0 a9 ac 08 85
All code
========
0: 89 c8 mov %ecx,%eax
2: c3 retq
3: 90 nop
4: 8d 74 26 00 lea 0x0(%rsi,%riz,1),%esi
8: 85 c0 test %eax,%eax
a: c7 01 01 00 00 00 movl $0x1,(%rcx)
10: 75 d8 jne 0xffffffffffffffea
12: a1 c8 a9 ac 08 eb d1 movabs 0x9066d1eb08aca9c8,%eax
19: 66 90
1b: 66 90 xchg %ax,%ax
1d: 66 90 xchg %ax,%ax
1f: 66 90 xchg %ax,%ax
21: 66 90 xchg %ax,%ax
23: 66 90 xchg %ax,%ax
25: 66 90 xchg %ax,%ax
27: 90 nop
28: cd 80 int $0x80
2a:* c3 retq <-- trapping instruction
2b: 8d b6 00 00 00 00 lea 0x0(%rsi),%esi
31: 8d bc 27 00 00 00 00 lea 0x0(%rdi,%riz,1),%edi
38: 8b 10 mov (%rax),%edx
3a: a3 .byte 0xa3
3b: f0 lock
3c: a9 .byte 0xa9
3d: ac lods %ds:(%rsi),%al
3e: 08 .byte 0x8
3f: 85 .byte 0x85

Code starting with the faulting instruction
===========================================
0: c3 retq
1: 8d b6 00 00 00 00 lea 0x0(%rsi),%esi
7: 8d bc 27 00 00 00 00 lea 0x0(%rdi,%riz,1),%edi
e: 8b 10 mov (%rax),%edx
10: a3 .byte 0xa3
11: f0 lock
12: a9 .byte 0xa9
13: ac lods %ds:(%rsi),%al
14: 08 .byte 0x8
15: 85 .byte 0x85
[ 28.100541][ T4983] RSP: 002b:00000000ffa5c9b8 EFLAGS: 00000292 ORIG_RAX: 00000000000000db
[ 28.101693][ T4983] RAX: ffffffffffffffda RBX: 00000000f500d000 RCX: 000000000014c000
[ 28.102812][ T4983] RDX: 0000000000000009 RSI: 0000000000200000 RDI: 0000000000000002
[ 28.103946][ T4983] RBP: 00000000000000ff R08: 0000000000000000 R09: 0000000000000000
[ 28.105054][ T4983] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 28.106161][ T4983] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 28.107273][ T4983] </TASK>
[ 28.107785][ T4983] Modules linked in: can_bcm can_raw can cn scsi_transport_iscsi sr_mod cdrom ata_generic
[ 28.109085][ T4983] ---[ end trace 0000000000000000 ]---
[ 28.109822][ T4983] RIP: 0010:filemap_unaccount_folio (filemap.c:?)
[ 28.110662][ T4983] Code: 89 fb 0f ba e0 10 72 05 8b 46 30 eb 0a 8b 46 58 85 c0 7f 07 8b 46 54 85 c0 78 11 48 c7 c6 a0 aa 24 82 48 89 ef e8 0b d2 02 00 <0f> 0b 48 89 ef e8 01 e7 ff ff be 13 00 00 00 48 89 ef 41 89 c4 41
All code
========
0: 89 fb mov %edi,%ebx
2: 0f ba e0 10 bt $0x10,%eax
6: 72 05 jb 0xd
8: 8b 46 30 mov 0x30(%rsi),%eax
b: eb 0a jmp 0x17
d: 8b 46 58 mov 0x58(%rsi),%eax
10: 85 c0 test %eax,%eax
12: 7f 07 jg 0x1b
14: 8b 46 54 mov 0x54(%rsi),%eax
17: 85 c0 test %eax,%eax
19: 78 11 js 0x2c
1b: 48 c7 c6 a0 aa 24 82 mov $0xffffffff8224aaa0,%rsi
22: 48 89 ef mov %rbp,%rdi
25: e8 0b d2 02 00 callq 0x2d235
2a:* 0f 0b ud2 <-- trapping instruction
2c: 48 89 ef mov %rbp,%rdi
2f: e8 01 e7 ff ff callq 0xffffffffffffe735
34: be 13 00 00 00 mov $0x13,%esi
39: 48 89 ef mov %rbp,%rdi
3c: 41 89 c4 mov %eax,%r12d
3f: 41 rex.B

Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: 48 89 ef mov %rbp,%rdi
5: e8 01 e7 ff ff callq 0xffffffffffffe70b
a: be 13 00 00 00 mov $0x13,%esi
f: 48 89 ef mov %rbp,%rdi
12: 41 89 c4 mov %eax,%r12d
15: 41 rex.B


To reproduce:

# build kernel
cd linux
cp config-6.2.0-rc4-00443-g0503ea8f5ba7 .config
make HOSTCC=gcc-11 CC=gcc-11 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage modules
make HOSTCC=gcc-11 CC=gcc-11 ARCH=x86_64 INSTALL_MOD_PATH=<mod-install-dir> modules_install
cd <mod-install-dir>
find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz


git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email

# if come across any failure that blocks the test,
# please remove ~/.lkp and /lkp dir to run from a clean state.



--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests



Attachments:
(No filename) (11.15 kB)
config-6.2.0-rc4-00443-g0503ea8f5ba7 (127.89 kB)
job-script (4.34 kB)
dmesg.xz (27.61 kB)
dmesg-parent-1.xz (27.07 kB)
dmesg-parent-2.xz (27.39 kB)
Download all attachments

2023-02-27 03:18:25

by Liam R. Howlett

[permalink] [raw]
Subject: Re: [linus:master] [mm/mmap] 0503ea8f5b: kernel_BUG_at_mm/filemap.c

* kernel test robot <[email protected]> [230226 20:33]:
>
> Greeting,
>
> FYI, we noticed kernel_BUG_at_mm/filemap.c due to commit (built with gcc-11):
>
> commit: 0503ea8f5ba73eb3ab13a81c1eefbaf51405385a ("mm/mmap: remove __vma_adjust()")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> [test failed on linux-next/master 0222aa9800b25ff171d6dcabcabcd5c42c6ffc3f]

If this is linux-next, shouldn't that mailing list be in the Cc list?

>
> in testcase: trinity
> version: trinity-static-i386-x86_64-1c734c75-1_2020-01-06
> with following parameters:
>
> runtime: 300s
> group: group-04
>
> test-description: Trinity is a linux system call fuzz tester.
> test-url: http://codemonkey.org.uk/projects/trinity/
>
>
> on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
>
> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
>
>
> please be noted, as below table, parent also has other type issues, and we
> found they happen almost at same position of kernel_BUG_at_mm/filemap.c for
> this commit if looking into dmesg (attached two parent dmesgs as well)

I don't understand what you are saying with the above paragraph.

I thought I understood that the bug happens in the previous commits and
there is a dmesg from the previous two parents attached.. but when I
look at the attached two dmesg, they are from the same commit and
neither has anything to do with filemap.

And why would we blame the later commit for the same bug?

Did something go wrong with the bisection?

>
> we don't have knowledge if this commit fixes some problem in parent then
> run further until further issues, but since this commit touches
> mm/filemap.c, we just made out this report FYI

I changed one line in a comment in mm/filemap.c in the commit.

-------------
diff --git a/mm/filemap.c b/mm/filemap.c
index c915ded191f0..992554c18f1f 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -97,7 +97,7 @@
* ->i_pages lock (__sync_single_inode)
*
* ->i_mmap_rwsem
- * ->anon_vma.lock (vma_adjust)
+ * ->anon_vma.lock (vma_merge)
*

-----------

Are you sure about this bisection? I'm not saying it isn't my fault or
looking to blame others, but I suspect we are indeed looking at the
wrong commit here.

>
> BTW, we also noticed there is a fix commit
> 07dc4b1862035 (" mm/mremap: fix dup_anon_vma() in vma_merge() case 4")
> by further testing, BUG_at_mm/filemap.c is still existing there.
>
> +---------------------------------------------+------------+------------+
> | | 287051b185 | 0503ea8f5b |
> +---------------------------------------------+------------+------------+
> | BUG:kernel_NULL_pointer_dereference,address | 11 | |
> | Oops:#[##] | 11 | |
> | RIP:dup_anon_vma | 11 | |
> | Kernel_panic-not_syncing:Fatal_exception | 20 | 9 |
> | canonical_address#:#[##] | 9 | |
> | RIP:anon_vma_clone | 9 | |
> | kernel_BUG_at_mm/filemap.c | 0 | 9 |
> | invalid_opcode:#[##] | 0 | 9 |
> | RIP:filemap_unaccount_folio | 0 | 9 |
> +---------------------------------------------+------------+------------+
>

At what commit did problems start showing up regardless of what the
problem was? I did not see any other emails from this bot since
2023-02-12, but clearly it is reporting problems with earlier commits
considering the table above.

>
> If you fix the issue, kindly add following tag
> | Reported-by: kernel test robot <[email protected]>
> | Link: https://lore.kernel.org/oe-lkp/[email protected]
>
>
> [ 28.065728][ T4983] ------------[ cut here ]------------
> [ 28.066480][ T4983] kernel BUG at mm/filemap.c:153!
> [ 28.067153][ T4983] invalid opcode: 0000 [#1] SMP PTI
> [ 28.067868][ T4983] CPU: 0 PID: 4983 Comm: trinity-c3 Not tainted 6.2.0-rc4-00443-g0503ea8f5ba7 #1
> [ 28.069001][ T4983] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-5 04/01/2014
> [ 28.072145][ T4983] RIP: 0010:filemap_unaccount_folio (filemap.c:?)
> [ 28.072927][ T4983] Code: 89 fb 0f ba e0 10 72 05 8b 46 30 eb 0a 8b 46 58 85 c0 7f 07 8b 46 54 85 c0 78 11 48 c7 c6 a0 aa 24 82 48 89 ef e8 0b d2 02 00 <0f> 0b 48 89 ef e8 01 e7 ff ff be 13 00 00 00 48 89 ef 41 89 c4 41
...

> [ 28.087701][ T4983] __filemap_remove_folio (??:?)
> [ 28.088418][ T4983] ? unmap_mapping_range_tree (memory.c:?)
> [ 28.089168][ T4983] ? mapping_can_writeback+0x5/0xc
> [ 28.089940][ T4983] filemap_remove_folio (??:?)
> [ 28.090627][ T4983] truncate_inode_folio (??:?)
> [ 28.091342][ T4983] shmem_undo_range (shmem.c:?)
> [ 28.092036][ T4983] shmem_truncate_range (??:?)
> [ 28.092753][ T4983] shmem_fallocate (shmem.c:?)
> [ 28.093444][ T4983] vfs_fallocate (??:?)
> [ 28.094128][ T4983] madvise_vma_behavior (madvise.c:?)
> [ 28.094874][ T4983] do_madvise (??:?)
> [ 28.095491][ T4983] __ia32_sys_madvise (??:?)
> [ 28.096166][ T4983] do_int80_syscall_32 (??:?)
> [ 28.096885][ T4983] entry_INT80_compat (??:?)

What happened to your line numbers? Didn't these show up before? They
did on 2023-02-06 [1]

...
>
> To reproduce:
>
> # build kernel
> cd linux
> cp config-6.2.0-rc4-00443-g0503ea8f5ba7 .config
> make HOSTCC=gcc-11 CC=gcc-11 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage modules
> make HOSTCC=gcc-11 CC=gcc-11 ARCH=x86_64 INSTALL_MOD_PATH=<mod-install-dir> modules_install
> cd <mod-install-dir>
> find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz
>
>
> git clone https://github.com/intel/lkp-tests.git
> cd lkp-tests
> bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email
>
> # if come across any failure that blocks the test,
> # please remove ~/.lkp and /lkp dir to run from a clean state.
>

This does not work for me. Since my last use of lkp it seems something
was changed and now -watchdog is not recognized by my qemu and so my
attempts to reproduce this are failing. Is there a way to avoid using
the -watchdog flag? Running the command by hand fails as it seems some
files are removed on exit?

I did try to remove the directories and run from a clean state, but it
still fails for me. (see below)


Thanks,
Liam

1. https://lore.kernel.org/linux-mm/[email protected]/

Log of failed lkp 68d76160fd7bb767c4a63e7709706b462c475e1b
======================
x86_64
==> Making package: lkp-src 0-1 (Sun Feb 26 09:31:45 PM EST 2023)
==> Checking runtime dependencies...
==> Checking buildtime dependencies...
==> WARNING: Using existing $srcdir/ tree
==> Removing existing $pkgdir/ directory...
==> Starting build()...
make: Entering directory '/home/jedix/lkp-tests/bin/event'
klcc -D_FORTIFY_SOURCE=2 -c -o wakeup.o wakeup.c
klcc -Wl,-O1,--sort-common,--as-needed,-z,relro -static -o wakeup wakeup.o
rm -f wakeup.o
strip wakeup
make: Leaving directory '/home/jedix/lkp-tests/bin/event'
==> Entering fakeroot environment...
x86_64
==> Starting package()...
==> Creating package "lkp-src"...
88466 blocks
renamed '/home/jedix/.lkp/cache/lkp-x86_64.cgz.tmp' -> '/home/jedix/.lkp/cache/lkp-x86_64.cgz'
==> Leaving fakeroot environment.
==> Finished making: lkp-src 0-1 (Sun Feb 26 09:31:47 PM EST 2023)
~/lkp-tests
11 blocks
result_root: /home/jedix/.lkp//result/trinity/group-04-300s/vm-snb/yocto-i386-minimal-20190520.cgz/x86_64-kexec/gcc-11/0503ea8f5ba73eb3ab13a81c1eefbaf51405385a/0
downloading initrds ...
use local modules: /home/jedix/.lkp/cache/modules.cgz
/usr/bin/wget -q --timeout=1800 --tries=1 --local-encoding=UTF-8 https://download.01.org/0day-ci/lkp-qemu/osimage/yocto/yocto-i386-minimal-20190520.cgz -N -P /home/jedix/.lkp/cache/osimage/yocto
17916 blocks
/usr/bin/wget -q --timeout=1800 --tries=1 --local-encoding=UTF-8 https://download.01.org/0day-ci/lkp-qemu/osimage/pkg/debian-x86_64-20180403.cgz/trinity-static-i386-x86_64-1c734c75-1_2020-01-06.cgz -N -P /home/jedix/.lkp/cache/osimage/pkg/debian-x86_64-20180403.cgz
43019 blocks
exec command: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -fsdev local,id=test_dev,path=/home/jedix/.lkp//result/trinity/group-04-300s/vm-snb/yocto-i386-minimal-20190520.cgz/x86_64-kexec/gcc-11/0503ea8f5ba73eb3ab13a81c1eefbaf51405385a/0,security_model=none -device virtio-9p-pci,fsdev=test_dev,mount_tag=9p/virtfs_mount -kernel bzImage -append root=/dev/ram0 RESULT_ROOT=/result/trinity/group-04-300s/vm-snb/yocto-i386-minimal-20190520.cgz/x86_64-kexec/gcc-11/0503ea8f5ba73eb3ab13a81c1eefbaf51405385a/19 BOOT_IMAGE=/pkg/linux/x86_64-kexec/gcc-11/0503ea8f5ba73eb3ab13a81c1eefbaf51405385a/vmlinuz-6.2.0-rc4-00443-g0503ea8f5ba7 branch=linus/master job=/lkp/jobs/scheduled/vm-meta-102/trinity-group-04-300s-yocto-i386-minimal-20190520.cgz-0503ea8f5ba73eb3ab13a81c1eefbaf51405385a-20230224-7240-hzx70n-16.yaml user=lkp ARCH=x86_64 kconfig=x86_64-kexec commit=0503ea8f5ba73eb3ab13a81c1eefbaf51405385a initcall_debug nmi_watchdog=0 vmalloc=256M initramfs_async=0 page_owner=on max_uptime=1200 LKP_LOCAL_RUN=1 selinux=0 debug apic=debug sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100 net.ifnames=0 printk.devkmsg=on panic=-1 softlockup_panic=1 nmi_watchdog=panic oops=panic load_ramdisk=2 prompt_ramdisk=0 drbd.minor_count=8 systemd.log_level=err ignore_loglevel console=tty0 earlyprintk=ttyS0,115200 console=ttyS0,115200 vga=normal rw ip=dhcp result_service=9p/virtfs_mount -initrd /home/jedix/.lkp/cache/final_initrd -smp 2 -m 3419M -no-reboot -watchdog i6300esb -rtc base=localtime -device e1000,netdev=net0 -netdev user,id=net0 -display none -monitor null -serial stdio
qemu-system-x86_64: -watchdog: invalid option




2023-02-27 03:52:39

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [linus:master] [mm/mmap] 0503ea8f5b: kernel_BUG_at_mm/filemap.c

On Sun, Feb 26, 2023 at 10:17:56PM -0500, Liam R. Howlett wrote:
> Are you sure about this bisection? I'm not saying it isn't my fault or
> looking to blame others, but I suspect we are indeed looking at the
> wrong commit here.

I concur. Looking at the backtrace, I think it's more likely to be
a pagecache, swapcache or shmfs issue. I'll look more in the morning.

> > bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email
> >
> > # if come across any failure that blocks the test,
> > # please remove ~/.lkp and /lkp dir to run from a clean state.
> >
>
> This does not work for me. Since my last use of lkp it seems something
> was changed and now -watchdog is not recognized by my qemu and so my
> attempts to reproduce this are failing. Is there a way to avoid using
> the -watchdog flag? Running the command by hand fails as it seems some
> files are removed on exit?

https://www.qemu.org/docs/master/about/removed-features.html

-watchdog (since 7.2)

Use -device instead.

(I have no idea what this means, but maybe it's a clue?)

2023-03-01 07:32:32

by kernel test robot

[permalink] [raw]
Subject: Re: [linus:master] [mm/mmap] 0503ea8f5b: kernel_BUG_at_mm/filemap.c

Hi Liam,

On Sun, Feb 26, 2023 at 10:17:56PM -0500, Liam R. Howlett wrote:
> * kernel test robot <[email protected]> [230226 20:33]:
> >
> > Greeting,
> >
> > FYI, we noticed kernel_BUG_at_mm/filemap.c due to commit (built with gcc-11):
> >
> > commit: 0503ea8f5ba73eb3ab13a81c1eefbaf51405385a ("mm/mmap: remove __vma_adjust()")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> >
> > [test failed on linux-next/master 0222aa9800b25ff171d6dcabcabcd5c42c6ffc3f]
>
> If this is linux-next, shouldn't that mailing list be in the Cc list?

when this report was generated, the commit is already in mainline, so we don't
add linux-next mailing list.

when our auto-bisect captured a mainline commit as first bad commit for one
issue, it will also check latest mainline and latest linux-next at that time to
confirm the issue still exists, otherwise, auto-bisect will regard the issue
'has been fixed' then don't report.

above means the auto-bisect checked the "linux-next/master 0222aa9800b25", but
previously we won't list mainline check if there is already a linux-next check.
Since this seems cause confusion, now we changed a bit about this.
as in a new report also to you
[linus:master] [mm/mmap] 04241ffe3f: Kernel_panic-not_syncing:Fatal_exception
in
[1] https://lore.kernel.org/all/[email protected]/
(I will also mention this report later)
you may notice that change. do you think that is better or do you have other
suggests? Thanks a lot!

>
> >
> > in testcase: trinity
> > version: trinity-static-i386-x86_64-1c734c75-1_2020-01-06
> > with following parameters:
> >
> > runtime: 300s
> > group: group-04
> >
> > test-description: Trinity is a linux system call fuzz tester.
> > test-url: http://codemonkey.org.uk/projects/trinity/
> >
> >
> > on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
> >
> > caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
> >
> >
> > please be noted, as below table, parent also has other type issues, and we
> > found they happen almost at same position of kernel_BUG_at_mm/filemap.c for
> > this commit if looking into dmesg (attached two parent dmesgs as well)
>
> I don't understand what you are saying with the above paragraph.
>
> I thought I understood that the bug happens in the previous commits and
> there is a dmesg from the previous two parents attached.. but when I
> look at the attached two dmesg, they are from the same commit and
> neither has anything to do with filemap.

Let me clarify. they are really from same commit, i.e. 287051b185048c4a,
the parent of 0503ea8f5ba73eb3ab13a81c1ee

maybe from below table, it could be clearer.

for both 0503ea8f5ba73eb3ab13a81c1ee and its parent, we tested 20 times.
then on parent, we observed "dmesg.RIP:anon_vma_clone" 9 times, observed
"dmesg.RIP:dup_anon_vma" 11 times, so I attached two parent dmesgs to
demostrate each case.

for 0503ea8f5ba73eb3ab13a81c1ee, we never observed those 2 issues, instead,
"dmesg.kernel_BUG_at_mm/filemap.c" shows in 9 runs out of 20.

287051b185048c4a 0503ea8f5ba73eb3ab13a81c1ee
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
9:20 -45% :20 dmesg.RIP:anon_vma_clone
11:20 -55% :20 dmesg.RIP:dup_anon_vma
:20 45% 9:20 dmesg.kernel_BUG_at_mm/filemap.c


>
> And why would we blame the later commit for the same bug?

this is due to our auto-bisect could only use some simple logic. in this case,
it finds the dmesg.kernel_BUG_at_mm/filemap.c first occurs on 0503ea8f5b and
not on parent, it has no knowledge if this issue is related with
RIP:anon_vma_clone or RIP:dup_anon_vma on parent (BTW, neither humans in our
team has), and this dmesg.kernel_BUG_at_mm/filemap.c still occcurs on latest
linux-next/master and mainline, so it think it captured a real 'fbc'.
we have a manual check step for report, that's the reason you saw above
paragrah you said which is confusing.

>
> Did something go wrong with the bisection?
>
> >
> > we don't have knowledge if this commit fixes some problem in parent then
> > run further until further issues, but since this commit touches
> > mm/filemap.c, we just made out this report FYI
>
> I changed one line in a comment in mm/filemap.c in the commit.
>
> -------------
> diff --git a/mm/filemap.c b/mm/filemap.c
> index c915ded191f0..992554c18f1f 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -97,7 +97,7 @@
> * ->i_pages lock (__sync_single_inode)
> *
> * ->i_mmap_rwsem
> - * ->anon_vma.lock (vma_adjust)
> + * ->anon_vma.lock (vma_merge)
> *
>
> -----------
>
> Are you sure about this bisection? I'm not saying it isn't my fault or
> looking to blame others, but I suspect we are indeed looking at the
> wrong commit here.


as above mentioned [1], now we made out a new report upon 04241ffe3f which seems
less confusing.

(BTW, a tricky thing is for report 04241ffe3f, the auto-bisect actually uses
"Kernel_panic-not_syncing:Fatal_exception" as the keyword to do bisection.
this "Kernel_panic-not_syncing:Fatal_exception" could be caused by both
(RIP:anon_vma_clone or RIP:dup_anon_vma) and kernel_BUG_at_mm/filemap.c.
if it uses RIP:anon_vma_clone or RIP:dup_anon_vma to do bisect, it will fail
since starting from 0503ea8f5b, there is only kernel_BUG_at_mm/filemap.c,
so it will think issue disppear in latest maineline or linux-next/master)


>
> >
> > BTW, we also noticed there is a fix commit
> > 07dc4b1862035 (" mm/mremap: fix dup_anon_vma() in vma_merge() case 4")
> > by further testing, BUG_at_mm/filemap.c is still existing there.
> >
> > +---------------------------------------------+------------+------------+
> > | | 287051b185 | 0503ea8f5b |
> > +---------------------------------------------+------------+------------+
> > | BUG:kernel_NULL_pointer_dereference,address | 11 | |
> > | Oops:#[##] | 11 | |
> > | RIP:dup_anon_vma | 11 | |
> > | Kernel_panic-not_syncing:Fatal_exception | 20 | 9 |
> > | canonical_address#:#[##] | 9 | |
> > | RIP:anon_vma_clone | 9 | |
> > | kernel_BUG_at_mm/filemap.c | 0 | 9 |
> > | invalid_opcode:#[##] | 0 | 9 |
> > | RIP:filemap_unaccount_folio | 0 | 9 |
> > +---------------------------------------------+------------+------------+
> >
>
> At what commit did problems start showing up regardless of what the
> problem was? I did not see any other emails from this bot since
> 2023-02-12, but clearly it is reporting problems with earlier commits
> considering the table above.
>
> >
> > If you fix the issue, kindly add following tag
> > | Reported-by: kernel test robot <[email protected]>
> > | Link: https://lore.kernel.org/oe-lkp/[email protected]
> >
> >
> > [ 28.065728][ T4983] ------------[ cut here ]------------
> > [ 28.066480][ T4983] kernel BUG at mm/filemap.c:153!
> > [ 28.067153][ T4983] invalid opcode: 0000 [#1] SMP PTI
> > [ 28.067868][ T4983] CPU: 0 PID: 4983 Comm: trinity-c3 Not tainted 6.2.0-rc4-00443-g0503ea8f5ba7 #1
> > [ 28.069001][ T4983] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-5 04/01/2014
> > [ 28.072145][ T4983] RIP: 0010:filemap_unaccount_folio (filemap.c:?)
> > [ 28.072927][ T4983] Code: 89 fb 0f ba e0 10 72 05 8b 46 30 eb 0a 8b 46 58 85 c0 7f 07 8b 46 54 85 c0 78 11 48 c7 c6 a0 aa 24 82 48 89 ef e8 0b d2 02 00 <0f> 0b 48 89 ef e8 01 e7 ff ff be 13 00 00 00 48 89 ef 41 89 c4 41
> ...
>
> > [ 28.087701][ T4983] __filemap_remove_folio (??:?)
> > [ 28.088418][ T4983] ? unmap_mapping_range_tree (memory.c:?)
> > [ 28.089168][ T4983] ? mapping_can_writeback+0x5/0xc
> > [ 28.089940][ T4983] filemap_remove_folio (??:?)
> > [ 28.090627][ T4983] truncate_inode_folio (??:?)
> > [ 28.091342][ T4983] shmem_undo_range (shmem.c:?)
> > [ 28.092036][ T4983] shmem_truncate_range (??:?)
> > [ 28.092753][ T4983] shmem_fallocate (shmem.c:?)
> > [ 28.093444][ T4983] vfs_fallocate (??:?)
> > [ 28.094128][ T4983] madvise_vma_behavior (madvise.c:?)
> > [ 28.094874][ T4983] do_madvise (??:?)
> > [ 28.095491][ T4983] __ia32_sys_madvise (??:?)
> > [ 28.096166][ T4983] do_int80_syscall_32 (??:?)
> > [ 28.096885][ T4983] entry_INT80_compat (??:?)
>
> What happened to your line numbers? Didn't these show up before? They
> did on 2023-02-06 [1]

oh, this is an issue under investigation now.
I guess the report for 04241ffe3f is kind of more valid?
I also mentioned this problem there and will update you after we fix.

>
> ...
> >
> > To reproduce:
> >
> > # build kernel
> > cd linux
> > cp config-6.2.0-rc4-00443-g0503ea8f5ba7 .config
> > make HOSTCC=gcc-11 CC=gcc-11 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage modules
> > make HOSTCC=gcc-11 CC=gcc-11 ARCH=x86_64 INSTALL_MOD_PATH=<mod-install-dir> modules_install
> > cd <mod-install-dir>
> > find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz
> >
> >
> > git clone https://github.com/intel/lkp-tests.git
> > cd lkp-tests
> > bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email
> >
> > # if come across any failure that blocks the test,
> > # please remove ~/.lkp and /lkp dir to run from a clean state.
> >
>
> This does not work for me. Since my last use of lkp it seems something
> was changed and now -watchdog is not recognized by my qemu and so my
> attempts to reproduce this are failing. Is there a way to avoid using
> the -watchdog flag? Running the command by hand fails as it seems some
> files are removed on exit?

this is really related with a deprecated option '-watchdog'. (as Matthew Wilcox
also mentioned)

on my local platform:
$ qemu-system-x86_64 --version
QEMU emulator version 6.2.0 (Debian 1:6.2+dfsg-2ubuntu6.6)
$ qemu-system-x86_64 -watchdog i6300esb
qemu-system-x86_64: -watchdog i6300esb: warning: -watchdog is deprecated; use -device instead.

I guess you are using even newer version which deprecated this option already.
we've already pushed a fix:
https://github.com/intel/lkp-tests/commit/18c7a9c9fe0eebc4536220b97b10b47bfae00595

>
> I did try to remove the directories and run from a clean state, but it
> still fails for me. (see below)
>
>
> Thanks,
> Liam
>
> 1. https://lore.kernel.org/linux-mm/[email protected]/
>
> Log of failed lkp 68d76160fd7bb767c4a63e7709706b462c475e1b
> ======================
> x86_64
> ==> Making package: lkp-src 0-1 (Sun Feb 26 09:31:45 PM EST 2023)
> ==> Checking runtime dependencies...
> ==> Checking buildtime dependencies...
> ==> WARNING: Using existing $srcdir/ tree
> ==> Removing existing $pkgdir/ directory...
> ==> Starting build()...
> make: Entering directory '/home/jedix/lkp-tests/bin/event'
> klcc -D_FORTIFY_SOURCE=2 -c -o wakeup.o wakeup.c
> klcc -Wl,-O1,--sort-common,--as-needed,-z,relro -static -o wakeup wakeup.o
> rm -f wakeup.o
> strip wakeup
> make: Leaving directory '/home/jedix/lkp-tests/bin/event'
> ==> Entering fakeroot environment...
> x86_64
> ==> Starting package()...
> ==> Creating package "lkp-src"...
> 88466 blocks
> renamed '/home/jedix/.lkp/cache/lkp-x86_64.cgz.tmp' -> '/home/jedix/.lkp/cache/lkp-x86_64.cgz'
> ==> Leaving fakeroot environment.
> ==> Finished making: lkp-src 0-1 (Sun Feb 26 09:31:47 PM EST 2023)
> ~/lkp-tests
> 11 blocks
> result_root: /home/jedix/.lkp//result/trinity/group-04-300s/vm-snb/yocto-i386-minimal-20190520.cgz/x86_64-kexec/gcc-11/0503ea8f5ba73eb3ab13a81c1eefbaf51405385a/0
> downloading initrds ...
> use local modules: /home/jedix/.lkp/cache/modules.cgz
> /usr/bin/wget -q --timeout=1800 --tries=1 --local-encoding=UTF-8 https://download.01.org/0day-ci/lkp-qemu/osimage/yocto/yocto-i386-minimal-20190520.cgz -N -P /home/jedix/.lkp/cache/osimage/yocto
> 17916 blocks
> /usr/bin/wget -q --timeout=1800 --tries=1 --local-encoding=UTF-8 https://download.01.org/0day-ci/lkp-qemu/osimage/pkg/debian-x86_64-20180403.cgz/trinity-static-i386-x86_64-1c734c75-1_2020-01-06.cgz -N -P /home/jedix/.lkp/cache/osimage/pkg/debian-x86_64-20180403.cgz
> 43019 blocks
> exec command: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -fsdev local,id=test_dev,path=/home/jedix/.lkp//result/trinity/group-04-300s/vm-snb/yocto-i386-minimal-20190520.cgz/x86_64-kexec/gcc-11/0503ea8f5ba73eb3ab13a81c1eefbaf51405385a/0,security_model=none -device virtio-9p-pci,fsdev=test_dev,mount_tag=9p/virtfs_mount -kernel bzImage -append root=/dev/ram0 RESULT_ROOT=/result/trinity/group-04-300s/vm-snb/yocto-i386-minimal-20190520.cgz/x86_64-kexec/gcc-11/0503ea8f5ba73eb3ab13a81c1eefbaf51405385a/19 BOOT_IMAGE=/pkg/linux/x86_64-kexec/gcc-11/0503ea8f5ba73eb3ab13a81c1eefbaf51405385a/vmlinuz-6.2.0-rc4-00443-g0503ea8f5ba7 branch=linus/master job=/lkp/jobs/scheduled/vm-meta-102/trinity-group-04-300s-yocto-i386-minimal-20190520.cgz-0503ea8f5ba73eb3ab13a81c1eefbaf51405385a-20230224-7240-hzx70n-16.yaml user=lkp ARCH=x86_64 kconfig=x86_64-kexec commit=0503ea8f5ba73eb3ab13a81c1eefbaf51405385a initcall_debug nmi_watchdog=0 vmalloc=256M initramfs_async=0 page_owner=on max_uptime=1200 LKP_LOCAL_RUN=1 selinux=0 debug apic=debug sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100 net.ifnames=0 printk.devkmsg=on panic=-1 softlockup_panic=1 nmi_watchdog=panic oops=panic load_ramdisk=2 prompt_ramdisk=0 drbd.minor_count=8 systemd.log_level=err ignore_loglevel console=tty0 earlyprintk=ttyS0,115200 console=ttyS0,115200 vga=normal rw ip=dhcp result_service=9p/virtfs_mount -initrd /home/jedix/.lkp/cache/final_initrd -smp 2 -m 3419M -no-reboot -watchdog i6300esb -rtc base=localtime -device e1000,netdev=net0 -netdev user,id=net0 -display none -monitor null -serial stdio
> qemu-system-x86_64: -watchdog: invalid option
>
>
>