LinuxLists.cc - Re: [PATCH] mapletree-vs-khugepaged

2022-04-29 14:41:33

Subject: Re: [PATCH] mapletree-vs-khugepaged

On Wed, Apr 27, 2022 at 03:10:45PM -0700, Andrew Morton wrote:
> Fix mapletree for patch series "Make khugepaged collapse readonly FS THP
> more consistent", v3.
>
> Cc: Liam R. Howlett <[email protected]>
> Signed-off-by: Andrew Morton <[email protected]>

This patch causes all my sparc64 boot tests to fail. Bisect and crash logs
attached.

Guenter

---
[ 12.624703] Unable to handle kernel paging request at virtual address 0e00000000000000
[ 12.624793] tsk->{mm,active_mm}->context = 0000000000000005
[ 12.624823] tsk->{mm,active_mm}->pgd = fffff800048b8000
[ 12.624849] \|/ ____ \|/
[ 12.624849] "@'/ .. \`@"
[ 12.624849] /_| \__/ |_\
[ 12.624849] \__U_/
[ 12.624874] init(1): Oops [#1]
[ 12.625194] CPU: 0 PID: 1 Comm: init Not tainted 5.18.0-rc4-next-20220428 #1
[ 12.625421] TSTATE: 0000009911001606 TPC: 00000000005e6330 TNPC: 00000000005e6334 Y: 00000000 Not tainted
[ 12.625455] TPC: <mmap_region+0x150/0x700>
[ 12.625503] g0: 0000000000619a00 g1: 0000000000000000 g2: fffff8000488b200 g3: 0000000000000000
[ 12.625537] g4: fffff8000414a9a0 g5: fffff8001dd3e000 g6: fffff8000414c000 g7: 0000000000000000
[ 12.625569] o0: 0000000000000000 o1: 0000000000000000 o2: 0000000001167b68 o3: 0000000000f51bb8
[ 12.625601] o4: fffff80100301fff o5: fffff8000414fc20 sp: fffff8000414f341 ret_pc: 00000000005e6310
[ 12.625630] RPC: <mmap_region+0x130/0x700>
[ 12.625692] l0: fffff8000488b260 l1: 000000000000008b l2: fffff80100302000 l3: 0000000000000000
[ 12.625725] l4: fffff80100301fff l5: 0000000000000000 l6: 30812c2a1dd8556f l7: fffff8000414b438
[ 12.625762] i0: fffff800044f58a0 i1: fffff801001ec000 i2: 0e00000000000000 i3: 0000000000000075
[ 12.625795] i4: 0000000000000000 i5: fffff8000414fde0 i6: fffff8000414f461 i7: 00000000005e6c58
[ 12.625833] I7: <do_mmap+0x378/0x500>
[ 12.625906] Call Trace:
[ 12.626006] [<00000000005e6c58>] do_mmap+0x378/0x500
[ 12.626092] [<00000000005bdc98>] vm_mmap_pgoff+0x78/0x100
[ 12.626112] [<00000000005e3d24>] ksys_mmap_pgoff+0x164/0x1c0
[ 12.626129] [<0000000000406294>] linux_sparc_syscall+0x34/0x44
[ 12.626198] Disabling lock debugging due to kernel taint
[ 12.626286] Caller[00000000005e6c58]: do_mmap+0x378/0x500
[ 12.626335] Caller[00000000005bdc98]: vm_mmap_pgoff+0x78/0x100
[ 12.626354] Caller[00000000005e3d24]: ksys_mmap_pgoff+0x164/0x1c0
[ 12.626371] Caller[0000000000406294]: linux_sparc_syscall+0x34/0x44
[ 12.626390] Caller[fffff8010001d88c]: 0xfffff8010001d88c
[ 12.626537] Instruction DUMP:
[ 12.626567] a6100008
[ 12.626678] 02c68006
[ 12.626685] 01000000
[ 12.626690] <c25e8000>
[ 12.626696] 80a04012
[ 12.626701] 22600077
[ 12.626707] c25ea088
[ 12.626712] 22c4c00a
[ 12.626717] f277a7c7
[ 12.626728]
[ 12.627169] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009

---
# bad: [bdc61aad77faf67187525028f1f355eff3849f22] Add linux-next specific files for 20220428
# good: [af2d861d4cd2a4da5137f795ee3509e6f944a25b] Linux 5.18-rc4
git bisect start 'HEAD' 'v5.18-rc4'
# good: [a6ffa4aa7e81a54632f3370f4c93fce603160192] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git
git bisect good a6ffa4aa7e81a54632f3370f4c93fce603160192
# good: [cd63f17e3bb63006f9f88bf7f5947b8e1601bcd9] Merge branch 'edac-for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git
git bisect good cd63f17e3bb63006f9f88bf7f5947b8e1601bcd9
# good: [cee7bbed3e5cc089b5c364ac8ad4a186c2a28bb6] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine.git
git bisect good cee7bbed3e5cc089b5c364ac8ad4a186c2a28bb6
# good: [d5a23156ea99f10b584221893a6a7d6f6554cde8] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab.git
git bisect good d5a23156ea99f10b584221893a6a7d6f6554cde8
# good: [2f1fde90d983bc404503100c9c4bbbf1e191bcf4] selftests: cgroup: fix alloc_anon_noexit() instantly freeing memory
git bisect good 2f1fde90d983bc404503100c9c4bbbf1e191bcf4
# good: [fca1db6ff251278c532231552e840c7dc36dfa76] Merge branch 'bitmap-for-next' of https://github.com/norov/linux.git
git bisect good fca1db6ff251278c532231552e840c7dc36dfa76
# good: [40b39116fe8e6fb66e3166ea40138eec506dfd91] perf: use VMA iterator
git bisect good 40b39116fe8e6fb66e3166ea40138eec506dfd91
# bad: [33ef257872566922df2b6bcfdb5330b2388aef53] Docs/{ABI,admin-guide}/damon: update for fixed virtual address ranges monitoring
git bisect bad 33ef257872566922df2b6bcfdb5330b2388aef53
# good: [2d8640f244c1ea6c40acde911d339dabc2ac765d] mm/oom_kill: use maple tree iterators instead of vma linked list
git bisect good 2d8640f244c1ea6c40acde911d339dabc2ac765d
# good: [49d281fa016f2906346f1707e5059b6f7674a948] mm/mmap.c: pass in mapping to __vma_link_file()
git bisect good 49d281fa016f2906346f1707e5059b6f7674a948
# bad: [778ae6914961a857596ccdddb69f34ad1d597cd0] selftets/damon/sysfs: test existence and permission of avail_operations
git bisect bad 778ae6914961a857596ccdddb69f34ad1d597cd0
# bad: [14031cb11d7f48cc0cb19084537e378fa8ce020d] mm/damon/core: add a function for damon_operations registration checks
git bisect bad 14031cb11d7f48cc0cb19084537e378fa8ce020d
# bad: [41fd8be857ee43f2f466fca7c2b66fea39f6540d] mapletree-vs-khugepaged
git bisect bad 41fd8be857ee43f2f466fca7c2b66fea39f6540d
# first bad commit: [41fd8be857ee43f2f466fca7c2b66fea39f6540d] mapletree-vs-khugepaged

2022-04-29 20:38:59

by Liam R. Howlett

Subject: Re: [PATCH] mapletree-vs-khugepaged

* Guenter Roeck <[email protected]> [220428 13:20]:
> On Wed, Apr 27, 2022 at 03:10:45PM -0700, Andrew Morton wrote:
> > Fix mapletree for patch series "Make khugepaged collapse readonly FS THP
> > more consistent", v3.
> >
> > Cc: Liam R. Howlett <[email protected]>
> > Signed-off-by: Andrew Morton <[email protected]>
>
> This patch causes all my sparc64 boot tests to fail. Bisect and crash logs
> attached.

This is very interesting. If 49d281fa016f2906346f1707e5059b6f7674a948
"mm/mmap.c: pass in mapping to __vma_link_file()" is okay, I would
expect this one to also be okay. Is this a case of randomization of
addresses on boot causing bad commits to be reported as good sometimes?

I'll try and get set up to test all these architectures, but a lot of
them are frustrating to get going so it might take a while. Note that
progress may be slower due to events scheduled for next week.

Thanks,
Liam
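
A minimal sketch, not part of the original message, of one way to guard a bisect against the intermittent results suspected above: repeat the boot test several times per commit and call the commit good only if every run passes. The `boot_test` callable and the attempt count are hypothetical stand-ins for whatever actually boots the kernel; the return values follow the `git bisect run` convention, where 0 marks a commit good and 1 marks it bad.

```python
from typing import Callable


def classify_commit(boot_test: Callable[[], bool], attempts: int = 5) -> int:
    """Return an exit code usable by `git bisect run`: 0 = good, 1 = bad.

    For an intermittent crash, a single failure is conclusive while a
    single pass is not, so the first failing run stops the loop.
    """
    for _ in range(attempts):
        if not boot_test():
            return 1  # one crash is enough to mark the commit bad
    return 0  # every attempt booted, so treat the commit as good


if __name__ == "__main__":
    # Hypothetical flaky test: boots twice, then crashes on the third run.
    outcomes = iter([True, True, False, True, True])
    print(classify_commit(lambda: next(outcomes)))
```

A wrapper like this trades bisect time for confidence: more attempts per commit make a false "good" less likely, but never impossible.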

>
> Guenter
>
> ---
> [ 12.624703] Unable to handle kernel paging request at virtual address 0e00000000000000
> [ 12.624793] tsk->{mm,active_mm}->context = 0000000000000005
> [ 12.624823] tsk->{mm,active_mm}->pgd = fffff800048b8000
> [ 12.624849] \|/ ____ \|/
> [ 12.624849] "@'/ .. \`@"
> [ 12.624849] /_| \__/ |_\
> [ 12.624849] \__U_/
> [ 12.624874] init(1): Oops [#1]
> [ 12.625194] CPU: 0 PID: 1 Comm: init Not tainted 5.18.0-rc4-next-20220428 #1
> [ 12.625421] TSTATE: 0000009911001606 TPC: 00000000005e6330 TNPC: 00000000005e6334 Y: 00000000 Not tainted
> [ 12.625455] TPC: <mmap_region+0x150/0x700>
> [ 12.625503] g0: 0000000000619a00 g1: 0000000000000000 g2: fffff8000488b200 g3: 0000000000000000
> [ 12.625537] g4: fffff8000414a9a0 g5: fffff8001dd3e000 g6: fffff8000414c000 g7: 0000000000000000
> [ 12.625569] o0: 0000000000000000 o1: 0000000000000000 o2: 0000000001167b68 o3: 0000000000f51bb8
> [ 12.625601] o4: fffff80100301fff o5: fffff8000414fc20 sp: fffff8000414f341 ret_pc: 00000000005e6310
> [ 12.625630] RPC: <mmap_region+0x130/0x700>
> [ 12.625692] l0: fffff8000488b260 l1: 000000000000008b l2: fffff80100302000 l3: 0000000000000000
> [ 12.625725] l4: fffff80100301fff l5: 0000000000000000 l6: 30812c2a1dd8556f l7: fffff8000414b438
> [ 12.625762] i0: fffff800044f58a0 i1: fffff801001ec000 i2: 0e00000000000000 i3: 0000000000000075
> [ 12.625795] i4: 0000000000000000 i5: fffff8000414fde0 i6: fffff8000414f461 i7: 00000000005e6c58
> [ 12.625833] I7: <do_mmap+0x378/0x500>
> [ 12.625906] Call Trace:
> [ 12.626006] [<00000000005e6c58>] do_mmap+0x378/0x500
> [ 12.626092] [<00000000005bdc98>] vm_mmap_pgoff+0x78/0x100
> [ 12.626112] [<00000000005e3d24>] ksys_mmap_pgoff+0x164/0x1c0
> [ 12.626129] [<0000000000406294>] linux_sparc_syscall+0x34/0x44
> [ 12.626198] Disabling lock debugging due to kernel taint
> [ 12.626286] Caller[00000000005e6c58]: do_mmap+0x378/0x500
> [ 12.626335] Caller[00000000005bdc98]: vm_mmap_pgoff+0x78/0x100
> [ 12.626354] Caller[00000000005e3d24]: ksys_mmap_pgoff+0x164/0x1c0
> [ 12.626371] Caller[0000000000406294]: linux_sparc_syscall+0x34/0x44
> [ 12.626390] Caller[fffff8010001d88c]: 0xfffff8010001d88c
> [ 12.626537] Instruction DUMP:
> [ 12.626567] a6100008
> [ 12.626678] 02c68006
> [ 12.626685] 01000000
> [ 12.626690] <c25e8000>
> [ 12.626696] 80a04012
> [ 12.626701] 22600077
> [ 12.626707] c25ea088
> [ 12.626712] 22c4c00a
> [ 12.626717] f277a7c7
> [ 12.626728]
> [ 12.627169] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
>
> ---
> # bad: [bdc61aad77faf67187525028f1f355eff3849f22] Add linux-next specific files for 20220428
> # good: [af2d861d4cd2a4da5137f795ee3509e6f944a25b] Linux 5.18-rc4
> git bisect start 'HEAD' 'v5.18-rc4'
> # good: [a6ffa4aa7e81a54632f3370f4c93fce603160192] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git
> git bisect good a6ffa4aa7e81a54632f3370f4c93fce603160192
> # good: [cd63f17e3bb63006f9f88bf7f5947b8e1601bcd9] Merge branch 'edac-for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git
> git bisect good cd63f17e3bb63006f9f88bf7f5947b8e1601bcd9
> # good: [cee7bbed3e5cc089b5c364ac8ad4a186c2a28bb6] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine.git
> git bisect good cee7bbed3e5cc089b5c364ac8ad4a186c2a28bb6
> # good: [d5a23156ea99f10b584221893a6a7d6f6554cde8] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab.git
> git bisect good d5a23156ea99f10b584221893a6a7d6f6554cde8
> # good: [2f1fde90d983bc404503100c9c4bbbf1e191bcf4] selftests: cgroup: fix alloc_anon_noexit() instantly freeing memory
> git bisect good 2f1fde90d983bc404503100c9c4bbbf1e191bcf4
> # good: [fca1db6ff251278c532231552e840c7dc36dfa76] Merge branch 'bitmap-for-next' of https://github.com/norov/linux.git
> git bisect good fca1db6ff251278c532231552e840c7dc36dfa76
> # good: [40b39116fe8e6fb66e3166ea40138eec506dfd91] perf: use VMA iterator
> git bisect good 40b39116fe8e6fb66e3166ea40138eec506dfd91
> # bad: [33ef257872566922df2b6bcfdb5330b2388aef53] Docs/{ABI,admin-guide}/damon: update for fixed virtual address ranges monitoring
> git bisect bad 33ef257872566922df2b6bcfdb5330b2388aef53
> # good: [2d8640f244c1ea6c40acde911d339dabc2ac765d] mm/oom_kill: use maple tree iterators instead of vma linked list
> git bisect good 2d8640f244c1ea6c40acde911d339dabc2ac765d
> # good: [49d281fa016f2906346f1707e5059b6f7674a948] mm/mmap.c: pass in mapping to __vma_link_file()
> git bisect good 49d281fa016f2906346f1707e5059b6f7674a948
> # bad: [778ae6914961a857596ccdddb69f34ad1d597cd0] selftets/damon/sysfs: test existence and permission of avail_operations
> git bisect bad 778ae6914961a857596ccdddb69f34ad1d597cd0
> # bad: [14031cb11d7f48cc0cb19084537e378fa8ce020d] mm/damon/core: add a function for damon_operations registration checks
> git bisect bad 14031cb11d7f48cc0cb19084537e378fa8ce020d
> # bad: [41fd8be857ee43f2f466fca7c2b66fea39f6540d] mapletree-vs-khugepaged
> git bisect bad 41fd8be857ee43f2f466fca7c2b66fea39f6540d
> # first bad commit: [41fd8be857ee43f2f466fca7c2b66fea39f6540d] mapletree-vs-khugepaged

2022-05-02 06:22:50

by Heiko Carstens

Subject: Re: [PATCH] mapletree-vs-khugepaged

On Thu, Apr 28, 2022 at 10:20:40AM -0700, Guenter Roeck wrote:
> On Wed, Apr 27, 2022 at 03:10:45PM -0700, Andrew Morton wrote:
> > Fix mapletree for patch series "Make khugepaged collapse readonly FS THP
> > more consistent", v3.
> >
> > Cc: Liam R. Howlett <[email protected]>
> > Signed-off-by: Andrew Morton <[email protected]>
>
> This patch causes all my sparc64 boot tests to fail. Bisect and crash logs
> attached.
>
> Guenter
>
> ---
> [ 12.624703] Unable to handle kernel paging request at virtual address 0e00000000000000
> [ 12.624793] tsk->{mm,active_mm}->context = 0000000000000005
> [ 12.624823] tsk->{mm,active_mm}->pgd = fffff800048b8000
> [ 12.624849] \|/ ____ \|/
> [ 12.624849] "@'/ .. \`@"
> [ 12.624849] /_| \__/ |_\
> [ 12.624849] \__U_/
> [ 12.624874] init(1): Oops [#1]
> [ 12.625194] CPU: 0 PID: 1 Comm: init Not tainted 5.18.0-rc4-next-20220428 #1
> [ 12.625421] TSTATE: 0000009911001606 TPC: 00000000005e6330 TNPC: 00000000005e6334 Y: 00000000 Not tainted
> [ 12.625455] TPC: <mmap_region+0x150/0x700>
> [ 12.625503] g0: 0000000000619a00 g1: 0000000000000000 g2: fffff8000488b200 g3: 0000000000000000
> [ 12.625537] g4: fffff8000414a9a0 g5: fffff8001dd3e000 g6: fffff8000414c000 g7: 0000000000000000
> [ 12.625569] o0: 0000000000000000 o1: 0000000000000000 o2: 0000000001167b68 o3: 0000000000f51bb8
> [ 12.625601] o4: fffff80100301fff o5: fffff8000414fc20 sp: fffff8000414f341 ret_pc: 00000000005e6310
> [ 12.625630] RPC: <mmap_region+0x130/0x700>
> [ 12.625692] l0: fffff8000488b260 l1: 000000000000008b l2: fffff80100302000 l3: 0000000000000000
> [ 12.625725] l4: fffff80100301fff l5: 0000000000000000 l6: 30812c2a1dd8556f l7: fffff8000414b438
> [ 12.625762] i0: fffff800044f58a0 i1: fffff801001ec000 i2: 0e00000000000000 i3: 0000000000000075
> [ 12.625795] i4: 0000000000000000 i5: fffff8000414fde0 i6: fffff8000414f461 i7: 00000000005e6c58
> [ 12.625833] I7: <do_mmap+0x378/0x500>
> [ 12.625906] Call Trace:
> [ 12.626006] [<00000000005e6c58>] do_mmap+0x378/0x500
> [ 12.626092] [<00000000005bdc98>] vm_mmap_pgoff+0x78/0x100
> [ 12.626112] [<00000000005e3d24>] ksys_mmap_pgoff+0x164/0x1c0
> [ 12.626129] [<0000000000406294>] linux_sparc_syscall+0x34/0x44
> [ 12.626198] Disabling lock debugging due to kernel taint
> [ 12.626286] Caller[00000000005e6c58]: do_mmap+0x378/0x500
> [ 12.626335] Caller[00000000005bdc98]: vm_mmap_pgoff+0x78/0x100
> [ 12.626354] Caller[00000000005e3d24]: ksys_mmap_pgoff+0x164/0x1c0
> [ 12.626371] Caller[0000000000406294]: linux_sparc_syscall+0x34/0x44
> [ 12.626390] Caller[fffff8010001d88c]: 0xfffff8010001d88c
> [ 12.626537] Instruction DUMP:
> [ 12.626567] a6100008
> [ 12.626678] 02c68006
> [ 12.626685] 01000000
> [ 12.626690] <c25e8000>
> [ 12.626696] 80a04012
> [ 12.626701] 22600077
> [ 12.626707] c25ea088
> [ 12.626712] 22c4c00a
> [ 12.626717] f277a7c7
> [ 12.626728]
> [ 12.627169] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009

FWIW, same on s390 - linux-next is completely broken. Note: I didn't
bisect, but given that the call trace, and even the failing address
match, I'm quite confident it is the same reason.

Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 0e00000000000000 TEID: 0e00000000000803
Fault in home space mode while using kernel ASCE.
AS:00000000bac44007 R3:00000001ffff0007 S:00000001fffef800 P:000000000000003d
Oops: 0038 ilc:3 [#1] SMP
CPU: 3 PID: 79757 Comm: pt_upgrade_race Tainted: G E K 5.18.0-20220428.rc4.git500.bdc61aad77fa.300.fc35.s390x+next #1
Hardware name: IBM 2964 NC9 702 (z/VM 6.4.0)
Krnl PSW : 0704c00180000000 00000000b912c9a2 (mmap_region+0x1a2/0x8a8)
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
Krnl GPRS: 0000000000000000 0e00000000000000 0000000000000000 0000000000000000
ffffffffffffffff 000000000000000f 00000380016b3d98 0000080000100000
000000008364c100 0000080000000000 0000000000100077 0e00000000000000
00000000909da100 00000380016b3c58 00000000b912c982 00000380016b3b40
Krnl Code: 00000000b912c992: a774002c brc 7,00000000b912c9ea
00000000b912c996: ecb80225007c cgij %r11,0,8,00000000b912cde0
#00000000b912c99c: e310f0f80004 lg %r1,248(%r15)
>00000000b912c9a2: e37010000020 cg %r7,0(%r1)
00000000b912c9a8: a784010b brc 8,00000000b912cbbe
00000000b912c9ac: e310f0e80004 lg %r1,232(%r15)
00000000b912c9b2: ec180013007c cgij %r1,0,8,00000000b912c9d8
00000000b912c9b8: e310f0e80004 lg %r1,232(%r15)
Call Trace:
[<00000000b912c9a2>] mmap_region+0x1a2/0x8a8
([<00000000b912c982>] mmap_region+0x182/0x8a8)
[<00000000b912d492>] do_mmap+0x3ea/0x4c8
[<00000000b90fb9cc>] vm_mmap_pgoff+0xd4/0x170
[<00000000b9129c9a>] ksys_mmap_pgoff+0x62/0x238
[<00000000b912a034>] __s390x_sys_old_mmap+0x74/0x98
[<00000000b9a78ff8>] __do_syscall+0x1d8/0x200
[<00000000b9a872a2>] system_call+0x82/0xb0
Last Breaking-Event-Address:
[<00000000b9b9e678>] __s390_indirect_jump_r14+0x0/0xc
Kernel panic - not syncing: Fatal exception: panic_on_oops
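
A minimal sketch, not part of the original message, of how the claim that the call traces match could be checked mechanically. It pulls the symbol names out of the two "Call Trace" sections quoted in this thread and lists the ones both share. The regular expression and the function names `trace_symbols` and `shared_symbols` are assumptions made for illustration, not an existing tool.

```python
import re

# Matches entries such as "[<00000000005e6c58>] do_mmap+0x378/0x500".
SYMBOL = re.compile(r"\[<[0-9a-f]+>\]\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")

SPARC64_TRACE = """
[<00000000005e6c58>] do_mmap+0x378/0x500
[<00000000005bdc98>] vm_mmap_pgoff+0x78/0x100
[<00000000005e3d24>] ksys_mmap_pgoff+0x164/0x1c0
[<0000000000406294>] linux_sparc_syscall+0x34/0x44
"""

S390_TRACE = """
[<00000000b912c9a2>] mmap_region+0x1a2/0x8a8
([<00000000b912c982>] mmap_region+0x182/0x8a8)
[<00000000b912d492>] do_mmap+0x3ea/0x4c8
[<00000000b90fb9cc>] vm_mmap_pgoff+0xd4/0x170
[<00000000b9129c9a>] ksys_mmap_pgoff+0x62/0x238
[<00000000b912a034>] __s390x_sys_old_mmap+0x74/0x98
"""


def trace_symbols(log):
    """Return the symbols in a call trace, in order, without repeats."""
    seen = []
    for name in SYMBOL.findall(log):
        if name not in seen:
            seen.append(name)
    return seen


def shared_symbols(first, second):
    """Return the symbols of `first` that also appear in `second`."""
    other = trace_symbols(second)
    return [name for name in trace_symbols(first) if name in other]


if __name__ == "__main__":
    print(shared_symbols(SPARC64_TRACE, S390_TRACE))
```

The sparc64 trace does not list mmap_region as a frame because the faulting function is reported separately there, on the TPC line.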

2022-05-02 09:01:39

by Liam R. Howlett

Subject: Re: [PATCH] mapletree-vs-khugepaged

* Heiko Carstens <[email protected]> [220429 08:10]:
> On Thu, Apr 28, 2022 at 10:20:40AM -0700, Guenter Roeck wrote:
> > On Wed, Apr 27, 2022 at 03:10:45PM -0700, Andrew Morton wrote:
> > > Fix mapletree for patch series "Make khugepaged collapse readonly FS THP
> > > more consistent", v3.
> > >
> > > Cc: Liam R. Howlett <[email protected]>
> > > Signed-off-by: Andrew Morton <[email protected]>
> >
> > This patch causes all my sparc64 boot tests to fail. Bisect and crash logs
> > attached.
> >
> > Guenter
> >
> > ---
> > [ 12.624703] Unable to handle kernel paging request at virtual address 0e00000000000000
> > [ 12.624793] tsk->{mm,active_mm}->context = 0000000000000005
> > [ 12.624823] tsk->{mm,active_mm}->pgd = fffff800048b8000
> > [ 12.624849] \|/ ____ \|/
> > [ 12.624849] "@'/ .. \`@"
> > [ 12.624849] /_| \__/ |_\
> > [ 12.624849] \__U_/
> > [ 12.624874] init(1): Oops [#1]
> > [ 12.625194] CPU: 0 PID: 1 Comm: init Not tainted 5.18.0-rc4-next-20220428 #1
> > [ 12.625421] TSTATE: 0000009911001606 TPC: 00000000005e6330 TNPC: 00000000005e6334 Y: 00000000 Not tainted
> > [ 12.625455] TPC: <mmap_region+0x150/0x700>
> > [ 12.625503] g0: 0000000000619a00 g1: 0000000000000000 g2: fffff8000488b200 g3: 0000000000000000
> > [ 12.625537] g4: fffff8000414a9a0 g5: fffff8001dd3e000 g6: fffff8000414c000 g7: 0000000000000000
> > [ 12.625569] o0: 0000000000000000 o1: 0000000000000000 o2: 0000000001167b68 o3: 0000000000f51bb8
> > [ 12.625601] o4: fffff80100301fff o5: fffff8000414fc20 sp: fffff8000414f341 ret_pc: 00000000005e6310
> > [ 12.625630] RPC: <mmap_region+0x130/0x700>
> > [ 12.625692] l0: fffff8000488b260 l1: 000000000000008b l2: fffff80100302000 l3: 0000000000000000
> > [ 12.625725] l4: fffff80100301fff l5: 0000000000000000 l6: 30812c2a1dd8556f l7: fffff8000414b438
> > [ 12.625762] i0: fffff800044f58a0 i1: fffff801001ec000 i2: 0e00000000000000 i3: 0000000000000075
> > [ 12.625795] i4: 0000000000000000 i5: fffff8000414fde0 i6: fffff8000414f461 i7: 00000000005e6c58
> > [ 12.625833] I7: <do_mmap+0x378/0x500>
> > [ 12.625906] Call Trace:
> > [ 12.626006] [<00000000005e6c58>] do_mmap+0x378/0x500
> > [ 12.626092] [<00000000005bdc98>] vm_mmap_pgoff+0x78/0x100
> > [ 12.626112] [<00000000005e3d24>] ksys_mmap_pgoff+0x164/0x1c0
> > [ 12.626129] [<0000000000406294>] linux_sparc_syscall+0x34/0x44
> > [ 12.626198] Disabling lock debugging due to kernel taint
> > [ 12.626286] Caller[00000000005e6c58]: do_mmap+0x378/0x500
> > [ 12.626335] Caller[00000000005bdc98]: vm_mmap_pgoff+0x78/0x100
> > [ 12.626354] Caller[00000000005e3d24]: ksys_mmap_pgoff+0x164/0x1c0
> > [ 12.626371] Caller[0000000000406294]: linux_sparc_syscall+0x34/0x44
> > [ 12.626390] Caller[fffff8010001d88c]: 0xfffff8010001d88c
> > [ 12.626537] Instruction DUMP:
> > [ 12.626567] a6100008
> > [ 12.626678] 02c68006
> > [ 12.626685] 01000000
> > [ 12.626690] <c25e8000>
> > [ 12.626696] 80a04012
> > [ 12.626701] 22600077
> > [ 12.626707] c25ea088
> > [ 12.626712] 22c4c00a
> > [ 12.626717] f277a7c7
> > [ 12.626728]
> > [ 12.627169] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
>
> FWIW, same on s390 - linux-next is completely broken. Note: I didn't
> bisect, but given that the call trace, and even the failing address
> match, I'm quite confident it is the same reason.

This is worth a lot to me. Thanks for the report and the testing.

Regards,
Liam

>
> Unable to handle kernel pointer dereference in virtual kernel address space
> Failing address: 0e00000000000000 TEID: 0e00000000000803
> Fault in home space mode while using kernel ASCE.
> AS:00000000bac44007 R3:00000001ffff0007 S:00000001fffef800 P:000000000000003d
> Oops: 0038 ilc:3 [#1] SMP
> CPU: 3 PID: 79757 Comm: pt_upgrade_race Tainted: G E K 5.18.0-20220428.rc4.git500.bdc61aad77fa.300.fc35.s390x+next #1
> Hardware name: IBM 2964 NC9 702 (z/VM 6.4.0)
> Krnl PSW : 0704c00180000000 00000000b912c9a2 (mmap_region+0x1a2/0x8a8)
> R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
> Krnl GPRS: 0000000000000000 0e00000000000000 0000000000000000 0000000000000000
> ffffffffffffffff 000000000000000f 00000380016b3d98 0000080000100000
> 000000008364c100 0000080000000000 0000000000100077 0e00000000000000
> 00000000909da100 00000380016b3c58 00000000b912c982 00000380016b3b40
> Krnl Code: 00000000b912c992: a774002c brc 7,00000000b912c9ea
> 00000000b912c996: ecb80225007c cgij %r11,0,8,00000000b912cde0
> #00000000b912c99c: e310f0f80004 lg %r1,248(%r15)
> >00000000b912c9a2: e37010000020 cg %r7,0(%r1)
> 00000000b912c9a8: a784010b brc 8,00000000b912cbbe
> 00000000b912c9ac: e310f0e80004 lg %r1,232(%r15)
> 00000000b912c9b2: ec180013007c cgij %r1,0,8,00000000b912c9d8
> 00000000b912c9b8: e310f0e80004 lg %r1,232(%r15)
> Call Trace:
> [<00000000b912c9a2>] mmap_region+0x1a2/0x8a8
> ([<00000000b912c982>] mmap_region+0x182/0x8a8)
> [<00000000b912d492>] do_mmap+0x3ea/0x4c8
> [<00000000b90fb9cc>] vm_mmap_pgoff+0xd4/0x170
> [<00000000b9129c9a>] ksys_mmap_pgoff+0x62/0x238
> [<00000000b912a034>] __s390x_sys_old_mmap+0x74/0x98
> [<00000000b9a78ff8>] __do_syscall+0x1d8/0x200
> [<00000000b9a872a2>] system_call+0x82/0xb0
> Last Breaking-Event-Address:
> [<00000000b9b9e678>] __s390_indirect_jump_r14+0x0/0xc
> Kernel panic - not syncing: Fatal exception: panic_on_oops

2022-05-02 09:20:37

by Heiko Carstens

Subject: Re: [PATCH] mapletree-vs-khugepaged

On Fri, Apr 29, 2022 at 01:01:53PM +0000, Liam Howlett wrote:
> * Heiko Carstens <[email protected]> [220429 08:10]:
> > On Thu, Apr 28, 2022 at 10:20:40AM -0700, Guenter Roeck wrote:
> > > On Wed, Apr 27, 2022 at 03:10:45PM -0700, Andrew Morton wrote:
> > > > Fix mapletree for patch series "Make khugepaged collapse readonly FS THP
> > > > more consistent", v3.
> > > >
> > > > Cc: Liam R. Howlett <[email protected]>
> > > > Signed-off-by: Andrew Morton <[email protected]>
> > >
> > > This patch causes all my sparc64 boot tests to fail. Bisect and crash logs
> > > attached.
> > >
> > > Guenter
...
> >
> > FWIW, same on s390 - linux-next is completely broken. Note: I didn't
> > bisect, but given that the call trace, and even the failing address
> > match, I'm quite confident it is the same reason.
>
> This is worth a lot to me. Thanks for the report and the testing.

Not sure if it is of any relevance, and you are probably aware of it
anyway, but both sparc64 and s390 are big endian, and there has been no
report from little endian architectures yet.
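
A minimal sketch, not part of the original message, of what byte order means for a 64-bit word like the failing address 0e00000000000000. It only shows how the same eight bytes read differently under the two byte orders; the thread does not establish that this is the mechanism behind the crash. The byte pattern `RAW` is a hypothetical example chosen to reproduce the reported value.

```python
import sys

# Hypothetical byte pattern: a single 0x0e byte followed by seven zero bytes.
RAW = bytes([0x0E, 0, 0, 0, 0, 0, 0, 0])


def as_word(raw, byteorder):
    """Interpret eight bytes as an unsigned 64-bit integer."""
    return int.from_bytes(raw, byteorder)


if __name__ == "__main__":
    print(f"big endian:    {as_word(RAW, 'big'):#018x}")
    print(f"little endian: {as_word(RAW, 'little'):#018x}")
    print("host byte order:", sys.byteorder)
```

Read as big endian the bytes give 0x0e00000000000000, which is not a mapped kernel address on either reported machine; read as little endian they give 0x000000000000000e.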

2022-05-14 03:32:02

by Sven Schnelle

Subject: Re: [PATCH] mapletree-vs-khugepaged

Heiko Carstens <[email protected]> writes:

> On Thu, Apr 28, 2022 at 10:20:40AM -0700, Guenter Roeck wrote:
>> On Wed, Apr 27, 2022 at 03:10:45PM -0700, Andrew Morton wrote:
>> > Fix mapletree for patch series "Make khugepaged collapse readonly FS THP
>> > more consistent", v3.
>> >
>> > Cc: Liam R. Howlett <[email protected]>
>> > Signed-off-by: Andrew Morton <[email protected]>
>>
>> This patch causes all my sparc64 boot tests to fail. Bisect and crash logs
>> attached.
>>
>> Guenter
>>
>> ---
>> [ 12.624703] Unable to handle kernel paging request at virtual address 0e00000000000000
>> [ 12.624793] tsk->{mm,active_mm}->context = 0000000000000005
>> [ 12.624823] tsk->{mm,active_mm}->pgd = fffff800048b8000
>> [ 12.624849] \|/ ____ \|/
>> [ 12.624849] "@'/ .. \`@"
>> [ 12.624849] /_| \__/ |_\
>> [ 12.624849] \__U_/
>> [ 12.624874] init(1): Oops [#1]
>> [ 12.625194] CPU: 0 PID: 1 Comm: init Not tainted 5.18.0-rc4-next-20220428 #1
>> [ 12.625421] TSTATE: 0000009911001606 TPC: 00000000005e6330 TNPC: 00000000005e6334 Y: 00000000 Not tainted
>> [ 12.625455] TPC: <mmap_region+0x150/0x700>
>> [ 12.625503] g0: 0000000000619a00 g1: 0000000000000000 g2: fffff8000488b200 g3: 0000000000000000
>> [ 12.625537] g4: fffff8000414a9a0 g5: fffff8001dd3e000 g6: fffff8000414c000 g7: 0000000000000000
>> [ 12.625569] o0: 0000000000000000 o1: 0000000000000000 o2: 0000000001167b68 o3: 0000000000f51bb8
>> [ 12.625601] o4: fffff80100301fff o5: fffff8000414fc20 sp: fffff8000414f341 ret_pc: 00000000005e6310
>> [ 12.625630] RPC: <mmap_region+0x130/0x700>
>> [ 12.625692] l0: fffff8000488b260 l1: 000000000000008b l2: fffff80100302000 l3: 0000000000000000
>> [ 12.625725] l4: fffff80100301fff l5: 0000000000000000 l6: 30812c2a1dd8556f l7: fffff8000414b438
>> [ 12.625762] i0: fffff800044f58a0 i1: fffff801001ec000 i2: 0e00000000000000 i3: 0000000000000075
>> [ 12.625795] i4: 0000000000000000 i5: fffff8000414fde0 i6: fffff8000414f461 i7: 00000000005e6c58
>> [ 12.625833] I7: <do_mmap+0x378/0x500>
>> [ 12.625906] Call Trace:
>> [ 12.626006] [<00000000005e6c58>] do_mmap+0x378/0x500
>> [ 12.626092] [<00000000005bdc98>] vm_mmap_pgoff+0x78/0x100
>> [ 12.626112] [<00000000005e3d24>] ksys_mmap_pgoff+0x164/0x1c0
>> [ 12.626129] [<0000000000406294>] linux_sparc_syscall+0x34/0x44
>> [ 12.626198] Disabling lock debugging due to kernel taint
>> [ 12.626286] Caller[00000000005e6c58]: do_mmap+0x378/0x500
>> [ 12.626335] Caller[00000000005bdc98]: vm_mmap_pgoff+0x78/0x100
>> [ 12.626354] Caller[00000000005e3d24]: ksys_mmap_pgoff+0x164/0x1c0
>> [ 12.626371] Caller[0000000000406294]: linux_sparc_syscall+0x34/0x44
>> [ 12.626390] Caller[fffff8010001d88c]: 0xfffff8010001d88c
>> [ 12.626537] Instruction DUMP:
>> [ 12.626567] a6100008
>> [ 12.626678] 02c68006
>> [ 12.626685] 01000000
>> [ 12.626690] <c25e8000>
>> [ 12.626696] 80a04012
>> [ 12.626701] 22600077
>> [ 12.626707] c25ea088
>> [ 12.626712] 22c4c00a
>> [ 12.626717] f277a7c7
>> [ 12.626728]
>> [ 12.627169] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
>
> FWIW, same on s390 - linux-next is completely broken. Note: I didn't
> bisect, but given that the call trace, and even the failing address
> match, I'm quite confident it is the same reason.
>
> Unable to handle kernel pointer dereference in virtual kernel address space
> Failing address: 0e00000000000000 TEID: 0e00000000000803
> Fault in home space mode while using kernel ASCE.
> AS:00000000bac44007 R3:00000001ffff0007 S:00000001fffef800 P:000000000000003d
> Oops: 0038 ilc:3 [#1] SMP
> CPU: 3 PID: 79757 Comm: pt_upgrade_race Tainted: G E K 5.18.0-20220428.rc4.git500.bdc61aad77fa.300.fc35.s390x+next #1
> Hardware name: IBM 2964 NC9 702 (z/VM 6.4.0)
> Krnl PSW : 0704c00180000000 00000000b912c9a2 (mmap_region+0x1a2/0x8a8)
> R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
> Krnl GPRS: 0000000000000000 0e00000000000000 0000000000000000 0000000000000000
> ffffffffffffffff 000000000000000f 00000380016b3d98 0000080000100000
> 000000008364c100 0000080000000000 0000000000100077 0e00000000000000
> 00000000909da100 00000380016b3c58 00000000b912c982 00000380016b3b40
> Krnl Code: 00000000b912c992: a774002c brc 7,00000000b912c9ea
> 00000000b912c996: ecb80225007c cgij %r11,0,8,00000000b912cde0
> #00000000b912c99c: e310f0f80004 lg %r1,248(%r15)
> >00000000b912c9a2: e37010000020 cg %r7,0(%r1)
> 00000000b912c9a8: a784010b brc 8,00000000b912cbbe
> 00000000b912c9ac: e310f0e80004 lg %r1,232(%r15)
> 00000000b912c9b2: ec180013007c cgij %r1,0,8,00000000b912c9d8
> 00000000b912c9b8: e310f0e80004 lg %r1,232(%r15)
> Call Trace:
> [<00000000b912c9a2>] mmap_region+0x1a2/0x8a8
> ([<00000000b912c982>] mmap_region+0x182/0x8a8)
> [<00000000b912d492>] do_mmap+0x3ea/0x4c8
> [<00000000b90fb9cc>] vm_mmap_pgoff+0xd4/0x170
> [<00000000b9129c9a>] ksys_mmap_pgoff+0x62/0x238
> [<00000000b912a034>] __s390x_sys_old_mmap+0x74/0x98
> [<00000000b9a78ff8>] __do_syscall+0x1d8/0x200
> [<00000000b9a872a2>] system_call+0x82/0xb0
> Last Breaking-Event-Address:
> [<00000000b9b9e678>] __s390_indirect_jump_r14+0x0/0xc
> Kernel panic - not syncing: Fatal exception: panic_on_oops

As of today we're still seeing the same crash with linux-next
(next-20220513):

[ 211.937897] CPU: 7 PID: 535 Comm: pt_upgrade Not tainted 5.18.0-rc6-11648-g76535d42eb53-dirty #732
[ 211.937902] Unable to handle kernel pointer dereference in virtual kernel address space
[ 211.937903] Hardware name: IBM 3906 M04 704 (z/VM 7.1.0)
[ 211.937906] Failing address: 0e00000000000000 TEID: 0e00000000000803
[ 211.937909] Krnl PSW : 0704c00180000000 0000001ca52f06d6
[ 211.937910] Fault in home space mode while using kernel ASCE.
[ 211.937917] AS:0000001ca6e24007 R3:0000001fffff0007 S:0000001ffffef800 P:000000000000003d
[ 211.937914] (mmap_region+0x19e/0x848)
[ 211.937929] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
[ 211.937939] Krnl GPRS: 0000000000000000 0e00000000000000 0000000000000000 0000000000000000
[ 211.937942] ffffffff00000f0f ffffffffffffffff 0e00000000000000 0000040000001000
[ 211.937945] 0000000083551900 0000040000000000 00000000000000fb 000003800070fc58
[ 211.937947] 000000008f490000 0000000000000000 0000001ca52f06b6 000003800070fb48
[ 211.937959] Krnl Code: 0000001ca52f06c6: a7740021 brc 7,0000001ca52f0708
[ 211.937959] 0000001ca52f06ca: ec6801b3007c cgij %r6,0,8,0000001ca52f0a30
[ 211.937959] #0000001ca52f06d0: e310f0f80004 lg %r1,248(%r15)
[ 211.937959] >0000001ca52f06d6: e37010000020 cg %r7,0(%r1)
[ 211.937959] 0000001ca52f06dc: a78400ea brc 8,0000001ca52f08b0
[ 211.937959] 0000001ca52f06e0: e310f0f00004 lg %r1,240(%r15)
[ 211.937959] 0000001ca52f06e6: ec180008007c cgij %r1,0,8,0000001ca52f06f6
[ 211.937959] 0000001ca52f06ec: e39010080020 cg %r9,8(%r1)
[ 211.937973] Call Trace:
[ 211.937975] [<0000001ca52f06d6>] mmap_region+0x19e/0x848
[ 211.937978] ([<0000001ca52f06b6>] mmap_region+0x17e/0x848)
[ 211.937981] [<0000001ca52f116a>] do_mmap+0x3ea/0x4c8
[ 211.937983] [<0000001ca52bed12>] vm_mmap_pgoff+0xda/0x178
[ 211.937987] [<0000001ca52ed5ea>] ksys_mmap_pgoff+0x62/0x238
[ 211.937989] [<0000001ca52ed992>] __s390x_sys_old_mmap+0x7a/0xa0
[ 211.937993] [<0000001ca5c4ef5c>] __do_syscall+0x1d4/0x200
[ 211.937999] [<0000001ca5c5d572>] system_call+0x82/0xb0
[ 211.938002] Last Breaking-Event-Address:
[ 211.938003] [<0000001ca5888616>] mas_prev+0xb6/0xc0
[ 211.938010] Oops: 0038 ilc:3 [#2]
[ 211.938011] Kernel panic - not syncing: Fatal exception: panic_on_oops
[ 211.938012] SMP
[ 211.938014] Modules linked in:
07: HCPGIR450W CP entered; disabled wait PSW 00020001 80000000 0000001C A50679A6

Is that issue supposed to be fixed? git bisect pointed me to

# bad: [76535d42eb53485775a8c54ea85725812b75543f] Merge branch 'mm-everything' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

which isn't really helpful.

Is there anything we could do to help debug this?

Thanks
Sven

2022-05-14 03:36:07

by Liam R. Howlett

Subject: Re: [PATCH] mapletree-vs-khugepaged

* Sven Schnelle <[email protected]> [220513 10:46]:
> Heiko Carstens <[email protected]> writes:
>
> > On Thu, Apr 28, 2022 at 10:20:40AM -0700, Guenter Roeck wrote:
> >> On Wed, Apr 27, 2022 at 03:10:45PM -0700, Andrew Morton wrote:
> >> > Fix mapletree for patch series "Make khugepaged collapse readonly FS THP
> >> > more consistent", v3.
> >> >
> >> > Cc: Liam R. Howlett <[email protected]>
> >> > Signed-off-by: Andrew Morton <[email protected]>
> >>
> >> This patch causes all my sparc64 boot tests to fail. Bisect and crash logs
> >> attached.
> >>
> >> Guenter
> >>

....

> >
> > FWIW, same on s390 - linux-next is completely broken. Note: I didn't
> > bisect, but given that the call trace, and even the failing address
> > match, I'm quite confident it is the same reason.
> >
> > Unable to handle kernel pointer dereference in virtual kernel address space
> > Failing address: 0e00000000000000 TEID: 0e00000000000803
> > Fault in home space mode while using kernel ASCE.
> > AS:00000000bac44007 R3:00000001ffff0007 S:00000001fffef800 P:000000000000003d
> > Oops: 0038 ilc:3 [#1] SMP
> > CPU: 3 PID: 79757 Comm: pt_upgrade_race Tainted: G E K 5.18.0-20220428.rc4.git500.bdc61aad77fa.300.fc35.s390x+next #1
> > Hardware name: IBM 2964 NC9 702 (z/VM 6.4.0)
> > Krnl PSW : 0704c00180000000 00000000b912c9a2 (mmap_region+0x1a2/0x8a8)
> > R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
> > Krnl GPRS: 0000000000000000 0e00000000000000 0000000000000000 0000000000000000
> > ffffffffffffffff 000000000000000f 00000380016b3d98 0000080000100000
> > 000000008364c100 0000080000000000 0000000000100077 0e00000000000000
> > 00000000909da100 00000380016b3c58 00000000b912c982 00000380016b3b40
> > Krnl Code: 00000000b912c992: a774002c brc 7,00000000b912c9ea
> > 00000000b912c996: ecb80225007c cgij %r11,0,8,00000000b912cde0
> > #00000000b912c99c: e310f0f80004 lg %r1,248(%r15)
> > >00000000b912c9a2: e37010000020 cg %r7,0(%r1)
> > 00000000b912c9a8: a784010b brc 8,00000000b912cbbe
> > 00000000b912c9ac: e310f0e80004 lg %r1,232(%r15)
> > 00000000b912c9b2: ec180013007c cgij %r1,0,8,00000000b912c9d8
> > 00000000b912c9b8: e310f0e80004 lg %r1,232(%r15)
> > Call Trace:
> > [<00000000b912c9a2>] mmap_region+0x1a2/0x8a8
> > ([<00000000b912c982>] mmap_region+0x182/0x8a8)
> > [<00000000b912d492>] do_mmap+0x3ea/0x4c8
> > [<00000000b90fb9cc>] vm_mmap_pgoff+0xd4/0x170
> > [<00000000b9129c9a>] ksys_mmap_pgoff+0x62/0x238
> > [<00000000b912a034>] __s390x_sys_old_mmap+0x74/0x98
> > [<00000000b9a78ff8>] __do_syscall+0x1d8/0x200
> > [<00000000b9a872a2>] system_call+0x82/0xb0
> > Last Breaking-Event-Address:
> > [<00000000b9b9e678>] __s390_indirect_jump_r14+0x0/0xc
> > Kernel panic - not syncing: Fatal exception: panic_on_oops
>
> Starting today we're still seeing the same crash with linux-next from
> (next-20220513):
>
> [ 211.937897] CPU: 7 PID: 535 Comm: pt_upgrade Not tainted 5.18.0-rc6-11648-g76535d42eb53-dirty #732
> [ 211.937902] Unable to handle kernel pointer dereference in virtual kernel address space
> [ 211.937903] Hardware name: IBM 3906 M04 704 (z/VM 7.1.0)
> [ 211.937906] Failing address: 0e00000000000000 TEID: 0e00000000000803
> [ 211.937909] Krnl PSW : 0704c00180000000 0000001ca52f06d6
> [ 211.937910] Fault in home space mode while using kernel ASCE.
> [ 211.937917] AS:0000001ca6e24007 R3:0000001fffff0007 S:0000001ffffef800 P:000000000000003d
> [ 211.937914] (mmap_region+0x19e/0x848)
> [ 211.937929] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
> [ 211.937939] Krnl GPRS: 0000000000000000 0e00000000000000 0000000000000000 0000000000000000
> [ 211.937942] ffffffff00000f0f ffffffffffffffff 0e00000000000000 0000040000001000
> [ 211.937945] 0000000083551900 0000040000000000 00000000000000fb 000003800070fc58
> [ 211.937947] 000000008f490000 0000000000000000 0000001ca52f06b6 000003800070fb48
> [ 211.937959] Krnl Code: 0000001ca52f06c6: a7740021 brc 7,0000001ca52f0708
> [ 211.937959] 0000001ca52f06ca: ec6801b3007c cgij %r6,0,8,0000001ca52f0a30
> [ 211.937959] #0000001ca52f06d0: e310f0f80004 lg %r1,248(%r15)
> [ 211.937959] >0000001ca52f06d6: e37010000020 cg %r7,0(%r1)
> [ 211.937959] 0000001ca52f06dc: a78400ea brc 8,0000001ca52f08b0
> [ 211.937959] 0000001ca52f06e0: e310f0f00004 lg %r1,240(%r15)
> [ 211.937959] 0000001ca52f06e6: ec180008007c cgij %r1,0,8,0000001ca52f06f6
> [ 211.937959] 0000001ca52f06ec: e39010080020 cg %r9,8(%r1)
> [ 211.937973] Call Trace:
> [ 211.937975] [<0000001ca52f06d6>] mmap_region+0x19e/0x848
> [ 211.937978] ([<0000001ca52f06b6>] mmap_region+0x17e/0x848)
> [ 211.937981] [<0000001ca52f116a>] do_mmap+0x3ea/0x4c8
> [ 211.937983] [<0000001ca52bed12>] vm_mmap_pgoff+0xda/0x178
> [ 211.937987] [<0000001ca52ed5ea>] ksys_mmap_pgoff+0x62/0x238
> [ 211.937989] [<0000001ca52ed992>] __s390x_sys_old_mmap+0x7a/0xa0
> [ 211.937993] [<0000001ca5c4ef5c>] __do_syscall+0x1d4/0x200
> [ 211.937999] [<0000001ca5c5d572>] system_call+0x82/0xb0
> [ 211.938002] Last Breaking-Event-Address:
> [ 211.938003] [<0000001ca5888616>] mas_prev+0xb6/0xc0
> [ 211.938010] Oops: 0038 ilc:3 [#2]
> [ 211.938011] Kernel panic - not syncing: Fatal exception: panic_on_oops
> [ 211.938012] SMP
> [ 211.938014] Modules linked in:
> 07: HCPGIR450W CP entered; disabled wait PSW 00020001 80000000 0000001C
> A50679A6
>
> IS that issue supposed to be fixed? git bisect pointed me to
>
> # bad: [76535d42eb53485775a8c54ea85725812b75543f] Merge branch
> 'mm-everything' of
> git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
>
> which isn't really helpful.
>
> Anything we could help with debugging this?

I tested the maple tree on top of the s390 as it was the same crash and
it was okay. I haven't tested the mm-everything branch though. Can you
test mm-unstable?

I'll continue setting up a sparc VM for testing here and test
mm-everything on that and the s390.

Thanks,
Liam

2022-05-16 07:31:56

by Sven Schnelle

Subject: Re: [PATCH] mapletree-vs-khugepaged

Liam Howlett <[email protected]> writes:

> * Sven Schnelle <[email protected]> [220513 10:46]:
>> Starting today we're still seeing the same crash with linux-next from
>> (next-20220513):
>>
>> [ 211.937897] CPU: 7 PID: 535 Comm: pt_upgrade Not tainted 5.18.0-rc6-11648-g76535d42eb53-dirty #732
>> [ 211.937902] Unable to handle kernel pointer dereference in virtual kernel address space
>> [ 211.937903] Hardware name: IBM 3906 M04 704 (z/VM 7.1.0)
>> [ 211.937906] Failing address: 0e00000000000000 TEID: 0e00000000000803
>> [ 211.937909] Krnl PSW : 0704c00180000000 0000001ca52f06d6
>> [ 211.937910] Fault in home space mode while using kernel ASCE.
>> [ 211.937917] AS:0000001ca6e24007 R3:0000001fffff0007 S:0000001ffffef800 P:000000000000003d
>> [ 211.937914] (mmap_region+0x19e/0x848)
>> [ 211.937929] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
>> [ 211.937939] Krnl GPRS: 0000000000000000 0e00000000000000 0000000000000000 0000000000000000
>> [ 211.937942] ffffffff00000f0f ffffffffffffffff 0e00000000000000 0000040000001000
>> [ 211.937945] 0000000083551900 0000040000000000 00000000000000fb 000003800070fc58
>> [ 211.937947] 000000008f490000 0000000000000000 0000001ca52f06b6 000003800070fb48
>> [ 211.937959] Krnl Code: 0000001ca52f06c6: a7740021 brc 7,0000001ca52f0708
>> [ 211.937959] 0000001ca52f06ca: ec6801b3007c cgij %r6,0,8,0000001ca52f0a30
>> [ 211.937959] #0000001ca52f06d0: e310f0f80004 lg %r1,248(%r15)
>> [ 211.937959] >0000001ca52f06d6: e37010000020 cg %r7,0(%r1)
>> [ 211.937959] 0000001ca52f06dc: a78400ea brc 8,0000001ca52f08b0
>> [ 211.937959] 0000001ca52f06e0: e310f0f00004 lg %r1,240(%r15)
>> [ 211.937959] 0000001ca52f06e6: ec180008007c cgij %r1,0,8,0000001ca52f06f6
>> [ 211.937959] 0000001ca52f06ec: e39010080020 cg %r9,8(%r1)
>> [ 211.937973] Call Trace:
>> [ 211.937975] [<0000001ca52f06d6>] mmap_region+0x19e/0x848
>> [ 211.937978] ([<0000001ca52f06b6>] mmap_region+0x17e/0x848)
>> [ 211.937981] [<0000001ca52f116a>] do_mmap+0x3ea/0x4c8
>> [ 211.937983] [<0000001ca52bed12>] vm_mmap_pgoff+0xda/0x178
>> [ 211.937987] [<0000001ca52ed5ea>] ksys_mmap_pgoff+0x62/0x238
>> [ 211.937989] [<0000001ca52ed992>] __s390x_sys_old_mmap+0x7a/0xa0
>> [ 211.937993] [<0000001ca5c4ef5c>] __do_syscall+0x1d4/0x200
>> [ 211.937999] [<0000001ca5c5d572>] system_call+0x82/0xb0
>> [ 211.938002] Last Breaking-Event-Address:
>> [ 211.938003] [<0000001ca5888616>] mas_prev+0xb6/0xc0
>> [ 211.938010] Oops: 0038 ilc:3 [#2]
>> [ 211.938011] Kernel panic - not syncing: Fatal exception: panic_on_oops
>> [ 211.938012] SMP
>> [ 211.938014] Modules linked in:
>> 07: HCPGIR450W CP entered; disabled wait PSW 00020001 80000000 0000001C
>> A50679A6
>>
>> IS that issue supposed to be fixed? git bisect pointed me to
>>
>> # bad: [76535d42eb53485775a8c54ea85725812b75543f] Merge branch
>> 'mm-everything' of
>> git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
>>
>> which isn't really helpful.
>>
>> Anything we could help with debugging this?
>
> I tested the maple tree on top of the s390 as it was the same crash and
> it was okay. I haven't tested the mm-everything branch though. Can you
> test mm-unstable?

Yes, I tested mm-unstable but wasn't able to reproduce the issue.

> I'll continue setting up a sparc VM for testing here and test
> mm-everything on that and the s390

One thing that is different compared to x86 is that both sparc and s390
are big endian. Not sure whether and where that would make a difference.

The code to trigger the crash on s390 is rather simple: Just force a
paging level upgrade to 5 levels by calling mmap() with an address that
doesn't fit in 3 levels. Haven't tested whether an upgrade to 4 levels
would be sufficient. I've condensed our test case that triggers this, and
basically all that is required is:

--------------------------------8<---------------------------------------
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <stdio.h>

#define PAGE_SIZE 0x1000
#define _REGION1_SIZE (1UL << 54)

int main(int argc, char *argv[])
{
	int pid, status;
	void *addr;

	pid = fork();
	if (pid == 0) {
		/*
		 * Trigger page table level upgrade
		 */
		addr = mmap((void *)_REGION1_SIZE, PAGE_SIZE, PROT_READ | PROT_WRITE,
			    MAP_SHARED | MAP_ANONYMOUS, -1, 0);
		if (addr == MAP_FAILED)
			return 1;
		*(int *)addr = 1;
		return 0;
	}
	wait(&status);
	return 0;
}
--------------------------------8<---------------------------------------

I've added a few debug statements to the maple tree code:

[ 27.769641] mas_next_entry: offset=14
[ 27.769642] mas_next_nentry: entry = 0e00000000000000, slots=0000000090249f80, mas->offset=15 count=14

I see in mas_next_nentry() that there's a while loop that iterates over
the (used?) slots until count is reached. After that loop,
mas_next_entry() just picks the next (unused?) entry, which is slot 15
in that case.

What I noticed while scanning over include/linux/maple_tree.h is:

struct maple_range_64 {
	struct maple_pnode *parent;
	unsigned long pivot[MAPLE_RANGE64_SLOTS - 1];
	union {
		void __rcu *slot[MAPLE_RANGE64_SLOTS];
		struct {
			void __rcu *pad[MAPLE_RANGE64_SLOTS - 1];
			struct maple_metadata meta;
		};
	};
};

and struct maple_metadata is:

struct maple_metadata {
	unsigned char end;
	unsigned char gap;
};

If I swap the gap and end members, 0x0e00000000000000 becomes
0x000e000000000000. And 0xe matches our mas->offset 14 above.
So it looks like mas_next() in mmap_region returns the metadata
for the node.

So from the lines above you have likely already guessed that I have no
clue how the maple tree works, and I didn't have enough time today to
read all the magic and understand it. But I thought I'd just drop my
observation here in case someone has an idea.

Thanks,
Sven

2022-05-17 02:56:22

by Liam R. Howlett

Subject: Re: [PATCH] mapletree-vs-khugepaged

* Sven Schnelle <[email protected]> [220515 16:02]:
> Liam Howlett <[email protected]> writes:
>
> > * Sven Schnelle <[email protected]> [220513 10:46]:
> >> Starting today we're still seeing the same crash with linux-next from
> >> (next-20220513):
> >>
> >> [ 211.937897] CPU: 7 PID: 535 Comm: pt_upgrade Not tainted 5.18.0-rc6-11648-g76535d42eb53-dirty #732
> >> [ 211.937902] Unable to handle kernel pointer dereference in virtual kernel address space
> >> [ 211.937903] Hardware name: IBM 3906 M04 704 (z/VM 7.1.0)
> >> [ 211.937906] Failing address: 0e00000000000000 TEID: 0e00000000000803
> >> [ 211.937909] Krnl PSW : 0704c00180000000 0000001ca52f06d6
> >> [ 211.937910] Fault in home space mode while using kernel ASCE.
> >> [ 211.937917] AS:0000001ca6e24007 R3:0000001fffff0007 S:0000001ffffef800 P:000000000000003d
> >> [ 211.937914] (mmap_region+0x19e/0x848)
> >> [ 211.937929] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
> >> [ 211.937939] Krnl GPRS: 0000000000000000 0e00000000000000 0000000000000000 0000000000000000
> >> [ 211.937942] ffffffff00000f0f ffffffffffffffff 0e00000000000000 0000040000001000
> >> [ 211.937945] 0000000083551900 0000040000000000 00000000000000fb 000003800070fc58
> >> [ 211.937947] 000000008f490000 0000000000000000 0000001ca52f06b6 000003800070fb48
> >> [ 211.937959] Krnl Code: 0000001ca52f06c6: a7740021 brc 7,0000001ca52f0708
> >> [ 211.937959] 0000001ca52f06ca: ec6801b3007c cgij %r6,0,8,0000001ca52f0a30
> >> [ 211.937959] #0000001ca52f06d0: e310f0f80004 lg %r1,248(%r15)
> >> [ 211.937959] >0000001ca52f06d6: e37010000020 cg %r7,0(%r1)
> >> [ 211.937959] 0000001ca52f06dc: a78400ea brc 8,0000001ca52f08b0
> >> [ 211.937959] 0000001ca52f06e0: e310f0f00004 lg %r1,240(%r15)
> >> [ 211.937959] 0000001ca52f06e6: ec180008007c cgij %r1,0,8,0000001ca52f06f6
> >> [ 211.937959] 0000001ca52f06ec: e39010080020 cg %r9,8(%r1)
> >> [ 211.937973] Call Trace:
> >> [ 211.937975] [<0000001ca52f06d6>] mmap_region+0x19e/0x848
> >> [ 211.937978] ([<0000001ca52f06b6>] mmap_region+0x17e/0x848)
> >> [ 211.937981] [<0000001ca52f116a>] do_mmap+0x3ea/0x4c8
> >> [ 211.937983] [<0000001ca52bed12>] vm_mmap_pgoff+0xda/0x178
> >> [ 211.937987] [<0000001ca52ed5ea>] ksys_mmap_pgoff+0x62/0x238
> >> [ 211.937989] [<0000001ca52ed992>] __s390x_sys_old_mmap+0x7a/0xa0
> >> [ 211.937993] [<0000001ca5c4ef5c>] __do_syscall+0x1d4/0x200
> >> [ 211.937999] [<0000001ca5c5d572>] system_call+0x82/0xb0
> >> [ 211.938002] Last Breaking-Event-Address:
> >> [ 211.938003] [<0000001ca5888616>] mas_prev+0xb6/0xc0
> >> [ 211.938010] Oops: 0038 ilc:3 [#2]
> >> [ 211.938011] Kernel panic - not syncing: Fatal exception: panic_on_oops
> >> [ 211.938012] SMP
> >> [ 211.938014] Modules linked in:
> >> 07: HCPGIR450W CP entered; disabled wait PSW 00020001 80000000 0000001C
> >> A50679A6
> >>
> >> IS that issue supposed to be fixed? git bisect pointed me to
> >>
> >> # bad: [76535d42eb53485775a8c54ea85725812b75543f] Merge branch
> >> 'mm-everything' of
> >> git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
> >>
> >> which isn't really helpful.
> >>
> >> Anything we could help with debugging this?
> >
> > I tested the maple tree on top of the s390 as it was the same crash and
> > it was okay. I haven't tested the mm-everything branch though. Can you
> > test mm-unstable?
>
> Yes, i tested mm-unstable but wasn't able to reproduce the issue.
>
> > I'll continue setting up a sparc VM for testing here and test
> > mm-everything on that and the s390
>
> One thing that is different compared to x86 is that both sparc and s390
> are big endian. Not sure whether and where that would make a difference.
>
> The code to trigger the crash on s390 is rather simple: Just force a
> paging level upgrade to 5 levels by calling mmap() with an address that
> doesn't fit in 3 levels. Haven't tested whether an upgrade to 4 levels
> would be sufficent. I've condensed our test case that triggers this, and
> basically all that is required is:
>
> --------------------------------8<---------------------------------------
> #include <stdlib.h>
> #include <unistd.h>
> #include <sys/mman.h>
> #include <sys/wait.h>
> #include <stdio.h>
>
> #define PAGE_SIZE 0x1000
> #define _REGION1_SIZE (1UL << 54)
>
> int main(int argc, char *argv[])
> {
> int pid, status;
> void *addr;
>
> pid = fork();
> if (pid == 0) {
> /*
> * Trigger page table level upgrade
> */
> addr = mmap((void *)_REGION1_SIZE, PAGE_SIZE, PROT_READ | PROT_WRITE,
> MAP_SHARED | MAP_ANONYMOUS, -1, 0);
> if (addr == MAP_FAILED)
> return 1;
> *(int *)addr = 1;
> return 0;
> }
> wait(&status);
> return 0;
> }
> --------------------------------8<---------------------------------------
>

I tried the above on my qemu s390 with kernel 5.18.0-rc6-next-20220513,
but it runs without issue, return code is 0. Is there something the VM
needs to have for this to trigger?

> I've added a few debug statements to the maple tree code:
>
> [ 27.769641] mas_next_entry: offset=14
> [ 27.769642] mas_next_nentry: entry = 0e00000000000000, slots=0000000090249f80, mas->offset=15 count=14

Where exactly are you printing this?

>
> I see in mas_next_nentry() that there's a while that iterates over the
> (used?) slots until count is reached.

Yes, mas_next_nentry() looks for the next non-null entry in the current
node.

> After that loop mas_next_entry()
> just picks the next (unused?) entry, which is slot 15 in that case.

mas_next_entry() returns the next non-null entry. If there isn't one
returned by mas_next_nentry(), then it will advance to the next node by
calling mas_next_node(). There are checks in there for detecting dead
nodes for RCU use and limit checking as well.

>
> What i noticed while scanning over include/linux/maple_tree.h is:
>
> struct maple_range_64 {
> struct maple_pnode *parent;
> unsigned long pivot[MAPLE_RANGE64_SLOTS - 1];
> union {
> void __rcu *slot[MAPLE_RANGE64_SLOTS];
> struct {
> void __rcu *pad[MAPLE_RANGE64_SLOTS - 1];
> struct maple_metadata meta;
> };
> };
> };
>
> and struct maple_metadata is:
>
> struct maple_metadata {
> unsigned char end;
> unsigned char gap;
> };
>
> If i swap the gap and end members 0x0e00000000000000 becomes
> 0x000e000000000000. And 0xe matches our msa->offset 14 above.
> So it looks like mas_next() in mmap_region returns the meta
> data for the node.

If this is the case, then I think any task that has more than 14 VMAs
would have issues. I also use mas_next_entry() in mas_find() which is
used for the mas_for_each() macro/iterator. Can you please enable
CONFIG_DEBUG_VM_MAPLE_TREE? mmap.c tests the tree after pretty much
any change and will dump useful information if there is an issue -
including the entire tree. See validate_mm_mt() for details.

You can find CONFIG_DEBUG_VM_MAPLE_TREE in the config:
kernel hacking -> Memory debugging -> Debug VM -> Debug VM maple trees

>
> So from the lines above you likely already guessed that i have no clue
> how mapple tree works, and i didn't had enough time today to read all
> the magic and understand it. But i thought i just drop my observation
> here in case someone has an idea.

Thanks for sharing. I'm having a hard time recreating the issue so I
cannot fully dig in myself.

I was able to boot sparc64 with mm-unstable. I did get an error:
[ 5.002625] Kernel unaligned access at TPC[59bae8] mmap_region+0x168/0xb00

faddr2line is less than useful though with reported line "at ??:?"

I'll keep digging into that.

Thanks,
Liam

2022-05-18 04:46:19

by Heiko Carstens

Subject: Re: [PATCH] mapletree-vs-khugepaged

On Fri, May 13, 2022 at 05:00:31PM +0000, Liam Howlett wrote:
> * Sven Schnelle <[email protected]> [220513 10:46]:
> > Heiko Carstens <[email protected]> writes:
> > > FWIW, same on s390 - linux-next is completely broken. Note: I didn't
> > > bisect, but given that the call trace, and even the failing address
> > > match, I'm quite confident it is the same reason.
> > IS that issue supposed to be fixed? git bisect pointed me to
> >
> > # bad: [76535d42eb53485775a8c54ea85725812b75543f] Merge branch
> > 'mm-everything' of
> > git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
> >
> > which isn't really helpful.
> >
> > Anything we could help with debugging this?
>
> I tested the maple tree on top of the s390 as it was the same crash and
> it was okay. I haven't tested the mm-everything branch though. Can you
> test mm-unstable?
>
> I'll continue setting up a sparc VM for testing here and test
> mm-everything on that and the s390

So due to reports here I did some sort of "special bisect": with today's
linux-next I did a hard reset to commit 562340595cbb ("Merge branch
'for-next/kspp' of
git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux.git"),
started a bisect on Andrew's tree between mm-stable and mm-unstable, and
merged whatever commit was about to be bisected into 562340595cbb.

This finally led to commit f1297d3a2cb7 ("mm/mmap: reorganize munmap to
use maple states") as "first bad commit".

So given that we are shortly before the merge window and linux-next is
completely broken for s390, how do we proceed? Right now I have no idea if
there is anything else in linux-next that would break s390 because of this.

I'm sure you won't like to hear this, but I'd appreciate it if
this code could be removed from linux-next again.