2022-04-19 22:54:25

by Naresh Kamboju

Subject: [next] arm: boot failed - PC is at cpu_ca15_set_pte_ext

Linux next 20220419 boot failed on arm architecture qemu_arm and BeagleBoard
x15 device.

kernel crash log from x15:
-----------------
[ 6.866516] 8<--- cut here ---
[ 6.869598] Unable to handle kernel paging request at virtual address f000e62c
[ 6.876861] [f000e62c] *pgd=82935811, *pte=00000000, *ppte=00000000
[ 6.883209] Internal error: Oops: 807 [#3] SMP ARM
[ 6.888000] Modules linked in:
[ 6.891082] CPU: 1 PID: 1 Comm: swapper/0 Tainted: G D W 5.18.0-rc3-next-20220419 #1
[ 6.899993] Hardware name: Generic DRA74X (Flattened Device Tree)
[ 6.906127] PC is at cpu_ca15_set_pte_ext+0x4c/0x58
[ 6.911041] LR is at handle_mm_fault+0x60c/0xed0
[ 6.915679] pc : [<c031f26c>] lr : [<c04cfeb8>] psr: 40000013
[ 6.921966] sp : f000dde8 ip : f000de44 fp : a0000013
[ 6.927215] r10: 00000000 r9 : 00000000 r8 : c1e95194
[ 6.932464] r7 : c3c95000 r6 : befffff1 r5 : 00000081 r4 : c29d8000
[ 6.939025] r3 : 00000000 r2 : 00000000 r1 : 00000040 r0 : f000de2c
[ 6.945587] Flags: nZcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
[ 6.952758] Control: 10c5387d Table: 8020406a DAC: 00000051
[ 6.958526] Register r0 information: 2-page vmalloc region starting at 0xf000c000 allocated at kernel_clone+0x94/0x3b0
[ 6.969299] Register r1 information: non-paged memory
[ 6.974365] Register r2 information: NULL pointer
[ 6.979095] Register r3 information: NULL pointer
[ 6.983825] Register r4 information: slab task_struct start c29d8000 pointer offset 0
[ 6.991729] Register r5 information: non-paged memory
[ 6.996795] Register r6 information: non-paged memory
[ 7.001861] Register r7 information: slab vm_area_struct start c3c95000 pointer offset 0
[ 7.010009] Register r8 information: non-slab/vmalloc memory
[ 7.015716] Register r9 information: NULL pointer
[ 7.020446] Register r10 information: NULL pointer
[ 7.025238] Register r11 information: non-paged memory
[ 7.030426] Register r12 information: 2-page vmalloc region starting at 0xf000c000 allocated at kernel_clone+0x94/0x3b0
[ 7.041259] Process swapper/0 (pid: 1, stack limit = 0xfaff0077)
[ 7.047302] Stack: (0xf000dde8 to 0xf000e000)
[ 7.051696] dde0: c29d8000 00000cc0 c20a1108 c2065fa0 c1e09f50 b6db6db7
[ 7.059906] de00: c195bf0c 17c0f572 c29d8000 c3c95000 00000cc0 000befff befff000 befffff1
[ 7.068115] de20: 00000081 c3c3afb8 c3c3afb8 00000000 00000000 00000000 00000000 00000000
[ 7.076324] de40: 00000000 17c0f572 befff000 c3c95000 00002017 befffff1 00002017 00002fb8
[ 7.084564] de60: c2d04000 00000081 c29d8000 c04c6790 c20d01d4 00000000 00000001 c20ce440
[ 7.092773] de80: c1e10bcc fffff000 00000000 c2a45680 eeb33cc0 c29d8000 00000000 c2d04000
[ 7.100982] dea0: befffff1 f000df18 00000000 00002017 c20661a0 c04c77e8 f000df18 00000000
[ 7.109222] dec0: 00000000 c1d95c40 00000002 c20661e0 00000000 00000001 00000000 c04c7ad0
[ 7.117431] dee0: 00000011 c2d02a00 00000001 befffff1 c29d8000 00000000 00000011 c2a30010
[ 7.125640] df00: c29d8000 c0524c24 f000df18 00000000 00000000 2cd9e000 c1d95c40 17c0f572
[ 7.133850] df20: 00000000 c2d02a00 0000000b 00000ffc 00000000 befffff1 00000000 c0524f74
[ 7.142089] df40: c1e0e394 c2d02a00 c209a71c 38e38e39 c29d8000 bee00008 c2d02a00 c2a30000
[ 7.150299] df60: c1e0e394 c1e0e420 00000000 00000000 00000000 c05266bc c209a000 c1944c60
[ 7.158508] df80: 00000000 00000000 00000000 c129d2b4 c209a000 c1e0e394 00000000 c12b5600
[ 7.166748] dfa0: 00000000 c12b5518 00000000 c0300168 00000000 00000000 00000000 00000000
[ 7.174957] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 7.183166] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[ 7.191406] Code: 13110001 12211b02 13110b02 03a03000 (e5a03800)
[ 7.197570] ---[ end trace 0000000000000000 ]---
[ 7.202209] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b

Reported-by: Linux Kernel Functional Testing <[email protected]>

metadata:
git_ref: master
git_repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next
git_sha: 634de1db0e9bbeb90d7b01020e59ec3dab4d38a1
git_describe: next-20220419
kernel-config: https://builds.tuxbuild.com/280TXP6P7tIBfnowvFY4wobXp3R/config
System.map: https://builds.tuxbuild.com/280TXP6P7tIBfnowvFY4wobXp3R/System.map
vmlinux.xz: https://builds.tuxbuild.com/280TXP6P7tIBfnowvFY4wobXp3R/vmlinux.xz
build-url: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next/-/pipelines/519362851
build: https://builds.tuxbuild.com/280TXP6P7tIBfnowvFY4wobXp3R
toolchain: gcc-10

--
Linaro LKFT
https://lkft.linaro.org

[1] https://lkft.validation.linaro.org/scheduler/job/4921995#L2616
[2] https://lkft.validation.linaro.org/scheduler/job/4922061#L552


2022-04-20 20:09:39

by Naresh Kamboju

Subject: Re: [next] arm: boot failed - PC is at cpu_ca15_set_pte_ext

On Wed, 20 Apr 2022 at 00:28, Russell King (Oracle)
<[email protected]> wrote:
>
> On Tue, Apr 19, 2022 at 04:28:52PM +0530, Naresh Kamboju wrote:
> > Linux next 20220419 boot failed on arm architecture qemu_arm and BeagleBoard
> > x15 device.
>
> Was the immediately previous linux-next behaving correctly?

This crash started happening from the next-20220413 tag.

- Naresh

2022-04-20 23:22:46

by Russell King (Oracle)

Subject: Re: [next] arm: boot failed - PC is at cpu_ca15_set_pte_ext

On Wed, Apr 20, 2022 at 02:25:32PM +0530, Naresh Kamboju wrote:
> On Wed, 20 Apr 2022 at 00:28, Russell King (Oracle)
> <[email protected]> wrote:
> >
> > On Tue, Apr 19, 2022 at 04:28:52PM +0530, Naresh Kamboju wrote:
> > > Linux next 20220419 boot failed on arm architecture qemu_arm and BeagleBoard
> > > x15 device.
> >
> > Was the immediately previous linux-next behaving correctly?
>
> This crash started happening from the next-20220413 tag.

That rules out any arm32 specific changes - the last time my tree
changed in for-next was 1st April.

Ard points out that the pte table is on the stack, which it really
should not be. I'm guessing there's some inappropriate generic
kernel change that has broken arm32. A pte table should never ever
appear on a kernel stack.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

2022-04-20 23:53:51

by Max Krummenacher

Subject: Re: [next] arm: boot failed - PC is at cpu_ca15_set_pte_ext

On Wednesday, 20.04.2022 at 09:31 +0200, Ard Biesheuvel wrote:
> On Tue, 19 Apr 2022 at 12:59, Naresh Kamboju <[email protected]> wrote:
> > Linux next 20220419 boot failed on arm architecture qemu_arm and BeagleBoard
> > x15 device.
> >
> > kernel crash log from x15:
> > -----------------
> > [ 6.866516] 8<--- cut here ---
> > [ 6.869598] Unable to handle kernel paging request at virtual
> > address f000e62c
> > [ 6.876861] [f000e62c] *pgd=82935811, *pte=00000000, *ppte=00000000
> > [ 6.883209] Internal error: Oops: 807 [#3] SMP ARM
> > [ 6.888000] Modules linked in:
> > [ 6.891082] CPU: 1 PID: 1 Comm: swapper/0 Tainted: G D W
> > 5.18.0-rc3-next-20220419 #1
> > [ 6.899993] Hardware name: Generic DRA74X (Flattened Device Tree)
> > [ 6.906127] PC is at cpu_ca15_set_pte_ext+0x4c/0x58
> > [ 6.911041] LR is at handle_mm_fault+0x60c/0xed0
> > [ 6.915679] pc : [<c031f26c>] lr : [<c04cfeb8>] psr: 40000013
> > [ 6.921966] sp : f000dde8 ip : f000de44 fp : a0000013
> > [ 6.927215] r10: 00000000 r9 : 00000000 r8 : c1e95194
> > [ 6.932464] r7 : c3c95000 r6 : befffff1 r5 : 00000081 r4 : c29d8000
> > [ 6.939025] r3 : 00000000 r2 : 00000000 r1 : 00000040 r0 : f000de2c
> > [ 6.945587] Flags: nZcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
> > [ 6.952758] Control: 10c5387d Table: 8020406a DAC: 00000051
> > [ 6.958526] Register r0 information: 2-page vmalloc region starting
> > at 0xf000c000 allocated at kernel_clone+0x94/0x3b0
> > [ 6.969299] Register r1 information: non-paged memory
> > [ 6.974365] Register r2 information: NULL pointer
> > [ 6.979095] Register r3 information: NULL pointer
> > [ 6.983825] Register r4 information: slab task_struct start
> > c29d8000 pointer offset 0
> > [ 6.991729] Register r5 information: non-paged memory
> > [ 6.996795] Register r6 information: non-paged memory
> > [ 7.001861] Register r7 information: slab vm_area_struct start
> > c3c95000 pointer offset 0
> > [ 7.010009] Register r8 information: non-slab/vmalloc memory
> > [ 7.015716] Register r9 information: NULL pointer
> > [ 7.020446] Register r10 information: NULL pointer
> > [ 7.025238] Register r11 information: non-paged memory
> > [ 7.030426] Register r12 information: 2-page vmalloc region
> > starting at 0xf000c000 allocated at kernel_clone+0x94/0x3b0
> > [ 7.041259] Process swapper/0 (pid: 1, stack limit = 0xfaff0077)
> > [ 7.047302] Stack: (0xf000dde8 to 0xf000e000)
> > [ 7.051696] dde0: c29d8000 00000cc0 c20a1108
> > c2065fa0 c1e09f50 b6db6db7
> > [ 7.059906] de00: c195bf0c 17c0f572 c29d8000 c3c95000 00000cc0
> > 000befff befff000 befffff1
> > [ 7.068115] de20: 00000081 c3c3afb8 c3c3afb8 00000000 00000000
> > 00000000 00000000 00000000
> > [ 7.076324] de40: 00000000 17c0f572 befff000 c3c95000 00002017
> > befffff1 00002017 00002fb8
> > [ 7.084564] de60: c2d04000 00000081 c29d8000 c04c6790 c20d01d4
> > 00000000 00000001 c20ce440
> > [ 7.092773] de80: c1e10bcc fffff000 00000000 c2a45680 eeb33cc0
> > c29d8000 00000000 c2d04000
> > [ 7.100982] dea0: befffff1 f000df18 00000000 00002017 c20661a0
> > c04c77e8 f000df18 00000000
> > [ 7.109222] dec0: 00000000 c1d95c40 00000002 c20661e0 00000000
> > 00000001 00000000 c04c7ad0
> > [ 7.117431] dee0: 00000011 c2d02a00 00000001 befffff1 c29d8000
> > 00000000 00000011 c2a30010
> > [ 7.125640] df00: c29d8000 c0524c24 f000df18 00000000 00000000
> > 2cd9e000 c1d95c40 17c0f572
> > [ 7.133850] df20: 00000000 c2d02a00 0000000b 00000ffc 00000000
> > befffff1 00000000 c0524f74
> > [ 7.142089] df40: c1e0e394 c2d02a00 c209a71c 38e38e39 c29d8000
> > bee00008 c2d02a00 c2a30000
> > [ 7.150299] df60: c1e0e394 c1e0e420 00000000 00000000 00000000
> > c05266bc c209a000 c1944c60
> > [ 7.158508] df80: 00000000 00000000 00000000 c129d2b4 c209a000
> > c1e0e394 00000000 c12b5600
> > [ 7.166748] dfa0: 00000000 c12b5518 00000000 c0300168 00000000
> > 00000000 00000000 00000000
> > [ 7.174957] dfc0: 00000000 00000000 00000000 00000000 00000000
> > 00000000 00000000 00000000
> > [ 7.183166] dfe0: 00000000 00000000 00000000 00000000 00000013
> > 00000000 00000000 00000000
> > [ 7.191406] Code: 13110001 12211b02 13110b02 03a03000 (e5a03800)
>
> This decodes to
>
> 0: 13110001 tstne r1, #1
> 4: 12211b02 eorne r1, r1, #2048 ; 0x800
> 8: 13110b02 tstne r1, #2048 ; 0x800
> c: 03a03000 moveq r3, #0
> 10:* e5a03800 str r3, [r0, #2048]! ; 0x800 <-- trapping instruction
>
> and R0 points into the stack. So we are updating a PTE that is located
> on the stack rather than in a page table somewhere, which seems very
> odd. However, this could be a latent bug that got uncovered by the
> VMAP stacks changes.
>
> Unfortunately, the vmlinux.xz file I downloaded from the link below
> seems to be different from the one that produced the crash, given that
> the LR address of c04cfeb8 does not seem to correspond with
> handle_mm_fault+0x60c/0xed0.
>
> Can you please double check the artifacts?

Commit "mm: check against orig_pte for finish_fault()" introduced this,
i.e. on yesterday's next, reverting a066bab3c0eb made an i.MX6 boot again.
A fix is discussed here:

https://lore.kernel.org/all/[email protected]/

Max

>
>
>
> > metadata:
> > git_ref: master
> > git_repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next
> > git_sha: 634de1db0e9bbeb90d7b01020e59ec3dab4d38a1
> > git_describe: next-20220419
> > kernel-config: https://builds.tuxbuild.com/280TXP6P7tIBfnowvFY4wobXp3R/config
> > System.map: https://builds.tuxbuild.com/280TXP6P7tIBfnowvFY4wobXp3R/System.map
> > vmlinux.xz: https://builds.tuxbuild.com/280TXP6P7tIBfnowvFY4wobXp3R/vmlinux.xz
> > build-url: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next/-/pipelines/519362851
> > build: https://builds.tuxbuild.com/280TXP6P7tIBfnowvFY4wobXp3R
> > toolchain: gcc-10
> >
> > --
> > Linaro LKFT
> > https://lkft.linaro.org
> >
> > [1] https://lkft.validation.linaro.org/scheduler/job/4921995#L2616
> > [2] https://lkft.validation.linaro.org/scheduler/job/4922061#L552
>

2022-04-20 23:54:03

by Ard Biesheuvel

Subject: Re: [next] arm: boot failed - PC is at cpu_ca15_set_pte_ext

On Tue, 19 Apr 2022 at 12:59, Naresh Kamboju <[email protected]> wrote:
>
> Linux next 20220419 boot failed on arm architecture qemu_arm and BeagleBoard
> x15 device.
>
> kernel crash log from x15:
> -----------------
> [ 6.866516] 8<--- cut here ---
> [ 6.869598] Unable to handle kernel paging request at virtual
> address f000e62c
> [ 6.876861] [f000e62c] *pgd=82935811, *pte=00000000, *ppte=00000000
> [ 6.883209] Internal error: Oops: 807 [#3] SMP ARM
> [ 6.888000] Modules linked in:
> [ 6.891082] CPU: 1 PID: 1 Comm: swapper/0 Tainted: G D W
> 5.18.0-rc3-next-20220419 #1
> [ 6.899993] Hardware name: Generic DRA74X (Flattened Device Tree)
> [ 6.906127] PC is at cpu_ca15_set_pte_ext+0x4c/0x58
> [ 6.911041] LR is at handle_mm_fault+0x60c/0xed0
> [ 6.915679] pc : [<c031f26c>] lr : [<c04cfeb8>] psr: 40000013
> [ 6.921966] sp : f000dde8 ip : f000de44 fp : a0000013
> [ 6.927215] r10: 00000000 r9 : 00000000 r8 : c1e95194
> [ 6.932464] r7 : c3c95000 r6 : befffff1 r5 : 00000081 r4 : c29d8000
> [ 6.939025] r3 : 00000000 r2 : 00000000 r1 : 00000040 r0 : f000de2c
> [ 6.945587] Flags: nZcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
> [ 6.952758] Control: 10c5387d Table: 8020406a DAC: 00000051
> [ 6.958526] Register r0 information: 2-page vmalloc region starting
> at 0xf000c000 allocated at kernel_clone+0x94/0x3b0
> [ 6.969299] Register r1 information: non-paged memory
> [ 6.974365] Register r2 information: NULL pointer
> [ 6.979095] Register r3 information: NULL pointer
> [ 6.983825] Register r4 information: slab task_struct start
> c29d8000 pointer offset 0
> [ 6.991729] Register r5 information: non-paged memory
> [ 6.996795] Register r6 information: non-paged memory
> [ 7.001861] Register r7 information: slab vm_area_struct start
> c3c95000 pointer offset 0
> [ 7.010009] Register r8 information: non-slab/vmalloc memory
> [ 7.015716] Register r9 information: NULL pointer
> [ 7.020446] Register r10 information: NULL pointer
> [ 7.025238] Register r11 information: non-paged memory
> [ 7.030426] Register r12 information: 2-page vmalloc region
> starting at 0xf000c000 allocated at kernel_clone+0x94/0x3b0
> [ 7.041259] Process swapper/0 (pid: 1, stack limit = 0xfaff0077)
> [ 7.047302] Stack: (0xf000dde8 to 0xf000e000)
> [ 7.051696] dde0: c29d8000 00000cc0 c20a1108
> c2065fa0 c1e09f50 b6db6db7
> [ 7.059906] de00: c195bf0c 17c0f572 c29d8000 c3c95000 00000cc0
> 000befff befff000 befffff1
> [ 7.068115] de20: 00000081 c3c3afb8 c3c3afb8 00000000 00000000
> 00000000 00000000 00000000
> [ 7.076324] de40: 00000000 17c0f572 befff000 c3c95000 00002017
> befffff1 00002017 00002fb8
> [ 7.084564] de60: c2d04000 00000081 c29d8000 c04c6790 c20d01d4
> 00000000 00000001 c20ce440
> [ 7.092773] de80: c1e10bcc fffff000 00000000 c2a45680 eeb33cc0
> c29d8000 00000000 c2d04000
> [ 7.100982] dea0: befffff1 f000df18 00000000 00002017 c20661a0
> c04c77e8 f000df18 00000000
> [ 7.109222] dec0: 00000000 c1d95c40 00000002 c20661e0 00000000
> 00000001 00000000 c04c7ad0
> [ 7.117431] dee0: 00000011 c2d02a00 00000001 befffff1 c29d8000
> 00000000 00000011 c2a30010
> [ 7.125640] df00: c29d8000 c0524c24 f000df18 00000000 00000000
> 2cd9e000 c1d95c40 17c0f572
> [ 7.133850] df20: 00000000 c2d02a00 0000000b 00000ffc 00000000
> befffff1 00000000 c0524f74
> [ 7.142089] df40: c1e0e394 c2d02a00 c209a71c 38e38e39 c29d8000
> bee00008 c2d02a00 c2a30000
> [ 7.150299] df60: c1e0e394 c1e0e420 00000000 00000000 00000000
> c05266bc c209a000 c1944c60
> [ 7.158508] df80: 00000000 00000000 00000000 c129d2b4 c209a000
> c1e0e394 00000000 c12b5600
> [ 7.166748] dfa0: 00000000 c12b5518 00000000 c0300168 00000000
> 00000000 00000000 00000000
> [ 7.174957] dfc0: 00000000 00000000 00000000 00000000 00000000
> 00000000 00000000 00000000
> [ 7.183166] dfe0: 00000000 00000000 00000000 00000000 00000013
> 00000000 00000000 00000000
> [ 7.191406] Code: 13110001 12211b02 13110b02 03a03000 (e5a03800)

This decodes to

0: 13110001 tstne r1, #1
4: 12211b02 eorne r1, r1, #2048 ; 0x800
8: 13110b02 tstne r1, #2048 ; 0x800
c: 03a03000 moveq r3, #0
10:* e5a03800 str r3, [r0, #2048]! ; 0x800 <-- trapping instruction

and R0 points into the stack. So we are updating a PTE that is located
on the stack rather than in a page table somewhere, which seems very
odd. However, this could be a latent bug that got uncovered by the
VMAP stacks changes.
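
As a cross-check of the decode above, here is a minimal C sketch that pulls the fields out of the trapping word e5a03800 and recomputes the address it stores to. The helper names (a32_decode_ldst_imm, a32_effective_address, in_region) are made up for illustration and are not kernel functions; the decoder is deliberately simplified and ignores the condition field and the byte/word bit. The 0x2000 region size assumes the "2-page vmalloc region" from the log uses 4 KiB pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fields of an A32 LDR/STR with a 12-bit immediate offset (simplified). */
struct a32_ldst_imm {
	unsigned rn;       /* base register number */
	unsigned rt;       /* register being stored or loaded */
	uint32_t imm12;    /* unsigned 12-bit offset */
	bool pre_index;    /* P bit: offset is applied before the access */
	bool add;          /* U bit: offset is added rather than subtracted */
	bool writeback;    /* W bit: base register is updated ("!") */
	bool load;         /* L bit: 1 = LDR, 0 = STR */
};

/* Returns false if bits 27..25 are not 0b010 (immediate-offset LDR/STR). */
static bool a32_decode_ldst_imm(uint32_t insn, struct a32_ldst_imm *out)
{
	if (((insn >> 25) & 0x7) != 0x2)
		return false;
	out->pre_index = (insn >> 24) & 1;
	out->add       = (insn >> 23) & 1;
	out->writeback = (insn >> 21) & 1;
	out->load      = (insn >> 20) & 1;
	out->rn        = (insn >> 16) & 0xf;
	out->rt        = (insn >> 12) & 0xf;
	out->imm12     = insn & 0xfff;
	return true;
}

/* Address actually accessed, given the current value of the base register. */
static uint32_t a32_effective_address(const struct a32_ldst_imm *d, uint32_t base)
{
	if (!d->pre_index)
		return base;   /* post-indexed: the access uses the unmodified base */
	return d->add ? base + d->imm12 : base - d->imm12;
}

/* True if addr lies in the half-open range [start, start + size). */
static bool in_region(uint32_t addr, uint32_t start, uint32_t size)
{
	return addr >= start && addr - start < size;
}

/* Re-derive the fault address from the values printed in the oops. */
void check_trapping_store(void)
{
	struct a32_ldst_imm d;

	assert(a32_decode_ldst_imm(0xe5a03800u, &d));
	assert(!d.load && d.rt == 3 && d.rn == 0);             /* str r3, [r0, ...] */
	assert(d.pre_index && d.add && d.writeback && d.imm12 == 0x800);

	/* r0 = f000de2c in the register dump; the store targets r0 + 2048. */
	assert(a32_effective_address(&d, 0xf000de2cu) == 0xf000e62cu);

	/* r0 is inside the 2-page stack at f000c000, the store target is not. */
	assert(in_region(0xf000de2cu, 0xf000c000u, 0x2000u));
	assert(!in_region(0xf000e62cu, 0xf000c000u, 0x2000u));
}
```

Calling check_trapping_store() from a small main() passes silently: the computed target f000e62c equals the faulting virtual address in the log, and it lies past the end of the stack (f000e000), which fits the idea that a vmalloc'ed stack turns a previously silent stray write into a fault.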

Unfortunately, the vmlinux.xz file I downloaded from the link below
seems to be different from the one that produced the crash, given that
the LR address of c04cfeb8 does not seem to correspond with
handle_mm_fault+0x60c/0xed0.

Can you please double check the artifacts?



> metadata:
> git_ref: master
> git_repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next
> git_sha: 634de1db0e9bbeb90d7b01020e59ec3dab4d38a1
> git_describe: next-20220419
> kernel-config: https://builds.tuxbuild.com/280TXP6P7tIBfnowvFY4wobXp3R/config
> System.map: https://builds.tuxbuild.com/280TXP6P7tIBfnowvFY4wobXp3R/System.map
> vmlinux.xz: https://builds.tuxbuild.com/280TXP6P7tIBfnowvFY4wobXp3R/vmlinux.xz
> build-url: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next/-/pipelines/519362851
> build: https://builds.tuxbuild.com/280TXP6P7tIBfnowvFY4wobXp3R
> toolchain: gcc-10
>
> --
> Linaro LKFT
> https://lkft.linaro.org
>
> [1] https://lkft.validation.linaro.org/scheduler/job/4921995#L2616
> [2] https://lkft.validation.linaro.org/scheduler/job/4922061#L552

2022-04-21 19:02:13

by Naresh Kamboju

Subject: Re: [next] arm: boot failed - PC is at cpu_ca15_set_pte_ext

On Wed, 20 Apr 2022 at 13:01, Ard Biesheuvel <[email protected]> wrote:
>
> On Tue, 19 Apr 2022 at 12:59, Naresh Kamboju <[email protected]> wrote:
> >
> > Linux next 20220419 boot failed on arm architecture qemu_arm and BeagleBoard
> > x15 device.
> >
> > kernel crash log from x15:
> > -----------------
> > [ 6.866516] 8<--- cut here ---
> > [ 6.869598] Unable to handle kernel paging request at virtual
> > address f000e62c
> > [ 6.876861] [f000e62c] *pgd=82935811, *pte=00000000, *ppte=00000000
> > [ 6.883209] Internal error: Oops: 807 [#3] SMP ARM
> > [ 6.888000] Modules linked in:
> > [ 6.891082] CPU: 1 PID: 1 Comm: swapper/0 Tainted: G D W
> > 5.18.0-rc3-next-20220419 #1
> > [ 6.899993] Hardware name: Generic DRA74X (Flattened Device Tree)
> > [ 6.906127] PC is at cpu_ca15_set_pte_ext+0x4c/0x58
> > [ 6.911041] LR is at handle_mm_fault+0x60c/0xed0
> > [ 6.915679] pc : [<c031f26c>] lr : [<c04cfeb8>] psr: 40000013
> > [ 6.921966] sp : f000dde8 ip : f000de44 fp : a0000013
> > [ 6.927215] r10: 00000000 r9 : 00000000 r8 : c1e95194
> > [ 6.932464] r7 : c3c95000 r6 : befffff1 r5 : 00000081 r4 : c29d8000
> > [ 6.939025] r3 : 00000000 r2 : 00000000 r1 : 00000040 r0 : f000de2c
> > [ 6.945587] Flags: nZcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
> > [ 6.952758] Control: 10c5387d Table: 8020406a DAC: 00000051
> > [ 6.958526] Register r0 information: 2-page vmalloc region starting
> > at 0xf000c000 allocated at kernel_clone+0x94/0x3b0
> > [ 6.969299] Register r1 information: non-paged memory
> > [ 6.974365] Register r2 information: NULL pointer
> > [ 6.979095] Register r3 information: NULL pointer
> > [ 6.983825] Register r4 information: slab task_struct start
> > c29d8000 pointer offset 0
> > [ 6.991729] Register r5 information: non-paged memory
> > [ 6.996795] Register r6 information: non-paged memory
> > [ 7.001861] Register r7 information: slab vm_area_struct start
> > c3c95000 pointer offset 0
> > [ 7.010009] Register r8 information: non-slab/vmalloc memory
> > [ 7.015716] Register r9 information: NULL pointer
> > [ 7.020446] Register r10 information: NULL pointer
> > [ 7.025238] Register r11 information: non-paged memory
> > [ 7.030426] Register r12 information: 2-page vmalloc region
> > starting at 0xf000c000 allocated at kernel_clone+0x94/0x3b0
> > [ 7.041259] Process swapper/0 (pid: 1, stack limit = 0xfaff0077)
> > [ 7.047302] Stack: (0xf000dde8 to 0xf000e000)
> > [ 7.051696] dde0: c29d8000 00000cc0 c20a1108
> > c2065fa0 c1e09f50 b6db6db7
> > [ 7.059906] de00: c195bf0c 17c0f572 c29d8000 c3c95000 00000cc0
> > 000befff befff000 befffff1
> > [ 7.068115] de20: 00000081 c3c3afb8 c3c3afb8 00000000 00000000
> > 00000000 00000000 00000000
> > [ 7.076324] de40: 00000000 17c0f572 befff000 c3c95000 00002017
> > befffff1 00002017 00002fb8
> > [ 7.084564] de60: c2d04000 00000081 c29d8000 c04c6790 c20d01d4
> > 00000000 00000001 c20ce440
> > [ 7.092773] de80: c1e10bcc fffff000 00000000 c2a45680 eeb33cc0
> > c29d8000 00000000 c2d04000
> > [ 7.100982] dea0: befffff1 f000df18 00000000 00002017 c20661a0
> > c04c77e8 f000df18 00000000
> > [ 7.109222] dec0: 00000000 c1d95c40 00000002 c20661e0 00000000
> > 00000001 00000000 c04c7ad0
> > [ 7.117431] dee0: 00000011 c2d02a00 00000001 befffff1 c29d8000
> > 00000000 00000011 c2a30010
> > [ 7.125640] df00: c29d8000 c0524c24 f000df18 00000000 00000000
> > 2cd9e000 c1d95c40 17c0f572
> > [ 7.133850] df20: 00000000 c2d02a00 0000000b 00000ffc 00000000
> > befffff1 00000000 c0524f74
> > [ 7.142089] df40: c1e0e394 c2d02a00 c209a71c 38e38e39 c29d8000
> > bee00008 c2d02a00 c2a30000
> > [ 7.150299] df60: c1e0e394 c1e0e420 00000000 00000000 00000000
> > c05266bc c209a000 c1944c60
> > [ 7.158508] df80: 00000000 00000000 00000000 c129d2b4 c209a000
> > c1e0e394 00000000 c12b5600
> > [ 7.166748] dfa0: 00000000 c12b5518 00000000 c0300168 00000000
> > 00000000 00000000 00000000
> > [ 7.174957] dfc0: 00000000 00000000 00000000 00000000 00000000
> > 00000000 00000000 00000000
> > [ 7.183166] dfe0: 00000000 00000000 00000000 00000000 00000013
> > 00000000 00000000 00000000
> > [ 7.191406] Code: 13110001 12211b02 13110b02 03a03000 (e5a03800)
>
> This decodes to
>
> 0: 13110001 tstne r1, #1
> 4: 12211b02 eorne r1, r1, #2048 ; 0x800
> 8: 13110b02 tstne r1, #2048 ; 0x800
> c: 03a03000 moveq r3, #0
> 10:* e5a03800 str r3, [r0, #2048]! ; 0x800 <-- trapping instruction
>
> and R0 points into the stack. So we are updating a PTE that is located
> on the stack rather than in a page table somewhere, which seems very
> odd. However, this could be a latent bug that got uncovered by the
> VMAP stacks changes.
>
> Unfortunately, the vmlinux.xz file I downloaded from the link below
> seems to be different from the one that produced the crash, given that
> the LR address of c04cfeb8 does not seem to correspond with
> handle_mm_fault+0x60c/0xed0.
> Can you please double check the artifacts?

Here is the vmlinux.xz that matches the trace log I pasted:

vmlinux.xz : https://builds.tuxbuild.com/280TS8MuM6sYWk5aUtrvWIw0RQ7/vmlinux.xz
artifact-location: https://builds.tuxbuild.com/280TS8MuM6sYWk5aUtrvWIw0RQ7

- Naresh

2022-04-22 18:47:17

by Andrew Morton

Subject: Re: [next] arm: boot failed - PC is at cpu_ca15_set_pte_ext

On Wed, 20 Apr 2022 09:50:52 +0200 Max Krummenacher <[email protected]> wrote:

> >
> > Unfortunately, the vmlinux.xz file I downloaded from the link below
> > seems to be different from the one that produced the crash, given that
> > the LR address of c04cfeb8 does not seem to correspond with
> > handle_mm_fault+0x60c/0xed0.
> >
> > Can you please double check the artifacts?
>
> Commit "mm: check against orig_pte for finish_fault()" introduced this,
> i.e. on yesterday's next, reverting a066bab3c0eb made an i.MX6 boot again.
> A fix is discussed here:
>
> https://lore.kernel.org/all/[email protected]/
>

Thanks for finding that. I have Peter's fix queued and shall push out
a snapshot later today, for integration into linux-next.


From: Peter Xu <[email protected]>
Subject: mm-check-against-orig_pte-for-finish_fault-fix

fix crash reported by Marek

Link: https://lkml.kernel.org/r/Ylb9rXJyPm8/[email protected]
Signed-off-by: Peter Xu <[email protected]>
Reported-by: Marek Szyprowski <[email protected]>
Tested-by: Marek Szyprowski <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Axel Rasmussen <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: "Kirill A . Shutemov" <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Nadav Amit <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
---


--- a/include/linux/mm_types.h~mm-check-against-orig_pte-for-finish_fault-fix
+++ a/include/linux/mm_types.h
@@ -814,6 +814,8 @@ typedef struct {
* @FAULT_FLAG_UNSHARE: The fault is an unsharing request to unshare (and mark
* exclusive) a possibly shared anonymous page that is
* mapped R/O.
+ * @FAULT_FLAG_ORIG_PTE_VALID: whether the fault has vmf->orig_pte cached.
+ * We should only access orig_pte if this flag set.
*
* About @FAULT_FLAG_ALLOW_RETRY and @FAULT_FLAG_TRIED: we can specify
* whether we would allow page faults to retry by specifying these two
@@ -850,6 +852,7 @@ enum fault_flag {
FAULT_FLAG_INSTRUCTION = 1 << 8,
FAULT_FLAG_INTERRUPTIBLE = 1 << 9,
FAULT_FLAG_UNSHARE = 1 << 10,
+ FAULT_FLAG_ORIG_PTE_VALID = 1 << 11,
};

#endif /* _LINUX_MM_TYPES_H */
--- a/mm/memory.c~mm-check-against-orig_pte-for-finish_fault-fix
+++ a/mm/memory.c
@@ -4194,6 +4194,15 @@ void do_set_pte(struct vm_fault *vmf, st
set_pte_at(vma->vm_mm, addr, vmf->pte, entry);
}

+static bool vmf_pte_changed(struct vm_fault *vmf)
+{
+ if (vmf->flags & FAULT_FLAG_ORIG_PTE_VALID) {
+ return !pte_same(*vmf->pte, vmf->orig_pte);
+ }
+
+ return !pte_none(*vmf->pte);
+}
+
/**
* finish_fault - finish page fault once we have prepared the page to fault
*
@@ -4252,7 +4261,7 @@ vm_fault_t finish_fault(struct vm_fault
vmf->address, &vmf->ptl);
ret = 0;
/* Re-check under ptl */
- if (likely(pte_same(*vmf->pte, vmf->orig_pte)))
+ if (likely(!vmf_pte_changed(vmf)))
do_set_pte(vmf, page, vmf->address);
else
ret = VM_FAULT_NOPAGE;
@@ -4720,13 +4729,7 @@ static vm_fault_t handle_pte_fault(struc
* concurrent faults and from rmap lookups.
*/
vmf->pte = NULL;
- /*
- * Always initialize orig_pte. This matches with below
- * code to have orig_pte to be the none pte if pte==NULL.
- * This makes the rest code to be always safe to reference
- * it, e.g. in finish_fault() we'll detect pte changes.
- */
- pte_clear(vmf->vma->vm_mm, vmf->address, &vmf->orig_pte);
+ vmf->flags &= ~FAULT_FLAG_ORIG_PTE_VALID;
} else {
/*
* If a huge pmd materialized under us just retry later. Use
@@ -4750,6 +4753,7 @@ static vm_fault_t handle_pte_fault(struc
*/
vmf->pte = pte_offset_map(vmf->pmd, vmf->address);
vmf->orig_pte = *vmf->pte;
+ vmf->flags |= FAULT_FLAG_ORIG_PTE_VALID;

/*
* some architectures can have larger ptes than wordsize,
_
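
To make the logic of the fix easier to follow outside the kernel, here is a small userspace model of the new check. It is only a sketch: model_pte_t, struct model_fault and MODEL_ORIG_PTE_VALID are stand-ins invented for this example, a zero value plays the role of pte_none(), and plain integer comparison plays the role of pte_same(). The point it illustrates is that orig_pte is only consulted when the flag says it was filled in, so nothing needs to write through &vmf->orig_pte any more (the removed pte_clear() call did exactly that, on a structure that lives on the stack).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t model_pte_t;              /* stand-in for pte_t */
#define MODEL_ORIG_PTE_VALID (1u << 11)    /* mirrors FAULT_FLAG_ORIG_PTE_VALID */

struct model_fault {
	unsigned int flags;
	model_pte_t *pte;        /* current entry in the (model) page table */
	model_pte_t orig_pte;    /* snapshot; only meaningful if the flag is set */
};

/* Same shape as vmf_pte_changed() in the patch, on the model types. */
static bool model_pte_changed(const struct model_fault *f)
{
	if (f->flags & MODEL_ORIG_PTE_VALID)
		return *f->pte != f->orig_pte;   /* compare against the snapshot */

	return *f->pte != 0;                     /* no snapshot: must still be empty */
}

void model_selfcheck(void)
{
	model_pte_t slot = 0;
	struct model_fault f = {
		.flags = 0,
		.pte = &slot,
		.orig_pte = 0xdeadbeefu,         /* deliberately left as garbage */
	};

	/* Without the flag, the garbage snapshot is never looked at. */
	assert(!model_pte_changed(&f));

	/* Someone else populated the entry in the meantime. */
	slot = 0x1234u;
	assert(model_pte_changed(&f));

	/* With a valid snapshot, only a differing entry counts as changed. */
	f.orig_pte = 0x1234u;
	f.flags |= MODEL_ORIG_PTE_VALID;
	assert(!model_pte_changed(&f));
	slot = 0x5678u;
	assert(model_pte_changed(&f));
}
```

finish_fault() then installs the page only when the check reports no change, which is what the `likely(!vmf_pte_changed(vmf))` hunk above does.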

2022-04-22 19:29:37

by Russell King (Oracle)

Subject: Re: [next] arm: boot failed - PC is at cpu_ca15_set_pte_ext

On Tue, Apr 19, 2022 at 04:28:52PM +0530, Naresh Kamboju wrote:
> Linux next 20220419 boot failed on arm architecture qemu_arm and BeagleBoard
> x15 device.

Was the immediately previous linux-next behaving correctly?

If so, nothing has changed in the ARM32 kernel tree, so this must be
someone else's issue - code that someone else has pushed into
linux-next.

It looks to me like someone is walking the page tables incorrectly,
somewhere buried in handle_mm_fault(), because the PTE pointer is in
the upper-2k of a 4k page, which is most definitely illegal on arm32.
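
The "upper-2k" observation can be checked with a few lines of C. This is a sketch under the assumption that the classic two-level (non-LPAE) arm32 layout is in use, where the Linux PTE tables occupy the lower 2 KiB of a page and the hardware entries sit 2048 bytes above them, and that pages are 4 KiB. The helper names are invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096u   /* assumed 4 KiB pages */
#define HW_PTE_OFFSET   2048u   /* hardware entry sits this far above the Linux entry */

/* Offset of an address within its page. */
static uint32_t page_offset(uint32_t addr)
{
	return addr & (PAGE_SIZE_BYTES - 1);
}

/* A Linux PTE pointer should be word aligned and in the lower half of a page. */
static bool plausible_linux_pte_ptr(uint32_t ptep)
{
	return (ptep & 3u) == 0 && page_offset(ptep) < HW_PTE_OFFSET;
}

/* Where the matching hardware entry would be written. */
static uint32_t hw_pte_address(uint32_t ptep)
{
	return ptep + HW_PTE_OFFSET;
}

/* Apply the checks to r0 from the register dump in the report. */
void check_r0_from_oops(void)
{
	const uint32_t r0 = 0xf000de2cu;

	assert(page_offset(r0) == 0xe2cu);          /* 0xe2c >= 0x800: upper 2 KiB */
	assert(!plausible_linux_pte_ptr(r0));
	assert(hw_pte_address(r0) == 0xf000e62cu);  /* spills into the next page */
}
```

Because r0 is already in the upper half of its page, adding 2048 crosses into the following page, which is the faulting address f000e62c reported at the top of the oops.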

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

2022-04-22 22:38:08

by Naresh Kamboju

Subject: Re: [next] arm: boot failed - PC is at cpu_ca15_set_pte_ext

Hi Max,

On Wed, 20 Apr 2022 at 13:20, Max Krummenacher <[email protected]> wrote:
>
> On Wednesday, 20.04.2022 at 09:31 +0200, Ard Biesheuvel wrote:
> > On Tue, 19 Apr 2022 at 12:59, Naresh Kamboju <[email protected]> wrote:
> > > Linux next 20220419 boot failed on arm architecture qemu_arm and BeagleBoard
> > > x15 device.
> > >
> > > kernel crash log from x15:
> > > -----------------
> > > [ 6.866516] 8<--- cut here ---
> > > [ 6.869598] Unable to handle kernel paging request at virtual
> > > address f000e62c

<trim>

> > Unfortunately, the vmlinux.xz file I downloaded from the link below
> > seems to be different from the one that produced the crash, given that
> > the LR address of c04cfeb8 does not seem to correspond with
> > handle_mm_fault+0x60c/0xed0.
> >
> > Can you please double check the artifacts?
>
> Commit "mm: check against orig_pte for finish_fault()" introduced this,
> i.e. on yesterday's next, reverting a066bab3c0eb made an i.MX6 boot again.

Thanks for the pointers.
I have reverted the suggested commit and the boot passes now.

Revert "mm: check against orig_pte for finish_fault()"
This reverts commit a066bab3c0eb8f6155257f1345f07d1f6550bc4a.

> A fix is discussed here:
> https://lore.kernel.org/all/[email protected]/
>
> Max

- Naresh