
2023-10-26 14:42:05

Subject: qemu-arm64: handle_futex_death - kernel/futex/core.c:661 - Unable to handle kernel unknown 43 at virtual address

The following kernel crash was noticed on qemu-arm64 while running the LTP
syscalls set_robust_list test case on Linux next 6.6.0-rc7-next-20231026 and
6.6.0-rc7-next-20231025.

Bad: next-20231025
Good: next-20231024

Reported-by: Linux Kernel Functional Testing <[email protected]>
Reported-by: Naresh Kamboju <[email protected]>

Log:
----
<1>[ 203.119139] Unable to handle kernel unknown 43 at virtual address 0001ffff9e2e7d78
<1>[ 203.119838] Mem abort info:
<1>[ 203.120064] ESR = 0x000000009793002b
<1>[ 203.121040] EC = 0x25: DABT (current EL), IL = 32 bits
set_robust_list01 1 TPASS : set_robust_list: retval = -1 (expected -1), errno = 22 (expected 22)
set_robust_list01 2 TPASS : set_robust_list: retval = 0 (expected 0), errno = 0 (expected 0)
<1>[ 203.124496] SET = 0, FnV = 0
<1>[ 203.124778] EA = 0, S1PTW = 0
<1>[ 203.125029] FSC = 0x2b: unknown 43
<1>[ 203.126470] Data abort info:
<1>[ 203.126710] Access size = 4 byte(s)
<1>[ 203.126969] SSE = 0, SRT = 19
<1>[ 203.127708] SF = 0, AR = 0
<1>[ 203.128213] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
<1>[ 203.128788] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
<1>[ 203.130416] user pgtable: 4k pages, 52-bit VAs, pgdp=000000010606a780
<1>[ 203.130817] [0001ffff9e2e7d78] pgd=0000000000000000
<0>[ 203.132603] Internal error: Oops: 000000009793002b [#1] PREEMPT SMP
<4>[ 203.133483] Modules linked in: btrfs blake2b_generic libcrc32c xor xor_neon raid6_pq zstd_compress crct10dif_ce sm3_ce sm3 sha3_ce sha512_ce sha512_arm64 fuse drm backlight dm_mod ip_tables x_tables
<4>[ 203.135177] CPU: 1 PID: 653 Comm: set_robust_list Not tainted 6.6.0-rc7-next-20231026 #1
<4>[ 203.135642] Hardware name: linux,dummy-virt (DT)
<4>[ 203.136609] pstate: 83400009 (Nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
<4>[ 203.137028] pc : handle_futex_death (kernel/futex/core.c:661 (discriminator 6))
<4>[ 203.138844] lr : handle_futex_death (arch/arm64/include/asm/uaccess.h:46 (discriminator 1) kernel/futex/core.c:661 (discriminator 1))
<4>[ 203.139132] sp : ffff8000805c3c10
<4>[ 203.139356] x29: ffff8000805c3c10 x28: 0000ffffbf187740 x27: d53bd04035000220
<4>[ 203.140366] x26: 0000000000000000 x25: fff00000c6195280 x24: fff00000c6195280
<4>[ 203.141055] x23: 0000000000000001 x22: ffffa4e6aeef09d0 x21: 0001ffff9e2e7d78
<4>[ 203.141771] x20: 0001ffff9e2e7d78 x19: 0001ffff9e2e7d78 x18: ffff8000805c3cf8
<4>[ 203.142457] x17: 0000000000000000 x16: ffffa4e6aeae7078 x15: 000000000000000a
<4>[ 203.143134] x14: 0000000000000000 x13: 1ffe000018258661 x12: ffff8000805c3cf8
<4>[ 203.143809] x11: 0000000000000000 x10: fff00000c12c3308 x9 : ffffa4e6ad0e5748
<4>[ 203.144504] x8 : ffff8000805c3c38 x7 : 0000000000000000 x6 : 0000000000000001
<4>[ 203.145186] x5 : 0000000000000000 x4 : fff00000c6195280 x3 : 0000000000000000
<4>[ 203.145929] x2 : 0000000000000000 x1 : 000ffffffffffffc x0 : 0001ffff9e2e7d78
<4>[ 203.147032] Call trace:
<4>[ 203.147254] handle_futex_death (kernel/futex/core.c:661 (discriminator 6))
<4>[ 203.147560] exit_robust_list (kernel/futex/core.c:828)
<4>[ 203.148348] futex_exit_release (kernel/futex/core.c:1035 (discriminator 1) kernel/futex/core.c:1131 (discriminator 1))
<4>[ 203.148891] exit_mm_release (kernel/fork.c:1657)
<4>[ 203.149669] do_exit (kernel/exit.c:541 kernel/exit.c:858)
<4>[ 203.149897] do_group_exit (kernel/exit.c:1002)
<4>[ 203.150209] __arm64_sys_exit_group (kernel/exit.c:1032)
<4>[ 203.150980] invoke_syscall (arch/arm64/include/asm/current.h:19 arch/arm64/kernel/syscall.c:56)
<4>[ 203.151234] el0_svc_common.constprop.0 (include/linux/thread_info.h:127 (discriminator 2) arch/arm64/kernel/syscall.c:144 (discriminator 2))
<4>[ 203.151999] do_el0_svc (arch/arm64/kernel/syscall.c:156)
<4>[ 203.152231] el0_svc (arch/arm64/include/asm/daifflags.h:28 arch/arm64/kernel/entry-common.c:133 arch/arm64/kernel/entry-common.c:144 arch/arm64/kernel/entry-common.c:679)
<4>[ 203.152936] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:697)
<4>[ 203.153518] el0t_64_sync (arch/arm64/kernel/entry.S:595)
<0>[ 203.154424] Code: d50323bf d65f03c0 9248fa93 52800002 (b8400a73)
All code
========
0: d50323bf autiasp
4: d65f03c0 ret
8: 9248fa93 and x19, x20, #0xff7fffffffffffff
c: 52800002 mov w2, #0x0 // #0
10:* b8400a73 ldtr w19, [x19] <-- trapping instruction

Code starting with the faulting instruction
===========================================
0: b8400a73 ldtr w19, [x19]
<4>[ 203.155308] ---[ end trace 0000000000000000 ]---
<1>[ 203.156234] Fixing recursive fault but reboot is needed!
<3>[ 203.157116] BUG: using smp_processor_id() in preemptible [00000000] code: set_robust_list/653
<4>[ 203.158116] caller is debug_smp_processor_id (lib/smp_processor_id.c:61)
<4>[ 203.158983] CPU: 1 PID: 653 Comm: set_robust_list Tainted: G D 6.6.0-rc7-next-20231026 #1
<4>[ 203.159451] Hardware name: linux,dummy-virt (DT)
<4>[ 203.159990] Call trace:
<4>[ 203.160394] dump_backtrace (arch/arm64/kernel/stacktrace.c:235)
<4>[ 203.160625] show_stack (arch/arm64/kernel/stacktrace.c:242)
<4>[ 203.160854] dump_stack_lvl (lib/dump_stack.c:107)
<4>[ 203.161869] dump_stack (lib/dump_stack.c:114)
<4>[ 203.162093] check_preemption_disabled (arch/arm64/include/asm/current.h:19 arch/arm64/include/asm/preempt.h:54 lib/smp_processor_id.c:53)
<4>[ 203.162898] debug_smp_processor_id (lib/smp_processor_id.c:61)
<4>[ 203.163176] __schedule (kernel/sched/core.c:6578 (discriminator 1))
<4>[ 203.163894] do_task_dead (kernel/sched/core.c:6705)
<4>[ 203.164143] make_task_dead (arch/arm64/include/asm/atomic_ll_sc.h:95 (discriminator 3) arch/arm64/include/asm/atomic.h:49 (discriminator 3) include/linux/atomic/atomic-arch-fallback.h:747 (discriminator 3) include/linux/atomic/atomic-instrumented.h:253 (discriminator 3) include/linux/refcount.h:193 (discriminator 3) include/linux/refcount.h:250 (discriminator 3) include/linux/refcount.h:267 (discriminator 3) kernel/exit.c:979 (discriminator 3))
<4>[ 203.164871] die (arch/arm64/kernel/traps.c:239)
<4>[ 203.165093] die_kernel_fault (arch/arm64/mm/fault.c:321)
<4>[ 203.165905] do_mem_abort (arch/arm64/mm/fault.c:850)
<4>[ 203.166149] el1_abort (arch/arm64/include/asm/daifflags.h:28 arch/arm64/kernel/entry-common.c:399)
<4>[ 203.166864] el1h_64_sync_handler (arch/arm64/kernel/entry-common.c:486)
<4>[ 203.167173] el1h_64_sync (arch/arm64/kernel/entry.S:590)
<4>[ 203.167824] handle_futex_death (kernel/futex/core.c:661 (discriminator 6))
<4>[ 203.168329] exit_robust_list (kernel/futex/core.c:828)
<4>[ 203.168829] futex_exit_release (kernel/futex/core.c:1035 (discriminator 1) kernel/futex/core.c:1131 (discriminator 1))
<4>[ 203.169375] exit_mm_release (kernel/fork.c:1657)
<4>[ 203.169884] do_exit (kernel/exit.c:541 kernel/exit.c:858)
<4>[ 203.170372] do_group_exit (kernel/exit.c:1002)
<4>[ 203.170857] __arm64_sys_exit_group (kernel/exit.c:1032)
<4>[ 203.171643] invoke_syscall (arch/arm64/include/asm/current.h:19 arch/arm64/kernel/syscall.c:56)
<4>[ 203.172281] el0_svc_common.constprop.0 (include/linux/thread_info.h:127 (discriminator 2) arch/arm64/kernel/syscall.c:144 (discriminator 2))
<4>[ 203.172815] do_el0_svc (arch/arm64/kernel/syscall.c:156)
<4>[ 203.173284] el0_svc (arch/arm64/include/asm/daifflags.h:28 arch/arm64/kernel/entry-common.c:133 arch/arm64/kernel/entry-common.c:144 arch/arm64/kernel/entry-common.c:679)
<4>[ 203.173769] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:697)
<4>[ 203.174052] el0t_64_sync (arch/arm64/kernel/entry.S:595)

Links:
- https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20231026/testrun/20823098/suite/log-parser-test/test/check-kernel-bug/log
- https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20231026/testrun/20823098/suite/log-parser-test/tests/
- https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20231026/testrun/20823050/suite/log-parser-test/tests/

--
Linaro LKFT
https://lkft.linaro.org

2023-10-26 15:33:03

by Mark Rutland

Subject: Re: qemu-arm64: handle_futex_death - kernel/futex/core.c:661 - Unable to handle kernel unknown 43 at virtual address

On Thu, Oct 26, 2023 at 08:11:26PM +0530, Naresh Kamboju wrote:
> Following kernel crash noticed on qemu-arm64 while running LTP syscalls
> set_robust_list test case running Linux next 6.6.0-rc7-next-20231026 and
> 6.6.0-rc7-next-20231025.
>
> BAD: next-20231025
> Good: next-20231024
>
> Reported-by: Linux Kernel Functional Testing <[email protected]>
> Reported-by: Naresh Kamboju <[email protected]>
>
> Log:
> ----
> <1>[ 203.119139] Unable to handle kernel unknown 43 at virtual address 0001ffff9e2e7d78
> <1>[ 203.119838] Mem abort info:
> <1>[ 203.120064] ESR = 0x000000009793002b
> <1>[ 203.121040] EC = 0x25: DABT (current EL), IL = 32 bits
> set_robust_list01 1 TPASS : set_robust_list: retval = -1 (expected -1), errno = 22 (expected 22)
> set_robust_list01 2 TPASS : set_robust_list: retval = 0 (expected 0), errno = 0 (expected 0)
> <1>[ 203.124496] SET = 0, FnV = 0
> <1>[ 203.124778] EA = 0, S1PTW = 0
> <1>[ 203.125029] FSC = 0x2b: unknown 43

It looks like this is fallout from the LPA2 enablement.

According to the latest ARM ARM (ARM DDI 0487J.a), page D19-6475, that "unknown
43" (0x2b / 0b101011) is the DFSC for a level -1 translation fault:

0b101011 When FEAT_LPA2 is implemented:
Translation fault, level -1.

It's triggered here by an LDTR in a get_user() on a bogus userspace address.
The exception is expected, and it's supposed to be handled via the exception
fixups, but the LPA2 patches didn't update the fault_info table entries for all
the level -1 faults, and so those all get handled by do_bad() and don't call
fixup_exception(), causing them to be fatal.

It should be relatively simple to update the fault_info table for the level -1
faults, but given the other issues we're seeing I think it's probably worth
dropping the LPA2 patches for the moment.

Mark.

2023-10-26 15:39:43

by Ard Biesheuvel

Subject: Re: qemu-arm64: handle_futex_death - kernel/futex/core.c:661 - Unable to handle kernel unknown 43 at virtual address

On Thu, 26 Oct 2023 at 17:30, Mark Rutland <[email protected]> wrote:
>
> It looks like this is fallout from the LPA2 enablement.
>
> According to the latest ARM ARM (ARM DDI 0487J.a), page D19-6475, that "unknown
> 43" (0x2b / 0b101011) is the DFSC for a level -1 translation fault:
>
> 0b101011 When FEAT_LPA2 is implemented:
> Translation fault, level -1.
>
> It's triggered here by an LDTR in a get_user() on a bogus userspace address.
> The exception is expected, and it's supposed to be handled via the exception
> fixups, but the LPA2 patches didn't update the fault_info table entries for all
> the level -1 faults, and so those all get handled by do_bad() and don't call
> fixup_exception(), causing them to be fatal.
>
> It should be relatively simple to update the fault_info table for the level -1
> faults, but given the other issues we're seeing I think it's probably worth
> dropping the LPA2 patches for the moment.
>

Thanks for the analysis, Mark.

I agree that this should not be difficult to fix, but given the other
CI problems and identified loose ends, I am not going to object to
dropping this partially or entirely at this point. I'm sure everybody
will be thrilled to go over those 60 patches again after I rebase them
onto v6.7-rc1 :-)

2023-10-27 10:58:12

by Naresh Kamboju

Subject: Re: qemu-arm64: handle_futex_death - kernel/futex/core.c:661 - Unable to handle kernel unknown 43 at virtual address

On Thu, 26 Oct 2023 at 21:09, Ard Biesheuvel <[email protected]> wrote:
>
> Thanks for the analysis Mark.
>
> I agree that this should not be difficult to fix, but given the other
> CI problems and identified loose ends, I am not going to object to
> dropping this partially or entirely at this point. I'm sure everybody
> will be thrilled to go over those 60 patches again after I rebase them
> onto v6.7-rc1 :-)

I am happy to test any proposed fix patch.

- Naresh

2023-10-28 07:43:14

by Ard Biesheuvel

Subject: Re: qemu-arm64: handle_futex_death - kernel/futex/core.c:661 - Unable to handle kernel unknown 43 at virtual address

On Fri, 27 Oct 2023 at 12:57, Naresh Kamboju <[email protected]> wrote:
>
> I am happy to test any proposed fix patch.
>

Thanks Naresh. Patch attached.

Attachments:

0001-Add-missing-ESR-decoding-for-level-1-translation-fau.patch (2.60 kB)

2023-10-30 08:08:22

by Naresh Kamboju

Subject: Re: qemu-arm64: handle_futex_death - kernel/futex/core.c:661 - Unable to handle kernel unknown 43 at virtual address

On Sat, 28 Oct 2023 at 13:12, Ard Biesheuvel <[email protected]> wrote:
>
> Thanks Naresh. Patch attached.

This patch did not solve the reported problem.
Test log link:
- https://tuxapi.tuxsuite.com/v1/groups/linaro/projects/naresh/tests/2XTP1lXcUUscT357YaAm2G1AhpS

- Naresh

2023-10-30 08:16:21

by Ard Biesheuvel

Subject: Re: qemu-arm64: handle_futex_death - kernel/futex/core.c:661 - Unable to handle kernel unknown 43 at virtual address

On Mon, 30 Oct 2023 at 09:07, Naresh Kamboju <[email protected]> wrote:
>
> This patch did not solve the reported problem.
> Test log links,
> - https://tuxapi.tuxsuite.com/v1/groups/linaro/projects/naresh/tests/2XTP1lXcUUscT357YaAm2G1AhpS
>

Oops, sorry about that.

Fixed patch attached.

Attachments:

v2-0001-Add-missing-ESR-decoding-for-level-1-translation-.patch (3.52 kB)

2023-10-30 11:51:32

by Naresh Kamboju

Subject: Re: qemu-arm64: handle_futex_death - kernel/futex/core.c:661 - Unable to handle kernel unknown 43 at virtual address

On Mon, 30 Oct 2023 at 13:45, Ard Biesheuvel <[email protected]> wrote:
>
> Oops, sorry about that.
>
> Fixed patch attached.

Tested-by: Linux Kernel Functional Testing <[email protected]>

- Naresh

Attachments:

v2-0001-Add-missing-ESR-decoding-for-level-1-translation-.patch (3.52 kB)

2023-10-31 07:43:51

by Naresh Kamboju

Subject: Re: qemu-arm64: handle_futex_death - kernel/futex/core.c:661 - Unable to handle kernel unknown 43 at virtual address

Hi Ard,

Your V2 patch works perfectly.
Thanks for providing a fix patch.

- Naresh

On Mon, 30 Oct 2023 at 17:20, Naresh Kamboju <[email protected]> wrote:
>
> On Mon, 30 Oct 2023 at 13:45, Ard Biesheuvel <[email protected]> wrote:
> >
> > On Mon, 30 Oct 2023 at 09:07, Naresh Kamboju <[email protected]> wrote:
> > >
> > > On Sat, 28 Oct 2023 at 13:12, Ard Biesheuvel <[email protected]> wrote:
> > > >
> > > > On Fri, 27 Oct 2023 at 12:57, Naresh Kamboju <[email protected]> wrote:
> > > > >
> > > > > On Thu, 26 Oct 2023 at 21:09, Ard Biesheuvel <[email protected]> wrote:
> > > > > >
> > > > > > On Thu, 26 Oct 2023 at 17:30, Mark Rutland <[email protected]> wrote:
> > > > > > >
> > > > > > > On Thu, Oct 26, 2023 at 08:11:26PM +0530, Naresh Kamboju wrote:
> > > > > > > > Following kernel crash noticed on qemu-arm64 while running LTP syscalls
> > > > > > > > set_robust_list test case running Linux next 6.6.0-rc7-next-20231026 ...
> > > > > > > It looks like this is fallout from the LPA2 enablement.
> > > > > > >
> > > > > > > According to the latest ARM ARM (ARM DDI 0487J.a), page D19-6475, that "unknown
> > > > > > > 43" (0x2b / 0b101011) is the DFSC for a level -1 translation fault:
> > > > > > >
> > > > > > > 0b101011 When FEAT_LPA2 is implemented:
> > > > > > > Translation fault, level -1.
> > > > > > >
> > > > > > > It's triggered here by an LDTR in a get_user() on a bogus userspace address.
> > > > > > > The exception is expected, and it's supposed to be handled via the exception
> > > > > > > fixups, but the LPA2 patches didn't update the fault_info table entries for all
> > > > > > > the level -1 faults, and so those all get handled by do_bad() and don't call
> > > > > > > fixup_exception(), causing them to be fatal.
> > > > > > >
> > > > > > > It should be relatively simple to update the fault_info table for the level -1
> > > > > > > faults, but given the other issues we're seeing I think it's probably worth
> > > > > > > dropping the LPA2 patches for the moment.
> > > > > > >
> > > > > >
> > > > > > Thanks for the analysis Mark.
> > > > > >
> > > > > > I agree that this should not be difficult to fix, but given the other
> > > > > > CI problems and identified loose ends, I am not going to object to
> > > > > > dropping this partially or entirely at this point. I'm sure everybody
> > > > > > will be thrilled to go over those 60 patches again after I rebase them
> > > > > > onto v6.7-rc1 :-)
> > > > >
> > > > > I am happy to test any proposed fix patch.
> > > > >
> > > >
> > > > Thanks Naresh. Patch attached.
> > >
> > > This patch did not solve the reported problem.
> > > Test log links,
> > > - https://tuxapi.tuxsuite.com/v1/groups/linaro/projects/naresh/tests/2XTP1lXcUUscT357YaAm2G1AhpS
> > >
> >
> > Oops, sorry about that.
> >
> > Fixed patch attached.
>
> Tested-by: Linux Kernel Functional Testing <[email protected]>
>
> - Naresh

Attachments:

v2-0001-Add-missing-ESR-decoding-for-level-1-translation-.patch (3.52 kB)

2023-10-31 16:28:03

by Mark Rutland

Subject: Re: qemu-arm64: handle_futex_death - kernel/futex/core.c:661 - Unable to handle kernel unknown 43 at virtual address

On Mon, Oct 30, 2023 at 09:14:56AM +0100, Ard Biesheuvel wrote:
> From 97dea432bceadfcece84484609374c277afc2c81 Mon Sep 17 00:00:00 2001
> From: Ard Biesheuvel <[email protected]>
> Date: Sat, 28 Oct 2023 09:40:29 +0200
> Subject: [PATCH v2] Add missing ESR decoding for level -1 translation faults
>
> Signed-off-by: Ard Biesheuvel <[email protected]>

As a heads-up, looking at this some more, we'll also need to rework the usage
of ESR_ELx_FSC_TYPE and ESR_ELx_FSC_LEVEL, since those no longer work correctly
for the level -1 xFSC values. ESR_ELx_FSC_TYPE is 0x3c and ESR_ELx_FSC_LEVEL is
0x3, and they work on the basis that the xFSC fault types are encoded as xxxxyy,
where xxxx is the type and yy is the level (0 to 3).

That doesn't extend naturally to level -1. For example, level {0,1,2,3}
translation faults are reported as 0b0001xx, where xx encodes the level, while
level -1 translation faults are reported as 0b101011. Masking 0b101011 with
ESR_ELx_FSC_TYPE gives 0b101000 rather than the translation fault type, and
the level bits read back as 3.

That ends up affecting:

* All the is_${FOO}_fault() predicate functions, e.g. is_translation_fault(),
is_el1_permission_fault() and is_spurious_el1_translation_fault().

* Places where we synthesize an xFSC value, e.g. set_thread_esr().

* A bunch of KVM code, due to the use of kvm_vcpu_trap_get_fault_type().

... and we probably need to remove ESR_ELx_FSC_TYPE and ESR_ELx_FSC_LEVEL
entirely to avoid the possibility of misuse.

Mark.
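To make the masking problem concrete, here is a minimal C sketch that uses the mask values quoted in the mail (ESR_ELx_FSC_TYPE = 0x3c, ESR_ELx_FSC_LEVEL = 0x3, translation faults encoded as 0b0001xx). The helpers is_translation_fault_old(), fault_level_old(), is_translation_fault_new() and translation_fault_level_new(), along with the constant FSC_TRANS_FAULT_LM1, are hypothetical stand-ins. They are neither the kernel's actual code nor a proposed fix. They only show how a type/level split misreads 0b101011, and one possible way to match on the full xFSC value instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask values as quoted in the mail above. */
#define ESR_ELx_FSC        0x3fu /* full 6-bit xFSC field */
#define ESR_ELx_FSC_TYPE   0x3cu /* "xxxx" type bits of the xxxxyy scheme */
#define ESR_ELx_FSC_LEVEL  0x03u /* "yy" level bits of the xxxxyy scheme */
#define ESR_ELx_FSC_FAULT  0x04u /* translation fault type, 0b0001xx */

/* Hypothetical: xFSC for a level -1 translation fault (FEAT_LPA2). */
#define FSC_TRANS_FAULT_LM1 0x2bu

/* Old-style predicate: only recognises levels 0..3 (0x04..0x07). */
static bool is_translation_fault_old(uint64_t esr)
{
	return (esr & ESR_ELx_FSC_TYPE) == ESR_ELx_FSC_FAULT;
}

/* Old-style level extraction: has no way to express level -1. */
static int fault_level_old(uint64_t esr)
{
	return (int)(esr & ESR_ELx_FSC_LEVEL);
}

/* Hypothetical predicate that compares the full xFSC value. */
static bool is_translation_fault_new(uint64_t esr)
{
	uint64_t fsc = esr & ESR_ELx_FSC;

	return (fsc >= 0x04 && fsc <= 0x07) || fsc == FSC_TRANS_FAULT_LM1;
}

/* Hypothetical level lookup; only meaningful for translation faults. */
static int translation_fault_level_new(uint64_t esr)
{
	uint64_t fsc = esr & ESR_ELx_FSC;

	if (fsc == FSC_TRANS_FAULT_LM1)
		return -1;
	return (int)(fsc & ESR_ELx_FSC_LEVEL);
}
```

Given the ESR from the report (0x9793002b), the old-style predicate does not classify it as a translation fault and reports level 3, while the full-value match recognises it as a level -1 translation fault.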

> ---
> arch/arm64/mm/fault.c | 12 ++++++------
> 1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> index 2e5d1e238af9..13f192691060 100644
> --- a/arch/arm64/mm/fault.c
> +++ b/arch/arm64/mm/fault.c
> @@ -780,18 +780,18 @@ static const struct fault_info fault_info[] = {
> { do_translation_fault, SIGSEGV, SEGV_MAPERR, "level 1 translation fault" },
> { do_translation_fault, SIGSEGV, SEGV_MAPERR, "level 2 translation fault" },
> { do_translation_fault, SIGSEGV, SEGV_MAPERR, "level 3 translation fault" },
> - { do_bad, SIGKILL, SI_KERNEL, "unknown 8" },
> + { do_page_fault, SIGSEGV, SEGV_ACCERR, "level 0 access flag fault" },
> { do_page_fault, SIGSEGV, SEGV_ACCERR, "level 1 access flag fault" },
> { do_page_fault, SIGSEGV, SEGV_ACCERR, "level 2 access flag fault" },
> { do_page_fault, SIGSEGV, SEGV_ACCERR, "level 3 access flag fault" },
> - { do_bad, SIGKILL, SI_KERNEL, "unknown 12" },
> + { do_page_fault, SIGSEGV, SEGV_ACCERR, "level 0 permission fault" },
> { do_page_fault, SIGSEGV, SEGV_ACCERR, "level 1 permission fault" },
> { do_page_fault, SIGSEGV, SEGV_ACCERR, "level 2 permission fault" },
> { do_page_fault, SIGSEGV, SEGV_ACCERR, "level 3 permission fault" },
> { do_sea, SIGBUS, BUS_OBJERR, "synchronous external abort" },
> { do_tag_check_fault, SIGSEGV, SEGV_MTESERR, "synchronous tag check fault" },
> { do_bad, SIGKILL, SI_KERNEL, "unknown 18" },
> - { do_bad, SIGKILL, SI_KERNEL, "unknown 19" },
> + { do_sea, SIGKILL, SI_KERNEL, "level -1 (translation table walk)" },
> { do_sea, SIGKILL, SI_KERNEL, "level 0 (translation table walk)" },
> { do_sea, SIGKILL, SI_KERNEL, "level 1 (translation table walk)" },
> { do_sea, SIGKILL, SI_KERNEL, "level 2 (translation table walk)" },
> @@ -799,7 +799,7 @@ static const struct fault_info fault_info[] = {
> { do_sea, SIGBUS, BUS_OBJERR, "synchronous parity or ECC error" }, // Reserved when RAS is implemented
> { do_bad, SIGKILL, SI_KERNEL, "unknown 25" },
> { do_bad, SIGKILL, SI_KERNEL, "unknown 26" },
> - { do_bad, SIGKILL, SI_KERNEL, "unknown 27" },
> + { do_sea, SIGKILL, SI_KERNEL, "level -1 synchronous parity error (translation table walk)" }, // Reserved when RAS is implemented
> { do_sea, SIGKILL, SI_KERNEL, "level 0 synchronous parity error (translation table walk)" }, // Reserved when RAS is implemented
> { do_sea, SIGKILL, SI_KERNEL, "level 1 synchronous parity error (translation table walk)" }, // Reserved when RAS is implemented
> { do_sea, SIGKILL, SI_KERNEL, "level 2 synchronous parity error (translation table walk)" }, // Reserved when RAS is implemented
> @@ -813,9 +813,9 @@ static const struct fault_info fault_info[] = {
> { do_bad, SIGKILL, SI_KERNEL, "unknown 38" },
> { do_bad, SIGKILL, SI_KERNEL, "unknown 39" },
> { do_bad, SIGKILL, SI_KERNEL, "unknown 40" },
> - { do_bad, SIGKILL, SI_KERNEL, "unknown 41" },
> + { do_bad, SIGKILL, SI_KERNEL, "level -1 address size fault" },
> { do_bad, SIGKILL, SI_KERNEL, "unknown 42" },
> - { do_bad, SIGKILL, SI_KERNEL, "unknown 43" },
> + { do_translation_fault, SIGSEGV, SEGV_MAPERR, "level -1 translation fault" },
> { do_bad, SIGKILL, SI_KERNEL, "unknown 44" },
> { do_bad, SIGKILL, SI_KERNEL, "unknown 45" },
> { do_bad, SIGKILL, SI_KERNEL, "unknown 46" },
> --
> 2.42.0.820.g83a721a137-goog
>
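The patch works because the fault_info table appears to be indexed directly by the 6-bit xFSC field. The placeholder names in the diff ("unknown 8", "unknown 12", ..., "unknown 43") match their positions, which supports this. The simplified C model below rests on two assumptions: that indexing scheme, and that the faulting get_user() has an exception-table fixup. Under those assumptions it shows why filling slot 43 turns the oops into a recoverable fault. struct fault_info here is a stripped-down stand-in. model_do_bad(), model_do_translation_fault() and kernel_abort_is_fatal() are hypothetical names, not the kernel's do_bad(), do_translation_fault() or abort dispatch code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define ESR_ELx_FSC 0x3fu /* 6-bit xFSC field used as the table index */

/* Stripped-down stand-in for the kernel's struct fault_info. */
struct fault_info {
	int (*fn)(uint64_t esr); /* returns 0 if the fault was handled */
	const char *name;
};

/* Model of do_bad(): never handles the fault, so a kernel abort is fatal. */
static int model_do_bad(uint64_t esr)
{
	(void)esr;
	return 1;
}

/* Model of the translation-fault path, assuming the faulting uaccess
 * (e.g. the LDTR in get_user()) has a fixup, so the fault is recovered. */
static int model_do_translation_fault(uint64_t esr)
{
	(void)esr;
	return 0;
}

/* Tables indexed by xFSC; only a few slots are filled in for the model. */
static const struct fault_info before_patch[64] = {
	[0x05] = { model_do_translation_fault, "level 1 translation fault" },
	[0x2b] = { model_do_bad, "unknown 43" },
};

static const struct fault_info after_patch[64] = {
	[0x05] = { model_do_translation_fault, "level 1 translation fault" },
	[0x2b] = { model_do_translation_fault, "level -1 translation fault" },
};

/* Look up the entry for a kernel-mode abort and report whether it oopses. */
static int kernel_abort_is_fatal(const struct fault_info *table, uint64_t esr)
{
	const struct fault_info *inf = &table[esr & ESR_ELx_FSC];

	if (inf->fn == NULL || inf->fn(esr) != 0) {
		printf("Unable to handle kernel %s\n",
		       inf->name ? inf->name : "(no entry)");
		return 1;
	}
	return 0;
}
```

Under these assumptions, the report's ESR (xFSC 0x2b) lands in the do_bad()-style slot and is fatal with the pre-patch table, but is handled with the post-patch table.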

2023-10-31 16:35:48

by Mark Rutland

Subject: Re: qemu-arm64: handle_futex_death - kernel/futex/core.c:661 - Unable to handle kernel unknown 43 at virtual address

On Thu, Oct 26, 2023 at 05:39:11PM +0200, Ard Biesheuvel wrote:
> On Thu, 26 Oct 2023 at 17:30, Mark Rutland <[email protected]> wrote:
> >
> > On Thu, Oct 26, 2023 at 08:11:26PM +0530, Naresh Kamboju wrote:
> > > Following kernel crash noticed on qemu-arm64 while running LTP syscalls
> > > set_robust_list test case running Linux next 6.6.0-rc7-next-20231026 and
> > > 6.6.0-rc7-next-20231025.
> > >
> > > BAD: next-20231025
> > > Good: next-20231024
> > >
> > > Reported-by: Linux Kernel Functional Testing <[email protected]>
> > > Reported-by: Naresh Kamboju <[email protected]>
> > >
> > > Log:
> > > ----
> > > <1>[ 203.119139] Unable to handle kernel unknown 43 at virtual
> > > address 0001ffff9e2e7d78
> > > <1>[ 203.119838] Mem abort info:
> > > <1>[ 203.120064] ESR = 0x000000009793002b
> > > <1>[ 203.121040] EC = 0x25: DABT (current EL), IL = 32 bits
> > > set_robust_list01 1 TPASS : set_robust_list: retval = -1
> > > (expected -1), errno = 22 (expected 22)
> > > set_robust_list01 2 TPASS : set_robust_list: retval = 0
> > > (expected 0), errno = 0 (expected 0)
> > > <1>[ 203.124496] SET = 0, FnV = 0
> > > <1>[ 203.124778] EA = 0, S1PTW = 0
> > > <1>[ 203.125029] FSC = 0x2b: unknown 43
> >
> > It looks like this is fallout from the LPA2 enablement.
> >
> > According to the latest ARM ARM (ARM DDI 0487J.a), page D19-6475, that "unknown
> > 43" (0x2b / 0b101011) is the DFSC for a level -1 translation fault:
> >
> > 0b101011 When FEAT_LPA2 is implemented:
> > Translation fault, level -1.
> >
> > It's triggered here by an LDTR in a get_user() on a bogus userspace address.
> > The exception is expected, and it's supposed to be handled via the exception
> > fixups, but the LPA2 patches didn't update the fault_info table entries for all
> > the level -1 faults, and so those all get handled by do_bad() and don't call
> > fixup_exception(), causing them to be fatal.
> >
> > It should be relatively simple to update the fault_info table for the level -1
> > faults, but given the other issues we're seeing I think it's probably worth
> > dropping the LPA2 patches for the moment.
> >
>
> Thanks for the analysis Mark.
>
> I agree that this should not be difficult to fix, but given the other
> CI problems and identified loose ends, I am not going to object to
> dropping this partially or entirely at this point. I'm sure everybody
> will be thrilled to go over those 60 patches again after I rebase them
> onto v6.7-rc1 :-)

FWIW, I'm more than happy to try; the issue has largely been finding the time.
Hopefully that'll be a bit easier after LPC!

Mark.
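For reference, the "Mem abort info" and "Data abort info" lines in the original report can be reproduced by hand from ESR = 0x000000009793002b. The sketch below assumes the ESR_ELx data abort field layout from the Arm ARM: EC in bits [31:26], IL in bit 25, ISV in bit 24, SAS in [23:22], SRT in [20:16], WnR in bit 6, and DFSC in [5:0]. The esr_*() helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Field extractors for a data abort ESR_ELx value (layout assumed above). */
static unsigned esr_ec(uint64_t esr)  { return (esr >> 26) & 0x3f; } /* exception class */
static unsigned esr_il(uint64_t esr)  { return (esr >> 25) & 0x1; }  /* 1 = 32-bit instruction */
static unsigned esr_isv(uint64_t esr) { return (esr >> 24) & 0x1; }  /* ISS fields below are valid */
static unsigned esr_sas(uint64_t esr) { return (esr >> 22) & 0x3; }  /* access size as log2(bytes) */
static unsigned esr_srt(uint64_t esr) { return (esr >> 16) & 0x1f; } /* transfer register number */
static unsigned esr_wnr(uint64_t esr) { return (esr >> 6) & 0x1; }   /* 1 = write, 0 = read */
static unsigned esr_fsc(uint64_t esr) { return esr & 0x3f; }         /* data fault status code */
```

Applied to the report's value, this gives EC 0x25 (data abort taken at the current EL), a 32-bit instruction, a 4-byte read into register 19, and DFSC 0x2b. That matches the decoded log lines and the level -1 translation fault encoding discussed above.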