The following kernel warning was noticed while running the kselftest
arm64 sve-ptrace test on qemu-arm64, hosted on an Ampere Altra server.
Reported-by: Linux Kernel Functional Testing <[email protected]>
/usr/bin/qemu-system-aarch64 -cpu max,pauth-impdef=on \
-machine virt-2.10 \
-nographic \
-net nic,model=virtio,macaddr=BA:DD:AD:FC:09:12 \
-net tap -m 4096 -monitor none \
-kernel Image.gz --append "console=ttyAMA0 root=/dev/vda rw" \
-hda lkft-kselftest-image-juno-20221114150409.rootfs.ext4 \
-smp 4 -nographic
Boot log:
---------
[ 0.000000] Linux version 6.0.9-rc1 (tuxmake@tuxmake) (aarch64-linux-gnu-gcc (Debian 11.3.0-6) 11.3.0, GNU ld (GNU Binutils for Debian) 2.39) #1 SMP PREEMPT @1668438377
[ 0.000000] random: crng init done
[ 0.000000] Machine model: linux,dummy-virt
# selftests: arm64: sve-ptrace
# ok 680 # SKIP SVE set FPSIMD get SVE for VL 2704
# ok 681 Set SVE VL 2720
[ 422.607034] ------------[ cut here ]------------
[ 422.615382] WARNING: CPU: 0 PID: 1111 at arch/arm64/kernel/fpsimd.c:464 fpsimd_save+0x170/0x1b0
[ 422.617588] Modules linked in: cfg80211 bluetooth rfkill crct10dif_ce sm3_ce sm3 sha3_ce sha512_ce sha512_arm64 fuse drm
[ 422.619758] CPU: 0 PID: 1111 Comm: sve-ptrace Not tainted 6.0.9-rc1 #1
[ 422.620402] Hardware name: linux,dummy-virt (DT)
[ 422.620958] pstate: 804000c5 (Nzcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 422.621614] pc : fpsimd_save+0x170/0x1b0
[ 422.621988] lr : fpsimd_save+0xd8/0x1b0
[ 422.622307] sp : ffff800008f3bb00
[ 422.622612] x29: ffff800008f3bb00 x28: ffffae14dd664bc0 x27: 0000000000000001
[ 422.623519] x26: ffff0000ff773858 x25: 0000000000000000 x24: ffff0000c0994fa8
[ 422.624102] x23: 0000000000000001 x22: 0000000000000100 x21: ffff0000ff75f0b0
[ 422.624706] x20: ffff51ec22a8b000 x19: ffffae14dccd40b0 x18: 0000000000000000
[ 422.625292] x17: ffff51ec22a8b000 x16: 0000000000000000 x15: 0000000000000000
[ 422.626041] x14: 0000000000000003 x13: 0000000000000000 x12: 0000000000000002
[ 422.626647] x11: ffffae14ddbee840 x10: 0000000000000312 x9 : ffffae14da818210
[ 422.627326] x8 : ffff0000c09935c0 x7 : ffffae14de2b8d08 x6 : 0000000000000000
[ 422.627889] x5 : 000000c91075a4a8 x4 : 0000000000000000 x3 : 0000000000000001
[ 422.628487] x2 : ffff51ec22a8b000 x1 : 0000000000000204 x0 : 0000000000000010
[ 422.629203] Call trace:
[ 422.629579] fpsimd_save+0x170/0x1b0
[ 422.630014] fpsimd_thread_switch+0x2c/0xc4
[ 422.630431] __switch_to+0x20/0x160
[ 422.630745] __schedule+0x380/0xb90
[ 422.631038] preempt_schedule_irq+0x4c/0x130
[ 422.631386] el1_interrupt+0x4c/0x64
[ 422.631689] el1h_64_irq_handler+0x18/0x24
[ 422.632037] el1h_64_irq+0x64/0x68
[ 422.632335] do_page_fault+0x31c/0x4d0
[ 422.632660] do_translation_fault+0xd8/0x100
[ 422.632993] do_mem_abort+0x58/0xb0
[ 422.633311] el0_ia+0x8c/0x134
[ 422.633685] el0t_64_sync_handler+0x134/0x140
[ 422.634061] el0t_64_sync+0x18c/0x190
[ 422.634580] irq event stamp: 654
[ 422.634923] hardirqs last enabled at (653): [<ffffae14dbeafc94>] exit_to_kernel_mode+0x34/0x130
[ 422.635713] hardirqs last disabled at (654): [<ffffae14dbeb7700>] __schedule+0x3f0/0xb90
[ 422.636309] softirqs last enabled at (650): [<ffffae14da810be4>] __do_softirq+0x514/0x62c
[ 422.636877] softirqs last disabled at (637): [<ffffae14da8b4f58>] __irq_exit_rcu+0x164/0x19c
[ 422.637446] ---[ end trace 0000000000000000 ]---
Full test log:
https://lkft.validation.linaro.org/scheduler/job/5847349#L2206
https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-6.0.y/build/v6.0.8-191-gf8896c3ebbcf/testrun/13007451/suite/log-parser-test/test/check-kernel-exception/log
https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-6.0.y/build/v6.0.8-191-gf8896c3ebbcf/testrun/13007451/suite/log-parser-test/test/check-kernel-exception/details/
metadata:
git_ref: linux-6.0.y
git_repo: https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc
git_sha: f8896c3ebbcfcc053d9c27413bea3af94c01fd71
git_describe: v6.0.8-191-gf8896c3ebbcf
kernel_version: 6.0.9-rc1
kernel-config: https://builds.tuxbuild.com/2HXisCgbMlQAU85bS1QC4TvzydK/config
build-url: https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc/-/pipelines/694074125
artifact-location: https://builds.tuxbuild.com/2HXisCgbMlQAU85bS1QC4TvzydK
toolchain: gcc-11
--
Linaro LKFT
https://lkft.linaro.org
On Tue, Nov 15, 2022, at 08:27, Naresh Kamboju wrote:
> Following kernel warning noticed while running kselftest arm64 sve-ptrace
> on qemu-arm64 on ampere-altra server.
>
> Reported-by: Linux Kernel Functional Testing <[email protected]>
>
> /usr/bin/qemu-system-aarch64 -cpu max,pauth-impdef=on \
> -machine virt-2.10 \
> -nographic \
> -net nic,model=virtio,macaddr=BA:DD:AD:FC:09:12 \
> -net tap -m 4096 -monitor none \
> -kernel Image.gz --append "console=ttyAMA0 root=/dev/vda rw"
> -hda lkft-kselftest-image-juno-20221114150409.rootfs.ext4
> -smp 4 -nographic
Hi Naresh,
Have you tried what happens if you run the same thing on an x86
machine? I would expect them to behave the same way, but it's
possible something goes wrong with the guest CPU if this ends
up using some (but not all) of the logic from KVM that would
use '-cpu host' instead of '-cpu max'. Note that the Neoverse
CPU in the Altra machine does not support SVE.
Another thing you could easily try is the same command line as
above with each possible combination of '-cpu host' (replacing
'-cpu max') and '-enable-kvm'. Do you always get the same
result?
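To make the combinations concrete, here is a minimal sketch that
only prints one command line per variant and runs nothing. The
helper name print_variant is made up for illustration, and the
-kernel, -hda and -net arguments from the report still need to be
appended by hand. QEMU may reject some combinations outright (as
far as I know '-cpu host' needs a hardware accelerator, and
pauth-impdef may only exist under TCG), which is useful information
in itself.

```shell
#!/bin/sh
# Print a QEMU command line for a given -cpu value, optionally
# with -enable-kvm appended. Nothing is executed here.
# print_variant is a hypothetical helper, not part of any tool.
print_variant() {
    cpu="$1"    # value passed to -cpu
    kvm="$2"    # "kvm" appends -enable-kvm, anything else does not
    cmd="qemu-system-aarch64 -cpu $cpu -machine virt-2.10 -smp 4 -m 4096 -nographic"
    if [ "$kvm" = "kvm" ]; then
        cmd="$cmd -enable-kvm"
    fi
    printf '%s\n' "$cmd"
}

# Walk all four combinations of CPU model and KVM on/off.
for cpu in max,pauth-impdef=on host; do
    for kvm in nokvm kvm; do
        print_variant "$cpu" "$kvm"
    done
done
```

Comparing which of the four variants boot, and which of those
still hit the WARN_ON(), should narrow down whether the host CPU
plays any role.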
>
> Boot log:
> ---------
> [ 0.000000] Linux version 6.0.9-rc1 (tuxmake@tuxmake)
> (aarch64-linux-gnu-gcc (Debian 11.3.0-6) 11.3.0, GNU ld (GNU Binutils
> for Debian) 2.39) #1 SMP PREEMPT @1668438377
> [ 0.000000] random: crng init done
> [ 0.000000] Machine model: linux,dummy-virt
>
>
> # selftests: arm64: sve-ptrace
> # ok 680 # SKIP SVE set FPSIMD get SVE for VL 2704
> # ok 681 Set SVE VL 2720
>
> [ 422.607034] ------------[ cut here ]------------
> [ 422.615382] WARNING: CPU: 0 PID: 1111 at
> arch/arm64/kernel/fpsimd.c:464 fpsimd_save+0x170/0x1b0
> [ 422.617588] Modules linked in: cfg80211 bluetooth rfkill
> crct10dif_ce sm3_ce sm3 sha3_ce sha512_ce sha512_arm64 fuse drm
> [ 422.619758] CPU: 0 PID: 1111 Comm: sve-ptrace Not tainted 6.0.9-rc1 #1
> [ 422.620402] Hardware name: linux,dummy-virt (DT)
> [ 422.620958] pstate: 804000c5 (Nzcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [ 422.621614] pc : fpsimd_save+0x170/0x1b0
> [ 422.621988] lr : fpsimd_save+0xd8/0x1b0
> [ 422.622307] sp : ffff800008f3bb00
> [ 422.622612] x29: ffff800008f3bb00 x28: ffffae14dd664bc0 x27: 0000000000000001
> [ 422.623519] x26: ffff0000ff773858 x25: 0000000000000000 x24: ffff0000c0994fa8
> [ 422.624102] x23: 0000000000000001 x22: 0000000000000100 x21: ffff0000ff75f0b0
> [ 422.624706] x20: ffff51ec22a8b000 x19: ffffae14dccd40b0 x18: 0000000000000000
> [ 422.625292] x17: ffff51ec22a8b000 x16: 0000000000000000 x15: 0000000000000000
> [ 422.626041] x14: 0000000000000003 x13: 0000000000000000 x12: 0000000000000002
> [ 422.626647] x11: ffffae14ddbee840 x10: 0000000000000312 x9 : ffffae14da818210
> [ 422.627326] x8 : ffff0000c09935c0 x7 : ffffae14de2b8d08 x6 : 0000000000000000
> [ 422.627889] x5 : 000000c91075a4a8 x4 : 0000000000000000 x3 : 0000000000000001
> [ 422.628487] x2 : ffff51ec22a8b000 x1 : 0000000000000204 x0 : 0000000000000010
> [ 422.629203] Call trace:
> [ 422.629579] fpsimd_save+0x170/0x1b0
> [ 422.630014] fpsimd_thread_switch+0x2c/0xc4
This is the location of the WARN_ON(), which tests that the
vector length read back from the CPU matches the one the kernel
expects for the task. If for some reason it takes the vector
size of the host CPU, this would warn.
	if (IS_ENABLED(CONFIG_ARM64_SVE) && save_sve_regs) {
		/* Get the configured VL from RDVL, will account for SM */
		if (WARN_ON(sve_get_vl() != vl)) {
			/*
> [ 422.630431] __switch_to+0x20/0x160
> [ 422.630745] __schedule+0x380/0xb90
> [ 422.631038] preempt_schedule_irq+0x4c/0x130
> [ 422.631386] el1_interrupt+0x4c/0x64
> [ 422.631689] el1h_64_irq_handler+0x18/0x24
> [ 422.632037] el1h_64_irq+0x64/0x68
> [ 422.632335] do_page_fault+0x31c/0x4d0
> [ 422.632660] do_translation_fault+0xd8/0x100
> [ 422.632993] do_mem_abort+0x58/0xb0
> [ 422.633311] el0_ia+0x8c/0x134
> [ 422.633685] el0t_64_sync_handler+0x134/0x140
> [ 422.634061] el0t_64_sync+0x18c/0x190
> [ 422.634580] irq event stamp: 654
> [ 422.634923] hardirqs last enabled at (653): [<ffffae14dbeafc94>]
> exit_to_kernel_mode+0x34/0x130
> [ 422.635713] hardirqs last disabled at (654): [<ffffae14dbeb7700>]
> __schedule+0x3f0/0xb90
> [ 422.636309] softirqs last enabled at (650): [<ffffae14da810be4>]
> __do_softirq+0x514/0x62c
> [ 422.636877] softirqs last disabled at (637): [<ffffae14da8b4f58>]
> __irq_exit_rcu+0x164/0x19c
> [ 422.637446] ---[ end trace 0000000000000000 ]---
>
> Full test log:
> https://lkft.validation.linaro.org/scheduler/job/5847349#L2206
> https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-6.0.y/build/v6.0.8-191-gf8896c3ebbcf/testrun/13007451/suite/log-parser-test/test/check-kernel-exception/log
> https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-6.0.y/build/v6.0.8-191-gf8896c3ebbcf/testrun/13007451/suite/log-parser-test/test/check-kernel-exception/details/
On Tue, Nov 15, 2022 at 12:57:53PM +0530, Naresh Kamboju wrote:
> Following kernel warning noticed while running kselftest arm64 sve-ptrace
> on qemu-arm64 on ampere-altra server.
> [ 422.607034] ------------[ cut here ]------------
> [ 422.615382] WARNING: CPU: 0 PID: 1111 at
> arch/arm64/kernel/fpsimd.c:464 fpsimd_save+0x170/0x1b0
> [ 422.617588] Modules linked in: cfg80211 bluetooth rfkill
> crct10dif_ce sm3_ce sm3 sha3_ce sha512_ce sha512_arm64 fuse drm
Without the ability to reproduce this, or more information, this
isn't really actionable. Since I'm not seeing any changes in the
stable queue that look in the least bit relevant, I'm guessing
that it's only happened once?
You mention that this is hosted on an Altra, but it looks like
you're running the TCG backend. If there's some reason to expect
that qemu might be unstable when hosted on that platform, it's
probably worth looping the qemu people in.
On Tue, Nov 15, 2022 at 09:22:53AM +0100, Arnd Bergmann wrote:
> Have you tried what happens if you run the same thing on an x86
> machine? I would expect them to behave the same way, but it's
> possible something goes wrong with the guest CPU if this ends
> up using some (but not all) of the logic from KVM that would
> use '-cpu host' instead of '-cpu max'. Note that the Neoverse
> CPU in the Altra machine does not support SVE.
I'm finding it hard to think of a failure pattern that would
make it through VL discovery then fail at runtime but also not
obviously trigger any issues in syscall-abi...
> Other things you could easily try would use the same command
> line as above, with the possible combinations of '-cpu host'
> (replacing -cpu max) and '-enable-kvm'. Do you always get
> the same result?
The machine parameter accel={tcg,kvm} is useful for forcing a
specific backend - it's probably wise to force TCG if you might
be running a job on a native architecture.
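As a rough sketch of what that looks like, the snippet below
builds the -machine argument with the accelerator pinned. The
helper name force_accel is invented for this example and it only
prints the argument; the machine type is the one from the
original report.

```shell
#!/bin/sh
# force_accel MACHINE ACCEL
# Print a -machine argument that pins the accelerator.
# Hypothetical helper: only the two backends named above are accepted.
force_accel() {
    case "$2" in
        tcg|kvm)
            printf '%s\n' "-machine $1,accel=$2"
            ;;
        *)
            echo "expected tcg or kvm, got: $2" >&2
            return 1
            ;;
    esac
}

# Pin TCG for the machine type used in the report.
force_accel virt-2.10 tcg
```

This prints "-machine virt-2.10,accel=tcg", which would replace
the plain "-machine virt-2.10" in the reported command line.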
BTW there's some other funky stuff going on with that job: the
syscall-abi test is stopped with a timeout after 45 seconds (as
is sve-ptrace), which appears to be coming from a harness
somewhere. The selection of FP tests run seems to miss fp-stress
too.