2024-03-18 19:48:04

by Mirsad Todorovac

Subject: [BUG net-next] arch/x86/kernel/cpu/bugs.c:2935: "Unpatched return thunk in use. This should not happen!" [STACKTRACE]

Hi,

With the latest net-next v6.8-5204-g237bb5f7f7f5 kernel, while running kselftest, there was this
trap and stacktrace:

This is a vanilla net-next kernel. The only local change is to the tools/testing/selftests Makefile,
which is why the build is marked "dirty".

The message was apparently introduced with this patch: https://lore.kernel.org/lkml/20240207185328.GEZcPRqPsNInRXyNMj@fat_crate.local/

Here is the stacktrace:

Mar 18 19:46:35 defiant kernel: [ 1859.134913] ------------[ cut here ]------------
Mar 18 19:46:35 defiant kernel: [ 1859.134916] Unpatched return thunk in use. This should not happen!
Mar 18 19:46:35 defiant kernel: [ 1859.134919] WARNING: CPU: 30 PID: 80103 at arch/x86/kernel/cpu/bugs.c:2935 __warn_thunk (/home/marvin/linux/kernel/net-next/arch/x86/kernel/cpu/bugs.c:2935 (discriminator 3))
Mar 18 19:46:35 defiant kernel: [ 1859.134925] Modules linked in: ir_rcmm_decoder ir_imon_decoder ir_sharp_decoder ir_rc6_decoder ir_sanyo_decoder ir_nec_decoder ir_sony_decoder ir_jvc_decoder ir_rc5_decoder rc_loopback gpio_sim macvlan act_gact cls_flower sch_ingress bridge stp llc bonding tls xfrm_user nf_tables nfnetlink nvme_fabrics binfmt_misc amd_atl intel_rapl_msr nls_iso8859_1 intel_rapl_common snd_hda_codec_realtek snd_hda_codec_generic amdgpu snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core snd_hwdep snd_pcm edac_mce_amd crct10dif_pclmul polyval_clmulni snd_seq_midi polyval_generic snd_seq_midi_event ghash_clmulni_intel sha512_ssse3 amdxcp sha256_ssse3 snd_rawmidi drm_exec sha1_ssse3 gpu_sched aesni_intel drm_buddy drm_suballoc_helper crypto_simd drm_ttm_helper cryptd snd_seq ttm joydev snd_seq_device rapl input_leds drm_display_helper snd_timer cec snd wmi_bmof drm_kms_helper k10temp ccp soundcore i2c_algo_bit mac_hid tcp_bbr sch_fq msr parport_pc ppdev lp parport fuse drm efi_pstore ip_tables
Mar 18 19:46:35 defiant kernel: [ 1859.134985] x_tables autofs4 btrfs blake2b_generic xor raid6_pq libcrc32c hid_generic nvme nvme_core ahci i2c_piix4 crc32_pclmul r8169 nvme_auth xhci_pci libahci xhci_pci_renesas realtek video wmi gpio_amdpt [last unloaded: gpio_mockup]
Mar 18 19:46:35 defiant kernel: [ 1859.135002] CPU: 30 PID: 80103 Comm: cpuid_test Not tainted 6.8.0-net-next-km-05204-g237bb5f7f7f5-dirty #3
Mar 18 19:46:35 defiant kernel: [ 1859.135004] Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023
Mar 18 19:46:35 defiant kernel: [ 1859.135005] RIP: 0010:__warn_thunk (/home/marvin/linux/kernel/net-next/arch/x86/kernel/cpu/bugs.c:2935 (discriminator 3))
Mar 18 19:46:35 defiant kernel: [ 1859.135008] Code: 50 9b 0f 01 83 e3 01 74 0e 48 8b 5d f8 c9 31 f6 31 ff e9 0e 9f 3b 01 48 c7 c7 28 62 1d b9 c6 05 f2 d6 6b 02 01 e8 60 af 07 00 <0f> 0b 48 8b 5d f8 c9 31 f6 31 ff e9 eb 9e 3b 01 90 90 90 90 90 90
All code
========
0: 50 push %rax
1: 9b fwait
2: 0f 01 83 e3 01 74 0e sgdt 0xe7401e3(%rbx)
9: 48 8b 5d f8 mov -0x8(%rbp),%rbx
d: c9 leave
e: 31 f6 xor %esi,%esi
10: 31 ff xor %edi,%edi
12: e9 0e 9f 3b 01 jmp 0x13b9f25
17: 48 c7 c7 28 62 1d b9 mov $0xffffffffb91d6228,%rdi
1e: c6 05 f2 d6 6b 02 01 movb $0x1,0x26bd6f2(%rip) # 0x26bd717
25: e8 60 af 07 00 call 0x7af8a
2a:* 0f 0b ud2 <-- trapping instruction
2c: 48 8b 5d f8 mov -0x8(%rbp),%rbx
30: c9 leave
31: 31 f6 xor %esi,%esi
33: 31 ff xor %edi,%edi
35: e9 eb 9e 3b 01 jmp 0x13b9f25
3a: 90 nop
3b: 90 nop
3c: 90 nop
3d: 90 nop
3e: 90 nop
3f: 90 nop

Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: 48 8b 5d f8 mov -0x8(%rbp),%rbx
6: c9 leave
7: 31 f6 xor %esi,%esi
9: 31 ff xor %edi,%edi
b: e9 eb 9e 3b 01 jmp 0x13b9efb
10: 90 nop
11: 90 nop
12: 90 nop
13: 90 nop
14: 90 nop
15: 90 nop
Mar 18 19:46:35 defiant kernel: [ 1859.135009] RSP: 0018:ffffb9f8c652bc70 EFLAGS: 00010046
Mar 18 19:46:35 defiant kernel: [ 1859.135011] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
Mar 18 19:46:35 defiant kernel: [ 1859.135012] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Mar 18 19:46:35 defiant kernel: [ 1859.135013] RBP: ffffb9f8c652bc78 R08: 0000000000000000 R09: 0000000000000000
Mar 18 19:46:35 defiant kernel: [ 1859.135014] R10: 0000000000000000 R11: 0000000000000000 R12: ffff998e369f1c78
Mar 18 19:46:35 defiant kernel: [ 1859.135015] R13: 0000000000000000 R14: 0000000000000000 R15: ffff998e369f23f8
Mar 18 19:46:35 defiant kernel: [ 1859.135016] FS: 000071f68c680740(0000) GS:ffff999b18900000(0000) knlGS:0000000000000000
Mar 18 19:46:35 defiant kernel: [ 1859.135017] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 18 19:46:35 defiant kernel: [ 1859.135018] CR2: 0000000000000000 CR3: 000000032c828000 CR4: 0000000000f50ef0
Mar 18 19:46:35 defiant kernel: [ 1859.135019] PKRU: 55555554
Mar 18 19:46:35 defiant kernel: [ 1859.135020] Call Trace:
Mar 18 19:46:35 defiant kernel: [ 1859.135021] <TASK>
Mar 18 19:46:35 defiant kernel: [ 1859.135022] ? show_regs (/home/marvin/linux/kernel/net-next/arch/x86/kernel/dumpstack.c:479)
Mar 18 19:46:35 defiant kernel: [ 1859.135026] ? __warn_thunk (/home/marvin/linux/kernel/net-next/arch/x86/kernel/cpu/bugs.c:2935 (discriminator 3))
Mar 18 19:46:35 defiant kernel: [ 1859.135028] ? __warn (/home/marvin/linux/kernel/net-next/kernel/panic.c:677)
Mar 18 19:46:35 defiant kernel: [ 1859.135030] ? __warn_thunk (/home/marvin/linux/kernel/net-next/arch/x86/kernel/cpu/bugs.c:2935 (discriminator 3))
Mar 18 19:46:35 defiant kernel: [ 1859.135032] ? report_bug (/home/marvin/linux/kernel/net-next/lib/bug.c:201 /home/marvin/linux/kernel/net-next/lib/bug.c:219)
Mar 18 19:46:35 defiant kernel: [ 1859.135036] ? irq_work_queue (/home/marvin/linux/kernel/net-next/kernel/irq_work.c:119)
Mar 18 19:46:35 defiant kernel: [ 1859.135040] ? handle_bug (/home/marvin/linux/kernel/net-next/arch/x86/kernel/traps.c:218)
Mar 18 19:46:35 defiant kernel: [ 1859.135043] ? exc_invalid_op (/home/marvin/linux/kernel/net-next/arch/x86/kernel/traps.c:260 (discriminator 1))
Mar 18 19:46:35 defiant kernel: [ 1859.135045] ? asm_exc_invalid_op (/home/marvin/linux/kernel/net-next/./arch/x86/include/asm/idtentry.h:621)
Mar 18 19:46:35 defiant kernel: [ 1859.135050] ? __warn_thunk (/home/marvin/linux/kernel/net-next/arch/x86/kernel/cpu/bugs.c:2935 (discriminator 3))
Mar 18 19:46:35 defiant kernel: [ 1859.135053] warn_thunk_thunk (/home/marvin/linux/kernel/net-next/arch/x86/entry/entry.S:48)
Mar 18 19:46:35 defiant kernel: [ 1859.135057] svm_vcpu_enter_exit (/home/marvin/linux/kernel/net-next/./include/linux/kvm_host.h:543 /home/marvin/linux/kernel/net-next/arch/x86/kvm/svm/svm.c:4115)
Mar 18 19:46:35 defiant kernel: [ 1859.135059] svm_vcpu_run (/home/marvin/linux/kernel/net-next/./arch/x86/include/asm/cpufeature.h:171 /home/marvin/linux/kernel/net-next/arch/x86/kvm/svm/svm.c:4182)
Mar 18 19:46:35 defiant kernel: [ 1859.135062] kvm_arch_vcpu_ioctl_run (/home/marvin/linux/kernel/net-next/arch/x86/kvm/x86.c:10981 /home/marvin/linux/kernel/net-next/arch/x86/kvm/x86.c:11184 /home/marvin/linux/kernel/net-next/arch/x86/kvm/x86.c:11410)
Mar 18 19:46:35 defiant kernel: [ 1859.135065] ? call_rcu (/home/marvin/linux/kernel/net-next/kernel/rcu/tree.c:2839)
Mar 18 19:46:35 defiant kernel: [ 1859.135068] ? srso_alias_return_thunk (/home/marvin/linux/kernel/net-next/arch/x86/lib/retpoline.S:181)
Mar 18 19:46:35 defiant kernel: [ 1859.135070] ? put_object (/home/marvin/linux/kernel/net-next/mm/kmemleak.c:549)
Mar 18 19:46:35 defiant kernel: [ 1859.135074] ? srso_alias_return_thunk (/home/marvin/linux/kernel/net-next/arch/x86/lib/retpoline.S:181)
Mar 18 19:46:35 defiant kernel: [ 1859.135076] ? kmemleak_free (/home/marvin/linux/kernel/net-next/mm/kmemleak.c:1109)
Mar 18 19:46:35 defiant kernel: [ 1859.135078] ? srso_alias_return_thunk (/home/marvin/linux/kernel/net-next/arch/x86/lib/retpoline.S:181)
Mar 18 19:46:35 defiant kernel: [ 1859.135080] kvm_vcpu_ioctl (/home/marvin/linux/kernel/net-next/arch/x86/kvm/../../../virt/kvm/kvm_main.c:4447)
Mar 18 19:46:35 defiant kernel: [ 1859.135083] ? srso_alias_return_thunk (/home/marvin/linux/kernel/net-next/arch/x86/lib/retpoline.S:181)
Mar 18 19:46:35 defiant kernel: [ 1859.135085] ? kvm_vcpu_ioctl (/home/marvin/linux/kernel/net-next/arch/x86/kvm/../../../virt/kvm/kvm_main.c:4610)
Mar 18 19:46:35 defiant kernel: [ 1859.135087] ? srso_alias_return_thunk (/home/marvin/linux/kernel/net-next/arch/x86/lib/retpoline.S:181)
Mar 18 19:46:35 defiant kernel: [ 1859.135089] ? do_syscall_64 (/home/marvin/linux/kernel/net-next/./arch/x86/include/asm/cpufeature.h:171 /home/marvin/linux/kernel/net-next/arch/x86/entry/common.c:98)
Mar 18 19:46:35 defiant kernel: [ 1859.135092] ? srso_alias_return_thunk (/home/marvin/linux/kernel/net-next/arch/x86/lib/retpoline.S:181)
Mar 18 19:46:35 defiant kernel: [ 1859.135093] ? trace_hardirqs_on (/home/marvin/linux/kernel/net-next/kernel/trace/trace_preemptirq.c:58)
Mar 18 19:46:35 defiant kernel: [ 1859.135097] __x64_sys_ioctl (/home/marvin/linux/kernel/net-next/fs/ioctl.c:51 /home/marvin/linux/kernel/net-next/fs/ioctl.c:904 /home/marvin/linux/kernel/net-next/fs/ioctl.c:890 /home/marvin/linux/kernel/net-next/fs/ioctl.c:890)
Mar 18 19:46:35 defiant kernel: [ 1859.135101] do_syscall_64 (/home/marvin/linux/kernel/net-next/arch/x86/entry/common.c:52 /home/marvin/linux/kernel/net-next/arch/x86/entry/common.c:83)
Mar 18 19:46:35 defiant kernel: [ 1859.135103] ? srso_alias_return_thunk (/home/marvin/linux/kernel/net-next/arch/x86/lib/retpoline.S:181)
Mar 18 19:46:35 defiant kernel: [ 1859.135105] ? do_syscall_64 (/home/marvin/linux/kernel/net-next/./arch/x86/include/asm/cpufeature.h:171 /home/marvin/linux/kernel/net-next/arch/x86/entry/common.c:98)
Mar 18 19:46:35 defiant kernel: [ 1859.135106] ? srso_alias_return_thunk (/home/marvin/linux/kernel/net-next/arch/x86/lib/retpoline.S:181)
Mar 18 19:46:35 defiant kernel: [ 1859.135108] ? irqentry_exit (/home/marvin/linux/kernel/net-next/kernel/entry/common.c:361)
Mar 18 19:46:35 defiant kernel: [ 1859.135110] ? srso_alias_return_thunk (/home/marvin/linux/kernel/net-next/arch/x86/lib/retpoline.S:181)
Mar 18 19:46:35 defiant kernel: [ 1859.135112] ? exc_page_fault (/home/marvin/linux/kernel/net-next/arch/x86/mm/fault.c:1567)
Mar 18 19:46:35 defiant kernel: [ 1859.135114] entry_SYSCALL_64_after_hwframe (/home/marvin/linux/kernel/net-next/arch/x86/entry/entry_64.S:129)
Mar 18 19:46:35 defiant kernel: [ 1859.135115] RIP: 0033:0x71f68c51a94f
Mar 18 19:46:35 defiant kernel: [ 1859.135135] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00
All code
========
0: 00 48 89 add %cl,-0x77(%rax)
3: 44 24 18 rex.R and $0x18,%al
6: 31 c0 xor %eax,%eax
8: 48 8d 44 24 60 lea 0x60(%rsp),%rax
d: c7 04 24 10 00 00 00 movl $0x10,(%rsp)
14: 48 89 44 24 08 mov %rax,0x8(%rsp)
19: 48 8d 44 24 20 lea 0x20(%rsp),%rax
1e: 48 89 44 24 10 mov %rax,0x10(%rsp)
23: b8 10 00 00 00 mov $0x10,%eax
28: 0f 05 syscall
2a:* 41 89 c0 mov %eax,%r8d <-- trapping instruction
2d: 3d 00 f0 ff ff cmp $0xfffff000,%eax
32: 77 1f ja 0x53
34: 48 8b 44 24 18 mov 0x18(%rsp),%rax
39: 64 fs
3a: 48 rex.W
3b: 2b .byte 0x2b
3c: 04 25 add $0x25,%al
3e: 28 00 sub %al,(%rax)

Code starting with the faulting instruction
===========================================
0: 41 89 c0 mov %eax,%r8d
3: 3d 00 f0 ff ff cmp $0xfffff000,%eax
8: 77 1f ja 0x29
a: 48 8b 44 24 18 mov 0x18(%rsp),%rax
f: 64 fs
10: 48 rex.W
11: 2b .byte 0x2b
12: 04 25 add $0x25,%al
14: 28 00 sub %al,(%rax)
Mar 18 19:46:35 defiant kernel: [ 1859.135137] RSP: 002b:00007ffd0f928bd0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Mar 18 19:46:35 defiant kernel: [ 1859.135138] RAX: ffffffffffffffda RBX: 0000000028af2880 RCX: 000071f68c51a94f
Mar 18 19:46:35 defiant kernel: [ 1859.135139] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000007
Mar 18 19:46:35 defiant kernel: [ 1859.135140] RBP: 000071f68c6806c0 R08: 0000000000000000 R09: 0000000000000001
Mar 18 19:46:35 defiant kernel: [ 1859.135141] R10: 000000000000001f R11: 0000000000000246 R12: 0000000028af2880
Mar 18 19:46:35 defiant kernel: [ 1859.135142] R13: 0000000000000041 R14: 0000000000427e18 R15: 000071f68c6e3040
Mar 18 19:46:35 defiant kernel: [ 1859.135146] </TASK>
Mar 18 19:46:35 defiant kernel: [ 1859.135146] ---[ end trace 0000000000000000 ]---

Hope this helps.

Best regards,
Mirsad Todorovac


2024-03-18 20:22:12

by Borislav Petkov

Subject: Re: [BUG net-next] arch/x86/kernel/cpu/bugs.c:2935: "Unpatched return thunk in use. This should not happen!" [STACKTRACE]

On Mon, Mar 18, 2024 at 08:47:26PM +0100, Mirsad Todorovac wrote:
> With the latest net-next v6.8-5204-g237bb5f7f7f5 kernel, while running kselftest, there was this
> trap and stacktrace:

Send your kernel .config and how exactly you're triggering it, please.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2024-03-20 01:29:33

by Mirsad Todorovac

Subject: Re: [BUG net-next] arch/x86/kernel/cpu/bugs.c:2935: "Unpatched return thunk in use. This should not happen!" [STACKTRACE]

On 3/18/24 21:21, Borislav Petkov wrote:
> On Mon, Mar 18, 2024 at 08:47:26PM +0100, Mirsad Todorovac wrote:
>> With the latest net-next v6.8-5204-g237bb5f7f7f5 kernel, while running kselftest, there was this
>> trap and stacktrace:
>
> Send your kernel .config and how exactly you're triggering it, please.
>
> Thx.

Hi,

Please find the kernel .config attached.

I got another one of these "Unpatched thunk" and it seems connected with selftest/kvm.

But running selftests/kvm one by one did not trigger the bug.

Best regards,
Mirsad Todorovac


Attachments:
config-6.8.0-net-next-km-05204-g237bb5f7f7f5-dirty.xz (58.07 kB)
unpatched-return-thunk-decoded-02.log (10.33 kB)

2024-03-26 10:17:31

by Borislav Petkov

Subject: Re: [BUG net-next] arch/x86/kernel/cpu/bugs.c:2935: "Unpatched return thunk in use. This should not happen!" [STACKTRACE]

On Wed, Mar 20, 2024 at 02:28:57AM +0100, Mirsad Todorovac wrote:
> Please find the kernel .config attached.

Thanks, that's one huuuge kernel you're building. :)

> I got another one of these "Unpatched thunk" and it seems connected
> with selftest/kvm.
>
> But running selftests/kvm one by one did not trigger the bug.

Which commands are you exactly running?

I'll try to reproduce here.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2024-03-26 19:15:48

by Mirsad Todorovac

Subject: Re: [BUG net-next] arch/x86/kernel/cpu/bugs.c:2935: "Unpatched return thunk in use. This should not happen!" [STACKTRACE]

On 3/26/24 11:16, Borislav Petkov wrote:
> On Wed, Mar 20, 2024 at 02:28:57AM +0100, Mirsad Todorovac wrote:
>> Please find the kernel .config attached.
>
> Thanks, that's one huuuge kernel you're building. :)
>
>> I got another one of these "Unpatched thunk" and it seems connected
>> with selftest/kvm.
>>
>> But running selftests/kvm one by one did not trigger the bug.
>
> Which commands are you exactly running?
>
> I'll try to reproduce here.

I think I have a reproducer here on the latest torvalds vanilla tree (on Ubuntu 22.04 LTS box):

root# tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.sh
Running test with CAP_SYS_BOOT enabled
Running as root, skipping nx_huge_pages_test with CAP_SYS_BOOT disabled
root# git describe
v6.9-rc1-5-g928a87efa423
root#

> Thx.

Not at all.

The stacktrace for the bug triggered by the above command was:

kernel: [ 101.973612] ------------[ cut here ]------------
kernel: [ 101.973615] Unpatched return thunk in use. This should not happen!
kernel: [ 101.973618] WARNING: CPU: 1 PID: 3827 at arch/x86/kernel/cpu/bugs.c:2935 __warn_thunk (./arch/x86/kernel/cpu/bugs.c:2935 (discriminator 3))
kernel: [ 101.973625] Modules linked in: xfrm_user nf_tables nfnetlink nvme_fabrics binfmt_misc snd_hda_codec_realtek snd_hda_codec_generic snd_hda_scodec_component snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec intel_rapl_msr amd_atl snd_hda_core intel_rapl_common nls_iso8859_1 snd_hwdep snd_pcm edac_mce_amd amdgpu crct10dif_pclmul polyval_clmulni snd_seq_midi polyval_generic snd_seq_midi_event ghash_clmulni_intel sha512_ssse3 snd_rawmidi sha256_ssse3 amdxcp sha1_ssse3 drm_exec aesni_intel snd_seq gpu_sched crypto_simd drm_buddy cryptd drm_suballoc_helper drm_ttm_helper snd_seq_device joydev input_leds rapl ttm snd_timer wmi_bmof drm_display_helper cec snd drm_kms_helper k10temp ccp i2c_algo_bit soundcore mac_hid tcp_bbr msr parport_pc ppdev lp parport drm efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq hid_generic nvme r8169 xhci_pci ahci nvme_core crc32_pclmul i2c_piix4 xhci_pci_renesas nvme_auth realtek libahci video wmi gpio_amdpt
kernel: [ 101.973685] CPU: 1 PID: 3827 Comm: nx_huge_pages_t Not tainted 6.9.0-rc1-torv-00005-g928a87efa423-dirty #36
kernel: [ 101.973687] Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023
kernel: [ 101.973688] RIP: 0010:__warn_thunk (./arch/x86/kernel/cpu/bugs.c:2935 (discriminator 3))
kernel: [ 101.973691] Code: 62 c5 1d 01 83 e3 01 74 0e 48 8b 5d f8 c9 31 f6 31 ff e9 be 98 3b 01 48 c7 c7 98 21 c1 bc c6 05 22 26 8d 02 01 e8 90 aa 07 00 <0f> 0b 48 8b 5d f8 c9 31 f6 31 ff e9 9b 98 3b 01 90 90 90 90 90 90
All code
========
0: 62 c5 1d 01 83 (bad)
5: e3 01 jrcxz 0x8
7: 74 0e je 0x17
9: 48 8b 5d f8 mov -0x8(%rbp),%rbx
d: c9 leave
e: 31 f6 xor %esi,%esi
10: 31 ff xor %edi,%edi
12: e9 be 98 3b 01 jmp 0x13b98d5
17: 48 c7 c7 98 21 c1 bc mov $0xffffffffbcc12198,%rdi
1e: c6 05 22 26 8d 02 01 movb $0x1,0x28d2622(%rip) # 0x28d2647
25: e8 90 aa 07 00 call 0x7aaba
2a:* 0f 0b ud2 <-- trapping instruction
2c: 48 8b 5d f8 mov -0x8(%rbp),%rbx
30: c9 leave
31: 31 f6 xor %esi,%esi
33: 31 ff xor %edi,%edi
35: e9 9b 98 3b 01 jmp 0x13b98d5
3a: 90 nop
3b: 90 nop
3c: 90 nop
3d: 90 nop
3e: 90 nop
3f: 90 nop

Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: 48 8b 5d f8 mov -0x8(%rbp),%rbx
6: c9 leave
7: 31 f6 xor %esi,%esi
9: 31 ff xor %edi,%edi
b: e9 9b 98 3b 01 jmp 0x13b98ab
10: 90 nop
11: 90 nop
12: 90 nop
13: 90 nop
14: 90 nop
15: 90 nop
kernel: [ 101.973692] RSP: 0018:ffffbbd90580fc90 EFLAGS: 00010046
kernel: [ 101.973694] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
kernel: [ 101.973695] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
kernel: [ 101.973696] RBP: ffffbbd90580fc98 R08: 0000000000000000 R09: 0000000000000000
kernel: [ 101.973697] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9964e4b7d4f0
kernel: [ 101.973698] R13: 0000000000000000 R14: 0000000000000000 R15: ffff9964e4b7dc70
kernel: [ 101.973699] FS: 0000720b95372740(0000) GS:ffff9973d7a80000(0000) knlGS:0000000000000000
kernel: [ 101.973700] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: [ 101.973701] CR2: 0000000000000000 CR3: 00000001aea6c000 CR4: 0000000000f50ef0
kernel: [ 101.973703] PKRU: 55555554
kernel: [ 101.973703] Call Trace:
kernel: [ 101.973704] <TASK>
kernel: [ 101.973706] ? show_regs (./arch/x86/kernel/dumpstack.c:479)
kernel: [ 101.973709] ? __warn_thunk (./arch/x86/kernel/cpu/bugs.c:2935 (discriminator 3))
kernel: [ 101.973711] ? __warn (./kernel/panic.c:694)
kernel: [ 101.973713] ? __warn_thunk (./arch/x86/kernel/cpu/bugs.c:2935 (discriminator 3))
kernel: [ 101.973715] ? report_bug (./lib/bug.c:201 ./lib/bug.c:219)
kernel: [ 101.973718] ? irq_work_queue (./kernel/irq_work.c:119)
kernel: [ 101.973722] ? handle_bug (./arch/x86/kernel/traps.c:218)
kernel: [ 101.973725] ? exc_invalid_op (./arch/x86/kernel/traps.c:260 (discriminator 1))
kernel: [ 101.973727] ? asm_exc_invalid_op (././arch/x86/include/asm/idtentry.h:621)
kernel: [ 101.973731] ? __warn_thunk (./arch/x86/kernel/cpu/bugs.c:2935 (discriminator 3))
kernel: [ 101.973734] warn_thunk_thunk (./arch/x86/entry/entry.S:48)
kernel: [ 101.973738] svm_vcpu_enter_exit (././include/linux/kvm_host.h:547 ./arch/x86/kvm/svm/svm.c:4115)
kernel: [ 101.973740] svm_vcpu_run (././arch/x86/include/asm/cpufeature.h:171 ./arch/x86/kvm/svm/svm.c:4186)
kernel: [ 101.973744] kvm_arch_vcpu_ioctl_run (./arch/x86/kvm/x86.c:11008 ./arch/x86/kvm/x86.c:11211 ./arch/x86/kvm/x86.c:11437)
kernel: [ 101.973747] ? srso_alias_return_thunk (./arch/x86/lib/retpoline.S:181)
kernel: [ 101.973750] ? srso_alias_return_thunk (./arch/x86/lib/retpoline.S:181)
kernel: [ 101.973752] ? kvm_vm_stats_read (./arch/x86/kvm/../../../virt/kvm/kvm_main.c:5066)
kernel: [ 101.973755] kvm_vcpu_ioctl (./arch/x86/kvm/../../../virt/kvm/kvm_main.c:4464)
kernel: [ 101.973757] ? srso_alias_return_thunk (./arch/x86/lib/retpoline.S:181)
kernel: [ 101.973759] ? trace_hardirqs_on_prepare (./kernel/trace/trace_preemptirq.c:47 ./kernel/trace/trace_preemptirq.c:42)
kernel: [ 101.973761] ? srso_alias_return_thunk (./arch/x86/lib/retpoline.S:181)
kernel: [ 101.973763] ? syscall_exit_to_user_mode (./kernel/entry/common.c:221)
kernel: [ 101.973765] ? srso_alias_return_thunk (./arch/x86/lib/retpoline.S:181)
kernel: [ 101.973767] ? do_syscall_64 (././arch/x86/include/asm/cpufeature.h:171 ./arch/x86/entry/common.c:98)
kernel: [ 101.973770] __x64_sys_ioctl (./fs/ioctl.c:51 ./fs/ioctl.c:904 ./fs/ioctl.c:890 ./fs/ioctl.c:890)
kernel: [ 101.973773] do_syscall_64 (./arch/x86/entry/common.c:52 ./arch/x86/entry/common.c:83)
kernel: [ 101.973775] ? srso_alias_return_thunk (./arch/x86/lib/retpoline.S:181)
kernel: [ 101.973777] ? irqentry_exit (./kernel/entry/common.c:367)
kernel: [ 101.973778] ? srso_alias_return_thunk (./arch/x86/lib/retpoline.S:181)
kernel: [ 101.973780] entry_SYSCALL_64_after_hwframe (./arch/x86/entry/entry_64.S:129)
kernel: [ 101.973782] RIP: 0033:0x720b9511a94f
kernel: [ 101.973798] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00
All code
========
0: 00 48 89 add %cl,-0x77(%rax)
3: 44 24 18 rex.R and $0x18,%al
6: 31 c0 xor %eax,%eax
8: 48 8d 44 24 60 lea 0x60(%rsp),%rax
d: c7 04 24 10 00 00 00 movl $0x10,(%rsp)
14: 48 89 44 24 08 mov %rax,0x8(%rsp)
19: 48 8d 44 24 20 lea 0x20(%rsp),%rax
1e: 48 89 44 24 10 mov %rax,0x10(%rsp)
23: b8 10 00 00 00 mov $0x10,%eax
28: 0f 05 syscall
2a:* 41 89 c0 mov %eax,%r8d <-- trapping instruction
2d: 3d 00 f0 ff ff cmp $0xfffff000,%eax
32: 77 1f ja 0x53
34: 48 8b 44 24 18 mov 0x18(%rsp),%rax
39: 64 fs
3a: 48 rex.W
3b: 2b .byte 0x2b
3c: 04 25 add $0x25,%al
3e: 28 00 sub %al,(%rax)

Code starting with the faulting instruction
===========================================
0: 41 89 c0 mov %eax,%r8d
3: 3d 00 f0 ff ff cmp $0xfffff000,%eax
8: 77 1f ja 0x29
a: 48 8b 44 24 18 mov 0x18(%rsp),%rax
f: 64 fs
10: 48 rex.W
11: 2b .byte 0x2b
12: 04 25 add $0x25,%al
14: 28 00 sub %al,(%rax)
kernel: [ 101.973799] RSP: 002b:00007ffd786b9ca0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
kernel: [ 101.973801] RAX: ffffffffffffffda RBX: 0000000000600000 RCX: 0000720b9511a94f
kernel: [ 101.973802] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000005
kernel: [ 101.973803] RBP: 0000720b953726c0 R08: 000000000041b228 R09: 0000000000000000
kernel: [ 101.973804] R10: 0000720b951d8882 R11: 0000000000000246 R12: 000000000c9b18c0
kernel: [ 101.973805] R13: 000000000c9b18c0 R14: 0000000000000000 R15: 0000000000000064
kernel: [ 101.973809] </TASK>
kernel: [ 101.973810] ---[ end trace 0000000000000000 ]---

NOTE: I have Cc:-ed the author of the reproducer.
NOTE 2: The stack trace is displayed only once; rerunning the reproducer does not trigger it again until the next reboot.
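The one-shot behaviour in NOTE 2 is what WARN_ONCE() semantics would produce, assuming __warn_thunk() reports through WARN_ONCE(). In the decoded __warn_thunk code above, a byte flag appears to be set (movb $0x1,...(%rip)) just before the warning path is entered. A minimal user-space model of that, with hypothetical helper names, showing why repeated runs stay silent until the flag is cleared by a reboot:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * User-space model of WARN_ONCE() semantics (hypothetical names).
 * The kernel macro also dumps registers and a stack trace; this model
 * only prints the message and counts how often it was emitted.
 */
static bool warned;            /* stands in for the per-site "once" flag */
static int warnings_printed;

static void model_warn_thunk(void)
{
	if (!warned) {
		warned = true;     /* flag is set before reporting */
		warnings_printed++;
		fprintf(stderr, "Unpatched return thunk in use. This should not happen!\n");
	}
}

/* Simulate running the reproducer n times within one "boot". */
static int run_reproducer(int n)
{
	for (int i = 0; i < n; i++)
		model_warn_thunk();
	return warnings_printed;
}

/* A reboot clears the flag, since it only lives in kernel memory. */
static void model_reboot(void)
{
	warned = false;
	warnings_printed = 0;
}
```

Calling run_reproducer(5) prints the message once; further calls print nothing until model_reboot() is called.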

The latest config is attached as well.

Best regards,
Mirsad Todorovac


Attachments:
config-6.9.0-rc1-torv-00005-g928a87efa423-dirty.xz (58.30 kB)

2024-03-28 12:39:13

by Michael Roth

Subject: Re: [BUG net-next] arch/x86/kernel/cpu/bugs.c:2935: "Unpatched return thunk in use. This should not happen!" [STACKTRACE]

On Tue, Mar 26, 2024 at 08:15:12PM +0100, Mirsad Todorovac wrote:
> On 3/26/24 11:16, Borislav Petkov wrote:
> > On Wed, Mar 20, 2024 at 02:28:57AM +0100, Mirsad Todorovac wrote:
> > > Please find the kernel .config attached.
> >
> > Thanks, that's one huuuge kernel you're building. :)
> >
> > > I got another one of these "Unpatched thunk" and it seems connected
> > > with selftest/kvm.
> > >
> > > But running selftests/kvm one by one did not trigger the bug.
> >
> > Which commands are you exactly running?
> >
> > I'll try to reproduce here.
>
> I think I have a reproducer here on the latest torvalds vanilla tree (on Ubuntu 22.04 LTS box):
>
> root# tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.sh
> Running test with CAP_SYS_BOOT enabled
> Running as root, skipping nx_huge_pages_test with CAP_SYS_BOOT disabled
> root# git describe
> v6.9-rc1-5-g928a87efa423
> root#

I'm seeing it pretty consistently on kvm/next as well. Not sure if
there's anything special about my config but starting a fairly basic
SVM guest seems to be enough to trigger it for me on the first
invocation of svm_vcpu_run().
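Both traces end in the same user-space call: ORIG_RAX 0x10 is the x86-64 ioctl syscall number and RSI 0xae80 is the request code, which decodes to KVM_RUN (_IO(KVMIO, 0x80)), the vCPU run path that reaches svm_vcpu_run(). A small sketch of that decoding; the MY_* macro names are local stand-ins (not kernel API) that mirror the generic ioctl bit layout so the example builds without kernel headers:

```c
#include <assert.h>

/*
 * Mirror of the generic ioctl encoding used on x86:
 * bits 0-7 nr, 8-15 type, 16-29 size, 30-31 direction.
 */
#define MY_IOC_NRSHIFT   0
#define MY_IOC_TYPESHIFT 8
#define MY_IOC_SIZESHIFT 16
#define MY_IOC_DIRSHIFT  30
#define MY_IOC_NONE      0U

#define MY_IOC(dir, type, nr, size) \
	(((dir) << MY_IOC_DIRSHIFT) | ((type) << MY_IOC_TYPESHIFT) | \
	 ((nr) << MY_IOC_NRSHIFT) | ((size) << MY_IOC_SIZESHIFT))
#define MY_IO(type, nr) MY_IOC(MY_IOC_NONE, (type), (nr), 0U)

#define MY_KVMIO        0xAEU                  /* KVMIO */
#define MY_KVM_RUN      MY_IO(MY_KVMIO, 0x80U) /* KVM_RUN = _IO(KVMIO, 0x80) */
#define X86_64_NR_IOCTL 16                     /* ioctl syscall number on x86-64 */

/* Split a request code into its direction, type, number and size fields. */
static void decode_ioctl(unsigned int req, unsigned int *dir, unsigned int *type,
			 unsigned int *nr, unsigned int *size)
{
	*nr   = (req >> MY_IOC_NRSHIFT)   & 0xFFU;
	*type = (req >> MY_IOC_TYPESHIFT) & 0xFFU;
	*size = (req >> MY_IOC_SIZESHIFT) & 0x3FFFU;
	*dir  = (req >> MY_IOC_DIRSHIFT)  & 0x3U;
}

/* Values copied from the user-space register dump in the traces. */
static const unsigned long trace_orig_rax = 0x10;
static const unsigned long trace_rsi      = 0xae80;

static int trace_is_kvm_run(void)
{
	return trace_orig_rax == X86_64_NR_IOCTL && trace_rsi == MY_KVM_RUN;
}
```

decode_ioctl(0xae80, ...) yields type 0xAE, nr 0x80, size 0 and direction 0, i.e. an argument-less KVM ioctl.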

It seems to be 2 call-sites, one inside:

amd_clear_divider()

and another inside:

__svm_vcpu_run()

which seems to match up with the decoded stack you posted here. Maybe
the first case would be the easiest to focus on? It's a fairly
straightforward use of ALTERNATIVE():

void noinstr amd_clear_divider(void)
{
	asm volatile(ALTERNATIVE("", "div %2\n\t", X86_BUG_DIV0)
		     :: "a" (0), "d" (0), "r" (1));
}
EXPORT_SYMBOL_GPL(amd_clear_divider);

and it's been that way since before 4461438a84 ("x86/retpoline: Ensure
default return thunk isn't used at runtime") was added. Not sure if
anything else has changed underneath the covers since 4461438a84.
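The warning fires when a return still goes through the default return thunk at runtime, i.e. a return site that boot-time patching did not retarget to the selected thunk (srso_alias_return_thunk in these traces). A conceptual sketch of that failure mode only, not of the actual cause here: it uses function pointers instead of patched machine code, all names are hypothetical, and the "recorded sites" array plays a role only roughly analogous to the .return_sites list that objtool emits and apply_returns() walks at boot:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Conceptual model of return-thunk patching (hypothetical names).
 * Every "return site" starts out pointing at the default thunk; a
 * boot-time pass retargets only the sites it has a record of.
 */
typedef void (*thunk_fn)(void);

static int default_thunk_hits;   /* stands in for "Unpatched return thunk" */
static int selected_thunk_hits;

static void default_return_thunk(void)  { default_thunk_hits++; }
static void selected_return_thunk(void) { selected_thunk_hits++; }

#define NR_SITES 3
static thunk_fn sites[NR_SITES] = {
	default_return_thunk, default_return_thunk, default_return_thunk,
};

/* Sites known to the patching pass; site 2 was never recorded. */
static const size_t recorded_sites[] = { 0, 1 };

static void patch_return_sites(thunk_fn target)
{
	for (size_t i = 0; i < sizeof(recorded_sites) / sizeof(recorded_sites[0]); i++)
		sites[recorded_sites[i]] = target;
}

/* Execute every site once, as if each function returned. */
static void run_all_sites(void)
{
	for (size_t i = 0; i < NR_SITES; i++)
		sites[i]();
}
```

After patch_return_sites(selected_return_thunk) and run_all_sites(), the two recorded sites use the selected thunk and the unrecorded one still reaches the default thunk, which is the situation the warning is meant to catch.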

-Mike

> kernel: [ 101.973804] R10: 0000720b951d8882 R11: 0000000000000246 R12: 000000000c9b18c0
> kernel: [ 101.973805] R13: 000000000c9b18c0 R14: 0000000000000000 R15: 0000000000000064
> kernel: [ 101.973809] </TASK>
> kernel: [ 101.973810] ---[ end trace 0000000000000000 ]---
>
> NOTE: Cc:-ed author of the reproducer for these results.
> NOTE 2: The stacktrace is only displayed once; repeating the reproducer doesn't show it again until the next reboot.
>
> Sending the latest config as well attached:
>
> Best regards,
> Mirsad Todorovac



2024-04-02 10:16:55

by Borislav Petkov

Subject: Re: [BUG net-next] arch/x86/kernel/cpu/bugs.c:2935: "Unpatched return thunk in use. This should not happen!" [STACKTRACE]

From: Borislav Petkov <[email protected]>

Sorry if this comes out weird - mail troubles currently.

On Thu, Mar 28, 2024 at 07:38:30AM -0500, Michael Roth wrote:
> I'm seeing it pretty consistently on kvm/next as well. Not sure if
> there's anything special about my config but starting a fairly basic
> SVM guest seems to be enough to trigger it for me on the first
> invocation of svm_vcpu_run().

Hmm, can you share your config and what exactly you're doing?

I can't reproduce with Mirsad's reproducer, probably because of .config
differences. I tried making all CONFIG*KVM* options =y but no
difference.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


2024-04-02 13:39:52

by Michael Roth

Subject: Re: [BUG net-next] arch/x86/kernel/cpu/bugs.c:2935: "Unpatched return thunk in use. This should not happen!" [STACKTRACE]

On Tue, Apr 02, 2024 at 12:15:49PM +0200, [email protected] wrote:
> From: Borislav Petkov <[email protected]>
>
> Sorry if this comes out weird - mail troubles currently.
>
> On Thu, Mar 28, 2024 at 07:38:30AM -0500, Michael Roth wrote:
> > I'm seeing it pretty consistently on kvm/next as well. Not sure if
> > there's anything special about my config but starting a fairly basic
> > SVM guest seems to be enough to trigger it for me on the first
> > invocation of svm_vcpu_run().
>
> Hmm, can you share your config and what exactly you're doing?
>
> I can't reproduce with Mirsad's reproducer, probably because of .config
> differences. I tried making all CONFIG*KVM* options =y but no
> difference.

I've reproduced against tip/master from today and attached the host
config I used.

I can reproduce with a normal SVM guest using the following command line,
but I don't think there's anything particularly special about the QEMU
options you use. It seems to trigger on the very first entry into the
VMRUN path:

/home/mroth/qemu-build-snp-v4-wip2/qemu-system-x86_64
-smp 32,maxcpus=255 -cpu EPYC-Milan-v2 -overcommit cpu-pm=off
-enable-kvm -m 4G,slots=5,maxmem=210G -vga std -nographic
-machine pc,memory-backend=ram1
-object memory-backend-memfd,id=ram1,size=4G,share=true,prealloc=false,reserve=false
-device virtio-scsi-pci,id=scsi0,disable-legacy=on,iommu_platform=true
-drive file=/home/mroth/ubuntu-18.04-seves2.qcow2,if=none,id=drive0,snapshot=on
-device scsi-hd,id=hd0,drive=drive0,bus=scsi0.0
-device virtio-net-pci,mac=52:54:00:6c:3c:01,netdev=netdev0,id=net0,disable-legacy=on,iommu_platform=true,romfile=
-netdev tap,script=/home/mroth/qemu-ifup,id=netdev0
-L /home/mroth/AMDSEV/snp-release-2024-02-22/usr/local/share/qemu
-msg timestamp=on
-drive if=pflash,format=raw,unit=0,file=/home/mroth/AMDSEV/snp-release-2024-02-22/usr/local/share/qemu/OVMF_CODE.fd,readonly=on
-drive if=pflash,format=raw,unit=1,file=/home/mroth/AMDSEV/snp-release-2024-02-22/usr/local/share/qemu/OVMF_VARS.fd

I can also trigger using one of the more basic KVM selftests:

make INSTALL_HDR_PATH="$headers_dir" headers_install
make -C tools/testing/selftests TARGETS="kvm" EXTRA_CFLAGS="-DDEBUG -I$headers_dir"
sudo tools/testing/selftests/kvm/userspace_io_test

After that, you need to kexec or reboot to see the WARN_ONCE() again.
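A reboot or kexec is needed because WARN_ONCE() reports only the first hit at a given call site. The following is a minimal user-space sketch of that behaviour, assuming a plain per-site boolean flag; the function names are hypothetical, and the real kernel macro does more (for example, it also dumps registers and a stack trace).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Toy user-space model of "warn once" semantics. This is NOT the
 * kernel's WARN_ONCE(); all names here are hypothetical.
 */

/* Print msg only the first time; return true if this call printed. */
bool warn_once(bool *already_warned, const char *msg)
{
	assert(already_warned != NULL);
	if (*already_warned)
		return false;	/* suppressed until the flag is cleared */
	*already_warned = true;
	fprintf(stderr, "WARNING: %s\n", msg);
	return true;
}

/* One flag for the single, hypothetical call site below. */
static bool thunk_warned;

/* Stand-in for the unpatched default return thunk being executed. */
bool model_warn_thunk(void)
{
	return warn_once(&thunk_warned,
			 "Unpatched return thunk in use. This should not happen!");
}

/* A reboot or kexec starts from freshly initialized data: flag cleared. */
void model_reboot(void)
{
	thunk_warned = false;
}
```

Calling model_warn_thunk() repeatedly prints the warning only once; only model_reboot() re-arms it, which mirrors why the trace shows up once per boot.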

-Mike



Attachments:
(No filename) (2.49 kB)
config-6.9.0-rc2-tip-master-20240402+ (281.92 kB)

2024-04-03 12:15:21

by Borislav Petkov

Subject: Re: [BUG net-next] arch/x86/kernel/cpu/bugs.c:2935: "Unpatched return thunk in use. This should not happen!" [STACKTRACE]

On Tue, Apr 02, 2024 at 08:38:56AM -0500, Michael Roth wrote:
> On Tue, Apr 02, 2024 at 12:15:49PM +0200, [email protected] wrote:
> > From: Borislav Petkov <[email protected]>
> >
> > Sorry if this comes out weird - mail troubles currently.
> >
> > On Thu, Mar 28, 2024 at 07:38:30AM -0500, Michael Roth wrote:
> > > I'm seeing it pretty consistently on kvm/next as well. Not sure if
> > > there's anything special about my config but starting a fairly basic
> > > SVM guest seems to be enough to trigger it for me on the first
> > > invocation of svm_vcpu_run().
> >
> > Hmm, can you share your config and what exactly you're doing?
> >
> > I can't reproduce with Mirsad's reproducer, probably because of .config
> > differences. I tried making all CONFIG*KVM* options =y but no
> > difference.
>
> I've reproduced against tip/master from today and attached the host
> config I used.
>
> I can reproduce with a normal SVM guest using the following command line,
> but I don't think there's anything particularly special about the QEMU
> options you use. It seems to trigger on the very first entry into the
> VMRUN path:
>
> /home/mroth/qemu-build-snp-v4-wip2/qemu-system-x86_64
> -smp 32,maxcpus=255 -cpu EPYC-Milan-v2 -overcommit cpu-pm=off
> -enable-kvm -m 4G,slots=5,maxmem=210G -vga std -nographic
> -machine pc,memory-backend=ram1
> -object memory-backend-memfd,id=ram1,size=4G,share=true,prealloc=false,reserve=false
> -device virtio-scsi-pci,id=scsi0,disable-legacy=on,iommu_platform=true
> -drive file=/home/mroth/ubuntu-18.04-seves2.qcow2,if=none,id=drive0,snapshot=on
> -device scsi-hd,id=hd0,drive=drive0,bus=scsi0.0
> -device virtio-net-pci,mac=52:54:00:6c:3c:01,netdev=netdev0,id=net0,disable-legacy=on,iommu_platform=true,romfile=
> -netdev tap,script=/home/mroth/qemu-ifup,id=netdev0
> -L /home/mroth/AMDSEV/snp-release-2024-02-22/usr/local/share/qemu
> -msg timestamp=on
> -drive if=pflash,format=raw,unit=0,file=/home/mroth/AMDSEV/snp-release-2024-02-22/usr/local/share/qemu/OVMF_CODE.fd,readonly=on
> -drive if=pflash,format=raw,unit=1,file=/home/mroth/AMDSEV/snp-release-2024-02-22/usr/local/share/qemu/OVMF_VARS.fd
>
> I can also trigger using one of the more basic KVM selftests:
>
> make INSTALL_HDR_PATH="$headers_dir" headers_install
> make -C tools/testing/selftests TARGETS="kvm" EXTRA_CFLAGS="-DDEBUG -I$headers_dir"
> sudo tools/testing/selftests/kvm/userspace_io_test

Ok, thanks, that helped.

Problem is:

7f4b5cde2409 ("kvm: Disable objtool frame pointer checking for vmenter.S")

It disables objtool checking of arch/x86/kvm/svm/vmenter.S
when CONFIG_FRAME_POINTER=y, but that also leads to objtool *not*
generating .return_sites, so the return thunk remains unpatched.

I think we need to say: ignore frame pointer checking but still generate
.return_sites.
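The diagnosis can be illustrated with a toy model of return-site patching. This is a simplified sketch under stated assumptions, not objtool or the kernel's patching code: each return site either is or is not listed in a .return_sites-style table, boot-time patching rewrites only the listed sites, and an unlisted site keeps jumping to the default thunk, which is the one that warns. The struct obj_opts fields and all helper names are hypothetical; coupled_opts() models the current situation (skipping validation also drops the table), and decoupled_opts() models the proposal to skip the checks but still emit the table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Where a return site currently jumps (toy model, hypothetical names). */
enum ret_target {
	RET_DEFAULT_THUNK,	/* build-time default; warns if executed */
	RET_SELECTED_THUNK,	/* mitigation thunk chosen at boot */
};

/* Hypothetical per-object build options, NOT real objtool flags. */
struct obj_opts {
	bool skip_validation;	/* e.g. frame pointer checking off */
	bool emit_return_sites;	/* record returns in a site table */
};

struct ret_site {
	const char *func;
	enum ret_target target;
	bool recorded;		/* listed in the .return_sites-like table? */
};

/* Build step: every return starts out pointing at the default thunk. */
void build_object(struct ret_site *sites, size_t n, struct obj_opts opts)
{
	assert(sites != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		sites[i].target = RET_DEFAULT_THUNK;
		sites[i].recorded = opts.emit_return_sites;
	}
}

/* Boot step: only sites found in the table can be patched. */
size_t patch_return_sites(struct ret_site *sites, size_t n)
{
	size_t patched = 0;

	for (size_t i = 0; i < n; i++) {
		if (sites[i].recorded) {
			sites[i].target = RET_SELECTED_THUNK;
			patched++;
		}
	}
	return patched;
}

/* Runtime: would executing this return hit the warning? */
bool would_warn(const struct ret_site *site)
{
	return site->target == RET_DEFAULT_THUNK;
}

/* Current behaviour in the model: skipping checks also drops the table. */
struct obj_opts coupled_opts(bool skip_validation)
{
	struct obj_opts o = { skip_validation, !skip_validation };
	return o;
}

/* Proposed behaviour: skip checks but still emit the table. */
struct obj_opts decoupled_opts(bool skip_validation)
{
	struct obj_opts o = { skip_validation, true };
	return o;
}
```

In this model, building a site with coupled_opts(true) leaves it unpatched, and would_warn() returns true, matching the WARN in the trace; building it with decoupled_opts(true) gets it patched at boot.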

Josh, ideas?

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2024-04-03 12:48:37

by Sean Christopherson

Subject: Re: [BUG net-next] arch/x86/kernel/cpu/bugs.c:2935: "Unpatched return thunk in use. This should not happen!" [STACKTRACE]

On Wed, Apr 03, 2024, Borislav Petkov wrote:
> On Tue, Apr 02, 2024 at 08:38:56AM -0500, Michael Roth wrote:
> > On Tue, Apr 02, 2024 at 12:15:49PM +0200, [email protected] wrote:
> > I can also trigger using one of the more basic KVM selftests:
> >
> > make INSTALL_HDR_PATH="$headers_dir" headers_install
> > make -C tools/testing/selftests TARGETS="kvm" EXTRA_CFLAGS="-DDEBUG -I$headers_dir"
> > sudo tools/testing/selftests/kvm/userspace_io_test
>
> Ok, thanks, that helped.
>
> Problem is:
>
> 7f4b5cde2409 ("kvm: Disable objtool frame pointer checking for vmenter.S")
>
> it is disabling checking of the arch/x86/kvm/svm/vmenter.S by objtool
> when CONFIG_FRAME_POINTER=y but that also leads to objtool *not*
> generating .return_sites and the return thunk remains unpatched.
>
> I think we need to say: ignore frame pointer checking but still generate
> .return_sites.

I'm guessing a general solution for OBJECT_FILES_NON_STANDARD is needed, but I
have a series to drop it for vmenter.S.

https://lore.kernel.org/all/[email protected]

2024-04-04 13:42:36

by Borislav Petkov

Subject: Re: [BUG net-next] arch/x86/kernel/cpu/bugs.c:2935: "Unpatched return thunk in use. This should not happen!" [STACKTRACE]

On Wed, Apr 03, 2024 at 03:43:02PM +0200, Mirsad Todorovac wrote:
> I wonder if I could make any additional contribution to the project.

I'd suggest:

https://kernel.org/doc/html/latest/process/2.Process.html#getting-started-with-kernel-development

and

https://kernel.org/doc/html/latest/process/development-process.html

which will give you a broader picture.

You could test linux-next, build random configs:

"make randconfig"

and see if you trigger a compiler warning, try to analyze it, understand
it and fix it.

It is a steep climb but it is a lot of fun. :-)

HTH.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2024-04-04 14:05:18

by Borislav Petkov

Subject: Re: [BUG net-next] arch/x86/kernel/cpu/bugs.c:2935: "Unpatched return thunk in use. This should not happen!" [STACKTRACE]

On Wed, Apr 03, 2024 at 05:48:22AM -0700, Sean Christopherson wrote:
> I'm guessing a general solution for OBJECT_FILES_NON_STANDARD is needed

Yeah.

> but I have a series to drop it for vmenter.S.
>
> https://lore.kernel.org/all/[email protected]

Cool, ship it.

Holler if I should test it a bit on my pile of hw.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2024-04-17 15:52:53

by Paolo Bonzini

Subject: Re: [BUG net-next] arch/x86/kernel/cpu/bugs.c:2935: "Unpatched return thunk in use. This should not happen!" [STACKTRACE]

On Thu, Apr 4, 2024 at 3:45 PM Borislav Petkov <[email protected]> wrote:
>
> On Wed, Apr 03, 2024 at 05:48:22AM -0700, Sean Christopherson wrote:
> > I'm guessing a general solution for OBJECT_FILES_NON_STANDARD is needed
>
> Yeah.
>
> > but I have a series to drop it for vmenter.S.
> >
> > https://lore.kernel.org/all/[email protected]
>
> Cool, ship it.

Applied for 6.9.

Paolo