Dear Linux folks,
On a Dell XPS 13 9360/0596KF, BIOS 2.21.0 06/02/2022, Linux 6.9-rc1+
built with
CONFIG_KCSAN=y
and `KCSAN_EARLY_ENABLE` *not* selected shows the warning below.
$ git log --no-decorate --oneline -1
928a87efa423 Merge tag 'gfs2-v6.8-fix' of
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
### First try
(Sorry, I didn’t save the direct output of `dmesg` the first time.)
```
$ journalctl -o short-monotonic -b -1 _TRANSPORT=kernel
[…]
[ 1.695447] abreu kernel: ------------[ cut here ]------------
[ 1.695463] abreu kernel: Unpatched return thunk in use. This should
not happen!
[ 1.695472] abreu kernel: WARNING: CPU: 2 PID: 89 at
arch/x86/kernel/cpu/bugs.c:2935 __warn_thunk+0x42/0x50
[ 1.695489] abreu kernel: Modules linked in: cryptd(+)
[ 1.695501] abreu kernel: CPU: 2 PID: 89 Comm: modprobe Not tainted
6.9.0-rc1+ #77
[ 1.695512] abreu kernel: Hardware name: Dell Inc. XPS 13
9360/0596KF, BIOS 2.21.0 06/02/2022
[ 1.695518] abreu kernel: RIP: 0010:__warn_thunk+0x42/0x50
[ 1.695530] abreu kernel: Code: 05 da 01 00 74 05 c3 cc cc cc cc 48
c7 c7 7d f4 60 9c e8 51 5b 37 00 48 c7 c7 a0 1b 2a 9c c6 05 60 05 da 01
01 e8 5e 38 08 00 <0f> 0b c3 cc cc cc cc 0f 1f 80 00 00 00 00 90 90>
[ 1.695539] abreu kernel: RSP: 0018:ffffabd5c0543a50 EFLAGS: 00010286
[ 1.695549] abreu kernel: RAX: 0000000000000000 RBX: ffffffffc05222c0
RCX: 0001ffffffffffff
[ 1.695556] abreu kernel: RDX: ffff99f100e4b480 RSI: 0000000000000004
RDI: ffffffff9c4ca398
[ 1.695563] abreu kernel: RBP: ffffabd5c0543aa0 R08: ffffffff9d2ce8c8
R09: 0000000000000000
[ 1.695570] abreu kernel: R10: 0001ffff9c4ca398 R11: ffffffff9a9af821
R12: ffff99f1038d7560
[ 1.695577] abreu kernel: R13: ffffffffc0522770 R14: ffffffffc0522768
R15: ffffffffc052c008
[ 1.695584] abreu kernel: FS: 00007ff3d9264040(0000)
GS:ffff99f46f100000(0000) knlGS:0000000000000000
[ 1.695593] abreu kernel: CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 1.695601] abreu kernel: CR2: 00007ff3d9341170 CR3: 0000000101838004
CR4: 00000000003706f0
[ 1.695608] abreu kernel: Call Trace:
[ 1.695613] abreu kernel: <TASK>
[ 1.695626] abreu kernel: ? __warn+0xaf/0x1c0
[ 1.695636] abreu kernel: ? __warn_thunk+0x42/0x50
[ 1.695648] abreu kernel: ? report_bug+0x1d6/0x200
[ 1.695662] abreu kernel: ? handle_bug+0x3c/0x80
[ 1.695672] abreu kernel: ? exc_invalid_op+0x17/0x70
[ 1.695682] abreu kernel: ? asm_exc_invalid_op+0x1a/0x20
[ 1.695696] abreu kernel: ? __wake_up_klogd.part.0+0x21/0x80
[ 1.695713] abreu kernel: ? __warn_thunk+0x42/0x50
[ 1.695724] abreu kernel: warn_thunk_thunk+0x1a/0x30
[ 1.695735] abreu kernel: ? do_init_module+0xf2/0x360
[ 1.695750] abreu kernel: ? _sub_I_00099_0+0x20/0x20 [cryptd]
[ 1.695801] abreu kernel: do_init_module+0xf7/0x360
[ 1.695817] abreu kernel: load_module+0x35f2/0x37d0
[ 1.695851] abreu kernel: ? init_module_from_file+0xca/0x130
[ 1.695866] abreu kernel: init_module_from_file+0xca/0x130
[ 1.695888] abreu kernel: idempotent_init_module+0x1b0/0x3d0
[ 1.695907] abreu kernel: __x64_sys_finit_module+0x88/0xe0
[ 1.695923] abreu kernel: do_syscall_64+0x85/0x1a0
[ 1.695936] abreu kernel: ? do_syscall_64+0x94/0x1a0
[ 1.695950] abreu kernel: ? fpregs_assert_state_consistent+0x7e/0x90
[ 1.695968] abreu kernel: ?
arch_exit_to_user_mode_prepare.isra.0+0x69/0xa0
[ 1.695980] abreu kernel: ? syscall_exit_to_user_mode+0x40/0xe0
[ 1.695996] abreu kernel: ? do_syscall_64+0x94/0x1a0
[ 1.696009] abreu kernel: ? irqentry_exit_to_user_mode+0x36/0xd0
[ 1.696020] abreu kernel: entry_SYSCALL_64_after_hwframe+0x6c/0x74
[ 1.696032] abreu kernel: RIP: 0033:0x7ff3d9366059
[ 1.696039] abreu kernel: Code: 08 89 e8 5b 5d c3 66 2e 0f 1f 84 00
00 00 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c
8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8f 1d 0d 00>
[ 1.696048] abreu kernel: RSP: 002b:00007ffe37361958 EFLAGS: 00000246
ORIG_RAX: 0000000000000139
[ 1.696059] abreu kernel: RAX: ffffffffffffffda RBX: 0000555d44f65b40
RCX: 00007ff3d9366059
[ 1.696066] abreu kernel: RDX: 0000000000000000 RSI: 0000555d239659a6
RDI: 0000000000000000
[ 1.696073] abreu kernel: RBP: 0000000000000000 R08: 0000000000000060
R09: 0000555d44f671f0
[ 1.696080] abreu kernel: R10: 0000000000000038 R11: 0000000000000246
R12: 0000555d239659a6
[ 1.696087] abreu kernel: R13: 0000000000040000 R14: 0000555d44f65f30
R15: 0000000000000000
[ 1.696101] abreu kernel: </TASK>
[ 1.696106] abreu kernel: ---[ end trace 0000000000000000 ]---
```
```
$ ./scripts/decodecode < first_boot.txt
[ 1.696039] Code: 08 89 e8 5b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 48
89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05
<48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8f 1d 0d 00 f7 d8 64 89 01 48
All code
========
0: 08 89 e8 5b 5d c3 or %cl,-0x3ca2a418(%rcx)
6: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1)
d: 00 00 00
10: 90 nop
11: 48 89 f8 mov %rdi,%rax
14: 48 89 f7 mov %rsi,%rdi
17: 48 89 d6 mov %rdx,%rsi
1a: 48 89 ca mov %rcx,%rdx
1d: 4d 89 c2 mov %r8,%r10
20: 4d 89 c8 mov %r9,%r8
23: 4c 8b 4c 24 08 mov 0x8(%rsp),%r9
28: 0f 05 syscall
2a:* 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax <--
trapping instruction
30: 73 01 jae 0x33
32: c3 ret
33: 48 8b 0d 8f 1d 0d 00 mov 0xd1d8f(%rip),%rcx # 0xd1dc9
3a: f7 d8 neg %eax
3c: 64 89 01 mov %eax,%fs:(%rcx)
3f: 48 rex.W
Code starting with the faulting instruction
===========================================
0: 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax
6: 73 01 jae 0x9
8: c3 ret
9: 48 8b 0d 8f 1d 0d 00 mov 0xd1d8f(%rip),%rcx # 0xd1d9f
10: f7 d8 neg %eax
12: 64 89 01 mov %eax,%fs:(%rcx)
15: 48 rex.W
```
### Second try
The Code line is a little different:
````
[ 1.685474] ------------[ cut here ]------------
[ 1.685482] Unpatched return thunk in use. This should not happen!
[ 1.685498] WARNING: CPU: 1 PID: 88 at
arch/x86/kernel/cpu/bugs.c:2935 __warn_thunk+0x42/0x50
[ 1.685515] Modules linked in: cryptd(+)
[ 1.685527] CPU: 1 PID: 88 Comm: modprobe Not tainted 6.9.0-rc1+ #77
[ 1.685537] Hardware name: Dell Inc. XPS 13 9360/0596KF, BIOS 2.21.0
06/02/2022
[ 1.685544] RIP: 0010:__warn_thunk+0x42/0x50
[ 1.685555] Code: 05 da 01 00 74 05 c3 cc cc cc cc 48 c7 c7 7d f4 20
9b e8 51 5b 37 00 48 c7 c7 a0 1b ea 9a c6 05 60 05 da 01 01 e8 5e 38 08
00 <0f> 0b c3 cc cc cc cc 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90
[ 1.685565] RSP: 0018:ffffb6a6c054bac0 EFLAGS: 00010282
[ 1.685574] RAX: 0000000000000000 RBX: ffffffffc04692c0 RCX:
0001ffffffffffff
[ 1.685582] RDX: ffff996b43028000 RSI: 0000000000000004 RDI:
ffffffff9b0ca398
[ 1.685589] RBP: ffffb6a6c054bb10 R08: ffffffff9bece8c8 R09:
0000000000000000
[ 1.685596] R10: 0001ffff9b0ca398 R11: ffffffff995af821 R12:
ffff996b4670d4a0
[ 1.685603] R13: ffffffffc0469770 R14: ffffffffc0469768 R15:
ffffffffc0473008
[ 1.685610] FS: 00007f8cee712040(0000) GS:ffff996eaf080000(0000)
knlGS:0000000000000000
[ 1.685618] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1.685626] CR2: 00007ffda89fcb28 CR3: 00000001013f2004 CR4:
00000000003706f0
[ 1.685633] Call Trace:
[ 1.685638] <TASK>
[ 1.685644] ? __warn+0xaf/0x1c0
[ 1.685655] ? __warn_thunk+0x42/0x50
[ 1.685667] ? report_bug+0x1d6/0x200
[ 1.685680] ? handle_bug+0x3c/0x80
[ 1.685690] ? exc_invalid_op+0x17/0x70
[ 1.685700] ? asm_exc_invalid_op+0x1a/0x20
[ 1.685714] ? __wake_up_klogd.part.0+0x21/0x80
[ 1.685731] ? __warn_thunk+0x42/0x50
[ 1.685742] warn_thunk_thunk+0x1a/0x30
[ 1.685752] ? do_init_module+0xf2/0x360
[ 1.685767] ? _sub_I_00099_0+0x20/0x20 [cryptd]
[ 1.685818] do_init_module+0xf7/0x360
[ 1.685835] load_module+0x35f2/0x37d0
[ 1.685868] ? init_module_from_file+0xca/0x130
[ 1.685883] init_module_from_file+0xca/0x130
[ 1.685905] idempotent_init_module+0x1b0/0x3d0
[ 1.685924] __x64_sys_finit_module+0x88/0xe0
[ 1.685940] do_syscall_64+0x85/0x1a0
[ 1.685953] ? do_syscall_64+0x94/0x1a0
[ 1.685965] ? do_syscall_64+0x94/0x1a0
[ 1.685978] ? arch_exit_to_user_mode_prepare.isra.0+0x69/0xa0
[ 1.685990] ? irqentry_exit_to_user_mode+0x36/0xd0
[ 1.686006] entry_SYSCALL_64_after_hwframe+0x6c/0x74
[ 1.686017] RIP: 0033:0x7f8cee814059
[ 1.686025] Code: 08 89 e8 5b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90
48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f
05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8f 1d 0d 00 f7 d8 64 89 01 48
[ 1.686034] RSP: 002b:00007ffda89ffae8 EFLAGS: 00000246 ORIG_RAX:
0000000000000139
[ 1.686045] RAX: ffffffffffffffda RBX: 000055ec481f3b40 RCX:
00007f8cee814059
[ 1.686052] RDX: 0000000000000000 RSI: 000055ec13d9d9a6 RDI:
0000000000000000
[ 1.686059] RBP: 0000000000000000 R08: 0000000000000060 R09:
000055ec481f51f0
[ 1.686066] R10: 0000000000000038 R11: 0000000000000246 R12:
000055ec13d9d9a6
[ 1.686073] R13: 0000000000040000 R14: 000055ec481f3f30 R15:
0000000000000000
[ 1.686087] </TASK>
[ 1.686091] ---[ end trace 0000000000000000 ]---
```
```
$ ./scripts/decodecode < second_try.txt
Code: 05 da 01 00 74 05 c3 cc cc cc cc 48 c7 c7 7d f4 20 9b e8 51 5b 37
00 48 c7 c7 a0 1b ea 9a c6 05 60 05 da 01 01 e8 5e 38 08 00 <0f> 0b c3
cc cc cc cc 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90
All code
========
0: 05 da 01 00 74 add $0x740001da,%eax
5: 05 c3 cc cc cc add $0xccccccc3,%eax
a: cc int3
b: 48 c7 c7 7d f4 20 9b mov $0xffffffff9b20f47d,%rdi
12: e8 51 5b 37 00 call 0x375b68
17: 48 c7 c7 a0 1b ea 9a mov $0xffffffff9aea1ba0,%rdi
1e: c6 05 60 05 da 01 01 movb $0x1,0x1da0560(%rip) #
0x1da0585
25: e8 5e 38 08 00 call 0x83888
2a:* 0f 0b ud2 <-- trapping instruction
2c: c3 ret
2d: cc int3
2e: cc int3
2f: cc int3
30: cc int3
31: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
38: 90 nop
39: 90 nop
3a: 90 nop
3b: 90 nop
3c: 90 nop
3d: 90 nop
3e: 90 nop
3f: 90 nop
Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: c3 ret
3: cc int3
4: cc int3
5: cc int3
6: cc int3
7: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
e: 90 nop
f: 90 nop
10: 90 nop
11: 90 nop
12: 90 nop
13: 90 nop
14: 90 nop
15: 90 nop
```
Please find the full Linux messages of the second try attached.
The problem does not happen with QEMU emulator version 8.2.2 (Debian
1:8.2.2+ds-2) with machine *q35*.
Kind regards,
Paul
On Tue, Mar 26, 2024 at 01:40:57PM +0100, Paul Menzel wrote:
> On a Dell XPS 13 9360/0596KF, BIOS 2.21.0 06/02/2022, Linux 6.9-rc1+ built
> with
>
> CONFIG_KCSAN=y
Are you saying that with KCSAN=n, it doesn't happen?
From the splat lt looks like it complains when loading that cryptd
module. But that thing looks like a normal module to me so it should
be fine actually.
Hmm.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
Dear Borislav,
Am 26.03.24 um 13:49 schrieb Borislav Petkov:
> On Tue, Mar 26, 2024 at 01:40:57PM +0100, Paul Menzel wrote:
>> On a Dell XPS 13 9360/0596KF, BIOS 2.21.0 06/02/2022, Linux 6.9-rc1+ built
>> with
>>
>> CONFIG_KCSAN=y
>
> Are you saying that with KCSAN=n, it doesn't happen?
Indeed. Please find the messages from the journal attached.
> From the splat lt looks like it complains when loading that cryptd
> module. But that thing looks like a normal module to me so it should
> be fine actually.
>
> Hmm.
Kind regards,
Paul
On 26.03.24 г. 14:40 ч., Paul Menzel wrote:
> Dear Linux folks,
>
>
> On a Dell XPS 13 9360/0596KF, BIOS 2.21.0 06/02/2022, Linux 6.9-rc1+
> built with
>
> CONFIG_KCSAN=y
>
So the problem happens when KCSAN=y CONFIG_CONSTRUCTORS is also enabled
and this results in an indirect call in do_mod_ctors():
mod->ctors[i]();
When KCSAN is disabled, do_mod_ctors is empty, hence the warning is not
printed.
On Tue, Mar 26, 2024 at 04:08:32PM +0200, Nikolay Borisov wrote:
> So the problem happens when KCSAN=y CONFIG_CONSTRUCTORS is also enabled and
> this results in an indirect call in do_mod_ctors():
>
> mod->ctors[i]();
>
>
> When KCSAN is disabled, do_mod_ctors is empty, hence the warning is not
> printed.
Yeah, KCSAN is doing something weird. I was able to stop the guest when
the warning fires. Here's what I see:
The callstack when it fires:
#0 warn_thunk_thunk () at arch/x86/entry/entry.S:48
#1 0xffffffff811a98f9 in do_mod_ctors (mod=0xffffffffa00052c0) at kernel/module/main.c:2462
#2 do_init_module (mod=mod@entry=0xffffffffa00052c0) at kernel/module/main.c:2535
#3 0xffffffff811ad2e1 in load_module (info=info@entry=0xffffc900004c7dd0, uargs=uargs@entry=0x564c103dd4a0 "", flags=flags@entry=0) at kernel/module/main.c:3001
#4 0xffffffff811ad8ef in init_module_from_file (f=f@entry=0xffff8880151c5d00, uargs=uargs@entry=0x564c103dd4a0 "", flags=flags@entry=0) at kernel/module/main.c:3168
#5 0xffffffff811adade in idempotent_init_module (f=f@entry=0xffff8880151c5d00, uargs=uargs@entry=0x564c103dd4a0 "", flags=flags@entry=0) at kernel/module/main.c:3185
#6 0xffffffff811adec9 in __do_sys_finit_module (flags=0, uargs=0x564c103dd4a0 "", fd=3) at kernel/module/main.c:3206
#7 __se_sys_finit_module (flags=<optimized out>, uargs=94884689990816, fd=3) at kernel/module/main.c:3189
#8 __x64_sys_finit_module (regs=<optimized out>) at kernel/module/main.c:3189
#9 0xffffffff81fccdff in do_syscall_x64 (nr=<optimized out>, regs=0xffffc900004c7f58) at arch/x86/entry/common.c:52
#10 do_syscall_64 (regs=0xffffc900004c7f58, nr=<optimized out>) at arch/x86/entry/common.c:83
#11 0xffffffff82000126 in entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:120
#12 0x0000000000000000 in ?? ()
Now, when we look at frame #1:
ffffffff811a9800 <do_init_module>:
ffffffff811a9800: e8 bb 36 ee ff call ffffffff8108cec0 <__fentry__>
ffffffff811a9805: 41 57 push %r15
ffffffff811a9807: 41 56 push %r14
ffffffff811a9809: 41 55 push %r13
ffffffff811a980b: 41 54 push %r12
ffffffff811a980d: 55 push %rbp
ffffffff811a980e: 53 push %rbx
ffffffff811a980f: 48 89 fb mov %rdi,%rbx
ffffffff811a9812: 48 c7 c7 c8 9f 6a 82 mov $0xffffffff826a9fc8,%rdi
ffffffff811a9819: 48 83 ec 08 sub $0x8,%rsp
ffffffff811a981d: e8 5e 51 0d 00 call ffffffff8127e980 <__tsan_read8>
ffffffff811a9822: 48 8b 3d 9f 07 50 01 mov 0x150079f(%rip),%rdi # ffffffff826a9fc8 <kmalloc_caches+0x28>
..
ffffffff811a98ec: e8 8f 50 0d 00 call ffffffff8127e980 <__tsan_read8>
ffffffff811a98f1: 49 8b 07 mov (%r15),%rax
ffffffff811a98f4: e8 27 d1 e3 00 call ffffffff81fe6a20 <__x86_indirect_thunk_array>
ffffffff811a98f9: 4c 89 ef mov %r13,%rdi
there's that call to the indirect array. Which is in the static kernel image:
ffffffff81fe6a20 <__x86_indirect_thunk_array>:
ffffffff81fe6a20: e8 01 00 00 00 call ffffffff81fe6a26 <__x86_indirect_thunk_array+0x6>
ffffffff81fe6a25: cc int3
ffffffff81fe6a26: 48 89 04 24 mov %rax,(%rsp)
ffffffff81fe6a2a: e9 b1 07 00 00 jmp ffffffff81fe71e0 <__x86_return_thunk>
where you'd think, ah, yes, that's why it fires.
BUT! The live kernel image in gdb looks like this:
Dump of assembler code for function __x86_indirect_thunk_array:
0xffffffff81fe6a20 <+0>: call 0xffffffff81fe6a26 <__x86_indirect_thunk_array+6>
0xffffffff81fe6a25 <+5>: int3
0xffffffff81fe6a26 <+6>: mov %rax,(%rsp)
0xffffffff81fe6a2a <+10>: jmp 0xffffffff81fe70a0 <srso_return_thunk>
so the right thunk is already there!
And yet, the warning still fired.
I need to singlestep this whole loading bit more carefully.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On 26.03.24 г. 17:52 ч., Borislav Petkov wrote:
> On Tue, Mar 26, 2024 at 04:08:32PM +0200, Nikolay Borisov wrote:
>> So the problem happens when KCSAN=y CONFIG_CONSTRUCTORS is also enabled and
>> this results in an indirect call in do_mod_ctors():
>>
>> mod->ctors[i]();
>>
>>
>> When KCSAN is disabled, do_mod_ctors is empty, hence the warning is not
>> printed.
>
> Yeah, KCSAN is doing something weird. I was able to stop the guest when
> the warning fires. Here's what I see:
>
> The callstack when it fires:
>
> #0 warn_thunk_thunk () at arch/x86/entry/entry.S:48
> #1 0xffffffff811a98f9 in do_mod_ctors (mod=0xffffffffa00052c0) at kernel/module/main.c:2462
> #2 do_init_module (mod=mod@entry=0xffffffffa00052c0) at kernel/module/main.c:2535
> #3 0xffffffff811ad2e1 in load_module (info=info@entry=0xffffc900004c7dd0, uargs=uargs@entry=0x564c103dd4a0 "", flags=flags@entry=0) at kernel/module/main.c:3001
> #4 0xffffffff811ad8ef in init_module_from_file (f=f@entry=0xffff8880151c5d00, uargs=uargs@entry=0x564c103dd4a0 "", flags=flags@entry=0) at kernel/module/main.c:3168
> #5 0xffffffff811adade in idempotent_init_module (f=f@entry=0xffff8880151c5d00, uargs=uargs@entry=0x564c103dd4a0 "", flags=flags@entry=0) at kernel/module/main.c:3185
> #6 0xffffffff811adec9 in __do_sys_finit_module (flags=0, uargs=0x564c103dd4a0 "", fd=3) at kernel/module/main.c:3206
> #7 __se_sys_finit_module (flags=<optimized out>, uargs=94884689990816, fd=3) at kernel/module/main.c:3189
> #8 __x64_sys_finit_module (regs=<optimized out>) at kernel/module/main.c:3189
> #9 0xffffffff81fccdff in do_syscall_x64 (nr=<optimized out>, regs=0xffffc900004c7f58) at arch/x86/entry/common.c:52
> #10 do_syscall_64 (regs=0xffffc900004c7f58, nr=<optimized out>) at arch/x86/entry/common.c:83
> #11 0xffffffff82000126 in entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:120
> #12 0x0000000000000000 in ?? ()
>
> Now, when we look at frame #1:
>
> ffffffff811a9800 <do_init_module>:
> ffffffff811a9800: e8 bb 36 ee ff call ffffffff8108cec0 <__fentry__>
> ffffffff811a9805: 41 57 push %r15
> ffffffff811a9807: 41 56 push %r14
> ffffffff811a9809: 41 55 push %r13
> ffffffff811a980b: 41 54 push %r12
> ffffffff811a980d: 55 push %rbp
> ffffffff811a980e: 53 push %rbx
> ffffffff811a980f: 48 89 fb mov %rdi,%rbx
> ffffffff811a9812: 48 c7 c7 c8 9f 6a 82 mov $0xffffffff826a9fc8,%rdi
> ffffffff811a9819: 48 83 ec 08 sub $0x8,%rsp
> ffffffff811a981d: e8 5e 51 0d 00 call ffffffff8127e980 <__tsan_read8>
> ffffffff811a9822: 48 8b 3d 9f 07 50 01 mov 0x150079f(%rip),%rdi # ffffffff826a9fc8 <kmalloc_caches+0x28>
>
> ...
>
> ffffffff811a98ec: e8 8f 50 0d 00 call ffffffff8127e980 <__tsan_read8>
> ffffffff811a98f1: 49 8b 07 mov (%r15),%rax
> ffffffff811a98f4: e8 27 d1 e3 00 call ffffffff81fe6a20 <__x86_indirect_thunk_array>
> ffffffff811a98f9: 4c 89 ef mov %r13,%rdi
>
> there's that call to the indirect array. Which is in the static kernel image:
>
> ffffffff81fe6a20 <__x86_indirect_thunk_array>:
> ffffffff81fe6a20: e8 01 00 00 00 call ffffffff81fe6a26 <__x86_indirect_thunk_array+0x6>
> ffffffff81fe6a25: cc int3
> ffffffff81fe6a26: 48 89 04 24 mov %rax,(%rsp)
> ffffffff81fe6a2a: e9 b1 07 00 00 jmp ffffffff81fe71e0 <__x86_return_thunk>
>
> where you'd think, ah, yes, that's why it fires.
>
> BUT! The live kernel image in gdb looks like this:
>
> Dump of assembler code for function __x86_indirect_thunk_array:
> 0xffffffff81fe6a20 <+0>: call 0xffffffff81fe6a26 <__x86_indirect_thunk_array+6>
> 0xffffffff81fe6a25 <+5>: int3
> 0xffffffff81fe6a26 <+6>: mov %rax,(%rsp)
> 0xffffffff81fe6a2a <+10>: jmp 0xffffffff81fe70a0 <srso_return_thunk>
>
> so the right thunk is already there!
>
> And yet, the warning still fired.
But you eventually call the address that was in %RAX from within
srso_return_thunk, so it's likely that's where the warning is triggered.
As far as I managed to see that address is supposed to be some compiler
generated constructors that calls tsan_init. Dumping the .init_array
contains:
.type _sub_I_00099_0, @function
25 _sub_I_00099_0:
24 endbr64
23 call __tsan_init #
22 jmp __x86_return_thunk
21 .size _sub_I_00099_0, .-_sub_I_00099_0
20 .section .init_array.00099,"aw"
19 .align 8
18 .quad _sub_I_00099_0
17 .ident "GCC: (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0"
16 .section .note.GNU-stack,"",@progbits
15 .section .note.gnu.property,"a"
14 .align 8
13 .long 1f - 0f
12 .long 4f - 1f
11 .long 5
10 0:
9 .string "GNU"
8 1:
7 .align 8
6 .long 0xc0000002
5 .long 3f - 2f
4 2:
3 .long 0x1
2 3:
1 .align 8
0 4:
So this _sub_I_00099_0 is the compiler generated ctors that is
likely not patched. What's strange is that when adding debugging code I
see that 2 ctors are being executed and only the 2nd one fires:
[ 7.635418] in do_mod_ctors
[ 7.635425] calling 0 ctor 00000000aa7a443a
[ 7.635430] called 0 ctor
[ 7.635433] calling 1 ctor 00000000fe9d0d54
[ 7.635437] ------------[ cut here ]------------
[ 7.635441] Unpatched return thunk in use. This should not happen!
>
> I need to singlestep this whole loading bit more carefully.
>
> Thx.
>
On Tue, Mar 26, 2024 at 06:04:26PM +0200, Nikolay Borisov wrote:
> So this _sub_I_00099_0 is the compiler generated ctors that is likely
> not patched. What's strange is that when adding debugging code I see that 2
> ctors are being executed and only the 2nd one fires:
>
> [ 7.635418] in do_mod_ctors
> [ 7.635425] calling 0 ctor 00000000aa7a443a
> [ 7.635430] called 0 ctor
> [ 7.635433] calling 1 ctor 00000000fe9d0d54
> [ 7.635437] ------------[ cut here ]------------
> [ 7.635441] Unpatched return thunk in use. This should not happen!
.. and this is just the beginning of the rabbit hole. David and I went
all the way down.
Turns out that objtool runs on the .o files and creates the
return_sites just fine but then the module building dance creates an
intermediary *.mod.c file and when that thing is built, KCSAN would
cause the addition of *another* constructor to .text.startup in the
module.
The .o file has one:
-------------------
Disassembly of section .text.startup:
..
0000000000000010 <_sub_I_00099_0>:
10: f3 0f 1e fa endbr64
14: e8 00 00 00 00 call 19 <_sub_I_00099_0+0x9>
15: R_X86_64_PLT32 __tsan_init-0x4
19: e9 00 00 00 00 jmp 1e <__UNIQUE_ID___addressable_cryptd_alloc_aead349+0x6>
1a: R_X86_64_PLT32 __x86_return_thunk-0x4
-------------------
while the .ko file has two:
-------------------
Disassembly of section .text.startup:
0000000000000010 <_sub_I_00099_0>:
10: f3 0f 1e fa endbr64
14: e8 00 00 00 00 call 19 <_sub_I_00099_0+0x9>
15: R_X86_64_PLT32 __tsan_init-0x4
19: e9 00 00 00 00 jmp 1e <_sub_I_00099_0+0xe>
1a: R_X86_64_PLT32 __x86_return_thunk-0x4
..
0000000000000030 <_sub_I_00099_0>:
30: f3 0f 1e fa endbr64
34: e8 00 00 00 00 call 39 <_sub_I_00099_0+0x9>
35: R_X86_64_PLT32 __tsan_init-0x4
39: e9 00 00 00 00 jmp 3e <__ksymtab_cryptd_alloc_ahash+0x2>
3a: R_X86_64_PLT32 __x86_return_thunk-0x4
-------------------
Once we've figured that out, finding a fix is easy:
diff --git a/scripts/Makefile.modfinal b/scripts/Makefile.modfinal
index 8568d256d6fb..79fcf2731686 100644
--- a/scripts/Makefile.modfinal
+++ b/scripts/Makefile.modfinal
@@ -23,7 +23,7 @@ modname = $(notdir $(@:.mod.o=))
part-of-module = y
quiet_cmd_cc_o_c = CC [M] $@
- cmd_cc_o_c = $(CC) $(filter-out $(CC_FLAGS_CFI) $(CFLAGS_GCOV), $(c_flags)) -c -o $@ $<
+ cmd_cc_o_c = $(CC) $(filter-out $(CC_FLAGS_CFI) $(CFLAGS_GCOV) $(CFLAGS_KCSAN), $(c_flags)) -c -o $@ $<
%.mod.o: %.mod.c FORCE
$(call if_changed_dep,cc_o_c)
However, I'm not sure.
I wanna say that since those are constructors then we don't care about
dynamic races there so we could exclude them from KCSAN.
If not, I could disable the warning on KCSAN. I'm thinking no one would
run KCSAN in production...
A third option would be to make objtool run on .ko files. Yeah, that
would be fun for Josh. :-P
I'd look into the direction of melver for suggestions here.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Tue, 26 Mar 2024 at 20:12, Borislav Petkov <[email protected]> wrote:
>
> On Tue, Mar 26, 2024 at 06:04:26PM +0200, Nikolay Borisov wrote:
> > So this _sub_I_00099_0 is the compiler generated ctors that is likely
> > not patched. What's strange is that when adding debugging code I see that 2
> > ctors are being executed and only the 2nd one fires:
> >
> > [ 7.635418] in do_mod_ctors
> > [ 7.635425] calling 0 ctor 00000000aa7a443a
> > [ 7.635430] called 0 ctor
> > [ 7.635433] calling 1 ctor 00000000fe9d0d54
> > [ 7.635437] ------------[ cut here ]------------
> > [ 7.635441] Unpatched return thunk in use. This should not happen!
>
> ... and this is just the beginning of the rabbit hole. David and I went
> all the way down.
>
> Turns out that objtool runs on the .o files and creates the
> .return_sites just fine but then the module building dance creates an
> intermediary *.mod.c file and when that thing is built, KCSAN would
> cause the addition of *another* constructor to .text.startup in the
> module.
>
> The .o file has one:
>
> -------------------
> Disassembly of section .text.startup:
>
> ...
>
> 0000000000000010 <_sub_I_00099_0>:
> 10: f3 0f 1e fa endbr64
> 14: e8 00 00 00 00 call 19 <_sub_I_00099_0+0x9>
> 15: R_X86_64_PLT32 __tsan_init-0x4
> 19: e9 00 00 00 00 jmp 1e <__UNIQUE_ID___addressable_cryptd_alloc_aead349+0x6>
> 1a: R_X86_64_PLT32 __x86_return_thunk-0x4
> -------------------
>
>
> while the .ko file has two:
>
> -------------------
> Disassembly of section .text.startup:
>
> 0000000000000010 <_sub_I_00099_0>:
> 10: f3 0f 1e fa endbr64
> 14: e8 00 00 00 00 call 19 <_sub_I_00099_0+0x9>
> 15: R_X86_64_PLT32 __tsan_init-0x4
> 19: e9 00 00 00 00 jmp 1e <_sub_I_00099_0+0xe>
> 1a: R_X86_64_PLT32 __x86_return_thunk-0x4
>
> ...
>
> 0000000000000030 <_sub_I_00099_0>:
> 30: f3 0f 1e fa endbr64
> 34: e8 00 00 00 00 call 39 <_sub_I_00099_0+0x9>
> 35: R_X86_64_PLT32 __tsan_init-0x4
> 39: e9 00 00 00 00 jmp 3e <__ksymtab_cryptd_alloc_ahash+0x2>
> 3a: R_X86_64_PLT32 __x86_return_thunk-0x4
> -------------------
>
> Once we've figured that out, finding a fix is easy:
Thanks for figuring this one out!
> diff --git a/scripts/Makefile.modfinal b/scripts/Makefile.modfinal
> index 8568d256d6fb..79fcf2731686 100644
> --- a/scripts/Makefile.modfinal
> +++ b/scripts/Makefile.modfinal
> @@ -23,7 +23,7 @@ modname = $(notdir $(@:.mod.o=))
> part-of-module = y
>
> quiet_cmd_cc_o_c = CC [M] $@
> - cmd_cc_o_c = $(CC) $(filter-out $(CC_FLAGS_CFI) $(CFLAGS_GCOV), $(c_flags)) -c -o $@ $<
> + cmd_cc_o_c = $(CC) $(filter-out $(CC_FLAGS_CFI) $(CFLAGS_GCOV) $(CFLAGS_KCSAN), $(c_flags)) -c -o $@ $<
This looks reasonable.
> %.mod.o: %.mod.c FORCE
> $(call if_changed_dep,cc_o_c)
>
> However, I'm not sure.
>
> I wanna say that since those are constructors then we don't care about
> dynamic races there so we could exclude them from KCSAN.
Yeah, we can just exclude all the code from the auto-generated mod.c
files from KCSAN instrumentation. It looks like they just contain
global metadata for the module, and no other code. The auto-generated
constructors don't contain much interesting code (unlikely they'd
contain data races we'd ever care about).
> If not, I could disable the warning on KCSAN. I'm thinking no one would
> run KCSAN in production...
>
> A third option would be to make objtool run on .ko files. Yeah, that
> would be fun for Josh. :-P
>
> I'd look into the direction of melver for suggestions here.
I think just removing instrumentation from the mod.c files is very reasonable.
Thanks,
-- Marco
On Tue, Mar 26, 2024 at 08:33:31PM +0100, Marco Elver wrote:
> I think just removing instrumentation from the mod.c files is very reasonable.
Thanks!
@Masahiro: pls send this to Linus now as the commit which adds the
warning is in 6.9 so we should make sure we release it with all issues
fixed.
Thx.
---
From: "Borislav Petkov (AMD)" <[email protected]>
Date: Tue, 26 Mar 2024 21:11:01 +0100
When KCSAN and CONSTRUCTORS are enabled, one can trigger the
"Unpatched return thunk in use. This should not happen!"
catch-all warning.
Usually, when objtool runs on the .o objects, it does generate a section
return_sites which contains all offsets in the objects to the return
thunks of the functions present there. Those return thunks then get
patched at runtime by the alternatives.
KCSAN and CONSTRUCTORS add this to the the object file's .text.startup
section:
-------------------
Disassembly of section .text.startup:
...
0000000000000010 <_sub_I_00099_0>:
10: f3 0f 1e fa endbr64
14: e8 00 00 00 00 call 19 <_sub_I_00099_0+0x9>
15: R_X86_64_PLT32 __tsan_init-0x4
19: e9 00 00 00 00 jmp 1e <__UNIQUE_ID___addressable_cryptd_alloc_aead349+0x6>
1a: R_X86_64_PLT32 __x86_return_thunk-0x4
-------------------
which, if it is built as a module goes through the intermediary stage of
creating a <module>.mod.c file which, when translated, receives a second
constructor:
-------------------
Disassembly of section .text.startup:
0000000000000010 <_sub_I_00099_0>:
10: f3 0f 1e fa endbr64
14: e8 00 00 00 00 call 19 <_sub_I_00099_0+0x9>
15: R_X86_64_PLT32 __tsan_init-0x4
19: e9 00 00 00 00 jmp 1e <_sub_I_00099_0+0xe>
1a: R_X86_64_PLT32 __x86_return_thunk-0x4
...
0000000000000030 <_sub_I_00099_0>:
30: f3 0f 1e fa endbr64
34: e8 00 00 00 00 call 39 <_sub_I_00099_0+0x9>
35: R_X86_64_PLT32 __tsan_init-0x4
39: e9 00 00 00 00 jmp 3e <__ksymtab_cryptd_alloc_ahash+0x2>
3a: R_X86_64_PLT32 __x86_return_thunk-0x4
-------------------
in the .ko file.
Objtool has run already so that second constructor's return thunk cannot
be added to the .return_sites section and thus the return thunk remains
unpatched and the warning rightfully fires.
Drop KCSAN flags from the mod.c generation stage as those constructors
do not contain data races one would be interested about.
Debugged together with David Kaplan <[email protected]> and Nikolay
Borisov <[email protected]>.
Reported-by: Paul Menzel <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
scripts/Makefile.modfinal | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/scripts/Makefile.modfinal b/scripts/Makefile.modfinal
index 8568d256d6fb..79fcf2731686 100644
--- a/scripts/Makefile.modfinal
+++ b/scripts/Makefile.modfinal
@@ -23,7 +23,7 @@ modname = $(notdir $(@:.mod.o=))
part-of-module = y
quiet_cmd_cc_o_c = CC [M] $@
- cmd_cc_o_c = $(CC) $(filter-out $(CC_FLAGS_CFI) $(CFLAGS_GCOV), $(c_flags)) -c -o $@ $<
+ cmd_cc_o_c = $(CC) $(filter-out $(CC_FLAGS_CFI) $(CFLAGS_GCOV) $(CFLAGS_KCSAN), $(c_flags)) -c -o $@ $<
%.mod.o: %.mod.c FORCE
$(call if_changed_dep,cc_o_c)
--
2.43.0
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Tue, 26 Mar 2024 at 21:26, Borislav Petkov <[email protected]> wrote:
>
> On Tue, Mar 26, 2024 at 08:33:31PM +0100, Marco Elver wrote:
> > I think just removing instrumentation from the mod.c files is very reasonable.
>
> Thanks!
>
> @Masahiro: pls send this to Linus now as the commit which adds the
> warning is in 6.9 so we should make sure we release it with all issues
> fixed.
>
> Thx.
>
> ---
> From: "Borislav Petkov (AMD)" <[email protected]>
> Date: Tue, 26 Mar 2024 21:11:01 +0100
>
> When KCSAN and CONSTRUCTORS are enabled, one can trigger the
>
> "Unpatched return thunk in use. This should not happen!"
>
> catch-all warning.
>
> Usually, when objtool runs on the .o objects, it does generate a section
> .return_sites which contains all offsets in the objects to the return
> thunks of the functions present there. Those return thunks then get
> patched at runtime by the alternatives.
>
> KCSAN and CONSTRUCTORS add this to the the object file's .text.startup
> section:
>
> -------------------
> Disassembly of section .text.startup:
>
> ...
>
> 0000000000000010 <_sub_I_00099_0>:
> 10: f3 0f 1e fa endbr64
> 14: e8 00 00 00 00 call 19 <_sub_I_00099_0+0x9>
> 15: R_X86_64_PLT32 __tsan_init-0x4
> 19: e9 00 00 00 00 jmp 1e <__UNIQUE_ID___addressable_cryptd_alloc_aead349+0x6>
> 1a: R_X86_64_PLT32 __x86_return_thunk-0x4
> -------------------
>
> which, if it is built as a module goes through the intermediary stage of
> creating a <module>.mod.c file which, when translated, receives a second
> constructor:
>
> -------------------
> Disassembly of section .text.startup:
>
> 0000000000000010 <_sub_I_00099_0>:
> 10: f3 0f 1e fa endbr64
> 14: e8 00 00 00 00 call 19 <_sub_I_00099_0+0x9>
> 15: R_X86_64_PLT32 __tsan_init-0x4
> 19: e9 00 00 00 00 jmp 1e <_sub_I_00099_0+0xe>
> 1a: R_X86_64_PLT32 __x86_return_thunk-0x4
>
> ...
>
> 0000000000000030 <_sub_I_00099_0>:
> 30: f3 0f 1e fa endbr64
> 34: e8 00 00 00 00 call 39 <_sub_I_00099_0+0x9>
> 35: R_X86_64_PLT32 __tsan_init-0x4
> 39: e9 00 00 00 00 jmp 3e <__ksymtab_cryptd_alloc_ahash+0x2>
> 3a: R_X86_64_PLT32 __x86_return_thunk-0x4
> -------------------
>
> in the .ko file.
>
> Objtool has run already so that second constructor's return thunk cannot
> be added to the .return_sites section and thus the return thunk remains
> unpatched and the warning rightfully fires.
>
> Drop KCSAN flags from the mod.c generation stage as those constructors
> do not contain data races one would be interested about.
>
> Debugged together with David Kaplan <[email protected]> and Nikolay
> Borisov <[email protected]>.
>
> Reported-by: Paul Menzel <[email protected]>
> Signed-off-by: Borislav Petkov (AMD) <[email protected]>
> Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Marco Elver <[email protected]>
Thanks!
> ---
> scripts/Makefile.modfinal | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/scripts/Makefile.modfinal b/scripts/Makefile.modfinal
> index 8568d256d6fb..79fcf2731686 100644
> --- a/scripts/Makefile.modfinal
> +++ b/scripts/Makefile.modfinal
> @@ -23,7 +23,7 @@ modname = $(notdir $(@:.mod.o=))
> part-of-module = y
>
> quiet_cmd_cc_o_c = CC [M] $@
> - cmd_cc_o_c = $(CC) $(filter-out $(CC_FLAGS_CFI) $(CFLAGS_GCOV), $(c_flags)) -c -o $@ $<
> + cmd_cc_o_c = $(CC) $(filter-out $(CC_FLAGS_CFI) $(CFLAGS_GCOV) $(CFLAGS_KCSAN), $(c_flags)) -c -o $@ $<
>
> %.mod.o: %.mod.c FORCE
> $(call if_changed_dep,cc_o_c)
> --
> 2.43.0
>
>
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
On 26.03.24 г. 22:25 ч., Borislav Petkov wrote:
> On Tue, Mar 26, 2024 at 08:33:31PM +0100, Marco Elver wrote:
>> I think just removing instrumentation from the mod.c files is very reasonable.
>
> Thanks!
>
> @Masahiro: pls send this to Linus now as the commit which adds the
> warning is in 6.9 so we should make sure we release it with all issues
> fixed.
>
> Thx.
>
> ---
> From: "Borislav Petkov (AMD)" <[email protected]>
> Date: Tue, 26 Mar 2024 21:11:01 +0100
>
> When KCSAN and CONSTRUCTORS are enabled, one can trigger the
>
> "Unpatched return thunk in use. This should not happen!"
>
> catch-all warning.
>
> Usually, when objtool runs on the .o objects, it does generate a section
> .return_sites which contains all offsets in the objects to the return
> thunks of the functions present there. Those return thunks then get
> patched at runtime by the alternatives.
>
> KCSAN and CONSTRUCTORS add this to the the object file's .text.startup
> section:
>
> -------------------
> Disassembly of section .text.startup:
>
> ...
>
> 0000000000000010 <_sub_I_00099_0>:
> 10: f3 0f 1e fa endbr64
> 14: e8 00 00 00 00 call 19 <_sub_I_00099_0+0x9>
> 15: R_X86_64_PLT32 __tsan_init-0x4
> 19: e9 00 00 00 00 jmp 1e <__UNIQUE_ID___addressable_cryptd_alloc_aead349+0x6>
> 1a: R_X86_64_PLT32 __x86_return_thunk-0x4
> -------------------
>
> which, if it is built as a module goes through the intermediary stage of
> creating a <module>.mod.c file which, when translated, receives a second
> constructor:
>
> -------------------
> Disassembly of section .text.startup:
>
> 0000000000000010 <_sub_I_00099_0>:
> 10: f3 0f 1e fa endbr64
> 14: e8 00 00 00 00 call 19 <_sub_I_00099_0+0x9>
> 15: R_X86_64_PLT32 __tsan_init-0x4
> 19: e9 00 00 00 00 jmp 1e <_sub_I_00099_0+0xe>
> 1a: R_X86_64_PLT32 __x86_return_thunk-0x4
>
> ...
>
> 0000000000000030 <_sub_I_00099_0>:
> 30: f3 0f 1e fa endbr64
> 34: e8 00 00 00 00 call 39 <_sub_I_00099_0+0x9>
> 35: R_X86_64_PLT32 __tsan_init-0x4
> 39: e9 00 00 00 00 jmp 3e <__ksymtab_cryptd_alloc_ahash+0x2>
> 3a: R_X86_64_PLT32 __x86_return_thunk-0x4
> -------------------
>
> in the .ko file.
>
> Objtool has run already so that second constructor's return thunk cannot
> be added to the .return_sites section and thus the return thunk remains
> unpatched and the warning rightfully fires.
>
> Drop KCSAN flags from the mod.c generation stage as those constructors
> do not contain data races one would be interested about.
>
> Debugged together with David Kaplan <[email protected]> and Nikolay
> Borisov <[email protected]>.
>
> Reported-by: Paul Menzel <[email protected]>
> Signed-off-by: Borislav Petkov (AMD) <[email protected]>
> Link: https://lore.kernel.org/r/[email protected]
> ---
> scripts/Makefile.modfinal | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/scripts/Makefile.modfinal b/scripts/Makefile.modfinal
> index 8568d256d6fb..79fcf2731686 100644
> --- a/scripts/Makefile.modfinal
> +++ b/scripts/Makefile.modfinal
> @@ -23,7 +23,7 @@ modname = $(notdir $(@:.mod.o=))
> part-of-module = y
>
> quiet_cmd_cc_o_c = CC [M] $@
> - cmd_cc_o_c = $(CC) $(filter-out $(CC_FLAGS_CFI) $(CFLAGS_GCOV), $(c_flags)) -c -o $@ $<
> + cmd_cc_o_c = $(CC) $(filter-out $(CC_FLAGS_CFI) $(CFLAGS_GCOV) $(CFLAGS_KCSAN), $(c_flags)) -c -o $@ $<
>
> %.mod.o: %.mod.c FORCE
> $(call if_changed_dep,cc_o_c)
Reviewed-by: Nikolay Borisov <[email protected]>
Dear Borislav,
Thank you and the others very much for coming up with a patch so quickly.
Tested-by: Paul Menzel <[email protected]> # Dell XPS 13
9360/0596KF, BIOS 2.21.0 06/02/2022
Kind regards,
Paul
PS: Without your patch, I also so it one in QEMU q35, but not consistently.
On Wed, Mar 27, 2024 at 5:26 AM Borislav Petkov <[email protected]> wrote:
>
> On Tue, Mar 26, 2024 at 08:33:31PM +0100, Marco Elver wrote:
> > I think just removing instrumentation from the mod.c files is very reasonable.
>
> Thanks!
>
> @Masahiro: pls send this to Linus now as the commit which adds the
> warning is in 6.9 so we should make sure we release it with all issues
> fixed.
>
> Thx.
>
> ---
> From: "Borislav Petkov (AMD)" <[email protected]>
> Date: Tue, 26 Mar 2024 21:11:01 +0100
>
> When KCSAN and CONSTRUCTORS are enabled, one can trigger the
>
> "Unpatched return thunk in use. This should not happen!"
>
> catch-all warning.
>
> Usually, when objtool runs on the .o objects, it does generate a section
> .return_sites which contains all offsets in the objects to the return
> thunks of the functions present there. Those return thunks then get
> patched at runtime by the alternatives.
>
> KCSAN and CONSTRUCTORS add this to the the object file's .text.startup
> section:
>
> -------------------
> Disassembly of section .text.startup:
>
> ...
>
> 0000000000000010 <_sub_I_00099_0>:
> 10: f3 0f 1e fa endbr64
> 14: e8 00 00 00 00 call 19 <_sub_I_00099_0+0x9>
> 15: R_X86_64_PLT32 __tsan_init-0x4
> 19: e9 00 00 00 00 jmp 1e <__UNIQUE_ID___addressable_cryptd_alloc_aead349+0x6>
> 1a: R_X86_64_PLT32 __x86_return_thunk-0x4
> -------------------
>
> which, if it is built as a module goes through the intermediary stage of
> creating a <module>.mod.c file which, when translated, receives a second
> constructor:
>
> -------------------
> Disassembly of section .text.startup:
>
> 0000000000000010 <_sub_I_00099_0>:
> 10: f3 0f 1e fa endbr64
> 14: e8 00 00 00 00 call 19 <_sub_I_00099_0+0x9>
> 15: R_X86_64_PLT32 __tsan_init-0x4
> 19: e9 00 00 00 00 jmp 1e <_sub_I_00099_0+0xe>
> 1a: R_X86_64_PLT32 __x86_return_thunk-0x4
>
> ...
>
> 0000000000000030 <_sub_I_00099_0>:
> 30: f3 0f 1e fa endbr64
> 34: e8 00 00 00 00 call 39 <_sub_I_00099_0+0x9>
> 35: R_X86_64_PLT32 __tsan_init-0x4
> 39: e9 00 00 00 00 jmp 3e <__ksymtab_cryptd_alloc_ahash+0x2>
> 3a: R_X86_64_PLT32 __x86_return_thunk-0x4
> -------------------
>
> in the .ko file.
>
> Objtool has run already so that second constructor's return thunk cannot
> be added to the .return_sites section and thus the return thunk remains
> unpatched and the warning rightfully fires.
>
> Drop KCSAN flags from the mod.c generation stage as those constructors
> do not contain data races one would be interested about.
>
> Debugged together with David Kaplan <[email protected]> and Nikolay
> Borisov <[email protected]>.
>
> Reported-by: Paul Menzel <[email protected]>
> Signed-off-by: Borislav Petkov (AMD) <[email protected]>
> Link: https://lore.kernel.org/r/[email protected]
> ---
> scripts/Makefile.modfinal | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/scripts/Makefile.modfinal b/scripts/Makefile.modfinal
> index 8568d256d6fb..79fcf2731686 100644
> --- a/scripts/Makefile.modfinal
> +++ b/scripts/Makefile.modfinal
> @@ -23,7 +23,7 @@ modname = $(notdir $(@:.mod.o=))
> part-of-module = y
>
> quiet_cmd_cc_o_c = CC [M] $@
> - cmd_cc_o_c = $(CC) $(filter-out $(CC_FLAGS_CFI) $(CFLAGS_GCOV), $(c_flags)) -c -o $@ $<
> + cmd_cc_o_c = $(CC) $(filter-out $(CC_FLAGS_CFI) $(CFLAGS_GCOV) $(CFLAGS_KCSAN), $(c_flags)) -c -o $@ $<
>
> %.mod.o: %.mod.c FORCE
> $(call if_changed_dep,cc_o_c)
> --
> 2.43.0
>
>
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
I applied.
I fixed the typo "the the" and replaced Link: with Closes:
to address the following checkpatch warnings:
WARNING: Possible repeated word: 'the'
#18:
KCSAN and CONSTRUCTORS add this to the the object file's .text.startup
WARNING: Reported-by: should be immediately followed by Closes: with a
URL to the report
#70:
Reported-by: Paul Menzel <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Instead of filter-out, you could add
KCSAN_SANITIZE := n
to scripts/Makefile.modfinal because
it is the reason why KCSAN_SANITIZE exists.
But, that is not a big deal.
GCOV flag is also filtered away instead of
GCOV_PROFILE := n
I will probably use a different approach later.
--
Best Regards
Masahiro Yamada
On Sun, Mar 31, 2024 at 07:32:30AM +0900, Masahiro Yamada wrote:
> I applied.
>
> I fixed the typo "the the" and replaced Link: with Closes:
> to address the following checkpatch warnings:
Thanks!
> Instead of filter-out, you could add
> KCSAN_SANITIZE := n
> to scripts/Makefile.modfinal because
> it is the reason why KCSAN_SANITIZE exists.
>
> But, that is not a big deal.
> GCOV flag is also filtered away instead of
> GCOV_PROFILE := n
Ah, that would've been more readable, yap.
> I will probably use a different approach later.
Right.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette