All CPU bugs that require a return thunk define a special return thunk
to use (e.g., srso_return_thunk). The default thunk,
__x86_return_thunk, should never be used after apply_returns() completes.
Otherwise this could lead to potential speculation holes.
Enforce this by replacing this thunk with a ud2 when alternatives are
applied. Alternative instructions are applied after apply_returns().
The default thunk is only used during kernel boot, it is not used during
module init since that occurs after apply_returns().
Signed-off-by: David Kaplan <[email protected]>
---
arch/x86/lib/retpoline.S | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/arch/x86/lib/retpoline.S b/arch/x86/lib/retpoline.S
index 3da768a71cf9..10212cf4a9af 100644
--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -358,15 +358,17 @@ SYM_FUNC_END(call_depth_return_thunk)
* This function name is magical and is used by -mfunction-return=thunk-extern
* for the compiler to generate JMPs to it.
*
- * This code is only used during kernel boot or module init. All
+ * This code is only used during kernel boot. All
* 'JMP __x86_return_thunk' sites are changed to something else by
* apply_returns().
+ *
+ * This thunk is turned into a ud2 to ensure it is never used at runtime.
+ * Alternative instructions are applied after apply_returns().
*/
SYM_CODE_START(__x86_return_thunk)
UNWIND_HINT_FUNC
ANNOTATE_NOENDBR
- ANNOTATE_UNRET_SAFE
- ret
+ ALTERNATIVE __stringify(ANNOTATE_UNRET_SAFE;ret),"ud2", X86_FEATURE_RETHUNK
int3
SYM_CODE_END(__x86_return_thunk)
EXPORT_SYMBOL(__x86_return_thunk)
--
2.25.1
On Tue, Oct 10, 2023 at 12:10:20PM -0500, David Kaplan wrote:
> All CPU bugs that require a return thunk define a special return thunk
> to use (e.g., srso_return_thunk). The default thunk,
> __x86_return_thunk, should never be used after apply_returns() completes.
> Otherwise this could lead to potential speculation holes.
>
> Enforce this by replacing this thunk with a ud2 when alternatives are
> applied. Alternative instructions are applied after apply_returns().
>
> The default thunk is only used during kernel boot, it is not used during
> module init since that occurs after apply_returns().
>
> Signed-off-by: David Kaplan <[email protected]>
> ---
> arch/x86/lib/retpoline.S | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/lib/retpoline.S b/arch/x86/lib/retpoline.S
> index 3da768a71cf9..10212cf4a9af 100644
> --- a/arch/x86/lib/retpoline.S
> +++ b/arch/x86/lib/retpoline.S
> @@ -358,15 +358,17 @@ SYM_FUNC_END(call_depth_return_thunk)
> * This function name is magical and is used by -mfunction-return=thunk-extern
> * for the compiler to generate JMPs to it.
> *
> - * This code is only used during kernel boot or module init. All
> + * This code is only used during kernel boot. All
> * 'JMP __x86_return_thunk' sites are changed to something else by
> * apply_returns().
> + *
> + * This thunk is turned into a ud2 to ensure it is never used at runtime.
> + * Alternative instructions are applied after apply_returns().
> */
> SYM_CODE_START(__x86_return_thunk)
> UNWIND_HINT_FUNC
> ANNOTATE_NOENDBR
> - ANNOTATE_UNRET_SAFE
> - ret
> + ALTERNATIVE __stringify(ANNOTATE_UNRET_SAFE;ret),"ud2", X86_FEATURE_RETHUNK
If it's truly never used after boot (even for non-rethunk cases) then
can we use X86_FEATURE_ALWAYS?
--
Josh
[AMD Official Use Only - General]
> -----Original Message-----
> From: Josh Poimboeuf <[email protected]>
> Sent: Tuesday, October 10, 2023 2:37 PM
> To: Kaplan, David <[email protected]>
> Cc: [email protected]; [email protected]; [email protected]
> Subject: Re: [PATCH 3/3] x86/retpoline: Ensure default return thunk isn't used
> at runtime
>
> > *
> > - * This code is only used during kernel boot or module init. All
> > + * This code is only used during kernel boot. All
> > * 'JMP __x86_return_thunk' sites are changed to something else by
> > * apply_returns().
> > + *
> > + * This thunk is turned into a ud2 to ensure it is never used at runtime.
> > + * Alternative instructions are applied after apply_returns().
> > */
> > SYM_CODE_START(__x86_return_thunk)
> > UNWIND_HINT_FUNC
> > ANNOTATE_NOENDBR
> > - ANNOTATE_UNRET_SAFE
> > - ret
> > + ALTERNATIVE __stringify(ANNOTATE_UNRET_SAFE;ret),"ud2",
> > + X86_FEATURE_RETHUNK
>
> If it's truly never used after boot (even for non-rethunk cases) then can we use
> X86_FEATURE_ALWAYS?
>
I think that could work. There is one subtlety though I'll point out:
The use of __x86_return_thunk when X86_FEATURE_RETHUNK is set is a potential security issue, as it means the required return thunk is not being used. The use of __x86_return_thunk when X86_FEATURE_RETHUNK is not set is only a performance issue, as it means there is a return that was not rewritten to be an inline 'ret' by apply_returns().
The ud2 was primarily intended to capture cases where there is a potential security hole, while it is a bit overkill just to point out a return that was not optimized.
--David Kaplan
On Tue, Oct 10, 2023 at 08:14:33PM +0000, Kaplan, David wrote:
> [AMD Official Use Only - General]
>
> > -----Original Message-----
> > From: Josh Poimboeuf <[email protected]>
> > Sent: Tuesday, October 10, 2023 2:37 PM
> > To: Kaplan, David <[email protected]>
> > Cc: [email protected]; [email protected]; [email protected]
> > Subject: Re: [PATCH 3/3] x86/retpoline: Ensure default return thunk isn't used
> > at runtime
> >
> > > *
> > > - * This code is only used during kernel boot or module init. All
> > > + * This code is only used during kernel boot. All
> > > * 'JMP __x86_return_thunk' sites are changed to something else by
> > > * apply_returns().
> > > + *
> > > + * This thunk is turned into a ud2 to ensure it is never used at runtime.
> > > + * Alternative instructions are applied after apply_returns().
> > > */
> > > SYM_CODE_START(__x86_return_thunk)
> > > UNWIND_HINT_FUNC
> > > ANNOTATE_NOENDBR
> > > - ANNOTATE_UNRET_SAFE
> > > - ret
> > > + ALTERNATIVE __stringify(ANNOTATE_UNRET_SAFE;ret),"ud2",
> > > + X86_FEATURE_RETHUNK
> >
> > If it's truly never used after boot (even for non-rethunk cases) then can we use
> > X86_FEATURE_ALWAYS?
> >
>
> I think that could work. There is one subtlety though I'll point out:
>
> The use of __x86_return_thunk when X86_FEATURE_RETHUNK is set is a
> potential security issue, as it means the required return thunk is not
> being used. The use of __x86_return_thunk when X86_FEATURE_RETHUNK is
> not set is only a performance issue, as it means there is a return
> that was not rewritten to be an inline 'ret' by apply_returns().
>
> The ud2 was primarily intended to capture cases where there is a
> potential security hole, while it is a bit overkill just to point out
> a return that was not optimized.
Even if it's not a security hole, I'd still view it as a major BUG() as
it would directly contradict our understanding (and the comments above)
and could cause performance or other correctness issues that would
otherwise go unnoticed.
So I think an unconditional UD2 is warranted.
--
Josh
On Tue, Oct 10, 2023 at 01:41:19PM -0700, Josh Poimboeuf wrote:
> Even if it's not a security hole, I'd still view it as a major BUG() as
> it would directly contradict our understanding (and the comments above)
> and could cause performance or other correctness issues that would
> otherwise go unnoticed.
>
> So I think an unconditional UD2 is warranted.
Before David's outlook mangles v2, lemme send it from a real mail
client :-P.
v2 uses X86_FEATURE_ALWAYS as Josh requested.
---
From: David Kaplan <[email protected]>
Date: Thu, 12 Oct 2023 08:52:32 -0500
Subject: [PATCH] x86/retpoline: Ensure default return thunk isn't used at runtime
All CPU bugs that require a return thunk define a special return thunk
to use (e.g., srso_return_thunk). The default thunk,
__x86_return_thunk, should never be used after apply_returns()
completes. Otherwise this could lead to potential speculation holes.
Enforce this by replacing this thunk with a ud2 when alternatives are
applied. Alternative instructions are applied after apply_returns().
The default thunk is only used during kernel boot, it is not used during
module init since that occurs after apply_returns().
Signed-off-by: David Kaplan <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
---
arch/x86/lib/retpoline.S | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/arch/x86/lib/retpoline.S b/arch/x86/lib/retpoline.S
index db813113e637..3f3a478b74dd 100644
--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -356,15 +356,17 @@ SYM_FUNC_END(call_depth_return_thunk)
* This function name is magical and is used by -mfunction-return=thunk-extern
* for the compiler to generate JMPs to it.
*
- * This code is only used during kernel boot or module init. All
+ * This code is only used during kernel boot. All
* 'JMP __x86_return_thunk' sites are changed to something else by
* apply_returns().
+ *
+ * This thunk is turned into a ud2 to ensure it is never used at runtime.
+ * Alternative instructions are applied after apply_returns().
*/
SYM_CODE_START(__x86_return_thunk)
UNWIND_HINT_FUNC
ANNOTATE_NOENDBR
- ANNOTATE_UNRET_SAFE
- ret
+ ALTERNATIVE __stringify(ANNOTATE_UNRET_SAFE;ret),"ud2", X86_FEATURE_ALWAYS
int3
SYM_CODE_END(__x86_return_thunk)
EXPORT_SYMBOL(__x86_return_thunk)
--
2.25.1
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Thu, Oct 12, 2023 at 04:10:31PM +0200, Borislav Petkov wrote:
> On Tue, Oct 10, 2023 at 01:41:19PM -0700, Josh Poimboeuf wrote:
> > Even if it's not a security hole, I'd still view it as a major BUG() as
> > it would directly contradict our understanding (and the comments above)
> > and could cause performance or other correctness issues that would
> > otherwise go unnoticed.
> >
> > So I think an unconditional UD2 is warranted.
>
> Before David's outlook mangles v2, lemme send it from a real mail
> client :-P.
>
> v2 uses X86_FEATURE_ALWAYS as Josh requested.
>
> ---
> From: David Kaplan <[email protected]>
> Date: Thu, 12 Oct 2023 08:52:32 -0500
> Subject: [PATCH] x86/retpoline: Ensure default return thunk isn't used at runtime
>
> All CPU bugs that require a return thunk define a special return thunk
> to use (e.g., srso_return_thunk). The default thunk,
> __x86_return_thunk, should never be used after apply_returns()
> completes. Otherwise this could lead to potential speculation holes.
>
> Enforce this by replacing this thunk with a ud2 when alternatives are
> applied. Alternative instructions are applied after apply_returns().
>
> The default thunk is only used during kernel boot, it is not used during
> module init since that occurs after apply_returns().
>
> Signed-off-by: David Kaplan <[email protected]>
> Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Josh Poimboeuf <[email protected]>
--
Josh
The following commit has been merged into the x86/bugs branch of tip:
Commit-ID: 91174087dcc7565d8bf0d576544e42d5b1de6f39
Gitweb: https://git.kernel.org/tip/91174087dcc7565d8bf0d576544e42d5b1de6f39
Author: David Kaplan <[email protected]>
AuthorDate: Thu, 12 Oct 2023 16:10:31 +02:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Thu, 12 Oct 2023 19:44:34 +02:00
x86/retpoline: Ensure default return thunk isn't used at runtime
All CPU bugs that require a return thunk define a special return thunk
to use (e.g., srso_return_thunk). The default thunk,
__x86_return_thunk, should never be used after apply_returns()
completes. Otherwise this could lead to potential speculation holes.
Enforce this by replacing this thunk with a ud2 when alternatives are
applied. Alternative instructions are applied after apply_returns().
The default thunk is only used during kernel boot, it is not used during
module init since that occurs after apply_returns().
Signed-off-by: David Kaplan <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Reviewed-by: Josh Poimboeuf <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/20231012141031.GHZSf+V1NjjUJTc9a9@fat_crate.local
---
arch/x86/lib/retpoline.S | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/arch/x86/lib/retpoline.S b/arch/x86/lib/retpoline.S
index 6376d01..fe05c13 100644
--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -356,15 +356,17 @@ SYM_FUNC_END(call_depth_return_thunk)
* This function name is magical and is used by -mfunction-return=thunk-extern
* for the compiler to generate JMPs to it.
*
- * This code is only used during kernel boot or module init. All
+ * This code is only used during kernel boot. All
* 'JMP __x86_return_thunk' sites are changed to something else by
* apply_returns().
+ *
+ * This thunk is turned into a ud2 to ensure it is never used at runtime.
+ * Alternative instructions are applied after apply_returns().
*/
SYM_CODE_START(__x86_return_thunk)
UNWIND_HINT_FUNC
ANNOTATE_NOENDBR
- ANNOTATE_UNRET_SAFE
- ret
+ ALTERNATIVE __stringify(ANNOTATE_UNRET_SAFE;ret),"ud2", X86_FEATURE_ALWAYS
int3
SYM_CODE_END(__x86_return_thunk)
EXPORT_SYMBOL(__x86_return_thunk)
On Thu, Oct 12, 2023 at 05:50:35PM -0000, tip-bot2 for David Kaplan wrote:
> The following commit has been merged into the x86/bugs branch of tip:
>
> Commit-ID: 91174087dcc7565d8bf0d576544e42d5b1de6f39
> Gitweb: https://git.kernel.org/tip/91174087dcc7565d8bf0d576544e42d5b1de6f39
> Author: David Kaplan <[email protected]>
> AuthorDate: Thu, 12 Oct 2023 16:10:31 +02:00
> Committer: Ingo Molnar <[email protected]>
> CommitterDate: Thu, 12 Oct 2023 19:44:34 +02:00
>
> x86/retpoline: Ensure default return thunk isn't used at runtime
>
> All CPU bugs that require a return thunk define a special return thunk
> to use (e.g., srso_return_thunk). The default thunk,
> __x86_return_thunk, should never be used after apply_returns()
> completes. Otherwise this could lead to potential speculation holes.
>
> Enforce this by replacing this thunk with a ud2 when alternatives are
> applied. Alternative instructions are applied after apply_returns().
>
> The default thunk is only used during kernel boot, it is not used during
> module init since that occurs after apply_returns().
>
> Signed-off-by: David Kaplan <[email protected]>
> Signed-off-by: Borislav Petkov (AMD) <[email protected]>
> Signed-off-by: Ingo Molnar <[email protected]>
> Reviewed-by: Josh Poimboeuf <[email protected]>
> Acked-by: Peter Zijlstra (Intel) <[email protected]>
> Link: https://lore.kernel.org/r/20231012141031.GHZSf+V1NjjUJTc9a9@fat_crate.local
> ---
> arch/x86/lib/retpoline.S | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/lib/retpoline.S b/arch/x86/lib/retpoline.S
> index 6376d01..fe05c13 100644
> --- a/arch/x86/lib/retpoline.S
> +++ b/arch/x86/lib/retpoline.S
> @@ -356,15 +356,17 @@ SYM_FUNC_END(call_depth_return_thunk)
> * This function name is magical and is used by -mfunction-return=thunk-extern
> * for the compiler to generate JMPs to it.
> *
> - * This code is only used during kernel boot or module init. All
> + * This code is only used during kernel boot. All
> * 'JMP __x86_return_thunk' sites are changed to something else by
> * apply_returns().
> + *
> + * This thunk is turned into a ud2 to ensure it is never used at runtime.
> + * Alternative instructions are applied after apply_returns().
> */
> SYM_CODE_START(__x86_return_thunk)
> UNWIND_HINT_FUNC
> ANNOTATE_NOENDBR
> - ANNOTATE_UNRET_SAFE
> - ret
> + ALTERNATIVE __stringify(ANNOTATE_UNRET_SAFE;ret),"ud2", X86_FEATURE_ALWAYS
> int3
> SYM_CODE_END(__x86_return_thunk)
> EXPORT_SYMBOL(__x86_return_thunk)
I just bisected a boot failure that our continuous integration sees [1]
with x86_64_defconfig + CONFIG_KCSAN=y to this change in -tip/-next. It
does not appear to be clang specific, as I can reproduce it with GCC
13.2.0 from kernel.org [2] (the rootfs is available at [3], if it is
necessary for reproducing).
$ make -skj"$(nproc)" ARCH=x86_64 CROSS_COMPILE=x86_64-linux- defconfig
$ scripts/config -e KCSAN
$ make -skj"$(nproc)" ARCH=x86_64 CROSS_COMPILE=x86_64-linux- olddefconfig bzImage
$ qemu-system-x86_64 \
-display none \
-nodefaults \
-d unimp,guest_errors \
-append 'console=ttyS0 earlycon=uart8250,io,0x3f8' \
-kernel arch/x86/boot/bzImage \
-initrd x86_64-rootfs.cpio \
-cpu host \
-enable-kvm \
-m 512m \
-smp 2 \
-serial mon:stdio
[ 0.000000] Linux version 6.6.0-rc2-00316-g91174087dcc7 ([email protected]) (x86_64-linux-gcc (GCC) 13.2.0, GNU ld (GNU Binutils) 2.41) #1 SMP PREEMPT_DYNAMIC Mon Oct 16 13:54:09 MST 2023
...
[ 0.269298] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ 0.269592] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.6.0-rc2-00316-g91174087dcc7 #1
[ 0.269592] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-14-g1e1da7a96300-prebuilt.qemu.org 04/01/2014
[ 0.269592] RIP: 0010:__x86_return_thunk+0x0/0x10
[ 0.269592] Code: e8 01 00 00 00 cc e8 01 00 00 00 cc 48 81 c4 80 00 00 00 65 48 c7 04 25 d0 ac 02 00 ff ff ff ff c3 cc 0f 1f 84 00 00 00 00 00 <0f> 0b cc cc cc cc cc cc cc cc cc cc cc cc cc cc e9 db 8c 8e fe 0f
[ 0.269592] RSP: 0018:ffffaef1c0013ed0 EFLAGS: 00010246
[ 0.269592] RAX: ffffffffa0e80eb0 RBX: ffffffffa0f05240 RCX: 0001ffffffffffff
[ 0.269592] RDX: 0000000000000551 RSI: ffffffffa0dcc64e RDI: ffffffffa0f05238
[ 0.269592] RBP: ffff8f93c11708e0 R08: ffffffffa1387280 R09: 0000000000000000
[ 0.269592] R10: 0000000000000282 R11: 0001ffffa0f05238 R12: 0000000000000002
[ 0.269592] R13: 0000000000000282 R14: 0000000000000001 R15: 0000000000000000
[ 0.269592] FS: 0000000000000000(0000) GS:ffff8f93df000000(0000) knlGS:0000000000000000
[ 0.269592] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.269592] CR2: ffff8f93d6c01000 CR3: 0000000015c2e000 CR4: 0000000000350ef0
[ 0.269592] Call Trace:
[ 0.269592] <TASK>
[ 0.269592] ? die+0x31/0x80
[ 0.269592] ? do_trap+0xd5/0x100
[ 0.269592] ? call_depth_return_thunk+0x90/0x90
[ 0.269592] ? do_error_trap+0x60/0x80
[ 0.269592] ? call_depth_return_thunk+0x90/0x90
[ 0.269592] ? exc_invalid_op+0x50/0x70
[ 0.269592] ? call_depth_return_thunk+0x90/0x90
[ 0.269592] ? asm_exc_invalid_op+0x1a/0x20
[ 0.269592] ? _sub_I_00099_0+0x20/0x20
[ 0.269592] ? kernel_init_freeable+0x1de/0x4b0
[ 0.269592] ? call_depth_return_thunk+0x90/0x90
[ 0.269592] kernel_init_freeable+0x1e4/0x4b0
[ 0.269592] ? __pfx_kernel_init+0x10/0x10
[ 0.269592] kernel_init+0x1a/0x220
[ 0.269592] ? srso_return_thunk+0x5/0x5f
[ 0.269592] ret_from_fork+0x2f/0x50
[ 0.269592] ? __pfx_kernel_init+0x10/0x10
[ 0.269592] ret_from_fork_asm+0x1b/0x30
[ 0.269592] </TASK>
[ 0.269592] Modules linked in:
[ 0.269596] ---[ end trace 0000000000000000 ]---
[ 0.270115] RIP: 0010:__x86_return_thunk+0x0/0x10
[ 0.270598] Code: e8 01 00 00 00 cc e8 01 00 00 00 cc 48 81 c4 80 00 00 00 65 48 c7 04 25 d0 ac 02 00 ff ff ff ff c3 cc 0f 1f 84 00 00 00 00 00 <0f> 0b cc cc cc cc cc cc cc cc cc cc cc cc cc cc e9 db 8c 8e fe 0f
[ 0.271597] RSP: 0018:ffffaef1c0013ed0 EFLAGS: 00010246
[ 0.272234] RAX: ffffffffa0e80eb0 RBX: ffffffffa0f05240 RCX: 0001ffffffffffff
[ 0.272597] RDX: 0000000000000551 RSI: ffffffffa0dcc64e RDI: ffffffffa0f05238
[ 0.273393] RBP: ffff8f93c11708e0 R08: ffffffffa1387280 R09: 0000000000000000
[ 0.273599] R10: 0000000000000282 R11: 0001ffffa0f05238 R12: 0000000000000002
[ 0.274398] R13: 0000000000000282 R14: 0000000000000001 R15: 0000000000000000
[ 0.274599] FS: 0000000000000000(0000) GS:ffff8f93df000000(0000) knlGS:0000000000000000
[ 0.275500] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.275597] CR2: ffff8f93d6c01000 CR3: 0000000015c2e000 CR4: 0000000000350ef0
[ 0.276426] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[ 0.276592] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---
If there is any other information I can provide or patches I can test, I
am more than happy to do so.
Cheers,
Nathan
[1]: https://github.com/ClangBuiltLinux/continuous-integration2/actions/runs/6533455720/job/17740995771
[2]: https://mirrors.edge.kernel.org/pub/tools/crosstool/
[3]: https://github.com/ClangBuiltLinux/boot-utils/releases
# bad: [4d0515b235dec789578d135a5db586b25c5870cb] Add linux-next specific files for 20231016
# good: [58720809f52779dc0f08e53e54b014209d13eebb] Linux 6.6-rc6
git bisect start '4d0515b235dec789578d135a5db586b25c5870cb' '58720809f52779dc0f08e53e54b014209d13eebb'
# good: [8c7c066964a7274b254171552d465219e9f234e6] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm.git
git bisect good 8c7c066964a7274b254171552d465219e9f234e6
# good: [018e1e44855a873d74d712657abc93bdb9aaf961] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git
git bisect good 018e1e44855a873d74d712657abc93bdb9aaf961
# bad: [e9deda4910ddc2b03bf58e8c946bd63f101f25be] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt.git
git bisect bad e9deda4910ddc2b03bf58e8c946bd63f101f25be
# good: [ca6177c462b7eca552a967b1caf2a535f0f536f5] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git
git bisect good ca6177c462b7eca552a967b1caf2a535f0f536f5
# bad: [a92888f634feb41d6fb29f9a4da4e8ccfc5e8968] Merge branch 'rcu/next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
git bisect bad a92888f634feb41d6fb29f9a4da4e8ccfc5e8968
# bad: [63f98235b237e94f82b16d21292c013109c9ae66] Merge branch into tip/master: 'x86/bugs'
git bisect bad 63f98235b237e94f82b16d21292c013109c9ae66
# good: [d3a3b8c3efbb532c867447b8c00a59489a08f8ef] Merge branch into tip/master: 'objtool/core'
git bisect good d3a3b8c3efbb532c867447b8c00a59489a08f8ef
# good: [3657680f38cd7df413d665f2b2f38e9a78130d8b] sched/psi: Delete the 'update_total' function parameter from update_triggers()
git bisect good 3657680f38cd7df413d665f2b2f38e9a78130d8b
# good: [791b24f02bf2230cec50f6c7a508f1d554f69464] Merge branch into tip/master: 'x86/asm'
git bisect good 791b24f02bf2230cec50f6c7a508f1d554f69464
# bad: [91174087dcc7565d8bf0d576544e42d5b1de6f39] x86/retpoline: Ensure default return thunk isn't used at runtime
git bisect bad 91174087dcc7565d8bf0d576544e42d5b1de6f39
# good: [074c9666d42211d0f70d5c156d377a4881b2a98c] x86/srso: Move retbleed IBPB check into existing 'has_microcode' code block
git bisect good 074c9666d42211d0f70d5c156d377a4881b2a98c
# good: [88494339b5bccf0abea5660228fd066ec2e91dea] x86/calldepth: Rename __x86_return_skl() to call_depth_return_thunk()
git bisect good 88494339b5bccf0abea5660228fd066ec2e91dea
# good: [5c44836dd1451c754c58cea5179d2fa5cbd9fc85] x86/srso: Remove unnecessary semicolon
git bisect good 5c44836dd1451c754c58cea5179d2fa5cbd9fc85
# good: [99b5bf0276d4ae5028ab9743053c6d16044009ea] x86/vdso: Run objtool on vdso32-setup.o
git bisect good 99b5bf0276d4ae5028ab9743053c6d16044009ea
# first bad commit: [91174087dcc7565d8bf0d576544e42d5b1de6f39] x86/retpoline: Ensure default return thunk isn't used at runtime
On Mon, Oct 16, 2023 at 02:10:40PM -0700, Nathan Chancellor wrote:
> I just bisected a boot failure that our continuous integration sees [1]
> with x86_64_defconfig + CONFIG_KCSAN=y to this change in -tip/-next. It
> does not appear to be clang specific, as I can reproduce it with GCC
> 13.2.0 from kernel.org [2] (the rootfs is available at [3], if it is
> necessary for reproducing).
>
> $ make -skj"$(nproc)" ARCH=x86_64 CROSS_COMPILE=x86_64-linux- defconfig
> $ scripts/config -e KCSAN
> $ make -skj"$(nproc)" ARCH=x86_64 CROSS_COMPILE=x86_64-linux- olddefconfig bzImage
> $ qemu-system-x86_64 \
> -display none \
> -nodefaults \
> -d unimp,guest_errors \
> -append 'console=ttyS0 earlycon=uart8250,io,0x3f8' \
> -kernel arch/x86/boot/bzImage \
> -initrd x86_64-rootfs.cpio \
> -cpu host \
What's the host?
> If there is any other information I can provide or patches I can test, I
> am more than happy to do so.
Yes, pls send your .config too because depending on the compiler, KCSAN
does get disabled with older ones. So I guess it has to be gcc 13 or so.
And full guest dmesg so that I can compare.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Mon, Oct 16, 2023 at 11:29:44PM +0200, Borislav Petkov wrote:
> On Mon, Oct 16, 2023 at 02:10:40PM -0700, Nathan Chancellor wrote:
> > I just bisected a boot failure that our continuous integration sees [1]
> > with x86_64_defconfig + CONFIG_KCSAN=y to this change in -tip/-next. It
> > does not appear to be clang specific, as I can reproduce it with GCC
> > 13.2.0 from kernel.org [2] (the rootfs is available at [3], if it is
> > necessary for reproducing).
> >
> > $ make -skj"$(nproc)" ARCH=x86_64 CROSS_COMPILE=x86_64-linux- defconfig
> > $ scripts/config -e KCSAN
> > $ make -skj"$(nproc)" ARCH=x86_64 CROSS_COMPILE=x86_64-linux- olddefconfig bzImage
> > $ qemu-system-x86_64 \
> > -display none \
> > -nodefaults \
> > -d unimp,guest_errors \
> > -append 'console=ttyS0 earlycon=uart8250,io,0x3f8' \
> > -kernel arch/x86/boot/bzImage \
> > -initrd x86_64-rootfs.cpio \
> > -cpu host \
>
> What's the host?
A Threadripper 3990X, although I will say that this is reproducible with
QEMU's TCG, as that is what our CI tests with.
> > If there is any other information I can provide or patches I can test, I
> > am more than happy to do so.
>
> Yes, pls send your .config too because depending on the compiler, KCSAN
> does get disabled with older ones. So I guess it has to be gcc 13 or so.
>
> And full guest dmesg so that I can compare.
I attached the config and full dmesg from this change and the one
directly before it.
Cheers,
Nathan
[AMD Official Use Only - General]
> -----Original Message-----
> From: Nathan Chancellor <[email protected]>
> Sent: Monday, October 16, 2023 4:48 PM
> To: Borislav Petkov <[email protected]>
> Cc: [email protected]; [email protected];
> Kaplan, David <[email protected]>; Ingo Molnar <[email protected]>;
> Josh Poimboeuf <[email protected]>; Peter Zijlstra (Intel)
> <[email protected]>; [email protected]; [email protected]
> Subject: Re: [tip: x86/bugs] x86/retpoline: Ensure default return thunk isn't
> used at runtime
>
> Caution: This message originated from an External Source. Use proper
> caution when opening attachments, clicking links, or responding.
>
>
> On Mon, Oct 16, 2023 at 11:29:44PM +0200, Borislav Petkov wrote:
> > On Mon, Oct 16, 2023 at 02:10:40PM -0700, Nathan Chancellor wrote:
> > > I just bisected a boot failure that our continuous integration sees
> > > [1] with x86_64_defconfig + CONFIG_KCSAN=y to this change in
> > > -tip/-next. It does not appear to be clang specific, as I can
> > > reproduce it with GCC
> > > 13.2.0 from kernel.org [2] (the rootfs is available at [3], if it is
> > > necessary for reproducing).
> > >
> > > $ make -skj"$(nproc)" ARCH=x86_64 CROSS_COMPILE=x86_64-linux-
> > > defconfig $ scripts/config -e KCSAN $ make -skj"$(nproc)"
> > > ARCH=x86_64 CROSS_COMPILE=x86_64-linux- olddefconfig bzImage $
> > > qemu-system-x86_64 \
> > > -display none \
> > > -nodefaults \
> > > -d unimp,guest_errors \
> > > -append 'console=ttyS0 earlycon=uart8250,io,0x3f8' \
> > > -kernel arch/x86/boot/bzImage \
> > > -initrd x86_64-rootfs.cpio \
> > > -cpu host \
> >
I think I found the problem, although I'm not sure the best way to fix it.
When KCSAN is enabled, GCC generates lots of constructor functions named _sub_I_00099_0 which call __tsan_init and then return. The returns in these are generally annotated normally by objtool and fixed up at runtime. But objtool runs on vmlinux.o and vmlinux.o does not include a couple of object files that are in vmlinux, like init/version-timestamp.o and .vmlinux.export.o, both of which contain _sub_I_00099_0 functions. As a result, the returns in these functions are not annotated, and the panic occurs when we call one of them in do_ctors and it uses the default return thunk.
This difference can be seen by counting the number of these functions in the object files:
$ objdump -d vmlinux.o|grep -c "<_sub_I_00099_0>:"
2601
$ objdump -d vmlinux|grep -c "<_sub_I_00099_0>:"
2603
If these functions are only run during kernel boot, there is no speculation concern. My first thought is that these two object files perhaps should be built without -mfunction-return=thunk-extern. The use of that flag requires objtool to have the intended behavior and objtool isn't seeing these files.
Perhaps another option would be to not compile these two files with KCSAN, as they are already excluded from KASAN and GCOV it looks like.
--David Kaplan
On Tue, Oct 17, 2023 at 04:31:09AM +0000, Kaplan, David wrote:
> I think I found the problem, although I'm not sure the best way to fix it.
>
> When KCSAN is enabled, GCC generates lots of constructor functions named _sub_I_00099_0 which call __tsan_init and then return. The returns in these are generally annotated normally by objtool and fixed up at runtime. But objtool runs on vmlinux.o and vmlinux.o does not include a couple of object files that are in vmlinux, like init/version-timestamp.o and .vmlinux.export.o, both of which contain _sub_I_00099_0 functions. As a result, the returns in these functions are not annotated, and the panic occurs when we call one of them in do_ctors and it uses the default return thunk.
>
> This difference can be seen by counting the number of these functions in the object files:
> $ objdump -d vmlinux.o|grep -c "<_sub_I_00099_0>:"
> 2601
> $ objdump -d vmlinux|grep -c "<_sub_I_00099_0>:"
> 2603
>
> If these functions are only run during kernel boot, there is no speculation concern. My first thought is that these two object files perhaps should be built without -mfunction-return=thunk-extern. The use of that flag requires objtool to have the intended behavior and objtool isn't seeing these files.
>
> Perhaps another option would be to not compile these two files with KCSAN, as they are already excluded from KASAN and GCOV it looks like.
I think the latter would be the easy fix, does this make it go away?
diff --git a/init/Makefile b/init/Makefile
index ec557ada3c12..cbac576c57d6 100644
--- a/init/Makefile
+++ b/init/Makefile
@@ -60,4 +60,5 @@ include/generated/utsversion.h: FORCE
$(obj)/version-timestamp.o: include/generated/utsversion.h
CFLAGS_version-timestamp.o := -include include/generated/utsversion.h
KASAN_SANITIZE_version-timestamp.o := n
+KCSAN_SANITIZE_version-timestamp.o := n
GCOV_PROFILE_version-timestamp.o := n
diff --git a/scripts/Makefile.vmlinux b/scripts/Makefile.vmlinux
index 3cd6ca15f390..c9f3e03124d7 100644
--- a/scripts/Makefile.vmlinux
+++ b/scripts/Makefile.vmlinux
@@ -19,6 +19,7 @@ quiet_cmd_cc_o_c = CC $@
ifdef CONFIG_MODULES
KASAN_SANITIZE_.vmlinux.export.o := n
+KCSAN_SANITIZE_.vmlinux.export.o := n
GCOV_PROFILE_.vmlinux.export.o := n
targets += .vmlinux.export.o
vmlinux: .vmlinux.export.o
[AMD Official Use Only - General]
> -----Original Message-----
> From: Josh Poimboeuf <[email protected]>
> Sent: Tuesday, October 17, 2023 12:29 AM
> To: Kaplan, David <[email protected]>
> Cc: Nathan Chancellor <[email protected]>; Borislav Petkov
> <[email protected]>; [email protected]; linux-tip-
> [email protected]; Ingo Molnar <[email protected]>; Peter Zijlstra
> (Intel) <[email protected]>; [email protected]; [email protected]
> Subject: Re: [tip: x86/bugs] x86/retpoline: Ensure default return thunk isn't
> used at runtime
>
> Caution: This message originated from an External Source. Use proper
> caution when opening attachments, clicking links, or responding.
>
>
> On Tue, Oct 17, 2023 at 04:31:09AM +0000, Kaplan, David wrote:
> > I think I found the problem, although I'm not sure the best way to fix it.
> >
> > When KCSAN is enabled, GCC generates lots of constructor functions
> named _sub_I_00099_0 which call __tsan_init and then return. The returns
> in these are generally annotated normally by objtool and fixed up at runtime.
> But objtool runs on vmlinux.o and vmlinux.o does not include a couple of
> object files that are in vmlinux, like init/version-timestamp.o and
> .vmlinux.export.o, both of which contain _sub_I_00099_0 functions. As a
> result, the returns in these functions are not annotated, and the panic occurs
> when we call one of them in do_ctors and it uses the default return thunk.
> >
> > This difference can be seen by counting the number of these functions in
> the object files:
> > $ objdump -d vmlinux.o|grep -c "<_sub_I_00099_0>:"
> > 2601
> > $ objdump -d vmlinux|grep -c "<_sub_I_00099_0>:"
> > 2603
> >
> > If these functions are only run during kernel boot, there is no speculation
> concern. My first thought is that these two object files perhaps should be
> built without -mfunction-return=thunk-extern. The use of that flag requires
> objtool to have the intended behavior and objtool isn't seeing these files.
> >
> > Perhaps another option would be to not compile these two files with
> KCSAN, as they are already excluded from KASAN and GCOV it looks like.
>
> I think the latter would be the easy fix, does this make it go away?
>
> diff --git a/init/Makefile b/init/Makefile
> index ec557ada3c12..cbac576c57d6 100644
> --- a/init/Makefile
> +++ b/init/Makefile
> @@ -60,4 +60,5 @@ include/generated/utsversion.h: FORCE
> $(obj)/version-timestamp.o: include/generated/utsversion.h
> CFLAGS_version-timestamp.o := -include include/generated/utsversion.h
> KASAN_SANITIZE_version-timestamp.o := n
> +KCSAN_SANITIZE_version-timestamp.o := n
> GCOV_PROFILE_version-timestamp.o := n
> diff --git a/scripts/Makefile.vmlinux b/scripts/Makefile.vmlinux
> index 3cd6ca15f390..c9f3e03124d7 100644
> --- a/scripts/Makefile.vmlinux
> +++ b/scripts/Makefile.vmlinux
> @@ -19,6 +19,7 @@ quiet_cmd_cc_o_c = CC $@
>
> ifdef CONFIG_MODULES
> KASAN_SANITIZE_.vmlinux.export.o := n
> +KCSAN_SANITIZE_.vmlinux.export.o := n
> GCOV_PROFILE_.vmlinux.export.o := n
> targets += .vmlinux.export.o
> vmlinux: .vmlinux.export.o
Yes, that worked for me. With this the VM booted and the number of these sub_I_00099_0 functions was consistent between vmlinux.o and vmlinux.
--David Kaplan
+ Marco, Dmitry
On Mon, Oct 16, 2023 at 10:28 PM Josh Poimboeuf <[email protected]> wrote:
>
> On Tue, Oct 17, 2023 at 04:31:09AM +0000, Kaplan, David wrote:
> > Perhaps another option would be to not compile these two files with KCSAN, as they are already excluded from KASAN and GCOV it looks like.
>
> I think the latter would be the easy fix, does this make it go away?
Yeah, usually when I see the other sanitizers being disabled on a per
object basis, I think "where there's smoke, there's fire."
Reviewed-by: Nick Desaulniers <[email protected]>
Reported-by: Nathan Chancellor <[email protected]>
Closes: https://lore.kernel.org/lkml/[email protected]/
>
> diff --git a/init/Makefile b/init/Makefile
> index ec557ada3c12..cbac576c57d6 100644
> --- a/init/Makefile
> +++ b/init/Makefile
> @@ -60,4 +60,5 @@ include/generated/utsversion.h: FORCE
> $(obj)/version-timestamp.o: include/generated/utsversion.h
> CFLAGS_version-timestamp.o := -include include/generated/utsversion.h
> KASAN_SANITIZE_version-timestamp.o := n
> +KCSAN_SANITIZE_version-timestamp.o := n
> GCOV_PROFILE_version-timestamp.o := n
> diff --git a/scripts/Makefile.vmlinux b/scripts/Makefile.vmlinux
> index 3cd6ca15f390..c9f3e03124d7 100644
> --- a/scripts/Makefile.vmlinux
> +++ b/scripts/Makefile.vmlinux
> @@ -19,6 +19,7 @@ quiet_cmd_cc_o_c = CC $@
>
> ifdef CONFIG_MODULES
> KASAN_SANITIZE_.vmlinux.export.o := n
> +KCSAN_SANITIZE_.vmlinux.export.o := n
> GCOV_PROFILE_.vmlinux.export.o := n
> targets += .vmlinux.export.o
> vmlinux: .vmlinux.export.o
>
--
Thanks,
~Nick Desaulniers
On Tue, 17 Oct 2023 at 17:24, Nick Desaulniers <[email protected]> wrote:
>
> + Marco, Dmitry
>
> On Mon, Oct 16, 2023 at 10:28 PM Josh Poimboeuf <[email protected]> wrote:
> >
> > On Tue, Oct 17, 2023 at 04:31:09AM +0000, Kaplan, David wrote:
> > > Perhaps another option would be to not compile these two files with KCSAN, as they are already excluded from KASAN and GCOV it looks like.
> >
> > I think the latter would be the easy fix, does this make it go away?
>
> Yeah, usually when I see the other sanitizers being disabled on a per
> object basis, I think "where there's smoke, there's fire."
>
> Reviewed-by: Nick Desaulniers <[email protected]>
> Reported-by: Nathan Chancellor <[email protected]>
> Closes: https://lore.kernel.org/lkml/[email protected]/
Acked-by: Marco Elver <[email protected]>
Instrumenting these files really doesn't make sense. Thanks for
catching this and the fix!
> >
> > diff --git a/init/Makefile b/init/Makefile
> > index ec557ada3c12..cbac576c57d6 100644
> > --- a/init/Makefile
> > +++ b/init/Makefile
> > @@ -60,4 +60,5 @@ include/generated/utsversion.h: FORCE
> > $(obj)/version-timestamp.o: include/generated/utsversion.h
> > CFLAGS_version-timestamp.o := -include include/generated/utsversion.h
> > KASAN_SANITIZE_version-timestamp.o := n
> > +KCSAN_SANITIZE_version-timestamp.o := n
> > GCOV_PROFILE_version-timestamp.o := n
> > diff --git a/scripts/Makefile.vmlinux b/scripts/Makefile.vmlinux
> > index 3cd6ca15f390..c9f3e03124d7 100644
> > --- a/scripts/Makefile.vmlinux
> > +++ b/scripts/Makefile.vmlinux
> > @@ -19,6 +19,7 @@ quiet_cmd_cc_o_c = CC $@
> >
> > ifdef CONFIG_MODULES
> > KASAN_SANITIZE_.vmlinux.export.o := n
> > +KCSAN_SANITIZE_.vmlinux.export.o := n
> > GCOV_PROFILE_.vmlinux.export.o := n
> > targets += .vmlinux.export.o
> > vmlinux: .vmlinux.export.o
> >
>
>
> --
> Thanks,
> ~Nick Desaulniers
On Mon, Oct 16, 2023 at 10:28:34PM -0700, Josh Poimboeuf wrote:
> On Tue, Oct 17, 2023 at 04:31:09AM +0000, Kaplan, David wrote:
> > I think I found the problem, although I'm not sure the best way to fix it.
> >
> > When KCSAN is enabled, GCC generates lots of constructor functions named _sub_I_00099_0 which call __tsan_init and then return. The returns in these are generally annotated normally by objtool and fixed up at runtime. But objtool runs on vmlinux.o and vmlinux.o does not include a couple of object files that are in vmlinux, like init/version-timestamp.o and .vmlinux.export.o, both of which contain _sub_I_00099_0 functions. As a result, the returns in these functions are not annotated, and the panic occurs when we call one of them in do_ctors and it uses the default return thunk.
> >
> > This difference can be seen by counting the number of these functions in the object files:
> > $ objdump -d vmlinux.o|grep -c "<_sub_I_00099_0>:"
> > 2601
> > $ objdump -d vmlinux|grep -c "<_sub_I_00099_0>:"
> > 2603
> >
> > If these functions are only run during kernel boot, there is no speculation concern. My first thought is that these two object files perhaps should be built without -mfunction-return=thunk-extern. The use of that flag requires objtool to have the intended behavior and objtool isn't seeing these files.
> >
> > Perhaps another option would be to not compile these two files with KCSAN, as they are already excluded from KASAN and GCOV it looks like.
>
> I think the latter would be the easy fix, does this make it go away?
Tested-by: Nathan Chancellor <[email protected]>
Thanks for figuring this out!
> diff --git a/init/Makefile b/init/Makefile
> index ec557ada3c12..cbac576c57d6 100644
> --- a/init/Makefile
> +++ b/init/Makefile
> @@ -60,4 +60,5 @@ include/generated/utsversion.h: FORCE
> $(obj)/version-timestamp.o: include/generated/utsversion.h
> CFLAGS_version-timestamp.o := -include include/generated/utsversion.h
> KASAN_SANITIZE_version-timestamp.o := n
> +KCSAN_SANITIZE_version-timestamp.o := n
> GCOV_PROFILE_version-timestamp.o := n
> diff --git a/scripts/Makefile.vmlinux b/scripts/Makefile.vmlinux
> index 3cd6ca15f390..c9f3e03124d7 100644
> --- a/scripts/Makefile.vmlinux
> +++ b/scripts/Makefile.vmlinux
> @@ -19,6 +19,7 @@ quiet_cmd_cc_o_c = CC $@
>
> ifdef CONFIG_MODULES
> KASAN_SANITIZE_.vmlinux.export.o := n
> +KCSAN_SANITIZE_.vmlinux.export.o := n
> GCOV_PROFILE_.vmlinux.export.o := n
> targets += .vmlinux.export.o
> vmlinux: .vmlinux.export.o
Enabling CONFIG_KCSAN causes a panic during boot due to an "invalid
opcode" in __x86_return_thunk():
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.6.0-rc2-00316-g91174087dcc7 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-14-g1e1da7a96300-prebuilt.qemu.org 04/01/2014
RIP: 0010:__x86_return_thunk+0x0/0x10
Code: e8 01 00 00 00 cc e8 01 00 00 00 cc 48 81 c4 80 00 00 00 65 48 c7 04 25 d0 ac 02 00 ff ff ff ff c3 cc 0f 1f 84 00 00 00 00 00 <0f> 0b cc cc cc cc cc cc cc cc cc cc cc cc cc cc e9 db 8c 8e fe 0f
RSP: 0018:ffffaef1c0013ed0 EFLAGS: 00010246
RAX: ffffffffa0e80eb0 RBX: ffffffffa0f05240 RCX: 0001ffffffffffff
RDX: 0000000000000551 RSI: ffffffffa0dcc64e RDI: ffffffffa0f05238
RBP: ffff8f93c11708e0 R08: ffffffffa1387280 R09: 0000000000000000
R10: 0000000000000282 R11: 0001ffffa0f05238 R12: 0000000000000002
R13: 0000000000000282 R14: 0000000000000001 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff8f93df000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff8f93d6c01000 CR3: 0000000015c2e000 CR4: 0000000000350ef0
The panic is triggered by the UD2 instruction which gets patched into
__x86_return_thunk() when alternatives are applied. After that point,
the default return thunk should no longer be used.
As David Kaplan describes, the issue is caused by a couple of
KCSAN-generated constructors which aren't processed by objtool:
"When KCSAN is enabled, GCC generates lots of constructor functions
named _sub_I_00099_0 which call __tsan_init and then return. The
returns in these are generally annotated normally by objtool and fixed
up at runtime. But objtool runs on vmlinux.o and vmlinux.o does not
include a couple of object files that are in vmlinux, like
init/version-timestamp.o and .vmlinux.export.o, both of which contain
_sub_I_00099_0 functions. As a result, the returns in these functions
are not annotated, and the panic occurs when we call one of them in
do_ctors and it uses the default return thunk.
This difference can be seen by counting the number of these functions in the object files:
$ objdump -d vmlinux.o|grep -c "<_sub_I_00099_0>:"
2601
$ objdump -d vmlinux|grep -c "<_sub_I_00099_0>:"
2603
If these functions are only run during kernel boot, there is no
speculation concern."
Fix it by disabling KCSAN on version-timestamp.o and .vmlinux.export.o
so the extra functions don't get generated. KASAN and GCOV are already
disabled for those files.
Fixes: 91174087dcc7 ("x86/retpoline: Ensure default return thunk isn't used at runtime")
Reported-by: Nathan Chancellor <[email protected]>
Closes: https://lore.kernel.org/lkml/[email protected]/
Debugged-by: David Kaplan <[email protected]>
Tested-by: Nathan Chancellor <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Acked-by: Marco Elver <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
---
init/Makefile | 1 +
scripts/Makefile.vmlinux | 1 +
2 files changed, 2 insertions(+)
diff --git a/init/Makefile b/init/Makefile
index ec557ada3c12..cbac576c57d6 100644
--- a/init/Makefile
+++ b/init/Makefile
@@ -60,4 +60,5 @@ include/generated/utsversion.h: FORCE
$(obj)/version-timestamp.o: include/generated/utsversion.h
CFLAGS_version-timestamp.o := -include include/generated/utsversion.h
KASAN_SANITIZE_version-timestamp.o := n
+KCSAN_SANITIZE_version-timestamp.o := n
GCOV_PROFILE_version-timestamp.o := n
diff --git a/scripts/Makefile.vmlinux b/scripts/Makefile.vmlinux
index 3cd6ca15f390..c9f3e03124d7 100644
--- a/scripts/Makefile.vmlinux
+++ b/scripts/Makefile.vmlinux
@@ -19,6 +19,7 @@ quiet_cmd_cc_o_c = CC $@
ifdef CONFIG_MODULES
KASAN_SANITIZE_.vmlinux.export.o := n
+KCSAN_SANITIZE_.vmlinux.export.o := n
GCOV_PROFILE_.vmlinux.export.o := n
targets += .vmlinux.export.o
vmlinux: .vmlinux.export.o
--
2.41.0
The following commit has been merged into the x86/bugs branch of tip:
Commit-ID: 28860182b7d88e5be76f332c34377288ad08e87a
Gitweb: https://git.kernel.org/tip/28860182b7d88e5be76f332c34377288ad08e87a
Author: Josh Poimboeuf <[email protected]>
AuthorDate: Tue, 17 Oct 2023 09:59:46 -07:00
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Tue, 17 Oct 2023 19:46:04 +02:00
x86/retpoline: Make sure there are no unconverted return thunks due to KCSAN
Enabling CONFIG_KCSAN causes the undefined opcode exception diagnostic
added by
91174087dcc7 ("x86/retpoline: Ensure default return thunk isn't used at runtime")
which is supposed to catch unconverted, default return thunks, to fire.
The resulting panic is triggered by the UD2 instruction which gets
patched into __x86_return_thunk() when alternatives are applied. After
that point, the default return thunk should no longer be used.
As David Kaplan describes in his debugging of the issue, it is caused by
a couple of KCSAN-generated constructors which aren't processed by
objtool:
"When KCSAN is enabled, GCC generates lots of constructor functions
named _sub_I_00099_0 which call __tsan_init and then return. The
returns in these are generally annotated normally by objtool and fixed
up at runtime. But objtool runs on vmlinux.o and vmlinux.o does not
include a couple of object files that are in vmlinux, like
init/version-timestamp.o and .vmlinux.export.o, both of which contain
_sub_I_00099_0 functions. As a result, the returns in these functions
are not annotated, and the panic occurs when we call one of them in
do_ctors and it uses the default return thunk.
This difference can be seen by counting the number of these functions in the object files:
$ objdump -d vmlinux.o|grep -c "<_sub_I_00099_0>:"
2601
$ objdump -d vmlinux|grep -c "<_sub_I_00099_0>:"
2603
If these functions are only run during kernel boot, there is no
speculation concern."
Fix it by disabling KCSAN on version-timestamp.o and .vmlinux.export.o
so the extra functions don't get generated. KASAN and GCOV are already
disabled for those files.
[ bp: Massage commit message. ]
Fixes: 91174087dcc7 ("x86/retpoline: Ensure default return thunk isn't used at runtime")
Closes: https://lore.kernel.org/lkml/[email protected]/
Reported-by: Nathan Chancellor <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Acked-by: Marco Elver <[email protected]>
Tested-by: Nathan Chancellor <[email protected]>
Link: https://lore.kernel.org/r/20231017165946.v4i2d4exyqwqq3bx@treble
---
init/Makefile | 1 +
scripts/Makefile.vmlinux | 1 +
2 files changed, 2 insertions(+)
diff --git a/init/Makefile b/init/Makefile
index ec557ad..cbac576 100644
--- a/init/Makefile
+++ b/init/Makefile
@@ -60,4 +60,5 @@ include/generated/utsversion.h: FORCE
$(obj)/version-timestamp.o: include/generated/utsversion.h
CFLAGS_version-timestamp.o := -include include/generated/utsversion.h
KASAN_SANITIZE_version-timestamp.o := n
+KCSAN_SANITIZE_version-timestamp.o := n
GCOV_PROFILE_version-timestamp.o := n
diff --git a/scripts/Makefile.vmlinux b/scripts/Makefile.vmlinux
index 3cd6ca1..c9f3e03 100644
--- a/scripts/Makefile.vmlinux
+++ b/scripts/Makefile.vmlinux
@@ -19,6 +19,7 @@ quiet_cmd_cc_o_c = CC $@
ifdef CONFIG_MODULES
KASAN_SANITIZE_.vmlinux.export.o := n
+KCSAN_SANITIZE_.vmlinux.export.o := n
GCOV_PROFILE_.vmlinux.export.o := n
targets += .vmlinux.export.o
vmlinux: .vmlinux.export.o
On Thu, Oct 12, 2023 at 05:50:35PM -0000, tip-bot2 for David Kaplan wrote:
> SYM_CODE_START(__x86_return_thunk)
> UNWIND_HINT_FUNC
> ANNOTATE_NOENDBR
> - ANNOTATE_UNRET_SAFE
> - ret
> + ALTERNATIVE __stringify(ANNOTATE_UNRET_SAFE;ret),"ud2", X86_FEATURE_ALWAYS
I'm wondering if panicking people's boxes isn't too harsh.
Also, we don't BUG() if we can continue so perhaps this should be
a really loud warn instead:
---
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index f93e9b96927a..f230f396c9c1 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -357,6 +357,8 @@ extern void entry_ibpb(void);
extern void (*x86_return_thunk)(void);
+extern void check_thunks(void);
+
#ifdef CONFIG_CALL_DEPTH_TRACKING
extern void call_depth_return_thunk(void);
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index bb0ab8466b91..e4b2dfbf3de5 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -2849,3 +2849,8 @@ ssize_t cpu_show_gds(struct device *dev, struct device_attribute *attr, char *bu
return cpu_show_common(dev, attr, buf, X86_BUG_GDS);
}
#endif
+
+void check_thunks(void)
+{
+ WARN(1, "Unconverted return thunk\n");
+}
diff --git a/arch/x86/lib/retpoline.S b/arch/x86/lib/retpoline.S
index 3f3a478b74dd..ca9024ef0a7c 100644
--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -366,8 +366,7 @@ SYM_FUNC_END(call_depth_return_thunk)
SYM_CODE_START(__x86_return_thunk)
UNWIND_HINT_FUNC
ANNOTATE_NOENDBR
- ALTERNATIVE __stringify(ANNOTATE_UNRET_SAFE;ret),"ud2", X86_FEATURE_ALWAYS
- int3
+ ALTERNATIVE __stringify(ANNOTATE_UNRET_SAFE;ret;int3),"call check_thunks; ret", X86_FEATURE_ALWAYS
SYM_CODE_END(__x86_return_thunk)
EXPORT_SYMBOL(__x86_return_thunk)
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
* Borislav Petkov <[email protected]> wrote:
> diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
> index bb0ab8466b91..e4b2dfbf3de5 100644
> --- a/arch/x86/kernel/cpu/bugs.c
> +++ b/arch/x86/kernel/cpu/bugs.c
> @@ -2849,3 +2849,8 @@ ssize_t cpu_show_gds(struct device *dev, struct device_attribute *attr, char *bu
> return cpu_show_common(dev, attr, buf, X86_BUG_GDS);
> }
> #endif
> +
> +void check_thunks(void)
> +{
> + WARN(1, "Unconverted return thunk\n");
If then WARN_ONCE().
Thanks,
Ingo
On Wed, Oct 18, 2023 at 03:38:56PM +0200, Ingo Molnar wrote:
> If then WARN_ONCE().
WARN_ONCE() is not enough considering that if this fires, it means we're
not really properly protected against one of those RET-speculation
things.
It needs to be warning constantly but then still allow booting. I.e,
a ratelimited warn of sorts but I don't think we have that... yet.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Wed, Oct 18, 2023 at 05:12:45PM +0200, Borislav Petkov wrote:
> On Wed, Oct 18, 2023 at 03:38:56PM +0200, Ingo Molnar wrote:
> > If then WARN_ONCE().
>
> WARN_ONCE() is not enough considering that if this fires, it means we're
> not really properly protected against one of those RET-speculation
> things.
>
> It needs to be warning constantly but then still allow booting. I.e,
> a ratelimited warn of sorts but I don't think we have that... yet.
I'm not sure a rate-limited WARN() would be a good thing. Either the
user is regularly checking dmesg (most likely in some automated fashion)
or they're not. If the latter, a rate-limited WARN() would wrap dmesg
pretty quickly.
--
Josh
On Wed, Oct 18, 2023 at 08:54:33AM -0700, Josh Poimboeuf wrote:
> On Wed, Oct 18, 2023 at 05:12:45PM +0200, Borislav Petkov wrote:
> > On Wed, Oct 18, 2023 at 03:38:56PM +0200, Ingo Molnar wrote:
> > > If then WARN_ONCE().
> >
> > WARN_ONCE() is not enough considering that if this fires, it means we're
> > not really properly protected against one of those RET-speculation
> > things.
> >
> > It needs to be warning constantly but then still allow booting. I.e,
> > a ratelimited warn of sorts but I don't think we have that... yet.
>
> I'm not sure a rate-limited WARN() would be a good thing. Either the
> user is regularly checking dmesg (most likely in some automated fashion)
> or they're not. If the latter, a rate-limited WARN() would wrap dmesg
> pretty quickly.
Well, freezing the box without any mention about why it happens is not
viable either. So for lack of a better solution, overflowing dmesg is
all we could do.
And, on a related note, I'm thinking I should revert:
e92626af3234 ("x86/retpoline: Remove .text..__x86.return_thunk section")
after all because I'm debugging another similar issue reported by
dhowells.
And I can reproduce it on linux-next with his config and gcc-13. The
splat looks like this below - and mind you, that's in a VM. On baremetal
you get to see only the first warning and output stops.
And that happens because for whatever reason apply_returns() can't find
that last jmp __x86_return_thunk for %r15 and it barfs.
When I revert e92626af3234, it is fixed. It fixes dhowells' box too.
Which means, IMHO, objtool is missing to add a return return call site
at the end of that __x86_indirect_thunk_r15.
And considering how close we are to the merge window, I'd let that
.text..__x86.return_thunk section exist so that objtool can find the
return sites more reliably that what we currently have.
We can always do e92626af3234 later, when it has seen more testing.
Now, to the UD2 case - look below at "* first splat".
Stack protector fires but there's no #UD exception. Well, there is, it
is well hidden:
(gdb) bt
#0 delay_tsc (cycles=3670543) at arch/x86/lib/delay.c:90
#1 0xffffffff810c706e in panic (fmt=fmt@entry=0xffffffff82504fe4 "stack-protector: Kernel stack is corrupted in: %pB")
at kernel/panic.c:456
#2 0xffffffff81d64afb in __stack_chk_fail () at kernel/panic.c:763
#3 0xffffffff810e9333 in notify_die (val=val@entry=DIE_TRAP, str=str@entry=0xffffffff824f49b8 "invalid opcode",
regs=regs@entry=0xffff8880794000a8, err=err@entry=0, trap=trap@entry=6, sig=sig@entry=4) at kernel/notifier.c:597
#4 0xffffffff8101d4fb in do_error_trap (regs=regs@entry=0xffff8880794000a8, error_code=error_code@entry=0,
str=str@entry=0xffffffff824f49b8 "invalid opcode", trapnr=trapnr@entry=6, signr=signr@entry=4, sicode=sicode@entry=2,
addr=0xffffffff81d712a0 <__x86_return_thunk>) at arch/x86/kernel/traps.c:170
#5 0xffffffff81d62355 in handle_invalid_op (regs=0xffff8880794000a8) at ./arch/x86/include/asm/ptrace.h:209a
^^^^^^^^^^^^^^^^
#6 exc_invalid_op (regs=0xffff8880794000a8) at arch/x86/kernel/traps.c:263
#7 0xffffffff81e00a96 in asm_exc_invalid_op () at ./arch/x86/include/asm/idtentry.h:568
#8 0xffffffff81d49ff4 in cmp_ex_sort (a=0x38020f, b=0x761d61d8b) at lib/extable.c:61
#9 0x000000000000000c in fixed_percpu_data ()
#10 0xffff888079400198 in ?? ()
#11 0xffffffff826cdc70 in __modver_attr ()
#12 0x0000000000000270 in ?? ()
#13 0xffffffff826ceb10 in __start___ex_table ()
#14 0x0000000000000000 in ?? ()
(gdb)
*while* it runs the #UD exception handler, stackprotector determines
that the stack has been corrupted, leading to that panic.
And nothing in dmesg tells the user what's really going on.
And with the warn, you can actually see it:
------------[ cut here ]------------
Unconverted return thunk
WARNING: CPU: 0 PID: 1 at arch/x86/kernel/cpu/bugs.c:2855 check_thunks+0x11/0x1a
Modules linked in:
...
----
I still need to figure out, though, how to make check_thunks *not* have a "jmp
__x86_return_thunk" at the end itself because it gets loopy. :)
* first splat
-------------
...
x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
------------[ cut here ]------------
missing return thunk: __x86_indirect_thunk_r15+0xa/0x5f-0x0: eb 74 66 66 2e
WARNING: CPU: 0 PID: 0 at arch/x86/kernel/alternative.c:755 apply_returns+0xca/0x247
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.6.0-rc6-next-20231018-build3 #4
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
RIP: 0010:apply_returns+0xca/0x247
Code: 80 3d bd e1 aa 01 00 75 b4 49 89 d8 48 89 ea 48 89 de c6 05 ab e1 aa 01 01 b9 05 00 00 00 48 c7 c7 45 65 4f 82 e8 0b 10 0a 00 <0f> 0b eb 8f f6 05 36 2e 1f 02 02 74 26 0f b6 54 24 52 48 89 de 48
RSP: 0000:ffffffff82803e30 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffffffff81d7122a RCX: 0000000000000003
RDX: 0000000000000086 RSI: 00000000fff7ffff RDI: 0000000000000001
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: ffffffff82803969 R12: ffffffff831fed00
R13: 0000000000000005 R14: ffffffff831fed18 R15: 0000000000013af0
FS: 0000000000000000(0000) GS:ffff888079400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff888003401000 CR3: 0000000002854000 CR4: 00000000001506f0
Call Trace:
<TASK>
? __warn+0x8c/0xf6
? report_bug+0xbf/0x11f
? apply_returns+0xca/0x247
? handle_bug+0x3c/0x63
? exc_invalid_op+0x13/0x5d
? asm_exc_invalid_op+0x16/0x20
? __x86_indirect_thunk_r15+0xa/0x5f
? apply_returns+0xca/0x247
? __x86_indirect_thunk_r15+0xa/0x5f
? __x86_indirect_thunk_r15+0x19/0x5f
? __x86_indirect_thunk_r15+0xc/0x5f
alternative_instructions+0x35/0xe2
arch_cpu_finalize_init+0xba/0xdb
start_kernel+0x4a1/0x524
x86_64_start_reservations+0x25/0x25
x86_64_start_kernel+0x73/0x73
secondary_startup_64_no_verify+0x166/0x16b
</TASK>
---[ end trace 0000000000000000 ]---
Freeing SMP alternatives memory: 36K
pid_max: default: 32768 minimum: 301
LSM: initializing lsm=capability,yama,selinux
Yama: becoming mindful.
SELinux: Initializing.
Mount-cache hash table entries: 4096 (order: 3, 32768 bytes, linear)
Mountpoint-cache hash table entries: 4096 (order: 3, 32768 bytes, linear)
smpboot: CPU0: Intel Core Processor (Haswell, no TSX) (family: 0x6, model: 0x3c, stepping: 0x1)
RCU Tasks Rude: Setting shift to 2 and lim to 1 rcu_task_cb_adjust=1.
RCU Tasks Trace: Setting shift to 2 and lim to 1 rcu_task_cb_adjust=1.
Performance Events: unsupported p6 CPU model 60 no PMU driver, software events only.
signal: max sigframe size: 1776
rcu: Hierarchical SRCU implementation.
rcu: Max phase no-delay instances is 1000.
NMI watchdog: Perf NMI watchdog permanently disabled
smp: Bringing up secondary CPUs ...
BUG: spinlock bad magic on CPU#0, swapper/0/1
Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: notify_die+0x52/0x5b
CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 6.6.0-rc6-next-20231018-build3 #4
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Call Trace:
---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: notify_die+0x52/0x5b ]---
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Wed, Oct 18, 2023 at 07:55:31PM +0200, Borislav Petkov wrote:
> On Wed, Oct 18, 2023 at 08:54:33AM -0700, Josh Poimboeuf wrote:
> > On Wed, Oct 18, 2023 at 05:12:45PM +0200, Borislav Petkov wrote:
> > > On Wed, Oct 18, 2023 at 03:38:56PM +0200, Ingo Molnar wrote:
> > > > If then WARN_ONCE().
> > >
> > > WARN_ONCE() is not enough considering that if this fires, it means we're
> > > not really properly protected against one of those RET-speculation
> > > things.
> > >
> > > It needs to be warning constantly but then still allow booting. I.e,
> > > a ratelimited warn of sorts but I don't think we have that... yet.
> >
> > I'm not sure a rate-limited WARN() would be a good thing. Either the
> > user is regularly checking dmesg (most likely in some automated fashion)
> > or they're not. If the latter, a rate-limited WARN() would wrap dmesg
> > pretty quickly.
>
> Well, freezing the box without any mention about why it happens is not
> viable either. So for lack of a better solution, overflowing dmesg is
> all we could do.
Why not just WARN_ONCE() then?
> And, on a related note, I'm thinking I should revert:
>
> e92626af3234 ("x86/retpoline: Remove .text..__x86.return_thunk section")
>
> after all because I'm debugging another similar issue reported by
> dhowells.
>
> And I can reproduce it on linux-next with his config and gcc-13. The
> splat looks like this below - and mind you, that's in a VM. On baremetal
> you get to see only the first warning and output stops.
>
> And that happens because for whatever reason apply_returns() can't find
> that last jmp __x86_return_thunk for %r15 and it barfs.
>
> When I revert e92626af3234, it is fixed. It fixes dhowells' box too.
>
> Which means, IMHO, objtool is missing to add a return return call site
> at the end of that __x86_indirect_thunk_r15.
>
> And considering how close we are to the merge window, I'd let that
> .text..__x86.return_thunk section exist so that objtool can find the
> return sites more reliably that what we currently have.
>
> We can always do e92626af3234 later, when it has seen more testing.
Ok. A revert is fine for now, but either way we do need to get to the
bottom of why objtool is messing up. Can you share the config?
--
Josh
On Wed, Oct 18, 2023 at 11:14:31AM -0700, Josh Poimboeuf wrote:
> > > > WARN_ONCE() is not enough considering that if this fires, it means we're
> > > > not really properly protected against one of those RET-speculation
> > > > things.
> > > >
> > > > It needs to be warning constantly but then still allow booting. I.e,
> > > > a ratelimited warn of sorts but I don't think we have that... yet.
^
-----| this here.
> > > I'm not sure a rate-limited WARN() would be a good thing. Either the
> > > user is regularly checking dmesg (most likely in some automated fashion)
> > > or they're not. If the latter, a rate-limited WARN() would wrap dmesg
> > > pretty quickly.
> >
> > Well, freezing the box without any mention about why it happens is not
> > viable either. So for lack of a better solution, overflowing dmesg is
> > all we could do.
>
> Why not just WARN_ONCE() then?
See above....^
> Ok. A revert is fine for now, but either way we do need to get to the
> bottom of why objtool is messing up. Can you share the config?
Attached.
And as said, you need gcc 13.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Wed, Oct 18, 2023 at 08:22:23PM +0200, Borislav Petkov wrote:
> On Wed, Oct 18, 2023 at 11:14:31AM -0700, Josh Poimboeuf wrote:
> > > > > WARN_ONCE() is not enough considering that if this fires, it means we're
> > > > > not really properly protected against one of those RET-speculation
> > > > > things.
> > > > >
> > > > > It needs to be warning constantly but then still allow booting. I.e,
> > > > > a ratelimited warn of sorts but I don't think we have that... yet.
>
>
> ^
> -----| this here.
>
>
> > > > I'm not sure a rate-limited WARN() would be a good thing. Either the
> > > > user is regularly checking dmesg (most likely in some automated fashion)
> > > > or they're not. If the latter, a rate-limited WARN() would wrap dmesg
> > > > pretty quickly.
> > >
> > > Well, freezing the box without any mention about why it happens is not
> > > viable either. So for lack of a better solution, overflowing dmesg is
> > > all we could do.
> >
> > Why not just WARN_ONCE() then?
>
> See above....^
And see my reply to that? Not trying to be daft, I just didn't see how
your reply was responsive.
A single WARN_ONCE() has the benefit of not overflowing dmesg, while
also making the warning available to those looking at dmesg (or the
taint flag), as those who care should be.
A rate-limited WARN() is problematic, as it overflows dmesg (and
possibly wrapping other logs), potentially obscuring other important
data.
> > Ok. A revert is fine for now, but either way we do need to get to the
> > bottom of why objtool is messing up. Can you share the config?
>
> Attached.
>
> And as said, you need gcc 13.
Thanks, I'll see if I can recreate.
--
Josh
On Wed, Oct 18, 2023 at 11:39:15AM -0700, Josh Poimboeuf wrote:
> And see my reply to that? Not trying to be daft, I just didn't see how
> your reply was responsive.
>
> A single WARN_ONCE() has the benefit of not overflowing dmesg, while
> also making the warning available to those looking at dmesg (or the
> taint flag), as those who care should be.
A single WARN once is not enough as this is security-sensitive. Warns do
get ignored.
> A rate-limited WARN() is problematic, as it overflows dmesg (and
> possibly wrapping other logs), potentially obscuring other important
> data.
This will hopefully make people look by screaming louder. But no
guarantee. Not saying it is the right thing.
UDing without any output is not the right thing either.
All I'm saying is, there's no good solution for how to catch this and
make people report it.
Make more sense?
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Wed, Oct 18, 2023 at 08:44:31PM +0200, Borislav Petkov wrote:
> On Wed, Oct 18, 2023 at 11:39:15AM -0700, Josh Poimboeuf wrote:
> > And see my reply to that? Not trying to be daft, I just didn't see how
> > your reply was responsive.
> >
> > A single WARN_ONCE() has the benefit of not overflowing dmesg, while
> > also making the warning available to those looking at dmesg (or the
> > taint flag), as those who care should be.
>
> A single WARN once is not enough as this is security-sensitive. Warns do
> get ignored.
>
> > A rate-limited WARN() is problematic, as it overflows dmesg (and
> > possibly wrapping other logs), potentially obscuring other important
> > data.
>
> This will hopefully make people look by screaming louder. But no
> guarantee. Not saying it is the right thing.
>
> UDing without any output is not the right thing either.
>
> All I'm saying is, there's no good solution for how to catch this and
> make people report it.
>
> Make more sense?
There are a lot of warnings which could become security concerns.
By definition, a warning means something is seriously wrong. If it's
ignored, all bets are off. That's why we taint the kernel.
--
Josh
On Wed, Oct 18, 2023 at 12:14:07PM -0700, Josh Poimboeuf wrote:
> There are a lot of warnings which could become security concerns.
Not "could become" - this one *is* a security issue because it means we're
not mitigating with the RET thunks properly.
> By definition, a warning means something is seriously wrong. If it's
> ignored, all bets are off. That's why we taint the kernel.
If I could, I'd cripple the kernel just enough so that it issues the
warning and then stops so that the users are not exposed, but show why
it stopped. And we know that panicking doesn't provide that.
David suggested earlier that perhaps we should warn and then mark the
kernel as vulnerable to those mitigations. That could be a more
realistic thing to do...
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Wed, Oct 18, 2023 at 07:55:31PM +0200, Borislav Petkov wrote:
> And that happens because for whatever reason apply_returns() can't find
> that last jmp __x86_return_thunk for %r15 and it barfs.
Some more info on why it happens:
something with gcc-13 or this config of whatever ends up generating
this:
ffffffff81d71200 <__x86_indirect_thunk_r14>:
ffffffff81d71200: e8 01 00 00 00 call ffffffff81d71206 <__x86_indirect_thunk_r14+0x6>
ffffffff81d71205: cc int3
ffffffff81d71206: 4c 89 34 24 mov %r14,(%rsp)
ffffffff81d7120a: e9 91 00 00 00 jmp ffffffff81d712a0 <__x86_return_thunk>
^^^^^^^^^
ffffffff81d7120f: 66 66 2e 0f 1f 84 00 data16 cs nopw 0x0(%rax,%rax,1)
ffffffff81d71216: 00 00 00 00
ffffffff81d7121a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
ffffffff81d71220 <__x86_indirect_thunk_r15>:
ffffffff81d71220: e8 01 00 00 00 call ffffffff81d71226 <__x86_indirect_thunk_r15+0x6>
ffffffff81d71225: cc int3
ffffffff81d71226: 4c 89 3c 24 mov %r15,(%rsp)
ffffffff81d7122a: eb 74 jmp ffffffff81d712a0 <__x86_return_thunk>
^^^^^^^^^^
notice the two JMP opcodes there.
Now look at the code in apply_returns:
if (op == JMP32_INSN_OPCODE)
dest = addr + insn.length + insn.immediate.value;
with
#define JMP32_INSN_OPCODE 0xE9
And here's the fix:
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 73be3931e4f0..50d64f5226f4 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -748,14 +748,20 @@ void __init_or_module noinline apply_returns(s32 *start, s32 *end)
continue;
op = insn.opcode.bytes[0];
- if (op == JMP32_INSN_OPCODE)
+ if (op == JMP32_INSN_OPCODE || op == JMP8_INSN_OPCODE)
dest = addr + insn.length + insn.immediate.value;
I'd still prefer the revert, though, that close to the MW. We can work
at those things later, at leisure.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Wed, Oct 18, 2023 at 10:37:47PM +0200, Borislav Petkov wrote:
> +++ b/arch/x86/kernel/alternative.c
> @@ -748,14 +748,20 @@ void __init_or_module noinline apply_returns(s32 *start, s32 *end)
> continue;
>
> op = insn.opcode.bytes[0];
> - if (op == JMP32_INSN_OPCODE)
> + if (op == JMP32_INSN_OPCODE || op == JMP8_INSN_OPCODE)
> dest = addr + insn.length + insn.immediate.value;
I can recreate (with my GCC 12) by disabling CONFIG_CALL_DEPTH_TRACKING
and CONFIG_CPU_SRSO, which puts __x86_return_thunk() close enough to the
retpolines to enable the two-byte JMP in the last retpoline. And then
booting with spectre_v2=retpoline.
(Then to force two-byte JMPs for more retpolines, I cheated and just
moved __x86_return_thunk() to right after the retpolines.)
Your WARN patch didn't seem to fix the no-output hang for me, maybe due
to recursive warnings?
I was able to get more output by changing the WARN to (ahem) WARN_ONCE,
but it's still getting into some kind of stack corruption. Full output
below. I haven't had a chance to look further, but it's worrisome that
even the WARN_ONCE isn't being recovered from.
Regardless of if we revert e92626af3234 ("x86/retpoline: Remove
.text..__x86.return_thunk section"), or do the above patch, we still
need to figure out why even WARN_ONCE() would be borking things.
Off to bed...
[ 0.403286] ------------[ cut here ]------------
[ 0.403286] missing return thunk: __x86_indirect_thunk_r12+0xa/0x20-0x0: eb 74 66 66 2e
[ 0.403286] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/alternative.c:742 apply_returns+0xca/0x23a
[ 0.403286] Modules linked in:
[ 0.403286] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.6.0-rc2+ #193
[ 0.403286] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc37 04/01/2014
[ 0.403286] RIP: 0010:apply_returns+0xca/0x23a
[ 0.403286] Code: 80 3d fa e2 65 01 00 75 b4 49 89 d8 b9 05 00 00 00 48 89 ea 48 89 de 48 c7 c7 c0 35 1a 82 c6 05 dc e2 65 01 01 e8 00 2c 0a 00 <0f> 0b eb 8f f6 05 50 18 e3 01 02 74 26 0f b6 54 24 52 48 63 44 24
[ 0.403286] RSP: 0000:ffffffff82403e30 EFLAGS: 00010282
[ 0.403286] RAX: 0000000000000000 RBX: ffffffff81aa100a RCX: 0000000000000003
[ 0.403286] RDX: 0000000000000003 RSI: 0000000000000003 RDI: 0000000000000001
[ 0.403286] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
[ 0.403286] R10: 0000000000000003 R11: ffffffff82403961 R12: ffffffff82e421f4
[ 0.403286] R13: ffffffff82e42218 R14: ffffffff82411f40 R15: 0000000000093cb0
[ 0.403286] FS: 0000000000000000(0000) GS:ffff88817b600000(0000) knlGS:0000000000000000
[ 0.403286] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.403286] CR2: ffff88817ffff000 CR3: 000000000243c001 CR4: 0000000000770ef0
[ 0.403286] PKRU: 55555554
[ 0.403286] Call Trace:
[ 0.403286] <TASK>
[ 0.403286] ? __warn+0xab/0x148
[ 0.403286] ? report_bug+0x109/0x17e
[ 0.403286] ? apply_returns+0xca/0x23a
[ 0.403286] ? handle_bug+0x41/0x6f
[ 0.403286] ? exc_invalid_op+0x13/0x60
[ 0.403286] ? asm_exc_invalid_op+0x16/0x20
[ 0.403286] ? __x86_indirect_thunk_r12+0xa/0x20
[ 0.403286] ? apply_returns+0xca/0x23a
[ 0.403286] ? __x86_indirect_thunk_r12+0xa/0x20
[ 0.403286] ? __x86_indirect_thunk_r12+0x19/0x20
[ 0.403286] ? __x86_indirect_thunk_r12+0xc/0x20
[ 0.403286] alternative_instructions+0x48/0xf5
[ 0.403286] arch_cpu_finalize_init+0xca/0xeb
[ 0.403286] start_kernel+0x588/0x610
[ 0.403286] x86_64_start_reservations+0x25/0x25
[ 0.403286] x86_64_start_kernel+0x73/0x73
[ 0.403286] secondary_startup_64_no_verify+0x166/0x16b
[ 0.403286] </TASK>
[ 0.403286] irq event stamp: 116873
[ 0.403286] hardirqs last enabled at (116883): [<ffffffff811497c4>] console_unlock+0xb0/0xfe
[ 0.403286] hardirqs last disabled at (116892): [<ffffffff811497a9>] console_unlock+0x95/0xfe
[ 0.403286] softirqs last enabled at (0): [<0000000000000000>] 0x0
[ 0.403286] softirqs last disabled at (0): [<0000000000000000>] 0x0
[ 0.403286] ---[ end trace 0000000000000000 ]---
[ 0.403286] ------------[ cut here ]------------
[ 0.403286] Unconverted return thunk
[ 0.403286] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/cpu/bugs.c:2855 check_thunks+0x21/0x28
[ 0.403286] Modules linked in:
[ 0.403286] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 6.6.0-rc2+ #193
[ 0.403286] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc37 04/01/2014
[ 0.403286] RIP: 0010:check_thunks+0x21/0x28
[ 0.403286] Code: be 02 00 00 e9 3b f3 ff ff 0f 1f 44 00 00 80 3d c9 0f 65 01 00 75 15 48 c7 c7 9b 61 1a 82 c6 05 b9 0f 65 01 01 e8 c3 58 09 00 <0f> 0b c3 cc cc cc cc 89 ff e8 54 09 02 00 90 c3 cc cc cc cc 53 89
[ 0.403286] RSP: 0000:ffffffff824039f8 EFLAGS: 00010082
[ 0.403286] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000003
[ 0.403286] RDX: 0000000000000004 RSI: 0000000000000003 RDI: 0000000000000001
[ 0.403286] RBP: ffffffff82403aa8 R08: 0000000000000000 R09: 0000000000000000
[ 0.403286] R10: 7574657220646574 R11: 7265766e6f636e55 R12: ffffffff824128c0
[ 0.403286] R13: ffffffff8117bdf0 R14: ffffffff82403ab8 R15: ffffffff83097580
[ 0.403286] FS: 0000000000000000(0000) GS:ffff88817b600000(0000) knlGS:0000000000000000
[ 0.403286] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.403286] CR2: ffff88817ffff000 CR3: 000000000243c001 CR4: 0000000000770ef0
[ 0.403286] PKRU: 55555554
[ 0.403286] Call Trace:
[ 0.403286] <TASK>
[ 0.403286] ? __warn+0xab/0x148
[ 0.403286] ? report_bug+0x109/0x17e
[ 0.403286] ? check_thunks+0x21/0x28
[ 0.403286] ? handle_bug+0x41/0x6f
[ 0.403286] ? exc_invalid_op+0x13/0x60
[ 0.403286] ? asm_exc_invalid_op+0x16/0x20
[ 0.403286] ? write_profile+0x163/0x163
[ 0.403286] ? check_thunks+0x21/0x28
[ 0.403286] ? check_thunks+0x21/0x28
[ 0.403286] __x86_return_thunk+0x5/0x7f
[ 0.403286] write_profile+0x163/0x163
[ 0.403286] arch_stack_walk+0xa3/0xda
[ 0.403286] ? stack_trace_save+0x48/0x6a
[ 0.403286] stack_trace_save+0x48/0x6a
[ 0.403286] save_trace+0x60/0x27a
[ 0.403286] mark_lock+0x108/0x357
[ 0.403286] ? desc_read+0x21/0xc5
[ 0.403286] ? insn_get_prefixes+0x17f/0x248
[ 0.403286] ? insn_get_opcode+0xd8/0x13a
[ 0.403286] __lock_acquire+0x538/0x1108
[ 0.403286] ? skip_nops+0x40/0x8e
[ 0.403286] ? _vm_unmap_aliases+0x58/0x203
[ 0.403286] lock_acquire+0x12b/0x277
[ 0.403286] ? _vm_unmap_aliases+0x58/0x203
[ 0.403286] ? insn_get_displacement+0x29/0x10a
[ 0.403286] ? __copy_user_flushcache+0x1d/0x94
[ 0.403286] ? _vm_unmap_aliases+0x58/0x203
[ 0.403286] __mutex_lock+0xbb/0x3ff
[ 0.403286] ? _vm_unmap_aliases+0x58/0x203
[ 0.403286] ? tracer_hardirqs_on+0x2c/0xd8
[ 0.403286] ? text_poke_early+0x85/0x95
[ 0.403286] ? _vm_unmap_aliases+0x58/0x203
[ 0.403286] _vm_unmap_aliases+0x58/0x203
[ 0.403286] ? __copy_user_flushcache+0x1d/0x94
[ 0.403286] change_page_attr_set_clr+0xc0/0x24f
[ 0.403286] change_page_attr_set.constprop.0+0xf/0x15
[ 0.403286] set_memory_nx+0x27/0x2d
[ 0.403286] free_init_pages+0x54/0x83
[ 0.403286] alternative_instructions+0xe1/0xf5
[ 0.403286] arch_cpu_finalize_init+0xca/0xeb
[ 0.403286] start_kernel+0x588/0x610
[ 0.403286] x86_64_start_reservations+0x25/0x25
[ 0.403286] x86_64_start_kernel+0x73/0x73
[ 0.403286] secondary_startup_64_no_verify+0x166/0x16b
[ 0.403286] </TASK>
[ 0.403286] irq event stamp: 122505
[ 0.403286] hardirqs last enabled at (122505): [<ffffffff81033999>] text_poke_early+0x85/0x95
[ 0.403286] hardirqs last disabled at (122504): [<ffffffff81033986>] text_poke_early+0x72/0x95
[ 0.403286] softirqs last enabled at (0): [<0000000000000000>] 0x0
[ 0.403286] softirqs last disabled at (0): [<0000000000000000>] 0x0
[ 0.403286] ---[ end trace 0000000000000000 ]---
[ 0.403286] BUG: kernel NULL pointer dereference, address: 0000000000000011
[ 0.403286] #PF: supervisor read access in kernel mode
[ 0.403286] #PF: error_code(0x0000) - not-present page
[ 0.403286] PGD 0 P4D 0
[ 0.403286] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 0.403286] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 6.6.0-rc2+ #193
[ 0.403286] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc37 04/01/2014
[ 0.403286] RIP: 0010:stack_trace_consume_entry+0x5/0x3c
[ 0.403286] Code: 48 89 d8 48 8b 54 24 08 65 48 2b 14 25 28 00 00 00 74 05 e8 f9 47 91 00 48 83 c4 10 5b 5d 41 5c c3 cc cc cc cc 0f 1f 44 00 00 <8b> 47 10 31 d2 3b 47 08 73 26 8b 57 0c 85 d2 74 09 ff ca 89 57 0c
[ 0.403286] RSP: 0000:ffffffff82403a08 EFLAGS: 00010082
[ 0.403286] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000003
[ 0.403286] RDX: 0000000000000004 RSI: 0000000000000003 RDI: 0000000000000001
[ 0.403286] RBP: ffffffff82403aa8 R08: 0000000000000000 R09: 0000000000000000
[ 0.403286] R10: 7574657220646574 R11: 7265766e6f636e55 R12: ffffffff824128c0
[ 0.403286] R13: ffffffff8117bdf0 R14: ffffffff82403ab8 R15: ffffffff83097580
[ 0.403286] FS: 0000000000000000(0000) GS:ffff88817b600000(0000) knlGS:0000000000000000
[ 0.403286] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.403286] CR2: 0000000000000011 CR3: 000000000243c001 CR4: 0000000000770ef0
[ 0.403286] PKRU: 55555554
[ 0.403286] Call Trace:
[ 0.403286] <TASK>
[ 0.403286] ? __die_body+0x1a/0x5c
[ 0.403286] ? page_fault_oops+0x333/0x380
[ 0.403286] ? do_user_addr_fault+0x125/0x503
[ 0.403286] ? exc_page_fault+0x168/0x19e
[ 0.403286] ? asm_exc_page_fault+0x22/0x30
[ 0.403286] ? write_profile+0x163/0x163
[ 0.403286] ? stack_trace_consume_entry+0x5/0x3c
[ 0.403286] ? write_profile+0x163/0x163
[ 0.403286] arch_stack_walk+0xa3/0xda
[ 0.403286] ? stack_trace_save+0x48/0x6a
[ 0.403286] stack_trace_save+0x48/0x6a
[ 0.403286] save_trace+0x60/0x27a
[ 0.403286] mark_lock+0x108/0x357
[ 0.403286] ? desc_read+0x21/0xc5
[ 0.403286] ? insn_get_prefixes+0x17f/0x248
[ 0.403286] ? insn_get_opcode+0xd8/0x13a
[ 0.403286] __lock_acquire+0x538/0x1108
[ 0.403286] ? skip_nops+0x40/0x8e
[ 0.403286] ? _vm_unmap_aliases+0x58/0x203
[ 0.403286] lock_acquire+0x12b/0x277
[ 0.403286] ? _vm_unmap_aliases+0x58/0x203
[ 0.403286] ? insn_get_displacement+0x29/0x10a
[ 0.403286] ? __copy_user_flushcache+0x1d/0x94
[ 0.403286] ? _vm_unmap_aliases+0x58/0x203
[ 0.403286] __mutex_lock+0xbb/0x3ff
[ 0.403286] ? _vm_unmap_aliases+0x58/0x203
[ 0.403286] ? tracer_hardirqs_on+0x2c/0xd8
[ 0.403286] ? text_poke_early+0x85/0x95
[ 0.403286] ? _vm_unmap_aliases+0x58/0x203
[ 0.403286] _vm_unmap_aliases+0x58/0x203
[ 0.403286] ? __copy_user_flushcache+0x1d/0x94
[ 0.403286] change_page_attr_set_clr+0xc0/0x24f
[ 0.403286] change_page_attr_set.constprop.0+0xf/0x15
[ 0.403286] set_memory_nx+0x27/0x2d
[ 0.403286] free_init_pages+0x54/0x83
[ 0.403286] alternative_instructions+0xe1/0xf5
[ 0.403286] arch_cpu_finalize_init+0xca/0xeb
[ 0.403286] start_kernel+0x588/0x610
[ 0.403286] x86_64_start_reservations+0x25/0x25
[ 0.403286] x86_64_start_kernel+0x73/0x73
[ 0.403286] secondary_startup_64_no_verify+0x166/0x16b
[ 0.403286] </TASK>
[ 0.403286] Modules linked in:
[ 0.403286] CR2: 0000000000000011
[ 0.403286] ---[ end trace 0000000000000000 ]---
[ 0.403286] RIP: 0010:stack_trace_consume_entry+0x5/0x3c
[ 0.403286] Code: 48 89 d8 48 8b 54 24 08 65 48 2b 14 25 28 00 00 00 74 05 e8 f9 47 91 00 48 83 c4 10 5b 5d 41 5c c3 cc cc cc cc 0f 1f 44 00 00 <8b> 47 10 31 d2 3b 47 08 73 26 8b 57 0c 85 d2 74 09 ff ca 89 57 0c
[ 0.403286] RSP: 0000:ffffffff82403a08 EFLAGS: 00010082
[ 0.403286] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000003
[ 0.403286] RDX: 0000000000000004 RSI: 0000000000000003 RDI: 0000000000000001
[ 0.403286] RBP: ffffffff82403aa8 R08: 0000000000000000 R09: 0000000000000000
[ 0.403286] R10: 7574657220646574 R11: 7265766e6f636e55 R12: ffffffff824128c0
[ 0.403286] R13: ffffffff8117bdf0 R14: ffffffff82403ab8 R15: ffffffff83097580
[ 0.403286] FS: 0000000000000000(0000) GS:ffff88817b600000(0000) knlGS:0000000000000000
[ 0.403286] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.403286] CR2: 0000000000000011 CR3: 000000000243c001 CR4: 0000000000770ef0
[ 0.403286] PKRU: 55555554
[ 0.403286] Kernel panic - not syncing: Attempted to kill the idle task!
[ 0.403286] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---
--
Josh
On Wed, Oct 18, 2023 at 11:35:30PM -0700, Josh Poimboeuf wrote:
> On Wed, Oct 18, 2023 at 10:37:47PM +0200, Borislav Petkov wrote:
> > +++ b/arch/x86/kernel/alternative.c
> > @@ -748,14 +748,20 @@ void __init_or_module noinline apply_returns(s32 *start, s32 *end)
> > continue;
> >
> > op = insn.opcode.bytes[0];
> > - if (op == JMP32_INSN_OPCODE)
> > + if (op == JMP32_INSN_OPCODE || op == JMP8_INSN_OPCODE)
> > dest = addr + insn.length + insn.immediate.value;
>
> I can recreate (with my GCC 12) by disabling CONFIG_CALL_DEPTH_TRACKING
> and CONFIG_CPU_SRSO, which puts __x86_return_thunk() close enough to the
> retpolines to enable the two-byte JMP in the last retpoline. And then
> booting with spectre_v2=retpoline.
>
> (Then to force two-byte JMPs for more retpolines, I cheated and just
> moved __x86_return_thunk() to right after the retpolines.)
>
> Your WARN patch didn't seem to fix the no-output hang for me, maybe due
> to recursive warnings?
>
> I was able to get more output by changing the WARN to (ahem) WARN_ONCE,
> but it's still getting into some kind of stack corruption. Full output
> below. I haven't had a chance to look further, but it's worrisome that
> even the WARN_ONCE isn't being recovered from.
>
> Regardless of if we revert e92626af3234 ("x86/retpoline: Remove
> .text..__x86.return_thunk section"), or do the above patch, we still
> need to figure out why even WARN_ONCE() would be borking things.
>
> Off to bed...
One last idea, since the return thunk is used everywhere (even non-ABI
compliant functions) it might be possible the "call check_thunks" (and
its call to warn_printk) is clobbering some registers which some code
(exception handling entry code?) doesn't appreciate.
FWIW, I changed to a WARN_ON_ONCE and it booted fine.
--
Josh
On Wed, Oct 18, 2023 at 10:37:47PM +0200, Borislav Petkov wrote:
> And here's the fix:
>
> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
> index 73be3931e4f0..50d64f5226f4 100644
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -748,14 +748,20 @@ void __init_or_module noinline apply_returns(s32 *start, s32 *end)
> continue;
>
> op = insn.opcode.bytes[0];
> - if (op == JMP32_INSN_OPCODE)
> + if (op == JMP32_INSN_OPCODE || op == JMP8_INSN_OPCODE)
> dest = addr + insn.length + insn.immediate.value;
>
>
> I'd still prefer the revert, though, that close to the MW. We can work
> at those things later, at leisure.
Yet another fall-out from removing the section... When in it's own
section the compiler must emit long form jump because it doesn't know
where the target is.
Now, not so much.
Anyway, yes, that seems trivial enough as a fix.
The following commit has been merged into the x86/bugs branch of tip:
Commit-ID: 08ec7e82c1e3ebcd79ab8d2d0d11faad0f07e71c
Gitweb: https://git.kernel.org/tip/08ec7e82c1e3ebcd79ab8d2d0d11faad0f07e71c
Author: Borislav Petkov (AMD) <[email protected]>
AuthorDate: Thu, 19 Oct 2023 11:04:27 +02:00
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Thu, 19 Oct 2023 11:08:22 +02:00
Revert "x86/retpoline: Ensure default return thunk isn't used at runtime"
This reverts commit 91174087dcc7565d8bf0d576544e42d5b1de6f39.
It turns out that raising an undefined opcode exception due to unpatched
return thunks is not visible to users in every possible scenario (not
being able to catch dmesg, slow console, etc.).
Thus, it is not very friendly to them when the box explodes without even
saying why.
Revert for now until a better solution has been devised.
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Cc: David Kaplan <[email protected]>
Link: https://lore.kernel.org/r/20231018175531.GEZTAcE2p92U1AuVp1@fat_crate.local
---
arch/x86/lib/retpoline.S | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/arch/x86/lib/retpoline.S b/arch/x86/lib/retpoline.S
index fe05c13..6376d01 100644
--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -356,17 +356,15 @@ SYM_FUNC_END(call_depth_return_thunk)
* This function name is magical and is used by -mfunction-return=thunk-extern
* for the compiler to generate JMPs to it.
*
- * This code is only used during kernel boot. All
+ * This code is only used during kernel boot or module init. All
* 'JMP __x86_return_thunk' sites are changed to something else by
* apply_returns().
- *
- * This thunk is turned into a ud2 to ensure it is never used at runtime.
- * Alternative instructions are applied after apply_returns().
*/
SYM_CODE_START(__x86_return_thunk)
UNWIND_HINT_FUNC
ANNOTATE_NOENDBR
- ALTERNATIVE __stringify(ANNOTATE_UNRET_SAFE;ret),"ud2", X86_FEATURE_ALWAYS
+ ANNOTATE_UNRET_SAFE
+ ret
int3
SYM_CODE_END(__x86_return_thunk)
EXPORT_SYMBOL(__x86_return_thunk)
The following commit has been merged into the x86/bugs branch of tip:
Commit-ID: 59e6ce1eaaa2d9b2f9c89a108ce3fc7510bcd7ea
Gitweb: https://git.kernel.org/tip/59e6ce1eaaa2d9b2f9c89a108ce3fc7510bcd7ea
Author: Borislav Petkov (AMD) <[email protected]>
AuthorDate: Thu, 19 Oct 2023 11:09:41 +02:00
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Thu, 19 Oct 2023 11:25:19 +02:00
Revert "x86/retpoline: Remove .text..__x86.return_thunk section"
This reverts commit e92626af3234708fe30f53b269d210d202b95206.
David Howells reported his box freezing without being able to see
a panic. However, it managed to issue a warning beforehand:
missing return thunk: __x86_indirect_thunk_r15+0xa/0x5f-0x0: eb 74 66 66 2e
WARNING: CPU: 0 PID: 0 at arch/x86/kernel/alternative.c:755 apply_returns+0xca/0x247
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.6.0-rc5-next-20231013-build3+ #3044
Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
RIP: 0010:apply_returns+0xca/0x247
this happened with linux-next and with gcc 13. Looking at the compiler
output and particularly paying attention to the two JMP instructions:
<__x86_indirect_thunk_r14>:
e8 01 00 00 00 call ffffffff81d71206 <__x86_indirect_thunk_r14+0x6>
cc int3
4c 89 34 24 mov %r14,(%rsp)
e9 91 00 00 00 jmp ffffffff81d712a0 <__x86_return_thunk>
66 66 2e 0f 1f 84 00 data16 cs nopw 0x0(%rax,%rax,1)
00 00 00 00
66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
<__x86_indirect_thunk_r15>:
e8 01 00 00 00 call ffffffff81d71226 <__x86_indirect_thunk_r15+0x6>
cc int3
4c 89 3c 24 mov %r15,(%rsp)
eb 74 jmp ffffffff81d712a0 <__x86_return_thunk>
the second JMP is a short JMP one. This is likely some new gcc
optimization to size the JMP offsets and generate a small one if it
fits.
However, the apply_returns() logic does not expect a short JMP:
if (op == JMP32_INSN_OPCODE)
dest = addr + insn.length + insn.immediate.value;
and that JMP32_INSN_OPCODE is 0xe9.
Now, if __x86_return_thunk is in another section, the compiler cannot do
those shortcuts and will have to generate a JMP with a s32 offset.
As a matter of fact, the removal of the section broke another case, see
https://lore.kernel.org/r/[email protected]
so revert for now until all the possible code generation issues have
been assessed, addressed and verified properly.
Reported-by: David Howells <[email protected]>
Tested-by: David Howells <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Link: https://lore.kernel.org/r/20231018175531.GEZTAcE2p92U1AuVp1@fat_crate.local
---
arch/x86/kernel/vmlinux.lds.S | 3 +++
arch/x86/lib/retpoline.S | 2 ++
2 files changed, 5 insertions(+)
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 9cdb1a7..54a5596 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -132,7 +132,10 @@ SECTIONS
LOCK_TEXT
KPROBES_TEXT
SOFTIRQENTRY_TEXT
+#ifdef CONFIG_RETPOLINE
*(.text..__x86.indirect_thunk)
+ *(.text..__x86.return_thunk)
+#endif
STATIC_CALL_TEXT
ALIGN_ENTRY_TEXT_BEGIN
diff --git a/arch/x86/lib/retpoline.S b/arch/x86/lib/retpoline.S
index 6376d01..d410aba 100644
--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -129,6 +129,8 @@ SYM_CODE_END(__x86_indirect_jump_thunk_array)
#ifdef CONFIG_RETHUNK
+ .section .text..__x86.return_thunk
+
#ifdef CONFIG_CPU_SRSO
/*
On Wed, Oct 18, 2023 at 11:59:28PM -0700, Josh Poimboeuf wrote:
> One last idea, since the return thunk is used everywhere (even non-ABI
> compliant functions) it might be possible the "call check_thunks" (and
> its call to warn_printk) is clobbering some registers which some code
> (exception handling entry code?) doesn't appreciate.
Yeah, that is still unclean, I'd say. gcc doesn't know that we patch in
a CALL insn in the alternative. What should work is to have
alternative_call
there which alternates between two calls and gcc knows there's a call so
it can act accordingly wrt callee-* regs.
Considering how __x86_return_thunk is there only until alternatives have
run, we could do something like
ALTERNATIVE_CALL nop, check_thunks
where nop is a function which doesn't do anything.
I say "ALTERNATIVE_CALL" because we don't have a _CALL asm macro yet.
And then in check_thunks() we can do all kinds of screaming, tainting
and setting mitigation status to vulnerable, etc.
Anyway something along those lines.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
[AMD Official Use Only - General]
> -----Original Message-----
> From: Borislav Petkov <[email protected]>
> Sent: Thursday, October 19, 2023 9:15 AM
> To: Josh Poimboeuf <[email protected]>
> Cc: Ingo Molnar <[email protected]>; [email protected]; linux-tip-
> [email protected]; Kaplan, David <[email protected]>; Peter
> Zijlstra (Intel) <[email protected]>; [email protected]; David Howells
> <[email protected]>
> Subject: Re: [tip: x86/bugs] x86/retpoline: Ensure default return thunk isn't
> used at runtime
>
> Caution: This message originated from an External Source. Use proper caution
> when opening attachments, clicking links, or responding.
>
>
> On Wed, Oct 18, 2023 at 11:59:28PM -0700, Josh Poimboeuf wrote:
> > One last idea, since the return thunk is used everywhere (even non-ABI
> > compliant functions) it might be possible the "call check_thunks" (and
> > its call to warn_printk) is clobbering some registers which some code
> > (exception handling entry code?) doesn't appreciate.
>
> Yeah, that is still unclean, I'd say. gcc doesn't know that we patch in a CALL insn
> in the alternative. What should work is to have
>
> alternative_call
>
> there which alternates between two calls and gcc knows there's a call so it can
> act accordingly wrt callee-* regs.
>
> Considering how __x86_return_thunk is there only until alternatives have run,
> we could do something like
>
> ALTERNATIVE_CALL nop, check_thunks
>
> where nop is a function which doesn't do anything.
>
> I say "ALTERNATIVE_CALL" because we don't have a _CALL asm macro yet.
>
> And then in check_thunks() we can do all kinds of screaming, tainting and
> setting mitigation status to vulnerable, etc.
>
> Anyway something along those lines.
The return thunk is used for all functions though, including assembly coded functions which may use non-standard calling conventions and aren't visible to gcc. I think the only safe thing would be to preserve all GPRs across the call to check_thunks. Something like PUSH_REGS/call check_thunks/POP_REGS.
--David Kaplan
On Thu, Oct 19, 2023 at 02:21:40PM +0000, Kaplan, David wrote:
> The return thunk is used for all functions though, including assembly
> coded functions which may use non-standard calling conventions and
> aren't visible to gcc. I think the only safe thing would be to
> preserve all GPRs across the call to check_thunks. Something like
> PUSH_REGS/call check_thunks/POP_REGS.
That call nop will be inside the return thunk. I.e., something like
this:
SYM_CODE_START(__x86_return_thunk)
UNWIND_HINT_FUNC
ANNOTATE_NOENDBR
ANNOTATE_UNRET_SAFE
ALTERNATIVE CALL nop, check_thunks, X86_FEATURE_ALWAYS
ret
int3
SYM_CODE_END(__x86_return_thunk)
EXPORT_SYMBOL(__x86_return_thunk)
I suspect that gcc doesn't know that there is a function call in the asm
there, which is also what you hint at - I need to ask a compiler guy.
But yeah, if it doesn't, then we'll need to push/pop regs as you
suggest.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Thu, Oct 19, 2023 at 04:39:51PM +0200, Borislav Petkov wrote:
> On Thu, Oct 19, 2023 at 02:21:40PM +0000, Kaplan, David wrote:
> > The return thunk is used for all functions though, including assembly
> > coded functions which may use non-standard calling conventions and
> > aren't visible to gcc. I think the only safe thing would be to
> > preserve all GPRs across the call to check_thunks. Something like
> > PUSH_REGS/call check_thunks/POP_REGS.
>
> That call nop will be inside the return thunk. I.e., something like
> this:
>
> SYM_CODE_START(__x86_return_thunk)
> UNWIND_HINT_FUNC
> ANNOTATE_NOENDBR
> ANNOTATE_UNRET_SAFE
> ALTERNATIVE CALL nop, check_thunks, X86_FEATURE_ALWAYS
> ret
> int3
> SYM_CODE_END(__x86_return_thunk)
> EXPORT_SYMBOL(__x86_return_thunk)
>
> I suspect that gcc doesn't know that there is a function call in the asm
> there, which is also what you hint at - I need to ask a compiler guy.
>
> But yeah, if it doesn't, then we'll need to push/pop regs as you
> suggest.
GCC doesn't read asm. Even if it did that wouldn't fix things for
callers of custom-ABI return-thunk-using functions.
The below seems to work.
diff --git a/arch/x86/entry/thunk_64.S b/arch/x86/entry/thunk_64.S
index 27b5da2111ac..54c043e010f9 100644
--- a/arch/x86/entry/thunk_64.S
+++ b/arch/x86/entry/thunk_64.S
@@ -46,3 +46,5 @@ THUNK preempt_schedule_thunk, preempt_schedule
THUNK preempt_schedule_notrace_thunk, preempt_schedule_notrace
EXPORT_SYMBOL(preempt_schedule_thunk)
EXPORT_SYMBOL(preempt_schedule_notrace_thunk)
+
+THUNK warn_thunk_thunk, __warn_thunk
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 14cd3cd5f85a..315e3f9410b2 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -357,6 +357,8 @@ extern void entry_ibpb(void);
extern void (*x86_return_thunk)(void);
+extern void __warn_thunk(void);
+
#ifdef CONFIG_CALL_DEPTH_TRACKING
extern void call_depth_return_thunk(void);
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index bb0ab8466b91..7d89fe7a2e69 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -2849,3 +2849,8 @@ ssize_t cpu_show_gds(struct device *dev, struct device_attribute *attr, char *bu
return cpu_show_common(dev, attr, buf, X86_BUG_GDS);
}
#endif
+
+void __warn_thunk(void)
+{
+ WARN_ONCE(1, "unpatched return thunk");
+}
diff --git a/arch/x86/lib/retpoline.S b/arch/x86/lib/retpoline.S
index fe05c139db48..389662b88e19 100644
--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -360,13 +360,14 @@ SYM_FUNC_END(call_depth_return_thunk)
* 'JMP __x86_return_thunk' sites are changed to something else by
* apply_returns().
*
- * This thunk is turned into a ud2 to ensure it is never used at runtime.
- * Alternative instructions are applied after apply_returns().
+ * The RET is replaced with a WARN_ONCE() to ensure it is never used at
+ * runtime. Alternative instructions are applied after apply_returns().
*/
SYM_CODE_START(__x86_return_thunk)
UNWIND_HINT_FUNC
ANNOTATE_NOENDBR
- ALTERNATIVE __stringify(ANNOTATE_UNRET_SAFE;ret),"ud2", X86_FEATURE_ALWAYS
+ ALTERNATIVE __stringify(ANNOTATE_UNRET_SAFE; ret), \
+ "jmp warn_thunk_thunk", X86_FEATURE_ALWAYS
int3
SYM_CODE_END(__x86_return_thunk)
EXPORT_SYMBOL(__x86_return_thunk)
The following commit has been merged into the x86/bugs branch of tip:
Commit-ID: 2d7ce49f58dc95495b3e22e45d2be7de909b2c63
Gitweb: https://git.kernel.org/tip/2d7ce49f58dc95495b3e22e45d2be7de909b2c63
Author: Josh Poimboeuf <[email protected]>
AuthorDate: Tue, 17 Oct 2023 09:59:46 -07:00
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Fri, 20 Oct 2023 13:02:23 +02:00
x86/retpoline: Make sure there are no unconverted return thunks due to KCSAN
Enabling CONFIG_KCSAN leads to unconverted, default return thunks to
remain after patching.
As David Kaplan describes in his debugging of the issue, it is caused by
a couple of KCSAN-generated constructors which aren't processed by
objtool:
"When KCSAN is enabled, GCC generates lots of constructor functions
named _sub_I_00099_0 which call __tsan_init and then return. The
returns in these are generally annotated normally by objtool and fixed
up at runtime. But objtool runs on vmlinux.o and vmlinux.o does not
include a couple of object files that are in vmlinux, like
init/version-timestamp.o and .vmlinux.export.o, both of which contain
_sub_I_00099_0 functions. As a result, the returns in these functions
are not annotated, and the panic occurs when we call one of them in
do_ctors and it uses the default return thunk.
This difference can be seen by counting the number of these functions in the object files:
$ objdump -d vmlinux.o|grep -c "<_sub_I_00099_0>:"
2601
$ objdump -d vmlinux|grep -c "<_sub_I_00099_0>:"
2603
If these functions are only run during kernel boot, there is no
speculation concern."
Fix it by disabling KCSAN on version-timestamp.o and .vmlinux.export.o
so the extra functions don't get generated. KASAN and GCOV are already
disabled for those files.
[ bp: Massage commit message. ]
Closes: https://lore.kernel.org/lkml/[email protected]/
Reported-by: Nathan Chancellor <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Acked-by: Marco Elver <[email protected]>
Tested-by: Nathan Chancellor <[email protected]>
Link: https://lore.kernel.org/r/20231017165946.v4i2d4exyqwqq3bx@treble
---
init/Makefile | 1 +
scripts/Makefile.vmlinux | 1 +
2 files changed, 2 insertions(+)
diff --git a/init/Makefile b/init/Makefile
index ec557ad..cbac576 100644
--- a/init/Makefile
+++ b/init/Makefile
@@ -60,4 +60,5 @@ include/generated/utsversion.h: FORCE
$(obj)/version-timestamp.o: include/generated/utsversion.h
CFLAGS_version-timestamp.o := -include include/generated/utsversion.h
KASAN_SANITIZE_version-timestamp.o := n
+KCSAN_SANITIZE_version-timestamp.o := n
GCOV_PROFILE_version-timestamp.o := n
diff --git a/scripts/Makefile.vmlinux b/scripts/Makefile.vmlinux
index 3cd6ca1..c9f3e03 100644
--- a/scripts/Makefile.vmlinux
+++ b/scripts/Makefile.vmlinux
@@ -19,6 +19,7 @@ quiet_cmd_cc_o_c = CC $@
ifdef CONFIG_MODULES
KASAN_SANITIZE_.vmlinux.export.o := n
+KCSAN_SANITIZE_.vmlinux.export.o := n
GCOV_PROFILE_.vmlinux.export.o := n
targets += .vmlinux.export.o
vmlinux: .vmlinux.export.o
On Thu, Oct 19, 2023 at 08:20:51AM -0700, Josh Poimboeuf wrote:
> GCC doesn't read asm. Even if it did that wouldn't fix things for
> callers of custom-ABI return-thunk-using functions.
>
> The below seems to work.
Right, I guess we can do something like that. Linker is not happy here
about that symbol, tho:
ld: arch/x86/lib/retpoline.o:(.altinstr_replacement+0x95): undefined reference to `warn_thunk_thunk'
make[2]: *** [scripts/Makefile.vmlinux:37: vmlinux] Error 1
make[1]: *** [/mnt/kernel/kernel/5th/linux/Makefile:1165: vmlinux] Error 2
make: *** [Makefile:234: __sub-make] Error 2
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Tue, Oct 24, 2023 at 10:19:13PM +0200, Borislav Petkov wrote:
> On Thu, Oct 19, 2023 at 08:20:51AM -0700, Josh Poimboeuf wrote:
> > GCC doesn't read asm. Even if it did that wouldn't fix things for
> > callers of custom-ABI return-thunk-using functions.
> >
> > The below seems to work.
>
> Right, I guess we can do something like that. Linker is not happy here
> about that symbol, tho:
>
> ld: arch/x86/lib/retpoline.o:(.altinstr_replacement+0x95): undefined reference to `warn_thunk_thunk'
> make[2]: *** [scripts/Makefile.vmlinux:37: vmlinux] Error 1
> make[1]: *** [/mnt/kernel/kernel/5th/linux/Makefile:1165: vmlinux] Error 2
> make: *** [Makefile:234: __sub-make] Error 2
Ok, back to playing with this. A fix below along with a beefed up
warning message.
If only I can remember now how we did trigger the warning in the first
place in order to test it...
---
diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index e59d3073e7cf..a4679e8f30ad 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -426,3 +426,36 @@ For 32-bit we have the following conventions - kernel is built with
.endm
#endif /* CONFIG_SMP */
+
+/* rdi: arg1 ... normal C conventions. rax is saved/restored. */
+.macro THUNK name, func
+SYM_FUNC_START(\name)
+ pushq %rbp
+ movq %rsp, %rbp
+
+ pushq %rdi
+ pushq %rsi
+ pushq %rdx
+ pushq %rcx
+ pushq %rax
+ pushq %r8
+ pushq %r9
+ pushq %r10
+ pushq %r11
+
+ call \func
+
+ popq %r11
+ popq %r10
+ popq %r9
+ popq %r8
+ popq %rax
+ popq %rcx
+ popq %rdx
+ popq %rsi
+ popq %rdi
+ popq %rbp
+ RET
+SYM_FUNC_END(\name)
+ _ASM_NOKPROBE(\name)
+.endm
diff --git a/arch/x86/entry/entry.S b/arch/x86/entry/entry.S
index 8c8d38f0cb1d..582731f74dc8 100644
--- a/arch/x86/entry/entry.S
+++ b/arch/x86/entry/entry.S
@@ -7,6 +7,8 @@
#include <linux/linkage.h>
#include <asm/msr-index.h>
+#include "calling.h"
+
.pushsection .noinstr.text, "ax"
SYM_FUNC_START(entry_ibpb)
@@ -20,3 +22,5 @@ SYM_FUNC_END(entry_ibpb)
EXPORT_SYMBOL_GPL(entry_ibpb);
.popsection
+
+THUNK warn_thunk_thunk, __warn_thunk
diff --git a/arch/x86/entry/thunk_64.S b/arch/x86/entry/thunk_64.S
index 416b400f39db..119ebdc3d362 100644
--- a/arch/x86/entry/thunk_64.S
+++ b/arch/x86/entry/thunk_64.S
@@ -9,39 +9,6 @@
#include "calling.h"
#include <asm/asm.h>
- /* rdi: arg1 ... normal C conventions. rax is saved/restored. */
- .macro THUNK name, func
-SYM_FUNC_START(\name)
- pushq %rbp
- movq %rsp, %rbp
-
- pushq %rdi
- pushq %rsi
- pushq %rdx
- pushq %rcx
- pushq %rax
- pushq %r8
- pushq %r9
- pushq %r10
- pushq %r11
-
- call \func
-
- popq %r11
- popq %r10
- popq %r9
- popq %r8
- popq %rax
- popq %rcx
- popq %rdx
- popq %rsi
- popq %rdi
- popq %rbp
- RET
-SYM_FUNC_END(\name)
- _ASM_NOKPROBE(\name)
- .endm
-
THUNK preempt_schedule_thunk, preempt_schedule
THUNK preempt_schedule_notrace_thunk, preempt_schedule_notrace
EXPORT_SYMBOL(preempt_schedule_thunk)
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 691ff1ef701b..64b175f03cdb 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -348,6 +348,8 @@ extern void entry_ibpb(void);
extern void (*x86_return_thunk)(void);
+extern void __warn_thunk(void);
+
#ifdef CONFIG_CALL_DEPTH_TRACKING
extern void call_depth_return_thunk(void);
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index bb0ab8466b91..00a6d024ddf7 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -2849,3 +2849,19 @@ ssize_t cpu_show_gds(struct device *dev, struct device_attribute *attr, char *bu
return cpu_show_common(dev, attr, buf, X86_BUG_GDS);
}
#endif
+
+void __warn_thunk(void)
+{
+ pr_warn_once("\n");
+ pr_warn_once("**********************************************************\n");
+ pr_warn_once("** NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE **\n");
+ pr_warn_once("** **\n");
+ pr_warn_once("** unpatched return thunk in use. This should not **\n");
+ pr_warn_once("** on a production kernel. Please report this to **\n");
+ pr_warn_once("** [email protected]. **\n");
+ pr_warn_once("** **\n");
+ pr_warn_once("** NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE **\n");
+ pr_warn_once("**********************************************************\n");
+
+ dump_stack();
+}
diff --git a/arch/x86/lib/retpoline.S b/arch/x86/lib/retpoline.S
index 7b2589877d06..dfcb7d64d05a 100644
--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -369,19 +369,14 @@ SYM_FUNC_END(call_depth_return_thunk)
* 'JMP __x86_return_thunk' sites are changed to something else by
* apply_returns().
*
- * This should be converted eventually to call a warning function which
- * should scream loudly when the default return thunk is called after
- * alternatives have been applied.
- *
- * That warning function cannot BUG() because the bug splat cannot be
- * displayed in all possible configurations, leading to users not really
- * knowing why the machine froze.
+ * The RET is replaced with a WARN_ONCE() to ensure it is never used at
+ * runtime. Alternative instructions are applied after apply_returns().
*/
SYM_CODE_START(__x86_return_thunk)
UNWIND_HINT_FUNC
ANNOTATE_NOENDBR
- ANNOTATE_UNRET_SAFE
- ret
+ ALTERNATIVE __stringify(ANNOTATE_UNRET_SAFE; ret), \
+ "jmp warn_thunk_thunk", X86_FEATURE_ALWAYS
int3
SYM_CODE_END(__x86_return_thunk)
EXPORT_SYMBOL(__x86_return_thunk)
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Wed, Jan 03, 2024 at 07:46:56PM +0100, Borislav Petkov wrote:
> If only I can remember now how we did trigger the warning in the first
> place in order to test it...
Ok, got tired of trying to make it use the default thunk - it seems
kinda hard to do - which is good - or I simply can't think of a good way
to trigger it.
So I went and replaced the jump to the actual thunk:
Dump of assembler code for function default_idle_call:
0xffffffff8197bda0 <+0>: nopw (%rax)
0xffffffff8197bda4 <+4>: nop
...
0xffffffff8197bdda <+58>: xchg %ax,%ax
0xffffffff8197bddc <+60>: sti
0xffffffff8197bddd <+61>: nop
0xffffffff8197bdde <+62>: jmp 0xffffffff81988420 <srso_return_thunk>
to what it is at build time. I.e., what should *not* happen after
patch_returns() as run:
Dump of assembler code for function default_idle_call:
0xffffffff8197bda0 <+0>: nopw (%rax)
0xffffffff8197bda4 <+4>: nop
...
0xffffffff8197bdda <+58>: xchg %ax,%ax
0xffffffff8197bddc <+60>: sti
0xffffffff8197bddd <+61>: nop
0xffffffff8197bdde <+62>: jmp 0xffffffff819884a0 <__x86_return_thunk>
and yap, it fires as expected:
[ 209.051694] **********************************************************
[ 209.053200] ** NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE **
[ 209.054435] ** **
[ 209.055687] ** unpatched return thunk in use. This should not **
[ 209.056911] ** on a production kernel. Please report this to **
[ 209.058133] ** [email protected]. **
[ 209.059367] ** **
[ 209.060587] ** NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE **
[ 209.061808] **********************************************************
[ 209.063064] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 6.7.0-rc8+ #15
[ 209.064527] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[ 209.066086] Call Trace:
[ 209.066569] <TASK>
[ 209.066975] dump_stack_lvl+0x36/0x50
[ 209.067675] warn_thunk_thunk+0x1a/0x30
[ 209.068405] do_idle+0x1a5/0x1e0
[ 209.069403] cpu_startup_entry+0x29/0x30
[ 209.070147] rest_init+0xc5/0xd0
[ 209.070775] arch_call_rest_init+0xe/0x20
[ 209.071537] start_kernel+0x425/0x680
[ 209.072235] ? set_init_arg+0x80/0x80
[ 209.072931] x86_64_start_reservations+0x18/0x30
[ 209.073803] x86_64_start_kernel+0xb7/0xc0
[ 209.074590] secondary_startup_64_no_verify+0x175/0x17b
[ 209.075584] </TASK>
Lemme write a proper patch.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
Thoughts? Complaints?
---
From: Josh Poimboeuf <[email protected]>
Date: Wed, 3 Jan 2024 19:36:26 +0100
Make sure the default return thunk is not used after all return
instructions have been patched by the alternatives because the default
return thunk is insufficient when it comes to mitigating Retbleed or
SRSO.
Fix based on a earlier version by David Kaplan <[email protected]>.
[ bp: Use the big-fat "NOTICE NOTICE" banner and fix the compilation
error of warn_thunk_thunk being an invisible symbol. ]
Signed-off-by: Josh Poimboeuf <[email protected]>
Co-developed-by: Borislav Petkov (AMD) <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/entry/calling.h | 33 ++++++++++++++++++++++++++++
arch/x86/entry/entry.S | 4 ++++
arch/x86/entry/thunk_64.S | 33 ----------------------------
arch/x86/include/asm/nospec-branch.h | 2 ++
arch/x86/kernel/cpu/bugs.c | 16 ++++++++++++++
arch/x86/lib/retpoline.S | 15 +++++--------
6 files changed, 61 insertions(+), 42 deletions(-)
diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index e59d3073e7cf..a4679e8f30ad 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -426,3 +426,36 @@ For 32-bit we have the following conventions - kernel is built with
.endm
#endif /* CONFIG_SMP */
+
+/* rdi: arg1 ... normal C conventions. rax is saved/restored. */
+.macro THUNK name, func
+SYM_FUNC_START(\name)
+ pushq %rbp
+ movq %rsp, %rbp
+
+ pushq %rdi
+ pushq %rsi
+ pushq %rdx
+ pushq %rcx
+ pushq %rax
+ pushq %r8
+ pushq %r9
+ pushq %r10
+ pushq %r11
+
+ call \func
+
+ popq %r11
+ popq %r10
+ popq %r9
+ popq %r8
+ popq %rax
+ popq %rcx
+ popq %rdx
+ popq %rsi
+ popq %rdi
+ popq %rbp
+ RET
+SYM_FUNC_END(\name)
+ _ASM_NOKPROBE(\name)
+.endm
diff --git a/arch/x86/entry/entry.S b/arch/x86/entry/entry.S
index 8c8d38f0cb1d..582731f74dc8 100644
--- a/arch/x86/entry/entry.S
+++ b/arch/x86/entry/entry.S
@@ -7,6 +7,8 @@
#include <linux/linkage.h>
#include <asm/msr-index.h>
+#include "calling.h"
+
.pushsection .noinstr.text, "ax"
SYM_FUNC_START(entry_ibpb)
@@ -20,3 +22,5 @@ SYM_FUNC_END(entry_ibpb)
EXPORT_SYMBOL_GPL(entry_ibpb);
.popsection
+
+THUNK warn_thunk_thunk, __warn_thunk
diff --git a/arch/x86/entry/thunk_64.S b/arch/x86/entry/thunk_64.S
index 416b400f39db..119ebdc3d362 100644
--- a/arch/x86/entry/thunk_64.S
+++ b/arch/x86/entry/thunk_64.S
@@ -9,39 +9,6 @@
#include "calling.h"
#include <asm/asm.h>
- /* rdi: arg1 ... normal C conventions. rax is saved/restored. */
- .macro THUNK name, func
-SYM_FUNC_START(\name)
- pushq %rbp
- movq %rsp, %rbp
-
- pushq %rdi
- pushq %rsi
- pushq %rdx
- pushq %rcx
- pushq %rax
- pushq %r8
- pushq %r9
- pushq %r10
- pushq %r11
-
- call \func
-
- popq %r11
- popq %r10
- popq %r9
- popq %r8
- popq %rax
- popq %rcx
- popq %rdx
- popq %rsi
- popq %rdi
- popq %rbp
- RET
-SYM_FUNC_END(\name)
- _ASM_NOKPROBE(\name)
- .endm
-
THUNK preempt_schedule_thunk, preempt_schedule
THUNK preempt_schedule_notrace_thunk, preempt_schedule_notrace
EXPORT_SYMBOL(preempt_schedule_thunk)
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 691ff1ef701b..64b175f03cdb 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -348,6 +348,8 @@ extern void entry_ibpb(void);
extern void (*x86_return_thunk)(void);
+extern void __warn_thunk(void);
+
#ifdef CONFIG_CALL_DEPTH_TRACKING
extern void call_depth_return_thunk(void);
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index bb0ab8466b91..b96483551299 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -2849,3 +2849,19 @@ ssize_t cpu_show_gds(struct device *dev, struct device_attribute *attr, char *bu
return cpu_show_common(dev, attr, buf, X86_BUG_GDS);
}
#endif
+
+void __warn_thunk(void)
+{
+ pr_warn_once("\n");
+ pr_warn_once("**********************************************************\n");
+ pr_warn_once("** NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE **\n");
+ pr_warn_once("** **\n");
+ pr_warn_once("** Unpatched return thunk in use. This should not **\n");
+ pr_warn_once("** happen on a production kernel. Please report this **\n");
+ pr_warn_once("** to [email protected]. **\n");
+ pr_warn_once("** **\n");
+ pr_warn_once("** NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE **\n");
+ pr_warn_once("**********************************************************\n");
+
+ dump_stack();
+}
diff --git a/arch/x86/lib/retpoline.S b/arch/x86/lib/retpoline.S
index 7b2589877d06..5ed0c22f5351 100644
--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -369,19 +369,16 @@ SYM_FUNC_END(call_depth_return_thunk)
* 'JMP __x86_return_thunk' sites are changed to something else by
* apply_returns().
*
- * This should be converted eventually to call a warning function which
- * should scream loudly when the default return thunk is called after
- * alternatives have been applied.
- *
- * That warning function cannot BUG() because the bug splat cannot be
- * displayed in all possible configurations, leading to users not really
- * knowing why the machine froze.
+ * The ALTERNATIVE below adds a really loud warning to catch the case
+ * where the insufficient default return thunk ends up getting used for
+ * whatever reason like miscompilation or failure of
+ * objtool/alternatives/etc to patch all the return sites.
*/
SYM_CODE_START(__x86_return_thunk)
UNWIND_HINT_FUNC
ANNOTATE_NOENDBR
- ANNOTATE_UNRET_SAFE
- ret
+ ALTERNATIVE __stringify(ANNOTATE_UNRET_SAFE; ret), \
+ "jmp warn_thunk_thunk", X86_FEATURE_ALWAYS
int3
SYM_CODE_END(__x86_return_thunk)
EXPORT_SYMBOL(__x86_return_thunk)
--
2.42.0.rc0.25.ga82fb66fed25
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Thu, Jan 04, 2024 at 02:24:46PM +0100, Borislav Petkov wrote:
> +void __warn_thunk(void)
> +{
> + pr_warn_once("\n");
> + pr_warn_once("**********************************************************\n");
> + pr_warn_once("** NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE **\n");
> + pr_warn_once("** **\n");
> + pr_warn_once("** Unpatched return thunk in use. This should not **\n");
> + pr_warn_once("** happen on a production kernel. Please report this **\n");
> + pr_warn_once("** to [email protected]. **\n");
I'm not yet sure here whether this should say "upstream kernels" because
otherwise we'll get a bunch of distro or whatnot downstream kernels
reports where we can't really do anything about...
Hmmm.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Thu, Jan 04, 2024 at 02:26:23PM +0100, Borislav Petkov wrote:
> On Thu, Jan 04, 2024 at 02:24:46PM +0100, Borislav Petkov wrote:
> > +void __warn_thunk(void)
> > +{
> > + pr_warn_once("\n");
> > + pr_warn_once("**********************************************************\n");
> > + pr_warn_once("** NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE **\n");
> > + pr_warn_once("** **\n");
> > + pr_warn_once("** Unpatched return thunk in use. This should not **\n");
> > + pr_warn_once("** happen on a production kernel. Please report this **\n");
> > + pr_warn_once("** to [email protected]. **\n");
>
> I'm not yet sure here whether this should say "upstream kernels" because
> otherwise we'll get a bunch of distro or whatnot downstream kernels
> reports where we can't really do anything about...
>
> Hmmm.
At the very least, the dump_stack() should be a WARN_ON_ONCE().
Otherwise this is actually *more* likely to be ignored since automated
tools don't have a way to catch it: no taint, no "WARNING" string, no
panic_on_warn, etc.
But also, I'm not a fan of the banner. A warning is enough IMO.
Many/most warnings can be "security" issues. A production server which
ignores warnings/taints/etc would be a much bigger problem.
And as you say, there are many frankenkernels out there and upstream
doesn't want to be in the business of debugging them.
--
Josh
On Wed, Feb 07, 2024 at 09:50:10AM -0800, Josh Poimboeuf wrote:
> And as you say, there are many frankenkernels out there and upstream
> doesn't want to be in the business of debugging them.
Ok, all valid points. Diff ontop.
I'll queue it now so that it has ample time of cooking in linux-next.
Thx.
---
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 612c9ec456ae..5a300a7bad04 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -2853,16 +2853,5 @@ ssize_t cpu_show_gds(struct device *dev, struct device_attribute *attr, char *bu
void __warn_thunk(void)
{
- pr_warn_once("\n");
- pr_warn_once("**********************************************************\n");
- pr_warn_once("** NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE **\n");
- pr_warn_once("** **\n");
- pr_warn_once("** Unpatched return thunk in use. This should not **\n");
- pr_warn_once("** happen on a production kernel. Please report this **\n");
- pr_warn_once("** to [email protected]. **\n");
- pr_warn_once("** **\n");
- pr_warn_once("** NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE **\n");
- pr_warn_once("**********************************************************\n");
-
- dump_stack();
+ WARN_ONCE(1, "Unpatched return thunk in use. This should not happen!\n");
}
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Wed, Feb 07, 2024 at 07:53:28PM +0100, Borislav Petkov wrote:
> On Wed, Feb 07, 2024 at 09:50:10AM -0800, Josh Poimboeuf wrote:
> > And as you say, there are many frankenkernels out there and upstream
> > doesn't want to be in the business of debugging them.
>
> Ok, all valid points. Diff ontop.
>
> I'll queue it now so that it has ample time of cooking in linux-next.
>
> Thx.
>
> ---
>
> diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
> index 612c9ec456ae..5a300a7bad04 100644
> --- a/arch/x86/kernel/cpu/bugs.c
> +++ b/arch/x86/kernel/cpu/bugs.c
> @@ -2853,16 +2853,5 @@ ssize_t cpu_show_gds(struct device *dev, struct device_attribute *attr, char *bu
>
> void __warn_thunk(void)
> {
> - pr_warn_once("\n");
> - pr_warn_once("**********************************************************\n");
> - pr_warn_once("** NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE **\n");
> - pr_warn_once("** **\n");
> - pr_warn_once("** Unpatched return thunk in use. This should not **\n");
> - pr_warn_once("** happen on a production kernel. Please report this **\n");
> - pr_warn_once("** to [email protected]. **\n");
> - pr_warn_once("** **\n");
> - pr_warn_once("** NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE **\n");
> - pr_warn_once("**********************************************************\n");
> -
> - dump_stack();
> + WARN_ONCE(1, "Unpatched return thunk in use. This should not happen!\n");
> }
LGTM, thanks!
--
Josh
On Wed, Feb 07, 2024 at 11:49:19AM -0800, Josh Poimboeuf wrote:
> LGTM, thanks!
Thanks, had to hoist up both THUNK macros into the header to make that
nuisance 32-bit build too :)
---
commit 4461438a8405e800f90e0e40409e5f3d07eed381 (HEAD -> refs/heads/tip-x86-bugs)
Author: Josh Poimboeuf <[email protected]>
Date: Wed Jan 3 19:36:26 2024 +0100
x86/retpoline: Ensure default return thunk isn't used at runtime
Make sure the default return thunk is not used after all return
instructions have been patched by the alternatives because the default
return thunk is insufficient when it comes to mitigating Retbleed or
SRSO.
Fix based on an earlier version by David Kaplan <[email protected]>.
[ bp: Fix the compilation error of warn_thunk_thunk being an invisible
symbol, hoist thunk macro into calling.h ]
Signed-off-by: Josh Poimboeuf <[email protected]>
Co-developed-by: Borislav Petkov (AMD) <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Link: https://lore.kernel.org/r/20240104132446.GEZZaxnrIgIyat0pqf@fat_crate.local
diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 39e069b68c6e..bd31b2534053 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -426,3 +426,63 @@ For 32-bit we have the following conventions - kernel is built with
.endm
#endif /* CONFIG_SMP */
+
+#ifdef CONFIG_X86_64
+
+/* rdi: arg1 ... normal C conventions. rax is saved/restored. */
+.macro THUNK name, func
+SYM_FUNC_START(\name)
+ pushq %rbp
+ movq %rsp, %rbp
+
+ pushq %rdi
+ pushq %rsi
+ pushq %rdx
+ pushq %rcx
+ pushq %rax
+ pushq %r8
+ pushq %r9
+ pushq %r10
+ pushq %r11
+
+ call \func
+
+ popq %r11
+ popq %r10
+ popq %r9
+ popq %r8
+ popq %rax
+ popq %rcx
+ popq %rdx
+ popq %rsi
+ popq %rdi
+ popq %rbp
+ RET
+SYM_FUNC_END(\name)
+ _ASM_NOKPROBE(\name)
+.endm
+
+#else /* CONFIG_X86_32 */
+
+/* put return address in eax (arg1) */
+.macro THUNK name, func, put_ret_addr_in_eax=0
+SYM_CODE_START_NOALIGN(\name)
+ pushl %eax
+ pushl %ecx
+ pushl %edx
+
+ .if \put_ret_addr_in_eax
+ /* Place EIP in the arg1 */
+ movl 3*4(%esp), %eax
+ .endif
+
+ call \func
+ popl %edx
+ popl %ecx
+ popl %eax
+ RET
+ _ASM_NOKPROBE(\name)
+SYM_CODE_END(\name)
+ .endm
+
+#endif
diff --git a/arch/x86/entry/entry.S b/arch/x86/entry/entry.S
index 8c8d38f0cb1d..582731f74dc8 100644
--- a/arch/x86/entry/entry.S
+++ b/arch/x86/entry/entry.S
@@ -7,6 +7,8 @@
#include <linux/linkage.h>
#include <asm/msr-index.h>
+#include "calling.h"
+
.pushsection .noinstr.text, "ax"
SYM_FUNC_START(entry_ibpb)
@@ -20,3 +22,5 @@ SYM_FUNC_END(entry_ibpb)
EXPORT_SYMBOL_GPL(entry_ibpb);
.popsection
+
+THUNK warn_thunk_thunk, __warn_thunk
diff --git a/arch/x86/entry/thunk_32.S b/arch/x86/entry/thunk_32.S
index 0103e103a657..da37f42f4549 100644
--- a/arch/x86/entry/thunk_32.S
+++ b/arch/x86/entry/thunk_32.S
@@ -4,33 +4,15 @@
* Copyright 2008 by Steven Rostedt, Red Hat, Inc
* (inspired by Andi Kleen's thunk_64.S)
*/
- #include <linux/export.h>
- #include <linux/linkage.h>
- #include <asm/asm.h>
- /* put return address in eax (arg1) */
- .macro THUNK name, func, put_ret_addr_in_eax=0
-SYM_CODE_START_NOALIGN(\name)
- pushl %eax
- pushl %ecx
- pushl %edx
+#include <linux/export.h>
+#include <linux/linkage.h>
+#include <asm/asm.h>
- .if \put_ret_addr_in_eax
- /* Place EIP in the arg1 */
- movl 3*4(%esp), %eax
- .endif
+#include "calling.h"
- call \func
- popl %edx
- popl %ecx
- popl %eax
- RET
- _ASM_NOKPROBE(\name)
-SYM_CODE_END(\name)
- .endm
-
- THUNK preempt_schedule_thunk, preempt_schedule
- THUNK preempt_schedule_notrace_thunk, preempt_schedule_notrace
- EXPORT_SYMBOL(preempt_schedule_thunk)
- EXPORT_SYMBOL(preempt_schedule_notrace_thunk)
+THUNK preempt_schedule_thunk, preempt_schedule
+THUNK preempt_schedule_notrace_thunk, preempt_schedule_notrace
+EXPORT_SYMBOL(preempt_schedule_thunk)
+EXPORT_SYMBOL(preempt_schedule_notrace_thunk)
diff --git a/arch/x86/entry/thunk_64.S b/arch/x86/entry/thunk_64.S
index 416b400f39db..119ebdc3d362 100644
--- a/arch/x86/entry/thunk_64.S
+++ b/arch/x86/entry/thunk_64.S
@@ -9,39 +9,6 @@
#include "calling.h"
#include <asm/asm.h>
- /* rdi: arg1 ... normal C conventions. rax is saved/restored. */
- .macro THUNK name, func
-SYM_FUNC_START(\name)
- pushq %rbp
- movq %rsp, %rbp
-
- pushq %rdi
- pushq %rsi
- pushq %rdx
- pushq %rcx
- pushq %rax
- pushq %r8
- pushq %r9
- pushq %r10
- pushq %r11
-
- call \func
-
- popq %r11
- popq %r10
- popq %r9
- popq %r8
- popq %rax
- popq %rcx
- popq %rdx
- popq %rsi
- popq %rdi
- popq %rbp
- RET
-SYM_FUNC_END(\name)
- _ASM_NOKPROBE(\name)
- .endm
-
THUNK preempt_schedule_thunk, preempt_schedule
THUNK preempt_schedule_notrace_thunk, preempt_schedule_notrace
EXPORT_SYMBOL(preempt_schedule_thunk)
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 2c0679ebe914..55754617eaee 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -357,6 +357,8 @@ extern void entry_ibpb(void);
extern void (*x86_return_thunk)(void);
+extern void __warn_thunk(void);
+
#ifdef CONFIG_MITIGATION_CALL_DEPTH_TRACKING
extern void call_depth_return_thunk(void);
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index f2775417bda2..a78892b0f823 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -2850,3 +2850,8 @@ ssize_t cpu_show_gds(struct device *dev, struct device_attribute *attr, char *bu
return cpu_show_common(dev, attr, buf, X86_BUG_GDS);
}
#endif
+
+void __warn_thunk(void)
+{
+ WARN_ONCE(1, "Unpatched return thunk in use. This should not happen!\n");
+}
diff --git a/arch/x86/lib/retpoline.S b/arch/x86/lib/retpoline.S
index 0045153ba222..721b528da9ac 100644
--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -369,19 +369,16 @@ SYM_FUNC_END(call_depth_return_thunk)
* 'JMP __x86_return_thunk' sites are changed to something else by
* apply_returns().
*
- * This should be converted eventually to call a warning function which
- * should scream loudly when the default return thunk is called after
- * alternatives have been applied.
- *
- * That warning function cannot BUG() because the bug splat cannot be
- * displayed in all possible configurations, leading to users not really
- * knowing why the machine froze.
+ * The ALTERNATIVE below adds a really loud warning to catch the case
+ * where the insufficient default return thunk ends up getting used for
+ * whatever reason like miscompilation or failure of
+ * objtool/alternatives/etc to patch all the return sites.
*/
SYM_CODE_START(__x86_return_thunk)
UNWIND_HINT_FUNC
ANNOTATE_NOENDBR
- ANNOTATE_UNRET_SAFE
- ret
+ ALTERNATIVE __stringify(ANNOTATE_UNRET_SAFE; ret), \
+ "jmp warn_thunk_thunk", X86_FEATURE_ALWAYS
int3
SYM_CODE_END(__x86_return_thunk)
EXPORT_SYMBOL(__x86_return_thunk)
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
The following commit has been merged into the x86/bugs branch of tip:
Commit-ID: 4461438a8405e800f90e0e40409e5f3d07eed381
Gitweb: https://git.kernel.org/tip/4461438a8405e800f90e0e40409e5f3d07eed381
Author: Josh Poimboeuf <[email protected]>
AuthorDate: Wed, 03 Jan 2024 19:36:26 +01:00
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Mon, 12 Feb 2024 11:42:15 +01:00
x86/retpoline: Ensure default return thunk isn't used at runtime
Make sure the default return thunk is not used after all return
instructions have been patched by the alternatives because the default
return thunk is insufficient when it comes to mitigating Retbleed or
SRSO.
Fix based on an earlier version by David Kaplan <[email protected]>.
[ bp: Fix the compilation error of warn_thunk_thunk being an invisible
symbol, hoist thunk macro into calling.h ]
Signed-off-by: Josh Poimboeuf <[email protected]>
Co-developed-by: Borislav Petkov (AMD) <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Link: https://lore.kernel.org/r/20240104132446.GEZZaxnrIgIyat0pqf@fat_crate.local
---
arch/x86/entry/calling.h | 60 +++++++++++++++++++++++++++-
arch/x86/entry/entry.S | 4 ++-
arch/x86/entry/thunk_32.S | 34 +++------------
arch/x86/entry/thunk_64.S | 33 +---------------
arch/x86/include/asm/nospec-branch.h | 2 +-
arch/x86/kernel/cpu/bugs.c | 5 ++-
arch/x86/lib/retpoline.S | 15 ++-----
7 files changed, 85 insertions(+), 68 deletions(-)
diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 39e069b..bd31b25 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -426,3 +426,63 @@ For 32-bit we have the following conventions - kernel is built with
.endm
#endif /* CONFIG_SMP */
+
+#ifdef CONFIG_X86_64
+
+/* rdi: arg1 ... normal C conventions. rax is saved/restored. */
+.macro THUNK name, func
+SYM_FUNC_START(\name)
+ pushq %rbp
+ movq %rsp, %rbp
+
+ pushq %rdi
+ pushq %rsi
+ pushq %rdx
+ pushq %rcx
+ pushq %rax
+ pushq %r8
+ pushq %r9
+ pushq %r10
+ pushq %r11
+
+ call \func
+
+ popq %r11
+ popq %r10
+ popq %r9
+ popq %r8
+ popq %rax
+ popq %rcx
+ popq %rdx
+ popq %rsi
+ popq %rdi
+ popq %rbp
+ RET
+SYM_FUNC_END(\name)
+ _ASM_NOKPROBE(\name)
+.endm
+
+#else /* CONFIG_X86_32 */
+
+/* put return address in eax (arg1) */
+.macro THUNK name, func, put_ret_addr_in_eax=0
+SYM_CODE_START_NOALIGN(\name)
+ pushl %eax
+ pushl %ecx
+ pushl %edx
+
+ .if \put_ret_addr_in_eax
+ /* Place EIP in the arg1 */
+ movl 3*4(%esp), %eax
+ .endif
+
+ call \func
+ popl %edx
+ popl %ecx
+ popl %eax
+ RET
+ _ASM_NOKPROBE(\name)
+SYM_CODE_END(\name)
+ .endm
+
+#endif
diff --git a/arch/x86/entry/entry.S b/arch/x86/entry/entry.S
index 8c8d38f..582731f 100644
--- a/arch/x86/entry/entry.S
+++ b/arch/x86/entry/entry.S
@@ -7,6 +7,8 @@
#include <linux/linkage.h>
#include <asm/msr-index.h>
+#include "calling.h"
+
.pushsection .noinstr.text, "ax"
SYM_FUNC_START(entry_ibpb)
@@ -20,3 +22,5 @@ SYM_FUNC_END(entry_ibpb)
EXPORT_SYMBOL_GPL(entry_ibpb);
.popsection
+
+THUNK warn_thunk_thunk, __warn_thunk
diff --git a/arch/x86/entry/thunk_32.S b/arch/x86/entry/thunk_32.S
index 0103e10..da37f42 100644
--- a/arch/x86/entry/thunk_32.S
+++ b/arch/x86/entry/thunk_32.S
@@ -4,33 +4,15 @@
* Copyright 2008 by Steven Rostedt, Red Hat, Inc
* (inspired by Andi Kleen's thunk_64.S)
*/
- #include <linux/export.h>
- #include <linux/linkage.h>
- #include <asm/asm.h>
- /* put return address in eax (arg1) */
- .macro THUNK name, func, put_ret_addr_in_eax=0
-SYM_CODE_START_NOALIGN(\name)
- pushl %eax
- pushl %ecx
- pushl %edx
+#include <linux/export.h>
+#include <linux/linkage.h>
+#include <asm/asm.h>
- .if \put_ret_addr_in_eax
- /* Place EIP in the arg1 */
- movl 3*4(%esp), %eax
- .endif
+#include "calling.h"
- call \func
- popl %edx
- popl %ecx
- popl %eax
- RET
- _ASM_NOKPROBE(\name)
-SYM_CODE_END(\name)
- .endm
-
- THUNK preempt_schedule_thunk, preempt_schedule
- THUNK preempt_schedule_notrace_thunk, preempt_schedule_notrace
- EXPORT_SYMBOL(preempt_schedule_thunk)
- EXPORT_SYMBOL(preempt_schedule_notrace_thunk)
+THUNK preempt_schedule_thunk, preempt_schedule
+THUNK preempt_schedule_notrace_thunk, preempt_schedule_notrace
+EXPORT_SYMBOL(preempt_schedule_thunk)
+EXPORT_SYMBOL(preempt_schedule_notrace_thunk)
diff --git a/arch/x86/entry/thunk_64.S b/arch/x86/entry/thunk_64.S
index 416b400..119ebdc 100644
--- a/arch/x86/entry/thunk_64.S
+++ b/arch/x86/entry/thunk_64.S
@@ -9,39 +9,6 @@
#include "calling.h"
#include <asm/asm.h>
- /* rdi: arg1 ... normal C conventions. rax is saved/restored. */
- .macro THUNK name, func
-SYM_FUNC_START(\name)
- pushq %rbp
- movq %rsp, %rbp
-
- pushq %rdi
- pushq %rsi
- pushq %rdx
- pushq %rcx
- pushq %rax
- pushq %r8
- pushq %r9
- pushq %r10
- pushq %r11
-
- call \func
-
- popq %r11
- popq %r10
- popq %r9
- popq %r8
- popq %rax
- popq %rcx
- popq %rdx
- popq %rsi
- popq %rdi
- popq %rbp
- RET
-SYM_FUNC_END(\name)
- _ASM_NOKPROBE(\name)
- .endm
-
THUNK preempt_schedule_thunk, preempt_schedule
THUNK preempt_schedule_notrace_thunk, preempt_schedule_notrace
EXPORT_SYMBOL(preempt_schedule_thunk)
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 2c0679e..5575461 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -357,6 +357,8 @@ extern void entry_ibpb(void);
extern void (*x86_return_thunk)(void);
+extern void __warn_thunk(void);
+
#ifdef CONFIG_MITIGATION_CALL_DEPTH_TRACKING
extern void call_depth_return_thunk(void);
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index f277541..a78892b 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -2850,3 +2850,8 @@ ssize_t cpu_show_gds(struct device *dev, struct device_attribute *attr, char *bu
return cpu_show_common(dev, attr, buf, X86_BUG_GDS);
}
#endif
+
+void __warn_thunk(void)
+{
+ WARN_ONCE(1, "Unpatched return thunk in use. This should not happen!\n");
+}
diff --git a/arch/x86/lib/retpoline.S b/arch/x86/lib/retpoline.S
index 0045153..721b528 100644
--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -369,19 +369,16 @@ SYM_FUNC_END(call_depth_return_thunk)
* 'JMP __x86_return_thunk' sites are changed to something else by
* apply_returns().
*
- * This should be converted eventually to call a warning function which
- * should scream loudly when the default return thunk is called after
- * alternatives have been applied.
- *
- * That warning function cannot BUG() because the bug splat cannot be
- * displayed in all possible configurations, leading to users not really
- * knowing why the machine froze.
+ * The ALTERNATIVE below adds a really loud warning to catch the case
+ * where the insufficient default return thunk ends up getting used for
+ * whatever reason like miscompilation or failure of
+ * objtool/alternatives/etc to patch all the return sites.
*/
SYM_CODE_START(__x86_return_thunk)
UNWIND_HINT_FUNC
ANNOTATE_NOENDBR
- ANNOTATE_UNRET_SAFE
- ret
+ ALTERNATIVE __stringify(ANNOTATE_UNRET_SAFE; ret), \
+ "jmp warn_thunk_thunk", X86_FEATURE_ALWAYS
int3
SYM_CODE_END(__x86_return_thunk)
EXPORT_SYMBOL(__x86_return_thunk)
On Mon, Feb 12, 2024 at 02:13:39PM -0000, tip-bot2 for Josh Poimboeuf wrote:
> The following commit has been merged into the x86/bugs branch of tip:
>
> Commit-ID: 4461438a8405e800f90e0e40409e5f3d07eed381
> Gitweb: https://git.kernel.org/tip/4461438a8405e800f90e0e40409e5f3d07eed381
> Author: Josh Poimboeuf <[email protected]>
> AuthorDate: Wed, 03 Jan 2024 19:36:26 +01:00
> Committer: Borislav Petkov (AMD) <[email protected]>
> CommitterDate: Mon, 12 Feb 2024 11:42:15 +01:00
>
> x86/retpoline: Ensure default return thunk isn't used at runtime
>
> Make sure the default return thunk is not used after all return
> instructions have been patched by the alternatives because the default
> return thunk is insufficient when it comes to mitigating Retbleed or
> SRSO.
>
> Fix based on an earlier version by David Kaplan <[email protected]>.
>
> [ bp: Fix the compilation error of warn_thunk_thunk being an invisible
> symbol, hoist thunk macro into calling.h ]
>
> Signed-off-by: Josh Poimboeuf <[email protected]>
> Co-developed-by: Borislav Petkov (AMD) <[email protected]>
> Signed-off-by: Borislav Petkov (AMD) <[email protected]>
> Link: https://lore.kernel.org/r/[email protected]
> Link: https://lore.kernel.org/r/20240104132446.GEZZaxnrIgIyat0pqf@fat_crate.local
This warning is now getting triggered for me in some of my builds,
specifically from Alpine Linux's configuration. A minimal reproducer on
top of defconfig:
$ echo 'CONFIG_X86_KERNEL_IBT=n
CONFIG_UNWINDER_ORC=n
CONFIG_UNWINDER_FRAME_POINTER=y' >arch/x86/configs/repro.config
$ make -skj"$(nproc)" ARCH=x86_64 CROSS_COMPILE=x86_64-linux- mrproper defconfig repro.config bzImage
$ qemu-system-x86_64 \
-display none \
-nodefaults \
-d unimp,guest_errors \
-append 'console=ttyS0 earlycon=uart8250,io,0x3f8' \
-kernel arch/x86/boot/bzImage \
-initrd rootfs.cpio \
-cpu host \
-enable-kvm \
-m 512m \
-smp 2 \
-serial mon:stdio
[ 0.000000] Linux version 6.7.0-01738-g4461438a8405-dirty ([email protected]) (x86_64-linux-gcc (GCC) 13.2.0, GNU ld (GNU Binutils) 2.41) #1 SMP PREEMPT_DYNAMIC Wed Feb 14 20:14:55 MST 2024
..
[ 0.337317] ------------[ cut here ]------------
[ 0.338282] Unpatched return thunk in use. This should not happen!
[ 0.339292] WARNING: CPU: 0 PID: 1 at arch/x86/kernel/cpu/bugs.c:2856 __warn_thunk+0x27/0x40
[ 0.340284] Modules linked in:
[ 0.341021] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.7.0-01738-g4461438a8405-dirty #1
[ 0.341281] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[ 0.342281] RIP: 0010:__warn_thunk+0x27/0x40
[ 0.343281] Code: 90 90 90 80 3d 22 20 c3 01 00 74 05 e9 32 a5 eb 00 55 c6 05 13 20 c3 01 01 48 89 e5 90 48 c7 c7 80 80 50 89 e8 6a c4 03 00 90 <0f> 0b 90 90 5d e9 0f a5 eb 00 cc cc cc cc cc cc cc cc cc cc cc cc
[ 0.344286] RSP: 0018:ffff8ba9c0013e10 EFLAGS: 00010286
[ 0.345281] RAX: 0000000000000000 RBX: ffffffff89afba70 RCX: 0000000000000000
[ 0.346281] RDX: 0000000000000000 RSI: 00000000ffffdfff RDI: 0000000000000001
[ 0.347282] RBP: ffff8ba9c0013e10 R08: 00000000ffffdfff R09: ffff8ba9c0013c88
[ 0.348282] R10: 0000000000000001 R11: ffffffff89856ae0 R12: 0000000000000000
[ 0.349282] R13: ffff88c101126ac0 R14: ffff8ba9c0013e78 R15: 0000000000000000
[ 0.350285] FS: 0000000000000000(0000) GS:ffff88c11f000000(0000) knlGS:0000000000000000
[ 0.351283] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.352282] CR2: ffff88c119601000 CR3: 0000000018e2c000 CR4: 0000000000350ef0
[ 0.353284] Call Trace:
[ 0.354281] <TASK>
[ 0.355281] ? show_regs+0x60/0x70
[ 0.356281] ? __warn+0x84/0x150
[ 0.357281] ? __warn_thunk+0x27/0x40
[ 0.358281] ? report_bug+0x16d/0x1a0
[ 0.359088] ? console_unlock+0x4f/0xe0
[ 0.359281] ? handle_bug+0x43/0x80
[ 0.360228] ? exc_invalid_op+0x18/0x70
[ 0.360281] ? asm_exc_invalid_op+0x1b/0x20
[ 0.361282] ? ia32_binfmt_init+0x40/0x40
[ 0.362283] ? __warn_thunk+0x27/0x40
[ 0.363283] warn_thunk_thunk+0x16/0x30
[ 0.364283] do_one_initcall+0x59/0x230
[ 0.365284] kernel_init_freeable+0x1a4/0x2e0
[ 0.366248] ? __pfx_kernel_init+0x10/0x10
[ 0.366282] kernel_init+0x15/0x1b0
[ 0.367200] ret_from_fork+0x38/0x60
[ 0.367280] ? __pfx_kernel_init+0x10/0x10
[ 0.368175] ret_from_fork_asm+0x1b/0x30
[ 0.368285] </TASK>
[ 0.369280] ---[ end trace 0000000000000000 ]---
..
If there is any more information I can provide or patches I can test, I
am more than happy to do so.
Cheers,
Nathan
# bad: [2c3b09aac00d7835023bbc4473ee06696be64fa8] Add linux-next specific files for 20240214
# good: [7e90b5c295ec1e47c8ad865429f046970c549a66] Merge tag 'trace-tools-v6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
git bisect start '2c3b09aac00d7835023bbc4473ee06696be64fa8' '7e90b5c295ec1e47c8ad865429f046970c549a66'
# good: [a4f281576352365d7c83d9a2ff46c0430c8d6f1d] Merge branch 'main' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git
git bisect good a4f281576352365d7c83d9a2ff46c0430c8d6f1d
# good: [2b837601fcd12acc492699f9148ca20a41d76b5d] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git
git bisect good 2b837601fcd12acc492699f9148ca20a41d76b5d
# bad: [4b0fab17a40c71b1202109cca7ab4854722f6fee] Merge branch 'usb-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
git bisect bad 4b0fab17a40c71b1202109cca7ab4854722f6fee
# bad: [09e1b07412d3a47f343acd2ab2459af3034e028b] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
git bisect bad 09e1b07412d3a47f343acd2ab2459af3034e028b
# good: [2208f1364f1de82b19313f36e3e4758487183639] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux-dt.git
git bisect good 2208f1364f1de82b19313f36e3e4758487183639
# good: [fbbf6ba42cd4bbe1675db713d230fccda1183a47] Merge branch into tip/master: 'timers/ptp'
git bisect good fbbf6ba42cd4bbe1675db713d230fccda1183a47
# good: [0da9a7e5c86b003a9b446b30c90eaf96b2e442c2] spi: get rid of some legacy macros
git bisect good 0da9a7e5c86b003a9b446b30c90eaf96b2e442c2
# good: [64ffc035640f3a74205ae57d21fb171a88748b60] Merge branch into tip/master: 'irq/urgent'
git bisect good 64ffc035640f3a74205ae57d21fb171a88748b60
# good: [d1ff85fdf0b8f63a6e042ae7559c630f9b1c50e2] spi: pl022: Use typedef for dma_filter_fn
git bisect good d1ff85fdf0b8f63a6e042ae7559c630f9b1c50e2
# bad: [743a9723b476831c7910e6e15a714a713ab5989f] Merge branch into tip/master: 'x86/bugs'
git bisect bad 743a9723b476831c7910e6e15a714a713ab5989f
# good: [ee4c1592b7e9a5bf89b962d7afd7e9b04c8d16ee] irqchip/gic-v3-its: Remove usage of the deprecated ida_simple_xx() API
git bisect good ee4c1592b7e9a5bf89b962d7afd7e9b04c8d16ee
# good: [850d0fd76557fa4ad2d389a7d380f8a40043f874] Merge branch into tip/master: 'x86/urgent'
git bisect good 850d0fd76557fa4ad2d389a7d380f8a40043f874
# bad: [4461438a8405e800f90e0e40409e5f3d07eed381] x86/retpoline: Ensure default return thunk isn't used at runtime
git bisect bad 4461438a8405e800f90e0e40409e5f3d07eed381
# first bad commit: [4461438a8405e800f90e0e40409e5f3d07eed381] x86/retpoline: Ensure default return thunk isn't used at runtime
On 15.02.24 г. 5:20 ч., Nathan Chancellor wrote:
> On Mon, Feb 12, 2024 at 02:13:39PM -0000, tip-bot2 for Josh Poimboeuf wrote:
>> The following commit has been merged into the x86/bugs branch of tip:
>>
>> Commit-ID: 4461438a8405e800f90e0e40409e5f3d07eed381
>> Gitweb: https://git.kernel.org/tip/4461438a8405e800f90e0e40409e5f3d07eed381
>> Author: Josh Poimboeuf <[email protected]>
>> AuthorDate: Wed, 03 Jan 2024 19:36:26 +01:00
>> Committer: Borislav Petkov (AMD) <[email protected]>
>> CommitterDate: Mon, 12 Feb 2024 11:42:15 +01:00
>>
>> x86/retpoline: Ensure default return thunk isn't used at runtime
>>
>> Make sure the default return thunk is not used after all return
>> instructions have been patched by the alternatives because the default
>> return thunk is insufficient when it comes to mitigating Retbleed or
>> SRSO.
>>
>> Fix based on an earlier version by David Kaplan <[email protected]>.
>>
>> [ bp: Fix the compilation error of warn_thunk_thunk being an invisible
>> symbol, hoist thunk macro into calling.h ]
>>
>> Signed-off-by: Josh Poimboeuf <[email protected]>
>> Co-developed-by: Borislav Petkov (AMD) <[email protected]>
>> Signed-off-by: Borislav Petkov (AMD) <[email protected]>
>> Link: https://lore.kernel.org/r/[email protected]
>> Link: https://lore.kernel.org/r/20240104132446.GEZZaxnrIgIyat0pqf@fat_crate.local
>
> This warning is now getting triggered for me in some of my builds,
> specifically from Alpine Linux's configuration. A minimal reproducer on
> top of defconfig:
>
> $ echo 'CONFIG_X86_KERNEL_IBT=n
> CONFIG_UNWINDER_ORC=n
> CONFIG_UNWINDER_FRAME_POINTER=y' >arch/x86/configs/repro.config
>
I was able to reproduce this and it seems to go away if KERNEL_IBT=y.
When looking at the disassembly of do_one_initcall it seems the 2 return
sites are not patched at all, I see:
0xffffffff81001284 <+84>: call 0xffffffff81f2d000
<__x86_indirect_thunk_array+96>
0xffffffff810012e7 <+183>: jmp 0xffffffff81f2d760
<__x86_return_thunk>
The former should be rewritten to an indirect call as per
patch_retpoline and the latter should be rewritten altogether. I wonder
if objtool ignores the function for some reason ...
On Wed, Feb 14, 2024 at 08:20:49PM -0700, Nathan Chancellor wrote:
> On Mon, Feb 12, 2024 at 02:13:39PM -0000, tip-bot2 for Josh Poimboeuf wrote:
> > The following commit has been merged into the x86/bugs branch of tip:
> >
> > Commit-ID: 4461438a8405e800f90e0e40409e5f3d07eed381
> > Gitweb: https://git.kernel.org/tip/4461438a8405e800f90e0e40409e5f3d07eed381
> > Author: Josh Poimboeuf <[email protected]>
> > AuthorDate: Wed, 03 Jan 2024 19:36:26 +01:00
> > Committer: Borislav Petkov (AMD) <[email protected]>
> > CommitterDate: Mon, 12 Feb 2024 11:42:15 +01:00
> >
> > x86/retpoline: Ensure default return thunk isn't used at runtime
> >
> > Make sure the default return thunk is not used after all return
> > instructions have been patched by the alternatives because the default
> > return thunk is insufficient when it comes to mitigating Retbleed or
> > SRSO.
> >
> > Fix based on an earlier version by David Kaplan <[email protected]>.
> >
> > [ bp: Fix the compilation error of warn_thunk_thunk being an invisible
> > symbol, hoist thunk macro into calling.h ]
> >
> > Signed-off-by: Josh Poimboeuf <[email protected]>
> > Co-developed-by: Borislav Petkov (AMD) <[email protected]>
> > Signed-off-by: Borislav Petkov (AMD) <[email protected]>
> > Link: https://lore.kernel.org/r/[email protected]
> > Link: https://lore.kernel.org/r/20240104132446.GEZZaxnrIgIyat0pqf@fat_crate.local
>
> This warning is now getting triggered for me in some of my builds,
> specifically from Alpine Linux's configuration. A minimal reproducer on
> top of defconfig:
>
> $ echo 'CONFIG_X86_KERNEL_IBT=n
> CONFIG_UNWINDER_ORC=n
> CONFIG_UNWINDER_FRAME_POINTER=y' >arch/x86/configs/repro.config
Yeah, the special vdso glue:
[ 325.985728] PCI: Using configuration type 1 for base access
[ 325.986637] PCI: Using configuration type 1 for extended access
[ 325.987797] initcall pci_arch_init+0x0/0x90 returned 0 after 3000 usecs
[ 325.988815] calling init_vdso_image_64+0x0/0x30 @ 1
[ 325.989804] ------------[ cut here ]------------
[ 325.990724] Unpatched return thunk in use. This should not happen!
[ 325.991735] WARNING: CPU: 0 PID: 1 at arch/x86/kernel/cpu/bugs.c:2856 __warn_thunk+0x27/0x40
[ 325.992793] Modules linked in:
[ 325.993440] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.8.0-rc4-00040-g4589f199eb68 #1
[ 325.993794] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 325.994794] RIP: 0010:__warn_thunk+0x27/0x40
[ 325.995648] Code: 90 90 90 80 3d 62 38 c3 01 00 74 05 e9 52 07 ee 00 55 c6 05 53 38 c3 01 01 48 89 e5 90 48 c7 c7 08 54 71 82 e8 9a c9 03 00 90 <0f> 0b 90 90 5d e9 2f 07 ee 00 cc cc cc cc cc cc cc cc cc cc cc cc
[ 325.996793] RSP: 0018:ffffc90000013e10 EFLAGS: 00010286
[ 325.997793] RAX: 0000000000000000 RBX: ffffffff82cfdac0 RCX: 0000000000000000
[ 325.998795] RDX: 0000000000000000 RSI: 00000000fff7ffff RDI: 0000000000000001
[ 325.999793] RBP: ffffc90000013e10 R08: 00000000fff7ffff R09: ffffc90000013c90
[ 326.000794] R10: 0000000000000001 R11: ffff88807b7df000 R12: 0000000000000000
[ 326.001793] R13: ffff8880035f9900 R14: ffffc90000013e78 R15: 0000000000000000
[ 326.002795] FS: 0000000000000000(0000) GS:ffff888079200000(0000) knlGS:0000000000000000
[ 326.003795] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 326.004793] CR2: ffff888003201000 CR3: 0000000002a2c000 CR4: 00000000003506f0
[ 326.005794] Call Trace:
[ 326.006339] <TASK>
[ 326.006794] ? show_regs+0x5f/0x70
[ 326.007501] ? __warn+0x83/0x130
[ 326.007794] ? __warn_thunk+0x27/0x40
[ 326.008542] ? report_bug+0x171/0x1a0
[ 326.008793] ? srso_return_thunk+0x5/0x5f
[ 326.009603] ? console_unlock+0x4f/0xe0
[ 326.010381] ? handle_bug+0x43/0x80
[ 326.010795] ? exc_invalid_op+0x18/0x70
[ 326.011592] ? asm_exc_invalid_op+0x1b/0x20
[ 326.012440] ? ia32_binfmt_init+0x40/0x40
[ 326.012795] ? __warn_thunk+0x27/0x40
[ 326.013546] warn_thunk_thunk+0x16/0x30
[ 326.013795] do_one_initcall+0x59/0x230
[ 326.014574] kernel_init_freeable+0x19a/0x2d0
[ 326.014794] ? __pfx_kernel_init+0x10/0x10
[ 326.015629] kernel_init+0x15/0x1b0
[ 326.016326] ret_from_fork+0x38/0x60
[ 326.016794] ? __pfx_kernel_init+0x10/0x10
[ 326.017627] ret_from_fork_asm+0x1a/0x30
[ 326.018394] </TASK>
[ 326.018794] ---[ end trace 0000000000000000 ]---
[ 326.019705] initcall init_vdso_image_64+0x0/0x30 returned 0 after 29000 usecs
[ 326.020794] calling init_vdso_image_32+0x0/0x20 @ 1
During build we do:
# VDSO2C arch/x86/entry/vdso/vdso-image-64.c
arch/x86/entry/vdso/vdso2c arch/x86/entry/vdso/vdso64.so.dbg arch/x86/entry/vdso/vdso64.so arch/x86/entry/vdso/vdso-image-64.c
...
# CC arch/x86/entry/vdso/vdso-image-64.o
gcc -Wp,-MMD,arch/x86/entry/vdso/.vdso-image-64.o.d -nostdinc -I./arch/x86/include -I./arch/x86/include/generated -I./include -I./arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/compiler-version.h -include ./include/linux/kconfig.h -include ./include/linux/compiler_types.h -D__KERNEL__ -fmacro-prefix-map=./= -Werror -std=gnu11 -fshort-wchar -funsigned-char -fno-common -fno-PIE -fno-strict-aliasing -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -fcf-protection=none -m64 -falign-jumps=1 -falign-loops=1 -mno-80387 -mno-fp-ret-in-387 -mpreferred-stack-boundary=3 -mskip-rax-setup -mtune=generic -mno-red-zone -mcmodel=kernel -Wno-sign-compare -fno-asynchronous-unwind-tables -mindirect-branch=thunk-extern -mindirect-branch-register -mindirect-branch-cs-prefix -mfunction-return=thunk-extern -fno-jump-tables -fpatchable-function-entry=16,16 -fno-delete-null-pointer-checks -O2 -fno-allow-store-data-races -fstack-protector-strong -fno-omit-frame-pointer -fno-optimize-sibling-calls -ftrivial-auto-var-init=zero -fno-stack-clash-protection -falign-functions=16 -fstrict-flex-arrays=3 -fno-strict-overflow -fno-stack-check -fconserve-stack -Wall -Wundef -Werror=implicit-function-declaration -Werror=implicit-int -Werror=return-type -Werror=strict-prototypes -Wno-format-security -Wno-trigraphs -Wno-frame-address -Wno-address-of-packed-member -Wmissing-declarations -Wmissing-prototypes -Wframe-larger-than=2048 -Wno-main -Wno-unused-but-set-variable -Wno-unused-const-variable -Wno-dangling-pointer -Wvla -Wno-pointer-sign -Wcast-function-type -Wno-stringop-overflow -Wno-array-bounds -Wno-alloc-size-larger-than -Wimplicit-fallthrough=5 -Werror=date-time -Werror=incompatible-pointer-types -Werror=designated-init -Wenum-conversion -Wno-unused-but-set-variable -Wno-unused-const-variable -Wno-restrict -Wno-packed-not-aligned -Wno-format-overflow -Wno-format-truncation -Wno-stringop-truncation -Wno-missing-field-initializers -Wno-type-limits -Wno-shift-negative-value -Wno-maybe-uninitialized -Wno-sign-compare -DKBUILD_MODFILE='"arch/x86/entry/vdso/vdso-image-64"' -DKBUILD_BASENAME='"vdso_image_64"' -DKBUILD_MODNAME='"vdso_image_64"' -D__KBUILD_MODNAME=kmod_vdso_image_64 -c -o arch/x86/entry/vdso/vdso-image-64.o arch/x86/entry/vdso/vdso-image-64.c
...
and what is missing here is
# cmd_gen_objtooldep arch/x86/lib/vdso-image-64.o
{ echo ; echo 'arch/x86/lib/vdso-image-64.o: $(wildcard ./tools/objtool/objtool)' ; } >> arch/x86/lib/...
or so which needs to create the .return_sites but that thing gets
generated without it:
rm -f arch/x86/entry/vdso/built-in.a; printf "arch/x86/entry/vdso/%s " vma.o extable.o vdso32-setup.o vdso-image-64.o vdso-image-32.o | xargs ar cDPrST arch/x86/entry/vdso/built-in.a
I'd tend to look in Josh's direction as to say what would be the right
thing to do here and more specifically, where?
We need to run objtool on the vdso objects which are *kernel* code.
I.e., that initcall thing. The vdso-image-64.c gets generated by vdso2c
and lands in arch/x86/entry/vdso/vdso-image-64.c, that's why objtool
hasn't seen it yet.
I mean, it is one initcall in the vdso, probably not that important and
if its return hasn't been patched, it won't be the end of the world but
still...
In any case, the patch works as advertized! :-)
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Thu, Feb 15, 2024 at 04:53:49PM +0100, Borislav Petkov wrote:
> I'd tend to look in Josh's direction as to say what would be the right
> thing to do here and more specifically, where?
>
> We need to run objtool on the vdso objects which are *kernel* code.
> I.e., that initcall thing. The vdso-image-64.c gets generated by vdso2c
> and lands in arch/x86/entry/vdso/vdso-image-64.c, that's why objtool
> hasn't seen it yet.
>
> I mean, it is one initcall in the vdso, probably not that important and
> if its return hasn't been patched, it won't be the end of the world but
> still...
>
> In any case, the patch works as advertized! :-)
Right, the good news is this isn't a regression and the warning is
working as designed.
This should tell the build to invoke objtool on that file:
diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index b1b8dd1608f7..92d67379f570 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -36,6 +36,7 @@ UBSAN_SANITIZE_vma.o := y
KCSAN_SANITIZE_vma.o := y
OBJECT_FILES_NON_STANDARD_vma.o := n
OBJECT_FILES_NON_STANDARD_extable.o := n
+OBJECT_FILES_NON_STANDARD_vdso-image-64.o := n
# vDSO images to build
vdso_img-$(VDSO64-y) += 64
On Thu, Feb 15, 2024 at 09:42:35PM -0800, Josh Poimboeuf wrote:
> Right, the good news is this isn't a regression and the warning is
> working as designed.
>
> This should tell the build to invoke objtool on that file:
>
> diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
> index b1b8dd1608f7..92d67379f570 100644
> --- a/arch/x86/entry/vdso/Makefile
> +++ b/arch/x86/entry/vdso/Makefile
> @@ -36,6 +36,7 @@ UBSAN_SANITIZE_vma.o := y
> KCSAN_SANITIZE_vma.o := y
> OBJECT_FILES_NON_STANDARD_vma.o := n
> OBJECT_FILES_NON_STANDARD_extable.o := n
> +OBJECT_FILES_NON_STANDARD_vdso-image-64.o := n
Right, this should be:
diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index c4df99aa1615..4a514cafd73e 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -36,6 +36,8 @@ UBSAN_SANITIZE_vma.o := y
KCSAN_SANITIZE_vma.o := y
OBJECT_FILES_NON_STANDARD_vma.o := n
OBJECT_FILES_NON_STANDARD_extable.o := n
+OBJECT_FILES_NON_STANDARD_vdso-image-32.o := n
+OBJECT_FILES_NON_STANDARD_vdso-image-64.o := n
# vDSO images to build
vdso_img-$(VDSO64-y) += 64
for completeness.
Lemme know if you want to write a formal patch or I should.
If you do, please make sure to include the exact way to reproduce
because we might need it in the future.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
For CONFIG_RETHUNK kernels, objtool annotates all the function return
sites so they can be patched during boot. By design, after
apply_returns() is called, all tail-calls to the compiler-generated
default return thunk (__x86_return_thunk) should be patched out and
replaced with whatever's needed for any mitigations (or lack thereof).
With the following commit
4461438a8405 ("x86/retpoline: Ensure default return thunk isn't used at runtime")
a runtime check was added to do a WARN_ONCE() if the default return
thunk ever gets executed after alternatives have been applied. This
warning is a sanity check to make sure objtool and apply_returns() are
doing their job.
As Nathan reported, that check found something:
Unpatched return thunk in use. This should not happen!
WARNING: CPU: 0 PID: 1 at arch/x86/kernel/cpu/bugs.c:2856 __warn_thunk+0x27/0x40
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.7.0-01738-g4461438a8405-dirty #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
RIP: 0010:__warn_thunk+0x27/0x40
Code: 90 90 90 80 3d 22 20 c3 01 00 74 05 e9 32 a5 eb 00 55 c6 05 13 20 c3 01 01 48 89 e5 90 48 c7 c7 80 80 50 89 e8 6a c4 03 00 90 <0f> 0b 90 90 5d e9 0f a5 eb 00 cc cc cc cc cc cc cc cc cc cc cc cc
RSP: 0018:ffff8ba9c0013e10 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffffffff89afba70 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 00000000ffffdfff RDI: 0000000000000001
RBP: ffff8ba9c0013e10 R08: 00000000ffffdfff R09: ffff8ba9c0013c88
R10: 0000000000000001 R11: ffffffff89856ae0 R12: 0000000000000000
R13: ffff88c101126ac0 R14: ffff8ba9c0013e78 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff88c11f000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff88c119601000 CR3: 0000000018e2c000 CR4: 0000000000350ef0
Call Trace:
<TASK>
? show_regs+0x60/0x70
? __warn+0x84/0x150
? __warn_thunk+0x27/0x40
? report_bug+0x16d/0x1a0
? console_unlock+0x4f/0xe0
? handle_bug+0x43/0x80
? exc_invalid_op+0x18/0x70
? asm_exc_invalid_op+0x1b/0x20
? ia32_binfmt_init+0x40/0x40
? __warn_thunk+0x27/0x40
warn_thunk_thunk+0x16/0x30
do_one_initcall+0x59/0x230
kernel_init_freeable+0x1a4/0x2e0
? __pfx_kernel_init+0x10/0x10
kernel_init+0x15/0x1b0
ret_from_fork+0x38/0x60
? __pfx_kernel_init+0x10/0x10
ret_from_fork_asm+0x1b/0x30
</TASK>
Boris debugged to find that the unpatched return site was in
init_vdso_image_64(), and its translation unit wasn't being analyzed by
objtool, so it never got annotated. So it got ignored by
apply_returns().
This is only a minor issue, as this function is only called during boot.
Still, objtool needs full visibility to the kernel. Fix it by enabling
objtool on vdso-image-{32,64}.o.
Note this problem can only be seen with !CONFIG_X86_KERNEL_IBT, as that
requires objtool to run individually on all translation units rather on
vmlinux.o.
Reported-by: Nathan Chancellor <[email protected]>
Debugged-by: Borislav Petkov <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
---
arch/x86/entry/vdso/Makefile | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index b1b8dd1608f7..4ee59121b905 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -34,8 +34,12 @@ obj-y += vma.o extable.o
KASAN_SANITIZE_vma.o := y
UBSAN_SANITIZE_vma.o := y
KCSAN_SANITIZE_vma.o := y
-OBJECT_FILES_NON_STANDARD_vma.o := n
-OBJECT_FILES_NON_STANDARD_extable.o := n
+
+OBJECT_FILES_NON_STANDARD_extable.o := n
+OBJECT_FILES_NON_STANDARD_vdso-image-32.o := n
+OBJECT_FILES_NON_STANDARD_vdso-image-64.o := n
+OBJECT_FILES_NON_STANDARD_vdso32-setup.o := n
+OBJECT_FILES_NON_STANDARD_vma.o := n
# vDSO images to build
vdso_img-$(VDSO64-y) += 64
@@ -43,7 +47,6 @@ vdso_img-$(VDSOX32-y) += x32
vdso_img-$(VDSO32-y) += 32
obj-$(VDSO32-y) += vdso32-setup.o
-OBJECT_FILES_NON_STANDARD_vdso32-setup.o := n
vobjs := $(foreach F,$(vobjs-y),$(obj)/$F)
vobjs32 := $(foreach F,$(vobjs32-y),$(obj)/$F)
--
2.43.0
The following commit has been merged into the x86/core branch of tip:
Commit-ID: b388e57d4628eb22782bdad4cd5b83ca87a1b7c9
Gitweb: https://git.kernel.org/tip/b388e57d4628eb22782bdad4cd5b83ca87a1b7c9
Author: Josh Poimboeuf <[email protected]>
AuthorDate: Mon, 19 Feb 2024 21:57:18 -08:00
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Tue, 20 Feb 2024 13:26:10 +01:00
x86/vdso: Fix rethunk patching for vdso-image-{32,64}.o
For CONFIG_RETHUNK kernels, objtool annotates all the function return
sites so they can be patched during boot. By design, after
apply_returns() is called, all tail-calls to the compiler-generated
default return thunk (__x86_return_thunk) should be patched out and
replaced with whatever's needed for any mitigations (or lack thereof).
The commit
4461438a8405 ("x86/retpoline: Ensure default return thunk isn't used at runtime")
adds a runtime check and a WARN_ONCE() if the default return thunk ever
gets executed after alternatives have been applied. This warning is
a sanity check to make sure objtool and apply_returns() are doing their
job.
As Nathan reported, that check found something:
Unpatched return thunk in use. This should not happen!
WARNING: CPU: 0 PID: 1 at arch/x86/kernel/cpu/bugs.c:2856 __warn_thunk+0x27/0x40
RIP: 0010:__warn_thunk+0x27/0x40
Call Trace:
<TASK>
? show_regs
? __warn
? __warn_thunk
? report_bug
? console_unlock
? handle_bug
? exc_invalid_op
? asm_exc_invalid_op
? ia32_binfmt_init
? __warn_thunk
warn_thunk_thunk
do_one_initcall
kernel_init_freeable
? __pfx_kernel_init
kernel_init
ret_from_fork
? __pfx_kernel_init
ret_from_fork_asm
</TASK>
Boris debugged to find that the unpatched return site was in
init_vdso_image_64(), and its translation unit wasn't being analyzed by
objtool, so it never got annotated. So it got ignored by
apply_returns().
This is only a minor issue, as this function is only called during boot.
Still, objtool needs full visibility to the kernel. Fix it by enabling
objtool on vdso-image-{32,64}.o.
Note this problem can only be seen with !CONFIG_X86_KERNEL_IBT, as that
requires objtool to run individually on all translation units rather on
vmlinux.o.
[ bp: Massage commit message. ]
Reported-by: Nathan Chancellor <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/entry/vdso/Makefile | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index c4df99a..b80f4bb 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -34,8 +34,12 @@ obj-y += vma.o extable.o
KASAN_SANITIZE_vma.o := y
UBSAN_SANITIZE_vma.o := y
KCSAN_SANITIZE_vma.o := y
-OBJECT_FILES_NON_STANDARD_vma.o := n
-OBJECT_FILES_NON_STANDARD_extable.o := n
+
+OBJECT_FILES_NON_STANDARD_extable.o := n
+OBJECT_FILES_NON_STANDARD_vdso-image-32.o := n
+OBJECT_FILES_NON_STANDARD_vdso-image-64.o := n
+OBJECT_FILES_NON_STANDARD_vdso32-setup.o := n
+OBJECT_FILES_NON_STANDARD_vma.o := n
# vDSO images to build
vdso_img-$(VDSO64-y) += 64
@@ -43,7 +47,6 @@ vdso_img-$(VDSOX32-y) += x32
vdso_img-$(VDSO32-y) += 32
obj-$(VDSO32-y) += vdso32-setup.o
-OBJECT_FILES_NON_STANDARD_vdso32-setup.o := n
vobjs := $(foreach F,$(vobjs-y),$(obj)/$F)
vobjs32 := $(foreach F,$(vobjs32-y),$(obj)/$F)
Hi,
On 2024-02-12 11:43, Borislav Petkov wrote:
> On Wed, Feb 07, 2024 at 11:49:19AM -0800, Josh Poimboeuf wrote:
>> LGTM, thanks!
>
> Thanks, had to hoist up both THUNK macros into the header to make that
> nuisance 32-bit build too :)
>
> ---
>
> commit 4461438a8405e800f90e0e40409e5f3d07eed381 (HEAD -> refs/heads/tip-x86-bugs)
> Author: Josh Poimboeuf <[email protected]>
> Date: Wed Jan 3 19:36:26 2024 +0100
>
> x86/retpoline: Ensure default return thunk isn't used at runtime
>
> Make sure the default return thunk is not used after all return
> instructions have been patched by the alternatives because the default
> return thunk is insufficient when it comes to mitigating Retbleed or
> SRSO.
>
> Fix based on an earlier version by David Kaplan <[email protected]>.
>
> [ bp: Fix the compilation error of warn_thunk_thunk being an invisible
> symbol, hoist thunk macro into calling.h ]
>
> Signed-off-by: Josh Poimboeuf <[email protected]>
> Co-developed-by: Borislav Petkov (AMD) <[email protected]>
> Signed-off-by: Borislav Petkov (AMD) <[email protected]>
> Link: https://lore.kernel.org/r/[email protected]
> Link: https://lore.kernel.org/r/20240104132446.GEZZaxnrIgIyat0pqf@fat_crate.local
>
With this patch/commit, one of my machines (older P4 Xeon, 32-bit only)
hangs on boot with CONFIG_RETHUNK=y / CONFIG_MITIGATION_RETHUNK=y.
> diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
> index 39e069b68c6e..bd31b2534053 100644
> --- a/arch/x86/entry/calling.h
> +++ b/arch/x86/entry/calling.h
> @@ -426,3 +426,63 @@ For 32-bit we have the following conventions - kernel is built with
> .endm
>
> #endif /* CONFIG_SMP */
> +
> +#ifdef CONFIG_X86_64
> +
> +/* rdi: arg1 ... normal C conventions. rax is saved/restored. */
> +.macro THUNK name, func
> +SYM_FUNC_START(\name)
> + pushq %rbp
> + movq %rsp, %rbp
> +
> + pushq %rdi
> + pushq %rsi
> + pushq %rdx
> + pushq %rcx
> + pushq %rax
> + pushq %r8
> + pushq %r9
> + pushq %r10
> + pushq %r11
> +
> + call \func
> +
> + popq %r11
> + popq %r10
> + popq %r9
> + popq %r8
> + popq %rax
> + popq %rcx
> + popq %rdx
> + popq %rsi
> + popq %rdi
> + popq %rbp
> + RET
> +SYM_FUNC_END(\name)
> + _ASM_NOKPROBE(\name)
> +.endm
> +
> +#else /* CONFIG_X86_32 */
> +
> +/* put return address in eax (arg1) */
> +.macro THUNK name, func, put_ret_addr_in_eax=0
> +SYM_CODE_START_NOALIGN(\name)
> + pushl %eax
> + pushl %ecx
> + pushl %edx
> +
> + .if \put_ret_addr_in_eax
> + /* Place EIP in the arg1 */
> + movl 3*4(%esp), %eax
> + .endif
> +
> + call \func
> + popl %edx
> + popl %ecx
> + popl %eax
> + RET
> + _ASM_NOKPROBE(\name)
> +SYM_CODE_END(\name)
> + .endm
> +
> +#endif
> diff --git a/arch/x86/entry/entry.S b/arch/x86/entry/entry.S
> index 8c8d38f0cb1d..582731f74dc8 100644
> --- a/arch/x86/entry/entry.S
> +++ b/arch/x86/entry/entry.S
> @@ -7,6 +7,8 @@
> #include <linux/linkage.h>
> #include <asm/msr-index.h>
>
> +#include "calling.h"
> +
> .pushsection .noinstr.text, "ax"
>
> SYM_FUNC_START(entry_ibpb)
> @@ -20,3 +22,5 @@ SYM_FUNC_END(entry_ibpb)
> EXPORT_SYMBOL_GPL(entry_ibpb);
>
> .popsection
> +
> +THUNK warn_thunk_thunk, __warn_thunk
> diff --git a/arch/x86/entry/thunk_32.S b/arch/x86/entry/thunk_32.S
> index 0103e103a657..da37f42f4549 100644
> --- a/arch/x86/entry/thunk_32.S
> +++ b/arch/x86/entry/thunk_32.S
> @@ -4,33 +4,15 @@
> * Copyright 2008 by Steven Rostedt, Red Hat, Inc
> * (inspired by Andi Kleen's thunk_64.S)
> */
> - #include <linux/export.h>
> - #include <linux/linkage.h>
> - #include <asm/asm.h>
>
> - /* put return address in eax (arg1) */
> - .macro THUNK name, func, put_ret_addr_in_eax=0
> -SYM_CODE_START_NOALIGN(\name)
> - pushl %eax
> - pushl %ecx
> - pushl %edx
> +#include <linux/export.h>
> +#include <linux/linkage.h>
> +#include <asm/asm.h>
>
> - .if \put_ret_addr_in_eax
> - /* Place EIP in the arg1 */
> - movl 3*4(%esp), %eax
> - .endif
> +#include "calling.h"
>
> - call \func
> - popl %edx
> - popl %ecx
> - popl %eax
> - RET
> - _ASM_NOKPROBE(\name)
> -SYM_CODE_END(\name)
> - .endm
> -
> - THUNK preempt_schedule_thunk, preempt_schedule
> - THUNK preempt_schedule_notrace_thunk, preempt_schedule_notrace
> - EXPORT_SYMBOL(preempt_schedule_thunk)
> - EXPORT_SYMBOL(preempt_schedule_notrace_thunk)
> +THUNK preempt_schedule_thunk, preempt_schedule
> +THUNK preempt_schedule_notrace_thunk, preempt_schedule_notrace
> +EXPORT_SYMBOL(preempt_schedule_thunk)
> +EXPORT_SYMBOL(preempt_schedule_notrace_thunk)
>
> diff --git a/arch/x86/entry/thunk_64.S b/arch/x86/entry/thunk_64.S
> index 416b400f39db..119ebdc3d362 100644
> --- a/arch/x86/entry/thunk_64.S
> +++ b/arch/x86/entry/thunk_64.S
> @@ -9,39 +9,6 @@
> #include "calling.h"
> #include <asm/asm.h>
>
> - /* rdi: arg1 ... normal C conventions. rax is saved/restored. */
> - .macro THUNK name, func
> -SYM_FUNC_START(\name)
> - pushq %rbp
> - movq %rsp, %rbp
> -
> - pushq %rdi
> - pushq %rsi
> - pushq %rdx
> - pushq %rcx
> - pushq %rax
> - pushq %r8
> - pushq %r9
> - pushq %r10
> - pushq %r11
> -
> - call \func
> -
> - popq %r11
> - popq %r10
> - popq %r9
> - popq %r8
> - popq %rax
> - popq %rcx
> - popq %rdx
> - popq %rsi
> - popq %rdi
> - popq %rbp
> - RET
> -SYM_FUNC_END(\name)
> - _ASM_NOKPROBE(\name)
> - .endm
> -
> THUNK preempt_schedule_thunk, preempt_schedule
> THUNK preempt_schedule_notrace_thunk, preempt_schedule_notrace
> EXPORT_SYMBOL(preempt_schedule_thunk)
> diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
> index 2c0679ebe914..55754617eaee 100644
> --- a/arch/x86/include/asm/nospec-branch.h
> +++ b/arch/x86/include/asm/nospec-branch.h
> @@ -357,6 +357,8 @@ extern void entry_ibpb(void);
>
> extern void (*x86_return_thunk)(void);
>
> +extern void __warn_thunk(void);
> +
> #ifdef CONFIG_MITIGATION_CALL_DEPTH_TRACKING
> extern void call_depth_return_thunk(void);
>
> diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
> index f2775417bda2..a78892b0f823 100644
> --- a/arch/x86/kernel/cpu/bugs.c
> +++ b/arch/x86/kernel/cpu/bugs.c
> @@ -2850,3 +2850,8 @@ ssize_t cpu_show_gds(struct device *dev, struct device_attribute *attr, char *bu
> return cpu_show_common(dev, attr, buf, X86_BUG_GDS);
> }
> #endif
> +
> +void __warn_thunk(void)
> +{
> + WARN_ONCE(1, "Unpatched return thunk in use. This should not happen!\n");
> +}
> diff --git a/arch/x86/lib/retpoline.S b/arch/x86/lib/retpoline.S
> index 0045153ba222..721b528da9ac 100644
> --- a/arch/x86/lib/retpoline.S
> +++ b/arch/x86/lib/retpoline.S
> @@ -369,19 +369,16 @@ SYM_FUNC_END(call_depth_return_thunk)
> * 'JMP __x86_return_thunk' sites are changed to something else by
> * apply_returns().
> *
> - * This should be converted eventually to call a warning function which
> - * should scream loudly when the default return thunk is called after
> - * alternatives have been applied.
> - *
> - * That warning function cannot BUG() because the bug splat cannot be
> - * displayed in all possible configurations, leading to users not really
> - * knowing why the machine froze.
> + * The ALTERNATIVE below adds a really loud warning to catch the case
> + * where the insufficient default return thunk ends up getting used for
> + * whatever reason like miscompilation or failure of
> + * objtool/alternatives/etc to patch all the return sites.
> */
> SYM_CODE_START(__x86_return_thunk)
> UNWIND_HINT_FUNC
> ANNOTATE_NOENDBR
> - ANNOTATE_UNRET_SAFE
> - ret
> + ALTERNATIVE __stringify(ANNOTATE_UNRET_SAFE; ret), \
> + "jmp warn_thunk_thunk", X86_FEATURE_ALWAYS
> int3
> SYM_CODE_END(__x86_return_thunk)
> EXPORT_SYMBOL(__x86_return_thunk)
>
This seems to be the problematic snippet. Reverting it alone fixes the
issue for me. I wonder if it could have anything to do with the previous
comment text?
Please let me know if there's anything else you need.
Kind regards,
Klara Modin
On Wed, Apr 03, 2024 at 07:10:17PM +0200, Klara Modin wrote:
> With this patch/commit, one of my machines (older P4 Xeon, 32-bit only)
> hangs on boot with CONFIG_RETHUNK=y / CONFIG_MITIGATION_RETHUNK=y.
I wanna say your old P4 heater :) is not even affected by the crap the
return thunks are trying to address so perhaps we should make
CONFIG_MITIGATION_RETHUNK depend on !X86_32...
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On 2024-04-03 19:30, Borislav Petkov wrote:
> On Wed, Apr 03, 2024 at 07:10:17PM +0200, Klara Modin wrote:
>> With this patch/commit, one of my machines (older P4 Xeon, 32-bit only)
>> hangs on boot with CONFIG_RETHUNK=y / CONFIG_MITIGATION_RETHUNK=y.
>
> I wanna say your old P4 heater :) is not even affected by the crap the
> return thunks are trying to address so perhaps we should make
> CONFIG_MITIGATION_RETHUNK depend on !X86_32...
>
Probably, I don't have much knowledge about this stuff. The machine can
at least be useful for testing still :)
On Wed, Apr 03, 2024 at 10:26:19PM +0200, Klara Modin wrote:
> Probably, I don't have much knowledge about this stuff. The machine can at
> least be useful for testing still :)
I wouldn't use it if I were you as it wouldn't even justify the
electricity wasted. No one cares about 32-bit x86 kernels anymore and we
barely keep them alive.
It'll be a lot more helpful if you'd test 64-bit kernels on 64-bit hw.
:-)
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On 2024-04-03 22:41, Borislav Petkov wrote:
> On Wed, Apr 03, 2024 at 10:26:19PM +0200, Klara Modin wrote:
>> Probably, I don't have much knowledge about this stuff. The machine can at
>> least be useful for testing still :)
>
> I wouldn't use it if I were you as it wouldn't even justify the
> electricity wasted. No one cares about 32-bit x86 kernels anymore and we
> barely keep them alive.
>
> It'll be a lot more helpful if you'd test 64-bit kernels on 64-bit hw.
>
> :-)
>
> Thx.
>
All the more reason to continue then, even if only for nostalgia ;)
Jokes aside, I do run -next kernels regularly for my daily drivers
(which are x86_64), but it's honestly not very often I notice bugs there
that affect me. They have all been pretty minor or very obvious and
would probably have been caught regardless, but I'll of course still
report them.
On Thu, Apr 04, 2024 at 12:25:42AM +0200, Klara Modin wrote:
> All the more reason to continue then, even if only for nostalgia ;)
Here's an argument for you: please save the environment by using only
64-bit hw. :-P
> Jokes aside, I do run -next kernels regularly for my daily drivers (which
> are x86_64), but it's honestly not very often I notice bugs there that
> affect me. They have all been pretty minor or very obvious and would
> probably have been caught regardless, but I'll of course still report them.
That's good.
What you could also do is build random configs on linux-next - "make
randconfig" - and see if you catch something weird there. And maybe then
try to boot them in a VM and see what explodes.
In general, testing linux-next is a very good idea because it helps us
catch crap early and fix it before it hits the official releases.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Wed, Apr 03, 2024 at 07:10:17PM +0200, Klara Modin wrote:
> With this patch/commit, one of my machines (older P4 Xeon, 32-bit only)
> hangs on boot with CONFIG_RETHUNK=y / CONFIG_MITIGATION_RETHUNK=y.
Ok, this should fix it:
---
From: "Borislav Petkov (AMD)" <[email protected]>
Date: Mon, 15 Apr 2024 18:15:43 +0200
Subject: [PATCH] x86/retpolines: Enable the default thunk warning only on relevant configs
The using-default-thunk warning check makes sense only with
configurations which actually enable the special return thunks.
Otherwise, it fires on unrelated 32-bit configs on which the special
return thunks won't even work (they're 64-bit only) and, what is more,
those configs even go off into the weeds when booting in the
alternatives patching code, leading to a dead machine.
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Link: https://lore.kernel.org/r/20240413024956.488d474e@yea
---
arch/x86/lib/retpoline.S | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/arch/x86/lib/retpoline.S b/arch/x86/lib/retpoline.S
index e674ccf720b9..391059b2c6fb 100644
--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -382,8 +382,15 @@ SYM_FUNC_END(call_depth_return_thunk)
SYM_CODE_START(__x86_return_thunk)
UNWIND_HINT_FUNC
ANNOTATE_NOENDBR
+#if defined(CONFIG_MITIGATION_UNRET_ENTRY) || \
+ defined(CONFIG_MITIGATION_SRSO) || \
+ defined(CONFIG_MITIGATION_CALL_DEPTH_TRACKING)
ALTERNATIVE __stringify(ANNOTATE_UNRET_SAFE; ret), \
"jmp warn_thunk_thunk", X86_FEATURE_ALWAYS
+#else
+ ANNOTATE_UNRET_SAFE
+ ret
+#endif
int3
SYM_CODE_END(__x86_return_thunk)
EXPORT_SYMBOL(__x86_return_thunk)
--
2.43.0
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On 2024-04-16 11:27, Borislav Petkov wrote:
> On Wed, Apr 03, 2024 at 07:10:17PM +0200, Klara Modin wrote:
>> With this patch/commit, one of my machines (older P4 Xeon, 32-bit only)
>> hangs on boot with CONFIG_RETHUNK=y / CONFIG_MITIGATION_RETHUNK=y.
>
> Ok, this should fix it:
>
> ---
> From: "Borislav Petkov (AMD)" <[email protected]>
> Date: Mon, 15 Apr 2024 18:15:43 +0200
> Subject: [PATCH] x86/retpolines: Enable the default thunk warning only on relevant configs
>
> The using-default-thunk warning check makes sense only with
> configurations which actually enable the special return thunks.
>
> Otherwise, it fires on unrelated 32-bit configs on which the special
> return thunks won't even work (they're 64-bit only) and, what is more,
> those configs even go off into the weeds when booting in the
> alternatives patching code, leading to a dead machine.
>
> Signed-off-by: Borislav Petkov (AMD) <[email protected]>
> Link: https://lore.kernel.org/r/[email protected]
> Link: https://lore.kernel.org/r/20240413024956.488d474e@yea
> ---
> arch/x86/lib/retpoline.S | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/arch/x86/lib/retpoline.S b/arch/x86/lib/retpoline.S
> index e674ccf720b9..391059b2c6fb 100644
> --- a/arch/x86/lib/retpoline.S
> +++ b/arch/x86/lib/retpoline.S
> @@ -382,8 +382,15 @@ SYM_FUNC_END(call_depth_return_thunk)
> SYM_CODE_START(__x86_return_thunk)
> UNWIND_HINT_FUNC
> ANNOTATE_NOENDBR
> +#if defined(CONFIG_MITIGATION_UNRET_ENTRY) || \
> + defined(CONFIG_MITIGATION_SRSO) || \
> + defined(CONFIG_MITIGATION_CALL_DEPTH_TRACKING)
> ALTERNATIVE __stringify(ANNOTATE_UNRET_SAFE; ret), \
> "jmp warn_thunk_thunk", X86_FEATURE_ALWAYS
> +#else
> + ANNOTATE_UNRET_SAFE
> + ret
> +#endif
> int3
> SYM_CODE_END(__x86_return_thunk)
> EXPORT_SYMBOL(__x86_return_thunk)
Thanks,
Tested-by: Klara Modin <[email protected]>
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: 6376306adde5b252ee7c73572e35d13fb13f6f18
Gitweb: https://git.kernel.org/tip/6376306adde5b252ee7c73572e35d13fb13f6f18
Author: Borislav Petkov (AMD) <[email protected]>
AuthorDate: Mon, 15 Apr 2024 18:15:43 +02:00
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Wed, 17 Apr 2024 18:02:05 +02:00
x86/retpolines: Enable the default thunk warning only on relevant configs
The using-default-thunk warning check makes sense only with
configurations which actually enable the special return thunks.
Otherwise, it fires on unrelated 32-bit configs on which the special
return thunks won't even work (they're 64-bit only) and, what is more,
those configs even go off into the weeds when booting in the
alternatives patching code, leading to a dead machine.
Fixes: 4461438a8405 ("x86/retpoline: Ensure default return thunk isn't used at runtime")
Reported-by: Klara Modin <[email protected]>
Reported-by: Erhard Furtner <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Tested-by: Klara Modin <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Link: https://lore.kernel.org/r/20240413024956.488d474e@yea
---
arch/x86/lib/retpoline.S | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/arch/x86/lib/retpoline.S b/arch/x86/lib/retpoline.S
index e674ccf..391059b 100644
--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -382,8 +382,15 @@ SYM_FUNC_END(call_depth_return_thunk)
SYM_CODE_START(__x86_return_thunk)
UNWIND_HINT_FUNC
ANNOTATE_NOENDBR
+#if defined(CONFIG_MITIGATION_UNRET_ENTRY) || \
+ defined(CONFIG_MITIGATION_SRSO) || \
+ defined(CONFIG_MITIGATION_CALL_DEPTH_TRACKING)
ALTERNATIVE __stringify(ANNOTATE_UNRET_SAFE; ret), \
"jmp warn_thunk_thunk", X86_FEATURE_ALWAYS
+#else
+ ANNOTATE_UNRET_SAFE
+ ret
+#endif
int3
SYM_CODE_END(__x86_return_thunk)
EXPORT_SYMBOL(__x86_return_thunk)