Subject: Re: [bisected] Kernel v6.9-rc3 fails to boot on a Thinkpad T60 with MITIGATION_RETHUNK=y (regression from v6.8.5)

On 13.04.24 11:19, Bagas Sanjaya wrote:
> On Sat, Apr 13, 2024 at 02:49:56AM +0200, Erhard Furtner wrote:
>> Greetings!
>>
>> With MITIGATION_RETHUNK=y selected in the kernel .config, v6.9-rc3 fails to boot on my Thinkpad T60. The resulting kernel stalls during boot at "x86/fpu: x87 FPU will use FXSAVE":
>> [...]
>> 4461438a8405e800f90e0e40409e5f3d07eed381 is the first bad commit

There was an earlier report about this here:
https://lore.kernel.org/all/[email protected]/

Boris there suggested: "perhaps we should make
CONFIG_MITIGATION_RETHUNK depend on !X86_32":
https://lore.kernel.org/all/20240403173059.GJZg2SUwS8MXw7CdwF@fat_crate.local/

But that did not happen afaics. Would it be wise to go down that path?

Ciao, Thorsten



2024-04-14 08:34:27

by Borislav Petkov

Subject: Re: [bisected] Kernel v6.9-rc3 fails to boot on a Thinkpad T60 with MITIGATION_RETHUNK=y (regression from v6.8.5)

On Sat, Apr 13, 2024 at 11:46:09AM +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
> Boris there suggested: "perhaps we should make
> CONFIG_MITIGATION_RETHUNK depend on !X86_32":
> https://lore.kernel.org/all/20240403173059.GJZg2SUwS8MXw7CdwF@fat_crate.local/
>
> But that did not happen afaics. Would it be wise to go down that path?

Am looking at the whole thing. Stay tuned...

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2024-04-14 09:06:12

by Borislav Petkov

Subject: Re: [bisected] Kernel v6.9-rc3 fails to boot on a Thinkpad T60 with MITIGATION_RETHUNK=y (regression from v6.8.5)

On Sun, Apr 14, 2024 at 10:36:26AM +0200, Borislav Petkov wrote:
> Am looking at the whole thing. Stay tuned...

Something like this, I guess...

Execution goes off somewhere into the weeds during alternatives patching
of the return thunk, while it tries to warn about it in the alternatives
code itself, and it all ends up in endless INT3 exceptions due to our
speculation blockers everywhere...

I could chase it as to why exactly but the warning is there for all
those mitigations which need a special return thunk and 32-bit doesn't
need them (and at least the AMD untraining sequences are 64-bit only
so...).

IOW:

diff --git a/arch/x86/lib/retpoline.S b/arch/x86/lib/retpoline.S
index e674ccf720b9..391059b2c6fb 100644
--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -382,8 +382,15 @@ SYM_FUNC_END(call_depth_return_thunk)
 SYM_CODE_START(__x86_return_thunk)
 	UNWIND_HINT_FUNC
 	ANNOTATE_NOENDBR
+#if defined(CONFIG_MITIGATION_UNRET_ENTRY) || \
+    defined(CONFIG_MITIGATION_SRSO) || \
+    defined(CONFIG_MITIGATION_CALL_DEPTH_TRACKING)
 	ALTERNATIVE __stringify(ANNOTATE_UNRET_SAFE; ret), \
 		   "jmp warn_thunk_thunk", X86_FEATURE_ALWAYS
+#else
+	ANNOTATE_UNRET_SAFE
+	ret
+#endif
 	int3
 SYM_CODE_END(__x86_return_thunk)
 EXPORT_SYMBOL(__x86_return_thunk)
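
As a rough illustration of what the guard in the diff selects at build
time, here is a minimal userspace C sketch. It is not kernel code and not
part of the patch: the CONFIG_* names are taken from the diff, while
THUNK_HAS_PATCH_SITE, return_thunk_body() and print_return_thunk() are
hypothetical names made up for this sketch, and the strings only describe
what the assembler would be asked to emit.

```c
#include <stdio.h>

/*
 * Userspace model of the #if guard in the diff; not kernel code.
 * In a real build the CONFIG_* symbols come from the kernel .config.
 * Build with e.g. -DCONFIG_MITIGATION_SRSO to model a configuration
 * that needs a special return thunk.
 */
#if defined(CONFIG_MITIGATION_UNRET_ENTRY) || \
    defined(CONFIG_MITIGATION_SRSO) || \
    defined(CONFIG_MITIGATION_CALL_DEPTH_TRACKING)
#define THUNK_HAS_PATCH_SITE 1	/* hypothetical helper macro */
#else
#define THUNK_HAS_PATCH_SITE 0	/* none of the three options is set */
#endif

/* Hypothetical helper: describes what the thunk body would contain. */
static const char *return_thunk_body(void)
{
#if THUNK_HAS_PATCH_SITE
	return "ALTERNATIVE: ret, patched to jmp warn_thunk_thunk";
#else
	return "ret";
#endif
}

/* Hypothetical helper: prints the variant selected at compile time. */
static void print_return_thunk(void)
{
	printf("__x86_return_thunk: %s; int3\n", return_thunk_body());
}
```

Compiled without any of the three -DCONFIG_* defines, the sketch selects
the plain "ret" variant, so there is no patch site left for alternatives
patching to touch. Calling print_return_thunk() from a small main() shows
which variant was chosen.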

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Subject: Re: [bisected] Kernel v6.9-rc3 fails to boot on a Thinkpad T60 with MITIGATION_RETHUNK=y (regression from v6.8.5)

On 14.04.24 11:08, Borislav Petkov wrote:
> On Sun, Apr 14, 2024 at 10:36:26AM +0200, Borislav Petkov wrote:
>>> There was an earlier report about this here:
>>> https://lore.kernel.org/all/[email protected]/
>> Am looking at the whole thing. Stay tuned...
>
> Something like this, I guess...
>
> Execution goes off somewhere into the weeds during alternatives patching
> of the return thunk, while it tries to warn about it in the alternatives
> code itself, and it all ends up in endless INT3 exceptions due to our
> speculation blockers everywhere...
>
> I could chase it as to why exactly but the warning is there for all
> those mitigations which need a special return thunk and 32-bit doesn't
> need them (and at least the AMD untraining sequences are 64-bit only
> so...).

Erhard Furtner, did you check whether this helps for a kernel with
MITIGATION_RETHUNK=y? Or, Klara Modin, could you give it a try?

Without someone testing it, this is unlikely to be merged, and then more
people might run into the same problem you two did.

Ciao, Thorsten

Subject: Re: [bisected] Kernel v6.9-rc3 fails to boot on a Thinkpad T60 with MITIGATION_RETHUNK=y (regression from v6.8.5)



On 17.04.24 10:38, Linux regression tracking (Thorsten Leemhuis) wrote:
> On 14.04.24 11:08, Borislav Petkov wrote:
>> On Sun, Apr 14, 2024 at 10:36:26AM +0200, Borislav Petkov wrote:
>>>> There was an earlier report about this here:
>>>> https://lore.kernel.org/all/[email protected]/
>>> Am looking at the whole thing. Stay tuned...
>>
>> Something like this, I guess...
>>
>> Execution goes off somewhere into the weeds during alternatives patching
>> of the return thunk, while it tries to warn about it in the alternatives
>> code itself, and it all ends up in endless INT3 exceptions due to our
>> speculation blockers everywhere...
>>
>> I could chase it as to why exactly but the warning is there for all
>> those mitigations which need a special return thunk and 32-bit doesn't
>> need them (and at least the AMD untraining sequences are 64-bit only
>> so...).
>
> Erhard Furtner, did you check whether this helps for a kernel with
> MITIGATION_RETHUNK=y? Or, Klara Modin, could you give it a try?
>
> Without someone testing it, this is unlikely to be merged, and then more
> people might run into the same problem you two did.

Ignore that, I just had not noticed that the discussion continued in the
other thread and that Klara Modin already provided a Tested-by. Sorry for
the noise.

Ciao, Thorsten
