2022-11-29 21:31:29

by Ashok Raj

[permalink] [raw]
Subject: [Patch V1 2/7] x86/microcode/intel: Remove retries on early microcode load

Microcode loading can fail. This happens today when handling mixed
steppings. But it can also happen for other reasons such as corrupted
image, Security Version Number (SVN) preventing anti-rollback,
dependencies on BIOS loaded microcode image for some capabilities.

When the microcode loading fails, the kernel will quietly hang at boot.
This has been observed by end users (Links below) who had to revert their
microcode packages in order to boot again.

The hang is due to an infinite retry loop. The retries were in place to
support systems with mixed steppings. Now that mixed steppings are no
longer supported, there is only one microcode image at a time. Any retries
will simply reattempt to apply the same image over and over without making
progress.

Some possible past bugs that could be due to this bug are below.

There is no direct evidence that these end user issues were caused by this
retry loop. However, the early boot hangs along with reverting the
microcode update workaround provide strong circumstantial evidence to
support the theory that they are linked.

Remove the retry loop and only attempt to apply microcode once.

Link: https://bugs.launchpad.net/ubuntu/+source/intel-microcode/+bug/1911959
Link: https://forums.linuxmint.com/viewtopic.php?p=1827032#1827032
Link: https://askubuntu.com/questions/1291486/boot-crash-after-latest-update-of-intel-microcode-nov-11-2020
Fixes: 06b8534cb728 ("x86/microcode: Rework microcode loading")
Cc: [email protected]
Signed-off-by: Ashok Raj <[email protected]>
---
arch/x86/kernel/cpu/microcode/intel.c | 9 +--------
1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index 4f93875f57b4..d68b084a17e7 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -495,7 +495,6 @@ void load_ucode_intel_ap(void)
else
iup = &intel_ucode_patch;

-reget:
if (!*iup) {
patch = __load_ucode_intel(&uci);
if (!patch)
@@ -505,13 +504,7 @@ void load_ucode_intel_ap(void)
}

uci.mc = *iup;
-
- if (apply_microcode_early(&uci, true)) {
- /* Mixed-silicon system? Try to refetch the proper patch: */
- *iup = NULL;
-
- goto reget;
- }
+ apply_microcode_early(&uci, true);
}

static struct microcode_intel *find_patch(struct ucode_cpu_info *uci)
--
2.34.1


2022-12-02 20:04:48

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [Patch V1 2/7] x86/microcode/intel: Remove retries on early microcode load

On Tue, Nov 29 2022 at 13:08, Ashok Raj wrote:
> Microcode loading can fail. This happens today when handling mixed
> steppings. But it can also happen for other reasons such as corrupted
> image, Security Version Number (SVN) preventing anti-rollback,
> dependencies on BIOS loaded microcode image for some capabilities.
>
> When the microcode loading fails, the kernel will quietly hang at boot.
> This has been observed by end users (Links below) who had to revert their
> microcode packages in order to boot again.
>
> The hang is due to an infinite retry loop. The retries were in place to
> support systems with mixed steppings. Now that mixed steppings are no
> longer supported, there is only one microcode image at a time. Any retries
> will simply reattempt to apply the same image over and over without making
> progress.
>
> Some possible past bugs that could be due to this bug are below.
>
> There is no direct evidence that these end user issues were caused by this
> retry loop. However, the early boot hangs along with reverting the
> microcode update workaround provide strong circumstantial evidence to
> support the theory that they are linked.
>
> Remove the retry loop and only attempt to apply microcode once.

Very concise and informative changelog. See, you can do it :)

> Link: https://bugs.launchpad.net/ubuntu/+source/intel-microcode/+bug/1911959
> Link: https://forums.linuxmint.com/viewtopic.php?p=1827032#1827032
> Link: https://askubuntu.com/questions/1291486/boot-crash-after-latest-update-of-intel-microcode-nov-11-2020
> Fixes: 06b8534cb728 ("x86/microcode: Rework microcode loading")
> Cc: [email protected]
> Signed-off-by: Ashok Raj <[email protected]>

Nit: Can you order the tags according to the tip documentation next time
please?

Reviewed-by: Thomas Gleixner <[email protected]>

2022-12-03 00:22:00

by Sohil Mehta

[permalink] [raw]
Subject: Re: [Patch V1 2/7] x86/microcode/intel: Remove retries on early microcode load

On 11/29/2022 1:08 PM, Ashok Raj wrote:
> -
> - if (apply_microcode_early(&uci, true)) {
> - /* Mixed-silicon system? Try to refetch the proper patch: */
> - *iup = NULL;
> -
> - goto reget;
> - }
> + apply_microcode_early(&uci, true);

After this change, none of the callers of apply_microcode_early() check
the return code.

In future, do we expect callers to care about the return code? The rest
patches in this series don't seem to suggest so. Also, the expected
error printing happens in the function itself.

Should the return type for apply_microcode_early() be changed to void
(in a follow-up patch)?

Sohil

2022-12-03 02:13:02

by Ashok Raj

[permalink] [raw]
Subject: Re: [Patch V1 2/7] x86/microcode/intel: Remove retries on early microcode load

On Fri, Dec 02, 2022 at 03:53:52PM -0800, Mehta, Sohil wrote:
> On 11/29/2022 1:08 PM, Ashok Raj wrote:
> > -
> > - if (apply_microcode_early(&uci, true)) {
> > - /* Mixed-silicon system? Try to refetch the proper patch: */
> > - *iup = NULL;
> > -
> > - goto reget;
> > - }
> > + apply_microcode_early(&uci, true);
>
> After this change, none of the callers of apply_microcode_early() check the
> return code.
>
> In future, do we expect callers to care about the return code? The rest
> patches in this series don't seem to suggest so. Also, the expected error
> printing happens in the function itself.
>
> Should the return type for apply_microcode_early() be changed to void (in a
> follow-up patch)?

Good idea.. But I think its early, the return code could be used for
something useful. I have some additional cleanup patches that I need to
fixup and we could use this for real.

For e.g. early loading failures are now reported by each vendor, if we can
consolidate this, we could do it more at a core level, but I'm worried it
might be too much change right now, and this can wait its turn.

Cheers,
Ashok

2022-12-03 02:20:46

by Ashok Raj

[permalink] [raw]
Subject: Re: [Patch V1 2/7] x86/microcode/intel: Remove retries on early microcode load

On Fri, Dec 02, 2022 at 08:01:54PM +0100, Thomas Gleixner wrote:
> On Tue, Nov 29 2022 at 13:08, Ashok Raj wrote:
> > Microcode loading can fail. This happens today when handling mixed
> > steppings. But it can also happen for other reasons such as corrupted
> > image, Security Version Number (SVN) preventing anti-rollback,
> > dependencies on BIOS loaded microcode image for some capabilities.
> >
> > When the microcode loading fails, the kernel will quietly hang at boot.
> > This has been observed by end users (Links below) who had to revert their
> > microcode packages in order to boot again.
> >
> > The hang is due to an infinite retry loop. The retries were in place to
> > support systems with mixed steppings. Now that mixed steppings are no
> > longer supported, there is only one microcode image at a time. Any retries
> > will simply reattempt to apply the same image over and over without making
> > progress.
> >
> > Some possible past bugs that could be due to this bug are below.
> >
> > There is no direct evidence that these end user issues were caused by this
> > retry loop. However, the early boot hangs along with reverting the
> > microcode update workaround provide strong circumstantial evidence to
> > support the theory that they are linked.
> >
> > Remove the retry loop and only attempt to apply microcode once.
>
> Very concise and informative changelog. See, you can do it :)
>
> > Link: https://bugs.launchpad.net/ubuntu/+source/intel-microcode/+bug/1911959
> > Link: https://forums.linuxmint.com/viewtopic.php?p=1827032#1827032
> > Link: https://askubuntu.com/questions/1291486/boot-crash-after-latest-update-of-intel-microcode-nov-11-2020
> > Fixes: 06b8534cb728 ("x86/microcode: Rework microcode loading")
> > Cc: [email protected]
> > Signed-off-by: Ashok Raj <[email protected]>
>
> Nit: Can you order the tags according to the tip documentation next time
> please?
>
> Reviewed-by: Thomas Gleixner <[email protected]>

Thanks!. Dave Hansen gave me script to order them correctly :-).. I'll fix
when i repost them.

2022-12-05 13:09:34

by Borislav Petkov

[permalink] [raw]
Subject: Re: [Patch V1 2/7] x86/microcode/intel: Remove retries on early microcode load

On Tue, Nov 29, 2022 at 01:08:27PM -0800, Ashok Raj wrote:
> There is no direct evidence that these end user issues were caused by this
> retry loop. However, the early boot hangs along with reverting the
> microcode update workaround provide strong circumstantial evidence to
> support the theory that they are linked.

A "circumstantial" reason for why something "might" be broken has no
place in a commit message.

If you still wanna chase this and *actually* give me a sane,
comprehensible reason of why this could cause an endless loop with
officially released microcode, then I'm willing to listen.

Otherwise, I'll apply this:

---
From: Ashok Raj <[email protected]>
Date: Tue, 29 Nov 2022 13:08:27 -0800
Subject: x86/microcode/intel: Remove retries on early microcode load

The retries in load_ucode_intel_ap() were in place to support systems
with mixed steppings. Mixed steppings are no longer supported and there is
only one microcode image at a time. Any retries will simply reattempt to
apply the same image over and over without making progress.

Fixes: 06b8534cb728 ("x86/microcode: Rework microcode loading")
Signed-off-by: Ashok Raj <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/kernel/cpu/microcode/intel.c | 9 +--------
1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index 4f93875f57b4..d68b084a17e7 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -495,7 +495,6 @@ void load_ucode_intel_ap(void)
else
iup = &intel_ucode_patch;

-reget:
if (!*iup) {
patch = __load_ucode_intel(&uci);
if (!patch)
@@ -505,13 +504,7 @@ void load_ucode_intel_ap(void)
}

uci.mc = *iup;
-
- if (apply_microcode_early(&uci, true)) {
- /* Mixed-silicon system? Try to refetch the proper patch: */
- *iup = NULL;
-
- goto reget;
- }
+ apply_microcode_early(&uci, true);
}

static struct microcode_intel *find_patch(struct ucode_cpu_info *uci)

--
2.34.1


--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2022-12-05 16:55:36

by Ashok Raj

[permalink] [raw]
Subject: Re: [Patch V1 2/7] x86/microcode/intel: Remove retries on early microcode load

On Mon, Dec 05, 2022 at 01:18:58PM +0100, Borislav Petkov wrote:
> On Tue, Nov 29, 2022 at 01:08:27PM -0800, Ashok Raj wrote:
> > There is no direct evidence that these end user issues were caused by this
> > retry loop. However, the early boot hangs along with reverting the
> > microcode update workaround provide strong circumstantial evidence to
> > support the theory that they are linked.
>
> A "circumstantial" reason for why something "might" be broken has no
> place in a commit message.
>
> If you still wanna chase this and *actually* give me a sane,
> comprehensible reason of why this could cause an endless loop with
> officially released microcode, then I'm willing to listen.

Your changes look fine, thanks for fixing it up.

>
> Otherwise, I'll apply this:
>

Cheers,
Ashok

2022-12-05 21:53:12

by tip-bot2 for Jacob Pan

[permalink] [raw]
Subject: [tip: x86/microcode] x86/microcode/intel: Do not retry microcode reloading on the APs

The following commit has been merged into the x86/microcode branch of tip:

Commit-ID: be1b670f61443aa5d0d01782e9b8ea0ee825d018
Gitweb: https://git.kernel.org/tip/be1b670f61443aa5d0d01782e9b8ea0ee825d018
Author: Ashok Raj <[email protected]>
AuthorDate: Tue, 29 Nov 2022 13:08:27 -08:00
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Mon, 05 Dec 2022 21:22:21 +01:00

x86/microcode/intel: Do not retry microcode reloading on the APs

The retries in load_ucode_intel_ap() were in place to support systems
with mixed steppings. Mixed steppings are no longer supported and there is
only one microcode image at a time. Any retries will simply reattempt to
apply the same image over and over without making progress.

[ bp: Zap the circumstantial reasoning from the commit message. ]

Fixes: 06b8534cb728 ("x86/microcode: Rework microcode loading")
Signed-off-by: Ashok Raj <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/kernel/cpu/microcode/intel.c | 8 +-------
1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index 4f93875..2dba4b5 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -495,7 +495,6 @@ void load_ucode_intel_ap(void)
else
iup = &intel_ucode_patch;

-reget:
if (!*iup) {
patch = __load_ucode_intel(&uci);
if (!patch)
@@ -506,12 +505,7 @@ reget:

uci.mc = *iup;

- if (apply_microcode_early(&uci, true)) {
- /* Mixed-silicon system? Try to refetch the proper patch: */
- *iup = NULL;
-
- goto reget;
- }
+ apply_microcode_early(&uci, true);
}

static struct microcode_intel *find_patch(struct ucode_cpu_info *uci)