2023-08-10 19:09:43

by Thomas Gleixner

Subject: [patch 29/30] x86/microcode: Prepare for minimal revision check

From: Thomas Gleixner <[email protected]>

Applying microcode late can be fatal for the running kernel when the update
changes functionality which is in use already in a non-compatible way,
e.g. by removing a CPUID bit.

There is no way for admins who do not have access to the vendor's deep
technical support to decide whether late loading of such a microcode is
safe or not.

Intel has added a new field to the microcode header which tells the minimal
microcode revision which is required to be active in the CPU in order to be
safe.

Provide infrastructure for handling this in the core code and a command
line switch which allows enforcing it.

If the update is considered safe the kernel is not tainted and the annoying
warning message is not emitted. If it's enforced and the currently loaded
microcode revision is not safe for late loading then the load is aborted.
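
The three outcomes described above (safe, tainted, aborted) can be modelled
in plain userspace C. This is only an illustrative sketch, not the kernel
code: struct ucode_hdr, its min_req_rev field, and classify() are
hypothetical names, and treating a zero field as "no minimal revision
declared" is an assumption made for the example.

```c
#include <stdbool.h>

/* Hypothetical, simplified header: only the fields this sketch needs. */
struct ucode_hdr {
	unsigned int rev;         /* revision of the update itself */
	unsigned int min_req_rev; /* assumed: 0 means "not declared" */
};

enum load_decision {
	LOAD_SAFE,  /* load, no taint, no warning */
	LOAD_TAINT, /* load, but warn and taint the kernel */
	LOAD_ABORT, /* enforcement is on and the check failed */
};

/*
 * cur_rev is the revision currently active in the CPU, force_minrev
 * models the command line switch described in the changelog.
 */
static enum load_decision classify(const struct ucode_hdr *hdr,
				   unsigned int cur_rev, bool force_minrev)
{
	/* Safe only if a minimum is declared and the CPU already meets it. */
	bool safe = hdr->min_req_rev != 0 && cur_rev >= hdr->min_req_rev;

	if (safe)
		return LOAD_SAFE;

	return force_minrev ? LOAD_ABORT : LOAD_TAINT;
}
```

With a declared minimum of 0x2a, a CPU at revision 0x2b would be classified
as safe, while a CPU at 0x29 would be either tainted or refused depending on
the switch.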

Signed-off-by: Thomas Gleixner <[email protected]>

---
Documentation/admin-guide/kernel-parameters.txt | 5 ++++
arch/x86/Kconfig | 23 ++++++++++++++++++-
arch/x86/kernel/cpu/microcode/amd.c | 3 ++
arch/x86/kernel/cpu/microcode/core.c | 29 ++++++++++++++++++------
arch/x86/kernel/cpu/microcode/intel.c | 3 ++
arch/x86/kernel/cpu/microcode/internal.h | 3 ++
6 files changed, 58 insertions(+), 8 deletions(-)
---
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3239,6 +3239,11 @@

mga= [HW,DRM]

+ microcode.force_minrev= [X86]
+ Format: <bool>
+ Enable or disable the microcode minimal revision
+ enforcement for the runtime microcode loader.
+
min_addr=nn[KMG] [KNL,BOOT,IA-64] All physical memory below this
physical address is ignored.

--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1320,7 +1320,28 @@ config MICROCODE_LATE_LOADING
is a tricky business and should be avoided if possible. Just the sequence
of synchronizing all cores and SMT threads is one fragile dance which does
not guarantee that cores might not softlock after the loading. Therefore,
- use this at your own risk. Late loading taints the kernel too.
+ use this at your own risk. Late loading taints the kernel unless the
+ microcode header indicates that it is safe for late loading via the
+ minimal revision check. This minimal revision check can be enforced on
+ the kernel command line with "microcode.force_minrev=Y".
+
+config MICROCODE_LATE_FORCE_MINREV
+ bool "Enforce late microcode loading minimal revision check"
+ default n
+ depends on MICROCODE_LATE_LOADING
+ help
+ To prevent that users load microcode late which modifies already
+ in use features, newer microcodes have a minimum revision field
+ in the microcode header, which tells the kernel which minimum
+ revision must be active in the CPU to safely load that new microcode
+ late into the running system. If disabled the check will not
+ be enforced but the kernel will be tainted when the minimal
+ revision check fails.
+
+ This minimal revision check can also be controlled via the
+ "microcode.force_minrev" parameter on the kernel command line.
+
+ If unsure say Y.

config X86_MSR
tristate "/dev/cpu/*/msr - Model-specific register support"
--- a/arch/x86/kernel/cpu/microcode/amd.c
+++ b/arch/x86/kernel/cpu/microcode/amd.c
@@ -880,6 +880,9 @@ static enum ucode_state request_microcod
enum ucode_state ret = UCODE_NFOUND;
const struct firmware *fw;

+ if (force_minrev)
+ return UCODE_NFOUND;
+
if (c->x86 >= 0x15)
snprintf(fw_name, sizeof(fw_name), "amd-ucode/microcode_amd_fam%.2xh.bin", c->x86);

--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -46,6 +46,9 @@
static struct microcode_ops *microcode_ops;
static bool dis_ucode_ldr = true;

+bool force_minrev = IS_ENABLED(CONFIG_MICROCODE_LATE_FORCE_MINREV);
+module_param(force_minrev, bool, S_IRUSR | S_IWUSR);
+
bool initrd_gone;

/*
@@ -601,15 +604,17 @@ static int ucode_load_cpus_stopped(void
return 0;
}

-static int ucode_load_late_stop_cpus(void)
+static int ucode_load_late_stop_cpus(bool is_safe)
{
unsigned int cpu, updated = 0, failed = 0, timedout = 0, siblings = 0;
unsigned int nr_offl, offline = 0;
int old_rev = boot_cpu_data.microcode;
struct cpuinfo_x86 prev_info;

- pr_err("Attempting late microcode loading - it is dangerous and taints the kernel.\n");
- pr_err("You should switch to early loading, if possible.\n");
+ if (!is_safe) {
+ pr_err("Late microcode loading without minimal revision check.\n");
+ pr_err("You should switch to early loading, if possible.\n");
+ }

atomic_set(&late_cpus_in, num_online_cpus());
atomic_set(&offline_in_nmi, 0);
@@ -659,7 +664,9 @@ static int ucode_load_late_stop_cpus(voi
return -EIO;
}

- add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_STILL_OK);
+ if (!is_safe || failed || timedout)
+ add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_STILL_OK);
+
pr_info("Microcode load: updated on %u primary CPUs with %u siblings\n", updated, siblings);
if (failed || timedout) {
pr_err("Microcode load incomplete. %u CPUs timed out or failed\n",
@@ -753,9 +760,17 @@ static int ucode_load_late_locked(void)
return -EBUSY;

ret = microcode_ops->request_microcode_fw(0, &microcode_pdev->dev);
- if (ret != UCODE_NEW)
- return ret == UCODE_NFOUND ? -ENOENT : -EBADFD;
- return ucode_load_late_stop_cpus();
+
+ switch (ret) {
+ case UCODE_NEW:
+ case UCODE_NEW_SAFE:
+ break;
+ case UCODE_NFOUND:
+ return -ENOENT;
+ default:
+ return -EBADFD;
+ }
+ return ucode_load_late_stop_cpus(ret == UCODE_NEW_SAFE);
}

static ssize_t reload_store(struct device *dev,
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -549,6 +549,9 @@ static enum ucode_state read_ucode_intel
int cur_rev = uci->cpu_sig.rev;
u8 *new_mc = NULL, *mc = NULL;

+ if (force_minrev)
+ return UCODE_NFOUND;
+
while (iov_iter_count(iter)) {
struct microcode_header_intel mc_header;
unsigned int mc_size, data_size;
--- a/arch/x86/kernel/cpu/microcode/internal.h
+++ b/arch/x86/kernel/cpu/microcode/internal.h
@@ -13,6 +13,7 @@ struct device;
enum ucode_state {
UCODE_OK = 0,
UCODE_NEW,
+ UCODE_NEW_SAFE,
UCODE_UPDATED,
UCODE_NFOUND,
UCODE_ERROR,
@@ -36,6 +37,8 @@ struct microcode_ops {
use_nmi : 1;
};

+extern bool force_minrev;
+
extern struct ucode_cpu_info ucode_cpu_info[];
struct cpio_data find_microcode_in_initrd(const char *path, bool use_pa);
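
For readers who want to experiment with the control flow outside the kernel,
the core.c changes in this patch can be approximated in userspace C as below.
This is a hedged sketch under stated assumptions: the enum mirrors
internal.h as modified by the patch, but struct late_load_plan,
plan_late_load() and must_taint() are hypothetical helpers invented for
illustration, and the EBADFD fallback value is only there so the sketch
builds on systems whose errno.h lacks it.

```c
#include <errno.h>
#include <stdbool.h>

#ifndef EBADFD
#define EBADFD 77 /* fallback for non-Linux libcs, illustration only */
#endif

/* Mirrors enum ucode_state from internal.h after this patch. */
enum ucode_state {
	UCODE_OK = 0,
	UCODE_NEW,
	UCODE_NEW_SAFE,
	UCODE_UPDATED,
	UCODE_NFOUND,
	UCODE_ERROR,
};

/* Hypothetical container for the outcome of the switch statement. */
struct late_load_plan {
	int err;      /* 0 to proceed, negative errno to bail out */
	bool is_safe; /* true only when the vendor code said UCODE_NEW_SAFE */
};

/* Models the switch added to ucode_load_late_locked(). */
static struct late_load_plan plan_late_load(enum ucode_state ret)
{
	struct late_load_plan plan = { .err = 0, .is_safe = false };

	switch (ret) {
	case UCODE_NEW:
	case UCODE_NEW_SAFE:
		plan.is_safe = (ret == UCODE_NEW_SAFE);
		break;
	case UCODE_NFOUND:
		plan.err = -ENOENT;
		break;
	default:
		plan.err = -EBADFD;
		break;
	}
	return plan;
}

/* Models the taint condition added to ucode_load_late_stop_cpus(). */
static bool must_taint(bool is_safe, unsigned int failed,
		       unsigned int timedout)
{
	return !is_safe || failed || timedout;
}
```

Note that with force_minrev set, both vendor loaders in this patch return
UCODE_NFOUND up front, so the model above would yield -ENOENT in that case.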




2023-08-10 22:04:38

by Peter Zijlstra

Subject: Re: [patch 29/30] x86/microcode: Prepare for minimal revision check

On Thu, Aug 10, 2023 at 08:38:09PM +0200, Thomas Gleixner wrote:
> From: Thomas Gleixner <[email protected]>
>
> Applying microcode late can be fatal for the running kernel when the update
> changes functionality which is in use already in a non-compatible way,
> e.g. by removing a CPUID bit.

This includes all compatibility constraints? Because IIRC we've also had
trouble because a CPUID bit got set. Kernel didn't know about, didn't
manage it, but userspace saw the bit and happily tried to use it.

Ofc I can't remember the exact case :/ but anything that changes the
xsave size/state would obviously cause trouble.


2023-08-11 10:00:39

by Thomas Gleixner

Subject: Re: [patch 29/30] x86/microcode: Prepare for minimal revision check

On Thu, Aug 10 2023 at 22:54, Peter Zijlstra wrote:
> On Thu, Aug 10, 2023 at 08:38:09PM +0200, Thomas Gleixner wrote:
>> From: Thomas Gleixner <[email protected]>
>>
>> Applying microcode late can be fatal for the running kernel when the update
>> changes functionality which is in use already in a non-compatible way,
>> e.g. by removing a CPUID bit.
>
> This includes all compatibility constraints? Because IIRC we've also had
> trouble because a CPUID bit got set. Kernel didn't know about, didn't
> manage it, but userspace saw the bit and happily tried to use it.

We don't know. If the microcoders screw that minrev constraint up, then
we are up the creek w/o a paddle as before.

> Ofc I can't remember the exact case :/ but anything that changes the
> xsave size/state would obviously cause trouble.

Details. :)

2023-08-11 15:41:48

by Arjan van de Ven

Subject: Re: [patch 29/30] x86/microcode: Prepare for minimal revision check

On 8/10/2023 1:54 PM, Peter Zijlstra wrote:
> On Thu, Aug 10, 2023 at 08:38:09PM +0200, Thomas Gleixner wrote:
>> From: Thomas Gleixner <[email protected]>
>>
>> Applying microcode late can be fatal for the running kernel when the update
>> changes functionality which is in use already in a non-compatible way,
>> e.g. by removing a CPUID bit.
>
> This includes all compatibility constraints? Because IIRC we've also had
> trouble because a CPUID bit got set. Kernel didn't know about, didn't

do you have the details on that -- I don't know of any of those outside
of enumerating the sidechannel status cpuid bits.

> manage it, but userspace saw the bit and happily tried to use it.

yes it contains all the compatibility constraints the OS folks (e.g. the intel kernel folks)
could convey to the microcode team. If you think the constraints are
not complete please help us improve them

>
> Ofc I can't remember the exact case :/ but anything that changes the
> xsave size/state would obviously cause trouble.

of course that can't happen at runtime at all correctly.
>