Commit fa94d0c6e0f3 ("x86/MCE: Save microcode revision in machine check
records") extended MCE entries to report the microcode revision taken
from boot_cpu_data. Unfortunately, boot_cpu_data isn't updated on late
microcode loading, thus making MCE entries slightly incorrect.
Use cpu_info instead, which is updated on late microcode loading.
Fixes: fa94d0c6e0f3 ("x86/MCE: Save microcode revision in machine check records")
Signed-off-by: Filippo Sironi <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
arch/x86/kernel/cpu/mcheck/mce.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 42cf2880d0ed..4be323f9b390 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -134,7 +134,7 @@ void mce_setup(struct mce *m)
if (this_cpu_has(X86_FEATURE_INTEL_PPIN))
rdmsrl(MSR_PPIN, m->ppin);
- m->microcode = boot_cpu_data.microcode;
+ m->microcode = cpu_data(m->extcpu).microcode;
}
DEFINE_PER_CPU(struct mce, injectm);
--
2.7.4
On Fri, Jun 01, 2018 at 01:30:26PM +0200, Filippo Sironi wrote:
> Commit fa94d0c6e0f3 ("x86/MCE: Save microcode revision in machine check
> records") extended MCE entries to report the microcode revision taken
> from boot_cpu_data. Unfortunately, boot_cpu_data isn't updated on late
> microcode loading,
Actually, I'd prefer if we fixed *that* instead by adding:
/* Update boot_cpu_data's revision too, if we're on the BSP: */
if (c->cpu_index == boot_cpu_data.cpu_index)
boot_cpu_data.microcode = <new rev>;
to the end of ->apply_microcode() functions so that boot_cpu_data has
the correct revision too, in case something else queries it.
Thx.
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
--
> On 1. Jun 2018, at 14:19, Borislav Petkov <[email protected]> wrote:
>
> On Fri, Jun 01, 2018 at 01:30:26PM +0200, Filippo Sironi wrote:
>> Commit fa94d0c6e0f3 ("x86/MCE: Save microcode revision in machine check
>> records") extended MCE entries to report the microcode revision taken
>> from boot_cpu_data. Unfortunately, boot_cpu_data isn't updated on late
>> microcode loading,
>
> Actually, I'd prefer if we fixed *that* instead by adding:
>
> /* Update boot_cpu_data's revision too, if we're on the BSP: */
> if (c->cpu_index == boot_cpu_data.cpu_index)
> boot_cpu_data.microcode = <new rev>;
>
> to the end of ->apply_microcode() functions so that boot_cpu_data has
> the correct revision too, in case something else queries it.
>
> Thx.
>
> --
> Regards/Gruss,
> Boris.
>
> ECO tip #101: Trim your mails when you reply.
> --
I've that patch in my tree already, I can post it.
I'm still curious on why you'd prefer to use boot_cpu_data for all
CPUs instead of using cpu_data(m->extcpu) though.
Regards,
Filippo
Amazon Development Center Germany GmbH
Berlin - Dresden - Aachen
main office: Krausenstr. 38, 10117 Berlin
Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger
Ust-ID: DE289237879
Eingetragen am Amtsgericht Charlottenburg HRB 149173 B
On Fri, Jun 01, 2018 at 12:32:02PM +0000, Sironi, Filippo wrote:
> I've that patch in my tree already, I can post it.
> I'm still curious on why you'd prefer to use boot_cpu_data for all
> CPUs instead of using cpu_data(m->extcpu) though.
Well, for starters, boot_cpu_data having stale revision is a bug in
itself already. So fixing that is already the Right Thing(tm) to do.
Then, one fine day when the whole cpu_data mess is cleaned up,
boot_cpu_data will morph into the global descriptor representing CPU
features and keeping it uptodate now will facilitate the replacement
without introducing bugs.
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
--
On systems where a runtime microcode update has occurred the microcode
version output in a MCE log record is wrong because
boot_cpu_data.microcode is not updated during runtime.
Update boot_cpu_data.microcode when the BSP's microcode is updated.
Fixes: fa94d0c6e0f3 ("x86/MCE: Save microcode revision in machine check records")
Suggested-by: Borislav Petkov <[email protected]>
Signed-off-by: Prarit Bhargava <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
---
arch/x86/kernel/cpu/microcode/amd.c | 4 ++++
arch/x86/kernel/cpu/microcode/intel.c | 4 ++++
2 files changed, 8 insertions(+)
diff --git a/arch/x86/kernel/cpu/microcode/amd.c b/arch/x86/kernel/cpu/microcode/amd.c
index 0624957aa068..7f5b32535ac7 100644
--- a/arch/x86/kernel/cpu/microcode/amd.c
+++ b/arch/x86/kernel/cpu/microcode/amd.c
@@ -537,6 +537,10 @@ static enum ucode_state apply_microcode_amd(int cpu)
uci->cpu_sig.rev = mc_amd->hdr.patch_id;
c->microcode = mc_amd->hdr.patch_id;
+ /* Update boot_cpu_data's revision too, if we're on the BSP: */
+ if (c->cpu_index == boot_cpu_data.cpu_index)
+ boot_cpu_data.microcode = rev;
+
return UCODE_UPDATED;
}
diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index 97ccf4c3b45b..256d336cbc04 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -851,6 +851,10 @@ static enum ucode_state apply_microcode_intel(int cpu)
uci->cpu_sig.rev = rev;
c->microcode = rev;
+ /* Update boot_cpu_data's revision too, if we're on the BSP: */
+ if (c->cpu_index == boot_cpu_data.cpu_index)
+ boot_cpu_data.microcode = rev;
+
return UCODE_UPDATED;
}
--
2.17.0
On 07/30/2018 01:49 PM, Prarit Bhargava wrote:
> On systems where a runtime microcode update has occurred the microcode
> version output in a MCE log record is wrong because
> boot_cpu_data.microcode is not updated during runtime.
>
> Update boot_cpu_data.microcode when the BSP's microcode is updated.
>
> Fixes: fa94d0c6e0f3 ("x86/MCE: Save microcode revision in machine check records")
> Suggested-by: Borislav Petkov <[email protected]>
> Signed-off-by: Prarit Bhargava <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> ---
> arch/x86/kernel/cpu/microcode/amd.c | 4 ++++
> arch/x86/kernel/cpu/microcode/intel.c | 4 ++++
> 2 files changed, 8 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/microcode/amd.c b/arch/x86/kernel/cpu/microcode/amd.c
> index 0624957aa068..7f5b32535ac7 100644
> --- a/arch/x86/kernel/cpu/microcode/amd.c
> +++ b/arch/x86/kernel/cpu/microcode/amd.c
> @@ -537,6 +537,10 @@ static enum ucode_state apply_microcode_amd(int cpu)
> uci->cpu_sig.rev = mc_amd->hdr.patch_id;
> c->microcode = mc_amd->hdr.patch_id;
>
> + /* Update boot_cpu_data's revision too, if we're on the BSP: */
> + if (c->cpu_index == boot_cpu_data.cpu_index)
> + boot_cpu_data.microcode = rev;
> +
Borislav, hold off on this. I want to double check something with AMD.
I think this has to be
boot_cpu_data.microcode = mc_amd->hdr.patch_id;
I'm going to grab some old/new microcode and test. I do know that the Intel
update below works.
P.
> return UCODE_UPDATED;
> }
>
> diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
> index 97ccf4c3b45b..256d336cbc04 100644
> --- a/arch/x86/kernel/cpu/microcode/intel.c
> +++ b/arch/x86/kernel/cpu/microcode/intel.c
> @@ -851,6 +851,10 @@ static enum ucode_state apply_microcode_intel(int cpu)
> uci->cpu_sig.rev = rev;
> c->microcode = rev;
>
> + /* Update boot_cpu_data's revision too, if we're on the BSP: */
> + if (c->cpu_index == boot_cpu_data.cpu_index)
> + boot_cpu_data.microcode = rev;
> +
> return UCODE_UPDATED;
> }
>
>
On Mon, Jul 30, 2018 at 01:53:50PM -0400, Prarit Bhargava wrote:
> I think this has to be
>
> boot_cpu_data.microcode = mc_amd->hdr.patch_id;
Yes, it does.
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
--
I tested this on AMD Ryzen & Intel Broadwell system and dumped the
boot_cpu_data before and after a microcode update. On the Intel
system I also did a fatal MCE using mce-inject to confirm the output
from the mce handling code.
P.
---8<---
On systems where a runtime microcode update has occurred the microcode
version output in a MCE log record is wrong because
boot_cpu_data.microcode is not updated during runtime.
Update boot_cpu_data.microcode when the BSP's microcode is updated.
Fixes: fa94d0c6e0f3 ("x86/MCE: Save microcode revision in machine check records")
Suggested-by: Borislav Petkov <[email protected]>
Signed-off-by: Prarit Bhargava <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
---
Changes in v2: Use mc_amd->hdr.patch_id on AMD
arch/x86/kernel/cpu/microcode/amd.c | 4 ++++
arch/x86/kernel/cpu/microcode/intel.c | 4 ++++
2 files changed, 8 insertions(+)
diff --git a/arch/x86/kernel/cpu/microcode/amd.c b/arch/x86/kernel/cpu/microcode/amd.c
index 0624957aa068..63b072377ba4 100644
--- a/arch/x86/kernel/cpu/microcode/amd.c
+++ b/arch/x86/kernel/cpu/microcode/amd.c
@@ -537,6 +537,10 @@ static enum ucode_state apply_microcode_amd(int cpu)
uci->cpu_sig.rev = mc_amd->hdr.patch_id;
c->microcode = mc_amd->hdr.patch_id;
+ /* Update boot_cpu_data's revision too, if we're on the BSP: */
+ if (c->cpu_index == boot_cpu_data.cpu_index)
+ boot_cpu_data.microcode = mc_amd->hdr.patch_id;
+
return UCODE_UPDATED;
}
diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index 97ccf4c3b45b..256d336cbc04 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -851,6 +851,10 @@ static enum ucode_state apply_microcode_intel(int cpu)
uci->cpu_sig.rev = rev;
c->microcode = rev;
+ /* Update boot_cpu_data's revision too, if we're on the BSP: */
+ if (c->cpu_index == boot_cpu_data.cpu_index)
+ boot_cpu_data.microcode = rev;
+
return UCODE_UPDATED;
}
--
2.17.0
> On 31. Jul 2018, at 13:27, Prarit Bhargava <[email protected]> wrote:
>
> I tested this on AMD Ryzen & Intel Broadwell system and dumped the
> boot_cpu_data before and after a microcode update. On the Intel
> system I also did a fatal MCE using mce-inject to confirm the output
> from the mce handling code.
>
> P.
>
> ---8<---
>
> On systems where a runtime microcode update has occurred the microcode
> version output in a MCE log record is wrong because
> boot_cpu_data.microcode is not updated during runtime.
>
> Update boot_cpu_data.microcode when the BSP's microcode is updated.
>
> Fixes: fa94d0c6e0f3 ("x86/MCE: Save microcode revision in machine check records")
> Suggested-by: Borislav Petkov <[email protected]>
> Signed-off-by: Prarit Bhargava <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> ---
> Changes in v2: Use mc_amd->hdr.patch_id on AMD
>
> arch/x86/kernel/cpu/microcode/amd.c | 4 ++++
> arch/x86/kernel/cpu/microcode/intel.c | 4 ++++
> 2 files changed, 8 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/microcode/amd.c b/arch/x86/kernel/cpu/microcode/amd.c
> index 0624957aa068..63b072377ba4 100644
> --- a/arch/x86/kernel/cpu/microcode/amd.c
> +++ b/arch/x86/kernel/cpu/microcode/amd.c
> @@ -537,6 +537,10 @@ static enum ucode_state apply_microcode_amd(int cpu)
> uci->cpu_sig.rev = mc_amd->hdr.patch_id;
> c->microcode = mc_amd->hdr.patch_id;
>
> + /* Update boot_cpu_data's revision too, if we're on the BSP: */
> + if (c->cpu_index == boot_cpu_data.cpu_index)
> + boot_cpu_data.microcode = mc_amd->hdr.patch_id;
> +
> return UCODE_UPDATED;
> }
>
> diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
> index 97ccf4c3b45b..256d336cbc04 100644
> --- a/arch/x86/kernel/cpu/microcode/intel.c
> +++ b/arch/x86/kernel/cpu/microcode/intel.c
> @@ -851,6 +851,10 @@ static enum ucode_state apply_microcode_intel(int cpu)
> uci->cpu_sig.rev = rev;
> c->microcode = rev;
>
> + /* Update boot_cpu_data's revision too, if we're on the BSP: */
> + if (c->cpu_index == boot_cpu_data.cpu_index)
> + boot_cpu_data.microcode = rev;
> +
> return UCODE_UPDATED;
> }
There may be a chance of skipping this code, I think.
If the microcode is loaded on the hyperthread sibling of the boot cpu
before being loaded on the boot cpu, the boot cpu will exit earlier
from apply_microcode_intel() - in if (rev >= mc->hdr.rev) { ... }.
(This seems to be possible in apply_microcode_amd() as well.)
In my tree with the aforementioned change - Intel only - I also had
the following patch:
diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index 97ccf4c3b45b..4bc869e829eb 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -797,6 +797,7 @@ static enum ucode_state apply_microcode_intel(int cpu)
struct microcode_intel *mc;
static int prev_rev;
u32 rev;
+ enum ucode_state ret;
/* We should bind the task to the CPU */
if (WARN_ON(raw_smp_processor_id() != cpu))
@@ -817,9 +818,8 @@ static enum ucode_state apply_microcode_intel(int cpu)
*/
rev = intel_get_microcode_revision();
if (rev >= mc->hdr.rev) {
- uci->cpu_sig.rev = rev;
- c->microcode = rev;
- return UCODE_OK;
+ ret = UCODE_OK;
+ goto out;
}
/*
@@ -848,10 +848,12 @@ static enum ucode_state apply_microcode_intel(int cpu)
prev_rev = rev;
}
+ ret = UCODE_UPDATED;
+out:
uci->cpu_sig.rev = rev;
c->microcode = rev;
- return UCODE_UPDATED;
+ return ret;
}
static enum ucode_state generic_load_microcode(int cpu, void *data, size_t size,
which prevents the issue.
> --
> 2.17.0
>
>
Amazon Development Center Germany GmbH
Berlin - Dresden - Aachen
main office: Krausenstr. 38, 10117 Berlin
Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger
Ust-ID: DE289237879
Eingetragen am Amtsgericht Charlottenburg HRB 149173 B
On Tue, Jul 31, 2018 at 11:46:09AM +0000, Sironi, Filippo wrote:
> There may be a chance of skipping this code, I think.
>
> If the microcode is loaded on the hyperthread sibling of the boot cpu
> before being loaded on the boot cpu, the boot cpu will exit earlier
> from apply_microcode_intel() - in if (rev >= mc->hdr.rev) { ... }.
>
> (This seems to be possible in apply_microcode_amd() as well.)
>
> In my tree with the aforementioned change - Intel only - I also had
> the following patch:
Yap, good catch.
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
--
> On 31. Jul 2018, at 15:00, Borislav Petkov <[email protected]> wrote:
>
> On Tue, Jul 31, 2018 at 11:46:09AM +0000, Sironi, Filippo wrote:
>> There may be a chance of skipping this code, I think.
>>
>> If the microcode is loaded on the hyperthread sibling of the boot cpu
>> before being loaded on the boot cpu, the boot cpu will exit earlier
>> from apply_microcode_intel() - in if (rev >= mc->hdr.rev) { ... }.
>>
>> (This seems to be possible in apply_microcode_amd() as well.)
>>
>> In my tree with the aforementioned change - Intel only - I also had
>> the following patch:
>
> Yap, good catch.
>
> --
> Regards/Gruss,
> Boris.
>
> ECO tip #101: Trim your mails when you reply.
> --
>
Boris, Prarit,
I've sent the patch the I had in my tree and extended it to cover
apply_microcode_amd(). I don't have an AMD host available for testing,
can one of you give that a spin?
Thanks,
Filippo
Amazon Development Center Germany GmbH
Berlin - Dresden - Aachen
main office: Krausenstr. 38, 10117 Berlin
Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger
Ust-ID: DE289237879
Eingetragen am Amtsgericht Charlottenburg HRB 149173 B
Hi.
> I tested this on AMD Ryzen & Intel Broadwell system and dumped the
> boot_cpu_data before and after a microcode update. On the Intel
> system I also did a fatal MCE using mce-inject to confirm the output
> from the mce handling code.
>
> P.
>
> ---8<---
>
> On systems where a runtime microcode update has occurred the microcode
> version output in a MCE log record is wrong because
> boot_cpu_data.microcode is not updated during runtime.
>
> Update boot_cpu_data.microcode when the BSP's microcode is updated.
>
> Fixes: fa94d0c6e0f3 ("x86/MCE: Save microcode revision in machine check
> records")
> Suggested-by: Borislav Petkov <[email protected]>
> Signed-off-by: Prarit Bhargava <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> ---
> Changes in v2: Use mc_amd->hdr.patch_id on AMD
>
> arch/x86/kernel/cpu/microcode/amd.c | 4 ++++
> arch/x86/kernel/cpu/microcode/intel.c | 4 ++++
> 2 files changed, 8 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/microcode/amd.c
> b/arch/x86/kernel/cpu/microcode/amd.c
> index 0624957aa068..63b072377ba4 100644
> --- a/arch/x86/kernel/cpu/microcode/amd.c
> +++ b/arch/x86/kernel/cpu/microcode/amd.c
> @@ -537,6 +537,10 @@ static enum ucode_state apply_microcode_amd(int
> cpu)
> uci->cpu_sig.rev = mc_amd->hdr.patch_id;
> c->microcode = mc_amd->hdr.patch_id;
>
> + /* Update boot_cpu_data's revision too, if we're on the BSP: */
> + if (c->cpu_index == boot_cpu_data.cpu_index)
> + boot_cpu_data.microcode = mc_amd->hdr.patch_id;
> +
> return UCODE_UPDATED;
> }
>
> diff --git a/arch/x86/kernel/cpu/microcode/intel.c
> b/arch/x86/kernel/cpu/microcode/intel.c
> index 97ccf4c3b45b..256d336cbc04 100644
> --- a/arch/x86/kernel/cpu/microcode/intel.c
> +++ b/arch/x86/kernel/cpu/microcode/intel.c
> @@ -851,6 +851,10 @@ static enum ucode_state apply_microcode_intel(int
> cpu)
> uci->cpu_sig.rev = rev;
> c->microcode = rev;
>
> + /* Update boot_cpu_data's revision too, if we're on the BSP: */
> + if (c->cpu_index == boot_cpu_data.cpu_index)
> + boot_cpu_data.microcode = rev;
> +
> return UCODE_UPDATED;
> }
>
> --
> 2.17.0
After this patch, do we preserve an original microcode version
somewhere? If no, why? Sometimes it is useful while debugging another
crash because of faulty microcode.
Thanks.
--
Oleksandr Natalenko (post-factum)
On 08/01/2018 02:38 AM, Oleksandr Natalenko wrote:
> Hi.
>
>> I tested this on AMD Ryzen & Intel Broadwell system and dumped the
>> boot_cpu_data before and after a microcode update. On the Intel
>> system I also did a fatal MCE using mce-inject to confirm the output
>> from the mce handling code.
>>
>> P.
>>
>> ---8<---
>>
>> On systems where a runtime microcode update has occurred the microcode
>> version output in a MCE log record is wrong because
>> boot_cpu_data.microcode is not updated during runtime.
>>
>> Update boot_cpu_data.microcode when the BSP's microcode is updated.
>>
>> Fixes: fa94d0c6e0f3 ("x86/MCE: Save microcode revision in machine check records")
>> Suggested-by: Borislav Petkov <[email protected]>
>> Signed-off-by: Prarit Bhargava <[email protected]>
>> Cc: [email protected]
>> Cc: [email protected]
>> Cc: [email protected]
>> ---
>> Changes in v2: Use mc_amd->hdr.patch_id on AMD
>>
>> arch/x86/kernel/cpu/microcode/amd.c | 4 ++++
>> arch/x86/kernel/cpu/microcode/intel.c | 4 ++++
>> 2 files changed, 8 insertions(+)
>>
>> diff --git a/arch/x86/kernel/cpu/microcode/amd.c
>> b/arch/x86/kernel/cpu/microcode/amd.c
>> index 0624957aa068..63b072377ba4 100644
>> --- a/arch/x86/kernel/cpu/microcode/amd.c
>> +++ b/arch/x86/kernel/cpu/microcode/amd.c
>> @@ -537,6 +537,10 @@ static enum ucode_state apply_microcode_amd(int cpu)
>> uci->cpu_sig.rev = mc_amd->hdr.patch_id;
>> c->microcode = mc_amd->hdr.patch_id;
>>
>> + /* Update boot_cpu_data's revision too, if we're on the BSP: */
>> + if (c->cpu_index == boot_cpu_data.cpu_index)
>> + boot_cpu_data.microcode = mc_amd->hdr.patch_id;
>> +
>> return UCODE_UPDATED;
>> }
>>
>> diff --git a/arch/x86/kernel/cpu/microcode/intel.c
>> b/arch/x86/kernel/cpu/microcode/intel.c
>> index 97ccf4c3b45b..256d336cbc04 100644
>> --- a/arch/x86/kernel/cpu/microcode/intel.c
>> +++ b/arch/x86/kernel/cpu/microcode/intel.c
>> @@ -851,6 +851,10 @@ static enum ucode_state apply_microcode_intel(int cpu)
>> uci->cpu_sig.rev = rev;
>> c->microcode = rev;
>>
>> + /* Update boot_cpu_data's revision too, if we're on the BSP: */
>> + if (c->cpu_index == boot_cpu_data.cpu_index)
>> + boot_cpu_data.microcode = rev;
>> +
>> return UCODE_UPDATED;
>> }
>>
>> --
>> 2.17.0
>
> After this patch, do we preserve an original microcode version somewhere? If no,
> why? Sometimes it is useful while debugging another crash because of faulty
> microcode.
>
Interesting, and thanks for bringing this up. Oleksandr, under what
circumstances would you want to know what the old version was. AFAICS it is no
longer running and should not have an impact on the system?
P.
> Thanks.
>
Hi.
On 01.08.2018 13:38, Prarit Bhargava wrote:
>> After this patch, do we preserve an original microcode version
>> somewhere? If no,
>> why? Sometimes it is useful while debugging another crash because of
>> faulty
>> microcode.
>>
> Interesting, and thanks for bringing this up. Oleksandr, under what
> circumstances would you want to know what the old version was. AFAICS
> it is no
> longer running and should not have an impact on the system?
Once the kernel log does not contain a printout regarding updated
microcode anymore (because the log buffer is limited in size and can be
overwritten) and once you have a vmcore, it is handy to use
boot_cpu_data to compare the microcode version with the per-CPU value to
find out that is was updated at all. Or, maybe, that can be inspected in
another way now?
--
Oleksandr Natalenko (post-factum)
(drop stable@ from CC)
On Wed, Aug 01, 2018 at 04:16:42PM +0200, Oleksandr Natalenko wrote:
> Once the kernel log does not contain a printout regarding updated microcode
> anymore (because the log buffer is limited in size and can be overwritten)
There's no reliable way to get the old microcode revision which was
overwritten during the upgrade. If dmesg gets overwritten you lose, like
in all the other gazillion cases where you lose information due to that.
Sorry.
This becomes an even bigger problem if you have a long-running system
which upgrades microcode a couple of times before being rebooted again.
In that case, your only log which contains the microcode revisions being
upgraded is dmesg.
> once you have a vmcore, it is handy to use boot_cpu_data to compare
> the microcode version with the per-CPU value to find out that is was
> updated at all.
boot_cpu_data.microcode was never meant to contain the *previous*
microcode revision AFAICT - it just didn't get updated, which we're
fixing now.
> Or, maybe, that can be inspected in another way now?
Dunno, does kdump collect /proc/cpuinfo? If so:
$ grep microcode /proc/cpuinfo
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
--
(dropping stable@)
On Tue, Jul 31, 2018 at 07:27:39AM -0400, Prarit Bhargava wrote:
> I tested this on AMD Ryzen & Intel Broadwell system and dumped the
> boot_cpu_data before and after a microcode update. On the Intel
> system I also did a fatal MCE using mce-inject to confirm the output
> from the mce handling code.
Ok, sending the two patches I've massaged into submission as a reply to
this message - please run them on your boxes. You can also inject MCEs
on the AMD box to verify - simply enable CONFIG_X86_MCE_INJECT, go to
debugfs and do, for example:
cd /sys/kernel/debug/mce-inject/
echo 10 > /sys/devices/system/machinecheck/machinecheck0/check_interval
echo 0x9c7d410092080813 > status; echo 0x000000006d3d483b > addr; echo 2 > cpu; echo 4 > bank; echo hw > flags
echo 0xbc00200000010015 > status; echo 0x00000000942e4d98 > addr; echo 6 > cpu; echo 2 > bank; echo hw > flags
echo 0xfe000010000b0c0f > status; echo 0xe1101add1e550012 > addr; echo 0xcafebeef > misc; echo 2 > cpu; echo hw > flags; echo 4 > bank
Thanks!
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
--
From: Prarit Bhargava <[email protected]>
The MCE code uses boot_cpu_data.microcode to prepare the MCE record.
However, on systems where late microcode update has occurred, the
microcode revision output in a MCE log record is wrong because
boot_cpu_data.microcode is not updated when the microcode gets updated.
However, the microcode revision saved in boot_cpu_data's microcode
member should be kept up-to-date, regardless, for consistency.
Make it so.
Fixes: fa94d0c6e0f3 ("x86/MCE: Save microcode revision in machine check records")
Signed-off-by: Prarit Bhargava <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: x86-ml <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
[ Make commit message more precise. ]
Signed-off-by: Borislav Petkov <[email protected]>
---
arch/x86/kernel/cpu/microcode/amd.c | 4 ++++
arch/x86/kernel/cpu/microcode/intel.c | 4 ++++
2 files changed, 8 insertions(+)
diff --git a/arch/x86/kernel/cpu/microcode/amd.c b/arch/x86/kernel/cpu/microcode/amd.c
index 0624957aa068..63b072377ba4 100644
--- a/arch/x86/kernel/cpu/microcode/amd.c
+++ b/arch/x86/kernel/cpu/microcode/amd.c
@@ -537,6 +537,10 @@ static enum ucode_state apply_microcode_amd(int cpu)
uci->cpu_sig.rev = mc_amd->hdr.patch_id;
c->microcode = mc_amd->hdr.patch_id;
+ /* Update boot_cpu_data's revision too, if we're on the BSP: */
+ if (c->cpu_index == boot_cpu_data.cpu_index)
+ boot_cpu_data.microcode = mc_amd->hdr.patch_id;
+
return UCODE_UPDATED;
}
diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index 97ccf4c3b45b..256d336cbc04 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -851,6 +851,10 @@ static enum ucode_state apply_microcode_intel(int cpu)
uci->cpu_sig.rev = rev;
c->microcode = rev;
+ /* Update boot_cpu_data's revision too, if we're on the BSP: */
+ if (c->cpu_index == boot_cpu_data.cpu_index)
+ boot_cpu_data.microcode = rev;
+
return UCODE_UPDATED;
}
--
2.18.0.321.gffc6fa0e3962
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
--
From: Filippo Sironi <[email protected]>
Handle the case where microcode gets loaded on the BSP's hyperthread
sibling first and the boot_cpu_data's microcode revision doesn't get
updated because of early exit due to the siblings sharing a microcode
engine.
For that, simply write the updated revision on all CPUs unconditionally.
Signed-off-by: Filippo Sironi <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: x86-ml <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
[ Massage everything a bit. ]
Signed-off-by: Borislav Petkov <[email protected]>
---
arch/x86/kernel/cpu/microcode/amd.c | 22 +++++++++++++---------
arch/x86/kernel/cpu/microcode/intel.c | 13 ++++++++-----
2 files changed, 21 insertions(+), 14 deletions(-)
diff --git a/arch/x86/kernel/cpu/microcode/amd.c b/arch/x86/kernel/cpu/microcode/amd.c
index 63b072377ba4..07b5fc00b188 100644
--- a/arch/x86/kernel/cpu/microcode/amd.c
+++ b/arch/x86/kernel/cpu/microcode/amd.c
@@ -504,6 +504,7 @@ static enum ucode_state apply_microcode_amd(int cpu)
struct microcode_amd *mc_amd;
struct ucode_cpu_info *uci;
struct ucode_patch *p;
+ enum ucode_state ret;
u32 rev, dummy;
BUG_ON(raw_smp_processor_id() != cpu);
@@ -521,9 +522,8 @@ static enum ucode_state apply_microcode_amd(int cpu)
/* need to apply patch? */
if (rev >= mc_amd->hdr.patch_id) {
- c->microcode = rev;
- uci->cpu_sig.rev = rev;
- return UCODE_OK;
+ ret = UCODE_OK;
+ goto out;
}
if (__apply_microcode_amd(mc_amd)) {
@@ -531,17 +531,21 @@ static enum ucode_state apply_microcode_amd(int cpu)
cpu, mc_amd->hdr.patch_id);
return UCODE_ERROR;
}
- pr_info("CPU%d: new patch_level=0x%08x\n", cpu,
- mc_amd->hdr.patch_id);
- uci->cpu_sig.rev = mc_amd->hdr.patch_id;
- c->microcode = mc_amd->hdr.patch_id;
+ rev = mc_amd->hdr.patch_id;
+ ret = UCODE_UPDATED;
+
+ pr_info("CPU%d: new patch_level=0x%08x\n", cpu, rev);
+
+out:
+ uci->cpu_sig.rev = rev;
+ c->microcode = rev;
/* Update boot_cpu_data's revision too, if we're on the BSP: */
if (c->cpu_index == boot_cpu_data.cpu_index)
- boot_cpu_data.microcode = mc_amd->hdr.patch_id;
+ boot_cpu_data.microcode = rev;
- return UCODE_UPDATED;
+ return ret;
}
static int install_equiv_cpu_table(const u8 *buf)
diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index 256d336cbc04..16936a24795c 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -795,6 +795,7 @@ static enum ucode_state apply_microcode_intel(int cpu)
struct ucode_cpu_info *uci = ucode_cpu_info + cpu;
struct cpuinfo_x86 *c = &cpu_data(cpu);
struct microcode_intel *mc;
+ enum ucode_state ret;
static int prev_rev;
u32 rev;
@@ -817,9 +818,8 @@ static enum ucode_state apply_microcode_intel(int cpu)
*/
rev = intel_get_microcode_revision();
if (rev >= mc->hdr.rev) {
- uci->cpu_sig.rev = rev;
- c->microcode = rev;
- return UCODE_OK;
+ ret = UCODE_OK;
+ goto out;
}
/*
@@ -848,14 +848,17 @@ static enum ucode_state apply_microcode_intel(int cpu)
prev_rev = rev;
}
+ ret = UCODE_UPDATED;
+
+out:
uci->cpu_sig.rev = rev;
- c->microcode = rev;
+ c->microcode = rev;
/* Update boot_cpu_data's revision too, if we're on the BSP: */
if (c->cpu_index == boot_cpu_data.cpu_index)
boot_cpu_data.microcode = rev;
- return UCODE_UPDATED;
+ return ret;
}
static enum ucode_state generic_load_microcode(int cpu, void *data, size_t size,
--
2.18.0.321.gffc6fa0e3962
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
--
On 08/01/2018 11:43 AM, Borislav Petkov wrote:
> (dropping stable@)
>
> On Tue, Jul 31, 2018 at 07:27:39AM -0400, Prarit Bhargava wrote:
>> I tested this on AMD Ryzen & Intel Broadwell system and dumped the
>> boot_cpu_data before and after a microcode update. On the Intel
>> system I also did a fatal MCE using mce-inject to confirm the output
>> from the mce handling code.
>
> Ok, sending the two patches I've massaged into submission as a reply to
> this message - please run them on your boxes. You can also inject MCEs
> on the AMD box to verify - simply enable CONFIG_X86_MCE_INJECT, go to
> debugfs and do, for example:
>
> cd /sys/kernel/debug/mce-inject/
> echo 10 > /sys/devices/system/machinecheck/machinecheck0/check_interval
> echo 0x9c7d410092080813 > status; echo 0x000000006d3d483b > addr; echo 2 > cpu; echo 4 > bank; echo hw > flags
> echo 0xbc00200000010015 > status; echo 0x00000000942e4d98 > addr; echo 6 > cpu; echo 2 > bank; echo hw > flags
> echo 0xfe000010000b0c0f > status; echo 0xe1101add1e550012 > addr; echo 0xcafebeef > misc; echo 2 > cpu; echo hw > flags; echo 4 > bank
>
Borislav, I'm not sure if you saw his comments but Oleksandr, cc'd, has an
objection to overwriting the stored microcode version in boot_cpu_data. He has
found that information useful in debugging microcode issues.
P.
> Thanks!
>
> There's no reliable way to get the old microcode revision which was
> overwritten during the upgrade. If dmesg gets overwritten you lose, like
> in all the other gazillion cases where you lose information due to that.
The primary requirement here is that we report the version of the microcode
in use at the time of a crash. Keeping history of all updates seems to me to
beyond the scope of the kernel's responsibilities.
It's not like these updates appear out of the ether. You have to go out and
grab a new package and install it. User land can keep track of this much
more easily than the kernel.
-Tony
Hi.
On 01.08.2018 17:59, Luck, Tony wrote:
>> There's no reliable way to get the old microcode revision which was
>> overwritten during the upgrade. If dmesg gets overwritten you lose,
>> like
>> in all the other gazillion cases where you lose information due to
>> that.
>
> The primary requirement here is that we report the version of the
> microcode
> in use at the time of a crash. Keeping history of all updates seems to
> me to
> beyond the scope of the kernel's responsibilities.
>
> It's not like these updates appear out of the ether. You have to go out
> and
> grab a new package and install it. User land can keep track of this
> much
> more easily than the kernel.
I don't mind doing the right thing at all. It is just to inform you that
it was found to be useful.
Also, [1].
Thanks ☺.
[1] https://xkcd.com/1172/
--
Oleksandr Natalenko (post-factum)
On August 1, 2018 7:01:50 PM GMT+03:00, Oleksandr Natalenko <[email protected]> wrote:
>I don't mind doing the right thing at all. It is just to inform you
>that
>it was found to be useful.
I don't think it would've worked if you did a second microcode upgrade on the system.
--
Sent from a small device: formatting sux and brevity is inevitable.
Commit-ID: 370a132bb2227ff76278f98370e0e701d86ff752
Gitweb: https://git.kernel.org/tip/370a132bb2227ff76278f98370e0e701d86ff752
Author: Prarit Bhargava <[email protected]>
AuthorDate: Tue, 31 Jul 2018 07:27:39 -0400
Committer: Thomas Gleixner <[email protected]>
CommitDate: Sun, 2 Sep 2018 14:10:54 +0200
x86/microcode: Make sure boot_cpu_data.microcode is up-to-date
When preparing an MCE record for logging, boot_cpu_data.microcode is used
to read out the microcode revision on the box.
However, on systems where late microcode update has happened, the microcode
revision output in a MCE log record is wrong because
boot_cpu_data.microcode is not updated when the microcode gets updated.
But, the microcode revision saved in boot_cpu_data's microcode member
should be kept up-to-date, regardless, for consistency.
Make it so.
Fixes: fa94d0c6e0f3 ("x86/MCE: Save microcode revision in machine check records")
Signed-off-by: Prarit Bhargava <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
---
arch/x86/kernel/cpu/microcode/amd.c | 4 ++++
arch/x86/kernel/cpu/microcode/intel.c | 4 ++++
2 files changed, 8 insertions(+)
diff --git a/arch/x86/kernel/cpu/microcode/amd.c b/arch/x86/kernel/cpu/microcode/amd.c
index 0624957aa068..602f17134103 100644
--- a/arch/x86/kernel/cpu/microcode/amd.c
+++ b/arch/x86/kernel/cpu/microcode/amd.c
@@ -537,6 +537,10 @@ static enum ucode_state apply_microcode_amd(int cpu)
uci->cpu_sig.rev = mc_amd->hdr.patch_id;
c->microcode = mc_amd->hdr.patch_id;
+ /* Update boot_cpu_data's revision too, if we're on the BSP: */
+ if (c->cpu_index == boot_cpu_data.cpu_index)
+ boot_cpu_data.microcode = mc_amd->hdr.patch_id;
+
return UCODE_UPDATED;
}
diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index 97ccf4c3b45b..256d336cbc04 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -851,6 +851,10 @@ static enum ucode_state apply_microcode_intel(int cpu)
uci->cpu_sig.rev = rev;
c->microcode = rev;
+ /* Update boot_cpu_data's revision too, if we're on the BSP: */
+ if (c->cpu_index == boot_cpu_data.cpu_index)
+ boot_cpu_data.microcode = rev;
+
return UCODE_UPDATED;
}