2018-04-18 08:18:54

by Vitezslav Samel

[permalink] [raw]
Subject: 4.15.17 regression: bisected: timeout during microcode update

Hi!

Starting with 4.15.17 I got panic "timeout during microcode update"
which I bisected down to commit 8e1161f94614 ("x86/microcode: Synchronize
late microcode loading") - upstream commit
a5321aec6412b20b5ad15db2d6b916c05349dbff.

The panic happens during CPU microcode update.

The same happens with 4.16+.

Machine is "Supermicro X10SLM-F/X10SLM-F, BIOS 2.2 02/05/2015"
CPU is "Intel(R) Xeon(R) CPU E3-1220 v3 @ 3.10GHz" with microcode
revision 0x1c. Microcode update files are in /lib/firmware/intel-ucode
(2018-03-12) - revision 0x24.

I'm attaching .config and dmesg from 4.15.16 boot (known good).

Could be done anything to prevent this panic?

Cheers,

Vita


Attachments:
(No filename) (709.00 B)
.config (80.88 kB)
dmesg-20180418-09:41-4.15.16 (39.95 kB)
Download all attachments

2018-04-18 10:09:04

by Borislav Petkov

[permalink] [raw]
Subject: Re: 4.15.17 regression: bisected: timeout during microcode update

On Wed, Apr 18, 2018 at 10:11:40AM +0200, Vitezslav Samel wrote:
> Could be done anything to prevent this panic?

Yes, for starters, is there anything preventing you from using an initrd
and doing early microcode loading?

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/x86/microcode.txt

If no, consider doing that and not using the late method which is
problematic at best.

--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

2018-04-18 12:10:12

by Vitezslav Samel

[permalink] [raw]
Subject: Re: 4.15.17 regression: bisected: timeout during microcode update

On Wed, Apr 18, 2018 at 12:07:21PM +0200, Borislav Petkov wrote:
> On Wed, Apr 18, 2018 at 10:11:40AM +0200, Vitezslav Samel wrote:
> > Could be done anything to prevent this panic?
>
> Yes, for starters, is there anything preventing you from using an initrd
> and doing early microcode loading?
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/x86/microcode.txt
>
> If no, consider doing that and not using the late method which is
> problematic at best.

I switched to firmware-in-kernel early loading and that works OK.
But still, the reported issue is regression in 4.15.17 and 4.16+.

Thanks,

Vita

2018-04-18 12:23:33

by Borislav Petkov

[permalink] [raw]
Subject: Re: 4.15.17 regression: bisected: timeout during microcode update

On Wed, Apr 18, 2018 at 02:08:40PM +0200, Vitezslav Samel wrote:
> I switched to firmware-in-kernel early loading and that works OK.

Ok, and keep using that from now on.

People should all move away from that late loading dance. I'm saying
that in case someone else reads this on lkml.

> But still, the reported issue is regression in 4.15.17 and 4.16+.

Oh, I know it is a regression.

@Ashok: anything particular about his microcode revision not being able
to stomach late loading?

--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

2018-04-18 13:55:03

by Ashok Raj

[permalink] [raw]
Subject: Re: 4.15.17 regression: bisected: timeout during microcode update

On Wed, Apr 18, 2018 at 02:22:12PM +0200, Borislav Petkov wrote:
> On Wed, Apr 18, 2018 at 02:08:40PM +0200, Vitezslav Samel wrote:
> > I switched to firmware-in-kernel early loading and that works OK.

firmware-in-kernel means you compile your microcode image in linux?

Can you tell me which distro you are using? The ones we used
doesn't do late-loading (i.e echo 1 > /sys/devices/system/cpu/microcode/reload)

I suspect that might be the problem.

- Can you remove your builtin microcode,
- rename the /lib/firmware/intel-ucode so we don't find it during late
loading.
- let the system boot completely
- then rename the intel-ucode back for this test.
- write 1 to reload and see if that update succeeds or fails?


>
> Ok, and keep using that from now on.
>
> People should all move away from that late loading dance. I'm saying
> that in case someone else reads this on lkml.
>
> > But still, the reported issue is regression in 4.15.17 and 4.16+.
>
> Oh, I know it is a regression.
>
> @Ashok: anything particular about his microcode revision not being able
> to stomach late loading?

nothing about the microcode itself comes to mind. I'm wondering if this
similar to the Arch linux that used late-load during booting might be an issue.


2018-04-18 16:15:55

by Borislav Petkov

[permalink] [raw]
Subject: Re: 4.15.17 regression: bisected: timeout during microcode update

On Wed, Apr 18, 2018 at 06:53:30AM -0700, Raj, Ashok wrote:
> nothing about the microcode itself comes to mind. I'm wondering if
> this similar to the Arch linux that used late-load during booting
> might be an issue.

How so?

They'd echo 1 > /sys/... very early during boot?

Got an example dmesg or so?

--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

2018-04-19 05:37:00

by Vitezslav Samel

[permalink] [raw]
Subject: Re: 4.15.17 regression: bisected: timeout during microcode update

On Wed, Apr 18, 2018 at 06:53:30AM -0700, Raj, Ashok wrote:
> On Wed, Apr 18, 2018 at 02:22:12PM +0200, Borislav Petkov wrote:
> > On Wed, Apr 18, 2018 at 02:08:40PM +0200, Vitezslav Samel wrote:
> > > I switched to firmware-in-kernel early loading and that works OK.
>
> firmware-in-kernel means you compile your microcode image in linux?

Yes:

CONFIG_FW_LOADER=y
CONFIG_EXTRA_FIRMWARE="intel-ucode/06-3c-03"
CONFIG_EXTRA_FIRMWARE_DIR="/lib/firmware"

> Can you tell me which distro you are using? The ones we used
> doesn't do late-loading (i.e echo 1 > /sys/devices/system/cpu/microcode/reload)
>
> I suspect that might be the problem.

Yes, this is it. I followed instructions included in current microcode
package downloaded from intel website and added "echo 1 > /sys/dev..."
to my Slackware's /etc/rc.d/rc.local.

> - Can you remove your builtin microcode,
> - rename the /lib/firmware/intel-ucode so we don't find it during late loading.
> - let the system boot completely
> - then rename the intel-ucode back for this test.
> - write 1 to reload and see if that update succeeds or fails?

Just tested, it fails.

Cheers,

Vita

> > Ok, and keep using that from now on.
> >
> > People should all move away from that late loading dance. I'm saying
> > that in case someone else reads this on lkml.
> >
> > > But still, the reported issue is regression in 4.15.17 and 4.16+.
> >
> > Oh, I know it is a regression.
> >
> > @Ashok: anything particular about his microcode revision not being able
> > to stomach late loading?
>
> nothing about the microcode itself comes to mind. I'm wondering if this
> similar to the Arch linux that used late-load during booting might be an issue.

2018-04-19 07:24:27

by Ashok Raj

[permalink] [raw]
Subject: Re: 4.15.17 regression: bisected: timeout during microcode update

On Thu, Apr 19, 2018 at 07:35:31AM +0200, Vitezslav Samel wrote:
> > - Can you remove your builtin microcode,
> > - rename the /lib/firmware/intel-ucode so we don't find it during late loading.
> > - let the system boot completely
> > - then rename the intel-ucode back for this test.
> > - write 1 to reload and see if that update succeeds or fails?
>
> Just tested, it fails.
>
I haven't tested under slackware.. i'll have someone test this and keep you posted.



2018-04-19 10:50:17

by Borislav Petkov

[permalink] [raw]
Subject: Re: 4.15.17 regression: bisected: timeout during microcode update

On Thu, Apr 19, 2018 at 07:35:31AM +0200, Vitezslav Samel wrote:
> > - Can you remove your builtin microcode,
> > - rename the /lib/firmware/intel-ucode so we don't find it during late loading.
> > - let the system boot completely
> > - then rename the intel-ucode back for this test.
> > - write 1 to reload and see if that update succeeds or fails?
>
> Just tested, it fails.

Can you apply the below patch, do the exact same exercise and catch the
output? Over serial console or netconsole or if nothing else, do a video
of the screen with a phone and upload it somewhere?

Thx.

---
diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
index 10c4fc2c91f8..374ec1d75d89 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -553,6 +553,8 @@ static int __reload_late(void *info)
enum ucode_state err;
int ret = 0;

+ pr_info("%s: CPU%d\n", __func__, cpu);
+
/*
* Wait for all CPUs to arrive. A load will not be attempted unless all
* CPUs show up.
@@ -560,6 +562,8 @@ static int __reload_late(void *info)
if (__wait_for_cpus(&late_cpus_in, NSEC_PER_SEC))
return -1;

+ pr_info("%s: CPU%d reloading\n", __func__, cpu);
+
spin_lock(&update_lock);
apply_microcode_local(&err);
spin_unlock(&update_lock);
@@ -571,9 +575,12 @@ static int __reload_late(void *info)
} else if (err == UCODE_UPDATED || err == UCODE_OK) {
ret = 1;
} else {
+ pr_info("%s: CPU%d returning 0x%x\n", __func__, cpu, ret);
return ret;
}

+ pr_info("%s: CPU%d waiting to exit\n", __func__, cpu);
+
/*
* Increase the wait timeout to a safe value here since we're
* serializing the microcode update and that could take a while on a

--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

2018-04-19 12:04:18

by Vitezslav Samel

[permalink] [raw]
Subject: Re: 4.15.17 regression: bisected: timeout during microcode update

On Thu, Apr 19, 2018 at 12:48:29PM +0200, Borislav Petkov wrote:
> On Thu, Apr 19, 2018 at 07:35:31AM +0200, Vitezslav Samel wrote:
> > > - Can you remove your builtin microcode,
> > > - rename the /lib/firmware/intel-ucode so we don't find it during late loading.
> > > - let the system boot completely
> > > - then rename the intel-ucode back for this test.
> > > - write 1 to reload and see if that update succeeds or fails?
> >
> > Just tested, it fails.
>
> Can you apply the below patch, do the exact same exercise and catch the
> output? Over serial console or netconsole or if nothing else, do a video
> of the screen with a phone and upload it somewhere?

Here it is:
-------------------------------------------------------------
microcode: __reload_late: CPU1
microcode: __reload_late: CPU3
microcode: __reload_late: CPU2
microcode: __reload_late: CPU0
microcode: __reload_late: CPU1 reloading
microcode: __reload_late: CPU3 reloading
microcode: __reload_late: CPU2 reloading
microcode: __reload_late: CPU0 reloading
microcode: __reload_late: CPU3 returning 0x0
microcode: __reload_late: CPU2 returning 0x0
microcode: updated to revision 0x24, date = 2018-01-21
microcode: __reload_late: CPU0 waiting to exit
microcode: __reload_late: CPU1 returning 0x0
microcode: Timeout while waiting for CPUs rendezvous, remaining: 3
Kernel panic - not syncing: Timeout during microcode update!

CPU: 0 PID: 11 Comm: migration/0 Not tainted 4.16.3+ #1
Hardware name: Supermicro X10SLM-F/X10SLM-F, BIOS 2.2 02/05/2015
Call Trace:
dump_stack+0x46/0x65
panic+0xca/0x208
__reload_late+0x11e/0x120
multi_cpu_stop+0x55/0xa0
? cpu_stop_queue_work+0x80/0x80
cpu_stopper_thread+0x7d/0x100
? sort_range+0x20/0x20
smpboot_thread_fn+0x11f/0x1e0
kthread+0x101/0x120
? __kthread_create_on_node+0x150/0x150
? __kthread_create_on_node+0xf0/0x150
ret_from_fork+0x35/0x40
Shutting down cpus with NMI
Kernel Offset: disabled
---[ end Kernel panic - not syncing: Timeout during microcode update!
-------------------------------------------------------------

Vita

>
> Thx.
>
> ---
> diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
> index 10c4fc2c91f8..374ec1d75d89 100644
> --- a/arch/x86/kernel/cpu/microcode/core.c
> +++ b/arch/x86/kernel/cpu/microcode/core.c
> @@ -553,6 +553,8 @@ static int __reload_late(void *info)
> enum ucode_state err;
> int ret = 0;
>
> + pr_info("%s: CPU%d\n", __func__, cpu);
> +
> /*
> * Wait for all CPUs to arrive. A load will not be attempted unless all
> * CPUs show up.
> @@ -560,6 +562,8 @@ static int __reload_late(void *info)
> if (__wait_for_cpus(&late_cpus_in, NSEC_PER_SEC))
> return -1;
>
> + pr_info("%s: CPU%d reloading\n", __func__, cpu);
> +
> spin_lock(&update_lock);
> apply_microcode_local(&err);
> spin_unlock(&update_lock);
> @@ -571,9 +575,12 @@ static int __reload_late(void *info)
> } else if (err == UCODE_UPDATED || err == UCODE_OK) {
> ret = 1;
> } else {
> + pr_info("%s: CPU%d returning 0x%x\n", __func__, cpu, ret);
> return ret;
> }
>
> + pr_info("%s: CPU%d waiting to exit\n", __func__, cpu);
> +
> /*
> * Increase the wait timeout to a safe value here since we're
> * serializing the microcode update and that could take a while on a
>
> --
> Regards/Gruss,
> Boris.
>
> SUSE Linux GmbH, GF: Felix Imend?rffer, Jane Smithard, Graham Norton, HRB 21284 (AG N?rnberg)
> --

2018-04-19 12:20:20

by Borislav Petkov

[permalink] [raw]
Subject: Re: 4.15.17 regression: bisected: timeout during microcode update

On Thu, Apr 19, 2018 at 02:02:39PM +0200, Vitezslav Samel wrote:
> Here it is:

Thanks!

> -------------------------------------------------------------
> microcode: __reload_late: CPU1
> microcode: __reload_late: CPU3
> microcode: __reload_late: CPU2
> microcode: __reload_late: CPU0
> microcode: __reload_late: CPU1 reloading
> microcode: __reload_late: CPU3 reloading
> microcode: __reload_late: CPU2 reloading
> microcode: __reload_late: CPU0 reloading
> microcode: __reload_late: CPU3 returning 0x0

Aaaha, there it is.

Ok, debug patch v2, please do the same.

Thx.

---
diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
index 10c4fc2c91f8..737f93422d0b 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -553,6 +553,8 @@ static int __reload_late(void *info)
enum ucode_state err;
int ret = 0;

+ pr_info("%s: CPU%d\n", __func__, cpu);
+
/*
* Wait for all CPUs to arrive. A load will not be attempted unless all
* CPUs show up.
@@ -560,6 +562,8 @@ static int __reload_late(void *info)
if (__wait_for_cpus(&late_cpus_in, NSEC_PER_SEC))
return -1;

+ pr_info("%s: CPU%d reloading\n", __func__, cpu);
+
spin_lock(&update_lock);
apply_microcode_local(&err);
spin_unlock(&update_lock);
@@ -571,9 +575,12 @@ static int __reload_late(void *info)
} else if (err == UCODE_UPDATED || err == UCODE_OK) {
ret = 1;
} else {
+ pr_info("%s: CPU%d returning 0x%x\n", __func__, cpu, err);
return ret;
}

+ pr_info("%s: CPU%d waiting to exit\n", __func__, cpu);
+
/*
* Increase the wait timeout to a safe value here since we're
* serializing the microcode update and that could take a while on a
diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index 32b8e5724f96..ad0a19154c40 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -727,21 +727,32 @@ static struct microcode_intel *find_patch(struct ucode_cpu_info *uci)
{
struct microcode_header_intel *phdr;
struct ucode_patch *iter, *tmp;
+ int cpu = smp_processor_id();

list_for_each_entry_safe(iter, tmp, &microcode_cache, plist) {

phdr = (struct microcode_header_intel *)iter->data;

+ pr_info("%s: CPU%d, phdr: 0x%x, uci: 0x%x\n",
+ __func__, cpu, phdr->rev, uci->cpu_sig.rev);
+
if (phdr->rev <= uci->cpu_sig.rev)
continue;

+ pr_info("%s: CPU%d, find_matching_signature: sig: 0x%x, pf: 0x%x\n",
+ __func__, cpu, uci->cpu_sig.sig, uci->cpu_sig.pf);
+
if (!find_matching_signature(phdr,
uci->cpu_sig.sig,
uci->cpu_sig.pf))
continue;

+
+ pr_info("%s: CPU%d, found phdr: 0x%x\n", __func__, cpu, phdr->rev);
return iter->data;
}
+
+ pr_info("%s: CPU%d, NADA\n", __func__, cpu);
return NULL;
}

@@ -805,8 +816,10 @@ static enum ucode_state apply_microcode_intel(int cpu)
mc = find_patch(uci);
if (!mc) {
mc = uci->mc;
- if (!mc)
+ if (!mc) {
+ pr_info("%s: CPU%d, NFOUND\n", __func__, cpu);
return UCODE_NFOUND;
+ }
}

/*

--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

2018-04-19 12:34:11

by Ashok Raj

[permalink] [raw]
Subject: Re: 4.15.17 regression: bisected: timeout during microcode update

Thanks.. also can you remove the ret below, and send the output
of /proc/cpuinfo before and after?

On Thu, Apr 19, 2018 at 02:18:41PM +0200, Borislav Petkov wrote:
> } else {
> + pr_info("%s: CPU%d returning 0x%x\n", __func__, cpu, err);
> return ret;
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> }
>

This should not panic, but curious what the microcode versions reported...

2018-04-19 13:48:18

by Vitezslav Samel

[permalink] [raw]
Subject: Re: 4.15.17 regression: bisected: timeout during microcode update

On Thu, Apr 19, 2018 at 02:18:41PM +0200, Borislav Petkov wrote:
> On Thu, Apr 19, 2018 at 02:02:39PM +0200, Vitezslav Samel wrote:
> > Here it is:
>
> Thanks!
>
> > -------------------------------------------------------------
> > microcode: __reload_late: CPU1
> > microcode: __reload_late: CPU3
> > microcode: __reload_late: CPU2
> > microcode: __reload_late: CPU0
> > microcode: __reload_late: CPU1 reloading
> > microcode: __reload_late: CPU3 reloading
> > microcode: __reload_late: CPU2 reloading
> > microcode: __reload_late: CPU0 reloading
> > microcode: __reload_late: CPU3 returning 0x0
>
> Aaaha, there it is.
>
> Ok, debug patch v2, please do the same.

Here it is:

------------------------------------------------------------
microcode: __reload_late: CPU0
microcode: __reload_late: CPU3
microcode: __reload_late: CPU2
microcode: __reload_late: CPU1
microcode: __reload_late: CPU0 reloading
microcode: __reload_late: CPU2 reloading
microcode: __reload_late: CPU1 reloading
microcode: __reload_late: CPU3 reloading
microcode: find_patch: CPU2, NADA
microcode: apply_microcode_intel: CPU2, NFOUND
microcode: __reload_late: CPU2 returning 0x3
microcode: find_patch: CPU1, NADA
microcode: apply_microcode_intel: CPU1, NFOUND
microcode: __reload_late: CPU1 returning 0x3
microcode: find_patch: CPU3, NADA
microcode: apply_microcode_intel: CPU3, NFOUND
microcode: __reload_late: CPU3 returning 0x3
microcode: find_patch: CPU0, NADA
microcode: updated to revision 0x24, date = 2018-01-21
microcode: __reload_late: CPU0 waiting to exit
microcode: Timeout while waiting for CPUs rendezvous, remaining: 3
Kernel panic - not syncing: Timeout during microcode update!

CPU: 0 PID: 11 Comm: migration/0 Not tainted 4.16.3 #1
Hardware name: Supermicro X10SLM-F/X10SLM-F, BIOS 2.2 02/05/2015
Call Trace:
dump_stack+0x46/0x65
panic+0xca/0x208
__reload_late+0x122/0x130
multi_cpu_stop+0x55/0xa0
? cpu_stop_queue_work+0x80/0x80
cpu_stopper_thread+0x7d/0x100
? sort_range+0x20/0x20
smpboot_thread_fn+0x11f/0x1e0
kthread+0x101/0x120
? __kthread_create_on_node+0x150/0x150
? __kthread_create_on_node+0x40/0x150
ret_from_fork+0x35/0x40
Shutting down cpus with NMI
Kernel Offset: disabled
---[ end Kernel panic - not syncing: Timeout during microcode update!
------------------------------------------------------------

Cheers,

Vita


>
> Thx.
>
> ---
> diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
> index 10c4fc2c91f8..737f93422d0b 100644
> --- a/arch/x86/kernel/cpu/microcode/core.c
> +++ b/arch/x86/kernel/cpu/microcode/core.c
> @@ -553,6 +553,8 @@ static int __reload_late(void *info)
> enum ucode_state err;
> int ret = 0;
>
> + pr_info("%s: CPU%d\n", __func__, cpu);
> +
> /*
> * Wait for all CPUs to arrive. A load will not be attempted unless all
> * CPUs show up.
> @@ -560,6 +562,8 @@ static int __reload_late(void *info)
> if (__wait_for_cpus(&late_cpus_in, NSEC_PER_SEC))
> return -1;
>
> + pr_info("%s: CPU%d reloading\n", __func__, cpu);
> +
> spin_lock(&update_lock);
> apply_microcode_local(&err);
> spin_unlock(&update_lock);
> @@ -571,9 +575,12 @@ static int __reload_late(void *info)
> } else if (err == UCODE_UPDATED || err == UCODE_OK) {
> ret = 1;
> } else {
> + pr_info("%s: CPU%d returning 0x%x\n", __func__, cpu, err);
> return ret;
> }
>
> + pr_info("%s: CPU%d waiting to exit\n", __func__, cpu);
> +
> /*
> * Increase the wait timeout to a safe value here since we're
> * serializing the microcode update and that could take a while on a
> diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
> index 32b8e5724f96..ad0a19154c40 100644
> --- a/arch/x86/kernel/cpu/microcode/intel.c
> +++ b/arch/x86/kernel/cpu/microcode/intel.c
> @@ -727,21 +727,32 @@ static struct microcode_intel *find_patch(struct ucode_cpu_info *uci)
> {
> struct microcode_header_intel *phdr;
> struct ucode_patch *iter, *tmp;
> + int cpu = smp_processor_id();
>
> list_for_each_entry_safe(iter, tmp, &microcode_cache, plist) {
>
> phdr = (struct microcode_header_intel *)iter->data;
>
> + pr_info("%s: CPU%d, phdr: 0x%x, uci: 0x%x\n",
> + __func__, cpu, phdr->rev, uci->cpu_sig.rev);
> +
> if (phdr->rev <= uci->cpu_sig.rev)
> continue;
>
> + pr_info("%s: CPU%d, find_matching_signature: sig: 0x%x, pf: 0x%x\n",
> + __func__, cpu, uci->cpu_sig.sig, uci->cpu_sig.pf);
> +
> if (!find_matching_signature(phdr,
> uci->cpu_sig.sig,
> uci->cpu_sig.pf))
> continue;
>
> +
> + pr_info("%s: CPU%d, found phdr: 0x%x\n", __func__, cpu, phdr->rev);
> return iter->data;
> }
> +
> + pr_info("%s: CPU%d, NADA\n", __func__, cpu);
> return NULL;
> }
>
> @@ -805,8 +816,10 @@ static enum ucode_state apply_microcode_intel(int cpu)
> mc = find_patch(uci);
> if (!mc) {
> mc = uci->mc;
> - if (!mc)
> + if (!mc) {
> + pr_info("%s: CPU%d, NFOUND\n", __func__, cpu);
> return UCODE_NFOUND;
> + }
> }
>
> /*
>
> --
> Regards/Gruss,
> Boris.
>
> SUSE Linux GmbH, GF: Felix Imend?rffer, Jane Smithard, Graham Norton, HRB 21284 (AG N?rnberg)
> --

2018-04-19 13:52:21

by Vitezslav Samel

[permalink] [raw]
Subject: Re: 4.15.17 regression: bisected: timeout during microcode update

On Thu, Apr 19, 2018 at 05:32:18AM -0700, Raj, Ashok wrote:
> Thanks.. also can you remove the ret below, and send the output
> of /proc/cpuinfo before and after?
>
> On Thu, Apr 19, 2018 at 02:18:41PM +0200, Borislav Petkov wrote:
> > } else {
> > + pr_info("%s: CPU%d returning 0x%x\n", __func__, cpu, err);
> > return ret;
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > }
> >
>
> This should not panic, but curious what the microcode versions reported...

Yes, this time it didn't panic:

--------------------------------------------------------------
microcode: __reload_late: CPU1
microcode: __reload_late: CPU3
microcode: __reload_late: CPU0
microcode: __reload_late: CPU2
microcode: __reload_late: CPU1 reloading
microcode: __reload_late: CPU2 reloading
microcode: __reload_late: CPU0 reloading
microcode: __reload_late: CPU3 reloading
microcode: find_patch: CPU2, NADA
microcode: apply_microcode_intel: CPU2, NFOUND
microcode: __reload_late: CPU2 returning 0x3
microcode: find_patch: CPU0, NADA
microcode: __reload_late: CPU2 waiting to exit
microcode: updated to revision 0x24, date = 2018-01-21
microcode: __reload_late: CPU0 waiting to exit
microcode: find_patch: CPU3, NADA
microcode: apply_microcode_intel: CPU3, NFOUND
microcode: __reload_late: CPU3 returning 0x3
microcode: find_patch: CPU1, NADA
microcode: apply_microcode_intel: CPU1, NFOUND
microcode: __reload_late: CPU3 waiting to exit
microcode: __reload_late: CPU1 returning 0x3
microcode: __reload_late: CPU1 waiting to exit
x86/CPU: CPU features have changed after loading microcode, but might not take effect.
x86/CPU: Please consider either early loading through initrd/built-in or a potential BIOS update.
--------------------------------------------------------------
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 60
model name : Intel(R) Xeon(R) CPU E3-1220 v3 @ 3.10GHz
stepping : 3
microcode : 0x1c
cpu MHz : 3052.578
cache size : 8192 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single pti tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt dtherm ida arat pln pts
bugs : cpu_meltdown spectre_v1 spectre_v2
bogomips : 6200.02
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 60
model name : Intel(R) Xeon(R) CPU E3-1220 v3 @ 3.10GHz
stepping : 3
microcode : 0x1c
cpu MHz : 3070.964
cache size : 8192 KB
physical id : 0
siblings : 4
core id : 1
cpu cores : 4
apicid : 2
initial apicid : 2
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single pti tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt dtherm ida arat pln pts
bugs : cpu_meltdown spectre_v1 spectre_v2
bogomips : 6200.02
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:

processor : 2
vendor_id : GenuineIntel
cpu family : 6
model : 60
model name : Intel(R) Xeon(R) CPU E3-1220 v3 @ 3.10GHz
stepping : 3
microcode : 0x1c
cpu MHz : 3094.770
cache size : 8192 KB
physical id : 0
siblings : 4
core id : 2
cpu cores : 4
apicid : 4
initial apicid : 4
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single pti tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt dtherm ida arat pln pts
bugs : cpu_meltdown spectre_v1 spectre_v2
bogomips : 6200.02
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:

processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 60
model name : Intel(R) Xeon(R) CPU E3-1220 v3 @ 3.10GHz
stepping : 3
microcode : 0x1c
cpu MHz : 2895.857
cache size : 8192 KB
physical id : 0
siblings : 4
core id : 3
cpu cores : 4
apicid : 6
initial apicid : 6
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single pti tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt dtherm ida arat pln pts
bugs : cpu_meltdown spectre_v1 spectre_v2
bogomips : 6200.02
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
--------------------------------------------------------------
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 60
model name : Intel(R) Xeon(R) CPU E3-1220 v3 @ 3.10GHz
stepping : 3
microcode : 0x24
cpu MHz : 1915.336
cache size : 8192 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single pti tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt dtherm ida arat pln pts
bugs : cpu_meltdown spectre_v1 spectre_v2
bogomips : 6200.02
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 60
model name : Intel(R) Xeon(R) CPU E3-1220 v3 @ 3.10GHz
stepping : 3
microcode : 0x1c
cpu MHz : 3076.997
cache size : 8192 KB
physical id : 0
siblings : 4
core id : 1
cpu cores : 4
apicid : 2
initial apicid : 2
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single pti tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt dtherm ida arat pln pts
bugs : cpu_meltdown spectre_v1 spectre_v2
bogomips : 6200.02
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:

processor : 2
vendor_id : GenuineIntel
cpu family : 6
model : 60
model name : Intel(R) Xeon(R) CPU E3-1220 v3 @ 3.10GHz
stepping : 3
microcode : 0x1c
cpu MHz : 2589.773
cache size : 8192 KB
physical id : 0
siblings : 4
core id : 2
cpu cores : 4
apicid : 4
initial apicid : 4
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single pti tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt dtherm ida arat pln pts
bugs : cpu_meltdown spectre_v1 spectre_v2
bogomips : 6200.02
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:

processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 60
model name : Intel(R) Xeon(R) CPU E3-1220 v3 @ 3.10GHz
stepping : 3
microcode : 0x1c
cpu MHz : 2865.152
cache size : 8192 KB
physical id : 0
siblings : 4
core id : 3
cpu cores : 4
apicid : 6
initial apicid : 6
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single pti tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt dtherm ida arat pln pts
bugs : cpu_meltdown spectre_v1 spectre_v2
bogomips : 6200.02
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
--------------------------------------------------------------

Cheers,

Vita

2018-04-19 17:03:03

by Borislav Petkov

[permalink] [raw]
Subject: Re: 4.15.17 regression: bisected: timeout during microcode update

On Thu, Apr 19, 2018 at 03:46:27PM +0200, Vitezslav Samel wrote:
> ------------------------------------------------------------
> microcode: __reload_late: CPU0
> microcode: __reload_late: CPU3
> microcode: __reload_late: CPU2
> microcode: __reload_late: CPU1
> microcode: __reload_late: CPU0 reloading
> microcode: __reload_late: CPU2 reloading
> microcode: __reload_late: CPU1 reloading
> microcode: __reload_late: CPU3 reloading
> microcode: find_patch: CPU2, NADA

Ok, I think I have it. Please run the patch below, it still has the
debugging output so please paste it here once you've done the exact same
exercise.

It should not explode this time! (Famous last words :-))

Thx!

---
arch/x86/kernel/cpu/microcode/core.c | 11 +++++++----
arch/x86/kernel/cpu/microcode/intel.c | 17 ++++++++++++++---
2 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
index 10c4fc2c91f8..e84877b0f7d7 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -553,6 +553,8 @@ static int __reload_late(void *info)
enum ucode_state err;
int ret = 0;

+ pr_info("%s: CPU%d\n", __func__, cpu);
+
/*
* Wait for all CPUs to arrive. A load will not be attempted unless all
* CPUs show up.
@@ -560,20 +562,21 @@ static int __reload_late(void *info)
if (__wait_for_cpus(&late_cpus_in, NSEC_PER_SEC))
return -1;

+ pr_info("%s: CPU%d reloading\n", __func__, cpu);
+
spin_lock(&update_lock);
apply_microcode_local(&err);
spin_unlock(&update_lock);

+ /* siblings return UCODE_OK because their engine got updated already */
if (err > UCODE_NFOUND) {
pr_warn("Error reloading microcode on CPU %d\n", cpu);
- return -1;
- /* siblings return UCODE_OK because their engine got updated already */
} else if (err == UCODE_UPDATED || err == UCODE_OK) {
ret = 1;
- } else {
- return ret;
}

+ pr_info("%s: CPU%d waiting to exit\n", __func__, cpu);
+
/*
* Increase the wait timeout to a safe value here since we're
* serializing the microcode update and that could take a while on a
diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index 32b8e5724f96..725e0bb6df03 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -485,7 +485,6 @@ static void show_saved_mc(void)
*/
static void save_mc_for_early(u8 *mc, unsigned int size)
{
-#ifdef CONFIG_HOTPLUG_CPU
/* Synchronization during CPU hotplug. */
static DEFINE_MUTEX(x86_cpu_microcode_mutex);

@@ -495,7 +494,6 @@ static void save_mc_for_early(u8 *mc, unsigned int size)
show_saved_mc();

mutex_unlock(&x86_cpu_microcode_mutex);
-#endif
}

static bool load_builtin_intel_microcode(struct cpio_data *cp)
@@ -727,21 +725,32 @@ static struct microcode_intel *find_patch(struct ucode_cpu_info *uci)
{
struct microcode_header_intel *phdr;
struct ucode_patch *iter, *tmp;
+ int cpu = smp_processor_id();

list_for_each_entry_safe(iter, tmp, &microcode_cache, plist) {

phdr = (struct microcode_header_intel *)iter->data;

+ pr_info("%s: CPU%d, phdr: 0x%x, uci: 0x%x\n",
+ __func__, cpu, phdr->rev, uci->cpu_sig.rev);
+
if (phdr->rev <= uci->cpu_sig.rev)
continue;

+ pr_info("%s: CPU%d, find_matching_signature: sig: 0x%x, pf: 0x%x\n",
+ __func__, cpu, uci->cpu_sig.sig, uci->cpu_sig.pf);
+
if (!find_matching_signature(phdr,
uci->cpu_sig.sig,
uci->cpu_sig.pf))
continue;

+
+ pr_info("%s: CPU%d, found phdr: 0x%x\n", __func__, cpu, phdr->rev);
return iter->data;
}
+
+ pr_info("%s: CPU%d, NADA\n", __func__, cpu);
return NULL;
}

@@ -805,8 +814,10 @@ static enum ucode_state apply_microcode_intel(int cpu)
mc = find_patch(uci);
if (!mc) {
mc = uci->mc;
- if (!mc)
+ if (!mc) {
+ pr_info("%s: CPU%d, NFOUND\n", __func__, cpu);
return UCODE_NFOUND;
+ }
}

/*
--
2.13.0

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

2018-04-20 06:21:55

by Vitezslav Samel

[permalink] [raw]
Subject: Re: 4.15.17 regression: bisected: timeout during microcode update

On Thu, Apr 19, 2018 at 06:37:34PM +0200, Borislav Petkov wrote:
> On Thu, Apr 19, 2018 at 03:46:27PM +0200, Vitezslav Samel wrote:
> > ------------------------------------------------------------
> > microcode: __reload_late: CPU0
> > microcode: __reload_late: CPU3
> > microcode: __reload_late: CPU2
> > microcode: __reload_late: CPU1
> > microcode: __reload_late: CPU0 reloading
> > microcode: __reload_late: CPU2 reloading
> > microcode: __reload_late: CPU1 reloading
> > microcode: __reload_late: CPU3 reloading
> > microcode: find_patch: CPU2, NADA
>
> Ok, I think I have it. Please run the patch below, it still has the
> debugging output so please paste it here once you've done the exact same
> exercise.
>
> It should not explode this time! (Famous last words :-))

;-) This time it works.

(Ashok: all test were against stable 4.16.3)

------------------------------------------------------
microcode: __reload_late: CPU1
microcode: __reload_late: CPU0
microcode: __reload_late: CPU2
microcode: __reload_late: CPU3
microcode: __reload_late: CPU1 reloading
microcode: __reload_late: CPU0 reloading
microcode: __reload_late: CPU3 reloading
microcode: __reload_late: CPU2 reloading
microcode: find_patch: CPU0, phdr: 0x24, uci: 0x1c
microcode: find_patch: CPU0, find_matching_signature: sig: 0x306c3, pf: 0x2
microcode: find_patch: CPU0, found phdr: 0x24
microcode: updated to revision 0x24, date = 2018-01-21
microcode: __reload_late: CPU0 waiting to exit
microcode: find_patch: CPU3, phdr: 0x24, uci: 0x1c
microcode: find_patch: CPU3, find_matching_signature: sig: 0x306c3, pf: 0x2
microcode: find_patch: CPU3, found phdr: 0x24
microcode: __reload_late: CPU3 waiting to exit
microcode: find_patch: CPU2, phdr: 0x24, uci: 0x1c
microcode: find_patch: CPU2, find_matching_signature: sig: 0x306c3, pf: 0x2
microcode: find_patch: CPU2, found phdr: 0x24
microcode: __reload_late: CPU2 waiting to exit
microcode: find_patch: CPU1, phdr: 0x24, uci: 0x1c
microcode: find_patch: CPU1, find_matching_signature: sig: 0x306c3, pf: 0x2
microcode: find_patch: CPU1, found phdr: 0x24
microcode: __reload_late: CPU1 waiting to exit
x86/CPU: CPU features have changed after loading microcode, but might not take effect.
x86/CPU: Please consider either early loading through initrd/built-in or a potential BIOS update.
------------------------------------------------------

Thank you very much,

Vita


> Thx!
>
> ---
> arch/x86/kernel/cpu/microcode/core.c | 11 +++++++----
> arch/x86/kernel/cpu/microcode/intel.c | 17 ++++++++++++++---
> 2 files changed, 21 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
> index 10c4fc2c91f8..e84877b0f7d7 100644
> --- a/arch/x86/kernel/cpu/microcode/core.c
> +++ b/arch/x86/kernel/cpu/microcode/core.c
> @@ -553,6 +553,8 @@ static int __reload_late(void *info)
> enum ucode_state err;
> int ret = 0;
>
> + pr_info("%s: CPU%d\n", __func__, cpu);
> +
> /*
> * Wait for all CPUs to arrive. A load will not be attempted unless all
> * CPUs show up.
> @@ -560,20 +562,21 @@ static int __reload_late(void *info)
> if (__wait_for_cpus(&late_cpus_in, NSEC_PER_SEC))
> return -1;
>
> + pr_info("%s: CPU%d reloading\n", __func__, cpu);
> +
> spin_lock(&update_lock);
> apply_microcode_local(&err);
> spin_unlock(&update_lock);
>
> + /* siblings return UCODE_OK because their engine got updated already */
> if (err > UCODE_NFOUND) {
> pr_warn("Error reloading microcode on CPU %d\n", cpu);
> - return -1;
> - /* siblings return UCODE_OK because their engine got updated already */
> } else if (err == UCODE_UPDATED || err == UCODE_OK) {
> ret = 1;
> - } else {
> - return ret;
> }
>
> + pr_info("%s: CPU%d waiting to exit\n", __func__, cpu);
> +
> /*
> * Increase the wait timeout to a safe value here since we're
> * serializing the microcode update and that could take a while on a
> diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
> index 32b8e5724f96..725e0bb6df03 100644
> --- a/arch/x86/kernel/cpu/microcode/intel.c
> +++ b/arch/x86/kernel/cpu/microcode/intel.c
> @@ -485,7 +485,6 @@ static void show_saved_mc(void)
> */
> static void save_mc_for_early(u8 *mc, unsigned int size)
> {
> -#ifdef CONFIG_HOTPLUG_CPU
> /* Synchronization during CPU hotplug. */
> static DEFINE_MUTEX(x86_cpu_microcode_mutex);
>
> @@ -495,7 +494,6 @@ static void save_mc_for_early(u8 *mc, unsigned int size)
> show_saved_mc();
>
> mutex_unlock(&x86_cpu_microcode_mutex);
> -#endif
> }
>
> static bool load_builtin_intel_microcode(struct cpio_data *cp)
> @@ -727,21 +725,32 @@ static struct microcode_intel *find_patch(struct ucode_cpu_info *uci)
> {
> struct microcode_header_intel *phdr;
> struct ucode_patch *iter, *tmp;
> + int cpu = smp_processor_id();
>
> list_for_each_entry_safe(iter, tmp, &microcode_cache, plist) {
>
> phdr = (struct microcode_header_intel *)iter->data;
>
> + pr_info("%s: CPU%d, phdr: 0x%x, uci: 0x%x\n",
> + __func__, cpu, phdr->rev, uci->cpu_sig.rev);
> +
> if (phdr->rev <= uci->cpu_sig.rev)
> continue;
>
> + pr_info("%s: CPU%d, find_matching_signature: sig: 0x%x, pf: 0x%x\n",
> + __func__, cpu, uci->cpu_sig.sig, uci->cpu_sig.pf);
> +
> if (!find_matching_signature(phdr,
> uci->cpu_sig.sig,
> uci->cpu_sig.pf))
> continue;
>
> +
> + pr_info("%s: CPU%d, found phdr: 0x%x\n", __func__, cpu, phdr->rev);
> return iter->data;
> }
> +
> + pr_info("%s: CPU%d, NADA\n", __func__, cpu);
> return NULL;
> }
>
> @@ -805,8 +814,10 @@ static enum ucode_state apply_microcode_intel(int cpu)
> mc = find_patch(uci);
> if (!mc) {
> mc = uci->mc;
> - if (!mc)
> + if (!mc) {
> + pr_info("%s: CPU%d, NFOUND\n", __func__, cpu);
> return UCODE_NFOUND;
> + }
> }
>
> /*
> --
> 2.13.0
>
> SUSE Linux GmbH, GF: Felix Imend?rffer, Jane Smithard, Graham Norton, HRB 21284 (AG N?rnberg)
> --

2018-04-20 09:53:42

by Borislav Petkov

[permalink] [raw]
Subject: Re: 4.15.17 regression: bisected: timeout during microcode update

On Fri, Apr 20, 2018 at 08:20:21AM +0200, Vitezslav Samel wrote:
> ;-) This time it works.

Good. :-)

> microcode: __reload_late: CPU1 waiting to exit
> x86/CPU: CPU features have changed after loading microcode, but might not take effect.
> x86/CPU: Please consider either early loading through initrd/built-in or a potential BIOS update.
> ------------------------------------------------------

Ok, and after this happens

$ grep microcode /proc/cpuinfo | uniq

shows only 0x24, correct?

> Thank you very much,

No, thank you for reporting and testing!

Ok, cleaned up patches with commit messages coming up.

--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

2018-04-20 10:03:25

by Vitezslav Samel

[permalink] [raw]
Subject: Re: 4.15.17 regression: bisected: timeout during microcode update

On Fri, Apr 20, 2018 at 11:52:20AM +0200, Borislav Petkov wrote:
> On Fri, Apr 20, 2018 at 08:20:21AM +0200, Vitezslav Samel wrote:
> > ;-) This time it works.
>
> Good. :-)
>
> > microcode: __reload_late: CPU1 waiting to exit
> > x86/CPU: CPU features have changed after loading microcode, but might not take effect.
> > x86/CPU: Please consider either early loading through initrd/built-in or a potential BIOS update.
> > ------------------------------------------------------
>
> Ok, and after this happens
>
> $ grep microcode /proc/cpuinfo | uniq
>
> shows only 0x24, correct?

Yes.

> > Thank you very much,
>
> No, thank you for reporting and testing!
>
> Ok, cleaned up patches with commit messages coming up.

Good.

Cheers,

Vita

2018-04-20 10:34:12

by Borislav Petkov

[permalink] [raw]
Subject: Re: 4.15.17 regression: bisected: timeout during microcode update

On Fri, Apr 20, 2018 at 12:01:31PM +0200, Vitezslav Samel wrote:
> > Ok, cleaned up patches with commit messages coming up.
>
> Good.

Ok, here they are as a reply to this message. I'm running them here, you
could run them one last time to make sure all is good.

@Ashok: please run them too, before I send them to tip guys.

Thx.

--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

2018-04-20 10:35:54

by Borislav Petkov

[permalink] [raw]
Subject: [PATCH 1/2] x86/microcode/intel: Save microcode patch unconditionally

save_mc_for_early() was a no-op on !CONFIG_HOTPLUG_CPU but the
generic_load_microcode() path saves the microcode patches it has found
into our cache of patches which is used for late loading too. Regardless
of whether we do CPU hotplug or not.

So make the saving unconditional so that late loading can find the
proper patch.

Reported-by: Vitezslav Samel <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Ashok Raj <[email protected]>
Cc: <[email protected]> # if it has backported d8c3b52c00a05036e0a6b315b4b17921a7b67997
Link: http://lkml.kernel.org/r/[email protected]
---
arch/x86/kernel/cpu/microcode/intel.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index 32b8e5724f96..1c2cfa0644aa 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -485,7 +485,6 @@ static void show_saved_mc(void)
*/
static void save_mc_for_early(u8 *mc, unsigned int size)
{
-#ifdef CONFIG_HOTPLUG_CPU
/* Synchronization during CPU hotplug. */
static DEFINE_MUTEX(x86_cpu_microcode_mutex);

@@ -495,7 +494,6 @@ static void save_mc_for_early(u8 *mc, unsigned int size)
show_saved_mc();

mutex_unlock(&x86_cpu_microcode_mutex);
-#endif
}

static bool load_builtin_intel_microcode(struct cpio_data *cp)
--
2.13.0

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

2018-04-20 10:38:34

by Borislav Petkov

[permalink] [raw]
Subject: [PATCH 2/2] x86/microcode: Do not exit early from __reload_late()

Vitezslav reported a case where the

"Timeout during microcode update!"

panic would hit. After a deeper look, it turned out that his .config had
CONFIG_HOTPLUG_CPU disabled which practically made save_mc_for_early() a
no-op.

When that happened, the discovered microcode patch wasn't saved into our
cache and the late loading path wouldn't find any.

This, then, lead to early exit from __reload_late() and thus CPUs
waiting until the timeout is reached, leading to the panic.

In hindsight, I should've made that function not return before the
post-synchronization. Oh well, I know better now...

Reported-by: Vitezslav Samel <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Ashok Raj <[email protected]>
Cc: <[email protected]>
Fixes: bb8c13d61a62 ("x86/microcode: Fix CPU synchronization routine")
Link: http://lkml.kernel.org/r/[email protected]
---
arch/x86/kernel/cpu/microcode/core.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
index 10c4fc2c91f8..77e201301528 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -564,14 +564,12 @@ static int __reload_late(void *info)
apply_microcode_local(&err);
spin_unlock(&update_lock);

+ /* siblings return UCODE_OK because their engine got updated already */
if (err > UCODE_NFOUND) {
pr_warn("Error reloading microcode on CPU %d\n", cpu);
- return -1;
- /* siblings return UCODE_OK because their engine got updated already */
+ ret = -1;
} else if (err == UCODE_UPDATED || err == UCODE_OK) {
ret = 1;
- } else {
- return ret;
}

/*
--
2.13.0

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

2018-04-20 10:55:57

by Vitezslav Samel

[permalink] [raw]
Subject: Re: [PATCH 1/2] x86/microcode/intel: Save microcode patch unconditionally

On Fri, Apr 20, 2018 at 12:34:28PM +0200, Borislav Petkov wrote:
> save_mc_for_early() was a no-op on !CONFIG_HOTPLUG_CPU but the
> generic_load_microcode() path saves the microcode patches it has found
> into our cache of patches which is used for late loading too. Regardless
> of whether we do CPU hotplug or not.
>
> So make the saving unconditional so that late loading can find the
> proper patch.
>
> Reported-by: Vitezslav Samel <[email protected]>

Tested-by: Vitezslav Samel <[email protected]>

> Signed-off-by: Borislav Petkov <[email protected]>
> Cc: Ashok Raj <[email protected]>
> Cc: <[email protected]> # if it has backported d8c3b52c00a05036e0a6b315b4b17921a7b67997
> Link: http://lkml.kernel.org/r/[email protected]
> ---
> arch/x86/kernel/cpu/microcode/intel.c | 2 --
> 1 file changed, 2 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
> index 32b8e5724f96..1c2cfa0644aa 100644
> --- a/arch/x86/kernel/cpu/microcode/intel.c
> +++ b/arch/x86/kernel/cpu/microcode/intel.c
> @@ -485,7 +485,6 @@ static void show_saved_mc(void)
> */
> static void save_mc_for_early(u8 *mc, unsigned int size)
> {
> -#ifdef CONFIG_HOTPLUG_CPU
> /* Synchronization during CPU hotplug. */
> static DEFINE_MUTEX(x86_cpu_microcode_mutex);
>
> @@ -495,7 +494,6 @@ static void save_mc_for_early(u8 *mc, unsigned int size)
> show_saved_mc();
>
> mutex_unlock(&x86_cpu_microcode_mutex);
> -#endif
> }
>
> static bool load_builtin_intel_microcode(struct cpio_data *cp)
> --
> 2.13.0
>
> SUSE Linux GmbH, GF: Felix Imend?rffer, Jane Smithard, Graham Norton, HRB 21284 (AG N?rnberg)
> --

2018-04-20 10:58:28

by Vitezslav Samel

[permalink] [raw]
Subject: Re: [PATCH 2/2] x86/microcode: Do not exit early from __reload_late()

On Fri, Apr 20, 2018 at 12:37:08PM +0200, Borislav Petkov wrote:
> Vitezslav reported a case where the
>
> "Timeout during microcode update!"
>
> panic would hit. After a deeper look, it turned out that his .config had
> CONFIG_HOTPLUG_CPU disabled which practically made save_mc_for_early() a
> no-op.
>
> When that happened, the discovered microcode patch wasn't saved into our
> cache and the late loading path wouldn't find any.
>
> This, then, lead to early exit from __reload_late() and thus CPUs
> waiting until the timeout is reached, leading to the panic.
>
> In hindsight, I should've made that function not return before the
> post-synchronization. Oh well, I know better now...
>
> Reported-by: Vitezslav Samel <[email protected]>

Tested-by: Vitezslav Samel <[email protected]>

> Signed-off-by: Borislav Petkov <[email protected]>
> Cc: Ashok Raj <[email protected]>
> Cc: <[email protected]>
> Fixes: bb8c13d61a62 ("x86/microcode: Fix CPU synchronization routine")
> Link: http://lkml.kernel.org/r/[email protected]
> ---
> arch/x86/kernel/cpu/microcode/core.c | 6 ++----
> 1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
> index 10c4fc2c91f8..77e201301528 100644
> --- a/arch/x86/kernel/cpu/microcode/core.c
> +++ b/arch/x86/kernel/cpu/microcode/core.c
> @@ -564,14 +564,12 @@ static int __reload_late(void *info)
> apply_microcode_local(&err);
> spin_unlock(&update_lock);
>
> + /* siblings return UCODE_OK because their engine got updated already */
> if (err > UCODE_NFOUND) {
> pr_warn("Error reloading microcode on CPU %d\n", cpu);
> - return -1;
> - /* siblings return UCODE_OK because their engine got updated already */
> + ret = -1;
> } else if (err == UCODE_UPDATED || err == UCODE_OK) {
> ret = 1;
> - } else {
> - return ret;
> }
>
> /*
> --
> 2.13.0
>
> SUSE Linux GmbH, GF: Felix Imend?rffer, Jane Smithard, Graham Norton, HRB 21284 (AG N?rnberg)
> --

2018-04-20 16:01:27

by Ashok Raj

[permalink] [raw]
Subject: Re: [PATCH 1/2] x86/microcode/intel: Save microcode patch unconditionally

On Fri, Apr 20, 2018 at 12:34:28PM +0200, Borislav Petkov wrote:
> save_mc_for_early() was a no-op on !CONFIG_HOTPLUG_CPU but the
> generic_load_microcode() path saves the microcode patches it has found
> into our cache of patches which is used for late loading too. Regardless
> of whether we do CPU hotplug or not.
>
> So make the saving unconditional so that late loading can find the
> proper patch.
>
> Reported-by: Vitezslav Samel <[email protected]>
> Signed-off-by: Borislav Petkov <[email protected]>
> Cc: Ashok Raj <[email protected]>

Tested-by: Ashok Raj <[email protected]>

> Cc: <[email protected]> # if it has backported d8c3b52c00a05036e0a6b315b4b17921a7b67997
> Link: http://lkml.kernel.org/r/[email protected]
> ---

2018-04-20 16:02:14

by Ashok Raj

[permalink] [raw]
Subject: Re: [PATCH 2/2] x86/microcode: Do not exit early from __reload_late()

On Fri, Apr 20, 2018 at 12:37:08PM +0200, Borislav Petkov wrote:
> Vitezslav reported a case where the
>
> "Timeout during microcode update!"
>
> panic would hit. After a deeper look, it turned out that his .config had
> CONFIG_HOTPLUG_CPU disabled which practically made save_mc_for_early() a
> no-op.
>
> When that happened, the discovered microcode patch wasn't saved into our
> cache and the late loading path wouldn't find any.
>
> This, then, lead to early exit from __reload_late() and thus CPUs
> waiting until the timeout is reached, leading to the panic.
>
> In hindsight, I should've made that function not return before the
> post-synchronization. Oh well, I know better now...
>
> Reported-by: Vitezslav Samel <[email protected]>
> Signed-off-by: Borislav Petkov <[email protected]>

Tested-by: Ashok Raj <[email protected]>

> Cc: Ashok Raj <[email protected]>
> Cc: <[email protected]>
> Fixes: bb8c13d61a62 ("x86/microcode: Fix CPU synchronization routine")
> Link: http://lkml.kernel.org/r/[email protected]
> ---