2018-01-15 13:12:24

by Jia Zhang

[permalink] [raw]
Subject: [PATCH v2] x86/microcode/intel: Extend BDW late-loading with LLC size check

The commit b94b73733171
("x86/microcode/intel: Extend BDW late-loading with a revision check")
reduces the impact of erratum BDF90 for Broadwell process model.
Actually, the impact can be reduced further through adding the checks
for the size of LLC per core.

For more details, see erratum BDF90 in document #334165 (Intel Xeon
Processor E7-8800/4800 v4 Product Family Specification Update) from
September 2017.

Signed-off-by: Jia Zhang <[email protected]>
---
arch/x86/kernel/cpu/microcode/intel.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index d9e460f..9143cf2 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -906,18 +906,29 @@ static int get_ucode_fw(void *to, const void *from, size_t n)
return 0;
}

+static int llc_size_per_core(struct cpuinfo_x86 *c)
+{
+ u64 llc_size = c->x86_cache_size * 1024;
+
+ do_div(llc_size, c->x86_max_cores);
+
+ return (int)llc_size;
+}
+
static bool is_blacklisted(unsigned int cpu)
{
struct cpuinfo_x86 *c = &cpu_data(cpu);

/*
* Late loading on model 79 with microcode revision less than 0x0b000021
- * may result in a system hang. This behavior is documented in item
- * BDF90, #334165 (Intel Xeon Processor E7-8800/4800 v4 Product Family).
+ * and LLC size per core bigger than 2.5MB may result in a system hang.
+ * This behavior is documented in item BDF90, #334165 (Intel Xeon
+ * Processor E7-8800/4800 v4 Product Family).
*/
if (c->x86 == 6 &&
c->x86_model == INTEL_FAM6_BROADWELL_X &&
c->x86_mask == 0x01 &&
+ llc_size_per_core(c) > 2621440 &&
c->microcode < 0x0b000021) {
pr_err_once("Erratum BDF90: late loading with revision < 0x0b000021 (0x%x) disabled.\n", c->microcode);
pr_err_once("Please consider either early loading through initrd/built-in or a potential BIOS update.\n");
--
1.8.3.1


2018-01-15 18:46:20

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v2] x86/microcode/intel: Extend BDW late-loading with LLC size check

On Mon, Jan 15, 2018 at 09:11:57PM +0800, Jia Zhang wrote:
> The commit b94b73733171
> ("x86/microcode/intel: Extend BDW late-loading with a revision check")
> reduces the impact of erratum BDF90 for Broadwell process model.
> Actually, the impact can be reduced further through adding the checks
> for the size of LLC per core.
>
> For more details, see erratum BDF90 in document #334165 (Intel Xeon
> Processor E7-8800/4800 v4 Product Family Specification Update) from
> September 2017.
>
> Signed-off-by: Jia Zhang <[email protected]>
> ---
> arch/x86/kernel/cpu/microcode/intel.c | 15 +++++++++++++--
> 1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
> index d9e460f..9143cf2 100644
> --- a/arch/x86/kernel/cpu/microcode/intel.c
> +++ b/arch/x86/kernel/cpu/microcode/intel.c
> @@ -906,18 +906,29 @@ static int get_ucode_fw(void *to, const void *from, size_t n)
> return 0;
> }
>
> +static int llc_size_per_core(struct cpuinfo_x86 *c)
> +{
> + u64 llc_size = c->x86_cache_size * 1024;
> +
> + do_div(llc_size, c->x86_max_cores);

This is done per-CPU - I don't want it to do the same division for each
core. Do it once at driver init only for that model and cache it.

> +
> + return (int)llc_size;
> +}
> +
> static bool is_blacklisted(unsigned int cpu)
> {
> struct cpuinfo_x86 *c = &cpu_data(cpu);
>
> /*
> * Late loading on model 79 with microcode revision less than 0x0b000021
> - * may result in a system hang. This behavior is documented in item
> - * BDF90, #334165 (Intel Xeon Processor E7-8800/4800 v4 Product Family).
> + * and LLC size per core bigger than 2.5MB may result in a system hang.
> + * This behavior is documented in item BDF90, #334165 (Intel Xeon
> + * Processor E7-8800/4800 v4 Product Family).
> */
> if (c->x86 == 6 &&
> c->x86_model == INTEL_FAM6_BROADWELL_X &&
> c->x86_mask == 0x01 &&
> + llc_size_per_core(c) > 2621440 &&

I'm not taking this: this looks like a bunch of voodoo magic numbers.
Please get someone from Intel to explain first.

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

2018-01-16 01:15:08

by Jia Zhang

[permalink] [raw]
Subject: Re: [PATCH v2] x86/microcode/intel: Extend BDW late-loading with LLC size check



在 2018/1/16 上午2:46, Borislav Petkov 写道:
> On Mon, Jan 15, 2018 at 09:11:57PM +0800, Jia Zhang wrote:
>> The commit b94b73733171
>> ("x86/microcode/intel: Extend BDW late-loading with a revision check")
>> reduces the impact of erratum BDF90 for Broadwell process model.
>> Actually, the impact can be reduced further through adding the checks
>> for the size of LLC per core.
>>
>> For more details, see erratum BDF90 in document #334165 (Intel Xeon
>> Processor E7-8800/4800 v4 Product Family Specification Update) from
>> September 2017.
>>
>> Signed-off-by: Jia Zhang <[email protected]>
>> ---
>> arch/x86/kernel/cpu/microcode/intel.c | 15 +++++++++++++--
>> 1 file changed, 13 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
>> index d9e460f..9143cf2 100644
>> --- a/arch/x86/kernel/cpu/microcode/intel.c
>> +++ b/arch/x86/kernel/cpu/microcode/intel.c
>> @@ -906,18 +906,29 @@ static int get_ucode_fw(void *to, const void *from, size_t n)
>> return 0;
>> }
>>
>> +static int llc_size_per_core(struct cpuinfo_x86 *c)
>> +{
>> + u64 llc_size = c->x86_cache_size * 1024;
>> +
>> + do_div(llc_size, c->x86_max_cores);
>
> This is done per-CPU - I don't want it to do the same division for each
> core. Do it once at driver init only for that model and cache it.

How about this?

--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -908,10 +908,13 @@ static int get_ucode_fw(void *to, const void
*from, size_t n)

static int llc_size_per_core(struct cpuinfo_x86 *c)
{
- u64 llc_size = c->x86_cache_size * 1024;
-
- do_div(llc_size, c->x86_max_cores);
+ static u64 llc_size;

+ if (unlikely(!llc_size)) {
+ llc_size = c->x86_cache_size * 1024;
+ do_div(llc_size, c->x86_max_cores);
+ }
+
return (int)llc_size;
}


or driver init style?

@@ -996,5 +999,7 @@ struct microcode_ops * __init init_intel_microcode(void)
return NULL;
}

+ llc_size_per_core = calc_llc_size_per_core(c);
+
return &microcode_intel_ops;
}

or more generic?

@@ -984,6 +987,7 @@ static int get_ucode_user(void *to, const void
*from, size_t n)
.request_microcode_fw = request_microcode_fw,
.collect_cpu_info = collect_cpu_info,
.apply_microcode = apply_microcode_intel,
+ .is_blacklisted = is_blacklisted,
};

>
>> +
>> + return (int)llc_size;
>> +}
>> +
>> static bool is_blacklisted(unsigned int cpu)
>> {
>> struct cpuinfo_x86 *c = &cpu_data(cpu);
>>
>> /*
>> * Late loading on model 79 with microcode revision less than 0x0b000021
>> - * may result in a system hang. This behavior is documented in item
>> - * BDF90, #334165 (Intel Xeon Processor E7-8800/4800 v4 Product Family).
>> + * and LLC size per core bigger than 2.5MB may result in a system hang.
>> + * This behavior is documented in item BDF90, #334165 (Intel Xeon
>> + * Processor E7-8800/4800 v4 Product Family).
>> */
>> if (c->x86 == 6 &&
>> c->x86_model == INTEL_FAM6_BROADWELL_X &&
>> c->x86_mask == 0x01 &&
>> + llc_size_per_core(c) > 2621440 &&
>
> I'm not taking this: this looks like a bunch of voodoo magic numbers.
> Please get someone from Intel to explain first.

Tony, could you clarify this?

Thanks,
Jia

>

2018-01-16 14:59:39

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v2] x86/microcode/intel: Extend BDW late-loading with LLC size check

On Tue, Jan 16, 2018 at 09:14:44AM +0800, Jia Zhang wrote:
> or driver init style?
>
> @@ -996,5 +999,7 @@ struct microcode_ops * __init init_intel_microcode(void)
> return NULL;
> }
>
> + llc_size_per_core = calc_llc_size_per_core(c);
> +
> return &microcode_intel_ops;
> }

Yes, that looks ok.

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

2018-01-16 17:03:02

by Luck, Tony

[permalink] [raw]
Subject: RE: [PATCH v2] x86/microcode/intel: Extend BDW late-loading with LLC size check

>> I'm not taking this: this looks like a bunch of voodoo magic numbers.
>> Please get someone from Intel to explain first.
>
> Tony, could you clarify this?

Jia,

I'll look for someone who can confirm the 2.5MB/core detail.

-Tony

2018-01-16 17:24:30

by Luck, Tony

[permalink] [raw]
Subject: RE: [PATCH v2] x86/microcode/intel: Extend BDW late-loading with LLC size check

> I'll look for someone who can confirm the 2.5MB/core detail.

Ok ... re-read the erratum. The 2.5MB/core is clear. The E5+E7 is clear.

No mention of the platform ID, but Jia is dropping that part.

Boris ... what specific questions remain?

-Tony


2018-01-16 20:01:57

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v2] x86/microcode/intel: Extend BDW late-loading with LLC size check

On Tue, Jan 16, 2018 at 05:24:27PM +0000, Luck, Tony wrote:
> > I'll look for someone who can confirm the 2.5MB/core detail.
>
> Ok ... re-read the erratum. The 2.5MB/core is clear. The E5+E7 is clear.
>
> No mention of the platform ID, but Jia is dropping that part.
>
> Boris ... what specific questions remain?

This magic:

llc_size_per_core(c) > 2621440

as a reliable detection characteristic whether the patch is good to
apply late. There must be a more reliable way to detect that.

Also, the testing order is:

llc_size_per_core(c) > 2621440 &&
c->microcode < 0x0b000021) {

so if the LLC size per core check fails, the microcode revision being <
0x0b000021 doesn't matter. I.e., on machines with LLC-per-core < 2.5M,
we can update even with revisions < 0x0b000021.

Is that ordering correct?

Also, this heuristic is not documented in the public doc AFAICT - I'm
guessing that'll change soon...?

Thx.

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

2018-01-16 20:12:00

by Luck, Tony

[permalink] [raw]
Subject: Re: [PATCH v2] x86/microcode/intel: Extend BDW late-loading with LLC size check

On Tue, Jan 16, 2018 at 09:01:49PM +0100, Borislav Petkov wrote:
> On Tue, Jan 16, 2018 at 05:24:27PM +0000, Luck, Tony wrote:
> > > I'll look for someone who can confirm the 2.5MB/core detail.
> >
> > Ok ... re-read the erratum. The 2.5MB/core is clear. The E5+E7 is clear.
> >
> > No mention of the platform ID, but Jia is dropping that part.
> >
> > Boris ... what specific questions remain?
>
> This magic:
>
> llc_size_per_core(c) > 2621440
>
> as a reliable detection characteristic whether the patch is good to
> apply late. There must be a more reliable way to detect that.
>
> Also, the testing order is:
>
> llc_size_per_core(c) > 2621440 &&
> c->microcode < 0x0b000021) {
>
> so if the LLC size per core check fails, the microcode revision being <
> 0x0b000021 doesn't matter. I.e., on machines with LLC-per-core < 2.5M,
> we can update even with revisions < 0x0b000021.
>
> Is that ordering correct?

I think so. The erratum (see below) says the problem only occurs
on the large-cache SKUs. So we only need to avoid the update if
we are on a big cache SKU that is also running old microcode.

> Also, this heuristic is not documented in the public doc AFAICT - I'm
> guessing that'll change soon...?

Here's what I see in the public doc. for BDF90:

Problem: An uncorrectable error (IA32_MC3_STATUS.MCACOD=0400 and
IA32_MC3_STATUS.MSCOD=0080) may be logged for processors that have more
than 2.5MB last-level-cache per core on attempting to load a microcode
update or execute an authenticated code module. This issue does not
occur with microcode updates with a signature of 0x0b000021 and greater.

-Tony

2018-01-16 20:50:47

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v2] x86/microcode/intel: Extend BDW late-loading with LLC size check

On Tue, Jan 16, 2018 at 12:11:58PM -0800, Luck, Tony wrote:
> I think so. The erratum (see below) says the problem only occurs
> on the large-cache SKUs. So we only need to avoid the update if
> we are on a big cache SKU that is also running old microcode.

... and there's not a more reliable way to detect those like platform ID
or so? Because if for anywhere, this is where one *should* use platform
ID.

Or perhaps some other bit somewhere instead of this cache size thing?

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

2018-01-16 21:30:21

by Luck, Tony

[permalink] [raw]
Subject: Re: [PATCH v2] x86/microcode/intel: Extend BDW late-loading with LLC size check

On Tue, Jan 16, 2018 at 09:50:37PM +0100, Borislav Petkov wrote:
> ... and there's not a more reliable way to detect those like platform ID
> or so? Because if for anywhere, this is where one *should* use platform
> ID.
>
> Or perhaps some other bit somewhere instead of this cache size thing?

I could get you a list of model numbers that you can check against
model_name. But that seems way worse. Especially as the 2.5MB thing
is what is called out in the erratum.

-Tony

2018-01-16 21:51:38

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v2] x86/microcode/intel: Extend BDW late-loading with LLC size check

On Tue, Jan 16, 2018 at 01:30:19PM -0800, Luck, Tony wrote:
> I could get you a list of model numbers that you can check against
> model_name.

Yeah, we're not doing that again. :)

> But that seems way worse. Especially as the 2.5MB thing is what is
> called out in the erratum.

Oh well.

Thx.

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

2018-01-19 00:30:35

by Jia Zhang

[permalink] [raw]
Subject: Re: [PATCH v2] x86/microcode/intel: Extend BDW late-loading with LLC size check



在 2018/1/17 上午5:51, Borislav Petkov 写道:
> On Tue, Jan 16, 2018 at 01:30:19PM -0800, Luck, Tony wrote:
>> I could get you a list of model numbers that you can check against
>> model_name.
>
> Yeah, we're not doing that again. :)
>
>> But that seems way worse. Especially as the 2.5MB thing is what is
>> called out in the erratum.
>
> Oh well.

I will send a v3 and it may be pending till Tony gives a list.

Thanks,
Jia

>
> Thx.
>