2024-03-24 19:06:12

by Xi Ruoyao

[permalink] [raw]
Subject: [PATCH v2] x86/mm: Don't disable INVLPG if "incomplete Global INVLPG flushes" is fixed by microcode

Per the "Processor Specification Update" documentations referred by the
intel-microcode-20240312 release note, this microcode release has fixed
the issue for all affected models.

So don't disable INVLPG if the microcode is new enough.

Cc: Dave Hansen <[email protected]>
Link: https://lore.kernel.org/all/168436059559.404.13934972543631851306.tip-bot2@tip-bot2/
Link: https://github.com/intel/Intel-Linux-Processor-Microcode-Data-Files/releases/tag/microcode-20240312
Link: https://cdrdv2.intel.com/v1/dl/getContent/740518 # RPL042, rev. 13
Link: https://cdrdv2.intel.com/v1/dl/getContent/682436 # ADL063, rev. 24
Signed-off-by: Xi Ruoyao <[email protected]>
---
arch/x86/mm/init.c | 32 ++++++++++++++++++++------------
1 file changed, 20 insertions(+), 12 deletions(-)

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 679893ea5e68..c52be4e66e44 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -261,33 +261,41 @@ static void __init probe_page_size_mask(void)
}
}

-#define INTEL_MATCH(_model) { .vendor = X86_VENDOR_INTEL, \
- .family = 6, \
- .model = _model, \
- }
+#define INTEL_MATCH(_model, _fixed_microcode) \
+ { .vendor = X86_VENDOR_INTEL, \
+ .family = 6, \
+ .model = _model, \
+ .driver_data = _fixed_microcode, \
+ }
+
/*
* INVLPG may not properly flush Global entries
- * on these CPUs when PCIDs are enabled.
+ * on these CPUs when PCIDs are enabled and the
+ * microcode is not updated to fix the issue.
*/
static const struct x86_cpu_id invlpg_miss_ids[] = {
- INTEL_MATCH(INTEL_FAM6_ALDERLAKE ),
- INTEL_MATCH(INTEL_FAM6_ALDERLAKE_L ),
- INTEL_MATCH(INTEL_FAM6_ATOM_GRACEMONT ),
- INTEL_MATCH(INTEL_FAM6_RAPTORLAKE ),
- INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_P),
- INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_S),
+ INTEL_MATCH(INTEL_FAM6_ALDERLAKE, 0x34),
+ INTEL_MATCH(INTEL_FAM6_ALDERLAKE_L, 0x432),
+ INTEL_MATCH(INTEL_FAM6_ATOM_GRACEMONT, 0x15),
+ INTEL_MATCH(INTEL_FAM6_RAPTORLAKE, 0x122),
+ INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_P, 0x4121),
+ INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_S, 0x34),
{}
};

static void setup_pcid(void)
{
+ const struct x86_cpu_id *invlpg_miss_match;
+
if (!IS_ENABLED(CONFIG_X86_64))
return;

if (!boot_cpu_has(X86_FEATURE_PCID))
return;

- if (x86_match_cpu(invlpg_miss_ids)) {
+ invlpg_miss_match = x86_match_cpu(invlpg_miss_ids);
+ if (invlpg_miss_match &&
+ invlpg_miss_match->driver_data > boot_cpu_data.microcode) {
pr_info("Incomplete global flushes, disabling PCID");
setup_clear_cpu_cap(X86_FEATURE_PCID);
return;
--
2.44.0



2024-03-25 12:54:11

by Michael Kelley

[permalink] [raw]
Subject: RE: [PATCH v2] x86/mm: Don't disable INVLPG if "incomplete Global INVLPG flushes" is fixed by microcode

From: Xi Ruoyao <[email protected]> Sent: Sunday, March 24, 2024 12:05 PM
>
> Per the "Processor Specification Update" documentations referred by the
> intel-microcode-20240312 release note, this microcode release has fixed
> the issue for all affected models.
>
> So don't disable INVLPG if the microcode is new enough.
>
> Cc: Dave Hansen <[email protected]>
> Signed-off-by: Xi Ruoyao <[email protected]>
> ---
> arch/x86/mm/init.c | 32 ++++++++++++++++++++------------
> 1 file changed, 20 insertions(+), 12 deletions(-)
>
> diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
> index 679893ea5e68..c52be4e66e44 100644
> --- a/arch/x86/mm/init.c
> +++ b/arch/x86/mm/init.c
> @@ -261,33 +261,41 @@ static void __init probe_page_size_mask(void)
> }
> }
>
> -#define INTEL_MATCH(_model) { .vendor = X86_VENDOR_INTEL, \
> - .family = 6, \
> - .model = _model, \
> - }
> +#define INTEL_MATCH(_model, _fixed_microcode) \
> + { .vendor = X86_VENDOR_INTEL, \
> + .family = 6, \
> + .model = _model, \
> + .driver_data = _fixed_microcode, \
> + }
> +
> /*
> * INVLPG may not properly flush Global entries
> - * on these CPUs when PCIDs are enabled.
> + * on these CPUs when PCIDs are enabled and the
> + * microcode is not updated to fix the issue.
> */
> static const struct x86_cpu_id invlpg_miss_ids[] = {
> - INTEL_MATCH(INTEL_FAM6_ALDERLAKE ),
> - INTEL_MATCH(INTEL_FAM6_ALDERLAKE_L ),
> - INTEL_MATCH(INTEL_FAM6_ATOM_GRACEMONT ),
> - INTEL_MATCH(INTEL_FAM6_RAPTORLAKE ),
> - INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_P),
> - INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_S),
> + INTEL_MATCH(INTEL_FAM6_ALDERLAKE, 0x34),
> + INTEL_MATCH(INTEL_FAM6_ALDERLAKE_L, 0x432),
> + INTEL_MATCH(INTEL_FAM6_ATOM_GRACEMONT, 0x15),
> + INTEL_MATCH(INTEL_FAM6_RAPTORLAKE, 0x122),
> + INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_P, 0x4121),
> + INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_S, 0x34),
> {}
> };
>
> static void setup_pcid(void)
> {
> + const struct x86_cpu_id *invlpg_miss_match;
> +
> if (!IS_ENABLED(CONFIG_X86_64))
> return;
>
> if (!boot_cpu_has(X86_FEATURE_PCID))
> return;
>
> - if (x86_match_cpu(invlpg_miss_ids)) {
> + invlpg_miss_match = x86_match_cpu(invlpg_miss_ids);
> + if (invlpg_miss_match &&
> + invlpg_miss_match->driver_data > boot_cpu_data.microcode) {
> pr_info("Incomplete global flushes, disabling PCID");
> setup_clear_cpu_cap(X86_FEATURE_PCID);
> return;

As noted in similar places where microcode versions are
checked, hypervisors often lie to guests about microcode versions.
For example, see comments in bad_spectre_microcode(). I
know Hyper-V guests always see the microcode version as
0xFFFFFFFF (max u32 value). So in a Hyper-V guest the above
code will always leave PCID enabled.

Maybe the above should have a check for running on a
hypervisor and always disable PCID without checking the
microcode version. That's the safe approach, though there are
other similar cases like bad_spectre_microcode() that take
the unsafe approach when running as a guest. I don't know
what's best here .....

Michael

2024-03-25 14:24:01

by Xi Ruoyao

[permalink] [raw]
Subject: Re: [PATCH v2] x86/mm: Don't disable INVLPG if "incomplete Global INVLPG flushes" is fixed by microcode

On Mon, 2024-03-25 at 04:57 +0000, Michael Kelley wrote:
> From: Xi Ruoyao <[email protected]> Sent: Sunday, March 24, 2024 12:05 PM
> >
> > Per the "Processor Specification Update" documentations referred by the
> > intel-microcode-20240312 release note, this microcode release has fixed
> > the issue for all affected models.
> >
> > So don't disable INVLPG if the microcode is new enough.
> >
> > Cc: Dave Hansen <[email protected]>
> > Signed-off-by: Xi Ruoyao <[email protected]>
> > ---
> >  arch/x86/mm/init.c | 32 ++++++++++++++++++++------------
> >  1 file changed, 20 insertions(+), 12 deletions(-)
> >
> > diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
> > index 679893ea5e68..c52be4e66e44 100644
> > --- a/arch/x86/mm/init.c
> > +++ b/arch/x86/mm/init.c
> > @@ -261,33 +261,41 @@ static void __init probe_page_size_mask(void)
> >   }
> >  }
> >
> > -#define INTEL_MATCH(_model) { .vendor  = X86_VENDOR_INTEL, \
> > -       .family  = 6, \
> > -       .model = _model, \
> > -     }
> > +#define INTEL_MATCH(_model, _fixed_microcode) \
> > +    { .vendor = X86_VENDOR_INTEL, \
> > +      .family = 6, \
> > +      .model = _model, \
> > +      .driver_data = _fixed_microcode, \
> > +    }
> > +
> >  /*
> >   * INVLPG may not properly flush Global entries
> > - * on these CPUs when PCIDs are enabled.
> > + * on these CPUs when PCIDs are enabled and the
> > + * microcode is not updated to fix the issue.
> >   */
> >  static const struct x86_cpu_id invlpg_miss_ids[] = {
> > - INTEL_MATCH(INTEL_FAM6_ALDERLAKE   ),
> > - INTEL_MATCH(INTEL_FAM6_ALDERLAKE_L ),
> > - INTEL_MATCH(INTEL_FAM6_ATOM_GRACEMONT ),
> > - INTEL_MATCH(INTEL_FAM6_RAPTORLAKE  ),
> > - INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_P),
> > - INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_S),
> > + INTEL_MATCH(INTEL_FAM6_ALDERLAKE, 0x34),
> > + INTEL_MATCH(INTEL_FAM6_ALDERLAKE_L, 0x432),
> > + INTEL_MATCH(INTEL_FAM6_ATOM_GRACEMONT, 0x15),
> > + INTEL_MATCH(INTEL_FAM6_RAPTORLAKE, 0x122),
> > + INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_P, 0x4121),
> > + INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_S, 0x34),
> >   {}
> >  };
> >
> >  static void setup_pcid(void)
> >  {
> > + const struct x86_cpu_id *invlpg_miss_match;
> > +
> >   if (!IS_ENABLED(CONFIG_X86_64))
> >   return;
> >
> >   if (!boot_cpu_has(X86_FEATURE_PCID))
> >   return;
> >
> > - if (x86_match_cpu(invlpg_miss_ids)) {
> > + invlpg_miss_match = x86_match_cpu(invlpg_miss_ids);
> > + if (invlpg_miss_match &&
> > +     invlpg_miss_match->driver_data > boot_cpu_data.microcode) {
> >   pr_info("Incomplete global flushes, disabling PCID");
> >   setup_clear_cpu_cap(X86_FEATURE_PCID);
> >   return;
>
> As noted in similar places where microcode versions are
> checked, hypervisors often lie to guests about microcode versions.
> For example, see comments in bad_spectre_microcode().  I
> know Hyper-V guests always see the microcode version as
> 0xFFFFFFFF (max u32 value).  So in a Hyper-V guest the above
> code will always leave PCID enabled.
>
> Maybe the above should have a check for running on a
> hypervisor and always disable PCID without checking the
> microcode version.  That's the safe approach, though there are
> other similar cases like bad_spectre_microcode() that take
> the unsafe approach when running as a guest.  I don't know
> what's best here .....

Then generally maybe we should set boot_cpu_data.microcode to 0 if a
hypervisor is detected? Or checking against hypervisors at every place
where we check the microcode revision will be nasty.

And I'd prefer they say 0 instead if 0xffffffffu if they must lie about
the microcode revision.

--
Xi Ruoyao <[email protected]>
School of Aerospace Science and Technology, Xidian University

2024-03-25 20:07:09

by Michael Kelley

[permalink] [raw]
Subject: RE: [PATCH v2] x86/mm: Don't disable INVLPG if "incomplete Global INVLPG flushes" is fixed by microcode

From: Xi Ruoyao <[email protected]> Sent: Monday, March 25, 2024 3:21 AM
>
> On Mon, 2024-03-25 at 04:57 +0000, Michael Kelley wrote:
> > From: Xi Ruoyao <[email protected]> Sent: Sunday, March 24, 2024
> 12:05 PM
> > >
> > > Per the "Processor Specification Update" documentations referred by the
> > > intel-microcode-20240312 release note, this microcode release has fixed
> > > the issue for all affected models.
> > >
> > > So don't disable INVLPG if the microcode is new enough.
> > >
> > > Cc: Dave Hansen <[email protected]>
> > > Signed-off-by: Xi Ruoyao <[email protected]>
> > > ---
> > >  arch/x86/mm/init.c | 32 ++++++++++++++++++++------------
> > >  1 file changed, 20 insertions(+), 12 deletions(-)
> > >
> > > diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
> > > index 679893ea5e68..c52be4e66e44 100644
> > > --- a/arch/x86/mm/init.c
> > > +++ b/arch/x86/mm/init.c
> > > @@ -261,33 +261,41 @@ static void __init probe_page_size_mask(void)
> > >   }
> > >  }
> > >
> > > -#define INTEL_MATCH(_model) { .vendor  = X86_VENDOR_INTEL, \
> > > -       .family  = 6, \
> > > -       .model = _model, \
> > > -     }
> > > +#define INTEL_MATCH(_model, _fixed_microcode) \
> > > +    { .vendor = X86_VENDOR_INTEL, \
> > > +      .family = 6, \
> > > +      .model = _model, \
> > > +      .driver_data = _fixed_microcode, \
> > > +    }
> > > +
> > >  /*
> > >   * INVLPG may not properly flush Global entries
> > > - * on these CPUs when PCIDs are enabled.
> > > + * on these CPUs when PCIDs are enabled and the
> > > + * microcode is not updated to fix the issue.
> > >   */
> > >  static const struct x86_cpu_id invlpg_miss_ids[] = {
> > > - INTEL_MATCH(INTEL_FAM6_ALDERLAKE   ),
> > > - INTEL_MATCH(INTEL_FAM6_ALDERLAKE_L ),
> > > - INTEL_MATCH(INTEL_FAM6_ATOM_GRACEMONT ),
> > > - INTEL_MATCH(INTEL_FAM6_RAPTORLAKE  ),
> > > - INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_P),
> > > - INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_S),
> > > + INTEL_MATCH(INTEL_FAM6_ALDERLAKE, 0x34),
> > > + INTEL_MATCH(INTEL_FAM6_ALDERLAKE_L, 0x432),
> > > + INTEL_MATCH(INTEL_FAM6_ATOM_GRACEMONT, 0x15),
> > > + INTEL_MATCH(INTEL_FAM6_RAPTORLAKE, 0x122),
> > > + INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_P, 0x4121),
> > > + INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_S, 0x34),
> > >   {}
> > >  };
> > >
> > >  static void setup_pcid(void)
> > >  {
> > > + const struct x86_cpu_id *invlpg_miss_match;
> > > +
> > >   if (!IS_ENABLED(CONFIG_X86_64))
> > >   return;
> > >
> > >   if (!boot_cpu_has(X86_FEATURE_PCID))
> > >   return;
> > >
> > > - if (x86_match_cpu(invlpg_miss_ids)) {
> > > + invlpg_miss_match = x86_match_cpu(invlpg_miss_ids);
> > > + if (invlpg_miss_match &&
> > > +     invlpg_miss_match->driver_data > boot_cpu_data.microcode) {
> > >   pr_info("Incomplete global flushes, disabling PCID");
> > >   setup_clear_cpu_cap(X86_FEATURE_PCID);
> > >   return;
> >
> > As noted in similar places where microcode versions are
> > checked, hypervisors often lie to guests about microcode versions.
> > For example, see comments in bad_spectre_microcode().  I
> > know Hyper-V guests always see the microcode version as
> > 0xFFFFFFFF (max u32 value).  So in a Hyper-V guest the above
> > code will always leave PCID enabled.
> >
> > Maybe the above should have a check for running on a
> > hypervisor and always disable PCID without checking the
> > microcode version.  That's the safe approach, though there are
> > other similar cases like bad_spectre_microcode() that take
> > the unsafe approach when running as a guest.  I don't know
> > what's best here .....
>
> Then generally maybe we should set boot_cpu_data.microcode to 0 if a
> hypervisor is detected? Or checking against hypervisors at every place
> where we check the microcode revision will be nasty.

I haven't done a complete census, but there don't seem to be
that many places in x86 code where the microcode version is
checked. Several (most?) already have some kind of "out" when
running on a hypervisor -- like bad_spectre_microcode(), and
apic_validate_deadline_timer().

I've commented above on what Hyper-V does, but I don't know
what other hypervisors do. The comment in bad_spectre_microcode()
seems to imply that the problem is widespread, and perhaps
not just a Hyper-V thing. But I don’t know that for sure. If we
set boot_cpu_data.microcode to 0, we'll need to reason about
the implications because that will certainly flip the outcome of
any comparisons for Hyper-V guests and perhaps other
hypervisor guests as well.

>
> And I'd prefer they say 0 instead if 0xffffffffu if they must lie about
> the microcode revision.

I don't know. At least for Hyper-V, that ship has sailed. I don't
have a way to get Hyper-V to make a change. Making a change
would also introduce a backward compatibility problem that can't
easily be handled.

Michael

>
> --
> Xi Ruoyao <[email protected]>
> School of Aerospace Science and Technology, Xidian University

2024-03-25 23:50:41

by Pawan Gupta

[permalink] [raw]
Subject: Re: [PATCH v2] x86/mm: Don't disable INVLPG if "incomplete Global INVLPG flushes" is fixed by microcode

On Mon, Mar 25, 2024 at 03:05:03AM +0800, Xi Ruoyao wrote:
> Per the "Processor Specification Update" documentations referred by the
> intel-microcode-20240312 release note, this microcode release has fixed
> the issue for all affected models.
>
> So don't disable INVLPG if the microcode is new enough.
>
> Cc: Dave Hansen <[email protected]>
> Link: https://lore.kernel.org/all/168436059559.404.13934972543631851306.tip-bot2@tip-bot2/
> Link: https://github.com/intel/Intel-Linux-Processor-Microcode-Data-Files/releases/tag/microcode-20240312
> Link: https://cdrdv2.intel.com/v1/dl/getContent/740518 # RPL042, rev. 13
> Link: https://cdrdv2.intel.com/v1/dl/getContent/682436 # ADL063, rev. 24
> Signed-off-by: Xi Ruoyao <[email protected]>
> ---
> arch/x86/mm/init.c | 32 ++++++++++++++++++++------------
> 1 file changed, 20 insertions(+), 12 deletions(-)
>
> diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
> index 679893ea5e68..c52be4e66e44 100644
> --- a/arch/x86/mm/init.c
> +++ b/arch/x86/mm/init.c
> @@ -261,33 +261,41 @@ static void __init probe_page_size_mask(void)
> }
> }
>
> -#define INTEL_MATCH(_model) { .vendor = X86_VENDOR_INTEL, \
> - .family = 6, \
> - .model = _model, \
> - }
> +#define INTEL_MATCH(_model, _fixed_microcode) \
> + { .vendor = X86_VENDOR_INTEL, \
> + .family = 6, \
> + .model = _model, \
> + .driver_data = _fixed_microcode, \
> + }

Checkpatch is complaining here:

WARNING: please, no spaces at the start of a line
#33: FILE: arch/x86/mm/init.c:265:
+ { .vendor^I^I= X86_VENDOR_INTEL,^I\$

..
total: 0 errors, 5 warnings, 53 lines checked

> +
> /*
> * INVLPG may not properly flush Global entries
> - * on these CPUs when PCIDs are enabled.
> + * on these CPUs when PCIDs are enabled and the
> + * microcode is not updated to fix the issue.
> */
> static const struct x86_cpu_id invlpg_miss_ids[] = {
> - INTEL_MATCH(INTEL_FAM6_ALDERLAKE ),
> - INTEL_MATCH(INTEL_FAM6_ALDERLAKE_L ),
> - INTEL_MATCH(INTEL_FAM6_ATOM_GRACEMONT ),
> - INTEL_MATCH(INTEL_FAM6_RAPTORLAKE ),
> - INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_P),
> - INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_S),
> + INTEL_MATCH(INTEL_FAM6_ALDERLAKE, 0x34),
> + INTEL_MATCH(INTEL_FAM6_ALDERLAKE_L, 0x432),
> + INTEL_MATCH(INTEL_FAM6_ATOM_GRACEMONT, 0x15),
> + INTEL_MATCH(INTEL_FAM6_RAPTORLAKE, 0x122),
> + INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_P, 0x4121),
> + INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_S, 0x34),

I have checked this internally, the minimum microcode version that fixed
this issue are as below:

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index c52be4e66e44..d72ad330f2fc 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -274,12 +274,12 @@ static void __init probe_page_size_mask(void)
* microcode is not updated to fix the issue.
*/
static const struct x86_cpu_id invlpg_miss_ids[] = {
- INTEL_MATCH(INTEL_FAM6_ALDERLAKE, 0x34),
- INTEL_MATCH(INTEL_FAM6_ALDERLAKE_L, 0x432),
- INTEL_MATCH(INTEL_FAM6_ATOM_GRACEMONT, 0x15),
- INTEL_MATCH(INTEL_FAM6_RAPTORLAKE, 0x122),
- INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_P, 0x4121),
- INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_S, 0x34),
+ INTEL_MATCH(INTEL_FAM6_ALDERLAKE, 0x2e),
+ INTEL_MATCH(INTEL_FAM6_ALDERLAKE_L, 0x42c),
+ INTEL_MATCH(INTEL_FAM6_ATOM_GRACEMONT, 0x11),
+ INTEL_MATCH(INTEL_FAM6_RAPTORLAKE, 0x118),
+ INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_P, 0x4117),
+ INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_S, 0x2e),
{}
};


2024-03-26 00:48:13

by Xi Ruoyao

[permalink] [raw]
Subject: Re: [PATCH v2] x86/mm: Don't disable INVLPG if "incomplete Global INVLPG flushes" is fixed by microcode

On Mon, 2024-03-25 at 20:06 +0000, Michael Kelley wrote:

/* snip */

> I haven't done a complete census, but there don't seem to be
> that many places in x86 code where the microcode version is
> checked. Several (most?) already have some kind of "out" when
> running on a hypervisor -- like bad_spectre_microcode(), and
> apic_validate_deadline_timer().

So I've sent V3 as
https://lore.kernel.org/all/[email protected]/
with the check added.

--
Xi Ruoyao <[email protected]>
School of Aerospace Science and Technology, Xidian University

2024-04-04 16:19:00

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v2] x86/mm: Don't disable INVLPG if "incomplete Global INVLPG flushes" is fixed by microcode

On Mon, Mar 25, 2024, Michael Kelley wrote:
> > static void setup_pcid(void)
> > {
> > + const struct x86_cpu_id *invlpg_miss_match;
> > +
> > if (!IS_ENABLED(CONFIG_X86_64))
> > return;
> >
> > if (!boot_cpu_has(X86_FEATURE_PCID))
> > return;
> >
> > - if (x86_match_cpu(invlpg_miss_ids)) {
> > + invlpg_miss_match = x86_match_cpu(invlpg_miss_ids);
> > + if (invlpg_miss_match &&
> > + invlpg_miss_match->driver_data > boot_cpu_data.microcode) {
> > pr_info("Incomplete global flushes, disabling PCID");
> > setup_clear_cpu_cap(X86_FEATURE_PCID);
> > return;
>
> As noted in similar places where microcode versions are
> checked, hypervisors often lie to guests about microcode versions.
> For example, see comments in bad_spectre_microcode(). I
> know Hyper-V guests always see the microcode version as
> 0xFFFFFFFF (max u32 value). So in a Hyper-V guest the above
> code will always leave PCID enabled.

Enumerating broken PCID support to a guest is very arguably a hypervisor bug.
Hypervisors also lie to guest about FMS. As KVM *user* with affected hardware
(home box), I would want the kernel to assume PCID works if X86_FEATURE_HYPERVISOR
is present.

2024-04-04 16:48:22

by Andrew Cooper

[permalink] [raw]
Subject: Re: [PATCH v2] x86/mm: Don't disable INVLPG if "incomplete Global INVLPG flushes" is fixed by microcode

On 04/04/2024 5:18 pm, Sean Christopherson wrote:
> On Mon, Mar 25, 2024, Michael Kelley wrote:
>>> static void setup_pcid(void)
>>> {
>>> + const struct x86_cpu_id *invlpg_miss_match;
>>> +
>>> if (!IS_ENABLED(CONFIG_X86_64))
>>> return;
>>>
>>> if (!boot_cpu_has(X86_FEATURE_PCID))
>>> return;
>>>
>>> - if (x86_match_cpu(invlpg_miss_ids)) {
>>> + invlpg_miss_match = x86_match_cpu(invlpg_miss_ids);
>>> + if (invlpg_miss_match &&
>>> + invlpg_miss_match->driver_data > boot_cpu_data.microcode) {
>>> pr_info("Incomplete global flushes, disabling PCID");
>>> setup_clear_cpu_cap(X86_FEATURE_PCID);
>>> return;
>> As noted in similar places where microcode versions are
>> checked, hypervisors often lie to guests about microcode versions.
>> For example, see comments in bad_spectre_microcode(). I
>> know Hyper-V guests always see the microcode version as
>> 0xFFFFFFFF (max u32 value). So in a Hyper-V guest the above
>> code will always leave PCID enabled.
> Enumerating broken PCID support to a guest is very arguably a hypervisor bug.
> Hypervisors also lie to guest about FMS. As KVM *user* with affected hardware
> (home box), I would want the kernel to assume PCID works if X86_FEATURE_HYPERVISOR
> is present.

I have very mixed feelings about all of this.

The Gracemont INVLPG vs PCID bug was found in the field, so PCID will
have been enumerated to guests at that point.  You can't blindly drop
PCID until the VM next reboots.

A related example.  I wrote the patch to hide XSAVES to work around an
AMD erratum where XSAVEC sufficed, and the consequences were so dire for
some versions of Windows that there was a suggestion to simply revert
the workaround to make VMs run again.  Windows intentionally asserts
sanity (== expectations) in what it can see; I have no idea whether it
would object in this case but hiding PCID is definitely playing with fire.

I am frequently dismayed at how many FMS abuses there are in Linux, and
I'm this >< close to talking a leaf out of HyperV's book and poisoning
FMS to 0 or ~0 just like the microcode revision.   Any use of FMS for
anything other than diagnostic purposes under virt is live migration bug
waiting to happen.

Xen's current TLB algorithms ensure never to have both PCID and PGE
active together, so we managed to dodge this particular mess.  But as a
consequence, we've got no logic to spot it, or to consider changing PCID
visibility.  That said, for better or worse, the ucode revision is
visible (for now), and a guest which polls the revision will even spot
the hypervisor doing a late ucode load.

Sorry I don't have any better suggestions.  The only nice(ish) of fixing
this for the guest kernel is for Intel to allocate a $FOO_NO bit, and
it's horrible in every other way.

~Andrew

2024-04-04 17:38:03

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v2] x86/mm: Don't disable INVLPG if "incomplete Global INVLPG flushes" is fixed by microcode

On Thu, Apr 04, 2024, Andrew Cooper wrote:
> On 04/04/2024 5:18 pm, Sean Christopherson wrote:
> > On Mon, Mar 25, 2024, Michael Kelley wrote:
> >>> static void setup_pcid(void)
> >>> {
> >>> + const struct x86_cpu_id *invlpg_miss_match;
> >>> +
> >>> if (!IS_ENABLED(CONFIG_X86_64))
> >>> return;
> >>>
> >>> if (!boot_cpu_has(X86_FEATURE_PCID))
> >>> return;
> >>>
> >>> - if (x86_match_cpu(invlpg_miss_ids)) {
> >>> + invlpg_miss_match = x86_match_cpu(invlpg_miss_ids);
> >>> + if (invlpg_miss_match &&
> >>> + invlpg_miss_match->driver_data > boot_cpu_data.microcode) {
> >>> pr_info("Incomplete global flushes, disabling PCID");
> >>> setup_clear_cpu_cap(X86_FEATURE_PCID);
> >>> return;
> >> As noted in similar places where microcode versions are
> >> checked, hypervisors often lie to guests about microcode versions.
> >> For example, see comments in bad_spectre_microcode(). I
> >> know Hyper-V guests always see the microcode version as
> >> 0xFFFFFFFF (max u32 value). So in a Hyper-V guest the above
> >> code will always leave PCID enabled.
> > Enumerating broken PCID support to a guest is very arguably a hypervisor bug.
> > Hypervisors also lie to guest about FMS. As KVM *user* with affected hardware
> > (home box), I would want the kernel to assume PCID works if X86_FEATURE_HYPERVISOR
> > is present.
>
> I have very mixed feelings about all of this.
>
> The Gracemont INVLPG vs PCID bug was found in the field, so PCID will
> have been enumerated to guests at that point.  You can't blindly drop
> PCID until the VM next reboots.
>
> A related example.  I wrote the patch to hide XSAVES to work around an
> AMD erratum where XSAVEC sufficed, and the consequences were so dire for
> some versions of Windows that there was a suggestion to simply revert
> the workaround to make VMs run again.  Windows intentionally asserts
> sanity (== expectations) in what it can see; I have no idea whether it
> would object in this case but hiding PCID is definitely playing with fire.

Yeah, KVM users got burned by that too. d52734d00b8e ("KVM: x86: Give a hint when
Win2016 might fail to boot due to XSAVES erratum").

But practically speaking, that ship has sailed for KVM, as KVM advertises PCID
support if and only if the host kernel supports it, i.e. clearing X86_FEATURE_PCID
in setup_pcid() means KVM stops advertising to userspace, which in turn means
QEMU stops enumerating it VMs by default.

> I am frequently dismayed at how many FMS abuses there are in Linux, and
> I'm this >< close to talking a leaf out of HyperV's book and poisoning
> FMS to 0 or ~0 just like the microcode revision.   Any use of FMS for
> anything other than diagnostic purposes under virt is live migration bug
> waiting to happen.
>
> Xen's current TLB algorithms ensure never to have both PCID and PGE
> active together, so we managed to dodge this particular mess.  But as a
> consequence, we've got no logic to spot it, or to consider changing PCID
> visibility.  That said, for better or worse, the ucode revision is
> visible (for now), and a guest which polls the revision will even spot
> the hypervisor doing a late ucode load.
>
> Sorry I don't have any better suggestions.  The only nice(ish) of fixing
> this for the guest kernel is for Intel to allocate a $FOO_NO bit, and
> it's horrible in every other way.

Hmm, one crazy idea would be to carve out a hypervisor CPUID range for enumerating
(potentially) broken features. Dealing with the Intel/AMD (and Centaur, LOL),
0 / 0x8000_0000 split would be annoying, but not hard. E.g. use 0x4{0,8,C}01_xxxx
to enumerate broken features, and then the guest could do:

support = CPUID(leaf).reg & ~CPUID(to_pv_broken(leaf)).reg;

It'd require a decent amount of churn for the initial support, but it would give
hypervisors a way to inform guests that _any_ CPUID-based feature is broken,
without requiring guest changes (after the initial code is merged) or explicit
action from hardware vendors.

And if we got Windows/Hyper-V in on the game, it would allow them to keep their
sanity checks while (hopefully) degrading gracefully if a feature is enumerated
as broken.

2024-04-04 17:51:07

by Michael Kelley

[permalink] [raw]
Subject: RE: [PATCH v2] x86/mm: Don't disable INVLPG if "incomplete Global INVLPG flushes" is fixed by microcode

From: Sean Christopherson <[email protected]> Sent: Thursday, April 4, 2024 10:28 AM
>
> On Thu, Apr 04, 2024, Andrew Cooper wrote:
> > On 04/04/2024 5:18 pm, Sean Christopherson wrote:
> > > On Mon, Mar 25, 2024, Michael Kelley wrote:
> > >>> static void setup_pcid(void)
> > >>> {
> > >>> + const struct x86_cpu_id *invlpg_miss_match;
> > >>> +
> > >>> if (!IS_ENABLED(CONFIG_X86_64))
> > >>> return;
> > >>>
> > >>> if (!boot_cpu_has(X86_FEATURE_PCID))
> > >>> return;
> > >>>
> > >>> - if (x86_match_cpu(invlpg_miss_ids)) {
> > >>> + invlpg_miss_match = x86_match_cpu(invlpg_miss_ids);
> > >>> + if (invlpg_miss_match &&
> > >>> + invlpg_miss_match->driver_data > boot_cpu_data.microcode) {
> > >>> pr_info("Incomplete global flushes, disabling PCID");
> > >>> setup_clear_cpu_cap(X86_FEATURE_PCID);
> > >>> return;
> > >> As noted in similar places where microcode versions are
> > >> checked, hypervisors often lie to guests about microcode versions.
> > >> For example, see comments in bad_spectre_microcode(). I
> > >> know Hyper-V guests always see the microcode version as
> > >> 0xFFFFFFFF (max u32 value). So in a Hyper-V guest the above
> > >> code will always leave PCID enabled.
> > > Enumerating broken PCID support to a guest is very arguably a hypervisor bug.
> > > Hypervisors also lie to guest about FMS. As KVM *user* with affected hardware
> > > (home box), I would want the kernel to assume PCID works if X86_FEATURE_HYPERVISOR
> > > is present.
> >
> > I have very mixed feelings about all of this.
> >
> > The Gracemont INVLPG vs PCID bug was found in the field, so PCID will
> > have been enumerated to guests at that point.  You can't blindly drop
> > PCID until the VM next reboots.
> >
> > A related example.  I wrote the patch to hide XSAVES to work around an
> > AMD erratum where XSAVEC sufficed, and the consequences were so dire for
> > some versions of Windows that there was a suggestion to simply revert
> > the workaround to make VMs run again.  Windows intentionally asserts
> > sanity (== expectations) in what it can see; I have no idea whether it
> > would object in this case but hiding PCID is definitely playing with fire.
>
> Yeah, KVM users got burned by that too. d52734d00b8e ("KVM: x86: Give a
> hint when Win2016 might fail to boot due to XSAVES erratum").
>
> But practically speaking, that ship has sailed for KVM, as KVM advertises PCID
> support if and only if the host kernel supports it, i.e. clearing X86_FEATURE_PCID
> in setup_pcid() means KVM stops advertising to userspace, which in turn means
> QEMU stops enumerating it VMs by default.

FWIW, my home laptop has a RaptorLake CPU with the flaw. It's running an
up-to-date Windows 11/Hyper-V and a Linux guest VM. The microcode has
not been updated to a version with the fix. But the Linux guest still
sees CPUID 0x00000001 ECX with the "PCID" flag (bit 17) as set. So evidently
Hyper-V is not filtering this flaw.

I agree one could argue that it is a hypervisor bug to present PCID to the guest
in this situation. It's a lot cleaner to not have a guest be checking FMS and
microcode versions. But whether that's practical in the real world, at least
for Hyper-V, I don't know. What's the real impact of running with PCID while
the flaw is still present? I don’t know the history here ...

Michael

2024-04-04 18:10:15

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH v2] x86/mm: Don't disable INVLPG if "incomplete Global INVLPG flushes" is fixed by microcode

On 4/4/24 10:48, Michael Kelley wrote:
> I agree one could argue that it is a hypervisor bug to present PCID to the guest
> in this situation. It's a lot cleaner to not have a guest be checking FMS and
> microcode versions. But whether that's practical in the real world, at least
> for Hyper-V, I don't know. What's the real impact of running with PCID while
> the flaw is still present? I don’t know the history here ...

There's a chance that INVLPG will appear ineffective.

The bad sequence would go something like this: The kernel does the
INVLPG on a global mapping. Later, when switching PCIDs, the TLB entry
mysteriously reappears. No PCIDs switching means no mysterious
reappearance.

2024-04-05 00:12:22

by Andrew Cooper

[permalink] [raw]
Subject: Re: [PATCH v2] x86/mm: Don't disable INVLPG if "incomplete Global INVLPG flushes" is fixed by microcode

On 04/04/2024 6:28 pm, Sean Christopherson wrote:
> On Thu, Apr 04, 2024, Andrew Cooper wrote:
>> A related example.  I wrote the patch to hide XSAVES to work around an
>> AMD erratum where XSAVEC sufficed, and the consequences were so dire for
>> some versions of Windows that there was a suggestion to simply revert
>> the workaround to make VMs run again.  Windows intentionally asserts
>> sanity (== expectations) in what it can see; I have no idea whether it
>> would object in this case but hiding PCID is definitely playing with fire.
> Yeah, KVM users got burned by that too. d52734d00b8e ("KVM: x86: Give a hint when
> Win2016 might fail to boot due to XSAVES erratum").

Yeah what I meant was that I wrote the Linux patch, and KVM got burnt
while Xen cared not... :)

> Hmm, one crazy idea would be to carve out a hypervisor CPUID range for enumerating
> (potentially) broken features. Dealing with the Intel/AMD (and Centaur, LOL),
> 0 / 0x8000_0000 split would be annoying, but not hard. E.g. use 0x4{0,8,C}01_xxxx

No transmeta love then?  Or perhaps we declare it their fault for
choosing 0x8086 which is too awkward to fit into that scheme.

> to enumerate broken features, and then the guest could do:
>
> support = CPUID(leaf).reg & ~CPUID(to_pv_broken(leaf)).reg;
>
> It'd require a decent amount of churn for the initial support, but it would give
> hypervisors a way to inform guests that _any_ CPUID-based feature is broken,
> without requiring guest changes (after the initial code is merged) or explicit
> action from hardware vendors.
>
> And if we got Windows/Hyper-V in on the game, it would allow them to keep their
> sanity checks while (hopefully) degrading gracefully if a feature is enumerated
> as broken.

Crazy indeed, but I am curious to see if this has legs.  The exact
indices may need tweaking, because 0x4x01_xxxx might be a little too
close for comfort, but at first glance it does look like a surprisingly
neat solution to the problem.

Perhaps worth a slot at plumbers?

~Andrew

2024-04-08 23:31:37

by Michael Kelley

[permalink] [raw]
Subject: RE: [PATCH v2] x86/mm: Don't disable INVLPG if "incomplete Global INVLPG flushes" is fixed by microcode

From: Dave Hansen <[email protected]> Sent: Thursday, April 4, 2024 11:09 AM
>
> On 4/4/24 10:48, Michael Kelley wrote:
> > I agree one could argue that it is a hypervisor bug to present PCID to the guest
> > in this situation. It's a lot cleaner to not have a guest be checking FMS and
> > microcode versions. But whether that's practical in the real world, at least
> > for Hyper-V, I don't know. What's the real impact of running with PCID while
> > the flaw is still present? I don’t know the history here ...
>
> There's a chance that INVLPG will appear ineffective.
>
> The bad sequence would go something like this: The kernel does the
> INVLPG on a global mapping. Later, when switching PCIDs, the TLB entry
> mysteriously reappears. No PCIDs switching means no mysterious
> reappearance.

Xi Ruoyao's patch identifies these errata: RPL042 and ADL063. In the links
to the documents Xi provided, both of these errata have the following
statement in the Errata Details section:

This erratum does not apply in VMX non-root operation. It applies only
when PCIDs are enabled and either in VMX root operation or outside
VMX operation.

I don't have deep expertise on the terminology here, but this sounds
like it is saying the erratum doesn’t apply in a guest VM. Or am I
misunderstanding?

Michael




2024-04-09 01:43:49

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v2] x86/mm: Don't disable INVLPG if "incomplete Global INVLPG flushes" is fixed by microcode

On Mon, Apr 08, 2024, Michael Kelley wrote:
> From: Dave Hansen <[email protected]> Sent: Thursday, April 4, 2024 11:09 AM
> >
> > On 4/4/24 10:48, Michael Kelley wrote:
> > > I agree one could argue that it is a hypervisor bug to present PCID to the guest
> > > in this situation. It's a lot cleaner to not have a guest be checking FMS and
> > > microcode versions. But whether that's practical in the real world, at least
> > > for Hyper-V, I don't know. What's the real impact of running with PCID while
> > > the flaw is still present? I don’t know the history here ...
> >
> > There's a chance that INVLPG will appear ineffective.
> >
> > The bad sequence would go something like this: The kernel does the
> > INVLPG on a global mapping. Later, when switching PCIDs, the TLB entry
> > mysteriously reappears. No PCIDs switching means no mysterious
> > reappearance.
>
> Xi Ruoyao's patch identifies these errata: RPL042 and ADL063. In the links
> to the documents Xi provided, both of these errata have the following
> statement in the Errata Details section:
>
> This erratum does not apply in VMX non-root operation. It applies only
> when PCIDs are enabled and either in VMX root operation or outside
> VMX operation.
>
> I don't have deep expertise on the terminology here, but this sounds
> like it is saying the erratum doesn’t apply in a guest VM. Or am I
> misunderstanding?

Huh. My read of that is the same as yours. If that's the case, then it probably
makes sense to have KVM advertise support if PCID is available in hardware, even
if PCID is disabled by the host kernel.

2024-04-09 07:56:43

by Andrew Cooper

[permalink] [raw]
Subject: Re: [PATCH v2] x86/mm: Don't disable INVLPG if "incomplete Global INVLPG flushes" is fixed by microcode

On 09/04/2024 2:43 am, Sean Christopherson wrote:
> On Mon, Apr 08, 2024, Michael Kelley wrote:
>> From: Dave Hansen <[email protected]> Sent: Thursday, April 4, 2024 11:09 AM
>>> On 4/4/24 10:48, Michael Kelley wrote:
>>>> I agree one could argue that it is a hypervisor bug to present PCID to the guest
>>>> in this situation. It's a lot cleaner to not have a guest be checking FMS and
>>>> microcode versions. But whether that's practical in the real world, at least
>>>> for Hyper-V, I don't know. What's the real impact of running with PCID while
>>>> the flaw is still present? I don’t know the history here ...
>>> There's a chance that INVLPG will appear ineffective.
>>>
>>> The bad sequence would go something like this: The kernel does the
>>> INVLPG on a global mapping. Later, when switching PCIDs, the TLB entry
>>> mysteriously reappears. No PCIDs switching means no mysterious
>>> reappearance.
>> Xi Ruoyao's patch identifies these errata: RPL042 and ADL063. In the links
>> to the documents Xi provided, both of these errata have the following
>> statement in the Errata Details section:
>>
>> This erratum does not apply in VMX non-root operation. It applies only
>> when PCIDs are enabled and either in VMX root operation or outside
>> VMX operation.
>>
>> I don't have deep expertise on the terminology here, but this sounds
>> like it is saying the erratum doesn’t apply in a guest VM. Or am I
>> misunderstanding?
> Huh. My read of that is the same as yours. If that's the case, then it probably
> makes sense to have KVM advertise support if PCID is available in hardware, even
> if PCID is disabled by the host kernel.

My reading is the same also.  Seems like VMs are fine.

~Andrew

2024-04-11 09:40:40

by Andrew Cooper

[permalink] [raw]
Subject: Re: [PATCH v2] x86/mm: Don't disable INVLPG if "incomplete Global INVLPG flushes" is fixed by microcode

On 11/04/2024 6:38 am, Xi Ruoyao wrote:
> On Tue, 2024-04-09 at 08:56 +0100, Andrew Cooper wrote:
>> On 09/04/2024 2:43 am, Sean Christopherson wrote:
>>> On Mon, Apr 08, 2024, Michael Kelley wrote:
>>>> From: Dave Hansen <[email protected]> Sent: Thursday, April 4, 2024 11:09 AM
>>>>> On 4/4/24 10:48, Michael Kelley wrote:
>>>>>> I agree one could argue that it is a hypervisor bug to present PCID to the guest
>>>>>> in this situation. It's a lot cleaner to not have a guest be checking FMS and
>>>>>> microcode versions. But whether that's practical in the real world, at least
>>>>>> for Hyper-V, I don't know. What's the real impact of running with PCID while
>>>>>> the flaw is still present? I don’t know the history here ...
>>>>> There's a chance that INVLPG will appear ineffective.
>>>>>
>>>>> The bad sequence would go something like this: The kernel does the
>>>>> INVLPG on a global mapping.  Later, when switching PCIDs, the TLB entry
>>>>> mysteriously reappears.  No PCIDs switching means no mysterious
>>>>> reappearance.
>>>> Xi Ruoyao's patch identifies these errata:  RPL042 and ADL063.  In the links
>>>> to the documents Xi provided, both of these errata have the following
>>>> statement in the Errata Details section:
>>>>
>>>>     This erratum does not apply in VMX non-root operation.  It applies only
>>>>     when PCIDs are enabled and either in VMX root operation or outside
>>>>     VMX operation.
>>>>
>>>> I don't have deep expertise on the terminology here, but this sounds
>>>> like it is saying the erratum doesn’t apply in a guest VM.  Or am I
>>>> misunderstanding?
>>> Huh.  My read of that is the same as yours.  If that's the case, then it probably
>>> makes sense to have KVM advertise support if PCID is available in hardware, even
>>> if PCID is disabled by the host kernel.
>> My reading is the same also.  Seems like VMs are fine.
> So... Should I sent a v6 with the hypervisor checking reverted [ i.e.
> always enable PCID if boot_cpu_has(X86_FEATURE_HYPERVISOR) ]?

If Linux can see a hypervisor, then yes it should be safe to use PCID.

~Andrew