2022-04-30 10:31:47

by Jan Beulich

[permalink] [raw]
Subject: [PATCH] x86/PAT: have pat_enabled() properly reflect state when running on e.g. Xen

The latest with commit bdd8b6c98239 ("drm/i915: replace X86_FEATURE_PAT
with pat_enabled()") pat_enabled() returning false (because of PAT
initialization being suppressed in the absence of MTRRs being announced
to be available) has become a problem: The i915 driver now fails to
initialize when running PV on Xen (i915_gem_object_pin_map() is where I
located the induced failure), and its error handling is flaky enough to
(at least sometimes) result in a hung system.

Yet even beyond that problem the keying of the use of WC mappings to
pat_enabled() (see arch_can_pci_mmap_wc()) means that in particular
graphics frame buffer accesses would have been quite a bit less
performant than possible.

Arrange for the function to return true in such environments, without
undermining the rest of PAT MSR management logic considering PAT to be
disabled: Specifically, no writes to the PAT MSR should occur.

For the new boolean to live in .init.data, init_cache_modes() also needs
moving to .init.text (where it could/should have lived already before).

Signed-off-by: Jan Beulich <[email protected]>
---
On the system where I observed the issue, a knock-on effect of driver
initialization failing was that the SATA-controller also started to
report failures.

--- a/arch/x86/mm/pat/memtype.c
+++ b/arch/x86/mm/pat/memtype.c
@@ -62,6 +62,7 @@

static bool __read_mostly pat_bp_initialized;
static bool __read_mostly pat_disabled = !IS_ENABLED(CONFIG_X86_PAT);
+static bool __initdata pat_force_disabled = !IS_ENABLED(CONFIG_X86_PAT);
static bool __read_mostly pat_bp_enabled;
static bool __read_mostly pat_cm_initialized;

@@ -86,6 +87,7 @@ void pat_disable(const char *msg_reason)
static int __init nopat(char *str)
{
pat_disable("PAT support disabled via boot option.");
+ pat_force_disabled = true;
return 0;
}
early_param("nopat", nopat);
@@ -272,7 +274,7 @@ static void pat_ap_init(u64 pat)
wrmsrl(MSR_IA32_CR_PAT, pat);
}

-void init_cache_modes(void)
+void __init init_cache_modes(void)
{
u64 pat = 0;

@@ -313,6 +315,13 @@ void init_cache_modes(void)
*/
pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) |
PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
+ } else if (!pat_force_disabled &&
+ boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
+ /*
+ * Clearly PAT is enabled underneath. Allow pat_enabled() to
+ * reflect this.
+ */
+ pat_bp_enabled = true;
}

__init_cache_modes(pat);


2022-05-03 15:07:04

by Jürgen Groß

[permalink] [raw]
Subject: Re: [PATCH] x86/PAT: have pat_enabled() properly reflect state when running on e.g. Xen

On 28.04.22 16:50, Jan Beulich wrote:
> The latest with commit bdd8b6c98239 ("drm/i915: replace X86_FEATURE_PAT
> with pat_enabled()") pat_enabled() returning false (because of PAT
> initialization being suppressed in the absence of MTRRs being announced
> to be available) has become a problem: The i915 driver now fails to
> initialize when running PV on Xen (i915_gem_object_pin_map() is where I
> located the induced failure), and its error handling is flaky enough to
> (at least sometimes) result in a hung system.
>
> Yet even beyond that problem the keying of the use of WC mappings to
> pat_enabled() (see arch_can_pci_mmap_wc()) means that in particular
> graphics frame buffer accesses would have been quite a bit less
> performant than possible.
>
> Arrange for the function to return true in such environments, without
> undermining the rest of PAT MSR management logic considering PAT to be
> disabled: Specifically, no writes to the PAT MSR should occur.
>
> For the new boolean to live in .init.data, init_cache_modes() also needs
> moving to .init.text (where it could/should have lived already before).
>
> Signed-off-by: Jan Beulich <[email protected]>

I think this approach isn't the best way to tackle the issue.

It can be solved rather easily by not deriving the supported caching
modes via pat_enabled(), but by adding specific functions to query
the needed caching mode from the PAT translation tables, and to use
those functions instead of pat_enabled().

I'm preparing a patch for that purpose.


Juergen


Attachments:
OpenPGP_0xB0DE9DD628BF132F.asc (3.08 kB)
OpenPGP public key
OpenPGP_signature (505.00 B)
OpenPGP digital signature
Download all attachments

2022-05-11 16:24:37

by Jürgen Groß

[permalink] [raw]
Subject: Re: [PATCH] x86/PAT: have pat_enabled() properly reflect state when running on e.g. Xen

On 03.05.22 14:54, Juergen Gross wrote:
> On 28.04.22 16:50, Jan Beulich wrote:
>> The latest with commit bdd8b6c98239 ("drm/i915: replace X86_FEATURE_PAT
>> with pat_enabled()") pat_enabled() returning false (because of PAT
>> initialization being suppressed in the absence of MTRRs being announced
>> to be available) has become a problem: The i915 driver now fails to
>> initialize when running PV on Xen (i915_gem_object_pin_map() is where I
>> located the induced failure), and its error handling is flaky enough to
>> (at least sometimes) result in a hung system.
>>
>> Yet even beyond that problem the keying of the use of WC mappings to
>> pat_enabled() (see arch_can_pci_mmap_wc()) means that in particular
>> graphics frame buffer accesses would have been quite a bit less
>> performant than possible.
>>
>> Arrange for the function to return true in such environments, without
>> undermining the rest of PAT MSR management logic considering PAT to be
>> disabled: Specifically, no writes to the PAT MSR should occur.
>>
>> For the new boolean to live in .init.data, init_cache_modes() also needs
>> moving to .init.text (where it could/should have lived already before).
>>
>> Signed-off-by: Jan Beulich <[email protected]>
>
> I think this approach isn't the best way to tackle the issue.
>
> It can be solved rather easily by not deriving the supported caching
> modes via pat_enabled(), but by adding specific functions to query
> the needed caching mode from the PAT translation tables, and to use
> those functions instead of pat_enabled().
>
> I'm preparing a patch for that purpose.

That attempt was not a complete success.

Especially there are issues with my approach when "nopat" has been
specified as boot parameter, as that would be just ignored.

So right now I can't think of a better approach than the one of Jan's
patch.


Juergen


Attachments:
OpenPGP_0xB0DE9DD628BF132F.asc (3.08 kB)
OpenPGP public key
OpenPGP_signature (505.00 B)
OpenPGP digital signature
Download all attachments

2022-05-21 15:16:08

by Chuck Zmudzinski

[permalink] [raw]
Subject: Re: [PATCH] x86/PAT: have pat_enabled() properly reflect state when running on e.g. Xen

On 4/28/22 10:50 AM, Jan Beulich wrote:
> The latest with commit bdd8b6c98239 ("drm/i915: replace X86_FEATURE_PAT
> with pat_enabled()") pat_enabled() returning false (because of PAT
> initialization being suppressed in the absence of MTRRs being announced
> to be available) has become a problem: The i915 driver now fails to
> initialize when running PV on Xen (i915_gem_object_pin_map() is where I
> located the induced failure), and its error handling is flaky enough to
> (at least sometimes) result in a hung system.
>
> Yet even beyond that problem the keying of the use of WC mappings to
> pat_enabled() (see arch_can_pci_mmap_wc()) means that in particular
> graphics frame buffer accesses would have been quite a bit less
> performant than possible.
>
> Arrange for the function to return true in such environments, without
> undermining the rest of PAT MSR management logic considering PAT to be
> disabled: Specifically, no writes to the PAT MSR should occur.
>
> For the new boolean to live in .init.data, init_cache_modes() also needs
> moving to .init.text (where it could/should have lived already before).
>
> Signed-off-by: Jan Beulich <[email protected]>
> ---
> On the system where I observed the issue, a knock-on effect of driver
> initialization failing was that the SATA-controller also started to
> report failures.
>
> --- a/arch/x86/mm/pat/memtype.c
> +++ b/arch/x86/mm/pat/memtype.c
> @@ -62,6 +62,7 @@
>
> static bool __read_mostly pat_bp_initialized;
> static bool __read_mostly pat_disabled = !IS_ENABLED(CONFIG_X86_PAT);
> +static bool __initdata pat_force_disabled = !IS_ENABLED(CONFIG_X86_PAT);
> static bool __read_mostly pat_bp_enabled;
> static bool __read_mostly pat_cm_initialized;
>
> @@ -86,6 +87,7 @@ void pat_disable(const char *msg_reason)
> static int __init nopat(char *str)
> {
> pat_disable("PAT support disabled via boot option.");
> + pat_force_disabled = true;
> return 0;
> }
> early_param("nopat", nopat);
> @@ -272,7 +274,7 @@ static void pat_ap_init(u64 pat)
> wrmsrl(MSR_IA32_CR_PAT, pat);
> }
>
> -void init_cache_modes(void)
> +void __init init_cache_modes(void)
> {
> u64 pat = 0;
>
> @@ -313,6 +315,13 @@ void init_cache_modes(void)
> */
> pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) |
> PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
> + } else if (!pat_force_disabled &&
> + boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
> + /*
> + * Clearly PAT is enabled underneath. Allow pat_enabled() to
> + * reflect this.
> + */
> + pat_bp_enabled = true;
> }
>
> __init_cache_modes(pat);
>

Hi Jan,

I tested this patch on my system with an Intel Haswell CPU with
integrated GPU. It fixes my system which currently is unable
to boot the official Linux 5.17.y as a Dom0 on Xen.

The first suggestion I have is that the commit message should be
edited to explain how the behavior of the kernel in response to
the nopat option would be changed by this patch. My tests indicate
that this patch changes what nopat actually does, and the commit
message should clearly explain that fact.

Thank you for this patch, I hope it is committed soon and backported
to 5.17 stable. I consider that it fixes a regression caused by
bdd8b6c98239, so my second suggestion is that you add Fixes:

Fixes: bdd8b6c98239 ("drm/i915: replace X86_FEATURE_PAT with pat_enabled()")

I will be using this patch on my system until it, or another fix, is
committed to Linux. Without such a fix, I cannot run 5.17.y and later
on my system as a Xen Dom0.

Best regards,

Chuck

2022-05-26 08:21:24

by Jan Beulich

[permalink] [raw]
Subject: Ping: [PATCH] x86/PAT: have pat_enabled() properly reflect state when running on e.g. Xen

On 28.04.2022 16:50, Jan Beulich wrote:
> The latest with commit bdd8b6c98239 ("drm/i915: replace X86_FEATURE_PAT
> with pat_enabled()") pat_enabled() returning false (because of PAT
> initialization being suppressed in the absence of MTRRs being announced
> to be available) has become a problem: The i915 driver now fails to
> initialize when running PV on Xen (i915_gem_object_pin_map() is where I
> located the induced failure), and its error handling is flaky enough to
> (at least sometimes) result in a hung system.
>
> Yet even beyond that problem the keying of the use of WC mappings to
> pat_enabled() (see arch_can_pci_mmap_wc()) means that in particular
> graphics frame buffer accesses would have been quite a bit less
> performant than possible.
>
> Arrange for the function to return true in such environments, without
> undermining the rest of PAT MSR management logic considering PAT to be
> disabled: Specifically, no writes to the PAT MSR should occur.
>
> For the new boolean to live in .init.data, init_cache_modes() also needs
> moving to .init.text (where it could/should have lived already before).
>
> Signed-off-by: Jan Beulich <[email protected]>

The Linux kernel regression tracker is pestering me because things are
taking so long (effectively quoting him), and alternative proposals
made so far look to have more severe downsides.

Thanks, Jan


2022-07-04 12:27:44

by Thorsten Leemhuis

[permalink] [raw]
Subject: Re: Ping: [PATCH] x86/PAT: have pat_enabled() properly reflect state when running on e.g. Xen

On 25.05.22 10:55, Jan Beulich wrote:
> On 28.04.2022 16:50, Jan Beulich wrote:
>> The latest with commit bdd8b6c98239 ("drm/i915: replace X86_FEATURE_PAT
>> with pat_enabled()") pat_enabled() returning false (because of PAT
>> initialization being suppressed in the absence of MTRRs being announced
>> to be available) has become a problem: The i915 driver now fails to
>> initialize when running PV on Xen (i915_gem_object_pin_map() is where I
>> located the induced failure), and its error handling is flaky enough to
>> (at least sometimes) result in a hung system.
>>
>> Yet even beyond that problem the keying of the use of WC mappings to
>> pat_enabled() (see arch_can_pci_mmap_wc()) means that in particular
>> graphics frame buffer accesses would have been quite a bit less
>> performant than possible.
>>
>> Arrange for the function to return true in such environments, without
>> undermining the rest of PAT MSR management logic considering PAT to be
>> disabled: Specifically, no writes to the PAT MSR should occur.
>>
>> For the new boolean to live in .init.data, init_cache_modes() also needs
>> moving to .init.text (where it could/should have lived already before).
>>
>> Signed-off-by: Jan Beulich <[email protected]>
>
> The Linux kernel regression tracker is pestering me because things are
> taking so long (effectively quoting him), and alternative proposals
> made so far look to have more severe downsides.

Has any progress been made with this patch? It afaics is meant to fix
this regression, which ideally should have been fixed weeks ago (btw:
adding a "Link:" tag pointing to it would be good):
https://lore.kernel.org/regressions/YnHK1Z3o99eMXsVK@mail-itl/

According to Juergen it's still needed:
https://lore.kernel.org/lkml/[email protected]/

Or was a different solution found to fix that regression?

Ciao, Thorsten

2022-07-04 12:47:11

by Jan Beulich

[permalink] [raw]
Subject: Re: Ping: [PATCH] x86/PAT: have pat_enabled() properly reflect state when running on e.g. Xen

On 04.07.2022 13:58, Thorsten Leemhuis wrote:
> On 25.05.22 10:55, Jan Beulich wrote:
>> On 28.04.2022 16:50, Jan Beulich wrote:
>>> The latest with commit bdd8b6c98239 ("drm/i915: replace X86_FEATURE_PAT
>>> with pat_enabled()") pat_enabled() returning false (because of PAT
>>> initialization being suppressed in the absence of MTRRs being announced
>>> to be available) has become a problem: The i915 driver now fails to
>>> initialize when running PV on Xen (i915_gem_object_pin_map() is where I
>>> located the induced failure), and its error handling is flaky enough to
>>> (at least sometimes) result in a hung system.
>>>
>>> Yet even beyond that problem the keying of the use of WC mappings to
>>> pat_enabled() (see arch_can_pci_mmap_wc()) means that in particular
>>> graphics frame buffer accesses would have been quite a bit less
>>> performant than possible.
>>>
>>> Arrange for the function to return true in such environments, without
>>> undermining the rest of PAT MSR management logic considering PAT to be
>>> disabled: Specifically, no writes to the PAT MSR should occur.
>>>
>>> For the new boolean to live in .init.data, init_cache_modes() also needs
>>> moving to .init.text (where it could/should have lived already before).
>>>
>>> Signed-off-by: Jan Beulich <[email protected]>
>>
>> The Linux kernel regression tracker is pestering me because things are
>> taking so long (effectively quoting him), and alternative proposals
>> made so far look to have more severe downsides.
>
> Has any progress been made with this patch? It afaics is meant to fix
> this regression, which ideally should have been fixed weeks ago (btw:
> adding a "Link:" tag pointing to it would be good):
> https://lore.kernel.org/regressions/YnHK1Z3o99eMXsVK@mail-itl/
>
> According to Juergen it's still needed:
> https://lore.kernel.org/lkml/[email protected]/
>
> Or was a different solution found to fix that regression?

No progress and no alternatives I'm aware of.

Jan

2022-07-05 11:21:32

by Thorsten Leemhuis

[permalink] [raw]
Subject: Re: Ping: [PATCH] x86/PAT: have pat_enabled() properly reflect state when running on e.g. Xen

[CCing tglx, mingo, Boris and Juergen]

On 04.07.22 14:26, Jan Beulich wrote:
> On 04.07.2022 13:58, Thorsten Leemhuis wrote:
>> On 25.05.22 10:55, Jan Beulich wrote:
>>> On 28.04.2022 16:50, Jan Beulich wrote:
>>>> The latest with commit bdd8b6c98239 ("drm/i915: replace X86_FEATURE_PAT
>>>> with pat_enabled()") pat_enabled() returning false (because of PAT
>>>> initialization being suppressed in the absence of MTRRs being announced
>>>> to be available) has become a problem: The i915 driver now fails to
>>>> initialize when running PV on Xen (i915_gem_object_pin_map() is where I
>>>> located the induced failure), and its error handling is flaky enough to
>>>> (at least sometimes) result in a hung system.
>>>>
>>>> Yet even beyond that problem the keying of the use of WC mappings to
>>>> pat_enabled() (see arch_can_pci_mmap_wc()) means that in particular
>>>> graphics frame buffer accesses would have been quite a bit less
>>>> performant than possible.
>>>>
>>>> Arrange for the function to return true in such environments, without
>>>> undermining the rest of PAT MSR management logic considering PAT to be
>>>> disabled: Specifically, no writes to the PAT MSR should occur.
>>>>
>>>> For the new boolean to live in .init.data, init_cache_modes() also needs
>>>> moving to .init.text (where it could/should have lived already before).
>>>>
>>>> Signed-off-by: Jan Beulich <[email protected]>
>>>
>>> The Linux kernel regression tracker is pestering me because things are
>>> taking so long (effectively quoting him), and alternative proposals
>>> made so far look to have more severe downsides.
>>
>> Has any progress been made with this patch? It afaics is meant to fix
>> this regression, which ideally should have been fixed weeks ago (btw:
>> adding a "Link:" tag pointing to it would be good):
>> https://lore.kernel.org/regressions/YnHK1Z3o99eMXsVK@mail-itl/
>>
>> According to Juergen it's still needed:
>> https://lore.kernel.org/lkml/[email protected]/
>>
>> Or was a different solution found to fix that regression?
>
> No progress and no alternatives I'm aware of.

Getting closer to the point where I need to bring this to Linus
attention. I hope this mail can help avoiding this.

Jan, I didn't follow this closely, but do you have any idea why Dave,
Luto, and Peter are ignoring this? Is reverting bdd8b6c98239 a option to
get the regression fixed? Would a repost maybe help getting this rolling
again?

BTW, for anyone new to this, Jan's patch afaics is supposed to fix the
regression reported here:
https://lore.kernel.org/all/YnHK1Z3o99eMXsVK@mail-itl/

Side note: Juergen Gross recently posted related patches in this code
area to fix some other problems (regressions?), but his efforts look
stalled, too:
https://lore.kernel.org/all/[email protected]/

And he recently stated this Jan's patch is still needed, even if his
changes make it in.
https://lore.kernel.org/all/[email protected]/

This from my point all looks a bit... unsatisfying. :-/

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.

2022-07-05 11:25:02

by Jan Beulich

[permalink] [raw]
Subject: Re: Ping: [PATCH] x86/PAT: have pat_enabled() properly reflect state when running on e.g. Xen

On 05.07.2022 12:57, Thorsten Leemhuis wrote:
> [CCing tglx, mingo, Boris and Juergen]
>
> On 04.07.22 14:26, Jan Beulich wrote:
>> On 04.07.2022 13:58, Thorsten Leemhuis wrote:
>>> On 25.05.22 10:55, Jan Beulich wrote:
>>>> On 28.04.2022 16:50, Jan Beulich wrote:
>>>>> The latest with commit bdd8b6c98239 ("drm/i915: replace X86_FEATURE_PAT
>>>>> with pat_enabled()") pat_enabled() returning false (because of PAT
>>>>> initialization being suppressed in the absence of MTRRs being announced
>>>>> to be available) has become a problem: The i915 driver now fails to
>>>>> initialize when running PV on Xen (i915_gem_object_pin_map() is where I
>>>>> located the induced failure), and its error handling is flaky enough to
>>>>> (at least sometimes) result in a hung system.
>>>>>
>>>>> Yet even beyond that problem the keying of the use of WC mappings to
>>>>> pat_enabled() (see arch_can_pci_mmap_wc()) means that in particular
>>>>> graphics frame buffer accesses would have been quite a bit less
>>>>> performant than possible.
>>>>>
>>>>> Arrange for the function to return true in such environments, without
>>>>> undermining the rest of PAT MSR management logic considering PAT to be
>>>>> disabled: Specifically, no writes to the PAT MSR should occur.
>>>>>
>>>>> For the new boolean to live in .init.data, init_cache_modes() also needs
>>>>> moving to .init.text (where it could/should have lived already before).
>>>>>
>>>>> Signed-off-by: Jan Beulich <[email protected]>
>>>>
>>>> The Linux kernel regression tracker is pestering me because things are
>>>> taking so long (effectively quoting him), and alternative proposals
>>>> made so far look to have more severe downsides.
>>>
>>> Has any progress been made with this patch? It afaics is meant to fix
>>> this regression, which ideally should have been fixed weeks ago (btw:
>>> adding a "Link:" tag pointing to it would be good):
>>> https://lore.kernel.org/regressions/YnHK1Z3o99eMXsVK@mail-itl/
>>>
>>> According to Juergen it's still needed:
>>> https://lore.kernel.org/lkml/[email protected]/
>>>
>>> Or was a different solution found to fix that regression?
>>
>> No progress and no alternatives I'm aware of.
>
> Getting closer to the point where I need to bring this to Linus
> attention. I hope this mail can help avoiding this.
>
> Jan, I didn't follow this closely, but do you have any idea why Dave,
> Luto, and Peter are ignoring this?

No, I don't.

> Is reverting bdd8b6c98239 a option to get the regression fixed?

No, that change is warranted.

> Would a repost maybe help getting this rolling again?

No idea, but it doesn't seem very likely - there have been ample pings
and alike.

Jan

2022-07-05 14:27:22

by Jürgen Groß

[permalink] [raw]
Subject: Re: Ping: [PATCH] x86/PAT: have pat_enabled() properly reflect state when running on e.g. Xen

On 05.07.22 15:36, Borislav Petkov wrote:
> On Tue, Jul 05, 2022 at 12:57:18PM +0200, Thorsten Leemhuis wrote:
>> Side note: Juergen Gross recently posted related patches in this code
>> area to fix some other problems (regressions?), but his efforts look
>> stalled, too:
>> https://lore.kernel.org/all/[email protected]/
>
> I'm still waiting for a resumbission of this:
>
> https://lore.kernel.org/r/[email protected]
>

How are those directly connected? IMO they are independent from each other.


Juergen


Attachments:
OpenPGP_0xB0DE9DD628BF132F.asc (3.08 kB)
OpenPGP public key
OpenPGP_signature (505.00 B)
OpenPGP digital signature
Download all attachments

2022-07-05 15:03:50

by Borislav Petkov

[permalink] [raw]
Subject: Re: Ping: [PATCH] x86/PAT: have pat_enabled() properly reflect state when running on e.g. Xen

On Tue, Jul 05, 2022 at 12:57:18PM +0200, Thorsten Leemhuis wrote:
> Side note: Juergen Gross recently posted related patches in this code
> area to fix some other problems (regressions?), but his efforts look
> stalled, too:
> https://lore.kernel.org/all/[email protected]/

I'm still waiting for a resumbission of this:

https://lore.kernel.org/r/[email protected]

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2022-07-05 15:45:13

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH] x86/PAT: have pat_enabled() properly reflect state when running on e.g. Xen

On Thu, Apr 28, 2022 at 04:50:29PM +0200, Jan Beulich wrote:
> --- a/arch/x86/mm/pat/memtype.c
> +++ b/arch/x86/mm/pat/memtype.c
> @@ -62,6 +62,7 @@
>
> static bool __read_mostly pat_bp_initialized;
> static bool __read_mostly pat_disabled = !IS_ENABLED(CONFIG_X86_PAT);
> +static bool __initdata pat_force_disabled = !IS_ENABLED(CONFIG_X86_PAT);
> static bool __read_mostly pat_bp_enabled;
> static bool __read_mostly pat_cm_initialized;

Why yet another boolean var?

Why not extend pat_enabled() to reflect the Xen case and explain it
properly above it?

My comment is likely wrong because I don't know what the Xen HV hides or
doesn't but you get the idea...

diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
index d5ef64ddd35e..a8f1a02f9bc2 100644
--- a/arch/x86/mm/pat/memtype.c
+++ b/arch/x86/mm/pat/memtype.c
@@ -92,6 +92,13 @@ early_param("nopat", nopat);

bool pat_enabled(void)
{
+ /*
+ * Xen PV doesn't expose the PAT MSR to dom0 so the proper init path
+ * there cannot be exercised. Announce PAT is enabled in that case too.
+ */
+ if (cpu_feature_enabled(X86_FEATURE_XENPV) && !pat_disabled)
+ return true;
+
return pat_bp_enabled;
}
EXPORT_SYMBOL_GPL(pat_enabled);

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2022-07-05 16:13:06

by Jan Beulich

[permalink] [raw]
Subject: Re: [PATCH] x86/PAT: have pat_enabled() properly reflect state when running on e.g. Xen

On 05.07.2022 17:04, Borislav Petkov wrote:
> On Thu, Apr 28, 2022 at 04:50:29PM +0200, Jan Beulich wrote:
>> --- a/arch/x86/mm/pat/memtype.c
>> +++ b/arch/x86/mm/pat/memtype.c
>> @@ -62,6 +62,7 @@
>>
>> static bool __read_mostly pat_bp_initialized;
>> static bool __read_mostly pat_disabled = !IS_ENABLED(CONFIG_X86_PAT);
>> +static bool __initdata pat_force_disabled = !IS_ENABLED(CONFIG_X86_PAT);
>> static bool __read_mostly pat_bp_enabled;
>> static bool __read_mostly pat_cm_initialized;
>
> Why yet another boolean var?

Re-using pat_disabled like you do in your suggestion below won't
work, because mtrr_bp_init() calls pat_disable() when MTRRs
appear to be disabled (from the kernel's view). The goal is to
honor "nopat" without honoring any other calls to pat_disable().

> Why not extend pat_enabled() to reflect the Xen case and explain it
> properly above it?

I can probably fiddle with pat_enabled() instead of with
init_cache_modes(), but when making the change I had the feeling
this might be less liked (as looking more hacky, at least to me).

But besides the "where" the other question is: Do you really want
me to limit this to Xen/PV, rather than - as I have it now -
extending it to any hypervisor, which may behave in similar ways?

Jan

> My comment is likely wrong because I don't know what the Xen HV hides or
> doesn't but you get the idea...
>
> diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
> index d5ef64ddd35e..a8f1a02f9bc2 100644
> --- a/arch/x86/mm/pat/memtype.c
> +++ b/arch/x86/mm/pat/memtype.c
> @@ -92,6 +92,13 @@ early_param("nopat", nopat);
>
> bool pat_enabled(void)
> {
> + /*
> + * Xen PV doesn't expose the PAT MSR to dom0 so the proper init path
> + * there cannot be exercised. Announce PAT is enabled in that case too.
> + */
> + if (cpu_feature_enabled(X86_FEATURE_XENPV) && !pat_disabled)
> + return true;
> +
> return pat_bp_enabled;
> }
> EXPORT_SYMBOL_GPL(pat_enabled);
>

2022-07-05 16:26:38

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH] x86/PAT: have pat_enabled() properly reflect state when running on e.g. Xen

On Tue, Jul 05, 2022 at 05:56:36PM +0200, Jan Beulich wrote:
> Re-using pat_disabled like you do in your suggestion below won't
> work, because mtrr_bp_init() calls pat_disable() when MTRRs
> appear to be disabled (from the kernel's view). The goal is to
> honor "nopat" without honoring any other calls to pat_disable().

Actually, the current goal is to adjust Xen dom0 because:

1. it uses the PAT code

2. but then it does something special and hides the MTRRs

which is not something real hardware does.

So this one-off thing should be prominent, visible and not get in the
way.

As to mtrr_bp_init(), can you use X86_FEATURE_XENPV there to detect this
special case when the kernel is running as dom0 and set stuff there
accordingly so that it doesn't disable PAT?

Then you don't have to touch pat_disabled() either but intergrate the
Xen variant properly...

> I can probably fiddle with pat_enabled() instead of with
> init_cache_modes(), but when making the change I had the feeling
> this might be less liked (as looking more hacky, at least to me).

Why would that be more hacky?

I'd much rather check upfront what the kernel is running on and act
accordingly instead of hooking into random functions and then years
later wonder why was it done in the first place.

> But besides the "where" the other question is: Do you really want
> me to limit this to Xen/PV, rather than - as I have it now -
> extending it to any hypervisor, which may behave in similar ways?

Well, do you know of some other HV which hides MTRRs from the guest?

I haven't heard of any...

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2022-07-06 06:29:22

by Jan Beulich

[permalink] [raw]
Subject: Re: [PATCH] x86/PAT: have pat_enabled() properly reflect state when running on e.g. Xen

On 05.07.2022 18:14, Borislav Petkov wrote:
> On Tue, Jul 05, 2022 at 05:56:36PM +0200, Jan Beulich wrote:
>> Re-using pat_disabled like you do in your suggestion below won't
>> work, because mtrr_bp_init() calls pat_disable() when MTRRs
>> appear to be disabled (from the kernel's view). The goal is to
>> honor "nopat" without honoring any other calls to pat_disable().
>
> Actually, the current goal is to adjust Xen dom0 because:
>
> 1. it uses the PAT code
>
> 2. but then it does something special and hides the MTRRs
>
> which is not something real hardware does.
>
> So this one-off thing should be prominent, visible and not get in the
> way.
>
> As to mtrr_bp_init(), can you use X86_FEATURE_XENPV there to detect this
> special case when the kernel is running as dom0 and set stuff there
> accordingly so that it doesn't disable PAT?

Sure, but that alone won't help. There's a beneficial side effect
of running through pat_disable(): That way pat_init() will bail
right away. Without that I'd need to further special case things
there (as under Xen/PV PAT must not be written, only read) and I'd
also need to set pat_bp_enabled and pat_bp_initialized somewhere.
I could of course check X86_FEATURE_XENPV in all the necessary
places, but I was quite certain _that_ wouldn't be liked (nor
would I be convinced this is the right thing to do - see bottom).

> Then you don't have to touch pat_disabled() either but intergrate the
> Xen variant properly...
>
>> I can probably fiddle with pat_enabled() instead of with
>> init_cache_modes(), but when making the change I had the feeling
>> this might be less liked (as looking more hacky, at least to me).
>
> Why would that be more hacky?

My view on it, as said. I did actually make several attempts, until
reaching what I then submitted. All earlier ones were quite a bit
more intrusive (see above for an outline).

> I'd much rather check upfront what the kernel is running on and act
> accordingly instead of hooking into random functions and then years
> later wonder why was it done in the first place.

Thank you for putting it that kindly. It was a pretty conscious
decision where to make the changes, after - as said - quite a bit
of trying other variants. History with Xen-specific changes has
taught me to try to keep them as uninvasive and generic as possible.
The more things smelled like Xen-only, the less they were liked.

>> But besides the "where" the other question is: Do you really want
>> me to limit this to Xen/PV, rather than - as I have it now -
>> extending it to any hypervisor, which may behave in similar ways?
>
> Well, do you know of some other HV which hides MTRRs from the guest?
>
> I haven't heard of any...

Any decent hypervisor will allow overriding CPUID, so in principle
I'd expect any to permit disabling MTRR to leave a guest to use
the (more modern and less cumbersome) PAT alone.

Jan

2022-07-06 17:11:28

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH] x86/PAT: have pat_enabled() properly reflect state when running on e.g. Xen

On Wed, Jul 06, 2022 at 08:17:41AM +0200, Jan Beulich wrote:
> Sure, but that alone won't help.

Well, the MTRR code looks at X86_FEATURE_MTRR. If Xen doesn't expose the
MTRRs, then that bit should be clear in the CPUID the guest sees.

So in that case, you could test X86_FEATURE_XENPV at the end of
mtrr_bp_init() and not disable PAT if running as a PV guest. Would that
work?

> There's a beneficial side effect of running through pat_disable():
> That way pat_init() will bail right away. Without that I'd need to
> further special case things there (as under Xen/PV PAT must not be
> written, only read)

We have wrmsr_safe for that.

> Any decent hypervisor will allow overriding CPUID, so in principle
> I'd expect any to permit disabling MTRR to leave a guest to use
> the (more modern and less cumbersome) PAT alone.

So I'm being told that it would be generally beneficial for all kinds of
virtualization solutions to be able to support PAT only, without MTRRs
so it would be interesting to see how ugly it would become to decouple
PAT from MTRRs in Linux...

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2022-07-07 06:47:37

by Jan Beulich

[permalink] [raw]
Subject: Re: [PATCH] x86/PAT: have pat_enabled() properly reflect state when running on e.g. Xen

On 06.07.2022 19:01, Borislav Petkov wrote:
> On Wed, Jul 06, 2022 at 08:17:41AM +0200, Jan Beulich wrote:
>> Sure, but that alone won't help.
>
> Well, the MTRR code looks at X86_FEATURE_MTRR. If Xen doesn't expose the
> MTRRs, then that bit should be clear in the CPUID the guest sees.
>
> So in that case, you could test X86_FEATURE_XENPV at the end of
> mtrr_bp_init() and not disable PAT if running as a PV guest. Would that
> work?
>
>> There's a beneficial side effect of running through pat_disable():
>> That way pat_init() will bail right away. Without that I'd need to
>> further special case things there (as under Xen/PV PAT must not be
>> written, only read)
>
> We have wrmsr_safe for that.

Well, right now the pvops hook for Xen swallows #GP anyway (wrongly
so imo, but any of my earlier pointing out of that has been left
unheard, despite even the code comments there saying "It may be worth
changing that"). The point is therefore that after writing PAT, it
would need reading back. In which case it feels (slightly) more clean
to me to avoid the write attempt in the first place, when we know
it's not going to work.

>> Any decent hypervisor will allow overriding CPUID, so in principle
>> I'd expect any to permit disabling MTRR to leave a guest to use
>> the (more modern and less cumbersome) PAT alone.
>
> So I'm being told that it would be generally beneficial for all kinds of
> virtualization solutions to be able to support PAT only, without MTRRs
> so it would be interesting to see how ugly it would become to decouple
> PAT from MTRRs in Linux...

If I may ask - doesn't this mean this patch, in its current shape, is
already a (small) step in that direction? In any event what you say
doesn't sound to me like a viable (backportable) route to addressing
the regression at hand.

Jan

2022-07-11 11:24:01

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH] x86/PAT: have pat_enabled() properly reflect state when running on e.g. Xen

On Thu, Jul 07, 2022 at 08:38:44AM +0200, Jan Beulich wrote:
> Well, right now the pvops hook for Xen swallows #GP anyway (wrongly
> so imo, but any of my earlier pointing out of that has been left
> unheard, despite even the code comments there saying "It may be worth
> changing that").

Oh great. ;-\

> The point is therefore that after writing PAT, it would need reading
> back. In which case it feels (slightly) more clean to me to avoid the
> write attempt in the first place, when we know it's not going to work.

X86_FEATURE_XENPV check then.

> If I may ask - doesn't this mean this patch, in its current shape, is
> already a (small) step in that direction? In any event what you say
> doesn't sound to me like a viable (backportable) route to addressing
> the regression at hand.

Backportable to where? To whatever tree has

bdd8b6c98239 ("drm/i915: replace X86_FEATURE_PAT with pat_enabled()")

? If it is that, then 5.17 and newer.

Anyway, I don't mind it as long as you put the proper splitting out
ontop and it all goes as a single patchset, with the minimal fix
CC:stable and queued first.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2022-07-11 11:46:27

by Chuck Zmudzinski

[permalink] [raw]
Subject: Re: [PATCH] x86/PAT: have pat_enabled() properly reflect state when running on e.g. Xen

On 7/5/22 12:14 PM, Borislav Petkov wrote:
> On Tue, Jul 05, 2022 at 05:56:36PM +0200, Jan Beulich wrote:
> > Re-using pat_disabled like you do in your suggestion below won't
> > work, because mtrr_bp_init() calls pat_disable() when MTRRs
> > appear to be disabled (from the kernel's view). The goal is to
> > honor "nopat" without honoring any other calls to pat_disable().

I think Jan is speaking of the narrow goal of the patch
that is causing the regression at hand:

Commit bdd8b6c98239cad3a976d6f197afc2c794d3cef8
("drm/i915: replace X86_FEATURE_PAT with pat_enabled()")


The author of that commit expressed the desire to
more readily handle the nopat option in Linux in an
architecture-independent way. in the commit message.
The patch was not intended to cause a functional
change, but it did - it caused a regression in Xen
Dom0s running certain Intel graphics devices.

>
> Actually, the current goal is to adjust Xen dom0 because:
>
> 1. it uses the PAT code
>
> 2. but then it does something special and hides the MTRRs
>
> which is not something real hardware does.
>
> So this one-off thing should be prominent, visible and not get in the
> way.

Actually, this is probably the most insightful comment in all
the words that have been written about this regression. This
is a one-off thing, peculiar to Xen PV, but it is not visible
enough and gets overlooked when changes are made. I guess
I agree it should not get in the way either, and it doesn't,
except when the lack of visibility of this one-off thing causes
the author of a patch to overlook it and cause unforeseen
problems for users of Xen on Linux. That is precisely what
happened five years ago with commit 99c13b8c8896d7bcb92753bf
("x86/mm/pat: Don't report PAT on CPUs that don't support it")

Have you looked at that patch? That is the one that introduced
the regression that causes pat_enabled() to return a false negative
on Xen Dom0 today, and it hid in the code for four and a half years
and was only exposed as a regression with commit
bdd8b6c98239cad3a9 last December, which again was written in
a way that caused the regression on Xen because this one-off
thing that Xen does was not visible enough to the author of the
recent patch from last December that exposed this problem as a
regression with Xen PV domains and certain Intel graphics adpapters.

That patch did not take into account this not-visible-enough Xen
case when MTRR is disabled but PAT is still supported by the
CPU in Xen PV domains. So the proper way to fix this regression
is by fixing that commit from five years ago. It is very simple.
After some code re-factoring and other changes, today that
commit can be fixed by something like:

--- a/arch/x86/mm/pat/memtype.c    2022-06-16 07:32:05.000000000 -0400
+++ b/arch/x86/mm/pat/memtype.c    2022-07-09 17:51:56.783050765 -0400
@@ -315,6 +315,20 @@
               PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
     }
 
+    else if (!pat_bp_enabled) {
+        /*
+         * In some environments, specifically Xen PV, PAT
+         * initialization is skipped because MTRRs are
+         * disabled even though PAT is available. In such
+         * environments, set PAT to enabled to correctly
+         * indicate to callers of pat_enabled() that CPU
+         * support for PAT is available.
+         */
+        pat_bp_enabled = true;
+        pr_info("x86/PAT: PAT enabled by init_cache_modes\n");
+    }
+
     __init_cache_modes(pat);
 }
 
Like Jan's approach, it implements the fix in init_cache_modes(),
but unlike Jan's approach, it only sets pat_bp_enabled and
pat_bp_initialized to true if boot_cpu_has(X86_FEATURE_PAT)
is true and rdmsrl(MSR_IA32_CR_PAT, pat) returns a valid
value. No need to check for a hypervisor, just check if
the CPU supports PAT here and that PAT MSR returns something
valid here. If both are true, then set pat_bp_enabled to true.
Regression solved.

And that leaves the bigger goal of dealing with this one-off
thing that Xen does in a more sane way for another day. I
am working on a patch series that attempts to start the process
by first re-factoring the currently confusing pat_disable functions
and variables that will hopefully make this one-off Xen thing
more visible and easier to understand so when someone wants
to play around with the current way of deciding whether or
not PAT is enabled on X86, no regression will occur on Xen
or in any other environment.

Best Regards,

Chuck

2022-07-11 13:00:17

by Chuck Zmudzinski

[permalink] [raw]
Subject: Re: [PATCH] x86/PAT: have pat_enabled() properly reflect state when running on e.g. Xen, with corrected patch

On 7/5/22 12:14 PM, Borislav Petkov wrote:
> On Tue, Jul 05, 2022 at 05:56:36PM +0200, Jan Beulich wrote:
> > Re-using pat_disabled like you do in your suggestion below won't
> > work, because mtrr_bp_init() calls pat_disable() when MTRRs
> > appear to be disabled (from the kernel's view). The goal is to
> > honor "nopat" without honoring any other calls to pat_disable().

I think Jan is speaking of the narrow goal of the patch
that is causing the regression at hand:

Commit bdd8b6c98239cad3a976d6f197afc2c794d3cef8
("drm/i915: replace X86_FEATURE_PAT with pat_enabled()")


The author of that commit expressed the desire to
more readily handle the nopat option in Linux in an
architecture-independent way. in the commit message.
The patch was not intended to cause a functional
change, but it did - it caused a regression in Xen
Dom0s running certain Intel graphics devices.

>
> Actually, the current goal is to adjust Xen dom0 because:
>
> 1. it uses the PAT code
>
> 2. but then it does something special and hides the MTRRs
>
> which is not something real hardware does.
>
> So this one-off thing should be prominent, visible and not get in the
> way.

Actually, this is probably the most insightful comment in all
the words that have been written about this regression. This
is a one-off thing, peculiar to Xen PV, but it is not visible
enough and gets overlooked when changes are made. I guess
I agree it should not get in the way either, and it doesn't,
except when the lack of visibility of this one-off thing causes
the author of a patch to overlook it and cause unforeseen
problems for users of Xen on Linux. That is precisely what
happened five years ago with commit 99c13b8c8896d7bcb92753bf
("x86/mm/pat: Don't report PAT on CPUs that don't support it")

Have you looked at that patch? That is the one that introduced
the regression that causes pat_enabled() to return a false negative
on Xen Dom0 today, and it hid in the code for four and a half years
and was only exposed as a regression with commit
bdd8b6c98239cad3a9 last December, which again was written in
a way that caused the regression on Xen because this one-off
thing that Xen does was not visible enough to the author of the
recent patch from last December that exposed this problem as a
regression with Xen PV domains and certain Intel graphics adpapters.

That patch did not take into account this not-visible-enough Xen
case when MTRR is disabled but PAT is still supported by the
CPU in Xen PV domains. So the proper way to fix this regression
is by fixing that commit from five years ago. It is very simple.
After some code re-factoring and other changes, today that
commit can be fixed by something like:

--- a/arch/x86/mm/pat/memtype.c    2022-06-16 07:32:05.000000000 -0400
+++ b/arch/x86/mm/pat/memtype.c    2022-07-09 17:51:56.783050765 -0400
@@ -315,6 +315,19 @@
               PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
     }
 
+    else if (!pat_bp_enabled) {
+        /*
+         * In some environments, specifically Xen PV, PAT
+         * initialization is skipped because MTRRs are
+         * disabled even though PAT is available. In such
+         * environments, set PAT to enabled to correctly
+         * indicate to callers of pat_enabled() that CPU
+         * support for PAT is available.
+         */
+        pat_bp_enabled = true;
+        pr_info("x86/PAT: PAT enabled by init_cache_modes\n");
+    }
+
     __init_cache_modes(pat);
 }
 
[N.B. I am re-sending this message because this patch in the
earlier message would be malformed because I deleted an inserted
line without editing the line in the patch that defines how
many new lines are inserted into the patched file. The change from
the proposed patch in the previous message is:

@@ -315,6 +315,20 @@ -> @@ -315,6 +315,19 @@

Sorry for the confusion.]

Like Jan's approach, this patch implements the fix in
init_cache_modes(), but unlike Jan's approach, it only sets
pat_bp_enabled and pat_bp_initialized to true if
boot_cpu_has(X86_FEATURE_PAT) is true and
rdmsrl(MSR_IA32_CR_PAT, pat) returns a valid
value. No need to check for a hypervisor, just check if
the CPU supports PAT here and that PAT MSR returns something
valid here. If both are true, then set pat_bp_enabled to true.
Regression solved.

And that leaves the bigger goal of dealing with this one-off
thing that Xen does in a more sane way for another day. I
am working on a patch series that attempts to start the process
by first re-factoring the currently confusing pat_disable functions
and variables that will hopefully make this one-off Xen thing
more visible and easier to understand so when someone wants
to play around with the current way of deciding whether or
not PAT is enabled on X86, no regression will occur on Xen
or in any other environment.

Best Regards,

Chuck

2022-07-11 14:21:05

by Chuck Zmudzinski

[permalink] [raw]
Subject: Re: [PATCH] x86/PAT: have pat_enabled() properly reflect state when running on e.g. Xen

On 7/5/22 12:14 PM, Borislav Petkov wrote:
> On Tue, Jul 05, 2022 at 05:56:36PM +0200, Jan Beulich wrote:
> > Re-using pat_disabled like you do in your suggestion below won't
> > work, because mtrr_bp_init() calls pat_disable() when MTRRs
> > appear to be disabled (from the kernel's view). The goal is to
> > honor "nopat" without honoring any other calls to pat_disable().
>
> Actually, the current goal is to adjust Xen dom0 because:
>
> 1. it uses the PAT code
>
> 2. but then it does something special and hides the MTRRs
>
> which is not something real hardware does.
>
> So this one-off thing should be prominent, visible and not get in the
> way.

Hello,

I have spent a fair amount of time this past weekend
investigating this regression and what might have caused
it and I also have done several tests on my Xen workstation
that exhibits the regression to verify my understanding
of the problem and also raise other problematic points.

I think, in addition to immediately fixing the regression by
fixing the now five-year-old commit that is the root cause
of the recently exposed regression, as discussed in my
earlier message which proposes a simple patch to fix that
five-year-old broken commit,

https://lore.kernel.org/lkml/[email protected]/

there needs to be a decision about how best to deal with
this very aptly described "one-off Xen thing" in the future.

One problem in x86/mm/pat/memtype.c is the fact that there
exist two functions, pat_init(), and init_cache_modes(), that
do more or less the same thing, so that when one of those
functions needs to be updated, the other also needs to
be updated. In the past, when changes to the pat_enable
and pat_disable functions and variables were made, all
too often, the change was only applied to pat_init() and not
to init_cache_modes() and the one-off Xen case was simply
not addressed and dealt with properly.

So I propose the following:

1) Identify those things that always need to be done in
either pat_init() or init_cache_modes() and try to combine
those things into a single function so that changes will
be applied for both cases. We sort of have that now with
__init_cache_modes(), but it is not really good enough to
prevent the regressions we sometimes get in Xen PV
domains such as the current regression we have with
Xen Dom0 and certain particular Intel graphics devices.

2) If it is not possible to do what is proposed in 1), at least
re-factor the code to make it very clear that whenever
either pat_init() needs to be adjusted or init_cache_modes()
needs to be adjusted, care must be taken to ensure that
all cases are properly dealt with, including the
one-off Xen case of enabling PAT with MTRR disabled,
Currently, AIUI, all one really needs to know is that
init_cached_modes() handles two special cases:
First, when PAT is disabled for any reason, including when
the "nopat" boot option is set, and second, the one-off
Xen case of MTRR being disabled but PAT being enabled.

I am trying to do number 2 with a patch series I am
working on. It is up to the experts to decide if it
is possible or even desirable to improve the code so
that any changes to the caching configuration are more
likely to be properly implemented for all supported platforms
by successfully combining the two functions pat_init() and
init_cache_modes() into a single function. The new function
would probably need to be renamed and include bits from
mtrr_bp_enabled, etc. for it to function properly.

If anyone wants to argue that it is not desirable to try
combine pat_init() and init_cache_modes() into a single
function, I think the burden of proof rests on that
person to demonstrate why it is good and clean
coding practice to continue to have them separate
and independent from each other in the code when
they both essentially do the same thing in the end, which
is call __init_cache_modes() and determine the PAT
state, and also probably the MTRR state.

Best Regards,

Chuck

2022-07-11 14:53:05

by Jürgen Groß

[permalink] [raw]
Subject: Re: [PATCH] x86/PAT: have pat_enabled() properly reflect state when running on e.g. Xen

On 11.07.22 16:18, Chuck Zmudzinski wrote:
> On 7/5/22 12:14 PM, Borislav Petkov wrote:
>> On Tue, Jul 05, 2022 at 05:56:36PM +0200, Jan Beulich wrote:
>>> Re-using pat_disabled like you do in your suggestion below won't
>>> work, because mtrr_bp_init() calls pat_disable() when MTRRs
>>> appear to be disabled (from the kernel's view). The goal is to
>>> honor "nopat" without honoring any other calls to pat_disable().
>>
>> Actually, the current goal is to adjust Xen dom0 because:
>>
>> 1. it uses the PAT code
>>
>> 2. but then it does something special and hides the MTRRs
>>
>> which is not something real hardware does.
>>
>> So this one-off thing should be prominent, visible and not get in the
>> way.
>
> Hello,
>
> I have spent a fair amount of time this past weekend
> investigating this regression and what might have caused
> it and I also have done several tests on my Xen workstation
> that exhibits the regression to verify my understanding
> of the problem and also raise other problematic points.
>
> I think, in addition to immediately fixing the regression by
> fixing the now five-year-old commit that is the root cause
> of the recently exposed regression, as discussed in my
> earlier message which proposes a simple patch to fix that
> five-year-old broken commit,
>
> https://lore.kernel.org/lkml/[email protected]/
>
> there needs to be a decision about how best to deal with
> this very aptly described "one-off Xen thing" in the future.
>
> One problem in x86/mm/pat/memtype.c is the fact that there
> exist two functions, pat_init(), and init_cache_modes(), that
> do more or less the same thing, so that when one of those
> functions needs to be updated, the other also needs to
> be updated. In the past, when changes to the pat_enable
> and pat_disable functions and variables were made, all
> too often, the change was only applied to pat_init() and not
> to init_cache_modes() and the one-off Xen case was simply
> not addressed and dealt with properly.
>
> So I propose the following:
>
> 1) Identify those things that always need to be done in
> either pat_init() or init_cache_modes() and try to combine
> those things into a single function so that changes will
> be applied for both cases. We sort of have that now with
> __init_cache_modes(), but it is not really good enough to
> prevent the regressions we sometimes get in Xen PV
> domains such as the current regression we have with
> Xen Dom0 and certain particular Intel graphics devices.
>
> 2) If it is not possible to do what is proposed in 1), at least
> re-factor the code to make it very clear that whenever
> either pat_init() needs to be adjusted or init_cache_modes()
> needs to be adjusted, care must be taken to ensure that
> all cases are properly dealt with, including the
> one-off Xen case of enabling PAT with MTRR disabled,
> Currently, AIUI, all one really needs to know is that
> init_cached_modes() handles two special cases:
> First, when PAT is disabled for any reason, including when
> the "nopat" boot option is set, and second, the one-off
> Xen case of MTRR being disabled but PAT being enabled.
>
> I am trying to do number 2 with a patch series I am
> working on. It is up to the experts to decide if it
> is possible or even desirable to improve the code so
> that any changes to the caching configuration are more
> likely to be properly implemented for all supported platforms
> by successfully combining the two functions pat_init() and
> init_cache_modes() into a single function. The new function
> would probably need to be renamed and include bits from
> mtrr_bp_enabled, etc. for it to function properly.
>
> If anyone wants to argue that it is not desirable to try
> combine pat_init() and init_cache_modes() into a single
> function, I think the burden of proof rests on that
> person to demonstrate why it is good and clean
> coding practice to continue to have them separate
> and independent from each other in the code when
> they both essentially do the same thing in the end, which
> is call __init_cache_modes() and determine the PAT
> state, and also probably the MTRR state.

I think the proper solution would be to make PAT and MTRR independent
from each other with some additional restructuring on the PAT side:

Today PAT can't be used without MTRR being available, unless MTRR is at
least configured via CONFIG_MTRR and the system is running as Xen PV
guest. In this case PAT is automatically available via the hypervisor,
but the PAT MSR can't be modified by the kernel and MTRR is disabled.

As an additional complexity the availability of PAT can't be queried
via pat_enabled() in the Xen PV case, as the lack of MTRR will set PAT
to be disabled. This leads to some drivers believing that not all cache
modes are available, resulting in failures or degraded functionality
(the current regression).

The same applies to a kernel built with no MTRR support: it won't
allow to use the PAT MSR, even if there is no technical reason for
that, other than setting up PAT on all cpus the same way (which is a
requirement of the processor's cache management) is relying on some
MTRR specific code.

All of that should be fixable by:

- moving the function needed by PAT from MTRR specific code one level
up
- adding a PAT indirection layer supporting the 3 cases "no or disabled
PAT", "PAT under kernel control", and "PAT under Xen control"
- removing the dependency of PAT on MTRR

I'd be up for trying this, but right now I haven't got the time to do
it. I might be able to start working on that in September.

Meanwhile I think either Jan's patch or the simple one of Chuck
should be applied.


Juergen


Attachments:
OpenPGP_0xB0DE9DD628BF132F.asc (3.08 kB)
OpenPGP public key
OpenPGP_signature (505.00 B)
OpenPGP digital signature
Download all attachments

2022-07-11 18:18:27

by Chuck Zmudzinski

[permalink] [raw]
Subject: Re: [PATCH] x86/PAT: have pat_enabled() properly reflect state when running on e.g. Xen

On 7/11/2022 10:31 AM, Juergen Gross wrote:
> On 11.07.22 16:18, Chuck Zmudzinski wrote:
> > On 7/5/22 12:14 PM, Borislav Petkov wrote:
> >> On Tue, Jul 05, 2022 at 05:56:36PM +0200, Jan Beulich wrote:
> >>> Re-using pat_disabled like you do in your suggestion below won't
> >>> work, because mtrr_bp_init() calls pat_disable() when MTRRs
> >>> appear to be disabled (from the kernel's view). The goal is to
> >>> honor "nopat" without honoring any other calls to pat_disable().
> >>
> >> Actually, the current goal is to adjust Xen dom0 because:
> >>
> >> 1. it uses the PAT code
> >>
> >> 2. but then it does something special and hides the MTRRs
> >>
> >> which is not something real hardware does.
> >>
> >> So this one-off thing should be prominent, visible and not get in the
> >> way.
> >
> > Hello,
> >
> > I have spent a fair amount of time this past weekend
> > investigating this regression and what might have caused
> > it and I also have done several tests on my Xen workstation
> > that exhibits the regression to verify my understanding
> > of the problem and also raise other problematic points.
> >
> > I think, in addition to immediately fixing the regression by
> > fixing the now five-year-old commit that is the root cause
> > of the recently exposed regression, as discussed in my
> > earlier message which proposes a simple patch to fix that
> > five-year-old broken commit,
> >
> > https://lore.kernel.org/lkml/[email protected]/
> >
> > there needs to be a decision about how best to deal with
> > this very aptly described "one-off Xen thing" in the future.
> >
> > One problem in x86/mm/pat/memtype.c is the fact that there
> > exist two functions, pat_init(), and init_cache_modes(), that
> > do more or less the same thing, so that when one of those
> > functions needs to be updated, the other also needs to
> > be updated. In the past, when changes to the pat_enable
> > and pat_disable functions and variables were made, all
> > too often, the change was only applied to pat_init() and not
> > to init_cache_modes() and the one-off Xen case was simply
> > not addressed and dealt with properly.
> >
> > So I propose the following:
> >
> > 1) Identify those things that always need to be done in
> > either pat_init() or init_cache_modes() and try to combine
> > those things into a single function so that changes will
> > be applied for both cases. We sort of have that now with
> > __init_cache_modes(), but it is not really good enough to
> > prevent the regressions we sometimes get in Xen PV
> > domains such as the current regression we have with
> > Xen Dom0 and certain particular Intel graphics devices.
> >
> > 2) If it is not possible to do what is proposed in 1), at least
> > re-factor the code to make it very clear that whenever
> > either pat_init() needs to be adjusted or init_cache_modes()
> > needs to be adjusted, care must be taken to ensure that
> > all cases are properly dealt with, including the
> > one-off Xen case of enabling PAT with MTRR disabled,
> > Currently, AIUI, all one really needs to know is that
> > init_cached_modes() handles two special cases:
> > First, when PAT is disabled for any reason, including when
> > the "nopat" boot option is set, and second, the one-off
> > Xen case of MTRR being disabled but PAT being enabled.
> >
> > I am trying to do number 2 with a patch series I am
> > working on. It is up to the experts to decide if it
> > is possible or even desirable to improve the code so
> > that any changes to the caching configuration are more
> > likely to be properly implemented for all supported platforms
> > by successfully combining the two functions pat_init() and
> > init_cache_modes() into a single function. The new function
> > would probably need to be renamed and include bits from
> > mtrr_bp_enabled, etc. for it to function properly.
> >
> > If anyone wants to argue that it is not desirable to try
> > combine pat_init() and init_cache_modes() into a single
> > function, I think the burden of proof rests on that
> > person to demonstrate why it is good and clean
> > coding practice to continue to have them separate
> > and independent from each other in the code when
> > they both essentially do the same thing in the end, which
> > is call __init_cache_modes() and determine the PAT
> > state, and also probably the MTRR state.
>
> I think the proper solution would be to make PAT and MTRR independent
> from each other with some additional restructuring on the PAT side:
>
> Today PAT can't be used without MTRR being available, unless MTRR is at
> least configured via CONFIG_MTRR and the system is running as Xen PV
> guest. In this case PAT is automatically available via the hypervisor,
> but the PAT MSR can't be modified by the kernel and MTRR is disabled.
>
> As an additional complexity the availability of PAT can't be queried
> via pat_enabled() in the Xen PV case, as the lack of MTRR will set PAT
> to be disabled.

My testing indicates your presumption is not correct, or at least my
testing on my Xen workstation makes me doubt that your assertion that
PAT can't be queried vi pat_enabled() in the Xen PV case is true.

Let me begin a discussion by posing a question about the following
code in init_cache_modes(), a function implemented in
arch/x86/mm/pat/memtypes.c, and which is called from
arch/x86/kernel/setup.c but only effective if pat_cm_initialized
has not already been set by pat_init(), as is obvious from the
immediate return if pat_cm_initialized has already been set.

Now in the Xen PV case, pat_cm_initialized is not set because
pat_init(), where it is normally set, was skipped due to the fact
that MTRRs are disabled in Xen PV domains. I verified that
init_cache_modes() is activated in the Xen PV Dom0 case by my
testing. So, I ask, why cannot the code that is present here in
init_cache_modes() be used to test for PAT in the Xen PV Dom0
environment and if the test is positive, why cannot pat_bp_enabled
be set which will cause pat_enabled() to return true in the Xen
PV environment? In fact, I used the additional 'else if' block to
set pat_bp_enabled to true in the Xen PV environment for the
case when boot_cpu_has(X86_FEATURE_PAT) is true and PAT
MSR does not return 0, that, is, when pat !=0 after calling
rdmsrl(MSR_IA32_CR_PAT, pat) for the boot CPU. All the existing
comments in this function indicate this is a valid way to test
for the existence of PAT in the Xen PV Dom0 environment.

Moreover... (please move to the bottom of the code snippet
for more information about my tests in the Xen PV environment...)

void init_cache_modes(void)
{
    u64 pat = 0;

    if (pat_cm_initialized)
        return;

    if (boot_cpu_has(X86_FEATURE_PAT)) {
        /*
         * CPU supports PAT. Set PAT table to be consistent with
         * PAT MSR. This case supports "nopat" boot option, and
         * virtual machine environments which support PAT without
         * MTRRs. In specific, Xen has unique setup to PAT MSR.
         *
         * If PAT MSR returns 0, it is considered invalid and emulates
         * as No PAT.
         */
        rdmsrl(MSR_IA32_CR_PAT, pat);
    }

    if (!pat) {
        /*
         * No PAT. Emulate the PAT table that corresponds to the two
         * cache bits, PWT (Write Through) and PCD (Cache Disable).
         * This setup is also the same as the BIOS default setup.
         *
         * PTE encoding:
         *
         *       PCD
         *       |PWT  PAT
         *       ||    slot
         *       00    0    WB : _PAGE_CACHE_MODE_WB
         *       01    1    WT : _PAGE_CACHE_MODE_WT
         *       10    2    UC-: _PAGE_CACHE_MODE_UC_MINUS
         *       11    3    UC : _PAGE_CACHE_MODE_UC
         *
         * NOTE: When WC or WP is used, it is redirected to UC- per
         * the default setup in __cachemode2pte_tbl[].
         */
        pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) |
              PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
    }

    else if (!pat_bp_enabled) {
        /*
         * In some environments, specifically Xen PV, PAT
         * initialization is skipped because MTRRs are
         * disabled even though PAT is available. In such
         * environments, set PAT to initialized and enabled to
         * correctly indicate to callers of pat_enabled() that
         * PAT is available and prevent PAT from being disabled.
         */
        pat_bp_enabled = true;
        pr_info("x86/PAT: PAT enabled by init_cache_modes\n");
    }

    __init_cache_modes(pat);
}

This function, patched with the extra 'else if' block, fixes the
regression on my Xen worksatation, and the pr_info message
"x86/PAT: PAT enabled by init_cache_modes" appears in the logs
when running this patched kernel in my Xen Dom0. This means
that in the Xen PV environment on my Xen Dom0 workstation,
rdmsrl(MSR_IA32_CR_PAT, pat) successfully tested for the presence
of PAT on the virtual CPU that Xen exposed to the Linux kernel on my
Xen Dom0 workstation. At least that is what I think my tests prove.

So why is this not a valid way to test for the existence of
PAT in the Xen PV environment? Are the existing comments
in init_cache_modes() about supporting both the case when
the "nopat" boot option is set and the specific case of Xen and
MTRR disabled wrong? My testing confirms those comments are
correct.

> This leads to some drivers believing that not all cache
> modes are available, resulting in failures or degraded functionality
> (the current regression).

I also did some tests to try to verify that fast WC caching mode
was used as requested by the i915 Intel graphics driver on my
Xen Dom0 workstation and so far I cannot prove that the WC caching
mode was not used, and I have some custom logs from the
Linux kernel i915 driver that indicates my Linux Xen Dom0 is able to
use the fast WC caching mode, so long as the bug that causes
pat_enabled() to return a false negative is fixed.

So my testing seems to contradict rather than confirm your claim
that "the availability of PAT can't be queried via pat_enabled() in
the Xen PV case." I think my patch successfully does what you
claim cannot be done, and I also think the existing comments in
init_cache_modes() indicates that my patch is successfully testing
for PAT in the Xen PV case.

>
> The same applies to a kernel built with no MTRR support: it won't
> allow to use the PAT MSR, even if there is no technical reason for
> that, other than setting up PAT on all cpus the same way (which is a
> requirement of the processor's cache management) is relying on some
> MTRR specific code.

After reading the code extensively, I am curious about testing
and even successfully compiling a kernel with options like
CONFIG_MTRR and CONFIG_PAT disabled. But my testing indicates
such problems could be debugged and fixed if they manifest
using the current code which, AFAICT, appears to be mostly
designed and tested for the case when both PAT and MTRR
are enabled on modern x86 hardware. I would expect many
bugs for configurations that deviate too far from that paradigm
on Linux. This is the problem with Xen's unique configuration
that enables PAT but disables MTRR. I think the problem can
be solved, it is just that there is not enough care taken to
ensure that these outlier cases will work as well as the common
configuration that Linux supports well, which is with both
MTRR and PAT enabled on x86.

>
> All of that should be fixable by:
>
> - moving the function needed by PAT from MTRR specific code one level
> up
> - adding a PAT indirection layer supporting the 3 cases "no or disabled
> PAT", "PAT under kernel control", and "PAT under Xen control"

I am not sure the right way to divide the cases is "PAT under
kernel control" and "PAT under Xen control." I also looked at
the code in the Xen hypervisor and it does allow for Dom0
to use a Xen hypercall provided by the xenctrl library to enable
PAT WC caching mode, and I ran some tests for evidence that
my Xen Dom0 was making such hypercalls to support the WC
caching mode, but I have not yet been able to verify that the
WC caching mode is enabled by a hypercall from Dom0 to Xen,
although by reading the code in the hypervisor and xenctrl it
seems to be possible. I just don't think the i915 Linux kernel
driver uses such hypercalls, and I am not sure that my Xen
Dom0 could be using the fast WC caching mode because
"PAT is under Xen control" instead of "under kernel control"
unless the kernel is relying on Xen to provide PAT via Xen
hypercalls. The successful call of my Xen Dom0 to rdmsrl
which obtained a valid configuration of PAT for the virtual
boot CPU exposed by Xen to the kernel indicates at least that
the kernel can detect, if not control, what the supported PAT
caching modes are. Of course you are absolutely correct that
the kernel cannot directly control the PAT caching modes in the
Xen PV environment, at least not without possibly resorting to
some Xen hypercalls that are most likely not currently in use
by the kernel.


I would have to do much more testing to really determine
what is happening and how my Xen Dom0 is really
configuring the PAT caching modes and specifically, if
it is actually using the fast WC caching mode as requested
by the i915 Intel graphics driver.

> - removing the dependency of PAT on MTRR
>
> I'd be up for trying this, but right now I haven't got the time to do
> it. I might be able to start working on that in September.

Please let me know if/when you start on it, I would be interested
to try out what you come up with on my Xen Dom0.

>
> Meanwhile I think either Jan's patch or the simple one of Chuck
> should be applied.

I agree.

Best Regards,

Chuck

2022-07-12 05:51:09

by Jürgen Groß

[permalink] [raw]
Subject: Re: [PATCH] x86/PAT: have pat_enabled() properly reflect state when running on e.g. Xen

On 11.07.22 19:41, Chuck Zmudzinski wrote:
> On 7/11/2022 10:31 AM, Juergen Gross wrote:
>> On 11.07.22 16:18, Chuck Zmudzinski wrote:
>>> On 7/5/22 12:14 PM, Borislav Petkov wrote:
>>>> On Tue, Jul 05, 2022 at 05:56:36PM +0200, Jan Beulich wrote:
>>>>> Re-using pat_disabled like you do in your suggestion below won't
>>>>> work, because mtrr_bp_init() calls pat_disable() when MTRRs
>>>>> appear to be disabled (from the kernel's view). The goal is to
>>>>> honor "nopat" without honoring any other calls to pat_disable().
>>>>
>>>> Actually, the current goal is to adjust Xen dom0 because:
>>>>
>>>> 1. it uses the PAT code
>>>>
>>>> 2. but then it does something special and hides the MTRRs
>>>>
>>>> which is not something real hardware does.
>>>>
>>>> So this one-off thing should be prominent, visible and not get in the
>>>> way.
>>>
>>> Hello,
>>>
>>> I have spent a fair amount of time this past weekend
>>> investigating this regression and what might have caused
>>> it and I also have done several tests on my Xen workstation
>>> that exhibits the regression to verify my understanding
>>> of the problem and also raise other problematic points.
>>>
>>> I think, in addition to immediately fixing the regression by
>>> fixing the now five-year-old commit that is the root cause
>>> of the recently exposed regression, as discussed in my
>>> earlier message which proposes a simple patch to fix that
>>> five-year-old broken commit,
>>>
>>> https://lore.kernel.org/lkml/[email protected]/
>>>
>>> there needs to be a decision about how best to deal with
>>> this very aptly described "one-off Xen thing" in the future.
>>>
>>> One problem in x86/mm/pat/memtype.c is the fact that there
>>> exist two functions, pat_init(), and init_cache_modes(), that
>>> do more or less the same thing, so that when one of those
>>> functions needs to be updated, the other also needs to
>>> be updated. In the past, when changes to the pat_enable
>>> and pat_disable functions and variables were made, all
>>> too often, the change was only applied to pat_init() and not
>>> to init_cache_modes() and the one-off Xen case was simply
>>> not addressed and dealt with properly.
>>>
>>> So I propose the following:
>>>
>>> 1) Identify those things that always need to be done in
>>> either pat_init() or init_cache_modes() and try to combine
>>> those things into a single function so that changes will
>>> be applied for both cases. We sort of have that now with
>>> __init_cache_modes(), but it is not really good enough to
>>> prevent the regressions we sometimes get in Xen PV
>>> domains such as the current regression we have with
>>> Xen Dom0 and certain particular Intel graphics devices.
>>>
>>> 2) If it is not possible to do what is proposed in 1), at least
>>> re-factor the code to make it very clear that whenever
>>> either pat_init() needs to be adjusted or init_cache_modes()
>>> needs to be adjusted, care must be taken to ensure that
>>> all cases are properly dealt with, including the
>>> one-off Xen case of enabling PAT with MTRR disabled,
>>> Currently, AIUI, all one really needs to know is that
>>> init_cached_modes() handles two special cases:
>>> First, when PAT is disabled for any reason, including when
>>> the "nopat" boot option is set, and second, the one-off
>>> Xen case of MTRR being disabled but PAT being enabled.
>>>
>>> I am trying to do number 2 with a patch series I am
>>> working on. It is up to the experts to decide if it
>>> is possible or even desirable to improve the code so
>>> that any changes to the caching configuration are more
>>> likely to be properly implemented for all supported platforms
>>> by successfully combining the two functions pat_init() and
>>> init_cache_modes() into a single function. The new function
>>> would probably need to be renamed and include bits from
>>> mtrr_bp_enabled, etc. for it to function properly.
>>>
>>> If anyone wants to argue that it is not desirable to try
>>> combine pat_init() and init_cache_modes() into a single
>>> function, I think the burden of proof rests on that
>>> person to demonstrate why it is good and clean
>>> coding practice to continue to have them separate
>>> and independent from each other in the code when
>>> they both essentially do the same thing in the end, which
>>> is call __init_cache_modes() and determine the PAT
>>> state, and also probably the MTRR state.
>>
>> I think the proper solution would be to make PAT and MTRR independent
>> from each other with some additional restructuring on the PAT side:
>>
>> Today PAT can't be used without MTRR being available, unless MTRR is at
>> least configured via CONFIG_MTRR and the system is running as Xen PV
>> guest. In this case PAT is automatically available via the hypervisor,
>> but the PAT MSR can't be modified by the kernel and MTRR is disabled.
>>
>> As an additional complexity the availability of PAT can't be queried
>> via pat_enabled() in the Xen PV case, as the lack of MTRR will set PAT
>> to be disabled.
>
> My testing indicates your presumption is not correct, or at least my
> testing on my Xen workstation makes me doubt that your assertion that
> PAT can't be queried vi pat_enabled() in the Xen PV case is true.

I was referring to the current (regressed) state.

>
> Let me begin a discussion by posing a question about the following
> code in init_cache_modes(), a function implemented in
> arch/x86/mm/pat/memtypes.c, and which is called from
> arch/x86/kernel/setup.c but only effective if pat_cm_initialized
> has not already been set by pat_init(), as is obvious from the
> immediate return if pat_cm_initialized has already been set.
>
> Now in the Xen PV case, pat_cm_initialized is not set because
> pat_init(), where it is normally set, was skipped due to the fact
> that MTRRs are disabled in Xen PV domains. I verified that
> init_cache_modes() is activated in the Xen PV Dom0 case by my
> testing. So, I ask, why cannot the code that is present here in
> init_cache_modes() be used to test for PAT in the Xen PV Dom0
> environment and if the test is positive, why cannot pat_bp_enabled
> be set which will cause pat_enabled() to return true in the Xen
> PV environment? In fact, I used the additional 'else if' block to
> set pat_bp_enabled to true in the Xen PV environment for the
> case when boot_cpu_has(X86_FEATURE_PAT) is true and PAT
> MSR does not return 0, that, is, when pat !=0 after calling
> rdmsrl(MSR_IA32_CR_PAT, pat) for the boot CPU. All the existing
> comments in this function indicate this is a valid way to test
> for the existence of PAT in the Xen PV Dom0 environment.

I agree this is a possibility to fix the regression.

In the long run I still think that my idea is the better one, as it
allows to run without CONFIG_MTRR even on bare metal without losing
the PAT functionality.

>
> Moreover... (please move to the bottom of the code snippet
> for more information about my tests in the Xen PV environment...)
>
> void init_cache_modes(void)
> {
>     u64 pat = 0;
>
>     if (pat_cm_initialized)
>         return;
>
>     if (boot_cpu_has(X86_FEATURE_PAT)) {
>         /*
>          * CPU supports PAT. Set PAT table to be consistent with
>          * PAT MSR. This case supports "nopat" boot option, and
>          * virtual machine environments which support PAT without
>          * MTRRs. In specific, Xen has unique setup to PAT MSR.
>          *
>          * If PAT MSR returns 0, it is considered invalid and emulates
>          * as No PAT.
>          */
>         rdmsrl(MSR_IA32_CR_PAT, pat);
>     }
>
>     if (!pat) {
>         /*
>          * No PAT. Emulate the PAT table that corresponds to the two
>          * cache bits, PWT (Write Through) and PCD (Cache Disable).
>          * This setup is also the same as the BIOS default setup.
>          *
>          * PTE encoding:
>          *
>          *       PCD
>          *       |PWT  PAT
>          *       ||    slot
>          *       00    0    WB : _PAGE_CACHE_MODE_WB
>          *       01    1    WT : _PAGE_CACHE_MODE_WT
>          *       10    2    UC-: _PAGE_CACHE_MODE_UC_MINUS
>          *       11    3    UC : _PAGE_CACHE_MODE_UC
>          *
>          * NOTE: When WC or WP is used, it is redirected to UC- per
>          * the default setup in __cachemode2pte_tbl[].
>          */
>         pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) |
>               PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
>     }
>
>     else if (!pat_bp_enabled) {
>         /*
>          * In some environments, specifically Xen PV, PAT
>          * initialization is skipped because MTRRs are
>          * disabled even though PAT is available. In such
>          * environments, set PAT to initialized and enabled to
>          * correctly indicate to callers of pat_enabled() that
>          * PAT is available and prevent PAT from being disabled.
>          */
>         pat_bp_enabled = true;
>         pr_info("x86/PAT: PAT enabled by init_cache_modes\n");
>     }
>
>     __init_cache_modes(pat);
> }
>
> This function, patched with the extra 'else if' block, fixes the
> regression on my Xen worksatation, and the pr_info message
> "x86/PAT: PAT enabled by init_cache_modes" appears in the logs
> when running this patched kernel in my Xen Dom0. This means
> that in the Xen PV environment on my Xen Dom0 workstation,
> rdmsrl(MSR_IA32_CR_PAT, pat) successfully tested for the presence
> of PAT on the virtual CPU that Xen exposed to the Linux kernel on my
> Xen Dom0 workstation. At least that is what I think my tests prove.

Yes, your fix is correct AFAICS.

> So why is this not a valid way to test for the existence of
> PAT in the Xen PV environment? Are the existing comments
> in init_cache_modes() about supporting both the case when
> the "nopat" boot option is set and the specific case of Xen and
> MTRR disabled wrong? My testing confirms those comments are
> correct.
>
>> This leads to some drivers believing that not all cache
>> modes are available, resulting in failures or degraded functionality
>> (the current regression).
>
> I also did some tests to try to verify that fast WC caching mode
> was used as requested by the i915 Intel graphics driver on my
> Xen Dom0 workstation and so far I cannot prove that the WC caching
> mode was not used, and I have some custom logs from the
> Linux kernel i915 driver that indicates my Linux Xen Dom0 is able to
> use the fast WC caching mode, so long as the bug that causes
> pat_enabled() to return a false negative is fixed.
>
> So my testing seems to contradict rather than confirm your claim
> that "the availability of PAT can't be queried via pat_enabled() in
> the Xen PV case." I think my patch successfully does what you
> claim cannot be done, and I also think the existing comments in
> init_cache_modes() indicates that my patch is successfully testing
> for PAT in the Xen PV case.
>
>>
>> The same applies to a kernel built with no MTRR support: it won't
>> allow to use the PAT MSR, even if there is no technical reason for
>> that, other than setting up PAT on all cpus the same way (which is a
>> requirement of the processor's cache management) is relying on some
>> MTRR specific code.
>
> After reading the code extensively, I am curious about testing
> and even successfully compiling a kernel with options like
> CONFIG_MTRR and CONFIG_PAT disabled. But my testing indicates
> such problems could be debugged and fixed if they manifest
> using the current code which, AFAICT, appears to be mostly
> designed and tested for the case when both PAT and MTRR
> are enabled on modern x86 hardware. I would expect many
> bugs for configurations that deviate too far from that paradigm
> on Linux. This is the problem with Xen's unique configuration
> that enables PAT but disables MTRR. I think the problem can
> be solved, it is just that there is not enough care taken to
> ensure that these outlier cases will work as well as the common
> configuration that Linux supports well, which is with both
> MTRR and PAT enabled on x86.

On bare metal PAT can't work without CONFIG_MTRR, as setting the PAT
MSR uniformly across cpus can't be done (the code is not available
without CONFIG_MTRR).

>
>>
>> All of that should be fixable by:
>>
>> - moving the function needed by PAT from MTRR specific code one level
>> up
>> - adding a PAT indirection layer supporting the 3 cases "no or disabled
>> PAT", "PAT under kernel control", and "PAT under Xen control"
>
> I am not sure the right way to divide the cases is "PAT under
> kernel control" and "PAT under Xen control." I also looked at
> the code in the Xen hypervisor and it does allow for Dom0
> to use a Xen hypercall provided by the xenctrl library to enable
> PAT WC caching mode, and I ran some tests for evidence that

I guess you mean XEN_DMOP_pin_memory_cacheattr. This is meant to
be used for device emulation.

> my Xen Dom0 was making such hypercalls to support the WC
> caching mode, but I have not yet been able to verify that the
> WC caching mode is enabled by a hypercall from Dom0 to Xen,
> although by reading the code in the hypervisor and xenctrl it
> seems to be possible. I just don't think the i915 Linux kernel
> driver uses such hypercalls, and I am not sure that my Xen
> Dom0 could be using the fast WC caching mode because
> "PAT is under Xen control" instead of "under kernel control"
> unless the kernel is relying on Xen to provide PAT via Xen
> hypercalls. The successful call of my Xen Dom0 to rdmsrl

The PAT MSR is setup by the hypervisor to support all possible
caching modes. Dom0 just needs to set the correct bits in a
page table entry to select the wanted caching mode.

The reason why there is no hypercall to modify the PAT MSR is,
that it is required that the MSR has the same value across all
cpus on the system. This precludes the possibility that any
guest could change it, as this would change it for all guests.
This all is true only for PV guests, of course, as HVM or PVH
guests get a virtualized PAT MSR via the hardware virtualization
feature.

> which obtained a valid configuration of PAT for the virtual
> boot CPU exposed by Xen to the kernel indicates at least that
> the kernel can detect, if not control, what the supported PAT
> caching modes are. Of course you are absolutely correct that
> the kernel cannot directly control the PAT caching modes in the
> Xen PV environment, at least not without possibly resorting to
> some Xen hypercalls that are most likely not currently in use
> by the kernel.

There is such hypercall. The hypercall you are thinking of is
only changing the caching mode for a specific set of pages by
modifying the related EPT/NPT entries.

> I would have to do much more testing to really determine
> what is happening and how my Xen Dom0 is really
> configuring the PAT caching modes and specifically, if
> it is actually using the fast WC caching mode as requested
> by the i915 Intel graphics driver.

A PV dom0 can only adapt itself to the PAT setting the Xen
hypervisor has done.

>
>> - removing the dependency of PAT on MTRR
>>
>> I'd be up for trying this, but right now I haven't got the time to do
>> it. I might be able to start working on that in September.
>
> Please let me know if/when you start on it, I would be interested
> to try out what you come up with on my Xen Dom0.

Will do that.

>
>>
>> Meanwhile I think either Jan's patch or the simple one of Chuck
>> should be applied.
>
> I agree.

Happy to hear that. :-)


Juergen


Attachments:
OpenPGP_0xB0DE9DD628BF132F.asc (3.08 kB)
OpenPGP public key
OpenPGP_signature (501.00 B)
OpenPGP digital signature
Download all attachments

2022-07-12 06:39:50

by Jan Beulich

[permalink] [raw]
Subject: Re: [PATCH] x86/PAT: have pat_enabled() properly reflect state when running on e.g. Xen

On 11.07.2022 19:41, Chuck Zmudzinski wrote:
> Moreover... (please move to the bottom of the code snippet
> for more information about my tests in the Xen PV environment...)
>
> void init_cache_modes(void)
> {
>     u64 pat = 0;
>
>     if (pat_cm_initialized)
>         return;
>
>     if (boot_cpu_has(X86_FEATURE_PAT)) {
>         /*
>          * CPU supports PAT. Set PAT table to be consistent with
>          * PAT MSR. This case supports "nopat" boot option, and
>          * virtual machine environments which support PAT without
>          * MTRRs. In specific, Xen has unique setup to PAT MSR.
>          *
>          * If PAT MSR returns 0, it is considered invalid and emulates
>          * as No PAT.
>          */
>         rdmsrl(MSR_IA32_CR_PAT, pat);
>     }
>
>     if (!pat) {
>         /*
>          * No PAT. Emulate the PAT table that corresponds to the two
>          * cache bits, PWT (Write Through) and PCD (Cache Disable).
>          * This setup is also the same as the BIOS default setup.
>          *
>          * PTE encoding:
>          *
>          *       PCD
>          *       |PWT  PAT
>          *       ||    slot
>          *       00    0    WB : _PAGE_CACHE_MODE_WB
>          *       01    1    WT : _PAGE_CACHE_MODE_WT
>          *       10    2    UC-: _PAGE_CACHE_MODE_UC_MINUS
>          *       11    3    UC : _PAGE_CACHE_MODE_UC
>          *
>          * NOTE: When WC or WP is used, it is redirected to UC- per
>          * the default setup in __cachemode2pte_tbl[].
>          */
>         pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) |
>               PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
>     }
>
>     else if (!pat_bp_enabled) {
>         /*
>          * In some environments, specifically Xen PV, PAT
>          * initialization is skipped because MTRRs are
>          * disabled even though PAT is available. In such
>          * environments, set PAT to initialized and enabled to
>          * correctly indicate to callers of pat_enabled() that
>          * PAT is available and prevent PAT from being disabled.
>          */
>         pat_bp_enabled = true;
>         pr_info("x86/PAT: PAT enabled by init_cache_modes\n");
>     }
>
>     __init_cache_modes(pat);
> }
>
> This function, patched with the extra 'else if' block, fixes the
> regression on my Xen worksatation, and the pr_info message
> "x86/PAT: PAT enabled by init_cache_modes" appears in the logs
> when running this patched kernel in my Xen Dom0. This means
> that in the Xen PV environment on my Xen Dom0 workstation,
> rdmsrl(MSR_IA32_CR_PAT, pat) successfully tested for the presence
> of PAT on the virtual CPU that Xen exposed to the Linux kernel on my
> Xen Dom0 workstation. At least that is what I think my tests prove.
>
> So why is this not a valid way to test for the existence of
> PAT in the Xen PV environment? Are the existing comments
> in init_cache_modes() about supporting both the case when
> the "nopat" boot option is set and the specific case of Xen and
> MTRR disabled wrong? My testing confirms those comments are
> correct.

At the very least this ignores the possible "nopat" an admin may
have passed to the kernel.

Jan

2022-07-12 13:32:24

by Chuck Zmudzinski

[permalink] [raw]
Subject: Re: [PATCH] x86/PAT: have pat_enabled() properly reflect state when running on e.g. Xen

On 7/12/2022 2:04 AM, Jan Beulich wrote:
> On 11.07.2022 19:41, Chuck Zmudzinski wrote:
> > Moreover... (please move to the bottom of the code snippet
> > for more information about my tests in the Xen PV environment...)
> >
> > void init_cache_modes(void)
> > {
> >     u64 pat = 0;
> >
> >     if (pat_cm_initialized)
> >         return;
> >
> >     if (boot_cpu_has(X86_FEATURE_PAT)) {
> >         /*
> >          * CPU supports PAT. Set PAT table to be consistent with
> >          * PAT MSR. This case supports "nopat" boot option, and
> >          * virtual machine environments which support PAT without
> >          * MTRRs. In specific, Xen has unique setup to PAT MSR.
> >          *
> >          * If PAT MSR returns 0, it is considered invalid and emulates
> >          * as No PAT.
> >          */
> >         rdmsrl(MSR_IA32_CR_PAT, pat);
> >     }
> >
> >     if (!pat) {
> >         /*
> >          * No PAT. Emulate the PAT table that corresponds to the two
> >          * cache bits, PWT (Write Through) and PCD (Cache Disable).
> >          * This setup is also the same as the BIOS default setup.
> >          *
> >          * PTE encoding:
> >          *
> >          *       PCD
> >          *       |PWT  PAT
> >          *       ||    slot
> >          *       00    0    WB : _PAGE_CACHE_MODE_WB
> >          *       01    1    WT : _PAGE_CACHE_MODE_WT
> >          *       10    2    UC-: _PAGE_CACHE_MODE_UC_MINUS
> >          *       11    3    UC : _PAGE_CACHE_MODE_UC
> >          *
> >          * NOTE: When WC or WP is used, it is redirected to UC- per
> >          * the default setup in __cachemode2pte_tbl[].
> >          */
> >         pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) |
> >               PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
> >     }
> >
> >     else if (!pat_bp_enabled) {
> >         /*
> >          * In some environments, specifically Xen PV, PAT
> >          * initialization is skipped because MTRRs are
> >          * disabled even though PAT is available. In such
> >          * environments, set PAT to initialized and enabled to
> >          * correctly indicate to callers of pat_enabled() that
> >          * PAT is available and prevent PAT from being disabled.
> >          */
> >         pat_bp_enabled = true;
> >         pr_info("x86/PAT: PAT enabled by init_cache_modes\n");
> >     }
> >
> >     __init_cache_modes(pat);
> > }
> >
> > This function, patched with the extra 'else if' block, fixes the
> > regression on my Xen worksatation, and the pr_info message
> > "x86/PAT: PAT enabled by init_cache_modes" appears in the logs
> > when running this patched kernel in my Xen Dom0. This means
> > that in the Xen PV environment on my Xen Dom0 workstation,
> > rdmsrl(MSR_IA32_CR_PAT, pat) successfully tested for the presence
> > of PAT on the virtual CPU that Xen exposed to the Linux kernel on my
> > Xen Dom0 workstation. At least that is what I think my tests prove.
> >
> > So why is this not a valid way to test for the existence of
> > PAT in the Xen PV environment? Are the existing comments
> > in init_cache_modes() about supporting both the case when
> > the "nopat" boot option is set and the specific case of Xen and
> > MTRR disabled wrong? My testing confirms those comments are
> > correct.
>
> At the very least this ignores the possible "nopat" an admin may
> have passed to the kernel.

I realize that. The patch I proposed here only fixes the regression. It
would be easy to also modify the patch to also observe the 'nopat"
setting. I think your patch had a force_pat_disable local variable that
is set if pat is disabled by the administrator with "nopat." With that
variable available, modifying the patch so in init_cache_modes we have:

     if (!pat || force_pat_disable) {
         /*
          * No PAT. Emulate the PAT table that corresponds to the two

Instead of:

     if (!pat) {
         /*
          * No PAT. Emulate the PAT table that corresponds to the two

would cause the kernel to respect the "nopat" setting by the administrator
in the Xen PV Dom0 environment.

I agree this needs to be fixed up, because currently the code is very
confusing and the current variable names and function names do not
always accurately describe what they actually do in the code. That is
why I am working on a patch to do some re-factoring, which only consists
of function and variable name changes and comment changes to fix
the places where the comments in the code are misleading or incomplete.

I think perhaps the most misnamed variable here is the  local
variable pat_disabled in memtypes.c and the most misnamed function is the
pat_disable function in memtypes.c. They should be named pat_init_disabled
and pat_init_disable, respectively, because they do not really disable
PAT in
the code but only prevent execution of the pat_init function. That
leaves open
the possibility for PAT to be enabled by init_cache_modes, which actually
occurs in the current code in the Xen PV Dom0 environment, but the current
code neglects to set pat_bp_enabled to true in that case. So we need a patch
to fix that in order to fix the regression.

Chuck

2022-07-12 13:36:16

by Jürgen Groß

[permalink] [raw]
Subject: Re: [PATCH] x86/PAT: have pat_enabled() properly reflect state when running on e.g. Xen

On 12.07.22 15:22, Chuck Zmudzinski wrote:
> On 7/12/2022 2:04 AM, Jan Beulich wrote:
>> On 11.07.2022 19:41, Chuck Zmudzinski wrote:
>>> Moreover... (please move to the bottom of the code snippet
>>> for more information about my tests in the Xen PV environment...)
>>>
>>> void init_cache_modes(void)
>>> {
>>>     u64 pat = 0;
>>>
>>>     if (pat_cm_initialized)
>>>         return;
>>>
>>>     if (boot_cpu_has(X86_FEATURE_PAT)) {
>>>         /*
>>>          * CPU supports PAT. Set PAT table to be consistent with
>>>          * PAT MSR. This case supports "nopat" boot option, and
>>>          * virtual machine environments which support PAT without
>>>          * MTRRs. In specific, Xen has unique setup to PAT MSR.
>>>          *
>>>          * If PAT MSR returns 0, it is considered invalid and emulates
>>>          * as No PAT.
>>>          */
>>>         rdmsrl(MSR_IA32_CR_PAT, pat);
>>>     }
>>>
>>>     if (!pat) {
>>>         /*
>>>          * No PAT. Emulate the PAT table that corresponds to the two
>>>          * cache bits, PWT (Write Through) and PCD (Cache Disable).
>>>          * This setup is also the same as the BIOS default setup.
>>>          *
>>>          * PTE encoding:
>>>          *
>>>          *       PCD
>>>          *       |PWT  PAT
>>>          *       ||    slot
>>>          *       00    0    WB : _PAGE_CACHE_MODE_WB
>>>          *       01    1    WT : _PAGE_CACHE_MODE_WT
>>>          *       10    2    UC-: _PAGE_CACHE_MODE_UC_MINUS
>>>          *       11    3    UC : _PAGE_CACHE_MODE_UC
>>>          *
>>>          * NOTE: When WC or WP is used, it is redirected to UC- per
>>>          * the default setup in __cachemode2pte_tbl[].
>>>          */
>>>         pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) |
>>>               PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
>>>     }
>>>
>>>     else if (!pat_bp_enabled) {
>>>         /*
>>>          * In some environments, specifically Xen PV, PAT
>>>          * initialization is skipped because MTRRs are
>>>          * disabled even though PAT is available. In such
>>>          * environments, set PAT to initialized and enabled to
>>>          * correctly indicate to callers of pat_enabled() that
>>>          * PAT is available and prevent PAT from being disabled.
>>>          */
>>>         pat_bp_enabled = true;
>>>         pr_info("x86/PAT: PAT enabled by init_cache_modes\n");
>>>     }
>>>
>>>     __init_cache_modes(pat);
>>> }
>>>
>>> This function, patched with the extra 'else if' block, fixes the
>>> regression on my Xen worksatation, and the pr_info message
>>> "x86/PAT: PAT enabled by init_cache_modes" appears in the logs
>>> when running this patched kernel in my Xen Dom0. This means
>>> that in the Xen PV environment on my Xen Dom0 workstation,
>>> rdmsrl(MSR_IA32_CR_PAT, pat) successfully tested for the presence
>>> of PAT on the virtual CPU that Xen exposed to the Linux kernel on my
>>> Xen Dom0 workstation. At least that is what I think my tests prove.
>>>
>>> So why is this not a valid way to test for the existence of
>>> PAT in the Xen PV environment? Are the existing comments
>>> in init_cache_modes() about supporting both the case when
>>> the "nopat" boot option is set and the specific case of Xen and
>>> MTRR disabled wrong? My testing confirms those comments are
>>> correct.
>>
>> At the very least this ignores the possible "nopat" an admin may
>> have passed to the kernel.
>
> I realize that. The patch I proposed here only fixes the regression. It
> would be easy to also modify the patch to also observe the 'nopat"
> setting. I think your patch had a force_pat_disable local variable that
> is set if pat is disabled by the administrator with "nopat." With that
> variable available, modifying the patch so in init_cache_modes we have:
>
>      if (!pat || force_pat_disable) {
>          /*
>           * No PAT. Emulate the PAT table that corresponds to the two
>
> Instead of:
>
>      if (!pat) {
>          /*
>           * No PAT. Emulate the PAT table that corresponds to the two
>
> would cause the kernel to respect the "nopat" setting by the administrator
> in the Xen PV Dom0 environment.

Chuck, could you please send out a proper patch with your initial fix
(setting pat_bp_enabled) and the fix above?

I've chatted with Boris Petkov on IRC and he is fine with that.

> I agree this needs to be fixed up, because currently the code is very
> confusing and the current variable names and function names do not
> always accurately describe what they actually do in the code. That is
> why I am working on a patch to do some re-factoring, which only consists
> of function and variable name changes and comment changes to fix
> the places where the comments in the code are misleading or incomplete.

Boris and I agreed to pursue my approach further by removing the
dependency between PAT and MTRR and to make this whole mess more
clear.

I will start to work on this as soon as possible, which will
probably be some time in September.

> I think perhaps the most misnamed variable here is the  local
> variable pat_disabled in memtypes.c and the most misnamed function is the
> pat_disable function in memtypes.c. They should be named pat_init_disabled
> and pat_init_disable, respectively, because they do not really disable
> PAT in
> the code but only prevent execution of the pat_init function. That
> leaves open
> the possibility for PAT to be enabled by init_cache_modes, which actually
> occurs in the current code in the Xen PV Dom0 environment, but the current
> code neglects to set pat_bp_enabled to true in that case. So we need a patch
> to fix that in order to fix the regression.

In principle I agree, but you should be aware of my refactoring plans.


Juergen


Attachments:
OpenPGP_0xB0DE9DD628BF132F.asc (3.08 kB)
OpenPGP public key
OpenPGP_signature (505.00 B)
OpenPGP digital signature
Download all attachments

2022-07-12 15:25:19

by Chuck Zmudzinski

[permalink] [raw]
Subject: Re: [PATCH] x86/PAT: have pat_enabled() properly reflect state when running on e.g. Xen

On 7/12/2022 9:32 AM, Juergen Gross wrote:
> On 12.07.22 15:22, Chuck Zmudzinski wrote:
> > On 7/12/2022 2:04 AM, Jan Beulich wrote:
> >> On 11.07.2022 19:41, Chuck Zmudzinski wrote:
> >>> Moreover... (please move to the bottom of the code snippet
> >>> for more information about my tests in the Xen PV environment...)
> >>>
> >>> void init_cache_modes(void)
> >>> {
> >>>     u64 pat = 0;
> >>>
> >>>     if (pat_cm_initialized)
> >>>         return;
> >>>
> >>>     if (boot_cpu_has(X86_FEATURE_PAT)) {
> >>>         /*
> >>>          * CPU supports PAT. Set PAT table to be consistent with
> >>>          * PAT MSR. This case supports "nopat" boot option, and
> >>>          * virtual machine environments which support PAT without
> >>>          * MTRRs. In specific, Xen has unique setup to PAT MSR.
> >>>          *
> >>>          * If PAT MSR returns 0, it is considered invalid and emulates
> >>>          * as No PAT.
> >>>          */
> >>>         rdmsrl(MSR_IA32_CR_PAT, pat);
> >>>     }
> >>>
> >>>     if (!pat) {
> >>>         /*
> >>>          * No PAT. Emulate the PAT table that corresponds to the two
> >>>          * cache bits, PWT (Write Through) and PCD (Cache Disable).
> >>>          * This setup is also the same as the BIOS default setup.
> >>>          *
> >>>          * PTE encoding:
> >>>          *
> >>>          *       PCD
> >>>          *       |PWT  PAT
> >>>          *       ||    slot
> >>>          *       00    0    WB : _PAGE_CACHE_MODE_WB
> >>>          *       01    1    WT : _PAGE_CACHE_MODE_WT
> >>>          *       10    2    UC-: _PAGE_CACHE_MODE_UC_MINUS
> >>>          *       11    3    UC : _PAGE_CACHE_MODE_UC
> >>>          *
> >>>          * NOTE: When WC or WP is used, it is redirected to UC- per
> >>>          * the default setup in __cachemode2pte_tbl[].
> >>>          */
> >>>         pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) |
> >>>               PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
> >>>     }
> >>>
> >>>     else if (!pat_bp_enabled) {
> >>>         /*
> >>>          * In some environments, specifically Xen PV, PAT
> >>>          * initialization is skipped because MTRRs are
> >>>          * disabled even though PAT is available. In such
> >>>          * environments, set PAT to initialized and enabled to
> >>>          * correctly indicate to callers of pat_enabled() that
> >>>          * PAT is available and prevent PAT from being disabled.
> >>>          */
> >>>         pat_bp_enabled = true;
> >>>         pr_info("x86/PAT: PAT enabled by init_cache_modes\n");
> >>>     }
> >>>
> >>>     __init_cache_modes(pat);
> >>> }
> >>>
> >>> This function, patched with the extra 'else if' block, fixes the
> >>> regression on my Xen worksatation, and the pr_info message
> >>> "x86/PAT: PAT enabled by init_cache_modes" appears in the logs
> >>> when running this patched kernel in my Xen Dom0. This means
> >>> that in the Xen PV environment on my Xen Dom0 workstation,
> >>> rdmsrl(MSR_IA32_CR_PAT, pat) successfully tested for the presence
> >>> of PAT on the virtual CPU that Xen exposed to the Linux kernel on my
> >>> Xen Dom0 workstation. At least that is what I think my tests prove.
> >>>
> >>> So why is this not a valid way to test for the existence of
> >>> PAT in the Xen PV environment? Are the existing comments
> >>> in init_cache_modes() about supporting both the case when
> >>> the "nopat" boot option is set and the specific case of Xen and
> >>> MTRR disabled wrong? My testing confirms those comments are
> >>> correct.
> >>
> >> At the very least this ignores the possible "nopat" an admin may
> >> have passed to the kernel.
> >
> > I realize that. The patch I proposed here only fixes the regression. It
> > would be easy to also modify the patch to also observe the 'nopat"
> > setting. I think your patch had a force_pat_disable local variable that
> > is set if pat is disabled by the administrator with "nopat." With that
> > variable available, modifying the patch so in init_cache_modes we have:
> >
> >      if (!pat || force_pat_disable) {
> >          /*
> >           * No PAT. Emulate the PAT table that corresponds to the two
> >
> > Instead of:
> >
> >      if (!pat) {
> >          /*
> >           * No PAT. Emulate the PAT table that corresponds to the two
> >
> > would cause the kernel to respect the "nopat" setting by the administrator
> > in the Xen PV Dom0 environment.
>
> Chuck, could you please send out a proper patch with your initial fix
> (setting pat_bp_enabled) and the fix above?
>
> I've chatted with Boris Petkov on IRC and he is fine with that.

That's great, I will submit a formal patch later today.

>
> > I agree this needs to be fixed up, because currently the code is very
> > confusing and the current variable names and function names do not
> > always accurately describe what they actually do in the code. That is
> > why I am working on a patch to do some re-factoring, which only consists
> > of function and variable name changes and comment changes to fix
> > the places where the comments in the code are misleading or incomplete.
>
> Boris and I agreed to pursue my approach further by removing the
> dependency between PAT and MTRR and to make this whole mess more
> clear.
>
> I will start to work on this as soon as possible, which will
> probably be some time in September.

Good, I will look for your patches and try them out.

>
> > I think perhaps the most misnamed variable here is the  local
> > variable pat_disabled in memtypes.c and the most misnamed function is the
> > pat_disable function in memtypes.c. They should be named pat_init_disabled
> > and pat_init_disable, respectively, because they do not really disable
> > PAT in
> > the code but only prevent execution of the pat_init function. That
> > leaves open
> > the possibility for PAT to be enabled by init_cache_modes, which actually
> > occurs in the current code in the Xen PV Dom0 environment, but the current
> > code neglects to set pat_bp_enabled to true in that case. So we need a patch
> > to fix that in order to fix the regression.
>
> In principle I agree, but you should be aware of my refactoring plans.

I will defer to you and stop working on this re-factoring effort, but I
will prepare a formal patch to fix the regression later today.

I do think Jan's point about respecting the administrator's "nopat" setting
should also be considered. AFAICT, the "nopat" option in current code
is also not being respected on the bare metal, and the patch to
init_cache_modes with a force_no_pat variable is also needed to
ensure "nopat" is respected on the bare metal, AFAICT.

Best Regards,

Chuck

2022-07-12 16:31:38

by Jürgen Groß

[permalink] [raw]
Subject: Re: [PATCH] x86/PAT: have pat_enabled() properly reflect state when running on e.g. Xen

On 12.07.22 17:09, Chuck Zmudzinski wrote:
> On 7/12/2022 9:32 AM, Juergen Gross wrote:
>> On 12.07.22 15:22, Chuck Zmudzinski wrote:
>>> On 7/12/2022 2:04 AM, Jan Beulich wrote:
>>>> On 11.07.2022 19:41, Chuck Zmudzinski wrote:
>>>>> Moreover... (please move to the bottom of the code snippet
>>>>> for more information about my tests in the Xen PV environment...)
>>>>>
>>>>> void init_cache_modes(void)
>>>>> {
>>>>>     u64 pat = 0;
>>>>>
>>>>>     if (pat_cm_initialized)
>>>>>         return;
>>>>>
>>>>>     if (boot_cpu_has(X86_FEATURE_PAT)) {
>>>>>         /*
>>>>>          * CPU supports PAT. Set PAT table to be consistent with
>>>>>          * PAT MSR. This case supports "nopat" boot option, and
>>>>>          * virtual machine environments which support PAT without
>>>>>          * MTRRs. In specific, Xen has unique setup to PAT MSR.
>>>>>          *
>>>>>          * If PAT MSR returns 0, it is considered invalid and emulates
>>>>>          * as No PAT.
>>>>>          */
>>>>>         rdmsrl(MSR_IA32_CR_PAT, pat);
>>>>>     }
>>>>>
>>>>>     if (!pat) {
>>>>>         /*
>>>>>          * No PAT. Emulate the PAT table that corresponds to the two
>>>>>          * cache bits, PWT (Write Through) and PCD (Cache Disable).
>>>>>          * This setup is also the same as the BIOS default setup.
>>>>>          *
>>>>>          * PTE encoding:
>>>>>          *
>>>>>          *       PCD
>>>>>          *       |PWT  PAT
>>>>>          *       ||    slot
>>>>>          *       00    0    WB : _PAGE_CACHE_MODE_WB
>>>>>          *       01    1    WT : _PAGE_CACHE_MODE_WT
>>>>>          *       10    2    UC-: _PAGE_CACHE_MODE_UC_MINUS
>>>>>          *       11    3    UC : _PAGE_CACHE_MODE_UC
>>>>>          *
>>>>>          * NOTE: When WC or WP is used, it is redirected to UC- per
>>>>>          * the default setup in __cachemode2pte_tbl[].
>>>>>          */
>>>>>         pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) |
>>>>>               PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
>>>>>     }
>>>>>
>>>>>     else if (!pat_bp_enabled) {
>>>>>         /*
>>>>>          * In some environments, specifically Xen PV, PAT
>>>>>          * initialization is skipped because MTRRs are
>>>>>          * disabled even though PAT is available. In such
>>>>>          * environments, set PAT to initialized and enabled to
>>>>>          * correctly indicate to callers of pat_enabled() that
>>>>>          * PAT is available and prevent PAT from being disabled.
>>>>>          */
>>>>>         pat_bp_enabled = true;
>>>>>         pr_info("x86/PAT: PAT enabled by init_cache_modes\n");
>>>>>     }
>>>>>
>>>>>     __init_cache_modes(pat);
>>>>> }
>>>>>
>>>>> This function, patched with the extra 'else if' block, fixes the
>>>>> regression on my Xen worksatation, and the pr_info message
>>>>> "x86/PAT: PAT enabled by init_cache_modes" appears in the logs
>>>>> when running this patched kernel in my Xen Dom0. This means
>>>>> that in the Xen PV environment on my Xen Dom0 workstation,
>>>>> rdmsrl(MSR_IA32_CR_PAT, pat) successfully tested for the presence
>>>>> of PAT on the virtual CPU that Xen exposed to the Linux kernel on my
>>>>> Xen Dom0 workstation. At least that is what I think my tests prove.
>>>>>
>>>>> So why is this not a valid way to test for the existence of
>>>>> PAT in the Xen PV environment? Are the existing comments
>>>>> in init_cache_modes() about supporting both the case when
>>>>> the "nopat" boot option is set and the specific case of Xen and
>>>>> MTRR disabled wrong? My testing confirms those comments are
>>>>> correct.
>>>>
>>>> At the very least this ignores the possible "nopat" an admin may
>>>> have passed to the kernel.
>>>
>>> I realize that. The patch I proposed here only fixes the regression. It
>>> would be easy to also modify the patch to also observe the 'nopat"
>>> setting. I think your patch had a force_pat_disable local variable that
>>> is set if pat is disabled by the administrator with "nopat." With that
>>> variable available, modifying the patch so in init_cache_modes we have:
>>>
>>>      if (!pat || force_pat_disable) {
>>>          /*
>>>           * No PAT. Emulate the PAT table that corresponds to the two
>>>
>>> Instead of:
>>>
>>>      if (!pat) {
>>>          /*
>>>           * No PAT. Emulate the PAT table that corresponds to the two
>>>
>>> would cause the kernel to respect the "nopat" setting by the administrator
>>> in the Xen PV Dom0 environment.
>>
>> Chuck, could you please send out a proper patch with your initial fix
>> (setting pat_bp_enabled) and the fix above?
>>
>> I've chatted with Boris Petkov on IRC and he is fine with that.
>
> That's great, I will submit a formal patch later today.
>
>>
>>> I agree this needs to be fixed up, because currently the code is very
>>> confusing and the current variable names and function names do not
>>> always accurately describe what they actually do in the code. That is
>>> why I am working on a patch to do some re-factoring, which only consists
>>> of function and variable name changes and comment changes to fix
>>> the places where the comments in the code are misleading or incomplete.
>>
>> Boris and I agreed to pursue my approach further by removing the
>> dependency between PAT and MTRR and to make this whole mess more
>> clear.
>>
>> I will start to work on this as soon as possible, which will
>> probably be some time in September.
>
> Good, I will look for your patches and try them out.
>
>>
>>> I think perhaps the most misnamed variable here is the  local
>>> variable pat_disabled in memtypes.c and the most misnamed function is the
>>> pat_disable function in memtypes.c. They should be named pat_init_disabled
>>> and pat_init_disable, respectively, because they do not really disable
>>> PAT in
>>> the code but only prevent execution of the pat_init function. That
>>> leaves open
>>> the possibility for PAT to be enabled by init_cache_modes, which actually
>>> occurs in the current code in the Xen PV Dom0 environment, but the current
>>> code neglects to set pat_bp_enabled to true in that case. So we need a patch
>>> to fix that in order to fix the regression.
>>
>> In principle I agree, but you should be aware of my refactoring plans.
>
> I will defer to you and stop working on this re-factoring effort, but I
> will prepare a formal patch to fix the regression later today.
>
> I do think Jan's point about respecting the administrator's "nopat" setting
> should also be considered. AFAICT, the "nopat" option in current code

Yes, please add that, too. This was what I meant with "the fix above".

> is also not being respected on the bare metal, and the patch to
> init_cache_modes with a force_no_pat variable is also needed to
> ensure "nopat" is respected on the bare metal, AFAICT.

Hmm, I don't see how the PAT MSR will be written on bare metal when "nopat"
has been specified.

I just tried it with a 5.19 kernel and it booted with the correct PAT
settings:

[ 0.000000] x86/PAT: PAT support disabled via boot option.
[ 0.000986] x86/PAT: Configuration [0-7]: WB WT UC- UC WB WT UC- UC


Juergen


Attachments:
OpenPGP_0xB0DE9DD628BF132F.asc (3.08 kB)
OpenPGP public key
OpenPGP_signature (505.00 B)
OpenPGP digital signature
Download all attachments

2022-07-12 16:40:12

by Chuck Zmudzinski

[permalink] [raw]
Subject: Re: [PATCH] x86/PAT: have pat_enabled() properly reflect state when running on e.g. Xen



On 7/12/22 11:30 AM, Juergen Gross wrote:
> On 12.07.22 17:09, Chuck Zmudzinski wrote:
> > On 7/12/2022 9:32 AM, Juergen Gross wrote:
> >> On 12.07.22 15:22, Chuck Zmudzinski wrote:
> >>> On 7/12/2022 2:04 AM, Jan Beulich wrote:
> >>>> On 11.07.2022 19:41, Chuck Zmudzinski wrote:
> >>>>> Moreover... (please move to the bottom of the code snippet
> >>>>> for more information about my tests in the Xen PV environment...)
> >>>>>
> >>>>> void init_cache_modes(void)
> >>>>> {
> >>>>>     u64 pat = 0;
> >>>>>
> >>>>>     if (pat_cm_initialized)
> >>>>>         return;
> >>>>>
> >>>>>     if (boot_cpu_has(X86_FEATURE_PAT)) {
> >>>>>         /*
> >>>>>          * CPU supports PAT. Set PAT table to be consistent with
> >>>>>          * PAT MSR. This case supports "nopat" boot option, and
> >>>>>          * virtual machine environments which support PAT without
> >>>>>          * MTRRs. In specific, Xen has unique setup to PAT MSR.
> >>>>>          *
> >>>>>          * If PAT MSR returns 0, it is considered invalid and emulates
> >>>>>          * as No PAT.
> >>>>>          */
> >>>>>         rdmsrl(MSR_IA32_CR_PAT, pat);
> >>>>>     }
> >>>>>
> >>>>>     if (!pat) {
> >>>>>         /*
> >>>>>          * No PAT. Emulate the PAT table that corresponds to the two
> >>>>>          * cache bits, PWT (Write Through) and PCD (Cache Disable).
> >>>>>          * This setup is also the same as the BIOS default setup.
> >>>>>          *
> >>>>>          * PTE encoding:
> >>>>>          *
> >>>>>          *       PCD
> >>>>>          *       |PWT  PAT
> >>>>>          *       ||    slot
> >>>>>          *       00    0    WB : _PAGE_CACHE_MODE_WB
> >>>>>          *       01    1    WT : _PAGE_CACHE_MODE_WT
> >>>>>          *       10    2    UC-: _PAGE_CACHE_MODE_UC_MINUS
> >>>>>          *       11    3    UC : _PAGE_CACHE_MODE_UC
> >>>>>          *
> >>>>>          * NOTE: When WC or WP is used, it is redirected to UC- per
> >>>>>          * the default setup in __cachemode2pte_tbl[].
> >>>>>          */
> >>>>>         pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) |
> >>>>>               PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
> >>>>>     }
> >>>>>
> >>>>>     else if (!pat_bp_enabled) {
> >>>>>         /*
> >>>>>          * In some environments, specifically Xen PV, PAT
> >>>>>          * initialization is skipped because MTRRs are
> >>>>>          * disabled even though PAT is available. In such
> >>>>>          * environments, set PAT to initialized and enabled to
> >>>>>          * correctly indicate to callers of pat_enabled() that
> >>>>>          * PAT is available and prevent PAT from being disabled.
> >>>>>          */
> >>>>>         pat_bp_enabled = true;
> >>>>>         pr_info("x86/PAT: PAT enabled by init_cache_modes\n");
> >>>>>     }
> >>>>>
> >>>>>     __init_cache_modes(pat);
> >>>>> }
> >>>>>
> >>>>> This function, patched with the extra 'else if' block, fixes the
> >>>>> regression on my Xen worksatation, and the pr_info message
> >>>>> "x86/PAT: PAT enabled by init_cache_modes" appears in the logs
> >>>>> when running this patched kernel in my Xen Dom0. This means
> >>>>> that in the Xen PV environment on my Xen Dom0 workstation,
> >>>>> rdmsrl(MSR_IA32_CR_PAT, pat) successfully tested for the presence
> >>>>> of PAT on the virtual CPU that Xen exposed to the Linux kernel on my
> >>>>> Xen Dom0 workstation. At least that is what I think my tests prove.
> >>>>>
> >>>>> So why is this not a valid way to test for the existence of
> >>>>> PAT in the Xen PV environment? Are the existing comments
> >>>>> in init_cache_modes() about supporting both the case when
> >>>>> the "nopat" boot option is set and the specific case of Xen and
> >>>>> MTRR disabled wrong? My testing confirms those comments are
> >>>>> correct.
> >>>>
> >>>> At the very least this ignores the possible "nopat" an admin may
> >>>> have passed to the kernel.
> >>>
> >>> I realize that. The patch I proposed here only fixes the regression. It
> >>> would be easy to also modify the patch to also observe the 'nopat"
> >>> setting. I think your patch had a force_pat_disable local variable that
> >>> is set if pat is disabled by the administrator with "nopat." With that
> >>> variable available, modifying the patch so in init_cache_modes we have:
> >>>
> >>>      if (!pat || force_pat_disable) {
> >>>          /*
> >>>           * No PAT. Emulate the PAT table that corresponds to the two
> >>>
> >>> Instead of:
> >>>
> >>>      if (!pat) {
> >>>          /*
> >>>           * No PAT. Emulate the PAT table that corresponds to the two
> >>>
> >>> would cause the kernel to respect the "nopat" setting by the administrator
> >>> in the Xen PV Dom0 environment.
> >>
> >> Chuck, could you please send out a proper patch with your initial fix
> >> (setting pat_bp_enabled) and the fix above?
> >>
> >> I've chatted with Boris Petkov on IRC and he is fine with that.
> >
> > That's great, I will submit a formal patch later today.
> >
> >>
> >>> I agree this needs to be fixed up, because currently the code is very
> >>> confusing and the current variable names and function names do not
> >>> always accurately describe what they actually do in the code. That is
> >>> why I am working on a patch to do some re-factoring, which only consists
> >>> of function and variable name changes and comment changes to fix
> >>> the places where the comments in the code are misleading or incomplete.
> >>
> >> Boris and I agreed to pursue my approach further by removing the
> >> dependency between PAT and MTRR and to make this whole mess more
> >> clear.
> >>
> >> I will start to work on this as soon as possible, which will
> >> probably be some time in September.
> >
> > Good, I will look for your patches and try them out.
> >
> >>
> >>> I think perhaps the most misnamed variable here is the  local
> >>> variable pat_disabled in memtypes.c and the most misnamed function is the
> >>> pat_disable function in memtypes.c. They should be named pat_init_disabled
> >>> and pat_init_disable, respectively, because they do not really disable
> >>> PAT in
> >>> the code but only prevent execution of the pat_init function. That
> >>> leaves open
> >>> the possibility for PAT to be enabled by init_cache_modes, which actually
> >>> occurs in the current code in the Xen PV Dom0 environment, but the current
> >>> code neglects to set pat_bp_enabled to true in that case. So we need a patch
> >>> to fix that in order to fix the regression.
> >>
> >> In principle I agree, but you should be aware of my refactoring plans.
> >
> > I will defer to you and stop working on this re-factoring effort, but I
> > will prepare a formal patch to fix the regression later today.
> >
> > I do think Jan's point about respecting the administrator's "nopat" setting
> > should also be considered. AFAICT, the "nopat" option in current code
>
> Yes, please add that, too. This was what I meant with "the fix above".
>
> > is also not being respected on the bare metal, and the patch to
> > init_cache_modes with a force_no_pat variable is also needed to
> > ensure "nopat" is respected on the bare metal, AFAICT.
>
> Hmm, I don't see how the PAT MSR will be written on bare metal when "nopat"
> has been specified.
>
> I just tried it with a 5.19 kernel and it booted with the correct PAT
> settings:
>
> [ 0.000000] x86/PAT: PAT support disabled via boot option.
> [ 0.000986] x86/PAT: Configuration [0-7]: WB WT UC- UC WB WT UC- UC
>
>
> Juergen

OK, I understand. In init_cache_modes, if on Xen we read the PAT MSR
to pick up the configuration Xen did, so on bare metal we will just read the
configuration with PAT disabled. So "nopat" does work on bare metal.

I do think Jan is correct that "nopat" does not work on Xen, though.

Chuck

2022-07-14 17:23:18

by Chuck Zmudzinski

[permalink] [raw]
Subject: Re: Ping: [PATCH] x86/PAT: have pat_enabled() properly reflect state when running on e.g. Xen

On 7/5/22 6:57 AM, Thorsten Leemhuis wrote:
> [CCing tglx, mingo, Boris and Juergen]
>
> On 04.07.22 14:26, Jan Beulich wrote:
> > On 04.07.2022 13:58, Thorsten Leemhuis wrote:
> >> On 25.05.22 10:55, Jan Beulich wrote:
> >>> On 28.04.2022 16:50, Jan Beulich wrote:
> >>>> The latest with commit bdd8b6c98239 ("drm/i915: replace X86_FEATURE_PAT
> >>>> with pat_enabled()") pat_enabled() returning false (because of PAT
> >>>> initialization being suppressed in the absence of MTRRs being announced
> >>>> to be available) has become a problem: The i915 driver now fails to
> >>>> initialize when running PV on Xen (i915_gem_object_pin_map() is where I
> >>>> located the induced failure), and its error handling is flaky enough to
> >>>> (at least sometimes) result in a hung system.
> >>>>
> >>>> Yet even beyond that problem the keying of the use of WC mappings to
> >>>> pat_enabled() (see arch_can_pci_mmap_wc()) means that in particular
> >>>> graphics frame buffer accesses would have been quite a bit less
> >>>> performant than possible.
> >>>>
> >>>> Arrange for the function to return true in such environments, without
> >>>> undermining the rest of PAT MSR management logic considering PAT to be
> >>>> disabled: Specifically, no writes to the PAT MSR should occur.
> >>>>
> >>>> For the new boolean to live in .init.data, init_cache_modes() also needs
> >>>> moving to .init.text (where it could/should have lived already before).
> >>>>
> >>>> Signed-off-by: Jan Beulich <[email protected]>
> >>>
> >>> The Linux kernel regression tracker is pestering me because things are
> >>> taking so long (effectively quoting him), and alternative proposals
> >>> made so far look to have more severe downsides.
> >>
> >> Has any progress been made with this patch? It afaics is meant to fix
> >> this regression, which ideally should have been fixed weeks ago (btw:
> >> adding a "Link:" tag pointing to it would be good):
> >> https://lore.kernel.org/regressions/YnHK1Z3o99eMXsVK@mail-itl/
> >>
> >> According to Juergen it's still needed:
> >> https://lore.kernel.org/lkml/[email protected]/
> >>
> >> Or was a different solution found to fix that regression?
> >
> > No progress and no alternatives I'm aware of.
>
> Getting closer to the point where I need to bring this to Linus
> attention. I hope this mail can help avoiding this.
>
> Jan, I didn't follow this closely, but do you have any idea why Dave,
> Luto, and Peter are ignoring this? Is reverting bdd8b6c98239 a option to
> get the regression fixed? Would a repost maybe help getting this rolling
> again?

Hi, Thorsten,

Here is a link to the hardware probe of my system which exhibits
a system hang before fully booting with bdd8b6c98239. Without
bdd8b6c98239, the problem is gone:

https://linux-hardware.org/?probe=32e615b538

Keep in mind this problem is not seen with bdd8b6c98239
on the bare metal, but only when running as a traditional Dom0
PV type guest on Xen. I don't know see the problem on Xen HVM
DomU, and I have not tested it on Xen PVH DomU, Xen PV DomU,
or the experimental Xen PVH Dom0.

You can probably verify yourself that reverting bdd8b6c98239
fixes the regression if you try to reproduce the regression with
any Linux version that has bdd8b6c98239 or its equivalent on
the stable branches with a hardware profile similar to the link
to the profile of my machine which exhibits the problem. Mine
is a Haswell core-i5 4590S CPU and ASRock B85M-Pro4
motherboard.

Also, other notes:

1. Yes, AFAICT, Marek at Qubes OS is the first to report the problem.
2. Juergen Gross' work to try to fix this has been helpful, but none
of his posted patches has fixed the regression on my system.
3. Jan's patch fixes it also, and so do the two patches I posted to lkml
earlier this week to the appropriate maintainers.
4. On the pkg-xen-devel mailing list, which is public, this issue was
briefly discussed where I first reported it. Someone there said they
did not see the issue with Broadwell Xeons. Mine is a Haswell core i5,
which is one generation older than Broadwell, so you are most likely
to be able to reproduce the problem with a Haswell core i5 desktop
system like my ASRock system, which was my own private build
which has been working fine for eight years until Linux 5.17.y.

Hope this helps.

Chuck

> BTW, for anyone new to this, Jan's patch afaics is supposed to fix the
> regression reported here:
> https://lore.kernel.org/all/YnHK1Z3o99eMXsVK@mail-itl/
>
> Side note: Juergen Gross recently posted related patches in this code
> area to fix some other problems (regressions?), but his efforts look
> stalled, too:
> https://lore.kernel.org/all/[email protected]/
>
> And he recently stated this Jan's patch is still needed, even if his
> changes make it in.
> https://lore.kernel.org/all/[email protected]/
>
> This from my point all looks a bit... unsatisfying. :-/
>
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>
> P.S.: As the Linux kernel's regression tracker I deal with a lot of
> reports and sometimes miss something important when writing mails like
> this. If that's the case here, don't hesitate to tell me in a public
> reply, it's in everyone's interest to set the public record straight.

2022-07-14 22:42:41

by Chuck Zmudzinski

[permalink] [raw]
Subject: Re: Ping: [PATCH] x86/PAT: have pat_enabled() properly reflect state when running on e.g. Xen

On 7/14/2022 1:17 PM, Chuck Zmudzinski wrote:
> On 7/5/22 6:57 AM, Thorsten Leemhuis wrote:
> > [CCing tglx, mingo, Boris and Juergen]
> >
> > On 04.07.22 14:26, Jan Beulich wrote:
> > > On 04.07.2022 13:58, Thorsten Leemhuis wrote:
> > >> On 25.05.22 10:55, Jan Beulich wrote:
> > >>> On 28.04.2022 16:50, Jan Beulich wrote:
> > >>>> The latest with commit bdd8b6c98239 ("drm/i915: replace X86_FEATURE_PAT
> > >>>> with pat_enabled()") pat_enabled() returning false (because of PAT
> > >>>> initialization being suppressed in the absence of MTRRs being announced
> > >>>> to be available) has become a problem: The i915 driver now fails to
> > >>>> initialize when running PV on Xen (i915_gem_object_pin_map() is where I
> > >>>> located the induced failure), and its error handling is flaky enough to
> > >>>> (at least sometimes) result in a hung system.
> > >>>>
> > >>>> Yet even beyond that problem the keying of the use of WC mappings to
> > >>>> pat_enabled() (see arch_can_pci_mmap_wc()) means that in particular
> > >>>> graphics frame buffer accesses would have been quite a bit less
> > >>>> performant than possible.
> > >>>>
> > >>>> Arrange for the function to return true in such environments, without
> > >>>> undermining the rest of PAT MSR management logic considering PAT to be
> > >>>> disabled: Specifically, no writes to the PAT MSR should occur.
> > >>>>
> > >>>> For the new boolean to live in .init.data, init_cache_modes() also needs
> > >>>> moving to .init.text (where it could/should have lived already before).
> > >>>>
> > >>>> Signed-off-by: Jan Beulich <[email protected]>
> > >>>
> > >>> The Linux kernel regression tracker is pestering me because things are
> > >>> taking so long (effectively quoting him), and alternative proposals
> > >>> made so far look to have more severe downsides.
> > >>
> > >> Has any progress been made with this patch? It afaics is meant to fix
> > >> this regression, which ideally should have been fixed weeks ago (btw:
> > >> adding a "Link:" tag pointing to it would be good):
> > >> https://lore.kernel.org/regressions/YnHK1Z3o99eMXsVK@mail-itl/
> > >>
> > >> According to Juergen it's still needed:
> > >> https://lore.kernel.org/lkml/[email protected]/
> > >>
> > >> Or was a different solution found to fix that regression?
> > >
> > > No progress and no alternatives I'm aware of.
> >
> > Getting closer to the point where I need to bring this to Linus
> > attention. I hope this mail can help avoiding this.
> >
> > Jan, I didn't follow this closely, but do you have any idea why Dave,
> > Luto, and Peter are ignoring this? Is reverting bdd8b6c98239 a option to
> > get the regression fixed? Would a repost maybe help getting this rolling
> > again?
>
> Hi, Thorsten,
>
> Here is a link to the hardware probe of my system which exhibits
> a system hang before fully booting with bdd8b6c98239. Without
> bdd8b6c98239, the problem is gone:
>
> https://linux-hardware.org/?probe=32e615b538
>
> Keep in mind this problem is not seen with bdd8b6c98239
> on the bare metal, but only when running as a traditional Dom0
> PV type guest on Xen. I don't know see the problem on Xen HVM
> DomU, and I have not tested it on Xen PVH DomU, Xen PV DomU,
> or the experimental Xen PVH Dom0.

Update: On affected hardware, you do not need to run in a
Xen PV Dom0 to see the regression caused by bdd8b6c98239.

All you need to do is run, on the bare metal, on the affected
hardware, with the Linux kernel nopat boot option.

Jan mentions in his commit message the function in the i915
driver that was touched by bdd8b6c98239 and that causes this
regression. That is, any Intel IGD that needs to execute the
function that Jan mentions in the commit message of his
proposed patch when the i915 driver is setting up the graphics
engine will most likely be hardware that is affected. My Intel
IGD was marketed as HD Graphics 4600, I think.

So find an a system with these hardware characteristics, and
try running, with the nopat option, the Linux kernel, with
and without bdd8b6c98239. You will see the regression I
am experiencing, I predict.

Chuck

>
> You can probably verify yourself that reverting bdd8b6c98239
> fixes the regression if you try to reproduce the regression with
> any Linux version that has bdd8b6c98239 or its equivalent on
> the stable branches with a hardware profile similar to the link
> to the profile of my machine which exhibits the problem. Mine
> is a Haswell core-i5 4590S CPU and ASRock B85M-Pro4
> motherboard.
>
> Also, other notes:
>
> 1. Yes, AFAICT, Marek at Qubes OS is the first to report the problem.
> 2. Juergen Gross' work to try to fix this has been helpful, but none
> of his posted patches has fixed the regression on my system.
> 3. Jan's patch fixes it also, and so do the two patches I posted to lkml
> earlier this week to the appropriate maintainers.
> 4. On the pkg-xen-devel mailing list, which is public, this issue was
> briefly discussed where I first reported it. Someone there said they
> did not see the issue with Broadwell Xeons. Mine is a Haswell core i5,
> which is one generation older than Broadwell, so you are most likely
> to be able to reproduce the problem with a Haswell core i5 desktop
> system like my ASRock system, which was my own private build
> which has been working fine for eight years until Linux 5.17.y.
>
> Hope this helps.
>
> Chuck
>
> > BTW, for anyone new to this, Jan's patch afaics is supposed to fix the
> > regression reported here:
> > https://lore.kernel.org/all/YnHK1Z3o99eMXsVK@mail-itl/
> >
> > Side note: Juergen Gross recently posted related patches in this code
> > area to fix some other problems (regressions?), but his efforts look
> > stalled, too:
> > https://lore.kernel.org/all/[email protected]/
> >
> > And he recently stated this Jan's patch is still needed, even if his
> > changes make it in.
> > https://lore.kernel.org/all/[email protected]/
> >
> > This from my point all looks a bit... unsatisfying. :-/
> >
> > Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> >
> > P.S.: As the Linux kernel's regression tracker I deal with a lot of
> > reports and sometimes miss something important when writing mails like
> > this. If that's the case here, don't hesitate to tell me in a public
> > reply, it's in everyone's interest to set the public record straight.
>


2022-07-14 23:14:14

by Chuck Zmudzinski

[permalink] [raw]
Subject: Re: Ping: [PATCH] x86/PAT: have pat_enabled() properly reflect state when running on e.g. Xen

On 7/14/2022 6:33 PM, Chuck Zmudzinski wrote:
> On 7/14/2022 1:17 PM, Chuck Zmudzinski wrote:
> > On 7/5/22 6:57 AM, Thorsten Leemhuis wrote:
> > > [CCing tglx, mingo, Boris and Juergen]
> > >
> > > On 04.07.22 14:26, Jan Beulich wrote:
> > > > On 04.07.2022 13:58, Thorsten Leemhuis wrote:
> > > >> On 25.05.22 10:55, Jan Beulich wrote:
> > > >>> On 28.04.2022 16:50, Jan Beulich wrote:
> > > >>>> The latest with commit bdd8b6c98239 ("drm/i915: replace X86_FEATURE_PAT
> > > >>>> with pat_enabled()") pat_enabled() returning false (because of PAT
> > > >>>> initialization being suppressed in the absence of MTRRs being announced
> > > >>>> to be available) has become a problem: The i915 driver now fails to
> > > >>>> initialize when running PV on Xen (i915_gem_object_pin_map() is where I
> > > >>>> located the induced failure), and its error handling is flaky enough to
> > > >>>> (at least sometimes) result in a hung system.
> > > >>>>
> > > >>>> Yet even beyond that problem the keying of the use of WC mappings to
> > > >>>> pat_enabled() (see arch_can_pci_mmap_wc()) means that in particular
> > > >>>> graphics frame buffer accesses would have been quite a bit less
> > > >>>> performant than possible.
> > > >>>>
> > > >>>> Arrange for the function to return true in such environments, without
> > > >>>> undermining the rest of PAT MSR management logic considering PAT to be
> > > >>>> disabled: Specifically, no writes to the PAT MSR should occur.
> > > >>>>
> > > >>>> For the new boolean to live in .init.data, init_cache_modes() also needs
> > > >>>> moving to .init.text (where it could/should have lived already before).
> > > >>>>
> > > >>>> Signed-off-by: Jan Beulich <[email protected]>
> > > >>>
> > > >>> The Linux kernel regression tracker is pestering me because things are
> > > >>> taking so long (effectively quoting him), and alternative proposals
> > > >>> made so far look to have more severe downsides.
> > > >>
> > > >> Has any progress been made with this patch? It afaics is meant to fix
> > > >> this regression, which ideally should have been fixed weeks ago (btw:
> > > >> adding a "Link:" tag pointing to it would be good):
> > > >> https://lore.kernel.org/regressions/YnHK1Z3o99eMXsVK@mail-itl/
> > > >>
> > > >> According to Juergen it's still needed:
> > > >> https://lore.kernel.org/lkml/[email protected]/
> > > >>
> > > >> Or was a different solution found to fix that regression?
> > > >
> > > > No progress and no alternatives I'm aware of.
> > >
> > > Getting closer to the point where I need to bring this to Linus
> > > attention. I hope this mail can help avoiding this.
> > >
> > > Jan, I didn't follow this closely, but do you have any idea why Dave,
> > > Luto, and Peter are ignoring this? Is reverting bdd8b6c98239 a option to
> > > get the regression fixed? Would a repost maybe help getting this rolling
> > > again?
> >
> > Hi, Thorsten,
> >
> > Here is a link to the hardware probe of my system which exhibits
> > a system hang before fully booting with bdd8b6c98239. Without
> > bdd8b6c98239, the problem is gone:
> >
> > https://linux-hardware.org/?probe=32e615b538
> >
> > Keep in mind this problem is not seen with bdd8b6c98239
> > on the bare metal, but only when running as a traditional Dom0
> > PV type guest on Xen. I don't know see the problem on Xen HVM
> > DomU, and I have not tested it on Xen PVH DomU, Xen PV DomU,
> > or the experimental Xen PVH Dom0.
>
> Update: On affected hardware, you do not need to run in a
> Xen PV Dom0 to see the regression caused by bdd8b6c98239.
>
> All you need to do is run, on the bare metal, on the affected
> hardware, with the Linux kernel nopat boot option.
>
> Jan mentions in his commit message the function in the i915
> driver that was touched by bdd8b6c98239 and that causes this
> regression. That is, any Intel IGD that needs to execute the
> function that Jan mentions in the commit message of his
> proposed patch when the i915 driver is setting up the graphics
> engine will most likely be hardware that is affected. My Intel
> IGD was marketed as HD Graphics 4600, I think.
>
> So find an a system with these hardware characteristics, and
> try running, with the nopat option, the Linux kernel, with
> and without bdd8b6c98239. You will see the regression I
> am experiencing, I predict.

This raises a disturbing question: The commit message of
bdd8b6c98239 mentions the nopat option. It does not specify what
effect the commit was supposed to have on system
with the nopat option, but the actual effect on the system,
both with the seldom used nopat option and in Xen PV Dom0,
a nasty regression on some older Intel IGD devices. My question:

Was this intentional? Or just grossly incompetent? Any other
possibilities?

I think you should definitely notify Linus about this if you can
verify the story I am telling here.

Chuck

>
> Chuck
>
> >
> > You can probably verify yourself that reverting bdd8b6c98239
> > fixes the regression if you try to reproduce the regression with
> > any Linux version that has bdd8b6c98239 or its equivalent on
> > the stable branches with a hardware profile similar to the link
> > to the profile of my machine which exhibits the problem. Mine
> > is a Haswell core-i5 4590S CPU and ASRock B85M-Pro4
> > motherboard.
> >
> > Also, other notes:
> >
> > 1. Yes, AFAICT, Marek at Qubes OS is the first to report the problem.
> > 2. Juergen Gross' work to try to fix this has been helpful, but none
> > of his posted patches has fixed the regression on my system.
> > 3. Jan's patch fixes it also, and so do the two patches I posted to lkml
> > earlier this week to the appropriate maintainers.
> > 4. On the pkg-xen-devel mailing list, which is public, this issue was
> > briefly discussed where I first reported it. Someone there said they
> > did not see the issue with Broadwell Xeons. Mine is a Haswell core i5,
> > which is one generation older than Broadwell, so you are most likely
> > to be able to reproduce the problem with a Haswell core i5 desktop
> > system like my ASRock system, which was my own private build
> > which has been working fine for eight years until Linux 5.17.y.
> >
> > Hope this helps.
> >
> > Chuck
> >
> > > BTW, for anyone new to this, Jan's patch afaics is supposed to fix the
> > > regression reported here:
> > > https://lore.kernel.org/all/YnHK1Z3o99eMXsVK@mail-itl/
> > >
> > > Side note: Juergen Gross recently posted related patches in this code
> > > area to fix some other problems (regressions?), but his efforts look
> > > stalled, too:
> > > https://lore.kernel.org/all/[email protected]/
> > >
> > > And he recently stated this Jan's patch is still needed, even if his
> > > changes make it in.
> > > https://lore.kernel.org/all/[email protected]/
> > >
> > > This from my point all looks a bit... unsatisfying. :-/
> > >
> > > Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> > >
> > > P.S.: As the Linux kernel's regression tracker I deal with a lot of
> > > reports and sometimes miss something important when writing mails like
> > > this. If that's the case here, don't hesitate to tell me in a public
> > > reply, it's in everyone's interest to set the public record straight.
> >
>
>

2022-07-19 14:38:04

by Chuck Zmudzinski

[permalink] [raw]
Subject: Re: Ping: [PATCH] x86/PAT: have pat_enabled() properly reflect state when running on e.g. Xen

On 7/14/2022 6:45 PM, Chuck Zmudzinski wrote:
> On 7/14/2022 6:33 PM, Chuck Zmudzinski wrote:
> > On 7/14/2022 1:17 PM, Chuck Zmudzinski wrote:
> > > On 7/5/22 6:57 AM, Thorsten Leemhuis wrote:
> > > > [CCing tglx, mingo, Boris and Juergen]
> > > >
> > > > On 04.07.22 14:26, Jan Beulich wrote:
> > > > > On 04.07.2022 13:58, Thorsten Leemhuis wrote:
> > > > >> On 25.05.22 10:55, Jan Beulich wrote:
> > > > >>> On 28.04.2022 16:50, Jan Beulich wrote:
> > > > >>>> The latest with commit bdd8b6c98239 ("drm/i915: replace X86_FEATURE_PAT
> > > > >>>> with pat_enabled()") pat_enabled() returning false (because of PAT
> > > > >>>> initialization being suppressed in the absence of MTRRs being announced
> > > > >>>> to be available) has become a problem: The i915 driver now fails to
> > > > >>>> initialize when running PV on Xen (i915_gem_object_pin_map() is where I
> > > > >>>> located the induced failure), and its error handling is flaky enough to
> > > > >>>> (at least sometimes) result in a hung system.
> > > > >>>>
> > > > >>>> Yet even beyond that problem the keying of the use of WC mappings to
> > > > >>>> pat_enabled() (see arch_can_pci_mmap_wc()) means that in particular
> > > > >>>> graphics frame buffer accesses would have been quite a bit less
> > > > >>>> performant than possible.
> > > > >>>>
> > > > >>>> Arrange for the function to return true in such environments, without
> > > > >>>> undermining the rest of PAT MSR management logic considering PAT to be
> > > > >>>> disabled: Specifically, no writes to the PAT MSR should occur.
> > > > >>>>
> > > > >>>> For the new boolean to live in .init.data, init_cache_modes() also needs
> > > > >>>> moving to .init.text (where it could/should have lived already before).
> > > > >>>>
> > > > >>>> Signed-off-by: Jan Beulich <[email protected]>
> > > > >>>
> > > > >>> The Linux kernel regression tracker is pestering me because things are
> > > > >>> taking so long (effectively quoting him), and alternative proposals
> > > > >>> made so far look to have more severe downsides.
> > > > >>
> > > > >> Has any progress been made with this patch? It afaics is meant to fix
> > > > >> this regression, which ideally should have been fixed weeks ago (btw:
> > > > >> adding a "Link:" tag pointing to it would be good):
> > > > >> https://lore.kernel.org/regressions/YnHK1Z3o99eMXsVK@mail-itl/
> > > > >>
> > > > >> According to Juergen it's still needed:
> > > > >> https://lore.kernel.org/lkml/[email protected]/
> > > > >>
> > > > >> Or was a different solution found to fix that regression?
> > > > >
> > > > > No progress and no alternatives I'm aware of.
> > > >
> > > > Getting closer to the point where I need to bring this to Linus
> > > > attention. I hope this mail can help avoiding this.
> > > >
> > > > Jan, I didn't follow this closely, but do you have any idea why Dave,
> > > > Luto, and Peter are ignoring this? Is reverting bdd8b6c98239 a option to
> > > > get the regression fixed? Would a repost maybe help getting this rolling
> > > > again?
> > >
> > > Hi, Thorsten,
> > >
> > > Here is a link to the hardware probe of my system which exhibits
> > > a system hang before fully booting with bdd8b6c98239. Without
> > > bdd8b6c98239, the problem is gone:
> > >
> > > https://linux-hardware.org/?probe=32e615b538
> > >
> > > Keep in mind this problem is not seen with bdd8b6c98239
> > > on the bare metal, but only when running as a traditional Dom0
> > > PV type guest on Xen. I don't know see the problem on Xen HVM
> > > DomU, and I have not tested it on Xen PVH DomU, Xen PV DomU,
> > > or the experimental Xen PVH Dom0.
> >
> > Update: On affected hardware, you do not need to run in a
> > Xen PV Dom0 to see the regression caused by bdd8b6c98239.
> >
> > All you need to do is run, on the bare metal, on the affected
> > hardware, with the Linux kernel nopat boot option.
> >
> > Jan mentions in his commit message the function in the i915
> > driver that was touched by bdd8b6c98239 and that causes this
> > regression. That is, any Intel IGD that needs to execute the
> > function that Jan mentions in the commit message of his
> > proposed patch when the i915 driver is setting up the graphics
> > engine will most likely be hardware that is affected. My Intel
> > IGD was marketed as HD Graphics 4600, I think.
> >
> > So find an a system with these hardware characteristics, and
> > try running, with the nopat option, the Linux kernel, with
> > and without bdd8b6c98239. You will see the regression I
> > am experiencing, I predict.
>
> This raises a disturbing question: The commit message of
> bdd8b6c98239 mentions the nopat option. It does not specify what
> effect the commit was supposed to have on system
> with the nopat option, but the actual effect on the system,
> both with the seldom used nopat option and in Xen PV Dom0,
> a nasty regression on some older Intel IGD devices. My question:
>
> Was this intentional? Or just grossly incompetent? Any other
> possibilities?

After reading more of the discussions on lkml, I suppose Intel
decided a regression affecting older Intel IGDs running on Xen
would not affect enough users to cause a serious problem.
I just happen to be one of the few users affected by it. I
understand I cannot expect Intel to support its older devices
forever, even if Linus' regression rule would seem to suggest
that on Linux older devices need to continue to be supported
for as long as is reasonably possible. In any case, I can see that
efforts are still underway to fix this regression, not from Intel,
but from maintainers of x86 and Xen kernel code, and that is
what is necessary if the Linus regression rule applies even when
just a few users are affected by a regression. While waiting for a
fix in the mainline kernel, affected users can revert bdd8b6c98239
on top of the released kernel versions as the most simple
workaround for this regression.

Chuck

>
> I think you should definitely notify Linus about this if you can
> verify the story I am telling here.
>
> Chuck
>
> >
> > Chuck
> >
> > >
> > > You can probably verify yourself that reverting bdd8b6c98239
> > > fixes the regression if you try to reproduce the regression with
> > > any Linux version that has bdd8b6c98239 or its equivalent on
> > > the stable branches with a hardware profile similar to the link
> > > to the profile of my machine which exhibits the problem. Mine
> > > is a Haswell core-i5 4590S CPU and ASRock B85M-Pro4
> > > motherboard.
> > >
> > > Also, other notes:
> > >
> > > 1. Yes, AFAICT, Marek at Qubes OS is the first to report the problem.
> > > 2. Juergen Gross' work to try to fix this has been helpful, but none
> > > of his posted patches has fixed the regression on my system.
> > > 3. Jan's patch fixes it also, and so do the two patches I posted to lkml
> > > earlier this week to the appropriate maintainers.
> > > 4. On the pkg-xen-devel mailing list, which is public, this issue was
> > > briefly discussed where I first reported it. Someone there said they
> > > did not see the issue with Broadwell Xeons. Mine is a Haswell core i5,
> > > which is one generation older than Broadwell, so you are most likely
> > > to be able to reproduce the problem with a Haswell core i5 desktop
> > > system like my ASRock system, which was my own private build
> > > which has been working fine for eight years until Linux 5.17.y.
> > >
> > > Hope this helps.
> > >
> > > Chuck
> > >
> > > > BTW, for anyone new to this, Jan's patch afaics is supposed to fix the
> > > > regression reported here:
> > > > https://lore.kernel.org/all/YnHK1Z3o99eMXsVK@mail-itl/
> > > >
> > > > Side note: Juergen Gross recently posted related patches in this code
> > > > area to fix some other problems (regressions?), but his efforts look
> > > > stalled, too:
> > > > https://lore.kernel.org/all/[email protected]/
> > > >
> > > > And he recently stated this Jan's patch is still needed, even if his
> > > > changes make it in.
> > > > https://lore.kernel.org/all/[email protected]/
> > > >
> > > > This from my point all looks a bit... unsatisfying. :-/
> > > >
> > > > Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> > > >
> > > > P.S.: As the Linux kernel's regression tracker I deal with a lot of
> > > > reports and sometimes miss something important when writing mails like
> > > > this. If that's the case here, don't hesitate to tell me in a public
> > > > reply, it's in everyone's interest to set the public record straight.
> > >
> >
> >
>

Subject: [tip: x86/urgent] x86/PAT: Have pat_enabled() properly reflect state when running on Xen

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID: 72cbc8f04fe2fa93443c0fcccb7ad91dfea3d9ce
Gitweb: https://git.kernel.org/tip/72cbc8f04fe2fa93443c0fcccb7ad91dfea3d9ce
Author: Jan Beulich <[email protected]>
AuthorDate: Thu, 28 Apr 2022 16:50:29 +02:00
Committer: Borislav Petkov <[email protected]>
CommitterDate: Mon, 15 Aug 2022 10:51:23 +02:00

x86/PAT: Have pat_enabled() properly reflect state when running on Xen

After commit ID in the Fixes: tag, pat_enabled() returns false (because
of PAT initialization being suppressed in the absence of MTRRs being
announced to be available).

This has become a problem: the i915 driver now fails to initialize when
running PV on Xen (i915_gem_object_pin_map() is where I located the
induced failure), and its error handling is flaky enough to (at least
sometimes) result in a hung system.

Yet even beyond that problem the keying of the use of WC mappings to
pat_enabled() (see arch_can_pci_mmap_wc()) means that in particular
graphics frame buffer accesses would have been quite a bit less optimal
than possible.

Arrange for the function to return true in such environments, without
undermining the rest of PAT MSR management logic considering PAT to be
disabled: specifically, no writes to the PAT MSR should occur.

For the new boolean to live in .init.data, init_cache_modes() also needs
moving to .init.text (where it could/should have lived already before).

[ bp: This is the "small fix" variant for stable. It'll get replaced
with a proper PAT and MTRR detection split upstream but that is too
involved for a stable backport.
- additional touchups to commit msg. Use cpu_feature_enabled(). ]

Fixes: bdd8b6c98239 ("drm/i915: replace X86_FEATURE_PAT with pat_enabled()")
Signed-off-by: Jan Beulich <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
Cc: <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Lucas De Marchi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/mm/pat/memtype.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
index d5ef64d..66a209f 100644
--- a/arch/x86/mm/pat/memtype.c
+++ b/arch/x86/mm/pat/memtype.c
@@ -62,6 +62,7 @@

static bool __read_mostly pat_bp_initialized;
static bool __read_mostly pat_disabled = !IS_ENABLED(CONFIG_X86_PAT);
+static bool __initdata pat_force_disabled = !IS_ENABLED(CONFIG_X86_PAT);
static bool __read_mostly pat_bp_enabled;
static bool __read_mostly pat_cm_initialized;

@@ -86,6 +87,7 @@ void pat_disable(const char *msg_reason)
static int __init nopat(char *str)
{
pat_disable("PAT support disabled via boot option.");
+ pat_force_disabled = true;
return 0;
}
early_param("nopat", nopat);
@@ -272,7 +274,7 @@ static void pat_ap_init(u64 pat)
wrmsrl(MSR_IA32_CR_PAT, pat);
}

-void init_cache_modes(void)
+void __init init_cache_modes(void)
{
u64 pat = 0;

@@ -313,6 +315,12 @@ void init_cache_modes(void)
*/
pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) |
PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
+ } else if (!pat_force_disabled && cpu_feature_enabled(X86_FEATURE_HYPERVISOR)) {
+ /*
+ * Clearly PAT is enabled underneath. Allow pat_enabled() to
+ * reflect this.
+ */
+ pat_bp_enabled = true;
}

__init_cache_modes(pat);