2024-01-24 17:05:32

by Kirill A. Shutemov

[permalink] [raw]
Subject: [PATCH, RESEND] x86/pat: Simplifying the PAT programming protocol

The programming protocol for the PAT MSR follows the MTRR programming
protocol. However, this protocol is cumbersome and requires disabling
caching (CR0.CD=1), which is not possible on some platforms.

Specifically, a TDX guest is not allowed to set CR0.CD. It triggers
a #VE exception.

Turned out the requirement to follow the MTRR programming protocol for
PAT programming is unnecessarily strict. The new Intel Software
Developer Manual[1] (December 2023) relaxes this requirement. Please
refer to the section titled "Programming the PAT" for more information.

The AMD documentation does not link PAT programming to MTRR.

The kernel only needs to flush the TLB after updating the PAT MSR. The
set_memory code already takes care of flushing the TLB and cache when
changing the memory type of a page.

[1] http://www.intel.com/sdm

Signed-off-by: Kirill A. Shutemov <[email protected]>
Reviewed-by: Juergen Gross <[email protected]>
---
arch/x86/kernel/cpu/cacheinfo.c | 7 ++++---
arch/x86/mm/pat/memtype.c | 9 +++------
2 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index c131c412db89..78afad50a7c0 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -1118,15 +1118,16 @@ static void cache_cpu_init(void)
unsigned long flags;

local_irq_save(flags);
- cache_disable();

- if (memory_caching_control & CACHE_MTRR)
+ if (memory_caching_control & CACHE_MTRR) {
+ cache_disable();
mtrr_generic_set_state();
+ cache_enable();
+ }

if (memory_caching_control & CACHE_PAT)
pat_cpu_init();

- cache_enable();
local_irq_restore(flags);
}

diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
index 0904d7e8e126..0d72183b5dd0 100644
--- a/arch/x86/mm/pat/memtype.c
+++ b/arch/x86/mm/pat/memtype.c
@@ -240,6 +240,8 @@ void pat_cpu_init(void)
}

wrmsrl(MSR_IA32_CR_PAT, pat_msr_val);
+
+ __flush_tlb_all();
}

/**
@@ -296,13 +298,8 @@ void __init pat_bp_init(void)
/*
* Xen PV doesn't allow to set PAT MSR, but all cache modes are
* supported.
- * When running as TDX guest setting the PAT MSR won't work either
- * due to the requirement to set CR0.CD when doing so. Rely on
- * firmware to have set the PAT MSR correctly.
*/
- if (pat_disabled ||
- cpu_feature_enabled(X86_FEATURE_XENPV) ||
- cpu_feature_enabled(X86_FEATURE_TDX_GUEST)) {
+ if (pat_disabled || cpu_feature_enabled(X86_FEATURE_XENPV)) {
init_cache_modes(pat_msr_val);
return;
}
--
2.43.0



2024-01-31 16:34:05

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [PATCH, RESEND] x86/pat: Simplifying the PAT programming protocol

On Wed, Jan 24, 2024 at 03:06:50PM +0200, Kirill A. Shutemov wrote:
> The programming protocol for the PAT MSR follows the MTRR programming
> protocol. However, this protocol is cumbersome and requires disabling
> caching (CR0.CD=1), which is not possible on some platforms.
>
> Specifically, a TDX guest is not allowed to set CR0.CD. It triggers
> a #VE exception.
>
> Turned out the requirement to follow the MTRR programming protocol for
> PAT programming is unnecessarily strict. The new Intel Software
> Developer Manual[1] (December 2023) relaxes this requirement. Please
> refer to the section titled "Programming the PAT" for more information.
>
> The AMD documentation does not link PAT programming to MTRR.
>
> The kernel only needs to flush the TLB after updating the PAT MSR. The
> set_memory code already takes care of flushing the TLB and cache when
> changing the memory type of a page.
>
> [1] http://www.intel.com/sdm
>
> Signed-off-by: Kirill A. Shutemov <[email protected]>
> Reviewed-by: Juergen Gross <[email protected]>

Any feedback?

--
Kiryl Shutsemau / Kirill A. Shutemov

2024-01-31 18:30:39

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH, RESEND] x86/pat: Simplifying the PAT programming protocol

On Wed, Jan 24, 2024 at 03:06:50PM +0200, Kirill A. Shutemov wrote:
> The programming protocol for the PAT MSR follows the MTRR programming
> protocol. However, this protocol is cumbersome and requires disabling
> caching (CR0.CD=1), which is not possible on some platforms.
>
> Specifically, a TDX guest is not allowed to set CR0.CD. It triggers
> a #VE exception.
>
> Turned out the requirement to follow the MTRR programming protocol for
> PAT programming is unnecessarily strict. The new Intel Software
> Developer Manual[1] (December 2023) relaxes this requirement. Please
> refer to the section titled "Programming the PAT" for more information.

How about you state that requirement here instead of referring to that
doc which is hard to read and changes constantly?

I'd prefer to have that programming requirement spelled out here to know
in the future what that requirement was and what "variant" was added to
the kernel in case someone decides to relax it even more.

>
> The AMD documentation does not link PAT programming to MTRR.
>
> The kernel only needs to flush the TLB after updating the PAT MSR. The
> set_memory code already takes care of flushing the TLB and cache when
> changing the memory type of a page.

So far so good. However, what guarantees that this relaxing of the
protocol doesn't break any existing machines?

If anything, this patch needs to be tested on everything possible. I can
do that on AMD hw and some old Intels, just to be sure.

> @@ -296,13 +298,8 @@ void __init pat_bp_init(void)
> /*
> * Xen PV doesn't allow to set PAT MSR, but all cache modes are
> * supported.
> - * When running as TDX guest setting the PAT MSR won't work either
> - * due to the requirement to set CR0.CD when doing so. Rely on
> - * firmware to have set the PAT MSR correctly.
> */
> - if (pat_disabled ||
> - cpu_feature_enabled(X86_FEATURE_XENPV) ||
> - cpu_feature_enabled(X86_FEATURE_TDX_GUEST)) {
> + if (pat_disabled || cpu_feature_enabled(X86_FEATURE_XENPV)) {
> init_cache_modes(pat_msr_val);
> return;

What does that mean, now, practically?

That TDX guests virtualize the PAT MSR just like with any other guest or
what is going on there?

This should be explicitly stated in the commit message.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2024-01-31 18:57:36

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [PATCH, RESEND] x86/pat: Simplifying the PAT programming protocol

On Wed, Jan 31, 2024 at 06:57:38PM +0100, Borislav Petkov wrote:
> On Wed, Jan 24, 2024 at 03:06:50PM +0200, Kirill A. Shutemov wrote:
> > The programming protocol for the PAT MSR follows the MTRR programming
> > protocol. However, this protocol is cumbersome and requires disabling
> > caching (CR0.CD=1), which is not possible on some platforms.
> >
> > Specifically, a TDX guest is not allowed to set CR0.CD. It triggers
> > a #VE exception.
> >
> > Turned out the requirement to follow the MTRR programming protocol for
> > PAT programming is unnecessarily strict. The new Intel Software
> > Developer Manual[1] (December 2023) relaxes this requirement. Please
> > refer to the section titled "Programming the PAT" for more information.
>
> How about you state that requirement here instead of referring to that
> doc which is hard to read and changes constantly?
>
> I'd prefer to have that programming requirement spelled out here to know
> in the future what that requirement was and what "variant" was added to
> the kernel in case someone decides to relax it even more.

I summarized the requirements below: TLB has to flashed. Here's what SDM
says:

The operating system (OS) is responsible for ensuring that changes to a
PAT entry occur in a manner that maintains the consistency of the
processor caches and translation lookaside buffers (TLB). It requires the
OS to invalidate all affected TLB entries (including global entries) and
all entries in all paging-structure caches. It may also require flushing
of the processor caches in certain situations.

...

Example of a sequence to invalidate the processor TLBs and caches (if
necessary):

1. If the PCIDE or PGE flag is set in CR4, flush TLBs by clearing one of
those flags (then restore the flag via a subsequent CR4 write).

Otherwise, flush TLBs by executing a MOV from control register CR3 to
another register and then a MOV from that register back to CR3.

2. In the case that there are changes to memory-type mappings for which
cache self-snooping behavior would be problematic given the existing
mappings (e.g., changing a cache line's memory type from WB to UC to be
used for memory-mapped I/O), then cache flushing is also required. This
can be done by executing CLFLUSH operations for all affected cache lines
or by executing the WBINVD instruction (recommended only if there are a
large number of affected mappings or if it is unknown which mappings are
affected)

The second step is relevant for set_memory code that already does the
flushing on changing memory type.

> > The AMD documentation does not link PAT programming to MTRR.
> >
> > The kernel only needs to flush the TLB after updating the PAT MSR. The
> > set_memory code already takes care of flushing the TLB and cache when
> > changing the memory type of a page.
>
> So far so good. However, what guarantees that this relaxing of the
> protocol doesn't break any existing machines?

Our HW folks confirmed that the new optimized sequence works on all past
processors that support PAT.

> If anything, this patch needs to be tested on everything possible. I can
> do that on AMD hw and some old Intels, just to be sure.

Testing wouldn't hurt, but it cannot possibly prove that the new flow is
safe by testing.

> > @@ -296,13 +298,8 @@ void __init pat_bp_init(void)
> > /*
> > * Xen PV doesn't allow to set PAT MSR, but all cache modes are
> > * supported.
> > - * When running as TDX guest setting the PAT MSR won't work either
> > - * due to the requirement to set CR0.CD when doing so. Rely on
> > - * firmware to have set the PAT MSR correctly.
> > */
> > - if (pat_disabled ||
> > - cpu_feature_enabled(X86_FEATURE_XENPV) ||
> > - cpu_feature_enabled(X86_FEATURE_TDX_GUEST)) {
> > + if (pat_disabled || cpu_feature_enabled(X86_FEATURE_XENPV)) {
> > init_cache_modes(pat_msr_val);
> > return;
>
> What does that mean, now, practically?
>
> That TDX guests virtualize the PAT MSR just like with any other guest or
> what is going on there?
>
> This should be explicitly stated in the commit message.

PAT MST was always virtualized, but we was not able to program it as we
followed MTRR protocol that sets CR0.CD. And I covered it in the commit
message:

Specifically, a TDX guest is not allowed to set CR0.CD. It triggers
a #VE exception.

--
Kiryl Shutsemau / Kirill A. Shutemov

2024-01-31 20:24:47

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH, RESEND] x86/pat: Simplifying the PAT programming protocol

On Wed, Jan 31, 2024 at 08:52:46PM +0200, Kirill A. Shutemov wrote:
> The second step is relevant for set_memory code that already does the
> flushing on changing memory type.

So the "relaxation" is the removal of that CR0.CD requirement?

> Our HW folks confirmed that the new optimized sequence works on all past
> processors that support PAT.

Your folks confirmed that for all released hw and for x86 hardware from
other vendors?

> Testing wouldn't hurt, but it cannot possibly prove that the new flow is
> safe by testing.

This basically says that I should not take this patch at all as there's
no way of even testing it?!

I hope I'm misunderstanding you.

> PAT MST was always virtualized, but we was not able to program it as we
> followed MTRR protocol that sets CR0.CD. And I covered it in the commit
> message:
>
> Specifically, a TDX guest is not allowed to set CR0.CD. It triggers
> a #VE exception.

Ok, I'll extend that part once the rest has been sorted out.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2024-01-31 22:23:15

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [PATCH, RESEND] x86/pat: Simplifying the PAT programming protocol

On Wed, Jan 31, 2024 at 09:23:40PM +0100, Borislav Petkov wrote:
> On Wed, Jan 31, 2024 at 08:52:46PM +0200, Kirill A. Shutemov wrote:
> > The second step is relevant for set_memory code that already does the
> > flushing on changing memory type.
>
> So the "relaxation" is the removal of that CR0.CD requirement?

And double WBINVD if the machine has no X86_FEATURE_SELFSNOOP (before and
after TLB flush).

> > Our HW folks confirmed that the new optimized sequence works on all past
> > processors that support PAT.
>
> Your folks confirmed that for all released hw and for x86 hardware from
> other vendors?

No. They can only talk for Intel CPUs. But AMD docs don't require MTTR
flow to begin with.

It is better to double-check on AMD side.

> > Testing wouldn't hurt, but it cannot possibly prove that the new flow is
> > safe by testing.
>
> This basically says that I should not take this patch at all as there's
> no way of even testing it?!
>
> I hope I'm misunderstanding you.

Testing can prove that the proposed patch is broken if a problem show ups.
But if you found no issue during testing you cannot say that the patch is
safe. You could just get lucky and don't hit pathological scenario with
broken cache/TLB aliases or something.

It is better to get confirmation from HW folks.

--
Kiryl Shutsemau / Kirill A. Shutemov

2024-02-05 13:31:30

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH, RESEND] x86/pat: Simplifying the PAT programming protocol

On Thu, Feb 01, 2024 at 12:17:32AM +0200, Kirill A. Shutemov wrote:
> It is better to get confirmation from HW folks.

Yah, lemme see what I can do.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2024-02-13 16:17:30

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH, RESEND] x86/pat: Simplifying the PAT programming protocol

On Thu, Feb 01, 2024 at 12:17:32AM +0200, Kirill A. Shutemov wrote:
> > So the "relaxation" is the removal of that CR0.CD requirement?

So I'm looking at the SDM, revisions 081US and 082US.

Section

"12.11.8 MTRR Considerations in MP Systems"

still has

"4. Enter the no-fill cache mode. (Set the CD flag in control register
CR0 to 1 and the NW flag to 0.)"

and

"4. Enter the no-fill cache mode. (Set the CD flag in control register
CR0 to 1 and the NW flag to 0.)"

so where is that relaxation written? Am I looking at the wrong place?

> And double WBINVD if the machine has no X86_FEATURE_SELFSNOOP (before and
> after TLB flush).

That's still there too. Steps 5 and 11, respectively.

Hmmm?

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2024-02-13 16:57:30

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [PATCH, RESEND] x86/pat: Simplifying the PAT programming protocol

On Tue, Feb 13, 2024 at 05:15:14PM +0100, Borislav Petkov wrote:
> On Thu, Feb 01, 2024 at 12:17:32AM +0200, Kirill A. Shutemov wrote:
> > > So the "relaxation" is the removal of that CR0.CD requirement?
>
> So I'm looking at the SDM, revisions 081US and 082US.
>
> Section
>
> "12.11.8 MTRR Considerations in MP Systems"

The point is that PAT programming doesn't need to follow MTRR
considerations anymore.

Previously "Programming the PAT" section had this:

The operating system is responsible for ensuring that changes to a PAT
entry occur in a manner that maintains the consistency of the processor
caches and translation lookaside buffers (TLB). This is accomplished by
following the procedure as specified in Section 12.11.8, “MTRR
Considerations in MP Systems,” for changing the value of an MTRR in a
multiple processor system. It requires a specific sequence of
operations that includes flushing the processors caches and TLBs.

The new version points to MTTR consideration as one of possible way to
invalidate TLB and caches:

The operating system (OS) is responsible for ensuring that changes to a
PAT entry occur in a manner that maintains the consistency of the
processor caches and translation lookaside buffers (TLB). It requires the
OS to invalidate all affected TLB entries (including global entries) and
all entries in all paging-structure caches. It may also require flushing
of the processor caches in certain situations. This can be accomplished
in various ways, including the sequence below or by following the
procedure specified in Section 12.11.8, “MTRR Considerations in MP
Systems.” (See Section 4.10.4, “Invalidation of TLBs and
Paging-Structure Caches” for additional background information.) Also
note that in a multi-processor environment, it is the software's
responsibility to resolve differences in conflicting memory types across
logical processors that may arise from changes to the PAT (e.g., if two
logical processors map a linear address to the same physical address but
have PATs that specify a different memory type for that physical
address).

The new text follows with example of sequence that flushes TLB and
caches. And it doesn't touch CR0.CD.

--
Kiryl Shutsemau / Kirill A. Shutemov

2024-02-20 13:28:57

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH, RESEND] x86/pat: Simplifying the PAT programming protocol

On Tue, Feb 13, 2024 at 06:57:19PM +0200, Kirill A. Shutemov wrote:
> The new text follows with example of sequence that flushes TLB and
> caches. And it doesn't touch CR0.CD.

Ok, sounds like AMD is fine here. I'm going to take this one and see
what explodes. If something does, we can always remove it.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Subject: [tip: x86/mtrr] x86/pat: Simplify the PAT programming protocol

The following commit has been merged into the x86/mtrr branch of tip:

Commit-ID: ffc92cf3db62443c626469ef160f9276f296f6c6
Gitweb: https://git.kernel.org/tip/ffc92cf3db62443c626469ef160f9276f296f6c6
Author: Kirill A. Shutemov <[email protected]>
AuthorDate: Wed, 24 Jan 2024 15:06:50 +02:00
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Tue, 20 Feb 2024 14:40:51 +01:00

x86/pat: Simplify the PAT programming protocol

The programming protocol for the PAT MSR follows the MTRR programming
protocol. However, this protocol is cumbersome and requires disabling
caching (CR0.CD=1), which is not possible on some platforms.

Specifically, a TDX guest is not allowed to set CR0.CD. It triggers
a #VE exception.

It turns out that the requirement to follow the MTRR programming
protocol for PAT programming is unnecessarily strict. The new Intel
Software Developer Manual (http://www.intel.com/sdm) (December 2023)
relaxes this requirement, please refer to the section titled
"Programming the PAT" for more information.

In short, this section provides an alternative PAT update sequence which
doesn't need to disable caches around the PAT update but only to flush
those caches and TLBs.

The AMD documentation does not link PAT programming to MTRR and is there
fore, fine too.

The kernel only needs to flush the TLB after updating the PAT MSR. The
set_memory code already takes care of flushing the TLB and cache when
changing the memory type of a page.

[ bp: Expand commit message. ]

Signed-off-by: Kirill A. Shutemov <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Juergen Gross <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/kernel/cpu/cacheinfo.c | 7 ++++---
arch/x86/mm/pat/memtype.c | 9 +++------
2 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index c131c41..78afad5 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -1118,15 +1118,16 @@ static void cache_cpu_init(void)
unsigned long flags;

local_irq_save(flags);
- cache_disable();

- if (memory_caching_control & CACHE_MTRR)
+ if (memory_caching_control & CACHE_MTRR) {
+ cache_disable();
mtrr_generic_set_state();
+ cache_enable();
+ }

if (memory_caching_control & CACHE_PAT)
pat_cpu_init();

- cache_enable();
local_irq_restore(flags);
}

diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
index 0904d7e..0d72183 100644
--- a/arch/x86/mm/pat/memtype.c
+++ b/arch/x86/mm/pat/memtype.c
@@ -240,6 +240,8 @@ void pat_cpu_init(void)
}

wrmsrl(MSR_IA32_CR_PAT, pat_msr_val);
+
+ __flush_tlb_all();
}

/**
@@ -296,13 +298,8 @@ void __init pat_bp_init(void)
/*
* Xen PV doesn't allow to set PAT MSR, but all cache modes are
* supported.
- * When running as TDX guest setting the PAT MSR won't work either
- * due to the requirement to set CR0.CD when doing so. Rely on
- * firmware to have set the PAT MSR correctly.
*/
- if (pat_disabled ||
- cpu_feature_enabled(X86_FEATURE_XENPV) ||
- cpu_feature_enabled(X86_FEATURE_TDX_GUEST)) {
+ if (pat_disabled || cpu_feature_enabled(X86_FEATURE_XENPV)) {
init_cache_modes(pat_msr_val);
return;
}