Today PAT can't be used without MTRR being available, unless MTRR is at
least configured via CONFIG_MTRR and the system is running as Xen PV
guest. In this case PAT is automatically available via the hypervisor,
but the PAT MSR can't be modified by the kernel and MTRR is disabled.
The same applies to a kernel built with no MTRR support: it won't
allow to use the PAT MSR, even if there is no technical reason for
that, other than setting up PAT on all CPUs the same way (which is a
requirement of the processor's cache management) is relying on some
MTRR specific code.
Fix all of that by:
- moving the function needed by PAT from MTRR specific code one level
up
- reworking the init sequences of MTRR and PAT to be more similar to
each other without calling PAT from MTRR code
- removing the dependency of PAT on MTRR
There is some more cleanup done reducing code size.
Note that patches 1+2 have already been applied to tip.git x86/cpu.
They are included in this series only for reference.
Changes in V5:
- addressed comments
Changes in V4:
- new patches 10, 14, 15, 16
- split up old patch 4 into 3 patches
- addressed comments
Changes in V3:
- replace patch 1 by just adding a comment
Changes in V2:
- complete rework of the patches based on comments by Boris Petkov
- added several patches to the series
Juergen Gross (16):
x86/mtrr: add comment for set_mtrr_state() serialization
x86/mtrr: remove unused cyrix_set_all() function
x86/mtrr: replace use_intel() with a local flag
x86/mtrr: rename prepare_set() and post_set()
x86/mtrr: split MTRR specific handling from cache dis/enabling
x86: move some code out of arch/x86/kernel/cpu/mtrr
x86/mtrr: Disentangle MTRR init from PAT init.
x86/mtrr: remove set_all callback from struct mtrr_ops
x86/mtrr: simplify mtrr_bp_init()
x86/mtrr: get rid of __mtrr_enabled bool
x86/mtrr: let cache_aps_delayed_init replace mtrr_aps_delayed_init
x86/mtrr: add a stop_machine() handler calling only cache_cpu_init()
x86: decouple PAT and MTRR handling
x86: switch cache_ap_init() to hotplug callback
x86: do MTRR/PAT setup on all secondary CPUs in parallel
x86/mtrr: simplify mtrr_ops initialization
arch/x86/include/asm/cacheinfo.h | 19 ++++
arch/x86/include/asm/memtype.h | 5 +-
arch/x86/include/asm/mtrr.h | 17 +--
arch/x86/kernel/cpu/cacheinfo.c | 173 +++++++++++++++++++++++++++++
arch/x86/kernel/cpu/common.c | 2 +-
arch/x86/kernel/cpu/mtrr/amd.c | 8 +-
arch/x86/kernel/cpu/mtrr/centaur.c | 8 +-
arch/x86/kernel/cpu/mtrr/cyrix.c | 42 +------
arch/x86/kernel/cpu/mtrr/generic.c | 127 ++++-----------------
arch/x86/kernel/cpu/mtrr/mtrr.c | 171 ++++------------------------
arch/x86/kernel/cpu/mtrr/mtrr.h | 15 +--
arch/x86/kernel/setup.c | 14 +--
arch/x86/kernel/smpboot.c | 9 +-
arch/x86/mm/pat/memtype.c | 152 +++++++++----------------
arch/x86/power/cpu.c | 3 +-
include/linux/cpuhotplug.h | 1 +
16 files changed, 308 insertions(+), 458 deletions(-)
--
2.35.3
The Cyrix CPU specific MTRR function cyrix_set_all() will never be
called, as the struct mtrr_ops set_all() callback will only be called
in the use_intel() case, which would require the use_intel_if member
of struct mtrr_ops to be set, which isn't the case for Cyrix.
Note that this patch has already been applied to tip.git.
Signed-off-by: Juergen Gross <[email protected]>
---
V2:
- new patch
---
arch/x86/kernel/cpu/mtrr/cyrix.c | 34 --------------------------------
1 file changed, 34 deletions(-)
diff --git a/arch/x86/kernel/cpu/mtrr/cyrix.c b/arch/x86/kernel/cpu/mtrr/cyrix.c
index ca670919b561..c77d3b0a5bf2 100644
--- a/arch/x86/kernel/cpu/mtrr/cyrix.c
+++ b/arch/x86/kernel/cpu/mtrr/cyrix.c
@@ -234,42 +234,8 @@ static void cyrix_set_arr(unsigned int reg, unsigned long base,
post_set();
}
-typedef struct {
- unsigned long base;
- unsigned long size;
- mtrr_type type;
-} arr_state_t;
-
-static arr_state_t arr_state[8] = {
- {0UL, 0UL, 0UL}, {0UL, 0UL, 0UL}, {0UL, 0UL, 0UL}, {0UL, 0UL, 0UL},
- {0UL, 0UL, 0UL}, {0UL, 0UL, 0UL}, {0UL, 0UL, 0UL}, {0UL, 0UL, 0UL}
-};
-
-static unsigned char ccr_state[7] = { 0, 0, 0, 0, 0, 0, 0 };
-
-static void cyrix_set_all(void)
-{
- int i;
-
- prepare_set();
-
- /* the CCRs are not contiguous */
- for (i = 0; i < 4; i++)
- setCx86(CX86_CCR0 + i, ccr_state[i]);
- for (; i < 7; i++)
- setCx86(CX86_CCR4 + i, ccr_state[i]);
-
- for (i = 0; i < 8; i++) {
- cyrix_set_arr(i, arr_state[i].base,
- arr_state[i].size, arr_state[i].type);
- }
-
- post_set();
-}
-
static const struct mtrr_ops cyrix_mtrr_ops = {
.vendor = X86_VENDOR_CYRIX,
- .set_all = cyrix_set_all,
.set = cyrix_set_arr,
.get = cyrix_get_arr,
.get_free_region = cyrix_get_free_region,
--
2.35.3
Rename the currently MTRR specific functions prepare_set() and
post_set() for preparing to move them. Make them non-static and put
their prototypes into cacheinfo.h, where they will end after moving
them to their final position anyway.
Expand the comment before the functions with an introductory line and
rename two related static variables, too.
Signed-off-by: Juergen Gross <[email protected]>
---
V4:
- carved out from other patch (Borislav Petkov)
---
arch/x86/include/asm/cacheinfo.h | 3 +++
arch/x86/kernel/cpu/mtrr/generic.c | 43 +++++++++++++++---------------
2 files changed, 24 insertions(+), 22 deletions(-)
diff --git a/arch/x86/include/asm/cacheinfo.h b/arch/x86/include/asm/cacheinfo.h
index c3873962a7cd..6159874b4183 100644
--- a/arch/x86/include/asm/cacheinfo.h
+++ b/arch/x86/include/asm/cacheinfo.h
@@ -10,4 +10,7 @@ extern unsigned int memory_caching_control;
void cacheinfo_amd_init_llc_id(struct cpuinfo_x86 *c, int cpu);
void cacheinfo_hygon_init_llc_id(struct cpuinfo_x86 *c, int cpu);
+void cache_disable(void);
+void cache_enable(void);
+
#endif /* _ASM_X86_CACHEINFO_H */
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index 81742870ecc5..aebdc90a2489 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -10,6 +10,7 @@
#include <linux/mm.h>
#include <asm/processor-flags.h>
+#include <asm/cacheinfo.h>
#include <asm/cpufeature.h>
#include <asm/tlbflush.h>
#include <asm/mtrr.h>
@@ -396,9 +397,6 @@ print_fixed(unsigned base, unsigned step, const mtrr_type *types)
}
}
-static void prepare_set(void);
-static void post_set(void);
-
static void __init print_mtrr_state(void)
{
unsigned int i;
@@ -450,11 +448,11 @@ void __init mtrr_bp_pat_init(void)
unsigned long flags;
local_irq_save(flags);
- prepare_set();
+ cache_disable();
pat_init();
- post_set();
+ cache_enable();
local_irq_restore(flags);
}
@@ -687,7 +685,7 @@ static u32 deftype_lo, deftype_hi;
* NOTE: The CPU must already be in a safe state for MTRR changes, including
* measures that only a single CPU can be active in set_mtrr_state() in
* order to not be subject to races for usage of deftype_lo (this is
- * accomplished by taking set_atomicity_lock).
+ * accomplished by taking cache_disable_lock).
* RETURNS: 0 if no changes made, else a mask indicating what was changed.
*/
static unsigned long set_mtrr_state(void)
@@ -718,18 +716,19 @@ static unsigned long set_mtrr_state(void)
return change_mask;
}
-
-static unsigned long cr4;
-static DEFINE_RAW_SPINLOCK(set_atomicity_lock);
-
/*
+ * Disable and enable caches. Needed for changing MTRRs and the PAT MSR.
+ *
* Since we are disabling the cache don't allow any interrupts,
* they would run extremely slow and would only increase the pain.
*
* The caller must ensure that local interrupts are disabled and
- * are reenabled after post_set() has been called.
+ * are reenabled after cache_enable() has been called.
*/
-static void prepare_set(void) __acquires(set_atomicity_lock)
+static unsigned long saved_cr4;
+static DEFINE_RAW_SPINLOCK(cache_disable_lock);
+
+void cache_disable(void) __acquires(cache_disable_lock)
{
unsigned long cr0;
@@ -740,7 +739,7 @@ static void prepare_set(void) __acquires(set_atomicity_lock)
* changes to the way the kernel boots
*/
- raw_spin_lock(&set_atomicity_lock);
+ raw_spin_lock(&cache_disable_lock);
/* Enter the no-fill (CD=1, NW=0) cache mode and flush caches. */
cr0 = read_cr0() | X86_CR0_CD;
@@ -757,8 +756,8 @@ static void prepare_set(void) __acquires(set_atomicity_lock)
/* Save value of CR4 and clear Page Global Enable (bit 7) */
if (boot_cpu_has(X86_FEATURE_PGE)) {
- cr4 = __read_cr4();
- __write_cr4(cr4 & ~X86_CR4_PGE);
+ saved_cr4 = __read_cr4();
+ __write_cr4(saved_cr4 & ~X86_CR4_PGE);
}
/* Flush all TLBs via a mov %cr3, %reg; mov %reg, %cr3 */
@@ -776,7 +775,7 @@ static void prepare_set(void) __acquires(set_atomicity_lock)
wbinvd();
}
-static void post_set(void) __releases(set_atomicity_lock)
+void cache_enable(void) __releases(cache_disable_lock)
{
/* Flush TLBs (no need to flush caches - they are disabled) */
count_vm_tlb_event(NR_TLB_LOCAL_FLUSH_ALL);
@@ -790,8 +789,8 @@ static void post_set(void) __releases(set_atomicity_lock)
/* Restore value of CR4 */
if (boot_cpu_has(X86_FEATURE_PGE))
- __write_cr4(cr4);
- raw_spin_unlock(&set_atomicity_lock);
+ __write_cr4(saved_cr4);
+ raw_spin_unlock(&cache_disable_lock);
}
static void generic_set_all(void)
@@ -800,7 +799,7 @@ static void generic_set_all(void)
unsigned long flags;
local_irq_save(flags);
- prepare_set();
+ cache_disable();
/* Actually set the state */
mask = set_mtrr_state();
@@ -808,7 +807,7 @@ static void generic_set_all(void)
/* also set PAT */
pat_init();
- post_set();
+ cache_enable();
local_irq_restore(flags);
/* Use the atomic bitops to update the global mask */
@@ -839,7 +838,7 @@ static void generic_set_mtrr(unsigned int reg, unsigned long base,
vr = &mtrr_state.var_ranges[reg];
local_irq_save(flags);
- prepare_set();
+ cache_disable();
if (size == 0) {
/*
@@ -858,7 +857,7 @@ static void generic_set_mtrr(unsigned int reg, unsigned long base,
mtrr_wrmsr(MTRRphysMask_MSR(reg), vr->mask_lo, vr->mask_hi);
}
- post_set();
+ cache_enable();
local_irq_restore(flags);
}
--
2.35.3
Add a main cache_cpu_init() init routine which initializes MTRR and/or
PAT support depending on what has been detected on the system.
Leave the MTRR-specific initialization in a MTRR-specific init function
where the smp_changes_mask setting happens now with caches disabled.
This global mask update was done with caches enabled before probably
because atomic operations while running uncached might have been quite
expensive.
But since only systems with a broken BIOS should ever require to set any
bit in smp_changes_mask, hurting those devices with a penalty of a few
microseconds during boot shouldn't be a real issue.
Signed-off-by: Juergen Gross <[email protected]>
---
V2:
- new patch
V4:
- remove some comments (Borislav Petkov)
V5:
- rephrase commit message (Borislav Petkov)
---
arch/x86/include/asm/cacheinfo.h | 1 +
arch/x86/include/asm/mtrr.h | 2 ++
arch/x86/kernel/cpu/cacheinfo.c | 17 +++++++++++++++++
arch/x86/kernel/cpu/mtrr/generic.c | 15 ++-------------
4 files changed, 22 insertions(+), 13 deletions(-)
diff --git a/arch/x86/include/asm/cacheinfo.h b/arch/x86/include/asm/cacheinfo.h
index 6159874b4183..978bac70fd49 100644
--- a/arch/x86/include/asm/cacheinfo.h
+++ b/arch/x86/include/asm/cacheinfo.h
@@ -12,5 +12,6 @@ void cacheinfo_hygon_init_llc_id(struct cpuinfo_x86 *c, int cpu);
void cache_disable(void);
void cache_enable(void);
+void cache_cpu_init(void);
#endif /* _ASM_X86_CACHEINFO_H */
diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
index 12a16caed395..986249a2b9b6 100644
--- a/arch/x86/include/asm/mtrr.h
+++ b/arch/x86/include/asm/mtrr.h
@@ -50,6 +50,7 @@ extern int mtrr_trim_uncached_memory(unsigned long end_pfn);
extern int amd_special_default_mtrr(void);
void mtrr_disable(void);
void mtrr_enable(void);
+void mtrr_generic_set_state(void);
# else
static inline u8 mtrr_type_lookup(u64 addr, u64 end, u8 *uniform)
{
@@ -91,6 +92,7 @@ static inline void mtrr_centaur_report_mcr(int mcr, u32 lo, u32 hi)
#define mtrr_bp_restore() do {} while (0)
#define mtrr_disable() do {} while (0)
#define mtrr_enable() do {} while (0)
+#define mtrr_generic_set_state() do {} while (0)
# endif
#ifdef CONFIG_COMPAT
diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index c6a17e21301e..81ab99fe92bd 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -1120,3 +1120,20 @@ void cache_enable(void) __releases(cache_disable_lock)
raw_spin_unlock(&cache_disable_lock);
}
+
+void cache_cpu_init(void)
+{
+ unsigned long flags;
+
+ local_irq_save(flags);
+ cache_disable();
+
+ if (memory_caching_control & CACHE_MTRR)
+ mtrr_generic_set_state();
+
+ if (memory_caching_control & CACHE_PAT)
+ pat_init();
+
+ cache_enable();
+ local_irq_restore(flags);
+}
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index bfe13eedaca8..32aebed25e3f 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -731,30 +731,19 @@ void mtrr_enable(void)
mtrr_wrmsr(MSR_MTRRdefType, deftype_lo, deftype_hi);
}
-static void generic_set_all(void)
+void mtrr_generic_set_state(void)
{
unsigned long mask, count;
- unsigned long flags;
-
- local_irq_save(flags);
- cache_disable();
/* Actually set the state */
mask = set_mtrr_state();
- /* also set PAT */
- pat_init();
-
- cache_enable();
- local_irq_restore(flags);
-
/* Use the atomic bitops to update the global mask */
for (count = 0; count < sizeof(mask) * 8; ++count) {
if (mask & 0x01)
set_bit(count, &smp_changes_mask);
mask >>= 1;
}
-
}
/**
@@ -854,7 +843,7 @@ int positive_have_wrcomb(void)
* Generic structure...
*/
const struct mtrr_ops generic_mtrr_ops = {
- .set_all = generic_set_all,
+ .set_all = cache_cpu_init,
.get = generic_get_mtrr,
.get_free_region = generic_get_free_region,
.set = generic_set_mtrr,
--
2.35.3
Instead of using an indirect call to mtrr_if->set_all just call the
only possible target cache_cpu_init() directly. This enables to remove
the set_all callback from struct mtrr_ops.
Signed-off-by: Juergen Gross <[email protected]>
---
arch/x86/kernel/cpu/mtrr/generic.c | 1 -
arch/x86/kernel/cpu/mtrr/mtrr.c | 10 +++++-----
arch/x86/kernel/cpu/mtrr/mtrr.h | 2 --
3 files changed, 5 insertions(+), 8 deletions(-)
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index 32aebed25e3f..af8422c96b92 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -843,7 +843,6 @@ int positive_have_wrcomb(void)
* Generic structure...
*/
const struct mtrr_ops generic_mtrr_ops = {
- .set_all = cache_cpu_init,
.get = generic_get_mtrr,
.get_free_region = generic_get_free_region,
.set = generic_set_mtrr,
diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.c b/arch/x86/kernel/cpu/mtrr/mtrr.c
index 4209945c4e68..a44b510ced0e 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.c
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.c
@@ -170,15 +170,15 @@ static int mtrr_rendezvous_handler(void *info)
* saved, and we want to replicate that across all the cpus that come
* online (either at the end of boot or resume or during a runtime cpu
* online). If we're doing that, @reg is set to something special and on
- * all the cpu's we do mtrr_if->set_all() (On the logical cpu that
+ * all the CPUs we do cache_cpu_init() (On the logical CPU that
* started the boot/resume sequence, this might be a duplicate
- * set_all()).
+ * cache_cpu_init()).
*/
if (data->smp_reg != ~0U) {
mtrr_if->set(data->smp_reg, data->smp_base,
data->smp_size, data->smp_type);
} else if (mtrr_aps_delayed_init || !cpu_online(smp_processor_id())) {
- mtrr_if->set_all();
+ cache_cpu_init();
}
return 0;
}
@@ -770,7 +770,7 @@ void __init mtrr_bp_init(void)
if (mtrr_cleanup(phys_addr)) {
changed_by_mtrr_cleanup = 1;
- mtrr_if->set_all();
+ cache_cpu_init();
}
}
}
@@ -856,7 +856,7 @@ void mtrr_bp_restore(void)
if (!memory_caching_control)
return;
- mtrr_if->set_all();
+ cache_cpu_init();
}
static int __init mtrr_init_finialize(void)
diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.h b/arch/x86/kernel/cpu/mtrr/mtrr.h
index 88b1c4b6174a..3b1883185185 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.h
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.h
@@ -16,8 +16,6 @@ struct mtrr_ops {
u32 vendor;
void (*set)(unsigned int reg, unsigned long base,
unsigned long size, mtrr_type type);
- void (*set_all)(void);
-
void (*get)(unsigned int reg, unsigned long *base,
unsigned long *size, mtrr_type *type);
int (*get_free_region)(unsigned long base, unsigned long size,
--
2.35.3
There is no need for keeping __mtrr_enabled, as it can easily be
replaced by testing mtrr_if to be not NULL.
Signed-off-by: Juergen Gross <[email protected]>
---
V4:
- new patch
V5:
- rebase
---
arch/x86/kernel/cpu/mtrr/mtrr.c | 13 +++++--------
1 file changed, 5 insertions(+), 8 deletions(-)
diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.c b/arch/x86/kernel/cpu/mtrr/mtrr.c
index a468be5d778f..f671be9823b6 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.c
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.c
@@ -59,11 +59,9 @@
#define MTRR_TO_PHYS_WC_OFFSET 1000
u32 num_var_ranges;
-static bool __mtrr_enabled;
-
static bool mtrr_enabled(void)
{
- return __mtrr_enabled;
+ return !!mtrr_if;
}
unsigned int mtrr_usage_table[MTRR_MAX_VAR_RANGES];
@@ -755,18 +753,17 @@ void __init mtrr_bp_init(void)
}
}
- if (mtrr_if) {
- __mtrr_enabled = true;
+ if (mtrr_enabled()) {
set_num_var_ranges(mtrr_if == &generic_mtrr_ops);
init_table();
if (mtrr_if == &generic_mtrr_ops) {
/* BIOS may override */
- __mtrr_enabled = get_mtrr_state();
-
- if (mtrr_enabled()) {
+ if (get_mtrr_state()) {
memory_caching_control |= CACHE_MTRR | CACHE_PAT;
changed_by_mtrr_cleanup = mtrr_cleanup(phys_addr);
cache_cpu_init();
+ } else {
+ mtrr_if = NULL;
}
}
}
--
2.35.3
Instead of having a stop_machine() handler for either a specific MTRR
register or all state at once, add a handler just for calling
cache_cpu_init() if appropriate.
Add functions for calling stop_machine() with this handler as well.
Add a generic replacement for mtrr_bp_restore() and a wrapper for
mtrr_bp_init().
Signed-off-by: Juergen Gross <[email protected]>
---
V2:
- completely new replacement of former patch 2
V5:
- add a missing mtrr_bp_init() stub (Borislav Petkov)
---
arch/x86/include/asm/cacheinfo.h | 5 +-
arch/x86/include/asm/mtrr.h | 8 +--
arch/x86/kernel/cpu/cacheinfo.c | 59 ++++++++++++++++++++-
arch/x86/kernel/cpu/common.c | 3 +-
arch/x86/kernel/cpu/mtrr/mtrr.c | 88 +-------------------------------
arch/x86/kernel/setup.c | 3 +-
arch/x86/kernel/smpboot.c | 4 +-
arch/x86/power/cpu.c | 3 +-
8 files changed, 74 insertions(+), 99 deletions(-)
diff --git a/arch/x86/include/asm/cacheinfo.h b/arch/x86/include/asm/cacheinfo.h
index e443fcc1f045..a0ef46e9f453 100644
--- a/arch/x86/include/asm/cacheinfo.h
+++ b/arch/x86/include/asm/cacheinfo.h
@@ -12,8 +12,11 @@ void cacheinfo_hygon_init_llc_id(struct cpuinfo_x86 *c, int cpu);
void cache_disable(void);
void cache_enable(void);
-void cache_cpu_init(void);
void set_cache_aps_delayed_init(bool val);
bool get_cache_aps_delayed_init(void);
+void cache_bp_init(void);
+void cache_bp_restore(void);
+void cache_ap_init(void);
+void cache_aps_init(void);
#endif /* _ASM_X86_CACHEINFO_H */
diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
index 5d31219c8529..f0eeaf6e5f5f 100644
--- a/arch/x86/include/asm/mtrr.h
+++ b/arch/x86/include/asm/mtrr.h
@@ -25,13 +25,12 @@
#include <uapi/asm/mtrr.h>
-void mtrr_bp_init(void);
-
/*
* The following functions are for use by other drivers that cannot use
* arch_phys_wc_add and arch_phys_wc_del.
*/
# ifdef CONFIG_MTRR
+void mtrr_bp_init(void);
extern u8 mtrr_type_lookup(u64 addr, u64 end, u8 *uniform);
extern void mtrr_save_fixed_ranges(void *);
extern void mtrr_save_state(void);
@@ -42,8 +41,6 @@ extern int mtrr_add_page(unsigned long base, unsigned long size,
extern int mtrr_del(int reg, unsigned long base, unsigned long size);
extern int mtrr_del_page(int reg, unsigned long base, unsigned long size);
extern void mtrr_centaur_report_mcr(int mcr, u32 lo, u32 hi);
-extern void mtrr_ap_init(void);
-extern void mtrr_aps_init(void);
extern void mtrr_bp_restore(void);
extern int mtrr_trim_uncached_memory(unsigned long end_pfn);
extern int amd_special_default_mtrr(void);
@@ -85,8 +82,7 @@ static inline int mtrr_trim_uncached_memory(unsigned long end_pfn)
static inline void mtrr_centaur_report_mcr(int mcr, u32 lo, u32 hi)
{
}
-#define mtrr_ap_init() do {} while (0)
-#define mtrr_aps_init() do {} while (0)
+#define mtrr_bp_init() do {} while (0)
#define mtrr_bp_restore() do {} while (0)
#define mtrr_disable() do {} while (0)
#define mtrr_enable() do {} while (0)
diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index 931ba3fb1363..a92099569617 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -15,6 +15,7 @@
#include <linux/capability.h>
#include <linux/sysfs.h>
#include <linux/pci.h>
+#include <linux/stop_machine.h>
#include <asm/cpufeature.h>
#include <asm/cacheinfo.h>
@@ -1121,7 +1122,7 @@ void cache_enable(void) __releases(cache_disable_lock)
raw_spin_unlock(&cache_disable_lock);
}
-void cache_cpu_init(void)
+static void cache_cpu_init(void)
{
unsigned long flags;
@@ -1149,3 +1150,59 @@ bool get_cache_aps_delayed_init(void)
{
return cache_aps_delayed_init;
}
+
+static int cache_rendezvous_handler(void *unused)
+{
+ if (get_cache_aps_delayed_init() || !cpu_online(smp_processor_id()))
+ cache_cpu_init();
+
+ return 0;
+}
+
+void __init cache_bp_init(void)
+{
+ mtrr_bp_init();
+
+ if (memory_caching_control)
+ cache_cpu_init();
+}
+
+void cache_bp_restore(void)
+{
+ if (memory_caching_control)
+ cache_cpu_init();
+}
+
+void cache_ap_init(void)
+{
+ if (!memory_caching_control || get_cache_aps_delayed_init())
+ return;
+
+ /*
+ * Ideally we should hold mtrr_mutex here to avoid MTRR entries
+ * changed, but this routine will be called in CPU boot time,
+ * holding the lock breaks it.
+ *
+ * This routine is called in two cases:
+ *
+ * 1. very early time of software resume, when there absolutely
+ * isn't MTRR entry changes;
+ *
+ * 2. CPU hotadd time. We let mtrr_add/del_page hold cpuhotplug
+ * lock to prevent MTRR entry changes
+ */
+ stop_machine_from_inactive_cpu(cache_rendezvous_handler, NULL,
+ cpu_callout_mask);
+}
+
+/*
+ * Delayed cache initialization for all AP's
+ */
+void cache_aps_init(void)
+{
+ if (!memory_caching_control || !get_cache_aps_delayed_init())
+ return;
+
+ stop_machine(cache_rendezvous_handler, NULL, cpu_online_mask);
+ set_cache_aps_delayed_init(false);
+}
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 3e508f239098..fd058b547f8d 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -52,6 +52,7 @@
#include <asm/cpu.h>
#include <asm/mce.h>
#include <asm/msr.h>
+#include <asm/cacheinfo.h>
#include <asm/memtype.h>
#include <asm/microcode.h>
#include <asm/microcode_intel.h>
@@ -1948,7 +1949,7 @@ void identify_secondary_cpu(struct cpuinfo_x86 *c)
#ifdef CONFIG_X86_32
enable_sep_cpu();
#endif
- mtrr_ap_init();
+ cache_ap_init();
validate_apic_and_package_id(c);
x86_spec_ctrl_setup_ap();
update_srbds_msr();
diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.c b/arch/x86/kernel/cpu/mtrr/mtrr.c
index 15ee6d72fb1f..99b6973a69b4 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.c
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.c
@@ -73,9 +73,6 @@ static const struct mtrr_ops *mtrr_ops[X86_VENDOR_NUM] __ro_after_init;
const struct mtrr_ops *mtrr_if;
-static void set_mtrr(unsigned int reg, unsigned long base,
- unsigned long size, mtrr_type type);
-
void __init set_mtrr_ops(const struct mtrr_ops *ops)
{
if (ops->vendor && ops->vendor < X86_VENDOR_NUM)
@@ -158,26 +155,8 @@ static int mtrr_rendezvous_handler(void *info)
{
struct set_mtrr_data *data = info;
- /*
- * We use this same function to initialize the mtrrs during boot,
- * resume, runtime cpu online and on an explicit request to set a
- * specific MTRR.
- *
- * During boot or suspend, the state of the boot cpu's mtrrs has been
- * saved, and we want to replicate that across all the cpus that come
- * online (either at the end of boot or resume or during a runtime cpu
- * online). If we're doing that, @reg is set to something special and on
- * all the CPUs we do cache_cpu_init() (On the logical CPU that
- * started the boot/resume sequence, this might be a duplicate
- * cache_cpu_init()).
- */
- if (data->smp_reg != ~0U) {
- mtrr_if->set(data->smp_reg, data->smp_base,
- data->smp_size, data->smp_type);
- } else if (get_cache_aps_delayed_init() ||
- !cpu_online(smp_processor_id())) {
- cache_cpu_init();
- }
+ mtrr_if->set(data->smp_reg, data->smp_base,
+ data->smp_size, data->smp_type);
return 0;
}
@@ -247,19 +226,6 @@ static void set_mtrr_cpuslocked(unsigned int reg, unsigned long base,
stop_machine_cpuslocked(mtrr_rendezvous_handler, &data, cpu_online_mask);
}
-static void set_mtrr_from_inactive_cpu(unsigned int reg, unsigned long base,
- unsigned long size, mtrr_type type)
-{
- struct set_mtrr_data data = { .smp_reg = reg,
- .smp_base = base,
- .smp_size = size,
- .smp_type = type
- };
-
- stop_machine_from_inactive_cpu(mtrr_rendezvous_handler, &data,
- cpu_callout_mask);
-}
-
/**
* mtrr_add_page - Add a memory type region
* @base: Physical base address of region in pages (in units of 4 kB!)
@@ -761,7 +727,6 @@ void __init mtrr_bp_init(void)
if (get_mtrr_state()) {
memory_caching_control |= CACHE_MTRR | CACHE_PAT;
changed_by_mtrr_cleanup = mtrr_cleanup(phys_addr);
- cache_cpu_init();
} else {
mtrr_if = NULL;
}
@@ -780,27 +745,6 @@ void __init mtrr_bp_init(void)
}
}
-void mtrr_ap_init(void)
-{
- if (!memory_caching_control || get_cache_aps_delayed_init())
- return;
-
- /*
- * Ideally we should hold mtrr_mutex here to avoid mtrr entries
- * changed, but this routine will be called in cpu boot time,
- * holding the lock breaks it.
- *
- * This routine is called in two cases:
- *
- * 1. very early time of software resume, when there absolutely
- * isn't mtrr entry changes;
- *
- * 2. cpu hotadd time. We let mtrr_add/del_page hold cpuhotplug
- * lock to prevent mtrr entry changes
- */
- set_mtrr_from_inactive_cpu(~0U, 0, 0, 0);
-}
-
/**
* mtrr_save_state - Save current fixed-range MTRR state of the first
* cpu in cpu_online_mask.
@@ -816,34 +760,6 @@ void mtrr_save_state(void)
smp_call_function_single(first_cpu, mtrr_save_fixed_ranges, NULL, 1);
}
-/*
- * Delayed MTRR initialization for all AP's
- */
-void mtrr_aps_init(void)
-{
- if (!memory_caching_control)
- return;
-
- /*
- * Check if someone has requested the delay of AP MTRR initialization,
- * by doing set_mtrr_aps_delayed_init(), prior to this point. If not,
- * then we are done.
- */
- if (!get_cache_aps_delayed_init())
- return;
-
- set_mtrr(~0U, 0, 0, 0);
- set_cache_aps_delayed_init(false);
-}
-
-void mtrr_bp_restore(void)
-{
- if (!memory_caching_control)
- return;
-
- cache_cpu_init();
-}
-
static int __init mtrr_init_finialize(void)
{
if (!mtrr_enabled())
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 216fee7144ee..e0e185ee0229 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -34,6 +34,7 @@
#include <asm/numa.h>
#include <asm/bios_ebda.h>
#include <asm/bugs.h>
+#include <asm/cacheinfo.h>
#include <asm/cpu.h>
#include <asm/efi.h>
#include <asm/gart.h>
@@ -1075,7 +1076,7 @@ void __init setup_arch(char **cmdline_p)
/* update e820 for memory not covered by WB MTRRs */
if (IS_ENABLED(CONFIG_MTRR))
- mtrr_bp_init();
+ cache_bp_init();
else
pat_disable("PAT support disabled because CONFIG_MTRR is disabled in the kernel.");
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 13c71ab29d84..1b61a480c966 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1445,7 +1445,7 @@ void arch_thaw_secondary_cpus_begin(void)
void arch_thaw_secondary_cpus_end(void)
{
- mtrr_aps_init();
+ cache_aps_init();
}
/*
@@ -1488,7 +1488,7 @@ void __init native_smp_cpus_done(unsigned int max_cpus)
nmi_selftest();
impress_friends();
- mtrr_aps_init();
+ cache_aps_init();
}
static int __initdata setup_possible_cpus = -1;
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index bb176c72891c..754221c9a1c3 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -23,6 +23,7 @@
#include <asm/fpu/api.h>
#include <asm/debugreg.h>
#include <asm/cpu.h>
+#include <asm/cacheinfo.h>
#include <asm/mmu_context.h>
#include <asm/cpu_device_id.h>
#include <asm/microcode.h>
@@ -261,7 +262,7 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
do_fpu_end();
tsc_verify_tsc_adjust(true);
x86_platform.restore_sched_clock_state();
- mtrr_bp_restore();
+ cache_bp_restore();
perf_restore_debug_store();
c = &cpu_data(smp_processor_id());
--
2.35.3
Today PAT is usable only with MTRR being active, with some nasty tweaks
to make PAT usable when running as Xen PV guest, which doesn't support
MTRR.
The reason for this coupling is, that both, PAT MSR changes and MTRR
changes, require a similar sequence and so full PAT support was added
using the already available MTRR handling.
Xen PV PAT handling can work without MTRR, as it just needs to consume
the PAT MSR setting done by the hypervisor without the ability and need
to change it. This in turn has resulted in a convoluted initialization
sequence and wrong decisions regarding cache mode availability due to
misguiding PAT availability flags.
Fix all of that by allowing to use PAT without MTRR and by reworking
the current PAT initialization sequence to match better with the newly
introduced generic cache initialization.
This removes the need of the recently added pat_force_disabled flag, so
remove the remnants of the patch adding it.
Signed-off-by: Juergen Gross <[email protected]>
---
V2:
- former patch 3 completely reworked
V5:
- rework PAT() macro
- drop local pat variable (Borislav Petkov)
- use cpu_feature_enabled() (Borislav Petkov)
- some more minor adjustments (Borislav Petkov)
---
arch/x86/include/asm/memtype.h | 5 +-
arch/x86/kernel/cpu/cacheinfo.c | 3 +-
arch/x86/kernel/cpu/mtrr/mtrr.c | 12 +--
arch/x86/kernel/setup.c | 13 +--
arch/x86/mm/pat/memtype.c | 152 +++++++++++---------------------
5 files changed, 57 insertions(+), 128 deletions(-)
diff --git a/arch/x86/include/asm/memtype.h b/arch/x86/include/asm/memtype.h
index 9ca760e430b9..113b2fa51849 100644
--- a/arch/x86/include/asm/memtype.h
+++ b/arch/x86/include/asm/memtype.h
@@ -6,9 +6,8 @@
#include <asm/pgtable_types.h>
extern bool pat_enabled(void);
-extern void pat_disable(const char *reason);
-extern void pat_init(void);
-extern void init_cache_modes(void);
+extern void pat_bp_init(void);
+extern void pat_cpu_init(void);
extern int memtype_reserve(u64 start, u64 end,
enum page_cache_mode req_pcm, enum page_cache_mode *ret_pcm);
diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index a92099569617..1aaf830254df 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -1133,7 +1133,7 @@ static void cache_cpu_init(void)
mtrr_generic_set_state();
if (memory_caching_control & CACHE_PAT)
- pat_init();
+ pat_cpu_init();
cache_enable();
local_irq_restore(flags);
@@ -1162,6 +1162,7 @@ static int cache_rendezvous_handler(void *unused)
void __init cache_bp_init(void)
{
mtrr_bp_init();
+ pat_bp_init();
if (memory_caching_control)
cache_cpu_init();
diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.c b/arch/x86/kernel/cpu/mtrr/mtrr.c
index 99b6973a69b4..8403daf34158 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.c
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.c
@@ -725,7 +725,7 @@ void __init mtrr_bp_init(void)
if (mtrr_if == &generic_mtrr_ops) {
/* BIOS may override */
if (get_mtrr_state()) {
- memory_caching_control |= CACHE_MTRR | CACHE_PAT;
+ memory_caching_control |= CACHE_MTRR;
changed_by_mtrr_cleanup = mtrr_cleanup(phys_addr);
} else {
mtrr_if = NULL;
@@ -733,16 +733,8 @@ void __init mtrr_bp_init(void)
}
}
- if (!mtrr_enabled()) {
+ if (!mtrr_enabled())
pr_info("Disabled\n");
-
- /*
- * PAT initialization relies on MTRR's rendezvous handler.
- * Skip PAT init until the handler can initialize both
- * features independently.
- */
- pat_disable("MTRRs disabled, skipping PAT initialization too.");
- }
}
/**
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index e0e185ee0229..aacaa96f0195 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1075,23 +1075,12 @@ void __init setup_arch(char **cmdline_p)
max_pfn = e820__end_of_ram_pfn();
/* update e820 for memory not covered by WB MTRRs */
- if (IS_ENABLED(CONFIG_MTRR))
- cache_bp_init();
- else
- pat_disable("PAT support disabled because CONFIG_MTRR is disabled in the kernel.");
-
+ cache_bp_init();
if (mtrr_trim_uncached_memory(max_pfn))
max_pfn = e820__end_of_ram_pfn();
max_possible_pfn = max_pfn;
- /*
- * This call is required when the CPU does not support PAT. If
- * mtrr_bp_init() invoked it already via pat_init() the call has no
- * effect.
- */
- init_cache_modes();
-
/*
* Define random base addresses for memory sections after max_pfn is
* defined and before each memory section base is used.
diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
index 66a209f7eb86..9aab17d660cd 100644
--- a/arch/x86/mm/pat/memtype.c
+++ b/arch/x86/mm/pat/memtype.c
@@ -43,6 +43,7 @@
#include <linux/rbtree.h>
#include <asm/cacheflush.h>
+#include <asm/cacheinfo.h>
#include <asm/processor.h>
#include <asm/tlbflush.h>
#include <asm/x86_init.h>
@@ -60,41 +61,34 @@
#undef pr_fmt
#define pr_fmt(fmt) "" fmt
-static bool __read_mostly pat_bp_initialized;
static bool __read_mostly pat_disabled = !IS_ENABLED(CONFIG_X86_PAT);
-static bool __initdata pat_force_disabled = !IS_ENABLED(CONFIG_X86_PAT);
-static bool __read_mostly pat_bp_enabled;
-static bool __read_mostly pat_cm_initialized;
+static u64 __ro_after_init pat_msr_val;
/*
* PAT support is enabled by default, but can be disabled for
* various user-requested or hardware-forced reasons:
*/
-void pat_disable(const char *msg_reason)
+static void __init pat_disable(const char *msg_reason)
{
if (pat_disabled)
return;
- if (pat_bp_initialized) {
- WARN_ONCE(1, "x86/PAT: PAT cannot be disabled after initialization\n");
- return;
- }
-
pat_disabled = true;
pr_info("x86/PAT: %s\n", msg_reason);
+
+ memory_caching_control &= ~CACHE_PAT;
}
static int __init nopat(char *str)
{
pat_disable("PAT support disabled via boot option.");
- pat_force_disabled = true;
return 0;
}
early_param("nopat", nopat);
bool pat_enabled(void)
{
- return pat_bp_enabled;
+ return !pat_disabled;
}
EXPORT_SYMBOL_GPL(pat_enabled);
@@ -192,7 +186,8 @@ enum {
#define CM(c) (_PAGE_CACHE_MODE_ ## c)
-static enum page_cache_mode pat_get_cache_mode(unsigned pat_val, char *msg)
+static enum page_cache_mode __init pat_get_cache_mode(unsigned int pat_val,
+ char *msg)
{
enum page_cache_mode cache;
char *cache_mode;
@@ -219,14 +214,12 @@ static enum page_cache_mode pat_get_cache_mode(unsigned pat_val, char *msg)
* configuration.
* Using lower indices is preferred, so we start with highest index.
*/
-static void __init_cache_modes(u64 pat)
+static void __init init_cache_modes(u64 pat)
{
enum page_cache_mode cache;
char pat_msg[33];
int i;
- WARN_ON_ONCE(pat_cm_initialized);
-
pat_msg[32] = 0;
for (i = 7; i >= 0; i--) {
cache = pat_get_cache_mode((pat >> (i * 8)) & 7,
@@ -234,34 +227,9 @@ static void __init_cache_modes(u64 pat)
update_cache_mode_entry(i, cache);
}
pr_info("x86/PAT: Configuration [0-7]: %s\n", pat_msg);
-
- pat_cm_initialized = true;
}
-#define PAT(x, y) ((u64)PAT_ ## y << ((x)*8))
-
-static void pat_bp_init(u64 pat)
-{
- u64 tmp_pat;
-
- if (!boot_cpu_has(X86_FEATURE_PAT)) {
- pat_disable("PAT not supported by the CPU.");
- return;
- }
-
- rdmsrl(MSR_IA32_CR_PAT, tmp_pat);
- if (!tmp_pat) {
- pat_disable("PAT support disabled by the firmware.");
- return;
- }
-
- wrmsrl(MSR_IA32_CR_PAT, pat);
- pat_bp_enabled = true;
-
- __init_cache_modes(pat);
-}
-
-static void pat_ap_init(u64 pat)
+void pat_cpu_init(void)
{
if (!boot_cpu_has(X86_FEATURE_PAT)) {
/*
@@ -271,30 +239,39 @@ static void pat_ap_init(u64 pat)
panic("x86/PAT: PAT enabled, but not supported by secondary CPU\n");
}
- wrmsrl(MSR_IA32_CR_PAT, pat);
+ wrmsrl(MSR_IA32_CR_PAT, pat_msr_val);
}
-void __init init_cache_modes(void)
+/**
+ * pat_bp_init - Initialize the PAT MSR value and PAT table
+ *
+ * This function initializes PAT MSR value and PAT table with an OS-defined
+ * value to enable additional cache attributes, WC, WT and WP.
+ *
+ * This function prepares the calls of pat_cpu_init() via cache_cpu_init()
+ * on all CPUs.
+ */
+void __init pat_bp_init(void)
{
- u64 pat = 0;
+ struct cpuinfo_x86 *c = &boot_cpu_data;
+#define PAT(p0, p1, p2, p3, p4, p5, p6, p7) \
+ (((u64)PAT_ ## p0) | ((u64)PAT_ ## p1 << 8) | \
+ ((u64)PAT_ ## p2 << 16) | ((u64)PAT_ ## p3 << 24) | \
+ ((u64)PAT_ ## p4 << 32) | ((u64)PAT_ ## p5 << 40) | \
+ ((u64)PAT_ ## p6 << 48) | ((u64)PAT_ ## p7 << 56))
- if (pat_cm_initialized)
- return;
- if (boot_cpu_has(X86_FEATURE_PAT)) {
- /*
- * CPU supports PAT. Set PAT table to be consistent with
- * PAT MSR. This case supports "nopat" boot option, and
- * virtual machine environments which support PAT without
- * MTRRs. In specific, Xen has unique setup to PAT MSR.
- *
- * If PAT MSR returns 0, it is considered invalid and emulates
- * as No PAT.
- */
- rdmsrl(MSR_IA32_CR_PAT, pat);
- }
+ if (!IS_ENABLED(CONFIG_X86_PAT))
+ pr_info_once("x86/PAT: PAT support disabled because CONFIG_X86_PAT is disabled in the kernel.\n");
+
+ if (!cpu_feature_enabled(X86_FEATURE_PAT))
+ pat_disable("PAT not supported by the CPU.");
+ else
+ rdmsrl(MSR_IA32_CR_PAT, pat_msr_val);
+
+ if (!pat_msr_val) {
+ pat_disable("PAT support disabled by the firmware.");
- if (!pat) {
/*
* No PAT. Emulate the PAT table that corresponds to the two
* cache bits, PWT (Write Through) and PCD (Cache Disable).
@@ -313,40 +290,17 @@ void __init init_cache_modes(void)
* NOTE: When WC or WP is used, it is redirected to UC- per
* the default setup in __cachemode2pte_tbl[].
*/
- pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) |
- PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
- } else if (!pat_force_disabled && cpu_feature_enabled(X86_FEATURE_HYPERVISOR)) {
- /*
- * Clearly PAT is enabled underneath. Allow pat_enabled() to
- * reflect this.
- */
- pat_bp_enabled = true;
+ pat_msr_val = PAT(WB, WT, UC_MINUS, UC, WB, WT, UC_MINUS, UC);
}
- __init_cache_modes(pat);
-}
-
-/**
- * pat_init - Initialize the PAT MSR and PAT table on the current CPU
- *
- * This function initializes PAT MSR and PAT table with an OS-defined value
- * to enable additional cache attributes, WC, WT and WP.
- *
- * This function must be called on all CPUs using the specific sequence of
- * operations defined in Intel SDM. mtrr_rendezvous_handler() provides this
- * procedure for PAT.
- */
-void pat_init(void)
-{
- u64 pat;
- struct cpuinfo_x86 *c = &boot_cpu_data;
-
-#ifndef CONFIG_X86_PAT
- pr_info_once("x86/PAT: PAT support disabled because CONFIG_X86_PAT is disabled in the kernel.\n");
-#endif
-
- if (pat_disabled)
+ /*
+ * Xen PV doesn't allow to set PAT MSR, but all cache modes are
+ * supported.
+ */
+ if (pat_disabled || cpu_feature_enabled(X86_FEATURE_XENPV)) {
+ init_cache_modes(pat_msr_val);
return;
+ }
if ((c->x86_vendor == X86_VENDOR_INTEL) &&
(((c->x86 == 0x6) && (c->x86_model <= 0xd)) ||
@@ -371,8 +325,7 @@ void pat_init(void)
* NOTE: When WT or WP is used, it is redirected to UC- per
* the default setup in __cachemode2pte_tbl[].
*/
- pat = PAT(0, WB) | PAT(1, WC) | PAT(2, UC_MINUS) | PAT(3, UC) |
- PAT(4, WB) | PAT(5, WC) | PAT(6, UC_MINUS) | PAT(7, UC);
+ pat_msr_val = PAT(WB, WC, UC_MINUS, UC, WB, WC, UC_MINUS, UC);
} else {
/*
* Full PAT support. We put WT in slot 7 to improve
@@ -400,19 +353,14 @@ void pat_init(void)
* The reserved slots are unused, but mapped to their
* corresponding types in the presence of PAT errata.
*/
- pat = PAT(0, WB) | PAT(1, WC) | PAT(2, UC_MINUS) | PAT(3, UC) |
- PAT(4, WB) | PAT(5, WP) | PAT(6, UC_MINUS) | PAT(7, WT);
+ pat_msr_val = PAT(WB, WC, UC_MINUS, UC, WB, WP, UC_MINUS, WT);
}
- if (!pat_bp_initialized) {
- pat_bp_init(pat);
- pat_bp_initialized = true;
- } else {
- pat_ap_init(pat);
- }
-}
+ memory_caching_control |= CACHE_PAT;
+ init_cache_modes(pat_msr_val);
#undef PAT
+}
static DEFINE_SPINLOCK(memtype_lock); /* protects memtype accesses */
--
2.35.3
In order to prepare decoupling MTRR and PAT replace the MTRR specific
mtrr_aps_delayed_init flag with a more generic cache_aps_delayed_init
one.
Signed-off-by: Juergen Gross <[email protected]>
---
V2:
- new patch
V4:
- reestablish function to set cache_aps_delayed_init (Borislav Petkov)
V5:
- make cache_aps_delayed_init static, add get accessor (Borislav Petkov)
---
arch/x86/include/asm/cacheinfo.h | 2 ++
arch/x86/include/asm/mtrr.h | 2 --
arch/x86/kernel/cpu/cacheinfo.c | 12 ++++++++++++
arch/x86/kernel/cpu/mtrr/mtrr.c | 18 +++++-------------
arch/x86/kernel/smpboot.c | 5 +++--
5 files changed, 22 insertions(+), 17 deletions(-)
diff --git a/arch/x86/include/asm/cacheinfo.h b/arch/x86/include/asm/cacheinfo.h
index 978bac70fd49..e443fcc1f045 100644
--- a/arch/x86/include/asm/cacheinfo.h
+++ b/arch/x86/include/asm/cacheinfo.h
@@ -13,5 +13,7 @@ void cacheinfo_hygon_init_llc_id(struct cpuinfo_x86 *c, int cpu);
void cache_disable(void);
void cache_enable(void);
void cache_cpu_init(void);
+void set_cache_aps_delayed_init(bool val);
+bool get_cache_aps_delayed_init(void);
#endif /* _ASM_X86_CACHEINFO_H */
diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
index 986249a2b9b6..5d31219c8529 100644
--- a/arch/x86/include/asm/mtrr.h
+++ b/arch/x86/include/asm/mtrr.h
@@ -43,7 +43,6 @@ extern int mtrr_del(int reg, unsigned long base, unsigned long size);
extern int mtrr_del_page(int reg, unsigned long base, unsigned long size);
extern void mtrr_centaur_report_mcr(int mcr, u32 lo, u32 hi);
extern void mtrr_ap_init(void);
-extern void set_mtrr_aps_delayed_init(void);
extern void mtrr_aps_init(void);
extern void mtrr_bp_restore(void);
extern int mtrr_trim_uncached_memory(unsigned long end_pfn);
@@ -87,7 +86,6 @@ static inline void mtrr_centaur_report_mcr(int mcr, u32 lo, u32 hi)
{
}
#define mtrr_ap_init() do {} while (0)
-#define set_mtrr_aps_delayed_init() do {} while (0)
#define mtrr_aps_init() do {} while (0)
#define mtrr_bp_restore() do {} while (0)
#define mtrr_disable() do {} while (0)
diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index 81ab99fe92bd..931ba3fb1363 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -1137,3 +1137,15 @@ void cache_cpu_init(void)
cache_enable();
local_irq_restore(flags);
}
+
+static bool cache_aps_delayed_init;
+
+void set_cache_aps_delayed_init(bool val)
+{
+ cache_aps_delayed_init = val;
+}
+
+bool get_cache_aps_delayed_init(void)
+{
+ return cache_aps_delayed_init;
+}
diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.c b/arch/x86/kernel/cpu/mtrr/mtrr.c
index f671be9823b6..15ee6d72fb1f 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.c
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.c
@@ -68,7 +68,6 @@ unsigned int mtrr_usage_table[MTRR_MAX_VAR_RANGES];
static DEFINE_MUTEX(mtrr_mutex);
u64 size_or_mask, size_and_mask;
-static bool mtrr_aps_delayed_init;
static const struct mtrr_ops *mtrr_ops[X86_VENDOR_NUM] __ro_after_init;
@@ -175,7 +174,8 @@ static int mtrr_rendezvous_handler(void *info)
if (data->smp_reg != ~0U) {
mtrr_if->set(data->smp_reg, data->smp_base,
data->smp_size, data->smp_type);
- } else if (mtrr_aps_delayed_init || !cpu_online(smp_processor_id())) {
+ } else if (get_cache_aps_delayed_init() ||
+ !cpu_online(smp_processor_id())) {
cache_cpu_init();
}
return 0;
@@ -782,7 +782,7 @@ void __init mtrr_bp_init(void)
void mtrr_ap_init(void)
{
- if (!memory_caching_control || mtrr_aps_delayed_init)
+ if (!memory_caching_control || get_cache_aps_delayed_init())
return;
/*
@@ -816,14 +816,6 @@ void mtrr_save_state(void)
smp_call_function_single(first_cpu, mtrr_save_fixed_ranges, NULL, 1);
}
-void set_mtrr_aps_delayed_init(void)
-{
- if (!memory_caching_control)
- return;
-
- mtrr_aps_delayed_init = true;
-}
-
/*
* Delayed MTRR initialization for all AP's
*/
@@ -837,11 +829,11 @@ void mtrr_aps_init(void)
* by doing set_mtrr_aps_delayed_init(), prior to this point. If not,
* then we are done.
*/
- if (!mtrr_aps_delayed_init)
+ if (!get_cache_aps_delayed_init())
return;
set_mtrr(~0U, 0, 0, 0);
- mtrr_aps_delayed_init = false;
+ set_cache_aps_delayed_init(false);
}
void mtrr_bp_restore(void)
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 3f3ea0287f69..13c71ab29d84 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -58,6 +58,7 @@
#include <linux/overflow.h>
#include <asm/acpi.h>
+#include <asm/cacheinfo.h>
#include <asm/desc.h>
#include <asm/nmi.h>
#include <asm/irq.h>
@@ -1428,7 +1429,7 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
uv_system_init();
- set_mtrr_aps_delayed_init();
+ set_cache_aps_delayed_init(true);
smp_quirk_init_udelay();
@@ -1439,7 +1440,7 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
void arch_thaw_secondary_cpus_begin(void)
{
- set_mtrr_aps_delayed_init();
+ set_cache_aps_delayed_init(true);
}
void arch_thaw_secondary_cpus_end(void)
--
2.35.3
Prepare making PAT and MTRR support independent from each other by
moving some code needed by both out of the MTRR specific sources.
Signed-off-by: Juergen Gross <[email protected]>
---
V2:
- move code from cpu/common.c to cpu/cacheinfo.c (Borislav Petkov)
V4:
- carved out all non-movement modifications (Borislav Petkov)
---
arch/x86/kernel/cpu/cacheinfo.c | 77 ++++++++++++++++++++++++++++++
arch/x86/kernel/cpu/mtrr/generic.c | 74 ----------------------------
2 files changed, 77 insertions(+), 74 deletions(-)
diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index 5228fb9a3798..c6a17e21301e 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -20,6 +20,8 @@
#include <asm/cacheinfo.h>
#include <asm/amd_nb.h>
#include <asm/smp.h>
+#include <asm/mtrr.h>
+#include <asm/tlbflush.h>
#include "cpu.h"
@@ -1043,3 +1045,78 @@ int populate_cache_leaves(unsigned int cpu)
return 0;
}
+
+/*
+ * Disable and enable caches. Needed for changing MTRRs and the PAT MSR.
+ *
+ * Since we are disabling the cache don't allow any interrupts,
+ * they would run extremely slow and would only increase the pain.
+ *
+ * The caller must ensure that local interrupts are disabled and
+ * are reenabled after cache_enable() has been called.
+ */
+static unsigned long saved_cr4;
+static DEFINE_RAW_SPINLOCK(cache_disable_lock);
+
+void cache_disable(void) __acquires(cache_disable_lock)
+{
+ unsigned long cr0;
+
+ /*
+ * Note that this is not ideal
+ * since the cache is only flushed/disabled for this CPU while the
+ * MTRRs are changed, but changing this requires more invasive
+ * changes to the way the kernel boots
+ */
+
+ raw_spin_lock(&cache_disable_lock);
+
+ /* Enter the no-fill (CD=1, NW=0) cache mode and flush caches. */
+ cr0 = read_cr0() | X86_CR0_CD;
+ write_cr0(cr0);
+
+ /*
+ * Cache flushing is the most time-consuming step when programming
+ * the MTRRs. Fortunately, as per the Intel Software Development
+ * Manual, we can skip it if the processor supports cache self-
+ * snooping.
+ */
+ if (!static_cpu_has(X86_FEATURE_SELFSNOOP))
+ wbinvd();
+
+ /* Save value of CR4 and clear Page Global Enable (bit 7) */
+ if (cpu_feature_enabled(X86_FEATURE_PGE)) {
+ saved_cr4 = __read_cr4();
+ __write_cr4(saved_cr4 & ~X86_CR4_PGE);
+ }
+
+ /* Flush all TLBs via a mov %cr3, %reg; mov %reg, %cr3 */
+ count_vm_tlb_event(NR_TLB_LOCAL_FLUSH_ALL);
+ flush_tlb_local();
+
+ if (cpu_feature_enabled(X86_FEATURE_MTRR))
+ mtrr_disable();
+
+ /* Again, only flush caches if we have to. */
+ if (!static_cpu_has(X86_FEATURE_SELFSNOOP))
+ wbinvd();
+}
+
+void cache_enable(void) __releases(cache_disable_lock)
+{
+ /* Flush TLBs (no need to flush caches - they are disabled) */
+ count_vm_tlb_event(NR_TLB_LOCAL_FLUSH_ALL);
+ flush_tlb_local();
+
+ if (cpu_feature_enabled(X86_FEATURE_MTRR))
+ mtrr_enable();
+
+ /* Enable caches */
+ write_cr0(read_cr0() & ~X86_CR0_CD);
+
+ /* Restore value of CR4 */
+ if (cpu_feature_enabled(X86_FEATURE_PGE))
+ __write_cr4(saved_cr4);
+
+ raw_spin_unlock(&cache_disable_lock);
+}
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index 4edf0827f7ee..bfe13eedaca8 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -731,80 +731,6 @@ void mtrr_enable(void)
mtrr_wrmsr(MSR_MTRRdefType, deftype_lo, deftype_hi);
}
-/*
- * Disable and enable caches. Needed for changing MTRRs and the PAT MSR.
- *
- * Since we are disabling the cache don't allow any interrupts,
- * they would run extremely slow and would only increase the pain.
- *
- * The caller must ensure that local interrupts are disabled and
- * are reenabled after cache_enable() has been called.
- */
-static unsigned long saved_cr4;
-static DEFINE_RAW_SPINLOCK(cache_disable_lock);
-
-void cache_disable(void) __acquires(cache_disable_lock)
-{
- unsigned long cr0;
-
- /*
- * Note that this is not ideal
- * since the cache is only flushed/disabled for this CPU while the
- * MTRRs are changed, but changing this requires more invasive
- * changes to the way the kernel boots
- */
-
- raw_spin_lock(&cache_disable_lock);
-
- /* Enter the no-fill (CD=1, NW=0) cache mode and flush caches. */
- cr0 = read_cr0() | X86_CR0_CD;
- write_cr0(cr0);
-
- /*
- * Cache flushing is the most time-consuming step when programming
- * the MTRRs. Fortunately, as per the Intel Software Development
- * Manual, we can skip it if the processor supports cache self-
- * snooping.
- */
- if (!static_cpu_has(X86_FEATURE_SELFSNOOP))
- wbinvd();
-
- /* Save value of CR4 and clear Page Global Enable (bit 7) */
- if (boot_cpu_has(X86_FEATURE_PGE)) {
- saved_cr4 = __read_cr4();
- __write_cr4(saved_cr4 & ~X86_CR4_PGE);
- }
-
- /* Flush all TLBs via a mov %cr3, %reg; mov %reg, %cr3 */
- count_vm_tlb_event(NR_TLB_LOCAL_FLUSH_ALL);
- flush_tlb_local();
-
- if (cpu_feature_enabled(X86_FEATURE_MTRR))
- mtrr_disable();
-
- /* Again, only flush caches if we have to. */
- if (!static_cpu_has(X86_FEATURE_SELFSNOOP))
- wbinvd();
-}
-
-void cache_enable(void) __releases(cache_disable_lock)
-{
- /* Flush TLBs (no need to flush caches - they are disabled) */
- count_vm_tlb_event(NR_TLB_LOCAL_FLUSH_ALL);
- flush_tlb_local();
-
- if (cpu_feature_enabled(X86_FEATURE_MTRR))
- mtrr_enable();
-
- /* Enable caches */
- write_cr0(read_cr0() & ~X86_CR0_CD);
-
- /* Restore value of CR4 */
- if (boot_cpu_has(X86_FEATURE_PGE))
- __write_cr4(saved_cr4);
- raw_spin_unlock(&cache_disable_lock);
-}
-
static void generic_set_all(void)
{
unsigned long mask, count;
--
2.35.3
In case of the generic cache interface being used (Intel CPUs or a
64-bit system), the initialization sequence of the boot CPU is more
complicated than necessary:
- check if MTRR enabled, if yes, call mtrr_bp_pat_init() which will
disable caching, set the PAT MSR, and reenable caching
- call mtrr_cleanup(), in case that changed anything, call
cache_cpu_init() doing the same caching disable/enable dance as
above, but this time with setting the (modified) MTRR state (even
if MTRR was disabled) AND setting the PAT MSR (again even with
disabled MTRR)
The sequence can be simplified a lot while removing potential
inconsistencies:
- check if MTRR enabled, if yes, call mtrr_cleanup() and then
cache_cpu_init()
This ensures to:
- no longer disable/enable caching more than once
- avoid to set MTRRs and/or the PAT MSR on the boot processor in case
of MTRR cleanups even if MTRRs meant to be disabled
With that mtrr_bp_pat_init() can be removed.
Signed-off-by: Juergen Gross <[email protected]>
---
V2:
- new patch
---
arch/x86/kernel/cpu/mtrr/generic.c | 14 --------------
arch/x86/kernel/cpu/mtrr/mtrr.c | 6 +-----
arch/x86/kernel/cpu/mtrr/mtrr.h | 1 -
3 files changed, 1 insertion(+), 20 deletions(-)
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index af8422c96b92..2f2485d6657f 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -442,20 +442,6 @@ static void __init print_mtrr_state(void)
pr_debug("TOM2: %016llx aka %lldM\n", mtrr_tom2, mtrr_tom2>>20);
}
-/* PAT setup for BP. We need to go through sync steps here */
-void __init mtrr_bp_pat_init(void)
-{
- unsigned long flags;
-
- local_irq_save(flags);
- cache_disable();
-
- pat_init();
-
- cache_enable();
- local_irq_restore(flags);
-}
-
/* Grab all of the MTRR state for this CPU into *state */
bool __init get_mtrr_state(void)
{
diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.c b/arch/x86/kernel/cpu/mtrr/mtrr.c
index a44b510ced0e..a468be5d778f 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.c
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.c
@@ -764,12 +764,8 @@ void __init mtrr_bp_init(void)
__mtrr_enabled = get_mtrr_state();
if (mtrr_enabled()) {
- mtrr_bp_pat_init();
memory_caching_control |= CACHE_MTRR | CACHE_PAT;
- }
-
- if (mtrr_cleanup(phys_addr)) {
- changed_by_mtrr_cleanup = 1;
+ changed_by_mtrr_cleanup = mtrr_cleanup(phys_addr);
cache_cpu_init();
}
}
diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.h b/arch/x86/kernel/cpu/mtrr/mtrr.h
index 3b1883185185..c98928ceee6a 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.h
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.h
@@ -50,7 +50,6 @@ void set_mtrr_prepare_save(struct set_mtrr_context *ctxt);
void fill_mtrr_var_range(unsigned int index,
u32 base_lo, u32 base_hi, u32 mask_lo, u32 mask_hi);
bool get_mtrr_state(void);
-void mtrr_bp_pat_init(void);
extern void __init set_mtrr_ops(const struct mtrr_ops *ops);
--
2.35.3
The way mtrr_if is initialized with the correct mtrr_ops structure is
quite weird.
Simplify that by dropping the vendor specific init functions and the
mtrr_ops[] array. Replace those with direct assignments of the related
vendor specific ops array to mtrr_if.
Note that a direct assignment is okay even for 64-bit builds, where the
symbol isn't present, as the related code will be subject to "dead code
elimination" due to how cpu_feature_enabled() is implemented.
Signed-off-by: Juergen Gross <[email protected]>
---
V4:
- new patch
V5:
- drop the vendor_mtrr_ops() macro (Borislav Petkov)
---
arch/x86/kernel/cpu/mtrr/amd.c | 8 +-------
arch/x86/kernel/cpu/mtrr/centaur.c | 8 +-------
arch/x86/kernel/cpu/mtrr/cyrix.c | 8 +-------
arch/x86/kernel/cpu/mtrr/mtrr.c | 30 +++---------------------------
arch/x86/kernel/cpu/mtrr/mtrr.h | 10 ++++------
5 files changed, 10 insertions(+), 54 deletions(-)
diff --git a/arch/x86/kernel/cpu/mtrr/amd.c b/arch/x86/kernel/cpu/mtrr/amd.c
index a65a0272096d..eff6ac62c0ff 100644
--- a/arch/x86/kernel/cpu/mtrr/amd.c
+++ b/arch/x86/kernel/cpu/mtrr/amd.c
@@ -109,7 +109,7 @@ amd_validate_add_page(unsigned long base, unsigned long size, unsigned int type)
return 0;
}
-static const struct mtrr_ops amd_mtrr_ops = {
+const struct mtrr_ops amd_mtrr_ops = {
.vendor = X86_VENDOR_AMD,
.set = amd_set_mtrr,
.get = amd_get_mtrr,
@@ -117,9 +117,3 @@ static const struct mtrr_ops amd_mtrr_ops = {
.validate_add_page = amd_validate_add_page,
.have_wrcomb = positive_have_wrcomb,
};
-
-int __init amd_init_mtrr(void)
-{
- set_mtrr_ops(&amd_mtrr_ops);
- return 0;
-}
diff --git a/arch/x86/kernel/cpu/mtrr/centaur.c b/arch/x86/kernel/cpu/mtrr/centaur.c
index f27177816569..b8a74eddde83 100644
--- a/arch/x86/kernel/cpu/mtrr/centaur.c
+++ b/arch/x86/kernel/cpu/mtrr/centaur.c
@@ -111,7 +111,7 @@ centaur_validate_add_page(unsigned long base, unsigned long size, unsigned int t
return 0;
}
-static const struct mtrr_ops centaur_mtrr_ops = {
+const struct mtrr_ops centaur_mtrr_ops = {
.vendor = X86_VENDOR_CENTAUR,
.set = centaur_set_mcr,
.get = centaur_get_mcr,
@@ -119,9 +119,3 @@ static const struct mtrr_ops centaur_mtrr_ops = {
.validate_add_page = centaur_validate_add_page,
.have_wrcomb = positive_have_wrcomb,
};
-
-int __init centaur_init_mtrr(void)
-{
- set_mtrr_ops(¢aur_mtrr_ops);
- return 0;
-}
diff --git a/arch/x86/kernel/cpu/mtrr/cyrix.c b/arch/x86/kernel/cpu/mtrr/cyrix.c
index c77d3b0a5bf2..173b9e01e623 100644
--- a/arch/x86/kernel/cpu/mtrr/cyrix.c
+++ b/arch/x86/kernel/cpu/mtrr/cyrix.c
@@ -234,7 +234,7 @@ static void cyrix_set_arr(unsigned int reg, unsigned long base,
post_set();
}
-static const struct mtrr_ops cyrix_mtrr_ops = {
+const struct mtrr_ops cyrix_mtrr_ops = {
.vendor = X86_VENDOR_CYRIX,
.set = cyrix_set_arr,
.get = cyrix_get_arr,
@@ -242,9 +242,3 @@ static const struct mtrr_ops cyrix_mtrr_ops = {
.validate_add_page = generic_validate_add_page,
.have_wrcomb = positive_have_wrcomb,
};
-
-int __init cyrix_init_mtrr(void)
-{
- set_mtrr_ops(&cyrix_mtrr_ops);
- return 0;
-}
diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.c b/arch/x86/kernel/cpu/mtrr/mtrr.c
index 8403daf34158..6432abccbf56 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.c
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.c
@@ -69,16 +69,8 @@ static DEFINE_MUTEX(mtrr_mutex);
u64 size_or_mask, size_and_mask;
-static const struct mtrr_ops *mtrr_ops[X86_VENDOR_NUM] __ro_after_init;
-
const struct mtrr_ops *mtrr_if;
-void __init set_mtrr_ops(const struct mtrr_ops *ops)
-{
- if (ops->vendor && ops->vendor < X86_VENDOR_NUM)
- mtrr_ops[ops->vendor] = ops;
-}
-
/* Returns non-zero if we have the write-combining memory type */
static int have_wrcomb(void)
{
@@ -582,20 +574,6 @@ int arch_phys_wc_index(int handle)
}
EXPORT_SYMBOL_GPL(arch_phys_wc_index);
-/*
- * HACK ALERT!
- * These should be called implicitly, but we can't yet until all the initcall
- * stuff is done...
- */
-static void __init init_ifs(void)
-{
-#ifndef CONFIG_X86_64
- amd_init_mtrr();
- cyrix_init_mtrr();
- centaur_init_mtrr();
-#endif
-}
-
/* The suspend/resume methods are only for CPU without MTRR. CPU using generic
* MTRR driver doesn't require this
*/
@@ -653,8 +631,6 @@ void __init mtrr_bp_init(void)
{
u32 phys_addr;
- init_ifs();
-
phys_addr = 32;
if (boot_cpu_has(X86_FEATURE_MTRR)) {
@@ -695,21 +671,21 @@ void __init mtrr_bp_init(void)
case X86_VENDOR_AMD:
if (cpu_feature_enabled(X86_FEATURE_K6_MTRR)) {
/* Pre-Athlon (K6) AMD CPU MTRRs */
- mtrr_if = mtrr_ops[X86_VENDOR_AMD];
+ mtrr_if = &amd_mtrr_ops;
size_or_mask = SIZE_OR_MASK_BITS(32);
size_and_mask = 0;
}
break;
case X86_VENDOR_CENTAUR:
if (cpu_feature_enabled(X86_FEATURE_CENTAUR_MCR)) {
- mtrr_if = mtrr_ops[X86_VENDOR_CENTAUR];
+ mtrr_if = ¢aur_mtrr_ops;
size_or_mask = SIZE_OR_MASK_BITS(32);
size_and_mask = 0;
}
break;
case X86_VENDOR_CYRIX:
if (cpu_feature_enabled(X86_FEATURE_CYRIX_ARR)) {
- mtrr_if = mtrr_ops[X86_VENDOR_CYRIX];
+ mtrr_if = &cyrix_mtrr_ops;
size_or_mask = SIZE_OR_MASK_BITS(32);
size_and_mask = 0;
}
diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.h b/arch/x86/kernel/cpu/mtrr/mtrr.h
index c98928ceee6a..02eb5871492d 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.h
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.h
@@ -51,8 +51,6 @@ void fill_mtrr_var_range(unsigned int index,
u32 base_lo, u32 base_hi, u32 mask_lo, u32 mask_hi);
bool get_mtrr_state(void);
-extern void __init set_mtrr_ops(const struct mtrr_ops *ops);
-
extern u64 size_or_mask, size_and_mask;
extern const struct mtrr_ops *mtrr_if;
@@ -66,10 +64,10 @@ void mtrr_state_warn(void);
const char *mtrr_attrib_to_str(int x);
void mtrr_wrmsr(unsigned, unsigned, unsigned);
-/* CPU specific mtrr init functions */
-int amd_init_mtrr(void);
-int cyrix_init_mtrr(void);
-int centaur_init_mtrr(void);
+/* CPU specific mtrr_ops vectors. */
+extern const struct mtrr_ops amd_mtrr_ops;
+extern const struct mtrr_ops cyrix_mtrr_ops;
+extern const struct mtrr_ops centaur_mtrr_ops;
extern int changed_by_mtrr_cleanup;
extern int mtrr_cleanup(unsigned address_bits);
--
2.35.3
Split the MTRR specific actions from cache_disable() and cache_enable()
into the new functions mtrr_disable() and mtrr_enable().
Signed-off-by: Juergen Gross <[email protected]>
---
V4:
- carved out from other patch (Borislav Petkov)
V5:
- use cpu_feature_enabled() (Borislav Petkov)
---
arch/x86/include/asm/mtrr.h | 4 ++++
arch/x86/kernel/cpu/mtrr/generic.c | 26 +++++++++++++++++++-------
2 files changed, 23 insertions(+), 7 deletions(-)
diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
index 76d726074c16..12a16caed395 100644
--- a/arch/x86/include/asm/mtrr.h
+++ b/arch/x86/include/asm/mtrr.h
@@ -48,6 +48,8 @@ extern void mtrr_aps_init(void);
extern void mtrr_bp_restore(void);
extern int mtrr_trim_uncached_memory(unsigned long end_pfn);
extern int amd_special_default_mtrr(void);
+void mtrr_disable(void);
+void mtrr_enable(void);
# else
static inline u8 mtrr_type_lookup(u64 addr, u64 end, u8 *uniform)
{
@@ -87,6 +89,8 @@ static inline void mtrr_centaur_report_mcr(int mcr, u32 lo, u32 hi)
#define set_mtrr_aps_delayed_init() do {} while (0)
#define mtrr_aps_init() do {} while (0)
#define mtrr_bp_restore() do {} while (0)
+#define mtrr_disable() do {} while (0)
+#define mtrr_enable() do {} while (0)
# endif
#ifdef CONFIG_COMPAT
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index aebdc90a2489..4edf0827f7ee 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -716,6 +716,21 @@ static unsigned long set_mtrr_state(void)
return change_mask;
}
+void mtrr_disable(void)
+{
+ /* Save MTRR state */
+ rdmsr(MSR_MTRRdefType, deftype_lo, deftype_hi);
+
+ /* Disable MTRRs, and set the default type to uncached */
+ mtrr_wrmsr(MSR_MTRRdefType, deftype_lo & ~0xcff, deftype_hi);
+}
+
+void mtrr_enable(void)
+{
+ /* Intel (P6) standard MTRRs */
+ mtrr_wrmsr(MSR_MTRRdefType, deftype_lo, deftype_hi);
+}
+
/*
* Disable and enable caches. Needed for changing MTRRs and the PAT MSR.
*
@@ -764,11 +779,8 @@ void cache_disable(void) __acquires(cache_disable_lock)
count_vm_tlb_event(NR_TLB_LOCAL_FLUSH_ALL);
flush_tlb_local();
- /* Save MTRR state */
- rdmsr(MSR_MTRRdefType, deftype_lo, deftype_hi);
-
- /* Disable MTRRs, and set the default type to uncached */
- mtrr_wrmsr(MSR_MTRRdefType, deftype_lo & ~0xcff, deftype_hi);
+ if (cpu_feature_enabled(X86_FEATURE_MTRR))
+ mtrr_disable();
/* Again, only flush caches if we have to. */
if (!static_cpu_has(X86_FEATURE_SELFSNOOP))
@@ -781,8 +793,8 @@ void cache_enable(void) __releases(cache_disable_lock)
count_vm_tlb_event(NR_TLB_LOCAL_FLUSH_ALL);
flush_tlb_local();
- /* Intel (P6) standard MTRRs */
- mtrr_wrmsr(MSR_MTRRdefType, deftype_lo, deftype_hi);
+ if (cpu_feature_enabled(X86_FEATURE_MTRR))
+ mtrr_enable();
/* Enable caches */
write_cr0(read_cr0() & ~X86_CR0_CD);
--
2.35.3
Instead of serializing MTRR/PAT setup on the secondary CPUs in order
to avoid clobbering of static variables used by the setup process, put
those variables into a structure held on the stack and drop the
serialization.
This speeds up the start of secondary CPUs a little bit (on a small
system with 8 CPUs the time needed for starting the secondary CPUs was
measured to go down from about 60 milliseconds without this patch to
about 55 milliseconds with this patch applied).
Signed-off-by: Juergen Gross <[email protected]>
---
V4:
- new patch
---
arch/x86/include/asm/cacheinfo.h | 10 ++++++--
arch/x86/include/asm/mtrr.h | 13 +++++-----
arch/x86/kernel/cpu/cacheinfo.c | 28 ++++++++-------------
arch/x86/kernel/cpu/mtrr/generic.c | 40 ++++++++++++++----------------
4 files changed, 45 insertions(+), 46 deletions(-)
diff --git a/arch/x86/include/asm/cacheinfo.h b/arch/x86/include/asm/cacheinfo.h
index ce9685fc78d8..f66578e1e4e1 100644
--- a/arch/x86/include/asm/cacheinfo.h
+++ b/arch/x86/include/asm/cacheinfo.h
@@ -10,8 +10,14 @@ extern unsigned int memory_caching_control;
void cacheinfo_amd_init_llc_id(struct cpuinfo_x86 *c, int cpu);
void cacheinfo_hygon_init_llc_id(struct cpuinfo_x86 *c, int cpu);
-void cache_disable(void);
-void cache_enable(void);
+struct cache_state {
+ unsigned long cr4;
+ u32 mtrr_deftype_lo;
+ u32 mtrr_deftype_hi;
+};
+
+void cache_disable(struct cache_state *state);
+void cache_enable(struct cache_state *state);
void set_cache_aps_delayed_init(bool val);
bool get_cache_aps_delayed_init(void);
void cache_bp_init(void);
diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
index f0eeaf6e5f5f..2ea4a9de7318 100644
--- a/arch/x86/include/asm/mtrr.h
+++ b/arch/x86/include/asm/mtrr.h
@@ -23,6 +23,7 @@
#ifndef _ASM_X86_MTRR_H
#define _ASM_X86_MTRR_H
+#include <asm/cacheinfo.h>
#include <uapi/asm/mtrr.h>
/*
@@ -44,9 +45,9 @@ extern void mtrr_centaur_report_mcr(int mcr, u32 lo, u32 hi);
extern void mtrr_bp_restore(void);
extern int mtrr_trim_uncached_memory(unsigned long end_pfn);
extern int amd_special_default_mtrr(void);
-void mtrr_disable(void);
-void mtrr_enable(void);
-void mtrr_generic_set_state(void);
+void mtrr_disable(struct cache_state *state);
+void mtrr_enable(struct cache_state *state);
+void mtrr_generic_set_state(struct cache_state *state);
# else
static inline u8 mtrr_type_lookup(u64 addr, u64 end, u8 *uniform)
{
@@ -84,9 +85,9 @@ static inline void mtrr_centaur_report_mcr(int mcr, u32 lo, u32 hi)
}
#define mtrr_bp_init() do {} while (0)
#define mtrr_bp_restore() do {} while (0)
-#define mtrr_disable() do {} while (0)
-#define mtrr_enable() do {} while (0)
-#define mtrr_generic_set_state() do {} while (0)
+#define mtrr_disable(s) do {} while (0)
+#define mtrr_enable(s) do {} while (0)
+#define mtrr_generic_set_state(s) do {} while (0)
# endif
#ifdef CONFIG_COMPAT
diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index 231cf1ff0641..7e370c979417 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -1057,10 +1057,7 @@ int populate_cache_leaves(unsigned int cpu)
* The caller must ensure that local interrupts are disabled and
* are reenabled after cache_enable() has been called.
*/
-static unsigned long saved_cr4;
-static DEFINE_RAW_SPINLOCK(cache_disable_lock);
-
-void cache_disable(void) __acquires(cache_disable_lock)
+void cache_disable(struct cache_state *state)
{
unsigned long cr0;
@@ -1071,8 +1068,6 @@ void cache_disable(void) __acquires(cache_disable_lock)
* changes to the way the kernel boots
*/
- raw_spin_lock(&cache_disable_lock);
-
/* Enter the no-fill (CD=1, NW=0) cache mode and flush caches. */
cr0 = read_cr0() | X86_CR0_CD;
write_cr0(cr0);
@@ -1088,8 +1083,8 @@ void cache_disable(void) __acquires(cache_disable_lock)
/* Save value of CR4 and clear Page Global Enable (bit 7) */
if (cpu_feature_enabled(X86_FEATURE_PGE)) {
- saved_cr4 = __read_cr4();
- __write_cr4(saved_cr4 & ~X86_CR4_PGE);
+ state->cr4 = __read_cr4();
+ __write_cr4(state->cr4 & ~X86_CR4_PGE);
}
/* Flush all TLBs via a mov %cr3, %reg; mov %reg, %cr3 */
@@ -1097,46 +1092,45 @@ void cache_disable(void) __acquires(cache_disable_lock)
flush_tlb_local();
if (cpu_feature_enabled(X86_FEATURE_MTRR))
- mtrr_disable();
+ mtrr_disable(state);
/* Again, only flush caches if we have to. */
if (!static_cpu_has(X86_FEATURE_SELFSNOOP))
wbinvd();
}
-void cache_enable(void) __releases(cache_disable_lock)
+void cache_enable(struct cache_state *state)
{
/* Flush TLBs (no need to flush caches - they are disabled) */
count_vm_tlb_event(NR_TLB_LOCAL_FLUSH_ALL);
flush_tlb_local();
if (cpu_feature_enabled(X86_FEATURE_MTRR))
- mtrr_enable();
+ mtrr_enable(state);
/* Enable caches */
write_cr0(read_cr0() & ~X86_CR0_CD);
/* Restore value of CR4 */
if (cpu_feature_enabled(X86_FEATURE_PGE))
- __write_cr4(saved_cr4);
-
- raw_spin_unlock(&cache_disable_lock);
+ __write_cr4(state->cr4);
}
static void cache_cpu_init(void)
{
unsigned long flags;
+ struct cache_state state;
local_irq_save(flags);
- cache_disable();
+ cache_disable(&state);
if (memory_caching_control & CACHE_MTRR)
- mtrr_generic_set_state();
+ mtrr_generic_set_state(&state);
if (memory_caching_control & CACHE_PAT)
pat_cpu_init();
- cache_enable();
+ cache_enable(&state);
local_irq_restore(flags);
}
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index 2f2485d6657f..cddb440f330d 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -663,18 +663,13 @@ static bool set_mtrr_var_ranges(unsigned int index, struct mtrr_var_range *vr)
return changed;
}
-static u32 deftype_lo, deftype_hi;
-
/**
* set_mtrr_state - Set the MTRR state for this CPU.
*
- * NOTE: The CPU must already be in a safe state for MTRR changes, including
- * measures that only a single CPU can be active in set_mtrr_state() in
- * order to not be subject to races for usage of deftype_lo (this is
- * accomplished by taking cache_disable_lock).
+ * NOTE: The CPU must already be in a safe state for MTRR changes.
* RETURNS: 0 if no changes made, else a mask indicating what was changed.
*/
-static unsigned long set_mtrr_state(void)
+static unsigned long set_mtrr_state(struct cache_state *state)
{
unsigned long change_mask = 0;
unsigned int i;
@@ -691,38 +686,40 @@ static unsigned long set_mtrr_state(void)
* Set_mtrr_restore restores the old value of MTRRdefType,
* so to set it we fiddle with the saved value:
*/
- if ((deftype_lo & 0xff) != mtrr_state.def_type
- || ((deftype_lo & 0xc00) >> 10) != mtrr_state.enabled) {
-
- deftype_lo = (deftype_lo & ~0xcff) | mtrr_state.def_type |
- (mtrr_state.enabled << 10);
+ if ((state->mtrr_deftype_lo & 0xff) != mtrr_state.def_type
+ || ((state->mtrr_deftype_lo & 0xc00) >> 10) != mtrr_state.enabled) {
+ state->mtrr_deftype_lo = (state->mtrr_deftype_lo & ~0xcff) |
+ mtrr_state.def_type |
+ (mtrr_state.enabled << 10);
change_mask |= MTRR_CHANGE_MASK_DEFTYPE;
}
return change_mask;
}
-void mtrr_disable(void)
+void mtrr_disable(struct cache_state *state)
{
/* Save MTRR state */
- rdmsr(MSR_MTRRdefType, deftype_lo, deftype_hi);
+ rdmsr(MSR_MTRRdefType, state->mtrr_deftype_lo, state->mtrr_deftype_hi);
/* Disable MTRRs, and set the default type to uncached */
- mtrr_wrmsr(MSR_MTRRdefType, deftype_lo & ~0xcff, deftype_hi);
+ mtrr_wrmsr(MSR_MTRRdefType, state->mtrr_deftype_lo & ~0xcff,
+ state->mtrr_deftype_hi);
}
-void mtrr_enable(void)
+void mtrr_enable(struct cache_state *state)
{
/* Intel (P6) standard MTRRs */
- mtrr_wrmsr(MSR_MTRRdefType, deftype_lo, deftype_hi);
+ mtrr_wrmsr(MSR_MTRRdefType, state->mtrr_deftype_lo,
+ state->mtrr_deftype_hi);
}
-void mtrr_generic_set_state(void)
+void mtrr_generic_set_state(struct cache_state *state)
{
unsigned long mask, count;
/* Actually set the state */
- mask = set_mtrr_state();
+ mask = set_mtrr_state(state);
/* Use the atomic bitops to update the global mask */
for (count = 0; count < sizeof(mask) * 8; ++count) {
@@ -747,11 +744,12 @@ static void generic_set_mtrr(unsigned int reg, unsigned long base,
{
unsigned long flags;
struct mtrr_var_range *vr;
+ struct cache_state state;
vr = &mtrr_state.var_ranges[reg];
local_irq_save(flags);
- cache_disable();
+ cache_disable(&state);
if (size == 0) {
/*
@@ -770,7 +768,7 @@ static void generic_set_mtrr(unsigned int reg, unsigned long base,
mtrr_wrmsr(MTRRphysMask_MSR(reg), vr->mask_lo, vr->mask_hi);
}
- cache_enable();
+ cache_enable(&state);
local_irq_restore(flags);
}
--
2.35.3
In MTRR code use_intel() is only used in one source file, and the
relevant use_intel_if member of struct mtrr_ops is set only in
generic_mtrr_ops.
Replace use_intel() with a single flag in cacheinfo.c, which can be set
when assigning generic_mtrr_ops to mtrr_if. This allows to drop
use_intel_if from mtrr_ops, while preparing to support PAT without
MTRR. As another preparation for the PAT/MTRR decoupling use a bit for
MTRR control and one for PAT control. For now set both bits together,
this can be changed later.
As the new flag will be set only if mtrr_enabled is set, the test for
mtrr_enabled can be dropped at some places.
Signed-off-by: Juergen Gross <[email protected]>
---
V2:
- new patch
V4:
- rename cache_generic to memory_caching_control (Borislav Petkov)
- rename CACHE_GENERIC_* to CACHE_* (Borislav Petkov)
- get rid of use_generic in mtrr_bp_init() (Borislav Petkov)
V5:
- keep mtrr_enabled() (Borislav Petkov)
---
arch/x86/include/asm/cacheinfo.h | 5 +++++
arch/x86/kernel/cpu/cacheinfo.c | 3 +++
arch/x86/kernel/cpu/mtrr/generic.c | 1 -
arch/x86/kernel/cpu/mtrr/mtrr.c | 28 +++++++++++++---------------
arch/x86/kernel/cpu/mtrr/mtrr.h | 2 --
5 files changed, 21 insertions(+), 18 deletions(-)
diff --git a/arch/x86/include/asm/cacheinfo.h b/arch/x86/include/asm/cacheinfo.h
index 86b2e0dcc4bf..c3873962a7cd 100644
--- a/arch/x86/include/asm/cacheinfo.h
+++ b/arch/x86/include/asm/cacheinfo.h
@@ -2,6 +2,11 @@
#ifndef _ASM_X86_CACHEINFO_H
#define _ASM_X86_CACHEINFO_H
+/* Kernel controls MTRR and/or PAT MSRs. */
+extern unsigned int memory_caching_control;
+#define CACHE_MTRR 0x01
+#define CACHE_PAT 0x02
+
void cacheinfo_amd_init_llc_id(struct cpuinfo_x86 *c, int cpu);
void cacheinfo_hygon_init_llc_id(struct cpuinfo_x86 *c, int cpu);
diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index 66556833d7af..5228fb9a3798 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -35,6 +35,9 @@ DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_llc_shared_map);
/* Shared L2 cache maps */
DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_l2c_shared_map);
+/* Kernel controls MTRR and/or PAT MSRs. */
+unsigned int memory_caching_control __ro_after_init;
+
struct _cache_table {
unsigned char descriptor;
char cache_type;
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index cd64eab02393..81742870ecc5 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -917,7 +917,6 @@ int positive_have_wrcomb(void)
* Generic structure...
*/
const struct mtrr_ops generic_mtrr_ops = {
- .use_intel_if = 1,
.set_all = generic_set_all,
.get = generic_get_mtrr,
.get_free_region = generic_get_free_region,
diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.c b/arch/x86/kernel/cpu/mtrr/mtrr.c
index 2746cac9d8a9..4209945c4e68 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.c
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.c
@@ -46,6 +46,7 @@
#include <linux/syscore_ops.h>
#include <linux/rcupdate.h>
+#include <asm/cacheinfo.h>
#include <asm/cpufeature.h>
#include <asm/e820/api.h>
#include <asm/mtrr.h>
@@ -119,11 +120,11 @@ static int have_wrcomb(void)
}
/* This function returns the number of variable MTRRs */
-static void __init set_num_var_ranges(void)
+static void __init set_num_var_ranges(bool use_generic)
{
unsigned long config = 0, dummy;
- if (use_intel())
+ if (use_generic)
rdmsr(MSR_MTRRcap, config, dummy);
else if (is_cpu(AMD) || is_cpu(HYGON))
config = 2;
@@ -756,14 +757,16 @@ void __init mtrr_bp_init(void)
if (mtrr_if) {
__mtrr_enabled = true;
- set_num_var_ranges();
+ set_num_var_ranges(mtrr_if == &generic_mtrr_ops);
init_table();
- if (use_intel()) {
+ if (mtrr_if == &generic_mtrr_ops) {
/* BIOS may override */
__mtrr_enabled = get_mtrr_state();
- if (mtrr_enabled())
+ if (mtrr_enabled()) {
mtrr_bp_pat_init();
+ memory_caching_control |= CACHE_MTRR | CACHE_PAT;
+ }
if (mtrr_cleanup(phys_addr)) {
changed_by_mtrr_cleanup = 1;
@@ -786,10 +789,7 @@ void __init mtrr_bp_init(void)
void mtrr_ap_init(void)
{
- if (!mtrr_enabled())
- return;
-
- if (!use_intel() || mtrr_aps_delayed_init)
+ if (!memory_caching_control || mtrr_aps_delayed_init)
return;
/*
@@ -825,9 +825,7 @@ void mtrr_save_state(void)
void set_mtrr_aps_delayed_init(void)
{
- if (!mtrr_enabled())
- return;
- if (!use_intel())
+ if (!memory_caching_control)
return;
mtrr_aps_delayed_init = true;
@@ -838,7 +836,7 @@ void set_mtrr_aps_delayed_init(void)
*/
void mtrr_aps_init(void)
{
- if (!use_intel() || !mtrr_enabled())
+ if (!memory_caching_control)
return;
/*
@@ -855,7 +853,7 @@ void mtrr_aps_init(void)
void mtrr_bp_restore(void)
{
- if (!use_intel() || !mtrr_enabled())
+ if (!memory_caching_control)
return;
mtrr_if->set_all();
@@ -866,7 +864,7 @@ static int __init mtrr_init_finialize(void)
if (!mtrr_enabled())
return 0;
- if (use_intel()) {
+ if (memory_caching_control & CACHE_MTRR) {
if (!changed_by_mtrr_cleanup)
mtrr_state_warn();
return 0;
diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.h b/arch/x86/kernel/cpu/mtrr/mtrr.h
index 2ac99e561181..88b1c4b6174a 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.h
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.h
@@ -14,7 +14,6 @@ extern unsigned int mtrr_usage_table[MTRR_MAX_VAR_RANGES];
struct mtrr_ops {
u32 vendor;
- u32 use_intel_if;
void (*set)(unsigned int reg, unsigned long base,
unsigned long size, mtrr_type type);
void (*set_all)(void);
@@ -61,7 +60,6 @@ extern u64 size_or_mask, size_and_mask;
extern const struct mtrr_ops *mtrr_if;
#define is_cpu(vnd) (mtrr_if && mtrr_if->vendor == X86_VENDOR_##vnd)
-#define use_intel() (mtrr_if && mtrr_if->use_intel_if == 1)
extern unsigned int num_var_ranges;
extern u64 mtrr_tom2;
--
2.35.3
Instead of explicitly calling cache_ap_init() in
identify_secondary_cpu() use a CPU hotplug callback instead. By
registering the callback only after having started the non-boot CPUs
and initializing cache_aps_delayed_init with "true", calling
set_cache_aps_delayed_init() at boot time can be dropped.
It should be noted that this change results in cache_ap_init() being
called a little bit later when hotplugging CPUs. By using a new
hotplug slot right at the start of the low level bringup this is not
problematic, as no operations requiring a specific caching mode are
performed that early in CPU initialization.
Suggested-by: Borislav Petkov <[email protected]>
Signed-off-by: Juergen Gross <[email protected]>
---
V4:
- new patch
---
arch/x86/include/asm/cacheinfo.h | 1 -
arch/x86/kernel/cpu/cacheinfo.c | 18 +++++++++++++++---
arch/x86/kernel/cpu/common.c | 1 -
arch/x86/kernel/smpboot.c | 2 --
include/linux/cpuhotplug.h | 1 +
5 files changed, 16 insertions(+), 7 deletions(-)
diff --git a/arch/x86/include/asm/cacheinfo.h b/arch/x86/include/asm/cacheinfo.h
index a0ef46e9f453..ce9685fc78d8 100644
--- a/arch/x86/include/asm/cacheinfo.h
+++ b/arch/x86/include/asm/cacheinfo.h
@@ -16,7 +16,6 @@ void set_cache_aps_delayed_init(bool val);
bool get_cache_aps_delayed_init(void);
void cache_bp_init(void);
void cache_bp_restore(void);
-void cache_ap_init(void);
void cache_aps_init(void);
#endif /* _ASM_X86_CACHEINFO_H */
diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index 1aaf830254df..231cf1ff0641 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -11,6 +11,7 @@
#include <linux/slab.h>
#include <linux/cacheinfo.h>
#include <linux/cpu.h>
+#include <linux/cpuhotplug.h>
#include <linux/sched.h>
#include <linux/capability.h>
#include <linux/sysfs.h>
@@ -1139,7 +1140,7 @@ static void cache_cpu_init(void)
local_irq_restore(flags);
}
-static bool cache_aps_delayed_init;
+static bool cache_aps_delayed_init = true;
void set_cache_aps_delayed_init(bool val)
{
@@ -1174,10 +1175,10 @@ void cache_bp_restore(void)
cache_cpu_init();
}
-void cache_ap_init(void)
+static int cache_ap_init(unsigned int cpu)
{
if (!memory_caching_control || get_cache_aps_delayed_init())
- return;
+ return 0;
/*
* Ideally we should hold mtrr_mutex here to avoid MTRR entries
@@ -1194,6 +1195,8 @@ void cache_ap_init(void)
*/
stop_machine_from_inactive_cpu(cache_rendezvous_handler, NULL,
cpu_callout_mask);
+
+ return 0;
}
/*
@@ -1207,3 +1210,12 @@ void cache_aps_init(void)
stop_machine(cache_rendezvous_handler, NULL, cpu_online_mask);
set_cache_aps_delayed_init(false);
}
+
+static int __init cache_ap_register(void)
+{
+ cpuhp_setup_state_nocalls(CPUHP_AP_CACHECTRL_STARTING,
+ "x86/cachectrl:starting",
+ cache_ap_init, NULL);
+ return 0;
+}
+core_initcall(cache_ap_register);
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index fd058b547f8d..bf4ac1cb93d7 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1949,7 +1949,6 @@ void identify_secondary_cpu(struct cpuinfo_x86 *c)
#ifdef CONFIG_X86_32
enable_sep_cpu();
#endif
- cache_ap_init();
validate_apic_and_package_id(c);
x86_spec_ctrl_setup_ap();
update_srbds_msr();
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 1b61a480c966..82b311c718bc 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1429,8 +1429,6 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
uv_system_init();
- set_cache_aps_delayed_init(true);
-
smp_quirk_init_udelay();
speculative_store_bypass_ht_init();
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index f61447913db9..0d277b4b025a 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -140,6 +140,7 @@ enum cpuhp_state {
*/
CPUHP_AP_IDLE_DEAD,
CPUHP_AP_OFFLINE,
+ CPUHP_AP_CACHECTRL_STARTING,
CPUHP_AP_SCHED_STARTING,
CPUHP_AP_RCUTREE_DYING,
CPUHP_AP_CPU_PM_STARTING,
--
2.35.3
Add a comment how set_mtrr_state() is needing serialization.
Note that this patch has already been applied to tip.git.
Suggested-by: Borislav Petkov <[email protected]>
Signed-off-by: Juergen Gross <[email protected]>
---
V3:
- new patch instead of old patch 1
---
arch/x86/kernel/cpu/mtrr/generic.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index 558108296f3c..cd64eab02393 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -684,7 +684,10 @@ static u32 deftype_lo, deftype_hi;
/**
* set_mtrr_state - Set the MTRR state for this CPU.
*
- * NOTE: The CPU must already be in a safe state for MTRR changes.
+ * NOTE: The CPU must already be in a safe state for MTRR changes, including
+ * measures that only a single CPU can be active in set_mtrr_state() in
+ * order to not be subject to races for usage of deftype_lo (this is
+ * accomplished by taking set_atomicity_lock).
* RETURNS: 0 if no changes made, else a mask indicating what was changed.
*/
static unsigned long set_mtrr_state(void)
--
2.35.3
On Wed, Nov 02, 2022 at 08:46:57AM +0100, Juergen Gross wrote:
> Today PAT can't be used without MTRR being available, unless MTRR is at
> least configured via CONFIG_MTRR and the system is running as Xen PV
> guest. In this case PAT is automatically available via the hypervisor,
> but the PAT MSR can't be modified by the kernel and MTRR is disabled.
>
> The same applies to a kernel built with no MTRR support: it won't
> allow to use the PAT MSR, even if there is no technical reason for
> that, other than setting up PAT on all CPUs the same way (which is a
> requirement of the processor's cache management) is relying on some
> MTRR specific code.
>
> Fix all of that by:
One of the AMD test boxes here says with this:
...
[ 0.863466] PCI: not using MMCONFIG
[ 0.863475] PCI: Using configuration type 1 for base access
[ 0.863478] PCI: Using configuration type 1 for extended access
[ 0.866733] mtrr: your CPUs had inconsistent MTRRdefType settings
[ 0.866737] mtrr: probably your BIOS does not setup all CPUs.
[ 0.866740] mtrr: corrected configuration.
[ 0.869350] kprobes: kprobe jump-optimization is enabled. All kprobes are optimized if possible.
...
Previous logs don't have it:
PCI: not using MMCONFIG
PCI: Using configuration type 1 for base access
PCI: Using configuration type 1 for extended access
kprobes: kprobe jump-optimization is enabled. All kprobes are optimized if possible.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On 02.11.22 19:04, Borislav Petkov wrote:
> On Wed, Nov 02, 2022 at 08:46:57AM +0100, Juergen Gross wrote:
>> Today PAT can't be used without MTRR being available, unless MTRR is at
>> least configured via CONFIG_MTRR and the system is running as Xen PV
>> guest. In this case PAT is automatically available via the hypervisor,
>> but the PAT MSR can't be modified by the kernel and MTRR is disabled.
>>
>> The same applies to a kernel built with no MTRR support: it won't
>> allow to use the PAT MSR, even if there is no technical reason for
>> that, other than setting up PAT on all CPUs the same way (which is a
>> requirement of the processor's cache management) is relying on some
>> MTRR specific code.
>>
>> Fix all of that by:
>
> One of the AMD test boxes here says with this:
>
> ...
> [ 0.863466] PCI: not using MMCONFIG
> [ 0.863475] PCI: Using configuration type 1 for base access
> [ 0.863478] PCI: Using configuration type 1 for extended access
> [ 0.866733] mtrr: your CPUs had inconsistent MTRRdefType settings
> [ 0.866737] mtrr: probably your BIOS does not setup all CPUs.
> [ 0.866740] mtrr: corrected configuration.
> [ 0.869350] kprobes: kprobe jump-optimization is enabled. All kprobes are optimized if possible.
> ...
>
> Previous logs don't have it:
>
> PCI: not using MMCONFIG
> PCI: Using configuration type 1 for base access
> PCI: Using configuration type 1 for extended access
> kprobes: kprobe jump-optimization is enabled. All kprobes are optimized if possible.
>
Weird. I can't spot any modification which could have caused that.
Would it be possible to identify the patch causing that?
Juergen
On Thu, Nov 03, 2022 at 09:40:32AM +0100, Juergen Gross wrote:
> Would it be possible to identify the patch causing that?
Lemme try to find a smaller box which shows that too - that one is a
pain to bisect on.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Thu, Nov 03, 2022 at 05:15:52PM +0100, Borislav Petkov wrote:
> Lemme try to find a smaller box which shows that too - that one is a
> pain to bisect on.
Ok, couldn't find a smaller one (or maybe it had to be a big one to
tickle this out).
So I think it is the parallel setup thing:
x86/mtrr: Do MTRR/PAT setup on all secondary CPUs in parallel
Note that before it, it would do the configuration sequentially on each
CPU:
[ 0.759239] MTRR: prepare_set: CPU83, MSR_MTRRdefType: 0x0, read: (0xc00:0)
[ 0.759239] MTRR: set_mtrr_state: CPU83, mtrr_deftype_lo: 0xc00, mtrr_state.def_type: 0, mtrr_state.enabled: 3
[ 0.760794] MTRR: post_set: CPU83, MSR_MTRRdefType will write: (0xc00:0)
[ 0.761151] MTRR: prepare_set: CPU70, MSR_MTRRdefType: 0x0, read: (0xc00:0)
[ 0.761151] MTRR: set_mtrr_state: CPU70, mtrr_deftype_lo: 0xc00, mtrr_state.def_type: 0, mtrr_state.enabled: 3
[ 0.761151] MTRR: post_set: CPU70, MSR_MTRRdefType will write: (0xc00:0)
...
and so on.
Now, it would do it all in parallel:
[ 0.762006] MTRR: mtrr_disable: CPU70, MSR_MTRRdefType: 0x0, read: (0xc00:0)
[ 0.761916] MTRR: mtrr_disable: CPU18, MSR_MTRRdefType: 0x0, read: (0xc00:0)
[ 0.761808] MTRR: mtrr_disable: CPU82, MSR_MTRRdefType: 0x0, read: (0xc00:0)
[ 0.762593] MTRR: mtrr_disable: CPU6, MSR_MTRRdefType: 0x0, read: (0x0:0)
^^^^^^
Note that last thing. That comes from (with debug output added):
void mtrr_disable(struct cache_state *state)
{
unsigned int cpu = smp_processor_id();
u64 msrval;
/* Save MTRR state */
rdmsr(MSR_MTRRdefType, state->mtrr_deftype_lo, state->mtrr_deftype_hi);
/* Disable MTRRs, and set the default type to uncached */
mtrr_wrmsr(MSR_MTRRdefType, state->mtrr_deftype_lo & ~0xcff,
state->mtrr_deftype_hi);
rdmsrl(MSR_MTRRdefType, msrval);
pr_info("%s: CPU%d, MSR_MTRRdefType: 0x%llx, read: (0x%x:%x)\n",
__func__, cpu, msrval, state->mtrr_deftype_lo, state->mtrr_deftype_hi);
}
The "read: (0x0:0)" basically says that
state->mtrr_deftype_lo, state->mtrr_deftype_hi
are both 0 already. BUT(!), they should NOT be. The low piece is 0xc00 on most
cores except a handful and it means that MTRRs and Fixed Range are
enabled. In total, they're these cores here:
[ 0.762593] MTRR: mtrr_disable: CPU6, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762247] MTRR: mtrr_disable: CPU26, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762685] MTRR: mtrr_disable: CPU68, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762725] MTRR: mtrr_disable: CPU17, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762685] MTRR: mtrr_disable: CPU69, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762800] MTRR: mtrr_disable: CPU1, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762734] MTRR: mtrr_disable: CPU13, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762720] MTRR: mtrr_disable: CPU24, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762696] MTRR: mtrr_disable: CPU66, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762716] MTRR: mtrr_disable: CPU48, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762693] MTRR: mtrr_disable: CPU57, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762519] MTRR: mtrr_disable: CPU87, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762532] MTRR: mtrr_disable: CPU58, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762755] MTRR: mtrr_disable: CPU32, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762693] MTRR: mtrr_disable: CPU52, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762861] MTRR: mtrr_disable: CPU0, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762724] MTRR: mtrr_disable: CPU21, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762640] MTRR: mtrr_disable: CPU15, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762615] MTRR: mtrr_disable: CPU50, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762741] MTRR: mtrr_disable: CPU40, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762738] MTRR: mtrr_disable: CPU37, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762716] MTRR: mtrr_disable: CPU25, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762512] MTRR: mtrr_disable: CPU59, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762721] MTRR: mtrr_disable: CPU45, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762682] MTRR: mtrr_disable: CPU56, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762583] MTRR: mtrr_disable: CPU124, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762751] MTRR: mtrr_disable: CPU12, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762741] MTRR: mtrr_disable: CPU9, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762575] MTRR: mtrr_disable: CPU51, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762632] MTRR: mtrr_disable: CPU100, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762688] MTRR: mtrr_disable: CPU61, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762610] MTRR: mtrr_disable: CPU105, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762721] MTRR: mtrr_disable: CPU20, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762583] MTRR: mtrr_disable: CPU47, MSR_MTRRdefType: 0x0, read: (0x0:0)
Now, if I add MFENCEs around those RDMSRs:
void mtrr_disable(struct cache_state *state)
{
unsigned int cpu = smp_processor_id();
u64 msrval;
/* Save MTRR state */
rdmsr(MSR_MTRRdefType, state->mtrr_deftype_lo, state->mtrr_deftype_hi);
__mb();
/* Disable MTRRs, and set the default type to uncached */
mtrr_wrmsr(MSR_MTRRdefType, state->mtrr_deftype_lo & ~0xcff,
state->mtrr_deftype_hi);
__mb();
rdmsrl(MSR_MTRRdefType, msrval);
pr_info("%s: CPU%d, MSR_MTRRdefType: 0x%llx, read: (0x%x:%x)\n",
__func__, cpu, msrval, state->mtrr_deftype_lo, state->mtrr_deftype_hi);
__mb();
}
the amount of cores becomes less:
[ 0.765260] MTRR: mtrr_disable: CPU6, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.765462] MTRR: mtrr_disable: CPU5, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.765242] MTRR: mtrr_disable: CPU22, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.765522] MTRR: mtrr_disable: CPU0, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.765474] MTRR: mtrr_disable: CPU1, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.765207] MTRR: mtrr_disable: CPU54, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.765225] MTRR: mtrr_disable: CPU8, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.765282] MTRR: mtrr_disable: CPU88, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.765150] MTRR: mtrr_disable: CPU119, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.765370] MTRR: mtrr_disable: CPU49, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.765395] MTRR: mtrr_disable: CPU16, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.765348] MTRR: mtrr_disable: CPU52, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.765270] MTRR: mtrr_disable: CPU58, MSR_MTRRdefType: 0x0, read: (0x0:0)
which basically hints at some speculative fun where we end up reading
the MSR *after* the write to it has already happened. After this thing:
/* Disable MTRRs, and set the default type to uncached */
mtrr_wrmsr(MSR_MTRRdefType, state->mtrr_deftype_lo & ~0xcff,
state->mtrr_deftype_hi);
and thus when we read it, we already read the disabled state. But this
is only a conjecture because I still have no clear idea how TF would
that even happen?!?
Needless to say, this fixes it, ofc:
diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index 3805a6d32d37..4a685898caf3 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -1116,12 +1116,14 @@ void cache_enable(struct cache_state *state)
__write_cr4(state->cr4);
}
+static DEFINE_RAW_SPINLOCK(set_atomicity_lock);
+
static void cache_cpu_init(void)
{
unsigned long flags;
struct cache_state state = { };
- local_irq_save(flags);
+ raw_spin_lock_irqsave(&set_atomicity_lock, flags);
cache_disable(&state);
if (memory_caching_control & CACHE_MTRR)
@@ -1131,7 +1133,7 @@ static void cache_cpu_init(void)
pat_cpu_init();
cache_enable(&state);
- local_irq_restore(flags);
+ raw_spin_unlock_irqrestore(&set_atomicity_lock, flags);
}
static bool cache_aps_delayed_init = true;
---
and frankly, considering how we have bigger fish to fry, I'd say we do
it the old way and leave that can'o'worms half-opened.
Unless you wanna continue poking at it. I can give you access to that
box at work...
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On 07.11.22 20:25, Borislav Petkov wrote:
> On Thu, Nov 03, 2022 at 05:15:52PM +0100, Borislav Petkov wrote:
>> Lemme try to find a smaller box which shows that too - that one is a
>> pain to bisect on.
>
> Ok, couldn't find a smaller one (or maybe it had to be a big one to
> tickle this out).
>
> So I think it is the parallel setup thing:
>
> x86/mtrr: Do MTRR/PAT setup on all secondary CPUs in parallel
>
> Note that before it, it would do the configuration sequentially on each
> CPU:
>
> [ 0.759239] MTRR: prepare_set: CPU83, MSR_MTRRdefType: 0x0, read: (0xc00:0)
> [ 0.759239] MTRR: set_mtrr_state: CPU83, mtrr_deftype_lo: 0xc00, mtrr_state.def_type: 0, mtrr_state.enabled: 3
> [ 0.760794] MTRR: post_set: CPU83, MSR_MTRRdefType will write: (0xc00:0)
> [ 0.761151] MTRR: prepare_set: CPU70, MSR_MTRRdefType: 0x0, read: (0xc00:0)
> [ 0.761151] MTRR: set_mtrr_state: CPU70, mtrr_deftype_lo: 0xc00, mtrr_state.def_type: 0, mtrr_state.enabled: 3
> [ 0.761151] MTRR: post_set: CPU70, MSR_MTRRdefType will write: (0xc00:0)
> ...
>
> and so on.
>
> Now, it would do it all in parallel:
>
> [ 0.762006] MTRR: mtrr_disable: CPU70, MSR_MTRRdefType: 0x0, read: (0xc00:0)
> [ 0.761916] MTRR: mtrr_disable: CPU18, MSR_MTRRdefType: 0x0, read: (0xc00:0)
> [ 0.761808] MTRR: mtrr_disable: CPU82, MSR_MTRRdefType: 0x0, read: (0xc00:0)
> [ 0.762593] MTRR: mtrr_disable: CPU6, MSR_MTRRdefType: 0x0, read: (0x0:0)
> ^^^^^^
>
> Note that last thing. That comes from (with debug output added):
>
> void mtrr_disable(struct cache_state *state)
> {
> unsigned int cpu = smp_processor_id();
> u64 msrval;
>
> /* Save MTRR state */
> rdmsr(MSR_MTRRdefType, state->mtrr_deftype_lo, state->mtrr_deftype_hi);
>
> /* Disable MTRRs, and set the default type to uncached */
> mtrr_wrmsr(MSR_MTRRdefType, state->mtrr_deftype_lo & ~0xcff,
> state->mtrr_deftype_hi);
>
> rdmsrl(MSR_MTRRdefType, msrval);
>
> pr_info("%s: CPU%d, MSR_MTRRdefType: 0x%llx, read: (0x%x:%x)\n",
> __func__, cpu, msrval, state->mtrr_deftype_lo, state->mtrr_deftype_hi);
> }
>
> The "read: (0x0:0)" basically says that
>
> state->mtrr_deftype_lo, state->mtrr_deftype_hi
>
> are both 0 already. BUT(!), they should NOT be. The low piece is 0xc00 on most
> cores except a handful and it means that MTRRs and Fixed Range are
> enabled. In total, they're these cores here:
>
> [ 0.762593] MTRR: mtrr_disable: CPU6, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762247] MTRR: mtrr_disable: CPU26, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762685] MTRR: mtrr_disable: CPU68, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762725] MTRR: mtrr_disable: CPU17, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762685] MTRR: mtrr_disable: CPU69, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762800] MTRR: mtrr_disable: CPU1, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762734] MTRR: mtrr_disable: CPU13, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762720] MTRR: mtrr_disable: CPU24, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762696] MTRR: mtrr_disable: CPU66, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762716] MTRR: mtrr_disable: CPU48, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762693] MTRR: mtrr_disable: CPU57, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762519] MTRR: mtrr_disable: CPU87, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762532] MTRR: mtrr_disable: CPU58, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762755] MTRR: mtrr_disable: CPU32, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762693] MTRR: mtrr_disable: CPU52, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762861] MTRR: mtrr_disable: CPU0, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762724] MTRR: mtrr_disable: CPU21, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762640] MTRR: mtrr_disable: CPU15, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762615] MTRR: mtrr_disable: CPU50, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762741] MTRR: mtrr_disable: CPU40, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762738] MTRR: mtrr_disable: CPU37, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762716] MTRR: mtrr_disable: CPU25, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762512] MTRR: mtrr_disable: CPU59, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762721] MTRR: mtrr_disable: CPU45, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762682] MTRR: mtrr_disable: CPU56, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762583] MTRR: mtrr_disable: CPU124, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762751] MTRR: mtrr_disable: CPU12, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762741] MTRR: mtrr_disable: CPU9, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762575] MTRR: mtrr_disable: CPU51, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762632] MTRR: mtrr_disable: CPU100, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762688] MTRR: mtrr_disable: CPU61, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762610] MTRR: mtrr_disable: CPU105, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762721] MTRR: mtrr_disable: CPU20, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762583] MTRR: mtrr_disable: CPU47, MSR_MTRRdefType: 0x0, read: (0x0:0)
>
> Now, if I add MFENCEs around those RDMSRs:
>
> void mtrr_disable(struct cache_state *state)
> {
> unsigned int cpu = smp_processor_id();
> u64 msrval;
>
> /* Save MTRR state */
> rdmsr(MSR_MTRRdefType, state->mtrr_deftype_lo, state->mtrr_deftype_hi);
>
> __mb();
>
> /* Disable MTRRs, and set the default type to uncached */
> mtrr_wrmsr(MSR_MTRRdefType, state->mtrr_deftype_lo & ~0xcff,
> state->mtrr_deftype_hi);
>
> __mb();
>
> rdmsrl(MSR_MTRRdefType, msrval);
>
> pr_info("%s: CPU%d, MSR_MTRRdefType: 0x%llx, read: (0x%x:%x)\n",
> __func__, cpu, msrval, state->mtrr_deftype_lo, state->mtrr_deftype_hi);
>
> __mb();
> }
>
> the amount of cores becomes less:
Probably not because of the fencing, but because of the different timing.
>
> [ 0.765260] MTRR: mtrr_disable: CPU6, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.765462] MTRR: mtrr_disable: CPU5, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.765242] MTRR: mtrr_disable: CPU22, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.765522] MTRR: mtrr_disable: CPU0, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.765474] MTRR: mtrr_disable: CPU1, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.765207] MTRR: mtrr_disable: CPU54, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.765225] MTRR: mtrr_disable: CPU8, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.765282] MTRR: mtrr_disable: CPU88, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.765150] MTRR: mtrr_disable: CPU119, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.765370] MTRR: mtrr_disable: CPU49, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.765395] MTRR: mtrr_disable: CPU16, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.765348] MTRR: mtrr_disable: CPU52, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.765270] MTRR: mtrr_disable: CPU58, MSR_MTRRdefType: 0x0, read: (0x0:0)
>
> which basically hints at some speculative fun where we end up reading
> the MSR *after* the write to it has already happened. After this thing:
>
> /* Disable MTRRs, and set the default type to uncached */
> mtrr_wrmsr(MSR_MTRRdefType, state->mtrr_deftype_lo & ~0xcff,
> state->mtrr_deftype_hi);
>
> and thus when we read it, we already read the disabled state. But this
> is only a conjecture because I still have no clear idea how TF would
> that even happen?!?
Yeah, and why doesn't it happen when we handle only one cpu at a time?
There might be some interaction between the cpus ...
>
> Needless to say, this fixes it, ofc:
>
> diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
> index 3805a6d32d37..4a685898caf3 100644
> --- a/arch/x86/kernel/cpu/cacheinfo.c
> +++ b/arch/x86/kernel/cpu/cacheinfo.c
> @@ -1116,12 +1116,14 @@ void cache_enable(struct cache_state *state)
> __write_cr4(state->cr4);
> }
>
> +static DEFINE_RAW_SPINLOCK(set_atomicity_lock);
> +
> static void cache_cpu_init(void)
> {
> unsigned long flags;
> struct cache_state state = { };
>
> - local_irq_save(flags);
> + raw_spin_lock_irqsave(&set_atomicity_lock, flags);
> cache_disable(&state);
>
> if (memory_caching_control & CACHE_MTRR)
> @@ -1131,7 +1133,7 @@ static void cache_cpu_init(void)
> pat_cpu_init();
>
> cache_enable(&state);
> - local_irq_restore(flags);
> + raw_spin_unlock_irqrestore(&set_atomicity_lock, flags);
> }
>
> static bool cache_aps_delayed_init = true;
>
> ---
>
> and frankly, considering how we have bigger fish to fry, I'd say we do
> it the old way and leave that can'o'worms half-opened.
I agree to keep this patch out of the series for now.
>
> Unless you wanna continue poking at it. I can give you access to that
> box at work...
Yes, please. I suspect there are some additional requirements for updating
MTRR in parallel, or this is "just" a cpu bug.
Juergen
The following commit has been merged into the x86/cpu branch of tip:
Commit-ID: 955d0e0805912641230fb46c380aa625f78ecaca
Gitweb: https://git.kernel.org/tip/955d0e0805912641230fb46c380aa625f78ecaca
Author: Juergen Gross <[email protected]>
AuthorDate: Wed, 02 Nov 2022 08:47:08 +01:00
Committer: Borislav Petkov <[email protected]>
CommitterDate: Thu, 10 Nov 2022 13:12:45 +01:00
x86/mtrr: Let cache_aps_delayed_init replace mtrr_aps_delayed_init
In order to prepare decoupling MTRR and PAT replace the MTRR-specific
mtrr_aps_delayed_init flag with a more generic cache_aps_delayed_init
one.
Signed-off-by: Juergen Gross <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Borislav Petkov <[email protected]>
---
arch/x86/include/asm/cacheinfo.h | 2 ++
arch/x86/include/asm/mtrr.h | 2 --
arch/x86/kernel/cpu/cacheinfo.c | 12 ++++++++++++
arch/x86/kernel/cpu/mtrr/mtrr.c | 18 +++++-------------
arch/x86/kernel/smpboot.c | 5 +++--
5 files changed, 22 insertions(+), 17 deletions(-)
diff --git a/arch/x86/include/asm/cacheinfo.h b/arch/x86/include/asm/cacheinfo.h
index 978bac7..e443fcc 100644
--- a/arch/x86/include/asm/cacheinfo.h
+++ b/arch/x86/include/asm/cacheinfo.h
@@ -13,5 +13,7 @@ void cacheinfo_hygon_init_llc_id(struct cpuinfo_x86 *c, int cpu);
void cache_disable(void);
void cache_enable(void);
void cache_cpu_init(void);
+void set_cache_aps_delayed_init(bool val);
+bool get_cache_aps_delayed_init(void);
#endif /* _ASM_X86_CACHEINFO_H */
diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
index 986249a..5d31219 100644
--- a/arch/x86/include/asm/mtrr.h
+++ b/arch/x86/include/asm/mtrr.h
@@ -43,7 +43,6 @@ extern int mtrr_del(int reg, unsigned long base, unsigned long size);
extern int mtrr_del_page(int reg, unsigned long base, unsigned long size);
extern void mtrr_centaur_report_mcr(int mcr, u32 lo, u32 hi);
extern void mtrr_ap_init(void);
-extern void set_mtrr_aps_delayed_init(void);
extern void mtrr_aps_init(void);
extern void mtrr_bp_restore(void);
extern int mtrr_trim_uncached_memory(unsigned long end_pfn);
@@ -87,7 +86,6 @@ static inline void mtrr_centaur_report_mcr(int mcr, u32 lo, u32 hi)
{
}
#define mtrr_ap_init() do {} while (0)
-#define set_mtrr_aps_delayed_init() do {} while (0)
#define mtrr_aps_init() do {} while (0)
#define mtrr_bp_restore() do {} while (0)
#define mtrr_disable() do {} while (0)
diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index 31684bf..063d556 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -1137,3 +1137,15 @@ void cache_cpu_init(void)
cache_enable();
local_irq_restore(flags);
}
+
+static bool cache_aps_delayed_init;
+
+void set_cache_aps_delayed_init(bool val)
+{
+ cache_aps_delayed_init = val;
+}
+
+bool get_cache_aps_delayed_init(void)
+{
+ return cache_aps_delayed_init;
+}
diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.c b/arch/x86/kernel/cpu/mtrr/mtrr.c
index f671be9..15ee6d7 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.c
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.c
@@ -68,7 +68,6 @@ unsigned int mtrr_usage_table[MTRR_MAX_VAR_RANGES];
static DEFINE_MUTEX(mtrr_mutex);
u64 size_or_mask, size_and_mask;
-static bool mtrr_aps_delayed_init;
static const struct mtrr_ops *mtrr_ops[X86_VENDOR_NUM] __ro_after_init;
@@ -175,7 +174,8 @@ static int mtrr_rendezvous_handler(void *info)
if (data->smp_reg != ~0U) {
mtrr_if->set(data->smp_reg, data->smp_base,
data->smp_size, data->smp_type);
- } else if (mtrr_aps_delayed_init || !cpu_online(smp_processor_id())) {
+ } else if (get_cache_aps_delayed_init() ||
+ !cpu_online(smp_processor_id())) {
cache_cpu_init();
}
return 0;
@@ -782,7 +782,7 @@ void __init mtrr_bp_init(void)
void mtrr_ap_init(void)
{
- if (!memory_caching_control || mtrr_aps_delayed_init)
+ if (!memory_caching_control || get_cache_aps_delayed_init())
return;
/*
@@ -816,14 +816,6 @@ void mtrr_save_state(void)
smp_call_function_single(first_cpu, mtrr_save_fixed_ranges, NULL, 1);
}
-void set_mtrr_aps_delayed_init(void)
-{
- if (!memory_caching_control)
- return;
-
- mtrr_aps_delayed_init = true;
-}
-
/*
* Delayed MTRR initialization for all AP's
*/
@@ -837,11 +829,11 @@ void mtrr_aps_init(void)
* by doing set_mtrr_aps_delayed_init(), prior to this point. If not,
* then we are done.
*/
- if (!mtrr_aps_delayed_init)
+ if (!get_cache_aps_delayed_init())
return;
set_mtrr(~0U, 0, 0, 0);
- mtrr_aps_delayed_init = false;
+ set_cache_aps_delayed_init(false);
}
void mtrr_bp_restore(void)
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 3f3ea02..13c71ab 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -58,6 +58,7 @@
#include <linux/overflow.h>
#include <asm/acpi.h>
+#include <asm/cacheinfo.h>
#include <asm/desc.h>
#include <asm/nmi.h>
#include <asm/irq.h>
@@ -1428,7 +1429,7 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
uv_system_init();
- set_mtrr_aps_delayed_init();
+ set_cache_aps_delayed_init(true);
smp_quirk_init_udelay();
@@ -1439,7 +1440,7 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
void arch_thaw_secondary_cpus_begin(void)
{
- set_mtrr_aps_delayed_init();
+ set_cache_aps_delayed_init(true);
}
void arch_thaw_secondary_cpus_end(void)
The following commit has been merged into the x86/cpu branch of tip:
Commit-ID: 23a63e369098a8503550d1df80f4b4801af32c19
Gitweb: https://git.kernel.org/tip/23a63e369098a8503550d1df80f4b4801af32c19
Author: Juergen Gross <[email protected]>
AuthorDate: Wed, 02 Nov 2022 08:47:03 +01:00
Committer: Borislav Petkov <[email protected]>
CommitterDate: Thu, 10 Nov 2022 13:12:44 +01:00
x86/mtrr: Move cache control code to cacheinfo.c
Prepare making PAT and MTRR support independent from each other by
moving some code needed by both out of the MTRR-specific sources.
[ bp: Massage commit message. ]
Signed-off-by: Juergen Gross <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Borislav Petkov <[email protected]>
---
arch/x86/kernel/cpu/cacheinfo.c | 77 +++++++++++++++++++++++++++++-
arch/x86/kernel/cpu/mtrr/generic.c | 74 +----------------------------
2 files changed, 77 insertions(+), 74 deletions(-)
diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index 32fb049..0cbacec 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -20,6 +20,8 @@
#include <asm/cacheinfo.h>
#include <asm/amd_nb.h>
#include <asm/smp.h>
+#include <asm/mtrr.h>
+#include <asm/tlbflush.h>
#include "cpu.h"
@@ -1043,3 +1045,78 @@ int populate_cache_leaves(unsigned int cpu)
return 0;
}
+
+/*
+ * Disable and enable caches. Needed for changing MTRRs and the PAT MSR.
+ *
+ * Since we are disabling the cache don't allow any interrupts,
+ * they would run extremely slow and would only increase the pain.
+ *
+ * The caller must ensure that local interrupts are disabled and
+ * are reenabled after cache_enable() has been called.
+ */
+static unsigned long saved_cr4;
+static DEFINE_RAW_SPINLOCK(cache_disable_lock);
+
+void cache_disable(void) __acquires(cache_disable_lock)
+{
+ unsigned long cr0;
+
+ /*
+ * Note that this is not ideal
+ * since the cache is only flushed/disabled for this CPU while the
+ * MTRRs are changed, but changing this requires more invasive
+ * changes to the way the kernel boots
+ */
+
+ raw_spin_lock(&cache_disable_lock);
+
+ /* Enter the no-fill (CD=1, NW=0) cache mode and flush caches. */
+ cr0 = read_cr0() | X86_CR0_CD;
+ write_cr0(cr0);
+
+ /*
+ * Cache flushing is the most time-consuming step when programming
+ * the MTRRs. Fortunately, as per the Intel Software Development
+ * Manual, we can skip it if the processor supports cache self-
+ * snooping.
+ */
+ if (!static_cpu_has(X86_FEATURE_SELFSNOOP))
+ wbinvd();
+
+ /* Save value of CR4 and clear Page Global Enable (bit 7) */
+ if (cpu_feature_enabled(X86_FEATURE_PGE)) {
+ saved_cr4 = __read_cr4();
+ __write_cr4(saved_cr4 & ~X86_CR4_PGE);
+ }
+
+ /* Flush all TLBs via a mov %cr3, %reg; mov %reg, %cr3 */
+ count_vm_tlb_event(NR_TLB_LOCAL_FLUSH_ALL);
+ flush_tlb_local();
+
+ if (cpu_feature_enabled(X86_FEATURE_MTRR))
+ mtrr_disable();
+
+ /* Again, only flush caches if we have to. */
+ if (!static_cpu_has(X86_FEATURE_SELFSNOOP))
+ wbinvd();
+}
+
+void cache_enable(void) __releases(cache_disable_lock)
+{
+ /* Flush TLBs (no need to flush caches - they are disabled) */
+ count_vm_tlb_event(NR_TLB_LOCAL_FLUSH_ALL);
+ flush_tlb_local();
+
+ if (cpu_feature_enabled(X86_FEATURE_MTRR))
+ mtrr_enable();
+
+ /* Enable caches */
+ write_cr0(read_cr0() & ~X86_CR0_CD);
+
+ /* Restore value of CR4 */
+ if (cpu_feature_enabled(X86_FEATURE_PGE))
+ __write_cr4(saved_cr4);
+
+ raw_spin_unlock(&cache_disable_lock);
+}
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index 0db0770..396cb1e 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -731,80 +731,6 @@ void mtrr_enable(void)
mtrr_wrmsr(MSR_MTRRdefType, deftype_lo, deftype_hi);
}
-/*
- * Disable and enable caches. Needed for changing MTRRs and the PAT MSR.
- *
- * Since we are disabling the cache don't allow any interrupts,
- * they would run extremely slow and would only increase the pain.
- *
- * The caller must ensure that local interrupts are disabled and
- * are reenabled after cache_enable() has been called.
- */
-static unsigned long saved_cr4;
-static DEFINE_RAW_SPINLOCK(cache_disable_lock);
-
-void cache_disable(void) __acquires(cache_disable_lock)
-{
- unsigned long cr0;
-
- /*
- * Note that this is not ideal
- * since the cache is only flushed/disabled for this CPU while the
- * MTRRs are changed, but changing this requires more invasive
- * changes to the way the kernel boots
- */
-
- raw_spin_lock(&cache_disable_lock);
-
- /* Enter the no-fill (CD=1, NW=0) cache mode and flush caches. */
- cr0 = read_cr0() | X86_CR0_CD;
- write_cr0(cr0);
-
- /*
- * Cache flushing is the most time-consuming step when programming
- * the MTRRs. Fortunately, as per the Intel Software Development
- * Manual, we can skip it if the processor supports cache self-
- * snooping.
- */
- if (!static_cpu_has(X86_FEATURE_SELFSNOOP))
- wbinvd();
-
- /* Save value of CR4 and clear Page Global Enable (bit 7) */
- if (boot_cpu_has(X86_FEATURE_PGE)) {
- saved_cr4 = __read_cr4();
- __write_cr4(saved_cr4 & ~X86_CR4_PGE);
- }
-
- /* Flush all TLBs via a mov %cr3, %reg; mov %reg, %cr3 */
- count_vm_tlb_event(NR_TLB_LOCAL_FLUSH_ALL);
- flush_tlb_local();
-
- if (cpu_feature_enabled(X86_FEATURE_MTRR))
- mtrr_disable();
-
- /* Again, only flush caches if we have to. */
- if (!static_cpu_has(X86_FEATURE_SELFSNOOP))
- wbinvd();
-}
-
-void cache_enable(void) __releases(cache_disable_lock)
-{
- /* Flush TLBs (no need to flush caches - they are disabled) */
- count_vm_tlb_event(NR_TLB_LOCAL_FLUSH_ALL);
- flush_tlb_local();
-
- if (cpu_feature_enabled(X86_FEATURE_MTRR))
- mtrr_enable();
-
- /* Enable caches */
- write_cr0(read_cr0() & ~X86_CR0_CD);
-
- /* Restore value of CR4 */
- if (boot_cpu_has(X86_FEATURE_PGE))
- __write_cr4(saved_cr4);
- raw_spin_unlock(&cache_disable_lock);
-}
-
static void generic_set_all(void)
{
unsigned long mask, count;
The following commit has been merged into the x86/cpu branch of tip:
Commit-ID: 7d71db537b01a6beadbe45a4e6e302272110c2c0
Gitweb: https://git.kernel.org/tip/7d71db537b01a6beadbe45a4e6e302272110c2c0
Author: Juergen Gross <[email protected]>
AuthorDate: Wed, 02 Nov 2022 08:47:04 +01:00
Committer: Borislav Petkov <[email protected]>
CommitterDate: Thu, 10 Nov 2022 13:12:44 +01:00
x86/mtrr: Disentangle MTRR init from PAT init
Add a main cache_cpu_init() init routine which initializes MTRR and/or
PAT support depending on what has been detected on the system.
Leave the MTRR-specific initialization in a MTRR-specific init function
where the smp_changes_mask setting happens now with caches disabled.
This global mask update was done with caches enabled before probably
because atomic operations while running uncached might have been quite
expensive.
But since only systems with a broken BIOS should ever require to set any
bit in smp_changes_mask, hurting those devices with a penalty of a few
microseconds during boot shouldn't be a real issue.
Signed-off-by: Juergen Gross <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Borislav Petkov <[email protected]>
---
arch/x86/include/asm/cacheinfo.h | 1 +
arch/x86/include/asm/mtrr.h | 2 ++
arch/x86/kernel/cpu/cacheinfo.c | 17 +++++++++++++++++
arch/x86/kernel/cpu/mtrr/generic.c | 15 ++-------------
4 files changed, 22 insertions(+), 13 deletions(-)
diff --git a/arch/x86/include/asm/cacheinfo.h b/arch/x86/include/asm/cacheinfo.h
index 6159874..978bac7 100644
--- a/arch/x86/include/asm/cacheinfo.h
+++ b/arch/x86/include/asm/cacheinfo.h
@@ -12,5 +12,6 @@ void cacheinfo_hygon_init_llc_id(struct cpuinfo_x86 *c, int cpu);
void cache_disable(void);
void cache_enable(void);
+void cache_cpu_init(void);
#endif /* _ASM_X86_CACHEINFO_H */
diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
index 12a16ca..986249a 100644
--- a/arch/x86/include/asm/mtrr.h
+++ b/arch/x86/include/asm/mtrr.h
@@ -50,6 +50,7 @@ extern int mtrr_trim_uncached_memory(unsigned long end_pfn);
extern int amd_special_default_mtrr(void);
void mtrr_disable(void);
void mtrr_enable(void);
+void mtrr_generic_set_state(void);
# else
static inline u8 mtrr_type_lookup(u64 addr, u64 end, u8 *uniform)
{
@@ -91,6 +92,7 @@ static inline void mtrr_centaur_report_mcr(int mcr, u32 lo, u32 hi)
#define mtrr_bp_restore() do {} while (0)
#define mtrr_disable() do {} while (0)
#define mtrr_enable() do {} while (0)
+#define mtrr_generic_set_state() do {} while (0)
# endif
#ifdef CONFIG_COMPAT
diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index 0cbacec..31684bf 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -1120,3 +1120,20 @@ void cache_enable(void) __releases(cache_disable_lock)
raw_spin_unlock(&cache_disable_lock);
}
+
+void cache_cpu_init(void)
+{
+ unsigned long flags;
+
+ local_irq_save(flags);
+ cache_disable();
+
+ if (memory_caching_control & CACHE_MTRR)
+ mtrr_generic_set_state();
+
+ if (memory_caching_control & CACHE_PAT)
+ pat_init();
+
+ cache_enable();
+ local_irq_restore(flags);
+}
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index 396cb1e..d409c38 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -731,30 +731,19 @@ void mtrr_enable(void)
mtrr_wrmsr(MSR_MTRRdefType, deftype_lo, deftype_hi);
}
-static void generic_set_all(void)
+void mtrr_generic_set_state(void)
{
unsigned long mask, count;
- unsigned long flags;
-
- local_irq_save(flags);
- cache_disable();
/* Actually set the state */
mask = set_mtrr_state();
- /* also set PAT */
- pat_init();
-
- cache_enable();
- local_irq_restore(flags);
-
/* Use the atomic bitops to update the global mask */
for (count = 0; count < sizeof(mask) * 8; ++count) {
if (mask & 0x01)
set_bit(count, &smp_changes_mask);
mask >>= 1;
}
-
}
/**
@@ -854,7 +843,7 @@ int positive_have_wrcomb(void)
* Generic structure...
*/
const struct mtrr_ops generic_mtrr_ops = {
- .set_all = generic_set_all,
+ .set_all = cache_cpu_init,
.get = generic_get_mtrr,
.get_free_region = generic_get_free_region,
.set = generic_set_mtrr,
The following commit has been merged into the x86/cpu branch of tip:
Commit-ID: 74069135f09c4600ab2985939c305ebef57ac34f
Gitweb: https://git.kernel.org/tip/74069135f09c4600ab2985939c305ebef57ac34f
Author: Juergen Gross <[email protected]>
AuthorDate: Wed, 02 Nov 2022 08:47:06 +01:00
Committer: Borislav Petkov <[email protected]>
CommitterDate: Thu, 10 Nov 2022 13:12:44 +01:00
x86/mtrr: Simplify mtrr_bp_init()
In case of the generic cache interface being used (Intel CPUs or a
64-bit system), the initialization sequence of the boot CPU is more
complicated than necessary:
- check if MTRR enabled, if yes, call mtrr_bp_pat_init() which will
disable caching, set the PAT MSR, and reenable caching
- call mtrr_cleanup(), in case that changed anything, call
cache_cpu_init() doing the same caching disable/enable dance as
above, but this time with setting the (modified) MTRR state (even
if MTRR was disabled) AND setting the PAT MSR (again even with
disabled MTRR)
The sequence can be simplified a lot while removing potential
inconsistencies:
- check if MTRR enabled, if yes, call mtrr_cleanup() and then
cache_cpu_init()
This ensures to:
- no longer disable/enable caching more than once
- avoid to set MTRRs and/or the PAT MSR on the boot processor in case
of MTRR cleanups even if MTRRs meant to be disabled
With that mtrr_bp_pat_init() can be removed.
Signed-off-by: Juergen Gross <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Borislav Petkov <[email protected]>
---
arch/x86/kernel/cpu/mtrr/generic.c | 14 --------------
arch/x86/kernel/cpu/mtrr/mtrr.c | 6 +-----
arch/x86/kernel/cpu/mtrr/mtrr.h | 1 -
3 files changed, 1 insertion(+), 20 deletions(-)
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index 9d4d2bc..ee09d35 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -442,20 +442,6 @@ static void __init print_mtrr_state(void)
pr_debug("TOM2: %016llx aka %lldM\n", mtrr_tom2, mtrr_tom2>>20);
}
-/* PAT setup for BP. We need to go through sync steps here */
-void __init mtrr_bp_pat_init(void)
-{
- unsigned long flags;
-
- local_irq_save(flags);
- cache_disable();
-
- pat_init();
-
- cache_enable();
- local_irq_restore(flags);
-}
-
/* Grab all of the MTRR state for this CPU into *state */
bool __init get_mtrr_state(void)
{
diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.c b/arch/x86/kernel/cpu/mtrr/mtrr.c
index a44b510..a468be5 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.c
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.c
@@ -764,12 +764,8 @@ void __init mtrr_bp_init(void)
__mtrr_enabled = get_mtrr_state();
if (mtrr_enabled()) {
- mtrr_bp_pat_init();
memory_caching_control |= CACHE_MTRR | CACHE_PAT;
- }
-
- if (mtrr_cleanup(phys_addr)) {
- changed_by_mtrr_cleanup = 1;
+ changed_by_mtrr_cleanup = mtrr_cleanup(phys_addr);
cache_cpu_init();
}
}
diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.h b/arch/x86/kernel/cpu/mtrr/mtrr.h
index 3b18831..c98928c 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.h
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.h
@@ -50,7 +50,6 @@ void set_mtrr_prepare_save(struct set_mtrr_context *ctxt);
void fill_mtrr_var_range(unsigned int index,
u32 base_lo, u32 base_hi, u32 mask_lo, u32 mask_hi);
bool get_mtrr_state(void);
-void mtrr_bp_pat_init(void);
extern void __init set_mtrr_ops(const struct mtrr_ops *ops);
The following commit has been merged into the x86/cpu branch of tip:
Commit-ID: 30f89e524becdbaa483b34902b079c9d4dfaa4a3
Gitweb: https://git.kernel.org/tip/30f89e524becdbaa483b34902b079c9d4dfaa4a3
Author: Juergen Gross <[email protected]>
AuthorDate: Wed, 02 Nov 2022 08:47:11 +01:00
Committer: Borislav Petkov <[email protected]>
CommitterDate: Thu, 10 Nov 2022 13:12:45 +01:00
x86/cacheinfo: Switch cache_ap_init() to hotplug callback
Instead of explicitly calling cache_ap_init() in
identify_secondary_cpu() use a CPU hotplug callback instead. By
registering the callback only after having started the non-boot CPUs
and initializing cache_aps_delayed_init with "true", calling
set_cache_aps_delayed_init() at boot time can be dropped.
It should be noted that this change results in cache_ap_init() being
called a little bit later when hotplugging CPUs. By using a new
hotplug slot right at the start of the low level bringup this is not
problematic, as no operations requiring a specific caching mode are
performed that early in CPU initialization.
Suggested-by: Borislav Petkov <[email protected]>
Signed-off-by: Juergen Gross <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Borislav Petkov <[email protected]>
---
arch/x86/include/asm/cacheinfo.h | 1 -
arch/x86/kernel/cpu/cacheinfo.c | 18 +++++++++++++++---
arch/x86/kernel/cpu/common.c | 1 -
arch/x86/kernel/smpboot.c | 2 --
include/linux/cpuhotplug.h | 1 +
5 files changed, 16 insertions(+), 7 deletions(-)
diff --git a/arch/x86/include/asm/cacheinfo.h b/arch/x86/include/asm/cacheinfo.h
index a0ef46e..ce9685f 100644
--- a/arch/x86/include/asm/cacheinfo.h
+++ b/arch/x86/include/asm/cacheinfo.h
@@ -16,7 +16,6 @@ void set_cache_aps_delayed_init(bool val);
bool get_cache_aps_delayed_init(void);
void cache_bp_init(void);
void cache_bp_restore(void);
-void cache_ap_init(void);
void cache_aps_init(void);
#endif /* _ASM_X86_CACHEINFO_H */
diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index c830f85..f4e5aa2 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -11,6 +11,7 @@
#include <linux/slab.h>
#include <linux/cacheinfo.h>
#include <linux/cpu.h>
+#include <linux/cpuhotplug.h>
#include <linux/sched.h>
#include <linux/capability.h>
#include <linux/sysfs.h>
@@ -1139,7 +1140,7 @@ static void cache_cpu_init(void)
local_irq_restore(flags);
}
-static bool cache_aps_delayed_init;
+static bool cache_aps_delayed_init = true;
void set_cache_aps_delayed_init(bool val)
{
@@ -1174,10 +1175,10 @@ void cache_bp_restore(void)
cache_cpu_init();
}
-void cache_ap_init(void)
+static int cache_ap_init(unsigned int cpu)
{
if (!memory_caching_control || get_cache_aps_delayed_init())
- return;
+ return 0;
/*
* Ideally we should hold mtrr_mutex here to avoid MTRR entries
@@ -1194,6 +1195,8 @@ void cache_ap_init(void)
*/
stop_machine_from_inactive_cpu(cache_rendezvous_handler, NULL,
cpu_callout_mask);
+
+ return 0;
}
/*
@@ -1207,3 +1210,12 @@ void cache_aps_init(void)
stop_machine(cache_rendezvous_handler, NULL, cpu_online_mask);
set_cache_aps_delayed_init(false);
}
+
+static int __init cache_ap_register(void)
+{
+ cpuhp_setup_state_nocalls(CPUHP_AP_CACHECTRL_STARTING,
+ "x86/cachectrl:starting",
+ cache_ap_init, NULL);
+ return 0;
+}
+core_initcall(cache_ap_register);
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index fd058b5..bf4ac1c 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1949,7 +1949,6 @@ void identify_secondary_cpu(struct cpuinfo_x86 *c)
#ifdef CONFIG_X86_32
enable_sep_cpu();
#endif
- cache_ap_init();
validate_apic_and_package_id(c);
x86_spec_ctrl_setup_ap();
update_srbds_msr();
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 1b61a48..82b311c 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1429,8 +1429,6 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
uv_system_init();
- set_cache_aps_delayed_init(true);
-
smp_quirk_init_udelay();
speculative_store_bypass_ht_init();
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index f614479..0d277b4 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -140,6 +140,7 @@ enum cpuhp_state {
*/
CPUHP_AP_IDLE_DEAD,
CPUHP_AP_OFFLINE,
+ CPUHP_AP_CACHECTRL_STARTING,
CPUHP_AP_SCHED_STARTING,
CPUHP_AP_RCUTREE_DYING,
CPUHP_AP_CPU_PM_STARTING,
The following commit has been merged into the x86/cpu branch of tip:
Commit-ID: adfe7512e1d0b2e83215b0ec56337d2df9f1032d
Gitweb: https://git.kernel.org/tip/adfe7512e1d0b2e83215b0ec56337d2df9f1032d
Author: Juergen Gross <[email protected]>
AuthorDate: Wed, 02 Nov 2022 08:47:10 +01:00
Committer: Borislav Petkov <[email protected]>
CommitterDate: Thu, 10 Nov 2022 13:12:45 +01:00
x86: Decouple PAT and MTRR handling
Today, PAT is usable only with MTRR being active, with some nasty tweaks
to make PAT usable when running as a Xen PV guest which doesn't support
MTRR.
The reason for this coupling is that both PAT MSR changes and MTRR
changes require a similar sequence and so full PAT support was added
using the already available MTRR handling.
Xen PV PAT handling can work without MTRR, as it just needs to consume
the PAT MSR setting done by the hypervisor without the ability and need
to change it. This in turn has resulted in a convoluted initialization
sequence and wrong decisions regarding cache mode availability due to
misguiding PAT availability flags.
Fix all of that by allowing to use PAT without MTRR and by reworking
the current PAT initialization sequence to match better with the newly
introduced generic cache initialization.
This removes the need of the recently added pat_force_disabled flag, so
remove the remnants of the patch adding it.
Signed-off-by: Juergen Gross <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Borislav Petkov <[email protected]>
---
arch/x86/include/asm/memtype.h | 5 +-
arch/x86/kernel/cpu/cacheinfo.c | 3 +-
arch/x86/kernel/cpu/mtrr/mtrr.c | 12 +--
arch/x86/kernel/setup.c | 13 +---
arch/x86/mm/pat/memtype.c | 152 ++++++++++---------------------
5 files changed, 57 insertions(+), 128 deletions(-)
diff --git a/arch/x86/include/asm/memtype.h b/arch/x86/include/asm/memtype.h
index 9ca760e..113b2fa 100644
--- a/arch/x86/include/asm/memtype.h
+++ b/arch/x86/include/asm/memtype.h
@@ -6,9 +6,8 @@
#include <asm/pgtable_types.h>
extern bool pat_enabled(void);
-extern void pat_disable(const char *reason);
-extern void pat_init(void);
-extern void init_cache_modes(void);
+extern void pat_bp_init(void);
+extern void pat_cpu_init(void);
extern int memtype_reserve(u64 start, u64 end,
enum page_cache_mode req_pcm, enum page_cache_mode *ret_pcm);
diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index 4e155bd..c830f85 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -1133,7 +1133,7 @@ static void cache_cpu_init(void)
mtrr_generic_set_state();
if (memory_caching_control & CACHE_PAT)
- pat_init();
+ pat_cpu_init();
cache_enable();
local_irq_restore(flags);
@@ -1162,6 +1162,7 @@ static int cache_rendezvous_handler(void *unused)
void __init cache_bp_init(void)
{
mtrr_bp_init();
+ pat_bp_init();
if (memory_caching_control)
cache_cpu_init();
diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.c b/arch/x86/kernel/cpu/mtrr/mtrr.c
index 99b6973..8403daf 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.c
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.c
@@ -725,7 +725,7 @@ void __init mtrr_bp_init(void)
if (mtrr_if == &generic_mtrr_ops) {
/* BIOS may override */
if (get_mtrr_state()) {
- memory_caching_control |= CACHE_MTRR | CACHE_PAT;
+ memory_caching_control |= CACHE_MTRR;
changed_by_mtrr_cleanup = mtrr_cleanup(phys_addr);
} else {
mtrr_if = NULL;
@@ -733,16 +733,8 @@ void __init mtrr_bp_init(void)
}
}
- if (!mtrr_enabled()) {
+ if (!mtrr_enabled())
pr_info("Disabled\n");
-
- /*
- * PAT initialization relies on MTRR's rendezvous handler.
- * Skip PAT init until the handler can initialize both
- * features independently.
- */
- pat_disable("MTRRs disabled, skipping PAT initialization too.");
- }
}
/**
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index e0e185e..aacaa96 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1075,24 +1075,13 @@ void __init setup_arch(char **cmdline_p)
max_pfn = e820__end_of_ram_pfn();
/* update e820 for memory not covered by WB MTRRs */
- if (IS_ENABLED(CONFIG_MTRR))
- cache_bp_init();
- else
- pat_disable("PAT support disabled because CONFIG_MTRR is disabled in the kernel.");
-
+ cache_bp_init();
if (mtrr_trim_uncached_memory(max_pfn))
max_pfn = e820__end_of_ram_pfn();
max_possible_pfn = max_pfn;
/*
- * This call is required when the CPU does not support PAT. If
- * mtrr_bp_init() invoked it already via pat_init() the call has no
- * effect.
- */
- init_cache_modes();
-
- /*
* Define random base addresses for memory sections after max_pfn is
* defined and before each memory section base is used.
*/
diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
index 66a209f..9aab17d 100644
--- a/arch/x86/mm/pat/memtype.c
+++ b/arch/x86/mm/pat/memtype.c
@@ -43,6 +43,7 @@
#include <linux/rbtree.h>
#include <asm/cacheflush.h>
+#include <asm/cacheinfo.h>
#include <asm/processor.h>
#include <asm/tlbflush.h>
#include <asm/x86_init.h>
@@ -60,41 +61,34 @@
#undef pr_fmt
#define pr_fmt(fmt) "" fmt
-static bool __read_mostly pat_bp_initialized;
static bool __read_mostly pat_disabled = !IS_ENABLED(CONFIG_X86_PAT);
-static bool __initdata pat_force_disabled = !IS_ENABLED(CONFIG_X86_PAT);
-static bool __read_mostly pat_bp_enabled;
-static bool __read_mostly pat_cm_initialized;
+static u64 __ro_after_init pat_msr_val;
/*
* PAT support is enabled by default, but can be disabled for
* various user-requested or hardware-forced reasons:
*/
-void pat_disable(const char *msg_reason)
+static void __init pat_disable(const char *msg_reason)
{
if (pat_disabled)
return;
- if (pat_bp_initialized) {
- WARN_ONCE(1, "x86/PAT: PAT cannot be disabled after initialization\n");
- return;
- }
-
pat_disabled = true;
pr_info("x86/PAT: %s\n", msg_reason);
+
+ memory_caching_control &= ~CACHE_PAT;
}
static int __init nopat(char *str)
{
pat_disable("PAT support disabled via boot option.");
- pat_force_disabled = true;
return 0;
}
early_param("nopat", nopat);
bool pat_enabled(void)
{
- return pat_bp_enabled;
+ return !pat_disabled;
}
EXPORT_SYMBOL_GPL(pat_enabled);
@@ -192,7 +186,8 @@ enum {
#define CM(c) (_PAGE_CACHE_MODE_ ## c)
-static enum page_cache_mode pat_get_cache_mode(unsigned pat_val, char *msg)
+static enum page_cache_mode __init pat_get_cache_mode(unsigned int pat_val,
+ char *msg)
{
enum page_cache_mode cache;
char *cache_mode;
@@ -219,14 +214,12 @@ static enum page_cache_mode pat_get_cache_mode(unsigned pat_val, char *msg)
* configuration.
* Using lower indices is preferred, so we start with highest index.
*/
-static void __init_cache_modes(u64 pat)
+static void __init init_cache_modes(u64 pat)
{
enum page_cache_mode cache;
char pat_msg[33];
int i;
- WARN_ON_ONCE(pat_cm_initialized);
-
pat_msg[32] = 0;
for (i = 7; i >= 0; i--) {
cache = pat_get_cache_mode((pat >> (i * 8)) & 7,
@@ -234,34 +227,9 @@ static void __init_cache_modes(u64 pat)
update_cache_mode_entry(i, cache);
}
pr_info("x86/PAT: Configuration [0-7]: %s\n", pat_msg);
-
- pat_cm_initialized = true;
}
-#define PAT(x, y) ((u64)PAT_ ## y << ((x)*8))
-
-static void pat_bp_init(u64 pat)
-{
- u64 tmp_pat;
-
- if (!boot_cpu_has(X86_FEATURE_PAT)) {
- pat_disable("PAT not supported by the CPU.");
- return;
- }
-
- rdmsrl(MSR_IA32_CR_PAT, tmp_pat);
- if (!tmp_pat) {
- pat_disable("PAT support disabled by the firmware.");
- return;
- }
-
- wrmsrl(MSR_IA32_CR_PAT, pat);
- pat_bp_enabled = true;
-
- __init_cache_modes(pat);
-}
-
-static void pat_ap_init(u64 pat)
+void pat_cpu_init(void)
{
if (!boot_cpu_has(X86_FEATURE_PAT)) {
/*
@@ -271,30 +239,39 @@ static void pat_ap_init(u64 pat)
panic("x86/PAT: PAT enabled, but not supported by secondary CPU\n");
}
- wrmsrl(MSR_IA32_CR_PAT, pat);
+ wrmsrl(MSR_IA32_CR_PAT, pat_msr_val);
}
-void __init init_cache_modes(void)
+/**
+ * pat_bp_init - Initialize the PAT MSR value and PAT table
+ *
+ * This function initializes PAT MSR value and PAT table with an OS-defined
+ * value to enable additional cache attributes, WC, WT and WP.
+ *
+ * This function prepares the calls of pat_cpu_init() via cache_cpu_init()
+ * on all CPUs.
+ */
+void __init pat_bp_init(void)
{
- u64 pat = 0;
+ struct cpuinfo_x86 *c = &boot_cpu_data;
+#define PAT(p0, p1, p2, p3, p4, p5, p6, p7) \
+ (((u64)PAT_ ## p0) | ((u64)PAT_ ## p1 << 8) | \
+ ((u64)PAT_ ## p2 << 16) | ((u64)PAT_ ## p3 << 24) | \
+ ((u64)PAT_ ## p4 << 32) | ((u64)PAT_ ## p5 << 40) | \
+ ((u64)PAT_ ## p6 << 48) | ((u64)PAT_ ## p7 << 56))
- if (pat_cm_initialized)
- return;
- if (boot_cpu_has(X86_FEATURE_PAT)) {
- /*
- * CPU supports PAT. Set PAT table to be consistent with
- * PAT MSR. This case supports "nopat" boot option, and
- * virtual machine environments which support PAT without
- * MTRRs. In specific, Xen has unique setup to PAT MSR.
- *
- * If PAT MSR returns 0, it is considered invalid and emulates
- * as No PAT.
- */
- rdmsrl(MSR_IA32_CR_PAT, pat);
- }
+ if (!IS_ENABLED(CONFIG_X86_PAT))
+ pr_info_once("x86/PAT: PAT support disabled because CONFIG_X86_PAT is disabled in the kernel.\n");
+
+ if (!cpu_feature_enabled(X86_FEATURE_PAT))
+ pat_disable("PAT not supported by the CPU.");
+ else
+ rdmsrl(MSR_IA32_CR_PAT, pat_msr_val);
+
+ if (!pat_msr_val) {
+ pat_disable("PAT support disabled by the firmware.");
- if (!pat) {
/*
* No PAT. Emulate the PAT table that corresponds to the two
* cache bits, PWT (Write Through) and PCD (Cache Disable).
@@ -313,40 +290,17 @@ void __init init_cache_modes(void)
* NOTE: When WC or WP is used, it is redirected to UC- per
* the default setup in __cachemode2pte_tbl[].
*/
- pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) |
- PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
- } else if (!pat_force_disabled && cpu_feature_enabled(X86_FEATURE_HYPERVISOR)) {
- /*
- * Clearly PAT is enabled underneath. Allow pat_enabled() to
- * reflect this.
- */
- pat_bp_enabled = true;
+ pat_msr_val = PAT(WB, WT, UC_MINUS, UC, WB, WT, UC_MINUS, UC);
}
- __init_cache_modes(pat);
-}
-
-/**
- * pat_init - Initialize the PAT MSR and PAT table on the current CPU
- *
- * This function initializes PAT MSR and PAT table with an OS-defined value
- * to enable additional cache attributes, WC, WT and WP.
- *
- * This function must be called on all CPUs using the specific sequence of
- * operations defined in Intel SDM. mtrr_rendezvous_handler() provides this
- * procedure for PAT.
- */
-void pat_init(void)
-{
- u64 pat;
- struct cpuinfo_x86 *c = &boot_cpu_data;
-
-#ifndef CONFIG_X86_PAT
- pr_info_once("x86/PAT: PAT support disabled because CONFIG_X86_PAT is disabled in the kernel.\n");
-#endif
-
- if (pat_disabled)
+ /*
+ * Xen PV doesn't allow to set PAT MSR, but all cache modes are
+ * supported.
+ */
+ if (pat_disabled || cpu_feature_enabled(X86_FEATURE_XENPV)) {
+ init_cache_modes(pat_msr_val);
return;
+ }
if ((c->x86_vendor == X86_VENDOR_INTEL) &&
(((c->x86 == 0x6) && (c->x86_model <= 0xd)) ||
@@ -371,8 +325,7 @@ void pat_init(void)
* NOTE: When WT or WP is used, it is redirected to UC- per
* the default setup in __cachemode2pte_tbl[].
*/
- pat = PAT(0, WB) | PAT(1, WC) | PAT(2, UC_MINUS) | PAT(3, UC) |
- PAT(4, WB) | PAT(5, WC) | PAT(6, UC_MINUS) | PAT(7, UC);
+ pat_msr_val = PAT(WB, WC, UC_MINUS, UC, WB, WC, UC_MINUS, UC);
} else {
/*
* Full PAT support. We put WT in slot 7 to improve
@@ -400,19 +353,14 @@ void pat_init(void)
* The reserved slots are unused, but mapped to their
* corresponding types in the presence of PAT errata.
*/
- pat = PAT(0, WB) | PAT(1, WC) | PAT(2, UC_MINUS) | PAT(3, UC) |
- PAT(4, WB) | PAT(5, WP) | PAT(6, UC_MINUS) | PAT(7, WT);
+ pat_msr_val = PAT(WB, WC, UC_MINUS, UC, WB, WP, UC_MINUS, WT);
}
- if (!pat_bp_initialized) {
- pat_bp_init(pat);
- pat_bp_initialized = true;
- } else {
- pat_ap_init(pat);
- }
-}
+ memory_caching_control |= CACHE_PAT;
+ init_cache_modes(pat_msr_val);
#undef PAT
+}
static DEFINE_SPINLOCK(memtype_lock); /* protects memtype accesses */
The following commit has been merged into the x86/cpu branch of tip:
Commit-ID: 45fa71f19a2d73f157d6892a8d677a738a0414fd
Gitweb: https://git.kernel.org/tip/45fa71f19a2d73f157d6892a8d677a738a0414fd
Author: Juergen Gross <[email protected]>
AuthorDate: Wed, 02 Nov 2022 08:47:00 +01:00
Committer: Borislav Petkov <[email protected]>
CommitterDate: Thu, 10 Nov 2022 13:12:44 +01:00
x86/mtrr: Replace use_intel() with a local flag
In MTRR code use_intel() is only used in one source file, and the
relevant use_intel_if member of struct mtrr_ops is set only in
generic_mtrr_ops.
Replace use_intel() with a single flag in cacheinfo.c which can be
set when assigning generic_mtrr_ops to mtrr_if. This allows to drop
use_intel_if from mtrr_ops, while preparing to decouple PAT from MTRR.
As another preparation for the PAT/MTRR decoupling use a bit for MTRR
control and one for PAT control. For now set both bits together, this
can be changed later.
As the new flag will be set only if mtrr_enabled is set, the test for
mtrr_enabled can be dropped at some places.
[ bp: Massage commit message. ]
Signed-off-by: Juergen Gross <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Borislav Petkov <[email protected]>
---
arch/x86/include/asm/cacheinfo.h | 5 +++++
arch/x86/kernel/cpu/cacheinfo.c | 3 +++
arch/x86/kernel/cpu/mtrr/generic.c | 1 -
arch/x86/kernel/cpu/mtrr/mtrr.c | 28 +++++++++++++---------------
arch/x86/kernel/cpu/mtrr/mtrr.h | 2 --
5 files changed, 21 insertions(+), 18 deletions(-)
diff --git a/arch/x86/include/asm/cacheinfo.h b/arch/x86/include/asm/cacheinfo.h
index 86b2e0d..c387396 100644
--- a/arch/x86/include/asm/cacheinfo.h
+++ b/arch/x86/include/asm/cacheinfo.h
@@ -2,6 +2,11 @@
#ifndef _ASM_X86_CACHEINFO_H
#define _ASM_X86_CACHEINFO_H
+/* Kernel controls MTRR and/or PAT MSRs. */
+extern unsigned int memory_caching_control;
+#define CACHE_MTRR 0x01
+#define CACHE_PAT 0x02
+
void cacheinfo_amd_init_llc_id(struct cpuinfo_x86 *c, int cpu);
void cacheinfo_hygon_init_llc_id(struct cpuinfo_x86 *c, int cpu);
diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index 6655683..32fb049 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -35,6 +35,9 @@ DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_llc_shared_map);
/* Shared L2 cache maps */
DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_l2c_shared_map);
+/* Kernel controls MTRR and/or PAT MSRs. */
+unsigned int memory_caching_control __ro_after_init;
+
struct _cache_table {
unsigned char descriptor;
char cache_type;
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index c8f8951..7bbaba4 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -917,7 +917,6 @@ int positive_have_wrcomb(void)
* Generic structure...
*/
const struct mtrr_ops generic_mtrr_ops = {
- .use_intel_if = 1,
.set_all = generic_set_all,
.get = generic_get_mtrr,
.get_free_region = generic_get_free_region,
diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.c b/arch/x86/kernel/cpu/mtrr/mtrr.c
index 2746cac..4209945 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.c
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.c
@@ -46,6 +46,7 @@
#include <linux/syscore_ops.h>
#include <linux/rcupdate.h>
+#include <asm/cacheinfo.h>
#include <asm/cpufeature.h>
#include <asm/e820/api.h>
#include <asm/mtrr.h>
@@ -119,11 +120,11 @@ static int have_wrcomb(void)
}
/* This function returns the number of variable MTRRs */
-static void __init set_num_var_ranges(void)
+static void __init set_num_var_ranges(bool use_generic)
{
unsigned long config = 0, dummy;
- if (use_intel())
+ if (use_generic)
rdmsr(MSR_MTRRcap, config, dummy);
else if (is_cpu(AMD) || is_cpu(HYGON))
config = 2;
@@ -756,14 +757,16 @@ void __init mtrr_bp_init(void)
if (mtrr_if) {
__mtrr_enabled = true;
- set_num_var_ranges();
+ set_num_var_ranges(mtrr_if == &generic_mtrr_ops);
init_table();
- if (use_intel()) {
+ if (mtrr_if == &generic_mtrr_ops) {
/* BIOS may override */
__mtrr_enabled = get_mtrr_state();
- if (mtrr_enabled())
+ if (mtrr_enabled()) {
mtrr_bp_pat_init();
+ memory_caching_control |= CACHE_MTRR | CACHE_PAT;
+ }
if (mtrr_cleanup(phys_addr)) {
changed_by_mtrr_cleanup = 1;
@@ -786,10 +789,7 @@ void __init mtrr_bp_init(void)
void mtrr_ap_init(void)
{
- if (!mtrr_enabled())
- return;
-
- if (!use_intel() || mtrr_aps_delayed_init)
+ if (!memory_caching_control || mtrr_aps_delayed_init)
return;
/*
@@ -825,9 +825,7 @@ void mtrr_save_state(void)
void set_mtrr_aps_delayed_init(void)
{
- if (!mtrr_enabled())
- return;
- if (!use_intel())
+ if (!memory_caching_control)
return;
mtrr_aps_delayed_init = true;
@@ -838,7 +836,7 @@ void set_mtrr_aps_delayed_init(void)
*/
void mtrr_aps_init(void)
{
- if (!use_intel() || !mtrr_enabled())
+ if (!memory_caching_control)
return;
/*
@@ -855,7 +853,7 @@ void mtrr_aps_init(void)
void mtrr_bp_restore(void)
{
- if (!use_intel() || !mtrr_enabled())
+ if (!memory_caching_control)
return;
mtrr_if->set_all();
@@ -866,7 +864,7 @@ static int __init mtrr_init_finialize(void)
if (!mtrr_enabled())
return 0;
- if (use_intel()) {
+ if (memory_caching_control & CACHE_MTRR) {
if (!changed_by_mtrr_cleanup)
mtrr_state_warn();
return 0;
diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.h b/arch/x86/kernel/cpu/mtrr/mtrr.h
index 2ac99e5..88b1c4b 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.h
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.h
@@ -14,7 +14,6 @@ extern unsigned int mtrr_usage_table[MTRR_MAX_VAR_RANGES];
struct mtrr_ops {
u32 vendor;
- u32 use_intel_if;
void (*set)(unsigned int reg, unsigned long base,
unsigned long size, mtrr_type type);
void (*set_all)(void);
@@ -61,7 +60,6 @@ extern u64 size_or_mask, size_and_mask;
extern const struct mtrr_ops *mtrr_if;
#define is_cpu(vnd) (mtrr_if && mtrr_if->vendor == X86_VENDOR_##vnd)
-#define use_intel() (mtrr_if && mtrr_if->use_intel_if == 1)
extern unsigned int num_var_ranges;
extern u64 mtrr_tom2;
The following commit has been merged into the x86/cpu branch of tip:
Commit-ID: 4ad7149e46d048d3a543c96d35c1d255208dd33a
Gitweb: https://git.kernel.org/tip/4ad7149e46d048d3a543c96d35c1d255208dd33a
Author: Juergen Gross <[email protected]>
AuthorDate: Wed, 02 Nov 2022 08:47:02 +01:00
Committer: Borislav Petkov <[email protected]>
CommitterDate: Thu, 10 Nov 2022 13:12:44 +01:00
x86/mtrr: Split MTRR-specific handling from cache dis/enabling
Split the MTRR-specific actions from cache_disable() and cache_enable()
into new functions mtrr_disable() and mtrr_enable().
Signed-off-by: Juergen Gross <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Borislav Petkov <[email protected]>
---
arch/x86/include/asm/mtrr.h | 4 ++++
arch/x86/kernel/cpu/mtrr/generic.c | 26 +++++++++++++++++++-------
2 files changed, 23 insertions(+), 7 deletions(-)
diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
index 76d7260..12a16ca 100644
--- a/arch/x86/include/asm/mtrr.h
+++ b/arch/x86/include/asm/mtrr.h
@@ -48,6 +48,8 @@ extern void mtrr_aps_init(void);
extern void mtrr_bp_restore(void);
extern int mtrr_trim_uncached_memory(unsigned long end_pfn);
extern int amd_special_default_mtrr(void);
+void mtrr_disable(void);
+void mtrr_enable(void);
# else
static inline u8 mtrr_type_lookup(u64 addr, u64 end, u8 *uniform)
{
@@ -87,6 +89,8 @@ static inline void mtrr_centaur_report_mcr(int mcr, u32 lo, u32 hi)
#define set_mtrr_aps_delayed_init() do {} while (0)
#define mtrr_aps_init() do {} while (0)
#define mtrr_bp_restore() do {} while (0)
+#define mtrr_disable() do {} while (0)
+#define mtrr_enable() do {} while (0)
# endif
#ifdef CONFIG_COMPAT
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index 2f3fc28..0db0770 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -716,6 +716,21 @@ static unsigned long set_mtrr_state(void)
return change_mask;
}
+void mtrr_disable(void)
+{
+ /* Save MTRR state */
+ rdmsr(MSR_MTRRdefType, deftype_lo, deftype_hi);
+
+ /* Disable MTRRs, and set the default type to uncached */
+ mtrr_wrmsr(MSR_MTRRdefType, deftype_lo & ~0xcff, deftype_hi);
+}
+
+void mtrr_enable(void)
+{
+ /* Intel (P6) standard MTRRs */
+ mtrr_wrmsr(MSR_MTRRdefType, deftype_lo, deftype_hi);
+}
+
/*
* Disable and enable caches. Needed for changing MTRRs and the PAT MSR.
*
@@ -764,11 +779,8 @@ void cache_disable(void) __acquires(cache_disable_lock)
count_vm_tlb_event(NR_TLB_LOCAL_FLUSH_ALL);
flush_tlb_local();
- /* Save MTRR state */
- rdmsr(MSR_MTRRdefType, deftype_lo, deftype_hi);
-
- /* Disable MTRRs, and set the default type to uncached */
- mtrr_wrmsr(MSR_MTRRdefType, deftype_lo & ~0xcff, deftype_hi);
+ if (cpu_feature_enabled(X86_FEATURE_MTRR))
+ mtrr_disable();
/* Again, only flush caches if we have to. */
if (!static_cpu_has(X86_FEATURE_SELFSNOOP))
@@ -781,8 +793,8 @@ void cache_enable(void) __releases(cache_disable_lock)
count_vm_tlb_event(NR_TLB_LOCAL_FLUSH_ALL);
flush_tlb_local();
- /* Intel (P6) standard MTRRs */
- mtrr_wrmsr(MSR_MTRRdefType, deftype_lo, deftype_hi);
+ if (cpu_feature_enabled(X86_FEATURE_MTRR))
+ mtrr_enable();
/* Enable caches */
write_cr0(read_cr0() & ~X86_CR0_CD);
The following commit has been merged into the x86/cpu branch of tip:
Commit-ID: 0b9a6a8bedbfb38e7c6be4d119a267e6277307cc
Gitweb: https://git.kernel.org/tip/0b9a6a8bedbfb38e7c6be4d119a267e6277307cc
Author: Juergen Gross <[email protected]>
AuthorDate: Wed, 02 Nov 2022 08:47:09 +01:00
Committer: Borislav Petkov <[email protected]>
CommitterDate: Thu, 10 Nov 2022 13:12:45 +01:00
x86/mtrr: Add a stop_machine() handler calling only cache_cpu_init()
Instead of having a stop_machine() handler for either a specific
MTRR register or all state at once, add a handler just for calling
cache_cpu_init() if appropriate.
Add functions for calling stop_machine() with this handler as well.
Add a generic replacement for mtrr_bp_restore() and a wrapper for
mtrr_bp_init().
Signed-off-by: Juergen Gross <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Borislav Petkov <[email protected]>
---
arch/x86/include/asm/cacheinfo.h | 5 +-
arch/x86/include/asm/mtrr.h | 8 +---
arch/x86/kernel/cpu/cacheinfo.c | 59 ++++++++++++++++++++-
arch/x86/kernel/cpu/common.c | 3 +-
arch/x86/kernel/cpu/mtrr/mtrr.c | 88 +-------------------------------
arch/x86/kernel/setup.c | 3 +-
arch/x86/kernel/smpboot.c | 4 +-
arch/x86/power/cpu.c | 3 +-
8 files changed, 74 insertions(+), 99 deletions(-)
diff --git a/arch/x86/include/asm/cacheinfo.h b/arch/x86/include/asm/cacheinfo.h
index e443fcc..a0ef46e 100644
--- a/arch/x86/include/asm/cacheinfo.h
+++ b/arch/x86/include/asm/cacheinfo.h
@@ -12,8 +12,11 @@ void cacheinfo_hygon_init_llc_id(struct cpuinfo_x86 *c, int cpu);
void cache_disable(void);
void cache_enable(void);
-void cache_cpu_init(void);
void set_cache_aps_delayed_init(bool val);
bool get_cache_aps_delayed_init(void);
+void cache_bp_init(void);
+void cache_bp_restore(void);
+void cache_ap_init(void);
+void cache_aps_init(void);
#endif /* _ASM_X86_CACHEINFO_H */
diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
index 5d31219..f0eeaf6 100644
--- a/arch/x86/include/asm/mtrr.h
+++ b/arch/x86/include/asm/mtrr.h
@@ -25,13 +25,12 @@
#include <uapi/asm/mtrr.h>
-void mtrr_bp_init(void);
-
/*
* The following functions are for use by other drivers that cannot use
* arch_phys_wc_add and arch_phys_wc_del.
*/
# ifdef CONFIG_MTRR
+void mtrr_bp_init(void);
extern u8 mtrr_type_lookup(u64 addr, u64 end, u8 *uniform);
extern void mtrr_save_fixed_ranges(void *);
extern void mtrr_save_state(void);
@@ -42,8 +41,6 @@ extern int mtrr_add_page(unsigned long base, unsigned long size,
extern int mtrr_del(int reg, unsigned long base, unsigned long size);
extern int mtrr_del_page(int reg, unsigned long base, unsigned long size);
extern void mtrr_centaur_report_mcr(int mcr, u32 lo, u32 hi);
-extern void mtrr_ap_init(void);
-extern void mtrr_aps_init(void);
extern void mtrr_bp_restore(void);
extern int mtrr_trim_uncached_memory(unsigned long end_pfn);
extern int amd_special_default_mtrr(void);
@@ -85,8 +82,7 @@ static inline int mtrr_trim_uncached_memory(unsigned long end_pfn)
static inline void mtrr_centaur_report_mcr(int mcr, u32 lo, u32 hi)
{
}
-#define mtrr_ap_init() do {} while (0)
-#define mtrr_aps_init() do {} while (0)
+#define mtrr_bp_init() do {} while (0)
#define mtrr_bp_restore() do {} while (0)
#define mtrr_disable() do {} while (0)
#define mtrr_enable() do {} while (0)
diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index 063d556..4e155bd 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -15,6 +15,7 @@
#include <linux/capability.h>
#include <linux/sysfs.h>
#include <linux/pci.h>
+#include <linux/stop_machine.h>
#include <asm/cpufeature.h>
#include <asm/cacheinfo.h>
@@ -1121,7 +1122,7 @@ void cache_enable(void) __releases(cache_disable_lock)
raw_spin_unlock(&cache_disable_lock);
}
-void cache_cpu_init(void)
+static void cache_cpu_init(void)
{
unsigned long flags;
@@ -1149,3 +1150,59 @@ bool get_cache_aps_delayed_init(void)
{
return cache_aps_delayed_init;
}
+
+static int cache_rendezvous_handler(void *unused)
+{
+ if (get_cache_aps_delayed_init() || !cpu_online(smp_processor_id()))
+ cache_cpu_init();
+
+ return 0;
+}
+
+void __init cache_bp_init(void)
+{
+ mtrr_bp_init();
+
+ if (memory_caching_control)
+ cache_cpu_init();
+}
+
+void cache_bp_restore(void)
+{
+ if (memory_caching_control)
+ cache_cpu_init();
+}
+
+void cache_ap_init(void)
+{
+ if (!memory_caching_control || get_cache_aps_delayed_init())
+ return;
+
+ /*
+ * Ideally we should hold mtrr_mutex here to avoid MTRR entries
+ * changed, but this routine will be called in CPU boot time,
+ * holding the lock breaks it.
+ *
+ * This routine is called in two cases:
+ *
+ * 1. very early time of software resume, when there absolutely
+ * isn't MTRR entry changes;
+ *
+ * 2. CPU hotadd time. We let mtrr_add/del_page hold cpuhotplug
+ * lock to prevent MTRR entry changes
+ */
+ stop_machine_from_inactive_cpu(cache_rendezvous_handler, NULL,
+ cpu_callout_mask);
+}
+
+/*
+ * Delayed cache initialization for all AP's
+ */
+void cache_aps_init(void)
+{
+ if (!memory_caching_control || !get_cache_aps_delayed_init())
+ return;
+
+ stop_machine(cache_rendezvous_handler, NULL, cpu_online_mask);
+ set_cache_aps_delayed_init(false);
+}
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 3e508f2..fd058b5 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -52,6 +52,7 @@
#include <asm/cpu.h>
#include <asm/mce.h>
#include <asm/msr.h>
+#include <asm/cacheinfo.h>
#include <asm/memtype.h>
#include <asm/microcode.h>
#include <asm/microcode_intel.h>
@@ -1948,7 +1949,7 @@ void identify_secondary_cpu(struct cpuinfo_x86 *c)
#ifdef CONFIG_X86_32
enable_sep_cpu();
#endif
- mtrr_ap_init();
+ cache_ap_init();
validate_apic_and_package_id(c);
x86_spec_ctrl_setup_ap();
update_srbds_msr();
diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.c b/arch/x86/kernel/cpu/mtrr/mtrr.c
index 15ee6d7..99b6973 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.c
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.c
@@ -73,9 +73,6 @@ static const struct mtrr_ops *mtrr_ops[X86_VENDOR_NUM] __ro_after_init;
const struct mtrr_ops *mtrr_if;
-static void set_mtrr(unsigned int reg, unsigned long base,
- unsigned long size, mtrr_type type);
-
void __init set_mtrr_ops(const struct mtrr_ops *ops)
{
if (ops->vendor && ops->vendor < X86_VENDOR_NUM)
@@ -158,26 +155,8 @@ static int mtrr_rendezvous_handler(void *info)
{
struct set_mtrr_data *data = info;
- /*
- * We use this same function to initialize the mtrrs during boot,
- * resume, runtime cpu online and on an explicit request to set a
- * specific MTRR.
- *
- * During boot or suspend, the state of the boot cpu's mtrrs has been
- * saved, and we want to replicate that across all the cpus that come
- * online (either at the end of boot or resume or during a runtime cpu
- * online). If we're doing that, @reg is set to something special and on
- * all the CPUs we do cache_cpu_init() (On the logical CPU that
- * started the boot/resume sequence, this might be a duplicate
- * cache_cpu_init()).
- */
- if (data->smp_reg != ~0U) {
- mtrr_if->set(data->smp_reg, data->smp_base,
- data->smp_size, data->smp_type);
- } else if (get_cache_aps_delayed_init() ||
- !cpu_online(smp_processor_id())) {
- cache_cpu_init();
- }
+ mtrr_if->set(data->smp_reg, data->smp_base,
+ data->smp_size, data->smp_type);
return 0;
}
@@ -247,19 +226,6 @@ static void set_mtrr_cpuslocked(unsigned int reg, unsigned long base,
stop_machine_cpuslocked(mtrr_rendezvous_handler, &data, cpu_online_mask);
}
-static void set_mtrr_from_inactive_cpu(unsigned int reg, unsigned long base,
- unsigned long size, mtrr_type type)
-{
- struct set_mtrr_data data = { .smp_reg = reg,
- .smp_base = base,
- .smp_size = size,
- .smp_type = type
- };
-
- stop_machine_from_inactive_cpu(mtrr_rendezvous_handler, &data,
- cpu_callout_mask);
-}
-
/**
* mtrr_add_page - Add a memory type region
* @base: Physical base address of region in pages (in units of 4 kB!)
@@ -761,7 +727,6 @@ void __init mtrr_bp_init(void)
if (get_mtrr_state()) {
memory_caching_control |= CACHE_MTRR | CACHE_PAT;
changed_by_mtrr_cleanup = mtrr_cleanup(phys_addr);
- cache_cpu_init();
} else {
mtrr_if = NULL;
}
@@ -780,27 +745,6 @@ void __init mtrr_bp_init(void)
}
}
-void mtrr_ap_init(void)
-{
- if (!memory_caching_control || get_cache_aps_delayed_init())
- return;
-
- /*
- * Ideally we should hold mtrr_mutex here to avoid mtrr entries
- * changed, but this routine will be called in cpu boot time,
- * holding the lock breaks it.
- *
- * This routine is called in two cases:
- *
- * 1. very early time of software resume, when there absolutely
- * isn't mtrr entry changes;
- *
- * 2. cpu hotadd time. We let mtrr_add/del_page hold cpuhotplug
- * lock to prevent mtrr entry changes
- */
- set_mtrr_from_inactive_cpu(~0U, 0, 0, 0);
-}
-
/**
* mtrr_save_state - Save current fixed-range MTRR state of the first
* cpu in cpu_online_mask.
@@ -816,34 +760,6 @@ void mtrr_save_state(void)
smp_call_function_single(first_cpu, mtrr_save_fixed_ranges, NULL, 1);
}
-/*
- * Delayed MTRR initialization for all AP's
- */
-void mtrr_aps_init(void)
-{
- if (!memory_caching_control)
- return;
-
- /*
- * Check if someone has requested the delay of AP MTRR initialization,
- * by doing set_mtrr_aps_delayed_init(), prior to this point. If not,
- * then we are done.
- */
- if (!get_cache_aps_delayed_init())
- return;
-
- set_mtrr(~0U, 0, 0, 0);
- set_cache_aps_delayed_init(false);
-}
-
-void mtrr_bp_restore(void)
-{
- if (!memory_caching_control)
- return;
-
- cache_cpu_init();
-}
-
static int __init mtrr_init_finialize(void)
{
if (!mtrr_enabled())
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 216fee7..e0e185e 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -34,6 +34,7 @@
#include <asm/numa.h>
#include <asm/bios_ebda.h>
#include <asm/bugs.h>
+#include <asm/cacheinfo.h>
#include <asm/cpu.h>
#include <asm/efi.h>
#include <asm/gart.h>
@@ -1075,7 +1076,7 @@ void __init setup_arch(char **cmdline_p)
/* update e820 for memory not covered by WB MTRRs */
if (IS_ENABLED(CONFIG_MTRR))
- mtrr_bp_init();
+ cache_bp_init();
else
pat_disable("PAT support disabled because CONFIG_MTRR is disabled in the kernel.");
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 13c71ab..1b61a48 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1445,7 +1445,7 @@ void arch_thaw_secondary_cpus_begin(void)
void arch_thaw_secondary_cpus_end(void)
{
- mtrr_aps_init();
+ cache_aps_init();
}
/*
@@ -1488,7 +1488,7 @@ void __init native_smp_cpus_done(unsigned int max_cpus)
nmi_selftest();
impress_friends();
- mtrr_aps_init();
+ cache_aps_init();
}
static int __initdata setup_possible_cpus = -1;
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index bb176c7..754221c 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -23,6 +23,7 @@
#include <asm/fpu/api.h>
#include <asm/debugreg.h>
#include <asm/cpu.h>
+#include <asm/cacheinfo.h>
#include <asm/mmu_context.h>
#include <asm/cpu_device_id.h>
#include <asm/microcode.h>
@@ -261,7 +262,7 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
do_fpu_end();
tsc_verify_tsc_adjust(true);
x86_platform.restore_sched_clock_state();
- mtrr_bp_restore();
+ cache_bp_restore();
perf_restore_debug_store();
c = &cpu_data(smp_processor_id());
The following commit has been merged into the x86/cpu branch of tip:
Commit-ID: 57df636cd336a1929c7ddc5fb48ed124d24cd7b2
Gitweb: https://git.kernel.org/tip/57df636cd336a1929c7ddc5fb48ed124d24cd7b2
Author: Juergen Gross <[email protected]>
AuthorDate: Wed, 02 Nov 2022 08:47:05 +01:00
Committer: Borislav Petkov <[email protected]>
CommitterDate: Thu, 10 Nov 2022 13:12:44 +01:00
x86/mtrr: Remove set_all callback from struct mtrr_ops
Instead of using an indirect call to mtrr_if->set_all just call the only
possible target cache_cpu_init() directly. Remove the set_all function
pointer from struct mtrr_ops.
Signed-off-by: Juergen Gross <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Borislav Petkov <[email protected]>
---
arch/x86/kernel/cpu/mtrr/generic.c | 1 -
arch/x86/kernel/cpu/mtrr/mtrr.c | 10 +++++-----
arch/x86/kernel/cpu/mtrr/mtrr.h | 2 --
3 files changed, 5 insertions(+), 8 deletions(-)
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index d409c38..9d4d2bc 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -843,7 +843,6 @@ int positive_have_wrcomb(void)
* Generic structure...
*/
const struct mtrr_ops generic_mtrr_ops = {
- .set_all = cache_cpu_init,
.get = generic_get_mtrr,
.get_free_region = generic_get_free_region,
.set = generic_set_mtrr,
diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.c b/arch/x86/kernel/cpu/mtrr/mtrr.c
index 4209945..a44b510 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.c
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.c
@@ -170,15 +170,15 @@ static int mtrr_rendezvous_handler(void *info)
* saved, and we want to replicate that across all the cpus that come
* online (either at the end of boot or resume or during a runtime cpu
* online). If we're doing that, @reg is set to something special and on
- * all the cpu's we do mtrr_if->set_all() (On the logical cpu that
+ * all the CPUs we do cache_cpu_init() (On the logical CPU that
* started the boot/resume sequence, this might be a duplicate
- * set_all()).
+ * cache_cpu_init()).
*/
if (data->smp_reg != ~0U) {
mtrr_if->set(data->smp_reg, data->smp_base,
data->smp_size, data->smp_type);
} else if (mtrr_aps_delayed_init || !cpu_online(smp_processor_id())) {
- mtrr_if->set_all();
+ cache_cpu_init();
}
return 0;
}
@@ -770,7 +770,7 @@ void __init mtrr_bp_init(void)
if (mtrr_cleanup(phys_addr)) {
changed_by_mtrr_cleanup = 1;
- mtrr_if->set_all();
+ cache_cpu_init();
}
}
}
@@ -856,7 +856,7 @@ void mtrr_bp_restore(void)
if (!memory_caching_control)
return;
- mtrr_if->set_all();
+ cache_cpu_init();
}
static int __init mtrr_init_finialize(void)
diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.h b/arch/x86/kernel/cpu/mtrr/mtrr.h
index 88b1c4b..3b18831 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.h
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.h
@@ -16,8 +16,6 @@ struct mtrr_ops {
u32 vendor;
void (*set)(unsigned int reg, unsigned long base,
unsigned long size, mtrr_type type);
- void (*set_all)(void);
-
void (*get)(unsigned int reg, unsigned long *base,
unsigned long *size, mtrr_type *type);
int (*get_free_region)(unsigned long base, unsigned long size,
The following commit has been merged into the x86/cpu branch of tip:
Commit-ID: d5f66d5d10611978c3a93cc94a811d74e0cf6cbc
Gitweb: https://git.kernel.org/tip/d5f66d5d10611978c3a93cc94a811d74e0cf6cbc
Author: Juergen Gross <[email protected]>
AuthorDate: Wed, 02 Nov 2022 08:47:01 +01:00
Committer: Borislav Petkov <[email protected]>
CommitterDate: Thu, 10 Nov 2022 13:12:44 +01:00
x86/mtrr: Rename prepare_set() and post_set()
Rename the currently MTRR-specific functions prepare_set() and
post_set() in preparation to move them. Make them non-static and put
their prototypes into cacheinfo.h, where they will end after moving them
to their final position anyway.
Expand the comment before the functions with an introductory line and
rename two related static variables, too.
[ bp: Massage commit message. ]
Signed-off-by: Juergen Gross <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Borislav Petkov <[email protected]>
---
arch/x86/include/asm/cacheinfo.h | 3 ++-
arch/x86/kernel/cpu/mtrr/generic.c | 43 ++++++++++++++---------------
2 files changed, 24 insertions(+), 22 deletions(-)
diff --git a/arch/x86/include/asm/cacheinfo.h b/arch/x86/include/asm/cacheinfo.h
index c387396..6159874 100644
--- a/arch/x86/include/asm/cacheinfo.h
+++ b/arch/x86/include/asm/cacheinfo.h
@@ -10,4 +10,7 @@ extern unsigned int memory_caching_control;
void cacheinfo_amd_init_llc_id(struct cpuinfo_x86 *c, int cpu);
void cacheinfo_hygon_init_llc_id(struct cpuinfo_x86 *c, int cpu);
+void cache_disable(void);
+void cache_enable(void);
+
#endif /* _ASM_X86_CACHEINFO_H */
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index 7bbaba4..2f3fc28 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -10,6 +10,7 @@
#include <linux/mm.h>
#include <asm/processor-flags.h>
+#include <asm/cacheinfo.h>
#include <asm/cpufeature.h>
#include <asm/tlbflush.h>
#include <asm/mtrr.h>
@@ -396,9 +397,6 @@ print_fixed(unsigned base, unsigned step, const mtrr_type *types)
}
}
-static void prepare_set(void);
-static void post_set(void);
-
static void __init print_mtrr_state(void)
{
unsigned int i;
@@ -450,11 +448,11 @@ void __init mtrr_bp_pat_init(void)
unsigned long flags;
local_irq_save(flags);
- prepare_set();
+ cache_disable();
pat_init();
- post_set();
+ cache_enable();
local_irq_restore(flags);
}
@@ -687,7 +685,7 @@ static u32 deftype_lo, deftype_hi;
* NOTE: The CPU must already be in a safe state for MTRR changes, including
* measures that only a single CPU can be active in set_mtrr_state() in
* order to not be subject to races for usage of deftype_lo. This is
- * accomplished by taking set_atomicity_lock.
+ * accomplished by taking cache_disable_lock.
* RETURNS: 0 if no changes made, else a mask indicating what was changed.
*/
static unsigned long set_mtrr_state(void)
@@ -718,18 +716,19 @@ static unsigned long set_mtrr_state(void)
return change_mask;
}
-
-static unsigned long cr4;
-static DEFINE_RAW_SPINLOCK(set_atomicity_lock);
-
/*
+ * Disable and enable caches. Needed for changing MTRRs and the PAT MSR.
+ *
* Since we are disabling the cache don't allow any interrupts,
* they would run extremely slow and would only increase the pain.
*
* The caller must ensure that local interrupts are disabled and
- * are reenabled after post_set() has been called.
+ * are reenabled after cache_enable() has been called.
*/
-static void prepare_set(void) __acquires(set_atomicity_lock)
+static unsigned long saved_cr4;
+static DEFINE_RAW_SPINLOCK(cache_disable_lock);
+
+void cache_disable(void) __acquires(cache_disable_lock)
{
unsigned long cr0;
@@ -740,7 +739,7 @@ static void prepare_set(void) __acquires(set_atomicity_lock)
* changes to the way the kernel boots
*/
- raw_spin_lock(&set_atomicity_lock);
+ raw_spin_lock(&cache_disable_lock);
/* Enter the no-fill (CD=1, NW=0) cache mode and flush caches. */
cr0 = read_cr0() | X86_CR0_CD;
@@ -757,8 +756,8 @@ static void prepare_set(void) __acquires(set_atomicity_lock)
/* Save value of CR4 and clear Page Global Enable (bit 7) */
if (boot_cpu_has(X86_FEATURE_PGE)) {
- cr4 = __read_cr4();
- __write_cr4(cr4 & ~X86_CR4_PGE);
+ saved_cr4 = __read_cr4();
+ __write_cr4(saved_cr4 & ~X86_CR4_PGE);
}
/* Flush all TLBs via a mov %cr3, %reg; mov %reg, %cr3 */
@@ -776,7 +775,7 @@ static void prepare_set(void) __acquires(set_atomicity_lock)
wbinvd();
}
-static void post_set(void) __releases(set_atomicity_lock)
+void cache_enable(void) __releases(cache_disable_lock)
{
/* Flush TLBs (no need to flush caches - they are disabled) */
count_vm_tlb_event(NR_TLB_LOCAL_FLUSH_ALL);
@@ -790,8 +789,8 @@ static void post_set(void) __releases(set_atomicity_lock)
/* Restore value of CR4 */
if (boot_cpu_has(X86_FEATURE_PGE))
- __write_cr4(cr4);
- raw_spin_unlock(&set_atomicity_lock);
+ __write_cr4(saved_cr4);
+ raw_spin_unlock(&cache_disable_lock);
}
static void generic_set_all(void)
@@ -800,7 +799,7 @@ static void generic_set_all(void)
unsigned long flags;
local_irq_save(flags);
- prepare_set();
+ cache_disable();
/* Actually set the state */
mask = set_mtrr_state();
@@ -808,7 +807,7 @@ static void generic_set_all(void)
/* also set PAT */
pat_init();
- post_set();
+ cache_enable();
local_irq_restore(flags);
/* Use the atomic bitops to update the global mask */
@@ -839,7 +838,7 @@ static void generic_set_mtrr(unsigned int reg, unsigned long base,
vr = &mtrr_state.var_ranges[reg];
local_irq_save(flags);
- prepare_set();
+ cache_disable();
if (size == 0) {
/*
@@ -858,7 +857,7 @@ static void generic_set_mtrr(unsigned int reg, unsigned long base,
mtrr_wrmsr(MTRRphysMask_MSR(reg), vr->mask_lo, vr->mask_hi);
}
- post_set();
+ cache_enable();
local_irq_restore(flags);
}
The following commit has been merged into the x86/cpu branch of tip:
Commit-ID: 2c15679e8687d5934e1a70fe50ce409bb8a2aba1
Gitweb: https://git.kernel.org/tip/2c15679e8687d5934e1a70fe50ce409bb8a2aba1
Author: Juergen Gross <[email protected]>
AuthorDate: Wed, 02 Nov 2022 08:47:07 +01:00
Committer: Borislav Petkov <[email protected]>
CommitterDate: Thu, 10 Nov 2022 13:12:45 +01:00
x86/mtrr: Get rid of __mtrr_enabled bool
There is no need for keeping __mtrr_enabled as it can easily be replaced
by testing mtrr_if to be not NULL.
Signed-off-by: Juergen Gross <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Borislav Petkov <[email protected]>
---
arch/x86/kernel/cpu/mtrr/mtrr.c | 13 +++++--------
1 file changed, 5 insertions(+), 8 deletions(-)
diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.c b/arch/x86/kernel/cpu/mtrr/mtrr.c
index a468be5..f671be9 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.c
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.c
@@ -59,11 +59,9 @@
#define MTRR_TO_PHYS_WC_OFFSET 1000
u32 num_var_ranges;
-static bool __mtrr_enabled;
-
static bool mtrr_enabled(void)
{
- return __mtrr_enabled;
+ return !!mtrr_if;
}
unsigned int mtrr_usage_table[MTRR_MAX_VAR_RANGES];
@@ -755,18 +753,17 @@ void __init mtrr_bp_init(void)
}
}
- if (mtrr_if) {
- __mtrr_enabled = true;
+ if (mtrr_enabled()) {
set_num_var_ranges(mtrr_if == &generic_mtrr_ops);
init_table();
if (mtrr_if == &generic_mtrr_ops) {
/* BIOS may override */
- __mtrr_enabled = get_mtrr_state();
-
- if (mtrr_enabled()) {
+ if (get_mtrr_state()) {
memory_caching_control |= CACHE_MTRR | CACHE_PAT;
changed_by_mtrr_cleanup = mtrr_cleanup(phys_addr);
cache_cpu_init();
+ } else {
+ mtrr_if = NULL;
}
}
}
The following commit has been merged into the x86/cpu branch of tip:
Commit-ID: f8bd9f25c9815161a39886fdd96d110b536a6074
Gitweb: https://git.kernel.org/tip/f8bd9f25c9815161a39886fdd96d110b536a6074
Author: Juergen Gross <[email protected]>
AuthorDate: Wed, 02 Nov 2022 08:47:13 +01:00
Committer: Borislav Petkov <[email protected]>
CommitterDate: Thu, 10 Nov 2022 13:12:45 +01:00
x86/mtrr: Simplify mtrr_ops initialization
The way mtrr_if is initialized with the correct mtrr_ops structure is
quite weird.
Simplify that by dropping the vendor specific init functions and the
mtrr_ops[] array. Replace those with direct assignments of the related
vendor specific ops array to mtrr_if.
Note that a direct assignment is okay even for 64-bit builds, where the
symbol isn't present, as the related code will be subject to "dead code
elimination" due to how cpu_feature_enabled() is implemented.
Signed-off-by: Juergen Gross <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Borislav Petkov <[email protected]>
---
arch/x86/kernel/cpu/mtrr/amd.c | 8 +-------
arch/x86/kernel/cpu/mtrr/centaur.c | 8 +-------
arch/x86/kernel/cpu/mtrr/cyrix.c | 8 +-------
arch/x86/kernel/cpu/mtrr/mtrr.c | 30 ++---------------------------
arch/x86/kernel/cpu/mtrr/mtrr.h | 10 ++++------
5 files changed, 10 insertions(+), 54 deletions(-)
diff --git a/arch/x86/kernel/cpu/mtrr/amd.c b/arch/x86/kernel/cpu/mtrr/amd.c
index a65a027..eff6ac6 100644
--- a/arch/x86/kernel/cpu/mtrr/amd.c
+++ b/arch/x86/kernel/cpu/mtrr/amd.c
@@ -109,7 +109,7 @@ amd_validate_add_page(unsigned long base, unsigned long size, unsigned int type)
return 0;
}
-static const struct mtrr_ops amd_mtrr_ops = {
+const struct mtrr_ops amd_mtrr_ops = {
.vendor = X86_VENDOR_AMD,
.set = amd_set_mtrr,
.get = amd_get_mtrr,
@@ -117,9 +117,3 @@ static const struct mtrr_ops amd_mtrr_ops = {
.validate_add_page = amd_validate_add_page,
.have_wrcomb = positive_have_wrcomb,
};
-
-int __init amd_init_mtrr(void)
-{
- set_mtrr_ops(&amd_mtrr_ops);
- return 0;
-}
diff --git a/arch/x86/kernel/cpu/mtrr/centaur.c b/arch/x86/kernel/cpu/mtrr/centaur.c
index f271778..b8a74ed 100644
--- a/arch/x86/kernel/cpu/mtrr/centaur.c
+++ b/arch/x86/kernel/cpu/mtrr/centaur.c
@@ -111,7 +111,7 @@ centaur_validate_add_page(unsigned long base, unsigned long size, unsigned int t
return 0;
}
-static const struct mtrr_ops centaur_mtrr_ops = {
+const struct mtrr_ops centaur_mtrr_ops = {
.vendor = X86_VENDOR_CENTAUR,
.set = centaur_set_mcr,
.get = centaur_get_mcr,
@@ -119,9 +119,3 @@ static const struct mtrr_ops centaur_mtrr_ops = {
.validate_add_page = centaur_validate_add_page,
.have_wrcomb = positive_have_wrcomb,
};
-
-int __init centaur_init_mtrr(void)
-{
- set_mtrr_ops(¢aur_mtrr_ops);
- return 0;
-}
diff --git a/arch/x86/kernel/cpu/mtrr/cyrix.c b/arch/x86/kernel/cpu/mtrr/cyrix.c
index c77d3b0..173b9e0 100644
--- a/arch/x86/kernel/cpu/mtrr/cyrix.c
+++ b/arch/x86/kernel/cpu/mtrr/cyrix.c
@@ -234,7 +234,7 @@ static void cyrix_set_arr(unsigned int reg, unsigned long base,
post_set();
}
-static const struct mtrr_ops cyrix_mtrr_ops = {
+const struct mtrr_ops cyrix_mtrr_ops = {
.vendor = X86_VENDOR_CYRIX,
.set = cyrix_set_arr,
.get = cyrix_get_arr,
@@ -242,9 +242,3 @@ static const struct mtrr_ops cyrix_mtrr_ops = {
.validate_add_page = generic_validate_add_page,
.have_wrcomb = positive_have_wrcomb,
};
-
-int __init cyrix_init_mtrr(void)
-{
- set_mtrr_ops(&cyrix_mtrr_ops);
- return 0;
-}
diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.c b/arch/x86/kernel/cpu/mtrr/mtrr.c
index 8403daf..6432abc 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.c
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.c
@@ -69,16 +69,8 @@ static DEFINE_MUTEX(mtrr_mutex);
u64 size_or_mask, size_and_mask;
-static const struct mtrr_ops *mtrr_ops[X86_VENDOR_NUM] __ro_after_init;
-
const struct mtrr_ops *mtrr_if;
-void __init set_mtrr_ops(const struct mtrr_ops *ops)
-{
- if (ops->vendor && ops->vendor < X86_VENDOR_NUM)
- mtrr_ops[ops->vendor] = ops;
-}
-
/* Returns non-zero if we have the write-combining memory type */
static int have_wrcomb(void)
{
@@ -582,20 +574,6 @@ int arch_phys_wc_index(int handle)
}
EXPORT_SYMBOL_GPL(arch_phys_wc_index);
-/*
- * HACK ALERT!
- * These should be called implicitly, but we can't yet until all the initcall
- * stuff is done...
- */
-static void __init init_ifs(void)
-{
-#ifndef CONFIG_X86_64
- amd_init_mtrr();
- cyrix_init_mtrr();
- centaur_init_mtrr();
-#endif
-}
-
/* The suspend/resume methods are only for CPU without MTRR. CPU using generic
* MTRR driver doesn't require this
*/
@@ -653,8 +631,6 @@ void __init mtrr_bp_init(void)
{
u32 phys_addr;
- init_ifs();
-
phys_addr = 32;
if (boot_cpu_has(X86_FEATURE_MTRR)) {
@@ -695,21 +671,21 @@ void __init mtrr_bp_init(void)
case X86_VENDOR_AMD:
if (cpu_feature_enabled(X86_FEATURE_K6_MTRR)) {
/* Pre-Athlon (K6) AMD CPU MTRRs */
- mtrr_if = mtrr_ops[X86_VENDOR_AMD];
+ mtrr_if = &amd_mtrr_ops;
size_or_mask = SIZE_OR_MASK_BITS(32);
size_and_mask = 0;
}
break;
case X86_VENDOR_CENTAUR:
if (cpu_feature_enabled(X86_FEATURE_CENTAUR_MCR)) {
- mtrr_if = mtrr_ops[X86_VENDOR_CENTAUR];
+ mtrr_if = ¢aur_mtrr_ops;
size_or_mask = SIZE_OR_MASK_BITS(32);
size_and_mask = 0;
}
break;
case X86_VENDOR_CYRIX:
if (cpu_feature_enabled(X86_FEATURE_CYRIX_ARR)) {
- mtrr_if = mtrr_ops[X86_VENDOR_CYRIX];
+ mtrr_if = &cyrix_mtrr_ops;
size_or_mask = SIZE_OR_MASK_BITS(32);
size_and_mask = 0;
}
diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.h b/arch/x86/kernel/cpu/mtrr/mtrr.h
index c98928c..02eb587 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.h
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.h
@@ -51,8 +51,6 @@ void fill_mtrr_var_range(unsigned int index,
u32 base_lo, u32 base_hi, u32 mask_lo, u32 mask_hi);
bool get_mtrr_state(void);
-extern void __init set_mtrr_ops(const struct mtrr_ops *ops);
-
extern u64 size_or_mask, size_and_mask;
extern const struct mtrr_ops *mtrr_if;
@@ -66,10 +64,10 @@ void mtrr_state_warn(void);
const char *mtrr_attrib_to_str(int x);
void mtrr_wrmsr(unsigned, unsigned, unsigned);
-/* CPU specific mtrr init functions */
-int amd_init_mtrr(void);
-int cyrix_init_mtrr(void);
-int centaur_init_mtrr(void);
+/* CPU specific mtrr_ops vectors. */
+extern const struct mtrr_ops amd_mtrr_ops;
+extern const struct mtrr_ops cyrix_mtrr_ops;
+extern const struct mtrr_ops centaur_mtrr_ops;
extern int changed_by_mtrr_cleanup;
extern int mtrr_cleanup(unsigned address_bits);
On 01.12.22 17:26, Kirill A. Shutemov wrote:
> On Wed, Nov 02, 2022 at 08:47:10AM +0100, Juergen Gross wrote:
>> Today PAT is usable only with MTRR being active, with some nasty tweaks
>> to make PAT usable when running as Xen PV guest, which doesn't support
>> MTRR.
>>
>> The reason for this coupling is, that both, PAT MSR changes and MTRR
>> changes, require a similar sequence and so full PAT support was added
>> using the already available MTRR handling.
>>
>> Xen PV PAT handling can work without MTRR, as it just needs to consume
>> the PAT MSR setting done by the hypervisor without the ability and need
>> to change it. This in turn has resulted in a convoluted initialization
>> sequence and wrong decisions regarding cache mode availability due to
>> misguiding PAT availability flags.
>>
>> Fix all of that by allowing to use PAT without MTRR and by reworking
>> the current PAT initialization sequence to match better with the newly
>> introduced generic cache initialization.
>>
>> This removes the need of the recently added pat_force_disabled flag, so
>> remove the remnants of the patch adding it.
>>
>> Signed-off-by: Juergen Gross <[email protected]>
>
> This patch breaks boot for TDX guest.
>
> Kernel now tries to set CR0.CD which is forbidden in TDX guest[1] and
> causes #VE:
>
> tdx: Unexpected #VE: 28
> VE fault: 0000 [#1] PREEMPT SMP NOPTI
> CPU: 0 PID: 0 Comm: swapper Not tainted 6.1.0-rc1-00015-gadfe7512e1d0 #2646
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
> RIP: 0010:native_write_cr0 (arch/x86/kernel/cpu/common.c:427)
> Call Trace:
> <TASK>
> ? cache_disable (arch/x86/include/asm/cpufeature.h:173 arch/x86/kernel/cpu/cacheinfo.c:1085)
> ? cache_cpu_init (arch/x86/kernel/cpu/cacheinfo.c:1132 (discriminator 3))
> ? setup_arch (arch/x86/kernel/setup.c:1079)
> ? start_kernel (init/main.c:279 (discriminator 3) init/main.c:477 (discriminator 3) init/main.c:960 (discriminator 3))
> ? load_ucode_bsp (arch/x86/kernel/cpu/microcode/core.c:155)
> ? secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:358)
> </TASK>
>
> Any suggestion how to fix it?
>
> [1] Section 10.6.1. "CR0", https://cdrdv2.intel.com/v1/dl/getContent/733568
What was the solution before?
I guess MTRR was disabled, so there was no PAT, too?
If this is the case, you can go the same route as Xen PV guests do.
Juergen
On Wed, Nov 02, 2022 at 08:47:10AM +0100, Juergen Gross wrote:
> Today PAT is usable only with MTRR being active, with some nasty tweaks
> to make PAT usable when running as Xen PV guest, which doesn't support
> MTRR.
>
> The reason for this coupling is, that both, PAT MSR changes and MTRR
> changes, require a similar sequence and so full PAT support was added
> using the already available MTRR handling.
>
> Xen PV PAT handling can work without MTRR, as it just needs to consume
> the PAT MSR setting done by the hypervisor without the ability and need
> to change it. This in turn has resulted in a convoluted initialization
> sequence and wrong decisions regarding cache mode availability due to
> misguiding PAT availability flags.
>
> Fix all of that by allowing to use PAT without MTRR and by reworking
> the current PAT initialization sequence to match better with the newly
> introduced generic cache initialization.
>
> This removes the need of the recently added pat_force_disabled flag, so
> remove the remnants of the patch adding it.
>
> Signed-off-by: Juergen Gross <[email protected]>
This patch breaks boot for TDX guest.
Kernel now tries to set CR0.CD which is forbidden in TDX guest[1] and
causes #VE:
tdx: Unexpected #VE: 28
VE fault: 0000 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 0 Comm: swapper Not tainted 6.1.0-rc1-00015-gadfe7512e1d0 #2646
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
RIP: 0010:native_write_cr0 (arch/x86/kernel/cpu/common.c:427)
Call Trace:
<TASK>
? cache_disable (arch/x86/include/asm/cpufeature.h:173 arch/x86/kernel/cpu/cacheinfo.c:1085)
? cache_cpu_init (arch/x86/kernel/cpu/cacheinfo.c:1132 (discriminator 3))
? setup_arch (arch/x86/kernel/setup.c:1079)
? start_kernel (init/main.c:279 (discriminator 3) init/main.c:477 (discriminator 3) init/main.c:960 (discriminator 3))
? load_ucode_bsp (arch/x86/kernel/cpu/microcode/core.c:155)
? secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:358)
</TASK>
Any suggestion how to fix it?
[1] Section 10.6.1. "CR0", https://cdrdv2.intel.com/v1/dl/getContent/733568
--
Kiryl Shutsemau / Kirill A. Shutemov
On Thu, Dec 01, 2022 at 05:33:28PM +0100, Juergen Gross wrote:
> On 01.12.22 17:26, Kirill A. Shutemov wrote:
> > On Wed, Nov 02, 2022 at 08:47:10AM +0100, Juergen Gross wrote:
> > > Today PAT is usable only with MTRR being active, with some nasty tweaks
> > > to make PAT usable when running as Xen PV guest, which doesn't support
> > > MTRR.
> > >
> > > The reason for this coupling is, that both, PAT MSR changes and MTRR
> > > changes, require a similar sequence and so full PAT support was added
> > > using the already available MTRR handling.
> > >
> > > Xen PV PAT handling can work without MTRR, as it just needs to consume
> > > the PAT MSR setting done by the hypervisor without the ability and need
> > > to change it. This in turn has resulted in a convoluted initialization
> > > sequence and wrong decisions regarding cache mode availability due to
> > > misguiding PAT availability flags.
> > >
> > > Fix all of that by allowing to use PAT without MTRR and by reworking
> > > the current PAT initialization sequence to match better with the newly
> > > introduced generic cache initialization.
> > >
> > > This removes the need of the recently added pat_force_disabled flag, so
> > > remove the remnants of the patch adding it.
> > >
> > > Signed-off-by: Juergen Gross <[email protected]>
> >
> > This patch breaks boot for TDX guest.
> >
> > Kernel now tries to set CR0.CD which is forbidden in TDX guest[1] and
> > causes #VE:
> >
> > tdx: Unexpected #VE: 28
> > VE fault: 0000 [#1] PREEMPT SMP NOPTI
> > CPU: 0 PID: 0 Comm: swapper Not tainted 6.1.0-rc1-00015-gadfe7512e1d0 #2646
> > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
> > RIP: 0010:native_write_cr0 (arch/x86/kernel/cpu/common.c:427)
> > Call Trace:
> > <TASK>
> > ? cache_disable (arch/x86/include/asm/cpufeature.h:173 arch/x86/kernel/cpu/cacheinfo.c:1085)
> > ? cache_cpu_init (arch/x86/kernel/cpu/cacheinfo.c:1132 (discriminator 3))
> > ? setup_arch (arch/x86/kernel/setup.c:1079)
> > ? start_kernel (init/main.c:279 (discriminator 3) init/main.c:477 (discriminator 3) init/main.c:960 (discriminator 3))
> > ? load_ucode_bsp (arch/x86/kernel/cpu/microcode/core.c:155)
> > ? secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:358)
> > </TASK>
> >
> > Any suggestion how to fix it?
> >
> > [1] Section 10.6.1. "CR0", https://cdrdv2.intel.com/v1/dl/getContent/733568
>
> What was the solution before?
>
> I guess MTRR was disabled, so there was no PAT, too?
Right:
Linus' tree:
[ 0.002589] last_pfn = 0x480000 max_arch_pfn = 0x10000000000
[ 0.003976] Disabled
[ 0.004452] x86/PAT: MTRRs disabled, skipping PAT initialization too.
[ 0.005856] CPU MTRRs all blank - virtualized system.
[ 0.006915] x86/PAT: Configuration [0-7]: WB WT UC- UC WB WT UC- UC
tip/master:
[ 0.003443] last_pfn = 0x20b8e max_arch_pfn = 0x10000000000
[ 0.005220] Disabled
[ 0.005818] x86/PAT: Configuration [0-7]: WB WC UC- UC WB WP UC- WT
[ 0.007752] tdx: Unexpected #VE: 28
The dangling "Disabled" comes mtrr_bp_init().
> If this is the case, you can go the same route as Xen PV guests do.
Any reason X86_FEATURE_HYPERVISOR cannot be used instead of
X86_FEATURE_XENPV there?
Do we have any virtualized platform that supports it?
--
Kiryl Shutsemau / Kirill A. Shutemov
On 02.12.22 00:57, Kirill A. Shutemov wrote:
> On Thu, Dec 01, 2022 at 05:33:28PM +0100, Juergen Gross wrote:
>> On 01.12.22 17:26, Kirill A. Shutemov wrote:
>>> On Wed, Nov 02, 2022 at 08:47:10AM +0100, Juergen Gross wrote:
>>>> Today PAT is usable only with MTRR being active, with some nasty tweaks
>>>> to make PAT usable when running as Xen PV guest, which doesn't support
>>>> MTRR.
>>>>
>>>> The reason for this coupling is, that both, PAT MSR changes and MTRR
>>>> changes, require a similar sequence and so full PAT support was added
>>>> using the already available MTRR handling.
>>>>
>>>> Xen PV PAT handling can work without MTRR, as it just needs to consume
>>>> the PAT MSR setting done by the hypervisor without the ability and need
>>>> to change it. This in turn has resulted in a convoluted initialization
>>>> sequence and wrong decisions regarding cache mode availability due to
>>>> misguiding PAT availability flags.
>>>>
>>>> Fix all of that by allowing to use PAT without MTRR and by reworking
>>>> the current PAT initialization sequence to match better with the newly
>>>> introduced generic cache initialization.
>>>>
>>>> This removes the need of the recently added pat_force_disabled flag, so
>>>> remove the remnants of the patch adding it.
>>>>
>>>> Signed-off-by: Juergen Gross <[email protected]>
>>>
>>> This patch breaks boot for TDX guest.
>>>
>>> Kernel now tries to set CR0.CD which is forbidden in TDX guest[1] and
>>> causes #VE:
>>>
>>> tdx: Unexpected #VE: 28
>>> VE fault: 0000 [#1] PREEMPT SMP NOPTI
>>> CPU: 0 PID: 0 Comm: swapper Not tainted 6.1.0-rc1-00015-gadfe7512e1d0 #2646
>>> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
>>> RIP: 0010:native_write_cr0 (arch/x86/kernel/cpu/common.c:427)
>>> Call Trace:
>>> <TASK>
>>> ? cache_disable (arch/x86/include/asm/cpufeature.h:173 arch/x86/kernel/cpu/cacheinfo.c:1085)
>>> ? cache_cpu_init (arch/x86/kernel/cpu/cacheinfo.c:1132 (discriminator 3))
>>> ? setup_arch (arch/x86/kernel/setup.c:1079)
>>> ? start_kernel (init/main.c:279 (discriminator 3) init/main.c:477 (discriminator 3) init/main.c:960 (discriminator 3))
>>> ? load_ucode_bsp (arch/x86/kernel/cpu/microcode/core.c:155)
>>> ? secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:358)
>>> </TASK>
>>>
>>> Any suggestion how to fix it?
>>>
>>> [1] Section 10.6.1. "CR0", https://cdrdv2.intel.com/v1/dl/getContent/733568
>>
>> What was the solution before?
>>
>> I guess MTRR was disabled, so there was no PAT, too?
>
> Right:
>
> Linus' tree:
>
> [ 0.002589] last_pfn = 0x480000 max_arch_pfn = 0x10000000000
> [ 0.003976] Disabled
> [ 0.004452] x86/PAT: MTRRs disabled, skipping PAT initialization too.
> [ 0.005856] CPU MTRRs all blank - virtualized system.
> [ 0.006915] x86/PAT: Configuration [0-7]: WB WT UC- UC WB WT UC- UC
>
> tip/master:
>
> [ 0.003443] last_pfn = 0x20b8e max_arch_pfn = 0x10000000000
> [ 0.005220] Disabled
> [ 0.005818] x86/PAT: Configuration [0-7]: WB WC UC- UC WB WP UC- WT
> [ 0.007752] tdx: Unexpected #VE: 28
>
> The dangling "Disabled" comes mtrr_bp_init().
>
>
>> If this is the case, you can go the same route as Xen PV guests do.
>
> Any reason X86_FEATURE_HYPERVISOR cannot be used instead of
> X86_FEATURE_XENPV there?
>
> Do we have any virtualized platform that supports it?
Yes, of course. Any hardware virtualized guest should be able to use it,
obviously TDX guests are the first ones not being able to do so.
And above dmesg snipplets are showing rather nicely that not disabling
PAT completely should be a benefit for TDX guests, as all caching modes
would be usable (the PAT MSR seems to be initialized quite fine).
Instead of X86_FEATURE_XENPV we could introduce something like
X86_FEATURE_PAT_READONLY, which could be set for Xen PV guests and for
TDX guests.
Juergen
On 02.12.22 14:27, Kirill A. Shutemov wrote:
> On Fri, Dec 02, 2022 at 06:56:47AM +0100, Juergen Gross wrote:
>> On 02.12.22 00:57, Kirill A. Shutemov wrote:
>>> On Thu, Dec 01, 2022 at 05:33:28PM +0100, Juergen Gross wrote:
>>>> On 01.12.22 17:26, Kirill A. Shutemov wrote:
>>>>> On Wed, Nov 02, 2022 at 08:47:10AM +0100, Juergen Gross wrote:
>>>>>> Today PAT is usable only with MTRR being active, with some nasty tweaks
>>>>>> to make PAT usable when running as Xen PV guest, which doesn't support
>>>>>> MTRR.
>>>>>>
>>>>>> The reason for this coupling is, that both, PAT MSR changes and MTRR
>>>>>> changes, require a similar sequence and so full PAT support was added
>>>>>> using the already available MTRR handling.
>>>>>>
>>>>>> Xen PV PAT handling can work without MTRR, as it just needs to consume
>>>>>> the PAT MSR setting done by the hypervisor without the ability and need
>>>>>> to change it. This in turn has resulted in a convoluted initialization
>>>>>> sequence and wrong decisions regarding cache mode availability due to
>>>>>> misguiding PAT availability flags.
>>>>>>
>>>>>> Fix all of that by allowing to use PAT without MTRR and by reworking
>>>>>> the current PAT initialization sequence to match better with the newly
>>>>>> introduced generic cache initialization.
>>>>>>
>>>>>> This removes the need of the recently added pat_force_disabled flag, so
>>>>>> remove the remnants of the patch adding it.
>>>>>>
>>>>>> Signed-off-by: Juergen Gross <[email protected]>
>>>>>
>>>>> This patch breaks boot for TDX guest.
>>>>>
>>>>> Kernel now tries to set CR0.CD which is forbidden in TDX guest[1] and
>>>>> causes #VE:
>>>>>
>>>>> tdx: Unexpected #VE: 28
>>>>> VE fault: 0000 [#1] PREEMPT SMP NOPTI
>>>>> CPU: 0 PID: 0 Comm: swapper Not tainted 6.1.0-rc1-00015-gadfe7512e1d0 #2646
>>>>> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
>>>>> RIP: 0010:native_write_cr0 (arch/x86/kernel/cpu/common.c:427)
>>>>> Call Trace:
>>>>> <TASK>
>>>>> ? cache_disable (arch/x86/include/asm/cpufeature.h:173 arch/x86/kernel/cpu/cacheinfo.c:1085)
>>>>> ? cache_cpu_init (arch/x86/kernel/cpu/cacheinfo.c:1132 (discriminator 3))
>>>>> ? setup_arch (arch/x86/kernel/setup.c:1079)
>>>>> ? start_kernel (init/main.c:279 (discriminator 3) init/main.c:477 (discriminator 3) init/main.c:960 (discriminator 3))
>>>>> ? load_ucode_bsp (arch/x86/kernel/cpu/microcode/core.c:155)
>>>>> ? secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:358)
>>>>> </TASK>
>>>>>
>>>>> Any suggestion how to fix it?
>>>>>
>>>>> [1] Section 10.6.1. "CR0", https://cdrdv2.intel.com/v1/dl/getContent/733568
>>>>
>>>> What was the solution before?
>>>>
>>>> I guess MTRR was disabled, so there was no PAT, too?
>>>
>>> Right:
>>>
>>> Linus' tree:
>>>
>>> [ 0.002589] last_pfn = 0x480000 max_arch_pfn = 0x10000000000
>>> [ 0.003976] Disabled
>>> [ 0.004452] x86/PAT: MTRRs disabled, skipping PAT initialization too.
>>> [ 0.005856] CPU MTRRs all blank - virtualized system.
>>> [ 0.006915] x86/PAT: Configuration [0-7]: WB WT UC- UC WB WT UC- UC
>>>
>>> tip/master:
>>>
>>> [ 0.003443] last_pfn = 0x20b8e max_arch_pfn = 0x10000000000
>>> [ 0.005220] Disabled
>>> [ 0.005818] x86/PAT: Configuration [0-7]: WB WC UC- UC WB WP UC- WT
>>> [ 0.007752] tdx: Unexpected #VE: 28
>>>
>>> The dangling "Disabled" comes mtrr_bp_init().
>>>
>>>
>>>> If this is the case, you can go the same route as Xen PV guests do.
>>>
>>> Any reason X86_FEATURE_HYPERVISOR cannot be used instead of
>>> X86_FEATURE_XENPV there?
>>>
>>> Do we have any virtualized platform that supports it?
>>
>> Yes, of course. Any hardware virtualized guest should be able to use it,
>> obviously TDX guests are the first ones not being able to do so.
>>
>> And above dmesg snipplets are showing rather nicely that not disabling
>> PAT completely should be a benefit for TDX guests, as all caching modes
>> would be usable (the PAT MSR seems to be initialized quite fine).
>>
>> Instead of X86_FEATURE_XENPV we could introduce something like
>> X86_FEATURE_PAT_READONLY, which could be set for Xen PV guests and for
>> TDX guests.
>
> Technically, the MSR is writable on TDX. But it seems there's no way to
> properly change it, following the protocol of changing on MP systems.
Why not? I don't see why it is possible in a non-TDX system, but not within
a TDX guest.
> Although, I don't quite follow what role cache disabling playing on system
> with self-snoop support. Hm?
It is the recommended way to do it. See SDM Vol. 3 Chapter 11 ("Memory Cache
Control"):
The operating system is responsible for insuring that changes to a PAT entry
occur in a manner that maintains the consistency of the processor caches and
translation lookaside buffers (TLB). This is accomplished by following the
procedure as specified in Section 11.11.8, “MTRR Considerations in MP Systems,
”for changing the value of an MTRR in a multiple processor system. It requires
a specific sequence of operations that includes flushing the processors caches
and TLBs.
And the sequence for the MTRRs is:
1. Broadcast to all processors to execute the following code sequence.
2. Disable interrupts.
3. Wait for all processors to reach this point.
4. Enter the no-fill cache mode. (Set the CD flag in control register CR0 to 1
and the NW flag to 0.)
5. Flush all caches using the WBINVD instructions. Note on a processor that
supports self-snooping, CPUID feature flag bit 27, this step is unnecessary.
6. If the PGE flag is set in control register CR4, flush all TLBs by clearing
that flag.
7. If the PGE flag is clear in control register CR4, flush all TLBs by executing
a MOV from control register CR3 to another register and then a MOV from that
register back to CR3.
8. Disable all range registers (by clearing the E flag in register MTRRdefType).
If only variable ranges are being modified, software may clear the valid bits
for the affected register pairs instead.
9. Update the MTRRs.
10. Enable all range registers (by setting the E flag in register MTRRdefType).
If only variable-range registers were modified and their individual valid
bits were cleared, then set the valid bits for the affected ranges instead.
11. Flush all caches and all TLBs a second time. (The TLB flush is required for
Pentium 4, Intel Xeon, and P6 family processors. Executing the WBINVD
instruction is not needed when using Pentium 4, Intel Xeon, and P6 family
processors, but it may be needed in future systems.)
12. Enter the normal cache mode to re-enable caching. (Set the CD and NW flags
in control register CR0 to 0.)
13. Set PGE flag in control register CR4, if cleared in Step 6 (above).
14. Wait for all processors to reach this point.
15. Enable interrupts.
So cache disabling is recommended.
Juergen
On Fri, Dec 02, 2022 at 06:56:47AM +0100, Juergen Gross wrote:
> On 02.12.22 00:57, Kirill A. Shutemov wrote:
> > On Thu, Dec 01, 2022 at 05:33:28PM +0100, Juergen Gross wrote:
> > > On 01.12.22 17:26, Kirill A. Shutemov wrote:
> > > > On Wed, Nov 02, 2022 at 08:47:10AM +0100, Juergen Gross wrote:
> > > > > Today PAT is usable only with MTRR being active, with some nasty tweaks
> > > > > to make PAT usable when running as Xen PV guest, which doesn't support
> > > > > MTRR.
> > > > >
> > > > > The reason for this coupling is, that both, PAT MSR changes and MTRR
> > > > > changes, require a similar sequence and so full PAT support was added
> > > > > using the already available MTRR handling.
> > > > >
> > > > > Xen PV PAT handling can work without MTRR, as it just needs to consume
> > > > > the PAT MSR setting done by the hypervisor without the ability and need
> > > > > to change it. This in turn has resulted in a convoluted initialization
> > > > > sequence and wrong decisions regarding cache mode availability due to
> > > > > misguiding PAT availability flags.
> > > > >
> > > > > Fix all of that by allowing to use PAT without MTRR and by reworking
> > > > > the current PAT initialization sequence to match better with the newly
> > > > > introduced generic cache initialization.
> > > > >
> > > > > This removes the need of the recently added pat_force_disabled flag, so
> > > > > remove the remnants of the patch adding it.
> > > > >
> > > > > Signed-off-by: Juergen Gross <[email protected]>
> > > >
> > > > This patch breaks boot for TDX guest.
> > > >
> > > > Kernel now tries to set CR0.CD which is forbidden in TDX guest[1] and
> > > > causes #VE:
> > > >
> > > > tdx: Unexpected #VE: 28
> > > > VE fault: 0000 [#1] PREEMPT SMP NOPTI
> > > > CPU: 0 PID: 0 Comm: swapper Not tainted 6.1.0-rc1-00015-gadfe7512e1d0 #2646
> > > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
> > > > RIP: 0010:native_write_cr0 (arch/x86/kernel/cpu/common.c:427)
> > > > Call Trace:
> > > > <TASK>
> > > > ? cache_disable (arch/x86/include/asm/cpufeature.h:173 arch/x86/kernel/cpu/cacheinfo.c:1085)
> > > > ? cache_cpu_init (arch/x86/kernel/cpu/cacheinfo.c:1132 (discriminator 3))
> > > > ? setup_arch (arch/x86/kernel/setup.c:1079)
> > > > ? start_kernel (init/main.c:279 (discriminator 3) init/main.c:477 (discriminator 3) init/main.c:960 (discriminator 3))
> > > > ? load_ucode_bsp (arch/x86/kernel/cpu/microcode/core.c:155)
> > > > ? secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:358)
> > > > </TASK>
> > > >
> > > > Any suggestion how to fix it?
> > > >
> > > > [1] Section 10.6.1. "CR0", https://cdrdv2.intel.com/v1/dl/getContent/733568
> > >
> > > What was the solution before?
> > >
> > > I guess MTRR was disabled, so there was no PAT, too?
> >
> > Right:
> >
> > Linus' tree:
> >
> > [ 0.002589] last_pfn = 0x480000 max_arch_pfn = 0x10000000000
> > [ 0.003976] Disabled
> > [ 0.004452] x86/PAT: MTRRs disabled, skipping PAT initialization too.
> > [ 0.005856] CPU MTRRs all blank - virtualized system.
> > [ 0.006915] x86/PAT: Configuration [0-7]: WB WT UC- UC WB WT UC- UC
> >
> > tip/master:
> >
> > [ 0.003443] last_pfn = 0x20b8e max_arch_pfn = 0x10000000000
> > [ 0.005220] Disabled
> > [ 0.005818] x86/PAT: Configuration [0-7]: WB WC UC- UC WB WP UC- WT
> > [ 0.007752] tdx: Unexpected #VE: 28
> >
> > The dangling "Disabled" comes mtrr_bp_init().
> >
> >
> > > If this is the case, you can go the same route as Xen PV guests do.
> >
> > Any reason X86_FEATURE_HYPERVISOR cannot be used instead of
> > X86_FEATURE_XENPV there?
> >
> > Do we have any virtualized platform that supports it?
>
> Yes, of course. Any hardware virtualized guest should be able to use it,
> obviously TDX guests are the first ones not being able to do so.
>
> And above dmesg snipplets are showing rather nicely that not disabling
> PAT completely should be a benefit for TDX guests, as all caching modes
> would be usable (the PAT MSR seems to be initialized quite fine).
>
> Instead of X86_FEATURE_XENPV we could introduce something like
> X86_FEATURE_PAT_READONLY, which could be set for Xen PV guests and for
> TDX guests.
Technically, the MSR is writable on TDX. But it seems there's no way to
properly change it, following the protocol of changing on MP systems.
Although, I don't quite follow what role cache disabling playing on system
with self-snoop support. Hm?
--
Kiryl Shutsemau / Kirill A. Shutemov
On Fri, Dec 02, 2022 at 06:56:47AM +0100, Juergen Gross wrote:
> Instead of X86_FEATURE_XENPV we could introduce something like
> X86_FEATURE_PAT_READONLY, which could be set for Xen PV guests and for
> TDX guests.
Until a third type comes which wants its pony to be pink and to dance.
:-\
Nah, we already have X86_FEATURE_TDX_GUEST - let's use that in
pat_bp_init() and exit there early and be done with it. This way it is
also self-documenting who can and cannot deal with that code.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Fri, Dec 02, 2022 at 02:39:58PM +0100, Juergen Gross wrote:
> On 02.12.22 14:27, Kirill A. Shutemov wrote:
> > On Fri, Dec 02, 2022 at 06:56:47AM +0100, Juergen Gross wrote:
> > > On 02.12.22 00:57, Kirill A. Shutemov wrote:
> > > > On Thu, Dec 01, 2022 at 05:33:28PM +0100, Juergen Gross wrote:
> > > > > On 01.12.22 17:26, Kirill A. Shutemov wrote:
> > > > > > On Wed, Nov 02, 2022 at 08:47:10AM +0100, Juergen Gross wrote:
> > > > > > > Today PAT is usable only with MTRR being active, with some nasty tweaks
> > > > > > > to make PAT usable when running as Xen PV guest, which doesn't support
> > > > > > > MTRR.
> > > > > > >
> > > > > > > The reason for this coupling is, that both, PAT MSR changes and MTRR
> > > > > > > changes, require a similar sequence and so full PAT support was added
> > > > > > > using the already available MTRR handling.
> > > > > > >
> > > > > > > Xen PV PAT handling can work without MTRR, as it just needs to consume
> > > > > > > the PAT MSR setting done by the hypervisor without the ability and need
> > > > > > > to change it. This in turn has resulted in a convoluted initialization
> > > > > > > sequence and wrong decisions regarding cache mode availability due to
> > > > > > > misguiding PAT availability flags.
> > > > > > >
> > > > > > > Fix all of that by allowing to use PAT without MTRR and by reworking
> > > > > > > the current PAT initialization sequence to match better with the newly
> > > > > > > introduced generic cache initialization.
> > > > > > >
> > > > > > > This removes the need of the recently added pat_force_disabled flag, so
> > > > > > > remove the remnants of the patch adding it.
> > > > > > >
> > > > > > > Signed-off-by: Juergen Gross <[email protected]>
> > > > > >
> > > > > > This patch breaks boot for TDX guest.
> > > > > >
> > > > > > Kernel now tries to set CR0.CD which is forbidden in TDX guest[1] and
> > > > > > causes #VE:
> > > > > >
> > > > > > tdx: Unexpected #VE: 28
> > > > > > VE fault: 0000 [#1] PREEMPT SMP NOPTI
> > > > > > CPU: 0 PID: 0 Comm: swapper Not tainted 6.1.0-rc1-00015-gadfe7512e1d0 #2646
> > > > > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
> > > > > > RIP: 0010:native_write_cr0 (arch/x86/kernel/cpu/common.c:427)
> > > > > > Call Trace:
> > > > > > <TASK>
> > > > > > ? cache_disable (arch/x86/include/asm/cpufeature.h:173 arch/x86/kernel/cpu/cacheinfo.c:1085)
> > > > > > ? cache_cpu_init (arch/x86/kernel/cpu/cacheinfo.c:1132 (discriminator 3))
> > > > > > ? setup_arch (arch/x86/kernel/setup.c:1079)
> > > > > > ? start_kernel (init/main.c:279 (discriminator 3) init/main.c:477 (discriminator 3) init/main.c:960 (discriminator 3))
> > > > > > ? load_ucode_bsp (arch/x86/kernel/cpu/microcode/core.c:155)
> > > > > > ? secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:358)
> > > > > > </TASK>
> > > > > >
> > > > > > Any suggestion how to fix it?
> > > > > >
> > > > > > [1] Section 10.6.1. "CR0", https://cdrdv2.intel.com/v1/dl/getContent/733568
> > > > >
> > > > > What was the solution before?
> > > > >
> > > > > I guess MTRR was disabled, so there was no PAT, too?
> > > >
> > > > Right:
> > > >
> > > > Linus' tree:
> > > >
> > > > [ 0.002589] last_pfn = 0x480000 max_arch_pfn = 0x10000000000
> > > > [ 0.003976] Disabled
> > > > [ 0.004452] x86/PAT: MTRRs disabled, skipping PAT initialization too.
> > > > [ 0.005856] CPU MTRRs all blank - virtualized system.
> > > > [ 0.006915] x86/PAT: Configuration [0-7]: WB WT UC- UC WB WT UC- UC
> > > >
> > > > tip/master:
> > > >
> > > > [ 0.003443] last_pfn = 0x20b8e max_arch_pfn = 0x10000000000
> > > > [ 0.005220] Disabled
> > > > [ 0.005818] x86/PAT: Configuration [0-7]: WB WC UC- UC WB WP UC- WT
> > > > [ 0.007752] tdx: Unexpected #VE: 28
> > > >
> > > > The dangling "Disabled" comes mtrr_bp_init().
> > > >
> > > >
> > > > > If this is the case, you can go the same route as Xen PV guests do.
> > > >
> > > > Any reason X86_FEATURE_HYPERVISOR cannot be used instead of
> > > > X86_FEATURE_XENPV there?
> > > >
> > > > Do we have any virtualized platform that supports it?
> > >
> > > Yes, of course. Any hardware virtualized guest should be able to use it,
> > > obviously TDX guests are the first ones not being able to do so.
> > >
> > > And above dmesg snipplets are showing rather nicely that not disabling
> > > PAT completely should be a benefit for TDX guests, as all caching modes
> > > would be usable (the PAT MSR seems to be initialized quite fine).
> > >
> > > Instead of X86_FEATURE_XENPV we could introduce something like
> > > X86_FEATURE_PAT_READONLY, which could be set for Xen PV guests and for
> > > TDX guests.
> >
> > Technically, the MSR is writable on TDX. But it seems there's no way to
> > properly change it, following the protocol of changing on MP systems.
>
> Why not? I don't see why it is possible in a non-TDX system, but not within
> a TDX guest.
Because the protocol you described below requires setting CR0.CD which is
not allowed in TDX guest and causes #VE.
> > Although, I don't quite follow what role cache disabling playing on system
> > with self-snoop support. Hm?
>
> It is the recommended way to do it. See SDM Vol. 3 Chapter 11 ("Memory Cache
> Control"):
>
> The operating system is responsible for insuring that changes to a PAT entry
> occur in a manner that maintains the consistency of the processor caches and
> translation lookaside buffers (TLB). This is accomplished by following the
> procedure as specified in Section 11.11.8, “MTRR Considerations in MP Systems,
> ”for changing the value of an MTRR in a multiple processor system. It requires
> a specific sequence of operations that includes flushing the processors caches
> and TLBs.
>
> And the sequence for the MTRRs is:
>
> 1. Broadcast to all processors to execute the following code sequence.
> 2. Disable interrupts.
> 3. Wait for all processors to reach this point.
> 4. Enter the no-fill cache mode. (Set the CD flag in control register CR0 to 1
> and the NW flag to 0.)
> 5. Flush all caches using the WBINVD instructions. Note on a processor that
> supports self-snooping, CPUID feature flag bit 27, this step is unnecessary.
> 6. If the PGE flag is set in control register CR4, flush all TLBs by clearing
> that flag.
> 7. If the PGE flag is clear in control register CR4, flush all TLBs by executing
> a MOV from control register CR3 to another register and then a MOV from that
> register back to CR3.
> 8. Disable all range registers (by clearing the E flag in register MTRRdefType).
> If only variable ranges are being modified, software may clear the valid bits
> for the affected register pairs instead.
> 9. Update the MTRRs.
> 10. Enable all range registers (by setting the E flag in register MTRRdefType).
> If only variable-range registers were modified and their individual valid
> bits were cleared, then set the valid bits for the affected ranges instead.
> 11. Flush all caches and all TLBs a second time. (The TLB flush is required for
> Pentium 4, Intel Xeon, and P6 family processors. Executing the WBINVD
> instruction is not needed when using Pentium 4, Intel Xeon, and P6 family
> processors, but it may be needed in future systems.)
> 12. Enter the normal cache mode to re-enable caching. (Set the CD and NW flags
> in control register CR0 to 0.)
> 13. Set PGE flag in control register CR4, if cleared in Step 6 (above).
> 14. Wait for all processors to reach this point.
> 15. Enable interrupts.
>
> So cache disabling is recommended.
Yeah, I read that.
But the question is what kind of scenario cache disabling is actually
prevents if self-snoop is supported? In this case cache stays intact (no
WBINVD). The next time a cache line gets accessed with different caching
mode the old line gets snooped, right?
Would it be valid to avoid touching CR0.CD if X86_FEATURE_SELFSNOOP?
--
Kiryl Shutsemau / Kirill A. Shutemov
On 02.12.22 15:33, Kirill A. Shutemov wrote:
> On Fri, Dec 02, 2022 at 02:39:58PM +0100, Juergen Gross wrote:
>> On 02.12.22 14:27, Kirill A. Shutemov wrote:
>>> On Fri, Dec 02, 2022 at 06:56:47AM +0100, Juergen Gross wrote:
>>>> On 02.12.22 00:57, Kirill A. Shutemov wrote:
>>>>> On Thu, Dec 01, 2022 at 05:33:28PM +0100, Juergen Gross wrote:
>>>>>> On 01.12.22 17:26, Kirill A. Shutemov wrote:
>>>>>>> On Wed, Nov 02, 2022 at 08:47:10AM +0100, Juergen Gross wrote:
>>>>>>>> Today PAT is usable only with MTRR being active, with some nasty tweaks
>>>>>>>> to make PAT usable when running as Xen PV guest, which doesn't support
>>>>>>>> MTRR.
>>>>>>>>
>>>>>>>> The reason for this coupling is, that both, PAT MSR changes and MTRR
>>>>>>>> changes, require a similar sequence and so full PAT support was added
>>>>>>>> using the already available MTRR handling.
>>>>>>>>
>>>>>>>> Xen PV PAT handling can work without MTRR, as it just needs to consume
>>>>>>>> the PAT MSR setting done by the hypervisor without the ability and need
>>>>>>>> to change it. This in turn has resulted in a convoluted initialization
>>>>>>>> sequence and wrong decisions regarding cache mode availability due to
>>>>>>>> misguiding PAT availability flags.
>>>>>>>>
>>>>>>>> Fix all of that by allowing to use PAT without MTRR and by reworking
>>>>>>>> the current PAT initialization sequence to match better with the newly
>>>>>>>> introduced generic cache initialization.
>>>>>>>>
>>>>>>>> This removes the need of the recently added pat_force_disabled flag, so
>>>>>>>> remove the remnants of the patch adding it.
>>>>>>>>
>>>>>>>> Signed-off-by: Juergen Gross <[email protected]>
>>>>>>>
>>>>>>> This patch breaks boot for TDX guest.
>>>>>>>
>>>>>>> Kernel now tries to set CR0.CD which is forbidden in TDX guest[1] and
>>>>>>> causes #VE:
>>>>>>>
>>>>>>> tdx: Unexpected #VE: 28
>>>>>>> VE fault: 0000 [#1] PREEMPT SMP NOPTI
>>>>>>> CPU: 0 PID: 0 Comm: swapper Not tainted 6.1.0-rc1-00015-gadfe7512e1d0 #2646
>>>>>>> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
>>>>>>> RIP: 0010:native_write_cr0 (arch/x86/kernel/cpu/common.c:427)
>>>>>>> Call Trace:
>>>>>>> <TASK>
>>>>>>> ? cache_disable (arch/x86/include/asm/cpufeature.h:173 arch/x86/kernel/cpu/cacheinfo.c:1085)
>>>>>>> ? cache_cpu_init (arch/x86/kernel/cpu/cacheinfo.c:1132 (discriminator 3))
>>>>>>> ? setup_arch (arch/x86/kernel/setup.c:1079)
>>>>>>> ? start_kernel (init/main.c:279 (discriminator 3) init/main.c:477 (discriminator 3) init/main.c:960 (discriminator 3))
>>>>>>> ? load_ucode_bsp (arch/x86/kernel/cpu/microcode/core.c:155)
>>>>>>> ? secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:358)
>>>>>>> </TASK>
>>>>>>>
>>>>>>> Any suggestion how to fix it?
>>>>>>>
>>>>>>> [1] Section 10.6.1. "CR0", https://cdrdv2.intel.com/v1/dl/getContent/733568
>>>>>>
>>>>>> What was the solution before?
>>>>>>
>>>>>> I guess MTRR was disabled, so there was no PAT, too?
>>>>>
>>>>> Right:
>>>>>
>>>>> Linus' tree:
>>>>>
>>>>> [ 0.002589] last_pfn = 0x480000 max_arch_pfn = 0x10000000000
>>>>> [ 0.003976] Disabled
>>>>> [ 0.004452] x86/PAT: MTRRs disabled, skipping PAT initialization too.
>>>>> [ 0.005856] CPU MTRRs all blank - virtualized system.
>>>>> [ 0.006915] x86/PAT: Configuration [0-7]: WB WT UC- UC WB WT UC- UC
>>>>>
>>>>> tip/master:
>>>>>
>>>>> [ 0.003443] last_pfn = 0x20b8e max_arch_pfn = 0x10000000000
>>>>> [ 0.005220] Disabled
>>>>> [ 0.005818] x86/PAT: Configuration [0-7]: WB WC UC- UC WB WP UC- WT
>>>>> [ 0.007752] tdx: Unexpected #VE: 28
>>>>>
>>>>> The dangling "Disabled" comes mtrr_bp_init().
>>>>>
>>>>>
>>>>>> If this is the case, you can go the same route as Xen PV guests do.
>>>>>
>>>>> Any reason X86_FEATURE_HYPERVISOR cannot be used instead of
>>>>> X86_FEATURE_XENPV there?
>>>>>
>>>>> Do we have any virtualized platform that supports it?
>>>>
>>>> Yes, of course. Any hardware virtualized guest should be able to use it,
>>>> obviously TDX guests are the first ones not being able to do so.
>>>>
>>>> And above dmesg snipplets are showing rather nicely that not disabling
>>>> PAT completely should be a benefit for TDX guests, as all caching modes
>>>> would be usable (the PAT MSR seems to be initialized quite fine).
>>>>
>>>> Instead of X86_FEATURE_XENPV we could introduce something like
>>>> X86_FEATURE_PAT_READONLY, which could be set for Xen PV guests and for
>>>> TDX guests.
>>>
>>> Technically, the MSR is writable on TDX. But it seems there's no way to
>>> properly change it, following the protocol of changing on MP systems.
>>
>> Why not? I don't see why it is possible in a non-TDX system, but not within
>> a TDX guest.
>
> Because the protocol you described below requires setting CR0.CD which is
> not allowed in TDX guest and causes #VE.
Hmm, yes, seems to be a valid reason. :-)
>
>>> Although, I don't quite follow what role cache disabling playing on system
>>> with self-snoop support. Hm?
>>
>> It is the recommended way to do it. See SDM Vol. 3 Chapter 11 ("Memory Cache
>> Control"):
>>
>> The operating system is responsible for insuring that changes to a PAT entry
>> occur in a manner that maintains the consistency of the processor caches and
>> translation lookaside buffers (TLB). This is accomplished by following the
>> procedure as specified in Section 11.11.8, “MTRR Considerations in MP Systems,
>> ”for changing the value of an MTRR in a multiple processor system. It requires
>> a specific sequence of operations that includes flushing the processors caches
>> and TLBs.
>>
>> And the sequence for the MTRRs is:
>>
>> 1. Broadcast to all processors to execute the following code sequence.
>> 2. Disable interrupts.
>> 3. Wait for all processors to reach this point.
>> 4. Enter the no-fill cache mode. (Set the CD flag in control register CR0 to 1
>> and the NW flag to 0.)
>> 5. Flush all caches using the WBINVD instructions. Note on a processor that
>> supports self-snooping, CPUID feature flag bit 27, this step is unnecessary.
>> 6. If the PGE flag is set in control register CR4, flush all TLBs by clearing
>> that flag.
>> 7. If the PGE flag is clear in control register CR4, flush all TLBs by executing
>> a MOV from control register CR3 to another register and then a MOV from that
>> register back to CR3.
>> 8. Disable all range registers (by clearing the E flag in register MTRRdefType).
>> If only variable ranges are being modified, software may clear the valid bits
>> for the affected register pairs instead.
>> 9. Update the MTRRs.
>> 10. Enable all range registers (by setting the E flag in register MTRRdefType).
>> If only variable-range registers were modified and their individual valid
>> bits were cleared, then set the valid bits for the affected ranges instead.
>> 11. Flush all caches and all TLBs a second time. (The TLB flush is required for
>> Pentium 4, Intel Xeon, and P6 family processors. Executing the WBINVD
>> instruction is not needed when using Pentium 4, Intel Xeon, and P6 family
>> processors, but it may be needed in future systems.)
>> 12. Enter the normal cache mode to re-enable caching. (Set the CD and NW flags
>> in control register CR0 to 0.)
>> 13. Set PGE flag in control register CR4, if cleared in Step 6 (above).
>> 14. Wait for all processors to reach this point.
>> 15. Enable interrupts.
>>
>> So cache disabling is recommended.
>
> Yeah, I read that.
>
> But the question is what kind of scenario cache disabling is actually
> prevents if self-snoop is supported? In this case cache stays intact (no
> WBINVD). The next time a cache line gets accessed with different caching
> mode the old line gets snooped, right?
>
> Would it be valid to avoid touching CR0.CD if X86_FEATURE_SELFSNOOP?
>
That's a question for the Intel architects, I guess.
I'd just ask them how to setup PAT in TDX guests. Either they need to
change the recommended setup sequence, or the PAT support bit needs to
be cleared IMO.
Juergen
On 02.12.22 15:56, Juergen Gross wrote:
> On 02.12.22 15:33, Kirill A. Shutemov wrote:
>> On Fri, Dec 02, 2022 at 02:39:58PM +0100, Juergen Gross wrote:
>>> On 02.12.22 14:27, Kirill A. Shutemov wrote:
>>>> On Fri, Dec 02, 2022 at 06:56:47AM +0100, Juergen Gross wrote:
>>>>> On 02.12.22 00:57, Kirill A. Shutemov wrote:
>>>>>> On Thu, Dec 01, 2022 at 05:33:28PM +0100, Juergen Gross wrote:
>>>>>>> On 01.12.22 17:26, Kirill A. Shutemov wrote:
>>>>>>>> On Wed, Nov 02, 2022 at 08:47:10AM +0100, Juergen Gross wrote:
>>>>>>>>> Today PAT is usable only with MTRR being active, with some nasty tweaks
>>>>>>>>> to make PAT usable when running as Xen PV guest, which doesn't support
>>>>>>>>> MTRR.
>>>>>>>>>
>>>>>>>>> The reason for this coupling is, that both, PAT MSR changes and MTRR
>>>>>>>>> changes, require a similar sequence and so full PAT support was added
>>>>>>>>> using the already available MTRR handling.
>>>>>>>>>
>>>>>>>>> Xen PV PAT handling can work without MTRR, as it just needs to consume
>>>>>>>>> the PAT MSR setting done by the hypervisor without the ability and need
>>>>>>>>> to change it. This in turn has resulted in a convoluted initialization
>>>>>>>>> sequence and wrong decisions regarding cache mode availability due to
>>>>>>>>> misguiding PAT availability flags.
>>>>>>>>>
>>>>>>>>> Fix all of that by allowing to use PAT without MTRR and by reworking
>>>>>>>>> the current PAT initialization sequence to match better with the newly
>>>>>>>>> introduced generic cache initialization.
>>>>>>>>>
>>>>>>>>> This removes the need of the recently added pat_force_disabled flag, so
>>>>>>>>> remove the remnants of the patch adding it.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Juergen Gross <[email protected]>
>>>>>>>>
>>>>>>>> This patch breaks boot for TDX guest.
>>>>>>>>
>>>>>>>> Kernel now tries to set CR0.CD which is forbidden in TDX guest[1] and
>>>>>>>> causes #VE:
>>>>>>>>
>>>>>>>> tdx: Unexpected #VE: 28
>>>>>>>> VE fault: 0000 [#1] PREEMPT SMP NOPTI
>>>>>>>> CPU: 0 PID: 0 Comm: swapper Not tainted
>>>>>>>> 6.1.0-rc1-00015-gadfe7512e1d0 #2646
>>>>>>>> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0
>>>>>>>> 02/06/2015
>>>>>>>> RIP: 0010:native_write_cr0 (arch/x86/kernel/cpu/common.c:427)
>>>>>>>> Call Trace:
>>>>>>>> <TASK>
>>>>>>>> ? cache_disable (arch/x86/include/asm/cpufeature.h:173
>>>>>>>> arch/x86/kernel/cpu/cacheinfo.c:1085)
>>>>>>>> ? cache_cpu_init (arch/x86/kernel/cpu/cacheinfo.c:1132
>>>>>>>> (discriminator 3))
>>>>>>>> ? setup_arch (arch/x86/kernel/setup.c:1079)
>>>>>>>> ? start_kernel (init/main.c:279 (discriminator 3) init/main.c:477
>>>>>>>> (discriminator 3) init/main.c:960 (discriminator 3))
>>>>>>>> ? load_ucode_bsp (arch/x86/kernel/cpu/microcode/core.c:155)
>>>>>>>> ? secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:358)
>>>>>>>> </TASK>
>>>>>>>>
>>>>>>>> Any suggestion how to fix it?
>>>>>>>>
>>>>>>>> [1] Section 10.6.1. "CR0", https://cdrdv2.intel.com/v1/dl/getContent/733568
>>>>>>>
>>>>>>> What was the solution before?
>>>>>>>
>>>>>>> I guess MTRR was disabled, so there was no PAT, too?
>>>>>>
>>>>>> Right:
>>>>>>
>>>>>> Linus' tree:
>>>>>>
>>>>>> [ 0.002589] last_pfn = 0x480000 max_arch_pfn = 0x10000000000
>>>>>> [ 0.003976] Disabled
>>>>>> [ 0.004452] x86/PAT: MTRRs disabled, skipping PAT initialization too.
>>>>>> [ 0.005856] CPU MTRRs all blank - virtualized system.
>>>>>> [ 0.006915] x86/PAT: Configuration [0-7]: WB WT UC- UC WB WT UC- UC
>>>>>>
>>>>>> tip/master:
>>>>>>
>>>>>> [ 0.003443] last_pfn = 0x20b8e max_arch_pfn = 0x10000000000
>>>>>> [ 0.005220] Disabled
>>>>>> [ 0.005818] x86/PAT: Configuration [0-7]: WB WC UC- UC WB WP UC- WT
>>>>>> [ 0.007752] tdx: Unexpected #VE: 28
>>>>>>
>>>>>> The dangling "Disabled" comes mtrr_bp_init().
>>>>>>
>>>>>>
>>>>>>> If this is the case, you can go the same route as Xen PV guests do.
>>>>>>
>>>>>> Any reason X86_FEATURE_HYPERVISOR cannot be used instead of
>>>>>> X86_FEATURE_XENPV there?
>>>>>>
>>>>>> Do we have any virtualized platform that supports it?
>>>>>
>>>>> Yes, of course. Any hardware virtualized guest should be able to use it,
>>>>> obviously TDX guests are the first ones not being able to do so.
>>>>>
>>>>> And above dmesg snipplets are showing rather nicely that not disabling
>>>>> PAT completely should be a benefit for TDX guests, as all caching modes
>>>>> would be usable (the PAT MSR seems to be initialized quite fine).
>>>>>
>>>>> Instead of X86_FEATURE_XENPV we could introduce something like
>>>>> X86_FEATURE_PAT_READONLY, which could be set for Xen PV guests and for
>>>>> TDX guests.
>>>>
>>>> Technically, the MSR is writable on TDX. But it seems there's no way to
>>>> properly change it, following the protocol of changing on MP systems.
>>>
>>> Why not? I don't see why it is possible in a non-TDX system, but not within
>>> a TDX guest.
>>
>> Because the protocol you described below requires setting CR0.CD which is
>> not allowed in TDX guest and causes #VE.
>
> Hmm, yes, seems to be a valid reason. :-)
>
>>
>>>> Although, I don't quite follow what role cache disabling playing on system
>>>> with self-snoop support. Hm?
>>>
>>> It is the recommended way to do it. See SDM Vol. 3 Chapter 11 ("Memory Cache
>>> Control"):
>>>
>>> The operating system is responsible for insuring that changes to a PAT entry
>>> occur in a manner that maintains the consistency of the processor caches and
>>> translation lookaside buffers (TLB). This is accomplished by following the
>>> procedure as specified in Section 11.11.8, “MTRR Considerations in MP Systems,
>>> ”for changing the value of an MTRR in a multiple processor system. It requires
>>> a specific sequence of operations that includes flushing the processors caches
>>> and TLBs.
>>>
>>> And the sequence for the MTRRs is:
>>>
>>> 1. Broadcast to all processors to execute the following code sequence.
>>> 2. Disable interrupts.
>>> 3. Wait for all processors to reach this point.
>>> 4. Enter the no-fill cache mode. (Set the CD flag in control register CR0 to 1
>>> and the NW flag to 0.)
>>> 5. Flush all caches using the WBINVD instructions. Note on a processor that
>>> supports self-snooping, CPUID feature flag bit 27, this step is unnecessary.
>>> 6. If the PGE flag is set in control register CR4, flush all TLBs by clearing
>>> that flag.
>>> 7. If the PGE flag is clear in control register CR4, flush all TLBs by executing
>>> a MOV from control register CR3 to another register and then a MOV from that
>>> register back to CR3.
>>> 8. Disable all range registers (by clearing the E flag in register MTRRdefType).
>>> If only variable ranges are being modified, software may clear the valid
>>> bits
>>> for the affected register pairs instead.
>>> 9. Update the MTRRs.
>>> 10. Enable all range registers (by setting the E flag in register MTRRdefType).
>>> If only variable-range registers were modified and their individual valid
>>> bits were cleared, then set the valid bits for the affected ranges instead.
>>> 11. Flush all caches and all TLBs a second time. (The TLB flush is required for
>>> Pentium 4, Intel Xeon, and P6 family processors. Executing the WBINVD
>>> instruction is not needed when using Pentium 4, Intel Xeon, and P6 family
>>> processors, but it may be needed in future systems.)
>>> 12. Enter the normal cache mode to re-enable caching. (Set the CD and NW flags
>>> in control register CR0 to 0.)
>>> 13. Set PGE flag in control register CR4, if cleared in Step 6 (above).
>>> 14. Wait for all processors to reach this point.
>>> 15. Enable interrupts.
>>>
>>> So cache disabling is recommended.
>>
>> Yeah, I read that.
>>
>> But the question is what kind of scenario cache disabling is actually
>> prevents if self-snoop is supported? In this case cache stays intact (no
>> WBINVD). The next time a cache line gets accessed with different caching
>> mode the old line gets snooped, right?
>>
>> Would it be valid to avoid touching CR0.CD if X86_FEATURE_SELFSNOOP?
>>
>
> That's a question for the Intel architects, I guess.
>
> I'd just ask them how to setup PAT in TDX guests. Either they need to
> change the recommended setup sequence, or the PAT support bit needs to
> be cleared IMO.
I've forwarded the question to Intel, BTW.
Another question to you: where does the initial PAT MSR value come from?
I guess from UEFI?
Juergen
On Mon, Dec 05, 2022 at 08:40:06AM +0100, Juergen Gross wrote:
> > That's a question for the Intel architects, I guess.
> >
> > I'd just ask them how to setup PAT in TDX guests. Either they need to
> > change the recommended setup sequence, or the PAT support bit needs to
> > be cleared IMO.
>
> I've forwarded the question to Intel, BTW.
I've initiated the talk internally too.
> Another question to you: where does the initial PAT MSR value come from?
> I guess from UEFI?
It is set by TDX module on initialization. See section 21.2.4.1.2. "TD
VMCS Guest MSRs" of TDX module spec[1]
[1] https://cdrdv2.intel.com/v1/dl/getContent/733568
--
Kiryl Shutsemau / Kirill A. Shutemov