2019-08-09 03:55:28

by Isaku Yamahata

Subject: [PATCH 0/3] x86/mtrr, pat: make PAT independent from MTRR

Make PAT (Page Attribute Table) independent of
MTRR (Memory Type Range Register).
Some environments (mainly virtual ones) support only PAT, not MTRR,
because PAT replaces MTRR.
Supporting both MTRR and PAT brings no gain beyond compatibility and is
tricky, so some VM technologies support only PAT and not MTRR.
This patch series makes PAT available in such environments without MTRR.

Patches 1 and 2 are preparation only: no logic change, just a function
rename (mtrr_ => mtrr_pat_ for functions used by both MTRR and PAT) and
moving those functions out of MTRR-specific files into a common file.
Patch 3 is the essential patch that makes PAT independent of MTRR.
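
As background: whether a guest sees MTRR and/or PAT is advertised by
separate CPUID feature bits (leaf 1, EDX bit 12 for MTRR, bit 16 for
PAT), which is what lets a hypervisor expose PAT without MTRR. A minimal
userspace sketch to check the two bits (illustrative only, not part of
this series):

#include <stdio.h>
#include <cpuid.h>

int main(void)
{
	unsigned int eax, ebx, ecx, edx;

	/* CPUID leaf 1: EDX bit 12 = MTRR, EDX bit 16 = PAT */
	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return 1;

	printf("MTRR: %s\n", (edx & (1u << 12)) ? "yes" : "no");
	printf("PAT:  %s\n", (edx & (1u << 16)) ? "yes" : "no");
	return 0;
}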

Isaku Yamahata (3):
x86/mtrr: split common funcs from mtrr.c
x86/mtrr: split common funcs from generic.c
x86/mtrr, pat: make PAT independent from MTRR

arch/x86/Kconfig | 1 -
arch/x86/include/asm/mtrr.h | 37 ++-
arch/x86/include/asm/pat.h | 2 +
arch/x86/kernel/cpu/common.c | 2 +-
arch/x86/kernel/cpu/mtrr/Makefile | 2 +-
arch/x86/kernel/cpu/mtrr/generic.c | 116 +--------
arch/x86/kernel/cpu/mtrr/mtrr.c | 211 +----------------
arch/x86/kernel/cpu/mtrr/mtrr.h | 8 +-
arch/x86/kernel/cpu/mtrr/rendezvous.c | 324 ++++++++++++++++++++++++++
arch/x86/kernel/setup.c | 4 +-
arch/x86/kernel/smpboot.c | 8 +-
arch/x86/mm/Makefile | 3 +
arch/x86/mm/pat.c | 99 +++++++-
arch/x86/power/cpu.c | 2 +-
14 files changed, 479 insertions(+), 340 deletions(-)
create mode 100644 arch/x86/kernel/cpu/mtrr/rendezvous.c

--
2.17.1


2019-08-09 03:55:29

by Isaku Yamahata

Subject: [PATCH 1/3] x86/mtrr: split common funcs from mtrr.c

This is a preparation for making PAT (Page Attribute Table) independent
of MTRR (Memory Type Range Register).
It renames the prefix of the functions in mtrr.c that are used by both
MTRR and PAT from mtrr_ to mtrr_pat_, and moves those functions out of
mtrr.c into rendezvous.c.
Only a prefix rename and code movement; no logic change.

Signed-off-by: Isaku Yamahata <[email protected]>
---
arch/x86/include/asm/mtrr.h | 25 +--
arch/x86/kernel/cpu/common.c | 2 +-
arch/x86/kernel/cpu/mtrr/Makefile | 2 +-
arch/x86/kernel/cpu/mtrr/mtrr.c | 201 ++---------------------
arch/x86/kernel/cpu/mtrr/mtrr.h | 6 +
arch/x86/kernel/cpu/mtrr/rendezvous.c | 221 ++++++++++++++++++++++++++
arch/x86/kernel/setup.c | 4 +-
arch/x86/kernel/smpboot.c | 8 +-
arch/x86/power/cpu.c | 2 +-
9 files changed, 260 insertions(+), 211 deletions(-)
create mode 100644 arch/x86/kernel/cpu/mtrr/rendezvous.c

diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
index dbff1456d215..d90e87c55302 100644
--- a/arch/x86/include/asm/mtrr.h
+++ b/arch/x86/include/asm/mtrr.h
@@ -32,6 +32,7 @@
* arch_phys_wc_add and arch_phys_wc_del.
*/
# ifdef CONFIG_MTRR
+extern bool mtrr_enabled(void);
extern u8 mtrr_type_lookup(u64 addr, u64 end, u8 *uniform);
extern void mtrr_save_fixed_ranges(void *);
extern void mtrr_save_state(void);
@@ -42,14 +43,18 @@ extern int mtrr_add_page(unsigned long base, unsigned long size,
extern int mtrr_del(int reg, unsigned long base, unsigned long size);
extern int mtrr_del_page(int reg, unsigned long base, unsigned long size);
extern void mtrr_centaur_report_mcr(int mcr, u32 lo, u32 hi);
-extern void mtrr_ap_init(void);
-extern void mtrr_bp_init(void);
-extern void set_mtrr_aps_delayed_init(void);
-extern void mtrr_aps_init(void);
-extern void mtrr_bp_restore(void);
+extern void mtrr_pat_ap_init(void);
+extern void mtrr_pat_bp_init(void);
+extern void set_mtrr_pat_aps_delayed_init(void);
+extern void mtrr_pat_aps_init(void);
+extern void mtrr_pat_bp_restore(void);
extern int mtrr_trim_uncached_memory(unsigned long end_pfn);
extern int amd_special_default_mtrr(void);
# else
+static inline bool mtrr_enabled(void)
+{
+ return false;
+}
static inline u8 mtrr_type_lookup(u64 addr, u64 end, u8 *uniform)
{
/*
@@ -84,15 +89,15 @@ static inline int mtrr_trim_uncached_memory(unsigned long end_pfn)
static inline void mtrr_centaur_report_mcr(int mcr, u32 lo, u32 hi)
{
}
-static inline void mtrr_bp_init(void)
+static inline void mtrr_pat_bp_init(void)
{
pat_disable("MTRRs disabled, skipping PAT initialization too.");
}

-#define mtrr_ap_init() do {} while (0)
-#define set_mtrr_aps_delayed_init() do {} while (0)
-#define mtrr_aps_init() do {} while (0)
-#define mtrr_bp_restore() do {} while (0)
+static inline void mtrr_pat_ap_init(void) { };
+static inline void set_mtrr_pat_aps_delayed_init(void) { };
+static inline void mtrr_pat_aps_init(void) { };
+static inline void mtrr_pat_bp_restore(void) { };
# endif

#ifdef CONFIG_COMPAT
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 11472178e17f..39b7942cb6fc 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1550,7 +1550,7 @@ void identify_secondary_cpu(struct cpuinfo_x86 *c)
#ifdef CONFIG_X86_32
enable_sep_cpu();
#endif
- mtrr_ap_init();
+ mtrr_pat_ap_init();
validate_apic_and_package_id(c);
x86_spec_ctrl_setup_ap();
}
diff --git a/arch/x86/kernel/cpu/mtrr/Makefile b/arch/x86/kernel/cpu/mtrr/Makefile
index cc4f9f1cb94c..e339d729f349 100644
--- a/arch/x86/kernel/cpu/mtrr/Makefile
+++ b/arch/x86/kernel/cpu/mtrr/Makefile
@@ -1,4 +1,4 @@
# SPDX-License-Identifier: GPL-2.0-only
-obj-y := mtrr.o if.o generic.o cleanup.o
+obj-y := mtrr.o if.o generic.o cleanup.o rendezvous.o
obj-$(CONFIG_X86_32) += amd.o cyrix.o centaur.o

diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.c b/arch/x86/kernel/cpu/mtrr/mtrr.c
index 507039c20128..3d35edb1aa42 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.c
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.c
@@ -35,7 +35,6 @@

#include <linux/types.h> /* FIXME: kvm_para.h needs this */

-#include <linux/stop_machine.h>
#include <linux/kvm_para.h>
#include <linux/uaccess.h>
#include <linux/export.h>
@@ -46,10 +45,7 @@
#include <linux/pci.h>
#include <linux/smp.h>
#include <linux/syscore_ops.h>
-#include <linux/rcupdate.h>

-#include <asm/cpufeature.h>
-#include <asm/e820/api.h>
#include <asm/mtrr.h>
#include <asm/msr.h>
#include <asm/pat.h>
@@ -62,7 +58,7 @@
u32 num_var_ranges;
static bool __mtrr_enabled;

-static bool mtrr_enabled(void)
+bool mtrr_enabled(void)
{
return __mtrr_enabled;
}
@@ -71,15 +67,11 @@ unsigned int mtrr_usage_table[MTRR_MAX_VAR_RANGES];
static DEFINE_MUTEX(mtrr_mutex);

u64 size_or_mask, size_and_mask;
-static bool mtrr_aps_delayed_init;

static const struct mtrr_ops *mtrr_ops[X86_VENDOR_NUM] __ro_after_init;

const struct mtrr_ops *mtrr_if;

-static void set_mtrr(unsigned int reg, unsigned long base,
- unsigned long size, mtrr_type type);
-
void __init set_mtrr_ops(const struct mtrr_ops *ops)
{
if (ops->vendor && ops->vendor < X86_VENDOR_NUM)
@@ -144,46 +136,6 @@ static void __init init_table(void)
mtrr_usage_table[i] = 1;
}

-struct set_mtrr_data {
- unsigned long smp_base;
- unsigned long smp_size;
- unsigned int smp_reg;
- mtrr_type smp_type;
-};
-
-/**
- * mtrr_rendezvous_handler - Work done in the synchronization handler. Executed
- * by all the CPUs.
- * @info: pointer to mtrr configuration data
- *
- * Returns nothing.
- */
-static int mtrr_rendezvous_handler(void *info)
-{
- struct set_mtrr_data *data = info;
-
- /*
- * We use this same function to initialize the mtrrs during boot,
- * resume, runtime cpu online and on an explicit request to set a
- * specific MTRR.
- *
- * During boot or suspend, the state of the boot cpu's mtrrs has been
- * saved, and we want to replicate that across all the cpus that come
- * online (either at the end of boot or resume or during a runtime cpu
- * online). If we're doing that, @reg is set to something special and on
- * all the cpu's we do mtrr_if->set_all() (On the logical cpu that
- * started the boot/resume sequence, this might be a duplicate
- * set_all()).
- */
- if (data->smp_reg != ~0U) {
- mtrr_if->set(data->smp_reg, data->smp_base,
- data->smp_size, data->smp_type);
- } else if (mtrr_aps_delayed_init || !cpu_online(smp_processor_id())) {
- mtrr_if->set_all();
- }
- return 0;
-}
-
static inline int types_compatible(mtrr_type type1, mtrr_type type2)
{
return type1 == MTRR_TYPE_UNCACHABLE ||
@@ -192,77 +144,6 @@ static inline int types_compatible(mtrr_type type1, mtrr_type type2)
(type1 == MTRR_TYPE_WRBACK && type2 == MTRR_TYPE_WRTHROUGH);
}

-/**
- * set_mtrr - update mtrrs on all processors
- * @reg: mtrr in question
- * @base: mtrr base
- * @size: mtrr size
- * @type: mtrr type
- *
- * This is kinda tricky, but fortunately, Intel spelled it out for us cleanly:
- *
- * 1. Queue work to do the following on all processors:
- * 2. Disable Interrupts
- * 3. Wait for all procs to do so
- * 4. Enter no-fill cache mode
- * 5. Flush caches
- * 6. Clear PGE bit
- * 7. Flush all TLBs
- * 8. Disable all range registers
- * 9. Update the MTRRs
- * 10. Enable all range registers
- * 11. Flush all TLBs and caches again
- * 12. Enter normal cache mode and reenable caching
- * 13. Set PGE
- * 14. Wait for buddies to catch up
- * 15. Enable interrupts.
- *
- * What does that mean for us? Well, stop_machine() will ensure that
- * the rendezvous handler is started on each CPU. And in lockstep they
- * do the state transition of disabling interrupts, updating MTRR's
- * (the CPU vendors may each do it differently, so we call mtrr_if->set()
- * callback and let them take care of it.) and enabling interrupts.
- *
- * Note that the mechanism is the same for UP systems, too; all the SMP stuff
- * becomes nops.
- */
-static void
-set_mtrr(unsigned int reg, unsigned long base, unsigned long size, mtrr_type type)
-{
- struct set_mtrr_data data = { .smp_reg = reg,
- .smp_base = base,
- .smp_size = size,
- .smp_type = type
- };
-
- stop_machine(mtrr_rendezvous_handler, &data, cpu_online_mask);
-}
-
-static void set_mtrr_cpuslocked(unsigned int reg, unsigned long base,
- unsigned long size, mtrr_type type)
-{
- struct set_mtrr_data data = { .smp_reg = reg,
- .smp_base = base,
- .smp_size = size,
- .smp_type = type
- };
-
- stop_machine_cpuslocked(mtrr_rendezvous_handler, &data, cpu_online_mask);
-}
-
-static void set_mtrr_from_inactive_cpu(unsigned int reg, unsigned long base,
- unsigned long size, mtrr_type type)
-{
- struct set_mtrr_data data = { .smp_reg = reg,
- .smp_base = base,
- .smp_size = size,
- .smp_type = type
- };
-
- stop_machine_from_inactive_cpu(mtrr_rendezvous_handler, &data,
- cpu_callout_mask);
-}
-
/**
* mtrr_add_page - Add a memory type region
* @base: Physical base address of region in pages (in units of 4 kB!)
@@ -382,7 +263,7 @@ int mtrr_add_page(unsigned long base, unsigned long size,
/* Search for an empty MTRR */
i = mtrr_if->get_free_region(base, size, replace);
if (i >= 0) {
- set_mtrr_cpuslocked(i, base, size, type);
+ set_mtrr_pat_cpuslocked(i, base, size, type);
if (likely(replace < 0)) {
mtrr_usage_table[i] = 1;
} else {
@@ -390,7 +271,7 @@ int mtrr_add_page(unsigned long base, unsigned long size,
if (increment)
mtrr_usage_table[i]++;
if (unlikely(replace != i)) {
- set_mtrr_cpuslocked(replace, 0, 0, 0);
+ set_mtrr_pat_cpuslocked(replace, 0, 0, 0);
mtrr_usage_table[replace] = 0;
}
}
@@ -518,7 +399,7 @@ int mtrr_del_page(int reg, unsigned long base, unsigned long size)
goto out;
}
if (--mtrr_usage_table[reg] < 1)
- set_mtrr_cpuslocked(reg, 0, 0, 0);
+ set_mtrr_pat_cpuslocked(reg, 0, 0, 0);
error = reg;
out:
mutex_unlock(&mtrr_mutex);
@@ -662,9 +543,9 @@ static void mtrr_restore(void)

for (i = 0; i < num_var_ranges; i++) {
if (mtrr_value[i].lsize) {
- set_mtrr(i, mtrr_value[i].lbase,
- mtrr_value[i].lsize,
- mtrr_value[i].ltype);
+ set_mtrr_pat(i, mtrr_value[i].lbase,
+ mtrr_value[i].lsize,
+ mtrr_value[i].ltype);
}
}
}
@@ -680,13 +561,13 @@ int __initdata changed_by_mtrr_cleanup;

#define SIZE_OR_MASK_BITS(n) (~((1ULL << ((n) - PAGE_SHIFT)) - 1))
/**
- * mtrr_bp_init - initialize mtrrs on the boot CPU
+ * mtrr_pat_bp_init - initialize mtrrs on the boot CPU
*
* This needs to be called early; before any of the other CPUs are
* initialized (i.e. before smp_init()).
*
*/
-void __init mtrr_bp_init(void)
+void __init mtrr_pat_bp_init(void)
{
u32 phys_addr;

@@ -786,32 +667,6 @@ void __init mtrr_bp_init(void)
}
}

-void mtrr_ap_init(void)
-{
- if (!mtrr_enabled())
- return;
-
- if (!use_intel() || mtrr_aps_delayed_init)
- return;
-
- rcu_cpu_starting(smp_processor_id());
-
- /*
- * Ideally we should hold mtrr_mutex here to avoid mtrr entries
- * changed, but this routine will be called in cpu boot time,
- * holding the lock breaks it.
- *
- * This routine is called in two cases:
- *
- * 1. very earily time of software resume, when there absolutely
- * isn't mtrr entry changes;
- *
- * 2. cpu hotadd time. We let mtrr_add/del_page hold cpuhotplug
- * lock to prevent mtrr entry changes
- */
- set_mtrr_from_inactive_cpu(~0U, 0, 0, 0);
-}
-
/**
* Save current fixed-range MTRR state of the first cpu in cpu_online_mask.
*/
@@ -826,44 +681,6 @@ void mtrr_save_state(void)
smp_call_function_single(first_cpu, mtrr_save_fixed_ranges, NULL, 1);
}

-void set_mtrr_aps_delayed_init(void)
-{
- if (!mtrr_enabled())
- return;
- if (!use_intel())
- return;
-
- mtrr_aps_delayed_init = true;
-}
-
-/*
- * Delayed MTRR initialization for all AP's
- */
-void mtrr_aps_init(void)
-{
- if (!use_intel() || !mtrr_enabled())
- return;
-
- /*
- * Check if someone has requested the delay of AP MTRR initialization,
- * by doing set_mtrr_aps_delayed_init(), prior to this point. If not,
- * then we are done.
- */
- if (!mtrr_aps_delayed_init)
- return;
-
- set_mtrr(~0U, 0, 0, 0);
- mtrr_aps_delayed_init = false;
-}
-
-void mtrr_bp_restore(void)
-{
- if (!use_intel() || !mtrr_enabled())
- return;
-
- mtrr_if->set_all();
-}
-
static int __init mtrr_init_finialize(void)
{
if (!mtrr_enabled())
diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.h b/arch/x86/kernel/cpu/mtrr/mtrr.h
index 2ac99e561181..e9aeeeac9a3e 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.h
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.h
@@ -71,6 +71,12 @@ void mtrr_state_warn(void);
const char *mtrr_attrib_to_str(int x);
void mtrr_wrmsr(unsigned, unsigned, unsigned);

+/* rendezvous */
+void set_mtrr_pat(unsigned int reg, unsigned long base,
+ unsigned long size, mtrr_type type);
+void set_mtrr_pat_cpuslocked(unsigned int reg, unsigned long base,
+ unsigned long size, mtrr_type type);
+
/* CPU specific mtrr init functions */
int amd_init_mtrr(void);
int cyrix_init_mtrr(void);
diff --git a/arch/x86/kernel/cpu/mtrr/rendezvous.c b/arch/x86/kernel/cpu/mtrr/rendezvous.c
new file mode 100644
index 000000000000..5448eea573df
--- /dev/null
+++ b/arch/x86/kernel/cpu/mtrr/rendezvous.c
@@ -0,0 +1,221 @@
+/* common code for MTRR (Memory Type Range Register) and
+ * PAT(Page Attribute Table)
+ *
+ * Copyright (C) 1997-2000 Richard Gooch
+ * Copyright (c) 2002 Patrick Mochel
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Library General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Library General Public License for more details.
+ *
+ * Richard Gooch may be reached by email at [email protected]
+ * The postal address is:
+ * Richard Gooch, c/o ATNF, P. O. Box 76, Epping, N.S.W., 2121, Australia.
+ *
+ * Source: "Pentium Pro Family Developer's Manual, Volume 3:
+ * Operating System Writer's Guide" (Intel document number 242692),
+ * section 11.11.7
+ *
+ * This was cleaned and made readable by Patrick Mochel <[email protected]>
+ * on 6-7 March 2002.
+ * Source: Intel Architecture Software Developers Manual, Volume 3:
+ * System Programming Guide; Section 9.11. (1997 edition - PPro).
+ *
+ * This file was split from mtrr.c and generic.c for MTRR and PAT.
+ */
+
+#define DEBUG
+
+#include <linux/stop_machine.h>
+
+#include <asm/mtrr.h>
+#include <asm/msr.h>
+#include <asm/pat.h>
+
+#include "mtrr.h"
+
+static bool mtrr_pat_aps_delayed_init;
+
+struct set_mtrr_data {
+ unsigned long smp_base;
+ unsigned long smp_size;
+ unsigned int smp_reg;
+ mtrr_type smp_type;
+};
+
+/**
+ * mtrr_pat_rendezvous_handler - Work done in the synchronization handler.
+ * Executed by all the CPUs.
+ * @info: pointer to mtrr configuration data
+ *
+ * Returns nothing.
+ */
+static int mtrr_pat_rendezvous_handler(void *info)
+{
+ struct set_mtrr_data *data = info;
+
+ /*
+ * We use this same function to initialize the mtrrs during boot,
+ * resume, runtime cpu online and on an explicit request to set a
+ * specific MTRR.
+ *
+ * During boot or suspend, the state of the boot cpu's mtrrs has been
+ * saved, and we want to replicate that across all the cpus that come
+ * online (either at the end of boot or resume or during a runtime cpu
+ * online). If we're doing that, @reg is set to something special and on
+ * all the cpu's we do mtrr_if->set_all() (On the logical cpu that
+ * started the boot/resume sequence, this might be a duplicate
+ * set_all()).
+ */
+ if (data->smp_reg != ~0U) {
+ mtrr_if->set(data->smp_reg, data->smp_base,
+ data->smp_size, data->smp_type);
+ } else if (mtrr_pat_aps_delayed_init ||
+ !cpu_online(smp_processor_id())) {
+ mtrr_if->set_all();
+ }
+ return 0;
+}
+
+/**
+ * set_mtrr_pat - update mtrrs on all processors
+ * @reg: mtrr in question
+ * @base: mtrr base
+ * @size: mtrr size
+ * @type: mtrr type
+ *
+ * This is kinda tricky, but fortunately, Intel spelled it out for us cleanly:
+ *
+ * 1. Queue work to do the following on all processors:
+ * 2. Disable Interrupts
+ * 3. Wait for all procs to do so
+ * 4. Enter no-fill cache mode
+ * 5. Flush caches
+ * 6. Clear PGE bit
+ * 7. Flush all TLBs
+ * 8. Disable all range registers
+ * 9. Update the MTRRs
+ * 10. Enable all range registers
+ * 11. Flush all TLBs and caches again
+ * 12. Enter normal cache mode and reenable caching
+ * 13. Set PGE
+ * 14. Wait for buddies to catch up
+ * 15. Enable interrupts.
+ *
+ * What does that mean for us? Well, stop_machine() will ensure that
+ * the rendezvous handler is started on each CPU. And in lockstep they
+ * do the state transition of disabling interrupts, updating MTRR's
+ * (the CPU vendors may each do it differently, so we call mtrr_if->set()
+ * callback and let them take care of it.) and enabling interrupts.
+ *
+ * Note that the mechanism is the same for UP systems, too; all the SMP stuff
+ * becomes nops.
+ */
+void
+set_mtrr_pat(unsigned int reg, unsigned long base,
+ unsigned long size, mtrr_type type)
+{
+ struct set_mtrr_data data = { .smp_reg = reg,
+ .smp_base = base,
+ .smp_size = size,
+ .smp_type = type
+ };
+
+ stop_machine(mtrr_pat_rendezvous_handler, &data, cpu_online_mask);
+}
+
+void set_mtrr_pat_cpuslocked(unsigned int reg, unsigned long base,
+ unsigned long size, mtrr_type type)
+{
+ struct set_mtrr_data data = { .smp_reg = reg,
+ .smp_base = base,
+ .smp_size = size,
+ .smp_type = type
+ };
+
+ stop_machine_cpuslocked(mtrr_pat_rendezvous_handler,
+ &data, cpu_online_mask);
+}
+
+static void set_mtrr_pat_from_inactive_cpu(unsigned int reg, unsigned long base,
+ unsigned long size, mtrr_type type)
+{
+ struct set_mtrr_data data = { .smp_reg = reg,
+ .smp_base = base,
+ .smp_size = size,
+ .smp_type = type
+ };
+
+ stop_machine_from_inactive_cpu(mtrr_pat_rendezvous_handler, &data,
+ cpu_callout_mask);
+}
+
+void mtrr_pat_ap_init(void)
+{
+ if (!mtrr_enabled())
+ return;
+
+ if (!use_intel() || mtrr_pat_aps_delayed_init)
+ return;
+
+ rcu_cpu_starting(smp_processor_id());
+
+ /*
+ * Ideally we should hold mtrr_mutex here to avoid mtrr entries
+ * changed, but this routine will be called in cpu boot time,
+ * holding the lock breaks it.
+ *
+ * This routine is called in two cases:
+ *
+ * 1. very earily time of software resume, when there absolutely
+ * isn't mtrr entry changes;
+ *
+ * 2. cpu hotadd time. We let mtrr_add/del_page hold cpuhotplug
+ * lock to prevent mtrr entry changes
+ */
+ set_mtrr_pat_from_inactive_cpu(~0U, 0, 0, 0);
+}
+
+void set_mtrr_pat_aps_delayed_init(void)
+{
+ if (!mtrr_enabled())
+ return;
+ if (!use_intel())
+ return;
+
+ mtrr_pat_aps_delayed_init = true;
+}
+
+/*
+ * Delayed MTRR initialization for all AP's
+ */
+void mtrr_pat_aps_init(void)
+{
+ if (!use_intel() || !mtrr_enabled())
+ return;
+
+ /*
+ * Check if someone has requested the delay of AP MTRR initialization,
+ * by doing set_mtrr_pat_aps_delayed_init(), prior to this point.
+ * If not, then we are done.
+ */
+ if (!mtrr_pat_aps_delayed_init)
+ return;
+
+ set_mtrr_pat(~0U, 0, 0, 0);
+ mtrr_pat_aps_delayed_init = false;
+}
+
+void mtrr_pat_bp_restore(void)
+{
+ if (!use_intel() || !mtrr_enabled())
+ return;
+
+ mtrr_if->set_all();
+}
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index bbe35bf879f5..ca06370c7a13 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1064,7 +1064,7 @@ void __init setup_arch(char **cmdline_p)
max_pfn = e820__end_of_ram_pfn();

/* update e820 for memory not covered by WB MTRRs */
- mtrr_bp_init();
+ mtrr_pat_bp_init();
if (mtrr_trim_uncached_memory(max_pfn))
max_pfn = e820__end_of_ram_pfn();

@@ -1072,7 +1072,7 @@ void __init setup_arch(char **cmdline_p)

/*
* This call is required when the CPU does not support PAT. If
- * mtrr_bp_init() invoked it already via pat_init() the call has no
+ * mtrr_pat_bp_init() invoked it already via pat_init() the call has no
* effect.
*/
init_cache_modes();
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index fdbd47ceb84d..3e16e7e3a01b 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1370,7 +1370,7 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)

uv_system_init();

- set_mtrr_aps_delayed_init();
+ set_mtrr_pat_aps_delayed_init();

smp_quirk_init_udelay();

@@ -1379,12 +1379,12 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)

void arch_enable_nonboot_cpus_begin(void)
{
- set_mtrr_aps_delayed_init();
+ set_mtrr_pat_aps_delayed_init();
}

void arch_enable_nonboot_cpus_end(void)
{
- mtrr_aps_init();
+ mtrr_pat_aps_init();
}

/*
@@ -1424,7 +1424,7 @@ void __init native_smp_cpus_done(unsigned int max_cpus)

nmi_selftest();
impress_friends();
- mtrr_aps_init();
+ mtrr_pat_aps_init();
}

static int __initdata setup_possible_cpus = -1;
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 24b079e94bc2..860d33b0dd1b 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -263,7 +263,7 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
do_fpu_end();
tsc_verify_tsc_adjust(true);
x86_platform.restore_sched_clock_state();
- mtrr_bp_restore();
+ mtrr_pat_bp_restore();
perf_restore_debug_store();
msr_restore_context(ctxt);
}
--
2.17.1

2019-08-09 03:55:29

by Isaku Yamahata

Subject: [PATCH 2/3] x86/mtrr: split common funcs from generic.c

This is a preparation for making PAT (Page Attribute Table) independent
of MTRR (Memory Type Range Register).
It renames the prefix of the functions in mtrr/generic.c that are used
by both MTRR and PAT from mtrr_ to mtrr_pat_, and moves those functions
out of mtrr/generic.c into rendezvous.c.
Only a prefix rename and code movement; no logic change.

Signed-off-by: Isaku Yamahata <[email protected]>
---
arch/x86/include/asm/mtrr.h | 4 +
arch/x86/kernel/cpu/mtrr/generic.c | 111 ++------------------------
arch/x86/kernel/cpu/mtrr/mtrr.c | 2 +-
arch/x86/kernel/cpu/mtrr/mtrr.h | 3 +-
arch/x86/kernel/cpu/mtrr/rendezvous.c | 91 +++++++++++++++++++++
5 files changed, 106 insertions(+), 105 deletions(-)

diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
index d90e87c55302..5b056374f5a6 100644
--- a/arch/x86/include/asm/mtrr.h
+++ b/arch/x86/include/asm/mtrr.h
@@ -33,6 +33,8 @@
*/
# ifdef CONFIG_MTRR
extern bool mtrr_enabled(void);
+extern void mtrr_pat_prepare_set(void) __acquires(set_atomicity_lock);
+extern void mtrr_pat_post_set(void) __releases(set_atomicity_lock);
extern u8 mtrr_type_lookup(u64 addr, u64 end, u8 *uniform);
extern void mtrr_save_fixed_ranges(void *);
extern void mtrr_save_state(void);
@@ -55,6 +57,8 @@ static inline bool mtrr_enabled(void)
{
return false;
}
+static inline void mtrr_pat_prepare_set(void) { };
+static inline void mtrr_pat_post_set(void) { };
static inline u8 mtrr_type_lookup(u64 addr, u64 end, u8 *uniform)
{
/*
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index aa5c064a6a22..a44f05f64846 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -397,9 +397,6 @@ print_fixed(unsigned base, unsigned step, const mtrr_type *types)
}
}

-static void prepare_set(void);
-static void post_set(void);
-
static void __init print_mtrr_state(void)
{
unsigned int i;
@@ -445,20 +442,6 @@ static void __init print_mtrr_state(void)
pr_debug("TOM2: %016llx aka %lldM\n", mtrr_tom2, mtrr_tom2>>20);
}

-/* PAT setup for BP. We need to go through sync steps here */
-void __init mtrr_bp_pat_init(void)
-{
- unsigned long flags;
-
- local_irq_save(flags);
- prepare_set();
-
- pat_init();
-
- post_set();
- local_irq_restore(flags);
-}
-
/* Grab all of the MTRR state for this CPU into *state */
bool __init get_mtrr_state(void)
{
@@ -680,8 +663,6 @@ static bool set_mtrr_var_ranges(unsigned int index, struct mtrr_var_range *vr)
return changed;
}

-static u32 deftype_lo, deftype_hi;
-
/**
* set_mtrr_state - Set the MTRR state for this CPU.
*
@@ -705,100 +686,24 @@ static unsigned long set_mtrr_state(void)
* Set_mtrr_restore restores the old value of MTRRdefType,
* so to set it we fiddle with the saved value:
*/
- if ((deftype_lo & 0xff) != mtrr_state.def_type
- || ((deftype_lo & 0xc00) >> 10) != mtrr_state.enabled) {
+ if ((mtrr_deftype_lo & 0xff) != mtrr_state.def_type
+ || ((mtrr_deftype_lo & 0xc00) >> 10) != mtrr_state.enabled) {

- deftype_lo = (deftype_lo & ~0xcff) | mtrr_state.def_type |
- (mtrr_state.enabled << 10);
+ mtrr_deftype_lo = (mtrr_deftype_lo & ~0xcff) |
+ mtrr_state.def_type | (mtrr_state.enabled << 10);
change_mask |= MTRR_CHANGE_MASK_DEFTYPE;
}

return change_mask;
}

-
-static unsigned long cr4;
-static DEFINE_RAW_SPINLOCK(set_atomicity_lock);
-
-/*
- * Since we are disabling the cache don't allow any interrupts,
- * they would run extremely slow and would only increase the pain.
- *
- * The caller must ensure that local interrupts are disabled and
- * are reenabled after post_set() has been called.
- */
-static void prepare_set(void) __acquires(set_atomicity_lock)
-{
- unsigned long cr0;
-
- /*
- * Note that this is not ideal
- * since the cache is only flushed/disabled for this CPU while the
- * MTRRs are changed, but changing this requires more invasive
- * changes to the way the kernel boots
- */
-
- raw_spin_lock(&set_atomicity_lock);
-
- /* Enter the no-fill (CD=1, NW=0) cache mode and flush caches. */
- cr0 = read_cr0() | X86_CR0_CD;
- write_cr0(cr0);
-
- /*
- * Cache flushing is the most time-consuming step when programming
- * the MTRRs. Fortunately, as per the Intel Software Development
- * Manual, we can skip it if the processor supports cache self-
- * snooping.
- */
- if (!static_cpu_has(X86_FEATURE_SELFSNOOP))
- wbinvd();
-
- /* Save value of CR4 and clear Page Global Enable (bit 7) */
- if (boot_cpu_has(X86_FEATURE_PGE)) {
- cr4 = __read_cr4();
- __write_cr4(cr4 & ~X86_CR4_PGE);
- }
-
- /* Flush all TLBs via a mov %cr3, %reg; mov %reg, %cr3 */
- count_vm_tlb_event(NR_TLB_LOCAL_FLUSH_ALL);
- __flush_tlb();
-
- /* Save MTRR state */
- rdmsr(MSR_MTRRdefType, deftype_lo, deftype_hi);
-
- /* Disable MTRRs, and set the default type to uncached */
- mtrr_wrmsr(MSR_MTRRdefType, deftype_lo & ~0xcff, deftype_hi);
-
- /* Again, only flush caches if we have to. */
- if (!static_cpu_has(X86_FEATURE_SELFSNOOP))
- wbinvd();
-}
-
-static void post_set(void) __releases(set_atomicity_lock)
-{
- /* Flush TLBs (no need to flush caches - they are disabled) */
- count_vm_tlb_event(NR_TLB_LOCAL_FLUSH_ALL);
- __flush_tlb();
-
- /* Intel (P6) standard MTRRs */
- mtrr_wrmsr(MSR_MTRRdefType, deftype_lo, deftype_hi);
-
- /* Enable caches */
- write_cr0(read_cr0() & ~X86_CR0_CD);
-
- /* Restore value of CR4 */
- if (boot_cpu_has(X86_FEATURE_PGE))
- __write_cr4(cr4);
- raw_spin_unlock(&set_atomicity_lock);
-}
-
static void generic_set_all(void)
{
unsigned long mask, count;
unsigned long flags;

local_irq_save(flags);
- prepare_set();
+ mtrr_pat_prepare_set();

/* Actually set the state */
mask = set_mtrr_state();
@@ -806,7 +711,7 @@ static void generic_set_all(void)
/* also set PAT */
pat_init();

- post_set();
+ mtrr_pat_post_set();
local_irq_restore(flags);

/* Use the atomic bitops to update the global mask */
@@ -837,7 +742,7 @@ static void generic_set_mtrr(unsigned int reg, unsigned long base,
vr = &mtrr_state.var_ranges[reg];

local_irq_save(flags);
- prepare_set();
+ mtrr_pat_prepare_set();

if (size == 0) {
/*
@@ -856,7 +761,7 @@ static void generic_set_mtrr(unsigned int reg, unsigned long base,
mtrr_wrmsr(MTRRphysMask_MSR(reg), vr->mask_lo, vr->mask_hi);
}

- post_set();
+ mtrr_pat_post_set();
local_irq_restore(flags);
}

diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.c b/arch/x86/kernel/cpu/mtrr/mtrr.c
index 3d35edb1aa42..475627ca2c1b 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.c
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.c
@@ -646,7 +646,7 @@ void __init mtrr_pat_bp_init(void)
__mtrr_enabled = get_mtrr_state();

if (mtrr_enabled())
- mtrr_bp_pat_init();
+ pat_bp_init();

if (mtrr_cleanup(phys_addr)) {
changed_by_mtrr_cleanup = 1;
diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.h b/arch/x86/kernel/cpu/mtrr/mtrr.h
index e9aeeeac9a3e..57948b651b8e 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.h
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.h
@@ -53,7 +53,7 @@ void set_mtrr_prepare_save(struct set_mtrr_context *ctxt);
void fill_mtrr_var_range(unsigned int index,
u32 base_lo, u32 base_hi, u32 mask_lo, u32 mask_hi);
bool get_mtrr_state(void);
-void mtrr_bp_pat_init(void);
+void pat_bp_init(void);

extern void __init set_mtrr_ops(const struct mtrr_ops *ops);

@@ -76,6 +76,7 @@ void set_mtrr_pat(unsigned int reg, unsigned long base,
unsigned long size, mtrr_type type);
void set_mtrr_pat_cpuslocked(unsigned int reg, unsigned long base,
unsigned long size, mtrr_type type);
+extern u32 mtrr_deftype_lo, mtrr_deftype_hi;

/* CPU specific mtrr init functions */
int amd_init_mtrr(void);
diff --git a/arch/x86/kernel/cpu/mtrr/rendezvous.c b/arch/x86/kernel/cpu/mtrr/rendezvous.c
index 5448eea573df..d902b9e5cc17 100644
--- a/arch/x86/kernel/cpu/mtrr/rendezvous.c
+++ b/arch/x86/kernel/cpu/mtrr/rendezvous.c
@@ -33,14 +33,105 @@
#define DEBUG

#include <linux/stop_machine.h>
+#include <linux/vmstat.h>

#include <asm/mtrr.h>
#include <asm/msr.h>
#include <asm/pat.h>
+#include <asm/tlbflush.h>

#include "mtrr.h"

static bool mtrr_pat_aps_delayed_init;
+u32 mtrr_deftype_lo, mtrr_deftype_hi;
+static unsigned long cr4;
+static DEFINE_RAW_SPINLOCK(set_atomicity_lock);
+
+/* PAT setup for BP. We need to go through sync steps here */
+void __init pat_bp_init(void)
+{
+ unsigned long flags;
+
+ local_irq_save(flags);
+ mtrr_pat_prepare_set();
+
+ pat_init();
+
+ mtrr_pat_post_set();
+ local_irq_restore(flags);
+}
+
+/*
+ * Since we are disabling the cache don't allow any interrupts,
+ * they would run extremely slow and would only increase the pain.
+ *
+ * The caller must ensure that local interrupts are disabled and
+ * are reenabled after mtrr_pat_post_set() has been called.
+ */
+void mtrr_pat_prepare_set(void) __acquires(set_atomicity_lock)
+{
+ unsigned long cr0;
+
+ /*
+ * Note that this is not ideal
+ * since the cache is only flushed/disabled for this CPU while the
+ * MTRRs are changed, but changing this requires more invasive
+ * changes to the way the kernel boots
+ */
+
+ raw_spin_lock(&set_atomicity_lock);
+
+ /* Enter the no-fill (CD=1, NW=0) cache mode and flush caches. */
+ cr0 = read_cr0() | X86_CR0_CD;
+ write_cr0(cr0);
+
+ /*
+ * Cache flushing is the most time-consuming step when programming
+ * the MTRRs. Fortunately, as per the Intel Software Development
+ * Manual, we can skip it if the processor supports cache self-
+ * snooping.
+ */
+ if (!static_cpu_has(X86_FEATURE_SELFSNOOP))
+ wbinvd();
+
+ /* Save value of CR4 and clear Page Global Enable (bit 7) */
+ if (boot_cpu_has(X86_FEATURE_PGE)) {
+ cr4 = __read_cr4();
+ __write_cr4(cr4 & ~X86_CR4_PGE);
+ }
+
+ /* Flush all TLBs via a mov %cr3, %reg; mov %reg, %cr3 */
+ count_vm_tlb_event(NR_TLB_LOCAL_FLUSH_ALL);
+ __flush_tlb();
+
+ /* Save MTRR state */
+ rdmsr(MSR_MTRRdefType, mtrr_deftype_lo, mtrr_deftype_hi);
+
+ /* Disable MTRRs, and set the default type to uncached */
+ mtrr_wrmsr(MSR_MTRRdefType, mtrr_deftype_lo & ~0xcff, mtrr_deftype_hi);
+
+ /* Again, only flush caches if we have to. */
+ if (!static_cpu_has(X86_FEATURE_SELFSNOOP))
+ wbinvd();
+}
+
+void mtrr_pat_post_set(void) __releases(set_atomicity_lock)
+{
+ /* Flush TLBs (no need to flush caches - they are disabled) */
+ count_vm_tlb_event(NR_TLB_LOCAL_FLUSH_ALL);
+ __flush_tlb();
+
+ /* Intel (P6) standard MTRRs */
+ mtrr_wrmsr(MSR_MTRRdefType, mtrr_deftype_lo, mtrr_deftype_hi);
+
+ /* Enable caches */
+ write_cr0(read_cr0() & ~X86_CR0_CD);
+
+ /* Restore value of CR4 */
+ if (boot_cpu_has(X86_FEATURE_PGE))
+ __write_cr4(cr4);
+ raw_spin_unlock(&set_atomicity_lock);
+}

struct set_mtrr_data {
unsigned long smp_base;
--
2.17.1

2019-08-09 03:57:17

by Isaku Yamahata

Subject: [PATCH 3/3] x86/mtrr, pat: make PAT independent from MTRR

This patch makes PAT (Page Attribute Table) independent of
MTRR (Memory Type Range Register).
Some environments (mainly virtual ones) support only PAT, not MTRR.
Supporting both MTRR and PAT at the same time is tricky and brings no
gain beyond compatibility, because PAT replaces MTRR; some VM
technologies therefore support only PAT and not MTRR.
This patch makes PAT available in such environments without MTRR.
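
For context: PAT is programmed entirely through the IA32_PAT MSR (0x277),
eight 3-bit entries packed into one 64-bit value, so it needs none of the
MTRR range registers. A hedged sketch of how such a value is composed
(type encodings and the power-on default layout are from the Intel SDM;
the PAT() helper and constants are redeclared here for illustration,
similar to the ones used locally in arch/x86/mm/pat.c):

#define PAT_UC		0x0	/* uncached */
#define PAT_WC		0x1	/* write combining */
#define PAT_WT		0x4	/* write through */
#define PAT_WP		0x5	/* write protected */
#define PAT_WB		0x6	/* write back */
#define PAT_UC_MINUS	0x7	/* UC, may be overridden to WC by an MTRR */

/* entry x occupies the low 3 bits of byte x of MSR 0x277 */
#define PAT(x, y)	((u64)PAT_ ## y << ((x) * 8))

/* example: the documented power-on default layout (WB, WT, UC-, UC, mirrored) */
u64 pat_msr = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) |
	      PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);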

Signed-off-by: Isaku Yamahata <[email protected]>
---
arch/x86/Kconfig | 1 -
arch/x86/include/asm/mtrr.h | 32 +++++----
arch/x86/include/asm/pat.h | 2 +
arch/x86/kernel/cpu/mtrr/generic.c | 5 --
arch/x86/kernel/cpu/mtrr/mtrr.c | 8 +--
arch/x86/kernel/cpu/mtrr/mtrr.h | 1 -
arch/x86/kernel/cpu/mtrr/rendezvous.c | 76 +++++++++++---------
arch/x86/mm/Makefile | 3 +
arch/x86/mm/pat.c | 99 ++++++++++++++++++++++++---
9 files changed, 158 insertions(+), 69 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 222855cc0158..5654283e010f 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1838,7 +1838,6 @@ config MTRR_SANITIZER_SPARE_REG_NR_DEFAULT
config X86_PAT
def_bool y
prompt "x86 PAT support" if EXPERT
- depends on MTRR
---help---
Use PAT attributes to setup page level cache control.

diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
index 5b056374f5a6..a401ad106c28 100644
--- a/arch/x86/include/asm/mtrr.h
+++ b/arch/x86/include/asm/mtrr.h
@@ -31,10 +31,25 @@
* The following functions are for use by other drivers that cannot use
* arch_phys_wc_add and arch_phys_wc_del.
*/
-# ifdef CONFIG_MTRR
-extern bool mtrr_enabled(void);
+#if defined(CONFIG_MTRR) || defined(CONFIG_X86_PAT)
+/* common method for MTRR and PAT */
extern void mtrr_pat_prepare_set(void) __acquires(set_atomicity_lock);
extern void mtrr_pat_post_set(void) __releases(set_atomicity_lock);
+extern void mtrr_pat_ap_init(void);
+extern void set_mtrr_pat_aps_delayed_init(void);
+extern void mtrr_pat_aps_init(void);
+extern void mtrr_pat_bp_restore(void);
+#else
+static inline void mtrr_pat_prepare_set(void) { }
+static inline void mtrr_pat_post_set(void) { }
+static inline void mtrr_pat_ap_init(void) { };
+static inline void set_mtrr_pat_aps_delayed_init(void) { };
+static inline void mtrr_pat_aps_init(void) { };
+static inline void mtrr_pat_bp_restore(void) { };
+#endif
+
+# ifdef CONFIG_MTRR
+extern bool mtrr_enabled(void);
extern u8 mtrr_type_lookup(u64 addr, u64 end, u8 *uniform);
extern void mtrr_save_fixed_ranges(void *);
extern void mtrr_save_state(void);
@@ -45,11 +60,7 @@ extern int mtrr_add_page(unsigned long base, unsigned long size,
extern int mtrr_del(int reg, unsigned long base, unsigned long size);
extern int mtrr_del_page(int reg, unsigned long base, unsigned long size);
extern void mtrr_centaur_report_mcr(int mcr, u32 lo, u32 hi);
-extern void mtrr_pat_ap_init(void);
extern void mtrr_pat_bp_init(void);
-extern void set_mtrr_pat_aps_delayed_init(void);
-extern void mtrr_pat_aps_init(void);
-extern void mtrr_pat_bp_restore(void);
extern int mtrr_trim_uncached_memory(unsigned long end_pfn);
extern int amd_special_default_mtrr(void);
# else
@@ -57,8 +68,6 @@ static inline bool mtrr_enabled(void)
{
return false;
}
-static inline void mtrr_pat_prepare_set(void) { };
-static inline void mtrr_pat_post_set(void) { };
static inline u8 mtrr_type_lookup(u64 addr, u64 end, u8 *uniform)
{
/*
@@ -95,13 +104,8 @@ static inline void mtrr_centaur_report_mcr(int mcr, u32 lo, u32 hi)
}
static inline void mtrr_pat_bp_init(void)
{
- pat_disable("MTRRs disabled, skipping PAT initialization too.");
+ pat_bp_init();
}
-
-static inline void mtrr_pat_ap_init(void) { };
-static inline void set_mtrr_pat_aps_delayed_init(void) { };
-static inline void mtrr_pat_aps_init(void) { };
-static inline void mtrr_pat_bp_restore(void) { };
# endif

#ifdef CONFIG_COMPAT
diff --git a/arch/x86/include/asm/pat.h b/arch/x86/include/asm/pat.h
index 92015c65fa2a..2a355ce94ebf 100644
--- a/arch/x86/include/asm/pat.h
+++ b/arch/x86/include/asm/pat.h
@@ -7,7 +7,9 @@

bool pat_enabled(void);
void pat_disable(const char *reason);
+extern void pat_set(void);
extern void pat_init(void);
+extern void pat_bp_init(void);
extern void init_cache_modes(void);

extern int reserve_memtype(u64 start, u64 end,
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index a44f05f64846..f9a7ca79e2c2 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -6,13 +6,8 @@
#define DEBUG

#include <linux/export.h>
-#include <linux/init.h>
-#include <linux/io.h>
#include <linux/mm.h>

-#include <asm/processor-flags.h>
-#include <asm/cpufeature.h>
-#include <asm/tlbflush.h>
#include <asm/mtrr.h>
#include <asm/msr.h>
#include <asm/pat.h>
diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.c b/arch/x86/kernel/cpu/mtrr/mtrr.c
index 475627ca2c1b..2d28c9b37ae7 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.c
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.c
@@ -657,13 +657,7 @@ void __init mtrr_pat_bp_init(void)

if (!mtrr_enabled()) {
pr_info("Disabled\n");
-
- /*
- * PAT initialization relies on MTRR's rendezvous handler.
- * Skip PAT init until the handler can initialize both
- * features independently.
- */
- pat_disable("MTRRs disabled, skipping PAT initialization too.");
+ pat_bp_init();
}
}

diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.h b/arch/x86/kernel/cpu/mtrr/mtrr.h
index 57948b651b8e..dfc1094cae27 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.h
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.h
@@ -53,7 +53,6 @@ void set_mtrr_prepare_save(struct set_mtrr_context *ctxt);
void fill_mtrr_var_range(unsigned int index,
u32 base_lo, u32 base_hi, u32 mask_lo, u32 mask_hi);
bool get_mtrr_state(void);
-void pat_bp_init(void);

extern void __init set_mtrr_ops(const struct mtrr_ops *ops);

diff --git a/arch/x86/kernel/cpu/mtrr/rendezvous.c b/arch/x86/kernel/cpu/mtrr/rendezvous.c
index d902b9e5cc17..2f31dcf334a9 100644
--- a/arch/x86/kernel/cpu/mtrr/rendezvous.c
+++ b/arch/x86/kernel/cpu/mtrr/rendezvous.c
@@ -47,20 +47,6 @@ u32 mtrr_deftype_lo, mtrr_deftype_hi;
static unsigned long cr4;
static DEFINE_RAW_SPINLOCK(set_atomicity_lock);

-/* PAT setup for BP. We need to go through sync steps here */
-void __init pat_bp_init(void)
-{
- unsigned long flags;
-
- local_irq_save(flags);
- mtrr_pat_prepare_set();
-
- pat_init();
-
- mtrr_pat_post_set();
- local_irq_restore(flags);
-}
-
/*
* Since we are disabling the cache don't allow any interrupts,
* they would run extremely slow and would only increase the pain.
@@ -104,11 +90,14 @@ void mtrr_pat_prepare_set(void) __acquires(set_atomicity_lock)
count_vm_tlb_event(NR_TLB_LOCAL_FLUSH_ALL);
__flush_tlb();

- /* Save MTRR state */
- rdmsr(MSR_MTRRdefType, mtrr_deftype_lo, mtrr_deftype_hi);
+ if (mtrr_enabled()) {
+ /* Save MTRR state */
+ rdmsr(MSR_MTRRdefType, mtrr_deftype_lo, mtrr_deftype_hi);

- /* Disable MTRRs, and set the default type to uncached */
- mtrr_wrmsr(MSR_MTRRdefType, mtrr_deftype_lo & ~0xcff, mtrr_deftype_hi);
+ /* Disable MTRRs, and set the default type to uncached */
+ mtrr_wrmsr(MSR_MTRRdefType, mtrr_deftype_lo & ~0xcff,
+ mtrr_deftype_hi);
+ }

/* Again, only flush caches if we have to. */
if (!static_cpu_has(X86_FEATURE_SELFSNOOP))
@@ -121,8 +110,10 @@ void mtrr_pat_post_set(void) __releases(set_atomicity_lock)
count_vm_tlb_event(NR_TLB_LOCAL_FLUSH_ALL);
__flush_tlb();

- /* Intel (P6) standard MTRRs */
- mtrr_wrmsr(MSR_MTRRdefType, mtrr_deftype_lo, mtrr_deftype_hi);
+ if (mtrr_enabled()) {
+ /* Intel (P6) standard MTRRs */
+ mtrr_wrmsr(MSR_MTRRdefType, mtrr_deftype_lo, mtrr_deftype_hi);
+ }

/* Enable caches */
write_cr0(read_cr0() & ~X86_CR0_CD);
@@ -133,6 +124,14 @@ void mtrr_pat_post_set(void) __releases(set_atomicity_lock)
raw_spin_unlock(&set_atomicity_lock);
}

+static inline void mtrr_pat_set_all(void)
+{
+ if (mtrr_enabled())
+ mtrr_if->set_all();
+ else
+ pat_set();
+}
+
struct set_mtrr_data {
unsigned long smp_base;
unsigned long smp_size;
@@ -165,17 +164,19 @@ static int mtrr_pat_rendezvous_handler(void *info)
* set_all()).
*/
if (data->smp_reg != ~0U) {
- mtrr_if->set(data->smp_reg, data->smp_base,
- data->smp_size, data->smp_type);
+ if (mtrr_enabled()) {
+ mtrr_if->set(data->smp_reg, data->smp_base,
+ data->smp_size, data->smp_type);
+ }
} else if (mtrr_pat_aps_delayed_init ||
!cpu_online(smp_processor_id())) {
- mtrr_if->set_all();
+ mtrr_pat_set_all();
}
return 0;
}

/**
- * set_mtrr_pat - update mtrrs on all processors
+ * set_mtrr_pat - update mtrrs and pat on all processors
* @reg: mtrr in question
* @base: mtrr base
* @size: mtrr size
@@ -230,6 +231,7 @@ void set_mtrr_pat_cpuslocked(unsigned int reg, unsigned long base,
.smp_type = type
};

+ WARN_ON(!mtrr_enabled());
stop_machine_cpuslocked(mtrr_pat_rendezvous_handler,
&data, cpu_online_mask);
}
@@ -247,12 +249,24 @@ static void set_mtrr_pat_from_inactive_cpu(unsigned int reg, unsigned long base,
cpu_callout_mask);
}

+static inline bool use_intel_mtrr_pat(void)
+{
+ if (mtrr_enabled() || pat_enabled())
+ return true;
+
+#ifdef CONFIG_MTRR
+ return use_intel();
+#else
+ return true;
+#endif
+}
+
void mtrr_pat_ap_init(void)
{
- if (!mtrr_enabled())
+ if (!use_intel_mtrr_pat())
return;

- if (!use_intel() || mtrr_pat_aps_delayed_init)
+ if (mtrr_pat_aps_delayed_init)
return;

rcu_cpu_starting(smp_processor_id());
@@ -275,9 +289,7 @@ void mtrr_pat_ap_init(void)

void set_mtrr_pat_aps_delayed_init(void)
{
- if (!mtrr_enabled())
- return;
- if (!use_intel())
+ if (!use_intel_mtrr_pat())
return;

mtrr_pat_aps_delayed_init = true;
@@ -288,7 +300,7 @@ void set_mtrr_pat_aps_delayed_init(void)
*/
void mtrr_pat_aps_init(void)
{
- if (!use_intel() || !mtrr_enabled())
+ if (!use_intel_mtrr_pat())
return;

/*
@@ -305,8 +317,8 @@ void mtrr_pat_aps_init(void)

void mtrr_pat_bp_restore(void)
{
- if (!use_intel() || !mtrr_enabled())
+ if (!use_intel_mtrr_pat())
return;

- mtrr_if->set_all();
+ mtrr_pat_set_all();
}
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 84373dc9b341..841820553a62 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -24,6 +24,9 @@ CFLAGS_mem_encrypt_identity.o := $(nostackp)
CFLAGS_fault.o := -I $(srctree)/$(src)/../include/asm/trace

obj-$(CONFIG_X86_PAT) += pat_rbtree.o
+ifndef CONFIG_MTRR
+obj-$(CONFIG_X86_PAT) += ../kernel/cpu/mtrr/rendezvous.o
+endif

obj-$(CONFIG_X86_32) += pgtable_32.o iomap_32.o

diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index d9fbd4f69920..852cd0c3f96e 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -297,7 +297,7 @@ void init_cache_modes(void)
* to enable additional cache attributes, WC, WT and WP.
*
* This function must be called on all CPUs using the specific sequence of
- * operations defined in Intel SDM. mtrr_rendezvous_handler() provides this
+ * operations defined in Intel SDM. mtrr_pat_rendezvous_handler() provides this
* procedure for PAT.
*/
void pat_init(void)
@@ -374,8 +374,33 @@ void pat_init(void)

#undef PAT

+void pat_set(void)
+{
+ if (pat_disabled)
+ return;
+
+ unsigned long flags;
+
+ local_irq_save(flags);
+ mtrr_pat_prepare_set();
+
+ pat_init();
+
+ mtrr_pat_post_set();
+ local_irq_restore(flags);
+}
+
+/* PAT setup for BP. We need to go through sync steps here */
+void __init pat_bp_init(void)
+{
+ pat_set();
+}
+
static DEFINE_SPINLOCK(memtype_lock); /* protects memtype accesses */

+static int pat_pagerange_is_ram(resource_size_t start, resource_size_t end);
+static int pat_pagerange_is_acpi(resource_size_t start, resource_size_t end);
+
/*
* Does intersection of PAT memory type and MTRR memory type and returns
* the resulting memory type as PAT understands it.
@@ -383,6 +408,22 @@ static DEFINE_SPINLOCK(memtype_lock); /* protects memtype accesses */
* The intersection is based on "Effective Memory Type" tables in IA-32
* SDM vol 3a
*/
+static unsigned long system_memory_type(u64 start, u64 end,
+ enum page_cache_mode req_type)
+{
+ /*
+ * ACPI subsystem tries to map non-ram area as writeback.
+ * If it's not ram, use uc minus similarly to mtrr case.
+ */
+ if (pat_pagerange_is_ram(start, end) == 1)
+ return _PAGE_CACHE_MODE_WB;
+ /* allow writeback for ACPI tables/ACPI NVS */
+ if (pat_pagerange_is_acpi(start, end) == 1)
+ return _PAGE_CACHE_MODE_WB;
+
+ return _PAGE_CACHE_MODE_UC_MINUS;
+}
+
static unsigned long pat_x_mtrr_type(u64 start, u64 end,
enum page_cache_mode req_type)
{
@@ -391,13 +432,21 @@ static unsigned long pat_x_mtrr_type(u64 start, u64 end,
* request is for WB.
*/
if (req_type == _PAGE_CACHE_MODE_WB) {
- u8 mtrr_type, uniform;
-
- mtrr_type = mtrr_type_lookup(start, end, &uniform);
- if (mtrr_type != MTRR_TYPE_WRBACK)
- return _PAGE_CACHE_MODE_UC_MINUS;
-
- return _PAGE_CACHE_MODE_WB;
+ if (mtrr_enabled()) {
+ u8 mtrr_type, uniform;
+
+ mtrr_type = mtrr_type_lookup(start, end, &uniform);
+ if (mtrr_type == MTRR_TYPE_INVALID) {
+ /* MTRR doesn't cover this range. */
+ return system_memory_type(
+ start, end, req_type);
+ }
+ if (mtrr_type != MTRR_TYPE_WRBACK)
+ return _PAGE_CACHE_MODE_UC_MINUS;
+
+ return _PAGE_CACHE_MODE_WB;
+ }
+ return system_memory_type(start, end, req_type);
}

return req_type;
@@ -446,6 +495,36 @@ static int pat_pagerange_is_ram(resource_size_t start, resource_size_t end)
return (ret > 0) ? -1 : (state.ram ? 1 : 0);
}

+static int pagerange_is_acpi_desc_callback(struct resource *res, void *arg)
+{
+ unsigned long initial_pfn = res->start >> PAGE_SHIFT;
+ unsigned long end_pfn = (res->end + 1) >> PAGE_SHIFT;
+ unsigned long total_nr_pages = end_pfn - initial_pfn;
+
+ return pagerange_is_ram_callback(initial_pfn, total_nr_pages, arg);
+}
+
+static int pagerange_is_acpi_desc(unsigned long desc,
+ resource_size_t start, resource_size_t end)
+{
+ int ret = 0;
+ unsigned long start_pfn = start >> PAGE_SHIFT;
+ struct pagerange_state state = {start_pfn, 0, 0};
+
+ ret = walk_iomem_res_desc(desc, IORESOURCE_MEM | IORESOURCE_BUSY,
+ start, end, &state, pagerange_is_acpi_desc_callback);
+ return (ret > 0) ? -1 : (state.ram ? 1 : 0);
+}
+
+static int pat_pagerange_is_acpi(resource_size_t start, resource_size_t end)
+{
+ int ret = pagerange_is_acpi_desc(IORES_DESC_ACPI_TABLES, start, end);
+
+ if (ret == 1)
+ return ret;
+ return pagerange_is_acpi_desc(IORES_DESC_ACPI_NV_STORAGE, start, end);
+}
+
/*
* For RAM pages, we use page flags to mark the pages with appropriate type.
* The page flags are limited to four types, WB (default), WC, WT and UC-.
@@ -561,7 +640,7 @@ int reserve_memtype(u64 start, u64 end, enum page_cache_mode req_type,
if (!pat_enabled()) {
/* This is identical to page table setting without PAT */
if (new_type)
- *new_type = req_type;
+ *new_type = pat_x_mtrr_type(start, end, req_type);
return 0;
}

@@ -577,6 +656,8 @@ int reserve_memtype(u64 start, u64 end, enum page_cache_mode req_type,
* optimization for /dev/mem mmap'ers into WB memory (BIOS
* tools and ACPI tools). Use WB request for WB memory and use
* UC_MINUS otherwise.
+ * When mtrr is disabled, check iomem resource which is derived
+ * from e820.
*/
actual_type = pat_x_mtrr_type(start, end, req_type);

--
2.17.1

2019-08-09 07:08:29

by Borislav Petkov

Subject: Re: [PATCH 0/3] x86/mtrr, pat: make PAT independent from MTRR

On Thu, Aug 08, 2019 at 08:54:17PM -0700, Isaku Yamahata wrote:
> Make PAT(Page Attribute Table) independent from
> MTRR(Memory Type Range Register).
> Some environments (mainly virtual ones) support only PAT, but not MTRR
> because PAT replaces MTRR.
> It's tricky and no gain to support both MTRR and PAT except compatibility.
> So some VM technologies don't support MTRR, but only PAT.
> This patch series makes PAT available on such environments without MTRR.

And this "justification" is not even trying. Which "VM technologies" are
those? Why do we care? What's the impact? Why do we want this?

You need to sell this properly.

Also, your patches are huge. You'd need to split them sensibly.

Thx.

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

2019-08-09 19:53:13

by Kani, Toshimitsu

Subject: Re: [PATCH 0/3] x86/mtrr, pat: make PAT independent from MTRR

On Fri, 2019-08-09 at 09:06 +0200, Borislav Petkov wrote:
> On Thu, Aug 08, 2019 at 08:54:17PM -0700, Isaku Yamahata wrote:
> > Make PAT(Page Attribute Table) independent from
> > MTRR(Memory Type Range Register).
> > Some environments (mainly virtual ones) support only PAT, but not MTRR
> > because PAT replaces MTRR.
> > It's tricky and no gain to support both MTRR and PAT except compatibility.
> > So some VM technologies don't support MTRR, but only PAT.

I do not think it is technically correct on bare metal. AFAIK, MTRR is
still the only way to set up cache attributes in real mode, which the
BIOS SMI handler relies on in SMM.

> > This patch series makes PAT available on such environments without MTRR.
>
> And this "justification" is not even trying. Which "VM technologies" are
> those? Why do we care? What's the impact? Why do we want this?
>
> You need to sell this properly.

Agreed. If the situation is still the same, Xen does not support MTRR,
and the kernel sets the PAT table to the BIOS hand-off state when MTRR
is disabled. The change below accommodated the fact that Xen hypervisor
enables WC before hand-off, which is different from the default BIOS
hand-off state. The kernel does not support setting PAT when MTRR is
disabled due to the dependency Isaku mentioned.


https://www.mail-archive.com/[email protected]/msg1107094.html


-Toshi

2019-08-13 08:06:20

by Borislav Petkov

Subject: Re: [PATCH 0/3] x86/mtrr, pat: make PAT independent from MTRR

On Tue, Aug 13, 2019 at 12:49:20AM -0700, Isaku Yamahata wrote:
> In addition to Xen, KVM+qemu can enable/disable MTRR, PAT independently.
> So user may want to disable MTRR to reduce attack surface.

No, no "user may want" etc vague formulations. Just because some virt
thing "can" do stuff doesn't mean we should change the kernel. What are
the clear benefits of your proposal, why should it go upstream and why
should it be exposed to everybody?

How is it going to be used and what would it bring?

Are there any downsides?

Thx.

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

2019-08-13 08:51:53

by Isaku Yamahata

Subject: Re: [PATCH 0/3] x86/mtrr, pat: make PAT independent from MTRR

On Fri, Aug 09, 2019 at 07:51:17PM +0000,
"Kani, Toshi" <[email protected]> wrote:

> On Fri, 2019-08-09 at 09:06 +0200, Borislav Petkov wrote:
> > On Thu, Aug 08, 2019 at 08:54:17PM -0700, Isaku Yamahata wrote:
> > > Make PAT(Page Attribute Table) independent from
> > > MTRR(Memory Type Range Register).
> > > Some environments (mainly virtual ones) support only PAT, but not MTRR
> > > because PAT replaces MTRR.
> > > It's tricky and no gain to support both MTRR and PAT except compatibility.
> > > So some VM technologies don't support MTRR, but only PAT.
>
> I do not think it is technically correct on bare metal. AFAIK, MTRR is
> still the only way to setup cache attribute in real-mode, which BIOS SMI
> handler relies on in SMM.

Then you're claiming that if it's bare metal, both MTRR and PAT should
be enabled/disabled at the same time?


> > > This patch series makes PAT available on such environments without MTRR.
> >
> > And this "justification" is not even trying. Which "VM technologies" are
> > those? Why do we care? What's the impact? Why do we want this?
> >
> > You need to sell this properly.
>
> Agreed. If the situation is still the same, Xen does not support MTRR,
> and the kernel sets the PAT table to the BIOS hand-off state when MTRR
> is disabled. The change below accommodated the fact that Xen hypervisor
> enables WC before hand-off, which is different from the default BIOS
> hand-off state. The kernel does not support setting PAT when MTRR is
> disabled due to the dependency Isaku mentioned.
>
>
> https://www.mail-archive.com/[email protected]/msg1107094.html

Thanks for the supplement.
In addition to Xen, KVM+qemu can enable/disable MTRR, PAT independently.
So user may want to disable MTRR to reduce attack surface.
ACRN doesn't support MTRR.

Let me include those descriptions in the next respin.
--
Isaku Yamahata <[email protected]>

2019-08-13 15:09:24

by Kani, Toshimitsu

Subject: Re: [PATCH 0/3] x86/mtrr, pat: make PAT independent from MTRR

On Tue, 2019-08-13 at 00:49 -0700, Isaku Yamahata wrote:
> On Fri, Aug 09, 2019 at 07:51:17PM +0000,
> "Kani, Toshi" <[email protected]> wrote:
>
> > On Fri, 2019-08-09 at 09:06 +0200, Borislav Petkov wrote:
> > > On Thu, Aug 08, 2019 at 08:54:17PM -0700, Isaku Yamahata wrote:
> > > > Make PAT(Page Attribute Table) independent from
> > > > MTRR(Memory Type Range Register).
> > > > Some environments (mainly virtual ones) support only PAT, but not MTRR
> > > > because PAT replaces MTRR.
> > > > It's tricky and no gain to support both MTRR and PAT except compatibility.
> > > > So some VM technologies don't support MTRR, but only PAT.
> >
> > I do not think it is technically correct on bare metal. AFAIK, MTRR is
> > still the only way to setup cache attribute in real-mode, which BIOS SMI
> > handler relies on in SMM.
>
> Then you're claiming if it's baremetal, both MTRR and PAT should be
> enabled/disabled at the same time?

No, I did not say that. My point:
- Your statement of MTRR being useless is not correct. It's still used.
The OS should leave the MTRR hand-off state as it is.

I agree with you in general that PAT and MTRR init should be
independent. However, as Boris said, please verify the impact of your
change. As I mentioned with the Xen example, a hypervisor may have a
non-default PAT hand-off setting.

Thanks,
-Toshi