Since 'commit 9cd25aac1f44 ("x86/mm/pat: Emulate PAT when it
is disabled")', we emulate a PAT table when PAT is disabled.
This requires pat_init() be called even if PAT is disabled,
which revealed a long standing issue that PAT is left enabled
without calling pat_init() at all.
pat_init() is called from MTRR code since it relies on MTRR's
rendezvous handler to initialize PAT for all APs . However,
when CPU does not support MTRR, ex. qemu32's virtual CPU, MTRR
is set disabled and does not call pat_init(). There is no
interface available for MTRR to disable PAT, either.
This patch-set refactors MTRR and PAT initializations to make
sure that PAT is properly initialized in all cases.
---
Toshi Kani (2):
1/2 x86/mm/pat: Change pat_disable() to emulate PAT table
2/2 x86/mtrr: Refactor PAT initialization code
---
arch/x86/include/asm/mtrr.h | 6 ++-
arch/x86/include/asm/pat.h | 1 +
arch/x86/kernel/cpu/mtrr/generic.c | 24 ++++++-----
arch/x86/kernel/cpu/mtrr/main.c | 13 +++++-
arch/x86/kernel/cpu/mtrr/mtrr.h | 1 +
arch/x86/mm/pat.c | 84 +++++++++++++++++++++++---------------
6 files changed, 84 insertions(+), 45 deletions(-)
MTRR manages PAT initialization as it implements a rendezvous
handler that initializes PAT as part of MTRR initialization.
When CPU does not support MTRR, ex. qemu32 virtual CPU, MTRR
simply skips PAT init, which causes PAT left enabled without
initialization. Also, get_mtrr_state() calls pat_init() on
BSP even if MTRR is disabled by its MSR. This causes pat_init()
be called on BSP only.
The following changes are made to address these issues:
- Move BSP's PAT init code to mtrr_bp_pat_init(), and change
mtrr_bp_init() to call it if MTRR is enabled. This keeps
the init condition consistent with mtrr_ap_init() and
mtrr_aps_init() for APs.
- Change mtrr_bp_init() to call pat_disable() when MTRR is
disabled.
- Change mtrr_bp_init() stub function to call pat_disable()
when CONFIG_MTRR is unset.
The table below discribes how PAT is initialized in all possible
cases.
Legend
----------------------------
E Enabled in CPU feature and MSR
D Disabled in CPU feature or MSR
nopat "nopat" boot option specified
!PAT CONFIG_X86_PAT unset
!MTRR CONFIG_MTRR unset
MTRR PAT ACTION
====================================================================
E E MTRR calls pat_init() -> PAT enabled
E D MTRR calls pat_init() -> PAT disabled per cpu_has_pat
D E MTRR calls pat_disable() -> PAT disabled (*)
D D MTRR calls pat_disable() -> PAT disabled
E nopat nopat() calls pat_disable() -> PAT disabled
D nopat nopat() calls pat_disable() -> PAT disabled
E !PAT MTRR calls pat_init() -> PAT disabled per __pat_enabled
D !PAT MTRR calls pat_disable() -> PAT disabled
!MTRR !PAT mtrr_bp_init() stub calls pat_disable() -> PAT disabled
(*) Further enhancement may enable PAT if needed
Signed-off-by: Toshi Kani <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Luis R. Rodriguez <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Thomas Gleixner <[email protected]>
---
arch/x86/include/asm/mtrr.h | 6 +++++-
arch/x86/kernel/cpu/mtrr/generic.c | 24 ++++++++++++++----------
arch/x86/kernel/cpu/mtrr/main.c | 13 ++++++++++++-
arch/x86/kernel/cpu/mtrr/mtrr.h | 1 +
4 files changed, 32 insertions(+), 12 deletions(-)
diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
index b94f6f6..a965e74 100644
--- a/arch/x86/include/asm/mtrr.h
+++ b/arch/x86/include/asm/mtrr.h
@@ -24,6 +24,7 @@
#define _ASM_X86_MTRR_H
#include <uapi/asm/mtrr.h>
+#include <asm/pat.h>
/*
@@ -83,9 +84,12 @@ static inline int mtrr_trim_uncached_memory(unsigned long end_pfn)
static inline void mtrr_centaur_report_mcr(int mcr, u32 lo, u32 hi)
{
}
+static inline void mtrr_bp_init(void)
+{
+ pat_disable("PAT disabled by MTRR");
+}
#define mtrr_ap_init() do {} while (0)
-#define mtrr_bp_init() do {} while (0)
#define set_mtrr_aps_delayed_init() do {} while (0)
#define mtrr_aps_init() do {} while (0)
#define mtrr_bp_restore() do {} while (0)
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index c870af1..136ae86 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -444,11 +444,24 @@ static void __init print_mtrr_state(void)
pr_debug("TOM2: %016llx aka %lldM\n", mtrr_tom2, mtrr_tom2>>20);
}
+/* PAT setup for BP. We need to go through sync steps here */
+void __init mtrr_bp_pat_init(void)
+{
+ unsigned long flags;
+
+ local_irq_save(flags);
+ prepare_set();
+
+ pat_init();
+
+ post_set();
+ local_irq_restore(flags);
+}
+
/* Grab all of the MTRR state for this CPU into *state */
bool __init get_mtrr_state(void)
{
struct mtrr_var_range *vrs;
- unsigned long flags;
unsigned lo, dummy;
unsigned int i;
@@ -481,15 +494,6 @@ bool __init get_mtrr_state(void)
mtrr_state_set = 1;
- /* PAT setup for BP. We need to go through sync steps here */
- local_irq_save(flags);
- prepare_set();
-
- pat_init();
-
- post_set();
- local_irq_restore(flags);
-
return !!(mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED);
}
diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index 5c3d149..d9e91f1 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -752,6 +752,9 @@ void __init mtrr_bp_init(void)
/* BIOS may override */
__mtrr_enabled = get_mtrr_state();
+ if (mtrr_enabled())
+ mtrr_bp_pat_init();
+
if (mtrr_cleanup(phys_addr)) {
changed_by_mtrr_cleanup = 1;
mtrr_if->set_all();
@@ -759,8 +762,16 @@ void __init mtrr_bp_init(void)
}
}
- if (!mtrr_enabled())
+ if (!mtrr_enabled()) {
pr_info("MTRR: Disabled\n");
+
+ /*
+ * PAT initialization relies on MTRR's rendezvous handler.
+ * Disable PAT until the handler can initialize both features
+ * independently.
+ */
+ pat_disable("PAT disabled by MTRR");
+ }
}
void mtrr_ap_init(void)
diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.h b/arch/x86/kernel/cpu/mtrr/mtrr.h
index 951884d..6c7ced0 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.h
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.h
@@ -52,6 +52,7 @@ void set_mtrr_prepare_save(struct set_mtrr_context *ctxt);
void fill_mtrr_var_range(unsigned int index,
u32 base_lo, u32 base_hi, u32 mask_lo, u32 mask_hi);
bool get_mtrr_state(void);
+void mtrr_bp_pat_init(void);
extern void set_mtrr_ops(const struct mtrr_ops *ops);
Since 'commit 9cd25aac1f44 ("x86/mm/pat: Emulate PAT when it
is disabled")', we emulate a PAT table when PAT is disabled.
This requires pat_init() be called even if PAT is disabled,
which revealed a long standing issue that PAT is left enabled
without calling pat_init() at all.
pat_init() is called from MTRR code since it relies on MTRR's
rendezvous handler to initialize PAT for all APs . However,
when CPU does not support MTRR, ex. qemu32's virtual CPU, MTRR
is set disabled and does not call pat_init(). There is no
interface available for MTRR to disable PAT, either.
Change pat_disable() to a regular function (from an inline func)
so that MTRR can call it to disable PAT when MTRR is disabled.
pat_disable() sets PAT disabled, and calls pat_disable_init()
to emulate the PAT table.
link: https://lkml.org/lkml/2016/3/10/402
Reported-by: Paul Gortmaker <[email protected]>
Signed-off-by: Toshi Kani <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Luis R. Rodriguez <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Thomas Gleixner <[email protected]>
---
arch/x86/include/asm/pat.h | 1 +
arch/x86/mm/pat.c | 84 +++++++++++++++++++++++++++-----------------
2 files changed, 52 insertions(+), 33 deletions(-)
diff --git a/arch/x86/include/asm/pat.h b/arch/x86/include/asm/pat.h
index ca6c228..016142b 100644
--- a/arch/x86/include/asm/pat.h
+++ b/arch/x86/include/asm/pat.h
@@ -5,6 +5,7 @@
#include <asm/pgtable_types.h>
bool pat_enabled(void);
+void pat_disable(const char *reason);
extern void pat_init(void);
void pat_init_cache_modes(u64);
diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index f4ae536..1ff8aa9 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -40,11 +40,19 @@
static bool boot_cpu_done;
static int __read_mostly __pat_enabled = IS_ENABLED(CONFIG_X86_PAT);
+static void pat_disable_init(void);
-static inline void pat_disable(const char *reason)
+void pat_disable(const char *reason)
{
+ if (boot_cpu_done) {
+ pr_info("x86/PAT: PAT cannot be disabled after initialized\n");
+ return;
+ }
+
__pat_enabled = 0;
pr_info("x86/PAT: %s\n", reason);
+
+ pat_disable_init();
}
static int __init nopat(char *str)
@@ -207,9 +215,6 @@ static void pat_bsp_init(u64 pat)
return;
}
- if (!pat_enabled())
- goto done;
-
rdmsrl(MSR_IA32_CR_PAT, tmp_pat);
if (!tmp_pat) {
pat_disable("PAT MSR is 0, disabled.");
@@ -218,15 +223,11 @@ static void pat_bsp_init(u64 pat)
wrmsrl(MSR_IA32_CR_PAT, pat);
-done:
pat_init_cache_modes(pat);
}
static void pat_ap_init(u64 pat)
{
- if (!pat_enabled())
- return;
-
if (!cpu_has_pat) {
/*
* If this happens we are on a secondary CPU, but switched to
@@ -238,38 +239,55 @@ static void pat_ap_init(u64 pat)
wrmsrl(MSR_IA32_CR_PAT, pat);
}
+static void pat_disable_init(void)
+{
+ u64 pat;
+ static int disable_init_done;
+
+ if (disable_init_done)
+ return;
+
+ /*
+ * No PAT. Emulate the PAT table that corresponds to the two
+ * cache bits, PWT (Write Through) and PCD (Cache Disable). This
+ * setup is the same as the BIOS default setup when the system
+ * has PAT but the "nopat" boot option has been specified. This
+ * emulated PAT table is used when MSR_IA32_CR_PAT returns 0.
+ *
+ * PTE encoding:
+ *
+ * PCD
+ * |PWT PAT
+ * || slot
+ * 00 0 WB : _PAGE_CACHE_MODE_WB
+ * 01 1 WT : _PAGE_CACHE_MODE_WT
+ * 10 2 UC-: _PAGE_CACHE_MODE_UC_MINUS
+ * 11 3 UC : _PAGE_CACHE_MODE_UC
+ *
+ * NOTE: When WC or WP is used, it is redirected to UC- per
+ * the default setup in __cachemode2pte_tbl[].
+ */
+ pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) |
+ PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
+
+ pat_init_cache_modes(pat);
+
+ disable_init_done = 1;
+}
+
void pat_init(void)
{
u64 pat;
struct cpuinfo_x86 *c = &boot_cpu_data;
if (!pat_enabled()) {
- /*
- * No PAT. Emulate the PAT table that corresponds to the two
- * cache bits, PWT (Write Through) and PCD (Cache Disable). This
- * setup is the same as the BIOS default setup when the system
- * has PAT but the "nopat" boot option has been specified. This
- * emulated PAT table is used when MSR_IA32_CR_PAT returns 0.
- *
- * PTE encoding:
- *
- * PCD
- * |PWT PAT
- * || slot
- * 00 0 WB : _PAGE_CACHE_MODE_WB
- * 01 1 WT : _PAGE_CACHE_MODE_WT
- * 10 2 UC-: _PAGE_CACHE_MODE_UC_MINUS
- * 11 3 UC : _PAGE_CACHE_MODE_UC
- *
- * NOTE: When WC or WP is used, it is redirected to UC- per
- * the default setup in __cachemode2pte_tbl[].
- */
- pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) |
- PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
+ pat_disable_init();
+ return;
+ }
- } else if ((c->x86_vendor == X86_VENDOR_INTEL) &&
- (((c->x86 == 0x6) && (c->x86_model <= 0xd)) ||
- ((c->x86 == 0xf) && (c->x86_model <= 0x6)))) {
+ if ((c->x86_vendor == X86_VENDOR_INTEL) &&
+ (((c->x86 == 0x6) && (c->x86_model <= 0xd)) ||
+ ((c->x86 == 0xf) && (c->x86_model <= 0x6)))) {
/*
* PAT support with the lower four entries. Intel Pentium 2,
* 3, M, and 4 are affected by PAT errata, which makes the
* Toshi Kani <[email protected]> wrote:
> MTRR manages PAT initialization as it implements a rendezvous
> handler that initializes PAT as part of MTRR initialization.
>
> When CPU does not support MTRR, ex. qemu32 virtual CPU, MTRR
> simply skips PAT init, which causes PAT left enabled without
> initialization. [...]
What practical effects does this have to the user? Does the kernel crash?
Thanks,
Ingo
On Thu, Mar 10, 2016 at 09:45:45PM -0700, Toshi Kani wrote:
> Since 'commit 9cd25aac1f44 ("x86/mm/pat: Emulate PAT when it
> is disabled")', we emulate a PAT table when PAT is disabled.
> This requires pat_init() be called even if PAT is disabled,
> which revealed a long standing issue that PAT is left enabled
> without calling pat_init() at all.
>
> pat_init() is called from MTRR code since it relies on MTRR's
> rendezvous handler to initialize PAT for all APs . However,
> when CPU does not support MTRR, ex. qemu32's virtual CPU, MTRR
> is set disabled and does not call pat_init(). There is no
> interface available for MTRR to disable PAT, either.
>
> Change pat_disable() to a regular function (from an inline func)
> so that MTRR can call it to disable PAT when MTRR is disabled.
> pat_disable() sets PAT disabled, and calls pat_disable_init()
> to emulate the PAT table.
>
> link: https://lkml.org/lkml/2016/3/10/402
> Reported-by: Paul Gortmaker <[email protected]>
> Signed-off-by: Toshi Kani <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Luis R. Rodriguez <[email protected]>
> Cc: Juergen Gross <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: H. Peter Anvin <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> ---
> arch/x86/include/asm/pat.h | 1 +
> arch/x86/mm/pat.c | 84 +++++++++++++++++++++++++++-----------------
> 2 files changed, 52 insertions(+), 33 deletions(-)
>
> diff --git a/arch/x86/include/asm/pat.h b/arch/x86/include/asm/pat.h
> index ca6c228..016142b 100644
> --- a/arch/x86/include/asm/pat.h
> +++ b/arch/x86/include/asm/pat.h
> @@ -5,6 +5,7 @@
> #include <asm/pgtable_types.h>
>
> bool pat_enabled(void);
> +void pat_disable(const char *reason);
> extern void pat_init(void);
> void pat_init_cache_modes(u64);
>
> diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
> index f4ae536..1ff8aa9 100644
> --- a/arch/x86/mm/pat.c
> +++ b/arch/x86/mm/pat.c
> @@ -40,11 +40,19 @@
> static bool boot_cpu_done;
>
> static int __read_mostly __pat_enabled = IS_ENABLED(CONFIG_X86_PAT);
> +static void pat_disable_init(void);
>
> -static inline void pat_disable(const char *reason)
> +void pat_disable(const char *reason)
> {
> + if (boot_cpu_done) {
> + pr_info("x86/PAT: PAT cannot be disabled after initialized\n");
pr_err()
> + return;
> + }
> +
> __pat_enabled = 0;
> pr_info("x86/PAT: %s\n", reason);
> +
> + pat_disable_init();
Why can't you call pat_init() here simply? It checks pat_enabled(). You
can call it pat_setup() or so if it looks confusing to call an init
function in a disable function...
Then you don't have to add yet another static disable_init_done but rely on
boot_cpu_done which gets set in pat_init().
Also, I don't see the static_cpu_has() check I suggested yesterday - we need to
check the feature bits if PAT gets disabled early on some old Intels.
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
* Ingo Molnar <[email protected]> wrote:
>
> * Toshi Kani <[email protected]> wrote:
>
> > MTRR manages PAT initialization as it implements a rendezvous
> > handler that initializes PAT as part of MTRR initialization.
> >
> > When CPU does not support MTRR, ex. qemu32 virtual CPU, MTRR
> > simply skips PAT init, which causes PAT left enabled without
> > initialization. [...]
>
> What practical effects does this have to the user? Does the kernel crash?
Btw., I find this omission _highly_ annoying: describing what negative effects a
bug _causes in practice_ is the most important part of a changelog. How on earth
can an experienced contributor omit such an important component from a patch
description?
Most readers of changelogs couldn't care less about technical details of how the
bug is fixed (of course others will read it so it's nice to have too), but what
symptoms a bug causes, how serious is it, whether it should be backported are like
super important compared to everything else you wrote - and both the description
and the changelogs are totally silent on those topics ...
I've seen this in other PAT patches - please try to improve this.
Thanks,
Ingo
On Thu, Mar 10, 2016 at 09:45:46PM -0700, Toshi Kani wrote:
> MTRR manages PAT initialization as it implements a rendezvous
> handler that initializes PAT as part of MTRR initialization.
>
> When CPU does not support MTRR, ex. qemu32 virtual CPU, MTRR
> simply skips PAT init, which causes PAT left enabled without
> initialization. Also, get_mtrr_state() calls pat_init() on
> BSP even if MTRR is disabled by its MSR. This causes pat_init()
> be called on BSP only.
So I don't understand what all this hoopla is all about: why can't you
simply call pat_disable() in mtrr_ap_init() and be done with it?
void mtrr_ap_init(void)
{
if (!mtrr_enabled()) {
pat_disable();
return;
}
?
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
On Fri, 2016-03-11 at 09:12 +0000, Borislav Petkov wrote:
> On Thu, Mar 10, 2016 at 09:45:45PM -0700, Toshi Kani wrote:
:
> >
> > -static inline void pat_disable(const char *reason)
> > +void pat_disable(const char *reason)
> > {
> > + if (boot_cpu_done) {
> > + pr_info("x86/PAT: PAT cannot be disabled after
> > initialized\n");
>
> pr_err()
Will do.
>
> > + return;
> > + }
> > +
> > __pat_enabled = 0;
> > pr_info("x86/PAT: %s\n", reason);
> > +
> > + pat_disable_init();
>
> Why can't you call pat_init() here simply? It checks pat_enabled(). You
> can call it pat_setup() or so if it looks confusing to call an init
> function in a disable function...
How about pat_disable_setup()? It's only used for the disabled case, so
I'd prefer to keep the word "disable".
Yes, calling pat_init() from pat_disable() works too. I changed it in this
way because:
- pat_bsp_init() calls pat_disabled() in an error case. It is simpler to
avoid a recursive call to pat_init().
- pat_bsp_init() has two different error paths, 1) call pat_disable() and
return, and 2) goto done and call pat_init_cache_modes(). We can remove
case 2) to keep the error handling consistent in this way.
> Then you don't have to add yet another static disable_init_done but rely
> on boot_cpu_done which gets set in pat_init().
Right, but it will do 'boot_cpu_done = true' twice, and this implicit
recursive call may cause an issue in future if someone makes change
carelessly.
> Also, I don't see the static_cpu_has() check I suggested yesterday - we
> need to check the feature bits if PAT gets disabled early on some old
> Intels.
Sorry, I should have mentioned it. I ended up not needing this change. The
table in patch 2/2 covers this case as:
MTRR PAT ACTION
====================================================================
E D MTRR calls pat_init() -> PAT disabled per cpu_has_pat
That is, the check with cpu_has_pat in pat_bsp_init() calls pat_disable()
in this case. I preferred this way because it will continue to log a
message "PAT not supported by CPU.", and keeps __pat_enabled as the single
variable to manage the PAT state.
Thanks,
-Toshi
On Fri, Mar 11, 2016 at 09:27:40AM -0700, Toshi Kani wrote:
> How about pat_disable_setup()? It's only used for the disabled case, so
> I'd prefer to keep the word "disable".
What for?
Renaming pat_init() to pat_setup() is perfectly fine as it sets up PAT
after looking at pat_disabled() setting and after looking at the CPU
vendor. Sounds like a perfectly sane design to me.
> Yes, calling pat_init() from pat_disable() works too. I changed it in this
> way because:
> - pat_bsp_init() calls pat_disabled() in an error case. It is simpler to
> avoid a recursive call to pat_init().
So do this:
static inline void pat_disable(const char *reason)
{
if (!__pat_enabled)
return;
> - pat_bsp_init() has two different error paths, 1) call pat_disable() and
> return, and 2) goto done and call pat_init_cache_modes(). We can remove
> case 2) to keep the error handling consistent in this way.
Above.
> > Then you don't have to add yet another static disable_init_done but rely
> > on boot_cpu_done which gets set in pat_init().
>
> Right, but it will do 'boot_cpu_done = true' twice, and this implicit
> recursive call may cause an issue in future if someone makes change
> carelessly.
So move boot_cpu_done into pat_bsp_init() and make it protect that
function from a being called a second time.
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
On Fri, 2016-03-11 at 09:13 +0000, Ingo Molnar wrote:
> * Ingo Molnar <[email protected]> wrote:
>
> >
> > * Toshi Kani <[email protected]> wrote:
> >
> > > MTRR manages PAT initialization as it implements a rendezvous
> > > handler that initializes PAT as part of MTRR initialization.
> > >
> > > When CPU does not support MTRR, ex. qemu32 virtual CPU, MTRR
> > > simply skips PAT init, which causes PAT left enabled without
> > > initialization. [...]
> >
> > What practical effects does this have to the user? Does the kernel
> > crash?
>
> Btw., I find this omission _highly_ annoying: describing what negative
> effects a bug _causes in practice_ is the most important part of a
> changelog. How on earth can an experienced contributor omit such an
> important component from a patch description?
>
> Most readers of changelogs couldn't care less about technical details of
> how the bug is fixed (of course others will read it so it's nice to have
> too), but what symptoms a bug causes, how serious is it, whether it
> should be backported are like super important compared to everything else
> you wrote - and both the description and the changelogs are totally
> silent on those topics ...
>
> I've seen this in other PAT patches - please try to improve this.
My apology. I agree the importance of describing the negative effect of the
issue. This case is complicated to describe thoroughly, but here is a
summary.
The issue was reported as a regression caused by 'commit 9cd25aac1f44
("x86/mm/pat: Emulate PAT when it is disabled")'. So, the goal of this
patchset is to fix this regression.
https://lkml.org/lkml/2016/3/3/828
The negative effects of the issue were two failures in Xorg on qemu32 env,
which was triggered by the fact that its virtual CPU does not support MTRR.
https://lkml.org/lkml/2016/3/4/775
#1. copy_process() failed in the check in reserve_pfn_range()
#2. error path in copy_process() then hit WARN_ON_ONCE in untrack_pfn().
These negative effects are also caused by two different bugs, but they can
be dealt in lower priority. Fixing the pat_init() issue will avoid Xorg
hitting these cases.
When the CPU does not support MTRR, MTRR does not call pat_init(), but
leaves PAT enabled. This pat_init() issue is a long-standing issue, but
manifested as issue #1 (and then hit issue #2) with the commit because the
memtype now tracks cache attribute with 'page_cache_mode'. A WC map request
is tracked as WC in memtype, but sets a PTE as UC (pgprot) per
__cachemode2pte_tbl[]. This caused an error in reserve_pfn_range() when it
was called from track_pfn_copy(), which obtained pgprot from a PTE. It
converts pgprot to page_cache_mode, which does not necessarily result in
the original page_cache_mode since __cachemode2pte_tbl[] redirects multiple
types to UC. This is a separate issue in reserve_pfn_range().
If PAT is set to disabled properly, the code bypasses the memtype check.
So, #1 is a non-issue as a result (although it should be fixed).
This pat_init() issue existed before commit 9cd25aac1f44, but we used
pgprot in memtype. Hence, we did not have issue #1 before. But WC request
resulted in WT in effect because WC pgrot is actually WT when PAT is not
initialized. This is not how the code was designed to work. When PAT is set
to disabled properly, WC gets converted to UC. The use of WT can result in
a system crash if the target range does not support WT, but fortunately
people did not run into such issue before.
Thanks,
-Toshi
On Fri, 2016-03-11 at 10:24 +0100, Borislav Petkov wrote:
> On Thu, Mar 10, 2016 at 09:45:46PM -0700, Toshi Kani wrote:
> > MTRR manages PAT initialization as it implements a rendezvous
> > handler that initializes PAT as part of MTRR initialization.
> >
> > When CPU does not support MTRR, ex. qemu32 virtual CPU, MTRR
> > simply skips PAT init, which causes PAT left enabled without
> > initialization. Also, get_mtrr_state() calls pat_init() on
> > BSP even if MTRR is disabled by its MSR. This causes pat_init()
> > be called on BSP only.
>
> So I don't understand what all this hoopla is all about: why can't you
> simply call pat_disable() in mtrr_ap_init() and be done with it?
>
> void mtrr_ap_init(void)
> {
> if (!mtrr_enabled()) {
> pat_disable();
> return;
> }
>
> ?
No, it does not fix it. The problem in this particular case, i.e. MTRR
disabled by its MSR, is that mtrr_bp_init() calls pat_init() (as PAT
enabled) and initializes PAT on BSP. After APs are launched, we need the
MTRR's rendezvous handler to initialize PAT on APs to be consistent with
BSP. However, MTRR rendezvous handler is no-op since MTRR is disabled.
Hence, we cannot let mtrr_bp_init() to call pat_init() when MTRR is
disabled.
Thanks,
-Toshi
On Fri, 2016-03-11 at 16:54 +0100, Borislav Petkov wrote:
> On Fri, Mar 11, 2016 at 09:27:40AM -0700, Toshi Kani wrote:
> > How about pat_disable_setup()? It's only used for the disabled case,
> > so I'd prefer to keep the word "disable".
>
> What for?
>
> Renaming pat_init() to pat_setup() is perfectly fine as it sets up PAT
> after looking at pat_disabled() setting and after looking at the CPU
> vendor. Sounds like a perfectly sane design to me.
Sorry, I meant to say -- "How about renaming pat_disable_init() to
pat_disable_setup()?" since I thought you had suggested to rename
pat_disable_init() to pat_setup(). I am still in favor of having a
separate setup func for the disabled case.
> > Yes, calling pat_init() from pat_disable() works too. I changed it in
> > this way because:
> > - pat_bsp_init() calls pat_disabled() in an error case. It is simpler
> > to avoid a recursive call to pat_init().
>
> So do this:
>
> static inline void pat_disable(const char *reason)
> {
> if (!__pat_enabled)
> return;
Hmm... I do not think I understand this. When pat_bsp_init() calls
pat_disable(), 'pat' has been set to the "Full PAT support" setup. So, we
need to reset 'pat' to the "No PAT" setup. How is this handled in your
case?
> > - pat_bsp_init() has two different error paths, 1) call pat_disable()
> > and return, and 2) goto done and call pat_init_cache_modes(). We can
> > remove case 2) to keep the error handling consistent in this way.
>
> Above.
>
> > > Then you don't have to add yet another static disable_init_done but
> > > rely on boot_cpu_done which gets set in pat_init().
> >
> > Right, but it will do 'boot_cpu_done = true' twice, and this implicit
> > recursive call may cause an issue in future if someone makes change
> > carelessly.
>
> So move boot_cpu_done into pat_bsp_init() and make it protect that
> function from a being called a second time.
I think this leads more complication in the end. pat_init() covers (too)
many scenarios already, and moving the disabled setup case out will
simplify it, IMHO.
Thanks,
-Toshi
On Fri, Mar 11, 2016 at 11:57:12AM -0700, Toshi Kani wrote:
> On Fri, 2016-03-11 at 10:24 +0100, Borislav Petkov wrote:
> > On Thu, Mar 10, 2016 at 09:45:46PM -0700, Toshi Kani wrote:
> > > MTRR manages PAT initialization as it implements a rendezvous
> > > handler that initializes PAT as part of MTRR initialization.
> > >
> > > When CPU does not support MTRR, ex. qemu32 virtual CPU, MTRR
> > > simply skips PAT init, which causes PAT left enabled without
> > > initialization.??Also, get_mtrr_state() calls pat_init() on
> > > BSP even if MTRR is disabled by its MSR.??This causes pat_init()
> > > be called on BSP only.
> >
> > So I don't understand what all this hoopla is all about: why can't you
> > simply call pat_disable() in mtrr_ap_init() and be done with it?
> >
> > void mtrr_ap_init(void)
> > {
> > ????????if (!mtrr_enabled()) {
> > pat_disable();
> > ????????????????return;
> > }
> >
> > ?
>
> No, it does not fix it. The problem in this particular case, i.e. MTRR
> disabled by its MSR, is that mtrr_bp_init() calls pat_init() (as PAT
> enabled) and initializes PAT on BSP. After APs are launched, we need the
> MTRR's rendezvous handler to initialize PAT on APs to be consistent with
> BSP. However, MTRR rendezvous handler is no-op since MTRR is disabled.
This seems like a hack on enabling PAT through MTRR code, can we have
a PAT rendezvous handler on its own, or provide a generic rendezvous
handler that lets you deal with whatever interfaces need setup. Then
conflicts can just be negotiated early.
What I'm after is seeing if we can ultimately disable MTRR on kernel
code but still have PAT enabled. I realize you've mentioned BIOS code
may use some MTRR setup code but this is only true for some systems.
I know for a fact Xen cannot use MTRR, it seems qemu32 does not enable
it either. So why not have the ability to skip through its set up ?
I'll also note Xen managed to enable PAT only without enabling MTRR,
this was done through pat_init_cache_modes() -- not sure if this can
be leveraged for qemu32...
Luis
On Fri, 2016-03-11 at 23:17 +0100, Luis R. Rodriguez wrote:
> On Fri, Mar 11, 2016 at 11:57:12AM -0700, Toshi Kani wrote:
> > On Fri, 2016-03-11 at 10:24 +0100, Borislav Petkov wrote:
> > > On Thu, Mar 10, 2016 at 09:45:46PM -0700, Toshi Kani wrote:
> > > > MTRR manages PAT initialization as it implements a rendezvous
> > > > handler that initializes PAT as part of MTRR initialization.
> > > >
> > > > When CPU does not support MTRR, ex. qemu32 virtual CPU, MTRR
> > > > simply skips PAT init, which causes PAT left enabled without
> > > > initialization. Also, get_mtrr_state() calls pat_init() on
> > > > BSP even if MTRR is disabled by its MSR. This causes pat_init()
> > > > be called on BSP only.
> > >
> > > So I don't understand what all this hoopla is all about: why can't
> > > you
> > > simply call pat_disable() in mtrr_ap_init() and be done with it?
> > >
> > > void mtrr_ap_init(void)
> > > {
> > > if (!mtrr_enabled()) {
> > > pat_disable();
> > > return;
> > > }
> > >
> > > ?
> >
> > No, it does not fix it. The problem in this particular case, i.e. MTRR
> > disabled by its MSR, is that mtrr_bp_init() calls pat_init() (as PAT
> > enabled) and initializes PAT on BSP. After APs are launched, we need
> > the MTRR's rendezvous handler to initialize PAT on APs to be consistent
> > with BSP. However, MTRR rendezvous handler is no-op since MTRR is
> > disabled.
>
> This seems like a hack on enabling PAT through MTRR code, can we have
> a PAT rendezvous handler on its own, or provide a generic rendezvous
> handler that lets you deal with whatever interfaces need setup. Then
> conflicts can just be negotiated early.
The MTRR code can be enhanced so that the rendezvous handler can handle
MTRR and PAT state independently. I noted this case as (*) in the table of
this patch description. This is a separate item, however.
MTRR calling PAT was not a hack (as I suppose we did not have VMs at that
time), although this can surely be improved. As Intel SDM state below,
both MTRR and PAT require the same procedure, and the PAT initialization
sequence is defined in the MTRR section.
===
11.12.4 Programming the PAT
:
The operating system is responsible for insuring that changes to a PAT
entry occur in a manner that maintains the consistency of the processor
caches and translation lookaside buffers (TLB). This is accomplished by
following the procedure as specified in Section 11.11.8, “MTRR
Considerations in MP Systems,” for changing the value of an MTRR in a
multiple processor system. It requires a specific sequence of operations
that includes flushing the processors caches and TLBs.
===
> What I'm after is seeing if we can ultimately disable MTRR on kernel
> code but still have PAT enabled. I realize you've mentioned BIOS code
> may use some MTRR setup code but this is only true for some systems.
> I know for a fact Xen cannot use MTRR, it seems qemu32 does not enable
> it either. So why not have the ability to skip through its set up ?
MTRR support has two meanings:
1) The kernel keeps the MTRR setup by BIOS.
2) The kernel modifies the MTRR setup.
I am in a position that we need 1) but 2). In fact, the kernel disabling
MTRRs is the same as 2).
> I'll also note Xen managed to enable PAT only without enabling MTRR,
> this was done through pat_init_cache_modes() -- not sure if this can
> be leveraged for qemu32...
I am interested to know how Xen managed this. Is this done by the Xen
hypervisor initializes guest's PAT on behalf of the guest kernel?
Thanks,
-Toshi
On Fri, Mar 11, 2016 at 3:56 PM, Toshi Kani <[email protected]> wrote:
> On Fri, 2016-03-11 at 23:17 +0100, Luis R. Rodriguez wrote:
>> On Fri, Mar 11, 2016 at 11:57:12AM -0700, Toshi Kani wrote:
>> > On Fri, 2016-03-11 at 10:24 +0100, Borislav Petkov wrote:
>> > > On Thu, Mar 10, 2016 at 09:45:46PM -0700, Toshi Kani wrote:
>> > > > MTRR manages PAT initialization as it implements a rendezvous
>> > > > handler that initializes PAT as part of MTRR initialization.
>> > > >
>> > > > When CPU does not support MTRR, ex. qemu32 virtual CPU, MTRR
>> > > > simply skips PAT init, which causes PAT left enabled without
>> > > > initialization. Also, get_mtrr_state() calls pat_init() on
>> > > > BSP even if MTRR is disabled by its MSR. This causes pat_init()
>> > > > be called on BSP only.
>> > >
>> > > So I don't understand what all this hoopla is all about: why can't
>> > > you
>> > > simply call pat_disable() in mtrr_ap_init() and be done with it?
>> > >
>> > > void mtrr_ap_init(void)
>> > > {
>> > > if (!mtrr_enabled()) {
>> > > pat_disable();
>> > > return;
>> > > }
>> > >
>> > > ?
>> >
>> > No, it does not fix it. The problem in this particular case, i.e. MTRR
>> > disabled by its MSR, is that mtrr_bp_init() calls pat_init() (as PAT
>> > enabled) and initializes PAT on BSP. After APs are launched, we need
>> > the MTRR's rendezvous handler to initialize PAT on APs to be consistent
>> > with BSP. However, MTRR rendezvous handler is no-op since MTRR is
>> > disabled.
>>
>> This seems like a hack on enabling PAT through MTRR code, can we have
>> a PAT rendezvous handler on its own, or provide a generic rendezvous
>> handler that lets you deal with whatever interfaces need setup. Then
>> conflicts can just be negotiated early.
>
> The MTRR code can be enhanced so that the rendezvous handler can handle
> MTRR and PAT state independently. I noted this case as (*) in the table of
> this patch description. This is a separate item, however.
>
> MTRR calling PAT was not a hack (as I suppose we did not have VMs at that
> time), although this can surely be improved. As Intel SDM state below,
> both MTRR and PAT require the same procedure, and the PAT initialization
> sequence is defined in the MTRR section.
>
> ===
> 11.12.4 Programming the PAT
> :
> The operating system is responsible for insuring that changes to a PAT
> entry occur in a manner that maintains the consistency of the processor
> caches and translation lookaside buffers (TLB). This is accomplished by
> following the procedure as specified in Section 11.11.8, “MTRR
> Considerations in MP Systems,” for changing the value of an MTRR in a
> multiple processor system. It requires a specific sequence of operations
> that includes flushing the processors caches and TLBs.
> ===
>
>> What I'm after is seeing if we can ultimately disable MTRR on kernel
>> code but still have PAT enabled. I realize you've mentioned BIOS code
>> may use some MTRR setup code but this is only true for some systems.
>> I know for a fact Xen cannot use MTRR, it seems qemu32 does not enable
>> it either. So why not have the ability to skip through its set up ?
>
> MTRR support has two meanings:
> 1) The kernel keeps the MTRR setup by BIOS.
> 2) The kernel modifies the MTRR setup.
>
> I am in a position that we need 1) but 2).
I take it you meant "but not 2)" ? There *are folks however who do
more as I noted earlier. Perhaps now now, but in the future I'd
encourage folks to rip MTRR out of their own BIOS, and enable a new
ACPI legacy flag to say "MTRR required". That'd eventually can help
bury MTRR for good while remaining backward compatible.
I can read the above description to also say:
"Hey you need to implement PAT with the same skeleton code as MTRR"
If we do that, we can pave the way to deprecate MTRR as legacy for
good first on Linux.
> In fact, the kernel disabling MTRRs is the same as 2).
>
>> I'll also note Xen managed to enable PAT only without enabling MTRR,
>> this was done through pat_init_cache_modes() -- not sure if this can
>> be leveraged for qemu32...
>
> I am interested to know how Xen managed this. Is this done by the Xen
> hypervisor initializes guest's PAT on behalf of the guest kernel?
Yup. And the cache read thingy was reading back its own setup, which
was different than what Linux used by default IIRC. Juergen can
elaborate more.
Luis
On Fri, 2016-03-11 at 15:34 -0800, Luis R. Rodriguez wrote:
> On Fri, Mar 11, 2016 at 3:56 PM, Toshi Kani <[email protected]> wrote:
> > On Fri, 2016-03-11 at 23:17 +0100, Luis R. Rodriguez wrote:
> > > On Fri, Mar 11, 2016 at 11:57:12AM -0700, Toshi Kani wrote:
> > > > On Fri, 2016-03-11 at 10:24 +0100, Borislav Petkov wrote:
> > > > > On Thu, Mar 10, 2016 at 09:45:46PM -0700, Toshi Kani wrote:
> > > > > > MTRR manages PAT initialization as it implements a rendezvous
> > > > > > handler that initializes PAT as part of MTRR initialization.
:
> > > >
> > > > No, it does not fix it. The problem in this particular case, i.e.
> > > > MTRR disabled by its MSR, is that mtrr_bp_init() calls pat_init()
> > > > (as PAT enabled) and initializes PAT on BSP. After APs are
> > > > launched, we need the MTRR's rendezvous handler to initialize PAT
> > > > on APs to be consistent with BSP. However, MTRR rendezvous handler
> > > > is no-op since MTRR is disabled.
> > >
> > > This seems like a hack on enabling PAT through MTRR code, can we have
> > > a PAT rendezvous handler on its own, or provide a generic rendezvous
> > > handler that lets you deal with whatever interfaces need setup. Then
> > > conflicts can just be negotiated early.
> >
> > The MTRR code can be enhanced so that the rendezvous handler can handle
> > MTRR and PAT state independently. I noted this case as (*) in the
> > table of this patch description. This is a separate item, however.
> >
> > MTRR calling PAT was not a hack (as I suppose we did not have VMs at
> > that time), although this can surely be improved. As Intel SDM state
> > below, both MTRR and PAT require the same procedure, and the PAT
> > initialization sequence is defined in the MTRR section.
> >
> > ===
> > 11.12.4 Programming the PAT
> > :
> > The operating system is responsible for insuring that changes to a PAT
> > entry occur in a manner that maintains the consistency of the processor
> > caches and translation lookaside buffers (TLB). This is accomplished by
> > following the procedure as specified in Section 11.11.8, “MTRR
> > Considerations in MP Systems,” for changing the value of an MTRR in a
> > multiple processor system. It requires a specific sequence of
> > operations that includes flushing the processors caches and TLBs.
> > ===
> >
> > > What I'm after is seeing if we can ultimately disable MTRR on kernel
> > > code but still have PAT enabled. I realize you've mentioned BIOS code
> > > may use some MTRR setup code but this is only true for some systems.
> > > I know for a fact Xen cannot use MTRR, it seems qemu32 does not
> > > enable
> > > it either. So why not have the ability to skip through its set up ?
> >
> > MTRR support has two meanings:
> > 1) The kernel keeps the MTRR setup by BIOS.
> > 2) The kernel modifies the MTRR setup.
> >
> > I am in a position that we need 1) but 2).
>
> I take it you meant "but not 2)" ?
Yes. :)
> There *are folks however who do
> more as I noted earlier. Perhaps now now, but in the future I'd
> encourage folks to rip MTRR out of their own BIOS, and enable a new
> ACPI legacy flag to say "MTRR required". That'd eventually can help
> bury MTRR for good while remaining backward compatible.
Well, BIOS using MTRR is better than BIOS setting page tables in the SMI
handler. The kernel can be ignorant of the MTRR setup as long as it does
not modify it.
> I can read the above description to also say:
>
> "Hey you need to implement PAT with the same skeleton code as MTRR"
No, I did not say that. MTRR's rendezvous handler can be generalized to
work with both MTRR and PAT. We do not need two separate handlers. In
fact, it needs to be a single handler so that both can be initialized
together.
> If we do that, we can pave the way to deprecate MTRR as legacy for
> good first on Linux.
I do not think such change will deprecate MTRR. It just means that Linux
can enable PAT on virtual CPUs with PAT & !MTRR capability.
> > In fact, the kernel disabling MTRRs is the same as 2).
> >
> > > I'll also note Xen managed to enable PAT only without enabling MTRR,
> > > this was done through pat_init_cache_modes() -- not sure if this can
> > > be leveraged for qemu32...
> >
> > I am interested to know how Xen managed this. Is this done by the Xen
> > hypervisor initializes guest's PAT on behalf of the guest kernel?
>
> Yup. And the cache read thingy was reading back its own setup, which
> was different than what Linux used by default IIRC. Juergen can
> elaborate more.
Yeah, I'd like to make sure that my changes won't break it.
Thanks,
-Toshi
Ok, let's start again.
So what's wrong with this simpler, cleaner version below?
* It makes sure to run the rendezvous handler on the BP on init.
* It disables PAT otherwise.
* Oh, and it boots fine with Paul's reproducer, X is there (even though
vncviewer dies when X starts with "Rect too large: 640x480 at (0, 0)"
and I have to reconnect to see the X window).
In dmesg I see "x86/PAT: Disabled by MTRR".
---
diff --git a/arch/x86/include/asm/pat.h b/arch/x86/include/asm/pat.h
index ca6c228d5e62..3f3b1bc67391 100644
--- a/arch/x86/include/asm/pat.h
+++ b/arch/x86/include/asm/pat.h
@@ -5,7 +5,8 @@
#include <asm/pgtable_types.h>
bool pat_enabled(void);
-extern void pat_init(void);
+void pat_disable(const char *reason);
+void pat_setup(void);
void pat_init_cache_modes(u64);
extern int reserve_memtype(u64 start, u64 end,
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index 19f57360dfd2..44bad29ebc9b 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -444,11 +444,24 @@ static void __init print_mtrr_state(void)
pr_debug("TOM2: %016llx aka %lldM\n", mtrr_tom2, mtrr_tom2>>20);
}
+void __init mtrr_pat_setup_bp(void)
+{
+ unsigned long flags;
+
+ /* PAT setup for BP. We need to go through sync steps here */
+ local_irq_save(flags);
+ prepare_set();
+
+ pat_setup();
+
+ post_set();
+ local_irq_restore(flags);
+}
+
/* Grab all of the MTRR state for this CPU into *state */
bool __init get_mtrr_state(void)
{
struct mtrr_var_range *vrs;
- unsigned long flags;
unsigned lo, dummy;
unsigned int i;
@@ -481,15 +494,6 @@ bool __init get_mtrr_state(void)
mtrr_state_set = 1;
- /* PAT setup for BP. We need to go through sync steps here */
- local_irq_save(flags);
- prepare_set();
-
- pat_init();
-
- post_set();
- local_irq_restore(flags);
-
return !!(mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED);
}
@@ -788,7 +792,7 @@ static void generic_set_all(void)
mask = set_mtrr_state();
/* also set PAT */
- pat_init();
+ pat_setup();
post_set();
local_irq_restore(flags);
diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index 10f8d4796240..5c442b4bd52a 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -752,6 +752,8 @@ void __init mtrr_bp_init(void)
/* BIOS may override */
__mtrr_enabled = get_mtrr_state();
+ mtrr_pat_setup_bp();
+
if (mtrr_cleanup(phys_addr)) {
changed_by_mtrr_cleanup = 1;
mtrr_if->set_all();
@@ -759,8 +761,11 @@ void __init mtrr_bp_init(void)
}
}
- if (!mtrr_enabled())
+ if (!__mtrr_enabled) {
pr_info("MTRR: Disabled\n");
+ pat_disable("PAT disabled by MTRR");
+ pat_setup();
+ }
}
void mtrr_ap_init(void)
diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.h b/arch/x86/kernel/cpu/mtrr/mtrr.h
index 951884dcc433..e60d87cceae5 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.h
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.h
@@ -52,6 +52,7 @@ void set_mtrr_prepare_save(struct set_mtrr_context *ctxt);
void fill_mtrr_var_range(unsigned int index,
u32 base_lo, u32 base_hi, u32 mask_lo, u32 mask_hi);
bool get_mtrr_state(void);
+void __init mtrr_pat_setup_bp(void);
extern void set_mtrr_ops(const struct mtrr_ops *ops);
diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index faec01e7a17d..ac8283bcf417 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -41,8 +41,11 @@ static bool boot_cpu_done;
static int __read_mostly __pat_enabled = IS_ENABLED(CONFIG_X86_PAT);
-static inline void pat_disable(const char *reason)
+void pat_disable(const char *reason)
{
+ if (!__pat_enabled)
+ return;
+
__pat_enabled = 0;
pr_info("x86/PAT: %s\n", reason);
}
@@ -238,7 +241,7 @@ static void pat_ap_init(u64 pat)
wrmsrl(MSR_IA32_CR_PAT, pat);
}
-void pat_init(void)
+void pat_setup(void)
{
u64 pat;
struct cpuinfo_x86 *c = &boot_cpu_data;
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
* Toshi Kani <[email protected]> wrote:
> On Fri, 2016-03-11 at 09:13 +0000, Ingo Molnar wrote:
> > * Ingo Molnar <[email protected]> wrote:
> >
> > >
> > > * Toshi Kani <[email protected]> wrote:
> > >
> > > > MTRR manages PAT initialization as it implements a rendezvous
> > > > handler that initializes PAT as part of MTRR initialization.
> > > >
> > > > When CPU does not support MTRR, ex. qemu32 virtual CPU, MTRR
> > > > simply skips PAT init, which causes PAT left enabled without
> > > > initialization. [...]
> > >
> > > What practical effects does this have to the user? Does the kernel
> > > crash?
> >
> > Btw., I find this omission _highly_ annoying: describing what negative
> > effects a bug _causes in practice_ is the most important part of a
> > changelog. How on earth can an experienced contributor omit such an
> > important component from a patch description?
> >
> > Most readers of changelogs couldn't care less about technical details of
> > how the bug is fixed (of course others will read it so it's nice to have
> > too), but what symptoms a bug causes, how serious is it, whether it
> > should be backported are like super important compared to everything else
> > you wrote - and both the description and the changelogs are totally
> > silent on those topics ...
> >
> > I've seen this in other PAT patches - please try to improve this.
>
> My apology. I agree the importance of describing the negative effect of the
> issue. This case is complicated to describe thoroughly, but here is a
> summary.
The new changelog looks very good, thanks!
> The issue was reported as a regression caused by 'commit 9cd25aac1f44
> ("x86/mm/pat: Emulate PAT when it is disabled")'. So, the goal of this
> patchset is to fix this regression.
> https://lkml.org/lkml/2016/3/3/828
So one thing that matters more than anything else in the changelog, the title!
Right now the title is:
x86/mtrr: Refactor PAT initialization code
... that's a nice title for a true refactoring of the code, but this isn't really
that, the purpose of this fix is to fix a bad Xorg crash for Qemu users.
The principle you need to remember is that readers of your changelogs will be
_very happy_ about 'negative' phrases like:
bad bug
Xorg crash
boot failure
kernel crash
NULL dereference
I.e. the 'best' title for a bug fix is to characterize it in the most negative
truthful fashion in the changelog. It sounds a bit counterintuitive but it's true.
So in this case the best changelog title would be something like:
x86/pat: Fix Xorg crashes in Qemu sessions
People will absolutely _love_ such titles, because:
- users who are trying to find mysterious Xorg failures can grep for it and
might find it before it hits a stable kernel they are using
- maintainers (like me) are able to see it at a glance that this fix should go
to Linus more urgently than other fixes. (and definitely more urgently than
feature patches.)
- stable kernel maintainers and distro backporters can see it immediately at a
glance that they really want this fix.
So by being intentionally and maximally negative in the title, you are being very
helpful to your fellow developers and users!
Now consider the original title:
x86/mtrr: Refactor PAT initialization code
99% of people will glance over such a title, which is not good. Furhermore,
maintainers like me will get _annoyed_ at such titles, because this neutrally
formulated title, while very polite, actively hides the important detail that
these patches fix real negative bugs for real users.
Okay?
And please also note that in the Linux kernel no-one ever 'blames' other people
for bugs. Bugs are part of the human condition and they happen all the time as
long as they are not introduced by carelessness. So in the typical case you cannot
possibly socially embarrass any good kernel developer by reporting and fixing a
bug he introduced. The typical reaction you will get is 'oh great, one bug less to
worry about!', so socially you can be absolutely honest and 'impolite' about the
negative effects of bugs.
> The negative effects of the issue were two failures in Xorg on qemu32 env,
> which was triggered by the fact that its virtual CPU does not support MTRR.
> https://lkml.org/lkml/2016/3/4/775
> ?#1. copy_process() failed in the check in reserve_pfn_range()
> ?#2. error path in copy_process() then hit WARN_ON_ONCE in untrack_pfn().
Yeah, it's nice to quote actual crash signatures as well (in a short form) -
because people hitting the crashes often do a google search and might find the fix
based on such patterns.
Thanks!
Ingo
On Sat, 2016-03-12 at 17:18 +0100, Ingo Molnar wrote:
> * Toshi Kani <[email protected]> wrote:
>
> > On Fri, 2016-03-11 at 09:13 +0000, Ingo Molnar wrote:
> > > * Ingo Molnar <[email protected]> wrote:
> > >
> > > >
> > > > * Toshi Kani <[email protected]> wrote:
> > > >
> > > > > MTRR manages PAT initialization as it implements a rendezvous
> > > > > handler that initializes PAT as part of MTRR initialization.
> > > > >
> > > > > When CPU does not support MTRR, ex. qemu32 virtual CPU, MTRR
> > > > > simply skips PAT init, which causes PAT left enabled without
> > > > > initialization. [...]
> > > >
> > > > What practical effects does this have to the user? Does the kernel
> > > > crash?
> > >
> > > Btw., I find this omission _highly_ annoying: describing what
> > > negative effects a bug _causes in practice_ is the most important
> > > part of a changelog. How on earth can an experienced contributor omit
> > > such an important component from a patch description?
> > >
> > > Most readers of changelogs couldn't care less about technical details
> > > of how the bug is fixed (of course others will read it so it's nice
> > > to have too), but what symptoms a bug causes, how serious is it,
> > > whether it should be backported are like super important compared to
> > > everything else you wrote - and both the description and the
> > > changelogs are totally silent on those topics ...
> > >
> > > I've seen this in other PAT patches - please try to improve this.
> >
> > My apology. I agree the importance of describing the negative effect of
> > the issue. This case is complicated to describe thoroughly, but here is
> > a summary.
>
> The new changelog looks very good, thanks!
>
> > The issue was reported as a regression caused by 'commit 9cd25aac1f44
> > ("x86/mm/pat: Emulate PAT when it is disabled")'. So, the goal of this
> > patchset is to fix this regression.
> > https://lkml.org/lkml/2016/3/3/828
>
> So one thing that matters more than anything else in the changelog, the
> title! Right now the title is:
>
> x86/mtrr: Refactor PAT initialization code
>
> ... that's a nice title for a true refactoring of the code, but this
> isn't really that, the purpose of this fix is to fix a bad Xorg crash for
> Qemu users.
>
> The principle you need to remember is that readers of your changelogs
> will be _very happy_ about 'negative' phrases like:
>
> bad bug
> Xorg crash
> boot failure
> kernel crash
> NULL dereference
>
> I.e. the 'best' title for a bug fix is to characterize it in the most
> negative truthful fashion in the changelog. It sounds a bit
> counterintuitive but it's true.
>
> So in this case the best changelog title would be something like:
>
> x86/pat: Fix Xorg crashes in Qemu sessions
>
> People will absolutely _love_ such titles, because:
>
> - users who are trying to find mysterious Xorg failures can grep for it
> and might find it before it hits a stable kernel they are using
>
> - maintainers (like me) are able to see it at a glance that this fix
> should go to Linus more urgently than other fixes. (and definitely more
> urgently than feature patches.)
>
> - stable kernel maintainers and distro backporters can see it
> immediately at a glance that they really want this fix.
>
> So by being intentionally and maximally negative in the title, you are
> being very helpful to your fellow developers and users!
>
> Now consider the original title:
>
> x86/mtrr: Refactor PAT initialization code
>
> 99% of people will glance over such a title, which is not good.
> Furhermore, maintainers like me will get _annoyed_ at such titles,
> because this neutrally formulated title, while very polite, actively
> hides the important detail that these patches fix real negative bugs for
> real users.
>
> Okay?
Thanks for all the explanation and guidance! That's very helpful. Yes, I
will keep this in mind.
> And please also note that in the Linux kernel no-one ever 'blames' other
> people for bugs. Bugs are part of the human condition and they happen all
> the time as long as they are not introduced by carelessness. So in the
> typical case you cannot possibly socially embarrass any good kernel
> developer by reporting and fixing a bug he introduced. The typical
> reaction you will get is 'oh great, one bug less to worry about!', so
> socially you can be absolutely honest and 'impolite' about the negative
> effects of bugs.
Understood.
> > The negative effects of the issue were two failures in Xorg on qemu32
> > env, which was triggered by the fact that its virtual CPU does not
> > support MTRR.
> > https://lkml.org/lkml/2016/3/4/775
> > #1. copy_process() failed in the check in reserve_pfn_range()
> > #2. error path in copy_process() then hit WARN_ON_ONCE in
> > untrack_pfn().
>
> Yeah, it's nice to quote actual crash signatures as well (in a short
> form) - because people hitting the crashes often do a google search and
> might find the fix based on such patterns.
Will do.
Thanks!
-Toshi
On Sat, 2016-03-12 at 12:55 +0100, Borislav Petkov wrote:
> Ok, let's start again.
>
> So what's wrong with this simpler, cleaner version below?
>
> * It makes sure to run the rendezvous handler on the BP on init.
>
> * It disables PAT otherwise.
>
> * Oh, and it boots fine with Paul's reproducer, X is there (even though
> vncviewer dies when X starts with "Rect too large: 640x480 at (0, 0)"
> and I have to reconnect to see the X window).
>
> In dmesg I see "x86/PAT: Disabled by MTRR".
Your patch is a simplified version of mine. So, yes, it fixes the Paul's
issue, but it does not address other issues that my patchset also
addressed. In specific, I think your patch has the following issues.
- pat_disable() is now callable from other modules. So, it needs to check
with boot_cpu_done. We cannot disable PAT once it is initialized.
- mtrr_bp_init() needs to check with mtrr_enabled() when it
calls mtrr_pat_setup_bp(). Otherwise, PAT is left initialized on BSP only
when MTRR is disabled by its MSR. In your patch, mtrr_bp_init() calls
pat_setup() again, but it does not help since boot_cpu_done is set.
- When PAT is disabled in CPU feature, pat_bsp_init() calls pat_disable()
and returns. However, it does not initialize a PAT table by calling
pat_init_cache_modes().
- When CONFIG_MTRR is unset, it does not call pat_setup().
Thanks,
-Toshi
On Fri, Mar 11, 2016 at 11:34:26AM -0700, Toshi Kani wrote:
> On Fri, 2016-03-11 at 09:13 +0000, Ingo Molnar wrote:
> > * Ingo Molnar <[email protected]> wrote:
> >
> > >
> > > * Toshi Kani <[email protected]> wrote:
> > >
> > > > MTRR manages PAT initialization as it implements a rendezvous
> > > > handler that initializes PAT as part of MTRR initialization.
> > > >
> > > > When CPU does not support MTRR, ex. qemu32 virtual CPU, MTRR
> > > > simply skips PAT init, which causes PAT left enabled without
> > > > initialization. [...]
> > >
> > > What practical effects does this have to the user? Does the kernel
> > > crash?
> >
> > Btw., I find this omission _highly_ annoying: describing what negative
> > effects a bug _causes in practice_ is the most important part of a
> > changelog. How on earth can an experienced contributor omit such an
> > important component from a patch description?
> >
> > Most readers of changelogs couldn't care less about technical details of
> > how the bug is fixed (of course others will read it so it's nice to have
> > too), but what symptoms a bug causes, how serious is it, whether it
> > should be backported are like super important compared to everything else
> > you wrote - and both the description and the changelogs are totally
> > silent on those topics ...
> >
> > I've seen this in other PAT patches - please try to improve this.
>
> My apology. I agree the importance of describing the negative effect of the
> issue. This case is complicated to describe thoroughly, but here is a
> summary.
>
> The issue was reported as a regression caused by 'commit 9cd25aac1f44
> ("x86/mm/pat: Emulate PAT when it is disabled")'. So, the goal of this
> patchset is to fix this regression.
> https://lkml.org/lkml/2016/3/3/828
Huh, interesting.
> The negative effects of the issue were two failures in Xorg on qemu32 env,
> which was triggered by the fact that its virtual CPU does not support MTRR.
> https://lkml.org/lkml/2016/3/4/775
> ?#1. copy_process() failed in the check in reserve_pfn_range()
> ?#2. error path in copy_process() then hit WARN_ON_ONCE in untrack_pfn().
>
> These negative effects are also caused by two different bugs, but they can
> be dealt in lower priority. Fixing the pat_init() issue will avoid Xorg
> hitting these cases.
Was it confirmed that these patches fix this issue? Mentioning this in the
commit log would also help. Tested-by, etc.
Joe at Stratus also hit this issue but on a system where MTRR is enabled. He
sent his report only to me as he thought it was caused by the ioremap_wc()
changes and his driver was one that got it. In his case though he modified the
driver significantly, and upon inspection of that code saw how it used a
secondary backup PCI device for failover for a framebuffer device... The changes
to the driver in place are rather complex though and as such it made no sense
to further review unless he moved his changes upstream. It is still worth
noting this issue has been seeing elsehwere, but the root cause is still not
known. The error Joe got is:
x86/PAT: Xorg:37506 map pfn expected mapping type uncached-minus for [mem 0x9f000000-0x9f7fffff], got write-combining
Even though the driver is custom (and actually I even saw another unrelated
proprietary driver loaded) I figured its worth noting others have seen this
error without MTRR being disabled.
The second thread you referred to seems to say that if you built-in the code
the error does not come up. What the hell. Joe, can you try building your
driver built-in to see if you also see this go away? Even though I don't
want to support your custom hacked up driver I do want to know if your
issue goes away with built-in as well.
> When the CPU does not support MTRR, MTRR does not call pat_init(), but
> leaves PAT enabled. This pat_init() issue is a long-standing issue, but
> manifested as issue #1 (and then hit issue #2) with the commit because the
> memtype now tracks cache attribute with 'page_cache_mode'. A WC map request
> is tracked as WC in memtype, but sets a PTE as UC (pgprot) per
> __cachemode2pte_tbl[].
Can you think of anything else other than having MTRR disabled that could
cause this same ending result?
> This caused an error in reserve_pfn_range() when it
> was called from?track_pfn_copy(), which obtained pgprot from a PTE. It
> converts pgprot to page_cache_mode, which does not necessarily result in
> the original page_cache_mode since __cachemode2pte_tbl[] redirects multiple
> types to UC. This is a separate issue in?reserve_pfn_range().
Do you have a fix in mind for that already too?
> If PAT is set to disabled properly, the code bypasses the memtype check.
> So, #1 is a non-issue as a result (although it should be fixed).
> ??
> This pat_init() issue existed before commit 9cd25aac1f44, but we used
> pgprot in memtype. Hence, we did not have issue #1 before. But WC request
> resulted in WT in effect because WC pgrot is actually WT when PAT is not
> initialized. This is not how the code was designed to work. When PAT is set
> to disabled properly, WC gets converted to UC. The use of WT can result in
> a system crash if the target range does not support WT, but fortunately
> people did not run into such issue before.
And since the ioremap_wc() crusade ended via commit 2baa891e42d84, and both
were merged on v4.3 it mean that we'd likely see more of these issues now as
more driver are using PAT since v4.3.
Luis
On Mon, 2016-03-14 at 23:50 +0100, Luis R. Rodriguez wrote:
> On Fri, Mar 11, 2016 at 11:34:26AM -0700, Toshi Kani wrote:
> > On Fri, 2016-03-11 at 09:13 +0000, Ingo Molnar wrote:
> > > * Ingo Molnar <[email protected]> wrote:
:
> > The negative effects of the issue were two failures in Xorg on qemu32
> > env, which was triggered by the fact that its virtual CPU does not
> > support MTRR.
> > https://lkml.org/lkml/2016/3/4/775
> > #1. copy_process() failed in the check in reserve_pfn_range()
> > #2. error path in copy_process() then hit WARN_ON_ONCE in
> > untrack_pfn().
> >
> > These negative effects are also caused by two different bugs, but they
> > can be dealt in lower priority. Fixing the pat_init() issue will avoid
> > Xorg hitting these cases.
>
> Was it confirmed that these patches fix this issue? Mentioning this in
> the commit log would also help. Tested-by, etc.
I instrumented the kernel to emulate the non-MTRR case, and reproduced the
issue. I then tested to confirm that my patch fixed it.
Yes, it'd be really nice if Paul can test it as well.
> Joe at Stratus also hit this issue but on a system where MTRR is enabled.
> He sent his report only to me as he thought it was caused by the
> ioremap_wc() changes and his driver was one that got it. In his case
> though he modified the driver significantly, and upon inspection of that
> code saw how it used a secondary backup PCI device for failover for a
> framebuffer device... The changes to the driver in place are rather
> complex though and as such it made no sense to further review unless he
> moved his changes upstream. It is still worth noting this issue has been
> seeing elsehwere, but the root cause is still not known. The error Joe
> got is:
>
> x86/PAT: Xorg:37506 map pfn expected mapping type uncached-minus for [mem
> 0x9f000000-0x9f7fffff], got write-combining
>
> Even though the driver is custom (and actually I even saw another
> unrelated proprietary driver loaded) I figured its worth noting others
> have seen this error without MTRR being disabled.
The error message looks the same. So, this could be the same issue if WC
is redirected to UC without disabling PAT properly on his env. I need a
whole dmesg output to confirm if this is the case. Another way to hit this
error is that the driver called remap_pfn_range() with UC to a range where
WC map was set by ioremap_wc() already.
> The second thread you referred to seems to say that if you built-in the
> code the error does not come up. What the hell. Joe, can you try building
> your driver built-in to see if you also see this go away? Even though I
> don't want to support your custom hacked up driver I do want to know if
> your issue goes away with built-in as well.
I do not have sufficient info to support this case, and do not have
technical explanation for it, either.
> > When the CPU does not support MTRR, MTRR does not call pat_init(), but
> > leaves PAT enabled. This pat_init() issue is a long-standing issue, but
> > manifested as issue #1 (and then hit issue #2) with the commit because
> > the memtype now tracks cache attribute with 'page_cache_mode'. A WC map
> > request is tracked as WC in memtype, but sets a PTE as UC (pgprot) per
> > __cachemode2pte_tbl[].
>
> Can you think of anything else other than having MTRR disabled that could
> cause this same ending result?
See above.
> > This caused an error in reserve_pfn_range() when it
> > was called from track_pfn_copy(), which obtained pgprot from a PTE. It
> > converts pgprot to page_cache_mode, which does not necessarily result
> > in the original page_cache_mode since __cachemode2pte_tbl[] redirects
> > multiple types to UC. This is a separate issue in reserve_pfn_range().
>
> Do you have a fix in mind for that already too?
Yes, we can compare with pgprot values as it was the case before.
> > If PAT is set to disabled properly, the code bypasses the memtype
> > check. So, #1 is a non-issue as a result (although it should be fixed).
> >
> > This pat_init() issue existed before commit 9cd25aac1f44, but we used
> > pgprot in memtype. Hence, we did not have issue #1 before. But WC
> > request resulted in WT in effect because WC pgrot is actually WT when
> > PAT is not initialized. This is not how the code was designed to work.
> > When PAT is set to disabled properly, WC gets converted to UC. The use
> > of WT can result in a system crash if the target range does not support
> > WT, but fortunately people did not run into such issue before.
>
> And since the ioremap_wc() crusade ended via commit 2baa891e42d84, and
> both were merged on v4.3 it mean that we'd likely see more of these
> issues now as more driver are using PAT since v4.3.
Uh-oh...
Thanks,
-Toshi
On Fri, Mar 11, 2016 at 06:16:36PM -0700, Toshi Kani wrote:
> On Fri, 2016-03-11 at 15:34 -0800, Luis R. Rodriguez wrote:
> > On Fri, Mar 11, 2016 at 3:56 PM, Toshi Kani <[email protected]> wrote:
> > > On Fri, 2016-03-11 at 23:17 +0100, Luis R. Rodriguez wrote:
> > > > On Fri, Mar 11, 2016 at 11:57:12AM -0700, Toshi Kani wrote:
> > > > > On Fri, 2016-03-11 at 10:24 +0100, Borislav Petkov wrote:
> > > > > > On Thu, Mar 10, 2016 at 09:45:46PM -0700, Toshi Kani wrote:
> > > > > > > MTRR manages PAT initialization as it implements a rendezvous
> > > > > > > handler that initializes PAT as part of MTRR initialization.
> :
> > > > >
> > > > > No, it does not fix it. The problem in this particular case, i.e.
> > > > > MTRR disabled by its MSR, is that mtrr_bp_init() calls pat_init()
> > > > > (as PAT enabled) and initializes PAT on BSP. After APs are
> > > > > launched, we need the MTRR's rendezvous handler to initialize PAT
> > > > > on APs to be consistent with BSP. However, MTRR rendezvous handler
> > > > > is no-op since MTRR is disabled.
> > > >
> > > > This seems like a hack on enabling PAT through MTRR code, can we have
> > > > a PAT rendezvous handler on its own, or provide a generic rendezvous
> > > > handler that lets you deal with whatever interfaces need setup. Then
> > > > conflicts can just be negotiated early.
> > >
> > > The MTRR code can be enhanced so that the rendezvous handler can handle
> > > MTRR and PAT state independently. I noted this case as (*) in the
> > > table of this patch description. This is a separate item, however.
> > >
> > > MTRR calling PAT was not a hack (as I suppose we did not have VMs at
> > > that time), although this can surely be improved. As Intel SDM state
> > > below, both MTRR and PAT require the same procedure, and the PAT
> > > initialization sequence is defined in the MTRR section.
> > >
> > > ===
> > > 11.12.4 Programming the PAT
> > > :
> > > The operating system is responsible for insuring that changes to a PAT
> > > entry occur in a manner that maintains the consistency of the processor
> > > caches and translation lookaside buffers (TLB). This is accomplished by
> > > following the procedure as specified in Section 11.11.8, “MTRR
> > > Considerations in MP Systems,” for changing the value of an MTRR in a
> > > multiple processor system. It requires a specific sequence of
> > > operations that includes flushing the processors caches and TLBs.
> > > ===
> > >
> > > > What I'm after is seeing if we can ultimately disable MTRR on kernel
> > > > code but still have PAT enabled. I realize you've mentioned BIOS code
> > > > may use some MTRR setup code but this is only true for some systems.
> > > > I know for a fact Xen cannot use MTRR, it seems qemu32 does not
> > > > enable
> > > > it either. So why not have the ability to skip through its set up ?
> > >
> > > MTRR support has two meanings:
> > > 1) The kernel keeps the MTRR setup by BIOS.
> > > 2) The kernel modifies the MTRR setup.
> > >
> > > I am in a position that we need 1) but 2).
> >
> > I take it you meant "but not 2)" ?
>
> Yes. :)
OK -- we are in agreement but we know 1) is only needed for a portion of
systems: Xen and qemu32 systems fly with no MTRR set up, and as such it
would be incorrect to run MTRR code on such systems. To these systems
MTRR functionality code should be dead, since PAT currently depends on
MTRR PAT should also be dead but as the report you're fixing shows
it wasn't. That's an issue for qemu that uses the regular x86 init path
but not for Xen. Its different for Xen as the hypervisor is the one that
set up the MSR_IA32_CR_PAT for each CPU. The *only* thing Xen does is:
void xen_start_kernel(void)
{
...
rdmsrl(MSR_IA32_CR_PAT, pat);
pat_init_cache_modes(pat);
...
}
Fortunately we only have to call pat_init_cache_modes() once, its
not per-CPU. Xen has shown then that you *can* live with PAT without
any of the complex MTRR setup / code. Please add to your table the
Xen case as well then as its important to consider. If you make it
a strong requirement to have MTRR enabled to enable PAT you'd
be disabling PAT on Xen guest boots.
As-is then your this patch which calls pat_disable() on mtrr_bp_init() for the
case where MTRR is disabled would essentially break PAT on Xen guests, so this
cannot be done. It is no longer true that if MTRR is disabled you can force
disable PAT. To do what you want you want to do we have to consider Xen.
I don't think its a good idea to keep PAT initialization meshed together
with MTRR and making it a strong requirement on enabling PAT. The MTRR
code is extremely complex. I'd like instead to encourage for us to
consider for this situation to let PAT become a first class citizen,
if MTRR is disabled but you've enabled PAT you should be able to use
it, just as Xen does. There are more reasons to enable such setup than
not to. Long term I'm advocating to see if we can get an ACPI legacy
MTRR flag that can tell us if the BIOS has MTRR ripped out, then we
can at run time also take advantage of ignoring PAT completely as well.
Note, if you insisted you didn't want to disable PAT on Xen, you could
in theory check for the subarch -- but note that the subarch is unused
yet on Xen, even though it was added to the x86 boot protocol years ago.
I have a slew of patches to make use of it to help put paravirt_enabled()
in the grave, but based on discussions with Ingo, we don't want to spread
use of the subarch in random x86 code paths, we want to compartamentalize
that. If you still want to follow your approach of just force-disabling
PAT on MTRR code if MTRR was disabled you'd have to use semantics to
figure out if the boot path came from Xen, to be more specific for Xen
PV guest types only... The current agreed approach to avoid directly
using subarch is to categorize differences between what some guests
need and bare metal under an x86 platform quirk and legacy set of components.
On the x86 init path we'd call something check for the subarch and based
on that set a series of x86 legacy features / quirks that need to be
disabled / enabled. We could add MTRR as one. I'm unifying some of this
with a bit of what goes into the ACPI IA-PC boot architecture, see
section 5.2.9.3 IA-PC Boot Architecture Flags [0]. In particular the
paravirt_enabled() series I'm working on happens to also dabble into
the no CMOS RTC case for Xen, generalizing this knocks a bit of birds
with one stone. I think we can do the same with MTRR but also be
proactive and see if we can get ACPI_FADT_NO_MTRR added as well for
a future ACPI spec to enable BIOS manufacturers to rip MTRR out.
[0] http://www.acpi.info/DOWNLOADS/ACPIspec50.pdf
/* Masks for FADT IA-PC Boot Architecture Flags (boot_flags) [Vx]=Introduced in this FADT revision */
#define ACPI_FADT_LEGACY_DEVICES (1) /* 00: [V2] System has LPC or ISA bus devices */
#define ACPI_FADT_8042 (1<<1) /* 01: [V3] System has an 8042 controller on port 60/64 */
#define ACPI_FADT_NO_VGA (1<<2) /* 02: [V4] It is not safe to probe for VGA hardware */
#define ACPI_FADT_NO_MSI (1<<3) /* 03: [V4] Message Signaled Interrupts (MSI) must not be enabled */
#define ACPI_FADT_NO_ASPM (1<<4) /* 04: [V4] PCIe ASPM control must not be enabled */
#define ACPI_FADT_NO_CMOS_RTC (1<<5) /* 05: [V5] No CMOS real-time clock present */
> > There *are folks however who do
> > more as I noted earlier. Perhaps not now, but in the future I'd
> > encourage folks to rip MTRR out of their own BIOS, and enable a new
> > ACPI legacy flag to say "MTRR required". That'd eventually can help
> > bury MTRR for good while remaining backward compatible.
>
> Well, BIOS using MTRR is better than BIOS setting page tables in the SMI
> handler.
Can some BIOSes be developed without MTRR? For instance I suspect Google might
be able to easily pull of ripping MTRR out of their BIOS if they didn't do it
already for the ChromeOS devices. If possible not only should it help with removing
complexity on the BIOS but not even having to think about that code *ever* running
on the kernel at all should be nice.
> The kernel can be ignorant of the MTRR setup as long as it does
> not modify it.
Sure, we're already there. The kernel no longer modifies the
MTRR setup unless of course you boot without PAT enabled. I think
we need to move beyond that to ACPI if we can to let regular
Linux boots never have to deal with MTRR at all. The code is
complex and nasty why not put let folks put a nail on the coffin for good?
> > I can read the above description to also say:
> >
> > "Hey you need to implement PAT with the same skeleton code as MTRR"
>
> No, I did not say that. MTRR's rendezvous handler can be generalized to
> work with both MTRR and PAT. We do not need two separate handlers. In
> fact, it needs to be a single handler so that both can be initialized
> together.
I'm not sure if that's really needed. Doesn't PAT just require setting
the wrmsrl(MSR_IA32_CR_PAT, pat) for each AP?
> > If we do that, we can pave the way to deprecate MTRR as legacy for
> > good first on Linux.
>
> I do not think such change will deprecate MTRR.
Not even for shiny new BIOSes? Post ACPI 5?
> It just means that Linux can enable PAT on virtual CPUs with PAT & !MTRR capability.
>
> > > In fact, the kernel disabling MTRRs is the same as 2).
> > >
> > > > I'll also note Xen managed to enable PAT only without enabling MTRR,
> > > > this was done through pat_init_cache_modes() -- not sure if this can
> > > > be leveraged for qemu32...
> > >
> > > I am interested to know how Xen managed this. Is this done by the Xen
> > > hypervisor initializes guest's PAT on behalf of the guest kernel?
> >
> > Yup. And the cache read thingy was reading back its own setup, which
> > was different than what Linux used by default IIRC. Juergen can
> > elaborate more.
>
> Yeah, I'd like to make sure that my changes won't break it.
I checked through code inspection and indeed, it seems it would break
Xen's PAT setup.
For the record: the issue here was code that should not run ran, that
is dead code ran. I'm working towards a generic solution for this.
Luis
I like this approach more as it stuff more PAT setup on its own type
of calls, but:
On Sat, Mar 12, 2016 at 12:55:44PM +0100, Borislav Petkov wrote:
> diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
> index 10f8d4796240..5c442b4bd52a 100644
> --- a/arch/x86/kernel/cpu/mtrr/main.c
> +++ b/arch/x86/kernel/cpu/mtrr/main.c
> @@ -759,8 +761,11 @@ void __init mtrr_bp_init(void)
> }
> }
>
> - if (!mtrr_enabled())
> + if (!__mtrr_enabled) {
> pr_info("MTRR: Disabled\n");
> + pat_disable("PAT disabled by MTRR");
> + pat_setup();
> + }
> }
This hunk would break PAT on Xen.
Luis
On Tue, 2016-03-15 at 01:29 +0100, Luis R. Rodriguez wrote:
> I like this approach more as it stuff more PAT setup on its own type
> of calls, but:
>
> On Sat, Mar 12, 2016 at 12:55:44PM +0100, Borislav Petkov wrote:
> > diff --git a/arch/x86/kernel/cpu/mtrr/main.c
> > b/arch/x86/kernel/cpu/mtrr/main.c
> > index 10f8d4796240..5c442b4bd52a 100644
> > --- a/arch/x86/kernel/cpu/mtrr/main.c
> > +++ b/arch/x86/kernel/cpu/mtrr/main.c
> > @@ -759,8 +761,11 @@ void __init mtrr_bp_init(void)
> > }
> > }
> >
> > - if (!mtrr_enabled())
> > + if (!__mtrr_enabled) {
> > pr_info("MTRR: Disabled\n");
> > + pat_disable("PAT disabled by MTRR");
> > + pat_setup();
> > + }
> > }
>
> This hunk would break PAT on Xen.
Can you try the attached patches? They apply on top of my original patch-
set. With this change, PAT code generally supports Xen, and the PAT init
code in Xen is now removed. If they look OK, I will reorganize the patch
series.
Thanks,
-Toshi
On Mon, Mar 14, 2016 at 03:37:23PM -0600, Toshi Kani wrote:
> Your patch is a simplified version of mine. So, yes, it fixes the Paul's
> issue, but it does not address other issues that my patchset also
> addressed. In specific, I think your patch has the following issues.
You couldnt've structured your reply better: remember how I split a
convoluted patch of yours already? A patch which was trying to do a
bunch of things in one go.
The situation here is the same. You need to do *one* *logical*
*non-trivial* thing in a patch. If there's something else that needs to
be done, add it in a *separate* patch which explains why that new change
is needed.
> - pat_disable() is now callable from other modules. So, it needs to check
> with boot_cpu_done. We cannot disable PAT once it is initialized.
That should be a separate patch which explains *why* the change is being
done.
> - mtrr_bp_init() needs to check with mtrr_enabled() when it
> calls mtrr_pat_setup_bp(). Otherwise, PAT is left initialized on BSP only
> when MTRR is disabled by its MSR. In your patch, mtrr_bp_init() calls
> pat_setup() again, but it does not help since boot_cpu_done is set.
The code which you carved out from get_mtrr_state() didn't check
mtrr_enabled() before. That needs to be another patch *again* with
explanations.
> - When PAT is disabled in CPU feature, pat_bsp_init() calls pat_disable()
> and returns. However, it does not initialize a PAT table by calling
> pat_init_cache_modes().
Yet another patch.
> - When CONFIG_MTRR is unset, it does not call pat_setup().
Aaaand... can you guess what I'm going to say here?
I hope it is coming across as I intend it: please use my hunk to do a
single fix and then prepare all those changes above in separate patches
with explanations:
"Problem is A. We need to do B. I'm doing it/I'm doing C because."
Ok?
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
On Mon, Mar 14, 2016 at 09:11:16PM -0600, Toshi Kani wrote:
> - pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) |
> - PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
> + if (cpu_has_pat) {
Please use on init paths boot_cpu_has(X86_FEATURE_PAT) and on fast paths
static_cpu_has(X86_FEATURE_PAT). No more of that cpu_has_XXX ugliness.
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
On Tue, 2016-03-15 at 11:01 +0000, Borislav Petkov wrote:
> On Mon, Mar 14, 2016 at 09:11:16PM -0600, Toshi Kani wrote:
> > - pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC)
> > |
> > - PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
> > + if (cpu_has_pat) {
>
> Please use on init paths boot_cpu_has(X86_FEATURE_PAT) and on fast paths
> static_cpu_has(X86_FEATURE_PAT). No more of that cpu_has_XXX ugliness.
'cpu_has_pat' is defined as 'boot_cpu_has(X86_FEATURE_PAT)'. Do you mean
it should explicitly use 'boot_cpu_has(X86_FEATURE_PAT)'?
Thanks,
-Toshi
On Tue, Mar 15, 2016 at 09:43:15AM -0600, Toshi Kani wrote:
> > Please use on init paths boot_cpu_has(X86_FEATURE_PAT) and on fast paths
> > static_cpu_has(X86_FEATURE_PAT). No more of that cpu_has_XXX ugliness.
>
> 'cpu_has_pat' is defined as 'boot_cpu_has(X86_FEATURE_PAT)'. Do you mean
> it should explicitly use 'boot_cpu_has(X86_FEATURE_PAT)'?
No, read what I said.
We use boot_cpu_has(<feature_bit>) on slow paths (i.e., init, bootup,
etc), where speed is not that important. static_cpu_has(<feature_bit>)
is an optimized version which should be used in hot paths.
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
On Mon, Mar 14, 2016 at 06:37:20PM -0600, Toshi Kani wrote:
> Yes, it'd be really nice if Paul can test it as well.
Let's please agree on the final design of the patchset first and then
ask bug reporters to test.
--
Regards/Gruss,
Boris.
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--
On Tue, 2016-03-15 at 16:47 +0100, Borislav Petkov wrote:
> On Tue, Mar 15, 2016 at 09:43:15AM -0600, Toshi Kani wrote:
> > > Please use on init paths boot_cpu_has(X86_FEATURE_PAT) and on fast
> > > paths static_cpu_has(X86_FEATURE_PAT). No more of that cpu_has_XXX
> > > ugliness.
> >
> > 'cpu_has_pat' is defined as 'boot_cpu_has(X86_FEATURE_PAT)'. Do you
> > mean it should explicitly use 'boot_cpu_has(X86_FEATURE_PAT)'?
>
> No, read what I said.
>
> We use boot_cpu_has(<feature_bit>) on slow paths (i.e., init, bootup,
> etc), where speed is not that important. static_cpu_has(<feature_bit>)
> is an optimized version which should be used in hot paths.
Yes, I understand that part. Let me rephrase my question.
This PAT code is on init paths and speed is not that important. So, it
needs to use 'boot_cpu_has()' here. 'cpu_has_pat' is defined
as boot_cpu_has(X86_FEATURE_PAT), and hence it uses boot_cpu_has() already.
While cpu_has_pat is the same as boot_cpu_has(X86_FEATURE_PAT), cpu_has_XXX
should not be used. So, this code needs to be changed to use
boot_cpu_has(X86_FEATURE_PAT) directly.
Is this right?
Thanks,
-Toshi
On Tue, Mar 15, 2016 at 11:11:23AM -0600, Toshi Kani wrote:
> While cpu_has_pat is the same as boot_cpu_has(X86_FEATURE_PAT), cpu_has_XXX
> should not be used. So, this code needs to be changed to use
> boot_cpu_has(X86_FEATURE_PAT) directly.
>
> Is this right?
Yes.
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
On Tue, 2016-03-15 at 12:00 +0100, Borislav Petkov wrote:
> On Mon, Mar 14, 2016 at 03:37:23PM -0600, Toshi Kani wrote:
> > Your patch is a simplified version of mine. So, yes, it fixes the
> > Paul's issue, but it does not address other issues that my patchset
> > also addressed. In specific, I think your patch has the following
> > issues.
>
> You couldnt've structured your reply better: remember how I split a
> convoluted patch of yours already? A patch which was trying to do a
> bunch of things in one go.
>
> The situation here is the same. You need to do *one* *logical*
> *non-trivial* thing in a patch. If there's something else that needs to
> be done, add it in a *separate* patch which explains why that new change
> is needed.
Got it!
> > - pat_disable() is now callable from other modules. So, it needs to
> > check with boot_cpu_done. We cannot disable PAT once it is initialized.
>
> That should be a separate patch which explains *why* the change is being
> done.
>
> > - mtrr_bp_init() needs to check with mtrr_enabled() when it
> > calls mtrr_pat_setup_bp(). Otherwise, PAT is left initialized on BSP
> > only when MTRR is disabled by its MSR. In your patch, mtrr_bp_init()
> > calls pat_setup() again, but it does not help since boot_cpu_done is
> > set.
>
> The code which you carved out from get_mtrr_state() didn't check
> mtrr_enabled() before. That needs to be another patch *again* with
> explanations.
>
> > - When PAT is disabled in CPU feature, pat_bsp_init() calls
> > pat_disable() and returns. However, it does not initialize a PAT table
> > by calling pat_init_cache_modes().
>
> Yet another patch.
>
> > - When CONFIG_MTRR is unset, it does not call pat_setup().
>
> Aaaand... can you guess what I'm going to say here?
>
> I hope it is coming across as I intend it: please use my hunk to do a
> single fix and then prepare all those changes above in separate patches
> with explanations:
Unfortunately, this single fix will break Xen. So, I think we will need to
make a few enhancements first before making the fix.
> "Problem is A. We need to do B. I'm doing it/I'm doing C because."
>
> Ok?
Yes, I will try to separate the patches to change one logical thing at a
time.
Thanks,
-Toshi
On Mon, Mar 14, 2016 at 09:11:16PM -0600, Toshi Kani wrote:
> On Tue, 2016-03-15 at 01:29 +0100, Luis R. Rodriguez wrote:
> > I like this approach more as it stuff more PAT setup on its own type
> > of calls, but:
> >
> > On Sat, Mar 12, 2016 at 12:55:44PM +0100, Borislav Petkov wrote:
> > > diff --git a/arch/x86/kernel/cpu/mtrr/main.c
> > > b/arch/x86/kernel/cpu/mtrr/main.c
> > > index 10f8d4796240..5c442b4bd52a 100644
> > > --- a/arch/x86/kernel/cpu/mtrr/main.c
> > > +++ b/arch/x86/kernel/cpu/mtrr/main.c
> > > @@ -759,8 +761,11 @@ void __init mtrr_bp_init(void)
> > > ? }
> > > ? }
> > > ?
> > > - if (!mtrr_enabled())
> > > + if (!__mtrr_enabled) {
> > > ? pr_info("MTRR: Disabled\n");
> > > + pat_disable("PAT disabled by MTRR");
> > > + pat_setup();
> > > + }
> > > ?}
> >
> > This hunk would break PAT on Xen.
>
> Can you try the attached patches? ?They apply on top of my original patch-
> set. ?With this change, PAT code generally supports Xen, and the PAT init
> code in Xen is now removed. ?If they look OK, I will reorganize the patch
> series.
I don't have time to test this at this time but on a cursory review this should
in theory work, a few nitpicks:
>
> Thanks,
> -Toshi
> From: Toshi Kani <[email protected]>
>
> Add support of PAT emulation that matches with the PAT MSR.
>
> ---
> arch/x86/mm/pat.c | 73 +++++++++++++++++++++++++++++++----------------------
> 1 file changed, 43 insertions(+), 30 deletions(-)
>
> diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
> index 1ff8aa9..565a478 100644
> --- a/arch/x86/mm/pat.c
> +++ b/arch/x86/mm/pat.c
> @@ -40,7 +40,7 @@
> static bool boot_cpu_done;
>
> static int __read_mostly __pat_enabled = IS_ENABLED(CONFIG_X86_PAT);
> -static void pat_disable_init(void);
> +static void pat_emu_init(void);
>
> void pat_disable(const char *reason)
> {
> @@ -52,7 +52,7 @@ void pat_disable(const char *reason)
> __pat_enabled = 0;
> pr_info("x86/PAT: %s\n", reason);
>
> - pat_disable_init();
> + pat_emu_init();
> }
>
> static int __init nopat(char *str)
> @@ -239,40 +239,53 @@ static void pat_ap_init(u64 pat)
> wrmsrl(MSR_IA32_CR_PAT, pat);
> }
>
> -static void pat_disable_init(void)
> +static void pat_emu_init(void)
> {
> - u64 pat;
> - static int disable_init_done;
> + u64 pat = 0;
> + static int emu_init_done;
>
> - if (disable_init_done)
> + if (emu_init_done)
> return;
>
> - /*
> - * No PAT. Emulate the PAT table that corresponds to the two
> - * cache bits, PWT (Write Through) and PCD (Cache Disable). This
> - * setup is the same as the BIOS default setup when the system
> - * has PAT but the "nopat" boot option has been specified. This
> - * emulated PAT table is used when MSR_IA32_CR_PAT returns 0.
> - *
> - * PTE encoding:
> - *
> - * PCD
> - * |PWT PAT
> - * || slot
> - * 00 0 WB : _PAGE_CACHE_MODE_WB
> - * 01 1 WT : _PAGE_CACHE_MODE_WT
> - * 10 2 UC-: _PAGE_CACHE_MODE_UC_MINUS
> - * 11 3 UC : _PAGE_CACHE_MODE_UC
> - *
> - * NOTE: When WC or WP is used, it is redirected to UC- per
> - * the default setup in __cachemode2pte_tbl[].
> - */
> - pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) |
> - PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
> + if (cpu_has_pat) {
First, this is not emulation, to be clear.
> + /*
> + * CPU supports PAT. Initialize the PAT table to match with
> + * the PAT MSR value. This setup is used by "nopat" boot
Did you mean "nomtrr" option ? If not why would "nopat" land you here and if
"nopat" was used and you ended up here why would we want to go ahead and
read the MSR to keep the PAT set up ?
> + * option, or by virtual machine environments which do not
> + * support MTRRs but support PAT.
This might end up supporting PAT for some other virtual environments ;)
> + *
> + * If the MSR returns 0, it is considered invalid and emulate
> + * as No PAT.
> + */
> + rdmsrl(MSR_IA32_CR_PAT, pat);
> + }
> +
> + if (!pat) {
> + /*
> + * No PAT. Emulate the PAT table that corresponds to the two
> + * cache bits, PWT (Write Through) and PCD (Cache Disable).
> + * This setup is also the same as the BIOS default setup.
> + *
> + * PTE encoding:
> + *
> + * PCD
> + * |PWT PAT
> + * || slot
> + * 00 0 WB : _PAGE_CACHE_MODE_WB
> + * 01 1 WT : _PAGE_CACHE_MODE_WT
> + * 10 2 UC-: _PAGE_CACHE_MODE_UC_MINUS
> + * 11 3 UC : _PAGE_CACHE_MODE_UC
> + *
> + * NOTE: When WC or WP is used, it is redirected to UC- per
> + * the default setup in __cachemode2pte_tbl[].
> + */
> + pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) |
> + PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
> + }
a) pat_emu_init() -- this should probably be renamed to something that includes
not just emulation as for virtual environments that's not true and using
emulation is very misleading for Xen.
b) Will this mean that then you can support bare metal with no MTRR but with PAT
enabled too?
c) Why not just split this up to enable PAT init to be a first class citizen ?
Luis
On Tue, 2016-03-15 at 01:15 +0100, Luis R. Rodriguez wrote:
> On Fri, Mar 11, 2016 at 06:16:36PM -0700, Toshi Kani wrote:
> > On Fri, 2016-03-11 at 15:34 -0800, Luis R. Rodriguez wrote:
> > > On Fri, Mar 11, 2016 at 3:56 PM, Toshi Kani <[email protected]>
> > > wrote:
> > > > On Fri, 2016-03-11 at 23:17 +0100, Luis R. Rodriguez wrote:
> > > > > On Fri, Mar 11, 2016 at 11:57:12AM -0700, Toshi Kani wrote:
:
> > > > > What I'm after is seeing if we can ultimately disable MTRR on
> > > > > kernel code but still have PAT enabled. I realize you've
> > > > > mentioned BIOS code may use some MTRR setup code but this is only
> > > > > true for some systems. I know for a fact Xen cannot use MTRR, it
> > > > > seems qemu32 does not enable it either. So why not have the
> > > > > ability to skip through its set up ?
> > > >
> > > > MTRR support has two meanings:
> > > > 1) The kernel keeps the MTRR setup by BIOS.
> > > > 2) The kernel modifies the MTRR setup.
> > > >
> > > > I am in a position that we need 1) but 2).
> > >
> > > I take it you meant "but not 2)" ?
> >
> > Yes. :)
>
> OK -- we are in agreement but we know 1) is only needed for a portion of
> systems: Xen and qemu32 systems fly with no MTRR set up, and as such it
> would be incorrect to run MTRR code on such systems. To these systems
> MTRR functionality code should be dead, since PAT currently depends on
> MTRR PAT should also be dead but as the report you're fixing shows
> it wasn't. That's an issue for qemu that uses the regular x86 init path
> but not for Xen. Its different for Xen as the hypervisor is the one that
> set up the MSR_IA32_CR_PAT for each CPU. The *only* thing Xen does is:
MTRR code does not have to be dead for the virtual machines with no MTRR
support. The code just needs to handle the disabled case properly. I
consider this is part of 1) that the kernel keeps the MTRR state as
disabled.
> void xen_start_kernel(void)
> {
> ...
> rdmsrl(MSR_IA32_CR_PAT, pat);
>
> pat_init_cache_modes(pat);
> ...
> }
>
> Fortunately we only have to call pat_init_cache_modes() once, its
> not per-CPU. Xen has shown then that you *can* live with PAT without
> any of the complex MTRR setup / code. Please add to your table the
> Xen case as well then as its important to consider. If you make it
> a strong requirement to have MTRR enabled to enable PAT you'd
> be disabling PAT on Xen guest boots.
Yes, I understand that the original fix will break Xen. Thanks for
pointing this out! In the next version (which is the same approach as the
two additional patches I sent you yesterday), I am going to integrate the
Xen's use-case into the framework. xen_start_kernel() will no longer need
to call pat_init_cache_modes() as a result. So, please review the next
version, and let me know if there is any issue.
> As-is then your this patch which calls pat_disable() on mtrr_bp_init()
> for the case where MTRR is disabled would essentially break PAT on Xen
> guests, so this cannot be done. It is no longer true that if MTRR is
> disabled you can force disable PAT. To do what you want you want to do we
> have to consider Xen.
>
> I don't think its a good idea to keep PAT initialization meshed together
> with MTRR and making it a strong requirement on enabling PAT. The MTRR
> code is extremely complex. I'd like instead to encourage for us to
> consider for this situation to let PAT become a first class citizen,
> if MTRR is disabled but you've enabled PAT you should be able to use
> it, just as Xen does. There are more reasons to enable such setup than
> not to. Long term I'm advocating to see if we can get an ACPI legacy
> MTRR flag that can tell us if the BIOS has MTRR ripped out, then we
> can at run time also take advantage of ignoring PAT completely as well.
>
> Note, if you insisted you didn't want to disable PAT on Xen, you could
> in theory check for the subarch -- but note that the subarch is unused
> yet on Xen, even though it was added to the x86 boot protocol years ago.
> I have a slew of patches to make use of it to help put paravirt_enabled()
> in the grave, but based on discussions with Ingo, we don't want to spread
> use of the subarch in random x86 code paths, we want to compartamentalize
> that. If you still want to follow your approach of just force-disabling
> PAT on MTRR code if MTRR was disabled you'd have to use semantics to
> figure out if the boot path came from Xen, to be more specific for Xen
> PV guest types only... The current agreed approach to avoid directly
> using subarch is to categorize differences between what some guests
> need and bare metal under an x86 platform quirk and legacy set of
> components.
>
> On the x86 init path we'd call something check for the subarch and based
> on that set a series of x86 legacy features / quirks that need to be
> disabled / enabled. We could add MTRR as one. I'm unifying some of this
> with a bit of what goes into the ACPI IA-PC boot architecture, see
> section 5.2.9.3 IA-PC Boot Architecture Flags [0]. In particular the
> paravirt_enabled() series I'm working on happens to also dabble into
> the no CMOS RTC case for Xen, generalizing this knocks a bit of birds
> with one stone. I think we can do the same with MTRR but also be
> proactive and see if we can get ACPI_FADT_NO_MTRR added as well for
> a future ACPI spec to enable BIOS manufacturers to rip MTRR out.
We do not need subarch for this case since presence of the MTRR feature can
be tested with CPUID and MSR.
> [0] http://www.acpi.info/DOWNLOADS/ACPIspec50.pdf
>
> /* Masks for FADT IA-PC Boot Architecture Flags (boot_flags)
> [Vx]=Introduced in this FADT revision */
> #define ACPI_FADT_LEGACY_DEVICES (1) /* 00: [V2] System has
> LPC or ISA bus devices */
> #define ACPI_FADT_8042 (1<<1) /* 01: [V3] System has an
> 8042 controller on port 60/64 */
> #define ACPI_FADT_NO_VGA (1<<2) /* 02: [V4] It is not
> safe to probe for VGA hardware */
> #define ACPI_FADT_NO_MSI (1<<3) /* 03: [V4] Message
> Signaled Interrupts (MSI) must not be enabled */
> #define ACPI_FADT_NO_ASPM (1<<4) /* 04: [V4] PCIe ASPM
> control must not be enabled */
> #define ACPI_FADT_NO_CMOS_RTC (1<<5) /* 05: [V5] No CMOS real-
> time clock present */
>
> > > There *are folks however who do
> > > more as I noted earlier. Perhaps not now, but in the future I'd
> > > encourage folks to rip MTRR out of their own BIOS, and enable a new
> > > ACPI legacy flag to say "MTRR required". That'd eventually can help
> > > bury MTRR for good while remaining backward compatible.
> >
> > Well, BIOS using MTRR is better than BIOS setting page tables in the
> > SMI handler.
>
> Can some BIOSes be developed without MTRR? For instance I suspect Google
> might be able to easily pull of ripping MTRR out of their BIOS if they
> didn't do it already for the ChromeOS devices. If possible not only
> should it help with removing complexity on the BIOS but not even having
> to think about that code *ever* running on the kernel at all should be
> nice.
AFAIK, MTRR is the only way to specify UC attribute in physical mode on
x86. On ia64, one can simply set the UC bit to a physical address to
specify UC attribute. I wish we had something similar.
> > The kernel can be ignorant of the MTRR setup as long as it does
> > not modify it.
>
> Sure, we're already there. The kernel no longer modifies the
> MTRR setup unless of course you boot without PAT enabled. I think
> we need to move beyond that to ACPI if we can to let regular
> Linux boots never have to deal with MTRR at all. The code is
> complex and nasty why not put let folks put a nail on the coffin for
> good?
I think we are good as long as we do not modify it. The complexity comes
with the modification.
> > > I can read the above description to also say:
> > >
> > > "Hey you need to implement PAT with the same skeleton code as MTRR"
> >
> > No, I did not say that. MTRR's rendezvous handler can be generalized
> > to work with both MTRR and PAT. We do not need two separate handlers.
> > In fact, it needs to be a single handler so that both can be
> > initialized together.
>
> I'm not sure if that's really needed. Doesn't PAT just require setting
> the wrmsrl(MSR_IA32_CR_PAT, pat) for each AP?
No, it requires the same procedure as MTRR.
> > > If we do that, we can pave the way to deprecate MTRR as legacy for
> > > good first on Linux.
> >
> > I do not think such change will deprecate MTRR.
>
> Not even for shiny new BIOSes? Post ACPI 5?
Nope.
> > It just means that Linux can enable PAT on virtual CPUs with PAT &
> > !MTRR capability.
> >
> > > > In fact, the kernel disabling MTRRs is the same as 2).
> > > >
> > > > > I'll also note Xen managed to enable PAT only without enabling
> > > > > MTRR, this was done through pat_init_cache_modes() -- not sure if
> > > > > this can be leveraged for qemu32...
> > > >
> > > > I am interested to know how Xen managed this. Is this done by the
> > > > Xen hypervisor initializes guest's PAT on behalf of the guest
> > > > kernel?
> > >
> > > Yup. And the cache read thingy was reading back its own setup, which
> > > was different than what Linux used by default IIRC. Juergen can
> > > elaborate more.
> >
> > Yeah, I'd like to make sure that my changes won't break it.
>
> I checked through code inspection and indeed, it seems it would break
> Xen's PAT setup.
>
> For the record: the issue here was code that should not run ran, that
> is dead code ran. I'm working towards a generic solution for this.
Please review the next version.
Thanks,
-Toshi
On Tue, Mar 15, 2016 at 05:48:44PM -0600, Toshi Kani wrote:
> On Tue, 2016-03-15 at 01:15 +0100, Luis R. Rodriguez wrote:
> > On Fri, Mar 11, 2016 at 06:16:36PM -0700, Toshi Kani wrote:
> > > On Fri, 2016-03-11 at 15:34 -0800, Luis R. Rodriguez wrote:
> > > > On Fri, Mar 11, 2016 at 3:56 PM, Toshi Kani <[email protected]>
> > > > wrote:
> > > > > On Fri, 2016-03-11 at 23:17 +0100, Luis R. Rodriguez wrote:
> > > > > > On Fri, Mar 11, 2016 at 11:57:12AM -0700, Toshi Kani wrote:
> ?:
> > > > > > What I'm after is seeing if we can ultimately disable MTRR on
> > > > > > kernel code but still have PAT enabled. I realize you've
> > > > > > mentioned BIOS code may use some MTRR setup code but this is only
> > > > > > true for some systems. I know for a fact Xen cannot use MTRR, it
> > > > > > seems qemu32 does not enable it either. So why not have the
> > > > > > ability to skip through its set up ?
> > > > >
> > > > > MTRR support has two meanings:
> > > > > ?1) The kernel keeps the MTRR setup by BIOS.
> > > > > ?2) The kernel modifies the MTRR setup.
> > > > >
> > > > > I am in a position that we need 1) but 2).
> > > >
> > > > I take it you meant "but not 2)" ??
> > >
> > > Yes. :)
> >
> > OK -- we are in agreement but we know 1) is only needed for a portion of
> > systems: Xen and qemu32 systems fly with no MTRR set up, and as such it
> > would be incorrect to run MTRR code on such systems. To these systems
> > MTRR functionality code should be dead, since PAT currently depends on
> > MTRR PAT should also be dead but as the report you're fixing shows
> > it wasn't. That's an issue for qemu that uses the regular x86 init path
> > but not for Xen. Its different for Xen as the hypervisor is the one that
> > set up the MSR_IA32_CR_PAT for each CPU. The *only* thing Xen does is:
>
> MTRR code does not have to be dead for the virtual machines with no MTRR
> support. ?The code just needs to handle the disabled case properly. ?I
> consider this is part of 1) that the kernel keeps the MTRR state as
> disabled.
For Xen it was possible to use PAT without any of the MTRR code running,
I don't see why its needed then and why other virtual machines that
only need PAT need it.
> > void xen_start_kernel(void)
> > {
> > ...
> > ????????rdmsrl(MSR_IA32_CR_PAT,?pat);????????????????????????????????????
> > ???????
> > ????????pat_init_cache_modes(pat);?
> > ...
> > }
> >
> > Fortunately we only have to call pat_init_cache_modes() once, its
> > not per-CPU. Xen has shown then that you *can* live with PAT without
> > any of the complex MTRR setup / code. Please add to your table the
> > Xen case as well then as its important to consider. If you make it
> > a strong requirement to have MTRR enabled to enable PAT you'd
> > be disabling PAT on Xen guest boots.
>
> Yes, I understand that the original fix will break Xen. ?Thanks for
> pointing this out! ?In the next version (which is the same approach as the
> two additional patches I sent you yesterday), I am going to integrate the
> Xen's use-case into the framework. ?xen_start_kernel() will no longer need
> to call?pat_init_cache_modes() as a result. ?So, please review the next
> version, and let me know if there is any issue.
Sure.
> > As-is then your this patch which calls pat_disable() on mtrr_bp_init()
> > for the case where MTRR is disabled would essentially break PAT on Xen
> > guests, so this cannot be done. It is no longer true that if MTRR is
> > disabled you can force disable PAT. To do what you want you want to do we
> > have to consider Xen.
> >
> > I don't think its a good idea to keep PAT initialization meshed together
> > with MTRR and making it a strong requirement on enabling PAT. The MTRR
> > code is extremely complex. I'd like instead to encourage for us to
> > consider for this situation to let PAT become a first class citizen,
> > if MTRR is disabled but you've enabled PAT you should be able to use
> > it, just as Xen does. There are more reasons to enable such setup than
> > not to. Long term I'm advocating to see if we can get an ACPI legacy
> > MTRR flag that can tell us if the BIOS has MTRR ripped out, then we
> > can at run time also take advantage of ignoring PAT completely as well.
> >
> > Note, if you insisted you didn't want to disable PAT on Xen, you could
> > in theory check for the subarch -- but note that the subarch is unused
> > yet on Xen, even though it was added to the x86 boot protocol years ago.
> > I have a slew of patches to make use of it to help put paravirt_enabled()
> > in the grave, but based on discussions with Ingo, we don't want to spread
> > use of the subarch in random x86 code paths, we want to compartamentalize
> > that. If you still want to follow your approach of just force-disabling
> > PAT on MTRR code if MTRR was disabled you'd have to use semantics to
> > figure out if the boot path came from Xen, to be more specific for Xen
> > PV guest types only... The current agreed approach to avoid directly
> > using subarch is to categorize differences between what some guests
> > need and bare metal under an x86 platform quirk and legacy set of
> > components.
> >
> > On the x86 init path we'd call something check for the subarch and based
> > on that set a series of x86 legacy features / quirks that need to be
> > disabled / enabled. We could add MTRR as one. I'm unifying some of this
> > with a bit of what goes into the ACPI IA-PC boot architecture, see
> > section 5.2.9.3 IA-PC Boot Architecture Flags [0]. In particular the
> > paravirt_enabled() series I'm working on happens to also dabble into
> > the no CMOS RTC case for Xen, generalizing this knocks a bit of birds
> > with one stone. I think we can do the same with MTRR but also be
> > proactive and see if we can get ACPI_FADT_NO_MTRR added as well for
> > a future ACPI spec to enable BIOS manufacturers to rip MTRR out.
>
> We do not need subarch for this case since presence of the MTRR feature can
> be tested with CPUID and MSR.
But in this case its about two different cases:
No MTRR - but you need to emulate PAT
No MTRR - but you can just read what the hypervisor already set up
So it a given MTRR is disabled for both, what we need then is semantics
to distinguish between qemu32 and Xen PV. Curious to see what you use,
in your current new patch it was not clear what you did.
> > [0] http://www.acpi.info/DOWNLOADS/ACPIspec50.pdf?
> >
> > /* Masks for FADT IA-PC Boot Architecture Flags (boot_flags)
> > [Vx]=Introduced in this FADT revision */
> > #define ACPI_FADT_LEGACY_DEVICES????(1)?????????/* 00: [V2] System has
> > LPC or ISA bus devices */
> > #define ACPI_FADT_8042??????????????(1<<1)??????/* 01: [V3] System has an
> > 8042 controller on port 60/64 */
> > #define ACPI_FADT_NO_VGA????????????(1<<2)??????/* 02: [V4] It is not
> > safe to probe for VGA hardware */
> > #define ACPI_FADT_NO_MSI????????????(1<<3)??????/* 03: [V4] Message
> > Signaled Interrupts (MSI) must not be enabled */
> > #define ACPI_FADT_NO_ASPM???????????(1<<4)??????/* 04: [V4] PCIe ASPM
> > control must not be enabled */
> > #define ACPI_FADT_NO_CMOS_RTC???????(1<<5)??????/* 05: [V5] No CMOS real-
> > time clock present */
> >
> > > > There *are folks however who do
> > > > more as I noted earlier. Perhaps not now, but in the future I'd
> > > > encourage folks to rip MTRR out of their own BIOS, and enable a new
> > > > ACPI legacy flag to say "MTRR required". That'd eventually can help
> > > > bury MTRR for good while remaining backward compatible.
> > >
> > > Well, BIOS using MTRR is better than BIOS setting page tables in the
> > > SMI handler.
> >
> > Can some BIOSes be developed without MTRR? For instance I suspect Google
> > might be able to easily pull of ripping MTRR out of their BIOS if they
> > didn't do it already for the ChromeOS devices. If possible not only
> > should it help with removing complexity on the BIOS but not even having
> > to think about that code *ever* running on the kernel at all should be
> > nice.
>
> AFAIK, MTRR is the only way to specify UC attribute in physical mode on
> x86. ?On ia64, one can simply set the UC bit to a physical address to
> specify UC attribute. ?I wish we had something similar.
Whoa, you mean on the BIOS? This is BIG deal if true. Why haven't you just
said this a long time ago?
On x86 Linux code we now have ioremap_uc() that can't use MTRR behind the
scenes, why would something like this on the BIOS not be possible? That
ultimately uses set_pte_at(). What limitations are there on the BIOS
that prevent us from just using strong UC for PAT on the BIOS?
> > > The kernel can be ignorant of the MTRR setup as long as it does
> > > not modify it.
> >
> > Sure, we're already there. The kernel no longer modifies the
> > MTRR setup unless of course you boot without PAT enabled. I think
> > we need to move beyond that to ACPI if we can to let regular
> > Linux boots never have to deal with MTRR at all. The code is
> > complex and nasty why not put let folks put a nail on the coffin for
> > good?
>
> I think we are good as long as we do not modify it. ?The complexity comes
> with the modification.
Ew. I look at that code and cannot comprehend why we'd ever want to keep
it always running.
> > > > I can read the above description to also say:
> > > >
> > > > "Hey you need to implement PAT with the same skeleton code as MTRR"
> > >
> > > No, I did not say that. ?MTRR's rendezvous handler can be generalized
> > > to work with both MTRR and PAT. ?We do not need two separate handlers.
> > > ?In fact, it needs to be a single handler so that both can be
> > > initialized together.
> >
> > I'm not sure if that's really needed. Doesn't PAT just require setting
> > the wrmsrl(MSR_IA32_CR_PAT, pat) for each AP?
>
> No, it requires the same procedure as MTRR.
MTRR has a bunch of junk that is outside of the scope of the generic
procedure which I'd hope we can skip.
> > > > If we do that, we can pave the way to deprecate MTRR as legacy for
> > > > good first on Linux.
> > >
> > > I do not think such change will deprecate MTRR.
> >
> > Not even for shiny new BIOSes? Post ACPI 5?
>
> Nope.
Hrm...
> > > ?It just means that Linux can enable PAT on virtual CPUs with PAT &
> > > !MTRR capability.
> > >
> > > > > In fact, the kernel disabling MTRRs is the same as 2).
> > > > >
> > > > > > I'll also note Xen managed to enable PAT only without enabling
> > > > > > MTRR, this was done through pat_init_cache_modes() -- not sure if
> > > > > > this can be leveraged for qemu32...
> > > > >
> > > > > I am interested to know how Xen managed this.??Is this done by the
> > > > > Xen hypervisor initializes guest's PAT on behalf of the guest
> > > > > kernel?
> > > >
> > > > Yup. And the cache read thingy was reading back its own setup, which
> > > > was different than what Linux used by default IIRC. Juergen can
> > > > elaborate more.
> > >
> > > Yeah, I'd like to make sure that my changes won't break it.
> >
> > I checked through code inspection and indeed, it seems it would break
> > Xen's PAT setup.
> >
> > For the record: the issue here was code that should not run ran, that
> > is dead code ran. I'm working towards a generic solution for this.
>
> Please review the next version.
Sure..
Luis
On 03/14/2016 08:37 PM, Toshi Kani wrote:
[... snip ...]
>> Joe at Stratus also hit this issue but on a system where MTRR is enabled.
>> He sent his report only to me as he thought it was caused by the
>> ioremap_wc() changes and his driver was one that got it. In his case
>> though he modified the driver significantly, and upon inspection of that
>> code saw how it used a secondary backup PCI device for failover for a
>> framebuffer device... The changes to the driver in place are rather
>> complex though and as such it made no sense to further review unless he
>> moved his changes upstream. It is still worth noting this issue has been
>> seeing elsehwere, but the root cause is still not known. The error Joe
>> got is:
>>
>> x86/PAT: Xorg:37506 map pfn expected mapping type uncached-minus for [mem
>> 0x9f000000-0x9f7fffff], got write-combining
>>
>> Even though the driver is custom (and actually I even saw another
>> unrelated proprietary driver loaded) I figured its worth noting others
>> have seen this error without MTRR being disabled.
>
> The error message looks the same. So, this could be the same issue if WC
> is redirected to UC without disabling PAT properly on his env. I need a
> whole dmesg output to confirm if this is the case. Another way to hit this
> error is that the driver called remap_pfn_range() with UC to a range where
> WC map was set by ioremap_wc() already.
As mentioned in the other thread, our driver is very custom and performs
some page table parlor tricks to failover the frame buffer from one PCI
adapter to another. I need some time to fully digest these
customizations before I can really be of much help here. My current
theory is that we have bug when do:
ioremap_wc( backup adapter fb )
iounmap ( backup adapter fb ) << I'm unclear why this occurs
ioremap_wc( primary adapter fb )
to failover, we stop_machine and remap the framebuffer (heavily
summarized: iterate through its pages, lookup_address to get a ptep, get
its __pgprot and then rewrite the pte to the new physical address and
the old pgprot_val. A tlb flush on the way out of stop_machine).
Since this is all out-of-tree custom work, I'm not asking for any
support. If we're too far off the reservation to be useful to the
conversation here, no worries. I'm just trying to help if there is an
upstream bug.
>> The second thread you referred to seems to say that if you built-in the
>> code the error does not come up. What the hell. Joe, can you try building
>> your driver built-in to see if you also see this go away? Even though I
>> don't want to support your custom hacked up driver I do want to know if
>> your issue goes away with built-in as well.
>
> I do not have sufficient info to support this case, and do not have
> technical explanation for it, either.
Luis -- I tried last night to reproduce (built as a module) without any
luck. Going back through my test logs, it looks like the warning is
relatively sporadic and may take a few days to hit. Without having a
better repro case, I think my warning is caused by that dodgy page table
work mentioned above (or at least timing related). Let me see if I can
try to provoke the warning into happening more frequently first.
Regards,
-- Joe
On Wed, 2016-03-16 at 00:29 +0100, Luis R. Rodriguez wrote:
> On Tue, Mar 15, 2016 at 05:48:44PM -0600, Toshi Kani wrote:
> > On Tue, 2016-03-15 at 01:15 +0100, Luis R. Rodriguez wrote:
> > > On Fri, Mar 11, 2016 at 06:16:36PM -0700, Toshi Kani wrote:
> > > > On Fri, 2016-03-11 at 15:34 -0800, Luis R. Rodriguez wrote:
> > > > > On Fri, Mar 11, 2016 at 3:56 PM, Toshi Kani <[email protected]>
> > > > > wrote:
> > > > > > On Fri, 2016-03-11 at 23:17 +0100, Luis R. Rodriguez wrote:
> > > > > > > On Fri, Mar 11, 2016 at 11:57:12AM -0700, Toshi Kani wrote:
> > :
:
> > MTRR code does not have to be dead for the virtual machines with no
> > MTRR support. The code just needs to handle the disabled case
> > properly. I consider this is part of 1) that the kernel keeps the MTRR
> > state as disabled.
>
> For Xen it was possible to use PAT without any of the MTRR code running,
> I don't see why its needed then and why other virtual machines that
> only need PAT need it.
Virtual BIOS does not need MTRRs since it does not manage the platform.
> >
:
> > We do not need subarch for this case since presence of the MTRR feature
> > can be tested with CPUID and MSR.
>
> But in this case its about two different cases:
>
> No MTRR - but you need to emulate PAT
> No MTRR - but you can just read what the hypervisor already set up
>
> So it a given MTRR is disabled for both, what we need then is semantics
> to distinguish between qemu32 and Xen PV. Curious to see what you use,
> in your current new patch it was not clear what you did.
X86_FEATURE_PAT tells us if CPU supports PAT. So, the kernel can
distinguish the two cases above without knowing qemu32 or Xen.
> > >
:
> > AFAIK, MTRR is the only way to specify UC attribute in physical mode on
> > x86. On ia64, one can simply set the UC bit to a physical address to
> > specify UC attribute. I wish we had something similar.
>
> Whoa, you mean on the BIOS? This is BIG deal if true. Why haven't you
> just said this a long time ago?
Yes, BIOS. I think I've told you before when I mentioned that BIOS might
need to manage fan speed. Virt BIOS does not need to do such thing.
> On x86 Linux code we now have ioremap_uc() that can't use MTRR behind the
> scenes, why would something like this on the BIOS not be possible? That
> ultimately uses set_pte_at(). What limitations are there on the BIOS
> that prevent us from just using strong UC for PAT on the BIOS?
Because it requires to run in virtual mode with page tables.
> > > > The kernel can be ignorant of the MTRR setup as long as it does
> > > > not modify it.
> > >
> > > Sure, we're already there. The kernel no longer modifies the
> > > MTRR setup unless of course you boot without PAT enabled. I think
> > > we need to move beyond that to ACPI if we can to let regular
> > > Linux boots never have to deal with MTRR at all. The code is
> > > complex and nasty why not put let folks put a nail on the coffin for
> > > good?
> >
> > I think we are good as long as we do not modify it. The complexity
> > comes with the modification.
>
> Ew. I look at that code and cannot comprehend why we'd ever want to keep
> it always running.
The code is basically no-op on Xen.
> > > > > I can read the above description to also say:
> > > > >
> > > > > "Hey you need to implement PAT with the same skeleton code as
> > > > > MTRR"
> > > >
> > > > No, I did not say that. MTRR's rendezvous handler can be
> > > > generalized to work with both MTRR and PAT. We do not need two
> > > > separate handlers. In fact, it needs to be a single handler so
> > > > that both can be initialized together.
> > >
> > > I'm not sure if that's really needed. Doesn't PAT just require
> > > setting the wrmsrl(MSR_IA32_CR_PAT, pat) for each AP?
> >
> > No, it requires the same procedure as MTRR.
>
> MTRR has a bunch of junk that is outside of the scope of the generic
> procedure which I'd hope we can skip.
We can skip the part that modifies MTRR setup. I think that is the part you
think is a junk.
> > > > >
:
> > > > It just means that Linux can enable PAT on virtual CPUs with PAT &
> > > > !MTRR capability.
> > > >
> > > > > > In fact, the kernel disabling MTRRs is the same as 2).
> > > > > >
> > > > > > > I'll also note Xen managed to enable PAT only without
> > > > > > > enabling MTRR, this was done through pat_init_cache_modes()
> > > > > > > -- not sure if this can be leveraged for qemu32...
> > > > > >
> > > > > > I am interested to know how Xen managed this. Is this done by
> > > > > > the Xen hypervisor initializes guest's PAT on behalf of the
> > > > > > guest kernel?
> > > > >
> > > > > Yup. And the cache read thingy was reading back its own setup,
> > > > > which was different than what Linux used by default IIRC. Juergen
> > > > > can elaborate more.
> > > >
> > > > Yeah, I'd like to make sure that my changes won't break it.
> > >
> > > I checked through code inspection and indeed, it seems it would break
> > > Xen's PAT setup.
> > >
> > > For the record: the issue here was code that should not run ran, that
> > > is dead code ran. I'm working towards a generic solution for this.
> >
> > Please review the next version.
>
> Sure..
Thanks!
-Toshi
On Mar 17, 2016 2:04 PM, "Toshi Kani" <[email protected]> wrote:
>
> On Wed, 2016-03-16 at 00:29 +0100, Luis R. Rodriguez wrote:
> > On Tue, Mar 15, 2016 at 05:48:44PM -0600, Toshi Kani wrote:
> > > On Tue, 2016-03-15 at 01:15 +0100, Luis R. Rodriguez wrote:
> > > > On Fri, Mar 11, 2016 at 06:16:36PM -0700, Toshi Kani wrote:
> > > > > On Fri, 2016-03-11 at 15:34 -0800, Luis R. Rodriguez wrote:
> > > > > > On Fri, Mar 11, 2016 at 3:56 PM, Toshi Kani <[email protected]>
> > > > > > wrote:
> > > > > > > On Fri, 2016-03-11 at 23:17 +0100, Luis R. Rodriguez wrote:
> > > > > > > > On Fri, Mar 11, 2016 at 11:57:12AM -0700, Toshi Kani wrote:
> > > :
> :
> > > MTRR code does not have to be dead for the virtual machines with no
> > > MTRR support. The code just needs to handle the disabled case
> > > properly. I consider this is part of 1) that the kernel keeps the MTRR
> > > state as disabled.
> >
> > For Xen it was possible to use PAT without any of the MTRR code running,
> > I don't see why its needed then and why other virtual machines that
> > only need PAT need it.
>
> Virtual BIOS does not need MTRRs since it does not manage the platform.
Unless if in dom0 and if some of this purposely wants to be punted
there to leverage existing kernel code. On the Xen thread I'm asking
about the implications of that and how/if Xen should be doing. We can
follow up on this there as its Xen specific.
> > > We do not need subarch for this case since presence of the MTRR feature
> > > can be tested with CPUID and MSR.
> >
> > But in this case its about two different cases:
> >
> > No MTRR - but you need to emulate PAT
> > No MTRR - but you can just read what the hypervisor already set up
> >
> > So it a given MTRR is disabled for both, what we need then is semantics
> > to distinguish between qemu32 and Xen PV. Curious to see what you use,
> > in your current new patch it was not clear what you did.
>
> X86_FEATURE_PAT tells us if CPU supports PAT. So, the kernel can
> distinguish the two cases above without knowing qemu32 or Xen.
Ok indeed, neat.
> > > AFAIK, MTRR is the only way to specify UC attribute in physical mode on
> > > x86. On ia64, one can simply set the UC bit to a physical address to
> > > specify UC attribute. I wish we had something similar.
> >
> > Whoa, you mean on the BIOS? This is BIG deal if true. Why haven't you
> > just said this a long time ago?
>
> Yes, BIOS. I think I've told you before when I mentioned that BIOS might
> need to manage fan speed. Virt BIOS does not need to do such thing.
You had mentioned befor BIOS uses MTRR, not that *the BIOS can only
MTRR for UC*. These are two very different things, hence my surprise.
> > On x86 Linux code we now have ioremap_uc() that can't use MTRR behind the
> > scenes, why would something like this on the BIOS not be possible? That
> > ultimately uses set_pte_at(). What limitations are there on the BIOS
> > that prevent us from just using strong UC for PAT on the BIOS?
>
> Because it requires to run in virtual mode with page tables.
Ah... interesting... is UC really needed, what is the default? If the
default is used would there be an issue ? Can such work be deferred to
a later time ? It seems like a high burden to require on large piece
of legacy architecture to just blow a fan.
> > > > > The kernel can be ignorant of the MTRR setup as long as it does
> > > > > not modify it.
> > > >
> > > > Sure, we're already there. The kernel no longer modifies the
> > > > MTRR setup unless of course you boot without PAT enabled. I think
> > > > we need to move beyond that to ACPI if we can to let regular
> > > > Linux boots never have to deal with MTRR at all. The code is
> > > > complex and nasty why not put let folks put a nail on the coffin for
> > > > good?
> > >
> > > I think we are good as long as we do not modify it. The complexity
> > > comes with the modification.
> >
> > Ew. I look at that code and cannot comprehend why we'd ever want to keep
> > it always running.
>
> The code is basically no-op on Xen.
Right, but if we can strive towards similar goals with more up to date
BIOSes as an option on bare metal why not.
> > > > > > I can read the above description to also say:
> > > > > >
> > > > > > "Hey you need to implement PAT with the same skeleton code as
> > > > > > MTRR"
> > > > >
> > > > > No, I did not say that. MTRR's rendezvous handler can be
> > > > > generalized to work with both MTRR and PAT. We do not need two
> > > > > separate handlers. In fact, it needs to be a single handler so
> > > > > that both can be initialized together.
> > > >
> > > > I'm not sure if that's really needed. Doesn't PAT just require
> > > > setting the wrmsrl(MSR_IA32_CR_PAT, pat) for each AP?
> > >
> > > No, it requires the same procedure as MTRR.
> >
> > MTRR has a bunch of junk that is outside of the scope of the generic
> > procedure which I'd hope we can skip.
>
> We can skip the part that modifies MTRR setup. I think that is the part you
> think is a junk.
Sure, but the more we can avoid any of that code the better. For
example consider the cleanup patch to increase the chunk size, do we
even need the cleanup anymore ?
Luis
On Thu, 2016-03-17 at 17:06 -0700, Luis R. Rodriguez wrote:
> On Mar 17, 2016 2:04 PM, "Toshi Kani" <[email protected]> wrote:
> >
> > On Wed, 2016-03-16 at 00:29 +0100, Luis R. Rodriguez wrote:
> > > On Tue, Mar 15, 2016 at 05:48:44PM -0600, Toshi Kani wrote:
> > > > On Tue, 2016-03-15 at 01:15 +0100, Luis R. Rodriguez wrote:
> > > > > On Fri, Mar 11, 2016 at 06:16:36PM -0700, Toshi Kani wrote:
> > > > > >
> > :
> > > > MTRR code does not have to be dead for the virtual machines with no
> > > > MTRR support. The code just needs to handle the disabled case
> > > > properly. I consider this is part of 1) that the kernel keeps the
> > > > MTRR state as disabled.
> > >
> > > For Xen it was possible to use PAT without any of the MTRR code
> > > running, I don't see why its needed then and why other virtual
> > > machines that only need PAT need it.
> >
> > Virtual BIOS does not need MTRRs since it does not manage the platform.
>
> Unless if in dom0 and if some of this purposely wants to be punted
> there to leverage existing kernel code. On the Xen thread I'm asking
> about the implications of that and how/if Xen should be doing. We can
> follow up on this there as its Xen specific.
I do not see any issue for Xen, but sure, we can discuss about Xen in a
separate thread.
:
> > > On x86 Linux code we now have ioremap_uc() that can't use MTRR behind
> > > the scenes, why would something like this on the BIOS not be
> > > possible? That ultimately uses set_pte_at(). What limitations are
> > > there on the BIOS that prevent us from just using strong UC for PAT
> > > on the BIOS?
> >
> > Because it requires to run in virtual mode with page tables.
>
> Ah... interesting... is UC really needed, what is the default? If the
> default is used would there be an issue ? Can such work be deferred to
> a later time ? It seems like a high burden to require on large piece
> of legacy architecture to just blow a fan.
The default cache attribute (i.e. ranges not covered by MTRRs) is specified
by the MTRR default type MSR.
:
> > > > > >
> > > MTRR has a bunch of junk that is outside of the scope of the generic
> > > procedure which I'd hope we can skip.
> >
> > We can skip the part that modifies MTRR setup. I think that is the part
> > you think is a junk.
>
> Sure, but the more we can avoid any of that code the better. For
> example consider the cleanup patch to increase the chunk size, do we
> even need the cleanup anymore ?
No, I do not think we need it now. I think this cleanup was needed to
allocate more free slots to MTRRs. We do not need to care about the number
of free slots as long as the kernel does not insert any new entry to MTRRs.
Thanks,
-Toshi
On Fri, Mar 18, 2016 at 2:35 PM, Toshi Kani <[email protected]> wrote:
> On Thu, 2016-03-17 at 17:06 -0700, Luis R. Rodriguez wrote:
>> On Mar 17, 2016 2:04 PM, "Toshi Kani" <[email protected]> wrote:
>> >
>> > On Wed, 2016-03-16 at 00:29 +0100, Luis R. Rodriguez wrote:
>> > > On Tue, Mar 15, 2016 at 05:48:44PM -0600, Toshi Kani wrote:
>> > > > On Tue, 2016-03-15 at 01:15 +0100, Luis R. Rodriguez wrote:
>> > > > > On Fri, Mar 11, 2016 at 06:16:36PM -0700, Toshi Kani wrote:
>> > > > > >
>> > :
>> > > > MTRR code does not have to be dead for the virtual machines with no
>> > > > MTRR support. The code just needs to handle the disabled case
>> > > > properly. I consider this is part of 1) that the kernel keeps the
>> > > > MTRR state as disabled.
>> > >
>> > > For Xen it was possible to use PAT without any of the MTRR code
>> > > running, I don't see why its needed then and why other virtual
>> > > machines that only need PAT need it.
>> >
>> > Virtual BIOS does not need MTRRs since it does not manage the platform.
>>
>> Unless if in dom0 and if some of this purposely wants to be punted
>> there to leverage existing kernel code. On the Xen thread I'm asking
>> about the implications of that and how/if Xen should be doing. We can
>> follow up on this there as its Xen specific.
>
> I do not see any issue for Xen, but sure, we can discuss about Xen in a
> separate thread.
While on point -- I'll just wanted to clarify that a while ago you had
hinted we needed to have Xen return a valid type with
mtrr_type_lookup(), I then also explained how Xen disables MTRR [0],
it was however unclear if you still believe mtrr_type_lookup() is
needed on the guest side. Jan had pointed out that the Xen Hypervisor
implements the XENPF_read_memtype hypercall. On the recent thread I
posted [1] I got into my review of the prospects of implementing
support for using this hypercall on Linux xen guests and issues and
concerns with it. Please feel free to follow up there and we can take
up the other items below here as they relate to bare metal.
[0] http://lkml.kernel.org/r/[email protected]
[1] http://lkml.kernel.org/r/CAB=NE6UTp0T=rbOcAg88iPsfBJneY7O5-3c11VfFgAPiepoTNg@mail.gmail.com
>> > > On x86 Linux code we now have ioremap_uc() that can't use MTRR behind
>> > > the scenes, why would something like this on the BIOS not be
>> > > possible? That ultimately uses set_pte_at(). What limitations are
>> > > there on the BIOS that prevent us from just using strong UC for PAT
>> > > on the BIOS?
>> >
>> > Because it requires to run in virtual mode with page tables.
>>
>> Ah... interesting... is UC really needed, what is the default? If the
>> default is used would there be an issue ? Can such work be deferred to
>> a later time ? It seems like a high burden to require on large piece
>> of legacy architecture to just blow a fan.
>
> The default cache attribute (i.e. ranges not covered by MTRRs) is specified
> by the MTRR default type MSR.
Do we really need UC for the fan? What is the default for PAT? Can't
the same be used so that we way by default all ranges match what is
also the default by PAT? Would that really break fan control ? If we
have a match should't we be able to not have to worry about MTRRs at
all in-kernel even on bare metal?
Another option, which I've alluded to on the Xen thread is skipping
over the MTRR space from the e820 map. Is that not possible ? This
could be last resort... but which I'm hinting more for the Xen side of
things if we *really* need get_mtrr() on the Xen guest side of
things...
>> > > > > >
>> > > MTRR has a bunch of junk that is outside of the scope of the generic
>> > > procedure which I'd hope we can skip.
>> >
>> > We can skip the part that modifies MTRR setup. I think that is the part
>> > you think is a junk.
>>
>> Sure, but the more we can avoid any of that code the better. For
>> example consider the cleanup patch to increase the chunk size, do we
>> even need the cleanup anymore ?
>
> No, I do not think we need it now. I think this cleanup was needed to
> allocate more free slots to MTRRs. We do not need to care about the number
> of free slots as long as the kernel does not insert any new entry to MTRRs.
Beautiful, thanks.
Luis
On Tue, 2016-03-29 at 10:14 -0700, Luis R. Rodriguez wrote:
> On Fri, Mar 18, 2016 at 2:35 PM, Toshi Kani <[email protected]> wrote:
> > On Thu, 2016-03-17 at 17:06 -0700, Luis R. Rodriguez wrote:
> > > On Mar 17, 2016 2:04 PM, "Toshi Kani" <[email protected]> wrote:
> > > >
:
> >
> > I do not see any issue for Xen, but sure, we can discuss about Xen in a
> > separate thread.
>
> While on point -- I'll just wanted to clarify that a while ago you had
> hinted we needed to have Xen return a valid type with
> mtrr_type_lookup(), I then also explained how Xen disables MTRR [0],
> it was however unclear if you still believe mtrr_type_lookup() is
> needed on the guest side. Jan had pointed out that the Xen Hypervisor
> implements the XENPF_read_memtype hypercall. On the recent thread I
> posted [1] I got into my review of the prospects of implementing
> support for using this hypercall on Linux xen guests and issues and
> concerns with it. Please feel free to follow up there and we can take
> up the other items below here as they relate to bare metal.
What I said in the email [0] was that "When MTRRs are enabled, the kernel
needs to check through mtrr_type_lookup()".
Since Xen guests have MTRRs disabled, this statement does not apply. It
returns MTRR_TYPE_INVALID when disabled, and this is fine for the guests.
> [0] http://lkml.kernel.org/r/[email protected]
> [1] http://lkml.kernel.org/r/CAB=NE6UTp0T=rbOcAg88iPsfBJneY7O5-3c11VfFgAP
> [email protected]
>
> > > > > On x86 Linux code we now have ioremap_uc() that can't use MTRR
> > > > > behind the scenes, why would something like this on the BIOS not
> > > > > be possible? That ultimately uses set_pte_at(). What limitations
> > > > > are there on the BIOS that prevent us from just using strong UC
> > > > > for PAT on the BIOS?
> > > >
> > > > Because it requires to run in virtual mode with page tables.
> > >
> > > Ah... interesting... is UC really needed, what is the default? If the
> > > default is used would there be an issue ? Can such work be deferred
> > > to a later time ? It seems like a high burden to require on large
> > > piece of legacy architecture to just blow a fan.
> >
> > The default cache attribute (i.e. ranges not covered by MTRRs) is
> > specified by the MTRR default type MSR.
>
> Do we really need UC for the fan?
When you say "we", are you referring Xen guests? Xen guests do not need to
control the fan, so they do not need UC set in MTRRs.
In general, yes, MMIO registers need UC when they need to be accessed.
> What is the default for PAT?
There is no such thing as the default for PAT.
> Can't
> the same be used so that we way by default all ranges match what is
> also the default by PAT? Would that really break fan control ? If we
> have a match should't we be able to not have to worry about MTRRs at
> all in-kernel even on bare metal?
We do not need to know about BIOS impl, such as fan control, etc. The
point is that if BIOS sets MTRRs, then the kernel keeps their setup. If
(virtual) BIOS does not enable MTRRs, the kernel keeps them disabled. We
just need not to mess with the setup.
> Another option, which I've alluded to on the Xen thread is skipping
> over the MTRR space from the e820 map. Is that not possible ? This
> could be last resort... but which I'm hinting more for the Xen side of
> things if we *really* need get_mtrr() on the Xen guest side of
> things...
There is no MTRR space in the e820 map since they are MSRs. Since Xen
guests disable MTRRs, I do not think you have any issue here...
Thanks,
-Toshi
On Tue, Mar 29, 2016 at 2:46 PM, Toshi Kani <[email protected]> wrote:
> On Tue, 2016-03-29 at 10:14 -0700, Luis R. Rodriguez wrote:
>> On Fri, Mar 18, 2016 at 2:35 PM, Toshi Kani <[email protected]> wrote:
>> > On Thu, 2016-03-17 at 17:06 -0700, Luis R. Rodriguez wrote:
>> > > On Mar 17, 2016 2:04 PM, "Toshi Kani" <[email protected]> wrote:
>> > > >
> :
>> >
>> > I do not see any issue for Xen, but sure, we can discuss about Xen in a
>> > separate thread.
>>
>> While on point -- I'll just wanted to clarify that a while ago you had
>> hinted we needed to have Xen return a valid type with
>> mtrr_type_lookup(), I then also explained how Xen disables MTRR [0],
>> it was however unclear if you still believe mtrr_type_lookup() is
>> needed on the guest side. Jan had pointed out that the Xen Hypervisor
>> implements the XENPF_read_memtype hypercall. On the recent thread I
>> posted [1] I got into my review of the prospects of implementing
>> support for using this hypercall on Linux xen guests and issues and
>> concerns with it. Please feel free to follow up there and we can take
>> up the other items below here as they relate to bare metal.
>
> What I said in the email [0] was that "When MTRRs are enabled, the kernel
> needs to check through mtrr_type_lookup()".
>
> Since Xen guests have MTRRs disabled, this statement does not apply. It
> returns MTRR_TYPE_INVALID when disabled, and this is fine for the guests.
Ah, I see thanks for the clarification!
>> [0] http://lkml.kernel.org/r/[email protected]
>> [1] http://lkml.kernel.org/r/CAB=NE6UTp0T=rbOcAg88iPsfBJneY7O5-3c11VfFgAP
>> [email protected]
>>
>> > > > > On x86 Linux code we now have ioremap_uc() that can't use MTRR
>> > > > > behind the scenes, why would something like this on the BIOS not
>> > > > > be possible? That ultimately uses set_pte_at(). What limitations
>> > > > > are there on the BIOS that prevent us from just using strong UC
>> > > > > for PAT on the BIOS?
>> > > >
>> > > > Because it requires to run in virtual mode with page tables.
>> > >
>> > > Ah... interesting... is UC really needed, what is the default? If the
>> > > default is used would there be an issue ? Can such work be deferred
>> > > to a later time ? It seems like a high burden to require on large
>> > > piece of legacy architecture to just blow a fan.
>> >
>> > The default cache attribute (i.e. ranges not covered by MTRRs) is
>> > specified by the MTRR default type MSR.
>>
>> Do we really need UC for the fan?
>
> When you say "we", are you referring Xen guests? Xen guests do not need to
> control the fan, so they do not need UC set in MTRRs.
>
> In general, yes, MMIO registers need UC when they need to be accessed.
Curious, what does a BIOS do for fan control when MTRRs are disabled?
Also what if a BIOS just set MSR_MTRRdefType to uncachable only ?
Wouldn't that help simplify the BIOS when systems are known as not
wanting to deal with reading MTRRs on the kernel front, even if its
just to read the setup ?
I'm trying to determine exactly why a BIOS cannot simply enable use an
alternative for what it needs for fan control and let the kernel live
without any MTRR code at run time as an option. Although the
documentation says that the same "procedure" is needed for PAT setup,
I see it possible to split the skeleton of the code and have each
peace of code live separately and compartmentalized, they'd just have
respective calls on the skeleton of the procedure.
>> What is the default for PAT?
>
> There is no such thing as the default for PAT.
>
>> Can't
>> the same be used so that we way by default all ranges match what is
>> also the default by PAT? Would that really break fan control ? If we
>> have a match should't we be able to not have to worry about MTRRs at
>> all in-kernel even on bare metal?
>
> We do not need to know about BIOS impl, such as fan control, etc. The
> point is that if BIOS sets MTRRs, then the kernel keeps their setup.
Right, if the kernel no longer uses it directly it seems like an
aweful lot of code to keep updating simply for a BIOS requirement, I'm
trying to see if we can have the option to live without this
requirement.
> If (virtual) BIOS does not enable MTRRs, the kernel keeps them disabled. We
> just need not to mess with the setup.
Sure, thanks! I'm trying to see if we can have a similar option on bare metal.
>> Another option, which I've alluded to on the Xen thread is skipping
>> over the MTRR space from the e820 map. Is that not possible ? This
>> could be last resort... but which I'm hinting more for the Xen side of
>> things if we *really* need get_mtrr() on the Xen guest side of
>> things...
>
> There is no MTRR space in the e820 map since they are MSRs. Since Xen
> guests disable MTRRs, I do not think you have any issue here...
Xen seems to clip the e820 map given to a guest in certain MTRR
conditions, see init_e820(), this calls
machine_specific_memory_setup() which later clips MTRR if
mtrr_top_of_ram(). This is an Intel check that trims the e820 map if
MTRRs were found to be enabled and the default MTRR is not write-back.
If returns the address of the first non write-back variable MTRR, it
uses clip_to_limit() to limit the exposed memory [0], notice how
clip_to_limit() is also used to generally limit exposed memory through
the opt_mem boot parameter as well. Its not exactly clear why that's
done, but this looks very similar to the Linux MTRR cleanup -- see
x86_get_mtrr_mem_range().
[0] http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/x86/e820.c
Luis
On Tue, 2016-03-29 at 15:12 -0700, Luis R. Rodriguez wrote:
> On Tue, Mar 29, 2016 at 2:46 PM, Toshi Kani <[email protected]> wrote:
> > On Tue, 2016-03-29 at 10:14 -0700, Luis R. Rodriguez wrote:
> > > On Fri, Mar 18, 2016 at 2:35 PM, Toshi Kani <[email protected]>
> > > wrote:
:
> > >
> > > Do we really need UC for the fan?
> >
> > When you say "we", are you referring Xen guests? Xen guests do not
> > need to control the fan, so they do not need UC set in MTRRs.
> >
> > In general, yes, MMIO registers need UC when they need to be accessed.
>
> Curious, what does a BIOS do for fan control when MTRRs are disabled?
You mean, when the kernel modified the MTRR setup and disabled them. BIOS
would assume the original setup and still access the registers. This may
lead to undefined behavior and may result in a system crash.
> Also what if a BIOS just set MSR_MTRRdefType to uncachable only ?
Many BIOSes actually set the default type to UC. MTRRs then cover regular
memory with WB.
> Wouldn't that help simplify the BIOS when systems are known as not
> wanting to deal with reading MTRRs on the kernel front, even if its
> just to read the setup ?
Nope.
> I'm trying to determine exactly why a BIOS cannot simply enable use an
> alternative for what it needs for fan control and let the kernel live
> without any MTRR code at run time as an option. Although the
> documentation says that the same "procedure" is needed for PAT setup,
> I see it possible to split the skeleton of the code and have each
> peace of code live separately and compartmentalized, they'd just have
> respective calls on the skeleton of the procedure.
I agree that the MTRR rendezvous handler can be improved for PAT, but I do
not see a compelling reason to make such change now. With my fix, I think
the code works reasonably for Xen.
> > > What is the default for PAT?
> >
> > There is no such thing as the default for PAT.
> >
> > > Can't
> > > the same be used so that we way by default all ranges match what is
> > > also the default by PAT? Would that really break fan control ? If we
> > > have a match should't we be able to not have to worry about MTRRs at
> > > all in-kernel even on bare metal?
> >
> > We do not need to know about BIOS impl, such as fan control, etc. The
> > point is that if BIOS sets MTRRs, then the kernel keeps their setup.
>
> Right, if the kernel no longer uses it directly it seems like an
> aweful lot of code to keep updating simply for a BIOS requirement, I'm
> trying to see if we can have the option to live without this
> requirement.
Please be aware of the hibernation case. I think this procedure involves
setting MTRRs back to the original setup.
> > If (virtual) BIOS does not enable MTRRs, the kernel keeps them
> > disabled. We just need not to mess with the setup.
>
> Sure, thanks! I'm trying to see if we can have a similar option on bare
> metal.
>
> > > Another option, which I've alluded to on the Xen thread is skipping
> > > over the MTRR space from the e820 map. Is that not possible ? This
> > > could be last resort... but which I'm hinting more for the Xen side
> > > of things if we *really* need get_mtrr() on the Xen guest side of
> > > things...
> >
> > There is no MTRR space in the e820 map since they are MSRs. Since Xen
> > guests disable MTRRs, I do not think you have any issue here...
>
> Xen seems to clip the e820 map given to a guest in certain MTRR
> conditions, see init_e820(), this calls
> machine_specific_memory_setup() which later clips MTRR if
> mtrr_top_of_ram(). This is an Intel check that trims the e820 map if
> MTRRs were found to be enabled and the default MTRR is not write-back.
> If returns the address of the first non write-back variable MTRR, it
> uses clip_to_limit() to limit the exposed memory [0], notice how
> clip_to_limit() is also used to generally limit exposed memory through
> the opt_mem boot parameter as well. Its not exactly clear why that's
> done, but this looks very similar to the Linux MTRR cleanup -- see
> x86_get_mtrr_mem_range().
>
> [0] http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/x86/e820.c
It looks to me that the code makes sure all E820_RAM ranges in the e820
table are covered by WB entries of MTRRs. If not, it trims the e820 table.
I suppose it tries to react on a case when someone modified MTRRs and
resulted in mismatch with the e820 table. I'd think you do not need this
code as long as you do not modify the MTRR setup.
Thanks,
-Toshi
On Tue, Mar 29, 2016 at 5:16 PM, Toshi Kani <[email protected]> wrote:
> On Tue, 2016-03-29 at 15:12 -0700, Luis R. Rodriguez wrote:
>> On Tue, Mar 29, 2016 at 2:46 PM, Toshi Kani <[email protected]> wrote:
>> > On Tue, 2016-03-29 at 10:14 -0700, Luis R. Rodriguez wrote:
>> > > On Fri, Mar 18, 2016 at 2:35 PM, Toshi Kani <[email protected]>
>> > > wrote:
> :
>> > >
>> > > Do we really need UC for the fan?
>> >
>> > When you say "we", are you referring Xen guests? Xen guests do not
>> > need to control the fan, so they do not need UC set in MTRRs.
>> >
>> > In general, yes, MMIO registers need UC when they need to be accessed.
>>
>> Curious, what does a BIOS do for fan control when MTRRs are disabled?
>
> You mean, when the kernel modified the MTRR setup and disabled them.
Nope, but the below is good to know!
I meant to ask about the case where the option the lets a user go in a
muck with BIOS settings to disable MTRR e xists and the user disables
MTRR. What would happen for fan control in such situations? I'd
imagine such cases allow for a system to exist with proper fan
control, and allow the kernel to boot without having to deal with the
pesky MTRRs at all, while PAT lives on, no?
> BIOS
> would assume the original setup and still access the registers. This may
> lead to undefined behavior and may result in a system crash.
>
>> Also what if a BIOS just set MSR_MTRRdefType to uncachable only ?
>
> Many BIOSes actually set the default type to UC.
Thanks, I asked as I saw my BIOS uses write-back by default. Good to
know there are different strategies.
> MTRRs then cover regular memory with WB.
When you say regular memory you mean everything else we see as RAM? I
was under the impression we'd only need MTRR for a special range of
memory, and its up to implementation how they are used. If you can use
MTRR to change the cache attribute for regular RAM and if this is
actually a requirement if the default MTRR is UC then one way or
another a BIOS seems to always require MTRR, either for UC setting for
fan control or WB for regular RAM, is that right?
>> Wouldn't that help simplify the BIOS when systems are known as not
>> wanting to deal with reading MTRRs on the kernel front, even if its
>> just to read the setup ?
>
> Nope.
>
>> I'm trying to determine exactly why a BIOS cannot simply enable use an
>> alternative for what it needs for fan control and let the kernel live
>> without any MTRR code at run time as an option. Although the
>> documentation says that the same "procedure" is needed for PAT setup,
>> I see it possible to split the skeleton of the code and have each
>> peace of code live separately and compartmentalized, they'd just have
>> respective calls on the skeleton of the procedure.
>
> I agree that the MTRR rendezvous handler can be improved for PAT, but I do
> not see a compelling reason to make such change now. With my fix, I think
> the code works reasonably for Xen.
Agreed, don't think its needed now, my questions are for future optimizations.
>> > > What is the default for PAT?
>> >
>> > There is no such thing as the default for PAT.
>> >
>> > > Can't
>> > > the same be used so that we way by default all ranges match what is
>> > > also the default by PAT? Would that really break fan control ? If we
>> > > have a match should't we be able to not have to worry about MTRRs at
>> > > all in-kernel even on bare metal?
>> >
>> > We do not need to know about BIOS impl, such as fan control, etc. The
>> > point is that if BIOS sets MTRRs, then the kernel keeps their setup.
>>
>> Right, if the kernel no longer uses it directly it seems like an
>> aweful lot of code to keep updating simply for a BIOS requirement, I'm
>> trying to see if we can have the option to live without this
>> requirement.
>
> Please be aware of the hibernation case. I think this procedure involves
> setting MTRRs back to the original setup.
Eek, right, so best just disable them if we can.
>> > If (virtual) BIOS does not enable MTRRs, the kernel keeps them
>> > disabled. We just need not to mess with the setup.
>>
>> Sure, thanks! I'm trying to see if we can have a similar option on bare
>> metal.
>>
>> > > Another option, which I've alluded to on the Xen thread is skipping
>> > > over the MTRR space from the e820 map. Is that not possible ? This
>> > > could be last resort... but which I'm hinting more for the Xen side
>> > > of things if we *really* need get_mtrr() on the Xen guest side of
>> > > things...
>> >
>> > There is no MTRR space in the e820 map since they are MSRs. Since Xen
>> > guests disable MTRRs, I do not think you have any issue here...
>>
>> Xen seems to clip the e820 map given to a guest in certain MTRR
>> conditions, see init_e820(), this calls
>> machine_specific_memory_setup() which later clips MTRR if
>> mtrr_top_of_ram(). This is an Intel check that trims the e820 map if
>> MTRRs were found to be enabled and the default MTRR is not write-back.
>> If returns the address of the first non write-back variable MTRR, it
>> uses clip_to_limit() to limit the exposed memory [0], notice how
>> clip_to_limit() is also used to generally limit exposed memory through
>> the opt_mem boot parameter as well. Its not exactly clear why that's
>> done, but this looks very similar to the Linux MTRR cleanup -- see
>> x86_get_mtrr_mem_range().
>>
>> [0] http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/x86/e820.c
>
> It looks to me that the code makes sure all E820_RAM ranges in the e820
> table are covered by WB entries of MTRRs. If not, it trims the e820 table.
>
> I suppose it tries to react on a case when someone modified MTRRs and
> resulted in mismatch with the e820 table. I'd think you do not need this
> code as long as you do not modify the MTRR setup.
Great thanks for that -- another optimization possible.
Luis
On Tue, 2016-03-29 at 16:43 -0700, Luis R. Rodriguez wrote:
> On Tue, Mar 29, 2016 at 5:16 PM, Toshi Kani <[email protected]> wrote:
> > On Tue, 2016-03-29 at 15:12 -0700, Luis R. Rodriguez wrote:
> > > On Tue, Mar 29, 2016 at 2:46 PM, Toshi Kani <[email protected]>
> > > wrote:
> > > > On Tue, 2016-03-29 at 10:14 -0700, Luis R. Rodriguez wrote:
> > > > > On Fri, Mar 18, 2016 at 2:35 PM, Toshi Kani <[email protected]>
> > > > > wrote:
> > :
> > > > >
> > > > > Do we really need UC for the fan?
> > > >
> > > > When you say "we", are you referring Xen guests? Xen guests do not
> > > > need to control the fan, so they do not need UC set in MTRRs.
> > > >
> > > > In general, yes, MMIO registers need UC when they need to be
> > > > accessed.
> > >
> > > Curious, what does a BIOS do for fan control when MTRRs are disabled?
> >
> > You mean, when the kernel modified the MTRR setup and disabled them.
>
> Nope, but the below is good to know!
>
> I meant to ask about the case where the option the lets a user go in a
> muck with BIOS settings to disable MTRR e xists and the user disables
> MTRR. What would happen for fan control in such situations? I'd
> imagine such cases allow for a system to exist with proper fan
> control, and allow the kernel to boot without having to deal with the
> pesky MTRRs at all, while PAT lives on, no?
You mean user disables MTRRs from BIOS setup menu? I am not a BIOS guy,
but I do not think it offers such option when the code depends on it...
> > BIOS would assume the original setup and still access the
> > registers. This may lead to undefined behavior and may result in a
> > system crash.
> >
> > > Also what if a BIOS just set MSR_MTRRdefType to uncachable only ?
> >
> > Many BIOSes actually set the default type to UC.
>
> Thanks, I asked as I saw my BIOS uses write-back by default. Good to
> know there are different strategies.
>
> > MTRRs then cover regular memory with WB.
>
> When you say regular memory you mean everything else we see as RAM? I
> was under the impression we'd only need MTRR for a special range of
> memory, and its up to implementation how they are used. If you can use
> MTRR to change the cache attribute for regular RAM and if this is
> actually a requirement if the default MTRR is UC then one way or
> another a BIOS seems to always require MTRR, either for UC setting for
> fan control or WB for regular RAM, is that right?
Right, in one way or the other, MTRRs set WB to RAM and UC to MMIO. PAT is
overwritten by MTRRs, so RAM must be set to WB.
Thanks,
-Toshi
On Tue, Mar 29, 2016 at 6:07 PM, Toshi Kani <[email protected]> wrote:
> On Tue, 2016-03-29 at 16:43 -0700, Luis R. Rodriguez wrote:
>> I meant to ask about the case where the option the lets a user go in a
>> muck with BIOS settings to disable MTRR e xists and the user disables
>> MTRR. What would happen for fan control in such situations? I'd
>> imagine such cases allow for a system to exist with proper fan
>> control, and allow the kernel to boot without having to deal with the
>> pesky MTRRs at all, while PAT lives on, no?
>
> You mean user disables MTRRs from BIOS setup menu?
Yup!
> I am not a BIOS guy,
> but I do not think it offers such option when the code depends on it...
Darn, I'm pretty sure I've seen such option before... can't seem to
find such a toggle now.
>> When you say regular memory you mean everything else we see as RAM? I
>> was under the impression we'd only need MTRR for a special range of
>> memory, and its up to implementation how they are used. If you can use
>> MTRR to change the cache attribute for regular RAM and if this is
>> actually a requirement if the default MTRR is UC then one way or
>> another a BIOS seems to always require MTRR, either for UC setting for
>> fan control or WB for regular RAM, is that right?
>
> Right, in one way or the other, MTRRs set WB to RAM and UC to MMIO. PAT is
> overwritten by MTRRs, so RAM must be set to WB.
I see... thanks....
Luis
On Thu, Mar 17, 2016 at 03:56:47PM -0600, Toshi Kani wrote:
> On Wed, 2016-03-16 at 00:29 +0100, Luis R. Rodriguez wrote:
> > On x86 Linux code we now have ioremap_uc() that can't use MTRR behind the
> > scenes, why would something like this on the BIOS not be possible? That
> > ultimately uses set_pte_at(). What limitations are there on the BIOS
> > that prevent us from just using strong UC for PAT on the BIOS?
>
> Because it requires to run in virtual mode with page tables.
I see now. Specifically, BIOSes run in real mode, and PAT uses
paging. Paging requires bit 31 on CR0 set (PG), and PG has no
effect if the PE flag (Protection Enable) bit 0 on CR0 is clear.
If PE is clear we have real mode, which is what the BIOS uses.
Stupid question then:
are there no use case for a BIOS to enter PE, even if just
limited to set paging attributes for instance. For the simple
sake of burying MTRR this seems worthy, but I wonder if there
are other paging needs a BIOS might find use for.
Luis
On Sat, 2016-04-09 at 04:04 +0200, Luis R. Rodriguez wrote:
> On Thu, Mar 17, 2016 at 03:56:47PM -0600, Toshi Kani wrote:
> >
> > On Wed, 2016-03-16 at 00:29 +0100, Luis R. Rodriguez wrote:
> > >
> > > On x86 Linux code we now have ioremap_uc() that can't use MTRR behind
> > > the scenes, why would something like this on the BIOS not be
> > > possible? That ultimately uses set_pte_at(). What limitations are
> > > there on the BIOS that prevent us from just using strong UC for PAT
> > > on the BIOS?
>>
> > Because it requires to run in virtual mode with page tables.
>
> I see now. Specifically, BIOSes run in real mode, and PAT uses
> paging. Paging requires bit 31 on CR0 set (PG), and PG has no
> effect if the PE flag (Protection Enable) bit 0 on CR0 is clear.
> If PE is clear we have real mode, which is what the BIOS uses.
>
> Stupid question then:
>
> are there no use case for a BIOS to enter PE, even if just
> limited to set paging attributes for instance. For the simple
> sake of burying MTRR this seems worthy, but I wonder if there
> are other paging needs a BIOS might find use for.
The OS calls EFI in virtual mode. BIOS SMI handler, however, is called in
physical mode. Since it has the highest priority, it needs to finish as
soon as possible. So, it typically does not make additional steps to
switch to virtual mode.
Thanks,
-Toshi