BIOS is expected to clear the SYSCFG[MtrrFixDramModEn] on AMD CPUs after
fixed MTRRs are configured.
Some BIOSes do not clear SYSCFG[MtrrFixDramModEn] on BP (and on APs).
This can lead to obfuscation in Linux when this bit is not cleared on
BP but cleared on APs. A consequence of this is that the saved
fixed-MTRR state (from BP) differs from the fixed-MTRRs of APs --
because RdDram/WrDram bits are read as zero when
SYSCFG[MtrrFixDramModEn] is cleared -- and Linux tries to sync
fixed-MTRR state from BP to AP. This implies that Linux sets
SYSCFG[MtrrFixDramEn] and activates those bits.
More important is that (some) systems change these bits in SMM when
ACPI is enabled. Hence it is racy if Linux modifies RdMem/WrMem bits,
too.
I suggest not to touch RdDram/WrDram bits of fixed-MTRRs and
SYSCFG[MtrrFixDramEn] and to clear SYSCFG[MtrrFixDramModEn] as
suggested by AMD K8, and AMD family 10h/11h BKDGs.
BIOS is expected to do this anyway. This should avoid that
Linux and SMM tread on each other's toes ...
Signed-off-by: Andreas Herrmann <[email protected]>
---
arch/x86/kernel/cpu/mtrr/generic.c | 50 +++++++++++++++++++++---------------
1 files changed, 29 insertions(+), 21 deletions(-)
Some more details are in
http://bugzilla.kernel.org/show_bug.cgi?id=11541
Several systems with AMD CPUs (Phenom and Turion) boot
only with either acpi=ht, acpi=off, or CONFIG_MTRR=n.
The fix worked at least for an ASUS M3A-H/HDMI (version 1301) system.
The other systems with that problem booted fine after BIOS updates and
therefore this patch wasn't verified on those systems.
Please apply,
Andreas
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index 0c0a455..9ee4de5 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -41,6 +41,31 @@ static int __init mtrr_debug(char *opt)
}
early_param("mtrr.show", mtrr_debug);
+/**
+ * BIOS is expected to clear MtrrFixDramModEn bit, see for example
+ * "BIOS and Kernel Developer's Guide for the AMD Athlon 64 and AMD
+ * Opteron Processors" (26094 Rev. 3.30 February 2006), section
+ * "13.2.1.2 SYSCFG Register": "The MtrrFixDramModEn bit should be set
+ * to 1 during BIOS initalization of the fixed MTRRs, then cleared to
+ * 0 for operation."
+ */
+static inline void k8_check_syscfg_dram_mod_en(void)
+{
+ u32 lo, hi;
+
+ if (!((boot_cpu_data.x86_vendor == X86_VENDOR_AMD) &&
+ (boot_cpu_data.x86 >= 0x0f)))
+ return;
+
+ rdmsr(MSR_K8_SYSCFG, lo, hi);
+ if (lo & K8_MTRRFIXRANGE_DRAM_MODIFY) {
+ printk(KERN_ERR "MTRR: CPU %u: SYSCFG[MtrrFixDramModEn] "
+ "not cleared, clearing this bit\n", smp_processor_id());
+ lo &= ~K8_MTRRFIXRANGE_DRAM_MODIFY;
+ mtrr_wrmsr(MSR_K8_SYSCFG, lo, hi);
+ }
+}
+
/*
* Returns the effective MTRR type for the region
* Error returns:
@@ -174,6 +199,8 @@ get_fixed_ranges(mtrr_type * frs)
unsigned int *p = (unsigned int *) frs;
int i;
+ k8_check_syscfg_dram_mod_en();
+
rdmsr(MTRRfix64K_00000_MSR, p[0], p[1]);
for (i = 0; i < 2; i++)
@@ -308,27 +335,10 @@ void mtrr_wrmsr(unsigned msr, unsigned a, unsigned b)
}
/**
- * Enable and allow read/write of extended fixed-range MTRR bits on K8 CPUs
- * see AMD publication no. 24593, chapter 3.2.1 for more information
- */
-static inline void k8_enable_fixed_iorrs(void)
-{
- unsigned lo, hi;
-
- rdmsr(MSR_K8_SYSCFG, lo, hi);
- mtrr_wrmsr(MSR_K8_SYSCFG, lo
- | K8_MTRRFIXRANGE_DRAM_ENABLE
- | K8_MTRRFIXRANGE_DRAM_MODIFY, hi);
-}
-
-/**
* set_fixed_range - checks & updates a fixed-range MTRR if it differs from the value it should have
* @msr: MSR address of the MTTR which should be checked and updated
* @changed: pointer which indicates whether the MTRR needed to be changed
* @msrwords: pointer to the MSR values which the MSR should have
- *
- * If K8 extentions are wanted, update the K8 SYSCFG MSR also.
- * See AMD publication no. 24593, chapter 7.8.1, page 233 for more information.
*/
static void set_fixed_range(int msr, bool *changed, unsigned int *msrwords)
{
@@ -337,10 +347,6 @@ static void set_fixed_range(int msr, bool *changed, unsigned int *msrwords)
rdmsr(msr, lo, hi);
if (lo != msrwords[0] || hi != msrwords[1]) {
- if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
- (boot_cpu_data.x86 >= 0x0f && boot_cpu_data.x86 <= 0x11) &&
- ((msrwords[0] | msrwords[1]) & K8_MTRR_RDMEM_WRMEM_MASK))
- k8_enable_fixed_iorrs();
mtrr_wrmsr(msr, msrwords[0], msrwords[1]);
*changed = true;
}
@@ -419,6 +425,8 @@ static int set_fixed_ranges(mtrr_type * frs)
bool changed = false;
int block=-1, range;
+ k8_check_syscfg_dram_mod_en();
+
while (fixed_range_blocks[++block].ranges)
for (range=0; range < fixed_range_blocks[block].ranges; range++)
set_fixed_range(fixed_range_blocks[block].base_msr + range,
--
1.6.1.3
On Wed, Mar 11, 2009 at 8:00 AM, Andreas Herrmann
<[email protected]> wrote:
> BIOS is expected to clear the SYSCFG[MtrrFixDramModEn] on AMD CPUs after
> fixed MTRRs are configured.
>
> Some BIOSes do not clear SYSCFG[MtrrFixDramModEn] on BP (and on APs).
>
> This can lead to obfuscation in Linux when this bit is not cleared on
> BP but cleared on APs. A consequence of this is that the saved
> fixed-MTRR state (from BP) differs from the fixed-MTRRs of APs --
> because RdDram/WrDram bits are read as zero when
> SYSCFG[MtrrFixDramModEn] is cleared -- and Linux tries to sync
> fixed-MTRR state from BP to AP. This implies that Linux sets
> SYSCFG[MtrrFixDramEn] and activates those bits.
>
> More important is that (some) systems change these bits in SMM when
> ACPI is enabled. Hence it is racy if Linux modifies RdMem/WrMem bits,
> too.
>
> I suggest not to touch RdDram/WrDram bits of fixed-MTRRs and
> SYSCFG[MtrrFixDramEn] and to clear SYSCFG[MtrrFixDramModEn] as
> suggested by AMD K8, and AMD family 10h/11h BKDGs.
> BIOS is expected to do this anyway. This should avoid that
> Linux and SMM tread on each other's toes ...
>
wonder if you could add one dmi check to tell the user that bios is
buggy, ask vendor fixed BIOS
and skip fixed mtrr sync.
instead of covering bios problem quietly.
YH
On Thu, Mar 12, 2009 at 01:01:26AM -0700, Yinghai Lu wrote:
> On Wed, Mar 11, 2009 at 8:00 AM, Andreas Herrmann
> <[email protected]> wrote:
> > BIOS is expected to clear the SYSCFG[MtrrFixDramModEn] on AMD CPUs after
> > fixed MTRRs are configured.
> >
> > Some BIOSes do not clear SYSCFG[MtrrFixDramModEn] on BP (and on APs).
> >
> > This can lead to obfuscation in Linux when this bit is not cleared on
> > BP but cleared on APs. A consequence of this is that the saved
> > fixed-MTRR state (from BP) differs from the fixed-MTRRs of APs --
> > because RdDram/WrDram bits are read as zero when
> > SYSCFG[MtrrFixDramModEn] is cleared -- and Linux tries to sync
> > fixed-MTRR state from BP to AP. This implies that Linux sets
> > SYSCFG[MtrrFixDramEn] and activates those bits.
> >
> > More important is that (some) systems change these bits in SMM when
> > ACPI is enabled. Hence it is racy if Linux modifies RdMem/WrMem bits,
> > too.
> >
> > I suggest not to touch RdDram/WrDram bits of fixed-MTRRs and
> > SYSCFG[MtrrFixDramEn] and to clear SYSCFG[MtrrFixDramModEn] as
> > suggested by AMD K8, and AMD family 10h/11h BKDGs.
> > BIOS is expected to do this anyway. This should avoid that
> > Linux and SMM tread on each other's toes ...
> >
>
> wonder if you could add one dmi check to tell the user that bios is
> buggy, ask vendor fixed BIOS
> and skip fixed mtrr sync.
There seem to be several systems affected that do not hide
RdMem/WrMem bits from OS.
(Causing Suspend/resume problem:)
- Acer Ferrari 1000
- Acer Ferrari 5000
(Causing kernel freeze:)
- Asus M51TR-AP003C notebook
- Asus M51Ta notebook
- Asus M3A-H/HDMI mobo
- maybe some more systems
That is why I prefer to have a general solution for this.
(Which is to hide those bits whenever Linux reads or modifies
fixed-MTRR state.)
> instead of covering bios problem quietly.
It's not quietly, an error message will be printed.
Thomas R. suggested to use "KERN_ERR FW_BUG" but I'd prefer to add a
FW_WARN which should be used "for not that clear (e.g. could the
kernel messed up things already?) and medium priority BIOS bugs."
Note that if Linux does not try to sync fixed-MTRRs the affected
systems do not panic. But as a fact Linux is syncing (and needs to)
MTRR values across CPUs due to other buggy BIOSes ...
Regards,
Andreas
--
Operating | Advanced Micro Devices GmbH
System | Karl-Hammerschmidt-Str. 34, 85609 Dornach b. M?nchen, Germany
Research | Gesch?ftsf?hrer: Jochen Polster, Thomas M. McCoy, Giuliano Meroni
Center | Sitz: Dornach, Gemeinde Aschheim, Landkreis M?nchen
(OSRC) | Registergericht M?nchen, HRB Nr. 43632
On Thu, 2009-03-12 at 12:41 +0100, Andreas Herrmann wrote:
> On Thu, Mar 12, 2009 at 01:01:26AM -0700, Yinghai Lu wrote:
> > On Wed, Mar 11, 2009 at 8:00 AM, Andreas Herrmann
> > <[email protected]> wrote:
> > > BIOS is expected to clear the SYSCFG[MtrrFixDramModEn] on AMD CPUs after
> > > fixed MTRRs are configured.
> > >
> > > Some BIOSes do not clear SYSCFG[MtrrFixDramModEn] on BP (and on APs).
> > >
> > > This can lead to obfuscation in Linux when this bit is not cleared on
> > > BP but cleared on APs. A consequence of this is that the saved
> > > fixed-MTRR state (from BP) differs from the fixed-MTRRs of APs --
> > > because RdDram/WrDram bits are read as zero when
> > > SYSCFG[MtrrFixDramModEn] is cleared -- and Linux tries to sync
> > > fixed-MTRR state from BP to AP. This implies that Linux sets
> > > SYSCFG[MtrrFixDramEn] and activates those bits.
> > >
> > > More important is that (some) systems change these bits in SMM when
> > > ACPI is enabled. Hence it is racy if Linux modifies RdMem/WrMem bits,
> > > too.
> > >
> > > I suggest not to touch RdDram/WrDram bits of fixed-MTRRs and
> > > SYSCFG[MtrrFixDramEn] and to clear SYSCFG[MtrrFixDramModEn] as
> > > suggested by AMD K8, and AMD family 10h/11h BKDGs.
> > > BIOS is expected to do this anyway. This should avoid that
> > > Linux and SMM tread on each other's toes ...
> > >
> >
> > wonder if you could add one dmi check to tell the user that bios is
> > buggy, ask vendor fixed BIOS
> > and skip fixed mtrr sync.
>
> There seem to be several systems affected that do not hide
> RdMem/WrMem bits from OS.
>
> (Causing Suspend/resume problem:)
> - Acer Ferrari 1000
> - Acer Ferrari 5000
What problem?
Hang second resume???
I have an acer 5720G that hangs on second resume - bios doesn't pass
control to linux at all. First resume works fine.
Best regards,
Maxim Levitsky
On Thu, Mar 12, 2009 at 02:29:02PM +0200, Maxim Levitsky wrote:
> On Thu, 2009-03-12 at 12:41 +0100, Andreas Herrmann wrote:
> > (Causing Suspend/resume problem:)
> > - Acer Ferrari 1000
> > - Acer Ferrari 5000
> What problem?
> Hang second resume???
See
http://lkml.org/lkml/2007/4/3/110
I don't have access to such a machine. Thus I don't know
what the exact problem was.
But it was the reason that the code to sync RdMem/WrMem bits of
fixed-MTRRs on AMD-CPUs was added. And now this caused new problems on
systems with other buggy BIOSes.
Regards,
Andreas
--
Operating | Advanced Micro Devices GmbH
System | Karl-Hammerschmidt-Str. 34, 85609 Dornach b. M?nchen, Germany
Research | Gesch?ftsf?hrer: Jochen Polster, Thomas M. McCoy, Giuliano Meroni
Center | Sitz: Dornach, Gemeinde Aschheim, Landkreis M?nchen
(OSRC) | Registergericht M?nchen, HRB Nr. 43632
Impact: bug fix + BIOS workaround
BIOS is expected to clear the SYSCFG[MtrrFixDramModEn] on AMD CPUs
after fixed MTRRs are configured.
Some BIOSes do not clear SYSCFG[MtrrFixDramModEn] on BP (and on APs).
This can lead to obfuscation in Linux when this bit is not cleared on
BP but cleared on APs. A consequence of this is that the saved
fixed-MTRR state (from BP) differs from the fixed-MTRRs of APs --
because RdDram/WrDram bits are read as zero when
SYSCFG[MtrrFixDramModEn] is cleared -- and Linux tries to sync
fixed-MTRR state from BP to AP. This implies that Linux sets
SYSCFG[MtrrFixDramEn] and activates those bits.
More important is that (some) systems change these bits in SMM when
ACPI is enabled. Hence it is racy if Linux modifies RdMem/WrMem bits,
too.
I suggest not to touch RdDram/WrDram bits of fixed-MTRRs and
SYSCFG[MtrrFixDramEn] and to clear SYSCFG[MtrrFixDramModEn] as
suggested by AMD K8, and AMD family 10h/11h BKDGs.
BIOS is expected to do this anyway. This should avoid that
Linux and SMM tread on each other's toes ...
Signed-off-by: Andreas Herrmann <[email protected]>
---
arch/x86/kernel/cpu/mtrr/generic.c | 51 +++++++++++++++++++++---------------
1 files changed, 30 insertions(+), 21 deletions(-)
Change to previous version:
I slightly modified the log message (e.g. addition of FW_WARN).
Please consider to apply this patch for .29.
Thanks,
Andreas
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index 0c0a455..6f557e0 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -41,6 +41,32 @@ static int __init mtrr_debug(char *opt)
}
early_param("mtrr.show", mtrr_debug);
+/**
+ * BIOS is expected to clear MtrrFixDramModEn bit, see for example
+ * "BIOS and Kernel Developer's Guide for the AMD Athlon 64 and AMD
+ * Opteron Processors" (26094 Rev. 3.30 February 2006), section
+ * "13.2.1.2 SYSCFG Register": "The MtrrFixDramModEn bit should be set
+ * to 1 during BIOS initalization of the fixed MTRRs, then cleared to
+ * 0 for operation."
+ */
+static inline void k8_check_syscfg_dram_mod_en(void)
+{
+ u32 lo, hi;
+
+ if (!((boot_cpu_data.x86_vendor == X86_VENDOR_AMD) &&
+ (boot_cpu_data.x86 >= 0x0f)))
+ return;
+
+ rdmsr(MSR_K8_SYSCFG, lo, hi);
+ if (lo & K8_MTRRFIXRANGE_DRAM_MODIFY) {
+ printk(KERN_ERR FW_WARN "MTRR: CPU %u: SYSCFG[MtrrFixDramModEn]"
+ " not cleared by BIOS, clearing this bit\n",
+ smp_processor_id());
+ lo &= ~K8_MTRRFIXRANGE_DRAM_MODIFY;
+ mtrr_wrmsr(MSR_K8_SYSCFG, lo, hi);
+ }
+}
+
/*
* Returns the effective MTRR type for the region
* Error returns:
@@ -174,6 +200,8 @@ get_fixed_ranges(mtrr_type * frs)
unsigned int *p = (unsigned int *) frs;
int i;
+ k8_check_syscfg_dram_mod_en();
+
rdmsr(MTRRfix64K_00000_MSR, p[0], p[1]);
for (i = 0; i < 2; i++)
@@ -308,27 +336,10 @@ void mtrr_wrmsr(unsigned msr, unsigned a, unsigned b)
}
/**
- * Enable and allow read/write of extended fixed-range MTRR bits on K8 CPUs
- * see AMD publication no. 24593, chapter 3.2.1 for more information
- */
-static inline void k8_enable_fixed_iorrs(void)
-{
- unsigned lo, hi;
-
- rdmsr(MSR_K8_SYSCFG, lo, hi);
- mtrr_wrmsr(MSR_K8_SYSCFG, lo
- | K8_MTRRFIXRANGE_DRAM_ENABLE
- | K8_MTRRFIXRANGE_DRAM_MODIFY, hi);
-}
-
-/**
* set_fixed_range - checks & updates a fixed-range MTRR if it differs from the value it should have
* @msr: MSR address of the MTTR which should be checked and updated
* @changed: pointer which indicates whether the MTRR needed to be changed
* @msrwords: pointer to the MSR values which the MSR should have
- *
- * If K8 extentions are wanted, update the K8 SYSCFG MSR also.
- * See AMD publication no. 24593, chapter 7.8.1, page 233 for more information.
*/
static void set_fixed_range(int msr, bool *changed, unsigned int *msrwords)
{
@@ -337,10 +348,6 @@ static void set_fixed_range(int msr, bool *changed, unsigned int *msrwords)
rdmsr(msr, lo, hi);
if (lo != msrwords[0] || hi != msrwords[1]) {
- if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
- (boot_cpu_data.x86 >= 0x0f && boot_cpu_data.x86 <= 0x11) &&
- ((msrwords[0] | msrwords[1]) & K8_MTRR_RDMEM_WRMEM_MASK))
- k8_enable_fixed_iorrs();
mtrr_wrmsr(msr, msrwords[0], msrwords[1]);
*changed = true;
}
@@ -419,6 +426,8 @@ static int set_fixed_ranges(mtrr_type * frs)
bool changed = false;
int block=-1, range;
+ k8_check_syscfg_dram_mod_en();
+
while (fixed_range_blocks[++block].ranges)
for (range=0; range < fixed_range_blocks[block].ranges; range++)
set_fixed_range(fixed_range_blocks[block].base_msr + range,
--
1.6.2
On Thu, Mar 12, 2009 at 4:41 AM, Andreas Herrmann
<[email protected]> wrote:
> On Thu, Mar 12, 2009 at 01:01:26AM -0700, Yinghai Lu wrote:
>> wonder if you could add one dmi check to tell the user that bios is
>> buggy, ask vendor fixed BIOS
>> and skip fixed mtrr sync.
>
> There seem to be several systems affected that do not hide
> RdMem/WrMem bits from OS.
>
> ?(Causing Suspend/resume problem:)
> - Acer Ferrari 1000
> - Acer Ferrari 5000
> ?(Causing kernel freeze:)
> - Asus M51TR-AP003C notebook
> - Asus M51Ta notebook
> - Asus M3A-H/HDMI mobo
they didn't run bios testsuite or bios testsuite miss that checking?
YH
* Andreas Herrmann <[email protected]> wrote:
> Impact: bug fix + BIOS workaround
> Change to previous version:
> I slightly modified the log message (e.g. addition of FW_WARN).
>
> Please consider to apply this patch for .29.
i've applied it to tip:x86/mtrr, thanks Andreas.
I've add a -stable backport tag - so if it's problem-free it
should show up in .29.1.
It is not completely clear what the impact of this fix is. What
types of problems are such incoherent MTRR settings causing in
practice? Boot hang? S2RAM failures? Performance problems?
Without knowing the exact impact we cannot apply it this late in
the .29.0 cycle - and MTRR code change are dangerous in any case
so even if we knew the exact scope and impact we'd probably not
do it in .29.
Ingo
Commit-ID: 850055f2e850ef3c1d133e024bc98fa97ff30c48
Gitweb: http://git.kernel.org/tip/850055f2e850ef3c1d133e024bc98fa97ff30c48
Author: Andreas Herrmann <[email protected]>
AuthorDate: Thu, 12 Mar 2009 17:39:37 +0100
Commit: Ingo Molnar <[email protected]>
CommitDate: Fri, 13 Mar 2009 02:55:27 +0100
x86: mtrr: don't modify RdDram/WrDram bits of fixed MTRRs
Impact: bug fix + BIOS workaround
BIOS is expected to clear the SYSCFG[MtrrFixDramModEn] on AMD CPUs
after fixed MTRRs are configured.
Some BIOSes do not clear SYSCFG[MtrrFixDramModEn] on BP (and on APs).
This can lead to obfuscation in Linux when this bit is not cleared on
BP but cleared on APs. A consequence of this is that the saved
fixed-MTRR state (from BP) differs from the fixed-MTRRs of APs --
because RdDram/WrDram bits are read as zero when
SYSCFG[MtrrFixDramModEn] is cleared -- and Linux tries to sync
fixed-MTRR state from BP to AP. This implies that Linux sets
SYSCFG[MtrrFixDramEn] and activates those bits.
More important is that (some) systems change these bits in SMM when
ACPI is enabled. Hence it is racy if Linux modifies RdMem/WrMem bits,
too.
I suggest not to touch RdDram/WrDram bits of fixed-MTRRs and
SYSCFG[MtrrFixDramEn] and to clear SYSCFG[MtrrFixDramModEn] as
suggested by AMD K8, and AMD family 10h/11h BKDGs.
BIOS is expected to do this anyway. This should avoid that
Linux and SMM tread on each other's toes ...
Signed-off-by: Andreas Herrmann <[email protected]>
Cc: [email protected]
Cc: Yinghai Lu <[email protected]>
LKML-Reference: <[email protected]>
Cc: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/cpu/mtrr/generic.c | 51 +++++++++++++++++++++---------------
1 files changed, 30 insertions(+), 21 deletions(-)
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index 9644035..950c434 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -33,6 +33,32 @@ u64 mtrr_tom2;
struct mtrr_state_type mtrr_state = {};
EXPORT_SYMBOL_GPL(mtrr_state);
+/**
+ * BIOS is expected to clear MtrrFixDramModEn bit, see for example
+ * "BIOS and Kernel Developer's Guide for the AMD Athlon 64 and AMD
+ * Opteron Processors" (26094 Rev. 3.30 February 2006), section
+ * "13.2.1.2 SYSCFG Register": "The MtrrFixDramModEn bit should be set
+ * to 1 during BIOS initalization of the fixed MTRRs, then cleared to
+ * 0 for operation."
+ */
+static inline void k8_check_syscfg_dram_mod_en(void)
+{
+ u32 lo, hi;
+
+ if (!((boot_cpu_data.x86_vendor == X86_VENDOR_AMD) &&
+ (boot_cpu_data.x86 >= 0x0f)))
+ return;
+
+ rdmsr(MSR_K8_SYSCFG, lo, hi);
+ if (lo & K8_MTRRFIXRANGE_DRAM_MODIFY) {
+ printk(KERN_ERR FW_WARN "MTRR: CPU %u: SYSCFG[MtrrFixDramModEn]"
+ " not cleared by BIOS, clearing this bit\n",
+ smp_processor_id());
+ lo &= ~K8_MTRRFIXRANGE_DRAM_MODIFY;
+ mtrr_wrmsr(MSR_K8_SYSCFG, lo, hi);
+ }
+}
+
/*
* Returns the effective MTRR type for the region
* Error returns:
@@ -166,6 +192,8 @@ get_fixed_ranges(mtrr_type * frs)
unsigned int *p = (unsigned int *) frs;
int i;
+ k8_check_syscfg_dram_mod_en();
+
rdmsr(MTRRfix64K_00000_MSR, p[0], p[1]);
for (i = 0; i < 2; i++)
@@ -306,27 +334,10 @@ void mtrr_wrmsr(unsigned msr, unsigned a, unsigned b)
}
/**
- * Enable and allow read/write of extended fixed-range MTRR bits on K8 CPUs
- * see AMD publication no. 24593, chapter 3.2.1 for more information
- */
-static inline void k8_enable_fixed_iorrs(void)
-{
- unsigned lo, hi;
-
- rdmsr(MSR_K8_SYSCFG, lo, hi);
- mtrr_wrmsr(MSR_K8_SYSCFG, lo
- | K8_MTRRFIXRANGE_DRAM_ENABLE
- | K8_MTRRFIXRANGE_DRAM_MODIFY, hi);
-}
-
-/**
* set_fixed_range - checks & updates a fixed-range MTRR if it differs from the value it should have
* @msr: MSR address of the MTTR which should be checked and updated
* @changed: pointer which indicates whether the MTRR needed to be changed
* @msrwords: pointer to the MSR values which the MSR should have
- *
- * If K8 extentions are wanted, update the K8 SYSCFG MSR also.
- * See AMD publication no. 24593, chapter 7.8.1, page 233 for more information.
*/
static void set_fixed_range(int msr, bool *changed, unsigned int *msrwords)
{
@@ -335,10 +346,6 @@ static void set_fixed_range(int msr, bool *changed, unsigned int *msrwords)
rdmsr(msr, lo, hi);
if (lo != msrwords[0] || hi != msrwords[1]) {
- if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
- (boot_cpu_data.x86 >= 0x0f && boot_cpu_data.x86 <= 0x11) &&
- ((msrwords[0] | msrwords[1]) & K8_MTRR_RDMEM_WRMEM_MASK))
- k8_enable_fixed_iorrs();
mtrr_wrmsr(msr, msrwords[0], msrwords[1]);
*changed = true;
}
@@ -426,6 +433,8 @@ static int set_fixed_ranges(mtrr_type * frs)
bool changed = false;
int block=-1, range;
+ k8_check_syscfg_dram_mod_en();
+
while (fixed_range_blocks[++block].ranges)
for (range=0; range < fixed_range_blocks[block].ranges; range++)
set_fixed_range(fixed_range_blocks[block].base_msr + range,
On Friday 13 March 2009 02:58:56 Ingo Molnar wrote:
>
> * Andreas Herrmann <[email protected]> wrote:
>
> > Impact: bug fix + BIOS workaround
>
> > Change to previous version:
> > I slightly modified the log message (e.g. addition of FW_WARN).
> >
> > Please consider to apply this patch for .29.
>
> i've applied it to tip:x86/mtrr, thanks Andreas.
>
> I've add a -stable backport tag - so if it's problem-free it
> should show up in .29.1.
What does -stable backport tag mean?
Is this something tip:x86 or Ingo Molnar specific?
I saw quite some "easy" fixes not being added/submitted to stable
in other subsystems and doing double work going through them,
pick them out and submit them to stable is an unnecessary waste
of time and some fixes will slip through.
Is there a general approach all maintainers should handle this?
If not, maybe you could suggest your way of handling this to others?
Thanks,
Thomas
On Fri, Mar 13, 2009 at 02:58:56AM +0100, Ingo Molnar wrote:
>
> * Andreas Herrmann <[email protected]> wrote:
>
> > Impact: bug fix + BIOS workaround
>
> > Change to previous version:
> > I slightly modified the log message (e.g. addition of FW_WARN).
> >
> > Please consider to apply this patch for .29.
>
> i've applied it to tip:x86/mtrr, thanks Andreas.
>
> I've add a -stable backport tag - so if it's problem-free it
> should show up in .29.1.
That should suffice.
> It is not completely clear what the impact of this fix is. What
> types of problems are such incoherent MTRR settings causing in
> practice?
I admit the commit message is not that explanatory ...
(1) The patch modifies an old fix from Bernhard Kaindl to get
suspend/resume working on some Acer Laptops. Bernhard's patch
tried to sync RdMem/WrMem bits of fixed MTRR registers and that
helped on those old Laptops. (Don't ask me why -- can't test it
myself). But this old problem was not the motivation for the
patch. (See http://lkml.org/lkml/2007/4/3/110)
(2) The more important effect is to fix issues on some more current systems.
On those systems Linux panics or just freezes, see
http://bugzilla.kernel.org/show_bug.cgi?id=11541
(and also duplicates of this bug:
http://bugzilla.kernel.org/show_bug.cgi?id=11737
http://bugzilla.kernel.org/show_bug.cgi?id=11714)
The affected systems boot only using acpi=ht, acpi=off or
when the kernel is built with CONFIG_MTRR=n.
The acpi options prevent full enablement of ACPI. Obviously when
ACPI is enabled the BIOS/SMM modfies RdMem/WrMem bits. When
CONFIG_MTRR=y Linux also accesses and modifies those bits when it
needs to sync fixed-MTRRs across cores (Bernhard's fix, see (1)).
How do you synchronize that? You can't. As a consequence Linux
shouldn't touch those bits at all (Rationale are AMD's BKDGs which
recommend to clear the bit that makes RdMem/WrMem accessible).
This is the purpose of this patch. And (so far) this suffices to
fix (1) and (2).
> Boot hang? S2RAM failures? Performance problems?
for (1) S2RAM and S2DISK failures.
for (2) boot hang
> Without knowing the exact impact we cannot apply it this late in
> the .29.0 cycle - and MTRR code change are dangerous in any case
> so even if we knew the exact scope and impact we'd probably not
> do it in .29.
Fine with me (although I think that it's safest not to touch the two
bits at all from the OS as we don't know what the BIOS wants to do
with them).
Regards,
Andreas
--
Operating | Advanced Micro Devices GmbH
System | Karl-Hammerschmidt-Str. 34, 85609 Dornach b. M?nchen, Germany
Research | Gesch?ftsf?hrer: Jochen Polster, Thomas M. McCoy, Giuliano Meroni
Center | Sitz: Dornach, Gemeinde Aschheim, Landkreis M?nchen
(OSRC) | Registergericht M?nchen, HRB Nr. 43632
On Thu, Mar 12, 2009 at 11:24:12AM -0700, Yinghai Lu wrote:
> On Thu, Mar 12, 2009 at 4:41 AM, Andreas Herrmann
> <[email protected]> wrote:
> > On Thu, Mar 12, 2009 at 01:01:26AM -0700, Yinghai Lu wrote:
> >> wonder if you could add one dmi check to tell the user that bios is
> >> buggy, ask vendor fixed BIOS
> >> and skip fixed mtrr sync.
> >
> > There seem to be several systems affected that do not hide
> > RdMem/WrMem bits from OS.
> >
> > ?(Causing Suspend/resume problem:)
> > - Acer Ferrari 1000
> > - Acer Ferrari 5000
> > ?(Causing kernel freeze:)
> > - Asus M51TR-AP003C notebook
> > - Asus M51Ta notebook
> > - Asus M3A-H/HDMI mobo
>
> they didn't run bios testsuite or bios testsuite miss that checking?
Don't know. I'll try to figure that out.
Andreas
--
Operating | Advanced Micro Devices GmbH
System | Karl-Hammerschmidt-Str. 34, 85609 Dornach b. M?nchen, Germany
Research | Gesch?ftsf?hrer: Jochen Polster, Thomas M. McCoy, Giuliano Meroni
Center | Sitz: Dornach, Gemeinde Aschheim, Landkreis M?nchen
(OSRC) | Registergericht M?nchen, HRB Nr. 43632
* Thomas Renninger <[email protected]> wrote:
> On Friday 13 March 2009 02:58:56 Ingo Molnar wrote:
> >
> > * Andreas Herrmann <[email protected]> wrote:
> >
> > > Impact: bug fix + BIOS workaround
> >
> > > Change to previous version:
> > > I slightly modified the log message (e.g. addition of FW_WARN).
> > >
> > > Please consider to apply this patch for .29.
> >
> > i've applied it to tip:x86/mtrr, thanks Andreas.
> >
> > I've add a -stable backport tag - so if it's problem-free it
> > should show up in .29.1.
>
> What does -stable backport tag mean?
It means such lines added to commit logs:
Cc: <[email protected]>
Greg has scripting in place to pick up such commits
automatically into -stable.
AFAIK Andrew Morton was the first maintainer who started using
these tags consistently, at around 2006.
> Is this something tip:x86 or Ingo Molnar specific?
We do it consistently for all fixes we identify as -stable
backport candidates that go via -tip but the practice existed
upstream long before we started doing it.
Even with such tags it's still a lot of non-trivial work to keep
-stable going:
- there are those fixes which are identified as fixes only
after they have been committed (without tags)
- there are those fixes that have not been tagged at all
- often patches will not apply or break if applied out of
sequence.
But adding <[email protected]> certainly helps.
> I saw quite some "easy" fixes not being added/submitted to
> stable in other subsystems and doing double work going through
> them, pick them out and submit them to stable is an
> unnecessary waste of time and some fixes will slip through.
Pointing out specific examples where the backport was realized
when the fix was committed but the tag was not added or outright
lost, and talking to those those maintainers might help. Often
it's a matter of "Oh, cool, did not know that!" realization.
Interestingly, the second-ever such tag i found in Git history
was for a fix ... from you:
commit 59d399d357a7705568f424c6e861ee8657f7f655
Author: Thomas Renninger <[email protected]>
Date: Tue Nov 8 05:27:00 2005 -0500
[ACPI] Fix Null pointer deref in video/lcd/brightness
http://bugzilla.kernel.org/show_bug.cgi?id=5571
Signed-off-by: Thomas Renninger <[email protected]>
Signed-off-by: Nishanth Aravamudan <[email protected]>
Signed-off-by: Yu Luming <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Len Brown <[email protected]>
:-)
Ingo
* Andreas Herrmann <[email protected]> wrote:
> (1) The patch modifies an old fix from Bernhard Kaindl to get
> suspend/resume working on some Acer Laptops. Bernhard's patch
> tried to sync RdMem/WrMem bits of fixed MTRR registers and that
> helped on those old Laptops. (Don't ask me why -- can't test it
> myself). But this old problem was not the motivation for the
> patch. (See http://lkml.org/lkml/2007/4/3/110)
>
> (2) The more important effect is to fix issues on some more current systems.
>
> On those systems Linux panics or just freezes, see
>
> http://bugzilla.kernel.org/show_bug.cgi?id=11541
> (and also duplicates of this bug:
> http://bugzilla.kernel.org/show_bug.cgi?id=11737
> http://bugzilla.kernel.org/show_bug.cgi?id=11714)
>
> The affected systems boot only using acpi=ht, acpi=off or
> when the kernel is built with CONFIG_MTRR=n.
>
> The acpi options prevent full enablement of ACPI. Obviously when
> ACPI is enabled the BIOS/SMM modfies RdMem/WrMem bits. When
> CONFIG_MTRR=y Linux also accesses and modifies those bits when it
> needs to sync fixed-MTRRs across cores (Bernhard's fix, see (1)).
> How do you synchronize that? You can't. As a consequence Linux
> shouldn't touch those bits at all (Rationale are AMD's BKDGs which
> recommend to clear the bit that makes RdMem/WrMem accessible).
> This is the purpose of this patch. And (so far) this suffices to
> fix (1) and (2).
thanks - that's excellent info. I've amended the commit log with
this.
It's still .29.1 material due to the general riskiness of MTRR
changes - but the merge window will open in 1-2 weeks so it's
not a 3 months delay.
Ingo
Commit-ID: 3ff42da5048649503e343a32be37b14a6a4e8aaf
Gitweb: http://git.kernel.org/tip/3ff42da5048649503e343a32be37b14a6a4e8aaf
Author: Andreas Herrmann <[email protected]>
AuthorDate: Thu, 12 Mar 2009 17:39:37 +0100
Commit: Ingo Molnar <[email protected]>
CommitDate: Fri, 13 Mar 2009 10:19:27 +0100
x86: mtrr: don't modify RdDram/WrDram bits of fixed MTRRs
Impact: bug fix + BIOS workaround
BIOS is expected to clear the SYSCFG[MtrrFixDramModEn] on AMD CPUs
after fixed MTRRs are configured.
Some BIOSes do not clear SYSCFG[MtrrFixDramModEn] on BP (and on APs).
This can lead to obfuscation in Linux when this bit is not cleared on
BP but cleared on APs. A consequence of this is that the saved
fixed-MTRR state (from BP) differs from the fixed-MTRRs of APs --
because RdDram/WrDram bits are read as zero when
SYSCFG[MtrrFixDramModEn] is cleared -- and Linux tries to sync
fixed-MTRR state from BP to AP. This implies that Linux sets
SYSCFG[MtrrFixDramEn] and activates those bits.
More important is that (some) systems change these bits in SMM when
ACPI is enabled. Hence it is racy if Linux modifies RdMem/WrMem bits,
too.
(1) The patch modifies an old fix from Bernhard Kaindl to get
suspend/resume working on some Acer Laptops. Bernhard's patch
tried to sync RdMem/WrMem bits of fixed MTRR registers and that
helped on those old Laptops. (Don't ask me why -- can't test it
myself). But this old problem was not the motivation for the
patch. (See http://lkml.org/lkml/2007/4/3/110)
(2) The more important effect is to fix issues on some more current systems.
On those systems Linux panics or just freezes, see
http://bugzilla.kernel.org/show_bug.cgi?id=11541
(and also duplicates of this bug:
http://bugzilla.kernel.org/show_bug.cgi?id=11737
http://bugzilla.kernel.org/show_bug.cgi?id=11714)
The affected systems boot only using acpi=ht, acpi=off or
when the kernel is built with CONFIG_MTRR=n.
The acpi options prevent full enablement of ACPI. Obviously when
ACPI is enabled the BIOS/SMM modfies RdMem/WrMem bits. When
CONFIG_MTRR=y Linux also accesses and modifies those bits when it
needs to sync fixed-MTRRs across cores (Bernhard's fix, see (1)).
How do you synchronize that? You can't. As a consequence Linux
shouldn't touch those bits at all (Rationale are AMD's BKDGs which
recommend to clear the bit that makes RdMem/WrMem accessible).
This is the purpose of this patch. And (so far) this suffices to
fix (1) and (2).
I suggest not to touch RdDram/WrDram bits of fixed-MTRRs and
SYSCFG[MtrrFixDramEn] and to clear SYSCFG[MtrrFixDramModEn] as
suggested by AMD K8, and AMD family 10h/11h BKDGs.
BIOS is expected to do this anyway. This should avoid that
Linux and SMM tread on each other's toes ...
Signed-off-by: Andreas Herrmann <[email protected]>
Cc: [email protected]
Cc: Yinghai Lu <[email protected]>
LKML-Reference: <[email protected]>
Cc: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/cpu/mtrr/generic.c | 51 +++++++++++++++++++++---------------
1 files changed, 30 insertions(+), 21 deletions(-)
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index 9644035..950c434 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -33,6 +33,32 @@ u64 mtrr_tom2;
struct mtrr_state_type mtrr_state = {};
EXPORT_SYMBOL_GPL(mtrr_state);
+/**
+ * BIOS is expected to clear MtrrFixDramModEn bit, see for example
+ * "BIOS and Kernel Developer's Guide for the AMD Athlon 64 and AMD
+ * Opteron Processors" (26094 Rev. 3.30 February 2006), section
+ * "13.2.1.2 SYSCFG Register": "The MtrrFixDramModEn bit should be set
+ * to 1 during BIOS initalization of the fixed MTRRs, then cleared to
+ * 0 for operation."
+ */
+static inline void k8_check_syscfg_dram_mod_en(void)
+{
+ u32 lo, hi;
+
+ if (!((boot_cpu_data.x86_vendor == X86_VENDOR_AMD) &&
+ (boot_cpu_data.x86 >= 0x0f)))
+ return;
+
+ rdmsr(MSR_K8_SYSCFG, lo, hi);
+ if (lo & K8_MTRRFIXRANGE_DRAM_MODIFY) {
+ printk(KERN_ERR FW_WARN "MTRR: CPU %u: SYSCFG[MtrrFixDramModEn]"
+ " not cleared by BIOS, clearing this bit\n",
+ smp_processor_id());
+ lo &= ~K8_MTRRFIXRANGE_DRAM_MODIFY;
+ mtrr_wrmsr(MSR_K8_SYSCFG, lo, hi);
+ }
+}
+
/*
* Returns the effective MTRR type for the region
* Error returns:
@@ -166,6 +192,8 @@ get_fixed_ranges(mtrr_type * frs)
unsigned int *p = (unsigned int *) frs;
int i;
+ k8_check_syscfg_dram_mod_en();
+
rdmsr(MTRRfix64K_00000_MSR, p[0], p[1]);
for (i = 0; i < 2; i++)
@@ -306,27 +334,10 @@ void mtrr_wrmsr(unsigned msr, unsigned a, unsigned b)
}
/**
- * Enable and allow read/write of extended fixed-range MTRR bits on K8 CPUs
- * see AMD publication no. 24593, chapter 3.2.1 for more information
- */
-static inline void k8_enable_fixed_iorrs(void)
-{
- unsigned lo, hi;
-
- rdmsr(MSR_K8_SYSCFG, lo, hi);
- mtrr_wrmsr(MSR_K8_SYSCFG, lo
- | K8_MTRRFIXRANGE_DRAM_ENABLE
- | K8_MTRRFIXRANGE_DRAM_MODIFY, hi);
-}
-
-/**
* set_fixed_range - checks & updates a fixed-range MTRR if it differs from the value it should have
* @msr: MSR address of the MTTR which should be checked and updated
* @changed: pointer which indicates whether the MTRR needed to be changed
* @msrwords: pointer to the MSR values which the MSR should have
- *
- * If K8 extentions are wanted, update the K8 SYSCFG MSR also.
- * See AMD publication no. 24593, chapter 7.8.1, page 233 for more information.
*/
static void set_fixed_range(int msr, bool *changed, unsigned int *msrwords)
{
@@ -335,10 +346,6 @@ static void set_fixed_range(int msr, bool *changed, unsigned int *msrwords)
rdmsr(msr, lo, hi);
if (lo != msrwords[0] || hi != msrwords[1]) {
- if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
- (boot_cpu_data.x86 >= 0x0f && boot_cpu_data.x86 <= 0x11) &&
- ((msrwords[0] | msrwords[1]) & K8_MTRR_RDMEM_WRMEM_MASK))
- k8_enable_fixed_iorrs();
mtrr_wrmsr(msr, msrwords[0], msrwords[1]);
*changed = true;
}
@@ -426,6 +433,8 @@ static int set_fixed_ranges(mtrr_type * frs)
bool changed = false;
int block=-1, range;
+ k8_check_syscfg_dram_mod_en();
+
while (fixed_range_blocks[++block].ranges)
for (range=0; range < fixed_range_blocks[block].ranges; range++)
set_fixed_range(fixed_range_blocks[block].base_msr + range,
On Friday 13 March 2009 10:18:12 Ingo Molnar wrote:
>
> * Thomas Renninger <[email protected]> wrote:
>
> > What does -stable backport tag mean?
>
> It means such lines added to commit logs:
>
> Cc: <[email protected]>
Thanks. That was the bit I liked to know.
>
> Pointing out specific examples where the backport was realized
> when the fix was committed but the tag was not added or outright
> lost, and talking to those those maintainers might help. Often
> it's a matter of "Oh, cool, did not know that!" realization.
Yep. Same for me.
>
> Interestingly, the second-ever such tag i found in Git history
> was for a fix ... from you:
I did this more intuitively.
If this is how it should be done, this info should be spread
to maintainers and repeated some times until really everybody
is looking at it. This is not much work. The backporting itself
might be, but this could be done by the author or distributions looking out
for that tag. At least much less important "easy" fixes shouldn't slip
through or take a long time until someone realizes that they are missing.
Thanks,
Thomas
On Fri, Mar 13, 2009 at 11:08:07AM +0100, Thomas Renninger wrote:
> > Interestingly, the second-ever such tag i found in Git history
> > was for a fix ... from you:
> I did this more intuitively.
> If this is how it should be done, this info should be spread
> to maintainers and repeated some times until really everybody
> is looking at it.
I've been saying it for quite a while, as well as Andrew, and it's
documented in the Documentation/stable_kernel_rules.txt file. If you
know of any other way to document this, please let me know.
thanks,
greg k-h