2013-07-18 22:42:38

by H. Peter Anvin

Subject: [GIT PULL] x86 fixes for 3.11-rc2

Hi Linus,

Trying again to get the fixes queue merged, including the fixed IDT
alignment patch.

The UEFI patch is by far the biggest issue at hand: it is currently
causing quite a few machines to fail to boot. Which is sad, because the
only reason they fail is that their BIOSes touch memory that has
already been freed. The other major issue is that we finally have
tracked down the root cause of a significant number of machines
failing to suspend/resume.

Hopefully you won't have to call me a "perkeleen vittupää" this
time.

The following changes since commit 6d128e1e72bf082542e85f72e6b7ddd704193588:

Revert "Makefile: Fix install error with make -j option" (2013-07-10 19:02:51 -0700)

are available in the git repository at:

git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86-urgent-for-linus

for you to fetch changes up to 4df05f361937ee86e5a8c9ead8aeb6a19ea9b7d7:

x86: Make sure IDT is page aligned (2013-07-16 15:14:48 -0700)

----------------------------------------------------------------
H. Peter Anvin (1):
x86, suspend: Handle CPUs which fail to #GP on RDMSR

Kees Cook (1):
x86: Make sure IDT is page aligned

Matt Fleming (2):
efivars: check for EFI_RUNTIME_SERVICES
Revert "UEFI: Don't pass boot services regions to SetVirtualAddressMap()"

Xiong Zhou (1):
x86/platform/ce4100: Add header file for reboot type

arch/x86/kernel/acpi/sleep.c | 18 ++++++++++++++++--
arch/x86/kernel/head_64.S | 15 ---------------
arch/x86/kernel/tracepoint.c | 6 ++----
arch/x86/kernel/traps.c | 12 ++++++------
arch/x86/platform/ce4100/ce4100.c | 1 +
arch/x86/platform/efi/efi.c | 7 -------
drivers/firmware/efi/efivars.c | 3 +++
7 files changed, 28 insertions(+), 34 deletions(-)

diff --git a/arch/x86/kernel/acpi/sleep.c b/arch/x86/kernel/acpi/sleep.c
index 2a34aaf..3312010 100644
--- a/arch/x86/kernel/acpi/sleep.c
+++ b/arch/x86/kernel/acpi/sleep.c
@@ -48,9 +48,20 @@ int x86_acpi_suspend_lowlevel(void)
#ifndef CONFIG_64BIT
native_store_gdt((struct desc_ptr *)&header->pmode_gdt);

+ /*
+ * We have to check that we can write back the value, and not
+ * just read it. At least on 90 nm Pentium M (Family 6, Model
+ * 13), reading an invalid MSR is not guaranteed to trap, see
+ * Erratum X4 in "Intel Pentium M Processor on 90 nm Process
+ * with 2-MB L2 Cache and Intel® Processor A100 and A110 on 90
+ * nm process with 512-KB L2 Cache Specification Update".
+ */
if (!rdmsr_safe(MSR_EFER,
&header->pmode_efer_low,
- &header->pmode_efer_high))
+ &header->pmode_efer_high) &&
+ !wrmsr_safe(MSR_EFER,
+ header->pmode_efer_low,
+ header->pmode_efer_high))
header->pmode_behavior |= (1 << WAKEUP_BEHAVIOR_RESTORE_EFER);
#endif /* !CONFIG_64BIT */

@@ -61,7 +72,10 @@ int x86_acpi_suspend_lowlevel(void)
}
if (!rdmsr_safe(MSR_IA32_MISC_ENABLE,
&header->pmode_misc_en_low,
- &header->pmode_misc_en_high))
+ &header->pmode_misc_en_high) &&
+ !wrmsr_safe(MSR_IA32_MISC_ENABLE,
+ header->pmode_misc_en_low,
+ header->pmode_misc_en_high))
header->pmode_behavior |=
(1 << WAKEUP_BEHAVIOR_RESTORE_MISC_ENABLE);
header->realmode_flags = acpi_realmode_flags;
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 5e4d8a8..e1aabdb 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -512,21 +512,6 @@ ENTRY(phys_base)

#include "../../x86/xen/xen-head.S"

- .section .bss, "aw", @nobits
- .align L1_CACHE_BYTES
-ENTRY(idt_table)
- .skip IDT_ENTRIES * 16
-
- .align L1_CACHE_BYTES
-ENTRY(debug_idt_table)
- .skip IDT_ENTRIES * 16
-
-#ifdef CONFIG_TRACING
- .align L1_CACHE_BYTES
-ENTRY(trace_idt_table)
- .skip IDT_ENTRIES * 16
-#endif
-
__PAGE_ALIGNED_BSS
NEXT_PAGE(empty_zero_page)
.skip PAGE_SIZE
diff --git a/arch/x86/kernel/tracepoint.c b/arch/x86/kernel/tracepoint.c
index 4e584a8..1c113db 100644
--- a/arch/x86/kernel/tracepoint.c
+++ b/arch/x86/kernel/tracepoint.c
@@ -12,10 +12,8 @@ atomic_t trace_idt_ctr = ATOMIC_INIT(0);
struct desc_ptr trace_idt_descr = { NR_VECTORS * 16 - 1,
(unsigned long) trace_idt_table };

-#ifndef CONFIG_X86_64
-gate_desc trace_idt_table[NR_VECTORS] __page_aligned_data
- = { { { { 0, 0 } } }, };
-#endif
+/* No need to be aligned, but done to keep all IDTs defined the same way. */
+gate_desc trace_idt_table[NR_VECTORS] __page_aligned_bss;

static int trace_irq_vector_refcount;
static DEFINE_MUTEX(irq_vector_mutex);
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index b0865e8..1b23a1c 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -63,19 +63,19 @@
#include <asm/x86_init.h>
#include <asm/pgalloc.h>
#include <asm/proto.h>
+
+/* No need to be aligned, but done to keep all IDTs defined the same way. */
+gate_desc debug_idt_table[NR_VECTORS] __page_aligned_bss;
#else
#include <asm/processor-flags.h>
#include <asm/setup.h>

asmlinkage int system_call(void);
-
-/*
- * The IDT has to be page-aligned to simplify the Pentium
- * F0 0F bug workaround.
- */
-gate_desc idt_table[NR_VECTORS] __page_aligned_data = { { { { 0, 0 } } }, };
#endif

+/* Must be page-aligned because the real IDT is used in a fixmap. */
+gate_desc idt_table[NR_VECTORS] __page_aligned_bss;
+
DECLARE_BITMAP(used_vectors, NR_VECTORS);
EXPORT_SYMBOL_GPL(used_vectors);

diff --git a/arch/x86/platform/ce4100/ce4100.c b/arch/x86/platform/ce4100/ce4100.c
index f8ab494..9962015 100644
--- a/arch/x86/platform/ce4100/ce4100.c
+++ b/arch/x86/platform/ce4100/ce4100.c
@@ -14,6 +14,7 @@
#include <linux/module.h>
#include <linux/serial_reg.h>
#include <linux/serial_8250.h>
+#include <linux/reboot.h>

#include <asm/ce4100.h>
#include <asm/prom.h>
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index c8d5577..90f6ed1 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -931,13 +931,6 @@ void __init efi_enter_virtual_mode(void)
va = efi_ioremap(md->phys_addr, size,
md->type, md->attribute);

- if (!(md->attribute & EFI_MEMORY_RUNTIME)) {
- if (!va)
- pr_err("ioremap of 0x%llX failed!\n",
- (unsigned long long)md->phys_addr);
- continue;
- }
-
md->virt_addr = (u64) (unsigned long) va;

if (!va) {
diff --git a/drivers/firmware/efi/efivars.c b/drivers/firmware/efi/efivars.c
index 8bd1bb6..8a7432a 100644
--- a/drivers/firmware/efi/efivars.c
+++ b/drivers/firmware/efi/efivars.c
@@ -583,6 +583,9 @@ int efivars_sysfs_init(void)
struct kobject *parent_kobj = efivars_kobject();
int error = 0;

+ if (!efi_enabled(EFI_RUNTIME_SERVICES))
+ return -ENODEV;
+
/* No efivars has been registered yet */
if (!parent_kobj)
return 0;


2013-07-19 00:46:38

by Linus Torvalds

Subject: Re: [GIT PULL] x86 fixes for 3.11-rc2

Google translate is getting better. I was going to ask you who you
knew who had enough of a Finnish background to do that, but decided to
see what I could make google translate do.

I'm pretty sure google didn't _use_ to do that good a job at Finnish.
It would be "pääksi", not "pää", but I think you edited that part by
hand without knowing the rules for Finnish translative declension.

Finnish is hard. But good for swearing.

Linus

2013-07-19 00:49:32

by Myklebust, Trond

Subject: Re: [GIT PULL] x86 fixes for 3.11-rc2

On Thu, 2013-07-18 at 17:46 -0700, Linus Torvalds wrote:

> Finnish is hard. But good for swearing.

Only because the ratio of vowels to consonants causes an immediate
outbreak of swearing among those who try...

Trond
--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com

2013-07-19 00:51:18

by H. Peter Anvin

Subject: Re: [GIT PULL] x86 fixes for 3.11-rc2

On 07/18/2013 05:46 PM, Linus Torvalds wrote:
>
> Finnish is hard. But good for swearing.
>

http://www.youtube.com/watch?v=b1fkMuMDqXI

It probably won't make too much sense if you haven't seen:

http://www.youtube.com/watch?v=_4bka9Y2gJ0

-hpa

2013-07-19 04:25:35

by H. Peter Anvin

Subject: Re: [GIT PULL] x86 fixes for 3.11-rc2

On 07/18/2013 05:46 PM, Linus Torvalds wrote:
>
> Finnish is hard. But good for swearing.
>

Maybe our rule should be that swearing is allowed only in Finnish (and
Russian, otherwise Al Viro will probably explode.)

-hpa

2013-07-20 12:25:07

by George Spelvin

Subject: Re: [GIT PULL] x86 fixes for 3.11-rc2

It's marginal with only two call sites, but would it be worth factoring
out the write-back function? Something like this (untested) patch.
It definitely makes the generated assembly cleaner.

(Signed-off-by: George Spelvin <[email protected]> if you want it.)

diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index cb75028..8802d97 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -171,6 +171,30 @@ static inline int wrmsr_safe(unsigned msr, unsigned low, unsigned high)
__err; \
})

+/*
+ * We have to check that we can write back the value, and not just
+ * read it. At least on 90 nm Pentium M (Family 6, Model 13), reading
+ * an invalid MSR is not guaranteed to trap, see Erratum X4 in "Intel
+ * Pentium M Processor on 90 nm Process with 2-MB L2 Cache and Intel®
+ * Processor A100 and A110 on 90 nm process with 512-KB L2 Cache
+ * Specification Update".
+ */
+#define rdmsr_verysafe(msr, low, high) \
+({ \
+ int __err; \
+ asm volatile("2: rdmsr\n" \
+ "3: wrmsr ; xor %[err],%[err]\n" \
+ "1:\n\t" \
+ ".section .fixup,\"ax\"\n\t" \
+ "4: mov %[fault],%[err] ; jmp 1b\n\t" \
+ ".previous\n\t" \
+ _ASM_EXTABLE(2b, 4b) \
+ _ASM_EXTABLE(3b, 4b) \
+ : [err] "=r" (__err), "=a" (*low), "=d" (*high) \
+ : "c" (msr), [fault] "i" (-EIO)); \
+ __err; \
+})
+
static inline int rdmsrl_safe(unsigned msr, unsigned long long *p)
{
int err;
diff --git a/arch/x86/kernel/acpi/sleep.c b/arch/x86/kernel/acpi/sleep.c
index b44577b..7c3f40c 100644
--- a/arch/x86/kernel/acpi/sleep.c
+++ b/arch/x86/kernel/acpi/sleep.c
@@ -48,9 +48,9 @@ int acpi_suspend_lowlevel(void)
#ifndef CONFIG_64BIT
native_store_gdt((struct desc_ptr *)&header->pmode_gdt);

- if (!rdmsr_safe(MSR_EFER,
- &header->pmode_efer_low,
- &header->pmode_efer_high))
+ if (!rdmsr_verysafe(MSR_EFER,
+ &header->pmode_efer_low,
+ &header->pmode_efer_high))
header->pmode_behavior |= (1 << WAKEUP_BEHAVIOR_RESTORE_EFER);
#endif /* !CONFIG_64BIT */

@@ -59,9 +59,9 @@ int acpi_suspend_lowlevel(void)
header->pmode_cr4 = read_cr4();
header->pmode_behavior |= (1 << WAKEUP_BEHAVIOR_RESTORE_CR4);
}
- if (!rdmsr_safe(MSR_IA32_MISC_ENABLE,
- &header->pmode_misc_en_low,
- &header->pmode_misc_en_high))
+ if (!rdmsr_verysafe(MSR_IA32_MISC_ENABLE,
+ &header->pmode_misc_en_low,
+ &header->pmode_misc_en_high))
header->pmode_behavior |=
(1 << WAKEUP_BEHAVIOR_RESTORE_MISC_ENABLE);
header->realmode_flags = acpi_realmode_flags;

2013-07-20 13:25:57

by Borislav Petkov

Subject: Re: [GIT PULL] x86 fixes for 3.11-rc2

On Sat, Jul 20, 2013 at 08:25:04AM -0400, George Spelvin wrote:
> It's marginal with only two call sites, but would it be worth factoring
> out the write-back function? Something like this (untested) patch.
> It definitely makes the generated assembly cleaner.

I don't think that matters because this is called only once on suspend.
Unless the cleaner assembly translates into a palpable speedup, which I
doubt.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-07-20 14:48:03

by George Spelvin

Subject: Re: [GIT PULL] x86 fixes for 3.11-rc2

Borislav Petkov <[email protected]> wrote:
> I don't think that matters because this is called only once on suspend.
> Unless the cleaner assembly translates into a palpable speedup, which I
> doubt.

I was thinking about code *size*, actually; I agree that speed is
too small to measure.

Clean code (21 bytes):
4e: b9 80 00 00 c0 mov $0xc0000080,%ecx
53: 0f 32 rdmsr
55: 0f 30 wrmsr
57: 31 f6 xor %esi,%esi
59: 85 f6 test %esi,%esi
5b: 89 43 14 mov %eax,0x14(%ebx)
5e: 89 53 18 mov %edx,0x18(%ebx)
61: 75 04 jne 67 <acpi_suspend_lowlevel+0x67>

Ugly code (50 bytes):
51: b9 80 00 00 c0 mov $0xc0000080,%ecx
56: 0f 32 rdmsr
58: 31 c9 xor %ecx,%ecx
5a: 89 c6 mov %eax,%esi
5c: 85 c9 test %ecx,%ecx
5e: 89 45 ec mov %eax,-0x14(%ebp)
61: 89 55 f0 mov %edx,-0x10(%ebp)
64: 89 73 14 mov %esi,0x14(%ebx)
67: 89 53 18 mov %edx,0x18(%ebx)
6a: 75 1b jne 87 <acpi_suspend_lowlevel+0x87>
6c: 8b 75 ec mov -0x14(%ebp),%esi
6f: b9 80 00 00 c0 mov $0xc0000080,%ecx
74: 8b 7d f0 mov -0x10(%ebp),%edi
77: 89 f0 mov %esi,%eax
79: 89 fa mov %edi,%edx
7b: 0f 30 wrmsr
7d: 31 c0 xor %eax,%eax
7f: 85 c0 test %eax,%eax
81: 75 04 jne 87 <acpi_suspend_lowlevel+0x87>

2013-07-20 16:56:06

by Borislav Petkov

Subject: Re: [GIT PULL] x86 fixes for 3.11-rc2

On Sat, Jul 20, 2013 at 10:47:58AM -0400, George Spelvin wrote:
> Borislav Petkov <[email protected]> wrote:
> > I don't think that matters because this is called only once on suspend.
> > Unless the cleaner assembly translates into a palpable speedup, which I
> > doubt.
>
> I was thinking about code *size*, actually; I agree that speed is
> too small to measure.
>
> Clean code (21 bytes):
> 4e: b9 80 00 00 c0 mov $0xc0000080,%ecx
> 53: 0f 32 rdmsr
> 55: 0f 30 wrmsr
> 57: 31 f6 xor %esi,%esi
> 59: 85 f6 test %esi,%esi
> 5b: 89 43 14 mov %eax,0x14(%ebx)
> 5e: 89 53 18 mov %edx,0x18(%ebx)
> 61: 75 04 jne 67 <acpi_suspend_lowlevel+0x67>
>
> Ugly code (50 bytes):

Right, that would matter maybe partially if the code was executed very
often. In that case, the probability of it fitting in one cacheline is
higher depending on alignment, and, you'd possibly save yourself loading
a second cacheline.

If it is 29 bytes bigger, then we have a higher probability of using a
second cacheline.

But again, I highly doubt even that would be noticeable. Especially on
modern uarches with very aggressive and smart branch prediction.

And since this is being called only once, you won't notice the
difference even with perf and specific instruction cache counters
enabled.

But what do I know - I'm always open to surprising workloads! :-)

Thanks.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-07-21 20:18:00

by H. Peter Anvin

Subject: Re: [GIT PULL] x86 fixes for 3.11-rc2

On 07/20/2013 05:25 AM, George Spelvin wrote:
> It's marginal with only two call sites, but would it be worth factoring
> out the write-back function? Something like this (untested) patch.
> It definitely makes the generated assembly cleaner.
>
> (Signed-off-by: George Spelvin <[email protected]> if you want it.)

Two call sites, statistically "never" executed, really doesn't justify
adding a new assembly function. I was considering making a C wrapper,
though... I would also like to change these to using rdmsrl/wrmsrl.

-hpa