Patch 1 is the fix. Patch 2 is a comment that would have kept me from
chasing down a false lead.
I've tested patch 2 using CPU hotplug and suspend/resume. I haven't
tested hibernation or kexec because I don't know how. (If I do
systemctl hibernate on my laptop, it happily writes out a hibernation
image somewhere and then it equally happily ignores it on the next
boot. I don't know how to test kexec.)
I haven't tested the 32-bit version. I'll try to get to that
tomorrow.
Andy Lutomirski (2):
x86/mm: Reinitialize TLB state on hotplug and resume
x86/mm: Document how CR4.PCIDE restore works
arch/x86/include/asm/tlbflush.h | 2 ++
arch/x86/kernel/cpu/common.c | 15 ++++++++++++++
arch/x86/mm/tlb.c | 44 +++++++++++++++++++++++++++++++++++++++++
arch/x86/power/cpu.c | 1 +
4 files changed, 62 insertions(+)
--
2.13.5
When Linux brings a CPU down and back up, it switches to init_mm and then
loads swapper_pg_dir into CR3. With PCID enabled, this has the side effect
of masking off the ASID bits in CR3.
This can result in some confusion in the TLB handling code. If we
bring a CPU down and back up with any ASID other than 0, we end up
with the wrong ASID active on the CPU after resume. This could
cause our internal state to become corrupt, although major
corruption is unlikely because init_mm doesn't have any user pages.
More obviously, if CONFIG_DEBUG_VM=y, we'll trip over an assertion
in the next context switch. The result of *that* is a failure to
resume from suspend with probability 1 - 1/6^(cpus-1).
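The probability above can be checked with a few lines of C. This sketch reads 1 - 1/6^(cpus-1) as follows: every CPU except the boot CPU goes through the hotplug path on suspend, and each one independently has a 1-in-6 chance of already being on ASID 0. Resume survives only if all of them are. The value 6 is assumed here to be the number of dynamic ASIDs per CPU, and the function name is hypothetical.

```c
#include <assert.h>

/* Assumed number of dynamic ASIDs per CPU (TLB_NR_DYN_ASIDS). */
#define NR_DYN_ASIDS 6

/*
 * Probability that resume trips the CONFIG_DEBUG_VM assertion on a machine
 * with @cpus CPUs, i.e. 1 - 1/6^(cpus-1). Assumes every secondary CPU was
 * independently on a uniformly random ASID when it went down.
 */
static double resume_failure_probability(int cpus)
{
	double all_on_asid0 = 1.0;	/* P(every secondary CPU was on ASID 0) */
	int i;

	assert(cpus >= 1);
	for (i = 1; i < cpus; i++)
		all_on_asid0 /= NR_DYN_ASIDS;

	return 1.0 - all_on_asid0;
}
```

With 2 CPUs this gives 5/6. With 4 CPUs it is already 215/216, so on a typical multi-core laptop resume fails almost every time.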
Fix it by reinitializing cpu_tlbstate on resume and CPU bringup.
Reported-by: Linus Torvalds <[email protected]>
Reported-by: Jiri Kosina <[email protected]>
Fixes: 10af6235e0d3 ("x86/mm: Implement PCID based optimization: try to preserve old TLB entries using PCID")
Signed-off-by: Andy Lutomirski <[email protected]>
---
arch/x86/include/asm/tlbflush.h | 2 ++
arch/x86/kernel/cpu/common.c | 2 ++
arch/x86/mm/tlb.c | 44 +++++++++++++++++++++++++++++++++++++++++
arch/x86/power/cpu.c | 1 +
4 files changed, 49 insertions(+)
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index d23e61dc0640..4893abf7f74f 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -198,6 +198,8 @@ static inline void cr4_set_bits_and_update_boot(unsigned long mask)
cr4_set_bits(mask);
}
+extern void initialize_tlbstate_and_flush(void);
+
static inline void __native_flush_tlb(void)
{
/*
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index efba8e3da3e2..40cb4d0a5982 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1583,6 +1583,7 @@ void cpu_init(void)
mmgrab(&init_mm);
me->active_mm = &init_mm;
BUG_ON(me->mm);
+ initialize_tlbstate_and_flush();
enter_lazy_tlb(&init_mm, me);
load_sp0(t, &current->thread);
@@ -1637,6 +1638,7 @@ void cpu_init(void)
mmgrab(&init_mm);
curr->active_mm = &init_mm;
BUG_ON(curr->mm);
+ initialize_tlbstate_and_flush();
enter_lazy_tlb(&init_mm, curr);
load_sp0(t, thread);
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index ce104b962a17..dbbcfd59726a 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -214,6 +214,50 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
}
/*
+ * Call this when reinitializing a CPU. It fixes the following potential
+ * problems:
+ *
+ * - The ASID changed from what cpu_tlbstate thinks it is (most likely
+ * because the CPU was taken down and came back up with CR3's PCID
+ * bits clear). CPU hotplug can do this.
+ *
+ * - The TLB contains junk in slots corresponding to inactive ASIDs.
+ *
+ * - The CPU went so far out to lunch that it may have missed a TLB
+ * flush.
+ */
+void initialize_tlbstate_and_flush(void)
+{
+ int i;
+ struct mm_struct *mm = this_cpu_read(cpu_tlbstate.loaded_mm);
+ u64 tlb_gen = atomic64_read(&init_mm.context.tlb_gen);
+ unsigned long cr3 = __read_cr3();
+
+ /* Assert that CR3 already references the right mm. */
+ WARN_ON((cr3 & CR3_ADDR_MASK) != __pa(mm->pgd));
+
+ /*
+ * Assert that CR4.PCIDE is set if needed. (CR4.PCIDE initialization
+ * doesn't work like other CR4 bits because it can only be set from
+ * long mode.)
+ */
+ WARN_ON(boot_cpu_has(X86_FEATURE_PCID) &&
+ !(cr4_read_shadow() & X86_CR4_PCIDE));
+
+ /* Force ASID 0 and force a TLB flush. */
+ write_cr3(cr3 & ~CR3_PCID_MASK);
+
+ /* Reinitialize tlbstate. */
+ this_cpu_write(cpu_tlbstate.loaded_mm_asid, 0);
+ this_cpu_write(cpu_tlbstate.next_asid, 1);
+ this_cpu_write(cpu_tlbstate.ctxs[0].ctx_id, mm->context.ctx_id);
+ this_cpu_write(cpu_tlbstate.ctxs[0].tlb_gen, tlb_gen);
+
+ for (i = 1; i < TLB_NR_DYN_ASIDS; i++)
+ this_cpu_write(cpu_tlbstate.ctxs[i].ctx_id, 0);
+}
+
+/*
* flush_tlb_func_common()'s memory ordering requirement is that any
* TLB fills that happen after we flush the TLB are ordered after we
* read active_mm's tlb_gen. We don't need any explicit barriers
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 78459a6d455a..4d68d59f457d 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -181,6 +181,7 @@ static void fix_processor_context(void)
#endif
load_TR_desc(); /* This does ltr */
load_mm_ldt(current->active_mm); /* This does lldt */
+ initialize_tlbstate_and_flush();
fpu__resume_cpu();
--
2.13.5
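To see the state juggling in isolation, here is a user-space model of what initialize_tlbstate_and_flush() does to the per-CPU bookkeeping. It is only a sketch. The structs reuse the field names from the patch, but the types, the 6-entry ASID array, and the helper names are assumptions. It also assumes, as this patch does, that the PCID bits in CR3 hold the ASID directly.

```c
#include <assert.h>
#include <stdint.h>

#define TLB_NR_DYN_ASIDS 6        /* assumed value */
#define CR3_PCID_MASK    0xFFFULL /* CR3 bits 11:0 hold the PCID */

/* Simplified stand-ins for the kernel's per-CPU cpu_tlbstate. */
struct tlb_context {
	uint64_t ctx_id;	/* mm owning this ASID slot; 0 = none (assumed unused by real mms) */
	uint64_t tlb_gen;	/* how up to date this slot's TLB entries are */
};

struct tlb_state {
	uint16_t loaded_mm_asid;
	uint16_t next_asid;
	struct tlb_context ctxs[TLB_NR_DYN_ASIDS];
};

/*
 * Hypothetical helper modelled on initialize_tlbstate_and_flush(): keep the
 * page-table address in @cr3, force ASID 0, and forget every other slot.
 * Returns the CR3 value the real code would write.
 */
static uint64_t reinit_tlbstate(struct tlb_state *ts, uint64_t cr3,
				uint64_t ctx_id, uint64_t tlb_gen)
{
	int i;

	ts->loaded_mm_asid = 0;
	ts->next_asid = 1;
	ts->ctxs[0].ctx_id = ctx_id;
	ts->ctxs[0].tlb_gen = tlb_gen;
	for (i = 1; i < TLB_NR_DYN_ASIDS; i++)
		ts->ctxs[i].ctx_id = 0;

	return cr3 & ~CR3_PCID_MASK;
}

/* Roughly the kind of consistency check that fails after hotplug without the fix. */
static int tlbstate_matches_cr3(const struct tlb_state *ts, uint64_t cr3)
{
	return ts->loaded_mm_asid == (cr3 & CR3_PCID_MASK);
}
```

A CPU that went down on ASID 3 comes back with CR3's PCID bits clear. tlbstate_matches_cr3() therefore fails until reinit_tlbstate() brings the bookkeeping back in line with the hardware.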
While debugging a problem, I thought that using
cr4_set_bits_and_update_boot() to restore CR4.PCIDE would be
helpful. It turns out to be counterproductive.
Add a comment documenting how this works.
Signed-off-by: Andy Lutomirski <[email protected]>
---
arch/x86/kernel/cpu/common.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 40cb4d0a5982..4c31a6585333 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -333,6 +333,19 @@ static void setup_pcid(struct cpuinfo_x86 *c)
{
if (cpu_has(c, X86_FEATURE_PCID)) {
if (cpu_has(c, X86_FEATURE_PGE)) {
+ /*
+ * We'd like to use cr4_set_bits_and_update_boot(),
+ * but we can't. CR4.PCIDE is special and can only
+ * be set in long mode, and the early CPU init code
+ * doesn't know this and would try to restore CR4.PCIDE
+ * prior to entering long mode.
+ *
+ * Instead, we rely on the fact that hotplug, resume,
+ * etc. all fully restore CR4 before they write anything
+ * that could have nonzero PCID bits to CR3. CR4.PCIDE
+ * has no effect on the page tables themselves, so we
+ * don't need it to be restored early.
+ */
cr4_set_bits(X86_CR4_PCIDE);
} else {
/*
--
2.13.5
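The ordering argument in that comment can be illustrated with a small model of the two architectural rules involved. The sketch assumes the Intel SDM's description of MOV to CR4: setting PCIDE raises #GP unless the CPU is in IA-32e (long) mode and CR3[11:0] is zero. The struct and function names are hypothetical, and this is not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define X86_CR4_PCIDE (1ULL << 17)	/* CR4.PCIDE is bit 17 */
#define CR3_PCID_MASK 0xFFFULL		/* CR3 bits 11:0 */

/* Hypothetical model of the bits of CPU state that matter here. */
struct cpu_model {
	bool long_mode;		/* IA32_EFER.LMA */
	uint64_t cr3;
	uint64_t cr4;
};

/*
 * Model of MOV to CR4. Returns false where real hardware would raise #GP:
 * setting PCIDE outside long mode, or while CR3 has nonzero PCID bits.
 */
static bool model_write_cr4(struct cpu_model *c, uint64_t val)
{
	bool setting_pcide = (val & X86_CR4_PCIDE) &&
			     !(c->cr4 & X86_CR4_PCIDE);

	if (setting_pcide &&
	    (!c->long_mode || (c->cr3 & CR3_PCID_MASK) != 0))
		return false;

	c->cr4 = val;
	return true;
}
```

This shows why restoring CR4.PCIDE from early CPU init code fails (the CPU is not in long mode yet). It also shows why CR4 must be restored before anything loads a CR3 value with nonzero PCID bits.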
On Wed, Sep 6, 2017 at 7:54 PM, Andy Lutomirski <[email protected]> wrote:
> Patch 1 is the fix. Patch 2 is a comment that would have kept me from
> chasing down a false lead.
Yes, this seems to fix things for me. Thanks.
Of course, right now that laptop has no working wifi with tip-of-tree
due to some issues with the networking tree, but that's an independent
thing and I could suspend and resume with this. So applied and pushed
out,
Linus
> On Sep 6, 2017, at 8:25 PM, Linus Torvalds <[email protected]> wrote:
>
> Yes, this seems to fix things for me. Thanks.
Great!
FWIW, there's still a possible glitch where doing EFI calls could corrupt ASID 0. I figured that fixing that could wait for tomorrow.
* Andy Lutomirski <[email protected]> wrote:
> More obviously, if CONFIG_DEBUG_VM=y, we'll trip over an assertion
> in the next context switch. The result of *that* is a failure to
> resume from suspend with probability 1 - 1/6^(cpus-1).
Nice fix, thanks!
On a related note, this bug could have been more debuggable I think.
Could we _please_ change VM_BUG_ON() to WARN_ON() or such?
Here the stupid VM_BUG_ON() crashed Linus's laptop in a totally
undebuggable state ... while a WARN_ON() might have at least
gotten something out to his laptop's screen, right?
So I propose the patch below. Detailed arguments in the changelog.
Pretty please?
==============>
From 673b348ab4a5b2abd17d392cacbf9ab6de3d3042 Mon Sep 17 00:00:00 2001
From: Ingo Molnar <[email protected]>
Date: Thu, 7 Sep 2017 08:44:13 +0200
Subject: [PATCH] mm/debug: Change BUG_ON() crashes to survivable WARN_ON() warnings
So a VM_BUG_ON() that triggered with the following bug:
72c0098d92ce: ("x86/mm: Reinitialize TLB state on hotplug and resume")
... crashed and made Linus's laptop totally undebuggable, because when
it triggered there was no screen up yet. It looked like a total lockup
on resume - although we produced a warning that could have helped
narrow down the problem.
Thus instead of being able to report the warning, Linus had to bisect
the bug the hard way in the middle of the merge window - which is beyond
most users' capability and won't work with regular distro kernels anyway.
To make matters worse, a BUG_ON() done when Xorg is active is utterly
undebuggable anyway in most cases, because it won't be printed on the
framebuffer, and because the BUG_ON() prevents the system log from being
synced to disk.
The symptoms, typically, are similar to what Linus saw: a hard lockup
followed by a bootup that shows nothing in the logs ...
Utterly crazy behavior from the kernel, IMHO!
So instead of crashing the system with a BUG_ON(), use a WARN_ON(). In
the above situation the kernel would probably have survived
long enough to produce a kernel log.
I realize that in principle there might be bugs where it's better to stop,
i.e. crash the kernel intentionally.
But I argue that most of the kernel bugs are _not_ such bugs, and being
able to get a log out trumps that concern - because the people who run
new kernels early are not crazy enough to _depend_ on that kernel, and
the ability to get logs off is actually more important.
People wanting to crash the kernel here and now have the burden of proof
and we should not make it the default for any widely used assert to crash
the kernel ...
To avoid a mass rename, this patch simply reuses the existing
VM_BUG_ON(), which becomes somewhat of a misnomer after this change.
I will send a rename patch as well after the merge window, separately.
Note that I also made mmdebug.h a bit more readable:
- align the various constructs coherently and separate them
visually a bit better
- use consistent definitions. I mean, half the functions have
externs, half don't - what the heck?
- add a bit of description of what this is about
Plus, for consistency's sake, VIRTUAL_BUG_ON() is changed as well,
but it's not a widespread primitive.
Signed-off-by: Ingo Molnar <[email protected]>
---
include/linux/mmdebug.h | 56 +++++++++++++++++++++++++++++--------------------
1 file changed, 33 insertions(+), 23 deletions(-)
diff --git a/include/linux/mmdebug.h b/include/linux/mmdebug.h
index 451a811f48f2..ad127a020c3f 100644
--- a/include/linux/mmdebug.h
+++ b/include/linux/mmdebug.h
@@ -1,6 +1,11 @@
#ifndef LINUX_MM_DEBUG_H
#define LINUX_MM_DEBUG_H 1
+/*
+ * Various VM related debug assert helper functions.
+ * On perfect kernels they should never trigger.
+ */
+
#include <linux/bug.h>
#include <linux/stringify.h>
@@ -8,59 +13,64 @@ struct page;
struct vm_area_struct;
struct mm_struct;
-extern void dump_page(struct page *page, const char *reason);
+extern void dump_page(struct page *page, const char *reason);
extern void __dump_page(struct page *page, const char *reason);
-void dump_vma(const struct vm_area_struct *vma);
-void dump_mm(const struct mm_struct *mm);
+extern void dump_vma(const struct vm_area_struct *vma);
+extern void dump_mm(const struct mm_struct *mm);
#ifdef CONFIG_DEBUG_VM
-#define VM_BUG_ON(cond) BUG_ON(cond)
+
+#define VM_BUG_ON(cond) WARN_ON(cond)
+
#define VM_BUG_ON_PAGE(cond, page) \
do { \
if (unlikely(cond)) { \
dump_page(page, "VM_BUG_ON_PAGE(" __stringify(cond)")");\
- BUG(); \
+ WARN_ON(1); \
} \
} while (0)
+
#define VM_BUG_ON_VMA(cond, vma) \
do { \
if (unlikely(cond)) { \
dump_vma(vma); \
- BUG(); \
+ WARN_ON(1); \
} \
} while (0)
+
#define VM_BUG_ON_MM(cond, mm) \
do { \
if (unlikely(cond)) { \
dump_mm(mm); \
- BUG(); \
+ WARN_ON(1); \
} \
} while (0)
-#define VM_WARN_ON(cond) WARN_ON(cond)
-#define VM_WARN_ON_ONCE(cond) WARN_ON_ONCE(cond)
-#define VM_WARN_ONCE(cond, format...) WARN_ONCE(cond, format)
-#define VM_WARN(cond, format...) WARN(cond, format)
+
+#define VM_WARN_ON(cond) WARN_ON(cond)
+#define VM_WARN_ON_ONCE(cond) WARN_ON_ONCE(cond)
+#define VM_WARN_ONCE(cond, format...) WARN_ONCE(cond, format)
+#define VM_WARN(cond, format...) WARN(cond, format)
#else
-#define VM_BUG_ON(cond) BUILD_BUG_ON_INVALID(cond)
-#define VM_BUG_ON_PAGE(cond, page) VM_BUG_ON(cond)
-#define VM_BUG_ON_VMA(cond, vma) VM_BUG_ON(cond)
-#define VM_BUG_ON_MM(cond, mm) VM_BUG_ON(cond)
-#define VM_WARN_ON(cond) BUILD_BUG_ON_INVALID(cond)
-#define VM_WARN_ON_ONCE(cond) BUILD_BUG_ON_INVALID(cond)
-#define VM_WARN_ONCE(cond, format...) BUILD_BUG_ON_INVALID(cond)
-#define VM_WARN(cond, format...) BUILD_BUG_ON_INVALID(cond)
+#define VM_BUG_ON(cond) BUILD_BUG_ON_INVALID(cond)
+#define VM_BUG_ON_PAGE(cond, page) VM_BUG_ON(cond)
+#define VM_BUG_ON_VMA(cond, vma) VM_BUG_ON(cond)
+#define VM_BUG_ON_MM(cond, mm) VM_BUG_ON(cond)
+#define VM_WARN_ON(cond) BUILD_BUG_ON_INVALID(cond)
+#define VM_WARN_ON_ONCE(cond) BUILD_BUG_ON_INVALID(cond)
+#define VM_WARN_ONCE(cond, format...) BUILD_BUG_ON_INVALID(cond)
+#define VM_WARN(cond, format...) BUILD_BUG_ON_INVALID(cond)
#endif
#ifdef CONFIG_DEBUG_VIRTUAL
-#define VIRTUAL_BUG_ON(cond) BUG_ON(cond)
+#define VIRTUAL_BUG_ON(cond) WARN_ON(cond)
#else
-#define VIRTUAL_BUG_ON(cond) do { } while (0)
+#define VIRTUAL_BUG_ON(cond) do { } while (0)
#endif
#ifdef CONFIG_DEBUG_VM_PGFLAGS
-#define VM_BUG_ON_PGFLAGS(cond, page) VM_BUG_ON_PAGE(cond, page)
+#define VM_BUG_ON_PGFLAGS(cond, page) VM_BUG_ON_PAGE(cond, page)
#else
-#define VM_BUG_ON_PGFLAGS(cond, page) BUILD_BUG_ON_INVALID(cond)
+#define VM_BUG_ON_PGFLAGS(cond, page) BUILD_BUG_ON_INVALID(cond)
#endif
#endif
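Here is a user-space sketch of what the two VM_BUG_ON() flavours do after this patch. The MY_ macros and the warn_on() helper are hypothetical stand-ins, not the kernel's <linux/bug.h>. The point is that the debug flavour now reports and returns instead of stopping, while the non-debug flavour, like BUILD_BUG_ON_INVALID(), only type-checks its argument.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical counter so the example can observe that a warning fired. */
static int warn_count;

/* Minimal user-space stand-in for WARN_ON(): report, then keep running. */
static int warn_on(int cond, const char *expr, const char *file, int line)
{
	if (cond) {
		warn_count++;
		fprintf(stderr, "WARNING: %s at %s:%d\n", expr, file, line);
	}
	return cond;
}
#define MY_WARN_ON(cond) warn_on(!!(cond), #cond, __FILE__, __LINE__)

/* CONFIG_DEBUG_VM=y flavour after this patch: warn instead of BUG(). */
#define MY_VM_BUG_ON_DEBUG(cond) ((void)MY_WARN_ON(cond))

/*
 * CONFIG_DEBUG_VM=n flavour, modelled on BUILD_BUG_ON_INVALID(): the
 * expression must still compile, but sizeof never evaluates it.
 */
#define MY_VM_BUG_ON_NODEBUG(cond) ((void)sizeof((long)(cond)))
```

The condition is evaluated only in the debug flavour. A VM_BUG_ON() argument with side effects therefore behaves differently depending on CONFIG_DEBUG_VM.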
On Wed, 6 Sep 2017, Andy Lutomirski wrote:
> [...]
> Fix it by reinitializing cpu_tlbstate on resume and CPU bringup.
>
> Reported-by: Linus Torvalds <[email protected]>
> Reported-by: Jiri Kosina <[email protected]>
> Fixes: 10af6235e0d3 ("x86/mm: Implement PCID based optimization: try to preserve old TLB entries using PCID")
> Signed-off-by: Andy Lutomirski <[email protected]>
Tested-by: Jiri Kosina <[email protected]>
Thanks,
--
Jiri Kosina
SUSE Labs
* Jiri Kosina <[email protected]> wrote:
> On Wed, 6 Sep 2017, Andy Lutomirski wrote:
> > [...]
>
> Tested-by: Jiri Kosina <[email protected]>
The fix should be upstream already, as of 1c9fe4409ce3 and later.
Thanks,
Ingo
On Wed, Sep 06, 2017 at 07:54:52PM -0700, Andy Lutomirski wrote:
> Patch 1 is the fix. Patch 2 is a comment that would have kept me from
> chasing down a false lead.
>
> I've tested patch 2 using CPU hotplug and suspend/resume. I haven't
> tested hibernation or kexec because I don't know how. (If I do
> systemctl hibernate on my laptop, it happily writes out a hibernation
> image somewhere and then it equally happily ignores it on the next
> boot.
Do you have this in cmdline?
resume= [SWSUSP]
Specify the partition device for software suspend
Format:
{/dev/<dev> | PARTUUID=<uuid> | <int>:<int> | <hex>}
> I don't know how to test kexec.)
You boot with something like this:
crashkernel=512M-2G:128M,2G-64G:256M,64G-:512M
Check dmesg to see whether it managed to reserve memory. Then you do:
# kexec --noefi -l bzImage --initrd=initrd.img --reuse-cmdline
# kexec -e
The first one loads the kernel; the last one boots into it.
Anyway, something like that. I have this in my notes saying it worked at
some point.
--
Regards/Gruss,
Boris.
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--
Just nitpicks:
On Wed, Sep 06, 2017 at 07:54:53PM -0700, Andy Lutomirski wrote:
> [...]
>
> diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
> index d23e61dc0640..4893abf7f74f 100644
> --- a/arch/x86/include/asm/tlbflush.h
> +++ b/arch/x86/include/asm/tlbflush.h
> @@ -198,6 +198,8 @@ static inline void cr4_set_bits_and_update_boot(unsigned long mask)
> cr4_set_bits(mask);
> }
>
> +extern void initialize_tlbstate_and_flush(void);
Let's put that declaration at the end.
> [...]
>
> /*
> + * Call this when reinitializing a CPU. It fixes the following potential
> + * problems:
> + *
> + * - The ASID changed from what cpu_tlbstate thinks it is (most likely
> + * because the CPU was taken down and came back up with CR3's PCID
> + * bits clear. CPU hotplug can do this.
> + *
> + * - The TLB contains junk in slots corresponding to inactive ASIDs.
> + *
> + * - The CPU went so far out to lunch that it may have missed a TLB
> + * flush.
> + */
> +void initialize_tlbstate_and_flush(void)
I think we should prefix all those visible, TLB-handling functions with
"tlb_". So you'd have tlb_init_state_and_flush().
--
Regards/Gruss,
Boris.
* Borislav Petkov <[email protected]> wrote:
> > + */
> > +void initialize_tlbstate_and_flush(void)
>
> I think we should prefix all those visible, TLB-handling functions with
> "tlb_". So you'd have tlb_init_state_and_flush().
Agreed absolutely, but note that this affects more functions as well - for example
enter_lazy_tlb() should probably be tlb_lazy_enter() - or at least
lazy_tlb_enter() or such?
Thanks,
Ingo
On Thu, Sep 07, 2017 at 11:59:32AM +0200, Ingo Molnar wrote:
> Agreed absolutely, but note that this affects more functions as well - for example
> enter_lazy_tlb() should probably be tlb_lazy_enter() - or at least
> lazy_tlb_enter() or such?
Yeah, or tlb_enter_lazy() or even tlb_enter_lazy_mode() or so. I could
give it a try when things get a bit quieter and see how it actually
looks and how much "better" it becomes, staring at that code...
--
Regards/Gruss,
Boris.
On Thu, 7 Sep 2017, Ingo Molnar wrote:
> The fix should be upstream already, as of 1c9fe4409ce3 and later.
Hm, so I've just experienced two instances in a row of reboot just after
reading hibernation image (i.e. exactly the same symptom as before) even
with 3b9f8ed kernel (which contains the fix). Seems like the fix is either
incomplete (just the probability of it happening is lower), or I'm seeing
something different with the same symptom.
I'll try to figure out whether it's the same VM_BUG_ON() triggering, but
probably will be able to do so only tomorrow.
--
Jiri Kosina
SUSE Labs
On Thu, Sep 7, 2017 at 12:01 AM, Ingo Molnar <[email protected]> wrote:
>
> On a related note, this bug could have been more debuggable I think.
> Could we _please_ change VM_BUG_ON() to WARN_ON() or such?
I think it should be WARN_ON_ONCE(), or at least rate-limited in some way.
Because once you have one of the VM bugs, they tend to repeat.
(We had a discussion long ago about making the "ONCE" behavior
actually be "once in a blue moon", and just mean that you warn at most
once every five minutes or something like that. Because the "once"
behavior has also resulted in people missing bugs, because the machine
has been up a long time, and maybe you got a warning at boot time, but
then five days later something fails silently again).
Also, should you do a "dump_vma()" if you then don't give a call stack
because you already did it earlier? So the rate limiting would need to
cover that part too, methinks.
Linus
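The "once every five minutes" idea could be sketched roughly as follows. This is user-space C with hypothetical names. The caller passes the current time in seconds so the logic is easy to test, and an interval of 300 s stands in for "five minutes". Inside the kernel, struct ratelimit_state and __ratelimit() from <linux/ratelimit.h> would be the natural building blocks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical "once in a blue moon" state: one report per interval. */
struct warn_ratelimit {
	time_t interval;	/* minimum seconds between reports */
	time_t last;		/* time of the last report */
	bool reported;		/* has anything been reported yet? */
	unsigned long missed;	/* reports suppressed since the last one */
};

/*
 * Report @cond at most once per @rl->interval. The optional @dump callback
 * (think dump_vma()) is rate-limited together with the warning, so the
 * detail dump and the warning are always printed as a pair.
 * Returns true if a report was printed.
 */
static bool warn_ratelimited(struct warn_ratelimit *rl, bool cond,
			     time_t now, void (*dump)(void))
{
	if (!cond)
		return false;

	if (rl->reported && now - rl->last < rl->interval) {
		rl->missed++;
		return false;
	}

	if (rl->missed)
		fprintf(stderr, "%lu similar warnings suppressed\n", rl->missed);
	if (dump)
		dump();
	fprintf(stderr, "WARNING: VM assertion failed\n");

	rl->reported = true;
	rl->last = now;
	rl->missed = 0;
	return true;
}
```

Unlike a plain "once" flag, a bug that recurs days after boot is reported again, and the suppressed count shows how often it fired in between.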
> On Sep 7, 2017, at 12:55 PM, Jiri Kosina <[email protected]> wrote:
> Hm, so I've just experienced two instances in a row of reboot just after
> reading hibernation image (i.e. exactly the same symptom as before) even
> with 3b9f8ed kernel (which contains the fix). Seems like the fix is either
> incomplete (just the probability of it happening is lower), or I'm seeing
> something different with the same symptom.
>
> I'll try to figure out whether it's the same VM_BUG_ON() triggering, but
> probably will be able to do so only tomorrow.
>
Nah, don't waste your time. I think I see the bug, and it's a different bug. It's an easy one-line fix, but I have to figure out how to test it.
On Wed 2017-09-06 20:25:10, Linus Torvalds wrote:
> On Wed, Sep 6, 2017 at 7:54 PM, Andy Lutomirski <[email protected]> wrote:
> > Patch 1 is the fix. Patch 2 is a comment that would have kept me from
> > chasing down a false lead.
>
> Yes, this seems to fix things for me. Thanks.
>
> Of course, right now that laptop has no working wifi with tip-of-tree
> due to some issues with the networking tree, but that's an independent
> thing and I could suspend and resume with this. So applied and pushed
> out,
Ok, it seems this is still not completely right: I'm now getting a WARN_ON
during boot and on every resume... but the machine works.
4.14-rc0, 32-bit.
Pavel
[ 0.000000] Linux version 4.13.0+ (pavel@duo) (gcc version 4.9.2 (Debian 4.9.2-10)) #429 SMP Thu Sep 14 12:08:05 CEST 2017
[ 0.000000] Disabled fast string operations
[ 0.000000] x86/fpu: x87 FPU will use FXSAVE
[ 0.000000] e820: BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009efff] usable
[ 0.000000] BIOS-e820: [mem 0x000000000009f000-0x000000000009ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000000dc000-0x00000000000fffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bf6cffff] usable
[ 0.000000] BIOS-e820: [mem 0x00000000bf6d0000-0x00000000bf6defff] ACPI data
[ 0.000000] BIOS-e820: [mem 0x00000000bf6df000-0x00000000bf6fffff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x00000000bf700000-0x00000000bfffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000f0000000-0x00000000f3ffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fec00000-0x00000000fec0ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fed00000-0x00000000fed003ff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fed14000-0x00000000fed19fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fed1c000-0x00000000fed8ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000ff800000-0x00000000ffffffff] reserved
[ 0.000000] NX (Execute Disable) protection: active
[ 0.000000] random: fast init done
[ 0.000000] SMBIOS 2.4 present.
[ 0.000000] DMI: LENOVO 17097HU/17097HU, BIOS 7BETD8WW (2.19 ) 03/31/2011
[ 0.000000] tsc: Fast TSC calibration failed
[ 0.000000] tsc: Using PIT calibration value
[ 0.000000] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
[ 0.000000] e820: remove [mem 0x000a0000-0x000fffff] usable
[ 0.000000] e820: last_pfn = 0xbf6d0 max_arch_pfn = 0x1000000
[ 0.000000] MTRR default type: uncachable
[ 0.000000] MTRR fixed ranges enabled:
[ 0.000000] 00000-9FFFF write-back
[ 0.000000] A0000-BFFFF uncachable
[ 0.000000] C0000-CFFFF write-protect
[ 0.000000] D0000-DBFFF uncachable
[ 0.000000] DC000-DFFFF write-back
[ 0.000000] E0000-FFFFF write-protect
[ 0.000000] MTRR variable ranges enabled:
[ 0.000000] 0 base 000000000 mask F80000000 write-back
[ 0.000000] 1 base 080000000 mask FC0000000 write-back
[ 0.000000] 2 base 0BF700000 mask FFFF00000 uncachable
[ 0.000000] 3 base 0BF800000 mask FFF800000 uncachable
[ 0.000000] 4 disabled
[ 0.000000] 5 disabled
[ 0.000000] 6 disabled
[ 0.000000] 7 disabled
[ 0.000000] x86/PAT: PAT not supported by CPU.
[ 0.000000] x86/PAT: Configuration [0-7]: WB WT UC- UC WB WT UC- UC
[ 0.000000] initial memory mapped: [mem 0x00000000-0x05bfffff]
[ 0.000000] Base memory trampoline at [c009b000] 9b000 size 16384
[ 0.000000] BRK [0x057e6000, 0x057e6fff] PGTABLE
[ 0.000000] BRK [0x057e7000, 0x057e7fff] PGTABLE
[ 0.000000] ACPI: Early table checksum verification disabled
[ 0.000000] ACPI: RSDP 0x00000000000F67C0 000024 (v02 LENOVO)
[ 0.000000] ACPI: XSDT 0x00000000BF6D191C 000084 (v01 LENOVO TP-7B 00002190 LTP 00000000)
[ 0.000000] ACPI: FACP 0x00000000BF6D1A00 0000F4 (v03 LENOVO TP-7B 00002190 LNVO 00000001)
[ 0.000000] ACPI BIOS Warning (bug): 32/64X length mismatch in FADT/Gpe0Block: 64/32 (20170728/tbfadt-603)
[ 0.000000] ACPI BIOS Warning (bug): Optional FADT field Gpe1Block has valid Address but zero Length: 0x000000000000102C/0x0 (20170728/tbfadt-658)
[ 0.000000] ACPI: DSDT 0x00000000BF6D1D90 00CFB9 (v01 LENOVO TP-7B 00002190 MSFT 0100000E)
[ 0.000000] ACPI: FACS 0x00000000BF6F4000 000040
[ 0.000000] ACPI: FACS 0x00000000BF6F4000 000040
[ 0.000000] ACPI: SSDT 0x00000000BF6D1BB4 0001DC (v01 LENOVO TP-7B 00002190 MSFT 0100000E)
[ 0.000000] ACPI: ECDT 0x00000000BF6DED49 000052 (v01 LENOVO TP-7B 00002190 LNVO 00000001)
[ 0.000000] ACPI: TCPA 0x00000000BF6DED9B 000032 (v02 LENOVO TP-7B 00002190 LNVO 00000001)
[ 0.000000] ACPI: APIC 0x00000000BF6DEDCD 000068 (v01 LENOVO TP-7B 00002190 LNVO 00000001)
[ 0.000000] ACPI: MCFG 0x00000000BF6DEE35 00003C (v01 LENOVO TP-7B 00002190 LNVO 00000001)
[ 0.000000] ACPI: HPET 0x00000000BF6DEE71 000038 (v01 LENOVO TP-7B 00002190 LNVO 00000001)
[ 0.000000] ACPI: BOOT 0x00000000BF6DEFD8 000028 (v01 LENOVO TP-7B 00002190 LTP 00000001)
[ 0.000000] ACPI: SSDT 0x00000000BF6F2645 00025F (v01 LENOVO TP-7B 00002190 INTL 20050513)
[ 0.000000] ACPI: SSDT 0x00000000BF6F28A4 0000A6 (v01 LENOVO TP-7B 00002190 INTL 20050513)
[ 0.000000] ACPI: SSDT 0x00000000BF6F294A 0004F7 (v01 LENOVO TP-7B 00002190 INTL 20050513)
[ 0.000000] ACPI: SSDT 0x00000000BF6F2E41 0001D8 (v01 LENOVO TP-7B 00002190 INTL 20050513)
[ 0.000000] ACPI: Local APIC address 0xfee00000
[ 0.000000] 2170MB HIGHMEM available.
[ 0.000000] 891MB LOWMEM available.
[ 0.000000] mapped low ram: 0 - 37bfe000
[ 0.000000] low ram: 0 - 37bfe000
[ 0.000000] Zone ranges:
[ 0.000000] DMA [mem 0x0000000000001000-0x0000000000ffffff]
[ 0.000000] Normal [mem 0x0000000001000000-0x0000000037bfdfff]
[ 0.000000] HighMem [mem 0x0000000037bfe000-0x00000000bf6cffff]
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000000001000-0x000000000009efff]
[ 0.000000] node 0: [mem 0x0000000000100000-0x00000000bf6cffff]
[ 0.000000] Initmem setup node 0 [mem 0x0000000000001000-0x00000000bf6cffff]
[ 0.000000] On node 0 totalpages: 783982
[ 0.000000] free_area_init_node: node 0, pgdat c50d5e40, node_mem_map f640e020
[ 0.000000] DMA zone: 32 pages used for memmap
[ 0.000000] DMA zone: 0 pages reserved
[ 0.000000] DMA zone: 3998 pages, LIFO batch:0
[ 0.000000] Normal zone: 1752 pages used for memmap
[ 0.000000] Normal zone: 224254 pages, LIFO batch:31
[ 0.000000] HighMem zone: 555730 pages, LIFO batch:31
[ 0.000000] Using APIC driver default
[ 0.000000] Reserving Intel graphics memory at 0x00000000bf800000-0x00000000bfffffff
[ 0.000000] ACPI: PM-Timer IO Port: 0x1008
[ 0.000000] ACPI: Local APIC address 0xfee00000
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
[ 0.000000] IOAPIC[0]: apic_id 1, version 32, address 0xfec00000, GSI 0-23
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[ 0.000000] ACPI: IRQ0 used by override.
[ 0.000000] ACPI: IRQ9 used by override.
[ 0.000000] Using ACPI (MADT) for SMP configuration information
[ 0.000000] ACPI: HPET id: 0x8086a201 base: 0xfed00000
[ 0.000000] smpboot: Allowing 2 CPUs, 0 hotplug CPUs
[ 0.000000] PM: Registered nosave memory: [mem 0x00000000-0x00000fff]
[ 0.000000] PM: Registered nosave memory: [mem 0x0009f000-0x0009ffff]
[ 0.000000] PM: Registered nosave memory: [mem 0x000a0000-0x000dbfff]
[ 0.000000] PM: Registered nosave memory: [mem 0x000dc000-0x000fffff]
[ 0.000000] e820: [mem 0xc0000000-0xefffffff] available for PCI devices
[ 0.000000] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
[ 0.000000] setup_percpu: NR_CPUS:8 nr_cpumask_bits:8 nr_cpu_ids:2 nr_node_ids:1
[ 0.000000] percpu: Embedded 22 pages/cpu @f63df000 s57704 r0 d32408 u90112
[ 0.000000] pcpu-alloc: s57704 r0 d32408 u90112 alloc=22*4096
[ 0.000000] pcpu-alloc: [0] 0 [0] 1
[ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 782198
[ 0.000000] Kernel command line: BOOT_IMAGE=(hd0,2)/fast/l/linux/arch/x86/boot/bzImage root=/dev/sda4 resume=/dev/sda1
[ 0.000000] PID hash table entries: 4096 (order: 2, 16384 bytes)
[ 0.000000] Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
[ 0.000000] Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
[ 0.000000] Initializing CPU#0
[ 0.000000] Initializing HighMem for node 0 (00037bfe:000bf6d0)
[ 0.000000] Initializing Movable for node 0 (00000000:00000000)
[ 0.000000] Memory: 3085872K/3135928K available (11242K kernel code, 746K rwdata, 5368K rodata, 564K init, 6176K bss, 50056K reserved, 0K cma-reserved, 2222920K highmem)
[ 0.000000] virtual kernel memory layout:
fixmap : 0xffe67000 - 0xfffff000 (1632 kB)
pkmap : 0xffc00000 - 0xffe00000 (2048 kB)
vmalloc : 0xf83fe000 - 0xffbfe000 ( 120 MB)
lowmem : 0xc0000000 - 0xf7bfe000 ( 891 MB)
.init : 0xc5118000 - 0xc51a5000 ( 564 kB)
.data : 0xc4afabfc - 0xc50f69c0 (6127 kB)
.text : 0xc4000000 - 0xc4afabfc (11242 kB)
[ 0.000000] Checking if this processor honours the WP bit even in supervisor mode...Ok.
[ 0.000000] Hierarchical RCU implementation.
[ 0.000000] RCU restricting CPUs from NR_CPUS=8 to nr_cpu_ids=2.
[ 0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=2
[ 0.000000] NR_IRQS: 2304, nr_irqs: 440, preallocated irqs: 16
[ 0.000000] CPU 0 irqstacks, hard=f5c0e000 soft=f5c10000
[ 0.000000] Console: colour VGA+ 80x25
[ 0.000000] console [tty0] enabled
[ 0.000000] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
[ 0.000000] ... MAX_LOCKDEP_SUBCLASSES: 8
[ 0.000000] ... MAX_LOCK_DEPTH: 48
[ 0.000000] ... MAX_LOCKDEP_KEYS: 8191
[ 0.000000] ... CLASSHASH_SIZE: 4096
[ 0.000000] ... MAX_LOCKDEP_ENTRIES: 32768
[ 0.000000] ... MAX_LOCKDEP_CHAINS: 65536
[ 0.000000] ... CHAINHASH_SIZE: 32768
[ 0.000000] memory used by lock dependency info: 4383 kB
[ 0.000000] per task-struct memory footprint: 1344 bytes
[ 0.000000] clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 133484882848 ns
[ 0.000000] hpet clockevent registered
[ 0.000000] tsc: Fast TSC calibration failed
[ 0.008000] tsc: PIT calibration matches HPET. 1 loops
[ 0.008000] tsc: Detected 1828.759 MHz processor
[ 0.008000] Calibrating delay loop (skipped), value calculated using timer frequency.. 3657.51 BogoMIPS (lpj=7315036)
[ 0.008000] pid_max: default: 32768 minimum: 301
[ 0.008000] ACPI: Core revision 20170728
[ 0.074285] ACPI: 6 ACPI AML tables successfully acquired and loaded
[ 0.074615] Mount-cache hash table entries: 2048 (order: 1, 8192 bytes)
[ 0.074706] Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes)
[ 0.075452] Disabled fast string operations
[ 0.075527] CPU: Physical Processor ID: 0
[ 0.075599] CPU: Processor Core ID: 0
[ 0.075676] process: using mwait in idle threads
[ 0.075759] Last level iTLB entries: 4KB 128, 2MB 0, 4MB 2
[ 0.075839] Last level dTLB entries: 4KB 128, 2MB 0, 4MB 8, 1GB 0
[ 0.076099] Freeing SMP alternatives memory: 40K
[ 0.080268] smpboot: Max logical packages: 1
[ 0.080346] Enabling APIC mode: Flat. Using 1 I/O APICs
[ 0.084350] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[ 0.128000] smpboot: CPU0: Genuine Intel(R) CPU T2400 @ 1.83GHz (family: 0x6, model: 0xe, stepping: 0x8)
[ 0.128000] Performance Events: Core events, core PMU driver.
[ 0.128000] ... version: 1
[ 0.128000] ... bit width: 40
[ 0.128000] ... generic registers: 2
[ 0.128000] ... value mask: 000000ffffffffff
[ 0.128000] ... max period: 000000007fffffff
[ 0.128000] ... fixed-purpose events: 0
[ 0.128000] ... event mask: 0000000000000003
[ 0.128000] Hierarchical SRCU implementation.
[ 0.128000] smp: Bringing up secondary CPUs ...
[ 0.128204] NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
[ 0.128779] CPU 1 irqstacks, hard=f5cd6000 soft=f5cd8000
[ 0.128782] x86: Booting SMP configuration:
[ 0.128863] .... node #0, CPUs: #1
[ 0.004000] Initializing CPU#1
[ 0.004000] ------------[ cut here ]------------
[ 0.004000] WARNING: CPU: 1 PID: 0 at arch/x86/mm/tlb.c:257 initialize_tlbstate_and_flush+0x27/0xcf
[ 0.004000] Modules linked in:
[ 0.004000] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.13.0+ #429
[ 0.004000] Hardware name: LENOVO 17097HU/17097HU, BIOS 7BETD8WW (2.19 ) 03/31/2011
[ 0.004000] task: f5ca2080 task.stack: f5cc4000
[ 0.004000] EIP: initialize_tlbstate_and_flush+0x27/0xcf
[ 0.004000] EFLAGS: 00210087 CPU: 1
[ 0.004000] EAX: 00000000 EBX: c506d540 ECX: 051b2000 EDX: 00000000
[ 0.004000] ESI: 0503f000 EDI: c51b2000 EBP: f5cc5f54 ESP: f5cc5f48
[ 0.004000] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 0.004000] CR0: 80050033 CR2: 00000000 CR3: 0503f000 CR4: 000006b0
[ 0.004000] Call Trace:
[ 0.004000] cpu_init+0xdc/0x2f0
[ 0.004000] start_secondary+0x34/0x1c6
[ 0.004000] startup_32_smp+0x164/0x166
[ 0.004000] ? startup_32_smp+0x164/0x166
[ 0.004000] Code: ff 5d f3 c3 55 89 e5 57 56 53 64 8b 1d c0 30 1a c5 b9 10 d7 06 c5 e8 20 8a ab 00 0f 20 de 8b 7b 20 8d 8f 00 00 00 40 39 ce 74 02 <0f> ff 8b 0d 20 f3 0e c5 f7 c1 00 00 02 00 74 0f 64 8b 0d c8 30
[ 0.004000] ---[ end trace 3026cee454dd6961 ]---
[ 0.004000] Disabled fast string operations
[ 0.210014] TSC synchronization [CPU#0 -> CPU#1]:
[ 0.212000] Measured 579612 cycles TSC warp between CPUs, turning off TSC clock.
[ 0.212000] tsc: Marking TSC unstable due to check_tsc_sync_source failed
[ 0.212079] smp: Brought up 1 node, 2 CPUs
[ 0.212086] smpboot: Total of 2 processors activated (7315.04 BogoMIPS)
[ 0.213821] devtmpfs: initialized
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Pavel Machek <[email protected]> wrote:
> On Wed 2017-09-06 20:25:10, Linus Torvalds wrote:
> > On Wed, Sep 6, 2017 at 7:54 PM, Andy Lutomirski <[email protected]> wrote:
> > > Patch 1 is the fix. Patch 2 is a comment that would have kept me from
> > > chasing down a false lead.
> >
> > Yes, this seems to fix things for me. Thanks.
> >
> > Of course, right now that laptop has no working wifi with tip-of-tree
> > due to some issues with the networking tree, but that's an independent
> > thing and I could suspend and resume with this. So applied and pushed
> > out,
>
> Ok, seems this is still not completely right, I'm now getting WARN_ON
> during boot and on every resume... but machine works.
>
> 4.14-rc0, 32-bit.
Which SHA1, just to make sure? (Please enable CONFIG_LOCALVERSION_AUTO=y.)
> [ 0.004000] Initializing CPU#1
> [ 0.004000] ------------[ cut here ]------------
> [ 0.004000] WARNING: CPU: 1 PID: 0 at arch/x86/mm/tlb.c:257 initialize_tlbstate_and_flush+0x27/0xcf
> [ 0.004000] Modules linked in:
> [ 0.004000] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.13.0+ #429
> [ 0.004000] Hardware name: LENOVO 17097HU/17097HU, BIOS 7BETD8WW (2.19 ) 03/31/2011
> [ 0.004000] task: f5ca2080 task.stack: f5cc4000
> [ 0.004000] EIP: initialize_tlbstate_and_flush+0x27/0xcf
> [ 0.004000] EFLAGS: 00210087 CPU: 1
> [ 0.004000] EAX: 00000000 EBX: c506d540 ECX: 051b2000 EDX: 00000000
> [ 0.004000] ESI: 0503f000 EDI: c51b2000 EBP: f5cc5f54 ESP: f5cc5f48
> [ 0.004000] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> [ 0.004000] CR0: 80050033 CR2: 00000000 CR3: 0503f000 CR4: 000006b0
> [ 0.004000] Call Trace:
> [ 0.004000] cpu_init+0xdc/0x2f0
> [ 0.004000] start_secondary+0x34/0x1c6
> [ 0.004000] startup_32_smp+0x164/0x166
> [ 0.004000] ? startup_32_smp+0x164/0x166
Could you please try the debug patch below, so that we get a bit more info?
Thanks,
Ingo
===============>
arch/x86/mm/tlb.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 1ab3821f9e26..f98feb4b39a7 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -254,7 +254,8 @@ void initialize_tlbstate_and_flush(void)
 	unsigned long cr3 = __read_cr3();
 
 	/* Assert that CR3 already references the right mm. */
-	WARN_ON((cr3 & CR3_ADDR_MASK) != __pa(mm->pgd));
+	if (WARN_ON((cr3 & CR3_ADDR_MASK) != __pa(mm->pgd)))
+		printk("# CR3: %016lx, __pa(mm->pgd): %016lx\n", cr3, __pa(mm->pgd));
 
 	/*
 	 * Assert that CR4.PCIDE is set if needed. (CR4.PCIDE initialization
Hi!
> * Pavel Machek <[email protected]> wrote:
>
> > On Wed 2017-09-06 20:25:10, Linus Torvalds wrote:
> > > On Wed, Sep 6, 2017 at 7:54 PM, Andy Lutomirski <[email protected]> wrote:
> > > > Patch 1 is the fix. Patch 2 is a comment that would have kept me from
> > > > chasing down a false lead.
> > >
> > > Yes, this seems to fix things for me. Thanks.
> > >
> > > Of course, right now that laptop has no working wifi with tip-of-tree
> > > due to some issues with the networking tree, but that's an independent
> > > thing and I could suspend and resume with this. So applied and pushed
> > > out,
> >
> > Ok, seems this is still not completely right, I'm now getting WARN_ON
> > during boot and on every resume... but machine works.
> >
> > 4.14-rc0, 32-bit.
>
> Which SHA1, just to make sure? (Please enable CONFIG_LOCALVERSION_AUTO=y.)
46c1e79fee417f151547aa46fae04ab06cb666f4, AFAICT.
Pavel
* Pavel Machek <[email protected]> wrote:
> Hi!
>
> > * Pavel Machek <[email protected]> wrote:
> >
> > > On Wed 2017-09-06 20:25:10, Linus Torvalds wrote:
> > > > On Wed, Sep 6, 2017 at 7:54 PM, Andy Lutomirski <[email protected]> wrote:
> > > > > Patch 1 is the fix. Patch 2 is a comment that would have kept me from
> > > > > chasing down a false lead.
> > > >
> > > > Yes, this seems to fix things for me. Thanks.
> > > >
> > > > Of course, right now that laptop has no working wifi with tip-of-tree
> > > > due to some issues with the networking tree, but that's an independent
> > > > thing and I could suspend and resume with this. So applied and pushed
> > > > out,
> > >
> > > Ok, seems this is still not completely right, I'm now getting WARN_ON
> > > during boot and on every resume... but machine works.
> > >
> > > 4.14-rc0, 32-bit.
> >
> > Which SHA1, just to make sure? (Please enable CONFIG_LOCALVERSION_AUTO=y.)
>
> 46c1e79fee417f151547aa46fae04ab06cb666f4, AFAICT.
Ok, just to reiterate: that SHA1 has all relevant and current fixes included, so
this definitely is an open regression.
Thanks,
Ingo
Hi!
> > Ok, seems this is still not completely right, I'm now getting WARN_ON
> > during boot and on every resume... but machine works.
> >
> > 4.14-rc0, 32-bit.
>
> Which SHA1, just to make sure? (Please enable CONFIG_LOCALVERSION_AUTO=y.)
>
> > [ 0.004000] Initializing CPU#1
> > [ 0.004000] ------------[ cut here ]------------
> > [ 0.004000] WARNING: CPU: 1 PID: 0 at arch/x86/mm/tlb.c:257 initialize_tlbstate_and_flush+0x27/0xcf
> > [ 0.004000] Modules linked in:
> > [ 0.004000] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.13.0+ #429
> > [ 0.004000] Hardware name: LENOVO 17097HU/17097HU, BIOS 7BETD8WW (2.19 ) 03/31/2011
> > [ 0.004000] task: f5ca2080 task.stack: f5cc4000
> > [ 0.004000] EIP: initialize_tlbstate_and_flush+0x27/0xcf
> > [ 0.004000] EFLAGS: 00210087 CPU: 1
> > [ 0.004000] EAX: 00000000 EBX: c506d540 ECX: 051b2000 EDX: 00000000
> > [ 0.004000] ESI: 0503f000 EDI: c51b2000 EBP: f5cc5f54 ESP: f5cc5f48
> > [ 0.004000] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> > [ 0.004000] CR0: 80050033 CR2: 00000000 CR3: 0503f000 CR4: 000006b0
> > [ 0.004000] Call Trace:
> > [ 0.004000] cpu_init+0xdc/0x2f0
> > [ 0.004000] start_secondary+0x34/0x1c6
> > [ 0.004000] startup_32_smp+0x164/0x166
> > [ 0.004000] ? startup_32_smp+0x164/0x166
>
> Could you please try the debug patch below, so that we get a bit
> more info?
Let me pull latest...
711aab1dbb324d321e3d84368a435a78908c7bce
(Strange. Not authored by Linus and old?)
Applying patch is easy enough.
> @@ -254,7 +254,8 @@ void initialize_tlbstate_and_flush(void)
> unsigned long cr3 = __read_cr3();
>
> /* Assert that CR3 already references the right mm. */
> - WARN_ON((cr3 & CR3_ADDR_MASK) != __pa(mm->pgd));
> + if (WARN_ON((cr3 & CR3_ADDR_MASK) != __pa(mm->pgd)))
> + printk("# CR3: %016lx, __pa(mm->pgd): %016lx\n", cr3, __pa(mm->pgd));
>
> /*
> * Assert that CR4.PCIDE is set if needed. (CR4.PCIDE initialization
But result is still similar, this time with more debug information.
Best regards,
Pavel
[ 0.000000] Linux version 4.13.0+ (pavel@duo) (gcc version 4.9.2 (Debian 4.9.2-10)) #431 SMP Fri Sep 15 12:05:10 CEST 2017
[ 0.000000] Disabled fast string operations
[ 0.000000] x86/fpu: x87 FPU will use FXSAVE
[ 0.000000] e820: BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009efff] usable
[ 0.000000] BIOS-e820: [mem 0x000000000009f000-0x000000000009ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000000dc000-0x00000000000fffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bf6cffff] usable
[ 0.000000] BIOS-e820: [mem 0x00000000bf6d0000-0x00000000bf6defff] ACPI data
[ 0.000000] BIOS-e820: [mem 0x00000000bf6df000-0x00000000bf6fffff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x00000000bf700000-0x00000000bfffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000f0000000-0x00000000f3ffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fec00000-0x00000000fec0ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fed00000-0x00000000fed003ff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fed14000-0x00000000fed19fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fed1c000-0x00000000fed8ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000ff800000-0x00000000ffffffff] reserved
[ 0.000000] NX (Execute Disable) protection: active
[ 0.000000] random: fast init done
[ 0.000000] SMBIOS 2.4 present.
[ 0.000000] DMI: LENOVO 17097HU/17097HU, BIOS 7BETD8WW (2.19 ) 03/31/2011
[ 0.000000] tsc: Fast TSC calibration using PIT
[ 0.000000] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
[ 0.000000] e820: remove [mem 0x000a0000-0x000fffff] usable
[ 0.000000] e820: last_pfn = 0xbf6d0 max_arch_pfn = 0x1000000
[ 0.000000] MTRR default type: uncachable
[ 0.000000] MTRR fixed ranges enabled:
[ 0.000000] 00000-9FFFF write-back
[ 0.000000] A0000-BFFFF uncachable
[ 0.000000] C0000-CFFFF write-protect
[ 0.000000] D0000-DBFFF uncachable
[ 0.000000] DC000-DFFFF write-back
[ 0.000000] E0000-FFFFF write-protect
[ 0.000000] MTRR variable ranges enabled:
[ 0.000000] 0 base 000000000 mask F80000000 write-back
[ 0.000000] 1 base 080000000 mask FC0000000 write-back
[ 0.000000] 2 base 0BF700000 mask FFFF00000 uncachable
[ 0.000000] 3 base 0BF800000 mask FFF800000 uncachable
[ 0.000000] 4 disabled
[ 0.000000] 5 disabled
[ 0.000000] 6 disabled
[ 0.000000] 7 disabled
[ 0.000000] x86/PAT: PAT not supported by CPU.
[ 0.000000] x86/PAT: Configuration [0-7]: WB WT UC- UC WB WT UC- UC
[ 0.000000] initial memory mapped: [mem 0x00000000-0x05bfffff]
[ 0.000000] Base memory trampoline at [c009b000] 9b000 size 16384
[ 0.000000] BRK [0x0567f000, 0x0567ffff] PGTABLE
[ 0.000000] BRK [0x05680000, 0x05680fff] PGTABLE
[ 0.000000] ACPI: Early table checksum verification disabled
[ 0.000000] ACPI: RSDP 0x00000000000F67C0 000024 (v02 LENOVO)
[ 0.000000] ACPI: XSDT 0x00000000BF6D191C 000084 (v01 LENOVO TP-7B 00002190 LTP 00000000)
[ 0.000000] ACPI: FACP 0x00000000BF6D1A00 0000F4 (v03 LENOVO TP-7B 00002190 LNVO 00000001)
[ 0.000000] ACPI BIOS Warning (bug): 32/64X length mismatch in FADT/Gpe0Block: 64/32 (20170728/tbfadt-603)
[ 0.000000] ACPI BIOS Warning (bug): Optional FADT field Gpe1Block has valid Address but zero Length: 0x000000000000102C/0x0 (20170728/tbfadt-658)
[ 0.000000] ACPI: DSDT 0x00000000BF6D1D90 00CFB9 (v01 LENOVO TP-7B 00002190 MSFT 0100000E)
[ 0.000000] ACPI: FACS 0x00000000BF6F4000 000040
[ 0.000000] ACPI: FACS 0x00000000BF6F4000 000040
[ 0.000000] ACPI: SSDT 0x00000000BF6D1BB4 0001DC (v01 LENOVO TP-7B 00002190 MSFT 0100000E)
[ 0.000000] ACPI: ECDT 0x00000000BF6DED49 000052 (v01 LENOVO TP-7B 00002190 LNVO 00000001)
[ 0.000000] ACPI: TCPA 0x00000000BF6DED9B 000032 (v02 LENOVO TP-7B 00002190 LNVO 00000001)
[ 0.000000] ACPI: APIC 0x00000000BF6DEDCD 000068 (v01 LENOVO TP-7B 00002190 LNVO 00000001)
[ 0.000000] ACPI: MCFG 0x00000000BF6DEE35 00003C (v01 LENOVO TP-7B 00002190 LNVO 00000001)
[ 0.000000] ACPI: HPET 0x00000000BF6DEE71 000038 (v01 LENOVO TP-7B 00002190 LNVO 00000001)
[ 0.000000] ACPI: BOOT 0x00000000BF6DEFD8 000028 (v01 LENOVO TP-7B 00002190 LTP 00000001)
[ 0.000000] ACPI: SSDT 0x00000000BF6F2645 00025F (v01 LENOVO TP-7B 00002190 INTL 20050513)
[ 0.000000] ACPI: SSDT 0x00000000BF6F28A4 0000A6 (v01 LENOVO TP-7B 00002190 INTL 20050513)
[ 0.000000] ACPI: SSDT 0x00000000BF6F294A 0004F7 (v01 LENOVO TP-7B 00002190 INTL 20050513)
[ 0.000000] ACPI: SSDT 0x00000000BF6F2E41 0001D8 (v01 LENOVO TP-7B 00002190 INTL 20050513)
[ 0.000000] ACPI: Local APIC address 0xfee00000
[ 0.000000] 2170MB HIGHMEM available.
[ 0.000000] 891MB LOWMEM available.
[ 0.000000] mapped low ram: 0 - 37bfe000
[ 0.000000] low ram: 0 - 37bfe000
[ 0.000000] Zone ranges:
[ 0.000000] DMA [mem 0x0000000000001000-0x0000000000ffffff]
[ 0.000000] Normal [mem 0x0000000001000000-0x0000000037bfdfff]
[ 0.000000] HighMem [mem 0x0000000037bfe000-0x00000000bf6cffff]
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000000001000-0x000000000009efff]
[ 0.000000] node 0: [mem 0x0000000000100000-0x00000000bf6cffff]
[ 0.000000] Initmem setup node 0 [mem 0x0000000000001000-0x00000000bf6cffff]
[ 0.000000] On node 0 totalpages: 783982
[ 0.000000] free_area_init_node: node 0, pgdat c4f6fe00, node_mem_map f640e020
[ 0.000000] DMA zone: 32 pages used for memmap
[ 0.000000] DMA zone: 0 pages reserved
[ 0.000000] DMA zone: 3998 pages, LIFO batch:0
[ 0.000000] Normal zone: 1752 pages used for memmap
[ 0.000000] Normal zone: 224254 pages, LIFO batch:31
[ 0.000000] HighMem zone: 555730 pages, LIFO batch:31
[ 0.000000] Using APIC driver default
[ 0.000000] Reserving Intel graphics memory at 0x00000000bf800000-0x00000000bfffffff
[ 0.000000] ACPI: PM-Timer IO Port: 0x1008
[ 0.000000] ACPI: Local APIC address 0xfee00000
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
[ 0.000000] IOAPIC[0]: apic_id 1, version 32, address 0xfec00000, GSI 0-23
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[ 0.000000] ACPI: IRQ0 used by override.
[ 0.000000] ACPI: IRQ9 used by override.
[ 0.000000] Using ACPI (MADT) for SMP configuration information
[ 0.000000] ACPI: HPET id: 0x8086a201 base: 0xfed00000
[ 0.000000] smpboot: Allowing 2 CPUs, 0 hotplug CPUs
[ 0.000000] PM: Registered nosave memory: [mem 0x00000000-0x00000fff]
[ 0.000000] PM: Registered nosave memory: [mem 0x0009f000-0x0009ffff]
[ 0.000000] PM: Registered nosave memory: [mem 0x000a0000-0x000dbfff]
[ 0.000000] PM: Registered nosave memory: [mem 0x000dc000-0x000fffff]
[ 0.000000] e820: [mem 0xc0000000-0xefffffff] available for PCI devices
[ 0.000000] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
[ 0.000000] setup_percpu: NR_CPUS:8 nr_cpumask_bits:8 nr_cpu_ids:2 nr_node_ids:1
[ 0.000000] percpu: Embedded 22 pages/cpu @f63df000 s57704 r0 d32408 u90112
[ 0.000000] pcpu-alloc: s57704 r0 d32408 u90112 alloc=22*4096
[ 0.000000] pcpu-alloc: [0] 0 [0] 1
[ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 782198
[ 0.000000] Kernel command line: BOOT_IMAGE=(hd0,2)/fast/l/linux/arch/x86/boot/bzImage root=/dev/sda4 resume=/dev/sda1
[ 0.000000] PID hash table entries: 4096 (order: 2, 16384 bytes)
[ 0.000000] Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
[ 0.000000] Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
[ 0.000000] Initializing CPU#0
[ 0.000000] Initializing HighMem for node 0 (00037bfe:000bf6d0)
[ 0.000000] Initializing Movable for node 0 (00000000:00000000)
[ 0.000000] Memory: 3087308K/3135928K available (10401K kernel code, 689K rwdata, 4832K rodata, 564K init, 6176K bss, 48620K reserved, 0K cma-reserved, 2222920K highmem)
[ 0.000000] virtual kernel memory layout:
fixmap : 0xffe67000 - 0xfffff000 (1632 kB)
pkmap : 0xffc00000 - 0xffe00000 (2048 kB)
vmalloc : 0xf83fe000 - 0xffbfe000 ( 120 MB)
lowmem : 0xc0000000 - 0xf7bfe000 ( 891 MB)
.init : 0xc4fb1000 - 0xc503e000 ( 564 kB)
.data : 0xc4a287bc - 0xc4f906c0 (5535 kB)
.text : 0xc4000000 - 0xc4a287bc (10401 kB)
[ 0.000000] Checking if this processor honours the WP bit even in supervisor mode...Ok.
[ 0.000000] Hierarchical RCU implementation.
[ 0.000000] RCU restricting CPUs from NR_CPUS=8 to nr_cpu_ids=2.
[ 0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=2
[ 0.000000] NR_IRQS: 2304, nr_irqs: 440, preallocated irqs: 16
[ 0.000000] CPU 0 irqstacks, hard=f5c0e000 soft=f5c10000
[ 0.000000] Console: colour VGA+ 80x25
[ 0.000000] console [tty0] enabled
[ 0.000000] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
[ 0.000000] ... MAX_LOCKDEP_SUBCLASSES: 8
[ 0.000000] ... MAX_LOCK_DEPTH: 48
[ 0.000000] ... MAX_LOCKDEP_KEYS: 8191
[ 0.000000] ... CLASSHASH_SIZE: 4096
[ 0.000000] ... MAX_LOCKDEP_ENTRIES: 32768
[ 0.000000] ... MAX_LOCKDEP_CHAINS: 65536
[ 0.000000] ... CHAINHASH_SIZE: 32768
[ 0.000000] memory used by lock dependency info: 4383 kB
[ 0.000000] per task-struct memory footprint: 1344 bytes
[ 0.000000] clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 133484882848 ns
[ 0.000000] hpet clockevent registered
[ 0.000000] tsc: Fast TSC calibration using PIT
[ 0.004000] tsc: Detected 1828.818 MHz processor
[ 0.004000] Calibrating delay loop (skipped), value calculated using timer frequency.. 3657.63 BogoMIPS (lpj=7315272)
[ 0.004000] pid_max: default: 32768 minimum: 301
[ 0.004000] ACPI: Core revision 20170728
[ 0.067814] ACPI: 6 ACPI AML tables successfully acquired and loaded
[ 0.068163] Mount-cache hash table entries: 2048 (order: 1, 8192 bytes)
[ 0.068254] Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes)
[ 0.068999] Disabled fast string operations
[ 0.069074] CPU: Physical Processor ID: 0
[ 0.069146] CPU: Processor Core ID: 0
[ 0.069224] process: using mwait in idle threads
[ 0.069306] Last level iTLB entries: 4KB 128, 2MB 0, 4MB 2
[ 0.069385] Last level dTLB entries: 4KB 128, 2MB 0, 4MB 8, 1GB 0
[ 0.069632] Freeing SMP alternatives memory: 40K
[ 0.072271] smpboot: Max logical packages: 1
[ 0.072350] Enabling APIC mode: Flat. Using 1 I/O APICs
[ 0.073008] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[ 0.116000] smpboot: CPU0: Genuine Intel(R) CPU T2400 @ 1.83GHz (family: 0x6, model: 0xe, stepping: 0x8)
[ 0.116000] Performance Events: Core events, core PMU driver.
[ 0.116000] ... version: 1
[ 0.116000] ... bit width: 40
[ 0.116000] ... generic registers: 2
[ 0.116000] ... value mask: 000000ffffffffff
[ 0.116000] ... max period: 000000007fffffff
[ 0.116000] ... fixed-purpose events: 0
[ 0.116000] ... event mask: 0000000000000003
[ 0.116000] Hierarchical SRCU implementation.
[ 0.116000] smp: Bringing up secondary CPUs ...
[ 0.116091] NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
[ 0.116809] CPU 1 irqstacks, hard=f5cd6000 soft=f5cd8000
[ 0.116813] x86: Booting SMP configuration:
[ 0.116893] .... node #0, CPUs: #1
[ 0.004000] Initializing CPU#1
[ 0.004000] ------------[ cut here ]------------
[ 0.004000] WARNING: CPU: 1 PID: 0 at arch/x86/mm/tlb.c:257 initialize_tlbstate_and_flush+0x2e/0xed
[ 0.004000] Modules linked in:
[ 0.004000] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.13.0+ #431
[ 0.004000] Hardware name: LENOVO 17097HU/17097HU, BIOS 7BETD8WW (2.19 ) 03/31/2011
[ 0.004000] task: f5ca2080 task.stack: f5cc4000
[ 0.004000] EIP: initialize_tlbstate_and_flush+0x2e/0xed
[ 0.004000] EFLAGS: 00210087 CPU: 1
[ 0.004000] EAX: 0504b000 EBX: c4f15540 ECX: c4f15710 EDX: 00000000
[ 0.004000] ESI: 04ee7000 EDI: f5ca2080 EBP: f5cc5f54 ESP: f5cc5f44
[ 0.004000] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 0.004000] CR0: 80050033 CR2: 00000000 CR3: 04ee7000 CR4: 000006b0
[ 0.004000] Call Trace:
[ 0.004000] cpu_init+0xdc/0x2f0
[ 0.004000] start_secondary+0x34/0x1c6
[ 0.004000] startup_32_smp+0x164/0x166
[ 0.004000] ? startup_32_smp+0x164/0x166
[ 0.004000] Code: 56 53 83 ec 08 64 8b 1d c0 c0 03 c5 b9 10 57 f1 c4 e8 de 65 9e 00 89 45 f0 89 55 f4 0f 20 de 8b 43 20 05 00 00 00 40 39 c6 74 11 <0f> ff 50 56 68 74 9c d8 c4 e8 d1 cd 04 00 83 c4 0c a1 20 90 f8
[ 0.004000] ---[ end trace 7439e29925a49b51 ]---
[ 0.004000] # CR3: 0000000004ee7000, __pa(mm->pgd): 000000000504b000
[ 0.004000] Disabled fast string operations
[ 0.198012] TSC synchronization [CPU#0 -> CPU#1]:
[ 0.200000] Measured 579854 cycles TSC warp between CPUs, turning off TSC clock.
[ 0.200000] tsc: Marking TSC unstable due to check_tsc_sync_source failed
[ 0.200082] smp: Brought up 1 node, 2 CPUs
[ 0.200086] smpboot: Total of 2 processors activated (7315.14 BogoMIPS)
[ 0.201822] devtmpfs: initialized
[ 0.204623] PM: Registering ACPI NVS region [mem 0xbf6df000-0xbf6fffff] (135168 bytes)
[ 0.205027] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
On Wed 2017-09-06 19:54:52, Andy Lutomirski wrote:
> Patch 1 is the fix. Patch 2 is a comment that would have kept me from
> chasing down a false lead.
>
> I've tested patch 2 using CPU hotplug and suspend/resume. I haven't
> tested hibernation or kexec because I don't know how. (If I do
> systemctl hibernate on my laptop, it happily writes out a hibernation
> image somewhere and then it equally happily ignores it on the next
> boot. I don't know how to test kexec.)
For hibernation, pass resume=<your swap partition> on kernel command
line.
There are even docs somewhere...
Pavel
On Fri, Sep 15, 2017 at 3:22 AM, Pavel Machek <[email protected]> wrote:
>
> Let me pull latest...
>
> 711aab1dbb324d321e3d84368a435a78908c7bce
>
> (Strange. Not authored by Linus and old?)
That's the author date, the committer date is new. Top of tree right
now just happens to be a patch I applied, it's much more commonly a
merge I've done.
> But result is still similar, this time with more debug information.
> [ 0.116813] x86: Booting SMP configuration:
> [ 0.116893] .... node #0, CPUs: #1
> [ 0.004000] Initializing CPU#1
> [ 0.004000] ------------[ cut here ]------------
> [ 0.004000] WARNING: CPU: 1 PID: 0 at arch/x86/mm/tlb.c:257 initialize_tlbstate_and_flush+0x2e/0xed
> [ 0.004000] Hardware name: LENOVO 17097HU/17097HU, BIOS 7BETD8WW (2.19 ) 03/31/2011
> [ 0.004000] task: f5ca2080 task.stack: f5cc4000
> [ 0.004000] EIP: initialize_tlbstate_and_flush+0x2e/0xed
> [ 0.004000] EFLAGS: 00210087 CPU: 1
> [ 0.004000] EAX: 0504b000 EBX: c4f15540 ECX: c4f15710 EDX: 00000000
> [ 0.004000] ESI: 04ee7000 EDI: f5ca2080 EBP: f5cc5f54 ESP: f5cc5f44
> [ 0.004000] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> [ 0.004000] CR0: 80050033 CR2: 00000000 CR3: 04ee7000 CR4: 000006b0
> [ 0.004000] Call Trace:
> [ 0.004000] cpu_init+0xdc/0x2f0
> [ 0.004000] start_secondary+0x34/0x1c6
> [ 0.004000] startup_32_smp+0x164/0x166
> [ 0.004000] ? startup_32_smp+0x164/0x166
> [ 0.004000] Code: 56 53 83 ec 08 64 8b 1d c0 c0 03 c5 b9 10 57 f1 c4 e8 de 65 9e 00 89 45 f0 89 55 f4 0f 20 de 8b 43 20 05 00 00 00 40 39 c6 74 11 <0f> ff 50 56 68 74 9c d8 c4 e8 d1 cd 04 00 83 c4 0c a1 20 90 f8
> [ 0.004000] ---[ end trace 7439e29925a49b51 ]---
> [ 0.004000] # CR3: 0000000004ee7000, __pa(mm->pgd): 000000000504b000
Ok, clearly Andy didn't get the 32-bit SMP bringup path right.
Presumably tested a 32-bit UP image, or in a single-cpu VM?
Andy?
Linus
> On Sep 15, 2017, at 11:47 AM, Linus Torvalds <[email protected]> wrote:
>
>> On Fri, Sep 15, 2017 at 3:22 AM, Pavel Machek <[email protected]> wrote:
>>
>> Let me pull latest...
>>
>> 711aab1dbb324d321e3d84368a435a78908c7bce
>>
>> (Strange. Not authored by Linus and old?)
>
> That's the author date, the committer date is new. Top of tree right
> now just happens to be a patch I applied, it's much more commonly a
> merge I've done.
>
>> But result is still similar, this time with more debug information.
>> [ 0.116813] x86: Booting SMP configuration:
>> [ 0.116893] .... node #0, CPUs: #1
>> [ 0.004000] Initializing CPU#1
>> [ 0.004000] ------------[ cut here ]------------
>> [ 0.004000] WARNING: CPU: 1 PID: 0 at arch/x86/mm/tlb.c:257 initialize_tlbstate_and_flush+0x2e/0xed
>> [ 0.004000] Hardware name: LENOVO 17097HU/17097HU, BIOS 7BETD8WW (2.19 ) 03/31/2011
>> [ 0.004000] task: f5ca2080 task.stack: f5cc4000
>> [ 0.004000] EIP: initialize_tlbstate_and_flush+0x2e/0xed
>> [ 0.004000] EFLAGS: 00210087 CPU: 1
>> [ 0.004000] EAX: 0504b000 EBX: c4f15540 ECX: c4f15710 EDX: 00000000
>> [ 0.004000] ESI: 04ee7000 EDI: f5ca2080 EBP: f5cc5f54 ESP: f5cc5f44
>> [ 0.004000] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
>> [ 0.004000] CR0: 80050033 CR2: 00000000 CR3: 04ee7000 CR4: 000006b0
>> [ 0.004000] Call Trace:
>> [ 0.004000] cpu_init+0xdc/0x2f0
>> [ 0.004000] start_secondary+0x34/0x1c6
>> [ 0.004000] startup_32_smp+0x164/0x166
>> [ 0.004000] ? startup_32_smp+0x164/0x166
>> [ 0.004000] Code: 56 53 83 ec 08 64 8b 1d c0 c0 03 c5 b9 10 57 f1 c4 e8 de 65 9e 00 89 45 f0 89 55 f4 0f 20 de 8b 43 20 05 00 00 00 40 39 c6 74 11 <0f> ff 50 56 68 74 9c d8 c4 e8 d1 cd 04 00 83 c4 0c a1 20 90 f8
>> [ 0.004000] ---[ end trace 7439e29925a49b51 ]---
>> [ 0.004000] # CR3: 0000000004ee7000, __pa(mm->pgd): 000000000504b000
>
> Ok, clearly Andy didn't get the 32-bit SMP bringup path right.
> Presumably tested a 32-bit UP image, or in a single-cpu VM?
>
> Andy?
The warning only triggers on 32-bit SMP. The issue seems to be that boot_cpu_has(X86_FEATURE_PCID) is true despite setup_clear_cpu_cap. I'm still trying to figure out why. I'll hopefully have a patch after lunch.
>
> Linus
On Fri, Sep 15, 2017 at 12:29 PM, Andy Lutomirski <[email protected]> wrote:
>
>
>
> The warning only triggers on 32-bit SMP. The issue seems to be that boot_cpu_has(X86_FEATURE_PCID) is true despite setup_clear_cpu_cap. I'm still trying to figure out why. I'll hopefully have a patch after lunch.
Nah, that was a false alarm, although there's arguably a real bug here.
But I found the issue causing the warning.
x86's boot code is so inconsistent and weird that I'm a bit surprised
that it works at all sometimes.