Commit f77084d96355 "x86/mm/pat: Disable preemption around
__flush_tlb_all()" addressed a case where __flush_tlb_all() is called
without preemption being disabled. It also left a warning to catch other
cases where preemption is not disabled. That warning triggers for the
memory hotplug path which is also used for persistent memory enabling:

 WARNING: CPU: 35 PID: 911 at ./arch/x86/include/asm/tlbflush.h:460
 RIP: 0010:__flush_tlb_all+0x1b/0x3a
 [..]
 Call Trace:
  phys_pud_init+0x29c/0x2bb
  kernel_physical_mapping_init+0xfc/0x219
  init_memory_mapping+0x1a5/0x3b0
  arch_add_memory+0x2c/0x50
  devm_memremap_pages+0x3aa/0x610
  pmem_attach_disk+0x585/0x700 [nd_pmem]

Rather than audit all __flush_tlb_all() callers to disable preemption,
just do it internally in __flush_tlb_all().
Fixes: f77084d96355 ("x86/mm/pat: Disable preemption around __flush_tlb_all()")
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
---
 arch/x86/include/asm/tlbflush.h | 8 ++++----
 arch/x86/mm/pageattr.c          | 6 +-----
 2 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index d760611cfc35..049e0aca0fb5 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -454,11 +454,10 @@ static inline void __native_flush_tlb_one_user(unsigned long addr)
 static inline void __flush_tlb_all(void)
 {
 	/*
-	 * This is to catch users with enabled preemption and the PGE feature
-	 * and don't trigger the warning in __native_flush_tlb().
+	 * Preemption needs to be disabled around __flush_tlb* calls
+	 * due to CR3 reload in __native_flush_tlb().
 	 */
-	VM_WARN_ON_ONCE(preemptible());
-
+	preempt_disable();
 	if (boot_cpu_has(X86_FEATURE_PGE)) {
 		__flush_tlb_global();
 	} else {
@@ -467,6 +466,7 @@ static inline void __flush_tlb_all(void)
 		 */
 		__flush_tlb();
 	}
+	preempt_enable();
 }

 /*
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index db7a10082238..f799076e3d57 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -2309,13 +2309,9 @@ void __kernel_map_pages(struct page *page, int numpages, int enable)

 	/*
 	 * We should perform an IPI and flush all tlbs,
-	 * but that can deadlock->flush only current cpu.
-	 * Preemption needs to be disabled around __flush_tlb_all() due to
-	 * CR3 reload in __native_flush_tlb().
+	 * but that can deadlock->flush only current cpu:
 	 */
-	preempt_disable();
 	__flush_tlb_all();
-	preempt_enable();

 	arch_flush_lazy_mmu_mode();
 }

> On Nov 9, 2018, at 4:05 PM, Dan Williams <[email protected]> wrote:
>
> Commit f77084d96355 "x86/mm/pat: Disable preemption around
> __flush_tlb_all()" addressed a case where __flush_tlb_all() is called
> without preemption being disabled. It also left a warning to catch other
> cases where preemption is not disabled. That warning triggers for the
> memory hotplug path which is also used for persistent memory enabling:

I don’t think I agree with the patch. If you call __flush_tlb_all() in a context where you might be *migrated*, then there’s a bug. We could change the code to allow this particular use by checking that we haven’t done SMP init yet, perhaps.

On Fri, Nov 9, 2018 at 4:22 PM Andy Lutomirski <[email protected]> wrote:
>
>
>
> > On Nov 9, 2018, at 4:05 PM, Dan Williams <[email protected]> wrote:
> >
> > Commit f77084d96355 "x86/mm/pat: Disable preemption around
> > __flush_tlb_all()" addressed a case where __flush_tlb_all() is called
> > without preemption being disabled. It also left a warning to catch other
> > cases where preemption is not disabled. That warning triggers for the
> > memory hotplug path which is also used for persistent memory enabling:
>
> I don’t think I agree with the patch. If you call __flush_tlb_all() in a context where you might be *migrated*, then there’s a bug. We could change the code to allow this particular use by checking that we haven’t done SMP init yet, perhaps.
Hmm, are you saying the entire kernel_physical_mapping_init() sequence
needs to run with preemption disabled?
> On Nov 10, 2018, at 3:57 PM, Dan Williams <[email protected]> wrote:
>
>> On Fri, Nov 9, 2018 at 4:22 PM Andy Lutomirski <[email protected]> wrote:
>>
>>
>>
>>> On Nov 9, 2018, at 4:05 PM, Dan Williams <[email protected]> wrote:
>>>
>>> Commit f77084d96355 "x86/mm/pat: Disable preemption around
>>> __flush_tlb_all()" addressed a case where __flush_tlb_all() is called
>>> without preemption being disabled. It also left a warning to catch other
>>> cases where preemption is not disabled. That warning triggers for the
>>> memory hotplug path which is also used for persistent memory enabling:
>>
>> I don’t think I agree with the patch. If you call __flush_tlb_all() in a context where you might be *migrated*, then there’s a bug. We could change the code to allow this particular use by checking that we haven’t done SMP init yet, perhaps.
>
> Hmm, are you saying the entire kernel_physical_mapping_init() sequence
> needs to run with preemption disabled?
If it indeed can run late in boot or after boot, then it sure looks buggy. Either the __flush_tlb_all() should be removed or it should be replaced with flush_tlb_kernel_range(). It’s unclear to me why a flush is needed at all, but if it’s needed, surely all CPUs need flushing.
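
To make the "surely all CPUs need flushing" point concrete, here is a toy userspace model, not kernel code. Everything in it (struct tlb_model, model_translate(), model_flush_local(), model_flush_all_cpus(), NR_MODEL_CPUS) is made up for illustration, and it assumes a single page-table entry that each CPU may cache. It only sketches the reasoning: a flush confined to the current CPU leaves whatever the other CPUs cached untouched.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_MODEL_CPUS 4	/* hypothetical CPU count for the model */

struct tlb_model {
	unsigned long pte;			/* the one page-table entry we track */
	unsigned long cached[NR_MODEL_CPUS];	/* per-CPU cached copy of that entry */
	bool valid[NR_MODEL_CPUS];		/* does this CPU hold a cached copy? */
};

/* A CPU translates through its cached copy, refilling from the table on a miss. */
static unsigned long model_translate(struct tlb_model *m, int cpu)
{
	assert(cpu >= 0 && cpu < NR_MODEL_CPUS);
	if (!m->valid[cpu]) {
		m->cached[cpu] = m->pte;
		m->valid[cpu] = true;
	}
	return m->cached[cpu];
}

/* Stand-in for a local-only flush in the style of __flush_tlb_all(). */
static void model_flush_local(struct tlb_model *m, int cpu)
{
	assert(cpu >= 0 && cpu < NR_MODEL_CPUS);
	m->valid[cpu] = false;
}

/* Stand-in for a flush that reaches every CPU. */
static void model_flush_all_cpus(struct tlb_model *m)
{
	for (int cpu = 0; cpu < NR_MODEL_CPUS; cpu++)
		m->valid[cpu] = false;
}
```

In this model, if CPUs 0 and 1 have both cached the entry and the entry is then changed, model_flush_local(&m, 0) makes CPU 0 see the new value while CPU 1 keeps returning the old one until model_flush_all_cpus(&m) runs. Disabling preemption around the local flush does not change that outcome; it only pins down which CPU gets flushed.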
[ added Kirill ]
On Sat, Nov 10, 2018 at 4:19 PM Andy Lutomirski <[email protected]> wrote:
> > On Nov 10, 2018, at 3:57 PM, Dan Williams <[email protected]> wrote:
> >
> >> On Fri, Nov 9, 2018 at 4:22 PM Andy Lutomirski <[email protected]> wrote:
> >>
> >>
> >>
> >>> On Nov 9, 2018, at 4:05 PM, Dan Williams <[email protected]> wrote:
> >>>
> >>> Commit f77084d96355 "x86/mm/pat: Disable preemption around
> >>> __flush_tlb_all()" addressed a case where __flush_tlb_all() is called
> >>> without preemption being disabled. It also left a warning to catch other
> >>> cases where preemption is not disabled. That warning triggers for the
> >>> memory hotplug path which is also used for persistent memory enabling:
> >>
> >> I don’t think I agree with the patch. If you call __flush_tlb_all() in a context where you might be *migrated*, then there’s a bug. We could change the code to allow this particular use by checking that we haven’t done SMP init yet, perhaps.
> >
> > Hmm, are you saying the entire kernel_physical_mapping_init() sequence
> > needs to run with preemption disabled?
>
> If it indeed can run late in boot or after boot, then it sure looks buggy. Either the __flush_tlb_all() should be removed or it should be replaced with flush_tlb_kernel_range(). It’s unclear to me why a flush is needed at all, but if it’s needed, surely all CPUs need flushing.
Yeah, I don't think __flush_tlb_all() is needed at
kernel_physical_mapping_init() time, and at
kernel_physical_mapping_remove() time we do a full flush_tlb_all().
Kirill?
On 11/10/18 4:31 PM, Dan Williams wrote:
>> If it indeed can run late in boot or after boot, then it sure looks
>> buggy. Either the __flush_tlb_all() should be removed or it should
>> be replaced with flush_tlb_kernel_range(). It’s unclear to me why a
>> flush is needed at all, but if it’s needed, surely all CPUs need
>> flushing.
> Yeah, I don't think __flush_tlb_all() is needed at
> kernel_physical_mapping_init() time, and at
> kernel_physical_mapping_remove() time we do a full flush_tlb_all().

It doesn't look strictly necessary to me. I _think_ we're only ever
populating previously non-present entries, and those never need TLB
flushes. I didn't look too deeply, so I'd appreciate anyone else
double-checking me on this.

The __flush_tlb_all() actually appears to predate git, and it was
originally intended for early boot only. It probably lasted this long
because it looks really important. :)

It was even next to where we set MMU features in CR4, which is *really*
early in boot:

> +	asm volatile("movq %%cr4,%0" : "=r" (mmu_cr4_features));
> +	__flush_tlb_all();

I also totally agree with Andy that if it were needed on the local CPU,
this code would be buggy because it doesn't initiate any *remote* TLB
flushes.

So, let's remove it, but also add some comments about not being allowed
to *change* page table entries, only populate them. We could even add
some warnings to keep this enforced.
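
As a rough illustration of what such a warning could check, here is a userspace sketch, not a proposed kernel patch. The helper name set_pte_populate_only(), the pte_t typedef and the PTE_PRESENT constant are all assumptions made for this example; the only architectural fact relied on is that bit 0 of an x86 page-table entry is the Present bit. The helper fills in a non-present entry, tolerates rewriting an identical value, and complains instead of changing an entry that is already present.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PTE_PRESENT 0x1ULL	/* bit 0 of an x86 page-table entry: Present */

typedef uint64_t pte_t;		/* hypothetical stand-in for the kernel's pte_t */

/*
 * Hypothetical populate-only setter. Returns true if the entry now holds
 * newval. Returns false, after printing a warning, if the entry was
 * already present with a different value; the entry is left untouched.
 */
static bool set_pte_populate_only(pte_t *ptep, pte_t newval)
{
	assert(ptep != NULL);

	if ((*ptep & PTE_PRESENT) && *ptep != newval) {
		fprintf(stderr, "WARN: refusing to change present entry %#llx -> %#llx\n",
			(unsigned long long)*ptep, (unsigned long long)newval);
		return false;
	}

	*ptep = newval;
	return true;
}
```

The sketch refuses the change outright, which keeps the "no flush needed" argument intact in the model. Whether a real check should refuse, or warn and carry on, is a separate decision.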