LinuxLists.cc - [PATCH 01/13] mm: Update ptep_get

2022-10-22 11:54:00

Subject: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Improve the comment.

Suggested-by: Matthew Wilcox <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
---
include/linux/pgtable.h | 17 ++++++++---------
1 file changed, 8 insertions(+), 9 deletions(-)

--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -260,15 +260,12 @@ static inline pte_t ptep_get(pte_t *ptep

#ifdef CONFIG_GUP_GET_PTE_LOW_HIGH
/*
- * WARNING: only to be used in the get_user_pages_fast() implementation.
- *
- * With get_user_pages_fast(), we walk down the pagetables without taking any
- * locks. For this we would like to load the pointers atomically, but sometimes
- * that is not possible (e.g. without expensive cmpxchg8b on x86_32 PAE). What
- * we do have is the guarantee that a PTE will only either go from not present
- * to present, or present to not present or both -- it will not switch to a
- * completely different present page without a TLB flush in between; something
- * that we are blocking by holding interrupts off.
+ * For walking the pagetables without holding any locks. Some architectures
+ * (eg x86-32 PAE) cannot load the entries atomically without using expensive
+ * instructions. We are guaranteed that a PTE will only either go from not
+ * present to present, or present to not present -- it will not switch to a
+ * completely different present page without a TLB flush inbetween; which we
+ * are blocking by holding interrupts off.
*
* Setting ptes from not present to present goes:
*

2022-10-24 05:49:58

by John Hubbard

[permalink] [raw]

Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

On 10/22/22 04:14, Peter Zijlstra wrote:
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -260,15 +260,12 @@ static inline pte_t ptep_get(pte_t *ptep
>
> #ifdef CONFIG_GUP_GET_PTE_LOW_HIGH
> /*
> - * WARNING: only to be used in the get_user_pages_fast() implementation.
> - *
> - * With get_user_pages_fast(), we walk down the pagetables without taking any
> - * locks. For this we would like to load the pointers atomically, but sometimes
> - * that is not possible (e.g. without expensive cmpxchg8b on x86_32 PAE). What
> - * we do have is the guarantee that a PTE will only either go from not present
> - * to present, or present to not present or both -- it will not switch to a
> - * completely different present page without a TLB flush in between; something
> - * that we are blocking by holding interrupts off.
> + * For walking the pagetables without holding any locks. Some architectures
> + * (eg x86-32 PAE) cannot load the entries atomically without using expensive
> + * instructions. We are guaranteed that a PTE will only either go from not
> + * present to present, or present to not present -- it will not switch to a
> + * completely different present page without a TLB flush inbetween; which we
> + * are blocking by holding interrupts off.

This is getting interesting. My latest understanding of this story is
that both the "before" and "after" versions of that comment are
incorrect! Because, as Jann Horn noticed recently [1], there might not
be any IPIs involved in a TLB flush, if x86 is running under a
hypervisor, and that breaks the chain of reasoning here.

[1] https://lore.kernel.org/all/CAG48ez3h-mnp9ZFC10v+-BW_8NQvxbwBsMYJFP8JX31o0B17Pg@mail.gmail.com/

thanks,
--
John Hubbard
NVIDIA

2022-10-24 09:00:51

by Peter Zijlstra

[permalink] [raw]

Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

On Sun, Oct 23, 2022 at 10:42:49PM -0700, John Hubbard wrote:
> On 10/22/22 04:14, Peter Zijlstra wrote:
> > --- a/include/linux/pgtable.h
> > +++ b/include/linux/pgtable.h
> > @@ -260,15 +260,12 @@ static inline pte_t ptep_get(pte_t *ptep
> >
> > #ifdef CONFIG_GUP_GET_PTE_LOW_HIGH
> > /*
> > - * WARNING: only to be used in the get_user_pages_fast() implementation.
> > - *
> > - * With get_user_pages_fast(), we walk down the pagetables without taking any
> > - * locks. For this we would like to load the pointers atomically, but sometimes
> > - * that is not possible (e.g. without expensive cmpxchg8b on x86_32 PAE). What
> > - * we do have is the guarantee that a PTE will only either go from not present
> > - * to present, or present to not present or both -- it will not switch to a
> > - * completely different present page without a TLB flush in between; something
> > - * that we are blocking by holding interrupts off.
> > + * For walking the pagetables without holding any locks. Some architectures
> > + * (eg x86-32 PAE) cannot load the entries atomically without using expensive
> > + * instructions. We are guaranteed that a PTE will only either go from not
> > + * present to present, or present to not present -- it will not switch to a
> > + * completely different present page without a TLB flush inbetween; which we
> > + * are blocking by holding interrupts off.
>
>
> This is getting interesting. My latest understanding of this story is
> that both the "before" and "after" versions of that comment are
> incorrect! Because, as Jann Horn noticed recently [1], there might not
> be any IPIs involved in a TLB flush, if x86 is running under a
> hypervisor, and that breaks the chain of reasoning here.

That mail doesn't really include enough detail. The way x86 HV TLB
flushing is supposed to work is by making use of
MMU_GATHER_RCU_TABLE_FREE. Specifically, something like:

vCPU0 vCPU1

tlb_gather_mmut(&tlb, mm);

....

local_irq_disable();
... starts page-table walk ...

<schedules out; sets KVM_VCPU_PREEMPTED>

tlb_finish_mmu(&tlb)
...
kvm_flush_tlb_multi()
if (state & KVM_VCPU_PREEMPTED)
if (try_cmpxchg(,&state, state | KVM_VCPU_FLUSH_TLB))
__cpumask_clear_cpu(cpu, flushmask);

tlb_remove_table_sync_one() / call_rcu()

<schedules back in>

... continues page-table walk ...
local_irq_enable();

If mmu gather is forced into tlb_remove_talbe_sync_one() (by memory
pressure), then you've got your IPI back, otherwise it does call_rcu()
and RCU itself will need vCPU0 to enable IRQs in order to make progress.

Either way around, the actual freeing of the pages is delayed until the
page-table walk is finished.

What am I missing?

2022-10-24 22:33:20

Subject: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Attachments:

Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Attachments:

Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Attachments:

Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Attachments:

Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Subject: mm: delay rmap removal until after TLB flush

Subject: Re: mm: delay rmap removal until after TLB flush

Subject: Re: mm: delay rmap removal until after TLB flush

Subject: Re: mm: delay rmap removal until after TLB flush

Subject: Re: mm: delay rmap removal until after TLB flush

Subject: Re: mm: delay rmap removal until after TLB flush

Subject: Re: mm: delay rmap removal until after TLB flush

Subject: Re: mm: delay rmap removal until after TLB flush

Subject: Re: mm: delay rmap removal until after TLB flush

Subject: Re: mm: delay rmap removal until after TLB flush

Subject: Re: mm: delay rmap removal until after TLB flush

Subject: Re: mm: delay rmap removal until after TLB flush

Subject: Re: mm: delay rmap removal until after TLB flush

Subject: Re: mm: delay rmap removal until after TLB flush

Subject: Re: mm: delay rmap removal until after TLB flush

Subject: Re: mm: delay rmap removal until after TLB flush