2019-09-23 09:07:30

by Leonardo Bras

[permalink] [raw]
Subject: [PATCH v2 11/11] powerpc/mm/book3s64/pgtable: Uses counting method to skip serializing

Skips slow part of serialize_against_pte_lookup if there is no running
lockless pagetable walk.

Signed-off-by: Leonardo Bras <[email protected]>
---
arch/powerpc/mm/book3s64/pgtable.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
index 13239b17a22c..41ca30269fa3 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -95,7 +95,8 @@ static void do_nothing(void *unused)
void serialize_against_pte_lookup(struct mm_struct *mm)
{
smp_mb();
- smp_call_function_many(mm_cpumask(mm), do_nothing, NULL, 1);
+ if (running_lockless_pgtbl_walk(mm))
+ smp_call_function_many(mm_cpumask(mm), do_nothing, NULL, 1);
}

void start_lockless_pgtbl_walk(struct mm_struct *mm)
--
2.20.1


2019-09-25 16:53:38

by Leonardo Bras

[permalink] [raw]
Subject: Re: [PATCH v2 11/11] powerpc/mm/book3s64/pgtable: Uses counting method to skip serializing

On Fri, 2019-09-20 at 17:48 -0700, John Hubbard wrote:
>
[...]
> So it seems that full memory barriers (not just compiler barriers) are required.
> If the irq enable/disable somehow provides that, then your new code just goes
> along for the ride and Just Works. (You don't have any memory barriers in
> start_lockless_pgtbl_walk() / end_lockless_pgtbl_walk(), just the compiler
> barriers provided by the atomic inc/dec.)
>
> So it's really a pre-existing question about the correctness of the gup_fast()
> irq disabling approach.

I am not experienced in other archs, and I am still pretty new to
Power, but by what I could understand, this behavior is better
explained in serialize_against_pte_lookup.

What happens here is that, before doing a THP split/collapse, the
function does a update of the pmd and a serialize_against_pte_lookup,
in order do avoid a invalid output on a lockless pagetable walk.

Serialize basically runs a do_nothing in every cpu related to the
process, and wait for it to return.

This running depends on interrupt being enabled, so disabling it before
gup_pgd_range() and re-enabling after the end, makes the THP
split/collapse wait for gup_pgd_range() completion in every cpu before
continuing. (here happens the lock)

(As told before, every gup_pgd_range() that occurs after it uses a
updated pmd, so no problem.)

I am sure other archs may have a similar mechanism using
local_irq_{disable,enable}.

Did it answer your questions?

Best regards,

Leonardo Bras


Attachments:
signature.asc (849.00 B)
This is a digitally signed message part

2019-09-25 23:20:14

by Leonardo Bras

[permalink] [raw]
Subject: Re: [PATCH v2 11/11] powerpc/mm/book3s64/pgtable: Uses counting method to skip serializing

On Mon, 2019-09-23 at 11:14 -0700, John Hubbard wrote:
> On 9/23/19 10:25 AM, Leonardo Bras wrote:
> [...]
> That part is all fine, but there are no run-time memory barriers in the
> atomic_inc() and atomic_dec() additions, which means that this is not
> safe, because memory operations on CPU 1 can be reordered. It's safe
> as shown *if* there are memory barriers to keep the order as shown:
>
> CPU 0 CPU 1
> ------ --------------
> atomic_inc(val) (no run-time memory barrier!)
> pmd_clear(pte)
> if (val)
> run_on_all_cpus(): IPI
> local_irq_disable() (also not a mem barrier)
>
> READ(pte)
> if(pte)
> walk page tables
>
> local_irq_enable() (still not a barrier)
> atomic_dec(val)
>
> free(pte)
>
> thanks,

This is serialize:

void serialize_against_pte_lookup(struct mm_struct *mm)
{
smp_mb();
if (running_lockless_pgtbl_walk(mm))
smp_call_function_many(mm_cpumask(mm), do_nothing,
NULL, 1);
}

That would mean:

CPU 0 CPU 1
------ --------------
atomic_inc(val)
pmd_clear(pte)
smp_mb()
if (val)
run_on_all_cpus(): IPI
local_irq_disable()

READ(pte)
if(pte)
walk page tables

local_irq_enable() (still not a barrier)
atomic_dec(val)

By https://www.kernel.org/doc/Documentation/memory-barriers.txt :
'If you need all the CPUs to see a given store at the same time, use
smp_mb().'

Is it not enough?
Do you suggest adding 'smp_mb()' after atomic_{inc,dec} ?


Attachments:
signature.asc (849.00 B)
This is a digitally signed message part

2019-09-25 23:38:43

by Leonardo Bras

[permalink] [raw]
Subject: Re: [PATCH v2 11/11] powerpc/mm/book3s64/pgtable: Uses counting method to skip serializing

On Mon, 2019-09-23 at 12:58 -0700, John Hubbard wrote:
>
> CPU 0 CPU 1
> ------ --------------
> READ(pte) (re-ordered at run time)
> atomic_inc(val) (no run-time memory barrier!)
>
> pmd_clear(pte)
> if (val)
> run_on_all_cpus(): IPI
> local_irq_disable() (also not a mem barrier)
>
> if(pte)
> walk page tables

Let me see if I can understand,
On most patches, it would be:

CPU 0 CPU 1
------ --------------
ptep = __find_linux_pte
(re-ordered at run time)
atomic_inc(val)
pmd_clear(pte)
smp_mb()
if (val)
run_on_all_cpus(): IPI
local_irq_disable()

if(ptep)
pte = *ptep;

Is that what you meant?



Attachments:
signature.asc (849.00 B)
This is a digitally signed message part

2019-09-25 23:40:48

by John Hubbard

[permalink] [raw]
Subject: Re: [PATCH v2 11/11] powerpc/mm/book3s64/pgtable: Uses counting method to skip serializing

On 9/23/19 1:23 PM, Leonardo Bras wrote:
> On Mon, 2019-09-23 at 12:58 -0700, John Hubbard wrote:
>>
>> CPU 0 CPU 1
>> ------ --------------
>> READ(pte) (re-ordered at run time)
>> atomic_inc(val) (no run-time memory barrier!)
>>
>> pmd_clear(pte)
>> if (val)
>> run_on_all_cpus(): IPI
>> local_irq_disable() (also not a mem barrier)
>>
>> if(pte)
>> walk page tables
>
> Let me see if I can understand,
> On most patches, it would be:
>
> CPU 0 CPU 1
> ------ --------------
> ptep = __find_linux_pte
> (re-ordered at run time)
> atomic_inc(val)
> pmd_clear(pte)
> smp_mb()
> if (val)
> run_on_all_cpus(): IPI
> local_irq_disable()
>
> if(ptep)
> pte = *ptep;
>
> Is that what you meant?
>
>

Yes.

thanks,
--
John Hubbard
NVIDIA

2019-09-26 08:53:14

by Leonardo Bras

[permalink] [raw]
Subject: Re: [PATCH v2 11/11] powerpc/mm/book3s64/pgtable: Uses counting method to skip serializing

John Hubbard <[email protected]> writes:

>> Is that what you meant?
>
> Yes.
>

I am still trying to understand this issue.

I am also analyzing some cases where interrupt disable is not done
before the lockless pagetable walk (patch 3 discussion).

But given I forgot to add the mm mailing list before, I think it would
be wiser to send a v3 and gather feedback while I keep trying to
understand how it works, and if it needs additional memory barrier here.

Thanks!

Leonardo Bras


Attachments:
signature.asc (849.00 B)
This is a digitally signed message part