2019-06-13 16:42:40

by Nadav Amit

Subject: [PATCH 5/9] x86/mm/tlb: Optimize local TLB flushes

While the updated smp infrastructure is capable of running a function on
a single local core, it is not optimized for this case. The multiple
function calls and the indirect branch introduce some overhead, making
local TLB flushes slower than they were before the recent changes.

Before calling the SMP infrastructure, check whether only a local TLB
flush is needed, which restores the lost performance in this common
case. This requires checking mm_cpumask() one more time, but unless
this mask is updated very frequently, the extra check should not hurt
performance.

Cc: Dave Hansen <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Nadav Amit <[email protected]>
---
arch/x86/mm/tlb.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index db73d5f1dd43..ceb03b8cad32 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -815,8 +815,12 @@ static void flush_tlb_on_cpus(const cpumask_t *cpumask,
const struct flush_tlb_info *info)
{
int this_cpu = smp_processor_id();
+ bool flush_others = false;

- if (static_branch_likely(&flush_tlb_multi_enabled)) {
+ if (cpumask_any_but(cpumask, this_cpu) < nr_cpu_ids)
+ flush_others = true;
+
+ if (static_branch_likely(&flush_tlb_multi_enabled) && flush_others) {
flush_tlb_multi(cpumask, info);
return;
}
@@ -828,7 +832,7 @@ static void flush_tlb_on_cpus(const cpumask_t *cpumask,
local_irq_enable();
}

- if (cpumask_any_but(cpumask, this_cpu) < nr_cpu_ids)
+ if (flush_others)
flush_tlb_others(cpumask, info);
}

--
2.20.1


2019-06-25 21:37:24

by Dave Hansen

Subject: Re: [PATCH 5/9] x86/mm/tlb: Optimize local TLB flushes

On 6/12/19 11:48 PM, Nadav Amit wrote:
> While the updated smp infrastructure is capable of running a function on
> a single local core, it is not optimized for this case.

OK, so flush_tlb_multi() is optimized for flushing local+remote at the
same time and is also (near?) the most optimal way to flush remote-only.
But, it's not as optimized at doing local-only flushes. But,
flush_tlb_on_cpus() *is* optimized for local-only flushes.

Right?

Can we distill that down to any particular advice that we can comment
these suckers with? For instance, flush_tlb_multi() is apparently never
safe to call by itself without a 'flush_tlb_multi_enabled' check
first. It's also suboptimal for local flushes, apparently.
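
As a purely illustrative sketch (not from the patch; the exact rules are
an assumption based on this thread), the kind of comment being asked for
might look like:

/*
 * flush_tlb_multi() flushes the TLB on the local CPU and on the remote
 * CPUs in @cpumask.  Callers must check flush_tlb_multi_enabled first;
 * when it is clear (e.g. on the Xen/Hyper-V fallback paths), use
 * flush_tlb_on_cpus() instead, which also handles local-only flushes
 * more cheaply.
 */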


2019-06-26 16:34:15

by Andy Lutomirski

Subject: Re: [PATCH 5/9] x86/mm/tlb: Optimize local TLB flushes

On Tue, Jun 25, 2019 at 2:36 PM Dave Hansen <[email protected]> wrote:
>
> On 6/12/19 11:48 PM, Nadav Amit wrote:
> > While the updated smp infrastructure is capable of running a function on
> > a single local core, it is not optimized for this case.
>
> OK, so flush_tlb_multi() is optimized for flushing local+remote at the
> same time and is also (near?) the most optimal way to flush remote-only.
> But, it's not as optimized at doing local-only flushes. But,
> flush_tlb_on_cpus() *is* optimized for local-only flushes.

Can we stick the optimization into flush_tlb_multi() in the interest
of keeping this stuff readable?

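One possible shape of that, as a rough sketch only (the local-flush helper
and the slow-path name are placeholders, not taken from the series):

void flush_tlb_multi(const struct cpumask *cpumask,
		     const struct flush_tlb_info *info)
{
	/* Fast path: no CPU other than the local one needs a flush. */
	if (cpumask_any_but(cpumask, smp_processor_id()) >= nr_cpu_ids) {
		lockdep_assert_irqs_enabled();
		local_irq_disable();
		flush_tlb_func_local(info);	/* assumed local-flush helper */
		local_irq_enable();
		return;
	}

	/* Slow path: IPI-based local+remote flush, as today. */
	__flush_tlb_multi(cpumask, info);	/* placeholder for the existing path */
}
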
Also, would this series be easier to understand if there was a patch
to just remove the UV optimization before making other changes?

--Andy

2019-06-26 16:41:15

by Nadav Amit

Subject: Re: [PATCH 5/9] x86/mm/tlb: Optimize local TLB flushes

> On Jun 26, 2019, at 9:33 AM, Andy Lutomirski <[email protected]> wrote:
>
> On Tue, Jun 25, 2019 at 2:36 PM Dave Hansen <[email protected]> wrote:
>> On 6/12/19 11:48 PM, Nadav Amit wrote:
>>> While the updated smp infrastructure is capable of running a function on
>>> a single local core, it is not optimized for this case.
>>
>> OK, so flush_tlb_multi() is optimized for flushing local+remote at the
>> same time and is also (near?) the most optimal way to flush remote-only.
>> But, it's not as optimized at doing local-only flushes. But,
>> flush_tlb_on_cpus() *is* optimized for local-only flushes.
>
> Can we stick the optimization into flush_tlb_multi() in the interest
> of keeping this stuff readable?

flush_tlb_on_cpus() will be much simpler once I remove the fallback
path that is in there for Xen and hyper-v. I can then open-code it in
flush_tlb_mm_range() and arch_tlbbatch_flush().

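Roughly, the open-coded call site might then look like this (a sketch that
assumes the Xen/Hyper-V fallback is gone; the local-flush helper is
illustrative, not from the series):

	/* In flush_tlb_mm_range(), once flush_tlb_on_cpus() is removed: */
	if (cpumask_any_but(mm_cpumask(mm), smp_processor_id()) < nr_cpu_ids) {
		flush_tlb_multi(mm_cpumask(mm), info);
	} else if (cpumask_test_cpu(smp_processor_id(), mm_cpumask(mm))) {
		lockdep_assert_irqs_enabled();
		local_irq_disable();
		flush_tlb_func_local(info);	/* assumed local-flush helper */
		local_irq_enable();
	}
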
>
> Also, would this series be easier to understand if there was a patch
> to just remove the UV optimization before making other changes?

If you just want me to remove it, I can do it. I don’t know who uses it and
what the impact might be.

2019-06-26 16:51:17

by Andy Lutomirski

Subject: Re: [PATCH 5/9] x86/mm/tlb: Optimize local TLB flushes

On Wed, Jun 26, 2019 at 9:39 AM Nadav Amit <[email protected]> wrote:
>
> > On Jun 26, 2019, at 9:33 AM, Andy Lutomirski <[email protected]> wrote:
> >
> > On Tue, Jun 25, 2019 at 2:36 PM Dave Hansen <[email protected]> wrote:
> >> On 6/12/19 11:48 PM, Nadav Amit wrote:
> >>> While the updated smp infrastructure is capable of running a function on
> >>> a single local core, it is not optimized for this case.
> >>
> >> OK, so flush_tlb_multi() is optimized for flushing local+remote at the
> >> same time and is also (near?) the most optimal way to flush remote-only.
> >> But, it's not as optimized at doing local-only flushes. But,
> >> flush_tlb_on_cpus() *is* optimized for local-only flushes.
> >
> > Can we stick the optimization into flush_tlb_multi() in the interest
> > of keeping this stuff readable?
>
> flush_tlb_on_cpus() will be much simpler once I remove the fallback
> path that is in there for Xen and hyper-v. I can then open-code it in
> flush_tlb_mm_range() and arch_tlbbatch_flush().
>
> >
> > Also, would this series be easier to understand if there was a patch
> > to just remove the UV optimization before making other changes?
>
> If you just want me to remove it, I can do it. I don’t know who uses it and
> what the impact might be.
>

Only if you think it simplifies things. The impact will be somewhat
slower flushes on affected hardware. The UV maintainers know how to
fix this more sustainably, and maybe this will encourage them to do it
:)