2023-06-20 15:18:29

by Yair Podemsky

[permalink] [raw]
Subject: [PATCH v2 0/2] send tlb_remove_table_smp_sync IPI only to necessary CPUs

Currently the tlb_remove_table_smp_sync IPI is sent to all CPUs
indiscriminately, this causes unnecessary work and delays notable in
real-time use-cases and isolated cpus.
By limiting the IPI to only be sent to cpus referencing the effected
mm.
a config to differentiate architectures that support mm_cpumask from
those that don't will allow safe usage of this feature.

changes from -v1:
- Previous version included a patch to only send the IPI to CPU's with
context_tracking in the kernel space, this was removed due to race
condition concerns.
- for archs that do not maintain mm_cpumask the mask used should be
cpu_online_mask (Peter Zijlstra).

v1: https://lore.kernel.org/all/[email protected]/

Yair Podemsky (2):
arch: Introduce ARCH_HAS_CPUMASK_BITS
mm/mmu_gather: send tlb_remove_table_smp_sync IPI only to MM CPUs

arch/Kconfig | 8 ++++++++
arch/arm/Kconfig | 1 +
arch/powerpc/Kconfig | 1 +
arch/s390/Kconfig | 1 +
arch/sparc/Kconfig | 1 +
arch/x86/Kconfig | 1 +
include/asm-generic/tlb.h | 4 ++--
mm/khugepaged.c | 4 ++--
mm/mmu_gather.c | 17 ++++++++++++-----
9 files changed, 29 insertions(+), 9 deletions(-)

--
2.39.3



2023-06-21 08:11:28

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] send tlb_remove_table_smp_sync IPI only to necessary CPUs

On Tue, Jun 20, 2023 at 05:46:16PM +0300, Yair Podemsky wrote:
> Currently the tlb_remove_table_smp_sync IPI is sent to all CPUs
> indiscriminately, this causes unnecessary work and delays notable in
> real-time use-cases and isolated cpus.
> By limiting the IPI to only be sent to cpus referencing the effected
> mm.
> a config to differentiate architectures that support mm_cpumask from
> those that don't will allow safe usage of this feature.
>
> changes from -v1:
> - Previous version included a patch to only send the IPI to CPU's with
> context_tracking in the kernel space, this was removed due to race
> condition concerns.
> - for archs that do not maintain mm_cpumask the mask used should be
> cpu_online_mask (Peter Zijlstra).
>

Would it not be much better to fix the root cause? As per the last time,
there's patches that cure the thp abuse of this.

2023-06-22 13:18:26

by Yair Podemsky

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] send tlb_remove_table_smp_sync IPI only to necessary CPUs

On Wed, 2023-06-21 at 09:43 +0200, Peter Zijlstra wrote:
> On Tue, Jun 20, 2023 at 05:46:16PM +0300, Yair Podemsky wrote:
> > Currently the tlb_remove_table_smp_sync IPI is sent to all CPUs
> > indiscriminately, this causes unnecessary work and delays notable
> > in
> > real-time use-cases and isolated cpus.
> > By limiting the IPI to only be sent to cpus referencing the
> > effected
> > mm.
> > a config to differentiate architectures that support mm_cpumask
> > from
> > those that don't will allow safe usage of this feature.
> >
> > changes from -v1:
> > - Previous version included a patch to only send the IPI to CPU's
> > with
> > context_tracking in the kernel space, this was removed due to race
> > condition concerns.
> > - for archs that do not maintain mm_cpumask the mask used should be
> > cpu_online_mask (Peter Zijlstra).
> >
>
> Would it not be much better to fix the root cause? As per the last
> time,
> there's patches that cure the thp abuse of this.
>
Hi Peter,
Thanks for your reply.
There are two code paths leading to this IPI, one is the thp,
But the other is the failure to allocate page in tlb_remove_table,
It is the the second path that we are most interested in as it was
found
to cause interference in a real time process for a client (That system
did
not have thp).
So while curing thp abuses is a good thing, it will not unfortunately
solve
our root cause.
If you have any idea of how to remove the tlb_remove_table_sync_one()
usage
in the tlb_remove_table()->tlb_remove_table_one() call path -- the
usage
that's relevant for us -- that would be great. As long as we can't
remove
that, I'm afraid all we can do is optimize for it to not broadcast an
IPI
to all CPUs in the system, as done in this patch.
Thanks,
Yair


2023-06-22 13:56:27

by Marcelo Tosatti

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] send tlb_remove_table_smp_sync IPI only to necessary CPUs

On Wed, Jun 21, 2023 at 09:43:37AM +0200, Peter Zijlstra wrote:
> On Tue, Jun 20, 2023 at 05:46:16PM +0300, Yair Podemsky wrote:
> > Currently the tlb_remove_table_smp_sync IPI is sent to all CPUs
> > indiscriminately, this causes unnecessary work and delays notable in
> > real-time use-cases and isolated cpus.
> > By limiting the IPI to only be sent to cpus referencing the effected
> > mm.
> > a config to differentiate architectures that support mm_cpumask from
> > those that don't will allow safe usage of this feature.
> >
> > changes from -v1:
> > - Previous version included a patch to only send the IPI to CPU's with
> > context_tracking in the kernel space, this was removed due to race
> > condition concerns.
> > - for archs that do not maintain mm_cpumask the mask used should be
> > cpu_online_mask (Peter Zijlstra).
> >
>
> Would it not be much better to fix the root cause? As per the last time,
> there's patches that cure the thp abuse of this.

The other case where the IPI can happen is:

CPU-0 CPU-1

tlb_remove_table
tlb_remove_table_sync_one
IPI
local_irq_disable
gup_fast
local_irq_enable


So its not only the THP case.


2023-07-03 14:42:45

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] send tlb_remove_table_smp_sync IPI only to necessary CPUs

On Thu, Jun 22, 2023 at 09:47:22AM -0300, Marcelo Tosatti wrote:

> > there's patches that cure the thp abuse of this.
>
> The other case where the IPI can happen is:
>
> CPU-0 CPU-1
>
> tlb_remove_table
> tlb_remove_table_sync_one
> IPI
> local_irq_disable
> gup_fast
> local_irq_enable
>
>
> So its not only the THP case.

(your CPU-1 thing is wholly irrelevant)

Well, I know, but this case *should* be exceedingly rare. Last time
around I asked why you all were tripping this, you pointed at the THP
case.

The THP case should be fixed along the lines of Jann's original patches.

If you can trip this at any significant rate, then we should probably
look at a better allocation scheme. It means you're really low on
memory.