This series adds a few optimizations to reduce the trap cost in the TLB
flush path. SBI calls for remote TLB flushes should only be made if
absolutely required.
This series is based on Christoph's series:
http://lists.infradead.org/pipermail/linux-riscv/2019-August/006148.html
Changes from v2->v3:
1. Split the patches into smaller ones, one per optimization.
Atish Patra (3):
RISC-V: Issue a local tlbflush if possible.
RISC-V: Issue a tlb page flush if possible
RISC-V: Do not invoke SBI call if cpumask is empty
arch/riscv/mm/tlbflush.c | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)
--
2.21.0
In RISC-V, a TLB flush happens via SBI, which is expensive. If the local
cpu is the only cpu in the cpumask, there is no need to invoke an SBI call.
Just do a local flush and return.
Signed-off-by: Atish Patra <[email protected]>
---
arch/riscv/mm/tlbflush.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
index df93b26f1b9d..36430ee3bed9 100644
--- a/arch/riscv/mm/tlbflush.c
+++ b/arch/riscv/mm/tlbflush.c
@@ -2,6 +2,7 @@
#include <linux/mm.h>
#include <linux/smp.h>
+#include <linux/sched.h>
#include <asm/sbi.h>
void flush_tlb_all(void)
@@ -13,9 +14,23 @@ static void __sbi_tlb_flush_range(struct cpumask *cmask, unsigned long start,
unsigned long size)
{
struct cpumask hmask;
+ unsigned int cpuid = get_cpu();
+ if (!cmask) {
+ riscv_cpuid_to_hartid_mask(cpu_online_mask, &hmask);
+ goto issue_sfence;
+ }
+
+ if (cpumask_test_cpu(cpuid, cmask) && cpumask_weight(cmask) == 1) {
+ local_flush_tlb_all();
+ goto done;
+ }
riscv_cpuid_to_hartid_mask(cmask, &hmask);
+
+issue_sfence:
sbi_remote_sfence_vma(hmask.bits, start, size);
+done:
+ put_cpu();
}
void flush_tlb_mm(struct mm_struct *mm)
--
2.21.0
If the tlbflush request is for a single page only, there is no need to do a
complete local TLB shootdown.
Just do a local TLB flush for the given address.
Signed-off-by: Atish Patra <[email protected]>
---
arch/riscv/mm/tlbflush.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
index 36430ee3bed9..9f58b3790baa 100644
--- a/arch/riscv/mm/tlbflush.c
+++ b/arch/riscv/mm/tlbflush.c
@@ -22,7 +22,10 @@ static void __sbi_tlb_flush_range(struct cpumask *cmask, unsigned long start,
}
if (cpumask_test_cpu(cpuid, cmask) && cpumask_weight(cmask) == 1) {
- local_flush_tlb_all();
+ if (size <= PAGE_SIZE && size != -1)
+ local_flush_tlb_page(start);
+ else
+ local_flush_tlb_all();
goto done;
}
riscv_cpuid_to_hartid_mask(cmask, &hmask);
--
2.21.0
SBI calls are expensive. If the cpumask is empty, there is no need to
trap via SBI as no remote TLB flushing is required.
Signed-off-by: Atish Patra <[email protected]>
---
arch/riscv/mm/tlbflush.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
index 9f58b3790baa..2bd3c418d769 100644
--- a/arch/riscv/mm/tlbflush.c
+++ b/arch/riscv/mm/tlbflush.c
@@ -21,6 +21,9 @@ static void __sbi_tlb_flush_range(struct cpumask *cmask, unsigned long start,
goto issue_sfence;
}
+ if (cpumask_empty(cmask))
+ goto done;
+
if (cpumask_test_cpu(cpuid, cmask) && cpumask_weight(cmask) == 1) {
if (size <= PAGE_SIZE && size != -1)
local_flush_tlb_page(start);
--
2.21.0
On Wed, Aug 21, 2019 at 05:46:43PM -0700, Atish Patra wrote:
> + if (size <= PAGE_SIZE && size != -1)
> + local_flush_tlb_page(start);
> + else
> + local_flush_tlb_all();
As Andreas pointed out, (unsigned long)-1 is actually larger than
PAGE_SIZE, so we don't need the extra check.
On Wed, Aug 21, 2019 at 05:46:44PM -0700, Atish Patra wrote:
> SBI calls are expensive. If cpumask is empty, there is no need to
> trap via SBI as no remote tlb flushing is required.
>
> Signed-off-by: Atish Patra <[email protected]>
> ---
> arch/riscv/mm/tlbflush.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
> index 9f58b3790baa..2bd3c418d769 100644
> --- a/arch/riscv/mm/tlbflush.c
> +++ b/arch/riscv/mm/tlbflush.c
> @@ -21,6 +21,9 @@ static void __sbi_tlb_flush_range(struct cpumask *cmask, unsigned long start,
> goto issue_sfence;
> }
>
> + if (cpumask_empty(cmask))
> + goto done;
I think this can even be done before the get_cpu() to optimize it a little
further.
On Thu, 2019-08-22 at 03:50 +0200, Christoph Hellwig wrote:
> On Wed, Aug 21, 2019 at 05:46:43PM -0700, Atish Patra wrote:
> > + if (size <= PAGE_SIZE && size != -1)
> > + local_flush_tlb_page(start);
> > + else
> > + local_flush_tlb_all();
>
> As Andreas pointed out (unsigned long)-1 is actually larger than
> PAGE_SIZE, so we don't need the extra check.
Ahh, yes. Sorry, I missed his comment in the earlier email. Fixed it.
--
Regards,
Atish
On Thu, 2019-08-22 at 03:46 +0200, Christoph Hellwig wrote:
> On Wed, Aug 21, 2019 at 05:46:42PM -0700, Atish Patra wrote:
> > In RISC-V, a TLB flush happens via SBI, which is expensive. If the local
> > cpu is the only cpu in the cpumask, there is no need to invoke an SBI call.
> >
> > Just do a local flush and return.
> >
> > Signed-off-by: Atish Patra <[email protected]>
> > ---
> > arch/riscv/mm/tlbflush.c | 15 +++++++++++++++
> > 1 file changed, 15 insertions(+)
> >
> > diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
> > index df93b26f1b9d..36430ee3bed9 100644
> > --- a/arch/riscv/mm/tlbflush.c
> > +++ b/arch/riscv/mm/tlbflush.c
> > @@ -2,6 +2,7 @@
> >
> > #include <linux/mm.h>
> > #include <linux/smp.h>
> > +#include <linux/sched.h>
> > #include <asm/sbi.h>
> >
> > void flush_tlb_all(void)
> > @@ -13,9 +14,23 @@ static void __sbi_tlb_flush_range(struct cpumask
> > *cmask, unsigned long start,
> > unsigned long size)
> > {
> > struct cpumask hmask;
> > + unsigned int cpuid = get_cpu();
> >
> > + if (!cmask) {
> > + riscv_cpuid_to_hartid_mask(cpu_online_mask, &hmask);
> > + goto issue_sfence;
> > + }
> > +
> > + if (cpumask_test_cpu(cpuid, cmask) && cpumask_weight(cmask) ==
> > 1) {
> > + local_flush_tlb_all();
> > + goto done;
> > + }
>
> I think a single core on an SMP kernel is a valid enough use case given
> how few distros still have UP kernels. So maybe this should rather be:
>
> if (!cmask)
> cmask = cpu_online_mask;
>
> if (cpumask_test_cpu(cpuid, cmask) && cpumask_weight(cmask) ==
> 1) {
> local_flush_tlb_all();
> } else {
> riscv_cpuid_to_hartid_mask(cmask, &hmask);
> sbi_remote_sfence_vma(hmask.bits, start, size);
> }
The downside of this is that, for every !cmask case in true SMP (probably
the more common case), it will execute two extra cpumask instructions. As
the tlbflush path is performance critical, I think we should favor the more
common case (SMP with more than one core).
Thoughts?
--
Regards,
Atish
On Wed, Aug 21, 2019 at 05:46:42PM -0700, Atish Patra wrote:
> In RISC-V, a TLB flush happens via SBI, which is expensive. If the local
> cpu is the only cpu in the cpumask, there is no need to invoke an SBI call.
>
> Just do a local flush and return.
>
> Signed-off-by: Atish Patra <[email protected]>
> ---
> arch/riscv/mm/tlbflush.c | 15 +++++++++++++++
> 1 file changed, 15 insertions(+)
>
> diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
> index df93b26f1b9d..36430ee3bed9 100644
> --- a/arch/riscv/mm/tlbflush.c
> +++ b/arch/riscv/mm/tlbflush.c
> @@ -2,6 +2,7 @@
>
> #include <linux/mm.h>
> #include <linux/smp.h>
> +#include <linux/sched.h>
> #include <asm/sbi.h>
>
> void flush_tlb_all(void)
> @@ -13,9 +14,23 @@ static void __sbi_tlb_flush_range(struct cpumask *cmask, unsigned long start,
> unsigned long size)
> {
> struct cpumask hmask;
> + unsigned int cpuid = get_cpu();
>
> + if (!cmask) {
> + riscv_cpuid_to_hartid_mask(cpu_online_mask, &hmask);
> + goto issue_sfence;
> + }
> +
> + if (cpumask_test_cpu(cpuid, cmask) && cpumask_weight(cmask) == 1) {
> + local_flush_tlb_all();
> + goto done;
> + }
I think a single core on an SMP kernel is a valid enough use case given
how few distros still have UP kernels. So maybe this should rather be:
if (!cmask)
cmask = cpu_online_mask;
if (cpumask_test_cpu(cpuid, cmask) && cpumask_weight(cmask) == 1) {
local_flush_tlb_all();
} else {
riscv_cpuid_to_hartid_mask(cmask, &hmask);
sbi_remote_sfence_vma(hmask.bits, start, size);
}
On Thu, 2019-08-22 at 03:51 +0200, Christoph Hellwig wrote:
> On Wed, Aug 21, 2019 at 05:46:44PM -0700, Atish Patra wrote:
> > SBI calls are expensive. If cpumask is empty, there is no need to
> > trap via SBI as no remote tlb flushing is required.
> >
> > Signed-off-by: Atish Patra <[email protected]>
> > ---
> > arch/riscv/mm/tlbflush.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
> > index 9f58b3790baa..2bd3c418d769 100644
> > --- a/arch/riscv/mm/tlbflush.c
> > +++ b/arch/riscv/mm/tlbflush.c
> > @@ -21,6 +21,9 @@ static void __sbi_tlb_flush_range(struct cpumask
> > *cmask, unsigned long start,
> > goto issue_sfence;
> > }
> >
> > + if (cpumask_empty(cmask))
> > + goto done;
>
> I think this can even be done before the get_cpu() to optimize it a
> little further.
Yeah. I can just return directly in this case and call get_cpu after
this. Thanks for the suggestion.
--
Regards,
Atish
On Thu, Aug 22, 2019 at 04:01:24AM +0000, Atish Patra wrote:
> The downside of this is that, for every !cmask case in true SMP (probably
> the more common case), it will execute two extra cpumask instructions. As
> the tlbflush path is performance critical, I think we should favor the
> more common case (SMP with more than one core).
Actually, looking at both the current mainline code and the code from my
cleanups tree, I don't think remote_sfence_vma / __sbi_tlb_flush_range
can ever be called with a NULL cpumask, as we always have a valid mm.
So this is a bit of a moot point, and we can drop handling that case
entirely. With that we can also use a simple if / else for the local
cpu only vs remote case. Btw, what was the reason you didn't like
using cpumask_any_but like x86, which should be more efficient than
cpumask_test_cpu + hweight?
On Thu, 2019-08-22 at 06:27 +0200, [email protected] wrote:
> On Thu, Aug 22, 2019 at 04:01:24AM +0000, Atish Patra wrote:
> > The downside of this is that, for every !cmask case in true SMP
> > (probably the more common case), it will execute two extra cpumask
> > instructions. As the tlbflush path is performance critical, I think
> > we should favor the more common case (SMP with more than one core).
>
> Actually, looking at both the current mainline code and the code from my
> cleanups tree, I don't think remote_sfence_vma / __sbi_tlb_flush_range
> can ever be called with a NULL cpumask, as we always have a valid mm.
>
Yes. You are correct.
As both cpumask functions here will crash if the cpumask is NULL, we
should probably leave a comment warning about the consequence of the
cpumask being NULL.
> So this is a bit of a moot point, and we can drop handling that case
> entirely. With that we can also use a simple if / else for the local
> cpu only vs remote case.
Done.
> Btw, what was the reason you didn't like
> using cpumask_any_but like x86, which should be more efficient than
> cpumask_test_cpu + hweight?
I had it in the v2 patch but removed it, as it can potentially return a
garbage value if the cpumask is empty.
However, we are already checking for an empty cpumask before the local
cpu check. I will replace cpumask_test_cpu + hweight with
cpumask_any_but().
--
Regards,
Atish