This series optimizes TLB flushes on riscv, which used to simply flush the
whole TLB regardless of the size of the range to flush or the size of the
stride.

Patch 3 introduces a threshold that is microarchitecture-specific and will
very likely be modified by vendors. I'm not sure yet which mechanism we'll
use to do that (dt? alternatives? vendor initialization code?).
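
To make the threshold idea concrete, here is a minimal userspace sketch of the decision it drives. It is not the code from the patch: the `_sim` helpers, `SIM_PAGE_SIZE` and the value 64 are assumptions made up for illustration, and the counters stand in for the `sfence.vma` instructions a real implementation would issue.

```c
#include <assert.h>

#define SIM_PAGE_SIZE 4096UL	/* assumed base page size for the sketch */

/* Arbitrary illustrative value; the real one is microarchitecture specific. */
static unsigned long tlb_flush_all_threshold = 64;

/* Counters standing in for the actual flush instructions. */
static unsigned long nr_local_flushes;
static unsigned long nr_full_flushes;

/* Hypothetical stand-in for a single-entry flush (sfence.vma with an address). */
static void local_flush_tlb_page_sim(unsigned long addr)
{
	(void)addr;
	nr_local_flushes++;
}

/* Hypothetical stand-in for a full flush (sfence.vma without an address). */
static void local_flush_tlb_all_sim(void)
{
	nr_full_flushes++;
}

/*
 * Flush [start, start + size) one stride at a time, unless that would take
 * more than tlb_flush_all_threshold flushes, in which case flush everything.
 */
static void flush_tlb_range_sim(unsigned long start, unsigned long size,
				unsigned long stride)
{
	unsigned long nr_entries, i;

	assert(stride != 0);

	/* Number of stride-sized entries covering the range, rounded up. */
	nr_entries = (size + stride - 1) / stride;

	if (nr_entries > tlb_flush_all_threshold) {
		local_flush_tlb_all_sim();
		return;
	}

	for (i = 0; i < nr_entries; i++)
		local_flush_tlb_page_sim(start + i * stride);
}
```

With these assumed values, a 4-page range results in 4 single-entry flushes, while a 1024-page range exceeds the threshold and falls back to one full flush. A larger stride (for example a hugetlb page size) lowers the entry count for the same range.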

Next steps would be to implement:
- svinval extension as Mayuresh did here [1]
- BATCHED_UNMAP_TLB_FLUSH (I'll wait for arm64 patchset to land)
- MMU_GATHER_RCU_TABLE_FREE
- MMU_GATHER_MERGE_VMAS

Any other ideas are welcome.

[1] https://lore.kernel.org/linux-riscv/[email protected]/

Alexandre Ghiti (4):
  riscv: Improve flush_tlb()
  riscv: Improve flush_tlb_range() for hugetlb pages
  riscv: Make __flush_tlb_range() loop over pte instead of flushing the
    whole tlb
  riscv: Improve flush_tlb_kernel_range()

 arch/riscv/include/asm/tlb.h      |  6 +-
 arch/riscv/include/asm/tlbflush.h | 12 ++--
 arch/riscv/mm/tlbflush.c          | 93 +++++++++++++++++++++++++++----
 3 files changed, 94 insertions(+), 17 deletions(-)

--
2.39.2

Hey Alex,
On Tue, Jul 11, 2023 at 09:54:30AM +0200, Alexandre Ghiti wrote:
> This series optimizes the tlb flushes on riscv which used to simply
> flush the whole tlb whatever the size of the range to flush or the size
> of the stride.
>
> Patch 3 introduces a threshold that is microarchitecture specific and
> will very likely be modified by vendors, not sure though which mechanism
> we'll use to do that (dt? alternatives? vendor initialization code?).
>
> Next steps would be to implement:
> - svinval extension as Mayuresh did here [1]
> - BATCHED_UNMAP_TLB_FLUSH (I'll wait for arm64 patchset to land)
> - MMU_GATHER_RCU_TABLE_FREE
> - MMU_GATHER_MERGE_VMAS
>
> Any other idea welcome.
>
> [1] https://lore.kernel.org/linux-riscv/[email protected]/
>
> Alexandre Ghiti (4):
> riscv: Improve flush_tlb()
> riscv: Improve flush_tlb_range() for hugetlb pages
> riscv: Make __flush_tlb_range() loop over pte instead of flushing the
> whole tlb
The whole series does not build on nommu & this one adds a build warning
for regular builds:
+ 1 ../arch/riscv/mm/tlbflush.c:32:15: warning: symbol 'tlb_flush_all_threshold' was not declared. Should it be static?
Cheers,
Conor.
Hi Conor,
On 12/07/2023 09:08, Conor Dooley wrote:
> Hey Alex,
>
> On Tue, Jul 11, 2023 at 09:54:30AM +0200, Alexandre Ghiti wrote:
>> This series optimizes the tlb flushes on riscv which used to simply
>> flush the whole tlb whatever the size of the range to flush or the size
>> of the stride.
>>
>> Patch 3 introduces a threshold that is microarchitecture specific and
>> will very likely be modified by vendors, not sure though which mechanism
>> we'll use to do that (dt? alternatives? vendor initialization code?).
@Conor any idea how to achieve this?
>>
>> Next steps would be to implement:
>> - svinval extension as Mayuresh did here [1]
>> - BATCHED_UNMAP_TLB_FLUSH (I'll wait for arm64 patchset to land)
>> - MMU_GATHER_RCU_TABLE_FREE
>> - MMU_GATHER_MERGE_VMAS
>>
>> Any other idea welcome.
>>
>> [1] https://lore.kernel.org/linux-riscv/[email protected]/
>>
>> Alexandre Ghiti (4):
>> riscv: Improve flush_tlb()
>> riscv: Improve flush_tlb_range() for hugetlb pages
>> riscv: Make __flush_tlb_range() loop over pte instead of flushing the
>> whole tlb
> The whole series does not build on nommu & this one adds a build warning
> for regular builds:
> + 1 ../arch/riscv/mm/tlbflush.c:32:15: warning: symbol 'tlb_flush_all_threshold' was not declared. Should it be static?
>
> Cheers,
> Conor.
I'll fix the nommu build, sorry about that. Weird I missed this warning,
that's an LLVM build right? That variable will need to be overwritten by
the vendors, so that should not be static (but it will depend on what
solution we implement).
Thanks,
Alex
On Wed, Jul 12, 2023 at 05:18:00PM +0200, Alexandre Ghiti wrote:
> On 12/07/2023 09:08, Conor Dooley wrote:
> > On Tue, Jul 11, 2023 at 09:54:30AM +0200, Alexandre Ghiti wrote:
> > > This series optimizes the tlb flushes on riscv which used to simply
> > > flush the whole tlb whatever the size of the range to flush or the size
> > > of the stride.
> > >
> > > Patch 3 introduces a threshold that is microarchitecture specific and
> > > will very likely be modified by vendors, not sure though which mechanism
> > > we'll use to do that (dt? alternatives? vendor initialization code?).
>
>
> @Conor any idea how to achieve this?
It's in my queue of things to look at, just been prioritising the
extension related stuff the last few days. Hopefully I'll have a chance
to think about this tomorrow.. Famous last words probably.
> > > Next steps would be to implement:
> > > - svinval extension as Mayuresh did here [1]
> > > - BATCHED_UNMAP_TLB_FLUSH (I'll wait for arm64 patchset to land)
> > > - MMU_GATHER_RCU_TABLE_FREE
> > > - MMU_GATHER_MERGE_VMAS
> > >
> > > Any other idea welcome.
> > >
> > > [1] https://lore.kernel.org/linux-riscv/[email protected]/
> > >
> > > Alexandre Ghiti (4):
> > > riscv: Improve flush_tlb()
> > > riscv: Improve flush_tlb_range() for hugetlb pages
> > > riscv: Make __flush_tlb_range() loop over pte instead of flushing the
> > > whole tlb
> > The whole series does not build on nommu & this one adds a build warning
> > for regular builds:
> > + 1 ../arch/riscv/mm/tlbflush.c:32:15: warning: symbol 'tlb_flush_all_threshold' was not declared. Should it be static?
> >
> > Cheers,
> > Conor.
>
>
> I'll fix the nommu build, sorry about that. Weird I missed this warning,
> that's an LLVM build right? That variable will need to be overwritten by the
> vendors, so that should not be static (but it will depend on what solution
> we implement).
Just make it static until we actually have a vendor implementation of
this stuff please, since we don't know what that will look like yet.
On Wed, 12 Jul 2023 10:19:47 PDT (-0700), Conor Dooley wrote:
> On Wed, Jul 12, 2023 at 05:18:00PM +0200, Alexandre Ghiti wrote:
>> On 12/07/2023 09:08, Conor Dooley wrote:
>> > On Tue, Jul 11, 2023 at 09:54:30AM +0200, Alexandre Ghiti wrote:
>> > > This series optimizes the tlb flushes on riscv which used to simply
>> > > flush the whole tlb whatever the size of the range to flush or the size
>> > > of the stride.
>> > >
>> > > Patch 3 introduces a threshold that is microarchitecture specific and
>> > > will very likely be modified by vendors, not sure though which mechanism
>> > > we'll use to do that (dt? alternatives? vendor initialization code?).
>>
>>
>> @Conor any idea how to achieve this?
>
> It's in my queue of things to look at, just been prioritising the
> extension related stuff the last few days. Hopefully I'll have a chance
> to think about this tomorrow.. Famous last words probably.
>
>> > > Next steps would be to implement:
>> > > - svinval extension as Mayuresh did here [1]
>> > > - BATCHED_UNMAP_TLB_FLUSH (I'll wait for arm64 patchset to land)
>> > > - MMU_GATHER_RCU_TABLE_FREE
>> > > - MMU_GATHER_MERGE_VMAS
>> > >
>> > > Any other idea welcome.
>> > >
>> > > [1] https://lore.kernel.org/linux-riscv/[email protected]/
>> > >
>> > > Alexandre Ghiti (4):
>> > > riscv: Improve flush_tlb()
>> > > riscv: Improve flush_tlb_range() for hugetlb pages
>> > > riscv: Make __flush_tlb_range() loop over pte instead of flushing the
>> > > whole tlb
>> > The whole series does not build on nommu & this one adds a build warning
>> > for regular builds:
>> > + 1 ../arch/riscv/mm/tlbflush.c:32:15: warning: symbol 'tlb_flush_all_threshold' was not declared. Should it be static?
>> >
>> > Cheers,
>> > Conor.
>>
>>
>> I'll fix the nommu build, sorry about that. Weird I missed this warning,
>> that's an LLVM build right? That variable will need to be overwritten by the
>> vendors, so that should not be static (but it will depend on what solution
>> we implement).
>
> Just make it static until we actually have a vendor implementation of
> this stuff please, since we don't know what that will look like yet.

It's just a performance issue, right? IIRC the SiFive errata wasn't
actually based on how many TLB flushes happen; they're just broken in
general, so it was a probability thing.

If that's the case I agree we can just start with something arbitrary and
then figure out how to set the tunable later. It's probably going to be
workload-specific too, so we'll probably end up with both a firmware
default and a userspace override (maybe a sys entry or whatever).
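
One possible reading of "a firmware default and a userspace override" is a simple precedence chain, sketched below. Everything here is hypothetical: the struct, the function name and the built-in value 64 do not exist in the kernel, and the thread has not settled on how either source would actually be provided.

```c
#include <assert.h>
#include <stdbool.h>

/* Arbitrary built-in starting value, used when nothing else is provided. */
#define BUILTIN_TLB_FLUSH_ALL_THRESHOLD 64UL

/* Hypothetical description of where a threshold value may come from. */
struct tlb_threshold_cfg {
	bool has_firmware_default;	/* firmware supplied a value */
	unsigned long firmware_default;
	bool has_user_override;		/* userspace wrote a value */
	unsigned long user_override;
};

/*
 * Pick the effective threshold: a userspace override wins over the firmware
 * default, which in turn wins over the built-in value.
 */
static unsigned long
effective_tlb_flush_all_threshold(const struct tlb_threshold_cfg *cfg)
{
	assert(cfg);

	if (cfg->has_user_override)
		return cfg->user_override;
	if (cfg->has_firmware_default)
		return cfg->firmware_default;
	return BUILTIN_TLB_FLUSH_ALL_THRESHOLD;
}
```

Putting the userspace value first reflects the point that the right threshold is likely workload-specific, so the administrator's choice should be able to supersede whatever the platform reports.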