2011-03-02 17:56:52

by Peter Zijlstra

Subject: [PATCH 09/13] unicore: mmu_gather rework

Fix up the unicore mmu_gather code to conform to the new API.

Cc: Guan Xuetao <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
---
arch/unicore32/include/asm/tlb.h | 32 ++++++++++++++++++++------------
1 file changed, 20 insertions(+), 12 deletions(-)

Index: linux-2.6/arch/unicore32/include/asm/tlb.h
===================================================================
--- linux-2.6.orig/arch/unicore32/include/asm/tlb.h
+++ linux-2.6/arch/unicore32/include/asm/tlb.h
@@ -27,17 +27,11 @@ struct mmu_gather {
unsigned long range_end;
};

-DECLARE_PER_CPU(struct mmu_gather, mmu_gathers);
-
-static inline struct mmu_gather *
-tlb_gather_mmu(struct mm_struct *mm, unsigned int full_mm_flush)
+static inline void
+tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned int fullmm)
{
- struct mmu_gather *tlb = &get_cpu_var(mmu_gathers);
-
tlb->mm = mm;
- tlb->fullmm = full_mm_flush;
-
- return tlb;
+ tlb->fullmm = fullmm;
}

static inline void
@@ -48,8 +42,6 @@ tlb_finish_mmu(struct mmu_gather *tlb, u

/* keep the page table cache within bounds */
check_pgt_cache();
-
- put_cpu_var(mmu_gathers);
}

/*
@@ -88,7 +80,23 @@ tlb_end_vma(struct mmu_gather *tlb, stru
flush_tlb_range(vma, tlb->range_start, tlb->range_end);
}

-#define tlb_remove_page(tlb, page) free_page_and_swap_cache(page)
+static inline void tlb_flush_mmu(struct mmu_gather *tlb)
+{
+}
+
+static inline int __tlb_remove_page(struct mmu_gather *tlb, struct page *page)
+{
+ free_page_and_swap_cache(page);
+ return 0;
+}
+
+static inline void tlb_remove_page(struct mmu_gather *tlb, struct page *page)
+{
+ if (__tlb_remove_page(tlb, page))
+ tlb_flush_mmu(tlb);
+}
+
+
#define pte_free_tlb(tlb, ptep, addr) pte_free((tlb)->mm, ptep)
#define pmd_free_tlb(tlb, pmdp, addr) pmd_free((tlb)->mm, pmdp)
#define pud_free_tlb(tlb, x, addr) do { } while (0)
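
For context, a minimal caller-side sketch of the reworked API (not part
of the patch; this assumes the post-rework generic mm unmap path):

	struct mmu_gather tlb;

	tlb_gather_mmu(&tlb, mm, 0);		/* 0: not a full-mm teardown */
	/* ... zap page-table entries, handing each page to tlb_remove_page() ... */
	tlb_finish_mmu(&tlb, start, end);	/* flush TLBs, then free pages */

The mmu_gather now lives on the caller's stack instead of in the per-CPU
mmu_gathers variable, which is what makes the gather preemptible.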


2011-03-04 11:59:25

by Xuetao Guan

Subject: RE: [PATCH 09/13] unicore: mmu_gather rework



> -----Original Message-----
> From: Peter Zijlstra [mailto:[email protected]]
> Sent: Thursday, March 03, 2011 1:50 AM
> To: Andrea Arcangeli; Avi Kivity; Thomas Gleixner; Rik van Riel; Ingo Molnar; [email protected]; Linus Torvalds
> Cc: [email protected]; [email protected]; [email protected]; Benjamin Herrenschmidt; David Miller; Hugh
> Dickins; Mel Gorman; Nick Piggin; Peter Zijlstra; Paul McKenney; Yanmin Zhang; Guan Xuetao
> Subject: [PATCH 09/13] unicore: mmu_gather rework
>
> Fix up the unicore mmu_gather code to conform to the new API.
>
> Cc: Guan Xuetao <[email protected]>
> Signed-off-by: Peter Zijlstra <[email protected]>
> ---
> arch/unicore32/include/asm/tlb.h | 32 ++++++++++++++++++++------------
> 1 file changed, 20 insertions(+), 12 deletions(-)
>
> [diff snipped; identical to the patch at the top of the thread]
Thanks Peter.
It looks good to me, though it depends on your patch set "mm: Preemptible mmu_gather".
While taking another look at include/asm-generic/tlb.h, I found it is also suitable for unicore32.
So I rewrote tlb.h to use the asm-generic version, and then your patch set will also work for me.

Cc: Arnd Bergmann <[email protected]>

From: GuanXuetao <[email protected]>
Date: Fri, 4 Mar 2011 20:00:11 +0800
Subject: [PATCH] unicore32: rewrite arch-specific tlb.h to use asm-generic version

Signed-off-by: Guan Xuetao <[email protected]>
---
arch/unicore32/include/asm/tlb.h | 94 +++++---------------------------------
1 files changed, 12 insertions(+), 82 deletions(-)

diff --git a/arch/unicore32/include/asm/tlb.h b/arch/unicore32/include/asm/tlb.h
index 02ee40e..9cca15c 100644
--- a/arch/unicore32/include/asm/tlb.h
+++ b/arch/unicore32/include/asm/tlb.h
@@ -12,87 +12,17 @@
#ifndef __UNICORE_TLB_H__
#define __UNICORE_TLB_H__

-#include <asm/cacheflush.h>
-#include <asm/tlbflush.h>
-#include <asm/pgalloc.h>
-
-/*
- * TLB handling. This allows us to remove pages from the page
- * tables, and efficiently handle the TLB issues.
- */
-struct mmu_gather {
- struct mm_struct *mm;
- unsigned int fullmm;
- unsigned long range_start;
- unsigned long range_end;
-};
-
-DECLARE_PER_CPU(struct mmu_gather, mmu_gathers);
-
-static inline struct mmu_gather *
-tlb_gather_mmu(struct mm_struct *mm, unsigned int full_mm_flush)
-{
- struct mmu_gather *tlb = &get_cpu_var(mmu_gathers);
-
- tlb->mm = mm;
- tlb->fullmm = full_mm_flush;
-
- return tlb;
-}
-
-static inline void
-tlb_finish_mmu(struct mmu_gather *tlb, unsigned long start, unsigned long end)
-{
- if (tlb->fullmm)
- flush_tlb_mm(tlb->mm);
-
- /* keep the page table cache within bounds */
- check_pgt_cache();
-
- put_cpu_var(mmu_gathers);
-}
-
-/*
- * Memorize the range for the TLB flush.
- */
-static inline void
-tlb_remove_tlb_entry(struct mmu_gather *tlb, pte_t *ptep, unsigned long addr)
-{
- if (!tlb->fullmm) {
- if (addr < tlb->range_start)
- tlb->range_start = addr;
- if (addr + PAGE_SIZE > tlb->range_end)
- tlb->range_end = addr + PAGE_SIZE;
- }
-}
-
-/*
- * In the case of tlb vma handling, we can optimise these away in the
- * case where we're doing a full MM flush. When we're doing a munmap,
- * the vmas are adjusted to only cover the region to be torn down.
- */
-static inline void
-tlb_start_vma(struct mmu_gather *tlb, struct vm_area_struct *vma)
-{
- if (!tlb->fullmm) {
- flush_cache_range(vma, vma->vm_start, vma->vm_end);
- tlb->range_start = TASK_SIZE;
- tlb->range_end = 0;
- }
-}
-
-static inline void
-tlb_end_vma(struct mmu_gather *tlb, struct vm_area_struct *vma)
-{
- if (!tlb->fullmm && tlb->range_end > 0)
- flush_tlb_range(vma, tlb->range_start, tlb->range_end);
-}
-
-#define tlb_remove_page(tlb, page) free_page_and_swap_cache(page)
-#define pte_free_tlb(tlb, ptep, addr) pte_free((tlb)->mm, ptep)
-#define pmd_free_tlb(tlb, pmdp, addr) pmd_free((tlb)->mm, pmdp)
-#define pud_free_tlb(tlb, x, addr) do { } while (0)
-
-#define tlb_migrate_finish(mm) do { } while (0)
+#define tlb_start_vma(tlb, vma) do { } while (0)
+#define tlb_end_vma(tlb, vma) do { } while (0)
+#define __tlb_remove_tlb_entry(tlb, ptep, address) do { } while (0)
+#define tlb_flush(tlb) flush_tlb_mm((tlb)->mm)
+
+#define __pte_free_tlb(tlb, pte, addr) \
+ do { \
+ pgtable_page_dtor(pte); \
+ tlb_remove_page((tlb), (pte)); \
+ } while (0)
+
+#include <asm-generic/tlb.h>

#endif
--
1.6.2.2
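
For reference, the generic header consumes these hooks rather than
providing them: the arch defines tlb_start_vma(), tlb_end_vma(),
__tlb_remove_tlb_entry() and tlb_flush() before the #include, and
asm-generic/tlb.h builds the public operations on top of them. A
simplified sketch of the consuming side (abridged from the 2.6.38-era
asm-generic/tlb.h; see the actual header for the full logic):

	#define tlb_remove_tlb_entry(tlb, ptep, address)		\
		do {							\
			tlb->need_flush = 1;				\
			__tlb_remove_tlb_entry(tlb, ptep, address);	\
		} while (0)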

Thanks & Regards.

Guan Xuetao


2011-03-04 12:55:11

by Peter Zijlstra

Subject: RE: [PATCH 09/13] unicore: mmu_gather rework

On Fri, 2011-03-04 at 19:56 +0800, Guan Xuetao wrote:

> Thanks Peter.
> It looks good to me, though it depends on your patch set "mm: Preemptible mmu_gather".

It is indeed; the per-arch split-out is purely to ease review. The final
commit should be a merge of the first 10 patches so as not to break
bisection.

> While taking another look at include/asm-generic/tlb.h, I found it is also suitable for unicore32.
> So I rewrote tlb.h to use the asm-generic version, and then your patch set will also work for me.

Awesome. I notice you're losing flush_tlb_range() support with this; if
you're fine with that I'm obviously not going to argue, but if it's
better for your platform to keep it we can work on that as well, since
I'm trying to add range-tracking support to the generic tlb code.

More importantly, you seem to lose your call to flush_cache_range(),
which isn't a NOP on your platform.
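
For what it's worth, a hedged sketch of how that flush could be kept
while still using the generic header (this mirrors the removed
unicore32 code and assumes the 2.6.38-era asm-generic/tlb.h, which
picks up an arch-provided tlb_start_vma() defined before the #include):

	/* Keep the cache flush instead of stubbing the hook out. */
	#define tlb_start_vma(tlb, vma)					\
		do {							\
			if (!(tlb)->fullmm)				\
				flush_cache_range(vma, (vma)->vm_start,	\
						  (vma)->vm_end);	\
		} while (0)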

Furthermore, while arch/unicore32/mm/tlb-ucv2.S is mostly magic to me, I
see unicore32 is one of the few architectures that actually uses
vm_flags in flush_tlb_range(). Do you have independent I/D-TLB flushes,
or are you flushing the I-cache on VM_EXEC?

Also, I notice your flush_tlb_range() implementation looks to be a loop
invalidating individual pages, which I can imagine is cheaper for small
ranges, but large ranges might be better off with a full invalidate. Did
you think about this?
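
A common shape for that trade-off (a sketch only; the helper name and
threshold are made up for illustration and would need per-core tuning):

	#define TLB_RANGE_LIMIT		(32 * PAGE_SIZE)	/* assumed */

	static inline void flush_tlb_range_capped(struct vm_area_struct *vma,
						  unsigned long start,
						  unsigned long end)
	{
		if (end - start > TLB_RANGE_LIMIT)
			flush_tlb_mm(vma->vm_mm);	  /* one full invalidate */
		else
			flush_tlb_range(vma, start, end); /* per-page loop */
	}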


2011-03-08 10:28:37

by Xuetao Guan

Subject: RE: [PATCH 09/13] unicore: mmu_gather rework



> -----Original Message-----
> From: Peter Zijlstra [mailto:[email protected]]
> Sent: Friday, March 04, 2011 8:17 PM
> To: Guan Xuetao
> Cc: [email protected]; [email protected]; [email protected]; 'Benjamin Herrenschmidt'; 'David Miller'; 'Hugh
> Dickins'; 'Mel Gorman'; 'Nick Piggin'; 'Paul McKenney'; 'Yanmin Zhang'; 'Andrea Arcangeli'; 'Avi Kivity'; 'Thomas Gleixner'; 'Rik van Riel';
> 'Ingo Molnar'; [email protected]; 'Linus Torvalds'; Arnd Bergmann
> Subject: RE: [PATCH 09/13] unicore: mmu_gather rework
>
> On Fri, 2011-03-04 at 19:56 +0800, Guan Xuetao wrote:
>
> > Thanks Peter.
> > It looks good to me, though it depends on your patch set "mm: Preemptible mmu_gather".
>
> It is indeed; the per-arch split-out is purely to ease review. The final
> commit should be a merge of the first 10 patches so as not to break
> bisection.
>
> > While taking another look at include/asm-generic/tlb.h, I found it is also suitable for unicore32.
> > So I rewrote tlb.h to use the asm-generic version, and then your patch set will also work for me.
>
> Awesome. I notice you're losing flush_tlb_range() support with this; if
> you're fine with that I'm obviously not going to argue, but if it's
> better for your platform to keep it we can work on that as well, since
> I'm trying to add range-tracking support to the generic tlb code.
Yes, I think flush_tlb_range() has no effect on the unicore32 architecture.
Or perhaps that is because it is unoptimized, just as you point out below.

>
> More importantly, you seem to lose your call to flush_cache_range(),
> which isn't a NOP on your platform.
IMO, flush_cache_range() is only needed for self-modifying code when the cache type is VIPT.
So it can be neglected here.
Perhaps this is wrong.

>
> Furthermore, while arch/unicore32/mm/tlb-ucv2.S is mostly magic to me, I
> see unicore32 is one of the few architectures that actually uses
> vm_flags in flush_tlb_range(). Do you have independent I/D-TLB flushes,
> or are you flushing the I-cache on VM_EXEC?
We have both independent and global I/D-TLB flushes.
And flushing the I-cache on VM_EXEC is also needed for self-modifying code, IMO.
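
(Purely illustrative: a guessed C rendering of a vm_flags-aware range
flush with split I/D TLBs. flush_dtlb_entry()/flush_itlb_entry() are
assumed helpers, not the actual tlb-ucv2.S entry points.)

	void flush_tlb_range(struct vm_area_struct *vma,
			     unsigned long start, unsigned long end)
	{
		for (start &= PAGE_MASK; start < end; start += PAGE_SIZE) {
			flush_dtlb_entry(start);		/* assumed helper */
			if (vma->vm_flags & VM_EXEC)
				flush_itlb_entry(start);	/* assumed helper */
		}
	}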

>
> Also, I notice your flush_tlb_range() implementation looks to be a loop
> invalidating individual pages, which I can imagine is cheaper for small
> ranges, but large ranges might be better off with a full invalidate. Did
> you think about this?
>
Yes, it should be optimized.
However, I doubt its effect on unicore32, which has no ASID support.

Thanks & Regards.

Guan Xuetao