2023-02-27 17:58:00

by Matthew Wilcox

Subject: [PATCH v2 00/30] New page table range API

This patchset changes the API used by the MM to set up page table entries.
The four APIs are:
set_ptes(mm, addr, ptep, pte, nr)
update_mmu_cache_range(vma, addr, ptep, nr)
flush_dcache_folio(folio)
flush_icache_pages(vma, page, nr)
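
Roughly, a caller strings them together like this (illustrative only;
see the filemap and memory.c patches at the end of the series for the
real code):

    flush_icache_pages(vma, page, nr);
    set_ptes(vma->vm_mm, addr, ptep, mk_pte(page, vma->vm_page_prot), nr);
    update_mmu_cache_range(vma, addr, ptep, nr);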

flush_dcache_folio() isn't technically new, but no architecture
implemented it, so I've done that for you. The old APIs remain around
but are mostly implemented by calling the new interfaces.
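
For example, most of the per-architecture patches below end up with
wrappers of this shape:

    #define flush_icache_page(vma, page) flush_icache_pages(vma, page, 1)
    #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
    #define update_mmu_cache(vma, addr, ptep) \
            update_mmu_cache_range(vma, addr, ptep, 1)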

The new APIs are based around setting up N page table entries at once.
The N entries belong to the same PMD, the same folio and the same VMA,
so ptep++ is a legitimate operation, and locking is taken care of for
you. Some architectures can do a better job of it than just a loop,
but I have hesitated to make too deep a change to architectures I don't
understand well.
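
For the architectures where I've just used a loop, set_ptes() ends up
looking roughly like this (a sketch only; the increment that advances
the PFN is architecture-specific, e.g. PAGE_SIZE here versus
1 << _PFN_SHIFT on mips and loongarch):

    static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
                    pte_t *ptep, pte_t pte, unsigned int nr)
    {
            for (;;) {
                    set_pte(ptep, pte);
                    if (--nr == 0)
                            break;
                    ptep++;
                    /* advance the PTE to refer to the next page */
                    pte_val(pte) += PAGE_SIZE;
            }
    }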

One thing I have changed in every architecture is that PG_arch_1 is now a
per-folio bit instead of a per-page bit. This was something that would
have to happen eventually, and it makes sense to do it now rather than
iterating over every page involved in a cache flush to figure out
whether each one needs flushing.
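
The resulting pattern is that the deferred-flush check is done once per
folio rather than once per page; for instance (taken from the arm64
conversion below, the other architectures are analogous):

    if (!test_bit(PG_dcache_clean, &folio->flags)) {
            sync_icache_aliases((unsigned long)folio_address(folio),
                                (unsigned long)folio_address(folio) +
                                folio_size(folio));
            set_bit(PG_dcache_clean, &folio->flags);
    }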

The point of all this is better performance, and Fengwei Yin has
measured improvement on x86. I suspect you'll see improvement on
your architecture too. Try the new will-it-scale test mentioned here:
https://lore.kernel.org/linux-mm/[email protected]/
You'll need to run it on an XFS filesystem and have
CONFIG_TRANSPARENT_HUGEPAGE set.

For testing, I've only run the code on x86. If an x86->foo compiler
exists in Debian, I've built defconfig. I'm relying on the buildbots
to tell me what I missed, and on people who have the hardware to tell
me whether it actually works.

I'd like to get this into the MM tree soon after the current merge window
closes, so quick feedback would be appreciated.

Matthew Wilcox (Oracle) (26):
mm: Convert page_table_check_pte_set() to page_table_check_ptes_set()
mm: Add generic flush_icache_pages() and documentation
mm: Add folio_flush_mapping()
mm: Remove ARCH_IMPLEMENTS_FLUSH_DCACHE_FOLIO
alpha: Implement the new page table range API
arc: Implement the new page table range API
arm64: Implement the new page table range API
csky: Implement the new page table range API
hexagon: Implement the new page table range API
ia64: Implement the new page table range API
loongarch: Implement the new page table range API
m68k: Implement the new page table range API
microblaze: Implement the new page table range API
mips: Implement the new page table range API
nios2: Implement the new page table range API
openrisc: Implement the new page table range API
parisc: Implement the new page table range API
powerpc: Implement the new page table range API
riscv: Implement the new page table range API
s390: Implement the new page table range API
superh: Implement the new page table range API
sparc32: Implement the new page table range API
sparc64: Implement the new page table range API
um: Implement the new page table range API
x86: Implement the new page table range API
xtensa: Implement the new page table range API

Yin Fengwei (4):
filemap: Add filemap_map_folio_range()
rmap: add folio_add_file_rmap_range()
mm: Convert do_set_pte() to set_pte_range()
filemap: Batch PTE mappings

Documentation/core-api/cachetlb.rst | 35 +++----
Documentation/filesystems/locking.rst | 2 +-
arch/alpha/include/asm/cacheflush.h | 10 ++
arch/alpha/include/asm/pgtable.h | 18 +++-
arch/arc/include/asm/cacheflush.h | 7 +-
arch/arc/include/asm/pgtable-bits-arcv2.h | 20 +++-
arch/arc/mm/cache.c | 61 +++++++-----
arch/arc/mm/tlb.c | 18 ++--
arch/arm/include/asm/cacheflush.h | 24 +++--
arch/arm/include/asm/pgtable.h | 5 +-
arch/arm/include/asm/tlbflush.h | 13 ++-
arch/arm/mm/copypage-v4mc.c | 5 +-
arch/arm/mm/copypage-v6.c | 5 +-
arch/arm/mm/copypage-xscale.c | 5 +-
arch/arm/mm/dma-mapping.c | 24 ++---
arch/arm/mm/fault-armv.c | 14 +--
arch/arm/mm/flush.c | 99 +++++++++++--------
arch/arm/mm/mm.h | 2 +-
arch/arm/mm/mmu.c | 14 ++-
arch/arm64/include/asm/cacheflush.h | 4 +-
arch/arm64/include/asm/pgtable.h | 25 +++--
arch/arm64/mm/flush.c | 36 +++----
arch/csky/abiv1/cacheflush.c | 32 ++++---
arch/csky/abiv1/inc/abi/cacheflush.h | 2 +
arch/csky/abiv2/cacheflush.c | 30 +++---
arch/csky/abiv2/inc/abi/cacheflush.h | 10 +-
arch/csky/include/asm/pgtable.h | 21 +++-
arch/hexagon/include/asm/cacheflush.h | 7 +-
arch/hexagon/include/asm/pgtable.h | 16 +++-
arch/ia64/hp/common/sba_iommu.c | 26 ++---
arch/ia64/include/asm/cacheflush.h | 14 ++-
arch/ia64/include/asm/pgtable.h | 14 ++-
arch/ia64/mm/init.c | 29 ++++--
arch/loongarch/include/asm/cacheflush.h | 2 +
arch/loongarch/include/asm/pgtable.h | 30 ++++--
arch/m68k/include/asm/cacheflush_mm.h | 26 +++--
arch/m68k/include/asm/pgtable_mm.h | 21 +++-
arch/m68k/mm/motorola.c | 2 +-
arch/microblaze/include/asm/cacheflush.h | 8 ++
arch/microblaze/include/asm/pgtable.h | 17 +++-
arch/microblaze/include/asm/tlbflush.h | 4 +-
arch/mips/include/asm/cacheflush.h | 32 ++++---
arch/mips/include/asm/pgtable.h | 36 ++++---
arch/mips/mm/c-r4k.c | 5 +-
arch/mips/mm/cache.c | 56 +++++------
arch/mips/mm/init.c | 17 ++--
arch/nios2/include/asm/cacheflush.h | 6 +-
arch/nios2/include/asm/pgtable.h | 27 ++++--
arch/nios2/mm/cacheflush.c | 61 ++++++------
arch/openrisc/include/asm/cacheflush.h | 8 +-
arch/openrisc/include/asm/pgtable.h | 27 +++++-
arch/openrisc/mm/cache.c | 12 ++-
arch/parisc/include/asm/cacheflush.h | 14 ++-
arch/parisc/include/asm/pgtable.h | 28 ++++--
arch/parisc/kernel/cache.c | 101 ++++++++++++++------
arch/powerpc/include/asm/book3s/pgtable.h | 10 +-
arch/powerpc/include/asm/cacheflush.h | 14 ++-
arch/powerpc/include/asm/kvm_ppc.h | 10 +-
arch/powerpc/include/asm/nohash/pgtable.h | 13 +--
arch/powerpc/include/asm/pgtable.h | 6 ++
arch/powerpc/mm/book3s64/hash_utils.c | 11 ++-
arch/powerpc/mm/cacheflush.c | 81 ++--------------
arch/powerpc/mm/nohash/e500_hugetlbpage.c | 3 +-
arch/powerpc/mm/pgtable.c | 51 +++++-----
arch/riscv/include/asm/cacheflush.h | 19 ++--
arch/riscv/include/asm/pgtable.h | 26 +++--
arch/riscv/mm/cacheflush.c | 11 +--
arch/s390/include/asm/pgtable.h | 34 +++++--
arch/sh/include/asm/cacheflush.h | 21 ++--
arch/sh/include/asm/pgtable.h | 6 +-
arch/sh/include/asm/pgtable_32.h | 16 +++-
arch/sh/mm/cache-j2.c | 4 +-
arch/sh/mm/cache-sh4.c | 26 +++--
arch/sh/mm/cache-sh7705.c | 26 +++--
arch/sh/mm/cache.c | 54 ++++++-----
arch/sh/mm/kmap.c | 3 +-
arch/sparc/include/asm/cacheflush_32.h | 9 +-
arch/sparc/include/asm/cacheflush_64.h | 18 ++--
arch/sparc/include/asm/pgtable_32.h | 15 ++-
arch/sparc/include/asm/pgtable_64.h | 25 ++++-
arch/sparc/kernel/smp_64.c | 56 +++++++----
arch/sparc/mm/init_32.c | 13 ++-
arch/sparc/mm/init_64.c | 78 ++++++++-------
arch/sparc/mm/tlb.c | 5 +-
arch/um/include/asm/pgtable.h | 15 ++-
arch/x86/include/asm/pgtable.h | 21 +++-
arch/xtensa/include/asm/cacheflush.h | 9 +-
arch/xtensa/include/asm/pgtable.h | 24 +++--
arch/xtensa/mm/cache.c | 83 +++++++++-------
include/asm-generic/cacheflush.h | 5 +
include/linux/cacheflush.h | 4 +-
include/linux/mm.h | 3 +-
include/linux/page_table_check.h | 14 +--
include/linux/pagemap.h | 26 ++++-
include/linux/rmap.h | 2 +
mm/filemap.c | 111 +++++++++++++---------
mm/memory.c | 27 +++---
mm/page_table_check.c | 14 +--
mm/rmap.c | 60 +++++++++---
mm/util.c | 2 +-
100 files changed, 1433 insertions(+), 838 deletions(-)

--
2.39.1



2023-02-27 17:58:03

by Matthew Wilcox

Subject: [PATCH v2 08/30] csky: Implement the new page table range API

Add set_ptes(), update_mmu_cache_range() and flush_dcache_folio().
Change the PG_dcache_clean flag from being per-page to per-folio.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: [email protected]
---
arch/csky/abiv1/cacheflush.c | 32 +++++++++++++++++-----------
arch/csky/abiv1/inc/abi/cacheflush.h | 2 ++
arch/csky/abiv2/cacheflush.c | 30 +++++++++++++-------------
arch/csky/abiv2/inc/abi/cacheflush.h | 10 +++++++--
arch/csky/include/asm/pgtable.h | 21 +++++++++++++++---
5 files changed, 62 insertions(+), 33 deletions(-)

diff --git a/arch/csky/abiv1/cacheflush.c b/arch/csky/abiv1/cacheflush.c
index fb91b069dc69..ba43f6c26b4f 100644
--- a/arch/csky/abiv1/cacheflush.c
+++ b/arch/csky/abiv1/cacheflush.c
@@ -14,43 +14,49 @@

#define PG_dcache_clean PG_arch_1

-void flush_dcache_page(struct page *page)
+void flush_dcache_folio(struct folio *folio)
{
struct address_space *mapping;

- if (page == ZERO_PAGE(0))
+ if (is_zero_pfn(folio_pfn(folio)))
return;

- mapping = page_mapping_file(page);
+ mapping = folio_flush_mapping(folio);

- if (mapping && !page_mapcount(page))
- clear_bit(PG_dcache_clean, &page->flags);
+ if (mapping && !folio_mapped(folio))
+ clear_bit(PG_dcache_clean, &folio->flags);
else {
dcache_wbinv_all();
if (mapping)
icache_inv_all();
- set_bit(PG_dcache_clean, &page->flags);
+ set_bit(PG_dcache_clean, &folio->flags);
}
}
+EXPORT_SYMBOL(flush_dcache_folio);
+
+void flush_dcache_page(struct page *page)
+{
+ flush_dcache_folio(page_folio(page));
+}
EXPORT_SYMBOL(flush_dcache_page);

-void update_mmu_cache(struct vm_area_struct *vma, unsigned long addr,
- pte_t *ptep)
+void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long addr,
+ pte_t *ptep, unsigned int nr)
{
unsigned long pfn = pte_pfn(*ptep);
- struct page *page;
+ struct folio *folio;

if (!pfn_valid(pfn))
return;

- page = pfn_to_page(pfn);
- if (page == ZERO_PAGE(0))
+ if (is_zero_pfn(pfn))
return;

- if (!test_and_set_bit(PG_dcache_clean, &page->flags))
+ folio = page_folio(pfn_to_page(pfn));
+ if (!test_and_set_bit(PG_dcache_clean, &folio->flags))
dcache_wbinv_all();

- if (page_mapping_file(page)) {
+ if (folio_flush_mapping(folio)) {
if (vma->vm_flags & VM_EXEC)
icache_inv_all();
}
diff --git a/arch/csky/abiv1/inc/abi/cacheflush.h b/arch/csky/abiv1/inc/abi/cacheflush.h
index ed62e2066ba7..0d6cb65624c4 100644
--- a/arch/csky/abiv1/inc/abi/cacheflush.h
+++ b/arch/csky/abiv1/inc/abi/cacheflush.h
@@ -9,6 +9,8 @@

#define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
extern void flush_dcache_page(struct page *);
+void flush_dcache_folio(struct folio *);
+#define flush_dcache_folio flush_dcache_folio

#define flush_cache_mm(mm) dcache_wbinv_all()
#define flush_cache_page(vma, page, pfn) cache_wbinv_all()
diff --git a/arch/csky/abiv2/cacheflush.c b/arch/csky/abiv2/cacheflush.c
index 39c51399dd81..c1cf0d55a2a1 100644
--- a/arch/csky/abiv2/cacheflush.c
+++ b/arch/csky/abiv2/cacheflush.c
@@ -6,30 +6,30 @@
#include <linux/mm.h>
#include <asm/cache.h>

-void update_mmu_cache(struct vm_area_struct *vma, unsigned long address,
- pte_t *pte)
+void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long address,
+ pte_t *pte, unsigned int nr)
{
- unsigned long addr;
+ unsigned long pfn = pte_pfn(*pte);
- struct page *page;
+ struct folio *folio;
+ unsigned int i;

- if (!pfn_valid(pte_pfn(*pte)))
+ if (!pfn_valid(pfn) || is_zero_pfn(pfn))
return;

- page = pfn_to_page(pte_pfn(*pte));
- if (page == ZERO_PAGE(0))
- return;
+ folio = page_folio(pfn_to_page(pfn));

- if (test_and_set_bit(PG_dcache_clean, &page->flags))
+ if (test_and_set_bit(PG_dcache_clean, &folio->flags))
return;

- addr = (unsigned long) kmap_atomic(page);
-
- dcache_wb_range(addr, addr + PAGE_SIZE);
+ for (i = 0; i < folio_nr_pages(folio); i++) {
+ unsigned long addr = (unsigned long) kmap_local_folio(folio,
+ i * PAGE_SIZE);

- if (vma->vm_flags & VM_EXEC)
- icache_inv_range(addr, addr + PAGE_SIZE);
-
- kunmap_atomic((void *) addr);
+ dcache_wb_range(addr, addr + PAGE_SIZE);
+ if (vma->vm_flags & VM_EXEC)
+ icache_inv_range(addr, addr + PAGE_SIZE);
+ kunmap_local((void *) addr);
+ }
}

void flush_icache_deferred(struct mm_struct *mm)
diff --git a/arch/csky/abiv2/inc/abi/cacheflush.h b/arch/csky/abiv2/inc/abi/cacheflush.h
index a565e00c3f70..9c728933a776 100644
--- a/arch/csky/abiv2/inc/abi/cacheflush.h
+++ b/arch/csky/abiv2/inc/abi/cacheflush.h
@@ -18,11 +18,17 @@

#define PG_dcache_clean PG_arch_1

+static inline void flush_dcache_folio(struct folio *folio)
+{
+ if (test_bit(PG_dcache_clean, &folio->flags))
+ clear_bit(PG_dcache_clean, &folio->flags);
+}
+#define flush_dcache_folio flush_dcache_folio
+
#define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
static inline void flush_dcache_page(struct page *page)
{
- if (test_bit(PG_dcache_clean, &page->flags))
- clear_bit(PG_dcache_clean, &page->flags);
+ flush_dcache_folio(page_folio(page));
}

#define flush_dcache_mmap_lock(mapping) do { } while (0)
diff --git a/arch/csky/include/asm/pgtable.h b/arch/csky/include/asm/pgtable.h
index d4042495febc..a30ae048233e 100644
--- a/arch/csky/include/asm/pgtable.h
+++ b/arch/csky/include/asm/pgtable.h
@@ -90,7 +90,20 @@ static inline void set_pte(pte_t *p, pte_t pte)
/* prevent out of order excution */
smp_mb();
}
-#define set_pte_at(mm, addr, ptep, pteval) set_pte(ptep, pteval)
+
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+ pte_t *ptep, pte_t pte, unsigned int nr)
+{
+ for (;;) {
+ set_pte(ptep, pte);
+ if (--nr == 0)
+ break;
+ ptep++;
+ pte_val(pte) += PAGE_SIZE;
+ }
+}
+
+#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)

static inline pte_t *pmd_page_vaddr(pmd_t pmd)
{
@@ -263,8 +276,10 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
extern void paging_init(void);

-void update_mmu_cache(struct vm_area_struct *vma, unsigned long address,
- pte_t *pte);
+void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long address,
+ pte_t *pte, unsigned int nr);
+#define update_mmu_cache(vma, addr, ptep) \
+ update_mmu_cache_range(vma, addr, ptep, 1)

#define io_remap_pfn_range(vma, vaddr, pfn, size, prot) \
remap_pfn_range(vma, vaddr, pfn, size, prot)
--
2.39.1


2023-02-27 17:58:06

by Matthew Wilcox

Subject: [PATCH v2 10/30] ia64: Implement the new page table range API

Add set_ptes(), update_mmu_cache_range() and flush_dcache_folio().
Change the PG_arch_1 (aka PG_dcache_clean) flag from being per-page to
per-folio, which makes arch_dma_mark_clean() and mark_clean() a little
more exciting.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: [email protected]
---
arch/ia64/hp/common/sba_iommu.c | 26 +++++++++++++++-----------
arch/ia64/include/asm/cacheflush.h | 14 ++++++++++----
arch/ia64/include/asm/pgtable.h | 14 +++++++++++++-
arch/ia64/mm/init.c | 29 +++++++++++++++++++----------
4 files changed, 57 insertions(+), 26 deletions(-)

diff --git a/arch/ia64/hp/common/sba_iommu.c b/arch/ia64/hp/common/sba_iommu.c
index 8ad6946521d8..48d475f10003 100644
--- a/arch/ia64/hp/common/sba_iommu.c
+++ b/arch/ia64/hp/common/sba_iommu.c
@@ -798,22 +798,26 @@ sba_io_pdir_entry(u64 *pdir_ptr, unsigned long vba)
#endif

#ifdef ENABLE_MARK_CLEAN
-/**
+/*
* Since DMA is i-cache coherent, any (complete) pages that were written via
* DMA can be marked as "clean" so that lazy_mmu_prot_update() doesn't have to
* flush them when they get mapped into an executable vm-area.
*/
-static void
-mark_clean (void *addr, size_t size)
+static void mark_clean(void *addr, size_t size)
{
- unsigned long pg_addr, end;
-
- pg_addr = PAGE_ALIGN((unsigned long) addr);
- end = (unsigned long) addr + size;
- while (pg_addr + PAGE_SIZE <= end) {
- struct page *page = virt_to_page((void *)pg_addr);
- set_bit(PG_arch_1, &page->flags);
- pg_addr += PAGE_SIZE;
+ struct folio *folio = virt_to_folio(addr);
+ ssize_t left = size;
+ size_t offset = offset_in_folio(folio, addr);
+
+ if (offset) {
+ left -= folio_size(folio) - offset;
+ folio = folio_next(folio);
+ }
+
+ while (left >= (ssize_t)folio_size(folio)) {
+ set_bit(PG_arch_1, &folio->flags);
+ left -= folio_size(folio);
+ folio = folio_next(folio);
}
}
#endif
diff --git a/arch/ia64/include/asm/cacheflush.h b/arch/ia64/include/asm/cacheflush.h
index 708c0fa5d975..eac493fa9e0d 100644
--- a/arch/ia64/include/asm/cacheflush.h
+++ b/arch/ia64/include/asm/cacheflush.h
@@ -13,10 +13,16 @@
#include <asm/page.h>

#define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
-#define flush_dcache_page(page) \
-do { \
- clear_bit(PG_arch_1, &(page)->flags); \
-} while (0)
+static inline void flush_dcache_folio(struct folio *folio)
+{
+ clear_bit(PG_arch_1, &folio->flags);
+}
+#define flush_dcache_folio flush_dcache_folio
+
+static inline void flush_dcache_page(struct page *page)
+{
+ flush_dcache_folio(page_folio(page));
+}

extern void flush_icache_range(unsigned long start, unsigned long end);
#define flush_icache_range flush_icache_range
diff --git a/arch/ia64/include/asm/pgtable.h b/arch/ia64/include/asm/pgtable.h
index 21c97e31a28a..0c2be4ea664b 100644
--- a/arch/ia64/include/asm/pgtable.h
+++ b/arch/ia64/include/asm/pgtable.h
@@ -303,7 +303,18 @@ static inline void set_pte(pte_t *ptep, pte_t pteval)
*ptep = pteval;
}

-#define set_pte_at(mm,addr,ptep,pteval) set_pte(ptep,pteval)
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+ pte_t *ptep, pte_t pte, unsigned int nr)
+{
+ for (;;) {
+ set_pte(ptep, pte);
+ if (--nr == 0)
+ break;
+ ptep++;
+ pte_val(pte) += PAGE_SIZE;
+ }
+}
+#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)

/*
* Make page protection values cacheable, uncacheable, or write-
@@ -396,6 +407,7 @@ pte_same (pte_t a, pte_t b)
return pte_val(a) == pte_val(b);
}

+#define update_mmu_cache_range(vma, address, ptep, nr) do { } while (0)
#define update_mmu_cache(vma, address, ptep) do { } while (0)

extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index 7f5353e28516..12aef25944aa 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -50,30 +50,39 @@ void
__ia64_sync_icache_dcache (pte_t pte)
{
unsigned long addr;
- struct page *page;
+ struct folio *folio;

- page = pte_page(pte);
- addr = (unsigned long) page_address(page);
+ folio = page_folio(pte_page(pte));
+ addr = (unsigned long)folio_address(folio);

- if (test_bit(PG_arch_1, &page->flags))
+ if (test_bit(PG_arch_1, &folio->flags))
return; /* i-cache is already coherent with d-cache */

- flush_icache_range(addr, addr + page_size(page));
- set_bit(PG_arch_1, &page->flags); /* mark page as clean */
+ flush_icache_range(addr, addr + folio_size(folio));
+ set_bit(PG_arch_1, &folio->flags); /* mark page as clean */
}

/*
- * Since DMA is i-cache coherent, any (complete) pages that were written via
+ * Since DMA is i-cache coherent, any (complete) folios that were written via
* DMA can be marked as "clean" so that lazy_mmu_prot_update() doesn't have to
* flush them when they get mapped into an executable vm-area.
*/
void arch_dma_mark_clean(phys_addr_t paddr, size_t size)
{
- unsigned long pfn = PHYS_PFN(paddr);
+ struct folio *folio = page_folio(phys_to_page(paddr));
+ ssize_t left = size;
+ size_t offset = offset_in_folio(folio, paddr);

- do {
+ if (offset) {
+ left -= folio_size(folio) - offset;
+ folio = folio_next(folio);
+ }
+
+ while (left >= (ssize_t)folio_size(folio)) {
- set_bit(PG_arch_1, &pfn_to_page(pfn)->flags);
+ set_bit(PG_arch_1, &folio->flags);
- } while (++pfn <= PHYS_PFN(paddr + size - 1));
+ left -= folio_size(folio);
+ folio = folio_next(folio);
+ }
}

inline void
--
2.39.1


2023-02-27 17:58:09

by Matthew Wilcox

Subject: [PATCH v2 11/30] loongarch: Implement the new page table range API

Add set_ptes() and update_mmu_cache_range(). It would probably be
more efficient to implement __update_tlb() by flushing the entire
folio instead of calling __update_tlb() N times, but I'll leave
that for someone who understands the architecture better.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: WANG Xuerui <[email protected]>
Cc: [email protected]
---
arch/loongarch/include/asm/cacheflush.h | 2 ++
arch/loongarch/include/asm/pgtable.h | 30 +++++++++++++++++++------
2 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/arch/loongarch/include/asm/cacheflush.h b/arch/loongarch/include/asm/cacheflush.h
index 0681788eb474..7907eb42bfbd 100644
--- a/arch/loongarch/include/asm/cacheflush.h
+++ b/arch/loongarch/include/asm/cacheflush.h
@@ -47,8 +47,10 @@ void local_flush_icache_range(unsigned long start, unsigned long end);
#define flush_cache_vmap(start, end) do { } while (0)
#define flush_cache_vunmap(start, end) do { } while (0)
#define flush_icache_page(vma, page) do { } while (0)
+#define flush_icache_pages(vma, page, nr) do { } while (0)
#define flush_icache_user_page(vma, page, addr, len) do { } while (0)
#define flush_dcache_page(page) do { } while (0)
+#define flush_dcache_folio(folio) do { } while (0)
#define flush_dcache_mmap_lock(mapping) do { } while (0)
#define flush_dcache_mmap_unlock(mapping) do { } while (0)

diff --git a/arch/loongarch/include/asm/pgtable.h b/arch/loongarch/include/asm/pgtable.h
index d28fb9dbec59..9154d317ffb4 100644
--- a/arch/loongarch/include/asm/pgtable.h
+++ b/arch/loongarch/include/asm/pgtable.h
@@ -334,12 +334,20 @@ static inline void set_pte(pte_t *ptep, pte_t pteval)
}
}

-static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
- pte_t *ptep, pte_t pteval)
-{
- set_pte(ptep, pteval);
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+ pte_t *ptep, pte_t pte, unsigned int nr)
+{
+ for (;;) {
+ set_pte(ptep, pte);
+ if (--nr == 0)
+ break;
+ ptep++;
+ pte_val(pte) += 1 << _PFN_SHIFT;
+ }
}

+#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
+
static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
{
/* Preserve global status for the pair */
@@ -445,11 +453,19 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
extern void __update_tlb(struct vm_area_struct *vma,
unsigned long address, pte_t *ptep);

-static inline void update_mmu_cache(struct vm_area_struct *vma,
- unsigned long address, pte_t *ptep)
+static inline void update_mmu_cache_range(struct vm_area_struct *vma,
+ unsigned long address, pte_t *ptep, unsigned int nr)
{
- __update_tlb(vma, address, ptep);
+ for (;;) {
+ __update_tlb(vma, address, ptep);
+ if (--nr == 0)
+ break;
+ address += PAGE_SIZE;
+ ptep++;
+ }
}
+#define update_mmu_cache(vma, addr, ptep) \
+ update_mmu_cache_range(vma, addr, ptep, 1)

#define __HAVE_ARCH_UPDATE_MMU_TLB
#define update_mmu_tlb update_mmu_cache
--
2.39.1


2023-02-27 17:58:13

by Matthew Wilcox

Subject: [PATCH v2 15/30] nios2: Implement the new page table range API

Add set_ptes(), update_mmu_cache_range(), flush_icache_pages() and
flush_dcache_folio(). Change the PG_arch_1 (aka PG_dcache_dirty) flag
from being per-page to per-folio.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Dinh Nguyen <[email protected]>
---
arch/nios2/include/asm/cacheflush.h | 6 ++-
arch/nios2/include/asm/pgtable.h | 27 +++++++++----
arch/nios2/mm/cacheflush.c | 61 ++++++++++++++++-------------
3 files changed, 58 insertions(+), 36 deletions(-)

diff --git a/arch/nios2/include/asm/cacheflush.h b/arch/nios2/include/asm/cacheflush.h
index d0b71dd71287..8624ca83cffe 100644
--- a/arch/nios2/include/asm/cacheflush.h
+++ b/arch/nios2/include/asm/cacheflush.h
@@ -29,9 +29,13 @@ extern void flush_cache_page(struct vm_area_struct *vma, unsigned long vmaddr,
unsigned long pfn);
#define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
void flush_dcache_page(struct page *page);
+void flush_dcache_folio(struct folio *folio);
+#define flush_dcache_folio flush_dcache_folio

extern void flush_icache_range(unsigned long start, unsigned long end);
-extern void flush_icache_page(struct vm_area_struct *vma, struct page *page);
+void flush_icache_pages(struct vm_area_struct *vma, struct page *page,
+ unsigned int nr);
+#define flush_icache_page(vma, page) flush_icache_pages(vma, page, 1)

#define flush_cache_vmap(start, end) flush_dcache_range(start, end)
#define flush_cache_vunmap(start, end) flush_dcache_range(start, end)
diff --git a/arch/nios2/include/asm/pgtable.h b/arch/nios2/include/asm/pgtable.h
index 0f5c2564e9f5..8a77821a17a5 100644
--- a/arch/nios2/include/asm/pgtable.h
+++ b/arch/nios2/include/asm/pgtable.h
@@ -178,15 +178,23 @@ static inline void set_pte(pte_t *ptep, pte_t pteval)
*ptep = pteval;
}

-static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
- pte_t *ptep, pte_t pteval)
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+ pte_t *ptep, pte_t pte, unsigned int nr)
{
- unsigned long paddr = (unsigned long)page_to_virt(pte_page(pteval));
-
- flush_dcache_range(paddr, paddr + PAGE_SIZE);
- set_pte(ptep, pteval);
+ unsigned long paddr = (unsigned long)page_to_virt(pte_page(pte));
+
+ flush_dcache_range(paddr, paddr + nr * PAGE_SIZE);
+ for (;;) {
+ set_pte(ptep, pte);
+ if (--nr == 0)
+ break;
+ ptep++;
+ pte_val(pte) += 1;
+ }
}

+#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
+
static inline int pmd_none(pmd_t pmd)
{
return (pmd_val(pmd) ==
@@ -273,7 +281,10 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte)
extern void __init paging_init(void);
extern void __init mmu_init(void);

-extern void update_mmu_cache(struct vm_area_struct *vma,
- unsigned long address, pte_t *pte);
+void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long address,
+ pte_t *ptep, unsigned int nr);
+
+#define update_mmu_cache(vma, addr, ptep) \
+ update_mmu_cache_range(vma, addr, ptep, 1)

#endif /* _ASM_NIOS2_PGTABLE_H */
diff --git a/arch/nios2/mm/cacheflush.c b/arch/nios2/mm/cacheflush.c
index 6aa9257c3ede..471485a84b2c 100644
--- a/arch/nios2/mm/cacheflush.c
+++ b/arch/nios2/mm/cacheflush.c
@@ -138,10 +138,11 @@ void flush_cache_range(struct vm_area_struct *vma, unsigned long start,
__flush_icache(start, end);
}

-void flush_icache_page(struct vm_area_struct *vma, struct page *page)
+void flush_icache_pages(struct vm_area_struct *vma, struct page *page,
+ unsigned int nr)
{
unsigned long start = (unsigned long) page_address(page);
- unsigned long end = start + PAGE_SIZE;
+ unsigned long end = start + nr * PAGE_SIZE;

__flush_dcache(start, end);
__flush_icache(start, end);
@@ -158,19 +159,19 @@ void flush_cache_page(struct vm_area_struct *vma, unsigned long vmaddr,
__flush_icache(start, end);
}

-void __flush_dcache_page(struct address_space *mapping, struct page *page)
+void __flush_dcache_folio(struct address_space *mapping, struct folio *folio)
{
/*
* Writeback any data associated with the kernel mapping of this
* page. This ensures that data in the physical page is mutually
* coherent with the kernels mapping.
*/
- unsigned long start = (unsigned long)page_address(page);
+ unsigned long start = (unsigned long)folio_address(folio);

- __flush_dcache(start, start + PAGE_SIZE);
+ __flush_dcache(start, start + folio_size(folio));
}

-void flush_dcache_page(struct page *page)
+void flush_dcache_folio(struct folio *folio)
{
struct address_space *mapping;

@@ -178,32 +179,38 @@ void flush_dcache_page(struct page *page)
* The zero page is never written to, so never has any dirty
* cache lines, and therefore never needs to be flushed.
*/
- if (page == ZERO_PAGE(0))
+ if (is_zero_pfn(folio_pfn(folio)))
return;

- mapping = page_mapping_file(page);
+ mapping = folio_flush_mapping(folio);

/* Flush this page if there are aliases. */
if (mapping && !mapping_mapped(mapping)) {
- clear_bit(PG_dcache_clean, &page->flags);
+ clear_bit(PG_dcache_clean, &folio->flags);
} else {
- __flush_dcache_page(mapping, page);
+ __flush_dcache_folio(mapping, folio);
if (mapping) {
- unsigned long start = (unsigned long)page_address(page);
- flush_aliases(mapping, page);
- flush_icache_range(start, start + PAGE_SIZE);
+ unsigned long start = (unsigned long)folio_address(folio);
+ flush_aliases(mapping, folio);
+ flush_icache_range(start, start + folio_size(folio));
}
- set_bit(PG_dcache_clean, &page->flags);
+ set_bit(PG_dcache_clean, &folio->flags);
}
}
-EXPORT_SYMBOL(flush_dcache_page);
+EXPORT_SYMBOL(flush_dcache_folio);
+
+void flush_dcache_page(struct page *page)
+{
+ flush_dcache_folio(page_folio(page));
+}
+EXPORT_SYMBOL(flush_dcache_page);

-void update_mmu_cache(struct vm_area_struct *vma,
- unsigned long address, pte_t *ptep)
+void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long address,
+ pte_t *ptep, unsigned int nr)
{
pte_t pte = *ptep;
unsigned long pfn = pte_pfn(pte);
- struct page *page;
+ struct folio *folio;
struct address_space *mapping;

reload_tlb_page(vma, address, pte);
@@ -215,19 +222,19 @@ void update_mmu_cache(struct vm_area_struct *vma,
* The zero page is never written to, so never has any dirty
* cache lines, and therefore never needs to be flushed.
*/
- page = pfn_to_page(pfn);
- if (page == ZERO_PAGE(0))
+ if (is_zero_pfn(pfn))
return;

- mapping = page_mapping_file(page);
- if (!test_and_set_bit(PG_dcache_clean, &page->flags))
- __flush_dcache_page(mapping, page);
+ folio = page_folio(pfn_to_page(pfn));
+ mapping = folio_flush_mapping(folio);
+ if (!test_and_set_bit(PG_dcache_clean, &folio->flags))
+ __flush_dcache_folio(mapping, folio);

- if(mapping)
- {
- flush_aliases(mapping, page);
+ if (mapping) {
+ flush_aliases(mapping, folio);
if (vma->vm_flags & VM_EXEC)
- flush_icache_page(vma, page);
+ flush_icache_pages(vma, &folio->page,
+ folio_nr_pages(folio));
}
}

--
2.39.1


2023-02-27 17:58:16

by Matthew Wilcox

Subject: [PATCH v2 14/30] mips: Implement the new page table range API

Add set_ptes(), update_mmu_cache_range(), flush_icache_pages() and
flush_dcache_folio(). Change the PG_arch_1 (aka PG_dcache_dirty) flag
from being per-page to per-folio.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: [email protected]
---
arch/mips/include/asm/cacheflush.h | 32 +++++++++++------
arch/mips/include/asm/pgtable.h | 36 +++++++++++++------
arch/mips/mm/c-r4k.c | 5 +--
arch/mips/mm/cache.c | 56 +++++++++++++++---------------
arch/mips/mm/init.c | 17 +++++----
5 files changed, 88 insertions(+), 58 deletions(-)

diff --git a/arch/mips/include/asm/cacheflush.h b/arch/mips/include/asm/cacheflush.h
index b3dc9c589442..2683cade42ef 100644
--- a/arch/mips/include/asm/cacheflush.h
+++ b/arch/mips/include/asm/cacheflush.h
@@ -36,12 +36,12 @@
*/
#define PG_dcache_dirty PG_arch_1

-#define Page_dcache_dirty(page) \
- test_bit(PG_dcache_dirty, &(page)->flags)
-#define SetPageDcacheDirty(page) \
- set_bit(PG_dcache_dirty, &(page)->flags)
-#define ClearPageDcacheDirty(page) \
- clear_bit(PG_dcache_dirty, &(page)->flags)
+#define folio_test_dcache_dirty(folio) \
+ test_bit(PG_dcache_dirty, &(folio)->flags)
+#define folio_set_dcache_dirty(folio) \
+ set_bit(PG_dcache_dirty, &(folio)->flags)
+#define folio_clear_dcache_dirty(folio) \
+ clear_bit(PG_dcache_dirty, &(folio)->flags)

extern void (*flush_cache_all)(void);
extern void (*__flush_cache_all)(void);
@@ -50,15 +50,24 @@ extern void (*flush_cache_mm)(struct mm_struct *mm);
extern void (*flush_cache_range)(struct vm_area_struct *vma,
unsigned long start, unsigned long end);
extern void (*flush_cache_page)(struct vm_area_struct *vma, unsigned long page, unsigned long pfn);
-extern void __flush_dcache_page(struct page *page);
+extern void __flush_dcache_pages(struct page *page, unsigned int nr);

#define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
+static inline void flush_dcache_folio(struct folio *folio)
+{
+ if (cpu_has_dc_aliases)
+ __flush_dcache_pages(&folio->page, folio_nr_pages(folio));
+ else if (!cpu_has_ic_fills_f_dc)
+ folio_set_dcache_dirty(folio);
+}
+#define flush_dcache_folio flush_dcache_folio
+
static inline void flush_dcache_page(struct page *page)
{
if (cpu_has_dc_aliases)
- __flush_dcache_page(page);
+ __flush_dcache_pages(page, 1);
else if (!cpu_has_ic_fills_f_dc)
- SetPageDcacheDirty(page);
+ folio_set_dcache_dirty(page_folio(page));
}

#define flush_dcache_mmap_lock(mapping) do { } while (0)
@@ -73,10 +82,11 @@ static inline void flush_anon_page(struct vm_area_struct *vma,
__flush_anon_page(page, vmaddr);
}

-static inline void flush_icache_page(struct vm_area_struct *vma,
- struct page *page)
+static inline void flush_icache_pages(struct vm_area_struct *vma,
+ struct page *page, unsigned int nr)
{
}
+#define flush_icache_page(vma, page) flush_icache_pages(vma, page, 1)

extern void (*flush_icache_range)(unsigned long start, unsigned long end);
extern void (*local_flush_icache_range)(unsigned long start, unsigned long end);
diff --git a/arch/mips/include/asm/pgtable.h b/arch/mips/include/asm/pgtable.h
index 791389bf3c12..0cf0455e6ae8 100644
--- a/arch/mips/include/asm/pgtable.h
+++ b/arch/mips/include/asm/pgtable.h
@@ -105,8 +105,10 @@ do { \
} \
} while(0)

-static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
- pte_t *ptep, pte_t pteval);
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+ pte_t *ptep, pte_t pte, unsigned int nr);
+
+#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)

#if defined(CONFIG_PHYS_ADDR_T_64BIT) && defined(CONFIG_CPU_MIPS32)

@@ -204,19 +206,31 @@ static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *pt
}
#endif

-static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
- pte_t *ptep, pte_t pteval)
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+ pte_t *ptep, pte_t pte, unsigned int nr)
{
+ unsigned int i;
+ bool do_sync = false;

- if (!pte_present(pteval))
- goto cache_sync_done;
+ for (i = 0; i < nr; i++) {
+ if (!pte_present(pte))
+ continue;
+ if (pte_present(ptep[i]) &&
+ (pte_pfn(ptep[i]) == pte_pfn(pte)))
+ continue;
+ do_sync = true;
+ }

- if (pte_present(*ptep) && (pte_pfn(*ptep) == pte_pfn(pteval)))
- goto cache_sync_done;
+ if (do_sync)
+ __update_cache(addr, pte);

- __update_cache(addr, pteval);
-cache_sync_done:
- set_pte(ptep, pteval);
+ for (;;) {
+ set_pte(ptep, pte);
+ if (--nr == 0)
+ break;
+ ptep++;
+ pte_val(pte) += 1 << _PFN_SHIFT;
+ }
}

/*
diff --git a/arch/mips/mm/c-r4k.c b/arch/mips/mm/c-r4k.c
index a549fa98c2f4..7d2a42f0cffd 100644
--- a/arch/mips/mm/c-r4k.c
+++ b/arch/mips/mm/c-r4k.c
@@ -679,13 +679,14 @@ static inline void local_r4k_flush_cache_page(void *args)
if ((mm == current->active_mm) && (pte_val(*ptep) & _PAGE_VALID))
vaddr = NULL;
else {
+ struct folio *folio = page_folio(page);
/*
* Use kmap_coherent or kmap_atomic to do flushes for
* another ASID than the current one.
*/
map_coherent = (cpu_has_dc_aliases &&
- page_mapcount(page) &&
- !Page_dcache_dirty(page));
+ folio_mapped(folio) &&
+ !folio_test_dcache_dirty(folio));
if (map_coherent)
vaddr = kmap_coherent(page, addr);
else
diff --git a/arch/mips/mm/cache.c b/arch/mips/mm/cache.c
index 11b3e7ddafd5..0668435521fc 100644
--- a/arch/mips/mm/cache.c
+++ b/arch/mips/mm/cache.c
@@ -82,13 +82,15 @@ SYSCALL_DEFINE3(cacheflush, unsigned long, addr, unsigned long, bytes,
return 0;
}

-void __flush_dcache_page(struct page *page)
+void __flush_dcache_pages(struct page *page, unsigned int nr)
{
- struct address_space *mapping = page_mapping_file(page);
+ struct folio *folio = page_folio(page);
+ struct address_space *mapping = folio_flush_mapping(folio);
unsigned long addr;
+ unsigned int i;

if (mapping && !mapping_mapped(mapping)) {
- SetPageDcacheDirty(page);
+ folio_set_dcache_dirty(folio);
return;
}

@@ -97,25 +99,21 @@ void __flush_dcache_page(struct page *page)
* case is for exec env/arg pages and those are %99 certainly going to
* get faulted into the tlb (and thus flushed) anyways.
*/
- if (PageHighMem(page))
- addr = (unsigned long)kmap_atomic(page);
- else
- addr = (unsigned long)page_address(page);
-
- flush_data_cache_page(addr);
-
- if (PageHighMem(page))
- kunmap_atomic((void *)addr);
+ for (i = 0; i < nr; i++) {
+ addr = (unsigned long)kmap_local_page(page + i);
+ flush_data_cache_page(addr);
+ kunmap_local((void *)addr);
+ }
}
-
-EXPORT_SYMBOL(__flush_dcache_page);
+EXPORT_SYMBOL(__flush_dcache_pages);

void __flush_anon_page(struct page *page, unsigned long vmaddr)
{
unsigned long addr = (unsigned long) page_address(page);
+ struct folio *folio = page_folio(page);

if (pages_do_alias(addr, vmaddr)) {
- if (page_mapcount(page) && !Page_dcache_dirty(page)) {
+ if (folio_mapped(folio) && !folio_test_dcache_dirty(folio)) {
void *kaddr;

kaddr = kmap_coherent(page, vmaddr);
@@ -130,27 +128,29 @@ EXPORT_SYMBOL(__flush_anon_page);

void __update_cache(unsigned long address, pte_t pte)
{
- struct page *page;
+ struct folio *folio;
unsigned long pfn, addr;
int exec = !pte_no_exec(pte) && !cpu_has_ic_fills_f_dc;
+ unsigned int i;

pfn = pte_pfn(pte);
if (unlikely(!pfn_valid(pfn)))
return;
- page = pfn_to_page(pfn);
- if (Page_dcache_dirty(page)) {
- if (PageHighMem(page))
- addr = (unsigned long)kmap_atomic(page);
- else
- addr = (unsigned long)page_address(page);
-
- if (exec || pages_do_alias(addr, address & PAGE_MASK))
- flush_data_cache_page(addr);

- if (PageHighMem(page))
- kunmap_atomic((void *)addr);
+ folio = page_folio(pfn_to_page(pfn));
+ address &= PAGE_MASK;
+ address -= offset_in_folio(folio, pfn << PAGE_SHIFT);
+
+ if (folio_test_dcache_dirty(folio)) {
+ for (i = 0; i < folio_nr_pages(folio); i++) {
+ addr = (unsigned long)kmap_local_folio(folio, i * PAGE_SIZE);

- ClearPageDcacheDirty(page);
+ if (exec || pages_do_alias(addr, address))
+ flush_data_cache_page(addr);
+ kunmap_local((void *)addr);
+ address += PAGE_SIZE;
+ }
+ folio_clear_dcache_dirty(folio);
}
}

diff --git a/arch/mips/mm/init.c b/arch/mips/mm/init.c
index 5a8002839550..19d4ca3b3fbd 100644
--- a/arch/mips/mm/init.c
+++ b/arch/mips/mm/init.c
@@ -88,7 +88,7 @@ static void *__kmap_pgprot(struct page *page, unsigned long addr, pgprot_t prot)
pte_t pte;
int tlbidx;

- BUG_ON(Page_dcache_dirty(page));
+ BUG_ON(folio_test_dcache_dirty(page_folio(page)));

preempt_disable();
pagefault_disable();
@@ -169,11 +169,12 @@ void kunmap_coherent(void)
void copy_user_highpage(struct page *to, struct page *from,
unsigned long vaddr, struct vm_area_struct *vma)
{
+ struct folio *src = page_folio(from);
void *vfrom, *vto;

vto = kmap_atomic(to);
if (cpu_has_dc_aliases &&
- page_mapcount(from) && !Page_dcache_dirty(from)) {
+ folio_mapped(src) && !folio_test_dcache_dirty(src)) {
vfrom = kmap_coherent(from, vaddr);
copy_page(vto, vfrom);
kunmap_coherent();
@@ -194,15 +195,17 @@ void copy_to_user_page(struct vm_area_struct *vma,
struct page *page, unsigned long vaddr, void *dst, const void *src,
unsigned long len)
{
+ struct folio *folio = page_folio(page);
+
if (cpu_has_dc_aliases &&
- page_mapcount(page) && !Page_dcache_dirty(page)) {
+ folio_mapped(folio) && !folio_test_dcache_dirty(folio)) {
void *vto = kmap_coherent(page, vaddr) + (vaddr & ~PAGE_MASK);
memcpy(vto, src, len);
kunmap_coherent();
} else {
memcpy(dst, src, len);
if (cpu_has_dc_aliases)
- SetPageDcacheDirty(page);
+ folio_set_dcache_dirty(folio);
}
if (vma->vm_flags & VM_EXEC)
flush_cache_page(vma, vaddr, page_to_pfn(page));
@@ -212,15 +215,17 @@ void copy_from_user_page(struct vm_area_struct *vma,
struct page *page, unsigned long vaddr, void *dst, const void *src,
unsigned long len)
{
+ struct folio *folio = page_folio(page);
+
if (cpu_has_dc_aliases &&
- page_mapcount(page) && !Page_dcache_dirty(page)) {
+ folio_mapped(folio) && !folio_test_dcache_dirty(folio)) {
void *vfrom = kmap_coherent(page, vaddr) + (vaddr & ~PAGE_MASK);
memcpy(dst, vfrom, len);
kunmap_coherent();
} else {
memcpy(dst, src, len);
if (cpu_has_dc_aliases)
- SetPageDcacheDirty(page);
+ folio_set_dcache_dirty(folio);
}
}
EXPORT_SYMBOL_GPL(copy_from_user_page);
--
2.39.1


2023-02-27 17:58:18

by Matthew Wilcox

Subject: [PATCH v2 13/30] microblaze: Implement the new page table range API

Add set_ptes(), update_mmu_cache_range(), flush_icache_pages() and
flush_dcache_folio(). Also change the calling convention for set_pte()
to be the same as other architectures.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Michal Simek <[email protected]>
---
arch/microblaze/include/asm/cacheflush.h | 8 ++++++++
arch/microblaze/include/asm/pgtable.h | 17 ++++++++++++-----
arch/microblaze/include/asm/tlbflush.h | 4 +++-
3 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/arch/microblaze/include/asm/cacheflush.h b/arch/microblaze/include/asm/cacheflush.h
index 39f8fb6768d8..e6641ff98cb3 100644
--- a/arch/microblaze/include/asm/cacheflush.h
+++ b/arch/microblaze/include/asm/cacheflush.h
@@ -74,6 +74,14 @@ do { \
flush_dcache_range((unsigned) (addr), (unsigned) (addr) + PAGE_SIZE); \
} while (0);

+static inline void flush_dcache_folio(struct folio *folio)
+{
+ unsigned long addr = folio_pfn(folio) << PAGE_SHIFT;
+
+ flush_dcache_range(addr, addr + folio_size(folio));
+}
+#define flush_dcache_folio flush_dcache_folio
+
#define flush_cache_page(vma, vmaddr, pfn) \
flush_dcache_range(pfn << PAGE_SHIFT, (pfn << PAGE_SHIFT) + PAGE_SIZE);

diff --git a/arch/microblaze/include/asm/pgtable.h b/arch/microblaze/include/asm/pgtable.h
index d1b8272abcd9..a01e1369b486 100644
--- a/arch/microblaze/include/asm/pgtable.h
+++ b/arch/microblaze/include/asm/pgtable.h
@@ -330,18 +330,25 @@ static inline unsigned long pte_update(pte_t *p, unsigned long clr,
/*
* set_pte stores a linux PTE into the linux page table.
*/
-static inline void set_pte(struct mm_struct *mm, unsigned long addr,
- pte_t *ptep, pte_t pte)
+static inline void set_pte(pte_t *ptep, pte_t pte)
{
*ptep = pte;
}

-static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
- pte_t *ptep, pte_t pte)
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+ pte_t *ptep, pte_t pte, unsigned int nr)
{
- *ptep = pte;
+ for (;;) {
+ set_pte(ptep, pte);
+ if (--nr == 0)
+ break;
+ ptep++;
+ pte_val(pte) += 1 << PFN_SHIFT_OFFSET;
+ }
}

+#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
+
#define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG
static inline int ptep_test_and_clear_young(struct vm_area_struct *vma,
unsigned long address, pte_t *ptep)
diff --git a/arch/microblaze/include/asm/tlbflush.h b/arch/microblaze/include/asm/tlbflush.h
index 2038168ed128..1b179e5e9062 100644
--- a/arch/microblaze/include/asm/tlbflush.h
+++ b/arch/microblaze/include/asm/tlbflush.h
@@ -33,7 +33,9 @@ static inline void local_flush_tlb_range(struct vm_area_struct *vma,

#define flush_tlb_kernel_range(start, end) do { } while (0)

-#define update_mmu_cache(vma, addr, ptep) do { } while (0)
+#define update_mmu_cache_range(vma, addr, ptep, nr) do { } while (0)
+#define update_mmu_cache(vma, addr, ptep) \
+ update_mmu_cache_range(vma, addr, ptep, 1)

#define flush_tlb_all local_flush_tlb_all
#define flush_tlb_mm local_flush_tlb_mm
--
2.39.1


2023-02-27 17:58:22

by Matthew Wilcox

Subject: [PATCH v2 07/30] arm64: Implement the new page table range API

Add set_ptes(), update_mmu_cache_range() and flush_dcache_folio().
Change the PG_dcache_clean flag from being per-page to per-folio.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: Catalin Marinas <[email protected]>
Cc: [email protected]
---
arch/arm64/include/asm/cacheflush.h | 4 +++-
arch/arm64/include/asm/pgtable.h | 25 ++++++++++++++------
arch/arm64/mm/flush.c | 36 +++++++++++------------------
3 files changed, 35 insertions(+), 30 deletions(-)

diff --git a/arch/arm64/include/asm/cacheflush.h b/arch/arm64/include/asm/cacheflush.h
index 37185e978aeb..d115451ed263 100644
--- a/arch/arm64/include/asm/cacheflush.h
+++ b/arch/arm64/include/asm/cacheflush.h
@@ -114,7 +114,7 @@ extern void copy_to_user_page(struct vm_area_struct *, struct page *,
#define copy_to_user_page copy_to_user_page

/*
- * flush_dcache_page is used when the kernel has written to the page
+ * flush_dcache_folio is used when the kernel has written to the page
* cache page at virtual address page->virtual.
*
* If this page isn't mapped (ie, page_mapping == NULL), or it might
@@ -127,6 +127,8 @@ extern void copy_to_user_page(struct vm_area_struct *, struct page *,
*/
#define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
extern void flush_dcache_page(struct page *);
+void flush_dcache_folio(struct folio *);
+#define flush_dcache_folio flush_dcache_folio

static __always_inline void icache_inval_all_pou(void)
{
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 69765dc697af..4d1b79dbff16 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -355,12 +355,21 @@ static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
set_pte(ptep, pte);
}

-static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
- pte_t *ptep, pte_t pte)
-{
- page_table_check_ptes_set(mm, addr, ptep, pte, 1);
- return __set_pte_at(mm, addr, ptep, pte);
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+ pte_t *ptep, pte_t pte, unsigned int nr)
+{
+ page_table_check_ptes_set(mm, addr, ptep, pte, nr);
+
+ for (;;) {
+ __set_pte_at(mm, addr, ptep, pte);
+ if (--nr == 0)
+ break;
+ ptep++;
+ addr += PAGE_SIZE;
+ pte_val(pte) += PAGE_SIZE;
+ }
}
+#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)

/*
* Huge pte definitions.
@@ -1059,8 +1068,8 @@ static inline void arch_swap_restore(swp_entry_t entry, struct folio *folio)
/*
* On AArch64, the cache coherency is handled via the set_pte_at() function.
*/
-static inline void update_mmu_cache(struct vm_area_struct *vma,
- unsigned long addr, pte_t *ptep)
+static inline void update_mmu_cache_range(struct vm_area_struct *vma,
+ unsigned long addr, pte_t *ptep, unsigned int nr)
{
/*
* We don't do anything here, so there's a very small chance of
@@ -1069,6 +1078,8 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
*/
}

+#define update_mmu_cache(vma, addr, ptep) \
+ update_mmu_cache_range(vma, addr, ptep, 1)
#define update_mmu_cache_pmd(vma, address, pmd) do { } while (0)

#ifdef CONFIG_ARM64_PA_BITS_52
diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
index 5f9379b3c8c8..deb781af0a3a 100644
--- a/arch/arm64/mm/flush.c
+++ b/arch/arm64/mm/flush.c
@@ -50,20 +50,13 @@ void copy_to_user_page(struct vm_area_struct *vma, struct page *page,

void __sync_icache_dcache(pte_t pte)
{
- struct page *page = pte_page(pte);
+ struct folio *folio = page_folio(pte_page(pte));

- /*
- * HugeTLB pages are always fully mapped, so only setting head page's
- * PG_dcache_clean flag is enough.
- */
- if (PageHuge(page))
- page = compound_head(page);
-
- if (!test_bit(PG_dcache_clean, &page->flags)) {
- sync_icache_aliases((unsigned long)page_address(page),
- (unsigned long)page_address(page) +
- page_size(page));
- set_bit(PG_dcache_clean, &page->flags);
+ if (!test_bit(PG_dcache_clean, &folio->flags)) {
+ sync_icache_aliases((unsigned long)folio_address(folio),
+ (unsigned long)folio_address(folio) +
+ folio_size(folio));
+ set_bit(PG_dcache_clean, &folio->flags);
}
}
EXPORT_SYMBOL_GPL(__sync_icache_dcache);
@@ -73,17 +66,16 @@ EXPORT_SYMBOL_GPL(__sync_icache_dcache);
* it as dirty for later flushing when mapped in user space (if executable,
* see __sync_icache_dcache).
*/
-void flush_dcache_page(struct page *page)
+void flush_dcache_folio(struct folio *folio)
{
- /*
- * HugeTLB pages are always fully mapped and only head page will be
- * set PG_dcache_clean (see comments in __sync_icache_dcache()).
- */
- if (PageHuge(page))
- page = compound_head(page);
+ if (test_bit(PG_dcache_clean, &folio->flags))
+ clear_bit(PG_dcache_clean, &folio->flags);
+}
+EXPORT_SYMBOL(flush_dcache_folio);

- if (test_bit(PG_dcache_clean, &page->flags))
- clear_bit(PG_dcache_clean, &page->flags);
+void flush_dcache_page(struct page *page)
+{
+ flush_dcache_folio(page_folio(page));
}
EXPORT_SYMBOL(flush_dcache_page);

--
2.39.1


2023-02-27 17:58:26

by Matthew Wilcox

Subject: [PATCH v2 20/30] s390: Implement the new page table range API

Add set_ptes() and update_mmu_cache_range().

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Sven Schnelle <[email protected]>
Cc: [email protected]
---
arch/s390/include/asm/pgtable.h | 34 ++++++++++++++++++++++++---------
1 file changed, 25 insertions(+), 9 deletions(-)

diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 2c70b4d1263d..46bf475116f1 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -50,6 +50,7 @@ void arch_report_meminfo(struct seq_file *m);
* tables contain all the necessary information.
*/
#define update_mmu_cache(vma, address, ptep) do { } while (0)
+#define update_mmu_cache_range(vma, addr, ptep, nr) do { } while (0)
#define update_mmu_cache_pmd(vma, address, ptep) do { } while (0)

/*
@@ -1317,21 +1318,36 @@ pgprot_t pgprot_writecombine(pgprot_t prot);
pgprot_t pgprot_writethrough(pgprot_t prot);

/*
- * Certain architectures need to do special things when PTEs
- * within a page table are directly modified. Thus, the following
- * hook is made available.
+ * Set multiple PTEs to consecutive pages with a single call. All PTEs
+ * are within the same folio, PMD and VMA.
*/
-static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
- pte_t *ptep, pte_t entry)
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+ pte_t *ptep, pte_t entry, unsigned int nr)
{
if (pte_present(entry))
entry = clear_pte_bit(entry, __pgprot(_PAGE_UNUSED));
- if (mm_has_pgste(mm))
- ptep_set_pte_at(mm, addr, ptep, entry);
- else
- set_pte(ptep, entry);
+ if (mm_has_pgste(mm)) {
+ for (;;) {
+ ptep_set_pte_at(mm, addr, ptep, entry);
+ if (--nr == 0)
+ break;
+ ptep++;
+ entry = __pte(pte_val(entry) + PAGE_SIZE);
+ addr += PAGE_SIZE;
+ }
+ } else {
+ for (;;) {
+ set_pte(ptep, entry);
+ if (--nr == 0)
+ break;
+ ptep++;
+ entry = __pte(pte_val(entry) + PAGE_SIZE);
+ }
+ }
}

+#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
+
/*
* Conversion functions: convert a page and protection to a page entry,
* and a page entry and page directory to the page they refer to.
--
2.39.1


2023-02-27 17:58:29

by Matthew Wilcox

Subject: [PATCH v2 26/30] xtensa: Implement the new page table range API

Add set_ptes(), update_mmu_cache_range(), flush_dcache_folio() and
flush_icache_pages().

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: [email protected]
---
arch/xtensa/include/asm/cacheflush.h | 9 ++-
arch/xtensa/include/asm/pgtable.h | 24 +++++---
arch/xtensa/mm/cache.c | 83 ++++++++++++++++------------
3 files changed, 72 insertions(+), 44 deletions(-)

diff --git a/arch/xtensa/include/asm/cacheflush.h b/arch/xtensa/include/asm/cacheflush.h
index 7b4359312c25..35153f6725e4 100644
--- a/arch/xtensa/include/asm/cacheflush.h
+++ b/arch/xtensa/include/asm/cacheflush.h
@@ -119,8 +119,14 @@ void flush_cache_page(struct vm_area_struct*,
#define flush_cache_vmap(start,end) flush_cache_all()
#define flush_cache_vunmap(start,end) flush_cache_all()

+void flush_dcache_folio(struct folio *folio);
+#define flush_dcache_folio flush_dcache_folio
+
#define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
-void flush_dcache_page(struct page *);
+static inline void flush_dcache_page(struct page *page)
+{
+ flush_dcache_folio(page_folio(page));
+}

void local_flush_cache_range(struct vm_area_struct *vma,
unsigned long start, unsigned long end);
@@ -156,6 +162,7 @@ void local_flush_cache_page(struct vm_area_struct *vma,

/* This is not required, see Documentation/core-api/cachetlb.rst */
#define flush_icache_page(vma,page) do { } while (0)
+#define flush_icache_pages(vma, page, nr) do { } while (0)

#define flush_dcache_mmap_lock(mapping) do { } while (0)
#define flush_dcache_mmap_unlock(mapping) do { } while (0)
diff --git a/arch/xtensa/include/asm/pgtable.h b/arch/xtensa/include/asm/pgtable.h
index fc7a14884c6c..293101530541 100644
--- a/arch/xtensa/include/asm/pgtable.h
+++ b/arch/xtensa/include/asm/pgtable.h
@@ -301,17 +301,25 @@ static inline void update_pte(pte_t *ptep, pte_t pteval)

struct mm_struct;

-static inline void
-set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_t pteval)
+static inline void set_pte(pte_t *ptep, pte_t pte)
{
- update_pte(ptep, pteval);
+ update_pte(ptep, pte);
}

-static inline void set_pte(pte_t *ptep, pte_t pteval)
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+ pte_t *ptep, pte_t pte, unsigned int nr)
{
- update_pte(ptep, pteval);
+ for (;;) {
+ set_pte(ptep, pte);
+ if (--nr == 0)
+ break;
+ ptep++;
+ pte_val(pte) += PAGE_SIZE;
+ }
}

+#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
+
static inline void
set_pmd(pmd_t *pmdp, pmd_t pmdval)
{
@@ -407,8 +415,10 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte)

#else

-extern void update_mmu_cache(struct vm_area_struct * vma,
- unsigned long address, pte_t *ptep);
+void update_mmu_cache_range(struct vm_area_struct *vma,
+ unsigned long address, pte_t *ptep, unsigned int nr);
+#define update_mmu_cache(vma, address, ptep) \
+ update_mmu_cache_range(vma, address, ptep, 1)

typedef pte_t *pte_addr_t;

diff --git a/arch/xtensa/mm/cache.c b/arch/xtensa/mm/cache.c
index 19e5a478a7e8..65c0d5298041 100644
--- a/arch/xtensa/mm/cache.c
+++ b/arch/xtensa/mm/cache.c
@@ -121,9 +121,9 @@ EXPORT_SYMBOL(copy_user_highpage);
*
*/

-void flush_dcache_page(struct page *page)
+void flush_dcache_folio(struct folio *folio)
{
- struct address_space *mapping = page_mapping_file(page);
+ struct address_space *mapping = folio_flush_mapping(folio);

/*
* If we have a mapping but the page is not mapped to user-space
@@ -132,14 +132,14 @@ void flush_dcache_page(struct page *page)
*/

if (mapping && !mapping_mapped(mapping)) {
- if (!test_bit(PG_arch_1, &page->flags))
- set_bit(PG_arch_1, &page->flags);
+ if (!test_bit(PG_arch_1, &folio->flags))
+ set_bit(PG_arch_1, &folio->flags);
return;

} else {
-
- unsigned long phys = page_to_phys(page);
- unsigned long temp = page->index << PAGE_SHIFT;
+ unsigned long phys = folio_pfn(folio) * PAGE_SIZE;
+ unsigned long temp = folio_pos(folio);
+ unsigned int i, nr = folio_nr_pages(folio);
unsigned long alias = !(DCACHE_ALIAS_EQ(temp, phys));
unsigned long virt;

@@ -154,22 +154,26 @@ void flush_dcache_page(struct page *page)
return;

preempt_disable();
- virt = TLBTEMP_BASE_1 + (phys & DCACHE_ALIAS_MASK);
- __flush_invalidate_dcache_page_alias(virt, phys);
+ for (i = 0; i < nr; i++) {
+ virt = TLBTEMP_BASE_1 + (phys & DCACHE_ALIAS_MASK);
+ __flush_invalidate_dcache_page_alias(virt, phys);

- virt = TLBTEMP_BASE_1 + (temp & DCACHE_ALIAS_MASK);
+ virt = TLBTEMP_BASE_1 + (temp & DCACHE_ALIAS_MASK);

- if (alias)
- __flush_invalidate_dcache_page_alias(virt, phys);
+ if (alias)
+ __flush_invalidate_dcache_page_alias(virt, phys);

- if (mapping)
- __invalidate_icache_page_alias(virt, phys);
+ if (mapping)
+ __invalidate_icache_page_alias(virt, phys);
+ phys += PAGE_SIZE;
+ temp += PAGE_SIZE;
+ }
preempt_enable();
}

/* There shouldn't be an entry in the cache for this page anymore. */
}
-EXPORT_SYMBOL(flush_dcache_page);
+EXPORT_SYMBOL(flush_dcache_folio);

/*
* For now, flush the whole cache. FIXME??
@@ -207,45 +211,52 @@ EXPORT_SYMBOL(local_flush_cache_page);

#endif /* DCACHE_WAY_SIZE > PAGE_SIZE */

-void
-update_mmu_cache(struct vm_area_struct * vma, unsigned long addr, pte_t *ptep)
+void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long addr,
+ pte_t *ptep, unsigned int nr)
{
unsigned long pfn = pte_pfn(*ptep);
- struct page *page;
+ struct folio *folio;
+ unsigned int i;

if (!pfn_valid(pfn))
return;

- page = pfn_to_page(pfn);
+ folio = page_folio(pfn_to_page(pfn));

- /* Invalidate old entry in TLBs */
-
- flush_tlb_page(vma, addr);
+ /* Invalidate old entries in TLBs */
+ for (i = 0; i < nr; i++)
+ flush_tlb_page(vma, addr + i * PAGE_SIZE);
+ nr = folio_nr_pages(folio);

#if (DCACHE_WAY_SIZE > PAGE_SIZE)

- if (!PageReserved(page) && test_bit(PG_arch_1, &page->flags)) {
- unsigned long phys = page_to_phys(page);
+ if (!folio_test_reserved(folio) && test_bit(PG_arch_1, &folio->flags)) {
+ unsigned long phys = folio_pfn(folio) * PAGE_SIZE;
unsigned long tmp;

preempt_disable();
- tmp = TLBTEMP_BASE_1 + (phys & DCACHE_ALIAS_MASK);
- __flush_invalidate_dcache_page_alias(tmp, phys);
- tmp = TLBTEMP_BASE_1 + (addr & DCACHE_ALIAS_MASK);
- __flush_invalidate_dcache_page_alias(tmp, phys);
- __invalidate_icache_page_alias(tmp, phys);
+ for (i = 0; i < nr; i++) {
+ tmp = TLBTEMP_BASE_1 + (phys & DCACHE_ALIAS_MASK);
+ __flush_invalidate_dcache_page_alias(tmp, phys);
+ tmp = TLBTEMP_BASE_1 + (addr & DCACHE_ALIAS_MASK);
+ __flush_invalidate_dcache_page_alias(tmp, phys);
+ __invalidate_icache_page_alias(tmp, phys);
+ phys += PAGE_SIZE;
+ }
preempt_enable();

- clear_bit(PG_arch_1, &page->flags);
+ clear_bit(PG_arch_1, &folio->flags);
}
#else
- if (!PageReserved(page) && !test_bit(PG_arch_1, &page->flags)
+ if (!folio_test_reserved(folio) && !test_bit(PG_arch_1, &folio->flags)
&& (vma->vm_flags & VM_EXEC) != 0) {
- unsigned long paddr = (unsigned long)kmap_atomic(page);
- __flush_dcache_page(paddr);
- __invalidate_icache_page(paddr);
- set_bit(PG_arch_1, &page->flags);
- kunmap_atomic((void *)paddr);
+ for (i = 0; i < nr; i++) {
+ void *paddr = kmap_local_folio(folio, i * PAGE_SIZE);
+ __flush_dcache_page((unsigned long)paddr);
+ __invalidate_icache_page((unsigned long)paddr);
+ kunmap_local(paddr);
+ }
+ set_bit(PG_arch_1, &folio->flags);
}
#endif
}
--
2.39.1


2023-02-27 17:58:33

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH v2 17/30] parisc: Implement the new page table range API

Add set_ptes(), update_mmu_cache_range(), flush_dcache_folio()
and flush_icache_pages(). Change the PG_arch_1 (aka PG_dcache_dirty) flag
from being per-page to per-folio.
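
Making the flag per-folio means the deferred flush in __update_cache()
now has to cover every page of the folio once the bit is seen set. A
condensed sketch of that pattern (mirroring the cache.c hunk below,
using the folio helpers introduced earlier in this series):

	if (folio_flush_mapping(folio) &&
	    test_bit(PG_dcache_dirty, &folio->flags)) {
		unsigned long pfn = folio_pfn(folio);
		unsigned int nr = folio_nr_pages(folio);

		/* One dirty bit now stands for nr pages: flush them all. */
		while (nr--)
			flush_kernel_dcache_page_addr(pfn_va(pfn + nr));
		clear_bit(PG_dcache_dirty, &folio->flags);
	}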

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: [email protected]
---
arch/parisc/include/asm/cacheflush.h | 14 ++--
arch/parisc/include/asm/pgtable.h | 28 +++++---
arch/parisc/kernel/cache.c | 101 +++++++++++++++++++--------
3 files changed, 99 insertions(+), 44 deletions(-)

diff --git a/arch/parisc/include/asm/cacheflush.h b/arch/parisc/include/asm/cacheflush.h
index ff07c509e04b..0bf8b69d086b 100644
--- a/arch/parisc/include/asm/cacheflush.h
+++ b/arch/parisc/include/asm/cacheflush.h
@@ -46,16 +46,20 @@ void invalidate_kernel_vmap_range(void *vaddr, int size);
#define flush_cache_vmap(start, end) flush_cache_all()
#define flush_cache_vunmap(start, end) flush_cache_all()

+void flush_dcache_folio(struct folio *folio);
+#define flush_dcache_folio flush_dcache_folio
#define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
-void flush_dcache_page(struct page *page);
+static inline void flush_dcache_page(struct page *page)
+{
+ flush_dcache_folio(page_folio(page));
+}

#define flush_dcache_mmap_lock(mapping) xa_lock_irq(&mapping->i_pages)
#define flush_dcache_mmap_unlock(mapping) xa_unlock_irq(&mapping->i_pages)

-#define flush_icache_page(vma,page) do { \
- flush_kernel_dcache_page_addr(page_address(page)); \
- flush_kernel_icache_page(page_address(page)); \
-} while (0)
+void flush_icache_pages(struct vm_area_struct *vma, struct page *page,
+ unsigned int nr);
+#define flush_icache_page(vma, page) flush_icache_pages(vma, page, 1)

#define flush_icache_range(s,e) do { \
flush_kernel_dcache_range_asm(s,e); \
diff --git a/arch/parisc/include/asm/pgtable.h b/arch/parisc/include/asm/pgtable.h
index e2950f5db7c9..78ee9816f423 100644
--- a/arch/parisc/include/asm/pgtable.h
+++ b/arch/parisc/include/asm/pgtable.h
@@ -73,14 +73,7 @@ extern void __update_cache(pte_t pte);
mb(); \
} while(0)

-#define set_pte_at(mm, addr, pteptr, pteval) \
- do { \
- if (pte_present(pteval) && \
- pte_user(pteval)) \
- __update_cache(pteval); \
- *(pteptr) = (pteval); \
- purge_tlb_entries(mm, addr); \
- } while (0)
+#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)

#endif /* !__ASSEMBLY__ */

@@ -391,11 +384,28 @@ static inline unsigned long pmd_page_vaddr(pmd_t pmd)

extern void paging_init (void);

+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+ pte_t *ptep, pte_t pte, unsigned int nr)
+{
+ if (pte_present(pte) && pte_user(pte))
+ __update_cache(pte);
+ for (;;) {
+ *ptep = pte;
+ purge_tlb_entries(mm, addr);
+ if (--nr == 0)
+ break;
+ ptep++;
+ pte_val(pte) += 1 << PFN_PTE_SHIFT;
+ addr += PAGE_SIZE;
+ }
+}
+
/* Used for deferring calls to flush_dcache_page() */

#define PG_dcache_dirty PG_arch_1

-#define update_mmu_cache(vms,addr,ptep) __update_cache(*ptep)
+#define update_mmu_cache_range(vma, addr, ptep, nr) __update_cache(*ptep)
+#define update_mmu_cache(vma, addr, ptep) __update_cache(*ptep)

/*
* Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
diff --git a/arch/parisc/kernel/cache.c b/arch/parisc/kernel/cache.c
index 984d3a1b3828..16057812103b 100644
--- a/arch/parisc/kernel/cache.c
+++ b/arch/parisc/kernel/cache.c
@@ -92,11 +92,11 @@ static inline void flush_data_cache(void)
/* Kernel virtual address of pfn. */
#define pfn_va(pfn) __va(PFN_PHYS(pfn))

-void
-__update_cache(pte_t pte)
+void __update_cache(pte_t pte)
{
unsigned long pfn = pte_pfn(pte);
- struct page *page;
+ struct folio *folio;
+ unsigned int nr;

/* We don't have pte special. As a result, we can be called with
an invalid pfn and we don't need to flush the kernel dcache page.
@@ -104,13 +104,17 @@ __update_cache(pte_t pte)
if (!pfn_valid(pfn))
return;

- page = pfn_to_page(pfn);
- if (page_mapping_file(page) &&
- test_bit(PG_dcache_dirty, &page->flags)) {
- flush_kernel_dcache_page_addr(pfn_va(pfn));
- clear_bit(PG_dcache_dirty, &page->flags);
+ folio = page_folio(pfn_to_page(pfn));
+ pfn = folio_pfn(folio);
+ nr = folio_nr_pages(folio);
+ if (folio_flush_mapping(folio) &&
+ test_bit(PG_dcache_dirty, &folio->flags)) {
+ while (nr--)
+ flush_kernel_dcache_page_addr(pfn_va(pfn + nr));
+ clear_bit(PG_dcache_dirty, &folio->flags);
} else if (parisc_requires_coherency())
- flush_kernel_dcache_page_addr(pfn_va(pfn));
+ while (nr--)
+ flush_kernel_dcache_page_addr(pfn_va(pfn + nr));
}

void
@@ -365,6 +369,20 @@ static void flush_user_cache_page(struct vm_area_struct *vma, unsigned long vmad
preempt_enable();
}

+void flush_icache_pages(struct vm_area_struct *vma, struct page *page,
+ unsigned int nr)
+{
+ void *kaddr = page_address(page);
+
+ for (;;) {
+ flush_kernel_dcache_page_addr(kaddr);
+ flush_kernel_icache_page(kaddr);
+ if (--nr == 0)
+ break;
+ kaddr += PAGE_SIZE;
+ }
+}
+
static inline pte_t *get_ptep(struct mm_struct *mm, unsigned long addr)
{
pte_t *ptep = NULL;
@@ -393,26 +411,30 @@ static inline bool pte_needs_flush(pte_t pte)
== (_PAGE_PRESENT | _PAGE_ACCESSED);
}

-void flush_dcache_page(struct page *page)
+void flush_dcache_folio(struct folio *folio)
{
- struct address_space *mapping = page_mapping_file(page);
- struct vm_area_struct *mpnt;
- unsigned long offset;
+ struct address_space *mapping = folio_flush_mapping(folio);
+ struct vm_area_struct *vma;
unsigned long addr, old_addr = 0;
+ void *kaddr;
unsigned long count = 0;
+ unsigned long i, nr;
pgoff_t pgoff;

if (mapping && !mapping_mapped(mapping)) {
- set_bit(PG_dcache_dirty, &page->flags);
+ set_bit(PG_dcache_dirty, &folio->flags);
return;
}

- flush_kernel_dcache_page_addr(page_address(page));
+ nr = folio_nr_pages(folio);
+ kaddr = folio_address(folio);
+ for (i = 0; i < nr; i++)
+ flush_kernel_dcache_page_addr(kaddr + i * PAGE_SIZE);

if (!mapping)
return;

- pgoff = page->index;
+ pgoff = folio->index;

/*
* We have carefully arranged in arch_get_unmapped_area() that
@@ -422,15 +444,29 @@ void flush_dcache_page(struct page *page)
* on machines that support equivalent aliasing
*/
flush_dcache_mmap_lock(mapping);
- vma_interval_tree_foreach(mpnt, &mapping->i_mmap, pgoff, pgoff) {
- offset = (pgoff - mpnt->vm_pgoff) << PAGE_SHIFT;
- addr = mpnt->vm_start + offset;
- if (parisc_requires_coherency()) {
- pte_t *ptep;
+ vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff + nr - 1) {
+ unsigned long offset = pgoff - vma->vm_pgoff;
+ unsigned long pfn = folio_pfn(folio);
+
+ addr = vma->vm_start;
+ nr = folio_nr_pages(folio);
+ if (offset > -nr) {
+ pfn -= offset;
+ nr += offset;
+ } else {
+ addr += offset * PAGE_SIZE;
+ }
+ if (addr + nr * PAGE_SIZE > vma->vm_end)
+ nr = (vma->vm_end - addr) / PAGE_SIZE;

- ptep = get_ptep(mpnt->vm_mm, addr);
- if (ptep && pte_needs_flush(*ptep))
- flush_user_cache_page(mpnt, addr);
+ if (parisc_requires_coherency()) {
+ for (i = 0; i < nr; i++) {
+ pte_t *ptep = get_ptep(vma->vm_mm,
+ addr + i * PAGE_SIZE);
+ if (ptep && pte_needs_flush(*ptep))
+ flush_user_cache_page(vma,
+ addr + i * PAGE_SIZE);
+ }
} else {
/*
* The TLB is the engine of coherence on parisc:
@@ -443,27 +479,32 @@ void flush_dcache_page(struct page *page)
* in (until the user or kernel specifically
* accesses it, of course)
*/
- flush_tlb_page(mpnt, addr);
+ for (i = 0; i < nr; i++)
+ flush_tlb_page(vma, addr + i * PAGE_SIZE);
if (old_addr == 0 || (old_addr & (SHM_COLOUR - 1))
!= (addr & (SHM_COLOUR - 1))) {
- __flush_cache_page(mpnt, addr, page_to_phys(page));
+ for (i = 0; i < nr; i++)
+ __flush_cache_page(vma,
+ addr + i * PAGE_SIZE,
+ (pfn + i) * PAGE_SIZE);
/*
* Software is allowed to have any number
* of private mappings to a page.
*/
- if (!(mpnt->vm_flags & VM_SHARED))
+ if (!(vma->vm_flags & VM_SHARED))
continue;
if (old_addr)
pr_err("INEQUIVALENT ALIASES 0x%lx and 0x%lx in file %pD\n",
- old_addr, addr, mpnt->vm_file);
- old_addr = addr;
+ old_addr, addr, vma->vm_file);
+ if (nr == folio_nr_pages(folio))
+ old_addr = addr;
}
}
WARN_ON(++count == 4096);
}
flush_dcache_mmap_unlock(mapping);
}
-EXPORT_SYMBOL(flush_dcache_page);
+EXPORT_SYMBOL(flush_dcache_folio);

/* Defined in arch/parisc/kernel/pacache.S */
EXPORT_SYMBOL(flush_kernel_dcache_range_asm);
--
2.39.1


2023-02-27 17:58:35

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH v2 22/30] sparc32: Implement the new page table range API

Add set_ptes(), update_mmu_cache_range(), flush_dcache_folio() and
flush_icache_pages().

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: [email protected]
---
arch/sparc/include/asm/cacheflush_32.h | 9 +++++++--
arch/sparc/include/asm/pgtable_32.h | 15 ++++++++++++++-
arch/sparc/mm/init_32.c | 13 +++++++++++--
3 files changed, 32 insertions(+), 5 deletions(-)

diff --git a/arch/sparc/include/asm/cacheflush_32.h b/arch/sparc/include/asm/cacheflush_32.h
index adb6991d0455..8dba35d63328 100644
--- a/arch/sparc/include/asm/cacheflush_32.h
+++ b/arch/sparc/include/asm/cacheflush_32.h
@@ -16,6 +16,7 @@
sparc32_cachetlb_ops->cache_page(vma, addr)
#define flush_icache_range(start, end) do { } while (0)
#define flush_icache_page(vma, pg) do { } while (0)
+#define flush_icache_pages(vma, pg, nr) do { } while (0)

#define copy_to_user_page(vma, page, vaddr, dst, src, len) \
do { \
@@ -35,11 +36,15 @@
#define flush_page_for_dma(addr) \
sparc32_cachetlb_ops->page_for_dma(addr)

-struct page;
void sparc_flush_page_to_ram(struct page *page);
+void sparc_flush_folio_to_ram(struct folio *folio);

#define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
-#define flush_dcache_page(page) sparc_flush_page_to_ram(page)
+#define flush_dcache_folio(folio) sparc_flush_folio_to_ram(folio)
+static inline void flush_dcache_page(struct page *page)
+{
+ flush_dcache_folio(page_folio(page));
+}
#define flush_dcache_mmap_lock(mapping) do { } while (0)
#define flush_dcache_mmap_unlock(mapping) do { } while (0)

diff --git a/arch/sparc/include/asm/pgtable_32.h b/arch/sparc/include/asm/pgtable_32.h
index d4330e3c57a6..47ae55ea1837 100644
--- a/arch/sparc/include/asm/pgtable_32.h
+++ b/arch/sparc/include/asm/pgtable_32.h
@@ -101,7 +101,19 @@ static inline void set_pte(pte_t *ptep, pte_t pteval)
srmmu_swap((unsigned long *)ptep, pte_val(pteval));
}

-#define set_pte_at(mm,addr,ptep,pteval) set_pte(ptep,pteval)
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+ pte_t *ptep, pte_t pte, unsigned int nr)
+{
+ for (;;) {
+ set_pte(ptep, pte);
+ if (--nr == 0)
+ break;
+ ptep++;
+ pte_val(pte) += PAGE_SIZE >> 4;
+ }
+}
+
+#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)

static inline int srmmu_device_memory(unsigned long x)
{
@@ -318,6 +330,7 @@ void mmu_info(struct seq_file *m);
#define FAULT_CODE_USER 0x4

#define update_mmu_cache(vma, address, ptep) do { } while (0)
+#define update_mmu_cache_range(vma, address, ptep, nr) do { } while (0)

void srmmu_mapiorange(unsigned int bus, unsigned long xpa,
unsigned long xva, unsigned int len);
diff --git a/arch/sparc/mm/init_32.c b/arch/sparc/mm/init_32.c
index 9c0ea457bdf0..d96a14ffceeb 100644
--- a/arch/sparc/mm/init_32.c
+++ b/arch/sparc/mm/init_32.c
@@ -297,11 +297,20 @@ void sparc_flush_page_to_ram(struct page *page)
{
unsigned long vaddr = (unsigned long)page_address(page);

- if (vaddr)
- __flush_page_to_ram(vaddr);
+ __flush_page_to_ram(vaddr);
}
EXPORT_SYMBOL(sparc_flush_page_to_ram);

+void sparc_flush_folio_to_ram(struct folio *folio)
+{
+ unsigned long vaddr = (unsigned long)folio_address(folio);
+ unsigned int i, nr = folio_nr_pages(folio);
+
+ for (i = 0; i < nr; i++)
+ __flush_page_to_ram(vaddr + i * PAGE_SIZE);
+}
+EXPORT_SYMBOL(sparc_flush_folio_to_ram);
+
static const pgprot_t protection_map[16] = {
[VM_NONE] = PAGE_NONE,
[VM_READ] = PAGE_READONLY,
--
2.39.1


2023-02-27 17:58:37

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH v2 16/30] openrisc: Implement the new page table range API

Add set_ptes(), update_mmu_cache_range() and flush_dcache_folio().
Change the PG_arch_1 (aka PG_dcache_dirty) flag from being per-page
to per-folio.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Jonas Bonn <[email protected]>
Cc: Stefan Kristiansson <[email protected]>
Cc: Stafford Horne <[email protected]>
Cc: [email protected]
---
arch/openrisc/include/asm/cacheflush.h | 8 +++++++-
arch/openrisc/include/asm/pgtable.h | 27 +++++++++++++++++++++-----
arch/openrisc/mm/cache.c | 12 ++++++++----
3 files changed, 37 insertions(+), 10 deletions(-)

diff --git a/arch/openrisc/include/asm/cacheflush.h b/arch/openrisc/include/asm/cacheflush.h
index eeac40d4a854..984c331ff5f4 100644
--- a/arch/openrisc/include/asm/cacheflush.h
+++ b/arch/openrisc/include/asm/cacheflush.h
@@ -56,10 +56,16 @@ static inline void sync_icache_dcache(struct page *page)
*/
#define PG_dc_clean PG_arch_1

+static inline void flush_dcache_folio(struct folio *folio)
+{
+ clear_bit(PG_dc_clean, &folio->flags);
+}
+#define flush_dcache_folio flush_dcache_folio
+
#define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
static inline void flush_dcache_page(struct page *page)
{
- clear_bit(PG_dc_clean, &page->flags);
+ flush_dcache_folio(page_folio(page));
}

#define flush_icache_user_page(vma, page, addr, len) \
diff --git a/arch/openrisc/include/asm/pgtable.h b/arch/openrisc/include/asm/pgtable.h
index 3eb9b9555d0d..1a7077150d7b 100644
--- a/arch/openrisc/include/asm/pgtable.h
+++ b/arch/openrisc/include/asm/pgtable.h
@@ -46,7 +46,21 @@ extern void paging_init(void);
* hook is made available.
*/
#define set_pte(pteptr, pteval) ((*(pteptr)) = (pteval))
-#define set_pte_at(mm, addr, ptep, pteval) set_pte(ptep, pteval)
+
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+ pte_t *ptep, pte_t pte, unsigned int nr)
+{
+ for (;;) {
+ set_pte(ptep, pte);
+ if (--nr == 0)
+ break;
+ ptep++;
+ pte_val(pte) += PAGE_SIZE;
+ }
+}
+
+#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
+
/*
* (pmds are folded into pgds so this doesn't get actually called,
* but the define is needed for a generic inline function.)
@@ -379,13 +393,16 @@ static inline void update_tlb(struct vm_area_struct *vma,
extern void update_cache(struct vm_area_struct *vma,
unsigned long address, pte_t *pte);

-static inline void update_mmu_cache(struct vm_area_struct *vma,
- unsigned long address, pte_t *pte)
+static inline void update_mmu_cache_range(struct vm_area_struct *vma,
+ unsigned long address, pte_t *ptep, unsigned int nr)
{
- update_tlb(vma, address, pte);
- update_cache(vma, address, pte);
+ update_tlb(vma, address, ptep);
+ update_cache(vma, address, ptep);
}

+#define update_mmu_cache(vma, addr, ptep) \
+ update_mmu_cache_range(vma, addr, ptep, 1)
+
/* __PHX__ FIXME, SWAP, this probably doesn't work */

/*
diff --git a/arch/openrisc/mm/cache.c b/arch/openrisc/mm/cache.c
index 534a52ec5e66..eb43b73f3855 100644
--- a/arch/openrisc/mm/cache.c
+++ b/arch/openrisc/mm/cache.c
@@ -43,15 +43,19 @@ void update_cache(struct vm_area_struct *vma, unsigned long address,
pte_t *pte)
{
unsigned long pfn = pte_val(*pte) >> PAGE_SHIFT;
- struct page *page = pfn_to_page(pfn);
- int dirty = !test_and_set_bit(PG_dc_clean, &page->flags);
+ struct folio *folio = page_folio(pfn_to_page(pfn));
+ int dirty = !test_and_set_bit(PG_dc_clean, &folio->flags);

/*
* Since icaches do not snoop for updated data on OpenRISC, we
* must write back and invalidate any dirty pages manually. We
* can skip data pages, since they will not end up in icaches.
*/
- if ((vma->vm_flags & VM_EXEC) && dirty)
- sync_icache_dcache(page);
+ if ((vma->vm_flags & VM_EXEC) && dirty) {
+ unsigned int nr = folio_nr_pages(folio);
+
+ while (nr--)
+ sync_icache_dcache(folio_page(folio, nr));
+ }
}

--
2.39.1


2023-02-27 17:58:41

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH v2 18/30] powerpc: Implement the new page table range API

Add set_ptes(), update_mmu_cache_range() and flush_dcache_folio().
Change the PG_arch_1 (aka PG_dcache_dirty) flag from being per-page to
per-folio.

I'm unsure about my merging of flush_dcache_icache_hugepage() and
flush_dcache_icache_page() into flush_dcache_icache_folio() and subsequent
removal of flush_dcache_icache_phys(). Please review.
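
In outline, the merged helper reduces to a single per-page loop that
only differs in how each page is mapped; a condensed sketch of what
the cacheflush.c hunk below implements (highmem pages go through
kmap_local_folio() with a byte offset, lowmem pages use
folio_address() directly):

	void flush_dcache_icache_folio(struct folio *folio)
	{
		unsigned int i, nr = folio_nr_pages(folio);

		if (flush_coherent_icache())
			return;

		for (i = 0; i < nr; i++) {
			if (folio_test_highmem(folio)) {
				void *start = kmap_local_folio(folio,
							i * PAGE_SIZE);
				__flush_dcache_icache(start);
				kunmap_local(start);
			} else {
				__flush_dcache_icache(folio_address(folio) +
							i * PAGE_SIZE);
			}
		}
	}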

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: [email protected]
---
arch/powerpc/include/asm/book3s/pgtable.h | 10 +--
arch/powerpc/include/asm/cacheflush.h | 14 ++--
arch/powerpc/include/asm/kvm_ppc.h | 10 +--
arch/powerpc/include/asm/nohash/pgtable.h | 13 ++--
arch/powerpc/include/asm/pgtable.h | 6 ++
arch/powerpc/mm/book3s64/hash_utils.c | 11 +--
arch/powerpc/mm/cacheflush.c | 81 +++--------------------
arch/powerpc/mm/nohash/e500_hugetlbpage.c | 3 +-
arch/powerpc/mm/pgtable.c | 51 ++++++++------
9 files changed, 73 insertions(+), 126 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/pgtable.h b/arch/powerpc/include/asm/book3s/pgtable.h
index d18b748ea3ae..c2ef811505b0 100644
--- a/arch/powerpc/include/asm/book3s/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/pgtable.h
@@ -9,13 +9,6 @@
#endif

#ifndef __ASSEMBLY__
-/* Insert a PTE, top-level function is out of line. It uses an inline
- * low level function in the respective pgtable-* files
- */
-extern void set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
- pte_t pte);
-
-
#define __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS
extern int ptep_set_access_flags(struct vm_area_struct *vma, unsigned long address,
pte_t *ptep, pte_t entry, int dirty);
@@ -36,7 +29,8 @@ void __update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t
* corresponding HPTE into the hash table ahead of time, instead of
* waiting for the inevitable extra hash-table miss exception.
*/
-static inline void update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t *ptep)
+static inline void update_mmu_cache_range(struct vm_area_struct *vma,
+ unsigned long address, pte_t *ptep, unsigned int nr)
{
if (IS_ENABLED(CONFIG_PPC32) && !mmu_has_feature(MMU_FTR_HPTE_TABLE))
return;
diff --git a/arch/powerpc/include/asm/cacheflush.h b/arch/powerpc/include/asm/cacheflush.h
index 7564dd4fd12b..ef7d2de33b89 100644
--- a/arch/powerpc/include/asm/cacheflush.h
+++ b/arch/powerpc/include/asm/cacheflush.h
@@ -35,13 +35,19 @@ static inline void flush_cache_vmap(unsigned long start, unsigned long end)
* It just marks the page as not i-cache clean. We do the i-cache
* flush later when the page is given to a user process, if necessary.
*/
-static inline void flush_dcache_page(struct page *page)
+static inline void flush_dcache_folio(struct folio *folio)
{
if (cpu_has_feature(CPU_FTR_COHERENT_ICACHE))
return;
/* avoid an atomic op if possible */
- if (test_bit(PG_dcache_clean, &page->flags))
- clear_bit(PG_dcache_clean, &page->flags);
+ if (test_bit(PG_dcache_clean, &folio->flags))
+ clear_bit(PG_dcache_clean, &folio->flags);
+}
+#define flush_dcache_folio flush_dcache_folio
+
+static inline void flush_dcache_page(struct page *page)
+{
+ flush_dcache_folio(page_folio(page));
}

void flush_icache_range(unsigned long start, unsigned long stop);
@@ -51,7 +57,7 @@ void flush_icache_user_page(struct vm_area_struct *vma, struct page *page,
unsigned long addr, int len);
#define flush_icache_user_page flush_icache_user_page

-void flush_dcache_icache_page(struct page *page);
+void flush_dcache_icache_folio(struct folio *folio);

/**
* flush_dcache_range(): Write any modified data cache blocks out to memory and
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 6bef23d6d0e3..e91dd8e88bb7 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -868,7 +868,7 @@ void kvmppc_init_lpid(unsigned long nr_lpids);

static inline void kvmppc_mmu_flush_icache(kvm_pfn_t pfn)
{
- struct page *page;
+ struct folio *folio;
/*
* We can only access pages that the kernel maps
* as memory. Bail out for unmapped ones.
@@ -877,10 +877,10 @@ static inline void kvmppc_mmu_flush_icache(kvm_pfn_t pfn)
return;

/* Clear i-cache for new pages */
- page = pfn_to_page(pfn);
- if (!test_bit(PG_dcache_clean, &page->flags)) {
- flush_dcache_icache_page(page);
- set_bit(PG_dcache_clean, &page->flags);
+ folio = page_folio(pfn_to_page(pfn));
+ if (!test_bit(PG_dcache_clean, &folio->flags)) {
+ flush_dcache_icache_folio(folio);
+ set_bit(PG_dcache_clean, &folio->flags);
}
}

diff --git a/arch/powerpc/include/asm/nohash/pgtable.h b/arch/powerpc/include/asm/nohash/pgtable.h
index a6caaaab6f92..69a7dd47a9f0 100644
--- a/arch/powerpc/include/asm/nohash/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/pgtable.h
@@ -166,12 +166,6 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte)
return __pte(pte_val(pte) & ~_PAGE_SWP_EXCLUSIVE);
}

-/* Insert a PTE, top-level function is out of line. It uses an inline
- * low level function in the respective pgtable-* files
- */
-extern void set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
- pte_t pte);
-
/* This low level function performs the actual PTE insertion
* Setting the PTE depends on the MMU type and other factors. It's
* an horrible mess that I'm not going to try to clean up now but
@@ -282,10 +276,11 @@ static inline int pud_huge(pud_t pud)
* for the page which has just been mapped in.
*/
#if defined(CONFIG_PPC_E500) && defined(CONFIG_HUGETLB_PAGE)
-void update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t *ptep);
+void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long address,
+ pte_t *ptep, unsigned int nr);
#else
-static inline
-void update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t *ptep) {}
+static inline void update_mmu_cache_range(struct vm_area_struct *vma,
+ unsigned long address, pte_t *ptep, unsigned int nr) {}
#endif

#endif /* __ASSEMBLY__ */
diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
index 9972626ddaf6..bf1263ff7e67 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -41,6 +41,12 @@ struct mm_struct;

#ifndef __ASSEMBLY__

+void set_ptes(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
+ pte_t pte, unsigned int nr);
+#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
+#define update_mmu_cache(vma, addr, ptep) \
+ update_mmu_cache_range(vma, addr, ptep, 1)
+
#ifndef MAX_PTRS_PER_PGD
#define MAX_PTRS_PER_PGD PTRS_PER_PGD
#endif
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index fedffe3ae136..ad2afa08e62e 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -1307,18 +1307,19 @@ void hash__early_init_mmu_secondary(void)
*/
unsigned int hash_page_do_lazy_icache(unsigned int pp, pte_t pte, int trap)
{
- struct page *page;
+ struct folio *folio;

if (!pfn_valid(pte_pfn(pte)))
return pp;

- page = pte_page(pte);
+ folio = page_folio(pte_page(pte));

/* page is dirty */
- if (!test_bit(PG_dcache_clean, &page->flags) && !PageReserved(page)) {
+ if (!test_bit(PG_dcache_clean, &folio->flags) &&
+ !folio_test_reserved(folio)) {
if (trap == INTERRUPT_INST_STORAGE) {
- flush_dcache_icache_page(page);
- set_bit(PG_dcache_clean, &page->flags);
+ flush_dcache_icache_folio(folio);
+ set_bit(PG_dcache_clean, &folio->flags);
} else
pp |= HPTE_R_N;
}
diff --git a/arch/powerpc/mm/cacheflush.c b/arch/powerpc/mm/cacheflush.c
index 0e9b4879c0f9..8ea6a096a664 100644
--- a/arch/powerpc/mm/cacheflush.c
+++ b/arch/powerpc/mm/cacheflush.c
@@ -76,51 +76,6 @@ void flush_icache_range(unsigned long start, unsigned long stop)
}
EXPORT_SYMBOL(flush_icache_range);

-#ifdef CONFIG_HIGHMEM
-/**
- * flush_dcache_icache_phys() - Flush a page by it's physical address
- * @physaddr: the physical address of the page
- */
-static void flush_dcache_icache_phys(unsigned long physaddr)
-{
- unsigned long bytes = l1_dcache_bytes();
- unsigned long nb = PAGE_SIZE / bytes;
- unsigned long addr = physaddr & PAGE_MASK;
- unsigned long msr, msr0;
- unsigned long loop1 = addr, loop2 = addr;
-
- msr0 = mfmsr();
- msr = msr0 & ~MSR_DR;
- /*
- * This must remain as ASM to prevent potential memory accesses
- * while the data MMU is disabled
- */
- asm volatile(
- " mtctr %2;\n"
- " mtmsr %3;\n"
- " isync;\n"
- "0: dcbst 0, %0;\n"
- " addi %0, %0, %4;\n"
- " bdnz 0b;\n"
- " sync;\n"
- " mtctr %2;\n"
- "1: icbi 0, %1;\n"
- " addi %1, %1, %4;\n"
- " bdnz 1b;\n"
- " sync;\n"
- " mtmsr %5;\n"
- " isync;\n"
- : "+&r" (loop1), "+&r" (loop2)
- : "r" (nb), "r" (msr), "i" (bytes), "r" (msr0)
- : "ctr", "memory");
-}
-NOKPROBE_SYMBOL(flush_dcache_icache_phys)
-#else
-static void flush_dcache_icache_phys(unsigned long physaddr)
-{
-}
-#endif
-
/**
* __flush_dcache_icache(): Flush a particular page from the data cache to RAM.
* Note: this is necessary because the instruction cache does *not*
@@ -148,17 +103,20 @@ static void __flush_dcache_icache(void *p)
invalidate_icache_range(addr, addr + PAGE_SIZE);
}

-static void flush_dcache_icache_hugepage(struct page *page)
+void flush_dcache_icache_folio(struct folio *folio)
{
- int i;
- int nr = compound_nr(page);
+ unsigned int i, nr = folio_nr_pages(folio);

- if (!PageHighMem(page)) {
+ if (flush_coherent_icache())
+ return;
+
+ if (!folio_test_highmem(folio)) {
+ void *addr = folio_address(folio);
for (i = 0; i < nr; i++)
- __flush_dcache_icache(lowmem_page_address(page + i));
+ __flush_dcache_icache(addr + i * PAGE_SIZE);
} else {
for (i = 0; i < nr; i++) {
- void *start = kmap_local_page(page + i);
+ void *start = kmap_local_folio(folio, i * PAGE_SIZE);

__flush_dcache_icache(start);
kunmap_local(start);
@@ -166,27 +124,6 @@ static void flush_dcache_icache_hugepage(struct page *page)
}
}

-void flush_dcache_icache_page(struct page *page)
-{
- if (flush_coherent_icache())
- return;
-
- if (PageCompound(page))
- return flush_dcache_icache_hugepage(page);
-
- if (!PageHighMem(page)) {
- __flush_dcache_icache(lowmem_page_address(page));
- } else if (IS_ENABLED(CONFIG_BOOKE) || sizeof(phys_addr_t) > sizeof(void *)) {
- void *start = kmap_local_page(page);
-
- __flush_dcache_icache(start);
- kunmap_local(start);
- } else {
- flush_dcache_icache_phys(page_to_phys(page));
- }
-}
-EXPORT_SYMBOL(flush_dcache_icache_page);
-
void clear_user_page(void *page, unsigned long vaddr, struct page *pg)
{
clear_page(page);
diff --git a/arch/powerpc/mm/nohash/e500_hugetlbpage.c b/arch/powerpc/mm/nohash/e500_hugetlbpage.c
index 58c8d9849cb1..f3cb91107a47 100644
--- a/arch/powerpc/mm/nohash/e500_hugetlbpage.c
+++ b/arch/powerpc/mm/nohash/e500_hugetlbpage.c
@@ -178,7 +178,8 @@ book3e_hugetlb_preload(struct vm_area_struct *vma, unsigned long ea, pte_t pte)
*
* This must always be called with the pte lock held.
*/
-void update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t *ptep)
+void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long address,
+ pte_t *ptep, unsigned int nr)
{
if (is_vm_hugetlb_page(vma))
book3e_hugetlb_preload(vma, address, *ptep);
diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index cb2dcdb18f8e..b3c7b874a7a2 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -58,7 +58,7 @@ static inline int pte_looks_normal(pte_t pte)
return 0;
}

-static struct page *maybe_pte_to_page(pte_t pte)
+static struct folio *maybe_pte_to_folio(pte_t pte)
{
unsigned long pfn = pte_pfn(pte);
struct page *page;
@@ -68,7 +68,7 @@ static struct page *maybe_pte_to_page(pte_t pte)
page = pfn_to_page(pfn);
if (PageReserved(page))
return NULL;
- return page;
+ return page_folio(page);
}

#ifdef CONFIG_PPC_BOOK3S
@@ -84,12 +84,12 @@ static pte_t set_pte_filter_hash(pte_t pte)
pte = __pte(pte_val(pte) & ~_PAGE_HPTEFLAGS);
if (pte_looks_normal(pte) && !(cpu_has_feature(CPU_FTR_COHERENT_ICACHE) ||
cpu_has_feature(CPU_FTR_NOEXECUTE))) {
- struct page *pg = maybe_pte_to_page(pte);
- if (!pg)
+ struct folio *folio = maybe_pte_to_folio(pte);
+ if (!folio)
return pte;
- if (!test_bit(PG_dcache_clean, &pg->flags)) {
- flush_dcache_icache_page(pg);
- set_bit(PG_dcache_clean, &pg->flags);
+ if (!test_bit(PG_dcache_clean, &folio->flags)) {
+ flush_dcache_icache_folio(folio);
+ set_bit(PG_dcache_clean, &folio->flags);
}
}
return pte;
@@ -107,7 +107,7 @@ static pte_t set_pte_filter_hash(pte_t pte) { return pte; }
*/
static inline pte_t set_pte_filter(pte_t pte)
{
- struct page *pg;
+ struct folio *folio;

if (radix_enabled())
return pte;
@@ -120,18 +120,18 @@ static inline pte_t set_pte_filter(pte_t pte)
return pte;

/* If you set _PAGE_EXEC on weird pages you're on your own */
- pg = maybe_pte_to_page(pte);
- if (unlikely(!pg))
+ folio = maybe_pte_to_folio(pte);
+ if (unlikely(!folio))
return pte;

/* If the page clean, we move on */
- if (test_bit(PG_dcache_clean, &pg->flags))
+ if (test_bit(PG_dcache_clean, &folio->flags))
return pte;

/* If it's an exec fault, we flush the cache and make it clean */
if (is_exec_fault()) {
- flush_dcache_icache_page(pg);
- set_bit(PG_dcache_clean, &pg->flags);
+ flush_dcache_icache_folio(folio);
+ set_bit(PG_dcache_clean, &folio->flags);
return pte;
}

@@ -142,7 +142,7 @@ static inline pte_t set_pte_filter(pte_t pte)
static pte_t set_access_flags_filter(pte_t pte, struct vm_area_struct *vma,
int dirty)
{
- struct page *pg;
+ struct folio *folio;

if (IS_ENABLED(CONFIG_PPC_BOOK3S_64))
return pte;
@@ -168,17 +168,17 @@ static pte_t set_access_flags_filter(pte_t pte, struct vm_area_struct *vma,
#endif /* CONFIG_DEBUG_VM */

/* If you set _PAGE_EXEC on weird pages you're on your own */
- pg = maybe_pte_to_page(pte);
- if (unlikely(!pg))
+ folio = maybe_pte_to_folio(pte);
+ if (unlikely(!folio))
goto bail;

/* If the page is already clean, we move on */
- if (test_bit(PG_dcache_clean, &pg->flags))
+ if (test_bit(PG_dcache_clean, &folio->flags))
goto bail;

/* Clean the page and set PG_dcache_clean */
- flush_dcache_icache_page(pg);
- set_bit(PG_dcache_clean, &pg->flags);
+ flush_dcache_icache_folio(folio);
+ set_bit(PG_dcache_clean, &folio->flags);

bail:
return pte_mkexec(pte);
@@ -187,8 +187,8 @@ static pte_t set_access_flags_filter(pte_t pte, struct vm_area_struct *vma,
/*
* set_pte stores a linux PTE into the linux page table.
*/
-void set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
- pte_t pte)
+void set_ptes(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
+ pte_t pte, unsigned int nr)
{
/*
* Make sure hardware valid bit is not set. We don't do
@@ -203,7 +203,14 @@ void set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
pte = set_pte_filter(pte);

/* Perform the setting of the PTE */
- __set_pte_at(mm, addr, ptep, pte, 0);
+ for (;;) {
+ __set_pte_at(mm, addr, ptep, pte, 0);
+ if (--nr == 0)
+ break;
+ ptep++;
+ pte = __pte(pte_val(pte) + PAGE_SIZE);
+ addr += PAGE_SIZE;
+ }
}

void unmap_kernel_page(unsigned long va)
--
2.39.1


2023-02-27 17:58:44

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH v2 25/30] x86: Implement the new page table range API

Convert set_pte_at() into set_ptes() and add a noop
update_mmu_cache_range().
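
The caller-visible difference is that a run of PTEs within one PMD is
now installed with a single call, and page_table_check runs once for
the whole run instead of once per page. Schematically (illustration
only, not part of the patch):

	/* Before: one call per page. */
	for (i = 0; i < nr; i++)
		set_pte_at(mm, addr + i * PAGE_SIZE, ptep + i,
			   __pte(pte_val(pte) + i * PAGE_SIZE));

	/* After: one call for the whole contiguous range. */
	set_ptes(mm, addr, ptep, pte, nr);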

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: [email protected]
Cc: "H. Peter Anvin" <[email protected]>
---
arch/x86/include/asm/pgtable.h | 21 +++++++++++++++++----
1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 84be3e07b112..f424371ea143 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1019,13 +1019,22 @@ static inline pud_t native_local_pudp_get_and_clear(pud_t *pudp)
return res;
}

-static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
- pte_t *ptep, pte_t pte)
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+ pte_t *ptep, pte_t pte, unsigned int nr)
{
- page_table_check_ptes_set(mm, addr, ptep, pte, 1);
- set_pte(ptep, pte);
+ page_table_check_ptes_set(mm, addr, ptep, pte, nr);
+
+ for (;;) {
+ set_pte(ptep, pte);
+ if (--nr == 0)
+ break;
+ ptep++;
+ pte = __pte(pte_val(pte) + PAGE_SIZE);
+ }
}

+#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
+
static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr,
pmd_t *pmdp, pmd_t pmd)
{
@@ -1291,6 +1300,10 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
unsigned long addr, pte_t *ptep)
{
}
+static inline void update_mmu_cache_range(struct vm_area_struct *vma,
+ unsigned long addr, pte_t *ptep, unsigned int nr)
+{
+}
static inline void update_mmu_cache_pmd(struct vm_area_struct *vma,
unsigned long addr, pmd_t *pmd)
{
--
2.39.1


2023-02-27 17:58:46

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH v2 19/30] riscv: Implement the new page table range API

Add set_ptes(), update_mmu_cache_range() and flush_dcache_folio().
Change the PG_dcache_clean flag from being per-page to per-folio.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: Alexandre Ghiti <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: [email protected]
---
arch/riscv/include/asm/cacheflush.h | 19 +++++++++----------
arch/riscv/include/asm/pgtable.h | 26 +++++++++++++++++++-------
arch/riscv/mm/cacheflush.c | 11 ++---------
3 files changed, 30 insertions(+), 26 deletions(-)

diff --git a/arch/riscv/include/asm/cacheflush.h b/arch/riscv/include/asm/cacheflush.h
index 03e3b95ae6da..10e5e96f09b5 100644
--- a/arch/riscv/include/asm/cacheflush.h
+++ b/arch/riscv/include/asm/cacheflush.h
@@ -15,20 +15,19 @@ static inline void local_flush_icache_all(void)

#define PG_dcache_clean PG_arch_1

-static inline void flush_dcache_page(struct page *page)
+static inline void flush_dcache_folio(struct folio *folio)
{
- /*
- * HugeTLB pages are always fully mapped and only head page will be
- * set PG_dcache_clean (see comments in flush_icache_pte()).
- */
- if (PageHuge(page))
- page = compound_head(page);
-
- if (test_bit(PG_dcache_clean, &page->flags))
- clear_bit(PG_dcache_clean, &page->flags);
+ if (test_bit(PG_dcache_clean, &folio->flags))
+ clear_bit(PG_dcache_clean, &folio->flags);
}
+#define flush_dcache_folio flush_dcache_folio
#define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1

+static inline void flush_dcache_page(struct page *page)
+{
+ flush_dcache_folio(page_folio(page));
+}
+
/*
* RISC-V doesn't have an instruction to flush parts of the instruction cache,
* so instead we just flush the whole thing.
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index b516f3b59616..3a3a776fc047 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -405,8 +405,8 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)


/* Commit new configuration to MMU hardware */
-static inline void update_mmu_cache(struct vm_area_struct *vma,
- unsigned long address, pte_t *ptep)
+static inline void update_mmu_cache_range(struct vm_area_struct *vma,
+ unsigned long address, pte_t *ptep, unsigned int nr)
{
/*
* The kernel assumes that TLBs don't cache invalid entries, but
@@ -415,8 +415,11 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
* Relying on flush_tlb_fix_spurious_fault would suffice, but
* the extra traps reduce performance. So, eagerly SFENCE.VMA.
*/
- local_flush_tlb_page(address);
+ while (nr--)
+ local_flush_tlb_page(address + nr * PAGE_SIZE);
}
+#define update_mmu_cache(vma, addr, ptep) \
+ update_mmu_cache_range(vma, addr, ptep, 1)

#define __HAVE_ARCH_UPDATE_MMU_TLB
#define update_mmu_tlb update_mmu_cache
@@ -456,12 +459,21 @@ static inline void __set_pte_at(struct mm_struct *mm,
set_pte(ptep, pteval);
}

-static inline void set_pte_at(struct mm_struct *mm,
- unsigned long addr, pte_t *ptep, pte_t pteval)
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+ pte_t *ptep, pte_t pteval, unsigned int nr)
{
- page_table_check_ptes_set(mm, addr, ptep, pteval, 1);
- __set_pte_at(mm, addr, ptep, pteval);
+ page_table_check_ptes_set(mm, addr, ptep, pteval, nr);
+
+ for (;;) {
+ __set_pte_at(mm, addr, ptep, pteval);
+ if (--nr == 0)
+ break;
+ ptep++;
+ addr += PAGE_SIZE;
+ pte_val(pteval) += 1 << _PAGE_PFN_SHIFT;
+ }
}
+#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)

static inline void pte_clear(struct mm_struct *mm,
unsigned long addr, pte_t *ptep)
diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
index fcd6145fbead..e36a851e5788 100644
--- a/arch/riscv/mm/cacheflush.c
+++ b/arch/riscv/mm/cacheflush.c
@@ -81,16 +81,9 @@ void flush_icache_mm(struct mm_struct *mm, bool local)
#ifdef CONFIG_MMU
void flush_icache_pte(pte_t pte)
{
- struct page *page = pte_page(pte);
+ struct folio *folio = page_folio(pte_page(pte));

- /*
- * HugeTLB pages are always fully mapped, so only setting head page's
- * PG_dcache_clean flag is enough.
- */
- if (PageHuge(page))
- page = compound_head(page);
-
- if (!test_bit(PG_dcache_clean, &page->flags)) {
+ if (!test_bit(PG_dcache_clean, &folio->flags)) {
flush_icache_all();
- set_bit(PG_dcache_clean, &page->flags);
+ set_bit(PG_dcache_clean, &folio->flags);
}
--
2.39.1


2023-02-27 17:58:49

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH v2 29/30] mm: Convert do_set_pte() to set_pte_range()

From: Yin Fengwei <[email protected]>

set_pte_range() allows setting up page table entries for a specific
range. It takes advantage of batched rmap updates for large folios.
It now takes care of calling update_mmu_cache_range().
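
For a caller in the fault path, the conversion looks like this
(illustration only; nr is 1 at the call sites converted here):

	/* Before: per-page PTE setup plus a separate MMU-cache hook. */
	do_set_pte(vmf, page, addr);
	update_mmu_cache(vma, addr, vmf->pte);

	/* After: one call which may cover several pages of the folio. */
	set_pte_range(vmf, folio, page, nr, addr);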

Signed-off-by: Yin Fengwei <[email protected]>
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
Documentation/filesystems/locking.rst | 2 +-
include/linux/mm.h | 3 ++-
mm/filemap.c | 3 +--
mm/memory.c | 27 +++++++++++++++------------
4 files changed, 19 insertions(+), 16 deletions(-)

diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
index 7de7a7272a5e..922886fefb7f 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -663,7 +663,7 @@ locked. The VM will unlock the page.
Filesystem should find and map pages associated with offsets from "start_pgoff"
till "end_pgoff". ->map_pages() is called with page table locked and must
not block. If it's not possible to reach a page without blocking,
-filesystem should skip it. Filesystem should use do_set_pte() to setup
+filesystem should skip it. Filesystem should use set_pte_range() to setup
page table entry. Pointer to entry associated with the page is passed in
"pte" field in vm_fault structure. Pointers to entries for other offsets
should be calculated relative to "pte".
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1f79667824eb..568ebe7058d4 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1168,7 +1168,8 @@ static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
}

vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page);
-void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr);
+void set_pte_range(struct vm_fault *vmf, struct folio *folio,
+ struct page *page, unsigned int nr, unsigned long addr);

vm_fault_t finish_fault(struct vm_fault *vmf);
vm_fault_t finish_mkwrite_fault(struct vm_fault *vmf);
diff --git a/mm/filemap.c b/mm/filemap.c
index db86e459dde6..07ebd90967a3 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3507,8 +3507,7 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
ret = VM_FAULT_NOPAGE;

ref_count++;
- do_set_pte(vmf, page, addr);
- update_mmu_cache(vma, addr, vmf->pte);
+ set_pte_range(vmf, folio, page, 1, addr);
} while (vmf->pte++, page++, addr += PAGE_SIZE, ++count < nr_pages);

/* Restore the vmf->pte */
diff --git a/mm/memory.c b/mm/memory.c
index bfa3100ec5a3..b09c03e4fadb 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4256,7 +4256,8 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
}
#endif

-void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr)
+void set_pte_range(struct vm_fault *vmf, struct folio *folio,
+ struct page *page, unsigned int nr, unsigned long addr)
{
struct vm_area_struct *vma = vmf->vma;
bool uffd_wp = pte_marker_uffd_wp(vmf->orig_pte);
@@ -4264,7 +4265,7 @@ void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr)
bool prefault = vmf->address != addr;
pte_t entry;

- flush_icache_page(vma, page);
+ flush_icache_pages(vma, page, nr);
entry = mk_pte(page, vma->vm_page_prot);

if (prefault && arch_wants_old_prefaulted_pte())
@@ -4278,14 +4279,18 @@ void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr)
entry = pte_mkuffd_wp(entry);
/* copy-on-write page */
if (write && !(vma->vm_flags & VM_SHARED)) {
- inc_mm_counter(vma->vm_mm, MM_ANONPAGES);
- page_add_new_anon_rmap(page, vma, addr);
- lru_cache_add_inactive_or_unevictable(page, vma);
+ add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr);
+ VM_BUG_ON_FOLIO(nr != 1, folio);
+ folio_add_new_anon_rmap(folio, vma, addr);
+ folio_add_lru_vma(folio, vma);
} else {
- inc_mm_counter(vma->vm_mm, mm_counter_file(page));
- page_add_file_rmap(page, vma, false);
+ add_mm_counter(vma->vm_mm, mm_counter_file(page), nr);
+ folio_add_file_rmap_range(folio, page, nr, vma, false);
}
- set_pte_at(vma->vm_mm, addr, vmf->pte, entry);
+ set_ptes(vma->vm_mm, addr, vmf->pte, entry, nr);
+
+ /* no need to invalidate: a not-present page won't be cached */
+ update_mmu_cache_range(vma, addr, vmf->pte, nr);
}

static bool vmf_pte_changed(struct vm_fault *vmf)
@@ -4358,11 +4363,9 @@ vm_fault_t finish_fault(struct vm_fault *vmf)

/* Re-check under ptl */
if (likely(!vmf_pte_changed(vmf))) {
- do_set_pte(vmf, page, vmf->address);
-
- /* no need to invalidate: a not-present page won't be cached */
- update_mmu_cache(vma, vmf->address, vmf->pte);
+ struct folio *folio = page_folio(page);

+ set_pte_range(vmf, folio, page, 1, vmf->address);
ret = 0;
} else {
update_mmu_tlb(vma, vmf->address, vmf->pte);
--
2.39.1


2023-02-27 17:58:52

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH v2 21/30] superh: Implement the new page table range API

Add set_ptes(), update_mmu_cache_range(), flush_dcache_folio() and
flush_icache_pages(). Change the PG_dcache_clean flag from being
per-page to per-folio. Flush the entire folio containing the pages in
flush_icache_pages() for ease of implementation.
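
"Flush the entire folio" here means flush_icache_pages() ignores nr
and hands the containing folio to the per-CPU flusher; a sketch of the
resulting shape (matching the cache.c hunk below):

	void flush_icache_pages(struct vm_area_struct *vma, struct page *page,
				unsigned int nr)
	{
		/* Over-flushing the whole folio is always safe. */
		cacheop_on_each_cpu(local_flush_icache_folio,
				    page_folio(page), 1);
	}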

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: John Paul Adrian Glaubitz <[email protected]>
Cc: [email protected]
---
arch/sh/include/asm/cacheflush.h | 21 ++++++++-----
arch/sh/include/asm/pgtable.h | 6 ++--
arch/sh/include/asm/pgtable_32.h | 16 ++++++++--
arch/sh/mm/cache-j2.c | 4 +--
arch/sh/mm/cache-sh4.c | 26 ++++++++++-----
arch/sh/mm/cache-sh7705.c | 26 +++++++++------
arch/sh/mm/cache.c | 54 ++++++++++++++++++--------------
arch/sh/mm/kmap.c | 3 +-
8 files changed, 101 insertions(+), 55 deletions(-)

diff --git a/arch/sh/include/asm/cacheflush.h b/arch/sh/include/asm/cacheflush.h
index 481a664287e2..9fceef6f3e00 100644
--- a/arch/sh/include/asm/cacheflush.h
+++ b/arch/sh/include/asm/cacheflush.h
@@ -13,9 +13,9 @@
* - flush_cache_page(mm, vmaddr, pfn) flushes a single page
* - flush_cache_range(vma, start, end) flushes a range of pages
*
- * - flush_dcache_page(pg) flushes(wback&invalidates) a page for dcache
+ * - flush_dcache_folio(folio) flushes(wback&invalidates) a folio for dcache
* - flush_icache_range(start, end) flushes(invalidates) a range for icache
- * - flush_icache_page(vma, pg) flushes(invalidates) a page for icache
+ * - flush_icache_pages(vma, pg, nr) flushes(invalidates) pages for icache
* - flush_cache_sigtramp(vaddr) flushes the signal trampoline
*/
extern void (*local_flush_cache_all)(void *args);
@@ -23,9 +23,9 @@ extern void (*local_flush_cache_mm)(void *args);
extern void (*local_flush_cache_dup_mm)(void *args);
extern void (*local_flush_cache_page)(void *args);
extern void (*local_flush_cache_range)(void *args);
-extern void (*local_flush_dcache_page)(void *args);
+extern void (*local_flush_dcache_folio)(void *args);
extern void (*local_flush_icache_range)(void *args);
-extern void (*local_flush_icache_page)(void *args);
+extern void (*local_flush_icache_folio)(void *args);
extern void (*local_flush_cache_sigtramp)(void *args);

static inline void cache_noop(void *args) { }
@@ -42,11 +42,18 @@ extern void flush_cache_page(struct vm_area_struct *vma,
extern void flush_cache_range(struct vm_area_struct *vma,
unsigned long start, unsigned long end);
#define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
-void flush_dcache_page(struct page *page);
+void flush_dcache_folio(struct folio *folio);
+#define flush_dcache_folio flush_dcache_folio
+static inline void flush_dcache_page(struct page *page)
+{
+ flush_dcache_folio(page_folio(page));
+}
+
extern void flush_icache_range(unsigned long start, unsigned long end);
#define flush_icache_user_range flush_icache_range
-extern void flush_icache_page(struct vm_area_struct *vma,
- struct page *page);
+void flush_icache_pages(struct vm_area_struct *vma, struct page *page,
+ unsigned int nr);
+#define flush_icache_page(vma, page) flush_icache_pages(vma, page, 1)
extern void flush_cache_sigtramp(unsigned long address);

struct flusher_data {
diff --git a/arch/sh/include/asm/pgtable.h b/arch/sh/include/asm/pgtable.h
index 3ce30becf6df..1a8fdc3bc363 100644
--- a/arch/sh/include/asm/pgtable.h
+++ b/arch/sh/include/asm/pgtable.h
@@ -102,13 +102,15 @@ extern void __update_cache(struct vm_area_struct *vma,
extern void __update_tlb(struct vm_area_struct *vma,
unsigned long address, pte_t pte);

-static inline void
-update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t *ptep)
+static inline void update_mmu_cache_range(struct vm_area_struct *vma,
+ unsigned long address, pte_t *ptep, unsigned int nr)
{
pte_t pte = *ptep;
__update_cache(vma, address, pte);
__update_tlb(vma, address, pte);
}
+#define update_mmu_cache(vma, addr, ptep) \
+ update_mmu_cache_range(vma, addr, ptep, 1)

extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
extern void paging_init(void);
diff --git a/arch/sh/include/asm/pgtable_32.h b/arch/sh/include/asm/pgtable_32.h
index 21952b094650..03ba1834e126 100644
--- a/arch/sh/include/asm/pgtable_32.h
+++ b/arch/sh/include/asm/pgtable_32.h
@@ -307,7 +307,19 @@ static inline void set_pte(pte_t *ptep, pte_t pte)
#define set_pte(pteptr, pteval) (*(pteptr) = pteval)
#endif

-#define set_pte_at(mm,addr,ptep,pteval) set_pte(ptep,pteval)
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+ pte_t *ptep, pte_t pte, unsigned int nr)
+{
+ for (;;) {
+ set_pte(ptep, pte);
+ if (--nr == 0)
+ break;
+ ptep++;
+ pte = __pte(pte_val(pte) + PAGE_SIZE);
+ }
+}
+
+#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)

/*
* (pmds are folded into pgds so this doesn't get actually called,
@@ -323,7 +335,7 @@ static inline void set_pte(pte_t *ptep, pte_t pte)
#define pte_none(x) (!pte_val(x))
#define pte_present(x) ((x).pte_low & (_PAGE_PRESENT | _PAGE_PROTNONE))

-#define pte_clear(mm,addr,xp) do { set_pte_at(mm, addr, xp, __pte(0)); } while (0)
+#define pte_clear(mm, addr, ptep) set_pte(ptep, __pte(0))

#define pmd_none(x) (!pmd_val(x))
#define pmd_present(x) (pmd_val(x))
diff --git a/arch/sh/mm/cache-j2.c b/arch/sh/mm/cache-j2.c
index f277862a11f5..9ac960214380 100644
--- a/arch/sh/mm/cache-j2.c
+++ b/arch/sh/mm/cache-j2.c
@@ -55,9 +55,9 @@ void __init j2_cache_init(void)
local_flush_cache_dup_mm = j2_flush_both;
local_flush_cache_page = j2_flush_both;
local_flush_cache_range = j2_flush_both;
- local_flush_dcache_page = j2_flush_dcache;
+ local_flush_dcache_folio = j2_flush_dcache;
local_flush_icache_range = j2_flush_icache;
- local_flush_icache_page = j2_flush_icache;
+ local_flush_icache_folio = j2_flush_icache;
local_flush_cache_sigtramp = j2_flush_icache;

pr_info("Initial J2 CCR is %.8x\n", __raw_readl(j2_ccr_base));
diff --git a/arch/sh/mm/cache-sh4.c b/arch/sh/mm/cache-sh4.c
index 72c2e1b46c08..862046f26981 100644
--- a/arch/sh/mm/cache-sh4.c
+++ b/arch/sh/mm/cache-sh4.c
@@ -107,19 +107,29 @@ static inline void flush_cache_one(unsigned long start, unsigned long phys)
* Write back & invalidate the D-cache of the page.
* (To avoid "alias" issues)
*/
-static void sh4_flush_dcache_page(void *arg)
+static void sh4_flush_dcache_folio(void *arg)
{
- struct page *page = arg;
- unsigned long addr = (unsigned long)page_address(page);
+ struct folio *folio = arg;
#ifndef CONFIG_SMP
- struct address_space *mapping = page_mapping_file(page);
+ struct address_space *mapping = folio_flush_mapping(folio);

if (mapping && !mapping_mapped(mapping))
- clear_bit(PG_dcache_clean, &page->flags);
+ clear_bit(PG_dcache_clean, &folio->flags);
else
#endif
- flush_cache_one(CACHE_OC_ADDRESS_ARRAY |
- (addr & shm_align_mask), page_to_phys(page));
+ {
+ unsigned long pfn = folio_pfn(folio);
+ unsigned long addr = (unsigned long)folio_address(folio);
+ unsigned int i, nr = folio_nr_pages(folio);
+
+ for (i = 0; i < nr; i++) {
+ flush_cache_one(CACHE_OC_ADDRESS_ARRAY |
+ (addr & shm_align_mask),
+ pfn * PAGE_SIZE);
+ addr += PAGE_SIZE;
+ pfn++;
+ }
+ }

wmb();
}
@@ -379,7 +389,7 @@ void __init sh4_cache_init(void)
__raw_readl(CCN_PRR));

local_flush_icache_range = sh4_flush_icache_range;
- local_flush_dcache_page = sh4_flush_dcache_page;
+ local_flush_dcache_folio = sh4_flush_dcache_folio;
local_flush_cache_all = sh4_flush_cache_all;
local_flush_cache_mm = sh4_flush_cache_mm;
local_flush_cache_dup_mm = sh4_flush_cache_mm;
diff --git a/arch/sh/mm/cache-sh7705.c b/arch/sh/mm/cache-sh7705.c
index 9b63a53a5e46..b509a407588f 100644
--- a/arch/sh/mm/cache-sh7705.c
+++ b/arch/sh/mm/cache-sh7705.c
@@ -132,15 +132,20 @@ static void __flush_dcache_page(unsigned long phys)
* Write back & invalidate the D-cache of the page.
* (To avoid "alias" issues)
*/
-static void sh7705_flush_dcache_page(void *arg)
+static void sh7705_flush_dcache_folio(void *arg)
{
- struct page *page = arg;
- struct address_space *mapping = page_mapping_file(page);
+ struct folio *folio = arg;
+ struct address_space *mapping = folio_flush_mapping(folio);

if (mapping && !mapping_mapped(mapping))
- clear_bit(PG_dcache_clean, &page->flags);
- else
- __flush_dcache_page(__pa(page_address(page)));
+ clear_bit(PG_dcache_clean, &folio->flags);
+ else {
+ unsigned long pfn = folio_pfn(folio);
+ unsigned int i, nr = folio_nr_pages(folio);
+
+ for (i = 0; i < nr; i++)
+ __flush_dcache_page((pfn + i) * PAGE_SIZE);
+ }
}

static void sh7705_flush_cache_all(void *args)
@@ -176,19 +181,20 @@ static void sh7705_flush_cache_page(void *args)
* Not entirely sure why this is necessary on SH3 with 32K cache but
* without it we get occasional "Memory fault" when loading a program.
*/
-static void sh7705_flush_icache_page(void *page)
+static void sh7705_flush_icache_folio(void *arg)
{
- __flush_purge_region(page_address(page), PAGE_SIZE);
+ struct folio *folio = arg;
+ __flush_purge_region(folio_address(folio), folio_size(folio));
}

void __init sh7705_cache_init(void)
{
local_flush_icache_range = sh7705_flush_icache_range;
- local_flush_dcache_page = sh7705_flush_dcache_page;
+ local_flush_dcache_folio = sh7705_flush_dcache_folio;
local_flush_cache_all = sh7705_flush_cache_all;
local_flush_cache_mm = sh7705_flush_cache_all;
local_flush_cache_dup_mm = sh7705_flush_cache_all;
local_flush_cache_range = sh7705_flush_cache_all;
local_flush_cache_page = sh7705_flush_cache_page;
- local_flush_icache_page = sh7705_flush_icache_page;
+ local_flush_icache_folio = sh7705_flush_icache_folio;
}
diff --git a/arch/sh/mm/cache.c b/arch/sh/mm/cache.c
index 3aef78ceb820..93fc5fb8ec1c 100644
--- a/arch/sh/mm/cache.c
+++ b/arch/sh/mm/cache.c
@@ -20,9 +20,9 @@ void (*local_flush_cache_mm)(void *args) = cache_noop;
void (*local_flush_cache_dup_mm)(void *args) = cache_noop;
void (*local_flush_cache_page)(void *args) = cache_noop;
void (*local_flush_cache_range)(void *args) = cache_noop;
-void (*local_flush_dcache_page)(void *args) = cache_noop;
+void (*local_flush_dcache_folio)(void *args) = cache_noop;
void (*local_flush_icache_range)(void *args) = cache_noop;
-void (*local_flush_icache_page)(void *args) = cache_noop;
+void (*local_flush_icache_folio)(void *args) = cache_noop;
void (*local_flush_cache_sigtramp)(void *args) = cache_noop;

void (*__flush_wback_region)(void *start, int size);
@@ -61,15 +61,17 @@ void copy_to_user_page(struct vm_area_struct *vma, struct page *page,
unsigned long vaddr, void *dst, const void *src,
unsigned long len)
{
- if (boot_cpu_data.dcache.n_aliases && page_mapcount(page) &&
- test_bit(PG_dcache_clean, &page->flags)) {
+ struct folio *folio = page_folio(page);
+
+ if (boot_cpu_data.dcache.n_aliases && folio_mapped(folio) &&
+ test_bit(PG_dcache_clean, &folio->flags)) {
void *vto = kmap_coherent(page, vaddr) + (vaddr & ~PAGE_MASK);
memcpy(vto, src, len);
kunmap_coherent(vto);
} else {
memcpy(dst, src, len);
if (boot_cpu_data.dcache.n_aliases)
- clear_bit(PG_dcache_clean, &page->flags);
+ clear_bit(PG_dcache_clean, &folio->flags);
}

if (vma->vm_flags & VM_EXEC)
@@ -80,27 +82,30 @@ void copy_from_user_page(struct vm_area_struct *vma, struct page *page,
unsigned long vaddr, void *dst, const void *src,
unsigned long len)
{
+ struct folio *folio = page_folio(page);
+
if (boot_cpu_data.dcache.n_aliases && page_mapcount(page) &&
- test_bit(PG_dcache_clean, &page->flags)) {
+ test_bit(PG_dcache_clean, &folio->flags)) {
void *vfrom = kmap_coherent(page, vaddr) + (vaddr & ~PAGE_MASK);
memcpy(dst, vfrom, len);
kunmap_coherent(vfrom);
} else {
memcpy(dst, src, len);
if (boot_cpu_data.dcache.n_aliases)
- clear_bit(PG_dcache_clean, &page->flags);
+ clear_bit(PG_dcache_clean, &folio->flags);
}
}

void copy_user_highpage(struct page *to, struct page *from,
unsigned long vaddr, struct vm_area_struct *vma)
{
+ struct folio *src = page_folio(from);
void *vfrom, *vto;

vto = kmap_atomic(to);

- if (boot_cpu_data.dcache.n_aliases && page_mapcount(from) &&
- test_bit(PG_dcache_clean, &from->flags)) {
+ if (boot_cpu_data.dcache.n_aliases && folio_mapped(src) &&
+ test_bit(PG_dcache_clean, &src->flags)) {
vfrom = kmap_coherent(from, vaddr);
copy_page(vto, vfrom);
kunmap_coherent(vfrom);
@@ -136,35 +141,37 @@ EXPORT_SYMBOL(clear_user_highpage);
void __update_cache(struct vm_area_struct *vma,
unsigned long address, pte_t pte)
{
- struct page *page;
unsigned long pfn = pte_pfn(pte);

if (!boot_cpu_data.dcache.n_aliases)
return;

- page = pfn_to_page(pfn);
if (pfn_valid(pfn)) {
- int dirty = !test_and_set_bit(PG_dcache_clean, &page->flags);
+ struct folio *folio = page_folio(pfn_to_page(pfn));
+ int dirty = !test_and_set_bit(PG_dcache_clean, &folio->flags);
if (dirty)
- __flush_purge_region(page_address(page), PAGE_SIZE);
+ __flush_purge_region(folio_address(folio),
+ folio_size(folio));
}
}

void __flush_anon_page(struct page *page, unsigned long vmaddr)
{
+ struct folio *folio = page_folio(page);
unsigned long addr = (unsigned long) page_address(page);

if (pages_do_alias(addr, vmaddr)) {
- if (boot_cpu_data.dcache.n_aliases && page_mapcount(page) &&
- test_bit(PG_dcache_clean, &page->flags)) {
+ if (boot_cpu_data.dcache.n_aliases && folio_mapped(folio) &&
+ test_bit(PG_dcache_clean, &folio->flags)) {
void *kaddr;

kaddr = kmap_coherent(page, vmaddr);
/* XXX.. For now kunmap_coherent() does a purge */
/* __flush_purge_region((void *)kaddr, PAGE_SIZE); */
kunmap_coherent(kaddr);
- } else
- __flush_purge_region((void *)addr, PAGE_SIZE);
+ } else
+ __flush_purge_region(folio_address(folio),
+ folio_size(folio));
}
}

@@ -215,11 +222,11 @@ void flush_cache_range(struct vm_area_struct *vma, unsigned long start,
}
EXPORT_SYMBOL(flush_cache_range);

-void flush_dcache_page(struct page *page)
+void flush_dcache_folio(struct folio *folio)
{
- cacheop_on_each_cpu(local_flush_dcache_page, page, 1);
+ cacheop_on_each_cpu(local_flush_dcache_folio, folio, 1);
}
-EXPORT_SYMBOL(flush_dcache_page);
+EXPORT_SYMBOL(flush_dcache_folio);

void flush_icache_range(unsigned long start, unsigned long end)
{
@@ -233,10 +240,11 @@ void flush_icache_range(unsigned long start, unsigned long end)
}
EXPORT_SYMBOL(flush_icache_range);

-void flush_icache_page(struct vm_area_struct *vma, struct page *page)
+void flush_icache_pages(struct vm_area_struct *vma, struct page *page,
+ unsigned int nr)
{
- /* Nothing uses the VMA, so just pass the struct page along */
- cacheop_on_each_cpu(local_flush_icache_page, page, 1);
+ /* Nothing uses the VMA, so just pass the folio along */
+ cacheop_on_each_cpu(local_flush_icache_folio, page_folio(page), 1);
}

void flush_cache_sigtramp(unsigned long address)
diff --git a/arch/sh/mm/kmap.c b/arch/sh/mm/kmap.c
index 73fd7cc99430..fa50e8f6e7a9 100644
--- a/arch/sh/mm/kmap.c
+++ b/arch/sh/mm/kmap.c
@@ -27,10 +27,11 @@ void __init kmap_coherent_init(void)

void *kmap_coherent(struct page *page, unsigned long addr)
{
+ struct folio *folio = page_folio(page);
enum fixed_addresses idx;
unsigned long vaddr;

- BUG_ON(!test_bit(PG_dcache_clean, &page->flags));
+ BUG_ON(!test_bit(PG_dcache_clean, &folio->flags));

preempt_disable();
pagefault_disable();
--
2.39.1


2023-02-27 17:58:54

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH v2 05/30] alpha: Implement the new page table range API

Add set_ptes(), update_mmu_cache_range() and flush_icache_pages().
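
As a rough caller-side sketch (not part of this patch; the helper name
is made up for illustration), the generic MM is expected to use the two
new entry points together, covering N consecutive PTEs of one folio
within a single PMD:

/* Illustrative only: install nr PTEs for one folio, one call each. */
static void map_folio_ptes(struct vm_area_struct *vma, unsigned long addr,
			   pte_t *ptep, pte_t pte, unsigned int nr)
{
	set_ptes(vma->vm_mm, addr, ptep, pte, nr);
	update_mmu_cache_range(vma, addr, ptep, nr);
}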

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: [email protected]
---
arch/alpha/include/asm/cacheflush.h | 10 ++++++++++
arch/alpha/include/asm/pgtable.h | 18 +++++++++++++++++-
2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/arch/alpha/include/asm/cacheflush.h b/arch/alpha/include/asm/cacheflush.h
index 9945ff483eaf..3956460e69e2 100644
--- a/arch/alpha/include/asm/cacheflush.h
+++ b/arch/alpha/include/asm/cacheflush.h
@@ -57,6 +57,16 @@ extern void flush_icache_user_page(struct vm_area_struct *vma,
#define flush_icache_page(vma, page) \
flush_icache_user_page((vma), (page), 0, 0)

+/*
+ * Both implementations of flush_icache_user_page flush the entire
+ * address space, so one call, no matter how many pages.
+ */
+static inline void flush_icache_pages(struct vm_area_struct *vma,
+ struct page *page, unsigned int nr)
+{
+ flush_icache_user_page(vma, page, 0, 0);
+}
+
#include <asm-generic/cacheflush.h>

#endif /* _ALPHA_CACHEFLUSH_H */
diff --git a/arch/alpha/include/asm/pgtable.h b/arch/alpha/include/asm/pgtable.h
index ba43cb841d19..1e3354e9731b 100644
--- a/arch/alpha/include/asm/pgtable.h
+++ b/arch/alpha/include/asm/pgtable.h
@@ -26,7 +26,18 @@ struct vm_area_struct;
* hook is made available.
*/
#define set_pte(pteptr, pteval) ((*(pteptr)) = (pteval))
-#define set_pte_at(mm,addr,ptep,pteval) set_pte(ptep,pteval)
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+ pte_t *ptep, pte_t pte, unsigned int nr)
+{
+ for (;;) {
+ set_pte(ptep, pte);
+ if (--nr == 0)
+ break;
+ ptep++;
+ pte_val(pte) += 1UL << 32;
+ }
+}
+#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)

/* PMD_SHIFT determines the size of the area a second-level page table can map */
#define PMD_SHIFT (PAGE_SHIFT + (PAGE_SHIFT-3))
@@ -303,6 +314,11 @@ extern inline void update_mmu_cache(struct vm_area_struct * vma,
{
}

+static inline void update_mmu_cache_range(struct vm_area_struct *vma,
+ unsigned long address, pte_t *ptep, unsigned int nr)
+{
+}
+
/*
* Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
* are !pte_none() && !pte_present().
--
2.39.1


2023-02-27 17:58:58

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH v2 09/30] hexagon: Implement the new page table range API

Add set_ptes() and update_mmu_cache_range().

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Acked-by: Brian Cain <[email protected]>
---
arch/hexagon/include/asm/cacheflush.h | 7 +++++--
arch/hexagon/include/asm/pgtable.h | 16 ++++++++++++++--
2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/arch/hexagon/include/asm/cacheflush.h b/arch/hexagon/include/asm/cacheflush.h
index 6eff0730e6ef..63ca314ede89 100644
--- a/arch/hexagon/include/asm/cacheflush.h
+++ b/arch/hexagon/include/asm/cacheflush.h
@@ -58,12 +58,15 @@ extern void flush_cache_all_hexagon(void);
* clean the cache when the PTE is set.
*
*/
-static inline void update_mmu_cache(struct vm_area_struct *vma,
- unsigned long address, pte_t *ptep)
+static inline void update_mmu_cache_range(struct vm_area_struct *vma,
+ unsigned long address, pte_t *ptep, unsigned int nr)
{
/* generic_ptrace_pokedata doesn't wind up here, does it? */
}

+#define update_mmu_cache(vma, addr, ptep) \
+ update_mmu_cache_range(vma, addr, ptep, 1)
+
void copy_to_user_page(struct vm_area_struct *vma, struct page *page,
unsigned long vaddr, void *dst, void *src, int len);
#define copy_to_user_page copy_to_user_page
diff --git a/arch/hexagon/include/asm/pgtable.h b/arch/hexagon/include/asm/pgtable.h
index 59393613d086..f58f1d920769 100644
--- a/arch/hexagon/include/asm/pgtable.h
+++ b/arch/hexagon/include/asm/pgtable.h
@@ -346,12 +346,24 @@ static inline int pte_exec(pte_t pte)
#define set_pmd(pmdptr, pmdval) (*(pmdptr) = (pmdval))

/*
- * set_pte_at - update page table and do whatever magic may be
+ * set_ptes - update page table and do whatever magic may be
* necessary to make the underlying hardware/firmware take note.
*
* VM may require a virtual instruction to alert the MMU.
*/
-#define set_pte_at(mm, addr, ptep, pte) set_pte(ptep, pte)
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+ pte_t *ptep, pte_t pte, unsigned int nr)
+{
+ for (;;) {
+ set_pte(ptep, pte);
+ if (--nr == 0)
+ break;
+ ptep++;
+ pte_val(pte) += PAGE_SIZE;
+ }
+}
+
+#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)

static inline unsigned long pmd_page_vaddr(pmd_t pmd)
{
--
2.39.1


2023-02-27 17:59:01

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH v2 27/30] filemap: Add filemap_map_folio_range()

From: Yin Fengwei <[email protected]>

filemap_map_folio_range() maps a partial or full folio. Compared to the
original filemap_map_pages(), it updates the refcount once per folio
instead of once per page, which gives a minor performance improvement
for large folios.

With a will-it-scale.page_fault3-like app (the file write fault test
changed to a read fault test; being upstreamed to will-it-scale at [1]),
this gave a 2% performance gain on a 48C/96T Cascade Lake test box
with 96 processes running against xfs.

[1]: https://github.com/antonblanchard/will-it-scale/pull/37
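
The heart of the batching can be sketched as follows (a simplified
stand-in for the filemap_map_folio_range() added below; the function
name is hypothetical and HWPoison, mmap_miss and VM_FAULT_NOPAGE
handling are omitted):

static vm_fault_t map_folio_ptes_sketch(struct vm_fault *vmf,
		struct folio *folio, struct page *page,
		unsigned long addr, unsigned int nr_pages)
{
	unsigned int ref_count = 0, count = 0;

	do {
		if (!pte_none(*vmf->pte))
			continue;
		do_set_pte(vmf, page, addr);
		update_mmu_cache(vmf->vma, addr, vmf->pte);
		ref_count++;			/* PTEs actually installed */
	} while (vmf->pte++, page++, addr += PAGE_SIZE, ++count < nr_pages);

	vmf->pte -= nr_pages;			/* restore for the caller */
	folio_ref_add(folio, ref_count);	/* one refcount update per folio */
	return 0;
}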

Signed-off-by: Yin Fengwei <[email protected]>
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
mm/filemap.c | 98 +++++++++++++++++++++++++++++-----------------------
1 file changed, 54 insertions(+), 44 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 2723104cc06a..db86e459dde6 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2202,16 +2202,6 @@ unsigned filemap_get_folios(struct address_space *mapping, pgoff_t *start,
}
EXPORT_SYMBOL(filemap_get_folios);

-static inline
-bool folio_more_pages(struct folio *folio, pgoff_t index, pgoff_t max)
-{
- if (!folio_test_large(folio) || folio_test_hugetlb(folio))
- return false;
- if (index >= max)
- return false;
- return index < folio->index + folio_nr_pages(folio) - 1;
-}
-
/**
* filemap_get_folios_contig - Get a batch of contiguous folios
* @mapping: The address_space to search
@@ -3483,6 +3473,53 @@ static inline struct folio *next_map_page(struct address_space *mapping,
mapping, xas, end_pgoff);
}

+/*
+ * Map page range [start_page, start_page + nr_pages) of folio.
+ * start_page is gotten from start by folio_page(folio, start)
+ */
+static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
+ struct folio *folio, unsigned long start,
+ unsigned long addr, unsigned int nr_pages)
+{
+ vm_fault_t ret = 0;
+ struct vm_area_struct *vma = vmf->vma;
+ struct file *file = vma->vm_file;
+ struct page *page = folio_page(folio, start);
+ unsigned int mmap_miss = READ_ONCE(file->f_ra.mmap_miss);
+ unsigned int ref_count = 0, count = 0;
+
+ do {
+ if (PageHWPoison(page))
+ continue;
+
+ if (mmap_miss > 0)
+ mmap_miss--;
+
+ /*
+ * NOTE: If there're PTE markers, we'll leave them to be
+ * handled in the specific fault path, and it'll prohibit the
+ * fault-around logic.
+ */
+ if (!pte_none(*vmf->pte))
+ continue;
+
+ if (vmf->address == addr)
+ ret = VM_FAULT_NOPAGE;
+
+ ref_count++;
+ do_set_pte(vmf, page, addr);
+ update_mmu_cache(vma, addr, vmf->pte);
+ } while (vmf->pte++, page++, addr += PAGE_SIZE, ++count < nr_pages);
+
+ /* Restore the vmf->pte */
+ vmf->pte -= nr_pages;
+
+ folio_ref_add(folio, ref_count);
+ WRITE_ONCE(file->f_ra.mmap_miss, mmap_miss);
+
+ return ret;
+}
+
vm_fault_t filemap_map_pages(struct vm_fault *vmf,
pgoff_t start_pgoff, pgoff_t end_pgoff)
{
@@ -3493,9 +3530,9 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
unsigned long addr;
XA_STATE(xas, &mapping->i_pages, start_pgoff);
struct folio *folio;
- struct page *page;
unsigned int mmap_miss = READ_ONCE(file->f_ra.mmap_miss);
vm_fault_t ret = 0;
+ int nr_pages = 0;

rcu_read_lock();
folio = first_map_page(mapping, &xas, end_pgoff);
@@ -3510,45 +3547,18 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
addr = vma->vm_start + ((start_pgoff - vma->vm_pgoff) << PAGE_SHIFT);
vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, addr, &vmf->ptl);
do {
-again:
- page = folio_file_page(folio, xas.xa_index);
- if (PageHWPoison(page))
- goto unlock;
-
- if (mmap_miss > 0)
- mmap_miss--;
+ unsigned long end;

addr += (xas.xa_index - last_pgoff) << PAGE_SHIFT;
vmf->pte += xas.xa_index - last_pgoff;
last_pgoff = xas.xa_index;
+ end = folio->index + folio_nr_pages(folio) - 1;
+ nr_pages = min(end, end_pgoff) - xas.xa_index + 1;

- /*
- * NOTE: If there're PTE markers, we'll leave them to be
- * handled in the specific fault path, and it'll prohibit the
- * fault-around logic.
- */
- if (!pte_none(*vmf->pte))
- goto unlock;
+ ret |= filemap_map_folio_range(vmf, folio,
+ xas.xa_index - folio->index, addr, nr_pages);
+ xas.xa_index += nr_pages;

- /* We're about to handle the fault */
- if (vmf->address == addr)
- ret = VM_FAULT_NOPAGE;
-
- do_set_pte(vmf, page, addr);
- /* no need to invalidate: a not-present page won't be cached */
- update_mmu_cache(vma, addr, vmf->pte);
- if (folio_more_pages(folio, xas.xa_index, end_pgoff)) {
- xas.xa_index++;
- folio_ref_inc(folio);
- goto again;
- }
- folio_unlock(folio);
- continue;
-unlock:
- if (folio_more_pages(folio, xas.xa_index, end_pgoff)) {
- xas.xa_index++;
- goto again;
- }
folio_unlock(folio);
folio_put(folio);
} while ((folio = next_map_page(mapping, &xas, end_pgoff)) != NULL);
--
2.39.1


2023-02-27 17:59:04

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH v2 24/30] um: Implement the new page table range API

Add set_ptes() and update_mmu_cache_range().

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Anton Ivanov <[email protected]>
Cc: Johannes Berg <[email protected]>
Cc: [email protected]
---
arch/um/include/asm/pgtable.h | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/arch/um/include/asm/pgtable.h b/arch/um/include/asm/pgtable.h
index a70d1618eb35..ca78c90ae74f 100644
--- a/arch/um/include/asm/pgtable.h
+++ b/arch/um/include/asm/pgtable.h
@@ -242,12 +242,20 @@ static inline void set_pte(pte_t *pteptr, pte_t pteval)
if(pte_present(*pteptr)) *pteptr = pte_mknewprot(*pteptr);
}

-static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
- pte_t *pteptr, pte_t pteval)
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+ pte_t *ptep, pte_t pte, unsigned int nr)
{
- set_pte(pteptr, pteval);
+ for (;;) {
+ set_pte(ptep, pte);
+ if (--nr == 0)
+ break;
+ ptep++;
+ pte_val(pte) += PAGE_SIZE;
+ }
}

+#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
+
#define __HAVE_ARCH_PTE_SAME
static inline int pte_same(pte_t pte_a, pte_t pte_b)
{
@@ -290,6 +298,7 @@ struct mm_struct;
extern pte_t *virt_to_pte(struct mm_struct *mm, unsigned long addr);

#define update_mmu_cache(vma,address,ptep) do {} while (0)
+#define update_mmu_cache_range(vma, address, ptep, nr) do {} while (0)

/*
* Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
--
2.39.1


2023-02-27 17:59:07

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH v2 01/30] mm: Convert page_table_check_pte_set() to page_table_check_ptes_set()

Tell the page table check how many PTEs & PFNs we want it to check.
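
A minimal sketch of the intent (assumed usage, not taken from this
patch; the PTE-writing loop and the PAGE_SIZE stepping are placeholders
for whatever the architecture really does): a later set_ptes()
implementation passes its batch size straight through, so the checker
validates every PTE and PFN in the range rather than only the first.

static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
		pte_t *ptep, pte_t pte, unsigned int nr)
{
	page_table_check_ptes_set(mm, addr, ptep, pte, nr);

	for (;;) {
		set_pte(ptep, pte);
		if (--nr == 0)
			break;
		ptep++;
		pte_val(pte) += PAGE_SIZE;
	}
}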

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
arch/arm64/include/asm/pgtable.h | 2 +-
arch/riscv/include/asm/pgtable.h | 2 +-
arch/x86/include/asm/pgtable.h | 2 +-
include/linux/page_table_check.h | 14 +++++++-------
mm/page_table_check.c | 14 ++++++++------
5 files changed, 18 insertions(+), 16 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index b6ba466e2e8a..69765dc697af 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -358,7 +358,7 @@ static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pte)
{
- page_table_check_pte_set(mm, addr, ptep, pte);
+ page_table_check_ptes_set(mm, addr, ptep, pte, 1);
return __set_pte_at(mm, addr, ptep, pte);
}

diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index ab05f892d317..b516f3b59616 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -459,7 +459,7 @@ static inline void __set_pte_at(struct mm_struct *mm,
static inline void set_pte_at(struct mm_struct *mm,
unsigned long addr, pte_t *ptep, pte_t pteval)
{
- page_table_check_pte_set(mm, addr, ptep, pteval);
+ page_table_check_ptes_set(mm, addr, ptep, pteval, 1);
__set_pte_at(mm, addr, ptep, pteval);
}

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 7425f32e5293..84be3e07b112 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1022,7 +1022,7 @@ static inline pud_t native_local_pudp_get_and_clear(pud_t *pudp)
static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pte)
{
- page_table_check_pte_set(mm, addr, ptep, pte);
+ page_table_check_ptes_set(mm, addr, ptep, pte, 1);
set_pte(ptep, pte);
}

diff --git a/include/linux/page_table_check.h b/include/linux/page_table_check.h
index 01e16c7696ec..ba269c7009e4 100644
--- a/include/linux/page_table_check.h
+++ b/include/linux/page_table_check.h
@@ -20,8 +20,8 @@ void __page_table_check_pmd_clear(struct mm_struct *mm, unsigned long addr,
pmd_t pmd);
void __page_table_check_pud_clear(struct mm_struct *mm, unsigned long addr,
pud_t pud);
-void __page_table_check_pte_set(struct mm_struct *mm, unsigned long addr,
- pte_t *ptep, pte_t pte);
+void __page_table_check_ptes_set(struct mm_struct *mm, unsigned long addr,
+ pte_t *ptep, pte_t pte, unsigned int nr);
void __page_table_check_pmd_set(struct mm_struct *mm, unsigned long addr,
pmd_t *pmdp, pmd_t pmd);
void __page_table_check_pud_set(struct mm_struct *mm, unsigned long addr,
@@ -73,14 +73,14 @@ static inline void page_table_check_pud_clear(struct mm_struct *mm,
__page_table_check_pud_clear(mm, addr, pud);
}

-static inline void page_table_check_pte_set(struct mm_struct *mm,
+static inline void page_table_check_ptes_set(struct mm_struct *mm,
unsigned long addr, pte_t *ptep,
- pte_t pte)
+ pte_t pte, unsigned int nr)
{
if (static_branch_likely(&page_table_check_disabled))
return;

- __page_table_check_pte_set(mm, addr, ptep, pte);
+ __page_table_check_ptes_set(mm, addr, ptep, pte, nr);
}

static inline void page_table_check_pmd_set(struct mm_struct *mm,
@@ -138,9 +138,9 @@ static inline void page_table_check_pud_clear(struct mm_struct *mm,
{
}

-static inline void page_table_check_pte_set(struct mm_struct *mm,
+static inline void page_table_check_ptes_set(struct mm_struct *mm,
unsigned long addr, pte_t *ptep,
- pte_t pte)
+ pte_t pte, unsigned int nr)
{
}

diff --git a/mm/page_table_check.c b/mm/page_table_check.c
index 25d8610c0042..e6f4d40caaa2 100644
--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ -184,20 +184,22 @@ void __page_table_check_pud_clear(struct mm_struct *mm, unsigned long addr,
}
EXPORT_SYMBOL(__page_table_check_pud_clear);

-void __page_table_check_pte_set(struct mm_struct *mm, unsigned long addr,
- pte_t *ptep, pte_t pte)
+void __page_table_check_ptes_set(struct mm_struct *mm, unsigned long addr,
+ pte_t *ptep, pte_t pte, unsigned int nr)
{
+ unsigned int i;
+
if (&init_mm == mm)
return;

- __page_table_check_pte_clear(mm, addr, *ptep);
+ for (i = 0; i < nr; i++)
+ __page_table_check_pte_clear(mm, addr, ptep[i]);
if (pte_user_accessible_page(pte)) {
- page_table_check_set(mm, addr, pte_pfn(pte),
- PAGE_SIZE >> PAGE_SHIFT,
+ page_table_check_set(mm, addr, pte_pfn(pte), nr,
pte_write(pte));
}
}
-EXPORT_SYMBOL(__page_table_check_pte_set);
+EXPORT_SYMBOL(__page_table_check_ptes_set);

void __page_table_check_pmd_set(struct mm_struct *mm, unsigned long addr,
pmd_t *pmdp, pmd_t pmd)
--
2.39.1


2023-02-27 17:59:11

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH v2 23/30] sparc64: Implement the new page table range API

Add set_ptes(), update_mmu_cache_range(), flush_dcache_folio() and
flush_icache_pages(). Convert the PG_dcache_dirty flag from being
per-page to per-folio.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: [email protected]
---
arch/sparc/include/asm/cacheflush_64.h | 18 ++++--
arch/sparc/include/asm/pgtable_64.h | 25 +++++++--
arch/sparc/kernel/smp_64.c | 56 +++++++++++-------
arch/sparc/mm/init_64.c | 78 +++++++++++++++-----------
arch/sparc/mm/tlb.c | 5 +-
5 files changed, 117 insertions(+), 65 deletions(-)

diff --git a/arch/sparc/include/asm/cacheflush_64.h b/arch/sparc/include/asm/cacheflush_64.h
index b9341836597e..a9a719f04d06 100644
--- a/arch/sparc/include/asm/cacheflush_64.h
+++ b/arch/sparc/include/asm/cacheflush_64.h
@@ -35,20 +35,26 @@ void flush_icache_range(unsigned long start, unsigned long end);
void __flush_icache_page(unsigned long);

void __flush_dcache_page(void *addr, int flush_icache);
-void flush_dcache_page_impl(struct page *page);
+void flush_dcache_folio_impl(struct folio *folio);
#ifdef CONFIG_SMP
-void smp_flush_dcache_page_impl(struct page *page, int cpu);
-void flush_dcache_page_all(struct mm_struct *mm, struct page *page);
+void smp_flush_dcache_folio_impl(struct folio *folio, int cpu);
+void flush_dcache_folio_all(struct mm_struct *mm, struct folio *folio);
#else
-#define smp_flush_dcache_page_impl(page,cpu) flush_dcache_page_impl(page)
-#define flush_dcache_page_all(mm,page) flush_dcache_page_impl(page)
+#define smp_flush_dcache_folio_impl(folio, cpu) flush_dcache_folio_impl(folio)
+#define flush_dcache_folio_all(mm, folio) flush_dcache_folio_impl(folio)
#endif

void __flush_dcache_range(unsigned long start, unsigned long end);
#define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
-void flush_dcache_page(struct page *page);
+void flush_dcache_folio(struct folio *folio);
+#define flush_dcache_folio flush_dcache_folio
+static inline void flush_dcache_page(struct page *page)
+{
+ flush_dcache_folio(page_folio(page));
+}

#define flush_icache_page(vma, pg) do { } while(0)
+#define flush_icache_pages(vma, pg, nr) do { } while(0)

void flush_ptrace_access(struct vm_area_struct *, struct page *,
unsigned long uaddr, void *kaddr,
diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h
index 2dc8d4641734..d5c0088e0c6a 100644
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -911,8 +911,20 @@ static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
maybe_tlb_batch_add(mm, addr, ptep, orig, fullmm, PAGE_SHIFT);
}

-#define set_pte_at(mm,addr,ptep,pte) \
- __set_pte_at((mm), (addr), (ptep), (pte), 0)
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+ pte_t *ptep, pte_t pte, unsigned int nr)
+{
+ for (;;) {
+ __set_pte_at(mm, addr, ptep, pte, 0);
+ if (--nr == 0)
+ break;
+ ptep++;
+ pte_val(pte) += PAGE_SIZE;
+ addr += PAGE_SIZE;
+ }
+}
+
+#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)

#define pte_clear(mm,addr,ptep) \
set_pte_at((mm), (addr), (ptep), __pte(0UL))
@@ -931,8 +943,8 @@ static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
\
if (pfn_valid(this_pfn) && \
(((old_addr) ^ (new_addr)) & (1 << 13))) \
- flush_dcache_page_all(current->mm, \
- pfn_to_page(this_pfn)); \
+ flush_dcache_folio_all(current->mm, \
+ page_folio(pfn_to_page(this_pfn))); \
} \
newpte; \
})
@@ -947,7 +959,10 @@ struct seq_file;
void mmu_info(struct seq_file *);

struct vm_area_struct;
-void update_mmu_cache(struct vm_area_struct *, unsigned long, pte_t *);
+void update_mmu_cache_range(struct vm_area_struct *, unsigned long addr,
+ pte_t *ptep, unsigned int nr);
+#define update_mmu_cache(vma, addr, ptep) \
+ update_mmu_cache_range(vma, addr, ptep, 1)
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
void update_mmu_cache_pmd(struct vm_area_struct *vma, unsigned long addr,
pmd_t *pmd);
diff --git a/arch/sparc/kernel/smp_64.c b/arch/sparc/kernel/smp_64.c
index a55295d1b924..90ef8677ac89 100644
--- a/arch/sparc/kernel/smp_64.c
+++ b/arch/sparc/kernel/smp_64.c
@@ -921,20 +921,26 @@ extern unsigned long xcall_flush_dcache_page_cheetah;
#endif
extern unsigned long xcall_flush_dcache_page_spitfire;

-static inline void __local_flush_dcache_page(struct page *page)
+static inline void __local_flush_dcache_folio(struct folio *folio)
{
+ unsigned int i, nr = folio_nr_pages(folio);
+
#ifdef DCACHE_ALIASING_POSSIBLE
- __flush_dcache_page(page_address(page),
+ for (i = 0; i < nr; i++)
+ __flush_dcache_page(folio_address(folio) + i * PAGE_SIZE,
((tlb_type == spitfire) &&
- page_mapping_file(page) != NULL));
+ folio_flush_mapping(folio) != NULL));
#else
- if (page_mapping_file(page) != NULL &&
- tlb_type == spitfire)
- __flush_icache_page(__pa(page_address(page)));
+ if (folio_flush_mapping(folio) != NULL &&
+ tlb_type == spitfire) {
+		unsigned long pfn = folio_pfn(folio);
+ for (i = 0; i < nr; i++)
+ __flush_icache_page((pfn + i) * PAGE_SIZE);
+ }
#endif
}

-void smp_flush_dcache_page_impl(struct page *page, int cpu)
+void smp_flush_dcache_folio_impl(struct folio *folio, int cpu)
{
int this_cpu;

@@ -948,14 +954,14 @@ void smp_flush_dcache_page_impl(struct page *page, int cpu)
this_cpu = get_cpu();

if (cpu == this_cpu) {
- __local_flush_dcache_page(page);
+ __local_flush_dcache_folio(folio);
} else if (cpu_online(cpu)) {
- void *pg_addr = page_address(page);
+ void *pg_addr = folio_address(folio);
u64 data0 = 0;

if (tlb_type == spitfire) {
data0 = ((u64)&xcall_flush_dcache_page_spitfire);
- if (page_mapping_file(page) != NULL)
+ if (folio_flush_mapping(folio) != NULL)
data0 |= ((u64)1 << 32);
} else if (tlb_type == cheetah || tlb_type == cheetah_plus) {
#ifdef DCACHE_ALIASING_POSSIBLE
@@ -963,18 +969,23 @@ void smp_flush_dcache_page_impl(struct page *page, int cpu)
#endif
}
if (data0) {
- xcall_deliver(data0, __pa(pg_addr),
- (u64) pg_addr, cpumask_of(cpu));
+ unsigned int i, nr = folio_nr_pages(folio);
+
+ for (i = 0; i < nr; i++) {
+ xcall_deliver(data0, __pa(pg_addr),
+ (u64) pg_addr, cpumask_of(cpu));
#ifdef CONFIG_DEBUG_DCFLUSH
- atomic_inc(&dcpage_flushes_xcall);
+ atomic_inc(&dcpage_flushes_xcall);
#endif
+ pg_addr += PAGE_SIZE;
+ }
}
}

put_cpu();
}

-void flush_dcache_page_all(struct mm_struct *mm, struct page *page)
+void flush_dcache_folio_all(struct mm_struct *mm, struct folio *folio)
{
void *pg_addr;
u64 data0;
@@ -988,10 +999,10 @@ void flush_dcache_page_all(struct mm_struct *mm, struct page *page)
atomic_inc(&dcpage_flushes);
#endif
data0 = 0;
- pg_addr = page_address(page);
+ pg_addr = folio_address(folio);
if (tlb_type == spitfire) {
data0 = ((u64)&xcall_flush_dcache_page_spitfire);
- if (page_mapping_file(page) != NULL)
+ if (folio_flush_mapping(folio) != NULL)
data0 |= ((u64)1 << 32);
} else if (tlb_type == cheetah || tlb_type == cheetah_plus) {
#ifdef DCACHE_ALIASING_POSSIBLE
@@ -999,13 +1010,18 @@ void flush_dcache_page_all(struct mm_struct *mm, struct page *page)
#endif
}
if (data0) {
- xcall_deliver(data0, __pa(pg_addr),
- (u64) pg_addr, cpu_online_mask);
+ unsigned int i, nr = folio_nr_pages(folio);
+
+ for (i = 0; i < nr; i++) {
+ xcall_deliver(data0, __pa(pg_addr),
+ (u64) pg_addr, cpu_online_mask);
#ifdef CONFIG_DEBUG_DCFLUSH
- atomic_inc(&dcpage_flushes_xcall);
+ atomic_inc(&dcpage_flushes_xcall);
#endif
+ pg_addr += PAGE_SIZE;
+ }
}
- __local_flush_dcache_page(page);
+ __local_flush_dcache_folio(folio);

preempt_enable();
}
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 04f9db0c3111..ab9aacbaf43c 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -195,21 +195,26 @@ atomic_t dcpage_flushes_xcall = ATOMIC_INIT(0);
#endif
#endif

-inline void flush_dcache_page_impl(struct page *page)
+inline void flush_dcache_folio_impl(struct folio *folio)
{
+ unsigned int i, nr = folio_nr_pages(folio);
+
BUG_ON(tlb_type == hypervisor);
#ifdef CONFIG_DEBUG_DCFLUSH
atomic_inc(&dcpage_flushes);
#endif

#ifdef DCACHE_ALIASING_POSSIBLE
- __flush_dcache_page(page_address(page),
- ((tlb_type == spitfire) &&
- page_mapping_file(page) != NULL));
+ for (i = 0; i < nr; i++)
+ __flush_dcache_page(folio_address(folio) + i * PAGE_SIZE,
+ ((tlb_type == spitfire) &&
+ folio_flush_mapping(folio) != NULL));
#else
- if (page_mapping_file(page) != NULL &&
- tlb_type == spitfire)
- __flush_icache_page(__pa(page_address(page)));
+ if (folio_flush_mapping(folio) != NULL &&
+ tlb_type == spitfire) {
+ for (i = 0; i < nr; i++)
+			__flush_icache_page(__pa(folio_address(folio)) + i * PAGE_SIZE);
+ }
#endif
}

@@ -218,10 +223,10 @@ inline void flush_dcache_page_impl(struct page *page)
#define PG_dcache_cpu_mask \
((1UL<<ilog2(roundup_pow_of_two(NR_CPUS)))-1UL)

-#define dcache_dirty_cpu(page) \
- (((page)->flags >> PG_dcache_cpu_shift) & PG_dcache_cpu_mask)
+#define dcache_dirty_cpu(folio) \
+ (((folio)->flags >> PG_dcache_cpu_shift) & PG_dcache_cpu_mask)

-static inline void set_dcache_dirty(struct page *page, int this_cpu)
+static inline void set_dcache_dirty(struct folio *folio, int this_cpu)
{
unsigned long mask = this_cpu;
unsigned long non_cpu_bits;
@@ -238,11 +243,11 @@ static inline void set_dcache_dirty(struct page *page, int this_cpu)
"bne,pn %%xcc, 1b\n\t"
" nop"
: /* no outputs */
- : "r" (mask), "r" (non_cpu_bits), "r" (&page->flags)
+ : "r" (mask), "r" (non_cpu_bits), "r" (&folio->flags)
: "g1", "g7");
}

-static inline void clear_dcache_dirty_cpu(struct page *page, unsigned long cpu)
+static inline void clear_dcache_dirty_cpu(struct folio *folio, unsigned long cpu)
{
unsigned long mask = (1UL << PG_dcache_dirty);

@@ -260,7 +265,7 @@ static inline void clear_dcache_dirty_cpu(struct page *page, unsigned long cpu)
" nop\n"
"2:"
: /* no outputs */
- : "r" (cpu), "r" (mask), "r" (&page->flags),
+ : "r" (cpu), "r" (mask), "r" (&folio->flags),
"i" (PG_dcache_cpu_mask),
"i" (PG_dcache_cpu_shift)
: "g1", "g7");
@@ -284,9 +289,10 @@ static void flush_dcache(unsigned long pfn)

page = pfn_to_page(pfn);
if (page) {
+ struct folio *folio = page_folio(page);
unsigned long pg_flags;

- pg_flags = page->flags;
+ pg_flags = folio->flags;
if (pg_flags & (1UL << PG_dcache_dirty)) {
int cpu = ((pg_flags >> PG_dcache_cpu_shift) &
PG_dcache_cpu_mask);
@@ -296,11 +302,11 @@ static void flush_dcache(unsigned long pfn)
* in the SMP case.
*/
if (cpu == this_cpu)
- flush_dcache_page_impl(page);
+ flush_dcache_folio_impl(folio);
else
- smp_flush_dcache_page_impl(page, cpu);
+ smp_flush_dcache_folio_impl(folio, cpu);

- clear_dcache_dirty_cpu(page, cpu);
+ clear_dcache_dirty_cpu(folio, cpu);

put_cpu();
}
@@ -388,12 +394,14 @@ bool __init arch_hugetlb_valid_size(unsigned long size)
}
#endif /* CONFIG_HUGETLB_PAGE */

-void update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t *ptep)
+void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long address,
+ pte_t *ptep, unsigned int nr)
{
struct mm_struct *mm;
unsigned long flags;
bool is_huge_tsb;
pte_t pte = *ptep;
+ unsigned int i;

if (tlb_type != hypervisor) {
unsigned long pfn = pte_pfn(pte);
@@ -440,15 +448,21 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t *
}
}
#endif
- if (!is_huge_tsb)
- __update_mmu_tsb_insert(mm, MM_TSB_BASE, PAGE_SHIFT,
- address, pte_val(pte));
+ if (!is_huge_tsb) {
+ for (i = 0; i < nr; i++) {
+ __update_mmu_tsb_insert(mm, MM_TSB_BASE, PAGE_SHIFT,
+ address, pte_val(pte));
+ address += PAGE_SIZE;
+ pte_val(pte) += PAGE_SIZE;
+ }
+ }

spin_unlock_irqrestore(&mm->context.lock, flags);
}

-void flush_dcache_page(struct page *page)
+void flush_dcache_folio(struct folio *folio)
{
+ unsigned long pfn = folio_pfn(folio);
struct address_space *mapping;
int this_cpu;

@@ -459,35 +473,35 @@ void flush_dcache_page(struct page *page)
* is merely the zero page. The 'bigcore' testcase in GDB
* causes this case to run millions of times.
*/
- if (page == ZERO_PAGE(0))
+ if (is_zero_pfn(pfn))
return;

this_cpu = get_cpu();

- mapping = page_mapping_file(page);
+ mapping = folio_flush_mapping(folio);
if (mapping && !mapping_mapped(mapping)) {
- int dirty = test_bit(PG_dcache_dirty, &page->flags);
+ bool dirty = test_bit(PG_dcache_dirty, &folio->flags);
if (dirty) {
- int dirty_cpu = dcache_dirty_cpu(page);
+ int dirty_cpu = dcache_dirty_cpu(folio);

if (dirty_cpu == this_cpu)
goto out;
- smp_flush_dcache_page_impl(page, dirty_cpu);
+ smp_flush_dcache_folio_impl(folio, dirty_cpu);
}
- set_dcache_dirty(page, this_cpu);
+ set_dcache_dirty(folio, this_cpu);
} else {
/* We could delay the flush for the !page_mapping
* case too. But that case is for exec env/arg
* pages and those are %99 certainly going to get
* faulted into the tlb (and thus flushed) anyways.
*/
- flush_dcache_page_impl(page);
+ flush_dcache_folio_impl(folio);
}

out:
put_cpu();
}
-EXPORT_SYMBOL(flush_dcache_page);
+EXPORT_SYMBOL(flush_dcache_folio);

void __kprobes flush_icache_range(unsigned long start, unsigned long end)
{
@@ -2280,10 +2294,10 @@ void __init paging_init(void)
setup_page_offset();

/* These build time checkes make sure that the dcache_dirty_cpu()
- * page->flags usage will work.
+ * folio->flags usage will work.
*
* When a page gets marked as dcache-dirty, we store the
- * cpu number starting at bit 32 in the page->flags. Also,
+ * cpu number starting at bit 32 in the folio->flags. Also,
* functions like clear_dcache_dirty_cpu use the cpu mask
* in 13-bit signed-immediate instruction fields.
*/
diff --git a/arch/sparc/mm/tlb.c b/arch/sparc/mm/tlb.c
index 9a725547578e..3fa6a070912d 100644
--- a/arch/sparc/mm/tlb.c
+++ b/arch/sparc/mm/tlb.c
@@ -118,6 +118,7 @@ void tlb_batch_add(struct mm_struct *mm, unsigned long vaddr,
unsigned long paddr, pfn = pte_pfn(orig);
struct address_space *mapping;
struct page *page;
+ struct folio *folio;

if (!pfn_valid(pfn))
goto no_cache_flush;
@@ -127,13 +128,13 @@ void tlb_batch_add(struct mm_struct *mm, unsigned long vaddr,
goto no_cache_flush;

/* A real file page? */
- mapping = page_mapping_file(page);
+ mapping = folio_flush_mapping(folio);
if (!mapping)
goto no_cache_flush;

paddr = (unsigned long) page_address(page);
if ((paddr ^ vaddr) & (1 << 13))
- flush_dcache_page_all(mm, page);
+ flush_dcache_folio_all(mm, folio);
}

no_cache_flush:
--
2.39.1


2023-02-27 17:59:15

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH v2 06/30] arc: Implement the new page table range API

Add set_ptes(), update_mmu_cache_range(), flush_dcache_folio()
and flush_icache_pages().

Change the PG_dc_clean flag from being per-page to per-folio (which
means it cannot always be set as we don't know that all pages in this
folio were cleaned). Enhance the internal flush routines to take the
number of pages to flush.
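
To spell out the flag rule in code form (a hypothetical helper, not
part of the patch): the per-folio bit may only be set once the whole
folio is known to be clean; otherwise the safe move is to clear it.

static void arc_note_dc_state(struct folio *folio, bool whole_folio_clean)
{
	if (whole_folio_clean)
		set_bit(PG_dc_clean, &folio->flags);
	else
		/* only some pages were flushed: stay conservative */
		clear_bit(PG_dc_clean, &folio->flags);
}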

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: [email protected]
---
arch/arc/include/asm/cacheflush.h | 7 +-
arch/arc/include/asm/pgtable-bits-arcv2.h | 20 +++--
arch/arc/mm/cache.c | 61 ++++++++------
arch/arc/mm/tlb.c | 18 +++--
arch/arm/include/asm/cacheflush.h | 24 +++---
arch/arm/include/asm/pgtable.h | 5 +-
arch/arm/include/asm/tlbflush.h | 13 +--
arch/arm/mm/copypage-v4mc.c | 5 +-
arch/arm/mm/copypage-v6.c | 5 +-
arch/arm/mm/copypage-xscale.c | 5 +-
arch/arm/mm/dma-mapping.c | 24 +++---
arch/arm/mm/fault-armv.c | 14 ++--
arch/arm/mm/flush.c | 99 ++++++++++++++---------
arch/arm/mm/mm.h | 2 +-
arch/arm/mm/mmu.c | 14 +++-
15 files changed, 193 insertions(+), 123 deletions(-)

diff --git a/arch/arc/include/asm/cacheflush.h b/arch/arc/include/asm/cacheflush.h
index e201b4b1655a..04f65f588510 100644
--- a/arch/arc/include/asm/cacheflush.h
+++ b/arch/arc/include/asm/cacheflush.h
@@ -25,17 +25,20 @@
* in update_mmu_cache()
*/
#define flush_icache_page(vma, page)
+#define flush_icache_pages(vma, page, nr)

void flush_cache_all(void);

void flush_icache_range(unsigned long kstart, unsigned long kend);
void __sync_icache_dcache(phys_addr_t paddr, unsigned long vaddr, int len);
-void __inv_icache_page(phys_addr_t paddr, unsigned long vaddr);
-void __flush_dcache_page(phys_addr_t paddr, unsigned long vaddr);
+void __inv_icache_pages(phys_addr_t paddr, unsigned long vaddr, unsigned nr);
+void __flush_dcache_pages(phys_addr_t paddr, unsigned long vaddr, unsigned nr);

#define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1

void flush_dcache_page(struct page *page);
+void flush_dcache_folio(struct folio *folio);
+#define flush_dcache_folio flush_dcache_folio

void dma_cache_wback_inv(phys_addr_t start, unsigned long sz);
void dma_cache_inv(phys_addr_t start, unsigned long sz);
diff --git a/arch/arc/include/asm/pgtable-bits-arcv2.h b/arch/arc/include/asm/pgtable-bits-arcv2.h
index 6e9f8ca6d6a1..4a1b2ce204c6 100644
--- a/arch/arc/include/asm/pgtable-bits-arcv2.h
+++ b/arch/arc/include/asm/pgtable-bits-arcv2.h
@@ -100,14 +100,24 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
return __pte((pte_val(pte) & _PAGE_CHG_MASK) | pgprot_val(newprot));
}

-static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
- pte_t *ptep, pte_t pteval)
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+ pte_t *ptep, pte_t pte, unsigned int nr)
{
- set_pte(ptep, pteval);
+ for (;;) {
+ set_pte(ptep, pte);
+ if (--nr == 0)
+ break;
+ ptep++;
+ pte_val(pte) += PAGE_SIZE;
+ }
}
+#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)

-void update_mmu_cache(struct vm_area_struct *vma, unsigned long address,
- pte_t *ptep);
+void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long address,
+ pte_t *ptep, unsigned int nr);
+
+#define update_mmu_cache(vma, addr, ptep) \
+ update_mmu_cache_range(vma, addr, ptep, 1)

/*
* Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
diff --git a/arch/arc/mm/cache.c b/arch/arc/mm/cache.c
index 55c6de138eae..3c16ee942a5c 100644
--- a/arch/arc/mm/cache.c
+++ b/arch/arc/mm/cache.c
@@ -752,17 +752,17 @@ static inline void arc_slc_enable(void)
* There's a corollary case, where kernel READs from a userspace mapped page.
* If the U-mapping is not congruent to K-mapping, former needs flushing.
*/
-void flush_dcache_page(struct page *page)
+void flush_dcache_folio(struct folio *folio)
{
struct address_space *mapping;

if (!cache_is_vipt_aliasing()) {
- clear_bit(PG_dc_clean, &page->flags);
+ clear_bit(PG_dc_clean, &folio->flags);
return;
}

/* don't handle anon pages here */
- mapping = page_mapping_file(page);
+ mapping = folio_flush_mapping(folio);
if (!mapping)
return;

@@ -771,17 +771,27 @@ void flush_dcache_page(struct page *page)
* Make a note that K-mapping is dirty
*/
if (!mapping_mapped(mapping)) {
- clear_bit(PG_dc_clean, &page->flags);
- } else if (page_mapcount(page)) {
-
+ clear_bit(PG_dc_clean, &folio->flags);
+ } else if (folio_mapped(folio)) {
/* kernel reading from page with U-mapping */
- phys_addr_t paddr = (unsigned long)page_address(page);
- unsigned long vaddr = page->index << PAGE_SHIFT;
+ phys_addr_t paddr = (unsigned long)folio_address(folio);
+ unsigned long vaddr = folio_pos(folio);

+ /*
+ * vaddr is not actually the virtual address, but is
+ * congruent to every user mapping.
+ */
if (addr_not_cache_congruent(paddr, vaddr))
- __flush_dcache_page(paddr, vaddr);
+ __flush_dcache_pages(paddr, vaddr,
+ folio_nr_pages(folio));
}
}
+EXPORT_SYMBOL(flush_dcache_folio);
+
+void flush_dcache_page(struct page *page)
+{
+ return flush_dcache_folio(page_folio(page));
+}
EXPORT_SYMBOL(flush_dcache_page);

/*
@@ -921,18 +931,18 @@ void __sync_icache_dcache(phys_addr_t paddr, unsigned long vaddr, int len)
}

/* wrapper to compile time eliminate alignment checks in flush loop */
-void __inv_icache_page(phys_addr_t paddr, unsigned long vaddr)
+void __inv_icache_pages(phys_addr_t paddr, unsigned long vaddr, unsigned nr)
{
- __ic_line_inv_vaddr(paddr, vaddr, PAGE_SIZE);
+ __ic_line_inv_vaddr(paddr, vaddr, nr * PAGE_SIZE);
}

/*
* wrapper to clearout kernel or userspace mappings of a page
* For kernel mappings @vaddr == @paddr
*/
-void __flush_dcache_page(phys_addr_t paddr, unsigned long vaddr)
+void __flush_dcache_pages(phys_addr_t paddr, unsigned long vaddr, unsigned nr)
{
- __dc_line_op(paddr, vaddr & PAGE_MASK, PAGE_SIZE, OP_FLUSH_N_INV);
+ __dc_line_op(paddr, vaddr & PAGE_MASK, nr * PAGE_SIZE, OP_FLUSH_N_INV);
}

noinline void flush_cache_all(void)
@@ -962,10 +972,10 @@ void flush_cache_page(struct vm_area_struct *vma, unsigned long u_vaddr,

u_vaddr &= PAGE_MASK;

- __flush_dcache_page(paddr, u_vaddr);
+ __flush_dcache_pages(paddr, u_vaddr, 1);

if (vma->vm_flags & VM_EXEC)
- __inv_icache_page(paddr, u_vaddr);
+ __inv_icache_pages(paddr, u_vaddr, 1);
}

void flush_cache_range(struct vm_area_struct *vma, unsigned long start,
@@ -978,9 +988,9 @@ void flush_anon_page(struct vm_area_struct *vma, struct page *page,
unsigned long u_vaddr)
{
/* TBD: do we really need to clear the kernel mapping */
- __flush_dcache_page((phys_addr_t)page_address(page), u_vaddr);
- __flush_dcache_page((phys_addr_t)page_address(page),
- (phys_addr_t)page_address(page));
+ __flush_dcache_pages((phys_addr_t)page_address(page), u_vaddr, 1);
+ __flush_dcache_pages((phys_addr_t)page_address(page),
+ (phys_addr_t)page_address(page), 1);

}

@@ -989,6 +999,8 @@ void flush_anon_page(struct vm_area_struct *vma, struct page *page,
void copy_user_highpage(struct page *to, struct page *from,
unsigned long u_vaddr, struct vm_area_struct *vma)
{
+ struct folio *src = page_folio(from);
+ struct folio *dst = page_folio(to);
void *kfrom = kmap_atomic(from);
void *kto = kmap_atomic(to);
int clean_src_k_mappings = 0;
@@ -1005,7 +1017,7 @@ void copy_user_highpage(struct page *to, struct page *from,
* addr_not_cache_congruent() is 0
*/
if (page_mapcount(from) && addr_not_cache_congruent(kfrom, u_vaddr)) {
- __flush_dcache_page((unsigned long)kfrom, u_vaddr);
+ __flush_dcache_pages((unsigned long)kfrom, u_vaddr, 1);
clean_src_k_mappings = 1;
}

@@ -1019,17 +1031,17 @@ void copy_user_highpage(struct page *to, struct page *from,
* non copied user pages (e.g. read faults which wire in pagecache page
* directly).
*/
- clear_bit(PG_dc_clean, &to->flags);
+ clear_bit(PG_dc_clean, &dst->flags);

/*
* if SRC was already usermapped and non-congruent to kernel mapping
* sync the kernel mapping back to physical page
*/
if (clean_src_k_mappings) {
- __flush_dcache_page((unsigned long)kfrom, (unsigned long)kfrom);
- set_bit(PG_dc_clean, &from->flags);
+ __flush_dcache_pages((unsigned long)kfrom,
+ (unsigned long)kfrom, 1);
} else {
- clear_bit(PG_dc_clean, &from->flags);
+ clear_bit(PG_dc_clean, &src->flags);
}

kunmap_atomic(kto);
@@ -1038,8 +1050,9 @@ void copy_user_highpage(struct page *to, struct page *from,

void clear_user_page(void *to, unsigned long u_vaddr, struct page *page)
{
+ struct folio *folio = page_folio(page);
clear_page(to);
- clear_bit(PG_dc_clean, &page->flags);
+ clear_bit(PG_dc_clean, &folio->flags);
}
EXPORT_SYMBOL(clear_user_page);

diff --git a/arch/arc/mm/tlb.c b/arch/arc/mm/tlb.c
index 5f71445f26bd..0a996b65bb4e 100644
--- a/arch/arc/mm/tlb.c
+++ b/arch/arc/mm/tlb.c
@@ -467,8 +467,8 @@ void create_tlb(struct vm_area_struct *vma, unsigned long vaddr, pte_t *ptep)
* Note that flush (when done) involves both WBACK - so physical page is
* in sync as well as INV - so any non-congruent aliases don't remain
*/
-void update_mmu_cache(struct vm_area_struct *vma, unsigned long vaddr_unaligned,
- pte_t *ptep)
+void update_mmu_cache_range(struct vm_area_struct *vma,
+ unsigned long vaddr_unaligned, pte_t *ptep, unsigned int nr)
{
unsigned long vaddr = vaddr_unaligned & PAGE_MASK;
phys_addr_t paddr = pte_val(*ptep) & PAGE_MASK_PHYS;
@@ -491,15 +491,19 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned long vaddr_unaligned,
*/
if ((vma->vm_flags & VM_EXEC) ||
addr_not_cache_congruent(paddr, vaddr)) {
-
- int dirty = !test_and_set_bit(PG_dc_clean, &page->flags);
+ struct folio *folio = page_folio(page);
+ int dirty = !test_and_set_bit(PG_dc_clean, &folio->flags);
if (dirty) {
+ unsigned long offset = offset_in_folio(folio, paddr);
+ nr = folio_nr_pages(folio);
+ paddr -= offset;
+ vaddr -= offset;
/* wback + inv dcache lines (K-mapping) */
- __flush_dcache_page(paddr, paddr);
+ __flush_dcache_pages(paddr, paddr, nr);

/* invalidate any existing icache lines (U-mapping) */
if (vma->vm_flags & VM_EXEC)
- __inv_icache_page(paddr, vaddr);
+ __inv_icache_pages(paddr, vaddr, nr);
}
}
}
@@ -531,7 +535,7 @@ void update_mmu_cache_pmd(struct vm_area_struct *vma, unsigned long addr,
pmd_t *pmd)
{
pte_t pte = __pte(pmd_val(*pmd));
- update_mmu_cache(vma, addr, &pte);
+ update_mmu_cache_range(vma, addr, &pte, HPAGE_PMD_NR);
}

void local_flush_pmd_tlb_range(struct vm_area_struct *vma, unsigned long start,
diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
index a094f964c869..841e268d2374 100644
--- a/arch/arm/include/asm/cacheflush.h
+++ b/arch/arm/include/asm/cacheflush.h
@@ -231,14 +231,15 @@ vivt_flush_cache_range(struct vm_area_struct *vma, unsigned long start, unsigned
vma->vm_flags);
}

-static inline void
-vivt_flush_cache_page(struct vm_area_struct *vma, unsigned long user_addr, unsigned long pfn)
+static inline void vivt_flush_cache_pages(struct vm_area_struct *vma,
+ unsigned long user_addr, unsigned long pfn, unsigned int nr)
{
struct mm_struct *mm = vma->vm_mm;

if (!mm || cpumask_test_cpu(smp_processor_id(), mm_cpumask(mm))) {
unsigned long addr = user_addr & PAGE_MASK;
- __cpuc_flush_user_range(addr, addr + PAGE_SIZE, vma->vm_flags);
+ __cpuc_flush_user_range(addr, addr + nr * PAGE_SIZE,
+ vma->vm_flags);
}
}

@@ -247,15 +248,17 @@ vivt_flush_cache_page(struct vm_area_struct *vma, unsigned long user_addr, unsig
vivt_flush_cache_mm(mm)
#define flush_cache_range(vma,start,end) \
vivt_flush_cache_range(vma,start,end)
-#define flush_cache_page(vma,addr,pfn) \
- vivt_flush_cache_page(vma,addr,pfn)
+#define flush_cache_pages(vma, addr, pfn, nr) \
+ vivt_flush_cache_pages(vma, addr, pfn, nr)
#else
-extern void flush_cache_mm(struct mm_struct *mm);
-extern void flush_cache_range(struct vm_area_struct *vma, unsigned long start, unsigned long end);
-extern void flush_cache_page(struct vm_area_struct *vma, unsigned long user_addr, unsigned long pfn);
+void flush_cache_mm(struct mm_struct *mm);
+void flush_cache_range(struct vm_area_struct *vma, unsigned long start, unsigned long end);
+void flush_cache_pages(struct vm_area_struct *vma, unsigned long user_addr,
+ unsigned long pfn, unsigned int nr);
#endif

#define flush_cache_dup_mm(mm) flush_cache_mm(mm)
+#define flush_cache_page(vma, addr, pfn) flush_cache_pages(vma, addr, pfn, 1)

/*
* flush_icache_user_range is used when we want to ensure that the
@@ -289,7 +292,9 @@ extern void flush_cache_page(struct vm_area_struct *vma, unsigned long user_addr
* See update_mmu_cache for the user space part.
*/
#define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
-extern void flush_dcache_page(struct page *);
+void flush_dcache_page(struct page *);
+void flush_dcache_folio(struct folio *folio);
+#define flush_dcache_folio flush_dcache_folio

#define ARCH_IMPLEMENTS_FLUSH_KERNEL_VMAP_RANGE 1
static inline void flush_kernel_vmap_range(void *addr, int size)
@@ -321,6 +326,7 @@ static inline void flush_anon_page(struct vm_area_struct *vma,
* duplicate cache flushing elsewhere performed by flush_dcache_page().
*/
#define flush_icache_page(vma,page) do { } while (0)
+#define flush_icache_pages(vma, page, nr) do { } while (0)

/*
* flush_cache_vmap() is used when creating mappings (eg, via vmap,
diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index a58ccbb406ad..6525ac82bd50 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -207,8 +207,9 @@ static inline void __sync_icache_dcache(pte_t pteval)
extern void __sync_icache_dcache(pte_t pteval);
#endif

-void set_pte_at(struct mm_struct *mm, unsigned long addr,
- pte_t *ptep, pte_t pteval);
+void set_ptes(struct mm_struct *mm, unsigned long addr,
+ pte_t *ptep, pte_t pteval, unsigned int nr);
+#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)

static inline pte_t clear_pte_bit(pte_t pte, pgprot_t prot)
{
diff --git a/arch/arm/include/asm/tlbflush.h b/arch/arm/include/asm/tlbflush.h
index 0ccc985b90af..7d792e485f4f 100644
--- a/arch/arm/include/asm/tlbflush.h
+++ b/arch/arm/include/asm/tlbflush.h
@@ -619,18 +619,21 @@ extern void flush_bp_all(void);
* If PG_dcache_clean is not set for the page, we need to ensure that any
* cache entries for the kernels virtual memory range are written
* back to the page. On ARMv6 and later, the cache coherency is handled via
- * the set_pte_at() function.
+ * the set_ptes() function.
*/
#if __LINUX_ARM_ARCH__ < 6
-extern void update_mmu_cache(struct vm_area_struct *vma, unsigned long addr,
- pte_t *ptep);
+void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long addr,
+ pte_t *ptep, unsigned int nr);
#else
-static inline void update_mmu_cache(struct vm_area_struct *vma,
- unsigned long addr, pte_t *ptep)
+static inline void update_mmu_cache_range(struct vm_area_struct *vma,
+ unsigned long addr, pte_t *ptep, unsigned int nr)
{
}
#endif

+#define update_mmu_cache(vma, addr, ptep) \
+ update_mmu_cache_range(vma, addr, ptep, 1)
+
#define update_mmu_cache_pmd(vma, address, pmd) do { } while (0)

#endif
diff --git a/arch/arm/mm/copypage-v4mc.c b/arch/arm/mm/copypage-v4mc.c
index f1da3b439b96..7ddd82b9fe8b 100644
--- a/arch/arm/mm/copypage-v4mc.c
+++ b/arch/arm/mm/copypage-v4mc.c
@@ -64,10 +64,11 @@ static void mc_copy_user_page(void *from, void *to)
void v4_mc_copy_user_highpage(struct page *to, struct page *from,
unsigned long vaddr, struct vm_area_struct *vma)
{
+ struct folio *src = page_folio(from);
void *kto = kmap_atomic(to);

- if (!test_and_set_bit(PG_dcache_clean, &from->flags))
- __flush_dcache_page(page_mapping_file(from), from);
+ if (!test_and_set_bit(PG_dcache_clean, &src->flags))
+ __flush_dcache_folio(folio_flush_mapping(src), src);

raw_spin_lock(&minicache_lock);

diff --git a/arch/arm/mm/copypage-v6.c b/arch/arm/mm/copypage-v6.c
index d8a115de5507..a1a71f36d850 100644
--- a/arch/arm/mm/copypage-v6.c
+++ b/arch/arm/mm/copypage-v6.c
@@ -69,11 +69,12 @@ static void discard_old_kernel_data(void *kto)
static void v6_copy_user_highpage_aliasing(struct page *to,
struct page *from, unsigned long vaddr, struct vm_area_struct *vma)
{
+ struct folio *src = page_folio(from);
unsigned int offset = CACHE_COLOUR(vaddr);
unsigned long kfrom, kto;

- if (!test_and_set_bit(PG_dcache_clean, &from->flags))
- __flush_dcache_page(page_mapping_file(from), from);
+ if (!test_and_set_bit(PG_dcache_clean, &src->flags))
+ __flush_dcache_folio(folio_flush_mapping(src), src);

/* FIXME: not highmem safe */
discard_old_kernel_data(page_address(to));
diff --git a/arch/arm/mm/copypage-xscale.c b/arch/arm/mm/copypage-xscale.c
index bcb485620a05..f1e29d3e8193 100644
--- a/arch/arm/mm/copypage-xscale.c
+++ b/arch/arm/mm/copypage-xscale.c
@@ -84,10 +84,11 @@ static void mc_copy_user_page(void *from, void *to)
void xscale_mc_copy_user_highpage(struct page *to, struct page *from,
unsigned long vaddr, struct vm_area_struct *vma)
{
+ struct folio *src = page_folio(from);
void *kto = kmap_atomic(to);

- if (!test_and_set_bit(PG_dcache_clean, &from->flags))
- __flush_dcache_page(page_mapping_file(from), from);
+ if (!test_and_set_bit(PG_dcache_clean, &src->flags))
+ __flush_dcache_folio(folio_flush_mapping(src), src);

raw_spin_lock(&minicache_lock);

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 8bc01071474a..5ecfde41d70a 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -693,6 +693,7 @@ static void __dma_page_cpu_to_dev(struct page *page, unsigned long off,
static void __dma_page_dev_to_cpu(struct page *page, unsigned long off,
size_t size, enum dma_data_direction dir)
{
+ struct folio *folio = page_folio(page);
phys_addr_t paddr = page_to_phys(page) + off;

/* FIXME: non-speculating: not required */
@@ -707,19 +708,18 @@ static void __dma_page_dev_to_cpu(struct page *page, unsigned long off,
* Mark the D-cache clean for these pages to avoid extra flushing.
*/
if (dir != DMA_TO_DEVICE && size >= PAGE_SIZE) {
- unsigned long pfn;
- size_t left = size;
-
- pfn = page_to_pfn(page) + off / PAGE_SIZE;
- off %= PAGE_SIZE;
- if (off) {
- pfn++;
- left -= PAGE_SIZE - off;
+ ssize_t left = size;
+ size_t offset = offset_in_folio(folio, paddr);
+
+ if (offset) {
+ left -= folio_size(folio) - offset;
+ folio = folio_next(folio);
}
- while (left >= PAGE_SIZE) {
- page = pfn_to_page(pfn++);
- set_bit(PG_dcache_clean, &page->flags);
- left -= PAGE_SIZE;
+
+ while (left >= (ssize_t)folio_size(folio)) {
+ set_bit(PG_dcache_clean, &folio->flags);
+ left -= folio_size(folio);
+ folio = folio_next(folio);
}
}
}
diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c
index 0e49154454a6..e2c869b8f012 100644
--- a/arch/arm/mm/fault-armv.c
+++ b/arch/arm/mm/fault-armv.c
@@ -178,8 +178,8 @@ make_coherent(struct address_space *mapping, struct vm_area_struct *vma,
*
* Note that the pte lock will be held.
*/
-void update_mmu_cache(struct vm_area_struct *vma, unsigned long addr,
- pte_t *ptep)
+void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long addr,
+ pte_t *ptep, unsigned int nr)
{
unsigned long pfn = pte_pfn(*ptep);
struct address_space *mapping;
@@ -192,13 +192,13 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned long addr,
* The zero page is never written to, so never has any dirty
* cache lines, and therefore never needs to be flushed.
*/
- page = pfn_to_page(pfn);
- if (page == ZERO_PAGE(0))
+ if (is_zero_pfn(pfn))
return;

- mapping = page_mapping_file(page);
- if (!test_and_set_bit(PG_dcache_clean, &page->flags))
- __flush_dcache_page(mapping, page);
+ folio = page_folio(pfn_to_page(pfn));
+	mapping = folio_flush_mapping(folio);
+ if (!test_and_set_bit(PG_dcache_clean, &folio->flags))
+ __flush_dcache_folio(mapping, folio);
if (mapping) {
if (cache_is_vivt())
make_coherent(mapping, vma, addr, ptep, pfn);
diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c
index 7ff9feea13a6..07ea0ab51099 100644
--- a/arch/arm/mm/flush.c
+++ b/arch/arm/mm/flush.c
@@ -95,10 +95,10 @@ void flush_cache_range(struct vm_area_struct *vma, unsigned long start, unsigned
__flush_icache_all();
}

-void flush_cache_page(struct vm_area_struct *vma, unsigned long user_addr, unsigned long pfn)
+void flush_cache_pages(struct vm_area_struct *vma, unsigned long user_addr, unsigned long pfn, unsigned int nr)
{
if (cache_is_vivt()) {
- vivt_flush_cache_page(vma, user_addr, pfn);
+ vivt_flush_cache_pages(vma, user_addr, pfn, nr);
return;
}

@@ -196,29 +196,31 @@ void copy_to_user_page(struct vm_area_struct *vma, struct page *page,
#endif
}

-void __flush_dcache_page(struct address_space *mapping, struct page *page)
+void __flush_dcache_folio(struct address_space *mapping, struct folio *folio)
{
/*
* Writeback any data associated with the kernel mapping of this
* page. This ensures that data in the physical page is mutually
* coherent with the kernels mapping.
*/
- if (!PageHighMem(page)) {
- __cpuc_flush_dcache_area(page_address(page), page_size(page));
+ if (!folio_test_highmem(folio)) {
+ __cpuc_flush_dcache_area(folio_address(folio),
+ folio_size(folio));
} else {
unsigned long i;
if (cache_is_vipt_nonaliasing()) {
- for (i = 0; i < compound_nr(page); i++) {
- void *addr = kmap_atomic(page + i);
+ for (i = 0; i < folio_nr_pages(folio); i++) {
+ void *addr = kmap_local_folio(folio,
+ i * PAGE_SIZE);
__cpuc_flush_dcache_area(addr, PAGE_SIZE);
- kunmap_atomic(addr);
+ kunmap_local(addr);
}
} else {
- for (i = 0; i < compound_nr(page); i++) {
- void *addr = kmap_high_get(page + i);
+ for (i = 0; i < folio_nr_pages(folio); i++) {
+ void *addr = kmap_high_get(folio_page(folio, i));
if (addr) {
__cpuc_flush_dcache_area(addr, PAGE_SIZE);
- kunmap_high(page + i);
+ kunmap_high(folio_page(folio, i));
}
}
}
@@ -230,15 +232,14 @@ void __flush_dcache_page(struct address_space *mapping, struct page *page)
* userspace colour, which is congruent with page->index.
*/
if (mapping && cache_is_vipt_aliasing())
- flush_pfn_alias(page_to_pfn(page),
- page->index << PAGE_SHIFT);
+ flush_pfn_alias(folio_pfn(folio), folio_pos(folio));
}

-static void __flush_dcache_aliases(struct address_space *mapping, struct page *page)
+static void __flush_dcache_aliases(struct address_space *mapping, struct folio *folio)
{
struct mm_struct *mm = current->active_mm;
- struct vm_area_struct *mpnt;
- pgoff_t pgoff;
+ struct vm_area_struct *vma;
+ pgoff_t pgoff, pgoff_end;

/*
* There are possible user space mappings of this page:
@@ -246,21 +247,36 @@ static void __flush_dcache_aliases(struct address_space *mapping, struct page *p
* data in the current VM view associated with this page.
* - aliasing VIPT: we only need to find one mapping of this page.
*/
- pgoff = page->index;
+ pgoff = folio->index;
+ pgoff_end = pgoff + folio_nr_pages(folio) - 1;

flush_dcache_mmap_lock(mapping);
- vma_interval_tree_foreach(mpnt, &mapping->i_mmap, pgoff, pgoff) {
- unsigned long offset;
+ vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff_end) {
+ unsigned long start, offset, pfn;
+ unsigned int nr;

/*
* If this VMA is not in our MM, we can ignore it.
*/
- if (mpnt->vm_mm != mm)
+ if (vma->vm_mm != mm)
continue;
- if (!(mpnt->vm_flags & VM_MAYSHARE))
+ if (!(vma->vm_flags & VM_MAYSHARE))
continue;
- offset = (pgoff - mpnt->vm_pgoff) << PAGE_SHIFT;
- flush_cache_page(mpnt, mpnt->vm_start + offset, page_to_pfn(page));
+
+ start = vma->vm_start;
+ pfn = folio_pfn(folio);
+ nr = folio_nr_pages(folio);
+ offset = pgoff - vma->vm_pgoff;
+ if (offset > -nr) {
+ pfn -= offset;
+ nr += offset;
+ } else {
+ start += offset * PAGE_SIZE;
+ }
+ if (start + nr * PAGE_SIZE > vma->vm_end)
+ nr = (vma->vm_end - start) / PAGE_SIZE;
+
+ flush_cache_pages(vma, start, pfn, nr);
}
flush_dcache_mmap_unlock(mapping);
}
@@ -269,7 +285,7 @@ static void __flush_dcache_aliases(struct address_space *mapping, struct page *p
void __sync_icache_dcache(pte_t pteval)
{
unsigned long pfn;
- struct page *page;
+ struct folio *folio;
struct address_space *mapping;

if (cache_is_vipt_nonaliasing() && !pte_exec(pteval))
@@ -279,14 +295,14 @@ void __sync_icache_dcache(pte_t pteval)
if (!pfn_valid(pfn))
return;

- page = pfn_to_page(pfn);
+ folio = page_folio(pfn_to_page(pfn));
if (cache_is_vipt_aliasing())
- mapping = page_mapping_file(page);
+ mapping = folio_flush_mapping(folio);
else
mapping = NULL;

- if (!test_and_set_bit(PG_dcache_clean, &page->flags))
- __flush_dcache_page(mapping, page);
+ if (!test_and_set_bit(PG_dcache_clean, &folio->flags))
+ __flush_dcache_folio(mapping, folio);

if (pte_exec(pteval))
__flush_icache_all();
@@ -312,7 +328,7 @@ void __sync_icache_dcache(pte_t pteval)
* Note that we disable the lazy flush for SMP configurations where
* the cache maintenance operations are not automatically broadcasted.
*/
-void flush_dcache_page(struct page *page)
+void flush_dcache_folio(struct folio *folio)
{
struct address_space *mapping;

@@ -320,31 +336,36 @@ void flush_dcache_page(struct page *page)
* The zero page is never written to, so never has any dirty
* cache lines, and therefore never needs to be flushed.
*/
- if (page == ZERO_PAGE(0))
+ if (is_zero_pfn(folio_pfn(folio)))
return;

if (!cache_ops_need_broadcast() && cache_is_vipt_nonaliasing()) {
- if (test_bit(PG_dcache_clean, &page->flags))
- clear_bit(PG_dcache_clean, &page->flags);
+ if (test_bit(PG_dcache_clean, &folio->flags))
+ clear_bit(PG_dcache_clean, &folio->flags);
return;
}

- mapping = page_mapping_file(page);
+ mapping = folio_flush_mapping(folio);

if (!cache_ops_need_broadcast() &&
- mapping && !page_mapcount(page))
- clear_bit(PG_dcache_clean, &page->flags);
+ mapping && !folio_mapped(folio))
+ clear_bit(PG_dcache_clean, &folio->flags);
else {
- __flush_dcache_page(mapping, page);
+ __flush_dcache_folio(mapping, folio);
if (mapping && cache_is_vivt())
- __flush_dcache_aliases(mapping, page);
+ __flush_dcache_aliases(mapping, folio);
else if (mapping)
__flush_icache_all();
- set_bit(PG_dcache_clean, &page->flags);
+ set_bit(PG_dcache_clean, &folio->flags);
}
}
-EXPORT_SYMBOL(flush_dcache_page);
+EXPORT_SYMBOL(flush_dcache_folio);

+void flush_dcache_page(struct page *page)
+{
+ flush_dcache_folio(page_folio(page));
+}
+EXPORT_SYMBOL(flush_dcache_page);
/*
* Flush an anonymous page so that users of get_user_pages()
* can safely access the data. The expected sequence is:
diff --git a/arch/arm/mm/mm.h b/arch/arm/mm/mm.h
index d7ffccb7fea7..419316316711 100644
--- a/arch/arm/mm/mm.h
+++ b/arch/arm/mm/mm.h
@@ -45,7 +45,7 @@ struct mem_type {

const struct mem_type *get_mem_type(unsigned int type);

-extern void __flush_dcache_page(struct address_space *mapping, struct page *page);
+void __flush_dcache_folio(struct address_space *mapping, struct folio *folio);

/*
* ARM specific vm_struct->flags bits.
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index 463fc2a8448f..9947bbc32b04 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -1788,7 +1788,7 @@ void __init paging_init(const struct machine_desc *mdesc)
bootmem_init();

empty_zero_page = virt_to_page(zero_page);
- __flush_dcache_page(NULL, empty_zero_page);
+ __flush_dcache_folio(NULL, page_folio(empty_zero_page));
}

void __init early_mm_init(const struct machine_desc *mdesc)
@@ -1797,8 +1797,8 @@ void __init early_mm_init(const struct machine_desc *mdesc)
early_paging_init(mdesc);
}

-void set_pte_at(struct mm_struct *mm, unsigned long addr,
- pte_t *ptep, pte_t pteval)
+void set_ptes(struct mm_struct *mm, unsigned long addr,
+ pte_t *ptep, pte_t pteval, unsigned int nr)
{
unsigned long ext = 0;

@@ -1808,5 +1808,11 @@ void set_pte_at(struct mm_struct *mm, unsigned long addr,
ext |= PTE_EXT_NG;
}

- set_pte_ext(ptep, pteval, ext);
+ for (;;) {
+ set_pte_ext(ptep, pteval, ext);
+ if (--nr == 0)
+ break;
+ ptep++;
+ pte_val(pteval) += PAGE_SIZE;
+ }
}
--
2.39.1


2023-02-27 17:59:19

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH v2 12/30] m68k: Implement the new page table range API

Add set_ptes(), update_mmu_cache_range(), flush_icache_pages() and
flush_dcache_folio().

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: [email protected]
---
arch/m68k/include/asm/cacheflush_mm.h | 26 +++++++++++++++++---------
arch/m68k/include/asm/pgtable_mm.h | 21 ++++++++++++++++++---
arch/m68k/mm/motorola.c | 2 +-
3 files changed, 36 insertions(+), 13 deletions(-)

diff --git a/arch/m68k/include/asm/cacheflush_mm.h b/arch/m68k/include/asm/cacheflush_mm.h
index 1ac55e7b47f0..d43c8bce149b 100644
--- a/arch/m68k/include/asm/cacheflush_mm.h
+++ b/arch/m68k/include/asm/cacheflush_mm.h
@@ -220,24 +220,28 @@ static inline void flush_cache_page(struct vm_area_struct *vma, unsigned long vm

/* Push the page at kernel virtual address and clear the icache */
/* RZ: use cpush %bc instead of cpush %dc, cinv %ic */
-static inline void __flush_page_to_ram(void *vaddr)
+static inline void __flush_pages_to_ram(void *vaddr, unsigned int nr)
{
if (CPU_IS_COLDFIRE) {
unsigned long addr, start, end;
addr = ((unsigned long) vaddr) & ~(PAGE_SIZE - 1);
start = addr & ICACHE_SET_MASK;
- end = (addr + PAGE_SIZE - 1) & ICACHE_SET_MASK;
+ end = (addr + nr * PAGE_SIZE - 1) & ICACHE_SET_MASK;
if (start > end) {
flush_cf_bcache(0, end);
end = ICACHE_MAX_ADDR;
}
flush_cf_bcache(start, end);
} else if (CPU_IS_040_OR_060) {
- __asm__ __volatile__("nop\n\t"
- ".chip 68040\n\t"
- "cpushp %%bc,(%0)\n\t"
- ".chip 68k"
- : : "a" (__pa(vaddr)));
+ unsigned long paddr = __pa(vaddr);
+
+ while (nr--) {
+ __asm__ __volatile__("nop\n\t"
+ ".chip 68040\n\t"
+ "cpushp %%bc,(%0)\n\t"
+ ".chip 68k"
+ : : "a" (paddr + nr * PAGE_SIZE));
+ }
} else {
unsigned long _tmp;
__asm__ __volatile__("movec %%cacr,%0\n\t"
@@ -249,10 +253,14 @@ static inline void __flush_page_to_ram(void *vaddr)
}

#define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
-#define flush_dcache_page(page) __flush_page_to_ram(page_address(page))
+#define flush_dcache_page(page) __flush_pages_to_ram(page_address(page), 1)
+#define flush_dcache_folio(folio) \
+ __flush_pages_to_ram(folio_address(folio), folio_nr_pages(folio))
#define flush_dcache_mmap_lock(mapping) do { } while (0)
#define flush_dcache_mmap_unlock(mapping) do { } while (0)
-#define flush_icache_page(vma, page) __flush_page_to_ram(page_address(page))
+#define flush_icache_pages(vma, page, nr) \
+ __flush_pages_to_ram(page_address(page), nr)
+#define flush_icache_page(vma, page) flush_icache_pages(vma, page, 1)

extern void flush_icache_user_page(struct vm_area_struct *vma, struct page *page,
unsigned long addr, int len);
diff --git a/arch/m68k/include/asm/pgtable_mm.h b/arch/m68k/include/asm/pgtable_mm.h
index b93c41fe2067..400206c17c97 100644
--- a/arch/m68k/include/asm/pgtable_mm.h
+++ b/arch/m68k/include/asm/pgtable_mm.h
@@ -31,8 +31,20 @@
do{ \
*(pteptr) = (pteval); \
} while(0)
-#define set_pte_at(mm,addr,ptep,pteval) set_pte(ptep,pteval)

+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+ pte_t *ptep, pte_t pte, unsigned int nr)
+{
+ for (;;) {
+ set_pte(ptep, pte);
+ if (--nr == 0)
+ break;
+ ptep++;
+ pte_val(pte) += PAGE_SIZE;
+ }
+}
+
+#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)

/* PMD_SHIFT determines the size of the area a second-level page table can map */
#if CONFIG_PGTABLE_LEVELS == 3
@@ -138,11 +150,14 @@ extern void kernel_set_cachemode(void *addr, unsigned long size, int cmode);
* tables contain all the necessary information. The Sun3 does, but
* they are updated on demand.
*/
-static inline void update_mmu_cache(struct vm_area_struct *vma,
- unsigned long address, pte_t *ptep)
+static inline void update_mmu_cache_range(struct vm_area_struct *vma,
+ unsigned long address, pte_t *ptep, unsigned int nr)
{
}

+#define update_mmu_cache(vma, addr, ptep) \
+ update_mmu_cache_range(vma, addr, ptep, 1)
+
#endif /* !__ASSEMBLY__ */

/* MMU-specific headers */
diff --git a/arch/m68k/mm/motorola.c b/arch/m68k/mm/motorola.c
index 2a375637e007..7784d0fcdf6e 100644
--- a/arch/m68k/mm/motorola.c
+++ b/arch/m68k/mm/motorola.c
@@ -81,7 +81,7 @@ static inline void cache_page(void *vaddr)

void mmu_page_ctor(void *page)
{
- __flush_page_to_ram(page);
+ __flush_pages_to_ram(page, 1);
flush_tlb_kernel_page(page);
nocache_page(page);
}
--
2.39.1


2023-02-27 17:59:21

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH v2 30/30] filemap: Batch PTE mappings

From: Yin Fengwei <[email protected]>

Call set_pte_range() once per contiguous range of the folio instead
of once per page. This batches the updates to mm counters and the
rmap.
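
To see the batching in isolation, here is a tiny stand-alone model of
the same idea (ordinary user-space C, not kernel code; the boolean
array and set_range() are stand-ins for the PTE array and
set_pte_range()): emit one ranged call per contiguous run of mappable
slots instead of one call per slot.

	#include <stdbool.h>
	#include <stdio.h>

	/* Stand-in for set_pte_range(): map "nr" slots starting at "first". */
	static void set_range(unsigned int first, unsigned int nr)
	{
		printf("map slots [%u, %u)\n", first, first + nr);
	}

	int main(void)
	{
		/* true: slot can be mapped; false: skip it (a poisoned page or
		 * an already-populated PTE in the real code). */
		bool ok[] = { true, true, false, true, true, true, false, true };
		unsigned int n = sizeof(ok) / sizeof(ok[0]);
		unsigned int i, start = 0, count = 0;

		for (i = 0; i < n; i++) {
			if (ok[i]) {
				if (!count)
					start = i;
				count++;
				continue;
			}
			if (count)
				set_range(start, count);
			count = 0;
		}
		if (count)
			set_range(start, count);
		return 0;
	}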

With a will-it-scale.page_fault3-like app (file write fault testing
changed to read fault testing; upstreaming it to will-it-scale is in
progress at [1]), this gives a 15% performance gain on a 48C/96T
Cascade Lake test box with 96 processes running against xfs.

Perf data collected before/after the change:
18.73%--page_add_file_rmap
|
--11.60%--__mod_lruvec_page_state
|
|--7.40%--__mod_memcg_lruvec_state
| |
| --5.58%--cgroup_rstat_updated
|
--2.53%--__mod_lruvec_state
|
--1.48%--__mod_node_page_state

9.93%--page_add_file_rmap_range
|
--2.67%--__mod_lruvec_page_state
|
|--1.95%--__mod_memcg_lruvec_state
| |
| --1.57%--cgroup_rstat_updated
|
--0.61%--__mod_lruvec_state
|
--0.54%--__mod_node_page_state

The running time of __mod_lruvec_page_state() is reduced by about 9%.

[1]: https://github.com/antonblanchard/will-it-scale/pull/37

Signed-off-by: Yin Fengwei <[email protected]>
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
mm/filemap.c | 36 +++++++++++++++++++++++++-----------
1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 07ebd90967a3..40be33b5ee46 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3486,11 +3486,12 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
struct file *file = vma->vm_file;
struct page *page = folio_page(folio, start);
unsigned int mmap_miss = READ_ONCE(file->f_ra.mmap_miss);
- unsigned int ref_count = 0, count = 0;
+ unsigned int count = 0;
+ pte_t *old_ptep = vmf->pte;

do {
- if (PageHWPoison(page))
- continue;
+ if (PageHWPoison(page + count))
+ goto skip;

if (mmap_miss > 0)
mmap_miss--;
@@ -3500,20 +3501,33 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
* handled in the specific fault path, and it'll prohibit the
* fault-around logic.
*/
- if (!pte_none(*vmf->pte))
- continue;
+ if (!pte_none(vmf->pte[count]))
+ goto skip;

if (vmf->address == addr)
ret = VM_FAULT_NOPAGE;

- ref_count++;
- set_pte_range(vmf, folio, page, 1, addr);
- } while (vmf->pte++, page++, addr += PAGE_SIZE, ++count < nr_pages);
+ count++;
+ continue;
+skip:
+ if (count) {
+ set_pte_range(vmf, folio, page, count, addr);
+ folio_ref_add(folio, count);
+ }

- /* Restore the vmf->pte */
- vmf->pte -= nr_pages;
+ count++;
+ page += count;
+ vmf->pte += count;
+ addr += count * PAGE_SIZE;
+ count = 0;
+ } while (--nr_pages > 0);
+
+ if (count) {
+ set_pte_range(vmf, folio, page, count, addr);
+ folio_ref_add(folio, count);
+ }

- folio_ref_add(folio, ref_count);
+ vmf->pte = old_ptep;
WRITE_ONCE(file->f_ra.mmap_miss, mmap_miss);

return ret;
--
2.39.1


2023-02-27 17:59:24

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH v2 03/30] mm: Add folio_flush_mapping()

This is the folio equivalent of page_mapping_file(), but renamed
to make it clear that it's very different from page_file_mapping().
Theoretically, there's nothing flush-only about it, but there are no
other users today, and I doubt there will be; it's almost always more
useful to know the swapfile's mapping or the swapcache's mapping.
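
For illustration, this is the shape in which the architecture
conversions later in this series use the new helper (a sketch modelled
on the csky patch; example_flush_dcache_folio() is a made-up name, and
dcache_wbinv_all()/icache_inv_all() are that architecture's flush
primitives):

	static void example_flush_dcache_folio(struct folio *folio)
	{
		struct address_space *mapping = folio_flush_mapping(folio);

		if (mapping && !folio_mapped(folio)) {
			/* Page-cache folio with no user mappings: defer. */
			clear_bit(PG_dcache_clean, &folio->flags);
		} else {
			dcache_wbinv_all();
			if (mapping)
				icache_inv_all();
			set_bit(PG_dcache_clean, &folio->flags);
		}
	}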

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
include/linux/pagemap.h | 26 +++++++++++++++++++++-----
1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 51b75b89730e..647c5a036a97 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -369,6 +369,26 @@ static inline struct address_space *folio_file_mapping(struct folio *folio)
return folio->mapping;
}

+/**
+ * folio_flush_mapping - Find the file mapping this folio belongs to.
+ * @folio: The folio.
+ *
+ * For folios which are in the page cache, return the mapping that this
+ * page belongs to. Anonymous folios return NULL, even if they're in
+ * the swap cache. Other kinds of folio also return NULL.
+ *
+ * This is ONLY used by architecture cache flushing code. If you aren't
+ * writing cache flushing code, you want either folio_mapping() or
+ * folio_file_mapping().
+ */
+static inline struct address_space *folio_flush_mapping(struct folio *folio)
+{
+ if (unlikely(folio_test_swapcache(folio)))
+ return NULL;
+
+ return folio_mapping(folio);
+}
+
static inline struct address_space *page_file_mapping(struct page *page)
{
return folio_file_mapping(page_folio(page));
@@ -379,11 +399,7 @@ static inline struct address_space *page_file_mapping(struct page *page)
*/
static inline struct address_space *page_mapping_file(struct page *page)
{
- struct folio *folio = page_folio(page);
-
- if (unlikely(folio_test_swapcache(folio)))
- return NULL;
- return folio_mapping(folio);
+ return folio_flush_mapping(page_folio(page));
}

/**
--
2.39.1


2023-02-27 17:59:29

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH v2 02/30] mm: Add generic flush_icache_pages() and documentation

flush_icache_page() is deprecated but not yet removed, so add
a range version of it. Change the documentation to refer to
update_mmu_cache_range() instead of update_mmu_cache().
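
For reference, the per-architecture patches in this series then retire
the single-page interface by turning it into a one-line wrapper around
the range version, along these lines (sketch of the pattern used by
e.g. the m68k and parisc conversions):

	void flush_icache_pages(struct vm_area_struct *vma, struct page *page,
			unsigned int nr);
	#define flush_icache_page(vma, page)	flush_icache_pages(vma, page, 1)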

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
Documentation/core-api/cachetlb.rst | 35 +++++++++++++++--------------
include/asm-generic/cacheflush.h | 5 +++++
2 files changed, 23 insertions(+), 17 deletions(-)

diff --git a/Documentation/core-api/cachetlb.rst b/Documentation/core-api/cachetlb.rst
index 5c0552e78c58..d4c9e2a28d36 100644
--- a/Documentation/core-api/cachetlb.rst
+++ b/Documentation/core-api/cachetlb.rst
@@ -88,13 +88,13 @@ changes occur:

This is used primarily during fault processing.

-5) ``void update_mmu_cache(struct vm_area_struct *vma,
- unsigned long address, pte_t *ptep)``
+5) ``void update_mmu_cache_range(struct vm_area_struct *vma,
+ unsigned long address, pte_t *ptep, unsigned int nr)``

- At the end of every page fault, this routine is invoked to
- tell the architecture specific code that a translation
- now exists at virtual address "address" for address space
- "vma->vm_mm", in the software page tables.
+ At the end of every page fault, this routine is invoked to tell
+ the architecture specific code that translations now exist
+ in the software page tables for address space "vma->vm_mm"
+ at virtual address "address" for "nr" consecutive pages.

A port may use this information in any way it so chooses.
For example, it could use this event to pre-load TLB
@@ -306,17 +306,18 @@ maps this page at its virtual address.
private". The kernel guarantees that, for pagecache pages, it will
clear this bit when such a page first enters the pagecache.

- This allows these interfaces to be implemented much more efficiently.
- It allows one to "defer" (perhaps indefinitely) the actual flush if
- there are currently no user processes mapping this page. See sparc64's
- flush_dcache_page and update_mmu_cache implementations for an example
- of how to go about doing this.
+ This allows these interfaces to be implemented much more
+ efficiently. It allows one to "defer" (perhaps indefinitely) the
+ actual flush if there are currently no user processes mapping this
+ page. See sparc64's flush_dcache_page and update_mmu_cache_range
+ implementations for an example of how to go about doing this.

- The idea is, first at flush_dcache_page() time, if page_file_mapping()
- returns a mapping, and mapping_mapped on that mapping returns %false,
- just mark the architecture private page flag bit. Later, in
- update_mmu_cache(), a check is made of this flag bit, and if set the
- flush is done and the flag bit is cleared.
+ The idea is, first at flush_dcache_page() time, if
+ page_file_mapping() returns a mapping, and mapping_mapped on that
+ mapping returns %false, just mark the architecture private page
+ flag bit. Later, in update_mmu_cache_range(), a check is made
+ of this flag bit, and if set the flush is done and the flag bit
+ is cleared.

.. important::

@@ -369,7 +370,7 @@ maps this page at its virtual address.
``void flush_icache_page(struct vm_area_struct *vma, struct page *page)``

All the functionality of flush_icache_page can be implemented in
- flush_dcache_page and update_mmu_cache. In the future, the hope
+ flush_dcache_page and update_mmu_cache_range. In the future, the hope
is to remove this interface completely.

The final category of APIs is for I/O to deliberately aliased address
diff --git a/include/asm-generic/cacheflush.h b/include/asm-generic/cacheflush.h
index f46258d1a080..09d51a680765 100644
--- a/include/asm-generic/cacheflush.h
+++ b/include/asm-generic/cacheflush.h
@@ -78,6 +78,11 @@ static inline void flush_icache_range(unsigned long start, unsigned long end)
#endif

#ifndef flush_icache_page
+static inline void flush_icache_pages(struct vm_area_struct *vma,
+ struct page *page, unsigned int nr)
+{
+}
+
static inline void flush_icache_page(struct vm_area_struct *vma,
struct page *page)
{
--
2.39.1


2023-02-27 17:59:31

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH v2 28/30] rmap: add folio_add_file_rmap_range()

From: Yin Fengwei <[email protected]>

folio_add_file_rmap_range() allows adding a pte mapping to a specific
range of a file folio. Compared with page_add_file_rmap(), it batches
the __lruvec_stat updates for large folios.
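
An illustrative call site (a sketch only, not taken from this patch;
the real caller is added elsewhere in this series):

	/* Add @nr pages of @folio, starting at @page, to @vma's file rmap
	 * in a single call; the mapcount and lruvec statistics are updated
	 * once for the whole range instead of once per page. */
	folio_add_file_rmap_range(folio, page, nr, vma, false);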

Signed-off-by: Yin Fengwei <[email protected]>
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
include/linux/rmap.h | 2 ++
mm/rmap.c | 60 +++++++++++++++++++++++++++++++++-----------
2 files changed, 48 insertions(+), 14 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index b87d01660412..a3825ce81102 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -198,6 +198,8 @@ void folio_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
unsigned long address);
void page_add_file_rmap(struct page *, struct vm_area_struct *,
bool compound);
+void folio_add_file_rmap_range(struct folio *, struct page *, unsigned int nr,
+ struct vm_area_struct *, bool compound);
void page_remove_rmap(struct page *, struct vm_area_struct *,
bool compound);

diff --git a/mm/rmap.c b/mm/rmap.c
index bacdb795d5ee..fffdb85a3b3d 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1303,31 +1303,39 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
}

/**
- * page_add_file_rmap - add pte mapping to a file page
- * @page: the page to add the mapping to
+ * folio_add_file_rmap_range - add pte mapping to page range of a folio
+ * @folio: The folio to add the mapping to
+ * @page: The first page to add
+ * @nr_pages: The number of pages which will be mapped
* @vma: the vm area in which the mapping is added
* @compound: charge the page as compound or small page
*
+ * The page range of folio is defined by [first_page, first_page + nr_pages)
+ *
* The caller needs to hold the pte lock.
*/
-void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
- bool compound)
+void folio_add_file_rmap_range(struct folio *folio, struct page *page,
+ unsigned int nr_pages, struct vm_area_struct *vma,
+ bool compound)
{
- struct folio *folio = page_folio(page);
atomic_t *mapped = &folio->_nr_pages_mapped;
- int nr = 0, nr_pmdmapped = 0;
- bool first;
+ unsigned int nr_pmdmapped = 0, first;
+ int nr = 0;

- VM_BUG_ON_PAGE(compound && !PageTransHuge(page), page);
+ VM_WARN_ON_FOLIO(compound && !folio_test_pmd_mappable(folio), folio);

/* Is page being mapped by PTE? Is this its first map to be added? */
if (likely(!compound)) {
- first = atomic_inc_and_test(&page->_mapcount);
- nr = first;
- if (first && folio_test_large(folio)) {
- nr = atomic_inc_return_relaxed(mapped);
- nr = (nr < COMPOUND_MAPPED);
- }
+ do {
+ first = atomic_inc_and_test(&page->_mapcount);
+ if (first && folio_test_large(folio)) {
+ first = atomic_inc_return_relaxed(mapped);
+ first = (nr < COMPOUND_MAPPED);
+ }
+
+ if (first)
+ nr++;
+ } while (page++, --nr_pages > 0);
} else if (folio_test_pmd_mappable(folio)) {
/* That test is redundant: it's for safety or to optimize out */

@@ -1356,6 +1364,30 @@ void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
mlock_vma_folio(folio, vma, compound);
}

+/**
+ * page_add_file_rmap - add pte mapping to a file page
+ * @page: the page to add the mapping to
+ * @vma: the vm area in which the mapping is added
+ * @compound: charge the page as compound or small page
+ *
+ * The caller needs to hold the pte lock.
+ */
+void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
+ bool compound)
+{
+ struct folio *folio = page_folio(page);
+ unsigned int nr_pages;
+
+ VM_WARN_ON_ONCE_PAGE(compound && !PageTransHuge(page), page);
+
+ if (likely(!compound))
+ nr_pages = 1;
+ else
+ nr_pages = folio_nr_pages(folio);
+
+ folio_add_file_rmap_range(folio, page, nr_pages, vma, compound);
+}
+
/**
* page_remove_rmap - take down pte mapping from a page
* @page: page to remove mapping from
--
2.39.1


2023-02-27 19:45:47

by Christophe Leroy

[permalink] [raw]
Subject: Re: [PATCH v2 18/30] powerpc: Implement the new page table range API

Hi,

On 27/02/2023 at 18:57, Matthew Wilcox (Oracle) wrote:
> Add set_ptes(), update_mmu_cache_range() and flush_dcache_folio().
> Change the PG_arch_1 (aka PG_dcache_dirty) flag from being per-page to
> per-folio.
>
> I'm unsure about my merging of flush_dcache_icache_hugepage() and
> flush_dcache_icache_page() into flush_dcache_icache_folio() and subsequent
> removal of flush_dcache_icache_phys(). Please review.

Not sure why you want to remove flush_dcache_icache_phys().

Although that's only feasible when the address bus is not wider than 32
bits, and it cannot be done on BOOKE as you can't switch off the MMU on
BOOKE, flush_dcache_icache_phys() allows flushing unmapped pages without
having to map them first. So it is more efficient.

Christophe

>
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> Cc: Michael Ellerman <[email protected]>
> Cc: Nicholas Piggin <[email protected]>
> Cc: Christophe Leroy <[email protected]>
> Cc: [email protected]
> ---
> arch/powerpc/include/asm/book3s/pgtable.h | 10 +--
> arch/powerpc/include/asm/cacheflush.h | 14 ++--
> arch/powerpc/include/asm/kvm_ppc.h | 10 +--
> arch/powerpc/include/asm/nohash/pgtable.h | 13 ++--
> arch/powerpc/include/asm/pgtable.h | 6 ++
> arch/powerpc/mm/book3s64/hash_utils.c | 11 +--
> arch/powerpc/mm/cacheflush.c | 81 +++--------------------
> arch/powerpc/mm/nohash/e500_hugetlbpage.c | 3 +-
> arch/powerpc/mm/pgtable.c | 51 ++++++++------
> 9 files changed, 73 insertions(+), 126 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/book3s/pgtable.h b/arch/powerpc/include/asm/book3s/pgtable.h
> index d18b748ea3ae..c2ef811505b0 100644
> --- a/arch/powerpc/include/asm/book3s/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/pgtable.h
> @@ -9,13 +9,6 @@
> #endif
>
> #ifndef __ASSEMBLY__
> -/* Insert a PTE, top-level function is out of line. It uses an inline
> - * low level function in the respective pgtable-* files
> - */
> -extern void set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
> - pte_t pte);
> -
> -
> #define __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS
> extern int ptep_set_access_flags(struct vm_area_struct *vma, unsigned long address,
> pte_t *ptep, pte_t entry, int dirty);
> @@ -36,7 +29,8 @@ void __update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t
> * corresponding HPTE into the hash table ahead of time, instead of
> * waiting for the inevitable extra hash-table miss exception.
> */
> -static inline void update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t *ptep)
> +static inline void update_mmu_cache_range(struct vm_area_struct *vma,
> + unsigned long address, pte_t *ptep, unsigned int nr)
> {
> if (IS_ENABLED(CONFIG_PPC32) && !mmu_has_feature(MMU_FTR_HPTE_TABLE))
> return;
> diff --git a/arch/powerpc/include/asm/cacheflush.h b/arch/powerpc/include/asm/cacheflush.h
> index 7564dd4fd12b..ef7d2de33b89 100644
> --- a/arch/powerpc/include/asm/cacheflush.h
> +++ b/arch/powerpc/include/asm/cacheflush.h
> @@ -35,13 +35,19 @@ static inline void flush_cache_vmap(unsigned long start, unsigned long end)
> * It just marks the page as not i-cache clean. We do the i-cache
> * flush later when the page is given to a user process, if necessary.
> */
> -static inline void flush_dcache_page(struct page *page)
> +static inline void flush_dcache_folio(struct folio *folio)
> {
> if (cpu_has_feature(CPU_FTR_COHERENT_ICACHE))
> return;
> /* avoid an atomic op if possible */
> - if (test_bit(PG_dcache_clean, &page->flags))
> - clear_bit(PG_dcache_clean, &page->flags);
> + if (test_bit(PG_dcache_clean, &folio->flags))
> + clear_bit(PG_dcache_clean, &folio->flags);
> +}
> +#define flush_dcache_folio flush_dcache_folio
> +
> +static inline void flush_dcache_page(struct page *page)
> +{
> + flush_dcache_folio(page_folio(page));
> }
>
> void flush_icache_range(unsigned long start, unsigned long stop);
> @@ -51,7 +57,7 @@ void flush_icache_user_page(struct vm_area_struct *vma, struct page *page,
> unsigned long addr, int len);
> #define flush_icache_user_page flush_icache_user_page
>
> -void flush_dcache_icache_page(struct page *page);
> +void flush_dcache_icache_folio(struct folio *folio);
>
> /**
> * flush_dcache_range(): Write any modified data cache blocks out to memory and
> diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
> index 6bef23d6d0e3..e91dd8e88bb7 100644
> --- a/arch/powerpc/include/asm/kvm_ppc.h
> +++ b/arch/powerpc/include/asm/kvm_ppc.h
> @@ -868,7 +868,7 @@ void kvmppc_init_lpid(unsigned long nr_lpids);
>
> static inline void kvmppc_mmu_flush_icache(kvm_pfn_t pfn)
> {
> - struct page *page;
> + struct folio *folio;
> /*
> * We can only access pages that the kernel maps
> * as memory. Bail out for unmapped ones.
> @@ -877,10 +877,10 @@ static inline void kvmppc_mmu_flush_icache(kvm_pfn_t pfn)
> return;
>
> /* Clear i-cache for new pages */
> - page = pfn_to_page(pfn);
> - if (!test_bit(PG_dcache_clean, &page->flags)) {
> - flush_dcache_icache_page(page);
> - set_bit(PG_dcache_clean, &page->flags);
> + folio = page_folio(pfn_to_page(pfn));
> + if (!test_bit(PG_dcache_clean, &folio->flags)) {
> + flush_dcache_icache_folio(folio);
> + set_bit(PG_dcache_clean, &folio->flags);
> }
> }
>
> diff --git a/arch/powerpc/include/asm/nohash/pgtable.h b/arch/powerpc/include/asm/nohash/pgtable.h
> index a6caaaab6f92..69a7dd47a9f0 100644
> --- a/arch/powerpc/include/asm/nohash/pgtable.h
> +++ b/arch/powerpc/include/asm/nohash/pgtable.h
> @@ -166,12 +166,6 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte)
> return __pte(pte_val(pte) & ~_PAGE_SWP_EXCLUSIVE);
> }
>
> -/* Insert a PTE, top-level function is out of line. It uses an inline
> - * low level function in the respective pgtable-* files
> - */
> -extern void set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
> - pte_t pte);
> -
> /* This low level function performs the actual PTE insertion
> * Setting the PTE depends on the MMU type and other factors. It's
> * an horrible mess that I'm not going to try to clean up now but
> @@ -282,10 +276,11 @@ static inline int pud_huge(pud_t pud)
> * for the page which has just been mapped in.
> */
> #if defined(CONFIG_PPC_E500) && defined(CONFIG_HUGETLB_PAGE)
> -void update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t *ptep);
> +void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long address,
> + pte_t *ptep, unsigned int nr);
> #else
> -static inline
> -void update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t *ptep) {}
> +static inline void update_mmu_cache(struct vm_area_struct *vma,
> + unsigned long address, pte_t *ptep, unsigned int nr) {}
> #endif
>
> #endif /* __ASSEMBLY__ */
> diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
> index 9972626ddaf6..bf1263ff7e67 100644
> --- a/arch/powerpc/include/asm/pgtable.h
> +++ b/arch/powerpc/include/asm/pgtable.h
> @@ -41,6 +41,12 @@ struct mm_struct;
>
> #ifndef __ASSEMBLY__
>
> +void set_ptes(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
> + pte_t pte, unsigned int nr);
> +#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
> +#define update_mmu_cache(vma, addr, ptep) \
> + update_mmu_cache_range(vma, addr, ptep, 1);
> +
> #ifndef MAX_PTRS_PER_PGD
> #define MAX_PTRS_PER_PGD PTRS_PER_PGD
> #endif
> diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
> index fedffe3ae136..ad2afa08e62e 100644
> --- a/arch/powerpc/mm/book3s64/hash_utils.c
> +++ b/arch/powerpc/mm/book3s64/hash_utils.c
> @@ -1307,18 +1307,19 @@ void hash__early_init_mmu_secondary(void)
> */
> unsigned int hash_page_do_lazy_icache(unsigned int pp, pte_t pte, int trap)
> {
> - struct page *page;
> + struct folio *folio;
>
> if (!pfn_valid(pte_pfn(pte)))
> return pp;
>
> - page = pte_page(pte);
> + folio = page_folio(pte_page(pte));
>
> /* page is dirty */
> - if (!test_bit(PG_dcache_clean, &page->flags) && !PageReserved(page)) {
> + if (!test_bit(PG_dcache_clean, &folio->flags) &&
> + !folio_test_reserved(folio)) {
> if (trap == INTERRUPT_INST_STORAGE) {
> - flush_dcache_icache_page(page);
> - set_bit(PG_dcache_clean, &page->flags);
> + flush_dcache_icache_folio(folio);
> + set_bit(PG_dcache_clean, &folio->flags);
> } else
> pp |= HPTE_R_N;
> }
> diff --git a/arch/powerpc/mm/cacheflush.c b/arch/powerpc/mm/cacheflush.c
> index 0e9b4879c0f9..8ea6a096a664 100644
> --- a/arch/powerpc/mm/cacheflush.c
> +++ b/arch/powerpc/mm/cacheflush.c
> @@ -76,51 +76,6 @@ void flush_icache_range(unsigned long start, unsigned long stop)
> }
> EXPORT_SYMBOL(flush_icache_range);
>
> -#ifdef CONFIG_HIGHMEM
> -/**
> - * flush_dcache_icache_phys() - Flush a page by it's physical address
> - * @physaddr: the physical address of the page
> - */
> -static void flush_dcache_icache_phys(unsigned long physaddr)
> -{
> - unsigned long bytes = l1_dcache_bytes();
> - unsigned long nb = PAGE_SIZE / bytes;
> - unsigned long addr = physaddr & PAGE_MASK;
> - unsigned long msr, msr0;
> - unsigned long loop1 = addr, loop2 = addr;
> -
> - msr0 = mfmsr();
> - msr = msr0 & ~MSR_DR;
> - /*
> - * This must remain as ASM to prevent potential memory accesses
> - * while the data MMU is disabled
> - */
> - asm volatile(
> - " mtctr %2;\n"
> - " mtmsr %3;\n"
> - " isync;\n"
> - "0: dcbst 0, %0;\n"
> - " addi %0, %0, %4;\n"
> - " bdnz 0b;\n"
> - " sync;\n"
> - " mtctr %2;\n"
> - "1: icbi 0, %1;\n"
> - " addi %1, %1, %4;\n"
> - " bdnz 1b;\n"
> - " sync;\n"
> - " mtmsr %5;\n"
> - " isync;\n"
> - : "+&r" (loop1), "+&r" (loop2)
> - : "r" (nb), "r" (msr), "i" (bytes), "r" (msr0)
> - : "ctr", "memory");
> -}
> -NOKPROBE_SYMBOL(flush_dcache_icache_phys)
> -#else
> -static void flush_dcache_icache_phys(unsigned long physaddr)
> -{
> -}
> -#endif
> -
> /**
> * __flush_dcache_icache(): Flush a particular page from the data cache to RAM.
> * Note: this is necessary because the instruction cache does *not*
> @@ -148,17 +103,20 @@ static void __flush_dcache_icache(void *p)
> invalidate_icache_range(addr, addr + PAGE_SIZE);
> }
>
> -static void flush_dcache_icache_hugepage(struct page *page)
> +void flush_dcache_icache_folio(struct folio *folio)
> {
> - int i;
> - int nr = compound_nr(page);
> + unsigned int i, nr = folio_nr_pages(folio);
>
> - if (!PageHighMem(page)) {
> + if (flush_coherent_icache())
> + return;
> +
> + if (!folio_test_highmem(folio)) {
> + void *addr = folio_address(folio);
> for (i = 0; i < nr; i++)
> - __flush_dcache_icache(lowmem_page_address(page + i));
> + __flush_dcache_icache(addr + i * PAGE_SIZE);
> } else {
> for (i = 0; i < nr; i++) {
> - void *start = kmap_local_page(page + i);
> + void *start = kmap_local_folio(folio, i * PAGE_SIZE);
>
> __flush_dcache_icache(start);
> kunmap_local(start);
> @@ -166,27 +124,6 @@ static void flush_dcache_icache_hugepage(struct page *page)
> }
> }
>
> -void flush_dcache_icache_page(struct page *page)
> -{
> - if (flush_coherent_icache())
> - return;
> -
> - if (PageCompound(page))
> - return flush_dcache_icache_hugepage(page);
> -
> - if (!PageHighMem(page)) {
> - __flush_dcache_icache(lowmem_page_address(page));
> - } else if (IS_ENABLED(CONFIG_BOOKE) || sizeof(phys_addr_t) > sizeof(void *)) {
> - void *start = kmap_local_page(page);
> -
> - __flush_dcache_icache(start);
> - kunmap_local(start);
> - } else {
> - flush_dcache_icache_phys(page_to_phys(page));
> - }
> -}
> -EXPORT_SYMBOL(flush_dcache_icache_page);
> -
> void clear_user_page(void *page, unsigned long vaddr, struct page *pg)
> {
> clear_page(page);
> diff --git a/arch/powerpc/mm/nohash/e500_hugetlbpage.c b/arch/powerpc/mm/nohash/e500_hugetlbpage.c
> index 58c8d9849cb1..f3cb91107a47 100644
> --- a/arch/powerpc/mm/nohash/e500_hugetlbpage.c
> +++ b/arch/powerpc/mm/nohash/e500_hugetlbpage.c
> @@ -178,7 +178,8 @@ book3e_hugetlb_preload(struct vm_area_struct *vma, unsigned long ea, pte_t pte)
> *
> * This must always be called with the pte lock held.
> */
> -void update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t *ptep)
> +void update_mmu_cache(struct vm_area_struct *vma, unsigned long address,
> + pte_t *ptep, unsigned int nr)
> {
> if (is_vm_hugetlb_page(vma))
> book3e_hugetlb_preload(vma, address, *ptep);
> diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
> index cb2dcdb18f8e..b3c7b874a7a2 100644
> --- a/arch/powerpc/mm/pgtable.c
> +++ b/arch/powerpc/mm/pgtable.c
> @@ -58,7 +58,7 @@ static inline int pte_looks_normal(pte_t pte)
> return 0;
> }
>
> -static struct page *maybe_pte_to_page(pte_t pte)
> +static struct folio *maybe_pte_to_folio(pte_t pte)
> {
> unsigned long pfn = pte_pfn(pte);
> struct page *page;
> @@ -68,7 +68,7 @@ static struct page *maybe_pte_to_page(pte_t pte)
> page = pfn_to_page(pfn);
> if (PageReserved(page))
> return NULL;
> - return page;
> + return page_folio(page);
> }
>
> #ifdef CONFIG_PPC_BOOK3S
> @@ -84,12 +84,12 @@ static pte_t set_pte_filter_hash(pte_t pte)
> pte = __pte(pte_val(pte) & ~_PAGE_HPTEFLAGS);
> if (pte_looks_normal(pte) && !(cpu_has_feature(CPU_FTR_COHERENT_ICACHE) ||
> cpu_has_feature(CPU_FTR_NOEXECUTE))) {
> - struct page *pg = maybe_pte_to_page(pte);
> - if (!pg)
> + struct folio *folio = maybe_pte_to_folio(pte);
> + if (!folio)
> return pte;
> - if (!test_bit(PG_dcache_clean, &pg->flags)) {
> - flush_dcache_icache_page(pg);
> - set_bit(PG_dcache_clean, &pg->flags);
> + if (!test_bit(PG_dcache_clean, &folio->flags)) {
> + flush_dcache_icache_folio(folio);
> + set_bit(PG_dcache_clean, &folio->flags);
> }
> }
> return pte;
> @@ -107,7 +107,7 @@ static pte_t set_pte_filter_hash(pte_t pte) { return pte; }
> */
> static inline pte_t set_pte_filter(pte_t pte)
> {
> - struct page *pg;
> + struct folio *folio;
>
> if (radix_enabled())
> return pte;
> @@ -120,18 +120,18 @@ static inline pte_t set_pte_filter(pte_t pte)
> return pte;
>
> /* If you set _PAGE_EXEC on weird pages you're on your own */
> - pg = maybe_pte_to_page(pte);
> - if (unlikely(!pg))
> + folio = maybe_pte_to_folio(pte);
> + if (unlikely(!folio))
> return pte;
>
> /* If the page clean, we move on */
> - if (test_bit(PG_dcache_clean, &pg->flags))
> + if (test_bit(PG_dcache_clean, &folio->flags))
> return pte;
>
> /* If it's an exec fault, we flush the cache and make it clean */
> if (is_exec_fault()) {
> - flush_dcache_icache_page(pg);
> - set_bit(PG_dcache_clean, &pg->flags);
> + flush_dcache_icache_folio(folio);
> + set_bit(PG_dcache_clean, &folio->flags);
> return pte;
> }
>
> @@ -142,7 +142,7 @@ static inline pte_t set_pte_filter(pte_t pte)
> static pte_t set_access_flags_filter(pte_t pte, struct vm_area_struct *vma,
> int dirty)
> {
> - struct page *pg;
> + struct folio *folio;
>
> if (IS_ENABLED(CONFIG_PPC_BOOK3S_64))
> return pte;
> @@ -168,17 +168,17 @@ static pte_t set_access_flags_filter(pte_t pte, struct vm_area_struct *vma,
> #endif /* CONFIG_DEBUG_VM */
>
> /* If you set _PAGE_EXEC on weird pages you're on your own */
> - pg = maybe_pte_to_page(pte);
> - if (unlikely(!pg))
> + folio = maybe_pte_to_folio(pte);
> + if (unlikely(!folio))
> goto bail;
>
> /* If the page is already clean, we move on */
> - if (test_bit(PG_dcache_clean, &pg->flags))
> + if (test_bit(PG_dcache_clean, &folio->flags))
> goto bail;
>
> /* Clean the page and set PG_dcache_clean */
> - flush_dcache_icache_page(pg);
> - set_bit(PG_dcache_clean, &pg->flags);
> + flush_dcache_icache_folio(folio);
> + set_bit(PG_dcache_clean, &folio->flags);
>
> bail:
> return pte_mkexec(pte);
> @@ -187,8 +187,8 @@ static pte_t set_access_flags_filter(pte_t pte, struct vm_area_struct *vma,
> /*
> * set_pte stores a linux PTE into the linux page table.
> */
> -void set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
> - pte_t pte)
> +void set_ptes(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
> + pte_t pte, unsigned int nr)
> {
> /*
> * Make sure hardware valid bit is not set. We don't do
> @@ -203,7 +203,14 @@ void set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
> pte = set_pte_filter(pte);
>
> /* Perform the setting of the PTE */
> - __set_pte_at(mm, addr, ptep, pte, 0);
> + for (;;) {
> + __set_pte_at(mm, addr, ptep, pte, 0);
> + if (--nr == 0)
> + break;
> + ptep++;
> + pte = __pte(pte_val(pte) + PAGE_SIZE);
> + addr += PAGE_SIZE;
> + }
> }
>
> void unmap_kernel_page(unsigned long va)

2023-02-27 20:21:08

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH v2 18/30] powerpc: Implement the new page table range API

On Mon, Feb 27, 2023 at 07:45:08PM +0000, Christophe Leroy wrote:
> Hi,
>
> On 27/02/2023 at 18:57, Matthew Wilcox (Oracle) wrote:
> > Add set_ptes(), update_mmu_cache_range() and flush_dcache_folio().
> > Change the PG_arch_1 (aka PG_dcache_dirty) flag from being per-page to
> > per-folio.
> >
> > I'm unsure about my merging of flush_dcache_icache_hugepage() and
> > flush_dcache_icache_page() into flush_dcache_icache_folio() and subsequent
> > removal of flush_dcache_icache_phys(). Please review.
>
> Not sure why you want to remove flush_dcache_icache_phys().

Well, I didn't, necessarily. It's just that when I merged
flush_dcache_icache_hugepage() and flush_dcache_icache_page()
together, it was left with no callers.

> Although that's only feasible when the address bus is not wider than 32
> bits, and it cannot be done on BOOKE as you can't switch off the MMU on
> BOOKE, flush_dcache_icache_phys() allows flushing unmapped pages without
> having to map them first. So it is more efficient.

And it was just never done for the hugepage case?

> > @@ -148,17 +103,20 @@ static void __flush_dcache_icache(void *p)
> > invalidate_icache_range(addr, addr + PAGE_SIZE);
> > }
> >
> > -static void flush_dcache_icache_hugepage(struct page *page)
> > +void flush_dcache_icache_folio(struct folio *folio)
> > {
> > - int i;
> > - int nr = compound_nr(page);
> > + unsigned int i, nr = folio_nr_pages(folio);
> >
> > - if (!PageHighMem(page)) {
> > + if (flush_coherent_icache())
> > + return;
> > +
> > + if (!folio_test_highmem(folio)) {
> > + void *addr = folio_address(folio);
> > for (i = 0; i < nr; i++)
> > - __flush_dcache_icache(lowmem_page_address(page + i));
> > + __flush_dcache_icache(addr + i * PAGE_SIZE);
> > } else {
> > for (i = 0; i < nr; i++) {
> > - void *start = kmap_local_page(page + i);
> > + void *start = kmap_local_folio(folio, i * PAGE_SIZE);
> >
> > __flush_dcache_icache(start);
> > kunmap_local(start);

So you'd like this to be:

} else if (IS_ENABLED(CONFIG_BOOKE) || sizeof(phys_addr_t) > sizeof(void *)) {
for (i = 0; i < nr; i++) {
void *start = kmap_local_folio(folio, i * PAGE_SIZE);
__flush_dcache_icache(start);
kunmap_local(start);
}
} else {
unsigned long pfn = folio_pfn(folio);
for (i = 0; i < nr; i++)
flush_dcache_icache_phys((pfn + i) * PAGE_SIZE);
}

(or maybe you'd prefer a flush_dcache_icache_pfn() that doesn't need to
worry about PAGE_MASK).
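
Something like the following, perhaps (a sketch only; it assumes
PFN_PHYS() and keeps the body of flush_dcache_icache_phys() unchanged):

	static void flush_dcache_icache_pfn(unsigned long pfn)
	{
		/* pfn << PAGE_SHIFT is page-aligned by construction, so the
		 * PAGE_MASK handling inside flush_dcache_icache_phys() would
		 * become unnecessary if its body moved here. */
		flush_dcache_icache_phys(PFN_PHYS(pfn));
	}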

> > @@ -166,27 +124,6 @@ static void flush_dcache_icache_hugepage(struct page *page)
> > }
> > }
> >
> > -void flush_dcache_icache_page(struct page *page)
> > -{
> > - if (flush_coherent_icache())
> > - return;
> > -
> > - if (PageCompound(page))
> > - return flush_dcache_icache_hugepage(page);
> > -
> > - if (!PageHighMem(page)) {
> > - __flush_dcache_icache(lowmem_page_address(page));
> > - } else if (IS_ENABLED(CONFIG_BOOKE) || sizeof(phys_addr_t) > sizeof(void *)) {
> > - void *start = kmap_local_page(page);
> > -
> > - __flush_dcache_icache(start);
> > - kunmap_local(start);
> > - } else {
> > - flush_dcache_icache_phys(page_to_phys(page));
> > - }
> > -}
> > -EXPORT_SYMBOL(flush_dcache_icache_page);
> > -
> > void clear_user_page(void *page, unsigned long vaddr, struct page *pg)
> > {
> > clear_page(page);
> > diff --git a/arch/powerpc/mm/nohash/e500_hugetlbpage.c b/arch/powerpc/mm/nohash/e500_hugetlbpage.c
> > index 58c8d9849cb1..f3cb91107a47 100644
> > --- a/arch/powerpc/mm/nohash/e500_hugetlbpage.c
> > +++ b/arch/powerpc/mm/nohash/e500_hugetlbpage.c
> > @@ -178,7 +178,8 @@ book3e_hugetlb_preload(struct vm_area_struct *vma, unsigned long ea, pte_t pte)
> > *
> > * This must always be called with the pte lock held.
> > */
> > -void update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t *ptep)
> > +void update_mmu_cache(struct vm_area_struct *vma, unsigned long address,
> > + pte_t *ptep, unsigned int nr)
> > {
> > if (is_vm_hugetlb_page(vma))
> > book3e_hugetlb_preload(vma, address, *ptep);
> > diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
> > index cb2dcdb18f8e..b3c7b874a7a2 100644
> > --- a/arch/powerpc/mm/pgtable.c
> > +++ b/arch/powerpc/mm/pgtable.c
> > @@ -58,7 +58,7 @@ static inline int pte_looks_normal(pte_t pte)
> > return 0;
> > }
> >
> > -static struct page *maybe_pte_to_page(pte_t pte)
> > +static struct folio *maybe_pte_to_folio(pte_t pte)
> > {
> > unsigned long pfn = pte_pfn(pte);
> > struct page *page;
> > @@ -68,7 +68,7 @@ static struct page *maybe_pte_to_page(pte_t pte)
> > page = pfn_to_page(pfn);
> > if (PageReserved(page))
> > return NULL;
> > - return page;
> > + return page_folio(page);
> > }
> >
> > #ifdef CONFIG_PPC_BOOK3S
> > @@ -84,12 +84,12 @@ static pte_t set_pte_filter_hash(pte_t pte)
> > pte = __pte(pte_val(pte) & ~_PAGE_HPTEFLAGS);
> > if (pte_looks_normal(pte) && !(cpu_has_feature(CPU_FTR_COHERENT_ICACHE) ||
> > cpu_has_feature(CPU_FTR_NOEXECUTE))) {
> > - struct page *pg = maybe_pte_to_page(pte);
> > - if (!pg)
> > + struct folio *folio = maybe_pte_to_folio(pte);
> > + if (!folio)
> > return pte;
> > - if (!test_bit(PG_dcache_clean, &pg->flags)) {
> > - flush_dcache_icache_page(pg);
> > - set_bit(PG_dcache_clean, &pg->flags);
> > + if (!test_bit(PG_dcache_clean, &folio->flags)) {
> > + flush_dcache_icache_folio(folio);
> > + set_bit(PG_dcache_clean, &folio->flags);
> > }
> > }
> > return pte;
> > @@ -107,7 +107,7 @@ static pte_t set_pte_filter_hash(pte_t pte) { return pte; }
> > */
> > static inline pte_t set_pte_filter(pte_t pte)
> > {
> > - struct page *pg;
> > + struct folio *folio;
> >
> > if (radix_enabled())
> > return pte;
> > @@ -120,18 +120,18 @@ static inline pte_t set_pte_filter(pte_t pte)
> > return pte;
> >
> > /* If you set _PAGE_EXEC on weird pages you're on your own */
> > - pg = maybe_pte_to_page(pte);
> > - if (unlikely(!pg))
> > + folio = maybe_pte_to_folio(pte);
> > + if (unlikely(!folio))
> > return pte;
> >
> > /* If the page clean, we move on */
> > - if (test_bit(PG_dcache_clean, &pg->flags))
> > + if (test_bit(PG_dcache_clean, &folio->flags))
> > return pte;
> >
> > /* If it's an exec fault, we flush the cache and make it clean */
> > if (is_exec_fault()) {
> > - flush_dcache_icache_page(pg);
> > - set_bit(PG_dcache_clean, &pg->flags);
> > + flush_dcache_icache_folio(folio);
> > + set_bit(PG_dcache_clean, &folio->flags);
> > return pte;
> > }
> >
> > @@ -142,7 +142,7 @@ static inline pte_t set_pte_filter(pte_t pte)
> > static pte_t set_access_flags_filter(pte_t pte, struct vm_area_struct *vma,
> > int dirty)
> > {
> > - struct page *pg;
> > + struct folio *folio;
> >
> > if (IS_ENABLED(CONFIG_PPC_BOOK3S_64))
> > return pte;
> > @@ -168,17 +168,17 @@ static pte_t set_access_flags_filter(pte_t pte, struct vm_area_struct *vma,
> > #endif /* CONFIG_DEBUG_VM */
> >
> > /* If you set _PAGE_EXEC on weird pages you're on your own */
> > - pg = maybe_pte_to_page(pte);
> > - if (unlikely(!pg))
> > + folio = maybe_pte_to_folio(pte);
> > + if (unlikely(!folio))
> > goto bail;
> >
> > /* If the page is already clean, we move on */
> > - if (test_bit(PG_dcache_clean, &pg->flags))
> > + if (test_bit(PG_dcache_clean, &folio->flags))
> > goto bail;
> >
> > /* Clean the page and set PG_dcache_clean */
> > - flush_dcache_icache_page(pg);
> > - set_bit(PG_dcache_clean, &pg->flags);
> > + flush_dcache_icache_folio(folio);
> > + set_bit(PG_dcache_clean, &folio->flags);
> >
> > bail:
> > return pte_mkexec(pte);
> > @@ -187,8 +187,8 @@ static pte_t set_access_flags_filter(pte_t pte, struct vm_area_struct *vma,
> > /*
> > * set_pte stores a linux PTE into the linux page table.
> > */
> > -void set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
> > - pte_t pte)
> > +void set_ptes(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
> > + pte_t pte, unsigned int nr)
> > {
> > /*
> > * Make sure hardware valid bit is not set. We don't do
> > @@ -203,7 +203,14 @@ void set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
> > pte = set_pte_filter(pte);
> >
> > /* Perform the setting of the PTE */
> > - __set_pte_at(mm, addr, ptep, pte, 0);
> > + for (;;) {
> > + __set_pte_at(mm, addr, ptep, pte, 0);
> > + if (--nr == 0)
> > + break;
> > + ptep++;
> > + pte = __pte(pte_val(pte) + PAGE_SIZE);
> > + addr += PAGE_SIZE;
> > + }
> > }
> >
> > void unmap_kernel_page(unsigned long va)

2023-02-27 22:53:11

by John David Anglin

[permalink] [raw]
Subject: Re: [PATCH v2 17/30] parisc: Implement the new page table range API

On 2023-02-27 12:57 p.m., Matthew Wilcox (Oracle) wrote:
> Add set_ptes(), update_mmu_cache_range(), flush_dcache_folio()
> and flush_icache_pages(). Change the PG_arch_1 (aka PG_dcache_dirty) flag
> from being per-page to per-folio.
>
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> Cc: "James E.J. Bottomley" <[email protected]>
> Cc: Helge Deller <[email protected]>
> Cc: [email protected]
> ---
> arch/parisc/include/asm/cacheflush.h | 14 ++--
> arch/parisc/include/asm/pgtable.h | 28 +++++---
> arch/parisc/kernel/cache.c | 101 +++++++++++++++++++--------
> 3 files changed, 99 insertions(+), 44 deletions(-)
>
> diff --git a/arch/parisc/include/asm/cacheflush.h b/arch/parisc/include/asm/cacheflush.h
> index ff07c509e04b..0bf8b69d086b 100644
> --- a/arch/parisc/include/asm/cacheflush.h
> +++ b/arch/parisc/include/asm/cacheflush.h
> @@ -46,16 +46,20 @@ void invalidate_kernel_vmap_range(void *vaddr, int size);
> #define flush_cache_vmap(start, end) flush_cache_all()
> #define flush_cache_vunmap(start, end) flush_cache_all()
>
> +void flush_dcache_folio(struct folio *folio);
> +#define flush_dcache_folio flush_dcache_folio
> #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
> -void flush_dcache_page(struct page *page);
> +static inline void flush_dcache_page(struct page *page)
> +{
> + flush_dcache_folio(page_folio(page));
> +}
>
> #define flush_dcache_mmap_lock(mapping) xa_lock_irq(&mapping->i_pages)
> #define flush_dcache_mmap_unlock(mapping) xa_unlock_irq(&mapping->i_pages)
>
> -#define flush_icache_page(vma,page) do { \
> - flush_kernel_dcache_page_addr(page_address(page)); \
> - flush_kernel_icache_page(page_address(page)); \
> -} while (0)
> +void flush_icache_pages(struct vm_area_struct *vma, struct page *page,
> + unsigned int nr);
> +#define flush_icache_page(vma, page) flush_icache_pages(vma, page, 1)
>
> #define flush_icache_range(s,e) do { \
> flush_kernel_dcache_range_asm(s,e); \
> diff --git a/arch/parisc/include/asm/pgtable.h b/arch/parisc/include/asm/pgtable.h
> index e2950f5db7c9..78ee9816f423 100644
> --- a/arch/parisc/include/asm/pgtable.h
> +++ b/arch/parisc/include/asm/pgtable.h
> @@ -73,14 +73,7 @@ extern void __update_cache(pte_t pte);
> mb(); \
> } while(0)
>
> -#define set_pte_at(mm, addr, pteptr, pteval) \
> - do { \
> - if (pte_present(pteval) && \
> - pte_user(pteval)) \
> - __update_cache(pteval); \
> - *(pteptr) = (pteval); \
> - purge_tlb_entries(mm, addr); \
> - } while (0)
> +#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
>
> #endif /* !__ASSEMBLY__ */
>
> @@ -391,11 +384,28 @@ static inline unsigned long pmd_page_vaddr(pmd_t pmd)
>
> extern void paging_init (void);
>
> +static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
> + pte_t *ptep, pte_t pte, unsigned int nr)
> +{
> + if (pte_present(pte) && pte_user(pte))
> + __update_cache(pte);
> + for (;;) {
> + *ptep = pte;
> + purge_tlb_entries(mm, addr);
> + if (--nr == 0)
> + break;
> + ptep++;
> + pte_val(pte) += 1 << PFN_PTE_SHIFT;
> + addr += PAGE_SIZE;
> + }
> +}
> +
> /* Used for deferring calls to flush_dcache_page() */
>
> #define PG_dcache_dirty PG_arch_1
>
> -#define update_mmu_cache(vms,addr,ptep) __update_cache(*ptep)
> +#define update_mmu_cache_range(vma, addr, ptep, nr) __update_cache(*ptep)
> +#define update_mmu_cache(vma, addr, ptep) __update_cache(*ptep)
>
> /*
> * Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
> diff --git a/arch/parisc/kernel/cache.c b/arch/parisc/kernel/cache.c
> index 984d3a1b3828..16057812103b 100644
> --- a/arch/parisc/kernel/cache.c
> +++ b/arch/parisc/kernel/cache.c
> @@ -92,11 +92,11 @@ static inline void flush_data_cache(void)
> /* Kernel virtual address of pfn. */
> #define pfn_va(pfn) __va(PFN_PHYS(pfn))
>
> -void
> -__update_cache(pte_t pte)
> +void __update_cache(pte_t pte)
> {
> unsigned long pfn = pte_pfn(pte);
> - struct page *page;
> + struct folio *folio;
> + unsigned int nr;
>
> /* We don't have pte special. As a result, we can be called with
> an invalid pfn and we don't need to flush the kernel dcache page.
> @@ -104,13 +104,17 @@ __update_cache(pte_t pte)
> if (!pfn_valid(pfn))
> return;
>
> - page = pfn_to_page(pfn);
> - if (page_mapping_file(page) &&
> - test_bit(PG_dcache_dirty, &page->flags)) {
> - flush_kernel_dcache_page_addr(pfn_va(pfn));
> - clear_bit(PG_dcache_dirty, &page->flags);
> + folio = page_folio(pfn_to_page(pfn));
> + pfn = folio_pfn(folio);
> + nr = folio_nr_pages(folio);
> + if (folio_flush_mapping(folio) &&
Shouldn't this call be to folio_mapping()?
> + test_bit(PG_dcache_dirty, &folio->flags)) {
> + while (nr--)
> + flush_kernel_dcache_page_addr(pfn_va(pfn + nr));
> + clear_bit(PG_dcache_dirty, &folio->flags);
> } else if (parisc_requires_coherency())
> - flush_kernel_dcache_page_addr(pfn_va(pfn));
> + while (nr--)
> + flush_kernel_dcache_page_addr(pfn_va(pfn + nr));
> }
>
> void
> @@ -365,6 +369,20 @@ static void flush_user_cache_page(struct vm_area_struct *vma, unsigned long vmad
> preempt_enable();
> }
>
> +void flush_icache_pages(struct vm_area_struct *vma, struct page *page,
> + unsigned int nr)
> +{
> + void *kaddr = page_address(page);
> +
> + for (;;) {
> + flush_kernel_dcache_page_addr(kaddr);
> + flush_kernel_icache_page(kaddr);
> + if (--nr == 0)
> + break;
> + page += PAGE_SIZE;
> + }
> +}
> +
> static inline pte_t *get_ptep(struct mm_struct *mm, unsigned long addr)
> {
> pte_t *ptep = NULL;
> @@ -393,26 +411,30 @@ static inline bool pte_needs_flush(pte_t pte)
> == (_PAGE_PRESENT | _PAGE_ACCESSED);
> }
>
> -void flush_dcache_page(struct page *page)
> +void flush_dcache_folio(struct folio *folio)
> {
> - struct address_space *mapping = page_mapping_file(page);
> - struct vm_area_struct *mpnt;
> - unsigned long offset;
> + struct address_space *mapping = folio_flush_mapping(folio);
Likewise.
> + struct vm_area_struct *vma;
> unsigned long addr, old_addr = 0;
> + void *kaddr;
> unsigned long count = 0;
> + unsigned long i, nr;
> pgoff_t pgoff;
>
> if (mapping && !mapping_mapped(mapping)) {
> - set_bit(PG_dcache_dirty, &page->flags);
> + set_bit(PG_dcache_dirty, &folio->flags);
> return;
> }
>
> - flush_kernel_dcache_page_addr(page_address(page));
> + nr = folio_nr_pages(folio);
> + kaddr = folio_address(folio);
> + for (i = 0; i < nr; i++)
> + flush_kernel_dcache_page_addr(kaddr + i * PAGE_SIZE);
>
> if (!mapping)
> return;
>
> - pgoff = page->index;
> + pgoff = folio->index;
>
> /*
> * We have carefully arranged in arch_get_unmapped_area() that
> @@ -422,15 +444,29 @@ void flush_dcache_page(struct page *page)
> * on machines that support equivalent aliasing
> */
> flush_dcache_mmap_lock(mapping);
> - vma_interval_tree_foreach(mpnt, &mapping->i_mmap, pgoff, pgoff) {
> - offset = (pgoff - mpnt->vm_pgoff) << PAGE_SHIFT;
> - addr = mpnt->vm_start + offset;
> - if (parisc_requires_coherency()) {
> - pte_t *ptep;
> + vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff + nr - 1) {
> + unsigned long offset = pgoff - vma->vm_pgoff;
> + unsigned long pfn = folio_pfn(folio);
> +
> + addr = vma->vm_start;
> + nr = folio_nr_pages(folio);
> + if (offset > -nr) {
> + pfn -= offset;
> + nr += offset;
> + } else {
> + addr += offset * PAGE_SIZE;
> + }
> + if (addr + nr * PAGE_SIZE > vma->vm_end)
> + nr = (vma->vm_end - addr) / PAGE_SIZE;
>
> - ptep = get_ptep(mpnt->vm_mm, addr);
> - if (ptep && pte_needs_flush(*ptep))
> - flush_user_cache_page(mpnt, addr);
> + if (parisc_requires_coherency()) {
> + for (i = 0; i < nr; i++) {
> + pte_t *ptep = get_ptep(vma->vm_mm,
> + addr + i * PAGE_SIZE);
> + if (ptep && pte_needs_flush(*ptep))
> + flush_user_cache_page(vma,
> + addr + i * PAGE_SIZE);
> + }
> } else {
> /*
> * The TLB is the engine of coherence on parisc:
> @@ -443,27 +479,32 @@ void flush_dcache_page(struct page *page)
> * in (until the user or kernel specifically
> * accesses it, of course)
> */
> - flush_tlb_page(mpnt, addr);
> + for (i = 0; i < nr; i++)
> + flush_tlb_page(vma, addr + i * PAGE_SIZE);
> if (old_addr == 0 || (old_addr & (SHM_COLOUR - 1))
> != (addr & (SHM_COLOUR - 1))) {
> - __flush_cache_page(mpnt, addr, page_to_phys(page));
> + for (i = 0; i < nr; i++)
> + __flush_cache_page(vma,
> + addr + i * PAGE_SIZE,
> + (pfn + i) * PAGE_SIZE);
> /*
> * Software is allowed to have any number
> * of private mappings to a page.
> */
> - if (!(mpnt->vm_flags & VM_SHARED))
> + if (!(vma->vm_flags & VM_SHARED))
> continue;
> if (old_addr)
> pr_err("INEQUIVALENT ALIASES 0x%lx and 0x%lx in file %pD\n",
> - old_addr, addr, mpnt->vm_file);
> - old_addr = addr;
> + old_addr, addr, vma->vm_file);
> + if (nr == folio_nr_pages(folio))
> + old_addr = addr;
> }
> }
> WARN_ON(++count == 4096);
> }
> flush_dcache_mmap_unlock(mapping);
> }
> -EXPORT_SYMBOL(flush_dcache_page);
> +EXPORT_SYMBOL(flush_dcache_folio);
>
> /* Defined in arch/parisc/kernel/pacache.S */
> EXPORT_SYMBOL(flush_kernel_dcache_range_asm);


--
John David Anglin [email protected]


2023-02-27 23:51:03

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH v2 17/30] parisc: Implement the new page table range API

On Mon, Feb 27, 2023 at 05:49:18PM -0500, John David Anglin wrote:
> > @@ -104,13 +104,17 @@ __update_cache(pte_t pte)
> > if (!pfn_valid(pfn))
> > return;
> > - page = pfn_to_page(pfn);
> > - if (page_mapping_file(page) &&
> > - test_bit(PG_dcache_dirty, &page->flags)) {
> > - flush_kernel_dcache_page_addr(pfn_va(pfn));
> > - clear_bit(PG_dcache_dirty, &page->flags);
> > + folio = page_folio(pfn_to_page(pfn));
> > + pfn = folio_pfn(folio);
> > + nr = folio_nr_pages(folio);
> > + if (folio_flush_mapping(folio) &&
> Shouldn't this call be to folio_mapping()?

For pages in the swap cache, folio_mapping() will return the swap cache
mapping, which isn't what we want. folio_file_mapping() will return the
swap file mapping, which is also not what we want. folio_flush_mapping()
returns NULL, which is what we want.


2023-02-28 03:17:29

by Guo Ren

[permalink] [raw]
Subject: Re: [PATCH v2 08/30] csky: Implement the new page table range API

For csky part

Acked-by: Guo Ren <[email protected]>

On Tue, Feb 28, 2023 at 1:57 AM Matthew Wilcox (Oracle)
<[email protected]> wrote:
>
> Add set_ptes(), update_mmu_cache_range() and flush_dcache_folio().
> Change the PG_dcache_clean flag from being per-page to per-folio.
>
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> Cc: Guo Ren <[email protected]>
> Cc: [email protected]
> ---
> arch/csky/abiv1/cacheflush.c | 32 +++++++++++++++++-----------
> arch/csky/abiv1/inc/abi/cacheflush.h | 2 ++
> arch/csky/abiv2/cacheflush.c | 30 +++++++++++++-------------
> arch/csky/abiv2/inc/abi/cacheflush.h | 10 +++++++--
> arch/csky/include/asm/pgtable.h | 21 +++++++++++++++---
> 5 files changed, 62 insertions(+), 33 deletions(-)
>
> diff --git a/arch/csky/abiv1/cacheflush.c b/arch/csky/abiv1/cacheflush.c
> index fb91b069dc69..ba43f6c26b4f 100644
> --- a/arch/csky/abiv1/cacheflush.c
> +++ b/arch/csky/abiv1/cacheflush.c
> @@ -14,43 +14,49 @@
>
> #define PG_dcache_clean PG_arch_1
>
> -void flush_dcache_page(struct page *page)
> +void flush_dcache_folio(struct folio *folio)
> {
> struct address_space *mapping;
>
> - if (page == ZERO_PAGE(0))
> + if (is_zero_pfn(folio_pfn(folio)))
> return;
>
> - mapping = page_mapping_file(page);
> + mapping = folio_flush_mapping(folio);
>
> - if (mapping && !page_mapcount(page))
> - clear_bit(PG_dcache_clean, &page->flags);
> + if (mapping && !folio_mapped(folio))
> + clear_bit(PG_dcache_clean, &folio->flags);
> else {
> dcache_wbinv_all();
> if (mapping)
> icache_inv_all();
> - set_bit(PG_dcache_clean, &page->flags);
> + set_bit(PG_dcache_clean, &folio->flags);
> }
> }
> +EXPORT_SYMBOL(flush_dcache_folio);
> +
> +void flush_dcache_page(struct page *page)
> +{
> + flush_dcache_folio(page_folio(page));
> +}
> EXPORT_SYMBOL(flush_dcache_page);
>
> -void update_mmu_cache(struct vm_area_struct *vma, unsigned long addr,
> - pte_t *ptep)
> +void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long addr,
> + pte_t *ptep, unsigned int nr)
> {
> unsigned long pfn = pte_pfn(*ptep);
> - struct page *page;
> + struct folio *folio;
>
> if (!pfn_valid(pfn))
> return;
>
> - page = pfn_to_page(pfn);
> - if (page == ZERO_PAGE(0))
> + if (is_zero_pfn(pfn))
> return;
>
> - if (!test_and_set_bit(PG_dcache_clean, &page->flags))
> + folio = page_folio(pfn_to_page(pfn));
> + if (!test_and_set_bit(PG_dcache_clean, &folio->flags))
> dcache_wbinv_all();
>
> - if (page_mapping_file(page)) {
> + if (folio_flush_mapping(folio)) {
> if (vma->vm_flags & VM_EXEC)
> icache_inv_all();
> }
> diff --git a/arch/csky/abiv1/inc/abi/cacheflush.h b/arch/csky/abiv1/inc/abi/cacheflush.h
> index ed62e2066ba7..0d6cb65624c4 100644
> --- a/arch/csky/abiv1/inc/abi/cacheflush.h
> +++ b/arch/csky/abiv1/inc/abi/cacheflush.h
> @@ -9,6 +9,8 @@
>
> #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
> extern void flush_dcache_page(struct page *);
> +void flush_dcache_folio(struct folio *);
> +#define flush_dcache_folio flush_dcache_folio
>
> #define flush_cache_mm(mm) dcache_wbinv_all()
> #define flush_cache_page(vma, page, pfn) cache_wbinv_all()
> diff --git a/arch/csky/abiv2/cacheflush.c b/arch/csky/abiv2/cacheflush.c
> index 39c51399dd81..c1cf0d55a2a1 100644
> --- a/arch/csky/abiv2/cacheflush.c
> +++ b/arch/csky/abiv2/cacheflush.c
> @@ -6,30 +6,30 @@
> #include <linux/mm.h>
> #include <asm/cache.h>
>
> -void update_mmu_cache(struct vm_area_struct *vma, unsigned long address,
> - pte_t *pte)
> +void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long address,
> + pte_t *pte, unsigned int nr)
> {
> - unsigned long addr;
> + unsigned long pfn = pte_pfn(*pte);
> - struct page *page;
> + struct folio *folio;
> + unsigned int i;
>
> - if (!pfn_valid(pte_pfn(*pte)))
> + if (!pfn_valid(pfn) || is_zero_pfn(pfn))
> return;
>
> - page = pfn_to_page(pte_pfn(*pte));
> - if (page == ZERO_PAGE(0))
> - return;
> + folio = page_folio(pfn_to_page(pfn));
>
> - if (test_and_set_bit(PG_dcache_clean, &page->flags))
> + if (test_and_set_bit(PG_dcache_clean, &folio->flags))
> return;
>
> - addr = (unsigned long) kmap_atomic(page);
> -
> - dcache_wb_range(addr, addr + PAGE_SIZE);
> + for (i = 0; i < folio_nr_pages(folio); i++) {
> + unsigned long addr = (unsigned long) kmap_local_folio(folio,
> + i * PAGE_SIZE);
>
> - if (vma->vm_flags & VM_EXEC)
> - icache_inv_range(addr, addr + PAGE_SIZE);
> -
> - kunmap_atomic((void *) addr);
> + dcache_wb_range(addr, addr + PAGE_SIZE);
> + if (vma->vm_flags & VM_EXEC)
> + icache_inv_range(addr, addr + PAGE_SIZE);
> + kunmap_local((void *) addr);
> + }
> }
>
> void flush_icache_deferred(struct mm_struct *mm)
> diff --git a/arch/csky/abiv2/inc/abi/cacheflush.h b/arch/csky/abiv2/inc/abi/cacheflush.h
> index a565e00c3f70..9c728933a776 100644
> --- a/arch/csky/abiv2/inc/abi/cacheflush.h
> +++ b/arch/csky/abiv2/inc/abi/cacheflush.h
> @@ -18,11 +18,17 @@
>
> #define PG_dcache_clean PG_arch_1
>
> +static inline void flush_dcache_folio(struct folio *folio)
> +{
> + if (test_bit(PG_dcache_clean, &folio->flags))
> + clear_bit(PG_dcache_clean, &folio->flags);
> +}
> +#define flush_dcache_folio flush_dcache_folio
> +
> #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
> static inline void flush_dcache_page(struct page *page)
> {
> - if (test_bit(PG_dcache_clean, &page->flags))
> - clear_bit(PG_dcache_clean, &page->flags);
> + flush_dcache_folio(page_folio(page));
> }
>
> #define flush_dcache_mmap_lock(mapping) do { } while (0)
> diff --git a/arch/csky/include/asm/pgtable.h b/arch/csky/include/asm/pgtable.h
> index d4042495febc..a30ae048233e 100644
> --- a/arch/csky/include/asm/pgtable.h
> +++ b/arch/csky/include/asm/pgtable.h
> @@ -90,7 +90,20 @@ static inline void set_pte(pte_t *p, pte_t pte)
> /* prevent out of order excution */
> smp_mb();
> }
> -#define set_pte_at(mm, addr, ptep, pteval) set_pte(ptep, pteval)
> +
> +static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
> + pte_t *ptep, pte_t pte, unsigned int nr)
> +{
> + for (;;) {
> + set_pte(ptep, pte);
> + if (--nr == 0)
> + break;
> + ptep++;
> + pte_val(pte) += PAGE_SIZE;
> + }
> +}
> +
> +#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
>
> static inline pte_t *pmd_page_vaddr(pmd_t pmd)
> {
> @@ -263,8 +276,10 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
> extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
> extern void paging_init(void);
>
> -void update_mmu_cache(struct vm_area_struct *vma, unsigned long address,
> - pte_t *pte);
> +void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long address,
> + pte_t *pte, unsigned int nr);
> +#define update_mmu_cache(vma, addr, ptep) \
> + update_mmu_cache_range(vma, addr, ptep, 1)
>
> #define io_remap_pfn_range(vma, vaddr, pfn, size, prot) \
> remap_pfn_range(vma, vaddr, pfn, size, prot)
> --
> 2.39.1
>


--
Best Regards
Guo Ren
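
As a rough caller-side illustration of the API being acked here (a sketch,
not code from the patch; the helper name is made up), core MM code is
expected to drive set_ptes() and update_mmu_cache_range() with all nr
entries covering one folio within a single PMD:

static void map_folio_ptes(struct vm_area_struct *vma, unsigned long addr,
		pte_t *ptep, struct folio *folio, unsigned int nr)
{
	pte_t pte = mk_pte(folio_page(folio, 0), vma->vm_page_prot);

	flush_icache_pages(vma, folio_page(folio, 0), nr);
	set_ptes(vma->vm_mm, addr, ptep, pte, nr);
	update_mmu_cache_range(vma, addr, ptep, nr);
}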

2023-02-28 06:34:17

by Vineet Gupta

[permalink] [raw]
Subject: Re: [PATCH v2 06/30] arc: Implement the new page table range API



On 2/27/23 09:57, Matthew Wilcox (Oracle) wrote:
> Add set_ptes(), update_mmu_cache_range(), flush_dcache_folio()
> and flush_icache_pages().
>
> Change the PG_dc_clean flag from being per-page to per-folio (which
> means it cannot always be set as we don't know that all pages in this
> folio were cleaned). Enhance the internal flush routines to take the
> number of pages to flush.
>
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> Cc: Vineet Gupta <[email protected]>
> Cc: [email protected]
> ---
> arch/arc/include/asm/cacheflush.h | 7 +-
> arch/arc/include/asm/pgtable-bits-arcv2.h | 20 +++--
> arch/arc/mm/cache.c | 61 ++++++++------
> arch/arc/mm/tlb.c | 18 +++--

You need to split ARC and ARM into separate patches.

Also it'd be best to drop all the VIPT aliasing bits for ARC, they are a
needless maintenance burden.
I can send a patch which you could carry in your tree for easier logistics.

-Vineet

> arch/arm/include/asm/cacheflush.h | 24 +++---
> arch/arm/include/asm/pgtable.h | 5 +-
> arch/arm/include/asm/tlbflush.h | 13 +--
> arch/arm/mm/copypage-v4mc.c | 5 +-
> arch/arm/mm/copypage-v6.c | 5 +-
> arch/arm/mm/copypage-xscale.c | 5 +-
> arch/arm/mm/dma-mapping.c | 24 +++---
> arch/arm/mm/fault-armv.c | 14 ++--
> arch/arm/mm/flush.c | 99 ++++++++++++++---------
> arch/arm/mm/mm.h | 2 +-
> arch/arm/mm/mmu.c | 14 +++-
> 15 files changed, 193 insertions(+), 123 deletions(-)
>
> diff --git a/arch/arc/include/asm/cacheflush.h b/arch/arc/include/asm/cacheflush.h
> index e201b4b1655a..04f65f588510 100644
> --- a/arch/arc/include/asm/cacheflush.h
> +++ b/arch/arc/include/asm/cacheflush.h
> @@ -25,17 +25,20 @@
> * in update_mmu_cache()
> */
> #define flush_icache_page(vma, page)
> +#define flush_icache_pages(vma, page, nr)
>
> void flush_cache_all(void);
>
> void flush_icache_range(unsigned long kstart, unsigned long kend);
> void __sync_icache_dcache(phys_addr_t paddr, unsigned long vaddr, int len);
> -void __inv_icache_page(phys_addr_t paddr, unsigned long vaddr);
> -void __flush_dcache_page(phys_addr_t paddr, unsigned long vaddr);
> +void __inv_icache_pages(phys_addr_t paddr, unsigned long vaddr, unsigned nr);
> +void __flush_dcache_pages(phys_addr_t paddr, unsigned long vaddr, unsigned nr);
>
> #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
>
> void flush_dcache_page(struct page *page);
> +void flush_dcache_folio(struct folio *folio);
> +#define flush_dcache_folio flush_dcache_folio
>
> void dma_cache_wback_inv(phys_addr_t start, unsigned long sz);
> void dma_cache_inv(phys_addr_t start, unsigned long sz);
> diff --git a/arch/arc/include/asm/pgtable-bits-arcv2.h b/arch/arc/include/asm/pgtable-bits-arcv2.h
> index 6e9f8ca6d6a1..4a1b2ce204c6 100644
> --- a/arch/arc/include/asm/pgtable-bits-arcv2.h
> +++ b/arch/arc/include/asm/pgtable-bits-arcv2.h
> @@ -100,14 +100,24 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
> return __pte((pte_val(pte) & _PAGE_CHG_MASK) | pgprot_val(newprot));
> }
>
> -static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
> - pte_t *ptep, pte_t pteval)
> +static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
> + pte_t *ptep, pte_t pte, unsigned int nr)
> {
> - set_pte(ptep, pteval);
> + for (;;) {
> + set_pte(ptep, pte);
> + if (--nr == 0)
> + break;
> + ptep++;
> + pte_val(pte) += PAGE_SIZE;
> + }
> }
> +#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
>
> -void update_mmu_cache(struct vm_area_struct *vma, unsigned long address,
> - pte_t *ptep);
> +void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long address,
> + pte_t *ptep, unsigned int nr);
> +
> +#define update_mmu_cache(vma, addr, ptep) \
> + update_mmu_cache_range(vma, addr, ptep, 1)
>
> /*
> * Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
> diff --git a/arch/arc/mm/cache.c b/arch/arc/mm/cache.c
> index 55c6de138eae..3c16ee942a5c 100644
> --- a/arch/arc/mm/cache.c
> +++ b/arch/arc/mm/cache.c
> @@ -752,17 +752,17 @@ static inline void arc_slc_enable(void)
> * There's a corollary case, where kernel READs from a userspace mapped page.
> * If the U-mapping is not congruent to K-mapping, former needs flushing.
> */
> -void flush_dcache_page(struct page *page)
> +void flush_dcache_folio(struct folio *folio)
> {
> struct address_space *mapping;
>
> if (!cache_is_vipt_aliasing()) {
> - clear_bit(PG_dc_clean, &page->flags);
> + clear_bit(PG_dc_clean, &folio->flags);
> return;
> }
>
> /* don't handle anon pages here */
> - mapping = page_mapping_file(page);
> + mapping = folio_flush_mapping(folio);
> if (!mapping)
> return;
>
> @@ -771,17 +771,27 @@ void flush_dcache_page(struct page *page)
> * Make a note that K-mapping is dirty
> */
> if (!mapping_mapped(mapping)) {
> - clear_bit(PG_dc_clean, &page->flags);
> - } else if (page_mapcount(page)) {
> -
> + clear_bit(PG_dc_clean, &folio->flags);
> + } else if (folio_mapped(folio)) {
> /* kernel reading from page with U-mapping */
> - phys_addr_t paddr = (unsigned long)page_address(page);
> - unsigned long vaddr = page->index << PAGE_SHIFT;
> + phys_addr_t paddr = (unsigned long)folio_address(folio);
> + unsigned long vaddr = folio_pos(folio);
>
> + /*
> + * vaddr is not actually the virtual address, but is
> + * congruent to every user mapping.
> + */
> if (addr_not_cache_congruent(paddr, vaddr))
> - __flush_dcache_page(paddr, vaddr);
> + __flush_dcache_pages(paddr, vaddr,
> + folio_nr_pages(folio));
> }
> }
> +EXPORT_SYMBOL(flush_dcache_folio);
> +
> +void flush_dcache_page(struct page *page)
> +{
> + flush_dcache_folio(page_folio(page));
> +}
> EXPORT_SYMBOL(flush_dcache_page);
>
> /*
> @@ -921,18 +931,18 @@ void __sync_icache_dcache(phys_addr_t paddr, unsigned long vaddr, int len)
> }
>
> /* wrapper to compile time eliminate alignment checks in flush loop */
> -void __inv_icache_page(phys_addr_t paddr, unsigned long vaddr)
> +void __inv_icache_pages(phys_addr_t paddr, unsigned long vaddr, unsigned nr)
> {
> - __ic_line_inv_vaddr(paddr, vaddr, PAGE_SIZE);
> + __ic_line_inv_vaddr(paddr, vaddr, nr * PAGE_SIZE);
> }
>
> /*
> * wrapper to clearout kernel or userspace mappings of a page
> * For kernel mappings @vaddr == @paddr
> */
> -void __flush_dcache_page(phys_addr_t paddr, unsigned long vaddr)
> +void __flush_dcache_pages(phys_addr_t paddr, unsigned long vaddr, unsigned nr)
> {
> - __dc_line_op(paddr, vaddr & PAGE_MASK, PAGE_SIZE, OP_FLUSH_N_INV);
> + __dc_line_op(paddr, vaddr & PAGE_MASK, nr * PAGE_SIZE, OP_FLUSH_N_INV);
> }
>
> noinline void flush_cache_all(void)
> @@ -962,10 +972,10 @@ void flush_cache_page(struct vm_area_struct *vma, unsigned long u_vaddr,
>
> u_vaddr &= PAGE_MASK;
>
> - __flush_dcache_page(paddr, u_vaddr);
> + __flush_dcache_pages(paddr, u_vaddr, 1);
>
> if (vma->vm_flags & VM_EXEC)
> - __inv_icache_page(paddr, u_vaddr);
> + __inv_icache_pages(paddr, u_vaddr, 1);
> }
>
> void flush_cache_range(struct vm_area_struct *vma, unsigned long start,
> @@ -978,9 +988,9 @@ void flush_anon_page(struct vm_area_struct *vma, struct page *page,
> unsigned long u_vaddr)
> {
> /* TBD: do we really need to clear the kernel mapping */
> - __flush_dcache_page((phys_addr_t)page_address(page), u_vaddr);
> - __flush_dcache_page((phys_addr_t)page_address(page),
> - (phys_addr_t)page_address(page));
> + __flush_dcache_pages((phys_addr_t)page_address(page), u_vaddr, 1);
> + __flush_dcache_pages((phys_addr_t)page_address(page),
> + (phys_addr_t)page_address(page), 1);
>
> }
>
> @@ -989,6 +999,8 @@ void flush_anon_page(struct vm_area_struct *vma, struct page *page,
> void copy_user_highpage(struct page *to, struct page *from,
> unsigned long u_vaddr, struct vm_area_struct *vma)
> {
> + struct folio *src = page_folio(from);
> + struct folio *dst = page_folio(to);
> void *kfrom = kmap_atomic(from);
> void *kto = kmap_atomic(to);
> int clean_src_k_mappings = 0;
> @@ -1005,7 +1017,7 @@ void copy_user_highpage(struct page *to, struct page *from,
> * addr_not_cache_congruent() is 0
> */
> if (page_mapcount(from) && addr_not_cache_congruent(kfrom, u_vaddr)) {
> - __flush_dcache_page((unsigned long)kfrom, u_vaddr);
> + __flush_dcache_pages((unsigned long)kfrom, u_vaddr, 1);
> clean_src_k_mappings = 1;
> }
>
> @@ -1019,17 +1031,17 @@ void copy_user_highpage(struct page *to, struct page *from,
> * non copied user pages (e.g. read faults which wire in pagecache page
> * directly).
> */
> - clear_bit(PG_dc_clean, &to->flags);
> + clear_bit(PG_dc_clean, &dst->flags);
>
> /*
> * if SRC was already usermapped and non-congruent to kernel mapping
> * sync the kernel mapping back to physical page
> */
> if (clean_src_k_mappings) {
> - __flush_dcache_page((unsigned long)kfrom, (unsigned long)kfrom);
> - set_bit(PG_dc_clean, &from->flags);
> + __flush_dcache_pages((unsigned long)kfrom,
> + (unsigned long)kfrom, 1);
> } else {
> - clear_bit(PG_dc_clean, &from->flags);
> + clear_bit(PG_dc_clean, &src->flags);
> }
>
> kunmap_atomic(kto);
> @@ -1038,8 +1050,9 @@ void copy_user_highpage(struct page *to, struct page *from,
>
> void clear_user_page(void *to, unsigned long u_vaddr, struct page *page)
> {
> + struct folio *folio = page_folio(page);
> clear_page(to);
> - clear_bit(PG_dc_clean, &page->flags);
> + clear_bit(PG_dc_clean, &folio->flags);
> }
> EXPORT_SYMBOL(clear_user_page);
>
> diff --git a/arch/arc/mm/tlb.c b/arch/arc/mm/tlb.c
> index 5f71445f26bd..0a996b65bb4e 100644
> --- a/arch/arc/mm/tlb.c
> +++ b/arch/arc/mm/tlb.c
> @@ -467,8 +467,8 @@ void create_tlb(struct vm_area_struct *vma, unsigned long vaddr, pte_t *ptep)
> * Note that flush (when done) involves both WBACK - so physical page is
> * in sync as well as INV - so any non-congruent aliases don't remain
> */
> -void update_mmu_cache(struct vm_area_struct *vma, unsigned long vaddr_unaligned,
> - pte_t *ptep)
> +void update_mmu_cache_range(struct vm_area_struct *vma,
> + unsigned long vaddr_unaligned, pte_t *ptep, unsigned int nr)
> {
> unsigned long vaddr = vaddr_unaligned & PAGE_MASK;
> phys_addr_t paddr = pte_val(*ptep) & PAGE_MASK_PHYS;
> @@ -491,15 +491,19 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned long vaddr_unaligned,
> */
> if ((vma->vm_flags & VM_EXEC) ||
> addr_not_cache_congruent(paddr, vaddr)) {
> -
> - int dirty = !test_and_set_bit(PG_dc_clean, &page->flags);
> + struct folio *folio = page_folio(page);
> + int dirty = !test_and_set_bit(PG_dc_clean, &folio->flags);
> if (dirty) {
> + unsigned long offset = offset_in_folio(folio, paddr);
> + nr = folio_nr_pages(folio);
> + paddr -= offset;
> + vaddr -= offset;
> /* wback + inv dcache lines (K-mapping) */
> - __flush_dcache_page(paddr, paddr);
> + __flush_dcache_pages(paddr, paddr, nr);
>
> /* invalidate any existing icache lines (U-mapping) */
> if (vma->vm_flags & VM_EXEC)
> - __inv_icache_page(paddr, vaddr);
> + __inv_icache_pages(paddr, vaddr, nr);
> }
> }
> }
> @@ -531,7 +535,7 @@ void update_mmu_cache_pmd(struct vm_area_struct *vma, unsigned long addr,
> pmd_t *pmd)
> {
> pte_t pte = __pte(pmd_val(*pmd));
> - update_mmu_cache(vma, addr, &pte);
> + update_mmu_cache_range(vma, addr, &pte, HPAGE_PMD_NR);
> }
>
> void local_flush_pmd_tlb_range(struct vm_area_struct *vma, unsigned long start,
> diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
> index a094f964c869..841e268d2374 100644
> --- a/arch/arm/include/asm/cacheflush.h
> +++ b/arch/arm/include/asm/cacheflush.h
> @@ -231,14 +231,15 @@ vivt_flush_cache_range(struct vm_area_struct *vma, unsigned long start, unsigned
> vma->vm_flags);
> }
>
> -static inline void
> -vivt_flush_cache_page(struct vm_area_struct *vma, unsigned long user_addr, unsigned long pfn)
> +static inline void vivt_flush_cache_pages(struct vm_area_struct *vma,
> + unsigned long user_addr, unsigned long pfn, unsigned int nr)
> {
> struct mm_struct *mm = vma->vm_mm;
>
> if (!mm || cpumask_test_cpu(smp_processor_id(), mm_cpumask(mm))) {
> unsigned long addr = user_addr & PAGE_MASK;
> - __cpuc_flush_user_range(addr, addr + PAGE_SIZE, vma->vm_flags);
> + __cpuc_flush_user_range(addr, addr + nr * PAGE_SIZE,
> + vma->vm_flags);
> }
> }
>
> @@ -247,15 +248,17 @@ vivt_flush_cache_page(struct vm_area_struct *vma, unsigned long user_addr, unsig
> vivt_flush_cache_mm(mm)
> #define flush_cache_range(vma,start,end) \
> vivt_flush_cache_range(vma,start,end)
> -#define flush_cache_page(vma,addr,pfn) \
> - vivt_flush_cache_page(vma,addr,pfn)
> +#define flush_cache_pages(vma, addr, pfn, nr) \
> + vivt_flush_cache_pages(vma, addr, pfn, nr)
> #else
> -extern void flush_cache_mm(struct mm_struct *mm);
> -extern void flush_cache_range(struct vm_area_struct *vma, unsigned long start, unsigned long end);
> -extern void flush_cache_page(struct vm_area_struct *vma, unsigned long user_addr, unsigned long pfn);
> +void flush_cache_mm(struct mm_struct *mm);
> +void flush_cache_range(struct vm_area_struct *vma, unsigned long start, unsigned long end);
> +void flush_cache_pages(struct vm_area_struct *vma, unsigned long user_addr,
> + unsigned long pfn, unsigned int nr);
> #endif
>
> #define flush_cache_dup_mm(mm) flush_cache_mm(mm)
> +#define flush_cache_page(vma, addr, pfn) flush_cache_pages(vma, addr, pfn, 1)
>
> /*
> * flush_icache_user_range is used when we want to ensure that the
> @@ -289,7 +292,9 @@ extern void flush_cache_page(struct vm_area_struct *vma, unsigned long user_addr
> * See update_mmu_cache for the user space part.
> */
> #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
> -extern void flush_dcache_page(struct page *);
> +void flush_dcache_page(struct page *);
> +void flush_dcache_folio(struct folio *folio);
> +#define flush_dcache_folio flush_dcache_folio
>
> #define ARCH_IMPLEMENTS_FLUSH_KERNEL_VMAP_RANGE 1
> static inline void flush_kernel_vmap_range(void *addr, int size)
> @@ -321,6 +326,7 @@ static inline void flush_anon_page(struct vm_area_struct *vma,
> * duplicate cache flushing elsewhere performed by flush_dcache_page().
> */
> #define flush_icache_page(vma,page) do { } while (0)
> +#define flush_icache_pages(vma, page, nr) do { } while (0)
>
> /*
> * flush_cache_vmap() is used when creating mappings (eg, via vmap,
> diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
> index a58ccbb406ad..6525ac82bd50 100644
> --- a/arch/arm/include/asm/pgtable.h
> +++ b/arch/arm/include/asm/pgtable.h
> @@ -207,8 +207,9 @@ static inline void __sync_icache_dcache(pte_t pteval)
> extern void __sync_icache_dcache(pte_t pteval);
> #endif
>
> -void set_pte_at(struct mm_struct *mm, unsigned long addr,
> - pte_t *ptep, pte_t pteval);
> +void set_ptes(struct mm_struct *mm, unsigned long addr,
> + pte_t *ptep, pte_t pteval, unsigned int nr);
> +#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
>
> static inline pte_t clear_pte_bit(pte_t pte, pgprot_t prot)
> {
> diff --git a/arch/arm/include/asm/tlbflush.h b/arch/arm/include/asm/tlbflush.h
> index 0ccc985b90af..7d792e485f4f 100644
> --- a/arch/arm/include/asm/tlbflush.h
> +++ b/arch/arm/include/asm/tlbflush.h
> @@ -619,18 +619,21 @@ extern void flush_bp_all(void);
> * If PG_dcache_clean is not set for the page, we need to ensure that any
> * cache entries for the kernels virtual memory range are written
> * back to the page. On ARMv6 and later, the cache coherency is handled via
> - * the set_pte_at() function.
> + * the set_ptes() function.
> */
> #if __LINUX_ARM_ARCH__ < 6
> -extern void update_mmu_cache(struct vm_area_struct *vma, unsigned long addr,
> - pte_t *ptep);
> +void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long addr,
> + pte_t *ptep, unsigned int nr);
> #else
> -static inline void update_mmu_cache(struct vm_area_struct *vma,
> - unsigned long addr, pte_t *ptep)
> +static inline void update_mmu_cache_range(struct vm_area_struct *vma,
> + unsigned long addr, pte_t *ptep, unsigned int nr)
> {
> }
> #endif
>
> +#define update_mmu_cache(vma, addr, ptep) \
> + update_mmu_cache_range(vma, addr, ptep, 1)
> +
> #define update_mmu_cache_pmd(vma, address, pmd) do { } while (0)
>
> #endif
> diff --git a/arch/arm/mm/copypage-v4mc.c b/arch/arm/mm/copypage-v4mc.c
> index f1da3b439b96..7ddd82b9fe8b 100644
> --- a/arch/arm/mm/copypage-v4mc.c
> +++ b/arch/arm/mm/copypage-v4mc.c
> @@ -64,10 +64,11 @@ static void mc_copy_user_page(void *from, void *to)
> void v4_mc_copy_user_highpage(struct page *to, struct page *from,
> unsigned long vaddr, struct vm_area_struct *vma)
> {
> + struct folio *src = page_folio(from);
> void *kto = kmap_atomic(to);
>
> - if (!test_and_set_bit(PG_dcache_clean, &from->flags))
> - __flush_dcache_page(page_mapping_file(from), from);
> + if (!test_and_set_bit(PG_dcache_clean, &src->flags))
> + __flush_dcache_folio(folio_flush_mapping(src), src);
>
> raw_spin_lock(&minicache_lock);
>
> diff --git a/arch/arm/mm/copypage-v6.c b/arch/arm/mm/copypage-v6.c
> index d8a115de5507..a1a71f36d850 100644
> --- a/arch/arm/mm/copypage-v6.c
> +++ b/arch/arm/mm/copypage-v6.c
> @@ -69,11 +69,12 @@ static void discard_old_kernel_data(void *kto)
> static void v6_copy_user_highpage_aliasing(struct page *to,
> struct page *from, unsigned long vaddr, struct vm_area_struct *vma)
> {
> + struct folio *src = page_folio(from);
> unsigned int offset = CACHE_COLOUR(vaddr);
> unsigned long kfrom, kto;
>
> - if (!test_and_set_bit(PG_dcache_clean, &from->flags))
> - __flush_dcache_page(page_mapping_file(from), from);
> + if (!test_and_set_bit(PG_dcache_clean, &src->flags))
> + __flush_dcache_folio(folio_flush_mapping(src), src);
>
> /* FIXME: not highmem safe */
> discard_old_kernel_data(page_address(to));
> diff --git a/arch/arm/mm/copypage-xscale.c b/arch/arm/mm/copypage-xscale.c
> index bcb485620a05..f1e29d3e8193 100644
> --- a/arch/arm/mm/copypage-xscale.c
> +++ b/arch/arm/mm/copypage-xscale.c
> @@ -84,10 +84,11 @@ static void mc_copy_user_page(void *from, void *to)
> void xscale_mc_copy_user_highpage(struct page *to, struct page *from,
> unsigned long vaddr, struct vm_area_struct *vma)
> {
> + struct folio *src = page_folio(from);
> void *kto = kmap_atomic(to);
>
> - if (!test_and_set_bit(PG_dcache_clean, &from->flags))
> - __flush_dcache_page(page_mapping_file(from), from);
> + if (!test_and_set_bit(PG_dcache_clean, &src->flags))
> + __flush_dcache_folio(folio_flush_mapping(src), src);
>
> raw_spin_lock(&minicache_lock);
>
> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
> index 8bc01071474a..5ecfde41d70a 100644
> --- a/arch/arm/mm/dma-mapping.c
> +++ b/arch/arm/mm/dma-mapping.c
> @@ -693,6 +693,7 @@ static void __dma_page_cpu_to_dev(struct page *page, unsigned long off,
> static void __dma_page_dev_to_cpu(struct page *page, unsigned long off,
> size_t size, enum dma_data_direction dir)
> {
> + struct folio *folio = page_folio(page);
> phys_addr_t paddr = page_to_phys(page) + off;
>
> /* FIXME: non-speculating: not required */
> @@ -707,19 +708,18 @@ static void __dma_page_dev_to_cpu(struct page *page, unsigned long off,
> * Mark the D-cache clean for these pages to avoid extra flushing.
> */
> if (dir != DMA_TO_DEVICE && size >= PAGE_SIZE) {
> - unsigned long pfn;
> - size_t left = size;
> -
> - pfn = page_to_pfn(page) + off / PAGE_SIZE;
> - off %= PAGE_SIZE;
> - if (off) {
> - pfn++;
> - left -= PAGE_SIZE - off;
> + ssize_t left = size;
> + size_t offset = offset_in_folio(folio, paddr);
> +
> + if (offset) {
> + left -= folio_size(folio) - offset;
> + folio = folio_next(folio);
> }
> - while (left >= PAGE_SIZE) {
> - page = pfn_to_page(pfn++);
> - set_bit(PG_dcache_clean, &page->flags);
> - left -= PAGE_SIZE;
> +
> + while (left >= (ssize_t)folio_size(folio)) {
> + set_bit(PG_dcache_clean, &folio->flags);
> + left -= folio_size(folio);
> + folio = folio_next(folio);
> }
> }
> }
> diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c
> index 0e49154454a6..e2c869b8f012 100644
> --- a/arch/arm/mm/fault-armv.c
> +++ b/arch/arm/mm/fault-armv.c
> @@ -178,8 +178,8 @@ make_coherent(struct address_space *mapping, struct vm_area_struct *vma,
> *
> * Note that the pte lock will be held.
> */
> -void update_mmu_cache(struct vm_area_struct *vma, unsigned long addr,
> - pte_t *ptep)
> +void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long addr,
> + pte_t *ptep, unsigned int nr)
> {
> unsigned long pfn = pte_pfn(*ptep);
> struct address_space *mapping;
> @@ -192,13 +192,13 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned long addr,
> * The zero page is never written to, so never has any dirty
> * cache lines, and therefore never needs to be flushed.
> */
> - page = pfn_to_page(pfn);
> - if (page == ZERO_PAGE(0))
> + if (is_zero_pfn(pfn))
> return;
>
> - mapping = page_mapping_file(page);
> - if (!test_and_set_bit(PG_dcache_clean, &page->flags))
> - __flush_dcache_page(mapping, page);
> + folio = page_folio(pfn_to_page(pfn));
> + mapping = folio_flush_mapping(folio);
> + if (!test_and_set_bit(PG_dcache_clean, &folio->flags))
> + __flush_dcache_folio(mapping, folio);
> if (mapping) {
> if (cache_is_vivt())
> make_coherent(mapping, vma, addr, ptep, pfn);
> diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c
> index 7ff9feea13a6..07ea0ab51099 100644
> --- a/arch/arm/mm/flush.c
> +++ b/arch/arm/mm/flush.c
> @@ -95,10 +95,10 @@ void flush_cache_range(struct vm_area_struct *vma, unsigned long start, unsigned
> __flush_icache_all();
> }
>
> -void flush_cache_page(struct vm_area_struct *vma, unsigned long user_addr, unsigned long pfn)
> +void flush_cache_pages(struct vm_area_struct *vma, unsigned long user_addr, unsigned long pfn, unsigned int nr)
> {
> if (cache_is_vivt()) {
> - vivt_flush_cache_page(vma, user_addr, pfn);
> + vivt_flush_cache_pages(vma, user_addr, pfn, nr);
> return;
> }
>
> @@ -196,29 +196,31 @@ void copy_to_user_page(struct vm_area_struct *vma, struct page *page,
> #endif
> }
>
> -void __flush_dcache_page(struct address_space *mapping, struct page *page)
> +void __flush_dcache_folio(struct address_space *mapping, struct folio *folio)
> {
> /*
> * Writeback any data associated with the kernel mapping of this
> * page. This ensures that data in the physical page is mutually
> * coherent with the kernels mapping.
> */
> - if (!PageHighMem(page)) {
> - __cpuc_flush_dcache_area(page_address(page), page_size(page));
> + if (!folio_test_highmem(folio)) {
> + __cpuc_flush_dcache_area(folio_address(folio),
> + folio_size(folio));
> } else {
> unsigned long i;
> if (cache_is_vipt_nonaliasing()) {
> - for (i = 0; i < compound_nr(page); i++) {
> - void *addr = kmap_atomic(page + i);
> + for (i = 0; i < folio_nr_pages(folio); i++) {
> + void *addr = kmap_local_folio(folio,
> + i * PAGE_SIZE);
> __cpuc_flush_dcache_area(addr, PAGE_SIZE);
> - kunmap_atomic(addr);
> + kunmap_local(addr);
> }
> } else {
> - for (i = 0; i < compound_nr(page); i++) {
> - void *addr = kmap_high_get(page + i);
> + for (i = 0; i < folio_nr_pages(folio); i++) {
> + void *addr = kmap_high_get(folio_page(folio, i));
> if (addr) {
> __cpuc_flush_dcache_area(addr, PAGE_SIZE);
> - kunmap_high(page + i);
> + kunmap_high(folio_page(folio, i));
> }
> }
> }
> @@ -230,15 +232,14 @@ void __flush_dcache_page(struct address_space *mapping, struct page *page)
> * userspace colour, which is congruent with page->index.
> */
> if (mapping && cache_is_vipt_aliasing())
> - flush_pfn_alias(page_to_pfn(page),
> - page->index << PAGE_SHIFT);
> + flush_pfn_alias(folio_pfn(folio), folio_pos(folio));
> }
>
> -static void __flush_dcache_aliases(struct address_space *mapping, struct page *page)
> +static void __flush_dcache_aliases(struct address_space *mapping, struct folio *folio)
> {
> struct mm_struct *mm = current->active_mm;
> - struct vm_area_struct *mpnt;
> - pgoff_t pgoff;
> + struct vm_area_struct *vma;
> + pgoff_t pgoff, pgoff_end;
>
> /*
> * There are possible user space mappings of this page:
> @@ -246,21 +247,36 @@ static void __flush_dcache_aliases(struct address_space *mapping, struct page *p
> * data in the current VM view associated with this page.
> * - aliasing VIPT: we only need to find one mapping of this page.
> */
> - pgoff = page->index;
> + pgoff = folio->index;
> + pgoff_end = pgoff + folio_nr_pages(folio) - 1;
>
> flush_dcache_mmap_lock(mapping);
> - vma_interval_tree_foreach(mpnt, &mapping->i_mmap, pgoff, pgoff) {
> - unsigned long offset;
> + vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff_end) {
> + unsigned long start, offset, pfn;
> + unsigned int nr;
>
> /*
> * If this VMA is not in our MM, we can ignore it.
> */
> - if (mpnt->vm_mm != mm)
> + if (vma->vm_mm != mm)
> continue;
> - if (!(mpnt->vm_flags & VM_MAYSHARE))
> + if (!(vma->vm_flags & VM_MAYSHARE))
> continue;
> - offset = (pgoff - mpnt->vm_pgoff) << PAGE_SHIFT;
> - flush_cache_page(mpnt, mpnt->vm_start + offset, page_to_pfn(page));
> +
> + start = vma->vm_start;
> + pfn = folio_pfn(folio);
> + nr = folio_nr_pages(folio);
> + offset = pgoff - vma->vm_pgoff;
> + if (offset > -nr) {
> + pfn -= offset;
> + nr += offset;
> + } else {
> + start += offset * PAGE_SIZE;
> + }
> + if (start + nr * PAGE_SIZE > vma->vm_end)
> + nr = (vma->vm_end - start) / PAGE_SIZE;
> +
> + flush_cache_pages(vma, start, pfn, nr);
> }
> flush_dcache_mmap_unlock(mapping);
> }
> @@ -269,7 +285,7 @@ static void __flush_dcache_aliases(struct address_space *mapping, struct page *p
> void __sync_icache_dcache(pte_t pteval)
> {
> unsigned long pfn;
> - struct page *page;
> + struct folio *folio;
> struct address_space *mapping;
>
> if (cache_is_vipt_nonaliasing() && !pte_exec(pteval))
> @@ -279,14 +295,14 @@ void __sync_icache_dcache(pte_t pteval)
> if (!pfn_valid(pfn))
> return;
>
> - page = pfn_to_page(pfn);
> + folio = page_folio(pfn_to_page(pfn));
> if (cache_is_vipt_aliasing())
> - mapping = page_mapping_file(page);
> + mapping = folio_flush_mapping(folio);
> else
> mapping = NULL;
>
> - if (!test_and_set_bit(PG_dcache_clean, &page->flags))
> - __flush_dcache_page(mapping, page);
> + if (!test_and_set_bit(PG_dcache_clean, &folio->flags))
> + __flush_dcache_folio(mapping, folio);
>
> if (pte_exec(pteval))
> __flush_icache_all();
> @@ -312,7 +328,7 @@ void __sync_icache_dcache(pte_t pteval)
> * Note that we disable the lazy flush for SMP configurations where
> * the cache maintenance operations are not automatically broadcasted.
> */
> -void flush_dcache_page(struct page *page)
> +void flush_dcache_folio(struct folio *folio)
> {
> struct address_space *mapping;
>
> @@ -320,31 +336,36 @@ void flush_dcache_page(struct page *page)
> * The zero page is never written to, so never has any dirty
> * cache lines, and therefore never needs to be flushed.
> */
> - if (page == ZERO_PAGE(0))
> + if (is_zero_pfn(folio_pfn(folio)))
> return;
>
> if (!cache_ops_need_broadcast() && cache_is_vipt_nonaliasing()) {
> - if (test_bit(PG_dcache_clean, &page->flags))
> - clear_bit(PG_dcache_clean, &page->flags);
> + if (test_bit(PG_dcache_clean, &folio->flags))
> + clear_bit(PG_dcache_clean, &folio->flags);
> return;
> }
>
> - mapping = page_mapping_file(page);
> + mapping = folio_flush_mapping(folio);
>
> if (!cache_ops_need_broadcast() &&
> - mapping && !page_mapcount(page))
> - clear_bit(PG_dcache_clean, &page->flags);
> + mapping && !folio_mapped(folio))
> + clear_bit(PG_dcache_clean, &folio->flags);
> else {
> - __flush_dcache_page(mapping, page);
> + __flush_dcache_folio(mapping, folio);
> if (mapping && cache_is_vivt())
> - __flush_dcache_aliases(mapping, page);
> + __flush_dcache_aliases(mapping, folio);
> else if (mapping)
> __flush_icache_all();
> - set_bit(PG_dcache_clean, &page->flags);
> + set_bit(PG_dcache_clean, &folio->flags);
> }
> }
> -EXPORT_SYMBOL(flush_dcache_page);
> +EXPORT_SYMBOL(flush_dcache_folio);
>
> +void flush_dcache_page(struct page *page)
> +{
> + flush_dcache_folio(page_folio(page));
> +}
> +EXPORT_SYMBOL(flush_dcache_page);
> /*
> * Flush an anonymous page so that users of get_user_pages()
> * can safely access the data. The expected sequence is:
> diff --git a/arch/arm/mm/mm.h b/arch/arm/mm/mm.h
> index d7ffccb7fea7..419316316711 100644
> --- a/arch/arm/mm/mm.h
> +++ b/arch/arm/mm/mm.h
> @@ -45,7 +45,7 @@ struct mem_type {
>
> const struct mem_type *get_mem_type(unsigned int type);
>
> -extern void __flush_dcache_page(struct address_space *mapping, struct page *page);
> +void __flush_dcache_folio(struct address_space *mapping, struct folio *folio);
>
> /*
> * ARM specific vm_struct->flags bits.
> diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
> index 463fc2a8448f..9947bbc32b04 100644
> --- a/arch/arm/mm/mmu.c
> +++ b/arch/arm/mm/mmu.c
> @@ -1788,7 +1788,7 @@ void __init paging_init(const struct machine_desc *mdesc)
> bootmem_init();
>
> empty_zero_page = virt_to_page(zero_page);
> - __flush_dcache_page(NULL, empty_zero_page);
> + __flush_dcache_folio(NULL, page_folio(empty_zero_page));
> }
>
> void __init early_mm_init(const struct machine_desc *mdesc)
> @@ -1797,8 +1797,8 @@ void __init early_mm_init(const struct machine_desc *mdesc)
> early_paging_init(mdesc);
> }
>
> -void set_pte_at(struct mm_struct *mm, unsigned long addr,
> - pte_t *ptep, pte_t pteval)
> +void set_ptes(struct mm_struct *mm, unsigned long addr,
> + pte_t *ptep, pte_t pteval, unsigned int nr)
> {
> unsigned long ext = 0;
>
> @@ -1808,5 +1808,11 @@ void set_pte_at(struct mm_struct *mm, unsigned long addr,
> ext |= PTE_EXT_NG;
> }
>
> - set_pte_ext(ptep, pteval, ext);
> + for (;;) {
> + set_pte_ext(ptep, pteval, ext);
> + if (--nr == 0)
> + break;
> + ptep++;
> + pte_val(pteval) += PAGE_SIZE;
> + }
> }


2023-02-28 06:58:40

by Christophe Leroy

[permalink] [raw]
Subject: Re: [PATCH v2 18/30] powerpc: Implement the new page table range API



Le 27/02/2023 à 21:20, Matthew Wilcox a écrit :
> On Mon, Feb 27, 2023 at 07:45:08PM +0000, Christophe Leroy wrote:
>> Hi,
>>
>> Le 27/02/2023 à 18:57, Matthew Wilcox (Oracle) a écrit :
>>> Add set_ptes(), update_mmu_cache_range() and flush_dcache_folio().
>>> Change the PG_arch_1 (aka PG_dcache_dirty) flag from being per-page to
>>> per-folio.
>>>
>>> I'm unsure about my merging of flush_dcache_icache_hugepage() and
>>> flush_dcache_icache_page() into flush_dcache_icache_folio() and subsequent
>>> removal of flush_dcache_icache_phys(). Please review.
>>
>> Not sure why you want to remove flush_dcache_icache_phys().
>
> Well, I didn't, necessarily. It's just that when I merged
> flush_dcache_icache_hugepage() and flush_dcache_icache_page()
> together, it was left with no callers.
>
>> Although that's only feasible when the address bus is not wider than 32
>> bits and cannot be done on BOOKE (you can't switch off the MMU on BOOKE),
>> flush_dcache_icache_phys() allows flushing unmapped pages without
>> having to map them first. So it is more efficient.
>
> And it was just never done for the hugepage case?

I think on PPC32 hugepages are available only on 8xx and BOOKE. 8xx
doesn't have HIGHMEM and BOOKE cannot switch MMU off. So there is no use
case for flush_dcache_icache_phys() with hugepages.

>
>>> @@ -148,17 +103,20 @@ static void __flush_dcache_icache(void *p)
>>> invalidate_icache_range(addr, addr + PAGE_SIZE);
>>> }
>>>
>>> -static void flush_dcache_icache_hugepage(struct page *page)
>>> +void flush_dcache_icache_folio(struct folio *folio)
>>> {
>>> - int i;
>>> - int nr = compound_nr(page);
>>> + unsigned int i, nr = folio_nr_pages(folio);
>>>
>>> - if (!PageHighMem(page)) {
>>> + if (flush_coherent_icache())
>>> + return;
>>> +
>>> + if (!folio_test_highmem(folio)) {
>>> + void *addr = folio_address(folio);
>>> for (i = 0; i < nr; i++)
>>> - __flush_dcache_icache(lowmem_page_address(page + i));
>>> + __flush_dcache_icache(addr + i * PAGE_SIZE);
>>> } else {
>>> for (i = 0; i < nr; i++) {
>>> - void *start = kmap_local_page(page + i);
>>> + void *start = kmap_local_folio(folio, i * PAGE_SIZE);
>>>
>>> __flush_dcache_icache(start);
>>> kunmap_local(start);
>
> So you'd like this to be:
>
> } else if (IS_ENABLED(CONFIG_BOOKE) || sizeof(phys_addr_t) > sizeof(void *)) {
> for (i = 0; i < nr; i++) {
> void *start = kmap_local_folio(folio, i * PAGE_SIZE);
> __flush_dcache_icache(start);
> kunmap_local(start);
> }
> } else {
> unsigned long pfn = folio_pfn(folio);
> for (i = 0; i < nr; i++)
> flush_dcache_icache_phys((pfn + i) * PAGE_SIZE);
> }
>
> (or maybe you'd prefer a flush_dcache_icache_pfn() that doesn't need to
> worry about PAGE_MASK).

Yes looks good.


Christophe
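
For illustration, the pfn-based variant floated above could be as small as
the following sketch (assuming flush_dcache_icache_phys() is kept; the name
flush_dcache_icache_pfn() is only the one suggested in this thread):

static void flush_dcache_icache_pfn(unsigned long pfn)
{
	/* Callers pass a pfn, so there is no PAGE_MASK handling to get wrong. */
	flush_dcache_icache_phys(pfn << PAGE_SHIFT);
}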

2023-02-28 16:26:04

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH v2 06/30] arc: Implement the new page table range API

On Mon, Feb 27, 2023 at 10:34:05PM -0800, Vineet Gupta wrote:
> You need to split ARC and ARM into separate patches.

Ugh. Looks like I inadvertently squashed them together during a rebase.

c228f5b4e007 HEAD@{121}: rebase (reword): arm64: Implement the new page table ra
nge API
22744c8ae873 HEAD@{122}: rebase (fixup): arc: Implement the new page table range
API
11da1e433338 HEAD@{123}: rebase (fixup): # This is a combination of 2 commits.
d68d7ab9b184 HEAD@{124}: rebase (start): checkout next-20230225

I was trying to fixup an arm commit and looks like i squashed the arm
commit with the arc commit instead. Will fix and resend.

> Also it'd be best to drop all the VIPT aliasing bits for ARC, they are a
> needless maintenance burden.
> I can send a patch which you could carry in your tree for easier logistics.

Works for me! I don't mind if you want to drop the VIPT bits before
or after my changes; I can adapt to either. Thanks