2023-01-13 17:22:07

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH mm-unstable v1 00/26] mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with swap PTEs

This is the follow-up on [1]:
[PATCH v2 0/8] mm: COW fixes part 3: reliable GUP R/W FOLL_GET of
anonymous pages

After we implemented __HAVE_ARCH_PTE_SWP_EXCLUSIVE on most prominent
enterprise architectures, implement __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all
remaining architectures that support swap PTEs.

This makes sure that exclusive anonymous pages will stay exclusive, even
after they were swapped out -- for example, making GUP R/W FOLL_GET of
anonymous pages reliable. Details can be found in [1].

This primarily fixes remaining known O_DIRECT memory corruptions that can
happen on concurrent swapout, whereby we can lose DMA reads to a page
(modifying the user page by writing to it).

To verify, there are two test cases (requiring swap space, obviously):
(1) The O_DIRECT+swapout test case [2] from Andrea. This test case tries
triggering a race condition.
(2) My vmsplice() test case [3] that tries to detect if the exclusive
marker was lost during swapout, not relying on a race condition.


For example, on 32bit x86 (with and without PAE), my test case fails
without these patches:
$ ./test_swp_exclusive
FAIL: page was replaced during COW
But succeeds with these patches:
$ ./test_swp_exclusive
PASS: page was not replaced during COW


Why implement __HAVE_ARCH_PTE_SWP_EXCLUSIVE for all architectures, even
the ones where swap support might be in a questionable state? This is the
first step towards removing "readable_exclusive" migration entries, and
instead using pte_swp_exclusive() also with (readable) migration entries
instead (as suggested by Peter). The only missing piece for that is
supporting pmd_swp_exclusive() on relevant architectures with THP
migration support.

As all relevant architectures now implement __HAVE_ARCH_PTE_SWP_EXCLUSIVE,,
we can drop __HAVE_ARCH_PTE_SWP_EXCLUSIVE in the last patch.

I tried cross-compiling all relevant setups and tested on x86 and sparc64
so far.

CCing arch maintainers only on this cover letter and on the respective
patch(es).

[1] https://lkml.kernel.org/r/[email protected]
[2] https://gitlab.com/aarcange/kernel-testcases-for-v5.11/-/blob/main/page_count_do_wp_page-swap.c
[3] https://gitlab.com/davidhildenbrand/scratchspace/-/blob/main/test_swp_exclusive.c


RFC -> v1:
* Some smaller comment+patch description changes
* "powerpc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on 32bit book3s"
-> Fixup swap PTE description


David Hildenbrand (26):
mm/debug_vm_pgtable: more pte_swp_exclusive() sanity checks
alpha/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
arc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
arm/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
csky/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
hexagon/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
ia64/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
loongarch/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
m68k/mm: remove dummy __swp definitions for nommu
m68k/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
microblaze/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
mips/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
nios2/mm: refactor swap PTE layout
nios2/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
openrisc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
parisc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
powerpc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on 32bit book3s
powerpc/nohash/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
riscv/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
sh/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
sparc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on 32bit
sparc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on 64bit
um/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
x86/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE also on 32bit
xtensa/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
mm: remove __HAVE_ARCH_PTE_SWP_EXCLUSIVE

arch/alpha/include/asm/pgtable.h | 40 ++++++++-
arch/arc/include/asm/pgtable-bits-arcv2.h | 26 +++++-
arch/arm/include/asm/pgtable-2level.h | 3 +
arch/arm/include/asm/pgtable-3level.h | 3 +
arch/arm/include/asm/pgtable.h | 34 +++++--
arch/arm64/include/asm/pgtable.h | 1 -
arch/csky/abiv1/inc/abi/pgtable-bits.h | 13 ++-
arch/csky/abiv2/inc/abi/pgtable-bits.h | 19 ++--
arch/csky/include/asm/pgtable.h | 17 ++++
arch/hexagon/include/asm/pgtable.h | 36 ++++++--
arch/ia64/include/asm/pgtable.h | 31 ++++++-
arch/loongarch/include/asm/pgtable-bits.h | 4 +
arch/loongarch/include/asm/pgtable.h | 38 +++++++-
arch/m68k/include/asm/mcf_pgtable.h | 35 +++++++-
arch/m68k/include/asm/motorola_pgtable.h | 37 +++++++-
arch/m68k/include/asm/pgtable_no.h | 6 --
arch/m68k/include/asm/sun3_pgtable.h | 38 +++++++-
arch/microblaze/include/asm/pgtable.h | 44 +++++++---
arch/mips/include/asm/pgtable-32.h | 88 ++++++++++++++++---
arch/mips/include/asm/pgtable-64.h | 23 ++++-
arch/mips/include/asm/pgtable.h | 35 ++++++++
arch/nios2/include/asm/pgtable-bits.h | 3 +
arch/nios2/include/asm/pgtable.h | 37 ++++++--
arch/openrisc/include/asm/pgtable.h | 40 +++++++--
arch/parisc/include/asm/pgtable.h | 40 ++++++++-
arch/powerpc/include/asm/book3s/32/pgtable.h | 37 ++++++--
arch/powerpc/include/asm/book3s/64/pgtable.h | 1 -
arch/powerpc/include/asm/nohash/32/pgtable.h | 22 +++--
arch/powerpc/include/asm/nohash/32/pte-40x.h | 6 +-
arch/powerpc/include/asm/nohash/32/pte-44x.h | 18 +---
arch/powerpc/include/asm/nohash/32/pte-85xx.h | 4 +-
arch/powerpc/include/asm/nohash/64/pgtable.h | 24 ++++-
arch/powerpc/include/asm/nohash/pgtable.h | 15 ++++
arch/powerpc/include/asm/nohash/pte-e500.h | 1 -
arch/riscv/include/asm/pgtable-bits.h | 3 +
arch/riscv/include/asm/pgtable.h | 28 ++++--
arch/s390/include/asm/pgtable.h | 1 -
arch/sh/include/asm/pgtable_32.h | 53 ++++++++---
arch/sparc/include/asm/pgtable_32.h | 26 +++++-
arch/sparc/include/asm/pgtable_64.h | 37 +++++++-
arch/sparc/include/asm/pgtsrmmu.h | 14 +--
arch/um/include/asm/pgtable.h | 36 +++++++-
arch/x86/include/asm/pgtable-2level.h | 26 ++++--
arch/x86/include/asm/pgtable-3level.h | 26 +++++-
arch/x86/include/asm/pgtable.h | 3 -
arch/xtensa/include/asm/pgtable.h | 31 +++++--
include/linux/pgtable.h | 29 ------
mm/debug_vm_pgtable.c | 25 +++++-
mm/memory.c | 4 -
mm/rmap.c | 11 ---
50 files changed, 944 insertions(+), 228 deletions(-)

--
2.39.0


2023-01-13 17:23:27

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH mm-unstable v1 05/26] csky/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE

Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by stealing one bit from the
offset. This reduces the maximum swap space per file to 16 GiB (was 32
GiB).

We might actually be able to reuse one of the other software bits
(_PAGE_READ / PAGE_WRITE) instead, because we only have to keep
pte_present(), pte_none() and HW happy. For now, let's keep it simple
because there might be something non-obvious.

Cc: Guo Ren <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
arch/csky/abiv1/inc/abi/pgtable-bits.h | 13 +++++++++----
arch/csky/abiv2/inc/abi/pgtable-bits.h | 19 ++++++++++++-------
arch/csky/include/asm/pgtable.h | 18 ++++++++++++++++++
3 files changed, 39 insertions(+), 11 deletions(-)

diff --git a/arch/csky/abiv1/inc/abi/pgtable-bits.h b/arch/csky/abiv1/inc/abi/pgtable-bits.h
index 752c8b3f9194..ae7a2f76dd42 100644
--- a/arch/csky/abiv1/inc/abi/pgtable-bits.h
+++ b/arch/csky/abiv1/inc/abi/pgtable-bits.h
@@ -10,6 +10,9 @@
#define _PAGE_ACCESSED (1<<3)
#define _PAGE_MODIFIED (1<<4)

+/* We borrow bit 9 to store the exclusive marker in swap PTEs. */
+#define _PAGE_SWP_EXCLUSIVE (1<<9)
+
/* implemented in hardware */
#define _PAGE_GLOBAL (1<<6)
#define _PAGE_VALID (1<<7)
@@ -26,7 +29,8 @@
#define _PAGE_PROT_NONE _PAGE_READ

/*
- * Encode and decode a swap entry
+ * Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
+ * are !pte_none() && !pte_present().
*
* Format of swap PTE:
* bit 0: _PAGE_PRESENT (zero)
@@ -35,15 +39,16 @@
* bit 6: _PAGE_GLOBAL (zero)
* bit 7: _PAGE_VALID (zero)
* bit 8: swap type[4]
- * bit 9 - 31: swap offset
+ * bit 9: exclusive marker
+ * bit 10 - 31: swap offset
*/
#define __swp_type(x) ((((x).val >> 2) & 0xf) | \
(((x).val >> 4) & 0x10))
-#define __swp_offset(x) ((x).val >> 9)
+#define __swp_offset(x) ((x).val >> 10)
#define __swp_entry(type, offset) ((swp_entry_t) { \
((type & 0xf) << 2) | \
((type & 0x10) << 4) | \
- ((offset) << 9)})
+ ((offset) << 10)})

#define HAVE_ARCH_UNMAPPED_AREA

diff --git a/arch/csky/abiv2/inc/abi/pgtable-bits.h b/arch/csky/abiv2/inc/abi/pgtable-bits.h
index 7e7f389f546f..526152bd2156 100644
--- a/arch/csky/abiv2/inc/abi/pgtable-bits.h
+++ b/arch/csky/abiv2/inc/abi/pgtable-bits.h
@@ -10,6 +10,9 @@
#define _PAGE_PRESENT (1<<10)
#define _PAGE_MODIFIED (1<<11)

+/* We borrow bit 7 to store the exclusive marker in swap PTEs. */
+#define _PAGE_SWP_EXCLUSIVE (1<<7)
+
/* implemented in hardware */
#define _PAGE_GLOBAL (1<<0)
#define _PAGE_VALID (1<<1)
@@ -26,23 +29,25 @@
#define _PAGE_PROT_NONE _PAGE_WRITE

/*
- * Encode and decode a swap entry
+ * Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
+ * are !pte_none() && !pte_present().
*
* Format of swap PTE:
* bit 0: _PAGE_GLOBAL (zero)
* bit 1: _PAGE_VALID (zero)
* bit 2 - 6: swap type
- * bit 7 - 8: swap offset[0 - 1]
+ * bit 7: exclusive marker
+ * bit 8: swap offset[0]
* bit 9: _PAGE_WRITE (zero)
* bit 10: _PAGE_PRESENT (zero)
- * bit 11 - 31: swap offset[2 - 22]
+ * bit 11 - 31: swap offset[1 - 21]
*/
#define __swp_type(x) (((x).val >> 2) & 0x1f)
-#define __swp_offset(x) ((((x).val >> 7) & 0x3) | \
- (((x).val >> 9) & 0x7ffffc))
+#define __swp_offset(x) ((((x).val >> 8) & 0x1) | \
+ (((x).val >> 10) & 0x3ffffe))
#define __swp_entry(type, offset) ((swp_entry_t) { \
((type & 0x1f) << 2) | \
- ((offset & 0x3) << 7) | \
- ((offset & 0x7ffffc) << 9)})
+ ((offset & 0x1) << 8) | \
+ ((offset & 0x3ffffe) << 10)})

#endif /* __ASM_CSKY_PGTABLE_BITS_H */
diff --git a/arch/csky/include/asm/pgtable.h b/arch/csky/include/asm/pgtable.h
index 77bc6caff2d2..574c97b9ecca 100644
--- a/arch/csky/include/asm/pgtable.h
+++ b/arch/csky/include/asm/pgtable.h
@@ -200,6 +200,24 @@ static inline pte_t pte_mkyoung(pte_t pte)
return pte;
}

+#define __HAVE_ARCH_PTE_SWP_EXCLUSIVE
+static inline int pte_swp_exclusive(pte_t pte)
+{
+ return pte_val(pte) & _PAGE_SWP_EXCLUSIVE;
+}
+
+static inline pte_t pte_swp_mkexclusive(pte_t pte)
+{
+ pte_val(pte) |= _PAGE_SWP_EXCLUSIVE;
+ return pte;
+}
+
+static inline pte_t pte_swp_clear_exclusive(pte_t pte)
+{
+ pte_val(pte) &= ~_PAGE_SWP_EXCLUSIVE;
+ return pte;
+}
+
#define __HAVE_PHYS_MEM_ACCESS_PROT
struct file;
extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
--
2.39.0

2023-01-13 17:24:49

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH mm-unstable v1 09/26] m68k/mm: remove dummy __swp definitions for nommu

The definitions are not required, let's remove them.

Cc: Geert Uytterhoeven <[email protected]>
Cc: Greg Ungerer <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
arch/m68k/include/asm/pgtable_no.h | 6 ------
1 file changed, 6 deletions(-)

diff --git a/arch/m68k/include/asm/pgtable_no.h b/arch/m68k/include/asm/pgtable_no.h
index fed58da3a6b6..fc044df52b96 100644
--- a/arch/m68k/include/asm/pgtable_no.h
+++ b/arch/m68k/include/asm/pgtable_no.h
@@ -31,12 +31,6 @@
extern void paging_init(void);
#define swapper_pg_dir ((pgd_t *) 0)

-#define __swp_type(x) (0)
-#define __swp_offset(x) (0)
-#define __swp_entry(typ,off) ((swp_entry_t) { ((typ) | ((off) << 7)) })
-#define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val(pte) })
-#define __swp_entry_to_pte(x) ((pte_t) { (x).val })
-
/*
* ZERO_PAGE is a global shared page that is always zero: used
* for zero-mapped memory areas etc..
--
2.39.0

2023-01-13 17:24:52

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH mm-unstable v1 08/26] loongarch/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE

Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by stealing one bit
from the type. Generic MM currently only uses 5 bits for the type
(MAX_SWAPFILES_SHIFT), so the stolen bit is effectively unused.

While at it, also mask the type in mk_swap_pte().

Note that this bit does not conflict with swap PMDs and could also be used
in swap PMD context later.

Cc: Huacai Chen <[email protected]>
Cc: WANG Xuerui <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
arch/loongarch/include/asm/pgtable-bits.h | 4 +++
arch/loongarch/include/asm/pgtable.h | 39 ++++++++++++++++++++---
2 files changed, 39 insertions(+), 4 deletions(-)

diff --git a/arch/loongarch/include/asm/pgtable-bits.h b/arch/loongarch/include/asm/pgtable-bits.h
index 3d1e0a69975a..8b98d22a145b 100644
--- a/arch/loongarch/include/asm/pgtable-bits.h
+++ b/arch/loongarch/include/asm/pgtable-bits.h
@@ -20,6 +20,7 @@
#define _PAGE_SPECIAL_SHIFT 11
#define _PAGE_HGLOBAL_SHIFT 12 /* HGlobal is a PMD bit */
#define _PAGE_PFN_SHIFT 12
+#define _PAGE_SWP_EXCLUSIVE_SHIFT 23
#define _PAGE_PFN_END_SHIFT 48
#define _PAGE_NO_READ_SHIFT 61
#define _PAGE_NO_EXEC_SHIFT 62
@@ -33,6 +34,9 @@
#define _PAGE_PROTNONE (_ULCAST_(1) << _PAGE_PROTNONE_SHIFT)
#define _PAGE_SPECIAL (_ULCAST_(1) << _PAGE_SPECIAL_SHIFT)

+/* We borrow bit 23 to store the exclusive marker in swap PTEs. */
+#define _PAGE_SWP_EXCLUSIVE (_ULCAST_(1) << _PAGE_SWP_EXCLUSIVE_SHIFT)
+
/* Used by TLB hardware (placed in EntryLo*) */
#define _PAGE_VALID (_ULCAST_(1) << _PAGE_VALID_SHIFT)
#define _PAGE_DIRTY (_ULCAST_(1) << _PAGE_DIRTY_SHIFT)
diff --git a/arch/loongarch/include/asm/pgtable.h b/arch/loongarch/include/asm/pgtable.h
index 7a34e900d8c1..c6b8fe7ac43c 100644
--- a/arch/loongarch/include/asm/pgtable.h
+++ b/arch/loongarch/include/asm/pgtable.h
@@ -249,13 +249,26 @@ extern void pud_init(void *addr);
extern void pmd_init(void *addr);

/*
- * Non-present pages: high 40 bits are offset, next 8 bits type,
- * low 16 bits zero.
+ * Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
+ * are !pte_none() && !pte_present().
+ *
+ * Format of swap PTEs:
+ *
+ * 6 6 6 6 5 5 5 5 5 5 5 5 5 5 4 4 4 4 4 4 4 4 4 4 3 3 3 3 3 3 3 3
+ * 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2
+ * <--------------------------- offset ---------------------------
+ *
+ * 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1
+ * 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
+ * --------------> E <--- type ---> <---------- zeroes ---------->
+ *
+ * E is the exclusive marker that is not stored in swap entries.
+ * The zero'ed bits include _PAGE_PRESENT and _PAGE_PROTNONE.
*/
static inline pte_t mk_swap_pte(unsigned long type, unsigned long offset)
-{ pte_t pte; pte_val(pte) = (type << 16) | (offset << 24); return pte; }
+{ pte_t pte; pte_val(pte) = ((type & 0x7f) << 16) | (offset << 24); return pte; }

-#define __swp_type(x) (((x).val >> 16) & 0xff)
+#define __swp_type(x) (((x).val >> 16) & 0x7f)
#define __swp_offset(x) ((x).val >> 24)
#define __swp_entry(type, offset) ((swp_entry_t) { pte_val(mk_swap_pte((type), (offset))) })
#define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val(pte) })
@@ -263,6 +276,24 @@ static inline pte_t mk_swap_pte(unsigned long type, unsigned long offset)
#define __pmd_to_swp_entry(pmd) ((swp_entry_t) { pmd_val(pmd) })
#define __swp_entry_to_pmd(x) ((pmd_t) { (x).val | _PAGE_HUGE })

+#define __HAVE_ARCH_PTE_SWP_EXCLUSIVE
+static inline int pte_swp_exclusive(pte_t pte)
+{
+ return pte_val(pte) & _PAGE_SWP_EXCLUSIVE;
+}
+
+static inline pte_t pte_swp_mkexclusive(pte_t pte)
+{
+ pte_val(pte) |= _PAGE_SWP_EXCLUSIVE;
+ return pte;
+}
+
+static inline pte_t pte_swp_clear_exclusive(pte_t pte)
+{
+ pte_val(pte) &= ~_PAGE_SWP_EXCLUSIVE;
+ return pte;
+}
+
extern void paging_init(void);

#define pte_none(pte) (!(pte_val(pte) & ~_PAGE_GLOBAL))
--
2.39.0

2023-01-13 17:25:25

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH mm-unstable v1 06/26] hexagon/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE

Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by stealing one bit from the
offset. This reduces the maximum swap space per file to 16 GiB (was 32
GiB).

While at it, mask the type in __swp_entry().

Cc: Brian Cain <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
arch/hexagon/include/asm/pgtable.h | 37 +++++++++++++++++++++++++-----
1 file changed, 31 insertions(+), 6 deletions(-)

diff --git a/arch/hexagon/include/asm/pgtable.h b/arch/hexagon/include/asm/pgtable.h
index f7048c18b6f9..7eb008e477c8 100644
--- a/arch/hexagon/include/asm/pgtable.h
+++ b/arch/hexagon/include/asm/pgtable.h
@@ -61,6 +61,9 @@ extern unsigned long empty_zero_page;
* So we'll put up with a bit of inefficiency for now...
*/

+/* We borrow bit 6 to store the exclusive marker in swap PTEs. */
+#define _PAGE_SWP_EXCLUSIVE (1<<6)
+
/*
* Top "FOURTH" level (pgd), which for the Hexagon VM is really
* only the second from the bottom, pgd and pud both being collapsed.
@@ -359,9 +362,12 @@ static inline unsigned long pmd_page_vaddr(pmd_t pmd)
#define ZERO_PAGE(vaddr) (virt_to_page(&empty_zero_page))

/*
+ * Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
+ * are !pte_none() && !pte_present().
+ *
* Swap/file PTE definitions. If _PAGE_PRESENT is zero, the rest of the PTE is
* interpreted as swap information. The remaining free bits are interpreted as
- * swap type/offset tuple. Rather than have the TLB fill handler test
+ * listed below. Rather than have the TLB fill handler test
* _PAGE_PRESENT, we're going to reserve the permissions bits and set them to
* all zeros for swap entries, which speeds up the miss handler at the cost of
* 3 bits of offset. That trade-off can be revisited if necessary, but Hexagon
@@ -371,9 +377,10 @@ static inline unsigned long pmd_page_vaddr(pmd_t pmd)
* Format of swap PTE:
* bit 0: Present (zero)
* bits 1-5: swap type (arch independent layer uses 5 bits max)
- * bits 6-9: bits 3:0 of offset
+ * bit 6: exclusive marker
+ * bits 7-9: bits 2:0 of offset
* bits 10-12: effectively _PAGE_PROTNONE (all zero)
- * bits 13-31: bits 22:4 of swap offset
+ * bits 13-31: bits 21:3 of swap offset
*
* The split offset makes some of the following macros a little gnarly,
* but there's plenty of precedent for this sort of thing.
@@ -383,11 +390,29 @@ static inline unsigned long pmd_page_vaddr(pmd_t pmd)
#define __swp_type(swp_pte) (((swp_pte).val >> 1) & 0x1f)

#define __swp_offset(swp_pte) \
- ((((swp_pte).val >> 6) & 0xf) | (((swp_pte).val >> 9) & 0x7ffff0))
+ ((((swp_pte).val >> 7) & 0x7) | (((swp_pte).val >> 10) & 0x3ffff8))

#define __swp_entry(type, offset) \
((swp_entry_t) { \
- ((type << 1) | \
- ((offset & 0x7ffff0) << 9) | ((offset & 0xf) << 6)) })
+ (((type & 0x1f) << 1) | \
+ ((offset & 0x3ffff8) << 10) | ((offset & 0x7) << 7)) })
+
+#define __HAVE_ARCH_PTE_SWP_EXCLUSIVE
+static inline int pte_swp_exclusive(pte_t pte)
+{
+ return pte_val(pte) & _PAGE_SWP_EXCLUSIVE;
+}
+
+static inline pte_t pte_swp_mkexclusive(pte_t pte)
+{
+ pte_val(pte) |= _PAGE_SWP_EXCLUSIVE;
+ return pte;
+}
+
+static inline pte_t pte_swp_clear_exclusive(pte_t pte)
+{
+ pte_val(pte) &= ~_PAGE_SWP_EXCLUSIVE;
+ return pte;
+}

#endif
--
2.39.0

2023-01-13 17:25:36

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH mm-unstable v1 03/26] arc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE

Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by using bit 5, which is yet
unused. The only important parts seems to be to not use _PAGE_PRESENT
(bit 9).

Cc: Vineet Gupta <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
arch/arc/include/asm/pgtable-bits-arcv2.h | 27 ++++++++++++++++++++---
1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/arch/arc/include/asm/pgtable-bits-arcv2.h b/arch/arc/include/asm/pgtable-bits-arcv2.h
index 515e82db519f..611f412713b9 100644
--- a/arch/arc/include/asm/pgtable-bits-arcv2.h
+++ b/arch/arc/include/asm/pgtable-bits-arcv2.h
@@ -26,6 +26,9 @@
#define _PAGE_GLOBAL (1 << 8) /* ASID agnostic (H) */
#define _PAGE_PRESENT (1 << 9) /* PTE/TLB Valid (H) */

+/* We borrow bit 5 to store the exclusive marker in swap PTEs. */
+#define _PAGE_SWP_EXCLUSIVE _PAGE_DIRTY
+
#ifdef CONFIG_ARC_MMU_V4
#define _PAGE_HW_SZ (1 << 10) /* Normal/super (H) */
#else
@@ -106,9 +109,18 @@ static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
void update_mmu_cache(struct vm_area_struct *vma, unsigned long address,
pte_t *ptep);

-/* Encode swap {type,off} tuple into PTE
- * We reserve 13 bits for 5-bit @type, keeping bits 12-5 zero, ensuring that
- * PAGE_PRESENT is zero in a PTE holding swap "identifier"
+/*
+ * Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
+ * are !pte_none() && !pte_present().
+ *
+ * Format of swap PTEs:
+ *
+ * 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1
+ * 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
+ * <-------------- offset -------------> <--- zero --> E < type ->
+ *
+ * E is the exclusive marker that is not stored in swap entries.
+ * The zero'ed bits include _PAGE_PRESENT.
*/
#define __swp_entry(type, off) ((swp_entry_t) \
{ ((type) & 0x1f) | ((off) << 13) })
@@ -120,6 +132,15 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned long address,
#define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val(pte) })
#define __swp_entry_to_pte(x) ((pte_t) { (x).val })

+#define __HAVE_ARCH_PTE_SWP_EXCLUSIVE
+static inline int pte_swp_exclusive(pte_t pte)
+{
+ return pte_val(pte) & _PAGE_SWP_EXCLUSIVE;
+}
+
+PTE_BIT_FUNC(swp_mkexclusive, |= (_PAGE_SWP_EXCLUSIVE));
+PTE_BIT_FUNC(swp_clear_exclusive, &= ~(_PAGE_SWP_EXCLUSIVE));
+
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
#include <asm/hugepage.h>
#endif
--
2.39.0

2023-01-13 17:25:39

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH mm-unstable v1 21/26] sparc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on 32bit

Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by reusing the SRMMU_DIRTY
bit as that seems to be safe to reuse inside a swap PTE. This avoids
having to steal one bit from the swap offset.

While at it, relocate the swap PTE layout documentation and use the same
style now used for most other archs. Note that the old documentation was
wrong: we use 20 bit for the offset and the reserved bits were 8 instead
of 7 bits in the ascii art.

Cc: "David S. Miller" <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
arch/sparc/include/asm/pgtable_32.h | 27 ++++++++++++++++++++++++++-
arch/sparc/include/asm/pgtsrmmu.h | 14 +++-----------
2 files changed, 29 insertions(+), 12 deletions(-)

diff --git a/arch/sparc/include/asm/pgtable_32.h b/arch/sparc/include/asm/pgtable_32.h
index 5acc05b572e6..abf7a2601209 100644
--- a/arch/sparc/include/asm/pgtable_32.h
+++ b/arch/sparc/include/asm/pgtable_32.h
@@ -323,7 +323,16 @@ void srmmu_mapiorange(unsigned int bus, unsigned long xpa,
unsigned long xva, unsigned int len);
void srmmu_unmapiorange(unsigned long virt_addr, unsigned int len);

-/* Encode and de-code a swap entry */
+/*
+ * Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
+ * are !pte_none() && !pte_present().
+ *
+ * Format of swap PTEs:
+ *
+ * 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1
+ * 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
+ * <-------------- offset ---------------> < type -> E 0 0 0 0 0 0
+ */
static inline unsigned long __swp_type(swp_entry_t entry)
{
return (entry.val >> SRMMU_SWP_TYPE_SHIFT) & SRMMU_SWP_TYPE_MASK;
@@ -344,6 +353,22 @@ static inline swp_entry_t __swp_entry(unsigned long type, unsigned long offset)
#define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val(pte) })
#define __swp_entry_to_pte(x) ((pte_t) { (x).val })

+#define __HAVE_ARCH_PTE_SWP_EXCLUSIVE
+static inline int pte_swp_exclusive(pte_t pte)
+{
+ return pte_val(pte) & SRMMU_SWP_EXCLUSIVE;
+}
+
+static inline pte_t pte_swp_mkexclusive(pte_t pte)
+{
+ return __pte(pte_val(pte) | SRMMU_SWP_EXCLUSIVE);
+}
+
+static inline pte_t pte_swp_clear_exclusive(pte_t pte)
+{
+ return __pte(pte_val(pte) & ~SRMMU_SWP_EXCLUSIVE);
+}
+
static inline unsigned long
__get_phys (unsigned long addr)
{
diff --git a/arch/sparc/include/asm/pgtsrmmu.h b/arch/sparc/include/asm/pgtsrmmu.h
index 6067925972d9..18e68d43f036 100644
--- a/arch/sparc/include/asm/pgtsrmmu.h
+++ b/arch/sparc/include/asm/pgtsrmmu.h
@@ -53,21 +53,13 @@

#define SRMMU_CHG_MASK (0xffffff00 | SRMMU_REF | SRMMU_DIRTY)

-/* SRMMU swap entry encoding
- *
- * We use 5 bits for the type and 19 for the offset. This gives us
- * 32 swapfiles of 4GB each. Encoding looks like:
- *
- * oooooooooooooooooootttttRRRRRRRR
- * fedcba9876543210fedcba9876543210
- *
- * The bottom 7 bits are reserved for protection and status bits, especially
- * PRESENT.
- */
+/* SRMMU swap entry encoding */
#define SRMMU_SWP_TYPE_MASK 0x1f
#define SRMMU_SWP_TYPE_SHIFT 7
#define SRMMU_SWP_OFF_MASK 0xfffff
#define SRMMU_SWP_OFF_SHIFT (SRMMU_SWP_TYPE_SHIFT + 5)
+/* We borrow bit 6 to store the exclusive marker in swap PTEs. */
+#define SRMMU_SWP_EXCLUSIVE SRMMU_DIRTY

/* Some day I will implement true fine grained access bits for
* user pages because the SRMMU gives us the capabilities to
--
2.39.0

2023-01-13 17:25:46

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH mm-unstable v1 20/26] sh/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE

Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by using bit 6 in the PTE,
reducing the swap type in the !CONFIG_X2TLB case to 5 bits. Generic MM
currently only uses 5 bits for the type (MAX_SWAPFILES_SHIFT), so the
stolen bit is effectively unused.

Interrestingly, the swap type in the !CONFIG_X2TLB case could currently
overlap with the _PAGE_PRESENT bit, because there is a sneaky shift by 1 in
__pte_to_swp_entry() and __swp_entry_to_pte(). Bit 0-7 in the architecture
specific swap PTE would get shifted to bit 1-8 in the PTE. As generic MM
uses 5 bits only, this didn't matter so far.

While at it, mask the type in __swp_entry().

Cc: Yoshinori Sato <[email protected]>
Cc: Rich Felker <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
arch/sh/include/asm/pgtable_32.h | 54 +++++++++++++++++++++++++-------
1 file changed, 42 insertions(+), 12 deletions(-)

diff --git a/arch/sh/include/asm/pgtable_32.h b/arch/sh/include/asm/pgtable_32.h
index d0240decacca..c34aa795a9d2 100644
--- a/arch/sh/include/asm/pgtable_32.h
+++ b/arch/sh/include/asm/pgtable_32.h
@@ -423,40 +423,70 @@ static inline unsigned long pmd_page_vaddr(pmd_t pmd)
#endif

/*
- * Encode and de-code a swap entry
+ * Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
+ * are !pte_none() && !pte_present().
*
* Constraints:
* _PAGE_PRESENT at bit 8
* _PAGE_PROTNONE at bit 9
*
- * For the normal case, we encode the swap type into bits 0:7 and the
- * swap offset into bits 10:30. For the 64-bit PTE case, we keep the
- * preserved bits in the low 32-bits and use the upper 32 as the swap
- * offset (along with a 5-bit type), following the same approach as x86
- * PAE. This keeps the logic quite simple.
+ * For the normal case, we encode the swap type and offset into the swap PTE
+ * such that bits 8 and 9 stay zero. For the 64-bit PTE case, we use the
+ * upper 32 for the swap offset and swap type, following the same approach as
+ * x86 PAE. This keeps the logic quite simple.
*
* As is evident by the Alpha code, if we ever get a 64-bit unsigned
* long (swp_entry_t) to match up with the 64-bit PTEs, this all becomes
* much cleaner..
- *
- * NOTE: We should set ZEROs at the position of _PAGE_PRESENT
- * and _PAGE_PROTNONE bits
*/
+
#ifdef CONFIG_X2TLB
+/*
+ * Format of swap PTEs:
+ *
+ * 6 6 6 6 5 5 5 5 5 5 5 5 5 5 4 4 4 4 4 4 4 4 4 4 3 3 3 3 3 3 3 3
+ * 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2
+ * <--------------------- offset ----------------------> < type ->
+ *
+ * 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1
+ * 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
+ * <------------------- zeroes --------------------> E 0 0 0 0 0 0
+ */
#define __swp_type(x) ((x).val & 0x1f)
#define __swp_offset(x) ((x).val >> 5)
-#define __swp_entry(type, offset) ((swp_entry_t){ (type) | (offset) << 5})
+#define __swp_entry(type, offset) ((swp_entry_t){ ((type) & 0x1f) | (offset) << 5})
#define __pte_to_swp_entry(pte) ((swp_entry_t){ (pte).pte_high })
#define __swp_entry_to_pte(x) ((pte_t){ 0, (x).val })

#else
-#define __swp_type(x) ((x).val & 0xff)
+/*
+ * Format of swap PTEs:
+ *
+ * 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1
+ * 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
+ * <--------------- offset ----------------> 0 0 0 0 E < type -> 0
+ *
+ * E is the exclusive marker that is not stored in swap entries.
+ */
+#define __swp_type(x) ((x).val & 0x1f)
#define __swp_offset(x) ((x).val >> 10)
-#define __swp_entry(type, offset) ((swp_entry_t){(type) | (offset) <<10})
+#define __swp_entry(type, offset) ((swp_entry_t){((type) & 0x1f) | (offset) << 10})

#define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val(pte) >> 1 })
#define __swp_entry_to_pte(x) ((pte_t) { (x).val << 1 })
#endif

+/* In both cases, we borrow bit 6 to store the exclusive marker in swap PTEs. */
+#define _PAGE_SWP_EXCLUSIVE _PAGE_USER
+
+#define __HAVE_ARCH_PTE_SWP_EXCLUSIVE
+static inline int pte_swp_exclusive(pte_t pte)
+{
+ return pte.pte_low & _PAGE_SWP_EXCLUSIVE;
+}
+
+PTE_BIT_FUNC(low, swp_mkexclusive, |= _PAGE_SWP_EXCLUSIVE);
+PTE_BIT_FUNC(low, swp_clear_exclusive, &= ~_PAGE_SWP_EXCLUSIVE);
+
#endif /* __ASSEMBLY__ */
#endif /* __ASM_SH_PGTABLE_32_H */
--
2.39.0

2023-01-13 17:25:49

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH mm-unstable v1 13/26] nios2/mm: refactor swap PTE layout

nios2 disables swap for a good reason: it doesn't even provide
sufficient type bits as required by core MM. However, swap entries are
nowadays also used for other purposes (migration entries,
PTE markers, HWPoison, ...), and accidential use could be problematic.

Let's properly use 5 bits for the swap type and document the layout.
Bits 26--31 should get ignored by hardware completely, so they can be
used.

Cc: Dinh Nguyen <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
arch/nios2/include/asm/pgtable.h | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/arch/nios2/include/asm/pgtable.h b/arch/nios2/include/asm/pgtable.h
index ab793bc517f5..d1e5c9eb4643 100644
--- a/arch/nios2/include/asm/pgtable.h
+++ b/arch/nios2/include/asm/pgtable.h
@@ -232,19 +232,21 @@ static inline unsigned long pmd_page_vaddr(pmd_t pmd)
__FILE__, __LINE__, pgd_val(e))

/*
- * Encode and decode a swap entry (must be !pte_none(pte) && !pte_present(pte):
+ * Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
+ * are !pte_none() && !pte_present().
*
- * 31 30 29 28 27 26 25 24 23 22 21 20 19 18 ... 1 0
- * 0 0 0 0 type. 0 0 0 0 0 0 offset.........
+ * Format of swap PTEs:
*
- * This gives us up to 2**2 = 4 swap files and 2**20 * 4K = 4G per swap file.
+ * 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1
+ * 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
+ * 0 < type -> 0 0 0 0 0 0 <-------------- offset --------------->
*
- * Note that the offset field is always non-zero, thus !pte_none(pte) is always
- * true.
+ * Note that the offset field is always non-zero if the swap type is 0, thus
+ * !pte_none() is always true.
*/
-#define __swp_type(swp) (((swp).val >> 26) & 0x3)
+#define __swp_type(swp) (((swp).val >> 26) & 0x1f)
#define __swp_offset(swp) ((swp).val & 0xfffff)
-#define __swp_entry(type, off) ((swp_entry_t) { (((type) & 0x3) << 26) \
+#define __swp_entry(type, off) ((swp_entry_t) { (((type) & 0x1f) << 26) \
| ((off) & 0xfffff) })
#define __swp_entry_to_pte(swp) ((pte_t) { (swp).val })
#define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val(pte) })
--
2.39.0

2023-01-13 17:25:56

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH mm-unstable v1 23/26] um/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE

Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by using bit 10, which is
yet unused for swap PTEs.

The pte_mkuptodate() is a bit weird in __pte_to_swp_entry() for a swap PTE
... but it only messes with bit 1 and 2 and there is a comment in
set_pte(), so leave these bits alone.

While at it, mask the type in __swp_entry().

Cc: Richard Weinberger <[email protected]>
Cc: Anton Ivanov <[email protected]>
Cc: Johannes Berg <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
arch/um/include/asm/pgtable.h | 37 +++++++++++++++++++++++++++++++++--
1 file changed, 35 insertions(+), 2 deletions(-)

diff --git a/arch/um/include/asm/pgtable.h b/arch/um/include/asm/pgtable.h
index 4e3052f2671a..cedc5fd451ce 100644
--- a/arch/um/include/asm/pgtable.h
+++ b/arch/um/include/asm/pgtable.h
@@ -21,6 +21,9 @@
#define _PAGE_PROTNONE 0x010 /* if the user mapped it with PROT_NONE;
pte_present gives true */

+/* We borrow bit 10 to store the exclusive marker in swap PTEs. */
+#define _PAGE_SWP_EXCLUSIVE 0x400
+
#ifdef CONFIG_3_LEVEL_PGTABLES
#include <asm/pgtable-3level.h>
#else
@@ -288,16 +291,46 @@ extern pte_t *virt_to_pte(struct mm_struct *mm, unsigned long addr);

#define update_mmu_cache(vma,address,ptep) do {} while (0)

-/* Encode and de-code a swap entry */
+/*
+ * Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
+ * are !pte_none() && !pte_present().
+ *
+ * Format of swap PTEs:
+ *
+ * 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1
+ * 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
+ * <--------------- offset ----------------> E < type -> 0 0 0 1 0
+ *
+ * E is the exclusive marker that is not stored in swap entries.
+ * _PAGE_NEWPAGE (bit 1) is always set to 1 in set_pte().
+ */
#define __swp_type(x) (((x).val >> 5) & 0x1f)
#define __swp_offset(x) ((x).val >> 11)

#define __swp_entry(type, offset) \
- ((swp_entry_t) { ((type) << 5) | ((offset) << 11) })
+ ((swp_entry_t) { (((type) & 0x1f) << 5) | ((offset) << 11) })
#define __pte_to_swp_entry(pte) \
((swp_entry_t) { pte_val(pte_mkuptodate(pte)) })
#define __swp_entry_to_pte(x) ((pte_t) { (x).val })

+#define __HAVE_ARCH_PTE_SWP_EXCLUSIVE
+static inline int pte_swp_exclusive(pte_t pte)
+{
+ return pte_get_bits(pte, _PAGE_SWP_EXCLUSIVE);
+}
+
+static inline pte_t pte_swp_mkexclusive(pte_t pte)
+{
+ pte_set_bits(pte, _PAGE_SWP_EXCLUSIVE);
+ return pte;
+}
+
+static inline pte_t pte_swp_clear_exclusive(pte_t pte)
+{
+ pte_clear_bits(pte, _PAGE_SWP_EXCLUSIVE);
+ return pte;
+}
+
/* Clear a kernel PTE and flush it from the TLB */
#define kpte_clear_flush(ptep, vaddr) \
do { \
--
2.39.0

2023-01-13 17:26:27

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH mm-unstable v1 25/26] xtensa/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE

Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by using bit 1. This
bit should be safe to use for our usecase.

Most importantly, we can still distinguish swap PTEs from PAGE_NONE PTEs
(see pte_present()) and don't use one of the two reserved attribute
masks (1101 and 1111). Attribute mask 1100 and 1110 now identify swap PTEs.

While at it, remove SWP_TYPE_BITS (not really helpful as it's not used in
the actual swap macros) and mask the type in __swp_entry().

Cc: Chris Zankel <[email protected]>
Cc: Max Filippov <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
arch/xtensa/include/asm/pgtable.h | 32 ++++++++++++++++++++++++++-----
1 file changed, 27 insertions(+), 5 deletions(-)

diff --git a/arch/xtensa/include/asm/pgtable.h b/arch/xtensa/include/asm/pgtable.h
index 5b5484d707b2..1025e2dc292b 100644
--- a/arch/xtensa/include/asm/pgtable.h
+++ b/arch/xtensa/include/asm/pgtable.h
@@ -96,7 +96,7 @@
* +- - - - - - - - - - - - - - - - - - - - -+
* (PAGE_NONE)| PPN | 0 | 00 | ADW | 01 | 11 | 11 |
* +-----------------------------------------+
- * swap | index | type | 01 | 11 | 00 |
+ * swap | index | type | 01 | 11 | e0 |
* +-----------------------------------------+
*
* For T1050 hardware and earlier the layout differs for present and (PAGE_NONE)
@@ -112,6 +112,7 @@
* RI ring (0=privileged, 1=user, 2 and 3 are unused)
* CA cache attribute: 00 bypass, 01 writeback, 10 writethrough
* (11 is invalid and used to mark pages that are not present)
+ * e exclusive marker in swap PTEs
* w page is writable (hw)
* x page is executable (hw)
* index swap offset / PAGE_SIZE (bit 11-31: 21 bits -> 8 GB)
@@ -158,6 +159,9 @@
#define _PAGE_DIRTY (1<<7) /* software: page dirty */
#define _PAGE_ACCESSED (1<<8) /* software: page accessed (read) */

+/* We borrow bit 1 to store the exclusive marker in swap PTEs. */
+#define _PAGE_SWP_EXCLUSIVE (1<<1)
+
#ifdef CONFIG_MMU

#define _PAGE_CHG_MASK (PAGE_MASK | _PAGE_ACCESSED | _PAGE_DIRTY)
@@ -343,19 +347,37 @@ ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
}

/*
- * Encode and decode a swap and file entry.
+ * Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
+ * are !pte_none() && !pte_present().
*/
-#define SWP_TYPE_BITS 5
-#define MAX_SWAPFILES_CHECK() BUILD_BUG_ON(MAX_SWAPFILES_SHIFT > SWP_TYPE_BITS)
+#define MAX_SWAPFILES_CHECK() BUILD_BUG_ON(MAX_SWAPFILES_SHIFT > 5)

#define __swp_type(entry) (((entry).val >> 6) & 0x1f)
#define __swp_offset(entry) ((entry).val >> 11)
#define __swp_entry(type,offs) \
- ((swp_entry_t){((type) << 6) | ((offs) << 11) | \
+ ((swp_entry_t){(((type) & 0x1f) << 6) | ((offs) << 11) | \
_PAGE_CA_INVALID | _PAGE_USER})
#define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val(pte) })
#define __swp_entry_to_pte(x) ((pte_t) { (x).val })

+#define __HAVE_ARCH_PTE_SWP_EXCLUSIVE
+static inline int pte_swp_exclusive(pte_t pte)
+{
+ return pte_val(pte) & _PAGE_SWP_EXCLUSIVE;
+}
+
+static inline pte_t pte_swp_mkexclusive(pte_t pte)
+{
+ pte_val(pte) |= _PAGE_SWP_EXCLUSIVE;
+ return pte;
+}
+
+static inline pte_t pte_swp_clear_exclusive(pte_t pte)
+{
+ pte_val(pte) &= ~_PAGE_SWP_EXCLUSIVE;
+ return pte;
+}
+
#endif /* !defined (__ASSEMBLY__) */


--
2.39.0

2023-01-13 17:26:39

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH mm-unstable v1 26/26] mm: remove __HAVE_ARCH_PTE_SWP_EXCLUSIVE

__HAVE_ARCH_PTE_SWP_EXCLUSIVE is now supported by all architectures that
support swp PTEs, so let's drop it.

Signed-off-by: David Hildenbrand <[email protected]>
---
arch/alpha/include/asm/pgtable.h | 1 -
arch/arc/include/asm/pgtable-bits-arcv2.h | 1 -
arch/arm/include/asm/pgtable.h | 1 -
arch/arm64/include/asm/pgtable.h | 1 -
arch/csky/include/asm/pgtable.h | 1 -
arch/hexagon/include/asm/pgtable.h | 1 -
arch/ia64/include/asm/pgtable.h | 1 -
arch/loongarch/include/asm/pgtable.h | 1 -
arch/m68k/include/asm/mcf_pgtable.h | 1 -
arch/m68k/include/asm/motorola_pgtable.h | 1 -
arch/m68k/include/asm/sun3_pgtable.h | 1 -
arch/microblaze/include/asm/pgtable.h | 1 -
arch/mips/include/asm/pgtable.h | 1 -
arch/nios2/include/asm/pgtable.h | 1 -
arch/openrisc/include/asm/pgtable.h | 1 -
arch/parisc/include/asm/pgtable.h | 1 -
arch/powerpc/include/asm/book3s/32/pgtable.h | 1 -
arch/powerpc/include/asm/book3s/64/pgtable.h | 1 -
arch/powerpc/include/asm/nohash/pgtable.h | 1 -
arch/riscv/include/asm/pgtable.h | 1 -
arch/s390/include/asm/pgtable.h | 1 -
arch/sh/include/asm/pgtable_32.h | 1 -
arch/sparc/include/asm/pgtable_32.h | 1 -
arch/sparc/include/asm/pgtable_64.h | 1 -
arch/um/include/asm/pgtable.h | 1 -
arch/x86/include/asm/pgtable.h | 1 -
arch/xtensa/include/asm/pgtable.h | 1 -
include/linux/pgtable.h | 29 --------------------
mm/debug_vm_pgtable.c | 2 --
mm/memory.c | 4 ---
mm/rmap.c | 11 --------
31 files changed, 73 deletions(-)

diff --git a/arch/alpha/include/asm/pgtable.h b/arch/alpha/include/asm/pgtable.h
index 970abf511b13..ba43cb841d19 100644
--- a/arch/alpha/include/asm/pgtable.h
+++ b/arch/alpha/include/asm/pgtable.h
@@ -328,7 +328,6 @@ extern inline pte_t mk_swap_pte(unsigned long type, unsigned long offset)
#define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val(pte) })
#define __swp_entry_to_pte(x) ((pte_t) { (x).val })

-#define __HAVE_ARCH_PTE_SWP_EXCLUSIVE
static inline int pte_swp_exclusive(pte_t pte)
{
return pte_val(pte) & _PAGE_SWP_EXCLUSIVE;
diff --git a/arch/arc/include/asm/pgtable-bits-arcv2.h b/arch/arc/include/asm/pgtable-bits-arcv2.h
index 611f412713b9..6e9f8ca6d6a1 100644
--- a/arch/arc/include/asm/pgtable-bits-arcv2.h
+++ b/arch/arc/include/asm/pgtable-bits-arcv2.h
@@ -132,7 +132,6 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned long address,
#define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val(pte) })
#define __swp_entry_to_pte(x) ((pte_t) { (x).val })

-#define __HAVE_ARCH_PTE_SWP_EXCLUSIVE
static inline int pte_swp_exclusive(pte_t pte)
{
return pte_val(pte) & _PAGE_SWP_EXCLUSIVE;
diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index 886c275995a2..2e626e6da9a3 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -298,7 +298,6 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
#define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val(pte) })
#define __swp_entry_to_pte(swp) __pte((swp).val)

-#define __HAVE_ARCH_PTE_SWP_EXCLUSIVE
static inline int pte_swp_exclusive(pte_t pte)
{
return pte_isset(pte, L_PTE_SWP_EXCLUSIVE);
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index b4bbeed80fb6..e0c19f5e3413 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -417,7 +417,6 @@ static inline pgprot_t mk_pmd_sect_prot(pgprot_t prot)
return __pgprot((pgprot_val(prot) & ~PMD_TABLE_BIT) | PMD_TYPE_SECT);
}

-#define __HAVE_ARCH_PTE_SWP_EXCLUSIVE
static inline pte_t pte_swp_mkexclusive(pte_t pte)
{
return set_pte_bit(pte, __pgprot(PTE_SWP_EXCLUSIVE));
diff --git a/arch/csky/include/asm/pgtable.h b/arch/csky/include/asm/pgtable.h
index 574c97b9ecca..d4042495febc 100644
--- a/arch/csky/include/asm/pgtable.h
+++ b/arch/csky/include/asm/pgtable.h
@@ -200,7 +200,6 @@ static inline pte_t pte_mkyoung(pte_t pte)
return pte;
}

-#define __HAVE_ARCH_PTE_SWP_EXCLUSIVE
static inline int pte_swp_exclusive(pte_t pte)
{
return pte_val(pte) & _PAGE_SWP_EXCLUSIVE;
diff --git a/arch/hexagon/include/asm/pgtable.h b/arch/hexagon/include/asm/pgtable.h
index 7eb008e477c8..59393613d086 100644
--- a/arch/hexagon/include/asm/pgtable.h
+++ b/arch/hexagon/include/asm/pgtable.h
@@ -397,7 +397,6 @@ static inline unsigned long pmd_page_vaddr(pmd_t pmd)
(((type & 0x1f) << 1) | \
((offset & 0x3ffff8) << 10) | ((offset & 0x7) << 7)) })

-#define __HAVE_ARCH_PTE_SWP_EXCLUSIVE
static inline int pte_swp_exclusive(pte_t pte)
{
return pte_val(pte) & _PAGE_SWP_EXCLUSIVE;
diff --git a/arch/ia64/include/asm/pgtable.h b/arch/ia64/include/asm/pgtable.h
index e4b8ab931399..21c97e31a28a 100644
--- a/arch/ia64/include/asm/pgtable.h
+++ b/arch/ia64/include/asm/pgtable.h
@@ -424,7 +424,6 @@ extern void paging_init (void);
#define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val(pte) })
#define __swp_entry_to_pte(x) ((pte_t) { (x).val })

-#define __HAVE_ARCH_PTE_SWP_EXCLUSIVE
static inline int pte_swp_exclusive(pte_t pte)
{
return pte_val(pte) & _PAGE_SWP_EXCLUSIVE;
diff --git a/arch/loongarch/include/asm/pgtable.h b/arch/loongarch/include/asm/pgtable.h
index c6b8fe7ac43c..d28fb9dbec59 100644
--- a/arch/loongarch/include/asm/pgtable.h
+++ b/arch/loongarch/include/asm/pgtable.h
@@ -276,7 +276,6 @@ static inline pte_t mk_swap_pte(unsigned long type, unsigned long offset)
#define __pmd_to_swp_entry(pmd) ((swp_entry_t) { pmd_val(pmd) })
#define __swp_entry_to_pmd(x) ((pmd_t) { (x).val | _PAGE_HUGE })

-#define __HAVE_ARCH_PTE_SWP_EXCLUSIVE
static inline int pte_swp_exclusive(pte_t pte)
{
return pte_val(pte) & _PAGE_SWP_EXCLUSIVE;
diff --git a/arch/m68k/include/asm/mcf_pgtable.h b/arch/m68k/include/asm/mcf_pgtable.h
index e573d7b649f7..13741c1245e1 100644
--- a/arch/m68k/include/asm/mcf_pgtable.h
+++ b/arch/m68k/include/asm/mcf_pgtable.h
@@ -275,7 +275,6 @@ extern pgd_t kernel_pg_dir[PTRS_PER_PGD];
#define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val(pte) })
#define __swp_entry_to_pte(x) (__pte((x).val))

-#define __HAVE_ARCH_PTE_SWP_EXCLUSIVE
static inline int pte_swp_exclusive(pte_t pte)
{
return pte_val(pte) & _PAGE_SWP_EXCLUSIVE;
diff --git a/arch/m68k/include/asm/motorola_pgtable.h b/arch/m68k/include/asm/motorola_pgtable.h
index c1782563e793..ec0dc19ab834 100644
--- a/arch/m68k/include/asm/motorola_pgtable.h
+++ b/arch/m68k/include/asm/motorola_pgtable.h
@@ -190,7 +190,6 @@ extern pgd_t kernel_pg_dir[128];
#define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val(pte) })
#define __swp_entry_to_pte(x) ((pte_t) { (x).val })

-#define __HAVE_ARCH_PTE_SWP_EXCLUSIVE
static inline int pte_swp_exclusive(pte_t pte)
{
return pte_val(pte) & _PAGE_SWP_EXCLUSIVE;
diff --git a/arch/m68k/include/asm/sun3_pgtable.h b/arch/m68k/include/asm/sun3_pgtable.h
index dbfc9703b15d..e582b0484a55 100644
--- a/arch/m68k/include/asm/sun3_pgtable.h
+++ b/arch/m68k/include/asm/sun3_pgtable.h
@@ -174,7 +174,6 @@ extern pgd_t kernel_pg_dir[PTRS_PER_PGD];
#define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val(pte) })
#define __swp_entry_to_pte(x) ((pte_t) { (x).val })

-#define __HAVE_ARCH_PTE_SWP_EXCLUSIVE
static inline int pte_swp_exclusive(pte_t pte)
{
return pte_val(pte) & _PAGE_SWP_EXCLUSIVE;
diff --git a/arch/microblaze/include/asm/pgtable.h b/arch/microblaze/include/asm/pgtable.h
index 7e3de54bf426..d1b8272abcd9 100644
--- a/arch/microblaze/include/asm/pgtable.h
+++ b/arch/microblaze/include/asm/pgtable.h
@@ -412,7 +412,6 @@ extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
#define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val(pte) >> 2 })
#define __swp_entry_to_pte(x) ((pte_t) { (x).val << 2 })

-#define __HAVE_ARCH_PTE_SWP_EXCLUSIVE
static inline int pte_swp_exclusive(pte_t pte)
{
return pte_val(pte) & _PAGE_SWP_EXCLUSIVE;
diff --git a/arch/mips/include/asm/pgtable.h b/arch/mips/include/asm/pgtable.h
index 711874cee8e4..791389bf3c12 100644
--- a/arch/mips/include/asm/pgtable.h
+++ b/arch/mips/include/asm/pgtable.h
@@ -528,7 +528,6 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
}
#endif

-#define __HAVE_ARCH_PTE_SWP_EXCLUSIVE
#if defined(CONFIG_PHYS_ADDR_T_64BIT) && defined(CONFIG_CPU_MIPS32)
static inline int pte_swp_exclusive(pte_t pte)
{
diff --git a/arch/nios2/include/asm/pgtable.h b/arch/nios2/include/asm/pgtable.h
index 05999da01731..0f5c2564e9f5 100644
--- a/arch/nios2/include/asm/pgtable.h
+++ b/arch/nios2/include/asm/pgtable.h
@@ -253,7 +253,6 @@ static inline unsigned long pmd_page_vaddr(pmd_t pmd)
#define __swp_entry_to_pte(swp) ((pte_t) { (swp).val })
#define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val(pte) })

-#define __HAVE_ARCH_PTE_SWP_EXCLUSIVE
static inline int pte_swp_exclusive(pte_t pte)
{
return pte_val(pte) & _PAGE_SWP_EXCLUSIVE;
diff --git a/arch/openrisc/include/asm/pgtable.h b/arch/openrisc/include/asm/pgtable.h
index 903b32d662ab..3eb9b9555d0d 100644
--- a/arch/openrisc/include/asm/pgtable.h
+++ b/arch/openrisc/include/asm/pgtable.h
@@ -408,7 +408,6 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
#define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val(pte) })
#define __swp_entry_to_pte(x) ((pte_t) { (x).val })

-#define __HAVE_ARCH_PTE_SWP_EXCLUSIVE
static inline int pte_swp_exclusive(pte_t pte)
{
return pte_val(pte) & _PAGE_SWP_EXCLUSIVE;
diff --git a/arch/parisc/include/asm/pgtable.h b/arch/parisc/include/asm/pgtable.h
index 3033bb88df34..e2950f5db7c9 100644
--- a/arch/parisc/include/asm/pgtable.h
+++ b/arch/parisc/include/asm/pgtable.h
@@ -422,7 +422,6 @@ extern void paging_init (void);
#define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val(pte) })
#define __swp_entry_to_pte(x) ((pte_t) { (x).val })

-#define __HAVE_ARCH_PTE_SWP_EXCLUSIVE
static inline int pte_swp_exclusive(pte_t pte)
{
return pte_val(pte) & _PAGE_SWP_EXCLUSIVE;
diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h b/arch/powerpc/include/asm/book3s/32/pgtable.h
index 0ecb3a58f23f..7bf1fe7297c6 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -386,7 +386,6 @@ static inline void __ptep_set_access_flags(struct vm_area_struct *vma,
#define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val(pte) >> 3 })
#define __swp_entry_to_pte(x) ((pte_t) { (x).val << 3 })

-#define __HAVE_ARCH_PTE_SWP_EXCLUSIVE
static inline int pte_swp_exclusive(pte_t pte)
{
return pte_val(pte) & _PAGE_SWP_EXCLUSIVE;
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index cb4c67bf45d7..4acc9690f599 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -717,7 +717,6 @@ static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
}
#endif /* CONFIG_HAVE_ARCH_SOFT_DIRTY */

-#define __HAVE_ARCH_PTE_SWP_EXCLUSIVE
static inline pte_t pte_swp_mkexclusive(pte_t pte)
{
return __pte_raw(pte_raw(pte) | cpu_to_be64(_PAGE_SWP_EXCLUSIVE));
diff --git a/arch/powerpc/include/asm/nohash/pgtable.h b/arch/powerpc/include/asm/nohash/pgtable.h
index 5f4620940c2c..a6caaaab6f92 100644
--- a/arch/powerpc/include/asm/nohash/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/pgtable.h
@@ -151,7 +151,6 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
return __pte((pte_val(pte) & _PAGE_CHG_MASK) | pgprot_val(newprot));
}

-#define __HAVE_ARCH_PTE_SWP_EXCLUSIVE
static inline int pte_swp_exclusive(pte_t pte)
{
return pte_val(pte) & _PAGE_SWP_EXCLUSIVE;
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 03a4728db039..5b9f409a940d 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -752,7 +752,6 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
#define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val(pte) })
#define __swp_entry_to_pte(x) ((pte_t) { (x).val })

-#define __HAVE_ARCH_PTE_SWP_EXCLUSIVE
static inline int pte_swp_exclusive(pte_t pte)
{
return pte_val(pte) & _PAGE_SWP_EXCLUSIVE;
diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index b26cbf1c533c..2b5db99e31dd 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -812,7 +812,6 @@ static inline int pmd_protnone(pmd_t pmd)
}
#endif

-#define __HAVE_ARCH_PTE_SWP_EXCLUSIVE
static inline int pte_swp_exclusive(pte_t pte)
{
return pte_val(pte) & _PAGE_SWP_EXCLUSIVE;
diff --git a/arch/sh/include/asm/pgtable_32.h b/arch/sh/include/asm/pgtable_32.h
index c34aa795a9d2..21952b094650 100644
--- a/arch/sh/include/asm/pgtable_32.h
+++ b/arch/sh/include/asm/pgtable_32.h
@@ -479,7 +479,6 @@ static inline unsigned long pmd_page_vaddr(pmd_t pmd)
/* In both cases, we borrow bit 6 to store the exclusive marker in swap PTEs. */
#define _PAGE_SWP_EXCLUSIVE _PAGE_USER

-#define __HAVE_ARCH_PTE_SWP_EXCLUSIVE
static inline int pte_swp_exclusive(pte_t pte)
{
return pte.pte_low & _PAGE_SWP_EXCLUSIVE;
diff --git a/arch/sparc/include/asm/pgtable_32.h b/arch/sparc/include/asm/pgtable_32.h
index abf7a2601209..d4330e3c57a6 100644
--- a/arch/sparc/include/asm/pgtable_32.h
+++ b/arch/sparc/include/asm/pgtable_32.h
@@ -353,7 +353,6 @@ static inline swp_entry_t __swp_entry(unsigned long type, unsigned long offset)
#define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val(pte) })
#define __swp_entry_to_pte(x) ((pte_t) { (x).val })

-#define __HAVE_ARCH_PTE_SWP_EXCLUSIVE
static inline int pte_swp_exclusive(pte_t pte)
{
return pte_val(pte) & SRMMU_SWP_EXCLUSIVE;
diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h
index a1658eebd036..2dc8d4641734 100644
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -989,7 +989,6 @@ pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp);
#define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val(pte) })
#define __swp_entry_to_pte(x) ((pte_t) { (x).val })

-#define __HAVE_ARCH_PTE_SWP_EXCLUSIVE
static inline int pte_swp_exclusive(pte_t pte)
{
return pte_val(pte) & _PAGE_SWP_EXCLUSIVE;
diff --git a/arch/um/include/asm/pgtable.h b/arch/um/include/asm/pgtable.h
index cedc5fd451ce..a70d1618eb35 100644
--- a/arch/um/include/asm/pgtable.h
+++ b/arch/um/include/asm/pgtable.h
@@ -313,7 +313,6 @@ extern pte_t *virt_to_pte(struct mm_struct *mm, unsigned long addr);
((swp_entry_t) { pte_val(pte_mkuptodate(pte)) })
#define __swp_entry_to_pte(x) ((pte_t) { (x).val })

-#define __HAVE_ARCH_PTE_SWP_EXCLUSIVE
static inline int pte_swp_exclusive(pte_t pte)
{
return pte_get_bits(pte, _PAGE_SWP_EXCLUSIVE);
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index d25195726b78..7425f32e5293 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1299,7 +1299,6 @@ static inline void update_mmu_cache_pud(struct vm_area_struct *vma,
unsigned long addr, pud_t *pud)
{
}
-#define __HAVE_ARCH_PTE_SWP_EXCLUSIVE
static inline pte_t pte_swp_mkexclusive(pte_t pte)
{
return pte_set_flags(pte, _PAGE_SWP_EXCLUSIVE);
diff --git a/arch/xtensa/include/asm/pgtable.h b/arch/xtensa/include/asm/pgtable.h
index 1025e2dc292b..fc7a14884c6c 100644
--- a/arch/xtensa/include/asm/pgtable.h
+++ b/arch/xtensa/include/asm/pgtable.h
@@ -360,7 +360,6 @@ ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
#define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val(pte) })
#define __swp_entry_to_pte(x) ((pte_t) { (x).val })

-#define __HAVE_ARCH_PTE_SWP_EXCLUSIVE
static inline int pte_swp_exclusive(pte_t pte)
{
return pte_val(pte) & _PAGE_SWP_EXCLUSIVE;
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 1159b25b0542..5fd45454c073 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1064,35 +1064,6 @@ static inline pgprot_t pgprot_modify(pgprot_t oldprot, pgprot_t newprot)
#define arch_start_context_switch(prev) do {} while (0)
#endif

-/*
- * When replacing an anonymous page by a real (!non) swap entry, we clear
- * PG_anon_exclusive from the page and instead remember whether the flag was
- * set in the swp pte. During fork(), we have to mark the entry as !exclusive
- * (possibly shared). On swapin, we use that information to restore
- * PG_anon_exclusive, which is very helpful in cases where we might have
- * additional (e.g., FOLL_GET) references on a page and wouldn't be able to
- * detect exclusivity.
- *
- * These functions don't apply to non-swap entries (e.g., migration, hwpoison,
- * ...).
- */
-#ifndef __HAVE_ARCH_PTE_SWP_EXCLUSIVE
-static inline pte_t pte_swp_mkexclusive(pte_t pte)
-{
- return pte;
-}
-
-static inline int pte_swp_exclusive(pte_t pte)
-{
- return false;
-}
-
-static inline pte_t pte_swp_clear_exclusive(pte_t pte)
-{
- return pte;
-}
-#endif
-
#ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY
#ifndef CONFIG_ARCH_ENABLE_THP_MIGRATION
static inline pmd_t pmd_swp_mksoft_dirty(pmd_t pmd)
diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index a0730beffd78..3da0cc380c35 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -810,7 +810,6 @@ static void __init pmd_swap_soft_dirty_tests(struct pgtable_debug_args *args) {

static void __init pte_swap_exclusive_tests(struct pgtable_debug_args *args)
{
-#ifdef __HAVE_ARCH_PTE_SWP_EXCLUSIVE
unsigned long max_swapfile_size = generic_max_swapfile_size();
swp_entry_t entry, entry2;
pte_t pte;
@@ -839,7 +838,6 @@ static void __init pte_swap_exclusive_tests(struct pgtable_debug_args *args)
WARN_ON(!is_swap_pte(pte));
entry2 = pte_to_swp_entry(pte);
WARN_ON(memcmp(&entry, &entry2, sizeof(entry)));
-#endif /* __HAVE_ARCH_PTE_SWP_EXCLUSIVE */
}

static void __init pte_swap_tests(struct pgtable_debug_args *args)
diff --git a/mm/memory.c b/mm/memory.c
index abc4e606d8bc..ed853ba74366 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3870,10 +3870,6 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
* the swap entry concurrently) for certainly exclusive pages.
*/
if (!folio_test_ksm(folio)) {
- /*
- * Note that pte_swp_exclusive() == false for architectures
- * without __HAVE_ARCH_PTE_SWP_EXCLUSIVE.
- */
exclusive = pte_swp_exclusive(vmf->orig_pte);
if (folio != swapcache) {
/*
diff --git a/mm/rmap.c b/mm/rmap.c
index 0e450e6bb963..b26fbbcc9257 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1710,17 +1710,6 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
page_vma_mapped_walk_done(&pvmw);
break;
}
- /*
- * Note: We *don't* remember if the page was mapped
- * exclusively in the swap pte if the architecture
- * doesn't support __HAVE_ARCH_PTE_SWP_EXCLUSIVE. In
- * that case, swapin code has to re-determine that
- * manually and might detect the page as possibly
- * shared, for example, if there are other references on
- * the page or if the page is under writeback. We made
- * sure that there are no GUP pins on the page that
- * would rely on it, so for GUP pins this is fine.
- */
if (list_empty(&mm->mmlist)) {
spin_lock(&mmlist_lock);
if (list_empty(&mm->mmlist))
--
2.39.0

2023-01-13 17:35:02

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH mm-unstable v1 10/26] m68k/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE

Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by stealing one bit
from the type. Generic MM currently only uses 5 bits for the type
(MAX_SWAPFILES_SHIFT), so the stolen bit is effectively unused.

While at it, make sure for sun3 that the valid bit never gets set by
properly masking it off and mask the type in __swp_entry().

Cc: Geert Uytterhoeven <[email protected]>
Cc: Greg Ungerer <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
arch/m68k/include/asm/mcf_pgtable.h | 36 ++++++++++++++++++++--
arch/m68k/include/asm/motorola_pgtable.h | 38 +++++++++++++++++++++--
arch/m68k/include/asm/sun3_pgtable.h | 39 ++++++++++++++++++++++--
3 files changed, 104 insertions(+), 9 deletions(-)

diff --git a/arch/m68k/include/asm/mcf_pgtable.h b/arch/m68k/include/asm/mcf_pgtable.h
index b619b22823f8..3f8f4d0e66dd 100644
--- a/arch/m68k/include/asm/mcf_pgtable.h
+++ b/arch/m68k/include/asm/mcf_pgtable.h
@@ -46,6 +46,9 @@
#define _CACHEMASK040 (~0x060)
#define _PAGE_GLOBAL040 0x400 /* 68040 global bit, used for kva descs */

+/* We borrow bit 7 to store the exclusive marker in swap PTEs. */
+#define _PAGE_SWP_EXCLUSIVE 0x080
+
/*
* Externally used page protection values.
*/
@@ -254,15 +257,42 @@ static inline pte_t pte_mkcache(pte_t pte)
extern pgd_t kernel_pg_dir[PTRS_PER_PGD];

/*
- * Encode and de-code a swap entry (must be !pte_none(e) && !pte_present(e))
+ * Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
+ * are !pte_none() && !pte_present().
+ *
+ * Format of swap PTEs:
+ *
+ * 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1
+ * 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
+ * <------------------ offset -------------> 0 0 0 E <-- type --->
+ *
+ * E is the exclusive marker that is not stored in swap entries.
*/
-#define __swp_type(x) ((x).val & 0xFF)
+#define __swp_type(x) ((x).val & 0x7f)
#define __swp_offset(x) ((x).val >> 11)
-#define __swp_entry(typ, off) ((swp_entry_t) { (typ) | \
+#define __swp_entry(typ, off) ((swp_entry_t) { ((typ) & 0x7f) | \
(off << 11) })
#define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val(pte) })
#define __swp_entry_to_pte(x) (__pte((x).val))

+#define __HAVE_ARCH_PTE_SWP_EXCLUSIVE
+static inline int pte_swp_exclusive(pte_t pte)
+{
+ return pte_val(pte) & _PAGE_SWP_EXCLUSIVE;
+}
+
+static inline pte_t pte_swp_mkexclusive(pte_t pte)
+{
+ pte_val(pte) |= _PAGE_SWP_EXCLUSIVE;
+ return pte;
+}
+
+static inline pte_t pte_swp_clear_exclusive(pte_t pte)
+{
+ pte_val(pte) &= ~_PAGE_SWP_EXCLUSIVE;
+ return pte;
+}
+
#define pmd_pfn(pmd) (pmd_val(pmd) >> PAGE_SHIFT)
#define pmd_page(pmd) (pfn_to_page(pmd_val(pmd) >> PAGE_SHIFT))

diff --git a/arch/m68k/include/asm/motorola_pgtable.h b/arch/m68k/include/asm/motorola_pgtable.h
index 562b54e09850..c1782563e793 100644
--- a/arch/m68k/include/asm/motorola_pgtable.h
+++ b/arch/m68k/include/asm/motorola_pgtable.h
@@ -41,6 +41,9 @@

#define _PAGE_PROTNONE 0x004

+/* We borrow bit 11 to store the exclusive marker in swap PTEs. */
+#define _PAGE_SWP_EXCLUSIVE 0x800
+
#ifndef __ASSEMBLY__

/* This is the cache mode to be used for pages containing page descriptors for
@@ -169,12 +172,41 @@ static inline pte_t pte_mkcache(pte_t pte)
#define swapper_pg_dir kernel_pg_dir
extern pgd_t kernel_pg_dir[128];

-/* Encode and de-code a swap entry (must be !pte_none(e) && !pte_present(e)) */
-#define __swp_type(x) (((x).val >> 4) & 0xff)
+/*
+ * Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
+ * are !pte_none() && !pte_present().
+ *
+ * Format of swap PTEs:
+ *
+ * 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1
+ * 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
+ * <----------------- offset ------------> E <-- type ---> 0 0 0 0
+ *
+ * E is the exclusive marker that is not stored in swap entries.
+ */
+#define __swp_type(x) (((x).val >> 4) & 0x7f)
#define __swp_offset(x) ((x).val >> 12)
-#define __swp_entry(type, offset) ((swp_entry_t) { ((type) << 4) | ((offset) << 12) })
+#define __swp_entry(type, offset) ((swp_entry_t) { (((type) & 0x7f) << 4) | ((offset) << 12) })
#define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val(pte) })
#define __swp_entry_to_pte(x) ((pte_t) { (x).val })

+#define __HAVE_ARCH_PTE_SWP_EXCLUSIVE
+static inline int pte_swp_exclusive(pte_t pte)
+{
+ return pte_val(pte) & _PAGE_SWP_EXCLUSIVE;
+}
+
+static inline pte_t pte_swp_mkexclusive(pte_t pte)
+{
+ pte_val(pte) |= _PAGE_SWP_EXCLUSIVE;
+ return pte;
+}
+
+static inline pte_t pte_swp_clear_exclusive(pte_t pte)
+{
+ pte_val(pte) &= ~_PAGE_SWP_EXCLUSIVE;
+ return pte;
+}
+
#endif /* !__ASSEMBLY__ */
#endif /* _MOTOROLA_PGTABLE_H */
diff --git a/arch/m68k/include/asm/sun3_pgtable.h b/arch/m68k/include/asm/sun3_pgtable.h
index 90d57e537eb1..dbfc9703b15d 100644
--- a/arch/m68k/include/asm/sun3_pgtable.h
+++ b/arch/m68k/include/asm/sun3_pgtable.h
@@ -71,6 +71,9 @@
#define SUN3_PMD_MASK (0x0000003F)
#define SUN3_PMD_MAGIC (0x0000002B)

+/* We borrow bit 6 to store the exclusive marker in swap PTEs. */
+#define _PAGE_SWP_EXCLUSIVE 0x040
+
#ifndef __ASSEMBLY__

/*
@@ -152,12 +155,42 @@ static inline pte_t pte_mkcache(pte_t pte) { return pte; }
extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
extern pgd_t kernel_pg_dir[PTRS_PER_PGD];

-/* Macros to (de)construct the fake PTEs representing swap pages. */
-#define __swp_type(x) ((x).val & 0x7F)
+/*
+ * Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
+ * are !pte_none() && !pte_present().
+ *
+ * Format of swap PTEs:
+ *
+ * 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1
+ * 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
+ * 0 <--------------------- offset ----------------> E <- type -->
+ *
+ * E is the exclusive marker that is not stored in swap entries.
+ */
+#define __swp_type(x) ((x).val & 0x3f)
#define __swp_offset(x) (((x).val) >> 7)
-#define __swp_entry(type,offset) ((swp_entry_t) { ((type) | ((offset) << 7)) })
+#define __swp_entry(type, offset) ((swp_entry_t) { (((type) & 0x3f) | \
+ (((offset) << 7) & ~SUN3_PAGE_VALID)) })
#define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val(pte) })
#define __swp_entry_to_pte(x) ((pte_t) { (x).val })

+#define __HAVE_ARCH_PTE_SWP_EXCLUSIVE
+static inline int pte_swp_exclusive(pte_t pte)
+{
+ return pte_val(pte) & _PAGE_SWP_EXCLUSIVE;
+}
+
+static inline pte_t pte_swp_mkexclusive(pte_t pte)
+{
+ pte_val(pte) |= _PAGE_SWP_EXCLUSIVE;
+ return pte;
+}
+
+static inline pte_t pte_swp_clear_exclusive(pte_t pte)
+{
+ pte_val(pte) &= ~_PAGE_SWP_EXCLUSIVE;
+ return pte;
+}
+
#endif /* !__ASSEMBLY__ */
#endif /* !_SUN3_PGTABLE_H */
--
2.39.0

2023-01-13 17:37:36

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH mm-unstable v1 14/26] nios2/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE

Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by using the yet-unused bit
31.

Cc: Thomas Bogendoerfer <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
arch/nios2/include/asm/pgtable-bits.h | 3 +++
arch/nios2/include/asm/pgtable.h | 22 +++++++++++++++++++++-
2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/arch/nios2/include/asm/pgtable-bits.h b/arch/nios2/include/asm/pgtable-bits.h
index bfddff383e89..724f9b08b1d1 100644
--- a/arch/nios2/include/asm/pgtable-bits.h
+++ b/arch/nios2/include/asm/pgtable-bits.h
@@ -31,4 +31,7 @@
#define _PAGE_ACCESSED (1<<26) /* page referenced */
#define _PAGE_DIRTY (1<<27) /* dirty page */

+/* We borrow bit 31 to store the exclusive marker in swap PTEs. */
+#define _PAGE_SWP_EXCLUSIVE (1<<31)
+
#endif /* _ASM_NIOS2_PGTABLE_BITS_H */
diff --git a/arch/nios2/include/asm/pgtable.h b/arch/nios2/include/asm/pgtable.h
index d1e5c9eb4643..05999da01731 100644
--- a/arch/nios2/include/asm/pgtable.h
+++ b/arch/nios2/include/asm/pgtable.h
@@ -239,7 +239,9 @@ static inline unsigned long pmd_page_vaddr(pmd_t pmd)
*
* 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1
* 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
- * 0 < type -> 0 0 0 0 0 0 <-------------- offset --------------->
+ * E < type -> 0 0 0 0 0 0 <-------------- offset --------------->
+ *
+ * E is the exclusive marker that is not stored in swap entries.
*
* Note that the offset field is always non-zero if the swap type is 0, thus
* !pte_none() is always true.
@@ -251,6 +253,24 @@ static inline unsigned long pmd_page_vaddr(pmd_t pmd)
#define __swp_entry_to_pte(swp) ((pte_t) { (swp).val })
#define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val(pte) })

+#define __HAVE_ARCH_PTE_SWP_EXCLUSIVE
+static inline int pte_swp_exclusive(pte_t pte)
+{
+ return pte_val(pte) & _PAGE_SWP_EXCLUSIVE;
+}
+
+static inline pte_t pte_swp_mkexclusive(pte_t pte)
+{
+ pte_val(pte) |= _PAGE_SWP_EXCLUSIVE;
+ return pte;
+}
+
+static inline pte_t pte_swp_clear_exclusive(pte_t pte)
+{
+ pte_val(pte) &= ~_PAGE_SWP_EXCLUSIVE;
+ return pte;
+}
+
extern void __init paging_init(void);
extern void __init mmu_init(void);

--
2.39.0

2023-01-13 17:44:02

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH mm-unstable v1 24/26] x86/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE also on 32bit

Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE just like we already do on
x86-64. After deciphering the PTE layout it becomes clear that there are
still unused bits for 2-level and 3-level page tables that we should be
able to use. Reusing a bit avoids stealing one bit from the swap offset.

While at it, mask the type in __swp_entry(); use some helper definitions
to make the macros easier to grasp.

Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
arch/x86/include/asm/pgtable-2level.h | 26 +++++++++++++++++++++-----
arch/x86/include/asm/pgtable-3level.h | 26 +++++++++++++++++++++++---
arch/x86/include/asm/pgtable.h | 2 --
3 files changed, 44 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/pgtable-2level.h b/arch/x86/include/asm/pgtable-2level.h
index 60d0f9015317..e9482a11ac52 100644
--- a/arch/x86/include/asm/pgtable-2level.h
+++ b/arch/x86/include/asm/pgtable-2level.h
@@ -80,21 +80,37 @@ static inline unsigned long pte_bitop(unsigned long value, unsigned int rightshi
return ((value >> rightshift) & mask) << leftshift;
}

-/* Encode and de-code a swap entry */
+/*
+ * Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
+ * are !pte_none() && !pte_present().
+ *
+ * Format of swap PTEs:
+ *
+ * 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1
+ * 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
+ * <----------------- offset ------------------> 0 E <- type --> 0
+ *
+ * E is the exclusive marker that is not stored in swap entries.
+ */
#define SWP_TYPE_BITS 5
+#define _SWP_TYPE_MASK ((1U << SWP_TYPE_BITS) - 1)
+#define _SWP_TYPE_SHIFT (_PAGE_BIT_PRESENT + 1)
#define SWP_OFFSET_SHIFT (_PAGE_BIT_PROTNONE + 1)

-#define MAX_SWAPFILES_CHECK() BUILD_BUG_ON(MAX_SWAPFILES_SHIFT > SWP_TYPE_BITS)
+#define MAX_SWAPFILES_CHECK() BUILD_BUG_ON(MAX_SWAPFILES_SHIFT > 5)

-#define __swp_type(x) (((x).val >> (_PAGE_BIT_PRESENT + 1)) \
- & ((1U << SWP_TYPE_BITS) - 1))
+#define __swp_type(x) (((x).val >> _SWP_TYPE_SHIFT) \
+ & _SWP_TYPE_MASK)
#define __swp_offset(x) ((x).val >> SWP_OFFSET_SHIFT)
#define __swp_entry(type, offset) ((swp_entry_t) { \
- ((type) << (_PAGE_BIT_PRESENT + 1)) \
+ (((type) & _SWP_TYPE_MASK) << _SWP_TYPE_SHIFT) \
| ((offset) << SWP_OFFSET_SHIFT) })
#define __pte_to_swp_entry(pte) ((swp_entry_t) { (pte).pte_low })
#define __swp_entry_to_pte(x) ((pte_t) { .pte = (x).val })

+/* We borrow bit 7 to store the exclusive marker in swap PTEs. */
+#define _PAGE_SWP_EXCLUSIVE _PAGE_PSE
+
/* No inverted PFNs on 2 level page tables */

static inline u64 protnone_mask(u64 val)
diff --git a/arch/x86/include/asm/pgtable-3level.h b/arch/x86/include/asm/pgtable-3level.h
index 967b135fa2c0..9e7c0b719c3c 100644
--- a/arch/x86/include/asm/pgtable-3level.h
+++ b/arch/x86/include/asm/pgtable-3level.h
@@ -145,8 +145,24 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
}
#endif

-/* Encode and de-code a swap entry */
+/*
+ * Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
+ * are !pte_none() && !pte_present().
+ *
+ * Format of swap PTEs:
+ *
+ * 6 6 6 6 5 5 5 5 5 5 5 5 5 5 4 4 4 4 4 4 4 4 4 4 3 3 3 3 3 3 3 3
+ * 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2
+ * < type -> <---------------------- offset ----------------------
+ *
+ * 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1
+ * 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
+ * --------------------------------------------> 0 E 0 0 0 0 0 0 0
+ *
+ * E is the exclusive marker that is not stored in swap entries.
+ */
#define SWP_TYPE_BITS 5
+#define _SWP_TYPE_MASK ((1U << SWP_TYPE_BITS) - 1)

#define SWP_OFFSET_FIRST_BIT (_PAGE_BIT_PROTNONE + 1)

@@ -154,9 +170,10 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
#define SWP_OFFSET_SHIFT (SWP_OFFSET_FIRST_BIT + SWP_TYPE_BITS)

#define MAX_SWAPFILES_CHECK() BUILD_BUG_ON(MAX_SWAPFILES_SHIFT > SWP_TYPE_BITS)
-#define __swp_type(x) (((x).val) & ((1UL << SWP_TYPE_BITS) - 1))
+#define __swp_type(x) (((x).val) & _SWP_TYPE_MASK)
#define __swp_offset(x) ((x).val >> SWP_TYPE_BITS)
-#define __swp_entry(type, offset) ((swp_entry_t){(type) | (offset) << SWP_TYPE_BITS})
+#define __swp_entry(type, offset) ((swp_entry_t){((type) & _SWP_TYPE_MASK) \
+ | (offset) << SWP_TYPE_BITS})

/*
* Normally, __swp_entry() converts from arch-independent swp_entry_t to
@@ -184,6 +201,9 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
#define __pte_to_swp_entry(pte) (__swp_entry(__pteval_swp_type(pte), \
__pteval_swp_offset(pte)))

+/* We borrow bit 7 to store the exclusive marker in swap PTEs. */
+#define _PAGE_SWP_EXCLUSIVE _PAGE_PSE
+
#include <asm/pgtable-invert.h>

#endif /* _ASM_X86_PGTABLE_3LEVEL_H */
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 1c843395a8b3..d25195726b78 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1299,7 +1299,6 @@ static inline void update_mmu_cache_pud(struct vm_area_struct *vma,
unsigned long addr, pud_t *pud)
{
}
-#ifdef _PAGE_SWP_EXCLUSIVE
#define __HAVE_ARCH_PTE_SWP_EXCLUSIVE
static inline pte_t pte_swp_mkexclusive(pte_t pte)
{
@@ -1315,7 +1314,6 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte)
{
return pte_clear_flags(pte, _PAGE_SWP_EXCLUSIVE);
}
-#endif /* _PAGE_SWP_EXCLUSIVE */

#ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY
static inline pte_t pte_swp_mksoft_dirty(pte_t pte)
--
2.39.0

2023-01-13 17:44:15

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH mm-unstable v1 07/26] ia64/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE

Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by stealing one bit
from the type. Generic MM currently only uses 5 bits for the type
(MAX_SWAPFILES_SHIFT), so the stolen bit is effectively unused.

While at it, also mask the type in __swp_entry().

Signed-off-by: David Hildenbrand <[email protected]>
---
arch/ia64/include/asm/pgtable.h | 32 +++++++++++++++++++++++++++++---
1 file changed, 29 insertions(+), 3 deletions(-)

diff --git a/arch/ia64/include/asm/pgtable.h b/arch/ia64/include/asm/pgtable.h
index 01517a5e6778..e4b8ab931399 100644
--- a/arch/ia64/include/asm/pgtable.h
+++ b/arch/ia64/include/asm/pgtable.h
@@ -58,6 +58,9 @@
#define _PAGE_ED (__IA64_UL(1) << 52) /* exception deferral */
#define _PAGE_PROTNONE (__IA64_UL(1) << 63)

+/* We borrow bit 7 to store the exclusive marker in swap PTEs. */
+#define _PAGE_SWP_EXCLUSIVE (1 << 7)
+
#define _PFN_MASK _PAGE_PPN_MASK
/* Mask of bits which may be changed by pte_modify(); the odd bits are there for _PAGE_PROTNONE */
#define _PAGE_CHG_MASK (_PAGE_P | _PAGE_PROTNONE | _PAGE_PL_MASK | _PAGE_AR_MASK | _PAGE_ED)
@@ -399,6 +402,9 @@ extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
extern void paging_init (void);

/*
+ * Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
+ * are !pte_none() && !pte_present().
+ *
* Note: The macros below rely on the fact that MAX_SWAPFILES_SHIFT <= number of
* bits in the swap-type field of the swap pte. It would be nice to
* enforce that, but we can't easily include <linux/swap.h> here.
@@ -406,16 +412,36 @@ extern void paging_init (void);
*
* Format of swap pte:
* bit 0 : present bit (must be zero)
- * bits 1- 7: swap-type
+ * bits 1- 6: swap type
+ * bit 7 : exclusive marker
* bits 8-62: swap offset
* bit 63 : _PAGE_PROTNONE bit
*/
-#define __swp_type(entry) (((entry).val >> 1) & 0x7f)
+#define __swp_type(entry) (((entry).val >> 1) & 0x3f)
#define __swp_offset(entry) (((entry).val << 1) >> 9)
-#define __swp_entry(type,offset) ((swp_entry_t) { ((type) << 1) | ((long) (offset) << 8) })
+#define __swp_entry(type, offset) ((swp_entry_t) { ((type & 0x3f) << 1) | \
+ ((long) (offset) << 8) })
#define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val(pte) })
#define __swp_entry_to_pte(x) ((pte_t) { (x).val })

+#define __HAVE_ARCH_PTE_SWP_EXCLUSIVE
+static inline int pte_swp_exclusive(pte_t pte)
+{
+ return pte_val(pte) & _PAGE_SWP_EXCLUSIVE;
+}
+
+static inline pte_t pte_swp_mkexclusive(pte_t pte)
+{
+ pte_val(pte) |= _PAGE_SWP_EXCLUSIVE;
+ return pte;
+}
+
+static inline pte_t pte_swp_clear_exclusive(pte_t pte)
+{
+ pte_val(pte) &= ~_PAGE_SWP_EXCLUSIVE;
+ return pte;
+}
+
/*
* ZERO_PAGE is a global shared page that is always zero: used
* for zero-mapped memory areas etc..
--
2.39.0

2023-01-13 17:44:30

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH mm-unstable v1 16/26] parisc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE

Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by using the yet-unused
_PAGE_ACCESSED location in the swap PTE. Looking at pte_present()
and pte_none() checks, there seems to be no actual reason why we cannot
use it: we only have to make sure we're not using _PAGE_PRESENT.

Reusing this bit avoids having to steal one bit from the swap offset.

Cc: "James E.J. Bottomley" <[email protected]>
Cc: Helge Deller <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
arch/parisc/include/asm/pgtable.h | 41 ++++++++++++++++++++++++++++---
1 file changed, 38 insertions(+), 3 deletions(-)

diff --git a/arch/parisc/include/asm/pgtable.h b/arch/parisc/include/asm/pgtable.h
index ea357430aafe..3033bb88df34 100644
--- a/arch/parisc/include/asm/pgtable.h
+++ b/arch/parisc/include/asm/pgtable.h
@@ -218,6 +218,9 @@ extern void __update_cache(pte_t pte);
#define _PAGE_KERNEL_RWX (_PAGE_KERNEL_EXEC | _PAGE_WRITE)
#define _PAGE_KERNEL (_PAGE_KERNEL_RO | _PAGE_WRITE)

+/* We borrow bit 23 to store the exclusive marker in swap PTEs. */
+#define _PAGE_SWP_EXCLUSIVE _PAGE_ACCESSED
+
/* The pgd/pmd contains a ptr (in phys addr space); since all pgds/pmds
* are page-aligned, we don't care about the PAGE_OFFSET bits, except
* for a few meta-information bits, so we shift the address to be
@@ -394,17 +397,49 @@ extern void paging_init (void);

#define update_mmu_cache(vms,addr,ptep) __update_cache(*ptep)

-/* Encode and de-code a swap entry */
-
+/*
+ * Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
+ * are !pte_none() && !pte_present().
+ *
+ * Format of swap PTEs (32bit):
+ *
+ * 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3
+ * 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ * <---------------- offset -----------------> P E <ofs> < type ->
+ *
+ * E is the exclusive marker that is not stored in swap entries.
+ * _PAGE_PRESENT (P) must be 0.
+ *
+ * For the 64bit version, the offset is extended by 32bit.
+ */
#define __swp_type(x) ((x).val & 0x1f)
#define __swp_offset(x) ( (((x).val >> 6) & 0x7) | \
(((x).val >> 8) & ~0x7) )
-#define __swp_entry(type, offset) ((swp_entry_t) { (type) | \
+#define __swp_entry(type, offset) ((swp_entry_t) { \
+ ((type) & 0x1f) | \
((offset & 0x7) << 6) | \
((offset & ~0x7) << 8) })
#define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val(pte) })
#define __swp_entry_to_pte(x) ((pte_t) { (x).val })

+#define __HAVE_ARCH_PTE_SWP_EXCLUSIVE
+static inline int pte_swp_exclusive(pte_t pte)
+{
+ return pte_val(pte) & _PAGE_SWP_EXCLUSIVE;
+}
+
+static inline pte_t pte_swp_mkexclusive(pte_t pte)
+{
+ pte_val(pte) |= _PAGE_SWP_EXCLUSIVE;
+ return pte;
+}
+
+static inline pte_t pte_swp_clear_exclusive(pte_t pte)
+{
+ pte_val(pte) &= ~_PAGE_SWP_EXCLUSIVE;
+ return pte;
+}
+
static inline int ptep_test_and_clear_young(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep)
{
pte_t pte;
--
2.39.0

2023-01-13 17:44:43

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH mm-unstable v1 01/26] mm/debug_vm_pgtable: more pte_swp_exclusive() sanity checks

We want to implement __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures.
Let's extend our sanity checks, especially testing that our PTE bit
does not affect:
* is_swap_pte() -> pte_present() and pte_none()
* the swap entry + type
* pte_swp_soft_dirty()

Especially, the pfn_pte() is dodgy when the swap PTE layout differs
heavily from ordinary PTEs. Let's properly construct a swap PTE from
swap type+offset.

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/debug_vm_pgtable.c | 23 ++++++++++++++++++++++-
1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index bb3328f46126..a0730beffd78 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -811,13 +811,34 @@ static void __init pmd_swap_soft_dirty_tests(struct pgtable_debug_args *args) {
static void __init pte_swap_exclusive_tests(struct pgtable_debug_args *args)
{
#ifdef __HAVE_ARCH_PTE_SWP_EXCLUSIVE
- pte_t pte = pfn_pte(args->fixed_pte_pfn, args->page_prot);
+ unsigned long max_swapfile_size = generic_max_swapfile_size();
+ swp_entry_t entry, entry2;
+ pte_t pte;

pr_debug("Validating PTE swap exclusive\n");
+
+ /* Create a swp entry with all possible bits set */
+ entry = swp_entry((1 << MAX_SWAPFILES_SHIFT) - 1,
+ max_swapfile_size - 1);
+
+ pte = swp_entry_to_pte(entry);
+ WARN_ON(pte_swp_exclusive(pte));
+ WARN_ON(!is_swap_pte(pte));
+ entry2 = pte_to_swp_entry(pte);
+ WARN_ON(memcmp(&entry, &entry2, sizeof(entry)));
+
pte = pte_swp_mkexclusive(pte);
WARN_ON(!pte_swp_exclusive(pte));
+ WARN_ON(!is_swap_pte(pte));
+ WARN_ON(pte_swp_soft_dirty(pte));
+ entry2 = pte_to_swp_entry(pte);
+ WARN_ON(memcmp(&entry, &entry2, sizeof(entry)));
+
pte = pte_swp_clear_exclusive(pte);
WARN_ON(pte_swp_exclusive(pte));
+ WARN_ON(!is_swap_pte(pte));
+ entry2 = pte_to_swp_entry(pte);
+ WARN_ON(memcmp(&entry, &entry2, sizeof(entry)));
#endif /* __HAVE_ARCH_PTE_SWP_EXCLUSIVE */
}

--
2.39.0

2023-01-13 17:45:04

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH mm-unstable v1 19/26] riscv/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE

Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by stealing one bit
from the offset. This reduces the maximum swap space per file: on 32bit
to 16 GiB (was 32 GiB).

Note that this bit does not conflict with swap PMDs and could also be used
in swap PMD context later.

While at it, mask the type in __swp_entry().

Cc: Paul Walmsley <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Albert Ou <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
arch/riscv/include/asm/pgtable-bits.h | 3 +++
arch/riscv/include/asm/pgtable.h | 29 ++++++++++++++++++++++-----
2 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/arch/riscv/include/asm/pgtable-bits.h b/arch/riscv/include/asm/pgtable-bits.h
index b9e13a8fe2b7..f896708e8331 100644
--- a/arch/riscv/include/asm/pgtable-bits.h
+++ b/arch/riscv/include/asm/pgtable-bits.h
@@ -27,6 +27,9 @@
*/
#define _PAGE_PROT_NONE _PAGE_GLOBAL

+/* Used for swap PTEs only. */
+#define _PAGE_SWP_EXCLUSIVE _PAGE_ACCESSED
+
#define _PAGE_PFN_SHIFT 10

/*
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 4eba9a98d0e3..03a4728db039 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -724,16 +724,18 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */

/*
- * Encode and decode a swap entry
+ * Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
+ * are !pte_none() && !pte_present().
*
* Format of swap PTE:
* bit 0: _PAGE_PRESENT (zero)
* bit 1 to 3: _PAGE_LEAF (zero)
* bit 5: _PAGE_PROT_NONE (zero)
- * bits 6 to 10: swap type
- * bits 10 to XLEN-1: swap offset
+ * bit 6: exclusive marker
+ * bits 7 to 11: swap type
+ * bits 11 to XLEN-1: swap offset
*/
-#define __SWP_TYPE_SHIFT 6
+#define __SWP_TYPE_SHIFT 7
#define __SWP_TYPE_BITS 5
#define __SWP_TYPE_MASK ((1UL << __SWP_TYPE_BITS) - 1)
#define __SWP_OFFSET_SHIFT (__SWP_TYPE_BITS + __SWP_TYPE_SHIFT)
@@ -744,11 +746,28 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
#define __swp_type(x) (((x).val >> __SWP_TYPE_SHIFT) & __SWP_TYPE_MASK)
#define __swp_offset(x) ((x).val >> __SWP_OFFSET_SHIFT)
#define __swp_entry(type, offset) ((swp_entry_t) \
- { ((type) << __SWP_TYPE_SHIFT) | ((offset) << __SWP_OFFSET_SHIFT) })
+ { (((type) & __SWP_TYPE_MASK) << __SWP_TYPE_SHIFT) | \
+ ((offset) << __SWP_OFFSET_SHIFT) })

#define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val(pte) })
#define __swp_entry_to_pte(x) ((pte_t) { (x).val })

+#define __HAVE_ARCH_PTE_SWP_EXCLUSIVE
+static inline int pte_swp_exclusive(pte_t pte)
+{
+ return pte_val(pte) & _PAGE_SWP_EXCLUSIVE;
+}
+
+static inline pte_t pte_swp_mkexclusive(pte_t pte)
+{
+ return __pte(pte_val(pte) | _PAGE_SWP_EXCLUSIVE);
+}
+
+static inline pte_t pte_swp_clear_exclusive(pte_t pte)
+{
+ return __pte(pte_val(pte) & ~_PAGE_SWP_EXCLUSIVE);
+}
+
#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
#define __pmd_to_swp_entry(pmd) ((swp_entry_t) { pmd_val(pmd) })
#define __swp_entry_to_pmd(swp) __pmd((swp).val)
--
2.39.0

2023-01-13 17:45:15

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH mm-unstable v1 17/26] powerpc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on 32bit book3s

We already implemented support for 64bit book3s in commit bff9beaa2e80
("powerpc/pgtable: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE for book3s")

Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE also in 32bit by reusing yet
unused LSB 2 / MSB 29. There seems to be no real reason why that bit cannot
be used, and reusing it avoids having to steal one bit from the swap
offset.

While at it, mask the type in __swp_entry().

Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Christophe Leroy <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
arch/powerpc/include/asm/book3s/32/pgtable.h | 38 +++++++++++++++++---
1 file changed, 33 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h b/arch/powerpc/include/asm/book3s/32/pgtable.h
index 75823f39e042..0ecb3a58f23f 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -42,6 +42,9 @@
#define _PMD_PRESENT_MASK (PAGE_MASK)
#define _PMD_BAD (~PAGE_MASK)

+/* We borrow the _PAGE_USER bit to store the exclusive marker in swap PTEs. */
+#define _PAGE_SWP_EXCLUSIVE _PAGE_USER
+
/* And here we include common definitions */

#define _PAGE_KERNEL_RO 0
@@ -363,17 +366,42 @@ static inline void __ptep_set_access_flags(struct vm_area_struct *vma,
#define pmd_page(pmd) pfn_to_page(pmd_pfn(pmd))

/*
- * Encode and decode a swap entry.
- * Note that the bits we use in a PTE for representing a swap entry
- * must not include the _PAGE_PRESENT bit or the _PAGE_HASHPTE bit (if used).
- * -- paulus
+ * Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
+ * are !pte_none() && !pte_present().
+ *
+ * Format of swap PTEs (32bit PTEs):
+ *
+ * 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3
+ * 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ * <----------------- offset --------------------> < type -> E H P
+ *
+ * E is the exclusive marker that is not stored in swap entries.
+ * _PAGE_PRESENT (P) and __PAGE_HASHPTE (H) must be 0.
+ *
+ * For 64bit PTEs, the offset is extended by 32bit.
*/
#define __swp_type(entry) ((entry).val & 0x1f)
#define __swp_offset(entry) ((entry).val >> 5)
-#define __swp_entry(type, offset) ((swp_entry_t) { (type) | ((offset) << 5) })
+#define __swp_entry(type, offset) ((swp_entry_t) { ((type) & 0x1f) | ((offset) << 5) })
#define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val(pte) >> 3 })
#define __swp_entry_to_pte(x) ((pte_t) { (x).val << 3 })

+#define __HAVE_ARCH_PTE_SWP_EXCLUSIVE
+static inline int pte_swp_exclusive(pte_t pte)
+{
+ return pte_val(pte) & _PAGE_SWP_EXCLUSIVE;
+}
+
+static inline pte_t pte_swp_mkexclusive(pte_t pte)
+{
+ return __pte(pte_val(pte) | _PAGE_SWP_EXCLUSIVE);
+}
+
+static inline pte_t pte_swp_clear_exclusive(pte_t pte)
+{
+ return __pte(pte_val(pte) & ~_PAGE_SWP_EXCLUSIVE);
+}
+
/* Generic accessors to PTE bits */
static inline int pte_write(pte_t pte) { return !!(pte_val(pte) & _PAGE_RW);}
static inline int pte_read(pte_t pte) { return 1; }
--
2.39.0

2023-01-13 17:45:21

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH mm-unstable v1 15/26] openrisc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE

Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by stealing one bit
from the type. Generic MM currently only uses 5 bits for the type
(MAX_SWAPFILES_SHIFT), so the stolen bit is effectively unused.

While at it, mask the type in __swp_entry().

Cc: Stefan Kristiansson <[email protected]>
Cc: Stafford Horne <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
arch/openrisc/include/asm/pgtable.h | 41 +++++++++++++++++++++++++----
1 file changed, 36 insertions(+), 5 deletions(-)

diff --git a/arch/openrisc/include/asm/pgtable.h b/arch/openrisc/include/asm/pgtable.h
index 6477c17b3062..903b32d662ab 100644
--- a/arch/openrisc/include/asm/pgtable.h
+++ b/arch/openrisc/include/asm/pgtable.h
@@ -154,6 +154,9 @@ extern void paging_init(void);
#define _KERNPG_TABLE \
(_PAGE_BASE | _PAGE_SRE | _PAGE_SWE | _PAGE_ACCESSED | _PAGE_DIRTY)

+/* We borrow bit 11 to store the exclusive marker in swap PTEs. */
+#define _PAGE_SWP_EXCLUSIVE _PAGE_U_SHARED
+
#define PAGE_NONE __pgprot(_PAGE_ALL)
#define PAGE_READONLY __pgprot(_PAGE_ALL | _PAGE_URE | _PAGE_SRE)
#define PAGE_READONLY_X __pgprot(_PAGE_ALL | _PAGE_URE | _PAGE_SRE | _PAGE_EXEC)
@@ -385,16 +388,44 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,

/* __PHX__ FIXME, SWAP, this probably doesn't work */

-/* Encode and de-code a swap entry (must be !pte_none(e) && !pte_present(e)) */
-/* Since the PAGE_PRESENT bit is bit 4, we can use the bits above */
-
-#define __swp_type(x) (((x).val >> 5) & 0x7f)
+/*
+ * Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
+ * are !pte_none() && !pte_present().
+ *
+ * Format of swap PTEs:
+ *
+ * 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1
+ * 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
+ * <-------------- offset ---------------> E <- type --> 0 0 0 0 0
+ *
+ * E is the exclusive marker that is not stored in swap entries.
+ * The zero'ed bits include _PAGE_PRESENT.
+ */
+#define __swp_type(x) (((x).val >> 5) & 0x3f)
#define __swp_offset(x) ((x).val >> 12)
#define __swp_entry(type, offset) \
- ((swp_entry_t) { ((type) << 5) | ((offset) << 12) })
+ ((swp_entry_t) { (((type) & 0x3f) << 5) | ((offset) << 12) })
#define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val(pte) })
#define __swp_entry_to_pte(x) ((pte_t) { (x).val })

+#define __HAVE_ARCH_PTE_SWP_EXCLUSIVE
+static inline int pte_swp_exclusive(pte_t pte)
+{
+ return pte_val(pte) & _PAGE_SWP_EXCLUSIVE;
+}
+
+static inline pte_t pte_swp_mkexclusive(pte_t pte)
+{
+ pte_val(pte) |= _PAGE_SWP_EXCLUSIVE;
+ return pte;
+}
+
+static inline pte_t pte_swp_clear_exclusive(pte_t pte)
+{
+ pte_val(pte) &= ~_PAGE_SWP_EXCLUSIVE;
+ return pte;
+}
+
typedef pte_t *pte_addr_t;

#endif /* __ASSEMBLY__ */
--
2.39.0

2023-01-13 17:45:28

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH mm-unstable v1 22/26] sparc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on 64bit

Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by stealing one bit
from the type. Generic MM currently only uses 5 bits for the type
(MAX_SWAPFILES_SHIFT), so the stolen bit was effectively unused.

While at it, mask the type in __swp_entry().

Cc: "David S. Miller" <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
arch/sparc/include/asm/pgtable_64.h | 38 ++++++++++++++++++++++++++---
1 file changed, 35 insertions(+), 3 deletions(-)

diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h
index 3bc9736bddb1..a1658eebd036 100644
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -187,6 +187,9 @@ bool kern_addr_valid(unsigned long addr);
#define _PAGE_SZHUGE_4U _PAGE_SZ4MB_4U
#define _PAGE_SZHUGE_4V _PAGE_SZ4MB_4V

+/* We borrow bit 20 to store the exclusive marker in swap PTEs. */
+#define _PAGE_SWP_EXCLUSIVE _AC(0x0000000000100000, UL)
+
#ifndef __ASSEMBLY__

pte_t mk_pte_io(unsigned long, pgprot_t, int, unsigned long);
@@ -961,18 +964,47 @@ void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp);
#endif

-/* Encode and de-code a swap entry */
-#define __swp_type(entry) (((entry).val >> PAGE_SHIFT) & 0xffUL)
+/*
+ * Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
+ * are !pte_none() && !pte_present().
+ *
+ * Format of swap PTEs:
+ *
+ * 6 6 6 6 5 5 5 5 5 5 5 5 5 5 4 4 4 4 4 4 4 4 4 4 3 3 3 3 3 3 3 3
+ * 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2
+ * <--------------------------- offset ---------------------------
+ *
+ * 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1
+ * 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
+ * --------------------> E <-- type ---> <------- zeroes -------->
+ */
+#define __swp_type(entry) (((entry).val >> PAGE_SHIFT) & 0x7fUL)
#define __swp_offset(entry) ((entry).val >> (PAGE_SHIFT + 8UL))
#define __swp_entry(type, offset) \
( (swp_entry_t) \
{ \
- (((long)(type) << PAGE_SHIFT) | \
+ ((((long)(type) & 0x7fUL) << PAGE_SHIFT) | \
((long)(offset) << (PAGE_SHIFT + 8UL))) \
} )
#define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val(pte) })
#define __swp_entry_to_pte(x) ((pte_t) { (x).val })

+#define __HAVE_ARCH_PTE_SWP_EXCLUSIVE
+static inline int pte_swp_exclusive(pte_t pte)
+{
+ return pte_val(pte) & _PAGE_SWP_EXCLUSIVE;
+}
+
+static inline pte_t pte_swp_mkexclusive(pte_t pte)
+{
+ return __pte(pte_val(pte) | _PAGE_SWP_EXCLUSIVE);
+}
+
+static inline pte_t pte_swp_clear_exclusive(pte_t pte)
+{
+ return __pte(pte_val(pte) & ~_PAGE_SWP_EXCLUSIVE);
+}
+
int page_in_phys_avail(unsigned long paddr);

/*
--
2.39.0

2023-01-13 17:45:34

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH mm-unstable v1 11/26] microblaze/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE

Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by stealing one bit
from the type. Generic MM currently only uses 5 bits for the type
(MAX_SWAPFILES_SHIFT), so the stolen bit is effectively unused.

The shift by 2 when converting between PTE and arch-specific swap entry
makes the swap PTE layout a little bit harder to decipher.

While at it, drop the comment from paulus---copy-and-paste leftover
from powerpc where we actually have _PAGE_HASHPTE---and mask the type in
__swp_entry_to_pte() as well.

Cc: Michal Simek <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
arch/m68k/include/asm/mcf_pgtable.h | 4 +--
arch/microblaze/include/asm/pgtable.h | 45 +++++++++++++++++++++------
2 files changed, 37 insertions(+), 12 deletions(-)

diff --git a/arch/m68k/include/asm/mcf_pgtable.h b/arch/m68k/include/asm/mcf_pgtable.h
index 3f8f4d0e66dd..e573d7b649f7 100644
--- a/arch/m68k/include/asm/mcf_pgtable.h
+++ b/arch/m68k/include/asm/mcf_pgtable.h
@@ -46,8 +46,8 @@
#define _CACHEMASK040 (~0x060)
#define _PAGE_GLOBAL040 0x400 /* 68040 global bit, used for kva descs */

-/* We borrow bit 7 to store the exclusive marker in swap PTEs. */
-#define _PAGE_SWP_EXCLUSIVE 0x080
+/* We borrow bit 24 to store the exclusive marker in swap PTEs. */
+#define _PAGE_SWP_EXCLUSIVE CF_PAGE_NOCACHE

/*
* Externally used page protection values.
diff --git a/arch/microblaze/include/asm/pgtable.h b/arch/microblaze/include/asm/pgtable.h
index 42f5988e998b..7e3de54bf426 100644
--- a/arch/microblaze/include/asm/pgtable.h
+++ b/arch/microblaze/include/asm/pgtable.h
@@ -131,10 +131,10 @@ extern pte_t *va_to_pte(unsigned long address);
* of the 16 available. Bit 24-26 of the TLB are cleared in the TLB
* miss handler. Bit 27 is PAGE_USER, thus selecting the correct
* zone.
- * - PRESENT *must* be in the bottom two bits because swap cache
- * entries use the top 30 bits. Because 4xx doesn't support SMP
- * anyway, M is irrelevant so we borrow it for PAGE_PRESENT. Bit 30
- * is cleared in the TLB miss handler before the TLB entry is loaded.
+ * - PRESENT *must* be in the bottom two bits because swap PTEs use the top
+ * 30 bits. Because 4xx doesn't support SMP anyway, M is irrelevant so we
+ * borrow it for PAGE_PRESENT. Bit 30 is cleared in the TLB miss handler
+ * before the TLB entry is loaded.
* - All other bits of the PTE are loaded into TLBLO without
* * modification, leaving us only the bits 20, 21, 24, 25, 26, 30 for
* software PTE bits. We actually use bits 21, 24, 25, and
@@ -155,6 +155,9 @@ extern pte_t *va_to_pte(unsigned long address);
#define _PAGE_ACCESSED 0x400 /* software: R: page referenced */
#define _PMD_PRESENT PAGE_MASK

+/* We borrow bit 24 to store the exclusive marker in swap PTEs. */
+#define _PAGE_SWP_EXCLUSIVE _PAGE_DIRTY
+
/*
* Some bits are unused...
*/
@@ -393,18 +396,40 @@ static inline unsigned long pmd_page_vaddr(pmd_t pmd)
extern pgd_t swapper_pg_dir[PTRS_PER_PGD];

/*
- * Encode and decode a swap entry.
- * Note that the bits we use in a PTE for representing a swap entry
- * must not include the _PAGE_PRESENT bit, or the _PAGE_HASHPTE bit
- * (if used). -- paulus
+ * Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
+ * are !pte_none() && !pte_present().
+ *
+ * 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3
+ * 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ * <------------------ offset -------------------> E < type -> 0 0
+ *
+ * E is the exclusive marker that is not stored in swap entries.
*/
-#define __swp_type(entry) ((entry).val & 0x3f)
+#define __swp_type(entry) ((entry).val & 0x1f)
#define __swp_offset(entry) ((entry).val >> 6)
#define __swp_entry(type, offset) \
- ((swp_entry_t) { (type) | ((offset) << 6) })
+ ((swp_entry_t) { ((type) & 0x1f) | ((offset) << 6) })
#define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val(pte) >> 2 })
#define __swp_entry_to_pte(x) ((pte_t) { (x).val << 2 })

+#define __HAVE_ARCH_PTE_SWP_EXCLUSIVE
+static inline int pte_swp_exclusive(pte_t pte)
+{
+ return pte_val(pte) & _PAGE_SWP_EXCLUSIVE;
+}
+
+static inline pte_t pte_swp_mkexclusive(pte_t pte)
+{
+ pte_val(pte) |= _PAGE_SWP_EXCLUSIVE;
+ return pte;
+}
+
+static inline pte_t pte_swp_clear_exclusive(pte_t pte)
+{
+ pte_val(pte) &= ~_PAGE_SWP_EXCLUSIVE;
+ return pte;
+}
+
extern unsigned long iopa(unsigned long addr);

/* Values for nocacheflag and cmode */
--
2.39.0

2023-01-13 17:46:19

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH mm-unstable v1 18/26] powerpc/nohash/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE

Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on 32bit and 64bit.

On 64bit, let's use MSB 56 (LSB 7), located right next to the page type.
On 32bit, let's use LSB 2 to avoid stealing one bit from the swap offset.

There seems to be no real reason why these bits cannot be used for swap
PTEs. The important part is that _PAGE_PRESENT and _PAGE_HASHPTE remain
0.

While at it, mask the type in __swp_entry() and remove
_PAGE_BIT_SWAP_TYPE from pte-e500.h: while it was used in 64bit code it was
ignored in 32bit code.

Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Christophe Leroy <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
arch/powerpc/include/asm/nohash/32/pgtable.h | 22 +++++++++++++----
arch/powerpc/include/asm/nohash/32/pte-40x.h | 6 ++---
arch/powerpc/include/asm/nohash/32/pte-44x.h | 18 ++++----------
arch/powerpc/include/asm/nohash/32/pte-85xx.h | 4 ++--
arch/powerpc/include/asm/nohash/64/pgtable.h | 24 ++++++++++++++++---
arch/powerpc/include/asm/nohash/pgtable.h | 16 +++++++++++++
arch/powerpc/include/asm/nohash/pte-e500.h | 1 -
7 files changed, 63 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h b/arch/powerpc/include/asm/nohash/32/pgtable.h
index 70edad44dff6..fec56d965f00 100644
--- a/arch/powerpc/include/asm/nohash/32/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/32/pgtable.h
@@ -360,18 +360,30 @@ static inline int pte_young(pte_t pte)
#endif

#define pmd_page(pmd) pfn_to_page(pmd_pfn(pmd))
+
/*
- * Encode and decode a swap entry.
- * Note that the bits we use in a PTE for representing a swap entry
- * must not include the _PAGE_PRESENT bit.
- * -- paulus
+ * Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
+ * are !pte_none() && !pte_present().
+ *
+ * Format of swap PTEs (32bit PTEs):
+ *
+ * 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3
+ * 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ * <------------------ offset -------------------> < type -> E 0 0
+ *
+ * E is the exclusive marker that is not stored in swap entries.
+ *
+ * For 64bit PTEs, the offset is extended by 32bit.
*/
#define __swp_type(entry) ((entry).val & 0x1f)
#define __swp_offset(entry) ((entry).val >> 5)
-#define __swp_entry(type, offset) ((swp_entry_t) { (type) | ((offset) << 5) })
+#define __swp_entry(type, offset) ((swp_entry_t) { ((type) & 0x1f) | ((offset) << 5) })
#define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val(pte) >> 3 })
#define __swp_entry_to_pte(x) ((pte_t) { (x).val << 3 })

+/* We borrow LSB 2 to store the exclusive marker in swap PTEs. */
+#define _PAGE_SWP_EXCLUSIVE 0x000004
+
#endif /* !__ASSEMBLY__ */

#endif /* __ASM_POWERPC_NOHASH_32_PGTABLE_H */
diff --git a/arch/powerpc/include/asm/nohash/32/pte-40x.h b/arch/powerpc/include/asm/nohash/32/pte-40x.h
index 2d3153cfc0d7..6fe46e754556 100644
--- a/arch/powerpc/include/asm/nohash/32/pte-40x.h
+++ b/arch/powerpc/include/asm/nohash/32/pte-40x.h
@@ -27,9 +27,9 @@
* of the 16 available. Bit 24-26 of the TLB are cleared in the TLB
* miss handler. Bit 27 is PAGE_USER, thus selecting the correct
* zone.
- * - PRESENT *must* be in the bottom two bits because swap cache
- * entries use the top 30 bits. Because 40x doesn't support SMP
- * anyway, M is irrelevant so we borrow it for PAGE_PRESENT. Bit 30
+ * - PRESENT *must* be in the bottom two bits because swap PTEs
+ * use the top 30 bits. Because 40x doesn't support SMP anyway, M is
+ * irrelevant so we borrow it for PAGE_PRESENT. Bit 30
* is cleared in the TLB miss handler before the TLB entry is loaded.
* - All other bits of the PTE are loaded into TLBLO without
* modification, leaving us only the bits 20, 21, 24, 25, 26, 30 for
diff --git a/arch/powerpc/include/asm/nohash/32/pte-44x.h b/arch/powerpc/include/asm/nohash/32/pte-44x.h
index 78bc304f750e..b7ed13cee137 100644
--- a/arch/powerpc/include/asm/nohash/32/pte-44x.h
+++ b/arch/powerpc/include/asm/nohash/32/pte-44x.h
@@ -56,20 +56,10 @@
* above bits. Note that the bit values are CPU specific, not architecture
* specific.
*
- * The kernel PTE entry holds an arch-dependent swp_entry structure under
- * certain situations. In other words, in such situations some portion of
- * the PTE bits are used as a swp_entry. In the PPC implementation, the
- * 3-24th LSB are shared with swp_entry, however the 0-2nd three LSB still
- * hold protection values. That means the three protection bits are
- * reserved for both PTE and SWAP entry at the most significant three
- * LSBs.
- *
- * There are three protection bits available for SWAP entry:
- * _PAGE_PRESENT
- * _PAGE_HASHPTE (if HW has)
- *
- * So those three bits have to be inside of 0-2nd LSB of PTE.
- *
+ * The kernel PTE entry can be an ordinary PTE mapping a page or a special swap
+ * PTE. In case of a swap PTE, LSB 2-24 are used to store information regarding
+ * the swap entry. However LSB 0-1 still hold protection values, for example,
+ * to distinguish swap PTEs from ordinary PTEs, and must be used with care.
*/

#define _PAGE_PRESENT 0x00000001 /* S: PTE valid */
diff --git a/arch/powerpc/include/asm/nohash/32/pte-85xx.h b/arch/powerpc/include/asm/nohash/32/pte-85xx.h
index 93fb8e11a3f1..16451df5ddb0 100644
--- a/arch/powerpc/include/asm/nohash/32/pte-85xx.h
+++ b/arch/powerpc/include/asm/nohash/32/pte-85xx.h
@@ -11,8 +11,8 @@
32 33 34 35 36 ... 50 51 52 53 54 55 56 57 58 59 60 61 62 63
RPN...................... 0 0 U0 U1 U2 U3 UX SX UW SW UR SR

- - PRESENT *must* be in the bottom three bits because swap cache
- entries use the top 29 bits.
+ - PRESENT *must* be in the bottom two bits because swap PTEs use
+ the top 30 bits.

*/

diff --git a/arch/powerpc/include/asm/nohash/64/pgtable.h b/arch/powerpc/include/asm/nohash/64/pgtable.h
index 879e9a6e5a87..287e25864ffa 100644
--- a/arch/powerpc/include/asm/nohash/64/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/64/pgtable.h
@@ -276,22 +276,40 @@ static inline void __ptep_set_access_flags(struct vm_area_struct *vma,
#define pgd_ERROR(e) \
pr_err("%s:%d: bad pgd %08lx.\n", __FILE__, __LINE__, pgd_val(e))

-/* Encode and de-code a swap entry */
+/*
+ * Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
+ * are !pte_none() && !pte_present().
+ *
+ * Format of swap PTEs:
+ *
+ * 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3
+ * 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ * <-------------------------- offset ----------------------------
+ *
+ * 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 5 6 6 6 6
+ * 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
+ * --------------> <----------- zero ------------> E < type -> 0 0
+ *
+ * E is the exclusive marker that is not stored in swap entries.
+ */
#define MAX_SWAPFILES_CHECK() do { \
BUILD_BUG_ON(MAX_SWAPFILES_SHIFT > SWP_TYPE_BITS); \
} while (0)

#define SWP_TYPE_BITS 5
-#define __swp_type(x) (((x).val >> _PAGE_BIT_SWAP_TYPE) \
+#define __swp_type(x) (((x).val >> 2) \
& ((1UL << SWP_TYPE_BITS) - 1))
#define __swp_offset(x) ((x).val >> PTE_RPN_SHIFT)
#define __swp_entry(type, offset) ((swp_entry_t) { \
- ((type) << _PAGE_BIT_SWAP_TYPE) \
+ (((type) & 0x1f) << 2) \
| ((offset) << PTE_RPN_SHIFT) })

#define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val((pte)) })
#define __swp_entry_to_pte(x) __pte((x).val)

+/* We borrow MSB 56 (LSB 7) to store the exclusive marker in swap PTEs. */
+#define _PAGE_SWP_EXCLUSIVE 0x80
+
int map_kernel_page(unsigned long ea, unsigned long pa, pgprot_t prot);
void unmap_kernel_page(unsigned long va);
extern int __meminit vmemmap_create_mapping(unsigned long start,
diff --git a/arch/powerpc/include/asm/nohash/pgtable.h b/arch/powerpc/include/asm/nohash/pgtable.h
index 69c3a050a3d8..5f4620940c2c 100644
--- a/arch/powerpc/include/asm/nohash/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/pgtable.h
@@ -151,6 +151,22 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
return __pte((pte_val(pte) & _PAGE_CHG_MASK) | pgprot_val(newprot));
}

+#define __HAVE_ARCH_PTE_SWP_EXCLUSIVE
+static inline int pte_swp_exclusive(pte_t pte)
+{
+ return pte_val(pte) & _PAGE_SWP_EXCLUSIVE;
+}
+
+static inline pte_t pte_swp_mkexclusive(pte_t pte)
+{
+ return __pte(pte_val(pte) | _PAGE_SWP_EXCLUSIVE);
+}
+
+static inline pte_t pte_swp_clear_exclusive(pte_t pte)
+{
+ return __pte(pte_val(pte) & ~_PAGE_SWP_EXCLUSIVE);
+}
+
/* Insert a PTE, top-level function is out of line. It uses an inline
* low level function in the respective pgtable-* files
*/
diff --git a/arch/powerpc/include/asm/nohash/pte-e500.h b/arch/powerpc/include/asm/nohash/pte-e500.h
index 0934e8965e4e..d8924cbd61e4 100644
--- a/arch/powerpc/include/asm/nohash/pte-e500.h
+++ b/arch/powerpc/include/asm/nohash/pte-e500.h
@@ -12,7 +12,6 @@
/* Architected bits */
#define _PAGE_PRESENT 0x000001 /* software: pte contains a translation */
#define _PAGE_SW1 0x000002
-#define _PAGE_BIT_SWAP_TYPE 2
#define _PAGE_BAP_SR 0x000004
#define _PAGE_BAP_UR 0x000008
#define _PAGE_BAP_SW 0x000010
--
2.39.0

2023-01-14 16:18:04

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH mm-unstable v1 01/26] mm/debug_vm_pgtable: more pte_swp_exclusive() sanity checks

On 13.01.23 18:10, David Hildenbrand wrote:
> We want to implement __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures.
> Let's extend our sanity checks, especially testing that our PTE bit
> does not affect:
> * is_swap_pte() -> pte_present() and pte_none()
> * the swap entry + type
> * pte_swp_soft_dirty()
>
> Especially, the pfn_pte() is dodgy when the swap PTE layout differs
> heavily from ordinary PTEs. Let's properly construct a swap PTE from
> swap type+offset.
>
> Signed-off-by: David Hildenbrand <[email protected]>
> ---

The following fixup for !CONFIG_SWAP on top, which makes it compile for me and
passes when booting on x86_64 with CONFIG_DEBUG_VM_PGTABLE:

...
[ 0.347112] Loaded X.509 cert 'Build time autogenerated kernel key: ee6afc0578f6475656fec8a4f9d02832'
[ 0.350112] debug_vm_pgtable: [debug_vm_pgtable ]: Validating architecture page table helpers
[ 0.351217] page_owner is disabled
...


From 6a6162e8af62a4b3f7b9d823fdfae86de3f34a9d Mon Sep 17 00:00:00 2001
From: David Hildenbrand <[email protected]>
Date: Sat, 14 Jan 2023 16:47:12 +0100
Subject: [PATCH] fixup: mm/debug_vm_pgtable: more pte_swp_exclusive() sanity
checks

generic_max_swapfile_size() is only available with CONFIG_SWAP -- which
makes sense, because without SWAP there are no swap files. Let's
simply probe manually which bits we can obtain after storing them in a
PTE, and properly call it "max swap offset", which is more generic for
a swap entry.

Reported-by: kernel test robot <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
mm/debug_vm_pgtable.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 3da0cc380c35..af59cc7bd307 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -810,15 +810,17 @@ static void __init pmd_swap_soft_dirty_tests(struct pgtable_debug_args *args) {

static void __init pte_swap_exclusive_tests(struct pgtable_debug_args *args)
{
- unsigned long max_swapfile_size = generic_max_swapfile_size();
+ unsigned long max_swap_offset;
swp_entry_t entry, entry2;
pte_t pte;

pr_debug("Validating PTE swap exclusive\n");

+ /* See generic_max_swapfile_size(): probe the maximum offset */
+ max_swap_offset = swp_offset(pte_to_swp_entry(swp_entry_to_pte(swp_entry(0, ~0UL))));
+
/* Create a swp entry with all possible bits set */
- entry = swp_entry((1 << MAX_SWAPFILES_SHIFT) - 1,
- max_swapfile_size - 1);
+ entry = swp_entry((1 << MAX_SWAPFILES_SHIFT) - 1, max_swap_offset);

pte = swp_entry_to_pte(entry);
WARN_ON(pte_swp_exclusive(pte));
--
2.39.0



--
Thanks,

David / dhildenb

2023-02-10 05:08:24

by Michael Ellerman

[permalink] [raw]
Subject: Re: [PATCH mm-unstable v1 17/26] powerpc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on 32bit book3s

David Hildenbrand <[email protected]> writes:
> We already implemented support for 64bit book3s in commit bff9beaa2e80
> ("powerpc/pgtable: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE for book3s")
>
> Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE also in 32bit by reusing yet
> unused LSB 2 / MSB 29. There seems to be no real reason why that bit cannot
> be used, and reusing it avoids having to steal one bit from the swap
> offset.
>
> While at it, mask the type in __swp_entry().
>
> Cc: Michael Ellerman <[email protected]>
> Cc: Nicholas Piggin <[email protected]>
> Cc: Christophe Leroy <[email protected]>
> Signed-off-by: David Hildenbrand <[email protected]>
> ---
> arch/powerpc/include/asm/book3s/32/pgtable.h | 38 +++++++++++++++++---
> 1 file changed, 33 insertions(+), 5 deletions(-)

I gave this a quick test on a ppc32 machine, everything seems fine.

Your test_swp_exclusive.c passes, and an LTP run looks normal.

Acked-by: Michael Ellerman <[email protected]> (powerpc)

cheers

> diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h b/arch/powerpc/include/asm/book3s/32/pgtable.h
> index 75823f39e042..0ecb3a58f23f 100644
> --- a/arch/powerpc/include/asm/book3s/32/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
> @@ -42,6 +42,9 @@
> #define _PMD_PRESENT_MASK (PAGE_MASK)
> #define _PMD_BAD (~PAGE_MASK)
>
> +/* We borrow the _PAGE_USER bit to store the exclusive marker in swap PTEs. */
> +#define _PAGE_SWP_EXCLUSIVE _PAGE_USER
> +
> /* And here we include common definitions */
>
> #define _PAGE_KERNEL_RO 0
> @@ -363,17 +366,42 @@ static inline void __ptep_set_access_flags(struct vm_area_struct *vma,
> #define pmd_page(pmd) pfn_to_page(pmd_pfn(pmd))
>
> /*
> - * Encode and decode a swap entry.
> - * Note that the bits we use in a PTE for representing a swap entry
> - * must not include the _PAGE_PRESENT bit or the _PAGE_HASHPTE bit (if used).
> - * -- paulus
> + * Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
> + * are !pte_none() && !pte_present().
> + *
> + * Format of swap PTEs (32bit PTEs):
> + *
> + * 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3
> + * 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
> + * <----------------- offset --------------------> < type -> E H P
> + *
> + * E is the exclusive marker that is not stored in swap entries.
> + * _PAGE_PRESENT (P) and __PAGE_HASHPTE (H) must be 0.
> + *
> + * For 64bit PTEs, the offset is extended by 32bit.
> */
> #define __swp_type(entry) ((entry).val & 0x1f)
> #define __swp_offset(entry) ((entry).val >> 5)
> -#define __swp_entry(type, offset) ((swp_entry_t) { (type) | ((offset) << 5) })
> +#define __swp_entry(type, offset) ((swp_entry_t) { ((type) & 0x1f) | ((offset) << 5) })
> #define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val(pte) >> 3 })
> #define __swp_entry_to_pte(x) ((pte_t) { (x).val << 3 })
>
> +#define __HAVE_ARCH_PTE_SWP_EXCLUSIVE
> +static inline int pte_swp_exclusive(pte_t pte)
> +{
> + return pte_val(pte) & _PAGE_SWP_EXCLUSIVE;
> +}
> +
> +static inline pte_t pte_swp_mkexclusive(pte_t pte)
> +{
> + return __pte(pte_val(pte) | _PAGE_SWP_EXCLUSIVE);
> +}
> +
> +static inline pte_t pte_swp_clear_exclusive(pte_t pte)
> +{
> + return __pte(pte_val(pte) & ~_PAGE_SWP_EXCLUSIVE);
> +}
> +
> /* Generic accessors to PTE bits */
> static inline int pte_write(pte_t pte) { return !!(pte_val(pte) & _PAGE_RW);}
> static inline int pte_read(pte_t pte) { return 1; }
> --
> 2.39.0
>
>
> _______________________________________________
> linux-riscv mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-riscv

2023-02-26 20:13:55

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH mm-unstable v1 11/26] microblaze/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE

Hi David,

On Fri, Jan 13, 2023 at 6:16 PM David Hildenbrand <[email protected]> wrote:
> Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by stealing one bit
> from the type. Generic MM currently only uses 5 bits for the type
> (MAX_SWAPFILES_SHIFT), so the stolen bit is effectively unused.
>
> The shift by 2 when converting between PTE and arch-specific swap entry
> makes the swap PTE layout a little bit harder to decipher.
>
> While at it, drop the comment from paulus---copy-and-paste leftover
> from powerpc where we actually have _PAGE_HASHPTE---and mask the type in
> __swp_entry_to_pte() as well.
>
> Cc: Michal Simek <[email protected]>
> Signed-off-by: David Hildenbrand <[email protected]>

Thanks for your patch, which is now commit b5c88f21531c3457
("microblaze/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE") in

> arch/m68k/include/asm/mcf_pgtable.h | 4 +--

What is this m68k change doing here?
Sorry for not noticing this earlier.

Furthermore, several things below look strange to me...

> arch/microblaze/include/asm/pgtable.h | 45 +++++++++++++++++++++------
> 2 files changed, 37 insertions(+), 12 deletions(-)
>
> diff --git a/arch/m68k/include/asm/mcf_pgtable.h b/arch/m68k/include/asm/mcf_pgtable.h
> index 3f8f4d0e66dd..e573d7b649f7 100644
> --- a/arch/m68k/include/asm/mcf_pgtable.h
> +++ b/arch/m68k/include/asm/mcf_pgtable.h
> @@ -46,8 +46,8 @@
> #define _CACHEMASK040 (~0x060)
> #define _PAGE_GLOBAL040 0x400 /* 68040 global bit, used for kva descs */
>
> -/* We borrow bit 7 to store the exclusive marker in swap PTEs. */
> -#define _PAGE_SWP_EXCLUSIVE 0x080
> +/* We borrow bit 24 to store the exclusive marker in swap PTEs. */
> +#define _PAGE_SWP_EXCLUSIVE CF_PAGE_NOCACHE

CF_PAGE_NOCACHE is 0x80, so this is still bit 7, thus the new comment
is wrong?

>
> /*
> * Externally used page protection values.
> diff --git a/arch/microblaze/include/asm/pgtable.h b/arch/microblaze/include/asm/pgtable.h
> index 42f5988e998b..7e3de54bf426 100644
> --- a/arch/microblaze/include/asm/pgtable.h
> +++ b/arch/microblaze/include/asm/pgtable.h
> @@ -131,10 +131,10 @@ extern pte_t *va_to_pte(unsigned long address);
> * of the 16 available. Bit 24-26 of the TLB are cleared in the TLB
> * miss handler. Bit 27 is PAGE_USER, thus selecting the correct
> * zone.
> - * - PRESENT *must* be in the bottom two bits because swap cache
> - * entries use the top 30 bits. Because 4xx doesn't support SMP
> - * anyway, M is irrelevant so we borrow it for PAGE_PRESENT. Bit 30
> - * is cleared in the TLB miss handler before the TLB entry is loaded.
> + * - PRESENT *must* be in the bottom two bits because swap PTEs use the top
> + * 30 bits. Because 4xx doesn't support SMP anyway, M is irrelevant so we
> + * borrow it for PAGE_PRESENT. Bit 30 is cleared in the TLB miss handler
> + * before the TLB entry is loaded.

So the PowerPC 4xx comment is still here?

> * - All other bits of the PTE are loaded into TLBLO without
> * * modification, leaving us only the bits 20, 21, 24, 25, 26, 30 for
> * software PTE bits. We actually use bits 21, 24, 25, and
> @@ -155,6 +155,9 @@ extern pte_t *va_to_pte(unsigned long address);
> #define _PAGE_ACCESSED 0x400 /* software: R: page referenced */
> #define _PMD_PRESENT PAGE_MASK
>
> +/* We borrow bit 24 to store the exclusive marker in swap PTEs. */
> +#define _PAGE_SWP_EXCLUSIVE _PAGE_DIRTY

_PAGE_DIRTY is 0x80, so this is also bit 7, thus the new comment is
wrong?

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2023-02-27 13:32:48

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH mm-unstable v1 11/26] microblaze/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE

On 26.02.23 21:13, Geert Uytterhoeven wrote:
> Hi David,

Hi Geert,

>
> On Fri, Jan 13, 2023 at 6:16 PM David Hildenbrand <[email protected]> wrote:
>> Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by stealing one bit
>> from the type. Generic MM currently only uses 5 bits for the type
>> (MAX_SWAPFILES_SHIFT), so the stolen bit is effectively unused.
>>
>> The shift by 2 when converting between PTE and arch-specific swap entry
>> makes the swap PTE layout a little bit harder to decipher.
>>
>> While at it, drop the comment from paulus---copy-and-paste leftover
>> from powerpc where we actually have _PAGE_HASHPTE---and mask the type in
>> __swp_entry_to_pte() as well.
>>
>> Cc: Michal Simek <[email protected]>
>> Signed-off-by: David Hildenbrand <[email protected]>
>
> Thanks for your patch, which is now commit b5c88f21531c3457
> ("microblaze/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE") in
>

Right, it went upstream, so we can only fixup.

>> arch/m68k/include/asm/mcf_pgtable.h | 4 +--
>
> What is this m68k change doing here?
> Sorry for not noticing this earlier.

Thanks for the late review, still valuable :)

That hunk should have gone into the previous patch, looks like I messed
that up when reworking.

>
> Furthermore, several things below look strange to me...
>
>> arch/microblaze/include/asm/pgtable.h | 45 +++++++++++++++++++++------
>> 2 files changed, 37 insertions(+), 12 deletions(-)
>>
>> diff --git a/arch/m68k/include/asm/mcf_pgtable.h b/arch/m68k/include/asm/mcf_pgtable.h
>> index 3f8f4d0e66dd..e573d7b649f7 100644
>> --- a/arch/m68k/include/asm/mcf_pgtable.h
>> +++ b/arch/m68k/include/asm/mcf_pgtable.h
>> @@ -46,8 +46,8 @@
>> #define _CACHEMASK040 (~0x060)
>> #define _PAGE_GLOBAL040 0x400 /* 68040 global bit, used for kva descs */
>>
>> -/* We borrow bit 7 to store the exclusive marker in swap PTEs. */
>> -#define _PAGE_SWP_EXCLUSIVE 0x080
>> +/* We borrow bit 24 to store the exclusive marker in swap PTEs. */
>> +#define _PAGE_SWP_EXCLUSIVE CF_PAGE_NOCACHE
>
> CF_PAGE_NOCACHE is 0x80, so this is still bit 7, thus the new comment
> is wrong?

You're right, it's still bit 7 (and we use LSB-0 bit numbering in that
file). I'll send a fixup.

>
>>
>> /*
>> * Externally used page protection values.
>> diff --git a/arch/microblaze/include/asm/pgtable.h b/arch/microblaze/include/asm/pgtable.h
>> index 42f5988e998b..7e3de54bf426 100644
>> --- a/arch/microblaze/include/asm/pgtable.h
>> +++ b/arch/microblaze/include/asm/pgtable.h
>> @@ -131,10 +131,10 @@ extern pte_t *va_to_pte(unsigned long address);
>> * of the 16 available. Bit 24-26 of the TLB are cleared in the TLB
>> * miss handler. Bit 27 is PAGE_USER, thus selecting the correct
>> * zone.
>> - * - PRESENT *must* be in the bottom two bits because swap cache
>> - * entries use the top 30 bits. Because 4xx doesn't support SMP
>> - * anyway, M is irrelevant so we borrow it for PAGE_PRESENT. Bit 30
>> - * is cleared in the TLB miss handler before the TLB entry is loaded.
>> + * - PRESENT *must* be in the bottom two bits because swap PTEs use the top
>> + * 30 bits. Because 4xx doesn't support SMP anyway, M is irrelevant so we
>> + * borrow it for PAGE_PRESENT. Bit 30 is cleared in the TLB miss handler
>> + * before the TLB entry is loaded.
>
> So the PowerPC 4xx comment is still here?

I only dropped the comment above __swp_type(). I guess you mean that we
could also drop the "Because 4xx doesn't support SMP anyway, M is
irrelevant so we borrow it for PAGE_PRESENT." sentence, correct? Not
sure about the "Bit 30 is cleared in the TLB miss handler" comment, if
that can similarly be dropped.

>
>> * - All other bits of the PTE are loaded into TLBLO without
>> * * modification, leaving us only the bits 20, 21, 24, 25, 26, 30 for
>> * software PTE bits. We actually use bits 21, 24, 25, and
>> @@ -155,6 +155,9 @@ extern pte_t *va_to_pte(unsigned long address);
>> #define _PAGE_ACCESSED 0x400 /* software: R: page referenced */
>> #define _PMD_PRESENT PAGE_MASK
>>
>> +/* We borrow bit 24 to store the exclusive marker in swap PTEs. */
>> +#define _PAGE_SWP_EXCLUSIVE _PAGE_DIRTY
>
> _PAGE_DIRTY is 0x80, so this is also bit 7, thus the new comment is
> wrong?

In the example, I use MSB-0 bit numbering (which I determined to be
correct in microblaze context eventually, but I got confused a couple a
times because it's very inconsistent). That should be MSB-0 bit 24.

Thanks!

--
Thanks,

David / dhildenb


2023-02-27 14:43:33

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH mm-unstable v1 11/26] microblaze/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE

Hi David,

On Mon, Feb 27, 2023 at 2:31 PM David Hildenbrand <[email protected]> wrote:
> On 26.02.23 21:13, Geert Uytterhoeven wrote:
> > On Fri, Jan 13, 2023 at 6:16 PM David Hildenbrand <[email protected]> wrote:
> >> Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by stealing one bit
> >> from the type. Generic MM currently only uses 5 bits for the type
> >> (MAX_SWAPFILES_SHIFT), so the stolen bit is effectively unused.
> >>
> >> The shift by 2 when converting between PTE and arch-specific swap entry
> >> makes the swap PTE layout a little bit harder to decipher.
> >>
> >> While at it, drop the comment from paulus---copy-and-paste leftover
> >> from powerpc where we actually have _PAGE_HASHPTE---and mask the type in
> >> __swp_entry_to_pte() as well.
> >>
> >> Cc: Michal Simek <[email protected]>
> >> Signed-off-by: David Hildenbrand <[email protected]>
> >
> > Thanks for your patch, which is now commit b5c88f21531c3457
> > ("microblaze/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE") in
> >
>
> Right, it went upstream, so we can only fixup.
>
> >> arch/m68k/include/asm/mcf_pgtable.h | 4 +--
> >
> > What is this m68k change doing here?
> > Sorry for not noticing this earlier.
>
> Thanks for the late review, still valuable :)
>
> That hunk should have gone into the previous patch, looks like I messed
> that up when reworking.
>
> >
> > Furthermore, several things below look strange to me...
> >
> >> arch/microblaze/include/asm/pgtable.h | 45 +++++++++++++++++++++------
> >> 2 files changed, 37 insertions(+), 12 deletions(-)
> >>
> >> diff --git a/arch/m68k/include/asm/mcf_pgtable.h b/arch/m68k/include/asm/mcf_pgtable.h
> >> index 3f8f4d0e66dd..e573d7b649f7 100644
> >> --- a/arch/m68k/include/asm/mcf_pgtable.h
> >> +++ b/arch/m68k/include/asm/mcf_pgtable.h
> >> @@ -46,8 +46,8 @@
> >> #define _CACHEMASK040 (~0x060)
> >> #define _PAGE_GLOBAL040 0x400 /* 68040 global bit, used for kva descs */
> >>
> >> -/* We borrow bit 7 to store the exclusive marker in swap PTEs. */
> >> -#define _PAGE_SWP_EXCLUSIVE 0x080
> >> +/* We borrow bit 24 to store the exclusive marker in swap PTEs. */
> >> +#define _PAGE_SWP_EXCLUSIVE CF_PAGE_NOCACHE
> >
> > CF_PAGE_NOCACHE is 0x80, so this is still bit 7, thus the new comment
> > is wrong?
>
> You're right, it's still bit 7 (and we use LSB-0 bit numbering in that
> file). I'll send a fixup.

OK.

> >> /*
> >> * Externally used page protection values.
> >> diff --git a/arch/microblaze/include/asm/pgtable.h b/arch/microblaze/include/asm/pgtable.h
> >> index 42f5988e998b..7e3de54bf426 100644
> >> --- a/arch/microblaze/include/asm/pgtable.h
> >> +++ b/arch/microblaze/include/asm/pgtable.h
> >> @@ -131,10 +131,10 @@ extern pte_t *va_to_pte(unsigned long address);
> >> * of the 16 available. Bit 24-26 of the TLB are cleared in the TLB
> >> * miss handler. Bit 27 is PAGE_USER, thus selecting the correct
> >> * zone.
> >> - * - PRESENT *must* be in the bottom two bits because swap cache
> >> - * entries use the top 30 bits. Because 4xx doesn't support SMP
> >> - * anyway, M is irrelevant so we borrow it for PAGE_PRESENT. Bit 30
> >> - * is cleared in the TLB miss handler before the TLB entry is loaded.
> >> + * - PRESENT *must* be in the bottom two bits because swap PTEs use the top
> >> + * 30 bits. Because 4xx doesn't support SMP anyway, M is irrelevant so we
> >> + * borrow it for PAGE_PRESENT. Bit 30 is cleared in the TLB miss handler
> >> + * before the TLB entry is loaded.
> >
> > So the PowerPC 4xx comment is still here?
>
> I only dropped the comment above __swp_type(). I guess you mean that we
> could also drop the "Because 4xx doesn't support SMP anyway, M is
> irrelevant so we borrow it for PAGE_PRESENT." sentence, correct? Not

Yes, that's what I meant.

> sure about the "Bit 30 is cleared in the TLB miss handler" comment, if
> that can similarly be dropped.

No idea, didn't check. But if it was copied from PPC, chances are
high it's no longer true....

> >> * - All other bits of the PTE are loaded into TLBLO without
> >> * * modification, leaving us only the bits 20, 21, 24, 25, 26, 30 for
> >> * software PTE bits. We actually use bits 21, 24, 25, and
> >> @@ -155,6 +155,9 @@ extern pte_t *va_to_pte(unsigned long address);
> >> #define _PAGE_ACCESSED 0x400 /* software: R: page referenced */
> >> #define _PMD_PRESENT PAGE_MASK
> >>
> >> +/* We borrow bit 24 to store the exclusive marker in swap PTEs. */
> >> +#define _PAGE_SWP_EXCLUSIVE _PAGE_DIRTY
> >
> > _PAGE_DIRTY is 0x80, so this is also bit 7, thus the new comment is
> > wrong?
>
> In the example, I use MSB-0 bit numbering (which I determined to be
> correct in microblaze context eventually, but I got confused a couple a
> times because it's very inconsistent). That should be MSB-0 bit 24.

Thanks, TIL microblaze uses IBM bit numbering...

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2023-02-27 17:01:55

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH mm-unstable v1 11/26] microblaze/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE

>>>> /*
>>>> * Externally used page protection values.
>>>> diff --git a/arch/microblaze/include/asm/pgtable.h b/arch/microblaze/include/asm/pgtable.h
>>>> index 42f5988e998b..7e3de54bf426 100644
>>>> --- a/arch/microblaze/include/asm/pgtable.h
>>>> +++ b/arch/microblaze/include/asm/pgtable.h
>>>> @@ -131,10 +131,10 @@ extern pte_t *va_to_pte(unsigned long address);
>>>> * of the 16 available. Bit 24-26 of the TLB are cleared in the TLB
>>>> * miss handler. Bit 27 is PAGE_USER, thus selecting the correct
>>>> * zone.
>>>> - * - PRESENT *must* be in the bottom two bits because swap cache
>>>> - * entries use the top 30 bits. Because 4xx doesn't support SMP
>>>> - * anyway, M is irrelevant so we borrow it for PAGE_PRESENT. Bit 30
>>>> - * is cleared in the TLB miss handler before the TLB entry is loaded.
>>>> + * - PRESENT *must* be in the bottom two bits because swap PTEs use the top
>>>> + * 30 bits. Because 4xx doesn't support SMP anyway, M is irrelevant so we
>>>> + * borrow it for PAGE_PRESENT. Bit 30 is cleared in the TLB miss handler
>>>> + * before the TLB entry is loaded.
>>>
>>> So the PowerPC 4xx comment is still here?
>>
>> I only dropped the comment above __swp_type(). I guess you mean that we
>> could also drop the "Because 4xx doesn't support SMP anyway, M is
>> irrelevant so we borrow it for PAGE_PRESENT." sentence, correct? Not
>
> Yes, that's what I meant.
>
>> sure about the "Bit 30 is cleared in the TLB miss handler" comment, if
>> that can similarly be dropped.
>
> No idea, didn't check. But if it was copied from PPC, chances are
> high it's no longer true....

I'll have a look.

>
>>>> * - All other bits of the PTE are loaded into TLBLO without
>>>> * * modification, leaving us only the bits 20, 21, 24, 25, 26, 30 for
>>>> * software PTE bits. We actually use bits 21, 24, 25, and
>>>> @@ -155,6 +155,9 @@ extern pte_t *va_to_pte(unsigned long address);
>>>> #define _PAGE_ACCESSED 0x400 /* software: R: page referenced */
>>>> #define _PMD_PRESENT PAGE_MASK
>>>>
>>>> +/* We borrow bit 24 to store the exclusive marker in swap PTEs. */
>>>> +#define _PAGE_SWP_EXCLUSIVE _PAGE_DIRTY
>>>
>>> _PAGE_DIRTY is 0x80, so this is also bit 7, thus the new comment is
>>> wrong?
>>
>> In the example, I use MSB-0 bit numbering (which I determined to be
>> correct in microblaze context eventually, but I got confused a couple a
>> times because it's very inconsistent). That should be MSB-0 bit 24.
>
> Thanks, TIL microblaze uses IBM bit numbering...

I assume IBM bit numbering corresponds to MSB-0 bit numbering, correct?


I recall that I used the comment above "/* Definitions for MicroBlaze.
*/" as an orientation.

0 1 2 3 4 ... 18 19 20 21 22 23 24 25 26 27 28 29 30 31
RPN..................... 0 0 EX WR ZSEL....... W I M G


So ... either we adjust both or we leave it as is. (again, depends on
what the right thing to to is -- which I don't know :) )

--
Thanks,

David / dhildenb


2023-02-27 19:47:20

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH mm-unstable v1 11/26] microblaze/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE

Hi David,

On Mon, Feb 27, 2023 at 6:01 PM David Hildenbrand <[email protected]> wrote:
> >>>> /*
> >>>> * Externally used page protection values.
> >>>> diff --git a/arch/microblaze/include/asm/pgtable.h b/arch/microblaze/include/asm/pgtable.h
> >>>> index 42f5988e998b..7e3de54bf426 100644
> >>>> --- a/arch/microblaze/include/asm/pgtable.h
> >>>> +++ b/arch/microblaze/include/asm/pgtable.h

> >>>> * - All other bits of the PTE are loaded into TLBLO without
> >>>> * * modification, leaving us only the bits 20, 21, 24, 25, 26, 30 for
> >>>> * software PTE bits. We actually use bits 21, 24, 25, and
> >>>> @@ -155,6 +155,9 @@ extern pte_t *va_to_pte(unsigned long address);
> >>>> #define _PAGE_ACCESSED 0x400 /* software: R: page referenced */
> >>>> #define _PMD_PRESENT PAGE_MASK
> >>>>
> >>>> +/* We borrow bit 24 to store the exclusive marker in swap PTEs. */
> >>>> +#define _PAGE_SWP_EXCLUSIVE _PAGE_DIRTY
> >>>
> >>> _PAGE_DIRTY is 0x80, so this is also bit 7, thus the new comment is
> >>> wrong?
> >>
> >> In the example, I use MSB-0 bit numbering (which I determined to be
> >> correct in microblaze context eventually, but I got confused a couple a
> >> times because it's very inconsistent). That should be MSB-0 bit 24.
> >
> > Thanks, TIL microblaze uses IBM bit numbering...
>
> I assume IBM bit numbering corresponds to MSB-0 bit numbering, correct?

Correct, as seen in s370 and PowerPC manuals...

> I recall that I used the comment above "/* Definitions for MicroBlaze.
> */" as an orientation.
>
> 0 1 2 3 4 ... 18 19 20 21 22 23 24 25 26 27 28 29 30 31
> RPN..................... 0 0 EX WR ZSEL....... W I M G

Indeed, that's where I noticed the "unconventional" numbering...

> So ... either we adjust both or we leave it as is. (again, depends on
> what the right thing to to is -- which I don't know :) )

It depends whether you want to match the hardware documentation,
or the Linux BIT() macro and friends...

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2023-02-28 15:50:46

by Palmer Dabbelt

[permalink] [raw]
Subject: Re: [PATCH mm-unstable v1 19/26] riscv/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE

On Fri, 13 Jan 2023 09:10:19 PST (-0800), [email protected] wrote:
> Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by stealing one bit
> from the offset. This reduces the maximum swap space per file: on 32bit
> to 16 GiB (was 32 GiB).

Seems fine to me, I doubt anyone wants a huge pile of swap on rv32.

>
> Note that this bit does not conflict with swap PMDs and could also be used
> in swap PMD context later.
>
> While at it, mask the type in __swp_entry().
>
> Cc: Paul Walmsley <[email protected]>
> Cc: Palmer Dabbelt <[email protected]>
> Cc: Albert Ou <[email protected]>
> Signed-off-by: David Hildenbrand <[email protected]>
> ---
> arch/riscv/include/asm/pgtable-bits.h | 3 +++
> arch/riscv/include/asm/pgtable.h | 29 ++++++++++++++++++++++-----
> 2 files changed, 27 insertions(+), 5 deletions(-)
>
> diff --git a/arch/riscv/include/asm/pgtable-bits.h b/arch/riscv/include/asm/pgtable-bits.h
> index b9e13a8fe2b7..f896708e8331 100644
> --- a/arch/riscv/include/asm/pgtable-bits.h
> +++ b/arch/riscv/include/asm/pgtable-bits.h
> @@ -27,6 +27,9 @@
> */
> #define _PAGE_PROT_NONE _PAGE_GLOBAL
>
> +/* Used for swap PTEs only. */
> +#define _PAGE_SWP_EXCLUSIVE _PAGE_ACCESSED
> +
> #define _PAGE_PFN_SHIFT 10
>
> /*
> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> index 4eba9a98d0e3..03a4728db039 100644
> --- a/arch/riscv/include/asm/pgtable.h
> +++ b/arch/riscv/include/asm/pgtable.h
> @@ -724,16 +724,18 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
> #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>
> /*
> - * Encode and decode a swap entry
> + * Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
> + * are !pte_none() && !pte_present().
> *
> * Format of swap PTE:
> * bit 0: _PAGE_PRESENT (zero)
> * bit 1 to 3: _PAGE_LEAF (zero)
> * bit 5: _PAGE_PROT_NONE (zero)
> - * bits 6 to 10: swap type
> - * bits 10 to XLEN-1: swap offset
> + * bit 6: exclusive marker
> + * bits 7 to 11: swap type
> + * bits 11 to XLEN-1: swap offset
> */
> -#define __SWP_TYPE_SHIFT 6
> +#define __SWP_TYPE_SHIFT 7
> #define __SWP_TYPE_BITS 5
> #define __SWP_TYPE_MASK ((1UL << __SWP_TYPE_BITS) - 1)
> #define __SWP_OFFSET_SHIFT (__SWP_TYPE_BITS + __SWP_TYPE_SHIFT)
> @@ -744,11 +746,28 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
> #define __swp_type(x) (((x).val >> __SWP_TYPE_SHIFT) & __SWP_TYPE_MASK)
> #define __swp_offset(x) ((x).val >> __SWP_OFFSET_SHIFT)
> #define __swp_entry(type, offset) ((swp_entry_t) \
> - { ((type) << __SWP_TYPE_SHIFT) | ((offset) << __SWP_OFFSET_SHIFT) })
> + { (((type) & __SWP_TYPE_MASK) << __SWP_TYPE_SHIFT) | \
> + ((offset) << __SWP_OFFSET_SHIFT) })
>
> #define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val(pte) })
> #define __swp_entry_to_pte(x) ((pte_t) { (x).val })
>
> +#define __HAVE_ARCH_PTE_SWP_EXCLUSIVE
> +static inline int pte_swp_exclusive(pte_t pte)
> +{
> + return pte_val(pte) & _PAGE_SWP_EXCLUSIVE;
> +}
> +
> +static inline pte_t pte_swp_mkexclusive(pte_t pte)
> +{
> + return __pte(pte_val(pte) | _PAGE_SWP_EXCLUSIVE);
> +}
> +
> +static inline pte_t pte_swp_clear_exclusive(pte_t pte)
> +{
> + return __pte(pte_val(pte) & ~_PAGE_SWP_EXCLUSIVE);
> +}
> +
> #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
> #define __pmd_to_swp_entry(pmd) ((swp_entry_t) { pmd_val(pmd) })
> #define __swp_entry_to_pmd(swp) __pmd((swp).val)

Acked-by: Palmer Dabbelt <[email protected]>
Reviewed-by: Palmer Dabbelt <[email protected]>

2023-02-28 15:56:43

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH mm-unstable v1 11/26] microblaze/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE

On 27.02.23 20:46, Geert Uytterhoeven wrote:
> Hi David,
>
> On Mon, Feb 27, 2023 at 6:01 PM David Hildenbrand <[email protected]> wrote:
>>>>>> /*
>>>>>> * Externally used page protection values.
>>>>>> diff --git a/arch/microblaze/include/asm/pgtable.h b/arch/microblaze/include/asm/pgtable.h
>>>>>> index 42f5988e998b..7e3de54bf426 100644
>>>>>> --- a/arch/microblaze/include/asm/pgtable.h
>>>>>> +++ b/arch/microblaze/include/asm/pgtable.h
>
>>>>>> * - All other bits of the PTE are loaded into TLBLO without
>>>>>> * * modification, leaving us only the bits 20, 21, 24, 25, 26, 30 for
>>>>>> * software PTE bits. We actually use bits 21, 24, 25, and
>>>>>> @@ -155,6 +155,9 @@ extern pte_t *va_to_pte(unsigned long address);
>>>>>> #define _PAGE_ACCESSED 0x400 /* software: R: page referenced */
>>>>>> #define _PMD_PRESENT PAGE_MASK
>>>>>>
>>>>>> +/* We borrow bit 24 to store the exclusive marker in swap PTEs. */
>>>>>> +#define _PAGE_SWP_EXCLUSIVE _PAGE_DIRTY
>>>>>
>>>>> _PAGE_DIRTY is 0x80, so this is also bit 7, thus the new comment is
>>>>> wrong?
>>>>
>>>> In the example, I use MSB-0 bit numbering (which I determined to be
>>>> correct in microblaze context eventually, but I got confused a couple a
>>>> times because it's very inconsistent). That should be MSB-0 bit 24.
>>>
>>> Thanks, TIL microblaze uses IBM bit numbering...
>>
>> I assume IBM bit numbering corresponds to MSB-0 bit numbering, correct?
>
> Correct, as seen in s370 and PowerPC manuals...

Good, I have some solid s390x background, but thinking about the term
"IBM PC" made me double-check that we're talking about the same thing ;)

>
>> I recall that I used the comment above "/* Definitions for MicroBlaze.
>> */" as an orientation.
>>
>> 0 1 2 3 4 ... 18 19 20 21 22 23 24 25 26 27 28 29 30 31
>> RPN..................... 0 0 EX WR ZSEL....... W I M G
>
> Indeed, that's where I noticed the "unconventional" numbering...
>
>> So ... either we adjust both or we leave it as is. (again, depends on
>> what the right thing to to is -- which I don't know :) )
>
> It depends whether you want to match the hardware documentation,
> or the Linux BIT() macro and friends...

The hardware documentation, so we should be good.

Thanks!

--
Thanks,

David / dhildenb


2023-02-28 15:58:04

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH mm-unstable v1 19/26] riscv/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE

On 28.02.23 16:50, Palmer Dabbelt wrote:
> On Fri, 13 Jan 2023 09:10:19 PST (-0800), [email protected] wrote:
>> Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by stealing one bit
>> from the offset. This reduces the maximum swap space per file: on 32bit
>> to 16 GiB (was 32 GiB).
>
> Seems fine to me, I doubt anyone wants a huge pile of swap on rv32.

Patch is already upstream, so we can't add tags unfortunately. Thanks
for the review!

--
Thanks,

David / dhildenb


Subject: Re: [PATCH mm-unstable v1 00/26] mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with swap PTEs

Hello:

This series was applied to riscv/linux.git (for-next)
by Andrew Morton <[email protected]>:

On Fri, 13 Jan 2023 18:10:00 +0100 you wrote:
> This is the follow-up on [1]:
> [PATCH v2 0/8] mm: COW fixes part 3: reliable GUP R/W FOLL_GET of
> anonymous pages
>
> After we implemented __HAVE_ARCH_PTE_SWP_EXCLUSIVE on most prominent
> enterprise architectures, implement __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all
> remaining architectures that support swap PTEs.
>
> [...]

Here is the summary with links:
- [mm-unstable,v1,01/26] mm/debug_vm_pgtable: more pte_swp_exclusive() sanity checks
(no matching commit)
- [mm-unstable,v1,02/26] alpha/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
(no matching commit)
- [mm-unstable,v1,03/26] arc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
(no matching commit)
- [mm-unstable,v1,04/26] arm/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
(no matching commit)
- [mm-unstable,v1,05/26] csky/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
(no matching commit)
- [mm-unstable,v1,06/26] hexagon/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
(no matching commit)
- [mm-unstable,v1,07/26] ia64/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
(no matching commit)
- [mm-unstable,v1,08/26] loongarch/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
(no matching commit)
- [mm-unstable,v1,09/26] m68k/mm: remove dummy __swp definitions for nommu
(no matching commit)
- [mm-unstable,v1,10/26] m68k/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
(no matching commit)
- [mm-unstable,v1,11/26] microblaze/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
(no matching commit)
- [mm-unstable,v1,12/26] mips/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
(no matching commit)
- [mm-unstable,v1,13/26] nios2/mm: refactor swap PTE layout
(no matching commit)
- [mm-unstable,v1,14/26] nios2/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
(no matching commit)
- [mm-unstable,v1,15/26] openrisc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
(no matching commit)
- [mm-unstable,v1,16/26] parisc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
(no matching commit)
- [mm-unstable,v1,17/26] powerpc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on 32bit book3s
(no matching commit)
- [mm-unstable,v1,18/26] powerpc/nohash/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
(no matching commit)
- [mm-unstable,v1,19/26] riscv/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
https://git.kernel.org/riscv/c/51a1007d4113
- [mm-unstable,v1,20/26] sh/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
(no matching commit)
- [mm-unstable,v1,21/26] sparc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on 32bit
(no matching commit)
- [mm-unstable,v1,22/26] sparc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on 64bit
(no matching commit)
- [mm-unstable,v1,23/26] um/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
(no matching commit)
- [mm-unstable,v1,24/26] x86/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE also on 32bit
(no matching commit)
- [mm-unstable,v1,25/26] xtensa/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
(no matching commit)
- [mm-unstable,v1,26/26] mm: remove __HAVE_ARCH_PTE_SWP_EXCLUSIVE
(no matching commit)

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html