2024-04-29 14:03:17

by Ryan Roberts

[permalink] [raw]
Subject: [PATCH v2 0/3] arm64/mm: Enable userfaultfd write-protect

Hi All,

This series adds uffd write-protect support for arm64.

Previous attempts to add uffd-wp (and soft-dirty) have failed because of a
perceived lack of available PTE SW bits. However it actually turns out that
there are 2 available but they are hidden. PTE_PROT_NONE was previously
occupying a SW bit, but can be moved, freeing up the SW bit. Bit 63 is marked as
"IGNORED" in the Arm ARM, but it does not currently indicate "reserved for SW
use" like it does for the other SW bits. I've confirmed with the spec owner that
this is an oversight; the bit is intended to be reserved for SW use and the spec
will clarify this in a future update.

So now we have two spare bits; patch 3 enables uffd-wp on arm64, using the SW
bit freed up by moving PTE_PROT_NONE. This leaves bit 63 spare for future use
(e.g. soft-dirty - see RFC at [2] - or some other usage).

---

This applies on top of v6.9-rc5.

All mm selftests involving uffd-wp now run. However, this work exposed a bug in
core-mm that was leading to some test uffd-wp failures in the pagemap_ioctl
test. The fix for that is posted at [3]. With the fix applied, all the uffd-wp
tests pass and no other selftest regressions are observed.


Changes since v1 [1]
====================

patch 1 & 2 (was patch 1):
- generalized PMD_PRESENT_INVALID into PTE_INVALID
- removed explicit PTE_PROT_NONE bit

patch 3 (was patch 2):
- collected R-b/A-b from Peter and Catalin - thanks!


[1] https://lore.kernel.org/linux-arm-kernel/[email protected]/
[2] https://lore.kernel.org/all/[email protected]/
[3] https://lore.kernel.org/all/[email protected]/

Thanks,
Ryan

Ryan Roberts (3):
arm64/mm: Refactor PMD_PRESENT_INVALID and PTE_PROT_NONE bits
arm64/mm: Move PTE_INVALID to overlay PTE_NS
arm64/mm: Add uffd write-protect support

arch/arm64/Kconfig | 1 +
arch/arm64/include/asm/pgtable-hwdef.h | 1 +
arch/arm64/include/asm/pgtable-prot.h | 17 ++---
arch/arm64/include/asm/pgtable.h | 88 +++++++++++++++++++-------
4 files changed, 75 insertions(+), 32 deletions(-)

--
2.25.1



2024-04-29 14:03:19

by Ryan Roberts

[permalink] [raw]
Subject: [PATCH v2 1/3] arm64/mm: Refactor PMD_PRESENT_INVALID and PTE_PROT_NONE bits

Currently the PMD_PRESENT_INVALID and PTE_PROT_NONE functionality
explicitly occupy 2 bits in the PTE when PTE_VALID/PMD_SECT_VALID is
clear. This has 2 significant consequences:

- PTE_PROT_NONE consumes a precious SW PTE bit that could be used for
other things.
- The swap pte layout must reserve those same 2 bits and ensure they
are both always zero for a swap pte. It would be nice to reclaim at
least one of those bits.

Note that while PMD_PRESENT_INVALID technically only applies to pmds,
the swap pte layout is common to ptes and pmds so we are currently
effectively reserving that bit at both levels.

Let's replace PMD_PRESENT_INVALID with a more generic PTE_INVALID bit,
which occupies the same position (bit 59) but applies uniformly to
page/block descriptors at any level. This bit is only interpretted when
PTE_VALID is clear. If it is set, then the pte is still considered
present; pte_present() returns true and all the fields in the pte follow
the HW interpretation (e.g. SW can safely call pte_pfn(), etc). But
crucially, the HW treats the pte as invalid and will fault if it hits.

With this in place, we can remove PTE_PROT_NONE entirely and instead
represent PROT_NONE as a present but invalid pte (PTE_VALID=0,
PTE_INVALID=1) with PTE_USER=0 and PTE_UXN=1. This is a unique
combination that is not used anywhere else.

The net result is a clearer, simpler, more generic encoding scheme that
applies uniformly to all levels. Additionally we free up a PTE SW bit a
swap pte bit (bit 58 in both cases).

Signed-off-by: Ryan Roberts <[email protected]>
---
arch/arm64/include/asm/pgtable-prot.h | 11 ++------
arch/arm64/include/asm/pgtable.h | 38 ++++++++++++---------------
2 files changed, 19 insertions(+), 30 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
index dd9ee67d1d87..de62e6881154 100644
--- a/arch/arm64/include/asm/pgtable-prot.h
+++ b/arch/arm64/include/asm/pgtable-prot.h
@@ -18,14 +18,7 @@
#define PTE_DIRTY (_AT(pteval_t, 1) << 55)
#define PTE_SPECIAL (_AT(pteval_t, 1) << 56)
#define PTE_DEVMAP (_AT(pteval_t, 1) << 57)
-#define PTE_PROT_NONE (_AT(pteval_t, 1) << 58) /* only when !PTE_VALID */
-
-/*
- * This bit indicates that the entry is present i.e. pmd_page()
- * still points to a valid huge page in memory even if the pmd
- * has been invalidated.
- */
-#define PMD_PRESENT_INVALID (_AT(pteval_t, 1) << 59) /* only when !PMD_SECT_VALID */
+#define PTE_INVALID (_AT(pteval_t, 1) << 59) /* only when !PTE_VALID */

#define _PROT_DEFAULT (PTE_TYPE_PAGE | PTE_AF | PTE_SHARED)
#define _PROT_SECT_DEFAULT (PMD_TYPE_SECT | PMD_SECT_AF | PMD_SECT_S)
@@ -103,7 +96,7 @@ static inline bool __pure lpa2_is_enabled(void)
__val; \
})

-#define PAGE_NONE __pgprot(((_PAGE_DEFAULT) & ~PTE_VALID) | PTE_PROT_NONE | PTE_RDONLY | PTE_NG | PTE_PXN | PTE_UXN)
+#define PAGE_NONE __pgprot(((_PAGE_DEFAULT) & ~PTE_VALID) | PTE_INVALID | PTE_RDONLY | PTE_NG | PTE_PXN | PTE_UXN)
/* shared+writable pages are clean by default, hence PTE_RDONLY|PTE_WRITE */
#define PAGE_SHARED __pgprot(_PAGE_SHARED)
#define PAGE_SHARED_EXEC __pgprot(_PAGE_SHARED_EXEC)
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index afdd56d26ad7..8dd4637d6b56 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -105,7 +105,7 @@ static inline pteval_t __phys_to_pte_val(phys_addr_t phys)
/*
* The following only work if pte_present(). Undefined behaviour otherwise.
*/
-#define pte_present(pte) (!!(pte_val(pte) & (PTE_VALID | PTE_PROT_NONE)))
+#define pte_present(pte) (pte_valid(pte) || pte_invalid(pte))
#define pte_young(pte) (!!(pte_val(pte) & PTE_AF))
#define pte_special(pte) (!!(pte_val(pte) & PTE_SPECIAL))
#define pte_write(pte) (!!(pte_val(pte) & PTE_WRITE))
@@ -132,6 +132,7 @@ static inline pteval_t __phys_to_pte_val(phys_addr_t phys)
#define pte_dirty(pte) (pte_sw_dirty(pte) || pte_hw_dirty(pte))

#define pte_valid(pte) (!!(pte_val(pte) & PTE_VALID))
+#define pte_invalid(pte) ((pte_val(pte) & (PTE_VALID | PTE_INVALID)) == PTE_INVALID)
/*
* Execute-only user mappings do not have the PTE_USER bit set. All valid
* kernel mappings have the PTE_UXN bit set.
@@ -261,6 +262,13 @@ static inline pte_t pte_mkpresent(pte_t pte)
return set_pte_bit(pte, __pgprot(PTE_VALID));
}

+static inline pte_t pte_mkinvalid(pte_t pte)
+{
+ pte = set_pte_bit(pte, __pgprot(PTE_INVALID));
+ pte = clear_pte_bit(pte, __pgprot(PTE_VALID));
+ return pte;
+}
+
static inline pmd_t pmd_mkcont(pmd_t pmd)
{
return __pmd(pmd_val(pmd) | PMD_SECT_CONT);
@@ -469,7 +477,7 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte)
*/
static inline int pte_protnone(pte_t pte)
{
- return (pte_val(pte) & (PTE_VALID | PTE_PROT_NONE)) == PTE_PROT_NONE;
+ return pte_invalid(pte) && !pte_user(pte) && !pte_user_exec(pte);
}

static inline int pmd_protnone(pmd_t pmd)
@@ -478,12 +486,7 @@ static inline int pmd_protnone(pmd_t pmd)
}
#endif

-#define pmd_present_invalid(pmd) (!!(pmd_val(pmd) & PMD_PRESENT_INVALID))
-
-static inline int pmd_present(pmd_t pmd)
-{
- return pte_present(pmd_pte(pmd)) || pmd_present_invalid(pmd);
-}
+#define pmd_present(pmd) pte_present(pmd_pte(pmd))

/*
* THP definitions.
@@ -508,14 +511,7 @@ static inline int pmd_trans_huge(pmd_t pmd)
#define pmd_mkclean(pmd) pte_pmd(pte_mkclean(pmd_pte(pmd)))
#define pmd_mkdirty(pmd) pte_pmd(pte_mkdirty(pmd_pte(pmd)))
#define pmd_mkyoung(pmd) pte_pmd(pte_mkyoung(pmd_pte(pmd)))
-
-static inline pmd_t pmd_mkinvalid(pmd_t pmd)
-{
- pmd = set_pmd_bit(pmd, __pgprot(PMD_PRESENT_INVALID));
- pmd = clear_pmd_bit(pmd, __pgprot(PMD_SECT_VALID));
-
- return pmd;
-}
+#define pmd_mkinvalid(pmd) pte_pmd(pte_mkinvalid(pmd_pte(pmd)))

#define pmd_thp_or_huge(pmd) (pmd_huge(pmd) || pmd_trans_huge(pmd))

@@ -1027,7 +1023,7 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
* in MAIR_EL1. The mask below has to include PTE_ATTRINDX_MASK.
*/
const pteval_t mask = PTE_USER | PTE_PXN | PTE_UXN | PTE_RDONLY |
- PTE_PROT_NONE | PTE_VALID | PTE_WRITE | PTE_GP |
+ PTE_INVALID | PTE_VALID | PTE_WRITE | PTE_GP |
PTE_ATTRINDX_MASK;
/* preserve the hardware dirty information */
if (pte_hw_dirty(pte))
@@ -1076,17 +1072,17 @@ static inline int pgd_devmap(pgd_t pgd)
#ifdef CONFIG_PAGE_TABLE_CHECK
static inline bool pte_user_accessible_page(pte_t pte)
{
- return pte_present(pte) && (pte_user(pte) || pte_user_exec(pte));
+ return pte_valid(pte) && (pte_user(pte) || pte_user_exec(pte));
}

static inline bool pmd_user_accessible_page(pmd_t pmd)
{
- return pmd_leaf(pmd) && !pmd_present_invalid(pmd) && (pmd_user(pmd) || pmd_user_exec(pmd));
+ return pmd_valid(pmd) && !pmd_table(pmd) && (pmd_user(pmd) || pmd_user_exec(pmd));
}

static inline bool pud_user_accessible_page(pud_t pud)
{
- return pud_leaf(pud) && (pud_user(pud) || pud_user_exec(pud));
+ return pud_valid(pud) && !pud_table(pud) && (pud_user(pud) || pud_user_exec(pud));
}
#endif

@@ -1250,7 +1246,7 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
* bits 2: remember PG_anon_exclusive
* bits 3-7: swap type
* bits 8-57: swap offset
- * bit 58: PTE_PROT_NONE (must be zero)
+ * bit 59: PTE_INVALID (must be zero)
*/
#define __SWP_TYPE_SHIFT 3
#define __SWP_TYPE_BITS 5
--
2.25.1


2024-04-29 14:03:40

by Ryan Roberts

[permalink] [raw]
Subject: [PATCH v2 3/3] arm64/mm: Add uffd write-protect support

Let's use the newly-free PTE SW bit (58) to add support for uffd-wp.

The standard handlers are implemented for set/test/clear for both pte
and pmd. Additionally we must also track the uffd-wp state as a pte swp
bit, so use a free swap pte bit (3).

Acked-by: Peter Xu <[email protected]>
Reviewed-by: Catalin Marinas <[email protected]>
Signed-off-by: Ryan Roberts <[email protected]>
---
arch/arm64/Kconfig | 1 +
arch/arm64/include/asm/pgtable-prot.h | 8 +++++
arch/arm64/include/asm/pgtable.h | 44 +++++++++++++++++++++++++++
3 files changed, 53 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 7b11c98b3e84..763e221f2169 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -255,6 +255,7 @@ config ARM64
select SYSCTL_EXCEPTION_TRACE
select THREAD_INFO_IN_TASK
select HAVE_ARCH_USERFAULTFD_MINOR if USERFAULTFD
+ select HAVE_ARCH_USERFAULTFD_WP if USERFAULTFD
select TRACE_IRQFLAGS_SUPPORT
select TRACE_IRQFLAGS_NMI_SUPPORT
select HAVE_SOFTIRQ_ON_OWN_STACK
diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
index ddf55895c9c2..a5e7dd37bb0a 100644
--- a/arch/arm64/include/asm/pgtable-prot.h
+++ b/arch/arm64/include/asm/pgtable-prot.h
@@ -20,6 +20,14 @@
#define PTE_DEVMAP (_AT(pteval_t, 1) << 57)
#define PTE_INVALID (PTE_NS) /* only when !PTE_VALID */

+#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP
+#define PTE_UFFD_WP (_AT(pteval_t, 1) << 58) /* uffd-wp tracking */
+#define PTE_SWP_UFFD_WP (_AT(pteval_t, 1) << 3) /* only for swp ptes */
+#else
+#define PTE_UFFD_WP (_AT(pteval_t, 0))
+#define PTE_SWP_UFFD_WP (_AT(pteval_t, 0))
+#endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */
+
#define _PROT_DEFAULT (PTE_TYPE_PAGE | PTE_AF | PTE_SHARED)
#define _PROT_SECT_DEFAULT (PMD_TYPE_SECT | PMD_SECT_AF | PMD_SECT_S)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index d966d2ee1097..759ba8c32a58 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -279,6 +279,23 @@ static inline pte_t pte_mkdevmap(pte_t pte)
return set_pte_bit(pte, __pgprot(PTE_DEVMAP | PTE_SPECIAL));
}

+#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP
+static inline int pte_uffd_wp(pte_t pte)
+{
+ return !!(pte_val(pte) & PTE_UFFD_WP);
+}
+
+static inline pte_t pte_mkuffd_wp(pte_t pte)
+{
+ return pte_wrprotect(set_pte_bit(pte, __pgprot(PTE_UFFD_WP)));
+}
+
+static inline pte_t pte_clear_uffd_wp(pte_t pte)
+{
+ return clear_pte_bit(pte, __pgprot(PTE_UFFD_WP));
+}
+#endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */
+
static inline void __set_pte(pte_t *ptep, pte_t pte)
{
WRITE_ONCE(*ptep, pte);
@@ -471,6 +488,23 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte)
return clear_pte_bit(pte, __pgprot(PTE_SWP_EXCLUSIVE));
}

+#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP
+static inline pte_t pte_swp_mkuffd_wp(pte_t pte)
+{
+ return set_pte_bit(pte, __pgprot(PTE_SWP_UFFD_WP));
+}
+
+static inline int pte_swp_uffd_wp(pte_t pte)
+{
+ return !!(pte_val(pte) & PTE_SWP_UFFD_WP);
+}
+
+static inline pte_t pte_swp_clear_uffd_wp(pte_t pte)
+{
+ return clear_pte_bit(pte, __pgprot(PTE_SWP_UFFD_WP));
+}
+#endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */
+
#ifdef CONFIG_NUMA_BALANCING
/*
* See the comment in include/linux/pgtable.h
@@ -512,6 +546,15 @@ static inline int pmd_trans_huge(pmd_t pmd)
#define pmd_mkdirty(pmd) pte_pmd(pte_mkdirty(pmd_pte(pmd)))
#define pmd_mkyoung(pmd) pte_pmd(pte_mkyoung(pmd_pte(pmd)))
#define pmd_mkinvalid(pmd) pte_pmd(pte_mkinvalid(pmd_pte(pmd)))
+#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP
+#define pmd_uffd_wp(pmd) pte_uffd_wp(pmd_pte(pmd))
+#define pmd_mkuffd_wp(pmd) pte_pmd(pte_mkuffd_wp(pmd_pte(pmd)))
+#define pmd_clear_uffd_wp(pmd) pte_pmd(pte_clear_uffd_wp(pmd_pte(pmd)))
+#define pmd_swp_uffd_wp(pmd) pte_swp_uffd_wp(pmd_pte(pmd))
+#define pmd_swp_mkuffd_wp(pmd) pte_pmd(pte_swp_mkuffd_wp(pmd_pte(pmd)))
+#define pmd_swp_clear_uffd_wp(pmd) \
+ pte_pmd(pte_swp_clear_uffd_wp(pmd_pte(pmd)))
+#endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */

#define pmd_thp_or_huge(pmd) (pmd_huge(pmd) || pmd_trans_huge(pmd))

@@ -1244,6 +1287,7 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
* Encode and decode a swap entry:
* bits 0-1: present (must be zero)
* bits 2: remember PG_anon_exclusive
+ * bit 3: remember uffd-wp state
* bit 5: PTE_INVALID (must be zero)
* bits 6-10: swap type
* bits 11-60: swap offset
--
2.25.1


2024-04-29 14:05:25

by Ryan Roberts

[permalink] [raw]
Subject: [PATCH v2 2/3] arm64/mm: Move PTE_INVALID to overlay PTE_NS

PTE_INVALID was previously occupying bit 59, which when a PTE is valid
can either be IGNORED, PBHA[0] or AttrIndex[3], depending on the HW
configuration. In practice this is currently not a problem because
PTE_INVALID can only be 1 when PTE_VALID=0 and upstream Linux always
requires the bit set to 0 for a valid pte.

However, if in future Linux wants to use the field (e.g. AttrIndex[3])
then we could end up with confusion when PTE_INVALID comes along and
corrupts the field - we would ideally want to preserve it even for an
invalid (but present) pte.

The other problem with bit 59 is that it prevents the offset field of a
swap entry within a swap pte from growing beyond 51 bits. By moving
PTE_INVALID to a low bit we can lay the swap pte out so that the
offset field could grow to 53 bits in future.

So let's move PTE_INVALID to overlay PTE_NS (bit 5). PTE_NS is res0 for
SW outside of the secure state so Linux will never need to touch it.

These are both marginal benefits, but make things a bit tidier in my
opinion.

Signed-off-by: Ryan Roberts <[email protected]>
---
arch/arm64/include/asm/pgtable-hwdef.h | 1 +
arch/arm64/include/asm/pgtable-prot.h | 2 +-
arch/arm64/include/asm/pgtable.h | 8 ++++----
3 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
index ef207a0d4f0d..7e1fea3a4328 100644
--- a/arch/arm64/include/asm/pgtable-hwdef.h
+++ b/arch/arm64/include/asm/pgtable-hwdef.h
@@ -160,6 +160,7 @@
#define PTE_TYPE_MASK (_AT(pteval_t, 3) << 0)
#define PTE_TYPE_PAGE (_AT(pteval_t, 3) << 0)
#define PTE_TABLE_BIT (_AT(pteval_t, 1) << 1)
+#define PTE_NS (_AT(pteval_t, 1) << 5) /* NS */
#define PTE_USER (_AT(pteval_t, 1) << 6) /* AP[1] */
#define PTE_RDONLY (_AT(pteval_t, 1) << 7) /* AP[2] */
#define PTE_SHARED (_AT(pteval_t, 3) << 8) /* SH[1:0], inner shareable */
diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
index de62e6881154..ddf55895c9c2 100644
--- a/arch/arm64/include/asm/pgtable-prot.h
+++ b/arch/arm64/include/asm/pgtable-prot.h
@@ -18,7 +18,7 @@
#define PTE_DIRTY (_AT(pteval_t, 1) << 55)
#define PTE_SPECIAL (_AT(pteval_t, 1) << 56)
#define PTE_DEVMAP (_AT(pteval_t, 1) << 57)
-#define PTE_INVALID (_AT(pteval_t, 1) << 59) /* only when !PTE_VALID */
+#define PTE_INVALID (PTE_NS) /* only when !PTE_VALID */

#define _PROT_DEFAULT (PTE_TYPE_PAGE | PTE_AF | PTE_SHARED)
#define _PROT_SECT_DEFAULT (PMD_TYPE_SECT | PMD_SECT_AF | PMD_SECT_S)
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 8dd4637d6b56..d966d2ee1097 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -1244,11 +1244,11 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
* Encode and decode a swap entry:
* bits 0-1: present (must be zero)
* bits 2: remember PG_anon_exclusive
- * bits 3-7: swap type
- * bits 8-57: swap offset
- * bit 59: PTE_INVALID (must be zero)
+ * bit 5: PTE_INVALID (must be zero)
+ * bits 6-10: swap type
+ * bits 11-60: swap offset
*/
-#define __SWP_TYPE_SHIFT 3
+#define __SWP_TYPE_SHIFT 6
#define __SWP_TYPE_BITS 5
#define __SWP_OFFSET_BITS 50
#define __SWP_TYPE_MASK ((1 << __SWP_TYPE_BITS) - 1)
--
2.25.1


2024-04-29 16:13:45

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v2 1/3] arm64/mm: Refactor PMD_PRESENT_INVALID and PTE_PROT_NONE bits

On 29.04.24 16:02, Ryan Roberts wrote:
> Currently the PMD_PRESENT_INVALID and PTE_PROT_NONE functionality
> explicitly occupy 2 bits in the PTE when PTE_VALID/PMD_SECT_VALID is
> clear. This has 2 significant consequences:
>
> - PTE_PROT_NONE consumes a precious SW PTE bit that could be used for
> other things.
> - The swap pte layout must reserve those same 2 bits and ensure they
> are both always zero for a swap pte. It would be nice to reclaim at
> least one of those bits.
>
> Note that while PMD_PRESENT_INVALID technically only applies to pmds,
> the swap pte layout is common to ptes and pmds so we are currently
> effectively reserving that bit at both levels.
>
> Let's replace PMD_PRESENT_INVALID with a more generic PTE_INVALID bit,
> which occupies the same position (bit 59) but applies uniformly to
> page/block descriptors at any level. This bit is only interpretted when

s/interpretted/interpreted/

> PTE_VALID is clear. If it is set, then the pte is still considered
> present; pte_present() returns true and all the fields in the pte follow
> the HW interpretation (e.g. SW can safely call pte_pfn(), etc). But
> crucially, the HW treats the pte as invalid and will fault if it hits.
>
> With this in place, we can remove PTE_PROT_NONE entirely and instead
> represent PROT_NONE as a present but invalid pte (PTE_VALID=0,
> PTE_INVALID=1) with PTE_USER=0 and PTE_UXN=1. This is a unique
> combination that is not used anywhere else.
>
> The net result is a clearer, simpler, more generic encoding scheme that
> applies uniformly to all levels. Additionally we free up a PTE SW bit a
> swap pte bit (bit 58 in both cases).
>
> Signed-off-by: Ryan Roberts <[email protected]>

Not an expert on all the details, but nothing jumped at me.

--
Cheers,

David / dhildenb


2024-04-29 16:17:08

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v2 3/3] arm64/mm: Add uffd write-protect support

On 29.04.24 16:02, Ryan Roberts wrote:
> Let's use the newly-free PTE SW bit (58) to add support for uffd-wp.
>
> The standard handlers are implemented for set/test/clear for both pte
> and pmd. Additionally we must also track the uffd-wp state as a pte swp
> bit, so use a free swap pte bit (3).
>
> Acked-by: Peter Xu <[email protected]>
> Reviewed-by: Catalin Marinas <[email protected]>
> Signed-off-by: Ryan Roberts <[email protected]>
> ---

Reviewed-by: David Hildenbrand <[email protected]>

--
Cheers,

David / dhildenb


2024-04-29 16:20:41

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v2 1/3] arm64/mm: Refactor PMD_PRESENT_INVALID and PTE_PROT_NONE bits

On Mon, Apr 29, 2024 at 03:02:05PM +0100, Ryan Roberts wrote:
> diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
> index dd9ee67d1d87..de62e6881154 100644
> --- a/arch/arm64/include/asm/pgtable-prot.h
> +++ b/arch/arm64/include/asm/pgtable-prot.h
> @@ -18,14 +18,7 @@
> #define PTE_DIRTY (_AT(pteval_t, 1) << 55)
> #define PTE_SPECIAL (_AT(pteval_t, 1) << 56)
> #define PTE_DEVMAP (_AT(pteval_t, 1) << 57)
> -#define PTE_PROT_NONE (_AT(pteval_t, 1) << 58) /* only when !PTE_VALID */
> -
> -/*
> - * This bit indicates that the entry is present i.e. pmd_page()
> - * still points to a valid huge page in memory even if the pmd
> - * has been invalidated.
> - */
> -#define PMD_PRESENT_INVALID (_AT(pteval_t, 1) << 59) /* only when !PMD_SECT_VALID */
> +#define PTE_INVALID (_AT(pteval_t, 1) << 59) /* only when !PTE_VALID */

Nitpick - I prefer the PTE_PRESENT_INVALID name as it makes it clearer
it's a present pte. We already have PTE_VALID, calling it PTE_INVALID
looks like a negation only.

> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index afdd56d26ad7..8dd4637d6b56 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -105,7 +105,7 @@ static inline pteval_t __phys_to_pte_val(phys_addr_t phys)
> /*
> * The following only work if pte_present(). Undefined behaviour otherwise.
> */
> -#define pte_present(pte) (!!(pte_val(pte) & (PTE_VALID | PTE_PROT_NONE)))
> +#define pte_present(pte) (pte_valid(pte) || pte_invalid(pte))
> #define pte_young(pte) (!!(pte_val(pte) & PTE_AF))
> #define pte_special(pte) (!!(pte_val(pte) & PTE_SPECIAL))
> #define pte_write(pte) (!!(pte_val(pte) & PTE_WRITE))
> @@ -132,6 +132,7 @@ static inline pteval_t __phys_to_pte_val(phys_addr_t phys)
> #define pte_dirty(pte) (pte_sw_dirty(pte) || pte_hw_dirty(pte))
>
> #define pte_valid(pte) (!!(pte_val(pte) & PTE_VALID))
> +#define pte_invalid(pte) ((pte_val(pte) & (PTE_VALID | PTE_INVALID)) == PTE_INVALID)

Same argument as above, pte_invalid() looks confusing to me, better (to
me) as pte_present_invalid().

I think it's sufficient to check PTE_PRESENT_INVALID only. We'd never
have both bits set, so no need for mask and compare.

> /*
> * Execute-only user mappings do not have the PTE_USER bit set. All valid
> * kernel mappings have the PTE_UXN bit set.
> @@ -261,6 +262,13 @@ static inline pte_t pte_mkpresent(pte_t pte)
> return set_pte_bit(pte, __pgprot(PTE_VALID));
> }
>
> +static inline pte_t pte_mkinvalid(pte_t pte)
> +{
> + pte = set_pte_bit(pte, __pgprot(PTE_INVALID));
> + pte = clear_pte_bit(pte, __pgprot(PTE_VALID));
> + return pte;
> +}

I wonder whether we need to define this. I guess it makes sense than
having the pmd_mkinvalid() use the PTE_* definitions directly, though it
won't be something we need to do on a pte.

> +
> static inline pmd_t pmd_mkcont(pmd_t pmd)
> {
> return __pmd(pmd_val(pmd) | PMD_SECT_CONT);
> @@ -469,7 +477,7 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte)
> */
> static inline int pte_protnone(pte_t pte)
> {
> - return (pte_val(pte) & (PTE_VALID | PTE_PROT_NONE)) == PTE_PROT_NONE;
> + return pte_invalid(pte) && !pte_user(pte) && !pte_user_exec(pte);
> }
>
> static inline int pmd_protnone(pmd_t pmd)
> @@ -478,12 +486,7 @@ static inline int pmd_protnone(pmd_t pmd)
> }
> #endif
>
> -#define pmd_present_invalid(pmd) (!!(pmd_val(pmd) & PMD_PRESENT_INVALID))
> -
> -static inline int pmd_present(pmd_t pmd)
> -{
> - return pte_present(pmd_pte(pmd)) || pmd_present_invalid(pmd);
> -}
> +#define pmd_present(pmd) pte_present(pmd_pte(pmd))
>
> /*
> * THP definitions.
> @@ -508,14 +511,7 @@ static inline int pmd_trans_huge(pmd_t pmd)
> #define pmd_mkclean(pmd) pte_pmd(pte_mkclean(pmd_pte(pmd)))
> #define pmd_mkdirty(pmd) pte_pmd(pte_mkdirty(pmd_pte(pmd)))
> #define pmd_mkyoung(pmd) pte_pmd(pte_mkyoung(pmd_pte(pmd)))
> -
> -static inline pmd_t pmd_mkinvalid(pmd_t pmd)
> -{
> - pmd = set_pmd_bit(pmd, __pgprot(PMD_PRESENT_INVALID));
> - pmd = clear_pmd_bit(pmd, __pgprot(PMD_SECT_VALID));
> -
> - return pmd;
> -}
> +#define pmd_mkinvalid(pmd) pte_pmd(pte_mkinvalid(pmd_pte(pmd)))
>
> #define pmd_thp_or_huge(pmd) (pmd_huge(pmd) || pmd_trans_huge(pmd))
>
> @@ -1027,7 +1023,7 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
> * in MAIR_EL1. The mask below has to include PTE_ATTRINDX_MASK.
> */
> const pteval_t mask = PTE_USER | PTE_PXN | PTE_UXN | PTE_RDONLY |
> - PTE_PROT_NONE | PTE_VALID | PTE_WRITE | PTE_GP |
> + PTE_INVALID | PTE_VALID | PTE_WRITE | PTE_GP |
> PTE_ATTRINDX_MASK;
> /* preserve the hardware dirty information */
> if (pte_hw_dirty(pte))
> @@ -1076,17 +1072,17 @@ static inline int pgd_devmap(pgd_t pgd)
> #ifdef CONFIG_PAGE_TABLE_CHECK
> static inline bool pte_user_accessible_page(pte_t pte)
> {
> - return pte_present(pte) && (pte_user(pte) || pte_user_exec(pte));
> + return pte_valid(pte) && (pte_user(pte) || pte_user_exec(pte));
> }

This looks fine, it doesn't change the semantics since PROT_NONE never
had pte_user() or pte_user_exec().

> static inline bool pmd_user_accessible_page(pmd_t pmd)
> {
> - return pmd_leaf(pmd) && !pmd_present_invalid(pmd) && (pmd_user(pmd) || pmd_user_exec(pmd));
> + return pmd_valid(pmd) && !pmd_table(pmd) && (pmd_user(pmd) || pmd_user_exec(pmd));
> }

Maybe our pmd_leaf() should actually check valid && !table instead of
present and no need to change these.

> static inline bool pud_user_accessible_page(pud_t pud)
> {
> - return pud_leaf(pud) && (pud_user(pud) || pud_user_exec(pud));
> + return pud_valid(pud) && !pud_table(pud) && (pud_user(pud) || pud_user_exec(pud));
> }
> #endif

Same here.

Otherwise I'm happy with the patch. Feel free to add:

Reviewed-by: Catalin Marinas <[email protected]>

As for getting it into 6.10, I'd say it's pretty late. Are there other
dependencies on core kernel fixes?

--
Catalin

2024-04-29 16:34:13

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v2 2/3] arm64/mm: Move PTE_INVALID to overlay PTE_NS

On Mon, Apr 29, 2024 at 03:02:06PM +0100, Ryan Roberts wrote:
> PTE_INVALID was previously occupying bit 59, which when a PTE is valid
> can either be IGNORED, PBHA[0] or AttrIndex[3], depending on the HW
> configuration. In practice this is currently not a problem because
> PTE_INVALID can only be 1 when PTE_VALID=0 and upstream Linux always
> requires the bit set to 0 for a valid pte.
>
> However, if in future Linux wants to use the field (e.g. AttrIndex[3])
> then we could end up with confusion when PTE_INVALID comes along and
> corrupts the field - we would ideally want to preserve it even for an
> invalid (but present) pte.
>
> The other problem with bit 59 is that it prevents the offset field of a
> swap entry within a swap pte from growing beyond 51 bits. By moving
> PTE_INVALID to a low bit we can lay the swap pte out so that the
> offset field could grow to 53 bits in future.
>
> So let's move PTE_INVALID to overlay PTE_NS (bit 5). PTE_NS is res0 for
> SW outside of the secure state so Linux will never need to touch it.
>
> These are both marginal benefits, but make things a bit tidier in my
> opinion.
>
> Signed-off-by: Ryan Roberts <[email protected]>

Reviewed-by: Catalin Marinas <[email protected]>

(subject to renaming PTE_INVALID to PTE_PRESENT_INVALID)

2024-04-29 17:15:58

by Ryan Roberts

[permalink] [raw]
Subject: Re: [PATCH v2 1/3] arm64/mm: Refactor PMD_PRESENT_INVALID and PTE_PROT_NONE bits

On 29/04/2024 17:20, Catalin Marinas wrote:
> On Mon, Apr 29, 2024 at 03:02:05PM +0100, Ryan Roberts wrote:
>> diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
>> index dd9ee67d1d87..de62e6881154 100644
>> --- a/arch/arm64/include/asm/pgtable-prot.h
>> +++ b/arch/arm64/include/asm/pgtable-prot.h
>> @@ -18,14 +18,7 @@
>> #define PTE_DIRTY (_AT(pteval_t, 1) << 55)
>> #define PTE_SPECIAL (_AT(pteval_t, 1) << 56)
>> #define PTE_DEVMAP (_AT(pteval_t, 1) << 57)
>> -#define PTE_PROT_NONE (_AT(pteval_t, 1) << 58) /* only when !PTE_VALID */
>> -
>> -/*
>> - * This bit indicates that the entry is present i.e. pmd_page()
>> - * still points to a valid huge page in memory even if the pmd
>> - * has been invalidated.
>> - */
>> -#define PMD_PRESENT_INVALID (_AT(pteval_t, 1) << 59) /* only when !PMD_SECT_VALID */
>> +#define PTE_INVALID (_AT(pteval_t, 1) << 59) /* only when !PTE_VALID */
>
> Nitpick - I prefer the PTE_PRESENT_INVALID name as it makes it clearer
> it's a present pte. We already have PTE_VALID, calling it PTE_INVALID
> looks like a negation only.

Meh, for me the pte can only be valid or invalid if it is present. So it's
implicit. And if you have PTE_PRESENT_INVALID you should also have
PTE_PRESENT_VALID.

We also have pte_mkinvalid(), which is core-mm-defined. In your scheme, surely
it should be pte_mkpresent_invalid()?

But you're the boss, I'll change this to PTE_PRESENT_INVALID. :-(

>
>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>> index afdd56d26ad7..8dd4637d6b56 100644
>> --- a/arch/arm64/include/asm/pgtable.h
>> +++ b/arch/arm64/include/asm/pgtable.h
>> @@ -105,7 +105,7 @@ static inline pteval_t __phys_to_pte_val(phys_addr_t phys)
>> /*
>> * The following only work if pte_present(). Undefined behaviour otherwise.
>> */
>> -#define pte_present(pte) (!!(pte_val(pte) & (PTE_VALID | PTE_PROT_NONE)))
>> +#define pte_present(pte) (pte_valid(pte) || pte_invalid(pte))
>> #define pte_young(pte) (!!(pte_val(pte) & PTE_AF))
>> #define pte_special(pte) (!!(pte_val(pte) & PTE_SPECIAL))
>> #define pte_write(pte) (!!(pte_val(pte) & PTE_WRITE))
>> @@ -132,6 +132,7 @@ static inline pteval_t __phys_to_pte_val(phys_addr_t phys)
>> #define pte_dirty(pte) (pte_sw_dirty(pte) || pte_hw_dirty(pte))
>>
>> #define pte_valid(pte) (!!(pte_val(pte) & PTE_VALID))
>> +#define pte_invalid(pte) ((pte_val(pte) & (PTE_VALID | PTE_INVALID)) == PTE_INVALID)
>
> Same argument as above, pte_invalid() looks confusing to me, better (to
> me) as pte_present_invalid().

OK. Consider it done.

>
> I think it's sufficient to check PTE_PRESENT_INVALID only. We'd never
> have both bits set, so no need for mask and compare.

My rationale is that the INVALID bit may have some other HW meaning when
PTE_VALID is set, so its not correct to interpret it as INVALID unless VALID is
clear. Granted bit 59 is AttrIndex[3] or PBHA[0], neither of which we are using
currently so it will always be 0 when PTE_VALID=1 (and same argument when its
moved to NS in next patch). But it feels fragile to me. I'd rather leave it as
is unless you insist.

>
>> /*
>> * Execute-only user mappings do not have the PTE_USER bit set. All valid
>> * kernel mappings have the PTE_UXN bit set.
>> @@ -261,6 +262,13 @@ static inline pte_t pte_mkpresent(pte_t pte)
>> return set_pte_bit(pte, __pgprot(PTE_VALID));
>> }
>>
>> +static inline pte_t pte_mkinvalid(pte_t pte)
>> +{
>> + pte = set_pte_bit(pte, __pgprot(PTE_INVALID));
>> + pte = clear_pte_bit(pte, __pgprot(PTE_VALID));
>> + return pte;
>> +}
>
> I wonder whether we need to define this. I guess it makes sense than
> having the pmd_mkinvalid() use the PTE_* definitions directly, though it
> won't be something we need to do on a pte.

For me its much cleaner to do it as is because it makes it clear that the format
is the same across pte, pmd and pud. And we need pte_invalid() (or
pte_present_invalid()) for PROT_NONE so isn't it better to match it with a setter?

>
>> +
>> static inline pmd_t pmd_mkcont(pmd_t pmd)
>> {
>> return __pmd(pmd_val(pmd) | PMD_SECT_CONT);
>> @@ -469,7 +477,7 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte)
>> */
>> static inline int pte_protnone(pte_t pte)
>> {
>> - return (pte_val(pte) & (PTE_VALID | PTE_PROT_NONE)) == PTE_PROT_NONE;
>> + return pte_invalid(pte) && !pte_user(pte) && !pte_user_exec(pte);
>> }
>>
>> static inline int pmd_protnone(pmd_t pmd)
>> @@ -478,12 +486,7 @@ static inline int pmd_protnone(pmd_t pmd)
>> }
>> #endif
>>
>> -#define pmd_present_invalid(pmd) (!!(pmd_val(pmd) & PMD_PRESENT_INVALID))
>> -
>> -static inline int pmd_present(pmd_t pmd)
>> -{
>> - return pte_present(pmd_pte(pmd)) || pmd_present_invalid(pmd);
>> -}
>> +#define pmd_present(pmd) pte_present(pmd_pte(pmd))
>>
>> /*
>> * THP definitions.
>> @@ -508,14 +511,7 @@ static inline int pmd_trans_huge(pmd_t pmd)
>> #define pmd_mkclean(pmd) pte_pmd(pte_mkclean(pmd_pte(pmd)))
>> #define pmd_mkdirty(pmd) pte_pmd(pte_mkdirty(pmd_pte(pmd)))
>> #define pmd_mkyoung(pmd) pte_pmd(pte_mkyoung(pmd_pte(pmd)))
>> -
>> -static inline pmd_t pmd_mkinvalid(pmd_t pmd)
>> -{
>> - pmd = set_pmd_bit(pmd, __pgprot(PMD_PRESENT_INVALID));
>> - pmd = clear_pmd_bit(pmd, __pgprot(PMD_SECT_VALID));
>> -
>> - return pmd;
>> -}
>> +#define pmd_mkinvalid(pmd) pte_pmd(pte_mkinvalid(pmd_pte(pmd)))
>>
>> #define pmd_thp_or_huge(pmd) (pmd_huge(pmd) || pmd_trans_huge(pmd))
>>
>> @@ -1027,7 +1023,7 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
>> * in MAIR_EL1. The mask below has to include PTE_ATTRINDX_MASK.
>> */
>> const pteval_t mask = PTE_USER | PTE_PXN | PTE_UXN | PTE_RDONLY |
>> - PTE_PROT_NONE | PTE_VALID | PTE_WRITE | PTE_GP |
>> + PTE_INVALID | PTE_VALID | PTE_WRITE | PTE_GP |
>> PTE_ATTRINDX_MASK;
>> /* preserve the hardware dirty information */
>> if (pte_hw_dirty(pte))
>> @@ -1076,17 +1072,17 @@ static inline int pgd_devmap(pgd_t pgd)
>> #ifdef CONFIG_PAGE_TABLE_CHECK
>> static inline bool pte_user_accessible_page(pte_t pte)
>> {
>> - return pte_present(pte) && (pte_user(pte) || pte_user_exec(pte));
>> + return pte_valid(pte) && (pte_user(pte) || pte_user_exec(pte));
>> }
>
> This looks fine, it doesn't change the semantics since PROT_NONE never
> had pte_user() or pte_user_exec().
>
>> static inline bool pmd_user_accessible_page(pmd_t pmd)
>> {
>> - return pmd_leaf(pmd) && !pmd_present_invalid(pmd) && (pmd_user(pmd) || pmd_user_exec(pmd));
>> + return pmd_valid(pmd) && !pmd_table(pmd) && (pmd_user(pmd) || pmd_user_exec(pmd));
>> }
>
> Maybe our pmd_leaf() should actually check valid && !table instead of
> present and no need to change these.

I'm not sure that would be a great approach; pmd_leaf() is core-mm-defined. And
I can't imagine core-mm would want pmd_leaf() to start returning false after
calling pmd_mkinvalid(). You probably won't find anywhere where it actually
matters right now, but it would be subtly broken and could be exposed in future.

>
>> static inline bool pud_user_accessible_page(pud_t pud)
>> {
>> - return pud_leaf(pud) && (pud_user(pud) || pud_user_exec(pud));
>> + return pud_valid(pud) && !pud_table(pud) && (pud_user(pud) || pud_user_exec(pud));
>> }
>> #endif
>
> Same here.
>
> Otherwise I'm happy with the patch. Feel free to add:
>
> Reviewed-by: Catalin Marinas <[email protected]>
>
> As for getting it into 6.10, I'd say it's pretty late. Are there other
> dependencies on core kernel fixes?

Yes one fix that this depends on - Andrew has just taken the fix into
mm-hotfixes-unstable. So I think that will get into v6.9 all being well? I'm
only pushing because I'd prefer to have it off my desk before the baby comes
(14th May). Realistically it can wait.


2024-04-30 11:13:39

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v2 1/3] arm64/mm: Refactor PMD_PRESENT_INVALID and PTE_PROT_NONE bits

On Mon, Apr 29, 2024 at 06:15:45PM +0100, Ryan Roberts wrote:
> On 29/04/2024 17:20, Catalin Marinas wrote:
> > On Mon, Apr 29, 2024 at 03:02:05PM +0100, Ryan Roberts wrote:
> >> diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
> >> index dd9ee67d1d87..de62e6881154 100644
> >> --- a/arch/arm64/include/asm/pgtable-prot.h
> >> +++ b/arch/arm64/include/asm/pgtable-prot.h
> >> @@ -18,14 +18,7 @@
> >> #define PTE_DIRTY (_AT(pteval_t, 1) << 55)
> >> #define PTE_SPECIAL (_AT(pteval_t, 1) << 56)
> >> #define PTE_DEVMAP (_AT(pteval_t, 1) << 57)
> >> -#define PTE_PROT_NONE (_AT(pteval_t, 1) << 58) /* only when !PTE_VALID */
> >> -
> >> -/*
> >> - * This bit indicates that the entry is present i.e. pmd_page()
> >> - * still points to a valid huge page in memory even if the pmd
> >> - * has been invalidated.
> >> - */
> >> -#define PMD_PRESENT_INVALID (_AT(pteval_t, 1) << 59) /* only when !PMD_SECT_VALID */
> >> +#define PTE_INVALID (_AT(pteval_t, 1) << 59) /* only when !PTE_VALID */
> >
> > Nitpick - I prefer the PTE_PRESENT_INVALID name as it makes it clearer
> > it's a present pte. We already have PTE_VALID, calling it PTE_INVALID
> > looks like a negation only.
>
> Meh, for me the pte can only be valid or invalid if it is present. So it's
> implicit. And if you have PTE_PRESENT_INVALID you should also have
> PTE_PRESENT_VALID.
>
> We also have pte_mkinvalid(), which is core-mm-defined. In your scheme, surely
> it should be pte_mkpresent_invalid()?
>
> But you're the boss, I'll change this to PTE_PRESENT_INVALID. :-(

TBH, I don't have a strong opinion but best to avoid the bikeshedding.
I'll leave the decision to you ;). It would match the pmd_mkinvalid()
core code. But if you drop 'present' make sure you add a comment above
that it's meant for present ptes.

> >> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> >> index afdd56d26ad7..8dd4637d6b56 100644
> >> --- a/arch/arm64/include/asm/pgtable.h
> >> +++ b/arch/arm64/include/asm/pgtable.h
> >> @@ -105,7 +105,7 @@ static inline pteval_t __phys_to_pte_val(phys_addr_t phys)
> >> /*
> >> * The following only work if pte_present(). Undefined behaviour otherwise.
> >> */
> >> -#define pte_present(pte) (!!(pte_val(pte) & (PTE_VALID | PTE_PROT_NONE)))
> >> +#define pte_present(pte) (pte_valid(pte) || pte_invalid(pte))
> >> #define pte_young(pte) (!!(pte_val(pte) & PTE_AF))
> >> #define pte_special(pte) (!!(pte_val(pte) & PTE_SPECIAL))
> >> #define pte_write(pte) (!!(pte_val(pte) & PTE_WRITE))
> >> @@ -132,6 +132,7 @@ static inline pteval_t __phys_to_pte_val(phys_addr_t phys)
> >> #define pte_dirty(pte) (pte_sw_dirty(pte) || pte_hw_dirty(pte))
> >>
> >> #define pte_valid(pte) (!!(pte_val(pte) & PTE_VALID))
> >> +#define pte_invalid(pte) ((pte_val(pte) & (PTE_VALID | PTE_INVALID)) == PTE_INVALID)
[...]
> > I think it's sufficient to check PTE_PRESENT_INVALID only. We'd never
> > have both bits set, so no need for mask and compare.
>
> My rationale is that the INVALID bit may have some other HW meaning when
> PTE_VALID is set, so its not correct to interpret it as INVALID unless VALID is
> clear. Granted bit 59 is AttrIndex[3] or PBHA[0], neither of which we are using
> currently so it will always be 0 when PTE_VALID=1 (and same argument when its
> moved to NS in next patch). But it feels fragile to me. I'd rather leave it as
> is unless you insist.

You are right. It currently works but it may overlap with some hardware
bit on valid ptes.

> >> /*
> >> * Execute-only user mappings do not have the PTE_USER bit set. All valid
> >> * kernel mappings have the PTE_UXN bit set.
> >> @@ -261,6 +262,13 @@ static inline pte_t pte_mkpresent(pte_t pte)
> >> return set_pte_bit(pte, __pgprot(PTE_VALID));
> >> }
> >>
> >> +static inline pte_t pte_mkinvalid(pte_t pte)
> >> +{
> >> + pte = set_pte_bit(pte, __pgprot(PTE_INVALID));
> >> + pte = clear_pte_bit(pte, __pgprot(PTE_VALID));
> >> + return pte;
> >> +}
> >
> > I wonder whether we need to define this. I guess it makes sense than
> > having the pmd_mkinvalid() use the PTE_* definitions directly, though it
> > won't be something we need to do on a pte.
>
> For me its much cleaner to do it as is because it makes it clear that the format
> is the same across pte, pmd and pud. And we need pte_invalid() (or
> pte_present_invalid()) for PROT_NONE so isn't it better to match it with a setter?

I agree. It was just a remark above.

> >> static inline bool pmd_user_accessible_page(pmd_t pmd)
> >> {
> >> - return pmd_leaf(pmd) && !pmd_present_invalid(pmd) && (pmd_user(pmd) || pmd_user_exec(pmd));
> >> + return pmd_valid(pmd) && !pmd_table(pmd) && (pmd_user(pmd) || pmd_user_exec(pmd));
> >> }
> >
> > Maybe our pmd_leaf() should actually check valid && !table instead of
> > present and no need to change these.
>
> I'm not sure that would be a great approach; pmd_leaf() is core-mm-defined. And
> I can't imagine core-mm would want pmd_leaf() to start returning false after
> calling pmd_mkinvalid(). You probably won't find anywhere where it actually
> matters right now, but it would be subtly broken and could be exposed in future.

True, I think it's fine currently but you never know. So after
pmd_mkinvalid() on a leaf pmd, pmd_leaf() should still return true. It
might be worth adding a test to pmd_thp_tests() in debug_vm_pgtable.c.

> >> static inline bool pud_user_accessible_page(pud_t pud)
> >> {
> >> - return pud_leaf(pud) && (pud_user(pud) || pud_user_exec(pud));
> >> + return pud_valid(pud) && !pud_table(pud) && (pud_user(pud) || pud_user_exec(pud));
> >> }
> >> #endif
> >
> > Same here.
> >
> > Otherwise I'm happy with the patch. Feel free to add:
> >
> > Reviewed-by: Catalin Marinas <[email protected]>

My reviewed-by tag still stands even if you leave the patch as is.

Thanks.

--
Catalin

2024-04-30 11:37:22

by Ryan Roberts

[permalink] [raw]
Subject: Re: [PATCH v2 1/3] arm64/mm: Refactor PMD_PRESENT_INVALID and PTE_PROT_NONE bits

On 30/04/2024 12:11, Catalin Marinas wrote:
> On Mon, Apr 29, 2024 at 06:15:45PM +0100, Ryan Roberts wrote:
>> On 29/04/2024 17:20, Catalin Marinas wrote:
>>> On Mon, Apr 29, 2024 at 03:02:05PM +0100, Ryan Roberts wrote:
>>>> diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
>>>> index dd9ee67d1d87..de62e6881154 100644
>>>> --- a/arch/arm64/include/asm/pgtable-prot.h
>>>> +++ b/arch/arm64/include/asm/pgtable-prot.h
>>>> @@ -18,14 +18,7 @@
>>>> #define PTE_DIRTY (_AT(pteval_t, 1) << 55)
>>>> #define PTE_SPECIAL (_AT(pteval_t, 1) << 56)
>>>> #define PTE_DEVMAP (_AT(pteval_t, 1) << 57)
>>>> -#define PTE_PROT_NONE (_AT(pteval_t, 1) << 58) /* only when !PTE_VALID */
>>>> -
>>>> -/*
>>>> - * This bit indicates that the entry is present i.e. pmd_page()
>>>> - * still points to a valid huge page in memory even if the pmd
>>>> - * has been invalidated.
>>>> - */
>>>> -#define PMD_PRESENT_INVALID (_AT(pteval_t, 1) << 59) /* only when !PMD_SECT_VALID */
>>>> +#define PTE_INVALID (_AT(pteval_t, 1) << 59) /* only when !PTE_VALID */
>>>
>>> Nitpick - I prefer the PTE_PRESENT_INVALID name as it makes it clearer
>>> it's a present pte. We already have PTE_VALID, calling it PTE_INVALID
>>> looks like a negation only.
>>
>> Meh, for me the pte can only be valid or invalid if it is present. So it's
>> implicit. And if you have PTE_PRESENT_INVALID you should also have
>> PTE_PRESENT_VALID.
>>
>> We also have pte_mkinvalid(), which is core-mm-defined. In your scheme, surely
>> it should be pte_mkpresent_invalid()?
>>
>> But you're the boss, I'll change this to PTE_PRESENT_INVALID. :-(
>
> TBH, I don't have a strong opinion but best to avoid the bikeshedding.
> I'll leave the decision to you ;). It would match the pmd_mkinvalid()
> core code. But if you drop 'present' make sure you add a comment above
> that it's meant for present ptes.

OK, thanks - I'll add a comment and leave it as pte_invalid().

>
>>>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>>>> index afdd56d26ad7..8dd4637d6b56 100644
>>>> --- a/arch/arm64/include/asm/pgtable.h
>>>> +++ b/arch/arm64/include/asm/pgtable.h
>>>> @@ -105,7 +105,7 @@ static inline pteval_t __phys_to_pte_val(phys_addr_t phys)
>>>> /*
>>>> * The following only work if pte_present(). Undefined behaviour otherwise.
>>>> */
>>>> -#define pte_present(pte) (!!(pte_val(pte) & (PTE_VALID | PTE_PROT_NONE)))
>>>> +#define pte_present(pte) (pte_valid(pte) || pte_invalid(pte))
>>>> #define pte_young(pte) (!!(pte_val(pte) & PTE_AF))
>>>> #define pte_special(pte) (!!(pte_val(pte) & PTE_SPECIAL))
>>>> #define pte_write(pte) (!!(pte_val(pte) & PTE_WRITE))
>>>> @@ -132,6 +132,7 @@ static inline pteval_t __phys_to_pte_val(phys_addr_t phys)
>>>> #define pte_dirty(pte) (pte_sw_dirty(pte) || pte_hw_dirty(pte))
>>>>
>>>> #define pte_valid(pte) (!!(pte_val(pte) & PTE_VALID))
>>>> +#define pte_invalid(pte) ((pte_val(pte) & (PTE_VALID | PTE_INVALID)) == PTE_INVALID)
> [...]
>>> I think it's sufficient to check PTE_PRESENT_INVALID only. We'd never
>>> have both bits set, so no need for mask and compare.
>>
>> My rationale is that the INVALID bit may have some other HW meaning when
>> PTE_VALID is set, so its not correct to interpret it as INVALID unless VALID is
>> clear. Granted bit 59 is AttrIndex[3] or PBHA[0], neither of which we are using
>> currently so it will always be 0 when PTE_VALID=1 (and same argument when its
>> moved to NS in next patch). But it feels fragile to me. I'd rather leave it as
>> is unless you insist.
>
> You are right. It currently works but it may overlap with some hardware
> bit on valid ptes.

OK, I'll leave it as is.

>
>>>> /*
>>>> * Execute-only user mappings do not have the PTE_USER bit set. All valid
>>>> * kernel mappings have the PTE_UXN bit set.
>>>> @@ -261,6 +262,13 @@ static inline pte_t pte_mkpresent(pte_t pte)
>>>> return set_pte_bit(pte, __pgprot(PTE_VALID));
>>>> }
>>>>
>>>> +static inline pte_t pte_mkinvalid(pte_t pte)
>>>> +{
>>>> + pte = set_pte_bit(pte, __pgprot(PTE_INVALID));
>>>> + pte = clear_pte_bit(pte, __pgprot(PTE_VALID));
>>>> + return pte;
>>>> +}
>>>
>>> I wonder whether we need to define this. I guess it makes sense than
>>> having the pmd_mkinvalid() use the PTE_* definitions directly, though it
>>> won't be something we need to do on a pte.
>>
>> For me its much cleaner to do it as is because it makes it clear that the format
>> is the same across pte, pmd and pud. And we need pte_invalid() (or
>> pte_present_invalid()) for PROT_NONE so isn't it better to match it with a setter?
>
> I agree. It was just a remark above.

ACK.

>
>>>> static inline bool pmd_user_accessible_page(pmd_t pmd)
>>>> {
>>>> - return pmd_leaf(pmd) && !pmd_present_invalid(pmd) && (pmd_user(pmd) || pmd_user_exec(pmd));
>>>> + return pmd_valid(pmd) && !pmd_table(pmd) && (pmd_user(pmd) || pmd_user_exec(pmd));
>>>> }
>>>
>>> Maybe our pmd_leaf() should actually check valid && !table instead of
>>> present and no need to change these.
>>
>> I'm not sure that would be a great approach; pmd_leaf() is core-mm-defined. And
>> I can't imagine core-mm would want pmd_leaf() to start returning false after
>> calling pmd_mkinvalid(). You probably won't find anywhere where it actually
>> matters right now, but it would be subtly broken and could be exposed in future.
>
> True, I think it's fine currently but you never know. So after
> pmd_mkinvalid() on a leaf pmd, pmd_leaf() should still return true. It
> might be worth adding a test to pmd_thp_tests() in debug_vm_pgtable.c.

Good idea; I'll add tests for this.

>
>>>> static inline bool pud_user_accessible_page(pud_t pud)
>>>> {
>>>> - return pud_leaf(pud) && (pud_user(pud) || pud_user_exec(pud));
>>>> + return pud_valid(pud) && !pud_table(pud) && (pud_user(pud) || pud_user_exec(pud));
>>>> }
>>>> #endif
>>>
>>> Same here.
>>>
>>> Otherwise I'm happy with the patch. Feel free to add:
>>>
>>> Reviewed-by: Catalin Marinas <[email protected]>
>
> My reviewed-by tag still stands even if you leave the patch as is.

Thanks.

There is still one problem I need to resolve; During this work I discovered that
core-mm can call pmd_mkinvalid() for swap pmds. On arm64 this will turn the swap
pmd into a present pmd, and BadThings can happen in GUP-fast (and any other
lockless SW table walkers). My original fix modified core-mm to only call
pmd_mkinvalid() for present pmds. But discussion over there has shown that arm64
is the only arch that cannot handle this. So I've been convinced that it's
probably more robust to make arm64 handle it gracefully and add tests to
debug_vm_pgtable.c to check for this. Patch incoming shortly, but it will cause
a conflict with this series. So I'll send a v2 of this once that fix is accepted.

>
> Thanks.
>


2024-04-30 11:38:07

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v2 1/3] arm64/mm: Refactor PMD_PRESENT_INVALID and PTE_PROT_NONE bits

On 30.04.24 13:11, Catalin Marinas wrote:
> On Mon, Apr 29, 2024 at 06:15:45PM +0100, Ryan Roberts wrote:
>> On 29/04/2024 17:20, Catalin Marinas wrote:
>>> On Mon, Apr 29, 2024 at 03:02:05PM +0100, Ryan Roberts wrote:
>>>> diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
>>>> index dd9ee67d1d87..de62e6881154 100644
>>>> --- a/arch/arm64/include/asm/pgtable-prot.h
>>>> +++ b/arch/arm64/include/asm/pgtable-prot.h
>>>> @@ -18,14 +18,7 @@
>>>> #define PTE_DIRTY (_AT(pteval_t, 1) << 55)
>>>> #define PTE_SPECIAL (_AT(pteval_t, 1) << 56)
>>>> #define PTE_DEVMAP (_AT(pteval_t, 1) << 57)
>>>> -#define PTE_PROT_NONE (_AT(pteval_t, 1) << 58) /* only when !PTE_VALID */
>>>> -
>>>> -/*
>>>> - * This bit indicates that the entry is present i.e. pmd_page()
>>>> - * still points to a valid huge page in memory even if the pmd
>>>> - * has been invalidated.
>>>> - */
>>>> -#define PMD_PRESENT_INVALID (_AT(pteval_t, 1) << 59) /* only when !PMD_SECT_VALID */
>>>> +#define PTE_INVALID (_AT(pteval_t, 1) << 59) /* only when !PTE_VALID */
>>>
>>> Nitpick - I prefer the PTE_PRESENT_INVALID name as it makes it clearer
>>> it's a present pte. We already have PTE_VALID, calling it PTE_INVALID
>>> looks like a negation only.
>>
>> Meh, for me the pte can only be valid or invalid if it is present. So it's
>> implicit. And if you have PTE_PRESENT_INVALID you should also have
>> PTE_PRESENT_VALID.
>>
>> We also have pte_mkinvalid(), which is core-mm-defined. In your scheme, surely
>> it should be pte_mkpresent_invalid()?
>>
>> But you're the boss, I'll change this to PTE_PRESENT_INVALID. :-(
>
> TBH, I don't have a strong opinion but best to avoid the bikeshedding.
> I'll leave the decision to you ;). It would match the pmd_mkinvalid()
> core code. But if you drop 'present' make sure you add a comment above
> that it's meant for present ptes.

FWIW, I was confused by

present = valid | invalid

Something like

present = present_valid | present_invalid

would be more obvious at least to me ;)

--
Cheers,

David / dhildenb


2024-04-30 12:55:52

by Ryan Roberts

[permalink] [raw]
Subject: Re: [PATCH v2 1/3] arm64/mm: Refactor PMD_PRESENT_INVALID and PTE_PROT_NONE bits

On 30/04/2024 12:37, David Hildenbrand wrote:
> On 30.04.24 13:11, Catalin Marinas wrote:
>> On Mon, Apr 29, 2024 at 06:15:45PM +0100, Ryan Roberts wrote:
>>> On 29/04/2024 17:20, Catalin Marinas wrote:
>>>> On Mon, Apr 29, 2024 at 03:02:05PM +0100, Ryan Roberts wrote:
>>>>> diff --git a/arch/arm64/include/asm/pgtable-prot.h
>>>>> b/arch/arm64/include/asm/pgtable-prot.h
>>>>> index dd9ee67d1d87..de62e6881154 100644
>>>>> --- a/arch/arm64/include/asm/pgtable-prot.h
>>>>> +++ b/arch/arm64/include/asm/pgtable-prot.h
>>>>> @@ -18,14 +18,7 @@
>>>>>   #define PTE_DIRTY        (_AT(pteval_t, 1) << 55)
>>>>>   #define PTE_SPECIAL        (_AT(pteval_t, 1) << 56)
>>>>>   #define PTE_DEVMAP        (_AT(pteval_t, 1) << 57)
>>>>> -#define PTE_PROT_NONE        (_AT(pteval_t, 1) << 58) /* only when
>>>>> !PTE_VALID */
>>>>> -
>>>>> -/*
>>>>> - * This bit indicates that the entry is present i.e. pmd_page()
>>>>> - * still points to a valid huge page in memory even if the pmd
>>>>> - * has been invalidated.
>>>>> - */
>>>>> -#define PMD_PRESENT_INVALID    (_AT(pteval_t, 1) << 59) /* only when
>>>>> !PMD_SECT_VALID */
>>>>> +#define PTE_INVALID        (_AT(pteval_t, 1) << 59) /* only when
>>>>> !PTE_VALID */
>>>>
>>>> Nitpick - I prefer the PTE_PRESENT_INVALID name as it makes it clearer
>>>> it's a present pte. We already have PTE_VALID, calling it PTE_INVALID
>>>> looks like a negation only.
>>>
>>> Meh, for me the pte can only be valid or invalid if it is present. So it's
>>> implicit. And if you have PTE_PRESENT_INVALID you should also have
>>> PTE_PRESENT_VALID.
>>>
>>> We also have pte_mkinvalid(), which is core-mm-defined. In your scheme, surely
>>> it should be pte_mkpresent_invalid()?
>>>
>>> But you're the boss, I'll change this to PTE_PRESENT_INVALID. :-(
>>
>> TBH, I don't have a strong opinion but best to avoid the bikeshedding.
>> I'll leave the decision to you ;). It would match the pmd_mkinvalid()
>> core code. But if you drop 'present' make sure you add a comment above
>> that it's meant for present ptes.
>
> FWIW, I was confused by
>
> present = valid | invalid

OK fair enough.

>
> Something like
>
> present = present_valid | present_invalid

I don't want to change pte_valid() to pte_present_valid(); that would also be a
fair bit of churn.

I'll take Catalin's suggestion and make this PTE_PRESENT_INVALID and
pte_present_invalid(). And obviously leave pmd_mkinvalid() as it is.
(Conversation in the other thread has concluded that it's ok to invalidate a
non-present pmd afterall).

>
> would be more obvious at least to me ;)
>


2024-04-30 12:59:42

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v2 1/3] arm64/mm: Refactor PMD_PRESENT_INVALID and PTE_PROT_NONE bits

On 30.04.24 14:53, Ryan Roberts wrote:
> On 30/04/2024 12:37, David Hildenbrand wrote:
>> On 30.04.24 13:11, Catalin Marinas wrote:
>>> On Mon, Apr 29, 2024 at 06:15:45PM +0100, Ryan Roberts wrote:
>>>> On 29/04/2024 17:20, Catalin Marinas wrote:
>>>>> On Mon, Apr 29, 2024 at 03:02:05PM +0100, Ryan Roberts wrote:
>>>>>> diff --git a/arch/arm64/include/asm/pgtable-prot.h
>>>>>> b/arch/arm64/include/asm/pgtable-prot.h
>>>>>> index dd9ee67d1d87..de62e6881154 100644
>>>>>> --- a/arch/arm64/include/asm/pgtable-prot.h
>>>>>> +++ b/arch/arm64/include/asm/pgtable-prot.h
>>>>>> @@ -18,14 +18,7 @@
>>>>>>   #define PTE_DIRTY        (_AT(pteval_t, 1) << 55)
>>>>>>   #define PTE_SPECIAL        (_AT(pteval_t, 1) << 56)
>>>>>>   #define PTE_DEVMAP        (_AT(pteval_t, 1) << 57)
>>>>>> -#define PTE_PROT_NONE        (_AT(pteval_t, 1) << 58) /* only when
>>>>>> !PTE_VALID */
>>>>>> -
>>>>>> -/*
>>>>>> - * This bit indicates that the entry is present i.e. pmd_page()
>>>>>> - * still points to a valid huge page in memory even if the pmd
>>>>>> - * has been invalidated.
>>>>>> - */
>>>>>> -#define PMD_PRESENT_INVALID    (_AT(pteval_t, 1) << 59) /* only when
>>>>>> !PMD_SECT_VALID */
>>>>>> +#define PTE_INVALID        (_AT(pteval_t, 1) << 59) /* only when
>>>>>> !PTE_VALID */
>>>>>
>>>>> Nitpick - I prefer the PTE_PRESENT_INVALID name as it makes it clearer
>>>>> it's a present pte. We already have PTE_VALID, calling it PTE_INVALID
>>>>> looks like a negation only.
>>>>
>>>> Meh, for me the pte can only be valid or invalid if it is present. So it's
>>>> implicit. And if you have PTE_PRESENT_INVALID you should also have
>>>> PTE_PRESENT_VALID.
>>>>
>>>> We also have pte_mkinvalid(), which is core-mm-defined. In your scheme, surely
>>>> it should be pte_mkpresent_invalid()?
>>>>
>>>> But you're the boss, I'll change this to PTE_PRESENT_INVALID. :-(
>>>
>>> TBH, I don't have a strong opinion but best to avoid the bikeshedding.
>>> I'll leave the decision to you ;). It would match the pmd_mkinvalid()
>>> core code. But if you drop 'present' make sure you add a comment above
>>> that it's meant for present ptes.
>>
>> FWIW, I was confused by
>>
>> present = valid | invalid
>
> OK fair enough.
>
>>
>> Something like
>>
>> present = present_valid | present_invalid
>
> I don't want to change pte_valid() to pte_present_valid(); that would also be a
> fair bit of churn.

Yes.

>
> I'll take Catalin's suggestion and make this PTE_PRESENT_INVALID and
> pte_present_invalid(). And obviously leave pmd_mkinvalid() as it is.
> (Conversation in the other thread has concluded that it's ok to invalidate a
> non-present pmd afterall).

Works for me.

--
Cheers,

David / dhildenb


2024-04-30 13:29:50

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v2 1/3] arm64/mm: Refactor PMD_PRESENT_INVALID and PTE_PROT_NONE bits

On Tue, Apr 30, 2024 at 12:35:49PM +0100, Ryan Roberts wrote:
> There is still one problem I need to resolve; During this work I discovered that
> core-mm can call pmd_mkinvalid() for swap pmds. On arm64 this will turn the swap
> pmd into a present pmd, and BadThings can happen in GUP-fast (and any other
> lockless SW table walkers). My original fix modified core-mm to only call
> pmd_mkinvalid() for present pmds. But discussion over there has shown that arm64
> is the only arch that cannot handle this. So I've been convinced that it's
> probably more robust to make arm64 handle it gracefully and add tests to
> debug_vm_pgtable.c to check for this. Patch incoming shortly, but it will cause
> a conflict with this series. So I'll send a v2 of this once that fix is accepted.

Sounds fine. I can queue the arm64 pmd_mkinvalid() fix for 6.9 and you
can base this series on top. But I have a preference for this patchset
to sit in -next for a bit anyway, so it might be 6.11 material.

--
Catalin

2024-04-30 13:32:26

by Will Deacon

[permalink] [raw]
Subject: Re: [PATCH v2 1/3] arm64/mm: Refactor PMD_PRESENT_INVALID and PTE_PROT_NONE bits

Hey Ryan,

Just a couple of comments on this:

On Mon, Apr 29, 2024 at 03:02:05PM +0100, Ryan Roberts wrote:
> diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
> index dd9ee67d1d87..de62e6881154 100644
> --- a/arch/arm64/include/asm/pgtable-prot.h
> +++ b/arch/arm64/include/asm/pgtable-prot.h
> @@ -18,14 +18,7 @@
> #define PTE_DIRTY (_AT(pteval_t, 1) << 55)
> #define PTE_SPECIAL (_AT(pteval_t, 1) << 56)
> #define PTE_DEVMAP (_AT(pteval_t, 1) << 57)
> -#define PTE_PROT_NONE (_AT(pteval_t, 1) << 58) /* only when !PTE_VALID */
> -
> -/*
> - * This bit indicates that the entry is present i.e. pmd_page()
> - * still points to a valid huge page in memory even if the pmd
> - * has been invalidated.
> - */
> -#define PMD_PRESENT_INVALID (_AT(pteval_t, 1) << 59) /* only when !PMD_SECT_VALID */
> +#define PTE_INVALID (_AT(pteval_t, 1) << 59) /* only when !PTE_VALID */

So this now overlaps with AttrIndx[3] if FEAT_AIE is implemented. Although
this shouldn't matter on the face of things because it's only used for
invalid entries, we originally moved the PROT_NONE bit from 2 to 57 back
in 3676f9ef5481 ("arm64: Move PTE_PROT_NONE higher up") because it was
possible to change the memory type for PROT_NONE mappings via some
drivers.

Moving the field to the NS bit (as you do later in the series) resolves
this, but the architecture currently says that the NS bit is RES0. How
can we guarantee that it won't be repurposed by hardware in future?

> #define _PROT_DEFAULT (PTE_TYPE_PAGE | PTE_AF | PTE_SHARED)
> #define _PROT_SECT_DEFAULT (PMD_TYPE_SECT | PMD_SECT_AF | PMD_SECT_S)
> @@ -103,7 +96,7 @@ static inline bool __pure lpa2_is_enabled(void)
> __val; \
> })
>
> -#define PAGE_NONE __pgprot(((_PAGE_DEFAULT) & ~PTE_VALID) | PTE_PROT_NONE | PTE_RDONLY | PTE_NG | PTE_PXN | PTE_UXN)
> +#define PAGE_NONE __pgprot(((_PAGE_DEFAULT) & ~PTE_VALID) | PTE_INVALID | PTE_RDONLY | PTE_NG | PTE_PXN | PTE_UXN)
> /* shared+writable pages are clean by default, hence PTE_RDONLY|PTE_WRITE */
> #define PAGE_SHARED __pgprot(_PAGE_SHARED)
> #define PAGE_SHARED_EXEC __pgprot(_PAGE_SHARED_EXEC)
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index afdd56d26ad7..8dd4637d6b56 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -105,7 +105,7 @@ static inline pteval_t __phys_to_pte_val(phys_addr_t phys)
> /*
> * The following only work if pte_present(). Undefined behaviour otherwise.
> */
> -#define pte_present(pte) (!!(pte_val(pte) & (PTE_VALID | PTE_PROT_NONE)))
> +#define pte_present(pte) (pte_valid(pte) || pte_invalid(pte))
> #define pte_young(pte) (!!(pte_val(pte) & PTE_AF))
> #define pte_special(pte) (!!(pte_val(pte) & PTE_SPECIAL))
> #define pte_write(pte) (!!(pte_val(pte) & PTE_WRITE))
> @@ -132,6 +132,7 @@ static inline pteval_t __phys_to_pte_val(phys_addr_t phys)
> #define pte_dirty(pte) (pte_sw_dirty(pte) || pte_hw_dirty(pte))
>
> #define pte_valid(pte) (!!(pte_val(pte) & PTE_VALID))
> +#define pte_invalid(pte) ((pte_val(pte) & (PTE_VALID | PTE_INVALID)) == PTE_INVALID)
> /*
> * Execute-only user mappings do not have the PTE_USER bit set. All valid
> * kernel mappings have the PTE_UXN bit set.
> @@ -261,6 +262,13 @@ static inline pte_t pte_mkpresent(pte_t pte)
> return set_pte_bit(pte, __pgprot(PTE_VALID));
> }
>
> +static inline pte_t pte_mkinvalid(pte_t pte)
> +{
> + pte = set_pte_bit(pte, __pgprot(PTE_INVALID));
> + pte = clear_pte_bit(pte, __pgprot(PTE_VALID));
> + return pte;
> +}
> +
> static inline pmd_t pmd_mkcont(pmd_t pmd)
> {
> return __pmd(pmd_val(pmd) | PMD_SECT_CONT);
> @@ -469,7 +477,7 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte)
> */
> static inline int pte_protnone(pte_t pte)
> {
> - return (pte_val(pte) & (PTE_VALID | PTE_PROT_NONE)) == PTE_PROT_NONE;
> + return pte_invalid(pte) && !pte_user(pte) && !pte_user_exec(pte);
> }

Why do we need to check pte_user_*() here? Isn't PROT_NONE the only case
in which a pte will have PTE_INVALID set?

Will

2024-04-30 13:44:40

by Ryan Roberts

[permalink] [raw]
Subject: Re: [PATCH v2 1/3] arm64/mm: Refactor PMD_PRESENT_INVALID and PTE_PROT_NONE bits

On 30/04/2024 14:28, Catalin Marinas wrote:
> On Tue, Apr 30, 2024 at 12:35:49PM +0100, Ryan Roberts wrote:
>> There is still one problem I need to resolve; During this work I discovered that
>> core-mm can call pmd_mkinvalid() for swap pmds. On arm64 this will turn the swap
>> pmd into a present pmd, and BadThings can happen in GUP-fast (and any other
>> lockless SW table walkers). My original fix modified core-mm to only call
>> pmd_mkinvalid() for present pmds. But discussion over there has shown that arm64
>> is the only arch that cannot handle this. So I've been convinced that it's
>> probably more robust to make arm64 handle it gracefully and add tests to
>> debug_vm_pgtable.c to check for this. Patch incoming shortly, but it will cause
>> a conflict with this series. So I'll send a v2 of this once that fix is accepted.
>
> Sounds fine. I can queue the arm64 pmd_mkinvalid() fix for 6.9 and you
> can base this series on top. But I have a preference for this patchset
> to sit in -next for a bit anyway, so it might be 6.11 material.

Yeah that works for me. I just sent the pmd_mkinvalid() fix.


2024-04-30 14:10:52

by Ryan Roberts

[permalink] [raw]
Subject: Re: [PATCH v2 1/3] arm64/mm: Refactor PMD_PRESENT_INVALID and PTE_PROT_NONE bits

On 30/04/2024 14:30, Will Deacon wrote:
> Hey Ryan,
>
> Just a couple of comments on this:
>
> On Mon, Apr 29, 2024 at 03:02:05PM +0100, Ryan Roberts wrote:
>> diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
>> index dd9ee67d1d87..de62e6881154 100644
>> --- a/arch/arm64/include/asm/pgtable-prot.h
>> +++ b/arch/arm64/include/asm/pgtable-prot.h
>> @@ -18,14 +18,7 @@
>> #define PTE_DIRTY (_AT(pteval_t, 1) << 55)
>> #define PTE_SPECIAL (_AT(pteval_t, 1) << 56)
>> #define PTE_DEVMAP (_AT(pteval_t, 1) << 57)
>> -#define PTE_PROT_NONE (_AT(pteval_t, 1) << 58) /* only when !PTE_VALID */
>> -
>> -/*
>> - * This bit indicates that the entry is present i.e. pmd_page()
>> - * still points to a valid huge page in memory even if the pmd
>> - * has been invalidated.
>> - */
>> -#define PMD_PRESENT_INVALID (_AT(pteval_t, 1) << 59) /* only when !PMD_SECT_VALID */
>> +#define PTE_INVALID (_AT(pteval_t, 1) << 59) /* only when !PTE_VALID */
>
> So this now overlaps with AttrIndx[3] if FEAT_AIE is implemented. Although
> this shouldn't matter on the face of things because it's only used for
> invalid entries, we originally moved the PROT_NONE bit from 2 to 57 back
> in 3676f9ef5481 ("arm64: Move PTE_PROT_NONE higher up") because it was
> possible to change the memory type for PROT_NONE mappings via some
> drivers.

I'm not sure I follow your argument.

1. We don't support FEAT_AIE (currently) so AttrIndx[3] is always going to be 0
for valid ptes. Drivers are only calling our helpers (e.g.
pgprot_writecombine(), right?) and those only know how to set AttrIndx[2:0].

2. PMD_PRESENT_INVALID was already occupying bit 59. So wouldn't the same shape
of concern apply there too for PMDs that have been invalidated, where the driver
then comes along and changes the memory type? (Perhaps because
PMD_PRESENT_INVALID is only set while the PTL is held this can't happen).

3. I had this same vague concern about confusion due to overlapping bit 59,
which is why in the next patch, I'm moving it to the NS bit.

Experience tells me that when I'm arguing confidently with someone who is much
more expert than me, then I'm using wrong... so what have I missed? :)

>
> Moving the field to the NS bit (as you do later in the series) resolves
> this, but the architecture currently says that the NS bit is RES0. How
> can we guarantee that it won't be repurposed by hardware in future?

Well it remains free for use in valid entries of course. So I guess you are
asking how to guarantee we won't also need to be able to modify it on the fly
for PROT_NONE entries? I don't have a definite answer, but I've been working on
the assumption that the architecture introducing a feature that is only needed
in states where NS is not needed is unlikely (so using that bit for the feature
is also unlikely). And then needing to manipulate that feature dyanically for
PROT_NONE mappings is even less likely.

If all else fails we could move it to nG (bit 11) to free up bit 5. But that
requires a bit more fiddling with the swap pte format.

>
>> #define _PROT_DEFAULT (PTE_TYPE_PAGE | PTE_AF | PTE_SHARED)
>> #define _PROT_SECT_DEFAULT (PMD_TYPE_SECT | PMD_SECT_AF | PMD_SECT_S)
>> @@ -103,7 +96,7 @@ static inline bool __pure lpa2_is_enabled(void)
>> __val; \
>> })
>>
>> -#define PAGE_NONE __pgprot(((_PAGE_DEFAULT) & ~PTE_VALID) | PTE_PROT_NONE | PTE_RDONLY | PTE_NG | PTE_PXN | PTE_UXN)
>> +#define PAGE_NONE __pgprot(((_PAGE_DEFAULT) & ~PTE_VALID) | PTE_INVALID | PTE_RDONLY | PTE_NG | PTE_PXN | PTE_UXN)
>> /* shared+writable pages are clean by default, hence PTE_RDONLY|PTE_WRITE */
>> #define PAGE_SHARED __pgprot(_PAGE_SHARED)
>> #define PAGE_SHARED_EXEC __pgprot(_PAGE_SHARED_EXEC)
>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>> index afdd56d26ad7..8dd4637d6b56 100644
>> --- a/arch/arm64/include/asm/pgtable.h
>> +++ b/arch/arm64/include/asm/pgtable.h
>> @@ -105,7 +105,7 @@ static inline pteval_t __phys_to_pte_val(phys_addr_t phys)
>> /*
>> * The following only work if pte_present(). Undefined behaviour otherwise.
>> */
>> -#define pte_present(pte) (!!(pte_val(pte) & (PTE_VALID | PTE_PROT_NONE)))
>> +#define pte_present(pte) (pte_valid(pte) || pte_invalid(pte))
>> #define pte_young(pte) (!!(pte_val(pte) & PTE_AF))
>> #define pte_special(pte) (!!(pte_val(pte) & PTE_SPECIAL))
>> #define pte_write(pte) (!!(pte_val(pte) & PTE_WRITE))
>> @@ -132,6 +132,7 @@ static inline pteval_t __phys_to_pte_val(phys_addr_t phys)
>> #define pte_dirty(pte) (pte_sw_dirty(pte) || pte_hw_dirty(pte))
>>
>> #define pte_valid(pte) (!!(pte_val(pte) & PTE_VALID))
>> +#define pte_invalid(pte) ((pte_val(pte) & (PTE_VALID | PTE_INVALID)) == PTE_INVALID)
>> /*
>> * Execute-only user mappings do not have the PTE_USER bit set. All valid
>> * kernel mappings have the PTE_UXN bit set.
>> @@ -261,6 +262,13 @@ static inline pte_t pte_mkpresent(pte_t pte)
>> return set_pte_bit(pte, __pgprot(PTE_VALID));
>> }
>>
>> +static inline pte_t pte_mkinvalid(pte_t pte)
>> +{
>> + pte = set_pte_bit(pte, __pgprot(PTE_INVALID));
>> + pte = clear_pte_bit(pte, __pgprot(PTE_VALID));
>> + return pte;
>> +}
>> +
>> static inline pmd_t pmd_mkcont(pmd_t pmd)
>> {
>> return __pmd(pmd_val(pmd) | PMD_SECT_CONT);
>> @@ -469,7 +477,7 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte)
>> */
>> static inline int pte_protnone(pte_t pte)
>> {
>> - return (pte_val(pte) & (PTE_VALID | PTE_PROT_NONE)) == PTE_PROT_NONE;
>> + return pte_invalid(pte) && !pte_user(pte) && !pte_user_exec(pte);
>> }
>
> Why do we need to check pte_user_*() here? Isn't PROT_NONE the only case
> in which a pte will have PTE_INVALID set?

I guess for *ptes* this is technically correct. But I was trying to make the
format generic and reusable for *pmds* too. (pmd_protnone() wraps
pte_protnone()). For pmds, PTE_INVALID also represents invalid-but-present PMDs
(i.e. pmds on which pmd_mkinvalid() has been called).

The intention is that PTE_INVALID indicates "present but not valid in HW". And
(!pte_user(pte) && !pte_user_exec(pte)) indicates the PROT_NONE permission.

>
> Will


2024-04-30 15:08:06

by Will Deacon

[permalink] [raw]
Subject: Re: [PATCH v2 1/3] arm64/mm: Refactor PMD_PRESENT_INVALID and PTE_PROT_NONE bits

On Tue, Apr 30, 2024 at 03:02:21PM +0100, Ryan Roberts wrote:
> On 30/04/2024 14:30, Will Deacon wrote:
> > On Mon, Apr 29, 2024 at 03:02:05PM +0100, Ryan Roberts wrote:
> >> diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
> >> index dd9ee67d1d87..de62e6881154 100644
> >> --- a/arch/arm64/include/asm/pgtable-prot.h
> >> +++ b/arch/arm64/include/asm/pgtable-prot.h
> >> @@ -18,14 +18,7 @@
> >> #define PTE_DIRTY (_AT(pteval_t, 1) << 55)
> >> #define PTE_SPECIAL (_AT(pteval_t, 1) << 56)
> >> #define PTE_DEVMAP (_AT(pteval_t, 1) << 57)
> >> -#define PTE_PROT_NONE (_AT(pteval_t, 1) << 58) /* only when !PTE_VALID */
> >> -
> >> -/*
> >> - * This bit indicates that the entry is present i.e. pmd_page()
> >> - * still points to a valid huge page in memory even if the pmd
> >> - * has been invalidated.
> >> - */
> >> -#define PMD_PRESENT_INVALID (_AT(pteval_t, 1) << 59) /* only when !PMD_SECT_VALID */
> >> +#define PTE_INVALID (_AT(pteval_t, 1) << 59) /* only when !PTE_VALID */
> >
> > So this now overlaps with AttrIndx[3] if FEAT_AIE is implemented. Although
> > this shouldn't matter on the face of things because it's only used for
> > invalid entries, we originally moved the PROT_NONE bit from 2 to 57 back
> > in 3676f9ef5481 ("arm64: Move PTE_PROT_NONE higher up") because it was
> > possible to change the memory type for PROT_NONE mappings via some
> > drivers.
>
> I'm not sure I follow your argument.
>
> 1. We don't support FEAT_AIE (currently) so AttrIndx[3] is always going to be 0
> for valid ptes. Drivers are only calling our helpers (e.g.
> pgprot_writecombine(), right?) and those only know how to set AttrIndx[2:0].

Sure, but we might want to use it in future and chucking that out for the
sake of uffd doesn't seem like an obviously worthwhile trade-off to me.

> 2. PMD_PRESENT_INVALID was already occupying bit 59. So wouldn't the same shape
> of concern apply there too for PMDs that have been invalidated, where the driver
> then comes along and changes the memory type? (Perhaps because
> PMD_PRESENT_INVALID is only set while the PTL is held this can't happen).

I was mainly thinking of the PROT_NONE case, to be honest with you. I
struggle to envisage how a driver could sensibly mess with the memory
type for anonymous mappings, let alone huge pages! But perhaps I just
lack imagination :)

> 3. I had this same vague concern about confusion due to overlapping bit 59,
> which is why in the next patch, I'm moving it to the NS bit.
>
> Experience tells me that when I'm arguing confidently with someone who is much
> more expert than me, then I'm using wrong... so what have I missed? :)
>
> >
> > Moving the field to the NS bit (as you do later in the series) resolves
> > this, but the architecture currently says that the NS bit is RES0. How
> > can we guarantee that it won't be repurposed by hardware in future?
>
> Well it remains free for use in valid entries of course.

I think that's what I'm actually questioning! RES0 doesn't mean that
tomorrow's whizz-bang CPU extension isn't allowed to use it, but that's
a guarantee that we need if we're going to use it for our own purposes.

> So I guess you are asking how to guarantee we won't also need to be able
> to modify it on the fly for PROT_NONE entries? I don't have a definite
> answer, but I've been working on the assumption that the architecture
> introducing a feature that is only needed in states where NS is not needed
> is unlikely (so using that bit for the feature is also unlikely). And then
> needing to manipulate that feature dyanically for PROT_NONE mappings is
> even less likely.

The architects are quite good at inventing unlikely features :) SVE
blowing the sigcontext comes to mind. I think we should seek
clarification that the NS bit won't be allocated in the future if we are
going to use it for our own stuff.

> If all else fails we could move it to nG (bit 11) to free up bit 5. But that
> requires a bit more fiddling with the swap pte format.

Oh, cunning, I hadn't thought of that. I think that's probably a better
approach if the NS bit isn't guaranteed to be left alone by the
architecture.

> >> @@ -469,7 +477,7 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte)
> >> */
> >> static inline int pte_protnone(pte_t pte)
> >> {
> >> - return (pte_val(pte) & (PTE_VALID | PTE_PROT_NONE)) == PTE_PROT_NONE;
> >> + return pte_invalid(pte) && !pte_user(pte) && !pte_user_exec(pte);
> >> }
> >
> > Why do we need to check pte_user_*() here? Isn't PROT_NONE the only case
> > in which a pte will have PTE_INVALID set?
>
> I guess for *ptes* this is technically correct. But I was trying to make the
> format generic and reusable for *pmds* too. (pmd_protnone() wraps
> pte_protnone()). For pmds, PTE_INVALID also represents invalid-but-present PMDs
> (i.e. pmds on which pmd_mkinvalid() has been called).
>
> The intention is that PTE_INVALID indicates "present but not valid in HW". And
> (!pte_user(pte) && !pte_user_exec(pte)) indicates the PROT_NONE permission.

Ok, but it does mean the compiler can't emit a nice TBNZ instruction for
the pte macro. Can you either seperate out the pmd/pte versions of the
macro or just add a comment along the lines of what you said above, please?

Cheers,

Will

2024-04-30 15:39:41

by Ryan Roberts

[permalink] [raw]
Subject: Re: [PATCH v2 1/3] arm64/mm: Refactor PMD_PRESENT_INVALID and PTE_PROT_NONE bits

On 30/04/2024 16:04, Will Deacon wrote:
> On Tue, Apr 30, 2024 at 03:02:21PM +0100, Ryan Roberts wrote:
>> On 30/04/2024 14:30, Will Deacon wrote:
>>> On Mon, Apr 29, 2024 at 03:02:05PM +0100, Ryan Roberts wrote:
>>>> diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
>>>> index dd9ee67d1d87..de62e6881154 100644
>>>> --- a/arch/arm64/include/asm/pgtable-prot.h
>>>> +++ b/arch/arm64/include/asm/pgtable-prot.h
>>>> @@ -18,14 +18,7 @@
>>>> #define PTE_DIRTY (_AT(pteval_t, 1) << 55)
>>>> #define PTE_SPECIAL (_AT(pteval_t, 1) << 56)
>>>> #define PTE_DEVMAP (_AT(pteval_t, 1) << 57)
>>>> -#define PTE_PROT_NONE (_AT(pteval_t, 1) << 58) /* only when !PTE_VALID */
>>>> -
>>>> -/*
>>>> - * This bit indicates that the entry is present i.e. pmd_page()
>>>> - * still points to a valid huge page in memory even if the pmd
>>>> - * has been invalidated.
>>>> - */
>>>> -#define PMD_PRESENT_INVALID (_AT(pteval_t, 1) << 59) /* only when !PMD_SECT_VALID */
>>>> +#define PTE_INVALID (_AT(pteval_t, 1) << 59) /* only when !PTE_VALID */
>>>
>>> So this now overlaps with AttrIndx[3] if FEAT_AIE is implemented. Although
>>> this shouldn't matter on the face of things because it's only used for
>>> invalid entries, we originally moved the PROT_NONE bit from 2 to 57 back
>>> in 3676f9ef5481 ("arm64: Move PTE_PROT_NONE higher up") because it was
>>> possible to change the memory type for PROT_NONE mappings via some
>>> drivers.
>>
>> I'm not sure I follow your argument.
>>
>> 1. We don't support FEAT_AIE (currently) so AttrIndx[3] is always going to be 0
>> for valid ptes. Drivers are only calling our helpers (e.g.
>> pgprot_writecombine(), right?) and those only know how to set AttrIndx[2:0].
>
> Sure, but we might want to use it in future and chucking that out for the
> sake of uffd doesn't seem like an obviously worthwhile trade-off to me.

Totally agree, which is why I move it in the next patch. I was just commenting
that its not a problem for this intermediate state between patches because the
kernel doesn't support FEAT_AIE today.

>
>> 2. PMD_PRESENT_INVALID was already occupying bit 59. So wouldn't the same shape
>> of concern apply there too for PMDs that have been invalidated, where the driver
>> then comes along and changes the memory type? (Perhaps because
>> PMD_PRESENT_INVALID is only set while the PTL is held this can't happen).
>
> I was mainly thinking of the PROT_NONE case, to be honest with you. I
> struggle to envisage how a driver could sensibly mess with the memory
> type for anonymous mappings, let alone huge pages! But perhaps I just
> lack imagination :)
>
>> 3. I had this same vague concern about confusion due to overlapping bit 59,
>> which is why in the next patch, I'm moving it to the NS bit.
>>
>> Experience tells me that when I'm arguing confidently with someone who is much
>> more expert than me, then I'm using wrong... so what have I missed? :)

Sorry that was meant to say "I'm *usually* wrong" in case it wasn't obvious.

>>
>>>
>>> Moving the field to the NS bit (as you do later in the series) resolves
>>> this, but the architecture currently says that the NS bit is RES0. How
>>> can we guarantee that it won't be repurposed by hardware in future?
>>
>> Well it remains free for use in valid entries of course.
>
> I think that's what I'm actually questioning!

Sorry I'm not sure if we are talking cross-purposes... PTE_INVALID is only
overlayed on the NS bit when PTE_VALID=0. So the NS bit remains 0 for valid PTEs
today, and in future, any new feature control within the bit could be set as
required for valid ptes.

But I *think* you are concerned about the possibility that any future feature
control that occupies that bit could also require persisting in pte/pmd even
when its invalidated (i.e. pmd_mkinvalid(pmd) or pte_modify(pte, PAGE_NONE))?

> RES0 doesn't mean that
> tomorrow's whizz-bang CPU extension isn't allowed to use it, but that's
> a guarantee that we need if we're going to use it for our own purposes.
>
>> So I guess you are asking how to guarantee we won't also need to be able
>> to modify it on the fly for PROT_NONE entries? I don't have a definite
>> answer, but I've been working on the assumption that the architecture
>> introducing a feature that is only needed in states where NS is not needed
>> is unlikely (so using that bit for the feature is also unlikely). And then
>> needing to manipulate that feature dyanically for PROT_NONE mappings is
>> even less likely.
>
> The architects are quite good at inventing unlikely features :) SVE
> blowing the sigcontext comes to mind. I think we should seek
> clarification that the NS bit won't be allocated in the future if we are
> going to use it for our own stuff.

OK, clarification sought; the architects are *not* willing to upgrade the res0
to "IGNORED"...

>
>> If all else fails we could move it to nG (bit 11) to free up bit 5. But that
>> requires a bit more fiddling with the swap pte format.
>
> Oh, cunning, I hadn't thought of that. I think that's probably a better
> approach if the NS bit isn't guaranteed to be left alone by the
> architecture.

..So I'll change patch 2 to move the bit to nG (bit 11). Does that work for
you? I *think* an alternative could be the contig bit. But nG feels safest to me
- I'd have to think a lot harder about contig bit.

>
>>>> @@ -469,7 +477,7 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte)
>>>> */
>>>> static inline int pte_protnone(pte_t pte)
>>>> {
>>>> - return (pte_val(pte) & (PTE_VALID | PTE_PROT_NONE)) == PTE_PROT_NONE;
>>>> + return pte_invalid(pte) && !pte_user(pte) && !pte_user_exec(pte);
>>>> }
>>>
>>> Why do we need to check pte_user_*() here? Isn't PROT_NONE the only case
>>> in which a pte will have PTE_INVALID set?
>>
>> I guess for *ptes* this is technically correct. But I was trying to make the
>> format generic and reusable for *pmds* too. (pmd_protnone() wraps
>> pte_protnone()). For pmds, PTE_INVALID also represents invalid-but-present PMDs
>> (i.e. pmds on which pmd_mkinvalid() has been called).
>>
>> The intention is that PTE_INVALID indicates "present but not valid in HW". And
>> (!pte_user(pte) && !pte_user_exec(pte)) indicates the PROT_NONE permission.
>
> Ok, but it does mean the compiler can't emit a nice TBNZ instruction for
> the pte macro. Can you either seperate out the pmd/pte versions of the
> macro or just add a comment along the lines of what you said above, please?

I'll add a comment; I'd rather not have the implementations diverge unless there
is a clear performance advantage.

Thanks for the review!

>
> Cheers,
>
> Will