2024-05-03 14:47:04

by Ryan Roberts

[permalink] [raw]
Subject: [PATCH v4 0/4] arm64/mm: Enable userfaultfd write-protect

Hi All,

This series adds uffd write-protect support for arm64.

Previous attempts to add uffd-wp (and soft-dirty) have failed because of a
perceived lack of available PTE SW bits. However it actually turns out that
there are 2 available but they are hidden. PTE_PROT_NONE was previously
occupying a SW bit, but can be moved, freeing up the SW bit. Bit 63 is marked as
"IGNORED" in the Arm ARM, but it does not currently indicate "reserved for SW
use" like it does for the other SW bits. I've confirmed with the spec owner that
this is an oversight; the bit is intended to be reserved for SW use and the spec
will clarify this in a future update.

So now we have two spare bits; patch 4 enables uffd-wp on arm64, using the SW
bit freed up by moving PTE_PROT_NONE. This leaves bit 63 spare for future use
(e.g. soft-dirty - see RFC at [4] - or some other usage).

---

This applies on top of v6.9-rc5.

All mm selftests involving uffd-wp now run. However, this work exposed a bug in
core-mm that was leading to some test uffd-wp failures in the pagemap_ioctl
test. The fix for that is posted at [5], and is in mm-hotfixes-unstable. With
the fix applied, all the uffd-wp tests pass and no other selftest regressions
are observed.


Changes since v3 [3]
====================

patch 1 & 2 (was patch 1):
- Split into 2 patches (per Anshuman):
- patch 1: generalizes PMD_PRESENT_INVALID
- patch 2: removes PTE_PROT_NONE
- Re-aded comment for PTE_PRESENT_INVALID (per Anshuman)


Changes since v2 [2]
====================

patch 1:
- Renamed PTE_INVALID -> PTE_PRESENT_INVALID, pte_invalid() ->
pte_present_invalid() (per Catalin)
- Added comment explaining test in pte_protnone() (per Will)
- Added R-b (thanks to Catalin)
patch 2:
- Move PTE_PRESENT_INVALID to PTE_NG instead of PTE_NS (per Will)
- Added R-b (thanks to Catalin)
patch 3:
- Added R-b (thanks to Catalin, David)


Changes since v1 [1]
====================

patch 1 & 2 (was patch 1):
- generalized PMD_PRESENT_INVALID into PTE_INVALID
- removed explicit PTE_PROT_NONE bit
patch 3 (was patch 2):
- collected R-b/A-b from Peter and Catalin - thanks!


[1] https://lore.kernel.org/linux-arm-kernel/[email protected]/
[2] https://lore.kernel.org/linux-arm-kernel/[email protected]/
[3] https://lore.kernel.org/linux-arm-kernel/[email protected]/
[4] https://lore.kernel.org/all/[email protected]/
[5] https://lore.kernel.org/all/[email protected]/

Thanks,
Ryan

Ryan Roberts (4):
arm64/mm: generalize PMD_PRESENT_INVALID for all levels
arm64/mm: Remove PTE_PROT_NONE bit
arm64/mm: Move PTE_PRESENT_INVALID to overlay PTE_NG
arm64/mm: Add uffd write-protect support

arch/arm64/Kconfig | 1 +
arch/arm64/include/asm/pgtable-prot.h | 19 +++--
arch/arm64/include/asm/pgtable.h | 104 +++++++++++++++++++-------
3 files changed, 91 insertions(+), 33 deletions(-)

--
2.43.0



2024-05-03 14:47:15

by Ryan Roberts

[permalink] [raw]
Subject: [PATCH v4 2/4] arm64/mm: Remove PTE_PROT_NONE bit

Currently the PTE_PRESENT_INVALID and PTE_PROT_NONE functionality
explicitly occupy 2 bits in the PTE when PTE_VALID/PMD_SECT_VALID is
clear. This has 2 significant consequences:

- PTE_PROT_NONE consumes a precious SW PTE bit that could be used for
other things.
- The swap pte layout must reserve those same 2 bits and ensure they
are both always zero for a swap pte. It would be nice to reclaim at
least one of those bits.

But PTE_PRESENT_INVALID, which since the previous patch, applies
uniformly to page/block descriptors at any level when PTE_VALID is
clear, can already give us most of what PTE_PROT_NONE requires: If it is
set, then the pte is still considered present; pte_present() returns
true and all the fields in the pte follow the HW interpretation (e.g. SW
can safely call pte_pfn(), etc). But crucially, the HW treats the pte as
invalid and will fault if it hits.

So let's remove PTE_PROT_NONE entirely and instead represent PROT_NONE
as a present but invalid pte (PTE_VALID=0, PTE_PRESENT_INVALID=1) with
PTE_USER=0 and PTE_UXN=1. This is a unique combination that is not used
anywhere else.

The net result is a clearer, simpler, more generic encoding scheme that
applies uniformly to all levels. Additionally we free up a PTE SW bit
and a swap pte bit (bit 58 in both cases).

Reviewed-by: Catalin Marinas <[email protected]>
Signed-off-by: Ryan Roberts <[email protected]>
---
arch/arm64/include/asm/pgtable-prot.h | 3 +--
arch/arm64/include/asm/pgtable.h | 31 +++++++++++++++------------
2 files changed, 18 insertions(+), 16 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
index cdbf51eef7a6..81f07b44f7b8 100644
--- a/arch/arm64/include/asm/pgtable-prot.h
+++ b/arch/arm64/include/asm/pgtable-prot.h
@@ -18,7 +18,6 @@
#define PTE_DIRTY (_AT(pteval_t, 1) << 55)
#define PTE_SPECIAL (_AT(pteval_t, 1) << 56)
#define PTE_DEVMAP (_AT(pteval_t, 1) << 57)
-#define PTE_PROT_NONE (_AT(pteval_t, 1) << 58) /* only when !PTE_VALID */

/*
* PTE_PRESENT_INVALID=1 & PTE_VALID=0 indicates that the pte's fields should be
@@ -103,7 +102,7 @@ static inline bool __pure lpa2_is_enabled(void)
__val; \
})

-#define PAGE_NONE __pgprot(((_PAGE_DEFAULT) & ~PTE_VALID) | PTE_PROT_NONE | PTE_RDONLY | PTE_NG | PTE_PXN | PTE_UXN)
+#define PAGE_NONE __pgprot(((_PAGE_DEFAULT) & ~PTE_VALID) | PTE_PRESENT_INVALID | PTE_RDONLY | PTE_NG | PTE_PXN | PTE_UXN)
/* shared+writable pages are clean by default, hence PTE_RDONLY|PTE_WRITE */
#define PAGE_SHARED __pgprot(_PAGE_SHARED)
#define PAGE_SHARED_EXEC __pgprot(_PAGE_SHARED_EXEC)
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 7156c940ac4f..c0f4471423db 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -105,7 +105,7 @@ static inline pteval_t __phys_to_pte_val(phys_addr_t phys)
/*
* The following only work if pte_present(). Undefined behaviour otherwise.
*/
-#define pte_present(pte) (!!(pte_val(pte) & (PTE_VALID | PTE_PROT_NONE)))
+#define pte_present(pte) (pte_valid(pte) || pte_present_invalid(pte))
#define pte_young(pte) (!!(pte_val(pte) & PTE_AF))
#define pte_special(pte) (!!(pte_val(pte) & PTE_SPECIAL))
#define pte_write(pte) (!!(pte_val(pte) & PTE_WRITE))
@@ -478,7 +478,16 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte)
*/
static inline int pte_protnone(pte_t pte)
{
- return (pte_val(pte) & (PTE_VALID | PTE_PROT_NONE)) == PTE_PROT_NONE;
+ /*
+ * pte_present_invalid() tells us that the pte is invalid from HW
+ * perspective but present from SW perspective, so the fields are to be
+ * interpretted as per the HW layout. The second 2 checks are the unique
+ * encoding that we use for PROT_NONE. It is insufficient to only use
+ * the first check because we share the same encoding scheme with pmds
+ * which support pmd_mkinvalid(), so can be present-invalid without
+ * being PROT_NONE.
+ */
+ return pte_present_invalid(pte) && !pte_user(pte) && !pte_user_exec(pte);
}

static inline int pmd_protnone(pmd_t pmd)
@@ -487,12 +496,7 @@ static inline int pmd_protnone(pmd_t pmd)
}
#endif

-#define pmd_present_invalid(pmd) pte_present_invalid(pmd_pte(pmd))
-
-static inline int pmd_present(pmd_t pmd)
-{
- return pte_present(pmd_pte(pmd)) || pmd_present_invalid(pmd);
-}
+#define pmd_present(pmd) pte_present(pmd_pte(pmd))

/*
* THP definitions.
@@ -1029,8 +1033,8 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
* in MAIR_EL1. The mask below has to include PTE_ATTRINDX_MASK.
*/
const pteval_t mask = PTE_USER | PTE_PXN | PTE_UXN | PTE_RDONLY |
- PTE_PROT_NONE | PTE_VALID | PTE_WRITE | PTE_GP |
- PTE_ATTRINDX_MASK;
+ PTE_PRESENT_INVALID | PTE_VALID | PTE_WRITE |
+ PTE_GP | PTE_ATTRINDX_MASK;
/* preserve the hardware dirty information */
if (pte_hw_dirty(pte))
pte = set_pte_bit(pte, __pgprot(PTE_DIRTY));
@@ -1078,17 +1082,17 @@ static inline int pgd_devmap(pgd_t pgd)
#ifdef CONFIG_PAGE_TABLE_CHECK
static inline bool pte_user_accessible_page(pte_t pte)
{
- return pte_present(pte) && (pte_user(pte) || pte_user_exec(pte));
+ return pte_valid(pte) && (pte_user(pte) || pte_user_exec(pte));
}

static inline bool pmd_user_accessible_page(pmd_t pmd)
{
- return pmd_leaf(pmd) && !pmd_present_invalid(pmd) && (pmd_user(pmd) || pmd_user_exec(pmd));
+ return pmd_valid(pmd) && !pmd_table(pmd) && (pmd_user(pmd) || pmd_user_exec(pmd));
}

static inline bool pud_user_accessible_page(pud_t pud)
{
- return pud_leaf(pud) && (pud_user(pud) || pud_user_exec(pud));
+ return pud_valid(pud) && !pud_table(pud) && (pud_user(pud) || pud_user_exec(pud));
}
#endif

@@ -1252,7 +1256,6 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
* bits 2: remember PG_anon_exclusive
* bits 3-7: swap type
* bits 8-57: swap offset
- * bit 58: PTE_PROT_NONE (must be zero)
* bit 59: PTE_PRESENT_INVALID (must be zero)
*/
#define __SWP_TYPE_SHIFT 3
--
2.43.0


2024-05-03 14:47:21

by Ryan Roberts

[permalink] [raw]
Subject: [PATCH v4 3/4] arm64/mm: Move PTE_PRESENT_INVALID to overlay PTE_NG

PTE_PRESENT_INVALID was previously occupying bit 59, which when a PTE is
valid can either be IGNORED, PBHA[0] or AttrIndex[3], depending on the
HW configuration. In practice this is currently not a problem because
PTE_PRESENT_INVALID can only be 1 when PTE_VALID=0 and upstream Linux
always requires the bit set to 0 for a valid pte.

However, if in future Linux wants to use the field (e.g. AttrIndex[3])
then we could end up with confusion when PTE_PRESENT_INVALID comes along
and corrupts the field - we would ideally want to preserve it even for
an invalid (but present) pte.

The other problem with bit 59 is that it prevents the offset field of a
swap entry within a swap pte from growing beyond 51 bits. By moving
PTE_PRESENT_INVALID to a low bit we can lay the swap pte out so that the
offset field could grow to 52 bits in future.

So let's move PTE_PRESENT_INVALID to overlay PTE_NG (bit 11).

There is no need to persist NG for a present-invalid entry; it is always
set for user mappings and is not used by SW to derive any state from the
pte. PTE_NS was considered instead of PTE_NG, but it is RES0 for
non-secure SW, so there is a chance that future architecture may
allocate the bit and we may therefore need to persist that bit for
present-invalid ptes.

These are both marginal benefits, but make things a bit tidier in my
opinion.

Reviewed-by: Catalin Marinas <[email protected]>
Signed-off-by: Ryan Roberts <[email protected]>
---
arch/arm64/include/asm/pgtable-prot.h | 2 +-
arch/arm64/include/asm/pgtable.h | 12 ++++++------
2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
index 81f07b44f7b8..35c9de13f7ed 100644
--- a/arch/arm64/include/asm/pgtable-prot.h
+++ b/arch/arm64/include/asm/pgtable-prot.h
@@ -24,7 +24,7 @@
* interpreted according to the HW layout by SW but any attempted HW access to
* the address will result in a fault. pte_present() returns true.
*/
-#define PTE_PRESENT_INVALID (_AT(pteval_t, 1) << 59) /* only when !PTE_VALID */
+#define PTE_PRESENT_INVALID (PTE_NG) /* only when !PTE_VALID */

#define _PROT_DEFAULT (PTE_TYPE_PAGE | PTE_AF | PTE_SHARED)
#define _PROT_SECT_DEFAULT (PMD_TYPE_SECT | PMD_SECT_AF | PMD_SECT_S)
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index c0f4471423db..7f1ff59c43ed 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -1254,15 +1254,15 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
* Encode and decode a swap entry:
* bits 0-1: present (must be zero)
* bits 2: remember PG_anon_exclusive
- * bits 3-7: swap type
- * bits 8-57: swap offset
- * bit 59: PTE_PRESENT_INVALID (must be zero)
+ * bits 6-10: swap type
+ * bit 11: PTE_PRESENT_INVALID (must be zero)
+ * bits 12-61: swap offset
*/
-#define __SWP_TYPE_SHIFT 3
+#define __SWP_TYPE_SHIFT 6
#define __SWP_TYPE_BITS 5
-#define __SWP_OFFSET_BITS 50
#define __SWP_TYPE_MASK ((1 << __SWP_TYPE_BITS) - 1)
-#define __SWP_OFFSET_SHIFT (__SWP_TYPE_BITS + __SWP_TYPE_SHIFT)
+#define __SWP_OFFSET_SHIFT 12
+#define __SWP_OFFSET_BITS 50
#define __SWP_OFFSET_MASK ((1UL << __SWP_OFFSET_BITS) - 1)

#define __swp_type(x) (((x).val >> __SWP_TYPE_SHIFT) & __SWP_TYPE_MASK)
--
2.43.0


2024-05-03 14:47:28

by Ryan Roberts

[permalink] [raw]
Subject: [PATCH v4 4/4] arm64/mm: Add uffd write-protect support

Let's use the newly-free PTE SW bit (58) to add support for uffd-wp.

The standard handlers are implemented for set/test/clear for both pte
and pmd. Additionally we must also track the uffd-wp state as a pte swp
bit, so use a free swap pte bit (3).

Acked-by: Peter Xu <[email protected]>
Reviewed-by: Catalin Marinas <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Signed-off-by: Ryan Roberts <[email protected]>
---
arch/arm64/Kconfig | 1 +
arch/arm64/include/asm/pgtable-prot.h | 8 +++++
arch/arm64/include/asm/pgtable.h | 44 +++++++++++++++++++++++++++
3 files changed, 53 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 7b11c98b3e84..763e221f2169 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -255,6 +255,7 @@ config ARM64
select SYSCTL_EXCEPTION_TRACE
select THREAD_INFO_IN_TASK
select HAVE_ARCH_USERFAULTFD_MINOR if USERFAULTFD
+ select HAVE_ARCH_USERFAULTFD_WP if USERFAULTFD
select TRACE_IRQFLAGS_SUPPORT
select TRACE_IRQFLAGS_NMI_SUPPORT
select HAVE_SOFTIRQ_ON_OWN_STACK
diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
index 35c9de13f7ed..b11cfb9fdd37 100644
--- a/arch/arm64/include/asm/pgtable-prot.h
+++ b/arch/arm64/include/asm/pgtable-prot.h
@@ -26,6 +26,14 @@
*/
#define PTE_PRESENT_INVALID (PTE_NG) /* only when !PTE_VALID */

+#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP
+#define PTE_UFFD_WP (_AT(pteval_t, 1) << 58) /* uffd-wp tracking */
+#define PTE_SWP_UFFD_WP (_AT(pteval_t, 1) << 3) /* only for swp ptes */
+#else
+#define PTE_UFFD_WP (_AT(pteval_t, 0))
+#define PTE_SWP_UFFD_WP (_AT(pteval_t, 0))
+#endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */
+
#define _PROT_DEFAULT (PTE_TYPE_PAGE | PTE_AF | PTE_SHARED)
#define _PROT_SECT_DEFAULT (PMD_TYPE_SECT | PMD_SECT_AF | PMD_SECT_S)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 7f1ff59c43ed..2dcf582981ae 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -280,6 +280,23 @@ static inline pte_t pte_mkdevmap(pte_t pte)
return set_pte_bit(pte, __pgprot(PTE_DEVMAP | PTE_SPECIAL));
}

+#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP
+static inline int pte_uffd_wp(pte_t pte)
+{
+ return !!(pte_val(pte) & PTE_UFFD_WP);
+}
+
+static inline pte_t pte_mkuffd_wp(pte_t pte)
+{
+ return pte_wrprotect(set_pte_bit(pte, __pgprot(PTE_UFFD_WP)));
+}
+
+static inline pte_t pte_clear_uffd_wp(pte_t pte)
+{
+ return clear_pte_bit(pte, __pgprot(PTE_UFFD_WP));
+}
+#endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */
+
static inline void __set_pte(pte_t *ptep, pte_t pte)
{
WRITE_ONCE(*ptep, pte);
@@ -472,6 +489,23 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte)
return clear_pte_bit(pte, __pgprot(PTE_SWP_EXCLUSIVE));
}

+#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP
+static inline pte_t pte_swp_mkuffd_wp(pte_t pte)
+{
+ return set_pte_bit(pte, __pgprot(PTE_SWP_UFFD_WP));
+}
+
+static inline int pte_swp_uffd_wp(pte_t pte)
+{
+ return !!(pte_val(pte) & PTE_SWP_UFFD_WP);
+}
+
+static inline pte_t pte_swp_clear_uffd_wp(pte_t pte)
+{
+ return clear_pte_bit(pte, __pgprot(PTE_SWP_UFFD_WP));
+}
+#endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */
+
#ifdef CONFIG_NUMA_BALANCING
/*
* See the comment in include/linux/pgtable.h
@@ -522,6 +556,15 @@ static inline int pmd_trans_huge(pmd_t pmd)
#define pmd_mkdirty(pmd) pte_pmd(pte_mkdirty(pmd_pte(pmd)))
#define pmd_mkyoung(pmd) pte_pmd(pte_mkyoung(pmd_pte(pmd)))
#define pmd_mkinvalid(pmd) pte_pmd(pte_mkinvalid(pmd_pte(pmd)))
+#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP
+#define pmd_uffd_wp(pmd) pte_uffd_wp(pmd_pte(pmd))
+#define pmd_mkuffd_wp(pmd) pte_pmd(pte_mkuffd_wp(pmd_pte(pmd)))
+#define pmd_clear_uffd_wp(pmd) pte_pmd(pte_clear_uffd_wp(pmd_pte(pmd)))
+#define pmd_swp_uffd_wp(pmd) pte_swp_uffd_wp(pmd_pte(pmd))
+#define pmd_swp_mkuffd_wp(pmd) pte_pmd(pte_swp_mkuffd_wp(pmd_pte(pmd)))
+#define pmd_swp_clear_uffd_wp(pmd) \
+ pte_pmd(pte_swp_clear_uffd_wp(pmd_pte(pmd)))
+#endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */

#define pmd_thp_or_huge(pmd) (pmd_huge(pmd) || pmd_trans_huge(pmd))

@@ -1254,6 +1297,7 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
* Encode and decode a swap entry:
* bits 0-1: present (must be zero)
* bits 2: remember PG_anon_exclusive
+ * bit 3: remember uffd-wp state
* bits 6-10: swap type
* bit 11: PTE_PRESENT_INVALID (must be zero)
* bits 12-61: swap offset
--
2.43.0


2024-05-07 11:10:46

by Will Deacon

[permalink] [raw]
Subject: Re: [PATCH v4 0/4] arm64/mm: Enable userfaultfd write-protect

Hi Ryan,

On Fri, May 03, 2024 at 03:45:58PM +0100, Ryan Roberts wrote:
> This series adds uffd write-protect support for arm64.
>
> Previous attempts to add uffd-wp (and soft-dirty) have failed because of a
> perceived lack of available PTE SW bits. However it actually turns out that
> there are 2 available but they are hidden. PTE_PROT_NONE was previously
> occupying a SW bit, but can be moved, freeing up the SW bit. Bit 63 is marked as
> "IGNORED" in the Arm ARM, but it does not currently indicate "reserved for SW
> use" like it does for the other SW bits. I've confirmed with the spec owner that
> this is an oversight; the bit is intended to be reserved for SW use and the spec
> will clarify this in a future update.
>
> So now we have two spare bits; patch 4 enables uffd-wp on arm64, using the SW
> bit freed up by moving PTE_PROT_NONE. This leaves bit 63 spare for future use
> (e.g. soft-dirty - see RFC at [4] - or some other usage).
>
> ---
>
> This applies on top of v6.9-rc5.

I chucked this into the CI on Friday and it looks to have survived the
long weekend, so I've gone ahead and merged it into for-next/core. Short
of any last minute failures (touch wood), this should land in 6.10.

Thanks!

Will

2024-05-07 11:41:51

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v4 2/4] arm64/mm: Remove PTE_PROT_NONE bit

On 03.05.24 16:46, Ryan Roberts wrote:
> Currently the PTE_PRESENT_INVALID and PTE_PROT_NONE functionality
> explicitly occupy 2 bits in the PTE when PTE_VALID/PMD_SECT_VALID is
> clear. This has 2 significant consequences:
>
> - PTE_PROT_NONE consumes a precious SW PTE bit that could be used for
> other things.
> - The swap pte layout must reserve those same 2 bits and ensure they
> are both always zero for a swap pte. It would be nice to reclaim at
> least one of those bits.
>
> But PTE_PRESENT_INVALID, which since the previous patch, applies
> uniformly to page/block descriptors at any level when PTE_VALID is
> clear, can already give us most of what PTE_PROT_NONE requires: If it is
> set, then the pte is still considered present; pte_present() returns
> true and all the fields in the pte follow the HW interpretation (e.g. SW
> can safely call pte_pfn(), etc). But crucially, the HW treats the pte as
> invalid and will fault if it hits.
>
> So let's remove PTE_PROT_NONE entirely and instead represent PROT_NONE
> as a present but invalid pte (PTE_VALID=0, PTE_PRESENT_INVALID=1) with
> PTE_USER=0 and PTE_UXN=1. This is a unique combination that is not used
> anywhere else.
>
> The net result is a clearer, simpler, more generic encoding scheme that
> applies uniformly to all levels. Additionally we free up a PTE SW bit
> and a swap pte bit (bit 58 in both cases).
>
> Reviewed-by: Catalin Marinas <[email protected]>
> Signed-off-by: Ryan Roberts <[email protected]>

Reviewed-by: David Hildenbrand <[email protected]>

--
Cheers,

David / dhildenb


2024-05-07 11:44:28

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v4 3/4] arm64/mm: Move PTE_PRESENT_INVALID to overlay PTE_NG

On 03.05.24 16:46, Ryan Roberts wrote:
> PTE_PRESENT_INVALID was previously occupying bit 59, which when a PTE is
> valid can either be IGNORED, PBHA[0] or AttrIndex[3], depending on the
> HW configuration. In practice this is currently not a problem because
> PTE_PRESENT_INVALID can only be 1 when PTE_VALID=0 and upstream Linux
> always requires the bit set to 0 for a valid pte.
>
> However, if in future Linux wants to use the field (e.g. AttrIndex[3])
> then we could end up with confusion when PTE_PRESENT_INVALID comes along
> and corrupts the field - we would ideally want to preserve it even for
> an invalid (but present) pte.
>
> The other problem with bit 59 is that it prevents the offset field of a
> swap entry within a swap pte from growing beyond 51 bits. By moving
> PTE_PRESENT_INVALID to a low bit we can lay the swap pte out so that the
> offset field could grow to 52 bits in future.
>
> So let's move PTE_PRESENT_INVALID to overlay PTE_NG (bit 11).
>
> There is no need to persist NG for a present-invalid entry; it is always
> set for user mappings and is not used by SW to derive any state from the
> pte. PTE_NS was considered instead of PTE_NG, but it is RES0 for
> non-secure SW, so there is a chance that future architecture may
> allocate the bit and we may therefore need to persist that bit for
> present-invalid ptes.
>
> These are both marginal benefits, but make things a bit tidier in my
> opinion.
>
> Reviewed-by: Catalin Marinas <[email protected]>
> Signed-off-by: Ryan Roberts <[email protected]>

Reviewed-by: David Hildenbrand <[email protected]>

--
Cheers,

David / dhildenb


2024-05-07 11:47:53

by Ryan Roberts

[permalink] [raw]
Subject: Re: [PATCH v4 0/4] arm64/mm: Enable userfaultfd write-protect

On 07/05/2024 12:07, Will Deacon wrote:
> Hi Ryan,
>
> On Fri, May 03, 2024 at 03:45:58PM +0100, Ryan Roberts wrote:
>> This series adds uffd write-protect support for arm64.
>>
>> Previous attempts to add uffd-wp (and soft-dirty) have failed because of a
>> perceived lack of available PTE SW bits. However it actually turns out that
>> there are 2 available but they are hidden. PTE_PROT_NONE was previously
>> occupying a SW bit, but can be moved, freeing up the SW bit. Bit 63 is marked as
>> "IGNORED" in the Arm ARM, but it does not currently indicate "reserved for SW
>> use" like it does for the other SW bits. I've confirmed with the spec owner that
>> this is an oversight; the bit is intended to be reserved for SW use and the spec
>> will clarify this in a future update.
>>
>> So now we have two spare bits; patch 4 enables uffd-wp on arm64, using the SW
>> bit freed up by moving PTE_PROT_NONE. This leaves bit 63 spare for future use
>> (e.g. soft-dirty - see RFC at [4] - or some other usage).
>>
>> ---
>>
>> This applies on top of v6.9-rc5.
>
> I chucked this into the CI on Friday and it looks to have survived the
> long weekend, so I've gone ahead and merged it into for-next/core. Short
> of any last minute failures (touch wood), this should land in 6.10.

Oh great - thanks!

Catalin was previously proposing to hold this until 6.11 - I'll leave you two to
fight it out in case that's still his preference ;-)

>
> Thanks!
>
> Will


2024-05-07 12:10:51

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v4 0/4] arm64/mm: Enable userfaultfd write-protect

On Tue, May 07, 2024 at 12:17:18PM +0100, Ryan Roberts wrote:
> On 07/05/2024 12:07, Will Deacon wrote:
> > On Fri, May 03, 2024 at 03:45:58PM +0100, Ryan Roberts wrote:
> >> This series adds uffd write-protect support for arm64.
> >>
> >> Previous attempts to add uffd-wp (and soft-dirty) have failed because of a
> >> perceived lack of available PTE SW bits. However it actually turns out that
> >> there are 2 available but they are hidden. PTE_PROT_NONE was previously
> >> occupying a SW bit, but can be moved, freeing up the SW bit. Bit 63 is marked as
> >> "IGNORED" in the Arm ARM, but it does not currently indicate "reserved for SW
> >> use" like it does for the other SW bits. I've confirmed with the spec owner that
> >> this is an oversight; the bit is intended to be reserved for SW use and the spec
> >> will clarify this in a future update.
> >>
> >> So now we have two spare bits; patch 4 enables uffd-wp on arm64, using the SW
> >> bit freed up by moving PTE_PROT_NONE. This leaves bit 63 spare for future use
> >> (e.g. soft-dirty - see RFC at [4] - or some other usage).
> >>
> >> ---
> >>
> >> This applies on top of v6.9-rc5.
> >
> > I chucked this into the CI on Friday and it looks to have survived the
> > long weekend, so I've gone ahead and merged it into for-next/core. Short
> > of any last minute failures (touch wood), this should land in 6.10.
>
> Oh great - thanks!
>
> Catalin was previously proposing to hold this until 6.11 - I'll leave you two to
> fight it out in case that's still his preference ;-)

Fine by me as well to go in 6.10. Will is taking the blame if it all
falls apart ;).

--
Catalin

2024-05-07 12:11:19

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v4 4/4] arm64/mm: Add uffd write-protect support

On 03.05.24 16:46, Ryan Roberts wrote:
> Let's use the newly-free PTE SW bit (58) to add support for uffd-wp.
>
> The standard handlers are implemented for set/test/clear for both pte
> and pmd. Additionally we must also track the uffd-wp state as a pte swp
> bit, so use a free swap pte bit (3).
>
> Acked-by: Peter Xu <[email protected]>
> Reviewed-by: Catalin Marinas <[email protected]>
> Reviewed-by: David Hildenbrand <[email protected]>
> Signed-off-by: Ryan Roberts <[email protected]>

LGTM, thanks!

Reviewed-by: David Hildenbrand <[email protected]>

--
Cheers,

David / dhildenb


2024-05-08 09:51:08

by Anshuman Khandual

[permalink] [raw]
Subject: Re: [PATCH v4 2/4] arm64/mm: Remove PTE_PROT_NONE bit



On 5/3/24 20:16, Ryan Roberts wrote:
> Currently the PTE_PRESENT_INVALID and PTE_PROT_NONE functionality
> explicitly occupy 2 bits in the PTE when PTE_VALID/PMD_SECT_VALID is
> clear. This has 2 significant consequences:
>
> - PTE_PROT_NONE consumes a precious SW PTE bit that could be used for
> other things.
> - The swap pte layout must reserve those same 2 bits and ensure they
> are both always zero for a swap pte. It would be nice to reclaim at
> least one of those bits.
>
> But PTE_PRESENT_INVALID, which since the previous patch, applies
> uniformly to page/block descriptors at any level when PTE_VALID is
> clear, can already give us most of what PTE_PROT_NONE requires: If it is
> set, then the pte is still considered present; pte_present() returns
> true and all the fields in the pte follow the HW interpretation (e.g. SW
> can safely call pte_pfn(), etc). But crucially, the HW treats the pte as
> invalid and will fault if it hits.
>
> So let's remove PTE_PROT_NONE entirely and instead represent PROT_NONE
> as a present but invalid pte (PTE_VALID=0, PTE_PRESENT_INVALID=1) with
> PTE_USER=0 and PTE_UXN=1. This is a unique combination that is not used
> anywhere else.
>
> The net result is a clearer, simpler, more generic encoding scheme that
> applies uniformly to all levels. Additionally we free up a PTE SW bit
> and a swap pte bit (bit 58 in both cases).
>
> Reviewed-by: Catalin Marinas <[email protected]>
> Signed-off-by: Ryan Roberts <[email protected]>

Reviewed-by: Anshuman Khandual <[email protected]>

> ---
> arch/arm64/include/asm/pgtable-prot.h | 3 +--
> arch/arm64/include/asm/pgtable.h | 31 +++++++++++++++------------
> 2 files changed, 18 insertions(+), 16 deletions(-)
>
> diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
> index cdbf51eef7a6..81f07b44f7b8 100644
> --- a/arch/arm64/include/asm/pgtable-prot.h
> +++ b/arch/arm64/include/asm/pgtable-prot.h
> @@ -18,7 +18,6 @@
> #define PTE_DIRTY (_AT(pteval_t, 1) << 55)
> #define PTE_SPECIAL (_AT(pteval_t, 1) << 56)
> #define PTE_DEVMAP (_AT(pteval_t, 1) << 57)
> -#define PTE_PROT_NONE (_AT(pteval_t, 1) << 58) /* only when !PTE_VALID */
>
> /*
> * PTE_PRESENT_INVALID=1 & PTE_VALID=0 indicates that the pte's fields should be
> @@ -103,7 +102,7 @@ static inline bool __pure lpa2_is_enabled(void)
> __val; \
> })
>
> -#define PAGE_NONE __pgprot(((_PAGE_DEFAULT) & ~PTE_VALID) | PTE_PROT_NONE | PTE_RDONLY | PTE_NG | PTE_PXN | PTE_UXN)
> +#define PAGE_NONE __pgprot(((_PAGE_DEFAULT) & ~PTE_VALID) | PTE_PRESENT_INVALID | PTE_RDONLY | PTE_NG | PTE_PXN | PTE_UXN)
> /* shared+writable pages are clean by default, hence PTE_RDONLY|PTE_WRITE */
> #define PAGE_SHARED __pgprot(_PAGE_SHARED)
> #define PAGE_SHARED_EXEC __pgprot(_PAGE_SHARED_EXEC)
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 7156c940ac4f..c0f4471423db 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -105,7 +105,7 @@ static inline pteval_t __phys_to_pte_val(phys_addr_t phys)
> /*
> * The following only work if pte_present(). Undefined behaviour otherwise.
> */
> -#define pte_present(pte) (!!(pte_val(pte) & (PTE_VALID | PTE_PROT_NONE)))
> +#define pte_present(pte) (pte_valid(pte) || pte_present_invalid(pte))
> #define pte_young(pte) (!!(pte_val(pte) & PTE_AF))
> #define pte_special(pte) (!!(pte_val(pte) & PTE_SPECIAL))
> #define pte_write(pte) (!!(pte_val(pte) & PTE_WRITE))
> @@ -478,7 +478,16 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte)
> */
> static inline int pte_protnone(pte_t pte)
> {
> - return (pte_val(pte) & (PTE_VALID | PTE_PROT_NONE)) == PTE_PROT_NONE;
> + /*
> + * pte_present_invalid() tells us that the pte is invalid from HW
> + * perspective but present from SW perspective, so the fields are to be
> + * interpretted as per the HW layout. The second 2 checks are the unique
> + * encoding that we use for PROT_NONE. It is insufficient to only use
> + * the first check because we share the same encoding scheme with pmds
> + * which support pmd_mkinvalid(), so can be present-invalid without
> + * being PROT_NONE.
> + */
> + return pte_present_invalid(pte) && !pte_user(pte) && !pte_user_exec(pte);
> }
>
> static inline int pmd_protnone(pmd_t pmd)
> @@ -487,12 +496,7 @@ static inline int pmd_protnone(pmd_t pmd)
> }
> #endif
>
> -#define pmd_present_invalid(pmd) pte_present_invalid(pmd_pte(pmd))
> -
> -static inline int pmd_present(pmd_t pmd)
> -{
> - return pte_present(pmd_pte(pmd)) || pmd_present_invalid(pmd);
> -}
> +#define pmd_present(pmd) pte_present(pmd_pte(pmd))
>
> /*
> * THP definitions.
> @@ -1029,8 +1033,8 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
> * in MAIR_EL1. The mask below has to include PTE_ATTRINDX_MASK.
> */
> const pteval_t mask = PTE_USER | PTE_PXN | PTE_UXN | PTE_RDONLY |
> - PTE_PROT_NONE | PTE_VALID | PTE_WRITE | PTE_GP |
> - PTE_ATTRINDX_MASK;
> + PTE_PRESENT_INVALID | PTE_VALID | PTE_WRITE |
> + PTE_GP | PTE_ATTRINDX_MASK;
> /* preserve the hardware dirty information */
> if (pte_hw_dirty(pte))
> pte = set_pte_bit(pte, __pgprot(PTE_DIRTY));
> @@ -1078,17 +1082,17 @@ static inline int pgd_devmap(pgd_t pgd)
> #ifdef CONFIG_PAGE_TABLE_CHECK
> static inline bool pte_user_accessible_page(pte_t pte)
> {
> - return pte_present(pte) && (pte_user(pte) || pte_user_exec(pte));
> + return pte_valid(pte) && (pte_user(pte) || pte_user_exec(pte));
> }
>
> static inline bool pmd_user_accessible_page(pmd_t pmd)
> {
> - return pmd_leaf(pmd) && !pmd_present_invalid(pmd) && (pmd_user(pmd) || pmd_user_exec(pmd));
> + return pmd_valid(pmd) && !pmd_table(pmd) && (pmd_user(pmd) || pmd_user_exec(pmd));
> }
>
> static inline bool pud_user_accessible_page(pud_t pud)
> {
> - return pud_leaf(pud) && (pud_user(pud) || pud_user_exec(pud));
> + return pud_valid(pud) && !pud_table(pud) && (pud_user(pud) || pud_user_exec(pud));
> }
> #endif
>
> @@ -1252,7 +1256,6 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
> * bits 2: remember PG_anon_exclusive
> * bits 3-7: swap type
> * bits 8-57: swap offset
> - * bit 58: PTE_PROT_NONE (must be zero)
> * bit 59: PTE_PRESENT_INVALID (must be zero)
> */
> #define __SWP_TYPE_SHIFT 3

2024-05-08 10:02:01

by Anshuman Khandual

[permalink] [raw]
Subject: Re: [PATCH v4 0/4] arm64/mm: Enable userfaultfd write-protect



On 5/7/24 16:37, Will Deacon wrote:
> Hi Ryan,
>
> On Fri, May 03, 2024 at 03:45:58PM +0100, Ryan Roberts wrote:
>> This series adds uffd write-protect support for arm64.
>>
>> Previous attempts to add uffd-wp (and soft-dirty) have failed because of a
>> perceived lack of available PTE SW bits. However it actually turns out that
>> there are 2 available but they are hidden. PTE_PROT_NONE was previously
>> occupying a SW bit, but can be moved, freeing up the SW bit. Bit 63 is marked as
>> "IGNORED" in the Arm ARM, but it does not currently indicate "reserved for SW
>> use" like it does for the other SW bits. I've confirmed with the spec owner that
>> this is an oversight; the bit is intended to be reserved for SW use and the spec
>> will clarify this in a future update.
>>
>> So now we have two spare bits; patch 4 enables uffd-wp on arm64, using the SW
>> bit freed up by moving PTE_PROT_NONE. This leaves bit 63 spare for future use
>> (e.g. soft-dirty - see RFC at [4] - or some other usage).
>>
>> ---
>>
>> This applies on top of v6.9-rc5.
>
> I chucked this into the CI on Friday and it looks to have survived the
> long weekend, so I've gone ahead and merged it into for-next/core. Short
> of any last minute failures (touch wood), this should land in 6.10.

It would be great to have some memory migration tests (including THP and HugeTLB)
thrown at this series, which should test the mapped, migration entry transitions
etc. But not sure if there are any such tests off the shelf and readily available
in the CI system.

2024-05-08 10:16:46

by Ryan Roberts

[permalink] [raw]
Subject: Re: [PATCH v4 0/4] arm64/mm: Enable userfaultfd write-protect

On 08/05/2024 11:00, Anshuman Khandual wrote:
>
>
> On 5/7/24 16:37, Will Deacon wrote:
>> Hi Ryan,
>>
>> On Fri, May 03, 2024 at 03:45:58PM +0100, Ryan Roberts wrote:
>>> This series adds uffd write-protect support for arm64.
>>>
>>> Previous attempts to add uffd-wp (and soft-dirty) have failed because of a
>>> perceived lack of available PTE SW bits. However it actually turns out that
>>> there are 2 available but they are hidden. PTE_PROT_NONE was previously
>>> occupying a SW bit, but can be moved, freeing up the SW bit. Bit 63 is marked as
>>> "IGNORED" in the Arm ARM, but it does not currently indicate "reserved for SW
>>> use" like it does for the other SW bits. I've confirmed with the spec owner that
>>> this is an oversight; the bit is intended to be reserved for SW use and the spec
>>> will clarify this in a future update.
>>>
>>> So now we have two spare bits; patch 4 enables uffd-wp on arm64, using the SW
>>> bit freed up by moving PTE_PROT_NONE. This leaves bit 63 spare for future use
>>> (e.g. soft-dirty - see RFC at [4] - or some other usage).
>>>
>>> ---
>>>
>>> This applies on top of v6.9-rc5.
>>
>> I chucked this into the CI on Friday and it looks to have survived the
>> long weekend, so I've gone ahead and merged it into for-next/core. Short
>> of any last minute failures (touch wood), this should land in 6.10.
>
> It would be great to have some memory migration tests (including THP and HugeTLB)
> thrown at this series, which should test the mapped, migration entry transitions
> etc. But not sure if there are any such tests off the shelf and readily available
> in the CI system.

The "private_anon_thp" migration test in mm selftests is doing that for THP. and
invoking pmd_mkinvalid() as I recall; that's what originally led to me finding
the pmd_mkinvalid()-on-a-swap-pmd bug. There is nothing in that suite for
HugeTLB though - happy to run if someone can recommend anything.