2024-04-19 07:44:12

by Ryan Roberts

Subject: [PATCH v1 0/5] arm64/mm: uffd write-protect and soft-dirty tracking

Hi All,

This series adds uffd write-protect and soft-dirty tracking support for arm64. I
consider the soft-dirty support (patches 3 and 4) as RFC - see rationale below.

Previous attempts to add these features have failed because of a perceived lack
of available PTE SW bits. However it actually turns out that there are 2
available but they are hidden. PTE_PROT_NONE was previously occupying a SW bit,
but it only applies when PTE_VALID is clear, so this is moved to overlay PTE_UXN
in patch 1, freeing up the SW bit. Bit 63 is marked as "IGNORED" in the Arm ARM,
but it does not currently indicate "reserved for SW use" like it does for the
other SW bits. I've confirmed with the spec owner that this is an oversight; the
bit is intended to be reserved for SW use and the spec will clarify this in a
future update.

So we have our two bits; patch 2 enables uffd-wp, patch 3 enables soft-dirty and
patches 4 and 5 sort out the selftests so that the soft-dirty tests are compiled
for, and run on, arm64.

That said, these are the last 2 SW bits and we may want to keep 1 bit in reserve
for future use. soft-dirty is only used for CRIU to my knowledge, and it is
thought that their use case could be solved with the more generic uffd-wp. So
unless somebody makes a clear case for the inclusion of soft-dirty support, we
are probably better off dropping patches 3 and 4 and keeping bit 63 for future
use. Although note that the most recent attempt to add soft-dirty for arm64 was
last month [1] so I'd like to give Shivansh Vij the opportunity to make the
case.

---8<---
As an appendix, I've also experimented with adding an "extended SW bits" region
linked by the `struct ptdesc` (which you can always find from the `pte_t *`). If
demonstrated to work, this would act as an insurance policy in case we ever need
more SW bits in future, giving us confidence to merge soft-dirty now.
Unfortunately this approach suffers from 2 problems: 1) it's slow; my fork()
microbenchmark takes 40% longer in the worst case. 2) it is not possible to read
the HW pte and the extended SW bits atomically so it is impossible to implement
ptep_get_lockless() in its current form. So I've abandoned this experiment. (I
can provide more details if there is interest).
---8<---

[1] https://lore.kernel.org/linux-arm-kernel/MW4PR12MB687563EFB56373E8D55DDEABB92B2@MW4PR12MB6875.namprd12.prod.outlook.com/

Thanks,
Ryan


Ryan Roberts (5):
arm64/mm: Move PTE_PROT_NONE and PMD_PRESENT_INVALID
arm64/mm: Add uffd write-protect support
arm64/mm: Add soft-dirty page tracking support
selftests/mm: Enable soft-dirty tests on arm64
selftests/mm: soft-dirty should fail if a testcase fails

arch/arm64/Kconfig | 2 +
arch/arm64/include/asm/pgtable-prot.h | 20 +++-
arch/arm64/include/asm/pgtable.h | 118 +++++++++++++++++++--
arch/arm64/mm/contpte.c | 6 +-
arch/arm64/mm/fault.c | 3 +-
arch/arm64/mm/hugetlbpage.c | 6 +-
tools/testing/selftests/mm/Makefile | 5 +-
tools/testing/selftests/mm/madv_populate.c | 26 +----
tools/testing/selftests/mm/run_vmtests.sh | 5 +-
tools/testing/selftests/mm/soft-dirty.c | 2 +-
10 files changed, 141 insertions(+), 52 deletions(-)

--
2.25.1



2024-04-19 07:44:29

by Ryan Roberts

Subject: [RFC PATCH v1 3/5] arm64/mm: Add soft-dirty page tracking support

Use the final remaining PTE SW bit (63) for soft-dirty tracking. The
standard handlers are implemented for set/test/clear for both pte and
pmd. Additionally we must also track the soft-dirty state as a pte swp
bit, so use a free swap entry pte bit (61).

There are a few complexities worth calling out:

- The semantics of soft-dirty call for it to be set automatically by
pte_mkdirty(). But the arch code would previously call pte_mkdirty()
for various house-keeping operations such as gathering dirty bits
into a pte across a contpte block. These operations must not cause
soft-dirty to be set. So an internal version, __pte_mkdirty(), has
been created that does not manipulate soft-dirty, and pte_mkdirty()
is now a wrapper around that, which also sets the soft-dirty bit.

- For a region with soft-dirty tracking enabled, it works by
wrprotecting the ptes, causing a write to fault, where the handler
calls pte_mkdirty(ptep_get()) (which causes soft-dirty to be set),
then the resulting pte is written back with ptep_set_access_flags().
So the arm64 version of ptep_set_access_flags() now needs to
explicitly also set the soft-dirty bit to prevent loss.
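
To illustrate the first point, here is a minimal userspace sketch (not
kernel code) of the split between the two helpers. The bit positions are
the ones used in this patch; the model_* names and the self-check are
hypothetical, and the real __pte_mkdirty() also adjusts PTE_RDONLY for
writable ptes, which the model leaves out:

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t pteval_t;

/* Bit positions as used in this series; everything else is a toy model. */
#define PTE_DIRTY	(UINT64_C(1) << 55)
#define PTE_SOFT_DIRTY	(UINT64_C(1) << 63)

/* Internal variant: records dirtiness only, never touches soft-dirty. */
static inline pteval_t model_pte_mkdirty_nosoft(pteval_t pte)
{
	return pte | PTE_DIRTY;
}

/* Public variant: used for a real write, so soft-dirty is set as well. */
static inline pteval_t model_pte_mkdirty(pteval_t pte)
{
	return model_pte_mkdirty_nosoft(pte | PTE_SOFT_DIRTY);
}

/* House-keeping: fold the dirty state of one pte into another. */
static inline pteval_t model_gather_dirty(pteval_t dst, pteval_t src)
{
	if (src & PTE_DIRTY)
		dst = model_pte_mkdirty_nosoft(dst);
	return dst;
}

/* Hypothetical self-check exercising both paths. */
void model_soft_dirty_check(void)
{
	pteval_t clean = 0;
	pteval_t neighbour = PTE_DIRTY;	/* dirty, soft-dirty already cleared */
	pteval_t gathered, written;

	/* Gathering must propagate dirty without inventing soft-dirty. */
	gathered = model_gather_dirty(clean, neighbour);
	assert(gathered & PTE_DIRTY);
	assert(!(gathered & PTE_SOFT_DIRTY));

	/* A real write sets both bits. */
	written = model_pte_mkdirty(clean);
	assert(written & PTE_DIRTY);
	assert(written & PTE_SOFT_DIRTY);
}
```

If the house-keeping path used the public helper instead, the first pair
of asserts would fail, which is the spurious soft-dirty report the
internal variant is meant to avoid.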

The patch is very loosely based on a similar patch posted by Shivansh
Vij <[email protected]>, at the below link.

Primary motivation for adding soft-dirty support is to allow
Checkpoint-Restore in Userspace (CRIU) to be able to track a memory
page's changes if we want to enable pre-dumping, which is important for
live migration.

Link: https://lore.kernel.org/linux-arm-kernel/MW4PR12MB687563EFB56373E8D55DDEABB92B2@MW4PR12MB6875.namprd12.prod.outlook.com/
Signed-off-by: Ryan Roberts <[email protected]>
---
arch/arm64/Kconfig | 1 +
arch/arm64/include/asm/pgtable-prot.h | 8 +++++
arch/arm64/include/asm/pgtable.h | 47 +++++++++++++++++++++++++--
arch/arm64/mm/contpte.c | 6 ++--
arch/arm64/mm/fault.c | 3 +-
arch/arm64/mm/hugetlbpage.c | 6 ++--
6 files changed, 61 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 763e221f2169..3a5e22208e38 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -178,6 +178,7 @@ config ARM64
select HAVE_ARCH_PREL32_RELOCATIONS
select HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET
select HAVE_ARCH_SECCOMP_FILTER
+ select HAVE_ARCH_SOFT_DIRTY
select HAVE_ARCH_STACKLEAK
select HAVE_ARCH_THREAD_STRUCT_WHITELIST
select HAVE_ARCH_TRACEHOOK
diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
index f1e1f6306e03..7fce22ed3fda 100644
--- a/arch/arm64/include/asm/pgtable-prot.h
+++ b/arch/arm64/include/asm/pgtable-prot.h
@@ -28,6 +28,14 @@
#define PTE_SWP_UFFD_WP (_AT(pteval_t, 0))
#endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */

+#ifdef CONFIG_MEM_SOFT_DIRTY
+#define PTE_SOFT_DIRTY (_AT(pteval_t, 1) << 63) /* soft-dirty tracking */
+#define PTE_SWP_SOFT_DIRTY (_AT(pteval_t, 1) << 61) /* only for swp ptes */
+#else
+#define PTE_SOFT_DIRTY (_AT(pteval_t, 0))
+#define PTE_SWP_SOFT_DIRTY (_AT(pteval_t, 0))
+#endif /* CONFIG_MEM_SOFT_DIRTY */
+
/*
* This bit indicates that the entry is present i.e. pmd_page()
* still points to a valid huge page in memory even if the pmd
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 3f4748741fdb..0118e6e0adde 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -114,6 +114,7 @@ static inline pteval_t __phys_to_pte_val(phys_addr_t phys)
#define pte_user_exec(pte) (!(pte_val(pte) & PTE_UXN))
#define pte_cont(pte) (!!(pte_val(pte) & PTE_CONT))
#define pte_devmap(pte) (!!(pte_val(pte) & PTE_DEVMAP))
+#define pte_soft_dirty(pte) (!!(pte_val(pte) & PTE_SOFT_DIRTY))
#define pte_tagged(pte) ((pte_val(pte) & PTE_ATTRINDX_MASK) == \
PTE_ATTRINDX(MT_NORMAL_TAGGED))

@@ -206,7 +207,7 @@ static inline pte_t pte_mkclean(pte_t pte)
return pte;
}

-static inline pte_t pte_mkdirty(pte_t pte)
+static inline pte_t __pte_mkdirty(pte_t pte)
{
pte = set_pte_bit(pte, __pgprot(PTE_DIRTY));

@@ -216,6 +217,11 @@ static inline pte_t pte_mkdirty(pte_t pte)
return pte;
}

+static inline pte_t pte_mkdirty(pte_t pte)
+{
+ return __pte_mkdirty(set_pte_bit(pte, __pgprot(PTE_SOFT_DIRTY)));
+}
+
static inline pte_t pte_wrprotect(pte_t pte)
{
/*
@@ -299,6 +305,16 @@ static inline pte_t pte_clear_uffd_wp(pte_t pte)
}
#endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */

+static inline pte_t pte_mksoft_dirty(pte_t pte)
+{
+ return set_pte_bit(pte, __pgprot(PTE_SOFT_DIRTY));
+}
+
+static inline pte_t pte_clear_soft_dirty(pte_t pte)
+{
+ return clear_pte_bit(pte, __pgprot(PTE_SOFT_DIRTY));
+}
+
static inline void __set_pte(pte_t *ptep, pte_t pte)
{
WRITE_ONCE(*ptep, pte);
@@ -508,6 +524,21 @@ static inline pte_t pte_swp_clear_uffd_wp(pte_t pte)
}
#endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */

+static inline pte_t pte_swp_mksoft_dirty(pte_t pte)
+{
+ return set_pte_bit(pte, __pgprot(PTE_SWP_SOFT_DIRTY));
+}
+
+static inline bool pte_swp_soft_dirty(pte_t pte)
+{
+ return !!(pte_val(pte) & PTE_SWP_SOFT_DIRTY);
+}
+
+static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
+{
+ return clear_pte_bit(pte, __pgprot(PTE_SWP_SOFT_DIRTY));
+}
+
#ifdef CONFIG_NUMA_BALANCING
/*
* See the comment in include/linux/pgtable.h
@@ -562,6 +593,15 @@ static inline int pmd_trans_huge(pmd_t pmd)
#define pmd_swp_clear_uffd_wp(pmd) \
pte_pmd(pte_swp_clear_uffd_wp(pmd_pte(pmd)))
#endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */
+#define pmd_soft_dirty(pmd) pte_soft_dirty(pmd_pte(pmd))
+#define pmd_mksoft_dirty(pmd) pte_pmd(pte_mksoft_dirty(pmd_pte(pmd)))
+#define pmd_clear_soft_dirty(pmd) \
+ pte_pmd(pte_clear_soft_dirty(pmd_pte(pmd)))
+#define pmd_swp_soft_dirty(pmd) pte_swp_soft_dirty(pmd_pte(pmd))
+#define pmd_swp_mksoft_dirty(pmd) \
+ pte_pmd(pte_swp_mksoft_dirty(pmd_pte(pmd)))
+#define pmd_swp_clear_soft_dirty(pmd) \
+ pte_pmd(pte_swp_clear_soft_dirty(pmd_pte(pmd)))

static inline pmd_t pmd_mkinvalid(pmd_t pmd)
{
@@ -1093,7 +1133,7 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
* dirtiness again.
*/
if (pte_sw_dirty(pte))
- pte = pte_mkdirty(pte);
+ pte = __pte_mkdirty(pte);
return pte;
}

@@ -1228,7 +1268,7 @@ static inline pte_t __get_and_clear_full_ptes(struct mm_struct *mm,
addr += PAGE_SIZE;
tmp_pte = __ptep_get_and_clear(mm, addr, ptep);
if (pte_dirty(tmp_pte))
- pte = pte_mkdirty(pte);
+ pte = __pte_mkdirty(pte);
if (pte_young(tmp_pte))
pte = pte_mkyoung(pte);
}
@@ -1307,6 +1347,7 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
* bit 54: PTE_PROT_NONE (overlays PTE_UXN) (must be zero)
* bits 55-59: swap type
* bit 60: PMD_PRESENT_INVALID (must be zero)
+ * bit 61: remember soft-dirty state
*/
#define __SWP_TYPE_SHIFT 55
#define __SWP_TYPE_BITS 5
diff --git a/arch/arm64/mm/contpte.c b/arch/arm64/mm/contpte.c
index 1b64b4c3f8bf..c6f52fcf5d9a 100644
--- a/arch/arm64/mm/contpte.c
+++ b/arch/arm64/mm/contpte.c
@@ -62,7 +62,7 @@ static void contpte_convert(struct mm_struct *mm, unsigned long addr,
pte_t ptent = __ptep_get_and_clear(mm, addr, ptep);

if (pte_dirty(ptent))
- pte = pte_mkdirty(pte);
+ pte = __pte_mkdirty(pte);

if (pte_young(ptent))
pte = pte_mkyoung(pte);
@@ -170,7 +170,7 @@ pte_t contpte_ptep_get(pte_t *ptep, pte_t orig_pte)
pte = __ptep_get(ptep);

if (pte_dirty(pte))
- orig_pte = pte_mkdirty(orig_pte);
+ orig_pte = __pte_mkdirty(orig_pte);

if (pte_young(pte))
orig_pte = pte_mkyoung(orig_pte);
@@ -227,7 +227,7 @@ pte_t contpte_ptep_get_lockless(pte_t *orig_ptep)
goto retry;

if (pte_dirty(pte))
- orig_pte = pte_mkdirty(orig_pte);
+ orig_pte = __pte_mkdirty(orig_pte);

if (pte_young(pte))
orig_pte = pte_mkyoung(orig_pte);
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 8251e2fea9c7..678171fd88bd 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -220,7 +220,8 @@ int __ptep_set_access_flags(struct vm_area_struct *vma,
return 0;

/* only preserve the access flags and write permission */
- pte_val(entry) &= PTE_RDONLY | PTE_AF | PTE_WRITE | PTE_DIRTY;
+ pte_val(entry) &= PTE_RDONLY | PTE_AF | PTE_WRITE |
+ PTE_DIRTY | PTE_SOFT_DIRTY;

/*
* Setting the flags must be done atomically to avoid racing with the
diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 0f0e10bb0a95..4605eb146a2f 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -155,7 +155,7 @@ pte_t huge_ptep_get(pte_t *ptep)
pte_t pte = __ptep_get(ptep);

if (pte_dirty(pte))
- orig_pte = pte_mkdirty(orig_pte);
+ orig_pte = __pte_mkdirty(orig_pte);

if (pte_young(pte))
orig_pte = pte_mkyoung(orig_pte);
@@ -189,7 +189,7 @@ static pte_t get_clear_contig(struct mm_struct *mm,
* so check them all.
*/
if (pte_dirty(pte))
- orig_pte = pte_mkdirty(orig_pte);
+ orig_pte = __pte_mkdirty(orig_pte);

if (pte_young(pte))
orig_pte = pte_mkyoung(orig_pte);
@@ -464,7 +464,7 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,

/* Make sure we don't lose the dirty or young state */
if (pte_dirty(orig_pte))
- pte = pte_mkdirty(pte);
+ pte = __pte_mkdirty(pte);

if (pte_young(orig_pte))
pte = pte_mkyoung(pte);
--
2.25.1


2024-04-19 07:44:43

by Ryan Roberts

Subject: [RFC PATCH v1 4/5] selftests/mm: Enable soft-dirty tests on arm64

Now that arm64 supports soft-dirty tracking, let's enable the tests, which
were previously disabled for arm64 to reduce noise.

This reverts commit f6dd4e223d87 ("selftests/mm: skip soft-dirty tests
on arm64").

Signed-off-by: Ryan Roberts <[email protected]>
---
tools/testing/selftests/mm/Makefile | 5 +----
tools/testing/selftests/mm/madv_populate.c | 26 ++--------------------
tools/testing/selftests/mm/run_vmtests.sh | 5 +----
3 files changed, 4 insertions(+), 32 deletions(-)

diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index eb5f39a2668b..7f1a6ad09534 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -65,6 +65,7 @@ TEST_GEN_FILES += thuge-gen
TEST_GEN_FILES += transhuge-stress
TEST_GEN_FILES += uffd-stress
TEST_GEN_FILES += uffd-unit-tests
+TEST_GEN_FILES += soft-dirty
TEST_GEN_FILES += split_huge_page_test
TEST_GEN_FILES += ksm_tests
TEST_GEN_FILES += ksm_functional_tests
@@ -72,10 +73,6 @@ TEST_GEN_FILES += mdwe_test
TEST_GEN_FILES += hugetlb_fault_after_madv
TEST_GEN_FILES += hugetlb_madv_vs_map

-ifneq ($(ARCH),arm64)
-TEST_GEN_FILES += soft-dirty
-endif
-
ifeq ($(ARCH),x86_64)
CAN_BUILD_I386 := $(shell ./../x86/check_cc.sh "$(CC)" ../x86/trivial_32bit_program.c -m32)
CAN_BUILD_X86_64 := $(shell ./../x86/check_cc.sh "$(CC)" ../x86/trivial_64bit_program.c)
diff --git a/tools/testing/selftests/mm/madv_populate.c b/tools/testing/selftests/mm/madv_populate.c
index 17bcb07f19f3..60547245e479 100644
--- a/tools/testing/selftests/mm/madv_populate.c
+++ b/tools/testing/selftests/mm/madv_populate.c
@@ -264,35 +264,14 @@ static void test_softdirty(void)
munmap(addr, SIZE);
}

-static int system_has_softdirty(void)
-{
- /*
- * There is no way to check if the kernel supports soft-dirty, other
- * than by writing to a page and seeing if the bit was set. But the
- * tests are intended to check that the bit gets set when it should, so
- * doing that check would turn a potentially legitimate fail into a
- * skip. Fortunately, we know for sure that arm64 does not support
- * soft-dirty. So for now, let's just use the arch as a corse guide.
- */
-#if defined(__aarch64__)
- return 0;
-#else
- return 1;
-#endif
-}
-
int main(int argc, char **argv)
{
- int nr_tests = 16;
int err;

pagesize = getpagesize();

- if (system_has_softdirty())
- nr_tests += 5;
-
ksft_print_header();
- ksft_set_plan(nr_tests);
+ ksft_set_plan(21);

sense_support();
test_prot_read();
@@ -300,8 +279,7 @@ int main(int argc, char **argv)
test_holes();
test_populate_read();
test_populate_write();
- if (system_has_softdirty())
- test_softdirty();
+ test_softdirty();

err = ksft_get_fail_cnt();
if (err)
diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh
index c2c542fe7b17..29806d352c73 100755
--- a/tools/testing/selftests/mm/run_vmtests.sh
+++ b/tools/testing/selftests/mm/run_vmtests.sh
@@ -395,10 +395,7 @@ then
CATEGORY="pkey" run_test ./protection_keys_64
fi

-if [ -x ./soft-dirty ]
-then
- CATEGORY="soft_dirty" run_test ./soft-dirty
-fi
+CATEGORY="soft_dirty" run_test ./soft-dirty

CATEGORY="pagemap" run_test ./pagemap_ioctl

--
2.25.1


2024-04-19 07:44:45

by Ryan Roberts

Subject: [PATCH v1 1/5] arm64/mm: Move PTE_PROT_NONE and PMD_PRESENT_INVALID

Previously PTE_PROT_NONE was occupying bit 58, one of the bits reserved
for SW use when the PTE is valid. This is a waste of those precious SW
bits since PTE_PROT_NONE can only ever be set when valid is clear.
Instead let's overlay it on what would be a HW bit if valid was set.

We need to be careful about which HW bit to choose since some of them
must be preserved; when pte_present() is true (as it is for a
PTE_PROT_NONE pte), it is legitimate for the core to call various
accessors, e.g. pte_dirty(), pte_write() etc. There are also some
accessors that are private to the arch which must continue to be
honoured, e.g. pte_user(), pte_user_exec() etc.

So we choose to overlay PTE_UXN. This effectively means that whenever a
pte has PTE_PROT_NONE set, it will always report pte_user_exec() ==
false, which is obviously always correct.

As a result of this change, we must shuffle the layout of the
arch-specific swap pte so that PTE_PROT_NONE is always zero and not
overlapping with any other field. That in turn leaves no way to keep
the `type` field contiguous without conflicting with
PMD_PRESENT_INVALID (bit 59), which must also be 0 for a swap pte. So
let's move PMD_PRESENT_INVALID to bit 60.

In the end, this frees up bit 58 for future use as a proper SW bit (e.g.
soft-dirty or uffd-wp).
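
For illustration, the new swap pte layout can be modelled in userspace
roughly as below. The shifts and widths are copied from this patch; the
model_* and SWP_MUST_BE_ZERO names are made up for the sketch, and the
range asserts are an addition that the kernel macros do not have:

```c
#include <assert.h>
#include <stdint.h>

/* Field positions copied from the new layout in this patch. */
#define SWP_TYPE_SHIFT		55
#define SWP_TYPE_BITS		5
#define SWP_TYPE_MASK		((UINT64_C(1) << SWP_TYPE_BITS) - 1)
#define SWP_OFFSET_SHIFT	4
#define SWP_OFFSET_BITS		50
#define SWP_OFFSET_MASK		((UINT64_C(1) << SWP_OFFSET_BITS) - 1)

/*
 * Bits that must stay clear in a swap pte: bits 0-1 (present),
 * bit 54 (PTE_PROT_NONE, overlaying PTE_UXN) and bit 60
 * (PMD_PRESENT_INVALID).
 */
#define SWP_MUST_BE_ZERO	(UINT64_C(3) | (UINT64_C(1) << 54) | \
				 (UINT64_C(1) << 60))

/* Pack a swap type and offset into a 64-bit swap entry value. */
static inline uint64_t model_swp_entry(uint64_t type, uint64_t offset)
{
	assert(type <= SWP_TYPE_MASK);
	assert(offset <= SWP_OFFSET_MASK);
	return (type << SWP_TYPE_SHIFT) | (offset << SWP_OFFSET_SHIFT);
}

/* Extract the swap type (bits 55-59). */
static inline uint64_t model_swp_type(uint64_t val)
{
	return (val >> SWP_TYPE_SHIFT) & SWP_TYPE_MASK;
}

/* Extract the swap offset (bits 4-53). */
static inline uint64_t model_swp_offset(uint64_t val)
{
	return (val >> SWP_OFFSET_SHIFT) & SWP_OFFSET_MASK;
}
```

Even with the largest type and offset, model_swp_entry() leaves every
bit in SWP_MUST_BE_ZERO clear, so an encoded entry can never look
present, PROT_NONE or PMD_PRESENT_INVALID.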

Signed-off-by: Ryan Roberts <[email protected]>
---
arch/arm64/include/asm/pgtable-prot.h | 4 ++--
arch/arm64/include/asm/pgtable.h | 16 +++++++++-------
2 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
index dd9ee67d1d87..ef952d69fd04 100644
--- a/arch/arm64/include/asm/pgtable-prot.h
+++ b/arch/arm64/include/asm/pgtable-prot.h
@@ -18,14 +18,14 @@
#define PTE_DIRTY (_AT(pteval_t, 1) << 55)
#define PTE_SPECIAL (_AT(pteval_t, 1) << 56)
#define PTE_DEVMAP (_AT(pteval_t, 1) << 57)
-#define PTE_PROT_NONE (_AT(pteval_t, 1) << 58) /* only when !PTE_VALID */
+#define PTE_PROT_NONE (PTE_UXN) /* Reuse PTE_UXN; only when !PTE_VALID */

/*
* This bit indicates that the entry is present i.e. pmd_page()
* still points to a valid huge page in memory even if the pmd
* has been invalidated.
*/
-#define PMD_PRESENT_INVALID (_AT(pteval_t, 1) << 59) /* only when !PMD_SECT_VALID */
+#define PMD_PRESENT_INVALID (_AT(pteval_t, 1) << 60) /* only when !PMD_SECT_VALID */

#define _PROT_DEFAULT (PTE_TYPE_PAGE | PTE_AF | PTE_SHARED)
#define _PROT_SECT_DEFAULT (PMD_TYPE_SECT | PMD_SECT_AF | PMD_SECT_S)
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index afdd56d26ad7..23aabff4fa6f 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -1248,20 +1248,22 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
* Encode and decode a swap entry:
* bits 0-1: present (must be zero)
* bits 2: remember PG_anon_exclusive
- * bits 3-7: swap type
- * bits 8-57: swap offset
- * bit 58: PTE_PROT_NONE (must be zero)
+ * bits 4-53: swap offset
+ * bit 54: PTE_PROT_NONE (overlays PTE_UXN) (must be zero)
+ * bits 55-59: swap type
+ * bit 60: PMD_PRESENT_INVALID (must be zero)
*/
-#define __SWP_TYPE_SHIFT 3
+#define __SWP_TYPE_SHIFT 55
#define __SWP_TYPE_BITS 5
-#define __SWP_OFFSET_BITS 50
#define __SWP_TYPE_MASK ((1 << __SWP_TYPE_BITS) - 1)
-#define __SWP_OFFSET_SHIFT (__SWP_TYPE_BITS + __SWP_TYPE_SHIFT)
+#define __SWP_OFFSET_SHIFT 4
+#define __SWP_OFFSET_BITS 50
#define __SWP_OFFSET_MASK ((1UL << __SWP_OFFSET_BITS) - 1)

#define __swp_type(x) (((x).val >> __SWP_TYPE_SHIFT) & __SWP_TYPE_MASK)
#define __swp_offset(x) (((x).val >> __SWP_OFFSET_SHIFT) & __SWP_OFFSET_MASK)
-#define __swp_entry(type,offset) ((swp_entry_t) { ((type) << __SWP_TYPE_SHIFT) | ((offset) << __SWP_OFFSET_SHIFT) })
+#define __swp_entry(type, offset) ((swp_entry_t) { ((unsigned long)(type) << __SWP_TYPE_SHIFT) | \
+ ((unsigned long)(offset) << __SWP_OFFSET_SHIFT) })

#define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val(pte) })
#define __swp_entry_to_pte(swp) ((pte_t) { (swp).val })
--
2.25.1


2024-04-19 07:44:47

by Ryan Roberts

Subject: [PATCH v1 2/5] arm64/mm: Add uffd write-protect support

Let's use the newly-free PTE SW bit (58) to add support for uffd-wp.

The standard handlers are implemented for set/test/clear for both pte
and pmd. Additionally we must also track the uffd-wp state as a pte swp
bit, so use a free swap entry pte bit (3).
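
As a rough illustration of why a separate swp bit is needed: bit 58 lies
inside the swap type field (bits 55-59) of a swap pte, so the state has
to be carried in bit 3 while the page is swapped out. The sketch below
is a userspace model only; the model_* helpers are hypothetical
stand-ins for what core-mm does via pte_swp_mkuffd_wp() and
pte_mkuffd_wp(), and it ignores the write-protection that
pte_mkuffd_wp() also applies:

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t pteval_t;

#define PTE_UFFD_WP		(UINT64_C(1) << 58)	/* present ptes */
#define PTE_SWP_UFFD_WP		(UINT64_C(1) << 3)	/* swap ptes only */

/* Swap pte fields from patch 1: type in bits 55-59, offset in 4-53. */
#define SWP_TYPE_FIELD		(UINT64_C(0x1f) << 55)
#define SWP_OFFSET_FIELD	(((UINT64_C(1) << 50) - 1) << 4)

/* Bit 58 would corrupt the swap type, bit 3 collides with nothing. */
_Static_assert((PTE_UFFD_WP & SWP_TYPE_FIELD) != 0,
	       "bit 58 overlaps the swap type field");
_Static_assert((PTE_SWP_UFFD_WP & (SWP_TYPE_FIELD | SWP_OFFSET_FIELD)) == 0,
	       "bit 3 is free in a swap pte");

/* Swap-out: carry the uffd-wp state from the present pte to the swap pte. */
static inline pteval_t model_swap_out(pteval_t present_pte, pteval_t swp_pte)
{
	if (present_pte & PTE_UFFD_WP)
		swp_pte |= PTE_SWP_UFFD_WP;
	return swp_pte;
}

/* Swap-in: restore the uffd-wp state onto the newly built present pte. */
static inline pteval_t model_swap_in(pteval_t swp_pte, pteval_t new_pte)
{
	if (swp_pte & PTE_SWP_UFFD_WP)
		new_pte |= PTE_UFFD_WP;
	return new_pte;
}
```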

Signed-off-by: Ryan Roberts <[email protected]>
---
arch/arm64/Kconfig | 1 +
arch/arm64/include/asm/pgtable-prot.h | 8 ++++
arch/arm64/include/asm/pgtable.h | 55 +++++++++++++++++++++++++++
3 files changed, 64 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 7b11c98b3e84..763e221f2169 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -255,6 +255,7 @@ config ARM64
select SYSCTL_EXCEPTION_TRACE
select THREAD_INFO_IN_TASK
select HAVE_ARCH_USERFAULTFD_MINOR if USERFAULTFD
+ select HAVE_ARCH_USERFAULTFD_WP if USERFAULTFD
select TRACE_IRQFLAGS_SUPPORT
select TRACE_IRQFLAGS_NMI_SUPPORT
select HAVE_SOFTIRQ_ON_OWN_STACK
diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
index ef952d69fd04..f1e1f6306e03 100644
--- a/arch/arm64/include/asm/pgtable-prot.h
+++ b/arch/arm64/include/asm/pgtable-prot.h
@@ -20,6 +20,14 @@
#define PTE_DEVMAP (_AT(pteval_t, 1) << 57)
#define PTE_PROT_NONE (PTE_UXN) /* Reuse PTE_UXN; only when !PTE_VALID */

+#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP
+#define PTE_UFFD_WP (_AT(pteval_t, 1) << 58) /* uffd-wp tracking */
+#define PTE_SWP_UFFD_WP (_AT(pteval_t, 1) << 3) /* only for swp ptes */
+#else
+#define PTE_UFFD_WP (_AT(pteval_t, 0))
+#define PTE_SWP_UFFD_WP (_AT(pteval_t, 0))
+#endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */
+
/*
* This bit indicates that the entry is present i.e. pmd_page()
* still points to a valid huge page in memory even if the pmd
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 23aabff4fa6f..3f4748741fdb 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -271,6 +271,34 @@ static inline pte_t pte_mkdevmap(pte_t pte)
return set_pte_bit(pte, __pgprot(PTE_DEVMAP | PTE_SPECIAL));
}

+#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP
+static inline int pte_uffd_wp(pte_t pte)
+{
+ bool wp = !!(pte_val(pte) & PTE_UFFD_WP);
+
+#ifdef CONFIG_DEBUG_VM
+ /*
+ * Having write bit for wr-protect-marked present ptes is fatal, because
+ * it means the uffd-wp bit will be ignored and write will just go
+ * through. See comment in x86 implementation.
+ */
+ WARN_ON_ONCE(wp && pte_write(pte));
+#endif
+
+ return wp;
+}
+
+static inline pte_t pte_mkuffd_wp(pte_t pte)
+{
+ return pte_wrprotect(set_pte_bit(pte, __pgprot(PTE_UFFD_WP)));
+}
+
+static inline pte_t pte_clear_uffd_wp(pte_t pte)
+{
+ return clear_pte_bit(pte, __pgprot(PTE_UFFD_WP));
+}
+#endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */
+
static inline void __set_pte(pte_t *ptep, pte_t pte)
{
WRITE_ONCE(*ptep, pte);
@@ -463,6 +491,23 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte)
return clear_pte_bit(pte, __pgprot(PTE_SWP_EXCLUSIVE));
}

+#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP
+static inline pte_t pte_swp_mkuffd_wp(pte_t pte)
+{
+ return set_pte_bit(pte, __pgprot(PTE_SWP_UFFD_WP));
+}
+
+static inline int pte_swp_uffd_wp(pte_t pte)
+{
+ return !!(pte_val(pte) & PTE_SWP_UFFD_WP);
+}
+
+static inline pte_t pte_swp_clear_uffd_wp(pte_t pte)
+{
+ return clear_pte_bit(pte, __pgprot(PTE_SWP_UFFD_WP));
+}
+#endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */
+
#ifdef CONFIG_NUMA_BALANCING
/*
* See the comment in include/linux/pgtable.h
@@ -508,6 +553,15 @@ static inline int pmd_trans_huge(pmd_t pmd)
#define pmd_mkclean(pmd) pte_pmd(pte_mkclean(pmd_pte(pmd)))
#define pmd_mkdirty(pmd) pte_pmd(pte_mkdirty(pmd_pte(pmd)))
#define pmd_mkyoung(pmd) pte_pmd(pte_mkyoung(pmd_pte(pmd)))
+#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP
+#define pmd_uffd_wp(pmd) pte_uffd_wp(pmd_pte(pmd))
+#define pmd_mkuffd_wp(pmd) pte_pmd(pte_mkuffd_wp(pmd_pte(pmd)))
+#define pmd_clear_uffd_wp(pmd) pte_pmd(pte_clear_uffd_wp(pmd_pte(pmd)))
+#define pmd_swp_uffd_wp(pmd) pte_swp_uffd_wp(pmd_pte(pmd))
+#define pmd_swp_mkuffd_wp(pmd) pte_pmd(pte_swp_mkuffd_wp(pmd_pte(pmd)))
+#define pmd_swp_clear_uffd_wp(pmd) \
+ pte_pmd(pte_swp_clear_uffd_wp(pmd_pte(pmd)))
+#endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */

static inline pmd_t pmd_mkinvalid(pmd_t pmd)
{
@@ -1248,6 +1302,7 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
* Encode and decode a swap entry:
* bits 0-1: present (must be zero)
* bits 2: remember PG_anon_exclusive
+ * bit 3: remember uffd-wp state
* bits 4-53: swap offset
* bit 54: PTE_PROT_NONE (overlays PTE_UXN) (must be zero)
* bits 55-59: swap type
--
2.25.1


2024-04-19 07:45:12

by Ryan Roberts

Subject: [PATCH v1 5/5] selftests/mm: soft-dirty should fail if a testcase fails

Previously soft-dirty was unconditionally exiting with success, even if
one of its testcases failed. Let's fix that so that failure can be
reported to automated systems properly.
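
For reference, my understanding of ksft_finished() in kselftest.h is
that it exits with a pass status only when every planned test passed,
xfailed or was skipped. The sketch below models that decision in plain
C; the struct, enum and function names are hypothetical, and the header
remains the authority on the exact behaviour:

```c
#include <assert.h>

/* Hypothetical stand-ins for the kselftest counters and exit codes. */
struct model_ksft_counts {
	unsigned int plan;	/* number of tests announced up front */
	unsigned int pass;
	unsigned int xfail;	/* expected failures */
	unsigned int xskip;	/* skipped tests */
};

enum { MODEL_KSFT_PASS = 0, MODEL_KSFT_FAIL = 1 };

/* Exit status the test would report once all testcases have run. */
static inline int model_finished_status(const struct model_ksft_counts *c)
{
	if (c->plan == c->pass + c->xfail + c->xskip)
		return MODEL_KSFT_PASS;
	return MODEL_KSFT_FAIL;
}
```

With an unconditional pass, a run with a plan of 5 and only 4 passes
would still exit 0; under this model it exits non-zero.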

Signed-off-by: Ryan Roberts <[email protected]>
---
tools/testing/selftests/mm/soft-dirty.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/mm/soft-dirty.c b/tools/testing/selftests/mm/soft-dirty.c
index 7dbfa53d93a0..bdfa5d085f00 100644
--- a/tools/testing/selftests/mm/soft-dirty.c
+++ b/tools/testing/selftests/mm/soft-dirty.c
@@ -209,5 +209,5 @@ int main(int argc, char **argv)

close(pagemap_fd);

- return ksft_exit_pass();
+ ksft_finished();
}
--
2.25.1


2024-04-19 07:47:25

by Ryan Roberts

Subject: Re: [PATCH v1 0/5] arm64/mm: uffd write-protect and soft-dirty tracking

On 19/04/2024 08:43, Ryan Roberts wrote:
> Hi All,
>
> This series adds uffd write-protect and soft-dirty tracking support for arm64. I
> consider the soft-dirty support (patches 3 and 4) as RFC - see rationale below.
>
> Previous attempts to add these features have failed because of a perceived lack
> of available PTE SW bits. However it actually turns out that there are 2
> available but they are hidden. PTE_PROT_NONE was previously occupying a SW bit,
> but it only applies when PTE_VALID is clear, so this is moved to overlay PTE_UXN
> in patch 1, freeing up the SW bit. Bit 63 is marked as "IGNORED" in the Arm ARM,
> but it does not currently indicate "reserved for SW use" like it does for the
> other SW bits. I've confirmed with the spec owner that this is an oversight; the
> bit is intended to be reserved for SW use and the spec will clarify this in a
> future update.
>
> So we have our two bits; patch 2 enables uffd-wp, patch 3 enables soft-dirty and
> patches 4 and 5 sort out the selftests so that the soft-dirty tests are compiled
> for, and run on arm64.
>
> That said, these are the last 2 SW bits and we may want to keep 1 bit in reserve
> for future use. soft-dirty is only used for CRIU to my knowledge, and it is
> thought that their use case could be solved with the more generic uffd-wp. So
> unless somebody makes a clear case for the inclusion of soft-dirty support, we
> are probably better off dropping patches 3 and 4 and keeping bit 63 for future
> use. Although note that the most recent attempt to add soft-dirty for arm64 was
> last month [1] so I'd like to give Shivansh Vij the opportunity to make the
> case.

Ugh, forgot to mention that this applies on top of v6.9-rc3, and all the uffd-wp
and soft-dirty tests in the mm selftests suite run and pass. And no regressions
are observed in any of the other selftests.


>
> ---8<---
> As an appendix, I've also experimented with adding an "extended SW bits" region
> linked by the `struct ptdesc` (which you can always find from the `pte_t *`). If
> demonstrated to work, this would act as an insurance policy in case we ever need
> more SW bits in future, giving us confidence to merge soft-dirty now.
> Unfortunately this approach suffers from 2 problems: 1) it's slow; my fork()
> microbenchmark takes 40% longer in the worst case. 2) it is not possible to read
> the HW pte and the extended SW bits atomically so it is impossible to implement
> ptep_get_lockless() in its current form. So I've abandoned this experiment. (I
> can provide more details if there is interest).
> ---8<---
>
> [1] https://lore.kernel.org/linux-arm-kernel/MW4PR12MB687563EFB56373E8D55DDEABB92B2@MW4PR12MB6875.namprd12.prod.outlook.com/
>
> Thanks,
> Ryan
>
>
> Ryan Roberts (5):
> arm64/mm: Move PTE_PROT_NONE and PMD_PRESENT_INVALID
> arm64/mm: Add uffd write-protect support
> arm64/mm: Add soft-dirty page tracking support
> selftests/mm: Enable soft-dirty tests on arm64
> selftests/mm: soft-dirty should fail if a testcase fails
>
> arch/arm64/Kconfig | 2 +
> arch/arm64/include/asm/pgtable-prot.h | 20 +++-
> arch/arm64/include/asm/pgtable.h | 118 +++++++++++++++++++--
> arch/arm64/mm/contpte.c | 6 +-
> arch/arm64/mm/fault.c | 3 +-
> arch/arm64/mm/hugetlbpage.c | 6 +-
> tools/testing/selftests/mm/Makefile | 5 +-
> tools/testing/selftests/mm/madv_populate.c | 26 +----
> tools/testing/selftests/mm/run_vmtests.sh | 5 +-
> tools/testing/selftests/mm/soft-dirty.c | 2 +-
> 10 files changed, 141 insertions(+), 52 deletions(-)
>
> --
> 2.25.1
>


2024-04-19 08:33:35

by Shivansh Vij

Subject: Re: [PATCH v1 0/5] arm64/mm: uffd write-protect and soft-dirty tracking

(Sorry about the previous HTML email, accidentally used the wrong email client)

Hey All,

>On 19/04/2024 08:43, Ryan Roberts wrote:
>> Hi All,
>>
>> This series adds uffd write-protect and soft-dirty tracking support for arm64. I
>> consider the soft-dirty support (patches 3 and 4) as RFC - see rationale below.
>>
>> Previous attempts to add these features have failed because of a perceived lack
>> of available PTE SW bits. However it actually turns out that there are 2
>> available but they are hidden. PTE_PROT_NONE was previously occupying a SW bit,
>> but it only applies when PTE_VALID is clear, so this is moved to overlay PTE_UXN
>> in patch 1, freeing up the SW bit. Bit 63 is marked as "IGNORED" in the Arm ARM,
>> but it does not currently indicate "reserved for SW use" like it does for the
>> other SW bits. I've confirmed with the spec owner that this is an oversight; the
>> bit is intended to be reserved for SW use and the spec will clarify this in a
>> future update.
>>
>> So we have our two bits; patch 2 enables uffd-wp, patch 3 enables soft-dirty and
>> patches 4 and 5 sort out the selftests so that the soft-dirty tests are compiled
>> for, and run on arm64.
>>
>> That said, these are the last 2 SW bits and we may want to keep 1 bit in reserve
>> for future use. soft-dirty is only used for CRIU to my knowledge, and it is
>> thought that their use case could be solved with the more generic uffd-wp. So
>> unless somebody makes a clear case for the inclusion of soft-dirty support, we
>> are probably better off dropping patches 3 and 4 and keeping bit 63 for future
>> use. Although note that the most recent attempt to add soft-dirty for arm64 was
>> last month [1] so I'd like to give Shivansh Vij the opportunity to make the
>> case.
>
> Ugh, forgot to mention that this applies on top of v6.9-rc3, and all the uffd-wp
> and soft-dirty tests in the mm selftests suite run and pass. And no regressions
> are observed in any of the other selftests.

Appreciate the opportunity to provide input here.

I personally don't know of any applications other than CRIU that make heavy use of soft-dirty, and my use case is specifically focused on adding live-migration support to CRIU on ARM.

Cloud providers like AWS have pretty massive discounts for ARM-based spot instances (90% last time I checked), and having live-migration in CRIU would allow more applications to take advantage of that.

As Ryan mentioned, there are two ways to achieve this - add soft-dirty tracking to arm64 (patches 3 and 4), or tear out the existing dirty tracking code in CRIU and replace it with uffd-wp.

I picked option one (dirty tracking in arm) because it seems to be the simplest way to move forward, whereas it would be a relatively heavy effort to add uffd-wp support to CRIU.
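
For context, userspace consumes soft-dirty through procfs: bit 55 of each 64-bit /proc/PID/pagemap entry reports the soft-dirty state of a page, and writing "4" to /proc/PID/clear_refs clears it for the whole task (see Documentation/admin-guide/mm/soft-dirty.rst). Below is a minimal sketch of the read side only. The helper names are hypothetical, and this is not a claim about how CRIU structures its code:

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>

/* Bit 55 of a pagemap entry is the soft-dirty flag. */
#define PM_SOFT_DIRTY	(UINT64_C(1) << 55)

/* Decode the soft-dirty flag from one raw pagemap entry. */
static inline bool pagemap_entry_soft_dirty(uint64_t entry)
{
	return (entry & PM_SOFT_DIRTY) != 0;
}

/* Byte offset of the pagemap entry for the page containing vaddr. */
static inline uint64_t pagemap_offset(uint64_t vaddr, uint64_t page_size)
{
	return (vaddr / page_size) * sizeof(uint64_t);
}

/* Returns 1 if the page at addr is soft-dirty, 0 if not, -1 on error. */
int page_is_soft_dirty(const void *addr)
{
	uint64_t entry;
	ssize_t n;
	long page_size = sysconf(_SC_PAGESIZE);
	int fd = open("/proc/self/pagemap", O_RDONLY);

	if (fd < 0 || page_size <= 0) {
		if (fd >= 0)
			close(fd);
		return -1;
	}

	n = pread(fd, &entry, sizeof(entry),
		  (off_t)pagemap_offset((uintptr_t)addr, (uint64_t)page_size));
	close(fd);
	if (n != (ssize_t)sizeof(entry))
		return -1;

	return pagemap_entry_soft_dirty(entry) ? 1 : 0;
}
```

A pre-dump pass built on this would clear the bits, let the task run, then rescan pagemap and copy only the pages that report soft-dirty again.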

From a performance perspective I am also a little worried that uffd will be slower than just tracking the dirty bits asynchronously with sw dirty, but maybe that's not as much of a concern with the addition of uffd-wp async.

With all this being said, I'll defer to the wisdom of the crowd about which approach makes more sense - after all, with this patch we should get uffd-wp support on arm so at least there will be _a_ way forward for CRIU (albeit one requiring slightly more work).
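
For readers comparing the two mechanisms: both are reported to userspace through /proc/<pid>/pagemap, where each 64-bit entry carries the soft-dirty flag in bit 55 and the uffd-wp flag in bit 57 (bit positions as given in the kernel's pagemap documentation). The following is only a minimal sketch of decoding such an entry. The struct and function names are hypothetical and are not taken from CRIU or from the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as documented in Documentation/admin-guide/mm/pagemap.rst. */
#define PM_SOFT_DIRTY (1ULL << 55)	/* pte is soft-dirty */
#define PM_UFFD_WP    (1ULL << 57)	/* pte is uffd-wp write-protected */
#define PM_SWAPPED    (1ULL << 62)	/* page is swapped out */
#define PM_PRESENT    (1ULL << 63)	/* page is present in RAM */

/* Hypothetical helper type for this sketch only. */
struct pm_flags {
	bool soft_dirty;
	bool uffd_wp;
	bool swapped;
	bool present;
};

/* Decode the flag bits of one 64-bit pagemap entry. */
static struct pm_flags pm_decode(uint64_t entry)
{
	struct pm_flags f = {
		.soft_dirty = (entry & PM_SOFT_DIRTY) != 0,
		.uffd_wp    = (entry & PM_UFFD_WP) != 0,
		.swapped    = (entry & PM_SWAPPED) != 0,
		.present    = (entry & PM_PRESENT) != 0,
	};

	return f;
}
```

A soft-dirty based tracker would typically write "4" to /proc/<pid>/clear_refs to clear the soft-dirty bits, let the task run, and then read the pagemap entries and check soft_dirty for each page. A uffd-wp based tracker would check uffd_wp instead, where a cleared bit on a previously protected page indicates a write.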

Thanks,
Shivansh

2024-04-19 09:45:30

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v1 0/5] arm64/mm: uffd write-protect and soft-dirty tracking

On 19.04.24 10:33, Shivansh Vij wrote:
> (Sorry about the previous HTML email, accidentally used the wrong email client)
>
> Hey All,
>
>> On 19/04/2024 08:43, Ryan Roberts wrote:
>>> Hi All,
>>>
>>> This series adds uffd write-protect and soft-dirty tracking support for arm64. I
>>> consider the soft-dirty support (patches 3 and 4) as RFC - see rationale below.
>>>
>>> Previous attempts to add these features have failed because of a perceived lack
>>> of available PTE SW bits. However it actually turns out that there are 2
>>> available but they are hidden. PTE_PROT_NONE was previously occupying a SW bit,
>>> but it only applies when PTE_VALID is clear, so this is moved to overlay PTE_UXN
>>> in patch 1, freeing up the SW bit. Bit 63 is marked as "IGNORED" in the Arm ARM,
>>> but it does not currently indicate "reserved for SW use" like it does for the
>>> other SW bits. I've confirmed with the spec owner that this is an oversight; the
>>> bit is intended to be reserved for SW use and the spec will clarify this in a
>>> future update.
>>>
>>> So we have our two bits; patch 2 enables uffd-wp, patch 3 enables soft-dirty and
>>> patches 4 and 5 sort out the selftests so that the soft-dirty tests are compiled
>>> for, and run on arm64.
>>>
>>> That said, these are the last 2 SW bits and we may want to keep 1 bit in reserve
>>> for future use. soft-dirty is only used for CRIU to my knowledge, and it is
>>> thought that their use case could be solved with the more generic uffd-wp. So
>>> unless somebody makes a clear case for the inclusion of soft-dirty support, we
>>> are probably better off dropping patches 3 and 4 and keeping bit 63 for future
>>> use. Although note that the most recent attempt to add soft-dirty for arm64 was
>>> last month [1] so I'd like to give Shivansh Vij the opportunity to make the
>>> case.
>>
>> Ugh, forgot to mention that this applies on top of v6.9-rc3, and all the uffd-wp
>> and soft-dirty tests in the mm selftests suite run and pass. And no regressions
>> are observed in any of the other selftests.
>
> Appreciate the opportunity to provide input here.
>
> I personally don't know of any applications other than CRIU that make heavy use of soft-dirty, and my use case is specifically focused on adding live-migration support to CRIU on ARM.
>
> Cloud providers like AWS have pretty massive discounts for ARM-based spot instances (90% last time I checked), and having live-migration in CRIU would allow more applications to take advantage of that.
>
> As Ryan mentioned, there are two ways to achieve this - add dirty tracking to ARM (Patch 3/4), or tear out the existing dirty tracking code in CRIU and replace it with uffd-wp.
>
> I picked option one (dirty tracking in arm) because it seems to be the simplest way to move forward, whereas it would be a relatively heavy effort to add uffd-wp support to CRIU.
>
> From a performance perspective I am also a little worried that uffd will be slower than just tracking the dirty bits asynchronously with sw dirty, but maybe that's not as much of a concern with the addition of uffd-wp async.
>
> With all this being said, I'll defer to the wisdom of the crowd about which approach makes more sense - after all, with this patch we should get uffd-wp support on arm so at least there will be _a_ way forward for CRIU (albeit one requiring slightly more work).

Ccing Mike and Peter. In 2017, Mike gave a presentation "Memory tracking
for iterative container migration"[1] at LPC

Some key points are still true I think:
(1) More flexible and robust than soft-dirty
(2) May obsolete soft-dirty

We further recently added a new UFFD_FEATURE_WP_ASYNC feature as part of
[2], because getting soft-dirty return reliable results in some cases
turned out rather hard to fix.

We might still have to optimize that approach for some very sparse large
VMAs, but that should be solvable.

"The major defect of this approach of dirty tracking is we need to
populate the pgtables when tracking starts. Soft-dirty doesn't do it
like that. It's unwanted in the case where the range of memory to track
is huge and unpopulated (e.g., tracking updates on a 10G file with
mmap() on top, without having any page cache installed yet). One way to
improve this is to allow pte markers exist for larger than PTE level
for PMD+. That will not change the interface if to implemented, so we
can leave that for later.")[3]


If we can avoid adding soft-dirty on arm64 that would be great. This
will require work on the CRIU side. One downside of uffd-wp is that it
is currently not as widely available across architectures as soft-dirty.

But I'll throw in another idea: do we really need soft-dirty and uffd-wp
to exist at the same time in the same process (or the VMA?). In theory,
we could have a VMA flag that defines the semantics of the bit and
simply have arch code use a single, abstracted PTE bit. Requires a bit
more work, though, but the benefit would be that architectures that do
support soft-dirty could support uffd-wp.
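
To make the idea concrete, here is a toy model of one shared software PTE bit whose meaning is selected by a per-VMA mode. Everything in it is hypothetical: none of these names exist in the kernel, and a real implementation would have to deal with mode changes, VMA splitting and merging, and swap entries, which this sketch ignores.

```c
#include <stdbool.h>
#include <stdint.h>

/* All names below are hypothetical and exist only in this sketch. */
enum track_mode { TRACK_NONE, TRACK_SOFT_DIRTY, TRACK_UFFD_WP };

#define TOY_PTE_TRACK (1ULL << 0)	/* the single shared software bit */

struct toy_vma {
	enum track_mode mode;	/* defines what TOY_PTE_TRACK means here */
};

/* The bit reads as soft-dirty only in a VMA that selected soft-dirty. */
static bool toy_pte_soft_dirty(const struct toy_vma *vma, uint64_t pte)
{
	return vma->mode == TRACK_SOFT_DIRTY && (pte & TOY_PTE_TRACK);
}

/* The same bit reads as uffd-wp only in a VMA that selected uffd-wp. */
static bool toy_pte_uffd_wp(const struct toy_vma *vma, uint64_t pte)
{
	return vma->mode == TRACK_UFFD_WP && (pte & TOY_PTE_TRACK);
}
```

The point of the sketch is only that the arch code would see one bit, while the generic code would pick the interpretation from the VMA.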


[1]
https://blog.linuxplumbersconf.org/2017/ocw//system/presentations/4724/original/Memory%20tracking%20for%20iterative%20container%20migration.pdf
[2]
https://lore.kernel.org/all/[email protected]/
[3]
https://lore.kernel.org/all/[email protected]/

--
Cheers,

David / dhildenb


2024-04-19 16:31:19

by Mike Rapoport

[permalink] [raw]
Subject: Re: [PATCH v1 0/5] arm64/mm: uffd write-protect and soft-dirty tracking

On Fri, Apr 19, 2024 at 11:45:14AM +0200, David Hildenbrand wrote:
> On 19.04.24 10:33, Shivansh Vij wrote:
> > > On 19/04/2024 08:43, Ryan Roberts wrote:
> > > > Hi All,
> > > >
> > > > This series adds uffd write-protect and soft-dirty tracking support for arm64. I
> > > > consider the soft-dirty support (patches 3 and 4) as RFC - see rationale below.
> > > >
> > > > That said, these are the last 2 SW bits and we may want to keep 1 bit in reserve
> > > > for future use. soft-dirty is only used for CRIU to my knowledge, and it is
> > > > thought that their use case could be solved with the more generic uffd-wp. So
> > > > unless somebody makes a clear case for the inclusion of soft-dirty support, we
> > > > are probably better off dropping patches 3 and 4 and keeping bit 63 for future
> > > > use. Although note that the most recent attempt to add soft-dirty for arm64 was
> > > > last month [1] so I'd like to give Shivansh Vij the opportunity to make the
> > > > case.
> >
> > Appreciate the opportunity to provide input here.
> >
> > I picked option one (dirty tracking in arm) because it seems to be the
> > simplest way to move forward, whereas it would be a relatively heavy
> > effort to add uffd-wp support to CRIU.
> >
> > From a performance perspective I am also a little worried that uffd
> > will be slower than just tracking the dirty bits asynchronously with
> > sw dirty, but maybe that's not as much of a concern with the addition
> > of uffd-wp async.
> >
> > With all this being said, I'll defer to the wisdom of the crowd about
> > which approach makes more sense - after all, with this patch we should
> > get uffd-wp support on arm so at least there will be _a_ way forward
> > for CRIU (albeit one requiring slightly more work).
>
> Ccing Mike and Peter. In 2017, Mike gave a presentation "Memory tracking for
> iterative container migration"[1] at LPC
>
> Some key points are still true I think:
> (1) More flexible and robust than soft-dirty
> (2) May obsolete soft-dirty
>
> We further recently added a new UFFD_FEATURE_WP_ASYNC feature as part of
> [2], because getting soft-dirty return reliable results in some cases turned
> out rather hard to fix.
>
> We might still have to optimize that approach for some very sparse large
> VMAs, but that should be solvable.
>
> "The major defect of this approach of dirty tracking is we need to
> populate the pgtables when tracking starts. Soft-dirty doesn't do it
> like that. It's unwanted in the case where the range of memory to track
> is huge and unpopulated (e.g., tracking updates on a 10G file with
> mmap() on top, without having any page cache installed yet). One way to
> improve this is to allow pte markers exist for larger than PTE level
> for PMD+. That will not change the interface if to implemented, so we
> can leave that for later.")[3]
>
>
> If we can avoid adding soft-dirty on arm64 that would be great. This will
> require work on the CRIU side. One downside of uffd-wp is that it is
> currently not as widely available across architectures as soft-dirty.

Using uffd-wp instead of soft-dirty in CRIU will require quite some work on
CRIU side and probably on the kernel side too.

And as of now we'll anyway have to maintain soft-dirty because powerpc and
s390 don't have uffd-wp.

With UFFD_FEATURE_WP_ASYNC the concern that uffd-wp will be slower than
soft-dirty probably doesn't exist, but we won't know for sure until
somebody will try.

But there were other limitations, the most prominent was checkpointing an
application that uses uffd. If CRIU is to use uffd-wp for tracking of the
dirty pages, there should be some support for multiple uffd contexts for a
VMA and that's surely a lot of work.

> But I'll throw in another idea: do we really need soft-dirty and uffd-wp to
> exist at the same time in the same process (or the VMA?). In theory, we

For instance to have dirty memory tracking in CRIU for an application that
uses uffd-wp :)

> could have a VMA flag that defines the semantics of the bit and simply have
> arch code use a single, abstracted PTE bit. Requires a bit more work,
> though, but the benefit would be that architectures that do support soft-dirty
> could support uffd-wp.
>
> [1] https://blog.linuxplumbersconf.org/2017/ocw//system/presentations/4724/original/Memory%20tracking%20for%20iterative%20container%20migration.pdf
> [2] https://lore.kernel.org/all/[email protected]/
> [3] https://lore.kernel.org/all/[email protected]/
>
> --
> Cheers,
>
> David / dhildenb
>

--
Sincerely yours,
Mike.

2024-04-19 17:13:12

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v1 0/5] arm64/mm: uffd write-protect and soft-dirty tracking

On 19.04.24 18:30, Mike Rapoport wrote:
> On Fri, Apr 19, 2024 at 11:45:14AM +0200, David Hildenbrand wrote:
>> On 19.04.24 10:33, Shivansh Vij wrote:
>>>> On 19/04/2024 08:43, Ryan Roberts wrote:
>>>>> Hi All,
>>>>>
>>>>> This series adds uffd write-protect and soft-dirty tracking support for arm64. I
>>>>> consider the soft-dirty support (patches 3 and 4) as RFC - see rationale below.
>>>>>
>>>>> That said, these are the last 2 SW bits and we may want to keep 1 bit in reserve
>>>>> for future use. soft-dirty is only used for CRIU to my knowledge, and it is
>>>>> thought that their use case could be solved with the more generic uffd-wp. So
>>>>> unless somebody makes a clear case for the inclusion of soft-dirty support, we
>>>>> are probably better off dropping patches 3 and 4 and keeping bit 63 for future
>>>>> use. Although note that the most recent attempt to add soft-dirty for arm64 was
>>>>> last month [1] so I'd like to give Shivansh Vij the opportunity to make the
>>>>> case.
>>>
>>> Appreciate the opportunity to provide input here.
>>>
>>> I picked option one (dirty tracking in arm) because it seems to be the
>>> simplest way to move forward, whereas it would be a relatively heavy
>>> effort to add uffd-wp support to CRIU.
>>>
>>> From a performance perspective I am also a little worried that uffd
>>> will be slower than just tracking the dirty bits asynchronously with
>>> sw dirty, but maybe that's not as much of a concern with the addition
>>> of uffd-wp async.
>>>
>>> With all this being said, I'll defer to the wisdom of the crowd about
>>> which approach makes more sense - after all, with this patch we should
>>> get uffd-wp support on arm so at least there will be _a_ way forward
>>> for CRIU (albeit one requiring slightly more work).
>>
>> Ccing Mike and Peter. In 2017, Mike gave a presentation "Memory tracking for
>> iterative container migration"[1] at LPC
>>
>> Some key points are still true I think:
>> (1) More flexible and robust than soft-dirty
>> (2) May obsolete soft-dirty
>>
>> We further recently added a new UFFD_FEATURE_WP_ASYNC feature as part of
>> [2], because getting soft-dirty return reliable results in some cases turned
>> out rather hard to fix.
>>
>> We might still have to optimize that approach for some very sparse large
>> VMAs, but that should be solvable.
>>
>> "The major defect of this approach of dirty tracking is we need to
>> populate the pgtables when tracking starts. Soft-dirty doesn't do it
>> like that. It's unwanted in the case where the range of memory to track
>> is huge and unpopulated (e.g., tracking updates on a 10G file with
>> mmap() on top, without having any page cache installed yet). One way to
>> improve this is to allow pte markers exist for larger than PTE level
>> for PMD+. That will not change the interface if to implemented, so we
>> can leave that for later.")[3]
>>
>>
>> If we can avoid adding soft-dirty on arm64 that would be great. This will
>> require work on the CRIU side. One downside of uffd-wp is that it is
>> currently not as widely available across architectures as soft-dirty.
>
> Using uffd-wp instead of soft-dirty in CRIU will require quite some work on
> CRIU side and probably on the kernel side too.
>
> And as of now we'll anyway have to maintain soft-dirty because powerpc and
> s390 don't have uffd-wp.
>
> With UFFD_FEATURE_WP_ASYNC the concern that uffd-wp will be slower than
> soft-dirty probably doesn't exist, but we won't know for sure until
> somebody will try.
>
> But there were other limitations, the most prominent was checkpointing an
> application that uses uffd. If CRIU is to use uffd-wp for tracking of the
> dirty pages, there should be some support for multiple uffd contexts for a
> VMA and that's surely a lot of work.

Is it even already supported to checkpoint an application that is using
uffd? Hard to believe, what if the monitor is running in a completely
different process than the one being checkpointed?

Further ... isn't CRIU already using uffd in some cases? The
documentation mentions [1] that it is used for "lazy (or post-copy)
restore in CRIU". At least if the documentation is correct and it's
actually implemented.

[1] https://criu.org/Userfaultfd

>
>> But I'll throw in another idea: do we really need soft-dirty and uffd-wp to
>> exist at the same time in the same process (or the VMA?). In theory, we
>
> For instance to have dirty memory tracking in CRIU for an application that
> uses uffd-wp :)
>

Hah! Not a concern for applications on architectures where uffd-wp does
not exist yet! Well, initially, until these applications exist and make
use of it :P

Also, I'm not sure if CRIU can checkpoint each and every application ...
I suspect one has to draw a line what can be supported and what not.

Case in point: how should CRIU checkpoint an application that is using
softdirty tracking itself? If I'm not missing something important, that
might not work ....

If the answer is "no other application is using soft-dirty tracking",
then it's really a shame we have to carry this baggage (+waste precious
PTE bits) only for one application ...

--
Cheers,

David / dhildenb


2024-04-22 09:33:38

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v1 5/5] selftests/mm: soft-dirty should fail if a testcase fails

On 19.04.24 09:43, Ryan Roberts wrote:
> Previously soft-dirty was unconditionally exiting with success, even if
> one of its testcases failed. Let's fix that so that failure can be
> reported to automated systems properly.
>
> Signed-off-by: Ryan Roberts <[email protected]>
> ---
> tools/testing/selftests/mm/soft-dirty.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/mm/soft-dirty.c b/tools/testing/selftests/mm/soft-dirty.c
> index 7dbfa53d93a0..bdfa5d085f00 100644
> --- a/tools/testing/selftests/mm/soft-dirty.c
> +++ b/tools/testing/selftests/mm/soft-dirty.c
> @@ -209,5 +209,5 @@ int main(int argc, char **argv)
>
> close(pagemap_fd);
>
> - return ksft_exit_pass();
> + ksft_finished();
> }
> --
> 2.25.1
>

Guess that makes sense independent of all the other stuff?

Reviewed-by: David Hildenbrand <[email protected]>
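
For context on why the one-line change matters, here is a simplified model of the two exit paths. It is a sketch under stated assumptions: the struct and function names are hypothetical stand-ins, not the kselftest API, and as far as I understand it the real ksft_finished() also counts expected failures, which this model leaves out. Only the KSFT_PASS and KSFT_FAIL exit codes match the framework.

```c
/* Exit codes used by the kselftest framework. */
#define KSFT_PASS 0
#define KSFT_FAIL 1

/* Hypothetical stand-in for the framework's internal counters. */
struct toy_counts {
	unsigned int plan;	/* number of tests announced up front */
	unsigned int pass;
	unsigned int fail;
	unsigned int skip;
};

/* Old behaviour: report success no matter what the counters say. */
static int toy_exit_pass(const struct toy_counts *c)
{
	(void)c;
	return KSFT_PASS;
}

/* New behaviour (modelled): succeed only if every planned test passed or was skipped. */
static int toy_finished(const struct toy_counts *c)
{
	return (c->pass + c->skip == c->plan) ? KSFT_PASS : KSFT_FAIL;
}
```

With a plan of 3 tests where one fails, the old path would still return 0 to the harness, while the modelled new path returns 1.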

--
Cheers,

David / dhildenb


2024-04-23 08:24:31

by Ryan Roberts

[permalink] [raw]
Subject: Re: [PATCH v1 5/5] selftests/mm: soft-dirty should fail if a testcase fails

On 22/04/2024 10:33, David Hildenbrand wrote:
> On 19.04.24 09:43, Ryan Roberts wrote:
>> Previously soft-dirty was unconditionally exiting with success, even if
>> one of its testcases failed. Let's fix that so that failure can be
>> reported to automated systems properly.
>>
>> Signed-off-by: Ryan Roberts <[email protected]>
>> ---
>>   tools/testing/selftests/mm/soft-dirty.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/tools/testing/selftests/mm/soft-dirty.c
>> b/tools/testing/selftests/mm/soft-dirty.c
>> index 7dbfa53d93a0..bdfa5d085f00 100644
>> --- a/tools/testing/selftests/mm/soft-dirty.c
>> +++ b/tools/testing/selftests/mm/soft-dirty.c
>> @@ -209,5 +209,5 @@ int main(int argc, char **argv)
>>
>>       close(pagemap_fd);
>>
>> -    return ksft_exit_pass();
>> +    ksft_finished();
>>   }
>> --
>> 2.25.1
>>
>
> Guess that makes sense independent of all the other stuff?

Yes definitely. What's the process here? Do I need to re-post as a stand-alone
patch? Or perhaps, Shuah, you could take this into your tree as is?

>
> Reviewed-by: David Hildenbrand <[email protected]>

Thanks!



2024-04-23 08:46:09

by Muhammad Usama Anjum

[permalink] [raw]
Subject: Re: [PATCH v1 5/5] selftests/mm: soft-dirty should fail if a testcase fails

On 4/23/24 1:24 PM, Ryan Roberts wrote:
> On 22/04/2024 10:33, David Hildenbrand wrote:
>> On 19.04.24 09:43, Ryan Roberts wrote:
>>> Previously soft-dirty was unconditionally exiting with success, even if
>>> one of its testcases failed. Let's fix that so that failure can be
>>> reported to automated systems properly.
>>>
>>> Signed-off-by: Ryan Roberts <[email protected]>
Reviewed-by: Muhammad Usama Anjum <[email protected]>

>>> ---
>>>   tools/testing/selftests/mm/soft-dirty.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/tools/testing/selftests/mm/soft-dirty.c
>>> b/tools/testing/selftests/mm/soft-dirty.c
>>> index 7dbfa53d93a0..bdfa5d085f00 100644
>>> --- a/tools/testing/selftests/mm/soft-dirty.c
>>> +++ b/tools/testing/selftests/mm/soft-dirty.c
>>> @@ -209,5 +209,5 @@ int main(int argc, char **argv)
>>>
>>>       close(pagemap_fd);
>>>
>>> -    return ksft_exit_pass();
>>> +    ksft_finished();
>>>   }
>>> --
>>> 2.25.1
>>>
>>
>> Guess that makes sense independent of all the other stuff?
>
> Yes definitely. What's the process here? Do I need to re-post as a stand-alone
> patch? Or perhaps, Shuah, you could take this into your tree as is?
She can. But if she misses it, or if you post a v2 of this series, you can
just send this one separately. I usually send trivial patches separately
from patches that need discussion, so that there isn't confusion of this
kind.

>
>>
>> Reviewed-by: David Hildenbrand <[email protected]>
>
> Thanks!
>
>
>

--
BR,
Muhammad Usama Anjum

2024-04-23 09:02:02

by Ryan Roberts

[permalink] [raw]
Subject: Re: [PATCH v1 0/5] arm64/mm: uffd write-protect and soft-dirty tracking

On 19/04/2024 18:12, David Hildenbrand wrote:
> On 19.04.24 18:30, Mike Rapoport wrote:
>> On Fri, Apr 19, 2024 at 11:45:14AM +0200, David Hildenbrand wrote:
>>> On 19.04.24 10:33, Shivansh Vij wrote:
>>>>> On 19/04/2024 08:43, Ryan Roberts wrote:
>>>>>> Hi All,
>>>>>>
>>>>>> This series adds uffd write-protect and soft-dirty tracking support for
>>>>>> arm64. I
>>>>>> consider the soft-dirty support (patches 3 and 4) as RFC - see rationale
>>>>>> below.
>>>>>>
>>>>>> That said, these are the last 2 SW bits and we may want to keep 1 bit in
>>>>>> reserve
>>>>>> for future use. soft-dirty is only used for CRIU to my knowledge, and it is
>>>>>> thought that their use case could be solved with the more generic uffd-wp. So
>>>>>> unless somebody makes a clear case for the inclusion of soft-dirty
>>>>>> support, we
>>>>>> are probably better off dropping patches 3 and 4 and keeping bit 63 for
>>>>>> future
>>>>>> use. Although note that the most recent attempt to add soft-dirty for
>>>>>> arm64 was
>>>>>> last month [1] so I'd like to give Shivansh Vij the opportunity to make the
>>>>>> case.
>>>>
>>>> Appreciate the opportunity to provide input here.
>>>>
>>>> I picked option one (dirty tracking in arm) because it seems to be the
>>>> simplest way to move forward, whereas it would be a relatively heavy
>>>> effort to add uffd-wp support to CRIU.
>>>>
>>>>  From a performance perspective I am also a little worried that uffd
>>>> will be slower than just tracking the dirty bits asynchronously with
>>>> sw dirty, but maybe that's not as much of a concern with the addition
>>>> of uffd-wp async.
>>>>
>>>> With all this being said, I'll defer to the wisdom of the crowd about
>>>> which approach makes more sense - after all, with this patch we should
>>>> get uffd-wp support on arm so at least there will be _a_ way forward
>>>> for CRIU (albeit one requiring slightly more work).
>>>
>>> Ccing Mike and Peter. In 2017, Mike gave a presentation "Memory tracking for
>>> iterative container migration"[1] at LPC
>>>
>>> Some key points are still true I think:
>>> (1) More flexible and robust than soft-dirty
>>> (2) May obsolete soft-dirty
>>>
>>> We further recently added a new UFFD_FEATURE_WP_ASYNC feature as part of
>>> [2], because getting soft-dirty return reliable results in some cases turned
>>> out rather hard to fix.

But it sounds like the current soft-dirty semantic is sufficient for CRIU on
other arches? If I understood correctly from my brief scan of the linked post,
the problem is that soft-dirty can sometimes provide false-positives? So could
result in an unnecessary copy, but never lost data?

>>>
>>> We might still have to optimize that approach for some very sparse large
>>> VMAs, but that should be solvable.
>>>
>>>   "The major defect of this approach of dirty tracking is we need to
>>>   populate the pgtables when tracking starts. Soft-dirty doesn't do it
>>>   like that. It's unwanted in the case where the range of memory to track
>>>   is huge and unpopulated (e.g., tracking updates on a 10G file with
>>>   mmap() on top, without having any page cache installed yet). One way to
>>>   improve this is to allow pte markers exist for larger than PTE level
>>>   for PMD+. That will not change the interface if to implemented, so we
>>>   can leave that for later.")[3]
>>>
>>>
>>> If we can avoid adding soft-dirty on arm64 that would be great. This will
>>> require work on the CRIU side. One downside of uffd-wp is that it is
>>> currently not as widely available across architectures as soft-dirty.
>>
>> Using uffd-wp instead of soft-dirty in CRIU will require quite some work on
>> CRIU side and probably on the kernel side too.
>>
>> And as of now we'll anyway have to maintain soft-dirty because powerpc and
>> s390 don't have uffd-wp.
>>
>> With UFFD_FEATURE_WP_ASYNC the concern that uffd-wp will be slower than
>> soft-dirty probably doesn't exist, but we won't know for sure until
>> somebody will try.
>>
>> But there were other limitations, the most prominent was checkpointing an
>> application that uses uffd. If CRIU is to use uffd-wp for tracking of the
>> dirty pages, there should be some support for multiple uffd contexts for a
>> VMA and that's surely a lot of work.
>
> Is it even already supported to checkpoint an application that is using uffd?
> Hard to believe, what if the monitor is running in a completely different
> process than the one being checkpointed?

Shivansh, do you speak for CRIU? Are you able to comment on whether CRIU
supports checkpointing an app that uses uffd?

>
> Further ... isn't CRIU already using uffd in some cases? ...documentation
> mentions [1] that it is used for "lazy (or post-copy) restore in CRIU". At least
> if the documentation is correct and it's actually implemented.
>
> [1] https://criu.org/Userfaultfd

Shivansh, same question - do you know the current CRIU status/plans for using
uffd-wp instead of soft-dirty? If CRIU doesn't currently implement it and has no
current plans to, how can we gauge interest in making a plan?

>
>>
>>> But I'll throw in another idea: do we really need soft-dirty and uffd-wp to
>>> exist at the same time in the same process (or the VMA?). In theory, we

My instinct is that MUXing a PTE bit like this will lead to some subtle problems
that won't appear on arches that support either one or both of the features
independently and unconditionally. Surely better to limit ourselves to either
"arm64 will only support uffd-wp" or "arm64 will support both uffd-wp and
soft-dirty". That way, we could move ahead with reviewing/merging the uffd-wp
support asynchronously to deciding whether we want to support soft-dirty.

>>
>> For instance to have dirty memory tracking in CRIU for an application that
>> uses uffd-wp :)
>>
>
> Hah! Not a concern for applications on architectures where uffd-wp does not exist
> yet! Well, initially, until these applications exist and make use of it :P
>
> Also, I'm not sure if CRIU can checkpoint each and every application ... I
> suspect one has to draw a line what can be supported and what not.
>
> Case in point: how should CRIU checkpoint an application that is using softdirty
> tracking itself? If I'm not missing something important, that might not work ....
>
> If the answer is "no other application is using soft-dirty tracking", then it's
> really a shame we have to carry this baggage (+waste precious PTE bits) only for
> one application ...



2024-04-23 19:36:45

by Shivansh Vij

[permalink] [raw]
Subject: Re: [PATCH v1 0/5] arm64/mm: uffd write-protect and soft-dirty tracking

Hey All,

>On 19/04/2024 18:12, David Hildenbrand wrote:
>> On 19.04.24 18:30, Mike Rapoport wrote:
>>> On Fri, Apr 19, 2024 at 11:45:14AM +0200, David Hildenbrand wrote:
>>>> On 19.04.24 10:33, Shivansh Vij wrote:
>>>>>> On 19/04/2024 08:43, Ryan Roberts wrote:
>>>>>>> Hi All,
>>>>>>>
>>>>>>> This series adds uffd write-protect and soft-dirty tracking support for
>>>>>>> arm64. I
>>>>>>> consider the soft-dirty support (patches 3 and 4) as RFC - see rationale
>>>>>>> below.
>>>>>>>
>>>>>>> That said, these are the last 2 SW bits and we may want to keep 1 bit in
>>>>>>> reserve
>>>>>>> for future use. soft-dirty is only used for CRIU to my knowledge, and it is
>>>>>>> thought that their use case could be solved with the more generic uffd-wp. So
>>>>>>> unless somebody makes a clear case for the inclusion of soft-dirty
>>>>>>> support, we
>>>>>>> are probably better off dropping patches 3 and 4 and keeping bit 63 for
>>>>>>> future
>>>>>>> use. Although note that the most recent attempt to add soft-dirty for
>>>>>>> arm64 was
>>>>>>> last month [1] so I'd like to give Shivansh Vij the opportunity to make the
>>>>>>> case.
>>>>>
>>>>> Appreciate the opportunity to provide input here.
>>>>>
>>>>> I picked option one (dirty tracking in arm) because it seems to be the
>>>>> simplest way to move forward, whereas it would be a relatively heavy
>>>>> effort to add uffd-wp support to CRIU.
>>>>>
>>>>> From a performance perspective I am also a little worried that uffd
>>>>> will be slower than just tracking the dirty bits asynchronously with
>>>>> sw dirty, but maybe that's not as much of a concern with the addition
>>>>> of uffd-wp async.
>>>>>
>>>>> With all this being said, I'll defer to the wisdom of the crowd about
>>>>> which approach makes more sense - after all, with this patch we should
>>>>> get uffd-wp support on arm so at least there will be _a_ way forward
>>>>> for CRIU (albeit one requiring slightly more work).
>>>>
>>>> Ccing Mike and Peter. In 2017, Mike gave a presentation "Memory tracking for
>>>> iterative container migration"[1] at LPC
>>>>
>>>> Some key points are still true I think:
>>>> (1) More flexible and robust than soft-dirty
>>>> (2) May obsolete soft-dirty
>>>>
>>>> We further recently added a new UFFD_FEATURE_WP_ASYNC feature as part of
>>>> [2], because getting soft-dirty return reliable results in some cases turned
>>>> out rather hard to fix.
>
>But it sounds like the current soft-dirty semantic is sufficient for CRIU on
>other arches? If I understood correctly from my brief scan of the linked post,
>the problem is that soft-dirty can sometimes provide false-positives? So could
>result in an unnecessary copy, but never lost data?

This is how I've always understood it as well.

>
>>>>>
>>>>> We might still have to optimize that approach for some very sparse large
>>>>> VMAs, but that should be solvable.
>>>>>
>>>>   "The major defect of this approach of dirty tracking is we need to
>>>>   populate the pgtables when tracking starts. Soft-dirty doesn't do it
>>>>   like that. It's unwanted in the case where the range of memory to track
>>>>   is huge and unpopulated (e.g., tracking updates on a 10G file with
>>>>   mmap() on top, without having any page cache installed yet). One way to
>>>>   improve this is to allow pte markers exist for larger than PTE level
>>>>   for PMD+. That will not change the interface if to implemented, so we
>>>>   can leave that for later.")[3]
>>>>
>>>>
>>>> If we can avoid adding soft-dirty on arm64 that would be great. This will
>>>> require work on the CRIU side. One downside of uffd-wp is that it is
>>>> currently not as widely available across architectures as soft-dirty.
>>>
>>> Using uffd-wp instead of soft-dirty in CRIU will require quite some work on
>>> CRIU side and probably on the kernel side too.
>>>
>>> And as of now we'll anyway have to maintain soft-dirty because powerpc and
>>> s390 don't have uffd-wp.
>>>
>>> With UFFD_FEATURE_WP_ASYNC the concern that uffd-wp will be slower than
>>> soft-dirty probably doesn't exist, but we won't know for sure until
>>> somebody will try.
>>>
>>> But there were other limitations, the most prominent was checkpointing an
>>> application that uses uffd. If CRIU is to use uffd-wp for tracking of the
>>> dirty pages, there should be some support for multiple uffd contexts for a
>>> VMA and that's surely a lot of work.
>>
>> Is it even already supported to checkpoint an application that is using uffd?
>> Hard to believe, what if the monitor is running in a completely different
>> process than the one being checkpointed?
>
>Shivansh, do you speak for CRIU? Are you able to comment on whether CRIU
>supports checkpointing an app that uses uffd?

I do not speak for CRIU - I'm just a user (and hopefully a future contributor), not a maintainer or owner. I can, however, comment on whether CRIU supports checkpointing an app that uses UFFD: it doesn't. Looking at both the CRIU implementation (specifically how it restores memory [1]) and recently filed GitHub issues [2], it's pretty clear that CRIU doesn't support processes using UFFD, and that there are currently no plans to add such support [3].

[1] https://github.com/checkpoint-restore/criu/blob/criu-2.x-stable/criu/mem.c#L683
[2] https://github.com/checkpoint-restore/criu/issues/2021
[3] https://github.com/checkpoint-restore/criu/issues/2021#issuecomment-1346971967

>>
>> Further ... isn't CRIU already using uffd in some cases? ...documentation
>> mentions [1] that it is used for "lazy (or post-copy) restore in CRIU". At least
>> if the documentation is correct and it's actually implemented.
>>
>
>Shivansh, same question - do you know the current CRIU status/plans for using
>uffd-wp instead of soft-dirty? If CRIU doesn't currently implement it and has no
>current plans to, how can we gauge interest in making a plan?
>

While I cannot gauge whether the maintainers or main contributors of CRIU plan on using uffd-wp instead of soft-dirty in the future, I can tell you that there is no currently open issue to track that work, and whenever anyone in the past has asked about ARM64 pre-dump support to CRIU (which is the feature that uses soft-dirty/would use uffd-wp), they've always just said it's not supported - but that they do want the feature [4].

So in summary, they want the feature, but no one is working on implementing it (either with soft-dirty or with uffd-wp).

I doubt that CRIU would have any issues with adding the feature via soft-dirty (since, as shown in [4] they're interested in it), but as for using uffd-wp they definitely haven't shown any interest thus far. Based on the fact that it would be a very significant amount of work and it would really only be for ARM64 support (which they're already fine without), I'd be very surprised if they were interested in pursuing it.

[4] https://github.com/checkpoint-restore/criu/issues/1859#issuecomment-1972674047

>>
>>>
>>>> But I'll throw in another idea: do we really need soft-dirty and uffd-wp to
>>>> exist at the same time in the same process (or the VMA?). In theory, we
>
>My instinct is that MUXing a PTE bit like this will lead to some subtle problems
>that won't appear on arches that support either one or both of the features
>independently and unconditionally. Surely better to limit ourselves to either
>"arm64 will only support uffd-wp" or "arm64 will support both uffd-wp and
>soft-dirty". That way, we could move ahead with reviewing/merging the uffd-wp
>support asynchronously to deciding whether we want to support soft-dirty.
>

My personal preference is having both approaches supported - especially in the context of CRIU since I doubt they'll be willing to rewrite all of the dumping and restore logic just for ARM64 support.

2024-04-23 20:57:23

by David Hildenbrand

Subject: Re: [PATCH v1 0/5] arm64/mm: uffd write-protect and soft-dirty tracking


>>>>
>>>> We further recently added a new UFFD_FEATURE_WP_ASYNC feature as part of
>>>> [2], because getting soft-dirty return reliable results in some cases turned
>>>> out rather hard to fix.
>
> But it sounds like the current soft-dirty semantic is sufficient for CRIU on
> other arches? If I understood correctly from my brief scan of the linked post,
> the problem is that soft-dirty can sometimes provide false-positives? So could
> result in unnecessary copies, but never lost data?

Yes, it seems to be good enough for them in that regard I think.
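
For readers following along, here is a minimal sketch of how userspace can
observe both mechanisms through /proc/<pid>/pagemap. The bit positions (55 for
soft-dirty, 57 for uffd-wp) are those documented in
Documentation/admin-guide/mm/pagemap.rst. The helper names (pm_soft_dirty,
pm_uffd_wp, pm_read_entry) are made up for this illustration and are not taken
from CRIU or from the selftests.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Bit positions as documented in Documentation/admin-guide/mm/pagemap.rst. */
#define PM_SOFT_DIRTY (UINT64_C(1) << 55) /* pte is soft-dirty */
#define PM_UFFD_WP    (UINT64_C(1) << 57) /* pte is uffd-wp write-protected */

/* Hypothetical helper: true if the pagemap entry has the soft-dirty bit set. */
static inline bool pm_soft_dirty(uint64_t entry)
{
	return (entry & PM_SOFT_DIRTY) != 0;
}

/* Hypothetical helper: true if the pagemap entry has the uffd-wp bit set. */
static inline bool pm_uffd_wp(uint64_t entry)
{
	return (entry & PM_UFFD_WP) != 0;
}

/*
 * Hypothetical helper: read the 64-bit pagemap entry for the page that
 * contains addr. pagemap_fd is an open fd on /proc/<pid>/pagemap.
 * Returns 0 on success and -1 on any error or short read.
 */
static inline int pm_read_entry(int pagemap_fd, const void *addr,
				uint64_t *entry)
{
	long page_size = sysconf(_SC_PAGESIZE);
	off_t off;

	assert(page_size > 0);
	/* One 8-byte entry per virtual page, indexed by page frame number. */
	off = (off_t)((uintptr_t)addr / (uintptr_t)page_size) *
	      (off_t)sizeof(*entry);

	if (pread(pagemap_fd, entry, sizeof(*entry), off) !=
	    (ssize_t)sizeof(*entry))
		return -1;
	return 0;
}
```

A soft-dirty based tracker would typically write "4" to /proc/<pid>/clear_refs
to clear the soft-dirty bits, let the task run, and then call pm_read_entry()
per page, copying only those pages for which pm_soft_dirty() is true. A false
positive in that scheme costs an extra copy, never lost data.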

[...]

>>>
>>>> But I'll throw in another idea: do we really need soft-dirty and uffd-wp to
>>>> exist at the same time in the same process (or the VMA?). In theory, we
>
> My instinct is that MUXing a PTE bit like this will lead to some subtle problems
> that won't appear on arches that support either one or both of the features
> independently and unconditionally. Surely better to limit ourselves to either
> "arm64 will only support uffd-wp" or "arm64 will support both uffd-wp and
> soft-dirty". That way, we could move ahead with reviewing/merging the uffd-wp
> support asynchronously to deciding whether we want to support soft-dirty.

Yes. MUXing would require some work, but likely better than wasting 1/64
PTE space on a corner case feature with one famous user that might be
able to port to an alternative with other active users (growing ;) ).

Anyhow, I don't maintain arm64 code and we have to carry that baggage in
the core either way for the time being ...

--
Cheers,

David / dhildenb


2024-04-23 21:03:13

by David Hildenbrand

Subject: Re: [PATCH v1 0/5] arm64/mm: uffd write-protect and soft-dirty tracking

>>
>> Shivansh, do you speak for CRIU? Are you able to comment on whether CRIU
>> supports checkpointing an app that uses uffd?
>
> I do not speak for CRIU - I'm just a user (and hopefully a future contributor), but not a maintainer or owner. I can however comment on whether CRIU supports checkpointing an app that uses UFFD - it doesn't. Looking through both the implementation of CRIU (specifically how they restore memory [1]), and at recently filed GitHub issues [2], it's pretty clear that CRIU doesn't support processes using UFFD, and that they do not currently have plans to [3].

Thanks for all these pointers!

>
> [1] https://github.com/checkpoint-restore/criu/blob/criu-2.x-stable/criu/mem.c#L683
> [2] https://github.com/checkpoint-restore/criu/issues/2021
> [3] https://github.com/checkpoint-restore/criu/issues/2021#issuecomment-1346971967
>
>>>
>>> Further ... isn't CRIU already using uffd in some cases? ...documentation
>>> mentions [1] that it is used for "lazy (or post-copy) restore in CRIU". At least
>>> if the documentation is correct and it's actually implemented.
>>>
>>
>> Shivansh, same question - do you know the current CRIU status/plans for using
>> uffd-wp instead of soft-dirty? If CRIU doesn't currently implement it and has no
>> current plans to, how can we gauge interest in making a plan?
>>
>
> While I cannot gauge whether the maintainers or main contributors of CRIU plan on using uffd-wp instead of soft-dirty in the future, I can tell you that there is no currently open issue to track that work, and whenever anyone in the past has asked about ARM64 pre-dump support to CRIU (which is the feature that uses soft-dirty/would use uffd-wp), they've always just said it's not supported - but that they do want the feature [4].
>
> So in summary, they want the feature, but no one is working on implementing it (either with soft-dirty or with uffd-wp).
>
> I doubt that CRIU would have any issues with adding the feature via soft-dirty (since, as shown in [4] they're interested in it), but as for using uffd-wp they definitely haven't shown any interest thus far. Based on the fact that it would be a very significant amount of work and it would really only be for ARM64 support (which they're already fine without), I'd be very surprised if they were interested in pursuing it.
>

Of course, nobody wants to do the work. But that doesn't mean that the
kernel has to do the work :)

If there are major challenges that mean it cannot possibly be done with
uffd-wp (unfixable), that's a different story.

> [4] https://github.com/checkpoint-restore/criu/issues/1859#issuecomment-1972674047
>
>>>
>>>>
>>>>> But I'll throw in another idea: do we really need soft-dirty and uffd-wp to
>>>>> exist at the same time in the same process (or the VMA?). In theory, we
>>
>> My instinct is that MUXing a PTE bit like this will lead to some subtle problems
>> that won't appear on arches that support either one or both of the features
>> independently and unconditionally. Surely better to limit ourselves to either
>> "arm64 will only support uffd-wp" or "arm64 will support both uffd-wp and
>> soft-dirty". That way, we could move ahead with reviewing/merging the uffd-wp
>> support asynchronously to deciding whether we want to support soft-dirty.
>>
>
> My personal preference is having both approaches supported - especially in the context of CRIU since I doubt they'll be willing to rewrite all of the dumping and restore logic just for ARM64 support.

Sure, nobody does any work unless they are forced to.

But this is something that arm64 maintainers will have to decide.

Let's start with uffd-wp that has other well-known users that could
benefit (e.g., QEMU background snapshots).

--
Cheers,

David / dhildenb


2024-04-24 10:39:44

by Ryan Roberts

Subject: Re: [PATCH v1 0/5] arm64/mm: uffd write-protect and soft-dirty tracking

On 23/04/2024 22:02, David Hildenbrand wrote:
>>>
>>> Shivansh, do you speak for CRIU? Are you able to comment on whether CRIU
>>> supports checkpointing an app that uses uffd?
>>
>> I do not speak for CRIU - I'm just a user (and hopefully a future
>> contributor), but not a maintainer or owner. I can however comment on whether
>> CRIU supports checkpointing an app that uses UFFD - it doesn't. Looking
>> through both the implementation of CRIU (specifically how they restore memory
>> [1]), and at recently filed GitHub issues [2], it's pretty clear that CRIU
>> doesn't support processes using UFFD, and that they do not currently have
>> plans to [3].
>
> Thanks for all these pointers!
>
>>
>> [1]
>> https://github.com/checkpoint-restore/criu/blob/criu-2.x-stable/criu/mem.c#L683
>> [2] https://github.com/checkpoint-restore/criu/issues/2021
>> [3]
>> https://github.com/checkpoint-restore/criu/issues/2021#issuecomment-1346971967
>>
>>>>
>>>> Further ... isn't CRIU already using uffd in some cases? ...documentation
>>>> mentions [1] that it is used for "lazy (or post-copy) restore in CRIU". At
>>>> least
>>>> if the documentation is correct and it's actually implemented.
>>>>
>>>
>>> Shivansh, same question - do you know the current CRIU status/plans for using
>>> uffd-wp instead of soft-dirty? If CRIU doesn't currently implement it and has no
>>> current plans to, how can we gauge interest in making a plan?
>>>
>>
>> While I cannot gauge whether the maintainers or main contributors of CRIU plan
>> on using uffd-wp instead of soft-dirty in the future, I can tell you that
>> there is no currently open issue to track that work, and whenever anyone in
>> the past has asked about ARM64 pre-dump support to CRIU (which is the feature
>> that uses soft-dirty/would use uffd-wp), they've always just said it's not
>> supported - but that they do want the feature [4].
>>
>> So in summary, they want the feature, but no one is working on implementing it
>> (either with soft-dirty or with uffd-wp).
>>
>> I doubt that CRIU would have any issues with adding the feature via soft-dirty
>> (since, as shown in [4] they're interested in it), but as for using uffd-wp
>> they definitely haven't shown any interest thus far. Based on the fact that it
>> would be a very significant amount of work and it would really only be for
>> ARM64 support (which they're already fine without), I'd be very surprised if
>> they were interested in pursuing it.
>>
>
> Of course, nobody wants to do the work. But that doesn't mean that the kernel
> has to do the work :)
>
> If there are major challenges that mean it cannot possibly be done with
> uffd-wp (unfixable), that's a different story.
>
>> [4]
>> https://github.com/checkpoint-restore/criu/issues/1859#issuecomment-1972674047
>>
>>>>
>>>>>
>>>>>> But I'll throw in another idea: do we really need soft-dirty and uffd-wp to
>>>>>> exist at the same time in the same process (or the VMA?). In theory, we
>>>
>>> My instinct is that MUXing a PTE bit like this will lead to some subtle problems
>>> that won't appear on arches that support either one or both of the features
>>> independently and unconditionally. Surely better to limit ourselves to either
>>> "arm64 will only support uffd-wp" or "arm64 will support both uffd-wp and
>>> soft-dirty". That way, we could move ahead with reviewing/merging the uffd-wp
>>> support asynchronously to deciding whether we want to support soft-dirty.
>>>
>>
>> My personal preference is having both approaches supported - especially in the
>> context of CRIU since I doubt they'll be willing to rewrite all of the dumping
>> and restore logic just for ARM64 support.
>
> Sure, nobody does any work unless they are forced to.
>
> But this is something that arm64 maintainers will have to decide.
>
> Let's start with uffd-wp that has other well-known users that could benefit
> (e.g., QEMU background snapshots).

Right. I'm going to:

- re-post patch 5 standalone to go in via kselftests.
- re-post patches 1 & 2 as a series to enable uffd-wp on arm64; uncontentious
  I think.
- Have a chat with Catalin about appetite for soft-dirty on arm64; but likely
  that will be left here until/unless there is clear justification that the
  use case cannot be met with uffd-wp.

Thanks,
Ryan


2024-04-24 10:40:19

by Ryan Roberts

Subject: Re: [PATCH v1 5/5] selftests/mm: soft-dirty should fail if a testcase fails

On 23/04/2024 09:44, Muhammad Usama Anjum wrote:
> On 4/23/24 1:24 PM, Ryan Roberts wrote:
>> On 22/04/2024 10:33, David Hildenbrand wrote:
>>> On 19.04.24 09:43, Ryan Roberts wrote:
>>>> Previously soft-dirty was unconditionally exiting with success, even if
>>>> one of its testcases failed. Let's fix that so that failure can be
>>>> reported to automated systems properly.
>>>>
>>>> Signed-off-by: Ryan Roberts <[email protected]>
> Reviewed-by: Muhammad Usama Anjum <[email protected]>

Thanks!

>
>>>> ---
>>>>   tools/testing/selftests/mm/soft-dirty.c | 2 +-
>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/tools/testing/selftests/mm/soft-dirty.c
>>>> b/tools/testing/selftests/mm/soft-dirty.c
>>>> index 7dbfa53d93a0..bdfa5d085f00 100644
>>>> --- a/tools/testing/selftests/mm/soft-dirty.c
>>>> +++ b/tools/testing/selftests/mm/soft-dirty.c
>>>> @@ -209,5 +209,5 @@ int main(int argc, char **argv)
>>>>
>>>>       close(pagemap_fd);
>>>>
>>>> -    return ksft_exit_pass();
>>>> +    ksft_finished();
>>>>   }
>>>> --
>>>> 2.25.1
>>>>
>>>
>>> Guess that makes sense independent of all the other stuff?
>>
>> Yes definitely. What's the process here? Do I need to re-post as a stand-alone
>> patch? Or perhaps, Shuah, you could take this into your tree as is?
> She can. But if she misses it or you want to post v2 of this current
> series, you can just send this one separately. I usually try to send
> trivial patches separately from those that need discussion, so that there
> isn't confusion of this kind.

Thanks - I'll do that.
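
For context on the one-line change quoted above, the following is a simplified
model of the difference between the two exit paths. It is not the code from
tools/testing/selftests/kselftest.h; the struct and function names are
hypothetical, and the pass condition (planned test count equals pass + xfail +
xskip) is my reading of what ksft_finished() checks, so treat it as an
assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Exit codes mirroring KSFT_PASS (0) and KSFT_FAIL (1) in kselftest.h. */
enum { MODEL_PASS = 0, MODEL_FAIL = 1 };

/* Hypothetical, reduced stand-in for the kselftest result counters. */
struct model_counts {
	unsigned int plan;  /* number of testcases announced up front */
	unsigned int pass;  /* testcases that passed */
	unsigned int fail;  /* testcases that failed */
	unsigned int xfail; /* expected failures */
	unsigned int xskip; /* skipped testcases */
};

/* Old behaviour: the counters are ignored and success is always reported. */
static inline int model_exit_pass(const struct model_counts *c)
{
	assert(c != NULL);
	return MODEL_PASS;
}

/*
 * New behaviour (assumed model of ksft_finished()): report success only if
 * every planned testcase passed, was an expected failure, or was skipped.
 */
static inline int model_finished(const struct model_counts *c)
{
	assert(c != NULL);
	return c->plan == c->pass + c->xfail + c->xskip ? MODEL_PASS
							 : MODEL_FAIL;
}
```

With three planned testcases of which one fails, model_exit_pass() still
returns MODEL_PASS, which is the problem the patch describes, whereas
model_finished() returns MODEL_FAIL so that automated systems see the failure.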

>
>>
>>>
>>> Reviewed-by: David Hildenbrand <[email protected]>
>>
>> Thanks!
>>
>>
>>
>


2024-04-24 11:02:27

by Catalin Marinas

Subject: Re: [PATCH v1 0/5] arm64/mm: uffd write-protect and soft-dirty tracking

On Wed, Apr 24, 2024 at 11:39:23AM +0100, Ryan Roberts wrote:
> - Have a chat with Catalin about appetite for soft-dirty on arm64; but likely
>   that will be left here until/unless there is clear justification that the
>   use case cannot be met with uffd-wp.

I agree, I wouldn't use the last software bit if there's a way for CRIU
to eventually use uffd-wp.

--
Catalin