This series, currently based on 6.3-rc1, is divided into two parts:
- Commits 1-3 refactor userfaultfd ioctl code without behavior changes, with the
main goal of improving consistency and reducing the number of function args.
- Commit 4 adds UFFDIO_CONTINUE_MODE_WP.
The refactors are ordered from least to most controversial, so that some of
them could be dropped if they are deemed not worth it.
Changelog:
v3->v4:
- massage the uffd_flags_t implementation to eliminate all sparse warnings
- add a couple of inline helpers to make uffd_flags_t usage easier
- drop the refactor passing `struct uffdio_range *` around (previously 4/5)
- define a temporary `struct mm_struct *` in functions with >=3 uses of `vma->vm_mm`
- consistent argument order between `flags` and `pagep`
- expand on the use case in patch 4/4 message
v2->v3:
- rebase onto 6.3-rc1
- typedef a new type for mfill flags in patch 3/5 (suggested by Nadav)
v1->v2:
- refactor before adding the new flag, to avoid perpetuating messiness
Axel Rasmussen (4):
mm: userfaultfd: rename functions for clarity + consistency
mm: userfaultfd: don't pass around both mm and vma
mm: userfaultfd: combine 'mode' and 'wp_copy' arguments
mm: userfaultfd: add UFFDIO_CONTINUE_MODE_WP to install WP PTEs
fs/userfaultfd.c | 29 ++--
include/linux/hugetlb.h | 27 ++--
include/linux/shmem_fs.h | 9 +-
include/linux/userfaultfd_k.h | 68 +++++----
include/uapi/linux/userfaultfd.h | 7 +
mm/hugetlb.c | 28 ++--
mm/shmem.c | 14 +-
mm/userfaultfd.c | 170 +++++++++++------------
tools/testing/selftests/mm/userfaultfd.c | 4 +
9 files changed, 187 insertions(+), 169 deletions(-)
--
2.40.0.rc1.284.g88254d51c5-goog
The basic problem is that, over time, we've added new userfaultfd ioctls
and refactored the code so that functions which used to handle only one
case are now reused for several. While this happened, we never renamed
the functions.
Similarly, as we added new functions, we cargo-culted pieces of the
now-inconsistent naming scheme, so those functions too ended up with
names that don't make a lot of sense.
A key point here is that "copy" in most userfaultfd code refers specifically
to UFFDIO_COPY, where we allocate a new page and copy its contents from
userspace. Many functions with "copy" in their names don't actually do
this, at least in some cases.
So, rename things into a consistent scheme. The high-level idea is that
the call stack for userfaultfd ioctls becomes:
userfaultfd_ioctl
  -> userfaultfd_(particular ioctl)
    -> mfill_atomic_(particular kind of fill operation)
      -> mfill_atomic /* loops over pages in range */
        -> mfill_atomic_pte /* deals with single pages */
          -> mfill_atomic_pte_(particular kind of fill operation)
            -> mfill_atomic_install_pte
There are of course some special cases (shmem, hugetlb), but this is the
general structure which all function names now adhere to.
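As a rough illustration of this layering, here is a hypothetical userspace toy model, not kernel code: `struct toy_vma`, `enum toy_fill_mode`, `TOY_PAGE_SIZE`, and the page-pointer "PTEs" are invented for the sketch, and the signatures differ from the real functions. It keeps the naming scheme so each layer lines up with the call stack above; the per-ioctl `mfill_atomic_(kind)` entry points are left out, since each would just call `mfill_atomic()` with a fixed mode. Note that only the copy path actually copies data.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 8
#define TOY_NR_PAGES  4

/* Hypothetical: which kind of fill an ioctl requests. */
enum toy_fill_mode { TOY_COPY, TOY_ZEROPAGE, TOY_CONTINUE };

/* Hypothetical stand-in for a VMA: one "PTE" (page pointer) per page. */
struct toy_vma {
	const unsigned char *pte[TOY_NR_PAGES];           /* NULL means pte_none */
	unsigned char cache[TOY_NR_PAGES][TOY_PAGE_SIZE]; /* pretend page cache */
};

static const unsigned char toy_zero_page[TOY_PAGE_SIZE];

/* Bottom of the stack: map a page, refusing to replace an existing PTE. */
static int mfill_atomic_install_pte(struct toy_vma *vma, unsigned long idx,
				    const unsigned char *page)
{
	if (vma->pte[idx])
		return -EEXIST;
	vma->pte[idx] = page;
	return 0;
}

/* COPY: the only kind that allocates a new page and copies into it. */
static int mfill_atomic_pte_copy(struct toy_vma *vma, unsigned long idx,
				 const unsigned char *src)
{
	unsigned char *page = malloc(TOY_PAGE_SIZE);
	int ret;

	if (!page)
		return -ENOMEM;
	memcpy(page, src, TOY_PAGE_SIZE);
	ret = mfill_atomic_install_pte(vma, idx, page);
	if (ret)
		free(page); /* drop the unused copy on failure */
	return ret;
}

/* ZEROPAGE: map the shared zero page; nothing is copied. */
static int mfill_atomic_pte_zeropage(struct toy_vma *vma, unsigned long idx)
{
	return mfill_atomic_install_pte(vma, idx, toy_zero_page);
}

/* CONTINUE: map the page already present in the (pretend) page cache. */
static int mfill_atomic_pte_continue(struct toy_vma *vma, unsigned long idx)
{
	return mfill_atomic_install_pte(vma, idx, vma->cache[idx]);
}

/* Deals with a single page by dispatching on the kind of fill. */
static int mfill_atomic_pte(struct toy_vma *vma, unsigned long idx,
			    const unsigned char *src, enum toy_fill_mode mode)
{
	switch (mode) {
	case TOY_COPY:
		return mfill_atomic_pte_copy(vma, idx, src);
	case TOY_ZEROPAGE:
		return mfill_atomic_pte_zeropage(vma, idx);
	case TOY_CONTINUE:
		return mfill_atomic_pte_continue(vma, idx);
	}
	return -EINVAL;
}

/* Loops over the range; returns pages filled, or the error if none were. */
static long mfill_atomic(struct toy_vma *vma, unsigned long start,
			 unsigned long nr, const unsigned char *src,
			 enum toy_fill_mode mode)
{
	long filled = 0;
	int err = 0;

	if (start + nr > TOY_NR_PAGES)
		return -EINVAL;
	for (unsigned long i = 0; i < nr; i++) {
		err = mfill_atomic_pte(vma, start + i,
				       src ? src + i * TOY_PAGE_SIZE : NULL, mode);
		if (err)
			break;
		filled++;
	}
	return filled ? filled : err;
}
```

Like the real loop, the toy returns partial progress when some pages were filled and only reports the error when none were (the kernel counts bytes rather than pages). Pages allocated by the copy path are never freed here, to keep the sketch short.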
Acked-by: Peter Xu <[email protected]>
Signed-off-by: Axel Rasmussen <[email protected]>
---
fs/userfaultfd.c | 18 +++----
include/linux/hugetlb.h | 30 +++++------
include/linux/userfaultfd_k.h | 18 +++----
mm/hugetlb.c | 20 +++----
mm/userfaultfd.c | 98 +++++++++++++++++------------------
5 files changed, 92 insertions(+), 92 deletions(-)
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 44d1ee429eb0..365bf00dd8dd 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1741,9 +1741,9 @@ static int userfaultfd_copy(struct userfaultfd_ctx *ctx,
if (uffdio_copy.mode & ~(UFFDIO_COPY_MODE_DONTWAKE|UFFDIO_COPY_MODE_WP))
goto out;
if (mmget_not_zero(ctx->mm)) {
- ret = mcopy_atomic(ctx->mm, uffdio_copy.dst, uffdio_copy.src,
- uffdio_copy.len, &ctx->mmap_changing,
- uffdio_copy.mode);
+ ret = mfill_atomic_copy(ctx->mm, uffdio_copy.dst, uffdio_copy.src,
+ uffdio_copy.len, &ctx->mmap_changing,
+ uffdio_copy.mode);
mmput(ctx->mm);
} else {
return -ESRCH;
@@ -1793,9 +1793,9 @@ static int userfaultfd_zeropage(struct userfaultfd_ctx *ctx,
goto out;
if (mmget_not_zero(ctx->mm)) {
- ret = mfill_zeropage(ctx->mm, uffdio_zeropage.range.start,
- uffdio_zeropage.range.len,
- &ctx->mmap_changing);
+ ret = mfill_atomic_zeropage(ctx->mm, uffdio_zeropage.range.start,
+ uffdio_zeropage.range.len,
+ &ctx->mmap_changing);
mmput(ctx->mm);
} else {
return -ESRCH;
@@ -1903,9 +1903,9 @@ static int userfaultfd_continue(struct userfaultfd_ctx *ctx, unsigned long arg)
goto out;
if (mmget_not_zero(ctx->mm)) {
- ret = mcopy_continue(ctx->mm, uffdio_continue.range.start,
- uffdio_continue.range.len,
- &ctx->mmap_changing);
+ ret = mfill_atomic_continue(ctx->mm, uffdio_continue.range.start,
+ uffdio_continue.range.len,
+ &ctx->mmap_changing);
mmput(ctx->mm);
} else {
return -ESRCH;
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 7c977d234aba..8f0467bf1cbd 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -158,13 +158,13 @@ unsigned long hugetlb_total_pages(void);
vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long address, unsigned int flags);
#ifdef CONFIG_USERFAULTFD
-int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, pte_t *dst_pte,
- struct vm_area_struct *dst_vma,
- unsigned long dst_addr,
- unsigned long src_addr,
- enum mcopy_atomic_mode mode,
- struct page **pagep,
- bool wp_copy);
+int hugetlb_mfill_atomic_pte(struct mm_struct *dst_mm, pte_t *dst_pte,
+ struct vm_area_struct *dst_vma,
+ unsigned long dst_addr,
+ unsigned long src_addr,
+ enum mcopy_atomic_mode mode,
+ struct page **pagep,
+ bool wp_copy);
#endif /* CONFIG_USERFAULTFD */
bool hugetlb_reserve_pages(struct inode *inode, long from, long to,
struct vm_area_struct *vma,
@@ -393,14 +393,14 @@ static inline void hugetlb_free_pgd_range(struct mmu_gather *tlb,
}
#ifdef CONFIG_USERFAULTFD
-static inline int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
- pte_t *dst_pte,
- struct vm_area_struct *dst_vma,
- unsigned long dst_addr,
- unsigned long src_addr,
- enum mcopy_atomic_mode mode,
- struct page **pagep,
- bool wp_copy)
+static inline int hugetlb_mfill_atomic_pte(struct mm_struct *dst_mm,
+ pte_t *dst_pte,
+ struct vm_area_struct *dst_vma,
+ unsigned long dst_addr,
+ unsigned long src_addr,
+ enum mcopy_atomic_mode mode,
+ struct page **pagep,
+ bool wp_copy)
{
BUG();
return 0;
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index 3767f18114ef..468080125612 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -61,15 +61,15 @@ extern int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
unsigned long dst_addr, struct page *page,
bool newly_allocated, bool wp_copy);
-extern ssize_t mcopy_atomic(struct mm_struct *dst_mm, unsigned long dst_start,
- unsigned long src_start, unsigned long len,
- atomic_t *mmap_changing, __u64 mode);
-extern ssize_t mfill_zeropage(struct mm_struct *dst_mm,
- unsigned long dst_start,
- unsigned long len,
- atomic_t *mmap_changing);
-extern ssize_t mcopy_continue(struct mm_struct *dst_mm, unsigned long dst_start,
- unsigned long len, atomic_t *mmap_changing);
+extern ssize_t mfill_atomic_copy(struct mm_struct *dst_mm, unsigned long dst_start,
+ unsigned long src_start, unsigned long len,
+ atomic_t *mmap_changing, __u64 mode);
+extern ssize_t mfill_atomic_zeropage(struct mm_struct *dst_mm,
+ unsigned long dst_start,
+ unsigned long len,
+ atomic_t *mmap_changing);
+extern ssize_t mfill_atomic_continue(struct mm_struct *dst_mm, unsigned long dst_start,
+ unsigned long len, atomic_t *mmap_changing);
extern int mwriteprotect_range(struct mm_struct *dst_mm,
unsigned long start, unsigned long len,
bool enable_wp, atomic_t *mmap_changing);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 07abcb6eb203..4c9276549394 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6154,17 +6154,17 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
#ifdef CONFIG_USERFAULTFD
/*
- * Used by userfaultfd UFFDIO_COPY. Based on mcopy_atomic_pte with
- * modifications for huge pages.
+ * Used by userfaultfd UFFDIO_* ioctls. Based on userfaultfd's mfill_atomic_pte
+ * with modifications for hugetlb pages.
*/
-int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
- pte_t *dst_pte,
- struct vm_area_struct *dst_vma,
- unsigned long dst_addr,
- unsigned long src_addr,
- enum mcopy_atomic_mode mode,
- struct page **pagep,
- bool wp_copy)
+int hugetlb_mfill_atomic_pte(struct mm_struct *dst_mm,
+ pte_t *dst_pte,
+ struct vm_area_struct *dst_vma,
+ unsigned long dst_addr,
+ unsigned long src_addr,
+ enum mcopy_atomic_mode mode,
+ struct page **pagep,
+ bool wp_copy)
{
bool is_continue = (mode == MCOPY_ATOMIC_CONTINUE);
struct hstate *h = hstate_vma(dst_vma);
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 53c3d916ff66..84db5b2fad3a 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -127,13 +127,13 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
return ret;
}
-static int mcopy_atomic_pte(struct mm_struct *dst_mm,
- pmd_t *dst_pmd,
- struct vm_area_struct *dst_vma,
- unsigned long dst_addr,
- unsigned long src_addr,
- struct page **pagep,
- bool wp_copy)
+static int mfill_atomic_pte_copy(struct mm_struct *dst_mm,
+ pmd_t *dst_pmd,
+ struct vm_area_struct *dst_vma,
+ unsigned long dst_addr,
+ unsigned long src_addr,
+ struct page **pagep,
+ bool wp_copy)
{
void *page_kaddr;
int ret;
@@ -204,10 +204,10 @@ static int mcopy_atomic_pte(struct mm_struct *dst_mm,
goto out;
}
-static int mfill_zeropage_pte(struct mm_struct *dst_mm,
- pmd_t *dst_pmd,
- struct vm_area_struct *dst_vma,
- unsigned long dst_addr)
+static int mfill_atomic_pte_zeropage(struct mm_struct *dst_mm,
+ pmd_t *dst_pmd,
+ struct vm_area_struct *dst_vma,
+ unsigned long dst_addr)
{
pte_t _dst_pte, *dst_pte;
spinlock_t *ptl;
@@ -240,11 +240,11 @@ static int mfill_zeropage_pte(struct mm_struct *dst_mm,
}
/* Handles UFFDIO_CONTINUE for all shmem VMAs (shared or private). */
-static int mcontinue_atomic_pte(struct mm_struct *dst_mm,
- pmd_t *dst_pmd,
- struct vm_area_struct *dst_vma,
- unsigned long dst_addr,
- bool wp_copy)
+static int mfill_atomic_pte_continue(struct mm_struct *dst_mm,
+ pmd_t *dst_pmd,
+ struct vm_area_struct *dst_vma,
+ unsigned long dst_addr,
+ bool wp_copy)
{
struct inode *inode = file_inode(dst_vma->vm_file);
pgoff_t pgoff = linear_page_index(dst_vma, dst_addr);
@@ -307,10 +307,10 @@ static pmd_t *mm_alloc_pmd(struct mm_struct *mm, unsigned long address)
#ifdef CONFIG_HUGETLB_PAGE
/*
- * __mcopy_atomic processing for HUGETLB vmas. Note that this routine is
+ * mfill_atomic processing for HUGETLB vmas. Note that this routine is
* called with mmap_lock held, it will release mmap_lock before returning.
*/
-static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
+static __always_inline ssize_t mfill_atomic_hugetlb(struct mm_struct *dst_mm,
struct vm_area_struct *dst_vma,
unsigned long dst_start,
unsigned long src_start,
@@ -411,7 +411,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
goto out_unlock;
}
- err = hugetlb_mcopy_atomic_pte(dst_mm, dst_pte, dst_vma,
+ err = hugetlb_mfill_atomic_pte(dst_mm, dst_pte, dst_vma,
dst_addr, src_addr, mode, &page,
wp_copy);
@@ -463,7 +463,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
}
#else /* !CONFIG_HUGETLB_PAGE */
/* fail at build time if gcc attempts to use this */
-extern ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
+extern ssize_t mfill_atomic_hugetlb(struct mm_struct *dst_mm,
struct vm_area_struct *dst_vma,
unsigned long dst_start,
unsigned long src_start,
@@ -484,8 +484,8 @@ static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm,
ssize_t err;
if (mode == MCOPY_ATOMIC_CONTINUE) {
- return mcontinue_atomic_pte(dst_mm, dst_pmd, dst_vma, dst_addr,
- wp_copy);
+ return mfill_atomic_pte_continue(dst_mm, dst_pmd, dst_vma,
+ dst_addr, wp_copy);
}
/*
@@ -500,11 +500,11 @@ static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm,
*/
if (!(dst_vma->vm_flags & VM_SHARED)) {
if (mode == MCOPY_ATOMIC_NORMAL)
- err = mcopy_atomic_pte(dst_mm, dst_pmd, dst_vma,
- dst_addr, src_addr, page,
- wp_copy);
+ err = mfill_atomic_pte_copy(dst_mm, dst_pmd, dst_vma,
+ dst_addr, src_addr, page,
+ wp_copy);
else
- err = mfill_zeropage_pte(dst_mm, dst_pmd,
+ err = mfill_atomic_pte_zeropage(dst_mm, dst_pmd,
dst_vma, dst_addr);
} else {
err = shmem_mfill_atomic_pte(dst_mm, dst_pmd, dst_vma,
@@ -516,13 +516,13 @@ static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm,
return err;
}
-static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm,
- unsigned long dst_start,
- unsigned long src_start,
- unsigned long len,
- enum mcopy_atomic_mode mcopy_mode,
- atomic_t *mmap_changing,
- __u64 mode)
+static __always_inline ssize_t mfill_atomic(struct mm_struct *dst_mm,
+ unsigned long dst_start,
+ unsigned long src_start,
+ unsigned long len,
+ enum mcopy_atomic_mode mcopy_mode,
+ atomic_t *mmap_changing,
+ __u64 mode)
{
struct vm_area_struct *dst_vma;
ssize_t err;
@@ -588,9 +588,9 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm,
* If this is a HUGETLB vma, pass off to appropriate routine
*/
if (is_vm_hugetlb_page(dst_vma))
- return __mcopy_atomic_hugetlb(dst_mm, dst_vma, dst_start,
- src_start, len, mcopy_mode,
- wp_copy);
+ return mfill_atomic_hugetlb(dst_mm, dst_vma, dst_start,
+ src_start, len, mcopy_mode,
+ wp_copy);
if (!vma_is_anonymous(dst_vma) && !vma_is_shmem(dst_vma))
goto out_unlock;
@@ -688,26 +688,26 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm,
return copied ? copied : err;
}
-ssize_t mcopy_atomic(struct mm_struct *dst_mm, unsigned long dst_start,
- unsigned long src_start, unsigned long len,
- atomic_t *mmap_changing, __u64 mode)
+ssize_t mfill_atomic_copy(struct mm_struct *dst_mm, unsigned long dst_start,
+ unsigned long src_start, unsigned long len,
+ atomic_t *mmap_changing, __u64 mode)
{
- return __mcopy_atomic(dst_mm, dst_start, src_start, len,
- MCOPY_ATOMIC_NORMAL, mmap_changing, mode);
+ return mfill_atomic(dst_mm, dst_start, src_start, len,
+ MCOPY_ATOMIC_NORMAL, mmap_changing, mode);
}
-ssize_t mfill_zeropage(struct mm_struct *dst_mm, unsigned long start,
- unsigned long len, atomic_t *mmap_changing)
+ssize_t mfill_atomic_zeropage(struct mm_struct *dst_mm, unsigned long start,
+ unsigned long len, atomic_t *mmap_changing)
{
- return __mcopy_atomic(dst_mm, start, 0, len, MCOPY_ATOMIC_ZEROPAGE,
- mmap_changing, 0);
+ return mfill_atomic(dst_mm, start, 0, len, MCOPY_ATOMIC_ZEROPAGE,
+ mmap_changing, 0);
}
-ssize_t mcopy_continue(struct mm_struct *dst_mm, unsigned long start,
- unsigned long len, atomic_t *mmap_changing)
+ssize_t mfill_atomic_continue(struct mm_struct *dst_mm, unsigned long start,
+ unsigned long len, atomic_t *mmap_changing)
{
- return __mcopy_atomic(dst_mm, start, 0, len, MCOPY_ATOMIC_CONTINUE,
- mmap_changing, 0);
+ return mfill_atomic(dst_mm, start, 0, len, MCOPY_ATOMIC_CONTINUE,
+ mmap_changing, 0);
}
long uffd_wp_range(struct mm_struct *dst_mm, struct vm_area_struct *dst_vma,
--
2.40.0.rc1.284.g88254d51c5-goog
Quite a few userfaultfd functions took both mm and vma pointers as
arguments. Since the mm is trivially accessible via vma->vm_mm, there's
no reason to pass both; it just needlessly extends the already long
argument list.
Get rid of the mm pointer, where possible, to shorten the argument list.
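A minimal sketch of the idea, using hypothetical stand-ins for `struct mm_struct` and `struct vm_area_struct` (the real structures are far larger, and `account_fill*()` is invented for illustration). Once the callee derives the mm from the VMA, callers can no longer pass a mismatched pair. A function that uses the mm repeatedly can cache it in a local, as this patch does in e.g. `mfill_atomic_install_pte()`.

```c
#include <assert.h>

/* Hypothetical, heavily simplified stand-ins for the kernel types. */
struct mm_struct {
	long rss;     /* pages mapped into this address space */
	long charged; /* pages charged to its (pretend) memcg */
};

struct vm_area_struct {
	struct mm_struct *vm_mm; /* the address space this VMA belongs to */
	unsigned long vm_start, vm_end;
};

/*
 * Before: both pointers are passed, and every caller must keep them
 * consistent (dst_mm has to equal dst_vma->vm_mm).
 */
static void account_fill_old(struct mm_struct *dst_mm,
			     struct vm_area_struct *dst_vma, long nr)
{
	(void)dst_vma;
	dst_mm->rss += nr;
	dst_mm->charged += nr;
}

/* After: only the VMA is passed; the mm is cached in a local. */
static void account_fill(struct vm_area_struct *dst_vma, long nr)
{
	struct mm_struct *dst_mm = dst_vma->vm_mm;

	dst_mm->rss += nr;
	dst_mm->charged += nr;
}
```

Both versions update the same mm; the second simply has one fewer argument that could go wrong.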
Acked-by: Peter Xu <[email protected]>
Signed-off-by: Axel Rasmussen <[email protected]>
---
fs/userfaultfd.c | 2 +-
include/linux/hugetlb.h | 5 ++-
include/linux/shmem_fs.h | 4 +--
include/linux/userfaultfd_k.h | 4 +--
mm/hugetlb.c | 4 +--
mm/shmem.c | 7 ++--
mm/userfaultfd.c | 61 +++++++++++++++++------------------
7 files changed, 41 insertions(+), 46 deletions(-)
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 365bf00dd8dd..84d5d402214a 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1629,7 +1629,7 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
/* Reset ptes for the whole vma range if wr-protected */
if (userfaultfd_wp(vma))
- uffd_wp_range(mm, vma, start, vma_end - start, false);
+ uffd_wp_range(vma, start, vma_end - start, false);
new_flags = vma->vm_flags & ~__VM_UFFD_FLAGS;
prev = vma_merge(&vmi, mm, prev, start, vma_end, new_flags,
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 8f0467bf1cbd..8b9325f77ac3 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -158,7 +158,7 @@ unsigned long hugetlb_total_pages(void);
vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long address, unsigned int flags);
#ifdef CONFIG_USERFAULTFD
-int hugetlb_mfill_atomic_pte(struct mm_struct *dst_mm, pte_t *dst_pte,
+int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
struct vm_area_struct *dst_vma,
unsigned long dst_addr,
unsigned long src_addr,
@@ -393,8 +393,7 @@ static inline void hugetlb_free_pgd_range(struct mmu_gather *tlb,
}
#ifdef CONFIG_USERFAULTFD
-static inline int hugetlb_mfill_atomic_pte(struct mm_struct *dst_mm,
- pte_t *dst_pte,
+static inline int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
struct vm_area_struct *dst_vma,
unsigned long dst_addr,
unsigned long src_addr,
diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 103d1000a5a2..b82916c25e61 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -151,14 +151,14 @@ extern void shmem_uncharge(struct inode *inode, long pages);
#ifdef CONFIG_USERFAULTFD
#ifdef CONFIG_SHMEM
-extern int shmem_mfill_atomic_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
+extern int shmem_mfill_atomic_pte(pmd_t *dst_pmd,
struct vm_area_struct *dst_vma,
unsigned long dst_addr,
unsigned long src_addr,
bool zeropage, bool wp_copy,
struct page **pagep);
#else /* !CONFIG_SHMEM */
-#define shmem_mfill_atomic_pte(dst_mm, dst_pmd, dst_vma, dst_addr, \
+#define shmem_mfill_atomic_pte(dst_pmd, dst_vma, dst_addr, \
src_addr, zeropage, wp_copy, pagep) ({ BUG(); 0; })
#endif /* CONFIG_SHMEM */
#endif /* CONFIG_USERFAULTFD */
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index 468080125612..ba79e296fcc7 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -56,7 +56,7 @@ enum mcopy_atomic_mode {
MCOPY_ATOMIC_CONTINUE,
};
-extern int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
+extern int mfill_atomic_install_pte(pmd_t *dst_pmd,
struct vm_area_struct *dst_vma,
unsigned long dst_addr, struct page *page,
bool newly_allocated, bool wp_copy);
@@ -73,7 +73,7 @@ extern ssize_t mfill_atomic_continue(struct mm_struct *dst_mm, unsigned long dst
extern int mwriteprotect_range(struct mm_struct *dst_mm,
unsigned long start, unsigned long len,
bool enable_wp, atomic_t *mmap_changing);
-extern long uffd_wp_range(struct mm_struct *dst_mm, struct vm_area_struct *vma,
+extern long uffd_wp_range(struct vm_area_struct *vma,
unsigned long start, unsigned long len, bool enable_wp);
/* mm helpers */
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 4c9276549394..fe043034ab46 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6157,8 +6157,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
* Used by userfaultfd UFFDIO_* ioctls. Based on userfaultfd's mfill_atomic_pte
* with modifications for hugetlb pages.
*/
-int hugetlb_mfill_atomic_pte(struct mm_struct *dst_mm,
- pte_t *dst_pte,
+int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
struct vm_area_struct *dst_vma,
unsigned long dst_addr,
unsigned long src_addr,
@@ -6166,6 +6165,7 @@ int hugetlb_mfill_atomic_pte(struct mm_struct *dst_mm,
struct page **pagep,
bool wp_copy)
{
+ struct mm_struct *dst_mm = dst_vma->vm_mm;
bool is_continue = (mode == MCOPY_ATOMIC_CONTINUE);
struct hstate *h = hstate_vma(dst_vma);
struct address_space *mapping = dst_vma->vm_file->f_mapping;
diff --git a/mm/shmem.c b/mm/shmem.c
index 448f393d8ab2..1d751b6cf1ac 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2415,8 +2415,7 @@ static struct inode *shmem_get_inode(struct mnt_idmap *idmap, struct super_block
}
#ifdef CONFIG_USERFAULTFD
-int shmem_mfill_atomic_pte(struct mm_struct *dst_mm,
- pmd_t *dst_pmd,
+int shmem_mfill_atomic_pte(pmd_t *dst_pmd,
struct vm_area_struct *dst_vma,
unsigned long dst_addr,
unsigned long src_addr,
@@ -2506,11 +2505,11 @@ int shmem_mfill_atomic_pte(struct mm_struct *dst_mm,
goto out_release;
ret = shmem_add_to_page_cache(folio, mapping, pgoff, NULL,
- gfp & GFP_RECLAIM_MASK, dst_mm);
+ gfp & GFP_RECLAIM_MASK, dst_vma->vm_mm);
if (ret)
goto out_release;
- ret = mfill_atomic_install_pte(dst_mm, dst_pmd, dst_vma, dst_addr,
+ ret = mfill_atomic_install_pte(dst_pmd, dst_vma, dst_addr,
&folio->page, true, wp_copy);
if (ret)
goto out_delete_from_cache;
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 84db5b2fad3a..4fc373476739 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -55,12 +55,13 @@ struct vm_area_struct *find_dst_vma(struct mm_struct *dst_mm,
* This function handles both MCOPY_ATOMIC_NORMAL and _CONTINUE for both shmem
* and anon, and for both shared and private VMAs.
*/
-int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
+int mfill_atomic_install_pte(pmd_t *dst_pmd,
struct vm_area_struct *dst_vma,
unsigned long dst_addr, struct page *page,
bool newly_allocated, bool wp_copy)
{
int ret;
+ struct mm_struct *dst_mm = dst_vma->vm_mm;
pte_t _dst_pte, *dst_pte;
bool writable = dst_vma->vm_flags & VM_WRITE;
bool vm_shared = dst_vma->vm_flags & VM_SHARED;
@@ -127,8 +128,7 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
return ret;
}
-static int mfill_atomic_pte_copy(struct mm_struct *dst_mm,
- pmd_t *dst_pmd,
+static int mfill_atomic_pte_copy(pmd_t *dst_pmd,
struct vm_area_struct *dst_vma,
unsigned long dst_addr,
unsigned long src_addr,
@@ -190,10 +190,10 @@ static int mfill_atomic_pte_copy(struct mm_struct *dst_mm,
__SetPageUptodate(page);
ret = -ENOMEM;
- if (mem_cgroup_charge(page_folio(page), dst_mm, GFP_KERNEL))
+ if (mem_cgroup_charge(page_folio(page), dst_vma->vm_mm, GFP_KERNEL))
goto out_release;
- ret = mfill_atomic_install_pte(dst_mm, dst_pmd, dst_vma, dst_addr,
+ ret = mfill_atomic_install_pte(dst_pmd, dst_vma, dst_addr,
page, true, wp_copy);
if (ret)
goto out_release;
@@ -204,8 +204,7 @@ static int mfill_atomic_pte_copy(struct mm_struct *dst_mm,
goto out;
}
-static int mfill_atomic_pte_zeropage(struct mm_struct *dst_mm,
- pmd_t *dst_pmd,
+static int mfill_atomic_pte_zeropage(pmd_t *dst_pmd,
struct vm_area_struct *dst_vma,
unsigned long dst_addr)
{
@@ -217,7 +216,7 @@ static int mfill_atomic_pte_zeropage(struct mm_struct *dst_mm,
_dst_pte = pte_mkspecial(pfn_pte(my_zero_pfn(dst_addr),
dst_vma->vm_page_prot));
- dst_pte = pte_offset_map_lock(dst_mm, dst_pmd, dst_addr, &ptl);
+ dst_pte = pte_offset_map_lock(dst_vma->vm_mm, dst_pmd, dst_addr, &ptl);
if (dst_vma->vm_file) {
/* the shmem MAP_PRIVATE case requires checking the i_size */
inode = dst_vma->vm_file->f_inode;
@@ -230,7 +229,7 @@ static int mfill_atomic_pte_zeropage(struct mm_struct *dst_mm,
ret = -EEXIST;
if (!pte_none(*dst_pte))
goto out_unlock;
- set_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);
+ set_pte_at(dst_vma->vm_mm, dst_addr, dst_pte, _dst_pte);
/* No need to invalidate - it was non-present before */
update_mmu_cache(dst_vma, dst_addr, dst_pte);
ret = 0;
@@ -240,8 +239,7 @@ static int mfill_atomic_pte_zeropage(struct mm_struct *dst_mm,
}
/* Handles UFFDIO_CONTINUE for all shmem VMAs (shared or private). */
-static int mfill_atomic_pte_continue(struct mm_struct *dst_mm,
- pmd_t *dst_pmd,
+static int mfill_atomic_pte_continue(pmd_t *dst_pmd,
struct vm_area_struct *dst_vma,
unsigned long dst_addr,
bool wp_copy)
@@ -269,7 +267,7 @@ static int mfill_atomic_pte_continue(struct mm_struct *dst_mm,
goto out_release;
}
- ret = mfill_atomic_install_pte(dst_mm, dst_pmd, dst_vma, dst_addr,
+ ret = mfill_atomic_install_pte(dst_pmd, dst_vma, dst_addr,
page, false, wp_copy);
if (ret)
goto out_release;
@@ -310,7 +308,7 @@ static pmd_t *mm_alloc_pmd(struct mm_struct *mm, unsigned long address)
* mfill_atomic processing for HUGETLB vmas. Note that this routine is
* called with mmap_lock held, it will release mmap_lock before returning.
*/
-static __always_inline ssize_t mfill_atomic_hugetlb(struct mm_struct *dst_mm,
+static __always_inline ssize_t mfill_atomic_hugetlb(
struct vm_area_struct *dst_vma,
unsigned long dst_start,
unsigned long src_start,
@@ -318,6 +316,7 @@ static __always_inline ssize_t mfill_atomic_hugetlb(struct mm_struct *dst_mm,
enum mcopy_atomic_mode mode,
bool wp_copy)
{
+ struct mm_struct *dst_mm = dst_vma->vm_mm;
int vm_shared = dst_vma->vm_flags & VM_SHARED;
ssize_t err;
pte_t *dst_pte;
@@ -411,7 +410,7 @@ static __always_inline ssize_t mfill_atomic_hugetlb(struct mm_struct *dst_mm,
goto out_unlock;
}
- err = hugetlb_mfill_atomic_pte(dst_mm, dst_pte, dst_vma,
+ err = hugetlb_mfill_atomic_pte(dst_pte, dst_vma,
dst_addr, src_addr, mode, &page,
wp_copy);
@@ -463,17 +462,15 @@ static __always_inline ssize_t mfill_atomic_hugetlb(struct mm_struct *dst_mm,
}
#else /* !CONFIG_HUGETLB_PAGE */
/* fail at build time if gcc attempts to use this */
-extern ssize_t mfill_atomic_hugetlb(struct mm_struct *dst_mm,
- struct vm_area_struct *dst_vma,
- unsigned long dst_start,
- unsigned long src_start,
- unsigned long len,
- enum mcopy_atomic_mode mode,
- bool wp_copy);
+extern ssize_t mfill_atomic_hugetlb(struct vm_area_struct *dst_vma,
+ unsigned long dst_start,
+ unsigned long src_start,
+ unsigned long len,
+ enum mcopy_atomic_mode mode,
+ bool wp_copy);
#endif /* CONFIG_HUGETLB_PAGE */
-static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm,
- pmd_t *dst_pmd,
+static __always_inline ssize_t mfill_atomic_pte(pmd_t *dst_pmd,
struct vm_area_struct *dst_vma,
unsigned long dst_addr,
unsigned long src_addr,
@@ -484,7 +481,7 @@ static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm,
ssize_t err;
if (mode == MCOPY_ATOMIC_CONTINUE) {
- return mfill_atomic_pte_continue(dst_mm, dst_pmd, dst_vma,
+ return mfill_atomic_pte_continue(dst_pmd, dst_vma,
dst_addr, wp_copy);
}
@@ -500,14 +497,14 @@ static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm,
*/
if (!(dst_vma->vm_flags & VM_SHARED)) {
if (mode == MCOPY_ATOMIC_NORMAL)
- err = mfill_atomic_pte_copy(dst_mm, dst_pmd, dst_vma,
+ err = mfill_atomic_pte_copy(dst_pmd, dst_vma,
dst_addr, src_addr, page,
wp_copy);
else
- err = mfill_atomic_pte_zeropage(dst_mm, dst_pmd,
+ err = mfill_atomic_pte_zeropage(dst_pmd,
dst_vma, dst_addr);
} else {
- err = shmem_mfill_atomic_pte(dst_mm, dst_pmd, dst_vma,
+ err = shmem_mfill_atomic_pte(dst_pmd, dst_vma,
dst_addr, src_addr,
mode != MCOPY_ATOMIC_NORMAL,
wp_copy, page);
@@ -588,7 +585,7 @@ static __always_inline ssize_t mfill_atomic(struct mm_struct *dst_mm,
* If this is a HUGETLB vma, pass off to appropriate routine
*/
if (is_vm_hugetlb_page(dst_vma))
- return mfill_atomic_hugetlb(dst_mm, dst_vma, dst_start,
+ return mfill_atomic_hugetlb(dst_vma, dst_start,
src_start, len, mcopy_mode,
wp_copy);
@@ -641,7 +638,7 @@ static __always_inline ssize_t mfill_atomic(struct mm_struct *dst_mm,
BUG_ON(pmd_none(*dst_pmd));
BUG_ON(pmd_trans_huge(*dst_pmd));
- err = mfill_atomic_pte(dst_mm, dst_pmd, dst_vma, dst_addr,
+ err = mfill_atomic_pte(dst_pmd, dst_vma, dst_addr,
src_addr, &page, mcopy_mode, wp_copy);
cond_resched();
@@ -710,7 +707,7 @@ ssize_t mfill_atomic_continue(struct mm_struct *dst_mm, unsigned long start,
mmap_changing, 0);
}
-long uffd_wp_range(struct mm_struct *dst_mm, struct vm_area_struct *dst_vma,
+long uffd_wp_range(struct vm_area_struct *dst_vma,
unsigned long start, unsigned long len, bool enable_wp)
{
unsigned int mm_cp_flags;
@@ -730,7 +727,7 @@ long uffd_wp_range(struct mm_struct *dst_mm, struct vm_area_struct *dst_vma,
*/
if (!enable_wp && vma_wants_manual_pte_write_upgrade(dst_vma))
mm_cp_flags |= MM_CP_TRY_CHANGE_WRITABLE;
- tlb_gather_mmu(&tlb, dst_mm);
+ tlb_gather_mmu(&tlb, dst_vma->vm_mm);
ret = change_protection(&tlb, dst_vma, start, start + len, mm_cp_flags);
tlb_finish_mmu(&tlb);
@@ -782,7 +779,7 @@ int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,
goto out_unlock;
}
- err = uffd_wp_range(dst_mm, dst_vma, start, len, enable_wp);
+ err = uffd_wp_range(dst_vma, start, len, enable_wp);
/* Return 0 on success, <0 on failures */
if (err > 0)
--
2.40.0.rc1.284.g88254d51c5-goog
Many userfaultfd ioctl functions take both a 'mode' and a 'wp_copy'
argument. In future commits we plan to plumb the flags through to more
places, which would make these already very long argument lists even
longer.
Let's take the time to simplify the argument list. Combine the two
arguments into one, and generalize it, so that adding more flags in the
future doesn't require adding more function arguments.
Since the modes (copy, zeropage, continue) are mutually exclusive, store
them as an integer value (0, 1, 2) in the low bits. Place combinable
flag bits in the high bits.
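A userspace sketch of this encoding, assuming a plain `unsigned int` in place of the sparse-checked `__bitwise` type and a hard-coded `MFILL_ATOMIC_MODE_BITS` in place of the kernel's `const_ilog2()` computation. Defining `MFILL_ATOMIC_WP` as flag 0 is also an assumption for illustration. The two helpers mirror `uffd_flags_has_mode()` and `uffd_flags_set_mode()` from this patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Plain unsigned int here; the kernel type is also marked __bitwise for sparse. */
typedef unsigned int uffd_flags_t;

/* Mutually exclusive modes of operation, stored in the low bits. */
enum mfill_atomic_mode {
	MFILL_ATOMIC_COPY,	/* 0 */
	MFILL_ATOMIC_ZEROPAGE,	/* 1 */
	MFILL_ATOMIC_CONTINUE,	/* 2 */
	NR_MFILL_ATOMIC_MODES,
};

/* Assumed: hard-coded here; the kernel derives it with const_ilog2(). */
#define MFILL_ATOMIC_MODE_BITS 2
_Static_assert((1u << MFILL_ATOMIC_MODE_BITS) >= NR_MFILL_ATOMIC_MODES,
	       "not enough bits to encode every mode");

#define MFILL_ATOMIC_BIT(nr)	(1u << (MFILL_ATOMIC_MODE_BITS + (nr)))
#define MFILL_ATOMIC_FLAG(nr)	((uffd_flags_t)MFILL_ATOMIC_BIT(nr))
#define MFILL_ATOMIC_MODE_MASK	((uffd_flags_t)(MFILL_ATOMIC_BIT(0) - 1))

/* Combinable behavior flags sit above the mode bits (assumed: WP is flag 0). */
#define MFILL_ATOMIC_WP		MFILL_ATOMIC_FLAG(0)

static inline bool uffd_flags_has_mode(uffd_flags_t flags,
				       enum mfill_atomic_mode expected)
{
	return (flags & MFILL_ATOMIC_MODE_MASK) == (uffd_flags_t)expected;
}

/* ORs the mode in, so it assumes the mode bits of flags are still clear. */
static inline uffd_flags_t uffd_flags_set_mode(uffd_flags_t flags,
					       enum mfill_atomic_mode mode)
{
	return flags | (uffd_flags_t)mode;
}
```

Three modes need two bits, so the mode mask is `0x3` and the first flag bit is `0x4`. For example, `uffd_flags_set_mode(MFILL_ATOMIC_WP, MFILL_ATOMIC_CONTINUE)` yields `0x6`. Adding a second flag would only need another `MFILL_ATOMIC_FLAG(1)` definition, not another function argument.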
This is quite similar to an earlier patch proposed by Nadav Amit
("userfaultfd: introduce uffd_flags" [1]). The main difference is that
Nadav's patch only handled flags, whereas this patch *also* combines the
"mode" argument into the same type to shorten the argument list.
[1]: https://lore.kernel.org/all/[email protected]/
Acked-by: James Houghton <[email protected]>
Signed-off-by: Axel Rasmussen <[email protected]>
---
fs/userfaultfd.c | 5 ++-
include/linux/hugetlb.h | 10 ++---
include/linux/shmem_fs.h | 5 ++-
include/linux/userfaultfd_k.h | 45 +++++++++++++--------
mm/hugetlb.c | 12 +++---
mm/shmem.c | 7 ++--
mm/userfaultfd.c | 76 ++++++++++++++++-------------------
7 files changed, 83 insertions(+), 77 deletions(-)
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 84d5d402214a..56e54e50414e 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1714,6 +1714,7 @@ static int userfaultfd_copy(struct userfaultfd_ctx *ctx,
struct uffdio_copy uffdio_copy;
struct uffdio_copy __user *user_uffdio_copy;
struct userfaultfd_wake_range range;
+ uffd_flags_t flags = 0;
user_uffdio_copy = (struct uffdio_copy __user *) arg;
@@ -1740,10 +1741,12 @@ static int userfaultfd_copy(struct userfaultfd_ctx *ctx,
goto out;
if (uffdio_copy.mode & ~(UFFDIO_COPY_MODE_DONTWAKE|UFFDIO_COPY_MODE_WP))
goto out;
+ if (uffdio_copy.mode & UFFDIO_COPY_MODE_WP)
+ flags |= MFILL_ATOMIC_WP;
if (mmget_not_zero(ctx->mm)) {
ret = mfill_atomic_copy(ctx->mm, uffdio_copy.dst, uffdio_copy.src,
uffdio_copy.len, &ctx->mmap_changing,
- uffdio_copy.mode);
+ flags);
mmput(ctx->mm);
} else {
return -ESRCH;
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 8b9325f77ac3..6270a4786584 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -162,9 +162,8 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
struct vm_area_struct *dst_vma,
unsigned long dst_addr,
unsigned long src_addr,
- enum mcopy_atomic_mode mode,
- struct page **pagep,
- bool wp_copy);
+ uffd_flags_t flags,
+ struct page **pagep);
#endif /* CONFIG_USERFAULTFD */
bool hugetlb_reserve_pages(struct inode *inode, long from, long to,
struct vm_area_struct *vma,
@@ -397,9 +396,8 @@ static inline int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
struct vm_area_struct *dst_vma,
unsigned long dst_addr,
unsigned long src_addr,
- enum mcopy_atomic_mode mode,
- struct page **pagep,
- bool wp_copy)
+ uffd_flags_t flags,
+ struct page **pagep)
{
BUG();
return 0;
diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index b82916c25e61..b7048bd88a8d 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -9,6 +9,7 @@
#include <linux/percpu_counter.h>
#include <linux/xattr.h>
#include <linux/fs_parser.h>
+#include <linux/userfaultfd_k.h>
/* inode in-kernel data */
@@ -155,11 +156,11 @@ extern int shmem_mfill_atomic_pte(pmd_t *dst_pmd,
struct vm_area_struct *dst_vma,
unsigned long dst_addr,
unsigned long src_addr,
- bool zeropage, bool wp_copy,
+ uffd_flags_t flags,
struct page **pagep);
#else /* !CONFIG_SHMEM */
#define shmem_mfill_atomic_pte(dst_pmd, dst_vma, dst_addr, \
- src_addr, zeropage, wp_copy, pagep) ({ BUG(); 0; })
+ src_addr, flags, pagep) ({ BUG(); 0; })
#endif /* CONFIG_SHMEM */
#endif /* CONFIG_USERFAULTFD */
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index ba79e296fcc7..4d7425684171 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -40,30 +40,43 @@ extern int sysctl_unprivileged_userfaultfd;
extern vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason);
-/*
- * The mode of operation for __mcopy_atomic and its helpers.
- *
- * This is almost an implementation detail (mcopy_atomic below doesn't take this
- * as a parameter), but it's exposed here because memory-kind-specific
- * implementations (e.g. hugetlbfs) need to know the mode of operation.
- */
-enum mcopy_atomic_mode {
- /* A normal copy_from_user into the destination range. */
- MCOPY_ATOMIC_NORMAL,
- /* Don't copy; map the destination range to the zero page. */
- MCOPY_ATOMIC_ZEROPAGE,
- /* Just install pte(s) with the existing page(s) in the page cache. */
- MCOPY_ATOMIC_CONTINUE,
+/* A combined operation mode + behavior flags. */
+typedef unsigned int __bitwise uffd_flags_t;
+
+/* Mutually exclusive modes of operation. */
+enum mfill_atomic_mode {
+ MFILL_ATOMIC_COPY,
+ MFILL_ATOMIC_ZEROPAGE,
+ MFILL_ATOMIC_CONTINUE,
+ NR_MFILL_ATOMIC_MODES,
};
+#define MFILL_ATOMIC_MODE_BITS (const_ilog2(NR_MFILL_ATOMIC_MODES - 1) + 1)
+#define MFILL_ATOMIC_BIT(nr) BIT(MFILL_ATOMIC_MODE_BITS + (nr))
+#define MFILL_ATOMIC_FLAG(nr) ((__force uffd_flags_t) MFILL_ATOMIC_BIT(nr))
+#define MFILL_ATOMIC_MODE_MASK ((__force uffd_flags_t) (MFILL_ATOMIC_BIT(0) - 1))
+
+static inline bool uffd_flags_has_mode(uffd_flags_t flags, enum mfill_atomic_mode expected)
+{
+ return (flags & MFILL_ATOMIC_MODE_MASK) == ((__force uffd_flags_t) expected);
+}
+
+static inline uffd_flags_t uffd_flags_set_mode(uffd_flags_t flags, enum mfill_atomic_mode mode)
+{
+ return flags | ((__force uffd_flags_t) mode);
+}
+
+/* Flags controlling behavior. */
+#define MFILL_ATOMIC_WP MFILL_ATOMIC_FLAG(0)
+
extern int mfill_atomic_install_pte(pmd_t *dst_pmd,
struct vm_area_struct *dst_vma,
unsigned long dst_addr, struct page *page,
- bool newly_allocated, bool wp_copy);
+ bool newly_allocated, uffd_flags_t flags);
extern ssize_t mfill_atomic_copy(struct mm_struct *dst_mm, unsigned long dst_start,
unsigned long src_start, unsigned long len,
- atomic_t *mmap_changing, __u64 mode);
+ atomic_t *mmap_changing, uffd_flags_t flags);
extern ssize_t mfill_atomic_zeropage(struct mm_struct *dst_mm,
unsigned long dst_start,
unsigned long len,
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index fe043034ab46..493406a2d61e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6161,12 +6161,12 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
struct vm_area_struct *dst_vma,
unsigned long dst_addr,
unsigned long src_addr,
- enum mcopy_atomic_mode mode,
- struct page **pagep,
- bool wp_copy)
+ uffd_flags_t flags,
+ struct page **pagep)
{
struct mm_struct *dst_mm = dst_vma->vm_mm;
- bool is_continue = (mode == MCOPY_ATOMIC_CONTINUE);
+ bool is_continue = uffd_flags_has_mode(flags, MFILL_ATOMIC_CONTINUE);
+ bool wp_enabled = (flags & MFILL_ATOMIC_WP);
struct hstate *h = hstate_vma(dst_vma);
struct address_space *mapping = dst_vma->vm_file->f_mapping;
pgoff_t idx = vma_hugecache_offset(h, dst_vma, dst_addr);
@@ -6301,7 +6301,7 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
* For either: (1) CONTINUE on a non-shared VMA, or (2) UFFDIO_COPY
* with wp flag set, don't set pte write bit.
*/
- if (wp_copy || (is_continue && !vm_shared))
+ if (wp_enabled || (is_continue && !vm_shared))
writable = 0;
else
writable = dst_vma->vm_flags & VM_WRITE;
@@ -6316,7 +6316,7 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
_dst_pte = huge_pte_mkdirty(_dst_pte);
_dst_pte = pte_mkyoung(_dst_pte);
- if (wp_copy)
+ if (wp_enabled)
_dst_pte = huge_pte_mkuffd_wp(_dst_pte);
set_huge_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);
diff --git a/mm/shmem.c b/mm/shmem.c
index 1d751b6cf1ac..7d688afb5e31 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -76,7 +76,6 @@ static struct vfsmount *shm_mnt;
#include <linux/syscalls.h>
#include <linux/fcntl.h>
#include <uapi/linux/memfd.h>
-#include <linux/userfaultfd_k.h>
#include <linux/rmap.h>
#include <linux/uuid.h>
@@ -2419,7 +2418,7 @@ int shmem_mfill_atomic_pte(pmd_t *dst_pmd,
struct vm_area_struct *dst_vma,
unsigned long dst_addr,
unsigned long src_addr,
- bool zeropage, bool wp_copy,
+ uffd_flags_t flags,
struct page **pagep)
{
struct inode *inode = file_inode(dst_vma->vm_file);
@@ -2451,7 +2450,7 @@ int shmem_mfill_atomic_pte(pmd_t *dst_pmd,
if (!folio)
goto out_unacct_blocks;
- if (!zeropage) { /* COPY */
+ if (uffd_flags_has_mode(flags, MFILL_ATOMIC_COPY)) {
page_kaddr = kmap_local_folio(folio, 0);
/*
* The read mmap_lock is held here. Despite the
@@ -2510,7 +2509,7 @@ int shmem_mfill_atomic_pte(pmd_t *dst_pmd,
goto out_release;
ret = mfill_atomic_install_pte(dst_pmd, dst_vma, dst_addr,
- &folio->page, true, wp_copy);
+ &folio->page, true, flags);
if (ret)
goto out_delete_from_cache;
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 4fc373476739..dd807924446f 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -58,7 +58,7 @@ struct vm_area_struct *find_dst_vma(struct mm_struct *dst_mm,
int mfill_atomic_install_pte(pmd_t *dst_pmd,
struct vm_area_struct *dst_vma,
unsigned long dst_addr, struct page *page,
- bool newly_allocated, bool wp_copy)
+ bool newly_allocated, uffd_flags_t flags)
{
int ret;
struct mm_struct *dst_mm = dst_vma->vm_mm;
@@ -77,7 +77,7 @@ int mfill_atomic_install_pte(pmd_t *dst_pmd,
writable = false;
if (writable)
_dst_pte = pte_mkwrite(_dst_pte);
- if (wp_copy)
+ if (flags & MFILL_ATOMIC_WP)
_dst_pte = pte_mkuffd_wp(_dst_pte);
dst_pte = pte_offset_map_lock(dst_mm, dst_pmd, dst_addr, &ptl);
@@ -132,8 +132,8 @@ static int mfill_atomic_pte_copy(pmd_t *dst_pmd,
struct vm_area_struct *dst_vma,
unsigned long dst_addr,
unsigned long src_addr,
- struct page **pagep,
- bool wp_copy)
+ uffd_flags_t flags,
+ struct page **pagep)
{
void *page_kaddr;
int ret;
@@ -194,7 +194,7 @@ static int mfill_atomic_pte_copy(pmd_t *dst_pmd,
goto out_release;
ret = mfill_atomic_install_pte(dst_pmd, dst_vma, dst_addr,
- page, true, wp_copy);
+ page, true, flags);
if (ret)
goto out_release;
out:
@@ -242,7 +242,7 @@ static int mfill_atomic_pte_zeropage(pmd_t *dst_pmd,
static int mfill_atomic_pte_continue(pmd_t *dst_pmd,
struct vm_area_struct *dst_vma,
unsigned long dst_addr,
- bool wp_copy)
+ uffd_flags_t flags)
{
struct inode *inode = file_inode(dst_vma->vm_file);
pgoff_t pgoff = linear_page_index(dst_vma, dst_addr);
@@ -268,7 +268,7 @@ static int mfill_atomic_pte_continue(pmd_t *dst_pmd,
}
ret = mfill_atomic_install_pte(dst_pmd, dst_vma, dst_addr,
- page, false, wp_copy);
+ page, false, flags);
if (ret)
goto out_release;
@@ -313,8 +313,7 @@ static __always_inline ssize_t mfill_atomic_hugetlb(
unsigned long dst_start,
unsigned long src_start,
unsigned long len,
- enum mcopy_atomic_mode mode,
- bool wp_copy)
+ uffd_flags_t flags)
{
struct mm_struct *dst_mm = dst_vma->vm_mm;
int vm_shared = dst_vma->vm_flags & VM_SHARED;
@@ -334,7 +333,7 @@ static __always_inline ssize_t mfill_atomic_hugetlb(
* by THP. Since we can not reliably insert a zero page, this
* feature is not supported.
*/
- if (mode == MCOPY_ATOMIC_ZEROPAGE) {
+ if (uffd_flags_has_mode(flags, MFILL_ATOMIC_ZEROPAGE)) {
mmap_read_unlock(dst_mm);
return -EINVAL;
}
@@ -402,7 +401,7 @@ static __always_inline ssize_t mfill_atomic_hugetlb(
goto out_unlock;
}
- if (mode != MCOPY_ATOMIC_CONTINUE &&
+ if (!uffd_flags_has_mode(flags, MFILL_ATOMIC_CONTINUE) &&
!huge_pte_none_mostly(huge_ptep_get(dst_pte))) {
err = -EEXIST;
hugetlb_vma_unlock_read(dst_vma);
@@ -410,9 +409,8 @@ static __always_inline ssize_t mfill_atomic_hugetlb(
goto out_unlock;
}
- err = hugetlb_mfill_atomic_pte(dst_pte, dst_vma,
- dst_addr, src_addr, mode, &page,
- wp_copy);
+ err = hugetlb_mfill_atomic_pte(dst_pte, dst_vma, dst_addr,
+ src_addr, flags, &page);
hugetlb_vma_unlock_read(dst_vma);
mutex_unlock(&hugetlb_fault_mutex_table[hash]);
@@ -466,23 +464,21 @@ extern ssize_t mfill_atomic_hugetlb(struct vm_area_struct *dst_vma,
unsigned long dst_start,
unsigned long src_start,
unsigned long len,
- enum mcopy_atomic_mode mode,
- bool wp_copy);
+ uffd_flags_t flags);
#endif /* CONFIG_HUGETLB_PAGE */
static __always_inline ssize_t mfill_atomic_pte(pmd_t *dst_pmd,
struct vm_area_struct *dst_vma,
unsigned long dst_addr,
unsigned long src_addr,
- struct page **page,
- enum mcopy_atomic_mode mode,
- bool wp_copy)
+ uffd_flags_t flags,
+ struct page **pagep)
{
ssize_t err;
- if (mode == MCOPY_ATOMIC_CONTINUE) {
+ if (uffd_flags_has_mode(flags, MFILL_ATOMIC_CONTINUE)) {
return mfill_atomic_pte_continue(dst_pmd, dst_vma,
- dst_addr, wp_copy);
+ dst_addr, flags);
}
/*
@@ -496,18 +492,17 @@ static __always_inline ssize_t mfill_atomic_pte(pmd_t *dst_pmd,
* and not in the radix tree.
*/
if (!(dst_vma->vm_flags & VM_SHARED)) {
- if (mode == MCOPY_ATOMIC_NORMAL)
+ if (uffd_flags_has_mode(flags, MFILL_ATOMIC_COPY))
err = mfill_atomic_pte_copy(dst_pmd, dst_vma,
- dst_addr, src_addr, page,
- wp_copy);
+ dst_addr, src_addr,
+ flags, pagep);
else
err = mfill_atomic_pte_zeropage(dst_pmd,
dst_vma, dst_addr);
} else {
err = shmem_mfill_atomic_pte(dst_pmd, dst_vma,
dst_addr, src_addr,
- mode != MCOPY_ATOMIC_NORMAL,
- wp_copy, page);
+ flags, pagep);
}
return err;
@@ -517,9 +512,8 @@ static __always_inline ssize_t mfill_atomic(struct mm_struct *dst_mm,
unsigned long dst_start,
unsigned long src_start,
unsigned long len,
- enum mcopy_atomic_mode mcopy_mode,
atomic_t *mmap_changing,
- __u64 mode)
+ uffd_flags_t flags)
{
struct vm_area_struct *dst_vma;
ssize_t err;
@@ -527,7 +521,6 @@ static __always_inline ssize_t mfill_atomic(struct mm_struct *dst_mm,
unsigned long src_addr, dst_addr;
long copied;
struct page *page;
- bool wp_copy;
/*
* Sanitize the command parameters:
@@ -577,8 +570,7 @@ static __always_inline ssize_t mfill_atomic(struct mm_struct *dst_mm,
* validate 'mode' now that we know the dst_vma: don't allow
* a wrprotect copy if the userfaultfd didn't register as WP.
*/
- wp_copy = mode & UFFDIO_COPY_MODE_WP;
- if (wp_copy && !(dst_vma->vm_flags & VM_UFFD_WP))
+ if ((flags & MFILL_ATOMIC_WP) && !(dst_vma->vm_flags & VM_UFFD_WP))
goto out_unlock;
/*
@@ -586,12 +578,12 @@ static __always_inline ssize_t mfill_atomic(struct mm_struct *dst_mm,
*/
if (is_vm_hugetlb_page(dst_vma))
return mfill_atomic_hugetlb(dst_vma, dst_start,
- src_start, len, mcopy_mode,
- wp_copy);
+ src_start, len, flags);
if (!vma_is_anonymous(dst_vma) && !vma_is_shmem(dst_vma))
goto out_unlock;
- if (!vma_is_shmem(dst_vma) && mcopy_mode == MCOPY_ATOMIC_CONTINUE)
+ if (!vma_is_shmem(dst_vma) &&
+ uffd_flags_has_mode(flags, MFILL_ATOMIC_CONTINUE))
goto out_unlock;
/*
@@ -639,7 +631,7 @@ static __always_inline ssize_t mfill_atomic(struct mm_struct *dst_mm,
BUG_ON(pmd_trans_huge(*dst_pmd));
err = mfill_atomic_pte(dst_pmd, dst_vma, dst_addr,
- src_addr, &page, mcopy_mode, wp_copy);
+ src_addr, flags, &page);
cond_resched();
if (unlikely(err == -ENOENT)) {
@@ -687,24 +679,24 @@ static __always_inline ssize_t mfill_atomic(struct mm_struct *dst_mm,
ssize_t mfill_atomic_copy(struct mm_struct *dst_mm, unsigned long dst_start,
unsigned long src_start, unsigned long len,
- atomic_t *mmap_changing, __u64 mode)
+ atomic_t *mmap_changing, uffd_flags_t flags)
{
- return mfill_atomic(dst_mm, dst_start, src_start, len,
- MCOPY_ATOMIC_NORMAL, mmap_changing, mode);
+ return mfill_atomic(dst_mm, dst_start, src_start, len, mmap_changing,
+ uffd_flags_set_mode(flags, MFILL_ATOMIC_COPY));
}
ssize_t mfill_atomic_zeropage(struct mm_struct *dst_mm, unsigned long start,
unsigned long len, atomic_t *mmap_changing)
{
- return mfill_atomic(dst_mm, start, 0, len, MCOPY_ATOMIC_ZEROPAGE,
- mmap_changing, 0);
+ return mfill_atomic(dst_mm, start, 0, len, mmap_changing,
+ uffd_flags_set_mode(0, MFILL_ATOMIC_ZEROPAGE));
}
ssize_t mfill_atomic_continue(struct mm_struct *dst_mm, unsigned long start,
unsigned long len, atomic_t *mmap_changing)
{
- return mfill_atomic(dst_mm, start, 0, len, MCOPY_ATOMIC_CONTINUE,
- mmap_changing, 0);
+ return mfill_atomic(dst_mm, start, 0, len, mmap_changing,
+ uffd_flags_set_mode(0, MFILL_ATOMIC_CONTINUE));
}
long uffd_wp_range(struct vm_area_struct *dst_vma,
--
2.40.0.rc1.284.g88254d51c5-goog
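As a reading aid for the uffd_flags_t encoding in the patch above, here is a minimal userspace model. It is a sketch under stated assumptions, not the kernel code: the `__bitwise`/`__force` annotations are dropped (they only matter to sparse), and `ilog2_u()` is a hypothetical runtime stand-in for the kernel's `const_ilog2()`. With three modes, MFILL_ATOMIC_MODE_BITS works out to 2, so the mode occupies bits 0-1 and MFILL_ATOMIC_WP lands at bit 2.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-in: the kernel declares this `unsigned int __bitwise`
 * so sparse can flag misuse; plain C does no such checking. */
typedef unsigned int uffd_flags_t;

/* Mutually exclusive modes, stored as a small integer in the low bits. */
enum mfill_atomic_mode {
	MFILL_ATOMIC_COPY,
	MFILL_ATOMIC_ZEROPAGE,
	MFILL_ATOMIC_CONTINUE,
	NR_MFILL_ATOMIC_MODES,
};

/* Hypothetical helper: floor(log2(n)) for n > 0, standing in for const_ilog2(). */
static unsigned int ilog2_u(unsigned int n)
{
	unsigned int r = 0;

	while (n >>= 1)
		r++;
	return r;
}

/* Bits needed to hold the largest mode value (2 needs 2 bits). */
#define MFILL_ATOMIC_MODE_BITS (ilog2_u(NR_MFILL_ATOMIC_MODES - 1) + 1)
/* Behavior flags start immediately above the mode bits. */
#define MFILL_ATOMIC_BIT(nr) (1u << (MFILL_ATOMIC_MODE_BITS + (nr)))
#define MFILL_ATOMIC_MODE_MASK (MFILL_ATOMIC_BIT(0) - 1)
#define MFILL_ATOMIC_WP MFILL_ATOMIC_BIT(0)

static bool uffd_flags_has_mode(uffd_flags_t flags, enum mfill_atomic_mode expected)
{
	/* Compare the whole masked field: a bit test could never detect mode 0. */
	return (flags & MFILL_ATOMIC_MODE_MASK) == (uffd_flags_t)expected;
}

static uffd_flags_t uffd_flags_set_mode(uffd_flags_t flags, enum mfill_atomic_mode mode)
{
	assert(mode < NR_MFILL_ATOMIC_MODES);
	/* Same as the patch: OR the mode in, assuming no mode is set yet. */
	return flags | (uffd_flags_t)mode;
}
```

For example, `uffd_flags_set_mode(MFILL_ATOMIC_WP, MFILL_ATOMIC_CONTINUE)` yields 0x6 (mode 2 plus the WP bit), and `uffd_flags_has_mode()` on that value is true only for MFILL_ATOMIC_CONTINUE. Since MFILL_ATOMIC_COPY is 0, comparing the masked field, rather than testing a single bit, is what makes every mode checkable.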
UFFDIO_COPY already has UFFDIO_COPY_MODE_WP, so when installing a new
PTE to resolve a missing fault, one can install a write-protected one.
This is useful when using UFFDIO_REGISTER_MODE_{MISSING,WP} in
combination.
This was motivated by testing HugeTLB HGM [1], and in particular its
interaction with userfaultfd features. Existing userfaultfd code
supports using WP and MINOR modes together (i.e. you can register an
area with both enabled), but without this CONTINUE flag the combination
is unusable in practice: UFFDIO_CONTINUE has no way to resolve a minor
fault by installing a write-protected PTE.
So, add an analogous UFFDIO_CONTINUE_MODE_WP, which does the same thing
as UFFDIO_COPY_MODE_WP, but for *minor* faults.
Update the selftest to exercise the new flag, at least at a basic level.
[1]: https://patchwork.kernel.org/project/linux-mm/cover/[email protected]/
Acked-by: Peter Xu <[email protected]>
Signed-off-by: Axel Rasmussen <[email protected]>
---
fs/userfaultfd.c | 8 ++++++--
include/linux/userfaultfd_k.h | 3 ++-
include/uapi/linux/userfaultfd.h | 7 +++++++
mm/userfaultfd.c | 5 +++--
tools/testing/selftests/mm/userfaultfd.c | 4 ++++
5 files changed, 22 insertions(+), 5 deletions(-)
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 56e54e50414e..664019381e04 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1878,6 +1878,7 @@ static int userfaultfd_continue(struct userfaultfd_ctx *ctx, unsigned long arg)
struct uffdio_continue uffdio_continue;
struct uffdio_continue __user *user_uffdio_continue;
struct userfaultfd_wake_range range;
+ uffd_flags_t flags = 0;
user_uffdio_continue = (struct uffdio_continue __user *)arg;
@@ -1902,13 +1903,16 @@ static int userfaultfd_continue(struct userfaultfd_ctx *ctx, unsigned long arg)
uffdio_continue.range.start) {
goto out;
}
- if (uffdio_continue.mode & ~UFFDIO_CONTINUE_MODE_DONTWAKE)
+ if (uffdio_continue.mode & ~(UFFDIO_CONTINUE_MODE_DONTWAKE |
+ UFFDIO_CONTINUE_MODE_WP))
goto out;
+ if (uffdio_continue.mode & UFFDIO_CONTINUE_MODE_WP)
+ flags |= MFILL_ATOMIC_WP;
if (mmget_not_zero(ctx->mm)) {
ret = mfill_atomic_continue(ctx->mm, uffdio_continue.range.start,
uffdio_continue.range.len,
- &ctx->mmap_changing);
+ &ctx->mmap_changing, flags);
mmput(ctx->mm);
} else {
return -ESRCH;
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index 4d7425684171..9499cfcf83fa 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -82,7 +82,8 @@ extern ssize_t mfill_atomic_zeropage(struct mm_struct *dst_mm,
unsigned long len,
atomic_t *mmap_changing);
extern ssize_t mfill_atomic_continue(struct mm_struct *dst_mm, unsigned long dst_start,
- unsigned long len, atomic_t *mmap_changing);
+ unsigned long len, atomic_t *mmap_changing,
+ uffd_flags_t flags);
extern int mwriteprotect_range(struct mm_struct *dst_mm,
unsigned long start, unsigned long len,
bool enable_wp, atomic_t *mmap_changing);
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index 005e5e306266..14059a0861bf 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -297,6 +297,13 @@ struct uffdio_writeprotect {
struct uffdio_continue {
struct uffdio_range range;
#define UFFDIO_CONTINUE_MODE_DONTWAKE ((__u64)1<<0)
+ /*
+ * UFFDIO_CONTINUE_MODE_WP will map the page write protected on
+ * the fly. UFFDIO_CONTINUE_MODE_WP is available only if the
+ * write protected ioctl is implemented for the range
+ * according to the uffdio_register.ioctls.
+ */
+#define UFFDIO_CONTINUE_MODE_WP ((__u64)1<<1)
__u64 mode;
/*
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index dd807924446f..2f64e0a9b234 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -693,10 +693,11 @@ ssize_t mfill_atomic_zeropage(struct mm_struct *dst_mm, unsigned long start,
}
ssize_t mfill_atomic_continue(struct mm_struct *dst_mm, unsigned long start,
- unsigned long len, atomic_t *mmap_changing)
+ unsigned long len, atomic_t *mmap_changing,
+ uffd_flags_t flags)
{
return mfill_atomic(dst_mm, start, 0, len, mmap_changing,
- uffd_flags_set_mode(0, MFILL_ATOMIC_CONTINUE));
+ uffd_flags_set_mode(flags, MFILL_ATOMIC_CONTINUE));
}
long uffd_wp_range(struct vm_area_struct *dst_vma,
diff --git a/tools/testing/selftests/mm/userfaultfd.c b/tools/testing/selftests/mm/userfaultfd.c
index 7f22844ed704..41c1f9abc481 100644
--- a/tools/testing/selftests/mm/userfaultfd.c
+++ b/tools/testing/selftests/mm/userfaultfd.c
@@ -585,6 +585,8 @@ static void continue_range(int ufd, __u64 start, __u64 len)
req.range.start = start;
req.range.len = len;
req.mode = 0;
+ if (test_uffdio_wp)
+ req.mode |= UFFDIO_CONTINUE_MODE_WP;
if (ioctl(ufd, UFFDIO_CONTINUE, &req))
err("UFFDIO_CONTINUE failed for address 0x%" PRIx64,
@@ -1332,6 +1334,8 @@ static int userfaultfd_minor_test(void)
uffdio_register.range.start = (unsigned long)area_dst_alias;
uffdio_register.range.len = nr_pages * page_size;
uffdio_register.mode = UFFDIO_REGISTER_MODE_MINOR;
+ if (test_uffdio_wp)
+ uffdio_register.mode |= UFFDIO_REGISTER_MODE_WP;
if (ioctl(uffd, UFFDIO_REGISTER, &uffdio_register))
err("register failure");
--
2.40.0.rc1.284.g88254d51c5-goog
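In this patch the new uAPI bit reaches the internal flags in two steps: userfaultfd_continue() rejects unknown mode bits and maps UFFDIO_CONTINUE_MODE_WP to MFILL_ATOMIC_WP, then mfill_atomic_continue() sets the CONTINUE mode. The sketch below folds both steps into one hypothetical helper, `continue_mode_to_flags()`, with the internal constants hard-coded to the values the series' macros produce (mode in bits 0-1, WP at bit 2). It does not model the later check in mfill_atomic() that rejects WP on a VMA not registered with VM_UFFD_WP.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

typedef unsigned int uffd_flags_t;

/* Internal values as laid out by the series: CONTINUE is mode 2 in the
 * low two bits, MFILL_ATOMIC_WP is bit 2. */
#define MFILL_ATOMIC_CONTINUE  2u
#define MFILL_ATOMIC_MODE_MASK 3u
#define MFILL_ATOMIC_WP        (1u << 2)

/* uAPI bits from include/uapi/linux/userfaultfd.h (WP added by this patch). */
#define UFFDIO_CONTINUE_MODE_DONTWAKE ((uint64_t)1 << 0)
#define UFFDIO_CONTINUE_MODE_WP       ((uint64_t)1 << 1)

/*
 * Hypothetical helper combining the flag handling of userfaultfd_continue()
 * and mfill_atomic_continue(). Returns 0 and fills *out, or -EINVAL for
 * unknown mode bits, as the ioctl does.
 */
static int continue_mode_to_flags(uint64_t mode, uffd_flags_t *out)
{
	uffd_flags_t flags = 0;

	assert(out);
	if (mode & ~(UFFDIO_CONTINUE_MODE_DONTWAKE | UFFDIO_CONTINUE_MODE_WP))
		return -EINVAL;
	if (mode & UFFDIO_CONTINUE_MODE_WP)
		flags |= MFILL_ATOMIC_WP;
	/* DONTWAKE only affects the ioctl's wakeup step; it is not forwarded. */
	*out = flags | MFILL_ATOMIC_CONTINUE;
	return 0;
}
```

For instance, a mode of `UFFDIO_CONTINUE_MODE_WP | UFFDIO_CONTINUE_MODE_DONTWAKE` yields CONTINUE plus MFILL_ATOMIC_WP, while any bit outside those two is rejected with -EINVAL.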
All nitpicks below.
On Wed, Mar 08, 2023 at 02:19:31PM -0800, Axel Rasmussen wrote:
> +static inline bool uffd_flags_has_mode(uffd_flags_t flags, enum mfill_atomic_mode expected)
> +{
> + return (flags & MFILL_ATOMIC_MODE_MASK) == ((__force uffd_flags_t) expected);
> +}
I would still call it uffd_flags_get_mode() or uffd_flags_mode(); "has"
sounds a bit like more than one mode can be set, but that's not the case.
> +
> +static inline uffd_flags_t uffd_flags_set_mode(uffd_flags_t flags, enum mfill_atomic_mode mode)
> +{
> + return flags | ((__force uffd_flags_t) mode);
> +}
IIUC this __force mostly has no effect: __bitwise protects against
e.g. illegal math ops on the type (only bitops are allowed, iiuc), but
here it's an OR, so it's always legal.
So I'd just drop it, and also clear the mode mask first, to make it very
clear that this sets the mode, rather than risking a mess-up if it's
called twice:
flags &= ~MFILL_ATOMIC_MODE_MASK;
return flags | mode;
But feel free to ignore this if there's no other reason to repost; I don't
think it's a huge deal.
Acked-by: Peter Xu <[email protected]>
Thanks,
--
Peter Xu
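The concern about setting the mode twice can be seen with a small userspace model. The helper names (`set_mode_or()`, `set_mode_clear()`, `get_mode()`) are hypothetical, and the constants are hard-coded to the layout the series produces (mode in bits 0-1, WP at bit 2). As far as the patches shown here go, each caller sets the mode exactly once; the example only shows what clearing the mask first would protect against.

```c
#include <assert.h>

typedef unsigned int uffd_flags_t;

/* Mode values and mask as laid out by the series (two low mode bits). */
enum { MODE_COPY = 0, MODE_ZEROPAGE = 1, MODE_CONTINUE = 2 };
#define MODE_MASK 3u
#define FLAG_WP   (1u << 2)

/* As in v4: OR the mode in without clearing an earlier one. */
static uffd_flags_t set_mode_or(uffd_flags_t flags, unsigned int mode)
{
	return flags | mode;
}

/* As suggested in review: clear the mode field first, then set it. */
static uffd_flags_t set_mode_clear(uffd_flags_t flags, unsigned int mode)
{
	assert(mode <= MODE_MASK);
	flags &= ~MODE_MASK;
	return flags | mode;
}

static unsigned int get_mode(uffd_flags_t flags)
{
	return flags & MODE_MASK;
}
```

Setting ZEROPAGE and then CONTINUE with the plain OR leaves 0b11 (3) in the mode field, which matches no valid mode; with the field cleared first, the result is CONTINUE and the WP flag is preserved.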
On Wed, Mar 08, 2023 at 02:19:29PM -0800, Axel Rasmussen wrote:
> The basic problem is, over time we've added new userfaultfd ioctls, and
> we've refactored the code so functions which used to handle only one
> case are now re-used to deal with several cases. While this happened, we
> didn't bother to rename the functions.
>
> Similarly, as we added new functions, we cargo-culted pieces of the
> now-inconsistent naming scheme, so those functions too ended up with
> names that don't make a lot of sense.
>
> A key point here is, "copy" in most userfaultfd code refers specifically
> to UFFDIO_COPY, where we allocate a new page and copy its contents from
> userspace. There are many functions with "copy" in the name that don't
> actually do this (at least in some cases).
>
> So, rename things into a consistent scheme. The high level idea is that
> the call stack for userfaultfd ioctls becomes:
>
> userfaultfd_ioctl
> -> userfaultfd_(particular ioctl)
> -> mfill_atomic_(particular kind of fill operation)
> -> mfill_atomic /* loops over pages in range */
> -> mfill_atomic_pte /* deals with single pages */
> -> mfill_atomic_pte_(particular kind of fill operation)
> -> mfill_atomic_install_pte
>
> There are of course some special cases (shmem, hugetlb), but this is the
> general structure which all function names now adhere to.
>
> Acked-by: Peter Xu <[email protected]>
> Signed-off-by: Axel Rasmussen <[email protected]>
Acked-by: Mike Rapoport (IBM) <[email protected]>
> ---
> fs/userfaultfd.c | 18 +++----
> include/linux/hugetlb.h | 30 +++++------
> include/linux/userfaultfd_k.h | 18 +++----
> mm/hugetlb.c | 20 +++----
> mm/userfaultfd.c | 98 +++++++++++++++++------------------
> 5 files changed, 92 insertions(+), 92 deletions(-)
>
> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> index 44d1ee429eb0..365bf00dd8dd 100644
> --- a/fs/userfaultfd.c
> +++ b/fs/userfaultfd.c
> @@ -1741,9 +1741,9 @@ static int userfaultfd_copy(struct userfaultfd_ctx *ctx,
> if (uffdio_copy.mode & ~(UFFDIO_COPY_MODE_DONTWAKE|UFFDIO_COPY_MODE_WP))
> goto out;
> if (mmget_not_zero(ctx->mm)) {
> - ret = mcopy_atomic(ctx->mm, uffdio_copy.dst, uffdio_copy.src,
> - uffdio_copy.len, &ctx->mmap_changing,
> - uffdio_copy.mode);
> + ret = mfill_atomic_copy(ctx->mm, uffdio_copy.dst, uffdio_copy.src,
> + uffdio_copy.len, &ctx->mmap_changing,
> + uffdio_copy.mode);
> mmput(ctx->mm);
> } else {
> return -ESRCH;
> @@ -1793,9 +1793,9 @@ static int userfaultfd_zeropage(struct userfaultfd_ctx *ctx,
> goto out;
>
> if (mmget_not_zero(ctx->mm)) {
> - ret = mfill_zeropage(ctx->mm, uffdio_zeropage.range.start,
> - uffdio_zeropage.range.len,
> - &ctx->mmap_changing);
> + ret = mfill_atomic_zeropage(ctx->mm, uffdio_zeropage.range.start,
> + uffdio_zeropage.range.len,
> + &ctx->mmap_changing);
> mmput(ctx->mm);
> } else {
> return -ESRCH;
> @@ -1903,9 +1903,9 @@ static int userfaultfd_continue(struct userfaultfd_ctx *ctx, unsigned long arg)
> goto out;
>
> if (mmget_not_zero(ctx->mm)) {
> - ret = mcopy_continue(ctx->mm, uffdio_continue.range.start,
> - uffdio_continue.range.len,
> - &ctx->mmap_changing);
> + ret = mfill_atomic_continue(ctx->mm, uffdio_continue.range.start,
> + uffdio_continue.range.len,
> + &ctx->mmap_changing);
> mmput(ctx->mm);
> } else {
> return -ESRCH;
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 7c977d234aba..8f0467bf1cbd 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -158,13 +158,13 @@ unsigned long hugetlb_total_pages(void);
> vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> unsigned long address, unsigned int flags);
> #ifdef CONFIG_USERFAULTFD
> -int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, pte_t *dst_pte,
> - struct vm_area_struct *dst_vma,
> - unsigned long dst_addr,
> - unsigned long src_addr,
> - enum mcopy_atomic_mode mode,
> - struct page **pagep,
> - bool wp_copy);
> +int hugetlb_mfill_atomic_pte(struct mm_struct *dst_mm, pte_t *dst_pte,
> + struct vm_area_struct *dst_vma,
> + unsigned long dst_addr,
> + unsigned long src_addr,
> + enum mcopy_atomic_mode mode,
> + struct page **pagep,
> + bool wp_copy);
> #endif /* CONFIG_USERFAULTFD */
> bool hugetlb_reserve_pages(struct inode *inode, long from, long to,
> struct vm_area_struct *vma,
> @@ -393,14 +393,14 @@ static inline void hugetlb_free_pgd_range(struct mmu_gather *tlb,
> }
>
> #ifdef CONFIG_USERFAULTFD
> -static inline int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
> - pte_t *dst_pte,
> - struct vm_area_struct *dst_vma,
> - unsigned long dst_addr,
> - unsigned long src_addr,
> - enum mcopy_atomic_mode mode,
> - struct page **pagep,
> - bool wp_copy)
> +static inline int hugetlb_mfill_atomic_pte(struct mm_struct *dst_mm,
> + pte_t *dst_pte,
> + struct vm_area_struct *dst_vma,
> + unsigned long dst_addr,
> + unsigned long src_addr,
> + enum mcopy_atomic_mode mode,
> + struct page **pagep,
> + bool wp_copy)
> {
> BUG();
> return 0;
> diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
> index 3767f18114ef..468080125612 100644
> --- a/include/linux/userfaultfd_k.h
> +++ b/include/linux/userfaultfd_k.h
> @@ -61,15 +61,15 @@ extern int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
> unsigned long dst_addr, struct page *page,
> bool newly_allocated, bool wp_copy);
>
> -extern ssize_t mcopy_atomic(struct mm_struct *dst_mm, unsigned long dst_start,
> - unsigned long src_start, unsigned long len,
> - atomic_t *mmap_changing, __u64 mode);
> -extern ssize_t mfill_zeropage(struct mm_struct *dst_mm,
> - unsigned long dst_start,
> - unsigned long len,
> - atomic_t *mmap_changing);
> -extern ssize_t mcopy_continue(struct mm_struct *dst_mm, unsigned long dst_start,
> - unsigned long len, atomic_t *mmap_changing);
> +extern ssize_t mfill_atomic_copy(struct mm_struct *dst_mm, unsigned long dst_start,
> + unsigned long src_start, unsigned long len,
> + atomic_t *mmap_changing, __u64 mode);
> +extern ssize_t mfill_atomic_zeropage(struct mm_struct *dst_mm,
> + unsigned long dst_start,
> + unsigned long len,
> + atomic_t *mmap_changing);
> +extern ssize_t mfill_atomic_continue(struct mm_struct *dst_mm, unsigned long dst_start,
> + unsigned long len, atomic_t *mmap_changing);
> extern int mwriteprotect_range(struct mm_struct *dst_mm,
> unsigned long start, unsigned long len,
> bool enable_wp, atomic_t *mmap_changing);
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 07abcb6eb203..4c9276549394 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -6154,17 +6154,17 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>
> #ifdef CONFIG_USERFAULTFD
> /*
> - * Used by userfaultfd UFFDIO_COPY. Based on mcopy_atomic_pte with
> - * modifications for huge pages.
> + * Used by userfaultfd UFFDIO_* ioctls. Based on userfaultfd's mfill_atomic_pte
> + * with modifications for hugetlb pages.
> */
> -int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
> - pte_t *dst_pte,
> - struct vm_area_struct *dst_vma,
> - unsigned long dst_addr,
> - unsigned long src_addr,
> - enum mcopy_atomic_mode mode,
> - struct page **pagep,
> - bool wp_copy)
> +int hugetlb_mfill_atomic_pte(struct mm_struct *dst_mm,
> + pte_t *dst_pte,
> + struct vm_area_struct *dst_vma,
> + unsigned long dst_addr,
> + unsigned long src_addr,
> + enum mcopy_atomic_mode mode,
> + struct page **pagep,
> + bool wp_copy)
> {
> bool is_continue = (mode == MCOPY_ATOMIC_CONTINUE);
> struct hstate *h = hstate_vma(dst_vma);
> diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> index 53c3d916ff66..84db5b2fad3a 100644
> --- a/mm/userfaultfd.c
> +++ b/mm/userfaultfd.c
> @@ -127,13 +127,13 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
> return ret;
> }
>
> -static int mcopy_atomic_pte(struct mm_struct *dst_mm,
> - pmd_t *dst_pmd,
> - struct vm_area_struct *dst_vma,
> - unsigned long dst_addr,
> - unsigned long src_addr,
> - struct page **pagep,
> - bool wp_copy)
> +static int mfill_atomic_pte_copy(struct mm_struct *dst_mm,
> + pmd_t *dst_pmd,
> + struct vm_area_struct *dst_vma,
> + unsigned long dst_addr,
> + unsigned long src_addr,
> + struct page **pagep,
> + bool wp_copy)
> {
> void *page_kaddr;
> int ret;
> @@ -204,10 +204,10 @@ static int mcopy_atomic_pte(struct mm_struct *dst_mm,
> goto out;
> }
>
> -static int mfill_zeropage_pte(struct mm_struct *dst_mm,
> - pmd_t *dst_pmd,
> - struct vm_area_struct *dst_vma,
> - unsigned long dst_addr)
> +static int mfill_atomic_pte_zeropage(struct mm_struct *dst_mm,
> + pmd_t *dst_pmd,
> + struct vm_area_struct *dst_vma,
> + unsigned long dst_addr)
> {
> pte_t _dst_pte, *dst_pte;
> spinlock_t *ptl;
> @@ -240,11 +240,11 @@ static int mfill_zeropage_pte(struct mm_struct *dst_mm,
> }
>
> /* Handles UFFDIO_CONTINUE for all shmem VMAs (shared or private). */
> -static int mcontinue_atomic_pte(struct mm_struct *dst_mm,
> - pmd_t *dst_pmd,
> - struct vm_area_struct *dst_vma,
> - unsigned long dst_addr,
> - bool wp_copy)
> +static int mfill_atomic_pte_continue(struct mm_struct *dst_mm,
> + pmd_t *dst_pmd,
> + struct vm_area_struct *dst_vma,
> + unsigned long dst_addr,
> + bool wp_copy)
> {
> struct inode *inode = file_inode(dst_vma->vm_file);
> pgoff_t pgoff = linear_page_index(dst_vma, dst_addr);
> @@ -307,10 +307,10 @@ static pmd_t *mm_alloc_pmd(struct mm_struct *mm, unsigned long address)
>
> #ifdef CONFIG_HUGETLB_PAGE
> /*
> - * __mcopy_atomic processing for HUGETLB vmas. Note that this routine is
> + * mfill_atomic processing for HUGETLB vmas. Note that this routine is
> * called with mmap_lock held, it will release mmap_lock before returning.
> */
> -static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
> +static __always_inline ssize_t mfill_atomic_hugetlb(struct mm_struct *dst_mm,
> struct vm_area_struct *dst_vma,
> unsigned long dst_start,
> unsigned long src_start,
> @@ -411,7 +411,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
> goto out_unlock;
> }
>
> - err = hugetlb_mcopy_atomic_pte(dst_mm, dst_pte, dst_vma,
> + err = hugetlb_mfill_atomic_pte(dst_mm, dst_pte, dst_vma,
> dst_addr, src_addr, mode, &page,
> wp_copy);
>
> @@ -463,7 +463,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
> }
> #else /* !CONFIG_HUGETLB_PAGE */
> /* fail at build time if gcc attempts to use this */
> -extern ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
> +extern ssize_t mfill_atomic_hugetlb(struct mm_struct *dst_mm,
> struct vm_area_struct *dst_vma,
> unsigned long dst_start,
> unsigned long src_start,
> @@ -484,8 +484,8 @@ static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm,
> ssize_t err;
>
> if (mode == MCOPY_ATOMIC_CONTINUE) {
> - return mcontinue_atomic_pte(dst_mm, dst_pmd, dst_vma, dst_addr,
> - wp_copy);
> + return mfill_atomic_pte_continue(dst_mm, dst_pmd, dst_vma,
> + dst_addr, wp_copy);
> }
>
> /*
> @@ -500,11 +500,11 @@ static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm,
> */
> if (!(dst_vma->vm_flags & VM_SHARED)) {
> if (mode == MCOPY_ATOMIC_NORMAL)
> - err = mcopy_atomic_pte(dst_mm, dst_pmd, dst_vma,
> - dst_addr, src_addr, page,
> - wp_copy);
> + err = mfill_atomic_pte_copy(dst_mm, dst_pmd, dst_vma,
> + dst_addr, src_addr, page,
> + wp_copy);
> else
> - err = mfill_zeropage_pte(dst_mm, dst_pmd,
> + err = mfill_atomic_pte_zeropage(dst_mm, dst_pmd,
> dst_vma, dst_addr);
> } else {
> err = shmem_mfill_atomic_pte(dst_mm, dst_pmd, dst_vma,
> @@ -516,13 +516,13 @@ static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm,
> return err;
> }
>
> -static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm,
> - unsigned long dst_start,
> - unsigned long src_start,
> - unsigned long len,
> - enum mcopy_atomic_mode mcopy_mode,
> - atomic_t *mmap_changing,
> - __u64 mode)
> +static __always_inline ssize_t mfill_atomic(struct mm_struct *dst_mm,
> + unsigned long dst_start,
> + unsigned long src_start,
> + unsigned long len,
> + enum mcopy_atomic_mode mcopy_mode,
> + atomic_t *mmap_changing,
> + __u64 mode)
> {
> struct vm_area_struct *dst_vma;
> ssize_t err;
> @@ -588,9 +588,9 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm,
> * If this is a HUGETLB vma, pass off to appropriate routine
> */
> if (is_vm_hugetlb_page(dst_vma))
> - return __mcopy_atomic_hugetlb(dst_mm, dst_vma, dst_start,
> - src_start, len, mcopy_mode,
> - wp_copy);
> + return mfill_atomic_hugetlb(dst_mm, dst_vma, dst_start,
> + src_start, len, mcopy_mode,
> + wp_copy);
>
> if (!vma_is_anonymous(dst_vma) && !vma_is_shmem(dst_vma))
> goto out_unlock;
> @@ -688,26 +688,26 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm,
> return copied ? copied : err;
> }
>
> -ssize_t mcopy_atomic(struct mm_struct *dst_mm, unsigned long dst_start,
> - unsigned long src_start, unsigned long len,
> - atomic_t *mmap_changing, __u64 mode)
> +ssize_t mfill_atomic_copy(struct mm_struct *dst_mm, unsigned long dst_start,
> + unsigned long src_start, unsigned long len,
> + atomic_t *mmap_changing, __u64 mode)
> {
> - return __mcopy_atomic(dst_mm, dst_start, src_start, len,
> - MCOPY_ATOMIC_NORMAL, mmap_changing, mode);
> + return mfill_atomic(dst_mm, dst_start, src_start, len,
> + MCOPY_ATOMIC_NORMAL, mmap_changing, mode);
> }
>
> -ssize_t mfill_zeropage(struct mm_struct *dst_mm, unsigned long start,
> - unsigned long len, atomic_t *mmap_changing)
> +ssize_t mfill_atomic_zeropage(struct mm_struct *dst_mm, unsigned long start,
> + unsigned long len, atomic_t *mmap_changing)
> {
> - return __mcopy_atomic(dst_mm, start, 0, len, MCOPY_ATOMIC_ZEROPAGE,
> - mmap_changing, 0);
> + return mfill_atomic(dst_mm, start, 0, len, MCOPY_ATOMIC_ZEROPAGE,
> + mmap_changing, 0);
> }
>
> -ssize_t mcopy_continue(struct mm_struct *dst_mm, unsigned long start,
> - unsigned long len, atomic_t *mmap_changing)
> +ssize_t mfill_atomic_continue(struct mm_struct *dst_mm, unsigned long start,
> + unsigned long len, atomic_t *mmap_changing)
> {
> - return __mcopy_atomic(dst_mm, start, 0, len, MCOPY_ATOMIC_CONTINUE,
> - mmap_changing, 0);
> + return mfill_atomic(dst_mm, start, 0, len, MCOPY_ATOMIC_CONTINUE,
> + mmap_changing, 0);
> }
>
> long uffd_wp_range(struct mm_struct *dst_mm, struct vm_area_struct *dst_vma,
> --
> 2.40.0.rc1.284.g88254d51c5-goog
>
--
Sincerely yours,
Mike.
On Wed, Mar 08, 2023 at 02:19:30PM -0800, Axel Rasmussen wrote:
> Quite a few userfaultfd functions took both mm and vma pointers as
> arguments. Since the mm is trivially accessible via vma->vm_mm, there's
> no reason to pass both; it just needlessly extends the already long
> argument list.
>
> Get rid of the mm pointer, where possible, to shorten the argument list.
>
> Acked-by: Peter Xu <[email protected]>
> Signed-off-by: Axel Rasmussen <[email protected]>
Acked-by: Mike Rapoport (IBM) <[email protected]>
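A small illustration of why dropping the mm argument loses nothing. It uses hypothetical, stripped-down stand-ins for the kernel types (only the vma->vm_mm back-pointer matters) and hypothetical helper names. It is not kernel code; it only contrasts the two signature shapes this patch switches between.

```c
#include <assert.h>

/* Hypothetical, stripped-down stand-ins for the kernel types. */
struct mm_struct {
	int id;
};

struct vm_area_struct {
	unsigned long vm_start, vm_end;
	struct mm_struct *vm_mm;	/* address space the VMA belongs to */
};

/* Old shape: mm and vma passed separately. Only convention keeps them
 * consistent; this hypothetical check spells out the implicit invariant. */
static struct mm_struct *dst_mm_old(struct mm_struct *dst_mm,
				    struct vm_area_struct *dst_vma)
{
	assert(dst_mm == dst_vma->vm_mm);
	return dst_mm;
}

/* New shape: the mm is derived from the vma, as the patch does with
 * "struct mm_struct *dst_mm = dst_vma->vm_mm;". */
static struct mm_struct *dst_mm_new(struct vm_area_struct *dst_vma)
{
	struct mm_struct *dst_mm = dst_vma->vm_mm;

	return dst_mm;
}
```

With the new shape a caller cannot pass an mm that disagrees with the VMA, so the invariant no longer needs to hold by convention.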
> ---
> fs/userfaultfd.c | 2 +-
> include/linux/hugetlb.h | 5 ++-
> include/linux/shmem_fs.h | 4 +--
> include/linux/userfaultfd_k.h | 4 +--
> mm/hugetlb.c | 4 +--
> mm/shmem.c | 7 ++--
> mm/userfaultfd.c | 61 +++++++++++++++++------------------
> 7 files changed, 41 insertions(+), 46 deletions(-)
>
> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> index 365bf00dd8dd..84d5d402214a 100644
> --- a/fs/userfaultfd.c
> +++ b/fs/userfaultfd.c
> @@ -1629,7 +1629,7 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
>
> /* Reset ptes for the whole vma range if wr-protected */
> if (userfaultfd_wp(vma))
> - uffd_wp_range(mm, vma, start, vma_end - start, false);
> + uffd_wp_range(vma, start, vma_end - start, false);
>
> new_flags = vma->vm_flags & ~__VM_UFFD_FLAGS;
> prev = vma_merge(&vmi, mm, prev, start, vma_end, new_flags,
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 8f0467bf1cbd..8b9325f77ac3 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -158,7 +158,7 @@ unsigned long hugetlb_total_pages(void);
> vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> unsigned long address, unsigned int flags);
> #ifdef CONFIG_USERFAULTFD
> -int hugetlb_mfill_atomic_pte(struct mm_struct *dst_mm, pte_t *dst_pte,
> +int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
> struct vm_area_struct *dst_vma,
> unsigned long dst_addr,
> unsigned long src_addr,
> @@ -393,8 +393,7 @@ static inline void hugetlb_free_pgd_range(struct mmu_gather *tlb,
> }
>
> #ifdef CONFIG_USERFAULTFD
> -static inline int hugetlb_mfill_atomic_pte(struct mm_struct *dst_mm,
> - pte_t *dst_pte,
> +static inline int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
> struct vm_area_struct *dst_vma,
> unsigned long dst_addr,
> unsigned long src_addr,
> diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
> index 103d1000a5a2..b82916c25e61 100644
> --- a/include/linux/shmem_fs.h
> +++ b/include/linux/shmem_fs.h
> @@ -151,14 +151,14 @@ extern void shmem_uncharge(struct inode *inode, long pages);
>
> #ifdef CONFIG_USERFAULTFD
> #ifdef CONFIG_SHMEM
> -extern int shmem_mfill_atomic_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
> +extern int shmem_mfill_atomic_pte(pmd_t *dst_pmd,
> struct vm_area_struct *dst_vma,
> unsigned long dst_addr,
> unsigned long src_addr,
> bool zeropage, bool wp_copy,
> struct page **pagep);
> #else /* !CONFIG_SHMEM */
> -#define shmem_mfill_atomic_pte(dst_mm, dst_pmd, dst_vma, dst_addr, \
> +#define shmem_mfill_atomic_pte(dst_pmd, dst_vma, dst_addr, \
> src_addr, zeropage, wp_copy, pagep) ({ BUG(); 0; })
> #endif /* CONFIG_SHMEM */
> #endif /* CONFIG_USERFAULTFD */
> diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
> index 468080125612..ba79e296fcc7 100644
> --- a/include/linux/userfaultfd_k.h
> +++ b/include/linux/userfaultfd_k.h
> @@ -56,7 +56,7 @@ enum mcopy_atomic_mode {
> MCOPY_ATOMIC_CONTINUE,
> };
>
> -extern int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
> +extern int mfill_atomic_install_pte(pmd_t *dst_pmd,
> struct vm_area_struct *dst_vma,
> unsigned long dst_addr, struct page *page,
> bool newly_allocated, bool wp_copy);
> @@ -73,7 +73,7 @@ extern ssize_t mfill_atomic_continue(struct mm_struct *dst_mm, unsigned long dst
> extern int mwriteprotect_range(struct mm_struct *dst_mm,
> unsigned long start, unsigned long len,
> bool enable_wp, atomic_t *mmap_changing);
> -extern long uffd_wp_range(struct mm_struct *dst_mm, struct vm_area_struct *vma,
> +extern long uffd_wp_range(struct vm_area_struct *vma,
> unsigned long start, unsigned long len, bool enable_wp);
>
> /* mm helpers */
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 4c9276549394..fe043034ab46 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -6157,8 +6157,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> * Used by userfaultfd UFFDIO_* ioctls. Based on userfaultfd's mfill_atomic_pte
> * with modifications for hugetlb pages.
> */
> -int hugetlb_mfill_atomic_pte(struct mm_struct *dst_mm,
> - pte_t *dst_pte,
> +int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
> struct vm_area_struct *dst_vma,
> unsigned long dst_addr,
> unsigned long src_addr,
> @@ -6166,6 +6165,7 @@ int hugetlb_mfill_atomic_pte(struct mm_struct *dst_mm,
> struct page **pagep,
> bool wp_copy)
> {
> + struct mm_struct *dst_mm = dst_vma->vm_mm;
> bool is_continue = (mode == MCOPY_ATOMIC_CONTINUE);
> struct hstate *h = hstate_vma(dst_vma);
> struct address_space *mapping = dst_vma->vm_file->f_mapping;
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 448f393d8ab2..1d751b6cf1ac 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -2415,8 +2415,7 @@ static struct inode *shmem_get_inode(struct mnt_idmap *idmap, struct super_block
> }
>
> #ifdef CONFIG_USERFAULTFD
> -int shmem_mfill_atomic_pte(struct mm_struct *dst_mm,
> - pmd_t *dst_pmd,
> +int shmem_mfill_atomic_pte(pmd_t *dst_pmd,
> struct vm_area_struct *dst_vma,
> unsigned long dst_addr,
> unsigned long src_addr,
> @@ -2506,11 +2505,11 @@ int shmem_mfill_atomic_pte(struct mm_struct *dst_mm,
> goto out_release;
>
> ret = shmem_add_to_page_cache(folio, mapping, pgoff, NULL,
> - gfp & GFP_RECLAIM_MASK, dst_mm);
> + gfp & GFP_RECLAIM_MASK, dst_vma->vm_mm);
> if (ret)
> goto out_release;
>
> - ret = mfill_atomic_install_pte(dst_mm, dst_pmd, dst_vma, dst_addr,
> + ret = mfill_atomic_install_pte(dst_pmd, dst_vma, dst_addr,
> &folio->page, true, wp_copy);
> if (ret)
> goto out_delete_from_cache;
> diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> index 84db5b2fad3a..4fc373476739 100644
> --- a/mm/userfaultfd.c
> +++ b/mm/userfaultfd.c
> @@ -55,12 +55,13 @@ struct vm_area_struct *find_dst_vma(struct mm_struct *dst_mm,
> * This function handles both MCOPY_ATOMIC_NORMAL and _CONTINUE for both shmem
> * and anon, and for both shared and private VMAs.
> */
> -int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
> +int mfill_atomic_install_pte(pmd_t *dst_pmd,
> struct vm_area_struct *dst_vma,
> unsigned long dst_addr, struct page *page,
> bool newly_allocated, bool wp_copy)
> {
> int ret;
> + struct mm_struct *dst_mm = dst_vma->vm_mm;
> pte_t _dst_pte, *dst_pte;
> bool writable = dst_vma->vm_flags & VM_WRITE;
> bool vm_shared = dst_vma->vm_flags & VM_SHARED;
> @@ -127,8 +128,7 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
> return ret;
> }
>
> -static int mfill_atomic_pte_copy(struct mm_struct *dst_mm,
> - pmd_t *dst_pmd,
> +static int mfill_atomic_pte_copy(pmd_t *dst_pmd,
> struct vm_area_struct *dst_vma,
> unsigned long dst_addr,
> unsigned long src_addr,
> @@ -190,10 +190,10 @@ static int mfill_atomic_pte_copy(struct mm_struct *dst_mm,
> __SetPageUptodate(page);
>
> ret = -ENOMEM;
> - if (mem_cgroup_charge(page_folio(page), dst_mm, GFP_KERNEL))
> + if (mem_cgroup_charge(page_folio(page), dst_vma->vm_mm, GFP_KERNEL))
> goto out_release;
>
> - ret = mfill_atomic_install_pte(dst_mm, dst_pmd, dst_vma, dst_addr,
> + ret = mfill_atomic_install_pte(dst_pmd, dst_vma, dst_addr,
> page, true, wp_copy);
> if (ret)
> goto out_release;
> @@ -204,8 +204,7 @@ static int mfill_atomic_pte_copy(struct mm_struct *dst_mm,
> goto out;
> }
>
> -static int mfill_atomic_pte_zeropage(struct mm_struct *dst_mm,
> - pmd_t *dst_pmd,
> +static int mfill_atomic_pte_zeropage(pmd_t *dst_pmd,
> struct vm_area_struct *dst_vma,
> unsigned long dst_addr)
> {
> @@ -217,7 +216,7 @@ static int mfill_atomic_pte_zeropage(struct mm_struct *dst_mm,
>
> _dst_pte = pte_mkspecial(pfn_pte(my_zero_pfn(dst_addr),
> dst_vma->vm_page_prot));
> - dst_pte = pte_offset_map_lock(dst_mm, dst_pmd, dst_addr, &ptl);
> + dst_pte = pte_offset_map_lock(dst_vma->vm_mm, dst_pmd, dst_addr, &ptl);
> if (dst_vma->vm_file) {
> /* the shmem MAP_PRIVATE case requires checking the i_size */
> inode = dst_vma->vm_file->f_inode;
> @@ -230,7 +229,7 @@ static int mfill_atomic_pte_zeropage(struct mm_struct *dst_mm,
> ret = -EEXIST;
> if (!pte_none(*dst_pte))
> goto out_unlock;
> - set_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);
> + set_pte_at(dst_vma->vm_mm, dst_addr, dst_pte, _dst_pte);
> /* No need to invalidate - it was non-present before */
> update_mmu_cache(dst_vma, dst_addr, dst_pte);
> ret = 0;
> @@ -240,8 +239,7 @@ static int mfill_atomic_pte_zeropage(struct mm_struct *dst_mm,
> }
>
> /* Handles UFFDIO_CONTINUE for all shmem VMAs (shared or private). */
> -static int mfill_atomic_pte_continue(struct mm_struct *dst_mm,
> - pmd_t *dst_pmd,
> +static int mfill_atomic_pte_continue(pmd_t *dst_pmd,
> struct vm_area_struct *dst_vma,
> unsigned long dst_addr,
> bool wp_copy)
> @@ -269,7 +267,7 @@ static int mfill_atomic_pte_continue(struct mm_struct *dst_mm,
> goto out_release;
> }
>
> - ret = mfill_atomic_install_pte(dst_mm, dst_pmd, dst_vma, dst_addr,
> + ret = mfill_atomic_install_pte(dst_pmd, dst_vma, dst_addr,
> page, false, wp_copy);
> if (ret)
> goto out_release;
> @@ -310,7 +308,7 @@ static pmd_t *mm_alloc_pmd(struct mm_struct *mm, unsigned long address)
> * mfill_atomic processing for HUGETLB vmas. Note that this routine is
> * called with mmap_lock held, it will release mmap_lock before returning.
> */
> -static __always_inline ssize_t mfill_atomic_hugetlb(struct mm_struct *dst_mm,
> +static __always_inline ssize_t mfill_atomic_hugetlb(
> struct vm_area_struct *dst_vma,
> unsigned long dst_start,
> unsigned long src_start,
> @@ -318,6 +316,7 @@ static __always_inline ssize_t mfill_atomic_hugetlb(struct mm_struct *dst_mm,
> enum mcopy_atomic_mode mode,
> bool wp_copy)
> {
> + struct mm_struct *dst_mm = dst_vma->vm_mm;
> int vm_shared = dst_vma->vm_flags & VM_SHARED;
> ssize_t err;
> pte_t *dst_pte;
> @@ -411,7 +410,7 @@ static __always_inline ssize_t mfill_atomic_hugetlb(struct mm_struct *dst_mm,
> goto out_unlock;
> }
>
> - err = hugetlb_mfill_atomic_pte(dst_mm, dst_pte, dst_vma,
> + err = hugetlb_mfill_atomic_pte(dst_pte, dst_vma,
> dst_addr, src_addr, mode, &page,
> wp_copy);
>
> @@ -463,17 +462,15 @@ static __always_inline ssize_t mfill_atomic_hugetlb(struct mm_struct *dst_mm,
> }
> #else /* !CONFIG_HUGETLB_PAGE */
> /* fail at build time if gcc attempts to use this */
> -extern ssize_t mfill_atomic_hugetlb(struct mm_struct *dst_mm,
> - struct vm_area_struct *dst_vma,
> - unsigned long dst_start,
> - unsigned long src_start,
> - unsigned long len,
> - enum mcopy_atomic_mode mode,
> - bool wp_copy);
> +extern ssize_t mfill_atomic_hugetlb(struct vm_area_struct *dst_vma,
> + unsigned long dst_start,
> + unsigned long src_start,
> + unsigned long len,
> + enum mcopy_atomic_mode mode,
> + bool wp_copy);
> #endif /* CONFIG_HUGETLB_PAGE */
>
> -static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm,
> - pmd_t *dst_pmd,
> +static __always_inline ssize_t mfill_atomic_pte(pmd_t *dst_pmd,
> struct vm_area_struct *dst_vma,
> unsigned long dst_addr,
> unsigned long src_addr,
> @@ -484,7 +481,7 @@ static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm,
> ssize_t err;
>
> if (mode == MCOPY_ATOMIC_CONTINUE) {
> - return mfill_atomic_pte_continue(dst_mm, dst_pmd, dst_vma,
> + return mfill_atomic_pte_continue(dst_pmd, dst_vma,
> dst_addr, wp_copy);
> }
>
> @@ -500,14 +497,14 @@ static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm,
> */
> if (!(dst_vma->vm_flags & VM_SHARED)) {
> if (mode == MCOPY_ATOMIC_NORMAL)
> - err = mfill_atomic_pte_copy(dst_mm, dst_pmd, dst_vma,
> + err = mfill_atomic_pte_copy(dst_pmd, dst_vma,
> dst_addr, src_addr, page,
> wp_copy);
> else
> - err = mfill_atomic_pte_zeropage(dst_mm, dst_pmd,
> + err = mfill_atomic_pte_zeropage(dst_pmd,
> dst_vma, dst_addr);
> } else {
> - err = shmem_mfill_atomic_pte(dst_mm, dst_pmd, dst_vma,
> + err = shmem_mfill_atomic_pte(dst_pmd, dst_vma,
> dst_addr, src_addr,
> mode != MCOPY_ATOMIC_NORMAL,
> wp_copy, page);
> @@ -588,7 +585,7 @@ static __always_inline ssize_t mfill_atomic(struct mm_struct *dst_mm,
> * If this is a HUGETLB vma, pass off to appropriate routine
> */
> if (is_vm_hugetlb_page(dst_vma))
> - return mfill_atomic_hugetlb(dst_mm, dst_vma, dst_start,
> + return mfill_atomic_hugetlb(dst_vma, dst_start,
> src_start, len, mcopy_mode,
> wp_copy);
>
> @@ -641,7 +638,7 @@ static __always_inline ssize_t mfill_atomic(struct mm_struct *dst_mm,
> BUG_ON(pmd_none(*dst_pmd));
> BUG_ON(pmd_trans_huge(*dst_pmd));
>
> - err = mfill_atomic_pte(dst_mm, dst_pmd, dst_vma, dst_addr,
> + err = mfill_atomic_pte(dst_pmd, dst_vma, dst_addr,
> src_addr, &page, mcopy_mode, wp_copy);
> cond_resched();
>
> @@ -710,7 +707,7 @@ ssize_t mfill_atomic_continue(struct mm_struct *dst_mm, unsigned long start,
> mmap_changing, 0);
> }
>
> -long uffd_wp_range(struct mm_struct *dst_mm, struct vm_area_struct *dst_vma,
> +long uffd_wp_range(struct vm_area_struct *dst_vma,
> unsigned long start, unsigned long len, bool enable_wp)
> {
> unsigned int mm_cp_flags;
> @@ -730,7 +727,7 @@ long uffd_wp_range(struct mm_struct *dst_mm, struct vm_area_struct *dst_vma,
> */
> if (!enable_wp && vma_wants_manual_pte_write_upgrade(dst_vma))
> mm_cp_flags |= MM_CP_TRY_CHANGE_WRITABLE;
> - tlb_gather_mmu(&tlb, dst_mm);
> + tlb_gather_mmu(&tlb, dst_vma->vm_mm);
> ret = change_protection(&tlb, dst_vma, start, start + len, mm_cp_flags);
> tlb_finish_mmu(&tlb);
>
> @@ -782,7 +779,7 @@ int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,
> goto out_unlock;
> }
>
> - err = uffd_wp_range(dst_mm, dst_vma, start, len, enable_wp);
> + err = uffd_wp_range(dst_vma, start, len, enable_wp);
>
> /* Return 0 on success, <0 on failures */
> if (err > 0)
> --
> 2.40.0.rc1.284.g88254d51c5-goog
>
--
Sincerely yours,
Mike.
On Wed, Mar 08, 2023 at 02:19:31PM -0800, Axel Rasmussen wrote:
> Many userfaultfd ioctl functions take both a 'mode' and a 'wp_copy'
> argument. In future commits we plan to plumb the flags through to more
> places, so we'd be proliferating the very long argument list even
> further.
>
> Let's take the time to simplify the argument list. Combine the two
> arguments into one - and generalize, so when we add more flags in the
> future, it doesn't imply more function arguments.
>
> Since the modes (copy, zeropage, continue) are mutually exclusive, store
> them as an integer value (0, 1, 2) in the low bits. Place combinable
> flag bits in the high bits.
>
> This is quite similar to an earlier patch proposed by Nadav Amit
> ("userfaultfd: introduce uffd_flags" [1]). The main difference is that
> Nadav's patch only handled flags, whereas this patch *also* combines the "mode"
> argument into the same type to shorten the argument list.
>
> [1]: https://lore.kernel.org/all/[email protected]/
>
> Acked-by: James Houghton <[email protected]>
> Signed-off-by: Axel Rasmussen <[email protected]>
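To make the "mode in the low bits, flags in the high bits" layout concrete, here is a userspace model of it. This is a sketch, not the kernel code: the sparse annotations (__bitwise, __force), BIT() and const_ilog2() are replaced with plain C, and MFILL_ATOMIC_MODE_BITS is hard-coded to 2, which is what const_ilog2(NR_MFILL_ATOMIC_MODES - 1) + 1 evaluates to for three modes. Names and values otherwise follow the userfaultfd_k.h hunk below.

```c
#include <assert.h>
#include <stdbool.h>

/* Plain-C stand-in for the kernel's __bitwise uffd_flags_t. */
typedef unsigned int uffd_flags_t;

/* Mutually exclusive modes, stored as an integer in the low bits. */
enum mfill_atomic_mode {
	MFILL_ATOMIC_COPY,	/* 0 */
	MFILL_ATOMIC_ZEROPAGE,	/* 1 */
	MFILL_ATOMIC_CONTINUE,	/* 2 */
	NR_MFILL_ATOMIC_MODES,	/* 3 */
};

/* Hard-coded stand-in for const_ilog2(NR_MFILL_ATOMIC_MODES - 1) + 1. */
#define MFILL_ATOMIC_MODE_BITS	2u
#define MFILL_ATOMIC_BIT(nr)	(1u << (MFILL_ATOMIC_MODE_BITS + (nr)))
#define MFILL_ATOMIC_FLAG(nr)	((uffd_flags_t)MFILL_ATOMIC_BIT(nr))
#define MFILL_ATOMIC_MODE_MASK	((uffd_flags_t)(MFILL_ATOMIC_BIT(0) - 1))

/* First combinable behavior flag, placed just above the mode bits. */
#define MFILL_ATOMIC_WP		MFILL_ATOMIC_FLAG(0)

/* Every mode value must fit inside the mode mask. */
static_assert(NR_MFILL_ATOMIC_MODES - 1 <= MFILL_ATOMIC_MODE_MASK,
	      "modes overflow the mode bits");

static inline bool uffd_flags_has_mode(uffd_flags_t flags,
				       enum mfill_atomic_mode expected)
{
	return (flags & MFILL_ATOMIC_MODE_MASK) == (uffd_flags_t)expected;
}

static inline uffd_flags_t uffd_flags_set_mode(uffd_flags_t flags,
					       enum mfill_atomic_mode mode)
{
	return flags | (uffd_flags_t)mode;
}
```

With this layout, a CONTINUE carrying WP is 0x4 | 0x2 = 0x6. Because MFILL_ATOMIC_COPY is 0, a flags value whose mode bits were never set reads back as COPY, so the mode must always be set explicitly for the other two operations (as mfill_atomic_zeropage() and mfill_atomic_continue() do below).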
> ---
> fs/userfaultfd.c | 5 ++-
> include/linux/hugetlb.h | 10 ++---
> include/linux/shmem_fs.h | 5 ++-
> include/linux/userfaultfd_k.h | 45 +++++++++++++--------
> mm/hugetlb.c | 12 +++---
> mm/shmem.c | 7 ++--
> mm/userfaultfd.c | 76 ++++++++++++++++-------------------
> 7 files changed, 83 insertions(+), 77 deletions(-)
...
> diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
> index ba79e296fcc7..4d7425684171 100644
> --- a/include/linux/userfaultfd_k.h
> +++ b/include/linux/userfaultfd_k.h
> @@ -40,30 +40,43 @@ extern int sysctl_unprivileged_userfaultfd;
>
> extern vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason);
>
> -/*
> - * The mode of operation for __mcopy_atomic and its helpers.
> - *
> - * This is almost an implementation detail (mcopy_atomic below doesn't take this
> - * as a parameter), but it's exposed here because memory-kind-specific
> - * implementations (e.g. hugetlbfs) need to know the mode of operation.
> - */
> -enum mcopy_atomic_mode {
> - /* A normal copy_from_user into the destination range. */
> - MCOPY_ATOMIC_NORMAL,
> - /* Don't copy; map the destination range to the zero page. */
> - MCOPY_ATOMIC_ZEROPAGE,
> - /* Just install pte(s) with the existing page(s) in the page cache. */
> - MCOPY_ATOMIC_CONTINUE,
> +/* A combined operation mode + behavior flags. */
> +typedef unsigned int __bitwise uffd_flags_t;
> +
> +/* Mutually exclusive modes of operation. */
> +enum mfill_atomic_mode {
> + MFILL_ATOMIC_COPY,
> + MFILL_ATOMIC_ZEROPAGE,
> + MFILL_ATOMIC_CONTINUE,
> + NR_MFILL_ATOMIC_MODES,
> };
>
> +#define MFILL_ATOMIC_MODE_BITS (const_ilog2(NR_MFILL_ATOMIC_MODES - 1) + 1)
> +#define MFILL_ATOMIC_BIT(nr) BIT(MFILL_ATOMIC_MODE_BITS + (nr))
> +#define MFILL_ATOMIC_FLAG(nr) ((__force uffd_flags_t) MFILL_ATOMIC_BIT(nr))
> +#define MFILL_ATOMIC_MODE_MASK ((__force uffd_flags_t) (MFILL_ATOMIC_BIT(0) - 1))
> +
> +static inline bool uffd_flags_has_mode(uffd_flags_t flags, enum mfill_atomic_mode expected)
> +{
> + return (flags & MFILL_ATOMIC_MODE_MASK) == ((__force uffd_flags_t) expected);
> +}
> +
> +static inline uffd_flags_t uffd_flags_set_mode(uffd_flags_t flags, enum mfill_atomic_mode mode)
> +{
> + return flags | ((__force uffd_flags_t) mode);
> +}
I agree with Peter that uffd_flags_set_mode() implies that the modes are
not mutually exclusive and uffd_flags_get_mode() sounds like a better name
to me.
> +/* Flags controlling behavior. */
I'd also emphasize in this comment that these flags can apply to different modes.
Aside from that
Acked-by: Mike Rapoport (IBM) <[email protected]>
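The naming concern can be made concrete: uffd_flags_set_mode() ORs the mode in, so it is only correct while the mode bits are still zero; applying it a second time silently yields a different, possibly invalid, mode. The sketch below re-declares a minimal copy of the encoding from this hunk (mode mask 0x3, MFILL_ATOMIC_WP at bit 2) and contrasts it with uffd_flags_replace_mode() and uffd_flags_mode_of(), both hypothetical helpers that are not part of this series.

```c
#include <assert.h>

typedef unsigned int uffd_flags_t;

enum mfill_atomic_mode {
	MFILL_ATOMIC_COPY,
	MFILL_ATOMIC_ZEROPAGE,
	MFILL_ATOMIC_CONTINUE,
	NR_MFILL_ATOMIC_MODES,
};

#define MFILL_ATOMIC_MODE_MASK	0x3u		/* low two bits hold the mode */
#define MFILL_ATOMIC_WP		(1u << 2)	/* first flag above the mode bits */

/* As in the patch: ORs the mode in, assuming the mode bits are still 0. */
static uffd_flags_t uffd_flags_set_mode(uffd_flags_t flags,
					enum mfill_atomic_mode mode)
{
	return flags | (uffd_flags_t)mode;
}

/* Hypothetical alternative: clear any existing mode before setting the new
 * one, so exactly one mode is ever encoded. Behavior flags are preserved. */
static uffd_flags_t uffd_flags_replace_mode(uffd_flags_t flags,
					    enum mfill_atomic_mode mode)
{
	assert((unsigned int)mode < NR_MFILL_ATOMIC_MODES);
	return (flags & ~MFILL_ATOMIC_MODE_MASK) | (uffd_flags_t)mode;
}

/* Hypothetical accessor returning the raw mode bits. */
static enum mfill_atomic_mode uffd_flags_mode_of(uffd_flags_t flags)
{
	return (enum mfill_atomic_mode)(flags & MFILL_ATOMIC_MODE_MASK);
}
```

For example, ZEROPAGE (1) with a second CONTINUE (2) OR-ed in becomes 3, which is not a valid mode. In this series every caller sets the mode exactly once on flags with clear mode bits, so the OR form is correct in practice; the question is only what the name promises.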
> +#define MFILL_ATOMIC_WP MFILL_ATOMIC_FLAG(0)
> +
> extern int mfill_atomic_install_pte(pmd_t *dst_pmd,
> struct vm_area_struct *dst_vma,
> unsigned long dst_addr, struct page *page,
> - bool newly_allocated, bool wp_copy);
> + bool newly_allocated, uffd_flags_t flags);
>
> extern ssize_t mfill_atomic_copy(struct mm_struct *dst_mm, unsigned long dst_start,
> unsigned long src_start, unsigned long len,
> - atomic_t *mmap_changing, __u64 mode);
> + atomic_t *mmap_changing, uffd_flags_t flags);
> extern ssize_t mfill_atomic_zeropage(struct mm_struct *dst_mm,
> unsigned long dst_start,
> unsigned long len,
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index fe043034ab46..493406a2d61e 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -6161,12 +6161,12 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
> struct vm_area_struct *dst_vma,
> unsigned long dst_addr,
> unsigned long src_addr,
> - enum mcopy_atomic_mode mode,
> - struct page **pagep,
> - bool wp_copy)
> + uffd_flags_t flags,
> + struct page **pagep)
> {
> struct mm_struct *dst_mm = dst_vma->vm_mm;
> - bool is_continue = (mode == MCOPY_ATOMIC_CONTINUE);
> + bool is_continue = uffd_flags_has_mode(flags, MFILL_ATOMIC_CONTINUE);
> + bool wp_enabled = (flags & MFILL_ATOMIC_WP);
> struct hstate *h = hstate_vma(dst_vma);
> struct address_space *mapping = dst_vma->vm_file->f_mapping;
> pgoff_t idx = vma_hugecache_offset(h, dst_vma, dst_addr);
> @@ -6301,7 +6301,7 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
> * For either: (1) CONTINUE on a non-shared VMA, or (2) UFFDIO_COPY
> * with wp flag set, don't set pte write bit.
> */
> - if (wp_copy || (is_continue && !vm_shared))
> + if (wp_enabled || (is_continue && !vm_shared))
> writable = 0;
> else
> writable = dst_vma->vm_flags & VM_WRITE;
> @@ -6316,7 +6316,7 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
> _dst_pte = huge_pte_mkdirty(_dst_pte);
> _dst_pte = pte_mkyoung(_dst_pte);
>
> - if (wp_copy)
> + if (wp_enabled)
> _dst_pte = huge_pte_mkuffd_wp(_dst_pte);
>
> set_huge_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 1d751b6cf1ac..7d688afb5e31 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -76,7 +76,6 @@ static struct vfsmount *shm_mnt;
> #include <linux/syscalls.h>
> #include <linux/fcntl.h>
> #include <uapi/linux/memfd.h>
> -#include <linux/userfaultfd_k.h>
> #include <linux/rmap.h>
> #include <linux/uuid.h>
>
> @@ -2419,7 +2418,7 @@ int shmem_mfill_atomic_pte(pmd_t *dst_pmd,
> struct vm_area_struct *dst_vma,
> unsigned long dst_addr,
> unsigned long src_addr,
> - bool zeropage, bool wp_copy,
> + uffd_flags_t flags,
> struct page **pagep)
> {
> struct inode *inode = file_inode(dst_vma->vm_file);
> @@ -2451,7 +2450,7 @@ int shmem_mfill_atomic_pte(pmd_t *dst_pmd,
> if (!folio)
> goto out_unacct_blocks;
>
> - if (!zeropage) { /* COPY */
> + if (uffd_flags_has_mode(flags, MFILL_ATOMIC_COPY)) {
> page_kaddr = kmap_local_folio(folio, 0);
> /*
> * The read mmap_lock is held here. Despite the
> @@ -2510,7 +2509,7 @@ int shmem_mfill_atomic_pte(pmd_t *dst_pmd,
> goto out_release;
>
> ret = mfill_atomic_install_pte(dst_pmd, dst_vma, dst_addr,
> - &folio->page, true, wp_copy);
> + &folio->page, true, flags);
> if (ret)
> goto out_delete_from_cache;
>
> diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> index 4fc373476739..dd807924446f 100644
> --- a/mm/userfaultfd.c
> +++ b/mm/userfaultfd.c
> @@ -58,7 +58,7 @@ struct vm_area_struct *find_dst_vma(struct mm_struct *dst_mm,
> int mfill_atomic_install_pte(pmd_t *dst_pmd,
> struct vm_area_struct *dst_vma,
> unsigned long dst_addr, struct page *page,
> - bool newly_allocated, bool wp_copy)
> + bool newly_allocated, uffd_flags_t flags)
> {
> int ret;
> struct mm_struct *dst_mm = dst_vma->vm_mm;
> @@ -77,7 +77,7 @@ int mfill_atomic_install_pte(pmd_t *dst_pmd,
> writable = false;
> if (writable)
> _dst_pte = pte_mkwrite(_dst_pte);
> - if (wp_copy)
> + if (flags & MFILL_ATOMIC_WP)
> _dst_pte = pte_mkuffd_wp(_dst_pte);
>
> dst_pte = pte_offset_map_lock(dst_mm, dst_pmd, dst_addr, &ptl);
> @@ -132,8 +132,8 @@ static int mfill_atomic_pte_copy(pmd_t *dst_pmd,
> struct vm_area_struct *dst_vma,
> unsigned long dst_addr,
> unsigned long src_addr,
> - struct page **pagep,
> - bool wp_copy)
> + uffd_flags_t flags,
> + struct page **pagep)
> {
> void *page_kaddr;
> int ret;
> @@ -194,7 +194,7 @@ static int mfill_atomic_pte_copy(pmd_t *dst_pmd,
> goto out_release;
>
> ret = mfill_atomic_install_pte(dst_pmd, dst_vma, dst_addr,
> - page, true, wp_copy);
> + page, true, flags);
> if (ret)
> goto out_release;
> out:
> @@ -242,7 +242,7 @@ static int mfill_atomic_pte_zeropage(pmd_t *dst_pmd,
> static int mfill_atomic_pte_continue(pmd_t *dst_pmd,
> struct vm_area_struct *dst_vma,
> unsigned long dst_addr,
> - bool wp_copy)
> + uffd_flags_t flags)
> {
> struct inode *inode = file_inode(dst_vma->vm_file);
> pgoff_t pgoff = linear_page_index(dst_vma, dst_addr);
> @@ -268,7 +268,7 @@ static int mfill_atomic_pte_continue(pmd_t *dst_pmd,
> }
>
> ret = mfill_atomic_install_pte(dst_pmd, dst_vma, dst_addr,
> - page, false, wp_copy);
> + page, false, flags);
> if (ret)
> goto out_release;
>
> @@ -313,8 +313,7 @@ static __always_inline ssize_t mfill_atomic_hugetlb(
> unsigned long dst_start,
> unsigned long src_start,
> unsigned long len,
> - enum mcopy_atomic_mode mode,
> - bool wp_copy)
> + uffd_flags_t flags)
> {
> struct mm_struct *dst_mm = dst_vma->vm_mm;
> int vm_shared = dst_vma->vm_flags & VM_SHARED;
> @@ -334,7 +333,7 @@ static __always_inline ssize_t mfill_atomic_hugetlb(
> * by THP. Since we can not reliably insert a zero page, this
> * feature is not supported.
> */
> - if (mode == MCOPY_ATOMIC_ZEROPAGE) {
> + if (uffd_flags_has_mode(flags, MFILL_ATOMIC_ZEROPAGE)) {
> mmap_read_unlock(dst_mm);
> return -EINVAL;
> }
> @@ -402,7 +401,7 @@ static __always_inline ssize_t mfill_atomic_hugetlb(
> goto out_unlock;
> }
>
> - if (mode != MCOPY_ATOMIC_CONTINUE &&
> + if (!uffd_flags_has_mode(flags, MFILL_ATOMIC_CONTINUE) &&
> !huge_pte_none_mostly(huge_ptep_get(dst_pte))) {
> err = -EEXIST;
> hugetlb_vma_unlock_read(dst_vma);
> @@ -410,9 +409,8 @@ static __always_inline ssize_t mfill_atomic_hugetlb(
> goto out_unlock;
> }
>
> - err = hugetlb_mfill_atomic_pte(dst_pte, dst_vma,
> - dst_addr, src_addr, mode, &page,
> - wp_copy);
> + err = hugetlb_mfill_atomic_pte(dst_pte, dst_vma, dst_addr,
> + src_addr, flags, &page);
>
> hugetlb_vma_unlock_read(dst_vma);
> mutex_unlock(&hugetlb_fault_mutex_table[hash]);
> @@ -466,23 +464,21 @@ extern ssize_t mfill_atomic_hugetlb(struct vm_area_struct *dst_vma,
> unsigned long dst_start,
> unsigned long src_start,
> unsigned long len,
> - enum mcopy_atomic_mode mode,
> - bool wp_copy);
> + uffd_flags_t flags);
> #endif /* CONFIG_HUGETLB_PAGE */
>
> static __always_inline ssize_t mfill_atomic_pte(pmd_t *dst_pmd,
> struct vm_area_struct *dst_vma,
> unsigned long dst_addr,
> unsigned long src_addr,
> - struct page **page,
> - enum mcopy_atomic_mode mode,
> - bool wp_copy)
> + uffd_flags_t flags,
> + struct page **pagep)
> {
> ssize_t err;
>
> - if (mode == MCOPY_ATOMIC_CONTINUE) {
> + if (uffd_flags_has_mode(flags, MFILL_ATOMIC_CONTINUE)) {
> return mfill_atomic_pte_continue(dst_pmd, dst_vma,
> - dst_addr, wp_copy);
> + dst_addr, flags);
> }
>
> /*
> @@ -496,18 +492,17 @@ static __always_inline ssize_t mfill_atomic_pte(pmd_t *dst_pmd,
> * and not in the radix tree.
> */
> if (!(dst_vma->vm_flags & VM_SHARED)) {
> - if (mode == MCOPY_ATOMIC_NORMAL)
> + if (uffd_flags_has_mode(flags, MFILL_ATOMIC_COPY))
> err = mfill_atomic_pte_copy(dst_pmd, dst_vma,
> - dst_addr, src_addr, page,
> - wp_copy);
> + dst_addr, src_addr,
> + flags, pagep);
> else
> err = mfill_atomic_pte_zeropage(dst_pmd,
> dst_vma, dst_addr);
> } else {
> err = shmem_mfill_atomic_pte(dst_pmd, dst_vma,
> dst_addr, src_addr,
> - mode != MCOPY_ATOMIC_NORMAL,
> - wp_copy, page);
> + flags, pagep);
> }
>
> return err;
> @@ -517,9 +512,8 @@ static __always_inline ssize_t mfill_atomic(struct mm_struct *dst_mm,
> unsigned long dst_start,
> unsigned long src_start,
> unsigned long len,
> - enum mcopy_atomic_mode mcopy_mode,
> atomic_t *mmap_changing,
> - __u64 mode)
> + uffd_flags_t flags)
> {
> struct vm_area_struct *dst_vma;
> ssize_t err;
> @@ -527,7 +521,6 @@ static __always_inline ssize_t mfill_atomic(struct mm_struct *dst_mm,
> unsigned long src_addr, dst_addr;
> long copied;
> struct page *page;
> - bool wp_copy;
>
> /*
> * Sanitize the command parameters:
> @@ -577,8 +570,7 @@ static __always_inline ssize_t mfill_atomic(struct mm_struct *dst_mm,
> * validate 'mode' now that we know the dst_vma: don't allow
> * a wrprotect copy if the userfaultfd didn't register as WP.
> */
> - wp_copy = mode & UFFDIO_COPY_MODE_WP;
> - if (wp_copy && !(dst_vma->vm_flags & VM_UFFD_WP))
> + if ((flags & MFILL_ATOMIC_WP) && !(dst_vma->vm_flags & VM_UFFD_WP))
> goto out_unlock;
>
> /*
> @@ -586,12 +578,12 @@ static __always_inline ssize_t mfill_atomic(struct mm_struct *dst_mm,
> */
> if (is_vm_hugetlb_page(dst_vma))
> return mfill_atomic_hugetlb(dst_vma, dst_start,
> - src_start, len, mcopy_mode,
> - wp_copy);
> + src_start, len, flags);
>
> if (!vma_is_anonymous(dst_vma) && !vma_is_shmem(dst_vma))
> goto out_unlock;
> - if (!vma_is_shmem(dst_vma) && mcopy_mode == MCOPY_ATOMIC_CONTINUE)
> + if (!vma_is_shmem(dst_vma) &&
> + uffd_flags_has_mode(flags, MFILL_ATOMIC_CONTINUE))
> goto out_unlock;
>
> /*
> @@ -639,7 +631,7 @@ static __always_inline ssize_t mfill_atomic(struct mm_struct *dst_mm,
> BUG_ON(pmd_trans_huge(*dst_pmd));
>
> err = mfill_atomic_pte(dst_pmd, dst_vma, dst_addr,
> - src_addr, &page, mcopy_mode, wp_copy);
> + src_addr, flags, &page);
> cond_resched();
>
> if (unlikely(err == -ENOENT)) {
> @@ -687,24 +679,24 @@ static __always_inline ssize_t mfill_atomic(struct mm_struct *dst_mm,
>
> ssize_t mfill_atomic_copy(struct mm_struct *dst_mm, unsigned long dst_start,
> unsigned long src_start, unsigned long len,
> - atomic_t *mmap_changing, __u64 mode)
> + atomic_t *mmap_changing, uffd_flags_t flags)
> {
> - return mfill_atomic(dst_mm, dst_start, src_start, len,
> - MCOPY_ATOMIC_NORMAL, mmap_changing, mode);
> + return mfill_atomic(dst_mm, dst_start, src_start, len, mmap_changing,
> + uffd_flags_set_mode(flags, MFILL_ATOMIC_COPY));
> }
>
> ssize_t mfill_atomic_zeropage(struct mm_struct *dst_mm, unsigned long start,
> unsigned long len, atomic_t *mmap_changing)
> {
> - return mfill_atomic(dst_mm, start, 0, len, MCOPY_ATOMIC_ZEROPAGE,
> - mmap_changing, 0);
> + return mfill_atomic(dst_mm, start, 0, len, mmap_changing,
> + uffd_flags_set_mode(0, MFILL_ATOMIC_ZEROPAGE));
> }
>
> ssize_t mfill_atomic_continue(struct mm_struct *dst_mm, unsigned long start,
> unsigned long len, atomic_t *mmap_changing)
> {
> - return mfill_atomic(dst_mm, start, 0, len, MCOPY_ATOMIC_CONTINUE,
> - mmap_changing, 0);
> + return mfill_atomic(dst_mm, start, 0, len, mmap_changing,
> + uffd_flags_set_mode(0, MFILL_ATOMIC_CONTINUE));
> }
>
> long uffd_wp_range(struct vm_area_struct *dst_vma,
> --
> 2.40.0.rc1.284.g88254d51c5-goog
>
--
Sincerely yours,
Mike.
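The writability rule in the hugetlb hunk of this patch (the "(1) CONTINUE on a non-shared VMA, or (2) UFFDIO_COPY with wp flag set" comment) can be modelled as a pure function. This is a sketch under stated assumptions: VM_WRITE and VM_SHARED use the values from include/linux/mm.h, and uffd_pte_writable() is a hypothetical name. In the code shown here, wp_enabled is derived from MFILL_ATOMIC_WP regardless of the mode, so a CONTINUE carrying that flag also ends up non-writable, which appears to be what patch 4/4 builds on.

```c
#include <assert.h>
#include <stdbool.h>

/* Values as in include/linux/mm.h. */
#define VM_WRITE	0x00000002UL
#define VM_SHARED	0x00000008UL

static_assert((VM_WRITE & VM_SHARED) == 0, "distinct vm_flags bits");

/*
 * Mirrors the hugetlb hunk: the new PTE gets the write bit only if the VMA
 * is writable, no WP mapping was requested, and this is not a CONTINUE into
 * a private (non-shared) mapping.
 */
static bool uffd_pte_writable(unsigned long vm_flags, bool is_continue,
			      bool wp_enabled)
{
	bool vm_shared = vm_flags & VM_SHARED;

	if (wp_enabled || (is_continue && !vm_shared))
		return false;
	return vm_flags & VM_WRITE;
}
```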
On Wed, Mar 08, 2023 at 02:19:32PM -0800, Axel Rasmussen wrote:
> UFFDIO_COPY already has UFFDIO_COPY_MODE_WP, so when installing a new
> PTE to resolve a missing fault, one can install a write-protected one.
> This is useful when using UFFDIO_REGISTER_MODE_{MISSING,WP} in
> combination.
>
> This was motivated by testing HugeTLB HGM [1], and in particular its
> interaction with userfaultfd features. Existing userfaultfd code
> supports using WP and MINOR modes together (i.e. you can register an
> area with both enabled), but without this CONTINUE flag the combination
> is in practice unusable.
>
> So, add an analogous UFFDIO_CONTINUE_MODE_WP, which does the same thing
> as UFFDIO_COPY_MODE_WP, but for *minor* faults.
>
> Update the selftest to do some very basic exercising of the new flag.
>
> [1]: https://patchwork.kernel.org/project/linux-mm/cover/[email protected]/
>
> Acked-by: Peter Xu <[email protected]>
> Signed-off-by: Axel Rasmussen <[email protected]>
Acked-by: Mike Rapoport (IBM) <[email protected]>
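For reference, a userspace sketch of resolving a minor fault with the new flag. It assumes the range was registered with UFFDIO_REGISTER_MODE_MINOR | UFFDIO_REGISTER_MODE_WP; per the VM_UFFD_WP check in mfill_atomic() shown in patch 3/4, a WP request on a range without WP registration is rejected. It also allows for a uapi header that predates this series, hence the fallback #define using the value this patch adds. make_continue_wp() and continue_wp() are illustrative names, not an existing library API.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/userfaultfd.h>

/* Older uapi headers predate this series; fall back to the value the
 * patch adds (bit 1 of uffdio_continue.mode). */
#ifndef UFFDIO_CONTINUE_MODE_WP
#define UFFDIO_CONTINUE_MODE_WP ((__u64)1 << 1)
#endif

/* Build a UFFDIO_CONTINUE request that maps the page-cache page(s) backing
 * [start, start + len) write-protected. */
static struct uffdio_continue make_continue_wp(__u64 start, __u64 len,
					       bool dontwake)
{
	const __u64 pagesz = (__u64)sysconf(_SC_PAGESIZE);
	struct uffdio_continue cont;

	/* The kernel rejects empty or non-page-aligned ranges with -EINVAL. */
	assert(len != 0 && ((start | len) & (pagesz - 1)) == 0);

	memset(&cont, 0, sizeof(cont));
	cont.range.start = start;
	cont.range.len = len;
	cont.mode = UFFDIO_CONTINUE_MODE_WP;
	if (dontwake)
		cont.mode |= UFFDIO_CONTINUE_MODE_DONTWAKE;
	return cont;
}

/* Resolve a minor fault with write-protected PTEs.
 * Returns 0 on success or -errno on failure. */
static int continue_wp(int uffd, __u64 start, __u64 len)
{
	struct uffdio_continue cont = make_continue_wp(start, len, false);

	if (ioctl(uffd, UFFDIO_CONTINUE, &cont) == -1)
		return -errno;
	return 0;
}
```

With DONTWAKE clear, the faulting thread is woken once the PTEs are installed. The page is then readable, and the next write raises a userfaultfd WP fault that can be resolved with UFFDIO_WRITEPROTECT.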
> ---
> fs/userfaultfd.c | 8 ++++++--
> include/linux/userfaultfd_k.h | 3 ++-
> include/uapi/linux/userfaultfd.h | 7 +++++++
> mm/userfaultfd.c | 5 +++--
> tools/testing/selftests/mm/userfaultfd.c | 4 ++++
> 5 files changed, 22 insertions(+), 5 deletions(-)
>
> diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
> index 005e5e306266..14059a0861bf 100644
> --- a/include/uapi/linux/userfaultfd.h
> +++ b/include/uapi/linux/userfaultfd.h
> @@ -297,6 +297,13 @@ struct uffdio_writeprotect {
> struct uffdio_continue {
> struct uffdio_range range;
> #define UFFDIO_CONTINUE_MODE_DONTWAKE ((__u64)1<<0)
> + /*
> + * UFFDIO_CONTINUE_MODE_WP will map the page write protected on
> + * the fly. UFFDIO_CONTINUE_MODE_WP is available only if the
> + * write protected ioctl is implemented for the range
> + * according to the uffdio_register.ioctls.
> + */
> +#define UFFDIO_CONTINUE_MODE_WP ((__u64)1<<1)
Please add the description of the new flag to Documentation/ and to the
userfaultfd man pages.
> __u64 mode;
>
> /*
--
Sincerely yours,
Mike.
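The uapi comment ties the new flag to uffdio_register.ioctls. A minimal userspace check along those lines, assuming the caller keeps the ioctls mask returned by UFFDIO_REGISTER; range_supports_continue_wp() is an illustrative name. The mask only says which ioctls the range supports. It does not tell whether the running kernel understands the new mode bit, and this sketch does not try to detect that.

```c
#include <assert.h>
#include <stdbool.h>
#include <linux/userfaultfd.h>

/* Both ioctl numbers index bits in the 64-bit uffdio_register.ioctls mask. */
static_assert(_UFFDIO_CONTINUE < 64 && _UFFDIO_WRITEPROTECT < 64,
	      "ioctl bit numbers must fit in a __u64 mask");

/* True if the registered range advertises UFFDIO_CONTINUE (needed for the
 * ioctl itself) and UFFDIO_WRITEPROTECT (the precondition the uapi comment
 * states for UFFDIO_CONTINUE_MODE_WP). */
static bool range_supports_continue_wp(__u64 ioctls)
{
	const __u64 need = (1ULL << _UFFDIO_CONTINUE) |
			   (1ULL << _UFFDIO_WRITEPROTECT);

	return (ioctls & need) == need;
}
```

Typical use is right after registration: if range_supports_continue_wp(reg.ioctls) is false, fall back to a plain UFFDIO_CONTINUE without the WP bit.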
On Thu, Mar 9, 2023 at 1:11 AM Mike Rapoport <[email protected]> wrote:
>
> On Wed, Mar 08, 2023 at 02:19:32PM -0800, Axel Rasmussen wrote:
> > UFFDIO_COPY already has UFFDIO_COPY_MODE_WP, so when installing a new
> > PTE to resolve a missing fault, one can install a write-protected one.
> > This is useful when using UFFDIO_REGISTER_MODE_{MISSING,WP} in
> > combination.
> >
> > This was motivated by testing HugeTLB HGM [1], and in particular its
> > interaction with userfaultfd features. Existing userfaultfd code
> > supports using WP and MINOR modes together (i.e. you can register an
> > area with both enabled), but without this CONTINUE flag the combination
> > is in practice unusable.
> >
> > So, add an analogous UFFDIO_CONTINUE_MODE_WP, which does the same thing
> > as UFFDIO_COPY_MODE_WP, but for *minor* faults.
> >
> > Update the selftest to do some very basic exercising of the new flag.
> >
> > [1]: https://patchwork.kernel.org/project/linux-mm/cover/[email protected]/
> >
> > Acked-by: Peter Xu <[email protected]>
> > Signed-off-by: Axel Rasmussen <[email protected]>
>
> Acked-by: Mike Rapoport (IBM) <[email protected]>
>
> > ---
> > fs/userfaultfd.c | 8 ++++++--
> > include/linux/userfaultfd_k.h | 3 ++-
> > include/uapi/linux/userfaultfd.h | 7 +++++++
> > mm/userfaultfd.c | 5 +++--
> > tools/testing/selftests/mm/userfaultfd.c | 4 ++++
> > 5 files changed, 22 insertions(+), 5 deletions(-)
> >
> > diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
> > index 005e5e306266..14059a0861bf 100644
> > --- a/include/uapi/linux/userfaultfd.h
> > +++ b/include/uapi/linux/userfaultfd.h
> > @@ -297,6 +297,13 @@ struct uffdio_writeprotect {
> > struct uffdio_continue {
> > struct uffdio_range range;
> > #define UFFDIO_CONTINUE_MODE_DONTWAKE ((__u64)1<<0)
> > + /*
> > + * UFFDIO_CONTINUE_MODE_WP will map the page write protected on
> > + * the fly. UFFDIO_CONTINUE_MODE_WP is available only if the
> > + * write protected ioctl is implemented for the range
> > + * according to the uffdio_register.ioctls.
> > + */
> > +#define UFFDIO_CONTINUE_MODE_WP ((__u64)1<<1)
>
> Please add the description of the new flag to Documentation/ and to the
> userfaultfd man pages.
Funnily enough, neither flag is mentioned in Documentation/ today; I'll
add a short passage about both.
I'm happy to update the man pages as well; I'll send that patch separately.
Thanks for reviewing!
>
> > __u64 mode;
> >
> > /*
>
> --
> Sincerely yours,
> Mike.
On Wed, Mar 8, 2023 at 2:43 PM Peter Xu <[email protected]> wrote:
>
> All nitpicks below.
>
> On Wed, Mar 08, 2023 at 02:19:31PM -0800, Axel Rasmussen wrote:
> > +static inline bool uffd_flags_has_mode(uffd_flags_t flags, enum mfill_atomic_mode expected)
> > +{
> > + return (flags & MFILL_ATOMIC_MODE_MASK) == ((__force uffd_flags_t) expected);
> > +}
>
> I would still call it uffd_flags_get_mode() or uffd_flags_mode(); "has"
> sounds a bit like more than one mode can be set, but that's not the case.
I want a helper which does the comparison, instead of just returning
the mode, because it avoids all callers needing to do the __force cast
themselves to appease sparse.
How about uffd_flags_mode_is()?
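A minimal sketch of the comparison helper proposed here, written as plain userspace C. The __bitwise/__force definitions imitate how the kernel turns them into sparse-only annotations; the enum values and the two-bit MFILL_ATOMIC_MODE_MASK are illustrative assumptions rather than the patch's exact definitions. The point is that the single __force cast lives inside the helper, so callers never need one.

```c
#include <assert.h>
#include <stdbool.h>

/* Sparse-only annotations: they compile away unless __CHECKER__ is set. */
#ifdef __CHECKER__
#define __bitwise __attribute__((bitwise))
#define __force   __attribute__((force))
#else
#define __bitwise
#define __force
#endif

typedef unsigned int __bitwise uffd_flags_t;

/* Illustrative mode values; assumed, not copied from the patch. */
enum mfill_atomic_mode {
	MFILL_ATOMIC_COPY,
	MFILL_ATOMIC_ZEROPAGE,
	MFILL_ATOMIC_CONTINUE,
	NR_MFILL_ATOMIC_MODES,
};

/* Assumed layout: the mode occupies the low two bits of the flags. */
#define MFILL_ATOMIC_MODE_MASK ((__force uffd_flags_t) 0x3u)

static inline bool uffd_flags_mode_is(uffd_flags_t flags,
				      enum mfill_atomic_mode expected)
{
	/* Catch callers passing something that is not a real mode. */
	assert(expected < NR_MFILL_ATOMIC_MODES);
	/* The one __force cast lives here, so sparse stays quiet at call sites. */
	return (flags & MFILL_ATOMIC_MODE_MASK) == ((__force uffd_flags_t) expected);
}
```

A caller then writes uffd_flags_mode_is(flags, MFILL_ATOMIC_CONTINUE), passing a plain enum and getting a bool back, with no cast at the call site.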
>
> > +
> > +static inline uffd_flags_t uffd_flags_set_mode(uffd_flags_t flags, enum mfill_atomic_mode mode)
> > +{
> > + return flags | ((__force uffd_flags_t) mode);
> > +}
>
> IIUC this __force mostly has no effect: the restricted type protects
> against e.g. illegal math ops on the value (only bitops are allowed, iiuc),
> but here it's an OR, so it's always legal.
>
> So I'd just drop it, and also clear the mode mask first to make it very
> clear that it sets the mode correctly, with no chance of messing up when
> it's set twice:
>
> flags &= ~MFILL_ATOMIC_MODE_MASK;
> return flags | mode;
Without this __force, "make C=1" gives warnings like these:
./include/linux/userfaultfd_k.h:66:16: warning: restricted uffd_flags_t degrades to integer
./include/linux/userfaultfd_k.h:66:22: warning: incorrect type in return expression (different base types)
./include/linux/userfaultfd_k.h:66:22: expected restricted uffd_flags_t
./include/linux/userfaultfd_k.h:66:22: got unsigned int
This is because the mode being passed in is effectively an integer, so
the | expression loses the restricted type. Casting the mode first
like this appeases sparse.
An alternative would be to do the cast in the definition of the mode
values up-front; but as we noticed before, we can't really usefully do
that with it still being an enum (so we'd have to hard-code things
like the mode mask, etc.)
I do completely agree about clearing the mask bits first, to avoid
mistakes. I'll send an updated version with that change. If we're
going to have an inline helper to do that anyway, it makes less sense
to me to switch away from the enum approach: the main benefit of
switching would be avoiding this cast, and therefore the helper, but
we want the helper anyway for other reasons.
>
> But feel free to ignore this if there's no other reason to repost; I don't
> think it matters a huge deal.
>
> Acked-by: Peter Xu <[email protected]>
>
> Thanks,
>
> --
> Peter Xu
>