2021-05-03 23:45:21

by Peter Xu

Subject: [PATCH v2 0/2] mm/hugetlb: Fix issues on file sealing and fork

v2:
- Move seal check to be after setting VM_HUGETLB [Mike]
- Rewrite commit message for patch 2, explaining more on why it got broken
- Add r-bs for Mike



Hugh reported that F_SEAL_FUTURE_WRITE is not applied correctly to hugetlbfs.
I can easily reproduce this with the memfd_test program; it seems the program
is rarely run against hugetlbfs pages (it uses shmem by default).

Meanwhile I found another, probably even more severe, issue: hugetlb fork does
not wr-protect the child's cow pages, so the child can potentially write to the
parent's private pages. Patch 2 addresses that.

After this series is applied, "memfd_test hugetlbfs" should start to pass.

Please review, thanks.



Peter Xu (2):
  mm/hugetlb: Fix F_SEAL_FUTURE_WRITE
  mm/hugetlb: Fix cow where page writtable in child

 fs/hugetlbfs/inode.c |  5 +++++
 include/linux/mm.h   | 32 ++++++++++++++++++++++++++++++++
 mm/hugetlb.c         |  1 +
 mm/shmem.c           | 22 ++++------------------
 4 files changed, 42 insertions(+), 18 deletions(-)

--
2.31.1





2021-05-03 23:47:20

by Peter Xu

Subject: [PATCH v2 2/2] mm/hugetlb: Fix cow where page writtable in child

When reworking early cow of pinned hugetlb pages, we moved huge_ptep_get()
earlier but overlooked a side effect: huge_ptep_get() used to fetch the pte
after wr-protection. After moving it up, we need to explicitly wr-protect the
child pte, or we will keep the write bit set in the child process, which could
cause data corruption where the child can write to the original page directly.

This issue can also be exposed by "memfd_test hugetlbfs" kselftest.

Cc: [email protected]
Fixes: 4eae4efa2c299 ("hugetlb: do early cow when page pinned on src mm")
Reviewed-by: Mike Kravetz <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
---
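Reader note (not part of the commit message): the ordering problem described
above can be sketched in plain userspace C. This is only a toy model under
stated assumptions: toy_pte_t, TOY_PTE_PRESENT, TOY_PTE_WRITE and
copy_pte_for_fork() are hypothetical names made up for illustration, not kernel
APIs, and real ptes are architecture-specific. The point it tries to show is
that a snapshot of the pte taken before the parent is wr-protected still
carries the write bit unless it is cleared separately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy pte layout: bit 0 = present, bit 1 = writable. */
typedef uint64_t toy_pte_t;
#define TOY_PTE_PRESENT ((toy_pte_t)1 << 0)
#define TOY_PTE_WRITE   ((toy_pte_t)1 << 1)

/*
 * Toy version of copying one pte at fork time. Returns the value that
 * would be installed in the child. With apply_fix == false this mimics
 * the broken ordering; with apply_fix == true it mimics the one-line fix.
 */
toy_pte_t copy_pte_for_fork(toy_pte_t *src_pte, bool cow, bool apply_fix)
{
	toy_pte_t entry;

	assert(src_pte != NULL);

	/* Early read: the snapshot still carries the write bit. */
	entry = *src_pte;

	if (cow) {
		/* Stand-in for huge_ptep_set_wrprotect(): parent pte only. */
		*src_pte &= ~TOY_PTE_WRITE;

		/* Stand-in for the added huge_pte_wrprotect(entry). */
		if (apply_fix)
			entry &= ~TOY_PTE_WRITE;
	}

	return entry;
}
```

In this model, calling copy_pte_for_fork(&parent, true, false) on a writable
parent entry leaves the returned child entry writable even though the parent
was wr-protected, while passing apply_fix == true clears the bit in both.
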
 mm/hugetlb.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index aab3a33214d10..72544ebb24f0e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4076,6 +4076,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 				 * See Documentation/vm/mmu_notifier.rst
 				 */
 				huge_ptep_set_wrprotect(src, addr, src_pte);
+				entry = huge_pte_wrprotect(entry);
 			}
 
 			page_dup_rmap(ptepage, true);
--
2.31.1

2021-05-03 23:47:37

by Peter Xu

Subject: [PATCH v2 1/2] mm/hugetlb: Fix F_SEAL_FUTURE_WRITE

F_SEAL_FUTURE_WRITE has been missing for hugetlb since day one.
There is a test program for it, and it fails consistently:

$ ./memfd_test hugetlbfs
memfd-hugetlb: CREATE
memfd-hugetlb: BASIC
memfd-hugetlb: SEAL-WRITE
memfd-hugetlb: SEAL-FUTURE-WRITE
mmap() didn't fail as expected
Aborted (core dumped)

This is probably because no one is really running the hugetlbfs test.

Fix it by also checking F_SEAL_FUTURE_WRITE in hugetlbfs_file_mmap(), as we
already do in shmem_mmap(). Factor the check out into a shared helper for that.

Cc: Joel Fernandes (Google) <[email protected]>
Cc: [email protected]
Fixes: ab3948f58ff84 ("mm/memfd: add an F_SEAL_FUTURE_WRITE seal to memfd")
Reported-by: Hugh Dickins <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
---
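Reader note (not part of the commit message): a minimal userspace sketch of
the check that the failing SEAL-FUTURE-WRITE step exercises, assuming a Linux
kernel with memfd sealing support. The function name
sealed_shared_write_mmap_denied() and its return convention are made up for
illustration; it is not taken from memfd_test. The raw syscall and the
fallback constants are there only so the sketch builds with older C library
headers.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Fallbacks for older libc headers; values match the Linux UAPI headers. */
#ifndef MFD_ALLOW_SEALING
#define MFD_ALLOW_SEALING 0x0002U
#endif
#ifndef MFD_HUGETLB
#define MFD_HUGETLB 0x0004U
#endif
#ifndef F_ADD_SEALS
#define F_ADD_SEALS 1033
#endif
#ifndef F_SEAL_FUTURE_WRITE
#define F_SEAL_FUTURE_WRITE 0x0010
#endif

/*
 * Create a memfd, seal it with F_SEAL_FUTURE_WRITE, then try a shared
 * writable mapping.
 *
 * Returns  1 if mmap() is refused with EPERM (the expected behaviour),
 *          0 if mmap() succeeds (the seal was not enforced),
 *         -1 if the environment could not run the check at all.
 */
int sealed_shared_write_mmap_denied(unsigned int extra_flags, size_t len)
{
	void *p;
	int fd, result = -1;

	assert(len > 0);

	fd = (int)syscall(SYS_memfd_create, "seal-demo",
			  MFD_ALLOW_SEALING | extra_flags);
	if (fd < 0)
		return -1;

	if (ftruncate(fd, (off_t)len) < 0)
		goto out;
	if (fcntl(fd, F_ADD_SEALS, F_SEAL_FUTURE_WRITE) < 0)
		goto out;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		result = (errno == EPERM) ? 1 : -1;
	} else {
		munmap(p, len);
		result = 0;
	}
out:
	close(fd);
	return result;
}
```

Calling sealed_shared_write_mmap_denied(0, 4096) exercises the shmem path.
Passing MFD_HUGETLB with a length that is a multiple of the huge page size
should exercise the hugetlbfs path instead; on a kernel without this fix I
would expect that call to return 0, which corresponds to the "mmap() didn't
fail as expected" failure above.
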
 fs/hugetlbfs/inode.c |  5 +++++
 include/linux/mm.h   | 32 ++++++++++++++++++++++++++++++++
 mm/shmem.c           | 22 ++++------------------
 3 files changed, 41 insertions(+), 18 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 9b383c39756a5..6557cf2cb1879 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -131,6 +131,7 @@ static void huge_pagevec_release(struct pagevec *pvec)
 static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
 {
 	struct inode *inode = file_inode(file);
+	struct hugetlbfs_inode_info *info = HUGETLBFS_I(inode);
 	loff_t len, vma_len;
 	int ret;
 	struct hstate *h = hstate_file(file);
@@ -146,6 +147,10 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
 	vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND;
 	vma->vm_ops = &hugetlb_vm_ops;
 
+	ret = seal_check_future_write(info->seals, vma);
+	if (ret)
+		return ret;
+
 	/*
 	 * page based offset in vm_pgoff could be sufficiently large to
 	 * overflow a loff_t when converted to byte offset. This can
diff --git a/include/linux/mm.h b/include/linux/mm.h
index d6790ab0cf575..b9b2caf9302bc 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3238,5 +3238,37 @@ extern int sysctl_nr_trim_pages;
 
 void mem_dump_obj(void *object);
 
+/**
+ * seal_check_future_write - Check for F_SEAL_FUTURE_WRITE flag and handle it
+ * @seals: the seals to check
+ * @vma: the vma to operate on
+ *
+ * Check whether F_SEAL_FUTURE_WRITE is set; if so, do proper check/handling on
+ * the vma flags. Return 0 if check pass, or <0 for errors.
+ */
+static inline int seal_check_future_write(int seals, struct vm_area_struct *vma)
+{
+	if (seals & F_SEAL_FUTURE_WRITE) {
+		/*
+		 * New PROT_WRITE and MAP_SHARED mmaps are not allowed when
+		 * "future write" seal active.
+		 */
+		if ((vma->vm_flags & VM_SHARED) && (vma->vm_flags & VM_WRITE))
+			return -EPERM;
+
+		/*
+		 * Since an F_SEAL_FUTURE_WRITE sealed memfd can be mapped as
+		 * MAP_SHARED and read-only, take care to not allow mprotect to
+		 * revert protections on such mappings. Do this only for shared
+		 * mappings. For private mappings, don't need to mask
+		 * VM_MAYWRITE as we still want them to be COW-writable.
+		 */
+		if (vma->vm_flags & VM_SHARED)
+			vma->vm_flags &= ~(VM_MAYWRITE);
+	}
+
+	return 0;
+}
+
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
diff --git a/mm/shmem.c b/mm/shmem.c
index a1f21736ad68e..250b52e682590 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2258,25 +2258,11 @@ int shmem_lock(struct file *file, int lock, struct user_struct *user)
 static int shmem_mmap(struct file *file, struct vm_area_struct *vma)
 {
 	struct shmem_inode_info *info = SHMEM_I(file_inode(file));
+	int ret;
 
-	if (info->seals & F_SEAL_FUTURE_WRITE) {
-		/*
-		 * New PROT_WRITE and MAP_SHARED mmaps are not allowed when
-		 * "future write" seal active.
-		 */
-		if ((vma->vm_flags & VM_SHARED) && (vma->vm_flags & VM_WRITE))
-			return -EPERM;
-
-		/*
-		 * Since an F_SEAL_FUTURE_WRITE sealed memfd can be mapped as
-		 * MAP_SHARED and read-only, take care to not allow mprotect to
-		 * revert protections on such mappings. Do this only for shared
-		 * mappings. For private mappings, don't need to mask
-		 * VM_MAYWRITE as we still want them to be COW-writable.
-		 */
-		if (vma->vm_flags & VM_SHARED)
-			vma->vm_flags &= ~(VM_MAYWRITE);
-	}
+	ret = seal_check_future_write(info->seals, vma);
+	if (ret)
+		return ret;
 
 	/* arm64 - allow memory tagging on RAM-based files */
 	vma->vm_flags |= VM_MTE_ALLOWED;
--
2.31.1