2023-04-05 14:27:22

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v1 0/2] mm/userfaultfd: fix and cleanup for migration entries with uffd-wp

One fix (I have a simple reproducer but it's too long to paste it into
the commit) and one cleanup.

uffd-wp migration entry handling for PTE/PMDs should now be fairly similar
code-wise.

Cc: Andrew Morton <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Muhammad Usama Anjum <[email protected]>

David Hildenbrand (2):
mm/userfaultfd: fix uffd-wp handling for THP migration entries
mm/userfaultfd: don't consider uffd-wp bit of writable migration
entries

mm/huge_memory.c | 16 ++++++++++++----
mm/mprotect.c | 2 --
2 files changed, 12 insertions(+), 6 deletions(-)

--
2.39.2


2023-04-05 14:27:26

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v1 2/2] mm/userfaultfd: don't consider uffd-wp bit of writable migration entries

If we end up with a writable migration entry that has the uffd-wp bit set,
we already messed up: the source PTE/PMD was writable, which means we could
have modified the page without notifying uffd first. Setting the uffd-wp
bit always implies converting migration entries to !writable migration
entries.

Commit 8f34f1eac382 ("mm/userfaultfd: fix uffd-wp special cases for
fork()") documents that "3. Forget to carry over uffd-wp bit for a write
migration huge pmd entry", but it doesn't really say why that should be
relevant.

So let's remove that code to avoid hiding an eventual underlying issue
(in the future, we might want to warn when creating writable migration
entries that have the uffd-wp bit set -- or even better when turning a
PTE writable that still has the uffd-wp bit set).

This now matches the handling for hugetlb migration entries in
hugetlb_change_protection().

In copy_huge_pmd()/copy_nonpresent_pte()/copy_hugetlb_page_range(), we
still transfer the uffd-bit also for writable migration entries, but simply
because we have unified handling for "writable" and "readable-exclusive"
migration entries, and we care about transferring the uffd-wp bit for
the latter.

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/huge_memory.c | 2 --
mm/mprotect.c | 2 --
2 files changed, 4 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index bdda4f426d58..f977c965fdad 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1853,8 +1853,6 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
newpmd = swp_entry_to_pmd(entry);
if (pmd_swp_soft_dirty(*pmd))
newpmd = pmd_swp_mksoft_dirty(newpmd);
- if (pmd_swp_uffd_wp(*pmd))
- newpmd = pmd_swp_mkuffd_wp(newpmd);
} else {
newpmd = *pmd;
}
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 13e84d8c0797..e04e9ea62ae7 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -223,8 +223,6 @@ static long change_pte_range(struct mmu_gather *tlb,
newpte = swp_entry_to_pte(entry);
if (pte_swp_soft_dirty(oldpte))
newpte = pte_swp_mksoft_dirty(newpte);
- if (pte_swp_uffd_wp(oldpte))
- newpte = pte_swp_mkuffd_wp(newpte);
} else if (is_writable_device_private_entry(entry)) {
/*
* We do not preserve soft-dirtiness. See
--
2.39.2

2023-04-05 14:28:00

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v1 1/2] mm/userfaultfd: fix uffd-wp handling for THP migration entries

Looks like what we fixed for hugetlb in commit 44f86392bdd1 ("mm/hugetlb:
fix uffd-wp handling for migration entries in hugetlb_change_protection()")
similarly applies to THP.

Setting/clearing uffd-wp on THP migration entries is not implemented
properly. Further, while removing migration PMDs considers the uffd-wp
bit, inserting migration PMDs does not consider the uffd-wp bit.

We have to set/clear independently of the migration entry type in
change_huge_pmd() and properly copy the uffd-wp bit in
set_pmd_migration_entry().

Verified using a simple reproducer that triggers migration of a THP, that
the set_pmd_migration_entry() no longer loses the uffd-wp bit.

Fixes: f45ec5ff16a7 ("userfaultfd: wp: support swap and page migration")
Cc: [email protected]
Signed-off-by: David Hildenbrand <[email protected]>
---
mm/huge_memory.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 032fb0ef9cd1..bdda4f426d58 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1838,10 +1838,10 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
if (is_swap_pmd(*pmd)) {
swp_entry_t entry = pmd_to_swp_entry(*pmd);
struct page *page = pfn_swap_entry_to_page(entry);
+ pmd_t newpmd;

VM_BUG_ON(!is_pmd_migration_entry(*pmd));
if (is_writable_migration_entry(entry)) {
- pmd_t newpmd;
/*
* A protection check is difficult so
* just be safe and disable write
@@ -1855,8 +1855,16 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
newpmd = pmd_swp_mksoft_dirty(newpmd);
if (pmd_swp_uffd_wp(*pmd))
newpmd = pmd_swp_mkuffd_wp(newpmd);
- set_pmd_at(mm, addr, pmd, newpmd);
+ } else {
+ newpmd = *pmd;
}
+
+ if (uffd_wp)
+ newpmd = pmd_swp_mkuffd_wp(newpmd);
+ else if (uffd_wp_resolve)
+ newpmd = pmd_swp_clear_uffd_wp(newpmd);
+ if (!pmd_same(*pmd, newpmd))
+ set_pmd_at(mm, addr, pmd, newpmd);
goto unlock;
}
#endif
@@ -3251,6 +3259,8 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
pmdswp = swp_entry_to_pmd(entry);
if (pmd_soft_dirty(pmdval))
pmdswp = pmd_swp_mksoft_dirty(pmdswp);
+ if (pmd_swp_uffd_wp(*pvmw->pmd))
+ pmdswp = pmd_swp_mkuffd_wp(pmdswp);
set_pmd_at(mm, address, pvmw->pmd, pmdswp);
page_remove_rmap(page, vma, true);
put_page(page);
--
2.39.2

2023-04-05 15:16:41

by Peter Xu

[permalink] [raw]
Subject: Re: [PATCH v1 1/2] mm/userfaultfd: fix uffd-wp handling for THP migration entries

On Wed, Apr 05, 2023 at 04:25:34PM +0200, David Hildenbrand wrote:
> Looks like what we fixed for hugetlb in commit 44f86392bdd1 ("mm/hugetlb:
> fix uffd-wp handling for migration entries in hugetlb_change_protection()")
> similarly applies to THP.
>
> Setting/clearing uffd-wp on THP migration entries is not implemented
> properly. Further, while removing migration PMDs considers the uffd-wp
> bit, inserting migration PMDs does not consider the uffd-wp bit.
>
> We have to set/clear independently of the migration entry type in
> change_huge_pmd() and properly copy the uffd-wp bit in
> set_pmd_migration_entry().
>
> Verified using a simple reproducer that triggers migration of a THP, that
> the set_pmd_migration_entry() no longer loses the uffd-wp bit.
>
> Fixes: f45ec5ff16a7 ("userfaultfd: wp: support swap and page migration")
> Cc: [email protected]
> Signed-off-by: David Hildenbrand <[email protected]>

Reviewed-by: Peter Xu <[email protected]>

Thanks, one trivial nitpick:

> ---
> mm/huge_memory.c | 14 ++++++++++++--
> 1 file changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 032fb0ef9cd1..bdda4f426d58 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1838,10 +1838,10 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
> if (is_swap_pmd(*pmd)) {
> swp_entry_t entry = pmd_to_swp_entry(*pmd);
> struct page *page = pfn_swap_entry_to_page(entry);
> + pmd_t newpmd;
>
> VM_BUG_ON(!is_pmd_migration_entry(*pmd));
> if (is_writable_migration_entry(entry)) {
> - pmd_t newpmd;
> /*
> * A protection check is difficult so
> * just be safe and disable write
> @@ -1855,8 +1855,16 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
> newpmd = pmd_swp_mksoft_dirty(newpmd);
> if (pmd_swp_uffd_wp(*pmd))
> newpmd = pmd_swp_mkuffd_wp(newpmd);
> - set_pmd_at(mm, addr, pmd, newpmd);
> + } else {
> + newpmd = *pmd;
> }
> +
> + if (uffd_wp)
> + newpmd = pmd_swp_mkuffd_wp(newpmd);
> + else if (uffd_wp_resolve)
> + newpmd = pmd_swp_clear_uffd_wp(newpmd);
> + if (!pmd_same(*pmd, newpmd))
> + set_pmd_at(mm, addr, pmd, newpmd);
> goto unlock;
> }
> #endif
> @@ -3251,6 +3259,8 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
> pmdswp = swp_entry_to_pmd(entry);
> if (pmd_soft_dirty(pmdval))
> pmdswp = pmd_swp_mksoft_dirty(pmdswp);
> + if (pmd_swp_uffd_wp(*pvmw->pmd))
> + pmdswp = pmd_swp_mkuffd_wp(pmdswp);

I think it's fine to use *pmd, but maybe still better to use pmdval? I
worry pmdp_invalidate()) can be something else in the future that may
affect the bit.

> set_pmd_at(mm, address, pvmw->pmd, pmdswp);
> page_remove_rmap(page, vma, true);
> put_page(page);
> --
> 2.39.2
>

--
Peter Xu

2023-04-05 15:18:33

by Peter Xu

[permalink] [raw]
Subject: Re: [PATCH v1 2/2] mm/userfaultfd: don't consider uffd-wp bit of writable migration entries

On Wed, Apr 05, 2023 at 04:25:35PM +0200, David Hildenbrand wrote:
> If we end up with a writable migration entry that has the uffd-wp bit set,
> we already messed up: the source PTE/PMD was writable, which means we could
> have modified the page without notifying uffd first. Setting the uffd-wp
> bit always implies converting migration entries to !writable migration
> entries.
>
> Commit 8f34f1eac382 ("mm/userfaultfd: fix uffd-wp special cases for
> fork()") documents that "3. Forget to carry over uffd-wp bit for a write
> migration huge pmd entry", but it doesn't really say why that should be
> relevant.
>
> So let's remove that code to avoid hiding an eventual underlying issue
> (in the future, we might want to warn when creating writable migration
> entries that have the uffd-wp bit set -- or even better when turning a
> PTE writable that still has the uffd-wp bit set).
>
> This now matches the handling for hugetlb migration entries in
> hugetlb_change_protection().
>
> In copy_huge_pmd()/copy_nonpresent_pte()/copy_hugetlb_page_range(), we
> still transfer the uffd-bit also for writable migration entries, but simply
> because we have unified handling for "writable" and "readable-exclusive"
> migration entries, and we care about transferring the uffd-wp bit for
> the latter.
>
> Signed-off-by: David Hildenbrand <[email protected]>

Reviewed-by: Peter Xu <[email protected]>

I think that's mostly for sanity to carry over one generic bit between
present <-> !present, even if uffd-wp is not that generic and currently
closely bound to write bit.

E.g., we will need to be more careful when we want to change the meaning of
uffd-wp bit some day, but that'll always be challenging anyway, so not
something this will change.

Thanks,

--
Peter Xu

2023-04-05 15:29:08

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v1 1/2] mm/userfaultfd: fix uffd-wp handling for THP migration entries

On 05.04.23 17:12, Peter Xu wrote:
> On Wed, Apr 05, 2023 at 04:25:34PM +0200, David Hildenbrand wrote:
>> Looks like what we fixed for hugetlb in commit 44f86392bdd1 ("mm/hugetlb:
>> fix uffd-wp handling for migration entries in hugetlb_change_protection()")
>> similarly applies to THP.
>>
>> Setting/clearing uffd-wp on THP migration entries is not implemented
>> properly. Further, while removing migration PMDs considers the uffd-wp
>> bit, inserting migration PMDs does not consider the uffd-wp bit.
>>
>> We have to set/clear independently of the migration entry type in
>> change_huge_pmd() and properly copy the uffd-wp bit in
>> set_pmd_migration_entry().
>>
>> Verified using a simple reproducer that triggers migration of a THP, that
>> the set_pmd_migration_entry() no longer loses the uffd-wp bit.
>>
>> Fixes: f45ec5ff16a7 ("userfaultfd: wp: support swap and page migration")
>> Cc: [email protected]
>> Signed-off-by: David Hildenbrand <[email protected]>
>
> Reviewed-by: Peter Xu <[email protected]>
>
> Thanks, one trivial nitpick:
>
>> ---
>> mm/huge_memory.c | 14 ++++++++++++--
>> 1 file changed, 12 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 032fb0ef9cd1..bdda4f426d58 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -1838,10 +1838,10 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
>> if (is_swap_pmd(*pmd)) {
>> swp_entry_t entry = pmd_to_swp_entry(*pmd);
>> struct page *page = pfn_swap_entry_to_page(entry);
>> + pmd_t newpmd;
>>
>> VM_BUG_ON(!is_pmd_migration_entry(*pmd));
>> if (is_writable_migration_entry(entry)) {
>> - pmd_t newpmd;
>> /*
>> * A protection check is difficult so
>> * just be safe and disable write
>> @@ -1855,8 +1855,16 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
>> newpmd = pmd_swp_mksoft_dirty(newpmd);
>> if (pmd_swp_uffd_wp(*pmd))
>> newpmd = pmd_swp_mkuffd_wp(newpmd);
>> - set_pmd_at(mm, addr, pmd, newpmd);
>> + } else {
>> + newpmd = *pmd;
>> }
>> +
>> + if (uffd_wp)
>> + newpmd = pmd_swp_mkuffd_wp(newpmd);
>> + else if (uffd_wp_resolve)
>> + newpmd = pmd_swp_clear_uffd_wp(newpmd);
>> + if (!pmd_same(*pmd, newpmd))
>> + set_pmd_at(mm, addr, pmd, newpmd);
>> goto unlock;
>> }
>> #endif
>> @@ -3251,6 +3259,8 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
>> pmdswp = swp_entry_to_pmd(entry);
>> if (pmd_soft_dirty(pmdval))
>> pmdswp = pmd_swp_mksoft_dirty(pmdswp);
>> + if (pmd_swp_uffd_wp(*pvmw->pmd))
>> + pmdswp = pmd_swp_mkuffd_wp(pmdswp);
>
> I think it's fine to use *pmd, but maybe still better to use pmdval? I
> worry pmdp_invalidate()) can be something else in the future that may
> affect the bit.

Wondering how I ended up with that, I realized that it's actually
wrong and might have worked by chance for my reproducer on x86.

That should make it work:

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index f977c965fdad..fffc953fa6ea 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3257,7 +3257,7 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
pmdswp = swp_entry_to_pmd(entry);
if (pmd_soft_dirty(pmdval))
pmdswp = pmd_swp_mksoft_dirty(pmdswp);
- if (pmd_swp_uffd_wp(*pvmw->pmd))
+ if (pmd_uffd_wp(pmdval))
pmdswp = pmd_swp_mkuffd_wp(pmdswp);
set_pmd_at(mm, address, pvmw->pmd, pmdswp);
page_remove_rmap(page, vma, true);


--
Thanks,

David / dhildenb

2023-04-05 15:29:18

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v1 0/2] mm/userfaultfd: fix and cleanup for migration entries with uffd-wp

On 05.04.23 17:17, Peter Xu wrote:
> On Wed, Apr 05, 2023 at 04:25:33PM +0200, David Hildenbrand wrote:
>> One fix (I have a simple reproducer but it's too long to paste it into
>> the commit)
>
> I hope the recent rework of the unit test can help having more of these
> unit tests into the kselftests directly.
>
> Currently it's still ugly even after the rework patchset - we'll need to
> reference e.g. area_dst in unit tests for the allocated memory region (with
> specified type of either MEM_ANON, MEM_SHMEM, ...), but there's plan to
> make it even better.

Yes, I refrained from messing with the selftest for now while you rework
Here is the hacky reproducer:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>
#include <linux/mempolicy.h>

static int uffd;
static int pagemap_fd;
static size_t pagesize;

static uint64_t pagemap_get_entry(int fd, char *start)
{
const unsigned long pfn = (unsigned long)start / pagesize;
uint64_t entry;
int ret;

ret = pread(fd, &entry, sizeof(entry), pfn * sizeof(entry));
if (ret != sizeof(entry)) {
fprintf(stderr, "pread() failed\n");
exit(1);
}

return entry;
}

#define BIT_ULL(nr) (1ULL << (nr))
#define PM_UFFD_WP BIT_ULL(57)
#define PM_SWAP BIT_ULL(62)
#define PM_PRESENT BIT_ULL(63)

static bool pagemap_is_uffd_wp(int fd, char *start)
{
uint64_t entry = pagemap_get_entry(fd, start);

return entry & PM_UFFD_WP;
}

static bool pagemap_is_populated(int fd, char *start)
{
uint64_t entry = pagemap_get_entry(fd, start);

return entry & (PM_SWAP | PM_PRESENT);
}

static int setup_uffd(char *mem, size_t size)
{
struct uffdio_api uffdio_api;
struct uffdio_register uffdio_register;
struct uffdio_range uffd_range;

uffd = syscall(__NR_userfaultfd,
O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY);
if (uffd < 0) {
fprintf(stderr, "syscall() failed: %d\n", errno);
return -errno;
}

uffdio_api.api = UFFD_API;
uffdio_api.features = UFFD_FEATURE_PAGEFAULT_FLAG_WP;
if (ioctl(uffd, UFFDIO_API, &uffdio_api) < 0) {
fprintf(stderr, "UFFDIO_API failed: %d\n", errno);
return -errno;
}

if (!(uffdio_api.features & UFFD_FEATURE_PAGEFAULT_FLAG_WP)) {
fprintf(stderr, "UFFD_FEATURE_PAGEFAULT_FLAG_WP missing\n");
return -ENOSYS;
}

/* Register UFFD-WP */
uffdio_register.range.start = (unsigned long) mem;
uffdio_register.range.len = size;
uffdio_register.mode = UFFDIO_REGISTER_MODE_WP;
if (ioctl(uffd, UFFDIO_REGISTER, &uffdio_register) < 0) {
fprintf(stderr, "UFFDIO_REGISTER failed: %d\n", errno);
return -errno;
}

return 0;
}

int main(int argc, char **argv)
{
struct uffdio_writeprotect uffd_writeprotect;
const size_t thpsize = 2 * 1024 * 1024;
const size_t mmap_size = 2 * thpsize;
char *mem;

pagesize = getpagesize();
pagemap_fd = open("/proc/self/pagemap", O_RDONLY);
if (pagemap_fd < 0) {
fprintf(stderr, "open() failed\n");
exit(1);
}

mem = mmap(NULL, mmap_size, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON,
-1, 0);
if (mem == MAP_FAILED) {
fprintf(stderr, "mmap() failed\n");
return -errno;
}
mem = (char *)(((uintptr_t)mem + thpsize) & ~(thpsize - 1));

if (madvise(mem, thpsize, MADV_HUGEPAGE)) {
fprintf(stderr, "MADV_HUGEPAGE failed\n");
return -errno;
}

/* Populate a THP. */
memset(mem, 0, pagesize);

if (!pagemap_is_populated(pagemap_fd, mem + thpsize - pagesize)) {
fprintf(stderr, "Did not get a THP populated\n");
return -EBUSY;
}

/* Verify uffd-wp is not set. */
if (pagemap_is_uffd_wp(pagemap_fd, mem)) {
fprintf(stderr, "uffd-wp unexpectedly set\n");
return -1;
}

/* Setup UFFD and protect the page. */
if (setup_uffd(mem, thpsize))
return 1;
uffd_writeprotect.range.start = (unsigned long) mem;
uffd_writeprotect.range.len = thpsize;
uffd_writeprotect.mode = UFFDIO_WRITEPROTECT_MODE_WP;
if (ioctl(uffd, UFFDIO_WRITEPROTECT, &uffd_writeprotect)) {
fprintf(stderr, "UFFDIO_WRITEPROTECT failed: %d\n", errno);
return -errno;
}

/* Verify uffd-wp is set. */
if (!pagemap_is_uffd_wp(pagemap_fd, mem)) {
fprintf(stderr, "uffd-wp not set\n");
return -1;
}

/* Migrate the THP. */
if (syscall(__NR_mbind, mem, thpsize, MPOL_LOCAL, NULL, 0x7fful,
MPOL_MF_MOVE)) {
fprintf(stderr, "mbind() failed\n");
return -errno;
}

/* Verify uffd-wp is still set. */
if (!pagemap_is_uffd_wp(pagemap_fd, mem)) {
fprintf(stderr, "uffd-wp lost\n");
return -1;
}

return 0;
}

--
Thanks,

David / dhildenb

2023-04-05 15:29:43

by Peter Xu

[permalink] [raw]
Subject: Re: [PATCH v1 0/2] mm/userfaultfd: fix and cleanup for migration entries with uffd-wp

On Wed, Apr 05, 2023 at 04:25:33PM +0200, David Hildenbrand wrote:
> One fix (I have a simple reproducer but it's too long to paste it into
> the commit)

I hope the recent rework of the unit test can help having more of these
unit tests into the kselftests directly.

Currently it's still ugly even after the rework patchset - we'll need to
reference e.g. area_dst in unit tests for the allocated memory region (with
specified type of either MEM_ANON, MEM_SHMEM, ...), but there's plan to
make it even better.

Thanks for fixing those already.

--
Peter Xu

2023-04-05 15:46:08

by Peter Xu

[permalink] [raw]
Subject: Re: [PATCH v1 1/2] mm/userfaultfd: fix uffd-wp handling for THP migration entries

On Wed, Apr 05, 2023 at 05:17:31PM +0200, David Hildenbrand wrote:
> On 05.04.23 17:12, Peter Xu wrote:
> > On Wed, Apr 05, 2023 at 04:25:34PM +0200, David Hildenbrand wrote:
> > > Looks like what we fixed for hugetlb in commit 44f86392bdd1 ("mm/hugetlb:
> > > fix uffd-wp handling for migration entries in hugetlb_change_protection()")
> > > similarly applies to THP.
> > >
> > > Setting/clearing uffd-wp on THP migration entries is not implemented
> > > properly. Further, while removing migration PMDs considers the uffd-wp
> > > bit, inserting migration PMDs does not consider the uffd-wp bit.
> > >
> > > We have to set/clear independently of the migration entry type in
> > > change_huge_pmd() and properly copy the uffd-wp bit in
> > > set_pmd_migration_entry().
> > >
> > > Verified using a simple reproducer that triggers migration of a THP, that
> > > the set_pmd_migration_entry() no longer loses the uffd-wp bit.
> > >
> > > Fixes: f45ec5ff16a7 ("userfaultfd: wp: support swap and page migration")
> > > Cc: [email protected]
> > > Signed-off-by: David Hildenbrand <[email protected]>
> >
> > Reviewed-by: Peter Xu <[email protected]>
> >
> > Thanks, one trivial nitpick:
> >
> > > ---
> > > mm/huge_memory.c | 14 ++++++++++++--
> > > 1 file changed, 12 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > > index 032fb0ef9cd1..bdda4f426d58 100644
> > > --- a/mm/huge_memory.c
> > > +++ b/mm/huge_memory.c
> > > @@ -1838,10 +1838,10 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
> > > if (is_swap_pmd(*pmd)) {
> > > swp_entry_t entry = pmd_to_swp_entry(*pmd);
> > > struct page *page = pfn_swap_entry_to_page(entry);
> > > + pmd_t newpmd;
> > > VM_BUG_ON(!is_pmd_migration_entry(*pmd));
> > > if (is_writable_migration_entry(entry)) {
> > > - pmd_t newpmd;
> > > /*
> > > * A protection check is difficult so
> > > * just be safe and disable write
> > > @@ -1855,8 +1855,16 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
> > > newpmd = pmd_swp_mksoft_dirty(newpmd);
> > > if (pmd_swp_uffd_wp(*pmd))
> > > newpmd = pmd_swp_mkuffd_wp(newpmd);
> > > - set_pmd_at(mm, addr, pmd, newpmd);
> > > + } else {
> > > + newpmd = *pmd;
> > > }
> > > +
> > > + if (uffd_wp)
> > > + newpmd = pmd_swp_mkuffd_wp(newpmd);
> > > + else if (uffd_wp_resolve)
> > > + newpmd = pmd_swp_clear_uffd_wp(newpmd);
> > > + if (!pmd_same(*pmd, newpmd))
> > > + set_pmd_at(mm, addr, pmd, newpmd);
> > > goto unlock;
> > > }
> > > #endif
> > > @@ -3251,6 +3259,8 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
> > > pmdswp = swp_entry_to_pmd(entry);
> > > if (pmd_soft_dirty(pmdval))
> > > pmdswp = pmd_swp_mksoft_dirty(pmdswp);
> > > + if (pmd_swp_uffd_wp(*pvmw->pmd))
> > > + pmdswp = pmd_swp_mkuffd_wp(pmdswp);
> >
> > I think it's fine to use *pmd, but maybe still better to use pmdval? I
> > worry pmdp_invalidate()) can be something else in the future that may
> > affect the bit.
>
> Wondering how I ended up with that, I realized that it's actually
> wrong and might have worked by chance for my reproducer on x86.
>
> That should make it work:
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index f977c965fdad..fffc953fa6ea 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -3257,7 +3257,7 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
> pmdswp = swp_entry_to_pmd(entry);
> if (pmd_soft_dirty(pmdval))
> pmdswp = pmd_swp_mksoft_dirty(pmdswp);
> - if (pmd_swp_uffd_wp(*pvmw->pmd))
> + if (pmd_uffd_wp(pmdval))
> pmdswp = pmd_swp_mkuffd_wp(pmdswp);
> set_pmd_at(mm, address, pvmw->pmd, pmdswp);
> page_remove_rmap(page, vma, true);

I guess pmd_swp_uffd_wp() just reads the _USER bit 2 which is also set for
a present pte, but then it sets swp uffd-wp always even if it was not set.

Yes the change must be squashed in to be correct, with that, my R-b keeps.

Thanks,

--
Peter Xu

2023-04-05 15:52:55

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v1 1/2] mm/userfaultfd: fix uffd-wp handling for THP migration entries

On 05.04.23 17:43, Peter Xu wrote:
> On Wed, Apr 05, 2023 at 05:17:31PM +0200, David Hildenbrand wrote:
>> On 05.04.23 17:12, Peter Xu wrote:
>>> On Wed, Apr 05, 2023 at 04:25:34PM +0200, David Hildenbrand wrote:
>>>> Looks like what we fixed for hugetlb in commit 44f86392bdd1 ("mm/hugetlb:
>>>> fix uffd-wp handling for migration entries in hugetlb_change_protection()")
>>>> similarly applies to THP.
>>>>
>>>> Setting/clearing uffd-wp on THP migration entries is not implemented
>>>> properly. Further, while removing migration PMDs considers the uffd-wp
>>>> bit, inserting migration PMDs does not consider the uffd-wp bit.
>>>>
>>>> We have to set/clear independently of the migration entry type in
>>>> change_huge_pmd() and properly copy the uffd-wp bit in
>>>> set_pmd_migration_entry().
>>>>
>>>> Verified using a simple reproducer that triggers migration of a THP, that
>>>> the set_pmd_migration_entry() no longer loses the uffd-wp bit.
>>>>
>>>> Fixes: f45ec5ff16a7 ("userfaultfd: wp: support swap and page migration")
>>>> Cc: [email protected]
>>>> Signed-off-by: David Hildenbrand <[email protected]>
>>>
>>> Reviewed-by: Peter Xu <[email protected]>
>>>
>>> Thanks, one trivial nitpick:
>>>
>>>> ---
>>>> mm/huge_memory.c | 14 ++++++++++++--
>>>> 1 file changed, 12 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>> index 032fb0ef9cd1..bdda4f426d58 100644
>>>> --- a/mm/huge_memory.c
>>>> +++ b/mm/huge_memory.c
>>>> @@ -1838,10 +1838,10 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
>>>> if (is_swap_pmd(*pmd)) {
>>>> swp_entry_t entry = pmd_to_swp_entry(*pmd);
>>>> struct page *page = pfn_swap_entry_to_page(entry);
>>>> + pmd_t newpmd;
>>>> VM_BUG_ON(!is_pmd_migration_entry(*pmd));
>>>> if (is_writable_migration_entry(entry)) {
>>>> - pmd_t newpmd;
>>>> /*
>>>> * A protection check is difficult so
>>>> * just be safe and disable write
>>>> @@ -1855,8 +1855,16 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
>>>> newpmd = pmd_swp_mksoft_dirty(newpmd);
>>>> if (pmd_swp_uffd_wp(*pmd))
>>>> newpmd = pmd_swp_mkuffd_wp(newpmd);
>>>> - set_pmd_at(mm, addr, pmd, newpmd);
>>>> + } else {
>>>> + newpmd = *pmd;
>>>> }
>>>> +
>>>> + if (uffd_wp)
>>>> + newpmd = pmd_swp_mkuffd_wp(newpmd);
>>>> + else if (uffd_wp_resolve)
>>>> + newpmd = pmd_swp_clear_uffd_wp(newpmd);
>>>> + if (!pmd_same(*pmd, newpmd))
>>>> + set_pmd_at(mm, addr, pmd, newpmd);
>>>> goto unlock;
>>>> }
>>>> #endif
>>>> @@ -3251,6 +3259,8 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
>>>> pmdswp = swp_entry_to_pmd(entry);
>>>> if (pmd_soft_dirty(pmdval))
>>>> pmdswp = pmd_swp_mksoft_dirty(pmdswp);
>>>> + if (pmd_swp_uffd_wp(*pvmw->pmd))
>>>> + pmdswp = pmd_swp_mkuffd_wp(pmdswp);
>>>
>>> I think it's fine to use *pmd, but maybe still better to use pmdval? I
>>> worry pmdp_invalidate()) can be something else in the future that may
>>> affect the bit.
>>
>> Wondering how I ended up with that, I realized that it's actually
>> wrong and might have worked by chance for my reproducer on x86.
>>
>> That should make it work:
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index f977c965fdad..fffc953fa6ea 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -3257,7 +3257,7 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
>> pmdswp = swp_entry_to_pmd(entry);
>> if (pmd_soft_dirty(pmdval))
>> pmdswp = pmd_swp_mksoft_dirty(pmdswp);
>> - if (pmd_swp_uffd_wp(*pvmw->pmd))
>> + if (pmd_uffd_wp(pmdval))
>> pmdswp = pmd_swp_mkuffd_wp(pmdswp);
>> set_pmd_at(mm, address, pvmw->pmd, pmdswp);
>> page_remove_rmap(page, vma, true);
>
> I guess pmd_swp_uffd_wp() just reads the _USER bit 2 which is also set for
> a present pte, but then it sets swp uffd-wp always even if it was not set.
>

Yes. I modified the reproducer to migrate without uffd-wp first and we
suddenly gain a uffd-wp bit.

> Yes the change must be squashed in to be correct, with that, my R-b keeps.

Thanks, I will resend later.

--
Thanks,

David / dhildenb