2021-03-04 23:28:22

by Brian Geffon

Subject: [PATCH] mm: Allow shmem mappings with MREMAP_DONTUNMAP

Currently MREMAP_DONTUNMAP only accepts private anonymous mappings. This change
will widen the support to include shmem mappings. The primary use case
is to support MREMAP_DONTUNMAP on mappings which may have been created from
a memfd.

Lokesh Gidra who works on the Android JVM, provided an explanation of how such
a feature will improve Android JVM garbage collection:
"Android is developing a new garbage collector (GC), based on userfaultfd. The
garbage collector will use userfaultfd (uffd) on the java heap during compaction.
On accessing any uncompacted page, the application threads will find it missing,
at which point the thread will create the compacted page and then use UFFDIO_COPY
ioctl to get it mapped and then resume execution. Before starting this compaction,
in a stop-the-world pause the heap will be mremap(MREMAP_DONTUNMAP) so that the
java heap is ready to receive UFFD_EVENT_PAGEFAULT events after resuming execution.

To speedup mremap operations, pagetable movement was optimized by moving PUD entries
instead of PTE entries [1]. It was necessary as mremap of even modest sized memory
ranges also took several milliseconds, and stopping the application for that long
isn't acceptable in response-time sensitive cases. With UFFDIO_CONTINUE feature [2],
it will be even more efficient to implement this GC, particularly the 'non-moveable'
portions of the heap. It will also help in reducing the need to copy (UFFDIO_COPY)
the pages. However, for this to work, the java heap has to be on a 'shared' vma.
Currently MREMAP_DONTUNMAP only supports private anonymous mappings, this patch will
enable using UFFDIO_CONTINUE for the new userfaultfd-based heap compaction."

[1] https://lore.kernel.org/linux-mm/20201215030730.NC3CU98e4%[email protected]/
[2] https://lore.kernel.org/linux-mm/[email protected]/
---
mm/mremap.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/mm/mremap.c b/mm/mremap.c
index ec8f840399ed..6934d199da54 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -653,8 +653,7 @@ static struct vm_area_struct *vma_to_resize(unsigned long addr,
 		return ERR_PTR(-EINVAL);
 	}

-	if (flags & MREMAP_DONTUNMAP && (!vma_is_anonymous(vma) ||
-			vma->vm_flags & VM_SHARED))
+	if (flags & MREMAP_DONTUNMAP && !(vma_is_anonymous(vma) || vma_is_shmem(vma)))
 		return ERR_PTR(-EINVAL);

 	if (is_vm_hugetlb_page(vma))
--
2.31.0.rc0.254.gbdcc3b1a9d-goog
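
For readers who want to see the userspace side, here is a minimal sketch (not taken from the thread) of the call this patch is meant to permit. It assumes a shared anonymous mapping as a stand-in for a memfd-backed one, since both are shmem-backed. The helper name try_dontunmap_shared is hypothetical, and the fallback value 4 for MREMAP_DONTUNMAP is only there for older libc headers. On kernels without this change the mremap() call is expected to fail with EINVAL.

```c
#define _GNU_SOURCE		/* must precede the includes to expose mremap() */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Older libc headers may lack the flag; 4 is the uapi value. */
#ifndef MREMAP_DONTUNMAP
#define MREMAP_DONTUNMAP 4
#endif

/*
 * Try to move a shared (shmem-backed) mapping while leaving the old range
 * mapped. Returns 0 on success or a negative errno value on failure.
 */
int try_dontunmap_shared(size_t len)
{
	void *old_addr, *new_addr;

	assert(len > 0);

	old_addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
			MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (old_addr == MAP_FAILED)
		return -errno;

	/* MREMAP_DONTUNMAP requires MREMAP_MAYMOVE and old_len == new_len. */
	new_addr = mremap(old_addr, len, len,
			  MREMAP_MAYMOVE | MREMAP_DONTUNMAP, NULL);
	if (new_addr == MAP_FAILED) {
		int err = errno;

		munmap(old_addr, len);
		return -err;
	}

	/* Both ranges are mapped at this point; release them. */
	munmap(new_addr, len);
	munmap(old_addr, len);
	return 0;
}
```

A return value of -EINVAL from this helper would suggest the running kernel still restricts MREMAP_DONTUNMAP to private anonymous mappings.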


2021-03-04 23:28:35

by Brian Geffon

Subject: Re: [PATCH] mm: Allow shmem mappings with MREMAP_DONTUNMAP

I apologize, this patch didn't include my Signed-off-by; here it is:

Signed-off-by: Brian Geffon <[email protected]>



2021-03-14 04:23:16

by Hugh Dickins

Subject: Re: [PATCH] mm: Allow shmem mappings with MREMAP_DONTUNMAP

On Wed, 3 Mar 2021, Brian Geffon wrote:

> Currently MREMAP_DONTUNMAP only accepts private anonymous mappings. This change
> will widen the support to include shmem mappings. The primary use case
> is to support MREMAP_DONTUNMAP on mappings which may have been created from
> a memfd.
>
> Lokesh Gidra who works on the Android JVM, provided an explanation of how such
> a feature will improve Android JVM garbage collection:
> "Android is developing a new garbage collector (GC), based on userfaultfd. The
> garbage collector will use userfaultfd (uffd) on the java heap during compaction.
> On accessing any uncompacted page, the application threads will find it missing,
> at which point the thread will create the compacted page and then use UFFDIO_COPY
> ioctl to get it mapped and then resume execution. Before starting this compaction,
> in a stop-the-world pause the heap will be mremap(MREMAP_DONTUNMAP) so that the
> java heap is ready to receive UFFD_EVENT_PAGEFAULT events after resuming execution.
>
> To speedup mremap operations, pagetable movement was optimized by moving PUD entries
> instead of PTE entries [1]. It was necessary as mremap of even modest sized memory
> ranges also took several milliseconds, and stopping the application for that long
> isn't acceptable in response-time sensitive cases. With UFFDIO_CONTINUE feature [2],
> it will be even more efficient to implement this GC, particularly the 'non-moveable'
> portions of the heap. It will also help in reducing the need to copy (UFFDIO_COPY)
> the pages. However, for this to work, the java heap has to be on a 'shared' vma.
> Currently MREMAP_DONTUNMAP only supports private anonymous mappings, this patch will
> enable using UFFDIO_CONTINUE for the new userfaultfd-based heap compaction."
>
> [1] https://lore.kernel.org/linux-mm/20201215030730.NC3CU98e4%[email protected]/
> [2] https://lore.kernel.org/linux-mm/[email protected]/
> ---
> mm/mremap.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/mm/mremap.c b/mm/mremap.c
> index ec8f840399ed..6934d199da54 100644
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -653,8 +653,7 @@ static struct vm_area_struct *vma_to_resize(unsigned long addr,
> return ERR_PTR(-EINVAL);
> }
>
> - if (flags & MREMAP_DONTUNMAP && (!vma_is_anonymous(vma) ||
> - vma->vm_flags & VM_SHARED))
> + if (flags & MREMAP_DONTUNMAP && !(vma_is_anonymous(vma) || vma_is_shmem(vma)))
> return ERR_PTR(-EINVAL);
>
> if (is_vm_hugetlb_page(vma))
> --

Yet something to improve...

Thanks for extending MREMAP_DONTUNMAP to shmem, but I think this patch
goes in the wrong direction, complicating when it should be generalizing:
the mremap syscall is about rearranging the user's virtual address space,
and is not specific to the underlying anonymous or shmem or file object
(though so far you have only been interested in anonymous, and now shmem).

A better patch would say:

-	if (flags & MREMAP_DONTUNMAP && (!vma_is_anonymous(vma) ||
-			vma->vm_flags & VM_SHARED))
+	if ((flags & MREMAP_DONTUNMAP) &&
+	    (vma->vm_flags & (VM_DONTEXPAND | VM_PFNMAP)))
 		return ERR_PTR(-EINVAL);

VM_DONTEXPAND is what has long been used on special mappings, to prevent
surprises from mremap changing the size of the mapping: MREMAP_DONTUNMAP
introduced a different way of expanding the mapping, so VM_DONTEXPAND
still seems a reasonable name (I've thrown in VM_PFNMAP there because
it's in the VM_DONTEXPAND test lower down: for safety I guess, and best
if both behave the same - though one says -EINVAL and the other -EFAULT).

With that VM_DONTEXPAND check in, Dmitry's commit cd544fd1dc92
("mremap: don't allow MREMAP_DONTUNMAP on special_mappings and aio")
can still be reverted (as you agreed on 28th December), even though
vma_is_anonymous() will no longer protect it.

Was there an mremap(2) man page update for MREMAP_DONTUNMAP?
Whether or not there was before, it ought to get one now.

Thanks,
Hugh
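
The three variants of the check discussed so far can be compared with a small userspace model. This is only a sketch: struct vma_model and the rejects_* helpers are hypothetical stand-ins, the two booleans stand for what vma_is_anonymous() and vma_is_shmem() would report, and the flag values are reproduced here purely for the model.

```c
#include <assert.h>
#include <stdbool.h>

#ifndef MREMAP_DONTUNMAP
#define MREMAP_DONTUNMAP 4
#endif
#define VM_SHARED	0x00000008UL
#define VM_PFNMAP	0x00000400UL
#define VM_DONTEXPAND	0x00040000UL

/* Hypothetical stand-in for struct vm_area_struct. */
struct vma_model {
	unsigned long vm_flags;
	bool anonymous;	/* what vma_is_anonymous() would report */
	bool shmem;	/* what vma_is_shmem() would report */
};

/* Each helper returns true when the check would fail with -EINVAL. */

/* Before the patch: private anonymous mappings only. */
bool rejects_original(unsigned long flags, const struct vma_model *vma)
{
	return (flags & MREMAP_DONTUNMAP) &&
	       (!vma->anonymous || (vma->vm_flags & VM_SHARED));
}

/* v1 of the patch: anonymous or shmem mappings. */
bool rejects_v1(unsigned long flags, const struct vma_model *vma)
{
	return (flags & MREMAP_DONTUNMAP) &&
	       !(vma->anonymous || vma->shmem);
}

/* Hugh's suggestion: anything except VM_DONTEXPAND or VM_PFNMAP. */
bool rejects_suggested(unsigned long flags, const struct vma_model *vma)
{
	return (flags & MREMAP_DONTUNMAP) &&
	       (vma->vm_flags & (VM_DONTEXPAND | VM_PFNMAP));
}
```

In this model a regular file mapping (neither anonymous nor shmem, no special flags) is rejected by the v1 check but accepted by the suggested one, which is the generalization being argued for.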

2021-03-16 21:22:55

by Dmitry Safonov

Subject: Re: [PATCH] mm: Allow shmem mappings with MREMAP_DONTUNMAP

Hi Brian, Hugh,

On 3/16/21 7:18 PM, Brian Geffon wrote:
> Hi Hugh,
> Thanks for this suggestion, responses in line.
>
>> A better patch would say:
>>
>> - if (flags & MREMAP_DONTUNMAP && (!vma_is_anonymous(vma) ||
>> - vma->vm_flags & VM_SHARED))
>> + if ((flags & MREMAP_DONTUNMAP) &&
>> + (vma->vm_flags & (VM_DONTEXPAND | VM_PFNMAP)))
>> return ERR_PTR(-EINVAL);
>>
>> VM_DONTEXPAND is what has long been used on special mappings, to prevent
>> surprises from mremap changing the size of the mapping: MREMAP_DONTUNMAP
>> introduced a different way of expanding the mapping, so VM_DONTEXPAND
>> still seems a reasonable name (I've thrown in VM_PFNMAP there because
>> it's in the VM_DONTEXPAND test lower down: for safety I guess, and best
>> if both behave the same - though one says -EINVAL and the other -EFAULT).
>
> I like this idea and am happy to mail a new patch. I think it may make
> sense to bring the lower block up here so that it becomes more clear
> that it's not duplicate code and that the MREMAP_DONTUNMAP case
> returns -EINVAL and other cases return -EFAULT. I wonder if the
> -EFAULT error code would have made more sense from the start for both
> cases, do you have any thoughts on changing the error code at this
> point?
>
>> With that VM_DONTEXPAND check in, Dmitry's commit cd544fd1dc92
>> ("mremap: don't allow MREMAP_DONTUNMAP on special_mappings and aio")
>> can still be reverted (as you agreed on 28th December), even though
>> vma_is_anonymous() will no longer protect it.
>
> I agree and if Dmitry does not have time I would be happy to mail a
> revert to cd544fd1dc92 as we discussed in [1]. Dmitry, would you like
> me to do that?

Ack. I was planning to send a patch set that includes the revert, but
that has stalled a bit. As the patch just adds excessive checks, but
doesn't introduce an issue, I haven't sent it separately.
Feel free to revert it :-)

Thanks,
Dmitry

2021-03-16 21:24:16

by Brian Geffon

Subject: Re: [PATCH] mm: Allow shmem mappings with MREMAP_DONTUNMAP

Hi Hugh,
Thanks for this suggestion, responses in line.

> A better patch would say:
>
> - if (flags & MREMAP_DONTUNMAP && (!vma_is_anonymous(vma) ||
> - vma->vm_flags & VM_SHARED))
> + if ((flags & MREMAP_DONTUNMAP) &&
> + (vma->vm_flags & (VM_DONTEXPAND | VM_PFNMAP)))
> return ERR_PTR(-EINVAL);
>
> VM_DONTEXPAND is what has long been used on special mappings, to prevent
> surprises from mremap changing the size of the mapping: MREMAP_DONTUNMAP
> introduced a different way of expanding the mapping, so VM_DONTEXPAND
> still seems a reasonable name (I've thrown in VM_PFNMAP there because
> it's in the VM_DONTEXPAND test lower down: for safety I guess, and best
> if both behave the same - though one says -EINVAL and the other -EFAULT).

I like this idea and am happy to mail a new patch. I think it may make
sense to bring the lower block up here so that it becomes more clear
that it's not duplicate code and that the MREMAP_DONTUNMAP case
returns -EINVAL and other cases return -EFAULT. I wonder if the
-EFAULT error code would have made more sense from the start for both
cases, do you have any thoughts on changing the error code at this
point?

> With that VM_DONTEXPAND check in, Dmitry's commit cd544fd1dc92
> ("mremap: don't allow MREMAP_DONTUNMAP on special_mappings and aio")
> can still be reverted (as you agreed on 28th December), even though
> vma_is_anonymous() will no longer protect it.

I agree and if Dmitry does not have time I would be happy to mail a
revert to cd544fd1dc92 as we discussed in [1]. Dmitry, would you like
me to do that?

> Was there an mremap(2) man page update for MREMAP_DONTUNMAP?
> Whether or not there was before, it ought to get one now.

Yes, the mremap(2) man page was updated when this flag was added and
it will require a further update to reflect this expanded mapping
support.

Thanks
Brian

1. https://lkml.org/lkml/2020/12/28/2340

2021-03-16 21:24:48

by Peter Xu

Subject: Re: [PATCH] mm: Allow shmem mappings with MREMAP_DONTUNMAP

On Sat, Mar 13, 2021 at 08:19:38PM -0800, Hugh Dickins wrote:
> On Wed, 3 Mar 2021, Brian Geffon wrote:
>
> > Currently MREMAP_DONTUNMAP only accepts private anonymous mappings. This change
> > will widen the support to include shmem mappings. The primary use case
> > is to support MREMAP_DONTUNMAP on mappings which may have been created from
> > a memfd.
> >
> > Lokesh Gidra who works on the Android JVM, provided an explanation of how such
> > a feature will improve Android JVM garbage collection:
> > "Android is developing a new garbage collector (GC), based on userfaultfd. The
> > garbage collector will use userfaultfd (uffd) on the java heap during compaction.
> > On accessing any uncompacted page, the application threads will find it missing,
> > at which point the thread will create the compacted page and then use UFFDIO_COPY
> > ioctl to get it mapped and then resume execution. Before starting this compaction,
> > in a stop-the-world pause the heap will be mremap(MREMAP_DONTUNMAP) so that the
> > java heap is ready to receive UFFD_EVENT_PAGEFAULT events after resuming execution.
> >
> > To speedup mremap operations, pagetable movement was optimized by moving PUD entries
> > instead of PTE entries [1]. It was necessary as mremap of even modest sized memory
> > ranges also took several milliseconds, and stopping the application for that long
> > isn't acceptable in response-time sensitive cases. With UFFDIO_CONTINUE feature [2],
> > it will be even more efficient to implement this GC, particularly the 'non-moveable'
> > portions of the heap. It will also help in reducing the need to copy (UFFDIO_COPY)
> > the pages. However, for this to work, the java heap has to be on a 'shared' vma.
> > Currently MREMAP_DONTUNMAP only supports private anonymous mappings, this patch will
> > enable using UFFDIO_CONTINUE for the new userfaultfd-based heap compaction."
> >
> > [1] https://lore.kernel.org/linux-mm/20201215030730.NC3CU98e4%[email protected]/
> > [2] https://lore.kernel.org/linux-mm/[email protected]/
> > ---
> > mm/mremap.c | 3 +--
> > 1 file changed, 1 insertion(+), 2 deletions(-)
> >
> > diff --git a/mm/mremap.c b/mm/mremap.c
> > index ec8f840399ed..6934d199da54 100644
> > --- a/mm/mremap.c
> > +++ b/mm/mremap.c
> > @@ -653,8 +653,7 @@ static struct vm_area_struct *vma_to_resize(unsigned long addr,
> > return ERR_PTR(-EINVAL);
> > }
> >
> > - if (flags & MREMAP_DONTUNMAP && (!vma_is_anonymous(vma) ||
> > - vma->vm_flags & VM_SHARED))
> > + if (flags & MREMAP_DONTUNMAP && !(vma_is_anonymous(vma) || vma_is_shmem(vma)))
> > return ERR_PTR(-EINVAL);
> >
> > if (is_vm_hugetlb_page(vma))
> > --
>
> Yet something to improve...
>
> Thanks for extending MREMAP_DONTUNMAP to shmem, but I think this patch
> goes in the wrong direction, complicating when it should be generalizing:
> the mremap syscall is about rearranging the user's virtual address space,
> and is not specific to the underlying anonymous or shmem or file object
> (though so far you have only been interested in anonymous, and now shmem).
>
> A better patch would say:
>
> - if (flags & MREMAP_DONTUNMAP && (!vma_is_anonymous(vma) ||
> - vma->vm_flags & VM_SHARED))
> + if ((flags & MREMAP_DONTUNMAP) &&
> + (vma->vm_flags & (VM_DONTEXPAND | VM_PFNMAP)))
> return ERR_PTR(-EINVAL);
>
> VM_DONTEXPAND is what has long been used on special mappings, to prevent
> surprises from mremap changing the size of the mapping: MREMAP_DONTUNMAP
> introduced a different way of expanding the mapping, so VM_DONTEXPAND
> still seems a reasonable name (I've thrown in VM_PFNMAP there because
> it's in the VM_DONTEXPAND test lower down: for safety I guess, and best
> if both behave the same - though one says -EINVAL and the other -EFAULT).
>
> With that VM_DONTEXPAND check in, Dmitry's commit cd544fd1dc92
> ("mremap: don't allow MREMAP_DONTUNMAP on special_mappings and aio")
> can still be reverted (as you agreed on 28th December), even though
> vma_is_anonymous() will no longer protect it.
>
> Was there an mremap(2) man page update for MREMAP_DONTUNMAP?
> Whether or not there was before, it ought to get one now.

I'm curious whether it's okay to expand MREMAP_DONTUNMAP to PFNMAP too..
E.g. vfio maps device MMIO regions with both VM_DONTEXPAND|VM_PFNMAP, to me it
makes sense to allow the userspace to get such MMIO region remapped/duplicated
somewhere else as long as the size won't change. With the strict check as
above we kill all those possibilities.

Though in that case we'll still need commits like cd544fd1dc92 to protect any
customized ->mremap() when they're not supported.

Thanks,

--
Peter Xu

2021-03-17 19:16:24

by Brian Geffon

Subject: [PATCH v2 1/2] mm: Allow non-VM_DONTEXPAND and VM_PFNMAP mappings with MREMAP_DONTUNMAP

Currently MREMAP_DONTUNMAP only accepts private anonymous mappings. This
change will widen the support to include any mappings which are not
VM_DONTEXPAND or VM_PFNMAP. The primary use case is to support
MREMAP_DONTUNMAP on mappings which may have been created from a memfd.

This change which takes advantage of the existing check in vma_to_resize
for non-VM_DONTEXPAND and non-VM_PFNMAP mappings will cause
MREMAP_DONTUNMAP to return -EFAULT if such mappings are remapped. This
behavior is consistent with existing behavior when using mremap with
such mappings.

Lokesh Gidra who works on the Android JVM, provided an explanation of how
such a feature will improve Android JVM garbage collection:
"Android is developing a new garbage collector (GC), based on userfaultfd.
The garbage collector will use userfaultfd (uffd) on the java heap during
compaction. On accessing any uncompacted page, the application threads will
find it missing, at which point the thread will create the compacted page
and then use UFFDIO_COPY ioctl to get it mapped and then resume execution.
Before starting this compaction, in a stop-the-world pause the heap will be
mremap(MREMAP_DONTUNMAP) so that the java heap is ready to receive
UFFD_EVENT_PAGEFAULT events after resuming execution.

To speedup mremap operations, pagetable movement was optimized by moving
PUD entries instead of PTE entries [1]. It was necessary as mremap of even
modest sized memory ranges also took several milliseconds, and stopping the
application for that long isn't acceptable in response-time sensitive
cases.

With UFFDIO_CONTINUE feature [2], it will be even more efficient to
implement this GC, particularly the 'non-moveable' portions of the heap.
It will also help in reducing the need to copy (UFFDIO_COPY) the pages.
However, for this to work, the java heap has to be on a 'shared' vma.
Currently MREMAP_DONTUNMAP only supports private anonymous mappings, this
patch will enable using UFFDIO_CONTINUE for the new userfaultfd-based heap
compaction."

[1] https://lore.kernel.org/linux-mm/20201215030730.NC3CU98e4%[email protected]/
[2] https://lore.kernel.org/linux-mm/[email protected]/

Signed-off-by: Brian Geffon <[email protected]>
---
mm/mremap.c | 4 ----
1 file changed, 4 deletions(-)

diff --git a/mm/mremap.c b/mm/mremap.c
index ec8f840399ed..2c57dc4bc8b6 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -653,10 +653,6 @@ static struct vm_area_struct *vma_to_resize(unsigned long addr,
 		return ERR_PTR(-EINVAL);
 	}

-	if (flags & MREMAP_DONTUNMAP && (!vma_is_anonymous(vma) ||
-			vma->vm_flags & VM_SHARED))
-		return ERR_PTR(-EINVAL);
-
 	if (is_vm_hugetlb_page(vma))
 		return ERR_PTR(-EINVAL);

--
2.31.0.rc2.261.g7f71774620-goog

2021-03-17 19:16:43

by Brian Geffon

Subject: [PATCH v2 2/2] Revert "mremap: don't allow MREMAP_DONTUNMAP on special_mappings and aio"

This reverts commit cd544fd1dc9293c6702fab6effa63dac1cc67e99.

As discussed in [1] this commit was a no-op because the mapping type was
checked in vma_to_resize before move_vma is ever called. This meant that
vm_ops->mremap() would never be called on such mappings. Furthermore,
we've since expanded support of MREMAP_DONTUNMAP to non-anonymous
mappings, and these special mappings are still protected by the existing
check of !VM_DONTEXPAND and !VM_PFNMAP which will result in a -EFAULT.

1. https://lkml.org/lkml/2020/12/28/2340

Signed-off-by: Brian Geffon <[email protected]>
---
arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 2 +-
fs/aio.c | 5 +----
include/linux/mm.h | 2 +-
mm/mmap.c | 6 +-----
mm/mremap.c | 2 +-
5 files changed, 5 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
index e916646adc69..0daf2f1cf7a8 100644
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
+++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
@@ -1458,7 +1458,7 @@ static int pseudo_lock_dev_release(struct inode *inode, struct file *filp)
return 0;
}

-static int pseudo_lock_dev_mremap(struct vm_area_struct *area, unsigned long flags)
+static int pseudo_lock_dev_mremap(struct vm_area_struct *area)
{
/* Not supported */
return -EINVAL;
diff --git a/fs/aio.c b/fs/aio.c
index 1f32da13d39e..76ce0cc3ee4e 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -323,16 +323,13 @@ static void aio_free_ring(struct kioctx *ctx)
}
}

-static int aio_ring_mremap(struct vm_area_struct *vma, unsigned long flags)
+static int aio_ring_mremap(struct vm_area_struct *vma)
{
struct file *file = vma->vm_file;
struct mm_struct *mm = vma->vm_mm;
struct kioctx_table *table;
int i, res = -EINVAL;

- if (flags & MREMAP_DONTUNMAP)
- return -EINVAL;
-
spin_lock(&mm->ioctx_lock);
rcu_read_lock();
table = rcu_dereference(mm->ioctx_table);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 77e64e3eac80..8c3729eb3e38 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -570,7 +570,7 @@ struct vm_operations_struct {
void (*close)(struct vm_area_struct * area);
/* Called any time before splitting to check if it's allowed */
int (*may_split)(struct vm_area_struct *area, unsigned long addr);
- int (*mremap)(struct vm_area_struct *area, unsigned long flags);
+ int (*mremap)(struct vm_area_struct *area);
/*
* Called by mprotect() to make driver-specific permission
* checks before mprotect() is finalised. The VMA must not
diff --git a/mm/mmap.c b/mm/mmap.c
index 3f287599a7a3..9d7651e4e1fe 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3403,14 +3403,10 @@ static const char *special_mapping_name(struct vm_area_struct *vma)
return ((struct vm_special_mapping *)vma->vm_private_data)->name;
}

-static int special_mapping_mremap(struct vm_area_struct *new_vma,
- unsigned long flags)
+static int special_mapping_mremap(struct vm_area_struct *new_vma)
{
struct vm_special_mapping *sm = new_vma->vm_private_data;

- if (flags & MREMAP_DONTUNMAP)
- return -EINVAL;
-
if (WARN_ON_ONCE(current->mm != new_vma->vm_mm))
return -EFAULT;

diff --git a/mm/mremap.c b/mm/mremap.c
index 2c57dc4bc8b6..b1f7bc43ece9 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -545,7 +545,7 @@ static unsigned long move_vma(struct vm_area_struct *vma,
if (moved_len < old_len) {
err = -ENOMEM;
} else if (vma->vm_ops && vma->vm_ops->mremap) {
- err = vma->vm_ops->mremap(new_vma, flags);
+ err = vma->vm_ops->mremap(new_vma);
}

if (unlikely(err)) {
--
2.31.0.rc2.261.g7f71774620-goog
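
The effect of the revert on the hook itself can be pictured with a small userspace sketch: the callback goes back to taking only the VMA, so an implementation can no longer inspect the mremap flags. The types and names below (struct vm_ops_sketch, struct vma_hooked, call_mremap_hook, reject_mremap) are hypothetical and only mirror the shape of the kernel code in the diff above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct vma_hooked;

/* Hypothetical miniature of struct vm_operations_struct. */
struct vm_ops_sketch {
	int (*mremap)(struct vma_hooked *area);	/* no flags argument */
};

/* Hypothetical miniature of struct vm_area_struct. */
struct vma_hooked {
	const struct vm_ops_sketch *vm_ops;
};

/* A hook that refuses every move, like pseudo_lock_dev_mremap(). */
int reject_mremap(struct vma_hooked *area)
{
	(void)area;
	return -EINVAL;
}

/* Mirrors the call site in move_vma(): invoke the hook only if present. */
int call_mremap_hook(struct vma_hooked *new_vma)
{
	assert(new_vma != NULL);

	if (new_vma->vm_ops && new_vma->vm_ops->mremap)
		return new_vma->vm_ops->mremap(new_vma);
	return 0;
}
```

With this shape, any decision that depends on MREMAP_DONTUNMAP has to be made before the hook is called, which is why the discussion centers on the check in vma_to_resize.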

2021-03-17 20:42:34

by Peter Xu

Subject: Re: [PATCH v2 1/2] mm: Allow non-VM_DONTEXPAND and VM_PFNMAP mappings with MREMAP_DONTUNMAP

Hi, Brian,

On Wed, Mar 17, 2021 at 12:13:33PM -0700, Brian Geffon wrote:
> Currently MREMAP_DONTUNMAP only accepts private anonymous mappings. This
> change will widen the support to include any mappings which are not
> VM_DONTEXPAND or VM_PFNMAP. The primary use case is to support
> MREMAP_DONTUNMAP on mappings which may have been created from a memfd.
>
> This change which takes advantage of the existing check in vma_to_resize
> for non-VM_DONTEXPAND and non-VM_PFNMAP mappings will cause
> MREMAP_DONTUNMAP to return -EFAULT if such mappings are remapped. This
> behavior is consistent with existing behavior when using mremap with
> such mappings.
>
> Lokesh Gidra who works on the Android JVM, provided an explanation of how
> such a feature will improve Android JVM garbage collection:
> "Android is developing a new garbage collector (GC), based on userfaultfd.
> The garbage collector will use userfaultfd (uffd) on the java heap during
> compaction. On accessing any uncompacted page, the application threads will
> find it missing, at which point the thread will create the compacted page
> and then use UFFDIO_COPY ioctl to get it mapped and then resume execution.
> Before starting this compaction, in a stop-the-world pause the heap will be
> mremap(MREMAP_DONTUNMAP) so that the java heap is ready to receive
> UFFD_EVENT_PAGEFAULT events after resuming execution.
>
> To speedup mremap operations, pagetable movement was optimized by moving
> PUD entries instead of PTE entries [1]. It was necessary as mremap of even
> modest sized memory ranges also took several milliseconds, and stopping the
> application for that long isn't acceptable in response-time sensitive
> cases.
>
> With UFFDIO_CONTINUE feature [2], it will be even more efficient to
> implement this GC, particularly the 'non-moveable' portions of the heap.
> It will also help in reducing the need to copy (UFFDIO_COPY) the pages.
> However, for this to work, the java heap has to be on a 'shared' vma.
> Currently MREMAP_DONTUNMAP only supports private anonymous mappings, this
> patch will enable using UFFDIO_CONTINUE for the new userfaultfd-based heap
> compaction."
>
> [1] https://lore.kernel.org/linux-mm/20201215030730.NC3CU98e4%[email protected]/
> [2] https://lore.kernel.org/linux-mm/[email protected]/
>
> Signed-off-by: Brian Geffon <[email protected]>
> ---
> mm/mremap.c | 4 ----
> 1 file changed, 4 deletions(-)
>
> diff --git a/mm/mremap.c b/mm/mremap.c
> index ec8f840399ed..2c57dc4bc8b6 100644
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -653,10 +653,6 @@ static struct vm_area_struct *vma_to_resize(unsigned long addr,
> return ERR_PTR(-EINVAL);
> }
>
> - if (flags & MREMAP_DONTUNMAP && (!vma_is_anonymous(vma) ||
> - vma->vm_flags & VM_SHARED))
> - return ERR_PTR(-EINVAL);
> -
> if (is_vm_hugetlb_page(vma))
> return ERR_PTR(-EINVAL);

The code change doesn't seem to match what the commit message says. Did
you perhaps forget to add the checks against VM_DONTEXPAND | VM_PFNMAP? I'm
guessing so (rather than the commit message needing a touch-up) because you
still attached the revert patch, in which case that check seems to be needed.
Thanks,

--
Peter Xu

2021-03-17 20:50:05

by Brian Geffon

Subject: Re: [PATCH v2 1/2] mm: Allow non-VM_DONTEXPAND and VM_PFNMAP mappings with MREMAP_DONTUNMAP

Hi Peter,
Thank you as always for taking a look. This change relies on the
existing check in vma_to_resize on line 686:
https://elixir.bootlin.com/linux/v5.12-rc3/source/mm/mremap.c#L686
which returns -EFAULT when the vma is VM_DONTEXPAND or VM_PFNMAP.

Thanks
Brian


2021-03-17 21:47:07

by Peter Xu

Subject: Re: [PATCH v2 1/2] mm: Allow non-VM_DONTEXPAND and VM_PFNMAP mappings with MREMAP_DONTUNMAP

On Wed, Mar 17, 2021 at 04:44:25PM -0400, Brian Geffon wrote:
> Hi Peter,

Hi, Brian,

> Thank you as always for taking a look. This change relies on the
> existing check in vma_to_resize on line 686:
> https://elixir.bootlin.com/linux/v5.12-rc3/source/mm/mremap.c#L686
> which returns -EFAULT when the vma is VM_DONTEXPAND or VM_PFNMAP.

Do you mean line 676?

https://elixir.bootlin.com/linux/v5.12-rc3/source/mm/mremap.c#L676

I'm not sure whether it'll work for MREMAP_DONTUNMAP, since IIUC
MREMAP_DONTUNMAP only works for the remap case with no size change, however in
that case in vma_to_resize() we'll bail out even earlier than line 676 when
checking against the size:

https://elixir.bootlin.com/linux/v5.12-rc3/source/mm/mremap.c#L667

So IIUC we'll still need the change as Hugh suggested previously.

Thanks,

--
Peter Xu
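
To make the ordering argument above concrete, here is a minimal user-space model of the two checks in vma_to_resize() that the discussion turns on. It is only a sketch under stated assumptions: struct model_vma, the MODEL_* flag values and the explicit_check switch are hypothetical stand-ins, and every other check in the real function is omitted.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel flags; the values are arbitrary. */
#define MODEL_VM_DONTEXPAND     0x1UL
#define MODEL_VM_PFNMAP         0x2UL
#define MODEL_MREMAP_DONTUNMAP  0x4UL

/* Hypothetical, heavily reduced stand-in for struct vm_area_struct. */
struct model_vma {
	unsigned long vm_flags;
};

/*
 * Returns 0 if the remap would be allowed to proceed, or a negative errno.
 * explicit_check selects whether the MREMAP_DONTUNMAP-specific check
 * (the one the next revision of the patch adds) is present.
 */
static int model_vma_to_resize(const struct model_vma *vma,
			       unsigned long old_len, unsigned long new_len,
			       unsigned long flags, bool explicit_check)
{
	/* Explicit check, placed ahead of the size comparison. */
	if (explicit_check && (flags & MODEL_MREMAP_DONTUNMAP) &&
	    (vma->vm_flags & (MODEL_VM_DONTEXPAND | MODEL_VM_PFNMAP)))
		return -EINVAL;

	/* No size change: return early, before the flag check below. */
	if (new_len == old_len)
		return 0;

	/* Only reached when the mapping is being resized. */
	if (vma->vm_flags & (MODEL_VM_DONTEXPAND | MODEL_VM_PFNMAP))
		return -EFAULT;

	return 0;
}
```

In this model, a VM_PFNMAP vma with old_len equal to new_len passes when explicit_check is false, because the size comparison returns first; with explicit_check set, the same call is rejected with -EINVAL.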

2021-03-17 21:49:33

by Brian Geffon

Subject: Re: [PATCH v2 1/2] mm: Allow non-VM_DONTEXPAND and VM_PFNMAP mappings with MREMAP_DONTUNMAP

You're 100% correct, I'll mail a new patch in a few

Brian

2021-03-17 21:49:40

by Brian Geffon

Subject: [PATCH v3 1/2] mm: Allow non-VM_DONTEXPAND and VM_PFNMAP mappings with MREMAP_DONTUNMAP

Currently MREMAP_DONTUNMAP only accepts private anonymous mappings. This
change will widen the support to include any mappings which are not
VM_DONTEXPAND or VM_PFNMAP. The primary use case is to support
MREMAP_DONTUNMAP on mappings which may have been created from a memfd.

This change will result in mremap(MREMAP_DONTUNMAP) returning -EINVAL
if VM_DONTEXPAND or VM_PFNMAP mappings are specified.

Lokesh Gidra, who works on the Android JVM, provided an explanation of how
such a feature will improve Android JVM garbage collection:
"Android is developing a new garbage collector (GC), based on userfaultfd.
The garbage collector will use userfaultfd (uffd) on the java heap during
compaction. On accessing any uncompacted page, the application threads will
find it missing, at which point the thread will create the compacted page
and then use UFFDIO_COPY ioctl to get it mapped and then resume execution.
Before starting this compaction, in a stop-the-world pause the heap will be
mremap(MREMAP_DONTUNMAP) so that the java heap is ready to receive
UFFD_EVENT_PAGEFAULT events after resuming execution.

To speed up mremap operations, pagetable movement was optimized by moving
PUD entries instead of PTE entries [1]. It was necessary as mremap of even
modest sized memory ranges also took several milliseconds, and stopping the
application for that long isn't acceptable in response-time sensitive
cases.

With UFFDIO_CONTINUE feature [2], it will be even more efficient to
implement this GC, particularly the 'non-moveable' portions of the heap.
It will also help in reducing the need to copy (UFFDIO_COPY) the pages.
However, for this to work, the java heap has to be on a 'shared' vma.
Currently MREMAP_DONTUNMAP only supports private anonymous mappings; this
patch will enable using UFFDIO_CONTINUE for the new userfaultfd-based heap
compaction."

[1] https://lore.kernel.org/linux-mm/20201215030730.NC3CU98e4%[email protected]/
[2] https://lore.kernel.org/linux-mm/[email protected]/

Signed-off-by: Brian Geffon <[email protected]>
---
mm/mremap.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/mremap.c b/mm/mremap.c
index ec8f840399ed..db5b8b28c2dd 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -653,8 +653,8 @@ static struct vm_area_struct *vma_to_resize(unsigned long addr,
return ERR_PTR(-EINVAL);
}

- if (flags & MREMAP_DONTUNMAP && (!vma_is_anonymous(vma) ||
- vma->vm_flags & VM_SHARED))
+ if ((flags & MREMAP_DONTUNMAP) &&
+ (vma->vm_flags & (VM_DONTEXPAND | VM_PFNMAP)))
return ERR_PTR(-EINVAL);

if (is_vm_hugetlb_page(vma))
--
2.31.0.rc2.261.g7f71774620-goog

2021-03-17 21:51:40

by Brian Geffon

Subject: [PATCH v3 2/2] Revert "mremap: don't allow MREMAP_DONTUNMAP on special_mappings and aio"

This reverts commit cd544fd1dc9293c6702fab6effa63dac1cc67e99.

As discussed in [1] this commit was a no-op because the mapping type was
checked in vma_to_resize before move_vma is ever called. This meant that
vm_ops->mremap() would never be called on such mappings. Furthermore,
we've since expanded support of MREMAP_DONTUNMAP to non-anonymous
mappings, and these special mappings are still protected by the existing
check of !VM_DONTEXPAND and !VM_PFNMAP which will result in a -EFAULT.

1. https://lkml.org/lkml/2020/12/28/2340

Signed-off-by: Brian Geffon <[email protected]>
---
arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 2 +-
fs/aio.c | 5 +----
include/linux/mm.h | 2 +-
mm/mmap.c | 6 +-----
mm/mremap.c | 2 +-
5 files changed, 5 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
index e916646adc69..0daf2f1cf7a8 100644
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
+++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
@@ -1458,7 +1458,7 @@ static int pseudo_lock_dev_release(struct inode *inode, struct file *filp)
return 0;
}

-static int pseudo_lock_dev_mremap(struct vm_area_struct *area, unsigned long flags)
+static int pseudo_lock_dev_mremap(struct vm_area_struct *area)
{
/* Not supported */
return -EINVAL;
diff --git a/fs/aio.c b/fs/aio.c
index 1f32da13d39e..76ce0cc3ee4e 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -323,16 +323,13 @@ static void aio_free_ring(struct kioctx *ctx)
}
}

-static int aio_ring_mremap(struct vm_area_struct *vma, unsigned long flags)
+static int aio_ring_mremap(struct vm_area_struct *vma)
{
struct file *file = vma->vm_file;
struct mm_struct *mm = vma->vm_mm;
struct kioctx_table *table;
int i, res = -EINVAL;

- if (flags & MREMAP_DONTUNMAP)
- return -EINVAL;
-
spin_lock(&mm->ioctx_lock);
rcu_read_lock();
table = rcu_dereference(mm->ioctx_table);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 77e64e3eac80..8c3729eb3e38 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -570,7 +570,7 @@ struct vm_operations_struct {
void (*close)(struct vm_area_struct * area);
/* Called any time before splitting to check if it's allowed */
int (*may_split)(struct vm_area_struct *area, unsigned long addr);
- int (*mremap)(struct vm_area_struct *area, unsigned long flags);
+ int (*mremap)(struct vm_area_struct *area);
/*
* Called by mprotect() to make driver-specific permission
* checks before mprotect() is finalised. The VMA must not
diff --git a/mm/mmap.c b/mm/mmap.c
index 3f287599a7a3..9d7651e4e1fe 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3403,14 +3403,10 @@ static const char *special_mapping_name(struct vm_area_struct *vma)
return ((struct vm_special_mapping *)vma->vm_private_data)->name;
}

-static int special_mapping_mremap(struct vm_area_struct *new_vma,
- unsigned long flags)
+static int special_mapping_mremap(struct vm_area_struct *new_vma)
{
struct vm_special_mapping *sm = new_vma->vm_private_data;

- if (flags & MREMAP_DONTUNMAP)
- return -EINVAL;
-
if (WARN_ON_ONCE(current->mm != new_vma->vm_mm))
return -EFAULT;

diff --git a/mm/mremap.c b/mm/mremap.c
index db5b8b28c2dd..d22629ff8f3c 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -545,7 +545,7 @@ static unsigned long move_vma(struct vm_area_struct *vma,
if (moved_len < old_len) {
err = -ENOMEM;
} else if (vma->vm_ops && vma->vm_ops->mremap) {
- err = vma->vm_ops->mremap(new_vma, flags);
+ err = vma->vm_ops->mremap(new_vma);
}

if (unlikely(err)) {
--
2.31.0.rc2.261.g7f71774620-goog
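
The reasoning in the revert's commit message can be sketched as a small user-space model: if the gate in vma_to_resize() rejects the mapping first, the ->mremap() callback is never reached with MREMAP_DONTUNMAP set, so the callback does not need a flags argument. Everything named sketch_* or SKETCH_* below is hypothetical, and the model assumes, as the commit message states, that the special mappings in question carry VM_DONTEXPAND or VM_PFNMAP.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel flags; the values are arbitrary. */
#define SKETCH_VM_DONTEXPAND     0x1UL
#define SKETCH_VM_PFNMAP         0x2UL
#define SKETCH_MREMAP_DONTUNMAP  0x4UL

struct sketch_vma;

/* Post-revert callback shape: the vma only, no flags argument. */
struct sketch_vm_ops {
	int (*mremap)(struct sketch_vma *area);
};

struct sketch_vma {
	unsigned long vm_flags;
	const struct sketch_vm_ops *vm_ops;
	int mremap_calls;	/* counts how often the callback ran */
};

/* Stand-in for a special mapping's ->mremap() handler. */
static int sketch_special_mremap(struct sketch_vma *area)
{
	area->mremap_calls++;
	return 0;
}

static const struct sketch_vm_ops sketch_special_ops = {
	.mremap = sketch_special_mremap,
};

/* Gate first (as in vma_to_resize()), then the callback (as in move_vma()). */
static int sketch_mremap(struct sketch_vma *vma, unsigned long flags)
{
	if ((flags & SKETCH_MREMAP_DONTUNMAP) &&
	    (vma->vm_flags & (SKETCH_VM_DONTEXPAND | SKETCH_VM_PFNMAP)))
		return -EINVAL;

	if (vma->vm_ops && vma->vm_ops->mremap)
		return vma->vm_ops->mremap(vma);

	return 0;
}
```

Calling sketch_mremap() with SKETCH_MREMAP_DONTUNMAP on a vma flagged SKETCH_VM_DONTEXPAND returns -EINVAL and leaves mremap_calls at zero; only a call without that flag reaches the callback.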

2021-03-17 22:05:30

by Andrew Morton

Subject: Re: [PATCH v3 1/2] mm: Allow non-VM_DONTEXPAND and VM_PFNMAP mappings with MREMAP_DONTUNMAP

On Wed, 17 Mar 2021 14:41:46 -0700 Brian Geffon <[email protected]> wrote:

> Currently MREMAP_DONTUNMAP only accepts private anonymous mappings. This
> change will widen the support to include any mappings which are not
> VM_DONTEXPAND or VM_PFNMAP.

Please update changelog to explain why these two were omitted?

> The primary use case is to support
> MREMAP_DONTUNMAP on mappings which may have been created from a memfd.
>
> This change will result in mremap(MREMAP_DONTUNMAP) returning -EINVAL
> if VM_DONTEXPAND or VM_PFNMAP mappings are specified.
>
> Lokesh Gidra who works on the Android JVM, provided an explanation of how
> such a feature will improve Android JVM garbage collection:
> "Android is developing a new garbage collector (GC), based on userfaultfd.
> The garbage collector will use userfaultfd (uffd) on the java heap during
> compaction. On accessing any uncompacted page, the application threads will
> find it missing, at which point the thread will create the compacted page
> and then use UFFDIO_COPY ioctl to get it mapped and then resume execution.
> Before starting this compaction, in a stop-the-world pause the heap will be
> mremap(MREMAP_DONTUNMAP) so that the java heap is ready to receive
> UFFD_EVENT_PAGEFAULT events after resuming execution.
>
> To speedup mremap operations, pagetable movement was optimized by moving
> PUD entries instead of PTE entries [1]. It was necessary as mremap of even
> modest sized memory ranges also took several milliseconds, and stopping the
> application for that long isn't acceptable in response-time sensitive
> cases.
>
> With UFFDIO_CONTINUE feature [2], it will be even more efficient to
> implement this GC, particularly the 'non-moveable' portions of the heap.
> It will also help in reducing the need to copy (UFFDIO_COPY) the pages.
> However, for this to work, the java heap has to be on a 'shared' vma.
> Currently MREMAP_DONTUNMAP only supports private anonymous mappings, this
> patch will enable using UFFDIO_CONTINUE for the new userfaultfd-based heap
> compaction."

Is a manpage update planned? It's appropriate to add this to the
series so folks can check it over.

Can we please get the appropriate updates into
tools/testing/selftests/vm/mremap_test.c for this?

> Signed-off-by: Brian Geffon <[email protected]>

v3 is getting up there. Has there been much review activity?

2021-03-19 01:22:46

by Hugh Dickins

Subject: Re: [PATCH v3 1/2] mm: Allow non-VM_DONTEXPAND and VM_PFNMAP mappings with MREMAP_DONTUNMAP

If Andrew is happy with such a long patch name, okay;
but personally I'd prefer brevity to all that detail:

mm: Extend MREMAP_DONTUNMAP to non-anonymous mappings

On Wed, 17 Mar 2021, Brian Geffon wrote:

> Currently MREMAP_DONTUNMAP only accepts private anonymous mappings. This
> change will widen the support to include any mappings which are not
> VM_DONTEXPAND or VM_PFNMAP. The primary use case is to support
> MREMAP_DONTUNMAP on mappings which may have been created from a memfd.
>
> This change will result in mremap(MREMAP_DONTUNMAP) returning -EINVAL
> if VM_DONTEXPAND or VM_PFNMAP mappings are specified.
>
> Lokesh Gidra who works on the Android JVM, provided an explanation of how
> such a feature will improve Android JVM garbage collection:
> "Android is developing a new garbage collector (GC), based on userfaultfd.
> The garbage collector will use userfaultfd (uffd) on the java heap during
> compaction. On accessing any uncompacted page, the application threads will
> find it missing, at which point the thread will create the compacted page
> and then use UFFDIO_COPY ioctl to get it mapped and then resume execution.
> Before starting this compaction, in a stop-the-world pause the heap will be
> mremap(MREMAP_DONTUNMAP) so that the java heap is ready to receive
> UFFD_EVENT_PAGEFAULT events after resuming execution.
>
> To speedup mremap operations, pagetable movement was optimized by moving
> PUD entries instead of PTE entries [1]. It was necessary as mremap of even
> modest sized memory ranges also took several milliseconds, and stopping the
> application for that long isn't acceptable in response-time sensitive
> cases.
>
> With UFFDIO_CONTINUE feature [2], it will be even more efficient to
> implement this GC, particularly the 'non-moveable' portions of the heap.
> It will also help in reducing the need to copy (UFFDIO_COPY) the pages.
> However, for this to work, the java heap has to be on a 'shared' vma.
> Currently MREMAP_DONTUNMAP only supports private anonymous mappings, this
> patch will enable using UFFDIO_CONTINUE for the new userfaultfd-based heap
> compaction."
>
> [1] https://lore.kernel.org/linux-mm/20201215030730.NC3CU98e4%[email protected]/
> [2] https://lore.kernel.org/linux-mm/[email protected]/
>
> Signed-off-by: Brian Geffon <[email protected]>

Acked-by: Hugh Dickins <[email protected]>

Thanks Brian, just what I wanted :)

You wondered in another mail about this returning -EINVAL whereas
the VM_DONTEXPAND size error returns -EFAULT: I've pondered, and I've
read the manpage, and I'm sure it would be wrong to change the old
-EFAULT to -EINVAL now; and I don't see good reason to change your
-EINVAL to -EFAULT either. Let them differ, that's okay (and it's
only in special corner cases that either of these fail anyway).

> ---
> mm/mremap.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/mremap.c b/mm/mremap.c
> index ec8f840399ed..db5b8b28c2dd 100644
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -653,8 +653,8 @@ static struct vm_area_struct *vma_to_resize(unsigned long addr,
> return ERR_PTR(-EINVAL);
> }
>
> - if (flags & MREMAP_DONTUNMAP && (!vma_is_anonymous(vma) ||
> - vma->vm_flags & VM_SHARED))
> + if ((flags & MREMAP_DONTUNMAP) &&
> + (vma->vm_flags & (VM_DONTEXPAND | VM_PFNMAP)))
> return ERR_PTR(-EINVAL);
>
> if (is_vm_hugetlb_page(vma))
> --
> 2.31.0.rc2.261.g7f71774620-goog

2021-03-19 01:29:47

by Hugh Dickins

Subject: Re: [PATCH v3 2/2] Revert "mremap: don't allow MREMAP_DONTUNMAP on special_mappings and aio"

On Wed, 17 Mar 2021, Brian Geffon wrote:

> This reverts commit cd544fd1dc9293c6702fab6effa63dac1cc67e99.
>
> As discussed in [1] this commit was a no-op because the mapping type was
> checked in vma_to_resize before move_vma is ever called. This meant that
> vm_ops->mremap() would never be called on such mappings. Furthermore,
> we've since expanded support of MREMAP_DONTUNMAP to non-anonymous
> mappings, and these special mappings are still protected by the existing
> check of !VM_DONTEXPAND and !VM_PFNMAP which will result in a -EFAULT.

One small fixup needed: -EFAULT was what the incorrect v2 gave, but
v3 issues -EINVAL like before, and I'm content with that difference.

>
> 1. https://lkml.org/lkml/2020/12/28/2340
>
> Signed-off-by: Brian Geffon <[email protected]>

Acked-by: Hugh Dickins <[email protected]>

Thanks Brian, I'm happy with this result.

> ---
> arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 2 +-
> fs/aio.c | 5 +----
> include/linux/mm.h | 2 +-
> mm/mmap.c | 6 +-----
> mm/mremap.c | 2 +-
> 5 files changed, 5 insertions(+), 12 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
> index e916646adc69..0daf2f1cf7a8 100644
> --- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
> +++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
> @@ -1458,7 +1458,7 @@ static int pseudo_lock_dev_release(struct inode *inode, struct file *filp)
> return 0;
> }
>
> -static int pseudo_lock_dev_mremap(struct vm_area_struct *area, unsigned long flags)
> +static int pseudo_lock_dev_mremap(struct vm_area_struct *area)
> {
> /* Not supported */
> return -EINVAL;
> diff --git a/fs/aio.c b/fs/aio.c
> index 1f32da13d39e..76ce0cc3ee4e 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -323,16 +323,13 @@ static void aio_free_ring(struct kioctx *ctx)
> }
> }
>
> -static int aio_ring_mremap(struct vm_area_struct *vma, unsigned long flags)
> +static int aio_ring_mremap(struct vm_area_struct *vma)
> {
> struct file *file = vma->vm_file;
> struct mm_struct *mm = vma->vm_mm;
> struct kioctx_table *table;
> int i, res = -EINVAL;
>
> - if (flags & MREMAP_DONTUNMAP)
> - return -EINVAL;
> -
> spin_lock(&mm->ioctx_lock);
> rcu_read_lock();
> table = rcu_dereference(mm->ioctx_table);
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 77e64e3eac80..8c3729eb3e38 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -570,7 +570,7 @@ struct vm_operations_struct {
> void (*close)(struct vm_area_struct * area);
> /* Called any time before splitting to check if it's allowed */
> int (*may_split)(struct vm_area_struct *area, unsigned long addr);
> - int (*mremap)(struct vm_area_struct *area, unsigned long flags);
> + int (*mremap)(struct vm_area_struct *area);
> /*
> * Called by mprotect() to make driver-specific permission
> * checks before mprotect() is finalised. The VMA must not
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 3f287599a7a3..9d7651e4e1fe 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -3403,14 +3403,10 @@ static const char *special_mapping_name(struct vm_area_struct *vma)
> return ((struct vm_special_mapping *)vma->vm_private_data)->name;
> }
>
> -static int special_mapping_mremap(struct vm_area_struct *new_vma,
> - unsigned long flags)
> +static int special_mapping_mremap(struct vm_area_struct *new_vma)
> {
> struct vm_special_mapping *sm = new_vma->vm_private_data;
>
> - if (flags & MREMAP_DONTUNMAP)
> - return -EINVAL;
> -
> if (WARN_ON_ONCE(current->mm != new_vma->vm_mm))
> return -EFAULT;
>
> diff --git a/mm/mremap.c b/mm/mremap.c
> index db5b8b28c2dd..d22629ff8f3c 100644
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -545,7 +545,7 @@ static unsigned long move_vma(struct vm_area_struct *vma,
> if (moved_len < old_len) {
> err = -ENOMEM;
> } else if (vma->vm_ops && vma->vm_ops->mremap) {
> - err = vma->vm_ops->mremap(new_vma, flags);
> + err = vma->vm_ops->mremap(new_vma);
> }
>
> if (unlikely(err)) {
> --
> 2.31.0.rc2.261.g7f71774620-goog

2021-03-19 02:05:31

by Hugh Dickins

Subject: Re: [PATCH] mm: Allow shmem mappings with MREMAP_DONTUNMAP

On Tue, 16 Mar 2021, Peter Xu wrote:
>
> I'm curious whether it's okay to expand MREMAP_DONTUNMAP to PFNMAP too..
> E.g. vfio maps device MMIO regions with both VM_DONTEXPAND|VM_PFNMAP, to me it
> makes sense to allow the userspace to get such MMIO region remapped/duplicated
> somewhere else as long as the size won't change. With the strict check as
> above we kill all those possibilities.
>
> Though in that case we'll still need commits like cd544fd1dc92 to protect any
> customized ->mremap() when they're not supported.

It would take me many hours to arrive at a conclusion on that:
I'm going to spend the time differently, and let whoever ends up
wanting MREMAP_DONTUNMAP on a VM_PFNMAP area research the safety
of that for existing users.

I did look to see what added VM_PFNMAP to the original VM_DONTEXPAND:

v2.6.15
commit 4d7672b46244abffea1953e55688c0ea143dd617
Author: Linus Torvalds <[email protected]>
Date: Fri Dec 16 10:21:23 2005 -0800

Make sure we copy pages inserted with "vm_insert_page()" on fork

The logic that decides that a fork() might be able to avoid copying a VM
area when it can be re-created by page faults didn't know about the new
vm_insert_page() case.

Also make some things a bit more anal wrt VM_PFNMAP.

Pointed out by Hugh Dickins <[email protected]>

Signed-off-by: Linus Torvalds <[email protected]>

So apparently I do bear some anal responsibility. My concern seems
to have been that in those days an unexpected page fault in a special
driver area would end up allocating an anonymous page, which would
never get freed later. Nowadays it looks like there's a SIGBUS for
the equivalent situation.

So probably VM_DONTEXPAND is less important than it was, and the
additional VM_PFNMAP safety net no longer necessary, and you could
strip it out of the old size check and Brian's new dontunmap check.

But I give no guarantee: I don't know VM_PFNMAP users at all well.

Hugh

2021-03-23 16:28:15

by Brian Geffon

Subject: [PATCH v4 1/3] mm: Extend MREMAP_DONTUNMAP to non-anonymous mappings

Currently MREMAP_DONTUNMAP only accepts private anonymous mappings.
This restriction was placed initially for simplicity and not because
there exists a technical reason to do so.

This change will widen the support to include any mappings which are not
VM_DONTEXPAND or VM_PFNMAP. The primary use case is to support
MREMAP_DONTUNMAP on mappings which may have been created from a memfd.
This change will result in mremap(MREMAP_DONTUNMAP) returning -EINVAL
if VM_DONTEXPAND or VM_PFNMAP mappings are specified.

Lokesh Gidra, who works on the Android JVM, provided an explanation of how
such a feature will improve Android JVM garbage collection:
"Android is developing a new garbage collector (GC), based on userfaultfd.
The garbage collector will use userfaultfd (uffd) on the java heap during
compaction. On accessing any uncompacted page, the application threads will
find it missing, at which point the thread will create the compacted page
and then use UFFDIO_COPY ioctl to get it mapped and then resume execution.
Before starting this compaction, in a stop-the-world pause the heap will be
mremap(MREMAP_DONTUNMAP) so that the java heap is ready to receive
UFFD_EVENT_PAGEFAULT events after resuming execution.

To speed up mremap operations, pagetable movement was optimized by moving
PUD entries instead of PTE entries [1]. It was necessary as mremap of even
modest sized memory ranges also took several milliseconds, and stopping the
application for that long isn't acceptable in response-time sensitive
cases.

With UFFDIO_CONTINUE feature [2], it will be even more efficient to
implement this GC, particularly the 'non-moveable' portions of the heap.
It will also help in reducing the need to copy (UFFDIO_COPY) the pages.
However, for this to work, the java heap has to be on a 'shared' vma.
Currently MREMAP_DONTUNMAP only supports private anonymous mappings; this
patch will enable using UFFDIO_CONTINUE for the new userfaultfd-based heap
compaction."

[1] https://lore.kernel.org/linux-mm/20201215030730.NC3CU98e4%[email protected]/
[2] https://lore.kernel.org/linux-mm/[email protected]/

Signed-off-by: Brian Geffon <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Tested-by: Lokesh Gidra <[email protected]>
---
mm/mremap.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/mremap.c b/mm/mremap.c
index ec8f840399ed..db5b8b28c2dd 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -653,8 +653,8 @@ static struct vm_area_struct *vma_to_resize(unsigned long addr,
return ERR_PTR(-EINVAL);
}

- if (flags & MREMAP_DONTUNMAP && (!vma_is_anonymous(vma) ||
- vma->vm_flags & VM_SHARED))
+ if ((flags & MREMAP_DONTUNMAP) &&
+ (vma->vm_flags & (VM_DONTEXPAND | VM_PFNMAP)))
return ERR_PTR(-EINVAL);

if (is_vm_hugetlb_page(vma))
--
2.31.0.rc2.261.g7f71774620-goog

2021-03-23 16:28:43

by Brian Geffon

Subject: [PATCH] mremap.2: MREMAP_DONTUNMAP to reflect to supported mappings

mremap(2) now supports MREMAP_DONTUNMAP with mapping types other
than private anonymous.

Signed-off-by: Brian Geffon <[email protected]>
---
man2/mremap.2 | 13 ++-----------
1 file changed, 2 insertions(+), 11 deletions(-)

diff --git a/man2/mremap.2 b/man2/mremap.2
index 3ed0c0c0a..72acbc111 100644
--- a/man2/mremap.2
+++ b/man2/mremap.2
@@ -118,16 +118,6 @@ This flag, which must be used in conjunction with
remaps a mapping to a new address but does not unmap the mapping at
.IR old_address .
.IP
-The
-.B MREMAP_DONTUNMAP
-flag can be used only with private anonymous mappings
-(see the description of
-.BR MAP_PRIVATE
-and
-.BR MAP_ANONYMOUS
-in
-.BR mmap (2)).
-.IP
After completion,
any access to the range specified by
.IR old_address
@@ -227,7 +217,8 @@ was specified, but one or more pages in the range specified by
.IR old_address
and
.IR old_size
-were not private anonymous;
+were part of a special mapping or the mapping is one that
+does not support merging or expanding;
.IP *
.B MREMAP_DONTUNMAP
was specified and
--
2.31.0.rc2.261.g7f71774620-goog

2021-03-24 05:50:40

by Brian Geffon

Subject: [PATCH v4 3/3] selftests: Add a MREMAP_DONTUNMAP selftest for shmem

This test extends the current mremap tests to validate that
the MREMAP_DONTUNMAP operation can be performed on shmem mappings.

Signed-off-by: Brian Geffon <[email protected]>
---
tools/testing/selftests/vm/mremap_dontunmap.c | 52 +++++++++++++++++++
1 file changed, 52 insertions(+)

diff --git a/tools/testing/selftests/vm/mremap_dontunmap.c b/tools/testing/selftests/vm/mremap_dontunmap.c
index 3a7b5ef0b0c6..f01dc4a85b0b 100644
--- a/tools/testing/selftests/vm/mremap_dontunmap.c
+++ b/tools/testing/selftests/vm/mremap_dontunmap.c
@@ -127,6 +127,57 @@ static void mremap_dontunmap_simple()
"unable to unmap source mapping");
}

+// This test validates that MREMAP_DONTUNMAP on a shared mapping works as expected.
+static void mremap_dontunmap_simple_shmem()
+{
+ unsigned long num_pages = 5;
+
+ int mem_fd = memfd_create("memfd", MFD_CLOEXEC);
+ BUG_ON(mem_fd < 0, "memfd_create");
+
+ BUG_ON(ftruncate(mem_fd, num_pages * page_size) < 0,
+ "ftruncate");
+
+ void *source_mapping =
+ mmap(NULL, num_pages * page_size, PROT_READ | PROT_WRITE,
+ MAP_FILE | MAP_SHARED, mem_fd, 0);
+ BUG_ON(source_mapping == MAP_FAILED, "mmap");
+
+ BUG_ON(close(mem_fd) < 0, "close");
+
+ memset(source_mapping, 'a', num_pages * page_size);
+
+ // Try to just move the whole mapping anywhere (not fixed).
+ void *dest_mapping =
+ mremap(source_mapping, num_pages * page_size, num_pages * page_size,
+ MREMAP_DONTUNMAP | MREMAP_MAYMOVE, NULL);
+ if (dest_mapping == MAP_FAILED && errno == EINVAL) {
+ // Old kernel which doesn't support MREMAP_DONTUNMAP on shmem.
+ BUG_ON(munmap(source_mapping, num_pages * page_size) == -1,
+ "unable to unmap source mapping");
+ return;
+ }
+
+ BUG_ON(dest_mapping == MAP_FAILED, "mremap");
+
+ // Validate that the pages have been moved, we know they were moved if
+ // the dest_mapping contains a's.
+ BUG_ON(check_region_contains_byte
+ (dest_mapping, num_pages * page_size, 'a') != 0,
+ "pages did not migrate");
+
+ // Because the region is backed by shmem, we will actually see the same
+ // memory at the source location still.
+ BUG_ON(check_region_contains_byte
+ (source_mapping, num_pages * page_size, 'a') != 0,
+ "source should have no ptes");
+
+ BUG_ON(munmap(dest_mapping, num_pages * page_size) == -1,
+ "unable to unmap destination mapping");
+ BUG_ON(munmap(source_mapping, num_pages * page_size) == -1,
+ "unable to unmap source mapping");
+}
+
// This test validates MREMAP_DONTUNMAP will move page tables to a specific
// destination using MREMAP_FIXED, also while validating that the source
// remains intact.
@@ -300,6 +351,7 @@ int main(void)
BUG_ON(page_buffer == MAP_FAILED, "unable to mmap a page.");

mremap_dontunmap_simple();
+ mremap_dontunmap_simple_shmem();
mremap_dontunmap_simple_fixed();
mremap_dontunmap_partial_mapping();
mremap_dontunmap_partial_mapping_overwrite();
--
2.31.0.rc2.261.g7f71774620-goog
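
The new test relies on check_region_contains_byte(), which is defined elsewhere in mremap_dontunmap.c and is not part of this diff. As a rough illustration of the contract the test appears to rely on (a return value of 0 when every byte in the region matches), a stand-alone equivalent might look like the sketch below. The name sketch_region_contains_byte and the byte-by-byte loop are assumptions for illustration, not the selftest's actual implementation.

```c
#include <stddef.h>

/*
 * Hypothetical helper: returns 0 if all `size` bytes starting at `addr`
 * equal `byte`, and 1 as soon as a differing byte is found.
 */
static int sketch_region_contains_byte(const void *addr, size_t size,
				       char byte)
{
	const char *p = addr;
	size_t i;

	for (i = 0; i < size; i++) {
		if (p[i] != byte)
			return 1;
	}

	return 0;
}
```

With a helper of this shape, the shmem test's two checks read naturally: after the remap, both the destination and the source are expected to return 0 for 'a', since both mappings are backed by the same shmem pages.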

2021-03-24 05:50:50

by Brian Geffon

Subject: [PATCH v4 2/3] Revert "mremap: don't allow MREMAP_DONTUNMAP on special_mappings and aio"

This reverts commit cd544fd1dc9293c6702fab6effa63dac1cc67e99.

As discussed in [1] this commit was a no-op because the mapping type was
checked in vma_to_resize before move_vma is ever called. This meant that
vm_ops->mremap() would never be called on such mappings. Furthermore,
we've since expanded support of MREMAP_DONTUNMAP to non-anonymous
mappings, and these special mappings are still protected by the existing
check of !VM_DONTEXPAND and !VM_PFNMAP which will result in a -EINVAL.

1. https://lkml.org/lkml/2020/12/28/2340

Signed-off-by: Brian Geffon <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
---
arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 2 +-
fs/aio.c | 5 +----
include/linux/mm.h | 2 +-
mm/mmap.c | 6 +-----
mm/mremap.c | 2 +-
5 files changed, 5 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
index e916646adc69..0daf2f1cf7a8 100644
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
+++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
@@ -1458,7 +1458,7 @@ static int pseudo_lock_dev_release(struct inode *inode, struct file *filp)
return 0;
}

-static int pseudo_lock_dev_mremap(struct vm_area_struct *area, unsigned long flags)
+static int pseudo_lock_dev_mremap(struct vm_area_struct *area)
{
/* Not supported */
return -EINVAL;
diff --git a/fs/aio.c b/fs/aio.c
index 1f32da13d39e..76ce0cc3ee4e 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -323,16 +323,13 @@ static void aio_free_ring(struct kioctx *ctx)
}
}

-static int aio_ring_mremap(struct vm_area_struct *vma, unsigned long flags)
+static int aio_ring_mremap(struct vm_area_struct *vma)
{
struct file *file = vma->vm_file;
struct mm_struct *mm = vma->vm_mm;
struct kioctx_table *table;
int i, res = -EINVAL;

- if (flags & MREMAP_DONTUNMAP)
- return -EINVAL;
-
spin_lock(&mm->ioctx_lock);
rcu_read_lock();
table = rcu_dereference(mm->ioctx_table);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 64a71bf20536..ecdc6e8dc5af 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -570,7 +570,7 @@ struct vm_operations_struct {
void (*close)(struct vm_area_struct * area);
/* Called any time before splitting to check if it's allowed */
int (*may_split)(struct vm_area_struct *area, unsigned long addr);
- int (*mremap)(struct vm_area_struct *area, unsigned long flags);
+ int (*mremap)(struct vm_area_struct *area);
/*
* Called by mprotect() to make driver-specific permission
* checks before mprotect() is finalised. The VMA must not
diff --git a/mm/mmap.c b/mm/mmap.c
index 3f287599a7a3..9d7651e4e1fe 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3403,14 +3403,10 @@ static const char *special_mapping_name(struct vm_area_struct *vma)
return ((struct vm_special_mapping *)vma->vm_private_data)->name;
}

-static int special_mapping_mremap(struct vm_area_struct *new_vma,
- unsigned long flags)
+static int special_mapping_mremap(struct vm_area_struct *new_vma)
{
struct vm_special_mapping *sm = new_vma->vm_private_data;

- if (flags & MREMAP_DONTUNMAP)
- return -EINVAL;
-
if (WARN_ON_ONCE(current->mm != new_vma->vm_mm))
return -EFAULT;

diff --git a/mm/mremap.c b/mm/mremap.c
index db5b8b28c2dd..d22629ff8f3c 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -545,7 +545,7 @@ static unsigned long move_vma(struct vm_area_struct *vma,
if (moved_len < old_len) {
err = -ENOMEM;
} else if (vma->vm_ops && vma->vm_ops->mremap) {
- err = vma->vm_ops->mremap(new_vma, flags);
+ err = vma->vm_ops->mremap(new_vma);
}

if (unlikely(err)) {
--
2.31.0.rc2.261.g7f71774620-goog

2021-03-24 06:48:50

by Brian Geffon

Subject: [PATCH v5 1/3] mm: Extend MREMAP_DONTUNMAP to non-anonymous mappings

Currently MREMAP_DONTUNMAP only accepts private anonymous mappings.
This restriction was placed initially for simplicity and not because
there exists a technical reason to do so.

This change will widen the support to include any mappings which are not
VM_DONTEXPAND or VM_PFNMAP. The primary use case is to support
MREMAP_DONTUNMAP on mappings which may have been created from a memfd.
This change will result in mremap(MREMAP_DONTUNMAP) returning -EINVAL
if VM_DONTEXPAND or VM_PFNMAP mappings are specified.

Lokesh Gidra who works on the Android JVM, provided an explanation of how
such a feature will improve Android JVM garbage collection:
"Android is developing a new garbage collector (GC), based on userfaultfd.
The garbage collector will use userfaultfd (uffd) on the java heap during
compaction. On accessing any uncompacted page, the application threads will
find it missing, at which point the thread will create the compacted page
and then use UFFDIO_COPY ioctl to get it mapped and then resume execution.
Before starting this compaction, in a stop-the-world pause the heap will be
mremap(MREMAP_DONTUNMAP) so that the java heap is ready to receive
UFFD_EVENT_PAGEFAULT events after resuming execution.

To speedup mremap operations, pagetable movement was optimized by moving
PUD entries instead of PTE entries [1]. It was necessary as mremap of even
modest sized memory ranges also took several milliseconds, and stopping the
application for that long isn't acceptable in response-time sensitive
cases.

With UFFDIO_CONTINUE feature [2], it will be even more efficient to
implement this GC, particularly the 'non-moveable' portions of the heap.
It will also help in reducing the need to copy (UFFDIO_COPY) the pages.
However, for this to work, the java heap has to be on a 'shared' vma.
Currently MREMAP_DONTUNMAP only supports private anonymous mappings, this
patch will enable using UFFDIO_CONTINUE for the new userfaultfd-based heap
compaction."

[1] https://lore.kernel.org/linux-mm/20201215030730.NC3CU98e4%[email protected]/
[2] https://lore.kernel.org/linux-mm/[email protected]/

Signed-off-by: Brian Geffon <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Tested-by: Lokesh Gidra <[email protected]>
---
mm/mremap.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/mremap.c b/mm/mremap.c
index ec8f840399ed..db5b8b28c2dd 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -653,8 +653,8 @@ static struct vm_area_struct *vma_to_resize(unsigned long addr,
return ERR_PTR(-EINVAL);
}

- if (flags & MREMAP_DONTUNMAP && (!vma_is_anonymous(vma) ||
- vma->vm_flags & VM_SHARED))
+ if ((flags & MREMAP_DONTUNMAP) &&
+ (vma->vm_flags & (VM_DONTEXPAND | VM_PFNMAP)))
return ERR_PTR(-EINVAL);

if (is_vm_hugetlb_page(vma))
--
2.31.0.291.g576ba9dcdaf-goog
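
To make the effect of the hunk above easier to see, here is a small userspace model of the two predicates. It is only a sketch: `struct demo_vma`, the `DEMO_VM_*` constants and both function names are invented for illustration, the flag values are placeholders rather than the kernel's, and the `anonymous` field stands in for `vma_is_anonymous(vma)`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-ins for kernel-internal VMA flags (placeholder values). */
enum {
	DEMO_VM_SHARED     = 1u << 0,
	DEMO_VM_PFNMAP     = 1u << 1,
	DEMO_VM_DONTEXPAND = 1u << 2,
};

/* Hypothetical, heavily reduced model of struct vm_area_struct. */
struct demo_vma {
	unsigned long vm_flags;
	bool anonymous; /* models vma_is_anonymous(vma) */
};

/* Old check: MREMAP_DONTUNMAP is refused unless private anonymous. */
static bool dontunmap_rejected_before(const struct demo_vma *vma)
{
	assert(vma != NULL);
	return !vma->anonymous || (vma->vm_flags & DEMO_VM_SHARED) != 0;
}

/* New check: only VM_DONTEXPAND or VM_PFNMAP mappings are refused. */
static bool dontunmap_rejected_after(const struct demo_vma *vma)
{
	assert(vma != NULL);
	return (vma->vm_flags & (DEMO_VM_DONTEXPAND | DEMO_VM_PFNMAP)) != 0;
}
```

Under this model a shared, non-anonymous mapping (the memfd case from the commit message) is rejected by the old predicate and accepted by the new one, while a mapping carrying either of the two special flags is rejected by the new predicate.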

2021-03-24 06:50:59

by Brian Geffon

Subject: [PATCH] mremap.2: MREMAP_DONTUNMAP to reflect to supported mappings

mremap(2) now supports MREMAP_DONTUNMAP with mapping types other
than private anonymous.

Signed-off-by: Brian Geffon <[email protected]>
---
man2/mremap.2 | 13 ++-----------
1 file changed, 2 insertions(+), 11 deletions(-)

diff --git a/man2/mremap.2 b/man2/mremap.2
index 3ed0c0c0a..72acbc111 100644
--- a/man2/mremap.2
+++ b/man2/mremap.2
@@ -118,16 +118,6 @@ This flag, which must be used in conjunction with
remaps a mapping to a new address but does not unmap the mapping at
.IR old_address .
.IP
-The
-.B MREMAP_DONTUNMAP
-flag can be used only with private anonymous mappings
-(see the description of
-.BR MAP_PRIVATE
-and
-.BR MAP_ANONYMOUS
-in
-.BR mmap (2)).
-.IP
After completion,
any access to the range specified by
.IR old_address
@@ -227,7 +217,8 @@ was specified, but one or more pages in the range specified by
.IR old_address
and
.IR old_size
-were not private anonymous;
+were part of a special mapping or the mapping is one that
+does not support merging or expanding;
.IP *
.B MREMAP_DONTUNMAP
was specified and
--
2.31.0.rc2.261.g7f71774620-goog
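
For readers who want to try the documented behaviour from userspace, the following is a minimal sketch, not part of the patch. The helper name `try_dontunmap_on_memfd` and the memfd name are made up for illustration. The fallback definition of `MREMAP_DONTUNMAP` assumes the value 4 from the kernel uapi header, for C libraries that do not yet define it. On a kernel without the mm change the mremap() call is expected to fail with EINVAL, which the helper reports as a negative errno.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef MREMAP_DONTUNMAP
#define MREMAP_DONTUNMAP 4 /* assumed value, taken from <linux/mman.h> */
#endif

/*
 * Hypothetical helper: attempt MREMAP_DONTUNMAP on a MAP_SHARED mapping
 * of a memfd. Returns 0 if the move succeeded and the data is readable
 * at the new address, otherwise the negative errno of the failing call.
 */
static int try_dontunmap_on_memfd(size_t len)
{
	char *old_addr, *new_addr;
	int ret = 0;
	int fd;

	assert(len > 0);

	fd = memfd_create("dontunmap-demo", 0);
	if (fd < 0)
		return -errno;

	if (ftruncate(fd, (off_t)len) < 0) {
		ret = -errno;
		close(fd);
		return ret;
	}

	old_addr = (char *)mmap(NULL, len, PROT_READ | PROT_WRITE,
				MAP_SHARED, fd, 0);
	if (old_addr == MAP_FAILED) {
		ret = -errno;
		close(fd);
		return ret;
	}
	memset(old_addr, 0x5a, len);

	/* MREMAP_DONTUNMAP needs MREMAP_MAYMOVE and old_size == new_size. */
	new_addr = (char *)mremap(old_addr, len, len,
				  MREMAP_MAYMOVE | MREMAP_DONTUNMAP);
	if (new_addr == MAP_FAILED) {
		ret = -errno; /* EINVAL if shared mappings are not accepted */
		munmap(old_addr, len);
		close(fd);
		return ret;
	}

	/* The contents written through the old address are visible here. */
	if (new_addr[0] != 0x5a || new_addr[len - 1] != 0x5a)
		ret = -EIO;

	/* The old range is still mapped, so both ranges must be unmapped. */
	munmap(new_addr, len);
	munmap(old_addr, len);
	close(fd);
	return ret;
}
```

A caller could invoke `try_dontunmap_on_memfd(4096)` and treat a return of `-EINVAL` as "this kernel still restricts MREMAP_DONTUNMAP to private anonymous mappings".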

2021-03-25 21:38:23

by Alejandro Colomar

Subject: Re: [PATCH] mremap.2: MREMAP_DONTUNMAP to reflect to supported mappings

Hello Brian,

Is this already merged in Linux? I guess not, as I've seen a patch of
yours for the kernel, right?

Thanks,

Alex

On 3/23/21 7:25 PM, Brian Geffon wrote:
> mremap(2) now supports MREMAP_DONTUNMAP with mapping types other
> than private anonymous.
>
> Signed-off-by: Brian Geffon <[email protected]>
> ---
> man2/mremap.2 | 13 ++-----------
> 1 file changed, 2 insertions(+), 11 deletions(-)
>
> diff --git a/man2/mremap.2 b/man2/mremap.2
> index 3ed0c0c0a..72acbc111 100644
> --- a/man2/mremap.2
> +++ b/man2/mremap.2
> @@ -118,16 +118,6 @@ This flag, which must be used in conjunction with
> remaps a mapping to a new address but does not unmap the mapping at
> .IR old_address .
> .IP
> -The
> -.B MREMAP_DONTUNMAP
> -flag can be used only with private anonymous mappings
> -(see the description of
> -.BR MAP_PRIVATE
> -and
> -.BR MAP_ANONYMOUS
> -in
> -.BR mmap (2)).
> -.IP
> After completion,
> any access to the range specified by
> .IR old_address
> @@ -227,7 +217,8 @@ was specified, but one or more pages in the range specified by
> .IR old_address
> and
> .IR old_size
> -were not private anonymous;
> +were part of a special mapping or the mapping is one that
> +does not support merging or expanding;
> .IP *
> .B MREMAP_DONTUNMAP
> was specified and
>

--
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

2021-03-26 18:10:46

by Brian Geffon

Subject: Re: [PATCH] mremap.2: MREMAP_DONTUNMAP to reflect to supported mappings

Hi Alex,
It has not landed yet; it's currently in Andrew's mm tree. I can reach
out again when it makes it into Linus' tree.

Brian


On Thu, Mar 25, 2021 at 2:34 PM Alejandro Colomar (man-pages)
<[email protected]> wrote:
>
> Hello Brian,
>
> Is this already merged in Linux? I guess not, as I've seen a patch of
> yours for the kernel, right?
>
> Thanks,
>
> Alex
>
> --
> Alejandro Colomar
> Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
> http://www.alejandro-colomar.es/